Small LLMs

Created by Pradeep Gowda. Tagged: smol-llm · llm

smoler the better

A look at Apple’s new Transformer-powered predictive text model

The model is used by AppleSpell, an internal macOS application that checks for spelling and grammar mistakes as you type.

The author found the predictive text model in /System/Library/LinguisticData/RequiredAssets_en.bundle/AssetData/en.lm/unilm.bundle. The bundle contains multiple Espresso model files that are used while typing (Espresso appears to be the internal name for the part of Core ML that runs inference on models).

unilm.bundle/sp.dat holds a set of 15,000 tokens that pretty clearly look like the vocabulary for a large language model.

Read the rest of the blog post above to see how the tokenizer works and what the model architecture looks like (GPT-2?): about 34M parameters and a hidden size of 512 units, which makes it smaller than the GPT-2 models.
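To get a feel for how vocabulary size and hidden size drive those numbers, here is a rough back-of-the-envelope sketch. It assumes a standard GPT-2-style decoder with tied input/output embeddings, which is only a guess for Apple's model. The layer counts and the context length of 128 are hypothetical placeholders, not values from the post; only the 15,000-token vocabulary and the hidden size of 512 come from it.

```python
def gpt2_style_param_count(vocab_size, d_model, n_layers, n_ctx):
    """Count parameters of a GPT-2-style decoder with tied embeddings.

    This mirrors the published GPT-2 layout; it is NOT a description of
    Apple's actual model, whose exact architecture is not known here.
    """
    # Token embeddings plus learned position embeddings.
    embeddings = vocab_size * d_model + n_ctx * d_model
    # Fused QKV projection and output projection, each with biases.
    attention = 4 * d_model * d_model + 4 * d_model
    # Feed-forward network d -> 4d -> d, with biases.
    mlp = 8 * d_model * d_model + 5 * d_model
    # Two LayerNorms per block, each with a scale and a shift vector.
    layer_norms = 4 * d_model
    per_block = attention + mlp + layer_norms
    # One final LayerNorm after the last block.
    final_layer_norm = 2 * d_model
    return embeddings + n_layers * per_block + final_layer_norm


# Sanity check against the smallest public GPT-2 configuration.
gpt2_small = gpt2_style_param_count(50257, 768, 12, 1024)
print(f"GPT-2 small: {gpt2_small:,} parameters")

# Vocabulary and hidden size from the post; layer counts and n_ctx are guesses.
for n_layers in (4, 6, 8):
    n = gpt2_style_param_count(15_000, 512, n_layers, n_ctx=128)
    print(f"vocab=15,000 d_model=512 layers={n_layers}: {n:,} parameters")
```

With a 15,000-token vocabulary and a hidden size of 512, every layer count tried here stays in the tens of millions of parameters, well below the roughly 124M of the smallest GPT-2.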

Orca 2: Teaching Small Language Models How to Reason - Microsoft Research ; see

M2 Max with 64GB RAM. It does ~50 tok/s on our q4 quantized 7b mistral fine-tune, with comparable speeds to GPT-4 via