Small LLMs
A look at Apple’s new Transformer-powered predictive text model
the model being used by AppleSpell, an internal macOS application that checks for spelling and grammar mistakes as you type.
found the predictive text model in /System/Library/LinguisticData/RequiredAssets_en.bundle/AssetData/en.lm/unilm.bundle. The bundle contains multiple Espresso model files that are used while typing (Espresso appears to be the internal name for the part of CoreML that runs inference on models).
a set of 15,000 tokens in unilm.bundle/sp.dat that pretty clearly look like they form the vocabulary set for a large language model.
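If you want to eyeball a binary blob like sp.dat yourself, a generic strings(1)-style scan for runs of printable characters is a quick first pass. This is only a sketch: it assumes nothing about the file's actual format (which the blog post reverse-engineers properly), and the local path is hypothetical.

```python
import re

# Path assumes you've copied sp.dat out of unilm.bundle locally.
with open("sp.dat", "rb") as f:
    data = f.read()

# Pull out runs of 3+ printable ASCII characters as candidate tokens.
# Real vocab files also hold scores/metadata, which this scan ignores.
candidates = re.findall(rb"[ -~]{3,}", data)
print(len(candidates), "candidate strings")
print([c.decode() for c in candidates[:20]])
```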
Read the rest of the above blog post to see how the tokenizer works and how the model architecture appears to be GPT-2-like, with about 34M parameters and a hidden size of 512 units, making it smaller than even the smallest GPT-2 model.
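Those numbers are easy to sanity-check. For a GPT-2-style decoder, total parameter count is roughly the token-embedding matrix plus about 12·d² weights per transformer block. A minimal back-of-the-envelope sketch, assuming tied input/output embeddings and a standard GPT-2 block layout (both assumptions; the post doesn't publish a full breakdown):

```python
VOCAB = 15_000      # tokens found in unilm.bundle/sp.dat
D_MODEL = 512       # reported hidden size
TOTAL = 34_000_000  # reported parameter count

embedding_params = VOCAB * D_MODEL    # token embedding matrix (~7.7M)
params_per_block = 12 * D_MODEL ** 2  # attention + MLP, GPT-2 rule of thumb (~3.1M)
implied_blocks = (TOTAL - embedding_params) / params_per_block

print(f"implied block count: {implied_blocks:.1f}")
# -> roughly 8 blocks, consistent with something much smaller than
#    GPT-2 small (124M params, d=768, 12 blocks, ~50k vocab)
```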
Orca 2: Teaching Small Language Models How to Reason - Microsoft Research; see also this report of local inference speed for small models:
M2 Max with 64GB RAM. It does ~50 tok/s on our q4 quantized 7b mistral fine-tune, with comparable speeds to GPT-4 via…
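For context on what a setup like that typically looks like: a minimal sketch using llama-cpp-python, one common way to run a 4-bit-quantized 7B GGUF model with Metal acceleration on Apple silicon. The model path and prompt are hypothetical, and the quoted poster's actual stack isn't specified.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b-q4_k_m.gguf",  # hypothetical local GGUF file
    n_ctx=2048,       # context window
    n_gpu_layers=-1,  # offload all layers to the GPU (Metal on M-series)
)

out = llm("Q: Name three small open LLMs.\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```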