Small LLMs aka SLMs
smoler the better
LLMs that you can run on the desktop or a “regular(ish) PC”.
A look at Apple’s new Transformer-powered predictive text model: the model being used by AppleSpell, an internal macOS application that checks for spelling and grammar mistakes as you type. The author found the predictive text model in /System/Library/LinguisticData/RequiredAssets_en.bundle/AssetData/en.lm/unilm.bundle. The bundle contains multiple Espresso model files that are used while typing (Espresso appears to be the internal name for the part of CoreML that runs inference on models), plus a set of 15,000 tokens in unilm.bundle/sp.dat that pretty clearly look like the vocabulary set for a language model.
Read the rest of the blog post to see how the tokenizer works and how the architecture was pieced together: a GPT-2-style model with roughly 34M parameters and a hidden size of 512 units, smaller than even the smallest GPT-2 model.
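For a rough sense of scale, here is a minimal sketch (mine, not from the blog post) that builds a GPT-2-style configuration with the reported ~15,000-token vocabulary and 512 hidden size and checks that the parameter count lands near 34M; the layer, head, and context-length values are assumptions picked only to hit that ballpark.

```python
# Sketch: a GPT-2-style config with the reported vocab (~15k tokens) and hidden size (512).
# n_layer, n_head and n_positions are assumptions, chosen only to land near ~34M parameters.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=15_000,  # token count found in unilm.bundle/sp.dat
    n_embd=512,         # hidden size reported in the blog post
    n_layer=8,          # assumption
    n_head=8,           # assumption
    n_positions=512,    # assumption
)
model = GPT2LMHeadModel(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")  # ~33M with these guesses
```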
Orca 2: Teaching Small Language Models How to Reason - Microsoft Research
M2 Max with 64GB RAM. It does ~50 tok/s on our q4 quantized 7b mistral fine-tune, with comparable speeds to GPT-4 via
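For reference, a minimal sketch of that kind of local setup using llama-cpp-python (my example, not the quoted poster’s stack; the GGUF path is a placeholder):

```python
# Sketch: local inference on a Q4-quantized Mistral 7B GGUF via llama-cpp-python.
# The file path is a placeholder; any Q4_K_M quantization of a 7B model works the same way.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct-q4_k_m.gguf",  # placeholder path
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to Metal/GPU where available
)

out = llm(
    "Summarize why small language models are attractive for local inference.",
    max_tokens=128,
)
print(out["choices"][0]["text"])
```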
moondream is a computer-vision model that can answer real-world questions about images. It’s tiny by today’s standards, at only 1.6B parameters, which lets it run on a variety of devices, including mobile phones and edge devices. Apache 2.0 licensed, so you can use moondream for commercial purposes (a usage sketch follows the applications list below).
Applications:
- Security
- Drones and Robotics
- Retail and shopping
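The usage sketch mentioned above, following the moondream2 model card’s documented transformers interface; the repo id and the encode_image/answer_question method names come from that card and may have changed since, so treat them as assumptions:

```python
# Sketch: visual question answering with moondream via Hugging Face transformers.
# Repo id and the encode_image/answer_question methods follow the model card's
# documented usage at the time of writing; treat them as assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

model_id = "vikhyatk/moondream2"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

image = Image.open("shelf.jpg")           # placeholder image
enc_image = model.encode_image(image)     # embed the image once, reuse across questions
print(model.answer_question(enc_image, "How many items are on the shelf?", tokenizer))
```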
- Apache 2.0 license
- “Our goal is to create models that excel at RAG. Since RAG works by processing information at runtime, the main constraint is LLM size. For RAG, models don’t need to be huge; they just need strong text comprehension to give accurate answers when provided with the right context.”
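To make the RAG point concrete, here is a toy sketch of the pattern the quote describes: retrieve relevant text at runtime, stuff it into the prompt, and let a small model do the comprehension. The word-overlap retriever and the small_llm placeholder are deliberately simplistic stand-ins.

```python
# Toy RAG sketch: naive word-overlap retrieval + a context-stuffed prompt for a small LLM.
# small_llm() is a placeholder for whatever local model you call (llama.cpp, MLX, etc.).

DOCS = [
    "Moondream is a 1.6B-parameter vision model released under Apache 2.0.",
    "Phi-3-mini is a 3.8B-parameter language model small enough to run on a phone.",
    "RAG retrieves relevant context at runtime so the model itself can stay small.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by shared query words (a stand-in for a real retriever)."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    joined = "\n".join(context)
    return (
        f"Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"
    )

query = "Why can RAG use a small model?"
prompt = build_prompt(query, retrieve(query, DOCS))
# answer = small_llm(prompt)   # placeholder: call your local SLM here
print(prompt)
```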
- blog post: SLM Journey Unveiled – “In recent months, the landscape of language models has been enriched by the emergence of several small language models (e.g. TinyLlama, Phi2, Gemma, and StableLM2)”
Florence - a Microsoft Collection; SOTA 200M & 800M parameter vision foundation models. MIT licensed! The 200M checkpoint beats Flamingo 80B (a 400x bigger model) by a huge margin. It performs captioning, object detection and segmentation, OCR, phrase grounding, and more. Leverages the FLD-5B dataset (5.4 billion annotations across 126 million images) with multi-task learning. Fine-tuned model checkpoints beat the likes of PaLI and PaLI-X.
“Florence2 200M, Qwen2 500M, MSFT InstructLM 500M. With little fine-tuning they unlock so many creative and on-device use cases” via
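A sketch of how Florence-2’s multi-task prompting works in practice, following the model card’s documented transformers usage; the repo id, task token, and post-processing call are assumptions if that interface has changed:

```python
# Sketch: Florence-2 selects its task via a prompt token such as <OD> (object detection)
# or <CAPTION>. Repo id and post-processing follow the model card and may have changed.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-base"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("street.jpg")  # placeholder image
task = "<OD>"                     # swap for "<CAPTION>", "<OCR>", etc.
inputs = processor(text=task, images=image, return_tensors="pt")

with torch.no_grad():
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=512,
    )

text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
print(processor.post_process_generation(text, task=task, image_size=image.size))
```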
Fine-tune Llama-3-8B with Llama-3-405B synthetic data
A simple notebook for fine-tuning a small model (Llama-3-8B) to be an expert in a specific domain by letting a larger, more capable model (Llama-3-405B) teach it, i.e. by generating a synthetic dataset for that domain (a condensed sketch of the recipe follows).
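The sketch mentioned above is my own condensed outline of the recipe, not the notebook’s code: the example pairs stand in for teacher-generated data, the training arguments are illustrative, and in practice you would use LoRA or quantization rather than full fine-tuning of an 8B model.

```python
# Sketch of the recipe: (1) a large "teacher" model generates a synthetic dataset for the
# target domain, (2) the small "student" model is supervised fine-tuned on it.
# The two pairs below stand in for teacher-generated data; nothing here is the notebook's code.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

student_id = "meta-llama/Meta-Llama-3-8B"  # the small model being taught
tokenizer = AutoTokenizer.from_pretrained(student_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(student_id)

# 1) Teacher-generated (question, answer) pairs for the target domain (placeholders here).
synthetic_pairs = [
    {"q": "What does a quantized model trade off?", "a": "Some accuracy for lower memory use."},
    {"q": "Why fine-tune a small model?", "a": "To specialize it for one domain cheaply."},
]
dataset = Dataset.from_list(
    [{"text": f"Question: {p['q']}\nAnswer: {p['a']}{tokenizer.eos_token}"}
     for p in synthetic_pairs]
)
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                      remove_columns=["text"])

# 2) Standard causal-LM fine-tuning of the student on the synthetic data.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="student-sft",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```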
nisten/Biggie-SmoLlm-0.15B-Base · Hugging Face via
macOS desktop
- Quantized Gemma 2B running at 157 tok/s in MLX on an M1 Max laptop (see the MLX sketch after this list)
- simonmysun/ell: A command-line interface for LLMs written in Bash.
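The MLX sketch referenced above uses the mlx-lm package; the quantized repo id is an assumption, and any 4-bit MLX-community conversion of Gemma 2B follows the same pattern.

```python
# Sketch: running a 4-bit quantized Gemma 2B with mlx-lm on Apple silicon.
# The repo id is an assumption; substitute whichever MLX-community quantization you use.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/gemma-2b-it-4bit")  # assumed repo id
text = generate(
    model,
    tokenizer,
    prompt="Write a one-sentence summary of what MLX is.",
    max_tokens=100,
)
print(text)
```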
Phone
- “phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.” Abdin, Marah, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, et al. “Phi-3 technical report: A highly capable language model locally on your phone,” 2024. https://arxiv.org/abs/2404.14219. (No code or model was released with the paper.)
aiOS™ by Hyperspace “Organizing the World’s AI Agents. Join the world’s largest peer-to-peer AI network and start earning points”