Small LLMs aka SLMs
smoler the better
LLMs that you can run on the desktop or a “regular(ish) PC”.
A look at Apple’s new Transformer-powered predictive text model: the model being used by AppleSpell, an internal macOS application that checks for spelling and grammar mistakes as you type. The author found the predictive text model in /System/Library/LinguisticData/RequiredAssets_en.bundle/AssetData/en.lm/unilm.bundle. The bundle contains multiple Espresso model files that are used while typing (Espresso appears to be the internal name for the part of CoreML that runs inference on models), plus a set of 15,000 tokens in unilm.bundle/sp.dat that pretty clearly look like the vocabulary set for a language model.
Read the rest of the blog post to see how the tokenizer works and how the architecture was pieced together: a GPT-2-style model with roughly 34M parameters and a hidden size of 512 units, smaller than even the smallest GPT-2 model.
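For a rough sense of scale, here is a minimal sketch (mine, not from the blog post) that builds a GPT-2-style configuration with the reported ~15,000-token vocabulary and 512 hidden size and checks that the parameter count lands near 34M; the layer, head, and context-length values are assumptions picked only to hit that ballpark.

```python
# Sketch: a GPT-2-style config with the reported vocab (~15k tokens) and hidden size (512).
# n_layer, n_head and n_positions are assumptions, chosen only to land near ~34M parameters.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=15_000,  # token count found in unilm.bundle/sp.dat
    n_embd=512,         # hidden size reported in the blog post
    n_layer=8,          # assumption
    n_head=8,           # assumption
    n_positions=512,    # assumption
)
model = GPT2LMHeadModel(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")  # ~33M with these guesses
```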
Orca 2: Teaching Small Language Models How to Reason - Microsoft Research
M2 Max with 64GB RAM. It does ~50 tok/s on our q4 quantized 7b mistral fine-tune, with comparable speeds to GPT-4 via
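For reference, a minimal sketch of that kind of local setup using llama-cpp-python (my example, not the quoted poster’s stack; the GGUF path is a placeholder):

```python
# Sketch: local inference on a Q4-quantized Mistral 7B GGUF via llama-cpp-python.
# The file path is a placeholder; any Q4_K_M quantization of a 7B model works the same way.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct-q4_k_m.gguf",  # placeholder path
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to Metal/GPU where available
)

out = llm(
    "Summarize why small language models are attractive for local inference.",
    max_tokens=128,
)
print(out["choices"][0]["text"])
```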
moondream is a computer-vision model that can answer real-world questions about images. It’s tiny by today’s standards, at only 1.6B parameters, which lets it run on a variety of devices, including mobile phones and edge devices. Apache 2.0 licensed, so you can use moondream for commercial purposes (a usage sketch follows the applications list below).
Applications:
- Security
- Drones and Robotics
- Retail and shopping
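The usage sketch mentioned above, following the moondream2 model card’s documented transformers interface; the repo id and the encode_image/answer_question method names come from that card and may have changed since, so treat them as assumptions:

```python
# Sketch: visual question answering with moondream via Hugging Face transformers.
# Repo id and the encode_image/answer_question methods follow the model card's
# documented usage at the time of writing; treat them as assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

model_id = "vikhyatk/moondream2"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

image = Image.open("shelf.jpg")           # placeholder image
enc_image = model.encode_image(image)     # embed the image once, reuse across questions
print(model.answer_question(enc_image, "How many items are on the shelf?", tokenizer))
```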
- Apache 2.0 license
- “Our goal is to create models that excel at RAG. Since RAG works by processing information at runtime, the main constraint is LLM size. For RAG, models don’t need to be huge; they just need strong text comprehension to give accurate answers when provided with the right context.”
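To make the RAG point concrete, here is a toy sketch of the pattern the quote describes: retrieve relevant text at runtime, stuff it into the prompt, and let a small model do the comprehension. The word-overlap retriever and the small_llm placeholder are deliberately simplistic stand-ins.

```python
# Toy RAG sketch: naive word-overlap retrieval + a context-stuffed prompt for a small LLM.
# small_llm() is a placeholder for whatever local model you call (llama.cpp, MLX, etc.).

DOCS = [
    "Moondream is a 1.6B-parameter vision model released under Apache 2.0.",
    "Phi-3-mini is a 3.8B-parameter language model small enough to run on a phone.",
    "RAG retrieves relevant context at runtime so the model itself can stay small.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by shared query words (a stand-in for a real retriever)."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    joined = "\n".join(context)
    return (
        f"Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"
    )

query = "Why can RAG use a small model?"
prompt = build_prompt(query, retrieve(query, DOCS))
# answer = small_llm(prompt)   # placeholder: call your local SLM here
print(prompt)
```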
- blog post: SLM Journey Unveiled – “In recent months, the landscape of language models has been enriched by the emergence of several small language models (e.g. TinyLlama, Phi2, Gemma, and StableLM2)”
Florence - a Microsoft Collection; SOTA 200M & 800M parameter vision foundation models. MIT licensed! The 200M checkpoint beats Flamingo 80B (a 400x bigger model) by a huge margin. It performs captioning, object detection and segmentation, OCR, phrase grounding, and more. Leverages the FLD-5B dataset (5.4 billion annotations across 126 million images) with multi-task learning. Fine-tuned model checkpoints beat the likes of PaLI and PaLI-X.
“Florence2 200M, Qwen2 500M, MSFT InstructLM 500M. With little fine-tuning they unlock so many creative and on-device use cases” via
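A sketch of how Florence-2’s multi-task prompting works in practice, following the model card’s documented transformers usage; the repo id, task token, and post-processing call are assumptions if that interface has changed:

```python
# Sketch: Florence-2 selects its task via a prompt token such as <OD> (object detection)
# or <CAPTION>. Repo id and post-processing follow the model card and may have changed.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-base"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("street.jpg")  # placeholder image
task = "<OD>"                     # swap for "<CAPTION>", "<OCR>", etc.
inputs = processor(text=task, images=image, return_tensors="pt")

with torch.no_grad():
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=512,
    )

text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
print(processor.post_process_generation(text, task=task, image_size=image.size))
```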
Fine-tune Llama-3-8B with Llama-3-405B synthetic data
A simple notebook for fine-tuning a small model (Llama-3-8B) to be an expert in a specific domain by letting a larger, more capable model (Llama-3-405B) teach it, i.e. by generating a synthetic dataset for that domain (a condensed sketch of the recipe follows).
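The sketch mentioned above is my own condensed outline of the recipe, not the notebook’s code: the example pairs stand in for teacher-generated data, the training arguments are illustrative, and in practice you would use LoRA or quantization rather than full fine-tuning of an 8B model.

```python
# Sketch of the recipe: (1) a large "teacher" model generates a synthetic dataset for the
# target domain, (2) the small "student" model is supervised fine-tuned on it.
# The two pairs below stand in for teacher-generated data; nothing here is the notebook's code.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

student_id = "meta-llama/Meta-Llama-3-8B"  # the small model being taught
tokenizer = AutoTokenizer.from_pretrained(student_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(student_id)

# 1) Teacher-generated (question, answer) pairs for the target domain (placeholders here).
synthetic_pairs = [
    {"q": "What does a quantized model trade off?", "a": "Some accuracy for lower memory use."},
    {"q": "Why fine-tune a small model?", "a": "To specialize it for one domain cheaply."},
]
dataset = Dataset.from_list(
    [{"text": f"Question: {p['q']}\nAnswer: {p['a']}{tokenizer.eos_token}"}
     for p in synthetic_pairs]
)
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                      remove_columns=["text"])

# 2) Standard causal-LM fine-tuning of the student on the synthetic data.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="student-sft",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```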
nisten/Biggie-SmoLlm-0.15B-Base · Hugging Face via
macOS desktop
- Quantized Gemma 2B running at 157 tok/s in MLX on an M1 Max laptop (see the MLX sketch after this list)
- simonmysun/ell: A command-line interface for LLMs written in Bash.
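The MLX sketch referenced above uses the mlx-lm package; the quantized repo id is an assumption, and any 4-bit MLX-community conversion of Gemma 2B follows the same pattern.

```python
# Sketch: running a 4-bit quantized Gemma 2B with mlx-lm on Apple silicon.
# The repo id is an assumption; substitute whichever MLX-community quantization you use.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/gemma-2b-it-4bit")  # assumed repo id
text = generate(
    model,
    tokenizer,
    prompt="Write a one-sentence summary of what MLX is.",
    max_tokens=100,
)
print(text)
```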
Phone
- “phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.” Abdin, Marah, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, et al. “Phi-3 technical report: A highly capable language model locally on your phone,” 2024. https://arxiv.org/abs/2404.14219. (No code or model was released with the paper.)
aiOS™ by Hyperspace “Organizing the World’s AI Agents. Join the world’s largest peer-to-peer AI network and start earning points”