đź”— Minions - the rise of small, on-device LMs
Minions shifts a substantial portion of LLM workloads to consumer devices by having small on-device models collaborate with frontier models in the cloud. Because long contexts are read only locally, cloud costs drop with minimal or no quality degradation. The authors imagine a future where an “intelligence layer” running persistently on-device interacts with frontier models in the cloud to deliver applications with cost-effective, “always-on” intelligence. A minimal sketch of the local/cloud split follows the links below.
Research by Stanford’s Hazy Research.
- GitHub: https://github.com/HazyResearch/minions
- arXiv Paper: https://arxiv.org/abs/2502.15964
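
To make the division of labor concrete, here is a minimal sketch of the local/cloud split. This is not the actual Minions protocol (see the repo above for that); it assumes the `ollama` Python client for the small local model and the `openai` client for the frontier model, and the model names are placeholders.

```python
# Minimal sketch of the local/cloud split (not the actual Minions protocol;
# see the HazyResearch/minions repo for the real implementation).
# Assumes the `ollama` and `openai` Python clients; model names are placeholders.
import ollama
from openai import OpenAI

cloud = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_over_long_context(question: str, long_document: str) -> str:
    # 1) The small on-device model reads the long context locally,
    #    so the full document never leaves the device.
    local = ollama.chat(
        model="llama3.2:3b",  # placeholder small model
        messages=[{
            "role": "user",
            "content": f"Document:\n{long_document}\n\n"
                       f"Extract only the facts relevant to: {question}",
        }],
    )
    relevant_facts = local["message"]["content"]

    # 2) The frontier model in the cloud sees only the short extraction,
    #    which is where the cloud-cost saving comes from.
    remote = cloud.chat.completions.create(
        model="gpt-4o",  # placeholder frontier model
        messages=[{
            "role": "user",
            "content": f"Using these notes:\n{relevant_facts}\n\nAnswer: {question}",
        }],
    )
    return remote.choices[0].message.content
```

The saving comes from step 2: the cloud model is billed only for the short extraction, not the full document.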
Ollama also has a related blog post.
The Minions paper cites this work: “LLM×MapReduce: Simplified Long-Sequence Processing using Large Language Models”. From its abstract:
The proposed LLM×MapReduce framework splits the entire document into several chunks for LLMs to read and then aggregates the intermediate answers to produce the final output. The main challenge for divide-and-conquer long text processing frameworks lies in the risk of losing essential long-range information when splitting the document, which can lead the model to produce incomplete or incorrect answers based on the segmented texts. Disrupted long-range information can be classified into two categories: inter-chunk dependency and inter-chunk conflict. We design a structured information protocol to better cope with inter-chunk dependency and an in-context confidence calibration mechanism to resolve inter-chunk conflicts.
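
As a rough illustration of the divide-and-conquer pattern the abstract describes, here is a toy map/reduce sketch. It omits the paper’s structured information protocol and in-context confidence calibration; `call_llm` is a hypothetical stand-in for any chat-model call.

```python
# Toy sketch of the divide-and-conquer pattern: map over chunks, then reduce.
# It omits the paper's structured information protocol and in-context
# confidence calibration; `call_llm` is a hypothetical stand-in for any
# chat-model call.
from typing import Callable

def chunk(text: str, size: int = 4000) -> list[str]:
    # Naive fixed-size splitting: exactly the step that risks breaking
    # inter-chunk dependencies and creating inter-chunk conflicts.
    return [text[i:i + size] for i in range(0, len(text), size)]

def map_reduce_answer(question: str, document: str,
                      call_llm: Callable[[str], str]) -> str:
    # Map: each chunk is read independently, yielding an intermediate answer.
    intermediate = [
        call_llm(f"Chunk:\n{c}\n\nAnswer from this chunk only: {question}")
        for c in chunk(document)
    ]
    # Reduce: aggregate the intermediate answers into the final output.
    # The paper resolves cross-chunk conflicts here; this sketch simply
    # asks the model to reconcile them.
    joined = "\n---\n".join(intermediate)
    return call_llm(
        f"Intermediate answers from different chunks:\n{joined}\n\n"
        f"Reconcile any conflicts and give one final answer to: {question}"
    )
```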