đź”— Minions - the rise of small, on-device LMs
Minions shifts a substantial portion of LLM workloads to consumer devices by having small on-device models collaborate with frontier models in the cloud. Because long contexts are read only locally, cloud costs drop with minimal or no quality degradation. The authors imagine a future where an “intelligence layer” running persistently on-device interacts with frontier models in the cloud to deliver applications with cost-effective, “always-on” intelligence. A minimal sketch of the local/cloud split follows the links below.
Research by Stanford’s Hazy Research.
- GitHub: https://github.com/HazyResearch/minions
- arXiv Paper: https://arxiv.org/abs/2502.15964
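
To make the division of labor concrete, here is a minimal sketch of the local/cloud split. This is not the actual Minions protocol (see the repo above for that); it assumes the `ollama` Python client for the small local model and the `openai` client for the frontier model, and the model names are placeholders.

```python
# Minimal sketch of the local/cloud split (not the actual Minions protocol;
# see the HazyResearch/minions repo for the real implementation).
# Assumes the `ollama` and `openai` Python clients; model names are placeholders.
import ollama
from openai import OpenAI

cloud = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_over_long_context(question: str, long_document: str) -> str:
    # 1) The small on-device model reads the long context locally,
    #    so the full document never leaves the device.
    local = ollama.chat(
        model="llama3.2:3b",  # placeholder small model
        messages=[{
            "role": "user",
            "content": f"Document:\n{long_document}\n\n"
                       f"Extract only the facts relevant to: {question}",
        }],
    )
    relevant_facts = local["message"]["content"]

    # 2) The frontier model in the cloud sees only the short extraction,
    #    which is where the cloud-cost saving comes from.
    remote = cloud.chat.completions.create(
        model="gpt-4o",  # placeholder frontier model
        messages=[{
            "role": "user",
            "content": f"Using these notes:\n{relevant_facts}\n\nAnswer: {question}",
        }],
    )
    return remote.choices[0].message.content
```

The saving comes from step 2: the cloud model is billed only for the short extraction, not the full document.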
Ollama also has a related blog post.
The Minions paper cites this work: “LLM×MapReduce: Simplified Long-Sequence Processing using Large Language Models”. From its abstract:
The proposed LLM×MapReduce framework splits the entire document into several chunks for LLMs to read and then aggregates the intermediate answers to produce the final output. The main challenge for divide-and-conquer long text processing frameworks lies in the risk of losing essential long-range information when splitting the document, which can lead the model to produce incomplete or incorrect answers based on the segmented texts. Disrupted long-range information can be classified into two categories: inter-chunk dependency and inter-chunk conflict. We design a structured information protocol to better cope with inter-chunk dependency and an in-context confidence calibration mechanism to resolve inter-chunk conflicts.
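
As a rough illustration of the divide-and-conquer pattern the abstract describes, here is a toy map/reduce sketch. It omits the paper’s structured information protocol and in-context confidence calibration; `call_llm` is a hypothetical stand-in for any chat-model call.

```python
# Toy sketch of the divide-and-conquer pattern: map over chunks, then reduce.
# It omits the paper's structured information protocol and in-context
# confidence calibration; `call_llm` is a hypothetical stand-in for any
# chat-model call.
from typing import Callable

def chunk(text: str, size: int = 4000) -> list[str]:
    # Naive fixed-size splitting: exactly the step that risks breaking
    # inter-chunk dependencies and creating inter-chunk conflicts.
    return [text[i:i + size] for i in range(0, len(text), size)]

def map_reduce_answer(question: str, document: str,
                      call_llm: Callable[[str], str]) -> str:
    # Map: each chunk is read independently, yielding an intermediate answer.
    intermediate = [
        call_llm(f"Chunk:\n{c}\n\nAnswer from this chunk only: {question}")
        for c in chunk(document)
    ]
    # Reduce: aggregate the intermediate answers into the final output.
    # The paper resolves cross-chunk conflicts here; this sketch simply
    # asks the model to reconcile them.
    joined = "\n---\n".join(intermediate)
    return call_llm(
        f"Intermediate answers from different chunks:\n{joined}\n\n"
        f"Reconcile any conflicts and give one final answer to: {question}"
    )
```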