large language models

Created: by Pradeep Gowda Updated: Jul 07, 2024 Tagged: llm · deep-learning · chatgpt

See also generative-ai, smol-llm, transformer-math, LlamaIndex, RAG, local-llm, llm-embedding, AI SaaS, LLM Training, Reward Models, AI Code Assistants, Building LLM Based Systems

Introductory Materials

Generative AI exists because of the transformer – A visual story from Financial Times; Sept 2023.

Large language models, explained with a minimum of math and jargon

Study Guides


  • Lil’Log: “Hi, this is Lilian. I’m documenting my learning notes in this blog. Other than writing a ML blog, I’m leading Applied Research at OpenAI on the side.”
  • Finbarr Timbers – eg: Five years of GPT progress
  • Vespa Blog


  • Sparks of Artificial General Intelligence: Early experiments with GPT-4. (2023) PDF. Bubeck, Sébastien, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, et al. “Sparks of Artificial General Intelligence: Early experiments with GPT-4,” 2023.
  • SeamlessM4T: Massively Multilingual & Multimodal Machine Translation | Meta AI Research. Barrault, Loïc, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, et al. “SeamlessM4T-Massively multilingual & multimodal machine translation,” 2023. https://arxiv.org/abs/2308.11596.
  • Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. “Language models are few-shot learners,” 2020. https://arxiv.org/abs/2005.14165.


An observation on Generalization - YouTube by Ilya Sutskever (OpenAI); Aug 14, 2023.

  • Supervised learning has a precise mathematical condition under which learning should succeed: low training error + more training data than “degrees of freedom” = low test error.
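
The supervised-learning condition can be made concrete for a finite hypothesis class. The sketch below is an illustration and is not taken from the talk: it assumes a loss bounded in [0, 1], i.i.d. samples, and models whose parameters are each stored in a fixed number of bits, and it applies the standard Hoeffding-plus-union-bound result. The function names are hypothetical.

```python
import math


def log_num_hypotheses(num_params: int, bits_per_param: int = 32) -> float:
    """Natural log of |F| for models with num_params parameters.

    Assumption: every distinct bit pattern is one hypothesis,
    so |F| = 2 ** (num_params * bits_per_param).
    """
    return num_params * bits_per_param * math.log(2.0)


def generalization_gap_bound(log_f: float, num_samples: int, delta: float = 0.05) -> float:
    """Bound on |test error - training error| for every hypothesis in F.

    Holds with probability at least 1 - delta, by Hoeffding's inequality
    plus a union bound over the finite class:
        gap <= sqrt((ln|F| + ln(2 / delta)) / (2 * n))
    """
    return math.sqrt((log_f + math.log(2.0 / delta)) / (2.0 * num_samples))


if __name__ == "__main__":
    # Hypothetical model with 1,000 parameters of 32 bits each.
    log_f = log_num_hypotheses(num_params=1_000)
    for n in (10_000, 1_000_000, 100_000_000):
        print(f"n={n:>11,}  gap bound={generalization_gap_bound(log_f, n):.4f}")
```

The bound only becomes small once the number of samples is large compared with the number of parameter bits, which is one way to read “more training data than degrees of freedom”.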

Prompt Engineering


See Models Table (Dr Alan D. Thompson, Life Architect) for a visual representation of models and a table of their attributes.

Stuff you can run on your computer

smol-ai/developer: with 100k context windows on the way, it’s now feasible for every dev to have their own smol developer

How is LLaMa.cpp possible? How can llama.cpp run on local machines when large models are expected to need expensive GPUs (e.g., A100)?
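
A rough back-of-the-envelope sketch of why local inference can be feasible. It rests on simplifying assumptions that are not from the linked article: weights dominate memory (KV cache, activations, and quantization metadata are ignored), and single-stream decoding is limited by memory bandwidth because each generated token reads every weight once. The model size and bandwidth figures are hypothetical.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


def max_tokens_per_second(num_params: float, bits_per_weight: float,
                          bandwidth_gb_per_s: float) -> float:
    """Upper bound on decoding speed if every token must read all weights once."""
    bytes_per_token = num_params * bits_per_weight / 8
    return bandwidth_gb_per_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    params = 7e9  # hypothetical 7-billion-parameter model
    for bits in (16, 8, 4):
        mem = weight_memory_gib(params, bits)
        # 100 GB/s is an assumed memory bandwidth, not a measured value.
        speed = max_tokens_per_second(params, bits, bandwidth_gb_per_s=100)
        print(f"{bits:>2}-bit: {mem:6.2f} GiB of weights, <= {speed:5.1f} tokens/s")
```

Under these assumptions, lowering the bits per weight shrinks both the memory footprint and the bytes read per token, which is why quantized models fit on consumer hardware.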

Introducing Code Llama, a state-of-the-art large language model for coding: “Code Llama is a code-specialized version of Llama 2 that was created by further training Llama 2 on its code-specific datasets, sampling more data from that same dataset for longer. Essentially, Code Llama features enhanced coding capabilities, built on top of Llama 2. It can generate code, and natural language about code, from both code and natural language prompts (e.g., “Write me a function that outputs the fibonacci sequence.”) It can also be used for code completion and debugging. It supports many of the most popular languages being used today, including Python, C++, Java, PHP, Typescript (Javascript), C#, and Bash.”

Ask HN: Cheapest way to run local LLMs? | Hacker News

See also Perplexity Labs, which offers multiple models to try.

Perplexity labs models; Apr 2024

LLMs in your language

All languages are NOT created (tokenized) equal
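
One contributing factor can be illustrated without a real tokenizer. The sketch below is only a proxy and makes an assumption that is not from the linked article: for tokenizers that start from UTF-8 bytes, a script that needs more bytes per character begins with more units to merge. It counts bytes, not tokens, and the sample words are illustrative greetings.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per Unicode code point in text."""
    return len(text.encode("utf-8")) / len(text)


if __name__ == "__main__":
    # Illustrative samples only; real token counts depend on the tokenizer.
    samples = {
        "English": "hello",
        "Hindi": "नमस्ते",
        "Chinese": "你好",
    }
    for language, word in samples.items():
        print(f"{language:8s} {utf8_bytes_per_char(word):.1f} bytes per character")
```

Actual token counts should be measured with the tokenizer of the model in question, since learned merges can narrow or widen this gap.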

Small Language Models

  • Eldan, Ronen, and Yuanzhi Li. “TinyStories: How Small Can Language Models Be and Still Speak Coherent English?” 2023.

Using LLMs


LlamaIndex 🦙 0.8.13

Haystack: Open-source LLM framework to build production-ready applications.

  • Use the latest LLMs: hosted models by OpenAI or Cohere, open-source LLMs, or other pre-trained models
  • All tooling in one place: preprocessing, pipelines, agents & tools, prompts, evaluation and finetuning
  • Choose your favorite database: Elasticsearch, OpenSearch, Weaviate, Pinecone, Qdrant, Milvus and more
  • Scale to millions of documents: use Haystack’s proven retrieval architecture
  • Compare it to LangChainAI

GPT4All A free-to-use, locally running, privacy-aware chatbot. No GPU or internet required.

AI Proxy

an AI proxy that lets you use a variety of providers (OpenAI, Anthropic, LLaMa2, Mistral, and others) behind a single interface w/ caching & API key management.

MLC LLM Machine Learning Compilation for Large Language Models (MLC LLM) is a high-performance universal deployment solution that allows native deployment of any large language models with native APIs with compiler acceleration. The mission of this project is to enable everyone to develop, optimize and deploy AI models natively on everyone’s devices with ML compilation techniques.

Project Overview of MLC LLM

Multimodal Learning



OWASP | Top 10 for Large Language Models


Uncensor any LLM with abliteration

Operational Issues


GGUF and GGML are file formats for storing models for inference, especially language models like GPT.
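
As a small illustration of the GGUF format, the sketch below checks whether a file starts with the GGUF magic bytes and reads the version field that follows. It assumes the little-endian layout (4-byte magic `GGUF`, then a 32-bit unsigned version) and ignores everything after those 8 bytes. The function name is hypothetical and this is not a full parser.

```python
import struct
from typing import Optional

GGUF_MAGIC = b"GGUF"


def read_gguf_version(path: str) -> Optional[int]:
    """Return the GGUF version number, or None if the file is not GGUF.

    Assumption: little-endian file; only the first 8 bytes are inspected.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

A call such as `read_gguf_version("model.gguf")` (a hypothetical path) returns an integer for a GGUF file and `None` for other files, including older GGML files that use different magic bytes.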

Benchmarking LLMs

LLM Benchmark Report for: NousResearch/Redmond-Puffin-13B




  • Stuff we figured out about AI in 2023

  • Large language models use a surprisingly simple mechanism to retrieve some stored knowledge | MIT News | Massachusetts Institute of Technology

  • Survey of Open source repos – What I learned from looking at 900 most popular open source AI tools (via)

  • Allen-Zhu, Zeyuan, and Yuanzhi Li. “Physics of language models: Part 3.3, knowledge capacity scaling laws,” 2024. https://arxiv.org/abs/2404.05405.

  • You, Keen, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, and Zhe Gan. “Ferret-UI: Grounded mobile UI understanding with multimodal LLMs,” 2024. https://arxiv.org/abs/2404.05719.

  • About BERT: Geiping, Jonas, and Tom Goldstein. “Cramming: Training a language model on a single GPU in one day,” 2022. https://arxiv.org/abs/2212.14034. (via)

  • Ma, Xuezhe, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, and Chunting Zhou. “Megalodon: Efficient LLM pretraining and inference with unlimited context length,” 2024. https://arxiv.org/abs/2404.08801. GitHub repo (the repo link in the paper wasn’t working as of 2024-04-17).

  • “phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.” Abdin, Marah, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, et al. “Phi-3 technical report: A highly capable language model locally on your phone,” 2024. https://arxiv.org/abs/2404.14219. (No code or model was announced with the paper.)

  • Liu, Ziming, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y. Hou, and Max Tegmark. “KAN: Kolmogorov-Arnold networks,” 2024. https://arxiv.org/abs/2404.19756.

  • Lin, Yiming, Madelon Hulsebos, Ruiying Ma, Shreya Shankar, Sepanta Zeigham, Aditya G. Parameswaran, and Eugene Wu. “Towards accurate and efficient document analytics with large language models,” 2024. https://arxiv.org/abs/2405.04674. “Unstructured data formats account for over 80% of the data currently stored, and extracting value from such formats remains a considerable challenge…. ZenDB efficiently extracts semantic hierarchical structures from such templatized documents, and introduces a novel query engine that leverages these structures for accurate and cost-effective query execution. Users can impose a schema on their documents, and query it, all via SQL. Extensive experiments on three real-world document collections demonstrate ZenDB’s benefits, achieving up to 30% cost savings compared to LLM-based baselines, while maintaining or improving accuracy, and surpassing RAG-based baselines by up to 61% in precision and 80% in recall, at a marginally higher cost.”

  • Call to Build Open Multi-Modal Models for Personal Assistants | LAION

  • What We Learned from a Year of Building with LLMs (Part I) – O’Reilly

  • What We Learned from a Year of Building with LLMs (Part II) – O’Reilly

  • ARC Prize

    • Chollet, François. “On the measure of intelligence,” 2019. https://arxiv.org/abs/1911.01547.
    • “To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans.”

    • “We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience, as critical pieces to be accounted for in characterizing intelligent systems.”

  • Ren, Liliang, Yang Liu, Yadong Lu, Yelong Shen, Chen Liang, and Weizhu Chen. “Samba: Simple hybrid state space models for efficient unlimited context language modeling,” 2024. https://arxiv.org/abs/2406.07522.


Multimodal Learning, Multi Agent Frameworks, AI Agent Framework