RAG
Retrieval-augmented generation
Retrieval-augmented generation (RAG) is an AI framework that improves the quality of LLM-generated responses by retrieving facts from an external knowledge base, grounding the model on accurate, up-to-date information that supplements its internal representation. [1] Implementing RAG in an LLM-based question-answering system has two main benefits: the model gets access to the most current, reliable facts, and users get access to the model's sources, so its claims can be checked for accuracy and ultimately trusted.
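As a rough illustration of the retrieve-then-ground loop described above, here is a minimal sketch in plain Python. The keyword-overlap retriever, the tiny knowledge base, and the prompt template are toy stand-ins for a real embedding search and LLM call.

```python
def retrieve(query, knowledge_base, k=2):
    """Rank passages by naive keyword overlap with the query (stand-in
    for a real vector search over an external knowledge base)."""
    terms = set(query.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda p: len(terms & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, passages):
    """Ground the model: number the sources so claims can be checked."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

kb = [
    "The Eiffel Tower is 330 metres tall.",
    "RAG grounds LLM answers in retrieved documents.",
]
passages = retrieve("How tall is the Eiffel Tower?", kb)
prompt = build_prompt("How tall is the Eiffel Tower?", passages)
```

The prompt would then go to the LLM; because the sources are numbered inline, the generated answer can cite them, which is the "insight into the generative process" part of the definition.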
See the Survey paper - Gao, Yunfan, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, and Haofen Wang. “Retrieval-augmented generation for large language models: A survey,” 2024. http://arxiv.org/abs/2312.10997.
Algorithms
BM42: The combination of semantic and keyword search – BM42: New Baseline for Hybrid Search - Qdrant
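The Qdrant post above combines keyword (sparse) and semantic (dense) signals. One common way to merge the two ranked result lists is reciprocal rank fusion (RRF); this is a generic sketch of that fusion idea, not Qdrant's BM42 implementation, and the doc ids are made up.

```python
def rrf(rankings, k=60):
    """Fuse ranked lists of doc ids; a doc scores higher the earlier
    it appears in each list, so agreement between lists wins."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc4"]   # e.g. from sparse/BM25-style scores
semantic_hits = ["doc1", "doc2", "doc3"]  # e.g. from embedding similarity
fused = rrf([keyword_hits, semantic_hits])
```

doc1 wins here because it ranks well in both lists, even though neither list puts it first alone; that robustness to either signal being noisy is the usual argument for hybrid search.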
Developing RAG Applications
- Mastering RAG Pattern Chatbots: Azure OpenAI and LangChain.js Integration | Azure Devs JS Day 2024
- Indexify - by Tensorlake is an open source data framework featuring a real-time extraction engine and pre-built extraction adapters. “Build fast AI applications with reliability and precision, driving smarter decisions”
- Jina AI Reader “provide a number of different AI-related platform products, including an excellent family of embedding models, but one of their most instantly useful is Jina Reader, an API for turning any URL into Markdown content suitable for piping into an LLM.”
Self RAG
Self-RAG in a nutshell: for familiar topics, answer directly; for unfamiliar ones, open the reference book, quickly find the relevant parts, sort and summarize them in your mind, then write the answer on the exam paper.
via https://selfrag.github.io
Asai, Akari, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. “Self-RAG: Learning to retrieve, generate, and critique through self-reflection,” 2023. http://arxiv.org/abs/2310.11511.
We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (SELF-RAG) that enhances an LM’s quality and factuality through retrieval and self-reflection. Our framework trains a single arbitrary LM that adaptively retrieves passages on-demand, and generates and reflects on retrieved passages and its own generations using special tokens, called reflection tokens. Generating reflection tokens makes the LM controllable during the inference phase, enabling it to tailor its behavior to diverse task requirements. Experiments show that SELF-RAG (7B and 13B parameters) significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks. Specifically, SELF-RAG outperforms ChatGPT and retrieval-augmented Llama2-chat on Open-domain QA, reasoning and fact verification tasks, and it shows significant gains in improving factuality and citation accuracy for long-form generations relative to these models.
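A toy sketch of the control flow the abstract describes: the model first decides whether to retrieve at all, then critiques retrieved passages for relevance before answering. The reflection-token names follow the paper's spirit, but the familiarity set and overlap heuristics below are invented stand-ins for the trained model's learned judgments.

```python
# Stand-in for knowledge the model already holds parametrically.
FAMILIAR_TOPICS = {"capital of france"}

def self_rag(query, corpus):
    """Adaptive retrieval: skip retrieval for familiar queries,
    otherwise retrieve and critique each passage for relevance."""
    if query.lower() in FAMILIAR_TOPICS:
        return ("[Retrieve=No]", "answered from parametric memory")
    # [Retrieve=Yes]: fetch passages, then critique each one
    terms = set(query.lower().split())
    relevant = [
        p for p in corpus
        if len(terms & set(p.lower().split())) >= 2  # ISREL critique stand-in
    ]
    return ("[Retrieve=Yes]", relevant)

trace, result = self_rag(
    "RAPTOR tree retrieval benchmark",
    ["RAPTOR builds a tree for retrieval", "unrelated text"],
)
```

In the real system these decisions are emitted as special tokens during decoding, which is what makes the model steerable at inference time.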
Hierarchical cluster and indexing RAG
An emerging technique to better represent your data for RAG/LLM applications is to not only chunk the data, but also hierarchically cluster and index it.
Read: Salmon Run: Hierarchical (and other) Indexes using LlamaIndex for RAG Content Enrichment
RAG from Scratch
RAG From Scratch: Indexing w/ RAPTOR
Sarthi, Parth, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, and Christopher D. Manning. “RAPTOR: Recursive abstractive processing for tree-organized retrieval,” 2024. http://arxiv.org/abs/2401.18059.
RAPTOR takes the novel approach of recursively embedding, clustering, and summarizing chunks of text, constructing a tree with differing levels of summarization from the bottom up. At inference time, our RAPTOR model retrieves from this tree, integrating information across lengthy documents at different levels of abstraction. Controlled experiments show that retrieval with recursive summaries offers significant improvements over traditional retrieval-augmented LMs on several tasks. On question-answering tasks that involve complex, multi-step reasoning, we show state-of-the-art results; for example, by coupling RAPTOR retrieval with the use of GPT-4, we can improve the best performance on the QuALITY benchmark by 20% in absolute accuracy.
Deepdive – Building long context RAG with RAPTOR from scratch - YouTube; langchain/cookbook/RAPTOR.ipynb @ langchain-ai/langchain
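The bottom-up construction from the abstract can be sketched in a few lines: cluster chunks, summarize each cluster into a parent node, repeat until one root remains, then retrieve over all levels at once (the paper's "collapsed tree" mode). Clustering and summarization here are toy stand-ins (pair adjacent chunks; concatenate-and-truncate) for the paper's GMM clustering and LLM summaries.

```python
def summarize(texts):
    # Stand-in for an LLM-written cluster summary.
    return " / ".join(t[:20] for t in texts)

def build_tree(chunks):
    """Recursively cluster and summarize until a single root remains."""
    levels = [chunks]
    while len(levels[-1]) > 1:
        layer = levels[-1]
        parents = [summarize(layer[i:i + 2]) for i in range(0, len(layer), 2)]
        levels.append(parents)
    return levels  # levels[0] = leaf chunks, levels[-1] = root summary

def collapsed_retrieve(levels, query, k=2):
    """Search leaves and summaries together, across abstraction levels."""
    nodes = [n for layer in levels for n in layer]
    terms = set(query.lower().split())
    return sorted(
        nodes,
        key=lambda n: len(terms & set(n.lower().split())),
        reverse=True,
    )[:k]

levels = build_tree(["alpha beta", "beta gamma", "gamma delta"])
```

The point of retrieving over the collapsed tree is that a broad question can match a high-level summary node while a detail question matches a leaf chunk, without choosing a level in advance.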
RAG-enhanced MetaGPT
Learning
Building and Evaluating Advanced RAG Applications - DeepLearning.AI
“In this course, we’ll explore:”
- Two advanced retrieval methods: Sentence-window retrieval and auto-merging retrieval that perform better compared to the baseline RAG pipeline.
- Evaluation and experiment tracking: A way to evaluate and iteratively improve your RAG pipeline’s performance.
- The RAG triad: Context Relevance, Groundedness, and Answer Relevance, which are methods to evaluate the relevance and truthfulness of your LLM’s response.
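The RAG triad scores the three edges of the query → context → answer chain. In practice each score comes from an LLM judge; this sketch substitutes a crude word-overlap score purely to show where each metric applies, so the numbers are illustrative, not meaningful.

```python
def overlap(a, b):
    """Fraction of a's words that also appear in b (toy judge)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa), 1)

def rag_triad(query, context, answer):
    return {
        "context_relevance": overlap(query, context),  # is retrieval on-topic?
        "groundedness": overlap(answer, context),      # is the answer supported?
        "answer_relevance": overlap(query, answer),    # does it address the question?
    }

scores = rag_triad(
    "height of eiffel tower",
    "the eiffel tower is 330 metres in height",
    "the eiffel tower is 330 metres",
)
```

Tracking all three separately matters because they fail independently: retrieval can be off-topic while the answer still sounds plausible, or retrieval can be fine while the answer hallucinates past it.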
Articles
- Hrishi Olickel’s articles on RAG (3-part series)
- https://newsletter.pragmaticengineer.com/p/rag
- cookbook/third_party/LlamaIndex/ollama_mistral_llamaindex.ipynb at main · mistralai/cookbook
- Considerations for Chunking for Optimal RAG Performance – Unstructured
Designing RAGs
Design choices you need to build high-performing RAG systems, across 5 main pillars (ISRSE):
- Indexing: Embedding external data into a vector representation.
- Storing: Persisting the indexed embeddings in a database.
- Retrieval: Finding relevant pieces in the stored data.
- Synthesis: Generating answers to user queries.
- Evaluation: Quantifying how good the RAG system is.
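The five ISRSE pillars map naturally onto a pipeline skeleton. Everything below is a placeholder: the bag-of-words "embedding", the in-memory list standing in for a vector database, and the answer template standing in for an LLM call. The point is only how each stage hands off to the next.

```python
def embed(text):                          # Indexing: text -> toy "vector"
    return set(text.lower().split())

store = []                                # Storing: in-memory stand-in for a vector DB

def add(doc):
    store.append((doc, embed(doc)))

def retrieve(query, k=1):                 # Retrieval: rank by embedding overlap
    qv = embed(query)
    ranked = sorted(store, key=lambda dv: len(qv & dv[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def synthesize(query, passages):          # Synthesis: an LLM call would go here
    return f"Based on {passages[0]!r}: ..."

def evaluate(answer, passages):           # Evaluation: crude groundedness check
    return any(p in answer for p in passages)

add("paris is the capital of france")
add("berlin is the capital of germany")
passages = retrieve("capital of france")
ans = synthesize("capital of france", passages)
```

Seeing the stages as separate functions makes the design choices concrete: each pillar can be swapped independently (a different embedder, store, reranker, or judge) without touching the others.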
RAGs and Long Context LLMs
RAG for Long Context LLMs aka “Is RAG Really Dead” talk by Lance Martin of LangChainAI.
RAG Queries
LlamaParse
llama_parse is an API created by LlamaIndex to parse and represent files for efficient retrieval and context augmentation with LlamaIndex frameworks. Notebook Example for an insurance document query. Product page with screenshots of how to use it.
Frameworks
- Command R+ from cohere – Command R+ from Cohere first on Azure AI
Personal data
Hands-On RAG guide for personal data with Vespa and LLamaIndex | Vespa Blog