RAG
Retrieval-augmented generation
Retrieval-augmented generation (RAG) is an AI framework that improves the quality of LLM-generated responses by retrieving facts from an external knowledge base, grounding the model on accurate, up-to-date information that supplements its internal representation. [1] Implementing RAG in an LLM-based question-answering system has two main benefits: the model gets access to the most current, reliable facts, and users get access to the model's sources, so its claims can be checked for accuracy and ultimately trusted.
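As a rough illustration of the retrieve-then-ground loop described above, here is a minimal sketch in plain Python. The keyword-overlap retriever, the tiny knowledge base, and the prompt template are toy stand-ins for a real embedding search and LLM call.

```python
def retrieve(query, knowledge_base, k=2):
    """Rank passages by naive keyword overlap with the query (stand-in
    for a real vector search over an external knowledge base)."""
    terms = set(query.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda p: len(terms & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, passages):
    """Ground the model: number the sources so claims can be checked."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

kb = [
    "The Eiffel Tower is 330 metres tall.",
    "RAG grounds LLM answers in retrieved documents.",
]
passages = retrieve("How tall is the Eiffel Tower?", kb)
prompt = build_prompt("How tall is the Eiffel Tower?", passages)
```

The prompt would then go to the LLM; because the sources are numbered inline, the generated answer can cite them, which is the "insight into the generative process" part of the definition.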
See the Survey paper - Gao, Yunfan, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, and Haofen Wang. “Retrieval-augmented generation for large language models: A survey,” 2024. http://arxiv.org/abs/2312.10997.
Algorithms
BM42: The combination of semantic and keyword search – BM42: New Baseline for Hybrid Search - Qdrant
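The Qdrant post above combines keyword (sparse) and semantic (dense) signals. One common way to merge the two ranked result lists is reciprocal rank fusion (RRF); this is a generic sketch of that fusion idea, not Qdrant's BM42 implementation, and the doc ids are made up.

```python
def rrf(rankings, k=60):
    """Fuse ranked lists of doc ids; a doc scores higher the earlier
    it appears in each list, so agreement between lists wins."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc4"]   # e.g. from sparse/BM25-style scores
semantic_hits = ["doc1", "doc2", "doc3"]  # e.g. from embedding similarity
fused = rrf([keyword_hits, semantic_hits])
```

doc1 wins here because it ranks well in both lists, even though neither list puts it first alone; that robustness to either signal being noisy is the usual argument for hybrid search.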
Developing RAG Applications
- Mastering RAG Pattern Chatbots: Azure OpenAI and LangChain.js Integration | Azure Devs JS Day 2024
- Indexify - by Tensorlake is an open source data framework featuring a real-time extraction engine and pre-built extraction adapters. “Build fast AI applications with reliability and precision, driving smarter decisions”
- Jina AI Reader “provide a number of different AI-related platform products, including an excellent family of embedding models, but one of their most instantly useful is Jina Reader, an API for turning any URL into Markdown content suitable for piping into an LLM.”
Self RAG
Self-RAG in a nutshell: for familiar topics, answer directly; for unfamiliar ones, open the reference book, quickly find the relevant parts, sort and summarize them in your mind, then write the answer on the exam paper.
via https://selfrag.github.io
Asai, Akari, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. “Self-RAG: Learning to retrieve, generate, and critique through self-reflection,” 2023. http://arxiv.org/abs/2310.11511.
We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (SELF-RAG) that enhances an LM’s quality and factuality through retrieval and self-reflection. Our framework trains a single arbitrary LM that adaptively retrieves passages on-demand, and generates and reflects on retrieved passages and its own generations using special tokens, called reflection tokens. Generating reflection tokens makes the LM controllable during the inference phase, enabling it to tailor its behavior to diverse task requirements. Experiments show that SELF-RAG (7B and 13B parameters) significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks. Specifically, SELF-RAG outperforms ChatGPT and retrieval-augmented Llama2-chat on Open-domain QA, reasoning and fact verification tasks, and it shows significant gains in improving factuality and citation accuracy for long-form generations relative to these models.
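A toy sketch of the control flow the abstract describes: the model first decides whether to retrieve at all, then critiques retrieved passages for relevance before answering. The reflection-token names follow the paper's spirit, but the familiarity set and overlap heuristics below are invented stand-ins for the trained model's learned judgments.

```python
# Stand-in for knowledge the model already holds parametrically.
FAMILIAR_TOPICS = {"capital of france"}

def self_rag(query, corpus):
    """Adaptive retrieval: skip retrieval for familiar queries,
    otherwise retrieve and critique each passage for relevance."""
    if query.lower() in FAMILIAR_TOPICS:
        return ("[Retrieve=No]", "answered from parametric memory")
    # [Retrieve=Yes]: fetch passages, then critique each one
    terms = set(query.lower().split())
    relevant = [
        p for p in corpus
        if len(terms & set(p.lower().split())) >= 2  # ISREL critique stand-in
    ]
    return ("[Retrieve=Yes]", relevant)

trace, result = self_rag(
    "RAPTOR tree retrieval benchmark",
    ["RAPTOR builds a tree for retrieval", "unrelated text"],
)
```

In the real system these decisions are emitted as special tokens during decoding, which is what makes the model steerable at inference time.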
Hierarchical cluster and indexing RAG
An emerging technique to better represent your data for RAG/LLM applications is to not only chunk the data, but also hierarchically cluster and index it.
Read: Salmon Run: Hierarchical (and other) Indexes using LlamaIndex for RAG Content Enrichment
RAG from Scratch
RAG From Scratch: Indexing w/ RAPTOR
Sarthi, Parth, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, and Christopher D. Manning. “RAPTOR: Recursive abstractive processing for tree-organized retrieval,” 2024. http://arxiv.org/abs/2401.18059.
RAPTOR takes the novel approach of recursively embedding, clustering, and summarizing chunks of text, constructing a tree with differing levels of summarization from the bottom up. At inference time, our RAPTOR model retrieves from this tree, integrating information across lengthy documents at different levels of abstraction. Controlled experiments show that retrieval with recursive summaries offers significant improvements over traditional retrieval-augmented LMs on several tasks. On question-answering tasks that involve complex, multi-step reasoning, we show state-of-the-art results; for example, by coupling RAPTOR retrieval with the use of GPT-4, we can improve the best performance on the QuALITY benchmark by 20% in absolute accuracy.
Deepdive – Building long context RAG with RAPTOR from scratch - YouTube; langchain/cookbook/RAPTOR.ipynb @ langchain-ai/langchain
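The bottom-up construction from the abstract can be sketched in a few lines: cluster chunks, summarize each cluster into a parent node, repeat until one root remains, then retrieve over all levels at once (the paper's "collapsed tree" mode). Clustering and summarization here are toy stand-ins (pair adjacent chunks; concatenate-and-truncate) for the paper's GMM clustering and LLM summaries.

```python
def summarize(texts):
    # Stand-in for an LLM-written cluster summary.
    return " / ".join(t[:20] for t in texts)

def build_tree(chunks):
    """Recursively cluster and summarize until a single root remains."""
    levels = [chunks]
    while len(levels[-1]) > 1:
        layer = levels[-1]
        parents = [summarize(layer[i:i + 2]) for i in range(0, len(layer), 2)]
        levels.append(parents)
    return levels  # levels[0] = leaf chunks, levels[-1] = root summary

def collapsed_retrieve(levels, query, k=2):
    """Search leaves and summaries together, across abstraction levels."""
    nodes = [n for layer in levels for n in layer]
    terms = set(query.lower().split())
    return sorted(
        nodes,
        key=lambda n: len(terms & set(n.lower().split())),
        reverse=True,
    )[:k]

levels = build_tree(["alpha beta", "beta gamma", "gamma delta"])
```

The point of retrieving over the collapsed tree is that a broad question can match a high-level summary node while a detail question matches a leaf chunk, without choosing a level in advance.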
RAG-enhanced MetaGPT
Learning
Building and Evaluating Advanced RAG Applications - DeepLearning.AI
“In this course, we’ll explore:”
- Two advanced retrieval methods: Sentence-window retrieval and auto-merging retrieval that perform better compared to the baseline RAG pipeline.
- Evaluation and experiment tracking: A way to evaluate and iteratively improve your RAG pipeline’s performance.
- The RAG triad: Context Relevance, Groundedness, and Answer Relevance, which are methods to evaluate the relevance and truthfulness of your LLM’s response.
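The RAG triad scores the three edges of the query → context → answer chain. In practice each score comes from an LLM judge; this sketch substitutes a crude word-overlap score purely to show where each metric applies, so the numbers are illustrative, not meaningful.

```python
def overlap(a, b):
    """Fraction of a's words that also appear in b (toy judge)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa), 1)

def rag_triad(query, context, answer):
    return {
        "context_relevance": overlap(query, context),  # is retrieval on-topic?
        "groundedness": overlap(answer, context),      # is the answer supported?
        "answer_relevance": overlap(query, answer),    # does it address the question?
    }

scores = rag_triad(
    "height of eiffel tower",
    "the eiffel tower is 330 metres in height",
    "the eiffel tower is 330 metres",
)
```

Tracking all three separately matters because they fail independently: retrieval can be off-topic while the answer still sounds plausible, or retrieval can be fine while the answer hallucinates past it.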
Articles
- Hrishi Olickel’s articles on RAG (3-part series)
- https://newsletter.pragmaticengineer.com/p/rag
- cookbook/third_party/LlamaIndex/ollama_mistral_llamaindex.ipynb at main · mistralai/cookbook
- Considerations for Chunking for Optimal RAG Performance – Unstructured
Designing RAGs
Design choices you need to build high-performing RAG systems, across 5 main pillars (ISRSE):
- Indexing: Embedding external data into a vector representation.
- Storing: Persisting the indexed embeddings in a database.
- Retrieval: Finding relevant pieces in the stored data.
- Synthesis: Generating answers to user queries.
- Evaluation: Quantifying how good the RAG system is.
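The five ISRSE pillars map naturally onto a pipeline skeleton. Everything below is a placeholder: the bag-of-words "embedding", the in-memory list standing in for a vector database, and the answer template standing in for an LLM call. The point is only how each stage hands off to the next.

```python
def embed(text):                          # Indexing: text -> toy "vector"
    return set(text.lower().split())

store = []                                # Storing: in-memory stand-in for a vector DB

def add(doc):
    store.append((doc, embed(doc)))

def retrieve(query, k=1):                 # Retrieval: rank by embedding overlap
    qv = embed(query)
    ranked = sorted(store, key=lambda dv: len(qv & dv[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def synthesize(query, passages):          # Synthesis: an LLM call would go here
    return f"Based on {passages[0]!r}: ..."

def evaluate(answer, passages):           # Evaluation: crude groundedness check
    return any(p in answer for p in passages)

add("paris is the capital of france")
add("berlin is the capital of germany")
passages = retrieve("capital of france")
ans = synthesize("capital of france", passages)
```

Seeing the stages as separate functions makes the design choices concrete: each pillar can be swapped independently (a different embedder, store, reranker, or judge) without touching the others.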
RAGs and Long Context LLMs
RAG for Long Context LLMs aka “Is RAG Really Dead” talk by Lance Martin of LangChainAI.
RAG Queries
LlamaParse
llama_parse is an API created by LlamaIndex to parse and represent files for efficient retrieval and context augmentation with LlamaIndex frameworks. Notebook Example for an insurance document query. Product page with screenshots of how to use it.
Frameworks
- Command R+ from cohere – Command R+ from Cohere first on Azure AI
Personal data
Hands-On RAG guide for personal data with Vespa and LLamaIndex | Vespa Blog