LLMs in Data Management
Research and progress on using LLMs to manage data.
Castro Fernandez, Raul, Aaron J. Elmore, Michael J. Franklin, Sanjay Krishnan, and Chenhao Tan. “How large language models will disrupt data management,” Proceedings of the VLDB Endowment 16 (11), 2023. https://doi.org/10.14778/3611479.3611527.
We argue that the disruptive influence that LLMs will have on data management will come from two angles. (1) A number of hard database problems, namely, entity resolution, schema matching, data discovery, and query synthesis, hit a ceiling of automation because the system does not fully understand the semantics of the underlying data. Based on large training corpora of natural language, structured data, and code, LLMs have an unprecedented ability to ground database tuples, schemas, and queries in real-world concepts. We will provide examples of how LLMs may completely change our approaches to these problems. (2) LLMs blur the line between predictive models and information retrieval systems with their ability to answer questions. We will present examples showing how large databases and information retrieval systems have complementary functionality.
Shi, Liang, Zhengju Tang, and Zhi Yang. “A survey on employing large language models for text-to-SQL tasks,” 2024. https://arxiv.org/abs/2407.15186.
Writing SQL queries requires specialized knowledge, which poses a challenge for non-professional users trying to access and query databases. Text-to-SQL parsing addresses this issue by converting natural language questions into SQL queries, making databases more accessible to non-expert users. To take advantage of recent developments in large language models (LLMs), a range of new methods have emerged, focusing primarily on prompt engineering and fine-tuning. This survey provides a comprehensive overview of LLMs in text-to-SQL tasks, discussing benchmark datasets, prompt engineering, fine-tuning methods, and future research directions.
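To make the prompt-engineering side of text-to-SQL concrete, here is a minimal sketch of one common pattern: serialize the database schema as CREATE TABLE statements, append the user's question, and check that the SQL the model returns actually executes. It does not reproduce any specific method from the survey. The schema, the sample rows, and the helper names `build_prompt` and `execute_candidate` are hypothetical, and the LLM call itself is omitted; a hard-coded query stands in for the model's output.

```python
import sqlite3

# Hypothetical example schema, not taken from the survey.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
"""


def build_prompt(schema: str, question: str) -> str:
    """Zero-shot prompt: the schema as DDL followed by the natural-language question."""
    return (
        "-- SQLite database with the following tables:\n"
        f"{schema.strip()}\n"
        f"-- Question: {question}\n"
        "-- Answer with a single SQL query.\n"
        "SQL:"
    )


def execute_candidate(schema: str, rows: dict, sql: str):
    """Run a candidate SQL string against an in-memory copy of the schema.

    Returns the result rows, or None if the query fails to execute.
    Table names come from the trusted `rows` dict, not from user input.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        for table, values in rows.items():
            if values:
                placeholders = ", ".join("?" * len(values[0]))
                conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", values)
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    finally:
        conn.close()


question = "How many orders did customers in France place?"
prompt = build_prompt(SCHEMA, question)

# Placeholder for an LLM's answer to `prompt` (the model call is not shown).
candidate = (
    "SELECT COUNT(*) FROM orders o JOIN customers c ON o.customer_id = c.id "
    "WHERE c.country = 'France'"
)
sample = {
    "customers": [(1, "Ana", "France"), (2, "Ben", "Canada")],
    "orders": [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)],
}
print(execute_candidate(SCHEMA, sample, candidate))  # [(2,)]
```

Serializing the schema as DDL lets the model see table and column names in the same form it has seen in code corpora. The execution check only catches queries that fail to run; a query can execute and still answer the wrong question.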