Vector Databases
A vector database indexes and stores vector embeddings for fast retrieval and similarity search, with capabilities like CRUD operations, metadata filtering, and horizontal scaling.
See: “What is a Vector Database?” from the pinecone.io team.
Common use cases for vector search:
- Semantic Search
- Similarity search for images, audio, video, JSON, and other forms of unstructured data
- Ranking and Recommendation Engines
- Deduplication and record matching
- Anomaly detection (e.g., IT threat detection)
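Most of these use cases reduce to the same operation: rank stored vectors by similarity to a query vector. Below is a minimal sketch of that idea, assuming cosine similarity and a brute-force scan. The names `cosine_similarity`, `top_k`, and `TOY_VECTORS` are hypothetical and the embeddings are made up for illustration; real vector databases generally use approximate nearest-neighbor indexes rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Brute-force search: score every stored vector, keep the k best."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(key, score) for score, key in scored[:k]]


# Made-up 3-dimensional "embeddings" (illustrative assumption, not real model output).
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

results = top_k([1.0, 0.0, 0.0], TOY_VECTORS, k=2)
for key, score in results:
    print(f"{key}: {score:.3f}")
```

The scan is linear in the number of stored vectors, which is fine for a toy example but is the cost that dedicated vector indexes are designed to avoid.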
Capabilities of a vector database:
- Vector Indexes for Search and Retrieval
- Single-Stage Filtering
- Data Sharding
- Replication
- Hybrid Storage
- API
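To make the filtering capability concrete, here is a rough sketch of combining a metadata filter with similarity ranking in a single pass, so that excluded records never take up a slot in the top-k results. This is only a toy illustration under stated assumptions: `filtered_search`, `RECORDS`, and the `lang` metadata field are hypothetical names, scoring is a plain dot product, and the scan is brute force. It is not how any particular database implements single-stage filtering.

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def filtered_search(query, records, predicate, k=2):
    """Score only the records whose metadata passes the predicate,
    then return the ids of the k highest-scoring ones."""
    candidates = (
        (dot(query, rec["vector"]), rec["id"])
        for rec in records
        if predicate(rec["metadata"])
    )
    best = heapq.nlargest(k, candidates)
    return [rec_id for _, rec_id in best]


# Hypothetical records: id, toy 2-dimensional vector, and metadata.
RECORDS = [
    {"id": "a", "vector": [1.0, 0.0], "metadata": {"lang": "en"}},
    {"id": "b", "vector": [0.9, 0.1], "metadata": {"lang": "de"}},
    {"id": "c", "vector": [0.5, 0.5], "metadata": {"lang": "en"}},
    {"id": "d", "vector": [0.0, 1.0], "metadata": {"lang": "en"}},
]

hits = filtered_search([1.0, 0.0], RECORDS, lambda meta: meta["lang"] == "en", k=2)
print(hits)
```

Filtering after an unfiltered top-k search could instead return fewer than k results (record "b" would occupy a slot and then be discarded), which is one reason filtering and search are often combined.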
Vector databases
- Pinecone
- Elasticsearch
- turbopuffer is a serverless vector database.
- sdan/vlite: fast vector database made in numpy – “there is no database you need to set up, no server to run, and no complex configuration. just install vlite and start using it. take the CTX file with you wherever you go. its like a browser cookie but with embeddings.”
Articles
- Let’s check back in on the vector databases; Apr 2024.
- Building vector search in 200 lines of Rust
- Posts · The Data Quarry – a few articles on vector databases by Prashanth Rao at The Data Quarry
- VectorDB – see explanation of “Why use vector search and embeddings with large language models?” with example. Uses FAISS and mrpt under the hood.
- Build a search engine, not a vector DB; Dec 2023.
- I’m writing a new vector search SQLite Extension; May 2024.