Vector Databases — RAG Engineering: Building AI That Knows Your Data

What Vector Databases Do

A vector database is a specialized data store optimized for storing, indexing, and querying high-dimensional vectors. In a RAG system, it holds the embeddings of your document chunks and allows you to quickly find the chunks most similar to a query embedding.

Regular databases are designed for exact matches: give me the row where id = 42 or where name = 'Alice'. Vector databases solve a fundamentally different problem: give me the 10 vectors closest to this query vector in a space with hundreds or thousands of dimensions. This is nearest-neighbor search. Doing it exactly means comparing the query against every stored vector, so at scale vector databases use Approximate Nearest Neighbor (ANN) search, which relies on specialized indexing algorithms to return nearly the same results much faster.

A typical vector database stores three things per entry:

The vector -- the embedding itself (e.g., 1536 floats)
The metadata -- structured fields like source, date, author, or category
The document text -- the original chunk of text that was embedded

When you query, you send a vector and get back the top-K most similar entries, each with its metadata and text.
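To make the data model concrete, here is a minimal sketch of a store that keeps the three fields per entry and answers a top-K query by brute force. The names `Entry` and `ToyVectorStore` are hypothetical and exist only for illustration; the scan is exact, which is the part a real vector database replaces with an ANN index.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Entry:
    """One stored item: the embedding plus what you need to use a hit."""
    vector: np.ndarray
    text: str
    metadata: dict = field(default_factory=dict)


class ToyVectorStore:
    """Hypothetical exact store; it compares the query to every entry."""

    def __init__(self) -> None:
        self.entries: list[Entry] = []

    def add(self, vector, text, metadata=None) -> None:
        self.entries.append(
            Entry(np.asarray(vector, dtype="float32"), text, metadata or {})
        )

    def query(self, vector, top_k=3):
        q = np.asarray(vector, dtype="float32")
        matrix = np.stack([e.vector for e in self.entries])
        # Cosine similarity between the query and every stored vector.
        sims = matrix @ q / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q))
        best = np.argsort(-sims)[:top_k]
        return [(self.entries[i], float(sims[i])) for i in best]


store = ToyVectorStore()
store.add([1.0, 0.0, 0.0], "Plants convert light into sugar.", {"source": "biology"})
store.add([0.9, 0.1, 0.0], "Chlorophyll absorbs sunlight.", {"source": "biology"})
store.add([0.0, 0.0, 1.0], "Gradient descent minimizes a loss.", {"source": "cs"})

hits = store.query([1.0, 0.05, 0.0], top_k=2)
for entry, score in hits:
    print(f"{score:.3f}  {entry.metadata['source']}  {entry.text}")
```

Each hit carries its text and metadata, so the caller can build a prompt or apply further filtering without a second lookup.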

The Landscape: Choosing the Right Database

FAISS (Facebook AI Similarity Search)

FAISS is a library, not a database. Developed by Meta, it provides highly optimized similarity search algorithms that run in memory, on CPU or GPU. It is extremely fast -- often the fastest option for datasets under 10 million vectors.

Best for: Local development, prototyping, applications where all data fits in memory, batch processing pipelines.

Limitations: No built-in persistence (you save/load index files manually), no metadata filtering, no built-in API server, no authentication. You have to build everything around it.

import faiss
import numpy as np

# Create an index for 1536-dimensional vectors
dimension = 1536
index = faiss.IndexFlatL2(dimension)  # Exact search with L2 distance

# Add vectors
vectors = np.random.rand(1000, dimension).astype('float32')
index.add(vectors)

# Search for 5 nearest neighbors
query = np.random.rand(1, dimension).astype('float32')
distances, indices = index.search(query, k=5)

print(f"Nearest indices: {indices[0]}")
print(f"Distances: {distances[0]}")

# Save and load the index
faiss.write_index(index, "my_index.faiss")
loaded_index = faiss.read_index("my_index.faiss")
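Because FAISS returns only row positions and distances, metadata filtering has to live in your own code. One simple approach, sketched below, is post-filtering: ask the index for more results than you need, then keep the hits whose metadata passes a predicate. The helper `filtered_search` and the `BruteForceIndex` stand-in are hypothetical. The stand-in only mimics the `search(query, k)` call and `ntotal` attribute of a FAISS index so the sketch runs without FAISS installed.

```python
import numpy as np


class BruteForceIndex:
    """Stand-in that mimics a FAISS index's search(query, k) and ntotal (demo only)."""

    def __init__(self, vectors):
        self.vectors = np.asarray(vectors, dtype="float32")
        self.ntotal = len(self.vectors)

    def search(self, query, k):
        # Squared L2 distance from each query row to every stored vector.
        diffs = self.vectors[None, :, :] - query[:, None, :]
        dists = (diffs ** 2).sum(axis=2)
        order = np.argsort(dists, axis=1)[:, :k]
        return np.take_along_axis(dists, order, axis=1), order


def filtered_search(index, metadata, query, k, predicate, overfetch=4):
    """Over-fetch from the index, then keep only hits whose metadata passes."""
    fetch = min(index.ntotal, k * overfetch)
    distances, ids = index.search(query, fetch)
    hits = []
    for dist, idx in zip(distances[0], ids[0]):
        if idx < 0:  # FAISS pads missing results with -1
            continue
        if predicate(metadata[int(idx)]):
            hits.append((int(idx), float(dist)))
        if len(hits) == k:
            break
    return hits


vectors = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [5.0, 5.0]], dtype="float32")
# metadata[i] describes vectors[i]; keeping the two in sync is your job.
metadata = [{"source": "a"}, {"source": "b"}, {"source": "a"}, {"source": "b"}]

index = BruteForceIndex(vectors)
query = np.array([[0.0, 0.0]], dtype="float32")
hits = filtered_search(index, metadata, query, k=2,
                       predicate=lambda m: m["source"] == "a")
print(hits)
```

With a real FAISS index you would pass it in place of `BruteForceIndex`. Post-filtering can return fewer than k hits when the filter is selective, in which case a larger `overfetch` helps at the cost of more work per query.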

ChromaDB

ChromaDB is an open-source embedding database designed specifically for AI applications. It has a clean Python API, handles persistence automatically, supports metadata filtering, and includes a built-in embedding function so you can pass raw text instead of pre-computed vectors.

Best for: Prototyping, small to medium datasets (up to a few million documents), projects where developer experience matters, learning RAG concepts.

Limitations: Not designed for massive scale (billions of vectors), limited query language compared to full databases.

import chromadb

# Create a persistent client
client = chromadb.PersistentClient(path="./chroma_db")

# Create a collection (like a table)
collection = client.get_or_create_collection(
    name="documents",
    metadata={"hnsw:space": "cosine"}  # Use cosine distance in the HNSW index
)

# Add documents with metadata
collection.add(
    documents=[
        "Photosynthesis converts light energy to chemical energy.",
        "The mitochondria is the powerhouse of the cell.",
        "Neural networks are inspired by biological neurons.",
    ],
    metadatas=[
        {"source": "biology_textbook", "chapter": 5},
        {"source": "biology_textbook", "chapter": 3},
        {"source": "cs_textbook", "chapter": 12},
    ],
    ids=["doc1", "doc2", "doc3"]
)

# Query with natural language
results = collection.query(
    query_texts=["How do plants create energy?"],
    n_results=2,
    where={"source": "biology_textbook"}  # Metadata filter
)

print(results["documents"])
print(results["distances"])

Pinecone

Pinecone is a fully managed vector database service. You do not run any infrastructure -- you create an index through their API and Pinecone handles storage, indexing, scaling, and availability.

Best for: Production deployments, teams that do not want to manage infrastructure, applications requiring high availability and scaling.

Limitations: Vendor lock-in, recurring costs, data leaves your infrastructure, latency depends on network (though they offer regional deployments).

pgvector

pgvector is a PostgreSQL extension that adds vector similarity search to your existing PostgreSQL database. If you already use PostgreSQL, this is a compelling option because you keep your vectors alongside your relational data.

Best for: Teams already using PostgreSQL, applications that need SQL joins between vector results and relational data, simpler infrastructure requirements.

Limitations: Performance is below dedicated vector databases at very high scale. Index types are limited to IVFFlat and HNSW, and pgvector performs exact search until you create one of them.
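The appeal of keeping vectors next to relational data is easiest to see in a query. The sketch below holds the SQL in Python strings and assumes a hypothetical schema (`documents` and `chunks` tables with the columns shown) and a driver that accepts psycopg-style `%(name)s` parameters. The `vector` type, the `<=>` cosine-distance operator, and the `hnsw` index with `vector_cosine_ops` are pgvector features; the HNSW index type needs pgvector 0.5.0 or later.

```python
def to_vector_literal(embedding):
    """Format a Python sequence as pgvector's text input, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


# Hypothetical schema: table and column names are illustrative only.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
    id bigserial PRIMARY KEY,
    title text,
    published_at date
);
CREATE TABLE chunks (
    id bigserial PRIMARY KEY,
    document_id bigint REFERENCES documents(id),
    content text,
    embedding vector(1536)
);
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# Nearest chunks by cosine distance, joined to relational data in one query.
SEARCH_SQL = """
SELECT c.content, d.title, c.embedding <=> %(query)s::vector AS distance
FROM chunks AS c
JOIN documents AS d ON d.id = c.document_id
WHERE d.published_at >= %(since)s
ORDER BY c.embedding <=> %(query)s::vector
LIMIT %(k)s;
"""

# A real query vector would have 1536 dimensions; three are shown for brevity.
params = {"query": to_vector_literal([0.1, 0.2, 0.3]), "since": "2024-01-01", "k": 5}
print(params["query"])
```

You would pass `SEARCH_SQL` and `params` to your PostgreSQL driver's execute call. The join and the date filter are ordinary SQL, which is the part a standalone vector store cannot offer.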

Weaviate

Weaviate is an open-source vector database with a GraphQL API, built-in modules for embedding generation, and hybrid search (combining vector and keyword search in a single query).

Best for: Teams that want hybrid search out of the box, GraphQL-oriented architectures, applications needing built-in multimodal support.
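Hybrid search has to merge two ranked lists, one from vector search and one from keyword search, into a single ranking. A common, generic way to do this is reciprocal rank fusion (RRF), sketched below. This illustrates the idea only: it is not Weaviate's API, and Weaviate's own fusion methods may differ in detail. The constant `k=60` is a conventional default, and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists; each list contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_hits = ["doc2", "doc1", "doc4"]   # ranked by embedding similarity
keyword_hits = ["doc1", "doc3", "doc2"]  # ranked by keyword relevance

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Documents that appear near the top of both lists rise in the fused ranking, while a document found by only one method still gets a chance to surface.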

Qdrant

Qdrant is an open-source vector database written in Rust, emphasizing performance and type-safe APIs. It has excellent filtering capabilities and supports both in-memory and on-disk storage.

Best for: Performance-critical applications, teams that need advanced filtering, Rust or gRPC ecosystems.

Comparison Summary

| Feature | FAISS | ChromaDB | Pinecone | pgvector | Qdrant |
|---------|-------|----------|----------|----------|--------|
| Type | Library | Database | Managed | Extension | Database |
| Persistence | Manual | Built-in | Managed | PostgreSQL | Built-in |
| Metadata filtering | No | Yes | Yes | SQL | Yes |
| Managed hosting | No | No | Yes | Via providers | Cloud option |
| Hybrid search | No | No | Yes | With tsvector | Yes |
| Best scale | Millions | Millions | Billions | Millions | Billions |
| Learning curve | Medium | Low | Low | Low (if you know SQL) | Medium |

Indexing Strategies

Exact nearest-neighbor search compares the query to every stored vector, so its cost grows linearly with the size of the collection. Once your dataset grows well beyond tens of thousands of vectors, that full scan becomes too slow for interactive queries. Vector databases use approximate algorithms to trade a small amount of accuracy for massive speed improvements.
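The accuracy given up by an approximate index is usually reported as recall@k: the fraction of the true k nearest neighbors that the approximate search returned. A minimal sketch, with made-up result IDs for illustration:

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact & set(approx_ids)) / len(exact)


exact_top5 = [12, 7, 33, 4, 91]    # from a brute-force scan (illustrative IDs)
approx_top5 = [12, 33, 7, 58, 91]  # from an approximate index (illustrative IDs)

recall = recall_at_k(approx_top5, exact_top5)
print(f"recall@5 = {recall:.2f}")
```

Order within the top k does not matter for this metric, only membership. To measure it on your own data, run a sample of queries through both an exact scan and the index, then average the result.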

HNSW (Hierarchical Navigable Small World)

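HNSW is the most popular indexing algorithm. It is the default in ChromaDB and Qdrant, and one of the two index types pgvector offers (the other is IVFFlat). It builds a multi-layered graph where each node is a vector. Searching starts at the sparse top layer, which allows long jumps across the space, and navigates down to denser layers that refine the result.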

Search speed: Excellent (can be sub-millisecond on millions of vectors, depending on dimensionality, hardware, and ef_search)
Index build time: Moderate
Memory: High (the graph structure lives in memory)
Accuracy: Very high (typically 95-99% recall)

Key parameters:

M -- number of connections per node (higher = better recall, more memory)
ef_construction -- beam width during index building (higher = better recall, slower build)
ef_search -- beam width during search (higher = better recall, slower search)
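To see what M and ef_search control, it helps to look at the search on a single graph layer. The sketch below is a simplification and not HNSW itself: it builds one neighbor graph by brute force (real HNSW inserts vectors incrementally and adds sparser upper layers to pick a good entry point), then runs a best-first search that keeps the `ef` closest nodes seen so far. The function names are hypothetical; `m` plays the role of M and `ef` the role of ef_search.

```python
import heapq

import numpy as np


def build_knn_graph(vectors, m):
    """Link every vector to its m nearest neighbors (brute force, illustration only)."""
    dists = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    graph = {i: set() for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in np.argsort(dists[i])[:m]:
            graph[i].add(int(j))
            graph[int(j)].add(i)  # keep edges bidirectional
    return graph


def search_layer(vectors, graph, query, entry, ef, k):
    """Best-first graph search that tracks the ef closest nodes found so far."""
    def dist(i):
        return float(np.linalg.norm(vectors[i] - query))

    visited = {entry}
    candidates = [(dist(entry), entry)]  # min-heap: closest unexplored node first
    best = [(-dist(entry), entry)]       # max-heap via negation: current ef best
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break  # no remaining candidate can improve the result set
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb)
            if len(best) < ef or d_nb < -best[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(best, (-d_nb, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the farthest of the kept nodes
    ranked = sorted((-neg_d, n) for neg_d, n in best)
    return [n for _, n in ranked][:k]


# A 10 x 10 grid of 2-D points; node id = 10 * x + y.
vectors = np.array([[x, y] for x in range(10) for y in range(10)], dtype="float32")
graph = build_knn_graph(vectors, m=4)
query = np.array([7.2, 3.1])

found = search_layer(vectors, graph, query, entry=0, ef=10, k=3)
print("approximate top-3 node ids:", found)
```

A larger `ef` makes the search explore more of the graph before stopping, which raises recall and latency together. A larger `m` gives each node more edges, which improves reachability at the cost of memory.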

IVF (Inverted File Index)

IVF partitions the vector space into clusters using k-means and stores each vector in the list for its nearest centroid. At query time, it searches only the nprobe clusters whose centroids are closest to the query vector. IVF indexes are a common choice in FAISS for large-scale search.

Search speed: Good (depends on nprobe parameter)
Index build time: Fast
Memory: Lower than HNSW
Accuracy: Good (depends on number of clusters and nprobe)

import faiss
import numpy as np

dimension = 1536
n_vectors = 100000

# Create IVF index with 100 clusters
quantizer = faiss.IndexFlatL2(dimension)
index = faiss.IndexIVFFlat(quantizer, dimension, 100)

# Train the index (IVF requires training on sample data)
training_data = np.random.rand(n_vectors, dimension).astype('float32')
index.train(training_data)
index.add(training_data)

# Search -- nprobe controls accuracy vs speed tradeoff
index.nprobe = 10  # Search 10 nearest clusters
query = np.random.rand(1, dimension).astype('float32')
distances, indices = index.search(query, k=5)
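If you want to see what the IVF index is doing internally, the idea fits in a few lines of NumPy. This is a sketch of the technique under simplifying assumptions (a basic k-means with a fixed number of iterations, no compression), not a reproduction of FAISS's implementation. All function names are hypothetical.

```python
import numpy as np


def nearest(vectors, centroids):
    """Index of the closest centroid for each vector."""
    d = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(vectors, n_clusters, n_iters=10, seed=0):
    """A few rounds of Lloyd's algorithm; returns centroids and assignments."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), n_clusters, replace=False)].copy()
    for _ in range(n_iters):
        assign = nearest(vectors, centroids)
        for c in range(n_clusters):
            members = vectors[assign == c]
            if len(members):  # leave an empty cluster's centroid where it is
                centroids[c] = members.mean(axis=0)
    return centroids, nearest(vectors, centroids)


def ivf_search(vectors, centroids, assign, query, k, nprobe):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    centroid_d = ((centroids - query) ** 2).sum(axis=1)
    probe = np.argsort(centroid_d)[:nprobe]
    candidates = np.flatnonzero(np.isin(assign, probe))
    d = ((vectors[candidates] - query) ** 2).sum(axis=1)
    return candidates[np.argsort(d)[:k]]


rng = np.random.default_rng(42)
vectors = rng.random((2000, 16)).astype("float32")
query = rng.random(16).astype("float32")
centroids, assign = kmeans(vectors, n_clusters=20)

exact = np.argsort(((vectors - query) ** 2).sum(axis=1))[:5]
approx = ivf_search(vectors, centroids, assign, query, k=5, nprobe=3)
full = ivf_search(vectors, centroids, assign, query, k=5, nprobe=20)
print("overlap with exact results:", len(set(exact) & set(approx)), "of 5")
```

With `nprobe` equal to the number of clusters, every vector is scanned and the result matches exact search. Lowering `nprobe` skips clusters, which is where both the speedup and the occasional missed neighbor come from.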

When to Use Which Database

Just starting out or learning: ChromaDB. The API is intuitive, persistence is automatic, and you can go from zero to searching in five minutes.

Need maximum local performance: FAISS. Few options match it for raw speed on a single machine. Pair it with a simple metadata store if you need filtering.

Going to production with a team: Pinecone if you want zero infrastructure management. Qdrant if you want to self-host with excellent performance. Weaviate if you need hybrid search and GraphQL.

Already using PostgreSQL: pgvector. Keeping everything in one database simplifies your architecture enormously, and the performance is sufficient for most workloads up to a few million vectors.

Tip: Start with ChromaDB for development, then migrate to your production database once your requirements are clear. The retrieval code changes are minimal -- usually just swapping the client initialization and adjusting query syntax.
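One way to keep that migration small is to hide the database behind a narrow interface from the start. The sketch below is an assumption about how you might structure this, not a prescribed pattern: `Retriever`, `InMemoryRetriever`, and `build_prompt` are hypothetical names, and the in-memory backend uses naive word overlap purely so the example runs without a database.

```python
from typing import Protocol


class Retriever(Protocol):
    """The only surface the rest of the application depends on."""

    def retrieve(self, query: str, top_k: int) -> list[str]: ...


class InMemoryRetriever:
    """Placeholder backend; a real one would wrap a vector database client."""

    def __init__(self, chunks: list[str]) -> None:
        self.chunks = chunks

    def retrieve(self, query: str, top_k: int) -> list[str]:
        words = set(query.lower().split())
        # Rank chunks by how many query words they share (stand-in for similarity).
        ranked = sorted(
            self.chunks,
            key=lambda chunk: -len(words & set(chunk.lower().split())),
        )
        return ranked[:top_k]


def build_prompt(retriever: Retriever, question: str) -> str:
    """Application code that never names a specific database."""
    context = "\n".join(retriever.retrieve(question, top_k=2))
    return f"Context:\n{context}\n\nQuestion: {question}"


chunks = [
    "plants make energy from light",
    "neural networks learn weights",
    "cells contain mitochondria",
]
prompt = build_prompt(InMemoryRetriever(chunks), "how do plants make energy")
print(prompt)
```

Switching databases then means writing one new class with a `retrieve` method, while `build_prompt` and everything downstream stay unchanged.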

In the next lesson, you will learn how to get your documents into these databases by processing and preparing them for embedding.