Episode 4 — Generative AI Engineering / 4.12 — Integrating Vector Databases
4.12 — Integrating Vector Databases: Quick Revision
Compact cheat sheet. Print-friendly.
How to use this material (instructions)
- Skim before labs or interviews.
- Drill gaps — reopen README.md->4.12.a...4.12.c.
- Practice — 4.12-Exercise-Questions.md.
- Polish answers — 4.12-Interview-Questions.md.
Core vocabulary
| Term | One-liner |
|---|---|
| Vector database | Database designed to store, index, and search high-dimensional vectors (embeddings) |
| Embedding | Array of floats representing semantic meaning (e.g., 1536 dimensions) |
| ANN | Approximate Nearest Neighbor — finds "close enough" vectors in sub-linear time |
| HNSW | Hierarchical Navigable Small World — multi-layer graph index, O(log n) search |
| IVF | Inverted File Index — cluster-based partitioning, O(n/k) search |
| Collection | Logical grouping of vectors with shared config (like a table) |
| Namespace | Lightweight partition within an index (Pinecone-specific) |
| Top-k | Number of nearest neighbors to return |
| Cosine similarity | -1 to 1 (typically 0-1 for text embeddings), higher = more similar (Pinecone, Qdrant) |
| Cosine distance | 0-2, lower = more similar (Chroma, pgvector). distance = 1 - similarity |
| Metadata | Structured key-value data attached to each vector (source, category, date, etc.) |
| Upsert | Insert or update a vector by ID |
| Recall | Fraction of true nearest neighbors found by ANN (95-99%+) |
Vector record anatomy
{
id: "doc_001", // Unique identifier
vector: [0.023, -0.041, 0.087, ..., -0.032], // 1536 floats
metadata: { // Structured facts
text: "Preview text here...",
source: "help-center",
category: "billing",
date: "2026-03-15",
language: "en",
is_published: true,
tenant_id: "customer_abc"
}
}
Popular vector databases
| DB | Type | Best For |
|---|---|---|
| Pinecone | Managed cloud | Zero-ops production |
| Chroma | Open source, local | Prototyping, dev |
| Qdrant | Open source + cloud | Performance-critical |
| Weaviate | Open source + cloud | Hybrid search (vector + keyword) |
| pgvector | PostgreSQL extension | Teams already on Postgres |
| Milvus | Open source + cloud | Billion-scale datasets |
Query flow
User question
|
v
Embed query (SAME model as stored docs)
|
v
Search vector DB (top-k nearest neighbors + optional filters)
|
v
Return results with scores + metadata
|
v
Filter by score threshold (reject low-relevance)
|
v
Pass context to LLM -> Grounded answer
Rule: Query embedding model MUST match stored embedding model.
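The flow above can be sketched as one function. This is a minimal sketch, not any vendor's API: `embed`, `search`, and `llm` are injected stand-ins for your embedding client, vector DB client, and LLM call, which also makes the pipeline easy to unit-test with stubs.

```javascript
// Minimal sketch of the query flow. `embed`, `search`, and `llm` are
// hypothetical injected functions standing in for real clients.
async function answerQuery(question, { embed, search, llm, threshold = 0.75, topK = 5 }) {
  // 1. Embed the query with the SAME model used for the stored docs.
  const queryVector = await embed(question);

  // 2. Search the vector DB for the top-k nearest neighbors.
  const matches = await search(queryVector, topK);

  // 3. Reject low-relevance matches by score threshold.
  const relevant = matches.filter((m) => m.score >= threshold);
  if (relevant.length === 0) {
    return "I don't have enough info to answer that.";
  }

  // 4. Pass the surviving context to the LLM for a grounded answer.
  const context = relevant.map((m) => m.metadata.text).join("\n---\n");
  return llm(question, context);
}
```

With stubbed dependencies, the threshold logic can be verified without touching a real database.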
Top-k guide
top-k 1-3 -> Simple classification, FAQ lookup
top-k 3-5 -> Standard RAG Q&A (most common)
top-k 10-20 -> Research, complex questions
top-k 20-50 -> Retrieve-then-rerank pipeline
top-k 50+ -> Broad retrieval for re-ranking
Similarity scores
Cosine similarity (Pinecone, Qdrant) — rough bands; exact values vary by embedding model:
0.90 - 1.00 Very high (near paraphrase)
0.80 - 0.90 High (same topic)
0.70 - 0.80 Moderate (loosely related)
0.60 - 0.70 Low (tangential)
< 0.60 Probably irrelevant
Convert: distance = 1 - similarity
similarity = 1 - distance
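The similarity metric and both conversions are a few lines of arithmetic; a minimal sketch:

```javascript
// Cosine similarity of two equal-length vectors: dot product divided by
// the product of the vector magnitudes.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// The conversions from the formulas above.
const toDistance = (similarity) => 1 - similarity;
const toSimilarity = (distance) => 1 - distance;
```

Identical directions score 1, orthogonal vectors score 0, which is why the bands above top out at 1.00.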
Score threshold cheat sheet
| Use Case | Threshold | Rationale |
|---|---|---|
| Factual Q&A | 0.80+ | A wrong answer is worse than no answer |
| General chatbot | 0.70-0.75 | Balance coverage and relevance |
| Product search | 0.60-0.70 | Users expect results |
| Duplicate detection | 0.90+ | Only near-exact matches |
| Recommendation | No threshold | Always show something |
Indexing algorithms
HNSW (most common):
How: Multi-layer graph, greedy navigation top-down
Speed: O(log n)
Recall: 95-99%+
Memory: High (graph in RAM)
Inserts: Good (graph updates)
Used by: Pinecone, Qdrant, Chroma, pgvector
IVF:
How: K-means clustering, search nearest clusters only
Speed: O(n/k) where k = num clusters
Recall: 90-99% (depends on nprobe)
Memory: Lower
Inserts: Poor (may need re-clustering)
Used by: Milvus, FAISS, older systems
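For contrast with the ANN indexes above, exact brute-force search is only a few lines: score every vector, sort, slice. This is the O(n x d) baseline that HNSW and IVF approximate (a sketch, not any library's implementation):

```javascript
// Exact (brute-force) top-k: compare the query against EVERY stored vector.
function dot(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

function cosine(a, b) {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

function bruteForceTopK(query, records, k) {
  return records
    .map((r) => ({ id: r.id, score: cosine(query, r.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Fine for a few thousand vectors; at millions of vectors this is exactly the cost that pushes you to an ANN index.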
Metadata filter syntax
Pinecone (MongoDB-like)
filter: {
category: { $eq: 'billing' },
date: { $gte: '2026-01-01' },
source: { $in: ['help-center', 'docs'] },
$or: [
{ language: { $eq: 'en' } },
{ language: { $eq: 'es' } },
],
}
Chroma (where clause)
where: {
$and: [
{ category: 'billing' },
{ language: 'en' },
],
}
Qdrant (must/should/must_not)
filter: {
must: [
{ key: 'category', match: { value: 'billing' } },
],
must_not: [
{ key: 'status', match: { value: 'draft' } },
],
}
Operators quick reference
| Operation | Pinecone | Chroma | Qdrant |
|---|---|---|---|
| Equals | $eq | $eq / shorthand | match: { value } |
| Not equals | $ne | $ne | must_not + match |
| Greater than | $gt | $gt | range: { gt } |
| In list | $in | $in | match: { any } |
| AND | $and / top-level | $and | must: [...] |
| OR | $or | $or | should: [...] |
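To make the Pinecone-style operators concrete, here is a toy in-memory evaluator covering $eq, $ne, $gt, $gte, $in, $and, and $or. This is illustration only; real databases evaluate filters server-side against an index, not per-record like this.

```javascript
// Toy evaluator: does one metadata object satisfy a Pinecone-style filter?
function matchesFilter(metadata, filter) {
  return Object.entries(filter).every(([key, cond]) => {
    if (key === "$and") return cond.every((f) => matchesFilter(metadata, f));
    if (key === "$or") return cond.some((f) => matchesFilter(metadata, f));
    const value = metadata[key];
    // Shorthand: { category: "billing" } means $eq.
    if (typeof cond !== "object" || cond === null) return value === cond;
    return Object.entries(cond).every(([op, target]) => {
      switch (op) {
        case "$eq": return value === target;
        case "$ne": return value !== target;
        case "$gt": return value > target;
        case "$gte": return value >= target;
        case "$in": return target.includes(value);
        default: throw new Error(`Unsupported operator: ${op}`);
      }
    });
  });
}
```

Note that $gte on ISO date strings works because "2026-01-01"-style dates compare correctly as strings, which is one reason the best practices below insist on ISO dates.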
Metadata schema best practices
1. FLAT structure -> No nested objects (most DBs can't filter them)
2. NORMALIZE casing -> "billing" not "Billing" or "BILLING"
3. ISO dates -> "2026-03-15" not "March 15, 2026"
4. TRUNCATE text -> 300-500 chars in metadata, full text elsewhere
5. FILTER-ONLY fields -> Don't store data you won't filter on
6. BOOLEAN as boolean -> true not "true"
7. ALWAYS tenant_id -> Security requirement for multi-tenant apps
Pinecone metadata limit: 40 KB per vector
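The practices above can be enforced with a small sanitizer run before every upsert. A sketch, assuming a 500-char preview cap on the `text` field (from guideline 4) and Pinecone's 40 KB per-vector limit:

```javascript
const MAX_PREVIEW = 500;     // guideline 4: truncate preview text
const MAX_BYTES = 40 * 1024; // Pinecone's per-vector metadata limit

// Enforce the metadata best practices above before upserting.
function sanitizeMetadata(raw) {
  const out = {};
  for (const [key, value] of Object.entries(raw)) {
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      throw new Error(`Nested object at "${key}": keep metadata flat`);
    }
    if (typeof value === "string") {
      // Truncate the preview text; normalize casing on categorical fields.
      out[key] = key === "text" ? value.slice(0, MAX_PREVIEW) : value.toLowerCase();
    } else {
      out[key] = value; // booleans and numbers stay typed (guideline 6)
    }
  }
  const bytes = Buffer.byteLength(JSON.stringify(out), "utf8");
  if (bytes > MAX_BYTES) throw new Error(`Metadata is ${bytes} bytes (> 40 KB)`);
  return out;
}
```

Failing loudly at ingestion time is cheaper than debugging silent filter mismatches ("Billing" vs "billing") at query time.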
Common patterns
Multi-tenant isolation
// ALWAYS include tenant_id in every query
filter: { tenant_id: { $eq: currentTenantId } }
Time-scoped search
// Only last 30 days
const cutoff = new Date(Date.now() - 30*24*60*60*1000).toISOString().split('T')[0];
filter: { date: { $gte: cutoff } }
Score threshold
const results = searchResults.matches.filter(m => m.score >= 0.75);
if (results.length === 0) return "I don't have enough info to answer that.";
Batch ingestion
1. Batch embedding calls (up to 2048 texts per OpenAI call)
2. Batch upserts (100 vectors per Pinecone call)
3. Use idempotent IDs (safe re-runs, no duplicates)
4. Truncate metadata (respect 40KB limit)
5. Track progress (resume on failure)
6. Validate dimensions (must match index)
Embedding model dimensions
| Model | Dims | Cost |
|---|---|---|
| text-embedding-3-small | 1536 | $0.02/1M tokens |
| text-embedding-3-large | 3072 | $0.13/1M tokens |
| voyage-3 | 1024 | $0.06/1M tokens |
| embed-v4.0 (Cohere) | 1024 | $0.10/1M tokens |
| all-MiniLM-L6-v2 | 384 | Free (open source) |
Rule: ALL vectors in a collection MUST have the same dimension. Changing models requires full re-indexing.
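The dimension rule is cheap to enforce before upserting; a minimal check:

```javascript
// Reject any vector whose dimension doesn't match the index config.
function assertDimensions(vectors, expectedDims) {
  vectors.forEach((v, i) => {
    if (v.length !== expectedDims) {
      throw new Error(`Vector ${i} has ${v.length} dims, index expects ${expectedDims}`);
    }
  });
}
```

Catching a 384-dim vector headed for a 1536-dim index at ingestion time beats debugging a rejected upsert or silently broken search later.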
Debugging checklist
[ ] Query embedding model matches stored embedding model?
[ ] Correct index/collection/namespace being queried?
[ ] Score distribution — are top scores reasonable (>0.7)?
[ ] Metadata filters not too restrictive?
[ ] Documents actually exist in the database?
[ ] Dimensions match between query vector and index?
[ ] Score threshold not filtering out all results?
[ ] Full text available for LLM context (not just preview)?
Performance tips
1. includeValues: false -> Don't return vectors (saves bandwidth)
2. Scope to namespace -> Search fewer vectors
3. Minimize metadata filters -> Complex filters slow queries
4. Warm cache -> Frequent queries get faster
5. Same-region deployment -> Reduce network latency
6. Appropriate top-k -> Don't over-fetch
Quick formulas
Cosine distance = 1 - cosine similarity
Cosine similarity = 1 - cosine distance
Storage estimate = num_vectors x dimensions x 4 bytes (float32)
= 1M vectors x 1536 dims x 4 = ~6 GB (vectors only)
Brute force ops = num_vectors x dimensions x 2 (multiply + add)
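The storage formula above, as a one-line estimator (raw vector storage only; index overhead and metadata are extra):

```javascript
// Storage estimate: num_vectors x dimensions x 4 bytes (float32).
function vectorStorageBytes(numVectors, dims, bytesPerFloat = 4) {
  return numVectors * dims * bytesPerFloat;
}

const gb = vectorStorageBytes(1_000_000, 1536) / 1e9; // ~6.1 GB, vectors only
```

Useful for sizing an index tier before ingesting: 1M vectors at 3072 dims (text-embedding-3-large) doubles this to ~12 GB.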
End of 4.12 quick revision.