Episode 4 — Generative AI Engineering / 4.12 — Integrating Vector Databases
4.12.a — Storing Embeddings
In one sentence: Vector databases are purpose-built storage systems that hold high-dimensional embedding vectors alongside metadata, use specialized indexes like HNSW and IVF for lightning-fast approximate nearest-neighbor search, and organize data into namespaces and collections so you can scale from prototype to production.
Navigation: <- 4.12 Overview | 4.12.b — Querying Similar Vectors ->
1. What Is a Vector Database?
A vector database is a database specifically designed to store, index, and query high-dimensional vectors (embeddings). While a traditional database stores rows of structured data (names, dates, prices) and lets you query with exact matches or range filters, a vector database stores arrays of floating-point numbers and lets you query with "find me the most similar vectors."
Traditional Database:
Query: SELECT * FROM products WHERE category = 'shoes' AND price < 50
Result: Exact matches based on column values
Vector Database:
Query: Find top 10 vectors closest to [0.023, -0.041, 0.087, ..., 0.015]
Result: Ranked list of the most semantically similar items
Every embedding you generate (from text, images, audio, or any other data) is just a long array of numbers — typically 256 to 3072 floating-point values. A vector database is where those arrays live, get indexed for fast retrieval, and get paired with metadata that describes what each vector represents.
The anatomy of a stored vector
┌──────────────────────────────────────────────────────────────────┐
│  Vector Record                                                   │
│                                                                  │
│  id:       "doc_4821"                                            │
│  vector:   [0.023, -0.041, 0.087, 0.015, ..., -0.032]            │
│            └─── 1536 dimensions (OpenAI text-embedding-3-small)  │
│  metadata: {                                                     │
│    "source": "knowledge-base",                                   │
│    "category": "billing",                                        │
│    "date": "2026-03-15",                                         │
│    "title": "How to update payment method",                      │
│    "chunk_index": 2,                                             │
│    "text": "To update your payment method, go to..."             │
│  }                                                               │
└──────────────────────────────────────────────────────────────────┘
Each record has three parts:
- ID — A unique identifier (string or number) you assign to each vector.
- Vector — The embedding itself: an array of floats.
- Metadata — A JSON-like object with any additional information (source, category, original text, dates, tags, etc.).
2. Why Regular Databases Are Not Enough
You might wonder: "Can't I just store embeddings as a JSON array column in PostgreSQL and query them?" Technically, yes. Practically, it fails at scale.
The math of brute-force search
If you have 1 million vectors, each with 1536 dimensions, a brute-force similarity search requires:
1,000,000 vectors x 1,536 dimensions x 2 operations (multiply + add) per dimension
= ~3 billion floating-point operations per query
At scale (10 queries/second):
= 30 billion operations per second — just for search
A single PostgreSQL query doing this would take seconds, not milliseconds. And it gets worse linearly with data size.
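To make that arithmetic concrete, here is a minimal brute-force scorer in plain JavaScript. Toy 4-dimensional vectors stand in for real 1536-dimensional embeddings, and a plain dot product stands in for a full similarity metric — the point is that every stored vector gets scored on every query:

```javascript
// Brute-force nearest-neighbor search: score EVERY stored vector against
// the query — n * d multiply-adds per query, which is exactly the O(n)
// cost that ANN indexes are built to avoid.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function bruteForceTopK(records, queryVector, k) {
  return records
    .map((rec) => ({ id: rec.id, score: dot(rec.vector, queryVector) }))
    .sort((a, b) => b.score - a.score) // highest similarity first
    .slice(0, k);
}

// Toy data: 4 dimensions instead of 1536 so the example stays readable
const records = [
  { id: 'doc_a', vector: [1, 0, 0, 0] },
  { id: 'doc_b', vector: [0.9, 0.1, 0, 0] },
  { id: 'doc_c', vector: [0, 0, 1, 0] },
];

const top2 = bruteForceTopK(records, [1, 0, 0, 0], 2);
console.log(top2); // doc_a first (score 1), then doc_b (score 0.9)
```

With 3 records this is instant; with 1 million 1536-dimensional records, that `map` line alone is the ~3 billion operations computed above.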
What vector databases solve
| Challenge | Traditional DB | Vector DB |
|---|---|---|
| Similarity search | Full table scan, O(n) per query | ANN index, sub-linear search time |
| High-dimensional data | No native support | Built for 256-3072+ dimensions |
| Distance metrics | Must implement manually | Cosine, Euclidean, dot product built in |
| Scalability | Degrades with vector count | Optimized for billions of vectors |
| Real-time updates | Works fine | Varies (some need re-indexing) |
| Metadata filtering | Excellent (SQL WHERE) | Good (varies by DB, improving rapidly) |
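The three built-in distance metrics from the table are simple to state in code. This is an illustrative sketch (toy 2-D vectors), showing a useful identity: for unit-length vectors, cosine similarity and dot product give the same answer, which is why some databases recommend `dotproduct` on pre-normalized embeddings:

```javascript
// The three metrics vector databases ship with, written out in plain JS.
function dot(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

function cosineSimilarity(a, b) {
  const norm = (v) => Math.sqrt(dot(v, v));
  return dot(a, b) / (norm(a) * norm(b));
}

function euclideanDistance(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2;
  return Math.sqrt(s);
}

// For unit-length vectors, cosine similarity equals the dot product.
const a = [0.6, 0.8]; // |a| = 1
const b = [1, 0];     // |b| = 1
console.log(cosineSimilarity(a, b)); // ≈ 0.6
console.log(dot(a, b));              // ≈ 0.6 — same ranking, cheaper to compute
console.log(euclideanDistance([0, 0], [3, 4])); // 5
```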
The exception: pgvector
pgvector is a PostgreSQL extension that adds vector similarity search to your existing Postgres database. It bridges the gap — you get real vector indexing inside a database you already know. It is an excellent choice when:
- You already use PostgreSQL
- Your dataset is under ~5 million vectors
- You want SQL joins between vector data and relational data
- You prefer operational simplicity (one database, not two)
For larger datasets or dedicated vector workloads, purpose-built vector databases typically outperform pgvector.
3. Popular Vector Databases Compared
| Database | Type | Hosting | Language | Best For | Pricing Model |
|---|---|---|---|---|---|
| Pinecone | Managed cloud | Cloud only | Any (REST API) | Production SaaS, zero-ops | Per-vector storage + queries |
| Weaviate | Open source + cloud | Self-host or cloud | Any (REST/GraphQL) | Hybrid search (vector + keyword) | Free self-host, paid cloud |
| Chroma | Open source | Local or embedded | Python, JS | Prototyping, local development | Free (open source) |
| Qdrant | Open source + cloud | Self-host or cloud | Any (REST/gRPC) | Performance-critical apps | Free self-host, paid cloud |
| pgvector | PostgreSQL extension | Anywhere Postgres runs | SQL | Teams already on Postgres | Free (extension) |
| Milvus | Open source + cloud | Self-host or Zilliz cloud | Any (SDK) | Massive-scale (billions of vectors) | Free self-host, paid cloud |
Choosing a vector database
Decision tree:
1. Just prototyping or learning?
→ Chroma (runs locally, zero setup, great JS/Python SDK)
2. Already using PostgreSQL?
→ pgvector (add vector search without new infrastructure)
3. Want zero-ops managed service?
→ Pinecone (fully managed, generous free tier)
4. Need hybrid search (vector + full-text keyword)?
→ Weaviate (built-in BM25 + vector search)
5. Need maximum performance + control?
→ Qdrant (Rust-based, excellent filtering performance)
6. Need to handle billions of vectors?
→ Milvus / Zilliz Cloud (built for massive scale)
4. Storing Embeddings: Code Examples
4.1 Storing with Pinecone
Pinecone is a fully managed vector database. You create an index (like a table), then upsert (insert or update) vectors into it.
// ─── Setup: npm install @pinecone-database/pinecone openai ───
import { Pinecone } from '@pinecone-database/pinecone';
import OpenAI from 'openai';
const pinecone = new Pinecone({
apiKey: process.env.PINECONE_API_KEY,
});
const openai = new OpenAI();
// ─── Step 1: Create an index (do this once) ───
async function createIndex() {
await pinecone.createIndex({
name: 'knowledge-base',
dimension: 1536, // Must match your embedding model's output
metric: 'cosine', // cosine | euclidean | dotproduct
spec: {
serverless: {
cloud: 'aws',
region: 'us-east-1',
},
},
});
console.log('Index created: knowledge-base');
}
// ─── Step 2: Generate embeddings for your documents ───
async function generateEmbedding(text) {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text,
});
return response.data[0].embedding; // Array of 1536 floats
}
// ─── Step 3: Upsert vectors with metadata ───
async function storeDocuments(documents) {
const index = pinecone.index('knowledge-base');
// Generate embeddings for all documents
const vectors = await Promise.all(
documents.map(async (doc) => {
const embedding = await generateEmbedding(doc.text);
return {
id: doc.id,
values: embedding,
metadata: {
text: doc.text, // Store original text for retrieval
source: doc.source,
category: doc.category,
date: doc.date,
chunk_index: doc.chunkIndex,
},
};
})
);
// Upsert in batches of 100 (Pinecone's recommended batch size per request)
const BATCH_SIZE = 100;
for (let i = 0; i < vectors.length; i += BATCH_SIZE) {
const batch = vectors.slice(i, i + BATCH_SIZE);
await index.upsert(batch);
console.log(`Upserted batch ${Math.floor(i / BATCH_SIZE) + 1}`);
}
console.log(`Stored ${vectors.length} vectors in Pinecone`);
}
// ─── Usage ───
const documents = [
{
id: 'doc_001',
text: 'To reset your password, go to Settings > Security > Change Password.',
source: 'help-center',
category: 'account',
date: '2026-03-15',
chunkIndex: 0,
},
{
id: 'doc_002',
text: 'Refunds are processed within 5-7 business days after approval.',
source: 'help-center',
category: 'billing',
date: '2026-03-10',
chunkIndex: 0,
},
{
id: 'doc_003',
text: 'Two-factor authentication adds an extra layer of security to your account.',
source: 'help-center',
category: 'account',
date: '2026-03-20',
chunkIndex: 0,
},
];
await storeDocuments(documents);
4.2 Storing with Chroma
Chroma is an open-source vector database that runs locally or embedded in your application. Perfect for development and smaller-scale production.
// ─── Setup: npm install chromadb openai ───
import { ChromaClient } from 'chromadb';
import OpenAI from 'openai';
const chroma = new ChromaClient(); // Connects to local Chroma server
const openai = new OpenAI();
// ─── Step 1: Create a collection (like a table) ───
async function setupCollection() {
// getOrCreateCollection is idempotent — safe to call multiple times
const collection = await chroma.getOrCreateCollection({
name: 'knowledge-base',
metadata: {
'hnsw:space': 'cosine', // Distance metric: cosine | l2 | ip
},
});
return collection;
}
// ─── Step 2: Generate embeddings ───
async function generateEmbeddings(texts) {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: texts, // Chroma can batch multiple texts
});
return response.data.map((item) => item.embedding);
}
// ─── Step 3: Add documents with metadata ───
async function storeDocuments(documents) {
const collection = await setupCollection();
const ids = documents.map((doc) => doc.id);
const texts = documents.map((doc) => doc.text);
const metadatas = documents.map((doc) => ({
source: doc.source,
category: doc.category,
date: doc.date,
chunk_index: doc.chunkIndex,
}));
// Generate embeddings
const embeddings = await generateEmbeddings(texts);
// Add to collection
// Chroma also stores the original text in a "documents" field
await collection.add({
ids: ids,
embeddings: embeddings,
documents: texts, // Chroma stores original text natively
metadatas: metadatas,
});
console.log(`Stored ${documents.length} documents in Chroma`);
}
// ─── Alternative: Let Chroma generate embeddings for you ───
async function storeWithBuiltInEmbedding(documents) {
// Chroma can use a built-in embedding function
// (requires configuring an embedding function on collection creation)
const collection = await chroma.getOrCreateCollection({
name: 'auto-embed-collection',
});
await collection.add({
ids: documents.map((d) => d.id),
documents: documents.map((d) => d.text), // Chroma auto-embeds these
metadatas: documents.map((d) => ({
source: d.source,
category: d.category,
})),
});
}
// ─── Usage ───
const documents = [
{
id: 'doc_001',
text: 'To reset your password, go to Settings > Security > Change Password.',
source: 'help-center',
category: 'account',
date: '2026-03-15',
chunkIndex: 0,
},
{
id: 'doc_002',
text: 'Refunds are processed within 5-7 business days after approval.',
source: 'help-center',
category: 'billing',
date: '2026-03-10',
chunkIndex: 0,
},
];
await storeDocuments(documents);
4.3 Storing with Qdrant
// ─── Setup: npm install @qdrant/js-client-rest openai ───
import { QdrantClient } from '@qdrant/js-client-rest';
import OpenAI from 'openai';
const qdrant = new QdrantClient({
url: 'http://localhost:6333', // Local Qdrant server
// For cloud: url: 'https://your-cluster.qdrant.io', apiKey: '...'
});
const openai = new OpenAI();
// ─── Step 1: Create a collection ───
async function createCollection() {
await qdrant.createCollection('knowledge-base', {
vectors: {
size: 1536, // Dimension of your embeddings
distance: 'Cosine', // Cosine | Euclid | Dot
},
});
console.log('Collection created: knowledge-base');
}
// ─── Step 2: Upsert vectors ───
async function storeDocuments(documents) {
const points = await Promise.all(
documents.map(async (doc, index) => {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: doc.text,
});
return {
id: index + 1, // Qdrant uses numeric IDs (or UUIDs)
vector: response.data[0].embedding,
payload: { // Qdrant calls metadata "payload"
text: doc.text,
source: doc.source,
category: doc.category,
date: doc.date,
},
};
})
);
await qdrant.upsert('knowledge-base', {
wait: true, // Wait for indexing to complete
points: points,
});
console.log(`Stored ${points.length} vectors in Qdrant`);
}
5. Indexing Strategies: How Vector Search Gets Fast
The magic of vector databases is that they don't do brute-force comparison against every vector. They use Approximate Nearest Neighbor (ANN) algorithms that trade a tiny bit of accuracy for massive speed improvements.
5.1 HNSW (Hierarchical Navigable Small World)
HNSW is the most popular indexing algorithm in modern vector databases. Think of it as building a multi-layer skip-list graph over your vectors.
How HNSW works (conceptual):
Layer 3 (top):     A ───────── D                                             (few nodes, long jumps)
                   │           │
Layer 2:           A ─ B ───── D ─ E                                         (more nodes, medium jumps)
                   │   │       │   │
Layer 1:           A ─ B ─ C ─ D ─ E ─ F                                     (many nodes, short jumps)
                   │   │   │   │   │   │
Layer 0 (bottom):  A   B   C   D   E   F   G   H   I   J   K   L   M   N     (all nodes, finest detail)
Search: Start at top layer, greedily jump to nearest neighbor,
drop down a layer, repeat until reaching bottom layer.
Result: Find approximate nearest neighbors in O(log n) time.
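The greedy hop at the heart of that search can be sketched on a single layer. This is a toy illustration, not real HNSW: the "vectors" are 1-D numbers, the graph is hand-built, and a real index keeps multiple layers plus a candidate beam (efSearch) instead of a single greedy walker:

```javascript
// Greedy graph search: from an entry point, repeatedly hop to the neighbor
// closest to the query; stop at a local minimum. The long-range A—D edge
// plays the role of an upper HNSW layer's "long jump".
const nodes = { A: 0, B: 2, C: 4, D: 6, E: 8, F: 10 }; // toy 1-D "vectors"
const edges = {
  A: ['B', 'D'], // A—D is the long-range shortcut
  B: ['A', 'C'],
  C: ['B', 'D'],
  D: ['A', 'C', 'E'],
  E: ['D', 'F'],
  F: ['E'],
};

function greedySearch(query, entry = 'A') {
  const dist = (id) => Math.abs(nodes[id] - query);
  let current = entry;
  while (true) {
    // Among neighbors strictly closer to the query, jump to the closest one
    const better = edges[current]
      .filter((n) => dist(n) < dist(current))
      .sort((x, y) => dist(x) - dist(y))[0];
    if (!better) return current; // local minimum = approximate nearest neighbor
    current = better;
  }
}

console.log(greedySearch(9)); // "E" — hops A → D → E via the shortcut edge
```

Note the search for query 9 reaches E in two hops instead of walking A→B→C→D→E, which is the whole point of the long-range links.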
Key properties of HNSW:
| Property | Value |
|---|---|
| Search speed | O(log n) — excellent |
| Recall | 95-99%+ (configurable) |
| Memory | High — stores the graph in memory |
| Insert speed | Moderate (must update the graph) |
| Best for | Most use cases, especially <100M vectors |
Tuning parameters:
M — Number of connections per node per layer
Higher M = better recall, more memory, slower insert
Default: 16, Range: 8-64
efConstruction — Build-time search depth
Higher = better index quality, slower build
Default: 200, Range: 100-500
efSearch — Query-time search depth
Higher = better recall, slower query
Default: 100, Range: 50-500
5.2 IVF (Inverted File Index)
IVF partitions the vector space into clusters (called Voronoi cells), then only searches the clusters closest to the query.
How IVF works (conceptual):
Step 1: Partition vectors into clusters using k-means
┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐
│Cluster 1│   │Cluster 2│   │Cluster 3│   │Cluster 4│
│ • • •   │   │  • •    │   │ • • •   │   │  • •    │
│  • •    │   │ • • •   │   │  • •    │   │ • •     │
│ •       │   │  •      │   │   •     │   │  •      │
└─────────┘   └─────────┘   └─────────┘   └─────────┘
Step 2: Query arrives → Find nearest cluster centroid(s)
Query vector ──→ Closest to Cluster 2 and Cluster 3
Step 3: Only search vectors IN those clusters (not all vectors)
Result: Much faster than brute force, slight accuracy trade-off
Key properties of IVF:
| Property | Value |
|---|---|
| Search speed | O(n/k) where k = number of clusters |
| Recall | 90-99% (depends on nprobe) |
| Memory | Lower than HNSW |
| Insert speed | Fast (just assign to cluster) |
| Best for | Very large datasets, memory-constrained environments |
Tuning parameters:
nlist — Number of clusters to create
Higher = more partitions, faster per-partition search, longer build
Rule of thumb: sqrt(n) to 4*sqrt(n) where n = total vectors
nprobe — Number of clusters to search at query time
Higher = better recall, slower query
Default: 1-10, can go up to nlist
nprobe=1: fastest, lowest recall
nprobe=nlist: equivalent to brute force
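The nlist/nprobe interplay fits in a short sketch. This is IVF in miniature with toy 2-D vectors and hand-picked centroids (a real index learns the centroids with k-means and holds far more data per cluster):

```javascript
// IVF in miniature: vectors pre-assigned to clusters; a query scans only
// the nprobe clusters whose centroids are nearest, then brute-forces inside.
function dist2(a, b) {
  return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2; // squared Euclidean
}

const centroids = [[0, 0], [10, 0], [0, 10]]; // nlist = 3, fixed by hand here
const clusters = [
  [{ id: 'a1', v: [1, 1] }, { id: 'a2', v: [-1, 0] }],
  [{ id: 'b1', v: [9, 1] }, { id: 'b2', v: [11, -1] }],
  [{ id: 'c1', v: [1, 9] }],
];

function ivfSearch(query, nprobe) {
  // Step 1: rank clusters by centroid distance, keep the closest nprobe
  const probed = centroids
    .map((c, i) => ({ i, d: dist2(c, query) }))
    .sort((x, y) => x.d - y.d)
    .slice(0, nprobe);
  // Step 2: brute-force search ONLY inside the probed clusters
  const candidates = probed.flatMap(({ i }) => clusters[i]);
  candidates.sort((x, y) => dist2(x.v, query) - dist2(y.v, query));
  return candidates[0].id;
}

console.log(ivfSearch([8, 0], 1)); // "b1" — only cluster 2's vectors were scanned
```

With nprobe = 1 the query above touches 2 of the 5 stored vectors; raising nprobe to 3 (= nlist) degenerates to brute force over everything.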
5.3 HNSW vs IVF: When to Use Which
| Factor | HNSW | IVF |
|---|---|---|
| Dataset size | Up to ~100M vectors | Any size, better for 100M+ |
| Memory | High (graph in memory) | Lower |
| Query speed | Faster | Slightly slower |
| Build time | Slower | Faster |
| Dynamic inserts | Good (graph updates) | Poor (may need re-clustering) |
| Used by | Pinecone, Qdrant, Chroma, pgvector (HNSW) | FAISS, Milvus, pgvector (IVFFlat) |
Most modern vector databases (Pinecone, Qdrant, Chroma) use HNSW as their default index because it offers the best recall-to-speed trade-off for typical workloads.
6. Namespaces and Collections: Organizing Your Vectors
Vector databases provide organizational structures to separate different groups of vectors within the same database instance.
6.1 Collections
A collection is the primary organizational unit — like a table in SQL databases. Each collection has its own configuration (dimension, distance metric, index settings).
Vector Database Instance
├── Collection: "help-articles"
│   ├── dimension: 1536
│   ├── metric: cosine
│   └── vectors: 50,000 help center articles
│
├── Collection: "product-catalog"
│   ├── dimension: 1536
│   ├── metric: cosine
│   └── vectors: 200,000 product descriptions
│
└── Collection: "user-queries"
    ├── dimension: 1536
    ├── metric: cosine
    └── vectors: 1,000,000 past search queries
When to use separate collections:
- Different types of data (articles vs products vs queries)
- Different embedding models (different dimensions)
- Different distance metrics
- Different index configurations (performance tuning)
- Multi-tenant isolation (one collection per customer)
6.2 Namespaces (Pinecone-specific)
Pinecone uses namespaces within an index to logically separate vectors. Unlike collections, all namespaces share the same index configuration.
// ─── Storing in different namespaces ───
const index = pinecone.index('my-app');
// Store help articles in the "help" namespace
await index.namespace('help').upsert([
{
id: 'help_001',
values: helpEmbedding,
metadata: { text: 'How to reset password', category: 'account' },
},
]);
// Store product data in the "products" namespace
await index.namespace('products').upsert([
{
id: 'prod_001',
values: productEmbedding,
metadata: { text: 'Wireless headphones', price: 49.99 },
},
]);
// Query only within a specific namespace
const helpResults = await index.namespace('help').query({
vector: queryEmbedding,
topK: 5,
});
// Only searches help articles, not products
Namespace use cases:
| Use Case | Implementation |
|---|---|
| Multi-tenant | One namespace per customer (tenant_123) |
| Data types | Separate namespaces for articles, FAQs, docs |
| Environments | staging vs production namespaces |
| Versioning | v1, v2 namespaces when re-embedding with a new model |
| A/B testing | Compare search results across different embeddings |
6.3 Partitioning strategies
Strategy 1: Collection per data type
└── Simplest, best for different schemas/dimensions
Strategy 2: Single collection + metadata filtering
├── All vectors in one collection, use metadata to filter by type
└── Simpler to manage, but can be slower for heavily filtered queries
Strategy 3: Namespaces per tenant
├── One index, one namespace per customer
└── Good isolation without multiple indexes
Strategy 4: Index per environment
├── "myapp-staging" index, "myapp-production" index
└── Complete isolation between environments
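Strategy 2 is easy to picture with a toy in-memory stand-in for a real database — the filter runs over metadata, and only the survivors are scored (real vector DBs interleave filtering with the index traversal, but the contract is the same):

```javascript
// Strategy 2 in miniature: one collection, metadata filter + vector scoring.
function dot(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

const collection = [
  { id: 'h1', vector: [1, 0],     metadata: { type: 'help',    category: 'account' } },
  { id: 'h2', vector: [0.9, 0.1], metadata: { type: 'help',    category: 'billing' } },
  { id: 'p1', vector: [1, 0],     metadata: { type: 'product' } },
];

function filteredSearch(query, filter, topK) {
  return collection
    // Keep only records whose metadata matches every filter key exactly
    .filter((rec) =>
      Object.entries(filter).every(([k, v]) => rec.metadata[k] === v))
    // Score and rank the survivors
    .map((rec) => ({ id: rec.id, score: dot(rec.vector, query) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Same query vector, scoped to different "types" within one collection:
console.log(filteredSearch([1, 0], { type: 'help' }, 1));    // h1
console.log(filteredSearch([1, 0], { type: 'product' }, 1)); // p1
```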
7. Batch Ingestion: Loading Data at Scale
When you have thousands or millions of documents to store, you need efficient batch ingestion patterns.
// ─── Robust batch ingestion pipeline ───
import { Pinecone } from '@pinecone-database/pinecone';
import OpenAI from 'openai';
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const openai = new OpenAI();
async function batchIngest(documents, indexName, options = {}) {
const {
batchSize = 100, // Vectors per upsert call
embeddingBatchSize = 50, // Texts per embedding call
namespace = '',
onProgress = () => {},
} = options;
const index = pinecone.index(indexName);
const ns = namespace ? index.namespace(namespace) : index;
let processed = 0;
const total = documents.length;
// Process in embedding batches
for (let i = 0; i < documents.length; i += embeddingBatchSize) {
const batch = documents.slice(i, i + embeddingBatchSize);
// Generate embeddings for the batch
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: batch.map((doc) => doc.text),
});
// Prepare vectors
const vectors = batch.map((doc, idx) => ({
id: doc.id,
values: response.data[idx].embedding,
metadata: {
text: doc.text.slice(0, 1000), // Pinecone metadata limit: truncate long text
source: doc.source || '',
category: doc.category || '',
date: doc.date || '',
},
}));
// Upsert in sub-batches if needed
for (let j = 0; j < vectors.length; j += batchSize) {
const upsertBatch = vectors.slice(j, j + batchSize);
await ns.upsert(upsertBatch);
}
processed += batch.length;
onProgress({ processed, total, percent: ((processed / total) * 100).toFixed(1) });
}
return { processed, total };
}
// ─── Usage ───
const documents = loadYourDocuments(); // Your document loading logic
const result = await batchIngest(documents, 'knowledge-base', {
batchSize: 100,
embeddingBatchSize: 50,
namespace: 'help-articles',
onProgress: ({ processed, total, percent }) => {
console.log(`Progress: ${processed}/${total} (${percent}%)`);
},
});
console.log(`Ingestion complete: ${result.processed} documents stored`);
Ingestion best practices
| Practice | Why |
|---|---|
| Batch embedding calls | OpenAI allows up to 2048 texts per embedding call — batch to reduce API calls and latency |
| Batch upserts | Most vector DBs cap upsert request size (Pinecone recommends ~100 vectors per call; requests are capped at 2MB) |
| Truncate metadata | Pinecone limits metadata to 40KB per vector. Store long text elsewhere and reference by ID |
| Use idempotent IDs | Using deterministic IDs (e.g., doc_${hash}) allows safe re-runs without duplicates |
| Handle rate limits | Add retry logic with exponential backoff for embedding API rate limits |
| Track progress | Log batch numbers so you can resume from the last successful batch on failure |
| Validate dimensions | Ensure all embeddings have the same dimension as the index/collection |
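Two of the practices above fit in a short sketch. Both helpers here (`deterministicId`, `withRetry`) are illustrative, not part of any SDK — the hashing makes re-runs idempotent, and the retry wrapper backs off exponentially on failures:

```javascript
// Deterministic IDs + exponential-backoff retries, as hedged sketches.
import { createHash } from 'node:crypto';

// Same text always hashes to the same ID, so re-ingesting the same document
// upserts in place instead of creating a duplicate record.
function deterministicId(text) {
  return 'doc_' + createHash('sha256').update(text).digest('hex').slice(0, 12);
}

// Retry with exponential backoff: delays of base, 2*base, 4*base, ... then rethrow.
async function withRetry(fn, maxAttempts = 5, baseDelayMs = 100) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err; // out of attempts
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage sketch with a fake flaky call: fails twice, succeeds on attempt 3
let calls = 0;
const result = await withRetry(async () => {
  calls++;
  if (calls < 3) throw new Error('rate limited');
  return 'ok';
}, 5, 1);
console.log(result, calls); // ok 3
```

In a real pipeline you would wrap the `openai.embeddings.create` and `index.upsert` calls from the batchIngest example above in `withRetry`.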
8. Updating and Deleting Vectors
Vectors are not static — documents change, get deleted, or need re-embedding.
// ─── Pinecone: Update, delete, and check ───
const index = pinecone.index('knowledge-base');
// Update: upsert with the same ID replaces the vector
await index.upsert([
{
id: 'doc_001',
values: newEmbedding, // Re-embedded with updated text
metadata: {
text: 'Updated: To reset your password, use the new Security Hub.',
source: 'help-center',
category: 'account',
date: '2026-04-01', // Updated date
},
},
]);
// Delete by ID
await index.deleteOne('doc_001');
// Delete multiple by ID
await index.deleteMany(['doc_001', 'doc_002', 'doc_003']);
// Delete by metadata filter (pod-based indexes only — serverless indexes
// require listing record IDs first and deleting by ID)
await index.deleteMany({
filter: {
source: { $eq: 'deprecated-source' },
},
});
// Delete all vectors in a namespace
await index.namespace('old-data').deleteAll();
// Check index statistics
const stats = await index.describeIndexStats();
console.log(stats);
// {
// dimension: 1536,
// indexFullness: 0,
// totalRecordCount: 2847,
// namespaces: {
// 'help-articles': { recordCount: 1500 },
// 'products': { recordCount: 1347 },
// }
// }
// ─── Chroma: Update and delete ───
const collection = await chroma.getCollection({ name: 'knowledge-base' });
// Update existing documents
await collection.update({
ids: ['doc_001'],
embeddings: [newEmbedding],
documents: ['Updated text content here.'],
metadatas: [{ source: 'help-center', category: 'account', date: '2026-04-01' }],
});
// Delete by ID
await collection.delete({
ids: ['doc_001', 'doc_002'],
});
// Delete by metadata filter
await collection.delete({
where: { source: 'deprecated-source' },
});
// Get collection info
const count = await collection.count();
console.log(`Collection has ${count} documents`);
9. Embedding Model and Dimension Considerations
Choosing the right embedding model affects your vector database setup and performance.
| Model | Provider | Dimensions | Performance | Cost |
|---|---|---|---|---|
| text-embedding-3-small | OpenAI | 1536 | Good | $0.02/1M tokens |
| text-embedding-3-large | OpenAI | 3072 | Better | $0.13/1M tokens |
| text-embedding-ada-002 | OpenAI | 1536 | Older, still good | $0.10/1M tokens |
| voyage-3 | Voyage AI | 1024 | Strong general-purpose retrieval | $0.06/1M tokens |
| embed-v4.0 | Cohere | 1024 | Excellent multilingual | $0.10/1M tokens |
| Open-source (e.g., all-MiniLM-L6-v2) | Hugging Face | 384 | Good for simple tasks | Free |
Dimension trade-offs
Lower dimensions (384-512):
+ Faster search
+ Less memory
+ Cheaper storage
- Less semantic nuance
- Lower recall on complex queries
Higher dimensions (1536-3072):
+ More semantic detail
+ Better recall on nuanced queries
+ Better for diverse content
- Slower search
- More memory and storage
- Higher cost
Sweet spot for most applications: 1024-1536 dimensions
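The storage side of the trade-off is simple arithmetic: each float32 is 4 bytes, so raw vectors cost n × dims × 4 bytes (HNSW graphs and metadata add overhead on top). A quick estimator:

```javascript
// Back-of-envelope storage estimate for raw float32 vectors.
function rawVectorBytes(numVectors, dims) {
  return numVectors * dims * 4; // 4 bytes per float32 component
}

const gb = (bytes) => (bytes / 1024 ** 3).toFixed(2);

// 1 million vectors at the three common dimension points:
console.log(gb(rawVectorBytes(1_000_000, 384)));  // 1.43 GB
console.log(gb(rawVectorBytes(1_000_000, 1536))); // 5.72 GB
console.log(gb(rawVectorBytes(1_000_000, 3072))); // 11.44 GB
```

Going from 384 to 3072 dimensions multiplies storage (and per-query arithmetic) by 8x — worth checking before picking the largest model by default.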
Critical rule: dimension consistency
IMPORTANT: All vectors in a collection/index MUST have the same dimension.
If you create an index with dimension=1536:
✅ text-embedding-3-small (1536 dims) → works
❌ text-embedding-3-large (3072 dims) → ERROR: dimension mismatch
❌ all-MiniLM-L6-v2 (384 dims) → ERROR: dimension mismatch
If you change embedding models, you MUST:
1. Create a new collection/index with the new dimension
2. Re-embed ALL existing documents with the new model
3. Migrate to the new collection/index
4. Delete the old collection/index
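A cheap guardrail for the rule above is to validate dimensions client-side before every upsert. This `assertDimensions` helper is a hypothetical sketch (not an SDK function) — it catches a mixed-model batch before the database rejects it:

```javascript
// Reject any embedding whose length doesn't match the index's configured
// dimension, with an error naming the offending record.
function assertDimensions(vectors, expectedDim) {
  for (const { id, values } of vectors) {
    if (values.length !== expectedDim) {
      throw new Error(
        `Vector ${id} has ${values.length} dims, index expects ${expectedDim}`
      );
    }
  }
}

// A batch accidentally mixing two embedding models:
const batch = [
  { id: 'doc_001', values: new Array(1536).fill(0.1) }, // text-embedding-3-small
  { id: 'doc_002', values: new Array(384).fill(0.1) },  // wrong model!
];

try {
  assertDimensions(batch, 1536);
} catch (err) {
  console.log(err.message); // flags doc_002 before the upsert is attempted
}
```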
10. Key Takeaways
- Vector databases are purpose-built for storing and searching high-dimensional embeddings — regular databases cannot perform similarity search efficiently at scale.
- Every vector record has three parts: a unique ID, the embedding vector, and metadata (JSON-like structured data about the source document).
- HNSW is the dominant indexing algorithm — it provides O(log n) approximate nearest-neighbor search with 95-99%+ recall; IVF is an alternative for very large or memory-constrained workloads.
- Collections organize vectors into logical groups (like tables); Pinecone also uses namespaces for lightweight partitioning within an index.
- Batch ingestion is essential at scale — batch your embedding API calls and vector upserts, use idempotent IDs, and track progress for resumability.
- Dimension consistency is non-negotiable — every vector in a collection must have the same number of dimensions, and changing embedding models requires full re-indexing.
Explain-It Challenge
- A colleague asks: "Why can't we just add a column to our PostgreSQL users table to store embeddings and run similarity search?" Explain when this works, when it breaks, and what pgvector offers.
- Your team needs to re-embed 2 million documents with a new model. Design the migration plan — what steps are needed, and how do you avoid downtime?
- Explain to a product manager (non-technical) what HNSW does and why "approximate" nearest-neighbor is acceptable for a search feature.
Navigation: <- 4.12 Overview | 4.12.b — Querying Similar Vectors ->