Episode 4 — Generative AI Engineering / 4.12 — Integrating Vector Databases

4.12.a — Storing Embeddings

In one sentence: Vector databases are purpose-built storage systems that hold high-dimensional embedding vectors alongside metadata, use specialized indexes like HNSW and IVF for lightning-fast approximate nearest-neighbor search, and organize data into namespaces and collections so you can scale from prototype to production.

Navigation: <- 4.12 Overview | 4.12.b — Querying Similar Vectors ->


1. What Is a Vector Database?

A vector database is a database specifically designed to store, index, and query high-dimensional vectors (embeddings). While a traditional database stores rows of structured data (names, dates, prices) and lets you query with exact matches or range filters, a vector database stores arrays of floating-point numbers and lets you query with "find me the most similar vectors."

Traditional Database:
  Query: SELECT * FROM products WHERE category = 'shoes' AND price < 50
  Result: Exact matches based on column values

Vector Database:
  Query: Find top 10 vectors closest to [0.023, -0.041, 0.087, ..., 0.015]
  Result: Ranked list of the most semantically similar items

Every embedding you generate (from text, images, audio, or any other data) is just a long array of numbers — typically 256 to 3072 floating-point values. A vector database is where those arrays live, get indexed for fast retrieval, and get paired with metadata that describes what each vector represents.

The anatomy of a stored vector

┌──────────────────────────────────────────────────────────────────┐
│  Vector Record                                                   │
│                                                                  │
│  id:        "doc_4821"                                           │
│  vector:    [0.023, -0.041, 0.087, 0.015, ..., -0.032]           │
│             └─── 1536 dimensions (OpenAI text-embedding-3-small) │
│  metadata:  {                                                    │
│               "source": "knowledge-base",                        │
│               "category": "billing",                             │
│               "date": "2026-03-15",                              │
│               "title": "How to update payment method",           │
│               "chunk_index": 2,                                  │
│               "text": "To update your payment method, go to..."  │
│             }                                                    │
└──────────────────────────────────────────────────────────────────┘

Each record has three parts:

  1. ID — A unique identifier (string or number) you assign to each vector.
  2. Vector — The embedding itself: an array of floats.
  3. Metadata — A JSON-like object with any additional information (source, category, original text, dates, tags, etc.).
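
As a sketch, here is the three-part record as a JavaScript object, using Pinecone's field names (`id`, `values`, `metadata`); the `validateRecord` helper is a hypothetical convenience for illustration, not part of any SDK:

```javascript
// The three-part record shape, using Pinecone's field names
// ("id", "values", "metadata"). Qdrant calls metadata "payload",
// and Chroma keeps the source text in a separate "documents" field.
const record = {
  id: 'doc_4821',                  // 1. unique identifier
  values: [0.023, -0.041, 0.087],  // 2. the embedding (trimmed to 3 dims here)
  metadata: {                      // 3. anything you need back at query time
    source: 'knowledge-base',
    category: 'billing',
    text: 'To update your payment method, go to...',
  },
};

// Hypothetical pre-upsert sanity check
function validateRecord(rec, expectedDim) {
  if (typeof rec.id !== 'string' || rec.id.length === 0) return false;
  if (!Array.isArray(rec.values) || rec.values.length !== expectedDim) return false;
  return rec.values.every((v) => typeof v === 'number' && Number.isFinite(v));
}

console.log(validateRecord(record, 3)); // true
```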

2. Why Regular Databases Are Not Enough

You might wonder: "Can't I just store embeddings as a JSON array column in PostgreSQL and query them?" Technically, yes. Practically, it fails at scale.

The math of brute-force search

If you have 1 million vectors, each with 1536 dimensions, a brute-force similarity search requires:

1,000,000 vectors x 1,536 dimensions x 2 operations (multiply + add) per dimension
= ~3 billion floating-point operations per query

At scale (10 queries/second):
= 30 billion operations per second — just for search

A single PostgreSQL query doing this as a full scan would take seconds, not milliseconds, and because every query touches every vector, latency grows linearly with dataset size.
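
To make the arithmetic concrete, here is the brute-force baseline in plain JavaScript: one dot product per stored vector, so cost grows with both vector count and dimension (toy 3-dimensional vectors stand in for 1536):

```javascript
// The O(n * d) baseline that vector databases exist to avoid.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i]; // multiply + add per dimension
  return sum;
}

function cosineSimilarity(a, b) {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Scans every stored vector for every query
function bruteForceTopK(query, vectors, k) {
  return vectors
    .map((values, index) => ({ index, score: cosineSimilarity(query, values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const vectors = [
  [1, 0, 0],     // identical direction to the query below
  [0.9, 0.1, 0], // nearly parallel
  [0, 1, 0],     // orthogonal
];
console.log(bruteForceTopK([1, 0, 0], vectors, 2).map((r) => r.index)); // [ 0, 1 ]
```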

What vector databases solve

| Challenge | Traditional DB | Vector DB |
|---|---|---|
| Similarity search | Full table scan, O(n) per query | ANN index, sub-linear search time |
| High-dimensional data | No native support | Built for 256-3072+ dimensions |
| Distance metrics | Must implement manually | Cosine, Euclidean, dot product built in |
| Scalability | Degrades with vector count | Optimized for billions of vectors |
| Real-time updates | Works fine | Varies (some need re-indexing) |
| Metadata filtering | Excellent (SQL WHERE) | Good (varies by DB, improving rapidly) |

The exception: pgvector

pgvector is a PostgreSQL extension that adds vector similarity search to your existing Postgres database. It bridges the gap — you get real vector indexing inside a database you already know. It is an excellent choice when:

  • You already use PostgreSQL
  • Your dataset is under ~5 million vectors
  • You want SQL joins between vector data and relational data
  • You prefer operational simplicity (one database, not two)

For larger datasets or dedicated vector workloads, purpose-built vector databases typically outperform pgvector.


3. Popular Vector Databases Compared

| Database | Type | Hosting | Language | Best For | Pricing Model |
|---|---|---|---|---|---|
| Pinecone | Managed cloud | Cloud only | Any (REST API) | Production SaaS, zero-ops | Per-vector storage + queries |
| Weaviate | Open source + cloud | Self-host or cloud | Any (REST/GraphQL) | Hybrid search (vector + keyword) | Free self-host, paid cloud |
| Chroma | Open source | Local or embedded | Python, JS | Prototyping, local development | Free (open source) |
| Qdrant | Open source + cloud | Self-host or cloud | Any (REST/gRPC) | Performance-critical apps | Free self-host, paid cloud |
| pgvector | PostgreSQL extension | Anywhere Postgres runs | SQL | Teams already on Postgres | Free (extension) |
| Milvus | Open source + cloud | Self-host or Zilliz cloud | Any (SDK) | Massive-scale (billions of vectors) | Free self-host, paid cloud |

Choosing a vector database

Decision tree:

1. Just prototyping or learning?
   → Chroma (runs locally, zero setup, great JS/Python SDK)

2. Already using PostgreSQL?
   → pgvector (add vector search without new infrastructure)

3. Want zero-ops managed service?
   → Pinecone (fully managed, generous free tier)

4. Need hybrid search (vector + full-text keyword)?
   → Weaviate (built-in BM25 + vector search)

5. Need maximum performance + control?
   → Qdrant (Rust-based, excellent filtering performance)

6. Need to handle billions of vectors?
   → Milvus / Zilliz Cloud (built for massive scale)

4. Storing Embeddings: Code Examples

4.1 Storing with Pinecone

Pinecone is a fully managed vector database. You create an index (like a table), then upsert (insert or update) vectors into it.

// ─── Setup: npm install @pinecone-database/pinecone openai ───

import { Pinecone } from '@pinecone-database/pinecone';
import OpenAI from 'openai';

const pinecone = new Pinecone({
  apiKey: process.env.PINECONE_API_KEY,
});

const openai = new OpenAI();

// ─── Step 1: Create an index (do this once) ───

async function createIndex() {
  await pinecone.createIndex({
    name: 'knowledge-base',
    dimension: 1536,              // Must match your embedding model's output
    metric: 'cosine',             // cosine | euclidean | dotproduct
    spec: {
      serverless: {
        cloud: 'aws',
        region: 'us-east-1',
      },
    },
  });
  console.log('Index created: knowledge-base');
}

// ─── Step 2: Generate embeddings for your documents ───

async function generateEmbedding(text) {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  });
  return response.data[0].embedding; // Array of 1536 floats
}

// ─── Step 3: Upsert vectors with metadata ───

async function storeDocuments(documents) {
  const index = pinecone.index('knowledge-base');

  // Generate embeddings for all documents
  const vectors = await Promise.all(
    documents.map(async (doc) => {
      const embedding = await generateEmbedding(doc.text);
      return {
        id: doc.id,
        values: embedding,
        metadata: {
          text: doc.text,             // Store original text for retrieval
          source: doc.source,
          category: doc.category,
          date: doc.date,
          chunk_index: doc.chunkIndex,
        },
      };
    })
  );

  // Upsert in batches of 100 (Pinecone limit per request)
  const BATCH_SIZE = 100;
  for (let i = 0; i < vectors.length; i += BATCH_SIZE) {
    const batch = vectors.slice(i, i + BATCH_SIZE);
    await index.upsert(batch);
    console.log(`Upserted batch ${Math.floor(i / BATCH_SIZE) + 1}`);
  }

  console.log(`Stored ${vectors.length} vectors in Pinecone`);
}

// ─── Usage ───

const documents = [
  {
    id: 'doc_001',
    text: 'To reset your password, go to Settings > Security > Change Password.',
    source: 'help-center',
    category: 'account',
    date: '2026-03-15',
    chunkIndex: 0,
  },
  {
    id: 'doc_002',
    text: 'Refunds are processed within 5-7 business days after approval.',
    source: 'help-center',
    category: 'billing',
    date: '2026-03-10',
    chunkIndex: 0,
  },
  {
    id: 'doc_003',
    text: 'Two-factor authentication adds an extra layer of security to your account.',
    source: 'help-center',
    category: 'account',
    date: '2026-03-20',
    chunkIndex: 0,
  },
];

await storeDocuments(documents);

4.2 Storing with Chroma

Chroma is an open-source vector database that runs locally or embedded in your application. Perfect for development and smaller-scale production.

// ─── Setup: npm install chromadb openai ───

import { ChromaClient } from 'chromadb';
import OpenAI from 'openai';

const chroma = new ChromaClient(); // Connects to local Chroma server
const openai = new OpenAI();

// ─── Step 1: Create a collection (like a table) ───

async function setupCollection() {
  // getOrCreateCollection is idempotent — safe to call multiple times
  const collection = await chroma.getOrCreateCollection({
    name: 'knowledge-base',
    metadata: {
      'hnsw:space': 'cosine',     // Distance metric: cosine | l2 | ip
    },
  });
  return collection;
}

// ─── Step 2: Generate embeddings ───

async function generateEmbeddings(texts) {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts,                  // OpenAI accepts an array of texts in one call
  });
  return response.data.map((item) => item.embedding);
}

// ─── Step 3: Add documents with metadata ───

async function storeDocuments(documents) {
  const collection = await setupCollection();

  const ids = documents.map((doc) => doc.id);
  const texts = documents.map((doc) => doc.text);
  const metadatas = documents.map((doc) => ({
    source: doc.source,
    category: doc.category,
    date: doc.date,
    chunk_index: doc.chunkIndex,
  }));

  // Generate embeddings
  const embeddings = await generateEmbeddings(texts);

  // Add to collection
  // Chroma also stores the original text in a "documents" field
  await collection.add({
    ids: ids,
    embeddings: embeddings,
    documents: texts,              // Chroma stores original text natively
    metadatas: metadatas,
  });

  console.log(`Stored ${documents.length} documents in Chroma`);
}

// ─── Alternative: Let Chroma generate embeddings for you ───

async function storeWithBuiltInEmbedding(documents) {
  // Chroma can use a built-in embedding function
  // (requires configuring an embedding function on collection creation)
  const collection = await chroma.getOrCreateCollection({
    name: 'auto-embed-collection',
  });

  await collection.add({
    ids: documents.map((d) => d.id),
    documents: documents.map((d) => d.text),   // Chroma auto-embeds these
    metadatas: documents.map((d) => ({
      source: d.source,
      category: d.category,
    })),
  });
}

// ─── Usage ───

const documents = [
  {
    id: 'doc_001',
    text: 'To reset your password, go to Settings > Security > Change Password.',
    source: 'help-center',
    category: 'account',
    date: '2026-03-15',
    chunkIndex: 0,
  },
  {
    id: 'doc_002',
    text: 'Refunds are processed within 5-7 business days after approval.',
    source: 'help-center',
    category: 'billing',
    date: '2026-03-10',
    chunkIndex: 0,
  },
];

await storeDocuments(documents);

4.3 Storing with Qdrant

// ─── Setup: npm install @qdrant/js-client-rest openai ───

import { QdrantClient } from '@qdrant/js-client-rest';
import OpenAI from 'openai';

const qdrant = new QdrantClient({
  url: 'http://localhost:6333',    // Local Qdrant server
  // For cloud: url: 'https://your-cluster.qdrant.io', apiKey: '...'
});

const openai = new OpenAI();

// ─── Step 1: Create a collection ───

async function createCollection() {
  await qdrant.createCollection('knowledge-base', {
    vectors: {
      size: 1536,                  // Dimension of your embeddings
      distance: 'Cosine',         // Cosine | Euclid | Dot
    },
  });
  console.log('Collection created: knowledge-base');
}

// ─── Step 2: Upsert vectors ───

async function storeDocuments(documents) {
  const points = await Promise.all(
    documents.map(async (doc, index) => {
      const response = await openai.embeddings.create({
        model: 'text-embedding-3-small',
        input: doc.text,
      });

      return {
        id: index + 1,             // Qdrant uses numeric IDs (or UUIDs)
        vector: response.data[0].embedding,
        payload: {                 // Qdrant calls metadata "payload"
          text: doc.text,
          source: doc.source,
          category: doc.category,
          date: doc.date,
        },
      };
    })
  );

  await qdrant.upsert('knowledge-base', {
    wait: true,                    // Wait for indexing to complete
    points: points,
  });

  console.log(`Stored ${points.length} vectors in Qdrant`);
}

5. Indexing Strategies: How Vector Search Gets Fast

The magic of vector databases is that they don't do brute-force comparison against every vector. They use Approximate Nearest Neighbor (ANN) algorithms that trade a tiny bit of accuracy for massive speed improvements.
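
ANN quality is usually measured as recall@k: the fraction of the true top-k neighbors that the approximate index actually returned. A small hypothetical helper (not part of any SDK) makes the definition concrete:

```javascript
// recall@k: what fraction of the exact top-k IDs did the ANN index return?
function recallAtK(approxIds, exactIds) {
  const exact = new Set(exactIds);
  const hits = approxIds.filter((id) => exact.has(id)).length;
  return hits / exactIds.length;
}

// Example: the index returned 9 of the 10 true nearest neighbors
const exact  = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'];
const approx = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'z'];
console.log(recallAtK(approx, exact)); // 0.9
```

In practice you compute `exactIds` offline with brute force on a sample of queries, then tune index parameters until recall@k is acceptable.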

5.1 HNSW (Hierarchical Navigable Small World)

HNSW is the most popular indexing algorithm in modern vector databases. Think of it as building a multi-layer skip-list graph over your vectors.

How HNSW works (conceptual):

Layer 3 (top):     A ────────────────── D          (few nodes, long jumps)
                   │                    │
Layer 2:           A ──── B ──────── D ── E        (more nodes, medium jumps)
                   │      │          │    │
Layer 1:           A ─ B ─ C ──── D ─ E ─ F       (many nodes, short jumps)
                   │   │   │      │   │   │
Layer 0 (bottom):  A B C D E F G H I J K L M N    (all nodes, finest detail)

Search: Start at top layer, greedily jump to nearest neighbor,
        drop down a layer, repeat until reaching bottom layer.
        Result: Find approximate nearest neighbors in O(log n) time.

Key properties of HNSW:

| Property | Value |
|---|---|
| Search speed | O(log n) — excellent |
| Recall | 95-99%+ (configurable) |
| Memory | High — stores the graph in memory |
| Insert speed | Moderate (must update the graph) |
| Best for | Most use cases, especially <100M vectors |

Tuning parameters:

M          — Number of connections per node per layer
             Higher M = better recall, more memory, slower insert
             Default: 16, Range: 8-64

efConstruction — Build-time search depth
                 Higher = better index quality, slower build
                 Default: 200, Range: 100-500

efSearch   — Query-time search depth
             Higher = better recall, slower query
             Default: 100, Range: 50-500
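
As one concrete example, here is how these knobs map onto Qdrant's API (`hnsw_config.m` and `hnsw_config.ef_construct` at collection creation, `hnsw_ef` as a per-query search parameter). The specific numbers lean toward recall and are illustrative, not a recommendation:

```javascript
// Recall-leaning HNSW settings expressed as a Qdrant collection config;
// this object would be passed to qdrant.createCollection(name, config).
const collectionConfig = {
  vectors: { size: 1536, distance: 'Cosine' },
  hnsw_config: {
    m: 32,             // connections per node: more recall, more memory, slower insert
    ef_construct: 300, // build-time search depth: better graph, slower build
  },
};

// Query-time depth is set per search request, not per collection
const searchParams = { hnsw_ef: 200 }; // higher = better recall, slower query

console.log(collectionConfig.hnsw_config, searchParams);
```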

5.2 IVF (Inverted File Index)

IVF partitions the vector space into clusters (called Voronoi cells), then only searches the clusters closest to the query.

How IVF works (conceptual):

Step 1: Partition vectors into clusters using k-means
┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐
│Cluster 1│  │Cluster 2│  │Cluster 3│  │Cluster 4│
│ • • •   │  │  • •    │  │ • • •   │  │   • •   │
│  •  •   │  │ •  • •  │  │  • •    │  │  •   •  │
│   •     │  │    •    │  │   •     │  │    •    │
└─────────┘  └─────────┘  └─────────┘  └─────────┘

Step 2: Query arrives → Find nearest cluster centroid(s)
Query vector ──→ Closest to Cluster 2 and Cluster 3

Step 3: Only search vectors IN those clusters (not all vectors)
Result: Much faster than brute force, slight accuracy trade-off

Key properties of IVF:

| Property | Value |
|---|---|
| Search speed | O(n/k) where k = number of clusters |
| Recall | 90-99% (depends on nprobe) |
| Memory | Lower than HNSW |
| Insert speed | Fast (just assign to cluster) |
| Best for | Very large datasets, memory-constrained environments |

Tuning parameters:

nlist   — Number of clusters to create
          Higher = more partitions, faster per-partition search, longer build
          Rule of thumb: sqrt(n) to 4*sqrt(n) where n = total vectors

nprobe  — Number of clusters to search at query time
          Higher = better recall, slower query
          Default: 1-10, can go up to nlist
          nprobe=1: fastest, lowest recall
          nprobe=nlist: equivalent to brute force
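
These rules of thumb are easy to encode. The helper below is hypothetical (the sqrt(n) heuristic comes from the text; the "search ~5% of clusters" starting point is an assumption for illustration):

```javascript
// Suggest IVF parameters from dataset size using the sqrt(n) rule of thumb.
// recallBias in [1, 4] scales nlist within the sqrt(n)..4*sqrt(n) range.
function suggestIvfParams(totalVectors, recallBias = 1) {
  const nlist = Math.max(1, Math.round(recallBias * Math.sqrt(totalVectors)));
  // Searching ~5% of clusters is a rough starting point; raise for recall
  const nprobe = Math.min(nlist, Math.max(1, Math.round(nlist * 0.05)));
  return { nlist, nprobe };
}

console.log(suggestIvfParams(1_000_000)); // { nlist: 1000, nprobe: 50 }
```

From there, benchmark recall on sample queries and raise nprobe until accuracy is acceptable.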

5.3 HNSW vs IVF: When to Use Which

| Factor | HNSW | IVF |
|---|---|---|
| Dataset size | Up to ~100M vectors | Any size, better for 100M+ |
| Memory | High (graph in memory) | Lower |
| Query speed | Faster | Slightly slower |
| Build time | Slower | Faster |
| Dynamic inserts | Good (graph updates) | Poor (may need re-clustering) |
| Used by | Pinecone, Qdrant, Chroma, pgvector | Milvus, FAISS, older systems |

Most modern vector databases (Pinecone, Qdrant, Chroma) use HNSW as their default index because it offers the best recall-to-speed trade-off for typical workloads.


6. Namespaces and Collections: Organizing Your Vectors

Vector databases provide organizational structures to separate different groups of vectors within the same database instance.

6.1 Collections

A collection is the primary organizational unit — like a table in SQL databases. Each collection has its own configuration (dimension, distance metric, index settings).

Vector Database Instance
├── Collection: "help-articles"
│   ├── dimension: 1536
│   ├── metric: cosine
│   └── vectors: 50,000 help center articles
│
├── Collection: "product-catalog"
│   ├── dimension: 1536
│   ├── metric: cosine
│   └── vectors: 200,000 product descriptions
│
└── Collection: "user-queries"
    ├── dimension: 1536
    ├── metric: cosine
    └── vectors: 1,000,000 past search queries

When to use separate collections:

  • Different types of data (articles vs products vs queries)
  • Different embedding models (different dimensions)
  • Different distance metrics
  • Different index configurations (performance tuning)
  • Multi-tenant isolation (one collection per customer)

6.2 Namespaces (Pinecone-specific)

Pinecone uses namespaces within an index to logically separate vectors. Unlike collections, all namespaces share the same index configuration.

// ─── Storing in different namespaces ───

const index = pinecone.index('my-app');

// Store help articles in the "help" namespace
await index.namespace('help').upsert([
  {
    id: 'help_001',
    values: helpEmbedding,
    metadata: { text: 'How to reset password', category: 'account' },
  },
]);

// Store product data in the "products" namespace
await index.namespace('products').upsert([
  {
    id: 'prod_001',
    values: productEmbedding,
    metadata: { text: 'Wireless headphones', price: 49.99 },
  },
]);

// Query only within a specific namespace
const helpResults = await index.namespace('help').query({
  vector: queryEmbedding,
  topK: 5,
});
// Only searches help articles, not products

Namespace use cases:

| Use Case | Implementation |
|---|---|
| Multi-tenant | One namespace per customer (tenant_123) |
| Data types | Separate namespaces for articles, FAQs, docs |
| Environments | staging vs production namespaces |
| Versioning | v1, v2 namespaces when re-embedding with a new model |
| A/B testing | Compare search results across different embeddings |

6.3 Partitioning strategies

Strategy 1: Collection per data type
  └── Simplest, best for different schemas/dimensions

Strategy 2: Single collection + metadata filtering
  └── All vectors in one collection, use metadata to filter by type
  └── Simpler to manage, may be slower for very filtered queries

Strategy 3: Namespaces per tenant
  └── One index, one namespace per customer
  └── Good isolation without multiple indexes

Strategy 4: Index per environment
  └── "myapp-staging" index, "myapp-production" index
  └── Complete isolation between environments

7. Batch Ingestion: Loading Data at Scale

When you have thousands or millions of documents to store, you need efficient batch ingestion patterns.

// ─── Robust batch ingestion pipeline ───

import { Pinecone } from '@pinecone-database/pinecone';
import OpenAI from 'openai';

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const openai = new OpenAI();

async function batchIngest(documents, indexName, options = {}) {
  const {
    batchSize = 100,          // Vectors per upsert call
    embeddingBatchSize = 50,  // Texts per embedding call
    namespace = '',
    onProgress = () => {},
  } = options;

  const index = pinecone.index(indexName);
  const ns = namespace ? index.namespace(namespace) : index;

  let processed = 0;
  const total = documents.length;

  // Process in embedding batches
  for (let i = 0; i < documents.length; i += embeddingBatchSize) {
    const batch = documents.slice(i, i + embeddingBatchSize);

    // Generate embeddings for the batch
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: batch.map((doc) => doc.text),
    });

    // Prepare vectors
    const vectors = batch.map((doc, idx) => ({
      id: doc.id,
      values: response.data[idx].embedding,
      metadata: {
        text: doc.text.slice(0, 1000),  // Pinecone metadata limit: truncate long text
        source: doc.source || '',
        category: doc.category || '',
        date: doc.date || '',
      },
    }));

    // Upsert in sub-batches if needed
    for (let j = 0; j < vectors.length; j += batchSize) {
      const upsertBatch = vectors.slice(j, j + batchSize);
      await ns.upsert(upsertBatch);
    }

    processed += batch.length;
    onProgress({ processed, total, percent: ((processed / total) * 100).toFixed(1) });
  }

  return { processed, total };
}

// ─── Usage ───

const documents = loadYourDocuments(); // Your document loading logic

const result = await batchIngest(documents, 'knowledge-base', {
  batchSize: 100,
  embeddingBatchSize: 50,
  namespace: 'help-articles',
  onProgress: ({ processed, total, percent }) => {
    console.log(`Progress: ${processed}/${total} (${percent}%)`);
  },
});

console.log(`Ingestion complete: ${result.processed} documents stored`);

Ingestion best practices

| Practice | Why |
|---|---|
| Batch embedding calls | OpenAI allows up to 2048 texts per embedding call — batch to reduce API calls and latency |
| Batch upserts | Most vector DBs limit upsert size (Pinecone: 100 vectors per call, up to 2MB) |
| Truncate metadata | Pinecone limits metadata to 40KB per vector. Store long text elsewhere and reference by ID |
| Use idempotent IDs | Deterministic IDs (e.g., doc_${hash}) allow safe re-runs without duplicates |
| Handle rate limits | Add retry logic with exponential backoff for embedding API rate limits |
| Track progress | Log batch numbers so you can resume from the last successful batch on failure |
| Validate dimensions | Ensure all embeddings have the same dimension as the index/collection |
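
The idempotent-ID practice takes only a few lines. This sketch uses a tiny FNV-1a hash to stay dependency-free; a real pipeline would more likely hash with SHA-256 via Node's node:crypto:

```javascript
// Deterministic vector IDs: the same source + chunk + text always maps to
// the same ID, so re-running ingestion upserts (overwrites) instead of
// inserting duplicates.
function fnv1a(str) {
  let hash = 0x811c9dc5;                      // FNV offset basis (32-bit)
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept unsigned
  }
  return hash.toString(16).padStart(8, '0');
}

function makeVectorId(source, chunkIndex, text) {
  return `doc_${fnv1a(`${source}:${chunkIndex}:${text}`)}`;
}

const a = makeVectorId('help-center', 0, 'Refunds are processed within 5-7 business days.');
const b = makeVectorId('help-center', 0, 'Refunds are processed within 5-7 business days.');
console.log(a === b); // true: safe to re-run ingestion
```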

8. Updating and Deleting Vectors

Vectors are not static — documents change, get deleted, or need re-embedding.

// ─── Pinecone: Update, delete, and check ───

const index = pinecone.index('knowledge-base');

// Update: upsert with the same ID replaces the vector
await index.upsert([
  {
    id: 'doc_001',
    values: newEmbedding,       // Re-embedded with updated text
    metadata: {
      text: 'Updated: To reset your password, use the new Security Hub.',
      source: 'help-center',
      category: 'account',
      date: '2026-04-01',       // Updated date
    },
  },
]);

// Delete by ID
await index.deleteOne('doc_001');

// Delete multiple by ID
await index.deleteMany(['doc_001', 'doc_002', 'doc_003']);

// Delete by metadata filter (pod-based indexes; serverless indexes require deleting by ID)
await index.deleteMany({
  filter: {
    source: { $eq: 'deprecated-source' },
  },
});

// Delete all vectors in a namespace
await index.namespace('old-data').deleteAll();

// Check index statistics
const stats = await index.describeIndexStats();
console.log(stats);
// {
//   dimension: 1536,
//   indexFullness: 0,
//   totalRecordCount: 2847,
//   namespaces: {
//     'help-articles': { recordCount: 1500 },
//     'products': { recordCount: 1347 },
//   }
// }

// ─── Chroma: Update and delete ───

const collection = await chroma.getCollection({ name: 'knowledge-base' });

// Update existing documents
await collection.update({
  ids: ['doc_001'],
  embeddings: [newEmbedding],
  documents: ['Updated text content here.'],
  metadatas: [{ source: 'help-center', category: 'account', date: '2026-04-01' }],
});

// Delete by ID
await collection.delete({
  ids: ['doc_001', 'doc_002'],
});

// Delete by metadata filter
await collection.delete({
  where: { source: 'deprecated-source' },
});

// Get collection info
const count = await collection.count();
console.log(`Collection has ${count} documents`);

9. Embedding Model and Dimension Considerations

Choosing the right embedding model affects your vector database setup and performance.

| Model | Provider | Dimensions | Performance | Cost |
|---|---|---|---|---|
| text-embedding-3-small | OpenAI | 1536 | Good | $0.02/1M tokens |
| text-embedding-3-large | OpenAI | 3072 | Better | $0.13/1M tokens |
| text-embedding-ada-002 | OpenAI | 1536 | Older, still good | $0.10/1M tokens |
| voyage-3 | Voyage AI | 1024 | Excellent for code | $0.06/1M tokens |
| embed-v4.0 | Cohere | 1024 | Excellent multilingual | $0.10/1M tokens |
| Open-source (e.g., all-MiniLM-L6-v2) | Hugging Face | 384 | Good for simple tasks | Free |

Dimension trade-offs

Lower dimensions (384-512):
  + Faster search
  + Less memory
  + Cheaper storage
  - Less semantic nuance
  - Lower recall on complex queries

Higher dimensions (1536-3072):
  + More semantic detail
  + Better recall on nuanced queries
  + Better for diverse content
  - Slower search
  - More memory and storage
  - Higher cost

Sweet spot for most applications: 1024-1536 dimensions
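
The memory side of this trade-off is easy to estimate: raw storage is roughly count x dimensions x 4 bytes for float32 vectors. The sketch below adds a 1.5x factor for HNSW graph overhead; that multiplier is a rough assumption for illustration, not a vendor figure:

```javascript
// Back-of-the-envelope memory estimate for a vector collection.
function estimateMemoryMB(vectorCount, dimensions, indexOverhead = 1.5) {
  const rawBytes = vectorCount * dimensions * 4; // 4 bytes per float32 value
  return (rawBytes * indexOverhead) / (1024 * 1024);
}

// 1M vectors at 1536 dims: ~5.7 GB raw, ~8.6 GB with assumed graph overhead
console.log(estimateMemoryMB(1_000_000, 1536, 1).toFixed(0));   // "5859"
console.log(estimateMemoryMB(1_000_000, 1536, 1.5).toFixed(0)); // "8789"
```

Halving the dimension halves this number, which is why 1024-dim models are attractive at scale.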

Critical rule: dimension consistency

IMPORTANT: All vectors in a collection/index MUST have the same dimension.

If you create an index with dimension=1536:
  ✅ text-embedding-3-small (1536 dims) → works
  ❌ text-embedding-3-large (3072 dims) → ERROR: dimension mismatch
  ❌ all-MiniLM-L6-v2 (384 dims)       → ERROR: dimension mismatch

If you change embedding models, you MUST:
  1. Create a new collection/index with the new dimension
  2. Re-embed ALL existing documents with the new model
  3. Migrate to the new collection/index
  4. Delete the old collection/index
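
A cheap guard before every upsert turns this class of error into a clear message. `assertDimensions` is a hypothetical helper, not an SDK function:

```javascript
// Fail fast on dimension mismatches before anything reaches the database.
function assertDimensions(vectors, expectedDim) {
  for (const v of vectors) {
    const values = v.values ?? v.vector ?? v; // Pinecone uses "values", Qdrant "vector"
    if (values.length !== expectedDim) {
      throw new Error(
        `Vector ${v.id ?? '?'} has ${values.length} dims, index expects ${expectedDim}`
      );
    }
  }
}

const ok = [{ id: 'a', values: new Array(1536).fill(0) }];
assertDimensions(ok, 1536); // passes silently

try {
  assertDimensions([{ id: 'b', values: [0.1, 0.2, 0.3] }], 1536);
} catch (e) {
  console.log(e.message); // "Vector b has 3 dims, index expects 1536"
}
```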

10. Key Takeaways

  1. Vector databases are purpose-built for storing and searching high-dimensional embeddings — regular databases cannot perform similarity search efficiently at scale.
  2. Every vector record has three parts: a unique ID, the embedding vector, and metadata (JSON-like structured data about the source document).
  3. HNSW is the dominant indexing algorithm — it provides O(log n) approximate nearest-neighbor search with 95-99%+ recall; IVF is an alternative for very large or memory-constrained workloads.
  4. Collections organize vectors into logical groups (like tables); Pinecone also uses namespaces for lightweight partitioning within an index.
  5. Batch ingestion is essential at scale — batch your embedding API calls and vector upserts, use idempotent IDs, and track progress for resumability.
  6. Dimension consistency is non-negotiable — every vector in a collection must have the same number of dimensions, and changing embedding models requires full re-indexing.

Explain-It Challenge

  1. A colleague asks: "Why can't we just add a column to our PostgreSQL users table to store embeddings and run similarity search?" Explain when this works, when it breaks, and what pgvector offers.
  2. Your team needs to re-embed 2 million documents with a new model. Design the migration plan — what steps are needed, and how do you avoid downtime?
  3. Explain to a product manager (non-technical) what HNSW does and why "approximate" nearest-neighbor is acceptable for a search feature.

Navigation: <- 4.12 Overview | 4.12.b — Querying Similar Vectors ->