Episode 4 — Generative AI Engineering / 4.11 — Understanding Embeddings
4.11.b — Similarity Search
In one sentence: Similarity search finds documents whose embedding vectors are closest to a query vector — using metrics like cosine similarity, Euclidean distance, or dot product — enabling you to find relevant content by meaning rather than exact keyword matches.
Navigation: ← 4.11.a — What Embeddings Represent · 4.11.c — Document Chunking Strategies →
1. The Core Idea: Finding the Nearest Vectors
Once you have text converted to vectors (embeddings), the fundamental operation is: given a query vector, find the most similar vectors in a collection.
Similarity Search — Conceptual Flow:
Step 1: You have a collection of embedded documents
┌──────────────────────────────────────────────────────┐
│ doc1: "React hooks tutorial"    → [0.12, -0.34, ...] │
│ doc2: "Python machine learning" → [0.45, 0.23, ...]  │
│ doc3: "Vue component patterns"  → [0.11, -0.31, ...] │
│ doc4: "Baking sourdough bread"  → [-0.56, 0.12, ...] │
│ doc5: "JavaScript event loop"   → [0.18, -0.29, ...] │
└──────────────────────────────────────────────────────┘
Step 2: User asks a question — embed the query
"How do I manage state in React?" → [0.14, -0.32, ...]
Step 3: Compare query vector to every document vector
similarity(query, doc1) = 0.93 ← MOST SIMILAR
similarity(query, doc3) = 0.87 ← Similar (Vue ≈ React)
similarity(query, doc5) = 0.72 ← Somewhat related (JS)
similarity(query, doc2) = 0.41 ← Different domain
similarity(query, doc4) = 0.08 ← Completely unrelated
Step 4: Return top-k most similar
Results: [doc1, doc3, doc5] (top 3)
This is the backbone of RAG (Retrieval-Augmented Generation), semantic search, recommendation systems, and duplicate detection.
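Those four steps fit in a few lines of plain JavaScript. The sketch below uses made-up 2D vectors in place of real embeddings, so the document texts and similarity values are illustrative only:

```javascript
// Toy end-to-end similarity search over 2D "embeddings" (illustrative values)
function cosine(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

const docs = [
  { text: 'React hooks tutorial',   embedding: [0.9, 0.1] },
  { text: 'Baking sourdough bread', embedding: [-0.2, 0.95] },
  { text: 'Vue component patterns', embedding: [0.85, 0.2] },
];

const query = [0.92, 0.12]; // pretend this is the embedded question

// Score every document, sort descending, keep top-k
const topK = docs
  .map(d => ({ text: d.text, similarity: cosine(query, d.embedding) }))
  .sort((a, b) => b.similarity - a.similarity)
  .slice(0, 2);

console.log(topK.map(r => r.text));
// → [ 'React hooks tutorial', 'Vue component patterns' ]
```

Both frontend documents outrank the baking document, even though no string matching happened anywhere.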
2. Cosine Similarity — The Most Common Metric
Cosine similarity measures the angle between two vectors, ignoring their magnitude (length). It returns a value between -1 and 1:
Cosine Similarity Scale:
1.0 ──── Identical meaning (same direction)
0.9 ──── Very similar
0.8 ──── Similar
0.7 ──── Related
0.5 ──── Somewhat related
0.3 ──── Weakly related
0.0 ──── No relationship (perpendicular)
-1.0 ──── Opposite meaning (rare in practice)
The math (simplified)
cosine_similarity(A, B) = (A · B) / (|A| × |B|)
Where:
A · B = sum of (A[i] × B[i]) for all dimensions (dot product)
|A| = sqrt(sum of A[i]²) for all dimensions (magnitude)
|B| = sqrt(sum of B[i]²) for all dimensions (magnitude)
For NORMALIZED vectors (length = 1, which OpenAI returns):
cosine_similarity(A, B) = A · B (just the dot product!)
Because |A| = 1 and |B| = 1, the denominator is 1.
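A quick numeric check of that claim, with helper names of my own choosing: normalize two arbitrary vectors to unit length, and the plain dot product matches the full cosine formula.

```javascript
// Normalize a vector to unit length (magnitude 1)
function normalize(vec) {
  const mag = Math.sqrt(vec.reduce((s, x) => s + x * x, 0));
  return vec.map(x => x / mag);
}

// Dot product: sum of element-wise products
function dot(a, b) {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

// Full cosine formula: dot product divided by the product of magnitudes
function cosine(a, b) {
  const magA = Math.sqrt(dot(a, a));
  const magB = Math.sqrt(dot(b, b));
  return dot(a, b) / (magA * magB);
}

const a = normalize([3, 1, -2]);
const b = normalize([2, 4, 0.5]);

// For unit vectors the two values agree (up to floating-point error)
console.log(cosine(a, b), dot(a, b));
```

This is why vector stores often skip the magnitude computation entirely when the vectors are known to be normalized.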
Why cosine similarity works for embeddings
Geometric Intuition (2D simplified):
  ▲
  │         B (doc2)
  │        /
  │       /   θ = small angle → high similarity
  │      / /
  │     / /  A (query)
  │    / /
  │   //
  └──────────────────►
Cosine similarity = cos(θ)
Small angle (similar direction) → cos(θ) ≈ 1.0
Right angle (unrelated) → cos(θ) = 0.0
Opposite direction → cos(θ) = -1.0
KEY: It only cares about DIRECTION, not length.
A short document and a long document about the same topic
will have similar directions (high cosine similarity).
Implementation
// Cosine similarity between two vectors
function cosineSimilarity(vecA, vecB) {
  if (vecA.length !== vecB.length) {
    throw new Error('Vectors must have same dimensions');
  }
  let dotProduct = 0;
  let magnitudeA = 0;
  let magnitudeB = 0;
  for (let i = 0; i < vecA.length; i++) {
    dotProduct += vecA[i] * vecB[i];
    magnitudeA += vecA[i] * vecA[i];
    magnitudeB += vecB[i] * vecB[i];
  }
  magnitudeA = Math.sqrt(magnitudeA);
  magnitudeB = Math.sqrt(magnitudeB);
  if (magnitudeA === 0 || magnitudeB === 0) return 0;
  return dotProduct / (magnitudeA * magnitudeB);
}

// For normalized vectors (like OpenAI embeddings), this is equivalent:
function cosineSimilarityNormalized(vecA, vecB) {
  let dotProduct = 0;
  for (let i = 0; i < vecA.length; i++) {
    dotProduct += vecA[i] * vecB[i];
  }
  return dotProduct;
}

// Example usage
const similarity = cosineSimilarity(
  [0.1, 0.2, 0.3],
  [0.1, 0.2, 0.31]
);
console.log(similarity); // ~0.9999 (very similar)
3. Euclidean Distance
Euclidean distance measures the straight-line distance between two points in vector space. It's the "as the crow flies" distance.
Euclidean distance = sqrt( sum of (A[i] - B[i])² )
Scale:
0.0 ──── Identical (same point)
0.5 ──── Very close
1.0 ──── Moderately close
1.5+ ──── Far apart
NOTE: Unlike cosine similarity, LOWER is better.
Euclidean distance = 0 means identical.
Cosine similarity = 1 means identical.
Geometric intuition
  ▲
  │      ● B (doc2)
  │      |
  │      |  d = Euclidean distance
  │      |
  │      ● A (query)
  │
  └──────────────────►
Euclidean distance measures the LENGTH of the line
connecting two points. Shorter = more similar.
Implementation
function euclideanDistance(vecA, vecB) {
  if (vecA.length !== vecB.length) {
    throw new Error('Vectors must have same dimensions');
  }
  let sum = 0;
  for (let i = 0; i < vecA.length; i++) {
    const diff = vecA[i] - vecB[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// Example
const distance = euclideanDistance(
  [1.0, 2.0, 3.0],
  [1.1, 2.1, 3.1]
);
console.log(distance); // ~0.173 (very close)
When to use Euclidean distance
Euclidean distance is SENSITIVE to vector magnitude.
Vector A: [1, 0] (short vector)
Vector B: [100, 0] (long vector, same direction)
Cosine similarity: 1.0 (same direction → identical)
Euclidean distance: 99.0 (far apart → different)
For NORMALIZED vectors (like OpenAI embeddings, all length 1):
Euclidean distance and cosine similarity are mathematically related:
euclidean² = 2 × (1 - cosine_similarity)
So for normalized vectors, they give equivalent rankings.
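That identity is easy to verify numerically. A small sketch (helper names are mine): normalize two vectors, then compare the squared Euclidean distance against 2 × (1 − dot product).

```javascript
// Check the identity euclidean² = 2 × (1 − cosine) for unit vectors
function normalize(vec) {
  const mag = Math.sqrt(vec.reduce((s, x) => s + x * x, 0));
  return vec.map(x => x / mag);
}

// Dot product; for unit vectors this equals cosine similarity
function dot(a, b) {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

// Straight-line distance between two points
function euclidean(a, b) {
  return Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0));
}

const a = normalize([1, 2, 3]);
const b = normalize([2, 1, -1]);

const lhs = euclidean(a, b) ** 2;
const rhs = 2 * (1 - dot(a, b));

console.log(lhs, rhs); // the two values match (up to floating-point error)
```

Because one is a monotone function of the other, sorting by either metric produces the same ranking on normalized vectors.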
4. Dot Product
The dot product is the simplest metric — just multiply corresponding elements and sum them up. For normalized vectors, it equals cosine similarity.
dot_product(A, B) = sum of (A[i] × B[i])
For normalized vectors: dot_product = cosine_similarity
For non-normalized vectors: dot_product also considers magnitude
Implementation
function dotProduct(vecA, vecB) {
  if (vecA.length !== vecB.length) {
    throw new Error('Vectors must have same dimensions');
  }
  let result = 0;
  for (let i = 0; i < vecA.length; i++) {
    result += vecA[i] * vecB[i];
  }
  return result;
}

// For normalized OpenAI embeddings, this IS cosine similarity
const sim = dotProduct(embeddingA, embeddingB);
console.log(sim); // e.g., 0.87
Why use dot product over cosine similarity? It's slightly faster (no magnitude calculation needed), and for normalized vectors it gives the same result. Many vector databases default to dot product on normalized vectors for exactly this reason.
5. Comparing Similarity Metrics
| Metric | Range | "Similar" means | Sensitive to magnitude? | Speed | Best for |
|---|---|---|---|---|---|
| Cosine similarity | -1 to 1 | Higher is better | No | Medium | General-purpose, most embedding use cases |
| Euclidean distance | 0 to infinity | Lower is better | Yes | Medium | When magnitude matters (rare for embeddings) |
| Dot product | -infinity to infinity | Higher is better | Yes | Fast | Normalized vectors (equivalent to cosine) |
Decision tree
Which metric should I use?

Are your vectors normalized (length = 1)?
├── YES (OpenAI embeddings are normalized)
│   └── Use DOT PRODUCT (fastest, equivalent to cosine)
│
└── NO (custom embeddings, unnormalized)
    ├── Does magnitude carry meaning?
    │   ├── YES → Use EUCLIDEAN DISTANCE or DOT PRODUCT
    │   └── NO  → Use COSINE SIMILARITY (ignores magnitude)
    │
    └── Not sure → Use COSINE SIMILARITY (safest default)
Side-by-side comparison
import OpenAI from 'openai';

const openai = new OpenAI();

async function compareMetrics(textA, textB) {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: [textA, textB],
  });
  const vecA = response.data[0].embedding;
  const vecB = response.data[1].embedding;
  console.log(`\nComparing:\n  A: "${textA}"\n  B: "${textB}"\n`);
  console.log(`  Cosine similarity:  ${cosineSimilarity(vecA, vecB).toFixed(4)}`);
  console.log(`  Dot product:        ${dotProduct(vecA, vecB).toFixed(4)}`);
  console.log(`  Euclidean distance: ${euclideanDistance(vecA, vecB).toFixed(4)}`);
}

await compareMetrics(
  'JavaScript is a programming language',
  'TypeScript adds types to JavaScript'
);
// Cosine similarity:  0.8732
// Dot product:        0.8732 (same — vectors are normalized)
// Euclidean distance: 0.5038 (low = similar)

await compareMetrics(
  'JavaScript is a programming language',
  'The weather is sunny today'
);
// Cosine similarity:  0.1245
// Dot product:        0.1245
// Euclidean distance: 1.3224 (high = dissimilar)
6. Practical Similarity Search
Building a simple in-memory search engine
import OpenAI from 'openai';

const openai = new OpenAI();

class SimpleVectorSearch {
  constructor() {
    this.documents = []; // { text, embedding }
  }

  // Index documents
  async addDocuments(texts) {
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: texts,
    });
    for (let i = 0; i < texts.length; i++) {
      this.documents.push({
        text: texts[i],
        embedding: response.data[i].embedding,
      });
    }
    console.log(`Indexed ${texts.length} documents (total: ${this.documents.length})`);
  }

  // Search by query
  async search(query, topK = 3) {
    // Step 1: Embed the query
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: query,
    });
    const queryEmbedding = response.data[0].embedding;

    // Step 2: Calculate similarity to every document
    const results = this.documents.map(doc => ({
      text: doc.text,
      similarity: cosineSimilarity(queryEmbedding, doc.embedding),
    }));

    // Step 3: Sort by similarity (highest first) and return top-k
    results.sort((a, b) => b.similarity - a.similarity);
    return results.slice(0, topK);
  }
}

// Usage
const search = new SimpleVectorSearch();
await search.addDocuments([
  'React is a JavaScript library for building user interfaces',
  'Vue.js is a progressive framework for building UIs',
  'Express.js is a minimal web framework for Node.js',
  'PostgreSQL is a powerful open-source relational database',
  'MongoDB is a NoSQL document database',
  'Redis is an in-memory data structure store',
  'Docker containers package applications with dependencies',
  'Kubernetes orchestrates container deployments',
  'Git is a distributed version control system',
  'CI/CD pipelines automate testing and deployment',
]);

const results = await search.search('frontend framework for building web apps');
console.log('\nSearch: "frontend framework for building web apps"\n');
results.forEach((r, i) => {
  console.log(`  ${i + 1}. [${r.similarity.toFixed(3)}] ${r.text}`);
});
// Search: "frontend framework for building web apps"
//   1. [0.891] React is a JavaScript library for building user interfaces
//   2. [0.872] Vue.js is a progressive framework for building UIs
//   3. [0.634] Express.js is a minimal web framework for Node.js
7. Semantic Search vs Keyword Search
This is the killer feature of embeddings. Keyword search matches exact words. Semantic search matches meaning.
Example Query: "How do I handle errors in my API?"
KEYWORD SEARCH (traditional):
Searches for exact words: "handle", "errors", "API"
✓ Matches: "Handle errors in your API with try-catch"
✗ Misses: "Exception management for REST endpoints" (no shared keywords!)
✗ Misses: "Error recovery strategies for web services"
SEMANTIC SEARCH (embeddings):
Searches for meaning: "dealing with problems in web service code"
✓ Matches: "Handle errors in your API with try-catch" (sim: 0.91)
✓ Matches: "Exception management for REST endpoints" (sim: 0.87)
✓ Matches: "Error recovery strategies for web services" (sim: 0.85)
✓ Matches: "Debugging failed HTTP requests" (sim: 0.79)
Detailed comparison
| Feature | Keyword Search | Semantic Search |
|---|---|---|
| Matches | Exact words/phrases | Meaning and concepts |
| "car" finds "automobile"? | No | Yes |
| Typo tolerance | Poor (unless fuzzy matching) | Good (embeddings are robust) |
| Speed | Very fast (inverted index) | Fast (approximate nearest neighbor) |
| Setup | Simple (text index) | Requires embedding model + vector DB |
| Cost | Free (no API calls) | Embedding API calls + vector storage |
| Precision on exact terms | Excellent | Good (may return loosely related results) |
| Multilingual | Needs separate indexes | Natural (concepts cross languages) |
When to use each
Use KEYWORD SEARCH when:
✓ Users search for exact product names, IDs, error codes
✓ You need exact phrase matching ("error 404")
✓ Cost is a major constraint
✓ The vocabulary is well-defined and consistent
Use SEMANTIC SEARCH when:
✓ Users describe what they want in natural language
✓ Synonyms and paraphrasing are common
✓ Documents use different terminology than queries
✓ You're building a RAG pipeline
Use BOTH (hybrid search) when:
✓ You want the precision of keywords + the recall of semantics
✓ Most production search systems use hybrid approaches
Hybrid search example
// Hybrid search: combine semantic and keyword scores.
// Assumes semanticSearch(query, documents) returns one { similarity } entry
// per document, in the same order as `documents` (not pre-sorted).
async function hybridSearch(query, documents, topK = 5) {
  // Semantic scores (one per document, aligned by index)
  const semanticResults = await semanticSearch(query, documents);

  // Keyword scores (simple word-overlap approximation)
  const keywordResults = keywordSearch(query, documents);

  // Combine scores (weighted)
  const alpha = 0.7; // Weight for semantic search
  const combined = documents.map((doc, i) => ({
    text: doc.text,
    score: alpha * (semanticResults[i]?.similarity || 0) +
           (1 - alpha) * (keywordResults[i]?.score || 0),
  }));

  combined.sort((a, b) => b.score - a.score);
  return combined.slice(0, topK);
}

// Simple keyword scoring: fraction of query words present in the document
function keywordSearch(query, documents) {
  const queryWords = query.toLowerCase().split(/\s+/);
  return documents.map(doc => {
    const docWords = doc.text.toLowerCase().split(/\s+/);
    const matches = queryWords.filter(w => docWords.includes(w)).length;
    return {
      text: doc.text,
      score: matches / queryWords.length, // 0 to 1
    };
  });
}
8. Similarity Thresholds: When Is "Similar Enough"?
Not every result from similarity search is actually relevant. You need a threshold to filter out noise.
Cosine Similarity Threshold Guidelines (text-embedding-3-small):
>= 0.90   │ Very high confidence — near-duplicate or paraphrase
0.80-0.90 │ High confidence — clearly about the same topic
0.70-0.80 │ Moderate confidence — related topic, probably relevant
0.60-0.70 │ Low confidence — tangentially related
0.50-0.60 │ Very low confidence — might be related by accident
< 0.50    │ Noise — probably not relevant at all
IMPORTANT: These thresholds vary by:
- Embedding model (different models, different scales)
- Domain (medical text vs casual conversation)
- Application (strict factual retrieval vs broad exploration)
- Document length (shorter texts tend to have lower similarity)
Always CALIBRATE thresholds on your specific data.
Implementing thresholds
async function searchWithThreshold(query, documents, {
  topK = 5,
  minSimilarity = 0.7,
  maxResults = 10
} = {}) {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query,
  });
  const queryVec = response.data[0].embedding;

  const results = documents
    .map(doc => ({
      text: doc.text,
      metadata: doc.metadata,
      similarity: cosineSimilarity(queryVec, doc.embedding),
    }))
    .filter(r => r.similarity >= minSimilarity) // Apply threshold
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, Math.min(topK, maxResults));

  if (results.length === 0) {
    console.log(`No results above threshold ${minSimilarity}`);
    return { results: [], belowThreshold: true };
  }
  return { results, belowThreshold: false };
}

// Usage with different thresholds for different scenarios

// Strict: factual lookup (only very relevant results)
const strictResults = await searchWithThreshold(query, docs, {
  minSimilarity: 0.85,
  topK: 3,
});

// Moderate: general search
const moderateResults = await searchWithThreshold(query, docs, {
  minSimilarity: 0.70,
  topK: 5,
});

// Loose: exploration (cast a wide net)
const looseResults = await searchWithThreshold(query, docs, {
  minSimilarity: 0.50,
  topK: 10,
});
Calibrating thresholds for your data
// Generate threshold calibration data
async function calibrateThreshold(testQueries) {
  // testQueries = [{ query, relevantDocs: [...], irrelevantDocs: [...] }]
  const allScores = { relevant: [], irrelevant: [] };
  for (const test of testQueries) {
    const queryEmb = await getEmbedding(test.query);
    for (const doc of test.relevantDocs) {
      const docEmb = await getEmbedding(doc);
      allScores.relevant.push(cosineSimilarity(queryEmb, docEmb));
    }
    for (const doc of test.irrelevantDocs) {
      const docEmb = await getEmbedding(doc);
      allScores.irrelevant.push(cosineSimilarity(queryEmb, docEmb));
    }
  }

  // Simple heuristic: use the midpoint between the average relevant and
  // average irrelevant scores as a starting threshold, then tune from there
  const avgRelevant = average(allScores.relevant);
  const avgIrrelevant = average(allScores.irrelevant);
  const suggestedThreshold = (avgRelevant + avgIrrelevant) / 2;

  console.log(`Average relevant similarity: ${avgRelevant.toFixed(3)}`);
  console.log(`Average irrelevant similarity: ${avgIrrelevant.toFixed(3)}`);
  console.log(`Suggested threshold: ${suggestedThreshold.toFixed(3)}`);
  return suggestedThreshold;
}

function average(arr) {
  return arr.reduce((a, b) => a + b, 0) / arr.length;
}
9. Performance: Brute-Force vs Approximate Nearest Neighbor
For small datasets (up to a few tens of thousands of documents), comparing the query to every document (brute force) is fine. For larger datasets, you need Approximate Nearest Neighbor (ANN) algorithms.
Search performance (indicative):

| Documents | Brute force | ANN (HNSW) | ANN accuracy |
|---|---|---|---|
| 1,000 | < 1 ms | < 1 ms | 99%+ |
| 10,000 | ~5 ms | < 1 ms | 99%+ |
| 100,000 | ~50 ms | ~1-2 ms | 98%+ |
| 1,000,000 | ~500 ms | ~2-5 ms | 95-98% |
| 10,000,000 | ~5 seconds | ~5-10 ms | 95%+ |
ANN algorithms trade a tiny bit of accuracy for MASSIVE speed gains.
"Approximate" means it might miss the #1 result but will almost
always find it in the top 5.
ANN algorithms (what vector databases use)
Common ANN Algorithms:
HNSW (Hierarchical Navigable Small World):
- Used by: Pinecone, Qdrant, Weaviate, pgvector
- How: Builds a graph of vectors, navigates it like a skip list
- Pros: Fast, accurate, good for dynamic data
- Cons: High memory usage
IVF (Inverted File Index):
- Used by: FAISS, some Pinecone configurations
- How: Clusters vectors, only searches nearby clusters
- Pros: Lower memory, good for very large datasets
- Cons: Requires training step, slightly less accurate
Product Quantization (PQ):
- Used by: FAISS (combined with IVF)
- How: Compresses vectors to reduce memory
- Pros: Very low memory
- Cons: Lower accuracy, complex setup
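To make the IVF idea concrete, here is a toy sketch (my own simplification, not FAISS's actual implementation): each vector is bucketed under its nearest centroid at index time, and a query only scans the buckets of its `nProbe` closest centroids instead of the whole collection.

```javascript
// Toy IVF-style index: partition vectors by nearest centroid,
// then search only the closest cluster(s). A teaching sketch, not FAISS.
function dot(a, b) {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

class ToyIVF {
  constructor(centroids) {
    this.centroids = centroids;              // fixed, pre-"trained" centroids
    this.clusters = centroids.map(() => []); // one bucket per centroid
  }

  // Index of the centroid most similar to `vec` (dot product on unit vectors)
  nearestCentroid(vec) {
    let best = 0;
    for (let i = 1; i < this.centroids.length; i++) {
      if (dot(vec, this.centroids[i]) > dot(vec, this.centroids[best])) best = i;
    }
    return best;
  }

  add(vec, meta) {
    this.clusters[this.nearestCentroid(vec)].push({ vec, meta });
  }

  // nProbe = how many clusters to scan (more = slower but more accurate)
  search(queryVec, topK = 3, nProbe = 1) {
    const probed = this.centroids
      .map((c, i) => ({ i, score: dot(queryVec, c) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, nProbe);
    const candidates = probed.flatMap(({ i }) => this.clusters[i]);
    return candidates
      .map(({ vec, meta }) => ({ meta, score: dot(queryVec, vec) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}

// Usage with illustrative 2D vectors
const index = new ToyIVF([[1, 0], [0, 1]]);
index.add([0.95, 0.05], 'frontend doc');
index.add([0.9, 0.1], 'another frontend doc');
index.add([0.05, 0.99], 'cooking doc');

console.log(index.search([1, 0], 2, 1));
// Only the cluster around centroid [1, 0] is scanned; the cooking
// doc is never even compared against the query
```

This is also where the "approximate" trade-off lives: with `nProbe = 1`, a relevant vector that happened to land in a neighboring cluster would be missed.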
In-memory search for small datasets
// For datasets under ~50,000 documents, brute force works fine
class BruteForceSearch {
  constructor() {
    this.vectors = []; // Float32Array for efficiency
    this.metadata = [];
  }

  add(embedding, meta) {
    this.vectors.push(new Float32Array(embedding));
    this.metadata.push(meta);
  }

  search(queryEmbedding, topK = 5) {
    const queryVec = new Float32Array(queryEmbedding);
    const scores = [];
    for (let i = 0; i < this.vectors.length; i++) {
      // Dot product for normalized vectors = cosine similarity
      let score = 0;
      for (let j = 0; j < queryVec.length; j++) {
        score += queryVec[j] * this.vectors[i][j];
      }
      scores.push({ index: i, score, metadata: this.metadata[i] });
    }
    scores.sort((a, b) => b.score - a.score);
    return scores.slice(0, topK);
  }
}
10. Common Pitfalls
| Pitfall | Explanation | Solution |
|---|---|---|
| Mixing embedding models | Query embedded with model A, documents with model B → garbage results | Always use the same model for queries and documents |
| No threshold filtering | Returning results with 0.3 similarity → irrelevant noise | Set a minimum similarity threshold |
| Embedding too much text | Embedding a 50-page document as one vector → diluted meaning | Chunk documents (see 4.11.c) |
| Ignoring metadata | Returning semantically similar but contextually wrong results | Combine similarity with metadata filters (date, category, author) |
| Not normalizing | Using cosine similarity on unnormalized vectors → correct but slower | Normalize vectors at index time, use dot product |
| Stale embeddings | Documents updated but embeddings not re-generated | Build a re-embedding pipeline triggered by content updates |
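The metadata pitfall in particular is easy to sketch: apply structured filters first, then rank only the survivors by similarity. The fields and document contents below are illustrative (my own), and `dot` assumes normalized vectors:

```javascript
// Metadata-filtered similarity search: structured filter first, then rank
function dot(a, b) {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

function filteredSearch(queryVec, docs, { category, topK = 3 } = {}) {
  return docs
    .filter(d => !category || d.category === category) // metadata filter
    .map(d => ({ text: d.text, similarity: dot(queryVec, d.embedding) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
}

const docs = [
  { text: '2019 pricing guide', category: 'archived', embedding: [0.9, 0.1] },
  { text: '2024 pricing guide', category: 'current',  embedding: [0.88, 0.12] },
];

// Without the filter, the archived doc would win on similarity alone
console.log(filteredSearch([1, 0], docs, { category: 'current' }));
// → [ { text: '2024 pricing guide', similarity: 0.88 } ]
```

Production vector databases apply the same idea natively (often called "metadata filtering" or "pre-filtering"), which is usually faster than filtering after retrieval.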
11. Real-World Similarity Search Patterns
Pattern 1: Semantic FAQ lookup
// Find the most relevant FAQ for a user's question.
// Assumes `faqIndex` is a vector index (e.g. SimpleVectorSearch above)
// whose indexed entries carry the answer text in `metadata.answer`.
async function findFAQ(userQuestion, faqIndex) {
  const results = await faqIndex.search(userQuestion, 1);
  if (results.length > 0 && results[0].similarity > 0.85) {
    return {
      found: true,
      question: results[0].text,
      answer: results[0].metadata.answer,
      confidence: results[0].similarity,
    };
  }
  return { found: false, message: 'No matching FAQ found' };
}
Pattern 2: Duplicate detection
// Check if a support ticket is a duplicate
async function isDuplicate(newTicket, existingTickets, threshold = 0.92) {
  const newEmb = await getEmbedding(newTicket.description);
  for (const ticket of existingTickets) {
    const sim = cosineSimilarity(newEmb, ticket.embedding);
    if (sim >= threshold) {
      return {
        isDuplicate: true,
        duplicateOf: ticket.id,
        similarity: sim,
      };
    }
  }
  return { isDuplicate: false };
}
Pattern 3: Content recommendation
// Recommend similar articles based on what the user just read
async function recommendSimilar(currentArticle, allArticles, topK = 5) {
  const currentEmb = await getEmbedding(currentArticle.text);
  const recommendations = allArticles
    .filter(a => a.id !== currentArticle.id) // Exclude current
    .map(article => ({
      ...article,
      similarity: cosineSimilarity(currentEmb, article.embedding),
    }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
  return recommendations;
}
12. Key Takeaways
- Cosine similarity is the standard metric for embedding search — it measures angle between vectors, ranges from -1 to 1, higher is more similar.
- For normalized vectors (like OpenAI embeddings), dot product equals cosine similarity and is faster to compute.
- Semantic search finds meaning, not keywords — "fix a bug" matches "debugging techniques" even with zero shared words.
- Set similarity thresholds to filter noise — 0.7-0.8 is a typical starting point, but always calibrate on your data.
- Brute-force search works for up to ~50K documents — beyond that, use ANN algorithms via vector databases.
- Hybrid search (semantic + keyword) is the production best practice — you get the recall of embeddings with the precision of keywords.
- Never mix embedding models — the query and documents must be embedded with the same model.
Explain-It Challenge
- A product manager asks "why doesn't our search find results when users type 'how to fix broken code' but the document says 'debugging techniques'?" Explain how switching from keyword to semantic search solves this.
- Two documents have cosine similarity 0.65. Is this "similar enough" to show as a search result? What factors affect your decision?
- Your vector database has 50 million documents. Why can't you just loop through all of them to find the closest match? What do real systems do instead?
Navigation: ← 4.11.a — What Embeddings Represent · 4.11.c — Document Chunking Strategies →