Episode 4 — Generative AI Engineering / 4.11 — Understanding Embeddings
4.11.b — Similarity Search
In one sentence: Similarity search finds documents whose embedding vectors are closest to a query vector — using metrics like cosine similarity, Euclidean distance, or dot product — enabling you to find relevant content by meaning rather than exact keyword matches.
Navigation: ← 4.11.a — What Embeddings Represent · 4.11.c — Document Chunking Strategies →
1. The Core Idea: Finding the Nearest Vectors
Once you have text converted to vectors (embeddings), the fundamental operation is: given a query vector, find the most similar vectors in a collection.
Similarity Search — Conceptual Flow:
Step 1: You have a collection of embedded documents
┌──────────────────────────────────────────────────────┐
│ doc1: "React hooks tutorial"    → [0.12, -0.34, ...] │
│ doc2: "Python machine learning" → [0.45, 0.23, ...]  │
│ doc3: "Vue component patterns"  → [0.11, -0.31, ...] │
│ doc4: "Baking sourdough bread"  → [-0.56, 0.12, ...] │
│ doc5: "JavaScript event loop"   → [0.18, -0.29, ...] │
└──────────────────────────────────────────────────────┘
Step 2: User asks a question — embed the query
"How do I manage state in React?" → [0.14, -0.32, ...]
Step 3: Compare query vector to every document vector
similarity(query, doc1) = 0.93 ← MOST SIMILAR
similarity(query, doc3) = 0.87 ← Similar (Vue ≈ React)
similarity(query, doc5) = 0.72 ← Somewhat related (JS)
similarity(query, doc2) = 0.41 ← Different domain
similarity(query, doc4) = 0.08 ← Completely unrelated
Step 4: Return top-k most similar
Results: [doc1, doc3, doc5] (top 3)
This is the backbone of RAG (Retrieval-Augmented Generation), semantic search, recommendation systems, and duplicate detection.
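Those four steps fit in a few lines of plain JavaScript. The sketch below uses made-up 2D vectors in place of real embeddings, so the document texts and similarity values are illustrative only:

```javascript
// Toy end-to-end similarity search over 2D "embeddings" (illustrative values)
function cosine(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

const docs = [
  { text: 'React hooks tutorial',   embedding: [0.9, 0.1] },
  { text: 'Baking sourdough bread', embedding: [-0.2, 0.95] },
  { text: 'Vue component patterns', embedding: [0.85, 0.2] },
];

const query = [0.92, 0.12]; // pretend this is the embedded question

// Score every document, sort descending, keep top-k
const topK = docs
  .map(d => ({ text: d.text, similarity: cosine(query, d.embedding) }))
  .sort((a, b) => b.similarity - a.similarity)
  .slice(0, 2);

console.log(topK.map(r => r.text));
// → [ 'React hooks tutorial', 'Vue component patterns' ]
```

Both frontend documents outrank the baking document, even though no string matching happened anywhere.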
2. Cosine Similarity — The Most Common Metric
Cosine similarity measures the angle between two vectors, ignoring their magnitude (length). It returns a value between -1 and 1:
Cosine Similarity Scale:
1.0 ──── Identical meaning (same direction)
0.9 ──── Very similar
0.8 ──── Similar
0.7 ──── Related
0.5 ──── Somewhat related
0.3 ──── Weakly related
0.0 ──── No relationship (perpendicular)
-1.0 ──── Opposite meaning (rare in practice)
The math (simplified)
cosine_similarity(A, B) = (A · B) / (|A| × |B|)
Where:
A · B = sum of (A[i] × B[i]) for all dimensions (dot product)
|A| = sqrt(sum of A[i]²) for all dimensions (magnitude)
|B| = sqrt(sum of B[i]²) for all dimensions (magnitude)
For NORMALIZED vectors (length = 1, which OpenAI returns):
cosine_similarity(A, B) = A · B (just the dot product!)
Because |A| = 1 and |B| = 1, the denominator is 1.
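A quick numeric check of that claim, with helper names of my own choosing: normalize two arbitrary vectors to unit length, and the plain dot product matches the full cosine formula.

```javascript
// Normalize a vector to unit length (magnitude 1)
function normalize(vec) {
  const mag = Math.sqrt(vec.reduce((s, x) => s + x * x, 0));
  return vec.map(x => x / mag);
}

// Dot product: sum of element-wise products
function dot(a, b) {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

// Full cosine formula: dot product divided by the product of magnitudes
function cosine(a, b) {
  const magA = Math.sqrt(dot(a, a));
  const magB = Math.sqrt(dot(b, b));
  return dot(a, b) / (magA * magB);
}

const a = normalize([3, 1, -2]);
const b = normalize([2, 4, 0.5]);

// For unit vectors the two values agree (up to floating-point error)
console.log(cosine(a, b), dot(a, b));
```

This is why vector stores often skip the magnitude computation entirely when the vectors are known to be normalized.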
Why cosine similarity works for embeddings
Geometric Intuition (2D simplified):
  ▲
  │         B (doc2)
  │        /
  │       /   θ = small angle → high similarity
  │      / /
  │     / /  A (query)
  │    / /
  │   //
  └──────────────────►
Cosine similarity = cos(θ)
Small angle (similar direction) → cos(θ) ≈ 1.0
Right angle (unrelated) → cos(θ) = 0.0
Opposite direction → cos(θ) = -1.0
KEY: It only cares about DIRECTION, not length.
A short document and a long document about the same topic
will have similar directions (high cosine similarity).
Implementation
// Cosine similarity between two vectors
function cosineSimilarity(vecA, vecB) {
  if (vecA.length !== vecB.length) {
    throw new Error('Vectors must have same dimensions');
  }
  let dotProduct = 0;
  let magnitudeA = 0;
  let magnitudeB = 0;
  for (let i = 0; i < vecA.length; i++) {
    dotProduct += vecA[i] * vecB[i];
    magnitudeA += vecA[i] * vecA[i];
    magnitudeB += vecB[i] * vecB[i];
  }
  magnitudeA = Math.sqrt(magnitudeA);
  magnitudeB = Math.sqrt(magnitudeB);
  if (magnitudeA === 0 || magnitudeB === 0) return 0;
  return dotProduct / (magnitudeA * magnitudeB);
}

// For normalized vectors (like OpenAI embeddings), this is equivalent:
function cosineSimilarityNormalized(vecA, vecB) {
  let dotProduct = 0;
  for (let i = 0; i < vecA.length; i++) {
    dotProduct += vecA[i] * vecB[i];
  }
  return dotProduct;
}

// Example usage
const similarity = cosineSimilarity(
  [0.1, 0.2, 0.3],
  [0.1, 0.2, 0.31]
);
console.log(similarity); // ~0.9999 (very similar)
3. Euclidean Distance
Euclidean distance measures the straight-line distance between two points in vector space. It's the "as the crow flies" distance.
Euclidean distance = sqrt( sum of (A[i] - B[i])² )
Scale:
0.0 ──── Identical (same point)
0.5 ──── Very close
1.0 ──── Moderately close
1.5+ ──── Far apart
NOTE: Unlike cosine similarity, LOWER is better.
Euclidean distance = 0 means identical.
Cosine similarity = 1 means identical.
Geometric intuition
  ▲
  │      ● B (doc2)
  │      |
  │      |  d = Euclidean distance
  │      |
  │      ● A (query)
  │
  └──────────────────►
Euclidean distance measures the LENGTH of the line
connecting two points. Shorter = more similar.
Implementation
function euclideanDistance(vecA, vecB) {
  if (vecA.length !== vecB.length) {
    throw new Error('Vectors must have same dimensions');
  }
  let sum = 0;
  for (let i = 0; i < vecA.length; i++) {
    const diff = vecA[i] - vecB[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// Example
const distance = euclideanDistance(
  [1.0, 2.0, 3.0],
  [1.1, 2.1, 3.1]
);
console.log(distance); // ~0.173 (very close)
When to use Euclidean distance
Euclidean distance is SENSITIVE to vector magnitude.
Vector A: [1, 0] (short vector)
Vector B: [100, 0] (long vector, same direction)
Cosine similarity: 1.0 (same direction → identical)
Euclidean distance: 99.0 (far apart → different)
For NORMALIZED vectors (like OpenAI embeddings, all length 1):
Euclidean distance and cosine similarity are mathematically related:
euclidean² = 2 × (1 - cosine_similarity)
So for normalized vectors, they give equivalent rankings.
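That identity is easy to verify numerically. A small sketch (helper names are mine): normalize two vectors, then compare the squared Euclidean distance against 2 × (1 − dot product).

```javascript
// Check the identity euclidean² = 2 × (1 − cosine) for unit vectors
function normalize(vec) {
  const mag = Math.sqrt(vec.reduce((s, x) => s + x * x, 0));
  return vec.map(x => x / mag);
}

// Dot product; for unit vectors this equals cosine similarity
function dot(a, b) {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

// Straight-line distance between two points
function euclidean(a, b) {
  return Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0));
}

const a = normalize([1, 2, 3]);
const b = normalize([2, 1, -1]);

const lhs = euclidean(a, b) ** 2;
const rhs = 2 * (1 - dot(a, b));

console.log(lhs, rhs); // the two values match (up to floating-point error)
```

Because one is a monotone function of the other, sorting by either metric produces the same ranking on normalized vectors.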
4. Dot Product
The dot product is the simplest metric — just multiply corresponding elements and sum them up. For normalized vectors, it equals cosine similarity.
dot_product(A, B) = sum of (A[i] × B[i])
For normalized vectors: dot_product = cosine_similarity
For non-normalized vectors: dot_product also considers magnitude
Implementation
function dotProduct(vecA, vecB) {
  if (vecA.length !== vecB.length) {
    throw new Error('Vectors must have same dimensions');
  }
  let result = 0;
  for (let i = 0; i < vecA.length; i++) {
    result += vecA[i] * vecB[i];
  }
  return result;
}

// For normalized OpenAI embeddings, this IS cosine similarity
const sim = dotProduct(embeddingA, embeddingB);
console.log(sim); // e.g., 0.87
Why use dot product over cosine similarity? It's slightly faster (no magnitude calculation needed), and for normalized vectors it gives the same result. Many vector databases default to dot product on normalized vectors for exactly this reason.
5. Comparing Similarity Metrics
| Metric | Range | "Similar" means | Sensitive to magnitude? | Speed | Best for |
|---|---|---|---|---|---|
| Cosine similarity | -1 to 1 | Higher is better | No | Medium | General-purpose, most embedding use cases |
| Euclidean distance | 0 to infinity | Lower is better | Yes | Medium | When magnitude matters (rare for embeddings) |
| Dot product | -infinity to infinity | Higher is better | Yes | Fast | Normalized vectors (equivalent to cosine) |
Decision tree
Which metric should I use?

Are your vectors normalized (length = 1)?
├── YES (OpenAI embeddings are normalized)
│   └── Use DOT PRODUCT (fastest, equivalent to cosine)
│
└── NO (custom embeddings, unnormalized)
    ├── Does magnitude carry meaning?
    │   ├── YES → Use EUCLIDEAN DISTANCE or DOT PRODUCT
    │   └── NO  → Use COSINE SIMILARITY (ignores magnitude)
    │
    └── Not sure → Use COSINE SIMILARITY (safest default)
Side-by-side comparison
import OpenAI from 'openai';

const openai = new OpenAI();

async function compareMetrics(textA, textB) {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: [textA, textB],
  });
  const vecA = response.data[0].embedding;
  const vecB = response.data[1].embedding;
  console.log(`\nComparing:\n  A: "${textA}"\n  B: "${textB}"\n`);
  console.log(`  Cosine similarity:  ${cosineSimilarity(vecA, vecB).toFixed(4)}`);
  console.log(`  Dot product:        ${dotProduct(vecA, vecB).toFixed(4)}`);
  console.log(`  Euclidean distance: ${euclideanDistance(vecA, vecB).toFixed(4)}`);
}

await compareMetrics(
  'JavaScript is a programming language',
  'TypeScript adds types to JavaScript'
);
// Cosine similarity:  0.8732
// Dot product:        0.8732 (same — vectors are normalized)
// Euclidean distance: 0.5038 (low = similar)

await compareMetrics(
  'JavaScript is a programming language',
  'The weather is sunny today'
);
// Cosine similarity:  0.1245
// Dot product:        0.1245
// Euclidean distance: 1.3224 (high = dissimilar)
6. Practical Similarity Search
Building a simple in-memory search engine
import OpenAI from 'openai';

const openai = new OpenAI();

class SimpleVectorSearch {
  constructor() {
    this.documents = []; // { text, embedding }
  }

  // Index documents
  async addDocuments(texts) {
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: texts,
    });
    for (let i = 0; i < texts.length; i++) {
      this.documents.push({
        text: texts[i],
        embedding: response.data[i].embedding,
      });
    }
    console.log(`Indexed ${texts.length} documents (total: ${this.documents.length})`);
  }

  // Search by query
  async search(query, topK = 3) {
    // Step 1: Embed the query
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: query,
    });
    const queryEmbedding = response.data[0].embedding;

    // Step 2: Calculate similarity to every document
    const results = this.documents.map(doc => ({
      text: doc.text,
      similarity: cosineSimilarity(queryEmbedding, doc.embedding),
    }));

    // Step 3: Sort by similarity (highest first) and return top-k
    results.sort((a, b) => b.similarity - a.similarity);
    return results.slice(0, topK);
  }
}

// Usage
const search = new SimpleVectorSearch();
await search.addDocuments([
  'React is a JavaScript library for building user interfaces',
  'Vue.js is a progressive framework for building UIs',
  'Express.js is a minimal web framework for Node.js',
  'PostgreSQL is a powerful open-source relational database',
  'MongoDB is a NoSQL document database',
  'Redis is an in-memory data structure store',
  'Docker containers package applications with dependencies',
  'Kubernetes orchestrates container deployments',
  'Git is a distributed version control system',
  'CI/CD pipelines automate testing and deployment',
]);

const results = await search.search('frontend framework for building web apps');
console.log('\nSearch: "frontend framework for building web apps"\n');
results.forEach((r, i) => {
  console.log(`  ${i + 1}. [${r.similarity.toFixed(3)}] ${r.text}`);
});
// Search: "frontend framework for building web apps"
//   1. [0.891] React is a JavaScript library for building user interfaces
//   2. [0.872] Vue.js is a progressive framework for building UIs
//   3. [0.634] Express.js is a minimal web framework for Node.js
7. Semantic Search vs Keyword Search
This is the killer feature of embeddings. Keyword search matches exact words. Semantic search matches meaning.
Example Query: "How do I handle errors in my API?"
KEYWORD SEARCH (traditional):
Searches for exact words: "handle", "errors", "API"
✓ Matches: "Handle errors in your API with try-catch"
✗ Misses: "Exception management for REST endpoints" (no shared keywords!)
✗ Misses: "Error recovery strategies for web services"
SEMANTIC SEARCH (embeddings):
Searches for meaning: "dealing with problems in web service code"
✓ Matches: "Handle errors in your API with try-catch" (sim: 0.91)
✓ Matches: "Exception management for REST endpoints" (sim: 0.87)
✓ Matches: "Error recovery strategies for web services" (sim: 0.85)
✓ Matches: "Debugging failed HTTP requests" (sim: 0.79)
Detailed comparison
| Feature | Keyword Search | Semantic Search |
|---|---|---|
| Matches | Exact words/phrases | Meaning and concepts |
| "car" finds "automobile"? | No | Yes |
| Typo tolerance | Poor (unless fuzzy matching) | Good (embeddings are robust) |
| Speed | Very fast (inverted index) | Fast (approximate nearest neighbor) |
| Setup | Simple (text index) | Requires embedding model + vector DB |
| Cost | Free (no API calls) | Embedding API calls + vector storage |
| Precision on exact terms | Excellent | Good (may return loosely related results) |
| Multilingual | Needs separate indexes | Natural (concepts cross languages) |
When to use each
Use KEYWORD SEARCH when:
✓ Users search for exact product names, IDs, error codes
✓ You need exact phrase matching ("error 404")
✓ Cost is a major constraint
✓ The vocabulary is well-defined and consistent
Use SEMANTIC SEARCH when:
✓ Users describe what they want in natural language
✓ Synonyms and paraphrasing are common
✓ Documents use different terminology than queries
✓ You're building a RAG pipeline
Use BOTH (hybrid search) when:
✓ You want the precision of keywords + the recall of semantics
✓ Most production search systems use hybrid approaches
Hybrid search example
// Hybrid search: combine semantic and keyword scores.
// Assumes semanticSearch(query, documents) returns one { similarity } entry
// per document, in the same order as `documents` (not pre-sorted).
async function hybridSearch(query, documents, topK = 5) {
  // Semantic scores (one per document, aligned by index)
  const semanticResults = await semanticSearch(query, documents);

  // Keyword scores (simple word-overlap approximation)
  const keywordResults = keywordSearch(query, documents);

  // Combine scores (weighted)
  const alpha = 0.7; // Weight for semantic search
  const combined = documents.map((doc, i) => ({
    text: doc.text,
    score: alpha * (semanticResults[i]?.similarity || 0) +
           (1 - alpha) * (keywordResults[i]?.score || 0),
  }));

  combined.sort((a, b) => b.score - a.score);
  return combined.slice(0, topK);
}

// Simple keyword scoring: fraction of query words present in the document
function keywordSearch(query, documents) {
  const queryWords = query.toLowerCase().split(/\s+/);
  return documents.map(doc => {
    const docWords = doc.text.toLowerCase().split(/\s+/);
    const matches = queryWords.filter(w => docWords.includes(w)).length;
    return {
      text: doc.text,
      score: matches / queryWords.length, // 0 to 1
    };
  });
}
8. Similarity Thresholds: When Is "Similar Enough"?
Not every result from similarity search is actually relevant. You need a threshold to filter out noise.
Cosine Similarity Threshold Guidelines (text-embedding-3-small):
>= 0.90   │ Very high confidence — near-duplicate or paraphrase
0.80-0.90 │ High confidence — clearly about the same topic
0.70-0.80 │ Moderate confidence — related topic, probably relevant
0.60-0.70 │ Low confidence — tangentially related
0.50-0.60 │ Very low confidence — might be related by accident
< 0.50    │ Noise — probably not relevant at all
IMPORTANT: These thresholds vary by:
- Embedding model (different models, different scales)
- Domain (medical text vs casual conversation)
- Application (strict factual retrieval vs broad exploration)
- Document length (shorter texts tend to have lower similarity)
Always CALIBRATE thresholds on your specific data.
Implementing thresholds
async function searchWithThreshold(query, documents, {
  topK = 5,
  minSimilarity = 0.7,
  maxResults = 10
} = {}) {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query,
  });
  const queryVec = response.data[0].embedding;

  const results = documents
    .map(doc => ({
      text: doc.text,
      metadata: doc.metadata,
      similarity: cosineSimilarity(queryVec, doc.embedding),
    }))
    .filter(r => r.similarity >= minSimilarity) // Apply threshold
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, Math.min(topK, maxResults));

  if (results.length === 0) {
    console.log(`No results above threshold ${minSimilarity}`);
    return { results: [], belowThreshold: true };
  }
  return { results, belowThreshold: false };
}

// Usage with different thresholds for different scenarios

// Strict: factual lookup (only very relevant results)
const strictResults = await searchWithThreshold(query, docs, {
  minSimilarity: 0.85,
  topK: 3,
});

// Moderate: general search
const moderateResults = await searchWithThreshold(query, docs, {
  minSimilarity: 0.70,
  topK: 5,
});

// Loose: exploration (cast a wide net)
const looseResults = await searchWithThreshold(query, docs, {
  minSimilarity: 0.50,
  topK: 10,
});
Calibrating thresholds for your data
// Generate threshold calibration data
async function calibrateThreshold(testQueries) {
  // testQueries = [{ query, relevantDocs: [...], irrelevantDocs: [...] }]
  const allScores = { relevant: [], irrelevant: [] };
  for (const test of testQueries) {
    const queryEmb = await getEmbedding(test.query);
    for (const doc of test.relevantDocs) {
      const docEmb = await getEmbedding(doc);
      allScores.relevant.push(cosineSimilarity(queryEmb, docEmb));
    }
    for (const doc of test.irrelevantDocs) {
      const docEmb = await getEmbedding(doc);
      allScores.irrelevant.push(cosineSimilarity(queryEmb, docEmb));
    }
  }

  // Simple heuristic: use the midpoint between the average relevant and
  // average irrelevant scores as a starting threshold, then tune from there
  const avgRelevant = average(allScores.relevant);
  const avgIrrelevant = average(allScores.irrelevant);
  const suggestedThreshold = (avgRelevant + avgIrrelevant) / 2;

  console.log(`Average relevant similarity: ${avgRelevant.toFixed(3)}`);
  console.log(`Average irrelevant similarity: ${avgIrrelevant.toFixed(3)}`);
  console.log(`Suggested threshold: ${suggestedThreshold.toFixed(3)}`);
  return suggestedThreshold;
}

function average(arr) {
  return arr.reduce((a, b) => a + b, 0) / arr.length;
}
9. Performance: Brute-Force vs Approximate Nearest Neighbor
For small datasets (up to a few tens of thousands of documents), comparing the query to every document (brute force) is fine. For larger datasets, you need Approximate Nearest Neighbor (ANN) algorithms.
Search performance (indicative):

| Documents | Brute force | ANN (HNSW) | ANN accuracy |
|---|---|---|---|
| 1,000 | < 1 ms | < 1 ms | 99%+ |
| 10,000 | ~5 ms | < 1 ms | 99%+ |
| 100,000 | ~50 ms | ~1-2 ms | 98%+ |
| 1,000,000 | ~500 ms | ~2-5 ms | 95-98% |
| 10,000,000 | ~5 seconds | ~5-10 ms | 95%+ |
ANN algorithms trade a tiny bit of accuracy for MASSIVE speed gains.
"Approximate" means it might miss the #1 result but will almost
always find it in the top 5.
ANN algorithms (what vector databases use)
Common ANN Algorithms:
HNSW (Hierarchical Navigable Small World):
- Used by: Pinecone, Qdrant, Weaviate, pgvector
- How: Builds a graph of vectors, navigates it like a skip list
- Pros: Fast, accurate, good for dynamic data
- Cons: High memory usage
IVF (Inverted File Index):
- Used by: FAISS, some Pinecone configurations
- How: Clusters vectors, only searches nearby clusters
- Pros: Lower memory, good for very large datasets
- Cons: Requires training step, slightly less accurate
Product Quantization (PQ):
- Used by: FAISS (combined with IVF)
- How: Compresses vectors to reduce memory
- Pros: Very low memory
- Cons: Lower accuracy, complex setup
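To make the IVF idea concrete, here is a toy sketch (my own simplification, not FAISS's actual implementation): each vector is bucketed under its nearest centroid at index time, and a query only scans the buckets of its `nProbe` closest centroids instead of the whole collection.

```javascript
// Toy IVF-style index: partition vectors by nearest centroid,
// then search only the closest cluster(s). A teaching sketch, not FAISS.
function dot(a, b) {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

class ToyIVF {
  constructor(centroids) {
    this.centroids = centroids;              // fixed, pre-"trained" centroids
    this.clusters = centroids.map(() => []); // one bucket per centroid
  }

  // Index of the centroid most similar to `vec` (dot product on unit vectors)
  nearestCentroid(vec) {
    let best = 0;
    for (let i = 1; i < this.centroids.length; i++) {
      if (dot(vec, this.centroids[i]) > dot(vec, this.centroids[best])) best = i;
    }
    return best;
  }

  add(vec, meta) {
    this.clusters[this.nearestCentroid(vec)].push({ vec, meta });
  }

  // nProbe = how many clusters to scan (more = slower but more accurate)
  search(queryVec, topK = 3, nProbe = 1) {
    const probed = this.centroids
      .map((c, i) => ({ i, score: dot(queryVec, c) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, nProbe);
    const candidates = probed.flatMap(({ i }) => this.clusters[i]);
    return candidates
      .map(({ vec, meta }) => ({ meta, score: dot(queryVec, vec) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}

// Usage with illustrative 2D vectors
const index = new ToyIVF([[1, 0], [0, 1]]);
index.add([0.95, 0.05], 'frontend doc');
index.add([0.9, 0.1], 'another frontend doc');
index.add([0.05, 0.99], 'cooking doc');

console.log(index.search([1, 0], 2, 1));
// Only the cluster around centroid [1, 0] is scanned; the cooking
// doc is never even compared against the query
```

This is also where the "approximate" trade-off lives: with `nProbe = 1`, a relevant vector that happened to land in a neighboring cluster would be missed.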
In-memory search for small datasets
// For datasets under ~50,000 documents, brute force works fine
class BruteForceSearch {
  constructor() {
    this.vectors = []; // Float32Array for efficiency
    this.metadata = [];
  }

  add(embedding, meta) {
    this.vectors.push(new Float32Array(embedding));
    this.metadata.push(meta);
  }

  search(queryEmbedding, topK = 5) {
    const queryVec = new Float32Array(queryEmbedding);
    const scores = [];
    for (let i = 0; i < this.vectors.length; i++) {
      // Dot product for normalized vectors = cosine similarity
      let score = 0;
      for (let j = 0; j < queryVec.length; j++) {
        score += queryVec[j] * this.vectors[i][j];
      }
      scores.push({ index: i, score, metadata: this.metadata[i] });
    }
    scores.sort((a, b) => b.score - a.score);
    return scores.slice(0, topK);
  }
}
10. Common Pitfalls
| Pitfall | Explanation | Solution |
|---|---|---|
| Mixing embedding models | Query embedded with model A, documents with model B → garbage results | Always use the same model for queries and documents |
| No threshold filtering | Returning results with 0.3 similarity → irrelevant noise | Set a minimum similarity threshold |
| Embedding too much text | Embedding a 50-page document as one vector → diluted meaning | Chunk documents (see 4.11.c) |
| Ignoring metadata | Returning semantically similar but contextually wrong results | Combine similarity with metadata filters (date, category, author) |
| Not normalizing | Using cosine similarity on unnormalized vectors → correct but slower | Normalize vectors at index time, use dot product |
| Stale embeddings | Documents updated but embeddings not re-generated | Build a re-embedding pipeline triggered by content updates |
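The metadata pitfall in particular is easy to sketch: apply structured filters first, then rank only the survivors by similarity. The fields and document contents below are illustrative (my own), and `dot` assumes normalized vectors:

```javascript
// Metadata-filtered similarity search: structured filter first, then rank
function dot(a, b) {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

function filteredSearch(queryVec, docs, { category, topK = 3 } = {}) {
  return docs
    .filter(d => !category || d.category === category) // metadata filter
    .map(d => ({ text: d.text, similarity: dot(queryVec, d.embedding) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
}

const docs = [
  { text: '2019 pricing guide', category: 'archived', embedding: [0.9, 0.1] },
  { text: '2024 pricing guide', category: 'current',  embedding: [0.88, 0.12] },
];

// Without the filter, the archived doc would win on similarity alone
console.log(filteredSearch([1, 0], docs, { category: 'current' }));
// → [ { text: '2024 pricing guide', similarity: 0.88 } ]
```

Production vector databases apply the same idea natively (often called "metadata filtering" or "pre-filtering"), which is usually faster than filtering after retrieval.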
11. Real-World Similarity Search Patterns
Pattern 1: Semantic FAQ lookup
// Find the most relevant FAQ for a user's question.
// Assumes `faqIndex` is a vector index (e.g. SimpleVectorSearch above)
// whose indexed entries carry the answer text in `metadata.answer`.
async function findFAQ(userQuestion, faqIndex) {
  const results = await faqIndex.search(userQuestion, 1);
  if (results.length > 0 && results[0].similarity > 0.85) {
    return {
      found: true,
      question: results[0].text,
      answer: results[0].metadata.answer,
      confidence: results[0].similarity,
    };
  }
  return { found: false, message: 'No matching FAQ found' };
}
Pattern 2: Duplicate detection
// Check if a support ticket is a duplicate
async function isDuplicate(newTicket, existingTickets, threshold = 0.92) {
  const newEmb = await getEmbedding(newTicket.description);
  for (const ticket of existingTickets) {
    const sim = cosineSimilarity(newEmb, ticket.embedding);
    if (sim >= threshold) {
      return {
        isDuplicate: true,
        duplicateOf: ticket.id,
        similarity: sim,
      };
    }
  }
  return { isDuplicate: false };
}
Pattern 3: Content recommendation
// Recommend similar articles based on what the user just read
async function recommendSimilar(currentArticle, allArticles, topK = 5) {
  const currentEmb = await getEmbedding(currentArticle.text);
  const recommendations = allArticles
    .filter(a => a.id !== currentArticle.id) // Exclude current
    .map(article => ({
      ...article,
      similarity: cosineSimilarity(currentEmb, article.embedding),
    }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
  return recommendations;
}
12. Key Takeaways
- Cosine similarity is the standard metric for embedding search — it measures angle between vectors, ranges from -1 to 1, higher is more similar.
- For normalized vectors (like OpenAI embeddings), dot product equals cosine similarity and is faster to compute.
- Semantic search finds meaning, not keywords — "fix a bug" matches "debugging techniques" even with zero shared words.
- Set similarity thresholds to filter noise — 0.7-0.8 is a typical starting point, but always calibrate on your data.
- Brute-force search works for up to ~50K documents — beyond that, use ANN algorithms via vector databases.
- Hybrid search (semantic + keyword) is the production best practice — you get the recall of embeddings with the precision of keywords.
- Never mix embedding models — the query and documents must be embedded with the same model.
Explain-It Challenge
- A product manager asks "why doesn't our search find results when users type 'how to fix broken code' but the document says 'debugging techniques'?" Explain how switching from keyword to semantic search solves this.
- Two documents have cosine similarity 0.65. Is this "similar enough" to show as a search result? What factors affect your decision?
- Your vector database has 50 million documents. Why can't you just loop through all of them to find the closest match? What do real systems do instead?
Navigation: ← 4.11.a — What Embeddings Represent · 4.11.c — Document Chunking Strategies →