Episode 3 — NodeJS MongoDB Backend Architecture / 3.11 — Database Optimization

3.11.a — MongoDB Indexing

Indexes are special data structures that store a small portion of the collection's data in an easy-to-traverse form, enabling MongoDB to find documents without scanning every document in a collection.

< README | 3.11.b — Indexing Best Practices >

1. What Are Indexes?

An index in MongoDB is a B-tree data structure that maps field values to the documents containing those values. Think of it like the index at the back of a textbook — instead of reading every page to find a topic, you look up the topic in the index and jump directly to the right page.

WITHOUT INDEX (Collection Scan)           WITH INDEX (Index Scan)
┌─────────────────────────────┐          ┌──────────────────┐
│ Scan doc 1... no match      │          │   B-Tree Index   │
│ Scan doc 2... no match      │          │                  │
│ Scan doc 3... no match      │          │     "alice"      │
│ Scan doc 4... MATCH!        │          │    /      \      │
│ Scan doc 5... no match      │          │ "bob"   "dave"   │
│ ... scan ALL N documents    │          │   |       |      │
│ Time: O(n)                  │          │  doc4    doc7    │
└─────────────────────────────┘          │ Time: O(log n)   │
                                         └──────────────────┘

Metric	Without Index	With Index
Scan type	`COLLSCAN` (collection scan)	`IXSCAN` (index scan)
Documents examined	All N documents	Only matching documents
Time complexity	O(n)	O(log n)
1M documents	~1000ms	~1ms

2. The Default _id Index

Every MongoDB collection automatically has an index on the _id field. This cannot be removed.

// This query always uses the _id index
db.users.findOne({ _id: ObjectId("507f1f77bcf86cd799439011") });
// IXSCAN on _id — fast regardless of collection size

3. Single-Field Indexes

The simplest index type — indexes a single field:

// MongoDB Shell
db.users.createIndex({ email: 1 });    // 1 = ascending order
db.users.createIndex({ age: -1 });     // -1 = descending order

// Mongoose schema definition
const userSchema = new mongoose.Schema({
  email: { type: String, index: true },           // Simple index
  username: { type: String, unique: true },        // Unique index
  createdAt: { type: Date, index: true },
});

// Or define indexes separately
userSchema.index({ email: 1 });

// These queries use the { email: 1 } index:
db.users.find({ email: "alice@example.com" });           // Equality
db.users.find({ email: { $regex: /^alice/ } });          // Prefix regex
db.users.find({}).sort({ email: 1 });                    // Sort ascending
db.users.find({}).sort({ email: -1 });                   // Sort descending (index works both ways for single field)

4. Compound Indexes

Indexes on multiple fields — field order matters significantly:

// Create compound index
db.orders.createIndex({ status: 1, createdAt: -1 });

// This index supports these queries:
db.orders.find({ status: "completed" });                              // Uses first field
db.orders.find({ status: "completed" }).sort({ createdAt: -1 });     // Uses both fields
db.orders.find({ status: "completed", createdAt: { $gte: date } });  // Uses both fields

// This query does NOT efficiently use this index:
db.orders.find({ createdAt: { $gte: date } });  // Skipping first field — may COLLSCAN
db.orders.find({}).sort({ createdAt: -1 });       // Cannot use — first field not filtered

Index Prefix Rule

A compound index supports queries on any prefix of the indexed fields:

// Index: { a: 1, b: 1, c: 1 }
// Supports queries on:
//   { a }           ✅ (prefix: a)
//   { a, b }        ✅ (prefix: a, b)
//   { a, b, c }     ✅ (full index)
//   { b }           ❌ (not a prefix)
//   { b, c }        ❌ (not a prefix)
//   { a, c }        ⚠️ (uses prefix {a}, then scans for c)

5. Text Indexes

Enable full-text search on string fields:

// Create text index
db.products.createIndex({ name: "text", description: "text" });

// Search for products
db.products.find({ $text: { $search: "wireless bluetooth headphones" } });

// Search with score — sort by relevance
db.products.find(
  { $text: { $search: "wireless headphones" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } });

// Phrase search — exact phrase match
db.products.find({ $text: { $search: '"noise cancelling"' } });

// Exclude terms
db.products.find({ $text: { $search: "headphones -wired" } });

Limitations:

Only one text index per collection
Text indexes can be large (indexes every word)
For advanced search, consider Atlas Search or Elasticsearch

6. Wildcard Indexes

For flexible or dynamic schemas where field names are not known in advance:

// Index ALL fields in the document
db.collection.createIndex({ "$**": 1 });

// Index all fields under a specific path
db.products.createIndex({ "attributes.$**": 1 });

// Useful when documents have varying structures:
// { attributes: { color: "red", size: "L" } }
// { attributes: { weight: 500, material: "cotton" } }
// Wildcard index covers queries on ANY attribute field

Use sparingly — wildcard indexes can be very large and are not a substitute for targeted indexes.

7. Unique Indexes

Enforce uniqueness on a field:

// No two users can have the same email
db.users.createIndex({ email: 1 }, { unique: true });

// Compound unique — combination must be unique
db.ratings.createIndex({ userId: 1, productId: 1 }, { unique: true });
// Same user cannot rate the same product twice

// Mongoose
const userSchema = new mongoose.Schema({
  email: { type: String, unique: true, required: true },
});

Important: Unique indexes allow ONE document with a null/missing field. If multiple documents can have a missing field, combine with sparse or partialFilterExpression.

8. Sparse, Partial, and TTL Indexes

Sparse Indexes

Only index documents that have the indexed field:

// Only index documents where 'phone' exists
db.users.createIndex({ phone: 1 }, { sparse: true });
// Documents without 'phone' field are not in the index
// Saves space when many documents lack the field

Partial Indexes

Index only documents matching a filter — more flexible than sparse:

// Only index active users
db.users.createIndex(
  { email: 1 },
  { partialFilterExpression: { isActive: true } }
);

// Only index orders over $100
db.orders.createIndex(
  { customerId: 1, createdAt: -1 },
  { partialFilterExpression: { total: { $gte: 100 } } }
);
// Smaller index, faster updates, only covers queries that match the filter

TTL (Time-To-Live) Indexes

Automatically delete documents after a specified time:

// Delete sessions after 24 hours
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 86400 });

// Delete at a specific time (set expiresAt in the document)
db.tokens.createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 });
// Document: { token: "abc", expiresAt: ISODate("2025-01-01T00:00:00Z") }
// Deleted when current time passes expiresAt

9. Index Direction: 1 vs -1

For single-field indexes, direction does not matter — MongoDB can traverse in either direction.

For compound indexes, direction matters for sort operations:

// Index: { price: 1, rating: -1 }
// This means: sort by price ascending, then rating descending

// These sorts are supported:
db.products.find({}).sort({ price: 1, rating: -1 });  // ✅ Matches index
db.products.find({}).sort({ price: -1, rating: 1 });  // ✅ Reverse of index (OK)

// These sorts are NOT supported:
db.products.find({}).sort({ price: 1, rating: 1 });   // ❌ Mismatch
db.products.find({}).sort({ price: -1, rating: -1 }); // ❌ Mismatch

10. Covered Queries

A query is "covered" when the index contains ALL fields the query needs — MongoDB never reads the actual documents:

// Index: { email: 1, name: 1 }
// Covered query — only needs fields in the index
db.users.find(
  { email: "alice@example.com" },  // Filter: in index
  { email: 1, name: 1, _id: 0 }   // Projection: in index, _id excluded
);
// totalDocsExamined: 0 (never reads documents — fastest possible)

// NOT covered — needs 'age' which is not in the index
db.users.find(
  { email: "alice@example.com" },
  { email: 1, name: 1, age: 1, _id: 0 }  // 'age' requires document read
);

11. Index Intersection

MongoDB can sometimes combine multiple single-field indexes:

// Indexes: { status: 1 } and { createdAt: 1 }
// MongoDB MAY use both for:
db.orders.find({ status: "active", createdAt: { $gte: date } });
// But a compound index { status: 1, createdAt: 1 } is almost always faster

In practice: Do not rely on index intersection. Create compound indexes for common query patterns.

12. When NOT to Index

Scenario	Reason
Small collections (< 1000 docs)	Collection scan is fast enough
Write-heavy, read-light collections	Indexes slow down every write
Fields with low selectivity	`{ gender: 1 }` — only 2-3 values, index barely helps
Fields rarely used in queries	Unused index wastes RAM and slows writes
Rapidly changing fields	Index constantly rebuilt

13. Mongoose Index Definition

const productSchema = new mongoose.Schema({
  name: { type: String, required: true },
  price: { type: Number, required: true },
  category: { type: String, required: true },
  tags: [String],
  description: String,
  seller: { type: mongoose.Schema.Types.ObjectId, ref: 'User' },
  createdAt: { type: Date, default: Date.now },
});

// Single-field indexes
productSchema.index({ price: 1 });
productSchema.index({ category: 1 });

// Compound index
productSchema.index({ category: 1, price: -1 });

// Text index
productSchema.index({ name: 'text', description: 'text' });

// Unique compound index
productSchema.index({ seller: 1, name: 1 }, { unique: true });

// Partial index
productSchema.index(
  { price: 1 },
  { partialFilterExpression: { price: { $exists: true } } }
);

// TTL index
productSchema.index({ createdAt: 1 }, { expireAfterSeconds: 2592000 }); // 30 days

const Product = mongoose.model('Product', productSchema);

Auto-Index Behavior

// By default, Mongoose calls createIndex for each index in the schema
// when the model is first used. In production, disable auto-indexing:
mongoose.connect(uri, { autoIndex: false });

// Or per-schema:
const schema = new mongoose.Schema({...}, { autoIndex: false });

// Then create indexes manually during deployment:
// await Product.createIndexes();

Key Takeaways

Indexes trade write speed for read speed — every insert/update must update all indexes
Compound index field order matters — queries must use a prefix of the index fields
Use unique indexes to enforce data constraints at the database level
TTL indexes auto-delete expired documents (sessions, tokens, logs)
Covered queries are the fastest — index contains all needed fields
Do not over-index — unused indexes waste RAM and slow writes
Disable auto-indexing in production — create indexes during deployment

Explain-It Challenge

You are designing the database for an e-commerce platform with 10 million products. Users can search products by name, filter by category and price range, and sort by price or date. Each product belongs to a seller, and sellers can view their own products. Design a complete indexing strategy: list every index you would create, explain why, and identify which queries each index supports. Also explain which indexes you deliberately chose NOT to create and why.