Episode 3 — NodeJS MongoDB Backend Architecture / 3.11 — Database Optimization

3.11.a — MongoDB Indexing

Indexes are special data structures that store a small portion of the collection's data in an easy-to-traverse form, enabling MongoDB to find documents without scanning every document in a collection.


< README | 3.11.b — Indexing Best Practices >


1. What Are Indexes?

An index in MongoDB is a B-tree data structure that maps field values to the documents containing those values. Think of it like the index at the back of a textbook — instead of reading every page to find a topic, you look up the topic in the index and jump directly to the right page.

WITHOUT INDEX (Collection Scan)           WITH INDEX (Index Scan)
┌─────────────────────────────┐          ┌──────────────────┐
│ Scan doc 1... no match      │          │   B-Tree Index   │
│ Scan doc 2... no match      │          │                  │
│ Scan doc 3... no match      │          │     "alice"      │
│ Scan doc 4... MATCH!        │          │    /      \      │
│ Scan doc 5... no match      │          │ "bob"   "dave"   │
│ ... scan ALL N documents    │          │   |       |      │
│ Time: O(n)                  │          │  doc4    doc7    │
└─────────────────────────────┘          │ Time: O(log n)   │
                                         └──────────────────┘
MetricWithout IndexWith Index
Scan typeCOLLSCAN (collection scan)IXSCAN (index scan)
Documents examinedAll N documentsOnly matching documents
Time complexityO(n)O(log n)
1M documents~1000ms~1ms

2. The Default _id Index

Every MongoDB collection automatically has an index on the _id field. This cannot be removed.

// This query always uses the _id index
db.users.findOne({ _id: ObjectId("507f1f77bcf86cd799439011") });
// IXSCAN on _id — fast regardless of collection size

3. Single-Field Indexes

The simplest index type — indexes a single field:

// MongoDB Shell
db.users.createIndex({ email: 1 });    // 1 = ascending order
db.users.createIndex({ age: -1 });     // -1 = descending order

// Mongoose schema definition
const userSchema = new mongoose.Schema({
  email: { type: String, index: true },           // Simple index
  username: { type: String, unique: true },        // Unique index
  createdAt: { type: Date, index: true },
});

// Or define indexes separately
userSchema.index({ email: 1 });
// These queries use the { email: 1 } index:
db.users.find({ email: "alice@example.com" });           // Equality
db.users.find({ email: { $regex: /^alice/ } });          // Prefix regex
db.users.find({}).sort({ email: 1 });                    // Sort ascending
db.users.find({}).sort({ email: -1 });                   // Sort descending (index works both ways for single field)

4. Compound Indexes

Indexes on multiple fields — field order matters significantly:

// Create compound index
db.orders.createIndex({ status: 1, createdAt: -1 });

// This index supports these queries:
db.orders.find({ status: "completed" });                              // Uses first field
db.orders.find({ status: "completed" }).sort({ createdAt: -1 });     // Uses both fields
db.orders.find({ status: "completed", createdAt: { $gte: date } });  // Uses both fields

// This query does NOT efficiently use this index:
db.orders.find({ createdAt: { $gte: date } });  // Skipping first field — may COLLSCAN
db.orders.find({}).sort({ createdAt: -1 });       // Cannot use — first field not filtered

Index Prefix Rule

A compound index supports queries on any prefix of the indexed fields:

// Index: { a: 1, b: 1, c: 1 }
// Supports queries on:
//   { a }           ✅ (prefix: a)
//   { a, b }        ✅ (prefix: a, b)
//   { a, b, c }     ✅ (full index)
//   { b }           ❌ (not a prefix)
//   { b, c }        ❌ (not a prefix)
//   { a, c }        ⚠️ (uses prefix {a}, then scans for c)

5. Text Indexes

Enable full-text search on string fields:

// Create text index
db.products.createIndex({ name: "text", description: "text" });

// Search for products
db.products.find({ $text: { $search: "wireless bluetooth headphones" } });

// Search with score — sort by relevance
db.products.find(
  { $text: { $search: "wireless headphones" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } });

// Phrase search — exact phrase match
db.products.find({ $text: { $search: '"noise cancelling"' } });

// Exclude terms
db.products.find({ $text: { $search: "headphones -wired" } });

Limitations:

  • Only one text index per collection
  • Text indexes can be large (indexes every word)
  • For advanced search, consider Atlas Search or Elasticsearch

6. Wildcard Indexes

For flexible or dynamic schemas where field names are not known in advance:

// Index ALL fields in the document
db.collection.createIndex({ "$**": 1 });

// Index all fields under a specific path
db.products.createIndex({ "attributes.$**": 1 });

// Useful when documents have varying structures:
// { attributes: { color: "red", size: "L" } }
// { attributes: { weight: 500, material: "cotton" } }
// Wildcard index covers queries on ANY attribute field

Use sparingly — wildcard indexes can be very large and are not a substitute for targeted indexes.


7. Unique Indexes

Enforce uniqueness on a field:

// No two users can have the same email
db.users.createIndex({ email: 1 }, { unique: true });

// Compound unique — combination must be unique
db.ratings.createIndex({ userId: 1, productId: 1 }, { unique: true });
// Same user cannot rate the same product twice

// Mongoose
const userSchema = new mongoose.Schema({
  email: { type: String, unique: true, required: true },
});

Important: Unique indexes allow ONE document with a null/missing field. If multiple documents can have a missing field, combine with sparse or partialFilterExpression.


8. Sparse, Partial, and TTL Indexes

Sparse Indexes

Only index documents that have the indexed field:

// Only index documents where 'phone' exists
db.users.createIndex({ phone: 1 }, { sparse: true });
// Documents without 'phone' field are not in the index
// Saves space when many documents lack the field

Partial Indexes

Index only documents matching a filter — more flexible than sparse:

// Only index active users
db.users.createIndex(
  { email: 1 },
  { partialFilterExpression: { isActive: true } }
);

// Only index orders over $100
db.orders.createIndex(
  { customerId: 1, createdAt: -1 },
  { partialFilterExpression: { total: { $gte: 100 } } }
);
// Smaller index, faster updates, only covers queries that match the filter

TTL (Time-To-Live) Indexes

Automatically delete documents after a specified time:

// Delete sessions after 24 hours
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 86400 });

// Delete at a specific time (set expiresAt in the document)
db.tokens.createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 });
// Document: { token: "abc", expiresAt: ISODate("2025-01-01T00:00:00Z") }
// Deleted when current time passes expiresAt

9. Index Direction: 1 vs -1

For single-field indexes, direction does not matter — MongoDB can traverse in either direction.

For compound indexes, direction matters for sort operations:

// Index: { price: 1, rating: -1 }
// This means: sort by price ascending, then rating descending

// These sorts are supported:
db.products.find({}).sort({ price: 1, rating: -1 });  // ✅ Matches index
db.products.find({}).sort({ price: -1, rating: 1 });  // ✅ Reverse of index (OK)

// These sorts are NOT supported:
db.products.find({}).sort({ price: 1, rating: 1 });   // ❌ Mismatch
db.products.find({}).sort({ price: -1, rating: -1 }); // ❌ Mismatch

10. Covered Queries

A query is "covered" when the index contains ALL fields the query needs — MongoDB never reads the actual documents:

// Index: { email: 1, name: 1 }
// Covered query — only needs fields in the index
db.users.find(
  { email: "alice@example.com" },  // Filter: in index
  { email: 1, name: 1, _id: 0 }   // Projection: in index, _id excluded
);
// totalDocsExamined: 0 (never reads documents — fastest possible)

// NOT covered — needs 'age' which is not in the index
db.users.find(
  { email: "alice@example.com" },
  { email: 1, name: 1, age: 1, _id: 0 }  // 'age' requires document read
);

11. Index Intersection

MongoDB can sometimes combine multiple single-field indexes:

// Indexes: { status: 1 } and { createdAt: 1 }
// MongoDB MAY use both for:
db.orders.find({ status: "active", createdAt: { $gte: date } });
// But a compound index { status: 1, createdAt: 1 } is almost always faster

In practice: Do not rely on index intersection. Create compound indexes for common query patterns.


12. When NOT to Index

ScenarioReason
Small collections (< 1000 docs)Collection scan is fast enough
Write-heavy, read-light collectionsIndexes slow down every write
Fields with low selectivity{ gender: 1 } — only 2-3 values, index barely helps
Fields rarely used in queriesUnused index wastes RAM and slows writes
Rapidly changing fieldsIndex constantly rebuilt

13. Mongoose Index Definition

const productSchema = new mongoose.Schema({
  name: { type: String, required: true },
  price: { type: Number, required: true },
  category: { type: String, required: true },
  tags: [String],
  description: String,
  seller: { type: mongoose.Schema.Types.ObjectId, ref: 'User' },
  createdAt: { type: Date, default: Date.now },
});

// Single-field indexes
productSchema.index({ price: 1 });
productSchema.index({ category: 1 });

// Compound index
productSchema.index({ category: 1, price: -1 });

// Text index
productSchema.index({ name: 'text', description: 'text' });

// Unique compound index
productSchema.index({ seller: 1, name: 1 }, { unique: true });

// Partial index
productSchema.index(
  { price: 1 },
  { partialFilterExpression: { price: { $exists: true } } }
);

// TTL index
productSchema.index({ createdAt: 1 }, { expireAfterSeconds: 2592000 }); // 30 days

const Product = mongoose.model('Product', productSchema);

Auto-Index Behavior

// By default, Mongoose calls createIndex for each index in the schema
// when the model is first used. In production, disable auto-indexing:
mongoose.connect(uri, { autoIndex: false });

// Or per-schema:
const schema = new mongoose.Schema({...}, { autoIndex: false });

// Then create indexes manually during deployment:
// await Product.createIndexes();

Key Takeaways

  1. Indexes trade write speed for read speed — every insert/update must update all indexes
  2. Compound index field order matters — queries must use a prefix of the index fields
  3. Use unique indexes to enforce data constraints at the database level
  4. TTL indexes auto-delete expired documents (sessions, tokens, logs)
  5. Covered queries are the fastest — index contains all needed fields
  6. Do not over-index — unused indexes waste RAM and slow writes
  7. Disable auto-indexing in production — create indexes during deployment

Explain-It Challenge

You are designing the database for an e-commerce platform with 10 million products. Users can search products by name, filter by category and price range, and sort by price or date. Each product belongs to a seller, and sellers can view their own products. Design a complete indexing strategy: list every index you would create, explain why, and identify which queries each index supports. Also explain which indexes you deliberately chose NOT to create and why.