Episode 3 — NodeJS MongoDB Backend Architecture / 3.11 — Database Optimization

3.11.b — Indexing Best Practices

Creating indexes is only half the battle. Knowing how to analyze query performance with explain(), follow the ESR rule, and monitor index usage separates good database design from great database design.

< 3.11.a — MongoDB Indexing | 3.11.c — Aggregation Pipeline >

1. explain() — Analyzing Query Performance

The explain() method reveals exactly how MongoDB executes a query:

// Basic explain — shows the query plan
db.users.find({ email: "alice@example.com" }).explain();

// Execution stats — shows actual performance numbers
db.users.find({ email: "alice@example.com" }).explain("executionStats");

// All plans — shows all candidate plans MongoDB considered
db.users.find({ email: "alice@example.com" }).explain("allPlansExecution");

Mongoose explain()

// Mongoose — append .explain() to any query
const result = await User.find({ email: "alice@example.com" })
  .explain("executionStats");

console.log(result[0].executionStats);

2. Reading explain() Output

const stats = db.orders.find({ status: "completed", total: { $gte: 100 } })
  .sort({ createdAt: -1 })
  .explain("executionStats")
  .executionStats;

Key Fields to Check

Field	Meaning	Good Value
`executionTimeMillis`	Total query time in ms	As low as possible
`totalKeysExamined`	Index entries scanned	Close to `nReturned`
`totalDocsExamined`	Documents read from disk	Close to `nReturned`
`nReturned`	Documents returned to client	Expected result count
`stage`	Execution stage type	`IXSCAN` not `COLLSCAN`

Ideal Ratios

totalKeysExamined / nReturned  ≈  1    (index is selective)
totalDocsExamined / nReturned  ≈  1    (no wasted reads)
totalDocsExamined              =  0    (covered query — best case)

Example Analysis

// BAD — collection scan
{
  "executionStats": {
    "nReturned": 15,
    "executionTimeMillis": 850,
    "totalKeysExamined": 0,        // No index used
    "totalDocsExamined": 1000000,  // Scanned ALL documents
    "executionStages": {
      "stage": "COLLSCAN"          // Collection scan — worst case
    }
  }
}

// GOOD — index scan
{
  "executionStats": {
    "nReturned": 15,
    "executionTimeMillis": 2,
    "totalKeysExamined": 15,       // Examined only 15 index entries
    "totalDocsExamined": 15,       // Read only 15 documents
    "executionStages": {
      "stage": "IXSCAN",           // Index scan — fast
      "indexName": "status_1_createdAt_-1"
    }
  }
}

3. IXSCAN vs COLLSCAN

COLLSCAN (Collection Scan)            IXSCAN (Index Scan)
┌──────────────────────┐              ┌──────────────────────┐
│ Read doc 1           │              │ B-Tree lookup        │
│ Read doc 2           │              │   → doc 45           │
│ Read doc 3           │              │   → doc 891          │
│ ...                  │              │   → doc 2340         │
│ Read doc 1,000,000   │              │ Done (3 docs)        │
│ Return 3 matches     │              │                      │
│ Time: 850ms          │              │ Time: 2ms            │
└──────────────────────┘              └──────────────────────┘

Common execution stages:

Stage	Description	Performance
`COLLSCAN`	Scans every document	Slowest
`IXSCAN`	Uses an index	Fast
`FETCH`	Retrieves documents after index scan	Normal
`SORT`	In-memory sort (no index for sort)	Can be slow
`SORT_KEY_GENERATOR`	Generates sort keys	Normal
`PROJECTION_COVERED`	Covered query — no FETCH needed	Fastest
`COUNT_SCAN`	Counting using index	Fast

Red flags in explain output:

COLLSCAN on large collections
SORT stage with large memUsage (in-memory sort)
totalDocsExamined >> nReturned

4. The ESR Rule: Equality, Sort, Range

The ESR rule dictates the optimal field order for compound indexes:

┌───────────────────────────────────────────────────┐
│  E — Equality fields first     (exact match)      │
│  S — Sort fields second        (ordering)         │
│  R — Range fields last         ($gte, $lte, $in)  │
└───────────────────────────────────────────────────┘

Example

// Query: Find completed orders from the last 30 days, sorted by amount
db.orders.find({
  status: "completed",                     // E — Equality
  createdAt: { $gte: thirtyDaysAgo },      // R — Range
}).sort({ amount: -1 });                   // S — Sort

// OPTIMAL index following ESR:
db.orders.createIndex({ status: 1, amount: -1, createdAt: -1 });
//                       E            S              R

// WHY this order?
// 1. Equality (status) narrows down to exact matches first — most selective
// 2. Sort (amount) — index is already sorted, no in-memory sort needed
// 3. Range (createdAt) — scans a range within the already-narrowed results

ESR Comparison

// Query: status = "active", sort by price, date >= lastWeek

// ❌ BAD: Range before Sort
db.products.createIndex({ status: 1, createdAt: -1, price: 1 });
// MongoDB can't use index for sort after a range scan
// Results in in-memory SORT stage

// ✅ GOOD: ESR order
db.products.createIndex({ status: 1, price: 1, createdAt: -1 });
// E: status (equality) → S: price (sort) → R: createdAt (range)
// No in-memory sort needed

5. Selectivity

Selectivity measures how many documents an index value matches. High selectivity = fewer matches = faster:

HIGH SELECTIVITY (Good for indexing)     LOW SELECTIVITY (Bad for indexing)
┌──────────────────────────────┐         ┌──────────────────────────────┐
│ email: unique per user       │         │ gender: only 2-3 values      │
│ username: unique per user    │         │ isActive: true/false         │
│ orderId: unique per order    │         │ country: ~200 values for     │
│ phone: nearly unique         │         │   millions of records        │
└──────────────────────────────┘         └──────────────────────────────┘

Rule: Put high-selectivity fields first in compound indexes:

// BAD: Low selectivity first
db.users.createIndex({ isActive: 1, email: 1 });
// isActive: true matches 90% of users — barely narrows the search

// GOOD: High selectivity first
db.users.createIndex({ email: 1, isActive: 1 });
// email matches 1 user — immediately narrows to a single document

6. Monitoring Index Usage

$indexStats

// See how often each index is used
db.users.aggregate([{ $indexStats: {} }]);

// Output:
[
  {
    "name": "email_1",
    "accesses": {
      "ops": 145230,           // Times this index was used
      "since": ISODate(...)    // Since server start or index creation
    }
  },
  {
    "name": "oldField_1",
    "accesses": {
      "ops": 0,                // NEVER used — candidate for removal
      "since": ISODate(...)
    }
  }
]

List All Indexes

// List all indexes on a collection
db.users.getIndexes();

// Mongoose
const indexes = await User.collection.getIndexes();
console.log(indexes);

7. Removing Unused Indexes

// Drop a specific index
db.users.dropIndex("oldField_1");
db.users.dropIndex({ oldField: 1 });

// Drop all indexes (except _id)
db.users.dropIndexes();

// Mongoose
await User.collection.dropIndex("oldField_1");

Before removing: Check $indexStats to confirm the index has zero or very low usage. Also check if the index is used by background jobs or batch processes that run infrequently.

8. Index Size Considerations

Indexes must fit in RAM for optimal performance:

// Check index sizes
db.users.stats().indexSizes;
// { "_id_": 2457600, "email_1": 1843200, "name_text": 5242880 }

// Total index size
db.users.totalIndexSize();
// 9543680 (bytes)

// Check total database stats
db.stats();

Guidelines:

Total index size should fit in available RAM
If indexes exceed RAM, MongoDB pages to disk (very slow)
Text indexes are particularly large
Monitor index size as data grows

9. hint() — Forcing Index Usage

Sometimes MongoDB's query planner picks the wrong index. Use hint() to force a specific index:

// Force a specific index
db.orders.find({ status: "active", customerId: id })
  .hint({ status: 1, createdAt: -1 })
  .explain("executionStats");

// Force collection scan (for testing/comparison)
db.orders.find({ status: "active" })
  .hint({ $natural: 1 });

// Mongoose
const orders = await Order.find({ status: "active" })
  .hint({ status: 1, createdAt: -1 });

Use hint() sparingly — MongoDB's query planner is usually correct. Use it primarily for testing or when you've verified the planner makes a suboptimal choice.

10. Atlas Performance Advisor

If using MongoDB Atlas (cloud):

Performance Advisor automatically suggests indexes based on slow queries
Real-Time Performance Panel shows current operations and their performance
Query Profiler logs slow queries (similar to setting profile level)
Access via Atlas dashboard: Cluster > Performance Advisor

Atlas Performance Advisor Suggestions:
┌─────────────────────────────────────────────────────────┐
│ Suggested Index: { customerId: 1, status: 1 }           │
│ Impact: Improves 342 queries/hour                       │
│ Avg improvement: 450ms → 3ms                            │
│ [Create Index] [Dismiss]                                │
└─────────────────────────────────────────────────────────┘

11. Practical Workflow: Optimizing a Slow Query

Step 1: Identify slow queries
  → Enable profiling: db.setProfilingLevel(1, { slowms: 100 })
  → Check: db.system.profile.find().sort({ ts: -1 })

Step 2: Analyze with explain()
  → Run the query with .explain("executionStats")
  → Check for COLLSCAN, high totalDocsExamined

Step 3: Design the index
  → Apply ESR rule: Equality → Sort → Range
  → Put high-selectivity fields first
  → Consider covered queries

Step 4: Create the index
  → db.collection.createIndex({...}, { background: true })
  → In production, create indexes during low-traffic periods

Step 5: Verify improvement
  → Run explain() again — confirm IXSCAN
  → Compare totalDocsExamined and executionTimeMillis

Step 6: Monitor ongoing
  → Check $indexStats periodically
  → Remove unused indexes

Key Takeaways

Always use explain("executionStats") to verify indexes are working
COLLSCAN on large collections is always a problem — add appropriate indexes
ESR rule — Equality, Sort, Range — determines optimal compound index field order
High selectivity fields first — fields with many unique values narrow results fastest
Monitor with $indexStats — remove unused indexes to save RAM and write performance
hint() forces a specific index but should be used sparingly
Indexes must fit in RAM — monitor sizes as data grows

Explain-It Challenge

You receive a bug report: "The orders page takes 15 seconds to load." The page shows a user's recent orders filtered by status, sorted by date. You run explain() and see a COLLSCAN with totalDocsExamined: 5,000,000. Walk through your complete debugging and optimization process: what indexes you would create and why, how you would verify the improvement, and what ongoing monitoring you would set up.