Episode 3 — NodeJS MongoDB Backend Architecture / 3.11 — Database Optimization

3.11 — Interview Questions: Database Optimization

These 12 questions cover the most commonly asked MongoDB optimization topics in technical interviews — from beginner fundamentals to advanced aggregation patterns.




Quick-Fire Table

| # | Question | Level | Key Topic |
|---|----------|-------|-----------|
| 1 | What is an index in MongoDB? | Beginner | Indexing basics |
| 2 | What is the difference between COLLSCAN and IXSCAN? | Beginner | explain() |
| 3 | How does a compound index work? | Beginner | Index types |
| 4 | What is the ESR rule? | Intermediate | Index design |
| 5 | What is a covered query? | Intermediate | Performance |
| 6 | How does the aggregation pipeline work? | Intermediate | Aggregation |
| 7 | Explain $lookup and its limitations | Intermediate | Joins |
| 8 | What is $facet and when would you use it? | Intermediate | Advanced aggregation |
| 9 | How do you handle the 100MB aggregation memory limit? | Intermediate | Performance |
| 10 | When should you NOT create an index? | Advanced | Index strategy |
| 11 | How would you optimize a slow MongoDB query in production? | Advanced | Debugging |
| 12 | Compare $merge and $out. When do you use each? | Advanced | Advanced aggregation |

Beginner Level

Q1. What is an index in MongoDB and why is it important?

Model Answer:

An index is a B-tree data structure that maps field values to the documents containing those values. It works like the index at the back of a textbook — instead of scanning every document in a collection (collection scan, O(n)), MongoDB can use the index to jump directly to matching documents (index scan, O(log n)).

Indexes are important because:

  • They dramatically speed up read queries — from O(n) to O(log n)
  • They enable efficient sorting without loading all documents into memory
  • Unique indexes enforce data constraints at the database level
  • TTL indexes automatically expire documents (sessions, tokens)

The tradeoff is that indexes slow down write operations because every insert, update, or delete must also update all indexes on the collection.

// Without index: scans all 1M documents (~1000ms)
// With index: B-tree lookup (~1ms)
db.users.createIndex({ email: 1 }, { unique: true });

Q2. What is the difference between COLLSCAN and IXSCAN?

Model Answer:

These are execution stages shown in explain() output:

  • COLLSCAN (Collection Scan): MongoDB reads every document in the collection to find matches. This is O(n) and very slow on large collections. It means no suitable index exists for the query.

  • IXSCAN (Index Scan): MongoDB uses a B-tree index to locate matching documents. This is O(log n) and significantly faster. Only the relevant index entries and matching documents are read.

// Check which scan a query uses:
db.users.find({ email: "alice@example.com" }).explain("executionStats");
// Look at executionStages.stage: "IXSCAN" or "COLLSCAN"

Key metrics to compare:

  • totalDocsExamined: COLLSCAN examines ALL docs; IXSCAN examines only matches
  • totalKeysExamined: 0 for COLLSCAN; close to nReturned for IXSCAN
  • executionTimeMillis: COLLSCAN is dramatically slower on large collections
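The gap between these metrics can be illustrated with a plain-JavaScript sketch (a toy model, not MongoDB internals): a collection scan examines every document, while an index scan is a binary search over sorted `{ key, docId }` entries and examines only O(log n) of them.

```javascript
// Toy model of COLLSCAN vs IXSCAN costs (illustrative only).
const docs = Array.from({ length: 1000 }, (_, i) => ({ _id: i, email: `user${i}@example.com` }));

// COLLSCAN: examine every document in the collection.
function collScan(target) {
  let examined = 0, found = null;
  for (const d of docs) {
    examined++;
    if (d.email === target) found = d;
  }
  return { found, examined }; // examined === docs.length
}

// IXSCAN: binary search over a sorted "index" of { key, docId } entries.
const index = docs
  .map(d => ({ key: d.email, docId: d._id }))
  .sort((a, b) => (a.key < b.key ? -1 : 1));

function ixScan(target) {
  let examined = 0, lo = 0, hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    examined++; // each step halves the search space: O(log n)
    if (index[mid].key === target) return { found: docs[index[mid].docId], examined };
    if (index[mid].key < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return { found: null, examined };
}
```

On 1,000 documents the scan examines all 1,000 entries while the binary search needs at most about 10 comparisons, mirroring the `totalDocsExamined` difference `explain()` reports.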

Q3. How does a compound index work and what is the index prefix rule?

Model Answer:

A compound index indexes multiple fields in a specific order. The field order is crucial because a compound index supports queries on any prefix of its fields, but not on arbitrary subsets.

// Compound index
db.orders.createIndex({ status: 1, createdAt: -1, total: 1 });

// This index supports these queries (using prefixes):
db.orders.find({ status: "active" });                              // prefix: status
db.orders.find({ status: "active", createdAt: { $gte: date } });  // prefix: status, createdAt
db.orders.find({ status: "active" }).sort({ createdAt: -1 });     // prefix: status + sort on createdAt

// These queries CANNOT use the index efficiently:
db.orders.find({ createdAt: { $gte: date } });  // skips status (first field)
db.orders.find({ total: { $gt: 100 } });        // skips status and createdAt

Direction matters for compound indexes in sort operations. If the index is { a: 1, b: -1 }, it supports sorts { a: 1, b: -1 } and { a: -1, b: 1 } (exact reverse), but not { a: 1, b: 1 }.
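The prefix rule follows from how compound-index entries are stored. A small plain-JavaScript sketch (not MongoDB internals) shows why: with entries sorted by (status, createdAt), all matches for a leading-field query sit in one contiguous block, while matches on the second field alone are scattered and force a full scan of the index.

```javascript
// Compound-index entries, sorted by (status, createdAt) — the on-disk order.
const entries = [
  { status: "active",   createdAt: 3 },
  { status: "active",   createdAt: 7 },
  { status: "archived", createdAt: 1 },
  { status: "archived", createdAt: 7 },
  { status: "pending",  createdAt: 2 },
];

// Prefix query { status: "active" }: matches form one contiguous block,
// so a bounded range scan over the sorted entries finds them cheaply.
const activeBlock = entries.filter(e => e.status === "active");

// Non-prefix query { createdAt: 7 }: matches are scattered across the
// index (positions 1 and 3 here), so every entry must be inspected.
const positions = entries
  .map((e, i) => (e.createdAt === 7 ? i : -1))
  .filter(i => i !== -1);
```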


Intermediate Level

Q4. What is the ESR rule and how do you apply it?

Model Answer:

The ESR rule defines the optimal field order for compound indexes:

  • E (Equality): Fields tested for exact match go first — they narrow results most effectively
  • S (Sort): Fields used in sort order go second — allows the index to provide sorted results without in-memory sort
  • R (Range): Fields with range conditions ($gte, $lte, $in) go last — range scans after equality and sort are most efficient

// Query:
db.orders.find({
  customerId: "abc123",                    // E - Equality
  total: { $gte: 50 }                     // R - Range
}).sort({ createdAt: -1 });               // S - Sort

// Optimal index (ESR order):
db.orders.createIndex({ customerId: 1, createdAt: -1, total: 1 });
//                        E                S                R

Putting range before sort forces MongoDB to do an in-memory sort, which is slower and limited to 100MB. ESR order avoids this.


Q5. What is a covered query and why is it the fastest type of query?

Model Answer:

A covered query is one where the index contains ALL fields needed by the query — both for filtering and for the returned projection. MongoDB answers the query entirely from the index without ever reading the actual documents from disk.

// Index: { email: 1, name: 1 }
// Covered query:
db.users.find(
  { email: "alice@example.com" },    // Filter field in index
  { email: 1, name: 1, _id: 0 }     // All projected fields in index, _id excluded
);
// totalDocsExamined: 0 — fastest possible

Key requirements:

  1. All query filter fields must be in the index
  2. All projected (returned) fields must be in the index
  3. The _id field must be explicitly excluded (since it is returned by default and may not be in the index)

In explain(), a covered query shows totalDocsExamined: 0 and uses PROJECTION_COVERED stage.


Q6. How does the aggregation pipeline work?

Model Answer:

The aggregation pipeline is an ordered array of stages that process documents sequentially. Each stage takes input documents, transforms them, and passes the results to the next stage — like a data assembly line.

db.orders.aggregate([
  { $match: { status: "completed" } },          // 1. Filter
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },  // 2. Group & sum
  { $sort: { total: -1 } },                     // 3. Sort
  { $limit: 10 }                                // 4. Top 10
]);

Key stages: $match (filter), $group (aggregate), $project/$addFields (reshape), $sort, $limit/$skip, $unwind (deconstruct arrays), $lookup (join collections), $facet (parallel pipelines).

Important performance rules:

  • Place $match first to use indexes and reduce documents
  • Aggregation has a 100MB per-stage memory limit (use allowDiskUse: true)
  • In Mongoose, aggregation bypasses schemas — you must manually cast ObjectIds
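The stage-by-stage flow of the example above can be mimicked in plain JavaScript (an in-memory sketch, not how the server executes it): each step consumes the previous step's output, exactly like pipeline stages.

```javascript
// In-memory sketch of $match → $group → $sort → $limit on sample orders.
const orders = [
  { customerId: "a", status: "completed", amount: 100 },
  { customerId: "b", status: "completed", amount: 250 },
  { customerId: "a", status: "completed", amount: 50 },
  { customerId: "c", status: "pending",   amount: 999 },
];

// 1. $match — filter first, so later stages see fewer documents.
const matched = orders.filter(o => o.status === "completed");

// 2. $group — sum amounts per customerId.
const groups = {};
for (const o of matched) {
  groups[o.customerId] = (groups[o.customerId] || 0) + o.amount;
}

// 3. $sort + 4. $limit — rank totals descending, keep the top 10.
const results = Object.entries(groups)
  .map(([id, total]) => ({ _id: id, total }))
  .sort((x, y) => y.total - x.total)
  .slice(0, 10);
```

Here `results` is `[{ _id: "b", total: 250 }, { _id: "a", total: 150 }]` — the pending order never reaches the grouping step, which is exactly why placing $match first matters.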

Q7. Explain $lookup and its limitations.

Model Answer:

$lookup performs a left outer join between two collections in the same database:

db.orders.aggregate([
  { $lookup: {
    from: "users",              // Collection to join
    localField: "customerId",   // Field in orders
    foreignField: "_id",        // Field in users
    as: "customer"              // Output array field
  }},
  { $unwind: "$customer" }     // Flatten the array to an object
]);

Limitations:

  1. Same database only — cannot join across databases
  2. No sharded from collection (in older versions; MongoDB 5.1+ supports it)
  3. Always returns an array — even for one-to-one relationships, you need $unwind
  4. Performance — $lookup on large collections without indexes on foreignField is very slow
  5. Cannot use indexes inside $facet — if $lookup is inside $facet, it processes all input documents
  6. Memory — joining large datasets can exceed the 100MB stage limit

Best practice: Always have an index on the foreignField and filter with $match before $lookup.
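The left-outer-join semantics (including limitation 3) can be sketched in plain JavaScript as a hash join — roughly what an indexed foreignField buys you: constant-time lookups instead of scanning `users` for every order.

```javascript
// Plain-JS sketch of $lookup as a left outer join (illustrative only).
const users = [
  { _id: 1, name: "Alice" },
  { _id: 2, name: "Bob" },
];
const orders = [
  { orderId: "o1", customerId: 1 },
  { orderId: "o2", customerId: 99 }, // no matching user
];

// Hash map on the foreign field — the in-memory analogue of an index on users._id.
const byId = new Map(users.map(u => [u._id, u]));

const joined = orders.map(o => ({
  ...o,
  // $lookup ALWAYS produces an array: one element per match, empty when none.
  customer: byId.has(o.customerId) ? [byId.get(o.customerId)] : [],
}));
```

Order "o2" keeps an empty `customer` array rather than being dropped — that is the "left outer" part, and why one-to-one joins still need an $unwind afterwards.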


Q8. What is $facet and when would you use it?

Model Answer:

$facet runs multiple aggregation sub-pipelines in parallel on the same set of input documents. Each sub-pipeline produces independent results stored as separate fields in a single output document.

db.products.aggregate([
  { $match: { category: "electronics" } },
  { $facet: {
    results: [{ $skip: 0 }, { $limit: 20 }],
    totalCount: [{ $count: "count" }],
    priceStats: [{ $group: { _id: null, avg: { $avg: "$price" } } }]
  }}
]);

Common use cases:

  1. Faceted search — return search results plus filter counts (like e-commerce sidebars)
  2. Dashboard queries — multiple analytics in one round trip
  3. Pagination metadata — return paginated data plus total count simultaneously

Key limitation: Sub-pipelines inside $facet cannot use indexes (only the $match before $facet can).
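The "same input, parallel sub-pipelines" idea can be modeled in a few lines of plain JavaScript (a sketch of the semantics, not the server's execution):

```javascript
// Each sub-pipeline is a function over the SAME input documents; the
// outputs become fields of a single result object — the $facet shape.
function facet(docs, pipelines) {
  const out = {};
  for (const [name, fn] of Object.entries(pipelines)) {
    out[name] = fn(docs); // every sub-pipeline sees the full input set
  }
  return out;
}

const products = [
  { name: "tv",    price: 400 },
  { name: "phone", price: 600 },
  { name: "cable", price: 10 },
];

const facetResult = facet(products, {
  results:    d => d.slice(0, 2),                                        // $skip 0, $limit 2
  totalCount: d => [{ count: d.length }],                                // $count
  priceStats: d => [{ avg: d.reduce((s, p) => s + p.price, 0) / d.length }], // $group avg
});
```

One pass over the input yields the page of results, the total count, and the price stats together — the single-round-trip property that makes $facet attractive for dashboards and pagination.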


Q9. How do you handle the 100MB aggregation memory limit?

Model Answer:

Each aggregation pipeline stage has a 100MB memory limit by default. Options to handle large datasets:

  1. allowDiskUse: true — spill to disk when memory is exceeded (slower but works):
db.orders.aggregate([...], { allowDiskUse: true });
// Mongoose: Model.aggregate([...]).allowDiskUse(true);
  2. Reduce data early — $match and $project at the start to minimize document size
  3. Avoid $push of $$ROOT — collecting entire documents into arrays is memory-intensive
  4. Use $merge/$out — materialize intermediate results into collections for multi-step processing
  5. Limit $unwind — unwinding large arrays multiplies document count dramatically
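The $unwind multiplication is easy to see in a plain-JavaScript sketch of its semantics: one output document per array element, so memory and document count grow with total array length.

```javascript
// Plain-JS model of $unwind: "$tags" — each array element becomes
// its own document, with the array field replaced by that element.
const docs = [
  { _id: 1, tags: ["a", "b", "c"] },
  { _id: 2, tags: ["x", "y"] },
];

const unwound = docs.flatMap(d =>
  d.tags.map(t => ({ _id: d._id, tags: t }))
);
// 2 input documents → 5 output documents
```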

Advanced Level

Q10. When should you NOT create an index?

Model Answer:

Indexes are not always beneficial. Avoid indexing when:

  1. Small collections (< 1000 documents) — collection scan is fast enough; index overhead is unnecessary
  2. Write-heavy, read-light workloads — every write must update ALL indexes, so many indexes significantly slow inserts and updates
  3. Low-selectivity fields — fields like gender (2-3 values) or isActive (true/false) barely narrow searches; the index is barely more efficient than a collection scan
  4. Rarely queried fields — unused indexes waste RAM and slow writes
  5. Very large documents with embedded arrays — multikey indexes on large arrays can be enormous
  6. Rapidly changing fields — index constantly needs rebuilding

Strategy: Use $indexStats to identify unused indexes and remove them. Monitor index size with db.collection.stats().indexSizes. Indexes should fit in RAM.
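Point 3 (selectivity) can be estimated before creating an index. A rough plain-JavaScript heuristic — distinct values divided by document count, computed over a sample — is sketched below; the 0.5 threshold in the comment is an illustrative assumption, not a MongoDB rule.

```javascript
// Rough selectivity estimate: distinct values / documents in a sample.
// Near 1 (e.g. email) → the field narrows searches well and indexes well.
// Near 0 (e.g. isActive) → an index barely beats a collection scan.
function selectivity(docs, field) {
  const distinct = new Set(docs.map(d => d[field]));
  return distinct.size / docs.length;
}

const sample = [
  { email: "a@x.com", isActive: true },
  { email: "b@x.com", isActive: true },
  { email: "c@x.com", isActive: false },
  { email: "d@x.com", isActive: true },
];
// selectivity(sample, "email")    → 1.0  (every value unique)
// selectivity(sample, "isActive") → 0.5  (only two values in this sample)
```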


Q11. How would you optimize a slow MongoDB query in production?

Model Answer:

Step-by-step approach:

  1. Identify the slow query:

    • Enable profiler: db.setProfilingLevel(1, { slowms: 100 })
    • Check: db.system.profile.find().sort({ ts: -1 }).limit(5)
  2. Analyze with explain():

    db.collection.find({...}).explain("executionStats");
    
    • Check for COLLSCAN (bad) vs IXSCAN (good)
    • Compare totalDocsExamined to nReturned
    • Look for in-memory SORT stages
  3. Design the index:

    • Apply ESR rule: Equality, Sort, Range
    • Put high-selectivity fields first
    • Consider if a covered query is possible
  4. Create the index:

    db.collection.createIndex({...});
    
    • In production, create during low-traffic periods
    • Since MongoDB 4.2, index builds no longer block the collection for their duration (the old { background: true } option is deprecated and ignored)
  5. Verify:

    • Run explain() again — confirm IXSCAN
    • Compare execution time before and after
  6. Monitor:

    • Check $indexStats to confirm the index is being used
    • Review index sizes relative to available RAM

Q12. Compare $merge and $out. When do you use each?

Model Answer:

Both write aggregation results to a collection, but with different behaviors:

| Feature | $out | $merge |
|---|---|---|
| Target collection | Replaces entirely | Upserts/merges |
| If target exists | Drops and recreates | Updates matching docs |
| If doc matches | N/A (full replace) | merge/replace/keepExisting/fail |
| If doc doesn't match | N/A | insert/discard/fail |
| Same database only? | Yes | Can write to different database |
| Atomicity | Atomic replacement | Per-document upsert |

Use $out when:

  • You want to completely replace a materialized view
  • Building a new collection from scratch (e.g., full ETL rebuild)

Use $merge when:

  • Incrementally updating a stats/reports collection
  • Upsert behavior is needed (update existing, insert new)
  • You need to preserve existing data not affected by the pipeline
  • Writing to a collection in a different database

// $merge — update customer stats incrementally
{ $merge: { into: "customer_stats", on: "_id", whenMatched: "merge", whenNotMatched: "insert" } }

// $out — full rebuild of monthly reports
{ $out: "monthly_reports" }