Episode 3 — NodeJS MongoDB Backend Architecture / 3.11 — Database Optimization

3.11 — Interview Questions: Database Optimization

These 12 questions cover the most commonly asked MongoDB optimization topics in technical interviews — from beginner fundamentals to advanced aggregation patterns.




Quick-Fire Table

| # | Question | Level | Key Topic |
|---|----------|-------|-----------|
| 1 | What is an index in MongoDB? | Beginner | Indexing basics |
| 2 | What is the difference between COLLSCAN and IXSCAN? | Beginner | explain() |
| 3 | How does a compound index work? | Beginner | Index types |
| 4 | What is the ESR rule? | Intermediate | Index design |
| 5 | What is a covered query? | Intermediate | Performance |
| 6 | How does the aggregation pipeline work? | Intermediate | Aggregation |
| 7 | Explain $lookup and its limitations | Intermediate | Joins |
| 8 | What is $facet and when would you use it? | Intermediate | Advanced aggregation |
| 9 | How do you handle the 100MB aggregation memory limit? | Intermediate | Performance |
| 10 | When should you NOT create an index? | Advanced | Index strategy |
| 11 | How would you optimize a slow MongoDB query in production? | Advanced | Debugging |
| 12 | Compare $merge and $out. When do you use each? | Advanced | Advanced aggregation |

Beginner Level

Q1. What is an index in MongoDB and why is it important?

Model Answer:

An index is a B-tree data structure that maps field values to the documents containing those values. It works like the index at the back of a textbook — instead of scanning every document in a collection (collection scan, O(n)), MongoDB can use the index to jump directly to matching documents (index scan, O(log n)).

Indexes are important because:

  • They dramatically speed up read queries — from O(n) to O(log n)
  • They enable efficient sorting without loading all documents into memory
  • Unique indexes enforce data constraints at the database level
  • TTL indexes automatically expire documents (sessions, tokens)

The tradeoff is that indexes slow down write operations because every insert, update, or delete must also update all indexes on the collection.

// Without index: scans all 1M documents (~1000ms)
// With index: B-tree lookup (~1ms)
db.users.createIndex({ email: 1 }, { unique: true });

Q2. What is the difference between COLLSCAN and IXSCAN?

Model Answer:

These are execution stages shown in explain() output:

  • COLLSCAN (Collection Scan): MongoDB reads every document in the collection to find matches. This is O(n) and very slow on large collections. It means no suitable index exists for the query.

  • IXSCAN (Index Scan): MongoDB uses a B-tree index to locate matching documents. This is O(log n) and significantly faster. Only the relevant index entries and matching documents are read.

// Check which scan a query uses:
db.users.find({ email: "alice@example.com" }).explain("executionStats");
// Look at executionStages.stage: "IXSCAN" or "COLLSCAN"

Key metrics to compare:

  • totalDocsExamined: COLLSCAN examines ALL docs; IXSCAN examines only matches
  • totalKeysExamined: 0 for COLLSCAN; close to nReturned for IXSCAN
  • executionTimeMillis: COLLSCAN is dramatically slower on large collections
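The gap between these metrics can be illustrated with a plain-JavaScript sketch (a toy model, not MongoDB internals): a collection scan examines every document, while an index scan is a binary search over sorted `{ key, docId }` entries and examines only O(log n) of them.

```javascript
// Toy model of COLLSCAN vs IXSCAN costs (illustrative only).
const docs = Array.from({ length: 1000 }, (_, i) => ({ _id: i, email: `user${i}@example.com` }));

// COLLSCAN: examine every document in the collection.
function collScan(target) {
  let examined = 0, found = null;
  for (const d of docs) {
    examined++;
    if (d.email === target) found = d;
  }
  return { found, examined }; // examined === docs.length
}

// IXSCAN: binary search over a sorted "index" of { key, docId } entries.
const index = docs
  .map(d => ({ key: d.email, docId: d._id }))
  .sort((a, b) => (a.key < b.key ? -1 : 1));

function ixScan(target) {
  let examined = 0, lo = 0, hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    examined++; // each step halves the search space: O(log n)
    if (index[mid].key === target) return { found: docs[index[mid].docId], examined };
    if (index[mid].key < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return { found: null, examined };
}
```

On 1,000 documents the scan examines all 1,000 entries while the binary search needs at most about 10 comparisons, mirroring the `totalDocsExamined` difference `explain()` reports.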

Q3. How does a compound index work and what is the index prefix rule?

Model Answer:

A compound index indexes multiple fields in a specific order. The field order is crucial because a compound index supports queries on any prefix of its fields, but not on arbitrary subsets.

// Compound index
db.orders.createIndex({ status: 1, createdAt: -1, total: 1 });

// This index supports these queries (using prefixes):
db.orders.find({ status: "active" });                              // prefix: status
db.orders.find({ status: "active", createdAt: { $gte: date } });  // prefix: status, createdAt
db.orders.find({ status: "active" }).sort({ createdAt: -1 });     // prefix: status + sort on createdAt

// These queries CANNOT use the index efficiently:
db.orders.find({ createdAt: { $gte: date } });  // skips status (first field)
db.orders.find({ total: { $gt: 100 } });        // skips status and createdAt

Direction matters for compound indexes in sort operations. If the index is { a: 1, b: -1 }, it supports sorts { a: 1, b: -1 } and { a: -1, b: 1 } (exact reverse), but not { a: 1, b: 1 }.
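The prefix rule follows from how compound-index entries are stored. A small plain-JavaScript sketch (not MongoDB internals) shows why: with entries sorted by (status, createdAt), all matches for a leading-field query sit in one contiguous block, while matches on the second field alone are scattered and force a full scan of the index.

```javascript
// Compound-index entries, sorted by (status, createdAt) — the on-disk order.
const entries = [
  { status: "active",   createdAt: 3 },
  { status: "active",   createdAt: 7 },
  { status: "archived", createdAt: 1 },
  { status: "archived", createdAt: 7 },
  { status: "pending",  createdAt: 2 },
];

// Prefix query { status: "active" }: matches form one contiguous block,
// so a bounded range scan over the sorted entries finds them cheaply.
const activeBlock = entries.filter(e => e.status === "active");

// Non-prefix query { createdAt: 7 }: matches are scattered across the
// index (positions 1 and 3 here), so every entry must be inspected.
const positions = entries
  .map((e, i) => (e.createdAt === 7 ? i : -1))
  .filter(i => i !== -1);
```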


Intermediate Level

Q4. What is the ESR rule and how do you apply it?

Model Answer:

The ESR rule defines the optimal field order for compound indexes:

  • E (Equality): Fields tested for exact match go first — they narrow results most effectively
  • S (Sort): Fields used in sort order go second — allows the index to provide sorted results without in-memory sort
  • R (Range): Fields with range conditions ($gte, $lte, $in) go last — range scans after equality and sort are most efficient

// Query:
db.orders.find({
  customerId: "abc123",                    // E - Equality
  total: { $gte: 50 }                     // R - Range
}).sort({ createdAt: -1 });               // S - Sort

// Optimal index (ESR order):
db.orders.createIndex({ customerId: 1, createdAt: -1, total: 1 });
//                        E                S                R

Putting range before sort forces MongoDB to do an in-memory sort, which is slower and limited to 100MB. ESR order avoids this.


Q5. What is a covered query and why is it the fastest type of query?

Model Answer:

A covered query is one where the index contains ALL fields needed by the query — both for filtering and for the returned projection. MongoDB answers the query entirely from the index without ever reading the actual documents from disk.

// Index: { email: 1, name: 1 }
// Covered query:
db.users.find(
  { email: "alice@example.com" },    // Filter field in index
  { email: 1, name: 1, _id: 0 }     // All projected fields in index, _id excluded
);
// totalDocsExamined: 0 — fastest possible

Key requirements:

  1. All query filter fields must be in the index
  2. All projected (returned) fields must be in the index
  3. The _id field must be explicitly excluded (since it is returned by default and may not be in the index)

In explain(), a covered query shows totalDocsExamined: 0 and uses PROJECTION_COVERED stage.


Q6. How does the aggregation pipeline work?

Model Answer:

The aggregation pipeline is an ordered array of stages that process documents sequentially. Each stage takes input documents, transforms them, and passes the results to the next stage — like a data assembly line.

db.orders.aggregate([
  { $match: { status: "completed" } },          // 1. Filter
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },  // 2. Group & sum
  { $sort: { total: -1 } },                     // 3. Sort
  { $limit: 10 }                                // 4. Top 10
]);

Key stages: $match (filter), $group (aggregate), $project/$addFields (reshape), $sort, $limit/$skip, $unwind (deconstruct arrays), $lookup (join collections), $facet (parallel pipelines).

Important performance rules:

  • Place $match first to use indexes and reduce documents
  • Aggregation has a 100MB per-stage memory limit (use allowDiskUse: true)
  • In Mongoose, aggregation bypasses schemas — you must manually cast ObjectIds
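The stage-by-stage flow of the example above can be mimicked in plain JavaScript (an in-memory sketch, not how the server executes it): each step consumes the previous step's output, exactly like pipeline stages.

```javascript
// In-memory sketch of $match → $group → $sort → $limit on sample orders.
const orders = [
  { customerId: "a", status: "completed", amount: 100 },
  { customerId: "b", status: "completed", amount: 250 },
  { customerId: "a", status: "completed", amount: 50 },
  { customerId: "c", status: "pending",   amount: 999 },
];

// 1. $match — filter first, so later stages see fewer documents.
const matched = orders.filter(o => o.status === "completed");

// 2. $group — sum amounts per customerId.
const groups = {};
for (const o of matched) {
  groups[o.customerId] = (groups[o.customerId] || 0) + o.amount;
}

// 3. $sort + 4. $limit — rank totals descending, keep the top 10.
const results = Object.entries(groups)
  .map(([id, total]) => ({ _id: id, total }))
  .sort((x, y) => y.total - x.total)
  .slice(0, 10);
```

Here `results` is `[{ _id: "b", total: 250 }, { _id: "a", total: 150 }]` — the pending order never reaches the grouping step, which is exactly why placing $match first matters.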

Q7. Explain $lookup and its limitations.

Model Answer:

$lookup performs a left outer join between two collections in the same database:

db.orders.aggregate([
  { $lookup: {
    from: "users",              // Collection to join
    localField: "customerId",   // Field in orders
    foreignField: "_id",        // Field in users
    as: "customer"              // Output array field
  }},
  { $unwind: "$customer" }     // Flatten the array to an object
]);

Limitations:

  1. Same database only — cannot join across databases
  2. No sharded from collection (in older versions; MongoDB 5.1+ supports it)
  3. Always returns an array — even for one-to-one relationships, you need $unwind
  4. Performance — $lookup on large collections without indexes on foreignField is very slow
  5. Cannot use indexes inside $facet — if $lookup is inside $facet, it processes all input documents
  6. Memory — joining large datasets can exceed the 100MB stage limit

Best practice: Always have an index on the foreignField and filter with $match before $lookup.
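The left-outer-join semantics (including limitation 3) can be sketched in plain JavaScript as a hash join — roughly what an indexed foreignField buys you: constant-time lookups instead of scanning `users` for every order.

```javascript
// Plain-JS sketch of $lookup as a left outer join (illustrative only).
const users = [
  { _id: 1, name: "Alice" },
  { _id: 2, name: "Bob" },
];
const orders = [
  { orderId: "o1", customerId: 1 },
  { orderId: "o2", customerId: 99 }, // no matching user
];

// Hash map on the foreign field — the in-memory analogue of an index on users._id.
const byId = new Map(users.map(u => [u._id, u]));

const joined = orders.map(o => ({
  ...o,
  // $lookup ALWAYS produces an array: one element per match, empty when none.
  customer: byId.has(o.customerId) ? [byId.get(o.customerId)] : [],
}));
```

Order "o2" keeps an empty `customer` array rather than being dropped — that is the "left outer" part, and why one-to-one joins still need an $unwind afterwards.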


Q8. What is $facet and when would you use it?

Model Answer:

$facet runs multiple aggregation sub-pipelines in parallel on the same set of input documents. Each sub-pipeline produces independent results stored as separate fields in a single output document.

db.products.aggregate([
  { $match: { category: "electronics" } },
  { $facet: {
    results: [{ $skip: 0 }, { $limit: 20 }],
    totalCount: [{ $count: "count" }],
    priceStats: [{ $group: { _id: null, avg: { $avg: "$price" } } }]
  }}
]);

Common use cases:

  1. Faceted search — return search results plus filter counts (like e-commerce sidebars)
  2. Dashboard queries — multiple analytics in one round trip
  3. Pagination metadata — return paginated data plus total count simultaneously

Key limitation: Sub-pipelines inside $facet cannot use indexes (only the $match before $facet can).
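The "same input, parallel sub-pipelines" idea can be modeled in a few lines of plain JavaScript (a sketch of the semantics, not the server's execution):

```javascript
// Each sub-pipeline is a function over the SAME input documents; the
// outputs become fields of a single result object — the $facet shape.
function facet(docs, pipelines) {
  const out = {};
  for (const [name, fn] of Object.entries(pipelines)) {
    out[name] = fn(docs); // every sub-pipeline sees the full input set
  }
  return out;
}

const products = [
  { name: "tv",    price: 400 },
  { name: "phone", price: 600 },
  { name: "cable", price: 10 },
];

const facetResult = facet(products, {
  results:    d => d.slice(0, 2),                                        // $skip 0, $limit 2
  totalCount: d => [{ count: d.length }],                                // $count
  priceStats: d => [{ avg: d.reduce((s, p) => s + p.price, 0) / d.length }], // $group avg
});
```

One pass over the input yields the page of results, the total count, and the price stats together — the single-round-trip property that makes $facet attractive for dashboards and pagination.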


Q9. How do you handle the 100MB aggregation memory limit?

Model Answer:

Each aggregation pipeline stage has a 100MB memory limit by default. Options to handle large datasets:

  1. allowDiskUse: true — spill to disk when memory is exceeded (slower but works):
db.orders.aggregate([...], { allowDiskUse: true });
// Mongoose: Model.aggregate([...]).allowDiskUse(true);
  2. Reduce data early — $match and $project at the start to minimize document size
  3. Avoid $push of $$ROOT — collecting entire documents into arrays is memory-intensive
  4. Use $merge/$out — materialize intermediate results into collections for multi-step processing
  5. Limit $unwind — unwinding large arrays multiplies document count dramatically
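The $unwind multiplication is easy to see in a plain-JavaScript sketch of its semantics: one output document per array element, so memory and document count grow with total array length.

```javascript
// Plain-JS model of $unwind: "$tags" — each array element becomes
// its own document, with the array field replaced by that element.
const docs = [
  { _id: 1, tags: ["a", "b", "c"] },
  { _id: 2, tags: ["x", "y"] },
];

const unwound = docs.flatMap(d =>
  d.tags.map(t => ({ _id: d._id, tags: t }))
);
// 2 input documents → 5 output documents
```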

Advanced Level

Q10. When should you NOT create an index?

Model Answer:

Indexes are not always beneficial. Avoid indexing when:

  1. Small collections (< 1000 documents) — collection scan is fast enough; index overhead is unnecessary
  2. Write-heavy, read-light workloads — every write must update ALL indexes, so many indexes significantly slow inserts and updates
  3. Low-selectivity fields — fields like gender (2-3 values) or isActive (true/false) barely narrow searches; the index is barely more efficient than a collection scan
  4. Rarely queried fields — unused indexes waste RAM and slow writes
  5. Very large documents with embedded arrays — multikey indexes on large arrays can be enormous
  6. Rapidly changing fields — index constantly needs rebuilding

Strategy: Use $indexStats to identify unused indexes and remove them. Monitor index size with db.collection.stats().indexSizes. Indexes should fit in RAM.
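Point 3 (selectivity) can be estimated before creating an index. A rough plain-JavaScript heuristic — distinct values divided by document count, computed over a sample — is sketched below; the 0.5 threshold in the comment is an illustrative assumption, not a MongoDB rule.

```javascript
// Rough selectivity estimate: distinct values / documents in a sample.
// Near 1 (e.g. email) → the field narrows searches well and indexes well.
// Near 0 (e.g. isActive) → an index barely beats a collection scan.
function selectivity(docs, field) {
  const distinct = new Set(docs.map(d => d[field]));
  return distinct.size / docs.length;
}

const sample = [
  { email: "a@x.com", isActive: true },
  { email: "b@x.com", isActive: true },
  { email: "c@x.com", isActive: false },
  { email: "d@x.com", isActive: true },
];
// selectivity(sample, "email")    → 1.0  (every value unique)
// selectivity(sample, "isActive") → 0.5  (only two values in this sample)
```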


Q11. How would you optimize a slow MongoDB query in production?

Model Answer:

Step-by-step approach:

  1. Identify the slow query:

    • Enable profiler: db.setProfilingLevel(1, { slowms: 100 })
    • Check: db.system.profile.find().sort({ ts: -1 }).limit(5)
  2. Analyze with explain():

    db.collection.find({...}).explain("executionStats");
    
    • Check for COLLSCAN (bad) vs IXSCAN (good)
    • Compare totalDocsExamined to nReturned
    • Look for in-memory SORT stages
  3. Design the index:

    • Apply ESR rule: Equality, Sort, Range
    • Put high-selectivity fields first
    • Consider if a covered query is possible
  4. Create the index:

    db.collection.createIndex({...});
    
    • In production, create during low-traffic periods
    • Since MongoDB 4.2, index builds no longer block the collection for their duration (the old { background: true } option is deprecated and ignored)
  5. Verify:

    • Run explain() again — confirm IXSCAN
    • Compare execution time before and after
  6. Monitor:

    • Check $indexStats to confirm the index is being used
    • Review index sizes relative to available RAM

Q12. Compare $merge and $out. When do you use each?

Model Answer:

Both write aggregation results to a collection, but with different behaviors:

| Feature | $out | $merge |
|---|---|---|
| Target collection | Replaces entirely | Upserts/merges |
| If target exists | Drops and recreates | Updates matching docs |
| If doc matches | N/A (full replace) | merge/replace/keepExisting/fail |
| If doc doesn't match | N/A | insert/discard/fail |
| Same database only? | Yes | Can write to different database |
| Atomicity | Atomic replacement | Per-document upsert |

Use $out when:

  • You want to completely replace a materialized view
  • Building a new collection from scratch (e.g., full ETL rebuild)

Use $merge when:

  • Incrementally updating a stats/reports collection
  • Upsert behavior is needed (update existing, insert new)
  • You need to preserve existing data not affected by the pipeline
  • Writing to a collection in a different database

// $merge — update customer stats incrementally
{ $merge: { into: "customer_stats", on: "_id", whenMatched: "merge", whenNotMatched: "insert" } }

// $out — full rebuild of monthly reports
{ $out: "monthly_reports" }