Episode 3 — NodeJS MongoDB Backend Architecture / 3.11 — Database Optimization
3.11 — Interview Questions: Database Optimization
These 12 questions cover the most commonly asked MongoDB optimization topics in technical interviews — from beginner fundamentals to advanced aggregation patterns.
Quick-Fire Table
| # | Question | Level | Key Topic |
|---|---|---|---|
| 1 | What is an index in MongoDB? | Beginner | Indexing basics |
| 2 | What is the difference between COLLSCAN and IXSCAN? | Beginner | explain() |
| 3 | How does a compound index work? | Beginner | Index types |
| 4 | What is the ESR rule? | Intermediate | Index design |
| 5 | What is a covered query? | Intermediate | Performance |
| 6 | How does the aggregation pipeline work? | Intermediate | Aggregation |
| 7 | Explain $lookup and its limitations | Intermediate | Joins |
| 8 | What is $facet and when would you use it? | Intermediate | Advanced aggregation |
| 9 | How do you handle the 100MB aggregation memory limit? | Intermediate | Performance |
| 10 | When should you NOT create an index? | Advanced | Index strategy |
| 11 | How would you optimize a slow MongoDB query in production? | Advanced | Debugging |
| 12 | Compare $merge and $out. When do you use each? | Advanced | Advanced aggregation |
Beginner Level
Q1. What is an index in MongoDB and why is it important?
Model Answer:
An index is a B-tree data structure that maps field values to the documents containing those values. It works like the index at the back of a textbook — instead of scanning every document in a collection (collection scan, O(n)), MongoDB can use the index to jump directly to matching documents (index scan, O(log n)).
Indexes are important because:
- They dramatically speed up read queries — from O(n) to O(log n)
- They enable efficient sorting without loading all documents into memory
- Unique indexes enforce data constraints at the database level
- TTL indexes automatically expire documents (sessions, tokens)
The tradeoff is that indexes slow down write operations because every insert, update, or delete must also update all indexes on the collection.
```javascript
// Without index: scans all 1M documents (~1000ms)
// With index: B-tree lookup (~1ms)
db.users.createIndex({ email: 1 }, { unique: true });
```
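The O(n) vs O(log n) contrast can be sketched in plain JavaScript. This is only an analogy (a sorted array with binary search standing in for the B-tree; the data and helper names are invented for illustration):

```javascript
// Collection scan: check every document, O(n)
function collscan(docs, email) {
  return docs.filter((d) => d.email === email);
}

// "Index" as a sorted array of [key, docId] pairs; lookup is binary search, O(log n)
function ixscan(indexEntries, email) {
  let lo = 0, hi = indexEntries.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const key = indexEntries[mid][0];
    if (key === email) return indexEntries[mid][1]; // found: return docId
    if (key < email) lo = mid + 1;
    else hi = mid - 1;
  }
  return null; // no matching key
}

const docs = [{ email: "bob@x.com" }, { email: "alice@x.com" }, { email: "carol@x.com" }];
// Analogue of createIndex({ email: 1 }): sort keys once, search many times
const index = docs.map((d, i) => [d.email, i]).sort((a, b) => (a[0] < b[0] ? -1 : 1));

console.log(ixscan(index, "alice@x.com")); // 1 (docId of alice)
```

For 3 documents the difference is invisible; for 1M documents the scan does ~1M comparisons while the binary search does ~20, which is the whole case for the index.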
Q2. What is the difference between COLLSCAN and IXSCAN?
Model Answer:
These are execution stages shown in explain() output:
- COLLSCAN (Collection Scan): MongoDB reads every document in the collection to find matches. This is O(n) and very slow on large collections. It means no suitable index exists for the query.
- IXSCAN (Index Scan): MongoDB uses a B-tree index to locate matching documents. This is O(log n) and significantly faster. Only the relevant index entries and matching documents are read.
```javascript
// Check which scan a query uses:
db.users.find({ email: "alice@example.com" }).explain("executionStats");
// Look at executionStages.stage: "IXSCAN" or "COLLSCAN"
```
Key metrics to compare:
- totalDocsExamined: COLLSCAN examines ALL docs; IXSCAN examines only matches
- totalKeysExamined: 0 for COLLSCAN; close to nReturned for IXSCAN
- executionTimeMillis: COLLSCAN is dramatically slower on large collections
Q3. How does a compound index work and what is the index prefix rule?
Model Answer:
A compound index indexes multiple fields in a specific order. The field order is crucial because a compound index supports queries on any prefix of its fields, but not on arbitrary subsets.
```javascript
// Compound index
db.orders.createIndex({ status: 1, createdAt: -1, total: 1 });

// This index supports these queries (using prefixes):
db.orders.find({ status: "active" });                            // prefix: status
db.orders.find({ status: "active", createdAt: { $gte: date } }); // prefix: status, createdAt
db.orders.find({ status: "active" }).sort({ createdAt: -1 });    // prefix: status + sort on createdAt

// These queries CANNOT use the index efficiently:
db.orders.find({ createdAt: { $gte: date } }); // skips status (first field)
db.orders.find({ total: { $gt: 100 } });       // skips status and createdAt
```
Direction matters for compound indexes in sort operations. If the index is { a: 1, b: -1 }, it supports sorts { a: 1, b: -1 } and { a: -1, b: 1 } (exact reverse), but not { a: 1, b: 1 }.
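The direction rule can be checked with a small plain-JavaScript sketch (an analogy, not MongoDB internals; the field names are invented for illustration): walking an index stored in { a: 1, b: -1 } order backwards yields exactly the { a: -1, b: 1 } sort, while no walk direction produces { a: 1, b: 1 }.

```javascript
// Comparator matching index order { a: 1, b: -1 }
const byIndexOrder = (x, y) => (x.a - y.a) || (y.b - x.b);
// Comparator for the sort { a: -1, b: 1 } (the exact reverse)
const byReverse = (x, y) => (y.a - x.a) || (x.b - y.b);

const docs = [
  { a: 1, b: 2 }, { a: 1, b: 9 }, { a: 2, b: 5 }, { a: 2, b: 1 },
];

const indexOrder = [...docs].sort(byIndexOrder);   // forward index walk
const backwardWalk = [...indexOrder].reverse();    // backward index walk

// Backward walk equals the { a: -1, b: 1 } sort:
console.log(JSON.stringify(backwardWalk) === JSON.stringify([...docs].sort(byReverse))); // true
```

A { a: 1, b: 1 } sort interleaves the entries differently, so neither walk direction can satisfy it and MongoDB would fall back to an in-memory sort.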
Intermediate Level
Q4. What is the ESR rule and how do you apply it?
Model Answer:
The ESR rule defines the optimal field order for compound indexes:
- E (Equality): Fields tested for exact match go first — they narrow results most effectively
- S (Sort): Fields used in sort order go second — allows the index to provide sorted results without in-memory sort
- R (Range): Fields with range conditions ($gte, $lte, $in) go last — range scans after equality and sort are most efficient
```javascript
// Query:
db.orders.find({
  customerId: "abc123",       // E - Equality
  total: { $gte: 50 }         // R - Range
}).sort({ createdAt: -1 });   // S - Sort

// Optimal index (ESR order):
db.orders.createIndex({ customerId: 1, createdAt: -1, total: 1 });
//                      E              S              R
```
Putting range before sort forces MongoDB to do an in-memory sort, which is slower and limited to 100MB. ESR order avoids this.
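Why ESR avoids that in-memory sort can be seen in a small plain-JavaScript sketch (a conceptual model, not MongoDB's planner; the sample data is invented): after the equality match on the leading field, documents come off the index already in createdAt order, so the range filter on total needs no re-sort.

```javascript
// Comparator matching index order { customerId: 1, createdAt: -1, total: 1 }
const cmp = (x, y) =>
  x.customerId.localeCompare(y.customerId) ||
  (y.createdAt - x.createdAt) ||
  (x.total - y.total);

const orders = [
  { customerId: "abc123", createdAt: 3, total: 80 },
  { customerId: "abc123", createdAt: 1, total: 60 },
  { customerId: "zzz999", createdAt: 5, total: 10 },
  { customerId: "abc123", createdAt: 2, total: 40 },
];

const indexScan = [...orders].sort(cmp); // the order an IXSCAN would emit

// find({ customerId: "abc123", total: { $gte: 50 } }).sort({ createdAt: -1 })
const result = indexScan.filter(
  (o) => o.customerId === "abc123" && o.total >= 50
);

// Already sorted by createdAt descending: no in-memory SORT stage needed
console.log(result.map((o) => o.createdAt)); // [ 3, 1 ]
```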
Q5. What is a covered query and why is it the fastest type of query?
Model Answer:
A covered query is one where the index contains ALL fields needed by the query — both for filtering and for the returned projection. MongoDB answers the query entirely from the index without ever reading the actual documents from disk.
```javascript
// Index: { email: 1, name: 1 }
// Covered query:
db.users.find(
  { email: "alice@example.com" },  // Filter field in index
  { email: 1, name: 1, _id: 0 }    // All projected fields in index, _id excluded
);
// totalDocsExamined: 0 — fastest possible
```
Key requirements:
- All query filter fields must be in the index
- All projected (returned) fields must be in the index
- The _id field must be explicitly excluded (since it is returned by default and may not be in the index)
In explain(), a covered query shows totalDocsExamined: 0 and uses PROJECTION_COVERED stage.
Q6. How does the aggregation pipeline work?
Model Answer:
The aggregation pipeline is an ordered array of stages that process documents sequentially. Each stage takes input documents, transforms them, and passes the results to the next stage — like a data assembly line.
```javascript
db.orders.aggregate([
  { $match: { status: "completed" } },                            // 1. Filter
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } }, // 2. Group & sum
  { $sort: { total: -1 } },                                       // 3. Sort
  { $limit: 10 }                                                  // 4. Top 10
]);
```
Key stages: $match (filter), $group (aggregate), $project/$addFields (reshape), $sort, $limit/$skip, $unwind (deconstruct arrays), $lookup (join collections), $facet (parallel pipelines).
Important performance rules:
- Place $match first to use indexes and reduce documents
- Aggregation has a 100MB per-stage memory limit (use allowDiskUse: true)
- In Mongoose, aggregation bypasses schemas — you must manually cast ObjectIds
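The assembly-line behavior can be mimicked in plain JavaScript (a conceptual sketch, not the server's implementation; all helper names here are invented): each stage is a function from an array of documents to a new array, applied in order.

```javascript
// Minimal stand-ins for $match, $group (with $sum), $sort, $limit
const match = (pred) => (docs) => docs.filter(pred);
const group = (keyFn, sumFn) => (docs) => {
  const acc = new Map();
  for (const d of docs) {
    const k = keyFn(d);
    acc.set(k, (acc.get(k) || 0) + sumFn(d)); // running $sum per group key
  }
  return [...acc].map(([_id, total]) => ({ _id, total }));
};
const sortDesc = (field) => (docs) => [...docs].sort((a, b) => b[field] - a[field]);
const limit = (n) => (docs) => docs.slice(0, n);

const pipeline = [
  match((o) => o.status === "completed"),
  group((o) => o.customerId, (o) => o.amount),
  sortDesc("total"),
  limit(10),
];

const orders = [
  { status: "completed", customerId: "a", amount: 10 },
  { status: "pending",   customerId: "a", amount: 99 },
  { status: "completed", customerId: "b", amount: 30 },
  { status: "completed", customerId: "a", amount: 5 },
];

// Each stage feeds the next, exactly like the aggregation pipeline
const result = pipeline.reduce((docs, stage) => stage(docs), orders);
console.log(result); // [ { _id: 'b', total: 30 }, { _id: 'a', total: 15 } ]
```

The reduce call also makes the performance rule obvious: whatever match removes up front never has to pass through the later, more expensive stages.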
Q7. Explain $lookup and its limitations.
Model Answer:
$lookup performs a left outer join between two collections in the same database:
```javascript
db.orders.aggregate([
  { $lookup: {
      from: "users",             // Collection to join
      localField: "customerId",  // Field in orders
      foreignField: "_id",       // Field in users
      as: "customer"             // Output array field
  }},
  { $unwind: "$customer" }       // Flatten the array to an object
]);
```
Limitations:
- Same database only — cannot join across databases
- No sharded from collection (in older versions; MongoDB 5.1+ supports it)
- Always returns an array — even for one-to-one relationships, you need $unwind
- Performance — $lookup on large collections without an index on foreignField is very slow
- Cannot use indexes inside $facet — if $lookup is inside $facet, it processes all input documents
- Memory — joining large datasets can exceed the 100MB stage limit
Best practice: Always have an index on the foreignField and filter with $match before $lookup.
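Why the foreignField index matters can be illustrated with a plain-JavaScript left outer join (a conceptual sketch with invented data, not $lookup itself): without a lookup structure every order forces a scan of all users, while a Map keyed like an index makes each probe constant time.

```javascript
// Conceptual $lookup: left outer join of orders -> users on customerId = _id
const users = [
  { _id: "u1", name: "Alice" },
  { _id: "u2", name: "Bob" },
];
const orders = [
  { _id: "o1", customerId: "u1", amount: 50 },
  { _id: "o2", customerId: "u9", amount: 20 }, // no matching user
];

// Build a "foreignField index" in one pass over users
const byId = new Map(users.map((u) => [u._id, u]));

const joined = orders.map((o) => ({
  ...o,
  // $lookup always produces an array; empty when nothing matches (left outer join)
  customer: byId.has(o.customerId) ? [byId.get(o.customerId)] : [],
}));

console.log(joined[0].customer); // [ { _id: 'u1', name: 'Alice' } ]
console.log(joined[1].customer); // []
```

The empty array for the unmatched order is also why one-to-one joins need $unwind (or a projection) to flatten the result.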
Q8. What is $facet and when would you use it?
Model Answer:
$facet runs multiple aggregation sub-pipelines in parallel on the same set of input documents. Each sub-pipeline produces independent results stored as separate fields in a single output document.
```javascript
db.products.aggregate([
  { $match: { category: "electronics" } },
  { $facet: {
      results: [{ $skip: 0 }, { $limit: 20 }],
      totalCount: [{ $count: "count" }],
      priceStats: [{ $group: { _id: null, avg: { $avg: "$price" } } }]
  }}
]);
```
Common use cases:
- Faceted search — return search results plus filter counts (like e-commerce sidebars)
- Dashboard queries — multiple analytics in one round trip
- Pagination metadata — return paginated data plus total count simultaneously
Key limitation: Sub-pipelines inside $facet cannot use indexes (only the $match before $facet can).
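The pagination-plus-count pattern can be sketched in plain JavaScript (a conceptual model with invented data, not the server's implementation): every sub-pipeline consumes the same filtered input and contributes one field of a single output document.

```javascript
const products = [
  { category: "electronics", price: 100 },
  { category: "electronics", price: 300 },
  { category: "books",       price: 20 },
];

// The $match BEFORE $facet: the only part that can use an index
const input = products.filter((p) => p.category === "electronics");

// Each $facet sub-pipeline runs over the same input documents
const facetResult = {
  results: input.slice(0, 20),           // like $skip: 0, $limit: 20
  totalCount: [{ count: input.length }], // like $count
  priceStats: [{
    _id: null,                           // like $group with $avg
    avg: input.reduce((s, p) => s + p.price, 0) / input.length,
  }],
};

console.log(facetResult.totalCount);        // [ { count: 2 } ]
console.log(facetResult.priceStats[0].avg); // 200
```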
Q9. How do you handle the 100MB aggregation memory limit?
Model Answer:
Each aggregation pipeline stage has a 100MB memory limit by default. Options to handle large datasets:
- allowDiskUse: true — spill to disk when memory is exceeded (slower but works):

```javascript
db.orders.aggregate([...], { allowDiskUse: true });
// Mongoose: Model.aggregate([...]).allowDiskUse(true);
```

- Reduce data early — $match and $project at the start to minimize document size
- Avoid $push of $$ROOT — collecting entire documents into arrays is memory-intensive
- Use $merge / $out — materialize intermediate results into collections for multi-step processing
- Limit $unwind — unwinding large arrays multiplies document count dramatically
Advanced Level
Q10. When should you NOT create an index?
Model Answer:
Indexes are not always beneficial. Avoid indexing when:
- Small collections (< 1000 documents) — collection scan is fast enough; index overhead is unnecessary
- Write-heavy, read-light workloads — every write must update ALL indexes, so many indexes significantly slow inserts and updates
- Low-selectivity fields — fields like gender (2-3 values) or isActive (true/false) barely narrow searches; the index is barely more efficient than a collection scan
- Rarely queried fields — unused indexes waste RAM and slow writes
- Very large documents with embedded arrays — multikey indexes on large arrays can be enormous
- Rapidly changing fields — index constantly needs rebuilding
Strategy: Use $indexStats to identify unused indexes and remove them. Monitor index size with db.collection.stats().indexSizes. Indexes should fit in RAM.
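Selectivity, the deciding factor for the low-cardinality cases above, can be estimated with a quick sketch (an illustrative helper with invented data, not a MongoDB API): the ratio of distinct values to document count.

```javascript
// Rough selectivity estimate: distinct values / document count.
// Near 1: highly selective, a good index candidate.
// Near 0: an equality match still leaves most of the collection.
function selectivity(docs, field) {
  const distinct = new Set(docs.map((d) => d[field])).size;
  return distinct / docs.length;
}

const users = Array.from({ length: 1000 }, (_, i) => ({
  email: `user${i}@example.com`, // unique per document
  isActive: i % 2 === 0,         // only two possible values
}));

console.log(selectivity(users, "email"));    // 1     (index-worthy)
console.log(selectivity(users, "isActive")); // 0.002 (barely narrows anything)
```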
Q11. How would you optimize a slow MongoDB query in production?
Model Answer:
Step-by-step approach:
1. Identify the slow query:
   - Enable the profiler: db.setProfilingLevel(1, { slowms: 100 })
   - Check: db.system.profile.find().sort({ ts: -1 }).limit(5)
2. Analyze with explain():
   - db.collection.find({...}).explain("executionStats");
   - Check for COLLSCAN (bad) vs IXSCAN (good)
   - Compare totalDocsExamined to nReturned
   - Look for in-memory SORT stages
3. Design the index:
   - Apply ESR rule: Equality, Sort, Range
   - Put high-selectivity fields first
   - Consider if a covered query is possible
4. Create the index:
   - db.collection.createIndex({...});
   - In production, create during low-traffic periods
   - Since MongoDB 4.2, all index builds use an optimized process that avoids blocking reads and writes; the legacy background: true option is deprecated and ignored
5. Verify:
   - Run explain() again — confirm IXSCAN
   - Compare execution time before and after
6. Monitor:
   - Check $indexStats to confirm the index is being used
   - Review index sizes relative to available RAM
Q12. Compare $merge and $out. When do you use each?
Model Answer:
Both write aggregation results to a collection, but with different behaviors:
| Feature | $out | $merge |
|---|---|---|
| Target collection | Replaces entirely | Upserts/merges |
| If target exists | Drops and recreates | Updates matching docs |
| If doc matches | N/A (full replace) | merge/replace/keepExisting/fail |
| If doc doesn't match | N/A | insert/discard/fail |
| Same database only? | Yes (before MongoDB 4.4; later versions can target another database) | No — can write to a different database |
| Atomicity | Atomic replacement | Per-document upsert |
Use $out when:
- You want to completely replace a materialized view
- Building a new collection from scratch (e.g., full ETL rebuild)
Use $merge when:
- Incrementally updating a stats/reports collection
- Upsert behavior is needed (update existing, insert new)
- You need to preserve existing data not affected by the pipeline
- Writing to a collection in a different database
```javascript
// $merge — update customer stats incrementally
{ $merge: { into: "customer_stats", on: "_id", whenMatched: "merge", whenNotMatched: "insert" } }

// $out — full rebuild of monthly reports
{ $out: "monthly_reports" }
```