Episode 3 — NodeJS MongoDB Backend Architecture / 3.11 — Database Optimization

3.11 — Exercise Questions: Database Optimization

Practice makes permanent. These 40+ questions cover indexing, explain() analysis, aggregation pipelines, and MongoDB operators. Try to answer each one before checking the hints.

< 3.11.e — Advanced Aggregation | Interview Questions >

Indexing (Questions 1-12)

Q1. What is the default index on every MongoDB collection?

Hint: Every document has a unique identifier field created automatically.

Q2. Create a single-field index on the `email` field of a `users` collection in ascending order. Then create a unique index on the same field.

Hint: createIndex({ field: 1 }) and the unique option.

Q3. Given a compound index `{ status: 1, createdAt: -1, total: 1 }`, which of these queries can use the index efficiently?

a) db.orders.find({ status: "active" })
b) db.orders.find({ createdAt: { $gte: date } })
c) db.orders.find({ status: "active", createdAt: { $gte: date } })
d) db.orders.find({ total: { $gt: 100 } })
e) db.orders.find({}).sort({ status: 1, createdAt: -1 })

Hint: Compound indexes support queries on any prefix of the indexed fields.

Q4. What is a covered query? Write an example with the index `{ email: 1, name: 1 }`.

Hint: A covered query returns data entirely from the index without reading documents. Mind the _id field.

Q5. Create a TTL index that automatically deletes documents from a `sessions` collection after 2 hours.

Hint: expireAfterSeconds option on a date field.

Q6. When should you NOT create an index? List at least 3 scenarios.

Hint: Think about collection size, write patterns, and field selectivity.

Q7. What is a partial index? Create one that only indexes orders where `total` is greater than 500.

Hint: partialFilterExpression option.

Q8. Explain the difference between a sparse index and a partial index.

Hint: Sparse indexes only include documents where the indexed field exists. Partial indexes use a filter expression.

Q9. What is a text index? Create one that enables searching across `title` and `content` fields of a `posts` collection.

Hint: Text indexes use the "text" keyword instead of 1 or -1.

Q10. Write Mongoose schema definitions that include: (a) a unique index on `username`, (b) a compound index on `{ category: 1, price: -1 }`, (c) auto-index disabled for production.

Hint: Schema-level index definition uses schema.index().

Q11. Why does index direction (1 vs -1) matter for compound indexes but not for single-field indexes?

Hint: Think about sort operations on multiple fields.

Q12. You have a collection with 50 million documents. What happens to write performance as you add more indexes?

Hint: Every insert/update must update ALL indexes on the collection.

explain() and Performance (Questions 13-20)

Q13. Run `explain("executionStats")` on a query and identify whether it uses an index. What field tells you the scan type?

Hint: Look for stage — the key values are COLLSCAN and IXSCAN.

Q14. Given this explain output, identify the problems:

{
  "nReturned": 5,
  "executionTimeMillis": 2300,
  "totalKeysExamined": 0,
  "totalDocsExamined": 8000000
}

Hint: Compare nReturned to totalDocsExamined, and check totalKeysExamined.

Q15. What is the ESR rule? Apply it to design an index for this query:

db.products.find({ category: "electronics", price: { $gte: 50, $lte: 200 } }).sort({ rating: -1 });

Hint: Equality first, then Sort, then Range.

Q16. How do you check which indexes exist on a collection and how often each is used?

Hint: getIndexes() and $indexStats aggregation.

Q17. What does `hint()` do? When should you use it?

Hint: Forces MongoDB to use a specific index. Use sparingly.

Q18. What is the difference between `explain()`, `explain("executionStats")`, and `explain("allPlansExecution")`?

Hint: Each mode provides increasingly detailed information.

Q19. You see a `SORT` stage in your explain output with high memory usage. What does this mean and how do you fix it?

Hint: In-memory sort happens when the index cannot provide the sort order.

Q20. How do you enable profiling to log slow queries? What threshold would you set in production?

Hint: db.setProfilingLevel() and slowms parameter.

Aggregation Pipeline (Questions 21-32)

Q21. Write an aggregation pipeline to calculate total revenue and average order value for completed orders.

Hint: $match then $group with _id: null, using $sum and $avg.

Q22. Write a pipeline that groups orders by month and calculates monthly revenue.

Hint: Use $month or $dateToString inside the $group _id.

Q23. Given an `orders` collection with an `items` array field, write a pipeline to find the top 5 most-ordered products.

Hint: $unwind the items array, then $group by product, then $sort and $limit.

Q24. Write a `$lookup` to join `orders` with `users` (on `customerId` -> `_id`) and include the customer's name in the output.

Hint: $lookup produces an array — use $unwind to flatten it.

Q25. What is the difference between `$project` and `$addFields`?

Hint: One removes fields not mentioned; the other preserves all existing fields.

Q26. Write a `$facet` pipeline that returns both paginated results and a total count in one query.

Hint: Two sub-pipelines inside $facet — one with $skip/$limit, one with $count.

Q27. Use `$bucket` to group products into price ranges: 0-25, 25-50, 50-100, 100-500, 500+.

Hint: boundaries array and default for the overflow bucket.

Q28. Write a pipeline using `$cond` to add a "highValue" boolean field that is `true` when order total exceeds 1000.

Hint: $addFields with $cond: [condition, trueValue, falseValue].

Q29. What is the 100MB memory limit in aggregation? How do you handle datasets that exceed it?

Hint: allowDiskUse: true option.

Q30. Write an aggregation using `$merge` to upsert customer statistics into a `customer_stats` collection.

Hint: $merge with whenMatched: "merge" and whenNotMatched: "insert".

Q31. Explain why you should place `$match` as the first stage in a pipeline.

Hint: Two reasons — index usage and document reduction.

Q32. Write a pipeline that uses `$graphLookup` to find all ancestors of a category in a category tree.

Hint: startWith: "$parent", connectFromField: "parent", connectToField: "_id".

MongoDB Operators (Questions 33-42)

Q33. Write a query to find all users aged between 18 and 30 (inclusive) who are either "active" or "pending".

Hint: Combine $gte/$lte with $in.

Q34. What is the difference between `$push` and `$addToSet`? When would you use each?

Hint: One allows duplicates, the other does not.

Q35. Write an update that increments `viewCount` by 1, sets `lastViewed` to the current date, and adds a tag "trending" only if it does not already exist.

Hint: Combine $inc, $set, and $addToSet in one update.

Q36. Remove all elements from a `notifications` array where `read` is `true`.

Hint: $pull with a condition object.

Q37. What does `$elemMatch` do that a simple array query cannot? Give an example.

Hint: $elemMatch ensures all conditions match the SAME array element.

Q38. Write a query to find documents where a `phone` field exists AND is of type "string".

Hint: Combine $exists and $type.

Q39. What is the difference between `$unset` and setting a field to `null`?

Hint: One removes the field entirely; the other keeps the field with a null value.

Q40. Use `$regex` to find users whose email ends with "@gmail.com". Can this query use an index?

Hint: Suffix regex (/$pattern/) cannot use indexes efficiently; prefix regex (/^pattern/) can.

Q41. Write an update using the positional operator `$` to modify a specific element in an array.

Hint: The query must match the array element, and $ refers to the first matched element.

Q42. Use `$expr` to find products where the `stock` field is less than the `reorderLevel` field.

Hint: $expr allows comparing two fields within the same document.

Bonus Challenges (Questions 43-45)

Q43. Design a complete indexing strategy for a blog platform with these queries:

Find posts by author, sorted by date
Search posts by text content
Find posts by tag
Find posts with more than 100 likes, sorted by likes
Admin: find all posts by status

List each index and explain which query it supports.

Q44. Write a complete aggregation pipeline for a "User Activity Report" that includes:

Total posts per user
Average likes per post
Most active day of the week
User ranking by engagement

Hint: Use $lookup, $unwind, $group, $addFields, $sort.

Q45. You notice the following slow query in production logs:

db.orders.find({ customerId: id, status: "completed", total: { $gte: 50 } }).sort({ createdAt: -1 }).limit(10)

Walk through the complete optimization process: analyze with explain(), design the optimal index using the ESR rule, create the index, and verify the improvement.

Hint: Identify E (equality), S (sort), and R (range) fields, then create the compound index in that order.

3.11 — Exercise Questions: Database Optimization

Indexing (Questions 1-12)

Q1. What is the default index on every MongoDB collection?

Q2. Create a single-field index on the email field of a users collection in ascending order. Then create a unique index on the same field.

Q3. Given a compound index { status: 1, createdAt: -1, total: 1 }, which of these queries can use the index efficiently?

Q4. What is a covered query? Write an example with the index { email: 1, name: 1 }.

Q5. Create a TTL index that automatically deletes documents from a sessions collection after 2 hours.

Q6. When should you NOT create an index? List at least 3 scenarios.

Q7. What is a partial index? Create one that only indexes orders where total is greater than 500.

Q8. Explain the difference between a sparse index and a partial index.

Q9. What is a text index? Create one that enables searching across title and content fields of a posts collection.

Q10. Write Mongoose schema definitions that include: (a) a unique index on username, (b) a compound index on { category: 1, price: -1 }, (c) auto-index disabled for production.

Q11. Why does index direction (1 vs -1) matter for compound indexes but not for single-field indexes?

Q12. You have a collection with 50 million documents. What happens to write performance as you add more indexes?

explain() and Performance (Questions 13-20)

Q13. Run explain("executionStats") on a query and identify whether it uses an index. What field tells you the scan type?

Q14. Given this explain output, identify the problems:

Q15. What is the ESR rule? Apply it to design an index for this query:

Q16. How do you check which indexes exist on a collection and how often each is used?

Q17. What does hint() do? When should you use it?

Q18. What is the difference between explain(), explain("executionStats"), and explain("allPlansExecution")?

Q19. You see a SORT stage in your explain output with high memory usage. What does this mean and how do you fix it?

Q20. How do you enable profiling to log slow queries? What threshold would you set in production?

Aggregation Pipeline (Questions 21-32)

Q21. Write an aggregation pipeline to calculate total revenue and average order value for completed orders.

Q22. Write a pipeline that groups orders by month and calculates monthly revenue.

Q23. Given an orders collection with an items array field, write a pipeline to find the top 5 most-ordered products.

Q24. Write a $lookup to join orders with users (on customerId -> _id) and include the customer's name in the output.

Q25. What is the difference between $project and $addFields?

Q26. Write a $facet pipeline that returns both paginated results and a total count in one query.

Q27. Use $bucket to group products into price ranges: 0-25, 25-50, 50-100, 100-500, 500+.

Q28. Write a pipeline using $cond to add a "highValue" boolean field that is true when order total exceeds 1000.