Episode 3 — NodeJS MongoDB Backend Architecture / 3.11 — Database Optimization

3.11 — Exercise Questions: Database Optimization

Practice makes permanent. These 40+ questions cover indexing, explain() analysis, aggregation pipelines, and MongoDB operators. Try to answer each one before checking the hints.


< 3.11.e — Advanced Aggregation | Interview Questions >


Indexing (Questions 1-12)

Q1. What is the default index on every MongoDB collection?

Hint: Every document has a unique identifier field created automatically.

Q2. Create a single-field index on the email field of a users collection in ascending order. Then create a unique index on the same field.

Hint: createIndex({ field: 1 }) and the unique option.

Q3. Given a compound index { status: 1, createdAt: -1, total: 1 }, which of these queries can use the index efficiently?

  • a) db.orders.find({ status: "active" })
  • b) db.orders.find({ createdAt: { $gte: date } })
  • c) db.orders.find({ status: "active", createdAt: { $gte: date } })
  • d) db.orders.find({ total: { $gt: 100 } })
  • e) db.orders.find({}).sort({ status: 1, createdAt: -1 })

Hint: Compound indexes support queries on any prefix of the indexed fields.

Q4. What is a covered query? Write an example with the index { email: 1, name: 1 }.

Hint: A covered query returns data entirely from the index without reading documents. Mind the _id field.

Q5. Create a TTL index that automatically deletes documents from a sessions collection after 2 hours.

Hint: expireAfterSeconds option on a date field.

Q6. When should you NOT create an index? List at least 3 scenarios.

Hint: Think about collection size, write patterns, and field selectivity.

Q7. What is a partial index? Create one that only indexes orders where total is greater than 500.

Hint: partialFilterExpression option.

Q8. Explain the difference between a sparse index and a partial index.

Hint: Sparse indexes only include documents where the indexed field exists. Partial indexes use a filter expression.

Q9. What is a text index? Create one that enables searching across title and content fields of a posts collection.

Hint: Text indexes use the "text" keyword instead of 1 or -1.

Q10. Write Mongoose schema definitions that include: (a) a unique index on username, (b) a compound index on { category: 1, price: -1 }, (c) auto-index disabled for production.

Hint: Schema-level index definition uses schema.index().

Q11. Why does index direction (1 vs -1) matter for compound indexes but not for single-field indexes?

Hint: Think about sort operations on multiple fields.

Q12. You have a collection with 50 million documents. What happens to write performance as you add more indexes?

Hint: Every insert/update must update ALL indexes on the collection.


explain() and Performance (Questions 13-20)

Q13. Run explain("executionStats") on a query and identify whether it uses an index. What field tells you the scan type?

Hint: Look for stage — the key values are COLLSCAN and IXSCAN.

Q14. Given this explain output, identify the problems:

{
  "nReturned": 5,
  "executionTimeMillis": 2300,
  "totalKeysExamined": 0,
  "totalDocsExamined": 8000000
}

Hint: Compare nReturned to totalDocsExamined, and check totalKeysExamined.

Q15. What is the ESR rule? Apply it to design an index for this query:

db.products.find({ category: "electronics", price: { $gte: 50, $lte: 200 } }).sort({ rating: -1 });

Hint: Equality first, then Sort, then Range.

Q16. How do you check which indexes exist on a collection and how often each is used?

Hint: getIndexes() and $indexStats aggregation.

Q17. What does hint() do? When should you use it?

Hint: Forces MongoDB to use a specific index. Use sparingly.

Q18. What is the difference between explain(), explain("executionStats"), and explain("allPlansExecution")?

Hint: Each mode provides increasingly detailed information.

Q19. You see a SORT stage in your explain output with high memory usage. What does this mean and how do you fix it?

Hint: In-memory sort happens when the index cannot provide the sort order.

Q20. How do you enable profiling to log slow queries? What threshold would you set in production?

Hint: db.setProfilingLevel() and slowms parameter.


Aggregation Pipeline (Questions 21-32)

Q21. Write an aggregation pipeline to calculate total revenue and average order value for completed orders.

Hint: $match then $group with _id: null, using $sum and $avg.

Q22. Write a pipeline that groups orders by month and calculates monthly revenue.

Hint: Use $month or $dateToString inside the $group _id.

Q23. Given an orders collection with an items array field, write a pipeline to find the top 5 most-ordered products.

Hint: $unwind the items array, then $group by product, then $sort and $limit.

Q24. Write a $lookup to join orders with users (on customerId -> _id) and include the customer's name in the output.

Hint: $lookup produces an array — use $unwind to flatten it.

Q25. What is the difference between $project and $addFields?

Hint: One removes fields not mentioned; the other preserves all existing fields.

Q26. Write a $facet pipeline that returns both paginated results and a total count in one query.

Hint: Two sub-pipelines inside $facet — one with $skip/$limit, one with $count.

Q27. Use $bucket to group products into price ranges: 0-25, 25-50, 50-100, 100-500, 500+.

Hint: boundaries array and default for the overflow bucket.

Q28. Write a pipeline using $cond to add a "highValue" boolean field that is true when order total exceeds 1000.

Hint: $addFields with $cond: [condition, trueValue, falseValue].

Q29. What is the 100MB memory limit in aggregation? How do you handle datasets that exceed it?

Hint: allowDiskUse: true option.

Q30. Write an aggregation using $merge to upsert customer statistics into a customer_stats collection.

Hint: $merge with whenMatched: "merge" and whenNotMatched: "insert".

Q31. Explain why you should place $match as the first stage in a pipeline.

Hint: Two reasons — index usage and document reduction.

Q32. Write a pipeline that uses $graphLookup to find all ancestors of a category in a category tree.

Hint: startWith: "$parent", connectFromField: "parent", connectToField: "_id".


MongoDB Operators (Questions 33-42)

Q33. Write a query to find all users aged between 18 and 30 (inclusive) who are either "active" or "pending".

Hint: Combine $gte/$lte with $in.

Q34. What is the difference between $push and $addToSet? When would you use each?

Hint: One allows duplicates, the other does not.

Q35. Write an update that increments viewCount by 1, sets lastViewed to the current date, and adds a tag "trending" only if it does not already exist.

Hint: Combine $inc, $set, and $addToSet in one update.

Q36. Remove all elements from a notifications array where read is true.

Hint: $pull with a condition object.

Q37. What does $elemMatch do that a simple array query cannot? Give an example.

Hint: $elemMatch ensures all conditions match the SAME array element.

Q38. Write a query to find documents where a phone field exists AND is of type "string".

Hint: Combine $exists and $type.

Q39. What is the difference between $unset and setting a field to null?

Hint: One removes the field entirely; the other keeps the field with a null value.

Q40. Use $regex to find users whose email ends with "@gmail.com". Can this query use an index?

Hint: Suffix regex (/$pattern/) cannot use indexes efficiently; prefix regex (/^pattern/) can.

Q41. Write an update using the positional operator $ to modify a specific element in an array.

Hint: The query must match the array element, and $ refers to the first matched element.

Q42. Use $expr to find products where the stock field is less than the reorderLevel field.

Hint: $expr allows comparing two fields within the same document.


Bonus Challenges (Questions 43-45)

Q43. Design a complete indexing strategy for a blog platform with these queries:

  • Find posts by author, sorted by date
  • Search posts by text content
  • Find posts by tag
  • Find posts with more than 100 likes, sorted by likes
  • Admin: find all posts by status

List each index and explain which query it supports.

Q44. Write a complete aggregation pipeline for a "User Activity Report" that includes:

  • Total posts per user
  • Average likes per post
  • Most active day of the week
  • User ranking by engagement

Hint: Use $lookup, $unwind, $group, $addFields, $sort.

Q45. You notice the following slow query in production logs:

db.orders.find({ customerId: id, status: "completed", total: { $gte: 50 } }).sort({ createdAt: -1 }).limit(10)

Walk through the complete optimization process: analyze with explain(), design the optimal index using the ESR rule, create the index, and verify the improvement.

Hint: Identify E (equality), S (sort), and R (range) fields, then create the compound index in that order.