Episode 3 — NodeJS MongoDB Backend Architecture / 3.11 — Database Optimization
3.11 — Exercise Questions: Database Optimization
Practice makes permanent. These 40+ questions cover indexing, explain() analysis, aggregation pipelines, and MongoDB operators. Try to answer each one before checking the hints.
< 3.11.e — Advanced Aggregation | Interview Questions >
Indexing (Questions 1-12)
Q1. What is the default index on every MongoDB collection?
Hint: Every document has a unique identifier field created automatically.
Q2. Create a single-field index on the email field of a users collection in ascending order. Then create a unique index on the same field.
Hint: createIndex({ field: 1 }) and the unique option.
Q3. Given a compound index { status: 1, createdAt: -1, total: 1 }, which of these queries can use the index efficiently?
- a)
db.orders.find({ status: "active" }) - b)
db.orders.find({ createdAt: { $gte: date } }) - c)
db.orders.find({ status: "active", createdAt: { $gte: date } }) - d)
db.orders.find({ total: { $gt: 100 } }) - e)
db.orders.find({}).sort({ status: 1, createdAt: -1 })
Hint: Compound indexes support queries on any prefix of the indexed fields.
Q4. What is a covered query? Write an example with the index { email: 1, name: 1 }.
Hint: A covered query returns data entirely from the index without reading documents. Mind the _id field.
Q5. Create a TTL index that automatically deletes documents from a sessions collection after 2 hours.
Hint: expireAfterSeconds option on a date field.
Q6. When should you NOT create an index? List at least 3 scenarios.
Hint: Think about collection size, write patterns, and field selectivity.
Q7. What is a partial index? Create one that only indexes orders where total is greater than 500.
Hint: partialFilterExpression option.
Q8. Explain the difference between a sparse index and a partial index.
Hint: Sparse indexes only include documents where the indexed field exists. Partial indexes use a filter expression.
Q9. What is a text index? Create one that enables searching across title and content fields of a posts collection.
Hint: Text indexes use the "text" keyword instead of 1 or -1.
Q10. Write Mongoose schema definitions that include: (a) a unique index on username, (b) a compound index on { category: 1, price: -1 }, (c) auto-index disabled for production.
Hint: Schema-level index definition uses schema.index().
Q11. Why does index direction (1 vs -1) matter for compound indexes but not for single-field indexes?
Hint: Think about sort operations on multiple fields.
Q12. You have a collection with 50 million documents. What happens to write performance as you add more indexes?
Hint: Every insert/update must update ALL indexes on the collection.
explain() and Performance (Questions 13-20)
Q13. Run explain("executionStats") on a query and identify whether it uses an index. What field tells you the scan type?
Hint: Look for stage — the key values are COLLSCAN and IXSCAN.
Q14. Given this explain output, identify the problems:
{
"nReturned": 5,
"executionTimeMillis": 2300,
"totalKeysExamined": 0,
"totalDocsExamined": 8000000
}
Hint: Compare nReturned to totalDocsExamined, and check totalKeysExamined.
Q15. What is the ESR rule? Apply it to design an index for this query:
db.products.find({ category: "electronics", price: { $gte: 50, $lte: 200 } }).sort({ rating: -1 });
Hint: Equality first, then Sort, then Range.
Q16. How do you check which indexes exist on a collection and how often each is used?
Hint: getIndexes() and $indexStats aggregation.
Q17. What does hint() do? When should you use it?
Hint: Forces MongoDB to use a specific index. Use sparingly.
Q18. What is the difference between explain(), explain("executionStats"), and explain("allPlansExecution")?
Hint: Each mode provides increasingly detailed information.
Q19. You see a SORT stage in your explain output with high memory usage. What does this mean and how do you fix it?
Hint: In-memory sort happens when the index cannot provide the sort order.
Q20. How do you enable profiling to log slow queries? What threshold would you set in production?
Hint: db.setProfilingLevel() and slowms parameter.
Aggregation Pipeline (Questions 21-32)
Q21. Write an aggregation pipeline to calculate total revenue and average order value for completed orders.
Hint: $match then $group with _id: null, using $sum and $avg.
Q22. Write a pipeline that groups orders by month and calculates monthly revenue.
Hint: Use $month or $dateToString inside the $group _id.
Q23. Given an orders collection with an items array field, write a pipeline to find the top 5 most-ordered products.
Hint: $unwind the items array, then $group by product, then $sort and $limit.
Q24. Write a $lookup to join orders with users (on customerId -> _id) and include the customer's name in the output.
Hint: $lookup produces an array — use $unwind to flatten it.
Q25. What is the difference between $project and $addFields?
Hint: One removes fields not mentioned; the other preserves all existing fields.
Q26. Write a $facet pipeline that returns both paginated results and a total count in one query.
Hint: Two sub-pipelines inside $facet — one with $skip/$limit, one with $count.
Q27. Use $bucket to group products into price ranges: 0-25, 25-50, 50-100, 100-500, 500+.
Hint: boundaries array and default for the overflow bucket.
Q28. Write a pipeline using $cond to add a "highValue" boolean field that is true when order total exceeds 1000.
Hint: $addFields with $cond: [condition, trueValue, falseValue].
Q29. What is the 100MB memory limit in aggregation? How do you handle datasets that exceed it?
Hint: allowDiskUse: true option.
Q30. Write an aggregation using $merge to upsert customer statistics into a customer_stats collection.
Hint: $merge with whenMatched: "merge" and whenNotMatched: "insert".
Q31. Explain why you should place $match as the first stage in a pipeline.
Hint: Two reasons — index usage and document reduction.
Q32. Write a pipeline that uses $graphLookup to find all ancestors of a category in a category tree.
Hint: startWith: "$parent", connectFromField: "parent", connectToField: "_id".
MongoDB Operators (Questions 33-42)
Q33. Write a query to find all users aged between 18 and 30 (inclusive) who are either "active" or "pending".
Hint: Combine $gte/$lte with $in.
Q34. What is the difference between $push and $addToSet? When would you use each?
Hint: One allows duplicates, the other does not.
Q35. Write an update that increments viewCount by 1, sets lastViewed to the current date, and adds a tag "trending" only if it does not already exist.
Hint: Combine $inc, $set, and $addToSet in one update.
Q36. Remove all elements from a notifications array where read is true.
Hint: $pull with a condition object.
Q37. What does $elemMatch do that a simple array query cannot? Give an example.
Hint: $elemMatch ensures all conditions match the SAME array element.
Q38. Write a query to find documents where a phone field exists AND is of type "string".
Hint: Combine $exists and $type.
Q39. What is the difference between $unset and setting a field to null?
Hint: One removes the field entirely; the other keeps the field with a null value.
Q40. Use $regex to find users whose email ends with "@gmail.com". Can this query use an index?
Hint: Suffix regex (/$pattern/) cannot use indexes efficiently; prefix regex (/^pattern/) can.
Q41. Write an update using the positional operator $ to modify a specific element in an array.
Hint: The query must match the array element, and $ refers to the first matched element.
Q42. Use $expr to find products where the stock field is less than the reorderLevel field.
Hint: $expr allows comparing two fields within the same document.
Bonus Challenges (Questions 43-45)
Q43. Design a complete indexing strategy for a blog platform with these queries:
- Find posts by author, sorted by date
- Search posts by text content
- Find posts by tag
- Find posts with more than 100 likes, sorted by likes
- Admin: find all posts by status
List each index and explain which query it supports.
Q44. Write a complete aggregation pipeline for a "User Activity Report" that includes:
- Total posts per user
- Average likes per post
- Most active day of the week
- User ranking by engagement
Hint: Use $lookup, $unwind, $group, $addFields, $sort.
Q45. You notice the following slow query in production logs:
db.orders.find({ customerId: id, status: "completed", total: { $gte: 50 } }).sort({ createdAt: -1 }).limit(10)
Walk through the complete optimization process: analyze with explain(), design the optimal index using the ESR rule, create the index, and verify the improvement.
Hint: Identify E (equality), S (sort), and R (range) fields, then create the compound index in that order.