Episode 3 — NodeJS MongoDB Backend Architecture / 3.11 — Database Optimization
3.11.c — MongoDB Aggregation Pipeline
The aggregation pipeline is MongoDB's most powerful data processing tool. It lets you transform, filter, group, and reshape documents through a series of stages — think of it as a data assembly line where each stage processes the output of the previous one.
< 3.11.b — Indexing Best Practices | 3.11.d — MongoDB Operators >
1. What Is the Aggregation Pipeline?
An aggregation pipeline is an ordered array of stages. Documents enter the first stage, get processed, and the output flows into the next stage:
Documents → [ $match ] → [ $group ] → [ $sort ] → [ $limit ] → Results
(filter) (summarize) (order) (top N)
// Basic syntax
db.collection.aggregate([
{ $stage1: { ... } },
{ $stage2: { ... } },
{ $stage3: { ... } },
]);
// Mongoose syntax
const results = await Model.aggregate([
{ $stage1: { ... } },
{ $stage2: { ... } },
]);
Key difference from find(): Aggregation can transform data — compute totals, averages, reshape documents, join collections, and more. find() can only filter and project existing fields.
2. $match — Filtering Documents
$match filters documents, like a find() query. Always place $match as early as possible to reduce the number of documents flowing through subsequent stages.
// Filter for completed orders
db.orders.aggregate([
{ $match: { status: "completed" } }
]);
// Multiple conditions
db.orders.aggregate([
{ $match: {
status: "completed",
total: { $gte: 100 },
createdAt: { $gte: new Date("2025-01-01") }
}}
]);
Performance tip: When $match is the first stage and uses indexed fields, MongoDB uses the index — just like a regular find().
3. $group — Grouping and Accumulating
$group groups documents by a key and applies accumulators to compute values per group.
// Syntax
{ $group: {
_id: <grouping expression>, // What to group by (null = all documents)
field1: { <accumulator>: <expression> },
field2: { <accumulator>: <expression> },
}}
Accumulators
| Accumulator | Description | Example |
|---|---|---|
$sum | Sum values or count | { $sum: "$amount" } or { $sum: 1 } |
$avg | Average value | { $avg: "$price" } |
$min | Minimum value | { $min: "$score" } |
$max | Maximum value | { $max: "$score" } |
$count | Count documents in group | { $count: {} } |
$push | Collect values into an array | { $push: "$name" } |
$addToSet | Collect unique values | { $addToSet: "$category" } |
$first | First value in group | { $first: "$name" } |
$last | Last value in group | { $last: "$name" } |
Examples
// Total sales per customer
db.orders.aggregate([
{ $group: {
_id: "$customerId",
totalSpent: { $sum: "$total" },
orderCount: { $sum: 1 },
avgOrderValue: { $avg: "$total" },
firstOrder: { $min: "$createdAt" },
lastOrder: { $max: "$createdAt" }
}}
]);
// Count orders by status
db.orders.aggregate([
{ $group: {
_id: "$status",
count: { $sum: 1 }
}}
]);
// Output: [{ _id: "completed", count: 450 }, { _id: "pending", count: 87 }, ...]
// Group ALL documents (no grouping key)
db.orders.aggregate([
{ $group: {
_id: null,
totalRevenue: { $sum: "$total" },
totalOrders: { $sum: 1 },
avgOrderValue: { $avg: "$total" }
}}
]);
// Output: [{ _id: null, totalRevenue: 125000, totalOrders: 537, avgOrderValue: 232.77 }]
// Collect product names per category
db.products.aggregate([
{ $group: {
_id: "$category",
products: { $push: "$name" },
uniqueTags: { $addToSet: "$brand" }
}}
]);
4. $project — Reshaping Documents
$project controls which fields appear in the output and can create computed fields:
// Include specific fields
db.users.aggregate([
{ $project: {
name: 1,
email: 1,
_id: 0 // Exclude _id
}}
]);
// Computed fields
db.orders.aggregate([
{ $project: {
orderId: "$_id",
customer: "$customerId",
totalWithTax: { $multiply: ["$total", 1.18] },
year: { $year: "$createdAt" },
month: { $month: "$createdAt" },
fullName: { $concat: ["$firstName", " ", "$lastName"] },
isHighValue: { $gte: ["$total", 1000] }
}}
]);
5. $sort — Ordering Results
// Sort by total descending
db.orders.aggregate([
{ $match: { status: "completed" } },
{ $group: {
_id: "$customerId",
totalSpent: { $sum: "$total" }
}},
{ $sort: { totalSpent: -1 } } // -1 = descending, 1 = ascending
]);
Performance tip: If $sort is placed immediately after $match and uses an indexed field, MongoDB can use the index for sorting and avoid in-memory sort.
6. $limit and $skip — Pagination
// Top 5 customers by spending
db.orders.aggregate([
{ $group: {
_id: "$customerId",
totalSpent: { $sum: "$total" }
}},
{ $sort: { totalSpent: -1 } },
{ $limit: 5 }
]);
// Pagination: page 3 with 10 items per page
db.products.aggregate([
{ $match: { category: "electronics" } },
{ $sort: { price: -1 } },
{ $skip: 20 }, // Skip first 2 pages (20 items)
{ $limit: 10 } // Take 10 items
]);
Note: $skip + $limit is fine for small offsets. For large offsets, use range-based pagination (e.g., { _id: { $gt: lastSeenId } }).
7. $unwind — Deconstructing Arrays
$unwind creates one document per array element:
// Document: { _id: 1, tags: ["tech", "sale", "new"] }
db.products.aggregate([
{ $unwind: "$tags" }
]);
// Output:
// { _id: 1, tags: "tech" }
// { _id: 1, tags: "sale" }
// { _id: 1, tags: "new" }
Practical use — count tags across all products:
db.products.aggregate([
{ $unwind: "$tags" },
{ $group: {
_id: "$tags",
count: { $sum: 1 }
}},
{ $sort: { count: -1 } }
]);
// Output: [{ _id: "tech", count: 245 }, { _id: "sale", count: 189 }, ...]
Preserving documents with empty arrays:
db.products.aggregate([
{ $unwind: {
path: "$tags",
preserveNullAndEmptyArrays: true // Keep docs with no tags
}}
]);
8. $lookup — Joining Collections
$lookup performs a left outer join with another collection:
// Basic syntax
{
$lookup: {
from: "otherCollection", // Collection to join
localField: "fieldInThis", // Field in current collection
foreignField: "fieldInOther", // Field in other collection
as: "outputArrayField" // Output field name (always an array)
}
}
Example — Orders with Customer Details
db.orders.aggregate([
{ $lookup: {
from: "users",
localField: "customerId",
foreignField: "_id",
as: "customer"
}},
{ $unwind: "$customer" }, // Convert single-element array to object
{ $project: {
orderTotal: "$total",
customerName: "$customer.name",
customerEmail: "$customer.email"
}}
]);
Pipeline $lookup (advanced):
// Join with filtering inside the joined collection
db.orders.aggregate([
{ $lookup: {
from: "products",
let: { orderItems: "$items" },
pipeline: [
{ $match: {
$expr: { $in: ["$_id", "$$orderItems"] }
}},
{ $project: { name: 1, price: 1 } }
],
as: "productDetails"
}}
]);
9. $addFields — Adding New Fields
$addFields adds new fields to documents without removing existing ones (unlike $project):
db.orders.aggregate([
{ $addFields: {
totalWithTax: { $multiply: ["$total", 1.18] },
itemCount: { $size: "$items" },
isHighValue: { $gte: ["$total", 1000] }
}}
]);
// All original fields are preserved, plus the new ones
10. $count — Counting Results
// Count completed orders
db.orders.aggregate([
{ $match: { status: "completed" } },
{ $count: "completedOrderCount" }
]);
// Output: [{ completedOrderCount: 450 }]
11. Mongoose Aggregation
const mongoose = require("mongoose");
const Order = require("./models/Order");
// Basic aggregation
const salesReport = await Order.aggregate([
{ $match: { status: "completed" } },
{ $group: {
_id: { $month: "$createdAt" },
revenue: { $sum: "$total" },
orders: { $sum: 1 }
}},
{ $sort: { _id: 1 } }
]);
console.log(salesReport);
// [{ _id: 1, revenue: 12500, orders: 45 }, { _id: 2, revenue: 18700, orders: 62 }, ...]
Important Mongoose Note
// Aggregation does NOT use Mongoose schemas or middleware
// Field names must match the database, not the Mongoose virtual names
// ObjectId fields need explicit casting:
const userId = new mongoose.Types.ObjectId(req.params.id);
const stats = await Order.aggregate([
{ $match: { customerId: userId } }, // Must cast to ObjectId
{ $group: {
_id: null,
totalSpent: { $sum: "$total" }
}}
]);
12. Real Example: Sales Report
// Monthly sales report with top products
const monthlySales = await Order.aggregate([
// Stage 1: Filter completed orders for this year
{ $match: {
status: "completed",
createdAt: { $gte: new Date("2025-01-01") }
}},
// Stage 2: Unwind order items
{ $unwind: "$items" },
// Stage 3: Lookup product details
{ $lookup: {
from: "products",
localField: "items.productId",
foreignField: "_id",
as: "product"
}},
{ $unwind: "$product" },
// Stage 4: Group by month
{ $group: {
_id: {
month: { $month: "$createdAt" },
year: { $year: "$createdAt" }
},
totalRevenue: { $sum: { $multiply: ["$items.quantity", "$items.price"] } },
totalOrders: { $sum: 1 },
avgOrderValue: { $avg: "$total" },
topProducts: { $push: "$product.name" }
}},
// Stage 5: Sort by month
{ $sort: { "_id.year": 1, "_id.month": 1 } },
// Stage 6: Reshape output
{ $project: {
_id: 0,
month: "$_id.month",
year: "$_id.year",
totalRevenue: { $round: ["$totalRevenue", 2] },
totalOrders: 1,
avgOrderValue: { $round: ["$avgOrderValue", 2] }
}}
]);
13. Real Example: User Analytics
// User engagement analytics
const userAnalytics = await User.aggregate([
// Stage 1: Lookup user orders
{ $lookup: {
from: "orders",
localField: "_id",
foreignField: "customerId",
as: "orders"
}},
// Stage 2: Add computed fields
{ $addFields: {
orderCount: { $size: "$orders" },
totalSpent: { $sum: "$orders.total" },
avgOrderValue: { $avg: "$orders.total" },
daysSinceJoined: {
$dateDiff: {
startDate: "$createdAt",
endDate: new Date(),
unit: "day"
}
}
}},
// Stage 3: Categorize users
{ $addFields: {
segment: {
$switch: {
branches: [
{ case: { $gte: ["$totalSpent", 5000] }, then: "VIP" },
{ case: { $gte: ["$totalSpent", 1000] }, then: "Regular" },
{ case: { $gt: ["$orderCount", 0] }, then: "Occasional" }
],
default: "Inactive"
}
}
}},
// Stage 4: Group by segment
{ $group: {
_id: "$segment",
userCount: { $sum: 1 },
totalRevenue: { $sum: "$totalSpent" },
avgSpent: { $avg: "$totalSpent" }
}},
// Stage 5: Sort by revenue
{ $sort: { totalRevenue: -1 } }
]);
14. Pipeline Optimization Tips
| Tip | Why |
|---|---|
Put $match first | Reduces documents for all later stages; can use indexes |
$match before $lookup | Fewer documents to join = much faster |
Use $project early | Remove unneeded fields to reduce memory usage |
Avoid $unwind on large arrays when possible | Creates many documents; consider $filter instead |
Use $limit after $sort | MongoDB optimizes this into a top-N sort |
Combine adjacent $match stages | MongoDB merges them automatically, but explicit is clearer |
EFFICIENT PIPELINE INEFFICIENT PIPELINE
┌─────────────────────┐ ┌─────────────────────┐
│ $match (filter 95%) │ │ $lookup (all docs) │
│ $project (3 fields) │ │ $unwind (explode) │
│ $lookup (few docs) │ │ $match (filter) │
│ $group (summarize) │ │ $group (summarize) │
│ $sort + $limit │ │ $sort │
│ Fast: 50ms │ │ Slow: 5000ms │
└─────────────────────┘ └─────────────────────┘
Key Takeaways
- The aggregation pipeline is an ordered series of stages — each stage transforms the data stream
$matchshould be the first stage whenever possible to leverage indexes and reduce document count$groupwith accumulators ($sum,$avg,$min,$max,$push) provides powerful analytics$lookupperforms joins — use it to combine data from multiple collections$unwinddeconstructs arrays — essential for per-element operations$projectand$addFieldsreshape documents and create computed fields- Pipeline order matters for performance — filter and reduce data as early as possible
- Mongoose aggregation bypasses schemas — you must manually cast ObjectIds and use raw field names
Explain-It Challenge
Your e-commerce platform needs a dashboard showing: (1) monthly revenue for the last 12 months, (2) top 10 customers by total spending, (3) most popular product categories with average order value, and (4) a breakdown of orders by status. Design four aggregation pipelines for these reports. Explain which stages you use and why, how you would optimize each pipeline for performance, and how you would handle the Mongoose integration in an Express route handler.