Episode 3 — NodeJS MongoDB Backend Architecture / 3.11 — Database Optimization

3.11.c — MongoDB Aggregation Pipeline

The aggregation pipeline is MongoDB's most powerful data processing tool. It lets you transform, filter, group, and reshape documents through a series of stages — think of it as a data assembly line where each stage processes the output of the previous one.

< 3.11.b — Indexing Best Practices | 3.11.d — MongoDB Operators >

1. What Is the Aggregation Pipeline?

An aggregation pipeline is an ordered array of stages. Documents enter the first stage, get processed, and the output flows into the next stage:

Documents  →  [ $match ]  →  [ $group ]  →  [ $sort ]  →  [ $limit ]  →  Results
              (filter)       (summarize)    (order)       (top N)

// Basic syntax
db.collection.aggregate([
  { $stage1: { ... } },
  { $stage2: { ... } },
  { $stage3: { ... } },
]);

// Mongoose syntax
const results = await Model.aggregate([
  { $stage1: { ... } },
  { $stage2: { ... } },
]);

Key difference from find(): Aggregation can transform data — compute totals, averages, reshape documents, join collections, and more. find() can only filter and project existing fields.

2. $match — Filtering Documents

$match filters documents, like a find() query. Always place $match as early as possible to reduce the number of documents flowing through subsequent stages.

// Filter for completed orders
db.orders.aggregate([
  { $match: { status: "completed" } }
]);

// Multiple conditions
db.orders.aggregate([
  { $match: {
    status: "completed",
    total: { $gte: 100 },
    createdAt: { $gte: new Date("2025-01-01") }
  }}
]);

Performance tip: When $match is the first stage and uses indexed fields, MongoDB uses the index — just like a regular find().

3. $group — Grouping and Accumulating

$group groups documents by a key and applies accumulators to compute values per group.

// Syntax
{ $group: {
  _id: <grouping expression>,   // What to group by (null = all documents)
  field1: { <accumulator>: <expression> },
  field2: { <accumulator>: <expression> },
}}

Accumulators

Accumulator	Description	Example
`$sum`	Sum values or count	`{ $sum: "$amount" }` or `{ $sum: 1 }`
`$avg`	Average value	`{ $avg: "$price" }`
`$min`	Minimum value	`{ $min: "$score" }`
`$max`	Maximum value	`{ $max: "$score" }`
`$count`	Count documents in group	`{ $count: {} }`
`$push`	Collect values into an array	`{ $push: "$name" }`
`$addToSet`	Collect unique values	`{ $addToSet: "$category" }`
`$first`	First value in group	`{ $first: "$name" }`
`$last`	Last value in group	`{ $last: "$name" }`

Examples

// Total sales per customer
db.orders.aggregate([
  { $group: {
    _id: "$customerId",
    totalSpent: { $sum: "$total" },
    orderCount: { $sum: 1 },
    avgOrderValue: { $avg: "$total" },
    firstOrder: { $min: "$createdAt" },
    lastOrder: { $max: "$createdAt" }
  }}
]);

// Count orders by status
db.orders.aggregate([
  { $group: {
    _id: "$status",
    count: { $sum: 1 }
  }}
]);
// Output: [{ _id: "completed", count: 450 }, { _id: "pending", count: 87 }, ...]

// Group ALL documents (no grouping key)
db.orders.aggregate([
  { $group: {
    _id: null,
    totalRevenue: { $sum: "$total" },
    totalOrders: { $sum: 1 },
    avgOrderValue: { $avg: "$total" }
  }}
]);
// Output: [{ _id: null, totalRevenue: 125000, totalOrders: 537, avgOrderValue: 232.77 }]

// Collect product names per category
db.products.aggregate([
  { $group: {
    _id: "$category",
    products: { $push: "$name" },
    uniqueTags: { $addToSet: "$brand" }
  }}
]);

4. $project — Reshaping Documents

$project controls which fields appear in the output and can create computed fields:

// Include specific fields
db.users.aggregate([
  { $project: {
    name: 1,
    email: 1,
    _id: 0              // Exclude _id
  }}
]);

// Computed fields
db.orders.aggregate([
  { $project: {
    orderId: "$_id",
    customer: "$customerId",
    totalWithTax: { $multiply: ["$total", 1.18] },
    year: { $year: "$createdAt" },
    month: { $month: "$createdAt" },
    fullName: { $concat: ["$firstName", " ", "$lastName"] },
    isHighValue: { $gte: ["$total", 1000] }
  }}
]);

5. $sort — Ordering Results

// Sort by total descending
db.orders.aggregate([
  { $match: { status: "completed" } },
  { $group: {
    _id: "$customerId",
    totalSpent: { $sum: "$total" }
  }},
  { $sort: { totalSpent: -1 } }   // -1 = descending, 1 = ascending
]);

Performance tip: If $sort is placed immediately after $match and uses an indexed field, MongoDB can use the index for sorting and avoid in-memory sort.

6. $limit and $skip — Pagination

// Top 5 customers by spending
db.orders.aggregate([
  { $group: {
    _id: "$customerId",
    totalSpent: { $sum: "$total" }
  }},
  { $sort: { totalSpent: -1 } },
  { $limit: 5 }
]);

// Pagination: page 3 with 10 items per page
db.products.aggregate([
  { $match: { category: "electronics" } },
  { $sort: { price: -1 } },
  { $skip: 20 },    // Skip first 2 pages (20 items)
  { $limit: 10 }    // Take 10 items
]);

Note: $skip + $limit is fine for small offsets. For large offsets, use range-based pagination (e.g., { _id: { $gt: lastSeenId } }).

7. $unwind — Deconstructing Arrays

$unwind creates one document per array element:

// Document: { _id: 1, tags: ["tech", "sale", "new"] }

db.products.aggregate([
  { $unwind: "$tags" }
]);

// Output:
// { _id: 1, tags: "tech" }
// { _id: 1, tags: "sale" }
// { _id: 1, tags: "new" }

Practical use — count tags across all products:

db.products.aggregate([
  { $unwind: "$tags" },
  { $group: {
    _id: "$tags",
    count: { $sum: 1 }
  }},
  { $sort: { count: -1 } }
]);
// Output: [{ _id: "tech", count: 245 }, { _id: "sale", count: 189 }, ...]

Preserving documents with empty arrays:

db.products.aggregate([
  { $unwind: {
    path: "$tags",
    preserveNullAndEmptyArrays: true   // Keep docs with no tags
  }}
]);

8. $lookup — Joining Collections

$lookup performs a left outer join with another collection:

// Basic syntax
{
  $lookup: {
    from: "otherCollection",       // Collection to join
    localField: "fieldInThis",     // Field in current collection
    foreignField: "fieldInOther",  // Field in other collection
    as: "outputArrayField"         // Output field name (always an array)
  }
}

Example — Orders with Customer Details

db.orders.aggregate([
  { $lookup: {
    from: "users",
    localField: "customerId",
    foreignField: "_id",
    as: "customer"
  }},
  { $unwind: "$customer" },     // Convert single-element array to object
  { $project: {
    orderTotal: "$total",
    customerName: "$customer.name",
    customerEmail: "$customer.email"
  }}
]);

Pipeline $lookup (advanced):

// Join with filtering inside the joined collection
db.orders.aggregate([
  { $lookup: {
    from: "products",
    let: { orderItems: "$items" },
    pipeline: [
      { $match: {
        $expr: { $in: ["$_id", "$$orderItems"] }
      }},
      { $project: { name: 1, price: 1 } }
    ],
    as: "productDetails"
  }}
]);

9. $addFields — Adding New Fields

$addFields adds new fields to documents without removing existing ones (unlike $project):

db.orders.aggregate([
  { $addFields: {
    totalWithTax: { $multiply: ["$total", 1.18] },
    itemCount: { $size: "$items" },
    isHighValue: { $gte: ["$total", 1000] }
  }}
]);
// All original fields are preserved, plus the new ones

10. $count — Counting Results

// Count completed orders
db.orders.aggregate([
  { $match: { status: "completed" } },
  { $count: "completedOrderCount" }
]);
// Output: [{ completedOrderCount: 450 }]

11. Mongoose Aggregation

const mongoose = require("mongoose");
const Order = require("./models/Order");

// Basic aggregation
const salesReport = await Order.aggregate([
  { $match: { status: "completed" } },
  { $group: {
    _id: { $month: "$createdAt" },
    revenue: { $sum: "$total" },
    orders: { $sum: 1 }
  }},
  { $sort: { _id: 1 } }
]);

console.log(salesReport);
// [{ _id: 1, revenue: 12500, orders: 45 }, { _id: 2, revenue: 18700, orders: 62 }, ...]

Important Mongoose Note

// Aggregation does NOT use Mongoose schemas or middleware
// Field names must match the database, not the Mongoose virtual names
// ObjectId fields need explicit casting:
const userId = new mongoose.Types.ObjectId(req.params.id);

const stats = await Order.aggregate([
  { $match: { customerId: userId } },  // Must cast to ObjectId
  { $group: {
    _id: null,
    totalSpent: { $sum: "$total" }
  }}
]);

12. Real Example: Sales Report

// Monthly sales report with top products
const monthlySales = await Order.aggregate([
  // Stage 1: Filter completed orders for this year
  { $match: {
    status: "completed",
    createdAt: { $gte: new Date("2025-01-01") }
  }},

  // Stage 2: Unwind order items
  { $unwind: "$items" },

  // Stage 3: Lookup product details
  { $lookup: {
    from: "products",
    localField: "items.productId",
    foreignField: "_id",
    as: "product"
  }},
  { $unwind: "$product" },

  // Stage 4: Group by month
  { $group: {
    _id: {
      month: { $month: "$createdAt" },
      year: { $year: "$createdAt" }
    },
    totalRevenue: { $sum: { $multiply: ["$items.quantity", "$items.price"] } },
    totalOrders: { $sum: 1 },
    avgOrderValue: { $avg: "$total" },
    topProducts: { $push: "$product.name" }
  }},

  // Stage 5: Sort by month
  { $sort: { "_id.year": 1, "_id.month": 1 } },

  // Stage 6: Reshape output
  { $project: {
    _id: 0,
    month: "$_id.month",
    year: "$_id.year",
    totalRevenue: { $round: ["$totalRevenue", 2] },
    totalOrders: 1,
    avgOrderValue: { $round: ["$avgOrderValue", 2] }
  }}
]);

13. Real Example: User Analytics

// User engagement analytics
const userAnalytics = await User.aggregate([
  // Stage 1: Lookup user orders
  { $lookup: {
    from: "orders",
    localField: "_id",
    foreignField: "customerId",
    as: "orders"
  }},

  // Stage 2: Add computed fields
  { $addFields: {
    orderCount: { $size: "$orders" },
    totalSpent: { $sum: "$orders.total" },
    avgOrderValue: { $avg: "$orders.total" },
    daysSinceJoined: {
      $dateDiff: {
        startDate: "$createdAt",
        endDate: new Date(),
        unit: "day"
      }
    }
  }},

  // Stage 3: Categorize users
  { $addFields: {
    segment: {
      $switch: {
        branches: [
          { case: { $gte: ["$totalSpent", 5000] }, then: "VIP" },
          { case: { $gte: ["$totalSpent", 1000] }, then: "Regular" },
          { case: { $gt: ["$orderCount", 0] }, then: "Occasional" }
        ],
        default: "Inactive"
      }
    }
  }},

  // Stage 4: Group by segment
  { $group: {
    _id: "$segment",
    userCount: { $sum: 1 },
    totalRevenue: { $sum: "$totalSpent" },
    avgSpent: { $avg: "$totalSpent" }
  }},

  // Stage 5: Sort by revenue
  { $sort: { totalRevenue: -1 } }
]);

14. Pipeline Optimization Tips

Tip	Why
Put `$match` first	Reduces documents for all later stages; can use indexes
`$match` before `$lookup`	Fewer documents to join = much faster
Use `$project` early	Remove unneeded fields to reduce memory usage
Avoid `$unwind` on large arrays when possible	Creates many documents; consider `$filter` instead
Use `$limit` after `$sort`	MongoDB optimizes this into a top-N sort
Combine adjacent `$match` stages	MongoDB merges them automatically, but explicit is clearer

EFFICIENT PIPELINE                      INEFFICIENT PIPELINE
┌─────────────────────┐                ┌─────────────────────┐
│ $match (filter 95%) │                │ $lookup (all docs)  │
│ $project (3 fields) │                │ $unwind (explode)   │
│ $lookup (few docs)  │                │ $match (filter)     │
│ $group (summarize)  │                │ $group (summarize)  │
│ $sort + $limit      │                │ $sort               │
│ Fast: 50ms          │                │ Slow: 5000ms        │
└─────────────────────┘                └─────────────────────┘

Key Takeaways

The aggregation pipeline is an ordered series of stages — each stage transforms the data stream
$match should be the first stage whenever possible to leverage indexes and reduce document count
$group with accumulators ($sum, $avg, $min, $max, $push) provides powerful analytics
$lookup performs joins — use it to combine data from multiple collections
$unwind deconstructs arrays — essential for per-element operations
$project and $addFields reshape documents and create computed fields
Pipeline order matters for performance — filter and reduce data as early as possible
Mongoose aggregation bypasses schemas — you must manually cast ObjectIds and use raw field names

Explain-It Challenge

Your e-commerce platform needs a dashboard showing: (1) monthly revenue for the last 12 months, (2) top 10 customers by total spending, (3) most popular product categories with average order value, and (4) a breakdown of orders by status. Design four aggregation pipelines for these reports. Explain which stages you use and why, how you would optimize each pipeline for performance, and how you would handle the Mongoose integration in an Express route handler.