Episode 3 — NodeJS MongoDB Backend Architecture / 3.8 — Database Basics MongoDB

3.8.g — Database Relations and Populate

In one sentence: MongoDB supports relationships between collections through references (ObjectId pointers) and Mongoose's populate() method resolves these references at query time -- replacing IDs with full documents, similar to SQL JOINs but performed at the application level.


< 3.8.f -- CRUD Operations with Mongoose | 3.8-Exercise-Questions >


Table of Contents

  1. Relationships in MongoDB
  2. One-to-One Relationships
  3. One-to-Many Relationships
  4. Many-to-Many Relationships
  5. Embedding vs Referencing — Decision Guide
  6. Setting Up References in Schemas
  7. populate() — Resolving References
  8. Populating Specific Fields
  9. Populating Multiple Paths
  10. Nested Populate (Deep Population)
  11. Virtual Populate
  12. Populate with Conditions
  13. Manual Population and $lookup
  14. Real-World Example: Blog Application
  15. Performance Considerations
  16. Key Takeaways
  17. Explain-It Challenge

1. Relationships in MongoDB

Unlike SQL databases with built-in JOINs, MongoDB handles relationships in two ways:

┌──────────────────────────────────────────────────────────────┐
│              TWO APPROACHES TO RELATIONSHIPS                  │
│                                                              │
│  EMBEDDING (denormalization)     REFERENCING (normalization)  │
│  ┌─────────────────┐            ┌──────────┐  ┌──────────┐  │
│  │ User            │            │ User     │  │ Post     │  │
│  │  name: "Alice"  │            │ _id: A1  │  │ _id: P1  │  │
│  │  posts: [       │            │ name...  │  │ title... │  │
│  │   { title... }, │            └──────────┘  │ author:  │  │
│  │   { title... }  │                          │   A1     │  │
│  │  ]              │            One read       └──────────┘  │
│  └─────────────────┘            Two reads (or populate)      │
│  One read, one document         Two collections              │
│  Data duplicated                No duplication               │
└──────────────────────────────────────────────────────────────┘

Three relationship types

TypeExampleMongoDB approach
One-to-OneUser <-> ProfileEmbed (usually) or reference
One-to-ManyUser <-> PostsReference with ObjectId
Many-to-ManyStudents <-> CoursesArray of ObjectIds (both sides or junction)

2. One-to-One Relationships

Approach 1: Embedding (preferred for 1:1)

const userSchema = new Schema({
  name: String,
  email: String,
  profile: {                    // embedded 1:1
    bio: String,
    avatar: String,
    website: String,
    location: String
  }
});

Approach 2: Referencing (when data is large or accessed independently)

// User schema
const userSchema = new Schema({
  name: String,
  email: String,
  profile: {
    type: Schema.Types.ObjectId,
    ref: 'Profile'
  }
});

// Profile schema (separate collection)
const profileSchema = new Schema({
  bio: String,
  avatar: String,
  website: String,
  socialLinks: {
    github: String,
    twitter: String,
    linkedin: String
  },
  user: {
    type: Schema.Types.ObjectId,
    ref: 'User'
  }
});

When to embed vs reference for 1:1

CriteriaEmbedReference
Data always accessed togetherYesNo
Data changes independentlyRarelyFrequently
Data is smallYesLarge data (images, blobs)
Need atomic updatesYes (single document)No (two documents)

3. One-to-Many Relationships

The most common relationship in web applications. Two approaches:

Approach 1: Child references parent (recommended)

// Author schema
const authorSchema = new Schema({
  name: String,
  email: String
});

// Post schema — each post references its author
const postSchema = new Schema({
  title: { type: String, required: true },
  content: String,
  author: {
    type: Schema.Types.ObjectId,
    ref: 'Author',
    required: true
  }
}, { timestamps: true });
// Create
const author = await Author.create({ name: 'Alice' });
const post = await Post.create({
  title: 'My First Post',
  content: 'Hello world!',
  author: author._id
});

// Find posts by author
const posts = await Post.find({ author: author._id });

Approach 2: Parent stores array of child IDs

const authorSchema = new Schema({
  name: String,
  posts: [{
    type: Schema.Types.ObjectId,
    ref: 'Post'
  }]
});

const postSchema = new Schema({
  title: String,
  content: String
});
// Create post and add to author
const post = await Post.create({ title: 'Hello', content: 'World' });
await Author.findByIdAndUpdate(authorId, { $push: { posts: post._id } });

Which approach is better?

CriteriaChild references parentParent stores child array
Unbounded growth?No (each post has one field)Yes (array grows forever)
Find childrenPost.find({ author: id })Author.findById(id).populate('posts')
Add a childCreate post with author: idCreate post + $push to parent
16 MB limit riskNoneYes, if thousands of children

Best practice: For true 1:many (hundreds or thousands of children), use child-references-parent. The parent array approach works well only for 1:few (up to ~100 items).


4. Many-to-Many Relationships

Approach 1: Arrays of references on both sides

// Student schema
const studentSchema = new Schema({
  name: String,
  enrolledCourses: [{
    type: Schema.Types.ObjectId,
    ref: 'Course'
  }]
});

// Course schema
const courseSchema = new Schema({
  title: String,
  instructor: String,
  students: [{
    type: Schema.Types.ObjectId,
    ref: 'Student'
  }]
});
// Enroll a student in a course
async function enrollStudent(studentId, courseId) {
  await Student.findByIdAndUpdate(studentId, {
    $addToSet: { enrolledCourses: courseId }
  });
  await Course.findByIdAndUpdate(courseId, {
    $addToSet: { students: studentId }
  });
}

Approach 2: Junction collection (for extra relationship data)

// Enrollment schema (junction collection)
const enrollmentSchema = new Schema({
  student: { type: Schema.Types.ObjectId, ref: 'Student', required: true },
  course: { type: Schema.Types.ObjectId, ref: 'Course', required: true },
  enrolledAt: { type: Date, default: Date.now },
  grade: String,
  status: { type: String, enum: ['active', 'completed', 'dropped'], default: 'active' }
});

// Unique compound index to prevent duplicate enrollments
enrollmentSchema.index({ student: 1, course: 1 }, { unique: true });
// Enroll
await Enrollment.create({ student: studentId, course: courseId });

// Find all courses for a student
const enrollments = await Enrollment.find({ student: studentId })
  .populate('course', 'title instructor');

// Find all students in a course
const enrollments = await Enrollment.find({ course: courseId })
  .populate('student', 'name email');

Use a junction collection when the relationship itself has properties (grade, enrollment date, role). Use arrays of references when the relationship is simple (no extra data).


5. Embedding vs Referencing -- Decision Guide

┌─────────────────────────────────────────────────────────────┐
│              DECISION FLOWCHART                               │
│                                                             │
│  Is the data always accessed together?                      │
│  ├── YES: Is it 1:1 or 1:few? ─── YES → EMBED             │
│  │        └── NO (1:many) → REFERENCE                       │
│  └── NO → REFERENCE                                         │
│                                                             │
│  Does the child data change independently?                  │
│  ├── YES → REFERENCE                                        │
│  └── NO → EMBED (if size permits)                           │
│                                                             │
│  Could the array grow unbounded (>100 items)?               │
│  ├── YES → REFERENCE (child points to parent)               │
│  └── NO → EMBED or parent-array OK                          │
│                                                             │
│  Do you need the data to be atomic (single document)?       │
│  ├── YES → EMBED                                            │
│  └── NO → REFERENCE                                         │
└─────────────────────────────────────────────────────────────┘

Summary table

PatternRelationshipExampleApproach
Embed subdocument1:1User + Profileprofile: { bio, avatar }
Embed array1:fewUser + Addressesaddresses: [{ street, city }]
Child references parent1:manyAuthor + Postsauthor: ObjectId on Post
Parent stores child IDs1:few (bounded)User + Favorite Itemsfavorites: [ObjectId] on User
Arrays both sidesmany:many (simple)Students + CoursesArrays of ObjectIds
Junction collectionmany:many (data)Enrollment (student, course, grade)Separate collection

6. Setting Up References in Schemas

The key ingredient is type: Schema.Types.ObjectId with ref: 'ModelName':

const mongoose = require('mongoose');
const { Schema } = mongoose;

// Author model
const authorSchema = new Schema({
  name: { type: String, required: true },
  email: { type: String, required: true, unique: true }
}, { timestamps: true });

const Author = mongoose.model('Author', authorSchema);

// Post model with reference to Author
const postSchema = new Schema({
  title: { type: String, required: true },
  content: { type: String, required: true },
  author: {
    type: Schema.Types.ObjectId,
    ref: 'Author',              // Must match the model name exactly
    required: true
  },
  tags: [String],
  isPublished: { type: Boolean, default: false }
}, { timestamps: true });

const Post = mongoose.model('Post', postSchema);

// Comment model with references to both Post and Author
const commentSchema = new Schema({
  text: { type: String, required: true },
  post: {
    type: Schema.Types.ObjectId,
    ref: 'Post',
    required: true
  },
  author: {
    type: Schema.Types.ObjectId,
    ref: 'Author',
    required: true
  }
}, { timestamps: true });

const Comment = mongoose.model('Comment', commentSchema);

7. populate() -- Resolving References

Without populate(), a reference field stores only the ObjectId:

const post = await Post.findById(postId);
console.log(post.author);
// ObjectId("64a1b2c3d4e5f6a7b8c9d0e1")  -- just an ID

With populate(), Mongoose replaces the ObjectId with the actual document:

const post = await Post.findById(postId).populate('author');
console.log(post.author);
// { _id: ObjectId("..."), name: "Alice", email: "alice@example.com", ... }

How populate works internally

┌──────────────────────────────────────────────────────────────┐
│                  HOW populate() WORKS                          │
│                                                              │
│  Step 1: Post.findById(postId)                               │
│          → MongoDB returns: { ..., author: ObjectId("A1") }  │
│                                                              │
│  Step 2: Mongoose sees populate('author')                    │
│          → Reads ref: 'Author' from the schema               │
│          → Runs: Author.findById("A1")                       │
│                                                              │
│  Step 3: Mongoose replaces the ObjectId with the result      │
│          → { ..., author: { _id: "A1", name: "Alice" } }    │
│                                                              │
│  This is an APPLICATION-LEVEL join, not a database-level     │
│  join. It issues a SECOND query to MongoDB.                  │
└──────────────────────────────────────────────────────────────┘

8. Populating Specific Fields

Fetch only certain fields from the referenced document to reduce data transfer:

// Only get author's name and email (not every field)
const post = await Post.findById(postId)
  .populate('author', 'name email');

console.log(post.author);
// { _id: ObjectId("..."), name: "Alice", email: "alice@example.com" }
// Other fields (role, createdAt, etc.) are excluded

Object syntax for more control

const post = await Post.findById(postId)
  .populate({
    path: 'author',
    select: 'name email -_id'     // include name/email, exclude _id
  });

9. Populating Multiple Paths

// Method 1: Chain multiple populate calls
const comment = await Comment.findById(commentId)
  .populate('author', 'name')
  .populate('post', 'title');

// Method 2: Pass an array
const comment = await Comment.findById(commentId)
  .populate([
    { path: 'author', select: 'name email' },
    { path: 'post', select: 'title' }
  ]);

// Method 3: Space-separated string (no field selection)
const comment = await Comment.findById(commentId)
  .populate('author post');

10. Nested Populate (Deep Population)

Populate a field on an already-populated document:

// Comment -> Post -> Author (two levels deep)
const comment = await Comment.findById(commentId)
  .populate({
    path: 'post',
    populate: {
      path: 'author',
      select: 'name email'
    }
  });

console.log(comment.post.title);        // "My First Post"
console.log(comment.post.author.name);  // "Alice"

Multiple nested populations

const comment = await Comment.findById(commentId)
  .populate({
    path: 'post',
    select: 'title author tags',
    populate: [
      { path: 'author', select: 'name' },
      // could populate more fields on Post if they were refs
    ]
  })
  .populate('author', 'name');  // populate comment's own author too

11. Virtual Populate

Standard references require the child to store the parent's ObjectId. But what if you want to find all posts by an author without storing post IDs on the author document?

Virtual populate creates a virtual field on the parent that dynamically looks up children:

const authorSchema = new Schema({
  name: String,
  email: String
}, {
  timestamps: true,
  toJSON: { virtuals: true },
  toObject: { virtuals: true }
});

// Virtual field: "posts" does not exist in the database
// Mongoose resolves it at query time
authorSchema.virtual('posts', {
  ref: 'Post',            // The model to populate from
  localField: '_id',      // Field on Author
  foreignField: 'author', // Field on Post that references Author
  count: false             // Set true to only get count
});

const Author = mongoose.model('Author', authorSchema);
// Now you can populate "posts" even though it's not stored in the author document
const author = await Author.findById(authorId).populate('posts');

console.log(author.posts);
// [ { title: "Post 1", ... }, { title: "Post 2", ... } ]

// Get only the count
const author = await Author.findById(authorId).populate({
  path: 'posts',
  count: true
});
console.log(author.posts); // 5 (number, not array)

Virtual populate with options

authorSchema.virtual('recentPosts', {
  ref: 'Post',
  localField: '_id',
  foreignField: 'author',
  options: {
    sort: { createdAt: -1 },
    limit: 5
  }
});

authorSchema.virtual('publishedPosts', {
  ref: 'Post',
  localField: '_id',
  foreignField: 'author',
  match: { isPublished: true }    // only published posts
});

12. Populate with Conditions

Filter populated results with match

const author = await Author.findById(authorId)
  .populate({
    path: 'posts',
    match: { isPublished: true },      // only published posts
    select: 'title createdAt',
    options: { sort: { createdAt: -1 }, limit: 10 }
  });

// author.posts contains only published posts, sorted by date, max 10

Conditional population in route handlers

app.get('/api/authors/:id', async (req, res) => {
  let query = Author.findById(req.params.id);

  // Only populate if requested via query param
  if (req.query.include === 'posts') {
    query = query.populate({
      path: 'posts',
      select: 'title createdAt',
      options: { sort: { createdAt: -1 } }
    });
  }

  const author = await query.lean();
  if (!author) return res.status(404).json({ error: 'Author not found' });
  res.json({ data: author });
});

13. Manual Population and $lookup

Manual population (without populate())

// Sometimes you need more control than populate() offers
const post = await Post.findById(postId).lean();
const author = await Author.findById(post.author).lean();
post.author = author;  // manually replace

MongoDB $lookup (aggregation pipeline)

For complex joins, use MongoDB's native $lookup (server-side join):

const postsWithAuthors = await Post.aggregate([
  { $match: { isPublished: true } },
  {
    $lookup: {
      from: 'authors',         // collection name (lowercase, plural)
      localField: 'author',    // field in posts
      foreignField: '_id',     // field in authors
      as: 'authorDetails'      // output array field name
    }
  },
  { $unwind: '$authorDetails' },  // convert array to object
  {
    $project: {
      title: 1,
      'authorDetails.name': 1,
      'authorDetails.email': 1,
      createdAt: 1
    }
  },
  { $sort: { createdAt: -1 } },
  { $limit: 10 }
]);

populate() vs $lookup

Featurepopulate()$lookup
ExecutionApplication-level (multiple queries)Database-level (single pipeline)
ComplexitySimple APIAggregation pipeline syntax
FlexibilityLimited filteringFull aggregation power
PerformanceGood for small datasetsBetter for large/complex joins
Mongoose featuresWorks with schema, virtualsReturns raw documents
Use caseStandard CRUD APIsReports, analytics, complex queries

14. Real-World Example: Blog Application

Schemas

// models/Author.js
const authorSchema = new Schema({
  name: { type: String, required: true },
  email: { type: String, required: true, unique: true },
  bio: String,
  avatar: String
}, {
  timestamps: true,
  toJSON: { virtuals: true }
});

authorSchema.virtual('posts', {
  ref: 'Post',
  localField: '_id',
  foreignField: 'author'
});

module.exports = mongoose.model('Author', authorSchema);

// models/Post.js
const postSchema = new Schema({
  title: { type: String, required: true },
  slug: { type: String, unique: true },
  content: { type: String, required: true },
  excerpt: String,
  author: { type: Schema.Types.ObjectId, ref: 'Author', required: true },
  tags: [String],
  isPublished: { type: Boolean, default: false },
  publishedAt: Date
}, { timestamps: true });

module.exports = mongoose.model('Post', postSchema);

// models/Comment.js
const commentSchema = new Schema({
  text: { type: String, required: true, maxlength: 1000 },
  post: { type: Schema.Types.ObjectId, ref: 'Post', required: true },
  author: { type: Schema.Types.ObjectId, ref: 'Author', required: true },
  parentComment: { type: Schema.Types.ObjectId, ref: 'Comment', default: null }
}, { timestamps: true });

module.exports = mongoose.model('Comment', commentSchema);

Routes with populate

const express = require('express');
const router = express.Router();
const Post = require('../models/Post');
const Comment = require('../models/Comment');

// GET /api/posts — list with author info
router.get('/posts', async (req, res) => {
  const posts = await Post.find({ isPublished: true })
    .populate('author', 'name avatar')
    .select('title slug excerpt tags publishedAt')
    .sort({ publishedAt: -1 })
    .limit(20)
    .lean();

  res.json({ data: posts });
});

// GET /api/posts/:slug — single post with author and comments
router.get('/posts/:slug', async (req, res) => {
  const post = await Post.findOne({ slug: req.params.slug, isPublished: true })
    .populate('author', 'name bio avatar');

  if (!post) return res.status(404).json({ error: 'Post not found' });

  const comments = await Comment.find({ post: post._id })
    .populate('author', 'name avatar')
    .sort({ createdAt: -1 })
    .lean();

  res.json({ data: { ...post.toObject(), comments } });
});

// GET /api/authors/:id — author with their posts
router.get('/authors/:id', async (req, res) => {
  const author = await Author.findById(req.params.id)
    .populate({
      path: 'posts',
      match: { isPublished: true },
      select: 'title slug excerpt publishedAt',
      options: { sort: { publishedAt: -1 } }
    })
    .lean();

  if (!author) return res.status(404).json({ error: 'Author not found' });
  res.json({ data: author });
});

module.exports = router;

15. Performance Considerations

The N+1 query problem

// BAD: N+1 queries — one query per post to get the author
const posts = await Post.find({}).lean();
for (const post of posts) {
  post.author = await Author.findById(post.author).lean();
}
// If 100 posts → 101 queries (1 for posts + 100 for authors)

// GOOD: populate batches into a single additional query
const posts = await Post.find({}).populate('author', 'name').lean();
// 2 queries total: 1 for posts, 1 for all referenced authors

Performance tips

TipDescription
Select only needed fields.populate('author', 'name email') not .populate('author')
Use lean().lean() returns plain objects, skipping Mongoose overhead
Avoid deep nestingMore than 2 levels of populate is a design smell
Index foreign keysAdd an index on the reference field (author on Post)
Use $lookup for complex queriesAggregation pipeline is more efficient for reports
Paginate populated resultsUse options: { limit } in populate to cap results
Cache frequent lookupsAuthor names rarely change -- cache them
Consider denormalizationStore authorName on Post to avoid populate for common reads

Indexing reference fields

// Always index fields used in populate
postSchema.index({ author: 1 });
commentSchema.index({ post: 1 });
commentSchema.index({ author: 1 });
enrollmentSchema.index({ student: 1, course: 1 });

16. Key Takeaways

  1. Embed for 1:1 and 1:few relationships that are always accessed together. Reference for 1:many and many:many.
  2. Child-references-parent is the standard 1:many pattern (e.g., author: ObjectId on Post).
  3. populate('field') resolves ObjectId references into full documents -- it is an application-level join, not a database-level join.
  4. Select fields in populate (.populate('author', 'name email')) to minimize data transfer.
  5. Nested populate handles multi-level relationships (Comment -> Post -> Author) but avoid going deeper than 2 levels.
  6. Virtual populate lets you query children from the parent without storing an array of IDs (e.g., get all posts for an author).
  7. Many-to-many relationships use either arrays of ObjectIds on both models or a junction collection (when the relationship has properties).
  8. $lookup in the aggregation pipeline is the database-level alternative to populate() -- better for complex queries and analytics.
  9. Always index reference fields (author, post, etc.) -- without indexes, populate triggers full collection scans.
  10. Avoid N+1 queries -- use populate() instead of looping with individual findById calls.

17. Explain-It Challenge

Design a schema for a course platform with Authors, Courses, Lessons, and Enrollments (with enrollment date and progress percentage). Explain your embedding vs referencing decisions. Then write the Mongoose schemas, including a virtual populate for "get all courses by an author." Finally, write a route that returns a single course with the author's name, all lessons (sorted by order), and the enrollment count. If you can do this without notes, you have mastered this topic.


< 3.8.f -- CRUD Operations with Mongoose | 3.8-Exercise-Questions >