Episode 3 — NodeJS MongoDB Backend Architecture / 3.8 — Database Basics MongoDB

3.8.g — Database Relations and Populate

In one sentence: MongoDB supports relationships between collections through references (ObjectId pointers) and Mongoose's populate() method resolves these references at query time -- replacing IDs with full documents, similar to SQL JOINs but performed at the application level.

< 3.8.f -- CRUD Operations with Mongoose | 3.8-Exercise-Questions >

Relationships in MongoDB
One-to-One Relationships
One-to-Many Relationships
Many-to-Many Relationships
Embedding vs Referencing — Decision Guide
Setting Up References in Schemas
populate() — Resolving References
Populating Specific Fields
Populating Multiple Paths
Nested Populate (Deep Population)
Virtual Populate
Populate with Conditions
Manual Population and $lookup
Real-World Example: Blog Application
Performance Considerations
Key Takeaways
Explain-It Challenge

1. Relationships in MongoDB

Unlike SQL databases with built-in JOINs, MongoDB handles relationships in two ways:

┌──────────────────────────────────────────────────────────────┐
│              TWO APPROACHES TO RELATIONSHIPS                  │
│                                                              │
│  EMBEDDING (denormalization)     REFERENCING (normalization)  │
│  ┌─────────────────┐            ┌──────────┐  ┌──────────┐  │
│  │ User            │            │ User     │  │ Post     │  │
│  │  name: "Alice"  │            │ _id: A1  │  │ _id: P1  │  │
│  │  posts: [       │            │ name...  │  │ title... │  │
│  │   { title... }, │            └──────────┘  │ author:  │  │
│  │   { title... }  │                          │   A1     │  │
│  │  ]              │            One read       └──────────┘  │
│  └─────────────────┘            Two reads (or populate)      │
│  One read, one document         Two collections              │
│  Data duplicated                No duplication               │
└──────────────────────────────────────────────────────────────┘

Three relationship types

Type	Example	MongoDB approach
One-to-One	User <-> Profile	Embed (usually) or reference
One-to-Many	User <-> Posts	Reference with ObjectId
Many-to-Many	Students <-> Courses	Array of ObjectIds (both sides or junction)

2. One-to-One Relationships

Approach 1: Embedding (preferred for 1:1)

const userSchema = new Schema({
  name: String,
  email: String,
  profile: {                    // embedded 1:1
    bio: String,
    avatar: String,
    website: String,
    location: String
  }
});

Approach 2: Referencing (when data is large or accessed independently)

// User schema
const userSchema = new Schema({
  name: String,
  email: String,
  profile: {
    type: Schema.Types.ObjectId,
    ref: 'Profile'
  }
});

// Profile schema (separate collection)
const profileSchema = new Schema({
  bio: String,
  avatar: String,
  website: String,
  socialLinks: {
    github: String,
    twitter: String,
    linkedin: String
  },
  user: {
    type: Schema.Types.ObjectId,
    ref: 'User'
  }
});

When to embed vs reference for 1:1

Criteria	Embed	Reference
Data always accessed together	Yes	No
Data changes independently	Rarely	Frequently
Data is small	Yes	Large data (images, blobs)
Need atomic updates	Yes (single document)	No (two documents)

3. One-to-Many Relationships

The most common relationship in web applications. Two approaches:

Approach 1: Child references parent (recommended)

// Author schema
const authorSchema = new Schema({
  name: String,
  email: String
});

// Post schema — each post references its author
const postSchema = new Schema({
  title: { type: String, required: true },
  content: String,
  author: {
    type: Schema.Types.ObjectId,
    ref: 'Author',
    required: true
  }
}, { timestamps: true });

// Create
const author = await Author.create({ name: 'Alice' });
const post = await Post.create({
  title: 'My First Post',
  content: 'Hello world!',
  author: author._id
});

// Find posts by author
const posts = await Post.find({ author: author._id });

Approach 2: Parent stores array of child IDs

const authorSchema = new Schema({
  name: String,
  posts: [{
    type: Schema.Types.ObjectId,
    ref: 'Post'
  }]
});

const postSchema = new Schema({
  title: String,
  content: String
});

// Create post and add to author
const post = await Post.create({ title: 'Hello', content: 'World' });
await Author.findByIdAndUpdate(authorId, { $push: { posts: post._id } });

Which approach is better?

Criteria	Child references parent	Parent stores child array
Unbounded growth?	No (each post has one field)	Yes (array grows forever)
Find children	`Post.find({ author: id })`	`Author.findById(id).populate('posts')`
Add a child	Create post with `author: id`	Create post + `$push` to parent
16 MB limit risk	None	Yes, if thousands of children

Best practice: For true 1:many (hundreds or thousands of children), use child-references-parent. The parent array approach works well only for 1:few (up to ~100 items).

4. Many-to-Many Relationships

Approach 1: Arrays of references on both sides

// Student schema
const studentSchema = new Schema({
  name: String,
  enrolledCourses: [{
    type: Schema.Types.ObjectId,
    ref: 'Course'
  }]
});

// Course schema
const courseSchema = new Schema({
  title: String,
  instructor: String,
  students: [{
    type: Schema.Types.ObjectId,
    ref: 'Student'
  }]
});

// Enroll a student in a course
async function enrollStudent(studentId, courseId) {
  await Student.findByIdAndUpdate(studentId, {
    $addToSet: { enrolledCourses: courseId }
  });
  await Course.findByIdAndUpdate(courseId, {
    $addToSet: { students: studentId }
  });
}

Approach 2: Junction collection (for extra relationship data)

// Enrollment schema (junction collection)
const enrollmentSchema = new Schema({
  student: { type: Schema.Types.ObjectId, ref: 'Student', required: true },
  course: { type: Schema.Types.ObjectId, ref: 'Course', required: true },
  enrolledAt: { type: Date, default: Date.now },
  grade: String,
  status: { type: String, enum: ['active', 'completed', 'dropped'], default: 'active' }
});

// Unique compound index to prevent duplicate enrollments
enrollmentSchema.index({ student: 1, course: 1 }, { unique: true });

// Enroll
await Enrollment.create({ student: studentId, course: courseId });

// Find all courses for a student
const enrollments = await Enrollment.find({ student: studentId })
  .populate('course', 'title instructor');

// Find all students in a course
const enrollments = await Enrollment.find({ course: courseId })
  .populate('student', 'name email');

Use a junction collection when the relationship itself has properties (grade, enrollment date, role). Use arrays of references when the relationship is simple (no extra data).

5. Embedding vs Referencing -- Decision Guide

┌─────────────────────────────────────────────────────────────┐
│              DECISION FLOWCHART                               │
│                                                             │
│  Is the data always accessed together?                      │
│  ├── YES: Is it 1:1 or 1:few? ─── YES → EMBED             │
│  │        └── NO (1:many) → REFERENCE                       │
│  └── NO → REFERENCE                                         │
│                                                             │
│  Does the child data change independently?                  │
│  ├── YES → REFERENCE                                        │
│  └── NO → EMBED (if size permits)                           │
│                                                             │
│  Could the array grow unbounded (>100 items)?               │
│  ├── YES → REFERENCE (child points to parent)               │
│  └── NO → EMBED or parent-array OK                          │
│                                                             │
│  Do you need the data to be atomic (single document)?       │
│  ├── YES → EMBED                                            │
│  └── NO → REFERENCE                                         │
└─────────────────────────────────────────────────────────────┘

Summary table

Pattern	Relationship	Example	Approach
Embed subdocument	1:1	User + Profile	`profile: { bio, avatar }`
Embed array	1:few	User + Addresses	`addresses: [{ street, city }]`
Child references parent	1:many	Author + Posts	`author: ObjectId` on Post
Parent stores child IDs	1:few (bounded)	User + Favorite Items	`favorites: [ObjectId]` on User
Arrays both sides	many:many (simple)	Students + Courses	Arrays of ObjectIds
Junction collection	many:many (data)	Enrollment (student, course, grade)	Separate collection

6. Setting Up References in Schemas

The key ingredient is type: Schema.Types.ObjectId with ref: 'ModelName':

const mongoose = require('mongoose');
const { Schema } = mongoose;

// Author model
const authorSchema = new Schema({
  name: { type: String, required: true },
  email: { type: String, required: true, unique: true }
}, { timestamps: true });

const Author = mongoose.model('Author', authorSchema);

// Post model with reference to Author
const postSchema = new Schema({
  title: { type: String, required: true },
  content: { type: String, required: true },
  author: {
    type: Schema.Types.ObjectId,
    ref: 'Author',              // Must match the model name exactly
    required: true
  },
  tags: [String],
  isPublished: { type: Boolean, default: false }
}, { timestamps: true });

const Post = mongoose.model('Post', postSchema);

// Comment model with references to both Post and Author
const commentSchema = new Schema({
  text: { type: String, required: true },
  post: {
    type: Schema.Types.ObjectId,
    ref: 'Post',
    required: true
  },
  author: {
    type: Schema.Types.ObjectId,
    ref: 'Author',
    required: true
  }
}, { timestamps: true });

const Comment = mongoose.model('Comment', commentSchema);

7. populate() -- Resolving References

Without populate(), a reference field stores only the ObjectId:

const post = await Post.findById(postId);
console.log(post.author);
// ObjectId("64a1b2c3d4e5f6a7b8c9d0e1")  -- just an ID

With populate(), Mongoose replaces the ObjectId with the actual document:

const post = await Post.findById(postId).populate('author');
console.log(post.author);
// { _id: ObjectId("..."), name: "Alice", email: "alice@example.com", ... }

How populate works internally

┌──────────────────────────────────────────────────────────────┐
│                  HOW populate() WORKS                          │
│                                                              │
│  Step 1: Post.findById(postId)                               │
│          → MongoDB returns: { ..., author: ObjectId("A1") }  │
│                                                              │
│  Step 2: Mongoose sees populate('author')                    │
│          → Reads ref: 'Author' from the schema               │
│          → Runs: Author.findById("A1")                       │
│                                                              │
│  Step 3: Mongoose replaces the ObjectId with the result      │
│          → { ..., author: { _id: "A1", name: "Alice" } }    │
│                                                              │
│  This is an APPLICATION-LEVEL join, not a database-level     │
│  join. It issues a SECOND query to MongoDB.                  │
└──────────────────────────────────────────────────────────────┘

8. Populating Specific Fields

Fetch only certain fields from the referenced document to reduce data transfer:

// Only get author's name and email (not every field)
const post = await Post.findById(postId)
  .populate('author', 'name email');

console.log(post.author);
// { _id: ObjectId("..."), name: "Alice", email: "alice@example.com" }
// Other fields (role, createdAt, etc.) are excluded

Object syntax for more control

const post = await Post.findById(postId)
  .populate({
    path: 'author',
    select: 'name email -_id'     // include name/email, exclude _id
  });

9. Populating Multiple Paths

// Method 1: Chain multiple populate calls
const comment = await Comment.findById(commentId)
  .populate('author', 'name')
  .populate('post', 'title');

// Method 2: Pass an array
const comment = await Comment.findById(commentId)
  .populate([
    { path: 'author', select: 'name email' },
    { path: 'post', select: 'title' }
  ]);

// Method 3: Space-separated string (no field selection)
const comment = await Comment.findById(commentId)
  .populate('author post');

10. Nested Populate (Deep Population)

Populate a field on an already-populated document:

// Comment -> Post -> Author (two levels deep)
const comment = await Comment.findById(commentId)
  .populate({
    path: 'post',
    populate: {
      path: 'author',
      select: 'name email'
    }
  });

console.log(comment.post.title);        // "My First Post"
console.log(comment.post.author.name);  // "Alice"

Multiple nested populations

const comment = await Comment.findById(commentId)
  .populate({
    path: 'post',
    select: 'title author tags',
    populate: [
      { path: 'author', select: 'name' },
      // could populate more fields on Post if they were refs
    ]
  })
  .populate('author', 'name');  // populate comment's own author too

11. Virtual Populate

Standard references require the child to store the parent's ObjectId. But what if you want to find all posts by an author without storing post IDs on the author document?

Virtual populate creates a virtual field on the parent that dynamically looks up children:

const authorSchema = new Schema({
  name: String,
  email: String
}, {
  timestamps: true,
  toJSON: { virtuals: true },
  toObject: { virtuals: true }
});

// Virtual field: "posts" does not exist in the database
// Mongoose resolves it at query time
authorSchema.virtual('posts', {
  ref: 'Post',            // The model to populate from
  localField: '_id',      // Field on Author
  foreignField: 'author', // Field on Post that references Author
  count: false             // Set true to only get count
});

const Author = mongoose.model('Author', authorSchema);

// Now you can populate "posts" even though it's not stored in the author document
const author = await Author.findById(authorId).populate('posts');

console.log(author.posts);
// [ { title: "Post 1", ... }, { title: "Post 2", ... } ]

// Get only the count
const author = await Author.findById(authorId).populate({
  path: 'posts',
  count: true
});
console.log(author.posts); // 5 (number, not array)

Virtual populate with options

authorSchema.virtual('recentPosts', {
  ref: 'Post',
  localField: '_id',
  foreignField: 'author',
  options: {
    sort: { createdAt: -1 },
    limit: 5
  }
});

authorSchema.virtual('publishedPosts', {
  ref: 'Post',
  localField: '_id',
  foreignField: 'author',
  match: { isPublished: true }    // only published posts
});

12. Populate with Conditions

Filter populated results with `match`

const author = await Author.findById(authorId)
  .populate({
    path: 'posts',
    match: { isPublished: true },      // only published posts
    select: 'title createdAt',
    options: { sort: { createdAt: -1 }, limit: 10 }
  });

// author.posts contains only published posts, sorted by date, max 10

Conditional population in route handlers

app.get('/api/authors/:id', async (req, res) => {
  let query = Author.findById(req.params.id);

  // Only populate if requested via query param
  if (req.query.include === 'posts') {
    query = query.populate({
      path: 'posts',
      select: 'title createdAt',
      options: { sort: { createdAt: -1 } }
    });
  }

  const author = await query.lean();
  if (!author) return res.status(404).json({ error: 'Author not found' });
  res.json({ data: author });
});

13. Manual Population and $lookup

Manual population (without populate())

// Sometimes you need more control than populate() offers
const post = await Post.findById(postId).lean();
const author = await Author.findById(post.author).lean();
post.author = author;  // manually replace

MongoDB $lookup (aggregation pipeline)

For complex joins, use MongoDB's native $lookup (server-side join):

const postsWithAuthors = await Post.aggregate([
  { $match: { isPublished: true } },
  {
    $lookup: {
      from: 'authors',         // collection name (lowercase, plural)
      localField: 'author',    // field in posts
      foreignField: '_id',     // field in authors
      as: 'authorDetails'      // output array field name
    }
  },
  { $unwind: '$authorDetails' },  // convert array to object
  {
    $project: {
      title: 1,
      'authorDetails.name': 1,
      'authorDetails.email': 1,
      createdAt: 1
    }
  },
  { $sort: { createdAt: -1 } },
  { $limit: 10 }
]);

populate() vs $lookup

Feature	`populate()`	`$lookup`
Execution	Application-level (multiple queries)	Database-level (single pipeline)
Complexity	Simple API	Aggregation pipeline syntax
Flexibility	Limited filtering	Full aggregation power
Performance	Good for small datasets	Better for large/complex joins
Mongoose features	Works with schema, virtuals	Returns raw documents
Use case	Standard CRUD APIs	Reports, analytics, complex queries

14. Real-World Example: Blog Application

Schemas

// models/Author.js
const authorSchema = new Schema({
  name: { type: String, required: true },
  email: { type: String, required: true, unique: true },
  bio: String,
  avatar: String
}, {
  timestamps: true,
  toJSON: { virtuals: true }
});

authorSchema.virtual('posts', {
  ref: 'Post',
  localField: '_id',
  foreignField: 'author'
});

module.exports = mongoose.model('Author', authorSchema);

// models/Post.js
const postSchema = new Schema({
  title: { type: String, required: true },
  slug: { type: String, unique: true },
  content: { type: String, required: true },
  excerpt: String,
  author: { type: Schema.Types.ObjectId, ref: 'Author', required: true },
  tags: [String],
  isPublished: { type: Boolean, default: false },
  publishedAt: Date
}, { timestamps: true });

module.exports = mongoose.model('Post', postSchema);

// models/Comment.js
const commentSchema = new Schema({
  text: { type: String, required: true, maxlength: 1000 },
  post: { type: Schema.Types.ObjectId, ref: 'Post', required: true },
  author: { type: Schema.Types.ObjectId, ref: 'Author', required: true },
  parentComment: { type: Schema.Types.ObjectId, ref: 'Comment', default: null }
}, { timestamps: true });

module.exports = mongoose.model('Comment', commentSchema);

Routes with populate

const express = require('express');
const router = express.Router();
const Post = require('../models/Post');
const Comment = require('../models/Comment');

// GET /api/posts — list with author info
router.get('/posts', async (req, res) => {
  const posts = await Post.find({ isPublished: true })
    .populate('author', 'name avatar')
    .select('title slug excerpt tags publishedAt')
    .sort({ publishedAt: -1 })
    .limit(20)
    .lean();

  res.json({ data: posts });
});

// GET /api/posts/:slug — single post with author and comments
router.get('/posts/:slug', async (req, res) => {
  const post = await Post.findOne({ slug: req.params.slug, isPublished: true })
    .populate('author', 'name bio avatar');

  if (!post) return res.status(404).json({ error: 'Post not found' });

  const comments = await Comment.find({ post: post._id })
    .populate('author', 'name avatar')
    .sort({ createdAt: -1 })
    .lean();

  res.json({ data: { ...post.toObject(), comments } });
});

// GET /api/authors/:id — author with their posts
router.get('/authors/:id', async (req, res) => {
  const author = await Author.findById(req.params.id)
    .populate({
      path: 'posts',
      match: { isPublished: true },
      select: 'title slug excerpt publishedAt',
      options: { sort: { publishedAt: -1 } }
    })
    .lean();

  if (!author) return res.status(404).json({ error: 'Author not found' });
  res.json({ data: author });
});

module.exports = router;

15. Performance Considerations

The N+1 query problem

// BAD: N+1 queries — one query per post to get the author
const posts = await Post.find({}).lean();
for (const post of posts) {
  post.author = await Author.findById(post.author).lean();
}
// If 100 posts → 101 queries (1 for posts + 100 for authors)

// GOOD: populate batches into a single additional query
const posts = await Post.find({}).populate('author', 'name').lean();
// 2 queries total: 1 for posts, 1 for all referenced authors

Performance tips

Tip	Description
Select only needed fields	`.populate('author', 'name email')` not `.populate('author')`
Use lean()	`.lean()` returns plain objects, skipping Mongoose overhead
Avoid deep nesting	More than 2 levels of populate is a design smell
Index foreign keys	Add an index on the reference field (`author` on Post)
Use $lookup for complex queries	Aggregation pipeline is more efficient for reports
Paginate populated results	Use `options: { limit }` in populate to cap results
Cache frequent lookups	Author names rarely change -- cache them
Consider denormalization	Store `authorName` on Post to avoid populate for common reads

Indexing reference fields

// Always index fields used in populate
postSchema.index({ author: 1 });
commentSchema.index({ post: 1 });
commentSchema.index({ author: 1 });
enrollmentSchema.index({ student: 1, course: 1 });

16. Key Takeaways

Embed for 1:1 and 1:few relationships that are always accessed together. Reference for 1:many and many:many.
Child-references-parent is the standard 1:many pattern (e.g., author: ObjectId on Post).
populate('field') resolves ObjectId references into full documents -- it is an application-level join, not a database-level join.
Select fields in populate (.populate('author', 'name email')) to minimize data transfer.
Nested populate handles multi-level relationships (Comment -> Post -> Author) but avoid going deeper than 2 levels.
Virtual populate lets you query children from the parent without storing an array of IDs (e.g., get all posts for an author).
Many-to-many relationships use either arrays of ObjectIds on both models or a junction collection (when the relationship has properties).
$lookup in the aggregation pipeline is the database-level alternative to populate() -- better for complex queries and analytics.
Always index reference fields (author, post, etc.) -- without indexes, populate triggers full collection scans.
Avoid N+1 queries -- use populate() instead of looping with individual findById calls.

17. Explain-It Challenge

Design a schema for a course platform with Authors, Courses, Lessons, and Enrollments (with enrollment date and progress percentage). Explain your embedding vs referencing decisions. Then write the Mongoose schemas, including a virtual populate for "get all courses by an author." Finally, write a route that returns a single course with the author's name, all lessons (sorted by order), and the enrollment count. If you can do this without notes, you have mastered this topic.