Episode 3 — NodeJS MongoDB Backend Architecture / 3.8 — Database Basics MongoDB
3.8.g — Database Relations and Populate
In one sentence: MongoDB supports relationships between collections through references (ObjectId pointers) and Mongoose's
populate()method resolves these references at query time -- replacing IDs with full documents, similar to SQL JOINs but performed at the application level.
< 3.8.f -- CRUD Operations with Mongoose | 3.8-Exercise-Questions >
Table of Contents
- Relationships in MongoDB
- One-to-One Relationships
- One-to-Many Relationships
- Many-to-Many Relationships
- Embedding vs Referencing — Decision Guide
- Setting Up References in Schemas
- populate() — Resolving References
- Populating Specific Fields
- Populating Multiple Paths
- Nested Populate (Deep Population)
- Virtual Populate
- Populate with Conditions
- Manual Population and $lookup
- Real-World Example: Blog Application
- Performance Considerations
- Key Takeaways
- Explain-It Challenge
1. Relationships in MongoDB
Unlike SQL databases with built-in JOINs, MongoDB handles relationships in two ways:
┌──────────────────────────────────────────────────────────────┐
│ TWO APPROACHES TO RELATIONSHIPS │
│ │
│ EMBEDDING (denormalization) REFERENCING (normalization) │
│ ┌─────────────────┐ ┌──────────┐ ┌──────────┐ │
│ │ User │ │ User │ │ Post │ │
│ │ name: "Alice" │ │ _id: A1 │ │ _id: P1 │ │
│ │ posts: [ │ │ name... │ │ title... │ │
│ │ { title... }, │ └──────────┘ │ author: │ │
│ │ { title... } │ │ A1 │ │
│ │ ] │ One read └──────────┘ │
│ └─────────────────┘ Two reads (or populate) │
│ One read, one document Two collections │
│ Data duplicated No duplication │
└──────────────────────────────────────────────────────────────┘
Three relationship types
| Type | Example | MongoDB approach |
|---|---|---|
| One-to-One | User <-> Profile | Embed (usually) or reference |
| One-to-Many | User <-> Posts | Reference with ObjectId |
| Many-to-Many | Students <-> Courses | Array of ObjectIds (both sides or junction) |
2. One-to-One Relationships
Approach 1: Embedding (preferred for 1:1)
const userSchema = new Schema({
name: String,
email: String,
profile: { // embedded 1:1
bio: String,
avatar: String,
website: String,
location: String
}
});
Approach 2: Referencing (when data is large or accessed independently)
// User schema
const userSchema = new Schema({
name: String,
email: String,
profile: {
type: Schema.Types.ObjectId,
ref: 'Profile'
}
});
// Profile schema (separate collection)
const profileSchema = new Schema({
bio: String,
avatar: String,
website: String,
socialLinks: {
github: String,
twitter: String,
linkedin: String
},
user: {
type: Schema.Types.ObjectId,
ref: 'User'
}
});
When to embed vs reference for 1:1
| Criteria | Embed | Reference |
|---|---|---|
| Data always accessed together | Yes | No |
| Data changes independently | Rarely | Frequently |
| Data is small | Yes | Large data (images, blobs) |
| Need atomic updates | Yes (single document) | No (two documents) |
3. One-to-Many Relationships
The most common relationship in web applications. Two approaches:
Approach 1: Child references parent (recommended)
// Author schema
const authorSchema = new Schema({
name: String,
email: String
});
// Post schema — each post references its author
const postSchema = new Schema({
title: { type: String, required: true },
content: String,
author: {
type: Schema.Types.ObjectId,
ref: 'Author',
required: true
}
}, { timestamps: true });
// Create
const author = await Author.create({ name: 'Alice' });
const post = await Post.create({
title: 'My First Post',
content: 'Hello world!',
author: author._id
});
// Find posts by author
const posts = await Post.find({ author: author._id });
Approach 2: Parent stores array of child IDs
const authorSchema = new Schema({
name: String,
posts: [{
type: Schema.Types.ObjectId,
ref: 'Post'
}]
});
const postSchema = new Schema({
title: String,
content: String
});
// Create post and add to author
const post = await Post.create({ title: 'Hello', content: 'World' });
await Author.findByIdAndUpdate(authorId, { $push: { posts: post._id } });
Which approach is better?
| Criteria | Child references parent | Parent stores child array |
|---|---|---|
| Unbounded growth? | No (each post has one field) | Yes (array grows forever) |
| Find children | Post.find({ author: id }) | Author.findById(id).populate('posts') |
| Add a child | Create post with author: id | Create post + $push to parent |
| 16 MB limit risk | None | Yes, if thousands of children |
Best practice: For true 1:many (hundreds or thousands of children), use child-references-parent. The parent array approach works well only for 1:few (up to ~100 items).
4. Many-to-Many Relationships
Approach 1: Arrays of references on both sides
// Student schema
const studentSchema = new Schema({
name: String,
enrolledCourses: [{
type: Schema.Types.ObjectId,
ref: 'Course'
}]
});
// Course schema
const courseSchema = new Schema({
title: String,
instructor: String,
students: [{
type: Schema.Types.ObjectId,
ref: 'Student'
}]
});
// Enroll a student in a course
async function enrollStudent(studentId, courseId) {
await Student.findByIdAndUpdate(studentId, {
$addToSet: { enrolledCourses: courseId }
});
await Course.findByIdAndUpdate(courseId, {
$addToSet: { students: studentId }
});
}
Approach 2: Junction collection (for extra relationship data)
// Enrollment schema (junction collection)
const enrollmentSchema = new Schema({
student: { type: Schema.Types.ObjectId, ref: 'Student', required: true },
course: { type: Schema.Types.ObjectId, ref: 'Course', required: true },
enrolledAt: { type: Date, default: Date.now },
grade: String,
status: { type: String, enum: ['active', 'completed', 'dropped'], default: 'active' }
});
// Unique compound index to prevent duplicate enrollments
enrollmentSchema.index({ student: 1, course: 1 }, { unique: true });
// Enroll
await Enrollment.create({ student: studentId, course: courseId });
// Find all courses for a student
const enrollments = await Enrollment.find({ student: studentId })
.populate('course', 'title instructor');
// Find all students in a course
const enrollments = await Enrollment.find({ course: courseId })
.populate('student', 'name email');
Use a junction collection when the relationship itself has properties (grade, enrollment date, role). Use arrays of references when the relationship is simple (no extra data).
5. Embedding vs Referencing -- Decision Guide
┌─────────────────────────────────────────────────────────────┐
│ DECISION FLOWCHART │
│ │
│ Is the data always accessed together? │
│ ├── YES: Is it 1:1 or 1:few? ─── YES → EMBED │
│ │ └── NO (1:many) → REFERENCE │
│ └── NO → REFERENCE │
│ │
│ Does the child data change independently? │
│ ├── YES → REFERENCE │
│ └── NO → EMBED (if size permits) │
│ │
│ Could the array grow unbounded (>100 items)? │
│ ├── YES → REFERENCE (child points to parent) │
│ └── NO → EMBED or parent-array OK │
│ │
│ Do you need the data to be atomic (single document)? │
│ ├── YES → EMBED │
│ └── NO → REFERENCE │
└─────────────────────────────────────────────────────────────┘
Summary table
| Pattern | Relationship | Example | Approach |
|---|---|---|---|
| Embed subdocument | 1:1 | User + Profile | profile: { bio, avatar } |
| Embed array | 1:few | User + Addresses | addresses: [{ street, city }] |
| Child references parent | 1:many | Author + Posts | author: ObjectId on Post |
| Parent stores child IDs | 1:few (bounded) | User + Favorite Items | favorites: [ObjectId] on User |
| Arrays both sides | many:many (simple) | Students + Courses | Arrays of ObjectIds |
| Junction collection | many:many (data) | Enrollment (student, course, grade) | Separate collection |
6. Setting Up References in Schemas
The key ingredient is type: Schema.Types.ObjectId with ref: 'ModelName':
const mongoose = require('mongoose');
const { Schema } = mongoose;
// Author model
const authorSchema = new Schema({
name: { type: String, required: true },
email: { type: String, required: true, unique: true }
}, { timestamps: true });
const Author = mongoose.model('Author', authorSchema);
// Post model with reference to Author
const postSchema = new Schema({
title: { type: String, required: true },
content: { type: String, required: true },
author: {
type: Schema.Types.ObjectId,
ref: 'Author', // Must match the model name exactly
required: true
},
tags: [String],
isPublished: { type: Boolean, default: false }
}, { timestamps: true });
const Post = mongoose.model('Post', postSchema);
// Comment model with references to both Post and Author
const commentSchema = new Schema({
text: { type: String, required: true },
post: {
type: Schema.Types.ObjectId,
ref: 'Post',
required: true
},
author: {
type: Schema.Types.ObjectId,
ref: 'Author',
required: true
}
}, { timestamps: true });
const Comment = mongoose.model('Comment', commentSchema);
7. populate() -- Resolving References
Without populate(), a reference field stores only the ObjectId:
const post = await Post.findById(postId);
console.log(post.author);
// ObjectId("64a1b2c3d4e5f6a7b8c9d0e1") -- just an ID
With populate(), Mongoose replaces the ObjectId with the actual document:
const post = await Post.findById(postId).populate('author');
console.log(post.author);
// { _id: ObjectId("..."), name: "Alice", email: "alice@example.com", ... }
How populate works internally
┌──────────────────────────────────────────────────────────────┐
│ HOW populate() WORKS │
│ │
│ Step 1: Post.findById(postId) │
│ → MongoDB returns: { ..., author: ObjectId("A1") } │
│ │
│ Step 2: Mongoose sees populate('author') │
│ → Reads ref: 'Author' from the schema │
│ → Runs: Author.findById("A1") │
│ │
│ Step 3: Mongoose replaces the ObjectId with the result │
│ → { ..., author: { _id: "A1", name: "Alice" } } │
│ │
│ This is an APPLICATION-LEVEL join, not a database-level │
│ join. It issues a SECOND query to MongoDB. │
└──────────────────────────────────────────────────────────────┘
8. Populating Specific Fields
Fetch only certain fields from the referenced document to reduce data transfer:
// Only get author's name and email (not every field)
const post = await Post.findById(postId)
.populate('author', 'name email');
console.log(post.author);
// { _id: ObjectId("..."), name: "Alice", email: "alice@example.com" }
// Other fields (role, createdAt, etc.) are excluded
Object syntax for more control
const post = await Post.findById(postId)
.populate({
path: 'author',
select: 'name email -_id' // include name/email, exclude _id
});
9. Populating Multiple Paths
// Method 1: Chain multiple populate calls
const comment = await Comment.findById(commentId)
.populate('author', 'name')
.populate('post', 'title');
// Method 2: Pass an array
const comment = await Comment.findById(commentId)
.populate([
{ path: 'author', select: 'name email' },
{ path: 'post', select: 'title' }
]);
// Method 3: Space-separated string (no field selection)
const comment = await Comment.findById(commentId)
.populate('author post');
10. Nested Populate (Deep Population)
Populate a field on an already-populated document:
// Comment -> Post -> Author (two levels deep)
const comment = await Comment.findById(commentId)
.populate({
path: 'post',
populate: {
path: 'author',
select: 'name email'
}
});
console.log(comment.post.title); // "My First Post"
console.log(comment.post.author.name); // "Alice"
Multiple nested populations
const comment = await Comment.findById(commentId)
.populate({
path: 'post',
select: 'title author tags',
populate: [
{ path: 'author', select: 'name' },
// could populate more fields on Post if they were refs
]
})
.populate('author', 'name'); // populate comment's own author too
11. Virtual Populate
Standard references require the child to store the parent's ObjectId. But what if you want to find all posts by an author without storing post IDs on the author document?
Virtual populate creates a virtual field on the parent that dynamically looks up children:
const authorSchema = new Schema({
name: String,
email: String
}, {
timestamps: true,
toJSON: { virtuals: true },
toObject: { virtuals: true }
});
// Virtual field: "posts" does not exist in the database
// Mongoose resolves it at query time
authorSchema.virtual('posts', {
ref: 'Post', // The model to populate from
localField: '_id', // Field on Author
foreignField: 'author', // Field on Post that references Author
count: false // Set true to only get count
});
const Author = mongoose.model('Author', authorSchema);
// Now you can populate "posts" even though it's not stored in the author document
const author = await Author.findById(authorId).populate('posts');
console.log(author.posts);
// [ { title: "Post 1", ... }, { title: "Post 2", ... } ]
// Get only the count
const author = await Author.findById(authorId).populate({
path: 'posts',
count: true
});
console.log(author.posts); // 5 (number, not array)
Virtual populate with options
authorSchema.virtual('recentPosts', {
ref: 'Post',
localField: '_id',
foreignField: 'author',
options: {
sort: { createdAt: -1 },
limit: 5
}
});
authorSchema.virtual('publishedPosts', {
ref: 'Post',
localField: '_id',
foreignField: 'author',
match: { isPublished: true } // only published posts
});
12. Populate with Conditions
Filter populated results with match
const author = await Author.findById(authorId)
.populate({
path: 'posts',
match: { isPublished: true }, // only published posts
select: 'title createdAt',
options: { sort: { createdAt: -1 }, limit: 10 }
});
// author.posts contains only published posts, sorted by date, max 10
Conditional population in route handlers
app.get('/api/authors/:id', async (req, res) => {
let query = Author.findById(req.params.id);
// Only populate if requested via query param
if (req.query.include === 'posts') {
query = query.populate({
path: 'posts',
select: 'title createdAt',
options: { sort: { createdAt: -1 } }
});
}
const author = await query.lean();
if (!author) return res.status(404).json({ error: 'Author not found' });
res.json({ data: author });
});
13. Manual Population and $lookup
Manual population (without populate())
// Sometimes you need more control than populate() offers
const post = await Post.findById(postId).lean();
const author = await Author.findById(post.author).lean();
post.author = author; // manually replace
MongoDB $lookup (aggregation pipeline)
For complex joins, use MongoDB's native $lookup (server-side join):
const postsWithAuthors = await Post.aggregate([
{ $match: { isPublished: true } },
{
$lookup: {
from: 'authors', // collection name (lowercase, plural)
localField: 'author', // field in posts
foreignField: '_id', // field in authors
as: 'authorDetails' // output array field name
}
},
{ $unwind: '$authorDetails' }, // convert array to object
{
$project: {
title: 1,
'authorDetails.name': 1,
'authorDetails.email': 1,
createdAt: 1
}
},
{ $sort: { createdAt: -1 } },
{ $limit: 10 }
]);
populate() vs $lookup
| Feature | populate() | $lookup |
|---|---|---|
| Execution | Application-level (multiple queries) | Database-level (single pipeline) |
| Complexity | Simple API | Aggregation pipeline syntax |
| Flexibility | Limited filtering | Full aggregation power |
| Performance | Good for small datasets | Better for large/complex joins |
| Mongoose features | Works with schema, virtuals | Returns raw documents |
| Use case | Standard CRUD APIs | Reports, analytics, complex queries |
14. Real-World Example: Blog Application
Schemas
// models/Author.js
const authorSchema = new Schema({
name: { type: String, required: true },
email: { type: String, required: true, unique: true },
bio: String,
avatar: String
}, {
timestamps: true,
toJSON: { virtuals: true }
});
authorSchema.virtual('posts', {
ref: 'Post',
localField: '_id',
foreignField: 'author'
});
module.exports = mongoose.model('Author', authorSchema);
// models/Post.js
const postSchema = new Schema({
title: { type: String, required: true },
slug: { type: String, unique: true },
content: { type: String, required: true },
excerpt: String,
author: { type: Schema.Types.ObjectId, ref: 'Author', required: true },
tags: [String],
isPublished: { type: Boolean, default: false },
publishedAt: Date
}, { timestamps: true });
module.exports = mongoose.model('Post', postSchema);
// models/Comment.js
const commentSchema = new Schema({
text: { type: String, required: true, maxlength: 1000 },
post: { type: Schema.Types.ObjectId, ref: 'Post', required: true },
author: { type: Schema.Types.ObjectId, ref: 'Author', required: true },
parentComment: { type: Schema.Types.ObjectId, ref: 'Comment', default: null }
}, { timestamps: true });
module.exports = mongoose.model('Comment', commentSchema);
Routes with populate
const express = require('express');
const router = express.Router();
const Post = require('../models/Post');
const Comment = require('../models/Comment');
// GET /api/posts — list with author info
router.get('/posts', async (req, res) => {
const posts = await Post.find({ isPublished: true })
.populate('author', 'name avatar')
.select('title slug excerpt tags publishedAt')
.sort({ publishedAt: -1 })
.limit(20)
.lean();
res.json({ data: posts });
});
// GET /api/posts/:slug — single post with author and comments
router.get('/posts/:slug', async (req, res) => {
const post = await Post.findOne({ slug: req.params.slug, isPublished: true })
.populate('author', 'name bio avatar');
if (!post) return res.status(404).json({ error: 'Post not found' });
const comments = await Comment.find({ post: post._id })
.populate('author', 'name avatar')
.sort({ createdAt: -1 })
.lean();
res.json({ data: { ...post.toObject(), comments } });
});
// GET /api/authors/:id — author with their posts
router.get('/authors/:id', async (req, res) => {
const author = await Author.findById(req.params.id)
.populate({
path: 'posts',
match: { isPublished: true },
select: 'title slug excerpt publishedAt',
options: { sort: { publishedAt: -1 } }
})
.lean();
if (!author) return res.status(404).json({ error: 'Author not found' });
res.json({ data: author });
});
module.exports = router;
15. Performance Considerations
The N+1 query problem
// BAD: N+1 queries — one query per post to get the author
const posts = await Post.find({}).lean();
for (const post of posts) {
post.author = await Author.findById(post.author).lean();
}
// If 100 posts → 101 queries (1 for posts + 100 for authors)
// GOOD: populate batches into a single additional query
const posts = await Post.find({}).populate('author', 'name').lean();
// 2 queries total: 1 for posts, 1 for all referenced authors
Performance tips
| Tip | Description |
|---|---|
| Select only needed fields | .populate('author', 'name email') not .populate('author') |
| Use lean() | .lean() returns plain objects, skipping Mongoose overhead |
| Avoid deep nesting | More than 2 levels of populate is a design smell |
| Index foreign keys | Add an index on the reference field (author on Post) |
| Use $lookup for complex queries | Aggregation pipeline is more efficient for reports |
| Paginate populated results | Use options: { limit } in populate to cap results |
| Cache frequent lookups | Author names rarely change -- cache them |
| Consider denormalization | Store authorName on Post to avoid populate for common reads |
Indexing reference fields
// Always index fields used in populate
postSchema.index({ author: 1 });
commentSchema.index({ post: 1 });
commentSchema.index({ author: 1 });
enrollmentSchema.index({ student: 1, course: 1 });
16. Key Takeaways
- Embed for 1:1 and 1:few relationships that are always accessed together. Reference for 1:many and many:many.
- Child-references-parent is the standard 1:many pattern (e.g.,
author: ObjectIdon Post). populate('field')resolves ObjectId references into full documents -- it is an application-level join, not a database-level join.- Select fields in populate (
.populate('author', 'name email')) to minimize data transfer. - Nested populate handles multi-level relationships (Comment -> Post -> Author) but avoid going deeper than 2 levels.
- Virtual populate lets you query children from the parent without storing an array of IDs (e.g., get all posts for an author).
- Many-to-many relationships use either arrays of ObjectIds on both models or a junction collection (when the relationship has properties).
- $lookup in the aggregation pipeline is the database-level alternative to
populate()-- better for complex queries and analytics. - Always index reference fields (
author,post, etc.) -- without indexes, populate triggers full collection scans. - Avoid N+1 queries -- use
populate()instead of looping with individualfindByIdcalls.
17. Explain-It Challenge
Design a schema for a course platform with Authors, Courses, Lessons, and Enrollments (with enrollment date and progress percentage). Explain your embedding vs referencing decisions. Then write the Mongoose schemas, including a virtual populate for "get all courses by an author." Finally, write a route that returns a single course with the author's name, all lessons (sorted by order), and the enrollment count. If you can do this without notes, you have mastered this topic.
< 3.8.f -- CRUD Operations with Mongoose | 3.8-Exercise-Questions >