Episode 3 — NodeJS MongoDB Backend Architecture / 3.8 — Database Basics MongoDB

Interview Questions: Database Basics — MongoDB (Episode 3)

How to use this material (instructions)

  1. Read 3.8.a through 3.8.g.
  2. Answer aloud, then compare below.
  3. Pair with 3.8-Exercise-Questions.md.

Beginner Level

Q1: What is MongoDB and how does it differ from a relational database like MySQL?

Why interviewers ask: Foundational understanding of NoSQL vs SQL.

Model answer:

MongoDB is a document-oriented NoSQL database that stores data as flexible JSON-like documents (BSON internally) in collections, rather than as rows in fixed-schema tables. The key differences are: (1) MongoDB has a flexible schema -- documents in the same collection can have different fields, while SQL tables enforce a strict column structure. (2) MongoDB stores related data in nested documents and arrays within a single document, reducing the need for joins. SQL databases normalize data across multiple tables and use JOINs. (3) MongoDB scales horizontally via sharding (distributing data across servers), while SQL databases traditionally scale vertically (bigger server). (4) MongoDB uses its own query language with operators like $gt, $in, $regex, while SQL databases use the SQL standard. MongoDB excels at rapid prototyping, flexible data models, and high-throughput applications. SQL excels at complex transactions, strict data integrity, and heavy relational queries.
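
As a small illustration (collection and field names are hypothetical), two documents in the same MongoDB collection can differ in shape, and related data is nested rather than joined:

```javascript
// Two hypothetical documents from the same "users" collection.
// Unlike SQL rows, they need not share the same fields, and related
// data (addresses, tags) is nested instead of split across tables.
const userA = {
  _id: "665f1a2b3c4d5e6f70819203",
  name: "Ada",
  email: "ada@example.com",
  addresses: [{ city: "London", zip: "N1" }], // embedded 1:few data
};

const userB = {
  _id: "665f1a2b3c4d5e6f70819204",
  name: "Grace",
  tags: ["admin", "beta"], // a field userA does not have
};

// A SQL table would force both rows into one fixed column set;
// here each document simply carries the fields it needs.
console.log("email" in userA, "email" in userB); // true false
```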


Q2: What is Mongoose and why would you use it instead of the native MongoDB driver?

Why interviewers ask: Tests whether you understand the value of an ODM layer.

Model answer:

Mongoose is an Object Document Mapper (ODM) for MongoDB and Node.js. The native MongoDB driver lets you insert and query documents with no structure enforcement -- any shape of data goes into any collection. Mongoose adds a structured layer on top: Schemas define the shape and types of documents. Validators enforce rules (required, min/max, enum, custom functions) before data hits the database. Type casting automatically converts values (e.g., string "25" to number 25). Middleware hooks (pre/post on save, validate, delete) let you run logic at lifecycle points -- the classic example is hashing passwords in pre('save'). populate() provides application-level joins by resolving ObjectId references into full documents. Virtuals provide computed properties that exist in memory but not in the database. The trade-off is a slight performance overhead and a learning curve. For most application development, the productivity gains far outweigh the overhead.
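
A minimal sketch of what a schema definition looks like. The field names are illustrative; this is just the plain definition object you would pass to new mongoose.Schema(...), shown without the mongoose dependency:

```javascript
// Hypothetical user schema definition, shaped the way Mongoose expects.
// String/Number/Date here are plain JS constructors; Mongoose reads them
// as type declarations and derives casting + validation from them.
const userSchemaDefinition = {
  email: {
    type: String,
    required: [true, "Email is required"], // validator with custom message
    lowercase: true,                       // cast/transform option
  },
  age: {
    type: Number,
    min: [0, "Age cannot be negative"],    // range validator
  },
  role: {
    type: String,
    enum: ["user", "admin"],               // allowed values
    default: "user",
  },
  createdAt: { type: Date, default: Date.now },
};

console.log(Object.keys(userSchemaDefinition));
```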


Q3: What is an ObjectId in MongoDB? What information does it contain?

Why interviewers ask: Tests understanding of MongoDB's primary key mechanism.

Model answer:

An ObjectId is a 12-byte unique identifier that MongoDB auto-generates for the _id field of every document. It is represented as a 24-character hexadecimal string. Its structure: the first 4 bytes encode a Unix timestamp (seconds since epoch) indicating when the ID was created. The next 5 bytes are random values unique to the machine and process. The final 3 bytes are an incrementing counter starting from a random value. This structure means ObjectIds are roughly time-ordered -- sorting by _id sorts by creation time. You can extract the creation timestamp with id.getTimestamp(). ObjectIds are generated client-side (by the driver), not by the server, which means inserts do not require a round-trip to get an ID. The _id field is immutable and automatically indexed.
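
Because the first 4 bytes are a timestamp, the creation time can be recovered from the hex string alone. A driver-free sketch (the example ID is made up):

```javascript
// Recover the creation time from an ObjectId's 24-char hex string.
// The first 8 hex characters (4 bytes) are a big-endian Unix timestamp
// in seconds -- the same value ObjectId.getTimestamp() reads.
function objectIdToDate(hexId) {
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const id = "650000000000000000000000"; // hypothetical, not a real ID
console.log(objectIdToDate(id).toISOString());
```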


Q4: What does the required validator do in Mongoose, and how is unique different from a validator?

Why interviewers ask: Tests understanding of a common source of confusion.

Model answer:

The required validator is a true Mongoose validator that runs during the validation phase before a document is saved. If a required field is missing, empty, null, or undefined, Mongoose throws a ValidationError and the save is aborted. You can provide a custom message: required: [true, 'Email is required'].

unique is not a Mongoose validator -- it creates a MongoDB unique index on the field. The uniqueness check happens at the database level when the document is inserted or updated. If a duplicate is found, MongoDB throws an error with code: 11000, not a ValidationError. This means unique does not participate in Mongoose's validate() cycle, it is not reported by validationResult, and it requires the index to be built before it takes effect. You must catch duplicate key errors separately in your error-handling middleware.
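
Because the two failures surface differently (a Mongoose ValidationError vs a MongoDB error carrying code 11000), error-handling middleware typically branches on them. A self-contained sketch of that branching, with simulated error objects shaped like the real ones:

```javascript
// Map a save/insert error to an HTTP status + message.
// "ValidationError" comes from Mongoose's validate phase;
// code 11000 comes from MongoDB's unique index at write time.
function classifySaveError(err) {
  if (err.name === "ValidationError") {
    return { status: 400, message: "Validation failed" };
  }
  if (err.code === 11000) {
    return { status: 409, message: "Duplicate key" };
  }
  return { status: 500, message: "Internal error" };
}

// Simulated errors shaped like the real ones:
const validationErr = { name: "ValidationError", errors: { email: {} } };
const duplicateErr = { name: "MongoServerError", code: 11000 };

console.log(classifySaveError(validationErr).status); // 400
console.log(classifySaveError(duplicateErr).status);  // 409
```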


Q5: Explain the difference between embedding and referencing in MongoDB. When would you choose each?

Why interviewers ask: Core schema design question for any MongoDB project.

Model answer:

Embedding (denormalization) means storing related data directly inside a document as a nested object or array. For example, a user document containing an address subdocument. Advantages: single-read access (no joins), atomic writes (single document), and simpler queries. Disadvantages: data duplication, document size growth, and the 16 MB limit.

Referencing (normalization) means storing a reference (ObjectId) that points to a document in another collection. For example, a post document with author: ObjectId("..."). Advantages: no data duplication, independent updates, and no size limit concerns. Disadvantages: requires multiple queries (or populate()) to assemble related data.

Decision rules: Embed for 1:1 and 1:few relationships where data is always accessed together and rarely changes independently. Reference for 1:many and many:many relationships where child data could grow unbounded or is frequently accessed independently. A common hybrid approach: embed a summary (recent reviews, denormalized author name) while referencing the full data in a separate collection.
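
The three layouts side by side as plain documents (all names hypothetical):

```javascript
// Embedding: the address lives inside the user document (1:1, read together).
const embeddedUser = {
  _id: "u1",
  name: "Ada",
  address: { street: "1 Main St", city: "London" },
};

// Referencing: the post stores only the author's _id;
// a second query (or populate()) resolves it into the full document.
const author = { _id: "u1", name: "Ada" };
const post = { _id: "p1", title: "Hello", author: "u1" };

// Hybrid: reference the author but denormalize a summary for fast reads.
const hybridPost = { _id: "p2", title: "Hi", author: "u1", authorName: "Ada" };
```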


Intermediate Level

Q6: What is populate() in Mongoose and how does it work internally?

Why interviewers ask: Tests understanding of application-level joins in MongoDB.

Model answer:

populate() is Mongoose's mechanism for resolving ObjectId references into full documents. When you define a field with type: Schema.Types.ObjectId, ref: 'ModelName', Mongoose knows which collection to query. Internally, populate() works in three steps: (1) Execute the primary query (e.g., Post.find()) and collect all the referenced ObjectIds. (2) Execute a second query against the referenced collection using $in with the collected IDs (e.g., Author.find({ _id: { $in: [...] } })). (3) Replace each ObjectId in the results with the corresponding full document.
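
These steps can be simulated in plain JavaScript against in-memory arrays, which is essentially what Mongoose does with one batched $in query (the data here is made up):

```javascript
// In-memory stand-ins for two collections.
const posts = [
  { _id: "p1", title: "First", author: "a1" },
  { _id: "p2", title: "Second", author: "a2" },
  { _id: "p3", title: "Third", author: "a1" },
];
const authors = [
  { _id: "a1", name: "Ada" },
  { _id: "a2", name: "Grace" },
];

// Step 1: primary "query" + collect the referenced IDs (deduplicated).
const results = posts.slice();
const ids = [...new Set(results.map((p) => p.author))];

// Step 2: one batched lookup, like Author.find({ _id: { $in: ids } }).
const matched = authors.filter((a) => ids.includes(a._id));
const byId = new Map(matched.map((a) => [a._id, a]));

// Step 3: replace each ObjectId with the full referenced document.
const populated = results.map((p) => ({ ...p, author: byId.get(p.author) }));

console.log(populated[0].author.name); // Ada
```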

This is an application-level join, not a database-level join -- MongoDB itself does not know about the relationship. You can control which fields are returned (.populate('author', 'name email')), apply conditions (.populate({ path: 'posts', match: { isPublished: true } })), nest populations for multi-level lookups, and use virtual populate to query children from a parent without storing an array of IDs.

The performance implication is at least one extra database query per populated path. For complex joins or analytics, MongoDB's $lookup aggregation stage performs the join at the database level in a single pipeline.


Q7: What is lean() in Mongoose and when should you use it?

Why interviewers ask: Tests performance awareness in Mongoose applications.

Model answer:

By default, every document returned by a Mongoose query is wrapped in a Mongoose Document object. This wrapper adds change tracking, virtuals, instance methods, save(), remove(), and other features. This overhead is measurable -- it uses more memory and CPU.

.lean() tells Mongoose to return plain JavaScript objects (POJOs) instead. No Mongoose wrapper, no change tracking, no methods. The result is significantly faster query execution and lower memory usage -- roughly 2-5x faster for read-heavy workloads depending on document size and count.

Use lean for: API responses (res.json()), template rendering, any read-only operation where you will not modify and save the document. Do not use lean when: you need to call .save(), use instance methods, or rely on virtuals (unless you add a plugin such as mongoose-lean-virtuals). In a typical REST API, the majority of queries should use .lean().


Q8: How would you implement pagination in a Mongoose query? What are the performance implications?

Why interviewers ask: Tests practical API development skills.

Model answer:

The standard approach uses .skip() and .limit():

const page = parseInt(req.query.page, 10) || 1;
const limit = parseInt(req.query.limit, 10) || 10;
const skip = (page - 1) * limit;

const [data, total] = await Promise.all([
  Model.find(filter).sort(sort).skip(skip).limit(limit).lean(),
  Model.countDocuments(filter)
]);

Performance concern: skip() becomes expensive on large collections because MongoDB must scan and discard all skipped documents. Skipping 10,000 documents means MongoDB reads 10,000 documents just to throw them away.

Alternative: cursor-based pagination. Instead of page numbers, use the last document's _id or a timestamp field as a cursor. The next query starts from that point: Model.find({ _id: { $gt: lastId } }).limit(limit). This is O(limit) regardless of how deep into the dataset you are. Trade-off: no "jump to page 50" -- only "next page" navigation, which is fine for infinite scroll patterns.
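
Cursor-based pagination can be sketched against an in-memory sorted list; the shape mirrors Model.find({ _id: { $gt: lastId } }).limit(n), with made-up data:

```javascript
// Documents sorted by _id, as an index scan would return them.
const docs = [{ _id: 1 }, { _id: 2 }, { _id: 3 }, { _id: 4 }, { _id: 5 }];

// One "page": everything after the cursor, up to `limit` items.
// In MongoDB this is O(limit): an index seek jumps straight past lastId
// instead of scanning and discarding skipped documents.
function nextPage(lastId, limit) {
  const items = docs.filter((d) => d._id > lastId).slice(0, limit);
  const nextCursor = items.length ? items[items.length - 1]._id : null;
  return { items, nextCursor };
}

const page1 = nextPage(0, 2);                // _id 1 and 2
const page2 = nextPage(page1.nextCursor, 2); // _id 3 and 4
console.log(page2.items.map((d) => d._id));  // [ 3, 4 ]
```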


Q9: What is the difference between findByIdAndUpdate() and using findById() + modify + save()?

Why interviewers ask: Tests understanding of Mongoose middleware and validation behavior.

Model answer:

findByIdAndUpdate() sends an update command directly to MongoDB. By default, it does not run schema validators (you must pass { runValidators: true }) and it does not trigger pre('save') middleware. This means password-hashing middleware, computed field updates, or any pre('save') logic is completely skipped. It returns the document before or after the update depending on { new: true }.

The findById() + modify + save() pattern first retrieves the document as a Mongoose Document, lets you modify it in memory, then calls .save(). This always runs validators and triggers all pre('save') and post('save') middleware. The cost is two round-trips to the database (one read, one write) instead of one atomic operation.

Choose findByIdAndUpdate() when: you do not have pre-save middleware that needs to run, and you want atomic, single-query updates. Choose findById() + save() when: you have middleware that must run (password hashing, audit logging), need conditional logic before saving, or want full validation.


Q10: Explain Mongoose middleware (hooks). What types exist and what are common use cases?

Why interviewers ask: Tests understanding of the Mongoose lifecycle and cross-cutting concerns.

Model answer:

Mongoose middleware are functions that execute at specific points in a document or query lifecycle. They come in pre (before) and post (after) variants across several hook types:

Document middleware: save, validate, remove, updateOne, deleteOne. These run on individual document instances. The most common is pre('save') for password hashing -- check this.isModified('password') to avoid re-hashing on every save.

Query middleware: find, findOne, findOneAndUpdate, findOneAndDelete, updateOne, updateMany, deleteOne, deleteMany. These run on query objects. Common use: pre('find') that adds { isDeleted: { $ne: true } } to implement soft deletes transparently.

Aggregate middleware: aggregate. Runs before aggregation pipelines execute.

Important gotchas: (1) Arrow functions do not bind this -- always use regular functions. (2) findByIdAndUpdate triggers findOneAndUpdate middleware, not save middleware. (3) Call next() to pass control to the next middleware (or the operation). (4) Pass an error to next(error) to abort the operation.
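
The soft-delete trick from the query-middleware example boils down to merging one extra condition into every find filter. A Mongoose-free sketch of that merge, with a tiny in-memory matcher standing in for the database:

```javascript
// What a pre('find') soft-delete hook effectively does:
// merge { isDeleted: { $ne: true } } into the caller's filter.
function withSoftDelete(filter) {
  return { ...filter, isDeleted: { $ne: true } };
}

// A minimal matcher supporting equality and $ne, enough for the demo.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, cond]) =>
    cond && typeof cond === "object" && "$ne" in cond
      ? doc[field] !== cond.$ne
      : doc[field] === cond
  );
}

const users = [
  { name: "Ada", isDeleted: false },
  { name: "Grace", isDeleted: true },
  { name: "Linus" }, // field absent still passes $ne: true
];

const visible = users.filter((u) => matches(u, withSoftDelete({})));
console.log(visible.map((u) => u.name)); // [ 'Ada', 'Linus' ]
```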


Advanced Level

Q11: How does MongoDB handle transactions? When would you need them with Mongoose?

Why interviewers ask: Tests understanding of data consistency in MongoDB.

Model answer:

MongoDB guarantees atomic operations at the single-document level -- any operation on one document either fully succeeds or fully fails. For operations spanning multiple documents or collections, MongoDB added multi-document transactions in version 4.0 (replica sets) and 4.2 (sharded clusters).

In Mongoose, you use transactions via sessions:

const session = await mongoose.startSession();
session.startTransaction();
try {
  await Account.findByIdAndUpdate(fromId, { $inc: { balance: -100 } }, { session });
  await Account.findByIdAndUpdate(toId, { $inc: { balance: 100 } }, { session });
  await session.commitTransaction();
} catch (error) {
  await session.abortTransaction();
  throw error;
} finally {
  session.endSession();
}

When needed: Money transfers between accounts, order placement (decrement inventory + create order), any operation where partial completion would leave data in an inconsistent state. When not needed: Most CRUD operations on single documents, and operations where eventual consistency is acceptable. Transactions add latency and lock contention, so design schemas to minimize their need -- embedding related data in a single document keeps operations atomic without transactions.


Q12: How would you design a MongoDB schema for a social media application?

Why interviewers ask: Tests real-world schema design skills combining all relationship patterns.

Model answer:

A social media app involves users, posts, comments, likes, follows, and messages. Key design decisions:

Users collection: Store profile info, authentication data, settings. Embed small, bounded data like address and preferences. Reference followers/following -- an array of ObjectIds works for bounded lists, but for users with millions of followers, use a separate Follow junction collection.

Posts collection: Each post references its author with an ObjectId field (ref: 'User'). Embed a few recent comments for quick display. Store likesCount (denormalized) on the post for fast reads; actual likes go in a Like junction collection ({ user, post } with unique compound index) to prevent double-likes.

Comments collection: Reference both post and author. Support nested comments with a parentComment self-reference. Add virtual populate on Post for comments.

Follows collection (junction): { follower: ObjectId, following: ObjectId, createdAt } with compound unique index. This handles the many-to-many relationship without unbounded arrays on User.

Feeds: The hardest problem. Fan-out-on-write: when a user posts, push the post ID into each follower's feed collection. Fan-out-on-read: when a user opens their feed, query posts from all users they follow. Hybrid approaches exist. This is where caching (Redis) becomes essential.
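
A toy simulation over in-memory structures (all data made up) shows the trade: fan-out-on-write pays at post time, fan-out-on-read pays at read time:

```javascript
const follows = [ // who follows whom
  { follower: "u2", following: "u1" },
  { follower: "u3", following: "u1" },
];
const posts = [];
const feeds = { u2: [], u3: [] }; // precomputed feeds (fan-out-on-write)

// Fan-out-on-write: push the new post ID into every follower's feed.
function publish(author, postId) {
  posts.push({ _id: postId, author });
  for (const f of follows) {
    if (f.following === author) feeds[f.follower].push(postId);
  }
}

// Fan-out-on-read: compute the feed by querying followed users' posts.
function readFeed(user) {
  const followed = follows
    .filter((f) => f.follower === user)
    .map((f) => f.following);
  return posts.filter((p) => followed.includes(p.author)).map((p) => p._id);
}

publish("u1", "p1");
console.log(feeds.u2);       // [ 'p1' ]  precomputed at write time
console.log(readFeed("u3")); // [ 'p1' ]  computed at read time
```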

Indexes: { author: 1, createdAt: -1 } on posts, { post: 1 } on comments, { follower: 1 } and { following: 1 } on follows.


Q13: What is the aggregation pipeline in MongoDB and when would you use it instead of Mongoose queries?

Why interviewers ask: Tests knowledge of MongoDB's most powerful query mechanism.

Model answer:

The aggregation pipeline is MongoDB's framework for data processing. Documents pass through a sequence of stages, each transforming the data. Common stages: $match (filter), $group (aggregate values), $sort, $project (reshape), $lookup (join), $unwind (flatten arrays), $limit, $skip, $addFields (computed fields).

Use aggregation instead of Mongoose queries when you need: (1) Grouping and aggregation -- total sales by category, average rating by product, count by status. (2) Complex joins -- $lookup performs database-level joins more efficiently than multiple populate() calls. (3) Data transformation -- reshape documents, compute new fields, merge arrays. (4) Analytics and reporting -- time-series data, histograms, percentiles. (5) Performance -- a single pipeline can replace multiple Mongoose queries.

In Mongoose, use Model.aggregate([...stages...]). Note that aggregation returns raw documents (like lean()), not Mongoose Documents. Schema middleware and virtuals do not apply to aggregation results.
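
What a $match + $group pipeline computes can be mirrored in plain JavaScript. The pipeline below is the array you would hand to Model.aggregate, and the reduce is its in-memory equivalent (collection and field names are hypothetical):

```javascript
// Hypothetical order data.
const orders = [
  { category: "books", total: 10, paid: true },
  { category: "books", total: 5, paid: true },
  { category: "toys", total: 7, paid: false },
];

// The stage array you would pass to Model.aggregate([...]):
const pipeline = [
  { $match: { paid: true } },                                // filter
  { $group: { _id: "$category", sum: { $sum: "$total" } } }, // aggregate
  { $sort: { sum: -1 } },                                    // order
];

// The same computation in memory, to show what the stages mean:
const sums = orders
  .filter((o) => o.paid)  // $match
  .reduce((acc, o) => {   // $group with $sum
    acc[o.category] = (acc[o.category] || 0) + o.total;
    return acc;
  }, {});

console.log(sums); // { books: 15 }
```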


Q14: How do you handle data migration and schema evolution in MongoDB with Mongoose?

Why interviewers ask: Tests production-readiness and long-term thinking.

Model answer:

MongoDB's flexible schema means you can add new fields without downtime -- just update the Mongoose schema and set a default value for existing documents. However, renaming fields, changing types, or restructuring data requires migration.

Approach 1: Lazy migration. Update the schema, add a version field. When a document is read with the old structure, transform it in-memory and re-save. Over time, all documents migrate as they are accessed. Good for gradual changes.
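
The lazy-migration read path can be sketched as a pure transform keyed on the version field (field names hypothetical):

```javascript
// Upgrade a v1 document ({ fullName }) to v2 ({ firstName, lastName }).
// Called on read: if the doc is already current, return it unchanged;
// re-saving the returned object persists the new shape.
const CURRENT_VERSION = 2;

function migrate(doc) {
  if ((doc.schemaVersion || 1) >= CURRENT_VERSION) return doc;
  const { fullName, ...kept } = doc;           // drop the old field
  const [firstName, ...rest] = fullName.split(" ");
  return {
    ...kept,
    firstName,
    lastName: rest.join(" "),
    schemaVersion: CURRENT_VERSION,
  };
}

const oldDoc = { _id: "u1", fullName: "Ada Lovelace", schemaVersion: 1 };
console.log(migrate(oldDoc));
```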

Approach 2: Bulk migration script. Write a Node.js script that uses updateMany() with $set, $rename, or $unset to transform all documents at once. For complex transformations, iterate with cursor() and process each document. Run during a maintenance window.

Approach 3: Migration framework. Use a tool like migrate-mongo that tracks migration history (like SQL migration tools). Each migration is a file with up() and down() functions. This provides version control and rollback capability.

Best practices: Always add new fields with defaults. Never remove fields that old code depends on until all code is updated. Use the strict: true schema option to prevent unknown fields from being saved. Test migrations against a copy of production data before running on the real database.


Q15: Compare MongoDB's $lookup with Mongoose's populate(). When would you choose each?

Why interviewers ask: Tests depth of understanding of join mechanisms in MongoDB.

Model answer:

populate() is Mongoose's application-level join. It runs the primary query, collects referenced ObjectIds, then runs a second find() query with $in. It is simple to use, works with Mongoose features (schema, virtuals, middleware), and handles nested population elegantly. However, it always requires at least two queries (one extra per populated path, and more for nested population), and complex filtering/sorting on the joined data is limited.

$lookup is MongoDB's native aggregation stage that performs a server-side left outer join. It runs entirely within the database in a single pipeline. It supports complex conditions via the pipeline sub-expression syntax (filter, sort, limit the joined documents within the $lookup). It returns raw documents (no Mongoose wrapper).
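
For reference, a $lookup stage using the pipeline sub-expression syntax is just a plain object. This sketch (collection and field names hypothetical) filters and limits the joined comments inside the join itself, which populate() cannot do as directly:

```javascript
// A $lookup stage in pipeline form: join comments onto posts,
// keeping only approved comments and at most 3 per post.
const lookupStage = {
  $lookup: {
    from: "comments",            // target collection name
    let: { postId: "$_id" },     // variables taken from the outer document
    pipeline: [
      { $match: { $expr: { $eq: ["$post", "$$postId"] }, approved: true } },
      { $sort: { createdAt: -1 } },
      { $limit: 3 },
    ],
    as: "recentComments",        // output array field on each post
  },
};

// Would be used as: Post.aggregate([lookupStage, ...otherStages])
console.log(Object.keys(lookupStage.$lookup));
```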

Choose populate when: Building standard CRUD API responses, the join is simple (one or two levels), you want Mongoose features on the result, and the dataset is moderate. Choose $lookup when: Building reports or analytics, the join involves filtering/sorting the related documents, you need maximum performance on large datasets, or the query involves more than 2-3 collection joins.


Quick-fire

  1. MongoDB stores data as -- BSON documents in collections
  2. ObjectId size -- 12 bytes (24 hex characters)
  3. Max document size -- 16 MB
  4. Mongoose adds over native driver -- Schemas, validation, middleware, populate, virtuals
  5. required vs unique -- required is a validator; unique is a DB index
  6. populate() query count -- At least 2 (primary + one per populated path)
  7. lean() returns -- Plain JS objects (no Mongoose features)
  8. pre('save') runs on -- doc.save() and Model.create(), NOT on findByIdAndUpdate
  9. Embed when -- 1:1, 1:few, accessed together, bounded size
  10. Reference when -- 1:many, many:many, independent access, unbounded growth
  11. Duplicate key error code -- 11000
  12. Virtual field stored in DB? -- No (computed in memory only)

<- Back to 3.8 -- Database Basics: MongoDB (README)