Episode 3 — NodeJS MongoDB Backend Architecture / 3.8 — Database Basics MongoDB
3.8.d — MongoDB Data Types and Documents
MongoDB documents are rich, flexible data structures built on BSON. Understanding data types, the _id field, embedding patterns, and schema design is essential for building efficient, well-modeled databases.
Table of Contents
- BSON Data Types
- The _id Field and ObjectId
- Embedded Documents (Nested Objects)
- Arrays in Documents
- Embed vs Reference
- Document Size Limits and Field Naming
- Data Modeling Patterns
- Real-World Document Examples
- Key Takeaways
- Explain-It Challenge
1. BSON Data Types
BSON (Binary JSON) supports more data types than standard JSON. The table lists the types you are most likely to meet, with their BSON type IDs (a few deprecated types are omitted):
| Type | BSON Type ID | Description | Example |
|---|---|---|---|
| String | 2 | UTF-8 encoded text | "Hello World" |
| Int32 | 16 | 32-bit integer | NumberInt(42) |
| Int64 | 18 | 64-bit integer (pass large values as strings to keep precision) | NumberLong("9007199254740993") |
| Double | 1 | 64-bit floating point (default for non-integer numbers) | 3.14 |
| Decimal128 | 19 | 128-bit decimal (exact precision for finance) | NumberDecimal("19.99") |
| Boolean | 8 | true or false | true |
| Date | 9 | UTC datetime (milliseconds since epoch) | new Date() |
| ObjectId | 7 | 12-byte unique identifier | ObjectId("64a1b2c3d4e5f6a7b8c9d0e1") |
| Array | 4 | Ordered list of values | ["a", "b", "c"] |
| Object | 3 | Embedded document (nested object) | { city: "NYC" } |
| Null | 10 | Null or missing value | null |
| Binary | 5 | Binary data (files, images) | BinData(0, "base64...") |
| Regex | 11 | Regular expression | /^alice/i |
| Timestamp | 17 | Internal MongoDB timestamp (replication) | Timestamp(1687000000, 1) |
| MinKey | -1 | Lowest possible BSON value | MinKey() |
| MaxKey | 127 | Highest possible BSON value | MaxKey() |
Using Data Types in the Shell
db.examples.insertOne({
// String
name: "Alice Johnson",
// Numbers
age: 25, // Int32 in mongosh (Double in the legacy mongo shell)
score: NumberInt(100), // Int32
bigNumber: NumberLong("9007199254740993"), // Int64
price: NumberDecimal("19.99"), // Decimal128 (exact)
// Boolean
isActive: true,
// Date
createdAt: new Date(), // Current UTC date
birthday: new Date("1999-06-15"), // Specific date
// ObjectId (usually auto-generated for _id)
referenceId: ObjectId("64a1b2c3d4e5f6a7b8c9d0e1"),
// Array
tags: ["developer", "javascript", "mongodb"],
// Embedded document (Object)
address: {
street: "123 Main St",
city: "San Francisco",
state: "CA",
zip: "94102"
},
// Null
middleName: null,
// Regex
emailPattern: /^[a-z]+@example\.com$/i,
// Binary (uncommon in application code)
// avatar: BinData(0, "iVBORw0KGgo...")
})
Number Type Gotcha
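// The legacy mongo shell stores every plain number as a Double.
// mongosh stores a number as Int32 when it fits, and as a Double otherwise.
db.test.insertOne({ value: 42 })
// Int32 in mongosh; Double (64-bit float) in the legacy mongo shell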
// To store as Int32:
db.test.insertOne({ value: NumberInt(42) })
// To store as Int64:
db.test.insertOne({ value: NumberLong(42) })
// For financial data, always use Decimal128:
db.products.insertOne({ price: NumberDecimal("29.99") })
// Avoids floating-point rounding errors like 0.1 + 0.2 !== 0.3
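The rounding problem mentioned above can be reproduced in plain JavaScript, with no database involved. This is a minimal sketch; the variable names are illustrative only.

```javascript
// Binary floating point cannot represent 0.1 or 0.2 exactly,
// so their sum is not exactly 0.3.
const floatSum = 0.1 + 0.2;
const sumIsExact = floatSum === 0.3; // false

// Integers above Number.MAX_SAFE_INTEGER (2^53 - 1) lose precision as Doubles,
// which is why large Int64 values are best passed to NumberLong as strings.
const asDouble = Number("9007199254740993");
const asBigInt = BigInt("9007199254740993");
const doubleLostPrecision = BigInt(asDouble) !== asBigInt; // true

console.log(floatSum, sumIsExact, doubleLostPrecision);
```

Decimal128 sidesteps the first problem by storing base-10 digits, and Int64 sidesteps the second by storing the full 64-bit integer.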
2. The _id Field and ObjectId
Every document in MongoDB must have a unique _id field. If you do not provide one, MongoDB generates an ObjectId automatically.
ObjectId Structure
An ObjectId is a 12-byte value, typically represented as a 24-character hexadecimal string:
ObjectId("64a1b2c3 d4e5f6a7b8 c9d0e1")
- 64a1b2c3: 4-byte timestamp
- d4e5f6a7b8: 5-byte random value
- c9d0e1: 3-byte counter
| Component | Bytes | Description |
|---|---|---|
| Timestamp | 4 | Seconds since Unix epoch (when the ID was created) |
| Random value | 5 | Random bytes unique to the machine and process |
| Counter | 3 | Incrementing counter (starts from a random value) |
Extracting the Timestamp
const id = ObjectId("64a1b2c3d4e5f6a7b8c9d0e1");
// Get the creation timestamp
id.getTimestamp()
// Output: ISODate("2023-07-02T17:24:19.000Z")
// Because the timestamp comes first, sorting by _id roughly sorts by creation time
// (the timestamp has one-second resolution)
db.users.find().sort({ _id: 1 }) // Oldest first
db.users.find().sort({ _id: -1 }) // Newest first
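To see where the timestamp comes from, the hex string can be decoded by hand. The sketch below uses a hypothetical helper, parseObjectIdHex, which is not part of any MongoDB API; it simply slices the 24-character string according to the 4/5/3-byte layout described above.

```javascript
// Split a 24-character hex ObjectId string into its three components.
// parseObjectIdHex is a hypothetical helper, not a MongoDB function.
function parseObjectIdHex(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("Expected a 24-character hexadecimal string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // 4-byte timestamp
  return {
    createdAt: new Date(seconds * 1000),  // seconds -> milliseconds
    random: hex.slice(8, 18),             // 5-byte random value (10 hex chars)
    counter: parseInt(hex.slice(18), 16)  // 3-byte counter
  };
}

const parts = parseObjectIdHex("64a1b2c3d4e5f6a7b8c9d0e1");
console.log(parts.createdAt.toISOString()); // 2023-07-02T17:24:19.000Z
console.log(parts.random);                  // d4e5f6a7b8
console.log(parts.counter);                 // 13226209
```

In application code, getTimestamp() is the better choice; the manual version is only meant to make the layout concrete.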
Custom _id Values
// You can use any unique value as _id
db.settings.insertOne({ _id: "app_config", theme: "dark", language: "en" })
db.counters.insertOne({ _id: "page_views", count: 0 })
db.users.insertOne({ _id: "user_alice_123", name: "Alice" })
// You can even use numbers
db.items.insertOne({ _id: 1, name: "First Item" })
db.items.insertOne({ _id: 2, name: "Second Item" })
Rules for _id
- Every document must have an _id field
- The _id value must be unique within a collection
- The _id field is immutable: you cannot change it after insertion
- MongoDB creates an _id index automatically (you never need to create one)
- If you do not provide _id, MongoDB generates an ObjectId
3. Embedded Documents (Nested Objects)
Embedded documents (also called subdocuments or nested objects) allow you to store related data within a single document.
// Instead of separate "users" and "addresses" tables (SQL approach),
// you embed the address directly:
db.users.insertOne({
name: "Alice Johnson",
email: "alice@example.com",
// Embedded document
address: {
street: "123 Main St",
city: "San Francisco",
state: "CA",
zip: "94102",
country: "USA"
},
// Nested embedded documents
employment: {
company: "TechCorp",
position: "Software Engineer",
salary: {
amount: NumberDecimal("120000.00"),
currency: "USD",
frequency: "annual"
}
}
})
Querying Embedded Documents
// Dot notation to query nested fields
db.users.find({ "address.city": "San Francisco" })
// Query deeply nested fields
db.users.find({ "employment.salary.amount": { $gt: NumberDecimal("100000") } })
// Update a nested field
db.users.updateOne(
{ name: "Alice Johnson" },
{ $set: { "address.zip": "94103" } }
)
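Dot notation can be thought of as walking a path through nested objects. The following sketch mimics that idea in plain JavaScript. getPath is a hypothetical helper and deliberately simplified: it handles nested objects only, whereas MongoDB's dot notation also reaches into arrays.

```javascript
// Resolve a dotted path such as "address.city" against a nested object.
// Returns undefined as soon as any step of the path is missing.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) =>
      value != null && typeof value === "object" ? value[key] : undefined,
    doc
  );
}

const user = {
  name: "Alice Johnson",
  address: { city: "San Francisco", zip: "94102" },
  employment: { salary: { currency: "USD" } }
};

console.log(getPath(user, "address.city"));               // San Francisco
console.log(getPath(user, "employment.salary.currency")); // USD
console.log(getPath(user, "address.country"));            // undefined
```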
When to Embed
- Data is always accessed together (e.g., user + address)
- The embedded data belongs to the parent (1:1 or 1:few)
- The embedded data rarely changes independently
- The combined document stays under 16 MB
4. Arrays in Documents
Arrays are one of MongoDB's most powerful features. A single document can hold lists of values, objects, or even other arrays.
Simple Arrays
db.users.insertOne({
name: "Alice",
hobbies: ["reading", "hiking", "coding"],
scores: [95, 88, 92, 87, 91],
tags: ["premium", "verified"]
})
Arrays of Embedded Documents
db.users.insertOne({
name: "Bob",
education: [
{
school: "MIT",
degree: "B.S. Computer Science",
year: 2018
},
{
school: "Stanford",
degree: "M.S. AI",
year: 2020
}
],
orders: [
{ productId: ObjectId("..."), quantity: 2, total: 59.98 },
{ productId: ObjectId("..."), quantity: 1, total: 29.99 }
]
})
Querying Arrays
// Find documents where the array contains a specific value
db.users.find({ hobbies: "hiking" })
// Find documents where the array contains ALL specified values
db.users.find({ hobbies: { $all: ["reading", "coding"] } })
// Find by array element's nested field
db.users.find({ "education.school": "MIT" })
// Find by array size
db.users.find({ hobbies: { $size: 3 } })
// Find by array element position (0-indexed)
db.users.find({ "scores.0": { $gt: 90 } }) // First score > 90
// $elemMatch: match multiple conditions on the SAME array element
db.users.find({
education: {
$elemMatch: { school: "MIT", year: { $gt: 2015 } }
}
})
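The difference between $elemMatch and two separate dotted conditions is easy to miss. The sketch below imitates both behaviours with Array.prototype.some on a made-up document (the years differ from the earlier example so that the two results diverge).

```javascript
// Hypothetical sample document for illustration.
const bob = {
  name: "Bob",
  education: [
    { school: "MIT", year: 2012 },
    { school: "Stanford", year: 2020 }
  ]
};

// Like { "education.school": "MIT", "education.year": { $gt: 2015 } }:
// each condition may be satisfied by a different array element.
const separateConditions =
  bob.education.some((e) => e.school === "MIT") &&
  bob.education.some((e) => e.year > 2015);

// Like $elemMatch: one single element must satisfy both conditions.
const sameElement = bob.education.some(
  (e) => e.school === "MIT" && e.year > 2015
);

console.log(separateConditions); // true
console.log(sameElement);        // false
```

Here the dotted form would match Bob even though he did not attend MIT after 2015, while the $elemMatch form would not.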
Updating Arrays
// Add an element to the end
db.users.updateOne({ name: "Alice" }, { $push: { hobbies: "swimming" } })
// Add multiple elements
db.users.updateOne({ name: "Alice" }, {
$push: { hobbies: { $each: ["yoga", "painting"] } }
})
// Remove a specific element
db.users.updateOne({ name: "Alice" }, { $pull: { hobbies: "hiking" } })
// Remove the last element
db.users.updateOne({ name: "Alice" }, { $pop: { hobbies: 1 } })
// Remove the first element
db.users.updateOne({ name: "Alice" }, { $pop: { hobbies: -1 } })
// Add only if not already present
db.users.updateOne({ name: "Alice" }, { $addToSet: { hobbies: "reading" } })
// Does nothing if "reading" already exists
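For readers more at home with JavaScript arrays, the operators above behave roughly like the following pure functions. This is an analogy under simplifying assumptions (primitive values only), not how the server implements them.

```javascript
// Rough in-memory analogues of the array update operators.
const push = (arr, value) => [...arr, value];                       // $push
const pull = (arr, value) => arr.filter((item) => item !== value);  // $pull (removes every match)
const addToSet = (arr, value) =>
  arr.includes(value) ? arr : [...arr, value];                      // $addToSet
const pop = (arr, direction) =>
  direction === 1 ? arr.slice(0, -1) : arr.slice(1);                // $pop: 1 = last, -1 = first

let hobbies = ["reading", "hiking", "coding"];
hobbies = push(hobbies, "swimming");    // reading, hiking, coding, swimming
hobbies = pull(hobbies, "hiking");      // reading, coding, swimming
hobbies = addToSet(hobbies, "reading"); // unchanged, "reading" already present
hobbies = pop(hobbies, 1);              // reading, coding

console.log(hobbies);
```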
5. Embed vs Reference
The most important schema design decision in MongoDB is whether to embed related data or reference it.
Embedding (Denormalization)
// The order document CONTAINS all the information it needs
{
_id: ObjectId("..."),
orderNumber: "ORD-001",
customer: {
name: "Alice",
email: "alice@example.com",
phone: "555-0123"
},
items: [
{ name: "Widget", price: 9.99, quantity: 2 },
{ name: "Gadget", price: 24.99, quantity: 1 }
],
total: 44.97,
status: "shipped"
}
Referencing (Normalization)
// users collection (ids shortened for readability; real ObjectIds are 24 hex characters)
{
_id: ObjectId("aaa111"),
name: "Alice",
email: "alice@example.com"
}
// orders collection -- references the user by _id
{
_id: ObjectId("bbb222"),
orderNumber: "ORD-001",
customerId: ObjectId("aaa111"), // <-- reference
items: [
{ productId: ObjectId("ccc333"), quantity: 2 },
{ productId: ObjectId("ddd444"), quantity: 1 }
],
total: 44.97,
status: "shipped"
}
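With references, something has to join the two collections back together at read time: either a second query from the application or a $lookup stage in an aggregation. The sketch below shows the application-side idea on in-memory arrays. The data and string ids are placeholders, not real ObjectIds.

```javascript
// In-memory stand-ins for the users and orders collections.
const users = [{ _id: "u1", name: "Alice", email: "alice@example.com" }];
const orders = [
  { _id: "o1", orderNumber: "ORD-001", customerId: "u1" },
  { _id: "o2", orderNumber: "ORD-002", customerId: "u1" },
  { _id: "o3", orderNumber: "ORD-003", customerId: "u9" } // dangling reference
];

// Index users by _id once, then attach the matching user to each order.
const usersById = new Map(users.map((u) => [u._id, u]));
const ordersWithCustomer = orders.map((order) => ({
  ...order,
  customer: usersById.get(order.customerId) ?? null
}));

console.log(ordersWithCustomer[0].customer.name); // Alice
console.log(ordersWithCustomer[2].customer);      // null
```

The dangling reference in the last order is a reminder that MongoDB does not enforce foreign keys; the application has to handle missing targets.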
Decision Guide
| Criteria | Embed | Reference |
|---|---|---|
| Data accessed together? | Yes -- embed | No -- reference |
| Relationship type | 1:1, 1:few | 1:many, many:many |
| Data changes independently? | Rarely -- embed | Frequently -- reference |
| Document size | Stays under 16 MB | Could grow unbounded |
| Data duplication OK? | Yes (acceptable) | No (avoid duplication) |
| Read performance | Better (single read) | Requires multiple reads |
| Write performance | Risk of large updates | Smaller, targeted updates |
| Consistency needs | Atomic (single doc) | May need transactions |
Rule of Thumb
1:1 relationship --> Embed (almost always)
1:Few relationship --> Embed (usually)
1:Many relationship --> Reference (usually) or hybrid
Many:Many --> Reference (always)
6. Document Size Limits and Field Naming
Maximum Document Size
MongoDB documents have a maximum size of 16 MB. This is generous for most use cases, but be aware of it when embedding large arrays.
16 MB can hold approximately:
- ~16,000 embedded documents of about 1 KB each
- ~16 million ASCII characters of text (fewer when characters need multiple UTF-8 bytes)
- A medium-sized image (but don't store images in documents!)
For files larger than 16 MB, use GridFS (MongoDB's file storage specification).
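A quick way to notice a document drifting toward the limit is to estimate its size before writing it. The sketch below uses the length of the JSON text as a stand-in. That is an assumption made for simplicity: the true BSON size differs, so treat the number as a rough indicator. approxDocBytes is a hypothetical helper.

```javascript
const MAX_BSON_BYTES = 16 * 1024 * 1024; // 16 MB document limit

// Rough estimate only: counts the UTF-8 bytes of the JSON representation,
// which is not identical to the BSON encoding.
function approxDocBytes(doc) {
  return new TextEncoder().encode(JSON.stringify(doc)).length;
}

// A made-up document with 1,000 embedded comments of ~100 characters each.
const doc = {
  name: "Alice",
  comments: Array.from({ length: 1000 }, (_, i) => ({ n: i, text: "x".repeat(100) }))
};

const bytes = approxDocBytes(doc);
console.log(bytes, bytes < MAX_BSON_BYTES);
```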
Field Naming Rules
| Rule | Example | Valid? |
|---|---|---|
| Must be a string | name | Yes |
| Cannot contain the null character (\0) | na\0me | No |
| Should not start with $ (reserved for operators) | $price | Avoid (rejected by older servers; MongoDB 5.0+ accepts it with restrictions) |
| Should not contain . (reserved for dot notation) | first.name | Avoid (rejected by older servers; MongoDB 5.0+ accepts it with restrictions) |
| _id is reserved for the primary key | _id | Special |
| Case-sensitive | Name vs name are different fields | Yes |
| Can contain spaces (but avoid them) | "first name" | Technically yes, but avoid |
| Can be empty string (but avoid) | "" | Technically yes, but avoid |
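The rules in the table can be turned into a small lint check that runs before documents are written. checkFieldName below is a hypothetical helper; it reports warnings based on the table and does not reproduce the server's actual validation.

```javascript
// Return a list of warnings for a proposed field name (empty list = fine).
function checkFieldName(name) {
  if (typeof name !== "string") return ["must be a string"];
  const warnings = [];
  if (name === "") warnings.push("empty name");
  if (name.includes("\0")) warnings.push("contains null character");
  if (name.startsWith("$")) warnings.push("starts with $");
  if (name.includes(".")) warnings.push("contains a dot");
  if (name.includes(" ")) warnings.push("contains a space");
  return warnings;
}

console.log(checkFieldName("firstName"));  // []
console.log(checkFieldName("$price"));     // [ 'starts with $' ]
console.log(checkFieldName("first.name")); // [ 'contains a dot' ]
```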
Best Practices for Field Names
// GOOD: camelCase, descriptive, concise
{
firstName: "Alice",
lastName: "Johnson",
emailAddress: "alice@example.com",
createdAt: new Date(),
isActive: true,
orderCount: 5
}
// BAD: inconsistent, unclear, wasteful
{
"first name": "Alice", // Spaces make queries awkward
"Last_Name": "Johnson", // Inconsistent casing
"e": "alice@example.com", // Too abbreviated
"date_of_the_creation": new Date(), // Too verbose
"$active": true, // $ prefix collides with operator syntax
"is.active": true // Dots collide with dot notation
}
Tip: Since field names are stored in every document (unlike SQL column headers), shorter names save storage in large collections. But readability matters more in most cases.
7. Data Modeling Patterns
7.1 One-to-One: Embedding
// User with a single profile (embed the profile)
{
_id: ObjectId("..."),
username: "alice_dev",
email: "alice@example.com",
profile: { // <-- embedded 1:1
bio: "Full-stack developer",
avatar: "https://example.com/alice.jpg",
website: "https://alice.dev",
location: "San Francisco, CA"
}
}
7.2 One-to-Few: Embedding Array
// User with a few addresses (embed the array)
{
_id: ObjectId("..."),
name: "Alice",
addresses: [ // <-- embedded 1:few
{
label: "Home",
street: "123 Main St",
city: "San Francisco",
state: "CA"
},
{
label: "Work",
street: "456 Market St",
city: "San Francisco",
state: "CA"
}
]
}
7.3 One-to-Many: Referencing
// Author document (ids below are readable placeholders, not valid ObjectId strings)
{
_id: ObjectId("author_001"),
name: "Alice",
email: "alice@example.com"
}
// Many blog posts reference the author
{
_id: ObjectId("post_001"),
title: "Getting Started with MongoDB",
content: "MongoDB is a document database...",
authorId: ObjectId("author_001"), // <-- reference
createdAt: new Date()
}
{
_id: ObjectId("post_002"),
title: "Advanced Mongoose Patterns",
content: "Mongoose provides powerful...",
authorId: ObjectId("author_001"), // <-- same reference
createdAt: new Date()
}
7.4 Hybrid Pattern (Subset Embedding)
// Product document with recent reviews embedded, but full reviews referenced
{
_id: ObjectId("prod_001"),
name: "Wireless Mouse",
price: NumberDecimal("29.99"),
// Embed only the 3 most recent reviews (quick access)
recentReviews: [
{ userId: ObjectId("..."), rating: 5, text: "Great!", date: new Date() },
{ userId: ObjectId("..."), rating: 4, text: "Good value", date: new Date() },
{ userId: ObjectId("..."), rating: 5, text: "Love it!", date: new Date() }
],
reviewCount: 247,
averageRating: 4.6
}
// Full reviews live in a separate collection
// reviews collection
{
_id: ObjectId("rev_001"),
productId: ObjectId("prod_001"), // <-- reference
userId: ObjectId("user_001"),
rating: 5,
text: "Great mouse! Very comfortable for long coding sessions.",
date: new Date()
}
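Keeping recentReviews capped at three entries is usually done in the same update that adds a review. One way to sketch it is with the $push modifiers $each, $sort, and $slice. buildAddReviewUpdate is a hypothetical helper that only builds the update document; the sample review is made up, and recomputing averageRating is left out.

```javascript
// Build an update document that adds a review to recentReviews,
// keeps the array sorted newest-first, and trims it to `keep` entries.
function buildAddReviewUpdate(review, keep = 3) {
  return {
    $push: {
      recentReviews: {
        $each: [review],     // the new review to add
        $sort: { date: -1 }, // newest first
        $slice: keep         // keep only the first `keep` elements
      }
    },
    $inc: { reviewCount: 1 } // bump the running total
  };
}

const update = buildAddReviewUpdate({
  rating: 5,
  text: "Great!",
  date: new Date("2025-01-01T00:00:00Z")
});

console.log(JSON.stringify(update, null, 2));
```

The result could then be passed to something like db.products.updateOne({ _id: productId }, update), where productId stands for the product being reviewed. The full review would still be inserted into the reviews collection separately.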
8. Real-World Document Examples
User Document
{
_id: ObjectId("64a1b2c3d4e5f6a7b8c9d0e1"),
username: "alice_dev",
email: "alice@example.com",
passwordHash: "$2b$10$xJ8K3...", // bcrypt hash, never plain text
role: "user",
profile: {
firstName: "Alice",
lastName: "Johnson",
avatar: "https://cdn.example.com/avatars/alice.jpg",
bio: "Full-stack developer | Open source contributor",
socialLinks: {
github: "https://github.com/alice",
twitter: "https://twitter.com/alice_dev"
}
},
preferences: {
theme: "dark",
language: "en",
notifications: {
email: true,
push: false,
sms: false
}
},
isVerified: true,
isActive: true,
lastLoginAt: ISODate("2025-03-15T10:30:00Z"),
createdAt: ISODate("2024-01-10T08:00:00Z"),
updatedAt: ISODate("2025-03-15T10:30:00Z")
}
Product Document (E-commerce)
{
_id: ObjectId("64b2c3d4e5f6a7b8c9d0e1f2"),
name: "Wireless Bluetooth Headphones",
slug: "wireless-bluetooth-headphones",
description: "Premium noise-canceling headphones with 30-hour battery life.",
brand: "AudioTech",
category: "Electronics",
subcategory: "Headphones",
price: NumberDecimal("79.99"),
compareAtPrice: NumberDecimal("99.99"),
currency: "USD",
sku: "AT-WBH-001",
inventory: {
quantity: 150,
reserved: 12,
warehouse: "WH-SF-01"
},
attributes: {
color: "Matte Black",
weight: "250g",
connectivity: "Bluetooth 5.3",
batteryLife: "30 hours",
noiseCanceling: true
},
images: [
{ url: "https://cdn.example.com/products/headphones-1.jpg", alt: "Front view", isPrimary: true },
{ url: "https://cdn.example.com/products/headphones-2.jpg", alt: "Side view", isPrimary: false }
],
tags: ["wireless", "bluetooth", "noise-canceling", "premium"],
ratings: {
average: 4.6,
count: 247
},
isPublished: true,
createdAt: ISODate("2024-06-15T12:00:00Z"),
updatedAt: ISODate("2025-02-20T09:15:00Z")
}
Order Document
{
_id: ObjectId("64c3d4e5f6a7b8c9d0e1f2a3"),
orderNumber: "ORD-2025-00142",
customerId: ObjectId("64a1b2c3d4e5f6a7b8c9d0e1"), // ref -> users
items: [
{
productId: ObjectId("64b2c3d4e5f6a7b8c9d0e1f2"), // ref -> products
name: "Wireless Bluetooth Headphones", // denormalized
price: NumberDecimal("79.99"),
quantity: 1,
subtotal: NumberDecimal("79.99")
},
{
productId: ObjectId("64b2c3d4e5f6a7b8c9d0e1f3"),
name: "USB-C Cable",
price: NumberDecimal("12.99"),
quantity: 2,
subtotal: NumberDecimal("25.98")
}
],
shipping: {
method: "standard",
cost: NumberDecimal("5.99"),
address: {
name: "Alice Johnson",
street: "123 Main St",
city: "San Francisco",
state: "CA",
zip: "94102",
country: "US"
},
trackingNumber: "1Z999AA10123456784",
carrier: "UPS"
},
payment: {
method: "credit_card",
last4: "4242",
brand: "Visa",
transactionId: "txn_1234567890"
},
subtotal: NumberDecimal("105.97"),
tax: NumberDecimal("9.54"),
shippingCost: NumberDecimal("5.99"),
total: NumberDecimal("121.50"),
status: "shipped",
statusHistory: [
{ status: "pending", date: ISODate("2025-03-10T14:00:00Z") },
{ status: "confirmed", date: ISODate("2025-03-10T14:05:00Z") },
{ status: "processing", date: ISODate("2025-03-11T09:00:00Z") },
{ status: "shipped", date: ISODate("2025-03-12T11:30:00Z") }
],
createdAt: ISODate("2025-03-10T14:00:00Z"),
updatedAt: ISODate("2025-03-12T11:30:00Z")
}
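Because the order stores derived amounts (subtotals, tax, total), it is worth checking that they add up before saving. The sketch below does the arithmetic in integer cents to stay exact. toCents is a hypothetical helper that assumes non-negative amounts with at most two decimal places, and prices are passed as strings for simplicity.

```javascript
// Parse a decimal string such as "79.99" into integer cents.
function toCents(amount) {
  const [whole, fraction = ""] = amount.split(".");
  return Number(whole) * 100 + Number(fraction.padEnd(2, "0").slice(0, 2));
}

// Line items and charges copied from the order document above.
const items = [
  { price: "79.99", quantity: 1 },
  { price: "12.99", quantity: 2 }
];

const subtotalCents = items.reduce(
  (sum, item) => sum + toCents(item.price) * item.quantity,
  0
);
const totalCents = subtotalCents + toCents("9.54") + toCents("5.99");

console.log(subtotalCents); // 10597
console.log(totalCents);    // 12150
```

Both values agree with the subtotal (105.97) and total (121.50) stored in the document.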
9. Key Takeaways
- MongoDB uses BSON internally, supporting richer types than JSON (Date, ObjectId, Decimal128, etc.).
- Every document has a unique _id field, auto-generated as an ObjectId if not provided.
- ObjectIds contain a timestamp, so sorting by _id sorts by creation time.
- Embedded documents store related data in a single document for fast reads.
- Arrays in documents can hold any data type, including nested objects.
- The embed vs reference decision depends on relationship type, access patterns, and document size.
- Maximum document size is 16 MB -- use GridFS for larger files.
- Field names are case-sensitive and should follow camelCase convention.
- Common patterns: embed for 1:1 and 1:few, reference for 1:many and many:many.
- Use Decimal128 for financial data to avoid floating-point errors.
10. Explain-It Challenge
Design a MongoDB document schema for a recipe application. A recipe has a title, description, author, ingredients (each with name, quantity, and unit), step-by-step instructions, tags, ratings, and comments. Decide which data should be embedded and which should be referenced. Justify each decision. What would the document look like for a recipe with 5 ingredients, 8 steps, and 3 comments?