Episode 3 — NodeJS MongoDB Backend Architecture / 3.8 — Database Basics MongoDB

3.8.d — MongoDB Data Types and Documents

MongoDB documents are rich, flexible data structures built on BSON. Understanding data types, the _id field, embedding patterns, and schema design is essential for building efficient, well-modeled databases.


< 3.8.c -- Setting Up MongoDB | 3.8.e -- Mongoose ODM >


Table of Contents

  1. BSON Data Types
  2. The _id Field and ObjectId
  3. Embedded Documents (Nested Objects)
  4. Arrays in Documents
  5. Embed vs Reference
  6. Document Size Limits and Field Naming
  7. Data Modeling Patterns
  8. Real-World Document Examples
  9. Key Takeaways
  10. Explain-It Challenge

1. BSON Data Types

BSON (Binary JSON) supports more data types than standard JSON. The most commonly used types are:

| Type | BSON Type ID | Description | Example |
| --- | --- | --- | --- |
| String | 2 | UTF-8 encoded text | "Hello World" |
| Int32 | 16 | 32-bit integer | NumberInt(42) |
| Int64 | 18 | 64-bit integer | NumberLong("9007199254740993") |
| Double | 1 | 64-bit floating point (default for non-integer numbers) | 3.14 |
| Decimal128 | 19 | 128-bit decimal (exact precision for finance) | NumberDecimal("19.99") |
| Boolean | 8 | true or false | true |
| Date | 9 | UTC datetime (milliseconds since epoch) | new Date() |
| ObjectId | 7 | 12-byte unique identifier | ObjectId("64a1b2c3d4e5f6a7b8c9d0e1") |
| Array | 4 | Ordered list of values | ["a", "b", "c"] |
| Object | 3 | Embedded document (nested object) | { city: "NYC" } |
| Null | 10 | Explicit null value | null |
| Binary | 5 | Binary data (files, images) | BinData(0, "base64...") |
| Regex | 11 | Regular expression | /^alice/i |
| Timestamp | 17 | Internal MongoDB timestamp (replication) | Timestamp(1687000000, 1) |
| MinKey | -1 | Lowest possible BSON value | MinKey() |
| MaxKey | 127 | Highest possible BSON value | MaxKey() |

Using Data Types in the Shell

db.examples.insertOne({
  // String
  name: "Alice Johnson",

  // Numbers
  age: 25,                              // Int32 in mongosh (Double in the legacy mongo shell)
  score: NumberInt(100),                 // Int32
  bigNumber: NumberLong("9007199254740993"), // Int64
  price: NumberDecimal("19.99"),         // Decimal128 (exact)

  // Boolean
  isActive: true,

  // Date
  createdAt: new Date(),                // Current UTC date
  birthday: new Date("1999-06-15"),     // Specific date

  // ObjectId (usually auto-generated for _id)
  referenceId: ObjectId("64a1b2c3d4e5f6a7b8c9d0e1"),

  // Array
  tags: ["developer", "javascript", "mongodb"],

  // Embedded document (Object)
  address: {
    street: "123 Main St",
    city: "San Francisco",
    state: "CA",
    zip: "94102"
  },

  // Null
  middleName: null,

  // Regex
  emailPattern: /^[a-z]+@example\.com$/i,

  // Binary (uncommon in application code)
  // avatar: BinData(0, "iVBORw0KGgo...")
})

Number Type Gotcha

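// In mongosh, a whole number that fits in 32 bits is stored as Int32;
// any other plain number is stored as a Double.
// (The legacy mongo shell stored every plain number as a Double.)
db.test.insertOne({ value: 42 })
// Int32 in mongosh, Double in the legacy mongo shell

// To force Int32 in either shell:
db.test.insertOne({ value: NumberInt(42) })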

// To store as Int64:
db.test.insertOne({ value: NumberLong(42) })

// For financial data, always use Decimal128:
db.products.insertOne({ price: NumberDecimal("29.99") })
// Avoids floating-point rounding errors like 0.1 + 0.2 !== 0.3
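
The two numeric pitfalls behind these recommendations can be reproduced in plain JavaScript, without a database. This is a minimal sketch; the variable names are illustrative only.

```javascript
// Binary floating point cannot represent 0.1 or 0.2 exactly.
const floatSum = 0.1 + 0.2;
console.log(floatSum);         // 0.30000000000000004
console.log(floatSum === 0.3); // false

// Integers above Number.MAX_SAFE_INTEGER (2^53 - 1) lose precision as Doubles.
const asDouble = 9007199254740993;            // parsed as a 64-bit float
console.log(asDouble === 9007199254740992);   // true: the literal was rounded

// Passing the digits as a string avoids the rounding.
const asBigInt = BigInt("9007199254740993");
console.log(asBigInt.toString());             // 9007199254740993
```

The second case is why the NumberLong example earlier passes its value as a string: a numeric literal would already have been rounded before the shell could convert it.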

2. The _id Field and ObjectId

Every document in MongoDB must have an _id field whose value is unique within its collection. If you do not provide one, MongoDB generates an ObjectId automatically.

ObjectId Structure

An ObjectId is a 12-byte value, typically represented as a 24-character hexadecimal string:

ObjectId("64a1b2c3 d4e5f6a7b8 c9d0e1")
             |         |        |
          4 bytes   5 bytes  3 bytes
         timestamp   random  counter

| Component | Bytes | Description |
| --- | --- | --- |
| Timestamp | 4 | Seconds since Unix epoch (when the ID was created) |
| Random value | 5 | Random bytes generated once per process, unique to the machine and process |
| Counter | 3 | Incrementing counter (starts from a random value) |

Extracting the Timestamp

const id = ObjectId("64a1b2c3d4e5f6a7b8c9d0e1");

// Get the creation timestamp
id.getTimestamp()
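// Output: ISODate("2023-07-02T17:24:19.000Z")

// Because the timestamp comes first, sorting by _id roughly sorts by creation time
// (one-second resolution; IDs generated on different machines may interleave)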
db.users.find().sort({ _id: 1 }) // Oldest first
db.users.find().sort({ _id: -1 }) // Newest first
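
To make the layout concrete, the following sketch decodes the three parts of an ObjectId from its hex string in plain JavaScript. `parseObjectIdHex` is a hypothetical helper written for this lesson, not part of MongoDB or its drivers.

```javascript
// Split a 24-character hex ObjectId string into its three parts.
function parseObjectIdHex(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // 4-byte timestamp
  return {
    createdAt: new Date(seconds * 1000),         // epoch seconds -> Date
    random: hex.slice(8, 18),                    // 5-byte random value
    counter: parseInt(hex.slice(18, 24), 16),    // 3-byte counter
  };
}

const parts = parseObjectIdHex("64a1b2c3d4e5f6a7b8c9d0e1");
console.log(parts.createdAt.toISOString()); // 2023-07-02T17:24:19.000Z
console.log(parts.random);                  // d4e5f6a7b8
console.log(parts.counter);                 // 13226209
```

In real code, prefer the driver's own getTimestamp() method; the sketch only shows where the value comes from.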

Custom _id Values

// You can use any unique value as _id
db.settings.insertOne({ _id: "app_config", theme: "dark", language: "en" })
db.counters.insertOne({ _id: "page_views", count: 0 })
db.users.insertOne({ _id: "user_alice_123", name: "Alice" })

// You can even use numbers
db.items.insertOne({ _id: 1, name: "First Item" })
db.items.insertOne({ _id: 2, name: "Second Item" })

Rules for _id

  • Every document must have an _id field
  • The _id value must be unique within a collection
  • The _id field is immutable -- you cannot change it after insertion
  • MongoDB creates an _id index automatically (you never need to create one)
  • If you do not provide _id, MongoDB generates an ObjectId

3. Embedded Documents (Nested Objects)

Embedded documents (also called subdocuments or nested objects) allow you to store related data within a single document.

// Instead of separate "users" and "addresses" tables (SQL approach),
// you embed the address directly:

db.users.insertOne({
  name: "Alice Johnson",
  email: "alice@example.com",

  // Embedded document
  address: {
    street: "123 Main St",
    city: "San Francisco",
    state: "CA",
    zip: "94102",
    country: "USA"
  },

  // Nested embedded documents
  employment: {
    company: "TechCorp",
    position: "Software Engineer",
    salary: {
      amount: NumberDecimal("120000.00"),
      currency: "USD",
      frequency: "annual"
    }
  }
})

Querying Embedded Documents

// Dot notation to query nested fields
db.users.find({ "address.city": "San Francisco" })

// Query deeply nested fields
db.users.find({ "employment.salary.amount": { $gt: NumberDecimal("100000") } })

// Update a nested field
db.users.updateOne(
  { name: "Alice Johnson" },
  { $set: { "address.zip": "94103" } }
)
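
Dot notation can be thought of as walking the document one key at a time. The sketch below models that idea in plain JavaScript; `getByPath` is a hypothetical helper, and it deliberately ignores the array traversal that real MongoDB matching also performs.

```javascript
// Simplified model of dot-notation lookup: follow one key per path segment.
function getByPath(doc, path) {
  return path.split(".").reduce(
    (value, key) =>
      value != null && typeof value === "object" ? value[key] : undefined,
    doc
  );
}

const user = {
  name: "Alice Johnson",
  address: { city: "San Francisco", zip: "94102" },
  employment: { salary: { currency: "USD" } },
};

console.log(getByPath(user, "address.city"));               // San Francisco
console.log(getByPath(user, "employment.salary.currency")); // USD
console.log(getByPath(user, "address.country"));            // undefined (field is missing)
```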

When to Embed

  • Data is always accessed together (e.g., user + address)
  • The embedded data belongs to the parent (1:1 or 1:few)
  • The embedded data rarely changes independently
  • The combined document stays under 16 MB

4. Arrays in Documents

Arrays are one of MongoDB's most powerful features. A single document can hold lists of values, objects, or even other arrays.

Simple Arrays

db.users.insertOne({
  name: "Alice",
  hobbies: ["reading", "hiking", "coding"],
  scores: [95, 88, 92, 87, 91],
  tags: ["premium", "verified"]
})

Arrays of Embedded Documents

db.users.insertOne({
  name: "Bob",
  education: [
    {
      school: "MIT",
      degree: "B.S. Computer Science",
      year: 2018
    },
    {
      school: "Stanford",
      degree: "M.S. AI",
      year: 2020
    }
  ],
  orders: [
    { productId: ObjectId("..."), quantity: 2, total: 59.98 },
    { productId: ObjectId("..."), quantity: 1, total: 29.99 }
  ]
})

Querying Arrays

// Find documents where the array contains a specific value
db.users.find({ hobbies: "hiking" })

// Find documents where the array contains ALL specified values
db.users.find({ hobbies: { $all: ["reading", "coding"] } })

// Find by array element's nested field
db.users.find({ "education.school": "MIT" })

// Find by array size
db.users.find({ hobbies: { $size: 3 } })

// Find by array element position (0-indexed)
db.users.find({ "scores.0": { $gt: 90 } })  // First score > 90

// $elemMatch: match multiple conditions on the SAME array element
db.users.find({
  education: {
    $elemMatch: { school: "MIT", year: { $gt: 2015 } }
  }
})
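
The value of $elemMatch is easiest to see on a document where the conditions are met by different array elements. The sketch below imitates both query styles with Array.prototype.some; the document is a hypothetical variant of Bob whose MIT degree predates 2015.

```javascript
const bob = {
  name: "Bob",
  education: [
    { school: "MIT", year: 2010 },
    { school: "Stanford", year: 2020 },
  ],
};

// Like { "education.school": "MIT", "education.year": { $gt: 2015 } }:
// each condition may be satisfied by a different array element.
function matchesSeparately(doc) {
  return (
    doc.education.some((e) => e.school === "MIT") &&
    doc.education.some((e) => e.year > 2015)
  );
}

// Like { education: { $elemMatch: { school: "MIT", year: { $gt: 2015 } } } }:
// a single element must satisfy both conditions.
function matchesSameElement(doc) {
  return doc.education.some((e) => e.school === "MIT" && e.year > 2015);
}

console.log(matchesSeparately(bob));  // true  (MIT matches one, 2020 matches the other)
console.log(matchesSameElement(bob)); // false (no single entry is MIT after 2015)
```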

Updating Arrays

// Add an element to the end
db.users.updateOne({ name: "Alice" }, { $push: { hobbies: "swimming" } })

// Add multiple elements
db.users.updateOne({ name: "Alice" }, {
  $push: { hobbies: { $each: ["yoga", "painting"] } }
})

// Remove a specific element
db.users.updateOne({ name: "Alice" }, { $pull: { hobbies: "hiking" } })

// Remove the last element
db.users.updateOne({ name: "Alice" }, { $pop: { hobbies: 1 } })

// Remove the first element
db.users.updateOne({ name: "Alice" }, { $pop: { hobbies: -1 } })

// Add only if not already present
db.users.updateOne({ name: "Alice" }, { $addToSet: { hobbies: "reading" } })
// Does nothing if "reading" already exists
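
As a mental model, each array operator corresponds to a small array transformation. The helpers below are hypothetical plain-JavaScript stand-ins that return new arrays; they approximate the operators' effect on simple values and are not how the server implements them.

```javascript
// Each helper returns a new array and leaves its input untouched.
const push = (arr, ...values) => [...arr, ...values];               // $push (with $each for several values)
const pull = (arr, value) => arr.filter((item) => item !== value);  // $pull removes every matching element
const popLast = (arr) => arr.slice(0, -1);                          // $pop: 1
const popFirst = (arr) => arr.slice(1);                             // $pop: -1
const addToSet = (arr, value) =>
  arr.includes(value) ? arr : [...arr, value];                      // $addToSet skips existing values

let hobbies = ["reading", "hiking", "coding"];
hobbies = push(hobbies, "swimming");    // reading, hiking, coding, swimming
hobbies = pull(hobbies, "hiking");      // reading, coding, swimming
hobbies = addToSet(hobbies, "reading"); // unchanged: already present
hobbies = popLast(hobbies);             // reading, coding
hobbies = popFirst(hobbies);            // coding
console.log(hobbies);
```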

5. Embed vs Reference

The most important schema design decision in MongoDB is whether to embed related data or reference it.

Embedding (Denormalization)

// The order document CONTAINS all the information it needs
{
  _id: ObjectId("..."),
  orderNumber: "ORD-001",
  customer: {
    name: "Alice",
    email: "alice@example.com",
    phone: "555-0123"
  },
  items: [
    { name: "Widget", price: 9.99, quantity: 2 },
    { name: "Gadget", price: 24.99, quantity: 1 }
  ],
  total: 44.97,
  status: "shipped"
}

Referencing (Normalization)

// users collection (short IDs like "aaa111" are placeholders; real ObjectIds are 24 hex characters)
{
  _id: ObjectId("aaa111"),
  name: "Alice",
  email: "alice@example.com"
}

// orders collection -- references the user by _id
{
  _id: ObjectId("bbb222"),
  orderNumber: "ORD-001",
  customerId: ObjectId("aaa111"),  // <-- reference
  items: [
    { productId: ObjectId("ccc333"), quantity: 2 },
    { productId: ObjectId("ddd444"), quantity: 1 }
  ],
  total: 44.97,
  status: "shipped"
}

Decision Guide

| Criteria | Embed | Reference |
| --- | --- | --- |
| Data accessed together? | Yes -- embed | No -- reference |
| Relationship type | 1:1, 1:few | 1:many, many:many |
| Data changes independently? | Rarely -- embed | Frequently -- reference |
| Document size | Stays under 16 MB | Could grow unbounded |
| Data duplication OK? | Yes (acceptable) | No (avoid duplication) |
| Read performance | Better (single read) | Requires multiple reads |
| Write performance | Risk of large updates | Smaller, targeted updates |
| Consistency needs | Atomic (single doc) | May need transactions |

Rule of Thumb

1:1   relationship  --> Embed (almost always)
1:Few relationship  --> Embed (usually)
1:Many relationship --> Reference (usually) or hybrid
Many:Many           --> Reference (almost always)
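
The rule of thumb can also be written down as a small decision function. This is only a sketch of the heuristic; `suggestModel` and its option names are hypothetical, and real schema decisions should also weigh the access patterns discussed in the decision guide.

```javascript
// Hypothetical helper that encodes the rule of thumb above.
function suggestModel({ cardinality, unbounded = false, changesIndependently = false }) {
  if (cardinality === "many:many") return "reference";
  // Unbounded growth or independently changing data argues against embedding.
  if (unbounded || changesIndependently) return "reference";
  if (cardinality === "1:1" || cardinality === "1:few") return "embed";
  return "reference or hybrid"; // 1:many with bounded, stable data
}

console.log(suggestModel({ cardinality: "1:1" }));                    // embed
console.log(suggestModel({ cardinality: "1:few" }));                  // embed
console.log(suggestModel({ cardinality: "1:many" }));                 // reference or hybrid
console.log(suggestModel({ cardinality: "many:many" }));              // reference
console.log(suggestModel({ cardinality: "1:few", unbounded: true })); // reference
```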

6. Document Size Limits and Field Naming

Maximum Document Size

MongoDB documents have a maximum size of 16 MB. This is generous for most use cases, but be aware of it when embedding large arrays.

16 MB can hold approximately:
  - ~16,000 embedded documents of about 1 KB each
  - ~16 million ASCII characters of text (fewer for multi-byte UTF-8 characters)
  - A medium-sized image (but don't store images in documents!)

For files larger than 16 MB, use GridFS (MongoDB's file storage specification).
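
For a quick sanity check in application code, you can compare a rough size estimate against the limit. The sketch below uses the JSON byte length as a stand-in, which is an assumption: actual BSON size differs (types such as Date and ObjectId are encoded differently), so treat the result as an order-of-magnitude guide. `estimateBytes` and `fitsInOneDocument` are hypothetical helpers.

```javascript
const MAX_BSON_BYTES = 16 * 1024 * 1024; // 16 MB document limit

// Rough estimate only: JSON byte length approximates, but does not equal, BSON size.
function estimateBytes(doc) {
  return new TextEncoder().encode(JSON.stringify(doc)).length;
}

// Leave headroom (default 50%) because the estimate is imprecise.
function fitsInOneDocument(doc, safetyFactor = 0.5) {
  return estimateBytes(doc) < MAX_BSON_BYTES * safetyFactor;
}

const smallDoc = { name: "Alice", tags: ["premium", "verified"] };
const hugeDoc = { log: "x".repeat(20 * 1024 * 1024) }; // about 20 MB of text

console.log(fitsInOneDocument(smallDoc)); // true
console.log(fitsInOneDocument(hugeDoc));  // false
```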

Field Naming Rules

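| Rule | Example | Valid? |
| --- | --- | --- |
| Must be a string | name | Yes |
| Cannot contain the null character (\0) | na\0me | No |
| Should not start with $ (operator prefix) | $price | Avoid: newer MongoDB versions can store it, but normal query and update syntax treats $ as an operator |
| Should not contain . (used by dot notation) | first.name | Avoid: newer MongoDB versions can store it, but dot notation reads it as a nested path |
| _id is reserved for the primary key | _id | Special |
| Case-sensitive | Name vs name are different fields | Yes |
| Can contain spaces (but avoid them) | "first name" | Technically yes, but avoid |
| Can be an empty string (but avoid) | "" | Technically yes, but avoid |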

Best Practices for Field Names

// GOOD: camelCase, descriptive, concise
{
  firstName: "Alice",
  lastName: "Johnson",
  emailAddress: "alice@example.com",
  createdAt: new Date(),
  isActive: true,
  orderCount: 5
}

// BAD: inconsistent, unclear, wasteful
{
  "first name": "Alice",      // Spaces make queries awkward
  "Last_Name": "Johnson",     // Inconsistent casing
  "e": "alice@example.com",   // Too abbreviated
  "date_of_the_creation": new Date(), // Too verbose
  "$active": true,             // $ prefix clashes with operator syntax
  "is.active": true            // Dot clashes with dot notation for nested fields
}
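
The naming guidance above is mechanical enough to check automatically. The following sketch is a hypothetical linter (`checkFieldName` is not a MongoDB API) that returns a list of warnings for a proposed field name; the camelCase check reflects this lesson's convention, not a database rule.

```javascript
// Hypothetical linter for the naming guidance above; returns a list of warnings.
function checkFieldName(name) {
  if (typeof name !== "string") return ["must be a string"];
  const warnings = [];
  if (name === "") warnings.push("empty name");
  if (name.includes("\0")) warnings.push("contains null character");
  if (name.startsWith("$")) warnings.push("starts with $");
  if (name.includes(".")) warnings.push("contains a dot");
  if (/\s/.test(name)) warnings.push("contains whitespace");
  // Style convention only: _id is exempt, everything else should be camelCase.
  if (name !== "_id" && !/^[a-z][a-zA-Z0-9]*$/.test(name)) warnings.push("not camelCase");
  return warnings;
}

console.log(checkFieldName("firstName"));  // no warnings
console.log(checkFieldName("$active"));    // warns about the $ prefix and casing
console.log(checkFieldName("is.active"));  // warns about the dot and casing
console.log(checkFieldName("first name")); // warns about whitespace and casing
```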

Tip: Since field names are stored in every document (unlike SQL column headers), shorter names save storage in large collections. But readability matters more in most cases.


7. Data Modeling Patterns

7.1 One-to-One: Embedding

// User with a single profile (embed the profile)
{
  _id: ObjectId("..."),
  username: "alice_dev",
  email: "alice@example.com",
  profile: {                    // <-- embedded 1:1
    bio: "Full-stack developer",
    avatar: "https://example.com/alice.jpg",
    website: "https://alice.dev",
    location: "San Francisco, CA"
  }
}

7.2 One-to-Few: Embedding Array

// User with a few addresses (embed the array)
{
  _id: ObjectId("..."),
  name: "Alice",
  addresses: [                  // <-- embedded 1:few
    {
      label: "Home",
      street: "123 Main St",
      city: "San Francisco",
      state: "CA"
    },
    {
      label: "Work",
      street: "456 Market St",
      city: "San Francisco",
      state: "CA"
    }
  ]
}

7.3 One-to-Many: Referencing

// Author document
{
  _id: ObjectId("author_001"),
  name: "Alice",
  email: "alice@example.com"
}

// Many blog posts reference the author
{
  _id: ObjectId("post_001"),
  title: "Getting Started with MongoDB",
  content: "MongoDB is a document database...",
  authorId: ObjectId("author_001"),  // <-- reference
  createdAt: new Date()
}

{
  _id: ObjectId("post_002"),
  title: "Advanced Mongoose Patterns",
  content: "Mongoose provides powerful...",
  authorId: ObjectId("author_001"),  // <-- same reference
  createdAt: new Date()
}
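
Referencing means the author has to be looked up separately when a post is displayed, either with a second query or with the $lookup aggregation stage. The sketch below shows the application-side version of that join on in-memory arrays; plain strings stand in for ObjectId values, and `attachAuthors` is a hypothetical helper.

```javascript
// Plain strings stand in for ObjectId values in this sketch.
const authors = [{ _id: "author_001", name: "Alice" }];
const posts = [
  { _id: "post_001", title: "Getting Started with MongoDB", authorId: "author_001" },
  { _id: "post_002", title: "Advanced Mongoose Patterns", authorId: "author_001" },
];

// Index authors by _id once, then attach the matching author to each post.
function attachAuthors(postList, authorList) {
  const byId = new Map(authorList.map((author) => [author._id, author]));
  return postList.map((post) => ({
    ...post,
    author: byId.get(post.authorId) ?? null, // null when the reference is dangling
  }));
}

const joined = attachAuthors(posts, authors);
for (const post of joined) {
  console.log(`${post.title} by ${post.author.name}`);
}
```

Nothing in the database guarantees that authorId points at an existing author, which is why the sketch falls back to null for a dangling reference.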

7.4 Hybrid Pattern (Subset Embedding)

// Product document with recent reviews embedded, but full reviews referenced
{
  _id: ObjectId("prod_001"),
  name: "Wireless Mouse",
  price: NumberDecimal("29.99"),

  // Embed only the 3 most recent reviews (quick access)
  recentReviews: [
    { userId: ObjectId("..."), rating: 5, text: "Great!", date: new Date() },
    { userId: ObjectId("..."), rating: 4, text: "Good value", date: new Date() },
    { userId: ObjectId("..."), rating: 5, text: "Love it!", date: new Date() }
  ],

  reviewCount: 247,
  averageRating: 4.6
}

// Full reviews live in a separate collection
// reviews collection
{
  _id: ObjectId("rev_001"),
  productId: ObjectId("prod_001"),  // <-- reference
  userId: ObjectId("user_001"),
  rating: 5,
  text: "Great mouse! Very comfortable for long coding sessions.",
  date: new Date()
}
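
The hybrid pattern only works if the embedded summary is kept in step with the reviews collection. The sketch below models that bookkeeping in plain JavaScript; `addReview` is a hypothetical helper and the sample data is invented for illustration. Against a real database, the capped list would typically be maintained with $push combined with the $each, $sort, and $slice modifiers.

```javascript
// Hypothetical helper: returns an updated product summary after a new review.
function addReview(product, review, keep = 3) {
  const reviewCount = product.reviewCount + 1;
  // Incremental mean: no need to re-read every stored review.
  const averageRating =
    (product.averageRating * product.reviewCount + review.rating) / reviewCount;
  const recentReviews = [review, ...product.recentReviews]
    .sort((a, b) => b.date - a.date) // newest first
    .slice(0, keep);                 // keep only the most recent few
  return { ...product, recentReviews, reviewCount, averageRating };
}

const product = {
  name: "Wireless Mouse",
  recentReviews: [
    { rating: 5, text: "Great!", date: new Date("2025-01-03") },
    { rating: 4, text: "Good value", date: new Date("2025-01-02") },
    { rating: 5, text: "Love it!", date: new Date("2025-01-01") },
  ],
  reviewCount: 4,
  averageRating: 4.5,
};

const updated = addReview(product, {
  rating: 2,
  text: "Stopped working",
  date: new Date("2025-01-04"),
});

console.log(updated.recentReviews.map((r) => r.text)); // Stopped working, Great!, Good value
console.log(updated.reviewCount);                      // 5
console.log(updated.averageRating);                    // 4
```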

8. Real-World Document Examples

User Document

{
  _id: ObjectId("64a1b2c3d4e5f6a7b8c9d0e1"),
  username: "alice_dev",
  email: "alice@example.com",
  passwordHash: "$2b$10$xJ8K3...", // bcrypt hash, never plain text
  role: "user",
  profile: {
    firstName: "Alice",
    lastName: "Johnson",
    avatar: "https://cdn.example.com/avatars/alice.jpg",
    bio: "Full-stack developer | Open source contributor",
    socialLinks: {
      github: "https://github.com/alice",
      twitter: "https://twitter.com/alice_dev"
    }
  },
  preferences: {
    theme: "dark",
    language: "en",
    notifications: {
      email: true,
      push: false,
      sms: false
    }
  },
  isVerified: true,
  isActive: true,
  lastLoginAt: ISODate("2025-03-15T10:30:00Z"),
  createdAt: ISODate("2024-01-10T08:00:00Z"),
  updatedAt: ISODate("2025-03-15T10:30:00Z")
}

Product Document (E-commerce)

{
  _id: ObjectId("64b2c3d4e5f6a7b8c9d0e1f2"),
  name: "Wireless Bluetooth Headphones",
  slug: "wireless-bluetooth-headphones",
  description: "Premium noise-canceling headphones with 30-hour battery life.",
  brand: "AudioTech",
  category: "Electronics",
  subcategory: "Headphones",
  price: NumberDecimal("79.99"),
  compareAtPrice: NumberDecimal("99.99"),
  currency: "USD",
  sku: "AT-WBH-001",
  inventory: {
    quantity: 150,
    reserved: 12,
    warehouse: "WH-SF-01"
  },
  attributes: {
    color: "Matte Black",
    weight: "250g",
    connectivity: "Bluetooth 5.3",
    batteryLife: "30 hours",
    noiseCanceling: true
  },
  images: [
    { url: "https://cdn.example.com/products/headphones-1.jpg", alt: "Front view", isPrimary: true },
    { url: "https://cdn.example.com/products/headphones-2.jpg", alt: "Side view", isPrimary: false }
  ],
  tags: ["wireless", "bluetooth", "noise-canceling", "premium"],
  ratings: {
    average: 4.6,
    count: 247
  },
  isPublished: true,
  createdAt: ISODate("2024-06-15T12:00:00Z"),
  updatedAt: ISODate("2025-02-20T09:15:00Z")
}

Order Document

{
  _id: ObjectId("64c3d4e5f6a7b8c9d0e1f2a3"),
  orderNumber: "ORD-2025-00142",
  customerId: ObjectId("64a1b2c3d4e5f6a7b8c9d0e1"),  // ref -> users
  items: [
    {
      productId: ObjectId("64b2c3d4e5f6a7b8c9d0e1f2"), // ref -> products
      name: "Wireless Bluetooth Headphones",             // denormalized
      price: NumberDecimal("79.99"),
      quantity: 1,
      subtotal: NumberDecimal("79.99")
    },
    {
      productId: ObjectId("64b2c3d4e5f6a7b8c9d0e1f3"),
      name: "USB-C Cable",
      price: NumberDecimal("12.99"),
      quantity: 2,
      subtotal: NumberDecimal("25.98")
    }
  ],
  shipping: {
    method: "standard",
    cost: NumberDecimal("5.99"),
    address: {
      name: "Alice Johnson",
      street: "123 Main St",
      city: "San Francisco",
      state: "CA",
      zip: "94102",
      country: "US"
    },
    trackingNumber: "1Z999AA10123456784",
    carrier: "UPS"
  },
  payment: {
    method: "credit_card",
    last4: "4242",
    brand: "Visa",
    transactionId: "txn_1234567890"
  },
  subtotal: NumberDecimal("105.97"),
  tax: NumberDecimal("9.54"),
  shippingCost: NumberDecimal("5.99"),
  total: NumberDecimal("121.50"),
  status: "shipped",
  statusHistory: [
    { status: "pending",    date: ISODate("2025-03-10T14:00:00Z") },
    { status: "confirmed",  date: ISODate("2025-03-10T14:05:00Z") },
    { status: "processing", date: ISODate("2025-03-11T09:00:00Z") },
    { status: "shipped",    date: ISODate("2025-03-12T11:30:00Z") }
  ],
  createdAt: ISODate("2025-03-10T14:00:00Z"),
  updatedAt: ISODate("2025-03-12T11:30:00Z")
}
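
The monetary fields in this order are internally consistent, which can be checked with exact integer arithmetic. The sketch below copies the amounts as decimal strings and sums them in cents; `cents` is a hypothetical helper that assumes non-negative amounts with at most two decimal places.

```javascript
// Amounts copied from the order document above, as decimal strings.
const order = {
  items: [
    { price: "79.99", quantity: 1 },
    { price: "12.99", quantity: 2 },
  ],
  subtotal: "105.97",
  tax: "9.54",
  shippingCost: "5.99",
  total: "121.50",
};

// Hypothetical helper: "12.99" -> 1299 (non-negative, up to two decimals).
const cents = (amount) => {
  const [whole, fraction = ""] = amount.split(".");
  return Number(whole) * 100 + Number(fraction.padEnd(2, "0").slice(0, 2));
};

const subtotalCents = order.items.reduce(
  (sum, item) => sum + cents(item.price) * item.quantity,
  0
);
const orderTotalCents = subtotalCents + cents(order.tax) + cents(order.shippingCost);

console.log(subtotalCents === cents(order.subtotal)); // true: 10597 cents
console.log(orderTotalCents === cents(order.total));  // true: 12150 cents
```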

9. Key Takeaways

  • MongoDB uses BSON internally, supporting richer types than JSON (Date, ObjectId, Decimal128, etc.).
  • Every document has a unique _id field -- auto-generated as an ObjectId if not provided.
  • ObjectIds begin with a creation timestamp, so sorting by _id approximately sorts by creation time.
  • Embedded documents store related data in a single document for fast reads.
  • Arrays in documents can hold any data type, including nested objects.
  • The embed vs reference decision depends on relationship type, access patterns, and document size.
  • Maximum document size is 16 MB -- use GridFS for larger files.
  • Field names are case-sensitive and should follow camelCase convention.
  • Common patterns: embed for 1:1 and 1:few, reference for 1:many and many:many.
  • Use Decimal128 for financial data to avoid floating-point errors.

10. Explain-It Challenge

Design a MongoDB document schema for a recipe application. A recipe has a title, description, author, ingredients (each with name, quantity, and unit), step-by-step instructions, tags, ratings, and comments. Decide which data should be embedded and which should be referenced. Justify each decision. What would the document look like for a recipe with 5 ingredients, 8 steps, and 3 comments?

