Episode 3 — NodeJS MongoDB Backend Architecture / 3.8 — Database Basics MongoDB

3.8.a — Relational vs Non-Relational Databases

Databases are the backbone of every application. Understanding the difference between relational (SQL) and non-relational (NoSQL) databases is the first step toward choosing the right storage layer for your project.


< README | 3.8.b -- Introduction to MongoDB >


Table of Contents

  1. What Is a Database?
  2. Relational Databases (SQL)
  3. Non-Relational Databases (NoSQL)
  4. Types of NoSQL Databases
  5. SQL vs NoSQL Comparison
  6. When to Choose SQL vs NoSQL
  7. CAP Theorem Simplified
  8. ACID vs BASE Models
  9. Real-World Usage Examples
  10. Key Takeaways
  11. Explain-It Challenge

1. What Is a Database?

A database is an organized collection of structured data stored electronically. It provides mechanisms to create, read, update, and delete data efficiently.

Without a database, your data lives in memory and vanishes when the server restarts. Databases provide:

  • Persistence -- data survives restarts and crashes
  • Organization -- data is structured for efficient retrieval
  • Concurrency -- multiple users can access data simultaneously
  • Security -- access control and encryption
  • Integrity -- rules ensure data stays valid and consistent
Application (Node.js / Express)
        |
        |  <-- queries / commands -->
        |
    Database (MongoDB, PostgreSQL, MySQL, etc.)
        |
    Disk / Cloud Storage

Every application -- from a simple to-do list to a social media platform -- relies on one or more databases.


2. Relational Databases (SQL)

Relational databases store data in tables (rows and columns) with predefined schemas. They use SQL (Structured Query Language) to manage data.

Core Concepts

ConceptDescription
TableA named collection of rows sharing the same columns (like a spreadsheet)
Row (Record)A single entry in a table
Column (Field)A specific attribute with a defined data type
SchemaThe blueprint defining tables, columns, types, and constraints
Primary KeyA unique identifier for each row (e.g., id)
Foreign KeyA reference from one table to another table's primary key
JOINCombining data from multiple tables in a single query
IndexData structure that speeds up queries on specific columns

Example: Users Table

+----+----------+---------------------+-----+
| id | name     | email               | age |
+----+----------+---------------------+-----+
|  1 | Alice    | alice@example.com   |  25 |
|  2 | Bob      | bob@example.com     |  30 |
|  3 | Charlie  | charlie@example.com |  28 |
+----+----------+---------------------+-----+

SQL Query Examples

-- Create a table
CREATE TABLE users (
  id INT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(100) NOT NULL,
  email VARCHAR(255) UNIQUE NOT NULL,
  age INT CHECK (age >= 0)
);

-- Insert a row
INSERT INTO users (name, email, age) VALUES ('Alice', 'alice@example.com', 25);

-- Query with filter
SELECT name, email FROM users WHERE age > 20 ORDER BY name;

-- JOIN two tables
SELECT users.name, orders.total
FROM users
JOIN orders ON users.id = orders.user_id
WHERE orders.total > 100;

Popular Relational Databases

DatabaseUse CaseLicense
PostgreSQLComplex queries, ACID compliance, extensionsOpen source
MySQLWeb applications, WordPress, read-heavy workloadsOpen source
SQLiteEmbedded databases, mobile apps, prototypingPublic domain
Microsoft SQL ServerEnterprise Windows environmentsCommercial
Oracle DBLarge enterprise systems, bankingCommercial

3. Non-Relational Databases (NoSQL)

Non-relational databases (NoSQL -- "Not Only SQL") store data in formats other than tables. They offer flexible schemas and are designed for horizontal scaling.

Core Concepts

ConceptDescription
DocumentA self-contained data unit, typically JSON/BSON (MongoDB)
CollectionA group of documents (like a table but schema-free)
Key-Value PairSimplest model: a key maps to a value (Redis)
Node / EdgeGraph databases store entities (nodes) and relationships (edges)
Column FamilyGroups of columns stored together (Cassandra)

Example: User Document (MongoDB)

{
  "_id": "ObjectId('64a1b2c3d4e5f6a7b8c9d0e1')",
  "name": "Alice",
  "email": "alice@example.com",
  "age": 25,
  "hobbies": ["reading", "hiking", "coding"],
  "address": {
    "city": "San Francisco",
    "state": "CA",
    "zip": "94102"
  }
}

Notice how a single document can hold nested objects and arrays -- something that would require multiple tables and JOINs in SQL.

Popular NoSQL Databases

DatabaseTypeUse Case
MongoDBDocumentGeneral-purpose, flexible schemas
RedisKey-ValueCaching, sessions, real-time data
DynamoDBKey-Value / DocumentAWS serverless, auto-scaling
CassandraColumn-FamilyTime-series, massive write throughput
Neo4jGraphSocial networks, recommendation engines
CouchDBDocumentOffline-first, sync-friendly apps

4. Types of NoSQL Databases

4.1 Document Databases

Store data as JSON-like documents. Each document can have a different structure.

// Document 1
{ "name": "Alice", "age": 25, "hobbies": ["reading"] }

// Document 2 (different fields -- totally valid)
{ "name": "Bob", "role": "admin", "department": "Engineering" }

Examples: MongoDB, CouchDB, Firestore

Best for: Content management, user profiles, catalogs, real-time apps

4.2 Key-Value Databases

The simplest model -- every piece of data is stored as a key mapped to a value.

KEY                VALUE
------------------------------------------
"user:1001"        '{"name":"Alice","age":25}'
"session:abc123"   '{"userId":"1001","expires":"2025-01-01"}'
"cache:homepage"   '<html>...</html>'

Examples: Redis, Memcached, DynamoDB

Best for: Caching, session management, leaderboards, real-time counters

4.3 Column-Family Databases

Data is stored in columns rather than rows. Optimized for reading and writing large amounts of data across many servers.

Row Key: "user:1001"
  Column Family "profile":   { name: "Alice", age: 25 }
  Column Family "activity":  { lastLogin: "2025-03-01", pageViews: 150 }

Examples: Apache Cassandra, HBase, ScyllaDB

Best for: Time-series data, IoT sensor data, analytics at scale

4.4 Graph Databases

Store data as nodes (entities) and edges (relationships). Optimized for traversing relationships.

(Alice) --[FRIENDS_WITH]--> (Bob)
(Alice) --[WORKS_AT]------> (Google)
(Bob)   --[FRIENDS_WITH]--> (Charlie)

Examples: Neo4j, Amazon Neptune, ArangoDB

Best for: Social networks, fraud detection, recommendation engines, knowledge graphs


5. SQL vs NoSQL Comparison

CriteriaSQL (Relational)NoSQL (Non-Relational)
Data ModelTables with rows and columnsDocuments, key-value, graph, column-family
SchemaFixed, predefined (rigid)Flexible, dynamic (schema-less)
Query LanguageSQL (standardized)Database-specific APIs
ScalingVertical (bigger server)Horizontal (more servers)
JOINsNative, powerfulLimited or manual
ACID ComplianceStrong by defaultVaries (eventual consistency common)
Data IntegrityStrict (constraints, foreign keys)Application-level enforcement
Best ForComplex queries, transactionsFlexible data, high throughput
Schema ChangesRequires migration scriptsAdd fields anytime
MaturityDecades of tooling and knowledgeRapidly evolving ecosystem
ExamplesPostgreSQL, MySQL, SQLiteMongoDB, Redis, Cassandra

6. When to Choose SQL vs NoSQL

Choose SQL When:

  • Your data has clear, fixed relationships (e.g., banking, e-commerce orders)
  • You need complex queries with multiple JOINs
  • Data integrity is critical (ACID transactions)
  • Your schema is well-defined and unlikely to change drastically
  • You need strong consistency guarantees

Choose NoSQL When:

  • Your data structure is evolving or unpredictable
  • You need horizontal scaling across many servers
  • You are working with large volumes of unstructured or semi-structured data
  • Read/write speed matters more than complex queries
  • You are building a prototype and need to iterate fast

Decision Flowchart

Is your data highly relational with many JOINs?
  |
  YES --> Do you need strict ACID transactions?
  |         |
  |         YES --> SQL (PostgreSQL / MySQL)
  |         NO  --> Consider SQL or Document DB
  |
  NO --> Is your data structure flexible or evolving?
          |
          YES --> Do you need fast key lookups?
          |        |
          |        YES --> Key-Value (Redis / DynamoDB)
          |        NO  --> Document DB (MongoDB)
          |
          NO --> Is your data about relationships/connections?
                   |
                   YES --> Graph DB (Neo4j)
                   NO  --> Column-Family (Cassandra) or Document DB

7. CAP Theorem Simplified

The CAP theorem states that a distributed database can provide only two out of three guarantees simultaneously:

LetterPropertyMeaning
CConsistencyEvery read returns the most recent write
AAvailabilityEvery request receives a response (no timeouts)
PPartition ToleranceSystem works despite network failures between nodes

Since network partitions will happen in distributed systems, you must choose between C and A:

        Consistency (C)
           /\
          /  \
         /    \
        / CP   \  CA
       /  zone  \ zone
      /          \
     /____________\
  Partition (P) --- Availability (A)
                AP zone
TypeDatabasesTrade-off
CPMongoDB, Redis, HBaseSacrifices availability during partitions
APCassandra, DynamoDB, CouchDBSacrifices consistency (eventual consistency)
CATraditional RDBMS (PostgreSQL, MySQL)Works well on a single node; partitions break them

In practice: MongoDB defaults to strong consistency for reads from the primary node, but offers tunable read preferences for availability.


8. ACID vs BASE Models

ACID (SQL Databases)

PropertyMeaningExample
AtomicityAll operations in a transaction succeed or all failTransfer $100: debit AND credit both happen or neither
ConsistencyDatabase moves from one valid state to anotherBalance cannot go negative if constraint exists
IsolationConcurrent transactions do not interfereTwo transfers at the same time produce correct results
DurabilityCommitted data survives crashesData written to disk, not just memory
-- ACID Example: Bank Transfer
BEGIN TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- If either UPDATE fails, both are rolled back

BASE (NoSQL Databases)

PropertyMeaning
Basically AvailableSystem guarantees availability (may return stale data)
Soft StateState may change over time even without input (due to replication)
Eventually ConsistentData will become consistent across nodes -- eventually
User writes to Node A --> Node A acknowledges immediately
                      --> Node A replicates to Node B (async)
                      --> Node B replicates to Node C (async)
                      --> After a short delay, all nodes agree

Comparison

PropertyACIDBASE
ConsistencyImmediateEventual
AvailabilityMay block during transactionsAlways available
PerformanceSlower (locking)Faster (no locking)
Use CaseBanking, inventorySocial feeds, analytics

9. Real-World Usage Examples

CompanySQLNoSQLWhy
InstagramPostgreSQLCassandra, RedisPostgreSQL for user data; Cassandra for feeds; Redis for caching
UberPostgreSQLCassandra, RedisPostgreSQL for trips; Cassandra for real-time location data
Netflix--Cassandra, DynamoDBMassive scale requires horizontal scaling across regions
AirbnbMySQLRedis, ElasticsearchMySQL for bookings; Redis for caching; Elasticsearch for search
Twitter/XMySQLRedis, Manhattan (custom)MySQL for tweets; Redis for timelines and caching
ShopifyMySQLRedis, MemcachedMySQL for orders/products; Redis for sessions and caching

Typical Modern Stack

Frontend (React / Next.js)
    |
    v
API Server (Node.js / Express)
    |
    +--> MongoDB       (primary database -- flexible user/product data)
    +--> Redis         (caching layer -- sessions, frequently accessed data)
    +--> Elasticsearch (search engine -- full-text search)
    +--> PostgreSQL    (analytics -- complex reporting queries)

Key insight: Most production applications use multiple databases (polyglot persistence), choosing the best tool for each specific need.


10. Key Takeaways

  • A database is an organized, persistent collection of data with CRUD capabilities.
  • Relational (SQL) databases use tables with fixed schemas and SQL for querying.
  • Non-relational (NoSQL) databases offer flexible schemas with various data models.
  • Four types of NoSQL: document, key-value, column-family, and graph.
  • SQL excels at complex queries, JOINs, and strict data integrity.
  • NoSQL excels at flexible schemas, horizontal scaling, and high throughput.
  • The CAP theorem means you trade off between consistency and availability in distributed systems.
  • ACID guarantees strict transactions; BASE trades consistency for availability.
  • Real-world applications often use multiple database types together.

11. Explain-It Challenge

Imagine you are explaining to a friend who has never coded before: What is the difference between a SQL and a NoSQL database? Use an analogy they would understand (e.g., a filing cabinet vs a storage bin, or a spreadsheet vs a folder of sticky notes). Explain why some companies use both types at the same time.


< README | 3.8.b -- Introduction to MongoDB >