
3.7.d — File Validation and Security

In one sentence: Accepting user-uploaded files is one of the most dangerous things a web server can do — you must validate file types by MIME and magic bytes, enforce size limits, sanitize filenames, store files outside the web root, and assume every upload is potentially malicious until proven otherwise.



1. Why File Upload Security Matters

┌─────────────────────────────────────────────────────────────────┐
│               WHAT CAN GO WRONG WITH FILE UPLOADS               │
│                                                                 │
│  1. Remote Code Execution (RCE)                                 │
│     Attacker uploads a .php/.js/.sh file                        │
│     Server executes it → full system compromise                 │
│                                                                 │
│  2. Denial of Service (DoS)                                     │
│     Attacker uploads a 10 GB file → server runs out of          │
│     disk space or memory                                        │
│                                                                 │
│  3. Path Traversal                                              │
│     Filename contains ../../etc/passwd → file saved             │
│     outside intended directory                                  │
│                                                                 │
│  4. Cross-Site Scripting (XSS)                                  │
│     Attacker uploads an HTML/SVG file with embedded JavaScript  │
│     Browser renders it → script executes in victim's session    │
│                                                                 │
│  5. Malware Distribution                                        │
│     Your server becomes a host for viruses/ransomware           │
│                                                                 │
│  6. Storage Abuse                                               │
│     Automated uploads fill your disk → service outage           │
└─────────────────────────────────────────────────────────────────┘

Rule: Never trust anything from the client — not the filename, not the MIME type, not the file extension, not the file content. Validate everything on the server.


2. File Type Validation with fileFilter

Multer's fileFilter function runs before the file is stored. It receives req, file, and a callback:

const fileFilter = (req, file, cb) => {
  // file.mimetype is reported by the browser
  // file.originalname is the original filename

  // Accept: cb(null, true)
  // Reject: cb(null, false)       — silently skip the file
  // Error:  cb(new Error('...'))  — stop with an error

  if (file.mimetype === 'image/jpeg' || file.mimetype === 'image/png') {
    cb(null, true);   // accept
  } else {
    cb(new Error('Only JPEG and PNG images are allowed'), false);  // reject with error
  }
};

const upload = multer({
  storage: multer.diskStorage({ /* ... */ }),
  fileFilter: fileFilter,
  limits: { fileSize: 5 * 1024 * 1024 }
});

Silent rejection vs error

// Silent rejection — file is skipped, no error thrown
// req.file will be undefined
cb(null, false);

// Rejection with error — Multer stops and passes error to Express error handler
cb(new Error('Invalid file type'), false);

Recommendation: Always pass an error to the callback on rejection. Silent rejection confuses users: they think the upload succeeded, but no file was saved.


3. MIME Type vs File Extension

Check Method             How It Works                      Reliability
-----------------------  --------------------------------  -----------------------------------------
File extension           Check .jpg, .png in filename      Low (trivially spoofed)
MIME type (from client)  Check file.mimetype               Medium (browser-reported, can be spoofed)
Magic bytes              Read first bytes of file content  High (based on actual file data)

Extension check (weakest)

const path = require('path');

const fileFilter = (req, file, cb) => {
  const ext = path.extname(file.originalname).toLowerCase();
  const allowed = ['.jpg', '.jpeg', '.png', '.gif', '.webp'];

  if (allowed.includes(ext)) {
    cb(null, true);
  } else {
    cb(new Error(`Extension ${ext} is not allowed`), false);
  }
};

Problem: An attacker can rename malware.exe to malware.jpg.

MIME type check (better)

const fileFilter = (req, file, cb) => {
  const allowed = ['image/jpeg', 'image/png', 'image/gif', 'image/webp'];

  if (allowed.includes(file.mimetype)) {
    cb(null, true);
  } else {
    cb(new Error(`MIME type ${file.mimetype} is not allowed`), false);
  }
};

Problem: The MIME type is set by the browser based on the extension. A crafted request can send any MIME type.

Combined check (recommended minimum)

const fileFilter = (req, file, cb) => {
  const allowedMimes = ['image/jpeg', 'image/png', 'image/gif', 'image/webp'];
  const allowedExts = ['.jpg', '.jpeg', '.png', '.gif', '.webp'];
  const ext = path.extname(file.originalname).toLowerCase();

  if (allowedMimes.includes(file.mimetype) && allowedExts.includes(ext)) {
    cb(null, true);
  } else {
    cb(new Error('Only JPEG, PNG, GIF, and WebP images are allowed'), false);
  }
};
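
The combined filter above accepts any allowed extension paired with any allowed MIME type, so photo.png sent as image/jpeg still passes. A minimal sketch of a stricter variant follows, assuming each extension should carry exactly one MIME type. The names EXT_TO_MIME, extensionMatchesMime, and strictFileFilter are hypothetical, not part of Multer.

```javascript
const path = require('path');

// Hypothetical allowlist: each permitted extension maps to the one MIME type it may carry.
const EXT_TO_MIME = new Map([
  ['.jpg', 'image/jpeg'],
  ['.jpeg', 'image/jpeg'],
  ['.png', 'image/png'],
  ['.gif', 'image/gif'],
  ['.webp', 'image/webp']
]);

// Returns true only when the extension is allowlisted AND the client-reported
// MIME type is the one expected for that extension.
function extensionMatchesMime(originalname, mimetype) {
  const ext = path.extname(originalname).toLowerCase();
  return EXT_TO_MIME.has(ext) && EXT_TO_MIME.get(ext) === mimetype;
}

// Multer-style fileFilter built on the helper above.
const strictFileFilter = (req, file, cb) => {
  if (extensionMatchesMime(file.originalname, file.mimetype)) {
    cb(null, true);
  } else {
    cb(new Error('File extension and MIME type do not match an allowed image format'));
  }
};
```

Both values still come from the client, so this only catches inconsistent requests. It does not replace a content check.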

4. Magic Bytes — The Most Reliable Check

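Most binary file formats start with specific bytes called magic bytes (or file signatures). They are harder to spoof than an extension or a client-supplied MIME type, because a file with the wrong signature will not open as that format. A matching signature still does not prove that the rest of the file is harmless: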

Format  Magic Bytes (hex)  Magic Bytes (readable)
------  -----------------  ----------------------------
JPEG    FF D8 FF           (binary)
PNG     89 50 4E 47        .PNG
GIF     47 49 46 38        GIF8
PDF     25 50 44 46        %PDF
ZIP     50 4B 03 04        PK..
WebP    52 49 46 46        RIFF (then WEBP at offset 8)

Checking magic bytes with memory storage

const upload = multer({
  storage: multer.memoryStorage(),
  limits: { fileSize: 5 * 1024 * 1024 }
});

app.post('/api/upload', upload.single('image'), (req, res) => {
  // req.file is undefined when no file was sent under the 'image' field
  if (!req.file) {
    return res.status(400).json({ error: 'No file uploaded' });
  }

  const buffer = req.file.buffer;

  // Check magic bytes (out-of-range indexes are undefined, so very short files fail every check)
  const isJPEG = buffer[0] === 0xFF && buffer[1] === 0xD8 && buffer[2] === 0xFF;
  const isPNG = buffer[0] === 0x89 && buffer[1] === 0x50 &&
                buffer[2] === 0x4E && buffer[3] === 0x47;
  const isGIF = buffer[0] === 0x47 && buffer[1] === 0x49 &&
                buffer[2] === 0x46 && buffer[3] === 0x38;

  if (!isJPEG && !isPNG && !isGIF) {
    return res.status(400).json({ error: 'Invalid image file' });
  }

  // Signature matches a known image format; note that type below is still the client-reported MIME
  res.json({ message: 'Valid image uploaded', type: req.file.mimetype });
});
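
The handler above leaves out WebP, which needs two comparisons: RIFF at offset 0 and WEBP at offset 8. As a rough sketch, the signature table can be turned into one reusable function. The names hasSignature and detectImageType are illustrative, not from any library.

```javascript
// Signatures taken from the magic-bytes table.
const JPEG = Buffer.from([0xFF, 0xD8, 0xFF]);
const PNG = Buffer.from([0x89, 0x50, 0x4E, 0x47]);
const GIF = Buffer.from('GIF8', 'ascii');
const RIFF = Buffer.from('RIFF', 'ascii');
const WEBP = Buffer.from('WEBP', 'ascii');

// True when `buffer` contains `signature` starting at `offset`.
function hasSignature(buffer, signature, offset = 0) {
  if (buffer.length < offset + signature.length) return false;
  return buffer.subarray(offset, offset + signature.length).equals(signature);
}

// Returns the detected MIME type, or null when no known image signature matches.
function detectImageType(buffer) {
  if (!Buffer.isBuffer(buffer)) return null;
  if (hasSignature(buffer, JPEG)) return 'image/jpeg';
  if (hasSignature(buffer, PNG)) return 'image/png';
  if (hasSignature(buffer, GIF)) return 'image/gif';
  // RIFF is a container shared with WAV and AVI, so the WEBP tag at offset 8 is required.
  if (hasSignature(buffer, RIFF) && hasSignature(buffer, WEBP, 8)) return 'image/webp';
  return null;
}
```

In a route, detectImageType(req.file.buffer) would replace the three inline boolean checks, and its return value can be reported instead of the client-supplied req.file.mimetype.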

Using the file-type package (recommended)

npm install file-type
// file-type v17+ is ESM-only. For CommonJS, use v16 (the require() call and
// FileType.fromBuffer() below are the v16 API):
// npm install file-type@16

const FileType = require('file-type');

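app.post('/api/upload', upload.single('image'), async (req, res) => {
  // req.file is undefined when no file was sent under the 'image' field
  if (!req.file) {
    return res.status(400).json({ error: 'No file uploaded' });
  }

  // Detect actual file type from buffer content
  const type = await FileType.fromBuffer(req.file.buffer);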

  if (!type || !type.mime.startsWith('image/')) {
    return res.status(400).json({
      error: 'File content does not match an allowed image format',
      detected: type ? type.mime : 'unknown'
    });
  }

  console.log('Detected type:', type.mime);  // 'image/jpeg'
  console.log('Detected ext:', type.ext);    // 'jpg'

  res.json({ message: 'Valid image', detectedType: type.mime });
});

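Best practice: Use file-type for production applications. It detects a wide range of binary formats from their magic bytes and is maintained by Sindre Sorhus and contributors. It does not detect text-based formats such as SVG, CSV, or plain text, so those need a separate check.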


5. Allowed File Type Patterns

Image uploads

const imageFilter = (req, file, cb) => {
  const allowed = ['image/jpeg', 'image/png', 'image/gif', 'image/webp', 'image/svg+xml'];
  if (allowed.includes(file.mimetype)) cb(null, true);
  else cb(new Error('Only image files (JPEG, PNG, GIF, WebP, SVG) are allowed'), false);
};

Warning: SVG files can contain embedded JavaScript. If you accept SVGs, never serve them with Content-Type: image/svg+xml without sanitization.

Document uploads

const documentFilter = (req, file, cb) => {
  const allowed = [
    'application/pdf',
    'application/msword',
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
    'application/vnd.ms-excel',
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    'text/plain',
    'text/csv'
  ];
  if (allowed.includes(file.mimetype)) cb(null, true);
  else cb(new Error('Only PDF, Word, Excel, TXT, and CSV files are allowed'), false);
};

Mixed uploads (images + documents)

const mixedFilter = (req, file, cb) => {
  const imageTypes = ['image/jpeg', 'image/png', 'image/webp'];
  const docTypes = ['application/pdf'];
  const allowed = [...imageTypes, ...docTypes];

  if (allowed.includes(file.mimetype)) cb(null, true);
  else cb(new Error('Only images and PDFs are allowed'), false);
};

6. File Size Limits

Setting limits in Multer

const upload = multer({
  storage: multer.diskStorage({ /* ... */ }),
  limits: {
    fileSize: 5 * 1024 * 1024,  // 5 MB per file
    files: 10                    // max 10 files per request
  }
});

Common size limits by use case

Use Case             Recommended Limit  Why
-------------------  -----------------  ------------------------------
Profile avatar       2 MB               Small image, quick to upload
Gallery photo        10 MB              High-res photos
Document (PDF/DOCX)  15 MB              Most documents under 10 MB
Video upload         100-500 MB         Depends on your infrastructure
General attachment   25 MB              Common email attachment limit

Handling size limit errors

app.post('/api/upload', upload.single('file'), (req, res) => {
  res.json({ message: 'Upload successful' });
});

// Error handler catches LIMIT_FILE_SIZE
app.use((err, req, res, next) => {
  if (err instanceof multer.MulterError) {
    switch (err.code) {
      case 'LIMIT_FILE_SIZE':
        return res.status(413).json({
          error: 'File too large',
          maxSize: '5 MB',
          suggestion: 'Please compress the file and try again'
        });
      case 'LIMIT_FILE_COUNT':
        return res.status(400).json({
          error: 'Too many files',
          maxFiles: 10
        });
      case 'LIMIT_UNEXPECTED_FILE':
        return res.status(400).json({
          error: 'Unexpected file field name'
        });
      default:
        return res.status(400).json({ error: err.message });
    }
  }
  next(err);
});
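
The handler above hard-codes '5 MB', which can drift from the value passed to limits.fileSize. One possible approach, sketched here with hypothetical names (MAX_FILE_SIZE, formatBytes, fileTooLargeBody), is to derive the message from a single shared constant.

```javascript
// Hypothetical shared constant: the same value would be passed to Multer's
// limits.fileSize and used to build the error message.
const MAX_FILE_SIZE = 5 * 1024 * 1024;

// Formats a byte count as a short human-readable string.
function formatBytes(bytes) {
  const units = ['B', 'KB', 'MB', 'GB'];
  let value = bytes;
  let i = 0;
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i += 1;
  }
  // parseFloat drops a trailing ".0" so whole numbers print without decimals.
  return `${parseFloat(value.toFixed(1))} ${units[i]}`;
}

// Builds the JSON body for a LIMIT_FILE_SIZE response.
function fileTooLargeBody(limitBytes = MAX_FILE_SIZE) {
  return {
    error: 'File too large',
    maxSize: formatBytes(limitBytes),
    suggestion: 'Please compress the file and try again'
  };
}
```

Inside the LIMIT_FILE_SIZE case, the response would then become res.status(413).json(fileTooLargeBody()).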

Nginx/reverse proxy limits

If you use Nginx, set client_max_body_size to at least your Multer limit:

# /etc/nginx/nginx.conf or site config
server {
    client_max_body_size 10M;  # Must be >= Multer's fileSize limit
}

Nginx's default client_max_body_size is 1 MB. Unless you raise it, Nginx rejects any larger upload with 413 Request Entity Too Large before Express even sees the request.


7. Preventing Malicious Uploads

Dangerous file types to always block

const DANGEROUS_EXTENSIONS = [
  '.exe', '.bat', '.cmd', '.sh', '.bash',  // Executables
  '.php', '.jsp', '.asp', '.aspx',          // Server-side scripts
  '.js', '.mjs', '.cjs',                     // JavaScript (if not needed)
  '.py', '.rb', '.pl',                       // Scripting languages
  '.dll', '.so', '.dylib',                   // Libraries
  '.html', '.htm', '.svg',                   // Can contain scripts
  '.com', '.msi', '.scr',                    // Windows executables
];

const fileFilter = (req, file, cb) => {
  const ext = path.extname(file.originalname).toLowerCase();

  if (DANGEROUS_EXTENSIONS.includes(ext)) {
    return cb(new Error(`File type ${ext} is not allowed for security reasons`), false);
  }

  // Additional MIME type check
  const allowedMimes = ['image/jpeg', 'image/png', 'application/pdf'];
  if (!allowedMimes.includes(file.mimetype)) {
    return cb(new Error('File type not allowed'), false);
  }

  cb(null, true);
};

Double extension attack

Attackers may use filenames like photo.jpg.php or report.pdf.exe:

// Assumes crypto and path are required at the top of the file
filename: (req, file, cb) => {
  // path.extname returns only the last extension ('.php' for photo.jpg.php)
  const ext = path.extname(file.originalname).toLowerCase();

  // Reject double extensions: the name must have no extension left once the last one is removed
  const nameWithoutExt = path.basename(file.originalname, ext);
  if (path.extname(nameWithoutExt)) {
    return cb(new Error('Double extensions are not allowed'));
  }

  const safeName = crypto.randomBytes(16).toString('hex') + ext;
  cb(null, safeName);
}

Null byte injection

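Older systems may treat a filename such as photo.php%00.jpg inconsistently: validation sees the full string and accepts the .jpg extension, but a C-based file API stops at the null byte and saves the file as photo.php: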

// Always sanitize null bytes
const safeName = file.originalname.replace(/\0/g, '');
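
Stripping null bytes keeps the upload going. A stricter option is to reject such names outright, since a legitimate client has no reason to send them. The sketch below is one way to do that. findNameProblem and EXECUTABLE_EXTENSIONS are hypothetical names, and the extension list is only an illustrative subset.

```javascript
const path = require('path');

// Hypothetical, deliberately short blocklist for the example.
const EXECUTABLE_EXTENSIONS = new Set(['.php', '.exe', '.sh', '.js', '.jsp', '.asp', '.aspx']);

// Returns a reason string when the name looks hostile, or null when it passes.
function findNameProblem(originalname) {
  if (typeof originalname !== 'string' || originalname.length === 0) {
    return 'missing filename';
  }
  // Raw null byte, or its URL-encoded form left undecoded by an earlier layer.
  if (originalname.includes('\0') || originalname.includes('%00')) {
    return 'null byte in filename';
  }
  // 'shell.php.jpg' splits into ['shell', 'php', 'jpg']; everything after the
  // first segment counts as an extension.
  const extensions = path.basename(originalname).split('.').slice(1);
  if (extensions.length > 1) {
    return 'double extension';
  }
  if (extensions.some((e) => EXECUTABLE_EXTENSIONS.has('.' + e.toLowerCase()))) {
    return 'executable extension';
  }
  return null;
}
```

A fileFilter could call this first and pass new Error(reason) to the callback when the result is not null. It complements filename sanitization and type checks rather than replacing them.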

8. Sanitizing Filenames

User-provided filenames can contain special characters, path traversal sequences, or Unicode that causes problems:

function sanitizeFilename(filename) {
  // 1. Remove path components (prevent directory traversal)
  let safe = path.basename(filename);

  // 2. Remove null bytes
  safe = safe.replace(/\0/g, '');

  // 3. Replace path separators
  safe = safe.replace(/[/\\]/g, '');

  // 4. Remove special characters (keep only alphanumeric, dash, underscore, dot)
  safe = safe.replace(/[^a-zA-Z0-9._-]/g, '_');

  // 5. Prevent hidden files (starting with dot)
  if (safe.startsWith('.')) safe = '_' + safe;

  // 6. Limit length
  if (safe.length > 200) {
    const ext = path.extname(safe);
    safe = safe.substring(0, 200 - ext.length) + ext;
  }

  // 7. Ensure it is not empty
  if (!safe || safe === '') safe = 'unnamed_file';

  return safe;
}

// Usage in diskStorage filename function
filename: (req, file, cb) => {
  const sanitized = sanitizeFilename(file.originalname);
  cb(null, Date.now() + '-' + sanitized);
}

Or just ignore the original name entirely

// Safest approach: generate entirely new names
// (assumes crypto and path are required at the top of the file)
filename: (req, file, cb) => {
  const ext = path.extname(file.originalname).toLowerCase();
  const allowed = ['.jpg', '.jpeg', '.png', '.gif', '.webp', '.pdf'];

  if (!allowed.includes(ext)) {
    return cb(new Error('Invalid file extension'));
  }

  cb(null, crypto.randomBytes(16).toString('hex') + ext);
}

Best practice: Generate random filenames and never use user-provided names for storage. Store the original name in your database if you need to display it later.
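
As a rough illustration of that split between the stored name and the display name, the sketch below builds the metadata a database record might hold. The record shape and the names randomStoredName and buildFileRecord are assumptions for the example. The file fields (originalname, mimetype, size) are the ones Multer sets on req.file.

```javascript
const crypto = require('crypto');
const path = require('path');

// Random, unguessable on-disk name. `ext` must already be validated against an allowlist.
function randomStoredName(ext) {
  return crypto.randomBytes(16).toString('hex') + ext;
}

// Metadata for a hypothetical database record. `storedName` locates the file on disk;
// `originalName` is kept for display only and must be escaped when rendered.
function buildFileRecord(file, storedName) {
  return {
    storedName,
    originalName: path.basename(file.originalname),
    mimeType: file.mimetype,
    size: file.size,
    uploadedAt: new Date()
  };
}
```

With diskStorage, the filename callback would call randomStoredName(ext), and the route handler would later call buildFileRecord(req.file, req.file.filename) before saving the record.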


9. Storing Files Outside the Web Root

BAD — uploads inside the public directory:
┌─────────────────────┐
│  project/            │
│  ├── public/         │  ← express.static('public')
│  │   ├── index.html  │
│  │   └── uploads/    │  ← DANGEROUS: anyone can access any upload
│  │       └── evil.html │  ← Browser renders this! XSS risk!
│  └── server.js       │
└─────────────────────┘

GOOD — uploads outside the public directory:
┌─────────────────────┐
│  project/            │
│  ├── public/         │  ← express.static('public')
│  │   └── index.html  │
│  ├── uploads/        │  ← NOT served by express.static
│  │   └── files here  │
│  └── server.js       │
└─────────────────────┘

Serving uploads through a controlled route

const path = require('path');
const fs = require('fs');

// DO NOT serve uploads directory directly with express.static
// Instead, use a controlled route:

app.get('/api/files/:filename', (req, res) => {
  const filename = path.basename(req.params.filename); // prevent traversal
  const filepath = path.join(__dirname, 'uploads', filename);

  // Check file exists
  if (!fs.existsSync(filepath)) {
    return res.status(404).json({ error: 'File not found' });
  }

  // Force download (prevents browser from rendering HTML/SVG)
  res.setHeader('Content-Disposition', `attachment; filename="${filename}"`);

  // Set Content-Type explicitly and stop the browser from sniffing a different one
  res.setHeader('Content-Type', 'application/octet-stream');
  res.setHeader('X-Content-Type-Options', 'nosniff');

  // Stream the file; abort the response if the read fails midway
  const stream = fs.createReadStream(filepath);
  stream.on('error', () => res.destroy());
  stream.pipe(res);
});
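
path.basename covers the flat uploads directory used here. If uploads are ever organized into subfolders, one common alternative is to resolve the requested path and confirm it still sits inside the base directory. The helper below is a minimal sketch of that idea, and resolveInside is a hypothetical name.

```javascript
const path = require('path');

// Resolves `requestedName` inside `baseDir` and returns the absolute path,
// or null when the result is the directory itself or lies outside it.
function resolveInside(baseDir, requestedName) {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, requestedName);
  const relative = path.relative(base, target);

  const escapes =
    relative === '' ||                       // the base directory itself
    relative === '..' ||                     // the parent directory
    relative.startsWith('..' + path.sep) ||  // anything above the base
    path.isAbsolute(relative);               // different drive on Windows

  return escapes ? null : target;
}
```

A route would respond with 404 whenever resolveInside(uploadsDir, req.params.filename) returns null, and otherwise stream the returned path.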

Using Content-Disposition to prevent execution

// Force download — browser will NOT render the file
res.setHeader('Content-Disposition', 'attachment; filename="document.pdf"');

// Allow inline display (only for trusted file types like images)
res.setHeader('Content-Disposition', 'inline');
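
The inline/attachment decision can be combined with the nosniff and Content-Security-Policy headers from the checklist in a single helper. This is a sketch under assumptions: buildDownloadHeaders and INLINE_SAFE_TYPES are hypothetical names, the CSP value is one restrictive choice among several, and the filename is encoded in the RFC 5987 style so quotes and non-ASCII characters cannot break the header.

```javascript
// Hypothetical allowlist of types considered safe to render inline.
const INLINE_SAFE_TYPES = new Set(['image/jpeg', 'image/png', 'image/gif', 'image/webp']);

// Percent-encodes a filename for the filename*= parameter (RFC 5987 style).
function encodeHeaderFilename(name) {
  return encodeURIComponent(name).replace(
    /['()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Builds response headers for serving a stored upload.
// `detectedMime` should be the type determined server-side, not the client's claim.
function buildDownloadHeaders(detectedMime, displayName) {
  const inline = INLINE_SAFE_TYPES.has(detectedMime);
  const disposition = inline ? 'inline' : 'attachment';
  return {
    'Content-Type': inline ? detectedMime : 'application/octet-stream',
    'Content-Disposition': `${disposition}; filename*=UTF-8''${encodeHeaderFilename(displayName)}`,
    'X-Content-Type-Options': 'nosniff',
    'Content-Security-Policy': "default-src 'none'; sandbox"
  };
}
```

In Express, the result can be applied in one call with res.set(buildDownloadHeaders(mime, name)). SVG and HTML fall through to attachment because they are not in the inline allowlist.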

10. Virus Scanning Concepts

For production applications handling sensitive uploads (healthcare, finance, enterprise), consider virus scanning:

Using ClamAV (open source)

# Install ClamAV on Ubuntu
sudo apt-get install clamav clamav-daemon

# Update virus definitions
sudo freshclam

# Scan a file
clamscan /path/to/uploaded/file.jpg

Node.js integration with clamscan

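npm install clamscan

// The route below needs fs for cleanup, and Multer must use diskStorage so req.file.path is set
const fs = require('fs');
const NodeClam = require('clamscan');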

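let scannerPromise;

// Connect to clamd once and reuse the scanner; reset on failure so a later request can retry
function initScanner() {
  if (!scannerPromise) {
    scannerPromise = new NodeClam().init({
      clamdscan: {
        socket: '/var/run/clamav/clamd.ctl',
        host: '127.0.0.1',
        port: 3310
      }
    }).catch((err) => {
      scannerPromise = undefined;
      throw err;
    });
  }
  return scannerPromise;
}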

app.post('/api/upload', upload.single('file'), async (req, res) => {
  try {
    const scanner = await initScanner();

    // Scan the uploaded file
    const { isInfected, viruses } = await scanner.isInfected(req.file.path);

    if (isInfected) {
      // Delete the infected file immediately
      fs.unlinkSync(req.file.path);
      return res.status(400).json({
        error: 'File is infected',
        viruses: viruses
      });
    }

    res.json({ message: 'File is clean and uploaded successfully' });
  } catch (error) {
    res.status(500).json({ error: 'Virus scan failed' });
  }
});

Note: ClamAV adds latency (100ms-2s per scan). For high-throughput systems, scan files asynchronously in a background queue.
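
A background queue can be sketched with a small in-memory FIFO that scans one file at a time. This is only an illustration. ScanQueue is a hypothetical class, scanFile stands in for whatever scanner call you use, and a production system would more likely use a persistent job queue so pending scans survive a restart.

```javascript
// Minimal in-memory FIFO queue that scans one file at a time.
// `scanFile` is an assumed async function: (path) => { isInfected, viruses }.
// `onResult` is called with (path, result) once each scan finishes.
class ScanQueue {
  constructor(scanFile, onResult) {
    this.scanFile = scanFile;
    this.onResult = onResult;
    this.pending = [];
    this.running = false;
    this.idle = Promise.resolve();
  }

  // Called from the upload route. Returns a promise that settles when the queue drains.
  enqueue(filePath) {
    this.pending.push(filePath);
    if (!this.running) {
      this.idle = this.drain();
    }
    return this.idle;
  }

  async drain() {
    this.running = true;
    try {
      while (this.pending.length > 0) {
        const filePath = this.pending.shift();
        let result;
        try {
          result = await this.scanFile(filePath);
        } catch (err) {
          // A failed scan is reported as "unknown" (null), never as clean.
          result = { isInfected: null, error: err };
        }
        await this.onResult(filePath, result);
      }
    } finally {
      this.running = false;
    }
  }
}
```

The upload route would respond right after calling enqueue(req.file.path) and mark the file as pending. The onResult callback then deletes infected files or marks clean ones as available.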


11. Best Practices Checklist

┌─────────────────────────────────────────────────────────────────┐
│              FILE UPLOAD SECURITY CHECKLIST                      │
│                                                                 │
│  Validation                                                     │
│  [  ] Set fileFilter to whitelist allowed MIME types             │
│  [  ] Check file extension AND MIME type (both must match)       │
│  [  ] Verify magic bytes for high-security uploads               │
│  [  ] Block dangerous extensions (.exe, .php, .sh, .html)        │
│  [  ] Reject double extensions (photo.jpg.php)                   │
│                                                                 │
│  Size & Limits                                                   │
│  [  ] Set fileSize limit in Multer (e.g., 5 MB)                 │
│  [  ] Set files limit (max files per request)                    │
│  [  ] Match Nginx/reverse proxy body size limit                  │
│  [  ] Rate-limit upload endpoints                                │
│                                                                 │
│  Storage                                                         │
│  [  ] Store uploads OUTSIDE the web root                         │
│  [  ] Generate random filenames (never use original name)        │
│  [  ] Sanitize any user-provided strings                         │
│  [  ] Set restrictive file permissions (0644 or 0640)            │
│                                                                 │
│  Serving                                                         │
│  [  ] Serve files through a controlled route, not express.static │
│  [  ] Set Content-Disposition: attachment for non-image files    │
│  [  ] Set X-Content-Type-Options: nosniff header                 │
│  [  ] Set Content-Security-Policy on upload-serving routes       │
│                                                                 │
│  Infrastructure                                                  │
│  [  ] Consider virus scanning for sensitive applications         │
│  [  ] Monitor disk usage and set alerts                          │
│  [  ] Implement cleanup for orphaned/expired uploads             │
│  [  ] Log all upload attempts (filename, size, user, IP)         │
│  [  ] Use cloud storage for production (S3, Cloudinary)          │
└─────────────────────────────────────────────────────────────────┘
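
The "cleanup for orphaned/expired uploads" item can be approached with a small age-based sweep, for example run from a scheduled script. The sketch below handles only the expiry half; finding orphaned files would also need a lookup against your database. removeExpiredUploads is a hypothetical name, and synchronous fs calls are used because this is meant for a standalone job, not a request handler.

```javascript
const fs = require('fs');
const path = require('path');

// Deletes regular files in `dir` whose modification time is older than `maxAgeMs`.
// Returns the names of the files that were removed.
function removeExpiredUploads(dir, maxAgeMs, now = Date.now()) {
  const removed = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (!entry.isFile()) continue; // skip subdirectories and symlinks
    const fullPath = path.join(dir, entry.name);
    const { mtimeMs } = fs.statSync(fullPath);
    if (now - mtimeMs > maxAgeMs) {
      fs.unlinkSync(fullPath);
      removed.push(entry.name);
    }
  }
  return removed;
}
```

For example, removeExpiredUploads('/path/to/tmp-uploads', 24 * 60 * 60 * 1000) would remove files untouched for more than a day; the path is a placeholder.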

12. Key Takeaways

  1. Never trust the client — filename, MIME type, and extension can all be spoofed.
  2. Validate at multiple levels — extension check + MIME check + magic bytes is the gold standard.
  3. Use fileFilter to reject files before they are stored (saves disk/memory).
  4. Set limits.fileSize always — an uncapped upload is a denial-of-service vector.
  5. Generate random filenames — never save files with user-provided names.
  6. Store uploads outside the web root — prevent direct access and browser execution.
  7. Block dangerous file types — executables, scripts, HTML, and SVG need special handling.
  8. Use Content-Disposition: attachment when serving files to prevent browser rendering.
  9. Consider virus scanning for healthcare, finance, and enterprise applications.
  10. Use the file-type package to detect the actual content type of a file from its magic bytes.

Explain-It Challenge

Can you explain to a friend: "What are the three layers of file type validation and why do you need all three?" If you can walk through extension, MIME, and magic bytes with examples of how each can be bypassed alone, you have mastered this topic.

