Episode 3 — NodeJS MongoDB Backend Architecture / 3.12 — Logging and Monitoring

3.12 — Interview Questions: Logging and Monitoring

These 10 questions cover the most commonly asked logging and monitoring topics in Node.js technical interviews — from fundamentals to production architecture.




Quick-Fire Table

| # | Question | Level | Key Topic |
|---|----------|-------|-----------|
| 1 | Why is console.log() bad for production? | Beginner | Logging fundamentals |
| 2 | What are log levels and how do they work? | Beginner | Log levels |
| 3 | What is structured logging? | Beginner | Log format |
| 4 | Compare Winston, Pino, and Morgan | Intermediate | Libraries |
| 5 | How do you set up log rotation? | Intermediate | Operations |
| 6 | What is a request ID and why is it important? | Intermediate | Correlation |
| 7 | How do you handle unhandled promise rejections? | Intermediate | Error handling |
| 8 | What is the ELK stack? | Intermediate | Architecture |
| 9 | Design a logging strategy for a production API | Advanced | System design |
| 10 | How do you debug a production error using only logs? | Advanced | Debugging |

Beginner Level

Q1. Why is console.log() insufficient for production applications?

Model Answer:

console.log() has several critical limitations that make it unsuitable for production:

  1. No log levels — cannot distinguish errors from info messages, cannot filter by severity
  2. No timestamps — impossible to know when events occurred
  3. No structure — free-form text cannot be parsed or searched programmatically
  4. No persistence — output goes to stdout/stderr and is lost when the process restarts
  5. No context — does not include request ID, user ID, or service metadata
  6. No configurability — cannot change verbosity without modifying code
  7. No rotation — cannot manage output size or archive old logs

Instead, use a logging library like Winston or Pino that provides leveled, structured, persistent logs with configurable transports.
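To make the contrast concrete, here is a minimal, purely illustrative sketch (not a real library, and far simpler than Winston or Pino) of the baseline a proper logger provides and console.log does not: timestamps, severity, machine-parseable structure, and a pluggable output target.

```javascript
// Illustrative sketch only: the minimum a logging library adds over
// console.log. Names here are hypothetical, not any library's API.
function formatLog(level, message, meta = {}) {
  return JSON.stringify({
    level,                               // severity, filterable
    message,
    timestamp: new Date().toISOString(), // when it happened
    ...meta,                             // request/user context
  });
}

// The output target ("transport") is pluggable: stdout here, but a
// file or HTTP endpoint in production, unlike console.log's fixed
// destination.
function log(transport, level, message, meta) {
  transport.write(formatLog(level, message, meta) + "\n");
}

log(process.stdout, "error", "Payment failed", { userId: "user-456" });
```

Real libraries add level filtering, multiple simultaneous transports, and rotation on top of this core idea.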


Q2. What are log levels and how do they work?

Model Answer:

Log levels categorize messages by severity, from most critical to least:

| Level | Severity | Use |
|-------|----------|-----|
| error | Highest | Something broke and needs immediate attention |
| warn | High | Potential problem, not critical yet |
| info | Medium | Normal operations worth recording |
| http | Lower | HTTP request/response details |
| debug | Low | Detailed debugging information |
| silly | Lowest | Extremely verbose tracing |

Setting a log level means "log this level and everything more severe." For example, setting level: "warn" logs error and warn messages, but suppresses info, debug, and below.

Best practice:

  • Production: warn or info — minimal noise, only important events
  • Development: debug — full detail for debugging
  • Testing: error or silent — reduce output clutter
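The threshold rule can be sketched as a single comparison over Winston's npm level numbering (where a lower number means more severe; `verbose` sits between `http` and `debug` but is omitted from the table above):

```javascript
// Winston's npm levels: lower number = more severe.
const npmLevels = { error: 0, warn: 1, info: 2, http: 3, verbose: 4, debug: 5, silly: 6 };

// "Log this level and everything more severe"
function shouldLog(configuredLevel, messageLevel) {
  return npmLevels[messageLevel] <= npmLevels[configuredLevel];
}

// With level set to "warn":
shouldLog("warn", "error"); // true: more severe, logged
shouldLog("warn", "warn");  // true: same level, logged
shouldLog("warn", "info");  // false: less severe, suppressed
```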

Q3. What is structured logging and why is it important?

Model Answer:

Structured logging means writing log entries in a consistent, machine-parseable format — typically JSON:

{
  "level": "error",
  "message": "Payment failed",
  "timestamp": "2025-06-15T14:23:45.123Z",
  "userId": "user-456",
  "orderId": "order-789",
  "error": "Stripe timeout"
}

Compared to unstructured logging: "ERROR: Payment failed for user user-456 on order order-789 — Stripe timeout"

Structured logging is important because:

  1. Searchable — query by any field (find all errors for user-456)
  2. Parseable — log aggregation tools (ELK, Datadog, CloudWatch) can index and analyze fields
  3. Consistent — enforces a standard format across the team and services
  4. Alertable — set up alerts on specific field values (error rate, affected users)
  5. Analyzable — compute metrics like errors per hour, average response time
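The "searchable" point can be shown in a few lines: with JSON entries, "find all errors for user-456" is a field comparison, where free-form text would need fragile regexes. The sample entries below are invented for illustration.

```javascript
// Three structured log lines, as a logger would write them (one JSON
// object per line). Data is made up for the example.
const lines = [
  '{"level":"error","message":"Payment failed","userId":"user-456"}',
  '{"level":"info","message":"Login","userId":"user-123"}',
  '{"level":"error","message":"Timeout","userId":"user-456"}',
];

// "Find all errors for user-456" becomes a field query, which is
// exactly what ELK/Datadog/CloudWatch do at scale over indexed fields.
const matches = lines
  .map((line) => JSON.parse(line))
  .filter((e) => e.level === "error" && e.userId === "user-456");

console.log(matches.length); // 2
```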

Intermediate Level

Q4. Compare Winston, Pino, and Morgan. When would you use each?

Model Answer:

| Feature | Winston | Pino | Morgan |
|---------|---------|------|--------|
| Type | General-purpose logger | General-purpose logger | HTTP request logger |
| Speed | Moderate | Fastest (~5x Winston) | N/A (middleware) |
| Formats | JSON, simple, custom | JSON (pino-pretty for dev) | Predefined strings |
| Transports | Built-in (file, console, HTTP) | Separate process | Stream |
| Best for | Most applications | High-throughput APIs | HTTP access logs |

When to use each:

  • Winston: General-purpose logging for most Node.js applications. Largest ecosystem, flexible formats and transports.
  • Pino: When performance matters (APIs handling >10K req/s). Minimal overhead due to asynchronous design.
  • Morgan: Specifically for HTTP request logging in Express. Always use alongside (not instead of) Winston or Pino.

Common combination: Winston + Morgan — Morgan for HTTP access logs piped through Winston's transports.


Q5. How do you set up log rotation and why is it necessary?

Model Answer:

Log rotation creates new log files periodically and manages old ones to prevent disk space exhaustion. Without rotation, a single log file grows indefinitely and can fill the entire disk.

Using winston-daily-rotate-file:

const winston = require("winston");
require("winston-daily-rotate-file");

const transport = new winston.transports.DailyRotateFile({
  filename: "logs/app-%DATE%.log",
  datePattern: "YYYY-MM-DD",
  maxSize: "20m",        // Max 20MB per file
  maxFiles: "14d",       // Keep for 14 days
  zippedArchive: true,   // Compress old files
});

This creates files like app-2025-06-15.log, compresses them to .gz after the day ends, and deletes files older than 14 days.

Rotation strategies:

  • By date: New file each day (most common)
  • By size: New file when current exceeds a limit
  • Combined: Daily rotation with per-file size limits and age-based deletion

Q6. What is a request ID and why is it important?

Model Answer:

A request ID is a unique identifier (typically a UUID) assigned to each incoming HTTP request. It is included in every log entry generated during that request's lifecycle.

// Middleware to assign request ID
const { v4: uuidv4 } = require("uuid");
app.use((req, res, next) => {
  req.id = req.headers["x-request-id"] || uuidv4();
  res.setHeader("x-request-id", req.id);
  next();
});

Why it matters:

  1. Traceability: In a system handling thousands of concurrent requests, you can find all logs for a single request by searching for its ID
  2. Debugging: When a user reports an issue, the request ID (from the response header or error page) lets you trace the exact execution path
  3. Distributed tracing: In microservices, the request ID propagates across services, enabling end-to-end tracing
  4. Correlation: Ties together HTTP logs, application logs, database queries, and external API calls for one request
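The usual way to get the request ID into every entry is the child-logger pattern: both Winston and Pino expose a `logger.child()` that stamps fixed bindings onto every subsequent entry. Below is a hand-rolled stand-in for illustration, not either library's implementation.

```javascript
// Minimal sketch of the child-logger pattern: bindings set once per
// request appear on every entry without repeating them at call sites.
function createLogger(bindings = {}) {
  return {
    child(extra) {
      // A child inherits the parent's bindings and adds its own
      return createLogger({ ...bindings, ...extra });
    },
    entry(level, message) {
      return { level, message, ...bindings };
    },
  };
}

const base = createLogger({ service: "orders-api" });

// In the middleware above, after assigning req.id, you would attach:
//   req.log = base.child({ requestId: req.id });
const reqLog = base.child({ requestId: "req-abc-123" });

reqLog.entry("info", "Charging card");
// → { level: "info", message: "Charging card", service: "orders-api", requestId: "req-abc-123" }
```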

Q7. How do you handle unhandled promise rejections in Node.js?

Model Answer:

Unhandled promise rejections occur when a Promise is rejected but no .catch() or try/catch handles the error. Since Node.js 15, an unhandled rejection crashes the process by default (earlier versions only printed a warning).

process.on("unhandledRejection", (reason, promise) => {
  logger.error("Unhandled Promise Rejection", {
    reason: reason instanceof Error ? reason.message : String(reason),
    stack: reason instanceof Error ? reason.stack : undefined,
  });
  // Optionally exit (recommended in production)
  // process.exit(1);
});

process.on("uncaughtException", (err) => {
  logger.error("Uncaught Exception", {
    error: err.message,
    stack: err.stack,
  });
  // MUST exit — process state is unreliable
  process.exit(1);
});

Key distinction:

  • Unhandled rejection: A rejected Promise that nothing caught. The process may continue, but it is in an unknown state.
  • Uncaught exception: A thrown error with no handler. The process MUST exit because its state is unreliable.

Both should be logged with full context and ideally sent to an external monitoring service (Sentry) before exit.


Q8. What is the ELK stack?

Model Answer:

ELK stands for Elasticsearch, Logstash, and Kibana — a suite for centralized log management:

| Component | Role |
|-----------|------|
| Elasticsearch | Stores and indexes log data for fast full-text search |
| Logstash | Ingests logs, transforms/parses them, and forwards to Elasticsearch |
| Kibana | Web UI for searching, filtering, and visualizing logs |
| Filebeat | Lightweight agent that ships log files to Logstash/Elasticsearch |

Workflow:

  1. Application writes structured JSON logs to files (via Winston/Pino)
  2. Filebeat monitors log files and ships entries to Logstash/Elasticsearch
  3. Elasticsearch indexes the logs for fast searching
  4. Kibana provides dashboards, search, and alerting

Cloud alternatives: AWS CloudWatch Logs, GCP Cloud Logging, Datadog, Better Stack (Logtail).


Advanced Level

Q9. Design a complete logging strategy for a production API handling 50,000 requests per minute.

Model Answer:

Library choice: Pino (for performance at this scale) + Morgan (HTTP access logs piped through Pino).

Log levels:

  • Production: info (captures operations without debug noise)
  • Error alerting threshold: immediate notification for error level

Configuration:

  1. Structured JSON output for all logs
  2. Daily log rotation with 14-day retention, compressed archives
  3. Request ID middleware for correlation
  4. Child loggers per request with automatic context (requestId, userId)

Transports:

  • Local: Daily rotating files (combined + error-only)
  • External: Ship to centralized logging (ELK/Datadog/CloudWatch) via Filebeat or direct HTTP transport

Error handling:

  • Global Express error middleware with full context logging
  • unhandledRejection and uncaughtException handlers
  • Sentry integration for real-time error alerting and tracking

What to log:

  • All HTTP requests (method, URL, status, response time)
  • Authentication events (login, logout, failed attempts)
  • Business events (orders, payments, critical operations)
  • Errors with full stack traces and request context
  • External API calls (duration, status, errors)

What NOT to log:

  • Passwords, tokens, credit card numbers, PII
  • Health check endpoints
  • Static asset requests
  • Debug-level details (too noisy at this scale)
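Keeping secrets out of the "what NOT to log" list is usually enforced in code rather than by discipline alone. Pino has a built-in `redact` option for this; the sketch below shows the idea by hand, with an invented field list.

```javascript
// Hand-rolled redaction sketch (in practice, prefer Pino's `redact`
// option or an equivalent Winston format). Field names are examples.
const SENSITIVE = new Set(["password", "token", "cardNumber", "ssn"]);

function redact(obj) {
  const out = {};
  for (const [key, value] of Object.entries(obj)) {
    // Replace sensitive values before they ever reach a transport
    out[key] = SENSITIVE.has(key) ? "[REDACTED]" : value;
  }
  return out;
}

redact({ email: "a@b.com", password: "hunter2" });
// → { email: "a@b.com", password: "[REDACTED]" }
```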

Alerting:

  • Error rate exceeds 5% of requests
  • Response time p99 exceeds 2000ms
  • Unhandled rejection or uncaught exception occurs
  • External service failure rate exceeds threshold

Q10. A user reports that their order failed 3 hours ago. Walk through how you would debug this using only logs.

Model Answer:

Step 1: Get identifiers

  • Ask the user for their user ID, email, or order ID
  • If available, get the request ID from the error response or browser network tab

Step 2: Search error logs

Search: level="error" AND userId="user-456" AND time > 3h ago
  • Look for the specific error message and stack trace
  • Identify the exact timestamp and request ID

Step 3: Trace the full request

Search: requestId="req-abc-123"
  • Follow the request from entry to failure
  • See what steps succeeded before the error

Step 4: Identify root cause

  • Read the error message and stack trace
  • Check if it is an operational error (expected) or programming error (bug)
  • Look for patterns (did this error happen to other users too?)

Step 5: Check context

  • Was the database available? (check for connection errors around that time)
  • Was an external service down? (check API call logs)
  • Was there a deployment around that time? (check deployment logs)

Step 6: Verify and fix

  • Reproduce in staging if possible
  • Deploy fix with additional logging if root cause is unclear
  • Set up an alert to detect this error class in the future

This process is only possible with structured logging, request ID correlation, and proper error context in the logs.
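Steps 2 and 3 can be sketched over an in-memory array of structured entries; in practice this is a Kibana or CloudWatch query over indexed fields, and the data below is invented for illustration.

```javascript
// Invented structured log entries for one failed order and one
// unrelated request.
const logs = [
  { level: "info",  requestId: "req-abc", userId: "user-456", message: "Order received" },
  { level: "info",  requestId: "req-abc", userId: "user-456", message: "Stock reserved" },
  { level: "error", requestId: "req-abc", userId: "user-456", message: "Stripe timeout" },
  { level: "info",  requestId: "req-xyz", userId: "user-123", message: "Login" },
];

// Step 2: search error logs for the user to recover the request ID
const errors = logs.filter((e) => e.level === "error" && e.userId === "user-456");
const requestId = errors[0].requestId; // "req-abc"

// Step 3: pull every entry for that request to see the full path
const trace = logs.filter((e) => e.requestId === requestId);
console.log(trace.map((e) => e.message));
// → [ "Order received", "Stock reserved", "Stripe timeout" ]
```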