Episode 3 — NodeJS MongoDB Backend Architecture / 3.12 — Logging and Monitoring

3.12 — Interview Questions: Logging and Monitoring

These 10 questions cover the most commonly asked logging and monitoring topics in Node.js technical interviews — from fundamentals to production architecture.




Quick-Fire Table

| # | Question | Level | Key Topic |
|---|----------|-------|-----------|
| 1 | Why is console.log() bad for production? | Beginner | Logging fundamentals |
| 2 | What are log levels and how do they work? | Beginner | Log levels |
| 3 | What is structured logging? | Beginner | Log format |
| 4 | Compare Winston, Pino, and Morgan | Intermediate | Libraries |
| 5 | How do you set up log rotation? | Intermediate | Operations |
| 6 | What is a request ID and why is it important? | Intermediate | Correlation |
| 7 | How do you handle unhandled promise rejections? | Intermediate | Error handling |
| 8 | What is the ELK stack? | Intermediate | Architecture |
| 9 | Design a logging strategy for a production API | Advanced | System design |
| 10 | How do you debug a production error using only logs? | Advanced | Debugging |

Beginner Level

Q1. Why is console.log() insufficient for production applications?

Model Answer:

console.log() has several critical limitations that make it unsuitable for production:

  1. No log levels — cannot distinguish errors from info messages, cannot filter by severity
  2. No timestamps — impossible to know when events occurred
  3. No structure — free-form text cannot be parsed or searched programmatically
  4. No persistence — output goes to stdout/stderr and is lost when the process restarts
  5. No context — does not include request ID, user ID, or service metadata
  6. No configurability — cannot change verbosity without modifying code
  7. No rotation — cannot manage output size or archive old logs

Instead, use a logging library like Winston or Pino that provides leveled, structured, persistent logs with configurable transports.
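To make the contrast concrete, here is a minimal, purely illustrative sketch (not a real library, and far simpler than Winston or Pino) of the baseline a proper logger provides and console.log does not: timestamps, severity, machine-parseable structure, and a pluggable output target.

```javascript
// Illustrative sketch only: the minimum a logging library adds over
// console.log. Names here are hypothetical, not any library's API.
function formatLog(level, message, meta = {}) {
  return JSON.stringify({
    level,                               // severity, filterable
    message,
    timestamp: new Date().toISOString(), // when it happened
    ...meta,                             // request/user context
  });
}

// The output target ("transport") is pluggable: stdout here, but a
// file or HTTP endpoint in production, unlike console.log's fixed
// destination.
function log(transport, level, message, meta) {
  transport.write(formatLog(level, message, meta) + "\n");
}

log(process.stdout, "error", "Payment failed", { userId: "user-456" });
```

Real libraries add level filtering, multiple simultaneous transports, and rotation on top of this core idea.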


Q2. What are log levels and how do they work?

Model Answer:

Log levels categorize messages by severity, from most critical to least:

| Level | Severity | Use |
|-------|----------|-----|
| error | Highest | Something broke and needs immediate attention |
| warn | High | Potential problem, not critical yet |
| info | Medium | Normal operations worth recording |
| http | Lower | HTTP request/response details |
| debug | Low | Detailed debugging information |
| silly | Lowest | Extremely verbose tracing |

Setting a log level means "log this level and everything more severe." For example, setting level: "warn" logs error and warn messages, but suppresses info, debug, and below.

Best practice:

  • Production: warn or info — minimal noise, only important events
  • Development: debug — full detail for debugging
  • Testing: error or silent — reduce output clutter
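The threshold rule can be sketched as a single comparison over Winston's npm level numbering (where a lower number means more severe; `verbose` sits between `http` and `debug` but is omitted from the table above):

```javascript
// Winston's npm levels: lower number = more severe.
const npmLevels = { error: 0, warn: 1, info: 2, http: 3, verbose: 4, debug: 5, silly: 6 };

// "Log this level and everything more severe"
function shouldLog(configuredLevel, messageLevel) {
  return npmLevels[messageLevel] <= npmLevels[configuredLevel];
}

// With level set to "warn":
shouldLog("warn", "error"); // true: more severe, logged
shouldLog("warn", "warn");  // true: same level, logged
shouldLog("warn", "info");  // false: less severe, suppressed
```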

Q3. What is structured logging and why is it important?

Model Answer:

Structured logging means writing log entries in a consistent, machine-parseable format — typically JSON:

{
  "level": "error",
  "message": "Payment failed",
  "timestamp": "2025-06-15T14:23:45.123Z",
  "userId": "user-456",
  "orderId": "order-789",
  "error": "Stripe timeout"
}

Compared to unstructured logging: "ERROR: Payment failed for user user-456 on order order-789 — Stripe timeout"

Structured logging is important because:

  1. Searchable — query by any field (find all errors for user-456)
  2. Parseable — log aggregation tools (ELK, Datadog, CloudWatch) can index and analyze fields
  3. Consistent — enforces a standard format across the team and services
  4. Alertable — set up alerts on specific field values (error rate, affected users)
  5. Analyzable — compute metrics like errors per hour, average response time
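The "searchable" point can be shown in a few lines: with JSON entries, "find all errors for user-456" is a field comparison, where free-form text would need fragile regexes. The sample entries below are invented for illustration.

```javascript
// Three structured log lines, as a logger would write them (one JSON
// object per line). Data is made up for the example.
const lines = [
  '{"level":"error","message":"Payment failed","userId":"user-456"}',
  '{"level":"info","message":"Login","userId":"user-123"}',
  '{"level":"error","message":"Timeout","userId":"user-456"}',
];

// "Find all errors for user-456" becomes a field query, which is
// exactly what ELK/Datadog/CloudWatch do at scale over indexed fields.
const matches = lines
  .map((line) => JSON.parse(line))
  .filter((e) => e.level === "error" && e.userId === "user-456");

console.log(matches.length); // 2
```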

Intermediate Level

Q4. Compare Winston, Pino, and Morgan. When would you use each?

Model Answer:

| Feature | Winston | Pino | Morgan |
|---------|---------|------|--------|
| Type | General-purpose logger | General-purpose logger | HTTP request logger |
| Speed | Moderate | Fastest (~5x Winston) | N/A (middleware) |
| Formats | JSON, simple, custom | JSON (pino-pretty for dev) | Predefined strings |
| Transports | Built-in (file, console, HTTP) | Separate process | Stream |
| Best for | Most applications | High-throughput APIs | HTTP access logs |

When to use each:

  • Winston: General-purpose logging for most Node.js applications. Largest ecosystem, flexible formats and transports.
  • Pino: When performance matters (APIs handling >10K req/s). Minimal overhead due to asynchronous design.
  • Morgan: Specifically for HTTP request logging in Express. Always use alongside (not instead of) Winston or Pino.

Common combination: Winston + Morgan — Morgan for HTTP access logs piped through Winston's transports.


Q5. How do you set up log rotation and why is it necessary?

Model Answer:

Log rotation creates new log files periodically and manages old ones to prevent disk space exhaustion. Without rotation, a single log file grows indefinitely and can fill the entire disk.

Using winston-daily-rotate-file:

const winston = require("winston");
require("winston-daily-rotate-file");

const transport = new winston.transports.DailyRotateFile({
  filename: "logs/app-%DATE%.log",
  datePattern: "YYYY-MM-DD",
  maxSize: "20m",        // Max 20MB per file
  maxFiles: "14d",       // Keep for 14 days
  zippedArchive: true,   // Compress old files
});

This creates files like app-2025-06-15.log, compresses them to .gz after the day ends, and deletes files older than 14 days.

Rotation strategies:

  • By date: New file each day (most common)
  • By size: New file when current exceeds a limit
  • Combined: Daily rotation with per-file size limits and age-based deletion

Q6. What is a request ID and why is it important?

Model Answer:

A request ID is a unique identifier (typically a UUID) assigned to each incoming HTTP request. It is included in every log entry generated during that request's lifecycle.

// Middleware to assign request ID
const { v4: uuidv4 } = require("uuid");
app.use((req, res, next) => {
  req.id = req.headers["x-request-id"] || uuidv4();
  res.setHeader("x-request-id", req.id);
  next();
});

Why it matters:

  1. Traceability: In a system handling thousands of concurrent requests, you can find all logs for a single request by searching for its ID
  2. Debugging: When a user reports an issue, the request ID (from the response header or error page) lets you trace the exact execution path
  3. Distributed tracing: In microservices, the request ID propagates across services, enabling end-to-end tracing
  4. Correlation: Ties together HTTP logs, application logs, database queries, and external API calls for one request
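The usual way to get the request ID into every entry is the child-logger pattern: both Winston and Pino expose a `logger.child()` that stamps fixed bindings onto every subsequent entry. Below is a hand-rolled stand-in for illustration, not either library's implementation.

```javascript
// Minimal sketch of the child-logger pattern: bindings set once per
// request appear on every entry without repeating them at call sites.
function createLogger(bindings = {}) {
  return {
    child(extra) {
      // A child inherits the parent's bindings and adds its own
      return createLogger({ ...bindings, ...extra });
    },
    entry(level, message) {
      return { level, message, ...bindings };
    },
  };
}

const base = createLogger({ service: "orders-api" });

// In the middleware above, after assigning req.id, you would attach:
//   req.log = base.child({ requestId: req.id });
const reqLog = base.child({ requestId: "req-abc-123" });

reqLog.entry("info", "Charging card");
// → { level: "info", message: "Charging card", service: "orders-api", requestId: "req-abc-123" }
```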

Q7. How do you handle unhandled promise rejections in Node.js?

Model Answer:

Unhandled promise rejections occur when a Promise is rejected but no .catch() or try/catch handles the error. Since Node.js 15, an unhandled rejection crashes the process by default (earlier versions only printed a warning).

process.on("unhandledRejection", (reason, promise) => {
  logger.error("Unhandled Promise Rejection", {
    reason: reason instanceof Error ? reason.message : String(reason),
    stack: reason instanceof Error ? reason.stack : undefined,
  });
  // Optionally exit (recommended in production)
  // process.exit(1);
});

process.on("uncaughtException", (err) => {
  logger.error("Uncaught Exception", {
    error: err.message,
    stack: err.stack,
  });
  // MUST exit — process state is unreliable
  process.exit(1);
});

Key distinction:

  • Unhandled rejection: A rejected Promise that nothing caught. The process may continue, but it is in an unknown state.
  • Uncaught exception: A thrown error with no handler. The process MUST exit because its state is unreliable.

Both should be logged with full context and ideally sent to an external monitoring service (Sentry) before exit.


Q8. What is the ELK stack?

Model Answer:

ELK stands for Elasticsearch, Logstash, and Kibana — a suite for centralized log management:

| Component | Role |
|-----------|------|
| Elasticsearch | Stores and indexes log data for fast full-text search |
| Logstash | Ingests logs, transforms/parses them, and forwards to Elasticsearch |
| Kibana | Web UI for searching, filtering, and visualizing logs |
| Filebeat | Lightweight agent that ships log files to Logstash/Elasticsearch |

Workflow:

  1. Application writes structured JSON logs to files (via Winston/Pino)
  2. Filebeat monitors log files and ships entries to Logstash/Elasticsearch
  3. Elasticsearch indexes the logs for fast searching
  4. Kibana provides dashboards, search, and alerting

Cloud alternatives: AWS CloudWatch Logs, GCP Cloud Logging, Datadog, Better Stack (Logtail).


Advanced Level

Q9. Design a complete logging strategy for a production API handling 50,000 requests per minute.

Model Answer:

Library choice: Pino (for performance at this scale) + Morgan (HTTP access logs piped through Pino).

Log levels:

  • Production: info (captures operations without debug noise)
  • Error alerting threshold: immediate notification for error level

Configuration:

  1. Structured JSON output for all logs
  2. Daily log rotation with 14-day retention, compressed archives
  3. Request ID middleware for correlation
  4. Child loggers per request with automatic context (requestId, userId)

Transports:

  • Local: Daily rotating files (combined + error-only)
  • External: Ship to centralized logging (ELK/Datadog/CloudWatch) via Filebeat or direct HTTP transport

Error handling:

  • Global Express error middleware with full context logging
  • unhandledRejection and uncaughtException handlers
  • Sentry integration for real-time error alerting and tracking

What to log:

  • All HTTP requests (method, URL, status, response time)
  • Authentication events (login, logout, failed attempts)
  • Business events (orders, payments, critical operations)
  • Errors with full stack traces and request context
  • External API calls (duration, status, errors)

What NOT to log:

  • Passwords, tokens, credit card numbers, PII
  • Health check endpoints
  • Static asset requests
  • Debug-level details (too noisy at this scale)
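Keeping secrets out of the "what NOT to log" list is usually enforced in code rather than by discipline alone. Pino has a built-in `redact` option for this; the sketch below shows the idea by hand, with an invented field list.

```javascript
// Hand-rolled redaction sketch (in practice, prefer Pino's `redact`
// option or an equivalent Winston format). Field names are examples.
const SENSITIVE = new Set(["password", "token", "cardNumber", "ssn"]);

function redact(obj) {
  const out = {};
  for (const [key, value] of Object.entries(obj)) {
    // Replace sensitive values before they ever reach a transport
    out[key] = SENSITIVE.has(key) ? "[REDACTED]" : value;
  }
  return out;
}

redact({ email: "a@b.com", password: "hunter2" });
// → { email: "a@b.com", password: "[REDACTED]" }
```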

Alerting:

  • Error rate exceeds 5% of requests
  • Response time p99 exceeds 2000ms
  • Unhandled rejection or uncaught exception occurs
  • External service failure rate exceeds threshold

Q10. A user reports that their order failed 3 hours ago. Walk through how you would debug this using only logs.

Model Answer:

Step 1: Get identifiers

  • Ask the user for their user ID, email, or order ID
  • If available, get the request ID from the error response or browser network tab

Step 2: Search error logs

Search: level="error" AND userId="user-456" AND time > 3h ago
  • Look for the specific error message and stack trace
  • Identify the exact timestamp and request ID

Step 3: Trace the full request

Search: requestId="req-abc-123"
  • Follow the request from entry to failure
  • See what steps succeeded before the error

Step 4: Identify root cause

  • Read the error message and stack trace
  • Check if it is an operational error (expected) or programming error (bug)
  • Look for patterns (did this error happen to other users too?)

Step 5: Check context

  • Was the database available? (check for connection errors around that time)
  • Was an external service down? (check API call logs)
  • Was there a deployment around that time? (check deployment logs)

Step 6: Verify and fix

  • Reproduce in staging if possible
  • Deploy fix with additional logging if root cause is unclear
  • Set up an alert to detect this error class in the future

This process is only possible with structured logging, request ID correlation, and proper error context in the logs.
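Steps 2 and 3 can be sketched over an in-memory array of structured entries; in practice this is a Kibana or CloudWatch query over indexed fields, and the data below is invented for illustration.

```javascript
// Invented structured log entries for one failed order and one
// unrelated request.
const logs = [
  { level: "info",  requestId: "req-abc", userId: "user-456", message: "Order received" },
  { level: "info",  requestId: "req-abc", userId: "user-456", message: "Stock reserved" },
  { level: "error", requestId: "req-abc", userId: "user-456", message: "Stripe timeout" },
  { level: "info",  requestId: "req-xyz", userId: "user-123", message: "Login" },
];

// Step 2: search error logs for the user to recover the request ID
const errors = logs.filter((e) => e.level === "error" && e.userId === "user-456");
const requestId = errors[0].requestId; // "req-abc"

// Step 3: pull every entry for that request to see the full path
const trace = logs.filter((e) => e.requestId === requestId);
console.log(trace.map((e) => e.message));
// → [ "Order received", "Stock reserved", "Stripe timeout" ]
```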