Episode 6 — Scaling, Reliability, Microservices & Web3 / 6.8 — Production Hardening
6.8.a — Rate Limiting
In one sentence: Rate limiting controls how many requests a client can make in a time window — it is the single most important defense against API abuse, runaway clients, and brute-force attacks in production.
Navigation: ← 6.8 Overview · 6.8.b — CORS and Secure Headers →
1. Why Rate Limiting Is Essential
Without rate limiting, a single bad actor (or a buggy client) can:
- Exhaust server resources — CPU, memory, database connections
- Drive up cloud costs — every request costs money (Lambda, API Gateway, DB reads)
- Brute-force authentication — try millions of username/password combos
- Scrape your data — download your entire database through the API
- Degrade service for everyone — one client starving others of capacity
Rate limiting is not optional. Every production API must have it.
WITHOUT rate limiting:

Attacker → 10,000 req/s
      │
      ▼
┌──────────┐
│  Server  │ ← overwhelmed
│  (DOWN)  │
└──────────┘

WITH rate limiting:

Attacker → 10,000 req/s
      │
      ▼
┌──────────────┐
│ Rate Limiter │
│ 429 Too Many │───→ 9,900 rejected
└──────┬───────┘
       │ 100 allowed
       ▼
┌──────────┐
│  Server  │ ← healthy
│   (UP)   │
└──────────┘
2. Rate Limiting Algorithms
2.1 Fixed Window
The simplest algorithm. Divide time into fixed windows (e.g., 1-minute blocks) and count requests per window.
Fixed Window (limit: 5 requests per minute)
Timeline: |-------- Minute 1 --------|-------- Minute 2 --------|
Requests: X X X X X ✗ ✗ X X X
Count: 1 2 3 4 5 DENIED 1 2 3
Legend: X = allowed, ✗ = denied (429)
// Simple fixed-window counter (in-memory)
const windowCounts = new Map();
const WINDOW_SIZE_MS = 60 * 1000; // 1 minute
const MAX_REQUESTS = 100;
function fixedWindowLimiter(clientId) {
const now = Date.now();
const windowKey = `${clientId}:${Math.floor(now / WINDOW_SIZE_MS)}`;
const current = windowCounts.get(windowKey) || 0;
if (current >= MAX_REQUESTS) {
return { allowed: false, remaining: 0 };
}
windowCounts.set(windowKey, current + 1);
return { allowed: true, remaining: MAX_REQUESTS - current - 1 };
}
Pros: Simple, low memory (though stale window keys must be evicted periodically, or the map grows without bound). Cons: Burst problem — a client can make 100 requests at 0:59 and 100 more at 1:01, getting 200 through in about 2 seconds.
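The boundary burst is easy to demonstrate. Here is a hedged sketch that re-implements the counter above with an explicit `now` parameter (an addition for illustration, not part of the original code) so two bursts on either side of a window boundary can be simulated deterministically:

```javascript
// Fixed-window counter with an injectable timestamp for deterministic testing
const WINDOW_MS = 60 * 1000;
const LIMIT = 100;
const counts = new Map();

function fixedWindow(clientId, now) {
  const key = `${clientId}:${Math.floor(now / WINDOW_MS)}`;
  const current = counts.get(key) || 0;
  if (current >= LIMIT) return false;
  counts.set(key, current + 1);
  return true;
}

// 100 requests at t = 59s land in minute 1 — all allowed
let allowed = 0;
for (let i = 0; i < 100; i++) if (fixedWindow('c1', 59_000)) allowed++;

// 100 more at t = 61s land in minute 2 — all allowed again
for (let i = 0; i < 100; i++) if (fixedWindow('c1', 61_000)) allowed++;

console.log(allowed); // 200 — twice the limit, inside a 2-second span
```

The limiter never misbehaves within a window; the burst slips through only because the counter resets at the boundary.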
2.2 Sliding Window Log
Records the timestamp of every request and counts how many fall within the rolling window.
Sliding Window Log (limit: 5 per 60s, current time: T=75s)
Stored timestamps: [10, 25, 40, 55, 70]
↑ ↑
oldest newest
Window: [75 - 60, 75] = [15, 75]
In-window: [25, 40, 55, 70] → 4 requests < 5 → ALLOWED (0 remaining once this request is logged)
Out-of-window: [10] → pruned
// Sliding window log
const requestLogs = new Map();
const WINDOW_MS = 60 * 1000;
const MAX_REQUESTS = 100;
function slidingWindowLog(clientId) {
const now = Date.now();
const windowStart = now - WINDOW_MS;
if (!requestLogs.has(clientId)) {
requestLogs.set(clientId, []);
}
const logs = requestLogs.get(clientId);
// Remove expired entries
while (logs.length > 0 && logs[0] < windowStart) {
logs.shift();
}
if (logs.length >= MAX_REQUESTS) {
return { allowed: false, remaining: 0 };
}
logs.push(now);
return { allowed: true, remaining: MAX_REQUESTS - logs.length };
}
Pros: No burst problem — perfectly accurate. Cons: High memory (stores every timestamp). Expensive for high-volume APIs.
2.3 Sliding Window Counter
A hybrid: combines fixed-window counting with a weighted estimate from the previous window. Best of both worlds.
Sliding Window Counter (limit: 100/min)
Prev window count: 80 Current window count: 30
|-------- Prev --------|-------- Current --------|
↑
We are 25% into current window
Weighted count = (80 × 0.75) + 30 = 60 + 30 = 90
90 < 100 → ALLOWED
// Sliding window counter (memory-efficient)
const counters = new Map();
const WINDOW_MS = 60 * 1000;
const MAX_REQUESTS = 100;
function slidingWindowCounter(clientId) {
const now = Date.now();
const currentWindow = Math.floor(now / WINDOW_MS);
const prevWindow = currentWindow - 1;
const windowProgress = (now % WINDOW_MS) / WINDOW_MS;
const prevCount = counters.get(`${clientId}:${prevWindow}`) || 0;
const currCount = counters.get(`${clientId}:${currentWindow}`) || 0;
// Weighted estimate
const estimatedCount = prevCount * (1 - windowProgress) + currCount;
if (estimatedCount >= MAX_REQUESTS) {
return { allowed: false, remaining: 0 };
}
counters.set(`${clientId}:${currentWindow}`, currCount + 1);
return {
allowed: true,
remaining: Math.floor(MAX_REQUESTS - estimatedCount - 1),
};
}
Pros: Low memory, smooth rate enforcement, no burst spikes. Cons: Slightly less precise than sliding window log (but close enough for production).
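The arithmetic in the worked diagram above can be double-checked with a one-line helper (`weightedEstimate` is a hypothetical name used only for this example):

```javascript
// Weighted estimate used by the sliding window counter:
// count from the previous window, scaled by how much of it still
// overlaps the rolling window, plus the current window's count.
function weightedEstimate(prevCount, currCount, windowProgress) {
  return prevCount * (1 - windowProgress) + currCount;
}

// 25% into the current window, prev = 80, current = 30:
const estimate = weightedEstimate(80, 30, 0.25);
console.log(estimate); // 90 → under the limit of 100, so the request is allowed
```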
2.4 Token Bucket
Each client has a bucket that fills with tokens at a steady rate. Each request costs one token. If the bucket is empty, the request is denied.
Token Bucket (capacity: 10, refill: 2 tokens/sec)
Time 0: [##########] 10 tokens — full bucket
Request → [#########] 9 tokens
Time 0.5: [##########] 10 tokens (refilled 1, but capped at 10)
5 requests → [#####] 5 tokens
Time 3: [##########] 10 tokens (refilled over time, capped)
Burst of 10 → [ ] 0 tokens
Next request → DENIED (wait for refill)
Key: # = available token
// Token bucket implementation
class TokenBucket {
constructor(capacity, refillRate) {
this.capacity = capacity; // max tokens
this.tokens = capacity; // current tokens
this.refillRate = refillRate; // tokens per second
this.lastRefill = Date.now();
}
tryConsume() {
this.refill();
    if (this.tokens < 1) {
      // Seconds until a full token is available from the current deficit
      return { allowed: false, retryAfter: (1 - this.tokens) / this.refillRate };
    }
this.tokens -= 1;
return { allowed: true, remaining: Math.floor(this.tokens) };
}
refill() {
const now = Date.now();
const elapsed = (now - this.lastRefill) / 1000;
this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
this.lastRefill = now;
}
}
// Usage
const buckets = new Map();
function tokenBucketLimiter(clientId) {
if (!buckets.has(clientId)) {
buckets.set(clientId, new TokenBucket(10, 2)); // 10 capacity, 2/sec refill
}
return buckets.get(clientId).tryConsume();
}
Pros: Allows controlled bursts. Smooth long-term rate. Industry standard (AWS, Stripe use it). Cons: More complex to implement. State management per client.
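The refill behavior is easiest to verify with a clock you control. Below is a sketch of a variant of the class above with an injectable `nowFn` (an assumption added for deterministic testing, not part of the original):

```javascript
// TokenBucket with an injectable clock, so refill can be tested without waiting
class TestableTokenBucket {
  constructor(capacity, refillRate, nowFn = Date.now) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillRate = refillRate; // tokens per second
    this.nowFn = nowFn;
    this.lastRefill = nowFn();
  }
  tryConsume() {
    this.refill();
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
  refill() {
    const now = this.nowFn();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }
}

// Simulated clock: a burst of 12 drains the bucket (10 pass, 2 denied);
// one simulated second later, exactly 2 tokens have refilled.
let fakeNow = 0;
const bucket = new TestableTokenBucket(10, 2, () => fakeNow);

let passed = 0;
for (let i = 0; i < 12; i++) if (bucket.tryConsume()) passed++;

fakeNow += 1000; // advance 1 second → 2 tokens refill
let afterRefill = 0;
for (let i = 0; i < 5; i++) if (bucket.tryConsume()) afterRefill++;

console.log(passed, afterRefill); // 10 2
```

This is the property that makes token buckets attractive: the burst capacity and the sustained rate are two independent knobs.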
2.5 Leaky Bucket
Requests enter a queue (bucket) and are processed at a fixed rate. If the bucket overflows, requests are dropped.
Leaky Bucket (capacity: 5, drain rate: 1 req/sec)
Incoming: ████████████ (12 requests burst)
┌─────────┐
│ █ █ █ █ █ │ ← bucket (5 capacity)
│ █ █ █ █ █ │
└─────┬─────┘
│ drains at 1/sec
▼
Processed: █...█...█...█...█ (steady 1/sec)
Overflow: ███████ (7 dropped — 429 Too Many Requests)
// Leaky bucket (queue-based)
class LeakyBucket {
constructor(capacity, drainRate) {
this.capacity = capacity;
this.queue = [];
this.drainRate = drainRate; // requests per second
// Drain the bucket at a steady rate
setInterval(() => {
if (this.queue.length > 0) {
const request = this.queue.shift();
request.resolve(); // process the request
}
}, 1000 / this.drainRate);
}
enqueue() {
return new Promise((resolve, reject) => {
if (this.queue.length >= this.capacity) {
reject(new Error('Rate limit exceeded'));
return;
}
this.queue.push({ resolve });
});
}
}
Pros: Perfectly smooth output rate. Prevents any burstiness downstream. Cons: Adds latency (requests wait in queue). More complex.
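A quick check of the overflow path, assuming a class like the one above (restated here so the example is self-contained): enqueue more than `capacity` and the extras reject immediately, while the queued requests drain at the fixed rate.

```javascript
// Minimal leaky bucket: bounded queue plus a steady drain timer
class LeakyBucket {
  constructor(capacity, drainRate) {
    this.capacity = capacity;
    this.queue = [];
    const timer = setInterval(() => {
      const req = this.queue.shift();
      if (req) req.resolve();
    }, 1000 / drainRate);
    // unref() (Node-only) lets the process exit even if the timer is still scheduled
    if (typeof timer.unref === 'function') timer.unref();
  }
  enqueue() {
    return new Promise((resolve, reject) => {
      if (this.queue.length >= this.capacity) {
        reject(new Error('Rate limit exceeded'));
        return;
      }
      this.queue.push({ resolve });
    });
  }
}

// Burst of 8 into a bucket of capacity 5 draining at 10/sec:
// 5 queue up and drain smoothly; 3 overflow and reject right away.
const bucket = new LeakyBucket(5, 10);
const results = [];
for (let i = 0; i < 8; i++) {
  bucket.enqueue().then(
    () => results.push('ok'),
    () => results.push('dropped')
  );
}
```

Note that the rejections are immediate but the acceptances are delayed — that queueing delay is exactly the latency cost mentioned in the cons above.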
Algorithm Comparison
| Algorithm | Burst Handling | Memory | Precision | Complexity | Best For |
|---|---|---|---|---|---|
| Fixed Window | Poor (boundary burst) | Low | Low | Simple | Basic APIs |
| Sliding Window Log | Excellent | High | Exact | Medium | Low-volume, precision-critical |
| Sliding Window Counter | Good | Low | Near-exact | Medium | Most production APIs |
| Token Bucket | Controlled bursts | Medium | Good | Medium | APIs that allow bursts (AWS, Stripe) |
| Leaky Bucket | No bursts | Medium | Good | High | Smooth-output requirements |
3. Rate Limiting by Identity
By IP Address
// Most common — limit by IP
function getClientIP(req) {
// Behind a load balancer / proxy, use X-Forwarded-For
return req.headers['x-forwarded-for']?.split(',')[0]?.trim()
|| req.socket.remoteAddress;
}
Problem: NAT — thousands of users behind one corporate IP get collectively throttled.
By User (Authenticated)
// After auth middleware sets req.user
function getRateLimitKey(req) {
if (req.user) return `user:${req.user.id}`;
return `ip:${getClientIP(req)}`;
}
By API Key
// For third-party API consumers
function getRateLimitKey(req) {
const apiKey = req.headers['x-api-key'];
if (apiKey) return `key:${apiKey}`;
return `ip:${getClientIP(req)}`;
}
4. express-rate-limit Library
The standard rate limiting middleware for Express.
import rateLimit from 'express-rate-limit';
// Global rate limiter
const globalLimiter = rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100, // 100 requests per window
standardHeaders: true, // Return rate limit info in `RateLimit-*` headers
legacyHeaders: false, // Disable `X-RateLimit-*` headers
message: {
error: 'Too many requests, please try again later.',
retryAfter: 15 * 60,
},
keyGenerator: (req) => {
return req.headers['x-forwarded-for']?.split(',')[0]?.trim()
|| req.ip;
},
});
app.use(globalLimiter);
Endpoint-Specific Limits
// Strict limit on auth endpoints (prevent brute force)
const authLimiter = rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 5, // Only 5 login attempts
message: { error: 'Too many login attempts. Try again in 15 minutes.' },
skipSuccessfulRequests: true, // Don't count successful logins
});
// Relaxed limit on read endpoints
const readLimiter = rateLimit({
windowMs: 1 * 60 * 1000, // 1 minute
max: 60, // 60 reads per minute
});
// Very strict limit on expensive operations
const writeLimiter = rateLimit({
windowMs: 1 * 60 * 1000,
max: 10, // 10 writes per minute
});
// Apply per-route
app.post('/api/auth/login', authLimiter, loginHandler);
app.post('/api/auth/register', authLimiter, registerHandler);
app.get('/api/products', readLimiter, listProducts);
app.post('/api/orders', writeLimiter, createOrder);
5. Rate Limit Headers
Standard headers that tell clients about their rate limit status:
HTTP/1.1 200 OK
RateLimit-Limit: 100         # Max requests per window
RateLimit-Remaining: 42      # Requests left in current window
RateLimit-Reset: 58          # Seconds until the window resets (per the IETF draft; the legacy X-RateLimit-Reset often used a Unix timestamp instead)
# On 429 response:
HTTP/1.1 429 Too Many Requests
Retry-After: 30 # Seconds until the client should retry
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 30
// Custom header middleware
function rateLimitHeaders(req, res, next) {
const result = checkRateLimit(req);
res.set('RateLimit-Limit', String(result.limit));
res.set('RateLimit-Remaining', String(result.remaining));
res.set('RateLimit-Reset', String(result.resetTime));
if (!result.allowed) {
res.set('Retry-After', String(result.retryAfter));
return res.status(429).json({
error: 'Rate limit exceeded',
retryAfter: result.retryAfter,
});
}
next();
}
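These headers only pay off if clients actually read them. Below is a hedged sketch of the client side, where `doRequest` stands in for any function returning `{ status, headers }` (abstracted here so the example needs no real network; the names are illustrative, not from any library):

```javascript
// Client-side helper: on 429, wait for Retry-After seconds, then try again
async function requestWithRetry(doRequest, maxAttempts = 3) {
  let res;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    res = await doRequest();
    if (res.status !== 429 || attempt === maxAttempts) break;
    const retryAfter = Number(res.headers['retry-after'] || 1);
    await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
  }
  return res;
}

// Fake server for demonstration: returns 429 once, then succeeds
let calls = 0;
const fakeRequest = async () =>
  ++calls === 1
    ? { status: 429, headers: { 'retry-after': '0' } }
    : { status: 200, headers: {} };

const demo = requestWithRetry(fakeRequest);
```

A well-behaved client like this self-regulates instead of hammering the server in a tight retry loop, which is the whole point of exposing the headers.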
6. Redis-Based Distributed Rate Limiting
In-memory rate limiting breaks when you have multiple server instances. Redis provides shared state.
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Server 1 │ │ Server 2 │ │ Server 3 │
│ (Express)│ │ (Express)│ │ (Express)│
└─────┬────┘ └─────┬────┘ └─────┬────┘
│ │ │
└────────┬───────┴────────┬───────┘
│ │
▼ ▼
┌─────────────────────────┐
│ Redis Cluster │
│ Shared rate limit state │
│ "user:123" → count: 42 │
└─────────────────────────┘
import rateLimit from 'express-rate-limit';
import RedisStore from 'rate-limit-redis';
import { createClient } from 'redis';
// Connect to Redis
const redisClient = createClient({
url: process.env.REDIS_URL || 'redis://localhost:6379',
});
await redisClient.connect();
// Distributed rate limiter
const distributedLimiter = rateLimit({
windowMs: 15 * 60 * 1000,
max: 100,
standardHeaders: true,
legacyHeaders: false,
// Use Redis as the store
store: new RedisStore({
sendCommand: (...args) => redisClient.sendCommand(args),
prefix: 'rl:', // Key prefix in Redis
}),
});
app.use('/api/', distributedLimiter);
Redis Sliding Window with Lua Script
For precise distributed rate limiting, use a Lua script (atomic operations in Redis):
// Atomic sliding window counter in Redis
const SLIDING_WINDOW_SCRIPT = `
local key = KEYS[1]
local window = tonumber(ARGV[1])
local limit = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
-- Remove expired entries
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
-- Count current entries
local count = redis.call('ZCARD', key)
if count < limit then
-- Add this request
redis.call('ZADD', key, now, now .. ':' .. math.random())
redis.call('PEXPIRE', key, window)
return {1, limit - count - 1} -- allowed, remaining
else
return {0, 0} -- denied, 0 remaining
end
`;
async function slidingWindowRedis(clientId, windowMs, maxRequests) {
const result = await redisClient.eval(SLIDING_WINDOW_SCRIPT, {
keys: [`ratelimit:${clientId}`],
arguments: [String(windowMs), String(maxRequests), String(Date.now())],
});
return {
allowed: result[0] === 1,
remaining: result[1],
};
}
7. Rate Limiting in Microservices
Gateway-Level vs Service-Level
Gateway-Level Rate Limiting:

Client ──→ ┌────────────────┐
           │  API Gateway   │
           │  100 req/min   │
           └───────┬────────┘
           ┌───────┼───────┐
           ▼       ▼       ▼
      ┌──────┐ ┌──────┐ ┌──────┐
      │ Auth │ │Orders│ │Search│
      └──────┘ └──────┘ └──────┘

Service-Level Rate Limiting:

Client ──→ ┌─────────┐ (pass-through)
           │ Gateway │
           └────┬────┘
        ┌───────┼───────┐
        ▼       ▼       ▼
  ┌────────┐ ┌──────┐ ┌──────┐
  │  Auth  │ │Orders│ │Search│
  │ 20/min │ │50/min│ │200/m │
  └────────┘ └──────┘ └──────┘
Best practice: Both. Gateway for global limits, services for granular limits.
// API Gateway global limit
const gatewayLimiter = rateLimit({
windowMs: 60 * 1000,
max: 200, // No client exceeds 200/min total
keyGenerator: (req) => req.headers['x-api-key'] || req.ip,
});
// Per-service limits (inside each service)
// auth-service
const authServiceLimiter = rateLimit({
windowMs: 15 * 60 * 1000,
max: 10, // 10 auth attempts per 15 min
keyGenerator: (req) => req.headers['x-user-id'] || req.ip,
});
// search-service
const searchServiceLimiter = rateLimit({
windowMs: 60 * 1000,
max: 60, // 60 searches per minute
});
8. Tiered Rate Limiting
Different users get different limits based on their plan.
// Tiered rate limiting by API key plan
const TIER_LIMITS = {
free: { windowMs: 60 * 1000, max: 10 },
basic: { windowMs: 60 * 1000, max: 100 },
pro: { windowMs: 60 * 1000, max: 1000 },
enterprise: { windowMs: 60 * 1000, max: 10000 },
};
// Anonymous clients get the strictest limit. Create this limiter once, up
// front — calling rateLimit() inside the handler would build a fresh limiter
// (with a fresh counter store) on every request, so nothing would ever be limited.
const anonymousLimiter = rateLimit({ windowMs: 60000, max: 5 });

async function tieredRateLimiter(req, res, next) {
  const apiKey = req.headers['x-api-key'];
  if (!apiKey) {
    return anonymousLimiter(req, res, next);
  }
// Look up the plan for this API key
const plan = await getPlanForApiKey(apiKey); // e.g., from Redis or DB
const limits = TIER_LIMITS[plan] || TIER_LIMITS.free;
const result = await slidingWindowRedis(
`tier:${apiKey}`,
limits.windowMs,
limits.max
);
// Set headers
res.set('RateLimit-Limit', String(limits.max));
res.set('RateLimit-Remaining', String(result.remaining));
if (!result.allowed) {
res.set('Retry-After', String(Math.ceil(limits.windowMs / 1000)));
return res.status(429).json({
error: 'Rate limit exceeded',
plan,
limit: limits.max,
upgradeUrl: '/pricing',
});
}
next();
}
app.use('/api/', tieredRateLimiter);
9. Abuse Prevention Patterns
Exponential Backoff on Repeated Violations
// Track repeat offenders
const violations = new Map();
function abuseDetection(req, res, next) {
const clientId = req.ip;
const record = violations.get(clientId) || { count: 0, blockedUntil: 0 };
// Check if currently blocked
if (Date.now() < record.blockedUntil) {
const retryAfter = Math.ceil((record.blockedUntil - Date.now()) / 1000);
res.set('Retry-After', String(retryAfter));
return res.status(429).json({
error: 'Temporarily blocked due to repeated abuse',
retryAfter,
});
}
// Reset if enough time has passed
if (record.lastViolation && Date.now() - record.lastViolation > 3600000) {
violations.delete(clientId);
return next();
}
next();
}
// Call this when a rate limit is hit
function recordViolation(clientId) {
const record = violations.get(clientId) || { count: 0 };
record.count += 1;
record.lastViolation = Date.now();
// Exponential backoff: 1min, 5min, 30min, 2hr, 24hr
const backoffMinutes = [1, 5, 30, 120, 1440];
const backoffIndex = Math.min(record.count - 1, backoffMinutes.length - 1);
record.blockedUntil = Date.now() + backoffMinutes[backoffIndex] * 60 * 1000;
violations.set(clientId, record);
}
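The two pieces above are defined but never wired together. Here is a minimal sketch of one way to combine them (the `checkRateLimit` parameter is a stand-in for any limiter from section 2, and the explicit `now` arguments are an addition for deterministic testing): consult the block list first, run the normal limiter, and record a violation whenever the limiter denies.

```javascript
// Block list with exponential backoff, wired in front of an ordinary limiter
const violations = new Map();
const backoffMinutes = [1, 5, 30, 120, 1440]; // 1min, 5min, 30min, 2hr, 24hr

function recordViolation(clientId, now = Date.now()) {
  const record = violations.get(clientId) || { count: 0 };
  record.count += 1;
  record.lastViolation = now;
  const idx = Math.min(record.count - 1, backoffMinutes.length - 1);
  record.blockedUntil = now + backoffMinutes[idx] * 60 * 1000;
  violations.set(clientId, record);
}

function guardedCheck(clientId, checkRateLimit, now = Date.now()) {
  // 1. Currently serving a backoff block? Deny without touching the limiter.
  const record = violations.get(clientId);
  if (record && now < record.blockedUntil) {
    return { allowed: false, reason: 'blocked' };
  }
  // 2. Normal rate limit check; every denial escalates the backoff.
  const result = checkRateLimit(clientId);
  if (!result.allowed) {
    recordViolation(clientId, now);
    return { allowed: false, reason: 'rate_limited' };
  }
  return { allowed: true };
}

// Demo with a limiter that always denies:
const alwaysDeny = () => ({ allowed: false });
guardedCheck('bad-client', alwaysDeny, 0);                         // rate_limited → blocked 1 min
const second = guardedCheck('bad-client', alwaysDeny, 30 * 1000);  // inside the block window
console.log(second.reason); // "blocked"
```

The design choice here is that blocked clients never reach the limiter at all, so a determined abuser costs one Map lookup per request instead of a full rate-limit check.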
10. Complete Production Implementation
import express from 'express';
import rateLimit from 'express-rate-limit';
import RedisStore from 'rate-limit-redis';
import { createClient } from 'redis';
const app = express();
const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();
// --- Layer 1: Global rate limit (all routes) ---
const globalLimiter = rateLimit({
windowMs: 15 * 60 * 1000,
max: 500,
standardHeaders: true,
legacyHeaders: false,
store: new RedisStore({
sendCommand: (...args) => redisClient.sendCommand(args),
prefix: 'rl:global:',
}),
message: { error: 'Global rate limit exceeded' },
});
// --- Layer 2: Auth endpoint limiter ---
const authLimiter = rateLimit({
windowMs: 15 * 60 * 1000,
max: 5,
skipSuccessfulRequests: true,
store: new RedisStore({
sendCommand: (...args) => redisClient.sendCommand(args),
prefix: 'rl:auth:',
}),
message: { error: 'Too many login attempts' },
});
// --- Layer 3: API-key-based tiered limiter ---
const apiLimiter = rateLimit({
windowMs: 60 * 1000,
max: async (req) => {
const apiKey = req.headers['x-api-key'];
if (!apiKey) return 10;
const plan = await redisClient.get(`plan:${apiKey}`);
const limits = { free: 30, basic: 100, pro: 1000, enterprise: 10000 };
return limits[plan] || 10;
},
store: new RedisStore({
sendCommand: (...args) => redisClient.sendCommand(args),
prefix: 'rl:api:',
}),
keyGenerator: (req) => req.headers['x-api-key'] || req.ip,
});
// --- Layer 4: Expensive operation limiter ---
const expensiveLimiter = rateLimit({
windowMs: 60 * 1000,
max: 5,
store: new RedisStore({
sendCommand: (...args) => redisClient.sendCommand(args),
prefix: 'rl:expensive:',
}),
});
// Apply middleware
app.use(globalLimiter);
app.post('/api/auth/login', authLimiter);
app.post('/api/auth/register', authLimiter);
app.use('/api/', apiLimiter);
app.post('/api/export', expensiveLimiter);
app.post('/api/ai/generate', expensiveLimiter);
app.listen(3000, () => {
console.log('Server with production rate limiting running on :3000');
});
11. Key Takeaways
- Rate limiting is mandatory — every production API needs it to prevent abuse, control costs, and protect availability.
- Choose the right algorithm — sliding window counter or token bucket cover most production needs.
- Distribute with Redis — in-memory counters break with multiple server instances.
- Layer your limits — gateway-level for global caps, service-level for granular control.
- Set proper headers — RateLimit-Limit, RateLimit-Remaining, Retry-After — so clients can self-regulate.
- Tier by identity — different limits for anonymous, free, and paid users.
Explain-It Challenge
- Your boss says "just set a global limit of 100 requests per minute." Why is this insufficient for production?
- A client behind corporate NAT complains they keep getting 429 errors. What happened and how do you fix it?
- Explain why token bucket is preferred by AWS and Stripe over fixed-window counting.
Navigation: ← 6.8 Overview · 6.8.b — CORS and Secure Headers →