Episode 9 — System Design / 9.9 — Core Infrastructure
9.9.d API Gateway
What API Gateways Do
An API Gateway is a single entry point for all client requests to a backend system. It acts as a reverse proxy that routes requests to appropriate services while handling cross-cutting concerns like authentication, rate limiting, and request transformation.
WITHOUT API Gateway:
Mobile App --> User Service (auth, rate limiting in each service)
Mobile App --> Order Service (auth, rate limiting in each service)
Mobile App --> Payment Service (auth, rate limiting in each service)
Web App --> User Service (duplicated logic everywhere)
Web App --> Order Service
Problems: Client must know all service URLs, auth duplicated, no central control
WITH API Gateway:
Mobile App ---+
              |
Web App ------+--> API Gateway --> User Service
              |         |      --> Order Service
Partner API --+         |      --> Payment Service
                        |
              Handles: Auth, Rate Limiting, Routing,
                       Logging, Transformation, Caching
Core Responsibilities
1. Request Routing
The gateway maps external URLs to internal service endpoints.
Routing Table:
| External URL | Internal Service |
|---|---|
| GET /api/users/* | user-service:8080/users/* |
| POST /api/orders/* | order-service:8081/orders/* |
| GET /api/products/* | product-service:8082/products/* |
| POST /api/payments/* | payment-service:8083/payments/* |
Example:
Client: GET https://api.example.com/api/users/42
Gateway routes to: http://user-service:8080/users/42
Path-based routing:
/api/v1/* --> Service Pool A (version 1)
/api/v2/* --> Service Pool B (version 2)
Header-based routing:
X-Client-Type: mobile --> Mobile-optimized service
X-Client-Type: web --> Web service
Weight-based routing (canary deployments):
/api/users/* --> Service v1 (90% of traffic)
--> Service v2 (10% of traffic)
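The routing rules above can be sketched as a small lookup function. This is a minimal illustration, not how any particular gateway implements routing: the `Route` class, the `resolve` function, and the `user-service-v2` host are hypothetical names, and the 90/10 split mirrors the canary example.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Route:
    method: str
    prefix: str                        # external path prefix
    upstreams: list[tuple[str, int]]   # (internal base URL, relative weight)


# Hypothetical routing table; user-service-v2 stands in for the canary pool.
ROUTES = [
    Route("GET", "/api/users/", [("http://user-service:8080/users/", 90),
                                 ("http://user-service-v2:8080/users/", 10)]),
    Route("POST", "/api/orders/", [("http://order-service:8081/orders/", 100)]),
]


def resolve(method: str, path: str, rng: random.Random | None = None) -> str | None:
    """Return the internal URL for a request, or None if no route matches."""
    picker = rng or random
    # Check longer prefixes first so the most specific route wins.
    for route in sorted(ROUTES, key=lambda r: len(r.prefix), reverse=True):
        if route.method == method and path.startswith(route.prefix):
            bases = [base for base, _ in route.upstreams]
            weights = [weight for _, weight in route.upstreams]
            base = picker.choices(bases, weights=weights, k=1)[0]
            return base + path[len(route.prefix):]
    return None
```

Weighted random choice keeps the sketch short; real gateways often use consistent hashing or sticky sessions so a given user stays on one version during a canary.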
2. Authentication and Authorization
The gateway validates identity and permissions before requests reach backend services.
Authentication Flow:
Client               API Gateway              Auth Service         Backend
|                         |                         |                 |
|-- Request + JWT ------->|                         |                 |
|                         |-- Validate JWT -------->|                 |
|                         |<-- Valid, user=42 ------|                 |
|                         |                         |                 |
|                         |-- Forward request ----------------------->|
|                         |   + X-User-Id: 42       |                 |
|                         |   + X-Roles: admin      |                 |
|<-- Response ------------|<------------------------------------------|
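Backend services trust the identity headers the gateway sets (X-User-Id, X-Roles) and usually do not re-validate the JWT themselves.
This is only safe if backends are reachable solely through the gateway and the gateway strips any client-supplied copies of those headers before forwarding.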
Common auth patterns at the gateway:
| Pattern | How It Works | Use Case |
|---|---|---|
| JWT validation | Gateway verifies JWT signature and claims | Stateless auth |
| OAuth2 token introspection | Gateway calls auth server to validate opaque token | Third-party tokens |
| API key validation | Gateway checks API key against a registry | Partner/developer APIs |
| mTLS | Gateway verifies client certificate | Service-to-service, B2B |
| Basic Auth | Gateway checks username/password | Simple internal APIs |
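As an illustration of the JWT validation row, here is a standard-library-only sketch for HS256 tokens. It is an assumption-laden teaching example: a real gateway would normally use a vetted JWT library and often asymmetric keys. The function names are hypothetical, `sub` and `exp` are standard JWT claims, and the `roles` claim is an assumed custom claim.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def _b64url_decode(part: str) -> bytes:
    # JWTs drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def _signature(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a token (normally done by the auth service, included for testing)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = _b64url_encode(_signature(f"{header}.{payload}", secret))
    return f"{header}.{payload}.{sig}"


def verify_hs256(token: str, secret: bytes, now: float | None = None) -> dict | None:
    """Return the claims if signature and expiry are valid, else None."""
    now = time.time() if now is None else now
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
            return None  # reject "none" and unexpected algorithms
        expected = _signature(f"{header_b64}.{payload_b64}", secret)
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
        if claims["exp"] <= now:
            return None
        return claims
    except (ValueError, TypeError, KeyError, AttributeError):
        return None  # malformed token


def identity_headers(claims: dict) -> dict:
    """Headers the gateway attaches for backends (assumed claim names)."""
    return {"X-User-Id": str(claims["sub"]),
            "X-Roles": ",".join(claims.get("roles", []))}
```

The signature is checked before the payload is trusted, and `hmac.compare_digest` avoids leaking timing information during the comparison.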
3. Rate Limiting
The gateway controls how many requests a client can make in a given time window.
Rate Limiting Example:
Plan: Free tier = 100 requests/minute
Request 1: 200 OK (remaining: 99)
Request 2: 200 OK (remaining: 98)
...
Request 100: 200 OK (remaining: 0)
Request 101: 429 Too Many Requests
Retry-After: 30
Response headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1680000060
Rate limiting algorithms:
| Algorithm | Description | Pros | Cons |
|---|---|---|---|
| Fixed Window | Count requests in fixed time windows (e.g., per minute) | Simple | Burst at window boundary |
| Sliding Window Log | Track timestamps of all requests | Accurate | Memory-intensive |
| Sliding Window Counter | Hybrid of fixed window + weighted overlap | Good balance | Slightly approximate |
| Token Bucket | Tokens added at fixed rate; request consumes a token | Allows bursts | More complex |
| Leaky Bucket | Requests processed at fixed rate; excess queued/dropped | Smooth output | No burst handling |
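To make the "weighted overlap" idea in the Sliding Window Counter row concrete, here is a minimal single-client sketch. The class name and the explicit `now` parameter (seconds, assumed non-negative) are illustrative choices so the behaviour can be checked without a real clock.

```python
class SlidingWindowCounter:
    """Approximate sliding window built from two fixed-window counters."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.index = 0      # which fixed window `current` belongs to
        self.current = 0    # requests counted in the current window
        self.previous = 0   # requests counted in the window before it

    def allow(self, now: float) -> bool:
        idx = int(now // self.window)
        if idx != self.index:
            # The old count only matters if it is from the adjacent window.
            self.previous = self.current if idx == self.index + 1 else 0
            self.index, self.current = idx, 0
        elapsed = (now - idx * self.window) / self.window
        # Weight the previous window by how much of it still overlaps
        # the sliding window that ends at `now`.
        estimated = self.previous * (1.0 - elapsed) + self.current
        if estimated >= self.limit:
            return False
        self.current += 1
        return True
```

With a limit of 100 per 60 seconds, a client that spends its whole quota at t=59 is still rejected at t=60, which is the boundary burst a plain fixed window would let through.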
Token Bucket:
Bucket capacity: 10 tokens
Refill rate: 2 tokens/second
t=0s: [##########] 10 tokens (full)
t=0s: 3 requests --> [#######] 7 tokens
t=1s: refill +2 --> [#########] 9 tokens
t=1s: 5 requests --> [####] 4 tokens
t=2s: refill +2 --> [######] 6 tokens
If bucket empty: reject request (429)
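The trace above can be reproduced with a short token bucket sketch. The class and method names are hypothetical, and time is passed in explicitly (an assumption made so the example is deterministic); a real limiter would read a monotonic clock and keep one bucket per client.

```python
class TokenBucket:
    """Token bucket: refills continuously, each request spends one token."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full
        self.updated_at = now

    def refill(self, now: float) -> None:
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Return True and spend tokens, or False (the gateway answers 429)."""
        self.refill(now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=10, refill_rate=2)
for _ in range(3):
    bucket.allow(now=0.0)   # 10 -> 7 tokens
for _ in range(5):
    bucket.allow(now=1.0)   # refill to 9, then 9 -> 4 tokens
bucket.refill(now=2.0)      # 4 -> 6 tokens
```

Refilling lazily on each call avoids a background timer: the bucket only needs the time of the last update.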
Rate limiting dimensions:
- Per user/API key
- Per IP address
- Per endpoint
- Per service/tenant
- Global (total system throughput)
4. Request/Response Transformation
The gateway can modify requests before they reach services and modify responses before they reach clients.
Request Transformation:
Client sends:
POST /api/orders
Authorization: Bearer eyJhbG...
Content-Type: application/json
{"item": "laptop", "qty": 1}
Gateway transforms to:
POST /internal/orders/create
X-User-Id: 42
X-Request-Id: uuid-abc-123
X-Forwarded-For: 203.0.113.5
Content-Type: application/json
{"item": "laptop", "qty": 1, "user_id": 42, "timestamp": "2026-04-11T..."}
Response Transformation:
Service returns:
{"user_id": 42, "internal_score": 85, "name": "Alice", ...}
Gateway transforms to (remove internal fields):
{"name": "Alice", ...}
Common transformations:
| Transformation | Example |
|---|---|
| Header injection | Add X-Request-Id, X-Forwarded-For |
| Header removal | Strip internal headers from responses |
| Protocol translation | REST to gRPC, HTTP to WebSocket |
| Payload modification | Add fields, remove sensitive data |
| Response filtering | Return only requested fields |
| Format conversion | XML to JSON, JSON to Protocol Buffers |
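A minimal sketch of header injection, header removal, and payload filtering, following the request and response examples above. The function names and the set of internal field names are assumptions for illustration; headers are modelled as a plain dict.

```python
import uuid

# Headers the gateway never forwards from the client as-is (assumed list).
DROPPED_HEADERS = {"authorization", "x-user-id", "x-roles",
                   "x-request-id", "x-forwarded-for"}
# Response fields treated as internal-only in this example.
INTERNAL_FIELDS = {"user_id", "internal_score"}


def transform_request(headers: dict, client_ip: str, user_id: int) -> dict:
    """Build the header set forwarded to the backend."""
    lowered = {k.lower(): v for k, v in headers.items()}
    out = {k: v for k, v in headers.items() if k.lower() not in DROPPED_HEADERS}
    out["X-User-Id"] = str(user_id)          # identity established by the gateway
    out["X-Request-Id"] = str(uuid.uuid4())  # correlation ID for tracing
    prior = lowered.get("x-forwarded-for")
    # Append to an existing proxy chain rather than overwriting it.
    out["X-Forwarded-For"] = f"{prior}, {client_ip}" if prior else client_ip
    return out


def transform_response(body: dict) -> dict:
    """Strip internal-only fields before the response reaches the client."""
    return {k: v for k, v in body.items() if k not in INTERNAL_FIELDS}
```

Dropping client-supplied identity headers before setting them is what makes it reasonable for backends to trust `X-User-Id`.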
5. Request Aggregation (API Composition)
The gateway can combine responses from multiple services into a single response.
Without Aggregation (client makes 3 calls):
Mobile App --> GET /api/users/42 --> User Service
Mobile App --> GET /api/users/42/orders --> Order Service
Mobile App --> GET /api/users/42/recommendations --> Recommendation Service
3 round trips from mobile! Slow on cellular networks.
With Aggregation (client makes 1 call):
Mobile App --> GET /api/users/42/dashboard --> API Gateway
API Gateway internally:
+--> User Service: GET /users/42
+--> Order Service: GET /users/42/orders (parallel)
+--> Recommendation Service: GET /users/42/recs (parallel)
Gateway combines results:
{
"user": {"name": "Alice", ...},
"recent_orders": [...],
"recommendations": [...]
}
1 round trip from mobile. Much faster.
This is especially valuable for:
- Mobile clients (high latency, limited bandwidth)
- BFF (Backend for Frontend) pattern
- Reducing over-fetching and under-fetching
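The parallel fan-out can be sketched with `asyncio.gather`. The three `fetch_*` coroutines are stand-ins for real HTTP calls, and the choice to degrade gracefully when a non-critical service fails is an assumption of this example, not something the text above prescribes.

```python
import asyncio


# Stand-ins for calls to the backend services (hypothetical data).
async def fetch_user(user_id: int) -> dict:
    await asyncio.sleep(0.01)
    return {"id": user_id, "name": "Alice"}


async def fetch_orders(user_id: int) -> list:
    await asyncio.sleep(0.01)
    return [{"order_id": 1}]


async def fetch_recs(user_id: int) -> list:
    await asyncio.sleep(0.01)
    return ["laptop stand"]


async def dashboard(user_id: int, recs_fetcher=fetch_recs, timeout: float = 1.0) -> dict:
    """Call the three services concurrently and merge the results."""
    user, orders, recs = await asyncio.gather(
        asyncio.wait_for(fetch_user(user_id), timeout),
        asyncio.wait_for(fetch_orders(user_id), timeout),
        asyncio.wait_for(recs_fetcher(user_id), timeout),
        return_exceptions=True,  # collect failures instead of raising at once
    )
    if isinstance(user, Exception):
        raise user  # the user record is essential; fail the whole request
    return {
        "user": user,
        # Optional sections fall back to empty lists on failure or timeout.
        "recent_orders": [] if isinstance(orders, Exception) else orders,
        "recommendations": [] if isinstance(recs, Exception) else recs,
    }


result = asyncio.run(dashboard(42))
```

Because the calls run concurrently, the total latency is roughly that of the slowest service rather than the sum of all three.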
6. Other Gateway Responsibilities
| Responsibility | Description |
|---|---|
| SSL/TLS termination | Decrypt HTTPS at the gateway; internal traffic can be HTTP |
| Logging and monitoring | Centralized request/response logging |
| Circuit breaking | Stop sending requests to failing services |
| Caching | Cache GET responses to reduce backend load |
| CORS handling | Manage cross-origin resource sharing headers |
| IP whitelisting/blacklisting | Block or allow specific IPs |
| Request validation | Validate request schema before forwarding |
| Compression | gzip/brotli responses |
| Retry logic | Retry failed requests with exponential backoff (safe only for idempotent requests) |
| Load shedding | Reject low-priority requests under high load |
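Of the responsibilities in the table, circuit breaking is the least obvious, so here is a minimal state-machine sketch. The thresholds, the class name, and the explicit `now` argument are assumptions chosen for illustration.

```python
class CircuitBreaker:
    """Closed -> open after repeated failures -> half-open after a cool-down."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to wait before probing
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def state(self, now: float) -> str:
        if self.opened_at is None:
            return "closed"
        if now - self.opened_at >= self.reset_timeout:
            return "half-open"  # allow a trial request through
        return "open"

    def allow_request(self, now: float) -> bool:
        return self.state(now) != "open"

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        self.failures += 1
        # Trip on too many failures, or immediately if a trial request fails.
        if self.failures >= self.failure_threshold or self.state(now) == "half-open":
            self.opened_at = now
```

The gateway would call `allow_request` before forwarding and return an error (typically 503) right away while the circuit is open, which keeps a failing service from tying up gateway connections.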
Popular API Gateways
| Gateway | Type | Best For | Key Features |
|---|---|---|---|
| Kong | Open source / Enterprise | General purpose | Plugin ecosystem, Lua/Go plugins, DB-less mode |
| AWS API Gateway | Cloud-managed | AWS ecosystems | Lambda integration, WebSocket, REST & HTTP APIs |
| Nginx | Open source | High performance | Reverse proxy, extensive config, Lua scripting |
| Envoy | Open source | Service mesh | gRPC-native, observability, xDS API |
| Traefik | Open source | Container-native | Auto-discovery, Docker/K8s integration |
| Apigee (Google) | Cloud-managed | Enterprise API management | Analytics, developer portal, monetization |
| Azure API Management | Cloud-managed | Azure ecosystems | Policy engine, developer portal |
| Spring Cloud Gateway | Framework | Java/Spring ecosystems | Java-native, reactive, Spring integration |
| Zuul (Netflix) | Open source | JVM ecosystems | Filters, dynamic routing, Netflix battle-tested |
API Gateway vs Load Balancer
This is a common interview question. They overlap but serve different purposes.
| API Gateway | Load Balancer |
|---|---|
| Application-level concerns | Traffic distribution |
| Auth, rate limiting, transformation | Health checks, routing |
| Single entry point for clients | Distributes to server pool |
| Understands API semantics | Protocol-level routing |
| Often L7 only | L4 or L7 |
In practice, they work TOGETHER:
Client --> API Gateway --> Load Balancer --> Service Instances
                |                |
          Auth, routing    Distribution
          Rate limiting    Health checks
          Transformation   Failover
| Feature | API Gateway | Load Balancer |
|---|---|---|
| Primary purpose | API management | Traffic distribution |
| Authentication | Yes | No |
| Rate limiting | Yes | No (typically) |
| Request transformation | Yes | No |
| API composition | Yes | No |
| Health checks | Sometimes | Yes (core feature) |
| SSL termination | Yes | Yes (L7) |
| Content routing | Yes (rich) | Yes (basic, L7) |
| Protocol translation | Yes | No |
| Caching | Sometimes | No |
Key interview answer: "An API gateway manages API concerns (auth, rate limiting, transformation). A load balancer distributes traffic across server instances. In most architectures, you use both: the gateway handles cross-cutting concerns, then forwards to a load balancer that distributes across service instances."
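The division of labour in that answer can be sketched in a few lines. Everything here is hypothetical (the `handle` function, the API-key check standing in for auth, the instance addresses); the point is only that the gateway decides whether and where a request goes, and the balancer decides which instance serves it.

```python
import itertools


class RoundRobinBalancer:
    """Load balancer concern: pick a healthy instance from a pool."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._cycle = itertools.cycle(self.instances)

    def mark_down(self, instance: str) -> None:
        self.healthy.discard(instance)  # e.g. after a failed health check

    def pick(self) -> str:
        for _ in range(len(self.instances)):
            instance = next(self._cycle)
            if instance in self.healthy:
                return instance
        raise RuntimeError("no healthy instances")


def handle(api_key: str, path: str, valid_keys: set, balancers: dict):
    """Gateway concern: authenticate, route by path, then delegate."""
    if api_key not in valid_keys:
        return 401, None
    parts = path.strip("/").split("/")       # "/api/users/42" -> ["api", "users", "42"]
    service = parts[1] if len(parts) > 1 else None
    balancer = balancers.get(service)
    if balancer is None:
        return 404, None
    return 200, balancer.pick()
```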
API Gateway in Microservices Architecture
Microservices Architecture with API Gateway:
                Clients
+--------+   +---------+   +----------+
| Mobile |   |   Web   |   | Partner  |
|  App   |   |   App   |   |   API    |
+---+----+   +----+----+   +----+-----+
    |             |             |
    +-------------+-------------+
                  |
                  v
           +------+------+
           | API Gateway |  Auth, Rate Limit, Route, Transform
           +------+------+
                  |
    +--------+----+----+----------+
    |        |         |          |
    v        v         v          v
  +----+  +-----+  +-------+  +--------+
  |User|  |Order|  |Product|  |Payment |
  |Svc |  |Svc  |  |Svc    |  |Svc     |
  +----+  +-----+  +-------+  +--------+
Backend for Frontend (BFF) Pattern
Different clients need different API shapes. Instead of one gateway for all, create a specialized gateway per client type.
BFF Pattern:
Mobile App --> Mobile BFF Gateway --> Services
Web App --> Web BFF Gateway --> Services
Partner --> Partner API Gateway --> Services
Each BFF:
- Aggregates data optimized for its client
- Returns only fields the client needs
- Handles client-specific auth (e.g., API keys for partners)
Mobile BFF:
GET /dashboard --> Aggregates user + orders + recommendations
Returns compact JSON (mobile bandwidth)
Web BFF:
GET /dashboard --> Aggregates user + orders + recommendations + analytics
Returns full JSON (desktop bandwidth)
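The difference between the two BFFs comes down to how each shapes the same upstream data. A minimal sketch, assuming the aggregated data is already available as a dict (`SAMPLE` and its field names are made up for illustration):

```python
# Hypothetical aggregated data from the user, order, recommendation,
# and analytics services.
SAMPLE = {
    "user": {"name": "Alice", "email": "alice@example.com"},
    "orders": [{"id": i, "status": "shipped", "items": ["laptop"]} for i in range(1, 6)],
    "recommendations": ["mouse", "keyboard", "monitor", "dock"],
    "analytics": {"orders_last_30d": 5},
}


def mobile_dashboard(data: dict) -> dict:
    """Compact shape: few fields, at most three items per list."""
    return {
        "user": {"name": data["user"]["name"]},
        "recent_orders": [{"id": o["id"], "status": o["status"]}
                          for o in data["orders"][:3]],
        "recommendations": data["recommendations"][:3],
    }


def web_dashboard(data: dict) -> dict:
    """Full shape: complete records plus analytics."""
    return {
        "user": data["user"],
        "recent_orders": data["orders"],
        "recommendations": data["recommendations"],
        "analytics": data["analytics"],
    }
```

Keeping each shaping function in its own BFF lets the mobile and web teams change their payloads independently.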
Gateway Design Considerations
1. Single Point of Failure
The gateway is on the critical path. If it goes down, everything goes down.
Mitigations:
- Deploy multiple gateway instances behind a load balancer
- Use cloud-managed gateways (AWS API Gateway auto-scales)
- Implement circuit breakers and graceful degradation
2. Latency Overhead
Every request passes through the gateway, adding latency.
Without gateway: Client --> Service (10ms)
With gateway: Client --> Gateway (2ms) --> Service (10ms) = 12ms
Overhead is typically 1-5ms. Acceptable for most use cases.
Mitigations:
- Keep gateway logic lightweight
- Avoid heavy transformations in the gateway
- Cache frequent responses at the gateway
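The caching mitigation can be sketched as a small time-to-live cache in front of the backend call. The names are hypothetical and `fetch` stands in for the upstream request; a production cache would also need size limits, invalidation, and care not to cache per-user or non-GET responses.

```python
class TTLCache:
    """In-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self._store = {}  # key -> (value, expires_at)

    def get(self, key, now: float):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def put(self, key, value, now: float) -> None:
        self._store[key] = (value, now + self.ttl)


def cached_get(cache: TTLCache, path: str, fetch, now: float):
    """Return (response, was_cache_hit) for a GET on `path`."""
    hit = cache.get(path, now)
    if hit is not None:
        return hit, True
    value = fetch(path)  # only reach the backend on a miss
    cache.put(path, value, now)
    return value, False
```

On a hit the request never leaves the gateway, so the backend latency disappears from that request entirely.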
3. Gateway Bloat
Over time, too much logic migrates to the gateway.
Warning signs:
- Business logic in the gateway (should be in services)
- Complex orchestration (should be a dedicated service)
- Gateway config file is thousands of lines
Rule of thumb: The gateway handles cross-cutting infrastructure concerns. Business logic belongs in services.
4. Configuration Management
Gateway configuration approaches:
1. Static config (Nginx, HAProxy):
- Config file checked into VCS
- Reload on deployment
2. Dynamic config (Kong, Envoy):
- Config stored in database or control plane
- Changes take effect without restart
3. Code-based (Spring Cloud Gateway):
- Routes defined in application code
- Full programming language flexibility
Gateway Security Best Practices
| Practice | Implementation |
|---|---|
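| Always terminate SSL/TLS at the gateway | Internal traffic can be plain HTTP only on a trusted, isolated network such as a VPC; isolation is not encryption, so use TLS or mTLS internally where sensitive data or compliance requires it |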
| Validate all input at gateway | Schema validation, size limits, content type checks |
| Rate limit aggressively | Per-user, per-IP, per-endpoint |
| Never expose internal URLs | Gateway rewrites paths; clients see only public URLs |
| Log all requests | Request ID, user ID, status, latency, response size |
| Implement CORS properly | Whitelist origins, methods, headers at gateway |
| Use circuit breakers | Prevent cascading failures when services are down |
| Sanitize responses | Strip internal headers, stack traces, debug info |
Key Takeaways
- API Gateway = single entry point for all client-to-backend communication
- The gateway handles cross-cutting concerns: auth, rate limiting, routing, transformation
- API composition at the gateway reduces round trips for mobile clients
- Gateway is NOT a load balancer -- they complement each other
- BFF pattern creates client-specific gateways for different frontends
- Beware gateway bloat -- keep business logic in services, not the gateway
- Gateway must be highly available -- it sits on the critical path and is a potential single point of failure
- In interviews, mention the gateway as the front door of your microservices architecture
Explain-It Challenge
"You are designing an API for a ride-sharing app. You have separate services for riders, drivers, trips, payments, and notifications. The mobile app needs a single 'request ride' flow that touches all five services. Your partner API (for corporate accounts) has stricter rate limits and different authentication. Design the API gateway layer, including routing, authentication, rate limiting, and how the 'request ride' call is orchestrated."