Episode 9 — System Design / 9.7 — System Design Foundations
9.7.c — Breaking Into Components
In one sentence: Decomposing a system into well-bounded components (services) is the core skill of high-level design — it determines how independently your teams can build, deploy, and scale each piece.
Navigation: ← 9.7.b — Requirements Analysis · 9.7.d — Capacity Estimation →
Table of Contents
- 1. Why Decompose?
- 2. Identifying Service Boundaries
- 3. Data Flow Between Components
- 4. API Contracts
- 5. Dependency Mapping
- 6. Component Diagram Examples
- 7. Common Decomposition Patterns
- 8. Anti-Patterns in Decomposition
- 9. Key Takeaways
- 10. Explain-It Challenge
1. Why Decompose?
A monolithic blob that does everything is simple to start with, but as a system grows, it becomes a bottleneck for development speed, scalability, and reliability.
MONOLITH DECOMPOSED SYSTEM
──────── ──────────────────
┌──────────────────┐ ┌─────────┐ ┌─────────┐
│ Everything in │ │ Auth │ │ Feed │
│ one big box │ ──────► │ Service │ │ Service │
│ │ └─────────┘ └─────────┘
│ • Auth │
│ • Feed │ ┌─────────┐ ┌─────────┐
│ • Users │ │ User │ │ Media │
│ • Media │ │ Service │ │ Service │
│ • Search │ └─────────┘ └─────────┘
│ • Notifications │
└──────────────────┘ ┌─────────┐
│ Search │
│ Service │
└─────────┘
| Benefit | Explanation |
|---|---|
| Independent scaling | Scale the read-heavy Feed Service without scaling the Auth Service |
| Independent deployment | Deploy a fix to Search without redeploying the whole system |
| Team ownership | Each team owns a service boundary — less coordination overhead |
| Fault isolation | If Media Service crashes, Auth and Feed keep working |
| Technology flexibility | Feed Service can use Redis; Search can use Elasticsearch |
2. Identifying Service Boundaries
The hardest part of decomposition is deciding where to draw the lines. Five proven heuristics follow.
Heuristic 1: Single Responsibility
Each service should own one business capability.
| Good Boundary | Bad Boundary |
|---|---|
| "User Service handles registration, authentication, and profile management" | "UserAndTweetService handles user profiles AND tweet creation" |
| "Payment Service handles all billing logic" | "MiscService handles payments, emails, and logging" |
Heuristic 2: Data Ownership
Each service should own its own data store. If two services need the same table, that is a sign that either they should be one service, or one of them should own the data and expose it to the other via an API.
GOOD: Each service owns its data BAD: Shared database
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ User │ │ Order │ │ User │ │ Order │
│ Service │ │ Service │ │ Service │ │ Service │
└────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘
│ │ │ │
┌────▼────┐ ┌────▼────┐ └──────┬───────┘
│ User DB │ │Order DB │ ┌─────▼─────┐
└─────────┘ └─────────┘ │ Shared DB │ ← coupling!
└───────────┘
Heuristic 3: Rate of Change
Components that change frequently should be separate from stable components.
| Changes Often | Changes Rarely |
|---|---|
| Recommendation algorithm | User authentication |
| Search ranking | Payment processing |
| UI/BFF (Backend for Frontend) | Core data models |
Heuristic 4: Scaling Needs
Components with different scaling profiles should be separate.
| Component | Scaling Profile |
|---|---|
| Image upload | CPU-intensive (resizing), bursty |
| Timeline read | Memory-intensive (caching), constant high throughput |
| Notification | I/O-heavy (external APIs), can be async |
| Authentication | Low volume, must be always available |
Heuristic 5: Domain-Driven Design (DDD)
Group by bounded context — a business domain with clear boundaries.
┌─────────────────────────────────────────────────────────────┐
│ E-COMMERCE SYSTEM │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ CATALOG │ │ ORDERING │ │ SHIPPING │ │
│ │ Context │ │ Context │ │ Context │ │
│ │ │ │ │ │ │ │
│ │ • Product │ │ • Cart │ │ • Shipment │ │
│ │ • Category │ │ • Order │ │ • Tracking │ │
│ │ • Inventory │ │ • Payment │ │ • Carrier │ │
│ │ • Pricing │ │ • Invoice │ │ • Label │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ Each context has its own data model and service boundary │
└─────────────────────────────────────────────────────────────┘
3. Data Flow Between Components
Once you have services, you need to define how data flows between them.
Synchronous Communication (Request-Response)
┌─────────┐ HTTP/gRPC ┌─────────┐
│ Service │ ───────────────► │ Service │
│ A │ ◄─────────────── │ B │
└─────────┘ response └─────────┘
• A WAITS for B to respond
• Simple and predictable
• Creates coupling: if B is slow, A is slow
• If B is down, A may fail
Use when: You need an immediate answer (e.g., "Is this user authenticated?")
Asynchronous Communication (Event/Message)
┌─────────┐ publish ┌──────────┐ consume ┌─────────┐
│ Service │ ───────────────► │ Queue/ │ ────────────► │ Service │
│ A │ │ Topic │ │ B │
└─────────┘ └──────────┘ └─────────┘
• A does NOT wait for B
• Decoupled: A doesn't even know B exists
• B can process at its own pace
• If B is down, messages queue up (no data loss)
Use when: The caller doesn't need an immediate result (e.g., "Send a welcome email after signup")
Comparison Table
| Aspect | Synchronous (REST/gRPC) | Asynchronous (Queue/Event) |
|---|---|---|
| Latency | Caller waits for response | Caller returns immediately |
| Coupling | Tight (A knows about B) | Loose (A publishes; anyone can subscribe) |
| Failure handling | A fails if B fails | Messages buffer; B processes when ready |
| Debugging | Easy (request-response trace) | Harder (events across services) |
| Use case | Auth check, data fetch | Notifications, video processing, analytics |
Hybrid Pattern (Common in Practice)
Client API Gateway Tweet Service Queue Fan-out Worker
│ │ │ │ │
│ POST /tweet │ │ │ │
│─────────────────────►│ │ │ │
│ │ createTweet() │ │ │
│ │─────────────────────►│ │ │
│ │ │ publish event │ │
│ │ │─────────────────►│ │
│ │ { id: 123 } │ │ │
│ │◄─────────────────────│ │ │
│ 201 Created │ │ │ consume event │
│◄─────────────────────│ │ │─────────────────►│
│ │ │ │ update timelines│
│ (synchronous │ │ │ (asynchronous │
│ response) │ │ │ processing) │
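The hybrid flow above can be sketched in a few lines of Python: the tweet is acknowledged synchronously, while timeline fan-out happens asynchronously through an in-process queue standing in for Kafka or RabbitMQ. All names here (`create_tweet`, `fanout_queue`, the toy follower graph) are illustrative, not a real API.

```python
import queue
import threading

fanout_queue = queue.Queue()                     # stand-in for Kafka/RabbitMQ
timelines = {}                                   # follower_id -> list of tweet ids
FOLLOWERS = {"user_789": ["user_1", "user_2"]}   # toy follower graph

def create_tweet(author_id, text):
    """Synchronous path: persist the tweet, publish an event, return at once."""
    tweet = {"id": "tweet_456", "author_id": author_id, "text": text}
    fanout_queue.put(tweet)                      # publish; we do NOT wait for fan-out
    return {"status": 201, "body": tweet}

def fanout_worker():
    """Asynchronous path: consume events and update follower timelines."""
    while True:
        tweet = fanout_queue.get()
        if tweet is None:                        # sentinel used to stop the worker
            break
        for follower in FOLLOWERS.get(tweet["author_id"], []):
            timelines.setdefault(follower, []).append(tweet["id"])
        fanout_queue.task_done()

worker = threading.Thread(target=fanout_worker, daemon=True)
worker.start()

resp = create_tweet("user_789", "Hello, world!")  # caller gets 201 immediately
fanout_queue.join()    # only so the demo can observe results; a real caller never waits
fanout_queue.put(None)
```

Note the key property: `create_tweet` returns before any timeline is updated, which is exactly the "synchronous response, asynchronous processing" split in the diagram.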
4. API Contracts
API contracts define the interface between services. They are the "handshake" agreement.
REST API Contract Example
POST /api/v1/tweets
─────────────────────
Headers:
Authorization: Bearer <token>
Content-Type: application/json
Request Body:
{
"text": "Hello, world!",
"media_ids": ["img_123"]
}
Response (201 Created):
{
"id": "tweet_456",
"text": "Hello, world!",
"author_id": "user_789",
"created_at": "2025-01-15T10:30:00Z",
"media": [{ "id": "img_123", "url": "https://cdn.example.com/img_123.jpg" }]
}
Error Response (400 Bad Request):
{
"error": "TWEET_TOO_LONG",
"message": "Tweet exceeds 280 characters"
}
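A handler enforcing this contract might look like the following sketch. The function name and the hard-coded ids are illustrative; the point is that the success and error shapes match the documented responses exactly.

```python
MAX_TWEET_LEN = 280

def handle_post_tweet(body):
    """Sketch of a POST /api/v1/tweets handler enforcing the contract above."""
    text = body.get("text", "")
    if len(text) > MAX_TWEET_LEN:
        # Error shape matches the documented 400 response
        return 400, {"error": "TWEET_TOO_LONG",
                     "message": f"Tweet exceeds {MAX_TWEET_LEN} characters"}
    # Success shape matches the documented 201 response (id generation elided)
    return 201, {"id": "tweet_456", "text": text, "author_id": "user_789"}

status, payload = handle_post_tweet({"text": "x" * 300})
```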
API Design Principles
| Principle | Explanation |
|---|---|
| Versioning | Use /v1/, /v2/ to avoid breaking existing clients |
| Idempotency | Retrying the same PUT/DELETE should not cause duplicates |
| Pagination | Large lists must support ?page=1&limit=20 or cursor-based pagination |
| Consistent naming | Use nouns for resources: /tweets, /users, not /createTweet |
| Error codes | Return meaningful error codes and messages, not just 500 |
| Rate limiting | Protect your API with per-user rate limits (e.g., 100 req/min) |
Service-to-Service Contract
┌──────────────────────────────────────────────────────────────┐
│ SERVICE CONTRACT │
│ │
│ Provider: User Service │
│ Consumer: Tweet Service │
│ │
│ Endpoint: GET /internal/users/{user_id} │
│ Purpose: Fetch user data for tweet enrichment │
│ SLA: P99 latency < 50ms, 99.99% availability │
│ │
│ Response: │
│ { "id": "user_789", "name": "Alice", "avatar_url": "..." } │
│ │
│ What if User Service is down? │
│ → Tweet Service uses cached user data (stale up to 5 min) │
└──────────────────────────────────────────────────────────────┘
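The "what if the provider is down?" clause in the contract can be sketched as a consumer-side fallback: try the User Service, and on failure serve cached data as long as it is within the agreed staleness budget. `fetch_user`, `call_user_service`, and the cache layout are all illustrative.

```python
import time

user_cache = {}        # user_id -> (fetched_at, user dict)
CACHE_MAX_AGE = 300    # contract allows data stale up to 5 minutes

def call_user_service(user_id):
    """Stand-in for GET /internal/users/{user_id}; here the service is down."""
    raise ConnectionError("User Service unavailable")

def fetch_user(user_id):
    """Try the provider first; fall back to cached data within the staleness budget."""
    try:
        user = call_user_service(user_id)
        user_cache[user_id] = (time.time(), user)
        return user
    except ConnectionError:
        cached = user_cache.get(user_id)
        if cached and time.time() - cached[0] < CACHE_MAX_AGE:
            return cached[1]       # degrade gracefully with stale data
        raise                      # no usable cache: surface the failure

# Pre-populate the cache as if an earlier call had succeeded
user_cache["user_789"] = (time.time(), {"id": "user_789", "name": "Alice"})
user = fetch_user("user_789")      # provider fails, cache saves the request
```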
5. Dependency Mapping
Before finalizing your design, map out which services depend on which. Look for:
- Circular dependencies (A calls B, B calls A — redesign needed)
- Single points of failure (everything depends on one service)
- Critical path (the chain of calls that determines end-to-end latency)
Dependency Diagram
┌────────────────────────────────────────────────────────────┐
│ DEPENDENCY MAP │
│ │
│ ┌───────────┐ │
│ │ API │ │
│ │ Gateway │ │
│ └─────┬─────┘ │
│ ┌────┼────┐ │
│ ▼ ▼ ▼ │
│ ┌──────┐┌──────┐┌──────┐ │
│ │ Auth ││Tweet ││Search│ │
│ │ Svc ││ Svc ││ Svc │ │
│ └──┬───┘└──┬───┘└──┬───┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────┐┌──────┐┌──────────┐ │
│ │ User ││ Feed ││ Elastic │ │
│ │ Svc ││ Svc ││ Search │ │
│ └──┬───┘└──┬───┘└──────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────┐┌──────┐ │
│ │UserDB││FeedDB│ │
│ └──────┘└──────┘ │
│ │
│ Critical path (timeline read): │
│ Gateway → Feed Svc → Feed DB + User Svc → User DB │
│ Latency budget: 200ms total │
└────────────────────────────────────────────────────────────┘
Reducing Dependencies
| Problem | Solution |
|---|---|
| Service A calls B calls C calls A (circular) | Introduce an event bus or merge A and C |
| All services call Auth Service (bottleneck) | Cache auth tokens; use JWT for stateless validation |
| Single database for everything (SPOF) | Each service owns its database; replicate for reads |
| Synchronous chain of 5 services (latency) | Make non-critical calls async via queues |
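The "stateless validation" fix for the Auth Service bottleneck can be sketched with a signed token: any service holding the signing key can verify a token locally, without a network call per request. This uses a plain HMAC for illustration rather than the full JWT format; the names and the secret are placeholders.

```python
import hashlib
import hmac

SECRET = b"shared-signing-key"   # distributed to services, never to clients

def sign_token(payload: str) -> str:
    """Mint a token the way an Auth Service would: payload plus HMAC signature."""
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"

def verify_token(token: str) -> bool:
    """Any service can validate locally -- no call to the Auth Service needed."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

token = sign_token("user_789")
```

The trade-off: stateless tokens cannot be revoked instantly, which is why real deployments pair them with short expiry times.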
6. Component Diagram Examples
Example 1: URL Shortener
┌────────────┐ ┌─────────────┐ ┌──────────────┐
│ Clients │──────►│ Load │──────►│ URL Service │
│(Browser/ │ HTTPS │ Balancer │ │ │
│ API) │ └─────────────┘ │ • shorten() │
└────────────┘ │ • redirect() │
│ • analytics()│
└──────┬───────┘
│
┌────────────────┼────────────────┐
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌──────────┐
│ Cache │ │ DB │ │Analytics │
│ (Redis) │ │(Postgres│ │ Store │
│ │ │ / Cass.)│ │(ClickHs.)│
│ short→ │ │ short→ │ │ clicks, │
│ long │ │ long │ │ geo, ts │
└─────────┘ └─────────┘ └──────────┘
Data flow (redirect):
1. Client hits /abc123
2. URL Service checks Redis cache
3. Cache HIT → redirect immediately (< 10ms)
4. Cache MISS → query DB, populate cache, redirect
5. Log click event to Analytics Store (async)
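Steps 2-4 of the redirect flow are the classic cache-aside pattern, sketched below with dicts standing in for Redis and Postgres:

```python
cache = {}                                            # stand-in for Redis
db = {"abc123": "https://example.com/very/long/url"}  # stand-in for Postgres

def redirect(short_code):
    """Cache-aside read path: check the cache, fall back to the DB, backfill."""
    if short_code in cache:              # step 3: cache HIT
        return cache[short_code]
    long_url = db.get(short_code)        # step 4: cache MISS -> query DB
    if long_url is None:
        return None                      # unknown code -> 404 upstream
    cache[short_code] = long_url         # populate cache for the next reader
    return long_url

first = redirect("abc123")               # miss: served from the DB
second = redirect("abc123")              # hit: served from the cache
```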
Example 2: Chat Application (WhatsApp-like)
┌──────────┐ ┌──────────┐
│ Mobile │◄──── WebSocket ───►│ Chat │
│ App │ │ Gateway │
└──────────┘ └────┬─────┘
│
┌────────────┼────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Presence │ │ Message │ │ Group │
│ Service │ │ Service │ │ Service │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
┌────▼─────┐ ┌────▼─────┐ ┌───▼──────┐
│ Redis │ │ Cassandra│ │ MySQL │
│ (online/ │ │ (messages│ │ (groups, │
│ offline)│ │ by chat)│ │ members)│
└──────────┘ └──────────┘ └──────────┘
Key decisions:
• WebSocket for real-time delivery
• Cassandra for messages (write-heavy, time-series)
• Redis for presence (fast reads, ephemeral data)
• MySQL for groups (relational, fewer writes)
Example 3: E-Commerce Platform
┌────────┐ ┌─────┐ ┌───────────┐
│ Web │────►│ CDN │ │ API │
│Browser │ │ │ │ Gateway │
└────────┘ └─────┘ └─────┬─────┘
│
┌────────────────────┼────────────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Product │ │ Order │ │ Payment │
│ Service │ │ Service │ │ Service │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
┌────▼─────┐ ┌────▼─────┐ ┌────▼─────┐
│ Product │ │ Order DB │ │ Payment │
│ DB + ES │ │ (MySQL) │ │ Gateway │
│ (search) │ └──────────┘ │ (Stripe) │
└──────────┘ └──────────┘
┌────────────┐
│ Queue │
│ (Kafka) │
└──────┬─────┘
│
┌───────────┼───────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│Inventory │ │ Email │ │Shipping │
│ Update │ │ Notif. │ │ Service │
└──────────┘ └──────────┘ └──────────┘
7. Common Decomposition Patterns
Pattern 1: Backend for Frontend (BFF)
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Mobile │ │ Web │ │ Smart │
│ App │ │ Browser │ │ TV │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
┌────▼─────┐ ┌────▼─────┐ ┌────▼─────┐
│ Mobile │ │ Web │ │ TV │
│ BFF │ │ BFF │ │ BFF │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
└───────────────┼───────────────┘
▼
┌──────────┐
│ Shared │
│ Services │
└──────────┘
Each BFF tailors the API for its specific client (mobile needs less data, TV needs bigger images, etc.).
Pattern 2: Gateway Aggregation
Client makes ONE request:
GET /dashboard
API Gateway calls multiple services in parallel:
├── User Service → user profile
├── Order Service → recent orders
├── Notification Svc → unread count
└── Recommendation → suggested products
Gateway aggregates responses and returns a single JSON
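Gateway aggregation maps naturally onto `asyncio.gather`: the gateway fans out to the downstream services concurrently and merges the results into one response. The downstream calls here are stubs standing in for real service clients.

```python
import asyncio

async def get_profile():          # stand-ins for downstream service calls
    return {"name": "Alice"}

async def get_orders():
    return [{"id": "order_1"}]

async def get_unread_count():
    return 3

async def dashboard():
    """Gateway fans out to the services in parallel and merges the results."""
    profile, orders, unread = await asyncio.gather(
        get_profile(), get_orders(), get_unread_count()
    )
    return {"profile": profile, "orders": orders, "unread": unread}

result = asyncio.run(dashboard())
```

Because the calls run concurrently, the dashboard latency is roughly the slowest single call, not the sum of all of them.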
Pattern 3: Strangler Fig (Migration)
Gradually replace a monolith by routing specific endpoints to new services.
Phase 1: All traffic → Monolith
Phase 2: /api/auth → New Auth Service; everything else → Monolith
Phase 3: /api/auth → Auth Service; /api/feed → New Feed Service; rest → Monolith
Phase 4: Monolith is empty → decommission
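The phased migration boils down to a routing table in front of the monolith that grows one prefix at a time. A minimal sketch (service names illustrative):

```python
# Route table grown phase by phase; unmatched paths fall through to the monolith
ROUTES = {
    "/api/auth": "auth-service",   # carved out in phase 2
    "/api/feed": "feed-service",   # carved out in phase 3
}

def route(path):
    """Send migrated path prefixes to new services, everything else to the monolith."""
    for prefix, service in ROUTES.items():
        if path.startswith(prefix):
            return service
    return "monolith"
```

Each phase is just one more entry in `ROUTES`; when nothing routes to the monolith anymore, it can be decommissioned.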
8. Anti-Patterns in Decomposition
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Distributed monolith | Services are split but tightly coupled; must deploy together | Ensure each service can deploy and fail independently |
| Shared database | Multiple services read/write the same tables | Each service owns its own DB; share data via APIs or events |
| Chatty services | Service A makes 20 calls to Service B per request | Batch endpoints, denormalize data, or merge services |
| God service | One service does everything (it IS the monolith) | Apply single responsibility; break out distinct capabilities |
| Nano services | Over-decomposition; 50 services for a simple app | Merge small services until each has meaningful responsibility |
| Circular dependencies | A depends on B, B depends on A | Introduce events, a shared library, or merge |
9. Key Takeaways
- Decompose by business capability — each service should own one domain (User, Order, Payment).
- Each service owns its data — shared databases create coupling that defeats the purpose of decomposition.
- Use synchronous calls (REST/gRPC) when you need an immediate answer; use asynchronous messaging (queues/events) when you don't.
- Map your dependencies — look for circular dependencies, single points of failure, and long synchronous chains.
- API contracts are the glue between services — version them, document them, and design for failure.
- Avoid over-decomposition — start with fewer, larger services and split when there is a clear reason (scaling, team ownership, rate of change).
10. Explain-It Challenge
Without looking back, explain in your own words:
- Name five heuristics for deciding where to draw service boundaries.
- When would you use synchronous communication between services vs asynchronous? Give an example of each.
- What is a distributed monolith and why is it worse than a real monolith?
- Draw a simple component diagram for a food delivery app (Uber Eats-like) with at least 4 services.
- What is the strangler fig pattern and when would you use it?
Navigation: ← 9.7.b — Requirements Analysis · 9.7.d — Capacity Estimation →