Episode 9 — System Design / 9.7 — System Design Foundations

9.7.c — Breaking Into Components

In one sentence: Decomposing a system into well-bounded components (services) is the core skill of high-level design — it determines how independently your teams can build, deploy, and scale each piece.

Navigation: ← 9.7.b — Requirements Analysis · 9.7.d — Capacity Estimation →


Table of Contents

  1. Why Decompose?
  2. Identifying Service Boundaries
  3. Data Flow Between Components
  4. API Contracts
  5. Dependency Mapping
  6. Component Diagram Examples
  7. Common Decomposition Patterns
  8. Anti-Patterns in Decomposition
  9. Key Takeaways
  10. Explain-It Challenge

1. Why Decompose?

A monolithic blob that does everything is simple to start with, but as a system grows, it becomes a bottleneck for development speed, scalability, and reliability.

  MONOLITH                          DECOMPOSED SYSTEM
  ────────                          ──────────────────

  ┌──────────────────┐              ┌─────────┐  ┌─────────┐
  │  Everything in   │              │  Auth   │  │  Feed   │
  │  one big box     │    ──────►   │ Service │  │ Service │
  │                  │              └─────────┘  └─────────┘
  │  • Auth          │
  │  • Feed          │              ┌─────────┐  ┌─────────┐
  │  • Users         │              │  User   │  │  Media  │
  │  • Media         │              │ Service │  │ Service │
  │  • Search        │              └─────────┘  └─────────┘
  │  • Notifications │
  └──────────────────┘              ┌─────────┐
                                    │ Search  │
                                    │ Service │
                                    └─────────┘
  Benefit                 Explanation
  ──────────────────────  ──────────────────────────────────────────────────────────────────
  Independent scaling     Scale the read-heavy Feed Service without scaling the Auth Service
  Independent deployment  Deploy a fix to Search without redeploying the whole system
  Team ownership          Each team owns a service boundary — less coordination overhead
  Fault isolation         If Media Service crashes, Auth and Feed keep working
  Technology flexibility  Feed Service can use Redis; Search can use Elasticsearch

2. Identifying Service Boundaries

The hardest part of decomposition is deciding where to draw the lines. Here are five proven heuristics.

Heuristic 1: Single Responsibility

Each service should own one business capability.

  Good boundaries:
    • "User Service handles registration, authentication, and profile management"
    • "Payment Service handles all billing logic"

  Bad boundaries:
    • "UserAndTweetService handles user profiles AND tweet creation"
    • "MiscService handles payments, emails, and logging"

Heuristic 2: Data Ownership

Each service should own its own data store. If two services need the same table, that is a sign they should be one service or the data should be shared via APIs.

  GOOD: Each service owns its data         BAD: Shared database

  ┌─────────┐    ┌─────────┐               ┌─────────┐    ┌─────────┐
  │  User   │    │  Order  │               │  User   │    │  Order  │
  │ Service │    │ Service │               │ Service │    │ Service │
  └────┬────┘    └────┬────┘               └────┬────┘    └────┬────┘
       │              │                         │              │
  ┌────▼────┐    ┌────▼────┐                    └──────┬───────┘
  │ User DB │    │Order DB │                     ┌─────▼─────┐
  └─────────┘    └─────────┘                     │ Shared DB │  ← coupling!
                                                 └───────────┘

Heuristic 3: Rate of Change

Components that change frequently should be separate from stable components.

  Changes Often                    Changes Rarely
  ─────────────────────────────    ───────────────────
  Recommendation algorithm         User authentication
  Search ranking                   Payment processing
  UI/BFF (Backend for Frontend)    Core data models

Heuristic 4: Scaling Needs

Components with different scaling profiles should be separate.

  Component       Scaling Profile
  ──────────────  ────────────────────────────────────────────────────
  Image upload    CPU-intensive (resizing), bursty
  Timeline read   Memory-intensive (caching), constant high throughput
  Notification    I/O-heavy (external APIs), can be async
  Authentication  Low volume, must be always available

Heuristic 5: Domain-Driven Design (DDD)

Group by bounded context — a business domain with clear boundaries.

  ┌──────────────────────────────────────────────────────────────┐
  │                      E-COMMERCE SYSTEM                       │
  │                                                              │
  │  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐      │
  │  │   CATALOG    │   │   ORDERING   │   │   SHIPPING   │      │
  │  │   Context    │   │   Context    │   │   Context    │      │
  │  │              │   │              │   │              │      │
  │  │ • Product    │   │ • Cart       │   │ • Shipment   │      │
  │  │ • Category   │   │ • Order      │   │ • Tracking   │      │
  │  │ • Inventory  │   │ • Payment    │   │ • Carrier    │      │
  │  │ • Pricing    │   │ • Invoice    │   │ • Label      │      │
  │  └──────────────┘   └──────────────┘   └──────────────┘      │
  │                                                              │
  │  Each context has its own data model and service boundary    │
  └──────────────────────────────────────────────────────────────┘

3. Data Flow Between Components

Once you have services, you need to define how data flows between them.

Synchronous Communication (Request-Response)

  ┌─────────┐    HTTP/gRPC     ┌─────────┐
  │ Service │ ───────────────► │ Service │
  │    A    │ ◄─────────────── │    B    │
  └─────────┘    response      └─────────┘

  • A WAITS for B to respond
  • Simple and predictable
  • Creates coupling: if B is slow, A is slow
  • If B is down, A may fail

Use when: You need an immediate answer (e.g., "Is this user authenticated?")
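A synchronous call can be sketched with the standard library alone. This is a minimal sketch, not a real Auth Service client: the URL, endpoint, and response shape ({"valid": true}) are all assumptions, and the URL is chosen so the demo fails fast. The key point is that the caller blocks until the response (or a failure) arrives.

```python
import json
import urllib.request

# Assumed endpoint; in a real system this would be the Auth Service's address.
AUTH_SERVICE_URL = "http://127.0.0.1:1/v1/verify"

def is_authenticated(token: str, timeout_s: float = 0.5) -> bool:
    """Blocking call: Service A waits until Auth answers (or the call fails)."""
    req = urllib.request.Request(
        AUTH_SERVICE_URL,
        data=json.dumps({"token": token}).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return json.load(resp).get("valid", False)
    except OSError:
        # Auth is down or slow -> A must decide what to do; here we fail closed.
        return False

print(is_authenticated("some-token"))  # False here: nothing is listening on that URL
```

Note how the failure mode of B leaks into A: the except branch is A's policy decision (fail closed, fail open, or serve degraded data).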

Asynchronous Communication (Event/Message)

  ┌─────────┐    publish       ┌──────────┐    consume    ┌─────────┐
  │ Service │ ───────────────► │  Queue/  │ ────────────► │ Service │
  │    A    │                  │  Topic   │               │    B    │
  └─────────┘                  └──────────┘               └─────────┘

  • A does NOT wait for B
  • Decoupled: A doesn't even know B exists
  • B can process at its own pace
  • If B is down, messages queue up (no data loss)

Use when: The caller doesn't need an immediate result (e.g., "Send a welcome email after signup")
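The decoupling can be sketched in-process with the stdlib queue module standing in for Kafka/RabbitMQ/SQS (all names here are placeholders for illustration):

```python
import queue
import threading

events: "queue.Queue[dict | None]" = queue.Queue()
sent_emails: list[str] = []

def signup(user_email: str) -> str:
    """Service A: publish and return immediately -- no waiting on Service B."""
    events.put({"type": "user.signed_up", "email": user_email})
    return "201 Created"

def email_worker() -> None:
    """Service B: consumes at its own pace; a backlog buffers if B is slow."""
    while True:
        event = events.get()
        if event is None:           # shutdown sentinel for this demo
            break
        sent_emails.append(event["email"])

worker = threading.Thread(target=email_worker)
worker.start()
print(signup("alice@example.com"))  # returns before the email is "sent"
events.put(None)
worker.join()
print(sent_emails)                  # ['alice@example.com']
```

signup returns as soon as the event is enqueued; the worker could be down for minutes and the signup path would not notice.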

Comparison Table

  Aspect            Synchronous (REST/gRPC)        Asynchronous (Queue/Event)
  ────────────────  ─────────────────────────────  ──────────────────────────────────────────
  Latency           Caller waits for response      Caller returns immediately
  Coupling          Tight (A knows about B)        Loose (A publishes; anyone can subscribe)
  Failure handling  A fails if B fails             Messages buffer; B processes when ready
  Debugging         Easy (request-response trace)  Harder (events across services)
  Use case          Auth check, data fetch         Notifications, video processing, analytics

Hybrid Pattern (Common in Practice)

  Client                API Gateway           Tweet Service         Queue          Fan-out Worker
    │                      │                      │                  │                  │
    │  POST /tweet         │                      │                  │                  │
    │─────────────────────►│                      │                  │                  │
    │                      │  createTweet()       │                  │                  │
    │                      │─────────────────────►│                  │                  │
    │                      │                      │  publish event   │                  │
    │                      │                      │─────────────────►│                  │
    │                      │  { id: 123 }         │                  │                  │
    │                      │◄─────────────────────│                  │                  │
    │  201 Created         │                      │                  │  consume event   │
    │◄─────────────────────│                      │                  │─────────────────►│
    │                      │                      │                  │  update timelines│
    │  (synchronous        │                      │                  │  (asynchronous   │
    │   response)          │                      │                  │   processing)    │

4. API Contracts

API contracts define the interface between services. They are the "handshake" agreement.

REST API Contract Example

  POST /api/v1/tweets
  ─────────────────────
  Headers:
    Authorization: Bearer <token>
    Content-Type: application/json

  Request Body:
    {
      "text": "Hello, world!",
      "media_ids": ["img_123"]
    }

  Response (201 Created):
    {
      "id": "tweet_456",
      "text": "Hello, world!",
      "author_id": "user_789",
      "created_at": "2025-01-15T10:30:00Z",
      "media": [{ "id": "img_123", "url": "https://cdn.example.com/img_123.jpg" }]
    }

  Error Response (400 Bad Request):
    {
      "error": "TWEET_TOO_LONG",
      "message": "Tweet exceeds 280 characters"
    }

API Design Principles

  Principle          Explanation
  ─────────────────  ─────────────────────────────────────────────────────────────────────
  Versioning         Use /v1/, /v2/ to avoid breaking existing clients
  Idempotency        Retrying the same PUT/DELETE should not cause duplicates
  Pagination         Large lists must support ?page=1&limit=20 or cursor-based pagination
  Consistent naming  Use nouns for resources: /tweets, /users, not /createTweet
  Error codes        Return meaningful error codes and messages, not just 500
  Rate limiting      Protect your API with per-user rate limits (e.g., 100 req/min)
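Cursor-based pagination from the table above can be sketched like this. The endpoint name is hypothetical, a list stands in for the database, and the cursor is just a base64-encoded offset for illustration (real cursors usually encode a sort key, not a position):

```python
import base64

ITEMS = [f"tweet_{i}" for i in range(1, 8)]  # stand-in for a DB table

def encode_cursor(index: int) -> str:
    return base64.urlsafe_b64encode(str(index).encode()).decode()

def decode_cursor(cursor: "str | None") -> int:
    return int(base64.urlsafe_b64decode(cursor).decode()) if cursor else 0

def list_tweets(cursor: "str | None" = None, limit: int = 3) -> dict:
    start = decode_cursor(cursor)
    page = ITEMS[start:start + limit]
    done = start + limit >= len(ITEMS)
    return {"items": page, "next_cursor": None if done else encode_cursor(start + limit)}

page1 = list_tweets()
page2 = list_tweets(page1["next_cursor"])
print(page1["items"])  # ['tweet_1', 'tweet_2', 'tweet_3']
print(page2["items"])  # ['tweet_4', 'tweet_5', 'tweet_6']
```

The cursor is opaque to the client: the server can change its internals without breaking callers, which is exactly the point of a contract.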

Service-to-Service Contract

  ┌────────────────────────────────────────────────────────────────┐
  │                        SERVICE CONTRACT                        │
  │                                                                │
  │  Provider: User Service                                        │
  │  Consumer: Tweet Service                                       │
  │                                                                │
  │  Endpoint: GET /internal/users/{user_id}                       │
  │  Purpose:  Fetch user data for tweet enrichment                │
  │  SLA:      P99 latency < 50ms, 99.99% availability             │
  │                                                                │
  │  Response:                                                     │
  │    { "id": "user_789", "name": "Alice", "avatar_url": "..." }  │
  │                                                                │
  │  What if User Service is down?                                 │
  │    → Tweet Service uses cached user data (stale up to 5 min)   │
  └────────────────────────────────────────────────────────────────┘
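The "What if User Service is down?" clause can be sketched as a cache fallback. All names here are illustrative; fetch_user_remote is a stub that simulates an outage instead of making the real HTTP call:

```python
import time

CACHE_TTL_S = 300  # "stale up to 5 min", per the contract above
_user_cache: dict[str, tuple[float, dict]] = {}

def fetch_user_remote(user_id: str) -> dict:
    # Stub standing in for GET /internal/users/{user_id}; simulates an outage.
    raise ConnectionError("User Service is down")

def get_user(user_id: str) -> "dict | None":
    try:
        user = fetch_user_remote(user_id)
        _user_cache[user_id] = (time.monotonic(), user)  # refresh cache on success
        return user
    except ConnectionError:
        cached = _user_cache.get(user_id)
        if cached and time.monotonic() - cached[0] < CACHE_TTL_S:
            return cached[1]  # serve stale-but-recent data instead of failing
        return None           # nothing cached: degrade gracefully

# Pretend an earlier, successful call populated the cache:
_user_cache["user_789"] = (time.monotonic(), {"id": "user_789", "name": "Alice"})
print(get_user("user_789"))  # served from cache despite the outage
print(get_user("user_000"))  # None -- never cached and the service is down
```

Writing this fallback into the contract forces both teams to agree, up front, on what "degraded" looks like.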

5. Dependency Mapping

Before finalizing your design, map out which services depend on which. Look for:

  • Circular dependencies (A calls B, B calls A — redesign needed)
  • Single points of failure (everything depends on one service)
  • Critical path (the chain of calls that determines end-to-end latency)

Dependency Diagram

  ┌────────────────────────────────────────────────────────────┐
  │                       DEPENDENCY MAP                       │
  │                                                            │
  │                   ┌───────────┐                            │
  │                   │    API    │                            │
  │                   │  Gateway  │                            │
  │                   └─────┬─────┘                            │
  │                    ┌────┼────┐                             │
  │                    ▼    ▼    ▼                             │
  │              ┌──────┐┌──────┐┌──────┐                      │
  │              │ Auth ││Tweet ││Search│                      │
  │              │  Svc ││ Svc  ││ Svc  │                      │
  │              └──┬───┘└──┬───┘└──┬───┘                      │
  │                 │       │       │                          │
  │                 ▼       ▼       ▼                          │
  │              ┌──────┐┌──────┐┌──────────┐                  │
  │              │ User ││ Feed ││ Elastic  │                  │
  │              │ Svc  ││ Svc  ││ Search   │                  │
  │              └──┬───┘└──┬───┘└──────────┘                  │
  │                 │       │                                  │
  │                 ▼       ▼                                  │
  │              ┌──────┐┌──────┐                              │
  │              │UserDB││FeedDB│                              │
  │              └──────┘└──────┘                              │
  │                                                            │
  │  Critical path (timeline read):                            │
  │  Gateway → Feed Svc → Feed DB + User Svc → User DB         │
  │  Latency budget: 200ms total                               │
  └────────────────────────────────────────────────────────────┘

Reducing Dependencies

  Problem                                       Solution
  ────────────────────────────────────────────  ───────────────────────────────────────────────────
  Service A calls B calls C calls A (circular)  Introduce an event bus or merge A and C
  All services call Auth Service (bottleneck)   Cache auth tokens; use JWT for stateless validation
  Single database for everything (SPOF)         Each service owns its database; replicate for reads
  Synchronous chain of 5 services (latency)     Make non-critical calls async via queues
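The "stateless validation" fix can be sketched with a signed token. This uses the stdlib hmac module as a stand-in for a real JWT library; the key and names are illustrative only:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"shared-secret"  # illustrative; real systems use rotated or asymmetric keys

def issue_token(payload: dict) -> str:
    """Auth Service signs the payload once, at login."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def verify_token(token: str) -> "dict | None":
    """Any service verifies locally -- no per-request call to Auth Service."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(body))

token = issue_token({"sub": "user_789"})
print(verify_token(token))  # {'sub': 'user_789'}
tampered = token[:-1] + ("0" if token[-1] != "0" else "1")
print(verify_token(tampered))  # None: signature no longer matches
```

Because verification needs only the key, the Auth Service drops off the critical path of every other request — it stops being the bottleneck in the table above.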

6. Component Diagram Examples

Example 1: URL Shortener

  ┌────────────┐       ┌─────────────┐       ┌──────────────┐
  │  Clients   │──────►│   Load      │──────►│  URL Service │
  │(Browser/   │ HTTPS │  Balancer   │       │              │
  │ API)       │       └─────────────┘       │ • shorten()  │
  └────────────┘                             │ • redirect() │
                                             │ • analytics()│
                                             └──────┬───────┘
                                                    │
                                   ┌────────────────┼────────────────┐
                                   ▼                ▼                ▼
                              ┌─────────┐     ┌─────────┐     ┌──────────┐
                              │  Cache  │     │   DB    │     │Analytics │
                              │ (Redis) │     │(Postgres│     │  Store   │
                              │         │     │ / Cass.)│     │(ClickHs.)│
                              │ short→  │     │ short→  │     │ clicks,  │
                              │  long   │     │  long   │     │ geo, ts  │
                              └─────────┘     └─────────┘     └──────────┘

  Data flow (redirect):
  1. Client hits /abc123
  2. URL Service checks Redis cache
  3. Cache HIT → redirect immediately (< 10ms)
  4. Cache MISS → query DB, populate cache, redirect
  5. Log click event to Analytics Store (async)
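Steps 2–4 of the redirect flow are the cache-aside pattern. A minimal sketch, with plain dicts standing in for Redis and the database:

```python
cache: dict[str, str] = {}                                # stand-in for Redis
database = {"abc123": "https://example.com/very/long/url"}  # stand-in for the DB

def resolve(short: str) -> "str | None":
    if short in cache:               # steps 2-3: cache HIT, fast path
        return cache[short]
    long_url = database.get(short)   # step 4: cache MISS -> query the DB
    if long_url is not None:
        cache[short] = long_url      # populate the cache for next time
    return long_url

print(resolve("abc123"))  # first call: MISS, fills the cache
print(resolve("abc123"))  # second call: HIT, served from cache
print(resolve("nope"))    # None -> the service would return 404
```

For a read-heavy workload like URL redirection, almost all traffic lands on the HIT path, which is why the < 10ms figure above is achievable.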

Example 2: Chat Application (WhatsApp-like)

  ┌──────────┐                    ┌──────────┐
  │ Mobile   │◄──── WebSocket ───►│  Chat    │
  │  App     │                    │ Gateway  │
  └──────────┘                    └────┬─────┘
                                       │
                          ┌────────────┼────────────┐
                          ▼            ▼            ▼
                    ┌──────────┐ ┌──────────┐ ┌──────────┐
                    │ Presence │ │ Message  │ │  Group   │
                    │ Service  │ │ Service  │ │ Service  │
                    └────┬─────┘ └────┬─────┘ └────┬─────┘
                         │            │            │
                    ┌────▼─────┐ ┌────▼─────┐ ┌───▼──────┐
                    │  Redis   │ │ Cassandra│ │  MySQL   │
                    │ (online/ │ │ (messages│ │ (groups, │
                    │  offline)│ │  by chat)│ │  members)│
                    └──────────┘ └──────────┘ └──────────┘

  Key decisions:
  • WebSocket for real-time delivery
  • Cassandra for messages (write-heavy, time-series)
  • Redis for presence (fast reads, ephemeral data)
  • MySQL for groups (relational, fewer writes)

Example 3: E-Commerce Platform

  ┌────────┐     ┌─────┐     ┌───────────┐
  │  Web   │────►│ CDN │     │    API    │
  │Browser │     │     │     │  Gateway  │
  └────────┘     └─────┘     └─────┬─────┘
                                   │
              ┌────────────────────┼────────────────────┐
              ▼                    ▼                    ▼
        ┌──────────┐        ┌──────────┐        ┌──────────┐
        │ Product  │        │  Order   │        │ Payment  │
        │ Service  │        │ Service  │        │ Service  │
        └────┬─────┘        └────┬─────┘        └────┬─────┘
             │                   │                   │
        ┌────▼─────┐        ┌────▼─────┐        ┌────▼─────┐
        │ Product  │        │ Order DB │        │ Payment  │
        │ DB + ES  │        │ (MySQL)  │        │ Gateway  │
        │ (search) │        └──────────┘        │ (Stripe) │
        └──────────┘                            └──────────┘
                   ┌────────────┐
                   │   Queue    │
                   │  (Kafka)   │
                   └──────┬─────┘
                          │
              ┌───────────┼───────────┐
              ▼           ▼           ▼
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │Inventory │ │  Email   │ │Shipping  │
        │ Update   │ │ Notif.   │ │ Service  │
        └──────────┘ └──────────┘ └──────────┘

7. Common Decomposition Patterns

Pattern 1: Backend for Frontend (BFF)

  ┌──────────┐    ┌──────────┐    ┌──────────┐
  │  Mobile  │    │   Web    │    │  Smart   │
  │   App    │    │  Browser │    │    TV    │
  └────┬─────┘    └────┬─────┘    └────┬─────┘
       │               │               │
  ┌────▼─────┐    ┌────▼─────┐    ┌────▼─────┐
  │ Mobile   │    │  Web     │    │  TV      │
  │   BFF    │    │   BFF    │    │   BFF    │
  └────┬─────┘    └────┬─────┘    └────┬─────┘
       │               │               │
       └───────────────┼───────────────┘
                       ▼
                 ┌──────────┐
                 │ Shared   │
                 │ Services │
                 └──────────┘

Each BFF tailors the API for its specific client (mobile needs less data, TV needs bigger images, etc.).

Pattern 2: Gateway Aggregation

  Client makes ONE request:
    GET /dashboard

  API Gateway calls multiple services in parallel:
    ├── User Service      → user profile
    ├── Order Service     → recent orders
    ├── Notification Svc  → unread count
    └── Recommendation    → suggested products

  Gateway aggregates responses and returns a single JSON
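Gateway aggregation can be sketched with asyncio: the gateway fans out to the services in parallel, so end-to-end latency is roughly the slowest call rather than the sum. The service functions here are stubs standing in for real HTTP calls:

```python
import asyncio

# Stubs standing in for parallel calls to each downstream service:
async def get_profile() -> dict:
    return {"name": "Alice"}

async def get_orders() -> list:
    return [{"id": "o1"}, {"id": "o2"}]

async def get_unread_count() -> int:
    return 3

async def dashboard() -> dict:
    # Fan out concurrently; gather preserves the order of its arguments.
    profile, orders, unread = await asyncio.gather(
        get_profile(), get_orders(), get_unread_count()
    )
    return {"profile": profile, "orders": orders, "unread": unread}

print(asyncio.run(dashboard()))
```

The client still makes one round trip; the gateway absorbs the fan-out and returns a single merged document.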

Pattern 3: Strangler Fig (Migration)

Gradually replace a monolith by routing specific endpoints to new services.

  Phase 1: All traffic → Monolith
  Phase 2: /api/auth → New Auth Service; everything else → Monolith
  Phase 3: /api/auth → Auth Service; /api/feed → New Feed Service; rest → Monolith
  Phase 4: Monolith is empty → decommission it
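The routing table behind a strangler-fig migration can be sketched as a prefix match. Service names are hypothetical; real gateways typically do longest-prefix or pattern routing:

```python
# Phase 3 of the migration above, expressed as a routing table:
ROUTES = {
    "/api/auth": "auth-service",
    "/api/feed": "feed-service",
}
DEFAULT = "monolith"  # everything not yet migrated

def route(path: str) -> str:
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return backend
    return DEFAULT

print(route("/api/auth/login"))  # auth-service
print(route("/api/feed"))        # feed-service
print(route("/api/search?q=x"))  # monolith (not yet migrated)
```

Each migration phase is just another entry in ROUTES — the monolith shrinks one endpoint at a time, with an instant rollback path (delete the entry).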

8. Anti-Patterns in Decomposition

  Distributed monolith
    Problem: Services are split but tightly coupled; must deploy together
    Fix:     Ensure each service can deploy and fail independently

  Shared database
    Problem: Multiple services read/write the same tables
    Fix:     Each service owns its own DB; share data via APIs or events

  Chatty services
    Problem: Service A makes 20 calls to Service B per request
    Fix:     Batch endpoints, denormalize data, or merge services

  God service
    Problem: One service does everything (it IS the monolith)
    Fix:     Apply single responsibility; break out distinct capabilities

  Nano services
    Problem: Over-decomposition; 50 services for a simple app
    Fix:     Merge small services until each has meaningful responsibility

  Circular dependencies
    Problem: A depends on B, B depends on A
    Fix:     Introduce events, a shared library, or merge the services

9. Key Takeaways

  1. Decompose by business capability — each service should own one domain (User, Order, Payment).
  2. Each service owns its data — shared databases create coupling that defeats the purpose of decomposition.
  3. Use synchronous calls (REST/gRPC) when you need an immediate answer; use asynchronous messaging (queues/events) when you don't.
  4. Map your dependencies — look for circular dependencies, single points of failure, and long synchronous chains.
  5. API contracts are the glue between services — version them, document them, and design for failure.
  6. Avoid over-decomposition — start with fewer, larger services and split when there is a clear reason (scaling, team ownership, rate of change).

10. Explain-It Challenge

Without looking back, explain in your own words:

  1. Name five heuristics for deciding where to draw service boundaries.
  2. When would you use synchronous communication between services vs asynchronous? Give an example of each.
  3. What is a distributed monolith and why is it worse than a real monolith?
  4. Draw a simple component diagram for a food delivery app (Uber Eats-like) with at least 4 services.
  5. What is the strangler fig pattern and when would you use it?

Navigation: ← 9.7.b — Requirements Analysis · 9.7.d — Capacity Estimation →