Episode 9 — System Design / 9.8 — Communication and Data Layer
9.8.a — Networking Basics
Big Picture: Before you design any distributed system, you must understand how data physically travels from one machine to another. Networking is the plumbing of system design — invisible when it works, catastrophic when it fails.
Table of Contents
- How Data Travels: Client to Server
- DNS Resolution
- HTTP and HTTPS
- TCP vs UDP
- WebSockets
- gRPC and Protocol Buffers
- Latency and Bandwidth
- CDN Basics
- Network Topology in Distributed Systems
- Protocol Comparison Table
- Key Takeaways
- Explain-It Challenge
How Data Travels: Client to Server
When a user types https://shop.example.com/products into a browser, here is what happens:
USER'S BROWSER THE INTERNET SERVER
+-------------+ +----------------+
| 1. Type URL | | |
| 2. DNS |----> DNS Resolver ----> Root NS ----> .com NS ---->| 6. Receive |
| Lookup |<---- IP: 93.184.216.34 <----- Authoritative NS <---| request |
| | | |
| 3. TCP |---- SYN -------------------------------------------->| 7. Process |
| Handshake|<--- SYN-ACK ----------------------------------------| |
| |---- ACK -------------------------------------------->| |
| | | |
| 4. TLS |---- ClientHello -------------------------------------->| 8. Query DB |
| Handshake|<--- ServerHello + Certificate ----------------------| |
| |---- Key Exchange ---------------------------------->| |
| | | |
| 5. HTTP GET |---- GET /products HTTP/1.1 ------------------------>| 9. Build |
| Request |<--- 200 OK { "products": [...] } ------------------| response |
| | | |
| 10. Render | | (Connection |
| page | | may stay open |
+-------------+ | for reuse) |
+----------------+
Step-by-step breakdown:
| Step | What Happens | Time Cost |
|---|---|---|
| 1. URL entered | Browser parses scheme, host, path | ~0 ms |
| 2. DNS lookup | Translates domain name to IP address | 20-120 ms |
| 3. TCP handshake | Three-way handshake (SYN, SYN-ACK, ACK) | 1 RTT (~20-100 ms) |
| 4. TLS handshake | Negotiates encryption (HTTPS only) | 1 RTT (TLS 1.3) to 2 RTTs (TLS 1.2), ~40-200 ms |
| 5. HTTP request | Sends the actual GET/POST request | Depends on payload |
| 6-9. Server processing | Route, auth, business logic, DB query | 5-500 ms |
| 10. Response | Data sent back, browser renders | Depends on payload |
Interview insight: A single page load can involve 50-100+ network requests (HTML, CSS, JS, images, API calls). This is why CDNs, caching, and connection reuse (HTTP/1.1 keep-alive, HTTP/2 multiplexing) matter.
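The step costs above can be summed into a rough latency budget. A small Python sketch (all numbers are illustrative midpoints from the table, not measurements):

```python
# Rough latency budget for a cold HTTPS page load, using midpoint
# estimates from the table above (illustrative, not measured).
budget_ms = {
    "dns_lookup": 70,          # 20-120 ms
    "tcp_handshake": 60,       # 1 RTT, ~20-100 ms
    "tls_handshake": 120,      # 1-2 RTTs, ~40-200 ms
    "server_processing": 100,  # route, auth, business logic, DB query
    "response_transfer": 50,   # payload-dependent
}

total = sum(budget_ms.values())
print(f"Estimated time to first byte of a cold request: ~{total} ms")

# A warm connection (DNS cached, TCP + TLS already established) skips
# the first three steps entirely -- this is why connection reuse matters.
warm = budget_ms["server_processing"] + budget_ms["response_transfer"]
print(f"Same request on a reused connection: ~{warm} ms")
```

Notice that more than half of the cold-request budget is spent before a single byte of the actual request is sent.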
DNS Resolution
DNS (Domain Name System) translates human-readable domain names into IP addresses.
Browser DNS Resolver Root NS .com NS shop.example.com NS
| | | | |
|-- shop.example.com? -->| | | |
| |-- "." (root)? ---->| | |
| |<-- "ask .com NS" --| | |
| | | | |
| |-- ".com"? -------->|---------------->| |
| |<-- "ask example.com NS" ------------| |
| | | | |
| |-- "shop.example.com"? ------------->|------------------>|
| |<-- "93.184.216.34" ------------------------------------|
| | | | |
|<-- 93.184.216.34 ------| | | |
DNS Caching Layers
+------------------+ +-------------------+ +------------------+ +-----------------+
| Browser Cache | --> | OS Cache | --> | Router/ISP Cache | --> | DNS Resolver |
| (minutes) | | (/etc/hosts, stub)| | (hours) | | (Recursive) |
+------------------+ +-------------------+ +------------------+ +-----------------+
TTL: ~60s TTL: varies TTL: varies TTL: configured
DNS Record Types
| Record | Purpose | Example |
|---|---|---|
| A | Maps domain to IPv4 | shop.example.com -> 93.184.216.34 |
| AAAA | Maps domain to IPv6 | shop.example.com -> 2606:2800:220:1:... |
| CNAME | Alias to another domain | www.shop.com -> shop.example.com |
| MX | Mail server | example.com -> mail.example.com |
| NS | Name server for the zone | example.com -> ns1.example.com |
| TXT | Arbitrary text (SPF, verification) | example.com -> "v=spf1 ..." |
System design relevance: DNS-based load balancing returns different IPs for the same domain, spreading traffic across data centers. Services like Route 53 (AWS) can do geographic or latency-based routing at the DNS level.
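From application code, the entire recursive chain above is hidden behind one OS call. A minimal sketch using Python's `socket.getaddrinfo` (resolving `localhost` here so it works offline, answered from `/etc/hosts` — the first caching layer — without touching the network; a real domain would walk the resolver chain shown above):

```python
import socket

# The OS stub resolver handles the whole recursive lookup; applications
# just call getaddrinfo() and receive A/AAAA results.
results = socket.getaddrinfo("localhost", 443, proto=socket.IPPROTO_TCP)

for family, type_, proto, canonname, sockaddr in results:
    record = "A" if family == socket.AF_INET else "AAAA"
    print(f"{record} record -> {sockaddr[0]}")
```

Swapping `"localhost"` for `"shop.example.com"` would trigger the full browser cache / OS cache / resolver path from the diagram.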
HTTP and HTTPS
HTTP Versions
HTTP/1.0       HTTP/1.1       HTTP/2         HTTP/3
(1996)         (1997)         (2015)         (2022)
+----------+   +----------+   +----------+   +----------+
| 1 req    |   | Keep-    |   | Multi-   |   | QUIC     |
| per      |   | alive    |   | plexing  |   | (UDP)    |
| conn     |   | Pipelin- |   | Binary   |   | 0-RTT    |
+----------+   | ing      |   | frames   |   | No HOL   |
               | (rarely  |   | Header   |   | blocking |
               | used)    |   | compress |   +----------+
               +----------+   | Server   |
                              | push     |
                              +----------+
| Version | Key Feature | Connection Model |
|---|---|---|
| HTTP/1.0 | Basic request-response | New TCP connection per request |
| HTTP/1.1 | Keep-alive, chunked transfer | Persistent connections, but head-of-line blocking |
| HTTP/2 | Multiplexing, header compression | Single TCP connection, multiple streams |
| HTTP/3 | QUIC (UDP-based) | No TCP head-of-line blocking, faster handshakes |
HTTPS (TLS)
HTTPS = HTTP + TLS (Transport Layer Security). TLS provides:
- Encryption — Data cannot be read in transit
- Authentication — Server proves identity via certificate
- Integrity — Data cannot be tampered with
Client Server
|---- ClientHello (supported ciphers) ---->|
|<--- ServerHello + Certificate -----------|
| |
| (Client verifies certificate against |
| trusted Certificate Authorities) |
| |
|---- Key Exchange (PreMasterSecret) ----->|
| |
| (Both sides derive session keys) |
| |
|<========= Encrypted HTTP traffic =======>|
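The integrity guarantee means any in-transit modification is detectable. A simplified sketch of that idea using HMAC (real TLS uses AEAD ciphers like AES-GCM that combine encryption and integrity, and the session key is derived during the handshake; the key below is a placeholder):

```python
import hmac
import hashlib

# Both sides hold a shared session key derived during the handshake.
session_key = b"derived-session-key"  # illustrative placeholder

def tag(message: bytes) -> bytes:
    """Compute an integrity tag the receiver can verify."""
    return hmac.new(session_key, message, hashlib.sha256).digest()

message = b"GET /products HTTP/1.1"
sent_tag = tag(message)

# Receiver recomputes the tag; an unmodified message verifies...
intact = hmac.compare_digest(tag(message), sent_tag)
# ...while any tampering produces a mismatched tag.
tampered_ok = hmac.compare_digest(tag(b"GET /admin HTTP/1.1"), sent_tag)

print("message intact:", intact, "| tampered message accepted:", tampered_ok)
```

Without the session key, an attacker cannot forge a valid tag for a modified message.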
TCP vs UDP
TCP (Transmission Control Protocol) UDP (User Datagram Protocol)
=================================== ============================
+-------+ SYN +-------+ +-------+ data +-------+
|Client |----------->|Server | |Client |-------->|Server |
| |<-----------| | | | | |
| | SYN-ACK | | +-------+ +-------+
| |----------->| | (fire and forget)
| | ACK | |
+-------+ +-------+
(connection established)
Features: Features:
- Reliable delivery (ACKs) - No connection setup
- Ordered packets - No guaranteed delivery
- Flow control - No ordering
- Congestion control - Minimal overhead
- Error checking - Lower latency
Comparison Table
| Feature | TCP | UDP |
|---|---|---|
| Connection | Connection-oriented (handshake) | Connectionless |
| Reliability | Guaranteed delivery (retransmits) | Best effort |
| Ordering | Packets arrive in order | No ordering guarantee |
| Speed | Slower (overhead) | Faster (minimal overhead) |
| Header size | 20-60 bytes | 8 bytes |
| Use cases | Web (HTTP), email, file transfer | Video streaming, gaming, DNS, VoIP |
When to Use What
| Scenario | Choose | Why |
|---|---|---|
| REST API | TCP (HTTP) | Need reliable, ordered delivery |
| Live video streaming | UDP | Dropped frame is better than delayed stream |
| Online gaming | UDP | Low latency matters more than every packet |
| File download | TCP | Every byte must arrive correctly |
| DNS query | UDP | Small, single request-response |
| Chat messages | TCP (WebSocket) | Every message must arrive |
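The two transport styles are visible directly in the socket API. A self-contained loopback sketch: UDP needs no setup before sending, while TCP's `connect()`/`accept()` perform the three-way handshake under the hood:

```python
import socket

# UDP: connectionless "fire and forget" over the loopback interface.
udp_rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_rx.bind(("127.0.0.1", 0))                  # OS picks a free port
udp_tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_tx.sendto(b"ping", udp_rx.getsockname())   # no handshake, no ACK
udp_data, _ = udp_rx.recvfrom(1024)

# TCP: the three-way handshake happens inside connect()/accept().
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
client = socket.create_connection(listener.getsockname())  # SYN / SYN-ACK / ACK
server_side, _ = listener.accept()
client.sendall(b"ping")                        # reliable, ordered, ACKed stream
tcp_data = server_side.recv(1024)

print("UDP received:", udp_data, "| TCP received:", tcp_data)
for s in (udp_rx, udp_tx, client, server_side, listener):
    s.close()
```

On the loopback interface both arrive reliably; over a real network, only the TCP bytes are guaranteed to arrive, in order.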
WebSockets
WebSockets provide full-duplex, persistent communication over a single TCP connection.
HTTP Request-Response WebSocket
======================== ========================
Client Server Client Server
|-- GET ------->| |-- GET (Upgrade) ->|
|<-- 200 -------| |<-- 101 Switching--|
| | | |
|-- GET ------->| |<== Bidirectional =>|
|<-- 200 -------| |<== messages ======>|
| | |<== flowing =======>|
|-- GET ------->| | |
|<-- 200 -------| |-- close --------->|
| | |<-- close ---------|
WebSocket Handshake
# Client Request (HTTP Upgrade)
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
# Server Response
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
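The `Sec-WebSocket-Accept` value is not arbitrary: per RFC 6455, the server proves it understood the upgrade by appending a fixed GUID to the client's key, SHA-1 hashing it, and base64-encoding the digest. A sketch reproducing the value from the handshake above:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the WebSocket opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept header from the client's key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode()).digest()
    return base64.b64encode(digest).decode()

# Using the Sec-WebSocket-Key from the handshake example above:
accept = websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
print(accept)  # -> s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

This derivation lets the client confirm the server actually speaks WebSocket rather than being a plain HTTP server that echoed the headers.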
When to Use WebSockets
| Use Case | Why WebSockets |
|---|---|
| Chat applications | Real-time bidirectional messaging |
| Live notifications | Server pushes updates instantly |
| Collaborative editing | Multiple users editing same document |
| Live sports scores | Continuous real-time updates |
| Stock tickers | High-frequency price updates |
When NOT to Use WebSockets
- Simple CRUD operations — REST is simpler and sufficient
- Infrequent updates — Long polling or SSE may be lighter
- One-directional server-to-client — Server-Sent Events (SSE) is simpler
Alternative: Server-Sent Events (SSE)
SSE (Server-Sent Events)
========================
Client Server
|-- GET ------->|
|<-- text/event-stream --|
|<-- data: update 1 -----|
|<-- data: update 2 -----|
|<-- data: update 3 -----|
| |
(one-directional: server to client only)
| Feature | WebSocket | SSE | Long Polling |
|---|---|---|---|
| Direction | Bidirectional | Server -> Client | Server -> Client |
| Protocol | WS (TCP) | HTTP | HTTP |
| Reconnection | Manual | Automatic | Manual |
| Binary data | Yes | No (text only) | Yes |
| Complexity | Medium | Low | Low |
gRPC and Protocol Buffers
gRPC is a high-performance RPC framework built on HTTP/2, using Protocol Buffers for serialization.
REST (JSON over HTTP) gRPC (Protobuf over HTTP/2)
===================== ==========================
POST /api/users service UserService {
Content-Type: application/json rpc GetUser(UserRequest)
{ "name": "Alice", "age": 30 } returns (UserResponse);
}
~100 bytes ~20 bytes (binary)
Human-readable Not human-readable
Slower serialization 10x faster serialization
Any HTTP client works Needs gRPC client/codegen
Protocol Buffer Example
// user.proto
syntax = "proto3";
service UserService {
rpc GetUser (GetUserRequest) returns (User);
rpc ListUsers (ListUsersRequest) returns (stream User); // server streaming
rpc Chat (stream Message) returns (stream Message); // bidirectional
}
message GetUserRequest {
string user_id = 1;
}
message User {
string id = 1;
string name = 2;
string email = 3;
int32 age = 4;
}
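The "~20 bytes vs ~100 bytes" gap comes from protobuf's wire encoding: each field is a compact tag byte (`field_number << 3 | wire_type`) followed by a varint-encoded value. A hand-rolled sketch of that encoding (illustrative; real code would use the generated protobuf classes):

```python
# A sketch of protobuf's varint wire encoding, to show why payloads
# are small. Not a replacement for the real protobuf library.
def varint(n: int) -> bytes:
    """Encode a non-negative int in base-128 varint form."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_int32_field(field_number: int, value: int) -> bytes:
    tag = (field_number << 3) | 0    # wire type 0 = varint
    return varint(tag) + varint(value)

# Field 4 (`int32 age`) with value 30 costs two bytes on the wire,
# versus the nine characters of '"age": 30' in JSON.
encoded = encode_int32_field(4, 30)
print(encoded.hex())  # -> 201e
```

Field names never travel on the wire at all — only the numeric tags from the `.proto` file — which is also why field numbers must stay stable across schema versions.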
gRPC Communication Patterns
1. Unary 2. Server Streaming 3. Client Streaming 4. Bidirectional
============ ================== ================== ================
Client Server Client Server Client Server Client Server
|--req--->| |--req--->| |--msg1-->| |--msg-->|
|<--res---| |<--msg1--| |--msg2-->| |<--msg--|
|<--msg2--| |--msg3-->| |--msg-->|
|<--msg3--| |<--res---| |<--msg--|
When to Use gRPC vs REST
| Factor | REST | gRPC |
|---|---|---|
| Client type | Browsers, any HTTP client | Backend services (needs client lib) |
| Payload size | Larger (JSON text) | Smaller (binary protobuf) |
| Performance | Good | Excellent (2-10x faster) |
| Browser support | Native | Limited (needs gRPC-Web proxy) |
| Streaming | Workarounds (SSE, WebSocket) | Native (4 patterns) |
| Schema | OpenAPI/Swagger (optional) | .proto files (mandatory, strongly typed) |
| Debugging | Easy (human-readable JSON) | Harder (binary) |
| Best for | Public APIs, web apps | Microservice-to-microservice |
Latency and Bandwidth
Latency
Latency = time for a single unit of data to travel from source to destination.
LATENCY NUMBERS EVERY DEVELOPER SHOULD KNOW
============================================
L1 cache reference .......................... 0.5 ns
L2 cache reference .......................... 7 ns
Main memory reference ....................... 100 ns
SSD random read .......................... 16,000 ns (16 us)
HDD random read ....................... 2,000,000 ns (2 ms)
Send packet CA -> Netherlands -> CA ... 150,000,000 ns (150 ms)
NETWORK LATENCY (approximate round-trip)
========================================
Same data center ......................... 0.5 ms
Same region (e.g., us-east) .............. 1-5 ms
Cross-region (US East -> US West) ........ 30-70 ms
Cross-continent (US -> Europe) ........... 80-150 ms
Cross-world (US -> Australia) ............ 150-300 ms
Bandwidth
Bandwidth = maximum data that can be transferred per unit of time.
BANDWIDTH COMPARISON
====================
3G Mobile ................ 1-5 Mbps
4G/LTE Mobile ............ 10-50 Mbps
5G Mobile ................ 100-1000 Mbps
Home Wi-Fi ............... 50-500 Mbps
Ethernet (Office) ........ 1 Gbps
Data center internal ..... 10-100 Gbps
AWS region to region ..... 5-25 Gbps
Latency vs Bandwidth
| Concept | Analogy | Matters When |
|---|---|---|
| Latency | How long it takes a truck to drive from A to B | Many small requests (API calls) |
| Bandwidth | How much cargo the truck can carry | Large data transfers (video, backups) |
| Throughput | Actual cargo delivered per hour (real-world) | Overall system capacity |
Key insight: For most web applications, latency dominates. Reducing round trips (batching, caching, CDNs) often matters more than increasing bandwidth.
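A quick back-of-envelope calculation makes the point concrete. Using the cross-continent RTT from the table and an assumed 100 Mbps link (all numbers illustrative):

```python
# Why round trips dominate: fetching 100 small resources sequentially
# across the Atlantic versus batching them into one request.
rtt_ms = 100                  # US -> Europe round trip (from the table above)
n_requests = 100
bytes_per_item = 2_000        # 2 KB per resource (assumed)
bandwidth_bps = 100_000_000   # 100 Mbps link (assumed)

# Time to push all the bytes through the pipe:
transfer_ms = n_requests * bytes_per_item * 8 / bandwidth_bps * 1000

sequential_ms = n_requests * rtt_ms + transfer_ms   # one RTT per request
batched_ms = 1 * rtt_ms + transfer_ms               # one RTT total

print(f"transfer time for all data: {transfer_ms:.0f} ms")    # 16 ms
print(f"100 sequential requests:    {sequential_ms:.0f} ms")  # 10016 ms
print(f"1 batched request:          {batched_ms:.0f} ms")     # 116 ms
```

The data itself takes 16 ms to move; the other ten seconds of the sequential version is pure waiting on round trips. Doubling the bandwidth would save 8 ms; batching saves almost ten seconds.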
CDN Basics
A CDN (Content Delivery Network) caches content at edge servers geographically close to users.
WITHOUT CDN                                   WITH CDN
===========                                   ========
User (Tokyo)                                  User (Tokyo)
  |                                             |
  |--- ~150ms ---------------> Origin           |--- ~5ms ---> CDN Edge (Tokyo)
  |                 (US East)  Server           |               |
  |<-- ~150ms ------------------|               |<-- ~5ms ------|
  |                                             |
Total: ~300ms round trip                      Total: ~10ms round trip
                                              (Origin only hit on cache miss)
What CDNs Cache
| Content Type | Cacheable? | TTL Strategy |
|---|---|---|
| Static files (JS, CSS, images) | Always | Long TTL (days/weeks) |
| HTML pages | Often | Short TTL (minutes) or invalidation |
| API responses | Sometimes | Short TTL with cache headers |
| User-specific data | Rarely | Usually not cached at CDN |
| Video/audio | Always | Long TTL |
CDN Cache Flow
User ----> CDN Edge
|
|-- Cache HIT? --> Return cached content (fast)
|
|-- Cache MISS? --> Fetch from origin server
| |
| |--> Store in edge cache
| |--> Return to user
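The HIT/MISS flow above can be sketched as a toy in-memory edge cache with a TTL (illustrative only; real CDNs also honor `Cache-Control` headers, support explicit invalidation, tiered caches, and much more):

```python
import time

class EdgeCache:
    """Toy edge cache: serve from cache on HIT, fetch origin on MISS/expiry."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store: dict[str, tuple[float, str]] = {}  # url -> (stored_at, body)

    def get(self, url: str, fetch_origin) -> tuple[str, str]:
        entry = self.store.get(url)
        if entry and time.monotonic() - entry[0] < self.ttl:
            return "HIT", entry[1]                # fast path: edge copy
        body = fetch_origin(url)                  # slow path: origin fetch
        self.store[url] = (time.monotonic(), body)
        return "MISS", body

edge = EdgeCache(ttl_seconds=60)
origin = lambda url: f"<contents of {url}>"       # stand-in origin server

status1, _ = edge.get("/app.js", origin)          # first request
status2, _ = edge.get("/app.js", origin)          # subsequent request
print(status1, "then", status2)
```

The first request pays the full origin round trip; every request within the TTL is served from the edge.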
Popular CDN Providers
| Provider | Strengths |
|---|---|
| Cloudflare | DDoS protection, free tier, Workers (edge compute) |
| AWS CloudFront | Deep AWS integration, Lambda@Edge |
| Akamai | Largest network, enterprise-grade |
| Fastly | Real-time purging, edge compute (Compute@Edge) |
| Google Cloud CDN | Integrates with GCP load balancer |
Network Topology in Distributed Systems
Common Patterns
1. CLIENT-SERVER 2. PEER-TO-PEER
================== ================
+--------+ +--------+ +---+ +---+
| Client |--->| Server | | A |<--->| B |
+--------+ +--------+ +---+ +---+
+--------+ ^ ^ \ / ^
| Client |--------| | \ / |
+--------+ v X v
+---+/ \+---+
| C |<->| D |
+---+ +---+
3. HUB AND SPOKE 4. MESH (Microservices)
================== ======================
+---+ +-----+ +---+ +---+ +---+ +---+
| A |---->| HUB |<----| B | | A |<->| B |<->| C |
+---+ +-----+ +---+ +---+ +---+ +---+
^ ^ | | |
+---+ | | +---+ +---+ +---+ +---+
| C |------+ +-------| D | | D |<->| E |<->| F |
+---+ +---+ +---+ +---+ +---+
Service Mesh
In microservice architectures, a service mesh manages service-to-service communication:
+-------------------+ +-------------------+
| Service A | | Service B |
| +-------------+ | | +-------------+ |
| | App Code | | | | App Code | |
| +------+------+ | | +------+------+ |
| | | | | |
| +------v------+ | mTLS | +------v------+ |
| | Sidecar |<------------> | Sidecar | |
| | Proxy | | | | Proxy | |
| | (Envoy) | | | | (Envoy) | |
| +-------------+ | | +-------------+ |
+-------------------+ +-------------------+
| |
+---------> Control Plane <----+
(Istio, Linkerd)
The sidecar proxy handles:
- Service discovery — Finding other services
- Load balancing — Distributing requests
- mTLS — Mutual TLS encryption
- Retries and circuit breaking — Resilience
- Observability — Metrics, tracing, logging
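One of those resilience features, circuit breaking, can be sketched in a few lines: after N consecutive failures, stop calling the downstream service for a cooldown period and fail fast instead (a minimal in-process sketch of what a sidecar like Envoy does at the proxy level; thresholds are illustrative):

```python
import time

class CircuitBreaker:
    """Trip after max_failures consecutive errors; fail fast during cooldown."""

    def __init__(self, max_failures: int = 3, cooldown: float = 30.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None              # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0                      # success resets the count
        return result

breaker = CircuitBreaker(max_failures=2, cooldown=30.0)

def flaky():
    raise ConnectionError("service B unreachable")

errors = []
for _ in range(3):
    try:
        breaker.call(flaky)
    except Exception as e:
        errors.append(type(e).__name__)
print(errors)  # third call fails fast without touching service B
```

The first two calls actually reach the (failing) service; the third is rejected immediately, protecting both the caller's latency budget and the struggling downstream service.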
Protocol Comparison Table
| Protocol | Layer | Connection | Use Case | Latency | Complexity |
|---|---|---|---|---|---|
| HTTP/1.1 | Application | TCP, persistent | Web apps, REST APIs | Medium | Low |
| HTTP/2 | Application | TCP, multiplexed | Web apps, APIs | Low | Low |
| HTTP/3 | Application | QUIC (UDP) | Modern web | Lowest | Medium |
| WebSocket | Application | TCP, persistent | Real-time bidirectional | Low | Medium |
| SSE | Application | HTTP, persistent | Real-time server push | Low | Low |
| gRPC | Application | HTTP/2 | Microservices RPC | Very Low | Medium |
| TCP | Transport | Connection-oriented | Reliable data transfer | Medium | N/A |
| UDP | Transport | Connectionless | Streaming, gaming | Low | N/A |
| DNS | Application | UDP (usually) | Name resolution | Variable | N/A |
Key Takeaways
- Every network request adds latency — DNS lookup, TCP handshake, TLS handshake, data transfer. Minimize round trips.
- TCP guarantees delivery; UDP guarantees speed — Choose based on whether correctness or timeliness matters more.
- WebSockets are for real-time bidirectional communication — Do not use them for simple request-response patterns.
- gRPC excels for internal microservice communication — Binary serialization, streaming, and strong typing make it faster than REST for service-to-service calls.
- CDNs reduce latency by moving data closer to users — Static assets should almost always be served from a CDN.
- DNS is a potential bottleneck and a load-balancing tool — Caching and geographic routing at the DNS layer are common in system design.
- Latency dominates bandwidth for most web apps — Focus on reducing round trips, not just increasing pipe size.
Explain-It Challenge
Scenario: Your friend asks you how a video call (like Zoom) works at the network level. Explain:
- Why video and audio use UDP instead of TCP
- What happens when a packet is lost during a call
- Why there is a slight delay when talking to someone on another continent
- How a CDN would NOT help with a live video call (but helps with pre-recorded videos)
Keep your explanation under 2 minutes, as if talking to a non-technical person.
Next -> 9.8.b — API Design