Episode 9 — System Design / 9.8 — Communication and Data Layer
9.8.e — Data Consistency
Big Picture: In a single-server world, consistency is easy — there is one copy of the data and every read sees the latest write. In distributed systems, data lives on multiple nodes, and keeping them in sync is one of the hardest problems in computer science. Every system design interview requires you to reason about consistency tradeoffs.
Table of Contents
- What Is Data Consistency?
- Strong vs Eventual Consistency
- Consistency Levels
- Consistency in Read Replicas
- Distributed Transactions
- Two-Phase Commit (2PC)
- Saga Pattern
- CRDTs (Conflict-Free Replicated Data Types)
- Real-World Tradeoffs
- Key Takeaways
- Explain-It Challenge
What Is Data Consistency?
Consistency = all nodes in a distributed system agree on the current value of data.
CONSISTENT STATE
================
User writes: balance = $500
Node A: balance = $500
Node B: balance = $500
Node C: balance = $500
All nodes agree. Any read returns $500.

INCONSISTENT STATE
==================
User writes: balance = $500
Node A: balance = $500
Node B: balance = $300 <-- stale! (has not received the update yet)
Node C: balance = $500
A read to Node B returns stale data.
Strong vs Eventual Consistency
Strong Consistency
Every read returns the most recent write. The system behaves as if there is a single copy of the data.
Timeline (Strong Consistency):
==============================
t=0 Client A: WRITE balance = $500 to Primary
t=1 Primary: replicate to ALL replicas (wait for ACK)
t=2 All replicas: ACK received
t=3 Primary: returns success to Client A
t=4 Client B: READ from ANY node --> always returns $500
Guarantee: After a write completes, all subsequent reads
see that write, no matter which node they read from.
Eventual Consistency
If no new writes occur, all nodes will eventually converge to the same value. The window of inconsistency is usually milliseconds to seconds.
Timeline (Eventual Consistency):
================================
t=0 Client A: WRITE balance = $500 to Primary
t=1 Primary: returns success to Client A immediately
t=2 Primary: starts replicating to replicas (async)
t=3 Client B: READ from Replica 1 --> returns $300 (stale!)
t=4 Replica 1: receives update, now has $500
t=5 Client B: READ from Replica 1 --> returns $500 (correct)
Guarantee: Data will converge, but reads during replication
may return stale data.
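The eventual-consistency timeline above can be modeled in a few lines of Python. This is a toy sketch: `Primary`, `Replica`, and the `pending` replication queue are illustrative names, not a real database API.

```python
# Toy model of async (eventually consistent) replication: the primary
# acknowledges the write before the replica has applied it.
class Replica:
    def __init__(self):
        self.balance = 300

class Primary:
    def __init__(self, replica):
        self.balance = 300
        self.replica = replica
        self.pending = []        # replication backlog

    def write(self, value):
        self.balance = value     # t=1: ack the client immediately
        self.pending.append(value)

    def replicate(self):         # t=4: async replication catches up
        while self.pending:
            self.replica.balance = self.pending.pop(0)

replica = Replica()
primary = Primary(replica)
primary.write(500)        # client sees success right away
stale = replica.balance   # 300 -- a read here returns stale data
primary.replicate()
fresh = replica.balance   # 500 -- replicas have converged
```

The window between `write` and `replicate` is exactly the "inconsistency window" the timeline describes.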
Comparison
| Property | Strong Consistency | Eventual Consistency |
|---|---|---|
| Read guarantee | Always latest value | May be stale |
| Write latency | Higher (must wait for replicas) | Lower (return immediately) |
| Availability | Lower (fails if replicas unreachable) | Higher (works even if some nodes down) |
| Throughput | Lower | Higher |
| Complexity | Lower (simpler mental model) | Higher (handle stale reads) |
| Examples | Bank balances, inventory counts | Social media likes, view counts |
Consistency Levels
Many databases offer tunable consistency, letting you choose per-query.
Cassandra Consistency Levels
WRITE Consistency Levels:
=========================
Replication Factor = 3 (data on 3 nodes)
ONE: Write to 1 node, return success.
+-----+ +-----+ +-----+
|[ACK]| | | | | --> Fastest, least safe
+-----+ +-----+ +-----+
QUORUM: Write to majority (2 of 3), return success.
+-----+ +-----+ +-----+
|[ACK]| |[ACK]| | | --> Balanced
+-----+ +-----+ +-----+
ALL: Write to all 3 nodes, return success.
+-----+ +-----+ +-----+
|[ACK]| |[ACK]| |[ACK]| --> Slowest, safest
+-----+ +-----+ +-----+
The Quorum Formula
For strong consistency:
R + W > N
Where:
N = number of replicas
W = number of nodes that must ACK a write
R = number of nodes that must respond to a read
Example (N=3):
- W=2, R=2: 2+2=4 > 3 --> STRONG (at least one node has latest)
- W=1, R=1: 1+1=2 < 3 --> EVENTUAL (might read stale data)
- W=3, R=1: 3+1=4 > 3 --> STRONG (all nodes always current)
- W=1, R=3: 1+3=4 > 3 --> STRONG (read all, one must be current)
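The formula and the examples above can be sketched in Python. The helper names are hypothetical; a real quorum read would also carry per-key versions (timestamps or vector clocks) from the storage layer.

```python
def is_strongly_consistent(n, w, r):
    # R + W > N forces every read quorum to overlap every write
    # quorum in at least one node, which must hold the latest write.
    return r + w > n

def quorum_read(responses):
    # responses: (version, value) pairs from R nodes; the highest
    # version among them is the latest committed value.
    return max(responses)[1]

is_strongly_consistent(3, 2, 2)          # True  (W=2, R=2)
is_strongly_consistent(3, 1, 1)          # False (W=1, R=1)
quorum_read([(7, "$300"), (8, "$500")])  # "$500": version 8 is newer
```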
DynamoDB Consistency Options
| Read Type | Behavior | Cost | Latency |
|---|---|---|---|
| Eventually Consistent | May return stale data | 0.5 RCU per 4KB | Lower |
| Strongly Consistent | Always returns latest | 1 RCU per 4KB | Higher |
Interview tip: Always specify your consistency requirements when discussing a system. "We need strong consistency for payment data but can tolerate eventual consistency for the activity feed."
Consistency in Read Replicas
The Problem
User Alice: Creates a post
t=0 [Primary DB] <-- POST /posts {text: "Hello!"}
Primary writes and returns 201 Created
t=1 Alice: GET /posts (my posts)
Load balancer sends to Replica 2
[Replica 2] has not received the replication yet!
Returns: [] (empty list)
Alice: "Where is my post?! The app is broken!"
Solutions
| Pattern | How It Works | Tradeoff |
|---|---|---|
| Read-your-writes | After write, route that user's reads to primary for N seconds | Adds primary load |
| Sticky sessions | Pin user to a specific replica | Uneven load distribution |
| Causal consistency | Track write timestamps, ensure reads see at least that timestamp | Complex implementation |
| Synchronous replication | Primary waits for replicas before returning | Slower writes |
| Client-side merge | Client includes write in local state until confirmed | Complex client logic |
Read-Your-Writes Implementation
Request Flow:
=============
1. Client writes: POST /posts
Server returns: { "id": 42, "write_token": "ts:1705334400" }
2. Client reads: GET /posts
Header: X-Write-Token: ts:1705334400
3. Load balancer checks:
- Replica's last-applied timestamp >= 1705334400?
- YES: Route to replica (safe)
- NO: Route to primary (ensure consistency)
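The routing check above can be sketched as a small function, assuming the load balancer can see each replica's last-applied timestamp (the field names here are illustrative):

```python
def route_read(write_token_ts, replicas, primary):
    # Send the read to any replica that has already applied the
    # client's last write; otherwise fall back to the primary.
    for replica in replicas:
        if replica["last_applied_ts"] >= write_token_ts:
            return replica["name"]
    return primary

replicas = [
    {"name": "replica-1", "last_applied_ts": 1705334300},
    {"name": "replica-2", "last_applied_ts": 1705334400},
]
route_read(1705334400, replicas, "primary")  # "replica-2" has caught up
route_read(1705334500, replicas, "primary")  # "primary": no replica has
```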
Distributed Transactions
In a microservice architecture, a single business operation may span multiple services and databases. How do you ensure atomicity?
E-Commerce Order Placement:
===========================
1. Order Service: Create order record
2. Payment Service: Charge credit card
3. Inventory Service: Reserve items
4. Notification: Send confirmation email
What if step 3 fails after step 2 succeeded?
Customer was charged but items were not reserved!
Approaches
| Approach | How | Pros | Cons |
|---|---|---|---|
| 2PC | Coordinator asks all to prepare, then commit | Strong consistency | Blocking, single point of failure |
| Saga | Chain of local transactions with compensations | Non-blocking, resilient | Eventual consistency, complex rollback |
| Outbox | Write to DB + outbox table, publish events | Reliable event publishing | Extra table, polling/CDC needed |
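The outbox row in the table deserves a concrete sketch, since it is the most interview-friendly of the three. This toy version uses SQLite; `place_order` and `relay_outbox` are hypothetical names, and in production the relay would be a poller or a CDC pipeline rather than a function call.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         event TEXT, published INTEGER DEFAULT 0);
""")

def place_order(order_id):
    # The business write and the event share ONE local transaction:
    # either both are durable or neither is.
    with db:
        db.execute("INSERT INTO orders VALUES (?, 'pending')", (order_id,))
        db.execute("INSERT INTO outbox (event) VALUES (?)",
                   (json.dumps({"type": "OrderCreated", "id": order_id}),))

def relay_outbox(publish):
    # A poller (or CDC) publishes pending events, then marks them done.
    rows = db.execute(
        "SELECT id, event FROM outbox WHERE published = 0").fetchall()
    for row_id, event in rows:
        publish(json.loads(event))
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    db.commit()

place_order(42)
published = []
relay_outbox(published.append)   # publishes the OrderCreated event
relay_outbox(published.append)   # publishes nothing the second time
```

The key property: the event can never be lost (it is committed with the order) and never published for a write that rolled back.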
Two-Phase Commit (2PC)
A protocol for coordinating a transaction across multiple databases/services.
PHASE 1: PREPARE (Voting)
=========================
Coordinator Participant A Participant B Participant C
| | | |
|--- PREPARE ----------->| | |
|--- PREPARE ---------------------------> | |
|--- PREPARE ---------------------------------------------> |
| | | |
|<-- VOTE YES -----------| | |
|<-- VOTE YES ----------------------------| |
|<-- VOTE YES ----------------------------------------------|
| | | |
All voted YES? Proceed to Phase 2.
Any voted NO? ABORT all.
PHASE 2: COMMIT
================
Coordinator Participant A Participant B Participant C
| | | |
|--- COMMIT ------------>| | |
|--- COMMIT ------------------------------>| |
|--- COMMIT ------------------------------------------------>|
| | | |
|<-- ACK ----------------| | |
|<-- ACK ----------------------------------| |
|<-- ACK ---------------------------------------------------|
| | | |
Transaction complete!
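The two phases above can be sketched as a coordinator loop. This is a simplified model (no timeouts, no durable logs, no crash recovery); `Participant` and its methods are illustrative names standing in for resource managers.

```python
class Participant:
    # Stand-in for a resource manager (e.g., a database holding locks).
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: durably record intent and hold locks until the
        # coordinator's decision arrives -- this is where 2PC blocks.
        if self.can_commit:
            self.state = "prepared"
            return True
        return False

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    # Phase 1: collect votes. Phase 2: everyone commits, or everyone aborts.
    if all(p.prepare() for p in participants):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"

two_phase_commit([Participant("A"), Participant("B")])         # "committed"
two_phase_commit([Participant("A"), Participant("B", False)])  # "aborted"
```

Note what the sketch leaves out: if the coordinator dies between the two phases, every prepared participant is stuck holding locks. That gap is the blocking problem discussed next.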
2PC Problems
| Problem | Description |
|---|---|
| Blocking | If coordinator crashes after PREPARE but before COMMIT, participants are stuck holding locks |
| Latency | 2 round trips minimum, participants hold locks throughout |
| Single point of failure | Coordinator is a bottleneck |
| Not partition tolerant | Cannot proceed if coordinator is partitioned from participants |
| Reduced throughput | Locks are held across the entire distributed transaction |
Interview insight: 2PC still appears inside database systems and XA-style transaction managers (PostgreSQL exposes it via PREPARE TRANSACTION / COMMIT PREPARED), but it is generally avoided across microservice boundaries due to its blocking nature.
Saga Pattern
A saga is a sequence of local transactions. Each step has a compensating transaction that undoes its effect if a later step fails.
Choreography-Based Saga
Each service listens for events and reacts independently.
Order Service Payment Service Inventory Service Notification
| | | |
|-- OrderCreated --->| | |
| | | |
| |-- PaymentCharged ->| |
| | | |
| | |-- ItemsReserved ->|
| | | |
| | | SendEmail
| | | |
IF Inventory fails:
| | | |
| |<- ReserveFailed ---| |
| | | |
| |-- RefundPayment -->| |
|<-- OrderCancelled -| | |
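The event flow above can be modeled with a tiny in-process event bus (a stand-in for a real message broker; all handler and event names here are illustrative). Each "service" is just a function that reacts to an event and publishes the next one:

```python
# Minimal in-process event bus standing in for a message broker.
handlers = {}

def on(event_type):
    def register(fn):
        handlers.setdefault(event_type, []).append(fn)
        return fn
    return register

def publish(event_type, order):
    for fn in handlers.get(event_type, []):
        fn(order)

trace = []

@on("OrderCreated")            # Payment Service reacts
def charge_payment(order):
    trace.append("payment charged")
    publish("PaymentCharged", order)

@on("PaymentCharged")          # Inventory Service reacts
def reserve_items(order):
    if order.get("out_of_stock"):
        publish("ReserveFailed", order)
    else:
        trace.append("items reserved")
        publish("ItemsReserved", order)

@on("ItemsReserved")           # Notification reacts
def send_email(order):
    trace.append("email sent")

@on("ReserveFailed")           # Payment compensates, Order cancels
def refund_payment(order):
    trace.append("payment refunded")
    publish("OrderCancelled", order)

@on("OrderCancelled")
def cancel_order(order):
    trace.append("order cancelled")

publish("OrderCreated", {"id": 1})                        # happy path
publish("OrderCreated", {"id": 2, "out_of_stock": True})  # compensation path
```

Notice that no component knows the whole flow: each service only knows which events it consumes and which it emits. That is both the appeal and the debugging pain of choreography.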
Orchestration-Based Saga
A central orchestrator coordinates the steps.
+---------------------+
| Saga Orchestrator |
| (Order Saga) |
+----------+----------+
|
Step 1: |--- Create Order --------> Order Service
|<-- Order Created ---------|
|
Step 2: |--- Charge Payment ------> Payment Service
|<-- Payment Charged -------|
|
Step 3: |--- Reserve Items -------> Inventory Service
|<-- FAILED! Items OOS ----|
|
Compensate:
Step 2c: |--- Refund Payment ------> Payment Service
|<-- Refund Complete -------|
|
Step 1c: |--- Cancel Order --------> Order Service
|<-- Order Cancelled -------|
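The orchestrator's core loop is simple: run steps forward, and on failure run the compensations of completed steps in reverse. A minimal sketch, with hypothetical step names and no retries or persistence (a real orchestrator must persist saga state to survive crashes):

```python
class SagaStep:
    def __init__(self, name, action, compensation):
        self.name = name
        self.action = action
        self.compensation = compensation

def run_saga(steps, log):
    completed = []
    for step in steps:
        try:
            step.action()
            log.append(step.name)
            completed.append(step)
        except Exception:
            # A step failed: compensate completed steps in reverse order
            for done in reversed(completed):
                done.compensation()
                log.append("undo " + done.name)
            return "compensated"
    return "completed"

def reserve_items():
    raise RuntimeError("items out of stock")   # Step 3 fails

log = []
result = run_saga([
    SagaStep("create order",   lambda: None, lambda: None),
    SagaStep("charge payment", lambda: None, lambda: None),
    SagaStep("reserve items",  reserve_items, lambda: None),
], log)
# result: "compensated"
# log: ['create order', 'charge payment',
#       'undo charge payment', 'undo create order']
```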
Choreography vs Orchestration
| Factor | Choreography | Orchestration |
|---|---|---|
| Coupling | Loose (event-driven) | Tighter (orchestrator knows all steps) |
| Complexity | Distributed across services | Centralized in orchestrator |
| Debugging | Harder (trace events across services) | Easier (single flow to follow) |
| Single point of failure | None | Orchestrator |
| Adding steps | Publish new event, add listener | Modify orchestrator |
| Best for | Simple sagas (2-3 steps) | Complex sagas (5+ steps, branching) |
Saga with Compensating Transactions
| Step | Action | Compensation |
|---|---|---|
| 1. Create order | order.status = 'pending' | order.status = 'cancelled' |
| 2. Reserve inventory | item.reserved += qty | item.reserved -= qty |
| 3. Charge payment | payment.charge(amount) | payment.refund(amount) |
| 4. Confirm order | order.status = 'confirmed' | (pivot step: once confirmed, later steps are retried rather than compensated) |
| 5. Send notification | email.send(confirmation) | email.send(cancellation) |
Saga State Machine
+----------+ +----------+ +----------+ +-----------+
| Order | --> | Payment | --> | Inventory| --> | Completed |
| Created | | Charged | | Reserved | | |
+----+-----+ +----+-----+ +----+-----+ +-----------+
| | |
v v v
+---------+ +-----------+ +-----------+
| Order | | Payment | | Inventory |
| Cancel | <-- | Refunded | <-- | Release |
+---------+ +-----------+ +-----------+
|
v
+---------+
| Failed |
+---------+
CRDTs (Conflict-Free Replicated Data Types)
CRDTs are data structures that can be replicated across nodes, updated independently, and always converge to a consistent state without coordination.
The Problem CRDTs Solve
Traditional approach:
=====================
Node A: counter = 5, increment -> counter = 6
Node B: counter = 5, increment -> counter = 6
Merge: counter = 6? or 7? CONFLICT!
CRDT approach (G-Counter):
==========================
Node A: {A: 3, B: 2} -> increment -> {A: 4, B: 2}
Node B: {A: 3, B: 2} -> increment -> {A: 3, B: 3}
Merge: {A: max(4,3), B: max(2,3)} = {A: 4, B: 3} = total 7
No conflict! Always correct!
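The G-Counter above fits in a few lines of Python. This is a sketch of the standard construction: each node increments only its own slot, and merge takes the element-wise max, which is commutative, associative, and idempotent, so replicas converge regardless of merge order.

```python
class GCounter:
    def __init__(self, node_id, counts=None):
        self.node_id = node_id
        self.counts = dict(counts or {})   # per-node increment counts

    def increment(self):
        # A node may only increment its OWN slot
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + 1

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max: safe to apply in any order, any number of times
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

# The scenario above: both replicas start at {A: 3, B: 2} = 5
a = GCounter("A", {"A": 3, "B": 2})
b = GCounter("B", {"A": 3, "B": 2})
a.increment()    # A: {A: 4, B: 2}
b.increment()    # B: {A: 3, B: 3}
a.merge(b)
b.merge(a)
a.value()        # 7 -- both replicas agree, no conflict
```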
Common CRDT Types
| CRDT | What It Models | Operations | Example Use |
|---|---|---|---|
| G-Counter | Grow-only counter | Increment | View counts, like counts |
| PN-Counter | Counter (up and down) | Increment, Decrement | Inventory (careful: concurrent decrements can overshoot), votes |
| G-Set | Grow-only set | Add | Tags, visited pages |
| OR-Set | Set (add and remove) | Add, Remove | Shopping cart items |
| LWW-Register | Last-Writer-Wins | Set | User profile fields |
| LWW-Map | Map with LWW per key | Set, Delete | Collaborative document fields |
Shopping Cart as OR-Set CRDT
User's cart is replicated on 2 nodes:
Node A (US): Node B (EU):
Cart: {laptop, mouse} Cart: {laptop, mouse}
User adds "keyboard" (via US): User removes "mouse" (via EU):
Cart: {laptop, mouse, keyboard} Cart: {laptop}
Merge (OR-Set rules):
- "keyboard" was added -> include it
- "mouse" was removed -> exclude it
- Result: {laptop, keyboard}
Both nodes converge to {laptop, keyboard}. No conflicts.
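The cart scenario can be replayed with one common OR-Set formulation: every add gets a unique tag, and remove deletes only the tags that replica has observed, so a concurrent add (with a fresh tag) survives a concurrent remove. A sketch, not a production implementation (real OR-Sets also garbage-collect tombstones):

```python
import uuid

class ORSet:
    def __init__(self):
        self.adds = {}      # element -> set of unique add-tags
        self.removes = {}   # element -> set of removed (observed) tags

    def add(self, element):
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element):
        # Remove only the add-tags this replica has seen so far
        self.removes.setdefault(element, set()).update(
            self.adds.get(element, set()))

    def contains(self, element):
        live = self.adds.get(element, set()) - self.removes.get(element, set())
        return bool(live)

    def merge(self, other):
        for e, tags in other.adds.items():
            self.adds.setdefault(e, set()).update(tags)
        for e, tags in other.removes.items():
            self.removes.setdefault(e, set()).update(tags)

# Replay the cart scenario above
us, eu = ORSet(), ORSet()
us.add("laptop"); us.add("mouse")
eu.merge(us)             # both replicas start with {laptop, mouse}
us.add("keyboard")       # concurrent add via US
eu.remove("mouse")       # concurrent remove via EU
us.merge(eu); eu.merge(us)
us.contains("keyboard")  # True  -- the add wins
us.contains("mouse")     # False -- the remove wins
```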
Interview note: CRDTs are used by Riak, Redis (CRDTs in Redis Enterprise), Apple (Notes app sync), Figma (collaborative editing). You do not need to implement them, but knowing they exist and what they solve is valuable.
Real-World Tradeoffs
Case Study: Consistency Choices at Scale
| Company | System | Consistency Choice | Why |
|---|---|---|---|
| Amazon | Shopping cart | Eventual (AP) | Cart should always be available; merge conflicts later |
| Amazon | Order placement | Strong (CP) | Cannot charge twice or lose an order |
| Twitter | Tweet timeline | Eventual | Seeing a tweet 2 seconds late is acceptable |
| Twitter | Direct messages | Strong | Losing a message is not acceptable |
| Uber | Ride matching | Strong | Cannot assign same driver to two riders |
| Uber | Trip history | Eventual | Viewing yesterday's trip a second late is fine |
| Netflix | View count | Eventual | Approximate counts are fine for recommendations |
| Stripe | Payment processing | Strong | Financial transactions demand exactness |
The Consistency Spectrum
STRONG <=========================================================> EVENTUAL

| Level | Guarantee | Typical Use |
|---|---|---|
| Linearizable | Every read sees the latest write | Bank balances |
| Sequential | Operations appear in some total order | Distributed locks |
| Causal | Causally related operations are ordered | Social media comments (replies after parent) |
| Session | Per-session guarantees (read-your-writes) | Shopping cart |
| Eventual | Replicas converge eventually | View counts, analytics |
Practical Guidelines
| Data Type | Consistency Needed | Why |
|---|---|---|
| Financial transactions | Strong | Money must be exact |
| Inventory counts | Strong | Over-selling is costly |
| User authentication | Strong | Security cannot be eventual |
| Social media likes | Eventual | 142 vs 143 likes does not matter |
| Search indexes | Eventual | Slight delay in searchability is fine |
| Analytics/metrics | Eventual | Aggregates smooth out inconsistencies |
| Configuration/feature flags | Strong | Incorrect config causes outages |
| User profiles | Session (read-your-writes) | User should see their own changes |
Key Takeaways
- Strong consistency is expensive — It requires coordination between nodes, adding latency and reducing availability. Use it only when correctness demands it.
- Eventual consistency is not "broken" — It is a deliberate choice that enables higher availability and performance. Most user-facing data can tolerate brief inconsistency.
- The quorum formula (R + W > N) is your tool for tuning consistency — Memorize it for interviews.
- 2PC is a coordination protocol, not a microservice pattern — It blocks, does not scale, and should be avoided across service boundaries.
- Sagas are the standard for distributed transactions in microservices — Choreography for simple flows, orchestration for complex ones.
- CRDTs eliminate conflicts by design — They are the most elegant solution for concurrent updates across replicas, but they only work for specific data types.
- Different data in the same system needs different consistency — Payment data needs strong consistency while activity feeds can be eventual. Design per use case, not per system.
- Read-your-writes consistency solves 80% of user-facing issues — Users expect to see their own changes immediately, even if other users see them with a delay.
Explain-It Challenge
Scenario: You are the tech lead for a ride-sharing app (like Uber). Your system must handle:
- Rider requests a ride — Must not double-book a driver
- Payment processing — Must not charge twice
- Driver location updates — Sent every 3 seconds from the driver's phone
- Trip history — Riders can view past trips
- Surge pricing — Prices adjust based on demand
For each feature, specify:
- What consistency level you would use and why
- Whether you would use a saga, 2PC, or neither
- What happens if a network partition occurs during this operation
Back to 9.8 README