Episode 9 — System Design / 9.8 — Communication and Data Layer

9.8.e — Data Consistency

Big Picture: In a single-server world, consistency is easy — there is one copy of the data and every read sees the latest write. In distributed systems, data lives on multiple nodes, and keeping them in sync is one of the hardest problems in computer science. Every system design interview requires you to reason about consistency tradeoffs.


Table of Contents

  1. What Is Data Consistency?
  2. Strong vs Eventual Consistency
  3. Consistency Levels
  4. Consistency in Read Replicas
  5. Distributed Transactions
  6. Two-Phase Commit (2PC)
  7. Saga Pattern
  8. CRDTs (Conflict-Free Replicated Data Types)
  9. Real-World Tradeoffs
  10. Key Takeaways
  11. Explain-It Challenge

What Is Data Consistency?

Consistency = all nodes in a distributed system agree on the current value of data.

  CONSISTENT STATE                        INCONSISTENT STATE
  ================                        ==================

  User writes: balance = $500            User writes: balance = $500

  Node A: balance = $500                 Node A: balance = $500
  Node B: balance = $500                 Node B: balance = $300  <-- stale!
  Node C: balance = $500                 Node C: balance = $500
  
  All nodes agree.                       Node B has not received the update yet.
  Any read returns $500.                 A read to Node B returns wrong data.

Strong vs Eventual Consistency

Strong Consistency

Every read returns the most recent write. The system behaves as if there is a single copy of the data.

  Timeline (Strong Consistency):
  ==============================
  
  t=0   Client A: WRITE balance = $500 to Primary
  t=1   Primary: replicate to ALL replicas (wait for ACK)
  t=2   All replicas: ACK received
  t=3   Primary: returns success to Client A
  t=4   Client B: READ from ANY node --> always returns $500
  
  Guarantee: After a write completes, all subsequent reads
  see that write, no matter which node they read from.

Eventual Consistency

If no new writes occur, all nodes will eventually converge to the same value. The window of inconsistency is usually milliseconds to seconds.

  Timeline (Eventual Consistency):
  ================================
  
  t=0   Client A: WRITE balance = $500 to Primary
  t=1   Primary: returns success to Client A immediately
  t=2   Primary: starts replicating to replicas (async)
  t=3   Client B: READ from Replica 1 --> returns $300 (stale!)
  t=4   Replica 1: receives update, now has $500
  t=5   Client B: READ from Replica 1 --> returns $500 (correct)
  
  Guarantee: Data will converge, but reads during replication
  may return stale data.

Comparison

| Property | Strong Consistency | Eventual Consistency |
|---|---|---|
| Read guarantee | Always latest value | May be stale |
| Write latency | Higher (must wait for replicas) | Lower (return immediately) |
| Availability | Lower (fails if replicas unreachable) | Higher (works even if some nodes down) |
| Throughput | Lower | Higher |
| Complexity | Lower (simpler mental model) | Higher (handle stale reads) |
| Examples | Bank balances, inventory counts | Social media likes, view counts |

Consistency Levels

Many databases offer tunable consistency, letting you choose per-query.

Cassandra Consistency Levels

  WRITE Consistency Levels:
  =========================
  
  Replication Factor = 3 (data on 3 nodes)
  
  ONE:     Write to 1 node, return success.
           +-----+  +-----+  +-----+
           |[ACK]|  |     |  |     |    --> Fastest, least safe
           +-----+  +-----+  +-----+
  
  QUORUM:  Write to majority (2 of 3), return success.
           +-----+  +-----+  +-----+
           |[ACK]|  |[ACK]|  |     |    --> Balanced
           +-----+  +-----+  +-----+
  
  ALL:     Write to all 3 nodes, return success.
           +-----+  +-----+  +-----+
           |[ACK]|  |[ACK]|  |[ACK]|    --> Slowest, safest
           +-----+  +-----+  +-----+

The Quorum Formula

  For strong consistency:

  R + W > N

  Where:
    N = number of replicas
    W = number of nodes that must ACK a write
    R = number of nodes that must respond to a read
  
  Example (N=3):
  - W=2, R=2: 2+2=4 > 3 --> STRONG (at least one node has latest)
  - W=1, R=1: 1+1=2 < 3 --> EVENTUAL (might read stale data)
  - W=3, R=1: 3+1=4 > 3 --> STRONG (all nodes always current)
  - W=1, R=3: 1+3=4 > 3 --> STRONG (read all, one must be current)
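The quorum condition is simple enough to check in code. A tiny helper (the function name is illustrative) makes the four examples above concrete:

```python
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """R + W > N guarantees that any read quorum overlaps any
    write quorum in at least one node holding the latest value."""
    return r + w > n

# The four examples from above, with N = 3 replicas:
assert is_strongly_consistent(3, 2, 2)      # quorum writes + quorum reads
assert not is_strongly_consistent(3, 1, 1)  # fast, but eventual
assert is_strongly_consistent(3, 3, 1)      # slow writes, fast reads
assert is_strongly_consistent(3, 1, 3)      # fast writes, slow reads
```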

DynamoDB Consistency Options

| Read Type | Behavior | Cost | Latency |
|---|---|---|---|
| Eventually Consistent | May return stale data | 0.5 RCU per 4 KB | Lower |
| Strongly Consistent | Always returns latest | 1 RCU per 4 KB | Higher |

Interview tip: Always specify your consistency requirements when discussing a system. "We need strong consistency for payment data but can tolerate eventual consistency for the activity feed."


Consistency in Read Replicas

The Problem

  User Alice: Creates a post
  
  t=0  [Primary DB] <-- POST /posts {text: "Hello!"}
       Primary writes and returns 201 Created
       
  t=1  Alice: GET /posts (my posts)
       Load balancer sends to Replica 2
       
       [Replica 2] has not received the replication yet!
       Returns: [] (empty list)
       
       Alice: "Where is my post?! The app is broken!"

Solutions

| Pattern | How It Works | Tradeoff |
|---|---|---|
| Read-your-writes | After write, route that user's reads to primary for N seconds | Adds primary load |
| Sticky sessions | Pin user to a specific replica | Uneven load distribution |
| Causal consistency | Track write timestamps, ensure reads see at least that timestamp | Complex implementation |
| Synchronous replication | Primary waits for replicas before returning | Slower writes |
| Client-side merge | Client includes write in local state until confirmed | Complex client logic |

Read-Your-Writes Implementation

  Request Flow:
  =============

  1. Client writes: POST /posts
     Server returns: { "id": 42, "write_token": "ts:1705334400" }

  2. Client reads: GET /posts
     Header: X-Write-Token: ts:1705334400

  3. Load balancer checks:
     - Replica's last-applied timestamp >= 1705334400?
     - YES: Route to replica (safe)
     - NO: Route to primary (ensure consistency)
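The routing decision above can be sketched as a small function. This is a minimal illustration, not a production load balancer; the `Node` type and the timestamps are hypothetical stand-ins for real replication metadata:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    name: str
    last_applied_ts: int  # timestamp of the last replicated write applied

def route_read(write_token_ts: Optional[int],
               replicas: List[Node], primary: Node) -> Node:
    """Send the read to a replica only if it has caught up to the
    client's last write; otherwise fall back to the primary."""
    if write_token_ts is None:
        return replicas[0]  # no recent write: any replica is fine
    for replica in replicas:
        if replica.last_applied_ts >= write_token_ts:
            return replica  # replica already has the client's write
    return primary  # no replica has caught up yet

primary = Node("primary", last_applied_ts=1705334400)
replicas = [Node("replica-1", 1705334300), Node("replica-2", 1705334400)]
assert route_read(1705334400, replicas, primary).name == "replica-2"
assert route_read(1705334500, replicas, primary).name == "primary"
```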

Distributed Transactions

In a microservice architecture, a single business operation may span multiple services and databases. How do you ensure atomicity?

  E-Commerce Order Placement:
  ===========================
  
  1. Order Service:     Create order record
  2. Payment Service:   Charge credit card
  3. Inventory Service: Reserve items
  4. Notification:      Send confirmation email
  
  What if step 3 fails after step 2 succeeded?
  Customer was charged but items were not reserved!

Approaches

| Approach | How | Pros | Cons |
|---|---|---|---|
| 2PC | Coordinator asks all to prepare, then commit | Strong consistency | Blocking, single point of failure |
| Saga | Chain of local transactions with compensations | Non-blocking, resilient | Eventual consistency, complex rollback |
| Outbox | Write to DB + outbox table, publish events | Reliable event publishing | Extra table, polling/CDC needed |

Two-Phase Commit (2PC)

A protocol for coordinating a transaction across multiple databases/services.

  PHASE 1: PREPARE (Voting)
  =========================
  
  Coordinator              Participant A     Participant B     Participant C
       |                        |                 |                 |
       |--- PREPARE ----------->|                 |                 |
       |--- PREPARE --------------------------->  |                 |
       |--- PREPARE --------------------------------------------->  |
       |                        |                 |                 |
       |<-- VOTE YES -----------|                 |                 |
       |<-- VOTE YES ----------------------------|                 |
       |<-- VOTE YES ----------------------------------------------|
       |                        |                 |                 |
  
  All voted YES? Proceed to Phase 2.
  Any voted NO? ABORT all.

  PHASE 2: COMMIT
  ================
  
  Coordinator              Participant A     Participant B     Participant C
       |                        |                 |                 |
       |--- COMMIT ------------>|                 |                 |
       |--- COMMIT ------------------------------>|                 |
       |--- COMMIT ------------------------------------------------>|
       |                        |                 |                 |
       |<-- ACK ----------------|                 |                 |
       |<-- ACK ----------------------------------|                 |
       |<-- ACK ---------------------------------------------------|
       |                        |                 |                 |
  Transaction complete!
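The two phases above can be condensed into a short simulation. This is a toy sketch of the protocol's happy path and abort path (the `Participant` class is hypothetical); a real implementation would also need durable logs and timeout handling:

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "idle"

    def prepare(self, txid):
        # Phase 1: persist intent, acquire locks, then vote
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self, txid):
        self.state = "committed"

    def abort(self, txid):
        self.state = "aborted"

def two_phase_commit(txid, participants):
    votes = [p.prepare(txid) for p in participants]  # Phase 1: PREPARE
    if all(votes):
        for p in participants:                       # Phase 2: COMMIT
            p.commit(txid)
        return "committed"
    for p in participants:                           # any NO vote: abort all
        p.abort(txid)
    return "aborted"

assert two_phase_commit("tx1", [Participant("a"), Participant("b")]) == "committed"
assert two_phase_commit("tx2", [Participant("a"), Participant("b", can_commit=False)]) == "aborted"
```

Note what the sketch cannot show: between `prepare` and the coordinator's decision, real participants sit holding locks, which is exactly the blocking problem discussed next.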

2PC Problems

| Problem | Description |
|---|---|
| Blocking | If the coordinator crashes after PREPARE but before COMMIT, participants are stuck holding locks |
| Latency | 2 round trips minimum, participants hold locks throughout |
| Single point of failure | The coordinator is a bottleneck |
| Not partition tolerant | Cannot proceed if the coordinator is partitioned from participants |
| Reduced throughput | Locks are held across the entire distributed transaction |

Interview insight: 2PC still has legitimate uses inside database systems and in XA distributed transactions (e.g., PostgreSQL's PREPARE TRANSACTION), but it is generally avoided across microservice boundaries due to its blocking nature.


Saga Pattern

A saga is a sequence of local transactions. Each step has a compensating transaction that undoes its effect if a later step fails.

Choreography-Based Saga

Each service listens for events and reacts independently.

  Order Service       Payment Service     Inventory Service    Notification
       |                    |                    |                   |
       |-- OrderCreated --->|                    |                   |
       |                    |                    |                   |
       |                    |-- PaymentCharged ->|                   |
       |                    |                    |                   |
       |                    |                    |-- ItemsReserved ->|
       |                    |                    |                   |
       |                    |                    |               SendEmail
       |                    |                    |                   |
  
  IF Inventory fails:
       |                    |                    |                   |
       |                    |<- ReserveFailed ---|                   |
       |                    | (refunds payment)  |                   |
       |<- PaymentRefunded -|                    |                   |
   (cancels order)          |                    |                   |

Orchestration-Based Saga

A central orchestrator coordinates the steps.

  +---------------------+
  |   Saga Orchestrator  |
  |   (Order Saga)       |
  +----------+----------+
             |
    Step 1:  |--- Create Order --------> Order Service
             |<-- Order Created ---------|
             |
    Step 2:  |--- Charge Payment ------> Payment Service
             |<-- Payment Charged -------|
             |
    Step 3:  |--- Reserve Items -------> Inventory Service
             |<-- FAILED: out of stock -|
             |
    Compensate:
    Step 2c: |--- Refund Payment ------> Payment Service
             |<-- Refund Complete -------|
             |
    Step 1c: |--- Cancel Order --------> Order Service
             |<-- Order Cancelled -------|

Choreography vs Orchestration

| Factor | Choreography | Orchestration |
|---|---|---|
| Coupling | Loose (event-driven) | Tighter (orchestrator knows all steps) |
| Complexity | Distributed across services | Centralized in orchestrator |
| Debugging | Harder (trace events across services) | Easier (single flow to follow) |
| Single point of failure | None | Orchestrator |
| Adding steps | Publish new event, add listener | Modify orchestrator |
| Best for | Simple sagas (2-3 steps) | Complex sagas (5+ steps, branching) |

Saga with Compensating Transactions

| Step | Action | Compensation |
|---|---|---|
| 1. Create order | order.status = 'pending' | order.status = 'cancelled' |
| 2. Reserve inventory | item.reserved += qty | item.reserved -= qty |
| 3. Charge payment | payment.charge(amount) | payment.refund(amount) |
| 4. Confirm order | order.status = 'confirmed' | (no compensation needed, last step) |
| 5. Send notification | email.send(confirmation) | email.send(cancellation) |
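An orchestrator that runs such a step/compensation table can be sketched in a few lines. The step functions below are hypothetical stand-ins for real service calls, and the failure in `charge_payment` is simulated:

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; on failure,
    run the compensations of completed steps in reverse order."""
    completed = []
    for action, compensation in steps:
        try:
            action()
            completed.append(compensation)
        except Exception:
            for comp in reversed(completed):
                comp()
            return "compensated"
    return "completed"

state = {"order": None, "reserved": 0}

def create_order():      state["order"] = "pending"
def cancel_order():      state["order"] = "cancelled"
def reserve_inventory(): state["reserved"] += 1
def release_inventory(): state["reserved"] -= 1
def charge_payment():    raise RuntimeError("card declined")  # simulated failure
def refund_payment():    pass  # nothing was charged in this toy run

result = run_saga([
    (create_order, cancel_order),
    (reserve_inventory, release_inventory),
    (charge_payment, refund_payment),
])
assert result == "compensated"
assert state == {"order": "cancelled", "reserved": 0}
```

Reversing the compensation order matters: later steps may depend on earlier ones, so the undo must unwind them like a stack.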

Saga State Machine

  +----------+     +----------+     +----------+     +-----------+
  |  Order   | --> | Payment  | --> | Inventory| --> | Completed |
  |  Created |     | Charged  |     | Reserved |     |           |
  +----+-----+     +----+-----+     +----+-----+     +-----------+
       |                |                |
       v                v                v
  +---------+     +-----------+     +-----------+
  |  Order  |     | Payment   |     | Inventory |
  | Cancel  | <-- | Refunded  | <-- | Release   |
  +---------+     +-----------+     +-----------+
       |
       v
  +---------+
  |  Failed |
  +---------+

CRDTs (Conflict-Free Replicated Data Types)

CRDTs are data structures that can be replicated across nodes, updated independently, and always converge to a consistent state without coordination.

The Problem CRDTs Solve

  Traditional approach:
  =====================
  Node A: counter = 5, increment -> counter = 6
  Node B: counter = 5, increment -> counter = 6
  
  Merge: counter = 6? or 7? CONFLICT!
  
  CRDT approach (G-Counter):
  ==========================
  Node A: {A: 3, B: 2} -> increment -> {A: 4, B: 2}
  Node B: {A: 3, B: 2} -> increment -> {A: 3, B: 3}
  
  Merge: {A: max(4,3), B: max(2,3)} = {A: 4, B: 3} = total 7
  No conflict! Always correct!
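The G-Counter merge rule above translates directly to code. A minimal sketch, reproducing the scenario just shown:

```python
class GCounter:
    """Grow-only counter: each node increments only its own slot;
    merging takes the per-node maximum, so merges never conflict."""
    def __init__(self, node_id, counts=None):
        self.node_id = node_id
        self.counts = dict(counts or {})

    def increment(self, n=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

# Both replicas start at {A: 3, B: 2}, as in the diagram above
a = GCounter("A", {"A": 3, "B": 2})
b = GCounter("B", {"A": 3, "B": 2})
a.increment()  # {A: 4, B: 2}
b.increment()  # {A: 3, B: 3}
a.merge(b)
assert a.counts == {"A": 4, "B": 3} and a.value() == 7
```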

Common CRDT Types

| CRDT | What It Models | Operations | Example Use |
|---|---|---|---|
| G-Counter | Grow-only counter | Increment | View counts, like counts |
| PN-Counter | Counter (up and down) | Increment, Decrement | Inventory (with care), votes |
| G-Set | Grow-only set | Add | Tags, visited pages |
| OR-Set | Set (add and remove) | Add, Remove | Shopping cart items |
| LWW-Register | Last-writer-wins value | Set | User profile fields |
| LWW-Map | Map with LWW per key | Set, Delete | Collaborative document fields |

Shopping Cart as OR-Set CRDT

  User's cart is replicated on 2 nodes:
  
  Node A (US):                    Node B (EU):
  Cart: {laptop, mouse}          Cart: {laptop, mouse}
  
  User adds "keyboard" (via US):  User removes "mouse" (via EU):
  Cart: {laptop, mouse, keyboard} Cart: {laptop}
  
  Merge (OR-Set rules):
  - "keyboard" was added -> include it
  - "mouse" was removed -> exclude it
  - Result: {laptop, keyboard}
  
  Both nodes converge to {laptop, keyboard}. No conflicts.
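The merge rules that produce this result can be sketched as follows: each add gets a unique tag, and a remove deletes only the tags that replica has observed, so a concurrent add always survives. This is illustrative only; real implementations also garbage-collect the remove tombstones:

```python
import uuid

class ORSet:
    """Observed-Remove Set: an element is present if it has at
    least one add tag not yet observed by a remove."""
    def __init__(self):
        self.adds = {}     # element -> set of unique add tags
        self.removes = {}  # element -> set of removed tags

    def add(self, element):
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element):
        # Remove only the add tags this replica has observed
        observed = self.adds.get(element, set())
        self.removes.setdefault(element, set()).update(observed)

    def merge(self, other):
        for e, tags in other.adds.items():
            self.adds.setdefault(e, set()).update(tags)
        for e, tags in other.removes.items():
            self.removes.setdefault(e, set()).update(tags)

    def elements(self):
        return {e for e, tags in self.adds.items()
                if tags - self.removes.get(e, set())}

# The cart scenario from above:
us, eu = ORSet(), ORSet()
us.add("laptop"); us.add("mouse")
eu.merge(us)                  # both replicas see {laptop, mouse}
us.add("keyboard")            # concurrent update via US node
eu.remove("mouse")            # concurrent update via EU node
us.merge(eu); eu.merge(us)    # replicas exchange state
assert us.elements() == eu.elements() == {"laptop", "keyboard"}
```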

Interview note: CRDTs are used by Riak, Redis (CRDTs in Redis Enterprise), Apple (Notes app sync), Figma (collaborative editing). You do not need to implement them, but knowing they exist and what they solve is valuable.


Real-World Tradeoffs

Case Study: Consistency Choices at Scale

| Company | System | Consistency Choice | Why |
|---|---|---|---|
| Amazon | Shopping cart | Eventual (AP) | Cart should always be available; merge conflicts later |
| Amazon | Order placement | Strong (CP) | Cannot charge twice or lose an order |
| Twitter | Tweet timeline | Eventual | Seeing a tweet 2 seconds late is acceptable |
| Twitter | Direct messages | Strong | Losing a message is not acceptable |
| Uber | Ride matching | Strong | Cannot assign same driver to two riders |
| Uber | Trip history | Eventual | Viewing yesterday's trip a second late is fine |
| Netflix | View count | Eventual | Approximate counts are fine for recommendations |
| Stripe | Payment processing | Strong | Financial transactions demand exactness |

The Consistency Spectrum

  STRONG                                                          EVENTUAL
  <=================================================================>
  |              |              |              |              |
  Linearizable   Sequential    Causal         Session       Eventual
  
  Every read     Operations    Causally       Per-session    Will converge
  sees latest    appear in     related ops    guarantees     eventually
  write          some total    are ordered    (read-your-
                 order                        writes)
  
  Bank           Distributed   Social media   Shopping       View counts
  balances       locks         comments       cart           Analytics
                               (replies after 
                                parent)

Practical Guidelines

| Data Type | Consistency Needed | Why |
|---|---|---|
| Financial transactions | Strong | Money must be exact |
| Inventory counts | Strong | Over-selling is costly |
| User authentication | Strong | Security cannot be eventual |
| Social media likes | Eventual | 142 vs 143 likes does not matter |
| Search indexes | Eventual | Slight delay in searchability is fine |
| Analytics/metrics | Eventual | Aggregates smooth out inconsistencies |
| Configuration/feature flags | Strong | Incorrect config causes outages |
| User profiles | Session (read-your-writes) | User should see their own changes |

Key Takeaways

  1. Strong consistency is expensive — It requires coordination between nodes, adding latency and reducing availability. Use it only when correctness demands it.
  2. Eventual consistency is not "broken" — It is a deliberate choice that enables higher availability and performance. Most user-facing data can tolerate brief inconsistency.
  3. The quorum formula (R + W > N) is your tool for tuning consistency — Memorize it for interviews.
  4. 2PC is a coordination protocol, not a microservice pattern — It blocks, does not scale, and should be avoided across service boundaries.
  5. Sagas are the standard for distributed transactions in microservices — Choreography for simple flows, orchestration for complex ones.
  6. CRDTs eliminate conflicts by design — They are the most elegant solution for concurrent updates across replicas, but they only work for specific data types.
  7. Different data in the same system needs different consistency — Payment data needs strong consistency while activity feeds can be eventual. Design per use case, not per system.
  8. Read-your-writes consistency solves 80% of user-facing issues — Users expect to see their own changes immediately, even if other users see them with a delay.

Explain-It Challenge

Scenario: You are the tech lead for a ride-sharing app (like Uber). Your system must handle:

  1. Rider requests a ride — Must not double-book a driver
  2. Payment processing — Must not charge twice
  3. Driver location updates — Sent every 3 seconds from the driver's phone
  4. Trip history — Riders can view past trips
  5. Surge pricing — Prices adjust based on demand

For each feature, specify:

  • What consistency level you would use and why
  • Whether you would use a saga, 2PC, or neither
  • What happens if a network partition occurs during this operation

Back to 9.8 README