Episode 6 — Scaling, Reliability, Microservices & Web3 / 6.1 — Microservice Foundations
6.1.e -- Service Communication Patterns
How services talk to each other determines the reliability, performance, and coupling of your entire system. Choose the wrong communication pattern and you build a distributed monolith. Choose wisely and you unlock true independence.
Navigation << 6.1.d Database per Service | 6.1.e Service Communication Patterns | Exercise Questions >>
1. The Two Fundamental Styles
Synchronous (Request-Response):
Client --> [Service A] --request--> [Service B]
                       <--response--
Client waits. Service A waits. Everyone waits.
Asynchronous (Event-Driven):
Client --> [Service A] --event--> [Message Broker] --event--> [Service B]
Client gets immediate response.
Service B processes when ready.
| Dimension | Synchronous | Asynchronous |
|---|---|---|
| Client waits? | Yes | No (fire and forget) |
| Coupling | Temporal (both must be running) | Decoupled (producer does not know consumer) |
| Latency | Sum of all hops | Immediate response + eventual processing |
| Error handling | Immediate (HTTP status codes) | Deferred (dead-letter queues, retries) |
| Complexity | Lower (familiar HTTP) | Higher (message brokers, idempotency) |
| Throughput | Limited by slowest service | High (buffer in broker) |
| Best for | Queries, real-time needs | Commands, background processing, events |
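The temporal-coupling row is the one that bites in practice. A toy in-process sketch (an array standing in for a broker; all names are illustrative) makes the difference concrete: the synchronous caller fails the moment its downstream dependency is down, while the asynchronous producer gets an immediate acknowledgement and the message waits until a consumer drains the queue.

```javascript
// Toy contrast: synchronous call vs enqueue-and-return.
// The array stands in for a message broker; names are illustrative.
const queue = [];

function chargeSync(order) {
  // Synchronous: if the downstream service is down, the caller fails too.
  throw new Error('payment service is down');
}

function chargeAsync(order) {
  // Asynchronous: the producer only needs the broker to be up.
  queue.push({ type: 'charge.requested', order });
  return { status: 'accepted' }; // immediate response, e.g. HTTP 202
}

function drainQueue(handler) {
  // The consumer can come online later and work through the backlog.
  let processed = 0;
  while (queue.length > 0) {
    handler(queue.shift());
    processed++;
  }
  return processed;
}

// Producer succeeds even though no consumer is running yet.
const ack = chargeAsync({ id: 'o1', amount: 50 });

// Later, a consumer drains the backlog.
const handled = [];
const count = drainQueue((msg) => handled.push(msg.type));
```

The same shape carries over to real brokers: `chargeAsync` becomes a publish, `drainQueue` becomes a consumer loop, and the "accepted" response is what lets the client stop waiting.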
2. Synchronous Communication
2.1 REST (HTTP/JSON)
The most common synchronous pattern. Services expose HTTP endpoints and clients call them with standard HTTP verbs.
// user-service/index.js -- REST API
const express = require('express');
const db = require('./db'); // assumed: a pg connection pool exported from db.js
const app = express();
app.use(express.json());
app.get('/users/:id', async (req, res) => {
const user = await db.query('SELECT * FROM users WHERE id = $1', [req.params.id]);
if (!user.rows[0]) return res.status(404).json({ error: 'User not found' });
res.json(user.rows[0]);
});
app.post('/users', async (req, res) => {
const { name, email } = req.body;
const result = await db.query(
'INSERT INTO users (name, email) VALUES ($1, $2) RETURNING *',
[name, email]
);
res.status(201).json(result.rows[0]);
});
app.listen(3001);
// order-service/clients/userClient.js -- calling User Service via REST
const axios = require('axios');
const USER_SERVICE_URL = process.env.USER_SERVICE_URL || 'http://localhost:3001';
class UserClient {
async getUser(userId) {
try {
const response = await axios.get(`${USER_SERVICE_URL}/users/${userId}`, {
timeout: 3000, // 3-second timeout -- never wait forever
});
return response.data;
} catch (err) {
if (err.response?.status === 404) {
return null; // User not found -- handle gracefully
}
throw new Error(`User Service unavailable: ${err.message}`);
}
}
async getUsersBatch(userIds) {
// Batch call to avoid N+1 problem
const response = await axios.post(`${USER_SERVICE_URL}/users/batch`, {
ids: userIds,
}, { timeout: 5000 });
return response.data;
}
}
module.exports = new UserClient();
2.2 REST Best Practices Between Services
// 1. ALWAYS set timeouts
const response = await axios.get(url, { timeout: 3000 });
// 2. Use retry with exponential backoff
const axiosRetry = require('axios-retry');
const client = axios.create({ timeout: 3000 });
axiosRetry(client, {
retries: 3,
retryDelay: (retryCount) => {
return Math.pow(2, retryCount) * 1000; // 1s, 2s, 4s
},
retryCondition: (error) => {
// Only retry on network errors or 5xx
return axiosRetry.isNetworkOrIdempotentRequestError(error)
|| error.response?.status >= 500;
},
});
// 3. Use circuit breaker to avoid cascading failures
const CircuitBreaker = require('opossum');
function getUserCircuit(userId) {
return axios.get(`${USER_SERVICE_URL}/users/${userId}`, { timeout: 3000 });
}
const breaker = new CircuitBreaker(getUserCircuit, {
timeout: 5000, // If function takes longer than 5s, trip
errorThresholdPercentage: 50, // Trip if 50% of requests fail
resetTimeout: 10000, // Try again after 10s
});
breaker.fallback((userId) => {
// Return cached data or a default
return { id: userId, name: 'Unknown', cached: true };
});
// Usage
const user = await breaker.fire(userId);
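opossum hides the breaker mechanics behind its options. A minimal hand-rolled version (a sketch, not production code; simplified to a consecutive-failure count rather than opossum's errorThresholdPercentage) shows the closed --> open --> half-open cycle that those options configure:

```javascript
// Minimal circuit breaker sketch: closed -> open -> half-open.
// Simplified: trips after N consecutive failures instead of a failure percentage.
class SimpleBreaker {
  constructor(fn, { failureThreshold = 3, resetTimeoutMs = 10000 } = {}) {
    this.fn = fn;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.failures = 0;
    this.state = 'closed';
    this.openedAt = 0;
  }

  async fire(...args) {
    if (this.state === 'open') {
      if (Date.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('circuit open -- failing fast'); // do not hit the dead service
      }
      this.state = 'half-open'; // reset window elapsed: let one trial request through
    }
    try {
      const result = await this.fn(...args);
      this.failures = 0;
      this.state = 'closed'; // trial (or normal call) succeeded -- close the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open'; // stop sending traffic downstream
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}
```

The key property is the fast failure while open: callers get an immediate error (or a fallback) instead of stacking up timeouts against a service that is already struggling.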
2.3 gRPC
gRPC uses Protocol Buffers (binary format) over HTTP/2. It is faster than REST/JSON but requires more setup.
// user.proto -- Protocol Buffer definition
syntax = "proto3";
package user;
service UserService {
rpc GetUser (GetUserRequest) returns (UserResponse);
rpc CreateUser (CreateUserRequest) returns (UserResponse);
rpc GetUsersBatch (GetUsersBatchRequest) returns (UsersBatchResponse);
}
message GetUserRequest {
string id = 1;
}
message CreateUserRequest {
string name = 1;
string email = 2;
}
message UserResponse {
string id = 1;
string name = 2;
string email = 3;
string created_at = 4;
}
message GetUsersBatchRequest {
repeated string ids = 1;
}
message UsersBatchResponse {
repeated UserResponse users = 1;
}
// user-service/grpcServer.js
const grpc = require('@grpc/grpc-js');
const protoLoader = require('@grpc/proto-loader');
const db = require('./db'); // assumed: a pg connection pool exported from db.js
const packageDef = protoLoader.loadSync('./user.proto');
const userProto = grpc.loadPackageDefinition(packageDef).user;
const server = new grpc.Server();
server.addService(userProto.UserService.service, {
GetUser: async (call, callback) => {
const user = await db.query('SELECT * FROM users WHERE id = $1', [call.request.id]);
if (!user.rows[0]) {
return callback({ code: grpc.status.NOT_FOUND, message: 'User not found' });
}
callback(null, user.rows[0]);
},
CreateUser: async (call, callback) => {
const { name, email } = call.request;
const result = await db.query(
'INSERT INTO users (name, email) VALUES ($1, $2) RETURNING *',
[name, email]
);
callback(null, result.rows[0]);
},
GetUsersBatch: async (call, callback) => {
const result = await db.query('SELECT * FROM users WHERE id = ANY($1)', [call.request.ids]);
callback(null, { users: result.rows });
},
});
server.bindAsync('0.0.0.0:50051', grpc.ServerCredentials.createInsecure(), () => {
console.log('gRPC User Service on :50051');
});
// order-service/clients/userGrpcClient.js
const grpc = require('@grpc/grpc-js');
const protoLoader = require('@grpc/proto-loader');
const packageDef = protoLoader.loadSync('./user.proto');
const userProto = grpc.loadPackageDefinition(packageDef).user;
const client = new userProto.UserService(
process.env.USER_GRPC_URL || 'localhost:50051',
grpc.credentials.createInsecure()
);
// Promisify for async/await usage
function getUser(id) {
return new Promise((resolve, reject) => {
client.GetUser({ id }, (err, response) => {
if (err) return reject(err);
resolve(response);
});
});
}
function getUsersBatch(ids) {
return new Promise((resolve, reject) => {
client.GetUsersBatch({ ids }, (err, response) => {
if (err) return reject(err);
resolve(response.users);
});
});
}
module.exports = { getUser, getUsersBatch };
2.4 REST vs gRPC Comparison
| Dimension | REST (HTTP/JSON) | gRPC (HTTP/2 + Protobuf) |
|---|---|---|
| Format | JSON (text, human-readable) | Protobuf (binary, compact) |
| Speed | Slower (text parsing) | Faster (binary serialisation) |
| Streaming | Limited (SSE, WebSockets separate) | Built-in bidirectional streaming |
| Contract | OpenAPI/Swagger (optional) | .proto files (required, strict) |
| Browser support | Native | Requires gRPC-Web proxy |
| Debugging | Easy (curl, Postman) | Harder (need gRPC tools) |
| Adoption | Universal | Growing, strong in internal services |
| Best for | External APIs, simple services | Internal service-to-service, high-throughput |
3. Asynchronous Communication
3.1 Message Queues (Point-to-Point)
A producer sends a message to a queue. One consumer processes it. The message is removed after processing.
Producer --> [Queue] --> Consumer
Example: "Process this payment" --> [payment-queue] --> Payment Worker
// order-service/publisher.js -- Publishing to RabbitMQ queue
const amqp = require('amqplib');
let channel;
async function connect() {
const conn = await amqp.connect(process.env.RABBITMQ_URL || 'amqp://localhost');
channel = await conn.createChannel();
await channel.assertQueue('payment-processing', { durable: true });
await channel.assertQueue('email-notifications', { durable: true });
}
async function requestPayment(orderData) {
channel.sendToQueue(
'payment-processing',
Buffer.from(JSON.stringify({
orderId: orderData.id,
userId: orderData.userId,
amount: orderData.totalAmount,
timestamp: new Date().toISOString(),
})),
{ persistent: true } // survive broker restart
);
}
async function requestEmailNotification(emailData) {
channel.sendToQueue(
'email-notifications',
Buffer.from(JSON.stringify(emailData)),
{ persistent: true }
);
}
module.exports = { connect, requestPayment, requestEmailNotification };
// payment-service/worker.js -- Consuming from RabbitMQ queue
const amqp = require('amqplib');
async function startWorker() {
const conn = await amqp.connect(process.env.RABBITMQ_URL || 'amqp://localhost');
const channel = await conn.createChannel();
await channel.assertQueue('payment-processing', { durable: true });
channel.prefetch(1); // Process one message at a time
console.log('Payment worker waiting for messages...');
channel.consume('payment-processing', async (msg) => {
const payment = JSON.parse(msg.content.toString());
console.log(`Processing payment for order ${payment.orderId}: $${payment.amount}`);
try {
// Process payment (call Stripe, etc.)
const result = await processPayment(payment);
if (result.success) {
// Publish success event
channel.sendToQueue(
'order-updates',
Buffer.from(JSON.stringify({
orderId: payment.orderId,
status: 'paid',
paymentId: result.paymentId,
})),
{ persistent: true }
);
} else {
// Publish failure event
channel.sendToQueue(
'order-updates',
Buffer.from(JSON.stringify({
orderId: payment.orderId,
status: 'payment_failed',
reason: result.error,
})),
{ persistent: true }
);
}
channel.ack(msg); // Acknowledge -- message is removed from queue
} catch (err) {
console.error('Payment processing error:', err);
// Negative acknowledge -- requeue for retry
channel.nack(msg, false, true);
}
});
}
startWorker();
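One caveat in the worker above: nack(msg, false, true) requeues the message, so a poison message (one that always throws) will loop forever. A common mitigation is a dead-letter exchange; the sketch below shows the RabbitMQ queue arguments involved (the 'dlx' and 'payment-processing.dead' names are illustrative, and the assertQueue/assertExchange calls belong inside the worker's setup, alongside the existing ones):

```javascript
// Sketch: route rejected messages to a dead-letter exchange instead of
// requeueing forever. Names ('dlx', 'payment-processing.dead') are illustrative.
await channel.assertExchange('dlx', 'fanout', { durable: true });
await channel.assertQueue('payment-processing.dead', { durable: true });
await channel.bindQueue('payment-processing.dead', 'dlx', '');
await channel.assertQueue('payment-processing', {
  durable: true,
  arguments: { 'x-dead-letter-exchange': 'dlx' },
});
// In the consumer, reject WITHOUT requeue so the broker dead-letters it:
// channel.nack(msg, false, false);
```

Note that RabbitMQ refuses to redeclare an existing queue with different arguments, so this change needs the queue to be deleted or migrated first. Messages landing in the dead-letter queue can then be inspected, fixed, and replayed by hand.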
3.2 Events / Pub-Sub (One-to-Many)
A producer publishes an event to a topic. Multiple consumers can subscribe and react independently.
Producer --> [Topic: order.created] --> Consumer A (Inventory)
                                    --> Consumer B (Notification)
                                    --> Consumer C (Analytics)
Each consumer gets a copy of the event.
They process independently. They do not know about each other.
// order-service/events.js -- Publishing events to RabbitMQ exchange (pub/sub)
const amqp = require('amqplib');
let channel;
async function connect() {
const conn = await amqp.connect(process.env.RABBITMQ_URL || 'amqp://localhost');
channel = await conn.createChannel();
// 'topic' exchange allows routing by pattern
await channel.assertExchange('domain-events', 'topic', { durable: true });
}
async function publish(eventType, data) {
const event = {
eventId: `evt_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`,
eventType,
timestamp: new Date().toISOString(),
data,
};
channel.publish(
'domain-events',
eventType, // routing key, e.g., "order.created"
Buffer.from(JSON.stringify(event)),
{ persistent: true }
);
console.log(`Published event: ${eventType}`);
}
module.exports = { connect, publish };
// Usage in order route:
// await publish('order.created', { orderId: order.id, userId, items, total });
// inventory-service/eventConsumer.js -- Subscribing to events
const amqp = require('amqplib');
async function subscribeToOrderEvents() {
const conn = await amqp.connect(process.env.RABBITMQ_URL || 'amqp://localhost');
const channel = await conn.createChannel();
await channel.assertExchange('domain-events', 'topic', { durable: true });
// Each service gets its own queue (durable, survives restarts)
const q = await channel.assertQueue('inventory-order-events', { durable: true });
// Bind to specific event patterns
await channel.bindQueue(q.queue, 'domain-events', 'order.created');
await channel.bindQueue(q.queue, 'domain-events', 'order.cancelled');
channel.consume(q.queue, async (msg) => {
const event = JSON.parse(msg.content.toString());
switch (event.eventType) {
case 'order.created':
await reserveInventory(event.data);
break;
case 'order.cancelled':
await releaseInventory(event.data);
break;
}
channel.ack(msg);
});
console.log('Inventory Service subscribed to order events');
}
// notification-service/eventConsumer.js -- Another subscriber to the SAME events
async function subscribeToEvents() {
const conn = await amqp.connect(process.env.RABBITMQ_URL || 'amqp://localhost');
const channel = await conn.createChannel();
await channel.assertExchange('domain-events', 'topic', { durable: true });
const q = await channel.assertQueue('notification-events', { durable: true });
// Subscribe to multiple event types
await channel.bindQueue(q.queue, 'domain-events', 'order.*'); // all order events
await channel.bindQueue(q.queue, 'domain-events', 'payment.*'); // all payment events
await channel.bindQueue(q.queue, 'domain-events', 'user.created');
channel.consume(q.queue, async (msg) => {
const event = JSON.parse(msg.content.toString());
switch (event.eventType) {
case 'order.created':
await sendEmail(event.data.userId, 'Order Confirmation', `Order ${event.data.orderId} received!`);
break;
case 'payment.succeeded':
await sendEmail(event.data.userId, 'Payment Received', `Payment of $${event.data.amount} confirmed.`);
break;
case 'user.created':
await sendEmail(event.data.email, 'Welcome!', 'Thanks for signing up.');
break;
}
channel.ack(msg);
});
console.log('Notification Service subscribed to events');
}
4. Message Brokers: RabbitMQ vs Kafka
| Dimension | RabbitMQ | Apache Kafka |
|---|---|---|
| Model | Message queue (traditional) | Distributed log (append-only) |
| Delivery | Push to consumers | Consumers pull from log |
| Message retention | Deleted after acknowledgement | Retained for configurable duration |
| Ordering | Per queue | Per partition |
| Throughput | Thousands/sec | Millions/sec |
| Replay | Not possible (message gone after ack) | Possible (consumers can re-read the log) |
| Consumer groups | Competing consumers on one queue | Consumer groups with partition assignment |
| Best for | Task queues, RPC, routing | Event streaming, event sourcing, high throughput |
| Complexity | Lower | Higher (ZooKeeper/KRaft, partitions, offsets) |
RabbitMQ mental model:
Queue is a mailbox. Message goes in, one consumer takes it out. Gone.
Kafka mental model:
Log is a newspaper archive. Events are appended. Consumers read at their own pace.
Old events are still there for replay.
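The newspaper-archive model is easy to demonstrate with a toy append-only log (plain JavaScript, not the Kafka API): each consumer keeps its own offset, and a late subscriber can replay the full history from offset 0.

```javascript
// Toy append-only log illustrating Kafka-style replay (not the Kafka API).
class EventLog {
  constructor() {
    this.entries = [];
  }
  append(event) {
    this.entries.push(event);
    return this.entries.length - 1; // the event's offset
  }
  readFrom(offset) {
    return this.entries.slice(offset); // old events are still here
  }
}

const log = new EventLog();
log.append({ type: 'order.created', orderId: 'o1' });
log.append({ type: 'order.paid', orderId: 'o1' });

// Consumer A has been reading all along and is caught up at offset 2.
let offsetA = 0;
offsetA += log.readFrom(offsetA).length;

// Consumer B joins late and replays the full history from offset 0.
const replayed = log.readFrom(0).map((e) => e.type);
```

Contrast with the mailbox model above: in a queue, `order.created` would be gone after the first consumer acknowledged it, and Consumer B would have nothing to replay.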
5. When to Use Each Pattern
5.1 Decision Guide
+----------------------------------------------------+
| "Does the caller NEED the result right now?" |
| |
| YES --> Synchronous (REST or gRPC) |
| | |
| +-- "Is it internal service-to-service?" |
| | YES and high throughput --> gRPC |
| | YES and simple --> REST |
| | NO (external/browser) --> REST |
| |
| NO --> Asynchronous |
| | |
| +-- "Does exactly ONE consumer process this?" |
| | YES --> Message Queue (point-to-point) |
| | |
| +-- "Do MULTIPLE consumers need this event?" |
| YES --> Pub/Sub (topic/exchange) |
+----------------------------------------------------+
5.2 Common Scenarios
| Scenario | Pattern | Why |
|---|---|---|
| User clicks "View Profile" | Sync REST | Need immediate response to render UI |
| User places an order | Sync REST + Async events | Immediate acknowledgment, then async processing |
| Resize uploaded image | Async queue | Heavy processing; user does not wait |
| Notify multiple services of a state change | Pub/sub events | Multiple consumers, decoupled |
| Real-time analytics pipeline | Kafka streaming | High throughput, ordered, replayable |
| Request credit score from external API | Sync REST with circuit breaker | External dependency; need the result now |
| Nightly batch report | Async queue | Scheduled, no real-time need |
| Sync search index after product update | Async event | Eventual consistency is fine for search |
6. Latency Implications
6.1 Synchronous Call Chains
Client --> Gateway --> Service A --> Service B --> Service C
Total latency = A + B + C + network overhead x 3 (gateway time omitted for simplicity)
If each service takes 50ms and the network adds 5ms per hop:
Total = 50 + 5 + 50 + 5 + 50 + 5 = 165ms
If Service C is slow (200ms):
Total = 50 + 5 + 50 + 5 + 200 + 5 = 315ms
Latencies add across the chain, so the slowest service dominates the total.
Mitigation strategies:
// 1. Parallelise independent calls
const [user, products, recommendations] = await Promise.all([
userClient.getUser(userId),
catalogClient.getProducts(productIds),
recommendationClient.getRecommendations(userId),
]);
// Total latency = max(user, products, recommendations) instead of sum
// 2. Use caching to avoid repeated calls
const NodeCache = require('node-cache');
const cache = new NodeCache({ stdTTL: 300 }); // 5-minute TTL
async function getUserCached(userId) {
const cached = cache.get(`user:${userId}`);
if (cached) return cached;
const user = await userClient.getUser(userId);
cache.set(`user:${userId}`, user);
return user;
}
// 3. Set aggressive timeouts
const response = await axios.get(url, {
timeout: 2000, // Do not wait more than 2 seconds
});
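The Promise.all point is worth measuring. A self-contained simulation (setTimeout standing in for network calls; timings are approximate) shows that three independent 50ms calls cost roughly 150ms in sequence but roughly 50ms in parallel:

```javascript
// Simulated service calls: each "call" is just a 50ms delay.
function fakeCall(name, ms = 50) {
  return new Promise((resolve) => setTimeout(() => resolve(name), ms));
}

async function sequential() {
  const start = Date.now();
  await fakeCall('user');
  await fakeCall('products');
  await fakeCall('recommendations');
  return Date.now() - start; // roughly 150ms -- latencies add up
}

async function parallel() {
  const start = Date.now();
  await Promise.all([
    fakeCall('user'),
    fakeCall('products'),
    fakeCall('recommendations'),
  ]);
  return Date.now() - start; // roughly 50ms -- bounded by the slowest call
}
```

The parallel version only works because the three calls are independent. If the products call needed the user's region first, those two would have to stay sequential.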
6.2 Asynchronous Latency
Client --> Service A --> Message Broker --> Service B (processes later)
   |
   +-- Gets immediate response (202 Accepted)
   |
   +-- Total perceived latency for client: ~50ms (just Service A)
   |
   +-- Service B processes 100ms later, but the client does not wait
7. Coupling Implications
7.1 Synchronous Coupling
Temporal coupling: Both services must be running at the same time.
Service A --REST--> Service B
If B is down, A fails.
Behavioural coupling: A depends on B's interface.
If B changes its API, A breaks.
7.2 Asynchronous Decoupling
No temporal coupling: Producer and consumer do not need to be running simultaneously.
Service A --event--> Broker --event--> Service B
If B is down, the broker holds the message. B processes when it comes back.
Reduced behavioural coupling: Services agree on event schemas, not API contracts.
The producer does not know (or care) who consumes the event.
8. Choreography vs Orchestration
8.1 Choreography (Decentralised)
Each service knows what to do when it receives an event. No central coordinator.
Order           Inventory           Payment          Notification
  |                 |                  |                  |
  |--OrderCreated-->|                  |                  |
  |                 |                  |                  |
  |                 |--InventoryReserved-->               |
  |                 |                  |                  |
  |                 |                  |--PaymentSucceeded-->
  |                 |                  |                  |
  |<----------------OrderConfirmed-----|                  |
  |                 |                  |              EmailSent
Pros: Loose coupling, no single point of failure, each service is autonomous. Cons: Hard to see the full flow, can become "event spaghetti" with many services.
8.2 Orchestration (Centralised)
A central orchestrator tells each service what to do and when.
                +------------------+
                |    Order Saga    |
                |   Orchestrator   |
                +---------+--------+
                          |
         +----------------+----------------+
         |                |                |
    1. Reserve        2. Charge       3. Confirm
     Inventory         Payment           Order
         |                |                |
    +----+----+       +---+---+       +----+----+
    |Inventory|       |Payment|       |  Order  |
    | Service |       |Service|       | Service |
    +---------+       +-------+       +---------+
Pros: Clear flow visibility, easier to debug, handles complex workflows. Cons: Central point of failure, orchestrator can become a "god service."
8.3 When to Use Each
| Criteria | Choreography | Orchestration |
|---|---|---|
| Number of steps | 2-4 | 5+ |
| Workflow complexity | Simple, linear | Branching, conditional, parallel |
| Team structure | Each team owns their events | Central platform team available |
| Observability needs | Low (can trace via events) | High (need to see full workflow) |
| Failure handling | Each service handles its own | Centralised compensation logic |
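To make the orchestration column concrete, here is a minimal saga-orchestrator sketch (service calls are stand-in async functions; all names are illustrative): steps run in order, and if one fails, the compensations for the already-completed steps run in reverse.

```javascript
// Minimal saga orchestrator sketch. Each step has an action and a
// compensating action; on failure, completed steps are undone in reverse.
async function runSaga(steps, ctx) {
  const completed = [];
  for (const step of steps) {
    try {
      await step.action(ctx);
      completed.push(step);
    } catch (err) {
      for (const done of completed.reverse()) {
        await done.compensate(ctx); // centralised compensation logic
      }
      return { status: 'rolled_back', failedStep: step.name, reason: err.message };
    }
  }
  return { status: 'completed' };
}

// Stand-in services: inventory succeeds, then payment fails.
const trace = [];
const steps = [
  {
    name: 'reserve-inventory',
    action: async () => trace.push('reserve'),
    compensate: async () => trace.push('release'),
  },
  {
    name: 'charge-payment',
    action: async () => { throw new Error('card declined'); },
    compensate: async () => trace.push('refund'),
  },
];
```

Running `runSaga(steps, {})` resolves to a rolled-back result naming 'charge-payment' as the failed step, with 'release' recorded as the compensating action. This is also where the "god service" risk shows up: every new step and compensation lands in the orchestrator.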
9. Idempotency: The Safety Net
Messages can be delivered more than once (network retries, broker redelivery). Every consumer must be idempotent -- processing the same message twice must produce the same result.
// BAD: Not idempotent -- balance decreases every time message is processed
async function handlePayment(event) {
await db.query('UPDATE accounts SET balance = balance - $1 WHERE user_id = $2',
[event.amount, event.userId]);
}
// If this runs twice: user is charged twice!
// GOOD: Idempotent -- uses idempotency key
async function handlePayment(event) {
// Check if this event was already processed
const existing = await db.query(
'SELECT id FROM processed_events WHERE event_id = $1',
[event.eventId]
);
if (existing.rows.length > 0) {
console.log(`Event ${event.eventId} already processed, skipping`);
return; // Already processed -- safe to skip
}
const client = await db.connect();
try {
await client.query('BEGIN');
await client.query(
'UPDATE accounts SET balance = balance - $1 WHERE user_id = $2',
[event.amount, event.userId]
);
await client.query(
'INSERT INTO processed_events (event_id, processed_at) VALUES ($1, NOW())',
[event.eventId]
);
await client.query('COMMIT');
} catch (err) {
await client.query('ROLLBACK');
throw err;
} finally {
client.release();
}
}
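The same pattern with an in-memory store (a Set standing in for the processed_events table; names are illustrative) makes the double-delivery behaviour easy to see:

```javascript
// In-memory version of the idempotent consumer: a Set stands in for the
// processed_events table. Delivering the same event twice charges once.
const processedEvents = new Set();
const accounts = new Map([['u1', 100]]);

function handlePayment(event) {
  if (processedEvents.has(event.eventId)) {
    return 'skipped'; // already processed -- safe to ignore the redelivery
  }
  accounts.set(event.userId, accounts.get(event.userId) - event.amount);
  processedEvents.add(event.eventId); // record AFTER the state change
  return 'processed';
}

const event = { eventId: 'evt_1', userId: 'u1', amount: 30 };
const first = handlePayment(event);
const second = handlePayment(event); // broker redelivers the same message
```

This single-threaded sketch can get away with a plain check-then-update; the database version above needs the transaction because two worker processes could otherwise pass the check concurrently.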
10. Communication Patterns at a Glance
+----------------+-------------------+-------------------+
| | Request-Response | Event-Driven |
+----------------+-------------------+-------------------+
| Synchronous | REST, gRPC | (rare -- webhooks)|
+----------------+-------------------+-------------------+
| Asynchronous | Async request via | Pub/sub, |
| | queue (RPC over | event streaming |
| | message broker) | (Kafka, RabbitMQ) |
+----------------+-------------------+-------------------+
11. Key Takeaways
- Default to synchronous REST for simplicity. Switch to async or gRPC only when you have a clear reason.
- Asynchronous communication decouples services in time -- the producer and consumer do not need to be running simultaneously.
- Use message queues for work distribution (one consumer per message). Use pub/sub for event notification (many consumers per event).
- gRPC beats REST for internal high-throughput communication but adds complexity and requires proto file management.
- Synchronous call chains amplify latency. Use Promise.all to parallelise independent calls and set aggressive timeouts.
- Circuit breakers prevent cascading failures when a downstream service is slow or down.
- Choreography works for simple flows; orchestration works for complex workflows. Most real systems use a mix of both.
- Every consumer must be idempotent. Messages will be delivered more than once. Design for it.
12. Explain-It Challenge
1. You are building a food delivery app. When a customer places an order, the system must: validate the restaurant is open, confirm menu items are available, calculate the delivery fee, charge the customer, and notify the restaurant. Design the communication pattern. Which calls are synchronous? Which are asynchronous? Why?
2. A developer proposes using Kafka for all inter-service communication, even simple request-response queries like "get user by ID." Explain why this is overkill and what you would recommend instead.
3. Your Order Service calls the Inventory Service, which calls the Pricing Service, which calls the Discount Service. The total latency is 800ms. Propose three concrete strategies to reduce this latency to under 200ms.
Navigation << 6.1.d Database per Service | 6.1.e Service Communication Patterns | Exercise Questions >>