Episode 6 — Scaling, Reliability, Microservices & Web3 / 6.1 — Microservice Foundations

6.1 -- Interview Questions

11 curated interview questions with model answers, ranging from beginner to advanced. Each includes why interviewers ask it and what they are looking for.


Navigation: README | Exercise Questions | Interview Questions | Quick Revision


How to Use This Material

  1. Read the question and try to answer it out loud (as if in an interview) before reading the model answer.
  2. Focus on the "Why Interviewers Ask" section -- understanding the intent helps you tailor your answer.
  3. Practise with the Quick-Fire Table at the bottom for rapid recall.
  4. Time yourself: Beginner answers should take 1-2 minutes. Intermediate: 2-3 minutes. Advanced: 3-5 minutes.
  5. Use the STAR method (Situation, Task, Action, Result) for scenario-based questions where applicable.

Beginner (4 Questions)

Q1. What is the difference between a monolithic application and a microservices application?

Why Interviewers Ask: They want to confirm you understand both architectures at a fundamental level and can articulate the trade-offs, not just parrot buzzwords.

Model Answer:

A monolithic application packages all functionality -- user management, orders, payments, notifications -- into a single codebase that is built, tested, and deployed as one unit. All modules share the same database and communicate via in-process function calls.

A microservices application decomposes the system into small, independently deployable services. Each service owns a specific business capability, has its own database, and communicates with other services over the network (typically HTTP or message queues).

The key trade-offs are:

|               | Monolith                     | Microservices                  |
| ------------- | ---------------------------- | ------------------------------ |
| Deployment    | Single unit, all-or-nothing  | Independent per service        |
| Database      | Shared, ACID transactions    | Separate, eventual consistency |
| Scaling       | Scale everything together    | Scale individual services      |
| Complexity    | Simpler to develop and debug | Distributed systems complexity |
| Team coupling | High (shared codebase)       | Low (independent repos)        |

A monolith is the right starting point for most applications. Microservices make sense when team size, deployment frequency, or scaling needs outgrow what a monolith can support.


Q2. What does "database per service" mean and why is it important?

Why Interviewers Ask: This tests whether you understand data ownership -- the most critical (and most violated) principle of microservices.

Model Answer:

Database per service means each microservice has its own data store that only it can read from and write to. No other service accesses that database directly -- all access goes through the service's API.

This is important for three reasons:

  1. Independent deployment. If services share a database, a schema change in one service can break another. With separate databases, each service controls its own schema.

  2. Independent scaling. A read-heavy Catalog Service might use a replicated PostgreSQL setup, while an Analytics Service uses ClickHouse for columnar queries. Shared databases force a one-size-fits-all solution.

  3. Loose coupling. A shared database is hidden coupling -- it looks like the services are independent, but they are actually tightly connected through shared tables.

The trade-off is that you lose ACID transactions across services. You handle this with eventual consistency patterns like the Saga pattern, where each service performs a local transaction and publishes an event. If a step fails, compensating transactions undo the previous steps.


Q3. What is the Strangler Fig Pattern?

Why Interviewers Ask: They want to know if you understand incremental migration -- big-bang rewrites are a red flag.

Model Answer:

The Strangler Fig Pattern is a migration strategy for gradually replacing a monolith with microservices, named after the strangler fig tree that slowly envelops its host.

The process works like this:

  1. Identify a seam -- choose a module in the monolith with clear boundaries (I usually start with the one that has the most independent data and the lowest risk).
  2. Build the new service -- replicate that module's functionality as a standalone microservice with its own database.
  3. Route traffic -- place an API gateway or reverse proxy in front of the monolith. Route requests for the extracted functionality to the new service; everything else still goes to the monolith.
  4. Migrate data -- move the relevant data from the monolith's database to the new service's database.
  5. Remove the old code from the monolith.
  6. Repeat for the next module.

The key advantage is that this is incremental and reversible. At any point, you can stop extracting and the system still works. This is far safer than a big-bang rewrite, which has a historically high failure rate.
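Step 3 (route traffic) can be sketched as a path-based routing rule in the gateway. This is a minimal illustration -- the service names and ports are hypothetical, and a real gateway (nginx, an Express reverse proxy) would express the same idea as configuration:

```javascript
// Hypothetical upstream URLs -- adjust to your environment.
const MONOLITH_URL = 'http://monolith:3000';

// Routes already extracted from the monolith, checked in order.
const EXTRACTED_ROUTES = [
  { prefix: '/orders', target: 'http://order-service:3001' },
  { prefix: '/payments', target: 'http://payment-service:3002' },
];

// Decide which upstream should handle a request path.
// Anything not yet extracted still goes to the monolith.
function routeTarget(path) {
  const match = EXTRACTED_ROUTES.find((route) => path.startsWith(route.prefix));
  return match ? match.target : MONOLITH_URL;
}
```

As each module is extracted you add one entry to `EXTRACTED_ROUTES`; removing the entry sends the traffic back to the monolith, which is exactly what makes the pattern reversible.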


Q4. What is the difference between synchronous and asynchronous communication between services?

Why Interviewers Ask: This is foundational for any microservices discussion. They want to see that you understand the coupling and latency implications.

Model Answer:

In synchronous communication, the calling service sends a request and waits for a response before continuing. REST over HTTP and gRPC are the most common examples. The caller is blocked until the callee responds or times out.

```javascript
// Synchronous: Order Service waits for User Service response
const user = await axios.get(`http://user-service/users/${userId}`);
// Code below this line does not execute until the response arrives
```

In asynchronous communication, the calling service publishes a message or event and continues immediately without waiting. A message broker (RabbitMQ, Kafka) holds the message until the consuming service processes it.

```javascript
// Asynchronous: Order Service publishes event and moves on
await publishEvent('order.created', { orderId, userId, items });
// Code continues immediately -- no waiting
```

The key difference is temporal coupling. Synchronous communication requires both services to be running simultaneously. Asynchronous communication decouples them -- if the consumer is down, the broker holds the message until it recovers.

I use synchronous communication when the caller needs the result immediately (e.g., fetching user data to render a page). I use asynchronous communication when the work can be deferred (e.g., sending an email notification, updating a search index).
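The temporal decoupling is easy to see with a toy in-memory broker (a stand-in for RabbitMQ or Kafka, for illustration only): messages published while the consumer is offline are buffered and delivered once it subscribes.

```javascript
// Toy message broker: buffers messages per topic until a consumer subscribes.
// Real brokers (RabbitMQ, Kafka) add persistence, acks, and redelivery on top.
class ToyBroker {
  constructor() {
    this.queues = new Map();   // topic -> buffered messages
    this.handlers = new Map(); // topic -> consumer callback
  }

  publish(topic, message) {
    const handler = this.handlers.get(topic);
    if (handler) {
      handler(message); // consumer is up: deliver immediately
    } else {
      if (!this.queues.has(topic)) this.queues.set(topic, []);
      this.queues.get(topic).push(message); // consumer down: hold the message
    }
  }

  subscribe(topic, handler) {
    this.handlers.set(topic, handler);
    // Drain anything that arrived while the consumer was offline.
    for (const msg of this.queues.get(topic) || []) handler(msg);
    this.queues.set(topic, []);
  }
}
```

Note that `publish` never blocks: the producer's latency is independent of whether the consumer is running, which is the temporal decoupling described above.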


Intermediate (4 Questions)

Q5. How do you decide where to draw service boundaries?

Why Interviewers Ask: This separates candidates who can design systems from those who only implement them. Bad boundaries are the most common microservices failure mode.

Model Answer:

I use Domain-Driven Design to identify service boundaries, specifically the concept of bounded contexts.

The process starts with understanding the business domain. I work with domain experts to identify the distinct business capabilities -- in an e-commerce system, these might be Catalog, Ordering, Payment, Shipping, Identity, and Notification.

Each bounded context becomes a candidate service. The key insight is that the same concept can have different meanings in different contexts. "Product" in the Catalog context has images, descriptions, and reviews. "Product" in the Shipping context has weight, dimensions, and fragility ratings. Each context stores only what it needs.

I validate boundaries with several heuristics:

  • Can this service be deployed independently? If deploying Service A always requires deploying Service B, the boundary is wrong.
  • Does one team own this service? If ownership is unclear, the boundary is unclear.
  • Is the coupling low? If two services communicate for 80% of requests, they should probably be one service.
  • Does this map to a business capability? "User Management" is a capability. "Database Access Layer" is not.

The biggest mistakes I watch for are: splitting too thin (nano-services that always deploy together), splitting by technical layer instead of business capability, and sharing databases across services.


Q6. Explain the Saga pattern. How do you handle failures in a distributed transaction?

Why Interviewers Ask: This is the core data consistency challenge in microservices. They want to see you understand both the pattern and the failure modes.

Model Answer:

The Saga pattern replaces ACID transactions with a sequence of local transactions, each in a different service, coordinated through events or commands. If any step fails, compensating transactions undo the previous steps.

There are two coordination approaches:

Choreography: Each service listens for events and reacts independently. When the Order Service creates an order, it publishes OrderCreated. The Inventory Service hears this, reserves stock, and publishes InventoryReserved. The Payment Service hears that, charges the card, and publishes PaymentSucceeded. If payment fails, it publishes PaymentFailed, and the Inventory Service compensates by releasing the stock.

Orchestration: A central Saga Orchestrator directs each step. It tells the Inventory Service to reserve, waits for a response, tells the Payment Service to charge, and if anything fails, it runs compensations in reverse order.

For failure handling, the critical concept is compensating transactions. Every forward action must have an inverse:

  • Reserve inventory -> Release inventory
  • Charge credit card -> Refund credit card
  • Create order -> Cancel order

A tricky edge case is when the compensation itself fails. For example, the refund call to the payment provider times out. In that case, I write to a dead-letter queue and alert the operations team. Some failures require human intervention.

I use choreography for simple sagas with 2-4 steps and orchestration for complex workflows with 5+ steps, conditional branching, or parallel execution.
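The choreography flow above can be sketched with an in-memory event bus standing in for the broker. The event names follow the example; `chargeCard` is a stub so the sketch is self-contained:

```javascript
// Minimal choreography sketch: each service reacts to events independently.
const handlers = {};
const on = (event, fn) => (handlers[event] = [...(handlers[event] || []), fn]);
const publish = (event, data) => (handlers[event] || []).forEach((fn) => fn(data));

const inventory = { reserved: new Set() };

// Inventory Service: forward action and its compensating transaction.
on('OrderCreated', ({ orderId }) => {
  inventory.reserved.add(orderId);            // local transaction: reserve stock
  publish('InventoryReserved', { orderId });
});
on('PaymentFailed', ({ orderId }) => {
  inventory.reserved.delete(orderId);         // compensation: release stock
});

// Payment Service: charge the card once stock is reserved.
on('InventoryReserved', ({ orderId }) => {
  const charged = chargeCard(orderId);        // stubbed payment call
  publish(charged ? 'PaymentSucceeded' : 'PaymentFailed', { orderId });
});

// Stub payment gateway so the sketch runs standalone: even ids fail.
function chargeCard(orderId) {
  return orderId % 2 === 1;
}
```

Notice that no service knows about the others -- each only knows which events it consumes and which it publishes, which is both choreography's appeal and, at 6+ steps, its traceability problem.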


Q7. When would you NOT use microservices?

Why Interviewers Ask: This is a trap question for candidates who only know the hype. Interviewers value pragmatism over trend-following.

Model Answer:

I would not use microservices in several situations:

Small teams (under 8-10 engineers). The operational overhead of microservices -- separate CI/CD pipelines, distributed tracing, service discovery, data consistency management -- requires dedicated effort. A small team will spend more time on infrastructure than on features.

Early-stage startups. The first priority is finding product-market fit, which means rapid iteration and frequent pivots. Microservices slow this down. If you pivot and 3 of your 5 services are now irrelevant, you have wasted significant engineering effort.

Simple domains. If the application is a single-domain CRUD application (a blog, a dashboard, a portfolio), there is nothing meaningful to decompose. Microservices add complexity without adding value.

No DevOps maturity. If the team does not have automated CI/CD, centralised logging, and monitoring infrastructure, microservices will be a nightmare. You need the operational foundation before you add distributed systems complexity.

When "we might need to scale" is the only reason. This is premature optimisation. Start with a monolith, monitor for bottlenecks, and extract services when you have concrete evidence that specific components need independent scaling.

The pattern I recommend is: start with a monolith, enforce module boundaries (modular monolith), and extract services only when deployment independence, scaling needs, or team autonomy demands it.


Q8. Compare REST and gRPC for inter-service communication. When would you choose each?

Why Interviewers Ask: They want to see that you understand the technical trade-offs beyond "gRPC is faster."

Model Answer:

REST uses HTTP/1.1 with JSON payloads. It is human-readable, universally supported, and easy to debug with tools like curl and Postman. The trade-off is that JSON serialisation and text-based parsing are slower than binary formats.

gRPC uses HTTP/2 with Protocol Buffers (binary serialisation). It is significantly faster -- Protocol Buffers are 3-10x smaller than JSON and faster to serialise. HTTP/2 supports multiplexing (multiple requests over one connection), header compression, and bidirectional streaming.

I choose REST for:

  • External-facing APIs (browser clients, third-party integrations) -- universal support.
  • Simple internal services with low throughput -- the performance difference is negligible and the debugging ease is valuable.
  • Teams without gRPC experience -- the learning curve and tooling setup are non-trivial.

I choose gRPC for:

  • High-throughput internal service-to-service calls where the latency and bandwidth savings matter (e.g., a recommendation service making thousands of calls per second).
  • Strict contract enforcement -- .proto files are required, so API contracts are always explicit and versioned.
  • Streaming use cases -- gRPC's native streaming support is ideal for real-time data feeds between services.

In practice, many systems use REST for external APIs and gRPC for internal service-to-service communication. They are not mutually exclusive.


Advanced (3 Questions)

Q9. Design the microservices architecture for an online marketplace (buyers, sellers, products, orders, payments, reviews). Address service boundaries, data ownership, communication patterns, and one failure scenario.

Why Interviewers Ask: This is a full system design question. They evaluate your ability to decompose a complex domain, make trade-offs, and handle failure modes.

Model Answer:

Service Decomposition (Bounded Contexts):

| Service              | Owns                            | Key Operations                          |
| -------------------- | ------------------------------- | --------------------------------------- |
| Identity Service     | Users, auth, profiles           | Register, login, get profile            |
| Catalog Service      | Products, categories, search    | List products, search, manage listings  |
| Order Service        | Orders, order items, cart       | Create order, get order history         |
| Payment Service      | Transactions, refunds, payouts  | Charge buyer, pay seller, refund        |
| Review Service       | Reviews, ratings                | Submit review, get product reviews      |
| Notification Service | Templates, delivery logs        | Send email/push/SMS                     |

Data Ownership:

  • Catalog owns product data (name, description, images, seller_id, price).
  • Order snapshots product data at order time (product_name, price_at_order) so order history is accurate even if prices change.
  • Payment stores transaction records with order_id reference but does not own order data.
  • Review stores reviews with product_id and user_id references but does not own product or user data.
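The snapshotting rule can be sketched as a hypothetical helper in the Order Service -- the point is that the order copies the fields it needs at order time instead of joining to the Catalog database later:

```javascript
// Order Service: snapshot the product fields the order needs at order time.
// `product` is whatever the Catalog Service API returned; the order keeps its
// own copy so history stays accurate even if the catalog changes later.
function createOrderItem(product, quantity) {
  return {
    product_id: product.id,        // reference for linking, never a live join
    product_name: product.name,    // snapshot: survives catalog renames
    price_at_order: product.price, // snapshot: survives price changes
    quantity,
  };
}
```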

Communication Patterns:

  • Synchronous (REST): User fetches product page (Gateway -> Catalog + Reviews in parallel). User places order (Gateway -> Order Service, which calls Catalog to validate product).
  • Asynchronous (Events via RabbitMQ): OrderCreated triggers Payment Service to charge the buyer. PaymentSucceeded triggers Notification Service to send confirmation. OrderDelivered triggers Notification Service to prompt for review. ReviewSubmitted triggers Catalog Service to update average rating.

Failure Scenario: Payment fails after order creation.

This is a saga. The Order Service creates the order in pending status and publishes OrderCreated. The Payment Service receives the event and attempts to charge the buyer's card. If the charge fails:

  1. Payment Service publishes PaymentFailed with the orderId and failure reason.
  2. Order Service receives PaymentFailed and updates the order status to payment_failed.
  3. Notification Service receives PaymentFailed and sends the buyer an email: "Your payment could not be processed."
  4. No inventory was reserved (in this design, inventory reservation happens after payment succeeds), so no inventory compensation is needed.

The key design decision here is when to reserve inventory. Reserving before payment risks holding stock for failed payments. Reserving after payment risks selling out-of-stock items. I would reserve after payment for most products, with pre-payment reservation only for high-demand limited items.


Q10. You inherit a system with 30 microservices, but the team frequently experiences cascading failures where one slow service brings down the entire platform. Diagnose the problem and propose solutions.

Why Interviewers Ask: This tests your ability to diagnose distributed systems issues and apply resilience patterns.

Model Answer:

Diagnosis: Cascading failure through synchronous call chains without resilience patterns.

The root cause is almost certainly that services call each other synchronously without timeouts, retries, or circuit breakers. When Service X becomes slow (e.g., due to a slow database query), Service Y (which calls X) also becomes slow because it is waiting for X's response. Service Z (which calls Y) becomes slow too. Eventually, thread pools and connection pools across the entire chain are exhausted, and the whole system appears down.

```text
Normal:    A (50ms) --> B (50ms) --> C (50ms)  = 150ms total
Failure:   A (waiting) --> B (waiting) --> C (5000ms timeout)
           A's thread pool fills up. A cannot serve ANY requests.
           Cascade: everything that depends on A also fails.
```

Solutions (in priority order):

  1. Timeouts on every outbound call. No service should wait indefinitely. I would set aggressive timeouts (2-3 seconds) on every HTTP call and database query.

  2. Circuit breakers (e.g., opossum for Node.js). When Service X's error rate exceeds a threshold (e.g., 50%), the circuit breaker trips and immediately returns a fallback response without calling X. After a cooldown period, it tries again.

  3. Bulkheads. Isolate outbound calls to different services into separate connection pools. A slow call to Service X should not consume the connection pool used for Service Y.

  4. Asynchronous communication where possible. Identify calls that do not need an immediate response and convert them to events. For example, sending notifications, updating analytics, and syncing search indexes can all be event-driven.

  5. Graceful degradation. When a dependency is unavailable, return partial data or cached data instead of failing entirely. If the Review Service is down, the product page still shows the product -- just without reviews.

  6. Distributed tracing (Jaeger, Datadog). Add correlation IDs to every request so you can trace the full call chain and quickly identify which service is the bottleneck.

  7. Load shedding. When a service is overwhelmed, reject excess requests with 503 (Service Unavailable) rather than queuing them and becoming slower for everyone.
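The circuit breaker in point 2 can be sketched as a small state machine. This is hand-rolled for illustration only; in Node.js you would normally reach for a library such as opossum rather than maintaining your own:

```javascript
// Minimal circuit breaker: closed -> open after repeated failures,
// half-open after a cooldown, closed again on the next success.
class CircuitBreaker {
  constructor(fn, { failureThreshold = 3, cooldownMs = 5000 } = {}) {
    this.fn = fn;
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.state = 'closed';
    this.openedAt = 0;
  }

  async call(...args) {
    if (this.state === 'open') {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error('circuit open'); // fail fast: no call to the dependency
      }
      this.state = 'half-open'; // cooldown elapsed: allow one trial request
    }
    try {
      const result = await this.fn(...args);
      this.failures = 0;
      this.state = 'closed';
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}
```

Usage is just wrapping the outbound call, e.g. `const breaker = new CircuitBreaker(() => fetchUser(id))`. While the circuit is open, callers get an instant error (or a fallback) instead of tying up a thread waiting on a dead dependency.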

The fundamental principle is: design for failure. In a distributed system, partial failure is the normal state, not an exception.


Q11. Compare choreography and orchestration for saga coordination. You are building an order processing system with 6 steps: validate order, reserve inventory, calculate tax, process payment, generate invoice, and send confirmation. Which approach would you choose and why? Address failure handling, observability, and team autonomy.

Why Interviewers Ask: This is a nuanced architectural decision. They want to see that you can evaluate trade-offs specific to the scenario rather than applying a generic rule.

Model Answer:

For a 6-step saga with conditional logic (tax calculation depends on inventory reservation, invoice depends on payment), I would choose orchestration for the core order flow, with choreography for the notification step at the end.

Why orchestration for the first 5 steps:

  1. Visibility. With 6 steps, choreography creates an event chain that is nearly impossible to trace without dedicated tooling. An orchestrator holds the complete workflow state -- when debugging a failed order, I look in one place.

  2. Complex failure handling. If payment fails at step 4, I need to reverse steps 3, 2, and 1 in order. With choreography, each service must know which compensations to trigger, creating implicit dependencies. With orchestration, the compensations are explicit in the orchestrator's code:

```javascript
async execute(orderData) {
  try {
    await this.orderService.validate(orderData);             // Step 1
    await this.inventoryService.reserve(orderData);          // Step 2
    const tax = await this.taxService.calculate(orderData);  // Step 3
    await this.paymentService.charge(orderData, tax);        // Step 4
    await this.invoiceService.generate(orderData);           // Step 5
    // Step 6 via event (see below)
  } catch (err) {
    await this.compensate(); // Reverse completed steps in order
  }
}
```
  3. Conditional logic. Tax calculation needs the reserved inventory data (to know the shipping origin). Payment needs the tax amount. These data dependencies are natural in orchestration (pass results between steps) but awkward in choreography (stuff data into events).

Why choreography for the notification step:

After the order is confirmed, I publish an OrderConfirmed event. The Notification Service subscribes and sends the confirmation email independently. This step has no compensating action, does not produce data needed by other steps, and should not block the order flow.

Addressing the three concerns:

  • Failure handling: The orchestrator runs compensations in reverse. Each step has a defined compensation (release inventory, void tax, refund payment, void invoice). If a compensation fails, it writes to a dead-letter queue for manual resolution.

  • Observability: The orchestrator logs every step transition with timestamps and correlation IDs. I can query "show me all orders that failed at step 4 in the last hour" from a single service.

  • Team autonomy: The trade-off with orchestration is that the team owning the orchestrator must coordinate with all service teams. I mitigate this by keeping the orchestrator thin -- it only coordinates; business logic stays in the services. Each service team still owns their service's API contract and internal logic.

The hybrid approach (orchestration for the core flow, choreography for fire-and-forget side effects) gives the best of both: clear workflow management where it matters and loose coupling where it does not.
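One way to keep the compensations explicit (a sketch, not the only design) is a stack: each forward step pushes its inverse, and on failure the stack unwinds in reverse, with a failed compensation going to the dead-letter queue. `runSaga` and `sendToDeadLetterQueue` are hypothetical names:

```javascript
// Sketch of reverse-order compensation for an orchestrated saga.
// `steps` is a list of { run, undo } pairs; undo is the compensating transaction.
async function runSaga(steps, sendToDeadLetterQueue) {
  const done = []; // steps whose forward action committed
  try {
    for (const step of steps) {
      await step.run();
      done.push(step);
    }
  } catch (err) {
    // Unwind in reverse: the last committed step is compensated first.
    for (const step of done.reverse()) {
      try {
        await step.undo();
      } catch (undoErr) {
        // A failed compensation needs human intervention.
        await sendToDeadLetterQueue({ step, error: undoErr });
      }
    }
    throw err;
  }
}
```

Because compensation order falls out of the `done` stack, adding a seventh step to the workflow does not require touching any failure-handling code -- only defining that step's `undo`.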


Quick-Fire Table

Use this for rapid-fire interview prep. Cover the answer column and test yourself.

| Question                                   | One-Line Answer                                                                       |
| ------------------------------------------ | ------------------------------------------------------------------------------------- |
| Monolith vs microservices in one sentence? | Monolith: single deploy, shared DB. Microservices: independent deploys, separate DBs.  |
| Biggest advantage of a monolith?           | Simplicity -- one codebase, one deployment, ACID transactions.                         |
| Biggest advantage of microservices?        | Independent deployment and scaling per service.                                        |
| What is a bounded context?                 | A boundary within which a domain model is defined and consistent.                      |
| What is the Saga pattern?                  | Sequence of local transactions with compensating actions on failure.                   |
| Choreography vs orchestration?             | Choreography: decentralised events. Orchestration: central coordinator.                |
| When NOT to use microservices?             | Small team, simple domain, early startup, no DevOps maturity.                          |
| What is eventual consistency?              | Data converges across services over time, not immediately.                             |
| REST vs gRPC?                              | REST: human-readable, universal. gRPC: fast binary, strict contracts, streaming.       |
| What is a circuit breaker?                 | Stops calling a failing service; returns fallback; auto-recovers.                      |
| What is an Anti-Corruption Layer?          | Translates an external service's model into your internal model.                       |
