Episode 4 — Generative AI Engineering / 4.17 — LangChain Practical
Interview Questions: LangChain Practical
Model answers for LangChain fundamentals, chains and templates, tools and memory, agents, and LCEL.
How to use this material (instructions)
- Read lessons in order — README.md, then 4.17.a -> 4.17.e.
- Practice out loud — definition -> example -> tradeoff.
- Pair with exercises — 4.17-Exercise-Questions.md.
- Quick review — 4.17-Quick-Revision.md.
Beginner (Q1-Q4)
Q1. What is LangChain and why would you use it?
Why interviewers ask: Tests whether you understand the framework's purpose and can articulate when it adds value vs when it adds unnecessary complexity.
Model answer:
LangChain is an open-source framework for building applications powered by large language models. It provides composable building blocks — prompt templates, model wrappers, output parsers, tools, memory modules, and agents — that snap together via `.pipe()` to form complex pipelines.
You would use LangChain when your application needs multiple LLM components working together: retrieval-augmented generation, multi-turn memory, tool-calling agents, or multi-provider support. The framework handles the glue code — streaming, batching, fallbacks, observability — so you focus on application logic.
You would not use LangChain for simple, single-purpose API calls where the abstraction overhead outweighs the benefit. A function that calls the OpenAI API once with a hardcoded prompt does not need a framework.
Q2. Explain the difference between a chain and an agent in LangChain.
Why interviewers ask: This is the most fundamental architectural distinction in LangChain — it determines how your application makes decisions.
Model answer:
A chain follows a fixed, predetermined path. You define the steps at development time: prompt -> model -> parser. Every input goes through the exact same pipeline. Chains are predictable, fast, and easy to test.
An agent follows a dynamic path decided at runtime by the LLM. The model receives the user's question and a list of available tools, then enters a think-act-observe loop: it decides which tool to call, observes the result, and either calls another tool or produces a final answer. The steps vary based on the question.
Use a chain when you know the exact steps needed (summarization, translation, extraction). Use an agent when the steps depend on the user's input (research questions, customer support with multiple data sources, open-ended tasks).
Q3. What is LCEL and how does the pipe operator work?
Why interviewers ask: LCEL is the modern API for LangChain — understanding it shows you know the current best practices, not just legacy patterns.
Model answer:
LCEL stands for LangChain Expression Language. It is the composition layer that connects LangChain components using the .pipe() method. Every component implements the Runnable interface (.invoke(), .stream(), .batch(), .pipe()), and LCEL lets you chain them together.
```typescript
const chain = prompt.pipe(model).pipe(parser);
```
This creates a new Runnable where the output of the prompt flows into the model, and the model's output flows into the parser. The resulting chain itself is a Runnable, so it supports .invoke() for single calls, .stream() for streaming, and .batch() for parallel processing — all automatically.
LCEL replaced the legacy chain classes (LLMChain, SequentialChain) because it is more flexible — any Runnable can pipe to any other Runnable — and provides streaming, batching, fallbacks, and observability for free.
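The composition idea is easy to see without the library itself. Below is a minimal sketch of the Runnable pipe pattern in plain TypeScript — the `Step` type, `pipe` helper, and the fake prompt/model are illustrative stand-ins, not LangChain's actual classes:

```typescript
// Minimal sketch of pipe composition: each step maps input -> output,
// and pipe() fuses two steps into one new step.
type Step<In, Out> = { invoke: (input: In) => Out };

function pipe<A, B, C>(first: Step<A, B>, second: Step<B, C>): Step<A, C> {
  return { invoke: (input: A) => second.invoke(first.invoke(input)) };
}

// Stand-ins for prompt -> model (a real model call would be async)
const prompt: Step<{ topic: string }, string> = {
  invoke: ({ topic }) => `Tell me a joke about ${topic}`,
};
const model: Step<string, string> = {
  invoke: (text) => `AI: ${text}`, // fake model that echoes its input
};

const chain = pipe(prompt, model);
console.log(chain.invoke({ topic: "cats" })); // "AI: Tell me a joke about cats"
```

The key property is that `pipe` returns the same shape it consumes — a `Step` — which is exactly why an LCEL chain is itself a Runnable and supports `.invoke()`, `.stream()`, and `.batch()`.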
Q4. How does memory work in LangChain?
Why interviewers ask: Memory is one of the most common pain points in chatbot development — this tests practical understanding.
Model answer:
LLMs are stateless — each API call is independent. Without memory, a chatbot forgets everything between turns. LangChain's memory modules solve this by storing conversation history and injecting it into each prompt.
The main memory types are:
- BufferMemory: Stores every message verbatim. Simple but token usage grows linearly — unsuitable for long conversations.
- ConversationBufferWindowMemory: Keeps only the last N exchanges. Predictable token usage but loses older context completely.
- ConversationSummaryMemory: Uses an LLM to maintain a running summary. Compact but costs an extra API call per turn and loses exact details.
- VectorStoreMemory: Embeds conversation turns into a vector store and retrieves semantically relevant past exchanges. Smart but adds embedding latency.
In production, you combine memory with persistent storage (Redis, PostgreSQL) so conversations survive server restarts, and you use session IDs to separate conversations between different users.
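To make the window strategy concrete, here is a hypothetical sketch of a window memory in plain TypeScript (the class name and shape are illustrative; it mirrors what ConversationBufferWindowMemory does conceptually, not its real API):

```typescript
// Keep only the last k messages; older context is dropped.
type Message = { role: "human" | "ai"; content: string };

class WindowMemory {
  private messages: Message[] = [];
  constructor(private k: number) {} // k = max messages retained

  add(message: Message): void {
    this.messages.push(message);
    if (this.messages.length > this.k) {
      this.messages = this.messages.slice(-this.k); // evict the oldest
    }
  }

  history(): Message[] {
    return [...this.messages];
  }
}

const memory = new WindowMemory(4);
for (let i = 1; i <= 3; i++) {
  memory.add({ role: "human", content: `question ${i}` });
  memory.add({ role: "ai", content: `answer ${i}` });
}
// Of 6 messages, only the last 4 survive: "question 2" onward.
console.log(memory.history().map((m) => m.content));
```

This is also why window memory gives predictable token usage: the prompt never carries more than `k` messages of history, no matter how long the conversation runs.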
Intermediate (Q5-Q8)
Q5. How would you design a production LangChain application with fallbacks and error handling?
Why interviewers ask: Separates engineers who have built real systems from those who have only followed tutorials.
Model answer:
Production LangChain applications need multiple layers of resilience:
Model-level fallbacks: Use .withFallbacks() to chain providers — try GPT-4o, fall back to GPT-4o-mini, then Claude. This handles rate limits and outages.
```typescript
const model = new ChatOpenAI({ modelName: 'gpt-4o' })
  .withFallbacks({ fallbacks: [gpt4oMini, claude] });
```
Retry logic: Use .withRetry({ stopAfterAttempt: 3 }) on chains for transient failures.
Agent safety: Set maxIterations to prevent infinite loops, enable handleParsingErrors so parse failures are sent back to the model for correction, and implement timeouts at the application level.
Output validation: Use Zod schemas with .withStructuredOutput() to validate model responses. If validation fails, retry with a corrective prompt.
Observability: Enable LangSmith tracing to see every step of every chain execution in production. Alert on failure rates, latency spikes, and token cost anomalies.
Graceful degradation: When the LLM pipeline fails completely, return a predefined fallback response ("I'm unable to help right now") rather than crashing.
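The retry -> fallback -> degrade layering can be sketched in plain TypeScript. The function names here are hypothetical (LangChain's real `.withRetry()`/`.withFallbacks()` wrap this logic for you); the sketch just shows the order in which the layers fire:

```typescript
// Retry a transient-failure-prone call up to `attempts` times.
async function withRetry<T>(fn: () => Promise<T>, attempts: number): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Layering: retry the primary model, then fall back, then degrade gracefully.
async function resilientInvoke(
  primary: () => Promise<string>,
  fallback: () => Promise<string>,
): Promise<string> {
  try {
    return await withRetry(primary, 3);
  } catch {
    try {
      return await fallback();
    } catch {
      return "I'm unable to help right now."; // graceful degradation
    }
  }
}

// Example: primary fails every time, fallback succeeds.
resilientInvoke(
  async () => { throw new Error("rate limited"); },
  async () => "fallback answer",
).then((result) => console.log(result)); // "fallback answer"
```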
Q6. When would you use LangChain vs building with raw SDK calls? Give specific examples.
Why interviewers ask: Tests engineering judgment — not everything needs a framework. The best answer acknowledges tradeoffs honestly.
Model answer:
Use LangChain when:
- RAG pipelines — document loaders, text splitters, vector store integrations, and retrieval chains save weeks of development. LangChain has battle-tested implementations.
- Agent systems — the AgentExecutor loop, tool management, and scratchpad handling are complex to build correctly. LangChain handles edge cases.
- Multi-provider support — if you need to test OpenAI vs Anthropic vs Gemini, LangChain's provider abstraction means you change one line to swap models.
- Observability — LangSmith integration gives you full execution traces with zero code changes.
Use raw SDK when:
- Simple completions — one model, one prompt, one response. LangChain adds overhead without benefit.
- Latency-critical paths — LangChain adds a few milliseconds of abstraction overhead per call. For real-time applications, this matters.
- Minimal dependencies — serverless functions with strict bundle sizes benefit from lighter dependencies.
- Full control — custom streaming formats, custom retry logic, or non-standard error handling that doesn't fit LangChain's patterns.
Hybrid approach (common in production): Use LangChain for the complex parts (RAG, agents) and raw SDK for the simple, latency-critical parts (classification, quick completions).
Q7. Explain the agent execution flow in detail. What is agent_scratchpad?
Why interviewers ask: Tests deep understanding of how agents actually work, not just surface-level "agents call tools."
Model answer:
The agent execution follows a think-act-observe loop managed by the AgentExecutor:
Iteration 1: The executor sends the user's question + tool descriptions + empty agent_scratchpad to the LLM. The LLM responds with either a tool call (specifying tool name and arguments) or a final answer.
If tool call: The executor runs the specified tool, captures the result, and appends both the tool call and the tool result to agent_scratchpad. This is the "observe" step.
Iteration 2: The executor sends the same prompt but with the now-populated agent_scratchpad (containing the previous tool call and result). The LLM now has additional context and can decide to call another tool or give a final answer.
This repeats until the LLM produces a final answer or maxIterations is reached.
The agent_scratchpad is a MessagesPlaceholder in the prompt template that accumulates the sequence of AIMessage (with tool calls) and ToolMessage (with tool results) objects. It acts as the agent's working memory for the current task — distinct from conversation memory, which spans across tasks.
The key design insight: the LLM does not execute tools. It outputs structured intent (which tool, what arguments). The AgentExecutor interprets this intent, executes the tool, and feeds the result back. The LLM is the brain; the executor is the hands.
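The loop above can be sketched end to end. Everything here is a toy stand-in: `fakeLLM` is a stub that plays the model's role (real agents delegate this decision to an LLM), and the calculator tool is illustrative. The structure — think, act, observe, with a scratchpad and a `maxIterations` guard — is the point:

```typescript
type ToolCall = { tool: string; args: string };
type ScratchpadEntry = { call: ToolCall; result: string };

// Toy tool registry; never eval untrusted input in real code.
const tools: Record<string, (args: string) => string> = {
  calculator: (args) => String(eval(args)),
};

// Stub "model": call the calculator once, then answer from the observation.
function fakeLLM(
  question: string,
  scratchpad: ScratchpadEntry[],
): ToolCall | { answer: string } {
  if (scratchpad.length === 0) return { tool: "calculator", args: question };
  return { answer: `The result is ${scratchpad[0].result}` };
}

function runAgent(question: string, maxIterations = 5): string {
  const scratchpad: ScratchpadEntry[] = []; // the agent_scratchpad
  for (let i = 0; i < maxIterations; i++) {
    const decision = fakeLLM(question, scratchpad);   // think
    if ("answer" in decision) return decision.answer; // final answer
    const result = tools[decision.tool](decision.args); // act (executor runs the tool)
    scratchpad.push({ call: decision, result });        // observe
  }
  return "Stopped: max iterations reached";
}

console.log(runAgent("6 * 7")); // "The result is 42"
```

Note that the model stub only ever returns intent; the tool is executed by `runAgent`. That separation is the "LLM is the brain, executor is the hands" insight in code form.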
Q8. How do you choose the right memory strategy for a production chatbot?
Why interviewers ask: Tests system design thinking — memory choices have direct impact on cost, accuracy, and user experience.
Model answer:
The choice depends on three factors: conversation length, recall requirements, and cost budget.
Short conversations (< 20 turns): Use BufferMemory. Simple, perfect recall, and the token cost is manageable. A 20-turn conversation is roughly 4,000-6,000 tokens of history — well within context limits.
Medium conversations (20-50 turns): Use BufferWindowMemory with k=10-15 most recent turns. The user gets good short-term recall; if they reference something from 30 turns ago, the model won't remember — an acceptable tradeoff for most chatbots.
Long conversations (50+ turns): Two options:
- SummaryMemory — the model has a gist of the entire conversation but loses exact quotes. Good for support chats where the customer's issue and history matter but exact wording doesn't.
- Hybrid — BufferWindowMemory for the last 5 turns (exact recall of recent context) + VectorStoreMemory for semantic retrieval of older relevant turns.
Cross-session recall (the user returns days later): Use VectorStoreMemory backed by a persistent database. When the user returns, the system retrieves relevant past interactions based on the current query.
In all production cases, back the memory with persistent storage (Redis, PostgreSQL) and use session IDs to isolate conversations.
Advanced (Q9-Q11)
Q9. Design a multi-step LangChain pipeline for a document analysis system.
Why interviewers ask: Tests your ability to architect a complex system using LangChain's building blocks, balancing composability with practical constraints.
Model answer:
Requirements: The system receives a document (PDF/text), classifies it, extracts structured data, validates it, and stores results.
Pipeline architecture:
Step 1 — Ingest (RunnableLambda)
Load document -> Split into chunks -> Compute embeddings
Step 2 — Classify (prompt | model | parser)
Send first 500 tokens to classifier chain
Output: { type: "invoice" | "contract" | "resume" }
Step 3 — Route (RunnableBranch)
If invoice -> invoiceExtractionChain (extracts vendor, amount, date, line items)
If contract -> contractExtractionChain (extracts parties, terms, dates, obligations)
If resume -> resumeExtractionChain (extracts name, experience, skills, education)
Step 4 — Validate (RunnableLambda)
Parse extraction output with Zod schema
If valid -> proceed
If invalid -> retry with correction prompt (up to 2 attempts)
Step 5 — Enrich (RunnableParallel)
Run in parallel:
- Summarize the document (summarizeChain)
- Extract key entities (entityChain)
- Generate tags (tagChain)
Step 6 — Store (RunnableLambda)
Save to database + vector store for future retrieval
Key design decisions:
- RunnableBranch for routing — different document types need fundamentally different extraction schemas
- RunnableParallel for enrichment — summary, entities, and tags are independent, so running them concurrently takes roughly as long as the slowest of the three instead of the sum of all three
- Validation with retry — LLM output is not guaranteed to match the schema. Retry with a correction prompt is cheaper than failing
- Fallbacks on the model — .withFallbacks() for provider resilience
- LangSmith tracing — every step traced for debugging failed extractions
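Steps 3 and 5 are the structurally interesting ones, and their shape can be sketched in plain TypeScript. The chain names and outputs below are hypothetical stand-ins for the extraction chains named above, not real LangChain calls:

```typescript
type Doc = { type: "invoice" | "contract" | "resume"; text: string };

// Step 3 — route (the RunnableBranch idea): pick extraction logic by type.
const extractors: Record<Doc["type"], (text: string) => string> = {
  invoice: (text) => `invoice fields from: ${text}`,
  contract: (text) => `contract fields from: ${text}`,
  resume: (text) => `resume fields from: ${text}`,
};

function route(doc: Doc): string {
  return extractors[doc.type](doc.text);
}

// Step 5 — enrich (the RunnableParallel idea): independent steps run
// concurrently; total latency is roughly that of the slowest step.
async function enrich(text: string) {
  const [summary, entities, tags] = await Promise.all([
    Promise.resolve(`summary of ${text}`),   // stand-in for summarizeChain
    Promise.resolve(["Acme Corp"]),          // stand-in for entityChain
    Promise.resolve(["finance"]),            // stand-in for tagChain
  ]);
  return { summary, entities, tags };
}
```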
Q10. How would you evaluate and optimize a LangChain agent in production?
Why interviewers ask: Tests production mindset — building the agent is step one; measuring and improving it is the ongoing challenge.
Model answer:
Evaluation framework:
1. Define metrics:
- Task completion rate — did the agent answer the question correctly?
- Tool selection accuracy — did it call the right tools?
- Iteration efficiency — how many tool calls did it need? (fewer is better)
- Latency — end-to-end time including all tool calls
- Cost — total tokens used across all iterations
- Error rate — how often does it hit maxIterations, parsing errors, or tool failures?
2. Build an evaluation dataset: Create 200+ test cases covering: single-tool questions, multi-tool questions, questions requiring no tools, edge cases (ambiguous queries, impossible requests), and adversarial inputs.
3. Run automated evals: Use LangSmith's evaluation features or a custom script to run the agent on all test cases, measure the metrics above, and track them over time.
4. Optimization strategies:
- Prompt engineering — refine the system prompt's tool usage instructions based on failure analysis
- Tool description tuning — the #1 cause of wrong tool selection is vague descriptions. Analyze failures and rewrite descriptions
- Reduce iterations — if the agent consistently needs 4 tool calls for questions that should need 2, the tool design or descriptions need improvement
- Model selection — GPT-4o is more reliable than GPT-4o-mini for complex reasoning but costs more. Use the cheapest model that meets the accuracy threshold
- Caching — cache tool results for repeated queries (e.g., weather data can be cached for 30 minutes)
5. Monitor in production: Track all metrics in real-time via LangSmith. Alert on: completion rate drops below threshold, average iterations increases, error rate spikes, or cost exceeds budget.
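The caching optimization from step 4 is simple to sketch. This is an illustrative TTL memoization wrapper (the class and the weather tool are made up for the example), showing how repeated queries skip the tool call entirely:

```typescript
// Cache entries expire after ttlMs; expired entries read as misses.
class TtlCache<V> {
  private store = new Map<string, { value: V; expires: number }>();
  constructor(private ttlMs: number) {}

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.store.get(key);
    if (!entry || entry.expires <= now) return undefined;
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.store.set(key, { value, expires: now + this.ttlMs });
  }
}

let toolCalls = 0;
const cache = new TtlCache<string>(30 * 60 * 1000); // 30-minute TTL

function cachedWeatherTool(city: string): string {
  const hit = cache.get(city);
  if (hit !== undefined) return hit; // cache hit: no tool call
  toolCalls++; // only incremented on a miss
  const result = `sunny in ${city}`; // stand-in for the real tool call
  cache.set(city, result);
  return result;
}

cachedWeatherTool("Paris");
cachedWeatherTool("Paris"); // second call served from cache
console.log(toolCalls); // 1
```

In an agent, this wrapper sits inside the tool function itself, so the AgentExecutor and the model need no changes to benefit from it.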
Q11. Compare LangChain, LangGraph, and building a custom agent framework. When would you choose each?
Why interviewers ask: Tests breadth of knowledge and architectural judgment at the senior level.
Model answer:
LangChain (core framework with AgentExecutor): Best for standard agent patterns — a single agent with tools in a think-act-observe loop. Quick to build, well-documented, good ecosystem. Limitations: the AgentExecutor assumes a simple loop structure. Anything more complex (branching, human approval, multi-agent) requires workarounds.
LangGraph: Best for complex, stateful workflows that look like flowcharts rather than loops. LangGraph models workflows as directed graphs where nodes are computation steps and edges define the flow (including conditional edges and cycles). Key capabilities:
- Multi-agent collaboration (Agent A hands off to Agent B)
- Human-in-the-loop (pause execution, wait for approval, resume)
- Persistent state across sessions (checkpoint the full graph state)
- Complex branching and merging
Use LangGraph when you need workflows like: "The research agent gathers data, sends it to the analysis agent, which produces a report that requires human approval before the publishing agent posts it."
Custom framework: Best when you need full control over every aspect of execution, or when LangChain/LangGraph's abstractions don't fit your architecture. Custom frameworks are common at companies with:
- Proprietary execution environments (not standard Node.js/Python)
- Unique control flow requirements (e.g., real-time collaborative agents)
- Extreme performance requirements (every millisecond matters)
- Organizational resistance to framework dependencies
The tradeoff: you build and maintain everything yourself — tool execution, error recovery, streaming, observability, retry logic. This is significant ongoing engineering cost.
Decision matrix:
| Scenario | Recommendation |
|---|---|
| Simple agent with tools | LangChain AgentExecutor |
| RAG pipeline | LangChain chains |
| Multi-agent workflow | LangGraph |
| Human-in-the-loop approvals | LangGraph |
| Single API call wrapper | Raw SDK |
| Extreme performance needs | Custom framework |
| Rapid prototyping | LangChain |
| Long-term production system | Evaluate all three based on your specific requirements |
Quick-fire
| # | Question | One-line answer |
|---|---|---|
| 1 | Install command for LangChain + OpenAI? | npm install langchain @langchain/openai |
| 2 | Modern pipe syntax? | prompt.pipe(model).pipe(parser) |
| 3 | What is LCEL? | LangChain Expression Language — pipe-based composition of Runnables |
| 4 | BufferMemory problem in long chats? | Token usage grows without bound — exceeds context window |
| 5 | What does agent_scratchpad hold? | Accumulated tool calls and results from current task |
| 6 | How to make streaming work? | Call .stream() on any LCEL chain — it's automatic |
| 7 | Legacy chain classes status? | Deprecated — use LCEL pipe syntax instead |
| 8 | How to add fallbacks? | model.withFallbacks({ fallbacks: [fallbackModel] }) |
| 9 | RunnableParallel purpose? | Run multiple chains simultaneously on the same input |
| 10 | When to use agent vs chain? | Agent: dynamic steps. Chain: fixed steps |
<- Back to 4.17 — LangChain Practical (README)