Episode 4 — Generative AI Engineering / 4.17 — LangChain Practical
4.17.c — Tools and Memory
In one sentence: LangChain tools give your LLM the ability to take actions in the real world (search the web, query databases, call APIs), while memory modules let your chains remember previous interactions so conversations feel continuous rather than starting fresh every turn.
Navigation: <- 4.17.b Chains and Prompt Templates | 4.17.d — Working with Agents ->
1. LangChain Tools: Built-in and Custom
A tool in LangChain is a function that an LLM can decide to call. Each tool has a name, a description (which the LLM reads to decide when to use it), and an implementation (the actual function that runs).
Why tools matter
Without tools, an LLM can only generate text based on its training data. With tools, it can:
- Search the web for current information
- Query a database for real-time data
- Perform calculations with guaranteed accuracy
- Call external APIs (send emails, create tickets, update records)
- Read and write files
Built-in tools
LangChain provides ready-made tools for common tasks:
import { TavilySearchResults } from '@langchain/community/tools/tavily_search';
import { Calculator } from '@langchain/community/tools/calculator';
import { WikipediaQueryRun } from '@langchain/community/tools/wikipedia_query_run';
// Web search tool
const searchTool = new TavilySearchResults({
maxResults: 3,
apiKey: process.env.TAVILY_API_KEY
});
// Calculator tool
const calculatorTool = new Calculator();
// Wikipedia tool
const wikiTool = new WikipediaQueryRun({
topKResults: 1,
maxDocContentLength: 1000
});
// Each tool has a name, description, and can be invoked
console.log(searchTool.name); // "tavily_search_results_json"
console.log(searchTool.description); // "A search engine. Useful for..."
const searchResults = await searchTool.invoke('LangChain latest version 2025');
console.log(searchResults);
The tool interface
Every LangChain tool follows this interface:
Tool:
name: string — Unique identifier (the LLM uses this to call the tool)
description: string — Natural language description (the LLM reads this to decide WHEN to use it)
schema: ZodSchema — Input schema (what arguments the tool accepts)
invoke(input): output — The actual function that runs
The description is critical — together with the name and input schema, it is all the LLM sees when choosing a tool. A vague description leads to wrong tool selection; a precise description leads to correct tool selection.
2. Creating Custom Tools
Using DynamicTool (simplest approach)
import { DynamicTool } from '@langchain/core/tools';
const weatherTool = new DynamicTool({
name: 'get_weather',
description: 'Get the current weather for a given city. Input should be a city name like "London" or "New York".',
func: async (cityName) => {
// In production, call a real weather API
const response = await fetch(
`https://api.weatherapi.com/v1/current.json?key=${process.env.WEATHER_API_KEY}&q=${encodeURIComponent(cityName)}`
);
const data = await response.json();
return JSON.stringify({
city: data.location.name,
temperature: data.current.temp_c,
condition: data.current.condition.text
});
}
});
// Test the tool directly
const weather = await weatherTool.invoke('London');
console.log(weather);
// {"city":"London","temperature":15,"condition":"Partly cloudy"}
Using DynamicStructuredTool (with typed input)
import { DynamicStructuredTool } from '@langchain/core/tools';
import { z } from 'zod';
const databaseQueryTool = new DynamicStructuredTool({
name: 'query_database',
description: 'Query the product database. Use this to look up product information by name or category.',
schema: z.object({
query: z.string().describe('The search query for products'),
category: z.string().optional().describe('Optional category filter: electronics, clothing, food'),
limit: z.number().default(5).describe('Maximum number of results to return')
}),
func: async ({ query, category, limit }) => {
// In production, query your actual database
const results = await db.products.search({
text: query,
category: category,
limit: limit
});
return JSON.stringify(results);
}
});
Using the tool() function (modern approach)
import { tool } from '@langchain/core/tools';
import { z } from 'zod';
const calculateDiscount = tool(
async ({ originalPrice, discountPercent }) => {
const discount = originalPrice * (discountPercent / 100);
const finalPrice = originalPrice - discount;
return JSON.stringify({
originalPrice,
discountPercent,
discountAmount: discount.toFixed(2),
finalPrice: finalPrice.toFixed(2)
});
},
{
name: 'calculate_discount',
description: 'Calculate the final price after applying a percentage discount.',
schema: z.object({
originalPrice: z.number().describe('The original price in dollars'),
discountPercent: z.number().min(0).max(100).describe('The discount percentage (0-100)')
})
}
);
const result = await calculateDiscount.invoke({
originalPrice: 99.99,
discountPercent: 20
});
console.log(result);
// {"originalPrice":99.99,"discountPercent":20,"discountAmount":"20.00","finalPrice":"79.99"}
Tool design best practices
| Practice | Why |
|---|---|
| Descriptive names | search_product_catalog > search — the LLM needs context |
| Detailed descriptions | Include what the tool does, when to use it, and what input format is expected |
| Structured schemas | Use Zod schemas with .describe() on each field — the LLM sees these descriptions |
| Return strings | Tools should return serialized strings (JSON) — LLMs process text |
| Handle errors gracefully | Return error messages as strings rather than throwing — the agent can recover from a string error |
| Keep tools focused | One tool per action — don't make a tool that does 5 different things |
// BAD: Vague description
const tool1 = new DynamicTool({
name: 'api',
description: 'Calls an API',
func: async (input) => { /* ... */ }
});
// GOOD: Precise description
const tool2 = new DynamicTool({
name: 'get_order_status',
description: 'Look up the current status of a customer order. Input must be an order ID like "ORD-12345". Returns the order status (pending, shipped, delivered, cancelled) and estimated delivery date.',
func: async (orderId) => {
try {
const order = await orderService.getStatus(orderId);
return JSON.stringify(order);
} catch (error) {
return `Error: Could not find order ${orderId}. Please verify the order ID format (e.g., ORD-12345).`;
}
}
});
3. Memory Modules
LLMs are stateless — each API call is independent. Without memory, every conversation turn starts from scratch. LangChain's memory modules solve this by storing and injecting conversation history automatically.
The memory problem
Turn 1: User: "My name is Alex." → AI: "Nice to meet you, Alex!"
Turn 2: User: "What is my name?" → AI: "I don't know your name." (no memory!)
Why? Each API call is independent. The model doesn't "remember" turn 1.
The solution: explicitly pass conversation history with each request.
Memory modules automate this.
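It is worth seeing this bookkeeping done by hand once. The sketch below (using a hypothetical `buildMessages` helper, not a LangChain API) shows what every memory module automates: accumulate the transcript and prepend it to each request.

```typescript
type Msg = { role: 'system' | 'user' | 'assistant'; content: string };

// Hypothetical helper: assemble the full message list sent on every API call.
function buildMessages(history: Msg[], userInput: string): Msg[] {
  return [
    { role: 'system', content: 'You are a helpful assistant.' },
    ...history,
    { role: 'user', content: userInput }
  ];
}

const history: Msg[] = [];

// Turn 1: history is empty, so only system + user messages are sent.
let msgs = buildMessages(history, 'My name is Alex.');
// ...send msgs to the model, then record both sides of the exchange...
history.push({ role: 'user', content: 'My name is Alex.' });
history.push({ role: 'assistant', content: 'Nice to meet you, Alex!' });

// Turn 2: the model now sees turn 1, so "What is my name?" is answerable.
msgs = buildMessages(history, 'What is my name?');
console.log(msgs.length); // 4: system, turn-1 user, turn-1 assistant, new user
```

Doing this by hand means manually loading, injecting, and saving history on every turn; the memory modules below handle each of those steps for you.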
BufferMemory — store everything
The simplest memory: stores every message in the conversation verbatim.
import { BufferMemory } from 'langchain/memory';
import { ChatOpenAI } from '@langchain/openai';
import { ChatPromptTemplate, MessagesPlaceholder } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';
const memory = new BufferMemory({
returnMessages: true, // Return as Message objects (not a string)
memoryKey: 'history' // Key used in the prompt template
});
const prompt = ChatPromptTemplate.fromMessages([
['system', 'You are a friendly assistant. Keep track of the conversation.'],
new MessagesPlaceholder('history'),
['user', '{input}']
]);
const model = new ChatOpenAI({ modelName: 'gpt-4o', temperature: 0.7 });
const parser = new StringOutputParser();
// Build a chain with memory
async function chat(input) {
// Load memory variables
const memoryVariables = await memory.loadMemoryVariables({});
// Run the chain
const chain = prompt.pipe(model).pipe(parser);
const response = await chain.invoke({
input: input,
history: memoryVariables.history
});
// Save the new exchange to memory
await memory.saveContext(
{ input: input },
{ output: response }
);
return response;
}
// Multi-turn conversation
console.log(await chat('My name is Alex and I work at Acme Corp.'));
// "Nice to meet you, Alex! How's everything at Acme Corp?"
console.log(await chat('What is my name?'));
// "Your name is Alex!"
console.log(await chat('Where do I work?'));
// "You work at Acme Corp!"
Tradeoff: BufferMemory stores every message. After 50 turns, the history consumes thousands of tokens. In long conversations, this will exceed the context window.
ConversationSummaryMemory — compress old history
Instead of storing every message, this memory keeps a running summary of the conversation. It uses an LLM to summarize older turns, keeping the history compact.
import { ConversationSummaryMemory } from 'langchain/memory';
import { ChatOpenAI } from '@langchain/openai';
const memory = new ConversationSummaryMemory({
llm: new ChatOpenAI({ modelName: 'gpt-4o-mini', temperature: 0 }),
returnMessages: true,
memoryKey: 'history'
});
// After many turns, instead of storing every message, memory contains:
// "The user's name is Alex. They work at Acme Corp as a frontend developer.
// They asked about React performance optimization. The assistant recommended
// useMemo and React.memo for expensive computations..."
// This might be 100 tokens instead of 5,000 tokens of raw conversation history.
Tradeoff: Summary memory costs an extra LLM call per turn (to update the summary) and loses exact details. The user said "I love React 19" but the summary might only say "The user likes React."
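The update loop can be sketched without an LLM by stubbing the summarizer. In the real class, the `summarize` step below is an LLM call that folds each new exchange into the running summary; the function names here are illustrative, not LangChain APIs.

```typescript
type Turn = { input: string; output: string };

// In ConversationSummaryMemory this step is an LLM call; here it is stubbed
// so the update loop itself is visible.
function updateSummary(
  summary: string,
  turn: Turn,
  summarize: (prev: string, t: Turn) => string
): string {
  return summarize(summary, turn);
}

// Stub summarizer: appends a one-line note per turn instead of calling a model.
const stubSummarize = (prev: string, t: Turn) =>
  `${prev} User said: "${t.input}".`.trim();

let summary = '';
summary = updateSummary(summary, { input: 'My name is Alex', output: 'Hi Alex!' }, stubSummarize);
summary = updateSummary(summary, { input: 'I work at Acme', output: 'Nice!' }, stubSummarize);
console.log(summary);
// 'User said: "My name is Alex". User said: "I work at Acme".'
```

Note that the summary stays a single compact string no matter how many turns accumulate, which is exactly the token-saving property described above.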
BufferWindowMemory — sliding window
Keeps only the last N exchanges. Simple and predictable.
import { BufferWindowMemory } from 'langchain/memory';
const memory = new BufferWindowMemory({
k: 5, // Keep last 5 exchanges (10 messages: 5 user + 5 assistant)
returnMessages: true,
memoryKey: 'history'
});
// Turn 1-5: all stored
// Turn 6: turn 1 is dropped, turns 2-6 stored
// Turn 7: turns 1-2 dropped, turns 3-7 stored
// etc.
Tradeoff: Hard cutoff — the model completely forgets anything beyond the window. No gradual degradation.
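The window itself is just array slicing. A sketch of the eviction rule (illustrative, not the library's internals):

```typescript
// Keep only the last k exchanges; each exchange is 2 messages (user + assistant).
function windowed<T>(messages: T[], k: number): T[] {
  return messages.slice(-k * 2);
}

const transcript = ['u1', 'a1', 'u2', 'a2', 'u3', 'a3', 'u4', 'a4'];
console.log(windowed(transcript, 2));
// ['u3', 'a3', 'u4', 'a4']: turns 1 and 2 are gone entirely
```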
VectorStoreMemory — semantic retrieval of past conversations
Stores conversation turns in a vector database and retrieves the most relevant past exchanges for the current query. This is the most sophisticated approach.
import { VectorStoreRetrieverMemory } from 'langchain/memory';
import { MemoryVectorStore } from 'langchain/vectorstores/memory';
import { OpenAIEmbeddings } from '@langchain/openai';
const vectorStore = new MemoryVectorStore(new OpenAIEmbeddings());
const memory = new VectorStoreRetrieverMemory({
vectorStoreRetriever: vectorStore.asRetriever(3), // Retrieve top 3 relevant memories
memoryKey: 'history'
});
// Save some context
await memory.saveContext(
{ input: 'My favorite programming language is TypeScript' },
{ output: 'TypeScript is a great choice! The type system helps catch bugs early.' }
);
await memory.saveContext(
{ input: 'I had pizza for lunch' },
{ output: 'Pizza is always a solid choice!' }
);
await memory.saveContext(
{ input: 'I am building a React dashboard' },
{ output: 'React dashboards benefit from good state management...' }
);
// Query: "What language should I use for my project?"
// VectorStoreMemory retrieves the TypeScript and React exchanges
// (semantically relevant) but NOT the pizza exchange (irrelevant)
const relevantMemories = await memory.loadMemoryVariables({
input: 'What language should I use for my project?'
});
Tradeoff: Extra latency for embedding + retrieval per turn. May miss contextually important but semantically distant information.
Memory comparison table
| Memory Type | Token Usage | Recall Accuracy | Extra Cost | Best For |
|---|---|---|---|---|
| BufferMemory | Grows linearly | Perfect (everything stored) | None | Short conversations (< 20 turns) |
| BufferWindowMemory | Fixed (k * 2 messages) | Recent only | None | Chatbots with long sessions |
| ConversationSummaryMemory | Grows slowly | Approximate | 1 LLM call per turn | Long conversations needing gist |
| VectorStoreMemory | Fixed retrieval size | Semantic (may miss things) | Embedding per turn | Conversations spanning many topics |
4. Adding Memory to Chains
Here is a complete chatbot implementation with memory:
import { ChatOpenAI } from '@langchain/openai';
import { ChatPromptTemplate, MessagesPlaceholder } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';
import { BufferMemory } from 'langchain/memory';
import { RunnableWithMessageHistory } from '@langchain/core/runnables';
import { ChatMessageHistory } from 'langchain/stores/message/in_memory';
// Store for multiple conversation sessions
const messageHistories = {};
function getMessageHistory(sessionId) {
if (!messageHistories[sessionId]) {
messageHistories[sessionId] = new ChatMessageHistory();
}
return messageHistories[sessionId];
}
const model = new ChatOpenAI({ modelName: 'gpt-4o', temperature: 0.7 });
const parser = new StringOutputParser();
const prompt = ChatPromptTemplate.fromMessages([
['system', 'You are a helpful assistant. You remember everything the user tells you.'],
new MessagesPlaceholder('history'),
['user', '{input}']
]);
const chain = prompt.pipe(model).pipe(parser);
// Wrap the chain with message history management
const chainWithHistory = new RunnableWithMessageHistory({
runnable: chain,
getMessageHistory: (sessionId) => getMessageHistory(sessionId),
inputMessagesKey: 'input',
historyMessagesKey: 'history'
});
// Conversation with session tracking
const config = { configurable: { sessionId: 'user-123' } };
const r1 = await chainWithHistory.invoke(
{ input: 'Hi! My name is Jordan and I am learning TypeScript.' },
config
);
console.log(r1);
// "Hello Jordan! That's great that you're learning TypeScript..."
const r2 = await chainWithHistory.invoke(
{ input: 'What am I learning?' },
config
);
console.log(r2);
// "You're learning TypeScript!"
// Different session — separate memory
const config2 = { configurable: { sessionId: 'user-456' } };
const r3 = await chainWithHistory.invoke(
{ input: 'What is my name?' },
config2
);
console.log(r3);
// "I don't know your name yet! What should I call you?"
5. Persistent Memory with Databases
In-memory storage is lost when the process restarts. For production, you need persistent storage.
Using Redis for message history
import { RedisChatMessageHistory } from '@langchain/redis';
function getMessageHistory(sessionId) {
return new RedisChatMessageHistory({
sessionId: sessionId,
url: process.env.REDIS_URL || 'redis://localhost:6379'
});
}
// Now conversations persist across server restarts
const chainWithHistory = new RunnableWithMessageHistory({
runnable: chain,
getMessageHistory: getMessageHistory,
inputMessagesKey: 'input',
historyMessagesKey: 'history'
});
Using a SQL database
// Conceptual example — store messages in PostgreSQL
import { BaseListChatMessageHistory } from '@langchain/core/chat_history';
import { HumanMessage, AIMessage } from '@langchain/core/messages';
class PostgresChatHistory extends BaseListChatMessageHistory {
constructor(sessionId, pool) {
super();
this.sessionId = sessionId;
this.pool = pool;
}
async getMessages() {
const result = await this.pool.query(
'SELECT role, content FROM messages WHERE session_id = $1 ORDER BY created_at',
[this.sessionId]
);
return result.rows.map(row =>
row.role === 'human'
? new HumanMessage(row.content)
: new AIMessage(row.content)
);
}
async addMessage(message) {
const role = message._getType() === 'human' ? 'human' : 'ai';
await this.pool.query(
'INSERT INTO messages (session_id, role, content, created_at) VALUES ($1, $2, $3, NOW())',
[this.sessionId, role, message.content]
);
}
async clear() {
await this.pool.query(
'DELETE FROM messages WHERE session_id = $1',
[this.sessionId]
);
}
}
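The class above assumes a messages table exists. A schema along these lines would support it — the column names match the queries in the class, while the exact types and index are assumptions:

```sql
-- Hypothetical schema backing PostgresChatHistory; column names match the
-- queries in the class above, the rest (types, index) are assumptions.
CREATE TABLE messages (
  id         SERIAL PRIMARY KEY,
  session_id TEXT        NOT NULL,
  role       TEXT        NOT NULL CHECK (role IN ('human', 'ai')),
  content    TEXT        NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Reads are always "all messages for one session, in order", so index for that.
CREATE INDEX idx_messages_session ON messages (session_id, created_at);
```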
Production memory architecture
+----------------------------------------------------------+
| CLIENT REQUEST |
| { sessionId: "user-123", input: "What did I say?" } |
+------------------+---------------------------------------+
|
v
+----------------------------------------------------------+
| MEMORY LAYER |
| |
| 1. Load history from persistent store (Redis/Postgres) |
| 2. If history too long: |
| a. Summarize old messages (ConversationSummaryMemory) |
| b. OR keep last N messages (WindowMemory) |
| c. OR retrieve relevant messages (VectorStoreMemory) |
| 3. Inject history into prompt template |
+------------------+---------------------------------------+
|
v
+----------------------------------------------------------+
| CHAIN: prompt -> model -> parser |
+------------------+---------------------------------------+
|
v
+----------------------------------------------------------+
| SAVE: Store new user message + AI response |
| back to persistent store |
+----------------------------------------------------------+
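Step 2 of the diagram, deciding what to do when the loaded history is too long, can be sketched as a pure function. The roughly-4-characters-per-token estimate and the fall-back-to-window strategy are assumptions; a production system would use the model's tokenizer and might summarize instead:

```typescript
type Msg = { role: string; content: string };

// Rough token estimate (~4 characters per token); a real system would use
// the model's tokenizer.
function estimateTokens(messages: Msg[]): number {
  return Math.ceil(messages.reduce((n, m) => n + m.content.length, 0) / 4);
}

// If the history fits the budget, keep it all; otherwise fall back to a
// window of the most recent messages. (Summarization or vector retrieval
// could be substituted at the same decision point.)
function fitHistory(messages: Msg[], budget: number, keepLast = 6): Msg[] {
  if (estimateTokens(messages) <= budget) return messages;
  return messages.slice(-keepLast);
}

const history: Msg[] = Array.from({ length: 40 }, (_, i) => ({
  role: i % 2 ? 'ai' : 'human',
  content: `message number ${i} with some filler text to take up space`
}));
console.log(fitHistory(history, 100).length); // 6: over budget, trimmed to window
console.log(fitHistory(history.slice(0, 2), 100).length); // 2: fits, kept whole
```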
6. Combining Tools and Memory
The real power emerges when you combine tools and memory — an agent that can take actions AND remember previous interactions:
import { ChatOpenAI } from '@langchain/openai';
import { ChatPromptTemplate, MessagesPlaceholder } from '@langchain/core/prompts';
import { DynamicStructuredTool } from '@langchain/core/tools';
import { z } from 'zod';
import { AgentExecutor, createOpenAIToolsAgent } from 'langchain/agents';
import { BufferMemory } from 'langchain/memory';
// Define tools
const lookupTool = new DynamicStructuredTool({
name: 'lookup_user_orders',
description: 'Look up all orders for a user by their email address.',
schema: z.object({
email: z.string().email().describe('The user email address')
}),
func: async ({ email }) => {
// Simulated database lookup
const orders = {
'alex@example.com': [
{ id: 'ORD-001', status: 'shipped', total: 59.99 },
{ id: 'ORD-002', status: 'delivered', total: 124.50 }
]
};
return JSON.stringify(orders[email] || []);
}
});
const refundTool = new DynamicStructuredTool({
name: 'process_refund',
description: 'Process a refund for an order. Only use when the user explicitly requests a refund.',
schema: z.object({
orderId: z.string().describe('The order ID to refund, e.g. ORD-001'),
reason: z.string().describe('The reason for the refund')
}),
func: async ({ orderId, reason }) => {
return JSON.stringify({
success: true,
refundId: `REF-${Date.now()}`,
message: `Refund initiated for order ${orderId}. Reason: ${reason}. Processing time: 3-5 business days.`
});
}
});
// Setup
const model = new ChatOpenAI({ modelName: 'gpt-4o', temperature: 0 });
const tools = [lookupTool, refundTool];
const prompt = ChatPromptTemplate.fromMessages([
['system', `You are a helpful customer service agent. Be polite and helpful.
When looking up orders, ask for the user's email if not provided.
Only process refunds when the user explicitly asks for one.`],
new MessagesPlaceholder('history'),
['user', '{input}'],
new MessagesPlaceholder('agent_scratchpad')
]);
const agent = await createOpenAIToolsAgent({ llm: model, tools, prompt });
const memory = new BufferMemory({
returnMessages: true,
memoryKey: 'history'
});
const executor = new AgentExecutor({
agent,
tools,
memory,
verbose: true // Log agent reasoning
});
// Multi-turn conversation with tools and memory
const r1 = await executor.invoke({
input: 'Hi, my email is alex@example.com. Can you check my orders?'
});
console.log(r1.output);
// "I found 2 orders: ORD-001 (shipped, $59.99) and ORD-002 (delivered, $124.50)."
const r2 = await executor.invoke({
input: 'Can I get a refund on the first one?'
});
console.log(r2.output);
// Agent remembers ORD-001 from the previous turn
// "I've processed a refund for ORD-001. Refund REF-... initiated. 3-5 business days."
const r3 = await executor.invoke({
input: 'Thanks! What was my total spending across both orders?'
});
console.log(r3.output);
// Agent remembers the order details: "$59.99 + $124.50 = $184.49 total."
7. Key Takeaways
- Tools extend LLM capabilities beyond text generation — search, calculate, query databases, call APIs. The tool's description is what the LLM reads to decide when to use it, so make it precise.
- Custom tools are easy to create — use DynamicTool for simple string input, DynamicStructuredTool or tool() for typed input with Zod schemas.
- Memory makes conversations continuous — without it, every API call is stateless. Choose the right memory type based on conversation length and requirements.
- BufferMemory stores everything (simple but token-expensive), WindowMemory keeps last N turns (predictable but forgetful), SummaryMemory compresses history (compact but approximate), VectorStoreMemory retrieves semantically (smart but adds latency).
- Persistent memory requires external storage — Redis, PostgreSQL, or any database. In-memory storage is lost on restart.
- Tools + memory together enable powerful agents that can take actions and maintain context across a multi-turn conversation.
Explain-It Challenge
- A product manager asks "why does the chatbot forget what I said 5 minutes ago?" Explain the problem and propose a solution using LangChain memory.
- You are building a customer support bot that needs to look up orders, check inventory, and process refunds. Design the tool definitions (name, description, schema) for each tool.
- Compare BufferMemory and ConversationSummaryMemory for a chatbot that handles 100+ turn conversations. Which would you choose and why?
Navigation: <- 4.17.b Chains and Prompt Templates | 4.17.d — Working with Agents ->