Episode 4 — Generative AI Engineering / 4.9 — Combining Streaming with Structured Data
4.9.b — Returning Structured JSON After Generation
In one sentence: The two-phase response pattern streams human-readable text to the user for a great experience, then returns (or extracts) structured JSON for the system to process — giving you the best of both worlds: real-time UX and machine-readable data from a single user interaction.
Navigation: <- 4.9.a — Streaming Conversational Text | 4.9.c — Separating UI from System Outputs ->
1. The Core Problem
You are building a product recommendation chatbot. When the user asks "What laptop should I buy for video editing?", two things need to happen:
- The user sees a helpful, conversational explanation streamed in real time ("Based on your needs, I'd recommend...")
- The system receives structured data it can act on (product IDs, price ranges, confidence scores for analytics, database writes, or triggering downstream APIs)
These two needs conflict:
- Streaming text: great for humans, terrible for machines. You can't JSON.parse() a half-streamed sentence.
- Structured JSON: great for machines, terrible for UX. Users stare at a spinner until the entire JSON is generated.
You need BOTH. That's the two-phase response pattern.
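To make the conflict concrete, here is a minimal standalone sketch (plain Node.js, no API calls) of what happens if you try to parse JSON mid-stream:

```javascript
// A JSON payload as it might look halfway through a stream
const partial = '{"recommendations": [{"product": "MacBook';

let parsed = null;
try {
  parsed = JSON.parse(partial);
} catch (error) {
  // SyntaxError — the payload is unusable until the final chunk arrives
}

console.log(parsed); // null — nothing machine-readable until the stream finishes
```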
2. Pattern 1: Stream Text, Then Append JSON (Single Call)
The simplest approach: instruct the model to write its conversational response first, then append a structured JSON block at the end, separated by a delimiter.
2.1 The prompt design
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const systemPrompt = `You are a product recommendation assistant.
When responding to the user:
1. First, write a helpful conversational explanation of your recommendations.
2. Then, after your explanation, output EXACTLY the delimiter: ---JSON---
3. After the delimiter, output a valid JSON object with this structure:
{
"recommendations": [
{
"product": "string",
"reason": "string",
"priceRange": "string",
"confidence": 0.0-1.0
}
],
"category": "string",
"userIntent": "string"
}
IMPORTANT: The JSON must be valid. Do not include any text after the JSON block.`;
async function streamThenParse(userMessage) {
const stream = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{ role: 'system', content: systemPrompt },
{ role: 'user', content: userMessage }
],
stream: true,
temperature: 0.7
});
let fullText = '';
let jsonStarted = false;
let jsonBuffer = '';
const delimiter = '---JSON---';
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (!content) continue;
fullText += content;
if (!jsonStarted) {
// Check if we've hit the delimiter.
// NOTE: a partially-arrived delimiter can briefly leak characters to the
// user, and conversational text arriving in the same chunk as the delimiter
// is never streamed — section 2.2 shows a robust detector.
if (fullText.includes(delimiter)) {
jsonStarted = true;
// Extract any content after the delimiter that arrived in this chunk
const parts = fullText.split(delimiter);
jsonBuffer = parts[1] || '';
// Don't stream the delimiter or JSON to the user
} else {
// Still in conversational text — stream to user
process.stdout.write(content);
}
} else {
// We're in the JSON section — accumulate silently
jsonBuffer += content;
}
}
// Parse the accumulated JSON
let structuredData = null;
try {
structuredData = JSON.parse(jsonBuffer.trim());
} catch (error) {
console.error('\nFailed to parse JSON:', error.message);
console.error('Raw JSON buffer:', jsonBuffer);
}
return {
conversationalText: fullText.split(delimiter)[0].trim(),
structuredData,
rawResponse: fullText
};
}
// Usage
const result = await streamThenParse('What laptop should I buy for video editing under $2000?');
console.log('\n\n--- Structured Data ---');
console.log(JSON.stringify(result.structuredData, null, 2));
// {
// "recommendations": [
// { "product": "MacBook Pro 16\" M3 Pro", "reason": "...", "priceRange": "$1999-$2499", "confidence": 0.92 },
// { "product": "Dell XPS 15", "reason": "...", "priceRange": "$1599-$1999", "confidence": 0.85 }
// ],
// "category": "laptops",
// "userIntent": "video editing"
// }
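Once the stream completes, recovering both halves is a plain string split. A standalone sketch (the sample response text is made up):

```javascript
const delimiter = '---JSON---';
const full =
  'For video editing under $2000, the Dell XPS 15 is a strong pick.\n' +
  '---JSON---\n' +
  '{"recommendations":[{"product":"Dell XPS 15","confidence":0.85}],"category":"laptops"}';

// Everything before the delimiter is for the user; everything after is for the system
const [conversationalText, jsonPortion] = full.split(delimiter);
const structuredData = JSON.parse(jsonPortion.trim());

console.log(conversationalText.trim());
console.log(structuredData.category); // "laptops"
```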
2.2 Delimiter detection edge cases
The delimiter might arrive split across chunks. Handle this robustly:
function createDelimiterDetector(delimiter) {
let buffer = '';
let detected = false;
let preDelimiterText = '';
let postDelimiterText = '';
return {
feed(content) {
if (detected) {
postDelimiterText += content;
return { phase: 'json', content };
}
buffer += content;
// Check for complete delimiter
const delimIndex = buffer.indexOf(delimiter);
if (delimIndex !== -1) {
detected = true;
preDelimiterText = buffer.substring(0, delimIndex);
postDelimiterText = buffer.substring(delimIndex + delimiter.length);
return { phase: 'transition', content: null };
}
// Check for a partial delimiter at the end of the buffer,
// e.g. the buffer ends with "---JS", which could be the start of "---JSON---".
// Check the LONGEST candidate prefix first: if the buffer ends with "---",
// matching the one-character prefix "-" first would hold back only "-" and
// leak "--" to the user.
for (let i = delimiter.length - 1; i >= 1; i--) {
if (buffer.endsWith(delimiter.substring(0, i))) {
// Potential partial match — hold back these characters
const safe = buffer.substring(0, buffer.length - i);
// Only emit the safe portion
if (safe.length > preDelimiterText.length) {
const newContent = safe.substring(preDelimiterText.length);
preDelimiterText = safe;
return { phase: 'text', content: newContent };
}
return { phase: 'text', content: '' };
}
}
// No partial match — emit everything
if (buffer.length > preDelimiterText.length) {
const newContent = buffer.substring(preDelimiterText.length);
preDelimiterText = buffer;
return { phase: 'text', content: newContent };
}
return { phase: 'text', content: '' };
},
getResult() {
return {
detected,
text: preDelimiterText,
json: postDelimiterText
};
}
};
}
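The core of the detector is the holdback rule: never emit a buffer suffix that could be the beginning of the delimiter, checking the longest candidate prefix first. Isolated as a tiny standalone function:

```javascript
const delimiter = '---JSON---';

// Returns how many characters of `buffer` are safe to show the user.
// Longest prefix first: if the buffer ends with "---", matching the
// one-character prefix "-" first would leak "--" to the user.
function safeEmitLength(buffer) {
  for (let i = delimiter.length - 1; i >= 1; i--) {
    if (buffer.endsWith(delimiter.substring(0, i))) {
      return buffer.length - i; // hold back the possible delimiter start
    }
  }
  return buffer.length; // no partial match — everything is safe
}

console.log(safeEmitLength('I recommend the ---JS')); // 16 — holds back "---JS"
console.log(safeEmitLength('plain text'));            // 10 — emits everything
```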
2.3 Pros and cons of delimiter-based separation
| Pros | Cons |
|---|---|
| Single API call (lower cost) | Model might forget or misplace the delimiter |
| Simple to implement | JSON could be malformed (model error) |
| Lower latency (one round trip) | Delimiter detection adds complexity |
| Works with any model | Conversational text might reference the JSON structure |
| Temperature > 0 OK for text portion | Cannot use response_format: { type: "json_object" } since the response is mixed |
3. Pattern 2: Two Separate API Calls (Stream + Structured)
A more robust approach: make two API calls — one streaming for the user, one non-streaming with structured output mode for the system.
3.1 Sequential two-call pattern
async function twoCallPattern(userMessage, conversationHistory = []) {
const messages = [
{
role: 'system',
content: 'You are a helpful product recommendation assistant. Explain your recommendations conversationally.'
},
...conversationHistory,
{ role: 'user', content: userMessage }
];
// --- CALL 1: Stream conversational text to the user ---
const textStream = await openai.chat.completions.create({
model: 'gpt-4o',
messages,
stream: true,
temperature: 0.7 // Natural, varied language
});
let conversationalText = '';
for await (const chunk of textStream) {
const content = chunk.choices[0]?.delta?.content;
if (content) {
process.stdout.write(content);
conversationalText += content;
}
}
console.log('\n\n[Extracting structured data...]');
// --- CALL 2: Extract structured JSON (non-streaming) ---
const structuredResponse = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `Extract structured data from this product recommendation conversation.
Return a JSON object with:
- recommendations: array of { product, reason, priceRange, confidence }
- category: product category
- userIntent: what the user is looking for`
},
{ role: 'user', content: userMessage },
{ role: 'assistant', content: conversationalText }
],
response_format: { type: 'json_object' },
temperature: 0 // Deterministic for structured extraction
});
const structuredData = JSON.parse(structuredResponse.choices[0].message.content);
return {
conversationalText,
structuredData,
usage: {
streamCall: 'streamed (usage tracked separately)',
structuredCall: structuredResponse.usage
}
};
}
3.2 Parallel two-call pattern (faster)
When the structured data doesn't depend on the conversational text, run both calls in parallel:
async function parallelTwoCallPattern(userMessage) {
const baseMessages = [
{ role: 'user', content: userMessage }
];
// Start BOTH calls simultaneously
const [textResult, structuredResult] = await Promise.all([
// Call 1: Streaming text
(async () => {
const stream = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: 'You are a helpful assistant. Explain your answer conversationally.'
},
...baseMessages
],
stream: true,
temperature: 0.7
});
let text = '';
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) {
process.stdout.write(content);
text += content;
}
}
return text;
})(),
// Call 2: Structured JSON (runs in parallel)
(async () => {
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `Analyze this user request and return structured JSON:
{
"recommendations": [{ "product": "string", "reason": "string", "priceRange": "string", "confidence": number }],
"category": "string",
"userIntent": "string",
"followUpQuestions": ["string"]
}`
},
...baseMessages
],
response_format: { type: 'json_object' },
temperature: 0
});
return JSON.parse(response.choices[0].message.content);
})()
]);
return {
conversationalText: textResult,
structuredData: structuredResult
};
}
3.3 Comparison: sequential vs parallel
| Aspect | Sequential | Parallel |
|---|---|---|
| Total latency | Stream time + structured call time | Max(stream time, structured call time) |
| Consistency | Structured data matches conversational text (it's extracted from it) | Structured data generated independently (may differ) |
| Cost | 2 API calls; second call includes first response as context | 2 API calls; less context in each |
| When to use | Structured data must reference the conversational response | Structured data can be generated independently |
| Reliability | If stream succeeds, extraction usually succeeds | Either call can fail independently |
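The latency row in the table can be sketched with a toy model (the millisecond figures here are made up for illustration):

```javascript
// Toy latency model: sequential calls add; parallel calls take the max.
const streamMs = 2400;     // hypothetical time to finish streaming the text
const structuredMs = 1100; // hypothetical time for the structured call

const sequentialMs = streamMs + structuredMs;        // 3500
const parallelMs = Math.max(streamMs, structuredMs); // 2400

console.log({ sequentialMs, parallelMs });
// With these numbers, parallel hides the entire structured-call duration
// behind the stream the user is already watching.
```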
4. Pattern 3: Stream Then Parse (Hybrid)
Stream the entire response as text, then parse structured data out of it after the stream completes. This works when the model includes structured data within its conversational response (like code blocks).
4.1 Extracting JSON from markdown code blocks
async function streamThenExtract(userMessage) {
const systemPrompt = `You are a data analysis assistant.
When answering the user:
1. Explain your analysis conversationally
2. Include a JSON summary in a \`\`\`json code block within your response
3. Continue with any additional commentary after the code block
The JSON summary should contain:
{
"findings": [{ "metric": "string", "value": "string", "trend": "up|down|stable" }],
"overallAssessment": "string",
"confidenceLevel": "high|medium|low"
}`;
const stream = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{ role: 'system', content: systemPrompt },
{ role: 'user', content: userMessage }
],
stream: true,
temperature: 0.7
});
let fullText = '';
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) {
process.stdout.write(content);
fullText += content;
}
}
// After streaming completes, extract JSON from the full text
const structuredData = extractJsonFromMarkdown(fullText);
return { fullText, structuredData };
}
function extractJsonFromMarkdown(text) {
// Match ```json ... ``` blocks
const jsonBlockRegex = /```json\s*\n([\s\S]*?)\n```/g;
const matches = [];
let match;
while ((match = jsonBlockRegex.exec(text)) !== null) {
try {
const parsed = JSON.parse(match[1].trim());
matches.push(parsed);
} catch (error) {
console.warn('Found JSON block but failed to parse:', error.message);
}
}
// Also try to find bare JSON objects (not in code blocks).
// NOTE: this lazy regex stops at the first "}", so it cannot capture nested
// objects — it's a best-effort fallback, not a JSON parser.
if (matches.length === 0) {
const bareJsonRegex = /\{[\s\S]*?"[\w]+"[\s\S]*?\}/g;
while ((match = bareJsonRegex.exec(text)) !== null) {
try {
const parsed = JSON.parse(match[0]);
matches.push(parsed);
} catch {
// Not valid JSON, skip
}
}
}
return matches.length === 1 ? matches[0] : matches.length > 0 ? matches : null;
}
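A standalone run of the code-block path, using the same regex as above (the sample response text is made up):

```javascript
const sampleResponse =
  'CPU usage is elevated but trending down.\n' +
  '```json\n' +
  '{"findings":[{"metric":"cpu","value":"85%","trend":"down"}],"confidenceLevel":"high"}\n' +
  '```\n' +
  'Keep an eye on memory as well.';

// Same pattern as extractJsonFromMarkdown: capture the fenced JSON body
const match = sampleResponse.match(/```json\s*\n([\s\S]*?)\n```/);
const data = match ? JSON.parse(match[1].trim()) : null;

console.log(data.confidenceLevel);   // "high"
console.log(data.findings[0].trend); // "down"
```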
4.2 Real-time JSON detection during streaming
Instead of waiting for the stream to complete, you can detect and parse JSON blocks as they finish during streaming:
async function streamWithRealTimeJsonDetection(userMessage, callbacks) {
const stream = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `Respond conversationally. Include structured data in \`\`\`json blocks.`
},
{ role: 'user', content: userMessage }
],
stream: true
});
let fullText = '';
let inCodeBlock = false;
let codeBlockContent = '';
let codeBlockLang = '';
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (!content) continue;
fullText += content;
callbacks.onToken(content);
// Detect code block boundaries from the accumulated text.
// NOTE: simplified — this assumes the fence markers are not split across
// chunk boundaries; for production, track fence state with a holdback
// buffer as in section 2.2.
if (!inCodeBlock &&
(fullText.endsWith('```json\n') ||
(fullText.includes('```json') && content.includes('\n') && !codeBlockLang))) {
inCodeBlock = true;
codeBlockLang = 'json';
codeBlockContent = '';
} else if (inCodeBlock && fullText.endsWith('```')) {
// Code block just closed
inCodeBlock = false;
// Remove the trailing ``` from the content
const jsonStr = codeBlockContent.replace(/```$/, '').trim();
try {
const parsed = JSON.parse(jsonStr);
callbacks.onJsonDetected(parsed);
} catch (error) {
callbacks.onJsonError?.(jsonStr, error);
}
codeBlockLang = '';
} else if (inCodeBlock) {
codeBlockContent += content;
}
}
callbacks.onComplete(fullText);
}
// Usage
await streamWithRealTimeJsonDetection(
'Analyze the performance metrics: CPU 85%, Memory 72%, Disk 45%',
{
onToken: (token) => process.stdout.write(token),
onJsonDetected: (data) => {
console.log('\n[STRUCTURED DATA DETECTED]:', JSON.stringify(data));
// Immediately save to database, trigger alerts, etc.
},
onJsonError: (raw, error) => {
console.warn('\n[JSON parse failed]:', error.message);
},
onComplete: (fullText) => {
console.log('\n[Stream complete]');
}
}
);
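The per-chunk boundary checks above are fragile when a fence is split across chunks. A simpler, standalone alternative (with assumed sample chunks) is to rescan the accumulated text after each chunk and parse only blocks whose closing fence has fully arrived:

```javascript
// Extract every CLOSED ```json block from the accumulated text so far.
function extractClosedJsonBlocks(accumulated) {
  const out = [];
  const re = /```json\s*\n([\s\S]*?)\n```/g;
  let m;
  while ((m = re.exec(accumulated)) !== null) {
    try {
      out.push(JSON.parse(m[1].trim()));
    } catch {
      // Block closed but JSON invalid — skip it
    }
  }
  return out;
}

// Simulated chunks: the closing fence arrives split across two chunks
const chunks = ['Metrics look fine.\n```json\n{"cpu"', ': 85}\n``', '`\nAll done.'];
let accumulated = '';
let detected = [];
for (const chunk of chunks) {
  accumulated += chunk;
  const blocks = extractClosedJsonBlocks(accumulated);
  if (blocks.length > detected.length) detected = blocks; // a block just closed
}

console.log(detected); // [ { cpu: 85 } ]
```

Rescanning is O(text length) per chunk, but responses are short enough that this is rarely a problem, and it cannot miss a fence no matter how the chunks are sliced.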
5. Pattern 4: OpenAI Function Calling + Streaming
OpenAI's function calling (tool use) can be combined with streaming: the model streams its conversational response AND emits function-call arguments in the same response. Models do not always produce both on their own, so the system prompt below asks for both explicitly.
async function streamWithFunctionCall(userMessage) {
const tools = [
{
type: 'function',
function: {
name: 'save_recommendation',
description: 'Save product recommendations to the database',
parameters: {
type: 'object',
properties: {
recommendations: {
type: 'array',
items: {
type: 'object',
properties: {
product: { type: 'string' },
reason: { type: 'string' },
priceRange: { type: 'string' },
confidence: { type: 'number', minimum: 0, maximum: 1 }
},
required: ['product', 'reason', 'priceRange', 'confidence']
}
},
category: { type: 'string' },
userIntent: { type: 'string' }
},
required: ['recommendations', 'category', 'userIntent']
}
}
}
];
const stream = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `You are a product recommendation assistant.
When making recommendations:
1. Explain your reasoning conversationally to the user
2. ALSO call the save_recommendation function with structured data`
},
{ role: 'user', content: userMessage }
],
tools,
stream: true
});
let conversationalText = '';
let functionName = '';
let functionArgs = '';
for await (const chunk of stream) {
const delta = chunk.choices[0]?.delta;
// Stream conversational text
if (delta?.content) {
process.stdout.write(delta.content);
conversationalText += delta.content;
}
// Accumulate function call (arrives in chunks too)
if (delta?.tool_calls) {
for (const toolCall of delta.tool_calls) {
if (toolCall.function?.name) {
functionName = toolCall.function.name;
}
if (toolCall.function?.arguments) {
functionArgs += toolCall.function.arguments;
}
}
}
}
// Parse the function call arguments
let structuredData = null;
if (functionArgs) {
try {
structuredData = JSON.parse(functionArgs);
console.log('\n\n[Function call detected: ' + functionName + ']');
console.log(JSON.stringify(structuredData, null, 2));
} catch (error) {
console.error('\nFailed to parse function args:', error.message);
}
}
return {
conversationalText,
functionCall: functionName ? { name: functionName, arguments: structuredData } : null
};
}
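The key mechanic is that function-call arguments stream as raw JSON string fragments: you concatenate them and parse once, after the stream ends. Standalone, with hypothetical fragments:

```javascript
// Hypothetical argument fragments as they might arrive across chunks
const argumentFragments = [
  '{"recommendations":[{"product":"Dell XPS 15",',
  '"reason":"Strong GPU for the price","priceRange":"$1599-$1999",',
  '"confidence":0.85}],"category":"laptops","userIntent":"video editing"}'
];

// Accumulate exactly as the streaming loop above does
let functionArgs = '';
for (const fragment of argumentFragments) {
  functionArgs += fragment;
}

// Only the complete concatenation is valid JSON
const structuredData = JSON.parse(functionArgs);
console.log(structuredData.recommendations[0].product); // "Dell XPS 15"
console.log(structuredData.category);                   // "laptops"
```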
6. Production-Ready: Combined Pattern with Fallbacks
In production, you need a robust implementation that handles failures at every stage:
class StreamAndStructureService {
constructor(openai, options = {}) {
this.openai = openai;
this.model = options.model || 'gpt-4o';
this.maxRetries = options.maxRetries || 2;
}
/**
* Primary pattern: Stream text with delimiter, extract JSON.
* Fallback: If JSON extraction fails, make a second structured call.
*/
async execute(userMessage, systemPrompt, jsonSchema, onToken) {
// Phase 1: Stream with delimiter
const streamResult = await this.streamWithDelimiter(
userMessage,
systemPrompt,
onToken
);
// Phase 2: Try to extract JSON from the streamed response
let structuredData = this.tryExtractJson(streamResult.fullText);
// Phase 3: If extraction failed, make a dedicated structured call
if (!structuredData) {
console.log('\n[Delimiter extraction failed — making structured call]');
structuredData = await this.fallbackStructuredCall(
userMessage,
streamResult.conversationalText,
jsonSchema
);
}
// Phase 4: Validate against schema (basic validation)
const isValid = this.validateStructure(structuredData, jsonSchema);
return {
text: streamResult.conversationalText,
data: structuredData,
valid: isValid,
method: structuredData ? (isValid ? 'success' : 'partial') : 'failed'
};
}
async streamWithDelimiter(userMessage, systemPrompt, onToken) {
const delimiter = '---STRUCTURED_DATA---';
const fullSystemPrompt = `${systemPrompt}
After your conversational response, output the delimiter "${delimiter}" followed by a JSON object matching the required schema. Do not include any text after the JSON.`;
const stream = await this.openai.chat.completions.create({
model: this.model,
messages: [
{ role: 'system', content: fullSystemPrompt },
{ role: 'user', content: userMessage }
],
stream: true,
temperature: 0.7
});
let fullText = '';
let delimiterReached = false;
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (!content) continue;
fullText += content;
if (!delimiterReached) {
if (fullText.includes(delimiter)) {
delimiterReached = true;
} else {
onToken?.(content);
}
}
}
const parts = fullText.split(delimiter);
return {
fullText,
conversationalText: parts[0].trim(),
jsonPortion: parts[1]?.trim() || ''
};
}
tryExtractJson(text) {
// Try delimiter-based extraction
const delimiter = '---STRUCTURED_DATA---';
if (text.includes(delimiter)) {
const jsonStr = text.split(delimiter)[1]?.trim();
if (jsonStr) {
try {
return JSON.parse(jsonStr);
} catch { /* fall through */ }
}
}
// Try code block extraction
const codeBlockMatch = text.match(/```json\s*\n([\s\S]*?)\n```/);
if (codeBlockMatch) {
try {
return JSON.parse(codeBlockMatch[1].trim());
} catch { /* fall through */ }
}
return null;
}
async fallbackStructuredCall(userMessage, conversationalText, jsonSchema) {
for (let attempt = 0; attempt <= this.maxRetries; attempt++) {
try {
const response = await this.openai.chat.completions.create({
model: this.model,
messages: [
{
role: 'system',
content: `Extract structured data from this conversation. Return valid JSON matching this schema:\n${JSON.stringify(jsonSchema, null, 2)}`
},
{ role: 'user', content: userMessage },
{ role: 'assistant', content: conversationalText }
],
response_format: { type: 'json_object' },
temperature: 0
});
return JSON.parse(response.choices[0].message.content);
} catch (error) {
if (attempt === this.maxRetries) {
console.error('Structured extraction failed after retries:', error.message);
return null;
}
}
}
}
validateStructure(data, schema) {
if (!data || typeof data !== 'object') return false;
// Basic required field check
if (schema.required) {
for (const field of schema.required) {
if (!(field in data)) return false;
}
}
return true;
}
}
// Usage
const service = new StreamAndStructureService(openai);
const result = await service.execute(
'What laptop should I buy for video editing under $2000?',
'You are a product recommendation assistant.',
{
type: 'object',
required: ['recommendations', 'category'],
properties: {
recommendations: { type: 'array' },
category: { type: 'string' },
userIntent: { type: 'string' }
}
},
(token) => process.stdout.write(token)
);
console.log('\n\nMethod:', result.method);
console.log('Valid:', result.valid);
console.log('Data:', JSON.stringify(result.data, null, 2));
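The validateStructure check at the end of the class is just a required-field scan. As a standalone function with the same behavior:

```javascript
// Same basic validation as the service method above: require that each
// field named in schema.required exists on the object.
function validateStructure(data, schema) {
  if (!data || typeof data !== 'object') return false;
  for (const field of schema.required ?? []) {
    if (!(field in data)) return false;
  }
  return true;
}

const schema = { required: ['recommendations', 'category'] };

console.log(validateStructure({ recommendations: [], category: 'laptops' }, schema)); // true
console.log(validateStructure({ recommendations: [] }, schema));                      // false
console.log(validateStructure(null, schema));                                         // false
```

This is deliberately not a full JSON Schema validator — it ignores types, nesting, and enums. In production you would likely reach for a library such as Ajv instead of a field-presence check.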
7. Anthropic Claude: Stream + Structured with Tool Use
Claude's tool use can also combine streaming text with structured output:
import Anthropic from '@anthropic-ai/sdk';
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
async function claudeStreamAndStructure(userMessage) {
const tools = [
{
name: 'save_analysis',
description: 'Save the structured analysis results',
input_schema: {
type: 'object',
properties: {
findings: {
type: 'array',
items: {
type: 'object',
properties: {
metric: { type: 'string' },
value: { type: 'string' },
trend: { type: 'string', enum: ['up', 'down', 'stable'] }
},
required: ['metric', 'value', 'trend']
}
},
summary: { type: 'string' },
riskLevel: { type: 'string', enum: ['low', 'medium', 'high'] }
},
required: ['findings', 'summary', 'riskLevel']
}
}
];
const stream = anthropic.messages.stream({
model: 'claude-sonnet-4-20250514',
max_tokens: 4096,
tools,
messages: [
{
role: 'user',
content: `Analyze this and explain your findings conversationally. Also use the save_analysis tool to record structured results.\n\n${userMessage}`
}
]
});
let conversationalText = '';
let toolInput = '';
stream.on('text', (text) => {
// This fires for text content blocks only
process.stdout.write(text);
conversationalText += text;
});
stream.on('inputJson', (json) => {
// Tool-use input arrives incrementally as JSON fragments; accumulate them
// if you want progressive UI — the complete input is also available on the
// final message below.
toolInput += json;
});
const finalMessage = await stream.finalMessage();
// Extract structured data from tool use
let structuredData = null;
for (const block of finalMessage.content) {
if (block.type === 'tool_use') {
structuredData = block.input;
break;
}
}
console.log('\n\n--- Structured Data ---');
console.log(JSON.stringify(structuredData, null, 2));
return { conversationalText, structuredData };
}
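The final-message scan at the end depends only on the content-block shape. A standalone sketch with a mocked final message (the block shapes mirror what the Anthropic SDK returns: text blocks, and tool_use blocks carrying an input object):

```javascript
// Mocked final message in the shape the SDK returns
const finalMessage = {
  content: [
    { type: 'text', text: 'CPU is elevated but stable; overall risk is moderate.' },
    {
      type: 'tool_use',
      name: 'save_analysis',
      input: { summary: 'Elevated CPU', riskLevel: 'medium', findings: [] }
    }
  ]
};

// Same scan as above: the first tool_use block carries the structured data
const toolBlock = finalMessage.content.find((block) => block.type === 'tool_use');
const structuredData = toolBlock ? toolBlock.input : null;

console.log(structuredData.riskLevel); // "medium"
```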
8. Cost Analysis: One Call vs Two Calls
Understanding the cost tradeoffs is essential for production decisions:
Scenario: User asks for laptop recommendations
- User message: ~50 tokens
- Conversational response: ~500 tokens
- Structured JSON: ~200 tokens
- System prompts: ~200 tokens each
PATTERN 1: Single call with delimiter
Input: 50 (user) + 200 (system) = 250 tokens
Output: 500 (text) + 10 (delimiter) + 200 (JSON) = 710 tokens
Total: 960 tokens, 1 API call
PATTERN 2: Two sequential calls
Call 1 (streaming):
Input: 50 + 200 = 250 tokens
Output: 500 tokens
Call 2 (structured):
Input: 50 + 200 + 500 (previous response) = 750 tokens
Output: 200 tokens
Total: 1,700 tokens, 2 API calls
PATTERN 3: Two parallel calls
Call 1 (streaming):
Input: 50 + 200 = 250 tokens
Output: 500 tokens
Call 2 (structured):
Input: 50 + 200 = 250 tokens
Output: 200 tokens
Total: 1,200 tokens, 2 API calls
Cost comparison (GPT-4o: $2.50/1M input, $10/1M output):
Pattern 1: $0.000625 + $0.0071 = $0.00773
Pattern 2: $0.0025 + $0.0070 = $0.00950
Pattern 3: $0.00125 + $0.0070 = $0.00825
At 100K requests/day:
Pattern 1: $773/day
Pattern 2: $950/day (+23%)
Pattern 3: $825/day (+7%)
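The figures above come from straightforward token arithmetic. As a check, a small calculator using the same prices:

```javascript
// GPT-4o list prices used above: $2.50 per 1M input tokens, $10 per 1M output
function dailyCostUSD(inputTokens, outputTokens, requestsPerDay) {
  return ((inputTokens * 2.5 + outputTokens * 10) * requestsPerDay) / 1_000_000;
}

const pattern1 = dailyCostUSD(250, 710, 100_000);       // single call with delimiter
const pattern2 = dailyCostUSD(250 + 750, 700, 100_000); // two sequential calls
const pattern3 = dailyCostUSD(250 + 250, 700, 100_000); // two parallel calls

console.log(Math.round(pattern1)); // 773
console.log(Math.round(pattern2)); // 950
console.log(Math.round(pattern3)); // 825
console.log(Math.round(((pattern2 - pattern1) / pattern1) * 100)); // 23 (% premium)
```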
Decision guide
| Factor | Single call (delimiter) | Two calls (sequential) | Two calls (parallel) |
|---|---|---|---|
| Cost | Best | Worst | Middle |
| Latency | Best | Worst | Middle |
| JSON reliability | Lower (model might forget) | Best (response_format) | Good |
| Consistency | N/A (same response) | Best (extracted from text) | Lower (independent) |
| Complexity | Medium | Low | Low |
| Best for | High-volume, cost-sensitive | Reliability-critical | Speed + reliability |
9. Handling Failures in Each Pattern
9.1 Single-call failure modes
// Problem: Model forgot the delimiter
function handleMissingDelimiter(fullText, delimiter) {
if (!fullText.includes(delimiter)) {
// Attempt 1: Look for JSON at the end of the text.
// Use the FIRST "{" so nested objects are captured whole; searching
// backwards for "{" would land on an inner brace and fail to parse.
const lastBrace = fullText.lastIndexOf('}');
const firstBrace = fullText.indexOf('{');
if (firstBrace !== -1 && lastBrace > firstBrace) {
try {
const json = JSON.parse(fullText.substring(firstBrace, lastBrace + 1));
const text = fullText.substring(0, firstBrace).trim();
return { text, json, method: 'brace-detection' };
} catch { /* fall through */ }
}
// Attempt 2: Look for ```json blocks
const codeBlock = fullText.match(/```json\s*\n([\s\S]*?)\n```/);
if (codeBlock) {
try {
const json = JSON.parse(codeBlock[1]);
const text = fullText.replace(codeBlock[0], '').trim();
return { text, json, method: 'code-block' };
} catch { /* fall through */ }
}
// Attempt 3: Return text only, flag for structured extraction
return { text: fullText, json: null, method: 'extraction-needed' };
}
// Delimiter present — split normally
const [text, jsonStr] = fullText.split(delimiter);
try {
return { text: text.trim(), json: JSON.parse(jsonStr.trim()), method: 'delimiter' };
} catch {
return { text: text.trim(), json: null, method: 'extraction-needed' };
}
}
9.2 Two-call failure modes
async function robustTwoCallPattern(userMessage, onToken) {
// Call 1: Stream text (if this fails, the whole operation fails)
let conversationalText;
try {
conversationalText = await streamText(userMessage, onToken);
} catch (error) {
return {
success: false,
error: 'streaming_failed',
message: error.message
};
}
// Call 2: Structured extraction (can fail independently)
let structuredData;
try {
structuredData = await extractStructured(userMessage, conversationalText);
} catch (error) {
// Text was delivered successfully — return it with a note about structured failure
return {
success: true,
partial: true,
text: conversationalText,
data: null,
warning: 'Structured extraction failed: ' + error.message
};
}
return {
success: true,
partial: false,
text: conversationalText,
data: structuredData
};
}
10. Key Takeaways
- The two-phase response pattern solves the fundamental tension: humans want streaming text, systems want structured JSON. You can serve both from one interaction.
- Single-call with delimiter is cheapest but least reliable. Use for high-volume, cost-sensitive applications where occasional JSON extraction failures are acceptable.
- Two sequential calls are most reliable because the second call uses `response_format: { type: "json_object" }` and has the full conversational text as context.
- Two parallel calls balance speed and reliability — use when the structured data can be generated independently of the conversational text.
- Always implement fallbacks — if the primary extraction method fails, have a secondary method ready. The user should never see a broken response.
- Cost adds up — at scale, the difference between one and two API calls is significant. Profile your actual usage before choosing a pattern.
Explain-It Challenge
- Your team is debating whether to use the single-call delimiter pattern or two separate API calls. The application processes 500,000 requests per day. Build the cost argument for each approach.
- A junior developer says "Why don't we just use `response_format: { type: 'json_object' }` and stream it? Problem solved." Explain why this does not solve the dual-purpose problem.
- You are building a medical triage chatbot. The streamed text goes to the patient; the structured JSON goes to the doctor's dashboard. Which pattern do you choose and why? What are the reliability requirements?
Navigation: <- 4.9.a — Streaming Conversational Text | 4.9.c — Separating UI from System Outputs ->