Episode 4 — Generative AI Engineering / 4.9 — Combining Streaming with Structured Data
4.9.b — Returning Structured JSON After Generation
In one sentence: The two-phase response pattern streams human-readable text to the user for a great experience, then returns (or extracts) structured JSON for the system to process — giving you the best of both worlds: real-time UX and machine-readable data from a single user interaction.
Navigation: <- 4.9.a — Streaming Conversational Text | 4.9.c — Separating UI from System Outputs ->
1. The Core Problem
You are building a product recommendation chatbot. When the user asks "What laptop should I buy for video editing?", two things need to happen:
- The user sees a helpful, conversational explanation streamed in real time ("Based on your needs, I'd recommend...")
- The system receives structured data it can act on (product IDs, price ranges, confidence scores for analytics, database writes, or triggering downstream APIs)
These two needs conflict:
- Streaming text: great for humans, terrible for machines. You can't JSON.parse() a half-streamed sentence.
- Structured JSON: great for machines, terrible for UX. Users stare at a spinner until the entire JSON is generated.
You need BOTH. That's the two-phase response pattern.
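To make the conflict concrete, here is a minimal standalone sketch (plain Node.js, no API calls) of what happens if you try to parse JSON mid-stream:

```javascript
// A JSON payload as it might look halfway through a stream
const partial = '{"recommendations": [{"product": "MacBook';

let parsed = null;
try {
  parsed = JSON.parse(partial);
} catch (error) {
  // SyntaxError — the payload is unusable until the final chunk arrives
}

console.log(parsed); // null — nothing machine-readable until the stream finishes
```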
2. Pattern 1: Stream Text, Then Append JSON (Single Call)
The simplest approach: instruct the model to write its conversational response first, then append a structured JSON block at the end, separated by a delimiter.
2.1 The prompt design
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const systemPrompt = `You are a product recommendation assistant.
When responding to the user:
1. First, write a helpful conversational explanation of your recommendations.
2. Then, after your explanation, output EXACTLY the delimiter: ---JSON---
3. After the delimiter, output a valid JSON object with this structure:
{
"recommendations": [
{
"product": "string",
"reason": "string",
"priceRange": "string",
"confidence": 0.0-1.0
}
],
"category": "string",
"userIntent": "string"
}
IMPORTANT: The JSON must be valid. Do not include any text after the JSON block.`;
async function streamThenParse(userMessage) {
const stream = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{ role: 'system', content: systemPrompt },
{ role: 'user', content: userMessage }
],
stream: true,
temperature: 0.7
});
let fullText = '';
let jsonStarted = false;
let jsonBuffer = '';
const delimiter = '---JSON---';
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (!content) continue;
fullText += content;
if (!jsonStarted) {
// Check if we've hit the delimiter.
// NOTE: a partially-arrived delimiter can briefly leak characters to the
// user, and conversational text arriving in the same chunk as the delimiter
// is never streamed — section 2.2 shows a robust detector.
if (fullText.includes(delimiter)) {
jsonStarted = true;
// Extract any content after the delimiter that arrived in this chunk
const parts = fullText.split(delimiter);
jsonBuffer = parts[1] || '';
// Don't stream the delimiter or JSON to the user
} else {
// Still in conversational text — stream to user
process.stdout.write(content);
}
} else {
// We're in the JSON section — accumulate silently
jsonBuffer += content;
}
}
// Parse the accumulated JSON
let structuredData = null;
try {
structuredData = JSON.parse(jsonBuffer.trim());
} catch (error) {
console.error('\nFailed to parse JSON:', error.message);
console.error('Raw JSON buffer:', jsonBuffer);
}
return {
conversationalText: fullText.split(delimiter)[0].trim(),
structuredData,
rawResponse: fullText
};
}
// Usage
const result = await streamThenParse('What laptop should I buy for video editing under $2000?');
console.log('\n\n--- Structured Data ---');
console.log(JSON.stringify(result.structuredData, null, 2));
// {
// "recommendations": [
// { "product": "MacBook Pro 16\" M3 Pro", "reason": "...", "priceRange": "$1999-$2499", "confidence": 0.92 },
// { "product": "Dell XPS 15", "reason": "...", "priceRange": "$1599-$1999", "confidence": 0.85 }
// ],
// "category": "laptops",
// "userIntent": "video editing"
// }
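Once the stream completes, recovering both halves is a plain string split. A standalone sketch (the sample response text is made up):

```javascript
const delimiter = '---JSON---';
const full =
  'For video editing under $2000, the Dell XPS 15 is a strong pick.\n' +
  '---JSON---\n' +
  '{"recommendations":[{"product":"Dell XPS 15","confidence":0.85}],"category":"laptops"}';

// Everything before the delimiter is for the user; everything after is for the system
const [conversationalText, jsonPortion] = full.split(delimiter);
const structuredData = JSON.parse(jsonPortion.trim());

console.log(conversationalText.trim());
console.log(structuredData.category); // "laptops"
```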
2.2 Delimiter detection edge cases
The delimiter might arrive split across chunks. Handle this robustly:
function createDelimiterDetector(delimiter) {
let buffer = '';
let detected = false;
let preDelimiterText = '';
let postDelimiterText = '';
return {
feed(content) {
if (detected) {
postDelimiterText += content;
return { phase: 'json', content };
}
buffer += content;
// Check for complete delimiter
const delimIndex = buffer.indexOf(delimiter);
if (delimIndex !== -1) {
detected = true;
preDelimiterText = buffer.substring(0, delimIndex);
postDelimiterText = buffer.substring(delimIndex + delimiter.length);
return { phase: 'transition', content: null };
}
// Check for a partial delimiter at the end of the buffer,
// e.g. the buffer ends with "---JS", which could be the start of "---JSON---".
// Check the LONGEST candidate prefix first: if the buffer ends with "---",
// matching the one-character prefix "-" first would hold back only "-" and
// leak "--" to the user.
for (let i = delimiter.length - 1; i >= 1; i--) {
if (buffer.endsWith(delimiter.substring(0, i))) {
// Potential partial match — hold back these characters
const safe = buffer.substring(0, buffer.length - i);
// Only emit the safe portion
if (safe.length > preDelimiterText.length) {
const newContent = safe.substring(preDelimiterText.length);
preDelimiterText = safe;
return { phase: 'text', content: newContent };
}
return { phase: 'text', content: '' };
}
}
// No partial match — emit everything
if (buffer.length > preDelimiterText.length) {
const newContent = buffer.substring(preDelimiterText.length);
preDelimiterText = buffer;
return { phase: 'text', content: newContent };
}
return { phase: 'text', content: '' };
},
getResult() {
return {
detected,
text: preDelimiterText,
json: postDelimiterText
};
}
};
}
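The core of the detector is the holdback rule: never emit a buffer suffix that could be the beginning of the delimiter, checking the longest candidate prefix first. Isolated as a tiny standalone function:

```javascript
const delimiter = '---JSON---';

// Returns how many characters of `buffer` are safe to show the user.
// Longest prefix first: if the buffer ends with "---", matching the
// one-character prefix "-" first would leak "--" to the user.
function safeEmitLength(buffer) {
  for (let i = delimiter.length - 1; i >= 1; i--) {
    if (buffer.endsWith(delimiter.substring(0, i))) {
      return buffer.length - i; // hold back the possible delimiter start
    }
  }
  return buffer.length; // no partial match — everything is safe
}

console.log(safeEmitLength('I recommend the ---JS')); // 16 — holds back "---JS"
console.log(safeEmitLength('plain text'));            // 10 — emits everything
```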
2.3 Pros and cons of delimiter-based separation
| Pros | Cons |
|---|---|
| Single API call (lower cost) | Model might forget or misplace the delimiter |
| Simple to implement | JSON could be malformed (model error) |
| Lower latency (one round trip) | Delimiter detection adds complexity |
| Works with any model | Conversational text might reference the JSON structure |
| Temperature > 0 OK for text portion | Cannot use response_format: { type: "json_object" } since the response is mixed |
3. Pattern 2: Two Separate API Calls (Stream + Structured)
A more robust approach: make two API calls — one streaming for the user, one non-streaming with structured output mode for the system.
3.1 Sequential two-call pattern
async function twoCallPattern(userMessage, conversationHistory = []) {
const messages = [
{
role: 'system',
content: 'You are a helpful product recommendation assistant. Explain your recommendations conversationally.'
},
...conversationHistory,
{ role: 'user', content: userMessage }
];
// --- CALL 1: Stream conversational text to the user ---
const textStream = await openai.chat.completions.create({
model: 'gpt-4o',
messages,
stream: true,
temperature: 0.7 // Natural, varied language
});
let conversationalText = '';
for await (const chunk of textStream) {
const content = chunk.choices[0]?.delta?.content;
if (content) {
process.stdout.write(content);
conversationalText += content;
}
}
console.log('\n\n[Extracting structured data...]');
// --- CALL 2: Extract structured JSON (non-streaming) ---
const structuredResponse = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `Extract structured data from this product recommendation conversation.
Return a JSON object with:
- recommendations: array of { product, reason, priceRange, confidence }
- category: product category
- userIntent: what the user is looking for`
},
{ role: 'user', content: userMessage },
{ role: 'assistant', content: conversationalText }
],
response_format: { type: 'json_object' },
temperature: 0 // Deterministic for structured extraction
});
const structuredData = JSON.parse(structuredResponse.choices[0].message.content);
return {
conversationalText,
structuredData,
usage: {
streamCall: 'streamed (usage tracked separately)',
structuredCall: structuredResponse.usage
}
};
}
3.2 Parallel two-call pattern (faster)
When the structured data doesn't depend on the conversational text, run both calls in parallel:
async function parallelTwoCallPattern(userMessage) {
const baseMessages = [
{ role: 'user', content: userMessage }
];
// Start BOTH calls simultaneously
const [textResult, structuredResult] = await Promise.all([
// Call 1: Streaming text
(async () => {
const stream = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: 'You are a helpful assistant. Explain your answer conversationally.'
},
...baseMessages
],
stream: true,
temperature: 0.7
});
let text = '';
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) {
process.stdout.write(content);
text += content;
}
}
return text;
})(),
// Call 2: Structured JSON (runs in parallel)
(async () => {
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `Analyze this user request and return structured JSON:
{
"recommendations": [{ "product": "string", "reason": "string", "priceRange": "string", "confidence": number }],
"category": "string",
"userIntent": "string",
"followUpQuestions": ["string"]
}`
},
...baseMessages
],
response_format: { type: 'json_object' },
temperature: 0
});
return JSON.parse(response.choices[0].message.content);
})()
]);
return {
conversationalText: textResult,
structuredData: structuredResult
};
}
3.3 Comparison: sequential vs parallel
| Aspect | Sequential | Parallel |
|---|---|---|
| Total latency | Stream time + structured call time | Max(stream time, structured call time) |
| Consistency | Structured data matches conversational text (it's extracted from it) | Structured data generated independently (may differ) |
| Cost | 2 API calls; second call includes first response as context | 2 API calls; less context in each |
| When to use | Structured data must reference the conversational response | Structured data can be generated independently |
| Reliability | If stream succeeds, extraction usually succeeds | Either call can fail independently |
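The latency row in the table can be sketched with a toy model (the millisecond figures here are made up for illustration):

```javascript
// Toy latency model: sequential calls add; parallel calls take the max.
const streamMs = 2400;     // hypothetical time to finish streaming the text
const structuredMs = 1100; // hypothetical time for the structured call

const sequentialMs = streamMs + structuredMs;        // 3500
const parallelMs = Math.max(streamMs, structuredMs); // 2400

console.log({ sequentialMs, parallelMs });
// With these numbers, parallel hides the entire structured-call duration
// behind the stream the user is already watching.
```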
4. Pattern 3: Stream Then Parse (Hybrid)
Stream the entire response as text, then parse structured data out of it after the stream completes. This works when the model includes structured data within its conversational response (like code blocks).
4.1 Extracting JSON from markdown code blocks
async function streamThenExtract(userMessage) {
const systemPrompt = `You are a data analysis assistant.
When answering the user:
1. Explain your analysis conversationally
2. Include a JSON summary in a \`\`\`json code block within your response
3. Continue with any additional commentary after the code block
The JSON summary should contain:
{
"findings": [{ "metric": "string", "value": "string", "trend": "up|down|stable" }],
"overallAssessment": "string",
"confidenceLevel": "high|medium|low"
}`;
const stream = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{ role: 'system', content: systemPrompt },
{ role: 'user', content: userMessage }
],
stream: true,
temperature: 0.7
});
let fullText = '';
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) {
process.stdout.write(content);
fullText += content;
}
}
// After streaming completes, extract JSON from the full text
const structuredData = extractJsonFromMarkdown(fullText);
return { fullText, structuredData };
}
function extractJsonFromMarkdown(text) {
// Match ```json ... ``` blocks
const jsonBlockRegex = /```json\s*\n([\s\S]*?)\n```/g;
const matches = [];
let match;
while ((match = jsonBlockRegex.exec(text)) !== null) {
try {
const parsed = JSON.parse(match[1].trim());
matches.push(parsed);
} catch (error) {
console.warn('Found JSON block but failed to parse:', error.message);
}
}
// Also try to find bare JSON objects (not in code blocks).
// NOTE: this lazy regex stops at the first "}", so it cannot capture nested
// objects — it's a best-effort fallback, not a JSON parser.
if (matches.length === 0) {
const bareJsonRegex = /\{[\s\S]*?"[\w]+"[\s\S]*?\}/g;
while ((match = bareJsonRegex.exec(text)) !== null) {
try {
const parsed = JSON.parse(match[0]);
matches.push(parsed);
} catch {
// Not valid JSON, skip
}
}
}
return matches.length === 1 ? matches[0] : matches.length > 0 ? matches : null;
}
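A standalone run of the code-block path, using the same regex as above (the sample response text is made up):

```javascript
const sampleResponse =
  'CPU usage is elevated but trending down.\n' +
  '```json\n' +
  '{"findings":[{"metric":"cpu","value":"85%","trend":"down"}],"confidenceLevel":"high"}\n' +
  '```\n' +
  'Keep an eye on memory as well.';

// Same pattern as extractJsonFromMarkdown: capture the fenced JSON body
const match = sampleResponse.match(/```json\s*\n([\s\S]*?)\n```/);
const data = match ? JSON.parse(match[1].trim()) : null;

console.log(data.confidenceLevel);   // "high"
console.log(data.findings[0].trend); // "down"
```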
4.2 Real-time JSON detection during streaming
Instead of waiting for the stream to complete, you can detect and parse JSON blocks as they finish during streaming:
async function streamWithRealTimeJsonDetection(userMessage, callbacks) {
const stream = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `Respond conversationally. Include structured data in \`\`\`json blocks.`
},
{ role: 'user', content: userMessage }
],
stream: true
});
let fullText = '';
let inCodeBlock = false;
let codeBlockContent = '';
let codeBlockLang = '';
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (!content) continue;
fullText += content;
callbacks.onToken(content);
// Detect code block boundaries from the accumulated text.
// NOTE: simplified — this assumes the fence markers are not split across
// chunk boundaries; for production, track fence state with a holdback
// buffer as in section 2.2.
if (!inCodeBlock &&
(fullText.endsWith('```json\n') ||
(fullText.includes('```json') && content.includes('\n') && !codeBlockLang))) {
inCodeBlock = true;
codeBlockLang = 'json';
codeBlockContent = '';
} else if (inCodeBlock && fullText.endsWith('```')) {
// Code block just closed
inCodeBlock = false;
// Remove the trailing ``` from the content
const jsonStr = codeBlockContent.replace(/```$/, '').trim();
try {
const parsed = JSON.parse(jsonStr);
callbacks.onJsonDetected(parsed);
} catch (error) {
callbacks.onJsonError?.(jsonStr, error);
}
codeBlockLang = '';
} else if (inCodeBlock) {
codeBlockContent += content;
}
}
callbacks.onComplete(fullText);
}
// Usage
await streamWithRealTimeJsonDetection(
'Analyze the performance metrics: CPU 85%, Memory 72%, Disk 45%',
{
onToken: (token) => process.stdout.write(token),
onJsonDetected: (data) => {
console.log('\n[STRUCTURED DATA DETECTED]:', JSON.stringify(data));
// Immediately save to database, trigger alerts, etc.
},
onJsonError: (raw, error) => {
console.warn('\n[JSON parse failed]:', error.message);
},
onComplete: (fullText) => {
console.log('\n[Stream complete]');
}
}
);
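The per-chunk boundary checks above are fragile when a fence is split across chunks. A simpler, standalone alternative (with assumed sample chunks) is to rescan the accumulated text after each chunk and parse only blocks whose closing fence has fully arrived:

```javascript
// Extract every CLOSED ```json block from the accumulated text so far.
function extractClosedJsonBlocks(accumulated) {
  const out = [];
  const re = /```json\s*\n([\s\S]*?)\n```/g;
  let m;
  while ((m = re.exec(accumulated)) !== null) {
    try {
      out.push(JSON.parse(m[1].trim()));
    } catch {
      // Block closed but JSON invalid — skip it
    }
  }
  return out;
}

// Simulated chunks: the closing fence arrives split across two chunks
const chunks = ['Metrics look fine.\n```json\n{"cpu"', ': 85}\n``', '`\nAll done.'];
let accumulated = '';
let detected = [];
for (const chunk of chunks) {
  accumulated += chunk;
  const blocks = extractClosedJsonBlocks(accumulated);
  if (blocks.length > detected.length) detected = blocks; // a block just closed
}

console.log(detected); // [ { cpu: 85 } ]
```

Rescanning is O(text length) per chunk, but responses are short enough that this is rarely a problem, and it cannot miss a fence no matter how the chunks are sliced.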
5. Pattern 4: OpenAI Function Calling + Streaming
OpenAI's function calling (tool use) can be combined with streaming: the model streams its conversational response AND emits function-call arguments in the same response. Models do not always produce both on their own, so the system prompt below asks for both explicitly.
async function streamWithFunctionCall(userMessage) {
const tools = [
{
type: 'function',
function: {
name: 'save_recommendation',
description: 'Save product recommendations to the database',
parameters: {
type: 'object',
properties: {
recommendations: {
type: 'array',
items: {
type: 'object',
properties: {
product: { type: 'string' },
reason: { type: 'string' },
priceRange: { type: 'string' },
confidence: { type: 'number', minimum: 0, maximum: 1 }
},
required: ['product', 'reason', 'priceRange', 'confidence']
}
},
category: { type: 'string' },
userIntent: { type: 'string' }
},
required: ['recommendations', 'category', 'userIntent']
}
}
}
];
const stream = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `You are a product recommendation assistant.
When making recommendations:
1. Explain your reasoning conversationally to the user
2. ALSO call the save_recommendation function with structured data`
},
{ role: 'user', content: userMessage }
],
tools,
stream: true
});
let conversationalText = '';
let functionName = '';
let functionArgs = '';
for await (const chunk of stream) {
const delta = chunk.choices[0]?.delta;
// Stream conversational text
if (delta?.content) {
process.stdout.write(delta.content);
conversationalText += delta.content;
}
// Accumulate function call (arrives in chunks too)
if (delta?.tool_calls) {
for (const toolCall of delta.tool_calls) {
if (toolCall.function?.name) {
functionName = toolCall.function.name;
}
if (toolCall.function?.arguments) {
functionArgs += toolCall.function.arguments;
}
}
}
}
// Parse the function call arguments
let structuredData = null;
if (functionArgs) {
try {
structuredData = JSON.parse(functionArgs);
console.log('\n\n[Function call detected: ' + functionName + ']');
console.log(JSON.stringify(structuredData, null, 2));
} catch (error) {
console.error('\nFailed to parse function args:', error.message);
}
}
return {
conversationalText,
functionCall: functionName ? { name: functionName, arguments: structuredData } : null
};
}
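The key mechanic is that function-call arguments stream as raw JSON string fragments: you concatenate them and parse once, after the stream ends. Standalone, with hypothetical fragments:

```javascript
// Hypothetical argument fragments as they might arrive across chunks
const argumentFragments = [
  '{"recommendations":[{"product":"Dell XPS 15",',
  '"reason":"Strong GPU for the price","priceRange":"$1599-$1999",',
  '"confidence":0.85}],"category":"laptops","userIntent":"video editing"}'
];

// Accumulate exactly as the streaming loop above does
let functionArgs = '';
for (const fragment of argumentFragments) {
  functionArgs += fragment;
}

// Only the complete concatenation is valid JSON
const structuredData = JSON.parse(functionArgs);
console.log(structuredData.recommendations[0].product); // "Dell XPS 15"
console.log(structuredData.category);                   // "laptops"
```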
6. Production-Ready: Combined Pattern with Fallbacks
In production, you need a robust implementation that handles failures at every stage:
class StreamAndStructureService {
constructor(openai, options = {}) {
this.openai = openai;
this.model = options.model || 'gpt-4o';
this.maxRetries = options.maxRetries || 2;
}
/**
* Primary pattern: Stream text with delimiter, extract JSON.
* Fallback: If JSON extraction fails, make a second structured call.
*/
async execute(userMessage, systemPrompt, jsonSchema, onToken) {
// Phase 1: Stream with delimiter
const streamResult = await this.streamWithDelimiter(
userMessage,
systemPrompt,
onToken
);
// Phase 2: Try to extract JSON from the streamed response
let structuredData = this.tryExtractJson(streamResult.fullText);
// Phase 3: If extraction failed, make a dedicated structured call
if (!structuredData) {
console.log('\n[Delimiter extraction failed — making structured call]');
structuredData = await this.fallbackStructuredCall(
userMessage,
streamResult.conversationalText,
jsonSchema
);
}
// Phase 4: Validate against schema (basic validation)
const isValid = this.validateStructure(structuredData, jsonSchema);
return {
text: streamResult.conversationalText,
data: structuredData,
valid: isValid,
method: structuredData ? (isValid ? 'success' : 'partial') : 'failed'
};
}
async streamWithDelimiter(userMessage, systemPrompt, onToken) {
const delimiter = '---STRUCTURED_DATA---';
const fullSystemPrompt = `${systemPrompt}
After your conversational response, output the delimiter "${delimiter}" followed by a JSON object matching the required schema. Do not include any text after the JSON.`;
const stream = await this.openai.chat.completions.create({
model: this.model,
messages: [
{ role: 'system', content: fullSystemPrompt },
{ role: 'user', content: userMessage }
],
stream: true,
temperature: 0.7
});
let fullText = '';
let delimiterReached = false;
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (!content) continue;
fullText += content;
if (!delimiterReached) {
if (fullText.includes(delimiter)) {
delimiterReached = true;
} else {
onToken?.(content);
}
}
}
const parts = fullText.split(delimiter);
return {
fullText,
conversationalText: parts[0].trim(),
jsonPortion: parts[1]?.trim() || ''
};
}
tryExtractJson(text) {
// Try delimiter-based extraction
const delimiter = '---STRUCTURED_DATA---';
if (text.includes(delimiter)) {
const jsonStr = text.split(delimiter)[1]?.trim();
if (jsonStr) {
try {
return JSON.parse(jsonStr);
} catch { /* fall through */ }
}
}
// Try code block extraction
const codeBlockMatch = text.match(/```json\s*\n([\s\S]*?)\n```/);
if (codeBlockMatch) {
try {
return JSON.parse(codeBlockMatch[1].trim());
} catch { /* fall through */ }
}
return null;
}
async fallbackStructuredCall(userMessage, conversationalText, jsonSchema) {
for (let attempt = 0; attempt <= this.maxRetries; attempt++) {
try {
const response = await this.openai.chat.completions.create({
model: this.model,
messages: [
{
role: 'system',
content: `Extract structured data from this conversation. Return valid JSON matching this schema:\n${JSON.stringify(jsonSchema, null, 2)}`
},
{ role: 'user', content: userMessage },
{ role: 'assistant', content: conversationalText }
],
response_format: { type: 'json_object' },
temperature: 0
});
return JSON.parse(response.choices[0].message.content);
} catch (error) {
if (attempt === this.maxRetries) {
console.error('Structured extraction failed after retries:', error.message);
return null;
}
}
}
}
validateStructure(data, schema) {
if (!data || typeof data !== 'object') return false;
// Basic required field check
if (schema.required) {
for (const field of schema.required) {
if (!(field in data)) return false;
}
}
return true;
}
}
// Usage
const service = new StreamAndStructureService(openai);
const result = await service.execute(
'What laptop should I buy for video editing under $2000?',
'You are a product recommendation assistant.',
{
type: 'object',
required: ['recommendations', 'category'],
properties: {
recommendations: { type: 'array' },
category: { type: 'string' },
userIntent: { type: 'string' }
}
},
(token) => process.stdout.write(token)
);
console.log('\n\nMethod:', result.method);
console.log('Valid:', result.valid);
console.log('Data:', JSON.stringify(result.data, null, 2));
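The validateStructure check at the end of the class is just a required-field scan. As a standalone function with the same behavior:

```javascript
// Same basic validation as the service method above: require that each
// field named in schema.required exists on the object.
function validateStructure(data, schema) {
  if (!data || typeof data !== 'object') return false;
  for (const field of schema.required ?? []) {
    if (!(field in data)) return false;
  }
  return true;
}

const schema = { required: ['recommendations', 'category'] };

console.log(validateStructure({ recommendations: [], category: 'laptops' }, schema)); // true
console.log(validateStructure({ recommendations: [] }, schema));                      // false
console.log(validateStructure(null, schema));                                         // false
```

This is deliberately not a full JSON Schema validator — it ignores types, nesting, and enums. In production you would likely reach for a library such as Ajv instead of a field-presence check.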
7. Anthropic Claude: Stream + Structured with Tool Use
Claude's tool use can also combine streaming text with structured output:
import Anthropic from '@anthropic-ai/sdk';
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
async function claudeStreamAndStructure(userMessage) {
const tools = [
{
name: 'save_analysis',
description: 'Save the structured analysis results',
input_schema: {
type: 'object',
properties: {
findings: {
type: 'array',
items: {
type: 'object',
properties: {
metric: { type: 'string' },
value: { type: 'string' },
trend: { type: 'string', enum: ['up', 'down', 'stable'] }
},
required: ['metric', 'value', 'trend']
}
},
summary: { type: 'string' },
riskLevel: { type: 'string', enum: ['low', 'medium', 'high'] }
},
required: ['findings', 'summary', 'riskLevel']
}
}
];
const stream = anthropic.messages.stream({
model: 'claude-sonnet-4-20250514',
max_tokens: 4096,
tools,
messages: [
{
role: 'user',
content: `Analyze this and explain your findings conversationally. Also use the save_analysis tool to record structured results.\n\n${userMessage}`
}
]
});
let conversationalText = '';
let toolInput = '';
stream.on('text', (text) => {
// This fires for text content blocks only
process.stdout.write(text);
conversationalText += text;
});
stream.on('inputJson', (json) => {
// Tool-use input arrives incrementally as JSON fragments; accumulate them
// if you want progressive UI — the complete input is also available on the
// final message below.
toolInput += json;
});
const finalMessage = await stream.finalMessage();
// Extract structured data from tool use
let structuredData = null;
for (const block of finalMessage.content) {
if (block.type === 'tool_use') {
structuredData = block.input;
break;
}
}
console.log('\n\n--- Structured Data ---');
console.log(JSON.stringify(structuredData, null, 2));
return { conversationalText, structuredData };
}
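The final-message scan at the end depends only on the content-block shape. A standalone sketch with a mocked final message (the block shapes mirror what the Anthropic SDK returns: text blocks, and tool_use blocks carrying an input object):

```javascript
// Mocked final message in the shape the SDK returns
const finalMessage = {
  content: [
    { type: 'text', text: 'CPU is elevated but stable; overall risk is moderate.' },
    {
      type: 'tool_use',
      name: 'save_analysis',
      input: { summary: 'Elevated CPU', riskLevel: 'medium', findings: [] }
    }
  ]
};

// Same scan as above: the first tool_use block carries the structured data
const toolBlock = finalMessage.content.find((block) => block.type === 'tool_use');
const structuredData = toolBlock ? toolBlock.input : null;

console.log(structuredData.riskLevel); // "medium"
```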
8. Cost Analysis: One Call vs Two Calls
Understanding the cost tradeoffs is essential for production decisions:
Scenario: User asks for laptop recommendations
- User message: ~50 tokens
- Conversational response: ~500 tokens
- Structured JSON: ~200 tokens
- System prompts: ~200 tokens each
PATTERN 1: Single call with delimiter
Input: 50 (user) + 200 (system) = 250 tokens
Output: 500 (text) + 10 (delimiter) + 200 (JSON) = 710 tokens
Total: 960 tokens, 1 API call
PATTERN 2: Two sequential calls
Call 1 (streaming):
Input: 50 + 200 = 250 tokens
Output: 500 tokens
Call 2 (structured):
Input: 50 + 200 + 500 (previous response) = 750 tokens
Output: 200 tokens
Total: 1,700 tokens, 2 API calls
PATTERN 3: Two parallel calls
Call 1 (streaming):
Input: 50 + 200 = 250 tokens
Output: 500 tokens
Call 2 (structured):
Input: 50 + 200 = 250 tokens
Output: 200 tokens
Total: 1,200 tokens, 2 API calls
Cost comparison (GPT-4o: $2.50/1M input, $10/1M output):
Pattern 1: $0.000625 + $0.0071 = $0.00773
Pattern 2: $0.0025 + $0.0070 = $0.00950
Pattern 3: $0.00125 + $0.0070 = $0.00825
At 100K requests/day:
Pattern 1: $773/day
Pattern 2: $950/day (+23%)
Pattern 3: $825/day (+7%)
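The figures above come from straightforward token arithmetic. As a check, a small calculator using the same prices:

```javascript
// GPT-4o list prices used above: $2.50 per 1M input tokens, $10 per 1M output
function dailyCostUSD(inputTokens, outputTokens, requestsPerDay) {
  return ((inputTokens * 2.5 + outputTokens * 10) * requestsPerDay) / 1_000_000;
}

const pattern1 = dailyCostUSD(250, 710, 100_000);       // single call with delimiter
const pattern2 = dailyCostUSD(250 + 750, 700, 100_000); // two sequential calls
const pattern3 = dailyCostUSD(250 + 250, 700, 100_000); // two parallel calls

console.log(Math.round(pattern1)); // 773
console.log(Math.round(pattern2)); // 950
console.log(Math.round(pattern3)); // 825
console.log(Math.round(((pattern2 - pattern1) / pattern1) * 100)); // 23 (% premium)
```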
Decision guide
| Factor | Single call (delimiter) | Two calls (sequential) | Two calls (parallel) |
|---|---|---|---|
| Cost | Best | Worst | Middle |
| Latency | Best | Worst | Middle |
| JSON reliability | Lower (model might forget) | Best (response_format) | Good |
| Consistency | N/A (same response) | Best (extracted from text) | Lower (independent) |
| Complexity | Medium | Low | Low |
| Best for | High-volume, cost-sensitive | Reliability-critical | Speed + reliability |
9. Handling Failures in Each Pattern
9.1 Single-call failure modes
// Problem: Model forgot the delimiter
function handleMissingDelimiter(fullText, delimiter) {
if (!fullText.includes(delimiter)) {
// Attempt 1: Look for JSON at the end of the text.
// Use the FIRST "{" so nested objects are captured whole; searching
// backwards for "{" would land on an inner brace and fail to parse.
const lastBrace = fullText.lastIndexOf('}');
const firstBrace = fullText.indexOf('{');
if (firstBrace !== -1 && lastBrace > firstBrace) {
try {
const json = JSON.parse(fullText.substring(firstBrace, lastBrace + 1));
const text = fullText.substring(0, firstBrace).trim();
return { text, json, method: 'brace-detection' };
} catch { /* fall through */ }
}
// Attempt 2: Look for ```json blocks
const codeBlock = fullText.match(/```json\s*\n([\s\S]*?)\n```/);
if (codeBlock) {
try {
const json = JSON.parse(codeBlock[1]);
const text = fullText.replace(codeBlock[0], '').trim();
return { text, json, method: 'code-block' };
} catch { /* fall through */ }
}
// Attempt 3: Return text only, flag for structured extraction
return { text: fullText, json: null, method: 'extraction-needed' };
}
// Delimiter present — split normally
const [text, jsonStr] = fullText.split(delimiter);
try {
return { text: text.trim(), json: JSON.parse(jsonStr.trim()), method: 'delimiter' };
} catch {
return { text: text.trim(), json: null, method: 'extraction-needed' };
}
}
9.2 Two-call failure modes
async function robustTwoCallPattern(userMessage, onToken) {
// Call 1: Stream text (if this fails, the whole operation fails)
let conversationalText;
try {
conversationalText = await streamText(userMessage, onToken);
} catch (error) {
return {
success: false,
error: 'streaming_failed',
message: error.message
};
}
// Call 2: Structured extraction (can fail independently)
let structuredData;
try {
structuredData = await extractStructured(userMessage, conversationalText);
} catch (error) {
// Text was delivered successfully — return it with a note about structured failure
return {
success: true,
partial: true,
text: conversationalText,
data: null,
warning: 'Structured extraction failed: ' + error.message
};
}
return {
success: true,
partial: false,
text: conversationalText,
data: structuredData
};
}
10. Key Takeaways
- The two-phase response pattern solves the fundamental tension: humans want streaming text, systems want structured JSON. You can serve both from one interaction.
- Single-call with delimiter is cheapest but least reliable. Use for high-volume, cost-sensitive applications where occasional JSON extraction failures are acceptable.
- Two sequential calls are most reliable because the second call uses `response_format: { type: "json_object" }` and has the full conversational text as context.
- Two parallel calls balance speed and reliability — use when the structured data can be generated independently of the conversational text.
- Always implement fallbacks — if the primary extraction method fails, have a secondary method ready. The user should never see a broken response.
- Cost adds up — at scale, the difference between one and two API calls is significant. Profile your actual usage before choosing a pattern.
Explain-It Challenge
- Your team is debating whether to use the single-call delimiter pattern or two separate API calls. The application processes 500,000 requests per day. Build the cost argument for each approach.
- A junior developer says "Why don't we just use `response_format: { type: 'json_object' }` and stream it? Problem solved." Explain why this does not solve the dual-purpose problem.
- You are building a medical triage chatbot. The streamed text goes to the patient; the structured JSON goes to the doctor's dashboard. Which pattern do you choose and why? What are the reliability requirements?
Navigation: <- 4.9.a — Streaming Conversational Text | 4.9.c — Separating UI from System Outputs ->