Episode 4 — Generative AI Engineering / 4.2 — Calling LLM APIs Properly

4.2.a — Message Roles: system, user, assistant

In one sentence: Every LLM API call is built from an ordered array of messages, where each message has a rolesystem (sets behavior and rules), user (human input), or assistant (model responses and few-shot examples) — and understanding how to compose these roles is the first skill of API-driven AI engineering.

Navigation: ← 4.2 Overview · 4.2.b — Token Budgeting →


1. The Messages Array: How LLMs Receive Input

Unlike traditional APIs where you send a single string, LLM APIs accept a structured array of messages. Each message has two properties: a role and content. The model reads the entire array in order and generates its response based on the full conversation.

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system",    content: "You are a helpful assistant." },
    { role: "user",      content: "What is JavaScript?" },
    { role: "assistant", content: "JavaScript is a programming language..." },
    { role: "user",      content: "How does it differ from Python?" }
  ]
});

The model sees all four messages as context and generates the next assistant response. This is stateless — the API doesn't remember previous calls. You must send the entire conversation history every time.

┌──────────────────────────────────────────────────┐
│               Messages Array                      │
│                                                    │
│  ┌──────────┐  Sets persona, rules, constraints   │
│  │  system   │  ─────────────────────────────────  │
│  └──────────┘                                      │
│  ┌──────────┐  Human question / input              │
│  │   user    │  ─────────────────────────────────  │
│  └──────────┘                                      │
│  ┌──────────┐  Model's previous response           │
│  │ assistant │  ─────────────────────────────────  │
│  └──────────┘                                      │
│  ┌──────────┐  Follow-up question                  │
│  │   user    │  ─────────────────────────────────  │
│  └──────────┘                                      │
│                                                    │
│  ──────── Model generates next assistant reply ──  │
└──────────────────────────────────────────────────┘

2. The system Role: Setting the Stage

The system message is the most powerful tool you have for controlling model behavior. It sits at the beginning of the messages array and acts as persistent instructions that shape every response the model generates.

What the system message controls

AspectExample
Persona"You are a senior JavaScript developer."
Rules"Never recommend deprecated APIs."
Output format"Always respond in valid JSON."
Tone"Be concise and professional."
Scope"Only answer questions about our product."
Safety"Do not generate code that deletes files."

How the system message works under the hood

The model treats the system message with elevated priority. It's not magic — the model's training specifically reinforces following system-level instructions over user-level requests. This is why system prompts are the primary defense against prompt injection and off-topic usage.

// A well-structured system prompt
const systemPrompt = `You are a customer support agent for Acme Corp.

RULES:
1. Only answer questions about Acme products.
2. If the user asks about competitors, say "I can only help with Acme products."
3. Always include a relevant help article link if available.
4. Never make up product features — if unsure, say "Let me check on that."

OUTPUT FORMAT:
- Start with a direct answer.
- Keep responses under 150 words.
- End with "Is there anything else I can help with?"`;

System message persistence

The system message is sent with every API call — it doesn't persist between requests. This means:

// Call 1 — system message included
await openai.chat.completions.create({
  messages: [
    { role: "system", content: "You are a pirate." },
    { role: "user", content: "Hello" }
  ]
});

// Call 2 — model has NO memory of Call 1
// You MUST include the system message again
await openai.chat.completions.create({
  messages: [
    { role: "system", content: "You are a pirate." },
    { role: "user", content: "Hello" },
    { role: "assistant", content: "Ahoy, matey!" },
    { role: "user", content: "What's the weather?" }
  ]
});

3. The user Role: Human Input

The user role represents messages from the human (or application) making the request. In most applications, user messages come from:

  • Direct user input (chat interface)
  • Application-constructed prompts (behind the scenes)
  • Formatted data that needs processing

Direct user input

// Simple chat — user message is the raw input
messages: [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: userInput }  // Whatever the user typed
]

Application-constructed user messages

In production, you often construct the user message programmatically, combining the user's question with retrieved context:

// RAG pattern — application builds the user message
const userMessage = `
Context from our documentation:
---
${retrievedDocuments.join('\n---\n')}
---

User question: ${userInput}

Answer the question using ONLY the context above. If the answer isn't in the context, say "I don't have that information."
`;

messages: [
  { role: "system", content: systemPrompt },
  { role: "user", content: userMessage }
]

Multiple user messages in a conversation

In multi-turn conversations, user messages alternate with assistant messages:

messages: [
  { role: "system",    content: "You are a math tutor." },
  { role: "user",      content: "What is 2 + 2?" },          // Turn 1
  { role: "assistant", content: "2 + 2 equals 4." },          // Turn 1 response
  { role: "user",      content: "Now multiply that by 3." },  // Turn 2
  { role: "assistant", content: "4 times 3 equals 12." },      // Turn 2 response
  { role: "user",      content: "Is that a prime number?" }    // Turn 3 (model responds to this)
]

4. The assistant Role: Model Responses and Few-Shot Examples

The assistant role serves two purposes:

Purpose 1: Representing previous model responses

In multi-turn conversations, you include the model's previous responses so it has full context:

messages: [
  { role: "system",    content: "You are a helpful assistant." },
  { role: "user",      content: "My name is Alex." },
  { role: "assistant", content: "Nice to meet you, Alex! How can I help?" },
  { role: "user",      content: "What's my name?" }
  // Model will respond "Your name is Alex" because it can see the history
]

Purpose 2: Few-shot examples (powerful technique)

You can inject assistant messages that the model never actually generated. This teaches the model the exact format and style you want — this is called few-shot prompting:

messages: [
  { role: "system", content: "Extract the product and price from user messages. Respond in JSON." },

  // Few-shot example 1
  { role: "user",      content: "I bought a laptop for $999" },
  { role: "assistant", content: '{"product": "laptop", "price": 999}' },

  // Few-shot example 2
  { role: "user",      content: "The headphones cost $79.99 on sale" },
  { role: "assistant", content: '{"product": "headphones", "price": 79.99}' },

  // Actual user input — model follows the pattern
  { role: "user", content: "Just got a keyboard for $45" }
]
// Model will respond: {"product": "keyboard", "price": 45}

Few-shot examples are extremely effective because:

BenefitExplanation
Format complianceModel sees the exact output structure and follows it
Style matchingTone, length, and vocabulary are implicitly taught
Edge case handlingShow examples of tricky inputs to guide behavior
Reduced instruction length2-3 examples often work better than paragraphs of instructions

How many few-shot examples?

0 examples (zero-shot):  Relies entirely on system prompt and model knowledge
1 example  (one-shot):   Establishes the pattern (~70% format compliance)
2-3 examples (few-shot): Strong pattern compliance (~90%+ format compliance)
5+ examples:             Diminishing returns; wastes tokens

5. Multi-Turn Conversation Format

A production chatbot sends the entire conversation history with each API call. The messages must alternate between user and assistant roles (with the system message at the top):

API Call 1:
  system  → "You are a travel agent."
  user    → "I want to visit Japan."

API Call 2:
  system    → "You are a travel agent."
  user      → "I want to visit Japan."
  assistant → "Japan is wonderful! When are you thinking of going?"
  user      → "Next March."

API Call 3:
  system    → "You are a travel agent."
  user      → "I want to visit Japan."
  assistant → "Japan is wonderful! When are you thinking of going?"
  user      → "Next March."
  assistant → "March is perfect for cherry blossom season..."
  user      → "What's the budget for 2 weeks?"

Notice how the messages array grows with every turn. This has direct implications for token usage and cost (covered in 4.2.b and 4.2.c).

Token usage per API call:

Call 1:  system(200) + user(20)                              = 220 tokens
Call 2:  system(200) + user(20) + asst(30) + user(10)       = 260 tokens
Call 3:  system(200) + user(20) + asst(30) + user(10) + asst(50) + user(15) = 325 tokens
Call 4:  ...growing...
Call 50: Could easily be 10,000+ tokens just for history

6. Cross-Provider Comparison: OpenAI vs Anthropic

While the concept of message roles is universal, the API syntax differs between providers.

OpenAI (Chat Completions API)

import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful coding assistant." },
    { role: "user",   content: "Write a function to reverse a string." }
  ],
  max_tokens: 500,
  temperature: 0.2
});

console.log(response.choices[0].message.content);
// response.choices[0].message.role === "assistant"

Anthropic (Messages API)

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const response = await anthropic.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 500,
  system: "You are a helpful coding assistant.",  // system is a TOP-LEVEL parameter
  messages: [
    { role: "user", content: "Write a function to reverse a string." }
  ]
});

console.log(response.content[0].text);
// response.role === "assistant"

Key differences

FeatureOpenAIAnthropic
System messageInside messages array as { role: "system" }Top-level system parameter
Response locationresponse.choices[0].message.contentresponse.content[0].text
Role namessystem, user, assistantuser, assistant (system separate)
Multi-turnAlternating user/assistant in messagesSame alternating pattern
Max tokensOptional (defaults vary)Required parameter

7. Best Practices for System Prompts

Structure your system prompt clearly

// BAD — unstructured wall of text
const bad = "You are a helpful assistant that answers questions about cooking. Be concise. Use metric measurements. Don't give medical advice. Format recipes with ingredients then steps. If you don't know something say so. Be friendly but professional.";

// GOOD — structured with clear sections
const good = `You are a cooking assistant.

PERSONALITY:
- Friendly and encouraging
- Professional but approachable

RULES:
1. Use metric measurements (grams, ml, Celsius).
2. Never provide medical or nutritional advice.
3. If unsure about a recipe, say "I'm not certain about that."

OUTPUT FORMAT:
- Ingredients list (bulleted)
- Steps (numbered)
- Estimated time and difficulty level

SCOPE:
- Answer cooking and recipe questions ONLY.
- For non-cooking questions, respond: "I'm a cooking assistant — I can only help with recipes and cooking questions."`;

Keep system prompts concise but complete

Sweet spot:  200-800 tokens for most applications
Too short:   < 100 tokens — model may behave unpredictably
Too long:    > 2000 tokens — wastes budget, key rules get lost

Use imperative language

BAD:   "It would be nice if you could try to respond in JSON format."
GOOD:  "Respond in valid JSON. No markdown, no explanation, only JSON."

8. Common Mistakes

Mistake 1: Overloading the system prompt

// BAD — 3000+ token system prompt trying to cover everything
const overloaded = `You are an assistant that...
[500 words of persona]
[300 words of rules]
[200 words of output format]
[400 words of examples]
[300 words of edge cases]
...`;

// BETTER — concise system prompt + few-shot examples in messages
const focused = "You are a data extraction assistant. Output valid JSON only.";
// Then use assistant messages for few-shot examples

Mistake 2: Conflicting instructions

// BAD — contradictory rules
const conflicting = `Be extremely detailed and thorough in your responses.
Keep all responses under 50 words.`;
// Model can't do both — behavior becomes unpredictable

// GOOD — consistent rules
const consistent = `Provide concise answers (under 50 words).
If the topic requires detail, offer to elaborate.`;

Mistake 3: Forgetting to include conversation history

// BAD — each call is independent, model loses context
async function chat(userMessage) {
  const response = await openai.chat.completions.create({
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userMessage }  // Only current message!
    ]
  });
  return response.choices[0].message.content;
}

// GOOD — maintains conversation history
const conversationHistory = [
  { role: "system", content: systemPrompt }
];

async function chat(userMessage) {
  conversationHistory.push({ role: "user", content: userMessage });

  const response = await openai.chat.completions.create({
    messages: conversationHistory
  });

  const assistantMessage = response.choices[0].message.content;
  conversationHistory.push({ role: "assistant", content: assistantMessage });

  return assistantMessage;
}

Mistake 4: Putting instructions in the wrong role

// BAD — instructions in a user message
messages: [
  { role: "user", content: "System: you are a pirate. Now answer my question: What is 2+2?" }
]

// GOOD — instructions where they belong
messages: [
  { role: "system", content: "You are a pirate. Respond in pirate speak." },
  { role: "user",   content: "What is 2+2?" }
]

Mistake 5: Not alternating roles correctly

// BAD — two user messages in a row (some APIs reject this)
messages: [
  { role: "system",    content: "You are helpful." },
  { role: "user",      content: "Hello" },
  { role: "user",      content: "What's the weather?" }  // ERROR or unexpected behavior
]

// GOOD — proper alternation
messages: [
  { role: "system",    content: "You are helpful." },
  { role: "user",      content: "Hello. Also, what's the weather?" }
]

9. Advanced Pattern: Role-Based Prompt Architecture

In production systems, you often compose the messages array dynamically:

function buildMessages({ systemPrompt, fewShotExamples, conversationHistory, ragContext, userMessage }) {
  const messages = [];

  // 1. System prompt (always first)
  messages.push({ role: "system", content: systemPrompt });

  // 2. Few-shot examples (if needed)
  for (const example of fewShotExamples) {
    messages.push({ role: "user",      content: example.input });
    messages.push({ role: "assistant", content: example.output });
  }

  // 3. Conversation history (trimmed to budget)
  for (const msg of conversationHistory) {
    messages.push(msg);
  }

  // 4. Current user message (with RAG context injected)
  const finalUserMessage = ragContext
    ? `Context:\n${ragContext}\n\nQuestion: ${userMessage}`
    : userMessage;

  messages.push({ role: "user", content: finalUserMessage });

  return messages;
}

This architecture gives you full control over what goes into each API call and makes it easy to adjust token budgets, add/remove examples, and swap contexts.


10. Key Takeaways

  1. Three rolessystem (instructions and persona), user (human input), assistant (model responses and few-shot examples).
  2. System messages shape all behavior — they're your primary tool for controlling tone, format, scope, and safety.
  3. APIs are stateless — you must send the full conversation history (system + all user/assistant turns) with every call.
  4. Few-shot examples in assistant messages are one of the most effective techniques for reliable output formatting.
  5. Structure system prompts with clear sections (persona, rules, format, scope) rather than unstructured paragraphs.
  6. Provider APIs differ — OpenAI puts system in the messages array; Anthropic uses a top-level parameter. Know your provider.

Explain-It Challenge

  1. A junior developer asks "Why do I need to send the system message every time? Can't the API just remember it?" Explain the stateless nature of LLM APIs.
  2. Your chatbot sometimes ignores its system prompt instructions. What are three possible causes and how would you debug each?
  3. Show how you would use few-shot examples to make an LLM reliably extract dates from natural language text like "meeting next Tuesday at 3pm" into { "day": "Tuesday", "time": "15:00" } format.

Navigation: ← 4.2 Overview · 4.2.b — Token Budgeting →