Every chat-style LLM call ships with two kinds of instruction. One is the system prompt — the stable frame that defines who the model is and how it behaves. The other is the user prompt — the turn-by-turn task. They look similar in the API payload but play different roles, and treating them as interchangeable is one of the most common mistakes in practical prompt design. This post, part of the context engineering pillar, covers what belongs where and why the split matters.
The Two Prompt Roles
Chat APIs at Anthropic, OpenAI, and Google all accept a system-role instruction separate from the user's message. The system prompt is the operator's message — set once for the session. The user prompt is the end user's turn — changing with each request.
That split lines up with how context behaves in practice. A well-designed application has content that should hold steady across every call — persona, rules, tool availability, output format, tone. It also has content that changes every turn — the question, the document just retrieved, the tool output just returned. Put the first group in system, the second in user. Get it wrong and three things hurt: caching, attention, and consistency.
What Belongs in the System Prompt
The system prompt is where stable context lives. "Stable" means it reads the same on turn 1 and turn 50, and the same for user A and user B in the same application role.
- Persona and role. Who the model is in this application.
- Behavioral rules. What the model must always do or never do.
- Tool availability and usage policy. Which tools exist, when to prefer one, what to do when a tool fails.
- Output format defaults. Expected structure — JSON schema, Markdown, prose, length bounds.
- Tone and style. Register, vocabulary, handling disagreement.
- Evergreen examples. Few-shot examples that illustrate how to reason on every turn — not examples tied to the current task.
- Stable world facts. Pricing tables, tier definitions, error-code glossaries. If they change often, retrieve them instead.
Think of system as the application's operating manual. If the content would be identical across 1,000 consecutive requests, it belongs here.
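That "identical across 1,000 consecutive requests" test can be checked mechanically. A hypothetical sketch: render the candidate system prompt against a sample of requests and confirm every rendering is byte-identical:

```python
def is_system_stable(render_system, sample_requests) -> bool:
    """True if the rendered system prompt is identical for every request."""
    rendered = {render_system(req) for req in sample_requests}
    return len(rendered) == 1


# Stable: ignores the request entirely.
stable = lambda req: "You are a support agent for Acme SaaS."

# Unstable: leaks a per-request field into system.
leaky = lambda req: f"You are a support agent. Current user: {req['user']}"

requests = [{"user": "alice"}, {"user": "bob"}]
```

Anything that fails this check belongs in the user prompt instead.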
What Belongs in the User Prompt
The user prompt is where dynamic context lives — anything that varies per request.
- The current task. The actual question, instruction, or command for this turn.
- Retrieved content. Documents pulled from search or RAG for this query. (See dynamic context assembly.)
- Tool outputs. Results returned from function calls this turn.
- Task-specific examples. Few-shot examples matching this question's shape.
- Current user context. End user's name, tier, recent tickets — anything scoped to the active session.
- Conversation history. Prior turns, possibly compressed.
A useful test: if swapping this content out changes the answer for the same user on the same turn, it's user-prompt material.
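One hypothetical way to assemble that per-turn material into a single user message (the function and field names are illustrative):

```python
def build_user_turn(question: str, retrieved: list[str],
                    tool_outputs: list[str], session: dict) -> str:
    """Assemble the dynamic, per-request user message."""
    parts = [f"Current user: {session['email']} ({session['tier']} tier)."]
    parts += [f"Retrieved: {doc}" for doc in retrieved]
    parts += [f"Tool output: {out}" for out in tool_outputs]
    parts.append(f"Question: {question}")
    return "\n".join(parts)


turn = build_user_turn(
    question="Can I get a refund on order #55120?",
    retrieved=["Refund policy section 3: 14-day window for Pro tier."],
    tool_outputs=["lookup_order(#55120): purchased 2026-04-11"],
    session={"email": "alice@example.com", "tier": "Pro"},
)
```

Every argument here varies per request — exactly the content that would break caching if it sat in system.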
The Split at a Glance
| Goes in system prompt | Goes in user prompt |
|---|---|
| Persona / role | Current question or task |
| Behavioral rules | Retrieved documents for this turn |
| Tool availability and policy | Tool outputs from this turn |
| Output format defaults | Task-specific few-shot examples |
| Tone and style | Current user's account context |
| Evergreen few-shot examples | Conversation history |
| Stable world facts (rarely change) | Dynamic facts (change per request) |
The line isn't always sharp — a recurring user's saved preferences can live in either, depending on scope. But the default split catches most decisions.
Why Caching Cares
Prompt caching reuses the processed form of a prompt prefix when the provider recognizes it from a previous call. A hit cuts cost and latency on the prefix; a miss re-pays the full prefill.
Cache recognition works off an exact-prefix match. The first bytes of your prompt must look identical across calls. If the system prompt contains a timestamp, user ID, or rotating note, the cache never fires — every call is a new prefix. If the system prompt is stable and dynamic content sits later, the cache fires reliably.
That's why "stable in system, dynamic in user" isn't a cleanliness rule — it's what makes caching work. Teams that move rotating content into system "for emphasis" usually discover their cache hit rate collapsed. The savings are real: a well-structured system prompt can cut per-request cost substantially on repeat calls, depending on provider, cache tier, and prefix size.
Corollary: if "stable" content actually changes once a day, putting it at the end of the system prompt — or the start of the user prompt — keeps the higher-value prefix cacheable.
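Exact-prefix matching can be modeled as hashing the leading bytes of the prompt — a sketch, not any provider's actual cache mechanism:

```python
import hashlib


def cache_key(prompt_prefix: str) -> str:
    """Providers match on an exact prefix; model that as a hash of the bytes."""
    return hashlib.sha256(prompt_prefix.encode()).hexdigest()


RULES = "You are a support agent for Acme SaaS. Cite policy sections."

# Bad: timestamp up front -- the prefix differs every day, cache never fires.
bad_day1 = f"Today is 2026-04-20.\n{RULES}"
bad_day2 = f"Today is 2026-04-21.\n{RULES}"

# Better: stable rules first, daily note last -- the shared prefix survives.
good_day1 = f"{RULES}\nToday is 2026-04-20."
good_day2 = f"{RULES}\nToday is 2026-04-21."
```

In the "bad" layout the very first bytes differ between days, so no prefix is shared; in the "good" layout everything up through the rules is identical and stays cacheable.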
Why Attention Cares
Transformers don't weight all positions equally. The start of the input gets durable attention — tokens near the beginning influence generation throughout. The system prompt, sitting early, gets outsized weight.
That's useful when the content deserves it: hard rules, persona, safety constraints. It's a liability when the content is low-value boilerplate — every downstream decision gets colored by whatever's at the top. Putting dynamic details in system doesn't just break caching; it also spends attention-rich real estate on content that may not apply to this turn. A tight persona with clear rules gets more out of that position than a 4K-token prefix the model has to filter through.
Rule of thumb: if you can't explain why a line needs to influence every turn, it probably doesn't belong in system.
Why Consistency Cares
Conversations drift. Across turns, tone loosens, rules get stretched, format discipline slips. The system prompt is the anchor — as long as it's stable, it's re-read on every turn and re-asserts the frame.
Move rules into user prompts and the anchor disappears. Turn 1 the model cites section numbers; turn 5 it doesn't, because that turn's user message didn't remind it. The fix isn't copying the rule into every message — it's stating it once, in system, where it belongs.
Same logic the other way: if a "rule" actually depends on context — "be more detailed for power-tier users" — that's a conditional. Put the tier flag in user; the rule about how to use it stays in system.
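The conditional split looks like this in practice — the rule text is a stable constant, the flag it conditions on arrives per turn (names here are illustrative):

```python
# The conditional stays in system; the flag it tests arrives per turn.
SYSTEM_RULE = (
    "If the user message marks the account as power tier, "
    "give a detailed answer; otherwise keep it brief."
)  # stable text: same for every user, every turn


def user_turn(question: str, tier: str) -> str:
    """Per-turn message carrying the flag the system rule conditions on."""
    return f"Account tier: {tier}.\n{question}"
```

The system prompt never changes per user; only the tier flag in the user message does.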
Multi-Turn Dynamics
In a multi-turn chat, the system prompt persists. Your application resends it on every turn (and the provider may serve its processed form from cache). User messages accumulate: turn N sees the system prompt plus all prior user and assistant messages, then the new user message.
Practical consequences:
- A rule added to system takes effect on the next turn and every turn after.
- A rule dropped into a user message takes effect that turn only; generalization is unreliable.
- Conversation history is user-side context. Compression and summarization happen on the user side.
- Tool-use traces live in message history. System describes which tools exist; history shows which tools fired.
For agents running long loops, the split affects compaction too. Compaction tools typically preserve system verbatim and compress message history — so system survives by default, user-side content may get summarized away. Plan with that asymmetry in mind.
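That asymmetry can be sketched as a toy compaction pass — a hypothetical implementation, assuming the common pattern of keeping system verbatim and collapsing older turns into a summary placeholder:

```python
def compact(messages: list[dict], keep_last: int = 2) -> list[dict]:
    """Sketch of the usual asymmetry: system survives verbatim,
    older user/assistant turns collapse into one summary message."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if len(rest) <= keep_last:
        return system + rest
    summary = {"role": "user",
               "content": f"[Summary of {len(rest) - keep_last} earlier messages]"}
    return system + [summary] + rest[-keep_last:]


history = [
    {"role": "system", "content": "You are a support agent."},
    {"role": "user", "content": "turn 1"},
    {"role": "assistant", "content": "reply 1"},
    {"role": "user", "content": "turn 2"},
    {"role": "assistant", "content": "reply 2"},
    {"role": "user", "content": "turn 3"},
]
compacted = compact(history)
```

A rule stated only in an early user turn can be summarized away here; the same rule in system cannot.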
Model-Specific Notes
The basic shape — one system role, alternating user/assistant — is consistent across Anthropic, OpenAI, and Google as of early 2026. Positioning differences:
- Anthropic (Claude). System prompt is a top-level `system` parameter, not a message with `role: "system"`.
- OpenAI (GPT family). System prompt is a message with `role: "system"` (or `role: "developer"` on newer models), placed first in the messages array.
- Google (Gemini). A `systemInstruction` field sits alongside the `contents` array.
All three support prompt caching on the stable prefix and reward "stable in system, dynamic in user." Exact cache semantics, TTL, and minimum cacheable sizes differ per provider — covered in the caching guide. The split matters the same way everywhere; the API shape is the detail.
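The three shapes, side by side as raw request bodies — a sketch of the field layout described above, not complete API payloads (model names, auth, and other required fields omitted):

```python
SYSTEM = "You are a support agent for Acme SaaS."
USER = "Can I get a refund on order #55120?"

# Anthropic: top-level `system` parameter; messages hold only the turns.
anthropic_payload = {
    "system": SYSTEM,
    "messages": [{"role": "user", "content": USER}],
}

# OpenAI: system is the first message in the messages array.
openai_payload = {
    "messages": [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": USER},
    ],
}

# Google: a separate systemInstruction field beside `contents`.
google_payload = {
    "systemInstruction": {"parts": [{"text": SYSTEM}]},
    "contents": [{"role": "user", "parts": [{"text": USER}]}],
}
```

Three different field layouts, one identical discipline: the stable frame goes in the system slot, the turn goes in the user slot.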
Example: Same Task, Good vs Bad Split
A hypothetical support agent handling a refund question. Both versions produce an answer, but only one caches and stays consistent across turns.
Bad split — dynamic content in system:
[SYSTEM]
You are a support agent for Acme SaaS.
Current user: alice@example.com (Pro tier, signed up 2024-03-12)
Their last 3 tickets: #4431 (resolved), #4502 (resolved), #4618 (open)
Today is 2026-04-20. The current refund window is 14 days.
Refund requests beyond 14 days require manager approval.
Available tools: refund_issue, escalate_to_manager, lookup_order.
[USER]
Can I get a refund on order #55120?
Problems: user identity, ticket history, date, and session facts are all in system. Every request has a different system prompt, so the cache never fires. On follow-ups, date or ticket list shifts — rules and facts drift together.
Good split — stable in system, dynamic in user:
[SYSTEM]
You are a support agent for Acme SaaS. Answer using only the
information provided by tools or in the user message. Never
speculate about prices or account status.
Rules:
- Refund requests within the refund window can be processed directly.
- Requests outside the window require manager approval via escalate_to_manager.
- Always cite the relevant policy section in your reply.
Tools available:
- refund_issue(order_id): process a refund
- escalate_to_manager(ticket_id, reason): escalate
- lookup_order(order_id): fetch order metadata including purchase date
Output: 2-4 sentences, plain text, cite policy section at the end.
[USER]
Current user: alice@example.com (Pro tier).
Today: 2026-04-20.
Refund window (Pro tier): 14 days.
Open ticket: #4618.
Question: Can I get a refund on order #55120?
Why it works: the system prompt is stable across every request. Rules, tool list, and format cache and apply uniformly. Dynamic details — user, date, window, ticket — live in user, where they belong. A follow-up turn reuses the same system prefix (cache hit) with updated per-turn context.
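The good split, expressed as code: one stable system constant plus a per-turn renderer (function and field names are illustrative):

```python
# Stable across every request -- this is the cacheable prefix.
SYSTEM = ("You are a support agent for Acme SaaS. Answer using only the "
          "information provided by tools or in the user message. "
          "Output: 2-4 sentences, plain text, cite policy section at the end.")


def render_user(email: str, tier: str, today: str,
                window_days: int, question: str) -> str:
    """Render the dynamic, per-turn user message."""
    return (f"Current user: {email} ({tier} tier).\n"
            f"Today: {today}.\n"
            f"Refund window ({tier} tier): {window_days} days.\n"
            f"Question: {question}")


turn1 = render_user("alice@example.com", "Pro", "2026-04-20", 14,
                    "Can I get a refund on order #55120?")
turn2 = render_user("alice@example.com", "Pro", "2026-04-21", 14,
                    "What about order #55121?")
# SYSTEM is byte-identical across both turns, so the cached prefix is reused.
```

The date and question change between turns; nothing in `SYSTEM` does.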
Common Anti-Patterns
- Timestamps in system. "Today is X" changes daily and breaks the cache. It's user-prompt content.
- Per-user data in system. User ID, tier, history — all user-prompt content. System is the role; user is the session.
- No system prompt. Relying on the first user message to set persona. Works once; falls apart by turn 3 as the frame drifts out of attention.
- Duplicating rules in every user prompt. Wastes tokens and invites drift if a copy gets edited.
- Conflicting rules across system and user. System says "always cite section numbers"; user says "just give me the answer." The model picks, and which it picks varies.
- Rotating "emphasis" content into system. Breaks caching for marginal attention gain. Use structural anchors in the user prompt instead.
FAQ
Is there any task where putting dynamic content in system is correct?
Rarely. If the application doesn't use caching at all, a date-sensitive note at the top of system does little harm. For production apps that do cache, put it in user, or at the end of system if it must live there.
What if I don't have a system prompt — just user turns?
You can set persona in the first user message, but you lose consistency across long conversations and waste the attention-rich early position. For anything beyond a single-turn script, add a system prompt.
How long should a system prompt be?
Long enough to establish persona, rules, tools, and format — no longer. Many effective system prompts sit at 200–800 tokens. Past a couple thousand the model starts filtering more than following. If it's growing, check for dynamic content or reference material that should be retrieved instead.
Can I change the system prompt mid-conversation?
Technically yes, but it defeats the cache for the new prefix. Prefer designing a system prompt that holds across the full conversation, or starting a new session when the frame genuinely changes.
Do tool definitions count as system or user?
Tool availability and usage rules belong in system — part of the operating manual. Tool outputs from this turn belong in user (technically in tool/assistant messages, but logically dynamic per-turn content).
Wrap-Up
System prompts and user prompts are both context, but not the same kind. System holds the stable frame — persona, rules, tools, format. User holds the dynamic task — the question, retrieved content, tool output, session. Put each in the right place and caching stays effective, attention stays focused, and behavior stays consistent across long conversations.
Most prompt-structure problems trace to one of two errors: stuffing dynamic content into system (breaks caching, spends attention), or leaking rules into user turns (consistency drifts). Draw the line at "does this change per request?" and the rest usually follows.
For the pillar, context engineering. For the cache mechanics, the prompt caching guide. For the broader discipline, context engineering vs prompt engineering. For per-turn assembly, dynamic context assembly patterns. For the term itself, system prompt.