Every chat-style LLM call ships with two kinds of instruction. One is the system prompt — the stable frame that defines who the model is and how it behaves. The other is the user prompt — the turn-by-turn task. They look similar in the API payload but play different roles, and treating them as interchangeable is one of the most common mistakes in practical prompt design. This post, part of the context engineering pillar, covers what belongs where and why the split matters.
The Two Prompt Roles
Chat APIs at Anthropic, OpenAI, and Google all accept a system-role instruction separate from the user's message. The system prompt is the operator's message — set once for the session. The user prompt is the end user's turn — changing with each request.
That split lines up with how context behaves in practice. A well-designed application has content that should hold steady across every call — persona, rules, tool availability, output format, tone. It also has content that changes every turn — the question, the document just retrieved, the tool output just returned. Put the first group in system, the second in user. Get it wrong and three things hurt: caching, attention, and consistency.
What Belongs in the System Prompt
The system prompt is where stable context lives. "Stable" means it reads the same on turn 1 and turn 50, and the same for user A and user B in the same application role.
- Persona and role. Who the model is in this application.
- Behavioral rules. What the model must always do or never do.
- Tool availability and usage policy. Which tools exist, when to prefer one, what to do when a tool fails.
- Output format defaults. Expected structure — JSON schema, Markdown, prose, length bounds.
- Tone and style. Register, vocabulary, handling disagreement.
- Evergreen examples. Few-shot examples that illustrate how to reason on every turn — not examples tied to the current task.
- Stable world facts. Pricing tables, tier definitions, error-code glossaries. If they change often, retrieve them instead.
Think of system as the application's operating manual. If the content would be identical across 1,000 consecutive requests, it belongs here.
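That "identical across 1,000 consecutive requests" test can be checked mechanically. A hypothetical sketch: render the candidate system prompt against a sample of requests and confirm every rendering is byte-identical:

```python
def is_system_stable(render_system, sample_requests) -> bool:
    """True if the rendered system prompt is identical for every request."""
    rendered = {render_system(req) for req in sample_requests}
    return len(rendered) == 1


# Stable: ignores the request entirely.
stable = lambda req: "You are a support agent for Acme SaaS."

# Unstable: leaks a per-request field into system.
leaky = lambda req: f"You are a support agent. Current user: {req['user']}"

requests = [{"user": "alice"}, {"user": "bob"}]
```

Anything that fails this check belongs in the user prompt instead.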
What Belongs in the User Prompt
The user prompt is where dynamic context lives — anything that varies per request.
- The current task. The actual question, instruction, or command for this turn.
- Retrieved content. Documents pulled from search or RAG for this query. (See dynamic context assembly.)
- Tool outputs. Results returned from function calls this turn.
- Task-specific examples. Few-shot examples matching this question's shape.
- Current user context. End user's name, tier, recent tickets — anything scoped to the active session.
- Conversation history. Prior turns, possibly compressed.
A useful test: if swapping this content out changes the answer for the same user on the same turn, it's user-prompt material.
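One hypothetical way to assemble that per-turn material into a single user message (the function and field names are illustrative):

```python
def build_user_turn(question: str, retrieved: list[str],
                    tool_outputs: list[str], session: dict) -> str:
    """Assemble the dynamic, per-request user message."""
    parts = [f"Current user: {session['email']} ({session['tier']} tier)."]
    parts += [f"Retrieved: {doc}" for doc in retrieved]
    parts += [f"Tool output: {out}" for out in tool_outputs]
    parts.append(f"Question: {question}")
    return "\n".join(parts)


turn = build_user_turn(
    question="Can I get a refund on order #55120?",
    retrieved=["Refund policy section 3: 14-day window for Pro tier."],
    tool_outputs=["lookup_order(#55120): purchased 2026-04-11"],
    session={"email": "alice@example.com", "tier": "Pro"},
)
```

Every argument here varies per request — exactly the content that would break caching if it sat in system.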
The Split at a Glance
| Goes in system prompt | Goes in user prompt |
|---|---|
| Persona / role | Current question or task |
| Behavioral rules | Retrieved documents for this turn |
| Tool availability and policy | Tool outputs from this turn |
| Output format defaults | Task-specific few-shot examples |
| Tone and style | Current user's account context |
| Evergreen few-shot examples | Conversation history |
| Stable world facts (rarely change) | Dynamic facts (change per request) |
The line isn't always sharp — a recurring user's saved preferences can live in either, depending on scope. But the default split catches most decisions.
Why Caching Cares
Prompt caching reuses the processed form of a prompt prefix when the provider recognizes it from a previous call. A hit cuts cost and latency on the prefix; a miss re-pays the full prefill.
Cache recognition works off an exact-prefix match. The first bytes of your prompt must look identical across calls. If the system prompt contains a timestamp, user ID, or rotating note, the cache never fires — every call is a new prefix. If the system prompt is stable and dynamic content sits later, the cache fires reliably.
That's why "stable in system, dynamic in user" isn't a cleanliness rule — it's what makes caching work. Teams that move rotating content into system "for emphasis" usually discover their cache hit rate collapsed. The savings are real: a well-structured system prompt can cut per-request cost substantially on repeat calls, depending on provider, cache tier, and prefix size.
Corollary: if "stable" content actually changes once a day, putting it at the end of the system prompt — or the start of the user prompt — keeps the higher-value prefix cacheable.
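Exact-prefix matching can be modeled as hashing the leading bytes of the prompt — a sketch, not any provider's actual cache mechanism:

```python
import hashlib


def cache_key(prompt_prefix: str) -> str:
    """Providers match on an exact prefix; model that as a hash of the bytes."""
    return hashlib.sha256(prompt_prefix.encode()).hexdigest()


RULES = "You are a support agent for Acme SaaS. Cite policy sections."

# Bad: timestamp up front -- the prefix differs every day, cache never fires.
bad_day1 = f"Today is 2026-04-20.\n{RULES}"
bad_day2 = f"Today is 2026-04-21.\n{RULES}"

# Better: stable rules first, daily note last -- the shared prefix survives.
good_day1 = f"{RULES}\nToday is 2026-04-20."
good_day2 = f"{RULES}\nToday is 2026-04-21."
```

In the "bad" layout the very first bytes differ between days, so no prefix is shared; in the "good" layout everything up through the rules is identical and stays cacheable.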
Why Attention Cares
Transformers don't weight all positions equally. The start of the input gets durable attention — tokens near the beginning influence generation throughout. The system prompt, sitting early, gets outsized weight.
That's useful when the content deserves it: hard rules, persona, safety constraints. It's a liability when the content is low-value boilerplate — every downstream decision gets colored by whatever's at the top. Putting dynamic details in system doesn't just break caching; it also spends attention-rich real estate on content that may not apply to this turn. A tight persona with clear rules gets more out of that position than a 4K-token prefix the model has to filter through.
Rule of thumb: if you can't explain why a line needs to influence every turn, it probably doesn't belong in system.
Why Consistency Cares
Conversations drift. Across turns, tone loosens, rules get stretched, format discipline slips. The system prompt is the anchor — as long as it's stable, it's re-read on every turn and re-asserts the frame.
Move rules into user prompts and the anchor disappears. Turn 1 the model cites section numbers; turn 5 it doesn't, because that turn's user message didn't remind it. The fix isn't copying the rule into every message — it's stating it once, in system, where it belongs.
Same logic the other way: if a "rule" actually depends on context — "be more detailed for power-tier users" — that's a conditional. Put the tier flag in user; the rule about how to use it stays in system.
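The conditional split looks like this in practice — the rule text is a stable constant, the flag it conditions on arrives per turn (names here are illustrative):

```python
# The conditional stays in system; the flag it tests arrives per turn.
SYSTEM_RULE = (
    "If the user message marks the account as power tier, "
    "give a detailed answer; otherwise keep it brief."
)  # stable text: same for every user, every turn


def user_turn(question: str, tier: str) -> str:
    """Per-turn message carrying the flag the system rule conditions on."""
    return f"Account tier: {tier}.\n{question}"
```

The system prompt never changes per user; only the tier flag in the user message does.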
Multi-Turn Dynamics
In a multi-turn chat, the system prompt persists. Your application resends it on every turn (and the provider may serve its processed form from cache). User messages accumulate: turn N sees the system prompt plus all prior user and assistant messages, then the new user message.
Practical consequences:
- A rule added to system takes effect on the next turn and every turn after.
- A rule dropped into a user message takes effect that turn only; generalization is unreliable.
- Conversation history is user-side context. Compression and summarization happen on the user side.
- Tool-use traces live in message history. System describes which tools exist; history shows which tools fired.
For agents running long loops, the split affects compaction too. Compaction tools typically preserve system verbatim and compress message history — so system survives by default, user-side content may get summarized away. Plan with that asymmetry in mind.
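That asymmetry can be sketched as a toy compaction pass — a hypothetical implementation, assuming the common pattern of keeping system verbatim and collapsing older turns into a summary placeholder:

```python
def compact(messages: list[dict], keep_last: int = 2) -> list[dict]:
    """Sketch of the usual asymmetry: system survives verbatim,
    older user/assistant turns collapse into one summary message."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if len(rest) <= keep_last:
        return system + rest
    summary = {"role": "user",
               "content": f"[Summary of {len(rest) - keep_last} earlier messages]"}
    return system + [summary] + rest[-keep_last:]


history = [
    {"role": "system", "content": "You are a support agent."},
    {"role": "user", "content": "turn 1"},
    {"role": "assistant", "content": "reply 1"},
    {"role": "user", "content": "turn 2"},
    {"role": "assistant", "content": "reply 2"},
    {"role": "user", "content": "turn 3"},
]
compacted = compact(history)
```

A rule stated only in an early user turn can be summarized away here; the same rule in system cannot.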
Model-Specific Notes
The basic shape — one system role, alternating user/assistant — is consistent across Anthropic, OpenAI, and Google as of early 2026. Positioning differences:
- Anthropic (Claude). System prompt is a top-level `system` parameter, not a message with `role: "system"`.
- OpenAI (GPT family). System prompt is a message with `role: "system"` (or `role: "developer"` on newer models), placed first in the messages array.
- Google (Gemini). A `systemInstruction` field sits alongside the `contents` array.
All three support prompt caching on the stable prefix and reward "stable in system, dynamic in user." Exact cache semantics, TTL, and minimum cacheable sizes differ per provider — covered in the caching guide. The split matters the same way everywhere; the API shape is the detail.
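The three shapes, side by side as raw request bodies — a sketch of the field layout described above, not complete API payloads (model names, auth, and other required fields omitted):

```python
SYSTEM = "You are a support agent for Acme SaaS."
USER = "Can I get a refund on order #55120?"

# Anthropic: top-level `system` parameter; messages hold only the turns.
anthropic_payload = {
    "system": SYSTEM,
    "messages": [{"role": "user", "content": USER}],
}

# OpenAI: system is the first message in the messages array.
openai_payload = {
    "messages": [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": USER},
    ],
}

# Google: a separate systemInstruction field beside `contents`.
google_payload = {
    "systemInstruction": {"parts": [{"text": SYSTEM}]},
    "contents": [{"role": "user", "parts": [{"text": USER}]}],
}
```

Three different field layouts, one identical discipline: the stable frame goes in the system slot, the turn goes in the user slot.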
Example: Same Task, Good vs Bad Split
A hypothetical support agent handling a refund question. Both versions produce an answer, but only one caches and stays consistent across turns.
Bad split — dynamic content in system:
[SYSTEM]
You are a support agent for Acme SaaS.
Current user: alice@example.com (Pro tier, signed up 2024-03-12)
Their last 3 tickets: #4431 (resolved), #4502 (resolved), #4618 (open)
Today is 2026-04-20. The current refund window is 14 days.
Refund requests beyond 14 days require manager approval.
Available tools: refund_issue, escalate_to_manager, lookup_order.
[USER]
Can I get a refund on order #55120?
Problems: user identity, ticket history, date, and session facts are all in system. Every request has a different system prompt, so the cache never fires. On follow-ups, date or ticket list shifts — rules and facts drift together.
Good split — stable in system, dynamic in user:
[SYSTEM]
You are a support agent for Acme SaaS. Answer using only the
information provided by tools or in the user message. Never
speculate about prices or account status.
Rules:
- Refund requests within the refund window can be processed directly.
- Requests outside the window require manager approval via escalate_to_manager.
- Always cite the relevant policy section in your reply.
Tools available:
- refund_issue(order_id): process a refund
- escalate_to_manager(ticket_id, reason): escalate
- lookup_order(order_id): fetch order metadata including purchase date
Output: 2-4 sentences, plain text, cite policy section at the end.
[USER]
Current user: alice@example.com (Pro tier).
Today: 2026-04-20.
Refund window (Pro tier): 14 days.
Open ticket: #4618.
Question: Can I get a refund on order #55120?
Why it works: the system prompt is stable across every request. Rules, tool list, and format cache and apply uniformly. Dynamic details — user, date, window, ticket — live in user, where they belong. A follow-up turn reuses the same system prefix (cache hit) with updated per-turn context.
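The good split, expressed as code: one stable system constant plus a per-turn renderer (function and field names are illustrative):

```python
# Stable across every request -- this is the cacheable prefix.
SYSTEM = ("You are a support agent for Acme SaaS. Answer using only the "
          "information provided by tools or in the user message. "
          "Output: 2-4 sentences, plain text, cite policy section at the end.")


def render_user(email: str, tier: str, today: str,
                window_days: int, question: str) -> str:
    """Render the dynamic, per-turn user message."""
    return (f"Current user: {email} ({tier} tier).\n"
            f"Today: {today}.\n"
            f"Refund window ({tier} tier): {window_days} days.\n"
            f"Question: {question}")


turn1 = render_user("alice@example.com", "Pro", "2026-04-20", 14,
                    "Can I get a refund on order #55120?")
turn2 = render_user("alice@example.com", "Pro", "2026-04-21", 14,
                    "What about order #55121?")
# SYSTEM is byte-identical across both turns, so the cached prefix is reused.
```

The date and question change between turns; nothing in `SYSTEM` does.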
Common Anti-Patterns
- Timestamps in system. "Today is X" changes daily and breaks the cache. It's user-prompt content.
- Per-user data in system. User ID, tier, history — all user-prompt content. System is the role; user is the session.
- No system prompt. Relying on the first user message to set persona. Works once; falls apart by turn 3 as the frame drifts out of attention.
- Duplicating rules in every user prompt. Wastes tokens and invites drift if a copy gets edited.
- Conflicting rules across system and user. System says "always cite section numbers"; user says "just give me the answer." The model picks, and which it picks varies.
- Rotating "emphasis" content into system. Breaks caching for marginal attention gain. Use structural anchors in the user prompt instead.
FAQ
Is there any task where putting dynamic content in system is correct?
Rarely. If the application doesn't use caching at all, a date-sensitive note at the top of system does little harm. For production apps that do cache, put it in user, or at the end of system if it must live there.
What if I don't have a system prompt — just user turns?
You can set persona in the first user message, but you lose consistency across long conversations and waste the attention-rich early position. For anything beyond a single-turn script, add a system prompt.
How long should a system prompt be?
Long enough to establish persona, rules, tools, and format — no longer. Many effective system prompts sit at 200–800 tokens. Past a couple thousand the model starts filtering more than following. If it's growing, check for dynamic content or reference material that should be retrieved instead.
Can I change the system prompt mid-conversation?
Technically yes, but it defeats the cache for the new prefix. Prefer designing a system prompt that holds across the full conversation, or starting a new session when the frame genuinely changes.
Do tool definitions count as system or user?
Tool availability and usage rules belong in system — part of the operating manual. Tool outputs from this turn belong in user (technically in tool/assistant messages, but logically dynamic per-turn content).
Wrap-Up
System prompts and user prompts are both context, but not the same kind. System holds the stable frame — persona, rules, tools, format. User holds the dynamic task — the question, retrieved content, tool output, session. Put each in the right place and caching stays effective, attention stays focused, and behavior stays consistent across long conversations.
Most prompt-structure problems trace to one of two errors: stuffing dynamic content into system (breaks caching, spends attention), or leaking rules into user turns (consistency drifts). Draw the line at "does this change per request?" and the rest usually follows.
For the pillar, context engineering. For the cache mechanics, the prompt caching guide. For the broader discipline, context engineering vs prompt engineering. For per-turn assembly, dynamic context assembly patterns. For the term itself, system prompt.