multi-agent, agent orchestration, prompt engineering, agentic AI, agent patterns

Multi-Agent Prompting Guide: Coordinating Specialist Agents (2026)

How to prompt multi-agent systems — orchestrator-worker topology, hand-off patterns, shared vs isolated context, and failure modes in 2026.

SurePrompts Team
April 20, 2026
12 min read

TL;DR

Multi-agent systems distribute a problem across specialist agents with different prompts and scopes. Orchestrator-worker is the default topology; the main design decision is shared vs isolated context between workers.

Multi-agent prompting splits a problem across several specialist agents — a planner, a researcher, a writer, a reviewer — each with its own prompt and scope. The default topology is orchestrator-worker: one agent decomposes and delegates, workers execute scoped sub-tasks and report back. The central design question is not "how many agents" but "what does each one see" — shared context keeps workers coordinated but is expensive; isolated context keeps them focused but demands explicit hand-offs.

What Multi-Agent Prompting Is

A multi-agent system replaces one long prompt with several short ones, each tuned to a role. Instead of asking one agent to "research, draft, and review," the setup hands research to a researcher whose prompt is about source selection, the draft to a writer whose prompt is about structure and voice, and the review to a critic whose prompt is about what to flag.

Single-agent prompting stops scaling in three places: role confusion (one prompt juggling planner, coder, and tester ends up trading the roles off against each other), context pressure (long runs accumulate observations that drown the signal), and parallelism (independent sub-tasks cannot run concurrently in one loop). Multi-agent appears when any of these starts to bite. See the agentic AI glossary entry for the broader category.

Orchestrator-Worker Topology

One planner, many executors. The orchestrator's prompt is about decomposition and delegation: read the goal, produce sub-tasks, hand each to a worker, assemble the output. Workers' prompts are about execution within a tight scope: here is your one job, here are your inputs, produce this output.

This is the most common topology because failure modes are legible. When something goes wrong, you know whether the plan was bad (orchestrator), a sub-task failed (worker), or the hand-off dropped information. Contrast with a single long prompt where "the agent got confused somewhere around step four" is the typical post-mortem.

The pattern pairs with plan-and-execute prompting: the orchestrator is a plan-and-execute planner. Workers can use whatever pattern fits — a ReAct loop for exploration, a single-shot call for a deterministic step.

Other Topologies

Peer-to-peer. Agents as equals, no central coordinator. Rare because coordination overhead explodes — every agent has to decide who does what, the problem an orchestrator solves once. Shows up in debate patterns, but the judge is effectively an orchestrator.

Hierarchical (nested orchestrators). Workers that are themselves orchestrators. Works when the top-level task has phases that are each decomposable. The risk is depth — each nesting level adds hand-off cost, and by level three you are often better off flatter.

Pipeline. Agents chained in fixed order: A feeds B, B feeds C. Not really multi-agent coordination — a workflow with specialist steps — but the prompt design is similar.
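The pipeline shape can be sketched as plain function composition — the stage functions below are illustrative stubs standing in for agent calls, not a real framework:

```python
# Hypothetical pipeline: each stage is a function from text to text,
# and the output of each stage is the input to the next (A feeds B,
# B feeds C). Stage names and payloads are placeholders.
from functools import reduce

def research(task: str) -> str:
    return task + " | sources: [s1, s2]"

def draft(research_output: str) -> str:
    return research_output + " | draft: ..."

def review(draft_output: str) -> str:
    return draft_output + " | review: ok"

def run_pipeline(task, stages):
    # Fixed order, no coordinator: fold the task through the stages.
    return reduce(lambda acc, stage: stage(acc), stages, task)

result = run_pipeline("write a brief on X", [research, draft, review])
```

The brittleness the table below describes lives in those string hand-offs: if any stage's output surprises the next stage's expectations, nothing in the pipeline catches it.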

| Topology | When it fits | Main risk |
| --- | --- | --- |
| Orchestrator-worker | Most real tasks — distinct sub-tasks, clear ownership | Orchestrator becomes a bottleneck |
| Hierarchical | Multi-level decomposition (research → subtopics → sources) | Hand-off cost compounds with depth |
| Peer-to-peer | Debate, critique, consensus | Coordination overhead often dominates |
| Pipeline | Fixed stages with predictable I/O | Brittle when any stage surprises the next |

Most production systems are orchestrator-worker with occasional hierarchical nesting for research-heavy sub-tasks.

Specialist Prompts

Each worker gets a prompt tuned to its role. This is where multi-agent earns its cost.

A researcher's prompt is about source quality and breadth — where to look, how to cite, when to stop — not about prose voice. A writer's prompt is about structure and tone and gets sources as input rather than selecting them. A critic's prompt is about what to flag and never rewrites.

The temptation is to paste the same generic preamble into every worker. Resist it. Specialization means each prompt is different in a way that reflects its job; a generic preamble makes workers behave like generalists with different names. For related prompt-construction techniques see the complete guide to prompting AI coding agents.

Hand-Offs

The hand-off is the contract between agents. Two things have to be explicit.

What gets passed. Define the shape — a JSON object, a labelled section, a tool-call result. The worker's prompt describes this as required output; the orchestrator's prompt describes it as expected input. Mismatch is the most common failure: the researcher returns prose, the writer expects a list with URLs, the writer improvises and hallucinates.

What does not. Workers often do not need the full task history. Strip the hand-off to what the next agent needs.

Hand-offs make state visible. In a single-agent run, state hides in a long context window. In a multi-agent run, state is the hand-off object — log it, diff it, replay from it. That visibility is half the reason multi-agent systems are easier to debug once built.
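A hand-off contract can be made concrete as a schema check at the orchestrator. A minimal sketch, assuming the researcher is required to return JSON shaped like `{"sources": [{"url", "title", "one_line"}]}` (the field names are hypothetical, chosen to mirror the example prompts later in this post):

```python
# Validate a researcher hand-off before passing it downstream.
# Raising here makes a malformed hand-off fail loudly at the source
# instead of letting the writer improvise from bad input.
import json

REQUIRED_SOURCE_KEYS = {"url", "title", "one_line"}

def validate_researcher_handoff(raw: str) -> dict:
    obj = json.loads(raw)  # raises if the worker returned prose, not JSON
    sources = obj.get("sources")
    if not isinstance(sources, list) or not sources:
        raise ValueError("hand-off missing non-empty 'sources' list")
    for s in sources:
        missing = REQUIRED_SOURCE_KEYS - s.keys()
        if missing:
            raise ValueError(f"source missing keys: {missing}")
    return obj

# A well-formed hand-off passes through unchanged:
good = '{"sources": [{"url": "https://example.com", "title": "T", "one_line": "why"}]}'
validated = validate_researcher_handoff(good)
```

The same check serves the worker side: the researcher's prompt describes this shape as required output, so prompt and validator describe one contract.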

Shared vs Isolated Context

The central design choice. Either every worker sees every other worker's output plus the full orchestrator context, or each worker sees only its inputs and the task description.

Shared context is easy — paste the full conversation into each call. It keeps workers coordinated but is expensive in tokens, dilutes focus, and reintroduces the role-confusion problem multi-agent was supposed to solve. A writer that sees the researcher's scratchpad starts doing research.

Isolated context is cheaper and more focused. Each worker sees only its inputs; coordination happens through explicit hand-offs. The cost is that hand-off design becomes load-bearing — drop a piece of context the next worker needed and the run fails in hard-to-trace ways.

The pragmatic middle is isolated context with structured hand-offs: workers do not share state, but the orchestrator maintains a task record they can reference for the goal, constraints, and previous hand-offs.

| Approach | Cost | Focus | Failure mode |
| --- | --- | --- | --- |
| Fully shared | High tokens per worker | Low | Role confusion returns |
| Fully isolated | Low tokens per worker | High | Hand-off drops critical state |
| Isolated + task record | Moderate | High | Task record grows unbounded without pruning |

Start isolated, add a task record, resist pushing more into shared context without a specific reason.
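The isolated-plus-task-record shape can be sketched in a few lines — field names here are hypothetical, and the point is only that each worker's context is assembled from the small shared record plus the named hand-offs it needs, never the full conversation:

```python
# Isolated context with a shared task record. Workers never see each
# other's scratchpads; each call gets (a) the task record header and
# (b) only the prior hand-offs that worker needs.
task_record = {
    "goal": "research and draft a short brief on X",
    "audience": "engineering leads",
    "word_budget": 800,
    "handoffs": {},  # filled in as workers report back
}

def worker_context(role: str, needs: list) -> dict:
    return {
        "role": role,
        "goal": task_record["goal"],
        "audience": task_record["audience"],
        "word_budget": task_record["word_budget"],
        # Only the named hand-offs, not every worker's output:
        "inputs": {k: task_record["handoffs"][k] for k in needs},
    }

task_record["handoffs"]["researcher"] = {"sources": ["..."]}
writer_ctx = worker_context("writer", needs=["researcher"])
# The writer sees the researcher's output, not its reasoning trace.
```

Pruning the `handoffs` dict between phases is what keeps the task record from growing unbounded.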

Failure Modes

Four patterns account for most multi-agent failures.

Cascade failures. An early worker returns slightly wrong output; later workers treat it as ground truth and compound the error. The researcher cites a paper that does not exist; the writer quotes it; the critic does not catch it. Fix: the orchestrator validates hand-offs against a schema and the task constraints before passing downstream.

Redundant work. Two workers do the same sub-task because decomposition overlaps or a worker re-does what the previous worker "should have" done. Fix: tighter scoping in the orchestrator prompt and explicit "do not do X" clauses where overlap is a known risk.

Incoherence. Each worker does its job correctly; the assembled output is inconsistent because no one has the whole picture. The introduction promises five points, the body has four, the conclusion references a sixth. Fix: a final assembly step that reads the output end-to-end.

Coordination overhead. The system spends more tokens on hand-offs than on work. Fix: collapse several small workers into one worker with a larger scope. Over-decomposition is as bad as under-decomposition.

A Multi-Agent Prompt Setup (Hypothetical)

Illustrative orchestrator and worker prompts for a "research and draft a short brief" task. Hypothetical — meant to show the form, not a real run.

```
ORCHESTRATOR PROMPT

You are an orchestrator. You do not research or write. You decompose
the task, delegate to workers, and assemble the final output.

Goal: {{user_goal}}

Workers available:
- researcher: returns JSON {sources: [{url, title, one_line}]}
- writer:     takes {sources, outline} and returns markdown draft
- critic:     takes {draft} and returns JSON {blockers, suggestions}

Steps:
1. Produce an outline (3-5 section titles).
2. Call researcher with the goal and outline; expect 5-10 sources.
3. Call writer with sources and outline.
4. Call critic with the draft.
5. If critic returns blockers, call writer again with blockers.
   Otherwise, return the draft.

Task record (shared with workers):
- Original goal: {{user_goal}}
- Audience:      {{audience}}
- Word budget:   {{word_budget}}
```

```
RESEARCHER PROMPT

You are a research agent. Your only job is to find sources. Do not
write prose, do not draft, do not summarize beyond one line per source.

Inputs: goal, outline, task record.

For each source return:
{ "url": "...", "title": "...", "one_line": "why this matters" }

Rules:
- Prefer primary sources over aggregators.
- Do not invent URLs. Return fewer sources rather than fabricating.
- Return JSON only, no prose.
```

```
CRITIC PROMPT

You are a critic. You do not rewrite. You flag issues.

Inputs: draft, task record.

Return JSON:
{
  "blockers":    [{"section": "...", "issue": "...", "why": "..."}],
  "suggestions": [{"section": "...", "suggestion": "..."}]
}

A blocker is anything that would embarrass us to publish. Everything
else is a suggestion. Return JSON only.
```

Notice the shape. The orchestrator never discusses how to research or write — it only decomposes and routes. Each worker is narrow and refuses other workers' jobs. Hand-offs are explicit JSON so the orchestrator can validate before passing downstream.
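The routing those prompts describe is a plain loop. A sketch, with `call_agent` standing in for whatever model API you use — stubbed here so the control flow is runnable, not a real SDK call:

```python
# Hypothetical orchestrator loop for the prompts above: decompose,
# delegate, review, revise on blockers, return.
def call_agent(role, payload):
    # Stand-in for a real model call; canned outputs per role.
    stubs = {
        "researcher": {"sources": [{"url": "https://example.com",
                                    "title": "T", "one_line": "why"}]},
        "writer": "## Draft\n...",
        "critic": {"blockers": [], "suggestions": []},
    }
    return stubs[role]

def run(goal, max_revisions=2):
    outline = ["Intro", "Body", "Close"]          # step 1: decompose
    sources = call_agent("researcher", {"goal": goal, "outline": outline})
    draft = call_agent("writer", {"sources": sources, "outline": outline})
    for _ in range(max_revisions):                # steps 4-5: review loop
        critique = call_agent("critic", {"draft": draft})
        if not critique["blockers"]:
            return draft                          # no blockers: ship it
        draft = call_agent("writer", {"sources": sources,
                                      "outline": outline,
                                      "blockers": critique["blockers"]})
    return draft  # revision budget exhausted; return best effort

final = run("brief on X")
```

Note the bounded revision loop: without `max_revisions`, a critic that always finds a blocker would spin forever.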

When Multi-Agent Is Worth the Complexity

Three signals.

Distinct phases. Phases that genuinely want different prompts — research then draft then review, or spec then implement then test. If phases are real (different inputs, outputs, success criteria), specialist prompts beat a generalist prompt.

Long runs. Tasks that run for hours or produce a large artifact benefit from checkpoints between agents. Each hand-off is a natural review point; a single-agent run has no clean checkpoints.

Review checkpoints. Anything where a critic or human should sign off at specific points — high-stakes writing, production code, regulated work — fits because the review is itself an agent with its own prompt, triggered at specific hand-offs.

When It Isn't

Short tasks and single-concern tasks. A two-paragraph support reply is overhead without payoff. A small bug fix is simpler as a single ReAct loop. Multi-agent also loses when sub-tasks are not independent — if worker two needs worker one's full reasoning trace (not just its output), you are fighting the pattern.

Multi-agent frameworks exist for this kind of work — LangGraph, CrewAI, AutoGen, and platform-specific agent SDKs each offer orchestration primitives. Which framework (or none) fits depends on language, hosting, and how much control you want over the coordination layer; the prompting principles apply regardless.

Common Anti-Patterns

  • Over-decomposition. Ten workers for a task that wants two. Coordination overhead dominates and output loses coherence. Fix: start with the smallest number of workers that captures the distinct roles.
  • Generic worker prompts. Every worker starts with the same preamble; specialization is only in the name. Fix: worker prompts should differ materially — different instructions, output formats, refusals.
  • Shared context everywhere. Pasting the full conversation into every worker call. Cheap to implement, expensive in tokens, defeats specialization. Fix: isolated context plus a small task record.
  • No hand-off schema. Workers return prose; the orchestrator parses prose; parsing fails on edge cases. Fix: structured hand-offs (JSON, typed objects) validated at the orchestrator.
  • Workers re-doing earlier work. The writer "double-checks" the researcher's sources and goes down a research rabbit hole. Fix: explicit "you do not do X" clauses where overlap is a risk.
  • No final assembly step. The orchestrator concatenates pieces without reading the whole. Incoherence sneaks in. Fix: an assembly step that reads the output end-to-end.

FAQ

How is multi-agent different from plan-and-execute?

Plan-and-execute is a two-phase pattern inside one agent loop: plan, then execute. Multi-agent distributes those phases across specialist agents with separate prompts. The patterns compose — an orchestrator often uses plan-and-execute as its planning method, and workers often use ReAct internally. Multi-agent is a coordination pattern; plan-and-execute is a reasoning pattern.

Should every worker use the same model?

Not necessarily. A common setup is a stronger model for the orchestrator and critic and cheaper models for mechanical workers. Mixing models is one of the main cost wins of multi-agent. The cost is prompt-compatibility testing — a prompt that works on one model may need tweaking for another.
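The routing itself is cheap to express — model names below are placeholders, not recommendations:

```python
# Sketch of per-role model routing: stronger models where judgment
# matters, cheaper models for mechanical work.
MODEL_BY_ROLE = {
    "orchestrator": "strong-model",   # planning and assembly
    "critic":       "strong-model",   # judgment-heavy
    "researcher":   "cheap-model",    # mechanical retrieval
    "writer":       "mid-model",
}

def model_for(role: str) -> str:
    # Default unknown roles to the strong model: failing expensive
    # beats failing wrong.
    return MODEL_BY_ROLE.get(role, "strong-model")
```

The prompt-compatibility cost shows up exactly here: changing one entry in this map means re-testing that role's prompt on the new model.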

How do I debug a multi-agent run?

Log every hand-off. Hand-off objects are your debug trace — you can replay any worker call from its inputs, diff outputs across runs, and see exactly where a cascade started. Schema-validate hand-offs at the orchestrator so malformed outputs fail loudly at the source.
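A minimal sketch of that debug trace, assuming hand-offs are appended as JSON lines (the field names and the in-memory log are illustrative; production would write to a real append-only file):

```python
# Log every hand-off as one JSON line; replay any worker call by
# looking up the last hand-off addressed to it.
import json, time, io

def log_handoff(log, run_id, src, dst, payload):
    log.write(json.dumps({
        "run_id": run_id,
        "ts": time.time(),
        "from": src,
        "to": dst,
        "payload": payload,   # the hand-off object is the state
    }) + "\n")

log = io.StringIO()  # stand-in for an append-only log file
log_handoff(log, "run-1", "researcher", "writer", {"sources": ["..."]})
log_handoff(log, "run-1", "writer", "critic", {"draft": "..."})

# Replay: the last hand-off into a worker is exactly its input.
events = [json.loads(line) for line in log.getvalue().splitlines()]
critic_input = [e for e in events if e["to"] == "critic"][-1]["payload"]
```

Diffing two runs' logs line by line is usually the fastest way to find where a cascade started.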

When should a worker spawn its own sub-agents?

When its sub-task is itself decomposable and large enough to justify the hand-off cost. "Research the competitive landscape" is a reasonable candidate. "Write the introduction" is not — nesting there adds coordination cost without decomposition value.

Wrap-Up

Multi-agent prompting is what you reach for when one prompt stops being enough — roles conflict, context grows faster than signal, parallelism would pay off. Orchestrator-worker is the default topology, specialist prompts are where the value lives, hand-off design is where systems succeed or fail. Keep workers isolated by default, use a task record for shared state, validate hand-offs at the orchestrator, and resist decomposing every task into ten agents when two will do. For how this pattern composes with others see the complete guide to prompting AI coding agents; for adjacent patterns see plan-and-execute prompting, tool-use prompting patterns, and ReAct prompting.

Try it yourself

Build expert-level prompts from plain English with SurePrompts — 350+ templates with real-time preview.

Open Prompt Builder
