Plan-and-execute is a two-phase agent pattern: the model decomposes the goal into an explicit plan first, a human or checker reviews the plan, and then each step runs in turn. It trades the reactive flexibility of a ReAct loop for predictability, cost control, and a natural review gate. Teams reach for it when the task is decomposable up front and the cost of a wrong turn is higher than the cost of a little planning overhead.
What Plan-and-Execute Is
There are two phases and they do not overlap.
- Plan. The model reads the goal and produces a structured list of steps. Each step names what will be done, often with inputs, expected outputs, and a verification check. The plan is a single artifact the user or a checker can read end-to-end before anything else runs.
- Execute. Each step from the plan is dispatched — often as its own sub-prompt to the same model or a cheaper one — and the results are collected. If a step fails or produces a surprising result, the system either falls back to a small re-plan or halts for human review.
The key difference from ReAct is where the reasoning lives. ReAct reasons before every action, in the same loop as the action. Plan-and-execute reasons once up front, commits to a plan, and then the execution phase is more mechanical. That shape buys you three things: an artifact you can review (the plan), independent steps you can execute in parallel, and a clearer failure surface when something does go wrong. See the agentic AI glossary entry for the broader category this pattern lives in.
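The two-phase shape can be sketched as data plus a loop. A minimal hypothetical skeleton in Python; the `Step` and `Plan` shapes and the `execute_step`/`check_gate` callbacks are illustrative, not any real framework's API:

```python
from dataclasses import dataclass, field

# Hypothetical data shapes for the two phases; real systems vary.
@dataclass
class Step:
    number: int
    title: str
    action: str
    verification: str                 # concrete check that proves success
    depends_on: list[int] = field(default_factory=list)

@dataclass
class Plan:
    goal: str
    steps: list[Step]
    replan_policy: str                # what to do when a gate fails

def run(plan: Plan, execute_step, check_gate, approved: bool) -> list[str]:
    """Execute an approved plan step by step; halt on a failed gate."""
    if not approved:
        raise RuntimeError("plan not approved: the review gate is the point")
    results: list[str] = []
    for step in plan.steps:
        output = execute_step(step, results)   # e.g. a sub-prompt call
        if not check_gate(step, output):       # programmatic check, not "did it work?"
            raise RuntimeError(f"step {step.number} failed its gate: {step.verification}")
        results.append(output)
    return results
```

The review gate lives outside the loop on purpose: nothing executes until `approved` is true.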
Why Decompose First
Four reasons, in rough order of why teams adopt it.
Reviewability. If the agent is about to refactor twelve files, you want to see the list before it touches anything. A plan is easy to skim, easy to redline, and easy to veto step-by-step. A ReAct trace is the opposite — long, scattered across turns, and by the time you read it the work is done.
Cost control. Reasoning tokens are the expensive ones, and ReAct pays them at every step. Plan-and-execute pays them once in the planning call; execution steps can use smaller models or tighter prompts because the thinking is already done.
Parallel execution. If steps three, four, and five are independent — write tests, update docs, update the changelog — plan-and-execute can fan them out to parallel workers. ReAct cannot, because step four depends on what step three observed.
Failure isolation. When a plan-and-execute run fails, you know exactly which step failed and what it was supposed to do. The plan gives you a map; the failure is a pin on the map.
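The parallel-execution and failure-isolation points both fall out of the dependency labels. A small sketch, hypothetical and assuming steps are keyed by number with a set of prerequisite step numbers, that groups a plan into waves of independently runnable steps:

```python
def parallel_batches(deps: dict[int, set[int]]) -> list[set[int]]:
    """Group steps into waves: each wave's steps depend only on earlier
    waves, so a whole wave can be fanned out to parallel workers."""
    remaining = dict(deps)
    done: set[int] = set()
    batches: list[set[int]] = []
    while remaining:
        ready = {s for s, d in remaining.items() if d <= done}  # all deps satisfied
        if not ready:
            raise ValueError("circular dependency in plan")
        batches.append(ready)
        done |= ready
        for s in ready:
            del remaining[s]
    return batches

# Steps 3, 4, 5 (tests, docs, changelog) share a prerequisite but not each other:
# parallel_batches({1: set(), 2: {1}, 3: {2}, 4: {2}, 5: {2}})
# → [{1}, {2}, {3, 4, 5}]
```

The same structure gives failure isolation for free: a failed gate names its wave and step, and only downstream waves are affected.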
When Plan-and-Execute Wins
The pattern is strongest when the task is decomposable up front — you can enumerate the steps without running any of them. "Generate a project scaffold," "produce a weekly report from these data sources," "migrate this API from REST to GraphQL" are all decomposable: the steps are mostly known, dependencies are predictable, and the environment is not going to surprise you much.
It is also strong for high-stakes work where a human review gate is non-negotiable. Anything that touches production, money, or user data benefits from a plan the operator signs off on before execution. The plan becomes an explicit contract: "this is what I am about to do."
And it is strong under token budgets. A ReAct loop's cost grows with steps and observation length because every past turn stays in context. A plan-and-execute run's cost is mostly linear in the number of steps, and each execution step runs on a smaller window because it only sees the plan and its own inputs.
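A back-of-envelope comparison makes the budget argument concrete. The token counts below are invented for illustration; the shapes are the point, with roughly quadratic total context for ReAct and linear for plan-and-execute:

```python
def react_tokens(steps: int, turn_tokens: int) -> int:
    """Every ReAct turn re-reads all prior turns, so context grows each step."""
    return sum(k * turn_tokens for k in range(1, steps + 1))

def plan_execute_tokens(steps: int, plan_tokens: int, step_tokens: int) -> int:
    """One planning call, then fixed-size sub-prompts per step."""
    return plan_tokens + steps * step_tokens

# Illustrative numbers only: 10 steps, 500-token turns, 2000-token plan call.
# react_tokens(10, 500)              → 27,500 tokens
# plan_execute_tokens(10, 2000, 500) →  7,000 tokens
```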
When ReAct Wins
Plan-and-execute hurts when the task is not decomposable because you do not know the environment yet. Debugging an unfamiliar codebase, exploring a dataset, answering a multi-hop question against a search API — the right next step depends on what the previous step returned. A plan written before any observations is a plan written blind, and blind plans tend to be wrong in confident, specific ways: file paths that do not exist, APIs that do not behave as assumed, data shapes that do not match reality. It also hurts when observations change the goal — if step two returns "X is a dead end, try Y," a plan-and-execute run has to throw out the plan and re-plan; a ReAct run would have pivoted in the next thought.
Rough decision table:
| Task shape | Best pattern | Why |
|---|---|---|
| Decomposable workflow, predictable environment | Plan-and-execute | Plan once, execute cheaply, review up front |
| Noisy environment, observation-dependent steps | ReAct | Reasoning needs to see each result |
| Role-separated work (plan vs review vs code) | Multi-agent | Different roles, different prompts, explicit handoffs |
| Spec exists, implementation is the unknown | Spec-driven coding | Specification becomes the plan, agent executes against it |
| High-stakes irreversible action | Plan-and-execute with human gate | Review the plan before anything touches production |
| Pure reasoning, no tools | Chain-of-thought | No actions to plan, no observations to react to |
In practice these patterns stack. A multi-agent system often has a planner that produces a plan-and-execute plan which then dispatches ReAct-style workers for steps that need exploration. The question is less "either/or" and more "which layer."
Anatomy of a Plan Prompt
A plan is only as good as the prompt that produced it. Four things make a plan worth executing.
Constraints. Name the non-negotiables the plan must respect — budget, tools available, output format, files that are off-limits, deadlines. Constraints move decisions into the plan rather than into execution. "Use only the api/v2 namespace, no shell commands, ten steps maximum" is much easier to review than a plan that has to be re-checked at every step.
Ordering. Ask for dependencies. Step five depends on step two's output; step seven is independent and can run in parallel. An unlabelled list forces the executor to guess; a labelled one lets it fan out safely.
Verification gates. Each step needs a check — a test that passes, a file that exists, a status code. Without gates, "execute" means "hope the step worked." With gates, a failed gate halts the run at exactly the step that failed.
A re-plan clause. Say what happens when a step fails — re-plan from the failed step, halt and ask, or roll back. Without that clause, agents improvise, and improvisation after a committed plan is where plan-and-execute runs get brittle.
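All four requirements are mechanically checkable before execution starts, which is what makes an automated reviewer possible. A hypothetical validator sketch, assuming steps arrive as dicts with `number`, `verification`, and `depends_on` fields:

```python
def validate_plan(steps: list[dict], replan_policy: str, max_steps: int = 10) -> list[str]:
    """Reject plans missing the properties the planner prompt asked for."""
    problems = []
    if len(steps) > max_steps:
        problems.append(f"too many steps ({len(steps)} > {max_steps})")
    if not replan_policy:
        problems.append("no re-plan clause")
    numbers = {s["number"] for s in steps}
    for s in steps:
        if not s.get("verification"):
            problems.append(f"step {s['number']} has no verification gate")
        for d in s.get("depends_on", []):
            # Dependencies must point at real, earlier steps.
            if d not in numbers or d >= s["number"]:
                problems.append(f"step {s['number']} has a bad dependency on {d}")
    return problems
```

An empty result means the plan passes the structural gate; anything else goes back to the planner before a single step runs.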
Plan Revision and Execution
The point of a plan you can read is that you can change it. A good system presents the plan as a first-class artifact and waits for a sign-off before execution starts. The reviewer can be a human or another agent running a checklist — the mechanism matters less than the gate existing. Human edits are usually dropping out-of-scope steps, reordering to respect a constraint the model missed, tightening a vague verification gate, or splitting a fuzzy step into two concrete ones. When another agent reviews, it is checking for missing prerequisites, circular dependencies, and constraint violations — the setup used in multi-agent prompting workflows where a planner and a critic share the planning phase.
Once approved, each step typically becomes its own sub-prompt. That sub-prompt is smaller than a ReAct prompt for the same work — it only needs the goal, the step description, inputs from earlier steps, and the verification gate. The step description should be verbatim from the approved plan so the agent cannot quietly redefine the task, and the gate should be checked programmatically where possible, not by asking the model "did it work." The gap between "the model said it worked" and "the check confirmed it worked" is where these runs fail silently.
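A sketch of that sub-prompt construction, with hypothetical field names; note what is deliberately left out of the executor's context:

```python
def step_prompt(goal: str, step: dict, prior_outputs: dict[int, str]) -> str:
    """Build the minimal executor sub-prompt: goal, verbatim step text,
    inputs from earlier steps, and the verification gate. The full
    planning context is deliberately excluded so the executor cannot
    second-guess or redefine the step."""
    inputs = "\n".join(f"- output of step {n}: {out}" for n, out in prior_outputs.items())
    return (
        f"Goal: {goal}\n"
        f"Your step (verbatim from the approved plan): {step['action']}\n"
        f"Inputs:\n{inputs or '- none'}\n"
        f"Your output must satisfy this check: {step['verification']}\n"
        "Do only this step. Do not revise the plan."
    )
```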
A Plan-and-Execute Prompt Example (Hypothetical)
Illustrative planner prompt and the plan it would produce. Hypothetical — meant to show the form, not a real run.
```text
You are a planning agent. Produce a plan, do not execute it.

Goal: Add pagination to the /api/v2/orders endpoint.

Constraints:
- Do not modify files outside app/api/v2/orders/ or tests/api/v2/orders/.
- Use cursor-based pagination (opaque cursor, not offset).
- Each step must declare its verification gate.
- Each step must declare whether it depends on previous steps.
- No step may run a shell command other than the test runner.
- Maximum 8 steps.

Output the plan as numbered steps in this format:

[N] <step title>
Depends on: <step numbers or "none">
Action: <what will be done, one or two sentences>
Verification: <concrete check that proves the step succeeded>

After the final step, write:
RE-PLAN: <what triggers a re-plan and who decides>
```
A plan in response might look like this (hypothetical):
```text
[1] Read the current endpoint
Depends on: none
Action: Read app/api/v2/orders/route.ts and the matching test file to
understand the existing query, response shape, and test setup.
Verification: Summary of current shape echoed back in the plan output.

[2] Design the cursor
Depends on: 1
Action: Propose cursor format (base64-encoded {id, createdAt}) and page
size defaults, document in a comment in route.ts.
Verification: Comment merged; no functional change yet.

[3] Add failing test for cursor pagination
Depends on: 1
Action: Write a test that asks for page 1, expects a next_cursor, then
asks for page 2 with that cursor and expects distinct rows.
Verification: Test file committed; running it fails with a clear message.

[4] Implement cursor parsing
Depends on: 2, 3
Action: Parse incoming cursor, fall back to first page when absent.
Verification: Unit test for parser passes.

[5] Implement cursor-based query
Depends on: 4
Action: Replace offset query with (createdAt, id) seek predicate.
Verification: Test from step 3 passes.

[6] Update response shape
Depends on: 5
Action: Emit next_cursor in response when more rows exist.
Verification: Test from step 3 passes; existing tests still pass.

[7] Update endpoint docs
Depends on: 6
Action: Update the endpoint README with cursor usage and example.
Verification: README diff reviewed; no code change.

RE-PLAN: If step 5 reveals the existing query cannot express the seek
predicate without a new index, halt and ask. Do not add an index
without approval.
```
Notice the shape. Every step has a verification gate. Independent steps (2 and 3, which share only step 1 as a prerequisite) are marked so the executor can run them in parallel. The re-plan clause names a specific failure mode and forces a human gate rather than quiet improvisation. That is what makes the plan executable rather than aspirational.
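A plan in this format is also trivially machine-parseable, which is what lets an executor dispatch it step by step. A hypothetical parser sketch that assumes single-line `Action:` and `Verification:` fields (the wrapped lines in the example above would need slightly more care):

```python
import re

def parse_plan(text: str) -> list[dict]:
    """Parse the [N]-step plan format into structured steps an executor
    can dispatch. A sketch; real plans need more defensive parsing."""
    steps = []
    blocks = re.split(r"(?m)^\[(\d+)\]", text)
    # re.split keeps the captured numbers: [prefix, "1", body1, "2", body2, ...]
    for number, body in zip(blocks[1::2], blocks[2::2]):
        fields = dict(re.findall(r"(?m)^(Depends on|Action|Verification):\s*(.+)$", body))
        deps = fields.get("Depends on", "none")
        steps.append({
            "number": int(number),
            "title": body.splitlines()[0].strip(),
            "depends_on": [] if deps.strip() == "none"
                          else [int(d) for d in re.findall(r"\d+", deps)],
            "action": fields.get("Action", ""),
            "verification": fields.get("Verification", ""),
        })
    return steps
```

The structured output requirement in the planner prompt is what makes this possible; a prose plan has nothing for the parser to grab.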
Common Anti-Patterns
- Planning without observations when observations are required. Planning a debug session or an exploratory analysis blind produces confident wrong plans. Fix: do a short ReAct pre-flight to collect the observations, then plan against them.
- No verification gates. "Execute step five" with no check means "assume step five worked." Errors compound silently across the rest of the plan. Fix: require a concrete, machine-checkable gate per step.
- Plans that are actually prose. A seven-paragraph narrative is not a plan — it is a draft. Executors cannot dispatch paragraphs. Fix: enforce structured output (numbered steps, explicit dependencies, explicit gates) in the planner prompt.
- No re-plan clause. When a step fails and the system does not know what to do, agents improvise — and improvised recovery after a committed plan is where these systems get brittle. Fix: specify the failure policy (halt, re-plan from step N, roll back) in the planner prompt.
- Planner and executor sharing state they should not. If the executor has access to the full planning context, it starts second-guessing the plan mid-step. Fix: give the executor only the approved plan and the step's inputs.
- Using plan-and-execute for tasks the model cannot decompose. If the planner produces vague steps like "figure out the right approach" or "investigate the issue," the plan is not a plan. Fix: fall back to ReAct for that task; planning adds no value.
FAQ
How is plan-and-execute different from ReAct?
ReAct interleaves reasoning and acting — think, act, observe, think again. Plan-and-execute separates the two phases: reason once to produce a plan, then execute mechanically. ReAct is flexible and handles surprise well; plan-and-execute is predictable, reviewable, and cheaper when the steps are known up front. Most real systems use both — plan-and-execute at the outer loop, ReAct inside any step that needs exploration.
Can I use plan-and-execute for coding agents?
Yes, and many coding workflows already do implicitly. Spec-driven AI coding is plan-and-execute with the spec acting as the plan. The pattern works best for well-scoped tasks — feature implementation from a spec, refactors with a clear target, migrations with defined steps. For exploratory debugging, ReAct usually wins.
Should the planner and executor be the same model?
Not necessarily, and decoupling them can help. Planning benefits from a stronger model with more reasoning capacity; execution often does not. Splitting them this way is one of the main cost wins of the pattern and is a natural pathway into multi-agent prompting where planner and executor are separate roles.
What should I do when a step fails mid-execution?
Three reasonable options, chosen in the planner prompt rather than ad-hoc. Halt and surface the failure — safest, best for high-stakes runs. Re-plan from the failed step using the observations so far — good when the failure is informative. Roll back and notify — right when the work is transactional. Letting the executor improvise without a policy is how plan-and-execute runs end up in strange, hard-to-audit states.
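The three policies can be encoded explicitly so the executor has nothing to improvise. A hypothetical sketch; the policy values and return shapes are illustrative:

```python
from enum import Enum

class FailurePolicy(Enum):
    HALT = "halt"            # surface the failure, wait for a human
    REPLAN = "replan"        # re-plan from the failed step with observations so far
    ROLLBACK = "rollback"    # undo the transactional work, notify

def on_gate_failure(policy: FailurePolicy, step_number: int, observation: str):
    """Apply the policy declared in the planner prompt; never improvise."""
    if policy is FailurePolicy.HALT:
        return ("halted", step_number)
    if policy is FailurePolicy.REPLAN:
        # The planner gets the failed step plus `observation` as new context.
        return ("replan_from", step_number)
    return ("rolled_back", step_number)
```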
Wrap-Up
Plan-and-execute is the pattern that makes agent runs reviewable. You pay a small upfront cost — one planning call — and you get back an artifact you can read, edit, parallelize, and use as a contract for what the agent is about to do. Noisy environments still want ReAct. But for decomposable workflows and anything you want a human to sign off on, plan-and-execute is the pattern that scales. For how it fits inside larger systems see the complete guide to prompting AI coding agents; for adjacent patterns see multi-agent prompting and spec-driven AI coding.