When an AI coding agent goes wrong, the instinct is to restart — kill the run, reset the branch, try again. Sometimes that is right. Often it is not. What the agent accumulated — what it tried, what it saw, what failed — is worth more than a fresh context window. Agent debugging prompts extract that information and steer the agent out of the hole. Four categories cover most cases: inspection, rollback, re-scoping, and root-cause.
## Why Agents Get Stuck or Wrong
An agent is not a deterministic program; it is a loop that picks actions based on what it sees. Four common failure shapes:
- Looping. Same sequence repeated — run tests, read output, edit the same file, run tests — with no progress. Usually a feedback signal the agent keeps misreading.
- Missing context. A wrong mental model of the codebase. The agent writes a plausible patch against code that does not exist, or calls a function whose signature has changed.
- Wrong scope. The task was ambiguous; the agent chose an interpretation you did not want. Competent work on the wrong problem.
- Tool failures. Shell, editor, search, or test output that the agent misinterpreted or never saw. Hallucinated progress is the tell.
Each failure wants a different prompt. Treating them all the same ("try again with more detail") works some of the time and wastes tokens the rest. The pillar, The Complete Guide to Prompting AI Coding Agents, covers the broader setup; this post is about what to do once things are off.
## Inspection Prompts — "What Did You Just Do?"
The first move when an agent seems off is inspection. You do not know what the agent believes, and guessing is worse than asking. Inspection prompts surface the agent's current state before you decide what to do next.
Three prompts do most of this work:
- "Tell me what you did in the last N steps and what you saw." Forces the agent to reconstruct the recent history in plain text. If its summary disagrees with reality — files that were not edited, commands that did not run — you have found the problem.
- "What is the current state of the working tree? What files have changed?" Grounds the conversation in something verifiable. `git status` and `git diff` are facts; the agent's impression is not.
- "What are you seeing in the test output?" Surfaces what the agent actually read. Truncated logs, ANSI codes, or output the agent skimmed past are common.
Inspection is cheap. Worst case, you confirm the agent is on track and lose a minute. Best case, you catch a hallucinated edit before spending ten more minutes on its consequences. It is also the step most users skip — the pressure when a run goes sideways is to do something. Inspection feels slow, but it is faster than any alternative built on a wrong premise.
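The "facts, not impressions" half of inspection can be checked without the agent at all. A minimal sketch — the throwaway repo and the file name `file.ts` are illustrative; the git commands are standard:

```shell
# Set up a throwaway repo so the sketch is self-contained.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.email=a@b -c user.name=demo commit -q --allow-empty -m "baseline"

# Simulate an edit the agent may or may not have reported.
echo "edited" > file.ts

# Ground truth, straight from git, independent of the agent's summary:
git status --porcelain   # machine-readable list of modified/untracked files
git diff --stat HEAD     # summary of tracked changes
```

If the agent's "what I did" summary disagrees with this output, the summary loses.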
## Rollback Prompts — "Undo to Last Good State"
Once the run is not salvageable as-is, rollback is the next step. The goal is to get back to a known-good state cleanly.
Rollback prompts come in three granularities:
| Scope | Prompt shape | When to use |
|---|---|---|
| Single step | "Revert the last edit to `file.ts` and stop." | One bad change you can point at |
| Recent changes | "Discard changes to the last 3 files you touched." | Agent drifted into files it should not have |
| Full reset | "Discard all uncommitted changes and return to commit `abc123`." | Unsure what is contaminated; want a clean slate |
A few rules that make rollback safer:
- Name the end state, not just "undo." The agent should know exactly where it is supposed to land.
- Commit before risky runs. A pre-run commit turns any rollback into `git reset` or `git checkout`. Without one, rollback depends on the agent's memory of what it changed — unreliable past a few steps.
- Rollback before re-prompting. Fixing a mess in-place usually produces a worse mess. Clean state first, then re-prompt.
Rollback is also the right move when the agent is looping. Loops persist because the agent keeps seeing the same inputs; clearing the workspace breaks the loop at the input layer.
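The "commit before risky runs" rule is what makes the full-reset row of the table a two-command operation. A sketch in a throwaway repo — file names are hypothetical:

```shell
# Self-contained demo repo.
repo=$(mktemp -d); cd "$repo"
git init -q .
echo "original" > app.ts
git add -A
git -c user.email=a@b -c user.name=demo commit -q -m "pre-run checkpoint"
checkpoint=$(git rev-parse HEAD)   # remember the known-good state

# ...agent runs and makes a mess...
echo "bad edit" > app.ts
echo "stray"    > scratch.ts

# Full reset: back to the checkpoint, no agent cooperation needed.
git reset --hard -q "$checkpoint"
git clean -fdq                     # remove untracked files the agent created
```

After the reset, `app.ts` is back to its checkpointed contents and `scratch.ts` is gone — the rollback named an exact end state instead of asking the agent to "undo."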
## Re-Scoping Prompts — "Narrow the Task"
Many bad runs are scope problems. The agent was asked to do too much, or something that branched into unexpected territory. Re-scoping prompts cut the task until it fits.
Common shapes:
- "Stop trying to do X. Focus only on Y." Explicit abandonment of a sub-goal. Useful when the agent keeps trying to finish something that is not working.
- "Narrow this to just the change in `file.ts`. Leave everything else for a separate task." Shrinks scope to one file. Often the fastest way out of a spiraling multi-file change.
- "Split this into two steps: first pass the test for case A; handle case B after." Converts a hard task into two easier ones.
- "Only make changes that `pnpm test` validates. Skip any that do not have a test." Uses the test runner as a scope boundary.
Re-scoping is the most underused of the four patterns. The instinct is to add more context to the original task. Often what it needs is less — a smaller ask, a clearer boundary, an explicit "do not do X." For longer work, spec-driven AI coding turns the scope boundary into a document before the run starts, preventing most of what you would otherwise re-scope mid-run.
## Root-Cause Prompts — "Why Did That Fail?"
The first three categories navigate out of a bad state. Root-cause prompts establish why the state went bad, so the next run does not repeat it.
The shape is diagnostic:
- "Why did step N fail? Walk me through the cause, not the symptom." Separates the observable failure (test red) from the underlying cause (wrong assumption about the data shape).
- "What assumption were you making when you wrote that code? Where is it wrong?" Surfaces the agent's mental model. Assumptions hide in generated code; this prompt pulls them out.
- "If you tried this again, what would you do differently?" Captures the lesson in the agent's framing — paste it into the next prompt so the agent enters with the lesson absorbed. Similar logic to reflexion prompting.
Root-cause prompts are valuable even after a successful run: a run can succeed on a shaky foundation that fails in a new way next time. A caution: the agent's root-cause explanation is a hypothesis, not a proof — usually plausible, often wrong in specifics. Treat it like a junior engineer's post-mortem: a starting point, not the final word.
## An Example Debugging Session (Hypothetical)
A short, illustrative exchange — paths, commands, and responses are hypothetical.
```text
YOU    The migration task has been running for ten minutes and the tests
       are still red. Something is off. What have you done in the last
       five steps, and what is the current state of the working tree?

AGENT  Last five steps:
       1. Edited prisma/schema.prisma to add the `user_preferences` table.
       2. Ran `pnpm prisma migrate dev` — failed with "shadow DB not found".
       3. Edited `.env` to add `SHADOW_DATABASE_URL`.
       4. Re-ran the migration — same error.
       5. Edited `prisma/schema.prisma` again to rename the table.

       Current `git diff` shows changes in prisma/schema.prisma, .env,
       and tests/setup.ts.

YOU    You should not be editing .env or tests/setup.ts for this task.
       Discard changes to those two files. Keep schema.prisma.

AGENT  Done. `git status` now shows only prisma/schema.prisma modified.

YOU    The shadow DB issue is a local setup problem, not a migration
       problem. Stop trying to run `migrate dev`. Narrow the task to
       just: edit prisma/schema.prisma so the schema is correct for the
       new table. I will run the migration manually after.

AGENT  Understood. Focused only on schema.prisma. The current version
       adds `user_preferences` with columns... [continues]
```
Four things happened: inspection, rollback (discarding `.env` and `tests/setup.ts`), re-scoping ("stop trying to run `migrate dev`"), and an implicit root-cause observation (the shadow DB error is a local setup problem, not a migration problem). Each was short. None required restarting the run.
## When to Abandon and Restart
Sometimes the right move is to kill the run. Signals the prompt is not salvageable:
- Inspection output disagrees with reality. If `git status` shows five changed files and the agent insists it changed two, the context is corrupted. Start over.
- Loops persist after rollback. If the same loop reappears after a clean workspace, the prompt — not the state — is the problem.
- Re-scoping keeps failing. If three attempts to narrow the task fail, your original framing is off. Write a new spec and run again.
- Hallucinated tool output. Agent reports a test passing that is actually red, a file created that does not exist. Stop immediately; continuing compounds the drift.
- Context window exhaustion. Long runs lose earlier context. Past a point, re-prompting is just prompting a different agent with less memory.
Abandoning is not failure. A ten-minute wasted run followed by a two-minute clean run is cheaper than thirty minutes of salvage.
## Recognizing Agent Failure Modes
Some failure modes are hard to see without knowing what to look for:
- Hallucinated progress. Success reports without supporting evidence — "the test now passes" without a test run. Always ask for the artifact.
- Premature success. Agent stops when one criterion is met, ignoring others in the spec. Shows up when acceptance criteria are loose.
- Infinite loops. Run → read output → same edit → run. Usually the agent is misreading a feedback signal.
- Silent scope drift. Agent solves a bigger problem than you asked for, touching files you did not name. `git diff --name-only` catches this in one command.
- Confidence without evidence. Explanations that sound right but do not connect to the code — "this fixes it because X" where X is not in the file.
The cheapest defense is ending every run with "run the tests and show me the output" — forcing the agent to ground its claims in something verifiable.
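The scope-drift check can be extended into a small guard that compares the files the agent touched against the files the task named. A sketch in a throwaway repo — `file.ts` and `other.ts` are hypothetical:

```shell
# Self-contained demo repo with two tracked files.
repo=$(mktemp -d); cd "$repo"
git init -q .
printf 'a\n' > file.ts; printf 'b\n' > other.ts
git add -A
git -c user.email=a@b -c user.name=demo commit -q -m "baseline"

# Simulate an agent run that drifted into a file the task never named.
echo "change" >> file.ts
echo "drift"  >> other.ts

allowed="file.ts"                        # the only file the task named
touched=$(git diff --name-only HEAD)     # what the run actually touched

for f in $touched; do
  case " $allowed " in
    *" $f "*) ;;                         # named in the task: fine
    *) echo "scope drift: $f" ;;         # touched but never asked for
  esac
done
```

Here the loop flags `other.ts` and stays silent on `file.ts` — one command plus a comparison, run after every step if you like.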
## Common Anti-Patterns
- Retry without inspection. Re-running "try again" throws away the information the failed run produced. Fix: inspect first.
- No scope change on repeat failure. Same prompt, same failure. Fix: if two attempts fail, the task is too big — narrow it.
- Punishing the agent. "You are failing. Do better." Zero instruction content. Fix: say what to do, not how to feel about failing.
- Blanket "do better" prompts. "Be more careful" does not change the input. Fix: change scope, constraints, or the evidence the agent reads.
- Rollback without a target state. "Undo your changes" leaves the agent guessing. Fix: name the commit or file state explicitly.
- Debugging past the token budget. Long sessions blow through context. A fresh run with the lesson encoded is cheaper than round five.
## FAQ
### How do I know if an agent is looping or just taking a while?
Ask it. "What are you working on, and what is your next step?" A productive run has a clear next step; a looping run has no answer or repeats itself. When in doubt, stop and inspect.
### Can I automate any of this?
Parts of it. Some harnesses enforce "commit before each task" and "show git diff after each step" automatically. What is hard to automate is the judgment — deciding whether to re-scope, abandon, or trust the explanation. The prompts are light; the decisions are not.
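The automatable parts (checkpoint before the task, show the diff after) fit in a small wrapper. A hypothetical sketch, where `fake_agent` stands in for whatever actually invokes your agent:

```shell
# Hypothetical harness: checkpoint, run the task, surface the evidence.
run_with_checkpoint() {
  git add -A
  git -c user.email=a@b -c user.name=demo \
      commit -q --allow-empty -m "pre-task checkpoint"
  "$@"                   # the agent task itself
  git diff --stat HEAD   # evidence: what the run actually changed
}

# Demo in a throwaway repo, with a stand-in "agent" that edits one file.
repo=$(mktemp -d); cd "$repo"
git init -q .
echo "original" > app.ts
fake_agent() { echo "patched" > app.ts; }

run_with_checkpoint fake_agent
```

The diff after every task is the automated half; whether the diff is acceptable is still the judgment call.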
### Does this apply to non-coding agents?
The categories carry over — inspection, rollback, re-scoping, root-cause are generic. The specifics (`git diff`, test output) are coding-specific. For a research agent, rollback means discarding a line of investigation; for an ops agent, undoing a config change. See the agentic AI glossary entry for how the category generalizes.
### How many debugging rounds is too many?
Three is a reasonable ceiling per task. Past that, context overhead and odds of a wrong original framing both compound. Kill the run, write a tighter prompt, start fresh. The next run will be cheaper than the fourth round of salvage.
### What if the agent refuses to discard its changes?
Discard them yourself. `git reset --hard` and `git clean -fd` do not need the agent's cooperation. If the agent keeps re-creating the changes, the prompt still has them as an implicit goal — fix the prompt, not the diff.