When an AI coding agent goes wrong, the instinct is to restart — kill the run, reset the branch, try again. Sometimes that is right. Often it is not. What the agent accumulated — what it tried, what it saw, what failed — is worth more than a fresh context window. Agent debugging prompts extract that information and steer the agent out of the hole. Four categories cover most cases: inspection, rollback, re-scoping, and root-cause.
## Why Agents Get Stuck or Wrong
An agent is not a deterministic program; it is a loop that picks actions based on what it sees. Four common failure shapes:
- Looping. Same sequence repeated — run tests, read output, edit the same file, run tests — with no progress. Usually a feedback signal the agent keeps misreading.
- Missing context. A wrong mental model of the codebase. The agent writes a plausible patch against code that does not exist, or calls a function whose signature has changed.
- Wrong scope. The task was ambiguous; the agent chose an interpretation you did not want. Competent work on the wrong problem.
- Tool failures. Shell, editor, search, or test output that the agent misinterpreted or never saw. Hallucinated progress is the tell.
Each failure wants a different prompt. Treating them all the same ("try again with more detail") works some of the time and wastes tokens the rest. The pillar, The Complete Guide to Prompting AI Coding Agents, covers the broader setup; this post is about what to do once things are off.
## Inspection Prompts — "What Did You Just Do?"
The first move when an agent seems off is inspection. You do not know what the agent believes, and guessing is worse than asking. Inspection prompts surface the agent's current state before you decide what to do next.
Three prompts do most of this work:
- "Tell me what you did in the last N steps and what you saw." Forces the agent to reconstruct the recent history in plain text. If its summary disagrees with reality — files that were not edited, commands that did not run — you have found the problem.
- "What is the current state of the working tree? What files have changed?" Grounds the conversation in something verifiable. `git status` and `git diff` are facts; the agent's impression is not.
- "What are you seeing in the test output?" Surfaces what the agent actually read. Truncated logs, ANSI codes, or output the agent skimmed past are common.
Inspection is cheap. Worst case, you confirm the agent is on track and lose a minute. Best case, you catch a hallucinated edit before spending ten more minutes on its consequences. It is also the step most users skip — the pressure when a run goes sideways is to do something. Inspection feels slow, but it is faster than any alternative built on a wrong premise.
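The "facts, not impressions" half of inspection can be checked without the agent at all. A minimal sketch — the throwaway repo and the file name `file.ts` are illustrative; the git commands are standard:

```shell
# Set up a throwaway repo so the sketch is self-contained.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.email=a@b -c user.name=demo commit -q --allow-empty -m "baseline"

# Simulate an edit the agent may or may not have reported.
echo "edited" > file.ts

# Ground truth, straight from git, independent of the agent's summary:
git status --porcelain   # machine-readable list of modified/untracked files
git diff --stat HEAD     # summary of tracked changes
```

If the agent's "what I did" summary disagrees with this output, the summary loses.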
## Rollback Prompts — "Undo to Last Good State"
Once the run is not salvageable as-is, rollback is the next step. The goal is to get back to a known-good state cleanly.
Rollback prompts come in three granularities:
| Scope | Prompt shape | When to use |
|---|---|---|
| Single step | "Revert the last edit to `file.ts` and stop." | One bad change you can point at |
| Recent changes | "Discard changes to the last 3 files you touched." | Agent drifted into files it should not have |
| Full reset | "Discard all uncommitted changes and return to commit `abc123`." | Unsure what is contaminated; want a clean slate |
A few rules that make rollback safer:
- Name the end state, not just "undo." The agent should know exactly where it is supposed to land.
- Commit before risky runs. A pre-run commit turns any rollback into `git reset` or `git checkout`. Without one, rollback depends on the agent's memory of what it changed — unreliable past a few steps.
- Rollback before re-prompting. Fixing a mess in-place usually produces a worse mess. Clean state first, then re-prompt.
Rollback is also the right move when the agent is looping. Loops persist because the agent keeps seeing the same inputs; clearing the workspace breaks the loop at the input layer.
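The "commit before risky runs" rule is what makes the full-reset row of the table a two-command operation. A sketch in a throwaway repo — file names are hypothetical:

```shell
# Self-contained demo repo.
repo=$(mktemp -d); cd "$repo"
git init -q .
echo "original" > app.ts
git add -A
git -c user.email=a@b -c user.name=demo commit -q -m "pre-run checkpoint"
checkpoint=$(git rev-parse HEAD)   # remember the known-good state

# ...agent runs and makes a mess...
echo "bad edit" > app.ts
echo "stray"    > scratch.ts

# Full reset: back to the checkpoint, no agent cooperation needed.
git reset --hard -q "$checkpoint"
git clean -fdq                     # remove untracked files the agent created
```

After the reset, `app.ts` is back to its checkpointed contents and `scratch.ts` is gone — the rollback named an exact end state instead of asking the agent to "undo."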
## Re-Scoping Prompts — "Narrow the Task"
Many bad runs are scope problems. The agent was asked to do too much, or something that branched into unexpected territory. Re-scoping prompts cut the task until it fits.
Common shapes:
- "Stop trying to do X. Focus only on Y." Explicit abandonment of a sub-goal. Useful when the agent keeps trying to finish something that is not working.
- "Narrow this to just the change in `file.ts`. Leave everything else for a separate task." Shrinks scope to one file. Often the fastest way out of a spiraling multi-file change.
- "Split this into two steps: first pass the test for case A; handle case B after." Converts a hard task into two easier ones.
- "Only make changes that `pnpm test` validates. Skip any that do not have a test." Uses the test runner as a scope boundary.
Re-scoping is the most underused of the four patterns. The instinct is to add more context to the original task. Often what it needs is less — a smaller ask, a clearer boundary, an explicit "do not do X." For longer work, spec-driven AI coding turns the scope boundary into a document before the run starts, preventing most of what you would otherwise re-scope mid-run.
## Root-Cause Prompts — "Why Did That Fail?"
The first three categories navigate out of a bad state. Root-cause prompts establish why the state went bad, so the next run does not repeat it.
The shape is diagnostic:
- "Why did step N fail? Walk me through the cause, not the symptom." Separates the observable failure (test red) from the underlying cause (wrong assumption about the data shape).
- "What assumption were you making when you wrote that code? Where is it wrong?" Surfaces the agent's mental model. Assumptions hide in generated code; this prompt pulls them out.
- "If you tried this again, what would you do differently?" Captures the lesson in the agent's framing — paste it into the next prompt so the agent enters with the lesson absorbed. Similar logic to reflexion prompting.
Root-cause prompts are valuable even after a successful run: a run can succeed on a shaky foundation that fails in a new way next time. A caution: the agent's root-cause explanation is a hypothesis, not a proof — usually plausible, often wrong in specifics. Treat it like a junior engineer's post-mortem: a starting point, not the final word.
## An Example Debugging Session (Hypothetical)
A short, illustrative exchange — paths, commands, and responses are hypothetical.
```text
YOU    The migration task has been running for ten minutes and the tests
       are still red. Something is off. What have you done in the last
       five steps, and what is the current state of the working tree?

AGENT  Last five steps:
       1. Edited prisma/schema.prisma to add the `user_preferences` table.
       2. Ran `pnpm prisma migrate dev` — failed with "shadow DB not found".
       3. Edited `.env` to add `SHADOW_DATABASE_URL`.
       4. Re-ran the migration — same error.
       5. Edited `prisma/schema.prisma` again to rename the table.

       Current `git diff` shows changes in prisma/schema.prisma, .env,
       and tests/setup.ts.

YOU    You should not be editing .env or tests/setup.ts for this task.
       Discard changes to those two files. Keep schema.prisma.

AGENT  Done. `git status` now shows only prisma/schema.prisma modified.

YOU    The shadow DB issue is a local setup problem, not a migration
       problem. Stop trying to run `migrate dev`. Narrow the task to
       just: edit prisma/schema.prisma so the schema is correct for the
       new table. I will run the migration manually after.

AGENT  Understood. Focused only on schema.prisma. The current version
       adds `user_preferences` with columns... [continues]
```
Four things happened: inspection, rollback (discarding `.env` and `tests/setup.ts`), re-scoping ("stop trying to run `migrate dev`"), and an implicit root-cause observation (the shadow DB error is a local setup problem, not a migration problem). Each was short. None required restarting the run.
## When to Abandon and Restart
Sometimes the right move is to kill the run. Signals the prompt is not salvageable:
- Inspection output disagrees with reality. If `git status` shows five changed files and the agent insists it changed two, the context is corrupted. Start over.
- Loops persist after rollback. If the same loop reappears after a clean workspace, the prompt — not the state — is the problem.
- Re-scoping keeps failing. If three attempts to narrow the task fail, your original framing is off. Write a new spec and run again.
- Hallucinated tool output. Agent reports a test passing that is actually red, a file created that does not exist. Stop immediately; continuing compounds the drift.
- Context window exhaustion. Long runs lose earlier context. Past a point, re-prompting is just prompting a different agent with less memory.
Abandoning is not failure. A ten-minute wasted run followed by a two-minute clean run is cheaper than thirty minutes of salvage.
## Recognizing Agent Failure Modes
Some failure modes are hard to see without knowing what to look for:
- Hallucinated progress. Success reports without supporting evidence — "the test now passes" without a test run. Always ask for the artifact.
- Premature success. Agent stops when one criterion is met, ignoring others in the spec. Shows up when acceptance criteria are loose.
- Infinite loops. Run → read output → same edit → run. Usually the agent is misreading a feedback signal.
- Silent scope drift. Agent solves a bigger problem than you asked for, touching files you did not name. `git diff --name-only` catches this in one command.
- Confidence without evidence. Explanations that sound right but do not connect to the code — "this fixes it because X" where X is not in the file.
The cheapest defense is ending every run with "run the tests and show me the output" — forcing the agent to ground its claims in something verifiable.
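The scope-drift check can be extended into a small guard that compares the files the agent touched against the files the task named. A sketch in a throwaway repo — `file.ts` and `other.ts` are hypothetical:

```shell
# Self-contained demo repo with two tracked files.
repo=$(mktemp -d); cd "$repo"
git init -q .
printf 'a\n' > file.ts; printf 'b\n' > other.ts
git add -A
git -c user.email=a@b -c user.name=demo commit -q -m "baseline"

# Simulate an agent run that drifted into a file the task never named.
echo "change" >> file.ts
echo "drift"  >> other.ts

allowed="file.ts"                        # the only file the task named
touched=$(git diff --name-only HEAD)     # what the run actually touched

for f in $touched; do
  case " $allowed " in
    *" $f "*) ;;                         # named in the task: fine
    *) echo "scope drift: $f" ;;         # touched but never asked for
  esac
done
```

Here the loop flags `other.ts` and stays silent on `file.ts` — one command plus a comparison, run after every step if you like.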
## Common Anti-Patterns
- Retry without inspection. Re-running "try again" throws away the information the failed run produced. Fix: inspect first.
- No scope change on repeat failure. Same prompt, same failure. Fix: if two attempts fail, the task is too big — narrow it.
- Punishing the agent. "You are failing. Do better." Zero instruction content. Fix: say what to do, not how to feel about failing.
- Blanket "do better" prompts. "Be more careful" does not change the input. Fix: change scope, constraints, or the evidence the agent reads.
- Rollback without a target state. "Undo your changes" leaves the agent guessing. Fix: name the commit or file state explicitly.
- Debugging past the token budget. Long sessions blow through context. A fresh run with the lesson encoded is cheaper than round five.
## FAQ
### How do I know if an agent is looping or just taking a while?
Ask it. "What are you working on, and what is your next step?" A productive run has a clear next step; a looping run has no answer or repeats itself. When in doubt, stop and inspect.
### Can I automate any of this?
Parts of it. Some harnesses enforce "commit before each task" and "show git diff after each step" automatically. What is hard to automate is the judgment — deciding whether to re-scope, abandon, or trust the explanation. The prompts are light; the decisions are not.
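The automatable parts (checkpoint before the task, show the diff after) fit in a small wrapper. A hypothetical sketch, where `fake_agent` stands in for whatever actually invokes your agent:

```shell
# Hypothetical harness: checkpoint, run the task, surface the evidence.
run_with_checkpoint() {
  git add -A
  git -c user.email=a@b -c user.name=demo \
      commit -q --allow-empty -m "pre-task checkpoint"
  "$@"                   # the agent task itself
  git diff --stat HEAD   # evidence: what the run actually changed
}

# Demo in a throwaway repo, with a stand-in "agent" that edits one file.
repo=$(mktemp -d); cd "$repo"
git init -q .
echo "original" > app.ts
fake_agent() { echo "patched" > app.ts; }

run_with_checkpoint fake_agent
```

The diff after every task is the automated half; whether the diff is acceptable is still the judgment call.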
### Does this apply to non-coding agents?
The categories carry over — inspection, rollback, re-scoping, root-cause are generic. The specifics (`git diff`, test output) are coding-specific. For a research agent, rollback means discarding a line of investigation; for an ops agent, undoing a config change. See the agentic AI glossary entry for how the category generalizes.
### How many debugging rounds is too many?
Three is a reasonable ceiling per task. Past that, context overhead and odds of a wrong original framing both compound. Kill the run, write a tighter prompt, start fresh. The next run will be cheaper than the fourth round of salvage.
### What if the agent refuses to discard its changes?
Discard them yourself. `git reset --hard` and `git clean -fd` do not need the agent's cooperation. If the agent keeps re-creating the changes, the prompt still has them as an implicit goal — fix the prompt, not the diff.