
Devin AI Prompting Guide (2026)

How to prompt Devin — Cognition's autonomous AI software engineer. Scope, acceptance criteria, session design, plan review, and checkpoint patterns.

SurePrompts Team
April 20, 2026
11 min read

TL;DR

Devin runs long sessions with minimal handholding, so prompts must front-load everything — context, constraints, acceptance criteria, and non-goals. The payoff for specificity is a run you can actually trust instead of one you have to babysit.

Prompting Devin is closer to writing a work order than chatting. Devin, from Cognition, is positioned as an autonomous AI software engineer: it plans, executes, debugs, and verifies across many tool calls in a sandboxed environment. The autonomy is the product, and the prompt is where you set the run up to succeed or to drift. Specificity is cheap here; ambiguity is expensive in time and tokens.

What Devin Is

Devin is an autonomous coding agent that operates in its own sandboxed cloud workspace — shell, browser, editor — rather than inside your local IDE. You describe a task; Devin plans, executes, observes results, and iterates until it believes the task is done or needs your input.

Because the loop is long and mostly unattended, the prompt does more work than in chat. In chat, the model waits for your next turn. In Devin, the model is running on what you gave it plus what it inferred. Inferred context is where runs go sideways.

See the pillar: The Complete Guide to Prompting AI Coding Agents. For the category, see agentic AI; for the mechanic, tool use.

How Prompting Devin Differs From Chat and From In-IDE Agents

Coding agents sit on a spectrum of autonomy. Chat AIs answer a turn and stop. In-IDE agents edit files while you watch. Terminal-native agents like Claude Code run longer loops in your active session. Devin sits at the far end: a session is something you kick off and come back to.

| Dimension | Chat AI | In-IDE agent | Devin |
| --- | --- | --- | --- |
| Unit of work | A turn | An edit or a short multi-file change | A session (long, multi-step) |
| Who watches | You, every turn | You, at edit time | You, at checkpoints |
| Environment | None | Your editor and repo | Sandboxed cloud workspace |
| Typical prompt shape | A question | A targeted ask with file pointers | A work order with acceptance criteria |
| Cost of under-specifying | Next turn clarifies | A wrong edit | Minutes of tool calls spent the wrong way |
| Cost of over-specifying | A verbose prompt | Redundant context | Still cheap — autonomy rewards specificity |

The practical consequence: Devin prompts should read like specs, not questions. "Why is the login flow flaky?" gives no acceptance criteria, no scope, no stop condition — and you pay for the fuzziness in session time. See the Cursor AI prompting guide and the Replit Agent prompting guide for siblings.

Session Structure — What You Are Actually Prompting

A Devin session unfolds across many tool calls: reading files, running commands, browsing the web, editing code, running tests. You are not prompting a response — you are seeding a loop. That reframe changes what the prompt needs to carry:

  • The goal. What "done" looks like in one or two sentences.
  • The context. Where the code lives, what environment exists, what conventions apply.
  • Acceptance criteria. Checkable conditions — commands that should pass, files that should change, behavior the agent can verify.
  • Scope and non-goals. What is in and out. Files not to touch.
  • Environment hints. Install, test, build commands. Credentials live in configuration, not the prompt.
  • Stop conditions. When to hand back, ask, or stop trying.

Everything you leave out, the agent infers. Some inferences are cheap — reading a README. Others are expensive — rebuilding an environment, running tests that need unconfigured services, or rewriting in a style your team does not use. See spec-driven AI coding.

Writing a Good Devin Work Order

A good Devin prompt is a compact spec:

  • Goal — one paragraph. What needs to be true when the session ends.
  • Context — where the code is, prior state, cross-cutting constraints.
  • Acceptance criteria — a numbered list of checkable conditions.
  • Scope and non-goals — what is in, what is out, what must not be touched.
  • Environment hints — how to set up, run, verify.
  • Closing step — what to produce and whether to stop or ask for review.

Same shape as plan-and-execute prompting: the agent plans against the spec and executes against the criteria. Written well, it fits a scroll's worth of text.
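
The six sections above can be treated as data rather than habit. A minimal sketch (this structure is my own, not an official Devin format): represent the work order as a typed object and render it into the prompt, so a missing section is a type error instead of a silent omission.

```typescript
// Sketch of a work-order builder. The interface and section titles are
// assumptions for illustration; adapt them to your own template.
interface WorkOrder {
  goal: string;
  context: string[];
  acceptanceCriteria: string[];
  inScope: string[];
  outOfScope: string[];
  environmentHints: string[];
  closingStep: string;
}

function renderWorkOrder(o: WorkOrder): string {
  // Indent each line of a section under its uppercase title.
  const block = (title: string, lines: string[]): string =>
    `${title}\n${lines.map((l) => `  ${l}`).join("\n")}`;
  return [
    block("GOAL", [o.goal]),
    block("CONTEXT", o.context),
    // Number acceptance criteria so the agent can reference them in its report.
    block("ACCEPTANCE CRITERIA", o.acceptanceCriteria.map((c, i) => `${i + 1}. ${c}`)),
    block("SCOPE AND NON-GOALS", [
      `In scope: ${o.inScope.join(", ")}`,
      `Out of scope: ${o.outOfScope.join(", ")}`,
    ]),
    block("ENVIRONMENT HINTS", o.environmentHints),
    block("CLOSING STEP", [o.closingStep]),
  ].join("\n\n");
}
```

The point is not the code; it is that every session starts from a prompt with all six sections present, every time.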

Plan Review — The First Checkpoint

Devin typically proposes a plan before executing — a decomposition of the goal into steps. Treat this as the most valuable checkpoint in the session. A bad plan runs for an hour before you notice; a reviewed plan catches the drift in a minute.

What to look for:

  • Does the plan match the goal? Unasked-for steps mean the agent has misunderstood the scope.
  • Are acceptance criteria present as verification steps? If the plan skips your checks, it will not reliably hit them.
  • Is the plan touching files you marked out of scope? Catch this now.
  • Are assumptions visible? If the agent's guesses about the repo are wrong, correct them before execution starts.
  • Is the plan specific enough to verify? "Fix the bug" is not a plan. "Reproduce the failure with a test, then modify X so the test passes" is.

If the plan is good, accept it. If close, steer. If off, rewrite — a fresh plan against a refined prompt beats patching a confused one. Plan-review affordances evolve across versions; check current Devin docs for specifics. The principle — review before the session runs long — does not change.

Handling Checkpoints — Mid-Session Steering

A long session has natural checkpoints: the plan, a failing test the agent keeps retrying, a dependency decision, a point where the agent asks for input. Small nudges here save a lot of wasted work.

  • Redirect. "Use the existing apiClient in lib/api/client.ts instead of a new HTTP wrapper."
  • Narrow. "Scope this session to unit tests; open a follow-up for integration tests."
  • Unblock. Supply the missing piece — credential, doc link, design decision — and continue.
  • Replan. If the plan is wrong rather than the execution, ask for a new plan. Do not patch a broken approach step by step.
  • Stop. If the session is not converging, stop. Refine the prompt and restart. A cheap restart beats an expensive wander.

The temptation is to let the agent keep trying. The right move is to intervene earlier than feels polite.

Setting Scope Boundaries — Explicit Don'ts and Budgets

Scope is the biggest lever on session cost and quality. Autonomous agents with no boundary will expand scope — tidy a nearby file, upgrade a stale dependency, add tests for code they did not touch. Explicit boundaries are often the difference between a clean diff and a sprawling one.

  • Name what is in scope. A directory, a file set, a specific function.
  • Name what is out of scope. "Do not modify supabase/migrations/*." "Do not add dependencies." "Do not touch app/admin/*."
  • Budget the work. "Prefer a minimal change." "If the fix requires edits outside lib/auth/, stop and surface it."
  • Require a summary. git diff --name-only in the final report, plus a line on anything the agent wanted to change but did not.

Scope is the agent's biggest failure mode because the model believes it is being helpful when it expands. You do not talk it out of that by asking nicely; you talk it out by listing files.
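
That file-listing discipline can also be checked mechanically at review time. A minimal sketch (the helper and file names are hypothetical, not part of Devin): feed it the output of git diff --name-only plus the allowlist and forbidden paths from the work order.

```typescript
// Return every changed file that violates the work order's scope:
// either not on the allowlist, or under a forbidden path prefix.
function scopeViolations(
  changedFiles: string[],
  allowed: string[],
  forbiddenPrefixes: string[],
): string[] {
  return changedFiles.filter(
    (f) =>
      !allowed.includes(f) ||
      forbiddenPrefixes.some((prefix) => f.startsWith(prefix)),
  );
}
```

An empty result means the diff stayed inside the lines you drew; anything else is a file to question before merging.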

Work-Order Prompt Example

A concrete Devin work order against a hypothetical repo. Replace paths and commands with your own — this is a shape, not a template.

```text
GOAL
  Add a rate limiter to the public signup endpoint. Behavior should match
  the login endpoint's existing rate-limit wrapper. No schema changes.

CONTEXT
  Repo: github.com/acme/acme-web (branch: feature/signup-rate-limit)
  Stack: TypeScript (strict), Next.js 15 App Router, Supabase, Jest.
  Reference implementation: app/api/login/route.ts uses `withRateLimit`
    from lib/rate-limit.ts. Apply the same wrapper to signup.
  Identifier for rate-limiting: IP + email (same as login).

ACCEPTANCE CRITERIA
  1. app/api/signup/route.ts returns HTTP 429 when the rate limit is hit.
  2. Existing signup behavior (201 on success, validation errors on bad
     input) is unchanged.
  3. `npm run typecheck` passes.
  4. `npm run lint` passes.
  5. `npm test` passes, including a new unit test for the signup
     rate-limit path modeled on the login test.
  6. `git diff --name-only` shows only:
       app/api/signup/route.ts
       app/api/signup/route.test.ts
  7. A short summary of the change and commands run.

SCOPE AND NON-GOALS
  In scope:  the signup route and its test file.
  Out of scope: any other route, the rate limiter itself (lib/rate-limit.ts),
    database schema, and dependencies. Do not install new packages.

ENVIRONMENT HINTS
  Install:   `npm ci`
  Test:      `npm test`
  Typecheck: `npm run typecheck`
  Lint:      `npm run lint`
  Env vars needed: SUPABASE_URL, SUPABASE_ANON_KEY (present in the sandbox
    via the configured secrets — do not echo them).

CLOSING STEP
  Open a PR against `staging` titled "feat: rate limit signup endpoint".
  Include the diff summary and command outputs in the PR description.
  Do not merge. Stop after opening the PR.
```

Every section closes a gap the agent would otherwise guess at: runnable acceptance, explicit scope, clear stop.
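
For context on what the work order is pointing the agent at, here is a sketch of the wrapper pattern it references. Everything in it is hypothetical: the repo, lib/rate-limit.ts, and the `withRateLimit` signature are illustrations of the shape, not real code the agent would find.

```typescript
// Hypothetical shape of the rate-limit wrapper the work order cites.
// The real implementation would use a shared store, not an in-memory Map.
type SignupRequest = { ip: string; email: string };
type RouteResponse = { status: number };
type Handler = (req: SignupRequest) => RouteResponse;

function withRateLimit(handler: Handler, limit: number): Handler {
  const counts = new Map<string, number>();
  return (req) => {
    // Identifier is IP + email, as the work order specifies.
    const key = `${req.ip}:${req.email}`;
    const seen = (counts.get(key) ?? 0) + 1;
    counts.set(key, seen);
    // Over the limit: HTTP 429, matching acceptance criterion 1.
    if (seen > limit) return { status: 429 };
    return handler(req);
  };
}
```

Pointing at a reference implementation like this is cheaper than describing behavior in prose: "apply the same wrapper as login" fixes the identifier strategy, the status code, and the test shape in one line of context.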

Bad vs. Good — Before and After

A bad Devin prompt: "Add rate limiting to signup. Make sure login still works." The agent has to discover the existing rate limiter, pick an identifier strategy, decide what tests to write, and guess whether to open a PR. The good prompt — the work order above — makes those decisions up front. It is ten times longer, and the session is shorter, cleaner, and reviewable.

Common Anti-Patterns

  • Treating the prompt like a chat turn. A one-liner starts a one-hour session with nothing to steer against. Fix: write a work order with goal, acceptance, scope, closing step.
  • Skipping the plan review. The plan is the cheapest intervention point in the session. Fix: read every plan before accepting.
  • No explicit out-of-scope list. The agent tidies things you did not ask it to. Fix: list the files and folders it must not touch.
  • Unverifiable acceptance. "Make it work" gives the agent nothing to check. Fix: name commands, greps, or file lists that prove success.
  • Letting a stuck session keep grinding. You pay for every retry of a failing approach. Fix: stop, redirect, or restart with a refined prompt.
  • Secrets in the prompt. Credentials belong in the sandbox's secret store. Fix: reference env var names; keep values out.

FAQ

How is a Devin session different from a chat conversation?

A chat conversation is a sequence of turns you watch. A Devin session is a long-running loop in a sandbox, usually while you are doing something else. The prompt has to carry goal, context, acceptance criteria, and scope up front, because the agent is working against that prompt for the whole run — not your next message.

Should I accept the plan Devin proposes, or rewrite it?

Read it first. If it matches your goal and ties verification to your acceptance criteria, accept it. If close, steer with a revision. If the shape is wrong — missing verification, out-of-scope work, bad assumptions — refine the prompt and start over. Plan review is the highest-leverage checkpoint.

How do I stop Devin from editing files I did not want changed?

Three layers. List out-of-scope files and folders in the prompt. Require git diff --name-only in the closing report so stray edits are visible. Review the plan before execution — if it mentions files you marked off, steer before the run starts. Layered constraints beat any single one.

What if Devin gets stuck in a loop?

Stop and intervene. A stuck agent is usually missing context (a credential, a doc, a decision) or working on the wrong approach. Unblock it or restart with a revised prompt — "the approach should be X, not Y" — rather than hoping it recovers. Long wanders are a signal to restart, not to wait.

Does Devin replace code review?

No. Devin produces a change; you still read it. Autonomy shifts your attention from writing the code to specifying it and reviewing the result, but the review does not go away. For load-bearing changes, diff-review carefully and run the acceptance commands yourself. See spec-driven AI coding and the pillar guide.
