Prompting Devin is closer to writing a work order than chatting. Devin, from Cognition, is positioned as an autonomous AI software engineer: it plans, executes, debugs, and verifies across many tool calls in a sandboxed environment. The autonomy is the product, and the prompt is where you set the run up to succeed or to drift. Specificity is cheap here; ambiguity is expensive in time and tokens.
What Devin Is
Devin is an autonomous coding agent that operates in its own sandboxed cloud workspace — shell, browser, editor — rather than inside your local IDE. You describe a task; Devin plans, executes, observes results, and iterates until it believes the task is done or needs your input.
Because the loop is long and mostly unattended, the prompt does more work than in chat. In chat, the model waits for your next turn. In Devin, the model is running on what you gave it plus what it inferred. Inferred context is where runs go sideways.
See the pillar: The Complete Guide to Prompting AI Coding Agents. For the category, see agentic AI; for the mechanic, tool use.
How Prompting Devin Differs From Chat and From In-IDE Agents
Coding agents sit on a spectrum of autonomy. Chat AIs answer a turn and stop. In-IDE agents edit files while you watch. Terminal-native agents like Claude Code run longer loops in your active session. Devin sits at the far end: a session is something you kick off and come back to.
| Dimension | Chat AI | In-IDE agent | Devin |
|---|---|---|---|
| Unit of work | A turn | An edit or a short multi-file change | A session (long, multi-step) |
| Who watches | You, every turn | You, at edit time | You, at checkpoints |
| Environment | None | Your editor and repo | Sandboxed cloud workspace |
| Typical prompt shape | A question | A targeted ask with file pointers | A work order with acceptance criteria |
| Cost of under-specifying | Next turn clarifies | A wrong edit | Minutes of tool calls spent the wrong way |
| Cost of over-specifying | A verbose prompt | Redundant context | Still cheap — autonomy rewards specificity |
The practical consequence: Devin prompts should read like specs, not questions. "Why is the login flow flaky?" gives no acceptance criteria, no scope, no stop condition — and you pay for the fuzziness in session time. See the Cursor AI prompting guide and the Replit Agent prompting guide for siblings.
Session Structure — What You Are Actually Prompting
A Devin session unfolds across many tool calls: reading files, running commands, browsing the web, editing code, running tests. You are not prompting a response — you are seeding a loop. That reframe changes what the prompt needs to carry:
- The goal. What "done" looks like in one or two sentences.
- The context. Where the code lives, what environment exists, what conventions apply.
- Acceptance criteria. Checkable conditions — commands that should pass, files that should change, behavior the agent can verify.
- Scope and non-goals. What is in and out. Files not to touch.
- Environment hints. Install, test, build commands. Credentials live in configuration, not the prompt.
- Stop conditions. When to hand back, ask, or stop trying.
Everything you leave out, the agent infers. Some inferences are cheap — reading a README. Some are expensive — rebuilding an environment, running tests that need unconfigured services, or rewriting in a style your team does not use. See spec-driven AI coding.
Writing a Good Devin Work Order
A good Devin prompt is a compact spec:
- Goal — one paragraph. What needs to be true when the session ends.
- Context — where the code is, prior state, cross-cutting constraints.
- Acceptance criteria — a numbered list of checkable conditions.
- Scope and non-goals — what is in, what is out, what must not be touched.
- Environment hints — how to set up, run, verify.
- Closing step — what to produce and whether to stop or ask for review.
Same shape as plan-and-execute prompting: the agent plans against the spec and executes against the criteria. Written well, it fits a scroll's worth of text.
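The spec shape above can be modeled as a small type with a pre-flight completeness check, so a prompt with an empty section never starts a session. This is a hypothetical sketch for illustration — `WorkOrder` and `missingSections` are invented names, not anything Devin exposes.

```typescript
// Hypothetical model of the work-order shape described above.
// Nothing here is a Devin API; it only mirrors the six sections of the spec.
interface WorkOrder {
  goal: string;                 // one paragraph: what must be true at the end
  context: string;              // repo location, prior state, constraints
  acceptanceCriteria: string[]; // checkable conditions: commands, file lists
  inScope: string[];            // files and directories the agent may touch
  outOfScope: string[];         // files and directories it must not touch
  environmentHints: string[];   // install, test, build commands
  closingStep: string;          // what to produce; stop or ask for review
}

// Return the names of empty sections, so the prompt can be fixed
// before paying for an hour of tool calls.
function missingSections(order: WorkOrder): string[] {
  const gaps: string[] = [];
  if (!order.goal.trim()) gaps.push("goal");
  if (!order.context.trim()) gaps.push("context");
  if (order.acceptanceCriteria.length === 0) gaps.push("acceptanceCriteria");
  if (order.inScope.length === 0) gaps.push("inScope");
  if (order.outOfScope.length === 0) gaps.push("outOfScope");
  if (order.environmentHints.length === 0) gaps.push("environmentHints");
  if (!order.closingStep.trim()) gaps.push("closingStep");
  return gaps;
}
```

A non-empty result is a signal to refine the prompt, not to kick off the session and steer later.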
Plan Review — The First Checkpoint
Devin typically proposes a plan before executing — a decomposition of the goal into steps. Treat this as the most valuable checkpoint in the session. A bad plan runs for an hour before you notice; a reviewed plan catches the drift in a minute.
What to look for:
- Does the plan match the goal? Unasked-for steps mean the agent has misunderstood the scope.
- Are acceptance criteria present as verification steps? If the plan skips your checks, it will not reliably hit them.
- Is the plan touching files you marked out of scope? Catch this now.
- Are assumptions visible? If the agent's guesses about the repo are wrong, correct them before execution starts.
- Is the plan specific enough to verify? "Fix the bug" is not a plan. "Reproduce the failure with a test, then modify `X` so the test passes" is.
If the plan is good, accept it. If close, steer. If off, rewrite — a fresh plan against a refined prompt beats patching a confused one. Plan-review affordances evolve across versions; check current Devin docs for specifics. The principle — review before the session runs long — does not change.
Handling Checkpoints — Mid-Session Steering
A long session has natural checkpoints: the plan, a failing test the agent keeps retrying, a dependency decision, a point where the agent asks for input. Small nudges here save a lot of wasted work.
- Redirect. "Use the existing `apiClient` in `lib/api/client.ts` instead of a new HTTP wrapper."
- Narrow. "Scope this session to unit tests; open a follow-up for integration tests."
- Unblock. Supply the missing piece — credential, doc link, design decision — and continue.
- Replan. If the plan is wrong rather than the execution, ask for a new plan. Do not patch a broken approach step by step.
- Stop. If the session is not converging, stop. Refine the prompt and restart. A cheap restart beats an expensive wander.
The temptation is to let the agent keep trying. The right move is to intervene earlier than feels polite.
Setting Scope Boundaries — Explicit Don'ts and Budgets
Scope is the biggest lever on session cost and quality. Autonomous agents with no boundary will expand scope — tidy a nearby file, upgrade a stale dependency, add tests for code they did not touch. A stated boundary is often the difference between a clean diff and a sprawling one.
- Name what is in scope. A directory, a file set, a specific function.
- Name what is out of scope. "Do not modify `supabase/migrations/*`." "Do not add dependencies." "Do not touch `app/admin/*`."
- Budget the work. "Prefer a minimal change." "If the fix requires edits outside `lib/auth/`, stop and surface it."
- Require a summary. `git diff --name-only` in the final report, plus a line on anything the agent wanted to change but did not.
Scope is the agent's biggest failure mode because the model believes it is being helpful when it expands. You do not talk it out of that by asking nicely; you talk it out by listing files.
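That file list can be checked mechanically once the session closes. A minimal sketch, assuming you capture the agent's reported `git diff --name-only` output as a string array; the helper name `outOfScopeEdits` and plain prefix matching are illustrative choices, not part of any Devin tooling or a real glob engine.

```typescript
// Hypothetical scope check: compare changed files against the allow-list
// of path prefixes from the prompt. Prefix matching only, not globbing.
function outOfScopeEdits(
  changedFiles: string[],
  allowedPrefixes: string[]
): string[] {
  return changedFiles.filter(
    (file) => !allowedPrefixes.some((prefix) => file.startsWith(prefix))
  );
}
```

Anything it returns is a stray edit: review it, revert it, or widen the scope deliberately in the next prompt.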
Work-Order Prompt Example
A concrete Devin work order against a hypothetical repo. Replace paths and commands with your own — this is a shape, not a template.
```
GOAL
Add a rate limiter to the public signup endpoint. Behavior should match
the login endpoint's existing rate-limit wrapper. No schema changes.

CONTEXT
Repo: github.com/acme/acme-web (branch: feature/signup-rate-limit)
Stack: TypeScript (strict), Next.js 15 App Router, Supabase, Jest.
Reference implementation: app/api/login/route.ts uses `withRateLimit`
from lib/rate-limit.ts. Apply the same wrapper to signup.
Identifier for rate-limiting: IP + email (same as login).

ACCEPTANCE CRITERIA
1. app/api/signup/route.ts returns HTTP 429 when the rate limit is hit.
2. Existing signup behavior (201 on success, validation errors on bad
   input) is unchanged.
3. `npm run typecheck` passes.
4. `npm run lint` passes.
5. `npm test` passes, including a new unit test for the signup
   rate-limit path modeled on the login test.
6. `git diff --name-only` shows only:
   app/api/signup/route.ts
   app/api/signup/route.test.ts
7. A short summary of the change and commands run.

SCOPE AND NON-GOALS
In scope: the signup route and its test file.
Out of scope: any other route, the rate limiter itself (lib/rate-limit.ts),
database schema, and dependencies. Do not install new packages.

ENVIRONMENT HINTS
Install: `npm ci`
Test: `npm test`
Typecheck: `npm run typecheck`
Lint: `npm run lint`
Env vars needed: SUPABASE_URL, SUPABASE_ANON_KEY (present in the sandbox
via the configured secrets — do not echo them).

CLOSING STEP
Open a PR against `staging` titled "feat: rate limit signup endpoint".
Include the diff summary and command outputs in the PR description.
Do not merge. Stop after opening the PR.
```
Every section closes a gap the agent would otherwise guess at: runnable acceptance, explicit scope, clear stop.
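For concreteness, here is a minimal sketch of the behavior the acceptance criteria describe: a fixed-window limiter keyed by IP + email that returns 429 once the limit is hit. The example repo and its `lib/rate-limit.ts` are hypothetical, so this invents a matching shape rather than reproducing any real implementation; the `Handler` type and in-memory `Map` store are simplifications for illustration.

```typescript
// Hypothetical sketch matching the work order's acceptance criteria.
// A simplified request/response shape stands in for Next.js route handlers.
type Handler = (req: { ip: string; email: string }) => { status: number };

function withRateLimit(
  handler: Handler,
  limit: number,     // max requests per window
  windowMs: number   // window length in milliseconds
): Handler {
  // In-memory fixed-window counters; a real service would use shared storage.
  const hits = new Map<string, { count: number; windowStart: number }>();
  return (req) => {
    const key = `${req.ip}:${req.email}`; // same identifier as login: IP + email
    const now = Date.now();
    const entry = hits.get(key);
    if (!entry || now - entry.windowStart >= windowMs) {
      hits.set(key, { count: 1, windowStart: now }); // new window
      return handler(req);
    }
    entry.count += 1;
    if (entry.count > limit) {
      return { status: 429 }; // rate limit hit (criterion 1)
    }
    return handler(req); // under the limit: unchanged behavior (criterion 2)
  };
}
```

The point for prompting is not this code; it is that criteria 1 and 2 are precise enough that both the agent's test and your review can check them line by line.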
Bad vs. Good — Before and After
A bad Devin prompt: "Add rate limiting to signup. Make sure login still works." The agent has to discover the existing rate limiter, pick an identifier strategy, decide what tests to write, and guess whether to open a PR. The good prompt — the work order above — makes those decisions up front. It is roughly ten times longer, and the session it produces is shorter, cleaner, and reviewable.
Common Anti-Patterns
- Treating the prompt like a chat turn. A one-liner starts a one-hour session with nothing to steer against. Fix: write a work order with goal, acceptance, scope, closing step.
- Skipping the plan review. The plan is the cheapest intervention point in the session. Fix: read every plan before accepting.
- No explicit out-of-scope list. The agent tidies things you did not ask it to. Fix: list the files and folders it must not touch.
- Unverifiable acceptance. "Make it work" gives the agent nothing to check. Fix: name commands, greps, or file lists that prove success.
- Letting a stuck session keep grinding. You pay for every retry of a failing approach. Fix: stop, redirect, or restart with a refined prompt.
- Secrets in the prompt. Credentials belong in the sandbox's secret store. Fix: reference env var names; keep values out.
FAQ
How is a Devin session different from a chat conversation?
A chat conversation is a sequence of turns you watch. A Devin session is a long-running loop in a sandbox, usually while you are doing something else. The prompt has to carry goal, context, acceptance criteria, and scope up front, because the agent is working against that prompt for the whole run — not your next message.
Should I accept the plan Devin proposes, or rewrite it?
Read it first. If it matches your goal and ties verification to your acceptance criteria, accept it. If close, steer with a revision. If the shape is wrong — missing verification, out-of-scope work, bad assumptions — refine the prompt and start over. Plan review is the highest-leverage checkpoint.
How do I stop Devin from editing files I did not want changed?
Three layers. List out-of-scope files and folders in the prompt. Require `git diff --name-only` in the closing report so stray edits are visible. Review the plan before execution — if it mentions files you marked off, steer before the run starts. Layered constraints beat any single one.
What if Devin gets stuck in a loop?
Stop and intervene. A stuck agent is usually missing context (a credential, a doc, a decision) or working on the wrong approach. Unblock it or restart with a revised prompt — "the approach should be X, not Y" — rather than hoping it recovers. Long wanders are a signal to restart, not to wait.
Does Devin replace code review?
No. Devin produces a change; you still read it. Autonomy shifts your attention from writing the code to specifying it and reviewing the result, but the review does not go away. For load-bearing changes, diff-review carefully and run the acceptance commands yourself. See spec-driven AI coding and the pillar guide.