
Spec-Driven AI Coding: Writing Specs Agents Execute Well (2026)

How to write specs agents execute well — user story, acceptance criteria, out-of-scope, constraints. The spec is the prompt when agents run autonomously.

SurePrompts Team
April 20, 2026
11 min read

TL;DR

With autonomous agents, the spec is the prompt. A good spec has a user story, explicit acceptance criteria, out-of-scope items, and constraints — it's reviewable before the agent runs and reusable across agents.

Spec-driven AI coding treats the specification — not the chat prompt — as the primary artifact you write. You invest time in a precise spec that names the user story, acceptance criteria, out-of-scope items, and constraints, and the agent executes it. The spec is reviewable before any code runs, reusable across agents, and version-controllable. As agents get more autonomous and runs get longer, the spec — not the conversation — becomes the leverage point.

Why Specs Beat Chats for Autonomous Agents

A chat prompt is ephemeral. You type, the agent responds, both scroll away. Fine when the unit of work is a suggestion you accept in a second. Not fine when it is a twenty-minute autonomous run that edits a dozen files.

A spec is the opposite shape:

  • Reviewable before the agent runs. You read it like a PR description, catch the misunderstanding, and fix it before a single file changes. A chat prompt only reveals misunderstandings in the output.
  • Reusable across agents. The same spec can seed Claude Code, GitHub Copilot Workspace, Cursor, or an autonomous agent with different tooling. A chat transcript optimized for one tool does not transfer.
  • Version-controllable. A spec lives in a file — in the repo, an issue, or a doc. You can diff it, comment on it, link it from the PR it produced.
  • Cheaper to iterate on. Editing a spec costs seconds. Re-running an agent to patch a misunderstanding costs minutes and tokens.

The shift is from conversation as the primary interface to artifact as the primary interface. The conversation still happens, but around the spec instead of replacing it. See the pillar, The Complete Guide to Prompting AI Coding Agents. For the category, see agentic AI.

Anatomy of a Good Spec

The shape is not novel. Engineering teams have written specs for decades; spec-driven AI coding borrows the shape and tightens it around what an agent needs. A complete spec for an agent typically has five sections:

  • User story or goal. What outcome does this change produce, and for whom?
  • Acceptance criteria. Concrete, verifiable conditions that are true when the change is done.
  • Out of scope. Tempting adjacent work the agent should not touch.
  • Constraints. Technical and non-technical boundaries — stack, compatibility, budget, conventions.
  • Context links. Pointers to files, docs, or tickets the agent should read before starting.

The first three are the irreducible core. Drop any of them and you are relying on the agent's defaults, which are better than they used to be but still not your defaults.
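Laid out as a blank skeleton, the five sections look like this — placeholders in angle brackets are for you to fill in per task:

```
USER STORY
  As a <role>, I want <outcome>, so that <benefit>.

CONTEXT
  - <files, docs, or tickets the agent should read first>

ACCEPTANCE CRITERIA
  1. <concrete, verifiable condition — a command, behavior, or output>

OUT OF SCOPE
  - <adjacent work the agent must not touch>

CONSTRAINTS
  - <stack, compatibility, budget, or pattern boundaries>
```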

Where Specs Live

There is no single right answer, but there are useful defaults:

| Location | Best for | Trade-off |
| --- | --- | --- |
| Inline in the prompt | One-off tasks, exploratory work | Not reusable, no review trail |
| CLAUDE.md or project context file | Stable conventions, repo-wide constraints | Too slow-moving for task-level specs |
| Issue tracker (GitHub, Linear) | Task-level specs for features and fixes | Requires tooling that reads the tracker |
| Dedicated spec doc in the repo | Larger features, multi-step work | Review overhead scales with spec size |
| PR template / scratch doc | Solo work with no tracker | Gets stale; hard to find later |

Most teams settle on a split: repo-wide conventions in CLAUDE.md, task-level specs in the issue tracker or a dedicated doc, inline prompts for tiny changes. Match the lifetime of the spec to the lifetime of the artifact it lives in.

Writing Acceptance Criteria

Acceptance criteria are where most specs fail. The common failure is writing "it works" or "the feature functions correctly" — both unverifiable. An acceptance criterion is verifiable or it is not a criterion.

Three marks of a good one:

  • Concrete. Names a specific test command, file, behavior, or output. "pnpm test auth/session.test.ts passes" beats "tests pass."
  • Bounded. Says what must be true, not everything that could be true. "Returns 401 on an expired token" is tighter than "the auth flow is correct."
  • Checkable without the agent. You — or CI — can verify it independently. If the only way to know it is done is to ask the agent, it is not a criterion.

A weak spec says: "Refresh the session token before expiry." A strong spec lists: "When the token is within 60 seconds of expiry, a new token is fetched before the next API call; pnpm test lib/auth/refresh.test.ts passes; no changes outside lib/auth/; the signature of getSession() is unchanged." The second is slightly longer and dramatically harder to misinterpret.
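Criteria written this way map directly onto checks a script or CI job can run. A minimal sketch in Python — the command names and file paths are illustrative, not a real project's:

```python
import subprocess

def verify_criteria(criteria):
    """Run each criterion's check command; return the names of those that failed.

    `criteria` maps a human-readable criterion to the command
    (as an argv list) that verifies it.
    """
    failed = []
    for name, cmd in criteria.items():
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            failed.append(name)
    return failed

# Hypothetical criteria from a spec like the one above; the commands
# are assumptions about the repo's tooling.
criteria = {
    "refresh tests pass": ["pnpm", "test", "lib/auth/refresh.test.ts"],
    "typecheck passes": ["pnpm", "typecheck"],
}
```

An empty return value means every criterion held; anything else names exactly what the run failed to deliver.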

Explicitly Naming Out-of-Scope

Agents drift. The training distribution rewards helpful-but-unasked-for work — renaming variables while fixing a bug, modernizing a pattern while adding a feature, touching unrelated files because "they were there." Some is useful; most is scope creep that makes the diff harder to review.

Out-of-scope sections are cheap insurance. A few lines that say "do not touch these files" keep the run bounded. Wording matters:

  • "Do not edit migrations." — clear.
  • "Avoid changes to migrations unless necessary." — invitation to decide it is necessary.
  • "Migrations are owned by a different team and must not be touched." — unambiguous.

The list doubles as a review checklist. If git diff --name-only shows a file from it, the run violated the spec.
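That checklist use can be automated. A small sketch, assuming the out-of-scope list has been turned into a set of forbidden path prefixes (the prefixes here are hypothetical):

```python
import subprocess

def scope_violations(changed_files, forbidden_prefixes):
    """Return the changed files that fall under an out-of-scope path prefix."""
    return sorted(
        f for f in changed_files
        if any(f.startswith(p) for p in forbidden_prefixes)
    )

# After a run (commented out so the sketch stands alone):
# changed = subprocess.check_output(
#     ["git", "diff", "--name-only"], text=True
# ).splitlines()
# violations = scope_violations(changed, {"migrations/", "app/login/"})
```

A non-empty result is a spec violation you can flag in review, or even fail CI on.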

Stating Constraints

Constraints are the non-behavioral facts the agent needs. They do not describe what the feature does; they describe the world it must fit into.

Useful ones to name:

  • Tech stack. Language version, framework, test runner, package manager. Stating them up front avoids a bad assumption in the first few minutes.
  • Compatibility. "Must work on Node 18+" steers away from features that will bite you later.
  • Budget. "Add no new dependencies" is common. So is "no new network calls in the hot path."
  • Patterns. "Use the existing db client" keeps the change consistent with the codebase.

Constraints are the most commonly skipped section, and also the one that causes the subtlest failures — a PR that works but introduces a dependency you do not want, or uses a pattern the codebase is migrating away from.
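The "no new dependencies" constraint, at least, is mechanically checkable: compare the dependency sets of package.json before and after the run. A sketch, assuming both files have already been parsed into dicts:

```python
import json

def new_dependencies(before, after):
    """Dependencies present in `after` but not `before`.

    Both arguments are parsed package.json dicts; devDependencies count too.
    """
    def deps(pkg):
        return set(pkg.get("dependencies", {})) | set(pkg.get("devDependencies", {}))
    return sorted(deps(after) - deps(before))

# Typical usage (paths are illustrative):
# before = json.load(open("package.json.orig"))
# after = json.load(open("package.json"))
# assert new_dependencies(before, after) == []
```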

Spec Iteration — The First One Is Usually Wrong

First drafts are wrong the same way first drafts of anything are wrong: the writer knows more than the text does. You have context the spec does not name, assumptions it does not state, scope decisions you made without writing them down. An agent, which has none of that, executes exactly what is on the page.

Treat the first spec as a draft. Run the agent — or, if the tool supports it, just the spec-to-plan step — and read what comes back. Gaps show up as a plan that touches files you did not expect, criteria the agent weakened because the original was ambiguous, or surfaced assumptions you did not know you had. The second iteration closes those gaps. Two passes is usually right. See plan-and-execute prompting for the same logic applied to the plan step.

A Good Spec Example (Hypothetical)

A hypothetical spec for a small but non-trivial change, shaped for an autonomous agent. Paths and commands are illustrative.

```
USER STORY
  As a user of the password reset flow, I should receive an error
  if I submit an expired reset token, not a silent redirect to the
  login page that looks like success.

CONTEXT
  - Relevant files:
      app/api/auth/reset/route.ts      (the handler to change)
      lib/auth/tokens.ts               (token validation lives here)
      app/auth/reset/page.tsx          (the client page)
      app/api/auth/reset/route.test.ts (existing test file)
  - The existing flow validates the token, but on failure calls
    `redirect('/login')` instead of returning an error response.

ACCEPTANCE CRITERIA
  1. Expired token returns HTTP 400 with { error: 'token_expired' }.
  2. Invalid token returns HTTP 400 with { error: 'token_invalid' }.
  3. A valid, unexpired token continues to work as before.
  4. The client page displays a human-readable message per case.
  5. `pnpm test app/api/auth/reset/route.test.ts` passes; new tests
     cover cases 1, 2, and 3.
  6. `pnpm typecheck` and `pnpm lint` pass.
  7. `git diff --name-only` shows only the four files listed above.

OUT OF SCOPE
  - Changing how reset tokens are generated or stored.
  - Refactoring `lib/auth/tokens.ts` beyond what is needed.
  - Any changes to the login page or session middleware.
  - Adding a rate limit or lockout (separate task).

CONSTRAINTS
  - No new dependencies.
  - Error messages must be i18n-safe (use the existing `t()` helper).
  - Response shape must match app/api/auth/login/route.ts.
```
Every section earns its place. The user story explains why; acceptance criteria say what "done" means; out-of-scope keeps the diff small; constraints keep the change consistent; the context list tells the agent where to start.

When Spec-Driven Is Overkill

Spec-driven coding is a discipline, not a dogma. In some cases the overhead of writing a spec exceeds the cost of a bad run:

  • Tiny changes. Renaming a variable, fixing a typo, adjusting a config value. A one-line prompt is faster and no worse.
  • One-liners. "Remove the console.log on line 42" does not need a user story.
  • Tight-loop debugging. Hypothesis and response cycle in seconds. Stopping to write a spec breaks the loop — see the agent debugging prompts guide.
  • Exploratory work. You do not know what the change should be; the spec will emerge from the exploration.

Rule of thumb: if the run takes less time than writing the spec, skip the spec. If the run is autonomous, hard to restart, or touches files you cannot easily review, write it.

Common Anti-Patterns

  • "Works correctly" as an acceptance criterion. Unverifiable, so effectively absent. Fix: name a command or behavior that proves it works.
  • Implicit out-of-scope. You know what the agent should not touch; the spec does not. Fix: write the list. It is usually three lines.
  • Spec that is actually an implementation. Step-by-step instructions remove the agent's room to think and remove your review surface. Fix: describe the outcome and constraints; let the plan decide the approach.
  • Constraints as preferences. "Try to avoid new dependencies" is a wish. Fix: "Do not add new dependencies."
  • No context links. The agent re-discovers the codebase every run. Fix: list the three to five files it must read first.
  • Editing the spec mid-run. The agent has already planned against the old one; changes create drift. Fix: stop the run, edit, restart.

FAQ

Is spec-driven AI coding the same as formal spec-driven development?

No. Formal methods — TLA+, model checkers — prove properties mathematically. Spec-driven AI coding borrows the word but not the rigor. The spec here is closer to a tightened PR description: enough structure to be reviewable, not a formal proof.

Where should I put the spec — inline, in a file, or in the issue tracker?

Depends on reuse. One-off task, inline is fine. Team task or something you will review, the issue tracker or a dedicated doc travels better. Repo-wide conventions belong in CLAUDE.md or the equivalent, not in every prompt.

How do I write acceptance criteria when the task is exploratory?

You usually cannot, and that is a signal to use a different shape. Use the conversational mode until the shape is clear, then write the spec for the implementation pass. Forcing criteria onto an open question produces either trivial criteria or wrong ones.

Does spec-driven work for teams, or just solo developers?

It works better for teams, because the spec is a shared artifact. Two developers prompting with their own framing get divergent implementations; two developers reviewing the same spec before the run catch divergence before it ships. See the GitHub Copilot Workspace guide for a tool that builds this review surface in.

What happens to the spec after the code ships?

Link it from the PR so the reviewer sees the original intent. If the spec encodes a lasting rule — a convention, a constraint, a pattern — promote that rule to the persistent context file so the next task inherits it.

See Developers Prompts