A senior engineer opens a blank doc on a Tuesday morning and writes "Design: Multi-tenant rate limiter." They know what the system needs to do and which approach they think is right. Four hours later the doc is still three paragraphs and a headache. The writing is the work, and the writing is the part AI is good at — if the prompt carries the scaffolding a spec needs. A generic "write a spec for a multi-tenant rate limiter" produces a thousand words of shaped-like-a-spec content that says nothing specific. The spec scaffold — problem framing, approach, trade-offs, non-goals, open questions — is the difference between high-volume vague output and a v0 an engineer can actually iterate on.
This post sits in the engineering track of our prompt engineering for business teams guide and pairs with AI architecture review prompts, AI incident postmortem prompts, and spec-driven AI coding.
## Why "Write a Spec for X" Fails
Three failure modes stack. The first is the training-data average. Asked to produce a spec with no scaffold, a model returns the average of specs on the public internet — a one-paragraph problem statement, a diagram placeholder, three bullet points under a half-dozen generic headings. It reads like a spec. It decides nothing. Production specs are commitments to one approach over others, and the commitment is where AI defaults abandon you.
The second is missing inputs. A spec is a function of the problem, the constraints, the existing system, and the decisions already made. A prompt that names only the problem leaves the model to invent the rest — an architecture that does not match your stack, a capacity assumption that does not match your traffic, non-goals that might be actual goals for your team. The output is internally consistent and externally wrong.
The third is the monolithic ask. Even a well-specified prompt produces weak output when it asks for the full spec in one shot. The model balances tokens across sections regardless of where the difficulty sits. If the real work is in the trade-offs, you want most of the model's attention there, and section-level prompting is the only way to get it.
## The Spec Scaffold
A scaffold is the shape the prompt enforces. Five sections cover most engineering specs; more if the scope demands it, but rarely fewer:
| Section | What it answers | What makes it load-bearing |
|---|---|---|
| Problem framing | What are we solving, for whom, and why now? | Names the forcing function. Without it, the spec is a solution in search of a problem. |
| Approach | What are we proposing to build, at a high level? | The commitment. One approach, named and described, not three options hedged together. |
| Trade-offs | What do we lose by choosing this approach? | The honesty. A spec with no trade-offs is either trivial or hiding them. |
| Non-goals | What are we explicitly not solving? | The boundary. Keeps reviewers from asking about scope that was never in scope. |
| Open questions | What is unresolved and who decides? | The handle. Converts ambiguity into action instead of leaving it as hidden risk. |
An Architecture Decision Record (ADR) collapses this further — problem, decision, consequences — and we cover that shape below. A full design doc may add rollout, migration, and observability. The five above are the minimum viable scaffold that survives review.
The scaffold is itself a prompt template: a reusable structure with slots for problem-specific data. Team-level adoption means the template lives somewhere shared — a repo, a wiki, a snippet library — and gets improved as the team learns what produces reviewable specs.
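In code, the shared template might be as small as a dictionary of section prompts with slots. This is a hypothetical sketch — the section names, slot names, and prompt wording are illustrative, not a prescribed format:

```python
# Hypothetical sketch: the five-section scaffold as a shared prompt
# template. Slot names ({problem}, {users}, ...) are illustrative.
SCAFFOLD = {
    "problem_framing": (
        "Draft the problem-framing section of a technical spec.\n"
        "Problem: {problem}\n"
        "Affected users: {users}\n"
        "Forcing function: {forcing_function}\n"
        "Do not propose solutions in this section."
    ),
    "approach": (
        "Draft the approach section. Commit to ONE approach.\n"
        "Candidate design: {design}\n"
        "Constraints it must fit: {constraints}"
    ),
    "trade_offs": (
        "Draft the trade-offs section. List costs only, not benefits.\n"
        "Proposed approach: {design}\n"
        "Rejected alternatives: {alternatives}"
    ),
    "non_goals": (
        "Draft the non-goals section. Be terse.\n"
        "Scope decisions already made: {scope_decisions}"
    ),
    "open_questions": (
        "Draft the open-questions section. For each question, name "
        "who decides.\n"
        "Known risk surface: {risks}"
    ),
}

def render(section: str, **slots) -> str:
    """Fill one section's slots to produce a runnable prompt."""
    return SCAFFOLD[section].format(**slots)
```

Because the template is plain data, "improving it as the team learns" is a pull request, not a doc edit.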
## Prompt Per Section
The highest-leverage move in spec prompting is splitting one "write the spec" prompt into five section-scoped prompts, each with its own inputs. Problem framing needs the forcing function, affected users, and current behavior. Approach needs the candidate design and the constraints it must fit. Trade-offs needs the alternatives and the dimensions to compare on. Non-goals needs explicit scope decisions already made. Open questions needs the risk surface the author is aware of.
Section-level prompting pays out three ways. You can iterate on one section without re-running the whole spec. You can feed section-specific inputs the full-doc prompt would not have context room for. And you can apply different constraints per section: terse for non-goals, exhaustive for trade-offs, cautious for open questions.
The simplest workflow: outline the spec as a bullet list, then run one prompt per bullet. Stitch the outputs together and edit for flow. The stitched draft is usually cleaner than a one-shot spec because each section was written with the attention of a focused prompt.
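The outline-and-stitch workflow reduces to a short loop. A minimal sketch, assuming `run_prompt` is whatever callable sends a prompt to your model client and returns text — the function name and outline shape are hypothetical:

```python
def draft_spec(outline, run_prompt):
    """Run one focused prompt per outline entry and stitch the
    outputs into a single draft. `outline` is a list of
    (heading, section_prompt) pairs; `run_prompt` is any callable
    that sends a prompt to a model and returns the generated text
    (hypothetical -- substitute your own client)."""
    sections = []
    for heading, prompt in outline:
        body = run_prompt(prompt)
        sections.append(f"## {heading}\n\n{body}")
    # The stitched draft still needs a human editing pass for flow.
    return "\n\n".join(sections)
```

Each call carries only that section's inputs, which is what keeps the model's attention where the difficulty sits.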
## Research-to-Spec Prompts
Many specs start as a Slack thread, an issue comment chain, or a scratchpad of RFC-style notes. Turning that raw material into a spec draft is an extraction task AI is well-suited to — provided you tell it what to extract and what to discard.
A research-to-spec prompt takes the raw material as input, the scaffold as structure, and extraction rules as behavior. Rules that make this work: name the source of any claim drawn from the input; mark stated opinions versus agreed-upon facts; surface unresolved points as open questions rather than invented answers; and leave sections empty and flagged when the input does not support them, instead of fabricating content to fill the shape.
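Assembled as code, the three parts — raw material as input, scaffold as structure, rules as behavior — are visible at a glance. A hypothetical sketch; the tag names (`[OPINION]`, `[FACT]`, `[NO INPUT]`) are illustrative conventions, not a standard:

```python
# Hypothetical extraction rules; the bracketed tags are an
# illustrative convention for marking provenance in the draft.
EXTRACTION_RULES = """\
- Name the source (message, comment, note) of any claim drawn
  from the input.
- Mark stated opinions as [OPINION] and agreed-upon facts as [FACT].
- Surface unresolved points as open questions; never invent answers.
- If the input does not support a section, leave it empty and flag
  it [NO INPUT] instead of fabricating content."""

SECTIONS = ["Problem framing", "Approach", "Trade-offs",
            "Non-goals", "Open questions"]

def research_to_spec_prompt(raw_notes: str) -> str:
    """Build a research-to-spec prompt: raw material as input, the
    scaffold as structure, extraction rules as behavior."""
    headings = "\n".join(f"- {s}" for s in SECTIONS)
    return (
        "Turn the raw notes below into a spec draft with these "
        f"sections:\n{headings}\n\nRules:\n{EXTRACTION_RULES}\n\n"
        f"RAW NOTES:\n{raw_notes}"
    )
```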
The output is a draft, not a finished spec. The value is the transformation from "we talked about this in Slack" to "we have a draft to edit." The editing is where the engineer's judgment goes.
## ADR Format
Architecture Decision Records are shorter, decision-focused specs — often one or two pages — that capture a single choice and the reasoning behind it. The canonical shape is three sections:
- Context. What situation forced this decision, and what constraints bound it?
- Decision. What did we choose?
- Consequences. What follows from this choice — good, bad, and neutral?
ADRs prompt well because the scope is bounded. A good ADR prompt names the decision, the alternatives considered, and the constraints that ruled others out. The model's job is to articulate the reasoning a future engineer can audit — not to invent the decision.
Useful rules: require at least two alternatives with a one-sentence rejection reason each; require consequences to include at least one negative and one neutral, not only positives; and require the context section to name the forcing function — a timeline, a capacity limit, a compliance deadline. These push the model past the "we chose X because X is great" shape that is the ADR equivalent of affirmation-theatre. For reviewing architectural decisions once made, see AI architecture review prompts.
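These rules can be enforced before the prompt ever reaches a model. A hypothetical sketch of an ADR prompt builder that bakes the rules in and refuses to run with fewer than two alternatives — function and parameter names are illustrative:

```python
def adr_prompt(decision: str, alternatives: list[str],
               forcing_function: str) -> str:
    """Build an ADR prompt with the structural rules baked in.
    Refuses fewer than two alternatives, since 'we chose X because
    X' reasoning starts there. Hypothetical sketch."""
    if len(alternatives) < 2:
        raise ValueError("an ADR prompt needs at least two alternatives")
    alts = "\n".join(f"- {a}" for a in alternatives)
    return (
        "Draft an ADR with sections: Context, Decision, Consequences.\n"
        f"Decision taken: {decision}\n"
        f"Alternatives considered:\n{alts}\n"
        f"Forcing function (must appear in Context): {forcing_function}\n"
        "Rules:\n"
        "- Give each alternative a one-sentence rejection reason.\n"
        "- Consequences must include at least one negative and one "
        "neutral item, not only positives.\n"
        "- Articulate the reasoning a future engineer can audit; "
        "do not invent the decision."
    )
```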
## Iterating on the Spec
A v0 spec is not a finished spec. The iteration prompts are where most of the leverage hides, because editing a draft is a different shape of work than producing one.
Three iteration prompts worth having in the snippet library:
- Critique. Given the current draft and the scaffold, ask the model to find weak sections — problem framing that does not actually name the problem, an approach section that slips into implementation detail, trade-offs that only list positives, non-goals that contradict implicit goals. Critique prompts benefit from the same constraint as architecture review critique: name the component, name the concern, do not summarize.
- Alternative framing. Given the current problem framing, ask for two or three alternative framings — different scopes, different users, different forcing functions. Surfaces whether the chosen framing is the right one or just the first one that came to mind.
- Consistency check. Ask the model to flag any claim in one section contradicted by a claim in another. Specs drift internally during editing — a non-goal becomes a goal two sections later — and consistency prompting catches drift before review does.
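The consistency check is the most mechanical of the three, so it is the easiest to keep in the snippet library as code. A minimal sketch, assuming the same hypothetical conventions as above:

```python
def consistency_check_prompt(draft: str) -> str:
    """Wrap a drafted spec in a consistency-check prompt: flag
    cross-section contradictions, quote both claims, name both
    sections, no summarizing. Hypothetical sketch."""
    return (
        "Review the spec below. Flag every claim in one section "
        "that is contradicted by a claim in another section. For "
        "each flag: quote both claims verbatim, name both sections, "
        "and state the contradiction in one sentence. Do not "
        "summarize the spec.\n\nSPEC:\n" + draft
    )
```

Run it on the stitched draft, after editing, immediately before review.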
None of these replace the engineer's judgment. They produce material the engineer considers and accepts or rejects. For the same pattern applied to agent-driven code, see spec-driven AI coding.
## Example: Problem-Framing Prompt (Hypothetical)
A prompt for the first section of a spec. The example is hypothetical — paths, systems, and users are illustrative.
```
ROLE:
You are a senior engineer writing the problem-framing section of a
technical spec. You name the specific problem, the affected users,
the current behavior, and the forcing function. You do not propose
solutions in this section.

CONTEXT:
System: multi-tenant API gateway, serves ~40 internal services.
Affected users: tenant administrators, internal service owners.
Current behavior: a single global rate limit applies to all
tenants. When one tenant bursts, it consumes the shared budget
and other tenants see 429s despite being within their own
expected usage.
Forcing function: a launch next quarter moves two tenants to
traffic tiers 10x their current levels. Without per-tenant
limits, the launch will cause noisy-neighbor incidents on
unrelated tenants.
Existing constraints:
- Must use the existing Redis cluster; no new datastore.
- Must not add more than 5ms p99 to request latency.

TASK:
Draft the problem-framing section only. Include:
- A one-sentence problem statement.
- Affected users, named specifically.
- Current behavior, described factually with the user-visible
  symptom.
- The forcing function and its timeline.
- The constraints the solution must respect.
Do NOT propose a solution, name a specific algorithm, or outline
an approach. That belongs in the approach section.

ACCEPTANCE:
- Problem statement is specific to this system, not generic.
- At least one affected user is named with a concrete scenario.
- Forcing function has a timeline (quarter, month, or date).
- No solution language. If drafting drifts toward solutions, stop
  and rewrite.
```
The load-bearing parts are the explicit constraint to avoid solutions and the acceptance clause that enforces specificity. Without both, the output drifts toward a generic description that would fit any rate limiter anywhere.
## Example: Trade-offs Prompt (Hypothetical)
A shorter prompt for the trade-offs section, run after the approach is drafted.
```
ROLE:
You are drafting the trade-offs section of a technical spec. You
list what is lost by choosing the proposed approach, not what is
gained.

CONTEXT:
Proposed approach: per-tenant token buckets in Redis, refilled by
a background job, evaluated in the gateway middleware.
Alternatives considered (already rejected):
- In-process limits per gateway node (rejected: inconsistent
  across nodes).
- Separate Redis cluster per tenant tier (rejected: operational
  overhead).

TASK:
List the trade-offs of the proposed approach. For each:
- Name the specific cost (latency, operational, correctness,
  cost, or flexibility).
- State the magnitude or condition under which it matters.
- Note whether it is mitigable and how, or explicitly
  unmitigable.
Include at least one trade-off that would be a deal-breaker if a
specific condition held — and name the condition.
Do NOT list benefits. Benefits belong in the approach section.
```
The prompt is short because the work is narrow. The acceptance-style final clause — at least one deal-breaker trade-off with its condition — is what pulls the output past "some latency added" and into "if tenant count exceeds ~50k the Redis memory footprint becomes the limit."
## Common Anti-Patterns
- "Write the spec" as a single prompt. Produces shaped-like-a-spec content with no section doing real work. Fix: prompt per section.
- No forcing function in problem framing. Produces a problem description that could have been written any time, for any team. Fix: require a timeline or deadline that makes the problem now.
- Trade-offs section that lists benefits. The model averages over public design docs, many of which hedge. Fix: explicit constraint that the trade-offs section lists costs only.
- Non-goals that contradict the approach. Usually because non-goals were written first and approach drifted. Fix: consistency-check prompt after the full draft is assembled.
- Fabricated open questions. The model invents unresolved issues to fill the section. Fix: require open questions to cite the section they arose from, or flag the section "none identified."
- ADRs with only one alternative. Produces "we chose X because X" reasoning that does not hold up in review. Fix: require at least two alternatives with one-sentence rejection reasons.
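Some of these anti-patterns are mechanical enough to screen for before human review. A crude, keyword-level sketch — the marker words and the regex are illustrative guesses, and this is a pre-review screen, not a substitute for judgment:

```python
import re

def lint_spec(sections: dict[str, str]) -> list[str]:
    """Keyword-level checks for two anti-patterns: benefits smuggled
    into trade-offs, and problem framing with no visible timeline.
    Marker words and the timeline regex are illustrative guesses."""
    issues = []

    # Anti-pattern: trade-offs section that lists benefits.
    benefit_markers = ("benefit", "advantage", "improves",
                      "faster", "simpler")
    tradeoffs = sections.get("trade_offs", "").lower()
    if any(m in tradeoffs for m in benefit_markers):
        issues.append("trade-offs section may be listing benefits")

    # Anti-pattern: no forcing function / timeline in the framing.
    framing = sections.get("problem_framing", "").lower()
    timeline = r"\b(q[1-4]|quarter|month|week|deadline)\b"
    if not re.search(timeline, framing):
        issues.append("problem framing has no visible timeline")

    return issues
```

Anything it flags goes back through the relevant section prompt, with the flag pasted in as the critique.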
## FAQ
Should the spec author use AI, or should the reviewer?
Both, for different jobs. The author uses AI to produce the v0 faster, section by section. The reviewer uses AI on a finished draft for critique and consistency checks, catching weaknesses the author is too close to see. The same prompts rarely serve both.
How long should an AI-drafted spec be?
As long as the problem requires. AI defaults trend toward longer outputs because training-data specs err on the verbose side. If your team's convention is two-page design docs, put that constraint in the prompt explicitly — "total length under 1,200 words" — and edit aggressively after. Length is not a quality signal.
Does this work for RFC-style specs meant for external review?
Yes, with one adjustment. External RFCs carry audience context an internal spec can assume — who the reviewers are, what they already know, what formality is expected. Include that context in the prompt. An RFC for a standards body and one for an internal working group look different, and the model needs to know which.
What about specs for exploratory work?
The spec scaffold assumes an approach to commit to. If you do not have one, the shape is wrong — you are writing a research plan, not a spec. Use a different prompt: outline what you would need to learn to choose between approaches, and what signals would tell you. Once the exploration resolves, write the spec.
The spec is the place an engineering decision becomes reviewable. AI does not make the decision, but it makes the writing around the decision cheap enough that the spec actually gets written — and gets written well enough to review.