
The SurePrompts Quality Rubric: A 7-Dimension Framework for Scoring Prompts

A structured way to evaluate prompt quality across 7 dimensions, scored 1-5 each for a max of 35. Replaces 'this prompt feels off' with concrete scores you can act on.

SurePrompts Team
April 21, 2026
7 min read

TL;DR

The SurePrompts Quality Rubric scores a prompt across 7 dimensions — role clarity, context sufficiency, instruction specificity, format structure, example quality, constraint tightness, and output validation — each 1-5, for a max of 35. 28+ is production-ready; 21-27 needs revision; 14-20 needs major revision; below 14 is not yet functional.

Key takeaways:

  • The Rubric scores; it does not judge. 7 dimensions × 1-5 = max 35. A low score on one dimension is a specific fix to make next, not a verdict on the whole prompt.
  • Output validation is the most under-built dimension. Prompts that score well everywhere else but 1 on validation cause production incidents disproportionately often.
  • 28+ ships. 21-27 needs revision. 14-20 needs major revision; below 14 is not yet functional. The thresholds are deliberate, not arbitrary percentages.
  • RCAF to draft, Rubric to audit. Pair the Rubric with RCAF Prompt Structure — use both, not one.
  • For agent prompts, weight constraint tightness and output validation higher. Agents fail differently from one-shot prompts; a 28 one-shot can be a 22 agent prompt if the weighting is not adjusted.

Why a rubric at all?

Most prompt improvement today happens by vibe. A prompt "feels off," so the engineer tweaks wording until it "feels better." This works for simple prompts and breaks at scale: two engineers can't agree on what "better" means, a good prompt on Monday stops working on Thursday, and nobody can explain why.

A rubric replaces vibe with dimensions. Instead of "this prompt is bad," you say "this prompt scores 2/5 on output validation." That statement is actionable — you can add output validation and rescore.

The SurePrompts Quality Rubric is designed for one job: fast iteration with a shared vocabulary. It is not a gate, not a scoring system to impress anyone, and not a replacement for actually running the prompt on eval data. It is the thing you use between draft and eval to catch obvious weaknesses.

The 7 dimensions

1. Role clarity (1-5)

Does the prompt assign the AI a specific, coherent role?

  • 5: Explicit role with scope, voice, expertise level, and posture. ("You are a senior backend engineer reviewing a pull request for production readiness.")
  • 3: Role present but vague. ("You are a helpful assistant.")
  • 1: No role. The model is guessing who it's supposed to be.

2. Context sufficiency (1-5)

Does the prompt include everything the model needs to do the task well?

  • 5: All relevant background (the user's situation, constraints, prior decisions, relevant domain knowledge) is present.
  • 3: Some context; the model can mostly proceed but will make assumptions.
  • 1: Near-zero context. The model will fabricate or refuse.

3. Instruction specificity (1-5)

How precise is the task description?

  • 5: The task, its sub-tasks, and the success criteria are named explicitly.
  • 3: The task is named; sub-steps and success criteria are implicit.
  • 1: Vague verb ("help me with X"), no sub-structure.

4. Format structure (1-5)

Is the expected output format specified?

  • 5: Exact structure defined (schema, section headers, tone, length). Ideally with an example.
  • 3: Format named ("as a list") but not specified in detail.
  • 1: No format instructions.

5. Example quality (1-5)

Are the few-shot examples (if any) well-chosen?

  • 5: 2-4 examples covering diverse input cases and the edge case(s) that matter.
  • 3: 1-2 generic examples.
  • 1: No examples, or examples that don't match the actual input distribution.

For zero-shot prompts, score this dimension based on whether the prompt makes zero-shot viable — some tasks genuinely don't need examples; others silently need them and suffer without.
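To make "diverse examples covering the edge cases" concrete, here is a minimal sketch of a few-shot block that would score well on this dimension. The sentiment task and the three examples are invented for illustration; the point is the coverage pattern, not the task.

```python
# Hypothetical few-shot set for a sentiment task: one easy case plus the two
# edge cases that matter, rather than three near-duplicates of the easy case.
EXAMPLES = [
    {"input": "The blender works great.", "label": "positive"},       # easy case
    {"input": "Not bad, but not worth $90.", "label": "negative"},    # negation + price edge case
    {"input": "Arrived broken; refund was fast.", "label": "mixed"},  # mixed-signal edge case
]

def build_few_shot_block(examples):
    """Render the examples into the prompt's few-shot section."""
    return "\n\n".join(
        f"Input: {e['input']}\nLabel: {e['label']}" for e in examples
    )

print(build_few_shot_block(EXAMPLES))
```

A set like this earns a 4-5 because it mirrors the real input distribution; three variations of the easy case would still be a 3.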

6. Constraint tightness (1-5)

Are constraints (what the model must NOT do, length limits, banned words, output types) specified?

  • 5: Explicit constraints covering the known failure modes for this task.
  • 3: Some constraints, but the common failure modes are unaddressed.
  • 1: No constraints. The model will do whatever it wants.

7. Output validation (1-5)

Is there a plan for validating the output before using it?

  • 5: Output is machine-validated (schema check, regex, programmatic test) or explicitly reviewed against criteria.
  • 3: Output is human-reviewed but without a checklist.
  • 1: Output is used as-is, with no validation path.

This is the dimension most often at 1, and it's frequently the reason a prompt that "works" in testing breaks in production.
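A 5 on this dimension means the output is machine-checked before anything downstream uses it. Here is a minimal sketch in Python; the banned words, length limit, and required JSON keys are invented for illustration, not taken from any real SurePrompts check.

```python
import json
import re

# Machine-validation sketch (dimension 7 at level 5). All thresholds and
# key names below are hypothetical, chosen to mirror the blender example.
BANNED = re.compile(r"\b(revolutionary|game-changing|ultimate)\b", re.IGNORECASE)
REQUIRED_KEYS = {"hero", "bullets", "who_for"}
MAX_CHARS = 2000

def validate_output(raw: str) -> list[str]:
    """Return a list of validation failures; an empty list means the output passes."""
    failures = []
    if len(raw) > MAX_CHARS:
        failures.append(f"output exceeds {MAX_CHARS} chars")
    if BANNED.search(raw):
        failures.append("contains a banned word")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        failures.append("not valid JSON")
        return failures
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        failures.append(f"missing keys: {sorted(missing)}")
    return failures
```

Outputs with a non-empty failure list get regenerated or escalated rather than used as-is; that regenerate-or-escalate path is what separates a 5 from a 3.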

Scoring guidance

Score    Meaning
28-35    Production-ready. Ship it.
21-27    Working draft. Fix the lowest-scoring dimensions.
14-20    Needs major revision. Pick the 3 lowest scores and address them.
7-13     Not yet functional. Rewrite from scratch using RCAF + Rubric.
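The arithmetic is simple enough to automate. A sketch of the sum-and-tier logic, with dimension key names invented for the snippet:

```python
# Rubric arithmetic: 7 dimensions, each scored 1-5, summed to a 7-35 total
# and mapped to the tiers in the table above. Key names are illustrative.
DIMENSIONS = [
    "role_clarity", "context_sufficiency", "instruction_specificity",
    "format_structure", "example_quality", "constraint_tightness",
    "output_validation",
]

def score_prompt(scores: dict[str, int]) -> tuple[int, str]:
    assert set(scores) == set(DIMENSIONS), "score every dimension exactly once"
    assert all(1 <= v <= 5 for v in scores.values()), "each dimension is 1-5"
    total = sum(scores.values())
    if total >= 28:
        tier = "Production-ready"
    elif total >= 21:
        tier = "Working draft"
    elif total >= 14:
        tier = "Needs major revision"
    else:
        tier = "Not yet functional"
    return total, tier

# The blender prompt scored in the worked example below:
first_draft = dict(zip(DIMENSIONS, [1, 1, 2, 1, 1, 1, 1]))
print(score_prompt(first_draft))  # (8, 'Not yet functional')
```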

Worked example

Consider this starting prompt:

Write me a product description for a new blender.

Scored against the Rubric:

  • Role clarity: 1 (no role)
  • Context sufficiency: 1 (no product details)
  • Instruction specificity: 2 (task named, nothing else)
  • Format structure: 1 (no format specified)
  • Example quality: 1 (no examples)
  • Constraint tightness: 1 (no constraints)
  • Output validation: 1 (no validation plan)

Total: 8/35. Not functional.

Revised using RCAF structure and Rubric feedback:

Role: You are an ecommerce copywriter writing for a mid-market kitchen appliance brand. Voice: confident, practical, no hype.

Context: The product is the Vortex Pro 700W countertop blender. Key specs: 700W motor, 6 speeds, 48oz glass jar, BPA-free lid, stainless steel blades, 7-year warranty. Target buyer: home cook who wants a reliable blender without pro-chef overkill.

Action: Write a product description optimized for an Amazon listing page. Cover: hero statement, 5 bullet-point feature benefits, 1 short paragraph on who it's for, 1 short paragraph on what's in the box.

Format:
  • Hero statement: 1 sentence, <20 words
  • Feature bullets: 5 bullets, each <15 words, benefit-first
  • Who-it's-for paragraph: 2-3 sentences
  • What's-in-the-box paragraph: 1-2 sentences listing items

Constraints: Do not use the words "revolutionary," "game-changing," or "ultimate." Do not make claims about blending ice unless asked (motor is 700W, which is borderline). Do not invent accessories not listed in the spec.

Validation: After writing, list any claims you made that could not be verified from the context above, so I can check them.

Scored:

  • Role clarity: 5
  • Context sufficiency: 4 (we didn't include competitor positioning or price point)
  • Instruction specificity: 5
  • Format structure: 5
  • Example quality: 2 (no example; for Amazon copy we might want one, but zero-shot is viable here)
  • Constraint tightness: 4 (good banned-word list; could add length limit on output)
  • Output validation: 5 (the "list unverifiable claims" instruction is an in-prompt validation step)

Total: 30/35. Ship it.

Our position

  • The Rubric is a diagnostic, not a gate. Don't hold a prompt back over a 26 if the eval-set results are fine.
  • Output validation is the single highest-leverage dimension. Prompts that score well elsewhere but 1 on validation cause production incidents disproportionately often.
  • The Rubric is deliberately 7 dimensions. Fewer misses failure modes; more becomes theater.
  • For agent prompts, double-weight constraint tightness and output validation. Single-shot failure modes differ from multi-step drift.
  • Use the Rubric paired with RCAF for drafting. RCAF to draft, Rubric to audit.

Try it yourself

Build expert-level prompts from plain English with SurePrompts — 350+ templates with real-time preview.

Open Prompt Builder
