We scored 971 real AI prompts on the same 8 dimensions that separate a weak prompt from an expert one. The average scored just 20.5 out of 100. Here is what the data says about how people actually prompt in 2026 — and the one change that recovers the most points.
Everyone says "it's all about the prompt." Almost nobody measures whether their prompts are any good. So we did.
This is the first State of AI Prompting report: an analysis of 971 real prompts people submitted in 2026, each scored 0-100 on the eight dimensions that drive output quality. The numbers are worse than you'd guess — and they point to a single, fixable habit.
Info
How we measured it. Every prompt was graded by the same deterministic engine behind our free Prompt Quality Score tool — 8 weighted dimensions totaling 100 points: Length & Detail (20), Role/Persona (15), Specificity (15), Structure (15), Output Format (10), Constraints (10), Context (10), Examples (5). Sample: 971 prompts submitted between March 25 and June 2, 2026. No prompt text was stored or analyzed beyond the score — this report is aggregate-only.
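To make the rubric concrete, here is a minimal sketch of how eight weighted dimensions can roll up into a 0-100 score. The weights come from the note above. Everything else is an assumption for illustration: the names `RUBRIC_WEIGHTS` and `total_score`, and the idea that each dimension earns a fraction between 0 and 1 of its weight. The report does not describe how the real engine grades each dimension.

```python
# Weights as listed in the methodology note; they total 100 points.
RUBRIC_WEIGHTS = {
    "length_detail": 20,
    "role_persona": 15,
    "specificity": 15,
    "structure": 15,
    "output_format": 10,
    "constraints": 10,
    "context": 10,
    "examples": 5,
}


def total_score(fractions):
    """Combine per-dimension fractions (0.0 to 1.0) into a 0-100 score.

    Assumption: each dimension contributes fraction * weight. The real
    engine's per-dimension logic is not described in this report.
    """
    score = 0.0
    for name, weight in RUBRIC_WEIGHTS.items():
        fraction = fractions.get(name, 0.0)      # a missing dimension earns nothing
        fraction = min(max(fraction, 0.0), 1.0)  # clamp to the valid range
        score += weight * fraction
    return round(score, 1)


# Hypothetical prompt: concrete wording, a little detail, nothing else.
example = {"specificity": 1.0, "length_detail": 0.25}
print(total_score(example))  # 20.0
```

Under these assumptions, a prompt that is specific but carries no role, structure, format, constraints, context, or examples lands around 20, which is close to the average reported here.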
The headline: the average prompt is failing
A score of 20.5 isn't "needs work" — it's a near-blank prompt. The median was even lower, at 16/100. The 25th percentile was 2/100: a quarter of all prompts were essentially a single line of vague instruction.
Put the whole distribution side by side and the picture is stark:
| Score band | Share of raw prompts |
|---|---|
| 0–19 | 54.8% |
| 20–39 | 27.3% |
| 40–59 | 12.3% |
| 60–79 | 5.0% |
| 80–100 | 0.6% |
89.6% of prompts scored below 50. Fewer than 1 in 150 cleared 80. The way most people prompt in 2026 leaves the vast majority of a model's capability untouched — see why your AI prompts suck for the mechanics of why a thin prompt produces a generic answer.
The one fix that adds the most points: assign a role
One of the most-skipped elements in the entire dataset is also among the highest-weighted. 90.4% of prompts never tell the AI who to be.
Here's how often each dimension is missing entirely, paired with what people most need to fix:
| Dimension | Prompts missing it | What the fix looks like |
|---|---|---|
| Role / Persona | 90.4% | "You are an expert financial analyst…" |
| Examples | 97.4% | "Here's an example of the output I want…" |
| Constraints | 92.1% | "Do not exceed 200 words. Avoid jargon." |
| Context | 91.9% | "This is for a non-technical executive audience." |
| Structure | 88.7% | Numbered steps or labeled sections |
| Output Format | 82.1% | "Return a markdown table with 3 columns." |
| Length & Detail | 47.0% | Enough specifics to remove ambiguity |
| Specificity | 16.5% | Real numbers, names, and terms |
Examples are missing even more often (97.4%), but Examples is worth only 5 points, so fixing it can't move the needle much. Constraints (92.1%) and Context (91.9%) are also skipped slightly more often than Role, but each is worth 10 points. Role/Persona is the highest-leverage gap: it's worth 15 points and nine in ten prompts skip it. A single role line can recover up to 15 of a prompt's 100 points. The RCAF framework (Role, Context, Action, Format) targets several of the dimensions people drop most.
Specificity is the one bright spot: only 16.5% of prompts are vague on names and numbers. People are concrete about what they want — they just never tell the model how to be an expert about it.
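One rough way to compare leverage across dimensions is to multiply each dimension's weight by the share of prompts missing it. The sketch below does that with the figures from the rubric and the table above. It rests on a simplifying assumption that is ours, not the scoring engine's: a prompt missing a dimension earns zero for it, and adding the element recovers the full weight. The names `DIMENSIONS` and `expected_gain` are illustrative.

```python
# (weight in points, percent of prompts missing the dimension)
DIMENSIONS = {
    "Role / Persona": (15, 90.4),
    "Examples": (5, 97.4),
    "Constraints": (10, 92.1),
    "Context": (10, 91.9),
    "Structure": (15, 88.7),
    "Output Format": (10, 82.1),
    "Length & Detail": (20, 47.0),
    "Specificity": (15, 16.5),
}


def expected_gain(weight, missing_pct):
    """Average points recoverable per prompt if every gap were fully fixed.

    Assumption: missing means 0 points and fixed means full weight.
    """
    return weight * missing_pct / 100


# Rank dimensions from most to least recoverable points.
ranked = sorted(
    DIMENSIONS.items(),
    key=lambda item: expected_gain(*item[1]),
    reverse=True,
)

for name, (weight, missing_pct) in ranked:
    print(f"{name:<16} about {expected_gain(weight, missing_pct):.1f} points")
```

On this estimate Role/Persona ranks first at roughly 13.6 points, with Structure close behind at roughly 13.3, and Specificity last at about 2.5.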
The proof: engineering the same prompts lifts them 276%
Every prompt in this dataset was also rewritten into a structured prompt. Scoring those engineered versions on the identical rubric shows what closing those gaps is worth:
| Metric | Raw prompt | Engineered prompt |
|---|---|---|
| Average score | 20.5 / 100 | 77.2 / 100 |
| Median score | 16 | 85 |
| Scoring 80+ | 0.6% | 70.3% |
| Scoring below 50 | 89.6% | 9.0% |
The intent in people's prompts was usually fine. The structure wasn't. Adding the role, constraints, context, and output format the rubric rewards moved the average from a failing 20.5 to a strong 77.2 — and flipped the distribution from "almost none clear 80" to "most do." You can run any prompt through the same scorer yourself on the Prompt Quality Score tool, or build a structured one from scratch in the prompt builder.
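For readers who want to check the headline lift, the arithmetic can be sketched from the two published averages. The variable names are illustrative. Because the averages in the table are rounded to one decimal, the result can differ slightly from a figure computed on the unrounded data, which is our assumption about where the reported 276% comes from.

```python
raw_avg = 20.5         # average raw score, from the table above
engineered_avg = 77.2  # average engineered score, from the table above

# Absolute gain in rubric points, and gain relative to the raw baseline.
point_gain = engineered_avg - raw_avg
relative_lift_pct = point_gain / raw_avg * 100

print(f"{point_gain:.1f} points, {relative_lift_pct:.1f}% relative lift")
# 56.7 points, 276.6% relative lift
```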
Who writes the best prompts? Perplexity, Copilot, and Claude users
Quality varied by which model people were targeting. Users aiming at Perplexity, Copilot, and Claude wrote noticeably stronger raw prompts (28.7, 28.0, and 27.8 on average) than those targeting ChatGPT (17.2):
| Target model | Share | Avg raw score | Avg engineered score |
|---|---|---|---|
| General (no model set) | 36.4% | 16.2 | 81.0 |
| Claude | 23.5% | 27.8 | 82.5 |
| Gemini | 10.5% | 19.2 | 71.4 |
| Grok | 8.8% | 18.1 | 56.1 |
| ChatGPT | 7.1% | 17.2 | 80.5 |
| Copilot | 4.8% | 28.0 | 81.1 |
| DeepSeek | 4.8% | 17.5 | 65.0 |
| Perplexity | 3.4% | 28.7 | 78.3 |
| Llama | 0.7% | 15.1 | 62.4 |
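The pattern is consistent with how people use each tool: Perplexity, Claude, and Copilot pull more deliberate, work-oriented prompting, while the largest group, people who didn't specify a model at all, wrote among the thinnest prompts (16.2 average; only Llama's 15.1 was lower). The smallest groups are small samples (Llama's 0.7% share is roughly 7 of the 971 prompts), so read their averages as indicative only. If you're picking a model for a specific job, see which AI model should you use.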
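As a sanity check, the per-model rows should roll back up to the overall averages when weighted by share. The sketch below does that with the table's figures; `BY_MODEL` and `weighted_average` are illustrative names. Since every input is rounded to one decimal, small differences from the published averages are expected.

```python
# Target model: (share of prompts in %, avg raw score, avg engineered score)
BY_MODEL = {
    "General": (36.4, 16.2, 81.0),
    "Claude": (23.5, 27.8, 82.5),
    "Gemini": (10.5, 19.2, 71.4),
    "Grok": (8.8, 18.1, 56.1),
    "ChatGPT": (7.1, 17.2, 80.5),
    "Copilot": (4.8, 28.0, 81.1),
    "DeepSeek": (4.8, 17.5, 65.0),
    "Perplexity": (3.4, 28.7, 78.3),
    "Llama": (0.7, 15.1, 62.4),
}


def weighted_average(column):
    """Share-weighted mean of one score column (1 = raw, 2 = engineered)."""
    total_share = sum(row[0] for row in BY_MODEL.values())
    weighted_sum = sum(row[0] * row[column] for row in BY_MODEL.values())
    return weighted_sum / total_share


print(f"raw: {weighted_average(1):.1f}")         # raw: 20.5
print(f"engineered: {weighted_average(2):.1f}")  # engineered: 77.1
```

The raw figure matches the reported 20.5. The engineered figure comes out at 77.1 against the reported 77.2, a gap that rounding in the inputs can explain.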
Prompts are short — and getting slightly better
The average prompt is 44 words, but that average hides how short most are: 55% of prompts are under 26 words, and 27% are under 10.
| Prompt length | Share |
|---|---|
| Under 10 words | 26.9% |
| 10–25 words | 28.3% |
| 26–50 words | 19.9% |
| 51–100 words | 12.8% |
| Over 100 words | 12.2% |
There is one encouraging signal in the trend. Average raw quality crept up month over month (14.4 in March, 20.2 in April, 21.8 in May), suggesting people are slowly learning to prompt with more structure. Slowly. One caveat: collection began March 25, so the March figure covers only the final week of that month.
What this means for you
If your prompts look like the average in this data, you're leaving most of the model on the table. The fastest wins, starting with the highest-leverage one:
- Assign a role. One line — "You are an expert [X]" — and you've fixed the gap 90% of people have.
- State the output format. Tell the model exactly what shape you want back.
- Add constraints and context. Who's it for, what to avoid, how long.
- Then add length and an example if the task is complex.
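The checklist above can be sketched as a small helper that assembles a prompt from its parts. This is an illustration, not the SurePrompts builder: the function name `build_prompt`, its parameters, and the section labels are all hypothetical, and the task text is made up. The role, context, constraint, and format strings reuse the examples from the table earlier in this report.

```python
from typing import Optional, Sequence


def build_prompt(
    task: str,
    role: Optional[str] = None,
    context: Optional[str] = None,
    constraints: Optional[Sequence[str]] = None,
    output_format: Optional[str] = None,
    example: Optional[str] = None,
) -> str:
    """Assemble a structured prompt; any part left out is simply skipped."""
    sections = []
    if role:
        sections.append(f"You are {role}.")
    if context:
        sections.append(f"Context: {context}")
    sections.append(f"Task: {task}")
    if constraints:
        bullet_lines = "\n".join(f"- {item}" for item in constraints)
        sections.append(f"Constraints:\n{bullet_lines}")
    if output_format:
        sections.append(f"Output format: {output_format}")
    if example:
        sections.append(f"Example of the desired output:\n{example}")
    # Blank lines between sections keep the parts visually distinct.
    return "\n\n".join(sections)


prompt = build_prompt(
    task="Summarize the quarterly revenue figures pasted below.",  # made-up task
    role="an expert financial analyst",
    context="This is for a non-technical executive audience.",
    constraints=["Do not exceed 200 words.", "Avoid jargon."],
    output_format="Return a markdown table with 3 columns.",
)
print(prompt)
```

Keeping each part optional means you can start with the role line alone and add format, constraints, and context as the task demands.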
That's the structure of a good prompt — and it's exactly what a generator handles for you. Paste a prompt into the Prompt Quality Score tool to see where yours scores, then use the prompt builder to close the gaps automatically.
Warning
Methodology and limits. This is a snapshot, not a census. The sample is 971 prompts from people using SurePrompts (March–June 2026), so it skews toward users already seeking better prompts — the broader average is likely lower, not higher. Scores come from a deterministic 8-dimension heuristic, not human quality ratings, and "engineered" prompts are SurePrompts' own structured output, so the lift reflects that structuring. We'll re-run this report as the dataset grows.
Frequently asked questions
What is the average AI prompt quality score?
Across 971 real prompts submitted to SurePrompts between March and June 2026, the average prompt scored 20.5 out of 100 on an 8-dimension rubric. 89.6% of prompts scored below 50, and the median score was just 16.
What is the most common mistake people make in AI prompts?
Not assigning a role. 90.4% of prompts never tell the AI who to be (e.g. "You are an expert copywriter"). Adding a role is the single highest-impact fix because it is both one of the most-skipped elements and one of the heaviest-weighted dimensions (15 of 100 points).
How much does prompt engineering actually improve a prompt?
In this dataset, restructuring raw prompts raised the average quality score from 20.5 to 77.2 out of 100 — a 276% increase. 70% of engineered prompts scored 80 or above, versus under 1% of raw prompts.
How was the State of AI Prompting study measured?
Every prompt was scored 0-100 across 8 weighted dimensions (length, role, specificity, structure, output format, constraints, context, examples) by the same deterministic engine that powers SurePrompts' free Prompt Quality Score tool. The sample is 971 real prompts from March–June 2026.
Do longer prompts score higher?
Length helps but isn't enough. The average prompt is 44 words, and 55% are under 26 words — but the larger gap is structural. Most prompts that fail aren't just short; they omit a role, constraints, and a defined output format.