
People Write Better Prompts for Claude Than for ChatGPT (We Scored 971)

We scored 971 real prompts by the AI they were written for. Claude users' prompts score 62% higher than ChatGPT users' — but every model's users are failing.

June 17, 2026
7 min read

TL;DR

Across 971 real prompts, each scored 0–100 and grouped by the model it targeted, Claude drew the strongest prompts of any well-sampled assistant (27.8/100), while ChatGPT drew some of the weakest (17.2/100): a 62% relative gap. Copilot (28.0) and Perplexity (28.7) scored slightly higher than Claude, but on much smaller samples; people who picked no model at all scored lowest of any sizable group (16.2). Yet every group is failing: even the best-prompted model sits under 30/100.

This is a companion to The State of AI Prompting 2026, where 971 real prompts averaged just 20.5 out of 100. Here we split that same dataset by the AI each prompt was written for — and the model you're talking to turns out to predict how much effort you put in.

We already knew the average prompt is weak. The surprise is how unevenly that weakness is distributed. Split the same 971 prompts by their target model and a clear pattern appears: the audience writing for Claude shows up far better prepared than the audience writing for ChatGPT.

62%

How much higher Claude users' raw prompts score (27.8/100) than ChatGPT users' (17.2/100)

Raw prompt quality by model

Each prompt is scored 0–100 on the same eight dimensions used in the main report. "Raw" is the prompt as the person typed it; "engineered" is SurePrompts' restructured version of it.

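| Model | Prompts | Share | Avg raw score | Avg engineered score |
| --- | --- | --- | --- | --- |
| Perplexity | 33 | 3.4% | 28.7 / 100 | 78.3 / 100 |
| Copilot | 47 | 4.8% | 28.0 / 100 | 81.1 / 100 |
| Claude | 228 | 23.5% | 27.8 / 100 | 82.5 / 100 |
| Gemini | 102 | 10.5% | 19.2 / 100 | 71.4 / 100 |
| Grok | 85 | 8.8% | 18.1 / 100 | 56.1 / 100 |
| DeepSeek | 47 | 4.8% | 17.5 / 100 | 65.0 / 100 |
| ChatGPT | 69 | 7.1% | 17.2 / 100 | 80.5 / 100 |
| No model selected | 353 | 36.4% | 16.2 / 100 | 81.0 / 100 |
| Llama | 7 | 0.7% | 15.1 / 100 | 62.4 / 100 |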

Info

Read the raw column, not the engineered one. Raw scores measure how people actually prompt. The engineered column reflects SurePrompts' model-specific formatting scored against the same rubric, so differences there are partly about output formatting, not user behavior. Claude (n=228) is the most robustly measured high scorer; Perplexity and Copilot edge it on much smaller samples.
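As a quick consistency check, the per-model rows can be pooled back into the dataset-wide figures. The sketch below copies the counts and raw averages from the table and computes a count-weighted mean; the function and variable names are illustrative, not part of any SurePrompts tooling.

```python
# (model, prompt count, average raw score), copied from the table above.
ROWS = [
    ("Perplexity", 33, 28.7),
    ("Copilot", 47, 28.0),
    ("Claude", 228, 27.8),
    ("Gemini", 102, 19.2),
    ("Grok", 85, 18.1),
    ("DeepSeek", 47, 17.5),
    ("ChatGPT", 69, 17.2),
    ("No model selected", 353, 16.2),
    ("Llama", 7, 15.1),
]


def pooled_average(rows):
    """Return (total count, count-weighted mean of the group averages)."""
    total = sum(count for _, count, _ in rows)
    weighted_sum = sum(count * avg for _, count, avg in rows)
    return total, weighted_sum / total


total_prompts, overall = pooled_average(ROWS)
print(total_prompts, round(overall, 1))  # 971 20.5
```

The pooled result matches the 971 prompts and the 20.5/100 average reported for the full dataset, so the rows are internally consistent. The large "No model selected" group, with its low score, pulls the overall mean well below Claude's 27.8.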

The top of the table: Claude, Perplexity, and Copilot

The three highest-scoring audiences have something in common — they skew technical. Perplexity is built around research queries. Copilot lives inside coding and Office workflows. Claude has a heavy developer and analyst following. People arriving from those contexts are used to writing longer, more structured requests, so their raw prompts already carry more of what the rubric rewards: an explicit task, some context, and occasionally a defined output format.

Claude is the headline because its sample is large enough to trust. At 27.8/100 across 228 prompts, it isn't a fluke of a handful of power users — it's a consistent ~45% edge over Gemini (19.2) and a 62% edge over ChatGPT (17.2).
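The percentages here are relative gaps, computed against the lower score. A minimal sketch of that arithmetic, using the averages quoted above (the helper name is illustrative):

```python
def relative_gap(higher, lower):
    """Percentage by which `higher` exceeds `lower`."""
    return (higher - lower) / lower * 100


claude, chatgpt, gemini = 27.8, 17.2, 19.2

print(round(relative_gap(claude, chatgpt)))  # 62
print(round(relative_gap(claude, gemini)))   # 45
print(round(claude - chatgpt, 1))            # 10.6
```

In absolute terms, the 62% gap is 10.6 points on a 100-point scale. The relative figure looks large partly because both scores are low.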

The bottom of the table: ChatGPT and the "no model" crowd

The two weakest groups are the two biggest stories.

ChatGPT (17.2/100) scores near the bottom of the named models; only Llama's seven-prompt sample is lower. That isn't a knock on the model; it reflects its reach. ChatGPT is the default front door to AI for the mainstream, and the mainstream writes one-line, role-less prompts. The very popularity that makes it the category leader also fills its prompt stream with "write me a caption" and "fix this."

No model selected (16.2/100) is the largest single group in the entire dataset, at 36% of all prompts, and it scores lowest of any group with a usable sample (only Llama, at 15.1 on just 7 prompts, is lower). People who don't even pause to pick which AI they're prompting are, unsurprisingly, the same people who don't pause to structure the request.

But here's the part that matters: everyone is failing

It's tempting to read this as "Claude users are good and ChatGPT users are bad." That reading is wrong. Claude's headline score is still 27.8 out of 100, a failing prompt. The gap between the best and worst model audiences is the gap between very weak and extremely weak. Across all 971 prompts the average is just 20.5/100, and 9 in 10 score below 50 no matter which model they target.

The lesson isn't "switch models." It's that the single biggest lever on your output quality is the prompt — and almost nobody is pulling it, regardless of which AI they prefer.

What to do with this

The fix is the same for every model, because the rubric is the same: give the AI a role, add specificity and context, define the output format, and show an example when the task is complex. That's the structure that separates a 17 from an 85.
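One way to make that structure habitual is to assemble prompts from named parts rather than typing a single line. The sketch below is only an illustration of the four elements listed above: the function, its section labels, and the sample values are all hypothetical and are not SurePrompts' template format.

```python
def build_prompt(role, task, context, output_format, example=None):
    """Assemble a structured prompt from role, task, context, format, and an optional example."""
    sections = [
        f"Role: {role}",
        f"Task: {task}",
        f"Context: {context}",
        f"Output format: {output_format}",
    ]
    # Include an example only when one is supplied (most useful for complex tasks).
    if example:
        sections.append(f"Example:\n{example}")
    return "\n\n".join(sections)


# Hypothetical values, expanding a one-liner like "write me a caption".
prompt = build_prompt(
    role="You are a social media copywriter for a small bakery.",
    task="Write a caption for a photo of our new sourdough loaf.",
    context="Audience: local customers on Instagram; tone is warm and casual.",
    output_format="One caption under 25 words, followed by 3 hashtags.",
    example="Fresh out of the oven and gone by noon. #sourdough #bakery #local",
)
print(prompt)
```

Writing the parts as separate arguments makes a missing role or output format obvious before the prompt is ever sent.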

Warning

Methodology and limits. Prompts are grouped by the target model selected at generation time; "No model selected" means the user left the default. Sample sizes vary widely: Claude (228) and Gemini (102) are robust, ChatGPT (69) and the sub-50 groups are directional, and Llama (7) is too small to draw conclusions from and is excluded from the headline. Scores are a deterministic 8-dimension heuristic, not human ratings. The sample is 971 prompts from SurePrompts users, March–June 2026, and skews toward people already seeking better prompts, so these scores are likely upper bounds. The full aggregate dataset is published under CC BY 4.0; download the JSON (cite SurePrompts, State of AI Prompting 2026, by Model).

Frequently asked questions

Do people write better prompts for Claude or ChatGPT?

In a sample of 971 real prompts submitted to SurePrompts in 2026, prompts written for Claude scored an average of 27.8 out of 100, versus 17.2 for ChatGPT — a 62% gap. Both are still failing scores, but Claude consistently drew more structured prompts.

Which AI model gets the highest-quality prompts?

Among models with a meaningful sample, Claude (27.8/100, n=228) leads. Perplexity (28.7) and Copilot (28.0) score a touch higher on smaller samples (n=33 and n=47). The weakest prompts went to ChatGPT (17.2) and to sessions where no model was selected at all (16.2).

Why would prompt quality differ by AI model?

It reflects who is doing the prompting, not the model itself. Claude and Perplexity skew toward developers, analysts, and research-minded users who write longer, more structured prompts. ChatGPT's broad mainstream audience includes far more one-line, role-less prompts.

Is any model's audience writing good prompts?

No. Even the best-prompted model averages under 30 out of 100. The gap between models is the gap between bad and very bad — the entire distribution leaves most of every model's capability untapped.

How was prompt quality measured across models?

Every prompt was scored 0–100 across 8 weighted dimensions (length, role, specificity, structure, output format, constraints, context, examples) by the same deterministic engine behind the free SurePrompts Prompt Quality Score tool, then grouped by the target model selected at generation time. Sample: 971 prompts, March–June 2026, aggregate-only.
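To make "deterministic heuristic" concrete, here is a toy scorer over the same eight dimension names. It is not the SurePrompts engine: the weights, keyword checks, and the 40-word threshold are all invented for illustration, since the article does not publish them.

```python
import re

# Hypothetical weights that sum to 100; the real weights are not published here.
WEIGHTS = {
    "length": 10, "role": 15, "specificity": 15, "structure": 10,
    "output_format": 15, "constraints": 10, "context": 15, "examples": 10,
}


def dimension_hits(prompt):
    """Crude, assumed pass/fail checks, one per dimension."""
    text = prompt.lower()
    return {
        "length": len(text.split()) >= 40,
        "role": bool(re.search(r"\b(you are|act as)\b", text)),
        "specificity": bool(re.search(r"\d", text)),
        "structure": "\n" in prompt.strip(),
        "output_format": bool(re.search(r"\b(format|table|json|bullets?|list)\b", text)),
        "constraints": bool(re.search(r"\b(must|only|at most|avoid|do not)\b", text)),
        "context": bool(re.search(r"\b(context|background|audience)\b", text)),
        "examples": bool(re.search(r"\b(example|for instance)\b", text)),
    }


def score(prompt):
    """Sum the weights of every dimension the prompt satisfies (0 to 100)."""
    return sum(WEIGHTS[name] for name, hit in dimension_hits(prompt).items() if hit)


weak = "fix this"
strong = (
    "You are a senior Python reviewer.\n"
    "Context: this function parses 2 CSV files for a finance audience.\n"
    "You must only change the parsing logic.\n"
    "Output format: a bullet list of fixes. Example: 'Line 12: close the file.'"
)
print(score(weak))  # 0
print(score(strong) > score(weak))  # True
```

Because the checks are fixed rules rather than model judgments, the same prompt always gets the same score. That repeatability is the property the methodology relies on, and it also means such a scorer rewards surface structure rather than measuring output quality directly.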
