People Write Better Prompts for Claude Than for ChatGPT (We Scored 1,324)

SurePrompts Editorial Team

This is a companion to The State of AI Prompting 2026, where 1,324 real prompts averaged just 21.8 out of 100. Here we split that same dataset by the AI each prompt was written for — and the model you're talking to turns out to predict how much effort you put in.

We already knew the average prompt is weak. The surprise is how unevenly that weakness is distributed. Split the same 1,324 prompts by their target model and a clear pattern appears: the audience writing for Claude shows up far better prepared than the audience writing for ChatGPT.

42%

How much higher Claude users' raw prompts score (28.3/100) than ChatGPT users' (19.9/100)

Raw prompt quality by model

Each prompt is scored 0–100 on the same eight dimensions used in the main report. "Raw" is the prompt as the person typed it; "engineered" is SurePrompts' restructured version of it.

Model	Prompts	Share	Avg raw score	Avg engineered score
Perplexity	49	3.7%	29.5 / 100	79.7 / 100
Claude	311	23.5%	28.3 / 100	83.2 / 100
Copilot	62	4.7%	26.2 / 100	81.6 / 100
ChatGPT	107	8.1%	19.9 / 100	83.2 / 100
Gemini	153	11.6%	19.5 / 100	75.1 / 100
Grok	108	8.2%	19.2 / 100	60.8 / 100
No model selected	466	35.2%	18.5 / 100	82.1 / 100
DeepSeek	61	4.6%	17.9 / 100	68.7 / 100
Llama	7	0.5%	15.1 / 100	62.4 / 100

Info

Read the raw column, not the engineered one. Raw scores measure how people actually prompt. The engineered column reflects SurePrompts' model-specific formatting scored against the same rubric, so differences there are partly about output formatting, not user behavior. Claude (n=311) is the most robustly measured high scorer; Perplexity edges it on a much smaller sample.

The top of the table: Claude, Perplexity, and Copilot

The three highest-scoring audiences have something in common — they skew technical. Perplexity is built around research queries. Copilot lives inside coding and Office workflows. Claude has a heavy developer and analyst following. People arriving from those contexts are used to writing longer, more structured requests, so their raw prompts already carry more of what the rubric rewards: an explicit task, some context, and occasionally a defined output format.

Claude is the headline because its sample is large enough to trust. At 28.3/100 across 311 prompts, it isn't a fluke of a handful of power users — it's a consistent ~45% edge over Gemini (19.5) and a 42% edge over ChatGPT (19.9).
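To make the percentages above concrete, here is a minimal sketch of the arithmetic, using only the raw averages from the table. The edge is a relative difference (Claude's score divided by the other score), not a gap in percentage points; the absolute gap to ChatGPT is 8.4 points. The helper name `relative_edge` is ours, for illustration only.

```python
# Raw averages copied from the table in this article.
raw_scores = {"Claude": 28.3, "ChatGPT": 19.9, "Gemini": 19.5}


def relative_edge(a: float, b: float) -> float:
    """Percent by which score a exceeds score b."""
    return (a / b - 1) * 100


for rival in ("ChatGPT", "Gemini"):
    edge = relative_edge(raw_scores["Claude"], raw_scores[rival])
    print(f"Claude vs {rival}: {edge:.0f}% higher")
```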

The bottom of the table: the "no model" crowd

The weakest prompts now cluster in two places — and, unlike our first report, neither is ChatGPT.

No model selected (18.5/100) is the largest single group in the entire dataset — 35% of all prompts — and it sits near the bottom. People who don't even pause to pick which AI they're prompting are, unsurprisingly, the same people who don't pause to structure the request.

The genuinely lowest named models are Llama (15.1) and DeepSeek (17.9) — small, early-adopter audiences whose prompt volume here (n=7 and n=61) is too thin to read much into.

The more interesting movement is ChatGPT (19.9/100). In our first cut its users scored 17.2 — last among named models. This time they've climbed into the middle of the pack, now roughly even with Gemini (19.5) and Grok (19.2). ChatGPT is still the mainstream front door to AI, and the mainstream writes more one-line, role-less prompts than the Claude or Perplexity crowd — but the gap to the leaders has narrowed, not widened.

But here's the part that matters: everyone is failing

It's tempting to read this as "Claude users are good and ChatGPT users are bad." They aren't. Claude's 28.3 out of 100, the best score among the well-sampled groups, is still a failing prompt. The gap between the best and worst model audiences is the gap between very weak and extremely weak. Across all 1,324 prompts the average is just 21.8/100, and nearly 9 in 10 score below 50 no matter which model they target.
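As a sanity check, the overall 21.8 average can be approximately rebuilt from the per-model rows by weighting each group's raw average by its prompt count. This is a rough sketch: the per-group averages in the table are already rounded to one decimal, so the result is only expected to agree to about that precision.

```python
# (prompt count, average raw score) per group, copied from the table.
rows = {
    "Perplexity": (49, 29.5),
    "Claude": (311, 28.3),
    "Copilot": (62, 26.2),
    "ChatGPT": (107, 19.9),
    "Gemini": (153, 19.5),
    "Grok": (108, 19.2),
    "No model selected": (466, 18.5),
    "DeepSeek": (61, 17.9),
    "Llama": (7, 15.1),
}

total_prompts = sum(n for n, _ in rows.values())
# Weight each group's average by how many prompts it contributed.
weighted_avg = sum(n * score for n, score in rows.values()) / total_prompts

print(f"{total_prompts} prompts, weighted raw average {weighted_avg:.1f}/100")
```

Because the no-model group (466 prompts) and Claude (311) together make up more than half the sample, they dominate the overall figure.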

The lesson isn't "switch models." It's that the single biggest lever on your output quality is the prompt, and almost nobody is pulling it, regardless of which AI they prefer. If you are weighing a switch anyway, our breakdown of the best AI model in 2026: ChatGPT vs Claude vs Gemini compared covers where each one pulls ahead.

What to do with this

The fix is the same for every model, because the rubric is the same: give the AI a role, add specificity and context, define the output format, and show an example when the task is complex. That's the structure that separates a 20 from an 85.
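One way to picture that structure is as a small template. The sketch below is illustrative only: the function `build_prompt`, its parameters, and the section labels are hypothetical and are not how the SurePrompts builder works internally.

```python
from typing import Optional


def build_prompt(role: str, context: str, task: str,
                 output_format: str, example: Optional[str] = None) -> str:
    """Assemble a structured prompt from the elements the rubric rewards."""
    sections = [
        f"Role: {role}",
        f"Context: {context}",
        f"Task: {task}",
        f"Output format: {output_format}",
    ]
    # Include an example only when one is supplied (useful for complex tasks).
    if example:
        sections.append(f"Example:\n{example}")
    return "\n\n".join(sections)


prompt = build_prompt(
    role="You are a senior data analyst.",
    context="The audience is a non-technical marketing team.",
    task="Summarize the three most important trends in the attached sales data.",
    output_format="A bulleted list, one sentence per bullet.",
    example="- Q2 revenue rose, driven mostly by repeat customers.",
)
print(prompt)
```

Compare that with the one-line version ("summarize this sales data"), which states a task and nothing else.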

Paste any prompt into the free Prompt Quality Score tool to see exactly which of the eight dimensions you're leaving on the table.
Use the prompt builder to close those gaps automatically — it's tuned per model, whether you're writing for Claude, ChatGPT, or Gemini.
New to structuring prompts at all? Start with how to write an AI prompt.
Want to see where these tools fit together? Our rundown of the best prompt engineering tools in 2026 maps out the full workflow stack.

Notice the engineered column: structuring lifts every group's average by more than 40 points, and into the 80s for Claude, ChatGPT, Copilot, and prompts with no model selected. Even ChatGPT's thin raw prompts (19.9) reach 83.2 once restructured. That lift is exactly what the SurePrompts builder does for free on any single prompt. If you prompt for work every day, SurePrompts Pro ($3.99/month) saves every engineered prompt to cloud storage you can reach from any device and unlocks 200+ expert templates, so you start from a structured prompt for your model, not a blank box.
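The size of the lift can be read straight off the table by subtracting each group's raw average from its engineered average. A short sketch using the table's figures; keep in mind the earlier caveat that the engineered column partly reflects model-specific formatting rather than user behavior.

```python
# group: (average raw score, average engineered score), copied from the table.
scores = {
    "Perplexity": (29.5, 79.7),
    "Claude": (28.3, 83.2),
    "Copilot": (26.2, 81.6),
    "ChatGPT": (19.9, 83.2),
    "Gemini": (19.5, 75.1),
    "Grok": (19.2, 60.8),
    "No model selected": (18.5, 82.1),
    "DeepSeek": (17.9, 68.7),
    "Llama": (15.1, 62.4),
}

# Lift in points = engineered average minus raw average.
lifts = {group: round(eng - raw, 1) for group, (raw, eng) in scores.items()}

for group, lift in sorted(lifts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{group}: +{lift} points")

# Groups whose engineered average reaches 80 or more.
in_the_80s = sorted(g for g, (_, eng) in scores.items() if eng >= 80)
print("Engineered average of 80+:", ", ".join(in_the_80s))
```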

Warning

Methodology and limits. Prompts are grouped by the target model selected at generation time; "No model selected" means the user left the default. Sample sizes vary widely: No model (466), Claude (311), Gemini (153), Grok (108), and ChatGPT (107) are robust; Copilot (62), DeepSeek (61), and Perplexity (49) are directional; and Llama (7) is too small to draw conclusions from and is excluded from the headline. Scores are a deterministic 8-dimension heuristic, not human ratings. The sample is 1,324 prompts from SurePrompts users, March–June 2026, and skews toward people already seeking better prompts, so these averages are likely upper bounds on how well people prompt in general. The full aggregate dataset is published under CC BY 4.0: download the JSON (cite SurePrompts, State of AI Prompting 2026, by Model). Part of SurePrompts Research.
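The three reliability tiers can be expressed as a simple rule on sample size. The note above names the tiers but not the cutoffs, so the thresholds in this sketch (100 and 30) are our assumption, chosen only because they reproduce the grouping as stated.

```python
# Prompt counts per group, copied from the table.
sample_sizes = {
    "No model selected": 466,
    "Claude": 311,
    "Gemini": 153,
    "Grok": 108,
    "ChatGPT": 107,
    "Copilot": 62,
    "DeepSeek": 61,
    "Perplexity": 49,
    "Llama": 7,
}

# Assumed cutoffs: the article does not state the actual thresholds.
ROBUST_MIN = 100
DIRECTIONAL_MIN = 30


def tier(n: int) -> str:
    """Label a sample size as robust, directional, or too small."""
    if n >= ROBUST_MIN:
        return "robust"
    if n >= DIRECTIONAL_MIN:
        return "directional"
    return "too small"


tiers = {group: tier(n) for group, n in sample_sizes.items()}
for group, label in tiers.items():
    print(f"{group} (n={sample_sizes[group]}): {label}")
```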

Frequently asked questions

Do people write better prompts for Claude or ChatGPT?

In a sample of 1,324 real prompts submitted to SurePrompts in 2026, prompts written for Claude scored an average of 28.3 out of 100, versus 19.9 for ChatGPT — a 42% gap. Both are still failing scores, but Claude consistently drew more structured prompts.

Which AI model gets the highest-quality prompts?

Among models with a meaningful sample, Claude (28.3/100, n=311) is the strongest robustly measured audience. Perplexity (29.5) edges it on a much smaller sample (n=49), with Copilot (26.2) close behind. The weakest prompts went to Llama (15.1), DeepSeek (17.9), and sessions where no model was selected at all (18.5).

Why would prompt quality differ by AI model?

It reflects who is doing the prompting, not the model itself. Claude and Perplexity skew toward developers, analysts, and research-minded users who write longer, more structured prompts. ChatGPT's broad mainstream audience includes far more one-line, role-less prompts.

Is any model's audience writing good prompts?

No. Even the best-prompted model averages under 30 out of 100. The gap between models is the gap between bad and very bad — the entire distribution leaves most of every model's capability untapped.

How was prompt quality measured across models?

Every prompt was scored 0–100 across 8 weighted dimensions (length, role, specificity, structure, output format, constraints, context, examples) by the same deterministic engine behind the free SurePrompts Prompt Quality Score tool, then grouped by the target model selected at generation time. Sample: 1,324 prompts, March–June 2026, aggregate-only.
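To give a feel for what a deterministic, rule-based scorer over those eight dimensions might look like, here is a toy sketch. It is not the SurePrompts engine: the equal weights, the keyword checks, and the 40-word length threshold are all assumptions made up for illustration, since the actual weights and detection rules are not given here.

```python
import re

DIMENSIONS = ("length", "role", "specificity", "structure",
              "output_format", "constraints", "context", "examples")

# Assumption: equal weights. The real rubric is weighted, weights unknown.
WEIGHT = 100 / len(DIMENSIONS)


def detect(prompt: str) -> dict[str, bool]:
    """Crude, illustrative checks for each dimension (hypothetical rules)."""
    text = prompt.lower()
    return {
        "length": len(prompt.split()) >= 40,
        "role": bool(re.search(r"\b(you are|act as)\b", text)),
        "specificity": bool(re.search(r"\d", text)),
        "structure": "\n" in prompt.strip(),
        "output_format": bool(re.search(r"\b(format|table|json|bullet)", text)),
        "constraints": bool(re.search(r"\b(must|do not|don't|at most)\b", text)),
        "context": bool(re.search(r"\b(context|background|audience)\b", text)),
        "examples": bool(re.search(r"\b(example|e\.g\.)", text)),
    }


def score(prompt: str) -> float:
    """Sum the weight of every dimension the prompt satisfies (0 to 100)."""
    hits = detect(prompt)
    return sum(WEIGHT for dim in DIMENSIONS if hits[dim])


weak = "write a blog post"
strong = (
    "You are a senior editor.\n"
    "Context: the audience is new developers.\n"
    "Write 3 bullet points. Do not exceed 50 words.\n"
    "Example: - Keep functions short."
)
print(score(weak), score(strong))
```

Because every check is a fixed rule, the same prompt always gets the same score, which is the property "deterministic" refers to. Keyword rules like these are easy to fool, which is one reason heuristic scores should not be read as human quality ratings.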

Related Articles

The State of AI Prompting 2026: The Average Prompt Scores 21.8/100

The Anatomy of a Failing AI Prompt: Where the Missing 80 Points Go

How to Write AI Prompts: The Complete Guide to Getting Better Results (2026)