Tip
TL;DR: Chain-of-Density (CoD) drafts a sparse summary, then rewrites it four times. Each rewrite adds one or two entities from the source while keeping the summary's length constant. Iteration 3 usually hits the best density-to-readability ratio. Use it when you have a long source, a hard length budget, and readers who need information per word, not narrative.
Key takeaways:
- CoD is an iterative pattern, not a single prompt. Five passes: a sparse baseline, then four rewrites, each constrained to add entities without growing length. Skipping iterations defeats the point.
- The core constraint is length held constant. Without it, the model simply appends facts, which is just a longer summary. The fixed budget forces it to replace filler with entities.
- Iteration 3 is usually the sweet spot. Iterations 4 and 5 are denser but read as compressed lists. Score every pass; do not default to the last one.
- Pair CoD with an evaluation loop. An LLM-as-Judge pass or a SurePrompts Quality Rubric score per iteration lets you auto-select the best version.
- CoD fails on short sources and subjective content. There has to be enough entity density in the source to keep adding through five passes, and the task has to be "convey facts" not "convey feeling."
Why Chain-of-Density exists
Default summarization prompts have a specific failure mode. Ask any frontier model to "summarize this in 100 words" and you usually get a summary that leads with the top one or two facts, then dilutes into generic connective tissue — "the article discusses," "several experts noted," "overall the trend suggests." The word count is honored. The information density is not.
This matters when the summary is a deliverable, not a preview. Executive briefs, PR clip digests, research abstracts, and meeting recaps are read instead of the source. A thin summary wastes its budget on filler. What you want is the maximum number of source entities — people, companies, numbers, events, decisions — packed into the same word count.
Chain-of-Density, introduced by Adams et al. in the 2023 paper From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting, is a pattern designed for exactly this. The authors framed summary quality as a trade-off between density (entities per word) and readability, and found that iterative rewriting under a fixed length constraint can push density substantially higher than a single-pass summary. Human raters in their study tended to prefer denser versions up to a point — after which compression made the text feel list-like rather than narrative.
You use CoD when density matters more than flow. You skip it when the reader wants a gentle overview.
The pattern
CoD is a five-step prompt chain. The skeleton:
Step 1 — Initial summary
You will generate increasingly entity-dense summaries of the source below.
Source: <long source document>
Write a 60-word summary. Use generic filler where needed — the first
pass is intentionally sparse. Identify 1-3 informative entities
(named people, companies, numbers, events) from the source that
are NOT yet included in the summary. List them.
Output:
SUMMARY: <60 words>
MISSING_ENTITIES: <1-3 entities>
Steps 2-5 — Rewrite (run once per pass)
Rewrite the previous summary at exactly the same length (60 words).
The new summary MUST include every entity from the previous summary
AND add 1-2 entities from the MISSING_ENTITIES list (or newly identified
from the source). Make space for new entities by compressing generic
language, not by removing prior entities.
Then list 1-3 new missing entities for the next pass.
Previous summary: <iteration N-1 output>
Previous missing entities: <from iteration N-1>
Output:
SUMMARY: <60 words, strictly>
MISSING_ENTITIES: <1-3 entities>
Three properties of the skeleton are load-bearing:
- Length is held constant. Not "about 60 words." Exactly 60. Without this, the model grows the summary rather than compressing filler.
- Prior entities must be preserved. Each pass is additive, not substitutive. The model cannot drop an entity just because a new one is more interesting — that turns CoD into a random-walk rewrite.
- The missing-entity list is mechanical. You ask the model to name specific entities it is about to add. This stops the critique from collapsing into "make it better" vibes.
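The skeleton above can be driven by a short loop. This is a minimal sketch, not a definitive implementation: `call_model` stands in for whatever LLM client you use, and the prompt text is abbreviated relative to the full templates shown earlier.

```python
import re

def parse_output(text: str) -> tuple[str, list[str]]:
    """Split a pass's response into (summary, missing-entity list)."""
    summary = re.search(r"SUMMARY:\s*(.*?)\s*MISSING_ENTITIES:", text, re.S).group(1)
    missing = re.search(r"MISSING_ENTITIES:\s*(.*)", text, re.S).group(1)
    return summary, [e.strip() for e in missing.split(",") if e.strip()]

def chain_of_density(source: str, call_model, length: int = 60, passes: int = 5) -> list[str]:
    """Run the CoD chain. Returns EVERY iteration's summary so all of them
    can be scored later; never return only the final pass."""
    prompt = (
        f"Write a {length}-word summary of the source. Use generic filler where "
        "needed; the first pass is intentionally sparse. Then list 1-3 informative "
        "entities from the source NOT yet in the summary.\n"
        "Output:\nSUMMARY: <summary>\nMISSING_ENTITIES: <entities>\n"
        f"Source: {source}"
    )
    summary, missing = parse_output(call_model(prompt))
    iterations = [summary]
    for _ in range(passes - 1):
        rewrite = (
            f"Rewrite the previous summary at exactly {length} words. Keep every "
            "prior entity AND add 1-2 entities from the missing list. Make space "
            "by compressing generic language, not by removing prior entities. "
            "Then list 1-3 new missing entities.\n"
            f"Previous summary: {summary}\n"
            f"Previous missing entities: {', '.join(missing)}\n"
            "Output:\nSUMMARY: <summary>\nMISSING_ENTITIES: <entities>"
        )
        summary, missing = parse_output(call_model(rewrite))
        iterations.append(summary)
    return iterations
```

Passing `call_model` in as a parameter keeps the chain logic provider-agnostic and makes it trivial to test with a stub.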
The worked example
We will run CoD on a hypothetical source: a 3-page product-launch blog post by a fictional SaaS company we will call Meridian Labs. Treat this as a labeled hypothetical — a real source would be longer and more cluttered, but the pattern is identical.
Source excerpt (hypothetical):
Meridian Labs today announced Atlas 2.0, a cloud-native observability platform that consolidates logs, metrics, and traces into a unified query layer. The launch, made at the company's MeridianCon 2026 conference in Austin on April 15, 2026, included a keynote by CEO Priya Raman and CTO Marcus Hollis. Atlas 2.0 is priced at $18 per host per month, a 28% reduction from the prior version's effective rate. Early adopters include Finlark, a Series B fintech, and Kestrel Aviation, which reduced mean time to detection from 47 minutes to under 6 minutes during beta. The platform runs on AWS, GCP, and Azure and ships with an SDK for Python, Go, TypeScript, and Rust. Meridian raised a $90M Series C led by Accel Partners in March 2026, bringing total funding to $142M. Former Datadog VP of Engineering Lena Ortiz joined the board as part of the round. The company plans a free tier limited to 50 hosts, available May 1, 2026. Competitors named in Raman's keynote included Datadog, New Relic, and Grafana.
Now the CoD passes, each held to the 60-word budget.
Iteration 1 (sparse — the baseline):
Meridian Labs announced Atlas 2.0, a cloud-native observability platform that unifies logs, metrics, and traces. The launch happened this month at the company's annual conference. Pricing is lower than the previous version. Several early customers reported improved incident response. The product works across major cloud providers and supports popular programming languages. The company also discussed recent funding and competitive positioning.
Entity count: roughly 3 (Meridian Labs, Atlas 2.0, observability). The rest is filler.
MISSING_ENTITIES: MeridianCon 2026, $18/host/month, 28% price reduction.
Iteration 3 (the practical stopping point):
Meridian Labs launched Atlas 2.0 at MeridianCon 2026 in Austin on April 15, unifying logs, metrics, and traces under one query layer. Priced at $18 per host — a 28% cut — with a free tier of 50 hosts starting May 1. Finlark and Kestrel Aviation reported faster incident detection in beta. The platform runs on AWS, GCP, and Azure.
Entity count: roughly 13. Reads naturally; every sentence carries multiple facts.
MISSING_ENTITIES: $90M Series C, Accel Partners, Lena Ortiz.
Iteration 5 (maximum density):
Meridian Labs launched Atlas 2.0 at MeridianCon 2026 Austin April 15, unifying logs, metrics, traces. CEO Priya Raman and CTO Marcus Hollis keynoted. Priced $18/host (−28%); free tier 50 hosts May 1. Finlark, Kestrel Aviation cut MTTD 47 to 6 minutes. Runs AWS, GCP, Azure; SDKs Python, Go, TypeScript, Rust. $90M Series C led by Accel; ex-Datadog Lena Ortiz joins board.
Entity count: roughly 25. Extremely dense. Reads like a press-ticker crawl, not prose.
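The "entity count" figures above can be approximated mechanically. The heuristic below is a crude proxy, not real NER: it counts capitalized and numeric tokens per word, which inflates slightly on sentence-initial capitals but still tracks the density trend across iterations.

```python
import re

def rough_entity_density(summary: str) -> float:
    """Crude density proxy: fraction of tokens that look entity-like
    (capitalized words, or tokens containing a digit, e.g. $18, 28%, 2026).
    A production pipeline would use a proper NER model instead."""
    words = summary.split()
    entity_like = [
        w for w in words
        if re.match(r"^[A-Z][a-zA-Z]", w) or re.search(r"\d", w)
    ]
    return len(entity_like) / max(len(words), 1)
```

Comparing passes with this gives a cheap sanity check that each rewrite actually increased density rather than just reshuffling words.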
Scoring the iterations
Run each pass through the SurePrompts Quality Rubric. For CoD outputs, role clarity stays constant across passes; the dimensions that move are format structure (length compliance) and constraint tightness (entities per word). Typical scores on a well-run CoD chain sit in the 28-31 range at iterations 3 and 4, with iteration 5 sometimes dropping a point on readability.
In this example:
- Iteration 1 scores around 22/35 — the length is right but the density is low and no entity-specific structure exists yet.
- Iteration 3 scores around 30/35 — dense, still readable, most constraints met.
- Iteration 5 scores around 28/35 — densest, but readability slips; a human reader needs to work harder to parse it.
The rubric lets you pick between iterations 3 and 5 on purpose rather than by gut.
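Selecting by score rather than by gut is a one-function affair. A sketch, where `score_fn` is whatever scorer you have wired up, such as a rubric calculation or a judge-model call:

```python
def pick_best(iterations: list[str], score_fn) -> tuple[int, str]:
    """Score every CoD pass and return (best_index, best_summary).
    score_fn maps a summary string to a numeric score; ties go to the
    earlier (more readable) iteration because max() keeps the first maximum."""
    scores = [score_fn(s) for s in iterations]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, iterations[best]
```

The tie-breaking behavior is deliberate: when two passes score equally, the earlier one is less compressed and easier to read.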
When to stop iterating
The density-readability trade-off is the whole game. Each extra pass trades prose quality for entity count. Where to stop depends on the reader:
- A human skimming a brief. Iteration 3. Sentences still flow; entities are unmistakable.
- A pipeline feeding downstream extraction (e.g., an LLM-as-judge extracting structured fields). Iteration 4 or 5. Readability matters less; density matters more.
- Copy for a newsletter or digest. Iteration 2 or 3. Slight narrative flow is worth the entity loss.
In practice: always score every iteration (manually or via judge model). Never ship "whatever iteration 5 produced" by default. Related patterns like self-refine also rely on picking a stopping point rather than trusting the final pass — CoD is not unique here.
Common failure modes
Four patterns break CoD quietly. Watch for them when a chain feels off.
- Adding the same entity twice. The model re-introduces "Atlas 2.0" at iteration 4 under slightly different phrasing. Fix: require the missing-entity list to exclude any entity already in the prior summary, and dedupe case-insensitively before prompting.
- Running out of entities. On a short source, the missing-entity list dries up around iteration 3. The model starts listing low-signal entities ("April" becomes a named entity) or re-lists old ones. Fix: cap the chain at whatever iteration still produces informative missing entities.
- Length drift. Each pass creeps from 60 to 63 to 68 words. Happens most on smaller models. Fix: put a literal word count check in the prompt and reject passes that miss by more than two words.
- Incoherent structure from over-packing. At iteration 5 the model starts concatenating fragments without verbs to meet the length. Fix: treat iteration 3 as the usable output, not a stepping stone to iteration 5.
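Two of these failure modes, length drift and duplicate entities, are catchable with mechanical checks between passes. A minimal sketch of both; the tolerance and the substring-based dedupe are assumptions to tune, not fixed rules:

```python
def length_ok(summary: str, target: int = 60, tolerance: int = 2) -> bool:
    """Reject passes that drift from the word budget by more than `tolerance`.
    Counts words, not characters, to match the prompt's constraint."""
    return abs(len(summary.split()) - target) <= tolerance

def dedupe_missing(missing: list[str], prior_summary: str) -> list[str]:
    """Drop 'missing' entities already present in the prior summary.
    Case-insensitive substring match: crude, but it catches the common
    case of the model re-listing an entity under different casing."""
    low = prior_summary.lower()
    return [e for e in missing if e.lower() not in low]
```

Run `length_ok` on each pass and re-prompt on failure; run `dedupe_missing` on the entity list before building the next rewrite prompt.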
Variants
Three common adaptations:
- News digests. Prioritize people, organizations, dates, and numbers as entities. Cap at 3 passes — news readers skim.
- Technical docs (release notes, RFCs). Entities are features, API surfaces, version numbers, breaking changes. Run five passes; technical readers tolerate higher density. Instruct the model to preserve the distinction between what changed and what stayed the same.
- Meeting notes. Entities are decisions, owners, deadlines, blockers. Two or three passes is usually enough; the goal is "did we capture every decision" not maximum density. Add a final pass that pulls out action items separately.
For all variants the skeleton stays the same: fixed length, additive entities, explicit missing-entity list. The variant is about what counts as an entity.
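One way to keep the skeleton fixed while swapping entity definitions is a small per-variant config. Everything here is illustrative, not a library API; the entity types and pass caps mirror the variants described above.

```python
# Hypothetical per-variant settings; names and values are illustrative.
VARIANTS = {
    "news_digest": {
        "entity_types": ["people", "organizations", "dates", "numbers"],
        "max_passes": 3,
    },
    "technical_docs": {
        "entity_types": ["features", "API surfaces", "version numbers", "breaking changes"],
        "max_passes": 5,
    },
    "meeting_notes": {
        "entity_types": ["decisions", "owners", "deadlines", "blockers"],
        "max_passes": 3,
    },
}

def entity_instruction(variant: str) -> str:
    """Render a variant's entity definition into a prompt fragment."""
    cfg = VARIANTS[variant]
    return (
        "Treat as entities only: " + ", ".join(cfg["entity_types"])
        + f". Run at most {cfg['max_passes']} passes."
    )
```

The rendered fragment slots into the skeleton wherever "informative entities" is defined, so the chain logic itself never changes per variant.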
Our position
- Iteration 3 is the default output, not iteration 5. Unless you have evidence your reader prefers maximum density, the third pass is the one you ship.
- Always score every iteration. Picking the final pass by default treats CoD as a ritual, not a tool. Score with a rubric or judge; pick by score.
- Length must be enforced mechanically. Natural-language instructions about length fail on smaller models. Post-process length; reject non-compliant passes.
- CoD is for "convey facts" tasks. Do not use it for narrative, persuasive, or creative summaries. Information density is the wrong objective there.
- CoD stacks with other patterns. Run it as the first stage in a chain, then feed the chosen iteration into an RCAF-structured formatting prompt or a judge pass. CoD produces the dense core; other stages package it.
Related reading
- The SurePrompts Quality Rubric — scoring framework for picking the best iteration.
- Self-Refine Prompting Guide — the sibling iterative pattern, critique instead of densify.
- LLM-as-Judge Prompting Guide — automate iteration selection with a judge model.
- Prompt Chaining Guide — CoD is a chain; this covers the general mechanics.
- RCAF Prompt Structure — structure the single-iteration wrappers around CoD.
- Advanced Prompt Engineering Techniques — CoD in the wider toolkit.
- Prompt Patterns for Content Strategy — where CoD fits in a content workflow.