
Least-to-Most Prompting: A Worked Example for Compositional Tasks

Least-to-Most decomposes a hard problem into easier sub-problems, solves them in order, and uses each result as input to the next. This tutorial walks through it end to end on a compositional reasoning task.

SurePrompts Team
April 22, 2026
10 min read

TL;DR

Least-to-Most prompting breaks a complex task into a sequence of easier sub-problems, solved in order with each result feeding the next — particularly effective on compositional tasks where the solution has clear prerequisite structure.

Least-to-Most decomposes a hard problem into an ordered sequence of easier sub-problems, then solves them one at a time with earlier answers feeding the later ones. Zhou et al. introduced it in 2022 in "Least-to-Most Prompting Enables Complex Reasoning in Large Language Models." The paper's motivating observation: chain-of-thought sometimes generalises badly from short reasoning chains to long ones. Force the composition to be explicit — plan first, solve in prerequisite order — and the composition gap narrows.

Tip

Least-to-Most is for tasks with visible prerequisite structure. If you can sketch the sub-problems on paper before you start solving, Least-to-Most will usually beat a single-shot prompt or a free-form CoT trace. If you cannot sketch them, reach for self-ask prompting or a ReAct-style loop instead.

Key Takeaways

  • Two phases: decompose into ordered sub-problems, then solve in order with prior answers as context for later steps.
  • The decomposition itself is a prompt output — grade it, edit it, retry it before any solving happens.
  • Single-prompt Least-to-Most is convenient; multi-prompt Least-to-Most is auditable and where most of the reliability comes from.
  • Best fit: compositional tasks with clear prerequisite ordering — math word problems, migrations, multi-step code refactors, ordered research syntheses.
  • Main failure mode is compounding error: a wrong sub-answer early poisons every later step because later steps trust it.
  • On reasoning models, Least-to-Most is usually redundant; keep it for non-reasoning chat models or when you want the sub-problem list as a reviewable artifact.

Why Least-to-Most Exists

Compositional tasks punish single-shot prompts. Ask a model to refactor a pipeline with four ordered migrations and the common failure is not ignorance of any one migration — it is a prompt that collapses the dependencies. The model guesses at migration three before migration two has been written; the guess is internally consistent but inconsistent with what the previous step was supposed to produce.

Chain-of-thought helps but not always enough. A free-form trace lets the model reason step by step, but there is no structural guarantee that step n precedes step n+1 — the model can skip, re-order, or bundle steps. Least-to-Most adds that guarantee by making the step list an explicit artifact produced before any solving happens. See chain-of-thought prompting for the weaker, unstructured cousin.

The Pattern

Two phases. They do not overlap.

Phase 1 — Decomposition. Prompt the model to list the sub-problems in prerequisite order, smallest first. The output is just the list; no solving yet.

```text
Problem: {problem}

List the sub-problems needed to solve this, in prerequisite order.
Start with the smallest/earliest sub-problem; each later sub-problem
may use answers from earlier sub-problems but not the reverse.
Output only the ordered list.
```

Phase 2 — Solving. For each sub-problem in the list, solve it with the answers to all previous sub-problems available as context. In a single-prompt version this is one long response. In a multi-prompt version each sub-problem is its own call.

```text
Problem: {problem}
Sub-problem plan:
1. {sub_1}
2. {sub_2}
...

Answers so far:
Sub-problem 1 answer: {a_1}
...
Sub-problem {k-1} answer: {a_{k-1}}

Solve sub-problem {k}:
```

The scaffold is thin on purpose. What carries the pattern is the ordering: each solving call faces a strictly smaller task than a single-shot prompt, because the full problem is replaced by one sub-problem plus the answers that come before it. The trade: more calls, smaller calls, an explicit prerequisite chain.
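The two templates can be wired into a multi-prompt pipeline. A minimal sketch in Python, where `complete` is a placeholder for any prompt-in, text-out model call (the list-parsing logic is an illustrative assumption; adapt it to your model's output format):

```python
from typing import Callable

def least_to_most(problem: str, complete: Callable[[str], str]) -> list[str]:
    """Run a two-phase Least-to-Most pipeline: decompose, then solve in order."""
    # Phase 1: decomposition. The output is the plan, nothing else.
    plan_text = complete(
        f"Problem: {problem}\n\n"
        "List the sub-problems needed to solve this, in prerequisite order.\n"
        "Start with the smallest/earliest sub-problem. Output only the "
        "ordered list, one numbered sub-problem per line."
    )
    # Parse numbered lines ("1. ...", "2. ...") into a list of sub-problems.
    subproblems = [
        line.split(".", 1)[1].strip()
        for line in plan_text.splitlines()
        if line.strip() and line.strip()[0].isdigit()
    ]

    # Phase 2: solve in order, feeding every earlier answer forward.
    answers: list[str] = []
    for k, sub in enumerate(subproblems, start=1):
        context = "\n".join(
            f"Sub-problem {i} answer: {a}" for i, a in enumerate(answers, 1)
        )
        answers.append(complete(
            f"Problem: {problem}\n"
            f"Sub-problem plan:\n{plan_text}\n\n"
            f"Answers so far:\n{context or '(none yet)'}\n\n"
            f"Solve sub-problem {k}: {sub}"
        ))
    return answers
```

Reading the plan between the two phases (and editing or retrying it) is the reliability lever the rest of this post argues for.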

Worked Example

A hypothetical legacy data pipeline refactor. A team has an overnight ETL loading CSV files into a MySQL reporting database. They want to migrate it to a streaming Kafka-to-Postgres pipeline with the warehouse reports working throughout. Names and details are illustrative.

Phase 1 — Decomposition output

```text
Problem: Migrate the overnight CSV->MySQL ETL to a streaming
Kafka->Postgres pipeline, keeping all warehouse reports working
throughout the migration. The reports are business-critical and
cannot be down for more than 15 minutes in any window.

Sub-problem plan (ordered, smallest first):
1. Inventory every table the current ETL writes, and every report
   that reads from each table.
2. Pick the target Postgres schema — data types, partition strategy,
   primary keys, and the mapping from current MySQL columns.
3. Dual-write: extend the existing ETL to write each row into both
   MySQL and the new Postgres schema, so both stores stay in sync.
4. Backfill Postgres from MySQL for the historical window the
   reports depend on, and verify row counts match.
5. Migrate one low-risk report to read from Postgres and run it in
   shadow mode alongside the MySQL version for one week.
6. Migrate the remaining reports table-by-table as shadow runs agree;
   keep dual-write active until the last report is migrated.
7. Replace the batch CSV ingestion with the Kafka stream, pointed
   at the Postgres writer only; decommission MySQL.
```

Notice the prerequisite structure. Sub-problem 4 cannot run until 3 is live because backfill assumes both stores exist. Sub-problem 5 requires 4 because shadowing needs Postgres populated. Sub-problem 7 is last because it is irreversible. The decomposition encodes the "no report down for more than 15 minutes" constraint as an ordering rule: every switch is shadowed before it is primary.

Phase 2 — Solving, in order

At this point each sub-problem becomes its own call with the earlier answers in context. For the tutorial, we show two of the seven calls to illustrate the shape.

```text
Sub-problem 1: Inventory every table the ETL writes and every
report that reads from it.

Answer 1: Eight tables — orders, order_items, customers, products,
inventory, shipments, returns, sessions. Five reports: Daily Sales
(orders, order_items, products), Customer Cohorts (customers,
sessions, orders), Inventory Aging (inventory, products), Returns
Analysis (returns, orders, order_items), Fulfillment SLA
(shipments, orders). Overlapping reads on orders and order_items.
```

```text
Sub-problem 3: Dual-write the ETL to both MySQL and Postgres.

Given answer 1 (8 tables, 5 reports) and answer 2 (Postgres
schema with partitioning on orders and shipments, BIGINT primary
keys, timestamptz for time fields):

Answer 3: Extend the ETL with a second writer targeting Postgres
using the mapping from sub-problem 2. Wrap both writes in one
transactional boundary per batch; on Postgres failure, log and
alert but do not roll back MySQL — MySQL stays authoritative
until sub-problem 6 completes. Add a checksum table with per-batch
row counts and a hash of the primary-key list, written to both
stores; a nightly reconciliation job compares and flags divergence.
Start dual-write on the two smallest tables (returns, sessions);
add the rest on a rolling schedule if reconciliation stays clean.
```

Each later sub-problem quotes earlier answers verbatim. Sub-problem 3 uses the table count and schema mapping from 1 and 2; sub-problem 4 uses the dual-write contract from 3; sub-problem 5 uses the row-count guarantee from 4. The context window for each call is the original problem, the ordered sub-problem list, and answers to prior sub-problems only.

Assembled result

The final output is the decomposition plus the seven sub-answers, plus a short synthesis naming the exit criteria: all five reports migrated, reconciliation clean for two weeks, MySQL writers disabled, old CSV job decommissioned. Because each sub-answer was produced with earlier answers in context, the synthesis is mostly mechanical — every ordering decision is already encoded.

Scoring with the quality rubric

Run the assembled result through the SurePrompts Quality Rubric. The rubric surfaces three things Least-to-Most should get right: correctness of the decomposition (ordering respects prerequisites), faithfulness of each sub-answer to its prior context, and composition (the final result follows from the sub-answers). Decomposition is the hardest to grade — the seven steps must cover the full problem and respect the no-downtime constraint. A weak plan like "1. Stand up Postgres, 2. Move all reports, 3. Delete MySQL" would score low on prerequisite ordering and never recover. Grade the list first; do not waste tokens solving a bad plan.
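Holding the three scores separately makes the "grade the list first" rule enforceable in code. A hypothetical structure for illustration (this is not the actual SurePrompts rubric, just the three-axis shape it implies):

```python
from dataclasses import dataclass

@dataclass
class LtmScore:
    decomposition: float  # prerequisite ordering, granularity, coverage
    faithfulness: float   # each sub-answer vs. the prior context it was given
    composition: float    # whether the final result follows from the sub-answers

    def gate(self, threshold: float = 0.7) -> bool:
        # Gate on the plan alone: a bad decomposition makes the other
        # two scores meaningless, so check it before any solving spend.
        return self.decomposition >= threshold
```

A run with `decomposition=0.4` fails the gate no matter how well the sub-answers score, which is exactly the "do not waste tokens solving a bad plan" rule.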

Least-to-Most vs. Chain-of-Thought vs. Self-Ask vs. Plan-and-Execute

Four reasoning scaffolds that look similar on paper and diverge in practice.

| Pattern | Shape | Best for | Artifact to grade |
| --- | --- | --- | --- |
| Chain-of-thought | Free-form step-by-step trace, one pass | Reasoning that does not decompose cleanly | The final trace |
| Least-to-Most | Ordered sub-problems, then solve in order | Compositional tasks with prerequisite structure | The sub-problem list, then each solution |
| Self-Ask | Incremental follow-ups answered as they arise | Multi-hop Q&A, questions with unknown hop count | Each follow-up and its answer |
| Plan-and-execute | Plan once, execute each step with optional tools | Agentic tasks where planning is cheaper than reacting | The plan, then each execution step |

Least-to-Most and Plan-and-Execute are siblings — both commit to a plan upfront. The difference: Least-to-Most sub-problems are usually pure reasoning, plan-and-execute steps are often tool-using actions. Self-Ask is reactive and does not commit to a list before it starts. Chain-of-thought is the weakest structural commitment — no list, no ordering. Use Least-to-Most when you can see the sub-problems from the top; Self-Ask when you cannot. See the agentic prompt stack for how these layer.

Failure Modes

Wrong decomposition order. The plan lists B before A even though B depends on A. Solving fails outright or silently assumes inputs that do not yet exist. Fix: ask the model to state each sub-problem's prerequisites, then verify the ordering respects them.
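That ordering check can be mechanical. A minimal sketch, assuming you have parsed the model's stated prerequisites into a mapping from each step's position in the plan to the positions it depends on (the mapping itself is a hypothetical intermediate, not something the pattern prescribes):

```python
def ordering_violations(prereqs: dict[int, list[int]]) -> list[tuple[int, int]]:
    """Return every (step, prerequisite) pair where the prerequisite
    appears at the same position or later than the step that needs it.
    Positions are 1-based indices into the ordered sub-problem plan."""
    return [
        (step, dep)
        for step, deps in prereqs.items()
        for dep in deps
        if dep >= step  # a prerequisite must appear strictly earlier
    ]
```

An empty result means the plan respects its own declared prerequisites; any violation is a reason to regenerate the decomposition before solving.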

Sub-task drift. The first three sub-answers stay on-problem; by sub-answer five the model is solving something adjacent. Fix: include the original problem statement and the current sub-problem's exact wording in every solving call. Do not rely on the model to remember the plan across six calls.

Compounding error. Sub-problem 2 is 90% right — one number off. Sub-problem 3 uses sub-answer 2 verbatim; by sub-answer 7 the error has travelled through five steps and looks authoritative. Fix: add a verification pass between phases that checks each sub-answer against the original problem before it enters the next step's context.
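One way to wire that verification pass, again with `complete` standing in for any prompt-in, text-out model call (the PASS/FAIL protocol is an illustrative convention, not a fixed API):

```python
from typing import Callable

def solve_with_verification(
    problem: str,
    subproblems: list[str],
    complete: Callable[[str], str],
    max_retries: int = 1,
) -> list[str]:
    """Solve sub-problems in order, gating each answer with a separate
    verification call against the ORIGINAL problem before it enters
    the context of any later step."""
    answers: list[str] = []
    for k, sub in enumerate(subproblems, start=1):
        context = "\n".join(
            f"Answer {i}: {a}" for i, a in enumerate(answers, 1)
        )
        for attempt in range(max_retries + 1):
            answer = complete(
                f"Problem: {problem}\nAnswers so far:\n{context}\n"
                f"Solve sub-problem {k}: {sub}"
            )
            verdict = complete(
                f"Problem: {problem}\nProposed answer to '{sub}': {answer}\n"
                "Is this answer consistent with the problem statement? "
                "Reply PASS or FAIL with a one-line reason."
            )
            if verdict.strip().upper().startswith("PASS"):
                break
        answers.append(answer)  # best attempt proceeds either way
    return answers
```

The verifier call sees only the original problem and one proposed answer, so a wrong sub-answer is checked against ground it cannot have poisoned yet.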

Model skipping dependencies. On a single-prompt run the model solves sub-problem 5 before 4, or bundles 4 and 5. Fix: split into a multi-prompt pipeline where each sub-problem is a separate call. This is where prompt chaining earns its keep — a chained Least-to-Most run is strictly more reliable, at the cost of tokens and latency.

Our Position

Decompose before you solve. Most of the lift is in the decomposition pass, not the solving pass. Running decomposition as its own call, reading the output, and correcting a bad plan before any solving happens is where the reliability compounds. A bad plan solved perfectly is still a bad answer.

Prefer multi-prompt Least-to-Most on anything that matters. Single-prompt is fine for demos; per-step calls are the whole point of the pattern. When a run fails, you want to know which hop failed, not scan a 2,000-token response for the break.

Skip Least-to-Most on reasoning models. Claude's extended thinking, o-series, and Gemini thinking already decompose internally. Layering on top burns tokens on a structure the model has already produced invisibly. Keep it for non-reasoning chat models and pipelines where the sub-problem list must exist as a reviewable artifact.

Grade the plan separately from the solutions. Score decomposition on prerequisite correctness, granularity, and coverage; solutions on factuality given prior context; composition on whether it follows. Three scores, not one. The same logic applies to self-ask prompting.

Do not reach for Least-to-Most when the task is not compositional. Atomic questions and open-ended exploration do not benefit — forcing a plan on them produces a plan that is brittle by step two. The pattern earns its keep on problems with visible prerequisite structure.

Neighbouring scaffolds: chain-of-thought prompting (free-form cousin), self-ask prompting (reactive cousin), plan-and-execute prompting (agentic cousin), and prompt chaining guide for running Least-to-Most as a pipeline. For reasoning-model trade-offs see prompting reasoning models. For production layering see the agentic prompt stack and advanced prompt engineering techniques. For evaluation apply the SurePrompts Quality Rubric. Glossary: least-to-most prompting, chain-of-thought, self-ask prompting, prompt chaining, plan-and-execute, reasoning model.

Try it yourself

Build expert-level prompts from plain English with SurePrompts — 350+ templates with real-time preview.

Open Prompt Builder
