
Self-Ask Prompting: A Guide to Decomposing Multi-Hop Questions

Self-Ask prompting makes the model ask and answer its own sub-questions before producing the final answer. Demonstrated on multi-hop reasoning and research-assistant tasks with concrete prompt templates.

SurePrompts Team
April 22, 2026
10 min read

TL;DR

Self-Ask prompting has the model generate sub-questions, answer each, then compose the final answer. Particularly effective for multi-hop questions where the answer depends on intermediate facts.

Self-Ask is a prompting pattern where the model explicitly writes sub-questions and answers them before producing the final response. It was introduced by Press et al. in the 2022 paper "Measuring and Narrowing the Compositionality Gap in Language Models," which observed that models often know each atomic fact required by a multi-hop question but fail to compose them when asked directly. The fix is a scaffold that forces decomposition. The pattern is simple, but it closes much of the gap on compositional reasoning.

Key Takeaways

  • Self-Ask is the explicit, structured cousin of chain-of-thought — sub-questions instead of a free-form trace.
  • It wins on multi-hop, compositional questions and adds little on atomic ones.
  • Wiring each sub-question to a search or retrieval tool turns Self-Ask into a minimal research agent.
  • Reasoning models already decompose internally, so Self-Ask is mostly for non-reasoning chat models and for pipelines that need auditable sub-questions as artifacts.
  • The common failure is compositional error — every sub-answer correct, but the final composition wrong — so evaluate the final step separately.

The Problem Self-Ask Solves

Ask a plain chat model: "Who was the president of the country that won the 1998 FIFA World Cup?" Two hops — which country won in 1998, and who was its president at that time. The model might answer correctly. It might also collapse the hops into a single confident guess and name the wrong person. The failure is not ignorance; Press et al. showed that models often know each fact in isolation and still fail to compose them.

This is the compositionality gap. The knowledge is there, but nothing in a naive prompt forces the model to lay out the intermediate steps. Chain-of-thought helps — "let's think step by step" gets you part of the way — but the trace is still free-form, and the model can skip from premise to conclusion without ever writing down the intermediate fact.

Self-Ask forces the decomposition. The scaffold makes each hop an explicit sub-question; the final answer cannot appear until every sub-question has an intermediate answer. That structural requirement closes much of the gap.

The Pattern

The literal scaffold is short. Drop this above the user's question:

code
Question: {question}
Are follow up questions needed here: Yes/No

If yes, format each step as:
Follow up: {sub-question}
Intermediate answer: {answer}

Repeat until you have enough to answer the original question.
Then write:
So the final answer is: {answer}

The model decides whether the question is compositional. If it is, it generates a sub-question, answers it, and either continues with another or stops and composes the final answer. The scaffold is deliberately thin — it does not prescribe the content of the sub-questions, just the shape of the trace.

Press et al. used a few-shot variant, seeding the model with two or three worked examples. In 2026 most frontier chat models follow the zero-shot version reliably, but if you see drift — the model skipping straight to "So the final answer is" without any follow-ups — adding one worked example usually fixes it. What you are encoding is: before you answer, ask what you would need to know, then answer that first.
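Wiring the scaffold into a prompt is a string-templating exercise. A minimal sketch, assuming a caller who supplies the question and, optionally, worked few-shot examples as plain strings (the function name is ours, not from the paper):

```python
# The literal Self-Ask scaffold. Inner braces are doubled so that
# str.format only substitutes {question} and leaves the model-facing
# placeholders ({sub-question}, {answer}) intact.
SELF_ASK_SCAFFOLD = """Question: {question}
Are follow up questions needed here: Yes/No

If yes, format each step as:
Follow up: {{sub-question}}
Intermediate answer: {{answer}}

Repeat until you have enough to answer the original question.
Then write:
So the final answer is: {{answer}}"""


def build_self_ask_prompt(question, examples=None):
    """Prepend optional few-shot worked examples, then the scaffold."""
    parts = list(examples or [])
    parts.append(SELF_ASK_SCAFFOLD.format(question=question))
    return "\n\n".join(parts)
```

To move from zero-shot to few-shot, pass one or two complete worked traces in `examples`; the scaffold itself stays unchanged.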

Worked Example — Pure Self-Ask

Here is an illustrative trace on a hypothetical business question. The scenario, data, and names are made up; the point is to show the shape of the pattern.

code
Question: Which of our top 3 2025 customers by revenue
has a renewal coming up before June 2026?

Are follow up questions needed here: Yes.

Follow up: Who were our top 3 customers by revenue in 2025?
Intermediate answer: Acme Logistics, Brightwave Media, and Cerulean
Health were the top three customers by 2025 revenue.

Follow up: When is each of those three customers' contracts up
for renewal?
Intermediate answer: Acme Logistics renews in September 2026.
Brightwave Media renews in April 2026. Cerulean Health renews in
August 2026.

Follow up: Which of those renewal dates falls before June 2026?
Intermediate answer: Only Brightwave Media — April 2026 is before
June 2026. The other two renew later in 2026.

So the final answer is: Brightwave Media, with renewal in April 2026.

Three things to notice. First, the sub-questions are narrower than the original — each asks for one piece of information. Second, the final answer is a composition step, not a lookup. Third, the trace is auditable. If Brightwave's renewal is actually in July 2026, you can see exactly which sub-answer was wrong and fix that hop rather than re-running the whole prompt. A single-prompt version would force the model to hold three customer names, three renewal dates, and a comparison rule in one pass; on a noisy day it collapses hops and names the wrong customer with full confidence.
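Because the trace has a fixed shape, the audit can be mechanical. A sketch of a parser that splits a trace into (sub-question, intermediate answer) hops plus the final answer — the regexes assume the model followed the scaffold's exact phrasing:

```python
import re

# Each hop runs from "Follow up:" through its "Intermediate answer:",
# ending where the next hop or the final-answer line begins.
HOP_RE = re.compile(
    r"Follow up:\s*(.*?)\s*Intermediate answer:\s*(.*?)\s*"
    r"(?=Follow up:|So the final answer is:)",
    flags=re.DOTALL,
)
FINAL_RE = re.compile(r"So the final answer is:\s*(.*)", flags=re.DOTALL)


def parse_self_ask_trace(trace):
    """Return ([(sub_question, intermediate_answer), ...], final_answer)."""
    hops = HOP_RE.findall(trace)
    final = FINAL_RE.search(trace)
    return hops, (final.group(1).strip() if final else None)
```

With the trace parsed, "fix that hop" becomes concrete: locate the bad (sub-question, answer) pair, re-run only that step, and re-compose.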

Worked Example — Self-Ask Plus a Search Tool

The pattern is most useful when each sub-question can trigger retrieval. The scaffold stays the same; you add a rule that says every Follow up goes through the search tool before the Intermediate answer is written.

code
Question: Which author of a New York Times bestseller in March 2026
previously won a National Book Award?

Are follow up questions needed here: Yes.

Follow up: Which books were on the NYT bestseller list in March 2026?
[search("NYT bestseller list March 2026")]
Intermediate answer: The top five titles in March 2026 were
[hypothetical list of titles and authors].

Follow up: Of the authors on that list, have any won the
National Book Award?
[search("National Book Award winners authors")]
Intermediate answer: [Hypothetical Author X] won the National Book
Award in 2014 for [Hypothetical Title].

So the final answer is: [Hypothetical Author X], whose March 2026
bestseller was [Hypothetical Title] and who previously won the
National Book Award in 2014.

This is a degenerate research agent — less flexible than a full ReAct loop, but easier to audit and cheaper to run. The model cannot improvise new tools or re-plan mid-trace; it can only ask the next sub-question and search for the answer. That rigidity is a feature on well-scoped research tasks.

The same shape generalises to RAG pipelines. Each Follow up becomes a retrieval query; each Intermediate answer is written from the retrieved passages. Our RAG prompt engineering guide covers the retrieval side; Self-Ask supplies the reasoning structure that decides what to retrieve.
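The search-augmented loop is small enough to sketch end to end. In this sketch, `generate` (the model call) and `search` (the retrieval tool) are caller-supplied placeholders, not a real API, and — as in Press et al.'s search variant — the search result is inserted directly as the intermediate answer:

```python
import re


def self_ask_with_search(question, generate, search, max_hops=5):
    """Run a Self-Ask loop where every sub-question goes through `search`.

    `generate(prompt)` returns the model's next scaffold line(s);
    `search(query)` returns a text snippet. Both are assumptions here.
    """
    prompt = f"Question: {question}\nAre follow up questions needed here: Yes.\n"
    for _ in range(max_hops):
        step = generate(prompt)
        final = re.search(r"So the final answer is:\s*(.*)", step)
        if final:
            return final.group(1).strip()
        sub_q = re.search(r"Follow up:\s*(.*)", step)
        if not sub_q:
            break  # model drifted off the scaffold; bail out
        evidence = search(sub_q.group(1).strip())
        # Feed the retrieved snippet back as the intermediate answer,
        # then let the model decide: another hop, or compose.
        prompt += f"{step}\nIntermediate answer: {evidence}\n"
    return None
```

The `max_hops` cap is the whole control flow; there is no planner and no re-planning, which is exactly the rigidity-as-feature point above.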

When to Use Self-Ask vs. ReAct vs. Chain-of-Thought

All three are reasoning scaffolds; they differ in structure and in what they assume about the environment.

| Pattern | Best for | Structure | Tool use |
| --- | --- | --- | --- |
| Chain-of-thought | Reasoning that does not decompose into discrete sub-questions | Free-form step-by-step trace | None |
| Self-Ask | Compositional, multi-hop questions with clear sub-questions | Explicit Follow up / Intermediate answer pairs | Optional per sub-question |
| ReAct | Open-ended agentic tasks where the environment surprises you | Interleaved Thought / Action / Observation | Core to the pattern |

Use chain-of-thought when the question needs reasoning but does not break cleanly into sub-questions — most math word problems, logic puzzles, "explain the tradeoff" prompts. Use Self-Ask when the question is compositional and you can imagine the sub-questions you would ask on paper. Use ReAct when the path is not knowable up front and each observation changes what you do next.

Self-Ask and ReAct look similar when both are paired with search, but the shape differs. Self-Ask decides sub-questions from the original question; ReAct decides each next action from the last observation. Self-Ask is planning-flavoured, ReAct is reactive. For predictable sub-questions (product comparisons, fact-checking, structured lookups), Self-Ask is lighter. Prompt chaining can wrap a Self-Ask step inside a larger pipeline — one stage decomposes, the next scores the sub-answers, the next composes the final response. See the agentic prompt stack for how these scaffolds layer.

Failure Modes

The model refuses to decompose. It answers "Are follow up questions needed here: No" on a question that obviously needs them, then produces a confident single-hop guess. Fix: add one or two few-shot examples showing decomposition, or tighten the instruction to "Assume follow up questions are needed unless the question is a single atomic fact."

Sub-questions drift from the original. The first follow-up is on topic, the second veers into tangential territory, the third answers a different question entirely. Fix: include the original question in the scaffold for every hop, and instruct the model to state how each sub-question relates back.

Compositional error — right hops, wrong composition. Every intermediate answer is correct, and the final "So the final answer is" line names something that does not follow. The sneakiest failure because the trace looks clean. Fix: add a penultimate hop that restates the intermediate answers and the composition rule before writing the final answer.

Over-decomposition on atomic questions. "What is the capital of France?" emits three pointless sub-questions before Paris. Fix: do not use Self-Ask when you know the question is atomic; or trust the Yes/No gate to route simple questions directly.
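Several of these failures can be caught mechanically, before any LLM-as-Judge step. A sketch of cheap structural checks on a trace, assuming the scaffold's exact phrasing:

```python
import re


def check_trace(trace, min_hops=1):
    """Return a list of structural problems found in a Self-Ask trace."""
    problems = []
    hops = re.findall(r"Follow up:", trace)
    answers = re.findall(r"Intermediate answer:", trace)
    if len(hops) < min_hops:
        # Refusal to decompose: model skipped straight to an answer.
        problems.append("no decomposition")
    if len(hops) != len(answers):
        # A Follow up with no Intermediate answer (or vice versa).
        problems.append("unpaired hop")
    if "So the final answer is:" not in trace:
        # Trace never reached the composition step.
        problems.append("missing final answer")
    return problems
```

These checks gate the expensive evaluation: only traces that pass go to a judge model for factuality and composition scoring.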

Score each sub-question and intermediate answer against the original, then score the final composition as a separate criterion. The SurePrompts Quality Rubric covers factuality, relevance of sub-questions, and whether the final answer follows from the trace. LLM-as-Judge works well here because the decomposed trace is easier to grade than a free-form CoT.

Our Position

Self-Ask is underrated for non-agentic pipelines. Most 2026 discussion of multi-step reasoning jumps straight to ReAct or full agents, which are heavier than many tasks need. For predictable hops with no mid-flight adaptation, Self-Ask is a tenth of the code and gives you an auditable trace.

Do not use Self-Ask on reasoning models. Claude's extended thinking, o-series, and Gemini thinking already decompose; stacking Self-Ask on top is redundant. Save it for non-reasoning chat models and for pipelines where you want the sub-questions as artifacts.

Pair Self-Ask with retrieval before reaching for an agent. Much of what teams build agents for — "look up these three facts and compose an answer" — is a Self-Ask trace with a search tool. Start narrow, graduate to an agent only when the narrow pattern breaks.

Grade composition separately from hops. "Final answer correct: yes/no" misses the class of failures where the trace is right and the composition is wrong. Keep the scaffold thin — resist "Follow up category," "Confidence score," "Source citation" on every step until you have evidence the plain version is underperforming.

For the reasoning patterns that sit next to Self-Ask: chain-of-thought prompting (the free-form cousin), ReAct prompting guide (the agentic cousin), and prompt chaining guide (how to compose Self-Ask steps into larger pipelines). For retrieval-flavoured Self-Ask see the RAG prompt engineering guide. For layering patterns into production systems see the agentic prompt stack and advanced prompt engineering techniques. For evaluating Self-Ask outputs use the SurePrompts Quality Rubric. If you are new to the broader space, start with prompt engineering basics 2026. And browse the glossary for compact definitions of self-ask prompting, chain-of-thought, ReAct prompting, prompt chaining, chain-of-verification, and RAG.

Try it yourself

Build expert-level prompts from plain English with SurePrompts — 350+ templates with real-time preview.

Open Prompt Builder
