Most prompt engineering guides list techniques without evidence. This one cites the original papers, benchmark numbers, and real performance data behind every method.
Twelve prompting techniques now have peer-reviewed research backing them. Each solves a different problem, and choosing the wrong one wastes tokens and degrades results.
This guide covers every major technique from zero-shot to constitutional AI prompting. You'll get the research, the benchmarks, and copy-paste templates for each one.
What Are Prompt Engineering Techniques?
Prompt engineering techniques are structured methods for writing AI inputs that improve output quality. They range from adding examples to orchestrating multi-step reasoning chains.
The field began with Brown et al.'s 2020 GPT-3 paper at NeurIPS. That research proved large language models could learn tasks from examples in the prompt itself.
Since then, Google, Anthropic, Princeton, and others have published techniques that push accuracy 10–40% higher on reasoning benchmarks.
Not every technique works for every task. The Wharton School's 2025 Prompting Science Report found that chain of thought prompting adds negligible benefit on reasoning models that already think step-by-step. Matching technique to task matters more than memorizing every method.
Zero-Shot Prompting: The Baseline
Zero-shot prompting gives the model a task with no examples. You describe what you want. The model figures out the rest.
Brown et al. demonstrated this in their 2020 GPT-3 paper. GPT-3 achieved 81.5 F1 on CoQA reading comprehension with zero examples.
That number climbed to 85.0 F1 with few-shot examples — a modest but meaningful gain.
Modern models handle zero-shot far better than GPT-3 did. Claude, GPT-4, and Gemini are instruction-tuned, which means they follow directions without needing examples.
Use zero-shot when: the task is straightforward, the model is instruction-tuned, or you need to conserve tokens.
Classify the following customer email as one of:
Billing, Technical Support, Sales, or General Inquiry.
Respond with only the category name.
Email: "I can't log into my account after resetting my password."
Tip
Zero-shot works best for classification, summarization, and translation. Add examples only when zero-shot accuracy falls short.
Few-Shot Prompting: Teaching by Example
Few-shot prompting provides examples of correct input-output pairs inside the prompt. The model learns the pattern and applies it to new inputs.
Brown et al.'s GPT-3 paper proved this approach at NeurIPS 2020. GPT-3 achieved 71.2% accuracy on TriviaQA in the few-shot setting — up from 64.3% in zero-shot.
The jump was even larger on SuperGLUE. Eight examples performed comparably to fine-tuned BERT models trained on 630,000 examples.
The key insight: larger models benefit more from examples. Brown et al. found that the gap between zero-shot and few-shot performance grows with model size.
How many examples do you need?
Three to five examples hit the sweet spot for most tasks. More examples eat context window space without proportional accuracy gains.
Classify the sentiment of each product review.
Review: "This laptop is incredibly fast and lightweight."
Sentiment: Positive
Review: "Battery died after two months. Terrible quality."
Sentiment: Negative
Review: "It's okay for the price, nothing special."
Sentiment: Neutral
Review: "The camera quality blew me away on this phone."
Sentiment:
For a deeper comparison of when to use each, see our zero-shot vs. few-shot guide.
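Assembled programmatically, a few-shot prompt is just the task description, the labeled pairs, and the new input joined together. A minimal sketch of that assembly; the helper name and structure are illustrative, not from any library:

```python
# Labeled (input, output) pairs that demonstrate the pattern.
EXAMPLES = [
    ("This laptop is incredibly fast and lightweight.", "Positive"),
    ("Battery died after two months. Terrible quality.", "Negative"),
    ("It's okay for the price, nothing special.", "Neutral"),
]

def build_few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Concatenate the task description, labeled examples, and the new input."""
    blocks = [task]
    for review, label in examples:
        blocks.append(f'Review: "{review}"\nSentiment: {label}')
    # The trailing "Sentiment:" cues the model to complete the pattern.
    blocks.append(f'Review: "{query}"\nSentiment:')
    return "\n\n".join(blocks)

prompt = build_few_shot_prompt(
    "Classify the sentiment of each product review.",
    EXAMPLES,
    "The camera quality blew me away on this phone.",
)
print(prompt)
```

Keeping examples in a list like this also makes it easy to test how accuracy changes as you add or remove demonstrations.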
Chain of Thought Prompting: Step-by-Step Reasoning
Chain of thought (CoT) prompting tells the model to show its reasoning before giving an answer. This single change unlocked complex reasoning in large language models.
Wei et al. published the foundational CoT paper at NeurIPS 2022 through Google Research. Their headline result: prompting PaLM 540B with eight chain-of-thought examples achieved state-of-the-art accuracy on the GSM8K math benchmark.
It surpassed even fine-tuned GPT-3 with a verifier. CoT improved performance across arithmetic, commonsense, and symbolic reasoning tasks.
Zero-shot CoT: The "Think Step by Step" Trick
Kojima et al. (2022) discovered something surprising. Adding "Let's think step by step" to a prompt — with no examples — improved reasoning performance.
This zero-shot variant works because large models already have latent reasoning abilities. The phrase activates them.
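Kojima et al. actually used the trigger in a two-stage pipeline: a first prompt elicits free-form reasoning, and a second prompt feeds that reasoning back with an answer-extraction phrase. A minimal sketch; the function names are ours, the template phrases follow the paper:

```python
def stage_one(question: str) -> str:
    # Stage 1: elicit free-form reasoning with the trigger phrase.
    return f"Q: {question}\nA: Let's think step by step."

def stage_two(question: str, reasoning: str) -> str:
    # Stage 2: feed the reasoning back and cue a short final answer.
    return f"{stage_one(question)} {reasoning}\nTherefore, the answer is"

question = "A jacket costs $150 after a 25% discount. What was the original price?"
prompt1 = stage_one(question)
# The reasoning string below stands in for the model's stage-1 output.
prompt2 = stage_two(question, "The sale price is 75% of the original, so divide 150 by 0.75.")
```

In practice the second stage matters: without it, the model's numeric answer is buried somewhere inside the reasoning text.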
The nuance: Meincke, Mollick, et al.'s Wharton Prompting Science Report (2025) found that CoT benefits vary by model type. For non-reasoning models, CoT improves average performance.
For dedicated reasoning models like o1 and DeepSeek-R1, explicit CoT prompting adds negligible benefit. The reasoning is already built in.
Solve this step by step:
A store offers 25% off all items. An additional 10% loyalty
discount applies after the first discount. If a jacket
originally costs $200, what is the final price?
Think through each discount step before giving the answer.
For a complete breakdown of this technique, read our chain of thought prompting guide.
Tree of Thoughts: Exploring Multiple Paths
Tree of Thoughts (ToT) extends chain of thought by exploring multiple reasoning paths simultaneously. Instead of following one chain, the model generates several, evaluates them, and backtracks when needed.
Yao et al. introduced ToT at NeurIPS 2023 through Princeton and Google DeepMind. The framework uses search algorithms like breadth-first and depth-first search to navigate a tree of reasoning steps.
The performance gap is dramatic. On the Game of 24 benchmark, CoT prompting solved only 4% of problems.
ToT solved 74%. The difference comes from ToT's ability to try multiple approaches and abandon dead ends.
The tradeoff: ToT uses significantly more tokens and API calls. Each step generates multiple candidates, and each candidate gets evaluated. For simple tasks, this overhead isn't worth it.
Use ToT when: the problem has multiple valid solution paths, requires strategic planning, or involves constraint satisfaction like puzzles and scheduling.
Three experts will solve this problem independently.
Each expert shares their reasoning step by step.
If any expert realizes their approach won't work,
they backtrack and try a different path.
After all experts present their solutions,
they vote on the best answer.
Problem: Using the numbers 2, 3, 5, and 12 with basic
arithmetic operations (+, -, *, /), make the number 24.
Each number must be used exactly once.
Warning
ToT can cost 10-50x more tokens than standard prompting. Reserve it for high-stakes problems where accuracy matters more than cost.
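In the paper, an LLM both proposes candidate thoughts and scores them. To keep a sketch self-contained and runnable, the version below replaces both roles with exhaustive arithmetic over a Game of 24 state space, but the breadth-first skeleton (expand states, discard dead ends, stop at a goal) is the same shape ToT searches:

```python
from fractions import Fraction
from itertools import combinations

def solve_24(numbers):
    # Each state is (remaining values, steps so far). BFS expands a state
    # by combining two values with one arithmetic operation, so each
    # level of the tree applies one more operation.
    frontier = [(tuple(Fraction(n) for n in numbers), ())]
    while frontier:
        nxt = []
        for values, steps in frontier:
            if len(values) == 1:
                if values[0] == 24:
                    return list(steps)  # a complete derivation of 24
                continue  # dead end: abandon this branch
            for i, j in combinations(range(len(values)), 2):
                a, b = values[i], values[j]
                rest = tuple(v for k, v in enumerate(values) if k not in (i, j))
                ops = [(a + b, f"{a} + {b}"), (a * b, f"{a} * {b}"),
                       (a - b, f"{a} - {b}"), (b - a, f"{b} - {a}")]
                if b != 0:
                    ops.append((a / b, f"{a} / {b}"))
                if a != 0:
                    ops.append((b / a, f"{b} / {a}"))
                for value, desc in ops:
                    nxt.append((rest + (value,), steps + (f"{desc} = {value}",)))
        frontier = nxt
    return None  # no derivation reaches 24

print(solve_24([2, 3, 5, 12]))
```

Exact `Fraction` arithmetic matters here: a valid derivation for these numbers goes through 5/2, which floating point would only approximate.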
Self-Consistency: Majority Vote Reasoning
Self-consistency generates multiple reasoning paths for the same question, then picks the answer that appears most often. Think of it as a reliability layer on top of chain of thought.
Wang et al. published this technique through Google Research in 2022. Their paper reported striking improvements: +17.9% on GSM8K, +11.0% on SVAMP, and +12.2% on AQuA.
Additional gains appeared on StrategyQA (+6.4%) and ARC-challenge (+3.9%).
The intuition is elegant. A complex problem usually has multiple valid reasoning paths that lead to the same correct answer. By sampling diverse paths and taking the majority vote, you filter out one-off reasoning errors.
Cost consideration: Self-consistency requires generating 5–10 responses per question. Wang et al. found diminishing returns beyond 10 samples.
I will solve this problem 5 different ways, then compare
the answers to find the most reliable one.
Problem: A train travels 120 km at 60 km/h, then 80 km
at 40 km/h. What is the average speed for the entire trip?
Approach 1: [solve using total distance / total time]
Approach 2: [solve by calculating each segment separately]
Approach 3: [solve using the harmonic mean formula]
...
Final answer: [most common answer across all approaches]
Tip
Self-consistency shines on math, logic, and multi-step reasoning. It's less useful for creative or open-ended tasks where multiple valid answers exist.
ReAct: Reasoning Plus Acting
ReAct combines chain-of-thought reasoning with the ability to take actions — like searching the web, querying databases, or calling APIs. The model alternates between thinking and acting.
Yao et al. (2022) introduced ReAct through Princeton University. The framework interleaves reasoning traces with task-specific actions. The model thinks about what it knows, decides what information it needs, takes an action to get it, then reasons about the result.
On the HotPotQA benchmark, ReAct outperformed pure acting (no reasoning) on both question-answering and fact-verification tasks. The authors found that combining ReAct with CoT and self-consistency outperformed all individual methods.
ReAct's real power is grounding. Standard prompting relies entirely on the model's training data, which can be outdated or incomplete. ReAct lets the model fetch current information during reasoning.
ReAct is the foundation of modern AI agents. Tools like LangChain, AutoGPT, and Claude's tool-use all implement variants of the Thought → Action → Observation loop that ReAct pioneered.
Answer the following question by reasoning step by step
and searching for information when needed.
Question: What was the GDP growth rate of India in 2025?
Thought 1: I need current economic data for India's 2025
GDP growth. My training data may be outdated.
Action 1: Search "India GDP growth rate 2025 official data"
Observation 1: [search results would appear here]
Thought 2: Based on the search results, I can now answer.
Answer: [final answer with source citation]
Info
ReAct requires tool integration to reach its full potential. In a standard chat interface, you can simulate the pattern — but real ReAct needs the model to call external APIs.
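The control flow is easy to see in code. Below, a scripted function plays the model and a stub plays the search tool; in production both would be real, and the `Action: tool["argument"]` format and parser are illustrative, not a library API:

```python
TOOLS = {
    # Stub tool; a real agent would call a live search API here.
    "search": lambda query: f"Stub search result for: {query}",
}

def scripted_model(transcript: str) -> str:
    # Stand-in for an LLM call: request a search first, then answer.
    if "Observation:" not in transcript:
        return ('Thought: My training data may be stale.\n'
                'Action: search["India GDP growth rate 2025"]')
    return ("Thought: The observation gives me what I need.\n"
            "Final Answer: answer grounded in the observation")

def run_agent(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = scripted_model(transcript)
        transcript += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        # Parse `Action: tool["argument"]`, run the tool, append the result.
        tool, arg = step.split("Action: ", 1)[1].split("[", 1)
        observation = TOOLS[tool.strip()](arg.strip('"]'))
        transcript += f"\nObservation: {observation}"
    return "no answer within step budget"

print(run_agent("What was the GDP growth rate of India in 2025?"))
```

The step budget is the important safety valve: without it, a confused model can loop on the same action indefinitely.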
Meta-Prompting: Prompts That Write Prompts
Meta-prompting asks the AI to generate or improve prompts rather than performing the task directly. You instruct the model to write the best possible prompt for a given goal.
This technique leverages the model's understanding of what makes instructions effective. Zhou et al.'s 2022 paper "Large Language Models Are Human-Level Prompt Engineers" showed that AI-generated prompts can match or exceed human-written ones on benchmark tasks.
Meta-prompting works in two directions. Forward meta-prompting asks the model to create a prompt for a task. Reverse meta-prompting gives the model an output and asks it to infer what prompt would produce it.
You are a prompt engineering expert. Write the most
effective prompt for the following task:
Task: Get an AI to write a detailed product comparison
between two SaaS tools, including pricing, features,
pros/cons, and a recommendation.
Requirements for the prompt you write:
- Specify the output format clearly
- Include role assignment
- Request specific data points
- Set the appropriate tone and length
Write only the prompt, nothing else.
Compare Notion and Coda for me.
[Meta-prompt generates a detailed, structured prompt with role, format, criteria, and tone specifications]
SurePrompts' AI prompt generator automates meta-prompting. You describe what you need in plain English, and it builds a structured prompt with role, context, and format specifications.
Role and Persona Prompting: Setting the Expert
Role prompting assigns the model a specific identity, expertise level, and perspective before giving it a task. "You are a senior tax accountant" produces different output than "Answer this tax question."
The technique works because language models adjust their vocabulary, depth, and reasoning patterns based on the role they're given. A prompt assigning the "experienced pediatrician" role will use medical terminology appropriately and consider age-specific factors.
Persona prompting goes deeper than role assignment. It includes communication style, priorities, and constraints. A "startup CTO evaluating vendors" persona produces different analysis than a "Fortune 500 procurement officer" persona — even when asked the same question.
You are a senior cybersecurity analyst with 15 years of
experience in penetration testing and incident response.
You specialize in cloud infrastructure security for
financial services companies.
Analyze the following AWS architecture diagram for
security vulnerabilities. Prioritize findings by risk
level (Critical, High, Medium, Low). For each finding,
include: the vulnerability, potential impact, and
specific remediation steps.
[Architecture description here]
Tip
Stack roles with expertise levels for better results. "Senior data scientist specializing in NLP" outperforms "data scientist" on technical NLP tasks.
Prompt Chaining: Breaking Complex Tasks Apart
Prompt chaining splits a complex task into sequential steps, where each prompt's output feeds into the next one as input. Instead of asking one prompt to do everything, you build a pipeline.
The approach mirrors how humans handle complex work. A researcher doesn't write a paper in one sitting — they outline, draft sections, revise, and edit. Prompt chaining brings that same workflow to AI.
Prompt 1 — Research and gather key facts on the topic
Prompt 2 — Create an outline using the research output
Prompt 3 — Write each section based on the outline
Prompt 4 — Edit for clarity, accuracy, and tone
Prompt 5 — Generate a summary and headline options
Each step can use a different technique. Step 1 might use ReAct for research, Step 3 might use role prompting for voice, and Step 4 might use self-consistency for quality checking.
Chaining also reduces hallucination. When one prompt handles everything, errors compound invisibly. With chains, you can verify each step's output before passing it forward.
# Step 1: Extract key data points
Extract all numerical claims, statistics, and dates
from the following article. Output as a numbered list.
[Article text]
# Step 2: Verify claims (separate prompt)
For each data point below, assess whether it is
plausible and consistent with publicly available data.
Flag any that seem incorrect or unverifiable.
[Output from Step 1]
# Step 3: Write summary (separate prompt)
Using only the verified data points below, write a
3-paragraph summary of the article's key findings.
[Verified output from Step 2]
For detailed implementation patterns, see our prompt chaining guide.
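The three-step chain above reduces to ordinary function composition once each prompt is wrapped in a call. A sketch in which `call_model` is a hypothetical placeholder for a real LLM API call:

```python
def call_model(prompt: str) -> str:
    # Placeholder: a real chain would send `prompt` to an LLM API here.
    return f"<model output for: {prompt.splitlines()[0]}>"

def summarize_article(article: str) -> str:
    # Step 1: extract data points from the raw article.
    extracted = call_model(
        "Extract all numerical claims, statistics, and dates from the "
        f"following article as a numbered list.\n\n{article}"
    )
    # Step 2: verify, using only Step 1's output as input.
    verified = call_model(
        "Flag any data point below that seems incorrect or unverifiable.\n\n"
        f"{extracted}"
    )
    # Step 3: summarize, using only the verified list from Step 2.
    return call_model(
        "Using only the verified data points below, write a 3-paragraph "
        f"summary of the article's key findings.\n\n{verified}"
    )

print(summarize_article("Revenue grew 40% in 2024, up from 12% in 2023."))
```

Because each intermediate value is a plain string, you can log, inspect, or unit-test every stage before its output feeds the next prompt.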
Constitutional AI Prompting: Built-In Guardrails
Constitutional AI (CAI) prompting gives the model a set of principles to self-evaluate and revise its own outputs. Instead of relying on human reviewers to catch problems, the model critiques itself.
Bai et al. introduced constitutional AI through Anthropic in December 2022. The core idea: give the model a "constitution" — a set of written rules — and have it critique, then revise, its own responses against those rules. The approach uses self-critique and revision without human-labeled harmful content.
The key benefit is scalability. Human review doesn't scale when models generate millions of responses daily. CAI lets the model enforce principles like helpfulness, harmlessness, and honesty autonomously.
As a prompting technique, you can apply constitutional principles to any model. Define your rules. Ask the model to generate, critique, and revise.
Generate a response to the user question below. Then
critique your response against these principles:
Principles:
1. Be helpful and directly answer the question
2. Acknowledge uncertainty — don't present guesses as facts
3. Avoid harmful, biased, or misleading content
4. Cite sources when making factual claims
5. Be concise — no unnecessary padding
User question: "What supplements should I take for anxiety?"
Step 1: Write your initial response.
Step 2: Critique the response against each principle.
Step 3: Write a revised response addressing the critique.
Warning
Constitutional prompting adds latency and tokens. Use it for high-stakes outputs — medical advice, legal content, financial recommendations — where self-checking prevents harm.
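The generate, critique, revise loop from the template above can also be driven programmatically against any model. A control-flow sketch in which a logging stub stands in for real LLM calls; the principle list and function names are illustrative:

```python
PRINCIPLES = [
    "Be helpful and directly answer the question",
    "Acknowledge uncertainty rather than presenting guesses as facts",
    "Avoid harmful, biased, or misleading content",
]

def constitutional_pass(question: str, model) -> str:
    # `model` is any callable prompt -> text (an LLM client in practice).
    draft = model(f"Answer the user question: {question}")
    critique = model(
        "Critique the draft below against each principle:\n- "
        + "\n- ".join(PRINCIPLES)
        + f"\n\nDraft: {draft}"
    )
    return model("Revise the draft to address the critique.\n"
                 f"Draft: {draft}\nCritique: {critique}")

# Stub that records each call so the three-stage flow is visible.
calls = []
def stub(prompt: str) -> str:
    calls.append(prompt)
    return f"stage-{len(calls)} output"

final = constitutional_pass("What supplements should I take for anxiety?", stub)
```

Each stage is a separate model call, which is exactly where the extra latency and token cost mentioned above comes from.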
Structured Output Prompting: Controlling the Format
Structured output prompting constrains the model's response to a specific format — JSON, XML, Markdown tables, YAML, or custom schemas. This is essential for any application where AI output feeds into downstream code.
Without structure, parsing AI output becomes fragile string manipulation. With it, you get reliable, machine-readable data.
Modern models support structured outputs natively. OpenAI's API offers JSON mode and function calling, Claude supports tool use with defined schemas, and Gemini has structured output parameters.
Extract the following information from this job posting
and return it as valid JSON. Use null for any field not
found in the text.
{
"job_title": "string",
"company": "string",
"location": "string",
"salary_min": "number or null",
"salary_max": "number or null",
"experience_years": "number or null",
"remote_policy": "remote | hybrid | onsite | null",
"required_skills": ["string"],
"nice_to_have_skills": ["string"]
}
Job posting:
[paste job posting here]
Tip
Always provide an example of the exact output format you want. Models follow demonstrated structure more reliably than described structure.
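On the consuming side, parse defensively: models occasionally wrap JSON in prose, so locate the object before decoding and validate required keys. A sketch; the key set and helper name are illustrative:

```python
import json

REQUIRED_KEYS = {"job_title", "company", "location", "required_skills"}

def parse_job_posting(raw: str) -> dict:
    """Extract the first JSON object from a model response and check keys."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in response")
    data = json.loads(raw[start : end + 1])
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data

# A typical model reply: valid JSON wrapped in a sentence of prose.
reply = (
    "Here is the extraction:\n"
    '{"job_title": "ML Engineer", "company": "Acme", "location": "Berlin",'
    ' "salary_min": null, "required_skills": ["Python"]}'
)
job = parse_job_posting(reply)
```

Native JSON modes and function calling reduce how often this fallback triggers, but validation still catches missing fields.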
System Prompts and Custom Instructions
System prompts set persistent instructions that govern every response in a conversation. They define the model's role, constraints, output format, and behavioral boundaries before the user says anything.
System prompts differ from regular prompts in scope. A regular prompt is a single instruction. A system prompt is an ongoing context that shapes every subsequent response.
Every major AI provider supports them. OpenAI uses the "system" role in its API, and Anthropic uses a dedicated system parameter.
Custom GPTs and Claude Projects both let non-technical users set persistent instructions.
What belongs in a system prompt?
Effective system prompts cover identity, constraints, and format. They answer: Who are you? What should you never do? How should you format responses?
You are a senior technical writer for a developer
documentation platform. Your audience is experienced
software engineers.
Rules:
- Use precise technical language
- Include code examples in every explanation
- Use Python for examples unless asked otherwise
- Maximum 3 sentences per paragraph
- Never say "simply" or "just" — respect complexity
- When uncertain, say so rather than guessing
- Format all responses in Markdown
When asked about API endpoints, always include:
method, URL path, request body schema, and
response body schema with example values.
Info
Models are trained to weight system-level instructions heavily. Place your most critical instructions there, not in the user message.
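In API terms, the system prompt is simply the first, privileged message. OpenAI-style chat APIs take it as a message with role "system", while Anthropic's API accepts the same text through a separate system parameter; the prompt text below is an abbreviated version of the template above:

```python
SYSTEM_PROMPT = (
    "You are a senior technical writer for a developer documentation "
    "platform. Use precise technical language and format responses in Markdown."
)

# OpenAI-style message list: the system message persists for the whole
# session, and every user/assistant turn is appended after it.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Document the pagination parameters of our list endpoints."},
]

# e.g. client.chat.completions.create(model="gpt-4o", messages=messages)
```

Because the system message travels with every request, keep it tight: it is paid for in tokens on every turn of the conversation.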
Choosing the Right Technique: A Decision Framework
No single technique wins everywhere. The right choice depends on task complexity, accuracy requirements, and budget.
| Technique | Best For | Token Cost | Accuracy Gain |
|---|---|---|---|
| Zero-shot | Simple, clear tasks | Low | Baseline |
| Few-shot | Pattern-matching tasks | Medium | +5-10% |
| Chain of Thought | Multi-step reasoning | Medium | +10-18% |
| Tree of Thoughts | Strategic planning, puzzles | Highest | +20-70% |
| Self-Consistency | Math, logic problems | High (5-10x) | +12-18% |
| ReAct | Tasks needing current data | Medium-High | Varies |
| Meta-Prompting | Prompt optimization | Medium | Indirect |
| Role Prompting | Domain-specific tasks | Low | +5-15% |
| Prompt Chaining | Complex multi-step workflows | High | +10-30% |
| Constitutional AI | Safety-critical outputs | High | Safety-focused |
| Structured Output | Code/data integration | Low | Format reliability |
| System Prompts | Consistent behavior | Low | Consistency |
Quick Decision Tree
Is the task simple and well-defined? Start with zero-shot. Add few-shot examples if accuracy is insufficient.
Does the task require reasoning? Use chain of thought. If the stakes are high, add self-consistency.
Does the task need exploration or planning? Use Tree of Thoughts.
Does the model need external information? Use ReAct or prompt chaining with tool access.
Is the output going into code? Use structured output prompting.
Does the task need safety guardrails? Layer constitutional AI principles on top.
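The decision tree above can be encoded directly as a routing function. A sketch; the boolean task properties are illustrative names, not a standard taxonomy:

```python
def pick_techniques(task: dict) -> list[str]:
    """Map the decision-tree questions onto boolean task properties."""
    chosen = []
    if task.get("needs_external_data"):
        chosen.append("ReAct")
    if task.get("needs_planning"):
        chosen.append("Tree of Thoughts")
    elif task.get("needs_reasoning"):
        chosen.append("Chain of Thought")
        if task.get("high_stakes"):
            chosen.append("Self-Consistency")
    if task.get("machine_readable_output"):
        chosen.append("Structured Output")
    if task.get("safety_critical"):
        chosen.append("Constitutional AI")
    # Simple, well-defined tasks fall through to the zero-shot baseline.
    return chosen or ["Zero-shot"]

print(pick_techniques({"needs_reasoning": True, "high_stakes": True}))
```

A routing function like this is also a useful place to log which techniques your pipeline actually selects in production.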
Combining Techniques for Maximum Impact
The most effective prompt engineers combine techniques. Research consistently shows that hybrid approaches outperform any single method.
Yao et al.'s ReAct paper found that combining ReAct with CoT and self-consistency outperformed all individual prompting methods on knowledge-intensive tasks. Wang et al. showed that self-consistency layered on top of CoT boosted GSM8K performance by 17.9% over CoT alone.
A Real-World Stack
Here's how a production system might combine techniques for a complex research task:
# System prompt (persistent context)
You are a senior market research analyst at a
Fortune 500 consulting firm.
# Role prompting + Chain of thought + Structured output
Analyze the competitive landscape for [product category].
Think through your analysis step by step:
1. Identify the top 5 competitors
2. Evaluate each on pricing, features, and market share
3. Identify gaps and opportunities
Output your analysis as a JSON object with this schema:
{
"competitors": [...],
"market_gaps": [...],
"recommendation": "string"
}
You can build prompts that combine any of these techniques using the SurePrompts prompt builder. It handles role assignment, format specification, and context framing automatically.
The Research Behind These Techniques
Every technique in this guide traces back to published research. Here are the foundational papers:
| Technique | Paper | Authors | Year |
|---|---|---|---|
| Few-shot | Language Models are Few-Shot Learners | Brown et al. | 2020 |
| Chain of Thought | CoT Prompting Elicits Reasoning in LLMs | Wei et al. | 2022 |
| Self-Consistency | Self-Consistency Improves CoT Reasoning | Wang et al. | 2022 |
| ReAct | ReAct: Synergizing Reasoning and Acting | Yao et al. | 2022 |
| Tree of Thoughts | Tree of Thoughts: Deliberate Problem Solving | Yao et al. | 2023 |
| Constitutional AI | Constitutional AI: Harmlessness from AI Feedback | Bai et al. | 2022 |
The field moves fast. Meincke and Mollick's 2025 Wharton report found that CoT's value has decreased for reasoning-native models. Techniques that were breakthrough in 2022 may be built into model architectures by 2026.
Stay current. What works today may be redundant tomorrow as models evolve.
Frequently Asked Questions
What is the most effective prompt engineering technique?
Chain of thought combined with self-consistency produces the highest accuracy on reasoning tasks. Wang et al.'s 2022 research showed +17.9% improvement on GSM8K when combining these two techniques. For non-reasoning tasks, few-shot prompting often suffices.
Do I need to use advanced techniques with modern models?
Not always. Meincke and Mollick's 2025 Wharton study found that reasoning models like o1 gain negligible benefit from explicit CoT prompting. The reasoning is already built into the model. Test zero-shot first — add complexity only when results fall short.
How many few-shot examples should I include?
Three to five examples work for most tasks. Brown et al.'s GPT-3 research showed that performance improves with each example, but gains diminish after 5-8. More examples consume context window space without proportional accuracy improvement.
What's the difference between chain of thought and tree of thoughts?
Chain of thought follows a single reasoning path. Tree of thoughts explores multiple paths and can backtrack.
CoT is linear; ToT is branching. ToT excels when problems have multiple valid solution strategies.
When should I use prompt chaining vs. a single prompt?
Use prompt chaining when the task has distinct phases (research → outline → draft → edit). Use a single prompt when the task is cohesive and doesn't exceed the model's context window. Chaining reduces hallucination by letting you verify intermediate outputs.
Can I combine multiple prompting techniques?
Yes — and you should for complex tasks. Layer role prompting with chain of thought for domain-specific reasoning. Add self-consistency for reliability.
Use structured output for machine-readable results. The most effective production systems combine 2-3 techniques.
How does prompt engineering change with reasoning models?
Reasoning models like o1 and DeepSeek-R1 internalize step-by-step thinking. Explicit CoT prompts can hurt performance by conflicting with built-in reasoning. Focus on clear task specification, structured output, and role context.
What's the cheapest way to improve prompt performance?
Start with role prompting, which adds only a handful of tokens. Then try few-shot examples (3-5).
These two low-cost techniques solve most quality issues. Reserve self-consistency and ToT for problems requiring high accuracy.