The biggest prompting mistake people make with reasoning models is doing too much. These models reward clarity over complexity, goals over procedures, and restraint over hand-holding. Everything you learned about chain-of-thought prompting in 2023 needs to be reconsidered.
What Are Reasoning Models?
Reasoning models are a class of language models that perform internal chain-of-thought reasoning before producing a visible response. Unlike standard models, which start generating the visible answer immediately, reasoning models allocate dedicated computation — sometimes called test-time compute — to think through a problem before committing to an answer.
The three major reasoning model families in 2026 are:
- OpenAI o3 / o4-mini — OpenAI's reasoning series, successors to o1. o3 is the full-power model; o4-mini is a smaller, faster, cheaper variant that still reasons internally.
- Claude with extended thinking — Anthropic's approach, available on Claude Opus 4.5 and the Sonnet/Opus 4.6 family. Thinking tokens appear in a separate block from the final response. Claude 4.6 introduced adaptive thinking with effort levels.
- Gemini 2.5 Pro / Flash with Deep Think — Google's reasoning mode, where the model generates and evaluates multiple hypotheses in parallel before answering.
What makes these models fundamentally different from standard models like GPT-4o, Claude Haiku, or Gemini Flash (without Deep Think) is where the reasoning happens. In a standard model, the only reasoning that occurs is in the visible output — which is why chain-of-thought prompting works so well on them. You're literally giving the model space to reason by asking it to "think step by step."
In a thinking model, the reasoning happens in hidden tokens that you never see (or see in a separate thinking block). The model has already done the step-by-step work internally. Your job shifts from eliciting reasoning to directing it.
Info
Standard model prompting = getting the model to reason at all (chain-of-thought, few-shot examples, step-by-step instructions).
Reasoning model prompting = directing a model that already reasons toward the right problem, with the right constraints, at the right depth.
Why Traditional Prompting Advice Backfires
This is where most people get tripped up. The techniques that made you effective with GPT-4 and Claude 3 can actively hurt your results with reasoning models.
The chain-of-thought paradox
On a standard model, adding "think step by step" to a math problem boosts accuracy. On o3 or Claude with extended thinking, the same phrase is wasteful at best and counterproductive at worst.
Why? The model is already thinking step by step — internally, using dedicated thinking tokens. When you ask it to also think step by step in the visible output, one of two things happens:
- The model narrates its internal reasoning in the final answer, doubling the token cost without improving quality. You pay for the thinking tokens AND the narrated reasoning.
- The model constrains its internal reasoning to match your specified steps, potentially missing better approaches it would have found on its own.
Here's what this looks like in practice:
--- STANDARD MODEL (GPT-4o) ---
Prompt: "A farmer has 847 sheep. He sells 293, buys 156 more, then
loses 12% of his flock to disease. How many sheep remain?
Think step by step."
This works well. The model writes out each step, catches errors,
and arrives at the correct answer.
--- REASONING MODEL (o3) ---
Prompt: "A farmer has 847 sheep. He sells 293, buys 156 more, then
loses 12% of his flock to disease. How many sheep remain?"
No "think step by step" needed. o3 already runs the calculation
internally across its reasoning tokens. The answer is correct
and concise. Adding "think step by step" would just make the
response longer without improving accuracy.
Over-specifying steps constrains the model
This is subtler and more damaging. When you give a reasoning model a detailed procedure — "First do X, then do Y, then do Z" — you're replacing its reasoning with yours. And the model's reasoning is usually better than the procedure you'd write, because it can explore approaches you wouldn't think of.
--- OVER-SPECIFIED (hurts reasoning models) ---
Prompt: "Review this code for security vulnerabilities.
Step 1: Check for SQL injection.
Step 2: Check for XSS.
Step 3: Check for CSRF.
Step 4: Check for authentication bypasses.
Step 5: Summarize findings."
--- GOAL-ORIENTED (works with reasoning models) ---
Prompt: "Review this code for security vulnerabilities.
Focus on issues that could lead to data exposure or
unauthorized access. Rank findings by severity."
The first prompt limits the model to four categories of vulnerability. The second lets the model apply its own expertise, which might surface issues you never thought to look for — race conditions, insecure deserialization, SSRF, logic flaws in business rules.
Warning
The rule of thumb: If you're writing a prompt that reads like a procedure manual, you're probably constraining a reasoning model. State what you want, not how to get there.
Few-shot examples can hurt
On standard models, few-shot examples are one of the most reliable prompting techniques. You show the model two or three examples of input-output pairs, and it pattern-matches to produce the right format.
On reasoning models, few-shot examples for reasoning tasks can backfire. The model may anchor on the reasoning pattern in your examples instead of applying its own deeper analysis. For classification, formatting, and style tasks, few-shot examples still work fine. But for tasks where you want the model to actually think — problem-solving, analysis, debugging — let it think without anchors.
The 7 Principles of Reasoning Model Prompting
These principles apply across o3, Claude extended thinking, and Gemini Deep Think. Model-specific techniques come later.
1. State the goal, not the steps
Tell the model what success looks like, not how to achieve it. Reasoning models are path-finders — give them the destination and let them find the route.
--- WEAK ---
"Analyze this dataset by first calculating the mean, then the
median, then the standard deviation, then identifying outliers
using the IQR method, then summarizing trends."
--- STRONG ---
"Analyze this dataset. Identify the most important statistical
patterns and any anomalies that would affect business decisions.
Present findings in order of significance."
2. Give hard problems, not easy ones
Reasoning models have a minimum useful complexity. Simple tasks — summarizing a paragraph, translating a sentence, classifying sentiment — don't benefit from internal reasoning. The model spends thinking tokens on a problem that doesn't need them, and you pay for that computation without getting better results.
Reserve reasoning models for tasks that genuinely benefit from deliberation: multi-step math, code architecture decisions, legal analysis, research synthesis, debugging, strategic planning.
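This routing decision can be sketched as a small dispatcher. The task categories and model names below are illustrative placeholders, not an official taxonomy — the point is the shape of the decision, not the specific strings.

```python
# Hypothetical router: send deliberation-heavy tasks to a reasoning
# model, everything else to a cheaper, faster standard model.
REASONING_TASKS = {
    "multi_step_math", "code_architecture", "legal_analysis",
    "research_synthesis", "debugging", "strategic_planning",
}

def pick_model(task_type: str) -> str:
    """Return an illustrative model name based on the task category."""
    if task_type in REASONING_TASKS:
        return "o3"       # reasoning model: pays off on hard tasks
    return "gpt-4o"       # standard model: sufficient for the rest
```

In practice you would tune the category list against your own workload and cost data rather than hard-coding it.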
3. Let the model choose its approach
This is the hardest principle for experienced prompt engineers. You've spent years learning to be prescriptive — specifying frameworks, methodologies, and analysis structures. With reasoning models, that prescription becomes a ceiling.
Instead of "Use the SWOT framework to analyze this business," try "Analyze this business's competitive position. Use whatever analytical framework best fits the situation." The model may choose SWOT. It may combine multiple frameworks. It may invent an approach tailored to the specific business. All of these are likely better than forcing SWOT on a problem that might not suit it.
4. Provide constraints, not procedures
There's a critical difference between constraints and procedures. Constraints define the boundaries of acceptable output. Procedures define the path to get there.
Constraints (good for reasoning models):
- "Keep the response under 500 words"
- "Only use information from the provided documents"
- "All code must be Python 3.12 compatible"
- "Do not recommend solutions that cost more than $10,000/month"
Procedures (often harmful for reasoning models):
- "First read the document, then extract key points, then..."
- "Start by listing all variables, then..."
- "Begin your analysis with a summary of..."
Constraints let the model reason freely within boundaries. Procedures force it onto a track.
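One way to enforce this habit in an application is to build prompts from a goal plus a constraint list, with no slot for procedural steps at all. This is a minimal sketch — the function and its output format are illustrative, not a standard:

```python
def build_prompt(goal: str, constraints: list[str]) -> str:
    """Assemble a goal-plus-constraints prompt with no procedural steps."""
    lines = [goal]
    if constraints:
        lines.append("")
        lines.append("Constraints:")
        lines.extend(f"- {c}" for c in constraints)
    return "\n".join(lines)

prompt = build_prompt(
    "Analyze this dataset and surface the patterns that matter most.",
    ["Keep the response under 500 words",
     "Only use information from the provided documents"],
)
```

Because the builder has no "steps" parameter, there is simply nowhere to put a procedure.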
5. Use verification requests
One area where you should be prescriptive with reasoning models: asking them to check their work. Reasoning models can still make mistakes, and explicitly requesting verification causes the model to allocate additional thinking tokens to double-checking.
"Solve this optimization problem. After reaching a solution,
verify it by checking it against the original constraints
and testing with edge cases."
"Write a function that handles concurrent database writes.
Before finalizing, check for race conditions, deadlocks,
and data integrity issues."
Verification requests don't constrain how the model thinks — they add a quality gate at the end.
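If you build prompts programmatically, the quality gate can be appended mechanically. A sketch, with hypothetical wording you would adapt to your own tasks:

```python
def with_verification(prompt: str, checks: list[str]) -> str:
    """Append a verification request as a final quality gate,
    leaving the main task statement untouched."""
    gate = ("After reaching a solution, verify it by checking: "
            + "; ".join(checks) + ".")
    return f"{prompt}\n\n{gate}"

out = with_verification(
    "Solve this optimization problem.",
    ["the original constraints", "edge cases"],
)
```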
6. Separate output format from reasoning
Tell the model what to think about and how to format the answer in distinct sections. Mixing formatting instructions into the problem description clutters the reasoning space.
--- MIXED (cluttered) ---
"Analyze the following contract and list each risk as a bullet
point with the clause number in bold and a severity rating of
high/medium/low in parentheses, making sure to consider
indemnification limits, liability caps, and termination clauses."
--- SEPARATED (clean) ---
"Analyze the following contract for legal risks. Consider all
clauses that could expose us to financial liability, operational
disruption, or compliance issues.
Format: Bullet list. Each item: **Clause [number]** —
[risk description] (severity: high/medium/low)"
The separated version lets the model think about risks without simultaneously worrying about formatting. The formatting instructions are there, but they're clearly secondary to the analytical task.
7. Budget your thinking tokens wisely
Every reasoning model charges for thinking tokens. o3 and Claude extended thinking both bill for the internal reasoning that you may never see. This means that using a reasoning model on a simple task isn't just slower — it's more expensive for no benefit.
Tip
Cost rule of thumb: If a task takes you less than 10 seconds to do mentally, it probably doesn't need a reasoning model. Use GPT-4o, Claude Sonnet (without extended thinking), or Gemini Flash instead.
Budget thinking tokens by adjusting the model's reasoning effort parameter (covered in the model-specific sections below) rather than by writing shorter prompts. A short prompt with high reasoning effort is better than a long prompt with low reasoning effort.
o3 and o4-mini: OpenAI's Reasoning Models
OpenAI's reasoning model family includes o3 (the flagship) and o4-mini (the cost-efficient variant). Both perform internal chain-of-thought reasoning before responding.
The reasoning effort parameter
o3 and o4-mini expose a reasoning_effort parameter with three levels: low, medium, and high. This controls how many thinking tokens the model allocates before responding.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="o3",
    reasoning={"effort": "high"},
    input=(
        "Design a distributed caching system that handles "
        "10 million requests per second with sub-millisecond "
        "latency. Consider consistency models, eviction "
        "policies, and failure modes."
    ),
)
Low effort — Good for straightforward questions where the model needs some reasoning but not deep analysis. Coding tasks with clear specifications, factual questions that require synthesis.
Medium effort — The default for most tasks. Analysis, multi-step problems, writing that requires planning.
High effort — Complex math, formal proofs, multi-file code generation, problems with many interacting constraints. Use this when accuracy matters more than speed or cost.
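The three tiers map naturally onto a small heuristic. The predicates below are placeholders — in a real system you would estimate difficulty from the task itself, but the shape of the mapping is the point:

```python
def choose_effort(is_proof_or_multifile: bool, needs_analysis: bool) -> str:
    """Illustrative mapping from task traits to a reasoning_effort value."""
    if is_proof_or_multifile:
        return "high"      # proofs, multi-file codegen, many constraints
    if needs_analysis:
        return "medium"    # analysis, multi-step problems, planned writing
    return "low"           # clear specs, factual synthesis
```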
When to use o3 vs o4-mini
o4-mini is not just a smaller o3. It's specifically optimized for speed and cost while maintaining strong reasoning on STEM tasks. Here's how to choose:
- o4-mini — Math, science, code generation, structured data extraction, tasks where the answer can be verified. o4-mini is exceptionally cost-effective for these: it often matches o3's accuracy at a fraction of the cost.
- o3 — Open-ended analysis, creative problem-solving, tasks requiring broad world knowledge, situations where you need the highest ceiling on reasoning quality.
o3-specific prompting patterns
Use structured outputs for reliable formatting. o3 supports structured outputs (JSON Schema), which means you can separate the reasoning from the output format entirely. Let o3 reason freely and constrain only the final output shape.
response = client.responses.create(
    model="o3",
    reasoning={"effort": "high"},
    input=(
        "Analyze these three investment options and recommend "
        "the best risk-adjusted return for a 5-year horizon."
    ),
    text={
        "format": {
            "type": "json_schema",
            "name": "investment_analysis",
            "schema": {
                "type": "object",
                "properties": {
                    "recommendation": {"type": "string"},
                    "reasoning_summary": {"type": "string"},
                    "risk_score": {"type": "number"},
                    "confidence": {"type": "string"}
                }
            }
        }
    }
)
Don't use system prompts for reasoning instructions. o3 and o4-mini support system prompts (unlike o1, which didn't). Use the system prompt for persona and constraints, but keep reasoning guidance in the user message where it's closer to the problem context.
Use developer messages for persistent context. If you're building a multi-turn application with o3, use the developer message role for instructions that should persist across turns. This keeps the user turn clean for the actual problem.
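Concretely, the split looks like building the input list with two roles. This is a sketch following the Responses API message convention; the instruction and problem text are hypothetical:

```python
def build_turns(persistent_instructions: str, user_problem: str) -> list[dict]:
    """Persistent guidance goes in a developer message; the user
    message stays clean for the actual problem."""
    return [
        {"role": "developer", "content": persistent_instructions},
        {"role": "user", "content": user_problem},
    ]

turns = build_turns(
    "You are a code reviewer. All code must be Python 3.12 compatible.",
    "Review this function for race conditions.",
)
# The list would then be passed as input= to client.responses.create(...).
```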
Claude Extended Thinking: Anthropic's Approach
Claude's extended thinking is available on Opus 4.5 and the Sonnet/Opus 4.6 family. When enabled, Claude generates a thinking block (visible to you via the API but separate from the final response) followed by the response itself. Claude 4.6 added adaptive thinking with effort levels.
Thinking budget and effort levels
You control Claude's thinking through a budget_tokens parameter that sets the maximum number of tokens the model can use for internal reasoning. Claude 4.6 also introduced effort levels — low, medium, high (default), and max — which control how aggressively the model uses its thinking budget.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6-20260412",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[{
        "role": "user",
        "content": (
            "Review this database schema for a multi-tenant "
            "SaaS application. Identify design flaws that "
            "would cause problems at scale."
        )
    }]
)
Setting the thinking budget too low on a hard problem forces the model to truncate its reasoning. Setting it too high on an easy problem wastes money. The adaptive effort levels help — at low effort, Claude might skip thinking entirely on simple tasks, even if the budget allows for more.
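If you score task difficulty yourself, the budget can be scaled rather than hard-coded. The multiplier, floor, and cap below are arbitrary illustration values, not recommended settings:

```python
def thinking_budget(difficulty: int, floor: int = 1024,
                    cap: int = 32000) -> int:
    """Scale budget_tokens with an estimated difficulty (0-10),
    clamped so easy tasks don't get zero and hard tasks don't
    run away. All numbers are illustrative."""
    budget = difficulty * 3200
    return max(floor, min(cap, budget))
```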
XML tag structure still matters
Claude still responds well to XML tags for structuring content within the prompt — wrapping documents, code blocks, and constraints. But avoid wrapping thinking instructions in XML tags. The model's thinking is already managed by the thinking budget; adding <thinking_instructions> tags just wastes context.
<!-- GOOD: XML for content structure -->
<code_to_review>
def process_payment(user_id, amount):
balance = get_balance(user_id)
if balance >= amount:
deduct(user_id, amount)
return True
return False
</code_to_review>
<constraints>
- Must handle concurrent payments
- Must be idempotent
- Must log all transactions
</constraints>
Review this payment processing function for correctness
and production readiness.
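When prompts are assembled in code, a small helper keeps the tag structure consistent. A minimal sketch; the tag names and content are placeholders:

```python
def xml_wrap(tag: str, body: str) -> str:
    """Wrap prompt content in an XML tag, Claude-style."""
    return f"<{tag}>\n{body}\n</{tag}>"

prompt = "\n\n".join([
    xml_wrap("code_to_review", "def f(): ..."),
    xml_wrap("constraints", "- Must be idempotent"),
    "Review this function for production readiness.",
])
```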
When extended thinking helps vs hurts
Extended thinking is not universally better. Here's the breakdown:
Thinking helps:
- Multi-step mathematical proofs
- Code architecture and system design
- Debugging complex interactions across files
- Legal or contract analysis
- Research synthesis from multiple sources
- Strategic decision-making with many variables
Thinking hurts (or doesn't help):
- Simple text generation (emails, summaries)
- Single-turn Q&A with clear answers
- Translation
- Classification and labeling
- Reformatting or restructuring existing text
- Conversational responses
For tasks in the "hurts" category, disable extended thinking and use Claude without it — or use Haiku, which doesn't support thinking at all and is much cheaper.
Interleaved thinking
Claude 4.6 introduced interleaved thinking for agentic workflows. When Claude uses tools (web search, code execution, file operations), it can think between tool calls — not just before the first response. This is important for multi-step tasks where each tool result changes what the model should do next.
You don't need to prompt for this — it happens automatically when extended thinking is enabled and Claude uses tools. But knowing it exists changes how you structure agentic prompts: you can give Claude harder multi-step tasks because it can reason between each step, not just at the beginning.
Gemini 2.5 Pro Deep Think: Google's Approach
Gemini 2.5 Pro and Flash support a "thinking" mode that Google calls Deep Think in its marketing and documentation. Deep Think uses a distinctive approach: instead of a single chain of reasoning, it generates and evaluates multiple hypotheses in parallel before converging on an answer.
Parallel hypothesis generation
This architectural difference matters for prompting. Deep Think performs best when the problem space is genuinely open — when there are multiple plausible approaches and the model benefits from exploring several before committing.
--- PLAYS TO DEEP THINK'S STRENGTH ---
"This application has three performance bottlenecks: database
queries (avg 200ms), API serialization (avg 50ms), and frontend
rendering (avg 800ms). We have budget to fix one. Which one
gives the best ROI considering user experience, engineering
effort, and long-term scalability?"
--- DOESN'T BENEFIT FROM DEEP THINK ---
"Convert this JSON to CSV format."
The first prompt has multiple valid answers that depend on weighing competing factors — exactly the kind of problem where parallel hypothesis evaluation shines. The second is a deterministic transformation that doesn't benefit from deep reasoning.
Multimodal reasoning
Gemini's major differentiator is multimodal reasoning — thinking across text, images, video, and audio. When you combine Deep Think with multimodal input, you get a model that can reason about visual and textual information simultaneously.
Prompt with image attachment:
"This is a photograph of our server room network topology.
Identify single points of failure and suggest redundancy
improvements. Consider both physical layout and logical
network architecture visible in the diagram."
No other reasoning model matches Gemini's ability to reason deeply about visual inputs. If your task involves diagrams, charts, photographs, or video, Gemini Deep Think is often the best choice regardless of what the text-only benchmarks say.
Integration with search grounding
Gemini can combine Deep Think with Google Search grounding — using real-time search results as inputs to its reasoning process. This is valuable for questions that require both current information and deep analysis.
"Research the current state of quantum error correction.
Use search to find the most recent developments from the
past 3 months. Then analyze whether the current progress
trajectory makes a 1000-logical-qubit machine feasible
by 2030."
The model searches for current data, then reasons deeply about what that data means. Prompt for both: tell it to search AND analyze, so it doesn't just summarize search results without applying its reasoning capability.
Tip
Gemini Deep Think prompting pattern: Frame the problem with multiple possible framings or perspectives. Deep Think's parallel hypothesis generation means it handles "analyze from multiple angles" prompts better than single-chain reasoning models.
When NOT to Use Reasoning Models
Not every task benefits from reasoning. Using a reasoning model for simple tasks is like using a scanning electron microscope to check if your plants need watering — technically it works, but you're wasting time and money.
Tasks that don't need reasoning models:
- Quick content generation. Writing a tweet, drafting a short email, generating a product description. Standard models handle these fine and respond faster.
- Simple Q&A. "What's the capital of France?" doesn't need 10,000 thinking tokens.
- Conversational interactions. Chatbots, customer support, casual dialogue. Reasoning models often overthink conversational tasks and sound stilted.
- Formatting and transformation. Converting between data formats, restructuring text, applying consistent formatting rules.
- Classification. Sentiment analysis, topic labeling, spam detection. These are pattern-matching tasks, not reasoning tasks.
The cost calculus:
Reasoning models charge for thinking tokens on top of input and output tokens. On a simple task, thinking tokens are pure waste. On a complex task, they're the difference between a wrong answer and a right one.
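The arithmetic is worth seeing once. The per-million-token prices below are made up for illustration — real prices differ by model and change over time — but the structure of the bill is accurate: thinking tokens are billed like output tokens.

```python
# Hypothetical prices, $ per million billed tokens.
STANDARD_RATE = 10.0    # standard model, output tokens
REASONING_RATE = 40.0   # reasoning model, output + thinking tokens

def cost(output_tokens: int, thinking_tokens: int, reasoning: bool) -> float:
    """Output cost in dollars; thinking tokens bill only on the
    reasoning model."""
    rate = REASONING_RATE if reasoning else STANDARD_RATE
    billable = output_tokens + (thinking_tokens if reasoning else 0)
    return billable * rate / 1_000_000

# A 200-token answer: 8,000 thinking tokens dominate the bill.
simple_std = cost(200, 0, reasoning=False)     # 0.002
simple_rsn = cost(200, 8000, reasoning=True)   # 0.328
```

On this toy pricing, the reasoning model costs 164x more for the same short answer — pure waste if the task didn't need the deliberation.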
Warning
Overthinking is real. Reasoning models on simple tasks can produce worse results than standard models — they second-guess straightforward answers, add unnecessary caveats, and introduce complexity where none is needed. Match the model to the task complexity.
A practical heuristic: If you could solve the task yourself in under 30 seconds with full context, a standard model is probably sufficient. If the task requires you to consider multiple interacting factors, weigh tradeoffs, or follow a chain of logic through several steps, reach for a reasoning model.
Practical Examples: Standard Prompt vs Reasoning Model Prompt
The following five examples demonstrate how to adapt prompts when switching from a standard model to a reasoning model. The standard prompts work well on GPT-4o, Claude Sonnet (without thinking), or Gemini Flash. The reasoning model prompts are optimized for o3, Claude with extended thinking, or Gemini Deep Think.
Example 1: Complex Math/Logic Problem
--- STANDARD MODEL PROMPT ---
"A company has 3 factories. Factory A produces 40% of total
output with a 2% defect rate. Factory B produces 35% with a
3% defect rate. Factory C produces 25% with a 5% defect rate.
A randomly selected product is defective. What's the
probability it came from Factory C?
Think step by step:
1. Calculate P(defect) for each factory
2. Calculate total P(defect)
3. Apply Bayes' theorem
4. Show your work"
--- REASONING MODEL PROMPT ---
"A company has 3 factories. Factory A produces 40% of total
output with a 2% defect rate. Factory B produces 35% with a
3% defect rate. Factory C produces 25% with a 5% defect rate.
A randomly selected product is defective. What's the
probability it came from Factory C?
Verify your answer by confirming the posterior probabilities
for all three factories sum to 1."
The standard model prompt spells out Bayes' theorem step by step — because without that guidance, GPT-4o might take shortcuts. The reasoning model prompt states the problem and adds a verification request. o3 or Claude with thinking will apply Bayes' theorem (or whatever approach it prefers) on its own.
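The verification the reasoning-model prompt asks for can also be checked by hand. A few lines of Python reproduce the Bayes computation using the numbers from the prompt above:

```python
# Production shares and defect rates from the factory problem.
shares  = {"A": 0.40, "B": 0.35, "C": 0.25}
defects = {"A": 0.02, "B": 0.03, "C": 0.05}

# Total probability of a defect: sum of share * defect rate.
p_defect = sum(shares[f] * defects[f] for f in shares)

# Posterior for each factory via Bayes' theorem.
posterior = {f: shares[f] * defects[f] / p_defect for f in shares}

# P(defect) = 0.008 + 0.0105 + 0.0125 = 0.031
# P(C | defect) = 0.0125 / 0.031, roughly 0.403
```

The posteriors sum to 1, which is exactly the check the prompt requests.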
Example 2: Code Architecture Decision
--- STANDARD MODEL PROMPT ---
"We need to add real-time notifications to our app.
Currently we have a REST API with PostgreSQL.
Consider these options:
1. WebSockets with Socket.io
2. Server-Sent Events (SSE)
3. Polling with long-poll fallback
4. A managed service like Pusher or Ably
For each option, analyze:
- Implementation complexity
- Scalability to 100K concurrent users
- Cost implications
- Impact on existing architecture
- Maintenance burden
Then recommend one with justification."
--- REASONING MODEL PROMPT ---
"We need to add real-time notifications to our app.
Currently using a REST API with PostgreSQL, deployed on
Vercel with serverless functions. Team of 3 engineers.
Expected scale: 100K concurrent users within 12 months.
Recommend a real-time notification architecture. Optimize
for time-to-ship and operational simplicity given our
small team. Flag any approach that would require
re-architecting our existing API."
The standard prompt hand-holds through the analysis. The reasoning model prompt provides context that matters (serverless deployment, small team, timeline) and states the optimization criteria. The model will consider options you haven't listed and weight them against your actual constraints.
Example 3: Research Synthesis
--- STANDARD MODEL PROMPT ---
"Summarize the key findings from these three papers on
transformer attention mechanisms. For each paper:
1. State the main hypothesis
2. Describe the methodology
3. List key results
4. Note limitations
Then compare the three papers and identify areas of
agreement and disagreement."
--- REASONING MODEL PROMPT ---
"These three papers all study transformer attention
mechanisms but reach different conclusions about the
role of multi-head attention.
Identify the core disagreement. Determine which paper's
methodology most convincingly supports its claims. If
you had to design a follow-up experiment to resolve
the disagreement, what would it test?"
The standard prompt asks for structured summarization — a task standard models handle well with explicit guidance. The reasoning model prompt asks for judgment: which methodology is strongest, and what experiment would settle the debate. This requires deep reasoning that benefits from extended thinking.
Example 4: Strategic Analysis
--- STANDARD MODEL PROMPT ---
"Our SaaS product has 5,000 users and $50K MRR.
Churn is 8% monthly.
Perform a SWOT analysis:
- List 3 strengths
- List 3 weaknesses
- List 3 opportunities
- List 3 threats
Then suggest 3 strategic priorities for the next quarter."
--- REASONING MODEL PROMPT ---
"Our SaaS product has 5,000 users and $50K MRR. Churn
is 8% monthly. Customer acquisition cost is $200.
Average revenue per user is $10/month. We have 18 months
of runway.
8% monthly churn means we lose half our users every
8 months. At current CAC and ARPU, we can't grow our
way out of this.
What are the highest-leverage moves to fix unit
economics before runway runs out? Be specific about
what to do in the next 30 days vs the next 90 days."
The standard prompt asks for a textbook SWOT exercise. The reasoning model prompt presents a genuine strategic dilemma with real constraints (runway, unit economics) and asks for a time-bound action plan. The model can reason about the interplay between churn, CAC, ARPU, and runway in ways that a paint-by-numbers SWOT can't.
Example 5: Debugging Complex Issues
--- STANDARD MODEL PROMPT ---
"This function throws a NullPointerException intermittently
in production. Here's the stack trace and the relevant code.
Step 1: Identify all places where null could be introduced.
Step 2: Check thread safety of shared state.
Step 3: Review the database query for edge cases.
Step 4: Suggest a fix."
--- REASONING MODEL PROMPT ---
"This function throws a NullPointerException intermittently
in production — roughly 0.1% of requests. It only happens
under load (>500 RPS). The function works correctly in all
unit tests and staging environments.
Here's the stack trace and the relevant code. Find the bug.
[code and stack trace]"
The standard prompt prescribes a debugging procedure. The reasoning model prompt provides the symptoms that actually matter — intermittent, load-dependent, not reproducible in tests — and lets the model reason about what class of bug fits those symptoms. A reasoning model will likely consider concurrency issues, connection pool exhaustion, or caching race conditions without being told to look for them.
Building Prompts for Reasoning Models With SurePrompts
The principles in this guide — goal-oriented framing, constraints over procedures, verification requests, separated output formats — can be applied manually, but it's tedious to restructure every prompt from scratch.
Our prompt generator helps here. It builds structured prompts with role context, clear constraints, and output specifications already separated from the task description. You describe what you need in plain language, and the generator produces a prompt that works well with both standard and reasoning models.
This is especially useful when you're transitioning from standard to reasoning model prompting and need to break the habit of over-specifying steps.
The Future of Reasoning Model Prompting
Reasoning models are still early. The gap between effective and ineffective prompting on these models is larger than it was on standard models — because the models are more capable, there's more performance to unlock (or leave on the table).
A few trends to watch:
Adaptive reasoning will become the default. Claude 4.6's effort levels and o3's reasoning effort parameter are early versions of what will become automatic reasoning allocation. Models will eventually decide how hard to think on their own, without you setting a parameter. But we're not there yet — for now, tuning reasoning effort is a meaningful lever.
Reasoning costs will drop. o4-mini already demonstrates that strong reasoning doesn't require the largest models. Expect reasoning-capable models at every price point within the next year.
Prompting will become more about problem formulation. As models get better at choosing their own reasoning approaches, the skill shifts from "how to make the model think" to "how to frame the problem so the model thinks about the right things." This is closer to how you'd brief a smart colleague than how you'd write code.
The best prompt engineers in 2026 aren't the ones who write the longest, most detailed prompts. They're the ones who state problems clearly, provide the right constraints, and then get out of the model's way.
If you want to practice applying these principles, start with our prompt generator to build a structured prompt, then deliberately strip away any procedural instructions. State the goal, add your constraints, request verification, and let the reasoning model do what it was built to do.
For deeper coverage of specific models, see our guides on advanced prompt engineering techniques for Claude, GPT-5, and Gemini and chain-of-thought prompting fundamentals.