
MCP and Tool Use Prompting: How to Write Prompts for AI That Uses Tools (2026)

AI models don't just generate text anymore — they call APIs, query databases, and execute code. Learn how to write prompts that guide tool-using AI effectively, from function calling basics to MCP server architecture.

SurePrompts Team
April 12, 2026
29 min read

The most useful AI isn't the one that knows the most — it's the one that can DO the most. Text generation got AI through the door. Tool use is what makes it stay. When an AI model can check your calendar, query a database, call an API, and execute code — all within a single conversation — it stops being a chatbot and starts being an assistant that actually gets things done.

What Is Tool Use?

Tool use is the ability of an AI model to call external functions, APIs, and services during a conversation. Instead of relying solely on its training data to generate a response, the model can reach out to the real world — run a calculation, look up current data, send a message, write a file, or hit an API endpoint.

The concept has evolved rapidly:

  • 2023: Function calling. OpenAI introduced the idea of telling GPT-4 about available functions and letting the model decide when to call them. The model outputs structured JSON with a function name and arguments. Your code executes the function, returns the result, and the model incorporates it.
  • 2024: Tool use. Anthropic and others expanded the concept beyond "functions" to a broader notion of tools. Claude's tool use API lets you define tools with detailed schemas, and the model intelligently decides when and how to use them.
  • 2025-2026: MCP (Model Context Protocol). An open standard that standardizes how AI models connect to external tools and data sources. Instead of every model having its own tool integration format, MCP provides a universal protocol — build a tool server once, connect it to any MCP-compatible client.

Each step made tool use more powerful and more standardized. But regardless of the implementation — function calling, tool use API, or MCP — the core prompting challenge remains the same: you need to tell the model what tools exist, when to use them, and how to interpret the results.

The Model Context Protocol (MCP)

MCP is an open protocol, originally developed by Anthropic, that standardizes how AI applications connect to external data sources and tools. Think of it as USB-C for AI integrations: a single standard that replaces a tangle of custom connectors.

Why MCP Matters

Before MCP, every tool integration was bespoke. If you wanted Claude to access your company's Jira board, you built a custom integration for Claude. If you also wanted ChatGPT to access it, you built a separate integration. Want to add Gemini? Another integration.

MCP eliminates this duplication. You build one MCP server that exposes your Jira data, and any MCP-compatible client can connect to it — Claude Desktop, Cursor, VS Code with Copilot, or your own custom application.

How MCP Is Structured

MCP uses a client-server architecture with three roles:

  • MCP Server: Exposes tools, resources (data), and prompts (templates) through a standardized interface. A server might provide database access, file operations, API integrations, or any other capability.
  • MCP Client: Lives inside the AI application and maintains a connection to one or more MCP servers. The client handles the protocol communication.
  • MCP Host: The application that the user interacts with — Claude Desktop, an IDE, a custom chatbot. The host manages client connections and presents tools to the model.

Info

The MCP ecosystem is growing fast. As of early 2026, there are MCP servers for GitHub, Slack, Google Drive, PostgreSQL, Stripe, Notion, file systems, web browsers, and dozens more. Most major AI coding tools — Claude Code, Cursor, Windsurf, and Cline — support MCP natively. You can browse community-built servers at the MCP server registry and build your own with the official SDKs for TypeScript, Python, Java, Kotlin, and C#.

What MCP Means for Prompting

From a prompting perspective, MCP changes how tools become available to the model. Instead of defining tools inline in every API call, MCP servers register tools dynamically. The model sees the same tool definitions either way — a name, a description, and a parameter schema — but MCP makes the tooling infrastructure more modular and reusable.

The prompting principles in this guide apply whether you're defining tools via a raw API call, using MCP servers, or working through a tool-enabled interface like Claude Desktop. The model doesn't care how the tool got there. It cares about the tool description, the parameter schema, and your system prompt instructions.

How Tool Use Actually Works

Whether you're using function calling, Claude's tool use API, or MCP, the technical flow follows the same pattern:

Step 1: Define available tools. You provide the model with a list of tools, each with a name, description, and parameter schema.

Step 2: The model decides to use a tool. Based on the user's message and the tool descriptions, the model determines that calling a tool would help. It generates a structured tool call with the appropriate arguments.

Step 3: Your system executes the tool. The tool call is intercepted by your application (or the MCP client), which actually runs the function, hits the API, or queries the database. The model never executes anything directly.

Step 4: Results return to the model. The tool's output is sent back to the model as a tool result message.

Step 5: The model incorporates results. The model reads the tool output and generates its final response, weaving the tool results into a natural answer.

Here's what this looks like in practice with a weather tool:

code
User: "What's the weather in Tokyo right now?"

Model thinks: "I have a get_weather tool. The user wants current weather for Tokyo. I should call it."

Model outputs: tool_call(get_weather, { "city": "Tokyo", "units": "celsius" })

System executes: GET https://api.weather.com/current?city=Tokyo&units=celsius
→ Returns: { "temp": 22, "condition": "Partly cloudy", "humidity": 65 }

Model receives result and responds: "It's currently 22°C in Tokyo with partly cloudy skies and 65% humidity."

Tip

The model does not execute tools. This is the most important thing to understand. The model outputs a structured request saying "I want to call this tool with these arguments." Your code actually runs the tool. This matters for security — you can validate, rate-limit, and audit every tool call before executing it.

This loop can repeat multiple times in a single conversation turn. The model might call a search tool, read the results, then call a calculator tool, read those results, and finally compose its answer. Complex agentic workflows might involve dozens of sequential tool calls.
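In application code, that round trip is just a dispatch table and a loop. Here's a minimal Python sketch — the tool-call shape and message format are illustrative, not from any particular SDK:

```python
# Minimal sketch of the execution side of the tool-use loop. The
# tool-call shape ({"name": ..., "arguments": ...}) is illustrative --
# real SDKs define their own message formats.

def get_weather(city: str, units: str = "celsius") -> dict:
    # A real implementation would call a weather API here.
    return {"temp": 22, "condition": "Partly cloudy", "humidity": 65}

TOOLS = {"get_weather": get_weather}  # your code owns execution, not the model

def handle_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Execute each structured tool call the model emitted and collect
    results to send back to the model as tool-result messages."""
    results = []
    for call in tool_calls:
        fn = TOOLS[call["name"]]          # look up the real function
        output = fn(**call["arguments"])  # your application runs it
        results.append({"tool": call["name"], "result": output})
    return results

# The model emitted this structured request; we execute and return results:
calls = [{"name": "get_weather", "arguments": {"city": "Tokyo"}}]
print(handle_tool_calls(calls))
```

In a real agent, you'd feed those results back into the next model call and repeat until the model responds with text instead of another tool call.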

Writing Tool Descriptions That Work

Tool descriptions are the single most important factor in whether tool use works well or fails. The model uses descriptions to decide which tool to call and when. A bad description leads to the model calling the wrong tool, calling the right tool with wrong arguments, or not calling any tool when it should.

Anatomy of a Good Tool Definition

Every tool definition has three parts: a name, a description, and a parameter schema. Here's what good looks like:

json
{
  "name": "search_knowledge_base",
  "description": "Search the company's internal knowledge base for documentation, policies, and procedures. Use this tool when the user asks about company-specific information that wouldn't be in your training data — HR policies, engineering standards, onboarding procedures, product specs, or internal processes. Do NOT use this for general knowledge questions.",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "The search query. Use specific terms rather than full questions. For example, 'parental leave policy' rather than 'what is the parental leave policy at our company?'"
      },
      "category": {
        "type": "string",
        "enum": ["engineering", "hr", "product", "legal", "finance", "general"],
        "description": "The category to search within. Narrow the search to improve relevance. Use 'general' only if the topic doesn't fit other categories."
      },
      "max_results": {
        "type": "integer",
        "description": "Maximum number of results to return. Default is 5. Use 1-3 for specific lookups, 5-10 for broad research.",
        "default": 5
      }
    },
    "required": ["query"]
  }
}

Notice what makes this effective:

  • The name is specific. search_knowledge_base tells the model exactly what this tool searches. Not search (search what?) or kb_tool (what does it do?).
  • The description says when to use it AND when not to. "Use this when the user asks about company-specific information" gives positive guidance. "Do NOT use this for general knowledge questions" prevents misuse.
  • Parameter descriptions include examples and constraints. The query description tells the model how to format queries for best results. The category enum constrains choices. The max_results description suggests appropriate values for different scenarios.

Five More Tool Definition Examples

Here are additional patterns for common tool types:

Database query tool:

json
{
  "name": "query_customers",
  "description": "Query the customer database to look up customer information, order history, and account status. Use this when the user asks about a specific customer, needs to verify account details, or wants order information. Always search by customer ID or email when available rather than name, since names may not be unique.",
  "input_schema": {
    "type": "object",
    "properties": {
      "customer_id": {
        "type": "string",
        "description": "The unique customer ID (format: CUS-XXXXX). Preferred lookup method."
      },
      "email": {
        "type": "string",
        "description": "Customer email address. Use when customer_id is not available."
      },
      "include_orders": {
        "type": "boolean",
        "description": "Whether to include order history in the response. Set to true only when the user specifically asks about orders, as it increases response size.",
        "default": false
      }
    },
    "required": []
  }
}

Code execution tool:

json
{
  "name": "run_python",
  "description": "Execute Python code in a sandboxed environment. Use this for calculations, data transformations, generating visualizations, or testing code snippets. The environment has numpy, pandas, matplotlib, and requests pre-installed. Code execution times out after 30 seconds. Do NOT use this for tasks you can do with simple reasoning — only when actual computation is needed.",
  "input_schema": {
    "type": "object",
    "properties": {
      "code": {
        "type": "string",
        "description": "Python code to execute. Must be valid Python 3.11+ syntax. Use print() to output results you want to see."
      }
    },
    "required": ["code"]
  }
}

File operation tool:

json
{
  "name": "read_file",
  "description": "Read the contents of a file from the project directory. Use this before editing any file to understand its current state. Supports text files only (code, config, markdown, JSON, YAML, CSV). For binary files, use the appropriate specialized tool instead.",
  "input_schema": {
    "type": "object",
    "properties": {
      "path": {
        "type": "string",
        "description": "Relative path from the project root. Example: 'src/components/Header.tsx' or 'config/database.yml'. Do not use absolute paths."
      },
      "line_start": {
        "type": "integer",
        "description": "First line to read (1-indexed). Use with line_end to read a specific section of large files."
      },
      "line_end": {
        "type": "integer",
        "description": "Last line to read (inclusive). If omitted, reads to end of file."
      }
    },
    "required": ["path"]
  }
}

API integration tool:

json
{
  "name": "create_github_issue",
  "description": "Create a new issue in a GitHub repository. Use this when the user explicitly asks to create, file, or open an issue. Do NOT create issues as a side effect of other tasks unless specifically instructed. The authenticated user must have write access to the target repository.",
  "input_schema": {
    "type": "object",
    "properties": {
      "repo": {
        "type": "string",
        "description": "Repository in 'owner/repo' format. Example: 'acme-corp/api-server'."
      },
      "title": {
        "type": "string",
        "description": "Issue title. Keep it concise (under 80 characters) and descriptive."
      },
      "body": {
        "type": "string",
        "description": "Issue body in Markdown format. Include context, steps to reproduce (for bugs), or requirements (for features)."
      },
      "labels": {
        "type": "array",
        "items": { "type": "string" },
        "description": "Labels to apply. Use existing labels only — the API will error on non-existent labels."
      }
    },
    "required": ["repo", "title", "body"]
  }
}

Web search tool:

json
{
  "name": "web_search",
  "description": "Search the web for current information. Use this when the user's question requires up-to-date data that may not be in your training data — recent events, current prices, live documentation, release notes, or real-time data. Do NOT search for information you can answer confidently from training data.",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Search query. Be specific and include relevant qualifiers like year, product name, or version number. Example: 'Next.js 15 app router middleware docs 2026' rather than 'nextjs middleware'."
      },
      "num_results": {
        "type": "integer",
        "description": "Number of results to return. Use 3-5 for focused lookups, 10 for broad research.",
        "default": 5
      }
    },
    "required": ["query"]
  }
}

The Rules for Tool Descriptions

Based on these examples, here are the rules that consistently produce good tool use behavior:

  • Say when to use it. Positive guidance: "Use this when..."
  • Say when NOT to use it. Negative guidance prevents the most common misuse cases.
  • Describe parameter formats with examples. Don't just say "type": "string" — show what a good value looks like.
  • Use enums to constrain choices. If a parameter only accepts certain values, enumerate them. The model will follow them.
  • Explain defaults and tradeoffs. "Default is 5. Use 1-3 for specific lookups, 5-10 for broad research" helps the model make better choices.
  • Mention side effects. If a tool writes data, creates records, sends emails, or costs money, say so in the description. The model should know when a tool call has consequences.

Prompt Patterns for Tool-Using Models

Defining tools well is half the job. The other half is writing system prompts and user instructions that guide how the model uses those tools.

System Prompt Structure for Tool-Enabled Conversations

A system prompt for a tool-using model should cover four things: the model's role, general tool use guidance, tool-specific policies, and output expectations.

code
You are a customer support agent for Acme Corp. You have access to tools
for looking up customer accounts, checking order status, and creating
support tickets.

## Tool Use Guidelines

- Always look up the customer's account before answering account-specific
  questions. Do not guess or rely on information from earlier in the
  conversation if it might be stale.
- When the customer reports a problem, check their order status before
  suggesting solutions. The order status tool returns real-time data.
- Create support tickets only when you cannot resolve the issue directly.
  Always confirm with the customer before creating a ticket.

## Tool-Specific Policies

- query_customers: Always search by email or customer ID. If the customer
  provides their name, ask for their email or customer ID to ensure an
  accurate lookup.
- check_order_status: If an order shows "delayed," check the estimated
  delivery date and proactively share it with the customer.
- create_ticket: Set priority to "high" only for billing issues or orders
  delayed more than 7 days. All other issues are "normal."

## Response Format

- Be concise and direct
- If you used a tool, reference the specific data you found
- Never reveal raw tool outputs to the customer — summarize naturally

Info

Why tool-specific policies in the system prompt matter: Tool descriptions tell the model what a tool does. System prompt policies tell the model how to use it in context. The tool description for create_ticket might say "creates a support ticket." The system prompt adds business logic: "only create tickets when you can't resolve directly, confirm with the customer first, set priority based on these criteria." These two layers work together.

Guiding Tool Selection

When a model has access to multiple tools, it sometimes picks the wrong one. You can guide selection with explicit routing rules in your system prompt:

code
## When to Use Which Tool

- For questions about the customer's account, billing, or subscription
  → use query_customers
- For questions about a specific order's status, shipping, or delivery
  → use check_order_status
- For general product questions (features, pricing, compatibility)
  → answer from your knowledge, do NOT use any tool
- For issues you cannot resolve after looking up the relevant data
  → use create_ticket (confirm with customer first)

This pattern eliminates ambiguity. Without it, the model might search the knowledge base for order status questions or create tickets prematurely.

Multi-Tool Orchestration

Some tasks require calling multiple tools in sequence. The model handles this naturally — it can call one tool, read the result, decide what to do next, and call another tool. But you can make the sequencing more reliable with explicit instructions:

code
When a customer reports a missing order:
1. First, look up their account with query_customers to verify identity
2. Then check the order status with check_order_status
3. If the order shows "delivered" but the customer says they didn't receive
   it, create a ticket with priority "high" and category "missing_delivery"
4. If the order shows "in_transit," share the tracking information and
   estimated delivery date — do not create a ticket yet

This is essentially prompt chaining embedded within a system prompt. You're defining the sequence the model should follow for specific scenarios.

Handling Tool Errors

Tools fail. APIs time out, databases return empty results, services go down. If you don't tell the model what to do when a tool fails, it will either hallucinate an answer, retry indefinitely, or tell the user something unhelpful like "I encountered an error."

code
## Error Handling

- If a tool call returns an error, do NOT retry more than once.
- If the customer lookup returns no results, ask the customer to verify
  their email or customer ID. Do not guess.
- If the order status tool is unavailable, tell the customer: "I'm having
  trouble accessing order information right now. Let me create a ticket
  so our team can follow up within 2 hours."
- Never make up data when a tool call fails. Always be honest about
  what you couldn't retrieve.
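It helps to enforce the same policy in your execution layer, not just in the prompt. Here's a sketch — the function names and result shape are illustrative:

```python
# Sketch of an execution layer that enforces the retry policy itself
# and converts failures into a structured result the model can reason
# about. Names and the result shape are illustrative.

def execute_with_policy(fn, arguments: dict, max_retries: int = 1) -> dict:
    """Run a tool, retrying at most once, and return a structured
    error instead of raising into the conversation."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return {"ok": True, "result": fn(**arguments)}
        except Exception as exc:
            last_error = str(exc)
    # The model sees a clear, honest failure -- no invented data.
    return {"ok": False, "error": last_error,
            "guidance": "Tool unavailable. Do not fabricate a result."}

def flaky_lookup(customer_id: str) -> dict:
    # Stand-in for a tool whose backing service is down.
    raise TimeoutError("order status service timed out")

print(execute_with_policy(flaky_lookup, {"customer_id": "CUS-00042"}))
```

Returning the error as data, rather than letting the exception surface, gives the model something concrete to act on — in this case, following the fallback instructions in your system prompt.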

Verification After Tool Use

One of the most overlooked patterns is telling the model to verify tool results before presenting them:

code
After retrieving data from any tool:
- Verify the result makes sense given the question. If the customer asked
  about order #12345 and the tool returned data for a different order,
  note the discrepancy and re-query.
- Cross-reference dates and statuses. If the order is marked "delivered"
  but the delivery date is in the future, flag this as a data issue.
- If multiple tool calls return contradictory data, present both pieces
  of information and note the inconsistency rather than picking one.

This pattern is borrowed from agentic AI prompting, where verification loops are standard practice. It works just as well for simpler tool-use scenarios.

Advanced MCP Patterns

Once you move beyond basic tool definitions into full MCP-powered systems, several advanced patterns become important.

Building MCP Servers

An MCP server is a program that exposes tools, resources, and prompt templates through the MCP protocol. You can build servers in TypeScript, Python, or any language with an MCP SDK.

A minimal MCP server in TypeScript looks like this:

typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

const server = new McpServer({
  name: "inventory-server",
  version: "1.0.0",
});

server.tool(
  "check_inventory",
  "Check product inventory levels by SKU. Returns current stock count, warehouse location, and reorder status.",
  {
    sku: z.string().describe("Product SKU (format: PRD-XXXXX)"),
    warehouse: z.enum(["east", "west", "central"]).optional()
      .describe("Specific warehouse to check. Omit to check all warehouses."),
  },
  async ({ sku, warehouse }) => {
    const inventory = await db.getInventory(sku, warehouse);
    return {
      content: [
        {
          type: "text",
          text: JSON.stringify(inventory, null, 2),
        },
      ],
    };
  }
);

The key insight: the tool description and parameter schemas you define in the MCP server are the same descriptions that reach the model. Everything in the "Writing Tool Descriptions That Work" section applies directly here. The MCP server is just the delivery mechanism.

Composing Multiple MCP Servers

Real-world deployments typically connect multiple MCP servers to a single client. A development environment might connect servers for:

  • File system access
  • Git operations
  • Database queries
  • Web search
  • Project management (Jira, Linear)
  • Communication (Slack, email)

When composing servers, the prompting challenge is managing tool overlap and routing. If both a database server and a knowledge base server can answer a question, which should the model use?

Handle this with explicit system prompt routing:

code
You are connected to multiple tool servers. Use them as follows:

- For structured data queries (customer records, orders, inventory)
  → use the database tools (query_customers, check_order_status, etc.)
- For unstructured knowledge (policies, documentation, how-to guides)
  → use the knowledge base tools (search_knowledge_base, get_document)
- For real-time data (stock prices, weather, news)
  → use the web search tools
- For code operations (reading, writing, running code)
  → use the filesystem and code execution tools

When in doubt, prefer the more specific tool over the general one.

Tool Use in Agentic Workflows

When tool use combines with agentic AI patterns, the model operates in a loop: observe, decide, act, observe results, decide again. This is where prompting gets most complex and most powerful.

For agentic tool use, your system prompt needs additional components:

code
## Planning

Before taking action, outline your plan:
1. What information do I need?
2. Which tools will get me that information?
3. In what order should I use them?
4. What will I do with the results?

## Stopping Criteria

Stop working when:
- The user's request is fully addressed
- You've attempted a tool call 3 times and it keeps failing
- You realize the task requires permissions or access you don't have

Do NOT continue iterating if you're not making progress.
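The stopping criteria are worth enforcing in code as well as in the prompt. A minimal sketch of an agent loop with a step budget and a failure counter (model_step and run_tool are stand-ins for your model call and tool executor):

```python
# Sketch of an agent loop that enforces stopping criteria in code.
# model_step and run_tool are stand-ins for your model call and tool
# executor; the action shape is illustrative.

def run_agent(model_step, run_tool, max_failures: int = 3, max_steps: int = 20):
    """Loop until the model answers, a tool fails repeatedly, or the
    step budget runs out -- never iterate forever without progress."""
    failures = 0
    for _ in range(max_steps):
        action = model_step()                 # {"type": "answer"|"tool", ...}
        if action["type"] == "answer":
            return action["text"]             # request fully addressed
        try:
            run_tool(action["name"], action["arguments"])
            failures = 0                      # progress resets the counter
        except Exception:
            failures += 1
            if failures >= max_failures:      # the same call keeps failing
                return "Stopping: tool failed repeatedly."
    return "Stopping: step budget exhausted."

# Toy model that answers immediately:
print(run_agent(lambda: {"type": "answer", "text": "Done."}, lambda n, a: None))
```

The prompt tells the model when to stop; the loop guarantees it.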

Warning

Tool count matters for performance. Every tool definition consumes tokens in the model's context window. A system with 50 tools might use 5,000-10,000 tokens just for tool definitions, leaving less room for conversation history and reasoning. If you notice degraded performance, audit your tool count. Remove tools the model rarely uses, and consider grouping related operations into fewer, more flexible tools.

Conditional Tool Availability

Not every tool should be available at every point in a conversation. A customer support system might restrict ticket creation until the agent has verified the customer's identity, or limit database writes during a read-only investigation phase.

You can implement this through dynamic system prompts that update as the conversation progresses:

code
## Current Phase: Identity Verification

Available tools in this phase:
- query_customers (to verify the customer's identity)

Tools NOT yet available:
- modify_account (requires verified identity)
- create_ticket (requires verified identity)
- process_refund (requires verified identity + manager approval)

Once the customer is verified, additional tools will be unlocked.
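On the host side, this is just filtering which tool definitions get sent to the model each turn. A sketch, using phase names that mirror the example above (all illustrative):

```python
# Sketch of phase-based tool gating: the host filters which tool
# definitions are exposed to the model in each conversation phase.
# Phase names and tool names mirror the example above.

PHASE_TOOLS = {
    "identity_verification": {"query_customers"},
    "verified": {"query_customers", "modify_account", "create_ticket"},
    "refund_approved": {"query_customers", "modify_account",
                        "create_ticket", "process_refund"},
}

ALL_TOOLS = [
    {"name": "query_customers"},
    {"name": "modify_account"},
    {"name": "create_ticket"},
    {"name": "process_refund"},
]

def tools_for_phase(phase: str) -> list[dict]:
    """Return only the tool definitions unlocked in this phase."""
    allowed = PHASE_TOOLS[phase]
    return [t for t in ALL_TOOLS if t["name"] in allowed]

# Until the customer is verified, only the lookup tool is exposed:
print([t["name"] for t in tools_for_phase("identity_verification")])
```

Gating at the definition level is stronger than prompt instructions alone: a tool the model never sees is a tool it cannot call.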

Rate Limiting and Cost Management

Every tool call has a cost — API calls, compute time, database queries, third-party service charges. System prompts should make the model cost-aware:

code
## Cost Awareness

- web_search: Each call costs approximately $0.01. Use judiciously.
- run_python: Each execution uses compute credits. Combine multiple
  calculations into a single code block when possible.
- send_email: Sends a real email to a real person. Double-check the
  recipient and content before calling.

General rule: accomplish the task with the fewest tool calls possible.
Do not call a tool "just to be thorough" if you already have the
information you need.

Model-Specific Tool Use

Each major model family handles tool use slightly differently. Understanding these differences helps you write better prompts for each one.

ChatGPT / GPT-4

OpenAI uses the term "function calling" and supports both sequential and parallel tool use.

Key characteristics:

  • Functions are defined with JSON Schema in the tools parameter of the API call
  • GPT-4 supports parallel_tool_calls — it can call multiple functions simultaneously in a single turn when the calls are independent
  • Structured Outputs mode forces function call arguments to conform strictly to your JSON Schema, eliminating malformed calls
  • The tool_choice parameter lets you force a specific function call ({"type": "function", "function": {"name": "get_weather"}}) or require any function call ("required") vs. letting the model decide ("auto")

Prompting tips for GPT-4:

  • Use Structured Outputs for tools where argument format is critical. It guarantees the arguments match your schema.
  • When you want the model to always use a tool for a certain query type, set tool_choice to force it rather than relying on the system prompt alone.
  • GPT-4 tends to be verbose after tool calls. If you want concise answers, say so explicitly: "After retrieving data, respond in 1-2 sentences."
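Here's what forcing a specific function looks like at the request level, shown as a plain payload dict rather than a live API call — consult OpenAI's docs for exact client usage:

```python
# Illustrative request payload forcing a specific function call via
# tool_choice, following the shape described above. Not a live API
# call; check OpenAI's documentation for exact client usage.

request = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Weather in Tokyo?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    # Force this specific function instead of letting the model decide:
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}

print(request["tool_choice"])
```

Setting tool_choice to "auto" restores the default behavior where the model decides for itself.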

Claude

Anthropic's tool use implementation gives Claude strong adherence to tool descriptions and system prompt constraints.

Key characteristics:

  • Tools are defined in the tools parameter with input_schema for parameters
  • Claude respects negative constraints ("do NOT use this tool for...") more literally than most models
  • Supports sequential multi-tool use — Claude will call one tool, read the result, then decide whether to call another
  • MCP is natively supported in Claude Desktop, Claude Code, and the Claude API
  • In complex tool scenarios, Claude's reasoning can be guided with XML-style structure in the system prompt

Prompting tips for Claude:

  • Lean into explicit constraints. Claude follows "do NOT use tool X when condition Y" instructions reliably.
  • When providing multiple tools, include routing guidance in the system prompt. Claude handles "Use tool A for X, tool B for Y" instructions well.
  • For complex agentic workflows, use structured system prompts with clear section headers (## Planning, ## Constraints, ## Tool Policies). Claude responds well to this document-like formatting.

Gemini

Google's Gemini models support function calling and have unique grounding capabilities.

Key characteristics:

  • Functions are defined using the FunctionDeclaration format
  • Gemini supports "grounding with Google Search" — a built-in tool that the model can use to verify or supplement its answers with fresh search results
  • Parallel function calling is supported for independent tool invocations
  • The function_calling_config parameter controls whether function calling is AUTO, ANY (force a function call), or NONE

Prompting tips for Gemini:

  • When using Google Search grounding, you don't need to define a search tool — Gemini handles it natively. Just enable grounding and the model decides when to search.
  • For custom tools, Gemini benefits from detailed description fields at both the function level and parameter level, similar to the patterns described earlier.
  • Use ANY mode when you specifically want the model to use one of your tools rather than answer from its training data.

Tip

Cross-model compatibility: If you're building a system that needs to work across multiple models, define your tools using a common internal format and convert to each model's format at the API layer. The prompting principles — clear descriptions, explicit routing, error handling — are universal. Only the API shape changes.
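A sketch of that conversion layer: one neutral internal spec, translated per provider at the API boundary. The internal shape is made up for illustration; the target shapes follow the formats described in this section:

```python
# Sketch of the "common internal format" pattern: one neutral tool
# spec, converted per provider at the API layer. The internal shape
# is illustrative; target shapes follow the formats described above.

INTERNAL = {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def to_openai(tool: dict) -> dict:
    """OpenAI wraps the schema under a 'function' key."""
    return {"type": "function", "function": {
        "name": tool["name"],
        "description": tool["description"],
        "parameters": tool["parameters"],
    }}

def to_anthropic(tool: dict) -> dict:
    """Anthropic uses a flat shape with 'input_schema' for parameters."""
    return {"name": tool["name"],
            "description": tool["description"],
            "input_schema": tool["parameters"]}

print(to_anthropic(INTERNAL)["input_schema"]["required"])
```

The descriptions — the part that actually shapes model behavior — pass through unchanged; only the envelope differs.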

Common Mistakes

These are the six mistakes that cause the most problems in tool-use prompting. Each one is straightforward to fix once you know what to look for.

1. Vague Tool Descriptions

The mistake:

json
{
  "name": "search",
  "description": "Searches for information.",
  "input_schema": {
    "type": "object",
    "properties": {
      "q": { "type": "string" }
    }
  }
}

Why it fails: The model doesn't know what kind of information this searches, when to use it versus answering from training data, or how to format the query parameter for best results. It will either overuse or underuse this tool.

The fix: Specify what it searches, when to use it, when not to use it, and what good input looks like. See the search_knowledge_base example earlier in this guide.

2. Too Many Tools (Decision Paralysis)

Giving a model 40+ tools might seem like a way to make it more capable. In practice, it often makes the model less reliable. The model spends more tokens reasoning about which tool to use, makes worse selections, and sometimes avoids tools entirely because it's uncertain.

The fix: Start with the minimum viable tool set. Add tools only when you observe the model failing to accomplish tasks without them. Group related operations into fewer tools with an action parameter instead of having separate tools for each operation. Aim for under 20 tools in a single context.
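For example, instead of separate create_ticket, update_ticket, and close_ticket tools, a single grouped tool might look like this (the schema below is illustrative):

```json
{
  "name": "manage_tickets",
  "description": "Create, update, or close support tickets. Select the operation with the 'action' parameter rather than calling separate tools.",
  "input_schema": {
    "type": "object",
    "properties": {
      "action": {
        "type": "string",
        "enum": ["create", "update", "close"],
        "description": "The operation to perform."
      },
      "ticket_id": {
        "type": "string",
        "description": "Required for 'update' and 'close'. Omit for 'create'."
      },
      "title": {
        "type": "string",
        "description": "Ticket title. Required for 'create'."
      }
    },
    "required": ["action"]
  }
}
```

One definition replaces three, and the enum still gives the model an unambiguous menu of operations.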

3. Not Handling Tool Errors in the System Prompt

If your system prompt says nothing about what to do when a tool call fails, the model will improvise. Sometimes it hallucinates data. Sometimes it tells the user "an error occurred" with no next steps. Sometimes it retries indefinitely.

The fix: Always include error handling instructions. Specify retry limits, fallback behaviors, and what to tell the user when tools are unavailable.

4. Not Providing Tool Use Examples in the System Prompt

Tool descriptions tell the model what tools do. Examples show the model when and how to use them. Without examples, the model relies entirely on its training data for tool selection heuristics.

The fix: Include 1-2 examples in your system prompt for the most important tools:

code
## Example Interaction

User: "What's the status of my order #ORD-12345?"

You should:
1. Call check_order_status with order_id "ORD-12345"
2. Read the result
3. Summarize the status, expected delivery, and any issues in plain language

Do NOT just say "let me check" — actually call the tool and report the findings.

5. Ignoring the Token Cost of Tool Definitions

Every tool definition consumes tokens. A detailed tool with a complex schema might use 200-500 tokens. Multiply that by 30 tools and you've used 6,000-15,000 tokens before the conversation even starts. This reduces the effective context window for conversation history and reasoning.

The fix: Audit your tool definitions for verbosity. Remove tools that are rarely used. Consider dynamic tool loading — only include tools relevant to the current conversation phase. Keep descriptions thorough but not redundant.
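Dynamic tool loading can be as simple as tagging each definition with the conversation phases where it's relevant and filtering before each request. A sketch, with illustrative phase tags and tool names:

```python
# Keep the full catalog server-side; send the model only the tools tagged
# for the current phase. Each entry would also carry its full definition
# (description, input_schema) — omitted here for brevity.

TOOL_CATALOG = [
    {"name": "search_docs", "phases": {"support", "coding"}},
    {"name": "read_file", "phases": {"coding"}},
    {"name": "create_task", "phases": {"planning"}},
    {"name": "check_order_status", "phases": {"support"}},
]

def tools_for_phase(phase):
    """Return only the tool definitions relevant to this phase."""
    return [t for t in TOOL_CATALOG if phase in t["phases"]]
```

A "coding" conversation then pays the token cost of two definitions instead of four, and the savings scale with the size of the catalog.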

6. Not Validating Tool Inputs

The model generates tool arguments, but it can make mistakes — wrong data types, missing required fields, values outside expected ranges. If your tool execution layer doesn't validate inputs, you get cryptic errors or silent failures.

The fix: Validate tool arguments in your execution layer before running the tool. Return clear error messages that tell the model what went wrong: "Invalid customer_id format. Expected CUS-XXXXX, got '12345'." The model can then self-correct and retry with proper arguments.
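A minimal sketch of that validation step, using the CUS-XXXXX format from the example above (the field name and format are illustrative):

```python
import re

def validate_customer_id(args):
    """Check tool arguments; return None if valid, else a clear error."""
    customer_id = args.get("customer_id")
    if customer_id is None:
        return "Missing required field 'customer_id'."
    if not re.fullmatch(r"CUS-\d{5}", customer_id):
        # Say what was expected AND what was received, so the model can
        # self-correct and retry with properly formatted arguments.
        return (
            f"Invalid customer_id format. Expected CUS-XXXXX, "
            f"got '{customer_id}'."
        )
    return None  # valid — safe to execute the tool
```

Returning the error string as the tool result, rather than raising, keeps the failure inside the conversation where the model can act on it.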

Warning: Never trust tool arguments from the model without validation. Treat them like user input — sanitize, validate, and constrain. This is especially critical for tools that write data, execute code, or access sensitive systems. A model could be manipulated through prompt injection into passing malicious arguments to your tools.

Putting It All Together

Here's a complete example combining tool definitions with a system prompt for a developer productivity assistant. This shows how all the patterns work together:

System prompt:

```
You are a developer assistant with access to tools for code operations,
documentation search, and task management. You help developers write,
debug, and ship code.

## Core Principles

1. Read before you write. Always use read_file before edit_file.
2. Search before you assume. Use search_docs for framework questions
   rather than relying on training data that may be outdated.
3. Verify after you change. After editing code, run the linter or
   tests if available.

## Tool Routing

- Code questions with a specific file → read_file, then answer
- Framework or library questions → search_docs first
- Bug reports → read_file for the relevant code, then analyze
- Feature requests → create_task in the project tracker
- General coding questions → answer from knowledge, no tool needed

## Error Handling

- If read_file returns "file not found," ask the user for the correct
  path. Do not guess.
- If search_docs returns no results, answer from your training data
  but note that you couldn't find official documentation.
- If create_task fails, show the error and suggest the user create
  the task manually.

## Response Style

- Be direct and technical
- Show code in fenced code blocks with language tags
- When explaining tool results, focus on what matters for the
  developer's question — don't dump raw tool output
```

This system prompt, combined with well-written tool definitions, gives the model everything it needs to use tools effectively. It knows what tools exist (from the definitions), when to use each one (from the routing rules), what to do when things go wrong (from error handling), and how to present results (from response style).
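In code, the system prompt and tool definitions travel together in each request. A sketch of the assembled payload, assuming an Anthropic-style messages API shape (the model name, file paths, and tool set are illustrative, and no network call is made here):

```python
# Assemble a request that pairs the system prompt with its tool
# definitions. A real client would send this dict to the API.

SYSTEM_PROMPT = "You are a developer assistant ..."  # the prompt above

TOOLS = [
    {
        "name": "read_file",
        "description": (
            "Read a file from the workspace before editing or analyzing "
            "it. Use this whenever a question references a specific file."
        ),
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
    # ... search_docs, edit_file, and create_task defined the same way
]

request = {
    "model": "claude-sonnet-example",  # placeholder model name
    "system": SYSTEM_PROMPT,
    "tools": TOOLS,
    "messages": [
        {"role": "user", "content": "Why does src/auth.py raise a KeyError?"}
    ],
    "max_tokens": 1024,
}
```

The routing rules in the system prompt only work because the tool names they mention (read_file, search_docs, create_task) match the names in the definitions exactly — a mismatch silently breaks the routing.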

What Comes Next

Tool use transforms AI from a text generator into a capable assistant that interacts with real systems. MCP standardizes these interactions so a tool built once works everywhere. But the intelligence of tool use still depends entirely on prompting — the descriptions you write, the routing rules you define, and the guardrails you set.

The developers and teams who get this right are building AI systems that don't just answer questions but take meaningful actions: querying databases, filing tickets, deploying code, monitoring systems, and orchestrating complex workflows across dozens of services.

If you're building prompts for tool-using AI, start with three things: write tool descriptions that tell the model when to use each tool and when not to, add routing rules in your system prompt for common scenarios, and always include error handling instructions. Those three practices will get you 80% of the way to reliable tool use.

For building structured prompts that guide AI effectively — whether for tool use or any other task — try the AI Prompt Generator. It handles the framework and structure so you can focus on the specifics of your use case.

If you're working with agentic AI systems that combine tool use with autonomous decision-making, or building complex prompt chains that orchestrate multiple AI calls, the same principles apply at every level: clear instructions, explicit constraints, and robust error handling.

The AI models will keep getting better at using tools. The prompting fundamentals won't change. Write clear descriptions. Define boundaries. Handle failures. That's the whole game.

Ready to Level Up Your Prompts?

Stop struggling with AI outputs. Use SurePrompts to create professional, optimized prompts in under 60 seconds.

Try AI Prompt Generator