
Computer Use Prompting: How to Write Instructions for AI That Controls Your Browser and Desktop (2026)

AI can now click, type, and navigate your computer. Learn how to write effective instructions for Claude computer use, browser agents, and desktop automation — with the prompting patterns that prevent costly mistakes.

SurePrompts Team
April 12, 2026
23 min read

You're not chatting with the AI anymore. You're handing it the mouse, the keyboard, and the ability to click real buttons on your real computer. One bad instruction, and it clicks "Delete" instead of "Download." The prompting rules for computer use are completely different from anything you've written before — and the stakes are higher.

What Is Computer Use?

Computer use is a category of AI capability where the model can see your screen, move a cursor, click buttons, type text, scroll through pages, and navigate between applications. Instead of generating text in a chat window, the AI interacts with your computer the same way a human would — through the graphical user interface.

This is fundamentally different from tool use or API calls. A traditional AI agent might call a function like search_web("query") or read_file("path"). A computer-use agent literally looks at a screenshot of your screen, identifies a search bar visually, clicks on it, and types the query character by character.
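To make the contrast concrete, here is a toy sketch of the observe-decide-act loop a computer-use agent runs. Everything here is a stand-in: `FakeScreen` simulates successive screen states as strings, and substring matching stands in for actual visual recognition of a screenshot.

```python
class FakeScreen:
    """Stub environment standing in for a real screen driver. In a real
    agent, screenshot/click/scroll would call a screenshot library and
    an input-automation library; these names are illustrative only."""
    def __init__(self, frames):
        self.frames = list(frames)   # successive screen states
        self.clicks = []

    def screenshot(self):
        return self.frames[0]

    def scroll(self):
        if len(self.frames) > 1:
            self.frames.pop(0)       # scrolling reveals the next state

    def click(self, label):
        self.clicks.append(label)
        if len(self.frames) > 1:
            self.frames.pop(0)       # the click advances the UI

def run_agent(screen, goal_label, max_actions=5):
    """Minimal observe -> decide -> act -> verify loop: the agent only
    ever knows what the current 'screenshot' shows."""
    for _ in range(max_actions):
        state = screen.screenshot()           # observe a fresh frame
        if goal_label in state:               # crude stand-in for vision
            screen.click(goal_label)          # act on what was observed
            confirm = screen.screenshot()
            return goal_label not in confirm  # verify the UI changed
        screen.scroll()                       # keep looking
    return False                              # budget exhausted: stop

screen = FakeScreen([
    "header and navigation bar",
    "reports table with an Export CSV button",
    "download started",
])
print(run_agent(screen, "Export CSV"))   # → True
```

The important part is the shape of the loop, not the stubs: every decision is based on a snapshot of pixels, and every action should be followed by a fresh observation.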

Info

Computer use refers to AI systems that interact with graphical user interfaces by taking screenshots, interpreting visual elements, and performing mouse/keyboard actions. The AI sees pixels, not code — it navigates your computer the same way a remote human operator would.

The ecosystem is growing quickly. Anthropic's Claude computer use API was the first major release, allowing Claude to take screenshots, move the mouse, click, type, and perform keyboard shortcuts. Open-source browser-use agents followed, enabling AI to navigate websites through browser automation frameworks. Other tools extend this to full desktop environments — opening applications, switching windows, and interacting with native software.

What ties all of these together is a shared challenge: you need to write instructions that guide an AI through a visual, interactive environment where every action has real consequences.

Why Computer Use Prompting Is Different

If you've written prompts for ChatGPT, Claude, or Gemini, you already know the basics of clear instructions. But computer use breaks several assumptions that text prompting relies on.

Actions Are Irreversible

In a normal chat, a bad response costs you nothing. You regenerate, rephrase, or start over. With computer use, the AI is performing real actions. If you tell it to clear a form and it clicks "Delete Account" instead, that action happened. There's no regenerate button for your production database.

Warning

Every action a computer-use agent takes is real. Clicking a delete button actually deletes. Submitting a form actually submits. Sending an email actually sends. Your instructions must be precise enough to prevent the AI from taking destructive actions, because there is no undo for most of them.

The AI Sees Screenshots, Not Structure

When you browse a website, your browser has access to the DOM — the structured tree of HTML elements with IDs, classes, and attributes. A computer-use agent doesn't see any of that. It sees a screenshot. A grid of pixels. It identifies buttons, text fields, and links the same way you would: by looking at them.

This means the AI can be confused by:

  • Buttons that look similar but do different things
  • Text that overlaps or is partially hidden
  • Dynamic elements that move or resize
  • Pop-ups that cover the element it was trying to click
  • Dark mode vs. light mode changing how elements look

Latency Changes Everything

Between the time the AI takes a screenshot, decides what to click, and performs the click, the screen may have changed. A notification might appear. A page might finish loading. A modal might pop up. The AI is always acting on slightly stale information, and your instructions need to account for this.
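One defensive pattern against stale observations is to re-capture the screen immediately before acting and compare it to the frame the decision was based on. A minimal sketch, using byte strings as stand-in screenshots and a hash comparison:

```python
import hashlib

def frame_digest(frame_bytes):
    """Fingerprint of a screenshot, used to detect screen changes."""
    return hashlib.sha256(frame_bytes).hexdigest()

def click_if_stable(get_frame, do_click, planned_digest):
    """Re-capture just before acting; abort if the screen changed
    since the plan was made (popup, notification, late page load)."""
    current = frame_digest(get_frame())
    if current != planned_digest:
        return False      # stale plan: re-observe instead of clicking
    do_click()
    return True

# Simulated frames: the decision was made on frame1,
# but a popup appeared before the click landed.
frame1 = b"reports page, Export CSV visible"
frame2 = b"reports page with newsletter popup on top"

clicks = []
planned = frame_digest(frame1)
ok = click_if_stable(lambda: frame2, lambda: clicks.append("Export CSV"), planned)
print(ok, clicks)   # → False []
```

Real screenshots rarely match byte-for-byte (cursors blink, clocks tick), so production agents compare regions or use perceptual hashing, but the check-before-act structure is the same.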

Error Recovery Is Not Optional

In text prompting, you can handle errors after the fact. In computer use, an unhandled error — a popup the AI doesn't expect, a page that loads differently than usual, a button that moved — can send the entire workflow off the rails. Every instruction set needs to anticipate what can go wrong.

Safety Boundaries Must Be Explicit

A text-based AI can't accidentally wire money from your bank account. A computer-use agent, if given access to your browser with your banking session active, theoretically could. The attack surface is your entire computer. Safety isn't a nice-to-have — it's the first thing you write.

The 8 Principles of Computer Use Prompting

These principles apply regardless of which computer-use platform you're working with. They represent the core patterns that separate reliable computer-use instructions from dangerous ones.

1. Be Explicit About What to Click and Where

Vague references that work in text prompts fail with computer use. "Click the submit button" might work if there's one submit button on screen. If there are two — or if "Submit" and "Cancel" are next to each other — you need more specificity.

Vague (risky):

code
Click the delete button.

Explicit (safer):

code
Look for the red button labeled "Delete Draft" in the bottom-right
corner of the modal dialog. Before clicking it, verify that the
modal title says "Delete Draft" and NOT "Delete Account."

Reference elements by their text label, color, position on screen, and surrounding context. The more visual anchors you provide, the less likely the AI is to click the wrong thing.

2. Define Safety Boundaries Up Front

Before any task instructions, establish what the agent must never do. These constraints should be absolute — not guidelines the agent weighs against efficiency.

code
SAFETY BOUNDARIES — These override all other instructions:
- NEVER click any button containing the word "delete," "remove,"
  or "permanently"
- NEVER enter payment information, credit card numbers, or
  banking credentials
- NEVER close the browser or shut down applications
- NEVER modify system settings or preferences
- If you encounter a login prompt for a service you weren't
  instructed to use, STOP and report it

3. Describe the Expected Screen State at Each Step

The AI needs to know what "correct" looks like at every stage. Without this, it can't distinguish between "the action worked" and "something went wrong but the screen looks similar."

code
Step 1: Navigate to https://example.com/dashboard
  EXPECTED: You should see a page with the heading "Dashboard"
  and a sidebar menu on the left with options including
  "Reports," "Settings," and "Users."

Step 2: Click "Reports" in the sidebar
  EXPECTED: The main content area should change to show a list
  of reports with dates and titles. The sidebar item "Reports"
  should appear highlighted or selected.

4. Include Error Recovery Instructions

Every step should have a contingency. What does the AI do if the expected state doesn't appear?

code
Step 3: Click the "Export CSV" button in the top-right corner
  of the reports table.
  EXPECTED: A file download should begin, and you may see a
  download notification at the bottom of the browser.
  IF NOT FOUND: The button may be hidden behind a "More Actions"
  dropdown menu — click "More Actions" first, then look for
  "Export CSV."
  IF STILL NOT FOUND: Take a screenshot and stop. Do not attempt
  to find alternative export methods.

5. Use Verification Steps

After every significant action, instruct the AI to verify the result before moving on. This catches errors early before they compound.

code
After clicking "Save Changes":
  1. Wait 2 seconds for the page to update
  2. Verify that a success message appears (typically a green
     banner saying "Changes saved" or similar)
  3. Verify that the values you entered are still displayed
     correctly in the form
  4. Only proceed to the next step after both verifications pass

Tip

Build a "verify before proceeding" habit into every computer-use prompt. The cost of pausing to verify is a few seconds of latency. The cost of not verifying is an entire workflow going sideways because step 3 failed silently and the agent kept going through steps 4 through 12.
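If you orchestrate the agent from code, the "wait, then verify" step is a small polling helper. A sketch with an injectable clock and sleep so it stays testable:

```python
import time

def verify(check, timeout=5.0, interval=0.5,
           clock=time.monotonic, sleep=time.sleep):
    """Poll `check` until it returns True or the timeout elapses.
    Returns False on timeout so the caller can stop the workflow."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)

# Simulate a success banner that only appears on the third poll.
polls = {"n": 0}
def banner_visible():
    polls["n"] += 1
    return polls["n"] >= 3

print(verify(banner_visible, timeout=5.0, interval=0.0))   # → True
```

The `check` callable is where your actual verification lives, for example "does the latest screenshot contain a green 'Changes saved' banner."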

6. Limit Scope to One Task Per Instruction Set

Computer use prompts should be narrow and focused. "Set up my entire development environment" is too broad. "Install VS Code from the website and verify it opens" is appropriately scoped.

Complex workflows should be broken into sequential instruction sets, each with its own safety boundaries and verification steps. This gives you checkpoints where you can review the agent's progress and catch problems before they propagate.

7. Provide Fallback Behaviors

Not every situation can be anticipated. But you can define what the agent should do when it encounters something unexpected — a default behavior that errs on the side of caution.

code
FALLBACK BEHAVIOR:
- If you encounter any dialog, popup, or prompt you were not
  explicitly briefed on, STOP and take a screenshot. Do not
  dismiss it, do not click any of its buttons, and do not
  try to work around it.
- If a page takes more than 15 seconds to load, STOP and report
  the issue rather than refreshing.
- If you are unsure which of two similar-looking elements to
  interact with, STOP and ask rather than guessing.

8. Set Explicit Stop Conditions

The agent needs to know when it's done — and when it should stop even if it's not done.

code
STOP CONDITIONS:
- Stop after successfully exporting 3 CSV files (the task
  is complete)
- Stop if you encounter any error message you cannot resolve
  in one attempt
- Stop if you have performed more than 30 total click actions
  (something is wrong if the task requires this many)
- Stop if you are prompted for credentials you were not given
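If the agent runs under your own harness, stop conditions like the action cap above can be enforced in code rather than trusted to the prompt. A minimal budget tracker, assuming hypothetical action labels:

```python
class ActionBudget:
    """Tracks a hard cap on total actions, mirroring the
    'more than 30 total click actions' stop condition above."""
    def __init__(self, max_actions=30):
        self.max_actions = max_actions
        self.actions = 0
        self.stopped = None   # reason string once a stop fires

    def record(self, action):
        """Call before each action. Returns False when the agent
        must stop instead of acting."""
        self.actions += 1
        if self.actions > self.max_actions:
            self.stopped = f"exceeded {self.max_actions} actions"
        return self.stopped is None

budget = ActionBudget(max_actions=3)
for label in ["click A", "click B", "click C", "click D"]:
    if not budget.record(label):
        break
print(budget.stopped)   # → exceeded 3 actions
```

Belt and suspenders: keep the stop conditions in the prompt so the model cooperates, and in the harness so a confused model cannot run past them.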

Writing Effective Computer Use Instructions

With the principles established, here's how to structure actual computer-use prompts for common scenarios.

Task Decomposition for Browser Workflows

Break complex browser tasks into phases, each with a clear objective and verification checkpoint.

code
TASK: Collect pricing information from three competitor websites.

PHASE 1 — Competitor A (https://competitor-a.com/pricing)
  Objective: Extract plan names, prices, and feature lists.
  Steps:
    1. Navigate to the pricing page URL above.
    2. VERIFY: Page title or heading contains "Pricing."
    3. For each pricing tier visible on the page, record:
       - Plan name
       - Monthly price
       - Top 5 listed features
    4. If pricing requires toggling between "Monthly" and
       "Annual," record both price points.
    5. VERIFY: You have recorded at least 2 pricing tiers.

CHECKPOINT: Present the data collected for Competitor A
before moving to Phase 2.

Step-by-Step vs. Goal-Based Instructions

You have two approaches for structuring computer-use instructions, and each has tradeoffs.

Step-by-step tells the agent exactly what to click, in what order. This is safer for critical tasks but brittle — if the interface changes, the instructions break.

code
STEP-BY-STEP APPROACH:
1. Click the hamburger menu icon (three horizontal lines)
   in the top-left corner
2. Click "Settings" in the dropdown menu
3. Scroll down to the "Notifications" section
4. Uncheck the box labeled "Email notifications"
5. Click the blue "Save" button at the bottom of the page

Goal-based tells the agent what to achieve and lets it figure out the navigation. This is more flexible but riskier — the agent might take unexpected paths.

code
GOAL-BASED APPROACH:
Goal: Disable email notifications in the application settings.
Expected location: Settings > Notifications
Safety: Do not change any other settings. Only modify the
email notification toggle/checkbox.

For most computer-use tasks, a hybrid approach works best: goal-based framing with step-by-step guidance for the critical path, and explicit safety boundaries regardless.

Tip

Use step-by-step instructions for any action that is destructive or irreversible (deleting, submitting, purchasing). Use goal-based instructions for exploratory or read-only tasks (finding information, taking screenshots, navigating to a page).

Screenshot-Aware Prompting

Since the AI works from screenshots, write instructions that reference visual elements the way a human would describe them to someone looking at the same screen.

code
VISUAL REFERENCES:
- The "New Project" button is a large blue button with a
  "+" icon, typically located in the upper-right area of
  the dashboard.
- The project list appears as a series of cards arranged
  in a grid. Each card shows a project name in bold text
  at the top, a description below it, and a date at the
  bottom-right corner.
- The settings gear icon is a small gray cog icon in the
  top-right corner, next to your profile avatar.

Form Filling and Data Entry

Form filling is one of the most common computer-use tasks. The key is specifying both the data and the exact fields, because forms often have fields with similar labels.

code
TASK: Fill out the contact form at https://example.com/contact

DATA TO ENTER:
  - "First Name" field: John
  - "Last Name" field: Smith
  - "Email" field: john.smith@example.com
  - "Subject" dropdown: Select "General Inquiry"
  - "Message" text area: "I would like to request a demo
    of your enterprise plan."

AFTER FILLING:
  1. Review all fields visually to confirm data matches
     what was specified above
  2. VERIFY the email field shows "john.smith@example.com"
     (typos in email fields are common and costly)
  3. Click "Submit" only after verification passes

DO NOT click "Subscribe to newsletter" or any checkbox
that was not listed in the data above.

Multi-Page Navigation

When a task spans multiple pages, each navigation action is a potential failure point. Account for page load times and verify the URL at each step.

code
TASK: Download the most recent invoice from the billing portal.

Step 1: Navigate to https://app.example.com/billing
  VERIFY: URL bar shows the billing page. Page heading
  says "Billing" or "Invoices."
  WAIT: Allow up to 5 seconds for the page to fully load.
  IF REDIRECT: If you are redirected to a login page,
  STOP — do not attempt to log in.

Step 2: Locate the most recent invoice in the list
  The invoices should appear in a table or list format,
  sorted by date with the most recent at the top.
  VERIFY: The top invoice date is within the last 30 days.

Step 3: Click the download icon or "Download PDF" link
  for that invoice
  VERIFY: A file download begins. The downloaded filename
  should contain "invoice" and a date.
  IF POPUP: If a dialog asks for file format, select "PDF."
  IF NO DOWNLOAD: Try right-clicking the invoice row and
  looking for a "Download" option in the context menu.

Step 4: Confirm the download completed
  Check the browser's download bar or notification area.
  VERIFY: File size is greater than 0 bytes.
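The Step 4 check translates directly into code if your harness can see the download directory. A sketch using a temporary directory and an invented filename to simulate the downloaded invoice:

```python
import os
import tempfile

def download_ok(path, must_contain="invoice"):
    """Mirrors Step 4: the file exists, is non-empty, and the
    filename matches what the task expects."""
    name = os.path.basename(path).lower()
    return (os.path.isfile(path)
            and os.path.getsize(path) > 0
            and must_contain in name)

# Simulate a completed download in a scratch directory.
d = tempfile.mkdtemp()
path = os.path.join(d, "invoice-2026-03.pdf")
with open(path, "wb") as f:
    f.write(b"%PDF-1.4 placeholder content")

print(download_ok(path))                              # → True
print(download_ok(os.path.join(d, "missing.pdf")))    # → False
```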

Multi-Application Workflows

Some tasks require switching between applications. These need extra care because the agent must identify which application is in the foreground and navigate between windows.

code
TASK: Copy a table from a webpage into a spreadsheet.

Phase 1 — Browser:
  1. Navigate to https://example.com/quarterly-report
  2. Locate the table titled "Q1 Revenue by Region"
  3. Select all cells in the table (click the first cell,
     then Shift+click the last cell)
  4. Copy the selection (Ctrl+C / Cmd+C)
  VERIFY: Status bar or tooltip confirms "Copied to clipboard"

Phase 2 — Switch Application:
  5. Open the spreadsheet application (look for it in the
     taskbar or dock)
  IF NOT OPEN: STOP — do not open new applications unless
  the spreadsheet was already running
  6. VERIFY: The spreadsheet application is now in the
     foreground and you can see an open worksheet

Phase 3 — Spreadsheet:
  7. Click cell A1 in the spreadsheet
  8. Paste the data (Ctrl+V / Cmd+V)
  9. VERIFY: The pasted data contains the same number of
     rows and columns as the original table

Safety and Guardrails

Computer use is powerful, and that power demands serious safety measures. This is not theoretical — a poorly configured computer-use agent with access to your primary browser session can interact with every logged-in service: email, banking, cloud infrastructure, social media.

Warning

Never run a computer-use agent on your primary desktop without sandboxing. The agent has access to everything visible on screen, including browser sessions where you're logged into sensitive services. A single misinterpreted instruction could interact with the wrong application entirely.

Sandboxing Computer Use

The safest approach is to run computer-use agents in an isolated environment:

  • Virtual machines: Run the agent inside a VM with a clean browser profile. No saved passwords, no logged-in sessions beyond what the task requires.
  • Containers: Docker containers with virtual display servers (like Xvfb) provide lightweight isolation. The agent can interact with a browser inside the container without access to your host system.
  • Dedicated browser profiles: At minimum, use a separate browser profile with no saved credentials, no extensions, and no autofill data.

Preventing Unintended Actions

Beyond sandboxing, build prevention into your prompts:

  • Allowlists over blocklists: Rather than listing everything the agent shouldn't do, specify exactly which websites and applications it is allowed to interact with. "You may ONLY interact with https://app.example.com. If any other domain appears, STOP."
  • Action budgets: Limit the total number of actions. "Perform no more than 20 click actions total. If the task isn't complete by then, stop and report what remains."
  • Confirmation gates: For any destructive action, require the agent to pause and describe what it's about to do before doing it.
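The allowlist idea is also easy to enforce mechanically in a harness that sees each URL before the agent navigates. A sketch, with `app.example.com` standing in for whatever domain your task actually allows:

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"app.example.com"}   # allowlist, not blocklist

def url_allowed(url):
    """True only if the URL's host is exactly on the allowlist.
    Subdomains and look-alike domains are rejected by default."""
    host = urlparse(url).hostname or ""
    return host in ALLOWED_HOSTS

print(url_allowed("https://app.example.com/billing"))        # → True
print(url_allowed("https://app.example.com.evil.io/login"))  # → False
```

Note the second example: a look-alike domain that merely starts with the allowed host fails the exact-match check, which is why matching on the parsed hostname beats substring checks on the raw URL.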

Sensitive Data Handling

Computer-use agents see everything on screen, including sensitive data.

  • Never leave password managers unlocked during agent sessions
  • Close tabs and applications that aren't relevant to the task
  • If the agent needs to enter credentials, pass them through environment variables or secure configuration — not in the prompt text
  • Be aware that screenshots taken during computer use may be logged and stored by the AI provider

Warning

Screenshots are data. Every screenshot the computer-use agent takes may be transmitted to the AI provider's servers for processing. Do not run computer-use agents on screens displaying confidential information, medical records, financial data, or anything you wouldn't want captured in a log.

Rate Limiting Actions

Runaway agents can perform hundreds of actions per minute. Build rate limits into your instructions:

code
RATE LIMITS:
- Wait at least 1 second between click actions
- Wait at least 3 seconds after any page navigation
  before taking the next action
- If you perform more than 5 actions in 10 seconds,
  pause for 5 seconds before continuing
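As with stop conditions, rate limits are more reliable when the harness enforces them too. A minimal limiter with an injected clock so the behavior is deterministic in tests (pass `time.monotonic` in real use):

```python
class RateLimiter:
    """Enforces a minimum gap between agent actions."""
    def __init__(self, min_gap, clock):
        self.min_gap = min_gap   # seconds between actions
        self.clock = clock
        self.last = None

    def wait_time(self):
        """Seconds the agent must still wait before its next action."""
        now = self.clock()
        if self.last is None:
            return 0.0
        return max(0.0, self.last + self.min_gap - now)

    def acted(self):
        """Record that an action just happened."""
        self.last = self.clock()

t = {"now": 0.0}
rl = RateLimiter(min_gap=1.0, clock=lambda: t["now"])
print(rl.wait_time())   # → 0.0
rl.acted()
t["now"] = 0.4
print(rl.wait_time())   # → 0.6
```

Before each action, the harness sleeps for `wait_time()` and then calls `acted()`, which also makes runaway bursts of clicks impossible regardless of what the model decides.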

Human-in-the-Loop Checkpoints

For any task with real consequences, require human approval at key decision points:

code
HUMAN APPROVAL REQUIRED BEFORE:
- Submitting any form
- Clicking any button that triggers a purchase or payment
- Sending any email or message
- Deleting or modifying any data
- Navigating away from the specified domain

Present a summary of what you plan to do and wait for
explicit "proceed" instruction before taking the action.

Real-World Use Cases

Computer use is still emerging, but several categories of tasks are already practical and valuable.

Web Scraping and Data Collection

When websites lack APIs or block traditional scraping tools, computer-use agents can navigate them like a human user — scrolling, clicking pagination links, and extracting visible text from screenshots.

code
TASK: Collect product listings from the first 3 pages of
search results on example-marketplace.com.

For each product, record:
- Product name (the bold text on each listing card)
- Price (to the right of the product name)
- Seller name (below the price in smaller gray text)
- Star rating (number of filled stars, out of 5)

Navigation:
- After recording all products on a page, click the
  "Next" button at the bottom to advance
- VERIFY: The page number indicator updates after clicking
- STOP after completing page 3

Output: Present the data as a structured table.

QA and Testing Automation

Computer-use agents can walk through web applications the way a manual QA tester would — filling forms, clicking buttons, and verifying that the right things happen. This is particularly useful for testing workflows that are hard to automate with traditional testing frameworks.

code
TASK: Test the user registration flow.

1. Navigate to https://staging.example.com/register
2. Fill in the registration form with test data:
   - Name: "Test User"
   - Email: "testuser_[timestamp]@example.com"
   - Password: "TestPass123!"
3. Click "Create Account"
4. VERIFY: A success message appears OR you are redirected
   to a welcome page
5. CHECK: Does the welcome page display the name "Test User"?
6. CHECK: Is there a confirmation email prompt?

Report: List each CHECK with PASS/FAIL and a screenshot
of the final state.

Form Filling and Data Migration

Migrating data between systems that don't have APIs often means manually entering records. Computer-use agents can automate this tedious process.

Info

Data migration with computer use works best for small-to-medium datasets (tens to hundreds of records) where building a proper integration would take more time than the migration itself. For large datasets (thousands of records), invest in an API-based solution or database migration instead.

Competitive Research and Monitoring

Regularly checking competitor websites for pricing changes, new features, or updated positioning is a natural fit for computer use. The agent navigates to specific pages, reads the visible content, and reports changes from previous checks.

Administrative Task Automation

Routine admin work — updating spreadsheets, filing reports in internal tools, downloading recurring exports — can often be automated with computer-use agents. These are typically low-risk, high-repetition tasks where the workflow doesn't change often.

Common Mistakes

These five mistakes account for most computer-use failures. Each one seems minor in isolation but can derail an entire workflow.

1. Assuming the AI Understands the Full Page

The AI sees a screenshot — a fixed-size image of whatever is currently visible in the viewport. It does not see content below the fold, inside collapsed menus, or behind modals. If the button you want is below the visible area, you need to explicitly instruct the agent to scroll.

code
BAD: "Click the Submit button."
(What if Submit is below the fold?)

BETTER: "Scroll to the bottom of the form. Look for a
button labeled 'Submit' — it should be below the last
form field. If you don't see it after scrolling, the
page layout may have changed. STOP and take a screenshot."

2. Not Handling Popups, Modals, and Unexpected States

Cookie consent banners. Newsletter signup modals. Chat widgets. Browser notification permission prompts. These appear unpredictably and cover the elements the agent is trying to interact with. Without explicit handling, the agent either clicks the popup's buttons (potentially opting into things you didn't want) or gets stuck trying to click elements hidden behind the overlay.

code
POPUP HANDLING:
- If a cookie consent banner appears, click "Reject All"
  or "Necessary Only." If neither option exists, click
  "Accept" to dismiss it.
- If a newsletter or signup modal appears, look for an
  "X" close button in the top-right corner and click it.
- If a chat widget obscures part of the page, ignore it
  if possible. If it blocks a required element, look for
  a minimize or close button on the widget.
- If a browser notification prompt appears ("This site
  wants to send notifications"), click "Block" or "Deny."

3. Giving Too Many Steps at Once

Long instruction sets create compounding error risk. If step 3 fails subtly and the agent continues through step 20, you've wasted time and potentially caused damage. Break workflows into phases of 3-5 steps each, with verification checkpoints between them.

Tip

The ideal computer-use instruction set is 3-7 steps long. If your task requires more, split it into multiple instruction sets with human review between each phase. This mirrors how you'd supervise a new employee on their first day — you wouldn't give them a 30-step task and walk away.
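If you generate instruction sets programmatically, the splitting itself is a one-liner worth making explicit, so every phase ends at a reviewable checkpoint:

```python
def split_into_phases(steps, max_per_phase=5):
    """Break a long workflow into review-sized phases, each small
    enough to verify before the next one starts."""
    return [steps[i:i + max_per_phase]
            for i in range(0, len(steps), max_per_phase)]

steps = [f"step {n}" for n in range(1, 13)]          # a 12-step workflow
phases = split_into_phases(steps, max_per_phase=4)
print(len(phases))      # → 3
print(phases[0][-1])    # → step 4
```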

4. No Safety Boundaries

The most dangerous computer-use prompts are the ones that say what to do but never say what not to do. Without explicit boundaries, the agent applies its best judgment — and its best judgment may include clicking through warning dialogs, dismissing confirmation prompts, or navigating to unexpected pages to "help."

Always define safety boundaries before task instructions. Make them the first thing in your prompt, not an afterthought.

5. Not Verifying Actions Completed

"Click save" is not the same as "click save and verify the data was saved." The click might miss. The page might error. The save might silently fail. Without verification, the agent moves on assuming success, and you don't discover the failure until the end of the workflow — or worse, days later when you notice data is missing.

Every action-verification pair follows this pattern:

code
ACTION: Click "Save Changes"
WAIT: 2-3 seconds for the page to respond
VERIFY: Look for a success indicator:
  - A green "Saved" confirmation message, OR
  - The page refreshing with the updated values, OR
  - A brief toast notification confirming the save
IF NO CONFIRMATION: Try clicking "Save Changes" one more
  time. If there is still no confirmation after 5 seconds,
  STOP and report the issue.

Building Your Computer Use Prompts

Creating effective computer-use prompts follows the same general principle as any advanced prompting: structure, specificity, and safety. If you're building prompts for AI agents or automated workflows, many of the patterns transfer — but computer use demands extra care because the actions are real and visual.

Start with the prompt generator to build your base instructions, then layer on the computer-use-specific elements: safety boundaries, screen state descriptions, verification steps, and fallback behaviors. For a deeper understanding of how agentic AI and tool use work together, read the AI agents prompting guide and the developer-focused prompt engineering guide.

A template for getting started:

code
SAFETY BOUNDARIES:
[What the agent must never do]

ENVIRONMENT:
[What application or browser is open, what URL to start at,
what the agent should see on screen]

TASK:
[Clear objective in one sentence]

STEPS:
[3-7 steps, each with expected screen state and error recovery]

VERIFICATION:
[How to confirm the task completed successfully]

STOP CONDITIONS:
[When to stop, whether successful or not]

FALLBACK:
[What to do when something unexpected happens]

Closing Thoughts

Computer use is the bridge between AI as a thinking tool and AI as a doing tool. For the first time, the same models that can reason about code, analyze data, and write strategy can also interact with the interfaces where work actually happens — the dashboards, the forms, the applications you use every day.

But this bridge carries traffic in both directions. The same capability that lets an AI fill out a form perfectly can let it fill out the wrong form. The same capability that lets it download a report can let it interact with systems it shouldn't touch. The gap between "helpful automation" and "costly mistake" is the quality of your instructions.

The prompting patterns in this guide — explicit actions, safety boundaries, expected states, verification steps, error recovery, scoped tasks, fallback behaviors, and stop conditions — are not optional. They are the minimum safety infrastructure for giving an AI control over your computer.

Start small. Pick a single, low-stakes, repetitive task. Write instructions using the template above. Run it in a sandboxed environment. Verify every step. Then gradually expand from there, building confidence in both the technology and your ability to direct it.

The AI doesn't need to be perfect. Your instructions need to make imperfection safe.

Ready to Level Up Your Prompts?

Stop struggling with AI outputs. Use SurePrompts to create professional, optimized prompts in under 60 seconds.

Try AI Prompt Generator