You're not chatting with the AI anymore. You're handing it the mouse, the keyboard, and the ability to click real buttons on your real computer. One bad instruction, and it clicks "Delete" instead of "Download." The prompting rules for computer use are completely different from anything you've written before — and the stakes are higher.
What Is Computer Use?
Computer use is a category of AI capability where the model can see your screen, move a cursor, click buttons, type text, scroll through pages, and navigate between applications. Instead of generating text in a chat window, the AI interacts with your computer the same way a human would — through the graphical user interface.
This is fundamentally different from tool use or API calls. A traditional AI agent might call a function like search_web("query") or read_file("path"). A computer-use agent literally looks at a screenshot of your screen, identifies a search bar visually, clicks on it, and types the query character by character.
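In code, the difference looks like a loop over screenshots rather than a single function call. A minimal sketch of that perceive-decide-act loop, where `take_screenshot`, `decide_action`, and `perform` are hypothetical stand-ins for a platform's real screenshot, model, and input-injection calls:

```python
# Minimal sketch of the perceive-decide-act loop a computer-use agent runs.
# take_screenshot, decide_action, and perform are hypothetical stand-ins
# for the platform's real screenshot, model, and input APIs.

def run_agent(take_screenshot, decide_action, perform, max_steps=50):
    """Loop: screenshot -> model picks an action -> execute -> repeat."""
    for _ in range(max_steps):
        image = take_screenshot()      # the agent sees pixels, not the DOM
        action = decide_action(image)  # e.g. {"type": "click", "x": 120, "y": 48}
        if action["type"] == "done":
            return "done"
        perform(action)                # real mouse/keyboard event
    return "step budget exhausted"
```

The `max_steps` cap matters: without it, a confused agent loops forever, re-screenshotting and re-clicking.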
Info
Computer use refers to AI systems that interact with graphical user interfaces by taking screenshots, interpreting visual elements, and performing mouse/keyboard actions. The AI sees pixels, not code — it navigates your computer the same way a remote human operator would.
The ecosystem is growing quickly. Anthropic's Claude computer use API was the first major release, allowing Claude to take screenshots, move the mouse, click, type, and perform keyboard shortcuts. Open-source browser-use agents followed, enabling AI to navigate websites through browser automation frameworks. Other tools extend this to full desktop environments — opening applications, switching windows, and interacting with native software.
What ties all of these together is a shared challenge: you need to write instructions that guide an AI through a visual, interactive environment where every action has real consequences.
Why Computer Use Prompting Is Different
If you've written prompts for ChatGPT, Claude, or Gemini, you already know the basics of clear instructions. But computer use breaks several assumptions that text prompting relies on.
Actions Are Irreversible
In a normal chat, a bad response costs you nothing. You regenerate, rephrase, or start over. With computer use, the AI is performing real actions. If you tell it to clear a form and it clicks "Delete Account" instead, that action happened. There's no regenerate button for your production database.
Warning
Every action a computer-use agent takes is real. Clicking a delete button actually deletes. Submitting a form actually submits. Sending an email actually sends. Your instructions must be precise enough to prevent the AI from taking destructive actions, because there is no undo for most of them.
The AI Sees Screenshots, Not Structure
When you browse a website, your browser has access to the DOM — the structured tree of HTML elements with IDs, classes, and attributes. A computer-use agent doesn't see any of that. It sees a screenshot. A grid of pixels. It identifies buttons, text fields, and links the same way you would: by looking at them.
This means the AI can be confused by:
- Buttons that look similar but do different things
- Text that overlaps or is partially hidden
- Dynamic elements that move or resize
- Pop-ups that cover the element it was trying to click
- Dark mode vs. light mode changing how elements look
Latency Changes Everything
Between the time the AI takes a screenshot, decides what to click, and performs the click, the screen may have changed. A notification might appear. A page might finish loading. A modal might pop up. The AI is always acting on slightly stale information, and your instructions need to account for this.
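One defensive pattern, sketched here with hypothetical `screenshot` and `hash_screen` helpers, is to re-capture the screen immediately before acting and abort if it no longer matches what the model saw when it made its decision:

```python
# Sketch of a stale-screen guard: re-capture just before clicking and compare
# against the frame the decision was based on. screenshot and hash_screen are
# hypothetical; a real implementation might hash only the region around the
# target coordinates rather than the whole frame.

def click_if_unchanged(screenshot, hash_screen, click, planned, decision_hash):
    """Only click if the screen still matches what the model saw."""
    current = hash_screen(screenshot())
    if current != decision_hash:
        return "screen changed, re-plan"  # act on fresh pixels instead
    click(planned["x"], planned["y"])
    return "clicked"
```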
Error Recovery Is Not Optional
In text prompting, you can handle errors after the fact. In computer use, an unhandled error — a popup the AI doesn't expect, a page that loads differently than usual, a button that moved — can send the entire workflow off the rails. Every instruction set needs to anticipate what can go wrong.
Safety Boundaries Must Be Explicit
A text-based AI can't accidentally wire money from your bank account. A computer-use agent, if given access to your browser with your banking session active, theoretically could. The attack surface is your entire computer. Safety isn't a nice-to-have — it's the first thing you write.
The 8 Principles of Computer Use Prompting
These principles apply regardless of which computer-use platform you're working with. They represent the core patterns that separate reliable computer-use instructions from dangerous ones.
1. Be Explicit About What to Click and Where
Vague references that work in text prompts fail with computer use. "Click the submit button" might work if there's one submit button on screen. If there are two — or if "Submit" and "Cancel" are next to each other — you need more specificity.
Vague (risky):
Click the delete button.
Explicit (safer):
Look for the red button labeled "Delete Draft" in the bottom-right
corner of the modal dialog. Before clicking it, verify that the
modal title says "Delete Draft" and NOT "Delete Account."
Reference elements by their text label, color, position on screen, and surrounding context. The more visual anchors you provide, the less likely the AI is to click the wrong thing.
2. Define Safety Boundaries Up Front
Before any task instructions, establish what the agent must never do. These constraints should be absolute — not guidelines the agent weighs against efficiency.
SAFETY BOUNDARIES — These override all other instructions:
- NEVER click any button containing the word "delete," "remove,"
or "permanently"
- NEVER enter payment information, credit card numbers, or
banking credentials
- NEVER close the browser or shut down applications
- NEVER modify system settings or preferences
- If you encounter a login prompt for a service you weren't
instructed to use, STOP and report it
3. Describe the Expected Screen State at Each Step
The AI needs to know what "correct" looks like at every stage. Without this, it can't distinguish between "the action worked" and "something went wrong but the screen looks similar."
Step 1: Navigate to https://example.com/dashboard
EXPECTED: You should see a page with the heading "Dashboard"
and a sidebar menu on the left with options including
"Reports," "Settings," and "Users."
Step 2: Click "Reports" in the sidebar
EXPECTED: The main content area should change to show a list
of reports with dates and titles. The sidebar item "Reports"
should appear highlighted or selected.
4. Include Error Recovery Instructions
Every step should have a contingency. What does the AI do if the expected state doesn't appear?
Step 3: Click the "Export CSV" button in the top-right corner
of the reports table.
EXPECTED: A file download should begin, and you may see a
download notification at the bottom of the browser.
IF NOT FOUND: The button may be hidden behind a "More Actions"
dropdown menu — click "More Actions" first, then look for
"Export CSV."
IF STILL NOT FOUND: Take a screenshot and stop. Do not attempt
to find alternative export methods.
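The primary/fallback/stop pattern above can be expressed as a small routine. `find` and `open_dropdown` are hypothetical stand-ins for the platform's visual-locator and click calls:

```python
# Sketch of the primary/fallback/stop pattern from the step above.
# find is a hypothetical visual locator returning coordinates or None;
# open_dropdown is a hypothetical click on a named menu.

def locate_export_button(find, open_dropdown):
    """Try the visible button, then the 'More Actions' dropdown, then stop."""
    target = find("Export CSV")
    if target is not None:
        return target
    if find("More Actions") is not None:
        open_dropdown("More Actions")
        target = find("Export CSV")
        if target is not None:
            return target
    return None  # caller should screenshot and stop, not improvise
```

Returning None instead of hunting for alternatives is the point: the fallback chain is finite and ends in a stop, never in improvisation.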
5. Use Verification Steps
After every significant action, instruct the AI to verify the result before moving on. This catches errors early before they compound.
After clicking "Save Changes":
1. Wait 2 seconds for the page to update
2. Verify that a success message appears (typically a green
banner saying "Changes saved" or similar)
3. Verify that the values you entered are still displayed
correctly in the form
4. Only proceed to the next step after both verifications pass
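The "wait, then verify" step generalizes to a polling loop. In this sketch, `check` is a hypothetical predicate (for example, "a green 'Changes saved' banner is visible in the latest screenshot"), and `sleep` is injected so the loop can be tested without real delays:

```python
# Sketch of "wait, then verify" as a bounded polling loop. check is a
# hypothetical success predicate; sleep is injected for testability.

def verify_with_timeout(check, sleep, timeout=5.0, interval=0.5):
    """Poll for a success indicator before proceeding to the next step."""
    waited = 0.0
    while waited <= timeout:
        if check():
            return True
        sleep(interval)
        waited += interval
    return False  # treat as a failed step: stop, do not continue blindly
```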
Tip
Build a "verify before proceeding" habit into every computer-use prompt. The cost of pausing to verify is a few seconds of latency. The cost of not verifying is an entire workflow going sideways because step 3 failed silently and the agent kept going through steps 4 through 12.
6. Limit Scope to One Task Per Instruction Set
Computer use prompts should be narrow and focused. "Set up my entire development environment" is too broad. "Install VS Code from the website and verify it opens" is appropriately scoped.
Complex workflows should be broken into sequential instruction sets, each with its own safety boundaries and verification steps. This gives you checkpoints where you can review the agent's progress and catch problems before they propagate.
7. Provide Fallback Behaviors
Not every situation can be anticipated. But you can define what the agent should do when it encounters something unexpected — a default behavior that errs on the side of caution.
FALLBACK BEHAVIOR:
- If you encounter any dialog, popup, or prompt you were not
explicitly briefed on, STOP and take a screenshot. Do not
dismiss it, do not click any of its buttons, and do not
try to work around it.
- If a page takes more than 15 seconds to load, STOP and report
the issue rather than refreshing.
- If you are unsure which of two similar-looking elements to
interact with, STOP and ask rather than guessing.
8. Set Explicit Stop Conditions
The agent needs to know when it's done — and when it should stop even if it's not done.
STOP CONDITIONS:
- Stop after successfully exporting 3 CSV files (the task
is complete)
- Stop if you encounter any error message you cannot resolve
in one attempt
- Stop if you have performed more than 30 total click actions
(something is wrong if the task requires this many)
- Stop if you are prompted for credentials you were not given
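Like safety boundaries, stop conditions are worth enforcing outside the prompt, so a runaway agent is halted in code even if it ignores its instructions. A sketch using the 30-click budget from the example above:

```python
# Sketch of tracking stop conditions in the harness, not just the prompt.
# The 30-click budget mirrors the example above.

class StopConditions:
    def __init__(self, max_clicks=30):
        self.max_clicks = max_clicks
        self.clicks = 0
        self.done = False

    def record_click(self):
        self.clicks += 1

    def should_stop(self, unresolved_error=False, credential_prompt=False):
        return (self.done
                or unresolved_error
                or credential_prompt
                or self.clicks > self.max_clicks)
```

The harness calls `record_click()` on every click action and checks `should_stop()` before each step, cutting the session off the moment any condition trips.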
Writing Effective Computer Use Instructions
With the principles established, here's how to structure actual computer-use prompts for common scenarios.
Task Decomposition for Browser Workflows
Break complex browser tasks into phases, each with a clear objective and verification checkpoint.
TASK: Collect pricing information from three competitor websites.
PHASE 1 — Competitor A (https://competitor-a.com/pricing)
Objective: Extract plan names, prices, and feature lists.
Steps:
1. Navigate to the pricing page URL above.
2. VERIFY: Page title or heading contains "Pricing."
3. For each pricing tier visible on the page, record:
- Plan name
- Monthly price
- Top 5 listed features
4. If pricing requires toggling between "Monthly" and
"Annual," record both price points.
5. VERIFY: You have recorded at least 2 pricing tiers.
CHECKPOINT: Present the data collected for Competitor A
before moving to Phase 2.
Step-by-Step vs. Goal-Based Instructions
You have two approaches for structuring computer-use instructions, and each has tradeoffs.
Step-by-step tells the agent exactly what to click, in what order. This is safer for critical tasks but brittle — if the interface changes, the instructions break.
STEP-BY-STEP APPROACH:
1. Click the hamburger menu icon (three horizontal lines)
in the top-left corner
2. Click "Settings" in the dropdown menu
3. Scroll down to the "Notifications" section
4. Uncheck the box labeled "Email notifications"
5. Click the blue "Save" button at the bottom of the page
Goal-based tells the agent what to achieve and lets it figure out the navigation. This is more flexible but riskier — the agent might take unexpected paths.
GOAL-BASED APPROACH:
Goal: Disable email notifications in the application settings.
Expected location: Settings > Notifications
Safety: Do not change any other settings. Only modify the
email notification toggle/checkbox.
For most computer-use tasks, a hybrid approach works best: goal-based framing with step-by-step guidance for the critical path, and explicit safety boundaries regardless.
Tip
Use step-by-step instructions for any action that is destructive or irreversible (deleting, submitting, purchasing). Use goal-based instructions for exploratory or read-only tasks (finding information, taking screenshots, navigating to a page).
Screenshot-Aware Prompting
Since the AI works from screenshots, write instructions that reference visual elements the way a human would describe them to someone looking at the same screen.
VISUAL REFERENCES:
- The "New Project" button is a large blue button with a
"+" icon, typically located in the upper-right area of
the dashboard.
- The project list appears as a series of cards arranged
in a grid. Each card shows a project name in bold text
at the top, a description below it, and a date at the
bottom-right corner.
- The settings gear icon is a small gray cog icon in the
top-right corner, next to your profile avatar.
Form Filling and Data Entry
Form filling is one of the most common computer-use tasks. The key is specifying both the data and the exact fields, because forms often have fields with similar labels.
TASK: Fill out the contact form at https://example.com/contact
DATA TO ENTER:
- "First Name" field: John
- "Last Name" field: Smith
- "Email" field: john.smith@example.com
- "Subject" dropdown: Select "General Inquiry"
- "Message" text area: "I would like to request a demo
of your enterprise plan."
AFTER FILLING:
1. Review all fields visually to confirm data matches
what was specified above
2. VERIFY the email field shows "john.smith@example.com"
(typos in email fields are common and costly)
3. Click "Submit" only after verification passes
DO NOT click "Subscribe to newsletter" or any checkbox
that was not listed in the data above.
Navigation and Multi-Page Workflows
When a task spans multiple pages, each navigation action is a potential failure point. Account for page load times and URL verification.
TASK: Download the most recent invoice from the billing portal.
Step 1: Navigate to https://app.example.com/billing
VERIFY: URL bar shows the billing page. Page heading
says "Billing" or "Invoices."
WAIT: Allow up to 5 seconds for the page to fully load.
IF REDIRECT: If you are redirected to a login page,
STOP — do not attempt to log in.
Step 2: Locate the most recent invoice in the list
The invoices should appear in a table or list format,
sorted by date with the most recent at the top.
VERIFY: The top invoice date is within the last 30 days.
Step 3: Click the download icon or "Download PDF" link
for that invoice
VERIFY: A file download begins. The downloaded filename
should contain "invoice" and a date.
IF POPUP: If a dialog asks for file format, select "PDF."
IF NO DOWNLOAD: Try right-clicking the invoice row and
looking for a "Download" option in the context menu.
Step 4: Confirm the download completed
Check the browser's download bar or notification area.
VERIFY: File size is greater than 0 bytes.
Multi-Application Workflows
Some tasks require switching between applications. These need extra care because the agent must identify which application is in the foreground and navigate between windows.
TASK: Copy a table from a webpage into a spreadsheet.
Phase 1 — Browser:
1. Navigate to https://example.com/quarterly-report
2. Locate the table titled "Q1 Revenue by Region"
3. Select all cells in the table (click the first cell,
then Shift+click the last cell)
4. Copy the selection (Ctrl+C / Cmd+C)
VERIFY: Status bar or tooltip confirms "Copied to clipboard"
Phase 2 — Switch Application:
5. Open the spreadsheet application (look for it in the
taskbar or dock)
IF NOT OPEN: STOP — do not open new applications unless
the spreadsheet was already running
6. VERIFY: The spreadsheet application is now in the
foreground and you can see an open worksheet
Phase 3 — Spreadsheet:
7. Click cell A1 in the spreadsheet
8. Paste the data (Ctrl+V / Cmd+V)
9. VERIFY: The pasted data contains the same number of
rows and columns as the original table
Safety and Guardrails
Computer use is powerful, and that power demands serious safety measures. This is not theoretical — a poorly configured computer-use agent with access to your primary browser session can interact with every logged-in service: email, banking, cloud infrastructure, social media.
Warning
Never run a computer-use agent on your primary desktop without sandboxing. The agent has access to everything visible on screen, including browser sessions where you're logged into sensitive services. A single misinterpreted instruction could interact with the wrong application entirely.
Sandboxing Computer Use
The safest approach is to run computer-use agents in an isolated environment:
- Virtual machines: Run the agent inside a VM with a clean browser profile. No saved passwords, no logged-in sessions beyond what the task requires.
- Containers: Docker containers with virtual display servers (like Xvfb) provide lightweight isolation. The agent can interact with a browser inside the container without access to your host system.
- Dedicated browser profiles: At minimum, use a separate browser profile with no saved credentials, no extensions, and no autofill data.
Preventing Unintended Actions
Beyond sandboxing, build prevention into your prompts:
- Allowlists over blocklists: Rather than listing everything the agent shouldn't do, specify exactly which websites and applications it is allowed to interact with. "You may ONLY interact with https://app.example.com. If any other domain appears, STOP."
- Action budgets: Limit the total number of actions. "Perform no more than 20 click actions total. If the task isn't complete by then, stop and report what remains."
- Confirmation gates: For any destructive action, require the agent to pause and describe what it's about to do before doing it.
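An allowlist is also easy to enforce mechanically: check every URL the agent is about to visit against the small set of permitted hosts before the navigation happens. A sketch (the hostname here is a placeholder for your own):

```python
# Sketch of an allowlist enforced in code: every navigation target is
# checked against the permitted hosts before it happens. The hostname
# is a placeholder for your own allowed domain(s).
from urllib.parse import urlparse

ALLOWED_HOSTS = {"app.example.com"}

def is_url_allowed(url: str) -> bool:
    """Allow only exact matches against the permitted hostnames."""
    host = urlparse(url).hostname or ""
    return host.lower() in ALLOWED_HOSTS
```

Note the exact-match comparison: a substring check would wave through lookalike domains such as `app.example.com.attacker.net`.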
Sensitive Data Handling
Computer-use agents see everything on screen, including sensitive data.
- Never leave password managers unlocked during agent sessions
- Close tabs and applications that aren't relevant to the task
- If the agent needs to enter credentials, pass them through environment variables or secure configuration — not in the prompt text
- Be aware that screenshots taken during computer use may be logged and stored by the AI provider
Warning
Screenshots are data. Every screenshot the computer-use agent takes may be transmitted to the AI provider's servers for processing. Do not run computer-use agents on screens displaying confidential information, medical records, financial data, or anything you wouldn't want captured in a log.
Rate Limiting Actions
Runaway agents can perform hundreds of actions per minute. Build rate limits into your instructions:
RATE LIMITS:
- Wait at least 1 second between click actions
- Wait at least 3 seconds after any page navigation
before taking the next action
- If you perform more than 5 actions in 10 seconds,
pause for 5 seconds before continuing
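The same limits can live in the harness as a pacing helper. In this sketch, `clock` and `sleep` are injected so the limiter can be tested without real delays:

```python
# Sketch of the rate limits above as a pacing helper. clock and sleep
# are injected (e.g. time.monotonic and time.sleep in production) so the
# limiter is testable without real delays.

class ActionPacer:
    def __init__(self, clock, sleep, min_gap=1.0):
        self.clock = clock
        self.sleep = sleep
        self.min_gap = min_gap      # seconds between click actions
        self.last_action = None

    def before_action(self):
        """Block until at least min_gap has passed since the last action."""
        now = self.clock()
        if self.last_action is not None:
            remaining = self.min_gap - (now - self.last_action)
            if remaining > 0:
                self.sleep(remaining)
                now += remaining
        self.last_action = now
```

Call `before_action()` immediately before each click; the first action goes through instantly, and subsequent ones are delayed just enough to respect the gap.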
Human-in-the-Loop Checkpoints
For any task with real consequences, require human approval at key decision points:
HUMAN APPROVAL REQUIRED BEFORE:
- Submitting any form
- Clicking any button that triggers a purchase or payment
- Sending any email or message
- Deleting or modifying any data
- Navigating away from the specified domain
Present a summary of what you plan to do and wait for
explicit "proceed" instruction before taking the action.
Real-World Use Cases
Computer use is still emerging, but several categories of tasks are already practical and valuable.
Web Scraping and Data Collection
When websites lack APIs or block traditional scraping tools, computer-use agents can navigate them like a human user — scrolling, clicking pagination links, and extracting visible text from screenshots.
TASK: Collect product listings from the first 3 pages of
search results on example-marketplace.com.
For each product, record:
- Product name (the bold text on each listing card)
- Price (to the right of the product name)
- Seller name (below the price in smaller gray text)
- Star rating (number of filled stars, out of 5)
Navigation:
- After recording all products on a page, click the
"Next" button at the bottom to advance
- VERIFY: The page number indicator updates after clicking
- STOP after completing page 3
Output: Present the data as a structured table.
QA and Testing Automation
Computer-use agents can walk through web applications the way a manual QA tester would — filling forms, clicking buttons, and verifying that the right things happen. This is particularly useful for testing workflows that are hard to automate with traditional testing frameworks.
TASK: Test the user registration flow.
1. Navigate to https://staging.example.com/register
2. Fill in the registration form with test data:
- Name: "Test User"
- Email: "testuser_[timestamp]@example.com"
- Password: "TestPass123!"
3. Click "Create Account"
4. VERIFY: A success message appears OR you are redirected
to a welcome page
5. CHECK: Does the welcome page display the name "Test User"?
6. CHECK: Is there a confirmation email prompt?
Report: List each CHECK with PASS/FAIL and a screenshot
of the final state.
Form Filling and Data Migration
Migrating data between systems that don't have APIs often means manually entering records. Computer-use agents can automate this tedious process.
Info
Data migration with computer use works best for small-to-medium datasets (tens to hundreds of records) where building a proper integration would take more time than the migration itself. For large datasets (thousands of records), invest in an API-based solution or database migration instead.
Competitive Research and Monitoring
Regularly checking competitor websites for pricing changes, new features, or updated positioning is a natural fit for computer use. The agent navigates to specific pages, reads the visible content, and reports changes from previous checks.
Administrative Task Automation
Routine admin work — updating spreadsheets, filing reports in internal tools, downloading recurring exports — can often be automated with computer-use agents. These are typically low-risk, high-repetition tasks where the workflow doesn't change often.
Common Mistakes
These five mistakes account for most computer-use failures. Each one seems minor in isolation but can derail an entire workflow.
1. Assuming the AI Understands the Full Page
The AI sees a screenshot — a fixed-size image of whatever is currently visible in the viewport. It does not see content below the fold, inside collapsed menus, or behind modals. If the button you want is below the visible area, you need to explicitly instruct the agent to scroll.
BAD: "Click the Submit button."
(What if Submit is below the fold?)
BETTER: "Scroll to the bottom of the form. Look for a
button labeled 'Submit' — it should be below the last
form field. If you don't see it after scrolling, the
page layout may have changed. STOP and take a screenshot."
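The "scroll, then look" instruction corresponds to a bounded search loop. `find` and `scroll_down` are hypothetical stand-ins for the platform's visual search and scroll actions:

```python
# Sketch of "scroll, then look" as a bounded loop. find and scroll_down
# are hypothetical stand-ins for the platform's visual search and scroll
# actions; the scroll cap prevents endless scrolling on infinite pages.

def scroll_until_visible(find, scroll_down, label, max_scrolls=10):
    """Scroll in bounded increments until the element appears, or give up."""
    for _ in range(max_scrolls):
        target = find(label)
        if target is not None:
            return target
        scroll_down()
    return None  # not found: stop and screenshot rather than guessing
```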
2. Not Handling Popups, Modals, and Unexpected States
Cookie consent banners. Newsletter signup modals. Chat widgets. Browser notification permission prompts. These appear unpredictably and cover the elements the agent is trying to interact with. Without explicit handling, the agent either clicks the popup's buttons (potentially opting into things you didn't want) or gets stuck trying to click elements hidden behind the overlay.
POPUP HANDLING:
- If a cookie consent banner appears, click "Reject All"
or "Necessary Only." If neither option exists, click
"Accept" to dismiss it.
- If a newsletter or signup modal appears, look for an
"X" close button in the top-right corner and click it.
- If a chat widget obscures part of the page, ignore it
if possible. If it blocks a required element, look for
a minimize or close button on the widget.
- If a browser notification prompt appears ("This site
wants to send notifications"), click "Block" or "Deny."
3. Giving Too Many Steps at Once
Long instruction sets create compounding error risk. If step 3 fails subtly and the agent continues through step 20, you've wasted time and potentially caused damage. Break workflows into phases of 3-7 steps each, with verification checkpoints between them.
Tip
The ideal computer-use instruction set is 3-7 steps long. If your task requires more, split it into multiple instruction sets with human review between each phase. This mirrors how you'd supervise a new employee on their first day — you wouldn't give them a 30-step task and walk away.
4. No Safety Boundaries
The most dangerous computer-use prompts are the ones that say what to do but never say what not to do. Without explicit boundaries, the agent applies its best judgment — and its best judgment may include clicking through warning dialogs, dismissing confirmation prompts, or navigating to unexpected pages to "help."
Always define safety boundaries before task instructions. Make them the first thing in your prompt, not an afterthought.
5. Not Verifying Actions Completed
"Click save" is not the same as "click save and verify the data was saved." The click might miss. The page might error. The save might silently fail. Without verification, the agent moves on assuming success, and you don't discover the failure until the end of the workflow — or worse, days later when you notice data is missing.
Every action-verification pair follows this pattern:
ACTION: Click "Save Changes"
WAIT: 2-3 seconds for the page to respond
VERIFY: Look for a success indicator:
- A green "Saved" confirmation message, OR
- The page refreshing with the updated values, OR
- A brief toast notification confirming the save
IF NO CONFIRMATION: Try clicking "Save Changes" one more
time. If there is still no confirmation after 5 seconds,
STOP and report the issue.
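The same action/wait/verify/retry pattern, with exactly one retry, can be sketched as a harness routine. `act`, `confirmed`, and `wait` are hypothetical: `act` clicks "Save Changes", `confirmed` checks the latest screenshot for a success indicator:

```python
# Sketch of the action/wait/verify pattern above, with exactly one retry.
# act, confirmed, and wait are hypothetical: act performs the click,
# confirmed checks the latest screenshot for a success indicator.

def act_and_verify(act, confirmed, wait, retries=1):
    """Perform the action, verify it landed, retry once, then give up."""
    for attempt in range(retries + 1):
        act()
        wait(2.5)              # give the page time to respond
        if confirmed():
            return True
    return False  # no confirmation: stop and report, do not continue
```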
Building Your Computer Use Prompts
Creating effective computer-use prompts follows the same general principles as any advanced prompting: structure, specificity, and safety. If you're building prompts for AI agents or automated workflows, many of the patterns transfer — but computer use demands extra care because the actions are real and visual.
Start with the prompt generator to build your base instructions, then layer on the computer-use-specific elements: safety boundaries, screen state descriptions, verification steps, and fallback behaviors. For a deeper understanding of how agentic AI and tool use work together, read the AI agents prompting guide and the developer-focused prompt engineering guide.
A template for getting started:
SAFETY BOUNDARIES:
[What the agent must never do]
ENVIRONMENT:
[What application or browser is open, what URL to start at,
what the agent should see on screen]
TASK:
[Clear objective in one sentence]
STEPS:
[3-7 steps, each with expected screen state and error recovery]
VERIFICATION:
[How to confirm the task completed successfully]
STOP CONDITIONS:
[When to stop, whether successful or not]
FALLBACK:
[What to do when something unexpected happens]
Closing Thoughts
Computer use is the bridge between AI as a thinking tool and AI as a doing tool. For the first time, the same models that can reason about code, analyze data, and write strategy can also interact with the interfaces where work actually happens — the dashboards, the forms, the applications you use every day.
But this bridge carries traffic in both directions. The same capability that lets an AI fill out a form perfectly can let it fill out the wrong form. The same capability that lets it download a report can let it interact with systems it shouldn't touch. The gap between "helpful automation" and "costly mistake" is the quality of your instructions.
The prompting patterns in this guide — explicit actions, safety boundaries, expected states, verification steps, error recovery, scoped tasks, fallback behaviors, and stop conditions — are not optional. They are the minimum safety infrastructure for giving an AI control over your computer.
Start small. Pick a single, low-stakes, repetitive task. Write instructions using the template above. Run it in a sandboxed environment. Verify every step. Then gradually expand from there, building confidence in both the technology and your ability to direct it.
The AI doesn't need to be perfect. Your instructions need to make imperfection safe.