
Computer Use

Computer use is an Anthropic capability in which Claude controls a virtual computer via screenshots and keyboard/mouse actions. The model sees the current screen, plans the next action, executes it (click at coordinates, type text, scroll), observes the resulting screenshot, and continues the loop. This enables agentic tasks like browser automation, form filling, and app navigation without a purpose-built API wrapper for every target app. Each step has meaningful latency and token cost because every observation is an image, so computer use is best reserved for tasks where no structured API exists.
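The observe, plan, act cycle described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not Anthropic's API: `FakeComputer`, `Action`, `toy_planner`, and `run_loop` are hypothetical names, the "screenshot" is a text label standing in for an image, and the planner is a hard-coded stub where a real agent would ask the model for the next action.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """One keyboard/mouse action (hypothetical shape, for illustration)."""
    kind: str          # "click", "type", or "scroll"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeComputer:
    """Stand-in for the virtual computer; a real one would return image bytes."""
    screen: str = "desktop"
    log: List[Action] = field(default_factory=list)

    def screenshot(self) -> str:
        return self.screen

    def execute(self, action: Action) -> None:
        self.log.append(action)
        # Toy state transitions so the loop has something to observe.
        if action.kind == "click" and self.screen == "desktop":
            self.screen = "app_window"
        elif action.kind == "type" and self.screen == "app_window":
            self.screen = "search_results"


def toy_planner(observation: str) -> Optional[Action]:
    """Hard-coded stub; in practice the model chooses the next action."""
    if observation == "desktop":
        return Action("click", x=40, y=60)
    if observation == "app_window":
        return Action("type", text="Acme")
    return None  # nothing left to do


def run_loop(computer: FakeComputer,
             planner: Callable[[str], Optional[Action]],
             max_steps: int = 10) -> int:
    """Observe, plan, act, repeat; returns the number of actions executed."""
    steps = 0
    while steps < max_steps:
        observation = computer.screenshot()   # observe
        action = planner(observation)         # plan
        if action is None:
            break
        computer.execute(action)              # act
        steps += 1
    return steps


if __name__ == "__main__":
    computer = FakeComputer()
    print(run_loop(computer, toy_planner), computer.screen)
```

The `max_steps` cap is a design choice in this sketch: because every iteration costs a screenshot and a planning call, bounding the loop keeps a confused agent from running up latency and token cost indefinitely.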

Example

Given the instruction "open the invoicing app, find last month's invoice for Acme, and email it to finance@example.com," a computer-use agent screenshots the desktop, clicks the app icon, screenshots the app window, clicks the invoices tab, types "Acme" into the search field, opens the matching row, clicks Share, types the recipient address, and clicks Send. After each action it takes a new screenshot and verifies progress before choosing the next action.
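The verify-before-continuing pattern in this example can be sketched as a scripted plan in which each action is paired with the screen the agent expects to see afterwards. Everything here is an assumption made for illustration: the `Step` type, the expected-screen labels, and the `simulated_perform` helper are hypothetical, and a real agent would compare screenshots rather than text labels.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    action: str   # what the agent does
    expect: str   # screen it expects to observe afterwards (assumed labels)


INVOICE_PLAN = [
    Step("click app icon", "app window"),
    Step("click invoices tab", "invoice list"),
    Step('type "Acme" in search', "filtered list"),
    Step("open matching row", "invoice detail"),
    Step("click Share", "share dialog"),
    Step("type finance@example.com", "recipient filled"),
    Step("click Send", "sent confirmation"),
]


def run_plan(plan: List[Step],
             perform: Callable[[str], str]) -> Tuple[List[str], Optional[Step]]:
    """Run steps in order, checking the observed screen after each one.

    Returns the completed actions and the first step whose check failed
    (None if every step was verified).
    """
    completed: List[str] = []
    for step in plan:
        observed = perform(step.action)   # act, then "screenshot"
        if observed != step.expect:       # verify before moving on
            return completed, step
        completed.append(step.action)
    return completed, None


def simulated_perform(plan: List[Step],
                      fail_on: Optional[str] = None) -> Callable[[str], str]:
    """Build a fake environment that returns the expected screen for each
    action, except for `fail_on`, which yields an unexpected screen."""
    lookup = {s.action: s.expect for s in plan}

    def perform(action: str) -> str:
        return "unexpected screen" if action == fail_on else lookup[action]

    return perform


if __name__ == "__main__":
    done, failed = run_plan(INVOICE_PLAN, simulated_perform(INVOICE_PLAN))
    print(len(done), failed)
```

Stopping at the first failed check, rather than continuing blindly, keeps the agent from typing the recipient address into the wrong window or sending the wrong invoice.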
