Computer use tool - Anthropic (original) (raw)

Claude can interact with computer environments through the computer use tool, which provides screenshot capabilities and mouse/keyboard control for autonomous desktop interaction.

Overview

Computer use is a beta feature that enables Claude to interact with desktop environments. This tool provides:

While computer use can be augmented with other tools like bash and text editor for more comprehensive automation workflows, computer use specifically refers to the computer use tool’s capability to see and control desktop environments.

Model compatibility

Computer use is available for the following Claude models:

Model Tool Version Beta Flag
Claude 4 Opus & Sonnet computer_20250124 computer-use-2025-01-24
Claude Sonnet 3.7 computer_20250124 computer-use-2025-01-24
Claude Sonnet 3.5 (new) computer_20241022 computer-use-2024-10-22

Security considerations

Computer use reference implementationGet started quickly with our computer use reference implementation that includes a web interface, Docker container, example tool implementations, and an agent loop.Note: The implementation has been updated to include new tools for both Claude 4 and Claude Sonnet 3.7. Be sure to pull the latest version of the repo to access these new features.

Quick start

Here’s how to get started with computer use:


How computer use works

We refer to the repetition of steps 3 and 4 without user input as the “agent loop” - i.e., Claude responding with a tool use request and your application responding to Claude with the results of evaluating that request.

The computing environment

Computer use requires a sandboxed computing environment where Claude can safely interact with applications and the web. This environment includes:

  1. Virtual display: A virtual X11 display server (using Xvfb) that renders the desktop interface Claude will see through screenshots and control with mouse/keyboard actions.
  2. Desktop environment: A lightweight UI with window manager (Mutter) and panel (Tint2) running on Linux, which provides a consistent graphical interface for Claude to interact with.
  3. Applications: Pre-installed Linux applications like Firefox, LibreOffice, text editors, and file managers that Claude can use to complete tasks.
  4. Tool implementations: Integration code that translates Claude’s abstract tool requests (like “move mouse” or “take screenshot”) into actual operations in the virtual environment.
  5. Agent loop: A program that handles communication between Claude and the environment, sending Claude’s actions to the environment and returning the results (screenshots, command outputs) back to Claude.

When you use computer use, Claude doesn’t directly connect to this environment. Instead, your application:

  1. Receives Claude’s tool use requests
  2. Translates them into actions in your computing environment
  3. Captures the results (screenshots, command outputs, etc.)
  4. Returns these results to Claude

For security and isolation, the reference implementation runs all of this inside a Docker container with appropriate port mappings for viewing and interacting with the environment.


How to implement computer use

Start with our reference implementation

We have built a reference implementation that includes everything you need to get started quickly with computer use:

Understanding the multi-agent loop

The core of computer use is the “agent loop” - a cycle where Claude requests tool actions, your application executes them, and returns results to Claude. Here’s a simplified example:

The loop continues until either Claude responds without requesting any tools (task completion) or the maximum iteration limit is reached. This safeguard prevents potential infinite loops that could result in unexpected API costs.

We recommend trying the reference implementation out before reading the rest of this documentation.

Optimize model performance with prompting

Here are some tips on how to get the best quality outputs:

  1. Specify simple, well-defined tasks and provide explicit instructions for each step.
  2. Claude sometimes assumes outcomes of its actions without explicitly checking their results. To prevent this you can prompt Claude with After each step, take a screenshot and carefully evaluate if you have achieved the right outcome. Explicitly show your thinking: "I have evaluated step X..." If not correct, try again. Only when you confirm a step was executed correctly should you move on to the next one.
  3. Some UI elements (like dropdowns and scrollbars) might be tricky for Claude to manipulate using mouse movements. If you experience this, try prompting the model to use keyboard shortcuts.
  4. For repeatable tasks or UI interactions, include example screenshots and tool calls of successful outcomes in your prompt.
  5. If you need the model to log in, provide it with the username and password in your prompt inside xml tags like <robot_credentials>. Using computer use within applications that require login increases the risk of bad outcomes as a result of prompt injection. Please review our guide on mitigating prompt injections before providing the model with login credentials.

System prompts

When one of the Anthropic-defined tools is requested via the Anthropic API, a computer use-specific system prompt is generated. It’s similar to the tool use system prompt but starts with:

You have access to a set of functions you can use to answer the user’s question. This includes access to a sandboxed computing environment. You do NOT currently have the ability to inspect files or interact with external resources, except by invoking the below functions.

As with regular tool use, the user-provided system_prompt field is still respected and used in the construction of the combined system prompt.

Available actions

The computer use tool supports these actions:

Basic actions (all versions)

**Enhanced actions (computer_20250124)**Available in Claude 4 and Claude Sonnet 3.7:

Tool parameters

Parameter Required Description
type Yes Tool version (computer_20250124 or computer_20241022)
name Yes Must be “computer”
display_width_px Yes Display width in pixels
display_height_px Yes Display height in pixels
display_number No Display number for X11 environments

Enable thinking capability in Claude 4 and Claude Sonnet 3.7

Claude Sonnet 3.7 introduced a new “thinking” capability that allows you to see the model’s reasoning process as it works through complex tasks. This feature helps you understand how Claude is approaching a problem and can be particularly valuable for debugging or educational purposes.

To enable thinking, add a thinking parameter to your API request:

The budget_tokens parameter specifies how many tokens Claude can use for thinking. This is subtracted from your overall max_tokens budget.

When thinking is enabled, Claude will return its reasoning process as part of the response, which can help you:

  1. Understand the model’s decision-making process
  2. Identify potential issues or misconceptions
  3. Learn from Claude’s approach to problem-solving
  4. Get more visibility into complex multi-step operations

Here’s an example of what thinking output might look like:

Augmenting computer use with other tools

The computer use tool can be combined with other tools to create more powerful automation workflows. This is particularly useful when you need to:

Build a custom computer use environment

The reference implementation is meant to help you get started with computer use. It includes all of the components needed have Claude use a computer. However, you can build your own environment for computer use to suit your needs. You’ll need:

Implement the computer use tool

The computer use tool is implemented as a schema-less tool. When using this tool, you don’t need to provide an input schema as with other tools; the schema is built into Claude’s model and can’t be modified.

Handle errors

When implementing the computer use tool, various errors may occur. Here’s how to handle them:

Follow implementation best practices


Understand computer use limitations

The computer use functionality is in beta. While Claude’s capabilities are cutting edge, developers should be aware of its limitations:

  1. Latency: the current computer use latency for human-AI interactions may be too slow compared to regular human-directed computer actions. We recommend focusing on use cases where speed isn’t critical (e.g., background information gathering, automated software testing) in trusted environments.
  2. Computer vision accuracy and reliability: Claude may make mistakes or hallucinate when outputting specific coordinates while generating actions. Claude Sonnet 3.7 introduces the thinking capability that can help you understand the model’s reasoning and identify potential issues.
  3. Tool selection accuracy and reliability: Claude may make mistakes or hallucinate when selecting tools while generating actions or take unexpected actions to solve problems. Additionally, reliability may be lower when interacting with niche applications or multiple applications at once. We recommend that users prompt the model carefully when requesting complex tasks.
  4. Scrolling reliability: While Claude Sonnet 3.5 (new) had limitations with scrolling, Claude Sonnet 3.7 introduces dedicated scroll actions with direction control that improves reliability. The model can now explicitly scroll in any direction (up/down/left/right) by a specified amount.
  5. Spreadsheet interaction: Mouse clicks for spreadsheet interaction have improved in Claude Sonnet 3.7 with the addition of more precise mouse control actions like left_mouse_down, left_mouse_up, and new modifier key support. Cell selection can be more reliable by using these fine-grained controls and combining modifier keys with clicks.
  6. Account creation and content generation on social and communications platforms: While Claude will visit websites, we are limiting its ability to create accounts or generate and share content or otherwise engage in human impersonation across social media websites and platforms. We may update this capability in the future.
  7. Vulnerabilities: Vulnerabilities like jailbreaking or prompt injection may persist across frontier AI systems, including the beta computer use API. In some circumstances, Claude will follow commands found in content, sometimes even in conflict with the user’s instructions. For example, Claude instructions on webpages or contained in images may override instructions or cause Claude to make mistakes. We recommend: a. Limiting computer use to trusted environments such as virtual machines or containers with minimal privileges b. Avoiding giving computer use access to sensitive accounts or data without strict oversight c. Informing end users of relevant risks and obtaining their consent before enabling or requesting permissions necessary for computer use features in your applications
  8. Inappropriate or illegal actions: Per Anthropic’s terms of service, you must not employ computer use to violate any laws or our Acceptable Use Policy.

Always carefully review and verify Claude’s computer use actions and logs. Do not use Claude for tasks requiring perfect precision or sensitive user information without human oversight.


Pricing

Computer use follows the standard tool use pricing. When using the computer use tool:

System prompt overhead: The computer use beta adds 466-499 tokens to the system prompt

Computer use tool token usage:

Model Input tokens per tool definition
Claude 4 / Sonnet 3.7 735 tokens
Claude Sonnet 3.5 683 tokens

Additional token consumption:

Note: If you’re also using bash or text editor tools alongside computer use, those tools have their own token costs as documented in their respective pages.

Next steps