Skip to content

Latest commit

 

History

History
526 lines (407 loc) · 10.1 KB

File metadata and controls

526 lines (407 loc) · 10.1 KB
title Computer Action
openapi POST /computer-use
description Execute computer actions in the virtual desktop environment

Execute actions like mouse movements, clicks, keyboard input, and screenshots in the Bytebot desktop environment.

Request

The type of computer action to perform. Must be one of: `move_mouse`, `trace_mouse`, `click_mouse`, `press_mouse`, `drag_mouse`, `scroll`, `type_keys`, `press_keys`, `type_text`, `wait`, `screenshot`, `cursor_position`.

Mouse Actions

The target coordinates to move to.
<Expandable title="coordinates properties">
  <ParamField body="x" type="number" required>
    X coordinate (horizontal position)
  </ParamField>
  <ParamField body="y" type="number" required>
    Y coordinate (vertical position)
  </ParamField>
</Expandable>

Example Request

{
  "action": "move_mouse",
  "coordinates": {
    "x": 100,
    "y": 200
  }
}
Array of coordinate objects for the mouse path.
<Expandable title="path">
  <ParamField body="0" type="object">
    <Expandable title="properties">
      <ParamField body="x" type="number" required>
        X coordinate
      </ParamField>
      <ParamField body="y" type="number" required>
        Y coordinate
      </ParamField>
    </Expandable>
  </ParamField>
</Expandable>

{" "}

Keys to hold while moving the mouse along the path.

Example Request

{
  "action": "trace_mouse",
  "path": [
    { "x": 100, "y": 100 },
    { "x": 150, "y": 150 },
    { "x": 200, "y": 200 }
  ],
  "holdKeys": ["shift"]
}
The coordinates to click (uses current cursor position if omitted).
<Expandable title="coordinates properties">
  <ParamField body="x" type="number" required>
    X coordinate (horizontal position)
  </ParamField>
  <ParamField body="y" type="number" required>
    Y coordinate (vertical position)
  </ParamField>
</Expandable>

{" "}

Mouse button to click. Must be one of: `left`, `right`, `middle`.

{" "}

Number of clicks to perform.

{" "}

Keys to hold while clicking (e.g., ['ctrl', 'shift']) Key name

Example Request

{
  "action": "click_mouse",
  "coordinates": {
    "x": 150,
    "y": 250
  },
  "button": "left",
  "clickCount": 2
}
The coordinates to press/release (uses current cursor position if omitted).
<Expandable title="coordinates properties">
  <ParamField body="x" type="number" required>
    X coordinate (horizontal position)
  </ParamField>
  <ParamField body="y" type="number" required>
    Y coordinate (vertical position)
  </ParamField>
</Expandable>

{" "}

Mouse button to press/release. Must be one of: `left`, `right`, `middle`.

{" "}

Whether to press or release the button. Must be one of: `up`, `down`.

Example Request

{
  "action": "press_mouse",
  "coordinates": {
    "x": 150,
    "y": 250
  },
  "button": "left",
  "press": "down"
}
Array of coordinate objects for the drag path.
<Expandable title="path">
  <ParamField body="0" type="object">
    <Expandable title="properties">
      <ParamField body="x" type="number" required>
        X coordinate
      </ParamField>
      <ParamField body="y" type="number" required>
        Y coordinate
      </ParamField>
    </Expandable>
  </ParamField>
</Expandable>

{" "}

Mouse button to use for dragging. Must be one of: `left`, `right`, `middle`.

{" "}

Keys to hold while dragging.

Example Request

{
  "action": "drag_mouse",
  "path": [
    { "x": 100, "y": 100 },
    { "x": 200, "y": 200 }
  ],
  "button": "left"
}
The coordinates to scroll at (uses current cursor position if omitted).
<Expandable title="coordinates properties">
  <ParamField body="x" type="number" required>
    X coordinate
  </ParamField>
  <ParamField body="y" type="number" required>
    Y coordinate
  </ParamField>
</Expandable>

{" "}

Scroll direction. Must be one of: `up`, `down`, `left`, `right`.

{" "}

Number of scroll steps to perform.

{" "}

Keys to hold while scrolling.

Example Request

{
  "action": "scroll",
  "direction": "down",
  "scrollCount": 5
}

Keyboard Actions

Array of keys to type in sequence. Key name

{" "}

Delay between key presses in milliseconds.

Example Request

{
  "action": "type_keys",
  "keys": ["a", "b", "c", "enter"],
  "delay": 50
}
Array of keys to press or release. Key name

{" "}

Whether to press or release the keys. Must be one of: `up`, `down`.

Example Request

{
  "action": "press_keys",
  "keys": ["ctrl", "shift", "esc"],
  "press": "down"
}
The text string to type.

{" "}

Delay between characters in milliseconds.

Example Request

{
  "action": "type_text",
  "text": "Hello, Bytebot!",
  "delay": 50
}
The text to paste. Useful for special characters that aren't on the standard keyboard.

Example Request

{
  "action": "paste_text",
  "text": "Special characters: ©®™€¥£ émojis 🎉"
}

System Actions

Wait duration in milliseconds.

Example Request

{
  "action": "wait",
  "duration": 2000
}
No parameters required.

Example Request

{
  "action": "screenshot"
}
No parameters required.

Example Request

{
  "action": "cursor_position"
}
The application to switch to. Available options: `firefox`, `1password`, `thunderbird`, `vscode`, `terminal`, `desktop`, `directory`.

Example Request

{
  "action": "application",
  "application": "firefox"
}

Available Applications:

  • firefox - Mozilla Firefox web browser
  • 1password - Password manager
  • thunderbird - Email client
  • vscode - Visual Studio Code editor
  • terminal - Terminal/console application
  • desktop - Switch to desktop
  • directory - File manager/directory browser

Response

Responses vary based on the action performed:

Default Response

Most actions return a simple success response:

{
  "success": true
}

Screenshot Response

Returns the screenshot as a base64 encoded string:

{
  "success": true,
  "data": {
    "image": "base64_encoded_image_data"
  }
}

Cursor Position Response

Returns the current cursor position:

{
  "success": true,
  "data": {
    "x": 123,
    "y": 456
  }
}

Error Response

{
  "success": false,
  "error": "Error message"
}

Code Examples

```bash cURL curl -X POST http://localhost:9990/computer-use \ -H "Content-Type: application/json" \ -d '{"action": "move_mouse", "coordinates": {"x": 100, "y": 200}}' ```
import requests

def control_computer(action, **params):
    url = "http://localhost:9990/computer-use"
    data = {"action": action, **params}
    response = requests.post(url, json=data)
    return response.json()

# Move the mouse
result = control_computer("move_mouse", coordinates={"x": 100, "y": 100})
print(result)
const axios = require("axios");

async function controlComputer(action, params = {}) {
  const url = "http://localhost:9990/computer-use";
  const data = { action, ...params };
  const response = await axios.post(url, data);
  return response.data;
}

// Move mouse example
controlComputer("move_mouse", { coordinates: { x: 100, y: 100 } })
  .then((result) => console.log(result))
  .catch((error) => console.error("Error:", error));