opendesk — Python SDK

Give any AI agent eyes and hands on your desktop.

macOS · Linux · Windows


Install

pip install 'opendesk[core,mcp]'
opendesk install

opendesk install registers the MCP server with Claude Code. For Cursor, Windsurf, Continue, or any other MCP client, see the MCP integrations section below.


Quick start

import asyncio
from opendesk import create_registry, allow_all_context

async def main():
    registry = create_registry()
    ctx = allow_all_context()

    # Screenshot with Set-of-Marks
    shot = registry.get("screenshot")
    result = await shot.execute(ctx, shot.Params(marks=True))
    print(result.output)

    # Click a button by name — no coordinates needed
    ui = registry.get("ui")
    await ui.execute(ctx, ui.Params(action="click", app="TextEdit", title="File"))

asyncio.run(main())
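
The calling convention above (look a tool up by name, build its Params, await execute with a context) can be tried without touching the desktop. The following is a minimal, self-contained sketch of that pattern only. Registry, EchoTool, Result, and the dict-based context are hypothetical stand-ins invented for illustration and are not opendesk's actual classes.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Result:
    output: str


class EchoTool:
    """Hypothetical tool, used only to illustrate the calling convention."""

    name = "echo"

    @dataclass
    class Params:
        text: str

    async def execute(self, ctx: dict, params: "EchoTool.Params") -> Result:
        # Assumption: the context carries a permission flag the tool checks first.
        if not ctx.get("allow", False):
            return Result(output="denied")
        return Result(output=params.text.upper())


class Registry:
    """Hypothetical name-to-tool lookup table."""

    def __init__(self):
        self._tools = {}

    def register(self, tool) -> None:
        self._tools[tool.name] = tool

    def get(self, name: str):
        return self._tools[name]


async def demo() -> str:
    registry = Registry()
    registry.register(EchoTool())
    tool = registry.get("echo")
    result = await tool.execute({"allow": True}, tool.Params(text="hello"))
    return result.output


print(asyncio.run(demo()))  # prints HELLO
```

Keeping Params on the tool itself means a caller only needs the tool's name to discover both how to build its arguments and how to run it.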

Installation options

pip install opendesk                              # core framework only
pip install 'opendesk[core,mcp]'                  # + screen capture + MCP server (recommended)
pip install 'opendesk[core,mcp,learn]'            # + task recording and replay
pip install 'opendesk[core,mcp,learn,schedule]'   # + scheduled tasks
pip install 'opendesk[core,mcp,remote]'           # + remote machine control
pip install 'opendesk[all]'                       # everything
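
To see which extras the installed distribution declares, the standard library's importlib.metadata can read the package metadata. This is a sketch that assumes opendesk publishes its extras through the usual Provides-Extra metadata field. It reports what the package declares, not which optional dependencies are actually present in your environment. The helper name declared_extras is made up for this example.

```python
from importlib import metadata


def declared_extras(dist_name: str) -> list[str]:
    """Return the extras a distribution declares, or [] if it is not installed."""
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return []
    # get_all returns None when the field is absent, so fall back to an empty list.
    return sorted(meta.get_all("Provides-Extra") or [])


print(declared_extras("opendesk"))
```

The list[str] annotation requires Python 3.9 or newer.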

Tools

Tool        What it does
screenshot  Capture screen with Set-of-Marks on every interactive element
ui          Click and type by element name, no coordinates needed
mouse       Pixel-level mouse control for anything ui can't reach
keyboard    Type text, press keys, send hotkeys
app         Open, close, and focus applications
clipboard   Read and write the system clipboard
ocr         Extract text from any region of the screen
learn       Record a workflow once, replay it anytime
schedule    Run any task on a timer
audit       Show the session audit log in any MCP session

Full reference: docs/tools.md


Remote machine control

opendesk can control remote machines over an encrypted WebSocket connection, using mDNS to discover peers.

# On the machine to be controlled:
pip install 'opendesk[core,mcp,remote]'
opendesk pair            # prints a pairing code

# On the controlling machine:
opendesk pair-with <host> <code>
opendesk serve           # start the server

See docs/remote.md and docs/protocol.md for full details.


MCP integrations

Claude Code

opendesk install        # register globally
opendesk uninstall      # remove

Claude Desktop

{
  "mcpServers": {
    "opendesk": { "command": "opendesk-mcp" }
  }
}
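
If the config file already lists other servers, the entry above should be merged in rather than pasted over the file. The sketch below does that with the standard library. The helper name add_opendesk_server is hypothetical, and the macOS path in the trailing comment is an assumption to verify against the Claude Desktop documentation for your platform.

```python
import json
from pathlib import Path


def add_opendesk_server(config_path: Path) -> dict:
    """Add the opendesk entry to an MCP config file, preserving other entries."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        config = json.loads(text) if text.strip() else {}
    # Create the mcpServers object if missing, then set only the opendesk key.
    servers = config.setdefault("mcpServers", {})
    servers["opendesk"] = {"command": "opendesk-mcp"}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Assumed macOS location (not confirmed by this README):
# add_opendesk_server(
#     Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"
# )
```

Restart the client after editing its config so that it picks up the new server.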

Cursor / Continue

{
  "mcpServers": [{ "name": "opendesk", "command": "opendesk-mcp", "transport": "stdio" }]
}

Agent integrations

Anthropic SDK

import asyncio

import anthropic
from opendesk.integrations.claude_code import ClaudeCodeAdapter
from opendesk.registry import create_registry

async def main():
    client = anthropic.Anthropic()
    adapter = ClaudeCodeAdapter(create_registry())

    # run_loop is awaited, so it has to be called from inside an async function
    result = await adapter.run_loop(
        client=client,
        model="claude-opus-4-6",
        messages=[{"role": "user", "content": "Open TextEdit and type Hello."}],
        system="Use the ui tool first. Mouse is a last resort.",
    )

asyncio.run(main())

OpenAI / on-device models (Ollama, vLLM, llama.cpp)

import asyncio
from openai import OpenAI
from opendesk.integrations.openai_compat import OpenAIAdapter

async def main():
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    adapter = OpenAIAdapter()
    messages = [{"role": "user", "content": "Open TextEdit and type Hello."}]
    result = await adapter.run_loop(client, model="qwen2.5:72b", messages=messages)

asyncio.run(main())

LangChain

from langchain_anthropic import ChatAnthropic
from langgraph.prebuilt import create_react_agent
from opendesk.integrations.langchain_compat import as_langchain_tools
from opendesk.registry import create_registry

tools = as_langchain_tools(create_registry())
agent = create_react_agent(ChatAnthropic(model="claude-opus-4-6"), tools)

Build from source

cd python
pip install -e '.[core,mcp]'

Docs

License

MIT