Codesk Control is a text-first macOS control surface for Codex-style agents. It gives agents a fast, structured way to inspect and operate desktop apps through native UI state, keyboard shortcuts, Accessibility labels, menus, paste/type actions, app activation, URL/file opening, and screenshots only when the interface cannot describe itself.
The project is built around a simple idea:
Use text to know. Use shortcuts to move. Use Accessibility to confirm. Use vision only when the screen refuses to describe itself.
Why this matters: screenshot-first desktop control is general, but it often forces an agent through a slow loop of capture, visual interpretation, coordinate selection, and delayed verification. Codesk Control makes the common path semantic instead. The native MCP server exposes macOS state and actions as typed tools, so an agent can read the front app, focused element, selected text, visible Accessibility text, and window title, then activate apps, send app-aware shortcuts, paste text, press named controls, select menus, wait for UI changes, or capture a screenshot as a last resort.
The executable is codesk. Run it directly as a CLI, or launch the persistent stdio MCP server with codesk mcp.
Watch/download the short product promo: codesk-control-product-promo.mp4.
In a live Codex desktop environment, moving the plugin path from a Node MCP wrapper that spawned the CLI per request to a persistent native Swift MCP server reduced median codesk_mcp_state latency from 34.69 ms to 1.37 ms.
| Probe | Before | After | Change |
|---|---|---|---|
| codesk_mcp_state | 34.69 ms | 1.37 ms | 25.32x faster |
| osascript_front_app | 124.39 ms | 124.92 ms | about same |
| screencapture_png | 77.05 ms | 76.11 ms | about same |
After the native MCP change, repeated codesk_mcp_state calls were roughly:
- 55.6x faster than screenshot capture
- 91.2x faster than an AppleScript front-app query
- 104.9x faster than an AppleScript window-title query
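The speedup figures follow directly from the median latencies reported above. As a quick check, a small helper can recompute them; `speedup` is a name chosen here for illustration and is not part of Codesk.

```shell
# speedup BEFORE_MS AFTER_MS [DECIMALS]
# Print how many times faster AFTER_MS is than BEFORE_MS.
speedup() {
  awk -v before="$1" -v after="$2" -v places="${3:-2}" 'BEGIN {
    fmt = "%." places "f\n"        # build the format so precision is configurable
    printf fmt, before / after
  }'
}

speedup 34.69 1.37      # Node wrapper vs native MCP state call
speedup 76.11 1.37 1    # screenshot capture vs native MCP state call
speedup 124.92 1.37 1   # AppleScript front-app query vs native MCP state call
```

The three calls reproduce the 25.32x, 55.6x, and 91.2x figures from the measured values.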
The benchmark measures latency and structured observation coverage. In the saved live run, Codesk CLI/MCP returned the front app, bundle id, process id presence, window-title presence, Accessibility permission state, and visible text count; the screenshot path returned pixels only. See:
codesk mcp is a native Swift stdio MCP server. It keeps the desktop-control process alive and exposes narrow, typed tools instead of asking an agent to run raw shell commands or click coordinates.
| Capability | MCP tools | What agents use it for |
|---|---|---|
| Inspect UI state | codesk_state, codesk_text | Read front app, bundle id, PID presence, window title, focused element, selected text, permission state, and visible Accessibility text. |
| Move between apps and targets | codesk_app, codesk_open | Activate or launch apps, open files/folders, and open URLs through Launch Services for native macOS workflows. |
| Drive common shortcuts | codesk_quick, codesk_quick_list, codesk_key, codesk_keys | Use app-aware aliases such as explicit browser chrome controls, VS Code quick open, Finder Go to Folder, or shortcut chords. |
| Enter text | codesk_paste, codesk_type | Paste longer content with clipboard restoration, or type key-by-key when paste is rejected. |
| Wait and locate | codesk_wait, codesk_find | Wait for text/title/app/focus changes and find visible Accessibility elements before acting. |
| Act semantically | codesk_press, codesk_menu | Press named buttons/controls and choose menu paths such as File > Save. |
| Batch native actions | codesk_batch | Combine short allowlisted app/open/quick/key/paste/wait/find/press/menu/text/state/sleep workflows in one MCP round trip after the exact native steps are known. |
| Fallback and admin | codesk_screenshot, codesk_permissions, codesk_raw | Capture screenshots only when text state is insufficient, check/request Accessibility permission, or run a raw codesk command as an escape hatch. |
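Because `codesk mcp` is a stdio MCP server, a client talks to it with newline-delimited JSON-RPC 2.0 messages. The sketch below only prints the generic MCP handshake followed by a `tools/list` request. The protocol version string and the client name are assumptions, and the argument schemas of the Codesk tools are not documented here, so the sketch stops at listing tools rather than calling one.

```shell
# Print the three messages a generic MCP client sends to list a server's tools:
# the initialize request, the initialized notification, then tools/list.
# "example-client" and the protocolVersion value are illustrative assumptions.
mcp_list_tools_messages() {
  printf '%s\n' \
    '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"example-client","version":"0.0.0"}}}' \
    '{"jsonrpc":"2.0","method":"notifications/initialized"}' \
    '{"jsonrpc":"2.0","id":2,"method":"tools/list"}'
}

mcp_list_tools_messages
```

Piping this output into `codesk mcp` should, in principle, return the tool list on stdout. Whether the server answers before it sees end-of-input is not confirmed here, so a real client would normally keep the pipe open.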
- Native Swift CLI and native stdio MCP server: `codesk mcp`.
- Structured UI snapshots: front app, bundle id, process id, window title, focused element, selected text, permission state, and visible Accessibility text.
- App-aware quick shortcuts: explicit browser chrome controls, Finder Go to Folder, VS Code quick open/command palette, terminal actions, and more.
- Batched native workflows: run short explicit app, shortcut, paste, wait, and Accessibility sequences in one persistent MCP call.
- Semantic actions: press labeled controls and select menu paths such as `File > Save`.
- Fast paste/type helpers with clipboard restoration.
- Screenshot fallback when text and Accessibility state are insufficient.
- Local Codex plugin for explicit native macOS helper actions. It should stay out of the way of Browser, Chrome, Computer Use, Codex Web, and DOM tools for normal browser and visual UI tasks.
- macOS 13 or newer.
- Swift 6.1 or newer.
- Xcode for `swift test` with Swift Testing.
- Node.js for the optional benchmark script.
swift build
swift test
swift run codesk selftest

Run directly from the package:

swift run codesk help

Install the debug binary somewhere on your PATH if you want:

cp .build/debug/codesk /usr/local/bin/codesk

Build the optimized binary for local use or packaging:

swift build -c release
cp .build/release/codesk /usr/local/bin/codesk

Codesk Control is a narrow native macOS helper, not the default desktop controller. For normal browser, Chrome, visual UI, coordinate, scrolling, dragging, or exploratory workflows, use Browser, Chrome, Codex Web, DOM tools, or Computer Use first.
Use this ladder only when the task is explicitly native macOS app work:
- Use Codesk only for explicit native actions such as app focus, native file/URL open, known shortcuts, focused-field paste/type, app menus, permissions, or OS recovery.
- Use `codesk_batch` only after several exact native steps are already known.
- Use `codesk state` or `codesk text` only when native focus or focused-field state is the actual uncertainty.
- If the task becomes visual, exploratory, coordinate-based, scroll/drag-heavy, or element-index based, switch to Computer Use.
- For browser/page work, switch to Browser, Chrome, Codex Web, or DOM tools.
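The ladder above can be read as a simple lookup from task kind to control surface. The sketch below encodes it as a `case` statement. The task-kind labels and the function name `route_surface` are hypothetical and exist only for this illustration; Codesk itself does not define them.

```shell
# route_surface KIND
# Print which control surface the ladder selects for a task kind.
# KIND labels are illustrative only; anything unrecognized falls through to
# the non-Codesk tools, since Codesk is not the default desktop controller.
route_surface() {
  case "$1" in
    app-focus|native-open|shortcut|paste|menu|permissions|os-recovery)
      echo "codesk" ;;
    visual|exploratory|coordinate|scroll|drag|element-index)
      echo "computer-use" ;;
    browser-page|dom)
      echo "browser-tools" ;;
    *)
      echo "other-tools" ;;
  esac
}

route_surface menu
route_surface drag
route_surface dom
```

Defaulting unknown kinds away from Codesk mirrors the guidance that it is a narrow helper for explicit native actions.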
For Chrome activation, use the exact app name or bundle id only when native browser chrome focus is explicitly needed: codesk app "Google Chrome" or codesk app com.google.Chrome. The shorter chrome alias resolves to the same bundle.
For Codex Web or browser page work, prefer Codex Web, Browser Use, or DOM web tools when they are available. Codesk can launch or focus a native browser, open an external URL in that browser when explicitly requested, operate browser chrome, choose menus, or recover from OS-level focus problems; it should not be the default path for page DOM inspection, extraction, clicks, form entry, waits, screenshots, or localhost/file:// website testing.
Avoid these common routing traps:
- If the task context says the in-app browser is open, includes a current URL, mentions Codex Web, Browser Use, DOM, localhost, or asks for browser control, keep page work on the browser/DOM surface.
- If `codesk_state` reports front app `Codex` while the target is a browser page, treat that as a surface mismatch and use browser tools or explicit OS focus recovery.
- If `codesk_find` or `codesk_press` misses once on browser page text, switch to DOM/page tooling instead of trying more label variants.
- Use scoped quick aliases such as `chrome.address`, `safari.address`, and `vscode.quick_open` when the front app is uncertain. Bare aliases such as `address` are app-aware conveniences only after the front app is known.
- Keep `codesk_wait` short for native app confirmation. Do not wait on exact browser page titles or page text when DOM waits are available.
- Keep `codesk_raw` for troubleshooting the CLI itself, not as the fallback path for browser page automation.
codesk state [--json] [--limit n]
codesk text [--limit n]
codesk app <name-or-bundle-id>
codesk open <path-or-url>
codesk key <chord>
codesk keys <chord> [<chord> ...]
codesk q <alias> [<alias> ...]
codesk q list
codesk type <text>
codesk paste [--leave-clipboard] <text>
codesk wait <text|title|app|focus> <value> [--timeout seconds]
codesk find <text>
codesk press <label>
codesk menu "File > Save"
codesk screenshot [path.png]
codesk permissions [--prompt]
codesk mcp

Open a page in Safari when the user explicitly wants native browser chrome control and DOM tooling is unavailable:

codesk app Safari
codesk q safari.address
codesk paste "https://example.com"
codesk key enter
codesk text

Open a file in VS Code:

codesk app "Visual Studio Code"
codesk q quick_open
codesk paste "Sources/CodeskControl/CLI.swift"
codesk key enter

Save through the menu:

codesk menu "File > Save"

Use cases include browser chrome navigation, editor/IDE control, document save/export flows, desktop state monitoring, and human-auditable automation traces such as `pressed AXButton title=Save` instead of `clicked x=844 y=613`.
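Command sequences like the ones above can be wrapped in a small runner that logs every step and stops at the first failure, which keeps the trace readable. This is a minimal sketch, not part of Codesk: `run_steps` and the `DRY_RUN` variable are names invented here, and dry-run is the default so the example is safe to paste on a machine without `codesk` installed.

```shell
# run_steps: read one command per line from stdin, log it, and stop at the
# first failing command. With DRY_RUN=1 (the default) steps are only printed.
run_steps() {
  step=0
  while IFS= read -r line; do
    [ -n "$line" ] || continue            # skip blank lines
    step=$((step + 1))
    printf 'step %d: %s\n' "$step" "$line"
    if [ "${DRY_RUN:-1}" != "1" ]; then
      # </dev/null keeps the command from consuming the remaining steps
      sh -c "$line" </dev/null || {
        printf 'step %d failed\n' "$step" >&2
        return 1
      }
    fi
  done
}

run_steps <<'EOF'
codesk app Safari
codesk q safari.address
codesk paste "https://example.com"
codesk key enter
codesk text
EOF
```

Setting `DRY_RUN=0` executes each line for real. The log then doubles as a semantic record of which app, alias, and key were used.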
Keyboard events and Accessibility inspection need macOS privacy permission for the built binary or the host terminal. Start here:
codesk permissions --prompt

Then enable the relevant binary or terminal in System Settings > Privacy & Security > Accessibility.
Screenshots may also require Screen Recording permission.
This repo includes a local Codex plugin at:
plugins/codesk-control

The Codex plugin surface is currently quarantined. Its manifest intentionally does not register `skills` or `mcpServers`, and the local marketplace installer marks it `NOT_AVAILABLE`.
Why: current Codex plugin routing can over-select broad desktop-control MCP tools once a plugin is active, even when tool descriptions say to prefer Browser, Chrome, Computer Use, Codex Web, or DOM tools. Codesk remains useful as a standalone CLI and development experiment, but it should not be enabled as a normal Codex plugin until that routing behavior is safe.
The repo still contains the native MCP server implementation for development:
codesk_state
codesk_text
codesk_app
codesk_open
codesk_key
codesk_keys
codesk_quick
codesk_quick_list
codesk_paste
codesk_type
codesk_wait
codesk_find
codesk_press
codesk_menu
codesk_screenshot
codesk_permissions
codesk_batch
codesk_raw
To install/update the local marketplace entry in its quarantined state:
swift build -c release
scripts/install-codesk-plugin.sh

Restart Codex after changing plugin installation state. Do not reactivate Codesk in normal Codex sessions unless you are deliberately testing the MCP surface.
The plugin launcher looks for the binary in this order:
- `CODESK_BIN`, when set.
- `CODESK_REPO_ROOT/.build/release/codesk` or `CODESK_REPO_ROOT/.build/debug/codesk`, when set.
- The repo reached through the local plugin symlink at `~/plugins/codesk-control`.
- `.build/release/codesk` or `.build/debug/codesk` next to a source checkout.
- `bin/codesk` from a release archive.
- `codesk` on `PATH`.
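The lookup order above can be sketched as a first-match search over candidate paths. This is an illustration under stated assumptions, not the launcher's actual code: `SOURCE_ROOT` and `ARCHIVE_ROOT` are hypothetical stand-ins for "next to a source checkout" and "a release archive", and the sketch assumes the symlink target sits two directories below the repo root.

```shell
# resolve_codesk_bin: print the first executable candidate in lookup order.
# Returns non-zero when nothing is found. Runs in a subshell so its
# variables do not leak into the caller.
resolve_codesk_bin() (
  # Assumption: ~/plugins/codesk-control links to <repo>/plugins/codesk-control.
  plugin_repo=$(cd -P "$HOME/plugins/codesk-control" 2>/dev/null && cd ../.. && pwd -P)
  for candidate in \
    "${CODESK_BIN:-}" \
    "${CODESK_REPO_ROOT:+$CODESK_REPO_ROOT/.build/release/codesk}" \
    "${CODESK_REPO_ROOT:+$CODESK_REPO_ROOT/.build/debug/codesk}" \
    "${plugin_repo:+$plugin_repo/.build/release/codesk}" \
    "${plugin_repo:+$plugin_repo/.build/debug/codesk}" \
    "${SOURCE_ROOT:+$SOURCE_ROOT/.build/release/codesk}" \
    "${SOURCE_ROOT:+$SOURCE_ROOT/.build/debug/codesk}" \
    "${ARCHIVE_ROOT:+$ARCHIVE_ROOT/bin/codesk}"
  do
    if [ -n "$candidate" ] && [ -f "$candidate" ] && [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      exit 0
    fi
  done
  command -v codesk   # final fallback: whatever is on PATH
)

resolve_codesk_bin || echo "codesk not found" >&2
```

Unset or empty variables expand to empty candidates and are skipped, so the explicit override always wins when it points at an executable file.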
Create a speed and behavior baseline against legacy control paths:
swift build -c release
scripts/benchmark-control.mjs

The benchmark compares Codesk CLI, Codesk MCP, AppleScript/osascript, and screenshot capture, and records a live inventory of running Codex-related systems. See benchmarks/README.md.
Generated benchmark results are ignored by git because they can contain local process and machine details.
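The headline latency for codesk_mcp_state is reported as a median. As a reminder of what that statistic does with repeated samples, here is a minimal median helper. The sample values are made up for illustration and are not benchmark output, and `median` is not part of the benchmark script.

```shell
# median: read one number per line on stdin and print the median.
# Exits non-zero when there is no input.
median() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) print v[(NR + 1) / 2]             # odd count: middle value
      else print (v[NR / 2] + v[NR / 2 + 1]) / 2    # even count: mean of the middle two
    }'
}

# Illustrative samples in milliseconds, not real measurements.
printf '%s\n' 12 10 15 11 14 | median
```

A median is less sensitive than a mean to the occasional slow call, which is one reason it suits latency comparisons like this one.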
The current benchmark measures latency and structured observation coverage, not a full ground-truth task accuracy suite. In the saved live run:
Codesk CLI/MCP: front app, bundle id, PID presence, window-title presence,
Accessibility trust, and visible text count.
osascript: narrow one-field probes.
screenshot: pixels only, no structured text without vision/OCR.
Future work should add labeled end-to-end task suites for true success-rate and semantic accuracy comparisons.
Codesk Control can send keyboard events and invoke Accessibility actions in arbitrary apps once macOS permissions are granted. Use it with the same care as other desktop automation tools:
- Grant Accessibility and Screen Recording permissions deliberately.
- Avoid destructive shortcuts unless the user explicitly requested them.
- Prefer typed MCP tools over raw command escape hatches.
- Keep audit logs semantic where possible: app, window, menu path, button label, and shortcut alias.
Run the local release gate:
scripts/release-check.sh

That runs tests, builds the release binary, runs `codesk selftest`, and writes a versioned archive plus SHA-256 checksum to dist/.
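Since the release gate writes a SHA-256 checksum next to the archive, a consumer can verify a download before installing it. The sketch below assumes the checksum file stores the hex digest as its first whitespace-separated field, which is the common `shasum` layout but is not confirmed for this project. The function names are hypothetical.

```shell
# sha256_of FILE: print the SHA-256 hex digest using whichever tool exists.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'
  else
    shasum -a 256 "$1" | awk '{ print $1 }'
  fi
}

# verify_archive ARCHIVE CHECKSUM_FILE
# Succeed only when the digest in the first field of CHECKSUM_FILE
# (an assumed layout) matches the archive's actual digest.
verify_archive() {
  expected=$(awk 'NR == 1 { print $1 }' "$2") || return 1
  actual=$(sha256_of "$1") || return 1
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

A call takes the archive path and its checksum file as the two arguments, and returns non-zero on any mismatch or missing checksum file.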
To publish through GitHub Actions:
git tag v0.2.0
git push origin v0.2.0

The release workflow packages the archive and creates or updates the GitHub release for the pushed tag.