feat: interactive terminal from iOS over the execd PTY WebSocket — document it + SDK helper #992

@ferponse

Why do you need it?

We're building an iOS app that gives end-users a live, interactive terminal into a sandbox (to drive the Claude Code TUI), orchestrated through the OpenSandbox server on Kubernetes (agent-sandbox provider). "Interactive" means a real PTY: keystrokes, Ctrl-C, arrow-key menus, live resize — not command-and-capture.

After reading the code, we found that the backend already supports this today. This issue documents the working end-to-end flow (so it isn't missed) and requests two small improvements to make it easy to consume from a native client.

How could it be?

What already works (verified in the repo):

  • execd has shipped a real interactive PTY-over-WebSocket API since v1.0.10 (feat(execd): add WebSocket PTY support #590; chore: add PTY readme and fix unittest #608). Source is in components/execd/, docs in components/execd/PTY.md:
    • POST /pty {"cwd":...} → { "session_id": ... } (bash starts on the first WebSocket connection)
    • GET /pty/{session_id}/ws — PTY mode by default (?pty=0 pipe mode, ?since=<offset> replay-on-reconnect)
    • GET /pty/{id} status, DELETE /pty/{id} teardown
    • Wire protocol (one WebSocket per session; a second connection gets 409):
      • server→client: binary frames with byte 0 = channel (0x01 stdout, 0x02 stderr in pipe mode, 0x03 replay + 8-byte big-endian offset)
      • client→server stdin: binary 0x00 + raw bytes
      • control: JSON text frames {"type":"resize","cols":120,"rows":40} / {"type":"signal","signal":"SIGINT"} / {"type":"ping"}
      • lifecycle: a JSON connected message at start and an exit message (with exit_code) at end
  • The server WebSocket proxy (feat(server): WebSocket proxy for bash sessions through opensandbox-server #537, server v0.1.9) forwards to it, and the proxy route is auth-exempt, so a client reaches the terminal end-to-end with no API key on the device:
    POST /sandboxes/{id}/proxy/44772/pty
    WS   /sandboxes/{id}/proxy/44772/pty/{session_id}/ws
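
As a rough illustration of the two proxy routes above, here is a minimal sketch of how a client might build the URLs. The function names, the `sandbox.example.com` host in the usage note, and the http→ws scheme mapping are assumptions made for this example; only the path layout, port 44772, and the `pty` / `since` query parameters come from the description above.

```python
from typing import Optional
from urllib.parse import urlencode, urlsplit, urlunsplit

EXECD_PORT = 44772  # port that appears in the proxy routes quoted in this issue


def pty_create_url(server_base: str, sandbox_id: str) -> str:
    """URL for POST /sandboxes/{id}/proxy/44772/pty (hypothetical helper name)."""
    return f"{server_base.rstrip('/')}/sandboxes/{sandbox_id}/proxy/{EXECD_PORT}/pty"


def pty_ws_url(
    server_base: str,
    sandbox_id: str,
    session_id: str,
    since: Optional[int] = None,
    pipe_mode: bool = False,
) -> str:
    """URL for the WebSocket route, with optional ?pty=0 and ?since=<offset>."""
    parts = urlsplit(pty_create_url(server_base, sandbox_id))
    # Assumption: the WebSocket is served on the same host, so http(s) maps to ws(s).
    scheme = {"https": "wss", "http": "ws"}.get(parts.scheme, parts.scheme)
    query = {}
    if pipe_mode:
        query["pty"] = "0"  # pipe mode instead of the default PTY mode
    if since is not None:
        query["since"] = str(since)  # replay-on-reconnect offset
    path = f"{parts.path}/{session_id}/ws"
    return urlunsplit((scheme, parts.netloc, path, urlencode(query), ""))
```

For example, `pty_ws_url("https://sandbox.example.com", "sb-1", "abc", since=128)` yields a `wss://` URL ending in `/pty/abc/ws?since=128`, which a client could pass to whatever WebSocket library it uses.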
    

What we're requesting:

  1. Document the /pty WebSocket API in the OpenAPI spec (specs/execd-api.yaml). It's currently absent there — which is exactly why it's easy to conclude execd has "no PTY" and only SSE command execution. A documented WS surface (or at least a prominent pointer to PTY.md) would prevent that.
  2. An SDK-level terminal/PTY helper (e.g. sandbox.terminal() / bash_session()) so clients don't have to hand-roll the binary framing. The server-proxy PR (feat(server): WebSocket proxy for bash sessions through opensandbox-server #537) explicitly deferred a bash_session() helper to a follow-up — this would be it.
  3. (Nice to have) a native iOS/Swift example or thin SDK wiring the above to a terminal emulator (we use SwiftTerm), as an official reference for running interactive terminals / coding agents from iPhone & iPad.
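
To make request 2 concrete, here is a minimal, transport-agnostic sketch of the framing that such a helper would hide, based only on the wire protocol described in this issue. All function and constant names are hypothetical and not part of any existing SDK. It also assumes that the payload of a replay frame follows directly after the 8-byte offset and that the offset is unsigned, which this issue does not spell out.

```python
import json
import struct

# Channel bytes as listed in the wire-protocol description in this issue.
STDIN = 0x00   # client -> server
STDOUT = 0x01  # server -> client
STDERR = 0x02  # server -> client, pipe mode only
REPLAY = 0x03  # server -> client, followed by an 8-byte big-endian offset


def encode_stdin(data: bytes) -> bytes:
    """Binary stdin frame: 0x00 followed by the raw bytes."""
    return bytes([STDIN]) + data


def resize_message(cols: int, rows: int) -> str:
    """JSON text frame asking the server to resize the PTY."""
    return json.dumps({"type": "resize", "cols": cols, "rows": rows})


def signal_message(name: str = "SIGINT") -> str:
    """JSON text frame asking the server to deliver a signal."""
    return json.dumps({"type": "signal", "signal": name})


def decode_server_frame(frame: bytes):
    """Split a server binary frame into (channel, offset, payload).

    offset is None for stdout/stderr frames and an int for replay frames.
    """
    if not frame:
        raise ValueError("empty frame")
    channel = frame[0]
    if channel == REPLAY:
        if len(frame) < 9:
            raise ValueError("replay frame shorter than 9 bytes")
        # Assumption: unsigned 64-bit offset, with replayed bytes after it.
        (offset,) = struct.unpack(">Q", frame[1:9])
        return channel, offset, frame[9:]
    if channel in (STDOUT, STDERR):
        return channel, None, frame[1:]
    raise ValueError(f"unknown channel byte 0x{channel:02x}")
```

A client would send `encode_stdin(b"\x03")` for a typed Ctrl-C keystroke or `signal_message("SIGINT")` for an explicit signal, and feed the payload returned by `decode_server_frame` straight into its terminal emulator.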

Other related information

  • The server-proxy WS path is confirmed working. Is the WebSocket upgrade also forwarded through ingress gateway mode with signed routes (get_signed_endpoint, OSEP-0011)?
  • Environment: OpenSandbox server (Kubernetes runtime, agent-sandbox provider), execd v1.0.18, Python SDK 0.1.9.
  • Happy to contribute the spec docs and/or an iOS example.
