From f89e0db8cdc4f58a6e6585cd74ae5060829a33fd Mon Sep 17 00:00:00 2001
From: Brent G
Date: Thu, 28 May 2026 22:11:24 +0000
Subject: [PATCH] docs: remote agent control spec (Astation v1)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Self-contained spec for v1 of driving a coding agent (claude/codex) running
under atem from Astation macOS, over the existing relay/direct channel.
Up-lane only: text / voice / keys → atem injects to the agent PTY stdin;
output watched in atem's terminal (no screen mirror).

Grounded in Astation's real code (AstationMessage, AstationHubManager
sendHandler + routeToFocusedAtem + relay envelope, VoiceCodingManager).
Defines the authoritative `agentInput` wire contract (atem_id = relay
envelope; payload = agentId + input) and the concrete v1 task list so a
fresh agent on the build/test Mac can start without the Atem repo.

🤖 Built with SMT
---
 .../2026-05-28-remote-agent-control-design.md | 154 ++++++++++++++++++
 1 file changed, 154 insertions(+)
 create mode 100644 docs/specs/2026-05-28-remote-agent-control-design.md

diff --git a/docs/specs/2026-05-28-remote-agent-control-design.md b/docs/specs/2026-05-28-remote-agent-control-design.md
new file mode 100644
index 0000000..f839779
--- /dev/null
+++ b/docs/specs/2026-05-28-remote-agent-control-design.md
@@ -0,0 +1,154 @@
+# Remote Agent Control: Astation side (v1)
+
+Status: spec (2026-05-28)
+Scope: **Astation (macOS) v1 only.** Mobile + screen-mirror are later phases.
+
+> This spec is self-contained: you do not need the Atem repo to start. The
+> Atem-side counterpart lives in the `Atem` repo at
+> `designs/remote-agent-control.md`; this doc restates everything Astation
+> needs and defines the wire contract both sides must match.
+
+## What we're building (v1)
+
+A user is running a coding agent (Claude Code / Codex) **under atem** on some
+machine. atem owns that agent's terminal (PTY). We want Astation (this macOS
+app) to act as a **remote control** that sends the agent input (**text**,
+**voice**, or **control keys**) over the existing Astation↔atem channel
+(direct or via the relay). atem injects that input into the agent's stdin.
+
+**v1 is up-lane only.** Astation sends input; it does **not** mirror the
+agent's screen. The user watches the agent's output in the terminal where atem
+runs. (Screen mirroring is a later phase, for mobile, where there's no terminal
+to look at.)
+
+So v1 = "type/speak an instruction in Astation → it appears in the claude/codex
+session running under atem on the target machine."
+
+## Why this is small
+
+The hard parts already exist in Astation:
+
+- **Transport + targeting + relay envelope**: `AstationHubManager.sendHandler?(message, targetId)` is the universal send. For relay clients (`targetId == "relay-"`) it already wraps the message as `{"atem_id": "", "payload": }` and sends it over the identity relay; for direct clients it sends as-is. `routeToFocusedAtem()` picks the target atem (pinned or focused).
+- **Voice**: `VoiceCodingManager` + `sendVoiceCommand(text:isFinal:)` already do mic → ConvoAI ASR → `voiceCommand` / `voiceRequest` → atem. **Reuse as-is.** No new voice work in v1.
+- **Message plumbing**: `AstationMessage` (tagged `type`/`data` enum) with manual `Codable`. See CLAUDE.md → "Adding a New Message Type".
+
+The genuinely new work is one message (`agentInput`) for **text + keys**, a send
+method mirroring `sendVoiceCommand`, and a minimal UI to enter it.
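+
+A minimal sketch of how the new message and the existing relay envelope could
+fit together, assuming plain `Encodable` structs. `AgentInputData`,
+`AgentInputMessage`, `RelayEnvelope`, `wireJSON`, and the `"example-atem"` id
+are hypothetical stand-ins for illustration; they are not the real
+`AstationMessage` or `sendHandler` code.
+
+```swift
+import Foundation
+
+// Hypothetical stand-in for the `data` part of an `agentInput` message.
+// The real type is a case on `AstationMessage`; this is only an illustration.
+struct AgentInputData: Encodable {
+    let agentId: String?  // nil: let atem pick its focused/only agent
+    let kind: String      // "text" or "key"
+    let text: String?     // set when kind == "text"
+    let key: String?      // set when kind == "key"
+}
+
+// Hypothetical tagged message: {"type": "agentInput", "data": {...}}.
+struct AgentInputMessage: Encodable {
+    let type = "agentInput"
+    let data: AgentInputData
+}
+
+// Hypothetical relay envelope: the routing id sits outside the message.
+struct RelayEnvelope: Encodable {
+    let atemId: String
+    let payload: AgentInputMessage
+
+    enum CodingKeys: String, CodingKey {
+        case atemId = "atem_id"
+        case payload
+    }
+}
+
+// Encodes the message, wrapping it in the envelope only for relay targets.
+// `relayAtemId` is an assumed parameter: nil means a direct client.
+func wireJSON(for message: AgentInputMessage, relayAtemId: String?) throws -> String {
+    let encoder = JSONEncoder()
+    encoder.outputFormatting = [.sortedKeys]
+    let bytes: Data
+    if let atemId = relayAtemId {
+        bytes = try encoder.encode(RelayEnvelope(atemId: atemId, payload: message))
+    } else {
+        bytes = try encoder.encode(message)
+    }
+    return String(decoding: bytes, as: UTF8.self)
+}
+
+let message = AgentInputMessage(
+    data: AgentInputData(agentId: nil, kind: "text", text: "refactor the auth module", key: nil)
+)
+let direct = try wireJSON(for: message, relayAtemId: nil)
+let relayed = try wireJSON(for: message, relayAtemId: "example-atem")
+print(direct)
+print(relayed)
+```
+
+Synthesized `Encodable` conformance skips nil optionals, so unused fields stay
+off the wire. Keeping `atem_id` out of the message means the same value can go
+to a direct or a relay client unchanged.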
+
+## Wire contract (must match the Atem side exactly)
+
+**Relay envelope** (already produced by `sendHandler` for relay clients; the relay
+routes by `atem_id`):
+
+```json
+{ "atem_id": "", "payload": }
+```
+
+So **`atem_id` is the envelope's job; do NOT put it inside the message payload.**
+The `agentInput` payload carries only the agent selector + the input:
+
+```json
+{
+  "type": "agentInput",
+  "data": {
+    "agentId": "",
+    "kind": "text" | "key",
+    "text": "refactor the auth module",   // when kind == "text"
+    "key": "enter|esc|ctrl-c|up|down|y|n" // when kind == "key"
+  }
+}
+```
+
+- `kind:"text"` → atem writes `text` + `\n` to the agent PTY stdin.
+- `kind:"key"` → atem writes the raw byte(s) for that key to the PTY
+  (`enter`→`\r`, `esc`→`\x1b`, `ctrl-c`→`\x03`, `up`/`down`→CSI arrows, `y`/`n`→literal).
+
+Voice does **not** use `agentInput` in v1; it stays on the existing
+`voiceCommand`/`voiceRequest` path (ConvoAI transcribes; the result reaches the
+agent already). Folding voice into `agentInput` is a later cleanup, not v1.
+
+## Astation implementation tasks (v1)
+
+1. **Add the `agentInput` case to `AstationMessage`**
+   - File: `Sources/Menubar/AstationMessage.swift`
+   - Add `case agentInput(agentId: String?, kind: String, text: String?, key: String?)`
+   - Add `agentInput` to the `MessageType` enum (raw value `"agentInput"`).
+   - Implement `encode`/`decode` for it (nested `data` container with
+     `agentId`, `kind`, `text`, `key`), following the existing cases. Encode
+     omits nil fields (`encodeIfPresent`).
+   - Follow CLAUDE.md → "Adding a New Message Type".
+
+2. **Add a send method to `AstationHubManager`** (mirror `sendVoiceCommand`)
+   - File: `Sources/Menubar/AstationHubManager.swift`
+   - `func sendAgentText(_ text: String, agentId: String? = nil)`:
+     ```swift
+     guard let clientId = routeToFocusedAtem() else { Log.info("No Atem - dropped"); return }
+     sendHandler?(.agentInput(agentId: agentId, kind: "text", text: text, key: nil), clientId)
+     ```
+   - `func sendAgentKey(_ key: String, agentId: String? = nil)`: same, `kind:"key", key:key, text:nil`.
+   - `routeToFocusedAtem()` already returns the right `clientId` (relay or direct);
+     the envelope wrapping is automatic.
+
+3. **Agent selection (`agentId`)**
+   - v1 may leave `agentId = nil` (atem targets its focused/only agent).
+   - If you want multi-agent now: Astation already has `agentListRequest` /
+     `agentListResponse(agents:)` and `AtemAgentInfo`. Add a picker that sets a
+     selected `agentId`; pass it into `sendAgentText/Key`. Otherwise defer.
+
+4. **Minimal control UI**
+   - A window/panel with: a **text field + Send** (calls `sendAgentText`), and a
+     small **key bar** (Enter / Esc / Ctrl-C / ↑ / ↓ / y / n → `sendAgentKey`).
+   - Wire the existing **mic** (PTT Ctrl+V / hands-free) as-is; it already sends
+     voice. Optionally surface the focused-atem name so the user knows the target.
+   - Keep it simple; this is a remote control, not a terminal.
+
+5. **(No down-lane.)** Do not build screen mirroring in v1.
+
+## Atem side (other repo, for awareness; do not implement here)
+
+For the round-trip to work, the Atem side (`Atem` repo) must:
+- Add a matching `AstationMessage::AgentInput { agent_id, kind, text, key }`
+  (`#[serde(rename = "agentInput")]`, `data` fields matching the contract above).
+- On receipt, resolve the target agent (focused/only in v1) and **write to its
+  PTY stdin**: `kind:text` → `text` + `\n`; `kind:key` → the raw byte(s).
+- Voice is already handled via `pending_voice_request` / `voiceRequest`.
+
+Reference: `Atem/designs/remote-agent-control.md`. (Note: that doc's `agentInput`
+sketch put `atem_id` inside the message; the **authoritative** contract is here:
+`atem_id` is the relay envelope, the payload carries `agentId` + input. The Atem
+doc should be aligned.)
+
+## Acceptance test
+
+1. On a test machine, run atem and launch an agent under it (`atem agent launch`
+   or the PTY claude/codex session atem manages).
+2. Pair this Astation to that atem (direct on LAN, or via relay: the
+   `wss://…/ws?role=astation&code=astation-` identity room; atem joins with
+   the same code).
+3. In Astation, type "print working directory" → Send.
+4. **Expect**: the text appears at the agent's prompt in atem's terminal and the
+   agent acts on it.
+5. Press **Ctrl-C** in the key bar while the agent is working → it interrupts.
+6. Speak via PTT (Ctrl+V) → transcribed text reaches the agent (existing path).
+
+## Build / test (macOS)
+
+```bash
+# C++ core (first time / after core changes)
+cmake -S . -B build -DBUILD_TESTING=ON && cmake --build build -j
+
+# Swift app
+swift build            # debug
+swift build -c release # release
+
+# run tests
+swift test
+```
+
+## Out of scope (later phases)
+
+- Screen/TUI mirror (down lane): mobile only; cell-grid diffs, transport
+  decision (relay upgrade vs RTC data channel). See the Atem design doc.
+- Multiple simultaneous Astations per relay room (relay currently allows one
+  `astation_tx` per room).
+- Folding voice into the unified `agentInput` path.