feat: add `openhome test` — fire trigger + assert on frame stream by realdecimalist · Pull Request #15 · openhome-dev/openhome-cli

realdecimalist · 2026-04-29T00:48:37Z

Hi @Bradymck — per your suggestion in Discord about whether the harness I shared was worth merging into the CLI: here's a port. This turns the standalone voice-test.mjs we've been using on the DevKit into a proper subcommand.

Why

openhome chat and openhome trigger are great for "did anything come back", but iterating on a deployed ability still means reading WebSocket logs by eye and squinting for the right routing event / log line / spoken phrase. With the cloud module cache lag and routing nondeterminism, a tight assertion-based loop is the difference between 30s iterations and 5–15min ones.

Real-world motivation:

We hit exec_local_command-from-skill silent hangs (abilities#260) and session_tasks.create(self.run()) coroutine cancellation (abilities#261) — bugs we couldn't have characterized confidently without watching the frame stream programmatically across many runs
Cloud module caching (abilities#220) means "the new commit landed" doesn't always mean "the new code is running" — assertions catch this immediately

What it does

openhome test "any new tickets" \
  --expect-cap my-skill \
  --expect-log "STEP A0" \
  --expect-log "STEP D probe returned" \
  --expect-speak "Tickets:" \
  --reject-speak "couldn't generate" \
  --timeout 90000 --json

Opens a fresh voice-stream WebSocket via the existing createAgentSocket helper
Waits for the wake greeting (assistant final=true), then injects the trigger
Watches the frame stream for: chat_details:{name:...} (cap routing), editor_logging_handler log lines, and final assistant speech
Exits 0 on success, 1 on missed assertion / timeout, 2 on setup error
--json returns {ok, pass, asserts: [{kind, expression, met}...], elapsed_ms, log_file, agent, trigger}
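The assertion bookkeeping described above can be sketched as a small pure class. This is a minimal illustration, not the contents of src/testing/asserts.ts: the `AssertTracker` name, its methods, and the convention that a reject assertion starts as met and flips to unmet when its pattern is seen are all assumptions. Treating each expression as a regular expression is also an assumption, inferred from the BAD_REGEX error code.

```typescript
// Hypothetical sketch of a pure assertion tracker (names are illustrative).
type AssertKind = "cap" | "log" | "speak" | "reject";
type FrameSource = "cap" | "log" | "speak";

interface AssertState {
  kind: AssertKind;
  expression: string;
  met: boolean;
}

class AssertTracker {
  private readonly items: { state: AssertState; re: RegExp }[] = [];

  // new RegExp() throws SyntaxError on a bad pattern; a caller could map
  // that to a setup error before any socket is opened.
  add(kind: AssertKind, expression: string): void {
    const met = kind === "reject"; // assumption: rejects hold until violated
    this.items.push({ state: { kind, expression, met }, re: new RegExp(expression) });
  }

  // Feed one observation from the frame stream: a routed capability name,
  // a log line, or a final assistant utterance.
  observe(source: FrameSource, text: string): void {
    for (const { state, re } of this.items) {
      const target: FrameSource = state.kind === "reject" ? "speak" : state.kind;
      if (target !== source || !re.test(text)) continue;
      state.met = state.kind !== "reject";
    }
  }

  passed(): boolean {
    return this.items.every(({ state }) => state.met);
  }

  snapshot(): AssertState[] {
    return this.items.map(({ state }) => ({ ...state }));
  }
}
```

Because the tracker never touches the socket, a unit test can drive it with plain strings, which is the property the PR relies on for its vitest cases.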

Implementation

Reuses every WS abstraction the CLI already has — no new WebSocket lifecycle code, just a new consumer of createAgentSocket's onTextMessage / onEvent callbacks
Inherits getApiKey() precedence (env > keychain > config), agent resolution, and interactive-picker patterns from trigger.ts / logs.ts
--json shape matches the rest of the CLI: {ok: false, error: {code, message}} on failure
Pure helpers (src/testing/asserts.ts, src/testing/frame-log.ts) are isolated from I/O so they're trivially unit-testable
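For reviewers weighing the JSON shape, the two envelopes and the exit-code rule can be written down as a discriminated union. The field names come from the description above; the field types (for example `log_file` being nullable and `agent` being a string), the `TestOutput` and `exitCodeFor` names, and the idea that a timeout is reported as `ok: true, pass: false` are assumptions made for illustration.

```typescript
// Hypothetical typing of the --json output; field types are guesses.
interface AssertResult {
  kind: string;
  expression: string;
  met: boolean;
}

type TestOutput =
  | {
      ok: true;
      pass: boolean;
      asserts: AssertResult[];
      elapsed_ms: number;
      log_file: string | null;
      agent: string;
      trigger: string;
    }
  | { ok: false; error: { code: string; message: string } };

// 0 on success, 1 on missed assertion / timeout, 2 on setup error.
function exitCodeFor(out: TestOutput): 0 | 1 | 2 {
  if (!out.ok) return 2; // setup error, e.g. AUTH_ERROR or BAD_REGEX
  return out.pass ? 0 : 1;
}
```

Keeping the mapping in one pure function means CI scripts can rely on the exit code alone and only parse the JSON when they need the per-assertion detail.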

Files

File	Purpose
`src/commands/test.ts`	Command driver — auth, flag parsing, WS lifecycle
`src/testing/asserts.ts`	Pure assertion tracker (cap / log / speak / reject)
`src/testing/frame-log.ts`	Frame stream capture for `--log-file` debugging
`src/testing/asserts.test.ts`	8 vitest cases
`src/testing/frame-log.test.ts`	4 vitest cases
`src/cli.ts`	Register `test` subcommand + agent-reference doc
`README.md`	New `openhome test` section + API-status row

Tested

npm test — 12/12 pass (this is the first test file in the repo; vitest config was already in place)
npm run build — clean
npm run lint — no new errors (the pre-existing MockApiClient error is unrelated and on main)
Smoke-tested AUTH_ERROR / BAD_REGEX / MISSING_TRIGGER paths return well-formed JSON + exit 2
Used the standalone .mjs ancestor of this same harness over the past week to ship a real Penny ability against the OpenHome cloud — assertions caught routing regressions that voice-only testing missed

Caveat (documented in the README + JSON output)

openhome test opens a new voice-stream WS to the same agent, so if a hardware client (e.g. the OpenHome DevKit kiosk) is currently connected, the cloud will close that session ("Connection Replaced", code 1000). Iterate with test, then bring the hardware back online for final verification.

Happy to take feedback on the command name (test vs assert vs voice-test), the JSON shape, or anything else. Thanks for the prompt to upstream this!

`openhome chat` and `openhome trigger` are great for "did anything come back", but iterating on a deployed ability still meant reading WebSocket logs by eye and squinting for the right routing event / log line / spoken phrase. This command turns that loop into a fast PASS/FAIL.

Usage:

openhome test "any new tickets" \
  --expect-cap my-skill \
  --expect-log "STEP A0" \
  --expect-speak "Tickets:" \
  --reject-speak "couldn't generate" \
  --json

What it does:

- Opens a fresh voice-stream WS via the existing `createAgentSocket` helper
- Waits for the wake greeting (assistant final=true), then injects the trigger
- Watches the frame stream for: `chat_details:{name:...}` (cap routing), `editor_logging_handler` log lines, and assistant final speech
- Exits 0 on success, 1 on missed assertion / timeout, 2 on setup error

Implementation notes:

- Reuses every WS abstraction the CLI already has: no new WebSocket lifecycle code, just a new consumer of `createAgentSocket`'s `onTextMessage` / `onEvent` callbacks
- Pure assertion tracker (`src/testing/asserts.ts`) and frame log (`src/testing/frame-log.ts`) are unit-tested with vitest (12 tests added)
- `--json` shape matches the rest of the CLI (`{ok, error: {code, message}}` on failure; `{ok, pass, asserts, elapsed_ms, log_file, ...}` on completion)
- Inherits `getApiKey()` precedence (env > keychain > config), agent resolution, and interactive picker patterns from `trigger.ts` / `logs.ts`

Tested:

- `npm test`: 12/12 pass (existing repo had no tests; vitest config was already in place, this is the first test file)
- `npm run build`: clean
- `npm run lint`: no new errors (pre-existing MockApiClient error on main)
- Smoke-tested AUTH_ERROR / BAD_REGEX paths return well-formed JSON + exit 2
- Used the standalone .mjs version of this same harness over the past week to ship a real ability against the OpenHome cloud; assertions reliably catch routing/log/speech regressions that voice-only testing misses

Local Abilities (the new `category: local`, announced 2026-05-04) split execution: main.py runs in the cloud sandbox, devkit_functions.py runs on the DevKit. When main.py calls send_devkit_capability_action(), the cloud emits a `devkit-capability` frame and blocks awaiting a `devkit-capability-result` ACK from the device.

Plain `openhome test` can't drive this round-trip: opening its own voice-stream WS displaces the kiosk session, so the cloud routes the dispatch back to the test harness rather than the Pi's node-server. The harness records the frame but has no way to invoke devkit_functions.py, so main.py's await times out at ~8s with `output: null`. End result: every Local Ability test fails to even exercise the device-side code.

Fix: --proxy-pi <ssh-target> makes the test command mirror exactly what the DevKit's node-server does on receipt of the frame (`openhome-node-server/index.js:585+`):

sudo python3 <cap_dir>/<capability_name>/devkit_functions.py \
  <function_name> <args...>

We capture stdout and ACK via `devkit-capability-result` on the same WS. The cloud is none the wiser; main.py's await resolves with the function's stdout, and speak() fires.

New flags:

--proxy-pi <ssh-target>        e.g. openhome@192.168.1.42
--proxy-pi-cap-dir <path>      override the local_capabilities path

The proxy logic lives in src/testing/devkit-proxy.ts as small pure helpers (shq for shell quoting, buildRemoteCommand, buildResultFrame) plus an injectable `exec` hook so the integration is unit-testable without touching subprocess. 17 new tests cover quoting edge cases, command construction, frame shape, and dispatch semantics. Total suite goes from 12 → 29 tests, all passing.

Verified end-to-end against a real Local Ability (Penny's discord-pulse rebuild) on the author's stack:

$ "ticket pulse" → 14s PASS, agent speaks the live ticket count
$ "team pulse" → 11s PASS, full multi-queue summary
$ "approval pulse" → 11s PASS, approvals snapshot

Without this flag, every Local Ability deployed via `openhome deploy` would need manual on-device voice testing: minutes per cycle vs. the harness's <30s. With it, the test command is a strict superset of its prior behavior (proxying only fires when the flag is set).
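The three pure helpers named in this commit message could look roughly like the sketch below. Only the helper names and the `sudo python3 <cap_dir>/<capability_name>/devkit_functions.py <function_name> <args...>` command layout come from the source. The signatures, the `RemoteCall` type, and in particular the fields of the result frame are assumptions, since the real wire format of `devkit-capability-result` is not shown here.

```typescript
// Hypothetical sketch; the real src/testing/devkit-proxy.ts may differ.

// POSIX single-quote quoting: wrap in '...' and rewrite each embedded
// single quote as '\'' so the remote shell sees one literal argument.
function shq(s: string): string {
  return "'" + s.replace(/'/g, "'\\''") + "'";
}

interface RemoteCall {
  capDir: string;          // value of --proxy-pi-cap-dir (or its default)
  capabilityName: string;
  functionName: string;
  args: string[];
}

// Builds the command line the DevKit would run for this dispatch.
function buildRemoteCommand(call: RemoteCall): string {
  const script = `${call.capDir}/${call.capabilityName}/devkit_functions.py`;
  return ["sudo", "python3", shq(script), shq(call.functionName), ...call.args.map(shq)].join(" ");
}

// Assumed frame shape: only the type string is taken from the source.
function buildResultFrame(stdout: string | null): { type: string; output: string | null } {
  return { type: "devkit-capability-result", output: stdout };
}
```

Quoting every argument, including the script path, matters because the command string is interpreted by a shell on the far side of SSH, and trigger arguments may contain spaces or quotes.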

realdecimalist · 2026-05-06T19:56:43Z

Hey @Bradymck, quick update on this PR. With Local Abilities shipping last week, I extended the harness so it can drive end-to-end tests of category: local abilities too. Without that extension, opening a fresh voice-stream WS displaces the kiosk session and the cloud routes devkit-capability dispatches back to the test client (us) instead of the Pi, so the harness can record the frame but not ACK it, and main.py's await send_devkit_capability_action() always times out.

The new --proxy-pi <ssh-target> flag makes the test command mirror what the DevKit's openhome-node-server/index.js:585+ does on receipt of the frame: SSH-exec the same sudo python3 .../devkit_functions.py <fn> <args> and ACK with devkit-capability-result. The cloud sees a normal device round-trip; main.py's await resolves with the function's stdout. Plain openhome test (no flag) is unchanged.
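The round-trip described above, with the `exec` hook injected so tests never spawn a subprocess, might be sketched as follows. Everything here is illustrative: the `proxyFrame` and `ProxyDeps` names, the frame field names, and the choice to ACK with `output: null` when the remote command fails are assumptions, not the PR's actual code.

```typescript
// Hypothetical dispatch sketch; frame fields are not the real wire format.
interface CapabilityFrame {
  type: string;
  capability_name: string;
  function_name: string;
  args: string[];
}

interface ResultFrame {
  type: "devkit-capability-result";
  output: string | null;
}

// Resolves with the command's stdout; rejects on failure.
type Exec = (file: string, args: string[]) => Promise<string>;

interface ProxyDeps {
  sshTarget: string;                                    // value of --proxy-pi
  exec: Exec;                                           // tests pass a fake
  toRemoteCommand: (frame: CapabilityFrame) => string;  // already shell-quoted
  send: (frame: ResultFrame) => void;                   // writes to the same WS
}

// Returns true when the frame was a dispatch and an ACK was sent.
async function proxyFrame(frame: CapabilityFrame, deps: ProxyDeps): Promise<boolean> {
  if (frame.type !== "devkit-capability") return false; // leave other frames alone
  let output: string | null = null;
  try {
    output = await deps.exec("ssh", [deps.sshTarget, deps.toRemoteCommand(frame)]);
  } catch {
    output = null; // assumption: still ACK so the cloud does not wait for its timeout
  }
  deps.send({ type: "devkit-capability-result", output });
  return true;
}
```

In production the `exec` dependency could be a thin wrapper over Node's `child_process.execFile`; in tests it is a function that records its arguments and returns canned stdout.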

Verified end-to-end against the discord-pulse rebuild I shipped this morning as a Local Ability:

"ticket pulse" → 14s PASS, agent speaks the live ticket count from the host
"team pulse" → 11s PASS, multi-queue summary
"approval pulse" → 11s PASS, approvals snapshot

Total test count went 12 → 29 (devkit-proxy.test.ts covers shell-quoting edges, command construction, frame shape, dispatch semantics with an injectable exec hook so no subprocess in tests).

Happy to split this into a separate PR after #15 lands if you'd prefer to keep the initial harness review focused — let me know.

Decimalist added 2 commits April 28, 2026 18:48

realdecimalist wants to merge 2 commits into openhome-dev:main from realdecimalist:feat/test-command
