feat(agent): experimental Claude (Interactive) backend (tmux + sidecar host) #855

Open
codefriar wants to merge 95 commits into utensils:main from codefriar:codefriar/emerald-cypress

@codefriar
Contributor

@codefriar codefriar commented May 17, 2026

Summary

Adds a new experimental agent backend ClaudeInteractive that runs the interactive claude TUI (no --print) inside a detachable host process that outlives Claudette. Coexists with the existing claude --print flow; selectable per-workspace; gated behind a claudeInteractiveEnabled experimental flag (default off, no impact on existing users).

Closes #838 (proposal issue).

What you get

  • Detachable sessions. Closing Claudette leaves the host (tmux or sidecar) running. Relaunch reattaches; the sidebar repaints from a captured screen snapshot.
  • Full Claude Code TUI parity. Plan mode, slash-command UI, native permission prompts — whatever ships to the interactive client first.
  • Cross-platform. Real tmux ≥ 3.0 on macOS / Linux; a bundled claudette-session-host Rust sidecar on Windows (and opt-in Unix fallback).
  • Power-user tmux attach. Sidebar context menu copies tmux attach-session -t claudette-<sid> so you can drive the same session from any terminal.

What it is NOT

  • Not a replacement for the existing claude --print path. That code is untouched. ClaudeInteractive is added alongside as a new AgentHarnessKind.
  • Not a renderer rewrite. xterm.js still renders; native widget / WebGL paint is a deliberate follow-up.
  • Not on by default. Users must flip claudeInteractiveEnabled in Settings → Experimental.

Architecture

                Claudette (Tauri + React)
                          |
       ChatPanel ──► useInteractiveTurnAssembler ◄── services/interactive.ts
                          |                                |
                          ▼                                |
                InteractiveTurnView                        | Tauri commands
                (per-turn xterm.js)                        |
                                                           ▼
       ┌────────────────────────────────────────────────────────────┐
       │      src/agent/interactive_host  (claudette crate)         │
       │                                                            │
       │   InteractiveHost trait                                    │
       │     ├── TmuxHost     (#[cfg(unix)])                        │
       │     └── SidecarHost  (all platforms)                       │
       │                                                            │
       │   Shared conformance suite — both impls must pass          │
       └────────────────────────────────────────────────────────────┘
              |                                  |
       tmux on Unix                       Named Pipe / UDS
              |                                  |
              ▼                                  ▼
   ┌──────────────────────┐         ┌─────────────────────────────┐
   │  tmux server +       │         │  claudette-session-host     │
   │  claudette-<sid>     │         │  (new Rust bin, externalBin │
   │  sessions            │         │   sidecar). Owns claude     │
   │  pipe-pane → FIFO    │         │   PTYs via portable-pty.    │
   │  capture-pane -pJ -e │         │   Framed JSON wire protocol.│
   └──────────────────────┘         └─────────────────────────────┘
              |                                  |
                          ▼
              claude (interactive, no -p)
              CLAUDE_CONFIG_DIR → transient overlay
              registering Stop / Notification / UserPromptSubmit
              hooks that call back via the CLI.

Hook plumbing. Each session gets a transient CLAUDE_CONFIG_DIR overlay registering three Claude Code hooks (Stop, Notification, UserPromptSubmit) whose commands invoke claudette-cli chat hook --sid <SID> --kind <kind>. The CLI sends the event over the existing local IPC socket, which the GUI ingests via a new chat_hook method that routes to a per-session channel. Hook events drive the awaiting-input badge, turn delimiters, and OS notifications. No new transport.
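The routing step in the hook path (an event arrives carrying a sid and is delivered to that session's channel) can be sketched roughly as follows. This is a minimal illustration using `std::sync::mpsc`, not the PR's implementation: `HookRouter`, `HookEvent`, and the method names are hypothetical, and the real `chat_hook` ingestion may differ in types and error handling.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Flat hook event as relayed by the CLI: session id, hook kind, optional reason.
/// (Hypothetical type for this sketch.)
#[derive(Debug, Clone, PartialEq)]
pub struct HookEvent {
    pub sid: String,
    pub kind: String,
    pub reason: Option<String>,
}

/// Hypothetical router: one channel per live session, keyed by sid.
#[derive(Default)]
pub struct HookRouter {
    channels: HashMap<String, Sender<HookEvent>>,
}

impl HookRouter {
    /// Register a session and return the receiving end of its channel.
    pub fn register(&mut self, sid: &str) -> Receiver<HookEvent> {
        let (tx, rx) = channel();
        self.channels.insert(sid.to_string(), tx);
        rx
    }

    /// Drop the session's channel so later events for it are rejected.
    pub fn unregister(&mut self, sid: &str) {
        self.channels.remove(sid);
    }

    /// Deliver an event to its session. Returns false when the sid is
    /// unknown or the receiving side has been dropped.
    pub fn dispatch(&self, event: HookEvent) -> bool {
        match self.channels.get(&event.sid) {
            Some(tx) => tx.send(event).is_ok(),
            None => false,
        }
    }
}
```

A caller would `register` when a session starts, hand the receiver to whatever drives the badge and notifications, and `unregister` on stop so stale events are dropped rather than queued.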

Key crates and files.

  • New: src-session-host/ — the bundled sidecar binary.
  • New: src/agent/interactive_host/: InteractiveHost trait + TmuxHost + SidecarHost + conformance suite + availability check.
  • New: src/agent/interactive_protocol.rs — wire types (length-prefixed JSON envelope frames).
  • New: src/agent/claude_interactive.rs: InteractiveSession::start + SettingsOverlay.
  • New: src/interactive.rs — lib-level lifecycle helpers (reattach_pending, stop_sessions_for_workspace, detect_orphans).
  • New: src-tauri/src/commands/interactive.rs — 6 Tauri commands + 2 orphan commands.
  • New: src-tauri/src/interactive_lifecycle.rs — boot reconciler.
  • New: src-cli/src/commands/chat_hook.rs: claudette chat hook subcommand.
  • New (UI): InteractiveTurnView, InteractiveTurns, InteractiveTerminalMode*, useInteractiveTurnAssembler, useInteractiveChatMode, services/interactive.ts, InteractiveBadge.
  • Migration: src/migrations/20260517020158_interactive_sessions.sql.
  • Docs: site/src/content/docs/features/interactive-claude.mdx, settings.mdx row, CLAUDE.md + Copilot mirror.
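For readers unfamiliar with length-prefixed framing (the shape `interactive_protocol.rs` is described as using), here is a minimal sketch. The 4-byte big-endian prefix, the `MAX_FRAME_LEN` cap, and the function names are assumptions for illustration; the PR text does not state the prefix width or byte order. The JSON body is treated as opaque bytes.

```rust
use std::io::{self, Read, Write};

/// Upper bound on a single frame body. An assumption for this sketch.
const MAX_FRAME_LEN: usize = 16 * 1024 * 1024;

/// Write one frame: a 4-byte big-endian length followed by the JSON bytes.
pub fn write_frame<W: Write>(w: &mut W, json: &[u8]) -> io::Result<()> {
    if json.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    w.write_all(&(json.len() as u32).to_be_bytes())?;
    w.write_all(json)
}

/// Read one frame. Returns Ok(None) on a clean EOF at a frame boundary and
/// an error if the stream ends inside the prefix or the body.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut len_buf = [0u8; 4];
    let mut filled = 0;
    while filled < len_buf.len() {
        let n = r.read(&mut len_buf[filled..])?;
        if n == 0 {
            return if filled == 0 {
                Ok(None)
            } else {
                Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated length prefix"))
            };
        }
        filled += n;
    }
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut body = vec![0u8; len];
    r.read_exact(&mut body)?;
    Ok(Some(body))
}
```

Distinguishing a clean EOF from a truncated frame lets a reader tell "peer closed the socket" (the detach signal described in this PR) apart from a corrupted stream.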

Complexity Notes

Things reviewers should look at carefully:

  1. Dual-connection SidecarHost (src/agent/interactive_host/sidecar.rs). The plan called for a single multiplexed connection, but the session-host's handle_connection becomes single-purpose after Attach (the dispatch loop exits), so multiplexing on one socket is structurally impossible. The implementation uses one long-lived control connection (request-ID correlated via RequestEnvelope / InboundFrame) and a fresh socket per attach() call. Detach = client closes the connection.
  2. !Send futures around rusqlite::Connection. Several lifecycle helpers (reattach_pending, stop_sessions_for_workspace) hold a &Database across a host.status().await. The Tauri-side glue routes through spawn_blocking + current_thread Tokio runtime so the multi-thread runtime never has to park a !Send future. See src-tauri/src/interactive_lifecycle.rs and the per-workspace teardown in src-tauri/src/commands/workspace.rs.
  3. TmuxHost FIFO fanout (src/agent/interactive_host/tmux.rs). A FIFO has a single in-kernel reader buffer — multiple readers would split bytes, not duplicate. Implementation: one per-session blocking reader spawns at ensure_session, pushing into a tokio::sync::broadcast. attach() calls .subscribe() and bridges to an mpsc ReceiverStream. EOF detection runs a synchronous tmux has-session probe inside the blocking thread to emit AttachEvent::Exit cleanly.
  4. State-string vocabulary. Canonical states are "running" | "detached" | "stopped" | "crashed" | "unknown" (the TypeScript InteractiveSessionState union mirrors them). The DB column is TEXT; an early draft used "exited" which has been globally replaced.
  5. Hook payload divergence. The attach stream emits AttachEvent::Hook(HookFired) (typed, nested) while the CLI-relayed path emits flat {sid, kind, reason} (synthesized by claudette-cli chat hook). Both end up on the same interactive://<sid>/hook Tauri event topic. normalizeHookPayload in services/interactive.ts collapses both shapes into a single HookEvent. Tests cover both directions.
  6. Cross-platform shell quoting. Hook command strings in the settings overlay use POSIX '...' quoting on Unix and "..." doubling on Windows (shell_quote in claude_interactive.rs, #[cfg]-gated). Same overlay format, platform-correct invocation.
  7. SidecarHost has no reconnect path. The cached ConnHandle in OnceCell is never reset if the connection dies (e.g., 600s idle-exit of the sidecar). Subsequent interactive_* commands fail with "conn closed" until Claudette restarts. Documented in CLAUDE.md as a known limitation; tracked as follow-up work.
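The FIFO fanout in item 3 can be illustrated with a single reader thread that copies each chunk to every subscriber. This sketch deliberately swaps `tokio::sync::broadcast` for plain `std::sync::mpsc` channels so it stays dependency-free; `Fanout` and its methods are hypothetical names, and it treats EOF as terminal rather than running the `tmux has-session` probe the PR describes.

```rust
use std::io::Read;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical fanout: one reader, many subscribers, every subscriber
/// receives its own copy of each chunk.
#[derive(Clone, Default)]
pub struct Fanout {
    subscribers: Arc<Mutex<Vec<Sender<Vec<u8>>>>>,
}

impl Fanout {
    /// Add a subscriber. Chunks read before this call are not replayed.
    pub fn subscribe(&self) -> Receiver<Vec<u8>> {
        let (tx, rx) = channel();
        self.subscribers.lock().unwrap().push(tx);
        rx
    }

    /// Spawn the single blocking reader for `source`.
    pub fn spawn_reader<R: Read + Send + 'static>(&self, mut source: R) -> JoinHandle<()> {
        let subscribers = Arc::clone(&self.subscribers);
        thread::spawn(move || {
            let mut buf = [0u8; 4096];
            loop {
                match source.read(&mut buf) {
                    // EOF or read error ends the stream in this sketch.
                    Ok(0) | Err(_) => break,
                    Ok(n) => {
                        let chunk = buf[..n].to_vec();
                        // Send a copy to each subscriber and forget the ones
                        // whose receiver has gone away.
                        subscribers
                            .lock()
                            .unwrap()
                            .retain(|tx| tx.send(chunk.clone()).is_ok());
                    }
                }
            }
            // Dropping the senders closes every subscriber's stream.
            subscribers.lock().unwrap().clear();
        })
    }
}
```

Because only the reader thread ever touches the source, two attached clients each see the full byte stream instead of splitting it between them.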

Test Steps

Automated

# Rust — all four CI-checked crates
cargo fmt --all --check
RUSTFLAGS="-Dwarnings" cargo clippy -p claudette -p claudette-server -p claudette-cli -p claudette-session-host --all-targets --all-features
cargo test -p claudette -p claudette-server -p claudette-cli -p claudette-session-host --all-features

# Frontend
cd src/ui && bun install --frozen-lockfile && bunx tsc -b && bun run lint && bun run lint:css && bun run test

# Docs site
cd site && bun install && bun run build

# Optional Unix-only conformance suites (require tmux ≥ 3.0)
cargo test -p claudette tmux_passes_conformance -- --ignored --nocapture
cargo test -p claudette sidecar_passes_conformance -- --ignored --nocapture

Manual

  1. Enable the flag in Settings → Experimental → Claude (Interactive).
  2. Open a workspace → Settings → Models → Runtime → Claude (Interactive).
  3. Send a chat turn. Confirm:
    • The chat panel renders a per-turn xterm.js view with claude's interactive UI.
    • When claude pauses for input, the sidebar shows an "Awaiting input" badge and an OS notification fires (tray.rs path).
  4. On Unix: right-click the workspace in the sidebar → Copy tmux attach command → paste into any terminal → confirm you see the same session.
  5. Click Open in Terminal in the chat header to swap into full-terminal mode. Verify input + output works. Toggle back.
  6. Detach test: close Claudette while a session is running. Reopen. Confirm the session is listed as "Detached" with the previous screen visible, then re-attaches when you open the workspace.
  7. Stop test: click stop on the session row. Confirm the shutdown is graceful (Ctrl-C → SIGTERM after 5s → SIGKILL) and the row transitions to "Stopped".
  8. Orphan test: start a session, kill -9 Claudette so the cleanup teardown can't run, then reopen Claudette. The orphan should be detected on boot, a toast should appear, and the session should be auto-stopped.
  9. Confirm the existing --print flow is unchanged: pick a workspace with the default Claude runtime, send a turn, verify the old chat UI still renders identically.

Coverage

A per-file coverage gate (scripts/check-coverage-interactive.sh + vitest.config.ts thresholds) now enforces >=85% on the patch surface. Plan + tasks are in superpowers/plans/2026-05-18-interactive-claude-coverage-plan.md.

Final numbers

Rust (cargo llvm-cov, scoped to the patch set):

  • Lines: 86.47% (1553 / 1796 measured) across 8 files
  • Top contributors: interactive_protocol.rs 98.73%, claude_interactive.rs 93.48%, interactive_sessions.rs ~95%, interactive.rs 84.39%
  • Sidecar host fault paths covered via duplex-stream fixtures; tmux + sidecar conformance suites stay #[ignore]d (require tmux >= 3.0 / spawned binary)

Frontend (vitest run --coverage, istanbul, 9-file include):

  • Statements 93.33% / Branches 93.05% / Functions 91.89% / Lines 93.99% — all four 85% thresholds met
  • InteractiveBadge.tsx, interactiveSessionsSlice.ts, InteractiveTerminalModeToggle.tsx, InteractiveTurns.tsx, useInteractiveChatMode.ts all at 100%

Documented exclusions (structurally CI-untestable)

The Rust gate excludes six files whose tests require either tmux >= 3.0 on the runner, a live claude subprocess, or a spawned sidecar binary: interactive_host/{tmux,sidecar,conformance,mod}.rs, src-session-host/src/{main,server}.rs. Rationale lives inline in scripts/check-coverage-interactive.sh. The gated surface is 1796 of 2914 total patch lines (62%); the remainder is exercised by #[ignore]d conformance tests run manually on dev machines.

Defects found while raising coverage

Three small production defects surfaced during the coverage push and were fixed in the same series:

  • InteractiveTerminalMode.tsx: subscribeOutput had no .catch, surfacing as an unhandled rejection on listen() failure (asymmetric with the sibling attach path).
  • InteractiveTerminalMode.tsx — throwing unlistenOutput() skipped term.dispose(), leaking the xterm DOM instance.
  • services/interactive.ts — flat-path unknown hook kinds discarded the original kind label (caught earlier during G3 review).

Tests added during coverage work

~50 new tests across 9 phases (A baseline → B Rust lib → C session-host → D Tauri → E CLI → F frontend → G enforce). Headline coverage deltas:

  • sidecar.rs: 0% → 50%+ lines (ConnHandle fault paths)
  • interactive_protocol.rs: 80% → 98.73% lines (frame edge cases)
  • commands/interactive.rs: 0% → 82.93% lines (8 commands × inner-helper extraction for testability)
  • interactive_lifecycle.rs: 0% → 6 of 7 branches covered (boot reconciler)
  • InteractiveTerminalMode.tsx: 77% → 86% lines (keystrokes + ResizeObserver + rejection paths)
  • useInteractiveTurnAssembler.ts: 87% → 94% branches (race conditions)

Checklist

  • Tests added/updated — new tests across claudette, claudette-cli, claudette-tauri, claudette-session-host, and src/ui (vitest)
  • Documentation updated — site/src/content/docs/features/interactive-claude.mdx + settings.mdx row + CLAUDE.md / .github/copilot-instructions.md synced
  • Migration follows project conventions (20260517020158_interactive_sessions.sql, registered in MIGRATIONS)
  • No regression to existing claude --print path (existing tests untouched, default flag off)
  • Cross-platform (Unix tmux + Windows/Unix sidecar; #[cfg] gating verified)
  • Cargo / clippy / fmt / tsc / lint / lint:css / vitest / docs build — all green on Apple Silicon macOS at HEAD

@codefriar codefriar requested a review from a team as a code owner May 17, 2026 21:22
@codecov

codecov Bot commented May 17, 2026

Codecov Report

❌ Patch coverage is 71.43433% with 709 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.30%. Comparing base (63993a0) to head (e8efaa9).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #855      +/-   ##
==========================================
- Coverage   80.73%   80.30%   -0.44%     
==========================================
  Files         123      134      +11     
  Lines       46142    48624    +2482     
==========================================
+ Hits        37255    39046    +1791     
- Misses       8887     9578     +691     

@codefriar codefriar force-pushed the codefriar/emerald-cypress branch from 452ef22 to d93d520 Compare May 18, 2026 15:52
@jamesbrink jamesbrink self-assigned this May 19, 2026
@jamesbrink
Member

Thanks @codefriar! I will rebase this branch and check it out.

codefriar and others added 25 commits May 22, 2026 12:12
Design for a new experimental agent backend that runs interactive
`claude` (no -p) inside a detachable host — tmux on Unix, a custom Rust
`claudette-session-host` sidecar on Windows. Coexists with the existing
print-mode path behind a `claudeInteractiveEnabled` Settings flag.
Renderer perf is explicitly deferred to a follow-up spec.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… plan

Bite-sized TDD plan covering Phases A-I: settings flag + migration,
InteractiveHost trait + conformance suite, claudette-session-host
sidecar crate, TmuxHost impl, hooks + CLI subcommand + IPC ingestion,
backend wiring + Tauri commands, UI surfaces, lifecycle (reattach +
cleanup + orphans), and docs/CLAUDE.md sync.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pin the round-trip contract for the new experimental flag through the
generic `app_settings` table. No schema or IPC changes were required:
`pluginManagementEnabled` (the model this flag follows) is also stored
purely as a key-value string and exposed through the existing generic
`get_app_setting` / `set_app_setting` Tauri commands. There is no typed
Rust `AppSettings` struct, so Steps 4 and 5 of the implementation plan
are intentional no-ops.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wires the post-handshake request loop in `server.rs` and adds a per-session
PTY actor in `session.rs`. The actor owns the `PtyPair`, runs a blocking
reader task that fans output into a 256 KB rolling capture buffer plus a
`broadcast` channel for live attaches (C4), and a waiter task that emits an
`Exit` event when the child terminates.
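A rolling capture buffer of the kind described above keeps only the most recent bytes of PTY output. The following is a minimal sketch under assumed names (`RollingCapture`, `push`, `snapshot`); the actual `session.rs` actor may structure this differently.

```rust
use std::collections::VecDeque;

/// Hypothetical fixed-capacity byte buffer that discards the oldest
/// output once `cap` bytes are held.
pub struct RollingCapture {
    cap: usize,
    buf: VecDeque<u8>,
}

impl RollingCapture {
    pub fn new(cap: usize) -> Self {
        Self { cap, buf: VecDeque::with_capacity(cap) }
    }

    /// Append a chunk, evicting from the front to stay within `cap`.
    pub fn push(&mut self, chunk: &[u8]) {
        // Only the last `cap` bytes of an oversized chunk can survive.
        let tail = if chunk.len() > self.cap {
            &chunk[chunk.len() - self.cap..]
        } else {
            chunk
        };
        let overflow = (self.buf.len() + tail.len()).saturating_sub(self.cap);
        self.buf.drain(..overflow);
        self.buf.extend(tail.iter().copied());
    }

    /// Copy out the current contents, oldest byte first.
    pub fn snapshot(&self) -> Vec<u8> {
        self.buf.iter().copied().collect()
    }
}
```

With the 256 KB figure from the commit message this would be constructed as `RollingCapture::new(256 * 1024)`. Note that a byte-level cut can land in the middle of an ANSI escape sequence, so a consumer repainting from the snapshot should tolerate a partial sequence at the start.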

A new `SessionMap = Arc<Mutex<HashMap<String, Arc<Session>>>>` is shared
across connections so all clients see the same set of live sessions.
`run_at_with` lets an outer harness pass its own map; `run_at` and
`run_for_test` wrap a fresh one. Dispatch handles `EnsureSession`, `Status`,
`SendInput`, and `Stop`; `Resize`, `Detach`, `CaptureScreen`, and `Attach`
remain `not yet implemented` (they land in C4).

Integration test `ensure_session.rs` builds the workspace `stub-tui` fixture
via `cargo build -p stub-tui` and discovers its path through
`cargo metadata`'s `target_directory` (artifact deps still require
`-Z bindeps` on 1.94, so we sidestep them).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…tureScreen

Wire up the four request kinds the dispatch loop had stubbed out in C3:

- Attach: special-cased in handle_connection. After writing
  Response::AttachStarted { attach_id }, the connection switches into
  streaming mode and pumps SessionEvent::{Output,Hook,Exit} as wire
  Event frames until the client disconnects or the session exits. The
  per-connection attach_id_counter is a monotonic u64 so future control
  flows can correlate Detach requests if needed.
- Detach: per the plan's simpler v1 model, the canonical way to detach
  is to close the socket — stream_attach exits on the resulting write
  error. The explicit Detach request is accepted for symmetry and
  returns Response::Ok (effectively a no-op).
- Resize: clones the Arc<Session> out from under the map lock, then
  awaits Session::resize.
- CaptureScreen: clones the Arc<Session>, reads the rolling raw-ANSI
  buffer, and returns it base64-encoded alongside current rows/cols.
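The "clone the Arc out from under the map lock" pattern used by Resize and CaptureScreen can be sketched as below. This version is synchronous and uses `std::sync::Mutex` for brevity; the `Session` fields and the free function `resize` are hypothetical stand-ins, while the `SessionMap` alias follows the shape given in the earlier commit message.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical session that only tracks its terminal size.
pub struct Session {
    size: Mutex<(u16, u16)>,
}

impl Session {
    pub fn new(rows: u16, cols: u16) -> Self {
        Self { size: Mutex::new((rows, cols)) }
    }

    pub fn size(&self) -> (u16, u16) {
        *self.size.lock().unwrap()
    }

    pub fn resize(&self, rows: u16, cols: u16) {
        *self.size.lock().unwrap() = (rows, cols);
    }
}

pub type SessionMap = Arc<Mutex<HashMap<String, Arc<Session>>>>;

/// Look the session up, release the map lock, then do the per-session work.
pub fn resize(map: &SessionMap, sid: &str, rows: u16, cols: u16) -> Result<(), String> {
    let guard = map.lock().map_err(|_| "session map poisoned".to_string())?;
    let session = guard.get(sid).cloned();
    // Release the map lock before touching the session so slow work on one
    // session never blocks lookups for the others.
    drop(guard);

    let session = session.ok_or_else(|| format!("interactive session not found: {sid}"))?;
    session.resize(rows, cols);
    Ok(())
}
```

In async code the same shape matters more: holding a `std::sync::MutexGuard` across an `.await` would make the future `!Send`, so cloning the `Arc` first keeps the guard's lifetime strictly before the await point.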

New integration test tests/attach_stream.rs uses two connections — a
control connection for EnsureSession+SendInput and an attach connection
that asserts both the AttachStarted ack and the streamed READY +
"OUT: hello" echo from stub-tui.
codefriar and others added 19 commits May 22, 2026 12:23
…eachable

The reader's `Ok(0)` arm is collapsed with `Err(_)` because portable-pty's
master reader returns zero only on real slave EOF — unlike TmuxHost's
pipe-pane FIFO, which can transiently close on `respawn-pane`. The
waiter's `Err(_)` arm covers a `Child::wait` failure mode that requires
a lost child handle or external reaper; neither is reproducible from a
unit test.

Adds two `#[ignore]`d documentation tests (`reader_ok_zero_is_not_spurious_on_portable_pty`,
`wait_err_path_is_unreachable`) and expands the inline comments in
`Session::spawn` so grep finds the rationale next to the code. Closes
Task C1 in `superpowers/plans/2026-05-18-interactive-claude-coverage-plan.md`
as documentation-as-coverage; neither branch is reachable from a stable
test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Extract `reattach_interactive_sessions_inner(&AppState, OrphanEmitter)`
from `reattach_interactive_sessions_on_boot(AppHandle)` so the boot
reconciler can be unit-tested without booting a real Tauri runtime.
The public entry stays as a thin wrapper that resolves `AppState` from
the handle and wires `app.emit("interactive://orphans-detected", ...)`
into the inner reconciler's emit callback.

Add seven unit tests exercising the reconciler's behavior branches:
  1. claudeInteractiveEnabled OFF → early return, no host touched.
  2. Empty DB + empty known sids → fast-path return.
  3. Single workspace, host knows session → row → `detached`.
  4. Single workspace, host missing session → row → `crashed`.
  5. Orphan fallback (no running rows, host has unknown claudette- sid)
     records orphan into `AppState::interactive_orphans` and emits.
  6. Per-workspace host failure isolation: one workspace's `status()`
     error must not prevent the other workspace's rows from being
     reclassified.
  7. Orphan-emit failure is swallowed and orphans are still stashed
     into `AppState::interactive_orphans` before the emit (so a
     subsequent `interactive_cleanup_orphans` can still find them).
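The classification these tests pin down can be condensed into a pure function. This is a simplified sketch, not the reconciler itself: it ignores the feature flag, per-workspace host resolution, and error isolation, and it assumes the running rows are the only sids the database knows about. `Reconciled` and `reconcile` are hypothetical names.

```rust
use std::collections::HashSet;

/// Outcome of comparing DB rows against what the host reports.
#[derive(Debug, PartialEq, Eq)]
pub struct Reconciled {
    pub detached: Vec<String>,
    pub crashed: Vec<String>,
    pub orphans: Vec<String>,
}

/// `running_rows`: sids whose DB row is still marked "running".
/// `host_sids`: sids the host (tmux or sidecar) currently reports.
pub fn reconcile(running_rows: &[&str], host_sids: &[&str]) -> Reconciled {
    let live: HashSet<&str> = host_sids.iter().copied().collect();
    let tracked: HashSet<&str> = running_rows.iter().copied().collect();
    let mut out = Reconciled { detached: vec![], crashed: vec![], orphans: vec![] };

    for &sid in running_rows {
        if live.contains(sid) {
            // The host still has the session: it survived the restart.
            out.detached.push(sid.to_string());
        } else {
            // The row says running but the host has nothing: it died.
            out.crashed.push(sid.to_string());
        }
    }
    for &sid in host_sids {
        // A claudette-prefixed session with no DB row is an orphan.
        if sid.starts_with("claudette-") && !tracked.contains(sid) {
            out.orphans.push(sid.to_string());
        }
    }
    out
}
```

The prefix check keeps unrelated tmux sessions owned by the user out of the orphan list.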

The DB-read-failure branch on the running-rows / known-sids query
(plan Step 6) is intentionally not exercised — there is no race-free
way to construct a DB that the flag-check query can read but the
subsequent rows query cannot without adding a fault-injection hook in
`Database::open` that exists only for this test. The branch is shallow
(log + early return) and the underlying failure mode is covered at the
DB layer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Extract `_inner` helpers from every `interactive_*` Tauri command so
the bodies can be unit-tested with a borrowed `&AppState` instead of
booting a real Tauri runtime. Production entry points stay as thin
wrappers; `interactive_start` additionally takes an injectable
`HookForwardEmitter` callback (production wires it to `AppHandle::emit`,
tests pass a no-op) so the per-session hook-channel forwarder task
can be exercised without an `AppHandle`.
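The `_inner` extraction with an injectable emitter might look roughly like this. It is a sketch under assumptions: `AppState` here is a one-field stand-in, `forward_hook_inner` is a hypothetical name, and the emitter signature is simplified to two string slices. The disabled-error string and the `interactive://<sid>/hook` topic are taken from the PR text.

```rust
/// Stand-in for the real application state (hypothetical, one field only).
pub struct AppState {
    pub claude_interactive_enabled: bool,
}

/// Testable body of a command: takes borrowed state plus an emit callback
/// instead of a Tauri `AppHandle`.
pub fn forward_hook_inner<E>(
    state: &AppState,
    sid: &str,
    kind: &str,
    mut emit: E,
) -> Result<(), String>
where
    E: FnMut(&str, &str) -> Result<(), String>,
{
    // Gate on the experimental flag before touching anything else.
    if !state.claude_interactive_enabled {
        return Err("Claude Interactive is disabled".to_string());
    }
    let topic = format!("interactive://{sid}/hook");
    emit(&topic, kind)
}
```

The production wrapper would pass a closure that forwards to the app handle's emit call, while a unit test passes a closure that records its arguments or does nothing.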

Add 18 new tests across `mod tests` covering every reachable branch of
the eight command handlers:

  - `interactive_start_inner`
      1. Happy path — synthesizes a `claudette-<short>-<rand>` sid,
         calls `host.ensure_session` exactly once with the spec we
         built, persists an `interactive_sessions` row in state
         `"running"` with the right claude_args JSON, and populates
         the sid→workspace_id reverse index.
      2. Flag OFF — short-circuits with the canonical
         "Claude Interactive is disabled" string before touching host
         or DB.
      3. Missing-CLI binary — surfaces "claudette-cli binary not found"
         when neither `CLAUDETTE_CLI` nor a staged sidecar is on disk.
         Pinned with a tolerant match arm because some dev workspaces
         stage the sidecar inline next to the test binary.

  - `interactive_send_input_inner` — happy path forwards `Text` payload
    to the host with the exact bytes; missing-sid surfaces the
    canonical "interactive session not found: <sid>"; flag OFF returns
    the disabled error before touching the host.

  - `interactive_capture_screen_inner` — happy path returns the host's
    `ansi_bytes` base64-encoded AND persists the same raw bytes into
    the DB row's `last_screen_blob`; missing-row tolerance pin (no
    pre-existing row means the UPDATE returns `QueryReturnedNoRows`,
    which the command must swallow so the capture itself still
    succeeds); flag OFF returns the disabled error.

  - `interactive_stop_inner` — graceful (`force=false`) maps to
    `StopMode::Graceful`, DB row transitions to `"stopped"`, sid
    mapping dropped; force (`force=true`) maps to `StopMode::Force`
    with the same DB + sid cleanup; flag OFF + missing-sid surface
    their canonical error strings.

  - `interactive_list_for_workspace_inner` — populated case returns
    only the rows for the queried workspace, newest-first by
    `created_at`, with every field round-tripped through
    `InteractiveSessionListItem`; empty/unknown workspace returns
    `Ok(vec![])`.

  - `interactive_list_orphans_inner` + `interactive_cleanup_orphans_inner`
    — list returns every sid in the orphans map; cleanup drains the
    map, calls `host.stop(sid, Graceful)` once per orphan, and returns
    the stopped sids; cleanup on an empty map returns `Ok(vec![])`.

Two test mocks back the suite:
  - `FakeInteractiveHost` records `ensure_session` / `send_input` /
    `capture_screen` / `stop` calls and echoes a canned screen blob.
  - `StopTrackingHost` is the specialized orphan-cleanup mock — records
    each `stop()` invocation and `unreachable!()`s every other method
    so a regression accidentally reaching for them fails loudly.

Process-global `$CLAUDETTE_HOME` and `$CLAUDETTE_CLI` mutations are
serialized through a `tokio::sync::Mutex` so the env-touching tests
don't race each other or sibling tests in the same binary.

Deferred branches:
  - `interactive_host_for` failure: `select_default_host` is
    infallible in practice (its sidecar branch constructs a
    `SidecarHost` whose `new` cannot fail), and we already test the
    pre-seeded happy path. Adding a fault-injection hook just to
    cover the `map_err` is not worth the production surface change.
  - `interactive_attach`: body is a one-liner over
    `spawn_attach_forwarder`, which consumes a real `AppHandle::emit`
    and requires a full Tauri runtime to exercise meaningfully. Same
    rationale D1 used for the boot-reconciler emit path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…+ delete

Adds Tauri-side coverage for the
`stop_interactive_sessions_for_workspace_teardown` helper that
`delete_workspace` and `archive_workspace_inner` both depend on:

1. The helper resolves the cached `InteractiveHost` and calls
   `stop(sid, Graceful)` for each live `interactive_sessions` row,
   skipping rows from sibling workspaces.
2. The sid -> workspace_id mappings in
   `AppState::interactive_sessions` are dropped after teardown so a
   stale lookup can't route a later command to a now-dead host.
3. `delete_workspace_with_summary` (the SQL that backs the delete
   path) actually cascades through `interactive_sessions` — pins the
   "DB row is gone" half of the delete contract.
4. The archive path leaves DB rows in place (no row delete -> no
   cascade), so the helper is the sole owner of host-side teardown
   there.
5. Empty-rows fast path: the helper skips host resolution entirely
   when no interactive sessions exist for the workspace, avoiding
   needless sidecar spawn / tmux probe on the overwhelmingly common
   delete/archive path.

Coverage for the helper region in workspace.rs: 0/28 -> 19/28
lines hit. The 9 uncovered lines are all defensive error branches
(Database::open, list_interactive_sessions_for_workspace,
interactive_host_for failures) the helper logs-and-swallows, which
mirrors the production "best effort" contract.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add focused tests for InteractiveTerminalMode that the existing happy-dom
smoke test can't reach without mocking xterm:

- Captures `term.onData` callback and asserts a single keystroke routes
  through the G3 `sendInput` service.
- Stubs ResizeObserver to verify container reshapes re-run `fit.fit()`.
- Pins the disposal order on unmount: ResizeObserver disconnect → onData
  disposable → terminal dispose. The audit calls out data-before-terminal
  as a correctness requirement.

Coverage on InteractiveTerminalMode.tsx rises from 75.67% to 83.78% on
statements and from 44.44% to 66.66% on functions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Extracts the orphan-detected listener from App.tsx's god-effect into a
sibling `<OrphanListener />` component so the lifecycle (subscribe →
toast → cleanupOrphans → unlisten) can be tested in isolation without
stubbing App's whole provider graph. Adds App.orphans.test.tsx covering
toast emission, auto-cleanup invocation, unlisten-on-unmount, and the
empty-sids no-op branch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add four targeted tests for the failure branches of the full-terminal
interactive view: `attach` rejection on mount, `subscribeOutput`
rejection, an unlisten function that throws on unmount, and the
post-unmount race where `subscribeOutput` resolves after the effect
has been torn down (verifies the resolved unlisten is invoked
immediately so the listener never leaks).

While wiring the tests in, fix two thin gaps in the production effect
that the failure paths exposed:

- `subscribeOutput` now has a `.catch` symmetric with the existing
  `attach` catch — previously a rejected subscribe would surface as
  an unhandled promise rejection instead of a logged warning.
- Cleanup wraps `unlistenOutput()` in try/catch so a throwing
  listener teardown can't skip `term.dispose()` and leak the xterm
  instance + its DOM nodes.

Coverage on `InteractiveTerminalMode.tsx` moves from
83.78/62.5/66.66/85.71 to 92.5/87.5/80/94.87 (stmts/branch/funcs/lines).
Flip both the Rust and the frontend coverage gates from informational
to blocking now that the interactive-claude patch surface measures
above the 85% threshold.

Rust producer now also runs against `claudette-session-host` so its
testable files (idle.rs, session.rs) contribute to the gate. Six files
are explicitly excluded from the gate with inline rationale in
`scripts/check-coverage-interactive.sh`:

  - `interactive_host/{tmux,sidecar,conformance,mod}.rs` — host impls
    and the shared conformance harness require external binaries
    (tmux >= 3.0 or the spawned `claudette-session-host`) that CI
    doesn't have. All real conformance tests are `#[ignore]`.
  - `src-session-host/src/main.rs` — binary entry point covered by
    integration tests.
  - `src-session-host/src/server.rs` — request handlers require a
    live `claude` PTY which CI doesn't have.

To pull `InteractiveSession::start` out of the uncovered set, add a
small RecordingHost mock and three focused tests (happy path, host
error propagation, overlay-materialize failure). claude_interactive.rs
goes from 82.39% -> 93.48%.

Frontend coverage already cleared the global thresholds but two files
were dragging individual numbers down; add focused tests:

  - `InteractiveTerminalModeToggle.tsx` — six tests pinning the
    interactive/no-session/no-workspace gates and the click toggle.
  - `interactiveSessionsSlice.ts` — four tests pinning the setter,
    clear, and reference-stable no-op behaviors.

Final numbers:
  - Rust: 86.47% (1553/1796 lines across 8 included files)
  - Frontend: 93.99% lines / 93.05% branches across 9 files

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…htened test, FIXME on SidecarHost reconnect

- Drop vacuous expect(true).toBe(true) in InteractiveTerminalMode test;
  vitest treats no-throw as pass.
- Tighten interactive_start_returns_error_when_cli_binary_missing by
  pointing CLAUDETTE_CLI at a guaranteed-nonexistent path and asserting
  only the Err branch — exercises the missing-CLI path deterministically
  on every runner (including CI machines with a staged sidecar). Also
  align bundled_cli_binary_path with its documented contract: return
  None when CLAUDETTE_CLI points at a path that does not exist.
- Apply F5's .catch pattern to OrphanListener.tsx so a rejected
  subscribeOrphansDetected promise logs via console.warn instead of
  bubbling an unhandled rejection. Add a regression test that mocks the
  subscribe call to reject and asserts the warn is emitted.
- Document the OnceCell<ConnHandle> reconnect limitation as a FIXME on
  SidecarHost.conn so a future reader hits the contract before debugging
  "conn closed" symptoms.
- Resolve clippy disallowed-methods by routing interactive host
  command spawns through `claudette::process::{command, std_command}`.
- Add missing `ClaudeInteractive` arm to `AgentSession::start_compact`.
- Bump `claudette-session-host` to 0.25.0 to match the workspace.
- Adjust an Experimental settings test that hard-coded a single
  switch — the rebased branch now adds the Claude (Interactive) row.
- Regenerate `Cargo.lock` for the new workspace member.
@jamesbrink jamesbrink force-pushed the codefriar/emerald-cypress branch from 74ae4a5 to 3371659 Compare May 22, 2026 19:34
@jamesbrink jamesbrink force-pushed the codefriar/emerald-cypress branch from db261e0 to 8d9720b Compare May 22, 2026 19:37
…tures

Post-rebase: main added Workspace.input_values + Repository.required_inputs.
Update the interactive-session and workspace test fixtures so the
claudette-tauri test targets compile under CI's --no-run check.
@jamesbrink
Member

Hey @codefriar — finished the rebase and got most of CI green. Posting a recap so you know what to expect when you pull.

What I did

Rebased onto origin/main (87 PR commits replayed, all your authorship preserved, the prior main-merge commit dropped). Conflicts were almost all churn from work that landed on main while this PR was open:

  • settingsSlice.ts / useAppStore.test.ts: pluginManagementEnabled / claudeRemoteControlEnabled / communityRegistryEnabled were graduated out of experimental on main (feat(settings): graduate experimental features #877), so the new claudeInteractiveEnabled is now the only flag in that block.
  • ExperimentalSettings.tsx + 5 locale JSONs: same story, only the Claude (Interactive) row + strings survive.
  • ChatPanel.tsx: main split it into ChatPanel + ChatPanelSessionView + lifecycle/store/attachment hooks. The interactive routing (InteractiveTurns / InteractiveTerminalMode swap inside the messages.length branch) now lives in ChatPanelSessionView and interactiveMode is passed through as a prop from ChatPanel. Behavior is identical to your original MessagesWithTurns ternary; it is just relocated to the file that now owns that render.
  • Sidebar.tsx: clean three-way merge around the WorkspaceScmLinkIcon + InteractiveBadge imports and the new buildTmuxAttachCommand import.
  • migrations/mod.rs: your 20260517020158_interactive_sessions slots between two new main migrations.
  • state.rs: took try_begin_workspace_create / end_workspace_create from main, kept your register/unregister/dispatch_interactive_hook methods unchanged.
  • CLAUDE.md / .github/copilot-instructions.md: merged the new domain list / coverage script mentions with your interactive-claude docs.
  • Cargo.lock: regenerated.

Then four small fixes on top of the rebase:

| # | Commit | Why |
|---|--------|-----|
| 1 | fix(rebase): align interactive backend with post-rebase main | Added missing ClaudeInteractive arm in AgentSession::start_compact; routed every tokio::process::Command::new / std::process::Command::new in interactive_host/{availability,sidecar,tmux}.rs through claudette::process::{command,std_command} (main tightened clippy.toml's disallowed-methods); bumped claudette-session-host to 0.25.0; updated a stale "expect exactly one switch" assertion in ExperimentalSettings.test.tsx. |
| 2 | fix(stub-tui): bump fixture crate to 0.25.0 for workspace version sync | New scripts/check-cargo-version-sync.sh job on main rejects the 0.0.0 placeholder. Also regenerated Cargo.lock so the Mobile-(macOS) --locked clippy check passes. |
| 3 | test(session-host): widen attach READY timeouts for slow CI/llvm-cov | The 2s drain_until_contains("READY", …) budgets in attach_stream.rs failed once under cargo llvm-cov on Linux; raised to 10s for the three sites. |
| 4 | fix(tauri): supply missing input_values / required_inputs in test fixtures | Repository's required_inputs and Workspace's input_values (added by #807) needed None initializers in the test fixtures under commands/agent_backends/mod.rs, commands/interactive.rs, commands/workspace.rs, and interactive_lifecycle.rs. |

Force-pushed with --force-with-lease.

CI

Green after the four follow-ups: Lint, Format, Cargo Version Sync, Migration guard, Commit messages, PR title, Updater Manifest, Mobile (macOS), Desktop Tauri Check, Frontend, Frontend Bundle Smoke, build, codecov/patch + project.

One flake: claudette-session-host::attach_lagged_subscriber_stream_ends failed once after the 10s timeout bump (10.58s wallclock = full budget). My suspicion is a tokio::sync::broadcast join race: stub-tui prints READY immediately on startup, and under llvm-cov the broadcast send can happen before the attach handshake has subscribed, so the subscriber never sees that message. Bumping the timeout further likely won't help. Two paths I'd consider:

  1. Add a small startup gate in stub-tui (sleep_until(env_var("STUB_TUI_DELAY_MS")) before the first println!("READY")) and set it from the lag test only.
  2. Call broadcast::Sender::subscribe() before EnsureSession returns by reordering the test, or have Session::spawn hold the first frame until the first subscriber attaches.
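
To illustrate why ordering matters here, below is a toy fan-out channel built on std only. It is not tokio and not the session-host code; the `Broadcast` type and `subscriber_sees_ready` helper are hypothetical. It only mimics the one property that matters for this flake: a message sent while nobody is subscribed is dropped, and a later subscriber never receives it.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Toy fan-out channel (hypothetical). Like tokio's broadcast channel,
/// it does not replay earlier messages to subscribers that join later.
pub struct Broadcast {
    subscribers: Vec<Sender<String>>,
}

impl Broadcast {
    pub fn new() -> Self {
        Self { subscribers: Vec::new() }
    }

    /// Register a new subscriber; it only sees messages sent from now on.
    pub fn subscribe(&mut self) -> Receiver<String> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Deliver `msg` to every live subscriber and return how many got it.
    /// With zero subscribers the message is simply lost.
    pub fn send(&mut self, msg: &str) -> usize {
        self.subscribers
            .retain(|tx| tx.send(msg.to_string()).is_ok());
        self.subscribers.len()
    }
}

/// Returns true if a subscriber observes READY for the given ordering.
pub fn subscriber_sees_ready(subscribe_before_send: bool) -> bool {
    let mut bus = Broadcast::new();
    if subscribe_before_send {
        let rx = bus.subscribe();
        bus.send("READY");
        rx.try_recv().is_ok()
    } else {
        // The flaky ordering: READY goes out to zero subscribers first.
        bus.send("READY");
        let rx = bus.subscribe();
        rx.try_recv().is_ok()
    }
}
```

Both paths above amount to forcing the `subscribe_before_send = true` ordering, either by delaying the producer or by subscribing earlier.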

Not blocking — happy to leave it as-is until you have a moment.

Remaining gaps

The big one: after enabling the experimental flag, there's no UI path to actually select the Claude (Interactive) runtime. I traced it to the TODO(G2 follow-up) comment in ModelSettings.tsx:642:

  • availableHarnessesForKind(AgentBackendKind) in src/agent_backend.rs:102 / src/ui/src/services/tauri/agentBackends.ts:173 never lists ClaudeInteractive for any kind, so the RuntimeSelector dropdown silently no-ops for Anthropic (["claude_code"]) and the others.
  • effective_harness / effectiveHarness will honor runtime_harness = "claude_interactive" when the flag is on, but nothing in the UI persists that override.
  • The Models card with data-testid="runtime-card-claude-interactive" is display-only.
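
For reference, the resolver behavior described in these bullets can be sketched roughly as follows. This is a simplified model under stated assumptions, not the real `effective_harness`: the `Harness` enum, `parse_harness`, and the three-argument signature are hypothetical, and the real function reportedly also filters against the per-kind matrix.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Harness {
    ClaudeCode,
    ClaudeInteractive,
}

/// Parse the persisted `runtime_harness` string (hypothetical helper).
fn parse_harness(raw: &str) -> Option<Harness> {
    match raw {
        "claude_code" => Some(Harness::ClaudeCode),
        "claude_interactive" => Some(Harness::ClaudeInteractive),
        _ => None,
    }
}

/// Resolve the harness to use for a backend.
/// A persisted "claude_interactive" override is honored only while the
/// experimental flag is on; otherwise the kind's default is used.
pub fn effective_harness(
    kind_default: Harness,
    persisted_override: Option<&str>,
    claude_interactive_enabled: bool,
) -> Harness {
    match persisted_override.and_then(parse_harness) {
        // Flag off: fall through to the default even if an override exists.
        Some(Harness::ClaudeInteractive) if !claude_interactive_enabled => kind_default,
        Some(harness) => harness,
        // No override, or an unrecognized value: use the default.
        None => kind_default,
    }
}
```

The resolver itself is fine; the gap is that nothing in the UI ever writes the `"claude_interactive"` override it would honor.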

Workarounds to try the feature today are both ugly: set_agent_backend_runtime_harness("anthropic", "claude_interactive") from the dev debug-eval, or hand-edit agent_backends.runtime_harness in ~/.claudette/claudette.db. So functionally the PR ships all the plumbing but no end-user entry point. I started sketching the wire-up (expose ClaudeInteractive in available_harnesses() behind the flag for Claude-flavored kinds, surface it in RuntimeSelector, drop the TODO on the Models card) but stopped at your request so you can land it your way.

Smaller observations from sweeping the diff during conflict resolution:

  • The "claude_interactive falls through to default if flag is off" path in effective_harness_kind is well-tested, but only in one direction: I saw no test covering "flag on + persisted override + selector renders the option". That'll matter once the runtime selector UI lands.
  • tests/fixtures/stub-tui is reachable from claudette lib tests via find_workspace_binary, but it's also implicitly required by src-session-host/tests/attach_stream.rs. Worth a one-liner in CLAUDE.md's project structure list noting the fixture.
  • The coverage gate (scripts/check-coverage-interactive.sh at >=85%) is in CLAUDE.md's build/test list, which is good — kept that wording in the rebase.
  • src-session-host/Cargo.toml was the only workspace crate missing from your prior version-sync sweeps; release-please will likely keep stepping on this. Worth adding to whatever script bumps versions.

Overall the architecture is clean — InteractiveHost trait with tmux / sidecar implementations plus the conformance harness is exactly the right shape, and the patch-coverage gate is a nice forcing function. Mostly green now and ready for your follow-up. Let me know once the runtime selector lands and I'll smoke-test it end-to-end.

codefriar and others added 4 commits May 22, 2026 14:29
…mental flag

Wire the last unconnected piece of the Claude (Interactive) PR (utensils#855) per
@jamesbrink's review: enabling `claudeInteractiveEnabled` now actually
surfaces a "Claude (Interactive)" option in the per-backend Runtime
picker for the Claude-flavored kinds (Anthropic, Custom Anthropic,
Codex Subscription), and selecting it persists via the existing
`set_agent_backend_runtime_harness` command.

Before this commit, the `effective_harness_kind` resolver already
honored a persisted `runtime_harness = "claude_interactive"` value when
the flag was on, but the Settings UI never offered the option because
`availableHarnessesForKind` / `available_harnesses` never listed
`ClaudeInteractive` — and the persistence validator rejected it.

Approach
- `src/agent_backend.rs`: add a sibling `available_harnesses_with_interactive(claude_interactive_enabled)`
  that wraps the static matrix and appends `ClaudeInteractive` for the
  three Claude-CLI-locked kinds when the flag is on. The static
  `available_harnesses` slice is unchanged so Pi-disabled downgrade
  fallback, gateway-hash key, and `effective_harness()` defense-in-depth
  filtering still see the matrix shape. `ClaudeInteractive` is
  intentionally not in the canonical matrix fixture — the gate is the
  experimental flag, not the per-kind allow-list.
- `set_agent_backend_runtime_harness` reads the flag from `app_settings`
  via `state.claude_interactive_enabled()` and validates against the
  new sibling. Reading the flag server-side (not from the caller)
  prevents a stale frontend from tricking the validator while the gate
  is off.
- TypeScript mirror: `availableHarnessesForKind(kind, options?)` gains
  the optional `claudeInteractiveEnabled` flag. Callers in
  `RuntimeSelector` thread it from the Zustand store; the
  `resolveSessionHarness` Pi-downgrade path keeps the matrix shape.
- Drop the `TODO(G2 follow-up)` comment on the Models card in
  `ModelSettings.tsx` — selection now lives in `RuntimeSelector` and
  the comment was the last open question against this surface.
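
A rough sketch of the approach, for orientation only. The enum names `BackendKind` and `Harness`, the `Placeholder` variant, and `validate_runtime_harness` are hypothetical simplifications; in particular the static matrix entries here are assumed, not copied from `src/agent_backend.rs`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendKind {
    Anthropic,
    CustomAnthropic,
    CodexSubscription,
    Pi,
    Ollama,
    LmStudio,
    OpenAiApi,
    CodexNative,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Harness {
    ClaudeCode,
    ClaudeInteractive,
    /// Stand-in for whatever the non-Claude kinds really list (assumed).
    Placeholder,
}

impl BackendKind {
    fn is_claude_cli_locked(self) -> bool {
        matches!(
            self,
            BackendKind::Anthropic | BackendKind::CustomAnthropic | BackendKind::CodexSubscription
        )
    }

    /// Static matrix: never contains ClaudeInteractive.
    pub fn available_harnesses(self) -> &'static [Harness] {
        if self.is_claude_cli_locked() {
            &[Harness::ClaudeCode]
        } else {
            &[Harness::Placeholder]
        }
    }

    /// Sibling that appends ClaudeInteractive only when the flag is on
    /// and the kind is one of the three Claude-CLI-locked kinds.
    pub fn available_harnesses_with_interactive(self, claude_interactive_enabled: bool) -> Vec<Harness> {
        let mut harnesses = self.available_harnesses().to_vec();
        if claude_interactive_enabled && self.is_claude_cli_locked() {
            harnesses.push(Harness::ClaudeInteractive);
        }
        harnesses
    }
}

/// Validation as the persistence command might do it: the flag is read
/// from server-side settings, never taken from the caller's request.
pub fn validate_runtime_harness(
    kind: BackendKind,
    requested: Harness,
    flag_from_settings: bool,
) -> Result<Harness, String> {
    if kind
        .available_harnesses_with_interactive(flag_from_settings)
        .contains(&requested)
    {
        Ok(requested)
    } else {
        Err(format!("{requested:?} is not available for {kind:?}"))
    }
}
```

Keeping the static slice untouched and layering the flag in a separate function means existing callers cannot observe `ClaudeInteractive` by accident.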

Tests
- Rust: 4 new unit tests in `agent_backend::tests` — flag-off
  back-compat (every kind), flag-on append for the three Claude
  kinds, flag-on no-op for every other kind, full round-trip through
  `effective_harness_kind` (override + flag on → ClaudeInteractive;
  override + flag off → kind default).
- TypeScript: 4 new tests in `RuntimeSelector.test.tsx` (hide when off
  for Anthropic, render when on, persistence call on selection, never
  exposed for non-Claude kinds) plus 3 in `agentBackends.test.ts`
  pinning `availableHarnessesForKind`'s flag behavior.

Refs: PR utensils#855, jamesbrink review feedback FB-1.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…c gate

Clarify in CLAUDE.md's project-structure list that tests/fixtures/stub-tui is
relied on by BOTH the claudette lib's interactive tests AND the
claudette-session-host integration tests (attach_stream / ensure_session /
handshake).

Note on the version-sync gate: scripts/check-cargo-version-sync.sh already
discovers workspace members dynamically via `cargo metadata`, so no change is
needed there — src-session-host is included automatically and the check passes
at 0.25.0 across all 7 packages.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…eflake lag test

The `attach_lagged_subscriber_stream_ends` test flaked once on Linux CI
under `cargo llvm-cov`, running its full 10s `drain_until_contains` budget
without ever observing `READY`. Root cause: a tokio broadcast join race.
The stub-tui prints `READY` on startup before any attach client has
subscribed to the per-session `broadcast::Sender<SessionEvent>`. Under
llvm-cov instrumentation the PTY reader can broadcast `READY` to zero
subscribers, the message is dropped, and the slow drainer that subscribes
moments later never sees it.

Fix: add a `STUB_TUI_DELAY_MS` env var to the stub-tui that sleeps before
the initial `println!("READY")`. Only the lag test sets it (to 200 ms),
so `attach_streams_echoed_output` and other tests keep their fast-start
path. The 10s timeout widening from 0bffca1 stays as belt-and-suspenders.
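
The startup gate might look something like the sketch below. Only the `STUB_TUI_DELAY_MS` name and the `READY` line come from this commit; the function names, the writer parameter, and the choice to treat a missing or unparsable value as "no delay" are assumptions for illustration.

```rust
use std::io::Write;
use std::time::Duration;

/// Turn the raw value of STUB_TUI_DELAY_MS into a delay.
/// Unset or unparsable values mean no delay (assumed behavior).
pub fn startup_delay(raw: Option<&str>) -> Duration {
    raw.and_then(|value| value.trim().parse::<u64>().ok())
        .map(Duration::from_millis)
        .unwrap_or(Duration::ZERO)
}

/// Sleep for the configured delay, then announce readiness on `out`.
pub fn announce_ready(raw_delay: Option<&str>, out: &mut impl Write) -> std::io::Result<()> {
    let delay = startup_delay(raw_delay);
    if !delay.is_zero() {
        // Give attach clients time to subscribe before the first frame.
        std::thread::sleep(delay);
    }
    writeln!(out, "READY")
}
```

In the fixture's `main`, the raw value would presumably come from `std::env::var("STUB_TUI_DELAY_MS").ok()` and `out` would be stdout.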

Verified 10/10 sequential runs of the lag test pass; full session-host
suite green; clippy + fmt clean; interactive coverage gate still passes
at 85.75% / 85%.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@codefriar
Contributor Author

Thanks for the rebase + the detailed write-up @jamesbrink — that was a big help. Four follow-up commits on top of your rebase address each item:

The blocker — runtime selection now works

7eace38 + 7203c0d wire the experimental flag through to the dropdown:

  • New Rust sibling AgentBackendKind::available_harnesses_with_interactive(flag) (kept available_harnesses() static so gateway-hash / Pi-downgrade / effective_harness defense-filter callers continue to see the static matrix).
  • When flag = true, appends ClaudeInteractive for Anthropic / CustomAnthropic / CodexSubscription only — Pi / Ollama / LmStudio / OpenAiApi / CodexNative never include it regardless of the flag.
  • TS availableHarnessesForKind gains optional options?: { claudeInteractiveEnabled?: boolean } (backward-compat) and mirrors the same logic.
  • set_agent_backend_runtime_harness reads the flag server-side from app_settings and validates against available_harnesses_with_interactive(flag) — a stale or adversarial frontend can't persist claude_interactive when the flag is off.
  • RuntimeSelector subscribes to claudeInteractiveEnabled and threads it into both availableHarnessesForKind and effectiveHarness (flag is in the useMemo dep array, so toggling Experimental re-renders the option list at runtime).
  • Dropped the TODO(G2 follow-up) comment in ModelSettings.tsx; replacement comment explains the new selection path.
  • 4 new Rust tests (flag on/off matrix per kind + persistence round-trip) and 7 new TS tests (selector hide/render/persist/never-for-non-Claude + helper flag matrix). The effectiveHarness mock in RuntimeSelector.test.tsx was also updated to mirror the production flag-gated fallback so it can't silently mask regressions.

The Models card with data-testid="runtime-card-claude-interactive" stays display-only — selection happens per-backend in RuntimeSelector, matching how the other runtimes work.

Smaller items

  • ecc1baf: added a stub-tui fixture annotation to CLAUDE.md's project-structure list noting that both the claudette lib's interactive tests and the claudette-session-host integration tests rely on it. (The version-sync script already uses cargo metadata rather than a hardcoded list, so it picked up src-session-host automatically; your manual 0.0.0 → 0.25.0 bump was the actual fix, and nothing else was needed.)
  • e8efaa9: deflaked attach_lagged_subscriber_stream_ends via your first option: tests/fixtures/stub-tui gains a STUB_TUI_DELAY_MS env hook, and the lag test sets it to 200 ms so the slow subscriber attaches before the first broadcast. 10/10 stability check passed. Kept your 10s drain-timeout widening as belt-and-suspenders.

Coverage

The patch-coverage gate still passes after all changes:

  • Rust llvm-cov scoped surface: 85.75% lines (1552/1810, ≥85% threshold)
  • Frontend vitest scoped surface: 93.99% lines / 93.05% branches (all four thresholds ≥85%)

Ready for your end-to-end smoke whenever you have a moment.

Development

Successfully merging this pull request may close these issues.

Proposal: experimental ClaudeInteractive agent backend (interactive claude in a detachable host)