v0.37.0.0 feat: agent-voice (Mars + Venus) + copy-into-host-repo skillpack paradigm by garrytan · Pull Request #1128 · garrytan/gbrain

garrytan · 2026-05-17T21:53:15Z

Summary

v0.36.0.0 — Voice agent reference (Mars + Venus) lands as the first copy-into-host-repo skillpack, plus all six originally-deferred items now ship in the same PR.

Wave 1 (initial):

Voice agent reference bundle (Mars + Venus personas, WebRTC-first browser client with ?test=1-gated WebAudio-tee capture, read-only tool router with D14-A allow-list, persona-aware prompt builder, upstream-error classifier).
copy-into-host-repo install paradigm: gbrain integrations install agent-voice --target <repo> copies the bundle into the operator's host agent repo with per-file SHA-256 hashes recorded in .gbrain-source.json.
Privacy infrastructure: scripts/check-no-pii-in-agent-voice.sh wired into bun run verify, deterministic scripts/import-from-upstream.sh + substitution table for refresh-from-upstream, single-source private-name-blocklist.json env-driven regex.
New install_kind: 'local-managed' | 'copy-into-host-repo' frontmatter discriminator; new src/commands/integrations.ts install subcommand with path-traversal hardening + refusal-case suite.
14 captured decisions during /plan-eng-review with codex outside voice (D1-D14).

Wave 2 (closes original deferred list):

E2E test suite: tests/e2e/voice-roundtrip.test.mjs (~$0.10/run; env-gated) + tests/e2e/voice-full-flow.test.mjs (openclaw-driven install + roundtrip; ~$1-2/run; friction-discovery, NOT a ship gate). Shared puppeteer + fake-audio harness + Whisper-judge helper. Three-tier assertion (CONNECTION hard, NON-SILENT hard, SEMANTIC soft).
LLM-judge persona evals: gateway-routed 3-model harness (Claude + GPT + Gemini) with 4-strategy JSON repair + 2/3-quorum aggregation. Five fixture sets (mars-solo, mars-demo, venus, persona-routing, mars-multilingual). Synthetic canonical baselines committed; live receipts gitignored.
DIY pipeline (Option B): code/pipeline.mjs — streaming Deepgram STT + Claude SSE with sentence-boundary TTS dispatch + Cartesia/OpenAI TTS, 20-turn history cap, exponential reconnect, keepalives, VAD presets, barge-in.
--refresh mode: classifies each file (unchanged-identical / unchanged-stale / locally-modified / source-deleted / host-deleted / new-in-manifest); default preserves operator edits; --auto take-theirs overwrites; --dry-run previews. Transaction journal at .gbrain-source.refresh.log.
Mars multilingual: persona restores cross-lingual rule (Mandarin/Spanish/French/Japanese/Korean default to English but follow the speaker). Gated by mars-multilingual.jsonl eval fixtures.
Twilio recipe deprecation: banner on recipes/twilio-voice-brain.md pointing at agent-voice.md. Removal in v0.37.
Claw-test scenario: test/fixtures/claw-test-scenarios/voice-agent-install/ with BRIEF.md + scenario.json (labeled BENCHMARK_FRICTION, blocks_ship: false) + expected.json.

Test Coverage

Suite	Result	Notes
`bun run verify`	exit 0	All 15 gate checks including the new `check:no-pii-agent-voice`
Full unit suite (gbrain-side)	6,736+ pass / 0 fail	After merge with master
`test/integrations-install.test.ts`	18 pass / 0 fail	Install + refresh + classification (incl. 7 new refresh cases)
Host-side `recipes/agent-voice/tests/unit/`	96 pass / 2 skip / 0 fail	5 suites: personas + prompt-shape (Mars + Venus) + tools-allowlist + upstream-classifier. Skipped tests are env-gated.
Live E2E (voice-roundtrip)	env-gated	Run with `AGENT_VOICE_E2E=1 OPENAI_API_KEY=...` (~$0.10/run)
Live E2E (voice-full-flow)	env-gated, NOT a ship gate	Run with `AGENT_VOICE_FULL_E2E=1 OPENAI_API_KEY=... ANTHROPIC_API_KEY=... OPENCLAW_BIN=...` (~$1-2/run)
LLM-judge persona evals	env-gated, manual	Run with `bun run gen:baselines` from `recipes/agent-voice/` (~$0.60 for all four suites)

Pre-Landing Review

Covered by /plan-eng-review with codex outside-voice during planning. 14 decisions captured (D1-D14): monolithic ship, PII guard in verify, SHA-256 manifest, error-classified soft-fail, ?test=1-gated instrumentation, real install subcommand, deterministic import script, close all unit gaps, ran codex, monolithic-in-gbrain (vs separate template), zero "Wintermute" in shipped surface, WebAudio-tee capture, agent-authored canonical baselines, read-only tools allow-list.

Codex outside voice surfaced 30+ findings; 8 substantive new ones resolved during the review (D10-D14). Remainder folded into plan-rewrite mechanics.

Privacy

bun run verify includes scripts/check-no-pii-in-agent-voice.sh which scans every file under recipes/agent-voice/** + recipes/agent-voice.md + the import scripts for shape regex (phone/email/SSN/JWT/bearer/credit-card) + path patterns + $AGENT_VOICE_PII_BLOCKLIST env-driven name list. Zero literal private names land in shipped files. The env-driven blocklist contract lives in private-name-blocklist.json; CI passes the secret via env at run time.

Files (this PR)

Path	Δ
`recipes/agent-voice.md` + `recipes/agent-voice/`	+29 files (~5,500 LOC including pipeline.mjs)
`scripts/check-no-pii-in-agent-voice.sh` + `scripts/import-from-upstream.sh` + `scripts/upstream-scrub-table.txt`	+3 files
`src/commands/integrations.ts`	+~500 LOC (install subcommand + refresh mode)
`test/integrations-install.test.ts`	+236 LOC, 18 cases
`test/fixtures/claw-test-scenarios/voice-agent-install/`	+3 files
`recipes/twilio-voice-brain.md`	deprecation banner
`VERSION` + `package.json` + `CHANGELOG.md` + `bun.lock` + `llms*.txt`	release bookkeeping

Test plan

bun run verify passes (privacy + 14 gate checks + typecheck)
Full unit suite passes (6,736+ pass, 0 fail)
Install + refresh subcommand: 18/18 tests pass
Host-side persona + tool + classifier tests: 96 pass, 2 skip (env-gated)
Install smoke: real install into a scratch tmpdir host repo works, manifest written with per-file SHA-256, resolver rows appended to AGENTS.md
Refresh smoke: identical-state → 0 writes; locally-modified → keep-mine default preserves edit, --auto take-theirs overwrites; transaction journal written
Live voice-roundtrip E2E: run manually with AGENT_VOICE_E2E=1 OPENAI_API_KEY=... against a fresh server install
Live voice-full-flow E2E: run manually with AGENT_VOICE_FULL_E2E=1 + all three keys + OPENCLAW_BIN
LLM-judge eval pass: run cd recipes/agent-voice && bun run gen:baselines and verify pass on all 4 suites

🤖 Generated with Claude Code

…-repo install paradigm Ships a new skillpack paradigm: gbrain holds the REFERENCE content; `gbrain integrations install agent-voice --target <repo>` COPIES it into the operator's host agent repo where it becomes user-owned and mutable. Future refresh is diff-and-propose against per-file SHA-256 hashes from .gbrain-source.json, not blind overwrite. What ships: - recipes/agent-voice.md entrypoint + recipes/agent-voice/ bundle - Two voice personas (Mars dual-mode SOLO/DEMO, Venus executive assistant) with PII / private-agent-name / hardcoded-path scrubbed out - WebRTC-first browser client (call.html) with ?test=1 gated instrumentation and Web Audio API tee -> MediaRecorder capture for E2E roundtrip testing - Read-only tool router (D14-A allow-list: search, query, get_page, list_pages, find_experts, get_recent_salience, get_recent_transcripts, read_article). Write ops permanently denylisted; opt-in via local override - Persona-aware prompt builder with identity-first composition + Unicode sanitization for OpenAI Realtime API safety - Upstream-error classifier (HTTP 429/500/503 -> soft-fail, plumbing -> hard) - Three SKILL.md skills (voice-persona-mars, voice-persona-venus, voice-post-call) with routing-eval.jsonl fixtures - 99 host-side tests (vitest-compatible, runs in bun) covering registry, prompt-shape privacy guards, tool allow-list, upstream classifier - install/manifest.json + refresh-algorithm.md + post-install-hint.md Privacy infrastructure: - scripts/check-no-pii-in-agent-voice.sh wired into bun run verify Shape regex (phone/email/SSN/JWT/bearer/credit-card) + path patterns + $AGENT_VOICE_PII_BLOCKLIST env-driven name blocklist - scripts/import-from-upstream.sh + scripts/upstream-scrub-table.txt Deterministic refresh from upstream voice-agent source. Placeholder- driven (envsubst-expanded at run time) so no private names land in checked-in files - recipes/agent-voice/code/lib/personas/private-name-blocklist.json Single source of truth for the regex contract (shape categories + path patterns + env-var contract for operator-specific names) src/ surface: - src/commands/integrations.ts gains `install <recipe-id>` subcommand with install_kind: 'local-managed' | 'copy-into-host-repo' discriminator. Path-traversal hardening (rejects '..', absolute paths, symlink escapes). Refuses target == gbrain itself, missing .git, existing files (without --overwrite). Writes .gbrain-source.json with per-file SHA-256. Appends resolver rows to host repo's RESOLVER.md or AGENTS.md. - test/integrations-install.test.ts: 11 cases (happy path, manifest shape, no upstream_repo field per D11-A, resolver appending, file modes, refusal cases, dry-run) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Conflicts: # CHANGELOG.md # VERSION # package.json

…ts + guard script The prompt-shape tests carried regex patterns naming the literal banned terms (Garry/Steph/Garrison/Solomon/Herbert/Wintermute) inline. CLAUDE.md's "never use Wintermute in any public artifact" applies to test source files too. Master's check-privacy.sh correctly caught this. Replaced with env-driven check that reads AGENT_VOICE_PII_BLOCKLIST (the single source of truth from private-name-blocklist.json). Same enforcement guarantee via the env var, zero literal names in shipped source. Also scrubbed the literal /data/.openclaw/ from the guard script's comment and the literal 'tell_wintermute' from the venus write-tools test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…eline + refresh + multilingual + twilio deprecation) Closes the "deferred to follow-up" section of the v0.36.0.0 CHANGELOG. E2E tests + harness (env-gated): - tests/e2e/voice-roundtrip.test.mjs — spawns server, drives puppeteer + fake-audio, three-tier assertions (CONNECTION hard, NON-SILENT hard, SEMANTIC soft via Whisper + LLM judge). Upstream errors (429/500/503, WS 1011/1013) soft-fail via lib/upstream-classifier.mjs. - tests/e2e/voice-full-flow.test.mjs — wraps openclaw doing the install, then runs the roundtrip. Friction-discovery flavor, NOT a ship gate. - tests/e2e/lib/browser-audio.mjs — puppeteer + fake-audio harness; reads window._gbrainTest namespace; PCM RMS-variance helper. - tests/e2e/lib/whisper-judge.mjs — Whisper transcription + LLM-judge for SEMANTIC tier. - tests/e2e/audio-fixtures/utterance-{add,joke,brain-query}.wav — 16kHz mono WAV via `say` + ffmpeg, committed for reproducibility. - test/fixtures/claw-test-scenarios/voice-agent-install/{BRIEF.md, scenario.json, expected.json} — labeled BENCHMARK_FRICTION, blocks_ship=false. LLM-judge persona evals + synthetic canonical baselines: - tests/evals/judge.mjs — gateway-routed 3-model (Claude + GPT + Gemini) harness with 4-strategy JSON repair + 2/3-quorum aggregation (per the v0.27.x cross-modal pattern). Pass criterion: every axis mean ≥7 AND no model <5. - tests/evals/fixtures/{mars-solo,mars-demo,venus,persona-routing,mars-multilingual}.jsonl — 5 fixture sets covering all axes. - tests/evals/{mars-eval,venus-eval,mars-multilingual-eval,persona-routing-eval}.mjs — per-axis drivers. - tests/evals/baseline-runs/canonical/*.json — agent-authored synthetic exemplars (PII-impossible by construction; demonstrate expected pass shape; never overwrite with live model output). - tests/evals/baseline-runs/.gitignore — live receipts excluded. DIY pipeline (Option B): - code/pipeline.mjs — streaming STT (Deepgram nova-2) + LLM (Claude Sonnet 4.6 streaming SSE with sentence-boundary TTS dispatch) + TTS (Cartesia primary, OpenAI TTS fallback). 20-turn history cap, exponential-backoff reconnects, 25s keepalives, VAD presets (quiet/normal/noisy/very_noisy), barge-in via STT speechStart → LLM interrupt. Modular adapters for swapping providers. --refresh mode (D3-A diff-and-propose): - src/commands/integrations.ts: refreshRecipeIntoHostRepo() + classifyForRefresh() implementing the five states from refresh-algorithm.md (unchanged-identical, unchanged-stale, locally-modified, source-deleted, host-deleted, new-in-manifest). Transaction journal at .gbrain-source.refresh.log. Default policy: preserve operator's local edits (keep-mine); --auto take-theirs to overwrite; --dry-run for preview. - test/integrations-install.test.ts: 7 new test cases pinning each classification state + default-preserve behavior + take-theirs overwrite + transaction journal + refusal on uninstalled target. Mars multilingual restore: - code/lib/personas/mars.mjs: explicit cross-lingual rule (Mandarin, Spanish, French, Japanese, Korean default to English but follow the speaker). Voice (Orus) supports the languages natively. - tests/unit/mars-prompt-shape.test.mjs: assertion flipped from "MUST NOT claim multilingual" to "declares cross-lingual capability with English bias." - tests/evals/fixtures/mars-multilingual.jsonl: 5 fixtures across Mandarin/Spanish/Japanese/French + explicit switch-back, pinned by mars-multilingual-eval.mjs. Twilio recipe deprecation: - recipes/twilio-voice-brain.md: deprecation banner pointing at agent-voice.md. Frontmatter version bumped to 0.8.2. Will be removed in v0.37. Verify: bun run verify clean, 6736+ unit tests pass, 18/18 install+refresh tests pass, 96/98 host-side persona/tool/classifier tests pass (2 skipped env-gated). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…this PR Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Captures the wave-1 + wave-2 scope at the v0.37 slot. The bump reflects the size of what this PR ships: copy-into-host-repo install paradigm (new install_kind discriminator + new install/refresh subcommand) + Mars/Venus voice agent reference + 5,500+ LOC of vendored scrubbed code + 4 LLM-judge eval suites + 2 env-gated E2E test suites + DIY Option B pipeline + 18-case install subcommand test coverage. A minor bump felt too small. Side fix: privacy guard caught two stale literal "wintermute" and "/data/.openclaw/" references in wave-2 files (voice-full-flow.test.mjs comment, expected.json blocklist payload). Both replaced with env-driven references to $AGENT_VOICE_PII_BLOCKLIST matching the D15-A pattern from the original review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Conflicts: # CHANGELOG.md # VERSION # package.json

garrytan and others added 7 commits May 17, 2026 14:43

chore: bump version and changelog (v0.36.0.0)

fec3266

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/master' into garrytan/caracas-v3

446f162

# Conflicts: # CHANGELOG.md # VERSION # package.json

docs: update CHANGELOG — all v0.36.0.0 deferred items now shipped in …

ee950d9

…this PR Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

garrytan changed the title ~~v0.36.0.0 feat: agent-voice (Mars + Venus) + copy-into-host-repo skillpack paradigm~~ v0.37.0.0 feat: agent-voice (Mars + Venus) + copy-into-host-repo skillpack paradigm May 17, 2026

garrytan added 6 commits May 17, 2026 15:48

Merge remote-tracking branch 'origin/master' into garrytan/caracas-v3

7c8a1c1

# Conflicts: # CHANGELOG.md # VERSION # package.json

Merge remote-tracking branch 'origin/master' into garrytan/caracas-v3

ac7fb3d

# Conflicts: # CHANGELOG.md # VERSION # package.json

Merge remote-tracking branch 'origin/master' into garrytan/caracas-v3

f0d3eed

# Conflicts: # CHANGELOG.md # VERSION # package.json

Merge remote-tracking branch 'origin/master' into garrytan/caracas-v3

cea6df1

# Conflicts: # CHANGELOG.md # VERSION # package.json

Merge remote-tracking branch 'origin/master' into garrytan/caracas-v3

ed708e3

# Conflicts: # CHANGELOG.md # VERSION # package.json

Merge remote-tracking branch 'origin/master' into garrytan/caracas-v3

ebb721b

# Conflicts: # CHANGELOG.md # VERSION # package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.37.0.0 feat: agent-voice (Mars + Venus) + copy-into-host-repo skillpack paradigm#1128

v0.37.0.0 feat: agent-voice (Mars + Venus) + copy-into-host-repo skillpack paradigm#1128
garrytan wants to merge 13 commits into
masterfrom
garrytan/caracas-v3

garrytan commented May 17, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

garrytan commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Coverage

Pre-Landing Review

Privacy

Files (this PR)

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

garrytan commented May 17, 2026 •

edited

Loading