v0.37.0.0 feat: agent-voice (Mars + Venus) + copy-into-host-repo skillpack paradigm#1128
Open
garrytan wants to merge 13 commits into
Open
v0.37.0.0 feat: agent-voice (Mars + Venus) + copy-into-host-repo skillpack paradigm#1128garrytan wants to merge 13 commits into
garrytan wants to merge 13 commits into
Conversation
…-repo install paradigm Ships a new skillpack paradigm: gbrain holds the REFERENCE content; `gbrain integrations install agent-voice --target <repo>` COPIES it into the operator's host agent repo where it becomes user-owned and mutable. Future refresh is diff-and-propose against per-file SHA-256 hashes from .gbrain-source.json, not blind overwrite. What ships: - recipes/agent-voice.md entrypoint + recipes/agent-voice/ bundle - Two voice personas (Mars dual-mode SOLO/DEMO, Venus executive assistant) with PII / private-agent-name / hardcoded-path scrubbed out - WebRTC-first browser client (call.html) with ?test=1 gated instrumentation and Web Audio API tee -> MediaRecorder capture for E2E roundtrip testing - Read-only tool router (D14-A allow-list: search, query, get_page, list_pages, find_experts, get_recent_salience, get_recent_transcripts, read_article). Write ops permanently denylisted; opt-in via local override - Persona-aware prompt builder with identity-first composition + Unicode sanitization for OpenAI Realtime API safety - Upstream-error classifier (HTTP 429/500/503 -> soft-fail, plumbing -> hard) - Three SKILL.md skills (voice-persona-mars, voice-persona-venus, voice-post-call) with routing-eval.jsonl fixtures - 99 host-side tests (vitest-compatible, runs in bun) covering registry, prompt-shape privacy guards, tool allow-list, upstream classifier - install/manifest.json + refresh-algorithm.md + post-install-hint.md Privacy infrastructure: - scripts/check-no-pii-in-agent-voice.sh wired into bun run verify Shape regex (phone/email/SSN/JWT/bearer/credit-card) + path patterns + $AGENT_VOICE_PII_BLOCKLIST env-driven name blocklist - scripts/import-from-upstream.sh + scripts/upstream-scrub-table.txt Deterministic refresh from upstream voice-agent source. Placeholder- driven (envsubst-expanded at run time) so no private names land in checked-in files - recipes/agent-voice/code/lib/personas/private-name-blocklist.json Single source of truth for the regex contract (shape categories + path patterns + env-var contract for operator-specific names) src/ surface: - src/commands/integrations.ts gains `install <recipe-id>` subcommand with install_kind: 'local-managed' | 'copy-into-host-repo' discriminator. Path-traversal hardening (rejects '..', absolute paths, symlink escapes). Refuses target == gbrain itself, missing .git, existing files (without --overwrite). Writes .gbrain-source.json with per-file SHA-256. Appends resolver rows to host repo's RESOLVER.md or AGENTS.md. - test/integrations-install.test.ts: 11 cases (happy path, manifest shape, no upstream_repo field per D11-A, resolver appending, file modes, refusal cases, dry-run) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts: # CHANGELOG.md # VERSION # package.json
…ts + guard script The prompt-shape tests carried regex patterns naming the literal banned terms (Garry/Steph/Garrison/Solomon/Herbert/Wintermute) inline. CLAUDE.md's "never use Wintermute in any public artifact" applies to test source files too. Master's check-privacy.sh correctly caught this. Replaced with env-driven check that reads AGENT_VOICE_PII_BLOCKLIST (the single source of truth from private-name-blocklist.json). Same enforcement guarantee via the env var, zero literal names in shipped source. Also scrubbed the literal /data/.openclaw/ from the guard script's comment and the literal 'tell_wintermute' from the venus write-tools test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eline + refresh + multilingual + twilio deprecation)
Closes the "deferred to follow-up" section of the v0.36.0.0 CHANGELOG.
E2E tests + harness (env-gated):
- tests/e2e/voice-roundtrip.test.mjs — spawns server, drives puppeteer + fake-audio, three-tier assertions (CONNECTION hard, NON-SILENT hard, SEMANTIC soft via Whisper + LLM judge). Upstream errors (429/500/503, WS 1011/1013) soft-fail via lib/upstream-classifier.mjs.
- tests/e2e/voice-full-flow.test.mjs — wraps openclaw doing the install, then runs the roundtrip. Friction-discovery flavor, NOT a ship gate.
- tests/e2e/lib/browser-audio.mjs — puppeteer + fake-audio harness; reads window._gbrainTest namespace; PCM RMS-variance helper.
- tests/e2e/lib/whisper-judge.mjs — Whisper transcription + LLM-judge for SEMANTIC tier.
- tests/e2e/audio-fixtures/utterance-{add,joke,brain-query}.wav — 16kHz mono WAV via `say` + ffmpeg, committed for reproducibility.
- test/fixtures/claw-test-scenarios/voice-agent-install/{BRIEF.md, scenario.json, expected.json} — labeled BENCHMARK_FRICTION, blocks_ship=false.
LLM-judge persona evals + synthetic canonical baselines:
- tests/evals/judge.mjs — gateway-routed 3-model (Claude + GPT + Gemini) harness with 4-strategy JSON repair + 2/3-quorum aggregation (per the v0.27.x cross-modal pattern). Pass criterion: every axis mean ≥7 AND no model <5.
- tests/evals/fixtures/{mars-solo,mars-demo,venus,persona-routing,mars-multilingual}.jsonl — 5 fixture sets covering all axes.
- tests/evals/{mars-eval,venus-eval,mars-multilingual-eval,persona-routing-eval}.mjs — per-axis drivers.
- tests/evals/baseline-runs/canonical/*.json — agent-authored synthetic exemplars (PII-impossible by construction; demonstrate expected pass shape; never overwrite with live model output).
- tests/evals/baseline-runs/.gitignore — live receipts excluded.
DIY pipeline (Option B):
- code/pipeline.mjs — streaming STT (Deepgram nova-2) + LLM (Claude Sonnet 4.6 streaming SSE with sentence-boundary TTS dispatch) + TTS (Cartesia primary, OpenAI TTS fallback). 20-turn history cap, exponential-backoff reconnects, 25s keepalives, VAD presets (quiet/normal/noisy/very_noisy), barge-in via STT speechStart → LLM interrupt. Modular adapters for swapping providers.
--refresh mode (D3-A diff-and-propose):
- src/commands/integrations.ts: refreshRecipeIntoHostRepo() + classifyForRefresh() implementing the five states from refresh-algorithm.md (unchanged-identical, unchanged-stale, locally-modified, source-deleted, host-deleted, new-in-manifest). Transaction journal at .gbrain-source.refresh.log. Default policy: preserve operator's local edits (keep-mine); --auto take-theirs to overwrite; --dry-run for preview.
- test/integrations-install.test.ts: 7 new test cases pinning each classification state + default-preserve behavior + take-theirs overwrite + transaction journal + refusal on uninstalled target.
Mars multilingual restore:
- code/lib/personas/mars.mjs: explicit cross-lingual rule (Mandarin, Spanish, French, Japanese, Korean default to English but follow the speaker). Voice (Orus) supports the languages natively.
- tests/unit/mars-prompt-shape.test.mjs: assertion flipped from "MUST NOT claim multilingual" to "declares cross-lingual capability with English bias."
- tests/evals/fixtures/mars-multilingual.jsonl: 5 fixtures across Mandarin/Spanish/Japanese/French + explicit switch-back, pinned by mars-multilingual-eval.mjs.
Twilio recipe deprecation:
- recipes/twilio-voice-brain.md: deprecation banner pointing at agent-voice.md. Frontmatter version bumped to 0.8.2. Will be removed in v0.37.
Verify: bun run verify clean, 6736+ unit tests pass, 18/18 install+refresh tests pass, 96/98 host-side persona/tool/classifier tests pass (2 skipped env-gated).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…this PR Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the wave-1 + wave-2 scope at the v0.37 slot. The bump reflects the size of what this PR ships: copy-into-host-repo install paradigm (new install_kind discriminator + new install/refresh subcommand) + Mars/Venus voice agent reference + 5,500+ LOC of vendored scrubbed code + 4 LLM-judge eval suites + 2 env-gated E2E test suites + DIY Option B pipeline + 18-case install subcommand test coverage. A minor bump felt too small. Side fix: privacy guard caught two stale literal "wintermute" and "/data/.openclaw/" references in wave-2 files (voice-full-flow.test.mjs comment, expected.json blocklist payload). Both replaced with env-driven references to $AGENT_VOICE_PII_BLOCKLIST matching the D15-A pattern from the original review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts: # CHANGELOG.md # VERSION # package.json
# Conflicts: # CHANGELOG.md # VERSION # package.json
# Conflicts: # CHANGELOG.md # VERSION # package.json
# Conflicts: # CHANGELOG.md # VERSION # package.json
# Conflicts: # CHANGELOG.md # VERSION # package.json
# Conflicts: # CHANGELOG.md # VERSION # package.json
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
v0.36.0.0 — Voice agent reference (Mars + Venus) lands as the first
copy-into-host-reposkillpack, plus all six originally-deferred items now ship in the same PR.Wave 1 (initial):
?test=1-gated WebAudio-tee capture, read-only tool router with D14-A allow-list, persona-aware prompt builder, upstream-error classifier).copy-into-host-repoinstall paradigm:gbrain integrations install agent-voice --target <repo>copies the bundle into the operator's host agent repo with per-file SHA-256 hashes recorded in.gbrain-source.json.scripts/check-no-pii-in-agent-voice.shwired intobun run verify, deterministicscripts/import-from-upstream.sh+ substitution table for refresh-from-upstream, single-sourceprivate-name-blocklist.jsonenv-driven regex.install_kind: 'local-managed' | 'copy-into-host-repo'frontmatter discriminator; newsrc/commands/integrations.tsinstallsubcommand with path-traversal hardening + refusal-case suite./plan-eng-reviewwith codex outside voice (D1-D14).Wave 2 (closes original deferred list):
tests/e2e/voice-roundtrip.test.mjs(~$0.10/run; env-gated) +tests/e2e/voice-full-flow.test.mjs(openclaw-driven install + roundtrip; ~$1-2/run; friction-discovery, NOT a ship gate). Shared puppeteer + fake-audio harness + Whisper-judge helper. Three-tier assertion (CONNECTION hard, NON-SILENT hard, SEMANTIC soft).mars-solo,mars-demo,venus,persona-routing,mars-multilingual). Synthetic canonical baselines committed; live receipts gitignored.code/pipeline.mjs— streaming Deepgram STT + Claude SSE with sentence-boundary TTS dispatch + Cartesia/OpenAI TTS, 20-turn history cap, exponential reconnect, keepalives, VAD presets, barge-in.--refreshmode: classifies each file (unchanged-identical / unchanged-stale / locally-modified / source-deleted / host-deleted / new-in-manifest); default preserves operator edits;--auto take-theirsoverwrites;--dry-runpreviews. Transaction journal at.gbrain-source.refresh.log.mars-multilingual.jsonleval fixtures.recipes/twilio-voice-brain.mdpointing atagent-voice.md. Removal in v0.37.test/fixtures/claw-test-scenarios/voice-agent-install/withBRIEF.md+scenario.json(labeled BENCHMARK_FRICTION,blocks_ship: false) +expected.json.Test Coverage
bun run verifycheck:no-pii-agent-voicetest/integrations-install.test.tsrecipes/agent-voice/tests/unit/AGENT_VOICE_E2E=1 OPENAI_API_KEY=...(~$0.10/run)AGENT_VOICE_FULL_E2E=1 OPENAI_API_KEY=... ANTHROPIC_API_KEY=... OPENCLAW_BIN=...(~$1-2/run)bun run gen:baselinesfromrecipes/agent-voice/(~$0.60 for all four suites)Pre-Landing Review
Covered by
/plan-eng-reviewwith codex outside-voice during planning. 14 decisions captured (D1-D14): monolithic ship, PII guard in verify, SHA-256 manifest, error-classified soft-fail,?test=1-gated instrumentation, real install subcommand, deterministic import script, close all unit gaps, ran codex, monolithic-in-gbrain (vs separate template), zero "Wintermute" in shipped surface, WebAudio-tee capture, agent-authored canonical baselines, read-only tools allow-list.Codex outside voice surfaced 30+ findings; 8 substantive new ones resolved during the review (D10-D14). Remainder folded into plan-rewrite mechanics.
Privacy
bun run verifyincludesscripts/check-no-pii-in-agent-voice.shwhich scans every file underrecipes/agent-voice/**+recipes/agent-voice.md+ the import scripts for shape regex (phone/email/SSN/JWT/bearer/credit-card) + path patterns +$AGENT_VOICE_PII_BLOCKLISTenv-driven name list. Zero literal private names land in shipped files. The env-driven blocklist contract lives inprivate-name-blocklist.json; CI passes the secret via env at run time.Files (this PR)
recipes/agent-voice.md+recipes/agent-voice/scripts/check-no-pii-in-agent-voice.sh+scripts/import-from-upstream.sh+scripts/upstream-scrub-table.txtsrc/commands/integrations.tstest/integrations-install.test.tstest/fixtures/claw-test-scenarios/voice-agent-install/recipes/twilio-voice-brain.mdVERSION+package.json+CHANGELOG.md+bun.lock+llms*.txtTest plan
bun run verifypasses (privacy + 14 gate checks + typecheck)keep-minedefault preserves edit,--auto take-theirsoverwrites; transaction journal writtenAGENT_VOICE_E2E=1 OPENAI_API_KEY=...against a fresh server installAGENT_VOICE_FULL_E2E=1+ all three keys +OPENCLAW_BINcd recipes/agent-voice && bun run gen:baselinesand verify pass on all 4 suites🤖 Generated with Claude Code