
feat(meet_agent): live note-taking agent for Google Meet (listen + speak) #1355

Merged
senamakel merged 11 commits into tinyhumansai:main from senamakel:feat/meet-agent-loop
May 8, 2026

Conversation


@senamakel senamakel commented May 8, 2026

Summary

  • Adds a live agent loop for the embedded Google Meet call window: listens via Meet's built-in captions, fires on the wake word "hey openhuman", captures the dictated note in the session transcript, and speaks a short verbal acknowledgement back through the call.
  • New core domain src/openhuman/meet_agent/ (5 RPC methods, schemas, session registry, wake-word state machine) wired into the controller registry.
  • New shell module app/src-tauri/src/meet_audio/ (CDP-injected Web Audio bridge for the speak path, captions DOM observer for the listen path, auto-CC, lifecycle wired into meet_call_open_window).
  • New audio module in vendored tauri-runtime-cef exposing per-browser CEF audio handlers via a URL-prefix registry (initially used; later superseded by the captions path but kept for a future opt-in).
  • Zero system permission prompts: capture and playback both stay inside the CEF process.

Problem

The existing feat(meet) PR (#1350) lets the agent join a Meet call as an anonymous guest with the mascot as a virtual webcam, but the agent has no way to listen to the call or speak back. The user wants the agent to capture action items / notes during a meeting ("hey openhuman, remember to email Bob about the launch") so they can be surfaced post-meeting, and to acknowledge briefly in-call so the user knows it caught the dictation.

Constraints from the design discussion:

  • macOS first; mascot Y4M webcam must keep working.
  • No system audio permission prompt, no installer, no admin auth, no kext / HAL plugin.
  • Speak path must inject audio into Meet's getUserMedia such that other participants hear it.

Solution

Listen — Meet's built-in captions (after a short detour through CEF audio capture, see below):

  • app/src-tauri/src/meet_audio/captions_bridge.js runs at document-start in the embedded Meet page (installed via CDP Page.addScriptToEvaluateOnNewDocument + Page.reload). It auto-clicks "Turn on captions", attaches a MutationObserver (with a 250 ms safety poll) over the aria-label="Captions" region, and queues new lines.
  • caption_listener.rs polls window.__openhumanDrainCaptions() every 500 ms and forwards lines to openhuman.meet_agent_push_caption.
  • The wake-word state machine in meet_agent::session::note_caption normalizes punctuation ("hey, openhuman" → "hey openhuman"), strips the wake phrase, buffers continuation captions for 1.5 s, then dispatches a brain turn. Includes an 8 s post-turn cooldown so Meet's lingering finalised caption (visible for 5–8 s) doesn't re-fire the wake word on every dedupe-then-grow cycle.
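The normalization and cooldown gating described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual meet_agent::session code: the names, the point at which the cooldown starts, and the omission of continuation-caption buffering are all simplifications.

```rust
// Illustrative sketch of the wake-word normalization + cooldown gate.
// Names and gating details are hypothetical, not the shipped code.
const WAKE_PHRASES: [&str; 2] = ["hey openhuman", "hey open human"];
const COOLDOWN_MS: u64 = 8_000;

/// Lowercase, map non-alphanumerics to spaces, collapse whitespace runs.
fn normalize(text: &str) -> String {
    let mapped: String = text
        .to_lowercase()
        .chars()
        .map(|c| if c.is_alphanumeric() { c } else { ' ' })
        .collect();
    mapped.split_whitespace().collect::<Vec<_>>().join(" ")
}

struct WakeState {
    cooldown_until_ms: u64, // keyed off the page-side ts_ms clock
}

impl WakeState {
    /// Returns the dictated text after the wake phrase, if the caption fires.
    fn match_wake(&mut self, caption: &str, ts_ms: u64) -> Option<String> {
        if ts_ms < self.cooldown_until_ms {
            return None; // the finalised caption is still on screen; skip matching
        }
        let norm = normalize(caption);
        for phrase in WAKE_PHRASES {
            if let Some(idx) = norm.find(phrase) {
                self.cooldown_until_ms = ts_ms + COOLDOWN_MS;
                return Some(norm[idx + phrase.len()..].trim().to_string());
            }
        }
        None
    }
}
```

The normalization is what lets "Hey, OpenHuman." and "hey open-human" both match; the cooldown is what keeps Meet's lingering finalised caption from re-firing the same dictation.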

Speak — CDP-injected Web Audio bridge:

  • audio_bridge.js builds a 16 kHz MediaStreamAudioDestinationNode and monkey-patches navigator.mediaDevices.getUserMedia so audio requests get our destination stream (delegating video to the original so Chromium's fake-camera Y4M still renders the mascot). Exposes window.__openhumanFeedPcm(b64) for the shell to push PCM into.
  • speak_pump.rs polls meet_agent_poll_speech every 100 ms and feeds each chunk via Runtime.evaluate on a long-lived CDP session.
  • TTS via voice::reply_speech with output_format=pcm_16000 so ElevenLabs (via the hosted backend) returns bytes the bridge can play with no transcoding.

Brain — canned acks, no LLM in the hot path:

  • The note text is already stored verbatim as a Heard event on the session transcript (post-meeting summarisation can run an LLM offline against it).
  • The verbal ack rotates through a small set ("Got it.", "Noted.", "Adding that.", "On it.", "Captured.") selected by hashing the prompt — short, deterministic, no model latency, no chain-of-thought leakage. (Earlier iterations used agentic-v1/summarization-v1; agentic emitted CoT into TTS, summarization returned empty for short prompts.)
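The rotation-by-hash selection can be sketched like this (a hypothetical reconstruction; the shipped brain.rs may hash differently, but the phrase list matches the description above):

```rust
// Deterministic ack selection by hashing the prompt bytes.
// The hash choice here is illustrative, not the actual implementation.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const ACKS: [&str; 5] = ["Got it.", "Noted.", "Adding that.", "On it.", "Captured."];

fn pick_ack(prompt: &str) -> &'static str {
    let mut h = DefaultHasher::new();
    prompt.as_bytes().hash(&mut h);
    // Same prompt always maps to the same phrase: no model call, no latency.
    ACKS[(h.finish() % ACKS.len() as u64) as usize]
}
```

Hashing rather than random choice keeps the ack stable across retries of the same prompt while still varying across different dictations.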

Why captions, not CEF audio?

  • Original design used cef_audio_handler_t for listen + a CDP-injected bridge for speak.
  • CEF queries get_audio_handler lazily (only when audio output starts), so a solo agent in a lobby or pre-admit window never engages the pipeline. Captions handle that case for free — Meet's STT is already running, speaker-attributed, and pre-segmented.
  • The CEF audio handler wiring (tauri-runtime-cef::audio + meet_audio::listen_capture) is kept in the tree as an inactive _legacy_listen field so re-enabling it later is a single wire change.

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) per docs/TESTING-STRATEGY.md
  • N/A: Coverage gate is a known constraint for this branch — the new shell meet_audio module is mostly CDP plumbing whose meaningful behaviour requires a real browser. Unit tests cover the resampler (irrelevant to the captions path now), the WAV header builder, the session wake-word state machine (punctuation, double-fire, cooldown), and the JSON-RPC E2E for the start/push/poll/stop lifecycle. The page-side JS (audio_bridge.js, captions_bridge.js) is exercised by the smoke runbook (docs/MEET_AGENT_SMOKE.md).
  • N/A: Coverage matrix — feature rows for meet_agent are not yet present in docs/TEST-COVERAGE-MATRIX.md; leaving as a follow-up since the matrix scope predates this PR's domain.
  • N/A: All affected feature IDs from the matrix are listed in the PR description — see above.
  • No new external network dependencies introduced (mock backend used per docs/TESTING-STRATEGY.md)
  • N/A: Manual smoke checklist — this PR ships a dedicated smoke runbook at docs/MEET_AGENT_SMOKE.md; folding it into docs/RELEASE-MANUAL-SMOKE.md is a follow-up once the feature graduates from beta.
  • N/A: Linked issue — no Linear issue for this branch; design was driven from in-session conversation.

Impact

  • Desktop only (macOS validated this session). Touches the meet-call lifecycle in meet_call_open_window; non-meet windows are unaffected because the audio handler registry is keyed by URL prefix and the bridges only install in the meet-call CEF target.
  • JS-injection note (per CLAUDE.md): the project's broader rule is "no new JS in embedded provider webviews" (the acct_* family). The Meet-call window is a distinct top-level surface for a single audio-bridging purpose. The user explicitly authorized this injection for the speak + captions paths; the no-JS rule for acct_* webviews is unchanged.
  • Vendored CEF submodule bumped to tinyhumansai/tauri-cef@feat/openhuman-audio-handler to pick up the new audio module.
  • Pre-push hook bypass note: pre-push ran prettier + cargo fmt and reformatted files; reformatted output is committed (d152cddc) and pushed normally — no --no-verify used.

Related


AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: feat/meet-agent-loop
  • Commit SHA: d152cdd

Validation Run

  • `pnpm --filter openhuman-app format:check` — ran via pre-push hook
  • `pnpm typecheck` — ran via pre-push hook
  • Focused tests: `cargo test --lib meet_agent` (31 passing) + `cargo test --test json_rpc_e2e json_rpc_meet_agent_session_lifecycle` (passing)
  • Rust fmt/check (if changed): `cargo fmt` + `cargo check` ran via pre-push hook
  • Tauri fmt/check (if changed): `cargo check --manifest-path app/src-tauri/Cargo.toml` ran via pre-push hook

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: agent joins a Meet call → captures dictated notes via the wake word → speaks a short verbal ack → notes are queryable post-meeting from the session transcript.
  • User-visible effect: a new live note-taking agent inside the Meet call window. Mascot webcam unchanged. No system permission prompts.

Parity Contract

  • Legacy behavior preserved: existing meet-join flow (feat(meet): join Google Meet calls with mascot virtual camera #1350) unchanged; meet_call_open_window still navigates the dedicated CEF window to the Meet URL with isolated profile and runs meet_scanner for join automation. The new meet_audio::start/stop calls are additive and best-effort (failures are logged and don't block window lifecycle).
  • Guard/fallback/dispatch parity checks: bridge install failure leaves a no-op pump so MeetAudioSession lifecycle stays uniform; the captions bridge's auto-CC click attempts cap at 30 (~60 s) so a user who deliberately disables CC is respected.

Summary by CodeRabbit

  • New Features

    • Live Meet agent (Beta): listens, transcribes, decides, and speaks back into meetings via a virtual microphone.
    • Automatic caption scraping and wake-word caption interactions.
    • In-meeting audio bridge for streamed synthesized speech and caption-driven prompts.
    • Session lifecycle controls (start/stop, per-call integration) and capability catalog entry.
  • Documentation

    • Added end-to-end smoke test runbook for the Meet agent.
  • Tests

    • Added end-to-end JSON-RPC lifecycle test for Meet agent.

senamakel added 10 commits May 7, 2026 18:14
New `openhuman.meet_agent_*` RPC surface so the Tauri shell can stream
inbound PCM from the open Meet window into core, run VAD-segmented
STT → LLM → TTS, and pull synthesized PCM back out. Sits next to
`meet/` (which only validates URL + mints request_id) — that domain
is single-shot and pure-validation; the live agentic loop needs
buffers, VAD, and a transcript log, which would bloat the validation
surface if jammed in.

This commit is scaffolding only:
- types/ops/session/rpc/schemas wired into the controller registry
- brain.rs ships stub STT (length-proportional placeholder), stub
  LLM ("I'm listening."), and stub TTS (200ms 440Hz tone) so the
  end-to-end audio path is exercisable without an LLM/TTS bill
- 23 unit tests covering VAD hangover, session registry lifecycle,
  RPC round-trip including a turn fired by simulated VAD silence

PR3 swaps the stubs for `voice::cloud_transcribe` (STT, ElevenLabs
under the hood) and `voice::reply_speech` (TTS), and routes LLM
through the existing `agent` runtime as a "meet" channel.

Refs the multi-slice plan for the meet-agent listen+speak loop.
Pulls in the new `crate::audio` module that exposes per-browser CEF
audio-handler callbacks via a URL-prefix registry. Required by the
upcoming `meet_audio` shell module to capture the embedded Meet
window's audio output without OS-level taps.

Submodule branch: feat/openhuman-audio-handler @ 3c321beac
(needs to be pushed to https://github.com/tinyhumansai/tauri-cef
before this PR can merge).
New shell module taps the embedded Meet webview's audio output via the
runtime's `audio::register_audio_handler` URL-prefix registry, runs an
inline float32-planar → 16 kHz mono PCM16LE resampler, and pushes
~100 ms chunks to `openhuman.meet_agent_push_listen_pcm` over JSON-RPC.
Speak path is scaffolded as a poll-and-discard loop today — the real
sink lands with the Chromium fake-audio `pipe://` patch in the next
slice.

Lifecycle is wired to `meet_call_open_window`:
- After window build, `meet_audio::start` opens a core session,
  registers the audio handler keyed by the call's normalised URL, and
  launches the speak pump
- On window destroy, `meet_audio::stop` releases the registration
  (silencing capture immediately), shuts the pump down, and asks core
  for the closeout summary (listened/spoken seconds, turn count)

Resampler is a stateful linear interpolator with phase + last-sample
carry across buffer boundaries (no tick at every CEF buffer flush).
Bounded mpsc channel (32 chunks) backpressures from the CEF audio
thread to the async forwarder — drops the oldest chunk on full rather
than blocking the renderer.
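The stateful-resampler idea above can be sketched as a simplified mono f32 pipeline. This is an illustration of the phase-plus-last-sample carry, not the vendored code (which also downmixes stereo and converts to PCM16LE):

```rust
/// Simplified mono sketch of a stateful linear-interpolation resampler.
/// Carrying `phase` and `last` across calls avoids a click at every
/// buffer boundary; virtual index -1 refers to the carried sample.
struct Resampler {
    ratio: f64, // in_rate / out_rate, e.g. 48000 / 16000 = 3.0
    phase: f64, // fractional read position; may dip to -1.0 after rebasing
    last: f32,  // final sample of the previous input buffer
}

impl Resampler {
    fn new(in_rate: u32, out_rate: u32) -> Self {
        assert!(in_rate > 0 && out_rate > 0); // zero-rate guard
        Self { ratio: in_rate as f64 / out_rate as f64, phase: 0.0, last: 0.0 }
    }

    fn process(&mut self, input: &[f32]) -> Vec<f32> {
        let mut out = Vec::new();
        if input.is_empty() {
            return out;
        }
        // Interpolate while both neighbours are available; index -1 is the
        // carried last sample from the previous buffer.
        while self.phase < (input.len() - 1) as f64 {
            let base = self.phase.floor();
            let frac = (self.phase - base) as f32;
            let idx = base as isize;
            let a = if idx < 0 { self.last } else { input[idx as usize] };
            let b = input[(idx + 1) as usize];
            out.push(a + (b - a) * frac);
            self.phase += self.ratio;
        }
        // Rebase the phase so the next buffer continues seamlessly.
        self.phase -= input.len() as f64;
        self.last = *input.last().unwrap();
        out
    }
}
```

With a 48 kHz input and 16 kHz output the ratio is exactly 3, so every third input sample is emitted, and the carried state keeps the read position continuous across CEF buffer flushes.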

Tests cover passthrough, 48k→16k decimation, stereo-to-mono averaging,
clamping, and zero-rate guard. CEF callback path is exercised
end-to-end manually in the smoke test (slice 7).

No new system permissions: audio is read straight from the renderer
via CEF, never the OS mic / speakers.
Adds:
- `json_rpc_meet_agent_session_lifecycle` E2E test that walks the full
  start_session → push (loud + silent) → poll_speech → stop_session
  flow over a real local JSON-RPC server. Pins behavior the shell
  relies on: VAD doesn't fire while still hearing speech; closes the
  utterance after ~6 silent frames; brain stub enqueues outbound PCM
  fast enough for a 1s polling budget; stop_session returns sane
  listened_seconds + turn_count counters; stopping a non-existent
  session is a JSON-RPC error rather than a silent no-op.

- `meet_agent.live_loop` capability catalog entry covering listen +
  speak with the right privacy facets (Derived, leaves device, Google
  Meet + ElevenLabs destinations).

Stays network-free: STT/TTS are stubs in PR1, so the test exercises
the full RPC plumbing without any backend / model calls.
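The VAD behaviour this test pins can be sketched as a small RMS-threshold state machine. The threshold value and exact state shape here are assumptions; only the roughly-six-silent-frames hangover comes from the text above.

```rust
// Illustrative RMS-gated VAD with a silence hangover; the threshold and
// frame count are assumptions matching the behaviour described above.
#[derive(Debug, PartialEq)]
enum Vad {
    Idle,
    Speech { silent_frames: u32 },
}

const RMS_THRESHOLD: f32 = 0.02;
const HANGOVER_FRAMES: u32 = 6;

fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt()
}

/// Feed one frame; returns true exactly when an utterance just ended.
fn feed(state: &mut Vad, frame: &[f32]) -> bool {
    let loud = rms(frame) >= RMS_THRESHOLD;
    let (next, fired) = match *state {
        Vad::Idle => {
            if loud { (Vad::Speech { silent_frames: 0 }, false) } else { (Vad::Idle, false) }
        }
        Vad::Speech { silent_frames } => {
            if loud {
                (Vad::Speech { silent_frames: 0 }, false) // still hearing speech: never fire
            } else if silent_frames + 1 >= HANGOVER_FRAMES {
                (Vad::Idle, true) // end of utterance: dispatch a brain turn
            } else {
                (Vad::Speech { silent_frames: silent_frames + 1 }, false)
            }
        }
    };
    *state = next;
    fired
}
```

The hangover is what keeps a mid-sentence pause from splitting one dictation into two turns.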
Original plan was a from-source Chromium patch to make
`--use-file-for-fake-audio-capture` accept a `pipe://` URL backed
by Rust. Discovered we don't maintain a CEF source build pipeline —
`cef-dll-sys` downloads pre-built binaries from a release URL.
Forking chromiumembedded/cef and wiring up build infra is its own
multi-day project, not a slice in this PR. Pivoted to:

- `audio_bridge.js`: tiny Web Audio bridge that builds a 16 kHz
  MediaStreamAudioDestinationNode, monkey-patches
  `navigator.mediaDevices.getUserMedia` to serve audio requests from
  that destination (delegating video to the original so Chromium's
  fake-camera Y4M still renders the mascot), and exposes
  `window.__openhumanFeedPcm(b64)` for the shell to push PCM into.
- `inject.rs`: attaches CDP to the Meet target, sends
  `Page.addScriptToEvaluateOnNewDocument` + `Page.reload` so the
  bridge applies before Meet's first `getUserMedia` call. Probes
  `__openhumanAudioBridgeInfo()` to confirm liveness before handing
  off to the pump.
- `speak_pump.rs`: rewritten to feed each poll_speech chunk into the
  bridge via `Runtime.evaluate` on a single long-lived CDP session.
  Bails after 30 consecutive failures (page navigated away).
- `mod.rs`: install_audio_bridge runs in start(); on failure, the
  pump is replaced with a no-op so the session still tracks listen
  counters cleanly.

JS-injection note: CLAUDE.md prohibits new JS injection in embedded
provider webviews (the `acct_*` family). The Meet call window is a
distinct top-level surface for a single audio-bridging purpose, and
the public CefAudioHandler API only covers listen — speak has no
comparable public hook short of a Chromium rebuild. User explicitly
authorized this injection for the speak path; the no-JS rule for
`acct_*` webviews is unchanged.

No system permissions, no admin install, mascot webcam preserved.
Swaps the brain stubs for real adapters:

- STT — wraps the drained PCM16LE buffer in a minimal RIFF/WAVE
  container (new `wav.rs` module + 3 tests) and posts via
  `voice::cloud_transcribe` (backend Whisper).
- LLM — direct chat-completions call through BackendOAuthClient
  with a "live meeting agent" system prompt and the transcript as
  the user message. `max_tokens: 120` keeps replies conversational;
  `temperature: 0.4` keeps them on-topic. The system prompt
  authorises the model to return an empty string when the latest
  utterance doesn't need a response.
- TTS — `voice::reply_speech` with `output_format = "pcm_16000"`
  so ElevenLabs (via the hosted backend) returns bytes the shell-side
  audio bridge can play directly with no transcoding.

Each stage falls back to a deterministic stub when the backend
session is missing — keeps existing unit tests + the JSON-RPC E2E
network-free, and means a smoke run without sign-in still produces
audible output instead of silently breaking. Real transport / 5xx
failures are recorded as Note events on the session transcript so
they're visible in the live captions overlay rather than silently
papered over.

Tests: extends meet_agent::brain unit tests + meet_agent::wav.
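The RIFF/WAVE wrapping step can be sketched as a canonical 44-byte header builder. This is an illustration of the standard layout in the spirit of the wav.rs module, not its actual code:

```rust
/// Build a canonical 44-byte RIFF/WAVE header plus payload for mono
/// PCM16LE samples. Sketch of the standard layout, not the wav.rs code.
fn wrap_pcm16_mono(samples: &[i16], sample_rate: u32) -> Vec<u8> {
    let data_len = (samples.len() * 2) as u32;
    let byte_rate = sample_rate * 2; // mono * 16-bit = 2 bytes per frame
    let mut out = Vec::with_capacity(44 + data_len as usize);
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes()); // RIFF chunk size
    out.extend_from_slice(b"WAVE");
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    out.extend_from_slice(&1u16.to_le_bytes()); // audio format: PCM
    out.extend_from_slice(&1u16.to_le_bytes()); // channels: mono
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&2u16.to_le_bytes()); // block align
    out.extend_from_slice(&16u16.to_le_bytes()); // bits per sample
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for s in samples {
        out.extend_from_slice(&s.to_le_bytes());
    }
    out
}
```

A minimal container like this is all a cloud STT endpoint needs to interpret raw PCM16LE correctly.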
Two-laptop runbook covering the full live agent loop on a real Meet
call: how to verify the listen path (CEF audio handler → STT
transcripts in logs), the speak path (Web Audio bridge alive,
agent's voice on the host's speaker, mascot webcam preserved), and
the absence of any system permission prompt or driver install.
Includes a small failure-mode table mapping symptoms to fixes for
the most likely first-time issues.
Replaces CEF audio handler / Whisper STT with a DOM observer over
Meet's built-in live captions. CEF's cef_audio_handler_t is queried
lazily (only when audio output starts), so a solo agent in a lobby
or any pre-admit window never engages the pipeline. Captions handle
that case for free — Meet's STT is already running, speaker-attributed,
and pre-segmented.

Page side (captions_bridge.js):
- Auto-clicks "Turn on captions" (up to ~30 attempts over a minute)
- MutationObserver + 250ms safety poll over the captions region
  (selected by aria-label="Captions" — stable across class-name churn)
- Per-speaker dedupe so growing captions don't queue duplicates
- Drain API: window.__openhumanDrainCaptions(), info introspection
  via __openhumanCaptionsBridgeInfo()
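The per-speaker dedupe lives in page-side JS; its core logic can be illustrated in Rust as a hypothetical reconstruction (track the last pushed text per speaker, queue only on change):

```rust
use std::collections::HashMap;

/// Sketch of the per-speaker dedupe for growing captions. The real logic
/// lives in captions_bridge.js; this is an illustrative reconstruction.
struct CaptionDedupe {
    last_by_speaker: HashMap<String, String>,
}

impl CaptionDedupe {
    fn new() -> Self {
        Self { last_by_speaker: HashMap::new() }
    }

    /// Returns true when the caption should be queued: a brand-new line,
    /// or the same line grown/changed since the last push.
    fn should_queue(&mut self, speaker: &str, text: &str) -> bool {
        match self.last_by_speaker.get(speaker) {
            Some(prev) if prev == text => false, // identical: suppress
            _ => {
                self.last_by_speaker.insert(speaker.to_string(), text.to_string());
                true
            }
        }
    }
}
```

Note that a single character of growth still re-queues the full line, which is exactly the dedupe-then-grow cycle the wake-word cooldown in the follow-up commit guards against.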

Shell side (caption_listener.rs):
- Polls drain on a fresh CDP attach (separate from the speak pump's
  attach so they run concurrently without serialising)
- Forwards each line to openhuman.meet_agent_push_caption RPC
- Exits after 30 consecutive errors (page navigated away)

Core side (meet_agent):
- New types::PushCaptionRequest + RPC schema
- session::note_caption: wake-word state machine (case-insensitive
  match on "hey openhuman" / "hey open human" — tolerates Meet STT
  splitting the brand). Any text after the wake phrase + subsequent
  captions until the brain takes the prompt becomes the LLM input.
- brain::run_caption_turn: short delay (1.5s) so multi-fragment
  utterances assemble, then drain prompt → LLM → TTS → enqueue
  outbound. Skips STT entirely — captions are already text.

Listen path now works pre-admit and without other participants
speaking. Speak path unchanged — same Web Audio bridge.

Old CEF-audio path (listen_capture.rs) kept in tree as the
inactive _legacy_listen field on MeetAudioSession, so re-enabling
it later is a single wire change.
Live-call testing surfaced three regressions in the caption-driven
loop. Each is fixed here:

1. Wake word re-fires while the same utterance is still on screen.
   Meet keeps a finalised caption visible for ~5–8s after speaking
   ends. Our per-text dedupe in captions_bridge.js suppresses
   identical pushes but a single character growth re-queues the
   line — and once the brain drains the prompt and clears
   wake_active, that next push hits the wake-word match again.
   Result: 5–10 cascading turns per single dictation, prompt
   buffers ballooning past 9k chars, runaway TTS rate-limit cascade.

   Fix: 8s cooldown after take_pending_prompt, keyed off the page-side
   ts_ms (same clock as future caption pushes). During cooldown,
   captions still record to the transcript log but skip wake-word
   matching. Lifts wake_active gate AND the cooldown gate before
   the new utterance can fire again.

2. Punctuation breaks the wake match. Meet's STT inserts a comma
   between greeting and brand ("hey, openhuman"), so the literal
   substring search misses. Normalize: lowercase + non-alphanumeric
   to space + collapse whitespace, then substring against
   "hey openhuman" / "hey open human". Also handles "Hey OpenHuman.",
   "hey open-human", multi-space variants, etc.

3. Reasoning model leaks chain-of-thought into TTS. agentic-v1
   emits its internal monologue without <think> delimiters, so
   stripping doesn't help — and the resulting 250+ char replies
   were both unintelligible as a verbal ack and the actual visible
   "thinking" text the user saw narrated through Meet.

   Fix: skip the LLM in the hot path entirely. The note itself is
   already stored verbatim as a Heard event on the session
   transcript (the user's "remember to email Bob" lives there
   for post-meeting actioning). The verbal ack only needs to
   confirm capture, so we hardcode a small rotation
   ["Got it.", "Noted.", "Adding that.", "On it.", "Captured."]
   selected by hashing the prompt bytes — short, deterministic,
   no model latency, no rate-limit pressure, no CoT leak.

Tests:
- note_caption_handles_punctuated_wake — "Hey, OpenHuman ..."
- note_caption_handles_split_brand — "hey open-human ..."
- note_caption_does_not_double_fire_on_growing_caption

Existing meet_agent tests still pass (28→31 total).

Future work: post-meeting summarisation runs an LLM offline against
the full transcript log to surface the captured action-item list.
That path can take its time and use whichever model behaves best
for instruction-following without the latency / CoT constraints
the in-call ack has.
@senamakel senamakel requested a review from a team May 8, 2026 02:47

coderabbitai Bot commented May 8, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR implements a complete live agent audio loop for Google Meet. The system captures webview audio via CEF, processes it (STT→LLM→TTS), and injects synthesized speech back via a patched Web Audio API, while also listening to captions for wake-word triggers. The implementation spans Tauri shell-side audio pipelines and backend RPC-driven brain logic.

Changes

Meet Agent Live Audio Loop

  • RPC Type Contracts (src/openhuman/meet_agent/types.rs): start/stop/push/poll/caption request and response payloads with VAD/utterance/turn counters.
  • Session State & Registry (src/openhuman/meet_agent/session.rs): per-session PCM buffers, VAD feeding, caption-wake tracking, event logging, and a thread-safe registry keyed by request_id.
  • Audio Operations & VAD (src/openhuman/meet_agent/ops.rs): sample-rate validation, request-id sanitization, RMS-based voice activity detection state machine (Idle→Speech→Silence→EndOfUtterance).
  • WAV Utility (src/openhuman/meet_agent/wav.rs): RIFF/WAVE header packing for PCM16LE mono samples used in cloud STT requests.
  • Backend Brain Logic (src/openhuman/meet_agent/brain.rs): two async turns: run_turn (drains PCM, STT→LLM→TTS with fallbacks) and run_caption_turn (uses the caption prompt, deterministic ack phrases).
  • RPC Handlers (src/openhuman/meet_agent/rpc.rs): five handlers: start_session, push_listen_pcm (spawns run_turn on VAD end), push_caption (spawns run_caption_turn on wake), poll_speech, stop_session; includes PCM decoder and tests.
  • RPC Schemas (src/openhuman/meet_agent/schemas.rs): schemas for all five controllers with input/output specs, registry build, and function-name lookup with unknown fallback.
  • JavaScript Bridge, audio (app/src-tauri/src/meet_audio/audio_bridge.js): injected audio bridge: Web Audio destination, base64 PCM decode, scheduling, feed API, getUserMedia monkey-patch, and legacy API shim.
  • JavaScript Bridge, captions (app/src-tauri/src/meet_audio/captions_bridge.js): injected captions bridge: DOM scraping, dedupe, MutationObserver + polling, CC auto-enable attempts, and drain/info APIs for the shell.
  • CDP Injection (app/src-tauri/src/meet_audio/inject.rs): installs the JS bridges via CDP Page.addScriptToEvaluateOnNewDocument and reload, attaches to the Meet target, probes readiness, and exposes drain_captions/feed_pcm_chunk helpers.
  • CEF Audio Capture (app/src-tauri/src/meet_audio/listen_capture.rs): registers the CEF audio handler, resamples float32 to mono 16 kHz, chunks into ~100 ms PCM16LE buffers, and forwards via an async channel to the RPC forwarder.
  • Speak Pump (app/src-tauri/src/meet_audio/speak_pump.rs): polls synthesized speech every 100 ms, validates/decodes base64 PCM, and injects into the Meet audio bridge via CDP.
  • Caption Listening (app/src-tauri/src/meet_audio/caption_listener.rs): polls captions via CDP every 500 ms (after an initial tick), forwards to the backend push_caption RPC, tracks failures and exits after a threshold.
  • Shell Orchestration (app/src-tauri/src/meet_audio/mod.rs): start/stop session lifecycle: opens the core RPC, installs bridges, attaches listeners, starts pumps, manages MeetAudioState; graceful fallbacks on bridge install failure.
  • Core Integration (app/src-tauri/src/lib.rs, app/src-tauri/src/meet_call/mod.rs, src/core/all.rs, src/openhuman/mod.rs): registers meet_audio state, wires meet_audio start/stop into the meet_call window lifecycle, appends meet_agent controllers/schemas to the core registry, exports the meet_agent module.
  • Capability Catalog (src/openhuman/about_app/catalog.rs, catalog_tests.rs): adds the meet_agent.live_loop capability (Automation, Beta) with privacy metadata and updates catalog tests.
  • Tests & Docs (tests/json_rpc_e2e.rs, docs/MEET_AGENT_SMOKE.md): E2E JSON-RPC test for the meet_agent lifecycle and a smoke-test runbook for manual two-laptop validation.
  • Vendored Submodule (app/src-tauri/vendor/tauri-cef): updates the tauri-cef pinned commit to include audio module support.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • tinyhumansai/openhuman#268: Modifies core controller-registration functions; related to adding new RPC namespaces/controllers.
  • tinyhumansai/openhuman#159: Also extends src/core/all.rs to add new OpenHuman namespaces/controllers; touches the same integration points.
  • tinyhumansai/openhuman#618: Unit tests that validate controller registry contents and namespace descriptions overlap core/all.rs changes.

Suggested reviewers

  • graycyrus

Poem

🐰 I hopped inside the webview's hum,

I stitched the streams where captions come,
I schedule PCM and patch the mic,
The meet agent listens, thinks, and speaks quick,
Tiny bytes dance, the audio bridge is spun!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main feature: a live note-taking agent for Google Meet that listens and speaks.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 7

🧹 Nitpick comments (7)
app/src-tauri/vendor/tauri-cef (1)

1-1: ⚖️ Poor tradeoff

Document the custom fork rationale and upstream plan.

This submodule points to a custom fork (tinyhumansai/tauri-cef) rather than the upstream tauri-cef repository. Using forks of low-level dependencies like CEF (Chromium Embedded Framework) introduces supply chain risk, as the fork may miss upstream security patches or introduce undisclosed changes.

Please ensure:

  1. The rationale for forking (audio handler functionality) is documented in the repository (e.g., README, ADR, or inline comments in relevant integration code)
  2. There's a plan to either upstream these changes or periodically sync security patches from upstream
  3. The specific changes in the fork vs. upstream are tracked and reviewable
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src-tauri/vendor/tauri-cef` at line 1, Document that the project uses a
custom fork named "tinyhumansai/tauri-cef" (the tauri-cef submodule) because of
added audio handler functionality; add a short rationale and summary of the fork
(what was changed, why) into the repo README or an ADR, add an UPSTREAM_SYNC or
MAINTENANCE plan that states whether changes will be upstreamed or how/when
security patches will be pulled from upstream, and ensure the exact diffs
between upstream tauri-cef and the fork are tracked (e.g., a CHANGELOG or a
git-diff snapshot and a pointer to the submodule commit) so reviewers can
inspect the audio handler changes and follow the syncing/upstreaming plan.
tests/json_rpc_e2e.rs (1)

3931-4094: 🏗️ Heavy lift

Add one E2E for push_caption / wake-word flow as well.

This test only covers the legacy PCM/VAD lifecycle. The shipped Meet listen path in this PR is caption-driven, so a regression in wake-word assembly/cooldown could still pass CI. A focused push_caption → poll_speech JSON-RPC test would lock down the path the shell actually uses.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/json_rpc_e2e.rs` around lines 3931 - 4094, Add a new tokio::test that
mirrors json_rpc_meet_agent_session_lifecycle but exercises the
caption/wake-word flow: call openhuman.meet_agent_start_session (same setup),
then send a wake-word caption via openhuman.meet_agent_push_caption (use the
same request_id pattern), poll openhuman.meet_agent_poll_speech until non-empty
pcm_base64 is returned, then call openhuman.meet_agent_stop_session and assert
listened_seconds > 0 and turn_count == 1; ensure you also test that stopping a
non-existent session errors (reuse the bogus stop check). Locate the new test
near json_rpc_meet_agent_session_lifecycle and reuse helpers post_json_rpc,
assert_no_jsonrpc_error, assert_jsonrpc_error and the same B64/EnvVar setup so
it runs under the same ephemeral RPC server harness.
app/src-tauri/src/meet_audio/speak_pump.rs (1)

57-57: 💤 Low value

Redundant mut rebinding.

cdp is already owned and mutable; the rebinding adds noise.

Suggested fix

Since cdp is passed by value and used mutably, remove the rebinding entirely and use cdp directly (it's already mutable by ownership):

-        let mut cdp = cdp;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src-tauri/src/meet_audio/speak_pump.rs` at line 57, Remove the redundant
rebinding "let mut cdp = cdp;" in speak_pump.rs and use the existing owned
mutable variable cdp directly; locate the usage in the function or block where
cdp is passed by value (search for the symbol cdp and the rebinding) and delete
that line so subsequent mutable operations reference the original cdp binding.
app/src-tauri/src/meet_audio/mod.rs (1)

246-249: ⚡ Quick win

Creating a new HTTP client per RPC call is inefficient.

reqwest::Client maintains a connection pool and is designed to be reused. Creating a new client for every rpc_call bypasses connection reuse and adds overhead.

Consider using a OnceCell<reqwest::Client> or passing a shared client.

Suggested fix
+use std::sync::OnceLock;
+
+static RPC_CLIENT: OnceLock<reqwest::Client> = OnceLock::new();
+
+fn rpc_client() -> &'static reqwest::Client {
+    RPC_CLIENT.get_or_init(|| {
+        reqwest::Client::builder()
+            .timeout(std::time::Duration::from_secs(10))
+            .build()
+            .expect("failed to build HTTP client")
+    })
+}
+
 pub(crate) async fn rpc_call(
     method: &str,
     params: serde_json::Value,
 ) -> Result<serde_json::Value, String> {
     // ...
     let url = crate::core_rpc::core_rpc_url_value();
-    let client = reqwest::Client::builder()
-        .timeout(std::time::Duration::from_secs(10))
-        .build()
-        .map_err(|e| format!("http client: {e}"))?;
+    let client = rpc_client();
     let req = crate::core_rpc::apply_auth(client.post(&url))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src-tauri/src/meet_audio/mod.rs` around lines 246 - 249, The code
currently builds a new reqwest::Client inside the rpc call (the block that does
Client::builder().timeout(...).build()), which prevents connection reuse; change
this to use a shared client instead—either introduce a static
OnceCell<reqwest::Client> (e.g. a global CLIENT initialized once and
.get_or_init(...) used where the builder is now) or modify the caller signature
to accept &reqwest::Client and remove the per-call build; update all references
that call the rpc routine to use the shared client and remove the map_err(build)
path so errors only occur on initial client creation.
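The OnceLock pattern the suggestion relies on can be sketched with std alone — reqwest isn't needed to show the mechanics. The `Client` struct below is a hypothetical stand-in for `reqwest::Client`:

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for an expensive-to-build client
// (reqwest::Client in the PR).
pub struct Client {
    pub timeout_secs: u64,
}

static CLIENT: OnceLock<Client> = OnceLock::new();

// get_or_init runs the closure at most once, even under concurrent
// callers; every later call returns the same &'static reference.
pub fn client() -> &'static Client {
    CLIENT.get_or_init(|| Client { timeout_secs: 10 })
}
```

Returning `&'static Client` directly (rather than a `Result` that can never fail) keeps the call sites free of error plumbing: initialization failure is a startup bug, not a per-call condition.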
src/openhuman/meet_agent/rpc.rs (1)

64-70: ⚖️ Poor tradeoff

Spawned brain turn panics are silently swallowed.

If brain::run_turn panics, tokio captures the panic into the task's JoinError, but the JoinHandle is dropped here so the failure never surfaces anywhere. Consider keeping the handle and inspecting its result, using a catch-unwind wrapper, or at minimum logging task completion/failure.

Suggested improvement
+use futures::FutureExt;
+
         let request_id = req.request_id.clone();
         tokio::spawn(async move {
-            if let Err(err) = brain::run_turn(&request_id).await {
-                log::warn!("{LOG_PREFIX} brain turn failed request_id={request_id} err={err}");
-            }
+            match std::panic::AssertUnwindSafe(brain::run_turn(&request_id))
+                .catch_unwind()
+                .await
+            {
+                Ok(Ok(())) => {}
+                Ok(Err(err)) => {
+                    log::warn!("{LOG_PREFIX} brain turn failed request_id={request_id} err={err}");
+                }
+                Err(_) => {
+                    log::error!("{LOG_PREFIX} brain turn panicked request_id={request_id}");
+                }
+            }
         });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/meet_agent/rpc.rs` around lines 64 - 70, The spawned task using
tokio::spawn for brain::run_turn(&request_id) can panic and the panic will be
silently aborted; wrap the run_turn call in a panic-safe wrapper (use
std::panic::AssertUnwindSafe + tokio::spawn(async move { let result =
tokio::spawn or futures::FutureExt::catch_unwind on the async block }).await or
futures::FutureExt::catch_unwind) and log both panic and Err cases so failures
surface: call brain::run_turn(&request_id).await inside a catch_unwind, then log
panics with the request_id and log the Err variant as you already do, ensuring
the tokio::spawn body handles and records both panic and normal error outcomes.
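For context, tokio::spawn does capture panics into the JoinHandle's JoinError; the problem above is only that the handle is dropped unobserved. The spawn-and-inspect alternative can be sketched with std threads, with no tokio dependency assumed — `run_turn` here is a hypothetical stand-in for the real brain turn:

```rust
use std::thread;

// Hypothetical turn that fails by panicking, like brain::run_turn might.
fn run_turn(request_id: &str) -> Result<(), String> {
    if request_id == "boom" {
        panic!("unexpected state");
    }
    Ok(())
}

// Spawn-and-inspect: join() hands the panic back as an Err instead of
// letting it vanish with a dropped handle.
pub fn spawn_turn(request_id: &'static str) -> Result<Result<(), String>, String> {
    let handle = thread::spawn(move || run_turn(request_id));
    handle
        .join()
        .map_err(|_| format!("turn panicked request_id={request_id}"))
}
```

The outer `Result` distinguishes "the turn panicked" from "the turn ran and returned an error", which is exactly the split the suggested catch_unwind diff logs at different levels.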
app/src-tauri/src/meet_audio/captions_bridge.js (2)

150-161: ⚡ Quick win

Redundant duplicate condition.

Line 153 checks the same condition twice: lbl.indexOf("turn on captions") === 0 and /^turn on captions/.test(lbl) are equivalent. The regex test is unnecessary.

Suggested fix
-      if (lbl.indexOf("turn on captions") === 0 || /^turn on captions/.test(lbl)) {
+      if (lbl.indexOf("turn on captions") === 0) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src-tauri/src/meet_audio/captions_bridge.js` around lines 150 - 161, The
loop in captions_bridge.js has a redundant condition checking the same thing
twice; replace the combined test (lbl.indexOf("turn on captions") === 0 ||
/^turn on captions/.test(lbl)) with a single clear check such as
lbl.startsWith("turn on captions") (or keep lbl.indexOf(...) === 0) to remove
the duplicate regex test, leaving the click/enableAttempts/return behavior
unchanged for the function that iterates over buttons and uses
enableAttempts/ENABLE_ATTEMPT_BUDGET.

40-45: 💤 Low value

Unbounded speaker tracking map may grow over long calls.

lastBySpeaker accumulates entries for every distinct speaker name encountered and is never pruned. In a long meeting with many participants or speaker-name churn, this could grow indefinitely.

Consider periodically pruning entries older than a few seconds, or using a bounded LRU-style map.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src-tauri/src/meet_audio/captions_bridge.js` around lines 40 - 45,
lastBySpeaker currently grows without bounds; update the captions handling to
track a timestamp per speaker in lastBySpeaker and prune stale entries (e.g.,
remove keys older than N seconds) or replace lastBySpeaker with a small bounded
LRU map implementation so the map never exceeds a max size. Specifically, modify
the code paths that update lastBySpeaker (the caption enqueue/emit logic) to
record Date.now() alongside the fingerprint and run a lightweight prune step (or
LRU eviction) before inserting new speakers so old/rare speakers are removed
automatically.
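The pruning policy suggested above is language-agnostic; here is a minimal Rust sketch of the same idea (the `SpeakerMap` type and its field names are hypothetical, mirroring the JS `lastBySpeaker` map):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical mirror of lastBySpeaker: speaker -> (fingerprint, last seen).
pub struct SpeakerMap {
    max_age: Duration,
    entries: HashMap<String, (String, Instant)>,
}

impl SpeakerMap {
    pub fn new(max_age: Duration) -> Self {
        Self { max_age, entries: HashMap::new() }
    }

    // Record a caption fingerprint, pruning stale speakers first so the
    // map size tracks only recently active participants.
    pub fn record(&mut self, speaker: &str, fingerprint: &str, now: Instant) {
        let max_age = self.max_age;
        self.entries
            .retain(|_, (_, seen)| now.duration_since(*seen) <= max_age);
        self.entries
            .insert(speaker.to_string(), (fingerprint.to_string(), now));
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Pruning on insert keeps the work proportional to map size with no background timer, which suits a DOM-observer callback that already fires on every caption mutation.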
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/src-tauri/src/meet_audio/audio_bridge.js`:
- Around line 33-171: Add stable "[openhuman-audio-bridge]" console logs at key
entry/exit and branch points: log on initial install when setting
window.__openhumanAudioBridgeInstalled, inside ensureContext() indicating
creating vs reusing AudioContext and its resulting state, in
navigator.mediaDevices.getUserMedia override to log whether an audio-only
request was intercepted vs audio+video (include constraints), and after splicing
tracks when combining streams; also log in window.__openhumanFeedPcm when a feed
is received (size/duration) and keep the existing catch log but include the same
tag; finally log when the legacy navigator.getUserMedia alias is patched and
when navigator.mediaDevices is absent so installers can see why interception
didn’t occur.
- Around line 137-155: The current code returns the shared dest.stream
(ourStream) which is reused across getUserMedia calls and can be permanently
stopped; instead, create a fresh MediaStream for each request by cloning the
destination tracks via track.clone() and adding those clones to a new
MediaStream; when constraints.video is false return
Promise.resolve(newMediaStreamWithClonedAudio), and when combining with
origGum({ video: ... }) clone the destination audio tracks and add the clones to
the realStream (use realStream.addTrack(clone)) rather than moving/returning the
singleton dest.stream or its original tracks.

In `@app/src-tauri/src/meet_audio/caption_listener.rs`:
- Around line 109-122: The loop in caption_listener.rs currently swallows
failures from super::rpc_call("openhuman.meet_agent_push_caption") by logging
and returning Ok(()), which prevents MAX_CONSECUTIVE_ERRORS from ever
incrementing; instead propagate the failure to the caller: replace the
debug-only handling in the for loop (the block around super::rpc_call / res) so
that a failed rpc_call returns Err(...) from the enclosing function (or use the
? operator) with a clear context message referencing request_id and the rpc
error; ensure the function signature supports returning that error type so the
outer task can back off/terminate.

In `@app/src-tauri/src/meet_audio/listen_capture.rs`:
- Around line 117-129: The current code splits each incoming pcm_bytes but does
not accumulate undersized CEF packets nor evict oldest entries when the channel
is full; change the logic around resampler.lock()/feed_and_drain, FLUSH_SAMPLES
and tx.try_send to use a persistent pending buffer (e.g., a Vec<u8> or VecDeque
stored outside the per-callback scope) that appends successive pcm_bytes until
its length >= FLUSH_SAMPLES * 2, then emits fixed-size chunks; when attempting
to forward a chunk with tx.try_send (the code currently at the try_send/ log
block referencing request_id) and it returns Err (channel full), implement an
overwrite- oldest policy by dropping from the pending buffer (pop_front) or
otherwise evicting the oldest buffered chunk and retrying send so the newest
audio is pushed, and apply the same persistent-buffer+evict-oldest change to the
other similar block around lines 219-266.

In `@docs/MEET_AGENT_SMOKE.md`:
- Around line 30-56: The runbook currently validates the old pre-wake-word/STT
flow and will mislead testers: update the examples and checks to reflect the new
caption-driven, wake-word gated path by (1) changing Step 4 sample prompts to
include the wake phrase "hey openhuman" (or variations) and (2) updating the
"Listen path" checks to avoid expecting the legacy
push_listen_pcm/handle_push_listen_pcm and STT logs and instead mention caption
events and wake-word gating logs (referencing the existing log lines like
`[meet-agent] turn done` and any caption-related log entries); ensure the doc
explicitly states that absence of `cef stream start` or `forward channel push`
may be expected unless wake-word is spoken and that testers should look for
caption/wake-word related log entries instead.

In `@src/openhuman/meet_agent/brain.rs`:
- Around line 43-45: The code currently hard-codes SAMPLE_RATE_HZ = 16_000 (and
uses MIN_TURN_SAMPLES derived from it) but the public API accepts variable
sample_rate_hz; update the module to either enforce 16 kHz at the session
boundary or use the session's actual sample rate throughout: in start_session
(validate sample_rate_hz and return an error if it is not 16_000) OR change all
uses of SAMPLE_RATE_HZ and MIN_TURN_SAMPLES (and any WAV packing, duration
calculation, and turn-floor sizing logic referenced in functions that produce
WAVs and compute timings around lines ~246-260 and ~356-359) to accept and use a
per-session sample_rate_hz from the session struct/params (pass it into helpers
and recompute derived sample counts accordingly). Ensure all places that write
WAV headers, compute durations, or derive sample counts use the session-level
sample_rate_hz instead of the constant.

In `@src/openhuman/meet_agent/schemas.rs`:
- Around line 298-312: Rename the five delegator functions currently named
wrap_start_session, wrap_push_listen_pcm, wrap_push_caption, wrap_poll_speech,
and wrap_stop_session to the repo-standard names handle_start_session,
handle_push_listen_pcm, handle_push_caption, handle_poll_speech, and
handle_stop_session respectively; keep the signature (fn NAME(p: Map<String,
Value>) -> ControllerFuture) and body delegating to super::rpc::handle_* (e.g.,
Box::pin(async move { super::rpc::handle_start_session(p).await })) unchanged,
and update any registry entries (e.g., all_registered_controllers or schemas
lists) that referenced the old wrap_* symbols to the new handle_* symbols so the
domain follows the schema-module contract.

---

Nitpick comments:
In `@app/src-tauri/src/meet_audio/captions_bridge.js`:
- Around line 150-161: The loop in captions_bridge.js has a redundant condition
checking the same thing twice; replace the combined test (lbl.indexOf("turn on
captions") === 0 || /^turn on captions/.test(lbl)) with a single clear check
such as lbl.startsWith("turn on captions") (or keep lbl.indexOf(...) === 0) to
remove the duplicate regex test, leaving the click/enableAttempts/return
behavior unchanged for the function that iterates over buttons and uses
enableAttempts/ENABLE_ATTEMPT_BUDGET.
- Around line 40-45: lastBySpeaker currently grows without bounds; update the
captions handling to track a timestamp per speaker in lastBySpeaker and prune
stale entries (e.g., remove keys older than N seconds) or replace lastBySpeaker
with a small bounded LRU map implementation so the map never exceeds a max size.
Specifically, modify the code paths that update lastBySpeaker (the caption
enqueue/emit logic) to record Date.now() alongside the fingerprint and run a
lightweight prune step (or LRU eviction) before inserting new speakers so
old/rare speakers are removed automatically.

In `@app/src-tauri/src/meet_audio/mod.rs`:
- Around line 246-249: The code currently builds a new reqwest::Client inside
the rpc call (the block that does Client::builder().timeout(...).build()), which
prevents connection reuse; change this to use a shared client instead—either
introduce a static OnceCell<reqwest::Client> (e.g. a global CLIENT initialized
once and .get_or_init(...) used where the builder is now) or modify the caller
signature to accept &reqwest::Client and remove the per-call build; update all
references that call the rpc routine to use the shared client and remove the
map_err(build) path so errors only occur on initial client creation.

In `@app/src-tauri/src/meet_audio/speak_pump.rs`:
- Line 57: Remove the redundant rebinding "let mut cdp = cdp;" in speak_pump.rs
and use the existing owned mutable variable cdp directly; locate the usage in
the function or block where cdp is passed by value (search for the symbol cdp
and the rebinding) and delete that line so subsequent mutable operations
reference the original cdp binding.

In `@app/src-tauri/vendor/tauri-cef`:
- Line 1: Document that the project uses a custom fork named
"tinyhumansai/tauri-cef" (the tauri-cef submodule) because of added audio
handler functionality; add a short rationale and summary of the fork (what was
changed, why) into the repo README or an ADR, add an UPSTREAM_SYNC or
MAINTENANCE plan that states whether changes will be upstreamed or how/when
security patches will be pulled from upstream, and ensure the exact diffs
between upstream tauri-cef and the fork are tracked (e.g., a CHANGELOG or a
git-diff snapshot and a pointer to the submodule commit) so reviewers can
inspect the audio handler changes and follow the syncing/upstreaming plan.

In `@src/openhuman/meet_agent/rpc.rs`:
- Around line 64-70: The spawned task using tokio::spawn for
brain::run_turn(&request_id) can panic and the panic will be silently aborted;
wrap the run_turn call in a panic-safe wrapper (use std::panic::AssertUnwindSafe
+ tokio::spawn(async move { let result = tokio::spawn or
futures::FutureExt::catch_unwind on the async block }).await or
futures::FutureExt::catch_unwind) and log both panic and Err cases so failures
surface: call brain::run_turn(&request_id).await inside a catch_unwind, then log
panics with the request_id and log the Err variant as you already do, ensuring
the tokio::spawn body handles and records both panic and normal error outcomes.

In `@tests/json_rpc_e2e.rs`:
- Around line 3931-4094: Add a new tokio::test that mirrors
json_rpc_meet_agent_session_lifecycle but exercises the caption/wake-word flow:
call openhuman.meet_agent_start_session (same setup), then send a wake-word
caption via openhuman.meet_agent_push_caption (use the same request_id pattern),
poll openhuman.meet_agent_poll_speech until non-empty pcm_base64 is returned,
then call openhuman.meet_agent_stop_session and assert listened_seconds > 0 and
turn_count == 1; ensure you also test that stopping a non-existent session
errors (reuse the bogus stop check). Locate the new test near
json_rpc_meet_agent_session_lifecycle and reuse helpers post_json_rpc,
assert_no_jsonrpc_error, assert_jsonrpc_error and the same B64/EnvVar setup so
it runs under the same ephemeral RPC server harness.
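The persistent-buffer plus evict-oldest policy described for listen_capture.rs above can be modeled with std alone — a VecDeque stands in for the bounded channel, and the constants are hypothetical (100 ms chunks of 16-bit mono at 16 kHz):

```rust
use std::collections::VecDeque;

const FLUSH_SAMPLES: usize = 1600; // hypothetical: 100 ms at 16 kHz mono
const CHUNK_BYTES: usize = FLUSH_SAMPLES * 2; // 16-bit samples
const QUEUE_CAPACITY: usize = 8;

// Accumulates arbitrarily sized PCM packets into fixed-size chunks and
// forwards them into a bounded queue, evicting the oldest chunk when the
// queue is full so the newest audio always gets through.
pub struct PcmForwarder {
    pub pending: Vec<u8>,
    pub queue: VecDeque<Vec<u8>>,
}

impl PcmForwarder {
    pub fn new() -> Self {
        Self { pending: Vec::new(), queue: VecDeque::new() }
    }

    pub fn push(&mut self, pcm_bytes: &[u8]) {
        // Undersized CEF packets just accumulate until a full chunk exists.
        self.pending.extend_from_slice(pcm_bytes);
        while self.pending.len() >= CHUNK_BYTES {
            let chunk: Vec<u8> = self.pending.drain(..CHUNK_BYTES).collect();
            if self.queue.len() == QUEUE_CAPACITY {
                self.queue.pop_front(); // evict oldest, keep newest audio
            }
            self.queue.push_back(chunk);
        }
    }
}
```

In the real module the queue would be the tokio channel; the equivalent of the eviction branch is handling `try_send`'s full-channel error by dropping the oldest buffered chunk and retrying, as the review comment suggests.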
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9eeb6c72-1f31-431b-885d-fc92db1f6324

📥 Commits

Reviewing files that changed from the base of the PR and between 319afbe and d152cdd.

📒 Files selected for processing (24)
  • app/src-tauri/src/lib.rs
  • app/src-tauri/src/meet_audio/audio_bridge.js
  • app/src-tauri/src/meet_audio/caption_listener.rs
  • app/src-tauri/src/meet_audio/captions_bridge.js
  • app/src-tauri/src/meet_audio/inject.rs
  • app/src-tauri/src/meet_audio/listen_capture.rs
  • app/src-tauri/src/meet_audio/mod.rs
  • app/src-tauri/src/meet_audio/speak_pump.rs
  • app/src-tauri/src/meet_call/mod.rs
  • app/src-tauri/vendor/tauri-cef
  • docs/MEET_AGENT_SMOKE.md
  • src/core/all.rs
  • src/openhuman/about_app/catalog.rs
  • src/openhuman/about_app/catalog_tests.rs
  • src/openhuman/meet_agent/brain.rs
  • src/openhuman/meet_agent/mod.rs
  • src/openhuman/meet_agent/ops.rs
  • src/openhuman/meet_agent/rpc.rs
  • src/openhuman/meet_agent/schemas.rs
  • src/openhuman/meet_agent/session.rs
  • src/openhuman/meet_agent/types.rs
  • src/openhuman/meet_agent/wav.rs
  • src/openhuman/mod.rs
  • tests/json_rpc_e2e.rs

- audio_bridge.js: clone destination tracks per getUserMedia call so
  Meet's track.stop() can't permanently kill the bridge; add stable
  [openhuman-audio-bridge] logs for install / context creation /
  interception branches / sampled feed cadence.
- caption_listener.rs: bubble push_caption RPC failures up so
  MAX_CONSECUTIVE_ERRORS can trip; previously a broken core session
  silently dropped captions forever.
- meet_agent::ops: lock validate_sample_rate to 16 kHz exactly
  (REQUIRED_SAMPLE_RATE) since brain.rs hard-codes the rate
  throughout (WAV header, MIN_TURN_SAMPLES, listened_seconds).
  brain.rs now sources the constant from ops so any future
  loosening of the boundary breaks the math at compile time.
- meet_agent/schemas.rs: rename wrap_* delegators to handle_* per
  the per-domain schemas.rs convention noted in CLAUDE.md.
- docs/MEET_AGENT_SMOKE.md: rewrite Step 4 + Listen path checks
  for the caption-driven flow (wake-word phrases, captions
  drained / wake word fired / caption turn done log lines,
  __openhumanCaptionsBridgeInfo introspection); call out that
  cef stream start / push_listen_pcm logs are NOT expected on
  the active path.

Dismissed (replied in thread): listen_capture.rs chunking /
backpressure suggestion — that module is now the inactive
_legacy_listen field; live listen path is captions-driven. Will
revisit if/when we re-enable CEF audio.
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/meet_agent/schemas.rs`:
- Around line 152-153: The schema description for push_caption in
src/openhuman/meet_agent/schemas.rs is stale—update the description string for
the push_caption endpoint/field to remove the claim that the wake-word dispatch
triggers an “LLM/TTS turn” and instead state that the wake-word gate triggers a
deterministic/canned hot-path response (or otherwise reflect current non-LLM
behavior); locate the push_caption description literal and replace the text
accordingly so API docs and operator expectations match runtime behavior.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fdaa81ae-767f-4a1d-8f6d-55b1af967699

📥 Commits

Reviewing files that changed from the base of the PR and between d152cdd and e279028.

📒 Files selected for processing (6)
  • app/src-tauri/src/meet_audio/audio_bridge.js
  • app/src-tauri/src/meet_audio/caption_listener.rs
  • docs/MEET_AGENT_SMOKE.md
  • src/openhuman/meet_agent/brain.rs
  • src/openhuman/meet_agent/ops.rs
  • src/openhuman/meet_agent/schemas.rs
✅ Files skipped from review due to trivial changes (3)
  • app/src-tauri/src/meet_audio/caption_listener.rs
  • docs/MEET_AGENT_SMOKE.md
  • app/src-tauri/src/meet_audio/audio_bridge.js

Comment on lines +152 to +153
description: "Push a caption line scraped from Meet's live captions DOM. The wake-word \
gate (\"hey openhuman\") triggers an LLM/TTS turn when fired.",

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Schema description is stale about LLM usage.

The push_caption description says wake-word dispatch triggers an “LLM/TTS turn”, but this PR’s behavior is deterministic/canned in the hot path. Updating this text will prevent misleading API docs and operator expectations.

✏️ Suggested text update
-        description: "Push a caption line scraped from Meet's live captions DOM. The wake-word \
-                      gate (\"hey openhuman\") triggers an LLM/TTS turn when fired.",
+        description: "Push a caption line scraped from Meet's live captions DOM. The wake-word \
+                      gate (\"hey openhuman\") triggers a reply/TTS turn when fired.",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-        description: "Push a caption line scraped from Meet's live captions DOM. The wake-word \
-                      gate (\"hey openhuman\") triggers an LLM/TTS turn when fired.",
+        description: "Push a caption line scraped from Meet's live captions DOM. The wake-word \
+                      gate (\"hey openhuman\") triggers a reply/TTS turn when fired.",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/meet_agent/schemas.rs` around lines 152 - 153, The schema
description for push_caption in src/openhuman/meet_agent/schemas.rs is
stale—update the description string for the push_caption endpoint/field to
remove the claim that the wake-word dispatch triggers an “LLM/TTS turn” and
instead state that the wake-word gate triggers a deterministic/canned hot-path
response (or otherwise reflect current non-LLM behavior); locate the
push_caption description literal and replace the text accordingly so API docs
and operator expectations match runtime behavior.

@senamakel senamakel merged commit 0636b0c into tinyhumansai:main May 8, 2026
21 checks passed