
feat(meet_agent): real LLM turns + tuned TTS for live meet voice#1358

Merged
senamakel merged 1 commit into tinyhumansai:main from senamakel:feat/meet-resp-improve
May 8, 2026

Conversation

@senamakel
Member

@senamakel senamakel commented May 8, 2026

Summary

  • Caption-driven turns now call the LLM with rolling meeting history instead of speaking a canned "Got it." / "Noted." ack; the model itself decides whether the latest utterance was directed at the agent (it returns an empty string to stay silent on false-positive wake words or side conversation).
  • New MEETING_SYSTEM_PROMPT replaces the "only acknowledge briefly" prompt with one that asks the model to respond conversationally for questions/tasks, briefly for dictations, and to emit speech-shaped (no markdown) text.
  • max_tokens 20 → 220, temperature 0.2 → 0.5; both run_turn (audio path) and run_caption_turn (caption path) share the same llm_meeting(prompt, history) call.
  • TTS quality: ReplySpeechOptions gains a voice_settings passthrough; the meet agent pins model_id=eleven_turbo_v2_5 with stability 0.4, similarity_boost 0.75, style 0.35, use_speaker_boost true — the previous defaults were the main reason the voice sounded flat/monotone.
  • strip_for_speech scrubs markdown punctuation, fenced code, and bullet markers before TTS.
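The helper isn't inlined in this description, but a std-only sketch of what strip_for_speech might look like is below. Every name and rule here is an assumption based on the summary above, not the merged code:

```rust
// Sketch only: the real strip_for_speech lives in
// src/openhuman/meet_agent/brain.rs; the exact rules are assumptions.
fn strip_for_speech(text: &str) -> String {
    let mut out = String::new();
    let mut in_fence = false;
    for line in text.lines() {
        let trimmed = line.trim_start();
        // Fenced code blocks read terribly aloud: drop them entirely.
        if trimmed.starts_with("```") {
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue;
        }
        // Strip leading bullet markers, then emphasis/backtick/heading chars.
        let body = trimmed
            .trim_start_matches(|c: char| c == '-' || c == '*' || c == '•')
            .trim_start();
        let cleaned: String = body
            .chars()
            .filter(|c| !matches!(c, '*' | '_' | '`' | '#'))
            .collect();
        let cleaned = cleaned.trim();
        if !cleaned.is_empty() {
            if !out.is_empty() {
                out.push(' ');
            }
            out.push_str(cleaned);
        }
    }
    out
}

fn main() {
    let reply = "**Sure!** Two options:\n- run the tests\n- check the logs";
    println!("{}", strip_for_speech(reply));
}
```

The key design point is that sanitization happens once, just before TTS, so the LLM prompt doesn't have to fight markdown habits baked into the model.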

Problem

The Google Meet agent shipped in #1355 was deliberately scoped down to a note-taker that says "Got it.": every caption-driven turn skipped the LLM entirely in favor of a hashed canned ack, the audio path's system prompt explicitly told the model never to expand on anything, and max_tokens=20 capped replies at roughly three words. Combined with the default ElevenLabs voice settings, the result felt dumb (it could not answer questions, had no awareness of meeting context, and never classified whether a wake-word match was a real address), and the voice sounded monotone.

Solution

Lift the artificial dumbness in the LLM step and tune the TTS step:

  • recent_dialog_history(events, window) pulls the last 12 Heard/Spoke events from MeetAgentSession::events() and shapes them into user/assistant chat-completions messages. Note events (errors, wake-word matches) are filtered out.
  • llm_meeting(prompt, history) posts system + history + user to /openai/v1/chat/completions. Both run_turn and run_caption_turn route through it. The caption path retains the canned-ack fallback only for the LLM error case.
  • The model is instructed to return an empty string when the utterance is not directed at it; session::record_event already records agent declined to respond in that branch and the existing tests around that path continue to pass.
  • strip_for_speech is applied to the LLM output before TTS so any markdown that leaks (asterisks, fenced code, bullets) doesn't get spoken aloud.
  • ReplySpeechOptions::voice_settings: Option<Value> is forwarded verbatim to the backend's /openai/v1/audio/speech endpoint, which proxies it to ElevenLabs. The single existing call site in voice/schemas.rs was updated to default to None.
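As a rough illustration of the history-shaping step described above, here is a std-only sketch. All type and variant names are assumptions; the real event types and ConversationTurn live in meet_agent:

```rust
// Sketch only: hypothetical stand-ins for MeetAgentSession event types.
#[derive(Clone)]
enum MeetEvent {
    Heard(String), // caption text from meeting participants
    Spoke(String), // text the agent itself said
    Note(String),  // diagnostics (errors, wake-word matches): filtered out
}

#[derive(Debug, PartialEq)]
struct ChatMessage {
    role: &'static str,
    content: String,
}

/// Map Heard -> user and Spoke -> assistant, drop Notes and empty text,
/// and keep only the `window` most recent turns in chronological order.
fn recent_dialog_history(events: &[MeetEvent], window: usize) -> Vec<ChatMessage> {
    let mut turns: Vec<ChatMessage> = events
        .iter()
        .filter_map(|e| match e {
            MeetEvent::Heard(t) if !t.trim().is_empty() => {
                Some(ChatMessage { role: "user", content: t.clone() })
            }
            MeetEvent::Spoke(t) if !t.trim().is_empty() => {
                Some(ChatMessage { role: "assistant", content: t.clone() })
            }
            _ => None,
        })
        .collect();
    if turns.len() > window {
        turns.drain(..turns.len() - window);
    }
    turns
}

fn main() {
    let events = vec![
        MeetEvent::Heard("hey openhuman, any blockers?".into()),
        MeetEvent::Note("wake-word matched".into()),
        MeetEvent::Spoke("None logged since Monday.".into()),
        MeetEvent::Heard("".into()), // empty captions are dropped
    ];
    for m in recent_dialog_history(&events, 12) {
        println!("{}: {}", m.role, m.content);
    }
}
```

llm_meeting would then send [system, history..., user] to /openai/v1/chat/completions, so both the audio and caption paths see the same conversational context.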

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) per docs/TESTING-STRATEGY.md
  • N/A: Diff coverage — changed lines are exercised by the new recent_dialog_history_* and strip_for_speech_* unit tests plus the existing run_turn_falls_back_to_stub_without_backend test which covers the LLM/TTS error fallback paths. Real-network LLM/TTS branches require backend creds and aren't run in CI; the offline fallback path is what's covered.
  • N/A: Coverage matrix updated — behaviour-only change to an existing feature row.
  • N/A: Affected feature IDs — behaviour-only change to existing meet_agent flows.
  • No new external network dependencies introduced (mock backend used per docs/TESTING-STRATEGY.md)
  • N/A: Manual smoke checklist — does not touch release-cut surfaces (no UI / no install path / no startup change).
  • N/A: Linked issue — drive-by improvement on top of feat(meet_agent): live note-taking agent for Google Meet (listen + speak) #1355, no tracking issue.

Impact

  • Runtime: desktop only (meet agent is desktop-scoped). LLM calls now carry up to ~12 prior turns plus a longer system prompt — input tokens per turn rise modestly; output capped at 220 tokens. Latency per caption turn becomes a real chat-completions roundtrip instead of a string lookup.
  • TTS cost: ElevenLabs char count rises proportionally to reply length (was ~10 chars/turn, now up to a couple hundred). Voice model bumped to eleven_turbo_v2_5.
  • Privacy: rolling-window meeting captions are sent to the chat-completions endpoint as conversation context. Same backend / same auth surface as before; nothing new leaves the device.
  • Compatibility: no RPC / schema changes; ReplySpeechOptions::voice_settings is additive (Option<Value>).

Related

  • Closes:
  • Follow-up PR(s)/TODOs: tool / skill registry access for the meet agent so it can actually do things (send a message, set a reminder, look up a doc) — currently the model can only answer in text.

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: feat/meet-resp-improve
  • Commit SHA: 54ba0b8

Validation Run

  • N/A: pnpm --filter openhuman-app format:check — Rust-only change.
  • N/A: pnpm typecheck — Rust-only change.
  • Focused tests: cargo test --manifest-path Cargo.toml --lib meet_agent (35 passing, including 4 new tests for recent_dialog_history and strip_for_speech)
  • Rust fmt/check (if changed): cargo fmt + cargo check --manifest-path Cargo.toml clean
  • N/A: Tauri fmt/check — app/src-tauri not modified.

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: live meet agent responds conversationally with meeting context awareness instead of speaking canned acks; voice no longer sounds monotone.
  • User-visible effect: when a user says "hey openhuman, what was the action item from last week?" mid-call, the agent actually answers instead of saying "Got it."; the voice has natural inflection.

Parity Contract

  • Legacy behavior preserved: LLM/TTS error paths still fall back gracefully (LLM error → canned ack; TTS error → stub blip), and the offline-no-backend fallback test still passes.
  • Guard/fallback/dispatch parity checks: existing wake-word state machine, cooldown, and outbound-queue contracts are unchanged.

Duplicate / Superseded PR Handling

  • Duplicate PR(s): N/A
  • Canonical PR: N/A
  • Resolution: N/A

Summary by CodeRabbit

Release Notes

  • New Features

    • Meeting agent now generates dynamic, context-aware responses using AI and recent conversation history instead of static acknowledgments
    • Automatic response optimization for improved speech synthesis clarity and naturalness
  • Tests

    • Added unit tests to validate conversation history handling and response formatting

The caption-driven path used to skip the LLM entirely and speak a
canned "Got it." / "Noted." ack, and the audio path sent each turn
in isolation with max_tokens=20 and an "only acknowledge briefly"
system prompt. Result: the in-meeting agent felt dumb and monotone.

- Both run_turn and run_caption_turn now call llm_meeting with a
  rolling 12-event Heard/Spoke history pulled from the session log,
  so the model sees the conversation it's joining.
- New MEETING_SYSTEM_PROMPT pushes the model to (a) classify whether
  the latest utterance is actually directed at it (returns empty
  string to stay silent on false-positive wake words / side
  conversation), and (b) respond conversationally for questions and
  tasks, briefly for dictations.
- max_tokens 20 -> 220, temperature 0.2 -> 0.5.
- strip_for_speech scrubs markdown / fenced code / bullet markers
  before TTS.
- ReplySpeechOptions gains voice_settings passthrough; the meet
  agent pins model_id=eleven_turbo_v2_5 with stability 0.4,
  similarity_boost 0.75, style 0.35, speaker_boost on — the previous
  defaults were the main reason the voice sounded flat.
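Forwarded verbatim to ElevenLabs, the pinned settings above would amount to a payload shaped roughly like this (field names follow ElevenLabs' documented voice_settings object; how it nests alongside model_id in the backend request is an assumption):

```json
{
  "model_id": "eleven_turbo_v2_5",
  "voice_settings": {
    "stability": 0.4,
    "similarity_boost": 0.75,
    "style": 0.35,
    "use_speaker_boost": true
  }
}
```

Lower stability plus a nonzero style is what gives the voice its inflection; the defaults err toward flat, consistent delivery.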
@senamakel senamakel requested a review from a team May 8, 2026 04:49
@coderabbitai
Contributor

coderabbitai Bot commented May 8, 2026

Review Change Stack
No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between 0636b0c and 54ba0b8.

📒 Files selected for processing (3)
  • src/openhuman/meet_agent/brain.rs
  • src/openhuman/voice/reply_speech.rs
  • src/openhuman/voice/schemas.rs

📝 Walkthrough

Meeting agent turns now invoke an LLM with rolling conversation history instead of deterministic acknowledgements. The LLM response is sanitized for TTS safety by removing markdown and code fences. Voice synthesis now supports optional voice settings passed through ReplySpeechOptions.
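The conditional-injection shape could look like the std-only sketch below. The real synthesize_reply presumably serializes with serde_json; the function name and body layout here are assumptions, kept dependency-free just to show the shape:

```rust
// Sketch only: build a speech request body and append voice_settings
// only when the caller supplied one (mirroring Option<Value> passthrough).
fn speech_request_body(text: &str, voice_settings: Option<&str>) -> String {
    // {:?} yields a quoted, escaped string, close enough to JSON for ASCII.
    let mut body = format!(r#"{{"model_id":"eleven_turbo_v2_5","input":{:?}"#, text);
    if let Some(vs) = voice_settings {
        // Forward the caller-supplied JSON object verbatim.
        body.push_str(",\"voice_settings\":");
        body.push_str(vs);
    }
    body.push('}');
    body
}

fn main() {
    let pinned =
        r#"{"stability":0.4,"similarity_boost":0.75,"style":0.35,"use_speaker_boost":true}"#;
    println!("{}", speech_request_body("On it.", Some(pinned)));
    println!("{}", speech_request_body("On it.", None));
}
```

Keeping the field optional is what lets the single existing call site in voice/schemas.rs pass None and remain byte-for-byte compatible.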

Changes

Meeting Agent LLM-Driven Caption Turns with Voice Settings

  • Voice Settings Type (src/openhuman/voice/reply_speech.rs): ReplySpeechOptions gains an optional voice_settings: Option<Value> field to support passthrough voice configuration to the speech backend.
  • LLM and Speech Processing (src/openhuman/meet_agent/brain.rs): new MEETING_SYSTEM_PROMPT constant and llm_meeting(prompt, history) function build chat-completion messages and extract LLM responses; strip_for_speech removes markdown, code fences, and selected punctuation to make LLM output safe for TTS.
  • Dialog History Extraction (src/openhuman/meet_agent/brain.rs): ConversationTurn type and recent_dialog_history(events, window) function map Heard/Spoke events to user/assistant roles, filter empty text, drop Note events, and return up to window most recent entries in chronological order for LLM context.
  • Turn Orchestration with LLM (src/openhuman/meet_agent/brain.rs): run_turn drains inbound PCM samples and computes dialog history in session scope; run_caption_turn calls llm_meeting with the drained prompt and history context, and on LLM failure records a Note and falls back to the deterministic acknowledgement phrase.
  • TTS Model Selection and Voice Settings Injection (src/openhuman/meet_agent/brain.rs, src/openhuman/voice/reply_speech.rs, src/openhuman/voice/schemas.rs): TTS synthesis selects an explicit ElevenLabs model via TTS_MODEL_ID; synthesize_reply conditionally injects voice_settings into the request body when present; the handler explicitly passes voice_settings: None in constructed options.
  • Unit Tests: History and Speech Sanitization (src/openhuman/meet_agent/brain.rs): tests validate recent_dialog_history role mapping, window capping, and empty-text filtering, plus strip_for_speech removal of markdown code fences and selected punctuation characters.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • tinyhumansai/openhuman#1355: Directly modifies the same meeting agent brain logic to replace canned caption acknowledgements with LLM-driven turns, updating run_caption_turn and run_turn orchestration, prompts, TTS sanitization, and test coverage.


🚥 Pre-merge checks: ✅ 5 passed

  • Description Check: passed (skipped; CodeRabbit's high-level summary is enabled).
  • Title Check: passed. The title directly and accurately describes the main changes: replacing canned acknowledgements with real LLM-driven conversational turns and tuning TTS settings for live meeting voice interactions.
  • Docstring Coverage: passed (no functions found in the changed files to evaluate; check skipped).
  • Linked Issues Check: passed (skipped; no linked issues were found for this pull request).
  • Out of Scope Changes Check: passed (skipped; no linked issues were found for this pull request).




@senamakel senamakel merged commit f120347 into tinyhumansai:main May 8, 2026
19 of 21 checks passed
