feat(meet_agent): real LLM turns + tuned TTS for live meet voice#1358
feat(meet_agent): real LLM turns + tuned TTS for live meet voice#1358senamakel merged 1 commit intotinyhumansai:mainfrom
The caption-driven path used to skip the LLM entirely and speak a canned "Got it." / "Noted." ack, and the audio path sent each turn in isolation with `max_tokens=20` and an "only acknowledge briefly" system prompt. Result: the in-meeting agent felt dumb and monotone.

- Both `run_turn` and `run_caption_turn` now call `llm_meeting` with a rolling 12-event Heard/Spoke history pulled from the session log, so the model sees the conversation it's joining.
- The new `MEETING_SYSTEM_PROMPT` pushes the model to (a) classify whether the latest utterance is actually directed at it (it returns an empty string to stay silent on false-positive wake words or side conversation), and (b) respond conversationally to questions and tasks, briefly to dictations.
- `max_tokens` 20 -> 220, `temperature` 0.2 -> 0.5.
- `strip_for_speech` scrubs markdown, fenced code, and bullet markers before TTS.
- `ReplySpeechOptions` gains a `voice_settings` passthrough; the meet agent pins `model_id=eleven_turbo_v2_5` with stability 0.4, similarity_boost 0.75, style 0.35, and speaker boost on. The previous defaults were the main reason the voice sounded flat.
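The markdown scrubbing mentioned above can be sketched as follows. This is an illustrative reimplementation, not the PR's actual code; only the function name `strip_for_speech` and its job (keep fenced code, emphasis, and bullet markers out of the spoken reply) come from the PR.

```rust
/// Illustrative sketch of the kind of cleanup `strip_for_speech` performs:
/// drop fenced code blocks entirely, then strip bullet markers and markdown
/// emphasis characters so the TTS engine never reads them aloud.
fn strip_for_speech(text: &str) -> String {
    let mut out = Vec::new();
    let mut in_fence = false;
    for line in text.lines() {
        let trimmed = line.trim_start();
        if trimmed.starts_with("```") {
            // Toggle on both the opening and the closing fence line.
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue; // never speak code
        }
        // Strip a leading bullet marker, then inline markdown characters.
        let no_bullet = trimmed
            .strip_prefix("- ")
            .or_else(|| trimmed.strip_prefix("* "))
            .unwrap_or(trimmed);
        let cleaned: String = no_bullet
            .chars()
            .filter(|c| !matches!(c, '*' | '_' | '`' | '#'))
            .collect();
        if !cleaned.trim().is_empty() {
            out.push(cleaned.trim().to_string());
        }
    }
    out.join(" ")
}
```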
Walkthrough: Meeting agent turns now invoke an LLM with rolling conversation history instead of deterministic acknowledgements. The LLM response is sanitized for TTS safety by removing markdown and code fences. Voice synthesis now supports optional voice settings passed through `ReplySpeechOptions`.
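The rolling-history shaping described above can be sketched as follows. The `SessionEvent` and `ChatMessage` types here are assumptions invented for the example; only the `recent_dialog_history` name, the `Heard`/`Spoke`/`Note` event kinds, the 12-event window, and the user/assistant mapping come from the PR.

```rust
// Hypothetical stand-ins for the session's event and chat-message types.
enum SessionEvent {
    Heard(String), // transcribed speech from the meeting
    Spoke(String), // what the agent said back
    Note(String),  // internal notes (errors, wake-word matches): excluded
}

#[derive(Debug, PartialEq)]
struct ChatMessage {
    role: &'static str,
    content: String,
}

/// Keep the last `window` Heard/Spoke events (Notes filtered out) and map
/// them to user/assistant chat-completions messages, oldest first.
fn recent_dialog_history(events: &[SessionEvent], window: usize) -> Vec<ChatMessage> {
    let dialog: Vec<&SessionEvent> = events
        .iter()
        .filter(|e| !matches!(e, SessionEvent::Note(_)))
        .collect();
    let start = dialog.len().saturating_sub(window);
    dialog[start..]
        .iter()
        .map(|e| match e {
            SessionEvent::Heard(t) => ChatMessage { role: "user", content: t.clone() },
            SessionEvent::Spoke(t) => ChatMessage { role: "assistant", content: t.clone() },
            SessionEvent::Note(_) => unreachable!("Note events were filtered out"),
        })
        .collect()
}
```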
Summary
- Caption-driven turns no longer answer with a canned `Got it.`/`Noted.`; the model itself decides whether the latest utterance was directed at the agent (it returns an empty string to stay silent on false-positive wake words or side conversation).
- `MEETING_SYSTEM_PROMPT` replaces the "only acknowledge briefly" prompt with one that asks the model to respond conversationally to questions and tasks, briefly to dictations, and to emit speech-shaped (no markdown) text.
- `max_tokens` 20 → 220, `temperature` 0.2 → 0.5; both `run_turn` (audio path) and `run_caption_turn` (caption path) share the same `llm_meeting(prompt, history)` call.
- `ReplySpeechOptions` gains a `voice_settings` passthrough; the meet agent pins `model_id=eleven_turbo_v2_5` with `stability 0.4`, `similarity_boost 0.75`, `style 0.35`, `use_speaker_boost true`. The previous defaults were the main reason the voice sounded flat/monotone.
- `strip_for_speech` scrubs markdown punctuation, fenced code, and bullet markers before TTS.

Problem
The Google Meet agent shipped in #1355 was deliberately scoped down to a note-taker that says "Got it.": every caption-driven turn skipped the LLM entirely in favor of a hashed canned ack, the audio path's system prompt explicitly told the model to never expand on anything, and `max_tokens=20` capped replies at ~3 words. Combined with default ElevenLabs voice settings, the result felt dumb (it could not actually answer questions, had no awareness of meeting context, and did no intent classification on whether the wake-word match was a real address), and the voice sounded monotone.

Solution
Lift the artificial dumbness in the LLM step and tune the TTS step:
- `recent_dialog_history(events, window)` pulls the last 12 `Heard`/`Spoke` events from `MeetAgentSession::events()` and shapes them into `user`/`assistant` chat-completions messages. `Note` events (errors, wake-word matches) are filtered out.
- `llm_meeting(prompt, history)` posts `system + history + user` to `/openai/v1/chat/completions`. Both `run_turn` and `run_caption_turn` route through it. The caption path retains the canned-ack fallback only for the LLM error case; `session::record_event` already records `agent declined to respond` in that branch, and the existing tests around that path continue to pass.
- `strip_for_speech` is applied to the LLM output before TTS so any markdown that leaks (asterisks, fenced code, bullets) doesn't get spoken aloud.
- `ReplySpeechOptions::voice_settings: Option<Value>` is forwarded verbatim to the backend's `/openai/v1/audio/speech` endpoint, which proxies it to ElevenLabs. The single existing call site in `voice/schemas.rs` was updated to default to `None`.

Submission Checklist
- Tests per `docs/TESTING-STRATEGY.md`: `recent_dialog_history_*` and `strip_for_speech_*` unit tests, plus the existing `run_turn_falls_back_to_stub_without_backend` test, which covers the LLM/TTS error fallback paths.
- Real-network LLM/TTS branches require backend creds and aren't run in CI; the offline fallback path is what's covered (`docs/TESTING-STRATEGY.md`).

Impact
- TTS is pinned to `eleven_turbo_v2_5`.
- `ReplySpeechOptions::voice_settings` is additive (`Option<Value>`).

Related
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
Validation Run
- `pnpm --filter openhuman-app format:check`: n/a, Rust-only change.
- `pnpm typecheck`: n/a, Rust-only change.
- `cargo test --manifest-path Cargo.toml --lib meet_agent` (35 passing, including 4 new tests for `recent_dialog_history` and `strip_for_speech`).
- `cargo fmt` + `cargo check --manifest-path Cargo.toml` clean; `app/src-tauri` not modified.

Validation Blocked
Behavior Changes
Parity Contract
Duplicate / Superseded PR Handling