
feat(meet_agent): real LLM turns + tuned TTS for live meet voice#1358

Merged
senamakel merged 1 commit into tinyhumansai:main from senamakel:feat/meet-resp-improve
May 8, 2026

Conversation

@senamakel
Member

@senamakel senamakel commented May 8, 2026

Summary

  • Caption-driven turns now call the LLM with rolling meeting history instead of speaking a canned "Got it." / "Noted." ack; the model itself decides whether the latest utterance was directed at the agent (it returns an empty string to stay silent on false-positive wake words or side conversation).
  • New MEETING_SYSTEM_PROMPT replaces the "only acknowledge briefly" prompt with one that asks the model to respond conversationally for questions/tasks, briefly for dictations, and to emit speech-shaped (no markdown) text.
  • max_tokens 20 → 220, temperature 0.2 → 0.5; both run_turn (audio path) and run_caption_turn (caption path) share the same llm_meeting(prompt, history) call.
  • TTS quality: ReplySpeechOptions gains a voice_settings passthrough; the meet agent pins model_id=eleven_turbo_v2_5 with stability 0.4, similarity_boost 0.75, style 0.35, use_speaker_boost true — the previous defaults were the main reason the voice sounded flat/monotone.
  • strip_for_speech scrubs markdown punctuation, fenced code, and bullet markers before TTS.
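The helper isn't inlined in this description, but a std-only sketch of what strip_for_speech might look like is below. Every name and rule here is an assumption based on the summary above, not the merged code:

```rust
// Sketch only: the real strip_for_speech lives in
// src/openhuman/meet_agent/brain.rs; the exact rules are assumptions.
fn strip_for_speech(text: &str) -> String {
    let mut out = String::new();
    let mut in_fence = false;
    for line in text.lines() {
        let trimmed = line.trim_start();
        // Fenced code blocks read terribly aloud: drop them entirely.
        if trimmed.starts_with("```") {
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue;
        }
        // Strip leading bullet markers, then emphasis/backtick/heading chars.
        let body = trimmed
            .trim_start_matches(|c: char| c == '-' || c == '*' || c == '•')
            .trim_start();
        let cleaned: String = body
            .chars()
            .filter(|c| !matches!(c, '*' | '_' | '`' | '#'))
            .collect();
        let cleaned = cleaned.trim();
        if !cleaned.is_empty() {
            if !out.is_empty() {
                out.push(' ');
            }
            out.push_str(cleaned);
        }
    }
    out
}

fn main() {
    let reply = "**Sure!** Two options:\n- run the tests\n- check the logs";
    println!("{}", strip_for_speech(reply));
}
```

The key design point is that sanitization happens once, just before TTS, so the LLM prompt doesn't have to fight markdown habits baked into the model.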

Problem

The Google Meet agent shipped in #1355 was deliberately scoped down to a note-taker that says "Got it.": every caption-driven turn skipped the LLM entirely in favor of a hashed canned ack, the audio path's system prompt explicitly told the model never to expand on anything, and max_tokens=20 capped replies at roughly three words. Combined with the default ElevenLabs voice settings, the result felt dumb (it could not answer questions, had no awareness of meeting context, and never classified whether a wake-word match was a real address), and the voice sounded monotone.

Solution

Lift the artificial dumbness in the LLM step and tune the TTS step:

  • recent_dialog_history(events, window) pulls the last 12 Heard/Spoke events from MeetAgentSession::events() and shapes them into user/assistant chat-completions messages. Note events (errors, wake-word matches) are filtered out.
  • llm_meeting(prompt, history) posts system + history + user to /openai/v1/chat/completions. Both run_turn and run_caption_turn route through it. The caption path retains the canned-ack fallback only for the LLM error case.
  • The model is instructed to return an empty string when the utterance is not directed at it; session::record_event already records agent declined to respond in that branch and the existing tests around that path continue to pass.
  • strip_for_speech is applied to the LLM output before TTS so any markdown that leaks (asterisks, fenced code, bullets) doesn't get spoken aloud.
  • ReplySpeechOptions::voice_settings: Option<Value> is forwarded verbatim to the backend's /openai/v1/audio/speech endpoint, which proxies it to ElevenLabs. The single existing call site in voice/schemas.rs was updated to default to None.
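As a rough illustration of the history-shaping step described above, here is a std-only sketch. All type and variant names are assumptions; the real event types and ConversationTurn live in meet_agent:

```rust
// Sketch only: hypothetical stand-ins for MeetAgentSession event types.
#[derive(Clone)]
enum MeetEvent {
    Heard(String), // caption text from meeting participants
    Spoke(String), // text the agent itself said
    Note(String),  // diagnostics (errors, wake-word matches): filtered out
}

#[derive(Debug, PartialEq)]
struct ChatMessage {
    role: &'static str,
    content: String,
}

/// Map Heard -> user and Spoke -> assistant, drop Notes and empty text,
/// and keep only the `window` most recent turns in chronological order.
fn recent_dialog_history(events: &[MeetEvent], window: usize) -> Vec<ChatMessage> {
    let mut turns: Vec<ChatMessage> = events
        .iter()
        .filter_map(|e| match e {
            MeetEvent::Heard(t) if !t.trim().is_empty() => {
                Some(ChatMessage { role: "user", content: t.clone() })
            }
            MeetEvent::Spoke(t) if !t.trim().is_empty() => {
                Some(ChatMessage { role: "assistant", content: t.clone() })
            }
            _ => None,
        })
        .collect();
    if turns.len() > window {
        turns.drain(..turns.len() - window);
    }
    turns
}

fn main() {
    let events = vec![
        MeetEvent::Heard("hey openhuman, any blockers?".into()),
        MeetEvent::Note("wake-word matched".into()),
        MeetEvent::Spoke("None logged since Monday.".into()),
        MeetEvent::Heard("".into()), // empty captions are dropped
    ];
    for m in recent_dialog_history(&events, 12) {
        println!("{}: {}", m.role, m.content);
    }
}
```

llm_meeting would then send [system, history..., user] to /openai/v1/chat/completions, so both the audio and caption paths see the same conversational context.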

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) per docs/TESTING-STRATEGY.md
  • N/A: Diff coverage — changed lines are exercised by the new recent_dialog_history_* and strip_for_speech_* unit tests plus the existing run_turn_falls_back_to_stub_without_backend test which covers the LLM/TTS error fallback paths. Real-network LLM/TTS branches require backend creds and aren't run in CI; the offline fallback path is what's covered.
  • N/A: Coverage matrix updated — behaviour-only change to an existing feature row.
  • N/A: Affected feature IDs — behaviour-only change to existing meet_agent flows.
  • No new external network dependencies introduced (mock backend used per docs/TESTING-STRATEGY.md)
  • N/A: Manual smoke checklist — does not touch release-cut surfaces (no UI / no install path / no startup change).
  • N/A: Linked issue — drive-by improvement on top of feat(meet_agent): live note-taking agent for Google Meet (listen + speak) #1355, no tracking issue.

Impact

  • Runtime: desktop only (meet agent is desktop-scoped). LLM calls now carry up to ~12 prior turns plus a longer system prompt — input tokens per turn rise modestly; output capped at 220 tokens. Latency per caption turn becomes a real chat-completions roundtrip instead of a string lookup.
  • TTS cost: ElevenLabs char count rises proportionally to reply length (was ~10 chars/turn, now up to a couple hundred). Voice model bumped to eleven_turbo_v2_5.
  • Privacy: rolling-window meeting captions are sent to the chat-completions endpoint as conversation context. Same backend / same auth surface as before; nothing new leaves the device.
  • Compatibility: no RPC / schema changes; ReplySpeechOptions::voice_settings is additive (Option<Value>).

Related

  • Closes:
  • Follow-up PR(s)/TODOs: tool / skill registry access for the meet agent so it can actually do things (send a message, set a reminder, look up a doc) — currently the model can only answer in text.

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: feat/meet-resp-improve
  • Commit SHA: 54ba0b8

Validation Run

  • N/A: pnpm --filter openhuman-app format:check — Rust-only change.
  • N/A: pnpm typecheck — Rust-only change.
  • Focused tests: cargo test --manifest-path Cargo.toml --lib meet_agent (35 passing, including 4 new tests for recent_dialog_history and strip_for_speech)
  • Rust fmt/check (if changed): cargo fmt + cargo check --manifest-path Cargo.toml clean
  • N/A: Tauri fmt/check — app/src-tauri not modified.

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: live meet agent responds conversationally with meeting context awareness instead of speaking canned acks; voice no longer sounds monotone.
  • User-visible effect: when a user says "hey openhuman, what was the action item from last week?" mid-call, the agent actually answers instead of saying "Got it."; the voice has natural inflection.

Parity Contract

  • Legacy behavior preserved: LLM/TTS error paths still fall back gracefully (LLM error → canned ack; TTS error → stub blip), and the offline-no-backend fallback test still passes.
  • Guard/fallback/dispatch parity checks: existing wake-word state machine, cooldown, and outbound-queue contracts are unchanged.

Duplicate / Superseded PR Handling

  • Duplicate PR(s): N/A
  • Canonical PR: N/A
  • Resolution: N/A

Summary by CodeRabbit

Release Notes

  • New Features

    • Meeting agent now generates dynamic, context-aware responses using AI and recent conversation history instead of static acknowledgments
    • Automatic response optimization for improved speech synthesis clarity and naturalness
  • Tests

    • Added unit tests to validate conversation history handling and response formatting

The caption-driven path used to skip the LLM entirely and speak a
canned "Got it." / "Noted." ack, and the audio path sent each turn
in isolation with max_tokens=20 and an "only acknowledge briefly"
system prompt. Result: the in-meeting agent felt dumb and monotone.

- Both run_turn and run_caption_turn now call llm_meeting with a
  rolling 12-event Heard/Spoke history pulled from the session log,
  so the model sees the conversation it's joining.
- New MEETING_SYSTEM_PROMPT pushes the model to (a) classify whether
  the latest utterance is actually directed at it (returns empty
  string to stay silent on false-positive wake words / side
  conversation), and (b) respond conversationally for questions and
  tasks, briefly for dictations.
- max_tokens 20 -> 220, temperature 0.2 -> 0.5.
- strip_for_speech scrubs markdown / fenced code / bullet markers
  before TTS.
- ReplySpeechOptions gains voice_settings passthrough; the meet
  agent pins model_id=eleven_turbo_v2_5 with stability 0.4,
  similarity_boost 0.75, style 0.35, speaker_boost on — the previous
  defaults were the main reason the voice sounded flat.
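Forwarded verbatim to ElevenLabs, the pinned settings above would amount to a payload shaped roughly like this (field names follow ElevenLabs' documented voice_settings object; how it nests alongside model_id in the backend request is an assumption):

```json
{
  "model_id": "eleven_turbo_v2_5",
  "voice_settings": {
    "stability": 0.4,
    "similarity_boost": 0.75,
    "style": 0.35,
    "use_speaker_boost": true
  }
}
```

Lower stability plus a nonzero style is what gives the voice its inflection; the defaults err toward flat, consistent delivery.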
@senamakel senamakel requested a review from a team May 8, 2026 04:49
@coderabbitai
Contributor

coderabbitai Bot commented May 8, 2026

Review Change Stack
No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between 0636b0c and 54ba0b8.

📒 Files selected for processing (3)
  • src/openhuman/meet_agent/brain.rs
  • src/openhuman/voice/reply_speech.rs
  • src/openhuman/voice/schemas.rs

📝 Walkthrough

Meeting agent turns now invoke an LLM with rolling conversation history instead of deterministic acknowledgements. The LLM response is sanitized for TTS safety by removing markdown and code fences. Voice synthesis now supports optional voice settings passed through ReplySpeechOptions.
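The conditional-injection shape could look like the std-only sketch below. The real synthesize_reply presumably serializes with serde_json; the function name and body layout here are assumptions, kept dependency-free just to show the shape:

```rust
// Sketch only: build a speech request body and append voice_settings
// only when the caller supplied one (mirroring Option<Value> passthrough).
fn speech_request_body(text: &str, voice_settings: Option<&str>) -> String {
    // {:?} yields a quoted, escaped string, close enough to JSON for ASCII.
    let mut body = format!(r#"{{"model_id":"eleven_turbo_v2_5","input":{:?}"#, text);
    if let Some(vs) = voice_settings {
        // Forward the caller-supplied JSON object verbatim.
        body.push_str(",\"voice_settings\":");
        body.push_str(vs);
    }
    body.push('}');
    body
}

fn main() {
    let pinned =
        r#"{"stability":0.4,"similarity_boost":0.75,"style":0.35,"use_speaker_boost":true}"#;
    println!("{}", speech_request_body("On it.", Some(pinned)));
    println!("{}", speech_request_body("On it.", None));
}
```

Keeping the field optional is what lets the single existing call site in voice/schemas.rs pass None and remain byte-for-byte compatible.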

Changes

Meeting Agent LLM-Driven Caption Turns with Voice Settings

  • Voice Settings Type (src/openhuman/voice/reply_speech.rs): ReplySpeechOptions gains an optional voice_settings: Option<Value> field to support passthrough voice configuration to the speech backend.
  • LLM and Speech Processing (src/openhuman/meet_agent/brain.rs): new MEETING_SYSTEM_PROMPT constant and llm_meeting(prompt, history) function build chat-completion messages and extract LLM responses; strip_for_speech removes markdown, code fences, and selected punctuation to make LLM output safe for TTS.
  • Dialog History Extraction (src/openhuman/meet_agent/brain.rs): ConversationTurn type and recent_dialog_history(events, window) function map Heard/Spoke events to user/assistant roles, filter empty text, drop Note events, and return up to window most recent entries in chronological order for LLM context.
  • Turn Orchestration with LLM (src/openhuman/meet_agent/brain.rs): run_turn drains inbound PCM samples and computes dialog history in session scope; run_caption_turn calls llm_meeting with the drained prompt and history context, and on LLM failure records a Note and falls back to the deterministic acknowledgement phrase.
  • TTS Model Selection and Voice Settings Injection (src/openhuman/meet_agent/brain.rs, src/openhuman/voice/reply_speech.rs, src/openhuman/voice/schemas.rs): TTS synthesis selects an explicit ElevenLabs model via TTS_MODEL_ID; synthesize_reply conditionally injects voice_settings into the request body when present; the handler explicitly passes voice_settings: None in constructed options.
  • Unit Tests: History and Speech Sanitization (src/openhuman/meet_agent/brain.rs): tests validate recent_dialog_history role mapping, window capping, and empty-text filtering, plus strip_for_speech removal of markdown code fences and selected punctuation characters.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • tinyhumansai/openhuman#1355: Directly modifies the same meeting agent brain logic to replace canned caption acknowledgements with LLM-driven turns, updating run_caption_turn and run_turn orchestration, prompts, TTS sanitization, and test coverage.


🚥 Pre-merge checks: ✅ 5 passed

  • Description Check: passed (skipped; CodeRabbit's high-level summary is enabled).
  • Title Check: passed. The title directly and accurately describes the main changes: replacing canned acknowledgements with real LLM-driven conversational turns and tuning TTS settings for live meeting voice interactions.
  • Docstring Coverage: passed (no functions found in the changed files to evaluate; check skipped).
  • Linked Issues Check: passed (skipped; no linked issues were found for this pull request).
  • Out of Scope Changes Check: passed (skipped; no linked issues were found for this pull request).




@senamakel senamakel merged commit f120347 into tinyhumansai:main May 8, 2026
19 of 21 checks passed
