fix: continue recording after barge-in interrupts TTS #281

Open
aydiler wants to merge 1 commit into mbailey:feat/VM-606-barge-in-interrupt-tts-playback-when-user-starts from aydiler:fix/barge-in-continue-recording

aydiler commented Feb 20, 2026

Summary

  • Barge-in detection works correctly (TTS stops when the user speaks), but the captured audio buffer is only ~0.5s: just what was recorded during TTS playback before the interrupt fired
  • The current code skips normal recording when barge-in occurs, so everything the user says after TTS stops is lost
  • STT therefore receives only a tiny audio snippet and produces nonsensical transcriptions ("are.", "This.", "Bye.")

Fix

After barge-in interrupts TTS, continue recording with silence detection. This is the same as the normal flow, except that the pause and chime are skipped because the user is already speaking. The barge-in buffer is then prepended to the continuation audio before the combined audio is sent to STT.

Before: ~5-7KB audio, gibberish transcription
After: ~100-130KB audio, correct transcription of full sentence
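
The control flow described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `capture_utterance`, `record_until_silence`, and the use of raw `bytes` for audio are all hypothetical names and representations chosen for the example.

```python
from typing import Callable, Optional


def capture_utterance(
    barge_in_buffer: Optional[bytes],
    record_until_silence: Callable[[bool], bytes],
) -> bytes:
    """Return the audio that should be sent to STT.

    barge_in_buffer: audio captured during TTS playback, or None when TTS
        finished without being interrupted (assumed representation).
    record_until_silence: hypothetical recorder; its boolean argument says
        whether to play the pause and chime before recording.
    """
    if barge_in_buffer is None:
        # Normal flow: pause, chime, then record until silence.
        return record_until_silence(True)

    # Barge-in flow: the user is already speaking, so skip the pause and
    # chime and keep recording until silence is detected.
    continuation = record_until_silence(False)

    # Prepend the short barge-in snippet so the start of the utterance
    # is not lost.
    return barge_in_buffer + continuation
```

Passing the recorder in as a callable keeps the sketch independent of any particular audio backend and makes both paths easy to exercise with a stub.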

Test plan

  • Call converse with a long message and interrupt by speaking mid-TTS
  • Verify TTS stops promptly (<50ms)
  • Verify recording continues after TTS stops until silence is detected
  • Verify full sentence is transcribed correctly
  • Verify non-barge-in flow (wait for TTS to finish, then speak) still works

The barge-in monitor captures a small audio buffer (~0.5s) during TTS
playback until the interrupt triggers. However, the user is typically
still speaking after TTS stops. The current code skips normal recording
when barge-in occurs, sending only the tiny buffer to STT, which results
in poor or nonsensical transcriptions.

This fix continues recording with silence detection after TTS is
interrupted, then prepends the barge-in buffer to the continuation
audio. This ensures the user's full utterance is captured and
transcribed correctly.
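
The "continue recording with silence detection" step mentioned in the commit message could look roughly like the sketch below. It assumes 16-bit PCM chunks in native byte order and a simple RMS threshold; the function names, the threshold of 500, and the three-chunk silence window are illustrative assumptions, not values taken from this PR.

```python
import math
from array import array
from typing import Iterable, List


def rms(chunk: bytes) -> float:
    """Root-mean-square level of a chunk of 16-bit PCM samples.

    The chunk length must be a multiple of the sample size.
    """
    samples = array("h")
    samples.frombytes(chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def record_until_silence(
    chunks: Iterable[bytes],
    threshold: float = 500.0,      # assumed silence level
    max_silent_chunks: int = 3,    # assumed consecutive quiet chunks to stop
) -> bytes:
    """Collect chunks until enough consecutive quiet chunks are seen."""
    recorded: List[bytes] = []
    silent_run = 0
    for chunk in chunks:
        recorded.append(chunk)
        if rms(chunk) < threshold:
            silent_run += 1
            if silent_run >= max_silent_chunks:
                break
        else:
            # Speech resumed, so reset the silence counter.
            silent_run = 0
    return b"".join(recorded)
```

A short pause inside a sentence does not end the recording here, because the counter resets whenever a chunk rises above the threshold again.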