Core speech to speech implementation #5654

pranavjoshi001 · 2025-12-12T13:44:12Z

Changelog Entry

TBD

Description

This PR introduces Speech-to-Speech (S2S) functionality in Web Chat, enabling real-time voice conversations with bots. The implementation includes audio recording via AudioWorklet, audio playback with buffer queueing, and speech state management. This foundation supports upcoming MMRT (Multi-Modal Real-Time), ABS (Azure Bot Service), and CCV2 integration changes.

Activity structure: microsoft/Agents#377
Note: this PR uses value instead of payload until that proposal is merged.

Design

The Speech-to-Speech feature is built on three main components:

1. Voice Activities Hook (useVoiceActivities.ts) - Filters and provides voice-specific activities from the Redux store's voiceActivities slice
2. SpeechToSpeech Provider (SpeechToSpeechComposer.tsx) - A React context provider that manages:
   - Audio recording via useRecorder.ts hook using Web Audio API with AudioWorklet (CSP compliant)
   - Audio playback via useAudioPlayer.ts hook with proper queueing and timing
   - Speech state management (idle, listening, user_speaking, processing, bot_speaking)
   - Integration with DirectLine for sending audio chunks and handling voice events
3. useSpeechToSpeech Hook - Provides recording, setRecording, and speechState for consumer UI components
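
To illustrate what "queueing and timing" for playback can look like, here is a minimal sketch of a gapless scheduler. It is not taken from useAudioPlayer.ts: the class name PlaybackScheduler and its methods are hypothetical, and the sketch only computes start times so it can run without an audio device.

```typescript
// Hypothetical helper, not the code in useAudioPlayer.ts.
// It tracks when the previously queued chunk ends so the next one starts right after it.
class PlaybackScheduler {
  // Time (in seconds, on the audio clock) at which the last queued chunk finishes.
  private nextStartTime = 0;

  // Returns the start time for a chunk lasting `duration` seconds.
  schedule(currentTime: number, duration: number): number {
    // If the queue has drained, start now; otherwise start when the previous chunk ends.
    const startTime = Math.max(currentTime, this.nextStartTime);

    this.nextStartTime = startTime + duration;

    return startTime;
  }

  // Forgets pending timing, for example when playback is interrupted.
  reset(): void {
    this.nextStartTime = 0;
  }
}

const scheduler = new PlaybackScheduler();
const firstStart = scheduler.schedule(0, 0.5); // queue is empty, starts at the current time
const secondStart = scheduler.schedule(0.1, 0.5); // queued behind the first chunk
```

With the Web Audio API, the returned value would typically be passed to AudioBufferSourceNode.start(), using AudioContext.currentTime as the clock.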

Speech State Flow

idle → listening → user_speaking → processing → bot_speaking → listening
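
The flow above can be sketched as a small transition table. The state names come from this PR; the NEXT_STATE map and the advance function are hypothetical and cover only the happy path shown above, not interruptions such as barge-in.

```typescript
// State names as listed in this PR description.
type SpeechState = 'idle' | 'listening' | 'user_speaking' | 'processing' | 'bot_speaking';

// Hypothetical happy-path transition table; the provider may handle more transitions than this.
const NEXT_STATE: Record<SpeechState, SpeechState> = {
  idle: 'listening',
  listening: 'user_speaking',
  user_speaking: 'processing',
  processing: 'bot_speaking',
  bot_speaking: 'listening'
};

// Returns the state that follows `state` in the flow above.
function advance(state: SpeechState): SpeechState {
  return NEXT_STATE[state];
}

// Walks one full turn, starting from idle.
let state: SpeechState = 'idle';
const visited: SpeechState[] = [state];

for (let step = 0; step < 5; step++) {
  state = advance(state);
  visited.push(state);
}
```

After the first turn the loop returns to listening rather than idle, which matches the arrow at the end of the flow.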

The provider is designed to work with the existing Web Chat architecture, consuming voice activities from the Redux store and posting audio data through the postActivity hook.

Performance Optimization

A new voiceActivities slice has been introduced in the Redux store to optimize voice activity handling:

Separate Storage: Voice activities (non-transcript) are stored in a dedicated slice instead of the main activities array
Fire-and-Forget: Voice events like voice chunk delta, bot states, etc. bypass expensive sorting and grouping operations
Reduced Overhead: Keeps high-frequency voice events that don't need rendering, replaying, etc. from clogging the main activities reducer
Selective Processing: Only voice transcript activities (which need to be rendered in chat) go through the standard activity pipeline
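
The routing described above can be sketched as follows. This is an illustration only, not the reducer in this PR: the Activity and StoreState shapes, the Classifier interface, and routeActivity are hypothetical, and the sketch appends to the main list where the real pipeline sorts and groups.

```typescript
// Hypothetical, simplified shapes for illustration.
interface Activity {
  type: string;
  name?: string;
  value?: unknown;
}

interface StoreState {
  activities: readonly Activity[];
  voiceActivities: readonly Activity[];
}

// Predicates are passed in so this sketch does not assume how voice activities are detected.
interface Classifier {
  isVoice(activity: Activity): boolean;
  isTranscript(activity: Activity): boolean;
}

// Non-transcript voice events go to the dedicated slice; everything else goes to the main list.
function routeActivity(state: StoreState, activity: Activity, classify: Classifier): StoreState {
  if (classify.isVoice(activity) && !classify.isTranscript(activity)) {
    // Plain append: no sorting or grouping for high-frequency voice events.
    return { ...state, voiceActivities: [...state.voiceActivities, activity] };
  }

  // Stand-in for the standard activity pipeline.
  return { ...state, activities: [...state.activities, activity] };
}
```

Passing the predicates in keeps the routing decision separate from the activity format, which is still under discussion.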

Specific Changes

New Files Added:

Core Utilities (packages/core)

isVoiceActivity.ts - Type guard for voice/DTMF activities
isVoiceTranscriptActivity.ts - Type guard for transcript activities (voice.transcript)
getVoiceTranscriptRole.ts - Extract role (user/bot) from voice transcript
getVoiceTranscriptText.ts - Extract transcription text from voice activity
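
As a rough sketch of how these utilities could fit together: the function names and the voice.transcript name come from the list above, but the activity shape is an assumption. In particular, the 'event' type, the 'voice.' name prefix, and the role and text fields inside value are hypothetical, and DTMF handling is left out.

```typescript
// Hypothetical activity shape; the real structure is discussed in microsoft/Agents#377.
interface ActivityLike {
  type: string;
  name?: string;
  value?: unknown;
}

// Assumption: a transcript carries its role and text inside `value`.
interface VoiceTranscriptActivity extends ActivityLike {
  type: 'event';
  name: 'voice.transcript';
  value: { role: 'user' | 'bot'; text: string };
}

function isRecord(x: unknown): x is Record<string, unknown> {
  return typeof x === 'object' && x !== null;
}

// Assumption: voice activities are events whose name starts with "voice.".
function isVoiceActivity(activity: ActivityLike): boolean {
  return activity.type === 'event' && typeof activity.name === 'string' && activity.name.startsWith('voice.');
}

function isVoiceTranscriptActivity(activity: ActivityLike): activity is VoiceTranscriptActivity {
  if (activity.type !== 'event' || activity.name !== 'voice.transcript' || !isRecord(activity.value)) {
    return false;
  }

  const { role, text } = activity.value;

  return (role === 'user' || role === 'bot') && typeof text === 'string';
}

// Returns undefined when the activity is not a well-formed transcript.
function getVoiceTranscriptRole(activity: ActivityLike): 'user' | 'bot' | undefined {
  return isVoiceTranscriptActivity(activity) ? activity.value.role : undefined;
}

function getVoiceTranscriptText(activity: ActivityLike): string | undefined {
  return isVoiceTranscriptActivity(activity) ? activity.value.text : undefined;
}
```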

Provider & Hooks (packages/api)

SpeechToSpeechComposer.tsx - Main S2S provider component (integrated into Composer)
useSpeechToSpeech.ts - Public hook to consume S2S context
useVoiceActivities.ts - Hook to select voice activities from store
useAudioPlayer.ts - Audio playback with buffer queueing (Base64 → Int16 → Float32)
useRecorder.ts - AudioWorklet-based recording (Float32 → Int16 → Base64)
SpeechState.ts - Speech state type definition
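
The two conversion chains noted above (Float32 → Int16 → Base64 for recording, Base64 → Int16 → Float32 for playback) can be sketched like this. The function names are hypothetical and the code is not copied from useRecorder.ts or useAudioPlayer.ts; it assumes 16-bit PCM in the platform's byte order (little-endian on common hardware).

```typescript
// Hypothetical helpers illustrating the conversion chains; not the PR's actual code.

// Float32 samples in [-1, 1] → 16-bit PCM. Out-of-range samples are clamped.
function float32ToInt16(samples: Float32Array): Int16Array {
  return Int16Array.from(samples, sample => {
    const clamped = Math.max(-1, Math.min(1, sample));

    // Negative and positive ranges differ by one step in 16-bit PCM.
    return clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
  });
}

// 16-bit PCM → Float32 samples in [-1, 1).
function int16ToFloat32(pcm: Int16Array): Float32Array {
  return Float32Array.from(pcm, value => value / 0x8000);
}

// 16-bit PCM → Base64 string of the underlying bytes.
function int16ToBase64(pcm: Int16Array): string {
  const bytes = new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  let binary = '';

  bytes.forEach(byte => {
    binary += String.fromCharCode(byte);
  });

  return btoa(binary);
}

// Base64 string → 16-bit PCM. A trailing odd byte, if any, is ignored.
function base64ToInt16(base64: string): Int16Array {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);

  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }

  return new Int16Array(bytes.buffer, 0, Math.floor(bytes.length / 2));
}

// Recording direction, then playback direction.
const encoded = int16ToBase64(float32ToInt16(new Float32Array([0, 1, -1])));
const decoded = int16ToFloat32(base64ToInt16(encoded));
```

The round trip is lossy only in the Float32 → Int16 step, where samples are quantized to 16 bits; the Base64 step is exact.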

Redux Store (packages/core)

voiceActivities slice for storing non-transcript voice events

Test Coverage Added:

Unit tests for hooks and utilities.
E2E HTML tests covering:
- Full conversation flow
- Multi-turn conversations
- Barge-in/interruption handling
- CSP compliance
- Audio timing and synchronization

Next Steps:

Animation using voice intensity
Documentation/sample once the activity structure and backend are ready (they are still in the testing phase)
Minor UI improvements (timestamps, bot labeling, flair in incoming message)
Improve config logic in composer (kept as-is for now to unblock further work while other approaches are explored)

I have added tests and executed them locally
I have updated CHANGELOG.md
I have updated documentation

Review Checklist

This section is for contributors to review your work.

Accessibility reviewed (tab order, content readability, alt text, color contrast)
Browser and platform compatibilities reviewed
CSS styles reviewed (minimal rules, no z-index)
Documents reviewed (docs, samples, live demo)
Internationalization reviewed (strings, unit formatting)
package.json and package-lock.json reviewed
Security reviewed (no data URIs, check for nonce leak)
Tests reviewed (coverage, legitimacy)

pranavjoshi001 added 2 commits December 12, 2025 13:26

initial no-op s2s core implementation

d132592

minor

a982457

pranavjoshi001 changed the title ~~Feature/core s2s composer~~ Core speech to speech composer implementation (no-op code) Dec 12, 2025

Merge branch 'main' into feature/core-s2s-composer

0978e7d

pranavjoshi001 marked this pull request as ready for review December 17, 2025 05:44

pranavjoshi001 requested review from a-b-r-o-w-n, beyackle2, compulim, cwhitten, srinaath and tdurnford as code owners December 17, 2025 05:44

pranavjoshi001 and others added 6 commits December 17, 2025 11:14

Merge branch 'main' into feature/core-s2s-composer

08c7a76

Merge branch 'main' into feature/core-s2s-composer

27a1cb4

Merge branch 'feature/core-s2s-composer' of https://github.com/pranav…

6437ee1

…joshi001/BotFramework-WebChat into feature/core-s2s-composer

refactor to align close to activity structure

9ddc63c

refactor composer to not use direct state inside effect

0838e44

Merge branch 'main' into feature/core-s2s-composer

4036a03

OEvgeny reviewed Jan 13, 2026

packages/api/src/providers/SpeechToSpeech/private/useRecorder.ts

compulim reviewed Jan 13, 2026

packages/api/src/providers/SpeechToSpeech/private/useRecorder.ts

compulim reviewed Jan 13, 2026

packages/api/src/providers/SpeechToSpeech/private/useRecorder.ts

pranavjoshi001 and others added 2 commits January 14, 2026 11:19

Merge branch 'main' into feature/core-s2s-composer

9be0bcb

more implementation chunk

a3b2c8b

pranavjoshi001 changed the title ~~Core speech to speech composer implementation (no-op code)~~ Core speech to speech implementation Jan 14, 2026

pranavjoshi001 and others added 6 commits January 15, 2026 13:24

minor refactor

e31a8f7

Mic Implementation and animation in fluent theme

cf9d2f5

test case added

af1dd65

Merge branch 'main' into feature/core-s2s-composer

ce9f6c5

screenshot added

8fac1b3

Merge branch 'feature/core-s2s-composer' of https://github.com/pranav…

e01130a

…joshi001/BotFramework-WebChat into feature/core-s2s-composer

pranavjoshi001 requested a review from compulim January 16, 2026 10:54

pranavjoshi001 requested a review from OEvgeny January 16, 2026 10:54
