🐛 VAD Aggressively Clips Speech - Missing First/Last Words

## Problem

The Voice Activity Detection (VAD) system is clipping speech too aggressively, resulting in significant accuracy degradation during dictation sessions.

### Symptoms

1. **Leading edge clipping**: First words of utterances are frequently cut off
2. **Aggressive end detection**: Segments end very quickly, not capturing complete thoughts
3. **Compounding effect**: In a single conversational turn with 10 segments, if 8 segments have their first word clipped, the transcription is missing 8 words total
4. **Critical impact**: This severely degrades transcription accuracy

### Root Cause

Current VAD tuning parameters are too aggressive for natural speech patterns. The Silero VAD v5 configuration needs adjustment to:
- Better capture the start of speech (leading edge)
- Allow more time before declaring end-of-speech
- Provide more tolerance for brief pauses within continuous speech

### Current VAD Parameters

From [src/vad/VADConfigs.ts](https://github.com/Pedal-Intelligence/saypi-userscript/blob/main/src/vad/VADConfigs.ts):

**High Sensitivity** (most commonly used for dictation):
```typescript
positiveSpeechThreshold: 0.35,
negativeSpeechThreshold: 0.2,
redemptionFrames: 12,
minSpeechFrames: 2,
preSpeechPadFrames: 3,
```

**Balanced** (default):
```typescript
positiveSpeechThreshold: 0.4,
negativeSpeechThreshold: 0.25,
redemptionFrames: 10,
minSpeechFrames: 3,
preSpeechPadFrames: 2,
```

### Recommended Investigation

1. **Increase `preSpeechPadFrames`**: Currently 2-3 frames. Try 5-8 frames to capture more leading speech
2. **Increase `redemptionFrames`**: Currently 10-12 frames. Try 15-20 frames for longer tail capture and tolerance for brief pauses
3. **Adjust `negativeSpeechThreshold`**: Currently 0.2-0.25. Try 0.15-0.2 to avoid premature segment ends
4. **Test with real audio**: Use actual conversation recordings to validate parameter changes

### Testing Approach

1. Enable `keepSegments` config to persist captured audio segments
2. Record test dictation sessions with known content
3. Compare captured WAV files against expected speech
4. Measure:
   - Number of segments per utterance
   - Percentage of segments with leading/trailing clipping
   - Word accuracy before/after parameter tuning

### Related Code

- [src/vad/VADConfigs.ts](https://github.com/Pedal-Intelligence/saypi-userscript/blob/main/src/vad/VADConfigs.ts) - Parameter presets
- [src/offscreen/vad_handler.ts](https://github.com/Pedal-Intelligence/saypi-userscript/blob/main/src/offscreen/vad_handler.ts) - VAD initialization
- [src/state-machines/AudioInputMachine.ts](https://github.com/Pedal-Intelligence/saypi-userscript/blob/main/src/state-machines/AudioInputMachine.ts) - Audio segment capture

### Priority

**Critical** - This issue directly impacts core transcription accuracy and user experience.

---

_Discovered during testing of dual-phase transcription feature (#256)_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🐛 VAD Aggressively Clips Speech - Missing First/Last Words #260

Problem

Symptoms

Root Cause

Current VAD Parameters

Recommended Investigation

Testing Approach

Related Code

Priority

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

🐛 VAD Aggressively Clips Speech - Missing First/Last Words #260

Description

Problem

Symptoms

Root Cause

Current VAD Parameters

Recommended Investigation

Testing Approach

Related Code

Priority

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions