Problem
When a WeChat user sends a voice message and voice_item.text is not provided by WeChat, CodePilot downloads the voice file (CDN → AES decrypt → SILK → WAV) but does not transcribe it to text. The WAV file is passed as MediaPath to Claude Code, but Claude Code's terminal cannot process audio files — it only reads the Body (text) field.
Result: voice messages without WeChat's built-in transcription are silently dropped. The user gets no response.
Current behavior
- inbound.ts:101-103 — if voice_item.text exists, use it as Body ✅
- process-message.ts:122-127 — if !voice_item.text, download & transcode to WAV ✅
- WAV is saved to MediaPath, but Body remains empty ❌
- Claude Code receives an empty Body with a MediaPath it cannot read ❌
Proposed solution
After SILK → WAV transcoding in media-download.ts, add an ASR step (e.g. OpenAI Whisper API, or local whisper.cpp) to transcribe the WAV to text, then set the transcription as the message Body.
Pseudocode:
```typescript
// After silkToWav() succeeds in media-download.ts
if (wavBuffer && !voiceItem.text) {
  const transcription = await transcribeAudio(wavPath); // Whisper or other ASR
  voiceItem.text = transcription;
}
```
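For the local whisper.cpp option, `transcribeAudio` could shell out to the CLI. This is a minimal sketch, not existing CodePilot code: the binary name (`whisper-cli`; older builds name it `main`), the model path, and the `whisperArgs` helper are all assumptions to be adapted to the local build.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Build whisper.cpp CLI arguments (-nt: no timestamps, -np: suppress log prints).
// Model path and flag set are assumptions; adjust to the installed build.
function whisperArgs(modelPath: string, wavPath: string): string[] {
  return ["-m", modelPath, "-f", wavPath, "-nt", "-np"];
}

// Transcribe a WAV file by invoking a local whisper.cpp binary and
// returning its stdout as the transcription text.
async function transcribeAudio(wavPath: string): Promise<string> {
  const { stdout } = await execFileAsync(
    "whisper-cli", // hypothetical binary name; some builds use `main`
    whisperArgs("models/ggml-base.bin", wavPath),
  );
  return stdout.trim();
}
```

Swapping in the OpenAI Whisper API instead would only change the body of `transcribeAudio`; the call site in media-download.ts stays the same.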
This would make voice messages work seamlessly for all downstream agents (Claude Code, Codex, etc.) without requiring WeChat to provide transcription.
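If the ASR step itself fails or returns empty text, Body should still never end up empty, or the message is dropped exactly as it is today. A small sketch of that fallback (`resolveBody` is a hypothetical helper, not existing CodePilot code):

```typescript
// Choose the message Body for a voice message after download/ASR.
// Guarantees a non-empty Body so downstream agents always get text.
function resolveBody(
  wechatText: string | undefined, // WeChat's built-in transcription, if any
  asrText: string | undefined,    // our ASR result, if the step succeeded
): string {
  if (wechatText?.trim()) return wechatText;
  if (asrText?.trim()) return asrText;
  return "[voice message: transcription unavailable]"; // graceful fallback
}
```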
Environment
- CodePilot desktop client
- WeChat bridge (openclaw-weixin)
- macOS