Feature: Auto-transcribe WeChat voice messages when voice_item.text is missing #467

@Robinshaozhen004

Description

Problem

When a WeChat user sends a voice message and voice_item.text is not provided by WeChat, CodePilot downloads the voice file (CDN → AES decrypt → SILK → WAV) but does not transcribe it to text. The WAV file is passed as MediaPath to Claude Code, but Claude Code's terminal cannot process audio files — it only reads the Body (text) field.

Result: voice messages without WeChat's built-in transcription are silently dropped. The user gets no response.

Current behavior

  1. inbound.ts:101-103 — if voice_item.text exists, use it as Body ✅
  2. process-message.ts:122-127 — if !voice_item.text, download & transcode to WAV ✅
  3. WAV is saved to MediaPath, but Body remains empty ❌
  4. Claude Code receives an empty Body with a MediaPath it cannot read ❌
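The four steps above can be sketched as a minimal routing function (the interface and field names here are assumptions for illustration, not the actual CodePilot types):

```typescript
// Minimal sketch of the current voice-message routing.
// VoiceItem / InboundMessage are hypothetical stand-ins for the real types.
interface VoiceItem { text?: string }
interface InboundMessage { body: string; mediaPath?: string }

function routeVoice(voiceItem: VoiceItem, wavPath: string): InboundMessage {
  if (voiceItem.text) {
    // Step 1: WeChat provided a built-in transcription; use it as Body.
    return { body: voiceItem.text };
  }
  // Steps 2-3: the WAV was downloaded and transcoded, but Body stays empty.
  // Step 4: the agent only reads body, so the message is effectively dropped.
  return { body: "", mediaPath: wavPath };
}
```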

Proposed solution

After SILK → WAV transcoding in media-download.ts, add an ASR step (e.g. OpenAI Whisper API, or local whisper.cpp) to transcribe the WAV to text, then set the transcription as the message Body.

Pseudocode:

```typescript
// After silkToWav() succeeds in media-download.ts
if (wavPath && !voiceItem.text) {
  const transcription = await transcribeAudio(wavPath); // Whisper or other ASR
  voiceItem.text = transcription;
}
```

This would make voice messages work seamlessly for all downstream agents (Claude Code, Codex, etc.) without requiring WeChat to provide transcription.

Environment

  • CodePilot desktop client
  • WeChat bridge (openclaw-weixin)
  • macOS
