Receive Riverside webhook events when a recording completes, download each participant's isolated audio track, transcribe every track with Deepgram pre-recorded STT, and produce a merged speaker-labelled transcript.
A Node.js Express server that listens for Riverside recording.completed webhooks, fetches the per-track audio files (each participant recorded separately at source quality), submits each track to Deepgram STT with diarization disabled (since tracks are already per-speaker), and returns a merged, time-ordered transcript with speaker labels.
- Node.js 18 or later
- Deepgram account — get a free API key
- Riverside account — sign up
Copy .env.example to .env and fill in your keys:
| Variable | Where to find it |
|---|---|
| `DEEPGRAM_API_KEY` | Deepgram console → API Keys |
| `RIVERSIDE_API_KEY` | Riverside dashboard → Settings → API |
cp .env.example .env
# Fill in your API keys in .env
npm install
npm start

The server starts on port 3000 (override with the `PORT` environment variable).
| Method | Path | Description |
|---|---|---|
| `GET` | `/health` | Health check |
| `POST` | `/webhook/riverside` | Receives Riverside webhook events |
| `POST` | `/transcribe` | Manual transcription; accepts `{ tracks: [{ participant_name, download_url }] }` |
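As an illustration of the manual endpoint, the sketch below builds the request body described in the table and posts it to a locally running server. The helper names `buildTranscribeBody` and `requestTranscript` are hypothetical, and the base URL assumes the default port 3000; the project's own client code may differ.

```javascript
// Hypothetical helper: builds the JSON body that POST /transcribe accepts.
function buildTranscribeBody(tracks) {
  return {
    tracks: tracks.map(({ participant_name, download_url }) => {
      if (!participant_name || !download_url) {
        throw new Error('Each track needs participant_name and download_url');
      }
      return { participant_name, download_url };
    }),
  };
}

// Sends the body to the server (assumed to be running on localhost:3000).
// Uses the global fetch available in Node.js 18+.
async function requestTranscript(tracks, baseUrl = 'http://localhost:3000') {
  const res = await fetch(`${baseUrl}/transcribe`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildTranscribeBody(tracks)),
  });
  if (!res.ok) {
    throw new Error(`Transcription request failed: ${res.status}`);
  }
  return res.json();
}
```

Validating the tracks before sending keeps malformed requests from reaching the server; whether the server performs the same check is not stated here.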
| Parameter | Value | Description |
|---|---|---|
| `model` | `nova-3` | Deepgram's most accurate speech model |
| `smart_format` | `true` | Adds punctuation, casing, and paragraph formatting |
| `diarize` | `false` | Disabled; Riverside already isolates speakers into separate tracks |
| `tag` | `deepgram-examples` | Tags usage in the Deepgram console for tracking |
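To show how these parameters reach Deepgram, here is a minimal sketch that calls the pre-recorded REST endpoint directly with the global `fetch` from Node.js 18. It is an assumption that the request is made this way; the project may use the Deepgram SDK instead. The function names `buildListenUrl` and `transcribeTrack`, and the default `audio/wav` content type, are hypothetical.

```javascript
// Query parameters taken from the table above.
const DEEPGRAM_PARAMS = {
  model: 'nova-3',
  smart_format: 'true',
  diarize: 'false',
  tag: 'deepgram-examples',
};

// Builds the pre-recorded transcription URL with the parameters applied.
function buildListenUrl(params = DEEPGRAM_PARAMS) {
  const url = new URL('https://api.deepgram.com/v1/listen');
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

// Sends one already-downloaded track (a Buffer) to Deepgram and returns
// its word-level results. contentType is an assumption about the file format.
async function transcribeTrack(audioBuffer, contentType = 'audio/wav') {
  const res = await fetch(buildListenUrl(), {
    method: 'POST',
    headers: {
      Authorization: `Token ${process.env.DEEPGRAM_API_KEY}`,
      'Content-Type': contentType,
    },
    body: audioBuffer,
  });
  if (!res.ok) {
    throw new Error(`Deepgram request failed: ${res.status}`);
  }
  const data = await res.json();
  // With diarization off, each track has a single channel and alternative.
  return data.results.channels[0].alternatives[0].words;
}
```

Sending the downloaded bytes rather than the Riverside URL avoids relying on Deepgram being able to fetch a download link that may require authentication.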
- Riverside sends a `recording.completed` webhook with metadata about each participant's audio track
- The server downloads each track's audio via the Riverside download URL
- Each track is submitted to Deepgram pre-recorded STT independently (no diarization needed since each track is one speaker)
- Word-level results from all tracks are merged and sorted chronologically
- Consecutive words from the same speaker are grouped into segments
- The final output is a speaker-labelled transcript with timing information
{
"transcript": "[Host] Welcome to the show today.\n[Guest] Thanks for having me!",
"segments": [
{ "speaker": "Host", "start": 0.0, "end": 1.5, "text": "Welcome to the show today." },
{ "speaker": "Guest", "start": 1.6, "end": 2.8, "text": "Thanks for having me!" }
],
"word_count": 9,
"track_count": 2,
"speakers": ["Host", "Guest"]
}
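The merge and grouping steps that produce this output can be sketched as follows. This is an illustrative version, not necessarily the project's implementation: the function name `mergeTracks` and the input shape (`speaker` plus Deepgram-style `words` with `start`, `end`, `word`, and optional `punctuated_word`) are assumptions, as is treating `word_count` as the total number of words across all tracks.

```javascript
// tracks: [{ speaker: 'Host', words: [{ word, start, end, punctuated_word? }] }, ...]
function mergeTracks(tracks) {
  // Tag every word with its speaker, then sort all words by start time.
  const words = tracks
    .flatMap(({ speaker, words }) => words.map((w) => ({ ...w, speaker })))
    .sort((a, b) => a.start - b.start);

  // Group consecutive words from the same speaker into segments.
  const segments = [];
  for (const w of words) {
    const text = w.punctuated_word ?? w.word;
    const last = segments[segments.length - 1];
    if (last && last.speaker === w.speaker) {
      last.end = w.end;
      last.text += ` ${text}`;
    } else {
      segments.push({ speaker: w.speaker, start: w.start, end: w.end, text });
    }
  }

  return {
    transcript: segments.map((s) => `[${s.speaker}] ${s.text}`).join('\n'),
    segments,
    word_count: words.length,
    track_count: tracks.length,
    speakers: [...new Set(tracks.map((t) => t.speaker))],
  };
}
```

Because all tracks share the recording's timeline, sorting by `start` is enough to interleave speakers; overlapping speech will simply produce short alternating segments.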