[Suggestion] Haystack audio transcription pipeline (Python) #185

## Integration: Haystack Audio Transcription Pipeline



### What this should show
A Python example demonstrating how to use Deepgram as a custom Haystack 2.x component for audio transcription in a RAG pipeline. The example should:
- Create a custom Haystack `@component` that accepts audio file paths or URLs
- Transcribe audio via Deepgram Pre-recorded STT (Nova-3)
- Output Haystack `Document` objects with transcript text + metadata (speaker labels, timestamps, confidence)
- Include a retrieval demo using an in-memory document store
- Support both single-file and batch audio transcription
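The metadata mapping in the list above could be sketched roughly as follows. This is a minimal illustration, not the final component: it assumes the request was made with `utterances=true` and `diarize=true`, so that the response carries a `results.utterances` list, and it uses plain dicts in place of Haystack `Document` objects so that it runs without `haystack-ai` installed. The function name `utterances_to_records` and the sample values are hypothetical.

```python
from typing import Any, Dict, List


def utterances_to_records(response: Dict[str, Any], source: str) -> List[Dict[str, Any]]:
    """Map one Deepgram pre-recorded response to document-like records.

    Each record mirrors the content/meta split of a Haystack Document,
    but is a plain dict so the sketch runs without haystack-ai installed.
    """
    records = []
    for utterance in response.get("results", {}).get("utterances", []):
        records.append(
            {
                "content": utterance["transcript"],
                "meta": {
                    "source": source,
                    # "speaker" appears only when diarization was requested.
                    "speaker": utterance.get("speaker"),
                    "start": utterance["start"],
                    "end": utterance["end"],
                    "confidence": utterance["confidence"],
                },
            }
        )
    return records


# Hand-written sample in the shape of a diarized response; all values are made up.
sample_response = {
    "results": {
        "utterances": [
            {"start": 0.0, "end": 2.4, "confidence": 0.98, "speaker": 0,
             "transcript": "Welcome to the quarterly review."},
            {"start": 2.9, "end": 5.1, "confidence": 0.95, "speaker": 1,
             "transcript": "Thanks, let's start with revenue."},
        ]
    }
}

records = utterances_to_records(sample_response, source="meeting.wav")
for record in records:
    print(record["meta"]["speaker"], record["content"])
```

Inside a real Haystack 2.x component, each record would presumably become `Document(content=..., meta=...)` and be returned from `run()` under a `documents` key. One document per utterance keeps speaker and timestamp metadata attached to the text it describes, which is one possible design choice among several.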

### Credentials likely needed
- `DEEPGRAM_API_KEY`
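
One way the example might pick up the credential is to read it from the environment and fail early with a clear message. The helper name `load_deepgram_api_key` is hypothetical; only the variable name `DEEPGRAM_API_KEY` comes from this issue.

```python
import os
from typing import Mapping


def load_deepgram_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Deepgram API key from the environment, or raise if it is missing."""
    key = env.get("DEEPGRAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set DEEPGRAM_API_KEY before running the example.")
    return key


# Demonstrated with an explicit mapping so the sketch runs without a real key.
print(load_deepgram_api_key({"DEEPGRAM_API_KEY": "placeholder-key"}))
```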

---
*Original request:*

## What to build

A working example demonstrating Deepgram as a Haystack component for audio transcription in a RAG pipeline: loading audio files, transcribing them with Deepgram STT, and feeding the transcripts into a Haystack retrieval pipeline.

## Why this matters

Haystack (by deepset) is a leading enterprise NLP/RAG framework used by teams building production search and retrieval systems. Developers building audio-aware RAG pipelines need a reference integration that shows how to use Deepgram as an audio ingestion source. No example currently shows Deepgram and Haystack working together, despite Haystack's growing adoption for enterprise AI.

## Suggested scope

- **Language**: Python
- **Framework**: Haystack 2.x (`haystack-ai` package)
- **Deepgram APIs**: Pre-recorded STT (Nova-3)
- **What it does**: Custom Haystack `@component` that accepts audio file paths or URLs, transcribes via Deepgram, and outputs Haystack `Document` objects with transcript text + metadata (speaker labels, timestamps, confidence)
- **Includes**: Pipeline YAML config, example audio file, retrieval demo with an in-memory document store
- **Complexity**: Medium (single Python file plus pipeline config)
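
The "file paths or URLs" handling in the scope above could look something like the sketch below. It talks to Deepgram's pre-recorded REST endpoint directly with the standard library rather than through the Deepgram SDK, so it makes no assumptions about SDK method names. The query parameters, the helper names (`build_request`, `transcribe`, `transcribe_batch`), and the content-type fallback are assumptions for illustration, and the batch helper is deliberately sequential.

```python
import json
import mimetypes
import urllib.request
from pathlib import Path
from typing import Any, Dict, Iterable, List

# Query parameters are an assumption about what the example would request.
LISTEN_URL = "https://api.deepgram.com/v1/listen?model=nova-3&diarize=true&utterances=true"


def build_request(source: str, api_key: str) -> urllib.request.Request:
    """Build a POST request for either a remote audio URL or a local audio file."""
    headers = {"Authorization": f"Token {api_key}"}
    if source.startswith(("http://", "https://")):
        # Remote audio: send a JSON body that points Deepgram at the URL.
        body = json.dumps({"url": source}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    else:
        # Local audio: send the raw bytes with a guessed MIME type (fallback is an assumption).
        body = Path(source).read_bytes()
        guessed, _ = mimetypes.guess_type(source)
        headers["Content-Type"] = guessed or "application/octet-stream"
    return urllib.request.Request(LISTEN_URL, data=body, headers=headers, method="POST")


def transcribe(source: str, api_key: str, timeout: float = 300.0) -> Dict[str, Any]:
    """Send one request and return the parsed JSON response (performs a network call)."""
    with urllib.request.urlopen(build_request(source, api_key), timeout=timeout) as resp:
        return json.load(resp)


def transcribe_batch(sources: Iterable[str], api_key: str) -> List[Dict[str, Any]]:
    """Transcribe several sources one after another; single-file use is a batch of one."""
    return [transcribe(source, api_key) for source in sources]


# Building a request does not touch the network, so this line is safe to run as-is.
request = build_request("https://example.com/audio/interview.mp3", api_key="placeholder-key")
print(request.get_method(), request.full_url)
```

A Haystack component's `run()` method could accept a list of sources and call something like `transcribe_batch`, which would cover the single-file and batch cases with one code path.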

## Acceptance criteria

- [ ] Runnable with minimal setup (clone, add API key, run)
- [ ] README explains the pattern clearly
- [ ] Uses the current Deepgram Python SDK version and Haystack 2.x
- [ ] Demonstrates both single-file and batch audio transcription
- [ ] Output documents include Deepgram metadata (speakers, timestamps)

---
*Raised by the DX intelligence system.*
