[Example] 490 — Haystack Audio Transcription Pipeline (Python) #190

Open
github-actions[bot] wants to merge 1 commit into main from example/490-haystack-deepgram-stt-pipeline-python

Conversation

github-actions bot commented Apr 5, 2026

New example: Haystack Audio Transcription Pipeline (Python)

Integration: Haystack | Language: Python | Products: STT

What this shows

A custom Haystack 2.x @component that transcribes audio via Deepgram Pre-recorded STT (Nova-3) and outputs Haystack Document objects with rich metadata (speaker labels, word timestamps, confidence). Includes a full ingestion pipeline that cleans transcripts and writes them to an in-memory document store for retrieval.
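
As a rough illustration of the metadata described above, the sketch below flattens a Deepgram pre-recorded response into the text and metadata that a Document could carry. It assumes the response has already been converted to a plain dict in Deepgram's JSON layout. The function name `summarize_response` and the keys of the returned dict are hypothetical and are not taken from the example's source.

```python
from typing import Any


def summarize_response(response: dict[str, Any]) -> dict[str, Any]:
    """Flatten a Deepgram-style pre-recorded response into content plus metadata.

    Hypothetical helper: the key names in the returned dict are illustrative only.
    """
    alternative = response["results"]["channels"][0]["alternatives"][0]
    words = alternative.get("words", [])
    # Speaker labels only appear on words when diarization is enabled.
    speakers = sorted({w["speaker"] for w in words if "speaker" in w})
    return {
        "content": alternative["transcript"],
        "meta": {
            "duration": response["metadata"]["duration"],
            "confidence": alternative["confidence"],
            "word_count": len(words),
            "speakers": speakers,
            # Keep only the timing fields for each word.
            "words": [
                {"word": w["word"], "start": w["start"], "end": w["end"]}
                for w in words
            ],
        },
    }


# Tiny hand-written response in the same shape, for illustration only.
sample = {
    "metadata": {"duration": 1.2},
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "hello world",
                        "confidence": 0.98,
                        "words": [
                            {"word": "hello", "start": 0.0, "end": 0.5,
                             "confidence": 0.99, "speaker": 0},
                            {"word": "world", "start": 0.6, "end": 1.1,
                             "confidence": 0.97, "speaker": 0},
                        ],
                    }
                ]
            }
        ]
    },
}

summary = summarize_response(sample)
print(summary["meta"]["word_count"], summary["meta"]["speakers"])
```

The resulting `content` and `meta` values map directly onto the two fields of the same name on a Haystack Document.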

Required secrets

None beyond DEEPGRAM_API_KEY, which is the only secret required.

Tests

✅ Tests passed

✓ DeepgramTranscriber component working
  Transcript length: 334 chars
  Duration: 25.4s
  Words: 62
  Speakers: 1
✓ Batch transcription working (2 documents)
✓ Ingest pipeline working (transcribe → clean → write)
  Documents in store: 1
  Transcript length: 334 chars

✓ All tests passed

Built by Engineer on 2026-04-05

github-actions bot added the type:example, language:python, and integration:haystack labels Apr 5, 2026
github-actions bot commented Apr 6, 2026

Code Review

Overall: APPROVED

Tests ran ✅

============================= test session starts ==============================
platform linux -- Python 3.11.15, pytest-9.0.2, pluggy-1.6.0
collected 3 items

tests/test_example.py::test_transcriber_component PASSED                 [ 33%]
tests/test_example.py::test_batch_transcription PASSED                   [ 66%]
tests/test_example.py::test_ingest_pipeline PASSED                       [100%]

======================== 3 passed, 2 warnings in 6.69s =========================

✓ DeepgramTranscriber component working
  Transcript length: 334 chars
  Duration: 25.4s
  Words: 62
  Speakers: 1
✓ Batch transcription working (2 documents)
✓ Ingest pipeline working (transcribe → clean → write)
  Documents in store: 1
  Transcript length: 334 chars

✓ All tests passed

Integration genuineness

Pass. Haystack 2.x SDK is imported and used throughout — Pipeline, @component, Document, DocumentCleaner, DocumentWriter, InMemoryDocumentStore. The DeepgramTranscriber is a proper Haystack @component with run() method and @component.output_types. The pipeline connects components via pipeline.connect() and runs via pipeline.run(). Haystack does not provide a built-in Deepgram component, so wrapping DeepgramClient inside a custom @component is the correct integration pattern. No raw WebSocket or HTTP calls — Deepgram SDK is used (client.listen.v1.media.transcribe_url).

Code quality

  • ✅ Official Deepgram SDK (deepgram-sdk==6.1.1) — matches required version
  • ✅ tag="deepgram-examples" present on Deepgram API call
  • ✅ No hardcoded credentials
  • ✅ Error handling for missing API key
  • ✅ Tests import from src/ and exercise the component and pipeline directly
  • ✅ Transcript assertions use length/duration proportionality (chars_per_sec > 2), not specific word lists
  • ✅ Credential check runs FIRST (top of test file, before src/ imports)
  • ✅ build_ingest_pipeline() tested end-to-end: transcribe → clean → write to document store
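
The proportionality check mentioned in the list above might look roughly like the following. This is a sketch only: the helper names are hypothetical, and the 2 characters-per-second threshold is the one quoted in the review.

```python
def chars_per_second(transcript: str, duration_s: float) -> float:
    """Return transcript characters per second of audio."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return len(transcript.strip()) / duration_s


def assert_plausible_transcript(
    transcript: str, duration_s: float, min_chars_per_sec: float = 2.0
) -> float:
    """Fail if the transcript is implausibly short for the audio duration."""
    rate = chars_per_second(transcript, duration_s)
    assert rate > min_chars_per_sec, (
        f"only {rate:.2f} chars/sec for {duration_s:.1f}s of audio"
    )
    return rate


# Figures from the test output above: 334 characters over 25.4 seconds.
rate = assert_plausible_transcript("x" * 334, 25.4)
print(f"{rate:.1f} chars/sec")
```

Checking a rate rather than specific words keeps the test stable when the model's wording or punctuation changes between runs.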

Documentation

  • ✅ README covers what you'll build, env vars with console links, install/run instructions, key parameters, and how it works
  • ✅ .env.example present and complete

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-06

github-actions bot added the status:review-passed label Apr 6, 2026
github-actions bot commented Apr 6, 2026

Code Review

Overall: APPROVED

Tests ran ✅

tests/test_example.py::test_transcriber_component PASSED
  ✓ DeepgramTranscriber component working
    Transcript length: 334 chars
    Duration: 25.4s
    Words: 62
    Speakers: 1

tests/test_example.py::test_batch_transcription PASSED
  ✓ Batch transcription working (2 documents)

tests/test_example.py::test_ingest_pipeline PASSED
  ✓ Ingest pipeline working (transcribe → clean → write)
    Documents in store: 1
    Transcript length: 334 chars

3 passed in 6.62s

Integration genuineness

Pass — Haystack SDK (haystack-ai) is imported and used throughout. The DeepgramTranscriber is a proper Haystack 2.x @component registered in a real Pipeline with DocumentCleaner and DocumentWriter. Haystack does not provide a native Deepgram wrapper, so wrapping DeepgramClient inside a custom component is the correct idiomatic pattern. No raw WebSocket or HTTP calls. No bypass detected.

Code quality

  • Official Deepgram SDK (deepgram-sdk==6.1.1) — correct version
  • tag="deepgram-examples" present on the API call
  • No hardcoded credentials
  • Error handling for missing DEEPGRAM_API_KEY
  • Tests import from src/ and call the example's actual code (DeepgramTranscriber, build_ingest_pipeline)
  • Transcript assertions use length/duration proportionality (chars_per_sec) — no word lists
  • Credential check runs before SDK imports in tests (exit 2)
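
A credential guard of the kind listed above could be sketched like this. The function names and the message format are assumptions, not taken from the example's test file. Only the exit code 2 and the DEEPGRAM_API_KEY variable come from the review.

```python
import os
import sys
from typing import Mapping, Sequence

REQUIRED_ENV_VARS = ("DEEPGRAM_API_KEY",)


def missing_credentials(
    env: Mapping[str, str], required: Sequence[str] = REQUIRED_ENV_VARS
) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


def require_credentials(env: Mapping[str, str] = os.environ) -> None:
    """Exit with status 2 when credentials are missing."""
    missing = missing_credentials(env)
    if missing:
        print(f"Missing credentials: {', '.join(missing)}", file=sys.stderr)
        sys.exit(2)


# In a test module, this call would sit above the SDK and src/ imports:
# require_credentials()
```

Running the guard before any SDK import lets a CI job tell "credentials not configured" (exit 2) apart from an ordinary test failure.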

Documentation

  • README includes "What you'll build", env vars table with console links, install/run instructions
  • .env.example present and complete

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-06

github-actions bot commented Apr 6, 2026

Code Review

Overall: APPROVED

Tests ran ✅

✓ DeepgramTranscriber component working
  Transcript length: 334 chars
  Duration: 25.4s
  Words: 62
  Speakers: 1
✓ Batch transcription working (2 documents)
✓ Ingest pipeline working (transcribe → clean → write)
  Documents in store: 1
  Transcript length: 334 chars

✓ All tests passed

3 passed, 2 warnings in 4.33s

Integration genuineness

✅ Pass — Haystack 2.x @component, Pipeline, DocumentCleaner, InMemoryDocumentStore, and DocumentWriter are all imported and used in a real pipeline (transcribe → clean → write). Haystack has no native Deepgram component, so wrapping DeepgramClient inside a custom @component is the correct integration pattern. No raw WebSocket or HTTP calls. tag="deepgram-examples" present.

Code quality

  • ✅ Official deepgram-sdk==6.1.1 (matches required version)
  • ✅ tag="deepgram-examples" on Deepgram API call
  • ✅ No hardcoded credentials
  • ✅ Error handling: RuntimeError if DEEPGRAM_API_KEY not set
  • ✅ Tests import from src/ and call the example's actual DeepgramTranscriber and build_ingest_pipeline
  • ✅ Transcript assertions use length/duration proportionality (chars_per_sec > 2), no specific word lists
  • ✅ Credential check runs before SDK imports in test file
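
The missing-key behaviour noted in the list above can be illustrated with a small sketch. `resolve_api_key` is a hypothetical helper, and whether the example accepts an explicit key argument is an assumption. The review only confirms that a RuntimeError is raised when DEEPGRAM_API_KEY is not set.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit_key: Optional[str] = None, env: Mapping[str, str] = os.environ
) -> str:
    """Return the Deepgram API key or raise RuntimeError if none is available."""
    # Prefer an explicitly passed key, then fall back to the environment.
    key = explicit_key or env.get("DEEPGRAM_API_KEY", "")
    if not key:
        raise RuntimeError(
            "DEEPGRAM_API_KEY is not set. Export it or add it to your .env file."
        )
    return key


try:
    resolve_api_key(env={})
except RuntimeError as exc:
    print(f"error: {exc}")
```

Failing at construction time with a clear message avoids a less readable authentication error later, when the first request is sent.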

Documentation

  • ✅ README has "What you'll build", env vars with console links, install and run instructions
  • ✅ .env.example present and complete

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-06

github-actions bot commented Apr 6, 2026

Code Review

Overall: APPROVED

Tests ran ✅

✓ DeepgramTranscriber component working
  Transcript length: 334 chars
  Duration: 25.4s
  Words: 62
  Speakers: 1
✓ Batch transcription working (2 documents)
✓ Ingest pipeline working (transcribe → clean → write)
  Documents in store: 1
  Transcript length: 334 chars

✓ All tests passed

pytest: 3 passed, 2 warnings in 6.06s

Integration genuineness

✅ Pass — Haystack 2.x SDK is imported and used (Pipeline, @component, DocumentCleaner, DocumentWriter, InMemoryDocumentStore). The DeepgramTranscriber is a proper Haystack @component that produces Document objects with rich metadata. Deepgram SDK is used inside the component as expected (Haystack has no built-in Deepgram wrapper). No raw WebSocket/HTTP calls to Deepgram. No bypass.

Code quality

  • ✅ Official Deepgram SDK: deepgram-sdk==6.1.1 (matches required version)
  • ✅ tag="deepgram-examples" present on API call
  • ✅ No hardcoded credentials — reads from DEEPGRAM_API_KEY env var
  • ✅ Error handling: raises RuntimeError if API key missing
  • ✅ Tests import from src/ and test actual component + pipeline code
  • ✅ Transcript assertions use length/duration proportionality (chars_per_sec > 2), no word-list checks
  • ✅ Credential check runs first (before SDK imports) with sys.exit(2) on missing creds
  • ✅ haystack-ai==2.27.0 pinned

Documentation

  • ✅ README: clear "what you'll build", env vars table with console link, install/run instructions
  • ✅ .env.example present and complete
  • ✅ Key parameters documented

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-06
