
sergeen/audio2text


Audio-to-Text Pipeline

Modular audio transcription pipeline powered by Whisper.

First, activate the virtual environment (.venv\Scripts\activate.bat in the Windows command prompt); then you can run the pipeline.

Architecture

Component | File | Description
--- | --- | ---
Configuration | config.json | Single source of truth for the entire pipeline.
Orchestrator | orchestrator.py | Reads config, runs steps in order, shows progress.
Format Converter | convert_format.py | Converts audio to mono 16 kHz WAV for Whisper.
DC Offset Corrector | dc_offset_correction.py | Removes DC offset from WAV files using numpy.

config.json

  • The ordered list of steps lives under the pipeline key.
  • Each step has: step, name, script, enabled, params.
  • params is a flat dict, passed to the step's script as --key value CLI args.
  • The logging level and log file are set under the logging key.
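A minimal sketch of what config.json might contain, and of how a step's params could be flattened into --key value arguments. Only the key names listed above come from this README; the concrete values, the folder names (borrowed from the folder structure below), and the logging sub-keys level and file are assumptions, and params_to_args is a hypothetical helper rather than the actual orchestrator code.

```python
import json

# Hypothetical config.json. Key names pipeline/step/name/script/enabled/params/logging
# follow the README; the values and the logging sub-keys are illustrative only.
EXAMPLE_CONFIG = """
{
  "logging": {"level": "INFO", "file": "pipeline.log"},
  "pipeline": [
    {"step": 1, "name": "Format conversion", "script": "convert_format.py",
     "enabled": true,
     "params": {"input_folder": "input", "output_folder": "output_formatted"}},
    {"step": 2, "name": "DC offset correction", "script": "dc_offset_correction.py",
     "enabled": true,
     "params": {"input_folder": "output_formatted", "output_folder": "output_dc_offset",
                "dc_method": "highpass", "highpass_cutoff": 10}}
  ]
}
"""


def params_to_args(params: dict) -> list[str]:
    """Flatten a params dict into ["--key", "value", ...] for a step's command line."""
    args = []
    for key, value in params.items():
        args += [f"--{key}", str(value)]  # every value is passed as a string
    return args


config = json.loads(EXAMPLE_CONFIG)
for step in config["pipeline"]:
    print(step["script"], " ".join(params_to_args(step["params"])))
```

Because every value goes through str(), a JSON true would reach the script as the string "True"; each script is then responsible for converting types itself (for example with argparse's type=float for highpass_cutoff).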

Orchestrator (orchestrator.py)

  • Entry point: python orchestrator.py.
  • Reads config.json from its own directory.
  • Sorts steps by step number, skips disabled ones.
  • Launches each script as a subprocess with --key value args.
  • Streams each step's stdout in real time.
  • Shows a per-step progress bar and a pipeline summary table.
  • Aborts the pipeline immediately if any step fails.
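The orchestration loop described above could look roughly like this. It is a standard-library-only sketch, not the actual orchestrator.py: the function names are hypothetical, progress is printed as plain text instead of a rendered bar, and launching children with python -u (unbuffered output) is one assumed way to get real-time stdout through a pipe.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

PROGRESS_MARKER = "##PROGRESS##"


def enabled_steps(config: dict) -> list[dict]:
    """Enabled steps from config["pipeline"], sorted by their step number."""
    return sorted((s for s in config["pipeline"] if s["enabled"]), key=lambda s: s["step"])


def build_command(step: dict, base_dir: Path) -> list[str]:
    """python -u <script> --key value ...; -u keeps the child's stdout unbuffered."""
    cmd = [sys.executable, "-u", str(base_dir / step["script"])]
    for key, value in step["params"].items():
        cmd += [f"--{key}", str(value)]
    return cmd


def parse_progress(line: str):
    """(current, total) for a '##PROGRESS## <current> <total>' line, else None."""
    parts = line.split()
    if len(parts) != 3 or parts[0] != PROGRESS_MARKER:
        return None
    try:
        return float(parts[1]), float(parts[2])
    except ValueError:
        return None


def run_step(step: dict, base_dir: Path) -> int:
    """Run one step, echoing stdout line by line; return its exit code."""
    # stderr goes to a temp file so a chatty script cannot stall on a full pipe.
    with tempfile.TemporaryFile(mode="w+") as err:
        with subprocess.Popen(build_command(step, base_dir), stdout=subprocess.PIPE,
                              stderr=err, text=True) as proc:
            for line in proc.stdout:
                progress = parse_progress(line)
                if progress is not None:
                    print(f"  [{step['name']}] {progress[0]:g}/{progress[1]:g}")
                else:
                    print(f"  {line.rstrip()}")
        if proc.returncode != 0:
            err.seek(0)
            print(err.read(), file=sys.stderr)
        return proc.returncode


def run_pipeline(config_path: Path) -> int:
    """Run enabled steps in order, stop at the first failure, print a summary."""
    config_path = Path(config_path).resolve()
    config = json.loads(config_path.read_text(encoding="utf-8"))
    results = []
    for step in enabled_steps(config):
        print(f"Step {step['step']}: {step['name']}")
        code = run_step(step, config_path.parent)
        results.append((step["name"], code))
        if code != 0:
            break  # abort: later steps would consume missing or partial output
    for name, code in results:
        print(f"{name:<30} {'OK' if code == 0 else f'FAILED ({code})'}")
    return 0 if all(code == 0 for _, code in results) else 1


if __name__ == "__main__":
    # Mirrors "reads config.json from its own directory".
    sys.exit(run_pipeline(Path(__file__).with_name("config.json")))
```

Sending stderr to a temporary file rather than a second pipe avoids the classic deadlock in which a child blocks writing to stderr while the parent is still draining stdout.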

Format Converter (convert_format.py)

  • Scans input_folder for supported audio/video files.
  • Detects codec, sample rate, channels, and duration via ffprobe.
  • Converts to target format via ffmpeg (settings from config).
  • Reports real-time progress based on audio seconds processed.
  • Skips files that already match the target specification.
  • Writes results to output_folder.
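A sketch of the probe, skip, and convert cycle, assuming the target is 16-bit PCM (pcm_s16le), mono, 16 kHz WAV. The TARGET values, the function names, and the choice to copy already-matching files unchanged are assumptions; the ffprobe options (-select_streams, -show_entries, -of json) and ffmpeg options (-ac, -ar, -c:a, -progress) are standard, and progress is derived from the out_time_us field (out_time_ms on older builds) that -progress writes.

```python
import json
import math
import shutil
import subprocess
from pathlib import Path

# Assumed target spec; in the real pipeline these settings come from config.json.
TARGET = {"codec": "pcm_s16le", "sample_rate": 16000, "channels": 1}


def probe_command(path: Path) -> list[str]:
    """ffprobe call reporting the first audio stream and the container duration as JSON."""
    return ["ffprobe", "-v", "error", "-select_streams", "a:0",
            "-show_entries", "stream=codec_name,sample_rate,channels:format=duration",
            "-of", "json", str(path)]


def parse_probe(output: str) -> dict:
    """Pull codec, sample rate, channel count and duration out of ffprobe's JSON."""
    data = json.loads(output)
    stream = data["streams"][0]
    return {
        "codec": stream["codec_name"],
        "sample_rate": int(stream["sample_rate"]),  # ffprobe emits this as a string
        "channels": int(stream["channels"]),
        "duration": float(data["format"]["duration"]),
    }


def needs_conversion(path: Path, info: dict, target: dict) -> bool:
    """False only for a .wav file already matching the target codec, rate and channels."""
    return (path.suffix.lower() != ".wav"
            or info["codec"] != target["codec"]
            or info["sample_rate"] != target["sample_rate"]
            or info["channels"] != target["channels"])


def ffmpeg_command(src: Path, dst: Path, target: dict) -> list[str]:
    """ffmpeg call that resamples/downmixes and writes key=value progress to stdout."""
    return ["ffmpeg", "-y", "-nostats", "-loglevel", "error", "-i", str(src),
            "-ac", str(target["channels"]), "-ar", str(target["sample_rate"]),
            "-c:a", target["codec"], "-progress", "pipe:1", str(dst)]


def parse_out_time(line: str):
    """Seconds of audio written so far, from an ffmpeg -progress line, else None.

    Newer ffmpeg prints out_time_us; older builds only print out_time_ms,
    which despite its name is also in microseconds.
    """
    key, _, value = line.strip().partition("=")
    if key in ("out_time_us", "out_time_ms") and value.lstrip("-").isdigit():
        return max(0, int(value)) / 1_000_000
    return None


def convert(src: Path, dst: Path, target: dict) -> None:
    """Probe src, then convert (or copy, if it already matches) into dst."""
    probe = subprocess.run(probe_command(src), capture_output=True, text=True, check=True)
    info = parse_probe(probe.stdout)
    if not needs_conversion(src, info, target):
        print(f"{src.name}: already matches target, copying unchanged")
        shutil.copy2(src, dst)  # assumption: keeps the file visible to step 2
        return
    total = max(1, math.ceil(info["duration"]))
    last = None
    with subprocess.Popen(ffmpeg_command(src, dst, target),
                          stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            seconds = parse_out_time(line)
            if seconds is not None and seconds != last:  # both keys may repeat a value
                last = seconds
                print(f"##PROGRESS## {min(int(seconds), total)} {total}", flush=True)
    if proc.returncode != 0:
        raise RuntimeError(f"ffmpeg failed on {src} (exit code {proc.returncode})")
```

Reading progress from the -progress stream rather than scraping ffmpeg's human-readable stats line keeps the parsing to simple key=value pairs.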

DC Offset Corrector (dc_offset_correction.py)

  • Scans input_folder for .wav files (output of Step 1).
  • Removes DC offset using one of two methods (set via dc_method param):
    • mean: subtracts the signal mean (fast; suited to a stationary offset).
    • highpass: applies a first-order IIR high-pass filter at highpass_cutoff Hz (better for a slowly drifting offset; the default config uses a 10 Hz cutoff).
  • Reports real-time progress (file-based counter) via ##PROGRESS## markers.
  • Preserves original WAV subtype (e.g. PCM_16).
  • Writes corrected files to output_folder.
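Both correction methods can be sketched with numpy alone. This assumes samples are read as float64 with shape (frames, channels) and uses the textbook first-order RC high-pass recurrence y[n] = alpha * (y[n-1] + x[n] - x[n-1]), with alpha = RC / (RC + dt) and RC = 1 / (2 * pi * cutoff); the README does not say whether dc_offset_correction.py uses exactly this form. The function names are hypothetical, and soundfile is imported lazily so the numeric helpers can be tried without it.

```python
import math

import numpy as np


def remove_dc_mean(x: np.ndarray) -> np.ndarray:
    """Subtract each channel's mean; x has shape (frames, channels)."""
    return x - x.mean(axis=0, keepdims=True)


def remove_dc_highpass(x: np.ndarray, sample_rate: int, cutoff_hz: float) -> np.ndarray:
    """First-order IIR high-pass: y[n] = alpha * (y[n-1] + x[n] - x[n-1]).

    alpha = RC / (RC + dt), RC = 1 / (2*pi*cutoff_hz), dt = 1 / sample_rate.
    x has shape (frames, channels). The first frame is treated as the prior
    steady state (x[-1] = x[0], y[-1] = 0), so an offset present from the start
    does not produce a step transient. Pure-Python loop: clear but slow.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    y = np.empty(x.shape, dtype=np.float64)
    prev_x = x[0].astype(np.float64)
    prev_y = np.zeros(x.shape[1])
    for n in range(x.shape[0]):
        prev_y = alpha * (prev_y + x[n] - prev_x)
        prev_x = x[n]
        y[n] = prev_y
    return y


def correct_file(src: str, dst: str, method: str, cutoff_hz: float) -> None:
    """Read a WAV, remove its DC offset, and write it back with the same subtype."""
    import soundfile as sf  # lazy import: the numpy helpers above work without it

    subtype = sf.info(src).subtype  # e.g. "PCM_16"
    data, rate = sf.read(src, dtype="float64", always_2d=True)
    if method == "mean":
        out = remove_dc_mean(data)
    elif method == "highpass":
        out = remove_dc_highpass(data, rate, cutoff_hz)
    else:
        raise ValueError(f"unknown dc_method: {method!r}")
    if subtype.startswith("PCM"):
        out = np.clip(out, -1.0, 1.0)  # integer PCM cannot hold samples beyond full scale
    sf.write(dst, out, rate, subtype=subtype)
```

The loop is easy to verify but slow on long recordings; scipy.signal.lfilter([alpha, -alpha], [1, -alpha], x, axis=0) computes the same recurrence much faster, though it starts from a zero initial state (so an initial offset produces a decaying transient) and adds a dependency.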

Script Contract

  • Scripts receive ALL params via CLI --key value.
  • Scripts must NOT define default values for params.
  • If a required param is missing, script exits with error.
  • Exit code 0 = success; any other = failure.
  • stdout is streamed in real time by the orchestrator.
  • stderr is shown in red on failure.
  • Print ##PROGRESS## <current> <total> to drive the progress bar.
  • The first marker sets the total; subsequent markers update the bar.
  • Regular print lines are displayed as dimmed log output.
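A hypothetical step script that follows the contract above: argparse with required=True and no defaults (a missing param makes argparse exit with status 2), progress markers flushed immediately, log lines as plain prints, and a non-zero return on failure. The processing itself is a placeholder copy, and input_folder/output_folder are the only params assumed.

```python
import argparse
import sys
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    # Every param is required and has no default: values must come from config.json.
    parser = argparse.ArgumentParser(description="Example pipeline step (hypothetical)")
    parser.add_argument("--input_folder", required=True)
    parser.add_argument("--output_folder", required=True)
    return parser.parse_args(argv)


def progress(current: int, total: int) -> None:
    # flush=True so the orchestrator sees the marker immediately through the pipe
    print(f"##PROGRESS## {current} {total}", flush=True)


def main(argv=None) -> int:
    args = parse_args(argv)
    src, dst = Path(args.input_folder), Path(args.output_folder)
    if not src.is_dir():
        print(f"input_folder not found: {src}", file=sys.stderr)
        return 1  # any non-zero exit code marks the step as failed
    dst.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in src.iterdir() if p.is_file())
    progress(0, len(files))  # first marker establishes the total
    for i, path in enumerate(files, start=1):
        print(f"processing {path.name}")  # regular line: shown as dimmed log output
        (dst / path.name).write_bytes(path.read_bytes())  # placeholder "processing"
        progress(i, len(files))
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Because argparse rejects a missing param before any work starts, the script never needs a fallback value, which keeps config.json the single source of truth.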

Prerequisites

  • Python 3.10+.
  • ffmpeg and ffprobe installed and available in PATH.
  • numpy and soundfile (installed via requirements.txt).
  • Install deps: pip install -r requirements.txt.
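A small preflight check for the prerequisites above, using only the standard library (shutil.which and importlib.util.find_spec). The function name is hypothetical; this script is not part of the repository.

```python
from __future__ import annotations  # lets the annotations parse on Pythons older than 3.9

import importlib.util
import shutil
import sys


def missing_prerequisites() -> list[str]:
    """Return human-readable problems; an empty list means everything was found."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    for tool in ("ffmpeg", "ffprobe"):
        if shutil.which(tool) is None:  # searches PATH (and PATHEXT on Windows)
            problems.append(f"{tool} not found on PATH")
    for module in ("numpy", "soundfile"):
        if importlib.util.find_spec(module) is None:
            problems.append(f"Python package {module!r} is not installed")
    return problems


if __name__ == "__main__":
    issues = missing_prerequisites()
    for issue in issues:
        print(f"missing: {issue}")
    if not issues:
        print("All prerequisites found.")
    sys.exit(1 if issues else 0)
```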

Folder Structure

audioToText/
├── config.json                # Pipeline configuration
├── orchestrator.py            # Main entry point
├── convert_format.py          # Step 1: format conversion
├── dc_offset_correction.py    # Step 2: DC offset correction
├── requirements.txt           # Python dependencies
├── README.md                  # This file
├── input/                     # Raw audio files go here
├── output_formatted/          # Step 1 output (created automatically)
└── output_dc_offset/          # Step 2 output (created automatically)

Usage

# 1. Place audio files in the input/ folder
# 2. Run the pipeline
python orchestrator.py
