Modular audio transcription pipeline powered by Whisper.
First, activate the virtual environment with `.venv\Scripts\activate.bat`; then you can run the pipeline.
| Component | File | Description |
|---|---|---|
| Configuration | `config.json` | Single source of truth for the entire pipeline. |
| Orchestrator | `orchestrator.py` | Reads the config, runs steps in order, and shows progress. |
| Format Converter | `convert_format.py` | Converts audio to mono 16 kHz WAV for Whisper. |
| DC Offset Corrector | `dc_offset_correction.py` | Removes DC offset from WAV files using numpy. |
Configuration (`config.json`):

- Ordered pipeline steps live under `pipeline`.
- Each step has: `step`, `name`, `script`, `enabled`, `params`.
- `params` is a flat dict passed as `--key value` CLI args.
- Logging level and log file are under `logging`.
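As a rough illustration of how a flat `params` dict could become `--key value` arguments, here is a minimal sketch. The helper name `params_to_cli_args` and the example values are assumptions made for illustration; only the field names (`step`, `name`, `script`, `enabled`, `params`) and the parameter names come from this README.

```python
def params_to_cli_args(params: dict) -> list[str]:
    """Flatten a params dict into ['--key', 'value', ...] (hypothetical helper)."""
    args: list[str] = []
    for key, value in params.items():
        # Every value is passed as a string; the receiving script parses it.
        args.extend([f"--{key}", str(value)])
    return args


# Hypothetical step entry; the values are illustrative, not the shipped config.
step = {
    "step": 2,
    "name": "DC Offset Correction",
    "script": "dc_offset_correction.py",
    "enabled": True,
    "params": {
        "input_folder": "output_formatted",
        "output_folder": "output_dc_offset",
        "dc_method": "highpass",
        "highpass_cutoff": 10,
    },
}

cli_args = params_to_cli_args(step["params"])
print(cli_args)
```

Keeping `params` flat means each key maps to exactly one flag, so no nesting rules are needed on either side.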
Orchestrator (`orchestrator.py`):

- Entry point: `python orchestrator.py`.
- Reads `config.json` from its own directory.
- Sorts steps by `step` number and skips disabled ones.
- Launches each script as a subprocess with `--key value` args.
- Streams stdout in real time per step.
- Shows a per-step progress bar and a pipeline summary table.
- Aborts the pipeline immediately if any step fails.
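The orchestrator behaviour listed above can be sketched roughly as follows. This is not the actual `orchestrator.py`: the function names are hypothetical, the progress bar and summary table are omitted, and stderr is left attached to the terminal rather than captured and coloured.

```python
import subprocess
import sys


def select_steps(pipeline: list[dict]) -> list[dict]:
    """Return enabled steps sorted by their 'step' number."""
    return sorted((s for s in pipeline if s["enabled"]), key=lambda s: s["step"])


def build_command(step: dict) -> list[str]:
    """Build the subprocess command: interpreter, script, then --key value pairs."""
    cmd = [sys.executable, step["script"]]
    for key, value in step["params"].items():
        cmd += [f"--{key}", str(value)]
    return cmd


def run_step(step: dict) -> int:
    """Run one step, echoing its stdout line by line, and return the exit code."""
    proc = subprocess.Popen(
        build_command(step),
        stdout=subprocess.PIPE,  # stderr is inherited in this sketch
        text=True,
    )
    for line in proc.stdout:
        print(f"[{step['name']}] {line.rstrip()}")
    return proc.wait()


def run_pipeline(pipeline: list[dict]) -> int:
    """Run steps in order and stop at the first non-zero exit code."""
    for step in select_steps(pipeline):
        code = run_step(step)
        if code != 0:
            print(f"Step {step['step']} ({step['name']}) failed with exit code {code}")
            return code
    return 0
```

A caller would load `config.json` with `json.load` and pass its `pipeline` list to `run_pipeline`, using the return value as the process exit code.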
Format Converter (`convert_format.py`, Step 1):

- Scans `input_folder` for supported audio/video files.
- Detects codec, sample rate, channels, and duration via `ffprobe`.
- Converts to the target format via `ffmpeg` (settings from config).
- Reports real-time progress based on audio seconds processed.
- Skips files that already match the target specification.
- Writes results to `output_folder`.
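The probe, skip, and convert logic above might look something like the sketch below. It only builds the command lines and parses `ffprobe` JSON output, so it can be read without the tools installed. The `TARGET` values (16-bit PCM, 16 kHz, mono) and all function names are assumptions; the real script takes its settings from the config.

```python
import json

# Assumed target specification; the real values come from config.json.
TARGET = {"codec_name": "pcm_s16le", "sample_rate": 16000, "channels": 1}


def ffprobe_cmd(path: str) -> list[str]:
    """Command asking ffprobe for the first audio stream's properties as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a:0",
        "-show_entries", "stream=codec_name,sample_rate,channels:format=duration",
        "-of", "json",
        path,
    ]


def parse_probe(stdout: str) -> dict:
    """Extract codec, sample rate, channels, and duration from ffprobe JSON."""
    data = json.loads(stdout)
    stream = data["streams"][0]
    return {
        "codec_name": stream["codec_name"],
        "sample_rate": int(stream["sample_rate"]),  # ffprobe reports this as a string
        "channels": int(stream["channels"]),
        "duration": float(data["format"]["duration"]),
    }


def matches_target(info: dict, target: dict = TARGET) -> bool:
    """True when the stream already matches every target property."""
    return all(info[key] == value for key, value in target.items())


def ffmpeg_cmd(src: str, dst: str, target: dict = TARGET) -> list[str]:
    """Command converting src to a WAV file with the target properties."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-vn",                               # drop any video stream
        "-ac", str(target["channels"]),      # channel count
        "-ar", str(target["sample_rate"]),   # sample rate in Hz
        "-c:a", target["codec_name"],        # audio codec
        dst,
    ]
```

In use, the output of `subprocess.run(ffprobe_cmd(path), capture_output=True, text=True)` would be fed to `parse_probe`, and `ffmpeg_cmd` would run only when `matches_target` returns `False`. A real check would probably also compare the container or file extension, which this sketch ignores.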
DC Offset Corrector (`dc_offset_correction.py`, Step 2):

- Scans `input_folder` for `.wav` files (output of Step 1).
- Removes DC offset using one of two methods (set via the `dc_method` param):
  - `mean`: subtracts the signal mean (fast, good for stationary offset).
  - `highpass`: applies a first-order IIR high-pass filter at `highpass_cutoff` Hz (better for slowly drifting offset; default cutoff 10 Hz).
- Reports real-time progress (file-based counter) via `##PROGRESS##` markers.
- Preserves the original WAV subtype (e.g. `PCM_16`).
- Writes corrected files to `output_folder`.
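The two methods above can be sketched with numpy as follows. This is an illustrative version, not the code in `dc_offset_correction.py`: the function names are hypothetical, and the high-pass filter uses the common RC form y[n] = a * (y[n-1] + x[n] - x[n-1]) in a plain Python loop, which is simple but slow on long files.

```python
import numpy as np


def remove_dc_mean(x: np.ndarray) -> np.ndarray:
    """Subtract the per-channel mean (axis 0 is time)."""
    x = np.asarray(x, dtype=np.float64)
    return x - np.mean(x, axis=0)


def remove_dc_highpass(x: np.ndarray, sample_rate: int, cutoff_hz: float) -> np.ndarray:
    """First-order IIR high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    x = np.asarray(x, dtype=np.float64)
    rc = 1.0 / (2.0 * np.pi * cutoff_hz)  # RC time constant for the cutoff
    dt = 1.0 / sample_rate                # sampling interval
    a = rc / (rc + dt)
    y = np.zeros_like(x)                  # y[0] = 0, so a constant input yields all zeros
    for n in range(1, len(x)):
        y[n] = a * (y[n - 1] + x[n] - x[n - 1])
    return y


# Example: a 440 Hz tone with a constant offset of 0.5, one second at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
signal = 0.5 + np.sin(2.0 * np.pi * 440.0 * t)

by_mean = remove_dc_mean(signal)
by_highpass = remove_dc_highpass(signal, sr, cutoff_hz=10.0)
```

With `soundfile`, the original subtype could be read from `soundfile.info(path).subtype` and passed back through the `subtype` argument of `soundfile.write`, which is one way to keep a `PCM_16` file as `PCM_16`.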
Rules for pipeline scripts:

- Scripts receive ALL params via CLI `--key value`.
- Scripts must NOT define default values for params.
- If a required param is missing, the script exits with an error.
- Exit code `0` = success; any other = failure.
- stdout is streamed in real time by the orchestrator.
- stderr is shown in red on failure.
- Print `##PROGRESS## <current> <total>` for progress bars.
- The first call sets the total; subsequent calls update it.
- Regular print lines are displayed as dimmed log output.
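A step script following the rules above might be structured like this minimal sketch. The parameter names `input_folder` and `output_folder` appear in this README; the function names, the placeholder work loop, and the orchestrator-side `parse_progress` helper are assumptions for illustration.

```python
import argparse


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse --key value params; every param is required and has no default."""
    parser = argparse.ArgumentParser()
    for name in ("input_folder", "output_folder"):
        # required=True makes argparse exit with a non-zero code if it is missing.
        parser.add_argument(f"--{name}", required=True)
    return parser.parse_args(argv)


def progress_line(current: int, total: int) -> str:
    """Format a progress marker for the orchestrator."""
    return f"##PROGRESS## {current} {total}"


def parse_progress(line: str):
    """Orchestrator side: return (current, total), or None for a regular log line."""
    parts = line.split()
    if len(parts) == 3 and parts[0] == "##PROGRESS##":
        return int(parts[1]), int(parts[2])
    return None


def main(argv: list[str]) -> int:
    args = parse_args(argv)
    items = ["a.wav", "b.wav"]  # placeholder for files found in args.input_folder
    print(progress_line(0, len(items)), flush=True)  # first call sets the total
    for i, item in enumerate(items, start=1):
        print(f"processing {item}", flush=True)       # regular log line
        print(progress_line(i, len(items)), flush=True)
    return 0  # exit code 0 = success
```

A real script would end with `sys.exit(main(sys.argv[1:]))`. Passing `flush=True` matters because stdout is block-buffered when piped, which would otherwise delay the real-time output.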
Requirements:

- Python 3.10+.
- `ffmpeg` and `ffprobe` installed and available in PATH.
- `numpy` and `soundfile` (installed via `requirements.txt`).
- Install deps: `pip install -r requirements.txt`.
audioToText/
├── config.json # Pipeline configuration
├── orchestrator.py # Main entry point
├── convert_format.py # Step 1: format conversion
├── dc_offset_correction.py # Step 2: DC offset correction
├── requirements.txt # Python dependencies
├── README.md # This file
├── input/ # Raw audio files go here
├── output_formatted/ # Step 1 output (created automatically)
└── output_dc_offset/ # Step 2 output (created automatically)
# 1. Place audio files in the input/ folder
# 2. Run the pipeline
python orchestrator.py