Audio-to-Text Pipeline

Modular audio transcription pipeline powered by Whisper.

First, activate the virtual environment (on Windows: `.venv\Scripts\activate.bat`); then you can run the pipeline.

Architecture

| Component | File | Description |
| --- | --- | --- |
| Configuration | `config.json` | Single source of truth for the entire pipeline. |
| Orchestrator | `orchestrator.py` | Reads config, runs steps in order, shows progress. |
| Format Converter | `convert_format.py` | Converts audio to mono 16 kHz WAV for Whisper. |
| DC Offset Corrector | `dc_offset_correction.py` | Removes DC offset from WAV files using numpy. |

config.json

Ordered pipeline steps live under the `pipeline` key.
Each step has: `step`, `name`, `script`, `enabled`, `params`.
`params` is a flat dict passed as `--key value` CLI args.
Logging level and log file are under the `logging` key.
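
The structure described above could look roughly like the following. This is a sketch, not the repository's actual file: the concrete values and the `logging` key names (`level`, `file`) are assumptions, and `validate_config` is a hypothetical helper that only checks the shape of each step.

```python
import json

# Hypothetical config mirroring the structure described above. The concrete
# values and the logging key names ("level", "file") are assumptions.
EXAMPLE_CONFIG = """
{
  "logging": {"level": "INFO", "file": "pipeline.log"},
  "pipeline": [
    {
      "step": 1,
      "name": "Format Converter",
      "script": "convert_format.py",
      "enabled": true,
      "params": {"input_folder": "input", "output_folder": "output_formatted"}
    },
    {
      "step": 2,
      "name": "DC Offset Corrector",
      "script": "dc_offset_correction.py",
      "enabled": true,
      "params": {
        "input_folder": "output_formatted",
        "output_folder": "output_dc_offset",
        "dc_method": "highpass",
        "highpass_cutoff": 10
      }
    }
  ]
}
"""

REQUIRED_STEP_KEYS = {"step", "name", "script", "enabled", "params"}


def validate_config(config: dict) -> list[str]:
    """Return a list of problems found; an empty list means the shape is fine."""
    problems = []
    for index, step in enumerate(config.get("pipeline", [])):
        missing = REQUIRED_STEP_KEYS - step.keys()
        if missing:
            problems.append(f"step #{index}: missing keys {sorted(missing)}")
        # params must be flat: nested dicts or lists cannot become --key value.
        for key, value in step.get("params", {}).items():
            if isinstance(value, (dict, list)):
                problems.append(f"step #{index}: param '{key}' is not a scalar")
    return problems


config = json.loads(EXAMPLE_CONFIG)
problems = validate_config(config)
```

Keeping `params` flat is what makes the one-to-one mapping onto `--key value` arguments possible.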

Orchestrator (`orchestrator.py`)

Entry point: python orchestrator.py.
Reads config.json from its own directory.
Sorts steps by step number, skips disabled ones.
Launches each script as a subprocess with --key value args.
Streams stdout in real-time per step.
Shows a per-step progress bar and a pipeline summary table.
Aborts the pipeline immediately if any step fails.
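
The behaviour listed above can be sketched as follows. This is a simplified illustration, not the actual `orchestrator.py`: the function names are hypothetical, the progress bar and summary table are omitted, and sending stderr to a temporary file is just one possible way to keep it separate from the streamed stdout.

```python
import subprocess
import sys
import tempfile


def runnable_steps(pipeline: list[dict]) -> list[dict]:
    """Drop disabled steps and order the rest by step number."""
    return sorted((s for s in pipeline if s["enabled"]), key=lambda s: s["step"])


def build_command(step: dict) -> list[str]:
    """Turn a step's flat params dict into `python script --key value ...`."""
    command = [sys.executable, step["script"]]
    for key, value in step["params"].items():
        command += [f"--{key}", str(value)]
    return command


def run_pipeline(pipeline: list[dict]) -> int:
    """Run steps in order; return the first non-zero exit code, else 0."""
    for step in runnable_steps(pipeline):
        # stderr goes to a temp file so a chatty child cannot block on a full pipe.
        with tempfile.TemporaryFile(mode="w+") as stderr_file:
            with subprocess.Popen(
                build_command(step),
                stdout=subprocess.PIPE,
                stderr=stderr_file,
                text=True,
                bufsize=1,  # line-buffered on the reading side
            ) as process:
                for line in process.stdout:
                    print(f"[{step['name']}] {line}", end="")
            if process.returncode != 0:
                stderr_file.seek(0)
                print(stderr_file.read(), file=sys.stderr, end="")
                return process.returncode  # abort: later steps never start
    return 0
```

Output only appears in real time if the child script flushes its own stdout, for example with `print(..., flush=True)`.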

Format Converter (`convert_format.py`)

Scans input_folder for supported audio/video files.
Detects codec, sample rate, channels, and duration via ffprobe.
Converts to target format via ffmpeg (settings from config).
Reports real-time progress based on audio seconds processed.
Skips files that already match the target specification.
Writes results to output_folder.

DC Offset Corrector (`dc_offset_correction.py`)

Scans input_folder for .wav files (output of Step 1).
Removes DC offset using one of two methods (set via dc_method param):
- mean — subtracts the signal mean (fast, good for stationary offset).
- highpass — applies a first-order IIR high-pass filter at highpass_cutoff Hz (better for slowly drifting offset; default cutoff 10 Hz).
Reports real-time progress (file-based counter) via ##PROGRESS## markers.
Preserves original WAV subtype (e.g. PCM_16).
Writes corrected files to output_folder.
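
The two methods above can be sketched with numpy as follows. This is a minimal illustration rather than the actual `dc_offset_correction.py`: the function names are hypothetical, and the high-pass uses the standard RC-style recurrence `y[n] = a * (y[n-1] + x[n] - x[n-1])`, which is one common first-order IIR form and may differ from the script's filter. `correct_file` assumes `soundfile` is installed.

```python
import math

import numpy as np


def remove_dc_mean(samples: np.ndarray) -> np.ndarray:
    """Subtract the per-channel mean; shape is (frames,) or (frames, channels)."""
    return samples - samples.mean(axis=0, keepdims=True)


def remove_dc_highpass(
    samples: np.ndarray, sample_rate: int, cutoff_hz: float
) -> np.ndarray:
    """First-order IIR high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    a = rc / (rc + dt)
    x = np.asarray(samples, dtype=np.float64)
    y = np.zeros_like(x)
    if len(x) == 0:
        return y
    # Starting from the first sample avoids a step transient at n = 0.
    prev_x = x[0]
    prev_y = np.zeros_like(x[0])
    for n in range(len(x)):  # plain loop for clarity, not speed
        prev_y = a * (prev_y + x[n] - prev_x)
        prev_x = x[n]
        y[n] = prev_y
    return y


def correct_file(src: str, dst: str, dc_method: str, highpass_cutoff: float) -> None:
    """Read a WAV, remove the offset, and write it back with the same subtype."""
    import soundfile as sf  # imported here so the helpers above need only numpy

    subtype = sf.info(src).subtype  # e.g. "PCM_16"
    data, sample_rate = sf.read(src, always_2d=True)
    if dc_method == "mean":
        corrected = remove_dc_mean(data)
    elif dc_method == "highpass":
        corrected = remove_dc_highpass(data, sample_rate, highpass_cutoff)
    else:
        raise ValueError(f"unknown dc_method: {dc_method}")
    sf.write(dst, corrected, sample_rate, subtype=subtype)
```

Mean subtraction removes exactly one constant per channel, while the high-pass also tracks an offset that changes slowly over the length of the file.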

Script Contract

Scripts receive ALL params via CLI --key value.
Scripts must NOT define default values for params.
If a required param is missing, script exits with error.
Exit code 0 = success; any other = failure.
stdout is streamed in real-time by the orchestrator.
stderr is shown in red on failure.
Print ##PROGRESS## <current> <total> for progress bars.
First call sets the total; subsequent calls update it.
Regular print lines are displayed as dimmed log output.

Prerequisites

Python 3.10+.
ffmpeg and ffprobe installed and available in PATH.
numpy and soundfile (installed via requirements.txt).
Install deps: pip install -r requirements.txt.

Folder Structure

audioToText/
├── config.json               # Pipeline configuration
├── orchestrator.py            # Main entry point
├── convert_format.py          # Step 1: format conversion
├── dc_offset_correction.py    # Step 2: DC offset correction
├── requirements.txt           # Python dependencies
├── README.md                  # This file
├── input/                     # Raw audio files go here
├── output_formatted/          # Step 1 output (created automatically)
└── output_dc_offset/          # Step 2 output (created automatically)

Usage

# 1. Place audio files in the input/ folder
# 2. Run the pipeline
python orchestrator.py
