High-performance audio DSP library in Mojo that BEATS Python!
Whisper-compatible mel spectrogram preprocessing built from scratch in Mojo. 20-40% faster than Python's librosa through algorithmic optimizations, parallelization, and SIMD vectorization.
30-second Whisper audio preprocessing:
librosa (Python): 15ms (1993x realtime)
mojo-audio (Mojo): 12ms (2457x realtime) with -O3
RESULT: 20-40% FASTER THAN PYTHON! 🔥
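The realtime factors are just audio duration divided by processing time; a quick arithmetic check with the rounded timings above (the table's exact 1993x/2457x figures come from the unrounded measurements):

```python
# Realtime factor = audio duration / processing time
audio_seconds = 30.0

librosa_ms = 15.0  # Python baseline from the benchmark above (rounded)
mojo_ms = 12.0     # mojo-audio with -O3 (rounded)

print(audio_seconds / (librosa_ms / 1000))   # ~2000x realtime
print(audio_seconds / (mojo_ms / 1000))      # ~2500x realtime
print((librosa_ms - mojo_ms) / librosa_ms)   # 0.2 -> 20% faster
```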
Optimization Journey:
- Started: 476ms (naive implementation)
- Optimized: 12ms (with -O3 compiler flags)
- Total speedup: 40x!
See complete optimization journey →
- Window Functions: Hann, Hamming (spectral leakage reduction)
- FFT Operations: Radix-2/4 iterative FFT, True RFFT for real audio
- STFT: Parallelized short-time Fourier transform across all CPU cores
- Mel Filterbank: Sparse-optimized triangular filters (80 bands)
- Mel Spectrogram: Complete end-to-end pipeline
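For orientation, the five components above compose into the standard log-mel pipeline. Here is a minimal NumPy sketch of the same math — this is not the library's Mojo code, and the HTK-style mel formula and all helper names are my own illustrative assumptions:

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (one common convention; the library's exact
    # formula may differ)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers evenly spaced on the mel scale
    # (dense here; the library uses a sparse-optimized version)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram_sketch(audio, sr=16000, n_fft=400, hop=160, n_mels=80):
    window = np.hanning(n_fft)                        # spectral leakage reduction
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # RFFT + power spectrum
    mel = mel_filterbank(n_mels, n_fft, sr) @ power.T # (n_mels, n_frames)
    return np.log10(np.maximum(mel, 1e-10))           # log-mel output
```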
One function call:
var mel = mel_spectrogram(audio)  # (80, 2998) in ~12ms!

- ✅ Iterative FFT (3x speedup)
- ✅ Pre-computed twiddle factors (1.7x)
- ✅ Sparse mel filterbank (1.24x)
- ✅ Twiddle caching across frames (2x!)
- ✅ Float32 precision (2x SIMD width)
- ✅ True RFFT algorithm (1.4x)
- ✅ Multi-core parallelization (1.3-1.7x)
- ✅ Radix-4 FFT (1.1-1.2x)
- ✅ Compiler optimization (-O3)
Combined: 40x faster than naive implementation!
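The first and largest win above, the iterative FFT, restructures the recursion into sequential, cache-friendly passes over one buffer. A minimal iterative radix-2 Cooley-Tukey in Python for illustration (not the library's code; its Mojo version additionally precomputes twiddles and uses radix-4 stages):

```python
import cmath

def fft_iterative(x):
    """Iterative radix-2 Cooley-Tukey FFT (len(x) must be a power of 2)."""
    n = len(x)
    x = list(x)
    # Bit-reversal permutation puts inputs in butterfly order
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]
    # Butterfly stages: no recursion, just sequential passes over the buffer
    size = 2
    while size <= n:
        half = size // 2
        step = cmath.exp(-2j * cmath.pi / size)  # twiddle base (precomputable)
        for start in range(0, n, size):
            w = 1.0 + 0j
            for k in range(start, start + half):
                t = w * x[k + half]
                x[k + half] = x[k] - t
                x[k] = x[k] + t
                w *= step
        size *= 2
    return x
```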
# Clone repository
git clone https://github.com/itsdevcoffee/mojo-audio.git
cd mojo-audio
# Install dependencies (requires Mojo)
pixi install
# Run tests
pixi run test
# Run optimized benchmark
pixi run bench-optimized

from audio import mel_spectrogram
fn main() raises:
    # Load 30s audio @ 16kHz (480,000 samples)
    var audio: List[Float32] = [...]  # Float32 for performance!

    # Get Whisper-compatible mel spectrogram
    var mel_spec = mel_spectrogram(audio)

    # Output: (80, 2998) mel spectrogram
    # Time: ~12ms with -O3
    # Ready for Whisper model!

Compile with optimization:
mojo -O3 -I src your_code.mojo

- Mojo: Version 0.26+ (install via Modular)
- pixi: Package manager (install pixi)
- Platform: Linux x86_64 or macOS (Apple Silicon/Intel)
# Clone and setup
git clone https://github.com/itsdevcoffee/mojo-audio.git
cd mojo-audio
# Install dependencies
pixi install
# Build FFI library (optional - for C/Rust/Python integration)
pixi run build-ffi-optimized
# Verify
ls libmojo_audio.so  # Should show ~26KB shared library

Note: Due to Mojo API changes, macOS builds currently require specific setup. See our macOS Build Guide for detailed instructions.
Quick build:
# Clone and setup
git clone https://github.com/itsdevcoffee/mojo-audio.git
cd mojo-audio
# Install dependencies
pixi install
# Build FFI library (.dylib for macOS)
pixi run build-ffi-optimized
# Verify
ls libmojo_audio.dylib  # Should show ~20-30KB shared library

Troubleshooting: If you encounter build errors on macOS, please refer to the detailed macOS build guide which includes solutions for common issues.
For creating release binaries:
- Linux: Use `pixi run build-ffi-optimized` (creates `libmojo_audio.so`)
- macOS: Follow the manual build guide
| Implementation | Time (30s) | Throughput | Language | Our Result |
|---|---|---|---|---|
| mojo-audio -O3 | 12ms | 2457x | Mojo | 🔥 Winner! |
| librosa | 15ms | 1993x | Python | 20-40% slower |
| faster-whisper | 20-30ms | ~1000x | Python | 1.6-2.5x slower |
| whisper.cpp | 50-100ms | ~300-600x | C++ | 4-8x slower |
mojo-audio is the FASTEST Whisper preprocessing library!
# Mojo (optimized)
pixi run bench-optimized
# Python baseline (requires librosa)
pixi run bench-python
# Standard (no compiler opts)
pixi run bench

Results: Consistently 20-40% faster than librosa!
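A baseline like `compare_librosa.py` reduces to a warm-up-then-time loop. A generic sketch of that pattern (the `benchmark` helper is my own illustrative code, not the repo's script; the librosa call in the comment is one plausible measured workload):

```python
import time

def benchmark(fn, *args, warmup=3, runs=10):
    """Best-of-N wall-clock timing in milliseconds."""
    for _ in range(warmup):       # warm caches before measuring
        fn(*args)
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(*args)
        times.append((time.perf_counter() - t0) * 1000.0)
    return min(times)

# For a librosa baseline, the measured call would be something like:
#   librosa.feature.melspectrogram(y=audio, sr=16000, n_fft=400,
#                                  hop_length=160, n_mels=80)
ms = benchmark(sum, range(100_000))  # dummy workload for demonstration
print(f"{ms:.3f} ms")
```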
pixi run demo-window

See Hann and Hamming windows in action.

pixi run demo-fft

Demonstrates FFT, power spectrum, and STFT.

pixi run demo-mel

Full mel spectrogram generation with explanations.
mel_spectrogram(
audio: List[Float32],
sample_rate: Int = 16000,
n_fft: Int = 400,
hop_length: Int = 160,
n_mels: Int = 80
) raises -> List[List[Float32]]

Complete Whisper preprocessing pipeline.
Returns: Mel spectrogram (80, ~3000) for 30s audio
# Window functions
hann_window(size: Int) -> List[Float32]
hamming_window(size: Int) -> List[Float32]
apply_window(signal, window) -> List[Float32]

# FFT operations
fft(signal: List[Float32]) -> List[Complex]
rfft(signal: List[Float32]) -> List[Complex]  # 2x faster for real audio!
power_spectrum(fft_output) -> List[Float32]
stft(signal, n_fft, hop_length, window_fn) -> List[List[Float32]]

# Mel scale
hz_to_mel(freq_hz: Float32) -> Float32
mel_to_hz(freq_mel: Float32) -> Float32
create_mel_filterbank(n_mels, n_fft, sample_rate) -> List[List[Float32]]

# Normalization
normalize_whisper(mel_spec) -> List[List[Float32]]  # Whisper-ready output
normalize_minmax(mel_spec) -> List[List[Float32]]   # Scale to [0, 1]
normalize_zscore(mel_spec) -> List[List[Float32]]   # Mean=0, std=1
apply_normalization(mel_spec, norm_type) -> List[List[Float32]]

Different ML models expect different input ranges. mojo-audio supports multiple normalization methods:
| Constant | Value | Formula | Output Range | Use Case |
|---|---|---|---|---|
| `NORM_NONE` | 0 | log10(max(x, 1e-10)) | [-10, 0] | Raw output, custom processing |
| `NORM_WHISPER` | 1 | max-8 clamp, then (x+4)/4 | ~[-1, 1] | OpenAI Whisper models |
| `NORM_MINMAX` | 2 | (x - min) / (max - min) | [0, 1] | General ML |
| `NORM_ZSCORE` | 3 | (x - mean) / std | ~[-3, 3] | Wav2Vec2, research |
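Each formula in the table is a couple of lines of array math. NumPy reference implementations of my reading of those formulas (function names are illustrative, not the library's API):

```python
import numpy as np

def norm_none(log_mel):
    return log_mel                                     # raw log10 output

def norm_whisper(log_mel):
    # Whisper convention: clamp dynamic range to (max - 8), then (x + 4) / 4
    x = np.maximum(log_mel, log_mel.max() - 8.0)
    return (x + 4.0) / 4.0                             # -> roughly [-1, 1]

def norm_minmax(log_mel):
    lo, hi = log_mel.min(), log_mel.max()
    return (log_mel - lo) / (hi - lo)                  # -> [0, 1]

def norm_zscore(log_mel):
    return (log_mel - log_mel.mean()) / log_mel.std()  # mean 0, std 1
```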
from audio import mel_spectrogram, normalize_whisper, NORM_WHISPER
# Option 1: Separate normalization
var raw_mel = mel_spectrogram(audio)
var whisper_mel = normalize_whisper(raw_mel)

# Option 2: Using apply_normalization
var normalized = apply_normalization(raw_mel, NORM_WHISPER)

Set normalization in config to apply normalization in the pipeline:
MojoMelConfig config;
mojo_mel_config_default(&config);
config.normalization = MOJO_NORM_WHISPER; // Whisper-ready output
// Compute returns normalized mel spectrogram
MojoMelHandle handle = mojo_mel_spectrogram_compute(audio, num_samples, &config);

let config = MojoMelConfig {
sample_rate: 16000,
n_fft: 400,
hop_length: 160,
n_mels: 128, // Whisper large-v3
normalization: 1, // MOJO_NORM_WHISPER
};

Matches OpenAI Whisper requirements:
- ✅ Sample rate: 16kHz
- ✅ FFT size: 400
- ✅ Hop length: 160 (10ms frames)
- ✅ Mel bands: 80 (v2) or 128 (v3)
- ✅ Output shape: (n_mels, ~3000) for 30s
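The ~3000-frame output width in the checklist follows directly from the hop length:

```python
# Whisper front-end parameters from the checklist above
sample_rate = 16000
hop_length = 160                     # 10 ms per frame
seconds = 30

num_samples = seconds * sample_rate  # 480,000 samples
frames = num_samples // hop_length   # one frame per hop
print(num_samples, frames)           # 480000 3000
# The exact count depends on padding/centering at the edges, which is
# why the library reports shapes like (80, 2998), i.e. ~3000.
```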
| Whisper Model | n_mels | Constant |
|---|---|---|
| tiny, base, small, medium, large, large-v2 | 80 | WHISPER_N_MELS |
| large-v3 | 128 | WHISPER_N_MELS_V3 |
# For large-v2 and earlier (default)
var mel = mel_spectrogram(audio)  # n_mels=80

# For large-v3
var mel = mel_spectrogram(audio, n_mels=128)

Validated against Whisper model expectations!
Algorithmic:
- Iterative Cooley-Tukey FFT (cache-friendly)
- Radix-4 butterflies for power-of-4 sizes
- True RFFT with pack-FFT-unpack
- Sparse matrix operations
Parallelization:
- Multi-core STFT frame processing
- Thread-safe writes (each core handles different frames)
- Scales linearly with CPU cores
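The frame-parallel STFT described above is embarrassingly parallel: each worker computes and writes a disjoint frame, so no locking is needed. A Python analogue using threads (illustrative only; NumPy's FFT releases the GIL so threads can use multiple cores here, while the Mojo version parallelizes the same loop natively):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def power_frame(audio, window, hop, i):
    # Each worker touches only its own frame -> thread-safe by construction
    start = i * hop
    return np.abs(np.fft.rfft(audio[start:start + len(window)] * window)) ** 2

def parallel_stft(audio, n_fft=400, hop=160):
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    with ThreadPoolExecutor() as pool:  # one task per frame, spread over cores
        frames = list(pool.map(
            lambda i: power_frame(audio, window, hop, i), range(n_frames)))
    return np.stack(frames)             # (n_frames, n_fft // 2 + 1)
```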
SIMD Vectorization:
- Float32 for 16-element SIMD (vs 8 for Float64)
- @parameter compile-time unrolling
- Direct pointer loads where possible
Memory Optimization:
- Pre-computed twiddle factors
- Cached coefficients across frames
- Pre-allocated output buffers
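"Pre-computed twiddle factors" means hoisting the complex exponentials out of the transform's inner loop into a table built once per FFT size, then reused across every STFT frame. A small illustrative sketch (a naive DFT for brevity; a real FFT indexes the same kind of cached table at each butterfly stage):

```python
import cmath

def make_twiddles(n):
    """Half-table of twiddle factors W_n^k = exp(-2*pi*i*k/n), built once."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]

def dft_with_twiddles(x):
    # Naive O(n^2) DFT, but every exp() call is hoisted into the table;
    # the second half of the unit circle is just the negated first half.
    n = len(x)
    tw = make_twiddles(n) + [-w for w in make_twiddles(n)]
    return [sum(x[j] * tw[(j * k) % n] for j in range(n)) for k in range(n)]
```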
- COMPLETE_VICTORY.md - How we beat Python (full story!)
- Benchmark Results - Timestamped performance data
- Examples - Educational demos with explanations
mojo-audio/
├── src/
│   └── audio.mojo                  # Core library (1,200+ lines)
├── tests/
│   ├── test_window.mojo            # Window function tests
│   ├── test_fft.mojo               # FFT operation tests
│   └── test_mel.mojo               # Mel filterbank tests
├── examples/
│   ├── window_demo.mojo
│   ├── fft_demo.mojo
│   └── mel_demo.mojo
├── benchmarks/
│   ├── bench_mel_spectrogram.mojo  # Mojo benchmarks
│   ├── compare_librosa.py          # Python baseline
│   └── RESULTS_2025-12-31.md       # Historical data
├── docs/
│   └── COMPLETE_VICTORY.md         # Optimization story
└── pixi.toml                       # Dependencies & tasks
- Understand every line of code
- No black-box dependencies
- Educational value
- Beats Python's librosa
- Faster than C++ alternatives
- Production-ready
- Demonstrates Mojo's capabilities
- Combines Python ergonomics with C performance
- SIMD, parallelization, optimization techniques
- All tests passing (17 tests)
- Whisper-compatible output
- Well-documented
- Actively optimized
Contributions welcome! Areas where help is appreciated:
- Further optimizations (see potential in docs)
- Additional features (MFCC, CQT, etc.)
- Platform-specific tuning (ARM, x86)
- Documentation improvements
- Bug fixes and testing
If you use mojo-audio in your research or project:
@software{mojo_audio_2026,
author = {Dev Coffee},
title = {mojo-audio: High-performance audio DSP in Mojo},
year = {2026},
url = {https://github.com/itsdevcoffee/mojo-audio}
}

- Visage ML - Neural network library in Mojo
- Mojo - The Mojo programming language
- MAX Engine - Modular's AI engine
- 🏆 Beats librosa (Python's standard audio library)
- 🚀 40x speedup from naive to optimized
- ⚡ ~2500x realtime throughput
- ✅ All from scratch in Mojo
- 📚 Complete learning resource
Built with Mojo 🔥 | Faster than Python | Production-Ready
GitHub | Issues | Discussions

