wav2vec2.cpp

High-performance C/C++ implementation of Wav2Vec 2.0 for phoneme recognition, using the GGML tensor library.

Wav2Vec 2.0 is a self-supervised speech representation learning framework from Facebook AI Research (now Meta AI). After pre-training on unlabeled audio, it can be fine-tuned to strong recognition accuracy with very little labeled data.

Note: This project was vibe coded with an AI assistant and draws heavily from whisper.cpp.

Features

Plain C/C++ implementation without dependencies
Apple Silicon first-class support (via Metal)
Mixed F16/F32 precision
Quantization support (Q4, Q5, Q6, Q8)
Phoneme recognition with timing information
CTC decoding with configurable options
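
For readers unfamiliar with CTC decoding, its simplest variant (greedy, or best-path) can be sketched as below. This is an illustration only, not the code in src/wav2vec2.cpp: the function name `ctc_greedy_decode`, the `PhonemeSpan` struct, the row-major logits layout, and the blank id being passed in by the caller are all assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch: a token id plus the half-open
// frame range [frame_begin, frame_end) over which it was emitted.
struct PhonemeSpan {
    int    token;
    size_t frame_begin;
    size_t frame_end;
};

// Greedy (best-path) CTC decoding over row-major logits of shape
// [n_frames x n_vocab]: take the argmax of each frame, merge consecutive
// repeats of the same token, and drop the blank token.
std::vector<PhonemeSpan> ctc_greedy_decode(const std::vector<float> & logits,
                                           size_t n_frames, size_t n_vocab,
                                           int blank_id) {
    std::vector<PhonemeSpan> out;
    if (n_vocab == 0) {
        return out;
    }
    int prev = blank_id;
    for (size_t t = 0; t < n_frames; ++t) {
        const float * row = logits.data() + t * n_vocab;
        int best = 0;
        for (size_t v = 1; v < n_vocab; ++v) {
            if (row[v] > row[best]) {
                best = (int) v;
            }
        }
        if (best != blank_id) {
            if (best == prev) {
                out.back().frame_end = t + 1;    // same token continues
            } else {
                out.push_back({best, t, t + 1}); // a new token starts here
            }
        }
        prev = best; // a blank resets prev, so "a _ a" yields two tokens
    }
    return out;
}
```

With blank id 0 and per-frame argmaxes `1 1 0 1 2 2`, the function returns three spans: token 1 over frames [0, 2), token 1 over [3, 4), and token 2 over [4, 6). Each wav2vec 2.0 output frame advances by 320 input samples (20 ms at 16 kHz), so a frame range can be turned into seconds by multiplying by 0.02; whether this project derives its timing information exactly this way is not stated here.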

Quick Start

Build

mkdir build && cd build
cmake ..
make -j

# With Metal support (macOS/iOS)
cmake -DGGML_METAL=ON ..
make -j

Convert Model

# Install dependencies
pip install torch transformers

# Convert HuggingFace model to GGML format
python models/convert-wav2vec2-to-ggml.py \
    facebook/wav2vec2-lv-60-espeak-cv-ft \
    models/wav2vec2-phoneme

Run

# Basic phoneme recognition
./bin/wav2vec2-cli -m models/wav2vec2-phoneme/ggml-model-f16.bin -f samples/audio.wav

# With timing information
./bin/wav2vec2-cli -m models/wav2vec2-phoneme/ggml-model-f16.bin -f samples/audio.wav --print-timestamps

Quantize

# Quantize to Q6_K (recommended; ~2.2x smaller than F16, PER 1.4% vs 1.0% for F16 in the evaluation below)
./bin/quantize-wav2vec2 models/wav2vec2-phoneme/ggml-model-f16.bin models/wav2vec2-phoneme/ggml-model-q6_k.bin q6_k

Project Structure

wav2vec2.cpp/
├── src/                    # Core library
│   ├── wav2vec2.cpp       # Main implementation
│   ├── wav2vec2-arch.h    # Architecture definitions
│   └── CMakeLists.txt
├── include/
│   └── wav2vec2.h         # Public C API
├── examples/
│   ├── wav2vec2/          # CLI tools
│   │   ├── wav2vec2-cli.cpp
│   │   └── quantize-wav2vec2.cpp
│   ├── common.cpp/h       # Shared utilities
│   └── common-ggml.cpp/h  # GGML utilities
├── models/
│   └── convert-wav2vec2-to-ggml.py
├── ggml/                   # GGML tensor library
└── cmake/

API Usage

#include <stdint.h>  // int64_t
#include <stdio.h>   // printf

#include "wav2vec2.h"

// Initialize
struct wav2vec2_context_params cparams = wav2vec2_context_default_params();
cparams.use_gpu = true;

struct wav2vec2_context * ctx = wav2vec2_init_from_file("model.bin", cparams);

// Run inference
struct wav2vec2_full_params params = wav2vec2_full_default_params();
wav2vec2_full(ctx, params, samples, n_samples);

// Get results
int n_phonemes = wav2vec2_full_n_phonemes(ctx);
for (int i = 0; i < n_phonemes; i++) {
    const char * phoneme = wav2vec2_full_get_phoneme_text(ctx, i);
    int64_t t0 = wav2vec2_full_get_phoneme_t0(ctx, i);
    int64_t t1 = wav2vec2_full_get_phoneme_t1(ctx, i);
    // int64_t is not guaranteed to be long long, so cast to match %lld
    printf("[%lld - %lld] %s\n", (long long) t0, (long long) t1, phoneme);
}

// Cleanup
wav2vec2_free(ctx);
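
The snippet above uses `samples` and `n_samples` without showing where they come from. The input contract is not documented here; assuming it follows whisper.cpp (16 kHz mono `float` samples in roughly [-1, 1]), 16-bit PCM could be prepared along these lines. `pcm16_to_float` is a hypothetical helper written for this example and is not part of wav2vec2.h.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert interleaved 16-bit PCM to mono float samples.
// Each sample is scaled by 1/32768, and multi-channel input is downmixed by
// averaging the channels of each frame. Resampling to 16 kHz is not handled.
std::vector<float> pcm16_to_float(const std::vector<int16_t> & pcm, int n_channels) {
    std::vector<float> out;
    if (n_channels <= 0) {
        return out;
    }
    const size_t n_ch     = (size_t) n_channels;
    const size_t n_frames = pcm.size() / n_ch; // a trailing partial frame is ignored
    out.reserve(n_frames);
    for (size_t i = 0; i < n_frames; ++i) {
        float sum = 0.0f;
        for (size_t c = 0; c < n_ch; ++c) {
            sum += (float) pcm[i * n_ch + c];
        }
        out.push_back(sum / (32768.0f * (float) n_ch));
    }
    return out;
}
```

The result could then be passed as `samples.data()` and `(int) samples.size()`, assuming `wav2vec2_full` takes a `const float *` and a sample count as the call above suggests.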

Evaluation

Tested on L2-ARCTIC accented English speech samples, comparing C++ output against the HuggingFace Python reference implementation.

Accuracy vs Reference

Model	PER vs Python reference	Notes
F16	1.0%	Near-exact parity with reference
Q6_K	1.4%	+0.4 percentage points vs F16, 2.2x smaller
Q4_K	1.7%	+0.7 percentage points vs F16, 3x smaller

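PER = Phoneme Error Rate (edit distance / reference length). Here the reference is the phoneme sequence produced by the Python implementation, not a ground-truth transcription.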
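
The PER definition above can be made concrete with a short sketch. It assumes the edit distance is the standard Levenshtein distance with unit costs over phoneme tokens; the function names are illustrative and this is not necessarily the script used to produce the numbers in the table.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance between two phoneme sequences, with unit cost for
// substitution, insertion, and deletion. Uses two rolling rows of the DP table.
size_t edit_distance(const std::vector<std::string> & ref,
                     const std::vector<std::string> & hyp) {
    std::vector<size_t> prev(hyp.size() + 1);
    std::vector<size_t> cur(hyp.size() + 1);
    for (size_t j = 0; j <= hyp.size(); ++j) {
        prev[j] = j; // turning an empty reference into hyp[0..j) takes j insertions
    }
    for (size_t i = 1; i <= ref.size(); ++i) {
        cur[0] = i;  // turning ref[0..i) into an empty hypothesis takes i deletions
        for (size_t j = 1; j <= hyp.size(); ++j) {
            const size_t sub = prev[j - 1] + (ref[i - 1] == hyp[j - 1] ? 0 : 1);
            cur[j] = std::min({sub, prev[j] + 1, cur[j - 1] + 1});
        }
        std::swap(prev, cur);
    }
    return prev[hyp.size()];
}

// PER = edit distance / reference length. For an empty reference this sketch
// returns 0 when the hypothesis is also empty and 1 otherwise (a convention
// chosen here, not taken from the project).
double phoneme_error_rate(const std::vector<std::string> & ref,
                          const std::vector<std::string> & hyp) {
    if (ref.empty()) {
        return hyp.empty() ? 0.0 : 1.0;
    }
    return (double) edit_distance(ref, hyp) / (double) ref.size();
}
```

For example, with reference `h e l o` and hypothesis `h a l o` there is one substitution over four reference phonemes, so the PER is 0.25. Because the denominator is the reference length, a hypothesis with many insertions can push PER above 1.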

Model Size

Quantization	Size	Compression
F16	~600 MB	1x
Q6_K	~270 MB	2.2x
Q4_K	~200 MB	3x

Q4_K is recommended for mobile deployment: it is about 3x smaller than F16 and costs 0.7 percentage points of PER on this test set.
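
As a rough sanity check on the size table, the file sizes can be converted into an effective number of bits per weight. The sketch below assumes the F16 file stores essentially all weights at 16 bits and ignores file metadata; `effective_bits_per_weight` is a name made up for this example.

```cpp
// Effective bits per weight implied by a quantized file size, relative to an
// F16 file of the same model (16 bits per weight). Metadata is ignored, so
// this is an approximation.
double effective_bits_per_weight(double size_mb, double f16_size_mb) {
    return 16.0 * size_mb / f16_size_mb;
}

// Compression ratio relative to the F16 file.
double compression_ratio(double size_mb, double f16_size_mb) {
    return f16_size_mb / size_mb;
}
```

Using the approximate sizes above, Q6_K works out to about 7.2 bits per weight (16 x 270 / 600) and Q4_K to about 5.3 (16 x 200 / 600). The GGML block formats themselves use 6.5625 bits per weight for Q6_K and 4.5 for Q4_K, so the gap suggests that some tensors are kept at higher precision, although that is an inference from the numbers rather than something stated here.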

References

@article{baevski2020wav2vec,
  title={wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations},
  author={Baevski, Alexei and Zhou, Henry and Mohamed, Abdelrahman and Auli, Michael},
  journal={arXiv preprint arXiv:2006.11477},
  year={2020}
}

Acknowledgments

This project draws heavily from whisper.cpp by Georgi Gerganov and contributors. The architecture, build system, and many implementation patterns are adapted from that excellent project.

License

MIT
