Audio Features — Tempo, MFCC, Spectral, Key & Beats in DuckDB

A Query.Farm VGI worker for DuckDB.

Audio Features — Tempo, MFCC, Spectral, Key & Beats in DuckDB

vgi-audio · a Query.Farm VGI worker · powered by librosa

A VGI worker that brings audio feature extraction into DuckDB/SQL. It runs librosa over audio you reference by file path or pass as raw bytes (BLOB) — pulling out duration, sample rate, tempo, MFCCs, spectral features, a heuristic musical key, and beat onset times — as plain SQL functions.

INSTALL vgi FROM community; LOAD vgi;
ATTACH 'audio' (TYPE vgi, LOCATION 'uv run audio_worker.py');

SELECT audio.duration('/tmp/tone.wav');            -- 2.0   (seconds)
SELECT audio.sample_rate('/tmp/tone.wav');          -- 22050 (Hz)
SELECT audio.channels('/tmp/tone.wav');             -- 1
SELECT audio.tempo('/tmp/click.wav');               -- ~120.0 (estimated BPM)
SELECT audio.rms_energy('/tmp/tone.wav');           -- mean RMS energy
SELECT audio.zero_crossing_rate('/tmp/tone.wav');   -- mean ZCR
SELECT audio.spectral_centroid('/tmp/tone.wav');    -- Hz (spectrum centre of mass)
SELECT audio.spectral_bandwidth('/tmp/tone.wav');   -- Hz
SELECT audio.mfcc('/tmp/tone.wav');                 -- DOUBLE[13]
SELECT audio.mfcc('/tmp/tone.wav', 20);             -- DOUBLE[20]
SELECT audio.estimated_key('/tmp/tone.wav');        -- 'C major' (heuristic!)

SELECT * FROM audio.beats('/tmp/click.wav');        -- (seq, time) beat onsets
SELECT * FROM audio.audio_info('/tmp/tone.wav');    -- (duration, sample_rate, channels)

Path or bytes — the polymorphic audio argument

Every function accepts the audio two ways:

a VARCHAR path — the worker opens the file from its own filesystem; or
a BLOB — raw audio bytes that travel over Arrow (e.g. from read_blob(...)), decoded in-memory.

-- bytes straight from DuckDB's read_blob:
SELECT file, audio.duration(content)
FROM read_blob('recordings/*.wav');

Under the hood each feature is registered as two same-name overloads — one whose first argument is VARCHAR, one BLOB — and DuckDB resolves which to call by the column type. (mfcc additionally has an optional coefficient-count argument, so it has path/bytes × default/n overloads.)

Scalars (per-row) vs. table functions

Scalars take positional arguments only and resolve overloads by type/arity (DuckDB's name := value syntax is a table-function feature, not a scalar one). Every per-row answer is a scalar, so it works inline in any projection or predicate: duration, sample_rate, channels, tempo, rms_energy, zero_crossing_rate, spectral_centroid, spectral_bandwidth, mfcc, estimated_key.
Table functions return rows: beats(audio) → (seq BIGINT, time DOUBLE) beat onset times; audio_info(audio) → a single (duration DOUBLE, sample_rate INT, channels INT) row.

Robustness — audio bytes are untrusted

A SQL worker is fed whatever is in the column. Malformed, truncated, absurdly large, or maliciously crafted audio must never crash the worker process, so:

every decode/analysis is wrapped per row; failures return NULL (scalars) or no rows (table functions) instead of raising;
NULL input → NULL output / no rows;
decoded audio is bounded — at most 30 minutes (MAX_DURATION_SECONDS) is materialised, so a corrupt header claiming billions of frames can't exhaust memory; longer inputs are analysed on a truncated prefix;
analysis runs at a fixed 22.05 kHz mono internally, bounding per-frame work.

A garbage blob next to a valid one yields NULL for the bad row and the correct value for the good row — the process keeps serving.

Supported formats & native dependencies

WAV / FLAC / OGG decode natively via soundfile (libsndfile, bundled in the wheel) — no external tools required.
Compressed formats (mp3, m4a, ...) require ffmpeg (or audioread) installed on the worker host. If neither is present, those decodes return NULL rather than crashing.

Python dependencies and their licenses:

Library	Purpose	License
`librosa`	feature extraction	ISC
`soundfile`	WAV/FLAC/OGG decode	BSD-3-Clause
`numpy`	array math	BSD-3-Clause

This worker itself is MIT licensed (see LICENSE).

estimated_key is a heuristic chroma-based (Krumhansl-Schmuckler) estimate — useful as a hint, not ground truth.

Layout

audio_worker.py        repo-root stdio entry point; PEP 723 inline deps; main()
vgi_audio/
  features.py          pure feature logic over librosa/soundfile/numpy; no Arrow/VGI; total
  scalars.py           per-row scalars (path/bytes overloads, mfcc n overload)
  tables.py            table functions: beats, audio_info (path/bytes overloads)
  schema_utils.py      pa.Field comment / column-doc helper
tests/                 pytest: test_features (pure), test_tables (in-proc), test_scalars (Client RPC)
test/sql/*.test        haybarn-unittest sqllogictest — authoritative E2E
Makefile               test / test-unit / test-sql / lint

Development

uv sync --extra dev
uv run pytest -q                # unit + integration
make test-sql                   # end-to-end SQL via haybarn-unittest
uv run ruff check . && uv run mypy vgi_audio/

Test audio is generated deterministically in-process (numpy sine tones / click tracks written to in-memory WAV via soundfile); a couple of tiny committed WAV fixtures under test/sql/data/ drive the SQL E2E.

Authorship & License

Written by Query.Farm.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
.github/workflows		.github/workflows
ci		ci
docs		docs
test/sql		test/sql
tests		tests
vgi_audio		vgi_audio
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
audio_worker.py		audio_worker.py
conftest.py		conftest.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Audio Features — Tempo, MFCC, Spectral, Key & Beats in DuckDB

Path or bytes — the polymorphic audio argument

Scalars (per-row) vs. table functions

Robustness — audio bytes are untrusted

Supported formats & native dependencies

Layout

Development

Authorship & License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Audio Features — Tempo, MFCC, Spectral, Key & Beats in DuckDB

Path or bytes — the polymorphic audio argument

Scalars (per-row) vs. table functions

Robustness — audio bytes are untrusted

Supported formats & native dependencies

Layout

Development

Authorship & License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages