TFLite export (openWakeWord-compatible) + ONNX quantization fix by pham-tuan-binh · Pull Request #82 · livekit/livekit-wakeword

pham-tuan-binh · 2026-06-08T05:14:05Z

Summary

Adds a tflite export format alongside ONNX, producing artifacts that openWakeWord can load directly. Also fixes a pre-existing crash in quantize_onnx discovered during testing.

run_export stays the single entry point and gains a format argument; ONNX is always produced (it is the TFLite conversion source), so eval and the full run pipeline are unaffected.

What's included

Format-aware export

- config: new ExportFormat enum + output_format field (defaults to onnx)
- cli: --format/-f flag on export (precedence: flag > config)
- run_export(config, quantize=False, format=None) dispatches onnx/tflite
- rename export_classifier → export_onnx for parity with export_tflite
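A minimal sketch of the flag > config precedence and the "ONNX is always produced" dispatch rule. It assumes the enum values are the lowercase strings `onnx` and `tflite`; `resolve_format` and `planned_artifacts` are hypothetical helpers for illustration, not functions from this PR.

```python
from enum import Enum
from typing import Optional


class ExportFormat(str, Enum):
    # Assumed values; the PR only states that the default is onnx.
    ONNX = "onnx"
    TFLITE = "tflite"


def resolve_format(
    cli_flag: Optional[str], config_value: Optional[str]
) -> ExportFormat:
    """Hypothetical helper: the --format flag wins, then the config, then onnx."""
    if cli_flag is not None:
        return ExportFormat(cli_flag)
    if config_value is not None:
        return ExportFormat(config_value)
    return ExportFormat.ONNX


def planned_artifacts(fmt: ExportFormat) -> list[str]:
    """Hypothetical helper: ONNX is always written because TFLite is converted from it."""
    artifacts = ["onnx"]
    if fmt is ExportFormat.TFLITE:
        artifacts.append("tflite")
    return artifacts


if __name__ == "__main__":
    fmt = resolve_format(cli_flag=None, config_value="tflite")
    print(fmt.value, planned_artifacts(fmt))  # tflite ['onnx', 'tflite']
```

Because the ONNX file is produced on every path, anything downstream that consumes it (such as eval) does not need to know which format was requested.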

TFLite via export/tflite.py (onnx2tf: ONNX → TF SavedModel → TFLite), pinned to the openWakeWord runtime contract:

- static (1, 16, 96) input (openWakeWord never calls resize_tensor_input; keep_shape_absolutely_input_names stops onnx2tf from transposing it to (1, 96, 16))
- (1, 1) output
- builtin TFLite ops only (the LiteRT interpreter has no Flex delegate)
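The shape half of this contract can be checked against an interpreter's tensor details. This is a sketch assuming details in the list-of-dicts form returned by a TFLite/LiteRT interpreter's `get_input_details()` / `get_output_details()` (each dict carrying a `shape` entry); `check_oww_contract` is a hypothetical helper, and it does not verify the builtin-ops-only requirement.

```python
from typing import Any, Mapping, Sequence

EXPECTED_INPUT = (1, 16, 96)  # static input shape openWakeWord feeds
EXPECTED_OUTPUT = (1, 1)      # single score per window


def check_oww_contract(
    input_details: Sequence[Mapping[str, Any]],
    output_details: Sequence[Mapping[str, Any]],
) -> list[str]:
    """Return human-readable violations; an empty list means the shapes match."""
    problems = []
    if len(input_details) != 1:
        problems.append(f"expected 1 input, got {len(input_details)}")
    if len(output_details) != 1:
        problems.append(f"expected 1 output, got {len(output_details)}")
    if input_details:
        # shape is usually a numpy array; normalise it to a tuple of ints
        in_shape = tuple(int(d) for d in input_details[0]["shape"])
        if in_shape == (1, 96, 16):
            problems.append("input was transposed to (1, 96, 16)")
        elif in_shape != EXPECTED_INPUT:
            problems.append(f"input shape {in_shape} != {EXPECTED_INPUT}")
    if output_details:
        out_shape = tuple(int(d) for d in output_details[0]["shape"])
        if out_shape != EXPECTED_OUTPUT:
            problems.append(f"output shape {out_shape} != {EXPECTED_OUTPUT}")
    return problems
```

In practice the details would come from something like `ai_edge_litert.interpreter.Interpreter(model_path=...)`; the explicit transposition message targets the failure mode that keep_shape_absolutely_input_names is there to prevent.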

Head support (verified): dnn converts bit-exact vs ONNX/PyTorch. conv_attention and rnn cannot convert to builtin-op TFLite (attention emits an unsupported constant; LSTM needs the Flex delegate) — they now raise NotImplementedError before any work, with an actionable message. The conversion itself is also wrapped to surface a clear RuntimeError instead of an onnx2tf stack trace.
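A sketch of the fail-fast guard and the error wrapping described above. The set of supported heads and the reasons come from this description; `ensure_tflite_head` and `convert_with_clear_error` are hypothetical names, and the real conversion call is stood in for by a zero-argument callable.

```python
from typing import Callable

# Heads reported in this PR as convertible to builtin-op TFLite.
TFLITE_SUPPORTED_HEADS = {"dnn"}

# Reasons as stated in the PR description.
_UNSUPPORTED_REASONS = {
    "conv_attention": "the attention op emits an unsupported constant",
    "rnn": "the LSTM needs the Flex delegate, which the LiteRT interpreter lacks",
}


def ensure_tflite_head(head: str) -> None:
    """Hypothetical guard: raise before any training or conversion work starts."""
    if head in TFLITE_SUPPORTED_HEADS:
        return
    reason = _UNSUPPORTED_REASONS.get(head, "it has not been verified")
    raise NotImplementedError(
        f"TFLite export does not support head '{head}': {reason}. "
        f"Use one of {sorted(TFLITE_SUPPORTED_HEADS)} or export ONNX instead."
    )


def convert_with_clear_error(convert: Callable[[], str]) -> str:
    """Run a conversion callable and re-raise any failure as a short RuntimeError."""
    try:
        return convert()
    except Exception as exc:  # onnx2tf can fail in many different ways
        raise RuntimeError(f"ONNX -> TFLite conversion failed: {exc}") from exc
```

Calling the guard at the very start of the run pipeline is what lets an unsupported head fail before training rather than after it; chaining with `from exc` keeps the original onnx2tf traceback available for debugging.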

quantize_onnx fix (pre-existing bug, all heads)
The torch dynamo ONNX exporter emits value_info for weight initializers. ORT's dynamic quantizer rewrites Gemm→MatMul by transposing those weights in place but leaves the value_info stale, so its strict shape-inference pass failed with ShapeInferenceError: Inferred shape ... differ in dimension 0: (1536) vs (32). Fix: drop initializer value_info (redundant — shapes infer from the tensors) before quantizing.
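A minimal sketch of the value_info cleanup, written against anything shaped like an ONNX `GraphProto` (repeated `initializer` and `value_info` fields whose elements have a `.name`). The function name is hypothetical; it only illustrates the "drop initializer value_info" step, not the PR's exact code.

```python
from typing import Any


def strip_initializer_value_info(graph: Any) -> int:
    """Drop value_info entries that describe weight initializers.

    Those entries are redundant because shapes are inferred from the
    initializer tensors, and they go stale once the quantizer transposes
    weights in place. Returns the number of entries removed.
    """
    init_names = {init.name for init in graph.initializer}
    stale = [vi for vi in graph.value_info if vi.name in init_names]
    for vi in stale:
        # protobuf repeated message fields and plain lists both support remove()
        graph.value_info.remove(vi)
    return len(stale)
```

With the real `onnx` package this would run on a temp copy so the source deliverable is untouched: `model = onnx.load(src)`, `strip_initializer_value_info(model.graph)`, `onnx.save(model, tmp_path)`, then hand `tmp_path` to `onnxruntime.quantization.quantize_dynamic`.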

Packaging / docs / tests

- optional tflite extra (onnx2tf, tensorflow, tf-keras, onnx-graphsurgeon, sng4onnx, psutil, ai-edge-litert; onnx2tf under-declares several of these)
- docs/export-and-inference.md + README updated
- tests/test_export.py: ONNX IO contract, quantize regression, fast guard for unsupported heads, and a real TFLite parity test (auto-skips without the extra)
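A stdlib-only sketch of how a test module might decide to auto-skip and compare outputs. The module names are the import names assumed for the listed PyPI packages, and both helpers are hypothetical; with pytest, calling `pytest.importorskip` per module is the usual alternative.

```python
import importlib.util
from typing import Sequence

# Assumed import names for the packages in the optional `tflite` extra.
TFLITE_EXTRA_MODULES = (
    "onnx2tf",
    "tensorflow",
    "tf_keras",
    "onnx_graphsurgeon",
    "sng4onnx",
    "psutil",
    "ai_edge_litert",
)


def missing_tflite_modules() -> list[str]:
    """Modules from the extra that cannot be found; non-empty means skip."""
    return [m for m in TFLITE_EXTRA_MODULES if importlib.util.find_spec(m) is None]


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise gap between two flattened model outputs."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)
```

A parity test would skip when `missing_tflite_modules()` is non-empty; otherwise it would feed the same input to ONNX Runtime and the TFLite interpreter and assert on `max_abs_diff`, expecting 0.0 for the dnn head, which this PR reports as bit-exact.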

Usage

```bash
pip install livekit-wakeword[tflite]
livekit-wakeword export configs/prod.yaml --format tflite
```

Testing

```
69 passed, 1 skipped
```
The real ONNX→TFLite conversion + parity test passes locally with the tflite extra installed; it skips in CI (which runs --extra train --extra export only).

Notes

Only the dnn head currently produces openWakeWord-compatible TFLite. conv_attention (the default/flagship head) does not convert via onnx2tf yet — a follow-up could try onnx2tf param-replacement for the attention op or ai-edge-torch (direct PyTorch→LiteRT).

…zation

Add a format-aware export path while keeping run_export as the entry point.

- config: new ExportFormat enum + output_format field (defaults to onnx)
- cli: --format/-f flag on `export`; precedence is flag > config
- run_export(config, quantize, format): dispatches onnx/tflite; ONNX is always produced (it is the TFLite conversion source) so `eval` keeps working
- export/tflite.py: ONNX -> TFLite via onnx2tf, pinned to the openWakeWord contract (static (1,16,96) input via keep_shape_absolutely_input_names, (1,1) output, builtin TFLite ops only). dnn verified bit-exact vs ONNX; conv_attention/rnn fail fast with NotImplementedError before any work
- rename export_classifier -> export_onnx for parity with export_tflite
- fix quantize_onnx: strip stale initializer value_info emitted by the torch dynamo exporter, which broke ORT's Gemm->MatMul shape-inference pass
- add optional `tflite` extra, tests (test_export.py), and docs

Tests: 69 passed, 1 skipped (real TFLite conversion skips without the extra).

- run: validate tflite + head compatibility before step 1 so an unsupported head (e.g. the default conv_attention) fails fast instead of after training
- quantize_onnx: strip stale initializer value_info on a temp copy so the source ONNX deliverable is no longer mutated as a side effect
- tests: assert the source ONNX survives quantization, and cover --quantize --format tflite

pham-tuan-binh mentioned this pull request on Jun 8, 2026 in Use in Home Assistant #75 (Open)

pham-tuan-binh merged commit 431c7e4 into main Jun 8, 2026
7 checks passed

pham-tuan-binh deleted the feat/tflite-export branch June 8, 2026 08:22

pham-tuan-binh merged 2 commits into main from feat/tflite-export
