TFLite export (openWakeWord-compatible) + ONNX quantization fix #82
Merged
…zation

Add a format-aware export path while keeping run_export as the entry point.

- config: new ExportFormat enum + output_format field (defaults to onnx)
- cli: --format/-f flag on `export`; precedence is flag > config
- run_export(config, quantize, format): dispatches onnx/tflite; ONNX is always produced (it is the TFLite conversion source) so `eval` keeps working
- export/tflite.py: ONNX -> TFLite via onnx2tf, pinned to the openWakeWord contract (static (1,16,96) input via keep_shape_absolutely_input_names, (1,1) output, builtin TFLite ops only). dnn verified bit-exact vs ONNX; conv_attention/rnn fail fast with NotImplementedError before any work
- rename export_classifier -> export_onnx for parity with export_tflite
- fix quantize_onnx: strip stale initializer value_info emitted by the torch dynamo exporter, which broke ORT's Gemm->MatMul shape-inference pass
- add optional `tflite` extra, tests (test_export.py), and docs

Tests: 69 passed, 1 skipped (real TFLite conversion skips without the extra).
- run: validate tflite + head compatibility before step 1 so an unsupported head (e.g. the default conv_attention) fails fast instead of after training
- quantize_onnx: strip stale initializer value_info on a temp copy so the source ONNX deliverable is no longer mutated as a side effect
- tests: assert the source ONNX survives quantization, and cover --quantize --format tflite
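The `quantize_onnx` fix described in these commits can be sketched roughly as below. This is a minimal illustration under assumptions, not the PR's code: the helper name `strip_initializer_value_info` is hypothetical, and it operates on an in-memory model. With the real `onnx` package you would `onnx.load()` the source file, pass the `ModelProto` through a function like this, `onnx.save()` the result to a temporary path, and give that path to ONNX Runtime's `quantize_dynamic`, so the source ONNX deliverable is never rewritten.

```python
import copy


def strip_initializer_value_info(model):
    """Return a copy of `model` with value_info entries for initializers removed.

    Hypothetical helper. Duck-typed: accepts an onnx.ModelProto or any object
    whose `graph.initializer` and `graph.value_info` are mutable sequences of
    items with a `.name` attribute. The input model is left unmodified.
    """
    # Work on a deep copy so the caller's model (the ONNX deliverable) is untouched.
    stripped = copy.deepcopy(model)
    init_names = {init.name for init in stripped.graph.initializer}

    # value_info for weights is redundant (shapes can be re-inferred from the
    # initializer tensors), and a stale entry breaks strict shape inference
    # once the quantizer transposes a weight for the Gemm -> MatMul rewrite.
    value_info = stripped.graph.value_info
    # Delete by index from the back so the remaining indices stay valid.
    for i in reversed(range(len(value_info))):
        if value_info[i].name in init_names:
            del value_info[i]
    return stripped
```

Deleting by index keeps the sketch valid for both plain Python lists and protobuf repeated fields, which support `del field[i]`.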
Summary
Adds a `tflite` export format alongside ONNX, producing artifacts that openWakeWord can load directly. Also fixes a pre-existing crash in `quantize_onnx` discovered while testing. `run_export` stays the single entry point and gains a `format` argument; ONNX is always produced (it is the TFLite conversion source), so `eval` and the full `run` pipeline are unaffected.

What's included
Format-aware export
- `config`: new `ExportFormat` enum + `output_format` field (defaults to `onnx`)
- `cli`: `--format`/`-f` flag on `export` (precedence: flag > config)
- `run_export(config, quantize=False, format=None)` dispatches `onnx`/`tflite`
- Renamed `export_classifier` → `export_onnx` for parity with `export_tflite`

TFLite via `export/tflite.py` (onnx2tf: ONNX → TF SavedModel → TFLite), pinned to the openWakeWord runtime contract:

- `(1, 16, 96)` input (openWakeWord never calls `resize_tensor_input`; `keep_shape_absolutely_input_names` stops onnx2tf from transposing it to `(1, 96, 16)`)
- `(1, 1)` output

Head support (verified):
- `dnn` converts bit-exact vs ONNX/PyTorch.
- `conv_attention` and `rnn` cannot convert to builtin-op TFLite (attention emits an unsupported constant; LSTM needs the Flex delegate). They now raise `NotImplementedError` before any work, with an actionable message.

The conversion itself is also wrapped to surface a clear `RuntimeError` instead of an onnx2tf stack trace.

`quantize_onnx` fix (pre-existing bug, all heads)

The torch dynamo ONNX exporter emits `value_info` for weight initializers. ORT's dynamic quantizer rewrites `Gemm` → `MatMul` by transposing those weights in place but leaves the `value_info` stale, so its strict shape-inference pass failed with `ShapeInferenceError: Inferred shape ... differ in dimension 0: (1536) vs (32)`. Fix: drop initializer `value_info` before quantizing. It is redundant, since shapes can be inferred from the initializer tensors themselves.

Packaging / docs / tests
- `tflite` extra (`onnx2tf`, `tensorflow`, `tf-keras`, `onnx-graphsurgeon`, `sng4onnx`, `psutil`, `ai-edge-litert`); onnx2tf does not declare several of these as its own dependencies
- `docs/export-and-inference.md` + README updated
- `tests/test_export.py`: ONNX IO contract, quantize regression, fast guard for unsupported heads, and a real TFLite parity test (auto-skips without the extra)

Usage
```bash
pip install livekit-wakeword[tflite]
livekit-wakeword export configs/prod.yaml --format tflite
```
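The flag-over-config precedence and the "ONNX is always produced" rule can be sketched as follows. This is a simplified stand-in, not the real `run_export`: only `ExportFormat` and its `onnx`/`tflite` values come from the PR, while `resolve_format` and `export_steps` are hypothetical names, and the steps are returned as strings instead of calling the actual exporters.

```python
from enum import Enum
from typing import List, Optional


class ExportFormat(str, Enum):
    """Output formats; mirrors the PR's config enum (member values assumed)."""
    ONNX = "onnx"
    TFLITE = "tflite"


def resolve_format(
    cli_format: Optional[str],
    config_format: ExportFormat = ExportFormat.ONNX,
) -> ExportFormat:
    """Hypothetical helper: an explicit --format flag wins over config.output_format."""
    if cli_format is not None:
        return ExportFormat(cli_format)  # raises ValueError for unknown formats
    return ExportFormat(config_format)


def export_steps(fmt: ExportFormat) -> List[str]:
    """Hypothetical dispatch plan for run_export.

    ONNX is always written first: it is the input to the ONNX -> TFLite
    conversion and the artifact `eval` loads.
    """
    steps = ["onnx"]
    if fmt is ExportFormat.TFLITE:
        steps.append("tflite")
    return steps
```

Under this sketch, the command above with a config whose `output_format` is `onnx` resolves to `ExportFormat.TFLITE` and yields both artifacts, so downstream `eval` still finds its ONNX file.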
Testing
```
69 passed, 1 skipped
```
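The openWakeWord runtime contract the TFLite artifact is pinned to (one `(1, 16, 96)` input, one `(1, 1)` output) can be checked with something like the sketch below. `check_oww_contract` is a hypothetical helper, not the project's test. It assumes the lists returned by a TFLite interpreter's `get_input_details()` / `get_output_details()` (for example from `ai_edge_litert.interpreter.Interpreter(model_path=...)` after `allocate_tensors()`), whose entries are dicts with a `"shape"` key. Because openWakeWord never calls `resize_tensor_input`, the shapes have to match exactly.

```python
from typing import Any, Mapping, Sequence, Tuple

# openWakeWord runtime contract, as described in this PR.
OWW_INPUT_SHAPE: Tuple[int, ...] = (1, 16, 96)
OWW_OUTPUT_SHAPE: Tuple[int, ...] = (1, 1)


def _shape(detail: Mapping[str, Any]) -> Tuple[int, ...]:
    # Interpreter details store shapes as integer arrays; normalise to a tuple of ints.
    return tuple(int(d) for d in detail["shape"])


def check_oww_contract(
    input_details: Sequence[Mapping[str, Any]],
    output_details: Sequence[Mapping[str, Any]],
) -> None:
    """Hypothetical check: raise ValueError if the model's I/O shapes break the contract."""
    if len(input_details) != 1 or len(output_details) != 1:
        raise ValueError(
            f"expected 1 input and 1 output, got {len(input_details)} and {len(output_details)}"
        )
    in_shape = _shape(input_details[0])
    out_shape = _shape(output_details[0])
    if in_shape == (1, 96, 16):
        # The symptom when onnx2tf transposes the input instead of keeping it as-is.
        raise ValueError(
            f"input was transposed to {in_shape}; pin it with "
            "keep_shape_absolutely_input_names"
        )
    if in_shape != OWW_INPUT_SHAPE:
        raise ValueError(f"input shape {in_shape} != {OWW_INPUT_SHAPE}")
    if out_shape != OWW_OUTPUT_SHAPE:
        raise ValueError(f"output shape {out_shape} != {OWW_OUTPUT_SHAPE}")
```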
The real ONNX→TFLite conversion + parity test passes locally with the `tflite` extra installed; it skips in CI (which runs `--extra train --extra export` only).

Notes
The `dnn` head currently produces openWakeWord-compatible TFLite. `conv_attention` (the default/flagship head) does not convert via onnx2tf yet; a follow-up could try onnx2tf parameter replacement for the attention op, or `ai-edge-torch` (direct PyTorch → LiteRT).
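The fail-fast head guard and the wrapped conversion could look roughly like the sketch below. It is an illustration under assumptions, not the code in `export/tflite.py`: `check_tflite_head`, `convert_onnx_to_tflite`, and the `input_name` default are hypothetical. `onnx2tf.convert` is onnx2tf's Python entry point; only its `input_onnx_file_path`, `output_folder_path`, and `keep_shape_absolutely_input_names` parameters are used here. The guard runs before onnx2tf (and therefore TensorFlow) is imported, and calling a guard like this at the start of `run` is what lets an unsupported head fail before training rather than after.

```python
from typing import Dict

# Heads that cannot become builtin-op TFLite via onnx2tf (reasons from this PR).
UNSUPPORTED_TFLITE_HEADS: Dict[str, str] = {
    "conv_attention": "the attention op emits a constant onnx2tf cannot convert",
    "rnn": "the LSTM needs the Flex delegate, which the builtin-ops-only contract rules out",
}


def check_tflite_head(head: str) -> None:
    """Hypothetical guard: fail fast, before any conversion work, for unsupported heads."""
    reason = UNSUPPORTED_TFLITE_HEADS.get(head)
    if reason is not None:
        raise NotImplementedError(
            f"TFLite export does not support the {head!r} head ({reason}). "
            "Use the 'dnn' head, or export with --format onnx."
        )


def convert_onnx_to_tflite(
    onnx_path: str, out_dir: str, head: str, input_name: str = "input"
) -> None:
    """Hypothetical wrapper around onnx2tf with the openWakeWord shape pin."""
    check_tflite_head(head)  # runs before the heavy onnx2tf/TensorFlow import
    import onnx2tf  # only installed with the `tflite` extra

    try:
        onnx2tf.convert(
            input_onnx_file_path=onnx_path,
            output_folder_path=out_dir,
            # Keep the (1, 16, 96) input as-is instead of transposing it to (1, 96, 16).
            keep_shape_absolutely_input_names=[input_name],
        )
    except Exception as exc:
        # Surface one clear error instead of onnx2tf's internal stack trace.
        raise RuntimeError(f"onnx2tf failed to convert {onnx_path}: {exc}") from exc
```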