fix(onnx): honest doctor report + default to the CPU provider on Apple Silicon by Davidobot · Pull Request #30 · Egoist-Machines/LodeDB

Davidobot · 2026-06-25T03:20:43Z

Summary

Two post-merge fixes found while auditing the #29 ONNX integration on Apple Silicon, where the default device="auto" resolves to mps.

1. Default to the CPU execution provider on Apple Silicon (Core ML is opt-in). The provider mapping requested the ONNX Core ML EP for mps, but that path was never benchmarked (all #29 numbers and the parity test ran on the CPU EP). Measured on an M-series CPU, Core ML is the slowest option for single-query embedding, because the dynamic-shape preset graph fragments into ~50 Core ML/CPU partitions:

| provider | single-query latency |
| --- | --- |
| ONNX CPU EP | 2.9 ms |
| torch (MPS) | 7.9 ms |
| ONNX Core ML EP | 16.0 ms |

So the default Mac path was ~2x slower than the torch it replaced, defeating the latency win that motivated the ONNX default. Parity is unaffected (cosine 1.0000 vs both torch and CPU-ONNX). Fix: ONNX defaults to the CPU provider on Apple Silicon; Core ML is opt-in via LODEDB_ONNX_COREML=1, mirroring the off-by-default opt-in MPS vector scan (a slower-than-CPU Apple-GPU path). CUDA hosts still prefer the CUDA provider.
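The provider preference described in item 1 could be sketched as below. This is a minimal illustration, not the PR's actual code: the function name `preferred_onnx_providers` and its signature are hypothetical, and it assumes the `LODEDB_ONNX_COREML` opt-in is only honored when the device resolves to `mps`. The provider strings are the standard ONNX Runtime execution provider names.

```python
import os
from typing import Mapping, Optional


def preferred_onnx_providers(
    device: str, env: Optional[Mapping[str, str]] = None
) -> list[str]:
    """Hypothetical helper: ordered ONNX Runtime providers for a resolved device."""
    env = os.environ if env is None else env

    # CUDA hosts still prefer the CUDA provider, with CPU as the fallback.
    if device == "cuda":
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]

    # Core ML is opt-in on Apple Silicon (assumption: only when device is "mps").
    if device == "mps" and env.get("LODEDB_ONNX_COREML") == "1":
        return ["CoreMLExecutionProvider", "CPUExecutionProvider"]

    # Default everywhere else, including "mps" without the opt-in.
    return ["CPUExecutionProvider"]
```

A list in this order could then be passed as the `providers` argument of `onnxruntime.InferenceSession`, which tries providers in the order given.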

2. doctor reports the embedding runtime as a preference, not a guarantee. It printed runtime (auto): onnx whenever onnxruntime was importable, but auto falls back to torch when the model's ONNX graph can't be materialized (offline/uncached). It now reports the preferred runtime plus an explicit fallback note; the JSON field auto_resolves_to is renamed to preferred, and a note field is added.
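One way the reworked doctor payload might look is sketched here. Only the `preferred` and `note` field names come from this PR; the function name `embedding_runtime_report`, the note wording, and the rendered text line are assumptions made for illustration.

```python
import json


def embedding_runtime_report(onnxruntime_importable: bool) -> dict:
    """Hypothetical sketch: report a preferred runtime, not a guaranteed one."""
    if onnxruntime_importable:
        preferred = "onnx"
        # Being importable is not enough: auto still needs a usable ONNX graph.
        note = (
            "falls back to torch if the model's ONNX graph cannot be "
            "materialized (for example offline and uncached)"
        )
    else:
        preferred = "torch"
        note = "onnxruntime is not importable"
    return {"preferred": preferred, "note": note}


def render_text(report: dict) -> str:
    """Hypothetical text surface carrying the same two facts as the JSON."""
    return f"runtime (auto): prefers {report['preferred']} ({report['note']})"


if __name__ == "__main__":
    report = embedding_runtime_report(onnxruntime_importable=True)
    print(json.dumps(report, indent=2))
    print(render_text(report))
```

Keeping the fallback condition in a separate `note` field lets both the JSON and the rendered text state it, which is what the updated doctor test asserts on.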

Testing

ruff check clean; full pytest 403 passed, 35 skipped.
New tests: provider preference (CPU by default on mps/cpu, CUDA on cuda, Core ML only with the LODEDB_ONNX_COREML opt-in) and the doctor test now asserts both the JSON and rendered-text surfaces.

…antee

doctor reported `runtime (auto): onnx` whenever onnxruntime was importable, but `embedding_runtime="auto"` only uses ONNX when the model's ONNX graph can also be materialized (cached, a prebuilt Hub snapshot, or an Optimum export), and otherwise falls back to PyTorch. On an offline, uncached box the report therefore overstated the runtime.

Report the preferred runtime plus an explicit fallback note instead, and rename the JSON field auto_resolves_to to preferred.

Testing: ruff check; pytest (test_onnx_embedding_runtime, test_local_backends) 23 passed.

…re ML opt-in)

The default device "auto" resolves to "mps" on Apple Silicon, and the provider mapping requested the ONNX Core ML EP there. That path was never benchmarked (all #29 numbers and the parity test ran on the CPU EP). On the dynamic-shape preset graphs Core ML fragments into ~50 Core ML/CPU partitions and measured slower than the plain CPU provider for single-query embedding: about 16 ms vs 3 ms on an M-series CPU (torch is 7.9 ms), so the default Mac path was the slowest of the three despite the ONNX-default switch being motivated by lower query latency. Parity is unaffected (cosine 1.0 vs both torch and CPU-ONNX).

Default the ONNX runtime to the CPU provider on Apple Silicon and gate Core ML behind LODEDB_ONNX_COREML=1, mirroring the off-by-default opt-in MPS vector scan. CUDA hosts still prefer the CUDA provider.

Testing: ruff check; pytest 403 passed, 35 skipped.

Davidobot added 3 commits June 24, 2026 20:20

docs: align doctor wording with the preferred-runtime report

f671ac0

Davidobot changed the title ~~fix(doctor): report the embedding runtime as a preference, not a guarantee~~ fix(onnx): honest doctor report + default to the CPU provider on Apple Silicon Jun 25, 2026

Davidobot merged commit 7775915 into main Jun 25, 2026
4 checks passed

Davidobot merged 3 commits into main from claude/onnx-doctor-honesty
