fix(onnx): honest doctor report + default to the CPU provider on Apple Silicon #30

Merged: Davidobot merged 3 commits into main from claude/onnx-doctor-honesty on Jun 25, 2026

Davidobot (Contributor) commented Jun 25, 2026

Summary

Two post-merge fixes found while auditing the #29 ONNX integration on Apple Silicon, where the default device="auto" resolves to mps.

1. Default to the CPU execution provider on Apple Silicon (Core ML is opt-in). The provider mapping requested the ONNX Core ML EP for mps, but that path was never benchmarked (all #29 numbers and the parity test ran on the CPU EP). Measured on an M-series CPU, Core ML is the slowest option for single-query embedding, because the dynamic-shape preset graph fragments into ~50 Core ML/CPU partitions:

| provider | single-query latency |
| --- | --- |
| ONNX CPU EP | 2.9 ms |
| torch (MPS) | 7.9 ms |
| ONNX Core ML EP | 16.0 ms |

So the default Mac path was about 2x slower than the torch (MPS) path it replaced (16.0 ms vs 7.9 ms) and about 5.5x slower than the ONNX CPU EP (2.9 ms), defeating the latency win that motivated the ONNX default. Parity is unaffected (cosine 1.0000 vs both torch and CPU-ONNX). Fix: ONNX defaults to the CPU provider on Apple Silicon; Core ML is opt-in via LODEDB_ONNX_COREML=1, mirroring the off-by-default opt-in MPS vector scan (a slower-than-CPU Apple-GPU path). CUDA hosts still prefer the CUDA provider.

2. doctor reports the embedding runtime as a preference, not a guarantee. It printed runtime (auto): onnx whenever onnxruntime was importable, but auto falls back to torch when the model's ONNX graph can't be materialized (offline/uncached). It now reports the preferred runtime plus an explicit fallback note; the JSON field auto_resolves_to is renamed to preferred, and a note field is added.
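The two fixes above can be illustrated with a minimal sketch. This is not the project's code: the function names `preferred_providers` and `embedding_runtime_report`, the CPU fallback appended after the CUDA and Core ML providers, the restriction of the opt-in to the mps device, and the exact wording of the note strings are all assumptions made for illustration. Only the provider identifiers (standard ONNX Runtime names), the LODEDB_ONNX_COREML variable, and the preferred/note field names come from the description.

```python
import importlib.util
import os

# Standard ONNX Runtime execution-provider identifiers.
CPU_EP = "CPUExecutionProvider"
CUDA_EP = "CUDAExecutionProvider"
COREML_EP = "CoreMLExecutionProvider"


def preferred_providers(device, env=None):
    """Return an ordered provider list for an already-resolved device.

    Hypothetical helper: CUDA hosts prefer the CUDA provider, Apple Silicon
    (mps) gets the CPU provider unless LODEDB_ONNX_COREML=1 is set, and
    everything else gets the CPU provider.
    """
    env = os.environ if env is None else env
    if device == "cuda":
        # Assumption: keep the CPU provider as a fallback after CUDA.
        return [CUDA_EP, CPU_EP]
    if device == "mps" and env.get("LODEDB_ONNX_COREML") == "1":
        # Core ML is opt-in only; assumption: CPU stays as a fallback.
        return [COREML_EP, CPU_EP]
    return [CPU_EP]


def embedding_runtime_report(onnxruntime_importable=None):
    """Build a doctor-style report that states a preference, not a guarantee.

    Hypothetical helper: the note strings are illustrative wording only.
    """
    if onnxruntime_importable is None:
        # find_spec checks importability without importing the package.
        onnxruntime_importable = (
            importlib.util.find_spec("onnxruntime") is not None
        )
    if onnxruntime_importable:
        return {
            "preferred": "onnx",
            "note": (
                "auto falls back to torch if the model's ONNX graph "
                "cannot be materialized (for example offline and uncached)"
            ),
        }
    return {"preferred": "torch", "note": "onnxruntime is not importable"}
```

Passing `env` and `onnxruntime_importable` explicitly keeps both helpers deterministic in tests, so the opt-in and the fallback note can be checked without touching the real environment or requiring onnxruntime to be installed.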

Testing

  • ruff check clean; full pytest 403 passed, 35 skipped.
  • New tests cover provider preference (CPU by default on mps/cpu, CUDA on cuda, Core ML only with the LODEDB_ONNX_COREML opt-in); the doctor test now asserts both the JSON and rendered-text surfaces.

…antee

doctor reported `runtime (auto): onnx` whenever onnxruntime was importable, but
`embedding_runtime="auto"` only uses ONNX when the model's ONNX graph can also be
materialized (cached, a prebuilt Hub snapshot, or an Optimum export), and otherwise
falls back to PyTorch. On an offline, uncached box the report therefore overstated
the runtime. Report the preferred runtime plus an explicit fallback note instead,
and rename the JSON field auto_resolves_to to preferred.

Testing: ruff check; pytest (test_onnx_embedding_runtime, test_local_backends) 23 passed.
…re ML opt-in)

The default device "auto" resolves to "mps" on Apple Silicon, and the provider mapping
requested the ONNX Core ML EP there. That path was never benchmarked (all #29 numbers and the
parity test ran on the CPU EP). On the dynamic-shape preset graphs Core ML fragments into ~50
Core ML/CPU partitions and measured slower than the plain CPU provider for single-query
embedding: about 16 ms vs 3 ms on an M-series CPU (torch is 7.9 ms), so the default Mac path was
the slowest of the three despite the ONNX-default switch being motivated by lower query latency.
Parity is unaffected (cosine 1.0 vs both torch and CPU-ONNX).

Default the ONNX runtime to the CPU provider on Apple Silicon and gate Core ML behind
LODEDB_ONNX_COREML=1, mirroring the off-by-default opt-in MPS vector scan. CUDA hosts still prefer
the CUDA provider.

Testing: ruff check; pytest 403 passed, 35 skipped.
Davidobot changed the title from "fix(doctor): report the embedding runtime as a preference, not a guarantee" to "fix(onnx): honest doctor report + default to the CPU provider on Apple Silicon" on Jun 25, 2026
Davidobot merged commit 7775915 into main on Jun 25, 2026
4 checks passed