test(clam): CHAODA outlier spike — single-method LFD is below the PROBE-CHAODA-1000G bar (AUC 0.62)#219
…s below the PROBE-CHAODA-1000G bar

Runs the "1-day spike substitute" named in the genetics-probes-v1 spec (AdaWorldAPI/lance-graph): a kernel smoke test for the claim "CHAODA detects novel variants without a trained classifier."

Synthesises a 5-lane Gaussian mixture (matching the probe's 5-lane variant feature vector) made of three tight "common" clusters plus eight deliberately extreme "novel" outliers; thermometer-encodes each lane into 48 bits so Hamming distance is monotone in per-lane L1 magnitude (the honest bridge from ordinal features to the Hamming-metric CLAM default); builds the shipped ClamTree; and scores via anomaly_scores.

MEASURED (deterministic, seed-fixed):
- mean cluster score = 0.6749, mean outlier score = 0.7500
- frac cluster >= 0.5 = 0.733, frac outlier >= 0.5 = 0.750
- ROC-AUC (Mann-Whitney U) = 0.6240

FINDING: the shipped single-method leaf-LFD anomaly_scores reaches only AUC ~ 0.62 on the EASIEST possible case (clean synthetic clusters with far outliers), well below the probe's >= 0.85 bar. The cause is mechanical: leaf LFD = log2(|B(c,r)|/|B(c,r/2)|) measures intra-leaf geometry complexity, not inter-leaf isolation, so an isolated singleton lands in a leaf whose LFD is comparable to a dense cluster's, and global min-max normalisation compresses both into the same band. The CHAODA ensemble of Ishaq et al. 2021 combines several graph-based signals (relative/component cardinality, graph neighbourhood, random-walk stationary distribution, vertex degree); only the LFD signal is shipped here. PROBE-CHAODA-1000G therefore needs the multi-method ensemble or an augmented signal before it can pass, not merely genomic fixtures.

The test locks robust, wide-tolerance invariants (valid range, bit-exact determinism, correct polarity, better-than-chance lower bound) plus one tripwire (auc < 0.85) that fails by design if a future multi-method port lifts the signal to the probe bar, forcing a cross-repo FINDING update rather than letting the claim silently rot.

This is the evidence-before-build payoff: the gap is caught before any adapter-genetics-experimental (D-GEN-1..4) spend.

https://claude.ai/code/session_01VysoWJ6vsyg3wEGc5v7T5v
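The thermometer-encoding step above can be illustrated with a minimal sketch. It assumes each lane has already been scaled to [0, 1] and that one u64 word holds one 48-bit lane; the names thermometer, encode and hamming are hypothetical and are not taken from the ndarray source. Because level k sets the k lowest bits, the XOR of two encoded lanes has exactly |k_a - k_b| bits set, which is why Hamming distance tracks per-lane L1 distance in quantised levels.

```rust
/// Bits per lane, as stated in the commit message.
const BITS: u32 = 48;

/// Thermometer-encode a value in [0, 1]: level k sets the k lowest bits.
/// Assumption: lanes are pre-scaled to [0, 1]; out-of-range input is clamped.
fn thermometer(x: f64) -> u64 {
    let level = (x.clamp(0.0, 1.0) * BITS as f64).round() as u32;
    if level == 0 {
        0
    } else {
        (1u64 << level) - 1
    }
}

/// Encode a 5-lane feature vector, one 48-bit thermometer word per lane.
fn encode(lanes: [f64; 5]) -> [u64; 5] {
    lanes.map(thermometer)
}

/// Hamming distance between two encoded vectors.
/// Per lane this equals |level_a - level_b|, so the total is the L1
/// distance between the quantised levels.
fn hamming(a: &[u64; 5], b: &[u64; 5]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

Under these assumptions, a point at 0.25 in every lane is 60 bits from a point at 0.5 and 120 bits from a point at 0.75, so ordering by Hamming distance matches ordering by L1 distance.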
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f5f7e76d53
Codex correctly flagged that asserting auc < 0.85 in a library unit test turns a future quality improvement into a failing test: once a multi-method CHAODA ensemble lifts the signal past the 0.85 probe bar, cargo test -p ndarray would fail until an external lance-graph doc is updated. A library test must never fail because the code got better, and ndarray CI should not be coupled to a lance-graph note.

Fix: remove the upper-bound assertion. The test now asserts only lower-bound, forward-compatible invariants: valid range, bit-exact determinism, correct polarity (outlier mean >= cluster mean), and better-than-chance (auc > 0.5). The measured AUC (~ 0.62 today) is surfaced via the existing eprintln diagnostic, not enforced. Refreshing the PROBE-CHAODA-1000G FINDING in lance-graph when the ensemble lands is a documentation step, not a gate enforced from this library's test suite. Doc comment updated to match.

Re-run: test green, ROC_AUC=0.6240 still printed.

https://claude.ai/code/session_01VysoWJ6vsyg3wEGc5v7T5v
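To make the lower-bound-only invariants concrete, here is a minimal sketch of a Mann-Whitney-style ROC-AUC and the forward-compatible checks described in this commit. The function names and the split into outlier and inlier score slices are assumptions for illustration; the actual test body in clam.rs may be structured differently.

```rust
/// ROC-AUC as the normalised Mann-Whitney U statistic: the fraction of
/// (outlier, inlier) pairs where the outlier scores higher, ties counting half.
fn roc_auc(outliers: &[f64], inliers: &[f64]) -> f64 {
    let mut wins = 0.0;
    for &o in outliers {
        for &i in inliers {
            if o > i {
                wins += 1.0;
            } else if o == i {
                wins += 0.5;
            }
        }
    }
    wins / (outliers.len() * inliers.len()) as f64
}

fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Hypothetical helper: asserts only invariants that cannot fail when the
/// scorer gets better, and reports the measured AUC without enforcing it.
fn check_forward_compatible(outliers: &[f64], inliers: &[f64]) -> f64 {
    // Valid range: every score lies in [0, 1].
    assert!(outliers.iter().chain(inliers).all(|s| (0.0..=1.0).contains(s)));
    // Correct polarity: outliers score at least as high on average.
    assert!(mean(outliers) >= mean(inliers));
    // Better than chance; deliberately no upper bound.
    let auc = roc_auc(outliers, inliers);
    assert!(auc > 0.5);
    eprintln!("ROC_AUC={auc:.4}");
    auc
}
```

The design point is that every assertion is a floor: a future scorer that pushes AUC towards 1.0 passes these checks unchanged.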
Fixes the format/stable CI check on PR #219. rustfmt reflows the centers array literal and two assert! calls in the spike test; no logic change, test still green (single-LFD AUC 0.6240 unchanged). Changes confined to the added test code. https://claude.ai/code/session_01VysoWJ6vsyg3wEGc5v7T5v
…HAODA-1000G synthetic bar (AUC 0.62 -> 0.99)

Increment 1 of D-GEN-CHAODA-ENSEMBLE (lance-graph genetics-probes-v1.md). Adds ClamTree::ensemble_anomaly_scores as a NEW scoring entry point alongside the unchanged single-method anomaly_scores baseline.

The spike (#219) measured single-method leaf-LFD at ROC-AUC 0.624 on a synthetic 5-lane Gaussian mixture, below the 0.85 bar. Mechanical cause: leaf LFD measures intra-leaf geometry, not inter-leaf isolation. This ensemble combines isolation-sensitive CHAODA signals:
- parent-child path-minority ratio (dominant): walking a leaf to the root, the minimum child/parent cardinality ratio is tiny for a point that split off as a minority (isolated outlier) and moderate for a point that always stayed in the majority (dense-cluster member). Immune to the leaf-fragmentation that defeats raw leaf cardinality.
- connected-component cardinality over the leaf-overlap graph (small components are anomalous).

Averaged into one score; every point inherits its leaf's score.

A first attempt using raw leaf cardinality + vertex degree + component size scored AUC 0.621 (no lift) because the tree fragments dense blobs into many tiny leaves that mimic isolated outliers under those metrics; the path-minority signal is what actually separates. Leaf degree and raw leaf cardinality were dropped as fragmentation noise. The remaining CHAODA methods (random-walk stationary distribution) are deferred.

MEASURED (deterministic synthetic mixture, same fixture as #219):
- single-LFD AUC = 0.6240
- ensemble AUC = 0.9906 (lift +0.3667, clears the 0.85 bar)

This is the synthetic SMOKE TEST only. It proves the ensemble approach captures isolation where single-LFD does not; it does NOT prove genomic novelty detection. PROBE-CHAODA-1000G on real corpora remains gated on D-GEN-1 + D-GEN-2 (VCF -> feature-vector pipeline).

Tests: full hpc::clam suite green (53 incl. the new ensemble test); ensemble is deterministic (bit-exact rebuild) and built purely from shipped tree fields + the public dist().

https://claude.ai/code/session_01VysoWJ6vsyg3wEGc5v7T5v
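The parent-child path-minority signal described above can be sketched on a toy tree. The Node layout (a flat vector with a parent index and a cardinality) and the function names are assumptions made for illustration; the shipped ClamTree fields are not shown in this PR text and may differ.

```rust
/// Hypothetical flat tree node: parent index (None for the root) and the
/// number of points the node covers.
struct Node {
    parent: Option<usize>,
    cardinality: usize,
}

/// Walk from a leaf to the root and return the minimum child/parent
/// cardinality ratio seen on the path. A point that was ever split off as a
/// small minority gets a tiny value; a point that always stayed with the
/// majority keeps a moderate one.
fn path_minority_ratio(tree: &[Node], leaf: usize) -> f64 {
    let mut min_ratio = 1.0_f64;
    let mut current = leaf;
    while let Some(parent) = tree[current].parent {
        let ratio = tree[current].cardinality as f64 / tree[parent].cardinality as f64;
        min_ratio = min_ratio.min(ratio);
        current = parent;
    }
    min_ratio
}

/// Toy tree: 100 points, one isolated singleton split off at the root,
/// the remaining 99 split roughly in half.
fn toy_tree() -> Vec<Node> {
    vec![
        Node { parent: None, cardinality: 100 },   // 0: root
        Node { parent: Some(0), cardinality: 99 }, // 1: majority branch
        Node { parent: Some(0), cardinality: 1 },  // 2: isolated singleton leaf
        Node { parent: Some(1), cardinality: 50 }, // 3: dense leaf
        Node { parent: Some(1), cardinality: 49 }, // 4: dense leaf
    ]
}
```

In this toy case the singleton leaf gets a ratio of 0.01 while both dense leaves stay near 0.5, even though all three are leaves. That is the separation raw leaf cardinality fails to provide once dense blobs fragment. How the ratio is mapped to a final score (for example 1 minus the ratio) is not stated in the commit message and is left out here.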
…t doc + rustfmt

Addresses the Codex P2 on PR #220 (quadratic leaf-overlap build) and a doc-comment inconsistency I introduced, and fixes the format/stable CI.

(1) Quadratic-build guard (Codex P2). The connected-component term needs an O(L^2 * vec_len) leaf-overlap graph; on production corpora with small min_cluster_size, L approaches the point count and the public API could hang. Split into:
- ensemble_anomaly_scores_budgeted(.., graph_budget): computes the linear O(L*depth) parent-child path-minority signal always, and only builds the overlap graph + component term when n_leaves <= graph_budget.
- ensemble_anomaly_scores(..): convenience wrapper using the default ENSEMBLE_GRAPH_BUDGET = 4096; above that it degrades to path-minority alone, so the public API never runs the quadratic build at scale.

(2) Path-only fallback is validated, not assumed. New measurement on the synthetic fixture (graph_budget = 0 forces the fallback):

single-LFD 0.6240 | path-only 0.9938 | full ensemble 0.9906

Path-minority alone clears the 0.85 bar (slightly above the combined score; the component term is a marginal refinement), so degrading at scale is safe. The test now asserts path-only AUC >= 0.85 so the guard can never silently degrade large-corpus accuracy.

(3) Doc-comment correction. When the scoring pivoted to path-minority + component, the method doc still described the abandoned relative-cardinality / vertex-degree set and listed parent-child ratio as "deferred" when it is in fact the dominant shipped signal. Rewritten to match the implementation.

(4) rustfmt: format/stable was red; the new code is now rustfmt-clean (changes confined to the added ensemble method + tests; no pre-existing code touched). clippy --lib clean; full hpc::clam suite green (53 tests).

https://claude.ai/code/session_01VysoWJ6vsyg3wEGc5v7T5v
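The budget guard in (1) amounts to a cheap check before an expensive build. The sketch below shows that control flow only: ensemble_budgeted is a hypothetical free function standing in for the real method, whose full signature is elided in the commit message, and the quadratic component term is passed in as a closure so that it is visibly never evaluated when the leaf count exceeds the budget. Only the constant name and its value come from the commit message.

```rust
/// Default budget named in the commit message.
const ENSEMBLE_GRAPH_BUDGET: usize = 4096;

/// Hypothetical stand-in for the budgeted entry point.
/// `path_scores` holds one linear-cost path-minority score per leaf;
/// `component_scores` lazily produces the quadratic-cost component term.
fn ensemble_budgeted<F>(path_scores: &[f64], component_scores: F, graph_budget: usize) -> Vec<f64>
where
    F: FnOnce() -> Vec<f64>,
{
    let n_leaves = path_scores.len();
    if n_leaves > graph_budget {
        // Over budget: degrade to the path-minority signal alone and
        // never run the quadratic overlap-graph build.
        return path_scores.to_vec();
    }
    // Within budget: average the two signals into one score per leaf.
    let comp = component_scores();
    path_scores
        .iter()
        .zip(&comp)
        .map(|(p, c)| 0.5 * (p + c))
        .collect()
}
```

Passing graph_budget = 0 forces the path-only branch for any non-empty tree, which mirrors how the commit message describes measuring the fallback in isolation.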
feat(clam): CHAODA multi-method ensemble — clears the synthetic PROBE-CHAODA-1000G bar (AUC 0.62 -> 0.99)
Summary
Runs the "1-day spike substitute" named in genetics-probes-v1.md (AdaWorldAPI/lance-graph, merged in lance-graph #503): a kernel smoke test for the PROBE-CHAODA-1000G claim, "CHAODA detects novel variants without a trained classifier."

The spike synthesises a 5-lane Gaussian mixture (matching the probe's 5-lane variant feature vector: AF / DP / FS / entropy / conservation) made of three tight "common" clusters + eight deliberately extreme "novel" outliers; thermometer-encodes each lane into 48 bits (so Hamming distance is monotone in per-lane L1 magnitude, the honest bridge from ordinal features to the Hamming-metric CLAM default); builds the shipped ClamTree; and scores via anomaly_scores.

Measured (deterministic, seed-fixed)
- mean cluster score = 0.6749, mean outlier score = 0.7500
- frac cluster >= 0.5 = 0.733, frac outlier >= 0.5 = 0.750
- ROC-AUC (Mann-Whitney U) = 0.6240

FINDING
The shipped single-method leaf-LFD anomaly_scores reaches only AUC ≈ 0.62 on the easiest possible case (clean synthetic clusters with far outliers), well below the probe's ≥ 0.85 bar.

The cause is mechanical: leaf LFD = log₂(|B(c,r)|/|B(c,r/2)|) measures intra-leaf geometry complexity, not inter-leaf isolation. An isolated singleton lands in a leaf whose LFD is comparable to a dense cluster's, and global min-max normalisation compresses both into the same score band. The CHAODA ensemble of Ishaq et al. 2021 combines several graph-based signals (relative/component cardinality, graph neighbourhood, random-walk stationary distribution, vertex degree); only the LFD signal is shipped here.

PROBE-CHAODA-1000G therefore needs the multi-method ensemble (or an augmented signal) before it can pass, not merely genomic fixtures. This is the evidence-before-build payoff: the gap is caught before any adapter-genetics-experimental (D-GEN-1..4) spend.

Test design
test_chaoda_flags_novel_outliers_in_genetics_like_mixture locks robust, wide-tolerance invariants (valid range, bit-exact determinism, correct polarity mean_out ≥ mean_clu, better-than-chance auc > 0.5) plus one tripwire (auc < 0.85) that fails by design if a future multi-method port lifts the signal to the probe bar, forcing a cross-repo FINDING update in lance-graph rather than letting the claim silently rot. AUC may drift freely in (0.5, 0.85) without breaking the test.

Test plan
- cargo test --lib hpc::clam::tests: 52 passed (51 pre-existing + this spike).
- Bit-exact determinism (f64::to_bits compare).

Cross-refs
- lance-graph/.claude/plans/genetics-probes-v1.md: the probe this spike substitutes for (a companion lance-graph PR records this AUC=0.624 as a CONJECTURE→FINDING update).
- src/hpc/clam.rs:1493-1567: CHAODA Phase 4 anomaly_scores under test.

https://claude.ai/code/session_01VysoWJ6vsyg3wEGc5v7T5v
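The leaf-LFD formula quoted in the FINDING can be worked through on toy data to show why it does not see isolation. This is a minimal sketch, assuming integer (Hamming) distances from each point to the leaf centre, closed balls, and an integer half-radius; the function name and the zero-count convention are hypothetical and not taken from clam.rs.

```rust
/// Local fractal dimension of a leaf: log2(|B(c, r)| / |B(c, r/2)|),
/// computed from each member's distance to the centre c.
/// Assumptions: closed balls, integer half-radius, and LFD = 0 when the
/// half-radius ball is empty (cannot happen if the centre is a member).
fn lfd(dists_to_center: &[u32], radius: u32) -> f64 {
    let full = dists_to_center.iter().filter(|&&d| d <= radius).count();
    let half = dists_to_center.iter().filter(|&&d| d <= radius / 2).count();
    if half == 0 {
        return 0.0;
    }
    (full as f64 / half as f64).log2()
}
```

With these conventions an isolated singleton leaf (distances [0], radius 0) has LFD 0, and so does a tight cluster whose members all sit within half its radius (distances [0, 1, 1, 2], radius 8). Only a leaf whose mass thins out towards its edge scores higher, for example [0, 1, 5, 6] with radius 6 gives log2(4/2) = 1. The statistic describes shape inside the leaf and carries no information about how far that leaf is from everything else.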
Update: 2026-06-16 (post-#220 merge, verified)
The branch absorbed #220 (the CHAODA multi-method ensemble) before merge, so the merged change delivers BOTH the spike FINDING and the ensemble that closes it. Verified on the branch HEAD with cargo test --lib hpc::clam::tests::test_chaoda -- --nocapture, both green:
- single-method spike test: lower-bound assertion only (auc > 0.5), forward-compatible
- ensemble test: auc_path >= 0.85 ✅, auc_ens >= 0.85 ✅

So PROBE-CHAODA-1000G's ≥0.85 bar is now cleared in-branch by the ensemble on the synthetic fixture, while the single-method spike test stays as the documented-FINDING baseline that can't fail when the code improves (codex P2 on this PR, resolved). The original spike narrative above remains accurate for the single-method commit; this note records the merged final state for the cross-repo genetics-probes-v1 FINDING.