test(clam): CHAODA outlier spike — single-method LFD is below the PROBE-CHAODA-1000G bar (AUC 0.62)#219

Merged
AdaWorldAPI merged 6 commits into master from claude/chaoda-outlier-spike-v1 on Jun 16, 2026
@AdaWorldAPI AdaWorldAPI commented Jun 16, 2026

Summary

Runs the "1-day spike substitute" named in genetics-probes-v1.md (AdaWorldAPI/lance-graph, merged in lance-graph #503): a kernel smoke test for the PROBE-CHAODA-1000G claim — "CHAODA detects novel variants without a trained classifier."

The spike synthesises a 5-lane Gaussian mixture (matching the probe's 5-lane variant feature vector: AF / DP / FS / entropy / conservation) — three tight "common" clusters + eight deliberately extreme "novel" outliers — thermometer-encodes each lane into 48 bits (so Hamming distance is monotone in per-lane L1 magnitude, the honest bridge from ordinal features to the Hamming-metric CLAM default), builds the shipped ClamTree, and scores via anomaly_scores.
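
The thermometer step can be sketched as follows. This is a minimal illustration, not the code in the PR: the helper names `thermometer` and `hamming`, the `[0, 1]` input range, and the rounding rule are assumptions; only the 48-bit lane width comes from the description above.

```rust
/// Bits per lane, as stated in the PR description.
const LANE_BITS: u32 = 48;

/// Thermometer-encode a value in [0, 1]: the lowest `k` bits are set,
/// where `k` is the value scaled to 0..=48 and rounded.
/// (Hypothetical helper; input range and rounding are assumptions.)
fn thermometer(x: f64) -> u64 {
    let k = (x.clamp(0.0, 1.0) * LANE_BITS as f64).round() as u32;
    if k == 0 {
        0
    } else {
        (1u64 << k) - 1
    }
}

/// Hamming distance between two encoded lanes.
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}
```

Because each code is a solid run of low bits, the XOR of two codes is the run between their two levels, so the Hamming distance equals the absolute difference of the quantised values. For example, `hamming(thermometer(0.25), thermometer(0.75))` is 24 (levels 12 and 36).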

Measured (deterministic, seed-fixed)

| metric | value |
| --- | --- |
| mean cluster score | 0.6749 |
| mean outlier score | 0.7500 |
| frac cluster ≥ 0.5 | 0.733 |
| frac outlier ≥ 0.5 | 0.750 |
| ROC-AUC (Mann-Whitney U) | 0.6240 |
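
For readers unfamiliar with the Mann-Whitney formulation of ROC-AUC, here is a minimal sketch. The function name `roc_auc` and its argument layout are illustrative assumptions, and the test in the PR may compute the statistic differently (for example, via ranks).

```rust
/// ROC-AUC via the Mann-Whitney U statistic: the fraction of
/// (outlier, cluster) pairs in which the outlier scores higher,
/// counting ties as half a win. O(n * m), fine for small fixtures.
fn roc_auc(outlier_scores: &[f64], cluster_scores: &[f64]) -> f64 {
    let mut wins = 0.0;
    for &o in outlier_scores {
        for &c in cluster_scores {
            if o > c {
                wins += 1.0;
            } else if o == c {
                wins += 0.5;
            }
        }
    }
    wins / (outlier_scores.len() * cluster_scores.len()) as f64
}
```

Under this definition, 0.5 means the scores do not separate the two groups at all and 1.0 means every outlier outranks every cluster point, which is why an AUC of 0.62 reads as only slightly better than chance.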

FINDING

The shipped single-method leaf-LFD anomaly_scores reaches only AUC ≈ 0.62 on the easiest possible case (clean synthetic clusters with far outliers) — well below the probe's ≥ 0.85 bar.

The cause is mechanical: leaf LFD = log₂(|B(c,r)|/|B(c,r/2)|) measures intra-leaf geometry complexity, not inter-leaf isolation. An isolated singleton lands in a leaf whose LFD is comparable to a dense cluster's, and global min-max normalisation compresses both into the same score band. The CHAODA ensemble of Ishaq et al. 2021 combines several graph-based signals (relative/component cardinality, graph neighbourhood, random-walk stationary distribution, vertex degree); only the LFD signal is shipped here.
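
A small sketch of the LFD formula makes the failure mode concrete. It is written under assumptions: points are single `u64` codes compared by Hamming distance, the half radius uses integer division, and the helper name `lfd` is hypothetical. The shipped `anomaly_scores` may differ in all of these details.

```rust
/// Local fractal dimension at radius `r` around `center`:
/// log2(|B(c, r)| / |B(c, r/2)|), using Hamming distance on u64 codes.
/// Returns None when the half-radius ball is empty.
fn lfd(points: &[u64], center: u64, r: u32) -> Option<f64> {
    // Count the points whose Hamming distance to `center` is within `radius`.
    let within = |radius: u32| {
        points
            .iter()
            .filter(|&&p| (p ^ center).count_ones() <= radius)
            .count()
    };
    let outer = within(r);
    let inner = within(r / 2);
    if inner == 0 {
        None
    } else {
        Some((outer as f64 / inner as f64).log2())
    }
}
```

With this definition, a singleton leaf (`lfd(&[0], 0, 8)`) and a tight leaf whose points all sit inside the half radius (`lfd(&[0, 1, 2, 3], 0, 8)`) both return 0.0: the ratio only describes how mass is distributed inside the ball, not how far the ball is from everything else.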

PROBE-CHAODA-1000G therefore needs the multi-method ensemble (or an augmented signal) before it can pass — not merely genomic fixtures. This is the evidence-before-build payoff: the gap is caught before any adapter-genetics-experimental (D-GEN-1..4) spend.

Test design

test_chaoda_flags_novel_outliers_in_genetics_like_mixture locks robust, wide-tolerance invariants (valid range, bit-exact determinism, correct polarity mean_out ≥ mean_clu, better-than-chance auc > 0.5) plus one tripwire (auc < 0.85) that fails by design if a future multi-method port lifts the signal to the probe bar — forcing a cross-repo FINDING update in lance-graph rather than letting the claim silently rot. AUC may drift freely in [0.5, 0.85) without breaking the test.

Test plan

  • cargo test --lib hpc::clam::tests — 52 passed (51 pre-existing + this spike).
  • Determinism: rebuild + rescore is bit-exact (f64::to_bits compare).
  • No production code changed — test-module-only addition.
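
The bit-exact determinism check can be sketched like this. The helper name `bit_exact` is an assumption; the PR states only that the comparison uses `f64::to_bits`.

```rust
/// True when two score vectors are bit-for-bit identical.
/// Comparing `to_bits` is stricter than `==` on f64: it distinguishes
/// 0.0 from -0.0 and treats identical NaN bit patterns as equal.
fn bit_exact(a: &[f64], b: &[f64]) -> bool {
    a.len() == b.len()
        && a.iter()
            .zip(b)
            .all(|(x, y)| x.to_bits() == y.to_bits())
}
```

A rebuild-and-rescore test would then assert `bit_exact(&first_run, &second_run)` rather than comparing with a tolerance, so any nondeterminism in tree construction shows up immediately.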

Cross-refs

  • lance-graph/.claude/plans/genetics-probes-v1.md — the probe this spike substitutes for (a companion lance-graph PR records this AUC=0.624 as a CONJECTURE→FINDING update).
  • src/hpc/clam.rs:1493-1567 — CHAODA Phase 4 anomaly_scores under test.
  • Ishaq et al. 2021 — the multi-method CHAODA ensemble the single shipped signal is a subset of.

https://claude.ai/code/session_01VysoWJ6vsyg3wEGc5v7T5v


Update — 2026-06-16 (post-#220 merge, verified)

The branch absorbed #220 (the CHAODA multi-method ensemble) before merge, so the merged change delivers BOTH the spike FINDING and the ensemble that closes it. Verified on the branch HEAD — cargo test --lib hpc::clam::tests::test_chaoda -- --nocapture, both green:

| signal | ROC-AUC | gate |
| --- | --- | --- |
| single-method leaf-LFD (spike) | 0.6240 | lower-bound only (auc > 0.5), forward-compatible |
| path-only | 0.9938 | auc_path >= 0.85 |
| ensemble | 0.9906 (+0.367 lift) | auc_ens >= 0.85 |

So PROBE-CHAODA-1000G's ≥0.85 bar is now cleared in-branch by the ensemble, while the single-method spike test stays the documented-FINDING tripwire that can't fail when the code improves (codex P2 on this PR — resolved). The original spike narrative above remains accurate for the single-method commit; this note records the merged final state for the cross-repo genetics-probes-v1 FINDING.

…s below the PROBE-CHAODA-1000G bar

Runs the "1-day spike substitute" named in the genetics-probes-v1 spec
(AdaWorldAPI/lance-graph): a kernel smoke test for the claim "CHAODA
detects novel variants without a trained classifier."

Synthesises a 5-lane Gaussian mixture (matching the probe's 5-lane
variant feature vector) — three tight "common" clusters plus eight
deliberately extreme "novel" outliers — thermometer-encodes each lane
into 48 bits so Hamming distance is monotone in per-lane L1 magnitude
(the honest bridge from ordinal features to the Hamming-metric CLAM
default), builds the shipped ClamTree, and scores via anomaly_scores.

MEASURED (deterministic, seed-fixed):
  mean cluster score = 0.6749, mean outlier score = 0.7500
  frac cluster >= 0.5 = 0.733, frac outlier >= 0.5 = 0.750
  ROC-AUC (Mann-Whitney U) = 0.6240

FINDING: the shipped single-method leaf-LFD anomaly_scores reaches only
AUC ~ 0.62 on the EASIEST possible case (clean synthetic clusters with
far outliers) — well below the probe's >= 0.85 bar. The cause is
mechanical: leaf LFD = log2(|B(c,r)|/|B(c,r/2)|) measures intra-leaf
geometry complexity, not inter-leaf isolation, so an isolated singleton
lands in a leaf whose LFD is comparable to a dense cluster's, and global
min-max normalisation compresses both into the same band. The CHAODA
ensemble of Ishaq et al. 2021 combines several graph-based signals
(relative/component cardinality, graph neighbourhood, random-walk
stationary distribution, vertex degree); only the LFD signal is shipped
here. PROBE-CHAODA-1000G therefore needs the multi-method ensemble or an
augmented signal before it can pass — not merely genomic fixtures.

The test locks robust, wide-tolerance invariants (valid range, bit-exact
determinism, correct polarity, better-than-chance lower bound) plus one
tripwire (auc < 0.85) that fails by design if a future multi-method port
lifts the signal to the probe bar, forcing a cross-repo FINDING update
rather than letting the claim silently rot.

This is the evidence-before-build payoff: the gap is caught before any
adapter-genetics-experimental (D-GEN-1..4) spend.

https://claude.ai/code/session_01VysoWJ6vsyg3wEGc5v7T5v

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5f7e76d53


Comment thread src/hpc/clam.rs Outdated
Codex correctly flagged that asserting auc < 0.85 in a library unit test
turns a future quality improvement into a failing test: once a multi-method
CHAODA ensemble lifts the signal past the 0.85 probe bar, cargo test -p
ndarray would fail until an external lance-graph doc is updated. A library
test must never fail because the code got better, and ndarray CI should not
be coupled to a lance-graph note.

Fix: remove the upper-bound assertion. The test now asserts only lower-bound,
forward-compatible invariants — valid range, bit-exact determinism, correct
polarity (mean outlier score >= mean cluster score), and better-than-chance (auc > 0.5). The
measured AUC (~ 0.62 today) is surfaced via the existing eprintln diagnostic,
not enforced. Refreshing the PROBE-CHAODA-1000G FINDING in lance-graph when
the ensemble lands is a documentation step, not a gate enforced from this
library's test suite. Doc comment updated to match.

Re-run: test green, ROC_AUC=0.6240 still printed.

https://claude.ai/code/session_01VysoWJ6vsyg3wEGc5v7T5v
claude added 3 commits June 16, 2026 09:24
Fixes the format/stable CI check on PR #219. rustfmt reflows the centers
array literal and two assert! calls in the spike test; no logic change,
test still green (single-LFD AUC 0.6240 unchanged). Changes confined to
the added test code.

https://claude.ai/code/session_01VysoWJ6vsyg3wEGc5v7T5v
…HAODA-1000G synthetic bar (AUC 0.62 -> 0.99)

Increment 1 of D-GEN-CHAODA-ENSEMBLE (lance-graph genetics-probes-v1.md).
Adds ClamTree::ensemble_anomaly_scores as a NEW scoring entry point
alongside the unchanged single-method anomaly_scores baseline.

The spike (#219) measured single-method leaf-LFD at ROC-AUC 0.624 on a
synthetic 5-lane Gaussian mixture, below the 0.85 bar. Mechanical cause:
leaf LFD measures intra-leaf geometry, not inter-leaf isolation.

This ensemble combines isolation-sensitive CHAODA signals:
  - parent-child path-minority ratio (dominant): walking a leaf to the
    root, the minimum child/parent cardinality ratio is tiny for a point
    that split off as a minority (isolated outlier) and moderate for a
    point that always stayed in the majority (dense-cluster member).
    Immune to the leaf-fragmentation that defeats raw leaf cardinality.
  - connected-component cardinality over the leaf-overlap graph (small
    components are anomalous).
Averaged into one score; every point inherits its leaf's score.
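
The path-minority signal described above can be sketched as follows. The `Node` struct is a hypothetical stand-in (the ClamTree layout is not shown here), and mapping the ratio to a score as `1 - ratio` is an assumption made for illustration.

```rust
/// Minimal stand-in for a cluster-tree node (hypothetical layout).
struct Node {
    parent: Option<usize>,
    cardinality: usize,
}

/// Walk from `leaf` to the root and return the smallest
/// child/parent cardinality ratio seen along the way.
/// A path consisting of the root alone returns 1.0.
fn min_path_ratio(nodes: &[Node], leaf: usize) -> f64 {
    let mut min_ratio = 1.0_f64;
    let mut current = leaf;
    while let Some(p) = nodes[current].parent {
        let ratio = nodes[current].cardinality as f64 / nodes[p].cardinality as f64;
        min_ratio = min_ratio.min(ratio);
        current = p;
    }
    min_ratio
}

/// Assumed score mapping: a smaller minimum ratio means more anomalous.
fn path_minority_score(nodes: &[Node], leaf: usize) -> f64 {
    1.0 - min_path_ratio(nodes, leaf)
}
```

In a tree where a root of 100 points splits into children of 99 and 1, the singleton child has a minimum ratio of 0.01, while any leaf under the 99-point branch keeps a ratio near 0.5 no matter how finely that branch is later fragmented. That is the property the commit message refers to as immunity to leaf fragmentation.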

A first attempt using raw leaf cardinality + vertex degree + component
size scored AUC 0.621 (no lift) because the tree fragments dense blobs
into many tiny leaves that mimic isolated outliers under those metrics;
the path-minority signal is what actually separates. Leaf degree and raw
leaf cardinality were dropped as fragmentation noise. The remaining
CHAODA methods (random-walk stationary distribution) are deferred.

MEASURED (deterministic synthetic mixture, same fixture as #219):
  single-LFD AUC = 0.6240
  ensemble  AUC = 0.9906   (lift +0.3667, clears the 0.85 bar)

This is the synthetic SMOKE TEST only. It proves the ensemble approach
captures isolation where single-LFD does not; it does NOT prove genomic
novelty detection. PROBE-CHAODA-1000G on real corpora remains gated on
D-GEN-1 + D-GEN-2 (VCF -> feature-vector pipeline).

Tests: full hpc::clam suite green (53 incl. the new ensemble test);
ensemble is deterministic (bit-exact rebuild) and built purely from
shipped tree fields + the public dist().

https://claude.ai/code/session_01VysoWJ6vsyg3wEGc5v7T5v
…t doc + rustfmt

Addresses the Codex P2 on PR #220 (quadratic leaf-overlap build) and a
doc-comment inconsistency I introduced, and fixes the format/stable CI.

(1) Quadratic-build guard (Codex P2). The connected-component term needs an
O(L^2 * vec_len) leaf-overlap graph; on production corpora with small
min_cluster_size, L approaches the point count and the public API could
hang. Split into:
  - ensemble_anomaly_scores_budgeted(.., graph_budget): computes the linear
    O(L*depth) parent-child path-minority signal always, and only builds the
    overlap graph + component term when n_leaves <= graph_budget.
  - ensemble_anomaly_scores(..): convenience wrapper using the default
    ENSEMBLE_GRAPH_BUDGET = 4096; above that it degrades to path-minority
    alone, so the public API never runs the quadratic build at scale.
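
The shape of the budget guard can be sketched as follows. This is not the signature of `ensemble_anomaly_scores_budgeted` (the commit message elides it); the function `budgeted_scores`, its closure parameter, and the equal-weight average are illustrative assumptions. Only the `n_leaves <= graph_budget` condition and the 4096 default come from the text above.

```rust
/// Default cap quoted in the commit message.
const ENSEMBLE_GRAPH_BUDGET: usize = 4096;

/// Combine the always-computed path signal with the component signal
/// only when the leaf count fits the budget. The component term is
/// passed as a closure so the quadratic work never starts above the
/// budget. (Hypothetical helper, one score per leaf.)
fn budgeted_scores<F>(path_scores: &[f64], graph_budget: usize, component_scores: F) -> Vec<f64>
where
    F: FnOnce() -> Vec<f64>,
{
    let n_leaves = path_scores.len();
    if n_leaves > graph_budget {
        // Degrade to the linear path-minority signal alone.
        return path_scores.to_vec();
    }
    let comp = component_scores();
    path_scores
        .iter()
        .zip(&comp)
        .map(|(p, c)| 0.5 * (p + c))
        .collect()
}
```

Passing `graph_budget = 0` forces the path-only branch for any non-empty input, which mirrors how the commit message describes forcing the fallback for measurement.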

(2) Path-only fallback is validated, not assumed. New measurement on the
synthetic fixture (graph_budget = 0 forces the fallback):
    single-LFD 0.6240 | path-only 0.9938 | full ensemble 0.9906
Path-minority alone clears the 0.85 bar (slightly above the combined — the
component term is a marginal refinement), so degrading at scale is safe. The
test now asserts path-only AUC >= 0.85 so the guard can never silently
degrade large-corpus accuracy.

(3) Doc-comment correction. When the scoring pivoted to path-minority +
component, the method doc still described the abandoned relative-cardinality
/ vertex-degree set and listed parent-child ratio as "deferred" when it is in
fact the dominant shipped signal. Rewritten to match the implementation.

(4) rustfmt: format/stable was red; the new code is now rustfmt-clean
(changes confined to the added ensemble method + tests; no pre-existing code
touched). clippy --lib clean; full hpc::clam suite green (53 tests).

https://claude.ai/code/session_01VysoWJ6vsyg3wEGc5v7T5v
feat(clam): CHAODA multi-method ensemble — clears the synthetic PROBE-CHAODA-1000G bar (AUC 0.62 -> 0.99)
@AdaWorldAPI AdaWorldAPI merged commit bf606eb into master Jun 16, 2026
18 checks passed