feat(clam): CHAODA multi-method anomaly ensemble — clears the PROBE-CHAODA-1000G synthetic bar (AUC 0.62 -> 0.99)
Increment 1 of D-GEN-CHAODA-ENSEMBLE (lance-graph genetics-probes-v1.md).
Adds ClamTree::ensemble_anomaly_scores as a NEW scoring entry point
alongside the unchanged single-method anomaly_scores baseline.
The spike (#219) measured single-method leaf-LFD at ROC-AUC 0.624 on a
synthetic 5-lane Gaussian mixture, below the 0.85 bar. Mechanical cause:
leaf LFD measures intra-leaf geometry, not inter-leaf isolation.
This ensemble combines isolation-sensitive CHAODA signals:
- parent-child path-minority ratio (dominant): walking from a leaf to the
root, the minimum child/parent cardinality ratio is tiny for a point
that split off as a minority (isolated outlier) and moderate for a
point that always stayed in the majority (dense-cluster member).
This signal is immune to the leaf fragmentation that defeats raw leaf cardinality.
- connected-component cardinality over the leaf-overlap graph (small
components are anomalous).
The two signals are averaged into one per-leaf score; every point inherits its leaf's score.
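A minimal sketch of the two signals, assuming a hypothetical arena-allocated tree (`Node` with a parent index and a subtree cardinality) and hypothetical leaf balls (`Leaf` with center, radius, cardinality). None of these types come from ClamTree. Two further assumptions: two leaves are joined in the overlap graph when their balls intersect (center distance <= sum of radii), and "component cardinality" means the total number of points in the component. The commit does not say how the two signals are normalized before averaging. One option is to score `1 - ratio` for the path signal.

```rust
// Hypothetical arena node; the real ClamTree layout is not shown in the commit.
struct Node {
    parent: Option<usize>, // index of the parent node, None for the root
    cardinality: usize,    // number of points in this node's subtree
}

/// Minimum child/parent cardinality ratio on the path from `leaf` to the root.
/// A small value means the point split off as a minority at some level.
fn path_minority_ratio(nodes: &[Node], leaf: usize) -> f64 {
    let mut min_ratio = 1.0_f64;
    let mut child = leaf;
    while let Some(p) = nodes[child].parent {
        let ratio = nodes[child].cardinality as f64 / nodes[p].cardinality as f64;
        min_ratio = min_ratio.min(ratio);
        child = p;
    }
    min_ratio
}

// Hypothetical leaf ball used to build the overlap graph.
struct Leaf {
    center: Vec<f64>,
    radius: f64,
    cardinality: usize,
}

fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f64>().sqrt()
}

/// Union-find root lookup with path halving.
fn find(parent: &mut [usize], mut x: usize) -> usize {
    while parent[x] != x {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    x
}

/// For each leaf, the total point count of its connected component in the
/// leaf-overlap graph (assumed edge rule: balls intersect). Small = anomalous.
fn component_cardinality(leaves: &[Leaf]) -> Vec<usize> {
    let n = leaves.len();
    let mut parent: Vec<usize> = (0..n).collect();
    for i in 0..n {
        for j in (i + 1)..n {
            let d = euclidean(&leaves[i].center, &leaves[j].center);
            if d <= leaves[i].radius + leaves[j].radius {
                let (ri, rj) = (find(&mut parent, i), find(&mut parent, j));
                if ri != rj {
                    parent[ri] = rj; // merge the two components
                }
            }
        }
    }
    // Sum point counts per component root, then report each leaf's component total.
    let mut totals = vec![0usize; n];
    for i in 0..n {
        let r = find(&mut parent, i);
        totals[r] += leaves[i].cardinality;
    }
    (0..n).map(|i| totals[find(&mut parent, i)]).collect()
}
```

The pairwise overlap loop is O(L²) in the number of leaves. That is fine for a sketch, but a real implementation would likely reuse the tree's own neighbor search.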
A first attempt using raw leaf cardinality + vertex degree + component
size scored AUC 0.621 (no lift) because the tree fragments dense blobs
into many tiny leaves that mimic isolated outliers under those metrics;
the path-minority signal is what actually separates. Vertex degree and raw
leaf cardinality were dropped as fragmentation noise. The remaining
CHAODA methods (random-walk stationary distribution) are deferred.
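The fragmentation failure can be reproduced on a toy tree. The example below is hypothetical and is not the #219 fixture. A 17-point root splits into one isolated outlier and a 16-point dense blob, and the blob is then halved repeatedly down to singleton leaves. Raw leaf cardinality is 1 everywhere, so it cannot tell the outlier from blob members. The path-minority ratio gives 1/17 for the outlier and 0.5 for every blob leaf. The `Node` type is the same hypothetical arena node as before, redefined so the block stands alone.

```rust
// Hypothetical arena node (not ClamTree's real layout).
struct Node {
    parent: Option<usize>,
    cardinality: usize,
}

/// Minimum child/parent cardinality ratio on the leaf-to-root path.
fn path_minority_ratio(nodes: &[Node], leaf: usize) -> f64 {
    let mut min_ratio = 1.0_f64;
    let mut child = leaf;
    while let Some(p) = nodes[child].parent {
        min_ratio = min_ratio.min(nodes[child].cardinality as f64 / nodes[p].cardinality as f64);
        child = p;
    }
    min_ratio
}

/// Toy tree: root (17) -> isolated outlier (1) + dense blob (16); the blob is
/// halved until every leaf holds a single point.
/// Returns (node arena, outlier leaf index, blob leaf indices).
fn build_fragmented_tree() -> (Vec<Node>, usize, Vec<usize>) {
    let mut nodes = vec![
        Node { parent: None, cardinality: 17 },
        Node { parent: Some(0), cardinality: 1 },  // isolated outlier leaf
        Node { parent: Some(0), cardinality: 16 }, // dense blob
    ];
    let mut blob_leaves = Vec::new();
    let mut frontier = vec![2usize];
    while let Some(idx) = frontier.pop() {
        let card = nodes[idx].cardinality;
        if card == 1 {
            blob_leaves.push(idx); // fragmentation bottoms out at singletons
            continue;
        }
        for _ in 0..2 {
            nodes.push(Node { parent: Some(idx), cardinality: card / 2 });
            frontier.push(nodes.len() - 1);
        }
    }
    (nodes, 1, blob_leaves)
}
```

Every split inside the blob is an even halving, so a blob leaf's smallest ratio is 1/2. The outlier's only split is 1 of 17. That gap is the separation the path-minority signal exploits.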
MEASURED (deterministic synthetic mixture, same fixture as #219):
single-LFD AUC = 0.6240
ensemble AUC = 0.9906 (lift +0.3667, clears the 0.85 bar)
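ROC-AUC here is read as the probability that a randomly chosen outlier gets a higher anomaly score than a randomly chosen inlier, and the lift is the difference between the two AUCs. Below is a minimal pairwise (Mann-Whitney) sketch of that quantity. It assumes higher score = more anomalous and counts ties as one half. The commit does not show how the probe actually computes AUC, so `roc_auc` is a hypothetical helper.

```rust
/// Pairwise ROC-AUC: fraction of (outlier, inlier) pairs where the outlier
/// scores higher, with ties counted as 0.5. Returns NaN if either class is empty.
fn roc_auc(scores: &[f64], is_outlier: &[bool]) -> f64 {
    let mut wins = 0.0;
    let mut pairs = 0.0;
    for (i, &si) in scores.iter().enumerate() {
        if !is_outlier[i] {
            continue;
        }
        for (j, &sj) in scores.iter().enumerate() {
            if is_outlier[j] {
                continue;
            }
            pairs += 1.0;
            if si > sj {
                wins += 1.0;
            } else if si == sj {
                wins += 0.5;
            }
        }
    }
    wins / pairs
}
```

This is O(n_out * n_in). For large corpora, a rank-based formula over sorted scores gives the same value in O(n log n).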
This is the synthetic SMOKE TEST only. It proves the ensemble approach
captures isolation where single-LFD does not; it does NOT prove genomic
novelty detection. PROBE-CHAODA-1000G on real corpora remains gated on
D-GEN-1 + D-GEN-2 (VCF -> feature-vector pipeline).
Tests: full hpc::clam suite green (53 tests, including the new ensemble test);
ensemble is deterministic (bit-exact rebuild) and built purely from
shipped tree fields + the public dist().
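A bit-exact rebuild check compares the raw bit patterns of two score vectors rather than using `==` on `f64`. Below is a minimal sketch of such a comparison. The assumed usage is to score the same input twice with a freshly built tree and compare the results. `bit_exact` is a hypothetical helper, not part of the shipped test suite.

```rust
/// True when two score vectors match bit-for-bit. Unlike `==` on f64, this
/// distinguishes 0.0 from -0.0 and treats identical NaN bit patterns as equal.
fn bit_exact(a: &[f64], b: &[f64]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}
```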
https://claude.ai/code/session_01VysoWJ6vsyg3wEGc5v7T5v