Background
SpliceOutcomeSet (#262, PR #292) wraps variants that the existing classifier already flags as splice-adjacent: SpliceDonor, SpliceAcceptor, ExonicSpliceSite, IntronicSpliceSite. That covers the canonical donor/acceptor dinucleotides, the last 3 bases of each exon, and the first 3–6 intronic bases.
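For illustration, the window described above can be written as a position predicate. This is a minimal sketch that reads the issue text literally. It assumes a hypothetical coordinate scheme: a region, the nearest splice site, and a 1-based distance from the exon/intron junction. in_canonical_window is not a varcode function, and varcode's real classifier may define the window differently (for example, it may also cover exonic bases next to the acceptor).

```python
from typing import Literal

Region = Literal["exon", "intron"]
Side = Literal["donor", "acceptor"]

# Hypothetical constants taken from the issue text, not from varcode itself.
EXON_END_BASES = 3           # last 3 bases of each exon, just before the donor
DONOR_INTRONIC_BASES = 6     # first 3-6 intronic bases (upper bound); includes the GT dinucleotide
ACCEPTOR_INTRONIC_BASES = 2  # the AG dinucleotide at the 3' end of the intron


def in_canonical_window(region: Region, side: Side, distance: int) -> bool:
    """Return True if a position falls inside the canonical splice window.

    `distance` is 1-based: 1 is the base directly adjacent to the
    exon/intron junction on the given side.
    """
    if distance < 1:
        raise ValueError("distance must be >= 1")
    if region == "exon":
        # This sketch only covers the 3' end of an exon (donor side).
        return side == "donor" and distance <= EXON_END_BASES
    if side == "donor":
        return distance <= DONOR_INTRONIC_BASES
    return distance <= ACCEPTOR_INTRONIC_BASES
```

Under this reading, a silent variant 40 bases into an exon (`in_canonical_window("exon", "donor", 40)`) returns False. That is exactly the kind of variant that never reaches SpliceOutcomeSet today.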
But plenty of splice-altering variants sit outside that window:
- Exonic splicing enhancer (ESE) / silencer (ESS) disruption: a silent or missense variant in the middle of an exon can disrupt an ESE motif and cause the exon to be skipped in the mature transcript. varcode today emits a bare Substitution/Silent for these; SpliceOutcomeSet cannot wrap them because the classifier never flagged them.
- Deep intronic variants that create or disrupt a cryptic splice site hundreds of bases from the canonical boundary.
- Branch point disruption (~20–50 bp upstream of the acceptor).
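To make the gap concrete, here is a rough positional bucketing of the three categories above. It uses a hypothetical (region, side, distance) coordinate scheme and assumes the variant is already known to lie outside the canonical window. The 20–50 bp branch-point range comes from the text. The 100 bp cut-off for "deep intronic" is an arbitrary assumption. Position alone only says where a variant could matter, not whether it actually alters splicing. That is why these categories still need prediction or RNA evidence.

```python
from typing import Literal

Region = Literal["exon", "intron"]
Side = Literal["donor", "acceptor"]

BRANCH_POINT_MIN, BRANCH_POINT_MAX = 20, 50  # ~20-50 bp upstream of the acceptor (from the issue)
DEEP_INTRONIC_MIN = 100  # assumption: "hundreds of bases" is not a precise threshold


def outside_window_category(region: Region, side: Side, distance: int) -> str:
    """Label a position that is already known to be outside the canonical window.

    `distance` is the 1-based distance from the nearest exon/intron junction
    on the given side. The labels are illustrative, not varcode effect types.
    """
    if region == "exon":
        # Mid-exon silent/missense variants: candidate ESE/ESS disruption.
        return "exonic_regulatory"
    if side == "acceptor" and BRANCH_POINT_MIN <= distance <= BRANCH_POINT_MAX:
        return "branch_point"
    if distance >= DEEP_INTRONIC_MIN:
        # Candidate cryptic-site creation/disruption far from the boundary.
        return "deep_intronic"
    return "near_intronic"
```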
Detecting these requires ML-based prediction (SpliceAI, SpliceTransformer, MMSplice, Pangolin) or RNA evidence. Without either, the possibility-set model silently under-reports splice consequences for exonic and intronic variants outside the canonical window.
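As a concrete example of what a scorer hands back, SpliceAI's VCF annotation adds an INFO field of the form ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL, with comma-separated entries when several alleles or genes apply. The DS fields are delta scores for acceptor gain/loss and donor gain/loss. The DP fields are the positions of those changes relative to the variant. The SpliceAI documentation treats the maximum of the four delta scores as the variant's score and discusses 0.2 (high recall), 0.5 (recommended) and 0.8 (high precision) cut-offs. The class and function names below are hypothetical, and the parser assumes well-formed entries with no missing values.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpliceAIScore:
    """One allele/gene entry from a SpliceAI INFO field (hypothetical class)."""

    allele: str
    gene: str
    ds_ag: float  # delta score: acceptor gain
    ds_al: float  # delta score: acceptor loss
    ds_dg: float  # delta score: donor gain
    ds_dl: float  # delta score: donor loss
    dp_ag: int    # position of each predicted change, relative to the variant
    dp_al: int
    dp_dg: int
    dp_dl: int

    @property
    def delta(self) -> float:
        """Maximum of the four delta scores."""
        return max(self.ds_ag, self.ds_al, self.ds_dg, self.ds_dl)


def parse_spliceai_info(value: str) -> List[SpliceAIScore]:
    """Parse a SpliceAI INFO value (without the 'SpliceAI=' prefix)."""
    scores = []
    for entry in value.split(","):
        fields = entry.split("|")
        if len(fields) != 10:
            raise ValueError(f"expected 10 '|'-separated fields, got {len(fields)}: {entry!r}")
        deltas = [float(x) for x in fields[2:6]]
        positions = [int(x) for x in fields[6:10]]
        scores.append(SpliceAIScore(fields[0], fields[1], *deltas, *positions))
    return scores


def is_splice_altering(value: str, threshold: float = 0.5) -> bool:
    """True if any allele/gene entry reaches `threshold` (0.5 by default)."""
    return any(score.delta >= threshold for score in parse_spliceai_info(value))
```

Which cut-off varcode should default to is an open design question. The 0.5 default here simply mirrors SpliceAI's "recommended" value.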
Scope
- Integrate a splice-prediction scorer as an optional dependency (SpliceAI is the obvious first target).
- When a variant is outside the canonical window but the scorer flags a high-probability splice change, wrap it in a SpliceOutcomeSet (or the successor multi-outcome abstraction from the OutcomeSet generalization work).
- Keep the dependency optional: the default splice_outcomes=True uses canonical-window classification only, and ML-informed wrapping is a second opt-in (e.g. splice_outcomes="ml" or splice_scorer=...).
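One way the two opt-ins could fit together is sketched below. Everything here is hypothetical. SpliceScorer is an assumed protocol: any object with a max_delta(variant) method, for instance an adapter around SpliceAI. should_wrap_as_splice_outcomes is not an existing varcode function, and the 0.5 threshold is an assumed default. The helper checks for the optional package with importlib.util.find_spec, so importing varcode would never require the scorer. The module name "spliceai" is also an assumption.

```python
import importlib.util
from typing import Any, Optional, Protocol, Union


class SpliceScorer(Protocol):
    """Hypothetical interface for an optional splice-prediction backend."""

    def max_delta(self, variant: Any) -> float:
        """Return a 0-1 splice-alteration score for `variant`."""
        ...


def optional_dependency_available(module_name: str = "spliceai") -> bool:
    """Check whether an optional package is installed without importing it."""
    return importlib.util.find_spec(module_name) is not None


def should_wrap_as_splice_outcomes(
    variant: Any,
    in_canonical_window: bool,
    splice_outcomes: Union[bool, str] = True,
    splice_scorer: Optional[SpliceScorer] = None,
    threshold: float = 0.5,
) -> bool:
    """Decide whether a variant's effect should be wrapped in a SpliceOutcomeSet.

    - splice_outcomes=False: never wrap.
    - splice_outcomes=True (default) with no scorer: canonical window only.
    - splice_outcomes="ml" or an explicit splice_scorer: also consult the
      scorer for variants outside the canonical window.
    """
    if isinstance(splice_outcomes, str) and splice_outcomes != "ml":
        raise ValueError(f"unknown splice_outcomes mode: {splice_outcomes!r}")
    if splice_outcomes is False:
        return False
    if in_canonical_window:
        # Canonical-window behaviour is unchanged by the ML opt-in.
        return True
    use_ml = splice_outcomes == "ml" or splice_scorer is not None
    if not use_ml:
        return False
    if splice_scorer is None:
        raise ValueError('splice_outcomes="ml" requires a splice_scorer')
    return splice_scorer.max_delta(variant) >= threshold
```

Keeping the scorer behind a small protocol would let varcode depend only on the interface. The model-loading adapter could then live behind an optional install extra.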
Related
- SpliceOutcomeSet prototype; this issue is a natural follow-up.