Skip to content

Unify splice handling into a single SpliceOutcomeSet (6.0)#392

Merged
iskandr merged 7 commits into
mainfrom
splice-unification-6.0
May 26, 2026
Merged

Unify splice handling into a single SpliceOutcomeSet (6.0)#392
iskandr merged 7 commits into
mainfrom
splice-unification-6.0

Conversation

@iskandr

@iskandr iskandr commented May 22, 2026

Copy link
Copy Markdown
Contributor

Implements #391 — splice unification for varcode 6.0.

Summary

Every splice-disrupting variant now wraps in a SpliceOutcomeSet at the user-facing API. No more splice_outcomes=True opt-in flag. Cost is amortised via lazy candidate computation; legacy access patterns keep working through compat shims.

What landed

  • Always-on SpliceOutcomeSetVariant.effects() / Variant.effect_on_transcript(t) / VariantCollection.effects() always wrap splice-disrupting variants. The splice_outcomes= parameter is removed.
  • Lazy candidate computation — only the cheap NormalSplicing candidate is built eagerly; ExonSkipping / IntronRetention / CrypticDonor / CrypticAcceptor materialise on first .candidates access and are cached. Filter pipelines that drop variants early via modifies_protein_sequence / effect_priority never trigger the expensive math.
  • Three-accessor surface on SpliceOutcomeSet:
    • most_likely_effect — top predicted splicing mechanism
    • effect_if_splicing_unchanged — coding effect if splicing proceeds (or None)
    • candidates — full possibility set (triggers materialization)
  • Silent-splice filter bug fixedSpliceOutcomeSet.modifies_protein_sequence = True (hardcoded class attr). Variants whose NormalSplicing.coding_effect happens to be Silent are no longer silently dropped by drop_silent_and_noncoding(). A splice disruption is always potentially protein-modifying via a non-NormalSplicing candidate.
  • SpliceOutcomeSet is a TranscriptMutationEffect (alongside MultiOutcomeEffect) — carries gene/transcript and matches the standard downstream isinstance(effect, TranscriptMutationEffect) filter.

Compat shims

  • SpliceOutcomeSet.alternate_effect aliases effect_if_splicing_unchanged. Existing code reading effect.alternate_effect (when effect was an ExonicSpliceSite) keeps working when the effect is now a SpliceOutcomeSet wrapping ExonicSpliceSite.
  • select_between_exonic_splice_site_and_alternate_effect now unwraps SpliceOutcomeSet → alternate_effect when (a) the disrupted signal class is ExonicSpliceSite and (b) the alternate has higher effect_priority. Preserves the legacy top_priority_effect behavior: splice variants that also introduce PrematureStop / FrameShift / etc. still surface the coding consequence as the canonical top.
  • FastEffectAnnotator.annotate_on_transcript returns the raw splice class (not the wrapper) so protein_diff's dual-dispatch isinstance(fast_effect, ExonicSpliceSite) keeps working. Wrapping is applied at the collection boundary in predict_variant_effects.
  • enumerate_splice_outcomes(splice_set, genomic_sequence=...) accepts an already-wrapped SpliceOutcomeSet and rebuilds with a new genomic sequence provider. Lets tests / downstream callers inject a sequence source post-classification.

Migration

The one breaking change for callers is the discriminator pattern. Replace:

if isinstance(effect, ExonicSpliceSite):
    coding = effect.alternate_effect

with:

if isinstance(effect, SpliceOutcomeSet) and effect.disrupted_signal_class is ExonicSpliceSite:
    coding = effect.alternate_effect  # compat shim still works

Attribute access patterns (.alternate_effect, .aa_ref, .short_description, etc.) keep working through the wrapper.

Tests + docs

  • 1219 tests pass. Test suite updated throughout to use the new discriminator; obsolete test_exonic_splice_site_multi_outcome.py removed (its assertions are covered in test_splice_outcomes.py).
  • docs/effect_annotation.md: new "When splice disruption is in play" + "Splice and coding effects can co-occur" sections; "Default 2-outcome form" / "Opt-in possibility set" collapsed into a single "The SpliceOutcomeSet shape" section; "Common questions" cheat sheet updated for the single-shape world.
  • README.md: splice-signal classes documented as disrupted_signal_class / splice_signal references rather than top-level effects.
  • CHANGELOG.md: Breaking entry covers the flag removal, isinstance migration, silent-splice filter fix, lazy candidates, and the TranscriptMutationEffect inheritance.

iskandr added 5 commits May 22, 2026 10:51
First step of the splice unification: the canonical "alternative
outcome" accessor. Surfaces the coding consequence that applies if
splicing proceeds normally — the NormalSplicing candidate's
coding_effect — and unlike the old ExonicSpliceSite.alternate_effect
works for intronic disruptions too (returns None when there's no
coding consequence).

Together with the existing most_likely_effect (top predicted
mechanism) and candidates (full possibility set), this gives the
three-accessor surface the unified design calls for. alternate_effect
is kept as a deprecated alias for now; it's removed when the flag and
the lightweight ExonicSpliceSite.alternate_effect form go away.

Additive only — no behavior change yet.
Follow-up to 56cce4a. Two changes on top of the first-pass landing:

- Rename `if_splicing_unchanged` -> `effect_if_splicing_unchanged`
  and delete the deprecated `alternate_effect` alias outright (no
  shim, no warning). The previous commit kept the alias for one
  release; since the PR has not shipped, the alias is dropped in
  the same PR rather than carried as a deprecation tail.
- Read the underlying `NormalSplicing` candidate from the DNA-only
  `_candidates` set, not the public `candidates` (which also picks
  up RNA-attached outcomes). "If splicing proceeds normally" is a
  DNA property.

Docs (docs/effect_annotation.md):

- New "When splice disruption is in play" section enumerates the
  canonical window and the four splice signal classes
  (ExonicSpliceSite / SpliceDonor / SpliceAcceptor /
  IntronicSpliceSite), with a table of which class fires where.
- New "Splice and coding effects can co-occur" section explains
  the ExonicSpliceSite duality (last/first 3 exonic bases hit both
  signal and codon), the intronic-only case (no coding effect),
  and the outside-the-window case (plain coding effect).
- New "Common questions" cheat sheet maps each simple use case
  (is variant splice-disrupting, coding consequence if splicing
  proceeds, most likely mechanism, all candidates, most
  disruptive, candidate proteins, splice signal location) to the
  right accessor on both shapes.

Tests:

- Parametrize the "intronic classes lack alternate_effect" check
  over SpliceDonor / SpliceAcceptor / IntronicSpliceSite so the
  docstring claim is verified across all three.
- Add `test_highest_priority_effect_returns_inner_effect` to
  cover the `_effect` accessor (`_candidate` was already
  covered).
- Add `test_effect_if_splicing_unchanged_returns_coding_effect`
  and `test_effect_if_splicing_unchanged_none_for_intronic` on
  the renamed accessor.
- Drop the two old `alternate_effect` alias tests (the alias is
  gone).
- Shorten the four long test names flagged in review.

1230 tests pass.
Review found that the new splice-window docs over-claimed the
acceptor-side ExonicSpliceSite range. Verified against the
classifier at varcode/effects/effect_helpers.py:119 and
varcode/effects/effect_prediction.py:398-430:

- Donor-side ExonicSpliceSite fires on the last 3 exonic bases
  of an exon (interval [exon_end - 2, exon_end]). ✓ docs match.
- Acceptor-side ExonicSpliceSite fires on only the first
  exonic base (transcript_offset == exon_start_offset). Docs
  said "first 3 exonic bases" — corrected to "first base".
- The asymmetry follows the consensus motifs (MAG|GURAGU on
  donor, YAG|R on acceptor).
- IntronicSpliceSite also fires at +1/+2 or -1/-2 when the
  reference base isn't the canonical GT/AG (non-canonical
  signal downgrade in effect_prediction.py:403-409). Added to
  the table for completeness.

Other review fixes in the same commit:

- Add a comment on the parametrized intronic-classes test
  explaining the synthetic ref/alt at position 117531115 for
  the IntronicSpliceSite case (it exercises the non-canonical
  downgrade path; varcode classifies on supplied ref/alt +
  position, not the actual GRCh38 reference base).
- Refine the test's docstring gloss: "the variant doesn't
  change a coding base" instead of "the variant is intronic"
  (more precise — IntronicSpliceSite is itself intronic, but
  the load-bearing property is the lack of coding consequence).
- TODO marker on test_effect_if_splicing_unchanged_matches_alternate_effect
  so the breaking-6.0 work knows to revisit it when
  ExonicSpliceSite.alternate_effect goes away.
- Drop a duplicate transition sentence ("Two splice-effect
  richness levels are available; the rest of this section
  walks through both.") — the next H3 already signals that.

1230 tests pass.
Review noted two stale paragraphs that survived earlier passes,
plus structural noise from accumulated edits.

Bug fixes:

- "Relationship between the two" claimed `alternate_effect`
  worked on both shapes; `SpliceOutcomeSet.alternate_effect`
  was deleted in this PR. Rewrote to map each shape to its
  actual accessor.
- "Limitations" still summarized the window as "first 3-6
  intronic, donor/acceptor", dropping the donor/acceptor
  asymmetry now documented in "When splice disruption is in
  play". Removed the now-wrong summary; kept the
  "what's not detected" content.

Structural cleanups:

- Moved "Common questions" up — now sits right after
  "Opt-in: full possibility set" rather than after the deep
  Provenance/Picking subsections. Practical readers reach the
  cheat sheet without scrolling past 200 lines of prose.
- Replaced the duplicate `effect_if_splicing_unchanged`
  paragraph at the end of "Opt-in" with a pointer to the
  cheat sheet. The accessor was being documented three times.
- Replaced the inline window description in "Splice and
  coding effects can co-occur" with a back-reference to the
  table above. The window definition now lives in exactly
  one place.

1230 tests pass.
Lands the breaking-6.0 splice unification that the previous commits
prepared the ground for. Every splice-disrupting variant now wraps
in a SpliceOutcomeSet at the user-facing API — no more
splice_outcomes=True opt-in flag. Cost is amortised via lazy
candidate computation; legacy access patterns keep working through
compat shims.

Core changes
------------

* `SpliceOutcomeSet.__init__` gains a lazy construction path:
  - Materialized path (explicit `candidates=...`) still works for
    `with_rna_evidence` rebuilds, deserialization, and tests.
  - Lazy path (used by `enumerate_splice_outcomes`) stores the raw
    `splice_effect` + `mechanism_order` + `genomic_sequence` and
    eagerly builds only the cheap `NormalSplicing` candidate;
    `ExonSkipping` / `IntronRetention` / cryptic candidates
    materialize on first `.candidates` access via a property,
    then cache.
* `effect_if_splicing_unchanged` (and its `alternate_effect`
  back-compat alias) read from the eager `NormalSplicing` slot —
  no full materialization for the common "if splicing proceeds"
  question.
* `SpliceOutcomeSet.modifies_protein_sequence = True` (class attr,
  shadows the inherited property). Fixes the silent-splice filter
  bug: variants whose `NormalSplicing.coding_effect` happens to be
  Silent were previously dropped by `drop_silent_and_noncoding()`.
  A splice disruption is always *potentially* protein-modifying
  via a non-NormalSplicing candidate.
* `SpliceOutcomeSet` is now a `TranscriptMutationEffect` subclass
  (in addition to `MultiOutcomeEffect`), so it carries `gene` and
  matches the standard `isinstance(effect, TranscriptMutationEffect)`
  filter used by downstream consumers (dbnsfp validation, etc.).
* `enumerate_splice_outcomes(splice_set, genomic_sequence=...)`
  accepts an already-wrapped SpliceOutcomeSet and rebuilds with the
  new provider. Lets test fixtures and downstream users inject a
  genomic-sequence source post-classification.

Switch
------

* `splice_outcomes=` parameter removed from `Variant.effects()` /
  `VariantCollection.effects()` / `predict_variant_effects()`. The
  collection-level wrap (`wrap_splice_effects_in_collection`) is
  unconditional.
* `predict_variant_effect_on_transcript()` refactored: the body
  moves to `_predict_variant_effect_on_transcript_raw()` (internal
  helper that returns raw splice classes); the public function
  wraps via `enumerate_splice_outcomes`.
* `FastEffectAnnotator.annotate_on_transcript` now calls the raw
  helper directly, returning the unwrapped splice class. The
  collection-level wrap converts to SpliceOutcomeSet at the
  boundary. This preserves protein_diff's dual-dispatch
  (`isinstance(fast_effect, ExonicSpliceSite)` keeps working
  inside the annotator) while still wrapping at the user-facing
  surface.

Compat shims
------------

* `SpliceOutcomeSet.alternate_effect` aliases
  `effect_if_splicing_unchanged`. Existing callers reading
  `effect.alternate_effect` (when `effect` was an `ExonicSpliceSite`)
  keep working when the effect is now a SpliceOutcomeSet wrapping
  ExonicSpliceSite.
* `select_between_exonic_splice_site_and_alternate_effect` now
  unwraps SpliceOutcomeSet to its `alternate_effect` when (a) the
  disrupted signal class is ExonicSpliceSite and (b) the alternate
  has higher effect_priority than the splice signal. Preserves the
  legacy `top_priority_effect` behavior for variants that are
  splice-disrupting AND introduce a PrematureStop / FrameShift /
  etc. — the more disruptive coding consequence still surfaces as
  the canonical top.

Tests
-----

* Existing tests updated to the new shape — removed
  `splice_outcomes=True` from ~60 callsites; replaced
  `effect.__class__ is SpliceDonor` with
  `effect.disrupted_signal_class is SpliceDonor`; updated COSMIC
  and dbNSFP test helpers to handle SpliceOutcomeSet via the
  `alternate_effect` compat alias.
* Deleted `tests/test_exonic_splice_site_multi_outcome.py` — its
  tests covered the #299 retrofit (making ExonicSpliceSite a
  MultiOutcomeEffect), which is superseded by always-on
  SpliceOutcomeSet. The relevant assertions are covered in
  test_splice_outcomes.py.
* Parametrized `test_intronic_splice_classes_lack_alternate_effect`
  retired in favour of `test_intronic_disruptions_have_no_coding_alternate`
  — same parametrize, new semantics (both
  `effect.alternate_effect is None` and
  `effect.effect_if_splicing_unchanged is None`).

Docs
----

* docs/effect_annotation.md: collapsed "Default lightweight 2-outcome
  form" + "Opt-in full possibility set" + "Relationship between
  the two" into a single "The SpliceOutcomeSet shape" section that
  documents the lazy construction and the `alternate_effect`
  back-compat alias. "Common questions" updated to drop the
  dual-shape `isinstance` guard.
* README.md: splice-site section reworded to say the four
  SpliceSite classes survive as `disrupted_signal_class` /
  `splice_signal` rather than as top-level effects. Position
  ranges in the table refined to match the classifier.
* CHANGELOG.md: Breaking entry documents the removed flag, the
  isinstance discriminator migration, the silent-splice filter fix,
  the lazy candidates, and the new TranscriptMutationEffect
  inheritance.

1219 tests pass.
@iskandr iskandr changed the title [WIP] Unify splice handling into a single SpliceOutcomeSet (6.0) Unify splice handling into a single SpliceOutcomeSet (6.0) May 25, 2026
iskandr added 2 commits May 25, 2026 15:16
Review pointed out the docstring claimed "highest-priority candidate's
inner effect" but the implementation reads ``alternate_effect`` (i.e.
``NormalSplicing.coding_effect``). Implementation is the intended
behaviour (preserves legacy ``top_priority_effect`` semantics, keeps
the SpliceOutcomeSet as the canonical top for mechanism-only outcomes
like ExonSkipping); docstring was misleading.

Rewrote the docstring to be explicit:

- "Alternate" is the if-splicing-proceeds coding effect, not the
  highest-priority across mechanisms.
- Behavior contract: exonic variant that introduces PrematureStop
  in-codon surfaces as PrematureStop; mechanism-only outcomes leave
  the SpliceOutcomeSet as the top so the full possibility set stays
  visible.
Promotes the splice-unification work (#391, #392) to 6.0.0:
SpliceOutcomeSet always-on, splice_outcomes= flag removed,
discriminator migration documented in the Breaking changelog.

CHANGELOG: move the [Unreleased] section to [v6.0.0] with today's
date. All accumulated entries since 5.0.6 — splice unification +
RNAReadPhasingSource + IUPAC ambiguity-code fix + apply_variants_to_transcript
insertion-abutting fix — ship together.
@iskandr iskandr marked this pull request as ready for review May 26, 2026 15:38
@iskandr iskandr merged commit 6c2aeac into main May 26, 2026
8 checks passed
@iskandr iskandr deleted the splice-unification-6.0 branch May 26, 2026 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant