Unify splice handling into a single SpliceOutcomeSet (6.0)#392
Merged
Conversation
First step of the splice unification: the canonical "alternative outcome" accessor. Surfaces the coding consequence that applies if splicing proceeds normally — the NormalSplicing candidate's coding_effect — and unlike the old ExonicSpliceSite.alternate_effect works for intronic disruptions too (returns None when there's no coding consequence). Together with the existing most_likely_effect (top predicted mechanism) and candidates (full possibility set), this gives the three-accessor surface the unified design calls for. alternate_effect is kept as a deprecated alias for now; it's removed when the flag and the lightweight ExonicSpliceSite.alternate_effect form go away. Additive only — no behavior change yet.
Follow-up to 56cce4a. Two changes on top of the first-pass landing: - Rename `if_splicing_unchanged` -> `effect_if_splicing_unchanged` and delete the deprecated `alternate_effect` alias outright (no shim, no warning). The previous commit kept the alias for one release; since the PR has not shipped, the alias is dropped in the same PR rather than carried as a deprecation tail. - Read the underlying `NormalSplicing` candidate from the DNA-only `_candidates` set, not the public `candidates` (which also picks up RNA-attached outcomes). "If splicing proceeds normally" is a DNA property. Docs (docs/effect_annotation.md): - New "When splice disruption is in play" section enumerates the canonical window and the four splice signal classes (ExonicSpliceSite / SpliceDonor / SpliceAcceptor / IntronicSpliceSite), with a table of which class fires where. - New "Splice and coding effects can co-occur" section explains the ExonicSpliceSite duality (last/first 3 exonic bases hit both signal and codon), the intronic-only case (no coding effect), and the outside-the-window case (plain coding effect). - New "Common questions" cheat sheet maps each simple use case (is variant splice-disrupting, coding consequence if splicing proceeds, most likely mechanism, all candidates, most disruptive, candidate proteins, splice signal location) to the right accessor on both shapes. Tests: - Parametrize the "intronic classes lack alternate_effect" check over SpliceDonor / SpliceAcceptor / IntronicSpliceSite so the docstring claim is verified across all three. - Add `test_highest_priority_effect_returns_inner_effect` to cover the `_effect` accessor (`_candidate` was already covered). - Add `test_effect_if_splicing_unchanged_returns_coding_effect` and `test_effect_if_splicing_unchanged_none_for_intronic` on the renamed accessor. - Drop the two old `alternate_effect` alias tests (the alias is gone). - Shorten the four long test names flagged in review. 1230 tests pass.
Review found that the new splice-window docs over-claimed the
acceptor-side ExonicSpliceSite range. Verified against the
classifier at varcode/effects/effect_helpers.py:119 and
varcode/effects/effect_prediction.py:398-430:
- Donor-side ExonicSpliceSite fires on the last 3 exonic bases
of an exon (interval [exon_end - 2, exon_end]). ✓ docs match.
- Acceptor-side ExonicSpliceSite fires on only the first
exonic base (transcript_offset == exon_start_offset). Docs
said "first 3 exonic bases" — corrected to "first base".
- The asymmetry follows the consensus motifs (MAG|GURAGU on
donor, YAG|R on acceptor).
- IntronicSpliceSite also fires at +1/+2 or -1/-2 when the
reference base isn't the canonical GT/AG (non-canonical
signal downgrade in effect_prediction.py:403-409). Added to
the table for completeness.
Other review fixes in the same commit:
- Add a comment on the parametrized intronic-classes test
explaining the synthetic ref/alt at position 117531115 for
the IntronicSpliceSite case (it exercises the non-canonical
downgrade path; varcode classifies on supplied ref/alt +
position, not the actual GRCh38 reference base).
- Refine the test's docstring gloss: "the variant doesn't
change a coding base" instead of "the variant is intronic"
(more precise — IntronicSpliceSite is itself intronic, but
the load-bearing property is the lack of coding consequence).
- TODO marker on test_effect_if_splicing_unchanged_matches_alternate_effect
so the breaking-6.0 work knows to revisit it when
ExonicSpliceSite.alternate_effect goes away.
- Drop a duplicate transition sentence ("Two splice-effect
richness levels are available; the rest of this section
walks through both.") — the next H3 already signals that.
1230 tests pass.
Review noted two stale paragraphs that survived earlier passes, plus structural noise from accumulated edits. Bug fixes: - "Relationship between the two" claimed `alternate_effect` worked on both shapes; `SpliceOutcomeSet.alternate_effect` was deleted in this PR. Rewrote to map each shape to its actual accessor. - "Limitations" still summarized the window as "first 3-6 intronic, donor/acceptor", dropping the donor/acceptor asymmetry now documented in "When splice disruption is in play". Removed the now-wrong summary; kept the "what's not detected" content. Structural cleanups: - Moved "Common questions" up — now sits right after "Opt-in: full possibility set" rather than after the deep Provenance/Picking subsections. Practical readers reach the cheat sheet without scrolling past 200 lines of prose. - Replaced the duplicate `effect_if_splicing_unchanged` paragraph at the end of "Opt-in" with a pointer to the cheat sheet. The accessor was being documented three times. - Replaced the inline window description in "Splice and coding effects can co-occur" with a back-reference to the table above. The window definition now lives in exactly one place. 1230 tests pass.
Lands the breaking-6.0 splice unification that the previous commits
prepared the ground for. Every splice-disrupting variant now wraps
in a SpliceOutcomeSet at the user-facing API — no more
splice_outcomes=True opt-in flag. Cost is amortised via lazy
candidate computation; legacy access patterns keep working through
compat shims.
Core changes
------------
* `SpliceOutcomeSet.__init__` gains a lazy construction path:
- Materialized path (explicit `candidates=...`) still works for
`with_rna_evidence` rebuilds, deserialization, and tests.
- Lazy path (used by `enumerate_splice_outcomes`) stores the raw
`splice_effect` + `mechanism_order` + `genomic_sequence` and
eagerly builds only the cheap `NormalSplicing` candidate;
`ExonSkipping` / `IntronRetention` / cryptic candidates
materialize on first `.candidates` access via a property,
then cache.
* `effect_if_splicing_unchanged` (and its `alternate_effect`
back-compat alias) read from the eager `NormalSplicing` slot —
no full materialization for the common "if splicing proceeds"
question.
* `SpliceOutcomeSet.modifies_protein_sequence = True` (class attr,
shadows the inherited property). Fixes the silent-splice filter
bug: variants whose `NormalSplicing.coding_effect` happens to be
Silent were previously dropped by `drop_silent_and_noncoding()`.
A splice disruption is always *potentially* protein-modifying
via a non-NormalSplicing candidate.
* `SpliceOutcomeSet` is now a `TranscriptMutationEffect` subclass
(in addition to `MultiOutcomeEffect`), so it carries `gene` and
matches the standard `isinstance(effect, TranscriptMutationEffect)`
filter used by downstream consumers (dbnsfp validation, etc.).
* `enumerate_splice_outcomes(splice_set, genomic_sequence=...)`
accepts an already-wrapped SpliceOutcomeSet and rebuilds with the
new provider. Lets test fixtures and downstream users inject a
genomic-sequence source post-classification.
Switch
------
* `splice_outcomes=` parameter removed from `Variant.effects()` /
`VariantCollection.effects()` / `predict_variant_effects()`. The
collection-level wrap (`wrap_splice_effects_in_collection`) is
unconditional.
* `predict_variant_effect_on_transcript()` refactored: the body
moves to `_predict_variant_effect_on_transcript_raw()` (internal
helper that returns raw splice classes); the public function
wraps via `enumerate_splice_outcomes`.
* `FastEffectAnnotator.annotate_on_transcript` now calls the raw
helper directly, returning the unwrapped splice class. The
collection-level wrap converts to SpliceOutcomeSet at the
boundary. This preserves protein_diff's dual-dispatch
(`isinstance(fast_effect, ExonicSpliceSite)` keeps working
inside the annotator) while still wrapping at the user-facing
surface.
Compat shims
------------
* `SpliceOutcomeSet.alternate_effect` aliases
`effect_if_splicing_unchanged`. Existing callers reading
`effect.alternate_effect` (when `effect` was an `ExonicSpliceSite`)
keep working when the effect is now a SpliceOutcomeSet wrapping
ExonicSpliceSite.
* `select_between_exonic_splice_site_and_alternate_effect` now
unwraps SpliceOutcomeSet to its `alternate_effect` when (a) the
disrupted signal class is ExonicSpliceSite and (b) the alternate
has higher effect_priority than the splice signal. Preserves the
legacy `top_priority_effect` behavior for variants that are
splice-disrupting AND introduce a PrematureStop / FrameShift /
etc. — the more disruptive coding consequence still surfaces as
the canonical top.
Tests
-----
* Existing tests updated to the new shape — removed
`splice_outcomes=True` from ~60 callsites; replaced
`effect.__class__ is SpliceDonor` with
`effect.disrupted_signal_class is SpliceDonor`; updated COSMIC
and dbNSFP test helpers to handle SpliceOutcomeSet via the
`alternate_effect` compat alias.
* Deleted `tests/test_exonic_splice_site_multi_outcome.py` — its
tests covered the #299 retrofit (making ExonicSpliceSite a
MultiOutcomeEffect), which is superseded by always-on
SpliceOutcomeSet. The relevant assertions are covered in
test_splice_outcomes.py.
* Parametrized `test_intronic_splice_classes_lack_alternate_effect`
retired in favour of `test_intronic_disruptions_have_no_coding_alternate`
— same parametrize, new semantics (both
`effect.alternate_effect is None` and
`effect.effect_if_splicing_unchanged is None`).
Docs
----
* docs/effect_annotation.md: collapsed "Default lightweight 2-outcome
form" + "Opt-in full possibility set" + "Relationship between
the two" into a single "The SpliceOutcomeSet shape" section that
documents the lazy construction and the `alternate_effect`
back-compat alias. "Common questions" updated to drop the
dual-shape `isinstance` guard.
* README.md: splice-site section reworded to say the four
SpliceSite classes survive as `disrupted_signal_class` /
`splice_signal` rather than as top-level effects. Position
ranges in the table refined to match the classifier.
* CHANGELOG.md: Breaking entry documents the removed flag, the
isinstance discriminator migration, the silent-splice filter fix,
the lazy candidates, and the new TranscriptMutationEffect
inheritance.
1219 tests pass.
Review pointed out the docstring claimed "highest-priority candidate's inner effect" but the implementation reads ``alternate_effect`` (i.e. ``NormalSplicing.coding_effect``). Implementation is the intended behaviour (preserves legacy ``top_priority_effect`` semantics, keeps the SpliceOutcomeSet as the canonical top for mechanism-only outcomes like ExonSkipping); docstring was misleading. Rewrote the docstring to be explicit: - "Alternate" is the if-splicing-proceeds coding effect, not the highest-priority across mechanisms. - Behavior contract: exonic variant that introduces PrematureStop in-codon surfaces as PrematureStop; mechanism-only outcomes leave the SpliceOutcomeSet as the top so the full possibility set stays visible.
Promotes the splice-unification work (#391, #392) to 6.0.0: SpliceOutcomeSet always-on, splice_outcomes= flag removed, discriminator migration documented in the Breaking changelog. CHANGELOG: move the [Unreleased] section to [v6.0.0] with today's date. All accumulated entries since 5.0.6 — splice unification + RNAReadPhasingSource + IUPAC ambiguity-code fix + apply_variants_to_transcript insertion-abutting fix — ship together.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Implements #391 — splice unification for varcode 6.0.
Summary
Every splice-disrupting variant now wraps in a
SpliceOutcomeSetat the user-facing API. No moresplice_outcomes=Trueopt-in flag. Cost is amortised via lazy candidate computation; legacy access patterns keep working through compat shims.What landed
SpliceOutcomeSet—Variant.effects()/Variant.effect_on_transcript(t)/VariantCollection.effects()always wrap splice-disrupting variants. Thesplice_outcomes=parameter is removed.NormalSplicingcandidate is built eagerly;ExonSkipping/IntronRetention/CrypticDonor/CrypticAcceptormaterialise on first.candidatesaccess and are cached. Filter pipelines that drop variants early viamodifies_protein_sequence/effect_prioritynever trigger the expensive math.SpliceOutcomeSet:most_likely_effect— top predicted splicing mechanismeffect_if_splicing_unchanged— coding effect if splicing proceeds (orNone)candidates— full possibility set (triggers materialization)SpliceOutcomeSet.modifies_protein_sequence = True(hardcoded class attr). Variants whoseNormalSplicing.coding_effecthappens to beSilentare no longer silently dropped bydrop_silent_and_noncoding(). A splice disruption is always potentially protein-modifying via a non-NormalSplicingcandidate.SpliceOutcomeSetis aTranscriptMutationEffect(alongsideMultiOutcomeEffect) — carriesgene/transcriptand matches the standard downstreamisinstance(effect, TranscriptMutationEffect)filter.Compat shims
SpliceOutcomeSet.alternate_effectaliaseseffect_if_splicing_unchanged. Existing code readingeffect.alternate_effect(wheneffectwas anExonicSpliceSite) keeps working when the effect is now a SpliceOutcomeSet wrapping ExonicSpliceSite.select_between_exonic_splice_site_and_alternate_effectnow unwraps SpliceOutcomeSet →alternate_effectwhen (a) the disrupted signal class isExonicSpliceSiteand (b) the alternate has highereffect_priority. Preserves the legacytop_priority_effectbehavior: splice variants that also introduce PrematureStop / FrameShift / etc. still surface the coding consequence as the canonical top.FastEffectAnnotator.annotate_on_transcriptreturns the raw splice class (not the wrapper) soprotein_diff's dual-dispatchisinstance(fast_effect, ExonicSpliceSite)keeps working. Wrapping is applied at the collection boundary inpredict_variant_effects.enumerate_splice_outcomes(splice_set, genomic_sequence=...)accepts an already-wrapped SpliceOutcomeSet and rebuilds with a new genomic sequence provider. Lets tests / downstream callers inject a sequence source post-classification.Migration
The one breaking change for callers is the discriminator pattern. Replace:
with:
Attribute access patterns (
.alternate_effect,.aa_ref,.short_description, etc.) keep working through the wrapper.Tests + docs
test_exonic_splice_site_multi_outcome.pyremoved (its assertions are covered intest_splice_outcomes.py).docs/effect_annotation.md: new "When splice disruption is in play" + "Splice and coding effects can co-occur" sections; "Default 2-outcome form" / "Opt-in possibility set" collapsed into a single "TheSpliceOutcomeSetshape" section; "Common questions" cheat sheet updated for the single-shape world.README.md: splice-signal classes documented asdisrupted_signal_class/splice_signalreferences rather than top-level effects.CHANGELOG.md: Breaking entry covers the flag removal, isinstance migration, silent-splice filter fix, lazy candidates, and theTranscriptMutationEffectinheritance.