Follow-up from #364 / #367 (the varcode.transforms module shape). One of four planned transforms enumerated in docs/transforms.md and the module docstring.
Scope
Reduce two or more adjacent SNVs that share a phase set (or are otherwise determined cis) into a single MNV. Removes the need for HaplotypeEffect/PhaseAmbiguousEffect wrapping in the common case where two SNVs sit in the same codon on the same haplotype and can be classified as one combined codon substitution.
Signature:
    def combine_cis_snvs(vc, phase_resolver) -> VariantCollection:
        """Combine adjacent in-codon SNVs sharing a phase set into MNVs.

        Cardinality: reduces.
        """
Contract (per transforms module conventions)
- Cardinality: reduces.
- Provenance: combined MNV carries source_variants=(snv1, snv2, ...).
- Metadata behavior: GT must agree across all source SNVs (raises on mismatch, the same rule as pair_breakends); other FORMAT fields are taken from the lexicographically earlier source.
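As an illustration of the contract above, here is a minimal sketch of the merge step. Everything in it is hypothetical and is not the varcode API: the `Snv` and `Mnv` dataclasses, the `merge_snvs` helper, and the `ref_base` lookup are stand-ins invented for this example. It assumes GT is available as a plain string per variant and fills any unchanged base between two SNVs (codon positions 1 and 3) from the reference. Handling of the other FORMAT fields is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Snv:
    # Hypothetical stand-in for a single-nucleotide variant.
    contig: str
    pos: int
    ref: str
    alt: str
    gt: str


@dataclass(frozen=True)
class Mnv:
    # Hypothetical stand-in for the combined variant.
    contig: str
    start: int
    ref: str
    alt: str
    gt: str
    source_variants: Tuple[Snv, ...]  # provenance, in position order


def merge_snvs(snvs: Sequence[Snv], ref_base: Callable[[str, int], str]) -> Mnv:
    """Merge SNVs already judged combinable into one MNV.

    ref_base(contig, pos) is an assumed reference lookup used only for
    positions between the SNVs that carry no variant.
    """
    ordered = sorted(snvs, key=lambda v: v.pos)
    gts = {v.gt for v in ordered}
    if len(gts) != 1:
        # Contract: GT must agree across all sources, otherwise raise.
        raise ValueError(f"GT mismatch across source SNVs: {sorted(gts)}")
    by_pos = {v.pos: v for v in ordered}
    contig = ordered[0].contig
    ref = alt = ""
    for pos in range(ordered[0].pos, ordered[-1].pos + 1):
        v = by_pos.get(pos)
        if v is None:
            base = ref_base(contig, pos)  # unchanged base inside the span
            ref += base
            alt += base
        else:
            ref += v.ref
            alt += v.alt
    return Mnv(contig, ordered[0].pos, ref, alt, ordered[0].gt, tuple(ordered))
```

Keeping `source_variants` in position order makes the provenance tuple deterministic regardless of input order.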
Pairing rule
Two SNVs are combined when all of:
- Both share a phase set (the phase_resolver answers in_cis(a, b) is True), OR both are homozygous-alt for the same sample at distinct positions within a single codon.
- They sit within the same transcript codon (a 3 bp window).
- They're on the same contig.
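The three conditions above can be sketched as a single predicate. This is a hedged illustration, not the proposed implementation: `Snv`, `should_combine`, and the `codon_index` callback are hypothetical names, `in_cis` is assumed to return True, False, or None (phase unknown), and the homozygous-alt flag is simplified to a single sample.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Snv:
    # Hypothetical stand-in; hom_alt is simplified to a single sample.
    contig: str
    pos: int
    hom_alt: bool = False


def should_combine(
    a: Snv,
    b: Snv,
    in_cis: Callable[[Snv, Snv], Optional[bool]],
    codon_index: Callable[[Snv], Optional[int]],
) -> bool:
    """Return True when two SNVs satisfy the pairing rule.

    codon_index is an assumed helper mapping a variant to its codon
    number in the transcript, or None outside the coding sequence.
    """
    if a.contig != b.contig or a.pos == b.pos:
        return False
    codon_a, codon_b = codon_index(a), codon_index(b)
    if codon_a is None or codon_a != codon_b:
        return False  # non-coding, or spanning a codon boundary
    if a.hom_alt and b.hom_alt:
        return True  # both haplotypes carry both alts, so phase is moot
    # Only an explicit True counts; False (trans) and None (unknown) do not.
    return in_cis(a, b) is True
```

Comparing with `is True` keeps the unknown-phase case out of the combined set, so those variants still reach the existing PhaseAmbiguousEffect path.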
Tests
- Two SNVs at codon positions 1+2 of the same codon, in cis -> single MNV; effect prediction emits one Substitution instead of two adjacent ones.
- Two SNVs in cis but spanning a codon boundary -> not combined (out of codon window).
- Two SNVs in trans -> not combined.
- Phase unknown -> not combined; existing PhaseAmbiguousEffect path still handles them.
- Three SNVs in cis within one codon -> one MNV with source_variants=(a, b, c).
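The three-SNV case implies the pairwise rule has to be lifted to groups. One possible way to do that, sketched under assumptions: variants are modeled as plain `(contig, pos)` tuples, `cluster_by_codon` and `codon_index` are hypothetical names, and a codon's SNVs are combined only if every pair is explicitly in cis. That all-pairs requirement is a design choice of this sketch, not something the issue specifies.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Variant = Tuple[str, int]  # (contig, pos); simplified stand-in


def cluster_by_codon(
    snvs: Sequence[Variant],
    in_cis: Callable[[Variant, Variant], Optional[bool]],
    codon_index: Callable[[Variant], Optional[int]],
) -> List[Tuple[Variant, ...]]:
    """Group SNVs into per-codon clusters that could each become one MNV."""
    buckets: Dict[Tuple[str, int], List[Variant]] = defaultdict(list)
    for v in snvs:
        codon = codon_index(v)
        if codon is not None:
            buckets[(v[0], codon)].append(v)

    clusters = []
    for key in sorted(buckets):
        members = sorted(buckets[key])
        # Require at least two SNVs, and every pair explicitly in cis.
        if len(members) >= 2 and all(
            in_cis(a, b) is True for a, b in combinations(members, 2)
        ):
            clusters.append(tuple(members))
    return clusters
```

Under this choice, a codon with two cis SNVs and a third of unknown phase is left uncombined entirely; a real implementation might prefer to combine the largest cis subset instead.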
Composition
Should compose cleanly after pair_breakends (separate scopes; both reduce; both idempotent).
See also