varcode.transforms.combine_cis_snvs — combine in-cis adjacent SNVs into MNVs #368

@iskandr

Description

Follow-up from #364 / #367 (the varcode.transforms module shape). One of four planned transforms enumerated in docs/transforms.md and the module docstring.

Scope

Reduce two or more adjacent SNVs that share a phase set (or are otherwise determined to be in cis) into a single MNV. This removes the need for HaplotypeEffect/PhaseAmbiguousEffect wrapping in the common case where two SNVs sit in the same codon on the same haplotype and can be classified as one combined codon substitution.

Signature:

def combine_cis_snvs(vc, phase_resolver) -> VariantCollection:
    """Combine adjacent in-codon SNVs sharing a phase set into MNVs.

    Cardinality: reduces.
    """

Contract (per transforms module conventions)

  • Cardinality: reduces.
  • Provenance: combined MNV carries source_variants=(snv1, snv2, ...).
  • Metadata behavior: GT must agree across all source SNVs (raises on mismatch, the same rule as pair_breakends); other FORMAT fields are taken from the lexicographically earliest source.
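
A minimal sketch of the metadata rule above, not the varcode implementation. The function name merge_format_fields and the (contig, pos, ref, alt) sort key are hypothetical, and reading "lexicographically earliest" as "smallest such tuple" is an assumption of this example:

```python
from typing import Dict, List, Tuple

# Hypothetical sort key for a source SNV: (contig, pos, ref, alt).
SourceKey = Tuple[str, int, str, str]


def merge_format_fields(
    sources: List[Tuple[SourceKey, Dict[str, str]]],
) -> Dict[str, str]:
    """Merge per-sample FORMAT fields of the SNVs that form one MNV."""
    if not sources:
        raise ValueError("need at least one source SNV")

    # GT must agree across every source SNV; otherwise refuse to combine.
    genotypes = {fields.get("GT") for _, fields in sources}
    if len(genotypes) != 1:
        raise ValueError(
            f"GT mismatch across source SNVs: {sorted(map(str, genotypes))}"
        )

    # All other fields come from the earliest source (assumed ordering:
    # plain tuple comparison on the sort key).
    _, earliest_fields = min(sources, key=lambda item: item[0])
    return dict(earliest_fields)
```

Raising on a GT mismatch, rather than picking one genotype, means a merged record never reports a genotype that one of its sources did not have.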

Pairing rule

Two SNVs are combined when all of the following hold:

  1. Both share a phase set (the phase_resolver answers in_cis(a, b) is True), OR both are homozygous-alt for the same sample at distinct positions within a single codon.
  2. They fall within the same codon of a transcript (a 3 bp window).
  3. They're on the same contig.
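
The three conditions can be sketched as a single predicate. This is an illustration under stated assumptions, not the proposed implementation: Snv, DictPhaseResolver, should_combine, codon_of and is_hom_alt are hypothetical names. The only thing taken from the issue is that the resolver exposes in_cis(a, b); that it returns None when phase is unknown is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Snv:
    """Hypothetical stand-in for a single-nucleotide variant."""
    contig: str
    pos: int
    ref: str
    alt: str


class DictPhaseResolver:
    """Toy resolver: maps an SNV to (phase_set_id, haplotype_index)."""

    def __init__(self, phase: Dict[Snv, Tuple[str, int]]):
        self._phase = phase

    def in_cis(self, a: Snv, b: Snv) -> Optional[bool]:
        pa, pb = self._phase.get(a), self._phase.get(b)
        if pa is None or pb is None or pa[0] != pb[0]:
            return None  # unphased or different phase sets: unknown
        return pa[1] == pb[1]


def should_combine(
    a: Snv,
    b: Snv,
    phase_resolver: DictPhaseResolver,
    codon_of: Callable[[Snv], Optional[int]],  # codon index, None if non-coding
    is_hom_alt: Callable[[Snv], bool],
) -> bool:
    # Rule 3: same contig.
    if a.contig != b.contig:
        return False
    # Two alleles at one position cannot form an MNV.
    if a.pos == b.pos:
        return False
    # Rule 2: both fall in the same transcript codon.
    codon_a, codon_b = codon_of(a), codon_of(b)
    if codon_a is None or codon_a != codon_b:
        return False
    # Rule 1: known cis, or both homozygous-alt for the sample.
    if phase_resolver.in_cis(a, b) is True:
        return True
    return is_hom_alt(a) and is_hom_alt(b)
```

The explicit `is True` check keeps an unknown phase (None) from being treated as cis, so those pairs stay on the existing PhaseAmbiguousEffect path.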

Tests

  • Two SNVs at codon positions 1+2 of the same codon, in cis -> single MNV; effect prediction emits one Substitution instead of two adjacent ones.
  • Two SNVs in cis but spanning a codon boundary -> not combined (out of codon window).
  • Two SNVs in trans -> not combined.
  • Phase unknown -> not combined; existing PhaseAmbiguousEffect path still handles them.
  • Three SNVs in cis within one codon -> one MNV with source_variants=(a, b, c).
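
The multi-SNV case can be illustrated with a toy grouping pass. Everything here is hypothetical scaffolding (Snv, Mnv, combine_in_codon, group_key, ref_base_at); it assumes the caller has already reduced phase and codon membership to one hashable key per SNV, with None meaning "leave alone" (for example, phase unknown):

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Snv:
    contig: str
    pos: int
    ref: str
    alt: str


@dataclass(frozen=True)
class Mnv:
    contig: str
    pos: int
    ref: str
    alt: str
    source_variants: Tuple[Snv, ...]


def combine_in_codon(
    snvs: List[Snv],
    group_key: Callable[[Snv], Optional[Hashable]],  # e.g. (codon, haplotype)
    ref_base_at: Callable[[str, int], str],          # reference lookup for gaps
) -> List[Union[Snv, Mnv]]:
    passthrough: List[Union[Snv, Mnv]] = []
    groups: Dict[Tuple[str, Hashable], List[Snv]] = {}
    for snv in sorted(snvs, key=lambda s: (s.contig, s.pos)):
        key = group_key(snv)
        if key is None:
            passthrough.append(snv)  # not eligible: emit unchanged
        else:
            groups.setdefault((snv.contig, key), []).append(snv)

    combined: List[Union[Snv, Mnv]] = []
    for (contig, _), members in groups.items():
        by_pos = {m.pos: m for m in members}
        # Singletons and groups with two alleles at one position stay as-is.
        if len(members) < 2 or len(by_pos) != len(members):
            combined.extend(members)
            continue
        span = range(members[0].pos, members[-1].pos + 1)
        # Positions without an SNV keep the reference base on both sides.
        ref = "".join(
            by_pos[p].ref if p in by_pos else ref_base_at(contig, p) for p in span
        )
        alt = "".join(
            by_pos[p].alt if p in by_pos else ref_base_at(contig, p) for p in span
        )
        combined.append(Mnv(contig, span.start, ref, alt, tuple(members)))
    return sorted(passthrough + combined, key=lambda v: (v.contig, v.pos))
```

In this sketch, source_variants preserves position order, so three SNVs a, b, c in one codon yield a single MNV with source_variants=(a, b, c).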

Composition

combine_cis_snvs should compose cleanly after pair_breakends: the two transforms have separate scopes, both reduce, and both are idempotent.
