Background
VCF 4.0+ allows alternate alleles that are not literal nucleotide strings:
Symbolic alleles, placeholders inside angle brackets with detail supplied in INFO:
- `<DEL>`: deletion
- `<DUP>`: duplication (tandem, dispersed)
- `<INS>`: insertion of unspecified sequence
- `<INV>`: inversion
- `<CN0>`, `<CN1>`, `<CN2>`, `<CN3>`, ...: copy number states
- `<INS:ME:ALU>`, `<INS:ME:LINE1>`, `<INS:ME:SVA>`: mobile element insertions
- `<DEL:ME:ALU>`: deletion of a mobile element

Breakend (BND) notation, for two-ended rearrangements joining distant loci:
- `G]17:198982]`: G joined to position 17:198982, orientations encoded by bracket direction
- `]17:198982]G`, `[13:123456[T`, `T[13:123456[`: variants for different orientations

Spanning deletion placeholder:
- `*`: indicates an allele deleted by an upstream variant

Current state (after #88 is fixed in PR #XXX)
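Detecting these alleles needs nothing beyond the ALT string itself. The following is a minimal sketch of a detect-and-skip step; `classify_alt` and `keep_literal_alts` are hypothetical names chosen for illustration and are not the actual internals of `load_vcf()`. Single breakends such as `G.` are not handled and fall into the "unknown" bucket.

```python
import re
import warnings

# Literal nucleotide alleles: one or more of A, C, G, T, N (case-insensitive).
_LITERAL_RE = re.compile(r"[ACGTN]+", re.IGNORECASE)


def classify_alt(alt: str) -> str:
    """Classify an ALT string by syntax alone (hypothetical helper)."""
    if alt == "*":
        return "spanning_deletion"
    if alt.startswith("<") and alt.endswith(">"):
        return "symbolic"
    if "[" in alt or "]" in alt:
        return "breakend"
    if _LITERAL_RE.fullmatch(alt):
        return "literal"
    return "unknown"


def keep_literal_alts(alts):
    """Return only literal alleles, warning about every allele that is skipped."""
    kept = []
    for alt in alts:
        kind = classify_alt(alt)
        if kind == "literal":
            kept.append(alt)
        else:
            warnings.warn(f"Skipping unsupported ALT allele {alt!r} ({kind})")
    return kept
```

The symbolic check runs before the breakend check, so an allele wrapped entirely in angle brackets is never mistaken for a breakend.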
`load_vcf()` now detects these and skips them with a visible warning instead of crashing. This preserves the rest of the VCF, but it still drops real variants that a user might care about.

Desired state
Represent these alleles as first-class variant types so downstream code (effect prediction, filtering, annotation export) can reason about them. This ties directly to the in-progress structural variant work:
- `Breakpoint`/`Translocation`/`Inversion`/`Duplication` classes with two-locus representation, exactly what BND and the large-scale symbolic alleles need

Implementation sketch
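As a starting point for the mapping in this section, here is a minimal Python sketch of two building blocks: splitting a paired-breakend ALT string into its parts, and mapping a symbolic allele to a target type. Everything named here (`Breakend`, `parse_breakend`, `symbolic_target`, and the string labels standing in for classes) is hypothetical and does not reflect existing code or the classes planned in #257.

```python
import re
from typing import NamedTuple, Optional


class Breakend(NamedTuple):
    local_bases: str          # the base(s) written next to the brackets
    mate_chrom: str
    mate_pos: int
    mate_extends_right: bool  # "[" brackets: joined piece extends right of mate_pos
    local_first: bool         # True for t[p[ and t]p], False for [p[t and ]p]t


_BND_RE = re.compile(
    r"^(?P<lead>[ACGTNacgtn]*)"
    r"(?P<open>[\[\]])(?P<chrom>[^\[\]]+):(?P<pos>\d+)(?P<close>[\[\]])"
    r"(?P<trail>[ACGTNacgtn]*)$"
)


def parse_breakend(alt: str) -> Optional[Breakend]:
    """Parse a paired-breakend ALT; return None if it is not one."""
    m = _BND_RE.match(alt)
    if m is None or m["open"] != m["close"]:
        return None
    lead, trail = m["lead"], m["trail"]
    if bool(lead) == bool(trail):  # bases must sit on exactly one side
        return None
    return Breakend(lead or trail, m["chrom"], int(m["pos"]),
                    m["open"] == "[", bool(lead))


# String labels stand in for the (not yet existing) variant classes.
_SYMBOLIC_TARGETS = {
    "DEL": "SV Deletion",
    "INS": "SV Insertion",
    "INV": "Inversion",
    "DUP": "Duplication",
}


def symbolic_target(alt: str) -> Optional[str]:
    """Map e.g. <DEL:ME:ALU> to a target type label; None if unrecognized."""
    if not (alt.startswith("<") and alt.endswith(">")):
        return None
    head = alt[1:-1].split(":")[0]  # <INS:ME:ALU> -> "INS"
    if head.startswith("CN") and head[2:].isdigit():
        return "CopyNumberVariant"
    return _SYMBOLIC_TARGETS.get(head)
```

Returning `None` for anything unrecognized lets a caller fall back to the current skip-with-warning behavior instead of failing on alleles the sketch does not cover.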
- INFO fields involved: `SVTYPE`, `END`, `SVLEN`, `CIPOS`, `CIEND`, `MATEID`, `CHR2`/`POS2`, `INSSEQ`, etc.
- `<DEL>` / `<DEL:ME:*>` → SV Deletion (single-locus, large; from `START` to `END`)
- `<INS>` / `<INS:ME:*>` → SV Insertion (with optional inserted sequence)
- `<INV>` → Inversion (#257, "Add structural variant types (translocations, inversions, duplications, breakpoints)")
- `<DUP>` → Duplication (#257)
- `<CN*>` → CopyNumberVariant (new class, or subclass of Duplication/Deletion based on count)
- BND: use `MATEID` pairs to construct a single Translocation/Breakpoint (#257) from the pair rather than two half-breakends.
- Spanning deletion `*`: represent as a reference to the variant that consumed the position, or skip with metadata (it's a placeholder, not an independent variant).

Relation to existing issues