Add support for binned quality scores in Illumina sequencers and test phase 1 coverage #133
Merged
Pure additive coverage based on the rneat-test-coverage-plan. No new dev-deps and no new test directories: every change is inside an existing `#[cfg(test)] mod tests` block.

Binned quality follow-ups (3 tests, section 1.1):
- QualityScoreModel: JSON-GZ round trip preserves binned_scores and quality_score_options bit-for-bit.
- SequencingErrorModel wrapper round-trips a binned inner model and emits only bin values when sampled via the wrapper API.
- gen-seq-error-model runner: binned vs unbinned error_rate must differ (locks in that bin-snapped counts feed the rate calculation).

VCF parser robustness (6 tests, section 1.2):
- Multi-allelic ALT records are skipped while sibling SNPs pass.
- `<DEL>`-style SV ALTs parse without panicking (regression bait).
- INFO strings with embedded semicolons survive intact (tab is the only field delimiter).
- Gzipped VCF input parses identically to plain.
- `./.` genotype is treated as homozygous today; lock that in until a deliberate fix.
- Mixed phased (`0|1`) and unphased (`0/1`) GTs both parse as Heterozygous.

gen-gc-bias-model gzip equivalence (2 tests, section 1.3):
- Plain text vs gzipped coverage yields identical depths for bedtools-genomecov-d and bedtools-genomecov-dz. The SamtoolsDepth case already existed.

gen-mut-model expansion (3 tests, section 1.4):
- Indels: a VCF with one insertion and one deletion runs to a written model without erroring.
- Reference-mismatch records are logged and skipped, not fatal.
- BED entries for nonexistent contigs do not panic the runner.

filter_reads edge cases (3 tests, section 1.5):
- Empty FASTQ in → empty gzipped FASTQ out.
- BED with no overlap → every read filtered, no error.
- VCF with no passing records keeps header lines but emits no data.

gen-frag-length-model edge cases (3 tests, section 1.6):
- All-zero-TLEN BAM → EmptyData (codifies the tlen>0 reader filter).
- Single-ended BAM → EmptyData (codifies the SEGMENTED flag filter).
- Low-MAPQ-only BAM → EmptyData (codifies the MAPQ>10 filter).

Adds two new test-only helpers for these flag/MAPQ permutations.

gen-seq-error-model negative paths (2 tests, section 1.7):
- Truncated FASTQ (3 lines, no qual) → MalformedFastq.
- First record with empty qual line → MalformedFastq.

cargo test --workspace: 377 passing (up from 355).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
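For readers unfamiliar with the genotype behavior the VCF parser tests lock in, here is a minimal sketch of one way such a classification could work. It is not taken from the rneat source: the `Zygosity` enum and `classify_gt` function are hypothetical names chosen for illustration, and the real parser may be structured quite differently. The sketch only shows why `0|1` and `0/1` land in the same bucket and why `./.` falls out as homozygous when missing alleles are compared like any other allele.

```rust
/// Hypothetical zygosity classification (illustrative names, not rneat's API).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Zygosity {
    Homozygous,
    Heterozygous,
}

/// Classify a VCF GT string such as "0/1", "0|1", "1/1", or "./.".
///
/// Assumption: alleles are compared as plain strings, so a missing
/// genotype "./." has two equal alleles and is reported as homozygous.
fn classify_gt(gt: &str) -> Zygosity {
    // Accept both the phased ('|') and unphased ('/') separators.
    let mut alleles = gt.split(|c: char| c == '/' || c == '|');
    // `split` always yields at least one item, even for an empty string.
    let first = alleles.next().unwrap_or(".");
    // Homozygous when every remaining allele matches the first one.
    if alleles.all(|allele| allele == first) {
        Zygosity::Homozygous
    } else {
        Zygosity::Heterozygous
    }
}
```

Under these assumptions, a test in the style described above would assert that `classify_gt("0|1")` and `classify_gt("0/1")` both return `Zygosity::Heterozygous`, and that `classify_gt("./.")` returns `Zygosity::Homozygous` until the behavior is deliberately changed.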
Also updated many tests.
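As an illustration of the gen-frag-length-model edge cases listed in the description, the three reader filters (SEGMENTED flag set, MAPQ>10, tlen>0) and the EmptyData outcome can be sketched as below. This is a simplified stand-in, not the project's code: `Rec`, `FragError`, and `fragment_lengths` are hypothetical names, and a real implementation would read these fields from BAM records through a library rather than from a plain struct.

```rust
/// Hypothetical minimal view of an alignment record (illustrative only).
struct Rec {
    flags: u16,
    mapq: u8,
    tlen: i32,
}

/// SAM flag bit 0x1: the template has multiple segments (paired read).
const SEGMENTED: u16 = 0x1;

/// Hypothetical error type mirroring the EmptyData outcome in the tests.
#[derive(Debug, PartialEq)]
enum FragError {
    EmptyData,
}

/// Collect fragment lengths from records that pass all three filters.
/// Returns `EmptyData` when nothing survives filtering.
fn fragment_lengths(records: &[Rec]) -> Result<Vec<u32>, FragError> {
    let lens: Vec<u32> = records
        .iter()
        .filter(|r| (r.flags & SEGMENTED) != 0) // drop single-ended reads
        .filter(|r| r.mapq > 10) // drop low-confidence alignments
        .filter(|r| r.tlen > 0) // keep one positive TLEN per pair
        .map(|r| r.tlen as u32) // safe: tlen is known to be positive here
        .collect();
    if lens.is_empty() {
        Err(FragError::EmptyData)
    } else {
        Ok(lens)
    }
}
```

Each edge case then corresponds to an input where exactly one filter removes every record: all-zero TLEN, no SEGMENTED bit, or MAPQ at or below 10.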