Add support for binned quality scores in Illumina sequencers and test phase 1 coverage by joshfactorial · Pull Request #133 · ncsa/rusty-neat

joshfactorial · 2026-05-20T14:05:33Z

Also updated many tests.

Pure additive coverage based on the rneat-test-coverage-plan. No new dev-deps and no new test directories; every change is inside an existing `#[cfg(test)] mod tests` block.

Binned quality follow-ups (3 tests, section 1.1):
- QualityScoreModel: JSON-GZ round trip preserves binned_scores and quality_score_options bit-for-bit.
- SequencingErrorModel wrapper round-trips a binned inner model and emits only bin values when sampled via the wrapper API.
- gen-seq-error-model runner: binned vs unbinned error_rate must differ (locks in that bin-snapped counts feed the rate calculation).

VCF parser robustness (6 tests, section 1.2):
- Multi-allelic ALT records are skipped while sibling SNPs pass.
- `<DEL>`-style SV ALTs parse without panicking (regression bait).
- INFO strings with embedded semicolons survive intact (tab is the only field delimiter).
- Gzipped VCF input parses identically to plain.
- `./.` genotype is treated as homozygous today; lock that in until a deliberate fix.
- Mixed phased (`0|1`) and unphased (`0/1`) GTs both parse as Heterozygous.

gen-gc-bias-model gzip equivalence (2 tests, section 1.3):
- Plain text vs gzipped coverage yields identical depths for bedtools-genomecov-d and bedtools-genomecov-dz. SamtoolsDepth case already existed.

gen-mut-model expansion (3 tests, section 1.4):
- Indels: a VCF with one insertion and one deletion runs to a written model without erroring.
- Reference-mismatch records are logged-and-skipped, not fatal.
- BED entries for nonexistent contigs do not panic the runner.

filter_reads edge cases (3 tests, section 1.5):
- Empty FASTQ in → empty gzipped FASTQ out.
- BED with no overlap → every read filtered, no error.
- VCF with no passing records keeps header lines but emits no data.

gen-frag-length-model edge cases (3 tests, section 1.6):
- All-zero-TLEN BAM → EmptyData (codifies the tlen>0 reader filter).
- Single-ended BAM → EmptyData (codifies the SEGMENTED flag filter).
- Low-MAPQ-only BAM → EmptyData (codifies the MAPQ>10 filter).

Adds two new test-only helpers for these flag/MAPQ permutations.

gen-seq-error-model negative paths (2 tests, section 1.7):
- Truncated FASTQ (3 lines, no qual) → MalformedFastq.
- First record with empty qual line → MalformedFastq.

`cargo test --workspace`: 377 passing (up from 355).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
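For readers who want a concrete picture of the VCF parser behaviors listed under section 1.2, here is a minimal, standard-library-only sketch. It is not the rusty-neat parser: `Zygosity`, `classify_gt`, and `parse_record` are hypothetical names invented for illustration, and the sketch assumes a single-sample VCF line with GT as the first FORMAT key. It only illustrates the rules the tests lock in: split on tabs alone, skip multi-allelic ALTs, accept symbolic ALTs, and treat `./.` as homozygous.

```rust
/// Zygosity labels for this sketch (hypothetical; not rusty-neat's actual type).
#[derive(Debug, PartialEq)]
enum Zygosity {
    Homozygous,
    Heterozygous,
}

/// Classify a GT string such as "0/1", "0|1", "1/1", or "./.".
/// Both '/' (unphased) and '|' (phased) separators are accepted.
fn classify_gt(gt: &str) -> Zygosity {
    let alleles: Vec<&str> = gt.split(|c: char| c == '/' || c == '|').collect();
    // Identical alleles, including the missing "./." case, count as homozygous,
    // mirroring the behavior the PR describes as locked in for now.
    if alleles.windows(2).all(|pair| pair[0] == pair[1]) {
        Zygosity::Homozygous
    } else {
        Zygosity::Heterozygous
    }
}

/// Parse one single-sample VCF data line into (ALT, INFO, zygosity).
/// Returns None for lines with fewer than 10 columns and for multi-allelic ALTs.
fn parse_record(line: &str) -> Option<(String, String, Zygosity)> {
    // Tab is the only field delimiter, so semicolons inside INFO are left alone.
    let fields: Vec<&str> = line.split('\t').collect();
    if fields.len() < 10 {
        return None;
    }
    let alt = fields[4];
    if alt.contains(',') {
        return None; // multi-allelic record: skipped, not an error
    }
    // Assumption: GT is the first colon-separated entry of the sample column.
    let gt = fields[9].split(':').next().unwrap_or("./.");
    Some((alt.to_string(), fields[7].to_string(), classify_gt(gt)))
}
```

Under these assumptions, a record with ALT `G,T` yields `None`, a record with ALT `<DEL>` parses like any other, and `0|1` and `0/1` both classify as `Heterozygous`.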

joshfactorial changed the base branch from main to develop May 20, 2026 14:05

joshfactorial changed the title ~~Add support for binned quality scores in Illumina sequencers~~ Add support for binned quality scores in Illumina sequencers and test phase 1 coverage May 20, 2026

joshfactorial linked an issue May 20, 2026 that may be closed by this pull request

Fill out testing #104

Closed

joshfactorial merged commit 0e1ecd8 into develop May 20, 2026
1 check passed

joshfactorial deleted the test/phase-1-coverage branch May 20, 2026 14:09

joshfactorial merged 1 commit into develop from test/phase-1-coverage
