Add support for binned quality scores in Illumina sequencers and test phase 1 coverage #133

Merged
joshfactorial merged 1 commit into develop from test/phase-1-coverage on May 20, 2026

@joshfactorial (Collaborator) commented May 20, 2026

Also updated many tests.

Pure additive coverage based on the rneat-test-coverage-plan. No new
dev-deps and no new test directories — every change is inside an
existing `#[cfg(test)] mod tests` block.

Binned quality follow-ups (3 tests, section 1.1):
- QualityScoreModel: JSON-GZ round trip preserves binned_scores and
  quality_score_options bit-for-bit.
- SequencingErrorModel wrapper round-trips a binned inner model and
  emits only bin values when sampled via the wrapper API.
- gen-seq-error-model runner: binned vs unbinned error_rate must
  differ (locks in that bin-snapped counts feed the rate calculation).
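
The third check above relies on bin snapping changing the derived error rate. A minimal sketch of why that holds, assuming nearest-bin snapping and the standard Phred relation p = 10^(-Q/10). The names `snap_to_bin` and `mean_error_rate` and the bin values in the usage note are illustrative assumptions, not taken from the rneat code:

```rust
/// Snap a raw Phred score to the nearest value in `bins`.
/// Ties go to the lower bin. `bins` must be non-empty.
fn snap_to_bin(q: u8, bins: &[u8]) -> u8 {
    *bins
        .iter()
        .min_by_key(|&&b| ((b as i16 - q as i16).abs(), b))
        .expect("bins must not be empty")
}

/// Convert a Phred score to an error probability: p = 10^(-Q/10).
fn phred_to_error(q: u8) -> f64 {
    10f64.powf(-(q as f64) / 10.0)
}

/// Mean per-base error probability over a non-empty set of quality scores.
fn mean_error_rate(quals: &[u8]) -> f64 {
    quals.iter().map(|&q| phred_to_error(q)).sum::<f64>() / quals.len() as f64
}
```

With hypothetical bins `[2, 12, 23, 37]`, raw scores 15 and 35 snap to 12 and 37, so the mean error probability computed from the binned scores no longer equals the one computed from the raw scores. How the real runner turns counts into a rate may differ from this sketch.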

VCF parser robustness (6 tests, section 1.2):
- Multi-allelic ALT records are skipped while sibling SNPs pass.
- `<DEL>`-style SV ALTs parse without panicking (regression bait).
- INFO strings with embedded semicolons survive intact (tab is the
  only field delimiter).
- Gzipped VCF input parses identically to plain.
- `./.` genotype is treated as homozygous today; lock that in until a
  deliberate fix.
- Mixed phased (`0|1`) and unphased (`0/1`) GTs both parse as
  Heterozygous.
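
The parser behaviours listed above can be illustrated with a small standalone sketch. It assumes a single-sample VCF line with GT as the first FORMAT key. `Record`, `Genotype`, `parse_line`, and `classify_gt` are hypothetical names for this example and do not reflect the actual rneat parser:

```rust
/// Zygosity call for a single sample.
#[derive(Debug, PartialEq)]
enum Genotype {
    Homozygous,
    Heterozygous,
}

/// The subset of VCF columns this sketch keeps.
#[derive(Debug, PartialEq)]
struct Record {
    chrom: String,
    pos: u64,
    reference: String,
    alt: String,
    info: String,
    genotype: Genotype,
}

/// Split a GT value on either separator, so `0|1` and `0/1` are handled alike.
/// `./.` has two identical entries and therefore lands in Homozygous.
fn classify_gt(gt: &str) -> Genotype {
    let alleles: Vec<&str> = gt.split(|c: char| c == '/' || c == '|').collect();
    if alleles.iter().any(|a| *a != alleles[0]) {
        Genotype::Heterozygous
    } else {
        Genotype::Homozygous
    }
}

/// Parse one data line. Returns None for header lines, short lines,
/// unparsable positions, and multi-allelic ALT values (comma-separated).
fn parse_line(line: &str) -> Option<Record> {
    if line.starts_with('#') {
        return None;
    }
    // Tab is the only field delimiter, so semicolons inside INFO are untouched.
    let fields: Vec<&str> = line.split('\t').collect();
    if fields.len() < 10 {
        return None;
    }
    let alt = fields[4];
    if alt.contains(',') {
        return None; // multi-allelic record: skipped, not an error
    }
    let pos: u64 = fields[1].parse().ok()?;
    // Assumption: GT is the first colon-separated value of the sample column.
    let gt = fields[9].split(':').next().unwrap_or(".");
    Some(Record {
        chrom: fields[0].to_string(),
        pos,
        reference: fields[3].to_string(),
        alt: alt.to_string(), // symbolic ALTs such as <DEL> are stored verbatim
        info: fields[7].to_string(),
        genotype: classify_gt(gt),
    })
}
```

In this sketch a `G,T` ALT yields `None`, a `<DEL>` ALT is kept as an opaque string, and an INFO value such as `SVTYPE=DEL;END=300` comes back unchanged.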

gen-gc-bias-model gzip equivalence (2 tests, section 1.3):
- Plain text vs gzipped coverage yields identical depths for
  bedtools-genomecov-d and bedtools-genomecov-dz. SamtoolsDepth case
  already existed.

gen-mut-model expansion (3 tests, section 1.4):
- Indels: a VCF with one insertion and one deletion runs to a written
  model without erroring.
- Reference-mismatch records are logged-and-skipped, not fatal.
- BED entries for nonexistent contigs do not panic the runner.
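
One way to get the logged-and-skipped behaviour described above is to make every lookup return an `Option` instead of indexing directly. The sketch below is an assumption about the general shape, not the runner's code: `Variant`, `ref_matches`, and `keep_consistent` are hypothetical, and the genome is a plain in-memory map. It also treats an unknown contig name the same way, as a skip rather than a panic:

```rust
use std::collections::HashMap;

/// Minimal variant view: contig name, 1-based position, REF allele.
struct Variant<'a> {
    contig: &'a str,
    pos: usize,
    reference: &'a str,
}

/// True when the REF allele matches the genome at the stated position.
/// Unknown contigs, position 0, and out-of-range spans all return false.
fn ref_matches(genome: &HashMap<&str, &str>, v: &Variant) -> bool {
    let start = match v.pos.checked_sub(1) {
        Some(s) => s,
        None => return false,
    };
    genome
        .get(v.contig)
        .and_then(|seq| seq.get(start..start + v.reference.len()))
        .map_or(false, |slice| slice.eq_ignore_ascii_case(v.reference))
}

/// Keep consistent variants; log and count the rest instead of failing.
fn keep_consistent<'a>(
    genome: &HashMap<&str, &str>,
    variants: Vec<Variant<'a>>,
) -> (Vec<Variant<'a>>, usize) {
    let mut skipped = 0;
    let kept: Vec<Variant<'a>> = variants
        .into_iter()
        .filter(|v| {
            let ok = ref_matches(genome, v);
            if !ok {
                eprintln!("skipping {}:{} (REF does not match reference)", v.contig, v.pos);
                skipped += 1;
            }
            ok
        })
        .collect();
    (kept, skipped)
}
```

Returning the skip count alongside the kept records lets a test assert that mismatches were dropped without having to capture log output.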

filter_reads edge cases (3 tests, section 1.5):
- Empty FASTQ in → empty gzipped FASTQ out.
- BED with no overlap → every read filtered, no error.
- VCF with no passing records keeps header lines but emits no data.

gen-frag-length-model edge cases (3 tests, section 1.6):
- All-zero-TLEN BAM → EmptyData (codifies the tlen>0 reader filter).
- Single-ended BAM → EmptyData (codifies the SEGMENTED flag filter).
- Low-MAPQ-only BAM → EmptyData (codifies the MAPQ>10 filter).
  Adds two new test-only helpers for these flag/MAPQ permutations.
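
The three reader filters named above compose into a single predicate, and any input that fails it for every record leaves nothing to fit. A hedged sketch of that shape, using plain structs rather than a real BAM reader. `Alignment`, `ModelError`, and `collect_fragment_lengths` are hypothetical names, and 0x1 is the SAM flag bit for a template with multiple segments:

```rust
/// SAM flag bit 0x1: the template has multiple segments (paired-end).
const SEGMENTED: u16 = 0x1;
/// Records must have MAPQ strictly greater than this value.
const MAPQ_THRESHOLD: u8 = 10;

/// The three alignment fields the filters look at.
struct Alignment {
    flags: u16,
    mapq: u8,
    tlen: i32,
}

#[derive(Debug, PartialEq)]
enum ModelError {
    EmptyData,
}

/// Collect template lengths from records that pass all three filters.
/// If nothing survives, report EmptyData instead of returning an empty model.
fn collect_fragment_lengths(records: &[Alignment]) -> Result<Vec<u32>, ModelError> {
    let lengths: Vec<u32> = records
        .iter()
        .filter(|r| (r.flags & SEGMENTED) != 0) // paired reads only
        .filter(|r| r.mapq > MAPQ_THRESHOLD) // MAPQ > 10
        .filter(|r| r.tlen > 0) // positive TLEN only
        .map(|r| r.tlen as u32)
        .collect();
    if lengths.is_empty() {
        Err(ModelError::EmptyData)
    } else {
        Ok(lengths)
    }
}
```

Each edge-case test then amounts to building records that violate exactly one filter and asserting the `EmptyData` result.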

gen-seq-error-model negative paths (2 tests, section 1.7):
- Truncated FASTQ (3 lines, no qual) → MalformedFastq.
- First record with empty qual line → MalformedFastq.
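
Both negative paths come down to validating each four-line record before using its quality string. A minimal sketch, assuming Phred+33 encoding and in-memory text input. `FastqError` and `read_qualities` are illustrative names, and the real runner's error type may carry different data:

```rust
#[derive(Debug, PartialEq)]
enum FastqError {
    /// Carries a short reason for the failure.
    MalformedFastq(&'static str),
}

/// Extract per-read Phred scores from FASTQ text (Phred+33 assumed).
/// Fails on truncated records, bad header or separator lines, and
/// missing or mismatched quality strings.
fn read_qualities(text: &str) -> Result<Vec<Vec<u8>>, FastqError> {
    let lines: Vec<&str> = text.lines().collect();
    let mut out = Vec::new();
    for record in lines.chunks(4) {
        if record.len() < 4 {
            return Err(FastqError::MalformedFastq("truncated record"));
        }
        if !record[0].starts_with('@') || !record[2].starts_with('+') {
            return Err(FastqError::MalformedFastq("bad header or separator line"));
        }
        let qual = record[3].trim_end();
        if qual.is_empty() {
            return Err(FastqError::MalformedFastq("empty quality line"));
        }
        if qual.len() != record[1].trim_end().len() {
            return Err(FastqError::MalformedFastq("sequence/quality length mismatch"));
        }
        // checked_sub avoids an underflow panic on bytes below '!' (33).
        let scores: Option<Vec<u8>> = qual.bytes().map(|b| b.checked_sub(33)).collect();
        match scores {
            Some(s) => out.push(s),
            None => return Err(FastqError::MalformedFastq("quality byte below '!'")),
        }
    }
    Ok(out)
}
```

A three-line input hits the truncated-record branch, and a record whose fourth line is blank hits the empty-quality branch, which mirrors the two tests listed above.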

cargo test --workspace: 377 passing (up from 355).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@joshfactorial changed the base branch from main to develop May 20, 2026 14:05
@joshfactorial changed the title from "Add support for binned quality scores in Illumina sequencers" to "Add support for binned quality scores in Illumina sequencers and test phase 1 coverage" May 20, 2026
@joshfactorial linked an issue May 20, 2026 that may be closed by this pull request
@joshfactorial merged commit 0e1ecd8 into develop May 20, 2026
1 check passed
@joshfactorial deleted the test/phase-1-coverage branch May 20, 2026 14:09

Development

Successfully merging this pull request may close these issues.

Fill out testing