Test/phase 2 integration #134

Merged
joshfactorial merged 1 commit into develop from test/phase-2-integration
May 20, 2026
@joshfactorial
Collaborator

Adding integration tests

Adds a new rneat/tests/ harness that exercises the real binary boundary
(CLI → config → model → output) via assert_cmd. Four integration
suites with 12 tests total, plus one prod fix that the suite caught
on its first run.

New harness:
- tests/common/mod.rs — shared helpers (binary command, fixtures,
  config builders, decompression). Also provides a GenReadsConfig
  builder with paired-end, model, thread, and seed knobs.
- New dev-deps: assert_cmd, predicates.

cli_smoke.rs (4 tests):
- `rneat --help` lists all 6 subcommands.
- Each subcommand's `--help` exits 0 and mentions
  --configuration-yaml.
- Missing config file → non-zero exit + stderr error message.
- No arguments → non-zero exit + help text on stderr.

pipeline_e2e.rs (2 tests):
- gen-reads with default model produces a structurally well-formed
  FASTQ (multiple of 4 lines, '@'/'+' markers, seq.len == qual.len).
- gen-seq-error-model with binned_quality_bins → gen-reads → only
  bin-valued qualities appear in the output FASTQ.
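
The structural checks in the two pipeline tests can be sketched with the standard library alone. This is an illustration, not the code in pipeline_e2e.rs: `check_fastq_structure` and `only_binned_qualities` are hypothetical helper names, and the bin values are assumed to be Phred scores encoded as Phred+33.

```rust
/// Hypothetical helper: checks that the text is a multiple of 4 lines,
/// that each record has '@' and '+' markers, and that seq.len == qual.len.
/// Returns the number of records on success.
fn check_fastq_structure(text: &str) -> Result<usize, String> {
    let lines: Vec<&str> = text.lines().collect();
    if lines.len() % 4 != 0 {
        return Err(format!("{} lines is not a multiple of 4", lines.len()));
    }
    for (i, rec) in lines.chunks(4).enumerate() {
        if !rec[0].starts_with('@') {
            return Err(format!("record {}: header does not start with '@'", i));
        }
        if !rec[2].starts_with('+') {
            return Err(format!("record {}: separator does not start with '+'", i));
        }
        if rec[1].len() != rec[3].len() {
            return Err(format!(
                "record {}: seq.len {} != qual.len {}",
                i,
                rec[1].len(),
                rec[3].len()
            ));
        }
    }
    Ok(lines.len() / 4)
}

/// Hypothetical helper: true if every quality byte (fourth line of each
/// record) decodes under Phred+33 to one of the allowed bin values.
fn only_binned_qualities(text: &str, bins: &[u8]) -> bool {
    text.lines()
        .skip(3)
        .step_by(4)
        .all(|qual| qual.bytes().all(|b| b >= 33 && bins.contains(&(b - 33))))
}
```

Running the structural check first keeps the bin check simple, since it can then assume that every fourth line is a quality string.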

Prod fix caught by the second pipeline test: gen-reads previously
loaded `quality_score_model` independently from `sequence_error_model`,
so the QualityScoreModel embedded in a trained SequencingErrorModel
was silently ignored. When a user set `sequence_error_model:` without
a separate `quality_score_model:`, gen-reads quietly fell back to the
built-in default — meaning binned-quality training had no effect on
output. Fixed in gen_reads/utils/runner.rs by falling through to the
SeqErrorModel's embedded QSM when no explicit override is configured.
This matches the user-facing docstring in gen_reads_template.yml.
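
The fallback order described above can be sketched as an Option chain. The types and the `resolve_quality_model` function below are simplified stand-ins invented for illustration; the actual code in gen_reads/utils/runner.rs may be shaped differently.

```rust
/// Stand-in for the real quality score model; only a label is kept here.
#[derive(Debug, Clone, PartialEq)]
struct QualityScoreModel {
    label: &'static str,
}

impl QualityScoreModel {
    /// Stand-in for the built-in default model.
    fn built_in_default() -> Self {
        QualityScoreModel { label: "default" }
    }
}

/// Stand-in for a trained sequencing error model that embeds a QSM.
struct SequencingErrorModel {
    quality_score_model: QualityScoreModel,
}

/// Resolution order (assumed from the description above):
/// 1. an explicit `quality_score_model:` override, if configured;
/// 2. otherwise the QSM embedded in the `sequence_error_model:`;
/// 3. otherwise the built-in default.
fn resolve_quality_model(
    explicit: Option<QualityScoreModel>,
    seq_error_model: Option<&SequencingErrorModel>,
) -> QualityScoreModel {
    explicit
        .or_else(|| seq_error_model.map(|m| m.quality_score_model.clone()))
        .unwrap_or_else(QualityScoreModel::built_in_default)
}
```

In this sketch, the behavior before the fix corresponds to skipping step 2, which is why a trained binned model had no effect unless it was also set as the explicit override.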

determinism.rs (3 tests):
- Same seed, single-threaded → same record multiset.
- Same seed, multi-threaded → same record multiset.
- Different seeds → different output (seed argument is load-bearing).
Comparisons are on decompressed contents, since gzip headers carry mtime.
The tests compare record multisets rather than bytes because rneat
iterates HashMaps during contig assembly, so the order of records in the
output is non-deterministic even with num_threads=1; the record *set* is
stable.
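
A multiset comparison of this kind can be sketched as follows. `record_multiset` is a hypothetical helper, not necessarily what determinism.rs uses, and it assumes the input has already been decompressed and is well-formed four-line FASTQ.

```rust
use std::collections::HashMap;

/// Hypothetical helper: counts each four-line FASTQ record so that two
/// outputs can be compared independent of record order. Duplicate
/// records are preserved as counts, which makes this a multiset rather
/// than a set.
fn record_multiset(text: &str) -> HashMap<Vec<String>, usize> {
    let lines: Vec<&str> = text.lines().collect();
    let mut counts = HashMap::new();
    for rec in lines.chunks(4) {
        let key: Vec<String> = rec.iter().map(|l| l.to_string()).collect();
        *counts.entry(key).or_insert(0) += 1;
    }
    counts
}
```

Two runs with the same seed would then be compared with `assert_eq!(record_multiset(&a), record_multiset(&b))`. Sorting the records and comparing the resulting vectors would work equally well.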

fastq_validation.rs (3 tests):
- Single-end FASTQ passes strict structural validation
  (ACGTN-only seq, printable-ASCII qual, seq.len == qual.len) and
  every read's length matches the configured read_len.
- Paired-end run produces both _r1 and _r2 with equal record counts;
  R1 names end in /1, R2 names end in /2, and name stems match
  pairwise.
- Every quality byte decodes to a valid Phred+33 score in [0, 93].
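
Two of these checks are easy to get subtly wrong, so here is a minimal sketch of each. `decode_phred33` and `mate_stem` are hypothetical names for illustration and are not taken from fastq_validation.rs.

```rust
/// Decodes a Phred+33 quality string. Returns None if any byte falls
/// outside '!' (33) through '~' (126), which is the range that maps to
/// scores 0..=93.
fn decode_phred33(qual: &str) -> Option<Vec<u8>> {
    qual.bytes()
        .map(|b| if (33..=126).contains(&b) { Some(b - 33) } else { None })
        .collect()
}

/// Returns the shared name stem if `r1` ends in "/1", `r2` ends in "/2",
/// and the text before the suffix is identical; otherwise None.
fn mate_stem<'a>(r1: &'a str, r2: &str) -> Option<&'a str> {
    let s1 = r1.strip_suffix("/1")?;
    let s2 = r2.strip_suffix("/2")?;
    if s1 == s2 {
        Some(s1)
    } else {
        None
    }
}
```

Checking the byte range before subtracting 33 avoids unsigned underflow on bytes below '!'. For the paired-end test, the records of the two files would be walked in lockstep and `mate_stem` applied to each pair of header names.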

cargo test --workspace: 12 new integration tests + 355 existing,
all passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joshfactorial joshfactorial changed the base branch from main to develop May 20, 2026 14:06
@joshfactorial joshfactorial linked an issue May 20, 2026 that may be closed by this pull request
@joshfactorial joshfactorial merged commit 6af1373 into develop May 20, 2026
1 check passed

Development

Successfully merging this pull request may close these issues.

Fill out testing
