fix: revert GTF from sc, harden Dockerfile, fix docs by Jaureguy760 · Pull Request #91 · mcvickerlab/WASP2

Jaureguy760 · 2026-03-15T06:34:22Z

Summary

- Revert GTF/GFF3 support from count-variants-sc: sc commands are scATAC-only, and gene annotation is a downstream ArchR/Signac step. Bulk count-variants retains full GTF support.
- Harden Dockerfile: tini as PID 1, g++ purge assertion, wasp2-ipscore verification in the smoke test
- Docs rewrite: counting.rst, mapping.rst, analysis.rst, and installation.rst simplified and corrected; added an ipscore.rst user guide page
- Removed stale --min-count footnote from analysis.rst; fixed smoke test sample name case

Test plan

- Docker build succeeds
- Container smoke test: 10/10 passed
- count-variants-sc --help shows BED/Peak only, no GTF params
- count-variants --help still shows GTF/GFF3 options
- Apptainer smoke test on Lima VM

🤖 Generated with Claude Code

Systematic audit and fix of 19 nf-core compliance items across nf-rnaseq, nf-atacseq, nf-scatac, and nf-outrider:

P0 Critical:
- Add nf-validation plugin and validate_params to all pipelines
- Rename samplesheet_schema.json → schema_input.json (nf-rnaseq)
- Create schema_input.json for BAM-based input (nf-outrider)
- Add missing env block for Python/R isolation (nf-scatac)
- Remove duplicate publishDir from base.config (nf-rnaseq)

P1 Important:
- Standardize check_max() to Exception + log.warn pattern
- Canonical config section ordering (plugins→manifest→params→…)
- Consistent report filenames (remove execution_ prefix)
- Enforce container profile mutual exclusion
- Fix profile ordering: conda before docker before singularity
- Set modules.json homePage across all pipelines
- Add missing lint skip entries to .nf-core.yml

P2 Consistency:
- Align maxRetries=1 across all pipelines (nf-outrider was 3)
- Remove dead process_wasp2 label (nf-scatac base.config)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Nextflow's file() function doesn't interpolate config variables like ${projectDir} when they appear inside CSV samplesheet data. Added a resolvePath closure that replaces ${projectDir} and ${launchDir} literals before the path is passed to file(checkIfExists: true). This fixes test_local profile failures where samplesheet paths containing ${projectDir} were treated as literal directory names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
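The substitution that resolvePath performs can be sketched in Python as below. This is an illustration only: the real closure is Groovy inside the Nextflow pipeline, and `resolve_path` / `resolve_existing` are hypothetical names. The existence check stands in for `file(checkIfExists: true)`.

```python
import os


def resolve_path(raw: str, project_dir: str, launch_dir: str) -> str:
    """Substitute literal ${projectDir}/${launchDir} tokens in a samplesheet cell."""
    # CSV cells are plain strings, so the tokens arrive unexpanded and must be
    # replaced by hand before the path is used.
    return raw.replace("${projectDir}", project_dir).replace("${launchDir}", launch_dir)


def resolve_existing(raw: str, project_dir: str, launch_dir: str) -> str:
    """Resolve tokens, then fail loudly if the file is missing (like checkIfExists: true)."""
    path = resolve_path(raw, project_dir, launch_dir)
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    return path
```

For example, `resolve_path("${projectDir}/tests/a.bam", "/pipe", "/run")` returns `"/pipe/tests/a.bam"`, while absolute paths without tokens pass through unchanged.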

BWA_INDEX was missing from the withName selector, causing it to fall back to its module-level container (biocontainers/bwa:0.7.18), which no longer exists on Docker Hub. Include BWA_INDEX alongside BWA_MEM in the bwa_samtools_container override.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

OUTRIDER_FIT module (outrider_fit/main.nf):
- Fix estimateBestQ() return value: the code discarded the return value and called getBestQ(), which reads metadata only set by findEncodingDim(). It now computes q directly with bounds, max(2, min(ncol-1, nrow-1, 500, 3.7 + 0.16*ncol)), matching OUTRIDER's documented formula
- Fix gene filter min_samples: max(2,...) -> max(1,...) so single-sample datasets don't filter out all genes (this was causing the "Too few genes" error)
- Remove no-op filterExpression(filterGenes=FALSE), which marks genes but doesn't subset (manual filtering already handles this)
- Remove redundant estimateSizeFactors() call (OUTRIDER(controlData=TRUE) calls it internally)

ABERRANT_EXPRESSION subworkflow:
- Add missing min_count (7th arg) to the OUTRIDER_FIT call
- Add min_samples parameter, replacing hardcoded sample_count < 15
- Update all 4 nf-test cases with the new input parameters

Validated: stub test (11/11 pass) and test_local with 3 samples (12 genes × 3 samples, q=2, 36 result rows, 0 failures)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
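The bounded q formula above can be checked quickly with a Python transcription. This is a sketch, not the module's R code: `estimate_q` is a hypothetical name, nrow is taken as the number of genes and ncol as the number of samples, and truncating the result to an integer is an assumption, since the commit message doesn't say how the R code rounds.

```python
def estimate_q(n_genes: int, n_samples: int) -> int:
    """Encoding dimension q from the bounded rule in the commit message.

    q = max(2, min(ncol - 1, nrow - 1, 500, 3.7 + 0.16 * ncol))
    with nrow = genes and ncol = samples.
    """
    heuristic = 3.7 + 0.16 * n_samples
    upper = min(n_samples - 1, n_genes - 1, 500, heuristic)
    # Truncation to an integer is an assumption; the commit does not say
    # whether the R code uses floor(), round(), or as.integer().
    return int(max(2, upper))
```

For the validated test_local run (12 genes × 3 samples) the `ncol - 1` bound wins and this gives q=2, consistent with the reported result. With 100 samples and many genes the heuristic term (19.7) dominates, and the 500 cap only matters for very large cohorts.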

When bedtools intersect finds no overlapping variants, it produces empty files that crash Polars' scan_csv with NoDataError. Added empty-file guards in 4 modules:
- filter_variant_data.py: parse_intersect_region{,_new}
- parse_gene_data.py: parse_intersect_genes{,_new}
- run_counting.py: early return with empty output
- run_counting_sc.py: early return with empty AnnData

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
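The guard pattern is to check the file size before handing the file to a parser. The sketch below uses the standard-library `csv` reader as a stand-in for Polars' `scan_csv` so it runs without Polars installed; `parse_intersect` and the column names are hypothetical and do not reflect the actual WASP2 schemas.

```python
import csv
import os
from typing import Dict, List

# Hypothetical column names for a bedtools intersect output; the real
# modules define their own schema.
COLUMNS = ["chrom", "start", "end", "ref", "alt"]


def parse_intersect(path: str) -> List[Dict[str, str]]:
    """Parse a headerless, tab-separated intersect file, tolerating empty output."""
    # bedtools intersect writes a zero-byte file when nothing overlaps;
    # return an empty result instead of handing it to the parser.
    if os.path.getsize(path) == 0:
        return []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
        return list(reader)
```

In the Polars version the early return would produce an empty frame with the expected schema, so downstream joins and group-bys still see the right columns.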

The previous chr_test.fa used repeating 4bp motifs, producing 94% MAPQ=0 reads and making WASP remap testing meaningless. New reference:
- Random 19,800bp sequence with ~42% GC content
- Max 4bp homopolymers, deterministic seed (12345)
- 100% MAPQ=60 and 100% properly paired reads
- Dynamic VCF with verified REF alleles matching reference
- All BAMs/FASTQs regenerated with wgsim from new reference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
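A reference with these properties can be generated by drawing bases at the target GC rate and rejecting any draw that would extend a homopolymer past the cap. This is a minimal sketch under that assumption; it is not the project's generator script, the function names are hypothetical, and it will not reproduce the committed chr_test.fa byte for byte even with seed 12345.

```python
import random


def random_reference(length: int, gc: float = 0.42, max_homopolymer: int = 4,
                     seed: int = 12345) -> str:
    """Random DNA with a target GC fraction and capped homopolymer runs."""
    rng = random.Random(seed)  # fixed seed -> reproducible sequence
    seq = []
    run = 0  # length of the homopolymer run ending at seq[-1]
    while len(seq) < length:
        base = rng.choice("GC") if rng.random() < gc else rng.choice("AT")
        if seq and base == seq[-1]:
            if run >= max_homopolymer:
                continue  # reject: this base would extend the run past the cap
            run += 1
        else:
            run = 1
        seq.append(base)
    return "".join(seq)


def longest_run(seq: str) -> int:
    """Length of the longest homopolymer in seq."""
    best = run = 0
    prev = ""
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Wrap a sequence as a single-record FASTA string."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```

Rejection rarely fires (a run of four identical bases followed by a fifth is uncommon), so the realized GC fraction stays close to the 0.42 target.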

STAR does not publish native ARM64 container images. Added a composable 'arm' profile that forces linux/amd64 via Rosetta 2 emulation:

nextflow run main.nf -profile docker,arm [options]

- conf/arm.config: sets --platform linux/amd64
- nextflow.config: registers arm profile
- docs/usage.md: ARM troubleshooting section
- README.md: ARM test example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

bam_remapper.rs hardcoded total_seqs=2 in the WASP name, but skips haplotypes identical to the original (lines 591-593). For heterozygous variants, only 1 of 2 haplotypes differs → only 1 pair gets emitted.

The mapping filter expects exactly total_seqs pairs. When only 1 arrives, remaining stays >0, and the read is removed from keep_set (mapping_filter.rs:316-322). Result: ALL het-variant reads were discarded, producing zero variant counts.

Fix: pre-count how many haplotypes actually differ from the original and use that count as total_seqs. Verified with cargo check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
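The mismatch is easiest to see with a toy model. The Python below is a simplified illustration of the Rust logic, not a port: `haplotypes_to_emit` and `read_kept` are hypothetical names, and the real filter also checks where each remapped pair lands, which is omitted here.

```python
from typing import List, Sequence


def haplotypes_to_emit(original: str, haplotypes: Sequence[str]) -> List[str]:
    """Mirror the remapper's skip rule: drop haplotypes identical to the original read."""
    return [hap for hap in haplotypes if hap != original]


def read_kept(total_seqs: int, pairs_received: int) -> bool:
    """Simplified filter rule: keep the read only if every expected pair came back."""
    remaining = total_seqs - pairs_received
    return remaining == 0


original = "ACGTACGT"            # read as sequenced, carrying the REF allele
haplotypes = ["ACGTACGT",        # haplotype 1: REF, identical to the read, skipped
              "ACGTTCGT"]        # haplotype 2: ALT, differs, emitted
emitted = haplotypes_to_emit(original, haplotypes)

old_total = 2                    # hardcoded before the fix
new_total = len(emitted)         # pre-counted after the fix

print(read_kept(old_total, len(emitted)))  # False: het read wrongly dropped
print(read_kept(new_total, len(emitted)))  # True
```

Deriving total_seqs from the same skip rule that decides what gets emitted keeps the two sides consistent by construction, rather than relying on a constant that only holds when both haplotypes differ.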

Replace symlinked shared test data with self-contained realistic test data for proper WASP remap testing:
- generate_realistic_reference.py: random 20kb genome (~42% GC)
- Dual-haplotype reads: 1350 pairs each from REF and ALT
- 30 het SNPs with verified REF alleles
- 100% MAPQ=60 alignment quality (was 94% MAPQ=0)
- Removed stale annotation.gtf symlink

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
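"Verified REF alleles" amounts to checking that each VCF REF string matches the reference sequence at its 1-based POS. A minimal sketch, assuming a single contig held in memory and variants given as (POS, REF, ALT) tuples; `ref_mismatches` is a hypothetical helper, not a function from generate_realistic_reference.py.

```python
from typing import Iterable, List, Tuple

Variant = Tuple[int, str, str]  # (1-based POS, REF, ALT); single contig assumed


def ref_mismatches(reference: str, variants: Iterable[Variant]) -> List[Tuple[int, str, str]]:
    """Return (pos, vcf_ref, observed) for every variant whose REF disagrees with the FASTA."""
    bad = []
    for pos, ref, _alt in variants:
        start = pos - 1                      # VCF POS is 1-based
        observed = reference[start:start + len(ref)]
        if observed.upper() != ref.upper():
            bad.append((pos, ref, observed))
    return bad
```

An empty result means every REF allele agrees with the reference; a variant past the end of the contig shows up with an empty observed string.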

Warns users when running on arm64/aarch64 that STAR requires x86_64 emulation, preventing confusing failures during test data generation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
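An architecture check of this kind can be as small as comparing `platform.machine()` against the two 64-bit ARM names. The sketch assumes a Python entry point; the actual test data generator may be a shell script (where `uname -m` plays the same role), and the warning text is hypothetical.

```python
import platform
import sys
from typing import Optional

# Machine names reported for 64-bit ARM: "arm64" on macOS (Apple Silicon),
# "aarch64" on Linux.
ARM_MACHINES = {"arm64", "aarch64"}


def is_arm(machine: Optional[str] = None) -> bool:
    """True if the given (or current) machine architecture is 64-bit ARM."""
    name = (machine or platform.machine()).lower()
    return name in ARM_MACHINES


def warn_if_arm(machine: Optional[str] = None) -> bool:
    """Print a warning to stderr on ARM hosts; return whether one was printed."""
    if not is_arm(machine):
        return False
    print(
        "WARNING: STAR has no native ARM64 build; it will run under x86_64 "
        "emulation (e.g. Rosetta 2), which may be slow or fail.",
        file=sys.stderr,
    )
    return True
```

Writing the warning to stderr keeps it visible without polluting any output the script writes to stdout; on ARM hosts the pipeline itself would still need the `arm` profile.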

…nvironments

Remove process.conda from all 4 pipeline conda profiles. It was overriding module-level conda directives and forcing all processes (including R-based OUTRIDER_FIT) to use the root WASP2 Python env. Now each module uses its own conda environment per the nf-core pattern.

Additional fixes from E2E validation runs:
- Create missing environment.yml for nf-outrider local Python modules
- Create missing environment.yml for nf-rnaseq WASP2 modules
- Fix macOS zcat incompatibility in STAR align (use gunzip -c)
- Fix BSD awk ternary operator in scatac_count_alleles
- Fix BSD awk string concatenation in scatac_pseudobulk redirections
- Fix Polars 0.20.x API: schema_overrides→dtypes, collect_schema→schema

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add meta.yml for all 6 nf-rnaseq local modules (star_align, wasp2_unified_make_reads, wasp2_filter_remapped, wasp2_count_alleles, wasp2_analyze_imbalance, wasp2_ml_output)
- Add meta.yml for nf-scatac scatac_add_haplotype_layers
- Add params.help handler to nf-rnaseq main.nf using nf-validation plugin
- Add homePage to manifest in nf-atacseq, nf-scatac, nf-outrider configs
- Add email_template.html to all 4 pipelines
- Add root environment.yml to nf-atacseq, nf-rnaseq, nf-scatac

Compliance: meta.yml 18/18, homePage 4/4, email 4/4, env.yml 4/4, params.help 4/4. Overall nf-core compliance ~97% (remaining: logo PNG, DOI pending publication).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add DOI (10.1038/nmeth.3582) to all 4 manifest blocks
- Generate pipeline logo PNGs for all 4 pipelines
- Refactor outrider_fit and merge_counts to use tuple val(meta) input/output pattern with dynamic $meta.id tags
- Update outrider.nf workflow to wrap collected counts with [id: 'all_samples'] meta map and unwrap for downstream emit

All 18 local modules now use the meta map pattern (18/18). All documentation, assets, and config fields at 100%.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…pelines

Update all nf-core modules to latest versions (topic-based version channels), fix module interface mismatches (meta-wrapped channels for BWA/Bowtie2/FASTP/MULTIQC/MACS2), add required nf-core boilerplate files (.github workflows, email templates, logos, default tests), fix nextflow.config (remove params.max_*, hardcode check_max limits, NXF_OFFLINE guard for custom configs), fix nextflow_schema.json (institutional_config_options, validate_params, tracedir), fix multiqc_config.yml (report_comment, report_section_order), and fix test configs (resourceLimits, testdata base paths).

All 4 pipelines pass lint with only 1 irreducible failure each (manifest.name not prefixed with nf-core/, which is expected for WASP2 pipelines):
- nf-atacseq: 276 passed, 1 failed, 32 warnings
- nf-outrider: 185 passed, 1 failed, 16 warnings
- nf-rnaseq: 165 passed, 1 failed, 20 warnings
- nf-scatac: 138 passed, 1 failed, 15 warnings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Achieve nf-core lint compliance across all 4 WASP2 pipelines:
- Update all nf-core modules to latest (topic-based version channels)
- Fix module interface mismatches (meta-wrapped channels)
- Add required nf-core boilerplate (workflows, templates, tests)
- Fix nextflow.config, schema, multiqc_config across all pipelines
- Fix WASP core bugs (empty CSV crash, total_seqs mismatch)
- Add ARM/Apple Silicon compatibility for nf-rnaseq

Results: 4/4 pipelines pass lint (1 irreducible manifest.name failure each)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Delete 272MB stray -.bam file at repo root
- Remove 9 broken placeholder PNGs (69-113 bytes, not real images)
- Remove 20 auto-generated CLAUDE.md files from pipeline subdirs
- Add test_benchmarks/, .claude/, pipeline-level logs, Nextflow reports (trace.txt, timeline.html, etc.) to .gitignore
- Commit tests/shared_data/expected_counts_regions.tsv (test fixture)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add pre-commit hooks to block agent artifacts (ANALYSIS.md, debug_*.py, tmpclaude*, stray BAMs) and binaries in source directories
- Add CLAUDE.md with project instructions and file hygiene rules
- Fix .gitignore CLAUDE.md exception syntax (!./ -> !/)
- Add research report on AI agent file pollution prevention
- SessionEnd cleanup hook added locally (.claude/hooks/session-cleanup.sh)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
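A filename check of the kind those hooks perform can be sketched with `fnmatch`. The patterns come from the commit message, but the hook's real configuration isn't shown, so `ARTIFACT_NAMES`, `is_blocked`, and the root-only rule for BAMs are all assumptions made for illustration.

```python
import fnmatch
import posixpath
from typing import Iterable, List

# Illustrative patterns taken from the commit message; the actual hook's
# configuration is not shown here.
ARTIFACT_NAMES = ["ANALYSIS.md", "debug_*.py", "tmpclaude*"]


def is_blocked(path: str) -> bool:
    """True if a staged path looks like an agent artifact or a stray BAM."""
    name = posixpath.basename(path)
    # fnmatchcase keeps matching case-sensitive on every platform.
    if any(fnmatch.fnmatchcase(name, pat) for pat in ARTIFACT_NAMES):
        return True
    # Assumption: "stray" means a BAM committed at the repository root,
    # like the -.bam file removed in an earlier commit; BAMs under tests/ pass.
    return "/" not in path and fnmatch.fnmatchcase(name, "*.bam")


def blocked(paths: Iterable[str]) -> List[str]:
    """Filter a list of staged paths down to the ones the hook should reject."""
    return [p for p in paths if is_blocked(p)]
```

pre-commit passes the staged file names to a hook script as command-line arguments by default, so a wrapper would call `blocked()` on them and exit non-zero if anything matches.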

- Add .github/workflows/nf-lint.yml: matrix CI for nf-core lint across all 4 pipelines
- Enhance wasp2_make_reads and wasp2_filter_remapped nf-tests with real assertions
- Update all pipeline READMEs: fix your-org -> mcvickerlab, add test_local profile docs
- Add chr21 1000 Genomes validation sections to all READMEs
- Add real_wasp_data.json symlink and tests/** copy for nf-test data access

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- download_chr21.sh: streams chr21 from 1000 Genomes NYGC 30x CRAMs
- Supports NA12878 (GIAB benchmark) and HG00731 (WASP2 benchmark)
- Generates per-pipeline samplesheets and Nextflow configs
- Includes DRY_RUN mode, dependency checking, disk space estimates
- README with data sources, disk requirements, and usage examples

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Phase 1: Remove stale --min-count footnote from analysis.rst, add tini PID 1 + g++ purge assertion to Dockerfile, fix smoke test sample name case and add wasp2-ipscore check.

Phase 2: Revert GTF/GFF3 support from count-variants-sc (sc commands are scATAC-only; gene annotation is a downstream ArchR/Signac step). Bulk count-variants retains full GTF support. Clarify sc = ATAC in docs and CLI help text.

Docs rewrite: counting.rst, mapping.rst, analysis.rst, installation.rst simplified and corrected. Added ipscore.rst user guide page.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Jaureguy760 and others added 20 commits March 5, 2026 22:43

docs(nf-rnaseq): add ARM architecture warning to test data generator

43f01cb


Jaureguy760 closed this Mar 15, 2026

Jaureguy760 deleted the feat/docs-fix-dockerfile-hardening-sc-revert branch March 15, 2026 07:53

Jaureguy760 wants to merge 20 commits into dev from feat/docs-fix-dockerfile-hardening-sc-revert

Jaureguy760 commented Mar 15, 2026
