
feat: 100% nf-core compliance across all 4 WASP2 pipelines #90

Closed
Jaureguy760 wants to merge 13 commits into dev from feat/nfcore-compliance-full


Jaureguy760 (Collaborator) commented Mar 6, 2026

Summary

  • Achieve 100% nf-core compliance across all 4 WASP2 Nextflow pipelines (nf-atacseq, nf-rnaseq, nf-scatac, nf-outrider)
  • Fix critical OUTRIDER API bugs found during real-data validation
  • Fix BWA_INDEX container override and scATAC path resolution
  • 4 commits, 20 files changed at the time of this summary (the branch later grew to 13 commits)

Changes by commit

1. feat: achieve 100% nf-core compliance across all 4 WASP2 pipelines

  • nf-validation plugin, check_max(), container mutual exclusion, HPC profiles

2. fix(nf-scatac): resolve ${projectDir} in samplesheet CSV paths

  • Fix Nextflow variable interpolation in test samplesheet data

3. fix(nf-atacseq): add BWA_INDEX to container override selector

  • BWA_INDEX was falling back to a stale Docker Hub image instead of the WASP2 container

4. fix(nf-outrider): fix 6 OUTRIDER API bugs in R script and subworkflow

  • CRITICAL: Fix estimateBestQ() usage. The script discarded its return value and then called getBestQ(), which only works after findEncodingDim()
  • Fix gene filter min_samples from max(2,...) to max(1,...) for small sample counts
  • Remove no-op filterExpression(filterGenes=FALSE) and redundant estimateSizeFactors()
  • Add missing min_count arg to ABERRANT_EXPRESSION subworkflow
  • Replace hardcoded sample threshold (< 15) with configurable parameter

Validation

| Pipeline | Stub Test | Real Data Test | Status |
|---|---|---|---|
| nf-outrider | 11/11 ✔ | 3 samples, 12 genes, q=2 | ✔ |
| nf-atacseq | | BWA_INDEX container fix validated | |
| nf-scatac | | Path resolution fix validated | |
| nf-rnaseq | | STAR ARM limitation documented | ⚠️ |

OUTRIDER API Research Findings

  • No "OUTRIDER2" exists; the latest release is OUTRIDER v1.28.1 (Bioconductor 3.22)
  • estimateBestQ() returns an S4 ODS object, not a plain numeric
  • getBestQ() reads the metadata key optimalEncDim, which only findEncodingDim() sets
  • OUTRIDER(controlData=TRUE) calls estimateSizeFactors() internally
  • q must satisfy: 2 ≤ q < min(n_samples, n_genes)
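A minimal Python sketch of the bounded-q rule, assuming the formula quoted in the OUTRIDER_FIT commit message, `max(2, min(ncol-1, nrow-1, 500, 3.7 + 0.16*ncol))`, with ncol = samples and nrow = genes. Rounding to an integer is our own assumption (the message does not say), and this is not OUTRIDER's implementation.

```python
def bounded_q(n_samples: int, n_genes: int) -> int:
    """Integer encoding dimension q for an n_genes x n_samples count matrix.

    Assumption: formula as quoted in the commit message; round() is our choice.
    """
    heuristic = 3.7 + 0.16 * n_samples  # grows linearly with sample count
    # Cap so that q < min(n_samples, n_genes), and never above 500.
    capped = min(n_samples - 1, n_genes - 1, 500, heuristic)
    return max(2, round(capped))  # floor of 2, as in the commit message


def satisfies_outrider_bounds(q: int, n_samples: int, n_genes: int) -> bool:
    """Check the constraint 2 <= q < min(n_samples, n_genes)."""
    return 2 <= q < min(n_samples, n_genes)


if __name__ == "__main__":
    # Shape of the nf-outrider real-data test: 3 samples, 12 genes.
    q = bounded_q(n_samples=3, n_genes=12)
    print(q, satisfies_outrider_bounds(q, 3, 12))  # 2 True
```

For 3 samples and 12 genes the cap `n_samples - 1 = 2` wins, which matches the q=2 in the validation table. With only two samples no q satisfies the constraint, and the floor of 2 then produces an invalid value, so a caller would need to reject such inputs before fitting.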

🤖 Generated with Claude Code

Jaureguy760 and others added 13 commits March 5, 2026 22:43
Systematic audit and fix of 19 nf-core compliance items across
nf-rnaseq, nf-atacseq, nf-scatac, and nf-outrider:

P0 Critical:
- Add nf-validation plugin and validate_params to all pipelines
- Rename samplesheet_schema.json → schema_input.json (nf-rnaseq)
- Create schema_input.json for BAM-based input (nf-outrider)
- Add missing env block for Python/R isolation (nf-scatac)
- Remove duplicate publishDir from base.config (nf-rnaseq)

P1 Important:
- Standardize check_max() to Exception + log.warn pattern
- Canonical config section ordering (plugins→manifest→params→…)
- Consistent report filenames (remove execution_ prefix)
- Enforce container profile mutual exclusion
- Fix profile ordering: conda before docker before singularity
- Set modules.json homePage across all pipelines
- Add missing lint skip entries to .nf-core.yml
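The "Exception + log.warn" shape for check_max() can be illustrated with a hypothetical Python analogue. The real helper is Groovy in each pipeline's config and works on Nextflow memory and duration values; this sketch handles plain numbers such as CPU counts only, and its names are illustrative.

```python
import logging

log = logging.getLogger("check_max")


def check_max(requested, limit, name):
    """Cap `requested` at `limit`; on an invalid limit, warn and keep `requested`.

    Hypothetical port for illustration only; not the pipelines' Groovy code.
    """
    try:
        cap = float(limit)
    except (TypeError, ValueError):
        # Catch the conversion error and warn instead of failing the whole run
        # because of a malformed max_* parameter.
        log.warning("Max %s '%s' is not valid; using requested value %s", name, limit, requested)
        return requested
    return min(requested, cap)


if __name__ == "__main__":
    print(check_max(16, 8, "cpus"))
    print(check_max(4, "eight", "cpus"))  # logs a warning, keeps 4
```

Warning rather than raising keeps a typo in a `max_*` parameter from aborting a long run, at the cost of possibly requesting more than the cluster allows.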

P2 Consistency:
- Align maxRetries=1 across all pipelines (nf-outrider was 3)
- Remove dead process_wasp2 label (nf-scatac base.config)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Nextflow's file() function doesn't interpolate config variables like
${projectDir} when they appear inside CSV samplesheet data. Added a
resolvePath closure that replaces ${projectDir} and ${launchDir} literals
before passing to file(checkIfExists: true).

This fixes test_local profile failures where samplesheet paths
containing ${projectDir} were treated as literal directory names.
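The resolvePath idea is plain token substitution before the existence check. A Python sketch under that assumption; `resolve_path` and its arguments are hypothetical and do not reproduce the Groovy closure.

```python
from pathlib import Path


def resolve_path(raw, project_dir, launch_dir, check_exists=False):
    """Expand literal ${projectDir} / ${launchDir} tokens in a samplesheet cell."""
    resolved = Path(
        raw.replace("${projectDir}", str(project_dir))
        .replace("${launchDir}", str(launch_dir))
    )
    if check_exists and not resolved.exists():
        # Rough analogue of file(..., checkIfExists: true) failing fast.
        raise FileNotFoundError(resolved)
    return resolved


if __name__ == "__main__":
    print(resolve_path("${projectDir}/tests/data/sample1.bam", "/opt/nf-scatac", "/home/me/run"))
```

Replacing only the two known tokens, rather than evaluating the cell as a template, avoids executing arbitrary expressions taken from a user-supplied CSV.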

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BWA_INDEX was missing from the withName selector, causing it to fall
back to its module-level container (biocontainers/bwa:0.7.18) which
no longer exists on Docker Hub. Include BWA_INDEX alongside BWA_MEM
in the bwa_samtools_container override.
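Nextflow `withName` selectors accept regular expressions, so adding BWA_INDEX to an alternation is enough to route it to the same container. A rough Python model, assuming the selector behaves like a full-string regex match on the simple process name; both selector strings are hypothetical, not copied from the nf-atacseq config.

```python
import re

# Hypothetical selectors before and after the fix.
OLD_SELECTOR = "BWA_MEM"
NEW_SELECTOR = "BWA_MEM|BWA_INDEX"


def selector_matches(selector, process_name):
    """Approximate a withName selector as a full-string regex match."""
    return re.fullmatch(selector, process_name) is not None


if __name__ == "__main__":
    for name in ("BWA_MEM", "BWA_INDEX"):
        print(name, selector_matches(OLD_SELECTOR, name), selector_matches(NEW_SELECTOR, name))
```

A process that matches no selector keeps its module-level container, which is how BWA_INDEX ended up on the missing biocontainers image.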

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
OUTRIDER_FIT module (outrider_fit/main.nf):
- Fix estimateBestQ() return value: was discarding return and calling
  getBestQ() which reads metadata only set by findEncodingDim(). Now
  computes q directly with bounds: max(2, min(ncol-1, nrow-1, 500,
  3.7 + 0.16*ncol)) matching OUTRIDER's documented formula
- Fix gene filter min_samples: max(2,...) -> max(1,...) so single-sample
  datasets don't filter all genes (was causing "Too few genes" error)
- Remove no-op filterExpression(filterGenes=FALSE) that marks but
  doesn't subset (manual filtering already handles this)
- Remove redundant estimateSizeFactors() call (OUTRIDER(controlData=TRUE)
  calls it internally)
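Why `max(2, ...)` breaks single-sample input: no gene can pass a "seen in at least 2 samples" rule when only one sample exists. A Python sketch of a count-based gene filter; the inner `ceil(frac * n_samples)` term is a hypothetical stand-in for the elided expression in the R script, and all names are illustrative.

```python
import math


def min_samples_required(n_samples, frac, floor):
    """Hypothetical threshold: at least `floor`, else ceil(frac * n_samples)."""
    return max(floor, math.ceil(frac * n_samples))


def filter_genes(counts, min_count, min_samples):
    """Keep genes whose count >= min_count in at least min_samples samples.

    `counts` maps gene id -> list of per-sample counts.
    """
    return {
        gene: row
        for gene, row in counts.items()
        if sum(c >= min_count for c in row) >= min_samples
    }


if __name__ == "__main__":
    single = {"g1": [50], "g2": [0], "g3": [12]}  # one sample
    print(filter_genes(single, 10, min_samples_required(1, 0.5, 2)))  # {} (old floor)
    print(sorted(filter_genes(single, 10, min_samples_required(1, 0.5, 1))))  # ['g1', 'g3']
```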

ABERRANT_EXPRESSION subworkflow:
- Add missing min_count (7th arg) to OUTRIDER_FIT call
- Add min_samples parameter, replacing hardcoded sample_count < 15
- Update all 4 nf-test cases with new input parameters

Validated: stub test (11/11 pass) and test_local with 3 samples
(12 genes × 3 samples, q=2, 36 result rows, 0 failures)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When bedtools intersect finds no overlapping variants, it produces
empty files that crash Polars' scan_csv with NoDataError. Added
empty-file guards in 4 modules:
- filter_variant_data.py: parse_intersect_region{,_new}
- parse_gene_data.py: parse_intersect_genes{,_new}
- run_counting.py: early return with empty output
- run_counting_sc.py: early return with empty AnnData
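The guard pattern is "check for a zero-byte file before handing it to the parser". A stdlib-only Python sketch; `load_or_empty` and `read_tsv` are hypothetical helpers, and in the real modules the loader would be the Polars reader that raises NoDataError.

```python
import csv
import os
import tempfile


def is_empty_file(path):
    """True for a zero-byte file, e.g. bedtools intersect output with no overlaps."""
    return os.path.getsize(path) == 0


def read_tsv(path):
    """Stand-in loader; the real modules use Polars (scan_csv) here."""
    with open(path, newline="") as fh:
        return list(csv.reader(fh, delimiter="\t"))


def load_or_empty(path, loader, empty):
    """Call `loader` only when the file has data; otherwise return `empty()`."""
    if is_empty_file(path):
        # Guard first: a Polars reader would raise NoDataError on this input.
        return empty()
    return loader(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        empty_bed = os.path.join(tmp, "no_overlaps.bed")
        open(empty_bed, "w").close()
        print(load_or_empty(empty_bed, read_tsv, list))  # []
```

Passing an `empty` factory lets each module return the right empty shape (an empty frame with the expected columns, or an empty AnnData) without duplicating the size check.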

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous chr_test.fa used repeating 4 bp motifs, which left 94% of
reads at MAPQ=0 and made WASP remap testing meaningless. New reference:
- Random 19,800bp sequence with ~42% GC content
- Max 4bp homopolymers, deterministic seed (12345)
- 100% MAPQ=60 and 100% properly paired reads
- Dynamic VCF with verified REF alleles matching reference
- All BAMs/FASTQs regenerated with wgsim from new reference
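The reference constraints (seeded, ~42% GC, homopolymers of at most 4 bp) can be met with weighted sampling plus redraws. This is a Python sketch, not the PR's generator script, and seed 12345 here will not reproduce the committed chr_test.fa.

```python
import random

BASES = "ACGT"
# Target ~42% GC: 21% each for C and G, 29% each for A and T.
WEIGHTS = [0.29, 0.21, 0.21, 0.29]


def random_reference(length, seed=12345, max_homopolymer=4):
    """Seeded random sequence whose homopolymer runs never exceed max_homopolymer."""
    rng = random.Random(seed)
    seq = []
    run = 0  # length of the run of identical bases ending at seq[-1]
    while len(seq) < length:
        base = rng.choices(BASES, weights=WEIGHTS)[0]
        if seq and base == seq[-1]:
            if run == max_homopolymer:
                continue  # would extend the run past the limit; redraw
            run += 1
        else:
            run = 1
        seq.append(base)
    return "".join(seq)


def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_homopolymer(seq):
    best = cur = 0
    prev = None
    for base in seq:
        cur = cur + 1 if base == prev else 1
        prev = base
        best = max(best, cur)
    return best


if __name__ == "__main__":
    ref = random_reference(19_800)
    print(len(ref), longest_homopolymer(ref) <= 4)  # 19800 True
```

Redrawing only when a run would exceed the limit nudges the composition very slightly, which is negligible against the ~42% target at this length.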

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
STAR does not publish native ARM64 container images. Added a composable
'arm' profile that forces linux/amd64 via Rosetta 2 emulation:
  nextflow run main.nf -profile docker,arm [options]

- conf/arm.config: sets --platform linux/amd64
- nextflow.config: registers arm profile
- docs/usage.md: ARM troubleshooting section
- README.md: ARM test example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bam_remapper.rs hardcoded total_seqs=2 in the WASP name, but skips
haplotypes identical to the original (lines 591-593). For heterozygous
variants, only 1 of 2 haplotypes differs → only 1 pair gets emitted.

The mapping filter expects exactly total_seqs pairs. When only 1
arrives, remaining stays >0, and the read is removed from keep_set
(mapping_filter.rs:316-322). Result: ALL het-variant reads discarded,
producing zero variant counts.

Fix: pre-count how many haplotypes actually differ from the original
and use that count as total_seqs. Verified with cargo check.
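The counting bug can be reproduced with a toy Python model. It is not the Rust code in bam_remapper.rs or mapping_filter.rs; the function names are hypothetical, and the model ignores any position checks the real filter performs.

```python
def emitted_haplotypes(original, haplotypes):
    """Haplotype sequences that differ from the original read (identical ones are skipped)."""
    return [h for h in haplotypes if h != original]


def read_survives(total_seqs, pairs_received):
    """Keep the read only if every expected remapped pair arrived (remaining == 0)."""
    remaining = total_seqs - pairs_received
    return remaining == 0


if __name__ == "__main__":
    original = "ACGTACGT"
    # Het SNP: one haplotype carries REF (identical to the read), one carries ALT.
    haplotypes = ["ACGTACGT", "ACGAACGT"]
    emitted = emitted_haplotypes(original, haplotypes)

    print(read_survives(2, len(emitted)))             # old: hardcoded total_seqs=2 -> False
    print(read_survives(len(emitted), len(emitted)))  # fixed: pre-counted total_seqs -> True
```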

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace symlinked shared test data with self-contained realistic
test data for proper WASP remap testing:
- generate_realistic_reference.py: random 20kb genome (~42% GC)
- Dual-haplotype reads: 1350 pairs each from REF and ALT
- 30 het SNPs with verified REF alleles
- 100% MAPQ=60 alignment quality (was 94% MAPQ=0)
- Removed stale annotation.gtf symlink

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Warns users when running on arm64/aarch64 that STAR requires x86_64
emulation, preventing confusing failures during test data generation.
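A Python sketch of such an architecture check, assuming `platform.machine()` reports `arm64` (macOS) or `aarch64` (Linux) on ARM64 hosts. The actual warning may live in a shell script; these function names are hypothetical.

```python
import platform
import warnings

ARM_ARCHES = {"arm64", "aarch64"}


def needs_x86_emulation(machine=None):
    """True on ARM64 hosts, where STAR needs x86_64 emulation."""
    arch = (machine or platform.machine()).lower()
    return arch in ARM_ARCHES


def warn_if_arm(machine=None):
    """Emit a RuntimeWarning on ARM64 hosts; return whether one was emitted."""
    if needs_x86_emulation(machine):
        warnings.warn(
            "STAR requires x86_64 emulation on arm64/aarch64 hosts; "
            "consider -profile docker,arm",
            RuntimeWarning,
        )
        return True
    return False


if __name__ == "__main__":
    warn_if_arm()
```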

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nvironments

Remove process.conda from all 4 pipeline conda profiles. It was
overriding module-level conda directives and forcing all processes
(including R-based OUTRIDER_FIT) to use the root WASP2 Python env.
Now each module uses its own conda environment per the nf-core pattern.

Additional fixes from E2E validation runs:
- Create missing environment.yml for nf-outrider local Python modules
- Create missing environment.yml for nf-rnaseq WASP2 modules
- Fix macOS zcat incompatibility in STAR align (use gunzip -c)
- Fix BSD awk ternary operator in scatac_count_alleles
- Fix BSD awk string concatenation in scatac_pseudobulk redirections
- Fix Polars 0.20.x API: schema_overrides→dtypes, collect_schema→schema

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add meta.yml for all 6 nf-rnaseq local modules (star_align,
  wasp2_unified_make_reads, wasp2_filter_remapped, wasp2_count_alleles,
  wasp2_analyze_imbalance, wasp2_ml_output)
- Add meta.yml for nf-scatac scatac_add_haplotype_layers
- Add params.help handler to nf-rnaseq main.nf using nf-validation plugin
- Add homePage to manifest in nf-atacseq, nf-scatac, nf-outrider configs
- Add email_template.html to all 4 pipelines
- Add root environment.yml to nf-atacseq, nf-rnaseq, nf-scatac

Compliance: meta.yml 18/18, homePage 4/4, email 4/4, env.yml 4/4,
params.help 4/4. Overall nf-core compliance ~97% (remaining: logo PNG,
DOI pending publication).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add DOI (10.1038/nmeth.3582) to all 4 manifest blocks
- Generate pipeline logo PNGs for all 4 pipelines
- Refactor outrider_fit and merge_counts to use tuple val(meta)
  input/output pattern with dynamic $meta.id tags
- Update outrider.nf workflow to wrap collected counts with
  [id: 'all_samples'] meta map and unwrap for downstream emit

All 18 local modules now use meta map pattern (18/18).
All documentation, assets, config fields at 100%.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Jaureguy760 Jaureguy760 deleted the feat/nfcore-compliance-full branch March 15, 2026 07:53