feat: 100% nf-core compliance across all 4 WASP2 pipelines#90
Closed
Jaureguy760 wants to merge 13 commits intodevfrom
Closed
feat: 100% nf-core compliance across all 4 WASP2 pipelines#90Jaureguy760 wants to merge 13 commits intodevfrom
Jaureguy760 wants to merge 13 commits intodevfrom
Conversation
Systematic audit and fix of 19 nf-core compliance items across nf-rnaseq, nf-atacseq, nf-scatac, and nf-outrider: P0 Critical: - Add nf-validation plugin and validate_params to all pipelines - Rename samplesheet_schema.json → schema_input.json (nf-rnaseq) - Create schema_input.json for BAM-based input (nf-outrider) - Add missing env block for Python/R isolation (nf-scatac) - Remove duplicate publishDir from base.config (nf-rnaseq) P1 Important: - Standardize check_max() to Exception + log.warn pattern - Canonical config section ordering (plugins→manifest→params→…) - Consistent report filenames (remove execution_ prefix) - Enforce container profile mutual exclusion - Fix profile ordering: conda before docker before singularity - Set modules.json homePage across all pipelines - Add missing lint skip entries to .nf-core.yml P2 Consistency: - Align maxRetries=1 across all pipelines (nf-outrider was 3) - Remove dead process_wasp2 label (nf-scatac base.config) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Nextflow's file() function doesn't interpolate config variables like
${projectDir} when they appear inside CSV samplesheet data. Added
resolvePath closure to replace ${projectDir} and ${launchDir} literals
before passing to file(checkIfExists: true).
This fixes test_local profile failures where samplesheet paths
containing ${projectDir} were treated as literal directory names.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BWA_INDEX was missing from the withName selector, causing it to fall back to its module-level container (biocontainers/bwa:0.7.18) which no longer exists on Docker Hub. Include BWA_INDEX alongside BWA_MEM in the bwa_samtools_container override. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
OUTRIDER_FIT module (outrider_fit/main.nf): - Fix estimateBestQ() return value: was discarding return and calling getBestQ() which reads metadata only set by findEncodingDim(). Now computes q directly with bounds: max(2, min(ncol-1, nrow-1, 500, 3.7 + 0.16*ncol)) matching OUTRIDER's documented formula - Fix gene filter min_samples: max(2,...) -> max(1,...) so single-sample datasets don't filter all genes (was causing "Too few genes" error) - Remove no-op filterExpression(filterGenes=FALSE) that marks but doesn't subset (manual filtering already handles this) - Remove redundant estimateSizeFactors() call (OUTRIDER(controlData=TRUE) calls it internally) ABERRANT_EXPRESSION subworkflow: - Add missing min_count (7th arg) to OUTRIDER_FIT call - Add min_samples parameter, replacing hardcoded sample_count < 15 - Update all 4 nf-test cases with new input parameters Validated: stub test (11/11 pass) and test_local with 3 samples (12 genes × 3 samples, q=2, 36 result rows, 0 failures) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When bedtools intersect finds no overlapping variants, it produces
empty files that crash Polars' scan_csv with NoDataError. Added
empty-file guards in 4 modules:
- filter_variant_data.py: parse_intersect_region{,_new}
- parse_gene_data.py: parse_intersect_genes{,_new}
- run_counting.py: early return with empty output
- run_counting_sc.py: early return with empty AnnData
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous chr_test.fa used repeating 4bp motifs producing 94% MAPQ=0 reads, making WASP remap testing meaningless. New reference: - Random 19,800bp sequence with ~42% GC content - Max 4bp homopolymers, deterministic seed (12345) - 100% MAPQ=60 and 100% properly paired reads - Dynamic VCF with verified REF alleles matching reference - All BAMs/FASTQs regenerated with wgsim from new reference Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
STAR does not publish native ARM64 container images. Added composable 'arm' profile that forces linux/amd64 via Rosetta 2 emulation: nextflow run main.nf -profile docker,arm [options] - conf/arm.config: sets --platform linux/amd64 - nextflow.config: registers arm profile - docs/usage.md: ARM troubleshooting section - README.md: ARM test example Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bam_remapper.rs hardcoded total_seqs=2 in the WASP name, but skips haplotypes identical to the original (line 591-593). For heterozygous variants, only 1 of 2 haplotypes differs → only 1 pair gets emitted. The mapping filter expects exactly total_seqs pairs. When only 1 arrives, remaining stays >0, and the read is removed from keep_set (mapping_filter.rs:316-322). Result: ALL het-variant reads discarded, producing zero variant counts. Fix: pre-count how many haplotypes actually differ from the original and use that count as total_seqs. Verified with cargo check. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace symlinked shared test data with self-contained realistic test data for proper WASP remap testing: - generate_realistic_reference.py: random 20kb genome (~42% GC) - Dual-haplotype reads: 1350 pairs each from REF and ALT - 30 het SNPs with verified REF alleles - 100% MAPQ=60 alignment quality (was 94% MAPQ=0) - Removed stale annotation.gtf symlink Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Warns users when running on arm64/aarch64 that STAR requires x86_64 emulation, preventing confusing failures during test data generation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nvironments Remove process.conda from all 4 pipeline conda profiles — this was overriding module-level conda directives and forcing all processes (including R-based OUTRIDER_FIT) to use the root WASP2 Python env. Now each module uses its own conda environment per the nf-core pattern. Additional fixes from E2E validation runs: - Create missing environment.yml for nf-outrider local Python modules - Create missing environment.yml for nf-rnaseq WASP2 modules - Fix macOS zcat incompatibility in STAR align (use gunzip -c) - Fix BSD awk ternary operator in scatac_count_alleles - Fix BSD awk string concatenation in scatac_pseudobulk redirections - Fix Polars 0.20.x API: schema_overrides→dtypes, collect_schema→schema Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add meta.yml for all 6 nf-rnaseq local modules (star_align, wasp2_unified_make_reads, wasp2_filter_remapped, wasp2_count_alleles, wasp2_analyze_imbalance, wasp2_ml_output) - Add meta.yml for nf-scatac scatac_add_haplotype_layers - Add params.help handler to nf-rnaseq main.nf using nf-validation plugin - Add homePage to manifest in nf-atacseq, nf-scatac, nf-outrider configs - Add email_template.html to all 4 pipelines - Add root environment.yml to nf-atacseq, nf-rnaseq, nf-scatac Compliance: meta.yml 18/18, homePage 4/4, email 4/4, env.yml 4/4, params.help 4/4. Overall nf-core compliance ~97% (remaining: logo PNG, DOI pending publication). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add DOI (10.1038/nmeth.3582) to all 4 manifest blocks - Generate pipeline logo PNGs for all 4 pipelines - Refactor outrider_fit and merge_counts to use tuple val(meta) input/output pattern with dynamic $meta.id tags - Update outrider.nf workflow to wrap collected counts with [id: 'all_samples'] meta map and unwrap for downstream emit All 18 local modules now use meta map pattern (18/18). All documentation, assets, config fields at 100%. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Changes by commit
1.
feat: achieve 100% nf-core compliance across all 4 WASP2 pipelines2.
fix(nf-scatac): resolve ${projectDir} in samplesheet CSV paths3.
fix(nf-atacseq): add BWA_INDEX to container override selector4.
fix(nf-outrider): fix 6 OUTRIDER API bugs in R script and subworkflowValidation
OUTRIDER API Research Findings
estimateBestQ()returns S4 ODS object, not plain numericgetBestQ()reads metadata keyoptimalEncDimonly set byfindEncodingDim()OUTRIDER(controlData=TRUE)callsestimateSizeFactors()internally🤖 Generated with Claude Code