diff --git a/CHANGELOG.md b/CHANGELOG.md index 279c9b75..8de98c4b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -47,6 +47,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - [#274](https://github.com/nf-core/phaseimpute/pull/274) - Fix issue with compressed reference genome by adding `.gzi` file for `BCFTOOLS_MPILEUP` - [#275](https://github.com/nf-core/phaseimpute/pull/275) - Fix nf-test errors with latest-everything. - [#281](https://github.com/nf-core/phaseimpute/pull/281) - Fix `diffchr()` function. +- [#293](https://github.com/nf-core/phaseimpute/pull/293) - Fix nf-core and nextflow linting. ### `Dependencies` diff --git a/README.md b/README.md index b757e600..5188f244 100644 --- a/README.md +++ b/README.md @@ -106,7 +106,7 @@ We thank the following people for their extensive assistance in the development ## Contributions and Support -If you would like to contribute to this pipeline, please see the [contributing guidelines](docs/CONTRIBUTING.md). Further development tips can be found in the [development documentation](docs/development.md). +If you would like to contribute to this pipeline, please see the [contributing guidelines](docs/CONTRIBUTING.md). For further information or help, don't hesitate to get in touch on the [Slack `#phaseimpute` channel](https://nfcore.slack.com/channels/phaseimpute) (you can join with [this invite](https://nf-co.re/join/slack)). diff --git a/assets/methods_description_template.yml b/assets/methods_description_template.yml index e1e7b7d7..77c80c39 100644 --- a/assets/methods_description_template.yml +++ b/assets/methods_description_template.yml @@ -3,8 +3,6 @@ description: "Suggested text and references to use when describing pipeline usag section_name: "nf-core/phaseimpute Methods Description" section_href: "https://github.com/nf-core/phaseimpute" plot_type: "html" -## TODO nf-core: Update the HTML below to your preferred methods description, e.g. 
add publication citation for this pipeline -## You inject any metadata in the Nextflow '${workflow}' object data: |

Methods

Data was processed using nf-core/phaseimpute v${workflow.manifest.version} ${doi_text} of the nf-core collection of workflows (Ewels et al., 2020), utilising reproducible software environments from the Bioconda (Grüning et al., 2018) and Biocontainers (da Veiga Leprevost et al., 2017) projects.

diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md index b82e4bb6..6dad681b 100644 --- a/docs/CONTRIBUTING.md +++ b/docs/CONTRIBUTING.md @@ -182,4 +182,33 @@ If you update images or graphics, follow the nf-core [style guidelines](https:// ## Pipeline specific contribution guidelines - +The `nf-core/phaseimpute` pipeline aims to strictly follow the latest Nextflow and nf-core guidelines. +As such, each local module and subworkflow should be properly written and unit-tested with nf-test. + +Local components should only be considered when their use is specific to this pipeline. +Otherwise, they should be part of the `nf-core/modules` repository. + +### Channel management and combination + +All channels need to be identified by a meta map. To follow which information is available, the `meta` argument +is suffixed with a combination of the following capital letters: + +- I : individual id +- P : panel id +- R : region used +- M : map used +- T : tool used +- G : reference genome used (is it needed ?) +- S : simulation (depth or genotype array) + +Therefore, the following channel operation example includes a meta map containing the panel id with the region and tool used: + +```nextflow +ch_panel_for_impute.map { + metaPRT, vcf, index -> ... +} +``` + +### Release names + +The names of releases are composed of a color and a dog breed. diff --git a/docs/development.md b/docs/development.md deleted file mode 100644 index 2173cb6f..00000000 --- a/docs/development.md +++ /dev/null @@ -1,26 +0,0 @@ -# Tips for development - -## Channel management and combination - -All channels need to be identified by a meta map. To follow which information is available, the `meta` argument -is suffixed with a combination of the following capital letters: - -- I : individual id -- P : panel id -- R : region used -- M : map used -- T : tool used -- G : reference genome used (is it needed ?) 
-- S : simulation (depth or genotype array) - -Therefore, the following channel operation example includes a meta map containing the panel id with the region and tool used: - -```nextflow -ch_panel_for_impute.map { - metaPRT, vcf, index -> ... -} -``` - -## Release names - -The names of releases are composed of a color and a dog breed. diff --git a/modules/local/addcolumns/meta.yml b/modules/local/addcolumns/meta.yml new file mode 100644 index 00000000..48ee8f67 --- /dev/null +++ b/modules/local/addcolumns/meta.yml @@ -0,0 +1,61 @@ +name: addcolumns +description: Add metadata information to an existing file as additional columns +keywords: + - columns + - metadata + - awk +tools: + - gawk: + description: "GNU awk" + homepage: "https://www.gnu.org/software/gawk/" + documentation: "https://www.gnu.org/software/gawk/manual/" + tool_dev_url: "https://www.gnu.org/prep/ftp.html" + licence: + - "GPL v3" + identifier: "" +input: + - meta: + type: map + description: | + Groovy Map containing sample information + The following keys are added in additional columns to the input file. + e.g. [ id:'test', depth:1, gparray:'illumina', tools:'glimpse', panel:'1000G' ] + - input: + type: file + description: Textual format file +output: + txt: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - txt: + type: file + description: Resulting file with additional columns containing the metadata information + pattern: "*.txt" + versions_gawk: + - - ${task.process}: + type: string + description: The name of the process + - gawk: + type: string + description: The name of the tool + - awk -Wversion | sed '1!d; s/.*Awk //; s/,.*//': + type: eval + description: The expression to obtain the version of the tool +topics: + versions: + - - ${task.process}: + type: string + description: The name of the process + - gawk: + type: string + description: The name of the tool + - awk -Wversion | sed '1!d; s/.*Awk //; s/,.*//': + type: eval + description: The expression to obtain the version of the tool +authors: + - "@louislenezet" +maintainers: + - "@louislenezet" diff --git a/modules/local/listtofile/meta.yml b/modules/local/listtofile/meta.yml index 97f78209..4756643f 100644 --- a/modules/local/listtofile/meta.yml +++ b/modules/local/listtofile/meta.yml @@ -4,11 +4,14 @@ keywords: - list - gawk tools: - - annotate: - description: | - Extract the file names from a list of path and register them into a file. - The corresponding identifier can be added to the file in a second column - or in a separate file. + - gawk: + description: "GNU awk" + homepage: "https://www.gnu.org/software/gawk/" + documentation: "https://www.gnu.org/software/gawk/manual/" + tool_dev_url: "https://www.gnu.org/prep/ftp.html" + licence: + - "GPL v3" + identifier: "" input: - - meta: type: map diff --git a/modules/local/vcfchrextract/meta.yml b/modules/local/vcfchrextract/meta.yml index 19d523d4..6e69faf3 100644 --- a/modules/local/vcfchrextract/meta.yml +++ b/modules/local/vcfchrextract/meta.yml @@ -6,12 +6,14 @@ keywords: - head - contig tools: - - head: - description: Extract header from variant calling file. + - query: + description: | + Extracts fields from VCF or BCF files and outputs them in user-defined format. 
homepage: http://samtools.github.io/bcftools/bcftools.html - documentation: https://samtools.github.io/bcftools/bcftools.html#head + documentation: http://www.htslib.org/doc/bcftools.html doi: 10.1093/bioinformatics/btp352 licence: ["MIT"] + identifier: biotools:bcftools input: - meta: type: map diff --git a/ro-crate-metadata.json b/ro-crate-metadata.json index dd5c8ded..d9128fb8 100644 --- a/ro-crate-metadata.json +++ b/ro-crate-metadata.json @@ -23,7 +23,7 @@ "@type": "Dataset", "creativeWorkStatus": "InProgress", "datePublished": "2026-04-30T13:33:11+00:00", - "description": "

\n \n \n \"nf-core/phaseimpute\"\n \n

\n\n[![Open in GitHub Codespaces](https://img.shields.io/badge/Open_In_GitHub_Codespaces-black?labelColor=grey&logo=github)](https://github.com/codespaces/new/nf-core/phaseimpute)\n[![GitHub Actions CI Status](https://github.com/nf-core/phaseimpute/actions/workflows/nf-test.yml/badge.svg)](https://github.com/nf-core/phaseimpute/actions/workflows/nf-test.yml)\n[![GitHub Actions Linting Status](https://github.com/nf-core/phaseimpute/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/phaseimpute/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/phaseimpute/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.14329225-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.14329225)\n[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)\n\n[![Nextflow](https://img.shields.io/badge/version-%E2%89%A525.10.4-green?style=flat&logo=nextflow&logoColor=white&color=%230DC09D&link=https%3A%2F%2Fnextflow.io)](https://www.nextflow.io/)\n[![nf-core template version](https://img.shields.io/badge/nf--core_template-4.0.2-green?style=flat&logo=nfcore&logoColor=white&color=%2324B064&link=https%3A%2F%2Fnf-co.re)](https://github.com/nf-core/tools/releases/tag/4.0.2)\n[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)\n[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)\n[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)\n[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/nf-core/phaseimpute)\n\n[![Get help on 
Slack](http://img.shields.io/badge/slack-nf--core%20%23phaseimpute-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/phaseimpute)[![Follow on Bluesky](https://img.shields.io/badge/bluesky-%40nf__core-1185fe?labelColor=000000&logo=bluesky)](https://bsky.app/profile/nf-co.re)[![Follow on Mastodon](https://img.shields.io/badge/mastodon-nf__core-6364ff?labelColor=FFFFFF&logo=mastodon)](https://mstdn.science/@nf_core)[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core)\n\n## Introduction\n\n**nf-core/phaseimpute** is a bioinformatics pipeline to phase and impute genetic data.\n\n\n \n \"metromap\"/\n\n\nThe whole pipeline consists of five main steps, each of which can be run separately and independently. Users are not required to run all steps sequentially and can select specific steps based on their needs:\n\n1. **QC: Chromosome Name Check**: Ensures compatibility by validating that all expected contigs are present in the variant and alignment files.\n\n2. **Simulation (`--simulate`)**: Generates artificial datasets by downsampling high-density data to simulate low-pass genetic information. This enables the comparison of imputation results against a high-quality dataset (truth set). Simulations may include:\n - **Low-pass data generation** by downsampling BAM or CRAM files with [`samtools view -s`](https://www.htslib.org/doc/samtools-view.html) at different depths.\n\n3. **Panel Preparation (`--panelprep`)**: Prepares the reference panel through phasing, quality control, variant filtering, and annotation. Key processes include:\n - **Normalization** of the reference panel to retain essential variants.\n - **Phasing** of haplotypes in the reference panel using [Shapeit5](https://odelaneau.github.io/shapeit5/).\n - **Chunking** of the reference panel into specific regions across chromosomes.\n - **Position Extraction** for targeted imputation sites.\n\n4. 
**Imputation (`--impute`)**: This is the primary step, where genotypes in the target dataset are imputed using the prepared reference panel. The main steps are:\n - **Imputation** of the target dataset using tools like [Glimpse1](https://odelaneau.github.io/GLIMPSE/glimpse1/index.html), [Glimpse2](https://odelaneau.github.io/GLIMPSE/), [Stitch](https://github.com/rwdavies/stitch), [Quilt/Quilt2](https://github.com/rwdavies/QUILT), [Beagle5](https://faculty.washington.edu/browning/beagle/beagle.html) or [Minimac4](https://github.com/statgen/Minimac4).\n - **Ligation** of imputed chunks to produce a final VCF file per sample, with all chromosomes unified.\n\n5. **Validation (`--validate`)**: Assesses imputation accuracy by comparing the imputed dataset to a truth dataset. This step leverages the [Glimpse2](https://odelaneau.github.io/GLIMPSE/) concordance process to summarize differences between two VCF files.\n\nFor more detailed instructions, please refer to the [usage documentation](https://nf-co.re/phaseimpute/usage).\n\n## Usage\n\n> [!NOTE]\n> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/get_started/environment_setup/overview) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/get_started/run-your-first-pipeline) with `-profile test` before running the workflow on actual data.\n\nThe primary function of this pipeline is to impute a target dataset based on a phased panel. Begin by preparing a samplesheet with your input data, formatted as follows:\n\n```csv title=\"samplesheet.csv\"\nsample,file,index\nSAMPLE_1X,/path/to/.,/path/to/.\n```\n\nEach row represents either a bam or a cram file along with its corresponding index file. Ensure that all input files have consistent file extensions.\n\nFor certain tools and steps within the pipeline, you will also need to provide a samplesheet for the reference panel. 
Here's an example of what a final samplesheet for a reference panel might look like, covering three chromosomes:\n\n```csv title=\"panel.csv\"\npanel,chr,vcf,index\nPhase3,1,ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz,ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.csi\nPhase3,2,ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz,ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.csi\nPhase3,3,ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz,ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.csi\n```\n\n## Running the pipeline\n\nRun one of the steps of the pipeline (imputation with glimpse1) using the following command and test profile:\n\n```bash\nnextflow run nf-core/phaseimpute \\\n -profile test, \\\n --outdir \n```\n\n> [!WARNING]\n> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/running/run-pipelines#using-parameter-files).\n\nFor more details and further functionality, please refer to the [usage documentation](https://nf-co.re/phaseimpute/usage) and the [parameter documentation](https://nf-co.re/phaseimpute/parameters).\n\n## Pipeline output\n\nTo see the results of an example test run with a full size dataset refer to the [results](https://nf-co.re/phaseimpute/results) tab on the nf-core website pipeline page.\nFor more details on the output files and reports, please refer to the [output documentation](https://nf-co.re/phaseimpute/output).\n\n## Credits\n\nnf-core/phaseimpute was originally written by Louis Le N\u00e9zet & Anabella Trigila.\n\nWe thank the following people for their extensive assistance in the development of this pipeline:\n\n- Saul Pierotti\n- Eugenia Fontecha\n- Matias Romero 
Victorica\n- Hemanoel Passarelli\n- Gaspard Ichas\n\n## Contributions and Support\n\nIf you would like to contribute to this pipeline, please see the [contributing guidelines](docs/CONTRIBUTING.md). Further development tips can be found in the [development documentation](docs/development.md).\n\nFor further information or help, don't hesitate to get in touch on the [Slack `#phaseimpute` channel](https://nfcore.slack.com/channels/phaseimpute) (you can join with [this invite](https://nf-co.re/join/slack)).\n\n## Citations\n\nIf you use nf-core/phaseimpute for your analysis, please cite it using the following doi: [10.5281/zenodo.14329225](https://doi.org/10.5281/zenodo.14329225)\n\nAn extensive list of references for the tools used by the pipeline, including QUILT, GLIMPSE, and STITCH, can be found in the [`CITATIONS.md`](CITATIONS.md) file.\n\nYou can cite the `nf-core` publication as follows:\n\n> **The nf-core framework for community-curated bioinformatics pipelines.**\n>\n> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.\n>\n> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).\n", + "description": "

\n \n \n \"nf-core/phaseimpute\"\n \n

\n\n[![Open in GitHub Codespaces](https://img.shields.io/badge/Open_In_GitHub_Codespaces-black?labelColor=grey&logo=github)](https://github.com/codespaces/new/nf-core/phaseimpute)\n[![GitHub Actions CI Status](https://github.com/nf-core/phaseimpute/actions/workflows/nf-test.yml/badge.svg)](https://github.com/nf-core/phaseimpute/actions/workflows/nf-test.yml)\n[![GitHub Actions Linting Status](https://github.com/nf-core/phaseimpute/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/phaseimpute/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/phaseimpute/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.14329225-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.14329225)\n[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)\n\n[![Nextflow](https://img.shields.io/badge/version-%E2%89%A525.10.4-green?style=flat&logo=nextflow&logoColor=white&color=%230DC09D&link=https%3A%2F%2Fnextflow.io)](https://www.nextflow.io/)\n[![nf-core template version](https://img.shields.io/badge/nf--core_template-4.0.2-green?style=flat&logo=nfcore&logoColor=white&color=%2324B064&link=https%3A%2F%2Fnf-co.re)](https://github.com/nf-core/tools/releases/tag/4.0.2)\n[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)\n[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)\n[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)\n[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/nf-core/phaseimpute)\n\n[![Get help on 
Slack](http://img.shields.io/badge/slack-nf--core%20%23phaseimpute-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/phaseimpute)[![Follow on Bluesky](https://img.shields.io/badge/bluesky-%40nf__core-1185fe?labelColor=000000&logo=bluesky)](https://bsky.app/profile/nf-co.re)[![Follow on Mastodon](https://img.shields.io/badge/mastodon-nf__core-6364ff?labelColor=FFFFFF&logo=mastodon)](https://mstdn.science/@nf_core)[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core)\n\n## Introduction\n\n**nf-core/phaseimpute** is a bioinformatics pipeline to phase and impute genetic data.\n\n\n \n \"metromap\"/\n\n\nThe whole pipeline consists of five main steps, each of which can be run separately and independently. Users are not required to run all steps sequentially and can select specific steps based on their needs:\n\n1. **QC: Chromosome Name Check**: Ensures compatibility by validating that all expected contigs are present in the variant and alignment files.\n\n2. **Simulation (`--simulate`)**: Generates artificial datasets by downsampling high-density data to simulate low-pass genetic information. This enables the comparison of imputation results against a high-quality dataset (truth set). Simulations may include:\n - **Low-pass data generation** by downsampling BAM or CRAM files with [`samtools view -s`](https://www.htslib.org/doc/samtools-view.html) at different depths.\n\n3. **Panel Preparation (`--panelprep`)**: Prepares the reference panel through phasing, quality control, variant filtering, and annotation. Key processes include:\n - **Normalization** of the reference panel to retain essential variants.\n - **Phasing** of haplotypes in the reference panel using [Shapeit5](https://odelaneau.github.io/shapeit5/).\n - **Chunking** of the reference panel into specific regions across chromosomes.\n - **Position Extraction** for targeted imputation sites.\n\n4. 
**Imputation (`--impute`)**: This is the primary step, where genotypes in the target dataset are imputed using the prepared reference panel. The main steps are:\n - **Imputation** of the target dataset using tools like [Glimpse1](https://odelaneau.github.io/GLIMPSE/glimpse1/index.html), [Glimpse2](https://odelaneau.github.io/GLIMPSE/), [Stitch](https://github.com/rwdavies/stitch), [Quilt/Quilt2](https://github.com/rwdavies/QUILT), [Beagle5](https://faculty.washington.edu/browning/beagle/beagle.html) or [Minimac4](https://github.com/statgen/Minimac4).\n - **Ligation** of imputed chunks to produce a final VCF file per sample, with all chromosomes unified.\n\n5. **Validation (`--validate`)**: Assesses imputation accuracy by comparing the imputed dataset to a truth dataset. This step leverages the [Glimpse2](https://odelaneau.github.io/GLIMPSE/) concordance process to summarize differences between two VCF files.\n\nFor more detailed instructions, please refer to the [usage documentation](https://nf-co.re/phaseimpute/usage).\n\n## Usage\n\n> [!NOTE]\n> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/get_started/environment_setup/overview) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/get_started/run-your-first-pipeline) with `-profile test` before running the workflow on actual data.\n\nThe primary function of this pipeline is to impute a target dataset based on a phased panel. Begin by preparing a samplesheet with your input data, formatted as follows:\n\n```csv title=\"samplesheet.csv\"\nsample,file,index\nSAMPLE_1X,/path/to/.,/path/to/.\n```\n\nEach row represents either a bam or a cram file along with its corresponding index file. Ensure that all input files have consistent file extensions.\n\nFor certain tools and steps within the pipeline, you will also need to provide a samplesheet for the reference panel. 
Here's an example of what a final samplesheet for a reference panel might look like, covering three chromosomes:\n\n```csv title=\"panel.csv\"\npanel,chr,vcf,index\nPhase3,1,ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz,ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.csi\nPhase3,2,ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz,ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.csi\nPhase3,3,ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz,ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.csi\n```\n\n## Running the pipeline\n\nRun one of the steps of the pipeline (imputation with glimpse1) using the following command and test profile:\n\n```bash\nnextflow run nf-core/phaseimpute \\\n -profile test, \\\n --outdir \n```\n\n> [!WARNING]\n> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/running/run-pipelines#using-parameter-files).\n\nFor more details and further functionality, please refer to the [usage documentation](https://nf-co.re/phaseimpute/usage) and the [parameter documentation](https://nf-co.re/phaseimpute/parameters).\n\n## Pipeline output\n\nTo see the results of an example test run with a full size dataset refer to the [results](https://nf-co.re/phaseimpute/results) tab on the nf-core website pipeline page.\nFor more details on the output files and reports, please refer to the [output documentation](https://nf-co.re/phaseimpute/output).\n\n## Credits\n\nnf-core/phaseimpute was originally written by Louis Le N\u00e9zet & Anabella Trigila.\n\nWe thank the following people for their extensive assistance in the development of this pipeline:\n\n- Saul Pierotti\n- Eugenia Fontecha\n- Matias Romero 
Victorica\n- Hemanoel Passarelli\n- Gaspard Ichas\n\n## Contributions and Support\n\nIf you would like to contribute to this pipeline, please see the [contributing guidelines](docs/CONTRIBUTING.md).\n\nFor further information or help, don't hesitate to get in touch on the [Slack `#phaseimpute` channel](https://nfcore.slack.com/channels/phaseimpute) (you can join with [this invite](https://nf-co.re/join/slack)).\n\n## Citations\n\nIf you use nf-core/phaseimpute for your analysis, please cite it using the following doi: [10.5281/zenodo.14329225](https://doi.org/10.5281/zenodo.14329225)\n\nAn extensive list of references for the tools used by the pipeline, including QUILT, GLIMPSE, and STITCH, can be found in the [`CITATIONS.md`](CITATIONS.md) file.\n\nYou can cite the `nf-core` publication as follows:\n\n> **The nf-core framework for community-curated bioinformatics pipelines.**\n>\n> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.\n>\n> _Nat Biotechnol._ 2020 Feb 13. 
doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).\n", "hasPart": [ { "@id": "main.nf" diff --git a/subworkflows/local/bam_impute_quilt2/main.nf b/subworkflows/local/bam_impute_quilt2/main.nf deleted file mode 100644 index 2fc57c1e..00000000 --- a/subworkflows/local/bam_impute_quilt2/main.nf +++ /dev/null @@ -1,74 +0,0 @@ -include { QUILT_QUILT2 } from '../../../modules/nf-core/quilt/quilt2' -include { GLIMPSE2_LIGATE } from '../../../modules/nf-core/glimpse2/ligate' -include { BCFTOOLS_INDEX } from '../../../modules/nf-core/bcftools/index' - -workflow BAM_IMPUTE_QUILT2 { - take: - ch_input // channel (mandatory): [ [id], [bam], [bai], bampaths, bamnames ] - ch_reference_panel // channel (mandatory): [ [panel, chr], vcf, index ] - ch_chunks // channel (optional) : [ [panel, chr], chr, start, end ] - ch_map // channel (optional) : [ [panel, chr], map ] - ch_fasta // channel (optional) : [ [genome], fa, fai ] - n_gen // integer: Number of generations since founding or mixing - buffer // integer: Buffer of region to perform imputation over - - main: - - ch_parameters = ch_reference_panel - .combine(ch_map, by: 0) - .combine(ch_chunks, by: 0) - - ch_parameters.ifEmpty { - error("ERROR: join operation resulted in an empty channel. 
Please provide a valid ch_chunks and ch_map channel as input.") - } - - ch_bam_params = ch_input - .combine(ch_parameters) - .map { metaI, bam, bai, bampath, bamname, metaPC, reference_vcf, reference_index, gmap, chr, start, end -> - def regionout = "${chr}" - if (start != [] && end != []) { - regionout = "${chr}:${start}-${end}" - } - [ - metaPC + metaI + ["regionout": regionout], - bam, - bai, - bampath, - bamname, - reference_vcf, - reference_index, - [], - [], - [], - chr, - start, - end, - n_gen, - buffer, - gmap, - ] - } - - QUILT_QUILT2(ch_bam_params, ch_fasta) - - ligate_input = QUILT_QUILT2.out.vcf - .join(QUILT_QUILT2.out.tbi) - .map { meta, vcf, index -> - def keysToKeep = meta.keySet() - ['regionout'] - [meta.subMap(keysToKeep), vcf, index] - } - .groupTuple() - - GLIMPSE2_LIGATE(ligate_input) - - BCFTOOLS_INDEX(GLIMPSE2_LIGATE.out.merged_variants) - - ch_vcf_index = GLIMPSE2_LIGATE.out.merged_variants.join( - BCFTOOLS_INDEX.out.tbi.mix(BCFTOOLS_INDEX.out.csi), - failOnMismatch: true, - failOnDuplicate: true, - ) - - emit: - vcf_index = ch_vcf_index // channel: [ [id, chr], vcf, tbi ] -} diff --git a/subworkflows/local/bam_impute_quilt2/meta.yml b/subworkflows/local/bam_impute_quilt2/meta.yml deleted file mode 100644 index 523d3059..00000000 --- a/subworkflows/local/bam_impute_quilt2/meta.yml +++ /dev/null @@ -1,39 +0,0 @@ -name: "bam_impute_quilt2" -description: Impute low-coverage BAM/CRAM inputs with QUILT2 and ligate chunked - outputs per chromosome. -keywords: - - imputation - - low-coverage - - bam - - cram - - vcf -components: - - quilt/quilt2 - - glimpse2/ligate - - bcftools/index -input: - - ch_input: - type: channel - description: BAM/CRAM input batches with optional rename files. - - ch_reference_panel: - type: channel - description: Reference panel VCF per chromosome. - - ch_chunks: - type: channel - description: Imputation chunks per chromosome. - - ch_map: - type: channel - description: Genetic map per chromosome. 
- - ch_fasta: - type: channel - description: Reference FASTA, required for CRAM inputs. - - n_gen: - type: integer - description: Number of generations since founding or mixing. - - buffer: - type: integer - description: Buffer of region to perform imputation over. -output: - - vcf_index: - type: channel - description: Imputed and indexed VCF files per chromosome. diff --git a/subworkflows/local/prepare_genome/main.nf b/subworkflows/local/prepare_genome/main.nf index 9e5765e9..ac6cfca7 100644 --- a/subworkflows/local/prepare_genome/main.nf +++ b/subworkflows/local/prepare_genome/main.nf @@ -20,23 +20,23 @@ workflow PREPARE_GENOME { [] ]) + def need_faidx = !fasta_fai_path || (is_compressed && !fasta_gzi_path) + if (need_faidx) { + SAMTOOLS_FAIDX(ch_fasta, false) + } + if (fasta_fai_path) { ch_fai = channel.of(file(fasta_fai_path, checkIfExists:true)) } else { - SAMTOOLS_FAIDX(ch_fasta, false) ch_fai = SAMTOOLS_FAIDX.out.fai.map{ _meta, fasta_fai -> fasta_fai } } - if (is_compressed) { - if (fasta_gzi_path) { - ch_gzi = channel.of(file(fasta_gzi_path, checkIfExists:true)) - } else if (!fasta_fai_path) { - ch_gzi = SAMTOOLS_FAIDX.out.gzi.map{ _meta, gzi -> gzi } - } else { - SAMTOOLS_FAIDX(ch_fasta, false) - ch_gzi = SAMTOOLS_FAIDX.out.gzi.map{ _meta, gzi -> gzi } - } + + if (!is_compressed) { + ch_gzi = channel.of([[]]) + } else if (fasta_gzi_path) { + ch_gzi = channel.of(file(fasta_gzi_path, checkIfExists:true)) } else { - ch_gzi = channel.of([]) + ch_gzi = SAMTOOLS_FAIDX.out.gzi.map{ _meta, gzi -> gzi } } ch_fasta_fai_gzi = ch_fasta @@ -46,5 +46,5 @@ workflow PREPARE_GENOME { .collect() emit: - ch_fasta_fai_gzi + ch_fasta_fai_gzi = ch_fasta_fai_gzi } diff --git a/subworkflows/local/prepare_genome/meta.yml b/subworkflows/local/prepare_genome/meta.yml new file mode 100644 index 00000000..8ceac38c --- /dev/null +++ b/subworkflows/local/prepare_genome/meta.yml @@ -0,0 +1,51 @@ +# yaml-language-server: 
$schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json +name: "PREPARE_GENOME" +description: | + Subworkflow to prepare the reference genome for imputation. + This includes indexing the genome and preparing the necessary files. +keywords: + - genome + - reference + - indexing +components: + - samtools/faidx +input: + - genome: + type: string + description: Reference genome name + - fasta_path: + type: file + description: Reference genome FASTA file + pattern: "*.{fa,fasta,fa.gz,fasta.gz}" + - fasta_index_path: + type: file + description: Reference genome FASTA index file + pattern: "*.{fai,faidx}" + - fasta_gzi_path: + type: file + description: Reference genome FASTA gzi index file (optional) + pattern: "*.gzi" +output: + - ch_fasta_fai_gzi: + type: channel + description: Channel containing the reference genome FASTA file, its index and gzi index if present + structure: + - meta: + type: map + description: Metadata map that will be combined with the input data map + - fasta: + type: file + description: Reference genome FASTA file + pattern: "*.{fa,fasta,fa.gz,fasta.gz}" + - fai: + type: file + description: Reference genome FASTA index file + pattern: "*.{fai,faidx}" + - gzi: + type: file + description: Reference genome FASTA gzi index file (optional) + pattern: "*.gzi" +authors: + - "@louislenezet" +maintainers: + - "@louislenezet" diff --git a/subworkflows/local/prepare_genome/tests/main.nf.test b/subworkflows/local/prepare_genome/tests/main.nf.test new file mode 100644 index 00000000..531845eb --- /dev/null +++ b/subworkflows/local/prepare_genome/tests/main.nf.test @@ -0,0 +1,94 @@ +nextflow_workflow { + + name "Test Subworkflow PREPARE_GENOME" + script "../main.nf" + workflow "PREPARE_GENOME" + + tag "subworkflows" + tag "subworkflows_local" + tag "subworkflows/prepare_genome" + tag "prepare_genome" + + tag "samtools" + tag "samtools/faidx" + + test("Homo sapiens GRCh38 reference genome - no fai, no gzi") { + when { + 
workflow { + """ + input[0] = "hg38" + input[1] = file(params.pipelines_testdata_base_path + "hum_data/reference_genome/GRCh38.s.fa.gz", checkIfExists:true) + input[2] = [] + input[3] = [] + """ + } + } + + then { + assertAll( + { assert workflow.success }, + { assert snapshot(sanitizeOutput(workflow.out)).match() } + ) + } + } + + test("Homo sapiens GRCh38 reference genome - with fai, no gzi") { + when { + workflow { + """ + input[0] = "hg38" + input[1] = file(params.pipelines_testdata_base_path + "hum_data/reference_genome/GRCh38.s.fa.gz", checkIfExists:true) + input[2] = file(params.pipelines_testdata_base_path + "hum_data/reference_genome/GRCh38.s.fa.gz.fai", checkIfExists:true) + input[3] = [] + """ + } + } + + then { + assertAll( + { assert workflow.success }, + { assert snapshot(sanitizeOutput(workflow.out)).match() } + ) + } + } + + test("Homo sapiens GRCh38 reference genome - with fai, with gzi") { + when { + workflow { + """ + input[0] = "hg38" + input[1] = file(params.pipelines_testdata_base_path + "hum_data/reference_genome/GRCh38.s.fa.gz", checkIfExists:true) + input[2] = file(params.pipelines_testdata_base_path + "hum_data/reference_genome/GRCh38.s.fa.gz.fai", checkIfExists:true) + input[3] = file(params.pipelines_testdata_base_path + "hum_data/reference_genome/GRCh38.s.fa.gz.gzi", checkIfExists:true) + """ + } + } + + then { + assertAll( + { assert workflow.success }, + { assert snapshot(sanitizeOutput(workflow.out)).match() } + ) + } + } + + test("Homo sapiens reference genome not compressed - no fai, no gzi") { + when { + workflow { + """ + input[0] = "hg38" + input[1] = file(params.modules_testdata_base_path + "genomics/homo_sapiens/genome/genome2.fasta", checkIfExists:true) + input[2] = [] + input[3] = [] + """ + } + } + + then { + assertAll( + { assert workflow.success }, + { assert snapshot(sanitizeOutput(workflow.out)).match() } + ) + } + } +} diff --git a/subworkflows/local/prepare_genome/tests/main.nf.test.snap
b/subworkflows/local/prepare_genome/tests/main.nf.test.snap new file mode 100644 index 00000000..857f12ea --- /dev/null +++ b/subworkflows/local/prepare_genome/tests/main.nf.test.snap @@ -0,0 +1,88 @@ +{ + "Homo sapiens GRCh38 reference genome - with fai, with gzi": { + "content": [ + { + "ch_fasta_fai_gzi": [ + [ + { + "genome": "hg38" + }, + "/nf-core/test-datasets/refs/heads/phaseimpute/hum_data/reference_genome/GRCh38.s.fa.gz", + "/nf-core/test-datasets/refs/heads/phaseimpute/hum_data/reference_genome/GRCh38.s.fa.gz.fai", + "/nf-core/test-datasets/refs/heads/phaseimpute/hum_data/reference_genome/GRCh38.s.fa.gz.gzi" + ] + ] + } + ], + "timestamp": "2026-05-21T16:49:07.995543799", + "meta": { + "nf-test": "0.9.5", + "nextflow": "26.04.1" + } + }, + "Homo sapiens GRCh38 reference genome - no fai, no gzi": { + "content": [ + { + "ch_fasta_fai_gzi": [ + [ + { + "genome": "hg38" + }, + "/nf-core/test-datasets/refs/heads/phaseimpute/hum_data/reference_genome/GRCh38.s.fa.gz", + "GRCh38.s.fa.gz.fai:md5,4f4e0ff133e7a05cb469e345f766ca8c", + "GRCh38.s.fa.gz.gzi:md5,09046d9646db2cc5c425f231ce4595d7" + ] + ] + } + ], + "timestamp": "2026-05-21T16:48:47.950717313", + "meta": { + "nf-test": "0.9.5", + "nextflow": "26.04.1" + } + }, + "Homo sapiens GRCh38 reference genome - with fai, no gzi": { + "content": [ + { + "ch_fasta_fai_gzi": [ + [ + { + "genome": "hg38" + }, + "/nf-core/test-datasets/refs/heads/phaseimpute/hum_data/reference_genome/GRCh38.s.fa.gz", + "/nf-core/test-datasets/refs/heads/phaseimpute/hum_data/reference_genome/GRCh38.s.fa.gz.fai", + "GRCh38.s.fa.gz.gzi:md5,09046d9646db2cc5c425f231ce4595d7" + ] + ] + } + ], + "timestamp": "2026-05-21T16:49:01.266153309", + "meta": { + "nf-test": "0.9.5", + "nextflow": "26.04.1" + } + }, + "Homo sapiens reference genome not compressed - no fai, no gzi": { + "content": [ + { + "ch_fasta_fai_gzi": [ + [ + { + "genome": "hg38" + }, + "/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/genome2.fasta", + 
"genome2.fasta.fai:md5,5b25db0adda159ef5eb5c1afab9b2f2e", + [ + + ] + ] + ] + } + ], + "timestamp": "2026-05-21T16:49:18.876669956", + "meta": { + "nf-test": "0.9.5", + "nextflow": "26.04.1" + } + } +} \ No newline at end of file diff --git a/subworkflows/local/utils_nfcore_phaseimpute_pipeline/main.nf b/subworkflows/local/utils_nfcore_phaseimpute_pipeline/main.nf index ce8fc58c..aeaeacc0 100644 --- a/subworkflows/local/utils_nfcore_phaseimpute_pipeline/main.nf +++ b/subworkflows/local/utils_nfcore_phaseimpute_pipeline/main.nf @@ -15,7 +15,6 @@ include { completionSummary } from '../../nf-core/utils_nfcore_pipeline' include { paramsSummaryMap } from 'plugin/nf-schema' include { UTILS_NFCORE_PIPELINE } from '../../nf-core/utils_nfcore_pipeline' include { UTILS_NEXTFLOW_PIPELINE } from '../../nf-core/utils_nextflow_pipeline' -include { SAMTOOLS_FAIDX } from '../../../modules/nf-core/samtools/faidx' /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -192,7 +191,7 @@ workflow PIPELINE_INITIALISATION { // if (sheet_region == null){ // #TODO Add support for string input - ch_regions = getRegionFromFai("all", ch_ref_gen) + ch_regions = getRegionFromFai("all", ch_ref_gen) } else if (sheet_region.endsWith(".csv")) { log.info "Region file provided as input is a samplesheet" ch_regions = channel.from(samplesheetToList( @@ -306,26 +305,32 @@ workflow PIPELINE_INITIALISATION { posfile_panelid = ch_posfile.map{ metaPC, _vcf, _index, _hap, _legend, _posfile -> [metaPC.panel_id]}.unique() // Get all unique panel id except None - panel_id = panel_panelid + panel_id_list = panel_panelid .mix(chunks_panelid, posfile_panelid) .flatten() .filter { it -> it != "None" } .unique() + .toList() // Check uniqueness of panel_id // TODO add support for multiple panel - panel_id - .collect() + panel_id_list .map{ panel_ids -> - if (panel_ids.size() != 1) { + if (panel_ids.size() > 1) { error "Multiple panel IDs detected: ${panel_ids}. 
Please provide only one across panel, chunks and posfile." } + if (panel_ids.size() == 0) { + log.warn "No panel IDs detected. Channel panel, chunks and posfile will be initialise with null values and `panel_id`: None." + } } + panel_id_list = panel_id_list + .map{ ids -> ids.size() > 0 ? ids[0] : "None" } + // For each channel if not provided change panel_id to available ones if (!sheet_panel) { ch_panel = ch_panel - .combine(panel_id) + .combine(panel_id_list) .map{ metaPC, vcf, index, panel_id_name -> [ metaPC + ['panel_id': panel_id_name], vcf, index ]} @@ -333,7 +338,7 @@ workflow PIPELINE_INITIALISATION { if (!sheet_chunks) { ch_chunks = ch_chunks - .combine(panel_id) + .combine(panel_id_list) .map{ metaPC, chunks, panel_id_name -> [ metaPC + ['panel_id': panel_id_name], chunks ]} @@ -341,7 +346,7 @@ workflow PIPELINE_INITIALISATION { if (!sheet_posfile) { ch_posfile = ch_posfile - .combine(panel_id) + .combine(panel_id_list) .map{ metaPC, vcf, index, hap, legend, posfile, panel_id_name -> [ metaPC + ['panel_id': panel_id_name], vcf, index, hap, legend, posfile ]} @@ -351,7 +356,11 @@ workflow PIPELINE_INITIALISATION { // Check contigs name in different meta map // // Collect all chromosomes names in all different inputs - chr_ref = ch_ref_gen.map { _meta, _fasta, fai_file, _gzi_file -> [fai_file.readLines()*.split('\t').collect{cols -> cols[0]}] } + chr_ref = ch_ref_gen.map { + _meta, _fasta, fai_file, _gzi_file -> [ + fai_file.readLines()*.split('\t').collect{cols -> cols[0]} + ] + } chr_regions = extractChr(ch_regions) // Check that the chromosomes names that will be used are all present in different inputs @@ -419,21 +428,21 @@ workflow PIPELINE_INITIALISATION { // Check that all input files have the correct index checkFileIndex(ch_input_target.mix(ch_input_truth, ch_ref_gen, ch_panel)) - // Make available both index + // Make available both index if present ch_fasta_index = ch_ref_gen.map{ meta, fasta, fai, gzi -> [ - meta, fasta, [fai, gzi] + meta, 
fasta, gzi ? [fai, gzi] : [fai] ]} emit: - ch_input_target // [ [meta], file, index ] - ch_input_truth // [ [meta], file, index ] - ch_fasta_index // [ [genome], fasta, [fai, gzi] ] - ch_panel // [ [panel_id, chr], vcf, index ] - ch_depth // [ [depth], depth ] - ch_regions // [ [chr, region], region ] - ch_map // [ [chr], map ] - ch_posfile // [ [panel_id, chr], vcf, index, hap, legend, posfile ] - ch_chunks // [ [panel_id, chr], txt ] + ch_input_target = ch_input_target // [ [meta], file, index ] + ch_input_truth = ch_input_truth // [ [meta], file, index ] + ch_fasta_index = ch_fasta_index // [ [genome], fasta, [fai, gzi] ] + ch_panel = ch_panel // [ [panel_id, chr], vcf, index ] + ch_depth = ch_depth // [ [depth], depth ] + ch_regions = ch_regions // [ [chr, region], region ] + ch_map = ch_map // [ [chr], map ] + ch_posfile = ch_posfile // [ [panel_id, chr], vcf, index, hap, legend, posfile ] + ch_chunks = ch_chunks // [ [panel_id, chr], txt ] } /* @@ -852,9 +861,9 @@ def toolCitationText(steps, tools, normalize, remove_samples, compute_freq, phas def text_panelprep = [ "Reference panel preparation followed several steps.", - normalize && remove_samples ? "The reference panel genotypes were normalized and samples" + remove_samples + "were removed" : + normalize && remove_samples ? "The reference panel genotypes were normalized and samples: " + remove_samples.split(",").join(", ") + " were removed" : normalize ? "The reference panel genotypes were normalized" : - remove_samples ? "Samples " + remove_samples.split(",").join(", ") + " were removed from the reference panel genotypes" : + remove_samples ? "Samples: " + remove_samples.split(",").join(", ") + " were removed from the reference panel genotypes" : "No normalization or sample removal were performed on the reference panel genotypes.", normalize || remove_samples ? 
"followed by site extraction and format conversion using ${tool_citation.BCFTOOLS}.": "Site extraction and format conversion was done using ${tool_citation.BCFTOOLS}.", diff --git a/subworkflows/local/utils_nfcore_phaseimpute_pipeline/meta.yml b/subworkflows/local/utils_nfcore_phaseimpute_pipeline/meta.yml index eda69f12..58465d82 100644 --- a/subworkflows/local/utils_nfcore_phaseimpute_pipeline/meta.yml +++ b/subworkflows/local/utils_nfcore_phaseimpute_pipeline/meta.yml @@ -1,5 +1,5 @@ # yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json -name: "UTILS_NFCORE_PHASEIMPUTE_PIPELINE" +name: "PIPELINE_INITIALISATION" description: | Local utility subworkflow for nf-core/phaseimpute that prepares input channels, validates pipeline inputs, and handles completion reporting hooks. @@ -12,7 +12,8 @@ components: - utils_nfschema_plugin - utils_nfcore_pipeline - utils_nextflow_pipeline - - samtools/faidx + - completionemail + - completionsummary input: - version: type: boolean @@ -47,3 +48,7 @@ output: description: Prepared chunk channel - chunk_model: description: Selected chunking model +authors: + - "@louislenezet" +maintainers: + - "@louislenezet" diff --git a/subworkflows/local/utils_nfcore_phaseimpute_pipeline/tests/function.nf.test b/subworkflows/local/utils_nfcore_phaseimpute_pipeline/tests/function.nf.test index 1715f4dc..f13d36e4 100644 --- a/subworkflows/local/utils_nfcore_phaseimpute_pipeline/tests/function.nf.test +++ b/subworkflows/local/utils_nfcore_phaseimpute_pipeline/tests/function.nf.test @@ -280,7 +280,6 @@ nextflow_function { test("Test checkFileIndex fa.gz fai - no gzi") { function "checkFileIndex" tag "checkFileIndex" - tag "test2" when { function { """ @@ -301,7 +300,6 @@ nextflow_function { test("Test checkFileIndex fa.gz fai - gzi wrong ext") { function "checkFileIndex" tag "checkFileIndex" - tag "test2" when { function { """ diff --git 
a/subworkflows/local/utils_nfcore_phaseimpute_pipeline/tests/function.nf.test.snap b/subworkflows/local/utils_nfcore_phaseimpute_pipeline/tests/function.nf.test.snap index ab7eb594..521a20fc 100644 --- a/subworkflows/local/utils_nfcore_phaseimpute_pipeline/tests/function.nf.test.snap +++ b/subworkflows/local/utils_nfcore_phaseimpute_pipeline/tests/function.nf.test.snap @@ -172,7 +172,7 @@ }, "Test toolCitationText - all tools and steps": { "content": [ - "Tools used in the workflow included the following. Low-coverage sequencing data simulation was performed with SAMtools (Danecek et al. 2021) subcommand 'depth' and 'view' for downsampling high-coverage BAM files. Reference panel preparation followed several steps. The reference panel genotypes were normalized and samplessample1,sample2were removed followed by site extraction and format conversion using BCFtools (Danecek et al. 2021). Allele frequencies were then computed with vcflib (Garrison et al. 2022). Genotype phasing was performed with SHAPEIT5 (Hofmeister et al. 2023). Finally, the reference panel was split into per-chromosome chunks using GLIMPSE (Rubinacci et al. 2021) and GLIMPSE2 (Rubinacci et al. 2023). Imputation tools used were: GLIMPSE (Rubinacci et al. 2021) with variants called using BCFtools (Danecek et al. 2021) mpileup followed by indexation with Tabix (Li H et al. 2011) when BAM files were provided, GLIMPSE2 (Rubinacci et al. 2023), QUILT (Davies et al. 2021), QUILT2 (Li et al. 2026), STITCH (Davies et al. 2016), Beagle5 (Browning et al. 2018), Minimac4 (Das et al. 2016). Imputation accuracy was assessed by comparing imputed genotypes to truth data using GLIMPSE2 (Rubinacci et al. 2023). Truth genotypes were obtained either from array genotyping data provided as input or from high-coverage sequencing data from which genotypes were called using BCFtools (Danecek et al. 2021) mpileup followed by indexation with Tabix (Li H et al. 2011). 
Pipeline results statistics were summarised with MultiQC (Ewels et al. 2016)." + "Tools used in the workflow included the following. Low-coverage sequencing data simulation was performed with SAMtools (Danecek et al. 2021) subcommand 'depth' and 'view' for downsampling high-coverage BAM files. Reference panel preparation followed several steps. The reference panel genotypes were normalized and samples: sample1, sample2 were removed followed by site extraction and format conversion using BCFtools (Danecek et al. 2021). Allele frequencies were then computed with vcflib (Garrison et al. 2022). Genotype phasing was performed with SHAPEIT5 (Hofmeister et al. 2023). Finally, the reference panel was split into per-chromosome chunks using GLIMPSE (Rubinacci et al. 2021) and GLIMPSE2 (Rubinacci et al. 2023). Imputation tools used were: GLIMPSE (Rubinacci et al. 2021) with variants called using BCFtools (Danecek et al. 2021) mpileup followed by indexation with Tabix (Li H et al. 2011) when BAM files were provided, GLIMPSE2 (Rubinacci et al. 2023), QUILT (Davies et al. 2021), QUILT2 (Li et al. 2026), STITCH (Davies et al. 2016), Beagle5 (Browning et al. 2018), Minimac4 (Das et al. 2016). Imputation accuracy was assessed by comparing imputed genotypes to truth data using GLIMPSE2 (Rubinacci et al. 2023). Truth genotypes were obtained either from array genotyping data provided as input or from high-coverage sequencing data from which genotypes were called using BCFtools (Danecek et al. 2021) mpileup followed by indexation with Tabix (Li H et al. 2011). Pipeline results statistics were summarised with MultiQC (Ewels et al. 2016)." 
], "timestamp": "2026-04-03T15:44:59.712321732", "meta": { diff --git a/subworkflows/local/utils_nfcore_phaseimpute_pipeline/tests/main.nf.test b/subworkflows/local/utils_nfcore_phaseimpute_pipeline/tests/main.nf.test index 6592ce70..d52f8ce8 100644 --- a/subworkflows/local/utils_nfcore_phaseimpute_pipeline/tests/main.nf.test +++ b/subworkflows/local/utils_nfcore_phaseimpute_pipeline/tests/main.nf.test @@ -178,7 +178,8 @@ nextflow_workflow { then { assertAll( { assert workflow.success }, - { assert workflow.stdout.contains("WARN: The following contigs are absent from at least one file : [chr2] and therefore won't be used")} + { assert workflow.stdout.contains("WARN: The following contigs are absent from at least one file : [chr2] and therefore won't be used")}, + { assert snapshot(sanitizeOutput(workflow.out)).match()} ) } } @@ -241,4 +242,64 @@ nextflow_workflow { ) } } + + test("Should give panel_id: None") { + when { + workflow { + """ + input[0] = false // version + input[1] = false // validate_params + input[2] = false // _monochrome_logs + input[3] = [] // nextflow_cli_args + input[4] = "results" // outdir + input[5] = [] // help + input[6] = [] // help_full + input[7] = [] // show_hidden + input[8] = channel.of([ + [genome: "GRCh38"], + file(params.pipelines_testdata_base_path + "/hum_data/reference_genome/GRCh38.s.fa.gz", checkIfExists: true), + file(params.pipelines_testdata_base_path + "/hum_data/reference_genome/GRCh38.s.fa.gz.fai", checkIfExists: true), + file(params.pipelines_testdata_base_path + "/hum_data/reference_genome/GRCh38.s.fa.gz.gzi", checkIfExists: true) + ]) // ch_ref_gen + input[9] = [ + "input_target" : "../../../tests/csv/sample_sim.csv", + "input_truth" : null, + "input_region" : null, + "input_panel" : null, + "input_posfile": null, + "input_chunks" : null, + "input_map" : null, + ] + input[10] = ["simulate"] // steps + input[11] = [] // tools + input[12] = 5 // max_chr_names + input[13] = [ // params_simulate + depth: 1, + genotype: 
null + ] + input[14] = [ // params_panelprep + normalize : null, + remove_samples: null, + compute_freq : null, + phase : null, + chunk_model : "recursive" + ] + input[15] = [ // params_impute + batch_size: null, + k_val : null, + n_gen : null, + buffer : null, + ] + """ + } + } + + then { + assertAll( + { assert workflow.success }, + { assert workflow.stdout.contains("WARN: No panel IDs detected. Channel panel, chunks and posfile will be initialise with null values and `panel_id`: None.")}, + { assert snapshot(sanitizeOutput(workflow.out)).match()} + ) + } + } } diff --git a/subworkflows/local/utils_nfcore_phaseimpute_pipeline/tests/main.nf.test.snap b/subworkflows/local/utils_nfcore_phaseimpute_pipeline/tests/main.nf.test.snap new file mode 100644 index 00000000..5922c241 --- /dev/null +++ b/subworkflows/local/utils_nfcore_phaseimpute_pipeline/tests/main.nf.test.snap @@ -0,0 +1,380 @@ +{ + "Should give panel_id: None": { + "content": [ + { + "ch_chunks": [ + [ + { + "panel_id": "None", + "chr": "chr21" + }, + [ + + ] + ], + [ + { + "panel_id": "None", + "chr": "chr22" + }, + [ + + ] + ] + ], + "ch_depth": [ + [ + { + "depth": 1 + }, + 1 + ] + ], + "ch_fasta_index": [ + [ + { + "genome": "GRCh38" + }, + "/nf-core/test-datasets/refs/heads/phaseimpute/hum_data/reference_genome/GRCh38.s.fa.gz", + [ + "/nf-core/test-datasets/refs/heads/phaseimpute/hum_data/reference_genome/GRCh38.s.fa.gz.fai", + "/nf-core/test-datasets/refs/heads/phaseimpute/hum_data/reference_genome/GRCh38.s.fa.gz.gzi" + ] + ] + ], + "ch_input_target": [ + [ + { + "id": "NA12878", + "tools": [ + + ], + "batch": 0 + }, + "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/hum_data/individuals/NA12878/NA12878.s.bam", + "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/hum_data/individuals/NA12878/NA12878.s.bam.bai" + ], + [ + { + "id": "NA19401", + "tools": [ + + ], + "batch": 0 + }, + 
"https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/hum_data/individuals/NA19401/NA19401.s.bam", + "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/hum_data/individuals/NA19401/NA19401.s.bam.bai" + ], + [ + { + "id": "NA20359", + "tools": [ + + ], + "batch": 0 + }, + "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/hum_data/individuals/NA20359/NA20359.s.bam", + "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/hum_data/individuals/NA20359/NA20359.s.bam.bai" + ] + ], + "ch_input_truth": [ + [ + [ + + ], + [ + + ], + [ + + ] + ] + ], + "ch_map": [ + [ + { + "panel_id": "None", + "chr": "chr21" + }, + [ + + ] + ], + [ + { + "panel_id": "None", + "chr": "chr22" + }, + [ + + ] + ] + ], + "ch_panel": [ + [ + { + "panel_id": "None", + "chr": "chr21" + }, + [ + + ], + [ + + ] + ], + [ + { + "panel_id": "None", + "chr": "chr22" + }, + [ + + ], + [ + + ] + ] + ], + "ch_posfile": [ + [ + { + "panel_id": "None", + "chr": "chr21" + }, + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ] + ], + [ + { + "panel_id": "None", + "chr": "chr22" + }, + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ] + ] + ], + "ch_regions": [ + [ + { + "chr": "chr21", + "region": "chr21:0-46709983" + }, + "chr21:0-46709983" + ], + [ + { + "chr": "chr22", + "region": "chr22:0-50818468" + }, + "chr22:0-50818468" + ] + ] + } + ], + "timestamp": "2026-05-26T15:10:42.05045451", + "meta": { + "nf-test": "0.9.5", + "nextflow": "26.04.1" + } + }, + "Should give a warning with missing files": { + "content": [ + { + "ch_chunks": [ + [ + { + "panel_id": "1000GP", + "chr": "chr21" + }, + "https://github.com/nf-core/test-datasets/raw/phaseimpute/hum_data/panel/chr21/1000GP.chr21_chunks.txt" + ], + [ + { + "panel_id": "1000GP", + "chr": "chr22" + }, + "https://github.com/nf-core/test-datasets/raw/phaseimpute/hum_data/panel/chr22/1000GP.chr22_chunks.txt" + ] + ], + "ch_depth": [ + [ + [ + + ], + [ + + ] + ] + ], + "ch_fasta_index": [ + [ 
+ { + "genome": "GRCh38" + }, + "/nf-core/test-datasets/refs/heads/phaseimpute/hum_data/reference_genome/GRCh38.s.fa.gz", + [ + "/nf-core/test-datasets/refs/heads/phaseimpute/hum_data/reference_genome/GRCh38.s.fa.gz.fai", + "/nf-core/test-datasets/refs/heads/phaseimpute/hum_data/reference_genome/GRCh38.s.fa.gz.gzi" + ] + ] + ], + "ch_input_target": [ + [ + [ + + ], + [ + + ], + [ + + ] + ] + ], + "ch_input_truth": [ + [ + [ + + ], + [ + + ], + [ + + ] + ] + ], + "ch_map": [ + [ + { + "panel_id": "1000GP", + "chr": "chr21" + }, + [ + + ] + ], + [ + { + "panel_id": "1000GP", + "chr": "chr22" + }, + [ + + ] + ] + ], + "ch_panel": [ + [ + { + "panel_id": "1000GP", + "chr": "chr21" + }, + "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/hum_data/panel/chr21/1000GP.chr21.s.norel.vcf.gz", + "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/hum_data/panel/chr21/1000GP.chr21.s.norel.vcf.gz.csi" + ], + [ + { + "panel_id": "1000GP", + "chr": "chr22" + }, + "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/hum_data/panel/chr22/1000GP.chr22.s.norel.vcf.gz", + "https://raw.githubusercontent.com/nf-core/test-datasets/phaseimpute/hum_data/panel/chr22/1000GP.chr22.s.norel.vcf.gz.csi" + ] + ], + "ch_posfile": [ + [ + { + "panel_id": "1000GP", + "chr": "chr21" + }, + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ] + ], + [ + { + "panel_id": "1000GP", + "chr": "chr22" + }, + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ] + ] + ], + "ch_regions": [ + [ + { + "chr": "chr21", + "region": "chr21:16570000-16610000" + }, + "chr21:16570000-16610000" + ], + [ + { + "chr": "chr22", + "region": "chr22:16570000-16610000" + }, + "chr22:16570000-16610000" + ] + ] + } + ], + "timestamp": "2026-05-26T15:00:37.724098357", + "meta": { + "nf-test": "0.9.5", + "nextflow": "26.04.1" + } + } +} \ No newline at end of file diff --git a/workflows/phaseimpute/main.nf b/workflows/phaseimpute/main.nf index 80338077..b2160d38 100644 --- 
a/workflows/phaseimpute/main.nf +++ b/workflows/phaseimpute/main.nf @@ -128,7 +128,7 @@ workflow PHASEIMPUTE { .count() ch_map_branched = ch_map - .branch { meta, map_file -> + .branch { _meta, map_file -> non_empty: map_file empty: true } @@ -308,7 +308,7 @@ workflow PHASEIMPUTE { // if (steps.contains("impute") || steps.contains("all")) { - if (params.tools.split(',').any{ it in ["stitch", "quilt"] }) { + if (tools.any{ tool -> tool in ["stitch", "quilt"] }) { // Transform posfile to tabulated format shared by QUILT and STITCH GAWK_POSFILE_IMPUTE( ch_posfile.map{ diff --git a/workflows/phaseimpute/tests/test_sim.nf.test b/workflows/phaseimpute/tests/test_sim.nf.test index f0af0773..f2c8f05f 100644 --- a/workflows/phaseimpute/tests/test_sim.nf.test +++ b/workflows/phaseimpute/tests/test_sim.nf.test @@ -8,7 +8,7 @@ nextflow_pipeline { config "./nextflow.config" - test("Check test_sim") { + test("Check test_sim - with regions") { config "../../../conf/test_sim.config" when { params { @@ -28,4 +28,26 @@ nextflow_pipeline { ) } } + + test("Check test_sim - no regions") { + config "../../../conf/test_sim.config" + when { + params { + publish_dir_mode = "copy" + pipelines_testdata_base_path = 'https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/phaseimpute/' + outdir = "$outputDir" + input_region = null + publish_all = true + } + } + + then { + assertAll( + { assert workflow.success }, + { assert snapshot( + UTILS.getPipelineResults(outputDir, workflow) + ).match()} + ) + } + } } diff --git a/workflows/phaseimpute/tests/test_sim.nf.test.snap b/workflows/phaseimpute/tests/test_sim.nf.test.snap index cc8e0504..f3e9fe66 100644 --- a/workflows/phaseimpute/tests/test_sim.nf.test.snap +++ b/workflows/phaseimpute/tests/test_sim.nf.test.snap @@ -1,8 +1,187 @@ { - "Check test_sim": { + "Check test_sim - no regions": { "content": [ { - "workflow size": 35, + "workflow size": 26, + "versions": { + "BAMCHREXTRACT": { + "samtools": "1.23" + }, + "FILTER_CHR_DWN": { + 
"gawk": "5.3.1" + }, + "FILTER_CHR_INP": { + "gawk": "5.3.1" + }, + "GAWK": { + "gawk": "5.3.1" + }, + "SAMTOOLS_COVERAGE_DWN": { + "samtools": "1.23.1" + }, + "SAMTOOLS_COVERAGE_INP": { + "samtools": "1.23.1" + }, + "SAMTOOLS_DEPTH": { + "samtools": "1.23.1" + }, + "SAMTOOLS_FAIDX": { + "samtools": "1.23.1" + }, + "SAMTOOLS_VIEW": { + "samtools": "1.23.1" + }, + "Workflow": { + "nf-core/phaseimpute": "v1.2.0dev" + } + }, + "stable name": [ + "initialisation", + "initialisation/prepare_genome", + "initialisation/prepare_genome/GRCh38.s.fa.gz.fai", + "initialisation/prepare_genome/GRCh38.s.fa.gz.gzi", + "multiqc", + "multiqc/multiqc_data", + "multiqc/multiqc_data/llms-full.txt", + "multiqc/multiqc_data/multiqc.log", + "multiqc/multiqc_data/multiqc.parquet", + "multiqc/multiqc_data/multiqc_citations.txt", + "multiqc/multiqc_data/multiqc_data.json", + "multiqc/multiqc_data/multiqc_general_stats.txt", + "multiqc/multiqc_data/multiqc_samtools_coverage.txt", + "multiqc/multiqc_data/multiqc_software_versions.txt", + "multiqc/multiqc_data/multiqc_sources.txt", + "multiqc/multiqc_data/samtools-coverage-table.txt", + "multiqc/multiqc_data/samtools-coverage_BQ.txt", + "multiqc/multiqc_data/samtools-coverage_Bases.txt", + "multiqc/multiqc_data/samtools-coverage_Coverage.txt", + "multiqc/multiqc_data/samtools-coverage_MQ.txt", + "multiqc/multiqc_data/samtools-coverage_Mean_depth.txt", + "multiqc/multiqc_data/samtools-coverage_Reads.txt", + "multiqc/multiqc_plots", + "multiqc/multiqc_plots/pdf", + "multiqc/multiqc_plots/pdf/samtools-coverage-table.pdf", + "multiqc/multiqc_plots/pdf/samtools-coverage_BQ-cnt.pdf", + "multiqc/multiqc_plots/pdf/samtools-coverage_BQ-log.pdf", + "multiqc/multiqc_plots/pdf/samtools-coverage_Bases-cnt.pdf", + "multiqc/multiqc_plots/pdf/samtools-coverage_Bases-log.pdf", + "multiqc/multiqc_plots/pdf/samtools-coverage_Coverage-cnt.pdf", + "multiqc/multiqc_plots/pdf/samtools-coverage_Coverage-log.pdf", + 
"multiqc/multiqc_plots/pdf/samtools-coverage_MQ-cnt.pdf", + "multiqc/multiqc_plots/pdf/samtools-coverage_MQ-log.pdf", + "multiqc/multiqc_plots/pdf/samtools-coverage_Mean_depth-cnt.pdf", + "multiqc/multiqc_plots/pdf/samtools-coverage_Mean_depth-log.pdf", + "multiqc/multiqc_plots/pdf/samtools-coverage_Reads-cnt.pdf", + "multiqc/multiqc_plots/pdf/samtools-coverage_Reads-log.pdf", + "multiqc/multiqc_plots/png", + "multiqc/multiqc_plots/png/samtools-coverage-table.png", + "multiqc/multiqc_plots/png/samtools-coverage_BQ-cnt.png", + "multiqc/multiqc_plots/png/samtools-coverage_BQ-log.png", + "multiqc/multiqc_plots/png/samtools-coverage_Bases-cnt.png", + "multiqc/multiqc_plots/png/samtools-coverage_Bases-log.png", + "multiqc/multiqc_plots/png/samtools-coverage_Coverage-cnt.png", + "multiqc/multiqc_plots/png/samtools-coverage_Coverage-log.png", + "multiqc/multiqc_plots/png/samtools-coverage_MQ-cnt.png", + "multiqc/multiqc_plots/png/samtools-coverage_MQ-log.png", + "multiqc/multiqc_plots/png/samtools-coverage_Mean_depth-cnt.png", + "multiqc/multiqc_plots/png/samtools-coverage_Mean_depth-log.png", + "multiqc/multiqc_plots/png/samtools-coverage_Reads-cnt.png", + "multiqc/multiqc_plots/png/samtools-coverage_Reads-log.png", + "multiqc/multiqc_plots/svg", + "multiqc/multiqc_plots/svg/samtools-coverage-table.svg", + "multiqc/multiqc_plots/svg/samtools-coverage_BQ-cnt.svg", + "multiqc/multiqc_plots/svg/samtools-coverage_BQ-log.svg", + "multiqc/multiqc_plots/svg/samtools-coverage_Bases-cnt.svg", + "multiqc/multiqc_plots/svg/samtools-coverage_Bases-log.svg", + "multiqc/multiqc_plots/svg/samtools-coverage_Coverage-cnt.svg", + "multiqc/multiqc_plots/svg/samtools-coverage_Coverage-log.svg", + "multiqc/multiqc_plots/svg/samtools-coverage_MQ-cnt.svg", + "multiqc/multiqc_plots/svg/samtools-coverage_MQ-log.svg", + "multiqc/multiqc_plots/svg/samtools-coverage_Mean_depth-cnt.svg", + "multiqc/multiqc_plots/svg/samtools-coverage_Mean_depth-log.svg", + 
"multiqc/multiqc_plots/svg/samtools-coverage_Reads-cnt.svg", + "multiqc/multiqc_plots/svg/samtools-coverage_Reads-log.svg", + "multiqc/multiqc_report.html", + "pipeline_info", + "pipeline_info/nf_core_phaseimpute_software_mqc_versions.yml", + "simulation", + "simulation/csv", + "simulation/csv/simulate.csv", + "simulation/downsample", + "simulation/downsample/NA12878_Call.depth.mean.tsv", + "simulation/downsample/NA12878_Call.depth.tsv", + "simulation/downsample/NA19401_Call.depth.mean.tsv", + "simulation/downsample/NA19401_Call.depth.tsv", + "simulation/downsample/NA20359_Call.depth.mean.tsv", + "simulation/downsample/NA20359_Call.depth.tsv", + "simulation/samples", + "simulation/samples/NA12878.depth_1x.bam", + "simulation/samples/NA12878.depth_1x.bam.csi", + "simulation/samples/NA19401.depth_1x.bam", + "simulation/samples/NA19401.depth_1x.bam.csi", + "simulation/samples/NA20359.depth_1x.bam", + "simulation/samples/NA20359.depth_1x.bam.csi" + ], + "stable path": [ + "GRCh38.s.fa.gz.fai:md5,4f4e0ff133e7a05cb469e345f766ca8c", + "GRCh38.s.fa.gz.gzi:md5,09046d9646db2cc5c425f231ce4595d7", + "multiqc_citations.txt:md5,ee1baba4adea57b43a9ae59ce7e1dd7f", + "multiqc_general_stats.txt:md5,892e86e7d6185084a4b1048191aa4edd", + "multiqc_samtools_coverage.txt:md5,6895b2524f6a136b53c9c9e957d262d8", + "samtools-coverage-table.txt:md5,120b0c67b1514ba15cb2c3a52d840497", + "samtools-coverage_BQ.txt:md5,5c29b2eb9b1d828cc92a67a4ff8943a4", + "samtools-coverage_Bases.txt:md5,68c2fe0d9735ddc9dcf853283965a33d", + "samtools-coverage_Coverage.txt:md5,a7c2f78709f541ce39feb465844f5e3f", + "samtools-coverage_MQ.txt:md5,0ad5368c7d8b528dd4e53952fb64ea3d", + "samtools-coverage_Mean_depth.txt:md5,5f1c291701c1523cb36cc8b3ad0494f5", + "samtools-coverage_Reads.txt:md5,51a2dde7594b38da2732ccd86056b820", + "NA12878_Call.depth.mean.tsv:md5,713a1f4811ad68444bffa85d1fcdbe0d", + "NA12878_Call.depth.tsv:md5,ad4748275377d3241aff21cddd97d1e7", + 
"NA19401_Call.depth.mean.tsv:md5,aab44099c9906b1619cad70830bedf7e", + "NA19401_Call.depth.tsv:md5,367baec0401a4574bd08d3036767e872", + "NA20359_Call.depth.mean.tsv:md5,caec5b47dd4f370753b500eb3cf101c4", + "NA20359_Call.depth.tsv:md5,1cc9ee6a593493a0368976cff8d16d49" + ], + "BAM files": [ + [ + "NA12878.depth_1x.bam", + "dc551537ba58eff2b963531cf79100be" + ], + [ + "NA19401.depth_1x.bam", + "d8efbef9945eb65dc7a5a5d94b928df3" + ], + [ + "NA20359.depth_1x.bam", + "1f0cd7f2be5f0d4572f1757850766ea7" + ] + ], + "VCF files": [ + + ], + "CSV files": [ + { + "fileName": "simulate.csv", + "rows": [ + "sample,file,index", + "NA12878,NA12878.depth_1x.bam,NA12878.depth_1x.bam.csi", + "NA19401,NA19401.depth_1x.bam,NA19401.depth_1x.bam.csi", + "NA20359,NA20359.depth_1x.bam,NA20359.depth_1x.bam.csi" + ] + } + ] + } + ], + "timestamp": "2026-05-26T12:06:11.814316819", + "meta": { + "nf-test": "0.9.5", + "nextflow": "26.04.1" + } + }, + "Check test_sim - with regions": { + "content": [ + { + "workflow size": 38, "versions": { "BAMCHREXTRACT": { "samtools": "1.23" @@ -121,14 +300,20 @@ "simulation/downsample/NA20359_Call.depth.mean.tsv", "simulation/downsample/NA20359_Call.depth.tsv", "simulation/region_extracted", + "simulation/region_extracted/NA12878_Rchr21_16570000-16610000_view.bam", + "simulation/region_extracted/NA12878_Rchr21_16570000-16610000_view.bam.csi", "simulation/region_extracted/NA12878_Rchr22_16570000-16610000_view.bam", "simulation/region_extracted/NA12878_Rchr22_16570000-16610000_view.bam.csi", "simulation/region_extracted/NA12878_region_merged.bam", "simulation/region_extracted/NA12878_region_merged.bam.bai", + "simulation/region_extracted/NA19401_Rchr21_16570000-16610000_view.bam", + "simulation/region_extracted/NA19401_Rchr21_16570000-16610000_view.bam.csi", "simulation/region_extracted/NA19401_Rchr22_16570000-16610000_view.bam", "simulation/region_extracted/NA19401_Rchr22_16570000-16610000_view.bam.csi", "simulation/region_extracted/NA19401_region_merged.bam", 
"simulation/region_extracted/NA19401_region_merged.bam.bai", + "simulation/region_extracted/NA20359_Rchr21_16570000-16610000_view.bam", + "simulation/region_extracted/NA20359_Rchr21_16570000-16610000_view.bam.csi", "simulation/region_extracted/NA20359_Rchr22_16570000-16610000_view.bam", "simulation/region_extracted/NA20359_Rchr22_16570000-16610000_view.bam.csi", "simulation/region_extracted/NA20359_region_merged.bam", @@ -145,30 +330,38 @@ "GRCh38.s.fa.gz.fai:md5,4f4e0ff133e7a05cb469e345f766ca8c", "GRCh38.s.fa.gz.gzi:md5,09046d9646db2cc5c425f231ce4595d7", "multiqc_citations.txt:md5,ee1baba4adea57b43a9ae59ce7e1dd7f", - "multiqc_general_stats.txt:md5,1e64ff0511d612e5be9c79b1f9118bcf", - "multiqc_samtools_coverage.txt:md5,f9ccbacabb9dd66667a3b53c6da46339", - "samtools-coverage-table.txt:md5,8e75baaecf0bffbf3374030767847daf", - "samtools-coverage_BQ.txt:md5,b4588f44fc39d32a501760469ed8d051", - "samtools-coverage_Bases.txt:md5,16d42af3e25438d3ad2c5bf7580188ff", - "samtools-coverage_Coverage.txt:md5,e2d1202f2b50e37af12a4d0ca3ed6aa0", - "samtools-coverage_MQ.txt:md5,678629db8f597832c92e40e34a030a5b", - "samtools-coverage_Mean_depth.txt:md5,ca2ee0346807443e9008fb84a2d7697e", - "samtools-coverage_Reads.txt:md5,428b91fe12113a3fafa1c7a91bd353e9", - "NA12878_Call.depth.mean.tsv:md5,251df319e51ac6c84f093b459bc35afe", - "NA12878_Call.depth.tsv:md5,df5db4fe094e31bb199b356e331ad796", - "NA19401_Call.depth.mean.tsv:md5,fa817417af538770e493c619fbd4c92d", - "NA19401_Call.depth.tsv:md5,9a0e70c0fe445f6e5739d9ce92da9fca", - "NA20359_Call.depth.mean.tsv:md5,772189a184011391113106e866ebafc7", - "NA20359_Call.depth.tsv:md5,5a5e2f970d47dd05a6f62e470e3cc773" + "multiqc_general_stats.txt:md5,892e86e7d6185084a4b1048191aa4edd", + "multiqc_samtools_coverage.txt:md5,6895b2524f6a136b53c9c9e957d262d8", + "samtools-coverage-table.txt:md5,120b0c67b1514ba15cb2c3a52d840497", + "samtools-coverage_BQ.txt:md5,5c29b2eb9b1d828cc92a67a4ff8943a4", + 
"samtools-coverage_Bases.txt:md5,68c2fe0d9735ddc9dcf853283965a33d", + "samtools-coverage_Coverage.txt:md5,a7c2f78709f541ce39feb465844f5e3f", + "samtools-coverage_MQ.txt:md5,0ad5368c7d8b528dd4e53952fb64ea3d", + "samtools-coverage_Mean_depth.txt:md5,5f1c291701c1523cb36cc8b3ad0494f5", + "samtools-coverage_Reads.txt:md5,51a2dde7594b38da2732ccd86056b820", + "NA12878_Call.depth.mean.tsv:md5,713a1f4811ad68444bffa85d1fcdbe0d", + "NA12878_Call.depth.tsv:md5,ad4748275377d3241aff21cddd97d1e7", + "NA19401_Call.depth.mean.tsv:md5,aab44099c9906b1619cad70830bedf7e", + "NA19401_Call.depth.tsv:md5,367baec0401a4574bd08d3036767e872", + "NA20359_Call.depth.mean.tsv:md5,caec5b47dd4f370753b500eb3cf101c4", + "NA20359_Call.depth.tsv:md5,1cc9ee6a593493a0368976cff8d16d49" ], "BAM files": [ + [ + "NA12878_Rchr21_16570000-16610000_view.bam", + "d48cdac37c8c3f3d1833662a33adf802" + ], [ "NA12878_Rchr22_16570000-16610000_view.bam", "c404c1616c736a2a6a1cdd5bced715f" ], [ "NA12878_region_merged.bam", - "c404c1616c736a2a6a1cdd5bced715f" + "e2bdad66faf19f8bccf4c4c51dc120b3" + ], + [ + "NA19401_Rchr21_16570000-16610000_view.bam", + "a423563712d655237951854fc9276133" ], [ "NA19401_Rchr22_16570000-16610000_view.bam", @@ -176,7 +369,11 @@ ], [ "NA19401_region_merged.bam", - "6d27d50a7f4f48c2a50090668c4530ce" + "78fd8273f02cede375553530177ac050" + ], + [ + "NA20359_Rchr21_16570000-16610000_view.bam", + "171c07e12f21a9f95d7140106b975677" ], [ "NA20359_Rchr22_16570000-16610000_view.bam", @@ -184,19 +381,19 @@ ], [ "NA20359_region_merged.bam", - "830119db01a05efce0254b40d5bebb33" + "19adee4b55b3744226d38ceaca5ea983" ], [ "NA12878.depth_1x.bam", - "272e610bed32eeaa815658ca49f7ad35" + "dc551537ba58eff2b963531cf79100be" ], [ "NA19401.depth_1x.bam", - "fae9a19d3e52d7e476f7458338e8da62" + "d8efbef9945eb65dc7a5a5d94b928df3" ], [ "NA20359.depth_1x.bam", - "60213ec054efd437f1652031d1c4b139" + "1f0cd7f2be5f0d4572f1757850766ea7" ] ], "VCF files": [ @@ -215,10 +412,10 @@ ] } ], - "timestamp": 
"2026-04-14T12:32:03.943680779", + "timestamp": "2026-05-26T16:01:16.502949055", "meta": { "nf-test": "0.9.5", - "nextflow": "25.10.4" + "nextflow": "26.04.1" } } } \ No newline at end of file