Merged
30 commits
581fccb
test quilt2 first draft implementation
atrigila Apr 3, 2026
e485e63
Merge branch 'dev' into quilt2
atrigila Apr 3, 2026
7b1ba0b
add tool citation
atrigila Apr 3, 2026
e528a0b
fix failing function tests
atrigila Apr 3, 2026
d243866
update names of params used and snapshot
atrigila Apr 5, 2026
a99218a
fix empty snapshot content
atrigila Apr 5, 2026
9342744
add assert function contains back
atrigila Apr 5, 2026
6fd6c69
remove unnecessary function stdout def
atrigila Apr 5, 2026
bb5fcb1
remove space
atrigila Apr 5, 2026
73b3510
allow mspbwt
atrigila Apr 5, 2026
b00ac83
improve docs
atrigila Apr 5, 2026
68f396f
import quilt2 from nf-core modules
atrigila Apr 12, 2026
9100414
reorder bam impute quilt2
atrigila Apr 12, 2026
7e377da
fix linting
atrigila Apr 12, 2026
ca975bc
add comments and map
atrigila Apr 15, 2026
87b31b3
remove saveas
atrigila Apr 15, 2026
183886e
add full name
atrigila Apr 15, 2026
1883862
update tests and snapshots with map
atrigila Apr 15, 2026
0263ad2
Update README.md
atrigila Apr 15, 2026
f71958e
add bam_impute_quilt2 from nf-core
atrigila May 17, 2026
ada6e8b
update tests quilt2
atrigila May 17, 2026
ac27f5b
update usage.md
atrigila May 17, 2026
4d1584b
update nf-tests
atrigila May 17, 2026
d4c6ebf
Merge branch 'quilt2' of https://github.com/atrigila/phaseimpute into…
atrigila May 17, 2026
fa2153b
update geneticmapconvert module
atrigila May 17, 2026
bab06ce
update snapshots
atrigila May 17, 2026
6cf3ef8
update snapshot
atrigila May 17, 2026
ced88e5
fix docs
atrigila May 19, 2026
f9377db
remove duplicate config
atrigila May 19, 2026
50f062d
fix linting
atrigila May 19, 2026
2 changes: 1 addition & 1 deletion README.md
@@ -42,7 +42,7 @@ The whole pipeline consists of five main steps, each of which can be run separat
- **Position Extraction** for targeted imputation sites.

4. **Imputation (`--impute`)**: This is the primary step, where genotypes in the target dataset are imputed using the prepared reference panel. The main steps are:
- **Imputation** of the target dataset using tools like [Glimpse1](https://odelaneau.github.io/GLIMPSE/glimpse1/index.html), [Glimpse2](https://odelaneau.github.io/GLIMPSE/), [Stitch](https://github.com/rwdavies/stitch), [Quilt](https://github.com/rwdavies/QUILT), [Beagle5](https://faculty.washington.edu/browning/beagle/beagle.html) or [Minimac4](https://github.com/statgen/Minimac4).
- **Imputation** of the target dataset using tools like [Glimpse1](https://odelaneau.github.io/GLIMPSE/glimpse1/index.html), [Glimpse2](https://odelaneau.github.io/GLIMPSE/), [Stitch](https://github.com/rwdavies/stitch), [Quilt/Quilt2](https://github.com/rwdavies/QUILT), [Beagle5](https://faculty.washington.edu/browning/beagle/beagle.html) or [Minimac4](https://github.com/statgen/Minimac4).
- **Ligation** of imputed chunks to produce a final VCF file per sample, with all chromosomes unified.

5. **Validation (`--validate`)**: Assesses imputation accuracy by comparing the imputed dataset to a truth dataset. This step leverages the [Glimpse2](https://odelaneau.github.io/GLIMPSE/) concordance process to summarize differences between two VCF files.
1 change: 1 addition & 0 deletions conf/containers_conda_lock_files_amd64.config
@@ -0,0 +1 @@
process { withName: 'MULTIQC' { container = 'https://wave.seqera.io/v1alpha1/builds/bd-ee7739d47738383b_1/condalock' } }
1 change: 1 addition & 0 deletions conf/containers_conda_lock_files_arm64.config
@@ -0,0 +1 @@
process { withName: 'MULTIQC' { container = 'https://wave.seqera.io/v1alpha1/builds/bd-58d7dee710ab3aa8_1/condalock' } }
1 change: 1 addition & 0 deletions conf/containers_docker_amd64.config
@@ -0,0 +1 @@
process { withName: 'MULTIQC' { container = 'community.wave.seqera.io/library/multiqc:1.33--ee7739d47738383b' } }
1 change: 1 addition & 0 deletions conf/containers_docker_arm64.config
@@ -0,0 +1 @@
process { withName: 'MULTIQC' { container = 'community.wave.seqera.io/library/multiqc:1.33--58d7dee710ab3aa8' } }
1 change: 1 addition & 0 deletions conf/containers_singularity_https_amd64.config
@@ -0,0 +1 @@
process { withName: 'MULTIQC' { container = 'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/34/34e733a9ae16a27e80fe00f863ea1479c96416017f24a907996126283e7ecd4d/data' } }
1 change: 1 addition & 0 deletions conf/containers_singularity_https_arm64.config
@@ -0,0 +1 @@
process { withName: 'MULTIQC' { container = 'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/78/78b89e91d89e9cc99ad5ade5be311f347838cb2acbfb4f13bc343b170be09ce4/data' } }
1 change: 1 addition & 0 deletions conf/containers_singularity_oras_amd64.config
@@ -0,0 +1 @@
process { withName: 'MULTIQC' { container = 'oras://community.wave.seqera.io/library/multiqc:1.33--e3576ddf588fa00d' } }
1 change: 1 addition & 0 deletions conf/containers_singularity_oras_arm64.config
@@ -0,0 +1 @@
process { withName: 'MULTIQC' { container = 'oras://community.wave.seqera.io/library/multiqc:1.33--2537ca5f8445e3c2' } }
42 changes: 42 additions & 0 deletions conf/steps/imputation_quilt2.config
@@ -0,0 +1,42 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Config file for defining DSL2 per module options and publishing paths
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

process {

withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_IMPUTE_QUILT2:.*' {
publishDir = [
path: { "${params.outdir}/imputation/quilt2/variant_calling/" },
enabled: params.publish_all,
mode: params.publish_dir_mode,
]
tag = {"Batch ${meta.batch} ${meta.regionout ?: meta.chr}"}
}

withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_IMPUTE_QUILT2:QUILT_QUILT2' {
ext.args = "--seed=${params.seed}"
ext.prefix = { "${meta.id}.batch${meta.batch}.${meta.regionout ? meta.regionout.replace(':','_') : meta.chr}.quilt2" }
}

withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_IMPUTE_QUILT2:GLIMPSE2_LIGATE' {
ext.prefix = { "${meta.id}.batch${meta.batch}.${meta.chr}.quilt2.ligate" }
}

withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:BAM_IMPUTE_QUILT2:BCFTOOLS_INDEX' {
ext.args = '--tbi'
}

withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_QUILT2:.*' {
publishDir = [
path: { "${params.outdir}/imputation/quilt2/concat" },
mode: params.publish_dir_mode
]
}

withName: 'NFCORE_PHASEIMPUTE:PHASEIMPUTE:CONCAT_QUILT2:BCFTOOLS_CONCAT' {
ext.args = "--output-type z --ligate"
ext.prefix = { "${meta.id}.batch${meta.batch}.quilt2" }
}
}
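The `ext.prefix` closure for `QUILT_QUILT2` in the config above decides the output file name for each imputed chunk. As a rough illustration only, the naming rule can be restated in shell; the helper name `quilt2_prefix` and the sample values are hypothetical, and this is not the pipeline's Groovy code:

```bash
# Hypothetical helper restating the QUILT_QUILT2 ext.prefix rule.
# Arguments: sample id, batch number, regionout (may be empty), chromosome.
# The body runs in a subshell so its variables do not leak to the caller.
quilt2_prefix() (
    id="$1"
    batch="$2"
    regionout="$3"
    chr="$4"
    if [ -n "$regionout" ]; then
        # Same effect as replace(':','_') in the config: ':' becomes '_'
        region=$(printf '%s' "$regionout" | tr ':' '_')
    else
        # No output region set: fall back to the chromosome name
        region="$chr"
    fi
    printf '%s.batch%s.%s.quilt2\n' "$id" "$batch" "$region"
)

# Example values are made up for illustration
quilt2_prefix "sampleA" 0 "chr22:16570000-16610000" "chr22"
quilt2_prefix "sampleA" 0 "" "chr22"
```

With an empty `regionout`, the sketch falls back to the chromosome, mirroring the ternary in the config. Replacing `:` keeps region strings safe to use in file names.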
3 changes: 2 additions & 1 deletion conf/steps/initialisation.config
@@ -30,7 +30,8 @@ process {
publishDir = [
path: { "${params.outdir}/initialisation/map_convertion" },
mode: params.publish_dir_mode,
enabled: params.publish_all
enabled: params.publish_all,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
}
38 changes: 38 additions & 0 deletions conf/test_quilt2.config
@@ -0,0 +1,38 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running minimal tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a fast and simple pipeline test.

Use as follows:
nextflow run nf-core/phaseimpute -profile test_quilt2,<docker/singularity> --outdir <OUTDIR>

----------------------------------------------------------------------------------------
*/

process {
resourceLimits = [
cpus: 4,
memory: '4.GB',
time: '1.h'
]
}

params {
config_profile_name = 'Minimal QUILT2 Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function using the tool QUILT2'

// Input data
input = "${projectDir}/tests/csv/sample_bam.csv"
input_region = "${projectDir}/tests/csv/region.csv"
chunks = "${projectDir}/tests/csv/chunks.csv"
panel = "${projectDir}/tests/csv/panel.csv"

// Genome references
fasta = params.pipelines_testdata_base_path + "hum_data/reference_genome/GRCh38.s.fa.gz"
map = "${projectDir}/tests/csv/map_glimpse.csv"

// Pipeline parameters
steps = "impute"
tools = "quilt2"
}
4 changes: 2 additions & 2 deletions docs/output.md
@@ -140,7 +140,7 @@ The results from `--steps impute` will have the following directory structure:
```tree
├── batch
├── csv
├── <glimpse1|glimpse2|quilt|stitch|beagle5|minimac4>
├── <glimpse1|glimpse2|quilt|quilt2|stitch|beagle5|minimac4>
│ ├── concat/
│ └── samples/
├── stats
@@ -152,7 +152,7 @@ The results from `--steps impute` will have the following directory structure:
- `imputation/batch/all.batchi.id.txt`: List of sample names processed in the i^th^ batch.
- `imputation/csv/`
- `impute.csv`: A single CSV file containing the path to a VCF file and its index, of each imputed sample with their corresponding tool.
- `imputation/[glimpse1,glimpse2,quilt,stitch]/`
- `imputation/[glimpse1,glimpse2,quilt,quilt2,stitch,beagle5,minimac4]/`
- `concat/all.batch*.vcf.gz`: The concatenated VCF files of all imputed samples by batches.
- `concat/all.batch*.vcf.gz.tbi`: The index file for the concatenated imputed VCF files of the samples.
- `samples/*.vcf.gz`: A VCF file of each imputed sample.
45 changes: 29 additions & 16 deletions docs/usage.md
@@ -375,6 +375,7 @@ The different tests profiles are:
- `test`: A profile to evaluate the imputation step with the `glimpse1` tool.
- `test_glimpse2`: A profile to evaluate the imputation step with the `glimpse2` tool.
- `test_quilt`: A profile to evaluate the imputation step with the `quilt` tool.
- `test_quilt2`: A profile to evaluate the imputation step with the `quilt2` tool.
- `test_stitch`: A profile to evaluate the imputation step with the `stitch` tool.
- `test_beagle5`: A profile to evaluate the imputation step with the `beagle5` tool.
- `test_minimac4`: A profile to evaluate the imputation step with the `minimac4` tool.
@@ -472,9 +473,9 @@ For starting from the imputation steps, the required flags are:
- `--steps impute`
- `--input input.csv`: The samplesheet containing the input sample files in `bam`, `cram` or `vcf`, `bcf` format.
- `--genome` or `--fasta`: The reference genome of the samples.
- `--tools [glimpse1,glimpse2,quilt,stitch,beagle5,minimac4]`: A selection of one or more of the available imputation tools. Each imputation tool has its own set of specific flags and input files. These required files are produced by `--steps panelprep` and used as input in:
- `--tools [glimpse1,glimpse2,quilt,quilt2,stitch,beagle5,minimac4]`: A selection of one or more of the available imputation tools. Each imputation tool has its own set of specific flags and input files. These required files are produced by `--steps panelprep` and used as input in:
- `--posfile posfile.csv`: A samplesheet containing all the different files required by the imputation tool. This file can be generated with `--steps panelprep`.
- `--panel panel.csv`: A samplesheet containing the post-processed reference panel VCF (required by GLIMPSE1, GLIMPSE2). These files can be obtained with `--steps panelprep`.
- `--panel panel.csv`: A samplesheet containing the post-processed reference panel VCF (required by GLIMPSE1, GLIMPSE2 and QUILT2). These files can be obtained with `--steps panelprep`.

Optionally, you can provide the following flags:

@@ -483,21 +484,23 @@ Optionally, you can provide the following flags:

#### Summary table of mandatory (m) and optional (o) parameters in `--steps impute`

| | `--steps impute`(m) | `--input`(m) | `--genome` or `--fasta`(m) | `--panel`(m) | `--posfile`(m) | `--map`(o) | `--chunks`(o) |
| ---------- | ------------------- | ------------ | -------------------------- | ------------ | -------------- | ---------- | ------------- |
| `GLIMPSE1` | ✅ | ✅ ¹ | ✅ | ✅ | ✅ ³ | ✅ | ✅ |
| `GLIMPSE2` | ✅ | ✅ ¹ | ✅ | ✅ | ❌ | ✅ | ✅ |
| `QUILT` | ✅ | ✅ ² | ✅ | ❌ | ✅ ⁴ | ✅ | ✅ |
| `STITCH` | ✅ | ✅ ² | ✅ | ❌ | ✅ ³ | ✅ | ✅ |
| `BEAGLE5` | ✅ | ✅ ¹ | ✅ | ✅ | ❌ | ✅ | ✅ |
| `MINIMAC4` | ✅ | ✅ ¹ | ✅ | ✅ | ✅ ⁵ | ✅ | ✅ |
| | `--steps impute`(m) | `--input`(m) | `--genome` or `--fasta`(m) | `--panel`(m) | `--posfile`(m/o) | `--map`(o) | `--chunks`(o) |
| ---------- | ------------------- | ------------ | -------------------------- | ------------ | ---------------- | ---------- | ------------- |
| `GLIMPSE1` | ✅ | ✅ ¹ | ✅ | ✅ | ✅ ³ | ✅ | ✅ |
| `GLIMPSE2` | ✅ | ✅ ¹ | ✅ | ✅ | ❌ | ✅ | ✅ |
| `QUILT` | ✅ | ✅ ² | ✅ | ❌ | ✅ ⁴ | ✅ | ✅ |
| `QUILT2` | ✅ | ✅ ² | ✅ | ✅ ⁵ | ❌ | ✅ | ✅ |
| `STITCH` | ✅ | ✅ ² | ✅ | ❌ | ✅ ³ | ✅ | ✅ |
| `BEAGLE5` | ✅ | ✅ ¹ | ✅ | ✅ | ❌ | ✅ | ✅ |
| `MINIMAC4` | ✅ | ✅ ¹ | ✅ | ✅ | ✅ ⁶ | ✅ | ✅ |

> ¹ Alignment files as well as variant calling format (i.e. BAM, CRAM, VCF or BCF)
> ² Alignment files only (i.e. BAM or CRAM)
> ³ `GLIMPSE1` and `STITCH`: Should be a CSV with columns [panel id, chr, posfile]
> ⁴ `QUILT`: Should be a CSV with columns [panel id, chr, hap, legend, posfile]
> ⁵ `MINIMAC4`: Optionally, a VCF with its index can be provided for more control over the imputed positions. Should be a CSV with columns [panel id, chr, vcf, index]
> ⁶ Not yet supported
> ⁵ `QUILT2`: Uses the reference panel VCF directly. The panel CSV should contain [panel id, chr, vcf, index]
> ⁶ `MINIMAC4`: Optionally, a VCF with its index can be provided for more control over the imputed positions. Should be a CSV with columns [panel id, chr, vcf, index]
> ⁷ Not yet supported
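
The `QUILT2` footnote says the panel CSV should contain [panel id, chr, vcf, index]. A minimal sketch of writing such a file follows; the header names, the panel name `panelA` and all paths are assumptions for illustration, not values taken from the pipeline:

```bash
# Write a hypothetical panel CSV with one row per chromosome.
# Header names and file paths are illustrative assumptions.
panel_csv="${TMPDIR:-/tmp}/panel_quilt2_example.csv"
cat > "$panel_csv" <<'EOF'
panel,chr,vcf,index
panelA,chr21,/path/to/panelA.chr21.vcf.gz,/path/to/panelA.chr21.vcf.gz.tbi
panelA,chr22,/path/to/panelA.chr22.vcf.gz,/path/to/panelA.chr22.vcf.gz.tbi
EOF

# Sanity check: every line should have exactly four comma-separated fields
awk -F',' 'NF != 4 { bad = 1 } END { exit bad }' "$panel_csv" \
    && echo "panel CSV looks well-formed"
```

The resulting file is what would be passed to `--panel`; in practice it can be obtained from `--steps panelprep` rather than written by hand.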

Here is a representation of how the input files will be processed depending on the input file type and the selected imputation tool.

@@ -523,15 +526,25 @@ To summarize:
- GLIMPSE2 should not do target-to-target imputation.
- If you have alignment files (e.g., BAM or CRAM), all tools are available, and processing will occur in `batch_size`:
- GLIMPSE1 and STITCH may induce batch effects, so all samples need to be imputed together.
- GLIMPSE2 and QUILT can process samples in separate batches.
- GLIMPSE2, QUILT and QUILT2 can process samples in separate batches.

## Imputation tools `--steps impute --tools [glimpse1,glimpse2,quilt,stitch,beagle5,minimac4]`
## Imputation tools `--steps impute --tools [glimpse1,glimpse2,quilt,quilt2,stitch,beagle5,minimac4]`

You can choose different software to perform the imputation. In the following sections, the typical commands for running the pipeline with each software are included. Multiple tools can be selected by separating them with a comma (e.g. `--tools glimpse1,quilt`).

### QUILT
### QUILT / QUILT2

[QUILT](https://github.com/rwdavies/QUILT) is an R and C++ program for rapid genotype imputation from low-coverage sequence using a large reference panel. The required inputs for this program are bam samples provided in the input samplesheet (`--input`) and a CSV file with the genomic chunks (`--chunks`).
[QUILT](https://github.com/rwdavies/QUILT) is an R and C++ package for read-aware genotype imputation from low-coverage sequencing using a reference panel. This pipeline contains the original QUILT method (`QUILT.R`, referred to here as `quilt`) and the newer QUILT2 method (`QUILT2.R`, exposed in this pipeline as `quilt2`).

In `nf-core/phaseimpute`, both methods use alignment files from `--input`, optionally benefit from `--map`, and can use `--chunks` to split the genome into overlapping imputation regions.

Choose `quilt2` by default for new projects. The official QUILT2 documentation describes it as the recommended modern method for large reference panels and diverse sequencing inputs including short reads, long reads, linked/barcoded reads and ancient DNA. The QUILT2 paper also reports a dedicated cfDNA/NIPT mode upstream; however, the current `nf-core/phaseimpute` integration includes the diploid imputation workflow only.

Choose `quilt` when you specifically want the original QUILT workflow.
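
As a hedged sketch of what selecting `quilt2` could look like on the command line, the snippet below assembles an invocation from the flags described in this document and only prints it. All file names are hypothetical placeholders, and the `docker` profile is just one of the container options:

```bash
# Dry-run sketch: build the command and print it instead of launching it.
# File names (samplesheet.csv, panel.csv, ...) are hypothetical placeholders.
set -- nextflow run nf-core/phaseimpute \
    -profile docker \
    --steps impute \
    --tools quilt2 \
    --input samplesheet.csv \
    --fasta reference.fa.gz \
    --panel panel.csv \
    --chunks chunks.csv \
    --map map.csv \
    --outdir results

cmd="$*"
echo "$cmd"
# To actually launch the pipeline, run "$@" instead of echoing it.
```

Unlike `quilt`, which takes a `--posfile`, this sketch passes the reference panel through `--panel`, following the summary table for `QUILT2`.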

#### `quilt` / `quilt2`

The required inputs for `quilt` are BAM/CRAM samples provided in the input samplesheet (`--input`) and a CSV file with the genomic chunks (`--chunks`).

```bash
nextflow run nf-core/phaseimpute \
20 changes: 15 additions & 5 deletions modules.json
@@ -22,12 +22,12 @@
},
"bcftools/index": {
"branch": "master",
"git_sha": "6383d8fe58f9498eecd5aa303e71a4a932d1e9f6",
"git_sha": "6d46786420b4d7bc88eba026eb389c0c5535d120",
"installed_by": [
"bam_impute_quilt",
"bam_impute_quilt2",
"bam_impute_stitch",
"bam_vcf_impute_glimpse2",
"modules",
"vcf_impute_beagle5",
"vcf_impute_glimpse",
"vcf_impute_minimac4",
@@ -81,7 +81,7 @@
},
"custom/geneticmapconvert": {
"branch": "master",
"git_sha": "d879be4b3adc1bce389b002bd66f6954630f57d2",
"git_sha": "6d46786420b4d7bc88eba026eb389c0c5535d120",
"installed_by": ["modules"]
},
"gawk": {
@@ -116,12 +116,12 @@
},
"glimpse2/ligate": {
"branch": "master",
"git_sha": "236d7f19efcffccdfac5e1850af2aa035e0de79c",
"git_sha": "6d46786420b4d7bc88eba026eb389c0c5535d120",
"installed_by": [
"bam_impute_quilt",
"bam_impute_quilt2",
"bam_impute_stitch",
"bam_vcf_impute_glimpse2",
"modules",
"vcf_impute_beagle5",
"vcf_impute_minimac4"
]
@@ -161,6 +161,11 @@
"git_sha": "4e2990cc0df18823d11b192df73039c80fdebc7c",
"installed_by": ["bam_impute_quilt", "modules"]
},
"quilt/quilt2": {
"branch": "master",
"git_sha": "6d46786420b4d7bc88eba026eb389c0c5535d120",
"installed_by": ["bam_impute_quilt2"]
},
"samtools/coverage": {
"branch": "master",
"git_sha": "4e3e10e502ec6ab6b1c4b4fecd923ff1fa287338",
@@ -235,6 +240,11 @@
"git_sha": "4e2990cc0df18823d11b192df73039c80fdebc7c",
"installed_by": ["subworkflows"]
},
"bam_impute_quilt2": {
"branch": "master",
"git_sha": "0c06dfb24cd33e404c2811c28d74dd9e4a1df5ce",
"installed_by": ["subworkflows"]
},
"bam_impute_stitch": {
"branch": "master",
"git_sha": "e1cb31f0ced0d0810d1cb099aaa690b05beb1f3a",
4 changes: 2 additions & 2 deletions modules/nf-core/bcftools/index/environment.yml


6 changes: 3 additions & 3 deletions modules/nf-core/bcftools/index/main.nf
