DiVA.wes

This is a fork of DiVA (DNA Variant Analysis), a Snakemake-based pipeline for the analysis of Next-Generation Sequencing exome data, developed at the CRS4 Next Generation Sequencing Core Facility. Software dependencies are managed directly by Snakemake using Conda, ensuring the reproducibility of the workflow according to FAIR principles.

In this repo we retained the first part of the analysis, from FASTQ to the recalibrated VCF following GATK Best Practices, together with quality control. Run this pipeline to generate a master VCF that includes all the samples, and re-run it when new samples become available.

Annotation is implemented in DiVA.annotate, which can be used to extract subsets of samples from the master VCF for variant annotation and prioritization.

This is an example of folder organization. The name of the pipeline executed in each folder is given in parentheses:

   ROOT
    │
    ├── wes_master (diva.wes)
    │
    ├── project_A (diva.annotate)
    │
    ├── project_B (diva.annotate)
    │
    └── project_N (diva.annotate)

Running DiVA.wes

Clone the repository from GitHub:

git clone https://github.com/igg-bioinfo/diva.wes.git

Rename the folder, from diva.wes to your PROJECT_NAME:

mv diva.wes PROJECT_NAME

cd into the newly created folder:

cd PROJECT_NAME

Edit the configuration files in the conf subfolder:
- config.yaml - paths to your reference files: genome, target regions, etc.
- samples.tsv - associate samples to FASTQ files
- samples.ped - pedigree file in ped format
- units.tsv - paths to FASTQ files
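
As an illustration of the pedigree file, the sketch below writes a PED file for a hypothetical trio and checks that every line has six tab-separated columns. It assumes the conventional PED layout (family ID, individual ID, father ID, mother ID, sex, phenotype). The family and sample IDs are placeholders, and the check is not part of DiVA.wes; the exact conventions the pipeline expects should be verified against the repository itself.

```shell
# Hypothetical trio written to a temporary file; all IDs are placeholders.
# Assumed columns: family ID, individual ID, father ID, mother ID,
# sex (1 = male, 2 = female), phenotype (1 = unaffected, 2 = affected).
# A parent ID of 0 marks a founder with no parents in the file.
ped_file=$(mktemp)
printf 'FAM01\tchild01\tfather01\tmother01\t1\t2\n' > "$ped_file"
printf 'FAM01\tfather01\t0\t0\t1\t1\n' >> "$ped_file"
printf 'FAM01\tmother01\t0\t0\t2\t1\n' >> "$ped_file"

# Count the lines that do not have exactly six tab-separated fields.
bad_lines=$(awk -F'\t' 'NF != 6 { n++ } END { print n + 0 }' "$ped_file")
echo "malformed lines: $bad_lines"

rm -f "$ped_file"
```

The same awk check can be pointed at your own samples.ped before starting a run.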
Edit the Snakefile and uncomment the output files you need.
If the conda package manager is not available, install miniconda.
Create a virtual environment containing snakemake, as suggested in the Snakemake documentation. First install mamba as a replacement for the default conda solver:

conda install -c conda-forge mamba

Then, install snakemake:

mamba env create --name snakemake --file environment.yaml

Activate the environment:

conda activate snakemake

Run snakemake in dry-run mode to check that the workflow is set up correctly. YOUR_WORKING_DIR is the directory snakemake will work in; naming it by date, in the format YYYY-MM-DD, is one option.

snakemake --cores 32 --use-conda --configfile conf/config.yaml --printshellcmds -d YOUR_WORKING_DIR --rerun-incomplete --keep-going --dryrun
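
If you follow the date convention for the working directory, the name can be generated rather than typed. This is a minimal sketch: the variable name workdir is an assumption, and the snakemake command is only printed, not executed.

```shell
# Today's date in YYYY-MM-DD form, to be used as the working directory name.
workdir=$(date +%Y-%m-%d)

# Print the dry-run command with the directory filled in (not executed here).
echo "snakemake --cores 32 --use-conda --configfile conf/config.yaml --printshellcmds -d $workdir --rerun-incomplete --keep-going --dryrun"
```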

For verbose output:

snakemake --cores 32 --use-conda --configfile conf/config.yaml --printshellcmds -d YOUR_WORKING_DIR --rerun-incomplete --keep-going --verbose --reason --dryrun

If the dry run looks correct, run snakemake:

snakemake --cores 32 --use-conda --configfile conf/config.yaml --printshellcmds -d YOUR_WORKING_DIR --rerun-incomplete --keep-going --conda-frontend mamba

Tip: For large projects, we suggest running snakemake in a screen session.
