This workflow detects, annotates and filters somatic fusions from stranded paired-end RNA-seq data produced by Illumina instruments.
git clone [--branch ${VERSION}] --recurse-submodules git@github.com:bialimed/sofur.git
- conda (>=4.6.8):

      # Install conda
      wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
        sh Miniconda3-latest-Linux-x86_64.sh
      # Install mamba
      conda activate base
      conda install -c conda-forge mamba

  More details are in the miniconda installation documentation.
- snakemake (>=5.4.2):

      mamba create -c conda-forge -c bioconda -n sofur snakemake==7.32.4
      conda activate sofur
      pip install drmaa

  More details are in the snakemake installation documentation.
- Install rules dependencies (cutadapt, bwa, ...):

      conda activate sofur
      snakemake \
        --use-conda \
        --conda-prefix ${application_env_dir} \
        --conda-create-envs-only \
        --snakefile ${APP_DIR}/Snakefile \
        --configfile workflow_parameters.yml
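Before installing the rules dependencies, it can help to confirm that the prerequisites meet the minimum versions listed above. The following is a minimal sketch, not part of SoFuR: the helper names `version_ge` and `check_tool` are hypothetical, and the comparison assumes a `sort` that supports `-V` (version sort), as in GNU coreutils.

```shell
# Return success when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a tool is installed and meets a minimum version.
# $1: tool name, $2: minimum version taken from this README.
check_tool() {
    local tool="$1" min="$2" found
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not found"
        return 1
    fi
    # Keep the first dotted number printed by `<tool> --version`.
    found="$("$tool" --version 2>&1 | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)"
    if [ -z "$found" ]; then
        echo "$tool: version not detected"
        return 1
    fi
    if version_ge "$found" "$min"; then
        echo "$tool: $found (>= $min)"
    else
        echo "$tool: $found is older than $min"
        return 1
    fi
}

for spec in conda:4.6.8 snakemake:5.4.2; do
    check_tool "${spec%%:*}" "${spec##*:}" \
        || echo "-> install or upgrade ${spec%%:*} before continuing"
done
```

Run the snakemake check inside the `sofur` environment, since snakemake is installed there and not in `base`.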
SoFuR uses genome sequences, gene annotations, known fusion databanks (artifacts and pathogenic fusions) and other standard resources. These files must be provided in your config.yml (see template). A detailed example of how to create these resources is available here.
- In ${APP_DIR}/test/config/wf_config.yml, set the variables corresponding to databanks (see ## ${BANK}/... ##).
- Launch the test with the following command:

      conda activate sofur
      ${APP_DIR}/test/launch_wf.sh \
        ${CONDA_ENVS_DIR} \
        ${WORK_DIR} \
        ${DRMAA_PARAMS}

  Example with scheduler SGE:

      export DRMAA_LIBRARY_PATH=${SGE_ROOT}/lib/linux-rhel7-x64/libdrmaa.so
      conda activate sofur
      ~/soft/sofur/test/launch_wf.sh \
        /work/$USER/conda_envs/envs \
        /work/$USER/test_sofur \
        ' -V -q {cluster.queue} -l pri_{cluster.queue}=1 -l mem={cluster.vmem} -l h_vmem={cluster.vmem} -pe smp {cluster.threads}'

  Example with scheduler slurm:

      export DRMAA_LIBRARY_PATH=$SGE_ROOT/lib/linux-rhel7-x64/libdrmaa.so
      conda activate sofur
      ~/soft/sofur/test/launch_wf.sh \
        /work/$USER/conda_envs/envs \
        /work/$USER/test_sofur \
        ' --partition={cluster.queue} --mem={cluster.mem} --cpus-per-task={cluster.threads}'

- See results in ${WORK_DIR}/report/run.html.
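The test launch relies on the `drmaa` Python package, which locates the scheduler library through `DRMAA_LIBRARY_PATH`. A small pre-flight check such as the sketch below can catch a missing or mistyped path before the workflow starts. The function name `check_drmaa` is hypothetical and not part of SoFuR.

```shell
# Verify that DRMAA_LIBRARY_PATH is set and points to an existing file.
check_drmaa() {
    if [ -z "${DRMAA_LIBRARY_PATH:-}" ]; then
        echo "DRMAA_LIBRARY_PATH is not set" >&2
        return 1
    fi
    if [ ! -f "$DRMAA_LIBRARY_PATH" ]; then
        echo "DRMAA library not found: $DRMAA_LIBRARY_PATH" >&2
        return 1
    fi
    echo "DRMAA library: $DRMAA_LIBRARY_PATH"
}

check_drmaa || echo "-> export DRMAA_LIBRARY_PATH before launching the test"
```

This only checks that the file exists. It does not verify that the library matches the scheduler in use.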
Copy ${APP_DIR}/config/workflow_parameters.tpl.yml into your current directory
and change the values before launching the following command:
conda activate sofur
snakemake \
--use-conda \
--conda-prefix ${application_env_dir} \
--jobs ${nb_jobs} \
--jobname "sofur.{rule}.{jobid}" \
--latency-wait 100 \
--snakefile ${application_dir}/Snakefile \
--cluster-config ${application_dir}/config/cluster.json \
--configfile workflow_parameters.yml \
--directory ${out_dir} \
> ${out_dir}/wf_log.txt \
2> ${out_dir}/wf_stderr.txt
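The command above redirects its logs into `${out_dir}`, and the shell opens those files before snakemake starts, so the directory has to exist beforehand. The sketch below prepares a run under two assumptions that are not stated in the original: the template copy is named `workflow_parameters.yml` (to match `--configfile`), and an already edited copy should never be overwritten. The function name `prepare_run` is hypothetical.

```shell
# Copy the parameters template into the current directory and create the
# output directory used for the workflow logs.
# $1: application directory, $2: output directory.
prepare_run() {
    local app_dir="$1" out_dir="$2"
    local template="${app_dir}/config/workflow_parameters.tpl.yml"
    if [ ! -f "$template" ]; then
        echo "template not found: $template" >&2
        return 1
    fi
    # Keep an existing (possibly edited) parameters file untouched.
    if [ ! -e workflow_parameters.yml ]; then
        cp "$template" workflow_parameters.yml
    fi
    mkdir -p "$out_dir"
}

# Only run when both variables are defined in the calling shell.
if [ -n "${APP_DIR:-}" ] && [ -n "${out_dir:-}" ]; then
    prepare_run "${APP_DIR}" "${out_dir}"
fi
```

After this step, edit `workflow_parameters.yml` and launch the snakemake command.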
The main elements of the output directory are the following:
out_dir/
├── ...
├── report/
|   ├── ...
|   ├── run.html
|   └── sample-A.html
├── stats/
|   ├── ...
|   └── multiqc/
|       ├── ...
|       └── multiqc_report.html
└── structural_variants/
    ├── ...
    ├── sample-A_unfiltered.vcf
    └── sample-A_filtered.vcf
- The report files containing the list of filtered fusions, their annotations and viewers are in out_dir/report/{sample}.html.
- The quality report is in out_dir/stats/multiqc/multiqc_report.html. It summarizes read quality, the distribution of alignments (between exons, introns, ...) and the strandedness analysis.
- The annotated fusions in VCF format are kept in out_dir/structural_variants. *_unfiltered.vcf files contain all the fusions, their annotations and their tags. *_filtered.vcf files contain the fusions, their annotations and their tags that remain after filtering by the rules provided in the file declared in the configfile by filters.rules.
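To get a quick view of how much the filters remove, the two VCF files of a sample can be compared by record count. The sketch below counts non-header, non-empty lines. It assumes nothing about how SoFuR encodes fusions, so a single fusion may span more than one record (for example a pair of breakends), and the numbers are record counts rather than fusion counts. The function names are hypothetical.

```shell
# Count the records of a VCF file (lines that are neither headers nor empty).
count_vcf_records() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Print the unfiltered and filtered record counts for one sample.
# $1: structural_variants directory, $2: sample name.
summarize_sample() {
    local dir="$1" sample="$2" unfiltered filtered
    unfiltered="$(count_vcf_records "${dir}/${sample}_unfiltered.vcf")"
    filtered="$(count_vcf_records "${dir}/${sample}_filtered.vcf")"
    printf '%s\tunfiltered=%s\tfiltered=%s\n' "$sample" "$unfiltered" "$filtered"
}

# Summarize every sample found in the output directory, when it exists.
sv_dir="${out_dir:-.}/structural_variants"
if [ -d "$sv_dir" ]; then
    for vcf in "$sv_dir"/*_unfiltered.vcf; do
        [ -e "$vcf" ] || continue
        summarize_sample "$sv_dir" "$(basename "$vcf" _unfiltered.vcf)"
    done
fi
```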
Performance is evaluated on real, synthetic and simulated datasets. The
commands used in the evaluation are stored in assessment. The results are
summarized in assessment/report.html.
2019 Laboratoire d'Anatomo-Cytopathologie du CHU de Toulouse
