AAV Integration Project

Project Overview

This project focuses on analyzing integration sites (IS) for Adeno-Associated Virus (AAV) using various bioinformatics tools and workflows. The project employs Snakemake workflows to handle different stages of data processing, including mapping, annotation, and coverage calculation.

Directory Structure

**v1.0.0**
├── src
│   ├── Snakefile_annofinal.smk
│   ├── Snakefile_map.smk
│   ├── Snakefile_IS_spikeIn.smk
│   ├── Snakefile_cov.smk
│   ├── Snakefile_anno.smk
│   ├── Snakefile_IS.smk
│   ├── Snakefile_site_cov.smk
│   ├── run.sh
│   ├── bin
│   │   ├── plot_spikeInIS.html
│   │   ├── plotIS_human.rmd
│   │   ├── rm_collisions.py
│   │   ├── filter_and_label_unique.py
│   │   ├── IS_anno_v1.py
│   │   ├── deploy_index.R
│   │   ├── IS_anno_v1_human.py
│   │   ├── plotIS.rmd
│   │   ├── parse_cigar_v0.py
│   │   ├── plot_readstats.html
│   │   ├── circos_plot.rmd
│   │   ├── extract_readid.py
│   │   ├── plot_ISreads.rmd
│   │   ├── mk_md_report.py
│   │   ├── plot_readstats.rmd
│   │   ├── mk_index.py
│   │   ├── collapseIS.py
│   │   ├── plot_cov.R
│   │   ├── count_reads.py
│   │   ├── plot_chemericReads.rmd
│   │   ├── plotIS.html
│   │   ├── plot_spikeInIS.rmd
│   │   ├── extract_ids.py
│   │   ├── deploy_dash_cov_app_ms.py
│   │   ├── format_IS_w_sonication_reads.py
│   │   ├── sonication_sites.py
│   │   ├── parse_cigar_v1.py
│   │   ├── filter_host_proper_paired_reads.py
│   │   ├── filter_unique.py
│   │   ├── plot_trunc.rmd
│   │   ├── sonication_sites_v1.py
│   │   ├── parse_cigar.py
│   │   ├── IS_anno_final.py
│   │   ├── app_template_samples.py
│   │   └── IS_anno_final_human.py
│   ├── Snakefile_cleanIS.smk
│   ├── Snakefile.py
│   ├── Snakefile_plot.smk
│   ├── Snakefile_samples.py
│   ├── Snakefile_count.smk
│   ├── rmd
│   │   └── site
│   ├── Snakefile_ref.smk
│   ├── Snakefile_const.py
│   └── Snakefile_site.smk
├── data
│   ├── refs
│   ├── processed
│   ├── interim
│   │   └── .keep
│   └── raw
│       └── .keep
├── configs
│   ├── Cosmic_CancerGeneCensus_v98_GRCh38.tsv
│   ├── master_mapping.tsv
│   ├── AAV_integration_data_analyses_overview_original.xlsx
│   ├── AAV_integration_data_analyses_overview.xlsx
│   ├── index.Rmd
│   ├── master_mapping_anno.tsv
│   ├── master_mapping_rerun.tsv
│   └── aav_integration.json
├── docs
├── README.md
└── reqs
    ├── dash-cov-requirements.txt
    ├── aav_integration.conda.env.yaml
    ├── rsconnect.conda.env.yaml
    ├── samblaster.conda.env.yaml
    └── cat.conda.env.yaml

Setup Instructions

Prerequisites

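Ensure you have the following software installed (these are the tools invoked by the commands in this README):

- Git
- Conda, with the mamba frontend
- Snakemake
- Singularity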

Installation

Clone the repository:

git clone git@github.com:Sparktx-Data-Science/aav_integration.git
cd aav_integration/v1.0.0

Setup Conda Environment:

conda env create -f reqs/aav_integration.conda.env.yaml
conda activate aav_integration

Set Up Configurations: Edit the configuration file aav_integration.json in the configs directory as needed.
Prepare Data: Ensure raw data is located in the directories specified in the configuration. Symlink the raw data folder at /data/raw/shortRead/run_id.
Compile the _site.yml file for deployment: Store the following text in the file ./src/rmd/site/aav_integration_v1_0_0/_site.yml

name: "AAV integration"
navbar:
  title: "AAV integration analysis report"
  left:
    - text: "About"
      href: about.html
output:
  html_document:
    theme: darkly

Fill out the experimental design file

./configs/AAV_integration_data_analyses_overview.xlsx
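
The "Prepare Data" step above asks for a symlink to the raw data folder. A minimal sketch follows, under stated assumptions: `RUN_ID`, `SOURCE_DIR`, and `RAW_ROOT` are hypothetical variables, the source path is a placeholder, and the link is created under data/raw/shortRead relative to the v1.0.0 directory. If your configuration expects the absolute path /data/raw/shortRead, set `RAW_ROOT` accordingly.

```shell
#!/bin/bash
set -eu

# Hypothetical values: replace with your own run ID and data location.
RUN_ID="${RUN_ID:-run_id}"
SOURCE_DIR="${SOURCE_DIR:-/path/to/sequencer/output/$RUN_ID}"
# Assumption: raw data is linked under data/raw/shortRead inside v1.0.0/.
RAW_ROOT="${RAW_ROOT:-data/raw/shortRead}"

mkdir -p "$RAW_ROOT"
# -s: symbolic link, -f: replace an existing link,
# -n: do not follow an existing link that points to a directory
ln -sfn "$SOURCE_DIR" "$RAW_ROOT/$RUN_ID"
ls -l "$RAW_ROOT"
```

Run it from the v1.0.0 directory, for example `RUN_ID=my_run SOURCE_DIR=/path/to/my_run bash link_raw.sh`, where link_raw.sh is whatever name you save the sketch under.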

Usage

Running the Workflow

Execute the Snakemake workflow from the src directory, where run.sh lives, using the following commands:

cd aav_integration/v1.0.0/src
rm rmd/site/aav_integration_v1_0_0/index.Rmd
rm ../data/interim/flags/index_up.aav_integration_v1_0_0
rm ../data/interim/flags/rsconnect
bash run.sh
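
The rm commands above stop with an error when a file is already absent, for example on a first run. A tolerant variant can be sketched as a small function. This is not part of the repository: `clean_flags` and `SITE_NAME` are hypothetical names, and the layout (src next to data inside v1.0.0) is taken from the directory tree above. That removing these files makes the workflow rebuild the site index and redeploy is an assumption based on their names.

```shell
#!/bin/bash
# Site name as it appears in the paths used in this README.
SITE_NAME="aav_integration_v1_0_0"

# clean_flags SRC_DIR
# Removes the generated index.Rmd and the two flag files, ignoring any
# that do not exist. SRC_DIR defaults to the current directory.
clean_flags() {
    local src_dir="${1:-.}"
    rm -f "$src_dir/rmd/site/$SITE_NAME/index.Rmd" \
          "$src_dir/../data/interim/flags/index_up.$SITE_NAME" \
          "$src_dir/../data/interim/flags/rsconnect"
}
```

After sourcing the file from the src directory, call `clean_flags` and then `bash run.sh`.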

Key Snakemake Rules

Mapping: Snakefile_map.smk
- Handles filtering, mapping reads to spike-in, vector, and host references, and removing duplicates.
Annotation: Snakefile_anno.smk
- Annotates integration sites, identifies nearest transcription start sites (TSS), and overlaps with genomic features.
Coverage Calculation: Snakefile_cov.smk
- Calculates coverage across different regions and generates visualizations.
Integration Sites Analysis: Snakefile_IS.smk
- Parses split reads, identifies integration sites, and performs sonication site analysis.
Spike-in Analysis: Snakefile_IS_spikeIn.smk
- Similar to IS analysis but specifically for spike-in controls.

Example Commands

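To run a specific stage of the workflow, change the final target in the run.sh script (site__target in the listing) to one of the targets named in the script's comment: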

#!/bin/bash
# set -x

# 1. conda activate illumina_shortRead
# 2. prepare master_mapping.tsv in the ../configs folder
# 3. symlink the raw data folder at /data/raw/shortRead/

## Available targets: ref__target map__target cov__target count__target IS_spikeIn__target IS__target anno__target cleanIS__target annofinal__target plot__target site__target

snakemake -s Snakefile.py \
        --use-conda --conda-frontend mamba --use-singularity \
        --singularity-args "-B /mnt/bfx-ops/:/mnt/bfx-ops/ -B /mnt/data/:/mnt/data/" \
        -j 60 \
        --configfile=../configs/aav_integration.json \
        --rerun-incomplete -p  \
        site__target
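
Instead of editing run.sh for each stage, the target can be passed as an argument. The wrapper below is a sketch, not part of the repository: `run_target` and `is_known_target` are hypothetical names, the target list is copied from the comment in run.sh, and the snakemake options are the ones shown above.

```shell
#!/bin/bash
# Targets copied from the comment in run.sh.
TARGETS="ref__target map__target cov__target count__target IS_spikeIn__target IS__target anno__target cleanIS__target annofinal__target plot__target site__target"

# Returns 0 if $1 is one of the targets listed above, 1 otherwise.
is_known_target() {
    local t
    for t in $TARGETS; do
        [ "$t" = "$1" ] && return 0
    done
    return 1
}

# Runs the workflow up to the given target (default: site__target).
run_target() {
    local target="${1:-site__target}"
    if ! is_known_target "$target"; then
        echo "Unknown target: $target" >&2
        return 1
    fi
    snakemake -s Snakefile.py \
        --use-conda --conda-frontend mamba --use-singularity \
        --singularity-args "-B /mnt/bfx-ops/:/mnt/bfx-ops/ -B /mnt/data/:/mnt/data/" \
        -j 60 \
        --configfile=../configs/aav_integration.json \
        --rerun-incomplete -p \
        "$target"
}
```

Source the file from the src directory and call, for example, `run_target map__target`. Checking the name first avoids starting Snakemake with a mistyped target.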

Generating Reports

The project includes several R Markdown (.rmd) scripts for generating detailed reports on integration sites, coverage, and other analyses. The reports are hosted on Spark's RStudio Connect server.

Contributing

Contributions are welcome! Please fork the repository and submit pull requests. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For any questions or issues, please open an issue on GitHub or contact chao.di@sparktx.com.
