- Overview
- Installation
- Preparing the Reference Genome for the Pipeline
- External Dependencies
- Contributing
Overview
NGS Analyzer is a Python-based pipeline for analyzing amplicon-based next-generation sequencing (NGS) data. It streamlines the main analysis steps, including demultiplexing, read alignment, variant calling, and variant annotation.
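The workflow described above can be pictured as a chain of stages, each consuming the file type the previous one produces. The sketch below is a simplified, hypothetical model: the Stage class, the stage names, the file formats and the assignment of tools to stages are assumptions inferred from the External Dependencies list, not the pipeline's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    consumes: str
    produces: str
    example_tools: tuple[str, ...]


# Simplified, hypothetical model of the workflow; the real pipeline may
# order, split, or name its steps differently.
STAGES = (
    Stage("demultiplexing", "BCL", "FASTQ", ("bcl2fastq2",)),
    Stage("trimming", "FASTQ", "FASTQ", ("Trimmomatic", "pTrimmer")),
    Stage("alignment", "FASTQ", "BAM", ("bwa-mem2", "bwa", "samtools", "Picard")),
    Stage("variant calling", "BAM", "VCF", ("Pisces", "GATK")),
    Stage("annotation", "VCF", "annotated VCF", ("SnpEff", "Annovar")),
)


def check_chain(stages) -> None:
    """Raise ValueError if a stage cannot consume the previous stage's output."""
    for previous, current in zip(stages, stages[1:]):
        if previous.produces != current.consumes:
            raise ValueError(
                f"{previous.name} -> {current.name}: "
                f"{previous.produces} does not match {current.consumes}"
            )


check_chain(STAGES)
for stage in STAGES:
    print(f"{stage.name}: {stage.consumes} -> {stage.produces} ({', '.join(stage.example_tools)})")
```

Running it prints one line per stage; check_chain raises ValueError when two adjacent stages do not connect, which is a cheap way to catch a mis-ordered workflow definition.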
Installation
To isolate project dependencies, it is recommended to use a virtual environment.
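After creating and activating the environment with the commands that follow, you can confirm from Python that the interpreter belongs to it: inside an environment created by the standard venv module, sys.prefix points at the environment while sys.base_prefix still points at the base installation. A minimal sketch (the function name in_virtualenv is illustrative, not part of NGS Analyzer):

```python
import sys


def in_virtualenv() -> bool:
    """Return True when this interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active" if in_virtualenv() else "no virtual environment")
```

Run it with the same python command you use for pip; if it reports no virtual environment, pip would install into the base interpreter.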
Create the virtual environment with:
python -m venv venv # Linux / Mac
C:\Path\To\Python\python.exe -m venv venv # Windows
Activate the virtual environment with:
source venv/bin/activate # Linux / Mac
.\venv\Scripts\activate # Windows
To use the program, install the runtime requirements:
python -m pip install --upgrade pip
python -m pip install -r requirements/requirements.txt
For development, install the development requirements:
python -m pip install -r requirements/requirements-dev.txt
On Windows, the requirements can also be installed by calling an interpreter by its full path. Note that calling the base interpreter this way installs the packages into that interpreter, not into the virtual environment:
C:\Path\To\Python\python.exe -m pip install -r requirements\requirements.txt
For developers:
C:\Path\To\Python\python.exe -m pip install -r requirements\requirements-dev.txt
Or, if Python is in your PATH environment variable:
python.exe -m pip install -r requirements\requirements.txt
Preparing the Reference Genome for the Pipeline
To ensure the pipeline works correctly, the reference genome must be prepared first:
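- Index the genome using BWA (if you align with bwa-mem2, index it with bwa-mem2 index instead, because the two tools use different index formats):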
/path/to/bwa index /path/to/genome_file.fa
- Create a GenomeSize.xml file for Pisces:
DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1 \
LD_LIBRARY_PATH="/usr/local/lib" \
/path/to/pisces/CreateGenomeSizeFile \
-g path/to/genome_folder/ \
-s "Homo sapiens (<build alias>)"
For the build alias, you can use "UCSC hg38" as an example.
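Before the first run, a small check can confirm that the preparation steps above produced their files. This is a sketch under two assumptions: bwa index was run on the FASTA file in place, so it appends .amb, .ann, .bwt, .pac and .sa to the FASTA file name; and GenomeSize.xml sits in the same folder as the FASTA (the folder passed to CreateGenomeSizeFile with -g). The helper name missing_reference_files is hypothetical, and a bwa-mem2 index would need a different suffix list.

```python
from __future__ import annotations

from pathlib import Path

# Suffixes that `bwa index genome_file.fa` appends to the FASTA file name.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_reference_files(fasta: Path) -> list[Path]:
    """Return the expected reference files that do not exist yet.

    Hypothetical helper: assumes the BWA index sits next to the FASTA file
    and that GenomeSize.xml is in the same folder (the -g genome folder).
    """
    expected = [fasta.with_name(fasta.name + suffix) for suffix in BWA_INDEX_SUFFIXES]
    expected.append(fasta.parent / "GenomeSize.xml")
    return [path for path in expected if not path.exists()]
```

For example, missing_reference_files(Path("/path/to/genome_file.fa")) returns an empty list once both preparation steps have written their files next to the FASTA.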
External Dependencies
The project also relies on several external tools. Make sure each tool you need is installed and that its path is set in your configuration.
Tool | Purpose
--- | ---
bcl2fastq2 | Converts Illumina BCL files to FASTQ files.
bwa-mem2 | Fast aligner for sequencing reads.
bwa | Burrows-Wheeler Aligner for short-read alignment (for systems with less than 64 GB of RAM).
Trimmomatic | Read trimming tool.
pTrimmer | Primer trimming and quality control.
Picard | Toolkit for manipulating high-throughput sequencing data.
Genome Analysis Toolkit (GATK) | Toolkit for variant discovery.
samtools | Utilities for manipulating alignments and variant calling.
bedtools | Genomic interval manipulation.
SnpEff | Variant annotation and effect prediction.
Pisces | Variant caller optimized for amplicon sequencing.
Annovar | Variant annotation tool.
Install each tool you need by following its official installation instructions.
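Two points from the table can be checked from Python before running the pipeline: whether the command-line tools are reachable on PATH, and which aligner fits the machine's memory, following the 64 GB guideline given for bwa. In this sketch the executable names in REQUIRED_EXECUTABLES are assumptions (tools such as Picard, GATK and Trimmomatic are often launched through java -jar or a wrapper script), and if your configuration points at absolute paths, check those paths instead of PATH. Reading total memory through os.sysconf only works on POSIX systems such as Linux and macOS.

```python
from __future__ import annotations

import os
import shutil

# Hypothetical executable names; actual names depend on how each tool was installed.
REQUIRED_EXECUTABLES = ("bcl2fastq", "bwa", "samtools", "bedtools")

# Guideline from the dependency table: use bwa on systems with less than 64 GB of RAM.
BWA_MEM2_MIN_RAM_BYTES = 64 * 1024**3


def missing_executables(names=REQUIRED_EXECUTABLES) -> list[str]:
    """Return the names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def total_ram_bytes() -> int | None:
    """Total physical memory via POSIX sysconf; None where unsupported (e.g. Windows)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def choose_aligner(ram_bytes: int | None) -> str:
    """Pick bwa-mem2 when at least 64 GB of RAM is reported, otherwise bwa."""
    if ram_bytes is not None and ram_bytes >= BWA_MEM2_MIN_RAM_BYTES:
        return "bwa-mem2"
    return "bwa"
```

The operating system usually reports slightly less memory than is physically installed, so treat the threshold as approximate and lower it if a 64 GB machine is classified as too small.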
Contributing
Contributions are welcome:
- Fork the repository
- Create a feature branch
- Open a pull request against the dev branch
Issues are welcome too!
NGS Analyzer is currently in development. See the project documentation on Read the Docs.