archsaurus/ngs-analyzer

NGS Analyzer

Overview

NGS Analyzer is a Python-based pipeline for amplicon-based NGS data analysis. It covers demultiplexing, read alignment, variant calling, and annotation in a single workflow for genomics research.

Installation

Prerequisites

Python Requirements

Environment Setup and Virtual Environment Activation

To isolate project dependencies, we recommend using a virtual environment.

Create the virtual environment with:

python -m venv venv # Linux / Mac

C:\Path\To\Python\python.exe -m venv venv # Windows

Activate the virtual environment with:

source venv/bin/activate # Linux / Mac

.\venv\Scripts\activate # Windows
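To confirm that activation worked before installing anything, a small check like the following can help. It is a minimal sketch, not part of the project: the function name `in_virtualenv` is hypothetical, and the check relies on the standard behaviour of environments created with `python -m venv`, where `sys.prefix` differs from `sys.base_prefix`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside an environment created by `python -m venv`, sys.prefix points at
    # the environment directory while sys.base_prefix still points at the
    # base Python installation. Outside one, the two are equal.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

Running this with the interpreter you intend to use shows which Python the following `pip` commands will install into.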

Fetching requirements

On Linux / Mac:

For regular use:

python -m pip install --upgrade pip
python -m pip install -r requirements/requirements.txt

For development:

python -m pip install -r requirements/requirements-dev.txt

On Windows:

For regular use:

C:\Path\To\Python\python.exe -m pip install -r requirements\requirements.txt

For development:

C:\Path\To\Python\python.exe -m pip install -r requirements\requirements-dev.txt

If python is in your PATH environment variable, you can shorten the command:

python.exe -m pip install -r requirements\requirements.txt
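The install commands differ between platforms only in how the interpreter is addressed. One way to avoid that difference is to call pip through `sys.executable`, which is always the interpreter currently running. The sketch below assumes it is run from the repository root; the helper names `pip_install_command` and `install` are hypothetical and not part of the project.

```python
import subprocess
import sys
from pathlib import Path


def pip_install_command(requirements: Path) -> list[str]:
    """Build the `python -m pip install -r <file>` command for this interpreter."""
    # sys.executable is the full path of the running interpreter, so the
    # packages land in the active virtual environment on any platform.
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements)]


def install(requirements: Path) -> None:
    """Install the packages listed in a requirements file."""
    if not requirements.is_file():
        raise FileNotFoundError(f"requirements file not found: {requirements}")
    # check=True raises CalledProcessError if pip exits with an error.
    subprocess.run(pip_install_command(requirements), check=True)


if __name__ == "__main__":
    # Path taken from the commands above; pathlib handles the separator.
    install(Path("requirements") / "requirements.txt")
```

Swap in `requirements-dev.txt` for a development setup.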

Preparing the Reference Genome for the Pipeline

To ensure proper functioning of the pipeline, the reference genome needs to be prepared:

  1. Index the genome using BWA:
/path/to/bwa index /path/to/genome_file.fa
  2. Create a GenomeSize.xml file for Pisces:
DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1 \
LD_LIBRARY_PATH="/usr/local/lib" \
/path/to/pisces/CreateGenomeSizeFile \
    -g path/to/genome_folder/ \
    -s "Homo sapiens (<build alias>)"

For the build alias, you can use "UCSC hg38" as an example.
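After indexing, it can be useful to verify that the index is complete before starting a run. The sketch below checks for the five files that `bwa index` writes next to the FASTA file (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`). The function name `missing_bwa_index_files` is hypothetical, and the check assumes the default index prefix, that is, the index was built without the `-p` option. It does not cover bwa-mem2 indexes or the Pisces GenomeSize.xml file.

```python
import sys
from pathlib import Path

# Suffixes that `bwa index` appends to the FASTA file name by default.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(fasta: Path) -> list[Path]:
    """Return the expected BWA index files that do not exist for `fasta`."""
    # The suffix is appended to the full file name: genome.fa -> genome.fa.amb
    expected = [Path(str(fasta) + suffix) for suffix in BWA_INDEX_SUFFIXES]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Usage: python check_index.py /path/to/genome_file.fa
    missing = missing_bwa_index_files(Path(sys.argv[1]))
    if missing:
        print("Missing index files:")
        for path in missing:
            print(" ", path)
    else:
        print("BWA index looks complete.")
```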

External Dependencies

The project also relies on several external tools. Please ensure these are installed and that their paths are set in your configuration.

Tool | Description
--- | ---
bcl2fastq2 | Converts Illumina BCL files to FASTQ files.
bwa-mem2 | Fast aligner for sequencing reads.
bwa | Burrows-Wheeler Aligner for short-read alignment (for systems with less than 64 GB RAM).
Trimmomatic | Read trimming tool.
pTrimmer | Adapter trimming and quality control.
Picard | Toolkit for manipulating high-throughput sequencing data.
Genome Analysis Toolkit (GATK) | Toolkit for variant discovery.
samtools | Utilities for manipulating alignments and variant calling.
bedtools | Genomic interval manipulation.
SnpEff | Variant annotation and effect prediction.
Pisces | Variant caller optimized for amplicon sequencing.
Annovar | Variant annotation tool.

Follow the official installation instructions for each tool you need.
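A quick way to see which command-line tools are already reachable is to look them up on PATH. The sketch below is only an illustration: the executable names in `DEFAULT_TOOLS` are assumptions about how the tools are typically installed, and the function `find_missing_tools` is hypothetical. Tools distributed as Java archives or standalone builds (for example Trimmomatic, Picard, SnpEff, Pisces) are usually referenced by path in the configuration instead, so they are not covered here.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
DEFAULT_TOOLS = ("bcl2fastq", "bwa", "bwa-mem2", "samtools", "bedtools")


def find_missing_tools(names=DEFAULT_TOOLS) -> list[str]:
    """Return the tool names that cannot be found on PATH."""
    # shutil.which returns the full path of an executable, or None if absent.
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All checked tools are available on PATH.")
```

A tool reported as missing here may still work if its full path is given in the pipeline configuration.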


Contributing

Contributions are welcome:

  1. Fork the repository.
  2. Create a feature branch.
  3. Open a pull request against the dev branch.

Issues are welcome too.

The project is currently in development. See the project documentation on Read the Docs.
