GWASPipe

GWASPipe is a Python-based computational pipeline that streamlines the management and processing of genome-wide association study (GWAS) summary statistics. It automates complex workflows for quality control, standardization, and visualization, making multi-study harmonization more reproducible, efficient, and less error-prone.

Key Features

Modular Architecture: Organized into reusable components (order_alleles, utils)
Automated Workflows: Handles normalization, allele harmonization, filtering, and QC metrics
Flexible Configuration: Uses YAML configuration files for customizable processing
Comprehensive Reporting: Generates QC reports and visualizations
High Performance: Leverages parallel processing and optimized algorithms
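Allele harmonization usually means aligning each variant's effect allele to a reference ALT allele. The effect size is negated when the alleles are swapped, and the strand is flipped when the study reports the complementary strand. The sketch below shows that general technique for SNPs only. It is a hypothetical illustration, not GWASPipe's or gwaslab's implementation: the helper names `complement` and `harmonize` and the choice to set palindromic (A/T, C/G) SNPs aside for review are all assumptions.

```python
# Hypothetical sketch of allele harmonization against a reference panel.
# Not GWASPipe's implementation; handles SNPs only.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(allele: str) -> str:
    """Reverse-complement an allele; unknown bases become 'N'."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(allele))


def harmonize(ea: str, nea: str, beta: float, ref: str, alt: str):
    """Align a variant so that its effect allele matches the reference ALT allele.

    Returns (ea, nea, beta, action), where action names what was done.
    """
    ea, nea, ref, alt = (a.upper() for a in (ea, nea, ref, alt))

    # A/T and C/G SNPs look the same on both strands, so strand cannot be
    # resolved from the alleles alone; return them unchanged for review.
    if len(ea) == 1 and COMPLEMENT.get(ea) == nea:
        return ea, nea, beta, "palindromic"
    if (ea, nea) == (alt, ref):
        return ea, nea, beta, "match"
    if (ea, nea) == (ref, alt):
        # Effect reported for the other allele: swap alleles and negate beta.
        return nea, ea, -beta, "swap"
    ea_c, nea_c = complement(ea), complement(nea)
    if (ea_c, nea_c) == (alt, ref):
        # Reported on the opposite strand: complement both alleles.
        return ea_c, nea_c, beta, "flip"
    if (ea_c, nea_c) == (ref, alt):
        # Opposite strand and swapped: complement, swap, and negate beta.
        return nea_c, ea_c, -beta, "flip+swap"
    return ea, nea, beta, "incompatible"
```

For example, `harmonize("A", "G", 0.5, ref="A", alt="G")` returns `("G", "A", -0.5, "swap")`.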

Requirements

Python: 3.11 or higher
Dependencies: See pyproject.toml for the complete list
Key Packages:
- gwaslab - Core GWAS processing library
- pandas, numpy - Data manipulation
- click, cloup - Command-line interface
- loguru - Advanced logging
- ruamel.yaml - YAML configuration
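Before installing, it can help to check which of the key packages above are already importable in the current environment. The following is a minimal sketch built on the standard library's `importlib.util.find_spec`. It assumes each package is imported under the name shown above (for example `ruamel.yaml`). The helper name `missing_packages` is hypothetical and is not part of GWASPipe.

```python
import importlib.util
import sys

# Import names of the key packages listed above (assumed to match their
# import names).
KEY_PACKAGES = ("gwaslab", "pandas", "numpy", "click", "cloup", "loguru", "ruamel.yaml")


def missing_packages(names=KEY_PACKAGES):
    """Return the names in `names` that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # For a dotted name such as "ruamel.yaml", find_spec imports the
            # parent package first, which raises if the parent is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    if sys.version_info < (3, 11):
        print(f"Python 3.11 or higher is required; found {sys.version.split()[0]}")
    gaps = missing_packages()
    print("Missing packages:", ", ".join(gaps) if gaps else "none")
```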

Installation

Via Conda/Mamba (Recommended)

# Clone the repository
git clone https://github.com/ht-diva/gwaspipe.git
cd gwaspipe

# Create and activate environment
conda env create -f environment_docker.yml
conda activate gwaspipe

# Install package
make install

Via Docker

# Pull the Docker image
docker pull ghcr.io/ht-diva/gwaspipe:latest

# Run container
docker run -v "$(pwd)":/data ghcr.io/ht-diva/gwaspipe:latest gwaspipe --help

Quick Start

Process GWAS summary statistics with a single command:

gwaspipe \
  -c examples/config_sumstats_harmonization.yml \
  -i examples/input_data.tsv.gz \
  -f regenie \
  -o results/
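To run the same configuration over several studies, you can generate one Quick Start command per input file. The sketch below uses only the `-c`, `-i`, `-f` and `-o` options shown above. The helper names, the example input paths and the one-output-directory-per-study layout are hypothetical. It also assumes that `gwaspipe` accepts an output directory that does not exist yet, which has not been confirmed. With the default `dry_run=True`, the commands are only built and nothing is executed.

```python
import subprocess
from pathlib import Path


def build_command(config, input_file, fmt, outdir):
    """Mirror the Quick Start call: gwaspipe -c CONFIG -i INPUT -f FORMAT -o OUTDIR."""
    return ["gwaspipe", "-c", str(config), "-i", str(input_file), "-f", fmt, "-o", str(outdir)]


def run_batch(inputs, config, fmt, out_root, dry_run=True):
    """Build (and, if dry_run is False, run) one gwaspipe command per input file."""
    commands = []
    for path in inputs:
        # Hypothetical layout: one output directory per study, named after the
        # input file with all extensions removed (study1.tsv.gz -> study1).
        outdir = Path(out_root) / Path(path).name.split(".")[0]
        cmd = build_command(config, path, fmt, outdir)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing study
    return commands


if __name__ == "__main__":
    for cmd in run_batch(
        ["data/study1.tsv.gz", "data/study2.tsv.gz"],  # hypothetical inputs
        "examples/config_sumstats_harmonization.yml",
        "regenie",
        "results",
    ):
        print(" ".join(cmd))
```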

Module Documentation

Order Alleles Module

The gwaspipe.order_alleles module provides allele-ordering functions for summary-statistics tables. A minimal example:

from gwaspipe.order_alleles import order_alleles
import pandas as pd
from gwaslab.info.g_Log import Log

# Basic usage
df = pd.DataFrame({
    'CHR': [1, 2, 3],
    'POS': [1000, 2000, 3000],
    'EA': ['A', 'T', 'C'],
    'NEA': ['T', 'A', 'G'],
    'STATUS': [9999999, 9999999, 9999999]
})

log = Log()
result = order_alleles(df, log=log)
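A common way to order alleles is to sort the two alleles alphabetically. This gives each variant a canonical identifier such as `CHR:POS:A1:A2`, and the effect size is negated whenever the effect allele is not the first allele after sorting. The documentation above does not say which rule `order_alleles` applies. The sketch below is therefore a hypothetical illustration of that general convention, using the `CHR`, `POS`, `EA` and `NEA` values from the example plus an assumed `beta`. It is not a description of the function itself.

```python
def canonical_variant(chrom, pos, ea, nea, beta):
    """Hypothetical helper: build an ID with alphabetically sorted alleles.

    Returns (variant_id, aligned_beta), where aligned_beta is the effect of
    the first allele in the ID.
    """
    ea, nea = ea.upper(), nea.upper()
    a1, a2 = sorted((ea, nea))
    # If the effect allele is not first after sorting, the effect direction
    # must be reversed so that it refers to a1.
    aligned_beta = beta if ea == a1 else -beta
    return f"{chrom}:{pos}:{a1}:{a2}", aligned_beta


# Same alleles as the DataFrame above; the beta values are made up.
rows = [(1, 1000, "A", "T", 0.3), (2, 2000, "T", "A", 0.3), (3, 3000, "C", "G", 0.1)]
for row in rows:
    print(canonical_variant(*row))
```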

Configuration

GWASPipe uses YAML configuration files to define processing pipelines. See the Getting Started Guide for detailed configuration examples.
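The snippet below illustrates the general idea of a configuration-driven pipeline: an ordered list of named steps in a YAML file, each mapped to a function and applied in sequence. The `steps` / `name` / `params` layout and the two step functions are hypothetical and do not reflect GWASPipe's actual configuration schema. `load_config` uses ruamel.yaml (one of the listed dependencies) and imports it lazily, so the rest of the sketch runs without it installed.

```python
from pathlib import Path


def load_config(path):
    """Load a YAML configuration file with ruamel.yaml."""
    from ruamel.yaml import YAML  # imported lazily; only needed to read files
    return YAML(typ="safe").load(Path(path))


# Hypothetical steps operating on a list of variant records (dicts).
def drop_missing_pvalue(rows):
    return [r for r in rows if r.get("P") is not None]


def filter_pvalue(rows, threshold=1.0):
    return [r for r in rows if r["P"] <= threshold]


STEPS = {"drop_missing_pvalue": drop_missing_pvalue, "filter_pvalue": filter_pvalue}


def run_steps(rows, config):
    """Apply each configured step in order; order matters (drop missing P first)."""
    for step in config.get("steps", []):
        func = STEPS[step["name"]]  # KeyError for unknown step names
        rows = func(rows, **step.get("params", {}))
    return rows


# The dict below mirrors what load_config() might return for a hypothetical file.
config = {
    "steps": [
        {"name": "drop_missing_pvalue"},
        {"name": "filter_pvalue", "params": {"threshold": 0.05}},
    ]
}
print(run_steps([{"P": 0.01}, {"P": None}, {"P": 0.2}], config))
```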

Contributing

We welcome contributions! Please follow these steps:

1. Fork the repository.
2. Create a feature branch: git checkout -b feature/your-feature
3. Make your changes and add tests.
4. Commit your changes: git commit -m "Add feature description"
5. Push the branch: git push origin feature/your-feature
6. Open a pull request.

License

GWASPipe is released under the MIT License.
