From 253eda4abf4548921b37171c0bdfff631f253a41 Mon Sep 17 00:00:00 2001
From: sanjaysgk <44039457+sanjaysgk@users.noreply.github.com>
Date: Tue, 26 May 2026 11:04:00 +1000
Subject: [PATCH] docs: refresh README with badges, live demo/docs links, and GFM callouts

---
 README.md | 136 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 106 insertions(+), 30 deletions(-)

diff --git a/README.md b/README.md
index ced044d..723ad65 100644
--- a/README.md
+++ b/README.md
@@ -1,21 +1,53 @@
 # MHC-TP
 
-Cluster immunopeptidomics peptides by their HLA/MHC binding motif and get a
-ranked table plus a standalone interactive HTML report.
+[![PyPI](https://img.shields.io/pypi/v/mhc-tp.svg)](https://pypi.org/project/mhc-tp/)
+[![Python](https://img.shields.io/pypi/pyversions/mhc-tp.svg)](https://pypi.org/project/mhc-tp/)
+[![Docs](https://img.shields.io/badge/docs-mkdocs--material-526CFE.svg)](https://purcelllab.github.io/MHC-TP/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+
+Cluster immunopeptidomics peptides by their **HLA/MHC binding motif** and get a
+ranked table plus a standalone, interactive HTML report.
 
 `mhc-tp` takes a **GibbsCluster** output folder, correlates each cluster's
 position-specific scoring matrix against a reference of HLA/MHC **class I + II**
-binding motifs (human & mouse), and writes the best allele match per cluster.
+binding motifs (human & mouse), and reports the best allele match per cluster.
+
+> [!TIP]
+> 📖 **[Documentation](https://purcelllab.github.io/MHC-TP/)** &nbsp;·&nbsp;
+> 🔬 **[Live example report](https://purcelllab.github.io/MHC-TP/example-report.html)** &nbsp;·&nbsp;
+> 📦 **[PyPI](https://pypi.org/project/mhc-tp/)**
 
 ---
 
-## For users
+## Quick start
 
-**Requirements:** Python 3.9–3.11.
+```bash
+pip install mhc-tp
+mhc-tp fetch -s human # download reference motifs (once)
+mhc-tp search -s human -o results/
+```
+
+Open `results/clust_result/mhc-tp-result.html` in any browser — see what it looks like in the
+**[live example report](https://purcelllab.github.io/MHC-TP/example-report.html)**.
+
+> [!NOTE]
+> **Requirements:** Python 3.9–3.11. A virtual environment is recommended
+> (`python -m venv .venv && source .venv/bin/activate`).
+
+---
 
-### 1. Install
+## Install
 
-Clone the repo and install it (editable, so `git pull` updates the tool):
+From PyPI (recommended):
+
+```bash
+pip install mhc-tp
+```
+
+<details>
+<summary>Or install editable from source</summary>
+
+So that `git pull` updates the tool:
 
 ```bash
 git clone https://github.com/PurcellLab/MHC-TP.git
@@ -23,10 +55,11 @@ cd MHC-TP
 pip install -e .
 ```
 
-> Prefer a one-liner without cloning? `pip install git+https://github.com/PurcellLab/MHC-TP.git`
-> A virtual environment (`python -m venv .venv && source .venv/bin/activate`) is recommended.
+One-liner without cloning: `pip install "git+https://github.com/PurcellLab/MHC-TP.git"`
+
+</details>
 
-### 2. Download the reference data (once)
+## Download the reference data (once)
 
 The reference motifs are fetched from the GitHub release, not bundled:
 
@@ -34,29 +67,31 @@ The reference motifs are fetched from the GitHub release, not bundled:
 mhc-tp fetch -s human # or: mouse | all
 ```
 
-### 3. Run a search
+## Run a search
 
 ```bash
 mhc-tp search -s human -o results/
 ```
 
-`` is a GibbsCluster run folder (it must contain a
-`matrices/` subdirectory).
+`` is a GibbsCluster run folder (it must contain a `matrices/` subdirectory).
 
 **Outputs** land in `results/clust_result/`:
 
 | file | what it is |
 |------|------------|
-| `correlations.csv` | every cluster→allele match above the threshold (`hla` = display name, `formatted` = raw key, `correlation` = PCC) |
+| `correlations.csv` | every cluster→allele match (`hla` = display name, `formatted` = raw key, `correlation` = PCC) |
-| `mhc-tp-result.html` | standalone report — open it in any browser |
+| `mhc-tp-result.html` | standalone interactive report — open it in any browser |
 
-### Common options
+### Options
 
 | flag | meaning | default |
 |------|---------|---------|
 | `-s, --species` | `human` or `mouse` | `human` |
+| `-c, --class` | restrict the reference to MHC class `I`, `II`, or `all` | `all` |
 | `-r, --reference` | path to a `.parquet` (otherwise the fetched one is used) | auto |
-| `-t, --threshold` | minimum Pearson correlation to report | `0.70` |
+| `-t, --threshold` | minimum Pearson correlation (PCC) to report | `0.70` |
+| `--topNHits` | allele matches to keep per cluster | `3` |
+| `--always-top-n` | keep each cluster's top-N even below threshold (flagged in the report) | off |
 | `-o, --output` | output directory | `output` |
 | `--threads` | max CPU threads (also `$MHC_TP_THREADS`) | `4` |
 | `--no-html` | write only the CSV | off |
@@ -64,43 +99,67 @@ mhc-tp search -s human -o results/
 Run `mhc-tp search --help` for the full list.
 
 
+### Examples
+
+```bash
+# Class I only, keep the top 5 matches per cluster
+mhc-tp search runs/sampleA -s human -c I --topNHits 5 -o results/
+
+# Guarantee a top-3 for every cluster (weak matches tagged "below cutoff")
+mhc-tp search runs/sampleA -s human --always-top-n -o results/
+```
+
+> [!IMPORTANT]
+> By default a match must score `≥ --threshold`, so a cluster can return fewer than
+> `--topNHits` rows (or none). `--always-top-n` returns the best N regardless — the
+> threshold then only **annotates** confidence and nothing is dropped.
+
 ---
 
 ## For contributors / developers
 
-The project uses [pixi](https://pixi.sh) for a reproducible dev environment
-(Python 3.11) and a `src/` layout packaged with hatchling.
+<details>
+<summary>Dev environment, tests, and docs (click to expand)</summary>
+
+The project uses [pixi](https://pixi.sh) for a reproducible dev environment (Python 3.11)
+and a `src/` layout packaged with hatchling.
 
 ```bash
 git clone https://github.com/PurcellLab/MHC-TP.git
 cd MHC-TP
 pixi install # create the dev env from pixi.lock
-pixi run dev-install # editable-install the package into the env (run once)
+pixi run dev-install # editable-install the package (run once)
 pixi run test # pytest
 pixi run lint # ruff
 pixi run fmt # black
 ```
 
-Always run via `pixi run …` — a bare `python` may pick up a different
-interpreter without the pinned dependencies.
+> [!WARNING]
+> Always run via `pixi run …` — a bare `python` may pick up a different interpreter
+> without the pinned dependencies. CI enforces `black --check`, so run `pixi run fmt` before pushing.
+
+### Preview the docs site
+
+```bash
+pip install -e ".[docs]"
+mkdocs serve # live preview at http://127.0.0.1:8000
+mkdocs build # static site in ./site
+```
 
 ### Rebuilding the reference data (dev only)
 
 End users never do this. The per-species parquets are built once from the
 NetMHCpan / NetMHCIIpan packs and uploaded to the release. Embedding the
-Seq2Logo reference logos (`--with-logos`) needs a separate Python 2.7 env and
-is slow — run it on a cluster:
+Seq2Logo reference logos (`--with-logos`) needs a separate Python 2.7 env and is slow:
 
 ```bash
-mhc-tp build-ref \
-  --with-logos --workers 16
-# Seq2Logo itself runs in its own env: pixi run -e seq2logo ...
+mhc-tp build-ref --with-logos --workers 16
 ```
 
 ### Layout
 
-```
+```text
 src/mhc_tp/
   cli.py        entry point (mhc-tp)
   engine/       numba correlation search
@@ -109,8 +168,21 @@ src/mhc_tp/
   db/           DEV-ONLY reference-pack ingestion
   tui/          Rich console banner, logging, results table
 tests/          pytest suite
+docs/           MkDocs site
 ```
+
+</details>
 
+---
+
+## How it works
+
+For each GibbsCluster motif, every reference allotype motif is scored by the **Pearson
+correlation** of their flattened position-weight matrices, computed only over the
+informative cells of the cluster motif. Per cluster the allotypes are ranked by PCC
+(`1.0` = identical motif shape). Full method and formula:
+the **[API reference](https://purcelllab.github.io/MHC-TP/api/)**.
+
 ---
 
 ## Citation
@@ -118,11 +190,14 @@ tests/ pytest suite
 If you use MHC-TP in your work, please cite:
 
 > Munday PR, Krishna SSG, Fehring J, Croft NP, Purcell AW, Li C, Braun A.
-> Immunolyser 2.0: An advanced computational pipeline for comprehensive analysis
-> of immunopeptidomic data. *Comput Struct Biotechnol J.* 2025;29:296–304.
+> *Immunolyser 2.0: An advanced computational pipeline for comprehensive analysis of
+> immunopeptidomic data.* Comput Struct Biotechnol J. 2025;29:296–304.
 > doi:[10.1016/j.csbj.2025.10.007](https://doi.org/10.1016/j.csbj.2025.10.007).
 > PMID: [41209766](https://pubmed.ncbi.nlm.nih.gov/41209766/); PMCID: PMC12590289.
+
+<details>
+<summary>BibTeX</summary>
 
 ```bibtex
 @article{Munday2025Immunolyser2,
   title = {Immunolyser 2.0: An advanced computational pipeline for comprehensive analysis of immunopeptidomic data},
@@ -137,3 +212,4 @@ If you use MHC-TP in your work, please cite:
 }
 ```
 
+</details>
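The "How it works" paragraph added by this patch describes the scoring rule in prose only. The Python sketch below illustrates that rule and has no dependencies. It computes a Pearson correlation over the cluster motif's informative cells, then ranks the reference motifs best-first. This is not MHC-TP's own code; the patch's layout points to a numba correlation engine instead. Several details are assumptions:

- the function names;
- the nested-list matrix layout;
- the rule that a cell counts as informative when the cluster value is non-zero.

The reference names `ref_A`/`ref_B` and their values are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: a motif is a positions x amino-acids matrix of floats.
Matrix = Sequence[Sequence[float]]


def _flatten(matrix: Matrix) -> List[float]:
    """Row-major flattening of a position-weight matrix."""
    return [float(v) for row in matrix for v in row]


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation; NaN when either vector is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


def informative_pcc(cluster: Matrix, reference: Matrix, eps: float = 1e-12) -> float:
    """PCC restricted to the cells where the cluster motif is informative.

    ASSUMPTION: "informative" means |cluster value| > eps; MHC-TP's real
    criterion may differ.
    """
    c, r = _flatten(cluster), _flatten(reference)
    if len(c) != len(r):
        raise ValueError("motifs must have the same shape")
    keep = [i for i, v in enumerate(c) if abs(v) > eps]
    if len(keep) < 2:
        return math.nan  # too few cells to correlate
    return _pearson([c[i] for i in keep], [r[i] for i in keep])


def rank_references(cluster: Matrix, references: Dict[str, Matrix]) -> List[Tuple[str, float]]:
    """Score every reference motif and sort best-first (NaN scores dropped)."""
    scored = [(name, informative_pcc(cluster, m)) for name, m in references.items()]
    scored = [(name, s) for name, s in scored if not math.isnan(s)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data (made up): the cluster motif is zero in its second cell.
cluster = [[1.0, 0.0], [2.0, 3.0]]
refs = {"ref_A": [[1.0, 100.0], [2.0, 3.0]], "ref_B": [[-1.0, -5.0], [-2.0, -3.0]]}
print(rank_references(cluster, refs))  # [('ref_A', 1.0), ('ref_B', -1.0)]
```

The mask matters in this example. `ref_A` disagrees wildly in the one cell where the cluster motif is zero, yet it still scores 1.0 because that cell is excluded from the correlation.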
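The IMPORTANT callout in this patch describes how `--threshold`, `--topNHits` and `--always-top-n` interact. The Python sketch below illustrates that rule, assuming each cluster's candidates arrive as `(allele, pcc)` pairs. `select_hits`, its keyword names and the sample scores are hypothetical stand-ins for the CLI flags; they do not come from MHC-TP.

```python
from typing import List, Sequence, Tuple

Hit = Tuple[str, float]  # (allele name, Pearson correlation)


def select_hits(
    ranked: Sequence[Hit],
    top_n: int = 3,
    threshold: float = 0.70,
    always_top_n: bool = False,
) -> List[Tuple[str, float, bool]]:
    """Return (allele, pcc, passes_threshold) rows for one cluster.

    Sketch of the documented behaviour, not the real implementation:
    - default: drop matches below `threshold`, then keep at most `top_n`,
      so a cluster may get fewer than `top_n` rows, or none;
    - always_top_n: keep the best `top_n` regardless and only flag the
      rows that fall below `threshold`.
    """
    best_first = sorted(ranked, key=lambda hit: hit[1], reverse=True)
    if always_top_n:
        return [(name, pcc, pcc >= threshold) for name, pcc in best_first[:top_n]]
    passing = [(name, pcc, True) for name, pcc in best_first if pcc >= threshold]
    return passing[:top_n]


# Made-up scores for a single cluster.
hits = [("allele_1", 0.91), ("allele_2", 0.74), ("allele_3", 0.55), ("allele_4", 0.40)]
print(select_hits(hits))                     # two rows: only 0.91 and 0.74 pass 0.70
print(select_hits(hits, always_top_n=True))  # three rows; allele_3 is flagged False
```

The function sorts internally, so callers need not pre-sort. Python's sort is stable even with `reverse=True`, so tied scores keep their input order.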