From 253eda4abf4548921b37171c0bdfff631f253a41 Mon Sep 17 00:00:00 2001
From: sanjaysgk <44039457+sanjaysgk@users.noreply.github.com>
Date: Tue, 26 May 2026 11:04:00 +1000
Subject: [PATCH] docs: refresh README with badges, live demo/docs links, and
 GFM callouts

---
 README.md | 136 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 106 insertions(+), 30 deletions(-)
diff --git a/README.md b/README.md
index ced044d..723ad65 100644
--- a/README.md
+++ b/README.md
@@ -1,21 +1,53 @@
 # MHC-TP
 
-Cluster immunopeptidomics peptides by their HLA/MHC binding motif and get a
-ranked table plus a standalone interactive HTML report.
+[![PyPI](https://img.shields.io/pypi/v/mhc-tp.svg)](https://pypi.org/project/mhc-tp/)
+[![Python](https://img.shields.io/pypi/pyversions/mhc-tp.svg)](https://pypi.org/project/mhc-tp/)
+[![Docs](https://img.shields.io/badge/docs-mkdocs--material-526CFE.svg)](https://purcelllab.github.io/MHC-TP/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+
+Cluster immunopeptidomics peptides by their **HLA/MHC binding motif** and get a
+ranked table plus a standalone, interactive HTML report.
 
 `mhc-tp` takes a **GibbsCluster** output folder, correlates each cluster's
 position-specific scoring matrix against a reference of HLA/MHC **class I + II**
-binding motifs (human & mouse), and writes the best allele match per cluster.
+binding motifs (human & mouse), and reports the best allele match per cluster.
+
+> [!TIP]
+> 📖 **[Documentation](https://purcelllab.github.io/MHC-TP/)** &nbsp;·&nbsp;
+> 🔬 **[Live example report](https://purcelllab.github.io/MHC-TP/example-report.html)** &nbsp;·&nbsp;
+> 📦 **[PyPI](https://pypi.org/project/mhc-tp/)**
 
 ---
 
-## For users
+## Quick start
 
-**Requirements:** Python 3.9–3.11.
+```bash
+pip install mhc-tp
+mhc-tp fetch -s human                                  # download reference motifs (once)
+mhc-tp search <gibbscluster_output_dir> -s human -o results/
+```
+
+Open `results/clust_result/mhc-tp-result.html` in any browser — see what it looks like in the
+**[live example report](https://purcelllab.github.io/MHC-TP/example-report.html)**.
+
+> [!NOTE]
+> **Requirements:** Python 3.9–3.11. A virtual environment is recommended
+> (`python -m venv .venv && source .venv/bin/activate`).
+
+---
 
-### 1. Install
+## Install
 
-Clone the repo and install it (editable, so `git pull` updates the tool):
+From PyPI (recommended):
+
+```bash
+pip install mhc-tp
+```
+
+<details>
+<summary>Or install editable from source</summary>
+
+An editable install picks up changes whenever you `git pull`:
 
 ```bash
 git clone https://github.com/PurcellLab/MHC-TP.git
@@ -23,10 +55,11 @@ cd MHC-TP
 pip install -e .
 ```
 
-> Prefer a one-liner without cloning? `pip install git+https://github.com/PurcellLab/MHC-TP.git`
-> A virtual environment (`python -m venv .venv && source .venv/bin/activate`) is recommended.
+One-liner without cloning: `pip install "git+https://github.com/PurcellLab/MHC-TP.git"`
+
+</details>
 
-### 2. Download the reference data (once)
+## Download the reference data (once)
 
 The reference motifs are fetched from the GitHub release, not bundled:
 
@@ -34,29 +67,31 @@ The reference motifs are fetched from the GitHub release, not bundled:
 mhc-tp fetch -s human     # or:  mouse  |  all
 ```
 
-### 3. Run a search
+## Run a search
 
 ```bash
 mhc-tp search <gibbscluster_output_dir> -s human -o results/
 ```
 
-`<gibbscluster_output_dir>` is a GibbsCluster run folder (it must contain a
-`matrices/` subdirectory).
+`<gibbscluster_output_dir>` is a GibbsCluster run folder (it must contain a `matrices/` subdirectory).
 
 **Outputs** land in `results/clust_result/`:
 
 | file | what it is |
 |------|------------|
-| `correlations.csv` | every cluster→allele match above the threshold (`hla` = display name, `formatted` = raw key, `correlation` = PCC) |
-| `mhc-tp-result.html` | standalone report — open it in any browser |
+| `correlations.csv` | every reported cluster→allele match (`hla` = display name, `formatted` = raw key, `correlation` = PCC) |
+| `mhc-tp-result.html` | standalone interactive report — open it in any browser |
 
-### Common options
+### Options
 
 | flag | meaning | default |
 |------|---------|---------|
 | `-s, --species` | `human` or `mouse` | `human` |
+| `-c, --class` | restrict the reference to MHC class `I`, `II`, or `all` | `all` |
 | `-r, --reference` | path to a `<species>.parquet` (otherwise the fetched one is used) | auto |
-| `-t, --threshold` | minimum Pearson correlation to report | `0.70` |
+| `-t, --threshold` | minimum Pearson correlation (PCC) to report | `0.70` |
+| `--topNHits` | maximum number of allele matches to keep per cluster | `3` |
+| `--always-top-n` | keep each cluster's top-N even below threshold (flagged in the report) | off |
 | `-o, --output` | output directory | `output` |
 | `--threads` | max CPU threads (also `$MHC_TP_THREADS`) | `4` |
 | `--no-html` | write only the CSV | off |
@@ -64,43 +99,67 @@ mhc-tp search <gibbscluster_output_dir> -s human -o results/
 
 Run `mhc-tp search --help` for the full list.
 
+### Examples
+
+```bash
+# Class I only, keep the top 5 matches per cluster
+mhc-tp search runs/sampleA -s human -c I --topNHits 5 -o results/
+
+# Guarantee a top-3 for every cluster (weak matches tagged "below cutoff")
+mhc-tp search runs/sampleA -s human --always-top-n -o results/
+```
+
+> [!IMPORTANT]
+> By default a match must score at least `--threshold`, so a cluster can return fewer than
+> `--topNHits` rows (or none). `--always-top-n` returns the best N regardless; the
+> threshold then only **annotates** confidence and nothing is dropped.
+
 ---
 
 ## For contributors / developers
 
-The project uses [pixi](https://pixi.sh) for a reproducible dev environment
-(Python 3.11) and a `src/` layout packaged with hatchling.
+<details>
+<summary>Dev environment, tests, and docs (click to expand)</summary>
+
+The project uses [pixi](https://pixi.sh) for a reproducible dev environment (Python 3.11)
+and a `src/` layout packaged with hatchling.
 
 ```bash
 git clone https://github.com/PurcellLab/MHC-TP.git
 cd MHC-TP
 pixi install            # create the dev env from pixi.lock
-pixi run dev-install    # editable-install the package into the env (run once)
+pixi run dev-install    # editable-install the package (run once)
 
 pixi run test           # pytest
 pixi run lint           # ruff
 pixi run fmt            # black
 ```
 
-Always run via `pixi run …` — a bare `python` may pick up a different
-interpreter without the pinned dependencies.
+> [!WARNING]
+> Always run via `pixi run …` — a bare `python` may pick up a different interpreter
+> without the pinned dependencies. CI enforces `black --check`, so run `pixi run fmt` before pushing.
+
+### Preview the docs site
+
+```bash
+pip install -e ".[docs]"
+mkdocs serve            # live preview at http://127.0.0.1:8000
+mkdocs build            # static site in ./site
+```
 
 ### Rebuilding the reference data (dev only)
 
 End users never do this. The per-species parquets are built once from the
 NetMHCpan / NetMHCIIpan packs and uploaded to the release. Embedding the
-Seq2Logo reference logos (`--with-logos`) needs a separate Python 2.7 env and
-is slow — run it on a cluster:
+Seq2Logo reference logos (`--with-logos`) needs a separate Python 2.7 env and is slow:
 
 ```bash
-mhc-tp build-ref <species> <classI_pack> <classII_pack> <out.parquet> \
-    --with-logos --workers 16
-# Seq2Logo itself runs in its own env:  pixi run -e seq2logo ...
+mhc-tp build-ref <species> <classI_pack> <classII_pack> <out.parquet> --with-logos --workers 16
 ```
 
 ### Layout
 
-```
+```text
 src/mhc_tp/
   cli.py            entry point (mhc-tp)
   engine/           numba correlation search
@@ -109,8 +168,21 @@ src/mhc_tp/
   db/               DEV-ONLY reference-pack ingestion
   tui/              Rich console banner, logging, results table
 tests/              pytest suite
+docs/               MkDocs site
 ```
 
+</details>
+
+---
+
+## How it works
+
+For each GibbsCluster motif, every reference allotype motif is scored by the **Pearson
+correlation** (PCC) between the two flattened position-weight matrices, computed only over
+the informative cells of the cluster motif. Within each cluster, allotypes are ranked by PCC
+(`1.0` = identical motif shape). For the full method and formula, see
+the **[API reference](https://purcelllab.github.io/MHC-TP/api/)**.
+
 ---
 
 ## Citation
@@ -118,11 +190,14 @@ tests/              pytest suite
 If you use MHC-TP in your work, please cite:
 
 > Munday PR, Krishna SSG, Fehring J, Croft NP, Purcell AW, Li C, Braun A.
-> Immunolyser 2.0: An advanced computational pipeline for comprehensive analysis
-> of immunopeptidomic data. *Comput Struct Biotechnol J.* 2025;29:296–304.
+> *Immunolyser 2.0: An advanced computational pipeline for comprehensive analysis of
+> immunopeptidomic data.* Comput Struct Biotechnol J. 2025;29:296–304.
 > doi:[10.1016/j.csbj.2025.10.007](https://doi.org/10.1016/j.csbj.2025.10.007).
 > PMID: [41209766](https://pubmed.ncbi.nlm.nih.gov/41209766/); PMCID: PMC12590289.
 
+<details>
+<summary>BibTeX</summary>
+
 ```bibtex
 @article{Munday2025Immunolyser2,
   title   = {Immunolyser 2.0: An advanced computational pipeline for comprehensive analysis of immunopeptidomic data},
@@ -137,3 +212,4 @@ If you use MHC-TP in your work, please cite:
 }
 ```
 
+</details>