PepHLA_Interaction

This project predicts peptide-protein (HLA-A) interactions.
The repository already includes pretrained checkpoints and precomputed features, so you can run inference directly.

You can:

- run demo.py for a single-sample demonstration, or
- run evaluate/test.py for evaluation on the provided dataset.

1. Project Structure

PepHLA_Interaction/
├── demo.py                         # Single-sample inference demo
├── README.md
├── requirements.txt
│
├── ckpts/                          # Pretrained model checkpoints
│   ├── HQNN_AUC_0.8530_AUPR_0.6382.pt
│   ├── CNN_AUC_0.8260_AUPR_0.5608.pt
│   └── CAMP_AUC_0.8653_AUPR_0.6429.pt
│
├── dataset/
│   └── HLA.tsv                     # Main input table for inference/evaluation
│
├── evaluate/
│   └── test.py                     # Batch evaluation script (ACC/AUC/AUPR/confusion matrix)
│
├── models/
│   ├── hqnn.py                     # Default model used by demo/test
│   ├── cnn.py
│   └── camp.py
│
└── data_prepare/
    ├── features_HLA/               # Precomputed feature dictionaries (required for inference)
    │   ├── peptide_seq_feature_dict
    │   ├── peptide_ss_feature_dict
    │   ├── peptide_dis_feature_dict
    │   ├── peptide_phys_feature_dict
    │   ├── protein_seq_feature_dict
    │   ├── protein_ss_feature_dict
    │   ├── protein_dis_pssm_feature_dict
    │   ├── protein_phys_feature_dict
    │   ├── pad_pep_len.npy
    │   └── pad_prot_len.npy
    ├── step1_IUPred2A/             # Disorder prediction intermediate outputs
    ├── step2_PSSM/                 # PSSM intermediate outputs
    └── step3_SSPro/                # Secondary-structure intermediate outputs

Data Mapping Overview

Each row in dataset/HLA.tsv represents one protein-peptide pair.
Column order:
- prot_seq
- pep_seq
- label
- pep_concat_seq
- prot_concat_seq

demo.py and evaluate/test.py use these fields as keys to retrieve entries from data_prepare/features_HLA.
As long as your sample strings exactly match the dictionary keys, inference works directly.
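
The key-matching requirement above can be checked before running inference. The following is a minimal sketch, assuming HLA.tsv is tab-separated with no header row and that a feature dictionary behaves like a plain Python dict keyed by sequence string. Both points are assumptions, and the helper name `find_missing_keys` is hypothetical rather than part of the repository.

```python
import csv
import io

# Column order as documented for dataset/HLA.tsv.
COLUMNS = ["prot_seq", "pep_seq", "label", "pep_concat_seq", "prot_concat_seq"]


def find_missing_keys(tsv_text, feature_dict, column):
    """Return values of `column` that are absent from `feature_dict`.

    Hypothetical helper: assumes a header-less, tab-separated table.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=COLUMNS, delimiter="\t")
    missing = []
    for row in reader:
        key = row[column]
        if key not in feature_dict and key not in missing:
            missing.append(key)
    return missing


# Toy data standing in for HLA.tsv and one feature dictionary.
toy_tsv = "PROTA\tPEPA\t1\tPEPA\tPROTA\nPROTB\tPEPB\t0\tPEPB\tPROTB\n"
toy_features = {"PEPA": [0.1, 0.2, 0.3, 0.4]}

print(find_missing_keys(toy_tsv, toy_features, "pep_seq"))  # ['PEPB']
```

An empty result means every value in the chosen column has a matching dictionary entry.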

2. Environment Setup

Python 3.10+ is recommended.

2.1 Create a Conda Environment (Recommended)

conda create -n pepHLA python=3.10 -y
conda activate pepHLA

2.2 Install Dependencies

pip install -r requirements.txt

Current dependencies:

torch==2.8.0
pennylane==0.42.3
pandas==2.3.3
numpy==1.26.4
scikit-learn==1.7.2
tqdm
matplotlib

3. Quick Start: `demo.py`

demo.py is the reader-friendly entry point. It:

- loads one sample from HLA.tsv,
- fetches all 8 required feature groups from features_HLA,
- loads a checkpoint, and
- prints the predicted probability and binary class.

3.1 Run with Defaults

python demo.py

Default values:

- checkpoint: ./ckpts/HQNN_AUC_0.8530_AUPR_0.6382.pt
- feature directory: ./data_prepare/features_HLA
- dataset file: ./dataset/HLA.tsv
- sample index: 0

3.2 Select a Different Sample

python demo.py --sample-index 10

3.3 Override Paths

python demo.py \
  --dataset ./dataset/HLA.tsv \
  --feature-dir ./data_prepare/features_HLA \
  --checkpoint ./ckpts/HQNN_AUC_0.8530_AUPR_0.6382.pt
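
For readers who want to see how these flags and defaults fit together, here is a hedged sketch of an argument parser that mirrors the documented options. It is an illustration only: `build_parser` is a hypothetical name, and the actual parsing code in demo.py may be organized differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Single-sample inference demo (sketch)")
    parser.add_argument("--dataset", default="./dataset/HLA.tsv")
    parser.add_argument("--feature-dir", default="./data_prepare/features_HLA")
    parser.add_argument("--checkpoint", default="./ckpts/HQNN_AUC_0.8530_AUPR_0.6382.pt")
    parser.add_argument("--sample-index", type=int, default=0)
    return parser


# Equivalent to: python demo.py --sample-index 10
args = build_parser().parse_args(["--sample-index", "10"])
print(args.sample_index, args.feature_dir)
```

Any flag left off the command line falls back to its default, which is why `python demo.py` alone is enough for a first run.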

4. Validation Script: `evaluate/test.py`

evaluate/test.py performs batch evaluation and reports:

- Accuracy / Precision / Recall
- AUC / AUPR
- Confusion matrix (optional; printed, with optional image output)
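
To make the threshold-based metrics concrete, the sketch below derives confusion-matrix counts, accuracy, precision, and recall from labels and predicted probabilities. The 0.5 decision threshold and the function names are assumptions made for illustration; they are not taken from test.py.

```python
def confusion_counts(labels, probs, threshold=0.5):
    """Return (tp, fp, tn, fn) for binary labels and predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        pred = 1 if p >= threshold else 0  # assumed decision rule
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summarize(labels, probs, threshold=0.5):
    """Compute accuracy, precision, and recall from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(labels, probs, threshold)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels and scores, not real model output.
toy_labels = [1, 0, 1, 0, 1]
toy_probs = [0.9, 0.2, 0.4, 0.6, 0.8]
print(confusion_counts(toy_labels, toy_probs))
print(summarize(toy_labels, toy_probs))
```

AUC and AUPR do not depend on a threshold. With scikit-learn (listed in requirements.txt) they are commonly computed with `roc_auc_score` and `average_precision_score`, though this README does not say which functions test.py uses.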

Run from project root:

python evaluate/test.py --split test

Options:

- --split {train,test,all}: which subset to evaluate (default: test)
- --cm: print the confusion matrix and attempt to save a figure
- --cm-out: set the confusion matrix image path

Example:

python evaluate/test.py --split test --cm --cm-out ./evaluate/confusion_matrix.png

Note: test.py first tries to read dataset/HLA_split.pkl.

5. New Data Processing Reference

This repository already provides ready-to-use HLA features in features_HLA.
For new data, follow the same processing logic and key format.

5.1 Recommended Workflow

- Prepare a TSV table with the same schema as dataset/HLA.tsv.
- Generate peptide/protein intermediate features.
- Convert intermediate outputs into dictionaries with keys matching your TSV fields.
- Run demo.py first for a single-sample sanity check, then run batch evaluation.
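
The dictionary-conversion step can be sketched as follows. This is a rough illustration under several assumptions that this README does not confirm: per-residue features are right-padded with zeros to a fixed length, the dictionary key is the raw sequence string, and the names `pad_scores` and `build_feature_dict` are hypothetical. Check the shipped dictionaries in features_HLA for the actual layout before relying on it.

```python
def pad_scores(scores, pad_len, pad_value=0.0):
    """Truncate or right-pad a per-residue score list to exactly pad_len items."""
    trimmed = list(scores[:pad_len])
    return trimmed + [pad_value] * (pad_len - len(trimmed))


def build_feature_dict(records, pad_len):
    """Map each sequence string to its padded per-residue scores."""
    return {seq: pad_scores(scores, pad_len) for seq, scores in records}


# Toy records: (sequence, per-residue scores). Values are made up.
toy_records = [
    ("SLYNTVATL", [0.1] * 9),
    ("GILGFVFTL", [0.2] * 9),
]
toy_dict = build_feature_dict(toy_records, pad_len=12)
print(len(toy_dict["SLYNTVATL"]))  # 12
```

Keeping the key identical to the string stored in the TSV is what lets the lookup succeed later.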

5.2 External Toolchain

- IUPred2A (disorder): https://iupred2a.elte.hu/
- BLAST+ (PSSM): https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
- SSPro (secondary structure): https://download.igb.uci.edu/#sspro

5.3 Intermediate File Format Examples

- data_prepare/step1_IUPred2A/*.txt: residue-level disorder scores
- data_prepare/step2_PSSM/prot1.pssm: PSSM matrix example
- data_prepare/step3_SSPro/pep_prot.fasta + pep_prot.ss: sequence/secondary-structure paired examples
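
As an illustration of reading the residue-level disorder scores, here is a small parser sketch. It assumes the files keep the usual IUPred2A text layout: comment lines starting with `#`, then whitespace-separated position, residue, and score columns. The files in step1_IUPred2A may differ, and `parse_disorder_scores` is a hypothetical helper, not a function from this repository.

```python
def parse_disorder_scores(text):
    """Return (sequence, scores) from IUPred2A-style text output.

    Assumes columns: position, residue, disorder score (extra columns ignored).
    """
    residues = []
    scores = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed rows
        residues.append(fields[1])
        scores.append(float(fields[2]))
    return "".join(residues), scores


# Toy input in the assumed layout; the values are made up.
toy_output = """# POS\tRES\tIUPRED2
1\tM\t0.12
2\tK\t0.34
3\tV\t0.56
"""
sequence, disorder = parse_disorder_scores(toy_output)
print(sequence, disorder)
```

Returning the reconstructed sequence alongside the scores makes it easy to confirm that the file corresponds to the intended TSV entry.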
