This project predicts peptide-protein (HLA-A) interactions.
The repository already includes pretrained checkpoints and precomputed features, so you can run inference directly.
You can:
- run demo.py for a single-sample demonstration, or
- run evaluate/test.py for evaluation on the provided dataset.
PepHLA_Interaction/
├── demo.py # Single-sample inference demo
├── README.md
├── requirements.txt
│
├── ckpts/ # Pretrained model checkpoints
│ ├── HQNN_AUC_0.8530_AUPR_0.6382.pt
│ ├── CNN_AUC_0.8260_AUPR_0.5608.pt
│ └── CAMP_AUC_0.8653_AUPR_0.6429.pt
│
├── dataset/
│ └── HLA.tsv # Main input table for inference/evaluation
│
├── evaluate/
│ └── test.py # Batch evaluation script (ACC/AUC/AUPR/confusion matrix)
│
├── models/
│ ├── hqnn.py # Default model used by demo/test
│ ├── cnn.py
│ └── camp.py
│
└── data_prepare/
├── features_HLA/ # Precomputed feature dictionaries (required for inference)
│ ├── peptide_seq_feature_dict
│ ├── peptide_ss_feature_dict
│ ├── peptide_dis_feature_dict
│ ├── peptide_phys_feature_dict
│ ├── protein_seq_feature_dict
│ ├── protein_ss_feature_dict
│ ├── protein_dis_pssm_feature_dict
│ ├── protein_phys_feature_dict
│ ├── pad_pep_len.npy
│ └── pad_prot_len.npy
├── step1_IUPred2A/ # Disorder prediction intermediate outputs
├── step2_PSSM/ # PSSM intermediate outputs
└── step3_SSPro/ # Secondary-structure intermediate outputs
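Inference fails if any of the ten files under data_prepare/features_HLA is missing. A minimal pre-flight check is sketched below. It relies only on the file names shown in the tree, and the helper name missing_feature_files is hypothetical, not part of the repository.

```python
from pathlib import Path

# File names copied from the features_HLA listing in the tree.
REQUIRED_FEATURE_FILES = [
    "peptide_seq_feature_dict",
    "peptide_ss_feature_dict",
    "peptide_dis_feature_dict",
    "peptide_phys_feature_dict",
    "protein_seq_feature_dict",
    "protein_ss_feature_dict",
    "protein_dis_pssm_feature_dict",
    "protein_phys_feature_dict",
    "pad_pep_len.npy",
    "pad_prot_len.npy",
]


def missing_feature_files(feature_dir="./data_prepare/features_HLA"):
    """Return the required feature files absent from feature_dir (hypothetical helper)."""
    root = Path(feature_dir)
    return [name for name in REQUIRED_FEATURE_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_feature_files()
    if missing:
        print("Missing feature files:", ", ".join(missing))
    else:
        print("All feature files found.")
```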
- Each row in dataset/HLA.tsv represents one protein + peptide pair.
- Column order: prot_seq, pep_seq, label, pep_concat_seq, prot_concat_seq
- demo.py and evaluate/test.py use these fields as keys to retrieve entries from data_prepare/features_HLA.
- As long as your sample strings exactly match the dictionary keys, inference works directly.
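The key lookup can be sketched as below. Several details here are assumptions, not stated in this README: that HLA.tsv is tab-separated with no header row, that the feature dictionaries are pickled dict objects, and that peptide dictionaries are keyed by pep_seq and protein dictionaries by prot_seq. Adjust key_column to whatever the real scripts use. All function names are hypothetical.

```python
import pickle

import pandas as pd

# Column order as documented for dataset/HLA.tsv.
COLUMNS = ["prot_seq", "pep_seq", "label", "pep_concat_seq", "prot_concat_seq"]


def load_pairs(path_or_buffer):
    """Read the table, assuming tab separation and no header row."""
    return pd.read_csv(path_or_buffer, sep="\t", header=None, names=COLUMNS)


def load_feature_dict(path):
    """Load one feature dictionary, assuming it was saved with pickle."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def missing_keys(df, feature_dict, key_column):
    """Return values of key_column that have no entry in feature_dict."""
    return sorted(set(df[key_column]) - set(feature_dict))
```

For example, missing_keys(load_pairs("./dataset/HLA.tsv"), load_feature_dict("./data_prepare/features_HLA/peptide_seq_feature_dict"), "pep_seq") should return an empty list when every peptide in the table has a feature entry under these assumptions.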
Python 3.10+ is recommended.
conda create -n pepHLA python=3.10 -y
conda activate pepHLA
pip install -r requirements.txt
Current dependencies:
- torch==2.8.0
- pennylane==0.42.3
- pandas==2.3.3
- numpy==1.26.4
- scikit-learn==1.7.2
- tqdm
- matplotlib
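To confirm that an existing environment matches the pins above, the standard-library importlib.metadata module is enough. This is a sketch: the PINS mapping simply mirrors the dependency list and is not read from requirements.txt, and check_versions is a hypothetical helper.

```python
from importlib import metadata

# Pinned versions copied from the dependency list; None means "any version".
PINS = {
    "torch": "2.8.0",
    "pennylane": "0.42.3",
    "pandas": "2.3.3",
    "numpy": "1.26.4",
    "scikit-learn": "1.7.2",
    "tqdm": None,
    "matplotlib": None,
}


def check_versions(pins=PINS):
    """Return {package: problem} for packages that are missing or off-pin."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        installed = installed.split("+", 1)[0]  # drop local tags such as "+cu128"
        if wanted is not None and installed != wanted:
            problems[name] = f"installed {installed}, expected {wanted}"
    return problems


if __name__ == "__main__":
    for pkg, issue in check_versions().items():
        print(f"{pkg}: {issue}")
```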
demo.py is the simplest entry point. It:
- loads one sample from HLA.tsv,
- fetches all 8 required feature groups from features_HLA,
- loads a checkpoint,
- prints the predicted probability and the binary class.
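The feature-gathering and decision steps can be sketched as follows. The eight group names come from the features_HLA listing; keying peptide groups by the peptide string and protein groups by the protein string, and the 0.5 decision threshold, are illustrative assumptions rather than confirmed details of demo.py. Both helpers are hypothetical.

```python
PEPTIDE_GROUPS = ["peptide_seq", "peptide_ss", "peptide_dis", "peptide_phys"]
PROTEIN_GROUPS = ["protein_seq", "protein_ss", "protein_dis_pssm", "protein_phys"]


def gather_features(feature_dicts, pep_key, prot_key):
    """Collect the 8 feature groups for one sample (hypothetical helper).

    feature_dicts maps a group name such as "peptide_seq" to the dictionary
    loaded from "<group>_feature_dict".
    """
    features = {}
    for group in PEPTIDE_GROUPS:
        features[group] = feature_dicts[group][pep_key]
    for group in PROTEIN_GROUPS:
        features[group] = feature_dicts[group][prot_key]
    return features


def to_binary(probability, threshold=0.5):
    """Map a predicted probability to a 0/1 class; 0.5 is an assumed cut-off."""
    return int(probability >= threshold)
```

A KeyError raised by gather_features is the failure mode behind the requirement that sample strings exactly match the dictionary keys.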
python demo.py
Default values:
- checkpoint: ./ckpts/HQNN_AUC_0.8530_AUPR_0.6382.pt
- feature directory: ./data_prepare/features_HLA
- dataset file: ./dataset/HLA.tsv
- sample index: 0
To run another sample, or to set the paths explicitly:
python demo.py --sample-index 10
python demo.py \
  --dataset ./dataset/HLA.tsv \
  --feature-dir ./data_prepare/features_HLA \
  --checkpoint ./ckpts/HQNN_AUC_0.8530_AUPR_0.6382.pt
evaluate/test.py performs batch evaluation and reports:
- Accuracy / Precision / Recall
- AUC / AUPR
- Optional confusion matrix (printed, with optional image output)
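For reference, these metrics can be computed from labels and predicted probabilities with scikit-learn, which is already a dependency. This sketch is not the code in test.py: it assumes a 0.5 threshold and uses average_precision_score for AUPR, whereas test.py may use a different threshold or integrate the precision-recall curve differently. binary_metrics is a hypothetical name.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Compute ACC/Precision/Recall/AUC/AUPR and the confusion matrix."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    y_pred = (y_prob >= threshold).astype(int)  # threshold is an assumption
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_prob),  # threshold-free, uses raw scores
        "aupr": average_precision_score(y_true, y_prob),
        # Rows are true classes (0, 1); columns are predicted classes (0, 1).
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]),
    }
```

For example, binary_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]) gives accuracy 0.75, precision 1.0, recall 0.5, AUC 0.75, AUPR of about 0.833, and the confusion matrix [[2, 0], [1, 1]].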
Run from the project root:
python evaluate/test.py --split test
Options:
- --split {train,test,all}: which split to evaluate (default: test)
- --cm: print the confusion matrix and try to save it as a figure
- --cm-out: set the confusion matrix image path
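The documented options map directly onto argparse. The parser below only mirrors the flags listed above; it is not copied from test.py, and the None default for --cm-out is an assumption.

```python
import argparse


def build_parser():
    """Argument parser mirroring the documented evaluate/test.py options (sketch)."""
    parser = argparse.ArgumentParser(description="Batch evaluation (sketch)")
    parser.add_argument("--split", choices=["train", "test", "all"], default="test",
                        help="which split to evaluate")
    parser.add_argument("--cm", action="store_true",
                        help="print the confusion matrix and try to save a figure")
    parser.add_argument("--cm-out", default=None,
                        help="image path for the confusion matrix figure")
    return parser
```

Note that argparse exposes --cm-out as the attribute cm_out on the parsed namespace.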
Example:
python evaluate/test.py --split test --cm --cm-out ./evaluate/confusion_matrix.png
Note: test.py first tries to read dataset/HLA_split.pkl.
This repository already provides ready-to-use HLA features in features_HLA.
For new data, follow the same processing logic and key format.
- Prepare a TSV table with the same schema as dataset/HLA.tsv.
- Generate peptide/protein intermediate features.
- Convert the intermediate outputs into dictionaries whose keys match your TSV fields.
- Run demo.py first as a single-sample sanity check, then run batch evaluation.
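The conversion step can be sketched as below, under several assumptions not confirmed by this README: per-residue features are (length × dim) NumPy arrays, each sequence is zero-padded or truncated to a fixed length (the exact role of pad_pep_len.npy and pad_prot_len.npy is not documented here, so the target length is a parameter), and dictionaries are stored with pickle. All helper names are hypothetical.

```python
import pickle

import numpy as np


def pad_features(matrix, target_len):
    """Zero-pad (or truncate) a (length, dim) per-residue matrix to target_len rows."""
    matrix = np.asarray(matrix, dtype=np.float32)
    out = np.zeros((target_len, matrix.shape[1]), dtype=np.float32)
    n = min(target_len, matrix.shape[0])
    out[:n] = matrix[:n]
    return out


def build_feature_dict(per_sequence_features, target_len):
    """Map each sequence string (the lookup key) to its padded feature matrix."""
    return {seq: pad_features(feat, target_len)
            for seq, feat in per_sequence_features.items()}


def save_feature_dict(feature_dict, path):
    """Persist the dictionary with pickle (assumed storage format)."""
    with open(path, "wb") as fh:
        pickle.dump(feature_dict, fh)
```

Keying the dictionary by the exact sequence string is what lets the TSV rows find their entries later.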
- IUPred2A (disorder): https://iupred2a.elte.hu/
- BLAST+ (PSSM): https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
- SSPro (secondary structure): https://download.igb.uci.edu/#sspro
- data_prepare/step1_IUPred2A/*.txt: residue-level disorder scores
- data_prepare/step2_PSSM/prot1.pssm: example PSSM matrix
- data_prepare/step3_SSPro/pep_prot.fasta + pep_prot.ss: paired sequence/secondary-structure examples
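To turn the step1 outputs into per-residue disorder scores, a small parser is usually enough. This sketch assumes the usual IUPred2A text layout: comment lines start with "#", and each data line is "POS RES SCORE" (optionally followed by an ANCHOR score), separated by whitespace. Check this against the files in step1_IUPred2A before relying on it; parse_iupred2a is a hypothetical helper.

```python
def parse_iupred2a(lines):
    """Parse IUPred2A-style text output into (sequence, disorder scores).

    Assumes '#'-prefixed comment lines and whitespace-separated data lines
    of the form 'POS RES SCORE [ANCHOR]'.
    """
    residues, scores = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip header/comment lines and blanks
        fields = line.split()
        residues.append(fields[1])
        scores.append(float(fields[2]))
    return "".join(residues), scores
```

For instance, calling parse_iupred2a(path.read_text().splitlines()) for each path in sorted(Path("data_prepare/step1_IUPred2A").glob("*.txt")) (with Path from pathlib) would yield one sequence and score list per file.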