SpinQTech/PepHLA_Interaction

PepHLA_Interaction

This project predicts peptide-protein (HLA-A) interactions.
The repository already includes pretrained checkpoints and precomputed features, so you can run inference directly.

You can:

  • run demo.py for a single-sample demonstration, or
  • run evaluate/test.py for evaluation on the provided dataset.

1. Project Structure

PepHLA_Interaction/
├── demo.py                         # Single-sample inference demo
├── README.md
├── requirements.txt
│
├── ckpts/                          # Pretrained model checkpoints
│   ├── HQNN_AUC_0.8530_AUPR_0.6382.pt
│   ├── CNN_AUC_0.8260_AUPR_0.5608.pt
│   └── CAMP_AUC_0.8653_AUPR_0.6429.pt
│
├── dataset/
│   └── HLA.tsv                     # Main input table for inference/evaluation
│
├── evaluate/
│   └── test.py                     # Batch evaluation script (ACC/AUC/AUPR/confusion matrix)
│
├── models/
│   ├── hqnn.py                     # Default model used by demo/test
│   ├── cnn.py
│   └── camp.py
│
└── data_prepare/
    ├── features_HLA/               # Precomputed feature dictionaries (required for inference)
    │   ├── peptide_seq_feature_dict
    │   ├── peptide_ss_feature_dict
    │   ├── peptide_dis_feature_dict
    │   ├── peptide_phys_feature_dict
    │   ├── protein_seq_feature_dict
    │   ├── protein_ss_feature_dict
    │   ├── protein_dis_pssm_feature_dict
    │   ├── protein_phys_feature_dict
    │   ├── pad_pep_len.npy
    │   └── pad_prot_len.npy
    ├── step1_IUPred2A/             # Disorder prediction intermediate outputs
    ├── step2_PSSM/                 # PSSM intermediate outputs
    └── step3_SSPro/                # Secondary-structure intermediate outputs

Data Mapping Overview

  • Each row in dataset/HLA.tsv represents one protein + peptide pair.
  • Column order:
    • prot_seq
    • pep_seq
    • label
    • pep_concat_seq
    • prot_concat_seq
  • demo.py and evaluate/test.py use these fields as keys to retrieve entries from data_prepare/features_HLA.
  • As long as your sample strings exactly match dictionary keys, inference works directly.
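The key-matching requirement above can be checked before running inference. The following is a minimal sketch using only the standard library, not the repository's own loader. It assumes the TSV has no header row (skip the first line if yours has one), and `read_pairs`, `missing_keys`, and the toy dictionary are hypothetical names introduced for illustration. Which TSV field keys which dictionary is also an assumption here; adjust `field` to match your data.

```python
import csv
import io

# Column order described above; the TSV is assumed to have no header row.
COLUMNS = ["prot_seq", "pep_seq", "label", "pep_concat_seq", "prot_concat_seq"]


def read_pairs(handle):
    """Yield one dict per TSV row, using the documented column order."""
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        yield dict(zip(COLUMNS, row))


def missing_keys(pairs, feature_dict, field):
    """Return the values of `field` that have no entry in `feature_dict`."""
    return sorted({p[field] for p in pairs if p[field] not in feature_dict})


# Toy data standing in for dataset/HLA.tsv and one feature dictionary.
toy_tsv = "PROTA\tPEPA\t1\tPEPA\tPROTA\nPROTA\tPEPB\t0\tPEPB\tPROTA\n"
toy_peptide_features = {"PEPA": [0.1, 0.2]}  # hypothetical contents

pairs = list(read_pairs(io.StringIO(toy_tsv)))
print(missing_keys(pairs, toy_peptide_features, "pep_seq"))  # ['PEPB']
```

An empty list means every sample string in that field has a matching dictionary key.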

2. Environment Setup

Python 3.10+ is recommended.

2.1 Create a Conda Environment (Recommended)

conda create -n pepHLA python=3.10 -y
conda activate pepHLA

2.2 Install Dependencies

pip install -r requirements.txt

Current dependencies:

  • torch==2.8.0
  • pennylane==0.42.3
  • pandas==2.3.3
  • numpy==1.26.4
  • scikit-learn==1.7.2
  • tqdm
  • matplotlib
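To confirm that an environment matches the pinned versions listed above, a small check along these lines may help. It is a sketch rather than part of the repository: `check_pins` is a hypothetical helper, and it ignores local build suffixes such as the `+cu...` tag that some torch builds append to their version string.

```python
from importlib import metadata

# Pins copied from the dependency list above (tqdm and matplotlib are unpinned).
PINNED = {
    "torch": "2.8.0",
    "pennylane": "0.42.3",
    "pandas": "2.3.3",
    "numpy": "1.26.4",
    "scikit-learn": "1.7.2",
}


def check_pins(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for each mismatch or missing package."""
    problems = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Compare only the public version, dropping any "+local" suffix.
        if found is None or found.split("+")[0] != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

No output means all pinned packages are installed at the expected versions.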

3. Quick Start: demo.py

demo.py is the reader-friendly entry point. It:

  • loads one sample from HLA.tsv,
  • fetches all 8 required feature groups from features_HLA,
  • loads a checkpoint,
  • prints predicted probability and binary class.

3.1 Run with Defaults

python demo.py

Default values:

  • checkpoint: ./ckpts/HQNN_AUC_0.8530_AUPR_0.6382.pt
  • feature directory: ./data_prepare/features_HLA
  • dataset file: ./dataset/HLA.tsv
  • sample index: 0

3.2 Select a Different Sample

python demo.py --sample-index 10

3.3 Override Paths

python demo.py \
  --dataset ./dataset/HLA.tsv \
  --feature-dir ./data_prepare/features_HLA \
  --checkpoint ./ckpts/HQNN_AUC_0.8530_AUPR_0.6382.pt
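For readers who want to wrap or script the demo, the flags and defaults documented in this section could be declared roughly as follows. This is an illustrative sketch built from the options listed above, not a copy of demo.py; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser with the flags and defaults listed in this section."""
    parser = argparse.ArgumentParser(description="Single-sample inference demo (sketch)")
    parser.add_argument("--dataset", default="./dataset/HLA.tsv")
    parser.add_argument("--feature-dir", default="./data_prepare/features_HLA")
    parser.add_argument("--checkpoint", default="./ckpts/HQNN_AUC_0.8530_AUPR_0.6382.pt")
    parser.add_argument("--sample-index", type=int, default=0)
    return parser


# Equivalent to: python demo.py --sample-index 10
args = build_parser().parse_args(["--sample-index", "10"])
print(args.sample_index, args.feature_dir)  # 10 ./data_prepare/features_HLA
```

argparse converts the dashes in flag names to underscores, so `--sample-index` is read back as `args.sample_index`.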

4. Validation Script: evaluate/test.py

evaluate/test.py performs batch evaluation and reports:

  • Accuracy / Precision / Recall
  • AUC / AUPR
  • Optional confusion matrix (printed, with an optional image file)

Run from project root:

python evaluate/test.py --split test

Options:

  • --split {train,test,all} (default: test)
  • --cm to print the confusion matrix and attempt to save a figure
  • --cm-out to set the confusion matrix image path

Example:

python evaluate/test.py --split test --cm --cm-out ./evaluate/confusion_matrix.png

Note: test.py first tries to read dataset/HLA_split.pkl.
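To make the reported metrics concrete, here is a dependency-free sketch of how they relate to predicted probabilities. It is not the code in evaluate/test.py: the 0.5 decision threshold is an assumption, the function names are hypothetical, and AUPR is left out for brevity. In practice, scikit-learn's `roc_auc_score` and `average_precision_score` compute AUC and AUPR directly.

```python
def confusion_counts(labels, probs, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        pred = 1 if p >= threshold else 0  # threshold is an assumed default
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summary_metrics(labels, probs, threshold=0.5):
    """Accuracy, precision, and recall derived from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(labels, probs, threshold)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def roc_auc(labels, probs):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]
probs = [0.9, 0.4, 0.3, 0.2]
print(confusion_counts(labels, probs))  # (1, 0, 2, 1)
print(summary_metrics(labels, probs))   # accuracy 0.75, precision 1.0, recall 0.5
print(roc_auc(labels, probs))           # 0.75
```

Accuracy, precision, and recall depend on the chosen threshold, whereas AUC depends only on how the scores rank positives against negatives.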

5. New Data Processing Reference

This repository already provides ready-to-use HLA features in features_HLA.
For new data, follow the same processing logic and key format.

5.1 Recommended Workflow

  1. Prepare a TSV table with the same schema as dataset/HLA.tsv.
  2. Generate peptide/protein intermediate features.
  3. Convert intermediate outputs into dictionaries with keys matching your TSV fields.
  4. Run demo.py first for a single-sample sanity check, then run batch evaluation.

5.2 External Toolchain

5.3 Intermediate File Format Examples

  • data_prepare/step1_IUPred2A/*.txt: residue-level disorder scores
  • data_prepare/step2_PSSM/prot1.pssm: PSSM matrix example
  • data_prepare/step3_SSPro/pep_prot.fasta + pep_prot.ss: sequence/secondary-structure paired examples
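As an illustration of step 3 in the workflow (turning an intermediate output into a sequence-keyed dictionary), the sketch below parses residue-level disorder scores. The line layout (position, residue, score, with `#` comment lines) is an assumption and may not match the files in data_prepare/step1_IUPred2A exactly; `parse_disorder` is a hypothetical helper, and the final dictionary layout expected by the models may differ.

```python
import io


def parse_disorder(handle):
    """Return (sequence, scores) from lines of 'position residue score'."""
    residues, scores = [], []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        residues.append(fields[1])
        scores.append(float(fields[2]))
    return "".join(residues), scores


# Toy input standing in for one per-sequence disorder file (assumed layout).
toy = "# POS RES SCORE\n1 M 0.12\n2 K 0.40\n3 V 0.05\n"
seq, scores = parse_disorder(io.StringIO(toy))

# Key the scores by the sequence string so the lookup matches the TSV field.
disorder_dict = {seq: scores}
print(disorder_dict)  # {'MKV': [0.12, 0.4, 0.05]}
```

Keying by the reconstructed sequence keeps the dictionary consistent with the requirement that sample strings match dictionary keys exactly.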
