Goal: Creation of an objective function for the RNA folding problem
This project develops a scoring function to evaluate predicted RNA tertiary structures based on interatomic distance distributions.
Supervised by: Professor Guillaume Postic
Team: Joelle Assy, Yazid Hoblos, Denys Buryi, Raul Duran De Alba, Rayane Adam
rna-score provides tools to download, process, and score RNA 3D structures using statistical models of interatomic distances. It is available as a Python CLI tool, as a library package, and through a web interface:
Try it online: https://rna-score.onrender.com/
Install via pip:
# -- under development
pip install git+https://github.com/raysas/structural-RNA-project.git
A more stable release will be available soon; for now, installing in editable mode is recommended:
git clone https://github.com/raysas/structural-RNA-project.git
cd structural-RNA-project
pip install -r requirements.txt
pip install -e .
Documentation: structural-rna-project.readthedocs.io
Issues & feedback: Please report bugs or suggestions via the GitHub Issues page.
| Component | Description | CLI Option | Details |
|---|---|---|---|
| Input Source | Select the structure(s) to process, either remote or local. | --pdb, --list, --folder | Choose one: a PDB ID, a local file, a list file (<ID> [CHAIN ...]), or a directory of structures. |
| Input Format | Specify the format used for parsing the structure. | --format {pdb, mmcif} | Default: pdb. Automatically detected for local files. |
| Atomic Selection | Choose how the structure is represented for distance calculations. | --atom-mode | Options: "C3'" (default), centroid, all, or multiple atom names (e.g., "P" "C4"). |
| Interaction Mode | Determine whether distances are measured within or between chains. | --dist-mode {intra, inter} | intra (default): within one chain. inter: between distinct chains. |
| Sequence Separation | Minimum offset for intra-chain contacts. | --seq-sep SEQ_SEP | Default: 4 residues. Ignored in inter mode. With the default, distances are considered between residue i and residues i+4 onward. |
| Distance Cutoff | Maximum atom-atom distance (Å) counted as a contact. | --cutoff CUTOFF | Default: 20.0 Å. |
| Output Type | Determines the type of distance distribution produced. | --method {histogram, kde} | histogram (default): binned counts. kde: raw distances for kernel density estimation. |
| Parallelization | Control how many CPU cores to use. | --cores CORES | Default: all available cores. |
| NMR Models | Whether to process all models in NMR structures. | --all-models | Default: only the first model is used. |
| Detailed Log | Save a CSV file with full information for every measured distance. | --save-detailed | Saves: PDB, Model, Chain IDs, Residue IDs, Atom Names, B-factors, AltLocs, Distance, and Pair Type. |
| Output Directory | Location where results will be written. | --out-dir OUT_DIR | Default: dist_data/. |
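To illustrate how the sequence-separation and cutoff options interact in intra mode, here is a minimal sketch in plain Python. It is not rna-score's actual implementation: the function name `intra_chain_histogram`, the input layout (one representative atom per residue), the 1 Å bin width, and the choice to treat the cutoff as inclusive are all assumptions made for the example.

```python
import math
from itertools import combinations


def intra_chain_histogram(coords, seq_sep=4, cutoff=20.0, bin_width=1.0):
    """Bin pairwise distances between residues at least `seq_sep` apart.

    coords: list of (x, y, z) tuples, one representative atom per residue,
    in sequence order (hypothetical input layout).
    """
    n_bins = math.ceil(cutoff / bin_width)
    counts = [0] * n_bins
    for i, j in combinations(range(len(coords)), 2):
        if j - i < seq_sep:
            continue  # residues too close along the sequence
        d = math.dist(coords[i], coords[j])
        if d > cutoff:
            continue  # beyond the contact cutoff
        # Clamp so a distance exactly equal to the cutoff lands in the last bin.
        counts[min(int(d // bin_width), n_bins - 1)] += 1
    return counts


# Toy chain: six residues spaced 5 Å apart along the x axis.
toy_coords = [(5.0 * k, 0.0, 0.0) for k in range(6)]
toy_counts = intra_chain_histogram(toy_coords)
```

In the toy chain, only the pairs (0, 4) and (1, 5) pass both filters: pair (0, 5) satisfies the sequence separation but lies 25 Å apart, beyond the cutoff.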
All constants are exposed as CLI arguments and API parameters:
- Distance cutoff, sequence separation, bin width, max score, pseudocount
Beyond C3': all atoms, centroid, or custom atom selections (e.g., P, C4', O3')
Kernel Density Estimation using R's density() function via rpy2, with SciPy fallback
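For readers unfamiliar with kernel density estimation, the idea can be sketched in dependency-free Python. This is only an illustration, not the package's code path (which, per the line above, goes through R's density() or SciPy). The function names `silverman_bandwidth` and `gaussian_kde` and the toy distances are hypothetical. The bandwidth uses the same rule of thumb as R's default (`bw.nrd0`), but the estimate is computed directly rather than with R's binned approximation, so values will not match R exactly.

```python
import math
import statistics


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(samples)
    sd = statistics.stdev(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    # Fall back to the standard deviation when the IQR is zero.
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-0.2)


def gaussian_kde(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(samples)
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
        for x in grid
    ]


# Toy distances in Å (made up for the example).
toy_distances = [4.0, 5.0, 5.5, 6.0, 6.5, 7.0, 9.0, 12.0]
grid = [0.5 * k for k in range(41)]  # 0.0 to 20.0 Å in 0.5 Å steps
density = gaussian_kde(toy_distances, grid)
```

Unlike a histogram, the resulting curve does not depend on bin edges, which is one reason to prefer KDE when the number of observed distances is small.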
Choose between:
- log (default): -log(f_obs / f_ref), Sippl's statistical potential
- inverse: f_ref / f_obs, inverse frequency ratio
- info-gain: -(f_obs - f_ref) / f_ref, information gain (Postic et al., 2020)
- ratio: f_obs / f_ref, direct frequency ratio
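The four formulas above can be written out as a small Python function to make the differences concrete. This is a sketch under stated assumptions, not the package's API: the function name `pseudo_energy` is hypothetical, and both frequencies are assumed to be strictly positive already (for example after a pseudocount has been applied). No max-score capping is applied here.

```python
import math


def pseudo_energy(f_obs, f_ref, formula="log"):
    """Score one distance bin from observed and reference frequencies.

    Hypothetical helper; assumes f_obs > 0 and f_ref > 0.
    """
    if f_obs <= 0 or f_ref <= 0:
        raise ValueError("frequencies must be strictly positive")
    if formula == "log":
        return -math.log(f_obs / f_ref)
    if formula == "inverse":
        return f_ref / f_obs
    if formula == "info-gain":
        return -(f_obs - f_ref) / f_ref
    if formula == "ratio":
        return f_obs / f_ref
    raise ValueError(f"unknown formula: {formula!r}")


# A bin observed twice as often as the reference predicts.
example_scores = {
    name: pseudo_energy(0.5, 0.25, name)
    for name in ("log", "inverse", "info-gain", "ratio")
}
```

For a bin that is over-represented relative to the reference (f_obs > f_ref), log and info-gain give negative values and inverse gives a value below 1, while ratio gives a value above 1, so its ordering runs the opposite way.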
Example:
# Use information gain for model quality assessment
rna-score train --input-dir distances --scoring-formula info-gain --output-dir tables_infogain
See documentation for detailed usage and API reference.
- src/: Main Python package containing:
  - rna_score/cli.py: Command-line interface entry point
  - access_rna_structures.py: Downloading RNA structures
  - extract_distances.py: Distance extraction from structures
  - kde_training.py, train.py: Training scoring tables (histogram/KDE)
  - score_structures.py: Scoring new structures
  - plot_distributions.py, plot_scores.py: Visualization utilities
  - utils/: Helper functions (e.g., structure I/O, validation)
- tests/: Unit and integration tests
- requirements.txt: Python dependencies
- setup.py: Installation script
Install (editable mode for development):
pip install -r requirements.txt
pip install -e .
rna-score access -n 50 --rna-only -f cif -o data/rna_structures --workers 4
Add --validate to filter out invalid downloaded files.
rna-score extract --folder rna_structures/mmcif --format mmcif --out-dir dist_data
rna-score train --input-dir dist_data --output-dir training_output --method histogram
rna-score score --folder rna_structures/mmcif --tables training_output --format mmcif --output scores.csv
rna-score plot --input-dir training_output --output-dir plots --combined
# add pdb ids and chains for scoring
cat <<EOF > tests/scoring_list.txt
1EHZ A
1Y26 B C
EOF
rna-score workflow --train-folder data/rna_structures/mmcif --score-list tests/scoring_list.txt --output-dir tests/workflow_output --format mmcif --method histogram
This runs extraction, training, scoring, and plotting in a single step. See rna-score workflow --help for all options.
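The list file used above follows the `<ID> [CHAIN ...]` layout: a structure ID followed by zero or more chain IDs. A minimal Python sketch of reading such a file is shown below. The helper `parse_list_line` is hypothetical and is not the parser rna-score uses; skipping blank lines and `#` comments is an assumption, as is leaving the meaning of an empty chain list to the caller.

```python
def parse_list_line(line):
    """Split one '<ID> [CHAIN ...]' line into (structure_id, chains).

    Returns None for blank lines and '#' comments (assumed convention).
    """
    fields = line.split()
    if not fields or fields[0].startswith("#"):
        return None
    return fields[0], fields[1:]


list_text = """\
1EHZ A
1Y26 B C
"""

entries = [
    entry
    for entry in map(parse_list_line, list_text.splitlines())
    if entry is not None
]
# entries == [("1EHZ", ["A"]), ("1Y26", ["B", "C"])]
```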
Each subcommand supports --help / -h for details.
You can also use rna-score directly in your browser:
https://rna-score.onrender.com/