R package for the Analysis of Ribosomal RNA Fragments generated by Ribosome Profiling (Ribo-Seq) experiments.
ARF links position-specific rRNA fragment abundances to the 3D structure of the ribosome to identify biological drivers of ribosomal heterogeneity and collision dynamics across conditions. It currently includes two methods:
- dripARF — Differential Ribosomal protein Incorporation Prediction: identifies which ribosomal proteins (RPs) show altered incorporation between conditions, nominating candidates that contribute to inter-sample ribosomal heterogeneity.
- dricARF — Differential RIbosome Collision prediction: extends dripARF by additionally integrating cryo-EM collision structural data (SAS, Col.Int., Rib.Col. sets) to simultaneously predict changes in ribosome collision abundance alongside RP heterogeneity.
Both methods use DESeq2 for normalisation, GSEA-based enrichment testing (RPSEA), and overrepresentation analysis (ORA), with enrichment scores z-scored against 99 circularly-shifted random controls to remove positional rRNA fragmentation bias. An additional ssGSEA-based layer (ssRPSEA) weights these scores per-sample using limma, producing weighted.RPSEA.NES_randZ in the output.
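The random-control z-scoring described above can be pictured with a toy example. The sketch below is not ARF's implementation: the per-position statistic, the set-level score (a plain mean standing in for a GSEA NES), and all object names are hypothetical. It only illustrates how shifting a position set circularly along the rRNA yields a background distribution for a z-score.

```r
# Toy illustration only; not ARF's actual code.
set.seed(1)

n_pos    <- 500                 # hypothetical rRNA length
pos_stat <- rnorm(n_pos)        # hypothetical per-position statistic
rp_set   <- 101:140             # hypothetical RP proximity set
pos_stat[rp_set] <- pos_stat[rp_set] + 1   # planted signal in the set

# Stand-in enrichment score: mean statistic over the set
# (ARF uses a GSEA-based NES instead).
set_score <- function(positions) mean(pos_stat[positions])

# Shift a set of 1-based positions along the rRNA, wrapping at the end.
circular_shift <- function(positions, offset, len) {
  ((positions - 1 + offset) %% len) + 1
}

# 99 random, non-zero circular shifts of the same set.
offsets <- sample(seq_len(n_pos - 1), 99)
random_scores <- vapply(
  offsets,
  function(k) set_score(circular_shift(rp_set, k, n_pos)),
  numeric(1)
)

# Z-score of the observed score against the shifted controls.
observed <- set_score(rp_set)
rand_z   <- (observed - mean(random_scores)) / sd(random_scores)
print(rand_z)
```

Because the shifted sets keep the size and contiguity of the original set, a score driven purely by position-independent fragmentation patterns would tend to look similar in the controls and give a z-score near zero.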
For detailed worked examples including non-model organisms, see:
docs/dricARF_use_cases_documentation.md
In an R (>= 4.0) environment:
install.packages("devtools")
devtools::install_github("fallerlab/ARF@main")
Please make sure the following packages are installed:
- DESeq2 (>= 1.30.1)
- SummarizedExperiment
- matrixStats
- clusterProfiler
- fgsea
- ComplexHeatmap
- grid
- ggplot2
- ggrepel
- scales
- reshape2
- bio3d
- Biostrings
- msa
- cowplot
- dplyr
- magrittr
- readr
- RColorBrewer
- circlize
- wesanderson
- GSVA
- limma
- tidyr
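A quick way to see which of the packages listed above are still missing is sketched below. It uses only base R; the variable names are illustrative and not part of ARF.

```r
# Packages listed in this README as ARF dependencies.
required_pkgs <- c(
  "DESeq2", "SummarizedExperiment", "matrixStats", "clusterProfiler",
  "fgsea", "ComplexHeatmap", "grid", "ggplot2", "ggrepel", "scales",
  "reshape2", "bio3d", "Biostrings", "msa", "cowplot", "dplyr",
  "magrittr", "readr", "RColorBrewer", "circlize", "wesanderson",
  "GSVA", "limma", "tidyr"
)

# TRUE for every package whose namespace can be loaded.
is_installed <- vapply(required_pkgs, requireNamespace,
                       logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All listed packages are installed.")
}

# DESeq2 additionally needs to be at least version 1.30.1.
if (is_installed[["DESeq2"]] &&
    utils::packageVersion("DESeq2") < "1.30.1") {
  message("DESeq2 is installed but older than 1.30.1.")
}
```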
install.packages('renv', dependencies = TRUE)
## Initiate renv to manage the R environment
renv::init()
## Install CRAN packages
install.packages(
  c('remotes', 'curl', 'ggrepel', 'reshape2', 'scales',
    'dplyr', 'magrittr', 'readr', 'RColorBrewer',
    'circlize', 'wesanderson', 'bio3d', 'cowplot', 'tidyr',
    'matrixStats'),  # matrixStats is distributed on CRAN, not Bioconductor
  dependencies = TRUE)
## Install Bioconductor packages
remotes::install_bioc(
  c('DESeq2', 'SummarizedExperiment',
    'clusterProfiler', 'fgsea', 'ComplexHeatmap',
    'msa', 'Biostrings', 'GSVA', 'limma'),
  dependencies = TRUE)
renv::install("fallerlab/ARF@main")
Trim adapters from your Ribo-seq reads, then align them to the rRNA FASTA files provided in the rRNAs/ folder:
| Organism | rRNA FASTA file |
|---|---|
| Human (H. sapiens) | rRNAs/4V6X_human_rRNAs.fa |
| Mouse (M. musculus) | rRNAs/mouse_rRNAs.fa |
| Yeast (S. cerevisiae) | rRNAs/6T7I_yeast_rRNAs.fa |
Example alignment with bowtie2 (any short-read aligner works):
bowtie2 -x rRNAs/mouse_rRNAs -U trimmed_reads.fastq -S output.sam
Note: You must align to the rRNA FASTA files provided here, because the rRNA position coordinates used in ARF are tied to these specific sequences.
Pre-processing alignments to bedGraph format is optional but significantly speeds up ARF. The SAM output above must first be converted to a coordinate-sorted BAM file (for example with samtools sort) before running:
bedtools genomecov -bg -ibam output.bam > output.bedGraph
ARF accepts both .bam and .bedGraph files as input.
Create a tab-separated (TSV) file describing your samples. The minimum required columns are:
| sampleName | bedGraphFile | group |
|---|---|---|
| ctrl_rep1 | path/to/ctrl_rep1.bedGraph | CTRL |
| ctrl_rep2 | path/to/ctrl_rep2.bedGraph | CTRL |
| treat_rep1 | path/to/treat_rep1.bedGraph | TREAT |
| treat_rep2 | path/to/treat_rep2.bedGraph | TREAT |
See test_data/ for example samples files.
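The samples file can also be generated from R. The sketch below writes the example table shown above with base R; the file paths inside the table are the same placeholders, and the file is written to a temporary directory here only to avoid touching your working directory.

```r
# Build the sample sheet with the three required columns.
sample_names <- c("ctrl_rep1", "ctrl_rep2", "treat_rep1", "treat_rep2")
samples <- data.frame(
  sampleName   = sample_names,
  bedGraphFile = file.path("path/to", paste0(sample_names, ".bedGraph")),
  group        = c("CTRL", "CTRL", "TREAT", "TREAT"),
  stringsAsFactors = FALSE
)

# Write it tab-separated, without quotes or row names.
samples_path <- file.path(tempdir(), "samples.tsv")
write.table(samples, file = samples_path, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back to confirm the layout.
check <- read.delim(samples_path, stringsAsFactors = FALSE)
print(check)
```

In a real analysis, replace samples_path with the path you later pass as samplesFile.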
# Ribosome heterogeneity predictions (dripARF)
dripARF_results <- ARF::dripARF(
samplesFile = "samples.tsv",
rRNAs_fasta = "rRNAs/mouse_rRNAs.fa",
organism = "mm",
QCplot = TRUE,
targetDir = "dripARF_results/"
)
# Ribosome collision + heterogeneity predictions (dricARF)
dricARF_results <- ARF::dricARF(
samplesFile = "samples.tsv",
rRNAs_fasta = "rRNAs/6T7I_yeast_rRNAs.fa",
organism = "sc",
QCplot = TRUE,
targetDir = "dricARF_results/"
)
Both functions return a data frame of predictions for all pairwise comparisons between groups, and write comparison-specific CSV files plus scatterplot and heatmap PDFs to targetDir. Set ssRPSEAplots = TRUE to additionally save per-comparison scatter plots of weighted.RPSEA.NES_randZ vs RPSEA.NES_randZ.
The results data frame contains one row per (RP, comparison). Key columns:
| Column | Description |
|---|---|
| comp | Comparison label, e.g. "CTRL_vs_TREAT" |
| Description | Ribosomal protein name (e.g. "eL1", "uS3") or collision set name ("SAS", "Col.Int.", "Rib.Col.") |
| RPSEA.NES | Normalised Enrichment Score from GSEA over RP-rRNA proximity sets. Magnitude reflects enrichment strength; sign reflects direction of change |
| RPSEA.NES_randZ | Z-score of NES relative to 99 circularly-shifted random control sets. Corrects for positional rRNA fragmentation bias. Values > 1–2 indicate a signal unlikely to arise from positional artefacts |
| ssRPSEA.weight | Per-RP, per-comparison weight derived from ssGSEA sample-level activity scores modelled with limma: (1 − adj.P.Val) / (1 + \|logFC\|) |
| weighted.RPSEA.NES_randZ | RPSEA.NES_randZ scaled by ssRPSEA.weight, combining positional enrichment with sample-level RP activity |
| RPSEA.padj | Benjamini-Hochberg adjusted p-value from RPSEA |
| ORA.overlap | Number of significantly differential rRNA positions ( |
| ORA.setSize | Total number of rRNA positions in the RP proximity set |
| ORA.padj | Adjusted p-value from overrepresentation analysis (ORA) |
| C1.avg.read.c | Average DESeq2-normalised rRNA fragment count in condition 1 across RP contact positions |
| C2.avg.read.c | Average DESeq2-normalised rRNA fragment count in condition 2 across RP contact positions |
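As a worked example of the ssRPSEA.weight formula in the table, the sketch below applies it to made-up numbers. The input values are invented for illustration, and treating "scaled by" as a plain multiplication is an assumption rather than something confirmed by the ARF source.

```r
# Weight formula as given in the column description:
# (1 - adj.P.Val) / (1 + |logFC|)
ssrpsea_weight <- function(adj_p, log_fc) {
  (1 - adj_p) / (1 + abs(log_fc))
}

# Made-up limma-style statistics for three sets (illustration only).
toy <- data.frame(
  Description     = c("eL1", "uS3", "SAS"),
  RPSEA.NES_randZ = c(2.5, 1.2, 3.0),
  adj.P.Val       = c(0.01, 0.50, 0.20),
  logFC           = c(0.5, 0.1, -1.0),
  stringsAsFactors = FALSE
)

toy$ssRPSEA.weight <- ssrpsea_weight(toy$adj.P.Val, toy$logFC)

# Assumption: "scaled by" means multiplied by the weight.
toy$weighted.RPSEA.NES_randZ <- toy$RPSEA.NES_randZ * toy$ssRPSEA.weight
print(toy)
```

For the first row this gives a weight of 0.99 / 1.5 = 0.66 and a weighted score of 2.5 × 0.66 = 1.65.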
A strong hit typically has RPSEA.padj < 0.05, ORA.padj < 0.05, |RPSEA.NES| > 1, and RPSEA.NES_randZ > 1. Use dripARF_simplify_results() to filter by these thresholds.
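The thresholds above can also be applied by hand with base R, which may help when inspecting results interactively. The sketch below filters a made-up results table; it does not call dripARF_simplify_results(), whose arguments are not shown here, and all values are invented.

```r
# Made-up results with the documented column names (illustration only).
toy_results <- data.frame(
  comp            = "CTRL_vs_TREAT",
  Description     = c("eL1", "uS3", "eS25", "SAS"),
  RPSEA.NES       = c(2.1, -1.8, 0.6, 1.4),
  RPSEA.NES_randZ = c(2.4, 1.6, 0.3, 0.8),
  RPSEA.padj      = c(0.001, 0.03, 0.40, 0.02),
  ORA.padj        = c(0.010, 0.04, 0.60, 0.20),
  stringsAsFactors = FALSE
)

# Keep rows that satisfy all four "strong hit" criteria.
strong_hits <- subset(
  toy_results,
  RPSEA.padj < 0.05 &
    ORA.padj < 0.05 &
    abs(RPSEA.NES) > 1 &
    RPSEA.NES_randZ > 1
)
print(strong_hits$Description)
```

In this toy table only eL1 and uS3 pass: eS25 fails every criterion, and SAS fails on ORA.padj and RPSEA.NES_randZ.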
ARF has preset RP-rRNA proximity data for three organisms:
| Code | Species | PDB structure | Collision sets |
|---|---|---|---|
| "hs" | Homo sapiens (human) | 4V6X | Yes (7QVP-derived) |
| "mm" | Mus musculus (mouse) | 4V6X liftover | Yes |
| "sc" | Saccharomyces cerevisiae (yeast) | 6T7I | Yes (6I7O, 6T83, 6SV4) |
For organisms other than human, mouse, and yeast, structural data must be prepared first using the helper functions below, then passed to dripARF() or dricARF() with organism = NULL.
ARF includes structural data for a wide range of species (bacteria, plants, other eukaryotes). Run ARF_check_organism("any") to print the full list of available PDB IDs grouped by species.
Step 1: Compute RP–rRNA distances from a PDB structure
RP_proximity_df <- ARF::ARF_parse_PDB_ribosome(
species = "ec",
PDBid = "6XZA",
download_directory = "./my_results/"
)
Note: For PDB structures not in the ARF database, chain names must be manually mapped to standard RP nomenclature via the PDB_chains_2_RP_nomenclature argument. See docs/dricARF_use_cases_documentation.md for a helper function that automates this.
Step 2: Liftover coordinates to your organism's rRNA (if needed)
If the PDB structure is from a different species than your study organism, align and transfer coordinates:
LO.RP_proximity_df <- ARF::ARF_convert_Ribo3D_pos(
source_distance_file = "./Ribosome.3D.6XZA.ARF.minimum_distances.csv",
source_rRNAs_fasta = "./my_results/6XZA.rRNAs.fasta",
target_species = "ec",
target_rRNAs_fasta = "./my_organism/rRNAs.fa",
rRNA_pairs = list(c("rRNA_16S", "ec_rRNA_16S"),
c("rRNA_23S", "ec_rRNA_23S"),
c("rRNA_5S", "ec_rRNA_5S")),
type = "distances"
)
Step 3: Generate RP proximity gene sets and collision sets
gsea_sets_RP <- ARF::dripARF_get_RP_proximity_sets(
RP_proximity_df = LO.RP_proximity_df,
rRNAs_fasta = "./my_organism/rRNAs.fa"
)
# For dricARF: liftover built-in collision structure sets
gsea_sets_Collision <- ARF::dricARF_liftover_collision_sets(
target_species = "ec",
target_rRNAs_fasta = "./my_organism/rRNAs.fa",
rRNA_pairs = list(c("18S", "ec_rRNA_16S"),
c("28S", "ec_rRNA_23S"),
c("5S", "ec_rRNA_5S"))
)
Note: The rRNA FASTA headers must match the position IDs in gsea_sets_RP$gene (e.g. >ec_rRNA_23S). Check this carefully before running.
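One way to run this check is sketched below on toy data. The exact format of the position IDs is not documented here, so the pattern used (rRNA name, underscore, position number) is a hypothetical assumption; adapt the parsing to whatever your gsea_sets_RP$gene values actually look like.

```r
# Toy FASTA written to a temporary file (illustration only).
fasta_path <- tempfile(fileext = ".fa")
writeLines(c(">ec_rRNA_16S", "AAUUGAAG",
             ">ec_rRNA_23S", "GGUUAAGC",
             ">ec_rRNA_5S",  "UGCCUGGC"), fasta_path)

# Extract header names: drop ">" and anything after the first space.
fasta_lines   <- readLines(fasta_path)
header_lines  <- grep("^>", fasta_lines, value = TRUE)
fasta_headers <- sub("[[:space:]].*$", "", sub("^>", "", header_lines))

# HYPOTHETICAL ID format "<rRNA name>_<position>"; in practice take
# these from gsea_sets_RP$gene.
toy_gene_ids <- c("ec_rRNA_16S_10", "ec_rRNA_23S_250",
                  "ec_rRNA_5S_3", "rRNA_23S_99")

# Strip the trailing position and compare against the FASTA headers.
id_rrna   <- sub("_[0-9]+$", "", toy_gene_ids)
unmatched <- setdiff(unique(id_rrna), fasta_headers)

if (length(unmatched) > 0) {
  message("rRNA names without a FASTA header: ",
          paste(unmatched, collapse = ", "))
}
```

Here the last toy ID lacks the "ec_" prefix, so it is reported as unmatched, which is the kind of naming mismatch the note warns about.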
Step 4: Run dripARF or dricARF with custom data
dricARF_results <- ARF::dricARF(
samplesFile = "samples.tsv",
rRNAs_fasta = "./my_organism/rRNAs.fa",
organism = NULL,
gsea_sets_RP = gsea_sets_RP,
RP_proximity_df = LO.RP_proximity_df,
gsea_sets_Collision = gsea_sets_Collision,
targetDir = "./my_results/"
)
For complete worked examples with Escherichia coli and Pseudomonas savastanoi, including custom PDB nomenclature table generation, see:
docs/dricARF_use_cases_documentation.md
| Function | Purpose |
|---|---|
| read_ARF_samples_file() | Read the tab-separated samples TSV into a data frame |
| dripARF_read_rRNA_fragments() | Read bedGraph/BAM files and build the rRNA position × sample count matrix |
| dripARF_get_DESEQ_dds() | Normalise rRNA counts with DESeq2 |
| dripARF_report_RPset_group_counts() | Compute group-wise average normalised counts per RP proximity set |
| dripARF_predict_heterogenity() | Core enrichment analysis: RPSEA + ORA per RP per comparison |
| dripARF_simplify_results() | Filter results by RPSEA/ORA significance thresholds |
| dripARF_result_heatmap() | Heatmap of NES and NES_randZ across RPs and comparisons |
| dripARF_result_scatterplot() | Scatterplot of RPSEA NES vs NES_randZ per comparison |
| dripARF_report_RPspec_pos_results() | Extract per-position DESeq2 results for RP proximity sets |
| dripARF_rRNApos_heatmaps() | Heatmap of position-specific differential rRNA fragment abundances |
| dripARF_add_replicates() | Create pseudo-replicates for single-sample groups (use with caution) |
| dripARF_threshold_test() | Run dripARF across multiple proximity distance thresholds (experimental) |
dricARF uses the full dripARF stack internally and adds:
| Function | Purpose |
|---|---|
| dricARF_liftover_collision_sets() | Liftover built-in cryo-EM collision structure sets to a target organism |
| dricARF_result_scatterplot() | Two-panel scatterplot highlighting collision set predictions |

| Function | Purpose |
|---|---|
| ARF_check_organism() | Validate organism code; print available PDB structures if invalid |
| ARF_parse_PDB_ribosome() | Compute RP–rRNA minimum Euclidean distances from a PDB/CIF structure |
| ARF_convert_Ribo3D_pos() | Liftover 3D distance coordinates to a target organism via rRNA alignment |
| dripARF_get_RP_proximity_sets() | Convert proximity matrix to GSEA gene sets with random controls |
If ARF installed successfully, test it with the datasets provided in test_data/:
- Mouse RP-tagging dataset (Rpl10aRps25Rpl22_samples_mouse.tsv): 12 samples across three FLAG-tagged RPs (uL1/Rpl10a, eL22/Rpl22, eS25/Rps25) and Total ribosome controls. Use with organism = "mm" and dripARF().
- Arabidopsis 3-AT stress dataset (SRP043036_Lareau_Brown_2014.tsv): 7 samples comparing CTRL vs 3-aminotriazole stress conditions. Use with the plant rRNA FASTA (4V7R) and dricARF().
If you use ARF in your research, please cite the relevant publication(s) below:
Ferhat Alkan, Oscar Wilkins, Santiago Hernandez-Perez, Sofia Ramalho, Joana Silva, Jernej Ule, William J Faller; Identifying ribosome heterogeneity using ribosome profiling data, Nucleic Acids Research, 2022; gkac484, https://doi.org/10.1093/nar/gkac484
Edwin Sakyi Kyei-Baffour, Jitske Bak, Joana Silva, William J Faller, Ferhat Alkan; Detecting ribosome collisions with differential rRNA fragment analysis in ribosome profiling data, NAR Genomics and Bioinformatics, Volume 7, Issue 2, June 2025; lqaf045, https://doi.org/10.1093/nargab/lqaf045
Edwin Sakyi Kyei-Baffour, William J Faller, Ferhat Alkan; Multi-species support for ribosome collision and heterogeneity predictions with ARF, 2026, (Under Review).
Edwin Sakyi Kyei-Baffour, William J Faller, Ferhat Alkan; Ribo-Seq rRNA reads as a base for predicting ribosomal protein and collisions changes, 2026, (Under Review).
Copyright 2026 Ferhat Alkan, Edwin Sakyi Kyei-Baffour, William J Faller.
Released under the GPL-3 License.
For questions, bug reports, and feature requests, please use the GitHub Issues tab or contact: fallerlab@gmail.com and/or feralkan@gmail.com.