
fallerlab/ARF


ARF (v2.2)

R package for the Analysis of Ribosomal RNA Fragments generated by Ribosome Profiling (Ribo-Seq) experiments.

ARF links position-specific rRNA fragment abundances to the 3D structure of the ribosome to identify biological drivers of ribosomal heterogeneity and collision dynamics across conditions. It currently includes two methods:

  • dripARF (Differential Ribosomal protein Incorporation Prediction): identifies which ribosomal proteins (RPs) show altered incorporation between conditions, nominating candidates that contribute to inter-sample ribosomal heterogeneity.
  • dricARF (Differential RIbosome Collision prediction): extends dripARF by additionally integrating cryo-EM collision structural data (SAS, Col.Int., Rib.Col. sets) to simultaneously predict changes in ribosome collision abundance alongside RP heterogeneity.

Both methods use DESeq2 for normalisation, GSEA-based enrichment testing (RPSEA) and overrepresentation analysis (ORA), with enrichment scores z-scored against 99 circularly shifted random control sets to remove positional rRNA fragmentation bias. An additional ssGSEA-based layer (ssRPSEA) models sample-level RP activity scores with limma and uses the result to weight these scores, reported as weighted.RPSEA.NES_randZ in the output.
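A minimal base-R sketch of the random-control idea, to make it concrete. It is not ARF's implementation: the per-position statistic is simulated, the set "score" is a plain mean standing in for the GSEA NES, and the evenly spaced shift sizes are an assumption. Each control is the same position set rotated along the rRNA, so it keeps the set's size and spacing but lands on unrelated positions.

```r
# Hypothetical sketch, not ARF's internal code: z-score an observed set score
# against the scores of circularly shifted copies of the same position set.
circular_shift <- function(positions, shift, rRNA_length) {
  # Move 1-based positions by `shift` and wrap around the end of the rRNA
  ((positions - 1 + shift) %% rRNA_length) + 1
}

set_score <- function(stat, positions) {
  # Stand-in for an enrichment score such as the NES: mean statistic in the set
  mean(stat[positions])
}

rand_z <- function(stat, positions, n_random = 99) {
  rRNA_length <- length(stat)
  observed <- set_score(stat, positions)
  # Assumed: evenly spaced shifts; each control keeps the set's spacing
  shifts <- round(seq_len(n_random) * rRNA_length / (n_random + 1))
  random <- vapply(shifts, function(s)
    set_score(stat, circular_shift(positions, s, rRNA_length)), numeric(1))
  (observed - mean(random)) / sd(random)
}

# Toy data: 1,000 positions with an added signal inside positions 101-120
set.seed(1)
stat <- rnorm(1000)
stat[101:120] <- stat[101:120] + 2
z <- rand_z(stat, 101:120)
```

Because the controls share the set's geometry, a bias that simply depends on where along the rRNA fragments tend to accumulate should affect observed and control scores alike, which is what the z-score is meant to cancel.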

Further reading

For detailed worked examples including non-model organisms, see:

docs/dricARF_use_cases_documentation.md


Installation

In an R (>= 4.0) environment:

install.packages("devtools")
devtools::install_github("fallerlab/ARF@main")

Required R libraries

Please make sure the following packages are installed:

  • DESeq2 (>= 1.30.1)
  • SummarizedExperiment
  • matrixStats
  • clusterProfiler
  • fgsea
  • ComplexHeatmap
  • grid
  • ggplot2
  • ggrepel
  • scales
  • reshape2
  • bio3d
  • Biostrings
  • msa
  • cowplot
  • dplyr
  • magrittr
  • readr
  • RColorBrewer
  • circlize
  • wesanderson
  • GSVA
  • limma
  • tidyr
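One way to check the list above before installing ARF, sketched in base R: it looks each package up with system.file() rather than loading it, and compares the installed DESeq2 version with the stated minimum.

```r
# Sketch: report which required packages are not installed yet, without
# loading them. system.file(package = p) returns "" if p is not installed.
required <- c("DESeq2", "SummarizedExperiment", "matrixStats",
              "clusterProfiler", "fgsea", "ComplexHeatmap", "grid",
              "ggplot2", "ggrepel", "scales", "reshape2", "bio3d",
              "Biostrings", "msa", "cowplot", "dplyr", "magrittr",
              "readr", "RColorBrewer", "circlize", "wesanderson",
              "GSVA", "limma", "tidyr")

missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

not_installed <- missing_packages(required)
if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
}

# DESeq2 has a minimum version requirement (>= 1.30.1)
if (!"DESeq2" %in% not_installed && packageVersion("DESeq2") < "1.30.1") {
  warning("DESeq2 >= 1.30.1 is required; found ",
          as.character(packageVersion("DESeq2")))
}
```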

Installing all dependencies

install.packages('renv', dependencies = TRUE)

## Initiate renv to manage the R environment
renv::init()

## Install CRAN packages
install.packages(
  c('remotes', 'curl', 'ggrepel', 'reshape2', 'scales',
    'dplyr', 'magrittr', 'readr', 'RColorBrewer',
    'circlize', 'wesanderson', 'bio3d', 'cowplot', 'tidyr'),
  dependencies = TRUE)

## Install Bioconductor packages
remotes::install_bioc(
  c('DESeq2', 'SummarizedExperiment', 'matrixStats',
    'clusterProfiler', 'fgsea', 'ComplexHeatmap',
    'msa', 'Biostrings', 'GSVA', 'limma'),
  dependencies = TRUE)

renv::install("fallerlab/ARF@main")

Quick start

Step 1 — Align Ribo-seq reads to rRNA sequences

Trim adapters from your Ribo-seq reads, then align them to the rRNA FASTA files provided in the rRNAs/ folder:

| Organism | rRNA FASTA file |
|---|---|
| Human (H. sapiens) | rRNAs/4V6X_human_rRNAs.fa |
| Mouse (M. musculus) | rRNAs/mouse_rRNAs.fa |
| Yeast (S. cerevisiae) | rRNAs/6T7I_yeast_rRNAs.fa |

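Example alignment with bowtie2 (any short-read aligner works). Build the index once, then align and pipe the output into samtools sort to get the coordinate-sorted BAM used in Step 2:

bowtie2-build rRNAs/mouse_rRNAs.fa rRNAs/mouse_rRNAs
bowtie2 -x rRNAs/mouse_rRNAs -U trimmed_reads.fastq | samtools sort -o output.bam -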

Note: You must align to the rRNA FASTA files provided here — the rRNA position coordinates used in ARF are tied to these specific sequences.

Step 2 — Convert BAM to bedGraph (recommended)

Pre-processing alignments to bedGraph format is optional but significantly speeds up ARF:

bedtools genomecov -bg -ibam output.bam > output.bedGraph

ARF accepts both .bam and .bedGraph files as input.
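For orientation, a bedGraph file has four tab-separated columns (chrom, 0-based start, exclusive end, value), and bedtools genomecov -bg writes one line per run of equal coverage, which is why it is much smaller than the BAM. The sketch below expands such intervals into per-position values in base R; it only illustrates the format and is not how ARF reads these files. Note that -bg omits zero-coverage runs, so positions absent from the output have zero coverage.

```r
# Hypothetical sketch (not ARF's reader): expand bedGraph intervals into
# per-position values. Columns: chrom, start (0-based), end (exclusive), value.
bedgraph_to_positions <- function(path) {
  bg <- read.table(path, sep = "\t", header = FALSE,
                   col.names = c("chrom", "start", "end", "value"),
                   colClasses = c("character", "integer", "integer", "numeric"))
  widths <- bg$end - bg$start
  data.frame(
    chrom    = rep(bg$chrom, widths),
    # start + 1 converts the 0-based start to a 1-based position
    position = unlist(mapply(function(s, e) seq(s + 1, e), bg$start, bg$end,
                             SIMPLIFY = FALSE)),
    value    = rep(bg$value, widths)
  )
}

# Toy file with two coverage runs on a hypothetical rRNA named "rRNA_A"
tmp <- tempfile(fileext = ".bedGraph")
writeLines(c("rRNA_A\t0\t3\t5", "rRNA_A\t10\t12\t2"), tmp)
per_pos <- bedgraph_to_positions(tmp)
```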

Step 3 — Prepare a samples file

Create a tab-separated (TSV) samples file describing your samples. The minimum required columns are:

| sampleName | bedGraphFile | group |
|---|---|---|
| ctrl_rep1 | path/to/ctrl_rep1.bedGraph | CTRL |
| ctrl_rep2 | path/to/ctrl_rep2.bedGraph | CTRL |
| treat_rep1 | path/to/treat_rep1.bedGraph | TREAT |
| treat_rep2 | path/to/treat_rep2.bedGraph | TREAT |

See test_data/ for example samples files.
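If you prefer to build the samples file from R, base write.table() is enough. The sketch below reproduces the example table above; the paths are placeholders, and the replicate check is an extra precaution suggested by the existence of dripARF_add_replicates() rather than a documented requirement.

```r
# Sketch: write the minimal samples file shown above with base R.
sample_names <- c("ctrl_rep1", "ctrl_rep2", "treat_rep1", "treat_rep2")
samples <- data.frame(
  sampleName   = sample_names,
  bedGraphFile = paste0("path/to/", sample_names, ".bedGraph"),  # placeholders
  group        = c("CTRL", "CTRL", "TREAT", "TREAT")
)

# Precaution (assumption): flag groups that have a single sample
if (any(table(samples$group) < 2)) {
  warning("Some groups have only one sample")
}

# Tab-separated, header row, no quoting and no row names
write.table(samples, "samples.tsv", sep = "\t", quote = FALSE, row.names = FALSE)
```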

Step 4 — Run dripARF or dricARF

# Ribosome heterogeneity predictions (dripARF)
dripARF_results <- ARF::dripARF(
  samplesFile = "samples.tsv",
  rRNAs_fasta = "rRNAs/mouse_rRNAs.fa",
  organism    = "mm",
  QCplot      = TRUE,
  targetDir   = "dripARF_results/"
)

# Ribosome collision + heterogeneity predictions (dricARF)
dricARF_results <- ARF::dricARF(
  samplesFile = "samples.tsv",
  rRNAs_fasta = "rRNAs/6T7I_yeast_rRNAs.fa",
  organism    = "sc",
  QCplot      = TRUE,
  targetDir   = "dricARF_results/"
)

Both functions return a data frame of predictions for all pairwise comparisons between groups, and write comparison-specific CSV files plus scatterplot and heatmap PDFs to targetDir. Set ssRPSEAplots = TRUE to additionally save per-comparison scatter plots of weighted.RPSEA.NES_randZ vs RPSEA.NES_randZ.
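With k groups there are k(k−1)/2 pairwise comparisons. The sketch below enumerates them with combn(); the "A_vs_B" label mirrors the comp values shown under "Understanding the output", but the order in which ARF places the two groups in a label is an assumption here.

```r
# Hypothetical sketch: enumerate all pairwise group comparisons.
# Alphabetical group order within a label is an assumption, not ARF's rule.
pairwise_comparisons <- function(groups) {
  pairs <- combn(sort(unique(groups)), 2)   # one column per pair of groups
  paste(pairs[1, ], pairs[2, ], sep = "_vs_")
}

comps <- pairwise_comparisons(c("CTRL", "CTRL", "TREAT", "TREAT", "KO"))
```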


Understanding the output

The results data frame contains one row per feature and comparison, where a feature is an RP or, for dricARF, a collision set. Key columns:

| Column | Description |
|---|---|
| comp | Comparison label, e.g. "CTRL_vs_TREAT" |
| Description | Ribosomal protein name (e.g. "eL1", "uS3") or collision set name ("SAS", "Col.Int.", "Rib.Col.") |
| RPSEA.NES | Normalised Enrichment Score from GSEA over RP-rRNA proximity sets. Magnitude reflects enrichment strength; sign reflects direction of change |
| RPSEA.NES_randZ | Z-score of NES relative to 99 circularly-shifted random control sets. Corrects for positional rRNA fragmentation bias. Values > 1–2 indicate a signal unlikely to arise from positional artefacts |
| ssRPSEA.weight | Per-RP, per-comparison weight derived from ssGSEA sample-level activity scores modelled with limma: (1 − adj.P.Val) / (1 + abs(logFC)) |
| weighted.RPSEA.NES_randZ | RPSEA.NES_randZ scaled by ssRPSEA.weight; combines positional enrichment with sample-level RP activity |
| RPSEA.padj | Benjamini-Hochberg adjusted p-value from RPSEA |
| ORA.overlap | Number of significantly differential rRNA positions ( |
| ORA.setSize | Total number of rRNA positions in the RP proximity set |
| ORA.padj | Adjusted p-value from overrepresentation analysis (ORA) |
| C1.avg.read.c | Average DESeq2-normalised rRNA fragment count in condition 1 across RP contact positions |
| C2.avg.read.c | Average DESeq2-normalised rRNA fragment count in condition 2 across RP contact positions |
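The weight formula in the table can be written out directly. This base-R sketch assumes that "scaled by" means multiplication, and uses adj.P.Val and logFC in the sense of the columns limma's topTable() reports; the input values are toy numbers.

```r
# Sketch of the ssRPSEA weighting as stated in the table:
# weight = (1 - adj.P.Val) / (1 + abs(logFC))
ssRPSEA_weight <- function(adj_p, logFC) {
  (1 - adj_p) / (1 + abs(logFC))
}

# Assumption: "scaled by" = multiplied by the weight
weighted_randZ <- function(randZ, adj_p, logFC) {
  randZ * ssRPSEA_weight(adj_p, logFC)
}

# Toy numbers for illustration only
w  <- ssRPSEA_weight(adj_p = 0.01, logFC = -0.5)              # 0.99 / 1.5
wz <- weighted_randZ(randZ = 2, adj_p = 0.01, logFC = -0.5)   # 2 * 0.66
```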

A strong hit typically has RPSEA.padj < 0.05, ORA.padj < 0.05, |RPSEA.NES| > 1, and RPSEA.NES_randZ > 1. Use dripARF_simplify_results() to filter by these thresholds.
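dripARF_simplify_results() is the packaged way to apply these thresholds. For ad-hoc exploration of the returned data frame, an equivalent base-R filter might look like the sketch below; the column names come from the table above, and the toy rows are made up for illustration, not real output.

```r
# Sketch: keep rows passing the "strong hit" thresholds described above.
strong_hits <- function(res, padj = 0.05, nes = 1, randZ = 1) {
  keep <- res$RPSEA.padj < padj & res$ORA.padj < padj &
    abs(res$RPSEA.NES) > nes & res$RPSEA.NES_randZ > randZ
  res[which(keep), , drop = FALSE]   # which() also drops NA rows
}

# Toy input, not real ARF output
toy <- data.frame(
  comp            = "CTRL_vs_TREAT",
  Description     = c("eL1", "uS3", "eS25"),
  RPSEA.NES       = c(1.8, -0.5, 2.1),
  RPSEA.NES_randZ = c(2.4, 0.3, 0.8),
  RPSEA.padj      = c(0.01, 0.40, 0.02),
  ORA.padj        = c(0.03, 0.60, 0.01)
)
hits <- strong_hits(toy)
```

Here only eL1 passes: uS3 fails every cutoff, and eS25 is significant but its RPSEA.NES_randZ stays below 1.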


Built-in organisms and supported structures

ARF has preset RP-rRNA proximity data for three organisms:

| Code | Species | PDB structure | Collision sets |
|---|---|---|---|
| "hs" | Homo sapiens (human) | 4V6X | Yes (7QVP-derived) |
| "mm" | Mus musculus (mouse) | 4V6X liftover | Yes |
| "sc" | Saccharomyces cerevisiae (yeast) | 6T7I | Yes (6I7O, 6T83, 6SV4) |

Using ARF with other organisms

For organisms other than human, mouse, and yeast, structural data must be prepared first using the helper functions below, then passed to dripARF() or dricARF() with organism = NULL.

ARF includes structural data for a wide range of species (bacteria, plants, other eukaryotes). Run ARF_check_organism("any") to print the full list of available PDB IDs grouped by species.

Step 1 — Compute RP–rRNA distances from a PDB structure

RP_proximity_df <- ARF::ARF_parse_PDB_ribosome(
  species            = "ec",
  PDBid              = "6XZA",
  download_directory = "./my_results/"
)

Note: For PDB structures not in the ARF database, chain names must be manually mapped to standard RP nomenclature via the PDB_chains_2_RP_nomenclature argument. See docs/dricARF_use_cases_documentation.md for a helper function that automates this.

Step 2 — Liftover coordinates to your organism's rRNA (if needed)

If the PDB structure is from a different species than your study organism, align and transfer coordinates:

LO.RP_proximity_df <- ARF::ARF_convert_Ribo3D_pos(
  source_distance_file = "./Ribosome.3D.6XZA.ARF.minimum_distances.csv",
  source_rRNAs_fasta   = "./my_results/6XZA.rRNAs.fasta",
  target_species       = "ec",
  target_rRNAs_fasta   = "./my_organism/rRNAs.fa",
  rRNA_pairs           = list(c("rRNA_16S", "ec_rRNA_16S"),
                              c("rRNA_23S", "ec_rRNA_23S"),
                              c("rRNA_5S",  "ec_rRNA_5S")),
  type = "distances"
)
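Conceptually, a liftover maps each source rRNA position to the target position in the same alignment column. The base-R sketch below shows only that mapping step for two sequences that are already aligned (with "-" as the gap character); it illustrates the idea rather than reproducing the code inside ARF_convert_Ribo3D_pos(), and it leaves the alignment itself (for example with msa) out.

```r
# Hypothetical sketch: map source positions to target positions through a
# pairwise alignment given as two gapped strings of equal length.
map_aligned_positions <- function(source_aln, target_aln) {
  src <- strsplit(source_aln, "")[[1]]
  tgt <- strsplit(target_aln, "")[[1]]
  stopifnot(length(src) == length(tgt))
  src_pos <- cumsum(src != "-")   # source residue number at each column
  tgt_pos <- cumsum(tgt != "-")   # target residue number at each column
  cols <- which(src != "-")       # columns that hold a source residue
  data.frame(source = src_pos[cols],
             # NA where the target has a gap: no counterpart position
             target = ifelse(tgt[cols] == "-", NA, tgt_pos[cols]))
}

lifted <- map_aligned_positions("AC-GU", "A-CGU")
```

Source positions opposite a gap have no target coordinate, so any distances attached to them would be dropped in such a scheme.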

Step 3 — Generate RP proximity gene sets and collision sets

gsea_sets_RP <- ARF::dripARF_get_RP_proximity_sets(
  RP_proximity_df = LO.RP_proximity_df,
  rRNAs_fasta     = "./my_organism/rRNAs.fa"
)

# For dricARF: liftover built-in collision structure sets
gsea_sets_Collision <- ARF::dricARF_liftover_collision_sets(
  target_species     = "ec",
  target_rRNAs_fasta = "./my_organism/rRNAs.fa",
  rRNA_pairs         = list(c("18S", "ec_rRNA_16S"),
                            c("28S", "ec_rRNA_23S"),
                            c("5S",  "ec_rRNA_5S"))
)

Note: The rRNA FASTA headers must match the position IDs in gsea_sets_RP$gene (e.g. >ec_rRNA_23S). Check this carefully before running.
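A quick way to run the check in the note above is to compare the FASTA header names with the rRNA part of each position ID. The sketch assumes position IDs have the form "<rRNA header>_<position>", which is an assumption rather than a documented ARF format; adjust the pattern if your IDs differ.

```r
# Hypothetical sketch: find rRNA names used in position IDs that do not
# appear as FASTA headers.
fasta_headers <- function(path) {
  hdr <- grep("^>", readLines(path), value = TRUE)
  sub("^>([^[:space:]]+).*$", "\\1", hdr)   # first word after '>'
}

unmatched_rRNAs <- function(position_ids, headers) {
  prefixes <- sub("_[0-9]+$", "", position_ids)   # assumed "<rRNA>_<pos>"
  unique(prefixes[!prefixes %in% headers])
}

# Toy example; with real data one would pass gsea_sets_RP$gene and
# fasta_headers("./my_organism/rRNAs.fa")
fa <- tempfile(fileext = ".fa")
writeLines(c(">ec_rRNA_16S", "ACGU", ">ec_rRNA_23S desc", "ACGU",
             ">ec_rRNA_5S", "ACGU"), fa)
bad <- unmatched_rRNAs(c("ec_rRNA_16S_10", "ec_rRNA_23S_5", "rRNA_5S_1"),
                       fasta_headers(fa))
```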

Step 4 — Run dripARF or dricARF with custom data

dricARF_results <- ARF::dricARF(
  samplesFile         = "samples.tsv",
  rRNAs_fasta         = "./my_organism/rRNAs.fa",
  organism            = NULL,
  gsea_sets_RP        = gsea_sets_RP,
  RP_proximity_df     = LO.RP_proximity_df,
  gsea_sets_Collision = gsea_sets_Collision,
  targetDir           = "./my_results/"
)

For complete worked examples with Escherichia coli and Pseudomonas savastanoi, including custom PDB nomenclature table generation, see:

docs/dricARF_use_cases_documentation.md


Function reference

dripARF pipeline

| Function | Purpose |
|---|---|
| read_ARF_samples_file() | Read the tab-separated samples TSV into a data frame |
| dripARF_read_rRNA_fragments() | Read bedGraph/BAM files and build the rRNA position × sample count matrix |
| dripARF_get_DESEQ_dds() | Normalise rRNA counts with DESeq2 |
| dripARF_report_RPset_group_counts() | Compute group-wise average normalised counts per RP proximity set |
| dripARF_predict_heterogenity() | Core enrichment analysis: RPSEA + ORA per RP per comparison |
| dripARF_simplify_results() | Filter results by RPSEA/ORA significance thresholds |
| dripARF_result_heatmap() | Heatmap of NES and NES_randZ across RPs and comparisons |
| dripARF_result_scatterplot() | Scatterplot of RPSEA NES vs NES_randZ per comparison |
| dripARF_report_RPspec_pos_results() | Extract per-position DESeq2 results for RP proximity sets |
| dripARF_rRNApos_heatmaps() | Heatmap of position-specific differential rRNA fragment abundances |
| dripARF_add_replicates() | Create pseudo-replicates for single-sample groups (use with caution) |
| dripARF_threshold_test() | Run dripARF across multiple proximity distance thresholds (experimental) |

dricARF pipeline

dricARF uses the full dripARF stack internally and adds:

| Function | Purpose |
|---|---|
| dricARF_liftover_collision_sets() | Liftover built-in cryo-EM collision structure sets to a target organism |
| dricARF_result_scatterplot() | Two-panel scatterplot highlighting collision set predictions |

Structural / utility functions

| Function | Purpose |
|---|---|
| ARF_check_organism() | Validate organism code; print available PDB structures if invalid |
| ARF_parse_PDB_ribosome() | Compute RP–rRNA minimum Euclidean distances from a PDB/CIF structure |
| ARF_convert_Ribo3D_pos() | Liftover 3D distance coordinates to a target organism via rRNA alignment |
| dripARF_get_RP_proximity_sets() | Convert proximity matrix to GSEA gene sets with random controls |
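dripARF_get_RP_proximity_sets() is described as turning a proximity matrix into GSEA gene sets. The core idea can be sketched in base R as thresholding RP–rRNA distances: every rRNA position within a cutoff of an RP joins that RP's set. The column names, the cutoff and the toy distances are assumptions, and the random control sets the real function also produces are omitted.

```r
# Hypothetical sketch: build per-RP position sets from a long-format
# distance table using a distance cutoff (not ARF's defaults).
proximity_sets <- function(dist_df, cutoff) {
  near <- dist_df[dist_df$distance <= cutoff, ]
  split(near$position, near$RP)   # named list: RP -> rRNA positions
}

# Toy distances (PDB coordinates are in Angstrom), for illustration only
dist_df <- data.frame(
  RP       = c("uL1", "uL1", "eL22", "eL22"),
  position = c("r_10", "r_11", "r_50", "r_51"),
  distance = c(3.2, 12.5, 5.0, 7.9)
)
sets <- proximity_sets(dist_df, cutoff = 8)

# Long "term/gene" layout, the shape clusterProfiler accepts as TERM2GENE
term2gene <- data.frame(term = rep(names(sets), lengths(sets)),
                        gene = unlist(sets, use.names = FALSE))
```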

Testing the installation

If ARF installed successfully, test it with the datasets provided in test_data/:

  • Mouse RP-tagging dataset (Rpl10aRps25Rpl22_samples_mouse.tsv): 12 samples across three FLAG-tagged RPs (uL1/Rpl10a, eL22/Rpl22, eS25/Rps25) plus total-ribosome controls. Use with organism = "mm" and dripARF().
  • Arabidopsis 3-AT stress dataset (SRP043036_Lareau_Brown_2014.tsv): 7 samples comparing CTRL vs 3-aminotriazole stress conditions. Use with the plant rRNA FASTA (4V7R) and dricARF().

Citations

If you use ARF in your research, please cite the relevant publication(s) below:

Ferhat Alkan, Oscar Wilkins, Santiago Hernandez-Perez, Sofia Ramalho, Joana Silva, Jernej Ule, William J Faller; Identifying ribosome heterogeneity using ribosome profiling data, Nucleic Acids Research, 2022; gkac484, https://doi.org/10.1093/nar/gkac484

Edwin Sakyi Kyei-Baffour, Jitske Bak, Joana Silva, William J Faller, Ferhat Alkan; Detecting ribosome collisions with differential rRNA fragment analysis in ribosome profiling data, NAR Genomics and Bioinformatics, Volume 7, Issue 2, June 2025; lqaf045, https://doi.org/10.1093/nargab/lqaf045

Edwin Sakyi Kyei-Baffour, William J Faller, Ferhat Alkan; Multi-species support for ribosome collision and heterogeneity predictions with ARF, 2026, (Under Review).

Edwin Sakyi Kyei-Baffour, William J Faller, Ferhat Alkan; Ribo-Seq rRNA reads as a base for predicting ribosomal protein and collisions changes, 2026, (Under Review).


Copyright

Copyright 2026 Ferhat Alkan, Edwin Sakyi Kyei-Baffour, William J Faller.

Released under the GPL-3 License.


Contact

For questions, bug reports, and feature requests, please use the GitHub Issues tab or contact: fallerlab@gmail.com and/or feralkan@gmail.com.

About

Analyses of Ribo-seq rRNA fragments for Differential Ribosome Heterogeneity and Collision predictions
