GitHub - LuChenLab/r-scATS

scATS

An R package for quantifying alternative transcription start site (ATS) in 5' single-cell RNA-seq data

Here, we developed scATS, which introduces two novel metrics (α and β) to precisely quantify RNA degradation, and provides both raw (ψ) and corrected (θ) metrics to correct ATS expression profiles from resulting distortions.

overall, scATS is a stepwise computational method, scATS, to de novo infer the relative usage of TSSs and correct the RNA degradation in the paired-end 5’ scRNA-seq.

Overview

scATS contains three main steps (Fig. a):

identifing TSS in each gene according to the eCDF of the start sites of all read 1 (R1) reads.
quantifying an EM model based RNA degradation rate (α) and a degradation metric (β), representing the relative area under the cumulative distribution curve of each TSS.
providing raw percent spliced in (PSI, ψ) and corrected PSI values (θ) of each TSS account for RNA degradation.

We further applied scATS on three different 5’ scRNA-seq datasets, including mouse hematopoietic stem and progenitor cells (HSPCs), and human clinical samples from COVID-19 and lung cancer patients (Fig. b). Benchmark was performed in mouse HSPCs datasets. And the accuracy of our model was assessed by promoter motifs, epigenetic information, conservation and long-read sequencing (Fig. c). We also developed a machine-learning method to screen for the functional TSSs in lung cancer and guide subsequent experimental validation and mechanism study (Fig. d).

Requirements

Please choose the appropriate version of scATS based on your R environment:

v0.5.6 (Recommended): For users with R >= 4.4 (developed with R 4.5.2).
v0.5.5 (Legacy): For users with R < 4.4 (developed under R 4.0.0).

System Dependencies

Before installing scATS, please ensure that samtools is installed and available in your system's PATH.

conda install -c bioconda samtools

R Dependencies

The following R packages are required for all versions of scATS. You can install them using the following command:

install.packages(c(
    'R.utils', 
    'TTR', 
    'VGAM',
    'prettyunits',
    'httr2',
    'fitdistrplus',
    'mclust',
    'mixtools'))

Note: 💡 For v0.5.5, you must manually install the smoother package from the CRAN archive before installing scATS:

install.packages('https://cran.r-project.org/src/contrib/Archive/smoother/smoother_1.3.tar.gz', repos = NULL, type = 'source')

Installation

To install scATS, you have two options: either install directly from GitHub or use the compressed source file.

Via GitHub (only for v0.5.6):

if (!require("devtools")) install.packages("devtools")
devtools::install_github("LuChenLab/r-scATS", force = TRUE)

# or
if (!require("remotes")) install.packages("remotes")
remotes::install_github("LuChenLab/r-scATS")

Via Source File (for v0.5.6 and v0.5.5):

Alternatively, you can install scATS using the source file downloaded from the repository :

# Install scATS from a downloaded source file
R CMD INSTALL scATS_0.5.6.tar.gz

# or
install.packages("scATS_0.5.6.tar.gz", repos=NULL)

Installation Logs

To ensure reproducibility, we provide comprehensive installation logs for three different installation methods across multiple R versions:

Docker Guide

We provide a pre-configured Docker container to eliminate system-specific dependency conflicts.

Step 1: Install Docker

If you do not have Docker installed, please follow the official Docker installation guide.

Step 2: Run the scATS Container

Pull the pre-built image from the GitHub Container Registry and start an interactive session:

# Pull the image
docker pull ghcr.io/kayla-xu/scats-evi:r451

# Run the container
docker run -it ghcr.io/kayla-xu/scats-evi:r451

Step 3: Build Docker Locally (Optional)

If you prefer to build the image locally, use the provided Dockerfiles. Building logs are also provided for your reference.

# Build the base environment
docker build --no-cache \
  -t scATS/base:r451 \
  -f Dockerfile_base_r451 . \
  2>&1 | tee Dockerfile_base_r451.log

# Build the final scATS image
docker build --no-cache \
  -t scATS/evi:r451 \
  -f Dockerfile_scats_r451 . \
  2>&1 | tee Dockerfile_scats_r451.log

Quick start

infer TSS

Now, let's test scATS with a function (TSSCDF) that directly processes 5′ scRNA-seq data stored in a Seurat object. Before running the analysis, we first check the help message of TSSCDF. In R, you can always access the documentation of any function using ?TSSCDF or help("TSSCDF").

#--- check out the help message of TSSCDF
library(scATS)
?TSSCDF
# or
help("TSSCDF")

It should print the help message as the followings:

TSSCDF(
  object,
  bam,
  sep = "_",
  genes,
  gtfFile,
  txdb = NULL,
  UTROnly = FALSE,
  MinGeneReads = 500,
  scDR = FALSE,
  MinscTSSReads = 5,
  mapqFilter = 255,
  isSecondaryAlignment = FALSE,
  isSupplementaryAlignment = FALSE,
  isDuplicate = NA,
  Upstream = 1000,
  min.TSS.percent = 5,
  min.local.percent = 1,
  p.cutoff = 0.01,
  window = 10,
  greedy = TRUE,
  cores = 1L,
  verbose = FALSE,
  project.name = NULL
)

Arguments	Description
object	A Seurat object.
bam, sep	Path of bam file(s), and the character string to separate the bams.
genes	Genes used for ATS inference and quantification. By default, all genes in the Seurat object are included.
gtfFile	Path of GTF file or Granges file of GTF file.
txdb	A TxDb object.
UTROnly	Only infer ATS within annotated 5’ UTR regions or first exons of transcripts without 5’ UTR (default=True).
scDR	Whether to estimate the single-cell level degradation rate of TSS. Setting TRUE triggers an additional inference step and vastly increases runtime (default=False).
MinGeneReads	The minimum reads of a gene for TSS inference (default=500).
MinscTSSReads	Minimum reads of single-cell TSS reads for single-cell degradation rate estimation when scDR = TRUE (default=5).
min.TSS.percent	Defines the minimum proportion of total supporting reads (i.e., the change in cumulative density, F1 – F0) that a candidate TSS region must contribute. This parameter filters out weak or noise-driven TSS peaks (default=5%).
min.local.percent	Based on the maximum local derivative change, this parameter evaluates whether a candidate TSS region exhibits sufficiently sharp local enrichment. It is used to remove locally insignificant or low-confidence peaks (default=1%).
mapqFilter, isSecondaryAlignment, isSupplementaryAlignment, isDuplicate	parameters for ScanBamParam to read the BAM file(s).
p.cutoff	A significance threshold used to determine whether the smoothed derivative exceeds the background derivative distribution, where the cutoff is computed from a fitted normal model. Smaller values of p.cutoff impose stricter criteria and retain only high-confidence TSS candidates (default=0.01).
window	Controls the Gaussian smoothing window and is also used to define local flanking regions and extend candidate TSS boundaries. Larger windows produce smoother trends with reduced resolution, while smaller windows increase sensitivity at the cost of higher noise (default=10).
greedy	greedy for the first annotated TSS? If this parameter is set to TRUE, the most distal annotated TSS will be printed even if it does not satisfy the specified condition (default=True).
cores	The number of cores for parallel working.(default=1).
verbose	A logical controlling if a text progress bar is displayed.(default=False).
project.name	The name of project.(default=NULL).

In practice, MinGeneReads is a key parameter controlling the stringency of TSS detection, as it specifies the minimum number of effective reads supporting a gene required for TSS inference. To help users select an appropriate threshold, scATS provides the built-in function loadbamgene, which allows users to inspect the distribution of effective supporting reads per gene using their own data. For example:

scATS:::loadbamgene(gene = "GATD3B",
            txdb = txdb,
            bam = bam,
            region = as("21:5079294-5128413:-", "GRanges"))

By examining this distribution, users may choose a suitable MinGeneReads value based on the sequencing depth and signal quality of their dataset. Reducing MinGeneReads increases the number of detected TSSs but inevitably introduces more noise, while higher values make detection more stringent.

The detailed installation and usage of scATS are available in our document. Demo data can be obtained from repository.

Disease model

The pipeline for identifying disease-related TSSs is not included in the scATS R package. All code and model related to this workflow are provided separately and can be found in here.

1. Requirements

Packages	Version
pandas	>= 2.0.3
matplotlib	>= 3.7.5
numpy	>= 1.20.3
shap	>= 0.44.1
scikit-learn	>= 1.3.2

The code was developed and tested in a Python 3.8 environment. A complete Conda environment specification file (LRS.yml) is provided in here, Users can recreate the exact environment by running:

conda env create -f LRS.yml
conda activate LRS

2. model development:

The training dataset contains 33 features. Detailed feature categories can be found in the column headers of the data file.

The dataset was split into training, validation, and test sets using StratifiedKFold cross-validation with n_splits=10, shuffle=True, and random_state=42.

Features were standardized using StandardScaler.

A Random Forest Classifier (RandomForestClassifier) was used as the base model.

Both preprocessing and the classifier were wrapped into a Pipeline, ensuring they can be saved and reloaded together.

Resources in this repository:

Training data: dataset.xlsx

Trained model file: final_LRS_RF.pkl

You can directly load these data from here for predictions without applying feature scaling again. Example:

import pickle

# Load the trained pipeline
with open("rf_pipeline_model.pkl", "rb") as f:
    model = pickle.load(f)

See the detailed documentation for model training.

Citing

A pre-print is going to be uploaded soon.

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
Codes		Codes
LRS		LRS
R		R
demo		demo
docs		docs
logs		logs
man		man
.readthedocs.yaml		.readthedocs.yaml
DESCRIPTION		DESCRIPTION
Dockerfile_base_r451		Dockerfile_base_r451
Dockerfile_scats_r451		Dockerfile_scats_r451
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
README.md		README.md
scATS.Rproj		scATS.Rproj
scATS.png		scATS.png
scATS_0.5.5.tar.gz		scATS_0.5.5.tar.gz
scATS_0.5.6.tar.gz		scATS_0.5.6.tar.gz

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

scATS

Contents

Overview

Requirements

Installation

Installation Logs

Docker Guide

Quick start

infer TSS

Disease model

1. Requirements

2. model development:

Citing

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

scATS

Contents

Overview

Requirements

Installation

Installation Logs

Docker Guide

Quick start

infer TSS

Disease model

1. Requirements

2. model development:

Citing

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages