Contents
Software packages
RNA-seq
-
anchor - [Python] - ⚓ Find bimodal, unimodal, and multimodal features in your data
-
bonvoyage - [Python] - 📐 Transform percentage-based units into a 2d space to evaluate changes in distribution with both magnitude and direction.
-
Cell_BLAST - [Python] - A BLAST-like toolkit for scRNA-seq data querying and automated annotation.
-
CellCNN - [Python] - Representation Learning for detection of phenotype-associated cell subsets
-
CellRanger - [Linux Binary] - Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate gene-cell matrices and perform clustering and gene expression analysis. Software requires registration with 10xgenomics.
-
Clustergrammer - [Python, JavaScript] - Interactive web-based heatmap for visualizing and analyzing high-dimensional biological data, including single-cell RNA-seq. Clustergrammer can be used within a Jupyter notebook as an interactive widget that can be shared using GitHub and NBviewer, see example notebook.
-
Clustergrammer2 - [Python, JavaScript] - Interactive WebGL web-based heatmap for visualizing and analyzing single-cell high-dimensional and location-based biological data. Clustergrammer2 can be used within a Jupyter notebook as an interactive widget that can be shared using GitHub and NBviewer, see case studies.
-
cyclum - [python] - Cyclum is a novel AutoEncoder approach that characterizes circular trajectories in the high-dimensional gene expression space. Applying Cyclum to removing cell-cycle effects leads to substantially improved delineations of cell subpopulations, which is useful for establishing various cell atlases and studying tumor heterogeneity. bioRxiv
-
dropkick - [Python] - Automated cell filtering for single-cell RNA sequencing data.
-
dynamo - [Python] - Inclusive model of expression dynamics with scSLAM-seq and multiomics, vector field reconstruction and potential landscape mapping.
-
Falco - [AWS cloud] - Falco: A quick and flexible single-cell RNA-seq processing framework on the cloud.
-
FastProject - [Python] - Signature analysis on low-dimensional projections of single-cell expression data.
-
flotilla - [Python] - Reproducible machine learning analysis of gene expression and alternative splicing data
-
GPfates - [Python] - Model transcriptional cell fates as mixtures of Gaussian Processes
-
GSEApy - [Python] - GSEApy: Gene Set Enrichment Analysis in Python. GSEApy is a Python/Rust implementation of GSEA and a wrapper for Enrichr. GSEApy can be used for RNA-seq, ChIP-seq, and microarray data. It can be used for convenient GO enrichment and to produce publication-quality figures in Python.
-
HTSeq - [Python] - A Python library to facilitate programmatic analysis of data from high-throughput sequencing (HTS) experiments. A popular component of HTSeq is htseq-count, a script to quantify gene expression in bulk and single-cell RNA-Seq and similar experiments.
-
ICGS - [Python] - Iterative Clustering and Guide-gene Selection (Olsson et al. Nature 2016). Identify discrete, transitional and mixed-lineage states from diverse single-cell transcriptomics platforms. Integrated FASTQ pseudoalignment/quantification (Kallisto), differential expression, cell-type prediction and optional cell cycle exclusion analyses. Specialized methods for processing BAM and 10X Genomics sparse matrix files. Associated single-cell splicing PSI methods (MultIPath-PSI). Part of the AltAnalyze toolkit along with accompanying visualization methods (e.g., heatmap, t-SNE, SashimiPlots, network graphs). Easy-to-use graphical user and command-line interfaces.
-
InMoose - [Python] - InMoose is the Integrated Multi Omic Open Source Environment. It is a collection of tools for the analysis of omic data. Allows for batch effect correction, cohort QC, Differential Expression Analysis and Consensus Clustering.
-
ivis - [Python or R] - Structure-preserving dimensionality reduction in single-cell datasets.
-
kb-python - [Python] - kb-python is a python package for processing single-cell RNA-sequencing. It wraps the kallisto | bustools single-cell RNA-seq command line tools in order to unify multiple processing workflows.
-
knn-smoothing - [python or R or matlab] - The algorithm is based on the observation that across protocols, the technical noise exhibited by UMI-filtered scRNA-Seq data closely follows Poisson statistics. Smoothing is performed by first identifying the nearest neighbors of each cell in a step-wise fashion, based on variance-stabilized and partially smoothed expression profiles, and then aggregating their transcript counts.
-
MIMOSCA - [python] - A repository for the design and analysis of pooled single cell RNA-seq perturbation experiments (Perturb-seq).
-
nimfa - [Python] - Nimfa is a Python scripting library which includes a number of published matrix factorization algorithms, initialization methods, quality and performance measures and facilitates the combination of these to produce new strategies. The library represents a unified and efficient interface to matrix factorization algorithms and methods.
-
novoSpaRc - [Python] - Predict locations of single cells in space by solely using single-cell RNA sequencing data. An existing reference database of marker genes is not required, but significantly enhances performance if available. bioRxiv.
-
outrigger - [Python] - Outrigger is a program to calculate alternative splicing scores of RNA-Seq data based on junction reads and a de novo, custom annotation created with a graph database, especially made for single-cell analyses.
-
PyGMNormalize - [Python] - Python implementation of edgeR normalization method for count matrices.
-
RAPIDS-singlecell - [Python] - A GPU-accelerated tool leveraging RAPIDS for scRNA analysis. Seamless scverse compatibility for efficient single-cell data processing and analysis. Replicates features from Scanpy, while also incorporating select functionalities from Squidpy and Decoupler.
-
rMATS - [Python] - RNA-Seq Multivariate Analysis of Transcript Splicing.
-
Scanpy - [Python] - Scanpy provides computationally efficient tools that scale up to very large data sets and enables simple integration of advanced machine learning algorithms.
-
scbean - [Python] - Scbean integrates a range of models for single-cell data analysis, including dimensionality reduction, removing batch effects, transferring well-annotated cell type labels from scRNA-seq to scATAC-seq and spatially resolved transcriptomics, and joint analysis of paired multimodal single-cell data.
-
SCCAF - [Python] - Single Cell Clustering Assessment Framework (SCCAF) is a method for automated identification of putative cell types from single cell data by iteratively applying clustering and a machine learning approach. Putative cell type discovery from single-cell gene expression data
-
schist - [Python] - schist is a scanpy-compatible python library which implements Nested Stochastic Block Models to identify cell groups in single cell experiments.
-
scVI - [python] - scVI is a ready-to-use scalable framework for the probabilistic representation and analysis of gene expression in single cells (batch correction, visualization, clustering, and differential expression). Deep generative modeling for single-cell transcriptomics
-
scTDA - [Python] - scTDA is an object oriented python library for topological data analysis of high-throughput single-cell RNA-seq data. It includes tools for the preprocessing, analysis, and exploration of single-cell RNA-seq data based on topological representations.
-
scTCRseq - [python] - Map T-cell receptor (TCR) repertoires from single cell RNAseq.
-
singlet - [Python] - Single cell RNA-Seq analysis with phenotypes.
-
SPRING - [matlab, javascript, python] - SPRING is a collection of pre-processing scripts and a web browser-based tool for visualizing and interacting with high dimensional data. SPRING was developed for single cell RNA-Seq data but can be applied more generally.
-
scTOP - [Python] - Single-cell type order parameters. Physics-inspired method of processing single-cell RNA-seq and identifying cell fate, motivated by the epigenetic landscape.
Quality control
- gene_network_evaluation - [Python] - A flexible framework to evaluate the plausibility of gene programs inferred from single-cell genomic data. The assessment is broken down into themes such as goodness of fit (ability to explain the data), co-regulation, mechanistic interactions etc. Under each theme, multiple evaluation tasks are conceptualised and implemented using appropriate statistical tests.
Marker and differential gene expression identification
-
GPseudoClust - [Python] - Software that clusters genes for pseudotemporally ordered data and quantifies the uncertainty in cluster allocations arising from the uncertainty in the pseudotime ordering.
-
GiniClust - [Python/R] - GiniClust is a clustering method implemented in Python and R for detecting rare cell-types from large-scale single-cell gene expression data. GiniClust can be applied to datasets originating from different platforms, such as multiplex qPCR data, traditional single-cell RNAseq or newly emerging UMI-based single-cell RNAseq, e.g. inDrops and Drop-seq.
-
Phenotype Cover - [Python] - Provides two algorithms for marker selection (G-PC, CEM-PC) introduced in Multiset multicover methods for discriminative marker selection. Most marker selection methods focus on differential expression (DE) analysis. Although such methods work well for data with a few non-overlapping marker sets, they are not appropriate for large atlas-size datasets where several cell types and tissues are considered. To address this, we define the phenotype cover (PC) problem for marker selection and present algorithms that can improve the discriminative power of marker sets.
Cell clustering
-
BackSPIN - [Python] - Biclustering algorithm developed taking into account intrinsic features of single-cell RNA-seq experiments.
-
dropClust - [R/Python] - Efficient clustering of ultra-large scRNA-seq data.
Dimension reduction
-
torchdr - [python] - Dimensionality reduction toolbox using PyTorch, featuring various algorithms such as TSNE, UMAP, and more. Supports GPU acceleration to maximize computational efficiency.
-
scvis - [python] - Interpretable dimensionality reduction of single cell transcriptome data with deep generative models
-
ZIFA - [Python] - Zero-inflated dimensionality reduction algorithm for single-cell data.
-
scPRINT - [python] - scPRINT is pretrained on 50M cells and generates multiple cell embeddings from single cell RNAseq profiles. scPRINT: pre-training on 50 million cells allows robust gene network predictions
Archetypal analysis
- scAAnet - [Python] - scAAnet performs non-linear archetypal analysis through autoencoder networks to identify shared gene expression programs (GEPs) among heterogeneous cell populations and infer the relative activity of each GEP across cells.
Batch-effect removal
-
BatchEffectRemoval - [Python] - Removal of Batch Effects using Distribution-Matching Residual Networks
-
ResPAN - [Python] - ResPAN is a light structured Residual autoencoder and mutual nearest neighbor Pairing guided Adversarial Network for scRNA-seq batch correction.
-
TASC - [C++, python] - To account for cell-to-cell technical differences, we propose a statistical framework, TASC (Toolkit for Analysis of Single Cell RNA-seq), an empirical Bayes approach to reliably model the cell-specific dropout rates and amplification bias by use of external RNA spike-ins. TASC incorporates the technical parameters, which reflect cell-to-cell batch effects, into a hierarchical mixture model to estimate the biological variance of a gene and detect differentially expressed genes. More importantly, TASC is able to adjust for covariates to further eliminate confounding that may originate from cell size and cell cycle differences.
-
UNCURL - [Python] - Unsupervised and semi-supervised sampling effect removal for single-cell RNA-seq data.
Cell projection and unimodal integration
- Monet - [python] - A package for analyzing and integrating scRNA-Seq data using PCA-based latent spaces.
Simulation
-
dropsim - [R] - Simulating droplet based scRNA-seq data.
-
powsimR - [R] - Power analysis is essential to optimize the design of RNA-seq experiments and to assess and compare the power to detect differentially expressed genes. PowsimR is a flexible tool to simulate and evaluate differential expression from bulk and especially single-cell RNA-seq data making it suitable for a priori and posterior power analyses.
-
splatter - [R] - Splatter is a package for the simulation of single-cell RNA sequencing count data. It provides a simple interface for creating complex simulations that are reproducible and well-documented.
-
symsim - [R] - SymSim (Synthetic model of multiple variability factors for Simulation) is an R package for simulation of single cell RNA-Seq data.
Pseudotime and trajectory inference
-
CoSpar - [python] - CoSpar is a toolkit for dynamic inference by integrating state and lineage information. It gains superior robustness and accuracy by exploiting both the local coherence and sparsity of differentiation transitions, i.e., neighboring initial states share similar yet sparse fate outcomes. When only state information is available, CoSpar also improves upon existing dynamic inference methods by imposing sparsity and coherence.
-
ECLAIR - [python] - ECLAIR stands for Ensemble Clustering for Lineage Analysis, Inference and Robustness. Robust and scalable inference of cell lineages from gene expression data.
-
MERLoT - [R/python] - Reconstructing complex lineage trees from scRNA-seq data using MERLoT.
-
ouijaflow - [python] - A descriptive marker gene approach to single-cell pseudotime inference
-
Palantir - [Python] - Characterization of cell fate probabilities in single-cell data with Palantir
-
SCDIFF - [Python, JavaScript] - SCDIFF is a single-cell trajectory inference method with interactive visualizations powered by D3.js. SCDIFF utilizes TF regulatory information to mitigate the impact of the substantial noise in single-cell RNA-seq data (such as drop-out). With the TF regulatory information, SCDIFF is also able to predict the TFs (and their activation times) that drive cells to different cell fates. Such predictive power has been experimentally validated.
-
SCIMITAR - [Python] - Single Cell Inference of Morphing Trajectories and their Associated Regulation module (SCIMITAR) is a method for inferring biological properties from a pseudotemporal ordering. It can also be used to obtain progression-associated genes that vary along the trajectory, and progression co-associated genes that change their correlation structure over the trajectory.
-
scVelo - [Python] - scVelo is a scalable toolkit for RNA velocity analysis in single cells. It generalizes the concept of RNA velocity by relaxing previously made assumptions with a dynamical model. It can be used to identify putative driver genes, infer a latent time, estimate reaction rates of transcription, splicing and degradation, and detect competing kinetics.
-
TopSLAM - [python] - Extracting and using probabilistic Waddington's landscape recreation from single cell gene expression measurements
-
VELOCYTO - [Python, R] - Estimating RNA velocity in single cell RNA sequencing datasets.
Cell type identification and classification
-
scANVI - [python] - single-cell ANnotation using Variational Inference (scANVI) is a semi-supervised variant of scVI designed to leverage any available cell state annotations --- for instance when only one data set in a cohort is annotated, or when only a few cells in a single data set can be labeled using marker genes. Harmonization and Annotation of Single-cell Transcriptomics data with Deep Generative Models
-
DeepSort - [python] - A reference-free cell-type annotation tool for single-cell RNA-seq data using deep learning with a weighted graph neural network, which is learned based on the most comprehensive single-cell transcriptomics atlases involving 764,741 cells across 88 tissues of human and mouse. bioRxiv
-
ImmClassifier - [R,python,Docker] - A cell type annotation algorithm that employs a knowledge-based approach to annotating cells based on their underlying ontology and multitudes of previously-published data. By encoding immune cell hierarchy in a neural network, ImmClassifier is able to identify fine-grained cell types with high accuracy. By running in Docker the tool is platform-agnostic. bioRxiv
-
Celltypist - [Python] - Celltypist is an automated cell type annotation tool for scRNA-seq datasets on the basis of logistic regression classifiers optimized by the stochastic gradient descent algorithm. Celltypist provides several different models for predictions, with a current focus on immune sub-populations, in order to assist in the accurate classification of different cell types and subtypes.
-
scPRINT - [python] - scPRINT is pretrained on 50M cells to predict multiple cell labels de novo, from any single cell RNAseq profile. scPRINT: pre-training on 50 million cells allows robust gene network predictions
Doublet Identification
-
AMULET - [shell, Python, R] - A count based method for detecting multiplets from single nucleus ATAC-seq (snATAC-seq) data. Genome Biology
-
demuxlet - [shell] - Multiplexed droplet single-cell RNA-sequencing using natural genetic variation
-
DoubletDetection - [R, Python] - A Python3 package to detect doublets (technical errors) in single-cell RNA-seq count matrices. An R implementation is in development.
-
Scrublet - [Python] - Computational identification of cell doublets in single-cell transcriptomic data. bioRxiv
-
solo - [Python] - Doublet detection via semi-supervised deep learning.
Cell subsampling
- geosketch - [Python] - Method to subsample massive scRNA-seq datasets while preserving rare cell states. Resulting "sketch" accelerates clustering, visualization, and integration analyses. Paper
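The idea of subsampling while preserving rare cell states can be illustrated with a simplified, grid-based sketch. This is not the geosketch implementation or its API: the function `grid_sketch`, the fixed `cell_size`, and the toy data are all assumptions made for illustration. Points are binned into equal-size boxes, and samples are drawn box by box rather than point by point, so a sparsely populated region gets the same chance of representation as a dense one.

```python
import random
from collections import defaultdict


def grid_sketch(points, n_samples, cell_size, seed=0):
    """Return indices of a subsample drawn evenly across occupied grid boxes.

    Hypothetical helper for illustration; not part of geosketch.
    """
    rng = random.Random(seed)

    # Assign every point to the grid box that contains it.
    boxes = defaultdict(list)
    for idx, point in enumerate(points):
        key = tuple(int(x // cell_size) for x in point)
        boxes[key].append(idx)
    for members in boxes.values():
        rng.shuffle(members)

    # Visit boxes in round-robin order, taking one point per box per pass,
    # so dense boxes cannot crowd out sparse ones.
    target = min(n_samples, len(points))
    chosen = []
    while len(chosen) < target:
        for members in boxes.values():
            if members and len(chosen) < target:
                chosen.append(members.pop())
    return chosen


# Toy data (assumed): 1000 "common" cells and 5 "rare" cells far away.
data_rng = random.Random(0)
common = [(data_rng.random(), data_rng.random()) for _ in range(1000)]
rare = [(10 + 0.5 * data_rng.random(), 10 + 0.5 * data_rng.random())
        for _ in range(5)]
cells = common + rare

sketch = grid_sketch(cells, n_samples=10, cell_size=1.0)
n_rare_kept = sum(1 for i in sketch if i >= len(common))
print(f"{n_rare_kept} of {len(sketch)} sketched cells come from the rare group")
```

Uniform random sampling of 10 cells from this toy set would usually miss the rare group entirely, whereas sampling per box keeps it represented. A fixed grid is the simplest possible covering and is only practical in a low-dimensional embedding.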
Feature (Gene) imputation
-
MAGIC - [R, Python, MATLAB] - Markov Affinity-based Graph Imputation of Cells (MAGIC). A diffusion-based imputation method reveals gene-gene interactions in single-cell RNA-sequencing data. Available on bioRxiv and published in Cell.
-
scPRINT - [python] - scPRINT is pretrained on 50M cells to denoise and perform zero imputation of any single cell RNAseq profile. scPRINT: pre-training on 50 million cells allows robust gene network predictions
Variant calling
-
cb_sniffer - [python] - Mutation barcode caller, calls mutant and ref barcodes from 10x single cell data.
-
cerebra - [python] - Cerebra is a tool for high-throughput summarizing of vcf entries following traditional variant calling for a sequencing experiment. Helps to extract relevant mutation information from among tens of thousands of vcf lines.
-
monovar - [python] - Monovar is a single nucleotide variant (SNV) detection and genotyping algorithm for single-cell DNA sequencing data. It takes a list of bam files as input and outputs a vcf file containing the detected SNVs.
-
SSrGE - [python] - SSrGE is an approach to identify SNVs correlated with Gene Expression using multiple regularized linear regressions. It contains its own pipeline to infer SNVs from scRNA-seq reads and is able to identify and sort genes and SNVs for a given cell subgroup. Nature Communications (2017) Using single nucleotide variations in single-cell RNA-seq to identify subpopulations and genotype-phenotype linkage.
Epigenomics
-
AtacWorks - [python] - AtacWorks is a deep learning tool to denoise and identify accessible chromatin regions from low-coverage, low cell count, or low-quality ATAC-seq data. AtacWorks can denoise signal and identify peaks from rare cellular subtypes in a mixed population. bioRxiv
-
BIRD - [C++/R] - BIRD is a tool for predicting chromatin accessibility and inferring regulatory element activities in single cells using scRNA-seq. Global prediction of chromatin accessibility using small-cell-number and single-cell RNA-seq
-
DeepCpG - [python] - DeepCpG is a deep neural network for predicting the methylation state of CpG dinucleotides in multiple cells. It can be used to accurately impute incomplete DNA methylation profiles, to discover predictive sequence motifs, and to quantify the effect of sequence mutations.
-
EpiScanpy - [python] - EpiScanpy is the epigenomic extension of the scRNA-seq analysis tool Scanpy. It analyses single-cell open chromatin (scATAC-seq) and single-cell DNA methylation (for example scBS-seq) data. EpiScanpy: integrated single-cell epigenomic analysis
-
SCALE - [python] - SCALE is a deep learning tool combining GMM with VAE for single-cell ATAC-seq analysis (visualization, clustering, imputation, batch effect removal, downstream analysis for cell type-specific TFs). SCALE method for single-cell ATAC-seq analysis via latent feature extraction
-
scbs - [python] - A command line tool for the analysis of Single-Cell Bisulfite-Sequencing data. scbs makes it easy to obtain a cell×region methylation matrix (≈count matrix) from raw single-cell methylation files and enables efficient storage, quality control and visualization. Furthermore, scbs allows you to scan the entire genome for variably or differentially methylated regions (VMRs or DMRs), and implements a novel approach for more accurate quantification of methylation in genomic intervals. bioRχiv
-
scE2G - [python] - Family of models to predict enhancer-gene regulation based on single-cell data. These models use features derived from either single-cell ATAC-seq or multiomic RNA and ATAC-seq data in supervised classifiers trained on a new harmonized CRISPR perturbation dataset including over 13,000 examples.
Multi-assay data integration
-
CITE-seq-Count - [python] - CITE-seq-Count is a Python package that deals with Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) and cell hashing data. CITE-seq is a multimodal single cell phenotyping method that allows for immunophenotyping of cells with a potentially limitless number of markers and unbiased transcriptome analysis using existing single-cell sequencing approaches.
-
Cobolt - [python] - Cobolt is a novel method that not only allows for analyzing the data from joint-modality platforms, but provides a coherent framework for the integration of multiple datasets measured on different modalities. Cobolt: integrative analysis of multimodal single-cell sequencing data
-
gimVI - [python] - gimVI is a joint generative model for imputation of missing genes in spatial transcriptomics assay from unpaired scRNA-seq data. A joint model of unpaired data from scRNA-seq and spatial transcriptomics for imputing missing gene expression measurements
-
GLUE - [python] - GLUE (Graph-Linked Unified Embedding) is a deep learning method for unpaired single-cell multi-omics data integration and regulatory inference (Paper).
-
MATCHER - [python] - MATCHER: An algorithm for integrating single cell transcriptomic and epigenomic data using manifold alignment. MATCHER takes multiple types of single cell measurements performed on distinct single cells and infers single cell multi-omic profiles.
-
MultiVI - [python] - MultiVI is a probabilistic framework that leverages deep neural networks to jointly analyze scRNA, scATAC and multiomic (scRNA + scATAC) data. MultiVI: deep generative model for the integration of multi-modal data
-
MOFA - [python, R] - Multi‐Omics Factor Analysis, a framework for unsupervised integration of multi‐omics data sets. MOFA is a method for disentangling the different sources of heterogeneity in bulk and single-cell multi-omics data sets. It identifies the latent factors that drive unique and shared variability in the different assays. The factors can be used for visualisation, pseudotime reconstruction, imputation, among other functionalities. Paper
-
SCALEX - [python] - SCALEX provides a VAE framework for integration of heterogeneous single-cell data by disentangling batch-invariant components from batch-related variations and projecting the batch-invariant components into a generalized, low-dimensional cell-embedding space. Construction of continuously expandable single-cell atlases through integration of heterogeneous datasets in a generalized cell-embedding space
-
scarf - [python] - 🧣 Toolkit for highly memory-efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas-scale datasets with millions of cells on a laptop. Preprint
-
scDART - [python] - scDART is a deep learning framework that integrates scRNA-seq and scATAC-seq data and learns cross-modalities relationships simultaneously. scDART: integrating unmatched scRNA-seq and scATAC-seq data and learning cross-modality relationship simultaneously
-
SISUA - [python] - In this study, we propose models based on the Bayesian generative approach, where protein quantification available as CITE-seq counts from the same cells are used to constrain the learning process, thus forming a semi-supervised model. The generative model is based on the deep variational autoencoder (VAE) neural network architecture. bioRxiv
-
TotalVI - [python] - Total Variational Inference (totalVI) is a coupled generative model and inference procedure for CITE-seq data. TotalVI handles modeling of the background noise of protein measurements, harmonization of multiple CITE-seq experiments and imputation of missing proteins. A Joint Model of RNA Expression and Surface Protein Abundance in Single Cells
Rare cell detection
- FiRE - [python, R, C++] - Finder of rare entities (FiRE) helps identify rare cell types in voluminous single-cell datasets. The design of FiRE is inspired by the observation that estimating the rareness of a data point is the flip side of measuring the density around it. In principle, FiRE uses the Sketching technique, a variant of locality sensitive hashing, to assign a rareness score to every cell. Paper
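The link between rareness and local density described above can be made concrete with a small hashing sketch. This is an illustrative toy and not the FiRE implementation: the function `rareness_scores`, its parameters, and the scoring formula are assumptions. Each hash cuts a few randomly chosen features at random thresholds; points that repeatedly land in sparsely populated buckets sit in low-density regions and receive a higher score.

```python
import random
from collections import Counter


def rareness_scores(points, n_hashes=100, bits_per_hash=2, seed=0):
    """Score each point by how sparsely populated its hash buckets are.

    Hypothetical helper for illustration; not part of FiRE.
    """
    rng = random.Random(seed)
    n_features = len(points[0])
    lows = [min(p[j] for p in points) for j in range(n_features)]
    highs = [max(p[j] for p in points) for j in range(n_features)]

    total_occupancy = [0] * len(points)
    for _ in range(n_hashes):
        # One hash = a few random features, each cut at a random threshold.
        dims = [rng.randrange(n_features) for _ in range(bits_per_hash)]
        cuts = [rng.uniform(lows[j], highs[j]) for j in dims]
        codes = [tuple(p[j] > c for j, c in zip(dims, cuts)) for p in points]
        bucket_sizes = Counter(codes)
        for i, code in enumerate(codes):
            total_occupancy[i] += bucket_sizes[code]

    # Low average bucket occupancy means low local density, i.e. high rareness.
    # Every bucket contains at least the point itself, so scores lie in (0, 1].
    return [n_hashes / occ for occ in total_occupancy]


# Toy data (assumed): 50 points in a tight cluster plus one isolated point.
data_rng = random.Random(1)
cells = [(data_rng.gauss(0, 0.1), data_rng.gauss(0, 0.1)) for _ in range(50)]
cells.append((10.0, 10.0))

scores = rareness_scores(cells)
rarest = max(range(len(cells)), key=scores.__getitem__)
print("index of the rarest point:", rarest)
```

Because the score comes from bucket counts rather than pairwise distances, the cost grows linearly with the number of cells for a fixed number of hashes, which is what makes this family of approaches attractive for large datasets.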
Single cell large model
-
geneformer - [Python] - A large single-cell model trained on 30 million human single-cell transcriptomes, supporting batch integration, gene dosage sensitivity prediction, chromatin dynamics prediction, network dynamics prediction, etc. Open-access manuscript: Transfer learning enables predictions in network biology
-
scGPT - [Python] - A large single-cell model trained on 33 million human single-cell transcriptomes, supporting single-cell annotation, batch integration, and perturbation prediction. Open-access manuscript: scGPT: toward building a foundation model for single-cell multi-omics using generative AI
-
scFoundation - [Python] - A large single-cell model with 100 million parameters trained on 50 million human single-cell transcriptomes, supporting single-cell clustering, drug response prediction, perturbation prediction, single-cell annotation, gene module inference, etc. Open-access manuscript: Large-scale foundation model on single-cell transcriptomics
-
CellPLM - [Python] - The first single-Cell Pre-trained Language Model that encodes cell-cell relations. It consistently outperforms existing pre-trained and non-pre-trained models in diverse downstream tasks, with 100x higher inference speed than existing pre-trained models. It was trained on 9 million scRNA-seq cells and 2 million SRT cells. Open-access manuscript: CellPLM: Pre-training of Cell Language Model Beyond Single Cells
Other applications
-
BASIC - [python] - BASIC is a semi-de novo assembly method to determine the full-length sequence of the BCR in single B cells from scRNA-seq data.
-
dropSeqPipe - [python, R, snakemake] - An automatic data handling pipeline for drop-seq/scrb-seq data. It runs from raw fastq.gz data until the final count matrix with QC plots along the way.
-
ffq - [python] - Fetch run and metadata information for single-cell genomics datasets.
-
gget - [Python] - gget is a free, open-source command-line tool and Python package that enables efficient querying of genomic databases. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying in a single line of code.
-
immunarch - [R] - R Package for Fast and Painless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires
-
SCope - [python] - SCope is a fast visualization tool for large-scale and high dimensional scRNA-seq datasets. Publication here.
-
scTE - [python] - Quantifying transposable element expression from single-cell sequencing data.
-
SCIFER - [shell, python] - Approach for analysis of LINE-1 mRNA expression in single cells at a single locus resolution.
-
Sinto - [python] - A toolkit for working with aligned single-cell reads. Includes functions to split BAM files by cell barcode, add cell barcodes as read tags, move cell barcode information to read groups, and create a scATAC-seq fragment file from a BAM file.
-
sircel - [python] - sircel (pronounced "circle") separates reads in a fastq file based on barcode sequences that occur at known positions of reads. This is an essential first step in analyzing single-cell genomics data from experiments such as Drop-Seq. Barcode sequences often contain deletion and/or mismatch errors that arise during barcode synthesis and sequencing, and we have designed our barcode recovery approach with these issues in mind. In addition to identifying barcodes in an unbiased manner, sircel also quantifies their abundances. doi
-
Snakemake single-cell-rna-seq workflow - [python, R, snakemake] - An automated pipeline for single cell RNA-seq analysis.
-
Wishbone - [python] - Wishbone is an algorithm to identify bifurcating developmental trajectories from single cell data. Wishbone can be applied to both single cell RNA-seq and mass cytometry datasets.
Spatial transcriptomics
-
cell2location - [Python] - A Bayesian model, based on the negative binomial distribution, that performs spatial deconvolution of SRT data and creates cellular maps of diverse tissues. Open-access manuscript: Cell2location maps fine-grained cell types in spatial transcriptomics
-
DestVI - [Python] - A spatial deconvolution method built on a single-cell latent variable model, scLVM (variational auto-encoder architecture). Open-access manuscript: DestVI identifies continuums of cell types in spatial transcriptomics data
-
DSTG - [Python] - A spatial deconvolution method designed with graph-based convolutional networks. Open-access manuscript: DSTG: deconvoluting spatial transcriptomics data through graph-based artificial intelligence
-
MERFISHtools - [Python] - MERFISHtools implements a Bayesian framework for accurately predicting gene or transcript expression from MERFISH data.
-
NMFreg - [Python] - The method was proposed in the Slide-seq paper and reconstructs the expression of each Slide-seq bead as a weighted combination of metagene factors, each corresponding to the expression signature of an individual cell type, defined from scRNA-seq.
-
PASTE - [Python] - A spatial alignment tool for aligning homogeneous spatial transcriptomic slices based on optimal transport and Euclidean distance. Open-access manuscript: Alignment and integration of spatial transcriptomics data
-
PASTE2 - [Python] - A spatial alignment tool for aligning homogeneous spatial transcriptomic slices based on the partial extension of the Fused Gromov-Wasserstein optimal transport. Open-access manuscript: PASTE2: Partial Alignment of Multi-slice Spatially Resolved Transcriptomics Data
-
SLAT - [Python] - SLAT aligns both homogeneous and heterogeneous single-cell spatial omics data (described by its authors as the first tool to handle the heterogeneous case) using a graph alignment framework that consists of an LGCN and an adversarial discriminator. Open-access manuscript: Spatial-linked alignment tool (SLAT) for aligning heterogenous slices
-
SpaGCN - [Python] - SpaGCN is a graph convolutional network to integrate gene expression and histology to identify spatial domains and spatially variable genes. Open-access manuscript: SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network
-
SpatialDE - [Python] - SpatialDE is a statistical test to identify genes with spatial patterns of expression variation from multiplexed imaging or spatial RNA-sequencing data.
-
SpatialPrompt - [Python] - SpatialPrompt is a spot deconvolution and domain identification method for spatially resolved transcriptomics datasets. Its main advantage is that it scales well to large datasets.
-
Splotch - [Python] Splotch is a hierarchical generative probabilistic model for analyzing Spatial Transcriptomics data.
-
squidpy - [Python] - Squidpy is a Python package for the analysis and visualization of spatial molecular data. It provides tools to process, analyze, and visualize spatial transcriptomics data, including spatially resolved transcriptomics and spatial proteomics. Squidpy: a scalable framework for spatial omics data analysis.
-
Starspace - [Python] - Defines a schema for gene or protein expression data containing spatially localized information. Converts data from a variety of assay types, including Spatial Transcriptomics, CODEX, In-situ Sequencing, MERFISH, osmFISH, and starMAP. Demonstrates how to visualize and interact with these data using common analysis packages, and convert the formats into loom and anndata objects, for downstream analysis in R and Python.
-
STAGATE - [Python] - STAGATE is designed for spatial clustering and expression denoising of spatially resolved transcriptomics (ST) data by learning low-dimensional latent embeddings from both spatial information and gene expression via a graph attention auto-encoder (GATE). manuscript open access: Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder
-
STAligner - [Python] - STAligner is designed for alignment and integration of spatially resolved transcriptomics data. It employs a graph attention auto-encoder neural network (GATE) to extract spatially aware embeddings and uses a triplet loss to update the spot embeddings, reducing the distance from each anchor to its positive spot. manuscript open access: STAligner enables the integration and alignment of multiple spatial transcriptomics datasets
-
Tangram - [Python] - Tangram maps single-cell (or single-nucleus) gene expression data onto spatial gene expression data by optimizing a purpose-built mapping objective. manuscript open access: Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram
Tutorials and workflows
-
Analysis of single cell RNA-seq data - [R and Python] - The course is taught through the University of Cambridge Bioinformatics training unit, but the material found on these pages is meant to be used for anyone interested in learning about computational analysis of scRNA-seq data.
-
Aaron Lun's Single Cell workflow on Bioconductor - [R] - This article describes a computational workflow for basic analysis of scRNA-seq data using software packages from the open-source Bioconductor project.
-
ATAC-Seq Pipeline - [Shell and R] - Chromatin accessibility landscape of pediatric T-lymphoblastic leukemia and human T-cell precursors.
-
Bioconductor2016 Single-cell-RNA-sequencing workshop by Sandrine Dudoit lab - [R] - SCONE, clusterExperiment, and slingshot tutorial.
-
BiomedCentral Single Cell Omics collection - collection of papers describing techniques for single-cell analysis and protocols.
-
Clustering 3K PBMCs with Scanpy in Galaxy - Galaxy Training Material.
-
CRUK CI Introduction to single-cell RNA-seq data analysis | website
-
CSHL Single Cell Analysis - Bioinformatics course materials - Uses Shalek 2013 and Macaulay 2016 datasets to teach machine learning to biologists
-
Dana-Farber Cancer Institute Trajectory inference across conditions: differential expression and differential progression | website
-
Dan Beiting DIY Transcriptomics | website - A hybrid course covering best practices for bulk and single cell RNA-seq data analysis, with a primary focus on empowering students to be independent in the use of lightweight and open-source software and the R/bioconductor environment.
-
ELIXIR EXCELERATE Single RNA-seq data analysis with R | website
-
Festival of Genomics California Single Cell Workshop - [R] - Explores basic workflow from exploratory data analysis to normalization and downstream analyses using a dataset of 1679 cells from the Allen Brain Atlas.
-
Gilad Lab Single Cell Data Exploration - R-based exploration of single cell sequence data. Lots of experimentation.
-
GPU accelerated single-cell analysis using RAPIDS - NVIDIA tutorials on using RAPIDS (https://www.rapids.ai/) to accelerate single-cell analysis on GPUs.
-
Harvard Chan Bioinformatics Core Single-cell RNA-seq data analysis workshop | website
-
Harvard Stem Cell Institute Single Cell Workshop 2015 - workshop on common computational analysis techniques for scRNA-seq data from differential expression to subpopulation identification and network analysis. See course description for more information.
-
kallistobustools - kallisto | bustools workflow for pre-processing single-cell RNA-seq data.
-
nf-core/scrnaseq - nf-core/scrnaseq is a bioinformatics best-practice analysis pipeline for processing 10x Genomics single-cell RNA-seq data. The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible.
-
nf-core/scflow - nf-core/scflow is a bioinformatics pipeline for scalable, reproducible, best-practice analyses of single-cell/nuclei RNA-sequencing data. The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner.
-
nf-core/scnanoseq - nf-core/scnanoseq is a Nextflow analysis pipeline for processing 10X Genomics single-cell/nuclei RNA-seq data derived from the Oxford Nanopore long-read sequencer. The pipeline has been designed to be scalable to large datasets (including PromethION data), portable and reproducible.
-
Orchestrating Single-Cell Analysis with Bioconductor - [R] - This blogdown book describes a comprehensive and reproducible workflow for the analysis of single-cell RNA-sequencing data.
-
Pre-processing of 10X Single-Cell RNA Datasets in Galaxy - Galaxy Training Material.
-
Theis Lab Single Cell Tutorial - The main part of this repository is a case study where the best-practices established in the manuscript are applied to a mouse intestinal epithelium regions dataset from Haber et al., Nature 551 (2018) available from the GEO under GSE92332.
-
Using Seurat (v1.2) for unsupervised clustering and biomarker discovery - 301 single cells across diverse tissues from (Pollen et al., Nature Biotechnology, 2014). Original tutorial using Seurat 1.2
-
Using Seurat (v1.2) for spatial inference in single-cell data - 851 single cells from zebrafish embryogenesis (Satija*, Farrell* et al., Nature Biotechnology, 2015). Original tutorial using Seurat 1.2
-
Seurat (v3.0) - Guided Clustering Tutorial - new tutorial using Seurat 3.0
-
SIB NBIS/SciLifeLab Advanced topics in Single Cell Omics | website
-
Wellcome Sanger Institute Analysis of single cell RNA-seq data | website
Web portals, apps, and databases
Web portals and databases
-
10X Genomics datasets - 10x genomics public datasets, including 1.3M cell mouse brain dataset.
-
ASAP - Automated Single-cell Analysis Pipeline (deposited in BioRXiv on December 22, 2016).
-
cellBrowser - [Python, Javascript] Python pipeline and Javascript scatter plot library for single-cell datasets. Demo
-
CellView - CellView is an R Shiny web application that allows knowledge-based and hypothesis-driven exploration of processed single cell transcriptomic data. ref.
-
Cell_BLAST - A Web portal powered by Cell_BLAST (scRNA-seq querying tool) and ACA (scRNA-seq database).
-
CELLxGENE - CELLxGENE is a suite of tools that help scientists to find, download, explore, analyze, annotate, and publish single cell datasets. It includes several powerful tools with various features to help you to engage with single cell data.
-
conquer - A repository of consistently processed, analysis-ready single-cell RNA-seq data sets.
-
Curated Database of single-cell studies - Available as a tsv download. Over 500 single cell transcriptomics studies have been published to date. Many of these have data available, but the links between data, study, and systems studied can be hard to identify through literature search. This manuscript describes a nearly exhaustive and manually curated database of single cell transcriptomics studies with descriptions of what kind of data and what biological systems have been studied. bioRxiv.
-
D^3^E - Discrete Distributional Differential Expression (D^3^E) is a tool for identifying differentially-expressed genes, based on single-cell RNA-seq data.
-
dseqr - Dseqr runs end-to-end multi-sample single-cell and bulk RNA-seq analyses using a user friendly web app built around best practices from the OSCA handbook. Features include pseudobulk differential expression analysis, automated cluster annotation, reference mapping with Azimuth, Gene Ontology analysis, and drug connectivity mapping. Projects can either be analysed online or locally using the dseqr R package.
-
EBI Single Cell Expression Atlas - The Single Cell Expression Atlas contains uniformly re-analysed single cell expression data across different species and provides interactive visualizations to explore that data.
-
Galaxy Single Cell Omics Workbench - dedicated Galaxy server for analyzing single cell data.
-
IRIS3 - IRIS3 (integrated cell-type-specific regulon inference server from single-cell RNA-Seq) is an easy-to-use server empowered by over 20 functionalities to support comprehensive interpretations and graphical visualizations of identified cell-type-specific regulons.
-
JingleBells - A repository of standardized single cell RNA-Seq datasets for analysis and visualization in IGV at the single cell level. Currently focused on immune cells (http://www.jimmunol.org/content/198/9/3375.long).
-
SCPortalen - SCPortalen: human and mouse single-cell centric database. ref
-
scRNA.seq.datasets - Collection of public scRNA-Seq datasets used by Hemberg Lab
-
scRNASeqDB - A database aggregating human single-cell RNA-seq datasets. ref
-
Single Cell Portal - The Single-Cell Portal was developed to facilitate open data and open science in Single-cell Genomics. The portal currently focuses on sharing scientific results interactively, and sharing associated datasets.
-
Single-Cell Tumor Immune Atlas project - [R] - We generated a single-cell tumor immune atlas, jointly analyzing >500,000 cells from 217 patients and 13 cancer types, providing the basis for a patient stratification based on immune cell compositions.
-
V-SVA - An R Shiny application for detecting and annotating hidden sources of variation in single cell RNA-seq data.
-
WOT - Waddington Optimal Transport (wot) uses time-course data to infer how the probability distribution of cells in gene-expression space evolves over time, by using the mathematical approach of optimal transport.
Interactive visualization and analysis
-
Cellar - [Python] - Cellar is an easy to use, interactive, and comprehensive software tool for the assignment of cell types in single-cell studies. It supports preprocessing, dimensionality reduction, clustering, differential expression & enrichment analysis, spatial transcriptomics, label transfer, semi-supervised clustering and more. A live web server running Cellar is available here. NatureComm.
-
cellBrowser - [Python, Javascript] Python pipeline and Javascript scatter plot library for single-cell datasets. Demo
-
Cirrocumulus - Cirrocumulus is an interactive visualization tool for large-scale single-cell genomics (e.g. sc/snRNA-seq, spatial) data.
-
CReSCENT - [R, Javascript, Python] - CReSCENT: CanceR Single Cell ExpressioN Toolkit (Mohanraj et al. 2020), is an intuitive and scalable web portal incorporating a containerized pipeline execution engine for standardized analysis of cancer scRNA-seq data and associated metadata. CReSCENT uses public data sets and preconfigured pipelines that are accessible to computational biology non-experts and are user-editable to allow optimization, comparison, and reanalysis for specific experiments. Users can also upload their own scRNA-seq data for analysis and results can be kept private or shared with other users.
-
FASTGenomics - [Python, R] - FASTGenomics is an online platform to share single-cell RNA sequencing data and analyses using reproducible workflows. Gene expression data can be shared meeting European data protection standards (GDPR). FASTGenomics enables the user to upload their own data and generate customized and reproducible workflows for the exploration and analysis of gene expression data (Scholz et al. 2018). Follow us on Twitter.
-
iS-CellR - iS-CellR (Interactive platform for Single-cell RNAseq) is a web-based Shiny app that integrates the Seurat package with Shiny's reactive programming framework to provide comprehensive analysis and interactive visualization of single-cell RNAseq data. Paper
-
PIVOT - Platform for Interactive analysis and Visualization Of Transcriptomics data. ref
-
scRNAseqApp - The scRNAseqApp is a Shiny app package designed for interactive visualization of single-cell data. It is an enhanced version derived from the ShinyCell, repackaged to accommodate multiple datasets. The app enables users to visualize data containing various types of information simultaneously, facilitating comprehensive analysis. Additionally, it includes a user management system to regulate database accessibility for different users.
-
singleCellTK - The singleCellTK is an R/Shiny package and GUI for analyzing and visualizing scRNA-Seq through a web interface. Analysis modules include data summary and filtering, dimensionality reduction and clustering, batch correction, differential expression analysis, pathway activity analysis, and power analysis. It also supports comprehensive generation, visualization, and reporting of quality control metrics for single-cell RNA sequencing data.
-
spatialGE-web - A web application providing point-and-click access to the methods in the spatialGE R package and other tools (including STdeconvolve, InSituType, SpaGCN). The web application requires no coding experience. User accounts can be created to safely keep samples and results organized within projects.
-
STREAM - STREAM is an interactive computational pipeline for reconstructing complex cellular developmental trajectories from sc-qPCR, scRNA-seq or scATAC-seq data. preprint.
-
Vitessce - [JavaScript, Python, R] - A framework for integrative visualization of multi-modal single-cell data, supporting microscopy images, cell segmentations, dimensionality reduction scatterplots, gene expression heatmaps, and genome browser tracks. Vitessce is packaged as a React component, Jupyter widget, and R htmlwidget.
Big data approach overview
Experimental design
-
Design and computational analysis of single-cell RNA-sequencing experiments
-
How to design a single-cell RNA-sequencing experiment: pitfalls, challenges and perspectives
-
Sensei - How to determine the number of patients required to ascertain a cell-type abundance change estimated from an scRNA-seq experiment.
People
Gender bias at conferences is a well-known problem (http://www.sciencemag.org/careers/2015/07/countering-gender-bias-conferences). Creating a list of potential speakers can help mitigate this bias, and a community of people developing and maintaining it helps to further diversify this list beyond smaller networks.
Female
-
Barbara Di Camillo (Information Engineering Department, University of Padova, Italy)
-
Jinmiao Chen (Singapore Immunology Network, A*STAR, Singapore)
-
Stephanie Hicks (Johns Hopkins Bloomberg School of Public Health, USA)
-
Christina Kendziorski (University of Wisconsin--Madison, USA)
-
Samantha Morris (Depts of Dev. Bio. and Genetics, Washington University, St. Louis)
-
Alicia Oshlack (Murdoch Children's Research Institute, Australia)
-
Charlotte Soneson (Institute of Molecular Life Sciences, University of Zurich)
-
Sanja Vickovic (New York Genome Center & Columbia University, USA)
Male
-
Bart DePlancke (EPFL, School of Life sciences, Institute of Bioengineering, Switzerland)
-
Raphael Gottardo (Fred Hutchinson Cancer Research Center, USA)
-
Chung Chau Hon (RIKEN Centre for Integrative Medical Sciences, Yokohama)
-
Peter Kharchenko (Department of Biomedical Informatics, Harvard Medical School, USA)
-
John Reid (MRC Biostatistics Unit, Cambridge University, UK)
-
Mark Robinson (Institute of Molecular Life Sciences, University of Zurich)
-
Yvan Saeys (Vlaams Instituut voor Biotechnologie, Ghent, Belgium)
-
Peter Sims (Columbia University, Department of Systems Biology)
-
Fabian Theis (Institute of Computational Biology, Helmholtz Zentrum München)
-
Cole Trapnell (University of Washington, Department of Genome Sciences)
-
Itai Yanai (New York University, School of Medicine, Institute for Computational Medicine, USA)