MIDAS

Interactive R/Shiny applications for 16S rRNA and ITS amplicon sequence analysis

MIDAS provides complete, code-free graphical interfaces for amplicon sequence analysis using the DADA2 pipeline. Two separate applications cover the most widely used marker genes in microbial ecology:

16S rRNA app (MIDAS_16S.R) — for bacterial and archaeal community profiling (10-step workflow)
ITS app (MIDAS_ITS.R) — for fungal community profiling (11-step workflow)

Both applications take raw paired-end Illumina FASTQ files as input and produce publication-ready figures, statistical test results, and downloadable data tables — all without writing a single line of code.

Please see below for more screenshots.

Installation

Prerequisites

R (version 4.0 or higher) — Download R
RStudio (recommended) — Download RStudio
cutadapt (ITS app only) — installed automatically via pip, or install manually

Quick Start

Clone the repository:

git clone https://github.com/beantkapoor786/MIDAS.git
cd MIDAS

Launch the 16S app:

shiny::runApp("MIDAS_16S.R")

If you recieve an error related to file opening, please double check that MIDAS_16S.R exists in your current working directory in R/Rstudio.

Or launch the ITS app:

shiny::runApp("MIDAS_ITS.R")

If you receive an error related to file opening, please double check that MIDAS_ITS.R exists in your current working directory in R/Rstudio.

On first launch, all required R packages will be installed automatically. This may take several minutes.

Required Packages

The apps automatically install missing packages on startup. The full dependency list:

CRAN: shiny, ggplot2, tidyverse, shinyjs, DT, shinycssloaders, callr, vegan

Bioconductor: dada2, phyloseq, Biostrings, DECIPHER, microbiome, ANCOMBC

ITS app additionally requires: ShortRead (Bioconductor), cutadapt (Python)

Usage

Input Data

Paired-end FASTQ files (.fastq, .fastq.gz, .fq, .fq.gz)
Files must be demultiplexed (one pair per sample)
Forward and reverse reads should follow a naming convention (e.g., _R1_001.fastq.gz / _R2_001.fastq.gz)

Metadata File

CSV or TSV format
First column must contain sample names matching the extracted sample names
Additional columns are metadata variables (e.g., Treatment, Site, Age)
Variables are auto-detected as factor or continuous, with manual override
Handles quoted-row CSVs (common Excel export format) automatically

Taxonomy Databases

16S app: SILVA v138.1 is downloaded automatically on first use (~130 MB)
ITS app: Provide the path to a UNITE reference FASTA file, or point to a directory for auto-download

Session Persistence

All pipeline state is saved automatically to your data directory after each step. When relaunching the app, you will be offered the option to resume from your last completed step.

Exports

Export	Format	Available At
ASV table	CSV	Taxonomy step
Taxonomy table	CSV	Taxonomy step
Read tracking table	CSV	Merge step
Full R workspace	.RData	Phyloseq step
Phyloseq object	.rds	Phyloseq step
All figures	PNG (300 DPI)	Figures step
PERMANOVA results	CSV	PERMANOVA step
Betadisper results	CSV	PERMANOVA step
Pairwise PERMANOVA	CSV	PERMANOVA step
ANCOM-BC2 global test	CSV	ANCOM-BC2 step
ANCOM-BC2 pairwise	CSV	ANCOM-BC2 step

Features

Shared Features (Both Apps)

Guided step-by-step workflow with real-time progress monitoring
Asynchronous processing — computationally intensive steps run in background subprocesses, keeping the interface responsive
Session persistence — automatically saves progress; resume analyses across sessions
Auto-detection of forward/reverse read file patterns
Configurable sample name extraction with live preview
Interactive quality profile visualization with downloadable plots
Full DADA2 filtering parameters exposed with sensible defaults
Background error learning, dereplication, and chimera removal
Read tracking table and plot showing retention at each pipeline step
Phyloseq integration with metadata upload (CSV/TSV) and variable type selection
Data transformation — rarefaction (configurable depth), relative abundance, or CLR (Centered Log-Ratio)
Interactive visualizations:
- Rarefaction curves (colored by group)
- Alpha diversity boxplots (Observed, Shannon, Simpson)
- NMDS ordination with 95% confidence ellipses
- PCoA ordination with variance-explained axes
- Taxonomic abundance bar plots (top N taxa at any rank)
PERMANOVA — free-text formula input, betadisper homogeneity test, pairwise comparisons with Bonferroni correction, command preview, CSV downloads
ANCOM-BC2 — differential abundance analysis with fixed/random effects formulas, collapsible advanced settings panel (all parameters), global test, pairwise directional test, async execution, CSV downloads
Publication-ready figure export at 300 DPI (PNG)
Dark-themed modern interface (Outfit + JetBrains Mono fonts)

16S-Specific Features

SILVA database (v138.1) auto-download for taxonomy assignment
Naive Bayesian classifier or IdTaxa (DECIPHER) for taxonomy
Species-level assignment via exact sequence matching
Fixed-length truncation (truncLen) for quality filtering

ITS-Specific Features

Primer removal using cutadapt with:
- Support for multiple forward/reverse primers (unequal counts allowed)
- Automatic N-prefiltering
- Primer orientation hit count table for verification
- Automatic cutadapt detection or installation via pip
No fixed-length truncation (biological length variation preserved)
Minimum length filtering (default 50 bp) to remove spurious sequences
UNITE database for fungal taxonomy assignment

Workflow

16S rRNA Pipeline (10 Steps)

Step	Name	Description
1	Setup & Files	Path input, file detection, sample name extraction
2	Quality Profiles	Visualize per-position quality scores
3	Filter & Trim	Quality filtering with configurable parameters
4	Dereplication	Error learning and sequence dereplication
5	Merge & Chimeras	Paired-read merging and chimera removal
6	Taxonomy	SILVA-based taxonomy assignment
7	Phyloseq	Object construction, metadata, transformation
8	Figures	Rarefaction, alpha/beta diversity, abundance plots
9	PERMANOVA	Multivariate community composition testing
10	ANCOM-BC2	Differential abundance analysis

ITS Pipeline (11 Steps)

Step	Name	Description
1	Setup & Files	Path input, file detection, sample name extraction
2	Quality Profiles	Visualize per-position quality scores
3	Primer Removal	Cutadapt-based primer trimming with verification
4	Filter & Trim	Quality filtering (no truncation, minLen = 50)
5	Dereplication	Error learning and sequence dereplication
6	Merge & Chimeras	Paired-read merging and chimera removal
7	Taxonomy	UNITE-based fungal taxonomy assignment
8	Phyloseq	Object construction, metadata, transformation
9	Figures	Rarefaction, alpha/beta diversity, abundance plots
10	PERMANOVA	Multivariate community composition testing
11	ANCOM-BC2	Differential abundance analysis

Screenshots

Setup & File Detection

Quality Profiles

Filter & Trim

DADA2 Error Learning & Dereplication

Merge Reads & Remove Chimeras

Read tracking table

Assign taxonomy

Phyloseq & Data Transformation

Rarefaction curve

Alpha Diversity

Ordination (PCoA)

PERMANOVA

ANCOM-BC2

Architecture

Both apps share a common architecture:

Asynchronous computation via callr::r_bg() background subprocesses — resolves the incompatibility between DADA2's fork-based multiprocessing and Shiny's reactive framework
JavaScript-driven instant progress indicators — animated progress bars appear immediately on button click, before Shiny's reactive flush
Reactive session persistence — auto-saves .RData to the data directory after each step
Dynamic UI — metadata variable types, primer inputs, and quality plot sample counts adapt to the data

Troubleshooting

Common Issues

App won't start / package installation fails

Ensure R >= 4.0 is installed

Try installing Bioconductor packages manually:

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c("dada2", "phyloseq", "ANCOMBC", "microbiome", "DECIPHER"))

Cutadapt not found (ITS app)

Install via pip: pip install cutadapt or pip3 install cutadapt
Or provide the full path to cutadapt in the app

Cutadapt produces no output files

Check primer sequences match your reads (verify via the primer hit count table)
Paths with spaces are handled automatically via shell quoting

Phyloseq construction fails / no sample names match

Ensure the first column of your metadata CSV contains sample names matching the extracted sample names shown in Step 1
The app handles CSVs with quoted rows (common Excel export issue) automatically

ANCOM-BC2 / PERMANOVA errors about missing variables

Both tools validate formula variables before running — check the error notification for the exact variable name and available metadata columns

Taxonomy step hangs

16S: SILVA database download may take time on first use
ITS: Ensure the UNITE database path points to an actual .fasta file or a directory containing one

Citation

If you use MIDAS in your research, please cite:

[Authors]. MIDAS: an interactive R/Shiny platform for end-to-end 16S rRNA and ITS amplicon sequence analysis using DADA2 [Journal], [Year]. DOI: [DOI]

Please also cite the underlying tools:

DADA2: Callahan BJ et al. (2016) DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods 13:581-583
phyloseq: McMurdie PJ & Holmes S (2013) phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE 8:e61217
ANCOM-BC2: Lin H & Peddada SD (2024) Multigroup analysis of compositions of microbiomes with covariate adjustments and repeated measures. Nature Methods 21:83-91
vegan: Oksanen J et al. (2022) vegan: Community Ecology Package. R package
cutadapt: Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17:10-12
SILVA: Quast C et al. (2013) The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Research 41:D590–D596
UNITE: Nilsson RH et al. (2019) The UNITE database for molecular identification of fungi: handling dark taxa and parallel taxonomic classifications. Nucleic Acids Research 47:D259–D264

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

Fork the repository
Create a feature branch (git checkout -b feature/my-feature)
Commit your changes (git commit -am 'Add my feature')
Push to the branch (git push origin feature/my-feature)
Open a Pull Request

License

This project is licensed under the MIT License — see the LICENSE file for details.

Acknowledgments

MIDAS builds on the outstanding work of the R/Bioconductor microbiome analysis ecosystem, including DADA2, phyloseq, vegan, ANCOM-BC2, microbiome, DECIPHER, and the Shiny framework.

Name		Name	Last commit message	Last commit date
Latest commit History 39 Commits
images		images
.DS_Store		.DS_Store
.gitignore		.gitignore
LICENSE.md		LICENSE.md
MIDAS_16S.R		MIDAS_16S.R
MIDAS_ITS.R		MIDAS_ITS.R
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

MIDAS

Installation

Prerequisites

Quick Start

Required Packages

Usage

Input Data

Metadata File

Taxonomy Databases

Session Persistence

Exports

Features

Shared Features (Both Apps)

16S-Specific Features

ITS-Specific Features

Workflow

16S rRNA Pipeline (10 Steps)

ITS Pipeline (11 Steps)

Screenshots

Setup & File Detection

Quality Profiles

Filter & Trim

DADA2 Error Learning & Dereplication

Merge Reads & Remove Chimeras

Read tracking table

Assign taxonomy

Phyloseq & Data Transformation

Rarefaction curve

Alpha Diversity

Ordination (PCoA)

PERMANOVA

ANCOM-BC2

Architecture

Troubleshooting

Common Issues

Citation

Contributing

License

Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages