[to be determined]
In this project, we investigate the pathogenesis of biliary atresia (BA) by generating and sequencing patient-derived multilineage biliary organoids (MBOs). We used bulk RNA sequencing and single-cell RNA sequencing to investigate markers of epithelial-mesenchymal transition (EMT) in these MBOs.
For the bulk RNAseq, we performed quality assessment and pre-processing using FastQC, Trim Galore, Cutadapt, and SAMtools. We aligned the reads to the hg38 genome using Bowtie2 and removed low-quality alignments and PCR duplicates using SAMtools and Picard MarkDuplicates. We quantified gene expression using htseq-count, analyzed differential expression using DESeq2, and performed PCA using FactoMineR. PCA plots, box plots, heatmaps, and volcano plots were generated using ggplot2, pheatmap, and EnhancedVolcano.
For the proteomics data, we analyzed differential expression using the R stats package and visualized the results as a heatmap using pheatmap.
For the single-cell portion of the paper, we used Cell Ranger to generate gene-by-cell expression matrices, SoupX to correct for ambient RNA, and Seurat to perform QC and data integration and to produce most of the plots and statistics. In addition to generating the integrated dataset, we quantified differential cell type proportions between BA MBOs and normal control (NC) MBOs with Seurat, performed gene set enrichment using Enrichr, scored cells for known EMT transcriptional markers, and used CellPhoneDB to predict cell-cell communication.
The following script was used to process all bulk RNAseq data and to generate figures:
- All analyses and figure generation (bulk_analysis.R)
The following files contain the code for the proteomics analyses:
- Proteomics_analysis_code.R
- heatmap_code.R
The following scripts were used to perform the basic steps that generate the integrated single-cell dataset:
- Cell Ranger (cellranger6.sh, run.sh)
- SoupX (SoupX.sh and .R)
- QC and filtering (QC_and_Filtering.sh and .R)
- Make Seurat objects (Seurat.sh and .R)
- Sample integration (Seurat_Integration.sh and .R)
The following scripts were used to perform specific analyses on the integrated dataset:
- Differential cell type proportions (UMAP_and_Differential_Proportions.R)
- Gene set enrichment analysis (Enrichr.R)
- EMT scoring and statistics (EMT_Scoring.R)
- Cell-cell communication prediction with CellPhoneDB (CPDB_Prep.R, CPDB_Run.sh, CPDB_Post.R)
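
To make the differential cell type proportion step more concrete, here is a minimal Python sketch of the underlying arithmetic: count cells per type within each sample, convert the counts to fractions, and average those fractions per condition. It is an illustration only and is not the Seurat-based code in UMAP_and_Differential_Proportions.R. The sample IDs, cell type labels, and function names are hypothetical, and no statistical test is included.

```python
from collections import Counter, defaultdict


def cell_type_proportions(cells):
    """Compute per-sample cell type fractions.

    cells: iterable of (sample_id, cell_type) pairs, one pair per cell.
    Returns {sample_id: {cell_type: fraction of that sample's cells}}.
    """
    counts = defaultdict(Counter)
    for sample_id, cell_type in cells:
        counts[sample_id][cell_type] += 1

    proportions = {}
    for sample_id, type_counts in counts.items():
        total = sum(type_counts.values())
        proportions[sample_id] = {t: n / total for t, n in type_counts.items()}
    return proportions


def mean_proportion_by_group(proportions, sample_groups, cell_type):
    """Average the per-sample fraction of one cell type within each group."""
    by_group = defaultdict(list)
    for sample_id, fractions in proportions.items():
        # A cell type that is absent from a sample counts as a fraction of 0.
        by_group[sample_groups[sample_id]].append(fractions.get(cell_type, 0.0))
    return {group: sum(vals) / len(vals) for group, vals in by_group.items()}


# Toy input with hypothetical sample IDs and cell type labels.
cells = (
    [("BA_1", "cholangiocyte")] * 6 + [("BA_1", "mesenchymal")] * 4
    + [("NC_1", "cholangiocyte")] * 9 + [("NC_1", "mesenchymal")] * 1
)
sample_groups = {"BA_1": "BA", "NC_1": "NC"}

props = cell_type_proportions(cells)
group_means = mean_proportion_by_group(props, sample_groups, "mesenchymal")
print(group_means)
```

Computing fractions per sample before averaging keeps a single large sample from dominating the group-level estimate, which is one common reason to work with proportions rather than pooled counts.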
The following supporting scripts and files are needed to run some of the scripts above:
- Gene information metadata sheet, needed for QC and filtering (gene_info_GRCh38_PvG200204.txt)
- Signature genes for EMT scoring; the set described in the paper is in the second column (EMT_signatures.xlsx)
- Functions used in gene scoring, prior to modification, from Single-cell_BPDCN_Functions.R at https://github.com/petervangalen/Single-cell_BPDCN/ (200116_FunctionsGeneral.R)
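
As a rough illustration of what scoring cells for a gene signature involves, the Python sketch below computes, for each cell, the mean expression of the signature genes minus the mean expression of all genes. This is a simplified stand-in chosen for clarity. The scheme actually used in EMT_Scoring.R and 200116_FunctionsGeneral.R may differ, for example in how background or control genes are chosen. The gene names, expression values, and the function name `signature_score` are hypothetical.

```python
def signature_score(expression, signature_genes):
    """Score each cell for a gene signature.

    expression: {gene: [expression value per cell]}; all lists share one length.
    signature_genes: gene names in the signature; genes missing from
        `expression` are skipped.
    Returns a list with one score per cell:
        mean(signature genes) - mean(all genes).
    """
    present = [g for g in signature_genes if g in expression]
    if not present:
        raise ValueError("none of the signature genes are in the expression data")

    n_cells = len(next(iter(expression.values())))
    scores = []
    for i in range(n_cells):
        sig_mean = sum(expression[g][i] for g in present) / len(present)
        background_mean = sum(v[i] for v in expression.values()) / len(expression)
        scores.append(sig_mean - background_mean)
    return scores


# Toy expression data for three cells; gene names are placeholders.
expression = {
    "GENE_A": [2.0, 0.0, 1.0],
    "GENE_B": [4.0, 0.0, 1.0],
    "GENE_C": [0.0, 3.0, 1.0],
    "GENE_D": [0.0, 1.0, 1.0],
}

# GENE_X is absent from the data and is ignored.
emt_scores = signature_score(expression, ["GENE_A", "GENE_B", "GENE_X"])
print(emt_scores)
```

Subtracting a background mean is meant to reduce the influence of overall expression level or sequencing depth on the score, so that a cell scores highly only when the signature genes are elevated relative to the rest of its transcriptome.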