This appendix keeps the operational details in one place so the README can stay clean. For biological context on the liver cell compartments, see biology_primer_liver_fibrosis.md.
- `config/`: Dataset paths, marker panels, scoring weights
- `workflow/`: Ordered R workflow stages
- `src/R/`: Shared R helpers
- `scripts/`: Data prep, validation, evidence enrichment, checks
- `nextflow/`: Local and AWS workflow scaffold
- `dashboard/`: Shiny app and dashboard-ready CSVs
- `reports/`: Executive report, written responses, figures, tables
- `docs/`: Method walkthrough and technical appendix
- `data/demo/`: Tiny tracked demo dataset
- `data/metadata/`: Curated public-data manifests
Primary input:
- GSE136103 GEO processed count matrices: `barcodes.tsv.gz`, `genes.tsv.gz`, `matrix.mtx.gz`
- Manifest generated by `workflow/02_curate_metadata.R`
Validation inputs:
- GSE244832 processed liver matrix and metadata for MASH/HSC validation
- GSE207310 bulk count tables and GEO phenotype metadata for NAFLD/NASH directionality
- GSE136103 blood and mouse libraries for marker specificity and preclinical conservation checks
For proprietary or future datasets, the expected minimum metadata are:
| Field | Purpose |
|---|---|
| `sample_id` | Stable sample identifier |
| `donor` | Biological replicate |
| `disease_state` | Control, MASH, cirrhosis, fibrosis bin, or source label |
| `tissue` | Liver, blood, other tissue source |
| `species` | Human, mouse, other |
| `assay_type` | scRNA-seq, snRNA-seq, bulk RNA-seq, spatial |
| `batch` | Chemistry, site, run, or processing batch |
Recommended fields include fibrosis stage, MASLD/MASH category, biopsy source, sex, age, BMI, diabetes status, medication, and clinical covariates.
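
As an illustration of how the minimum-metadata contract above could be checked before a new dataset enters the workflow, here is a minimal shell sketch. The function names `missing_fields` and `check_samplesheet` are hypothetical and not part of this repo, and the sketch assumes the metadata arrive as a plain comma-separated sheet whose header has no quoted or embedded commas.

```shell
# Hypothetical helpers (not part of this repo): verify that a CSV sample
# sheet carries the minimum metadata fields listed in the table above.
# Assumes a plain CSV header without quoted or embedded commas.

REQUIRED_FIELDS="sample_id donor disease_state tissue species assay_type batch"

# Print the required fields that are absent from the header row.
missing_fields() {
  local sheet="$1" header field missing=""
  # Read the header row and drop a trailing carriage return if present.
  header="$(head -n 1 "$sheet" | tr -d '\r')"
  for field in $REQUIRED_FIELDS; do
    # Wrap in commas so that "donor" does not match "donor_age".
    case ",$header," in
      *",$field,"*) ;;
      *) missing="$missing $field" ;;
    esac
  done
  # Emit the missing names separated by single spaces (empty if none).
  echo "${missing# }"
}

# Return non-zero and explain on stderr when any field is missing.
check_samplesheet() {
  local missing
  missing="$(missing_fields "$1")"
  if [ -n "$missing" ]; then
    echo "Missing required fields: $missing" >&2
    return 1
  fi
  echo "All required fields present"
}
```

After sourcing the functions, `check_samplesheet path/to/samplesheet.csv` exits non-zero when a column is absent, so it could gate a later step. It only checks column presence, not the values in each column.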
make check
make fetch-data
make curate
make analyze
make refine-labels
make pseudobulk
make pathfindr
make prioritize
make validation
make hsc-validation
make gse244832-focused
make gse207310-validation
make secondary-validation
make evidence
make translational-evidence
make dashboard
make render-summary
make validate-repo

The full local workflow is:

make all

- R 4.6.0
- Seurat 5.5.0
- pathfindR 2.6.0
- Java/OpenJDK for pathfindR active-subnetwork search and Nextflow
- `renv.lock` for package pinning
- `Dockerfile` for containerized execution
- Nextflow plus Java for local and AWS pipeline execution
Restore R packages:
Rscript -e "renv::restore()"

Executive and interpretation:
- `reports/executive_submission_summary.html`
- `reports/executive_submission_summary.Rmd`
- `reports/executive_submission_summary.md`
- `docs/analysis_walkthrough.md`
- `docs/biology_primer_liver_fibrosis.md`
- `reports/screening_responses/README.md`
Core tables:
- `reports/tables/qc_by_library.csv`
- `reports/tables/qc_filtered_by_library_compartment.csv`
- `reports/tables/compartment_de_cell_level_exploratory.csv`
- `reports/tables/pseudobulk_de_by_refined_state.csv`
- `reports/tables/pseudobulk_priority_gene_de.csv`
- `reports/tables/hallmark_pathway_enrichment.csv`
- `reports/tables/pathfindr_pseudobulk_run_summary.csv`
- `reports/tables/pathfindr_pseudobulk_reactome_enrichment.csv`
- `reports/tables/ranked_biomarker_target_candidates_translational.csv`
- `reports/tables/target_prioritization_scoring_components.csv`
- `reports/tables/target_prioritization_scoring_method.csv`
Validation tables:
- `reports/tables/gse244832_focused_object_candidate_summary.csv`
- `reports/tables/validation_gse207310_candidate_lm_results.csv`
- `reports/tables/gse136103_blood_candidate_marker_role_summary.csv`
- `reports/tables/gse136103_mouse_candidate_ortholog_summary.csv`
Core figures:
- `reports/figures/required_compartment_marker_dotplot.png`
- `reports/figures/umap_refined_cell_states.png`
- `reports/figures/pseudobulk_priority_gene_de.png`
- `reports/figures/pathfindr_pseudobulk_reactome_barplot.png`
- `reports/figures/pathfindr_pseudobulk_reactome_dotplot.png`
- `reports/figures/ranked_candidate_scores.png`
- `reports/figures/gse244832_focused_object_validation_heatmap.png`
- `reports/figures/gse207310_candidate_validation_heatmap.png`
- `reports/figures/gse136103_blood_candidate_marker_heatmap.png`
- `reports/figures/gse136103_mouse_candidate_ortholog_heatmap.png`
Tracked:
- code
- config
- metadata manifests
- demo dataset
- compact tables and figures
- dashboard-ready CSVs
- reports and documentation
Not tracked:
- raw GEO archives
- extracted validation matrices
- large Seurat objects
- logs
- private notes or assignment context
Large analysis objects should live in local ignored folders for this repo and in S3/EFS for AWS runs.
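
One way to keep the tracked/untracked split above honest is a size guard run before committing. The sketch below is hypothetical (the function name `large_files` and the 50 MB default are illustrative assumptions, not a repo convention): it reads file paths on stdin and prints any that exceed a byte threshold.

```shell
# Hypothetical guard (not part of this repo): read file paths on stdin and
# print those larger than a byte limit. The 50 MB default is an assumption.
large_files() {
  local limit="${1:-52428800}" path size
  while IFS= read -r path; do
    # Skip paths that are not regular files (deleted, directories, etc.).
    [ -f "$path" ] || continue
    size="$(wc -c < "$path")"
    # wc may pad its output with spaces; arithmetic expansion normalizes it.
    if [ "$((size))" -gt "$limit" ]; then
      echo "$path"
    fi
  done
}
```

For example, `git ls-files | large_files` would list tracked files above the default limit. An empty result suggests that large objects are staying in the ignored folders.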
The target-evidence enrichment uses:
- MyGene.info for identifiers
- Open Targets for tractability and safety-liability annotations
- ClinicalTrials.gov for liver-context trial text matches
- ClinVar for gene-level clinical variant counts
- UniProt for protein localization, tissue specificity, and function
- PubMed for perturbation and safety literature signal
- `babelgene` for human-to-mouse orthology
These are triage layers. They do not replace donor-aware disease biology, validation data, protein localization, or perturbation experiments.
Local demo:
make nextflow-demo

The demo outputs a compact report plus QC, embedding, candidate DE, pathway-theme, and ranked-candidate artifacts under `reports/nextflow_demo/`. It is a small contract test for a dataset-independent workflow, not a substitute for the full Seurat run.
Direct Nextflow run:
export PATH="/opt/homebrew/opt/openjdk/bin:$PATH"
nextflow run nextflow/fibrotarget_demo -profile local --outdir reports/nextflow_demo

AWS pattern:
export NXF_WORK=s3://<bucket>/fibrotarget-liver/work
export PIPELINE_IMAGE=<account>.dkr.ecr.us-west-2.amazonaws.com/fibrotarget-liver:latest
nextflow run nextflow/fibrotarget_demo -profile aws \
--input s3://<bucket>/demo/demo_samplesheet.csv \
--outdir s3://<bucket>/results/fibrotarget-demo

Expected production services:
- S3 for raw, processed, report, and dashboard outputs
- ECR for the analysis image
- AWS Batch or ECS for execution
- CloudWatch for logs
- Parameter Store or Secrets Manager for protected configuration
- Cell-level DE is exploratory; donor-level pseudobulk is the preferred inferential layer.
- GSE244832 has focused local validation; full all-gene reanalysis belongs on larger compute.
- GSE136103 mouse validation has one healthy and one fibrotic mouse sample, so it is a conservation screen, not a powered preclinical DE analysis.
- Macrophage candidates need a macrophage-focused external atlas and spatial validation before therapeutic nomination.
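
To make the first limitation above concrete, the following sketch shows only the aggregation step behind donor-level pseudobulk: per-cell counts are summed within each donor and gene, so that the donor rather than the cell becomes the unit of replication. It is not the repo's implementation, which runs in R. The function name `pseudobulk_sum` and the long-format input layout (tab-separated `cell_id`, `donor`, `gene`, `count` with a header row) are assumptions made for illustration.

```shell
# Hypothetical illustration (not the repo's R implementation): collapse
# per-cell counts to one summed count per donor and gene.
# Assumed input: TSV with header and columns cell_id, donor, gene, count.
pseudobulk_sum() {
  awk -F'\t' '
    NR > 1 { total[$2 "\t" $3] += $4 }          # skip header, sum by donor+gene
    END    { for (key in total) print key "\t" total[key] }
  ' "$1" | LC_ALL=C sort                        # sort for a stable row order
}
```

Each output row is `donor<TAB>gene<TAB>summed_count`. Any downstream differential expression test would then compare donors, which is why a design with one sample per condition, such as the mouse libraries noted above, cannot support a powered test.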