vshi2316/EOAD-LOAD-Genetics

Genetic Heterogeneity of Early-Onset versus Late-Onset Alzheimer's Disease: From Polygenic Architecture to Cell-Type-Specific Mechanisms

License: MIT

Overview

This repository contains analysis code for a comprehensive investigation of the genetic heterogeneity between early-onset Alzheimer's disease (EOAD, onset < 65 years) and late-onset Alzheimer's disease (LOAD). The study integrates genome-wide association study (GWAS) summary statistics, spatial transcriptomics, transcriptome-wide association studies, polygenic risk scores, and multi-cohort clinical validation to characterize subtype-specific genetic architectures and their downstream biological consequences.

The central finding is a dissociation between EOAD and LOAD at multiple levels: EOAD exhibits an oligogenic architecture driven by the APOE locus with convergent effects on lipid metabolism and oligodendrocyte/myelination pathways, whereas LOAD displays a highly polygenic architecture dominated by microglial neuroinflammatory mechanisms. Cross-cohort validation in ADNI, A4, HABS, and AIBL confirms age-dependent expression of these genetic effects.

Repository Structure

EOAD-LOAD-Genetics/
├── 01_Pathway_Discovery/
│   └── EOAD_LOAD_Pathway_Discovery.R
├── 02_PRS_Construction/
│   └── PRS_LDpred2_Analysis.R
├── 03_ADNI_Validation/
│   └── ADNI_Validation.R
├── 04_External_Cohort_Validation/
│   └── A4_HABS_AIBL_Validation.R
├── LICENSE
└── README.md

Analysis Modules

1. Data-Driven Pathway Discovery (01_Pathway_Discovery/)

EOAD_LOAD_Pathway_Discovery.R

Gene-based and pathway-level dissection of EOAD versus LOAD genetic architecture:

  • MAGMA gene-based association (v1.10): gene boundaries extended 35 kb upstream / 10 kb downstream, 1000 Genomes Phase 3 EUR LD reference, Bonferroni threshold P < 0.05/19,000 protein-coding genes (≈ 2.6 × 10⁻⁶)
  • Graham et al. glial module enrichment: one-sided Fisher's exact tests mapping GWAS-significant genes onto five curated microglial and oligodendrocyte transcriptional programs (Human Microglial AD-Significant, Human Oligodendrocyte AD-Significant, Mouse ARM, Mouse Phagolysosomal, Mouse Longevity)
  • GSEA via clusterProfiler: genes ranked by MAGMA Z-statistics, GO-BP pathways, FDR < 0.05
  • MAGMA gene-property analysis: nine curated cell-type marker gene sets (T-cell, microglia, oligodendrocyte, Aβ clearance, APP metabolism, myelination, astrocyte, neuron, endothelial)
  • Conditional regression: testing T-cell signal independence after controlling for microglial gene expression
  • LOAD downsampling validation: 50 independent iterations matching LOAD effective sample size to EOAD (N ≈ 1,573), re-executing MAGMA gene-based and gene-set analyses to distinguish power-dependent from architecture-dependent pathway enrichment
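The Bonferroni threshold and the one-sided Fisher's exact enrichment test described above can be sketched in base R as follows. This is a minimal illustration on simulated gene lists, not the repository's code: the gene identifiers, the module size, and the number of significant genes are all hypothetical.

```r
# Toy gene universe and gene sets (simulated, not study results)
set.seed(1)
universe    <- paste0("GENE", 1:19000)       # ~19,000 protein-coding genes
module      <- sample(universe, 200)         # hypothetical curated glial module
significant <- c(sample(module, 10),         # hypothetical GWAS-significant genes
                 sample(setdiff(universe, module), 40))

# Bonferroni threshold for gene-based tests
bonferroni <- 0.05 / length(universe)

# 2x2 contingency table: significant (yes/no) by module membership (yes/no)
in_sig <- factor(universe %in% significant, levels = c(TRUE, FALSE))
in_mod <- factor(universe %in% module,      levels = c(TRUE, FALSE))
tab <- table(significant = in_sig, module = in_mod)

# One-sided test: are significant genes over-represented in the module?
fit <- fisher.test(tab, alternative = "greater")
print(tab)
print(fit$p.value)
```

With `alternative = "greater"` the test asks only whether the odds ratio exceeds 1, which matches an over-representation hypothesis; depletion of the module would not produce a small P value.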

2. Polygenic Risk Score Construction and Transcriptome-Wide Association (02_PRS_Construction/)

PRS_LDpred2_Analysis.R

Bayesian PRS construction, pathway-specific partitioning, and S-PrediXcan TWAS:

  • S-PrediXcan TWAS: GTEx v8 prediction models across five brain regions (Cortex, Anterior Cingulate Cortex BA24, Frontal Cortex BA9, Hippocampus, Putamen basal ganglia); genetically predicted expression of 49 MAGMA-identified oligodendrocyte-myelin pathway genes; two-sided Wilcoxon rank-sum tests comparing median predicted Z-scores between oligodendrocyte and background genes, one-sided tests confirming directional effects, Levene's test assessing expression variability; LOAD and multivariate aging GWAS as controls to verify EOAD specificity
  • LDpred2-auto: 30 MCMC chains, jointly estimating SNP heritability (h²) and polygenicity (p); chain bagging aggregating posterior effect estimates across converged chains; EOAD, LOAD, and multivariate aging summary statistics harmonised to GRCh37 coordinates and matched to HapMap3+ European LD reference panel
  • Pathway-specific PRS: six gene sets (T-cell activation, activated microglia, Aβ clearance, APP metabolism, oligodendrocyte, myelination); SNPs within 100 kb of pathway gene boundaries retained with LDpred2 weights, all remaining SNPs masked to zero
  • Cellular origin burden: percentage of total genetic risk attributable to each pathway
  • APOE sensitivity: APOE-excluded PRS generated by masking variants within Chr19:44,000,000–46,000,000 bp (GRCh37); APOE contribution calculated as percentage reduction in pathway burden after masking
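The pathway masking and APOE sensitivity steps above could look roughly like the base-R sketch below. Everything in it is an assumption for illustration: the SNP positions, weights, and gene boundaries are toy values, the helper `in_window()` is hypothetical, and the burden is defined here as the share of summed squared weights, which is one possible definition that this README does not confirm.

```r
# Toy SNP table with hypothetical posterior weights (GRCh37-style positions)
snps <- data.frame(
  chr  = c(19, 19, 19, 1, 1),
  pos  = c(45411941, 45500000, 30000000, 1000000, 5000000),
  beta = c(0.30, 0.05, 0.02, 0.01, -0.04)
)
# Hypothetical pathway gene boundaries (illustrative coordinates only)
genes <- data.frame(chr = c(19, 1),
                    start = c(45409011, 950000),
                    end   = c(45412650, 1050000))

# TRUE for SNPs that fall inside any region, optionally extended by a flank
in_window <- function(snps, regions, flank = 0) {
  vapply(seq_len(nrow(snps)), function(i) {
    any(regions$chr == snps$chr[i] &
        snps$pos[i] >= regions$start - flank &
        snps$pos[i] <= regions$end + flank)
  }, logical(1))
}

# Keep weights within 100 kb of pathway genes, mask everything else to zero
keep      <- in_window(snps, genes, flank = 1e5)
w_pathway <- ifelse(keep, snps$beta, 0)

# APOE-excluded version: additionally mask Chr19:44,000,000-46,000,000
apoe      <- data.frame(chr = 19, start = 44e6, end = 46e6)
w_no_apoe <- ifelse(in_window(snps, apoe), 0, w_pathway)

# Assumed burden definition: share of summed squared weights
burden_pct    <- 100 * sum(w_pathway^2) / sum(snps$beta^2)
apoe_drop_pct <- 100 * (1 - sum(w_no_apoe^2) / sum(w_pathway^2))

# Score simulated genotype dosages and standardize to z-scores
set.seed(42)
G   <- matrix(rbinom(20 * nrow(snps), 2, 0.3), nrow = 20)
prs <- as.numeric(scale(G %*% w_pathway))
```

Masking weights to zero, rather than dropping SNPs, keeps every pathway score on the same SNP set, so the scores can be computed from one genotype matrix.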

3. ADNI Validation (03_ADNI_Validation/)

ADNI_Validation.R

Primary validation cohort analysis (N = 812):

  • Pathway-specific PRS: six PRS computed by applying LDpred2 posterior weights to imputed genotypes, standardized to z-scores
  • Neuroimaging phenotypes: FreeSurfer pipeline — total WMH volume, bilateral hippocampal volume, bilateral entorhinal cortical volume, all normalized to intracranial volume; 68 cortical and subcortical regions (Desikan–Killiany atlas)
  • CSF biomarkers: AlzBio3 immunoassay platform (Aβ42, total tau, phosphorylated tau); sTREM2 for microglial activation assessment
  • Linear regression: PRS associations with neuroimaging and CSF biomarkers, adjusting for age, sex, APOE ε4 allele count, and education
  • Age stratification: younger (<70 years) versus older (≥70 years) subgroups; age × PRS interaction terms to evaluate age-dependent effects
  • MCI-to-dementia conversion: logistic regression (N = 482 MCI, 190 conversions, median follow-up 4.0 years); age-dependent reversal of oligodendrocyte pathway genetic risk (age × PRS interaction OR = 0.54, P = 0.005)
  • Sliding-window analysis: overlapping 10-year windows with 2-year increments across ages 55–85; Benjamini–Hochberg FDR correction across windows
  • Unsupervised k-means clustering: three genetic subtypes (Background_Risk, Oligo_Driven, High_Aβ) from six pathway-specific PRS; hierarchical clustering achieving 78% concordance
  • Regional brain atrophy: subtype-specific patterns across 68 cortical and subcortical regions, ggseg visualization
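As a rough illustration of the sliding-window and interaction analyses listed above, the following base-R sketch fits a PRS effect within overlapping 10-year age windows stepped by 2 years, applies Benjamini-Hochberg correction across windows, and fits an age × PRS interaction on the full sample. The data are simulated and the variable names (`prs`, `age`, `sex`, `y`) are hypothetical; the actual covariate set in ADNI_Validation.R may differ.

```r
# Simulated cohort (not ADNI data); the PRS effect is made to depend on age
set.seed(7)
n   <- 600
dat <- data.frame(age = runif(n, 55, 85),
                  prs = rnorm(n),
                  sex = rbinom(n, 1, 0.5))
dat$y <- 0.02 * (dat$age - 70) * dat$prs + rnorm(n)

# Overlapping 10-year windows with 2-year increments across ages 55-85
starts <- seq(55, 75, by = 2)
res <- do.call(rbind, lapply(starts, function(s) {
  sub <- dat[dat$age >= s & dat$age <= s + 10, ]
  fit <- lm(y ~ prs + age + sex, data = sub)
  co  <- summary(fit)$coefficients["prs", ]
  data.frame(start = s, end = s + 10, n = nrow(sub),
             beta = co[["Estimate"]], p = co[["Pr(>|t|)"]])
}))

# Benjamini-Hochberg FDR correction across windows
res$p_fdr <- p.adjust(res$p, method = "BH")

# Formal age x PRS interaction on the full sample
fit_int <- lm(y ~ prs * age + sex, data = dat)
print(res)
print(summary(fit_int)$coefficients["prs:age", ])
```

Because adjacent windows share most of their participants, the window-level tests are strongly correlated; the interaction model gives a single formal test of the age dependence that the windows visualize.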

4. External Cohort Validation (04_External_Cohort_Validation/)

A4_HABS_AIBL_Validation.R

Cross-cohort mechanistic validation spanning preclinical to clinical AD. Only ADNI provided genome-wide genotyping for PRS computation; for A4, HABS, and AIBL, which lacked genotyping data, WMH served as a phenotypic proxy for oligodendrocyte dysfunction.

  • HABS (N = 1,490): phenotype-driven mediation analysis testing whether WMH exerts an indirect effect on cognition through p-tau217 (average causal mediation effect estimated via 5,000 bootstrap iterations, confirmed by structural equation modelling); age-stratified SEM (<75 versus ≥75 years); formal Age × p-tau217 interaction term on cognition
  • A4 Study (N = 1,260): linear regression with heteroscedasticity-consistent standard errors (HC3) assessing WMH–cognition association (Preclinical Alzheimer Cognitive Composite), adjusting for age, sex, APOE ε4, education, and Centiloid amyloid burden
  • AIBL (N = 408): Cox proportional hazards regression for conversion to dementia with APOE ε4 as primary predictor; clinical utility of WMH for predicting cognitive impairment evaluated using Firth-corrected logistic regression, AUC, net reclassification improvement, and decision curve analysis
  • Participant flow assessment: quantification of potential selection bias for each cohort
  • Cross-cohort forest plot visualization: standardized effect sizes with 95% CI
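For readers unfamiliar with HC3 standard errors, the sketch below computes them by hand in base R for a linear model on simulated data. It is meant only to show what the estimator does; in practice the same covariance matrix is available from `sandwich::vcovHC(fit, type = "HC3")`. The variable names (`wmh`, `age`, `pacc`) and the simulated effect sizes are hypothetical.

```r
# Simulated data with heteroscedastic noise (not A4 data)
set.seed(11)
n   <- 300
dat <- data.frame(wmh = rnorm(n), age = rnorm(n, 72, 5))
dat$pacc <- -0.3 * dat$wmh + rnorm(n, sd = exp(0.3 * dat$wmh))

fit <- lm(pacc ~ wmh + age, data = dat)
X   <- model.matrix(fit)
e   <- residuals(fit)
h   <- hatvalues(fit)          # leverage of each observation

# HC3 sandwich: (X'X)^-1 X' diag(e^2 / (1 - h)^2) X (X'X)^-1
bread  <- solve(crossprod(X))
meat   <- crossprod(X * (e / (1 - h)))
vc_hc3 <- bread %*% meat %*% bread
se_hc3 <- sqrt(diag(vc_hc3))

# Coefficient table with robust standard errors and t-based P values
est <- coef(fit)
tab <- cbind(estimate = est,
             se_hc3   = se_hc3,
             t        = est / se_hc3,
             p        = 2 * pt(abs(est / se_hc3),
                               df = df.residual(fit), lower.tail = FALSE))
print(tab)
```

Dividing each residual by (1 - h) inflates the contribution of high-leverage observations, which is why HC3 is usually the more conservative choice in moderate samples.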

Requirements

R Version

  • R >= 4.2.0

R Packages

# CRAN packages
install.packages(c(
  "data.table", "dplyr", "tidyr", "ggplot2", "cowplot",
  "survival", "survminer", "lme4", "lmerTest",
  "cluster", "factoextra", "pheatmap", "corrplot",
  "pROC", "lavaan", "logistf", "PRROC",
  "sandwich", "lmtest", "mediation", "meta",
  "ggseg", "flexsurv", "dcurves", "patchwork",
  "viridis", "RColorBrewer",
  "bigsnpr"  # provides LDpred2; distributed on CRAN, not Bioconductor
))

# Bioconductor packages
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c(
  "clusterProfiler", "org.Hs.eg.db",
  "org.Mm.eg.db", "biomaRt",
  "TxDb.Hsapiens.UCSC.hg19.knownGene",
  "GenomicRanges"
))

Data Availability

GWAS Summary Statistics

| Dataset | Source | Sample Size | Population | PubMed ID | Access |
|---|---|---|---|---|---|
| EOAD | FinnGen R11 | Cases: 1,573; Controls: 199,505 | European (Finnish) | 36653562 | Download |
| LOAD | EADB Consortium | Cases: 85,934; Controls: 401,577 | European (Multi-country) | 35379992 | EBI GWAS (GCST90027158) |
| Aging | Timmers et al. | Healthspan: 300,477; Lifespan: 1,012,240; Longevity: 36,745 | European | 32678081, 31413261 | Edinburgh DataShare |

Dataset Details

Early-Onset Alzheimer's Disease (EOAD)

  • Definition: Age of onset < 65 years (ICD-10 code G30.0)
  • Endpoint: AD_EO_EXMORE (strict exclusion criteria removing other dementia subtypes)
  • Genotyping: Illumina Global Screening Array
  • Imputation: SISu v3 Finnish reference panel

Late-Onset Alzheimer's Disease (LOAD)

  • Composition: 39,106 clinically diagnosed + 46,828 proxy cases
  • Diagnosis: NINCDS-ADRDA or DSM criteria
  • Cohorts: 42 cohorts across Europe and North America
  • SNPs: 21,101,114 SNPs post-QC

Multivariate Aging

  • Healthspan: Years free from major age-related diseases
  • Parental lifespan: Age at death or current age
  • Longevity: Survival to 90th percentile vs 60th percentile

White Matter Microstructure (UK Biobank BIG40)

| Tract | UKB Field | Sample Size | Description |
|---|---|---|---|
| Corpus Callosum Body | 25059 | 33,224 | Primary inter-hemispheric motor/premotor tract |
| Cingulum Hippocampus (R) | 25092 | 33,224 | Limbic tract connecting cingulate-entorhinal-hippocampus |
| Uncinate Fasciculus (R) | 25100 | 33,224 | Temporal-orbitofrontal connection for semantic memory |
| Fornix | 25061 | 33,224 | Primary hippocampal efferent pathway (Papez circuit) |

  • Phenotype: Mean fractional anisotropy (FA) derived from dMRI processed using tract-based spatial statistics (TBSS)
  • PubMed ID: 30305740
  • Access: UK Biobank BIG40

Human Spatial Transcriptomic Datasets

| Dataset | Region | Source | Spatial Units | Clusters | Access |
|---|---|---|---|---|---|
| LIBD Spatial Atlas | Anterior Hippocampus | Lieber Institute for Brain Development | 4,992 | 9 unsupervised transcriptomic clusters | GEO GSE264692 |
| Maynard et al. | Dorsolateral Prefrontal Cortex (DLPFC) | Sample 151673 | 3,639 | 8 layer-resolved clusters | spatialLIBD |

  • Platform: 10X Visium spatial transcriptomics

Mouse Spatial Transcriptomic Datasets (Cross-Species Validation)

| Dataset | Description | Source |
|---|---|---|
| MOSTA Adult Mouse Brain | Hippocampal subfields, cortical layers, white matter tracts | MOSTA |
| MOSTA Embryonic Mouse Brain (E16.5) | Developmental control | MOSTA |

Validation Cohorts

| Cohort | N | Description | Access |
|---|---|---|---|
| ADNI | 812 | Genome-wide genotyping, structural MRI, CSF biomarkers, longitudinal follow-up | adni.loni.usc.edu |
| A4 Study | 1,260 | Cognitively normal adults with elevated amyloid; Centiloid scale | ida.loni.usc.edu |
| HABS | 1,490 | Multimodal imaging, plasma p-tau217 (ALZpath Simoa v2), WMH | habs.mgh.harvard.edu |
| AIBL | 408 | Non-North American replication, longitudinal conversion | aibl.csiro.au |

LD Reference Panel

  • HapMap3+ EUR variants from the UK Biobank (Privé et al. 2022)
  • 1000 Genomes Phase 3 European panel (~1.2 million HapMap Phase 3 SNPs for LDSC)
  • Available via the bigsnpr R package

External Tools

Computational workflows employed publicly available implementations:

| Tool | Version/URL | Purpose |
|---|---|---|
| LDSC | GitHub | Cross-trait genetic correlation and SNP heritability |
| HESS | GitHub | Local SNP heritability partitioning across 1,703 LD-independent regions |
| MAGMA | v1.10, Website | Gene-based association and gene-property analysis |
| LAVA | GitHub | Local genetic correlation across ~2,495 independent loci |
| conjFDR | GitHub | Conditional/conjunctional FDR for pleiotropic variant discovery |
| FUMA | v1.5.4, Web | Functional mapping and annotation (CADD, RegulomeDB, chromatin states) |
| gsMap | GitHub | Spatial transcriptomic enrichment mapping (Cauchy combination test) |
| S-PrediXcan | GitHub | Transcriptome-wide association with GTEx v8 prediction models |
| LDpred2 | via bigsnpr R package | Bayesian PRS construction (LDpred2-auto) |
| g:Profiler | Web | GO/KEGG pathway enrichment of pleiotropic gene sets |

Reproducibility Notes

  1. All scripts assume GWAS summary statistics have been downloaded and formatted with columns: chr, pos, a0, a1, beta, beta_se, p, n_eff.
  2. ADNI, A4, HABS, and AIBL data require approved data use agreements from their respective consortia.
  3. File paths in the scripts use placeholder variables (e.g., base_dir, data_dir) that should be modified to match your local directory structure.
  4. LDpred2 chromosome-wise SFBM computation is memory-intensive; 32 GB RAM recommended.
  5. MAGMA gene-based analysis requires pre-computed annotation files (.genes.annot) generated from the 1000 Genomes Phase 3 EUR reference panel.
  6. S-PrediXcan requires GTEx v8 prediction model databases (.db files) and covariance files for each brain tissue.
  7. gsMap requires spatial transcriptomic data in standard 10X Visium format; human gene symbols are converted to mouse orthologs via Ensembl BioMart for cross-species analyses.
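The column layout expected in note 1 can be produced with a few lines of base R. The sketch below is an assumption-laden example: the raw column names (`chrom`, `position`, `ref`, `alt`, `effect`, `se`, `pval`) are hypothetical and differ between GWAS releases, `alt` is assumed to be the effect allele, and the effective sample size uses the common case-control formula 4 / (1/N_cases + 1/N_controls), which the scripts may or may not apply.

```r
# Hypothetical raw summary statistics (toy rows, made-up values)
raw <- data.frame(
  chrom    = c("1", "19", "X"),
  position = c(752566L, 45411941L, 100000L),
  ref      = c("G", "T", "A"),
  alt      = c("A", "C", "G"),      # assumed to be the effect allele
  effect   = c(0.01, 1.10, 0.02),
  se       = c(0.02, 0.05, 0.03),
  pval     = c(0.62, 1e-100, 0.50),
  stringsAsFactors = FALSE
)

# Case and control counts taken from the EOAD row of the summary table
n_case    <- 1573
n_control <- 199505

# Rename to the expected columns: chr, pos, a0, a1, beta, beta_se, p, n_eff
sumstats <- data.frame(
  chr     = suppressWarnings(as.integer(raw$chrom)),  # non-numeric -> NA
  pos     = raw$position,
  a0      = raw$ref,
  a1      = raw$alt,
  beta    = raw$effect,
  beta_se = raw$se,
  p       = raw$pval,
  n_eff   = 4 / (1 / n_case + 1 / n_control),
  stringsAsFactors = FALSE
)

# Keep autosomes only
sumstats <- sumstats[!is.na(sumstats$chr) & sumstats$chr %in% 1:22, ]
print(sumstats)
```

Check the allele orientation against the documentation of each source file before renaming, since a swapped `a0`/`a1` flips the sign of every effect.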

License

This project is licensed under the MIT License — see the LICENSE file for details.

Contact

For questions regarding the code, please open an issue or contact the corresponding author.

Acknowledgments

Data collection and sharing for ADNI was funded by the Alzheimer's Disease Neuroimaging Initiative (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). The A4 Study is a secondary prevention trial in preclinical Alzheimer's disease funded by Eli Lilly and Company, the Alzheimer's Association, and the National Institute on Aging. HABS is funded by the National Institute on Aging (P01 AG036694). AIBL is funded by the CSIRO, the Science and Industry Endowment Fund, and the National Health and Medical Research Council of Australia.

We thank the FinnGen Consortium, the European Alzheimer & Dementia Biobank (EADB) Consortium, and the UK Biobank for providing access to GWAS summary statistics. We thank all participants and their families for their contributions to research.
