Genetic Heterogeneity of Early-Onset versus Late-Onset Alzheimer's Disease: From Polygenic Architecture to Cell-Type-Specific Mechanisms
This repository contains analysis code for a comprehensive investigation of the genetic heterogeneity between early-onset Alzheimer's disease (EOAD, onset < 65 years) and late-onset Alzheimer's disease (LOAD). The study integrates genome-wide association study (GWAS) summary statistics, spatial transcriptomics, transcriptome-wide association studies, polygenic risk scores, and multi-cohort clinical validation to characterize subtype-specific genetic architectures and their downstream biological consequences.
The central finding is a dissociation between EOAD and LOAD at multiple levels: EOAD exhibits an oligogenic architecture driven by the APOE locus with convergent effects on lipid metabolism and oligodendrocyte/myelination pathways, whereas LOAD displays a highly polygenic architecture dominated by microglial neuroinflammatory mechanisms. Cross-cohort validation in ADNI, A4, HABS, and AIBL confirms age-dependent expression of these genetic effects.
EOAD-LOAD-Genetics/
├── 01_Pathway_Discovery/
│ └── EOAD_LOAD_Pathway_Discovery.R
├── 02_PRS_Construction/
│ └── PRS_LDpred2_Analysis.R
├── 03_ADNI_Validation/
│ └── ADNI_Validation.R
├── 04_External_Cohort_Validation/
│ └── A4_HABS_AIBL_Validation.R
├── LICENSE
└── README.md
EOAD_LOAD_Pathway_Discovery.R
Gene-based and pathway-level dissection of EOAD versus LOAD genetic architecture:
- MAGMA gene-based association (v1.10): gene boundaries extended 35 kb upstream / 10 kb downstream, 1000 Genomes Phase 3 EUR LD reference, Bonferroni threshold P < 0.05/19,000 protein-coding genes (≈ 2.6 × 10⁻⁶)
- Graham et al. glial module enrichment: one-sided Fisher's exact tests mapping GWAS-significant genes onto five curated microglial and oligodendrocyte transcriptional programs (Human Microglial AD-Significant, Human Oligodendrocyte AD-Significant, Mouse ARM, Mouse Phagolysosomal, Mouse Longevity)
- GSEA via clusterProfiler: genes ranked by MAGMA Z-statistics, GO-BP pathways, FDR < 0.05
- MAGMA gene-property analysis: nine curated cell-type marker gene sets (T-cell, microglia, oligodendrocyte, Aβ clearance, APP metabolism, myelination, astrocyte, neuron, endothelial)
- Conditional regression: testing T-cell signal independence after controlling for microglial gene expression
- LOAD downsampling validation: 50 independent iterations matching LOAD effective sample size to EOAD (N ≈ 1,573), re-executing MAGMA gene-based and gene-set analyses to distinguish power-dependent from architecture-dependent pathway enrichment
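
The module enrichment step above can be illustrated with a minimal base-R sketch. The gene identifiers, module membership, and the set of "significant" genes are hypothetical placeholders, not values from the study; only the one-sided Fisher's exact test and the 0.05/19,000 threshold come from the description above.

```r
# Toy one-sided Fisher's exact test for gene-set over-representation.
# All gene identifiers and set sizes below are hypothetical.
universe    <- paste0("GENE", 1:19000)                  # background of ~19,000 genes
module      <- universe[1:200]                          # hypothetical glial module
significant <- c(universe[1:12], universe[5001:5038])   # hypothetical significant genes

# Gene-level Bonferroni threshold quoted in the MAGMA step
bonferroni <- 0.05 / length(universe)

in_module <- universe %in% module
is_sig    <- universe %in% significant

# 2 x 2 contingency table: module membership by significance
tab <- table(module = in_module, significant = is_sig)

# alternative = "greater" tests for over-representation only (odds ratio > 1)
res <- fisher.test(tab, alternative = "greater")
print(res$p.value)
print(res$estimate)
```

A one-sided alternative is the natural choice here because depletion of a module among significant genes is not of interest; only enrichment is tested.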
PRS_LDpred2_Analysis.R
Bayesian PRS construction, pathway-specific partitioning, and S-PrediXcan TWAS:
- S-PrediXcan TWAS: GTEx v8 prediction models across five brain regions (Cortex, Anterior Cingulate Cortex BA24, Frontal Cortex BA9, Hippocampus, Putamen basal ganglia); genetically predicted expression of 49 MAGMA-identified oligodendrocyte-myelin pathway genes; two-sided Wilcoxon rank-sum tests comparing median predicted Z-scores between oligodendrocyte and background genes, one-sided tests confirming directional effects, Levene's test assessing expression variability; LOAD and multivariate aging GWAS as controls to verify EOAD specificity
- LDpred2-auto: 30 MCMC chains, jointly estimating SNP heritability (h²) and polygenicity (p); chain bagging aggregating posterior effect estimates across converged chains; EOAD, LOAD, and multivariate aging summary statistics harmonised to GRCh37 coordinates and matched to HapMap3+ European LD reference panel
- Pathway-specific PRS: six gene sets (T-cell activation, activated microglia, Aβ clearance, APP metabolism, oligodendrocyte, myelination); SNPs within 100 kb of pathway gene boundaries retained with LDpred2 weights, all remaining SNPs masked to zero
- Cellular origin burden: percentage of total genetic risk attributable to each pathway
- APOE sensitivity: APOE-excluded PRS generated by masking variants within Chr19:44,000,000–46,000,000 bp (GRCh37); APOE contribution calculated as percentage reduction in pathway burden after masking
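
The masking logic for pathway-specific and APOE-excluded scores can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: the SNP table, gene coordinates, and helper names (`mask_to_pathway`, `mask_apoe`, `burden`) are hypothetical, and summarising burden as a share of summed squared weights is an assumption, since the exact burden metric is not specified above.

```r
# Toy SNP table with LDpred2-style posterior weights (all values hypothetical)
snps <- data.frame(
  chr  = c(19, 19, 19, 2, 2, 7),
  pos  = c(44500000, 45400000, 47000000, 1000000, 1150000, 5000000),
  beta = c(0.20, 0.50, 0.05, 0.10, -0.08, 0.02)
)

# Hypothetical pathway gene boundaries (coordinates are made up)
genes <- data.frame(
  chr   = c(19, 2),
  start = c(45350000, 1050000),
  end   = c(45420000, 1080000)
)

# Keep weights for SNPs within `window` bp of any pathway gene; zero the rest
mask_to_pathway <- function(snps, genes, window = 100e3) {
  keep <- rep(FALSE, nrow(snps))
  for (i in seq_len(nrow(genes))) {
    keep <- keep | (snps$chr == genes$chr[i] &
                    snps$pos >= genes$start[i] - window &
                    snps$pos <= genes$end[i] + window)
  }
  ifelse(keep, snps$beta, 0)
}

# Zero weights inside Chr19:44,000,000-46,000,000 (GRCh37), as described above
mask_apoe <- function(snps, beta) {
  in_apoe <- snps$chr == 19 & snps$pos >= 44e6 & snps$pos <= 46e6
  ifelse(in_apoe, 0, beta)
}

pathway_beta        <- mask_to_pathway(snps, genes)
pathway_beta_noapoe <- mask_apoe(snps, pathway_beta)

# Assumption: burden = percentage of summed squared weights captured by the pathway
burden <- function(b, total) 100 * sum(b^2) / sum(total^2)
burden_full   <- burden(pathway_beta, snps$beta)
burden_noapoe <- burden(pathway_beta_noapoe, snps$beta)

# APOE contribution as percentage reduction in pathway burden after masking
apoe_contribution <- 100 * (burden_full - burden_noapoe) / burden_full
print(apoe_contribution)
```

Masking weights to zero rather than dropping SNPs keeps every pathway score on the same variant set, so scores remain directly comparable across pathways.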
ADNI_Validation.R
Primary validation cohort analysis (N = 812):
- Pathway-specific PRS: six PRS computed by applying LDpred2 posterior weights to imputed genotypes, standardized to z-scores
- Neuroimaging phenotypes: FreeSurfer pipeline — total WMH volume, bilateral hippocampal volume, bilateral entorhinal cortical volume, all normalized to intracranial volume; 68 cortical and subcortical regions (Desikan–Killiany atlas)
- CSF biomarkers: AlzBio3 immunoassay platform (Aβ42, total tau, phosphorylated tau); sTREM2 for microglial activation assessment
- Linear regression: PRS associations with neuroimaging and CSF biomarkers, adjusting for age, sex, APOE ε4 allele count, and education
- Age stratification: younger (<70 years) versus older (≥70 years) subgroups; age × PRS interaction terms to evaluate age-dependent effects
- MCI-to-dementia conversion: logistic regression (N = 482 MCI, 190 conversions, median follow-up 4.0 years); age-dependent reversal of oligodendrocyte pathway genetic risk (age × PRS interaction OR = 0.54, P = 0.005)
- Sliding-window analysis: overlapping 10-year windows with 2-year increments across ages 55–85; Benjamini–Hochberg FDR correction across windows
- Unsupervised k-means clustering: three genetic subtypes (Background_Risk, Oligo_Driven, High_Aβ) from six pathway-specific PRS; hierarchical clustering used as a robustness check, achieving 78% concordance with the k-means assignments
- Regional brain atrophy: subtype-specific patterns across 68 cortical and subcortical regions, ggseg visualization
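
The sliding-window scheme described above (10-year windows, 2-year increments, ages 55–85, Benjamini–Hochberg correction) can be sketched in base R. The data are simulated and the model is deliberately reduced to `y ~ prs + age`; the covariates used in the actual analysis (sex, APOE ε4 count, education) are omitted, and the helper `fit_window` is a hypothetical name.

```r
set.seed(42)
n   <- 800
age <- runif(n, 55, 85)
prs <- rnorm(n)
# Simulated outcome in which the PRS effect weakens with age (illustrative only)
y   <- (0.6 - 0.02 * (age - 55)) * prs + rnorm(n)
dat <- data.frame(age, prs, y)

# Window start ages: 55-65, 57-67, ..., 75-85
starts <- seq(55, 75, by = 2)

# Fit one regression per window and return the PRS coefficient and its p-value
fit_window <- function(lo, width = 10) {
  sub <- dat[dat$age >= lo & dat$age <= lo + width, ]
  co  <- summary(lm(y ~ prs + age, data = sub))$coefficients
  data.frame(start = lo, end = lo + width, n = nrow(sub),
             beta = co["prs", "Estimate"], p = co["prs", "Pr(>|t|)"])
}

windows <- do.call(rbind, lapply(starts, fit_window))

# Benjamini-Hochberg FDR correction across windows
windows$p_fdr <- p.adjust(windows$p, method = "BH")
print(windows)
```

Because adjacent windows share most of their participants, the per-window tests are positively correlated; BH is generally regarded as valid under that kind of dependence, but the windows should not be read as independent replications.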
A4_HABS_AIBL_Validation.R
Cross-cohort mechanistic validation spanning preclinical to clinical AD. Only ADNI provided genome-wide genotyping for PRS computation; A4, HABS, and AIBL lacked genotyping data and therefore employed WMH as a phenotypic proxy for oligodendrocyte dysfunction.
- HABS (N = 1,490): phenotype-driven mediation analysis testing whether WMH exerts an indirect effect on cognition through p-tau217 (average causal mediation effect estimated via 5,000 bootstrap iterations, confirmed by structural equation modelling); age-stratified SEM (<75 versus ≥75 years); formal Age × p-tau217 interaction term on cognition
- A4 Study (N = 1,260): linear regression with heteroscedasticity-consistent standard errors (HC3) assessing WMH–cognition association (Preclinical Alzheimer Cognitive Composite), adjusting for age, sex, APOE ε4, education, and Centiloid amyloid burden
- AIBL (N = 408): Cox proportional hazards regression for conversion to dementia with APOE ε4 as primary predictor; clinical utility of WMH for predicting cognitive impairment evaluated using Firth-corrected logistic regression, AUC, net reclassification improvement, and decision curve analysis
- Participant flow assessment: quantification of potential selection bias for each cohort
- Cross-cohort forest plot visualization: standardized effect sizes with 95% CI
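
As a rough illustration of the HABS mediation step, the indirect effect can be approximated with a product-of-coefficients bootstrap in base R. This is a simplified stand-in, not the `mediation` or `lavaan` workflow the scripts use: the data are simulated, covariates are omitted, and 500 resamples are used instead of 5,000 to keep the example fast. For linear models without an exposure–mediator interaction, the product a × b coincides with the average causal mediation effect.

```r
set.seed(2024)
n    <- 1000
wmh  <- rnorm(n)                               # exposure (simulated)
ptau <- 0.5 * wmh + rnorm(n)                   # mediator (simulated)
cog  <- -0.4 * ptau - 0.1 * wmh + rnorm(n)     # outcome (simulated)
dat  <- data.frame(wmh, ptau, cog)

# Indirect effect = (exposure -> mediator) x (mediator -> outcome | exposure)
indirect <- function(d) {
  a <- coef(lm(ptau ~ wmh, data = d))["wmh"]
  b <- coef(lm(cog ~ ptau + wmh, data = d))["ptau"]
  unname(a * b)
}

est <- indirect(dat)

# Nonparametric bootstrap: resample participants with replacement
B    <- 500
boot <- replicate(B, indirect(dat[sample.int(n, replace = TRUE), ]))

# Percentile 95% confidence interval for the indirect effect
ci <- quantile(boot, c(0.025, 0.975))
print(est)
print(ci)
```

The percentile interval is used here because the sampling distribution of a product of coefficients is typically skewed, which makes a normal-theory interval less reliable.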
- R >= 4.2.0
# CRAN packages
install.packages(c(
  "data.table", "dplyr", "tidyr", "ggplot2", "cowplot",
  "survival", "survminer", "lme4", "lmerTest",
  "cluster", "factoextra", "pheatmap", "corrplot",
  "pROC", "lavaan", "logistf", "PRROC",
  "sandwich", "lmtest", "mediation", "meta",
  "ggseg", "flexsurv", "dcurves", "patchwork",
  "viridis", "RColorBrewer"
))
# Bioconductor packages (bigsnpr is hosted on CRAN; BiocManager::install() resolves it as well)
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c(
  "bigsnpr", "clusterProfiler", "org.Hs.eg.db",
  "org.Mm.eg.db", "biomaRt",
  "TxDb.Hsapiens.UCSC.hg19.knownGene",
  "GenomicRanges"
))

| Dataset | Source | Sample Size | Population | PubMed ID | Access |
|---|---|---|---|---|---|
| EOAD | FinnGen R11 | Cases: 1,573; Controls: 199,505 | European (Finnish) | 36653562 | Download |
| LOAD | EADB Consortium | Cases: 85,934; Controls: 401,577 | European (Multi-country) | 35379992 | EBI GWAS (GCST90027158) |
| Aging | Timmers et al. | Healthspan: 300,477; Lifespan: 1,012,240; Longevity: 36,745 | European | 32678081, 31413261 | Edinburgh DataShare |
Early-Onset Alzheimer's Disease (EOAD)
- Definition: Age of onset < 65 years (ICD-10 code G30.0)
- Endpoint: AD_EO_EXMORE (strict exclusion criteria removing other dementia subtypes)
- Genotyping: Illumina Global Screening Array
- Imputation: SISu v3 Finnish reference panel
Late-Onset Alzheimer's Disease (LOAD)
- Composition: 39,106 clinically diagnosed + 46,828 proxy cases
- Diagnosis: NINCDS-ADRDA or DSM criteria
- Cohorts: 42 cohorts across Europe and North America
- SNPs: 21,101,114 SNPs post-QC
Multivariate Aging
- Healthspan: Years free from major age-related diseases
- Parental lifespan: Age at death or current age
- Longevity: Survival to 90th percentile vs 60th percentile
| Tract | UKB Field | Sample Size | Description |
|---|---|---|---|
| Corpus Callosum Body | 25059 | 33,224 | Primary inter-hemispheric motor/premotor tract |
| Cingulum Hippocampus (R) | 25092 | 33,224 | Limbic tract connecting cingulate-entorhinal-hippocampus |
| Uncinate Fasciculus (R) | 25100 | 33,224 | Temporal-orbitofrontal connection for semantic memory |
| Fornix | 25061 | 33,224 | Primary hippocampal efferent pathway (Papez circuit) |
- Phenotype: Mean fractional anisotropy (FA) derived from dMRI processed using tract-based spatial statistics (TBSS)
- PubMed ID: 30305740
- Access: UK Biobank BIG40
| Dataset | Region | Source | Spatial Units | Clusters | Access |
|---|---|---|---|---|---|
| LIBD Spatial Atlas | Anterior Hippocampus | Lieber Institute for Brain Development | 4,992 | 9 unsupervised transcriptomic clusters | GEO GSE264692 |
| Maynard et al. | Dorsolateral Prefrontal Cortex (DLPFC) | Sample 151673 | 3,639 | 8 layer-resolved clusters | spatialLIBD |
- Platform: 10X Visium spatial transcriptomics
| Dataset | Description | Source |
|---|---|---|
| MOSTA Adult Mouse Brain | Hippocampal subfields, cortical layers, white matter tracts | MOSTA |
| MOSTA Embryonic Mouse Brain (E16.5) | Developmental control | MOSTA |
| Cohort | N | Description | Access |
|---|---|---|---|
| ADNI | 812 | Genome-wide genotyping, structural MRI, CSF biomarkers, longitudinal follow-up | adni.loni.usc.edu |
| A4 Study | 1,260 | Cognitively normal adults with elevated amyloid; Centiloid scale | ida.loni.usc.edu |
| HABS | 1,490 | Multimodal imaging, plasma p-tau217 (ALZpath Simoa v2), WMH | habs.mgh.harvard.edu |
| AIBL | 408 | Non-North American replication, longitudinal conversion | aibl.csiro.au |
- HapMap3+ EUR variants from the UK Biobank (Privé et al. 2022)
- 1000 Genomes Phase 3 European panel (~1.2 million HapMap Phase 3 SNPs for LDSC)
- Available via the bigsnpr R package
Computational workflows employed publicly available implementations:
| Tool | Version/URL | Purpose |
|---|---|---|
| LDSC | GitHub | Cross-trait genetic correlation and SNP heritability |
| HESS | GitHub | Local SNP heritability partitioning across 1,703 LD-independent regions |
| MAGMA | v1.10, Website | Gene-based association and gene-property analysis |
| LAVA | GitHub | Local genetic correlation across ~2,495 independent loci |
| conjFDR | GitHub | Conditional/conjunctional FDR for pleiotropic variant discovery |
| FUMA | v1.5.4, Web | Functional mapping and annotation (CADD, RegulomeDB, chromatin states) |
| gsMap | GitHub | Spatial transcriptomic enrichment mapping (Cauchy combination test) |
| S-PrediXcan | GitHub | Transcriptome-wide association with GTEx v8 prediction models |
| LDpred2 | via bigsnpr R package | Bayesian PRS construction (LDpred2-auto) |
| g:Profiler | Web | GO/KEGG pathway enrichment of pleiotropic gene sets |
- All scripts assume GWAS summary statistics have been downloaded and formatted with columns: `chr`, `pos`, `a0`, `a1`, `beta`, `beta_se`, `p`, `n_eff`.
- ADNI, A4, HABS, and AIBL data require approved data use agreements from their respective consortia.
- File paths in the scripts use placeholder variables (e.g., `base_dir`, `data_dir`) that should be modified to match your local directory structure.
- LDpred2 chromosome-wise SFBM computation is memory-intensive; 32 GB RAM recommended.
- MAGMA gene-based analysis requires pre-computed annotation files (`.genes.annot`) generated from the 1000 Genomes Phase 3 EUR reference panel.
- S-PrediXcan requires GTEx v8 prediction model databases (`.db` files) and covariance files for each brain tissue.
- gsMap requires spatial transcriptomic data in standard 10X Visium format; human gene symbols are converted to mouse orthologs via Ensembl BioMart for cross-species analyses.
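
A minimal sketch of preparing summary statistics in the expected column layout is shown below. The raw column names, positions, and effect values are hypothetical, and mapping REF to `a0` and ALT to `a1` is an assumption that must be checked against each GWAS release. The effective sample size uses the common case-control formula 4 / (1/N_cases + 1/N_controls); whether the scripts compute `n_eff` this way is not stated here, and proxy cases (as in the LOAD GWAS) may need different handling.

```r
# Hypothetical raw summary statistics with non-standard column names
raw <- data.frame(
  CHR  = c(1, 2),
  BP   = c(1000000, 2000000),
  REF  = c("G", "T"),
  ALT  = c("A", "C"),
  BETA = c(0.012, -0.034),
  SE   = c(0.040, 0.020),
  P    = c(0.76, 0.089)
)

# Effective sample size for a case-control GWAS (assumed formula, see above)
n_eff_cc <- function(n_case, n_control) 4 / (1 / n_case + 1 / n_control)

# Rename to the layout the scripts expect: chr, pos, a0, a1, beta, beta_se, p, n_eff
sumstats <- data.frame(
  chr     = raw$CHR,
  pos     = raw$BP,
  a0      = raw$REF,    # assumption: reference / non-effect allele
  a1      = raw$ALT,    # assumption: effect allele
  beta    = raw$BETA,
  beta_se = raw$SE,
  p       = raw$P,
  n_eff   = n_eff_cc(1573, 199505)   # EOAD case and control counts from the data table
)
print(sumstats)
```

Allele orientation is worth verifying before any downstream step, since a swapped `a0`/`a1` flips the sign of every effect estimate.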
This project is licensed under the MIT License — see the LICENSE file for details.
For questions regarding the code, please open an issue or contact the corresponding author.
Data collection and sharing for ADNI was funded by the Alzheimer's Disease Neuroimaging Initiative (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). The A4 Study is a secondary prevention trial in preclinical Alzheimer's disease funded by Eli Lilly and Company, the Alzheimer's Association, and the National Institute on Aging. HABS is funded by the National Institute on Aging (P01 AG036694). AIBL is funded by the CSIRO, the Science and Industry Endowment Fund, and the National Health and Medical Research Council of Australia.
We thank the FinnGen Consortium, the European Alzheimer & Dementia Biobank (EADB) Consortium, and the UK Biobank for providing access to GWAS summary statistics. We thank all participants and their families for their contributions to research.