This repository contains the complete code and analysis for the cross-species comparative aging study "A damage accumulation model reveals strategies of aging across species". It includes simulation tools, Bayesian MCMC analysis, figure generation code, and dataset preparation workflows.
Before running any analysis, you must first set up the repository and download the required data files.
Run the setup script to install dependencies and download posterior distribution data:
./setup.sh

Or manually:
# Install SRtools package and requirements
cd SRtools
pip install -r requirements.txt
pip install -e .
cd ..
# Download posterior distribution data
python download_posterior_data.py

The setup script will:
- Install the SRtools package and all required dependencies
- Download posterior distribution files from Zenodo to the appropriate directories
Note: The posterior distribution files are large (~851 MB) and are stored in git-ignored directories. They will be automatically downloaded to:
- SI_11_and_SHAP_analysis/posteriors/
- Baysian03/analysis/products_baysian/posteriors/
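After setup, it can help to confirm that the posterior files actually arrived. The following is a minimal sketch, not part of the repository: it assumes the two target directories are SI_11_and_SHAP_analysis/posteriors/ and Baysian03/analysis/products_baysian/posteriors/, and the helper name missing_posterior_dirs is hypothetical.

```python
from pathlib import Path

# Assumed download targets, relative to the repository root.
POSTERIOR_DIRS = [
    "SI_11_and_SHAP_analysis/posteriors",
    "Baysian03/analysis/products_baysian/posteriors",
]


def missing_posterior_dirs(repo_root="."):
    """Return the posterior directories that are absent or empty."""
    missing = []
    for rel in POSTERIOR_DIRS:
        path = Path(repo_root) / rel
        # A directory counts as missing if it does not exist or holds no entries.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_posterior_dirs()
    if problems:
        print("Posterior data missing in:", ", ".join(problems))
    else:
        print("Posterior data found.")
```

If any directory is reported, re-running python download_posterior_data.py is the likely fix.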
The Datasets_preperation/ folder contains all raw and processed datasets used in this study, along with processing notebooks that document the data cleaning and preparation steps.
Contents:
- Rawfiles/: Original raw data files from various sources
- cleaned_datasets/: Processed datasets ready for analysis
- Cleanup_notebooks/: Jupyter notebooks documenting the data processing pipeline for each dataset
- Lifetables/: Life table data for human populations
Each cleanup notebook includes detailed documentation explaining the data processing steps, transformations applied, and any filtering or quality control measures taken.
Important Note on Yeast and C. elegans Data: Synthetic datasets (Yeast_ds.csv and Celegans_ds.csv) are included in the Figures/datasets/ directory for code execution purposes. These are NOT the real datasets - they are synthetic placeholder datasets. The original real datasets for yeast and C. elegans cannot be shared in this repository as we do not own them and require permission from the data owners. The synthetic datasets allow the figure generation code to run without errors, but results using these synthetic data should not be interpreted as representing the actual yeast or C. elegans analyses from the publication.
Raw and processed versions will be made available separately upon approval from the data owners.
The SRtools/ package contains the core simulation and analysis tools used throughout this project. This package includes:
- Simulation tools: Code for running survival/mortality simulations using the SR model
- Analysis tools: Statistical analysis functions for mortality data
- MCMC tools: Bayesian MCMC sampling and posterior analysis utilities
- Visualization utilities: Plotting functions for figures and analysis
The package is installed as a Python package and can be imported as:
import SRtools

See SRtools/README.md for detailed documentation of the package components.
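A quick way to confirm the editable install worked, without importing the package itself, is to ask Python whether it can locate it. This is a small sketch using only the standard library; the helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if is_installed("SRtools"):
        print("SRtools is importable.")
    else:
        print("SRtools not found; run 'pip install -e .' inside SRtools/.")
```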
The Baysian03/ folder contains the complete results of all MCMC runs and their analysis.
Key components:
- analysis/: Full analysis notebooks for each MCMC run, including:
  - Posterior distribution analysis
  - Parameter estimation results
  - Likelihood statistics
  - Diagnostic plots
- datasets/: Processed datasets used for MCMC analysis
- configurations_baysian.xlsx: Configuration file specifying all MCMC run parameters
- run_*.py: Scripts for running MCMC analyses
Each analysis notebook in the analysis/ folder provides a complete workflow for a specific dataset, including data loading, model fitting, posterior sampling, and result visualization.
The Figures/ folder contains Jupyter notebooks with code to reproduce all main and supplementary figures from the publication.
Figure notebooks:
- Fig_2_datasets_vs_sim.ipynb: Figure 2 - Comparison of datasets with simulations, and Supplementary Figure S2
- FIg_3_production_vs_LS.ipynb: Figure 3a - Production vs. lifespan analysis
- Fig_3_balistic_vs_ss.ipynb: Figure 3c - Ballistic vs. steady-state comparison
- Fig_4_invariants_in_mammals.ipynb: Figure 4 - Invariant relationships in mammals
- Fig_5_Yeast.ipynb: Figure 5 - Yeast analysis
- Fig_6_dimensionlessgroups.ipynb: Figure 6 - Dimensionless group analysis
- Suplementary_Fig_3_all params_and_trends.ipynb: Supplementary Figure S3
- Supplementary_Fig_4_Weibull_and_Gompertz_fits.ipynb: Supplementary Figure S4
Note: Figure 3b is generated by SI_11_and_SHAP_analysis/shap_analysis_4.py. Supplementary Figures S9-S10 are generated in the SI_11_and_SHAP_analysis/ folder.
Important Note on Yeast and C. elegans Datasets: Some figure notebooks (Fig_2_datasets_vs_sim.ipynb, Supplementary_Fig_4_Weibull_and_Gompertz_fits.ipynb) require datasets for yeast and C. elegans. Synthetic placeholder datasets (Yeast_ds.csv and Celegans_ds.csv) are included in the Figures/datasets/ directory to allow the code to run. These synthetic datasets are NOT the real data - the original datasets cannot be shared as we do not own them and require permission from the data owners. Results using these synthetic datasets should not be interpreted as representing the actual yeast or C. elegans analyses from the publication.
Additional notebooks:
- QSS_explanation_figure.ipynb: Explanation figure for quasi-steady-state
- params_vs_LS.ipynb: Parameter vs. lifespan analysis
Results folder (results/):
The results/ folder contains three types of parameter estimation tables with full parameter estimates for all datasets:
- summery_max_likelihood.csv: Contains the single sample from each MCMC run with the highest likelihood. This represents the best-fit parameter set based on maximum likelihood.
- summery_mode_overall.csv: After binning the MCMC samples (averaging the likelihoods of samples that fall in the same bin), this contains the sample with the highest likelihood within the mode bin (the bin with the highest posterior probability). Either summery_max_likelihood.csv or summery_mode_overall.csv should be used for simulations, as they contain complete parameter sets.
- summery_mode.csv: Contains the modes (highest-probability values) of the marginalized posterior distributions over different parameters and parameter groups, including 95% confidence intervals. Important: Since these are marginalized distributions, the values represent modes of individual parameters rather than a coherent parameter set. Therefore, this file should not be used for simulations, but it is useful for understanding the distribution of individual parameters and their uncertainties. These are the values in Tables 2 and 3 in the paper.
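The difference between the first two tables can be illustrated with a simplified sketch. This is not the SRtools implementation: the function names, the regular grid, and the n_bins value are assumptions made for illustration, the mode bin is approximated as the most populated bin, and the likelihood-averaging step described above is not reproduced.

```python
from collections import defaultdict


def max_likelihood_sample(samples, log_likes):
    """Return the single sample with the highest log-likelihood."""
    best = max(range(len(samples)), key=lambda i: log_likes[i])
    return samples[best]


def mode_bin_sample(samples, log_likes, n_bins=10):
    """Return the highest-likelihood sample inside the most populated bin.

    Samples are tuples of parameter values, binned on a regular grid
    spanning the observed range of each parameter (an assumed scheme).
    """
    dims = len(samples[0])
    lows = [min(s[d] for s in samples) for d in range(dims)]
    highs = [max(s[d] for s in samples) for d in range(dims)]

    def bin_of(sample):
        index = []
        for d in range(dims):
            # Fall back to width 1.0 when a parameter never varies.
            width = (highs[d] - lows[d]) / n_bins or 1.0
            # Clamp so the maximum value lands in the last bin.
            index.append(min(int((sample[d] - lows[d]) / width), n_bins - 1))
        return tuple(index)

    bins = defaultdict(list)
    for i, sample in enumerate(samples):
        bins[bin_of(sample)].append(i)

    mode_members = max(bins.values(), key=len)
    best = max(mode_members, key=lambda i: log_likes[i])
    return samples[best]


if __name__ == "__main__":
    # Toy chain: three samples cluster together, one outlier fits best.
    chain = [(0.10, 0.10), (0.12, 0.11), (0.11, 0.12), (0.90, 0.90)]
    log_l = [-5.0, -3.0, -4.0, -1.0]
    print(max_likelihood_sample(chain, log_l))
    print(mode_bin_sample(chain, log_l))
```

Both functions return a complete sample drawn from the chain, which is why either table is suitable for simulations, whereas per-parameter marginal modes need not correspond to any sample that was actually visited.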
The Different_Noises/ folder contains code for Supplementary Information Figure 9, which analyzes the effects of different noise types on the model results.
Contents:
- SR_noises.py: Core noise analysis functions
- Noise_tests.ipynb: Notebook running noise sensitivity analysis
- Generated plots showing noise effects
The SI_11_and_SHAP_analysis/ folder contains code for SHAP analysis, ANOVA variance decomposition, and single parameter substitution tests used in Figure 3b, Supplementary Figures S9-S10, and Supplementary Tables 8-12.
Contents:
- shap_analysis_4.py: Python script for SHAP (SHapley Additive exPlanations) analysis
  - Generates Figure 3b: SHAP analysis visualization
  - Generates Supplementary Figure S9: SHAP analysis results
- random_sampling_ANOVA_type_I.ipynb: Jupyter notebook for ANOVA Type I analysis
  - Generates Supplementary Tables 9-12: ANOVA for best fits, number of parameter sets used for ANOVA validation, and 1000 random parameter sets validation of ANOVA Type I and Type III
- random_sampling_ANOVA_type_III.ipynb: Jupyter notebook for ANOVA Type III analysis
  - Generates Supplementary Tables 9-12: ANOVA for best fits, number of parameter sets used for ANOVA validation, and 1000 random parameter sets validation of ANOVA Type I and Type III
- Single_parameter_substitution_test.ipynb: Jupyter notebook for single parameter substitution analysis
  - Generates Supplementary Figure S10: Single parameter substitution test visualization
  - Generates Supplementary Table 8: Single parameter substitution test results
- download_posterior_data.py: Script to download posterior data (also available in root)
- summery_mode_no_CI.csv: Summary statistics used for the best fit values
- SHAP_outputs/: Directory containing SHAP analysis output files
This analysis identifies which model parameters (eta, beta, epsilon, xc) contribute most to explaining variance in median lifetimes across species using multiple complementary approaches: SHAP analysis, ANOVA variance decomposition, and parameter substitution tests.
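The general idea behind a single parameter substitution test can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: it assumes the test swaps one parameter at a time from a donor parameter set into a reference set and records the change in a predicted quantity. The function toy_lifetime is an arbitrary placeholder and is NOT the SR model; all function names are hypothetical.

```python
def substitution_effects(reference, donor, predict):
    """Swap each parameter from donor into reference, one at a time.

    Returns a dict mapping parameter name to the change in predict()
    relative to the unmodified reference parameter set.
    """
    baseline = predict(reference)
    effects = {}
    for name in reference:
        swapped = dict(reference)
        swapped[name] = donor[name]
        effects[name] = predict(swapped) - baseline
    return effects


def toy_lifetime(params):
    """Arbitrary placeholder response; NOT the SR model."""
    return (10.0 * params["xc"] - 2.0 * params["eta"]
            + params["beta"] + 0.5 * params["epsilon"])


if __name__ == "__main__":
    species_a = {"eta": 1.0, "beta": 2.0, "epsilon": 4.0, "xc": 3.0}
    species_b = {"eta": 2.0, "beta": 2.0, "epsilon": 4.0, "xc": 5.0}
    for name, delta in substitution_effects(species_a, species_b, toy_lifetime).items():
        print(f"{name}: {delta:+.1f}")
```

In practice, predict would be replaced by a simulation of median lifetime from a complete parameter set, such as one row of summery_max_likelihood.csv.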
The performence tests/ folder contains Excel files documenting all test runs and their configurations. Some of this analysis is presented in SI 6.
Contents:
- configurations_for_tests.xlsx: Full configuration specifications for all test runs
- summery_of_error_analysis.xlsx: Summary of error analysis results
These files provide complete documentation of the testing methodology and results used to validate the analysis pipeline.
All Python package requirements are specified in SRtools/requirements.txt. The main dependencies include:
- numpy, pandas, scipy
- matplotlib, seaborn, plotly
- emcee (MCMC sampling)
- jupyter, ipykernel
- lifelines (survival analysis)
- corner (posterior visualization)
- And others (see SRtools/requirements.txt for complete list)
Install all requirements using:
cd SRtools
pip install -r requirements.txt
pip install -e .

Posterior distribution files are available from Zenodo: 10.5281/zenodo.17804233
The download script (download_posterior_data.py) automatically retrieves these files during setup.
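For manual access, the DOI above can be turned into web addresses. The sketch below is independent of download_posterior_data.py, whose internals are not described here. The doi.org resolver form is standard; the Zenodo records API path is an assumption about Zenodo's public REST API, and the helper names are hypothetical.

```python
ZENODO_DOI = "10.5281/zenodo.17804233"
ZENODO_PREFIX = "10.5281/zenodo."


def doi_landing_url(doi):
    """Return the resolver URL that redirects to the record's landing page."""
    return f"https://doi.org/{doi}"


def zenodo_record_id(doi):
    """Extract the numeric record ID from a Zenodo DOI."""
    if not doi.startswith(ZENODO_PREFIX):
        raise ValueError(f"Not a Zenodo DOI: {doi}")
    return doi[len(ZENODO_PREFIX):]


def zenodo_api_url(doi):
    """Return the assumed Zenodo REST API address for the record metadata."""
    return f"https://zenodo.org/api/records/{zenodo_record_id(doi)}"


if __name__ == "__main__":
    print(doi_landing_url(ZENODO_DOI))
    print(zenodo_api_url(ZENODO_DOI))
```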
If you use this code, please cite the associated publication (citation to be added upon publication).
[Add license information]
[Add]