Comparative Aging Analysis V3

This repository contains the complete code and analysis for the comparative aging study "A damage accumulation model reveals strategies of aging across species". It includes simulation tools, Bayesian MCMC analysis, figure generation code, and dataset preparation workflows.

Quick Start

Before running any analysis, you must first set up the repository and download the required data files.

Run the setup script to install dependencies and download posterior distribution data:

./setup.sh

Or manually:

# Install SRtools package and requirements
cd SRtools
pip install -r requirements.txt
pip install -e .
cd ..

# Download posterior distribution data
python download_posterior_data.py

The setup script will:

Install the SRtools package and all required dependencies
Download posterior distribution files from Zenodo to the appropriate directories

Note: The posterior distribution files are large (~851 MB) and are stored in git-ignored directories. They will be automatically downloaded to:

SI_11_and_SHAP_analysis/posteriors/
Baysian03/analysis/products_baysian/posteriors/
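
After setup, a quick check along the following lines can confirm that both posterior directories exist and contain files. This is a minimal sketch and not part of the repository: the helper name check_posterior_dirs is hypothetical, and the sketch assumes it is run from the repository root.

```python
from pathlib import Path

# Download targets for the posterior files, as listed in this README.
POSTERIOR_DIRS = [
    "SI_11_and_SHAP_analysis/posteriors",
    "Baysian03/analysis/products_baysian/posteriors",
]


def check_posterior_dirs(root=".", dirs=POSTERIOR_DIRS):
    """Map each posterior directory to the number of files found in it.

    Hypothetical helper: a count of 0 means the directory is missing or empty.
    """
    counts = {}
    for rel in dirs:
        path = Path(root) / rel
        if path.is_dir():
            # rglob("*") also walks subdirectories; count regular files only.
            counts[rel] = sum(1 for p in path.rglob("*") if p.is_file())
        else:
            counts[rel] = 0
    return counts


if __name__ == "__main__":
    for rel, n in check_posterior_dirs().items():
        status = "ok" if n > 0 else "missing or empty"
        print(f"{rel}: {n} files ({status})")
```

If either directory is reported as missing or empty, re-running python download_posterior_data.py is the likely fix.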

Repository Structure

Datasets Preparation

The Datasets_preperation/ folder contains all raw and processed datasets used in this study, along with processing notebooks that document the data cleaning and preparation steps.

Contents:

Rawfiles/: Original raw data files from various sources
cleaned_datasets/: Processed datasets ready for analysis
Cleanup_notebooks/: Jupyter notebooks documenting the data processing pipeline for each dataset
Lifetables/: Life table data for human populations

Each cleanup notebook includes detailed documentation explaining the data processing steps, transformations applied, and any filtering or quality control measures taken.

Important Note on Yeast and C. elegans Data: Synthetic datasets (Yeast_ds.csv and Celegans_ds.csv) are included in the Figures/datasets/ directory so that the code can be executed. These are NOT the real datasets; they are synthetic placeholders. The original yeast and C. elegans datasets cannot be shared in this repository because we do not own them, and sharing them requires permission from the data owners. The synthetic datasets allow the figure generation code to run without errors, but results obtained from them should not be interpreted as the actual yeast or C. elegans analyses from the publication. Raw and processed versions will be made available separately upon approval from the data owners.
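
One way to avoid mistaking placeholder results for real ones is to load these files through a small wrapper that warns whenever a synthetic file is read. The sketch below is an illustration only: the function name load_dataset is hypothetical, it uses the standard csv module rather than whatever loader the notebooks use, and it makes no assumption about the column names in the files.

```python
import csv
import warnings
from pathlib import Path

# File names identified in this README as synthetic placeholders.
SYNTHETIC_FILES = {"Yeast_ds.csv", "Celegans_ds.csv"}


def load_dataset(path):
    """Read a CSV file into a list of row dicts (hypothetical helper).

    Emits a UserWarning when the file is one of the synthetic placeholders.
    """
    path = Path(path)
    if path.name in SYNTHETIC_FILES:
        warnings.warn(
            f"{path.name} is a synthetic placeholder, not the real dataset; "
            "do not interpret results as the published analysis.",
            UserWarning,
            stacklevel=2,
        )
    with path.open(newline="") as fh:
        return list(csv.DictReader(fh))
```

For example, load_dataset("Figures/datasets/Yeast_ds.csv") would return the rows and raise the warning, while any other CSV file loads silently.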

SRtools

The SRtools/ package contains the core simulation and analysis tools used throughout this project. This package includes:

Simulation tools: Code for running survival/mortality simulations using the SR model
Analysis tools: Statistical analysis functions for mortality data
MCMC tools: Bayesian MCMC sampling and posterior analysis utilities
Visualization utilities: Plotting functions for figures and analysis

The package is installed as a Python package and can be imported as:

import SRtools

See SRtools/README.md for detailed documentation of the package components.

Bayesian Analysis

The Baysian03/ folder contains the complete results of all MCMC runs and their analysis.

Key components:

analysis/: Full analysis notebooks for each MCMC run, including:
- Posterior distribution analysis
- Parameter estimation results
- Likelihood statistics
- Diagnostic plots
datasets/: Processed datasets used for MCMC analysis
configurations_baysian.xlsx: Configuration file specifying all MCMC run parameters
run_*.py: Scripts for running MCMC analyses

Each analysis notebook in the analysis/ folder provides a complete workflow for a specific dataset, including data loading, model fitting, posterior sampling, and result visualization.

Figures

The Figures/ folder contains Jupyter notebooks with code to reproduce all main and supplementary figures from the publication.

Figure notebooks:

Fig_2_datasets_vs_sim.ipynb: Figure 2 - Comparison of datasets with simulations, and supplementary figure S2
FIg_3_production_vs_LS.ipynb: Figure 3a - Production vs. lifespan analysis
Fig_3_balistic_vs_ss.ipynb: Figure 3c - Ballistic vs. steady-state comparison
Fig_4_invariants_in_mammals.ipynb: Figure 4 - Invariant relationships in mammals
Fig_5_Yeast.ipynb: Figure 5 - Yeast analysis
Fig_6_dimensionlessgroups.ipynb: Figure 6 - Dimensionless group analysis
Suplementary_Fig_3_all params_and_trends.ipynb: Supplementary Figure S3
Supplementary_Fig_4_Weibull_and_Gompertz_fits.ipynb: Supplementary Figure S4

Note: Figure 3b is generated by SI_11_and_SHAP_analysis/shap_analysis_4.py. Supplementary Figures S9-S10 are generated in the SI_11_and_SHAP_analysis/ folder.

Important Note on Yeast and C. elegans Datasets: Some figure notebooks (Fig_2_datasets_vs_sim.ipynb, Supplementary_Fig_4_Weibull_and_Gompertz_fits.ipynb) require datasets for yeast and C. elegans. Synthetic placeholder datasets (Yeast_ds.csv and Celegans_ds.csv) are included in the Figures/datasets/ directory to allow the code to run. These synthetic datasets are NOT the real data: the original datasets cannot be shared because we do not own them, and sharing them requires permission from the data owners. Results obtained from the synthetic datasets should not be interpreted as the actual yeast or C. elegans analyses from the publication.

Additional notebooks:

QSS_explanation_figure.ipynb: Explanation figure for quasi-steady-state
params_vs_LS.ipynb: Parameter vs. lifespan analysis

Results folder (results/): The results/ folder contains three parameter estimation tables, each covering all datasets:

summery_max_likelihood.csv: Contains the single sample with the highest likelihood from each MCMC run, i.e. the maximum-likelihood best-fit parameter set.
summery_mode_overall.csv: The MCMC samples are binned, and the likelihoods of samples that fall in the same bin are averaged. This file contains the sample with the highest likelihood within the mode bin (the bin with the highest posterior probability). Use either summery_max_likelihood.csv or summery_mode_overall.csv for simulations, since both contain complete parameter sets.
summery_mode.csv: Contains the modes (highest-probability values) of the marginalized posterior distributions for individual parameters and parameter groups, with 95% confidence intervals. Important: because these distributions are marginalized, the values are modes of individual parameters rather than a coherent parameter set. This file should therefore not be used for simulations, but it is useful for understanding the distribution and uncertainty of individual parameters. These are the values reported in Tables 2 and 3 of the paper.
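
The difference between the three tables can be illustrated on a toy set of samples. The sketch below is a simplified illustration, not the code used to produce the tables: the function names, the fixed bin width, and the toy sample values are all assumptions, the mode bin is taken to be the bin holding the most samples, and the likelihood-averaging step is omitted.

```python
import math
from collections import Counter


def bin_index(sample, width):
    """Joint bin of a sample: one integer bin index per parameter."""
    return tuple(math.floor(v / width) for v in sample)


def max_likelihood_sample(samples, logliks):
    """Sample with the highest log-likelihood (cf. summery_max_likelihood.csv)."""
    best = max(range(len(samples)), key=lambda i: logliks[i])
    return samples[best]


def mode_overall_sample(samples, logliks, width):
    """Best sample inside the most populated joint bin (cf. summery_mode_overall.csv)."""
    counts = Counter(bin_index(s, width) for s in samples)
    mode_bin = counts.most_common(1)[0][0]
    in_bin = [i for i, s in enumerate(samples) if bin_index(s, width) == mode_bin]
    return samples[max(in_bin, key=lambda i: logliks[i])]


def marginal_modes(samples, width):
    """Bin centre of the mode of each parameter taken separately (cf. summery_mode.csv)."""
    modes = []
    for j in range(len(samples[0])):
        counts = Counter(math.floor(s[j] / width) for s in samples)
        k = counts.most_common(1)[0][0]
        modes.append((k + 0.5) * width)
    return tuple(modes)


# Toy two-parameter samples with made-up log-likelihoods.
samples = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.1), (1.1, 1.2),
           (1.3, 1.4), (2.2, 1.1), (2.4, 1.3)]
logliks = [-2.0, -1.5, -3.0, -1.0, -2.5, -4.0, -3.5]

print(max_likelihood_sample(samples, logliks))     # (1.1, 1.2)
print(mode_overall_sample(samples, logliks, 1.0))  # (0.3, 0.4)
print(marginal_modes(samples, 1.0))                # (0.5, 1.5)
```

In this toy example the marginal modes combine to (0.5, 1.5), a region that contains no samples at all, which is why a table of marginal modes is not a coherent parameter set for simulation.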

Different Noises

The Different_Noises/ folder contains code for Supplementary Information Figure 9, which analyzes the effects of different noise types on the model results.

Contents:

SR_noises.py: Core noise analysis functions
Noise_tests.ipynb: Notebook running noise sensitivity analysis
Generated plots showing noise effects

SI 11 and SHAP Analysis

The SI_11_and_SHAP_analysis/ folder contains code for SHAP analysis, ANOVA variance decomposition, and single parameter substitution tests used in Figure 3b, Supplementary Figures S9-S10, and Supplementary Tables 8-12.

Contents:

shap_analysis_4.py: Python script for SHAP (SHapley Additive exPlanations) analysis
- Generates Figure 3b: SHAP analysis visualization
- Generates Supplementary Figure S9: SHAP analysis results
random_sampling_ANOVA_type_I.ipynb: Jupyter notebook for ANOVA Type I analysis
- Generates Supplementary Tables 9-12: ANOVA for best fits, number of parameter sets used for ANOVA validation, and 1000 random parameter sets validation of ANOVA Type I and Type III
random_sampling_ANOVA_type_III.ipynb: Jupyter notebook for ANOVA Type III analysis
- Generates Supplementary Tables 9-12: ANOVA for best fits, number of parameter sets used for ANOVA validation, and 1000 random parameter sets validation of ANOVA Type I and Type III
Single_parameter_substitution_test.ipynb: Jupyter notebook for single parameter substitution analysis
- Generates Supplementary Figure S10: Single parameter substitution test visualization
- Generates Supplementary Table 8: Single parameter substitution test results
download_posterior_data.py: Script to download posterior data (also available in root)
summery_mode_no_CI.csv: Summary statistics used for the best fit values
SHAP_outputs/: Directory containing SHAP analysis output files

This analysis identifies which model parameters (eta, beta, epsilon, xc) contribute most to explaining variance in median lifetimes across species using multiple complementary approaches: SHAP analysis, ANOVA variance decomposition, and parameter substitution tests.
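
As an illustration of the idea behind a single-parameter substitution test, the sketch below swaps one parameter at a time from a target parameter set into a reference set and reports how much of the output gap each swap closes. This is one plausible reading of such a test, and the notebook's actual procedure may differ. The function names are hypothetical, and toy_model is an arbitrary linear placeholder, NOT the SR model.

```python
def substitution_effects(model, reference, target):
    """Fraction of the reference-to-target output gap closed by each single swap.

    Hypothetical helper: `model` is any callable taking the parameters as
    keyword arguments and returning a number (e.g. a median lifetime).
    """
    base = model(**reference)
    gap = model(**target) - base
    effects = {}
    for name in reference:
        # Replace exactly one parameter with the target's value.
        swapped = {**reference, name: target[name]}
        effects[name] = (model(**swapped) - base) / gap if gap != 0 else 0.0
    return effects


def toy_model(eta, beta, epsilon, xc):
    """Arbitrary linear placeholder, used only to make the sketch runnable."""
    return 1.0 * eta + 2.0 * beta + 3.0 * epsilon + 4.0 * xc


reference = {"eta": 1.0, "beta": 1.0, "epsilon": 1.0, "xc": 1.0}
target = {"eta": 2.0, "beta": 1.0, "epsilon": 1.0, "xc": 3.0}

for name, frac in substitution_effects(toy_model, reference, target).items():
    print(f"{name}: {frac:.3f}")
```

With this placeholder model, swapping xc alone closes 8/9 of the gap and swapping eta closes 1/9, while beta and epsilon contribute nothing because they are equal in both sets. For a non-additive model the fractions need not sum to one.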

Performance Tests

The performence tests/ folder contains Excel files documenting all test runs and their configurations. Some of this analysis is presented in SI 6.

Contents:

configurations_for_tests.xlsx: Full configuration specifications for all test runs
summery_of_error_analysis.xlsx: Summary of error analysis results

These files provide complete documentation of the testing methodology and results used to validate the analysis pipeline.

Requirements

All Python package requirements are specified in SRtools/requirements.txt. The main dependencies include:

numpy, pandas, scipy
matplotlib, seaborn, plotly
emcee (MCMC sampling)
jupyter, ipykernel
lifelines (survival analysis)
corner (posterior visualization)
Others (see SRtools/requirements.txt for the complete list)

Install all requirements using:

cd SRtools
pip install -r requirements.txt
pip install -e .
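
After installation, the main dependencies can be checked from Python with the standard importlib.metadata module. This is a minimal sketch: the helper name missing_packages is hypothetical, and the list covers only the main dependencies named above, not the full contents of SRtools/requirements.txt.

```python
from importlib import metadata

# Main dependencies named in this README (not the complete requirements list).
MAIN_DEPENDENCIES = [
    "numpy", "pandas", "scipy",
    "matplotlib", "seaborn", "plotly",
    "emcee", "jupyter", "ipykernel",
    "lifelines", "corner",
]


def missing_packages(names=MAIN_DEPENDENCIES):
    """Return the distributions from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_packages()
    print("All main dependencies found." if not absent else f"Missing: {absent}")
```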

Data Availability

Posterior distribution files are available from Zenodo: 10.5281/zenodo.17804233

The download script (download_posterior_data.py) automatically retrieves these files during setup.
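
To inspect the record in a browser before downloading, the DOI can be turned into URLs as sketched below. The helper name zenodo_urls is hypothetical and this is not how download_posterior_data.py is known to work; it only relies on the standard doi.org resolver and on Zenodo's records/<id> URL pattern.

```python
import re

ZENODO_DOI = "10.5281/zenodo.17804233"


def zenodo_urls(doi):
    """Return (DOI resolver URL, Zenodo record URL) for a Zenodo DOI."""
    match = re.fullmatch(r"10\.5281/zenodo\.(\d+)", doi.strip())
    if match is None:
        raise ValueError(f"not a Zenodo DOI: {doi!r}")
    record_id = match.group(1)
    return (
        f"https://doi.org/{doi.strip()}",
        f"https://zenodo.org/records/{record_id}",
    )


if __name__ == "__main__":
    for url in zenodo_urls(ZENODO_DOI):
        print(url)
```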

Citation

If you use this code, please cite the associated publication (citation to be added upon publication).

License

[Add license information]

Contact

[Add]
