Bayesian Multiple Imputation by Chained Equations for uncertainty-aware imputation of time-series data
Missing data are pervasive in real-world time-series applications, particularly in environmental monitoring and healthcare, where reliable uncertainty quantification is essential. tBayes-MICE extends classical MICE by replacing deterministic regression updates with Bayesian regression models whose parameters and imputations are jointly sampled via Markov Chain Monte Carlo (MCMC). The method supports Random Walk Metropolis (RWM) sampling, with theoretically motivated optimal scaling to improve convergence and mixing.
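As a rough illustration of the sampling step described above, the sketch below runs Random Walk Metropolis on the coefficients of a Bayesian linear regression and then draws posterior-predictive imputations. The proposal uses the classical 2.38²/d covariance scaling from the optimal-scaling literature. The function names, the N(0, 10²) prior, the fixed noise standard deviation, and the synthetic data are all assumptions made for illustration; they are not taken from MCMC_CHAIN.py.

```python
import numpy as np


def log_posterior(beta, X, y, sigma=1.0, prior_sd=10.0):
    """Unnormalised log posterior: Gaussian likelihood plus N(0, prior_sd^2) prior."""
    resid = y - X @ beta
    return -0.5 * np.sum(resid ** 2) / sigma ** 2 - 0.5 * np.sum(beta ** 2) / prior_sd ** 2


def rwm_sample(X, y, n_samples=5000, sigma=1.0, seed=0):
    """Random Walk Metropolis over regression coefficients (illustrative only)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Proposal covariance: (2.38^2 / d) times a rough posterior covariance,
    # approximated here by sigma^2 (X'X)^{-1}.
    approx_cov = sigma ** 2 * np.linalg.inv(X.T @ X)
    step = np.linalg.cholesky((2.38 ** 2 / d) * approx_cov)
    beta = np.zeros(d)
    current_lp = log_posterior(beta, X, y, sigma)
    samples = np.empty((n_samples, d))
    n_accepted = 0
    for i in range(n_samples):
        proposal = beta + step @ rng.standard_normal(d)
        proposal_lp = log_posterior(proposal, X, y, sigma)
        # Symmetric proposal, so the acceptance ratio is the posterior ratio.
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            beta, current_lp = proposal, proposal_lp
            n_accepted += 1
        samples[i] = beta
    return samples, n_accepted / n_samples


def draw_imputations(samples, X_missing, sigma=1.0, seed=0):
    """One posterior-predictive draw per retained coefficient sample."""
    rng = np.random.default_rng(seed)
    mean = samples @ X_missing.T  # shape (n_draws, n_missing)
    return mean + sigma * rng.standard_normal(mean.shape)


# Synthetic example: intercept plus one predictor, true coefficients (1.0, 0.5).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.standard_normal(200)])
y = X @ np.array([1.0, 0.5]) + rng.standard_normal(200)
samples, acc_rate = rwm_sample(X, y)
kept = samples[1000:]                        # discard burn-in
imputations = draw_imputations(kept, X[:3])  # pretend the first 3 targets are missing
print(kept.mean(axis=0), acc_rate)
```

Keeping every retained draw, rather than a single point estimate, is what lets the spread of the imputations reflect parameter uncertainty.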
Bayesian-MICE/
├── Datasets/
│ ├── AirQualityUCI.csv
│ ├── Data_subset_AirQuality.csv
│ ├── Data_with_missing_AirQuality.csv
│ ├── physionet_5000patients.csv
│ ├── physio_subdata.csv
│ └── physio_with_missing.csv
│
├── MCMC_MICE_codes/
│ ├── placeholder.py
│ ├── PhysioData_Loader.py
│ ├── MCMC_CHAIN.py
│ ├── SimpleMCMC.py
│ ├── Run_Single_MCMC.py
│ ├── Comparison_runs.py
│ ├── Run_experiments.py
│ ├── Visualisation.py
│ ├── BRITS.py
│ └── packages.py
│
├── AirQuality_Plots/
├── PhysioNet_Plots/
├── requirements.txt
└── README.md
| File | Description |
|---|---|
| Datasets/AirQualityUCI.csv | Original AirQuality data (hourly, unprocessed) |
| Datasets/Data_subset_AirQuality.csv | AirQuality after removing original NaNs |
| Datasets/Data_with_missing_AirQuality.csv | AirQuality with artificial missing values for evaluation |
| Datasets/physionet_5000patients.csv | PhysioNet tabular data, filtered to rows with ≤60% missingness |
| Datasets/physio_subdata.csv | PhysioNet after removing all NaNs |
| Datasets/physio_with_missing.csv | PhysioNet with artificial missing values |
| MCMC_MICE_codes/placeholder.py | Missing value initialisation (mean-based and time-aware variants) |
| MCMC_MICE_codes/PhysioData_Loader.py | Converts raw PhysioNet data to structured format and applies missingness masks |
| MCMC_MICE_codes/MCMC_CHAIN.py | MCMC samplers: Random Walk Metropolis (RWM) |
| MCMC_MICE_codes/SimpleMCMC.py | Lagged predictor construction and parallel MCMC chains |
| MCMC_MICE_codes/Run_Single_MCMC.py | Runs MCMC within each MICE iteration and checks convergence |
| MCMC_MICE_codes/Comparison_runs.py | 30-run multiple imputation comparison across all methods |
| MCMC_MICE_codes/Run_experiments.py | Full experimental workflow manager |
| MCMC_MICE_codes/Visualisation.py | Generates all figures used in the paper |
| MCMC_MICE_codes/BRITS.py | BRITS baseline implementation using pypots |
| MCMC_MICE_codes/packages.py | Full list of packages used |
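The initialisation and lagged-predictor steps listed in the table can be pictured with the sketch below. The names `mean_init`, `time_aware_init`, and `lagged_design` are hypothetical, and the "time-aware" variant is assumed here to mean linear interpolation along the time axis. The actual logic in placeholder.py and SimpleMCMC.py may differ.

```python
import numpy as np


def mean_init(x):
    """Replace NaNs in each column with that column's observed mean."""
    x = np.asarray(x, dtype=float)
    col_means = np.nanmean(x, axis=0)
    return np.where(np.isnan(x), col_means, x)


def time_aware_init(x):
    """Replace NaNs by linear interpolation along the time axis (rows)."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    t = np.arange(x.shape[0])
    for j in range(x.shape[1]):
        obs = ~np.isnan(x[:, j])
        # np.interp holds the first/last observed value at the series edges.
        out[:, j] = np.interp(t, t[obs], x[obs, j])
    return out


def lagged_design(series, n_lags):
    """Build predictors [y_{t-1}, ..., y_{t-n_lags}] and the aligned target y_t."""
    series = np.asarray(series, dtype=float)
    rows = len(series) - n_lags
    X = np.column_stack(
        [series[n_lags - k : n_lags - k + rows] for k in range(1, n_lags + 1)]
    )
    return X, series[n_lags:]


# Toy two-variable series with gaps (values are made up for illustration).
raw = np.array([[1.0, 10.0],
                [np.nan, 12.0],
                [3.0, np.nan],
                [np.nan, 16.0],
                [5.0, 18.0]])
filled_mean = mean_init(raw)
filled_time = time_aware_init(raw)
X_lag, y_lag = lagged_design(filled_time[:, 0], n_lags=2)
```

On a trending series the interpolated start preserves the local level, whereas the column mean ignores time ordering. In both cases the initial values only seed the chained-equation updates.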
1. Clone the repository

git clone https://github.com/sydney-machine-learning/Bayesian-MICE.git
cd Bayesian-MICE

2. Create a virtual environment (recommended)

python -m venv venv
source venv/bin/activate # Linux / macOS
venv\Scripts\activate # Windows

3. Install dependencies

pip install -r requirements.txt

Or install manually:

pip install numpy pandas scikit-learn matplotlib arviz scipy pypots

AirQuality dataset:
- Source: UCI Machine Learning Repository, Air Quality Dataset
- Description: Hourly air quality measurements from an Italian city, March 2004 to February 2005
- Variables: CO, NO2, NOx, O3, temperature, humidity (15 columns)
- Missing rate: ~11% naturally occurring, plus artificially masked values for evaluation

PhysioNet dataset:
- Source: PhysioNet Challenge 2012
- Description: ICU patient records, 48-hour time series of 37 clinical variables
- Processing: Filtered to 5000 patients; rows with more than 60% missingness removed
- Missing rate: Varies by variable (10–80%)
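Both datasets are evaluated by hiding values that are actually known and then scoring the imputations against them. A minimal sketch of such a masking step follows. The function name `mask_at_random`, the missing-completely-at-random mechanism, and the 20% rate are assumptions for illustration, not the settings used to build Data_with_missing_AirQuality.csv or physio_with_missing.csv.

```python
import numpy as np


def mask_at_random(x, rate=0.2, seed=42):
    """Hide a random fraction of the observed entries; return data and mask."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(x)
    # Only entries that are currently observed can be held out for evaluation.
    mask = observed & (rng.uniform(size=x.shape) < rate)
    x_masked = x.copy()
    x_masked[mask] = np.nan
    return x_masked, mask


# Toy complete matrix: 50 time steps, 4 variables (made-up values).
complete = np.arange(200.0).reshape(50, 4)
with_missing, held_out = mask_at_random(complete, rate=0.2, seed=42)
```

Returning the boolean mask alongside the data keeps track of exactly which cells have ground truth available for scoring.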
python MCMC_MICE_codes/PhysioData_Loader.py

Outputs:
- Datasets/physio_subdata.csv
- Datasets/physio_with_missing.csv

python MCMC_MICE_codes/Comparison_runs.py

Compares tBayes-MICE, MICE and BRITS over 30 runs.

python MCMC_MICE_codes/Run_experiments.py

Runs all methods on both datasets over 30 independent runs.

python MCMC_MICE_codes/Visualisation.py

Reproduces all figures from the paper. Output saved to AirQuality_Plots/ and PhysioNet_Plots/.
| Item | Detail |
|---|---|
| Random seed | Fixed at 42 across all experiments |
| Number of runs | 30 independent runs per method |
| Hardware | UNSW Katana HPC cluster |
| HPC citation | DOI: 10.26190/669XA286 |
| Python version | 3.8+ |
| OS | Linux (Ubuntu 20.04) |
Results may vary slightly on different hardware due to floating-point precision differences. Reported metrics are means over 30 runs to account for this variability.
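How the 30 runs obtain independent randomness from the single fixed seed is not spelled out here. One common pattern, sketched below as an assumption rather than a description of Run_experiments.py, is to spawn one child random stream per run from the base seed and then report the mean and standard error across runs. The helper names and the placeholder scores are hypothetical.

```python
import numpy as np


def run_streams(base_seed=42, n_runs=30):
    """Derive one independent, reproducible generator per run from a base seed."""
    children = np.random.SeedSequence(base_seed).spawn(n_runs)
    return [np.random.default_rng(child) for child in children]


def summarise(metric_per_run):
    """Mean and standard error of a metric across runs."""
    m = np.asarray(metric_per_run, dtype=float)
    return m.mean(), m.std(ddof=1) / np.sqrt(len(m))


# Placeholder scores standing in for a real per-run metric (not actual results).
placeholder_scores = [rng.uniform(0.0, 1.0) for rng in run_streams()]
mean_score, sem_score = summarise(placeholder_scores)
```

Spawning from a `SeedSequence` keeps the whole experiment reproducible from one number while avoiding identical draws in every run.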
Performance on AirQuality dataset (MAE and RMSE, lower is better):
| Method | MAE | RMSE |
|---|---|---|
| MICE (classical) | — | — |
| tBayes-MICE V1 | — | — |
| BRITS | — | — |
Full results with confidence intervals are reported in the paper.
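For reference, MAE and RMSE for imputation are typically computed only over the artificially masked entries, where the true value is known. The sketch below shows that calculation; the function name `masked_mae_rmse` and the toy numbers are illustrative and are not results from the paper.

```python
import numpy as np


def masked_mae_rmse(truth, imputed, mask):
    """MAE and RMSE restricted to the held-out (masked) entries."""
    truth = np.asarray(truth, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    err = imputed[mask] - truth[mask]
    return np.mean(np.abs(err)), np.sqrt(np.mean(err ** 2))


# Toy example: two held-out cells with errors of 0.5 and -1.0.
truth = [[1.0, 2.0], [3.0, 4.0]]
imputed = [[1.0, 2.5], [2.0, 4.0]]
mask = [[False, True], [True, False]]
mae, rmse = masked_mae_rmse(truth, imputed, mask)
print(mae, rmse)
```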
If you use this code or results in your work, please cite:
@article{ibenegbu2026tbayes,
title={tBayes-MICE: A Bayesian Approach to Multiple Imputation for Time Series Data},
author={Ibenegbu, Amuche and de Micheaux, Pierre Lafaye and Chandra, Rohitash},
journal={arXiv preprint arXiv:2603.27142},
year={2026},
url={https://arxiv.org/abs/2603.27142}
}

- Found a bug? Open an issue
- Want to contribute? Fork the repository and submit a pull request
Experiments were run on the Katana High Performance Computing cluster, supported by Research Technology Services at UNSW Sydney.