Closed
30 commits
725b0cd
feat(posterior sbc): implements posterior sbc and misc fixes
Cab14bacc May 4, 2026
f6c5cee
feat(tests): add tests for posterior sbc and renames the original sbc…
Cab14bacc May 4, 2026
8324301
chore(doc): add example for posterior sbc
Cab14bacc May 4, 2026
74b9dcb
chore(doc): update original sbc to prior sbc and deleting the old exa…
Cab14bacc May 4, 2026
48fd666
feat: add option for progressbar
Cab14bacc May 4, 2026
45c1ed2
chore(doc): add image for posterior sbc and remove progress bar from …
Cab14bacc May 4, 2026
6b423cb
fix(doc): fix prior sbc example link
Cab14bacc May 6, 2026
2f14df8
feat(compute_rank_statistics): introduce param_transform and re-evalu…
Cab14bacc May 6, 2026
bfb7113
fix(simulator): set observed and free vars based on simulator output.
Cab14bacc May 6, 2026
7478dd4
fix(test): modify centered_eight_no_observed model to have explicit y…
Cab14bacc May 6, 2026
9c80efa
feat(posterior sbc): implements posterior sbc and misc fixes
Cab14bacc May 4, 2026
57baec0
feat(tests): add tests for posterior sbc and renames the original sbc…
Cab14bacc May 4, 2026
d6b7d52
chore(doc): add example for posterior sbc
Cab14bacc May 4, 2026
afad610
chore(doc): update original sbc to prior sbc and deleting the old exa…
Cab14bacc May 4, 2026
e4ebfd9
feat: add option for progressbar
Cab14bacc May 4, 2026
306a8fc
chore(doc): add image for posterior sbc and remove progress bar from …
Cab14bacc May 4, 2026
4d64312
fix(doc): fix prior sbc example link
Cab14bacc May 6, 2026
218f0c4
Merge branch 'feat/Posterior_SBC' of https://github.com/Cab14bacc/sim…
Cab14bacc May 6, 2026
604eb11
feat(compute_rank_statistics): introduce param_transform and re-evalu…
Cab14bacc May 6, 2026
fe8a454
feat(posterior sbc): implements posterior sbc and misc fixes
Cab14bacc May 4, 2026
c2b1201
feat(tests): add tests for posterior sbc and renames the original sbc…
Cab14bacc May 4, 2026
7454f7b
chore(doc): add example for posterior sbc
Cab14bacc May 4, 2026
ae5cb23
chore(doc): update original sbc to prior sbc and deleting the old exa…
Cab14bacc May 4, 2026
535ca88
feat: add option for progressbar
Cab14bacc May 4, 2026
dd58653
chore(doc): add image for posterior sbc and remove progress bar from …
Cab14bacc May 4, 2026
ee93bec
fix(doc): fix prior sbc example link
Cab14bacc May 6, 2026
3e4a204
Merge branch 'feat/Posterior_SBC' of https://github.com/Cab14bacc/sim…
Cab14bacc May 6, 2026
4a29a6b
fix: remove duplicate definition of compute_rank_statistics
Cab14bacc May 6, 2026
4331fc7
fix: remove duplicate decorator
Cab14bacc May 6, 2026
0934050
fix: remove duplicate compute_rank_statistics call
Cab14bacc May 7, 2026
19 changes: 15 additions & 4 deletions docs/examples.rst
@@ -7,13 +7,24 @@ The gallery below presents examples that demonstrate the use of Simuk.
:gutter: 2 2 3 3

.. grid-item-card::
:link: ./examples/gallery/sbc.html
:link: ./examples/gallery/prior_sbc.html
:text-align: center
:shadow: none
:class-card: example-gallery

.. image:: examples/img/sbc.png
:alt: SBC
.. image:: examples/img/prior_sbc.png
:alt: Prior SBC

+++
SBC
Prior SBC

.. grid-item-card::
:link: ./examples/gallery/posterior_sbc.html
:text-align: center
:shadow: none
:class-card: example-gallery

.. image:: examples/img/posterior_sbc.png
:alt: Posterior SBC
+++
Posterior SBC
236 changes: 236 additions & 0 deletions docs/examples/gallery/posterior_sbc.md
@@ -0,0 +1,236 @@
---
jupytext:
text_representation:
extension: .md
format_name: myst
kernelspec:
display_name: Python 3
language: python
name: python3
---

# Posterior Simulation-Based Calibration

**Posterior SBC** (Säilynoja et al., 2025) validates the inference algorithm
*conditional on observed data*, rather than averaging over the prior.

```{admonition} When to use Posterior SBC
:class: tip

Use **Prior SBC** when you want to check that your inference pipeline works
for a wide range of datasets generated under the prior.

Use **Posterior SBC** when you already have observed data and want to verify
that the inference algorithm is trustworthy *for that specific dataset*.
Posterior SBC focuses on the region of the parameter space that matters
for the observed data, making it more sensitive to local calibration issues.
```

```{jupyter-execute}

import pymc as pm
from arviz_plots import plot_ecdf_pit, style
import matplotlib.pyplot as plt
import numpy as np
import simuk

style.use("arviz-variat")
```

## How Posterior SBC works

Given a model $\pi(\theta, y) = \pi(\theta)\,\pi(y \mid \theta)$ and
observed data $y_{\text{obs}}$, Posterior SBC proceeds as follows:

1. **Fit the model** to $y_{\text{obs}}$ to obtain posterior draws
$\theta'_i \sim \pi(\theta \mid y_{\text{obs}})$.
2. **Generate replicated data** from the posterior predictive:
$y_i \sim \pi(y \mid \theta'_i)$.
3. **Augment** the observations: $y_{\text{aug}} = (y_{\text{obs}}, y_i)$.
4. **Re-fit the model** on the augmented data to get
$\theta''_{i,1}, \ldots, \theta''_{i,S} \sim \pi(\theta \mid y_i, y_{\text{obs}})$.
5. **Compute the rank statistic** of $f(\theta'_i)$ among $f(\theta''_{i,1}), \ldots, f(\theta''_{i,S})$, where $f$ is an optional test quantity applied to the parameters before the ranks are computed.

By the self-consistency of Bayesian updating, $\theta'_i$ is also a draw
from the augmented posterior $\pi(\theta \mid y_i, y_{\text{obs}})$.
Therefore the rank statistics should be **uniformly distributed** if the inference
is calibrated.
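
The five steps can be traced end to end on a toy problem where every posterior is available in closed form. The sketch below is an illustration only and does not use Simuk: it assumes a normal likelihood with known unit noise and a normal prior on the mean, and the helper names `normal_posterior` and `posterior_sbc_ranks` are hypothetical.

```python
import numpy as np


def normal_posterior(y, prior_mu=0.0, prior_sd=10.0, noise_sd=1.0):
    """Exact posterior mean and sd for the mean of a normal with known noise_sd."""
    precision = 1.0 / prior_sd**2 + len(y) / noise_sd**2
    mean = (prior_mu / prior_sd**2 + np.sum(y) / noise_sd**2) / precision
    return mean, np.sqrt(1.0 / precision)


def posterior_sbc_ranks(y_obs, num_simulations=500, num_draws=99, seed=0):
    rng = np.random.default_rng(seed)
    mean_obs, sd_obs = normal_posterior(y_obs)
    ranks = np.empty(num_simulations, dtype=int)
    for i in range(num_simulations):
        theta_ref = rng.normal(mean_obs, sd_obs)                  # step 1: posterior draw
        y_rep = rng.normal(theta_ref, 1.0, size=len(y_obs))       # step 2: replicated data
        y_aug = np.concatenate([y_obs, y_rep])                    # step 3: augment
        mean_aug, sd_aug = normal_posterior(y_aug)                # step 4: "re-fit" (exact here)
        theta_aug = rng.normal(mean_aug, sd_aug, size=num_draws)
        ranks[i] = np.sum(theta_aug < theta_ref)                  # step 5: rank statistic
    return ranks


y_obs = np.random.default_rng(1).normal(2.0, 1.0, size=20)
ranks = posterior_sbc_ranks(y_obs)
```

Because each re-fit in this sketch is an exact posterior, the ranks should be spread evenly over 0 to 99. Replacing the exact re-fit with a faulty sampler is the kind of problem the check is meant to expose.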

## Example: Linear Regression Model

### Define the model

```{admonition} Model requirements for Posterior SBC
:class: warning

Posterior SBC augments the observed data by concatenating the original and replicated observations, which changes its size. For this to work, store the observed data in ``pm.Data``
containers and specify its size with the ``dims`` parameter rather than a static shape.
If your model uses ``dims`` and ``coords``, you are responsible for resizing the coordinates to match the augmented dataset in the ``update_data`` callback.
Likewise, store any covariates in ``pm.Data`` so they
can be resized in the same callback.
```

```{jupyter-execute}

random_seed = 42
np.random.seed(random_seed)

x_data = np.linspace(0, 10, 100)
y_data = np.random.normal(x_data ** 1.2, 1)

coords = {
    "obs_id": np.arange(len(x_data))
}

with pm.Model(coords=coords) as model:
    model_x_data = pm.Data("x_data", x_data, dims="obs_id")
    model_y_data = pm.Data("y_data", y_data, dims="obs_id")

    alpha = pm.Normal("alpha", mu=0, sigma=10)
    beta = pm.Normal("beta", mu=0, sigma=10)
    sigma = pm.HalfNormal("sigma", sigma=10)

    # pm.Deterministic forces PyMC to track this equation's output
    mu = pm.Deterministic("mu", alpha + beta * model_x_data)
    y = pm.Normal("y", mu=mu, sigma=sigma, observed=model_y_data)
```

### Fit the original posterior

First, we need the posterior samples from the observed data. These will
serve as the reference distribution for Posterior SBC.

```{jupyter-execute}

with model:
    idata = pm.sample(200, random_seed=random_seed, progressbar=False)
```

### Using `update_data` with covariates and `dims`

When your model uses `dims`/`coords` or has covariates stored in `pm.Data`,
you must provide an `update_data` callback that resizes everything to
match the augmented observations. The callback is called **before** the model
is re-conditioned, and runs inside the model context.

```{jupyter-execute}

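def update_data(model, augmented_data, simulation_idx):
    # The augmented outcomes are the observed ones followed by one replicated set,
    # so the covariates are repeated once to line up with them.
    x_original = model["x_data"].get_value()
    with model:
        pm.set_data(
            {"x_data": np.concatenate([x_original, x_original])},
            coords={"obs_id": np.arange(len(augmented_data["y"]))},
        )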
```
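
The size bookkeeping behind this callback can be checked without PyMC. This sketch assumes, as the callback above does, that the replicated outcomes are paired with the same covariate values as the observed ones. `repeat_covariates` is a hypothetical helper, not part of Simuk.

```python
import numpy as np


def repeat_covariates(x_observed, n_augmented):
    """Tile x_observed until it has n_augmented entries (must divide evenly)."""
    n_observed = len(x_observed)
    if n_augmented % n_observed != 0:
        raise ValueError("augmented size must be a multiple of the observed size")
    x_augmented = np.tile(x_observed, n_augmented // n_observed)
    # The coordinate values simply index the augmented observations.
    return x_augmented, np.arange(n_augmented)


x_observed = np.linspace(0, 10, 100)
x_augmented, obs_id = repeat_covariates(x_observed, n_augmented=200)
```

Every array that shares the `obs_id` dimension has to end up with the same length as the augmented outcomes; a mismatch typically surfaces as a shape error when the model is re-fit.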

### Custom test quantities with `param_transform`

You can define a scalar test quantity applied to both the reference draw
and the posterior draws before computing the rank statistic. The function
receives `(param_name, param_value)` and should return a comparable value.

```{jupyter-execute}

def param_transform(param_name, param_value):
    # np.power works across NumPy versions; the np.pow alias exists only in NumPy 2.x
    return np.power(param_value, 2)
```
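
Only the ordering of values affects a rank, so a strictly increasing transform leaves the rank statistic unchanged, whereas a non-monotonic one such as squaring can change it for parameters that take both signs. The following NumPy sketch illustrates this with made-up draws; `rank_of` is a hypothetical helper and not Simuk's implementation.

```python
import numpy as np


def rank_of(reference, draws, transform=lambda v: v):
    """Count how many transformed draws fall below the transformed reference."""
    return int(np.sum(transform(draws) < transform(reference)))


rng = np.random.default_rng(0)
draws = rng.normal(0.0, 1.0, size=99)   # stand-in for posterior draws
reference = -0.5                        # stand-in for the reference draw

rank_identity = rank_of(reference, draws)
rank_cubed = rank_of(reference, draws, lambda v: v**3)   # strictly increasing: same rank
rank_squared = rank_of(reference, draws, np.square)      # non-monotonic: rank may differ
```

This is why a squared test quantity is informative for `alpha` and `beta`, which can be negative, but adds nothing for a strictly positive parameter such as `sigma`.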

### Run Posterior SBC

Pass `method="posterior"` and provide the `trace`. Each iteration
generates replicated data from the posterior predictive, augments it
with the original observations, and re-fits the model.

```{jupyter-execute}
sbc = simuk.SBC(
    model,
    method="posterior",
    trace=idata,
    param_transform=param_transform,
    update_data=update_data,
    num_simulations=50,
    seed=random_seed,
    sample_kwargs={"chains": 4, "draws": 50, "tune": 50},
    progress_bar=False,
)

sbc.run_simulations();
```

### Visualize the results

We expect the ECDF lines to fall inside the grey simultaneous confidence
band, indicating that the ranks are consistent with a uniform distribution.

```{jupyter-execute}

plot_ecdf_pit(
    sbc.simulations,
    group="posterior_sbc",
    visuals={"xlabel": False},
);
```
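
For a quick numeric companion to the plot, the largest gap between the empirical CDF of the normalized ranks and the uniform CDF can be computed directly. This is a rough sketch and not the simultaneous band that `plot_ecdf_pit` draws; the function name `max_ecdf_deviation` and the half-step normalization of the ranks are assumptions made for illustration.

```python
import numpy as np


def max_ecdf_deviation(ranks, num_draws):
    """Largest absolute gap between the ECDF of normalized ranks and Uniform(0, 1)."""
    pit = np.sort((np.asarray(ranks) + 0.5) / (num_draws + 1))
    n = len(pit)
    ecdf_after = np.arange(1, n + 1) / n   # ECDF value just after each point
    ecdf_before = np.arange(0, n) / n      # ECDF value just before each point
    return float(max(np.max(ecdf_after - pit), np.max(pit - ecdf_before)))


evenly_spread = max_ecdf_deviation(np.arange(100), num_draws=99)
all_at_zero = max_ecdf_deviation(np.zeros(100, dtype=int), num_draws=99)
```

Small values are consistent with uniform ranks, while values near 1 indicate that the ranks pile up at one end. The plot remains the better diagnostic because its band accounts for sampling variability.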

## Skewing the Augmented Posterior with a Custom `augment_observed`

To provoke a failure, we keep only the last 25 original observations and concatenate them with the replicated data. The reference draw is still conditioned on the full observed data, whereas the augmented posterior sees only a subset of it, so the two no longer match and the rank statistics become skewed.

```{jupyter-execute}

def augment_observed(model, observed_data, replicated_data, simulation_idx):
    """Keep only the last 25 original observations + replicated."""
    data = {"y": np.concatenate([observed_data["y"].values[-25:], replicated_data["y"]])}
    return data


def update_data(model, augmented_data, simulation_idx):
    with model:
        pm.set_data(
            {
                "x_data": np.concatenate(
                    [model["x_data"].get_value()[-25:], model["x_data"].get_value()]
                )
            },
            coords={"obs_id": np.arange(25 + len(model["x_data"].get_value()))},
        )


skewed_sbc = simuk.SBC(
    model,
    method="posterior",
    trace=idata,
    augment_observed=augment_observed,
    update_data=update_data,
    num_simulations=50,
    sample_kwargs={"chains": 4, "draws": 50, "tune": 50},
    progress_bar=False,
)

skewed_sbc.run_simulations()
```

### Visualize the skewed results

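The ECDF lines fall outside the confidence band, a clear deviation from uniformity. This is expected: because the augmentation dropped part of the observed data, the reference draws are no longer draws from the augmented posterior, so the self-consistency property of Bayesian updating does not hold.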

```{jupyter-execute}

plot_ecdf_pit(skewed_sbc.simulations, group="posterior_sbc", visuals={"xlabel": False})
```
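
The same effect can be reproduced on a toy problem with closed-form posteriors. This sketch does not use Simuk: it assumes a normal likelihood with unit noise and a wide normal prior on the mean, and `normal_posterior` and `mean_rank` are hypothetical helpers.

```python
import numpy as np


def normal_posterior(y, prior_sd=10.0, noise_sd=1.0):
    """Exact posterior mean and sd for a normal mean with a N(0, prior_sd^2) prior."""
    precision = 1.0 / prior_sd**2 + len(y) / noise_sd**2
    return (np.sum(y) / noise_sd**2) / precision, np.sqrt(1.0 / precision)


def mean_rank(y_obs, keep_last=None, num_simulations=500, num_draws=99, seed=0):
    rng = np.random.default_rng(seed)
    mean_obs, sd_obs = normal_posterior(y_obs)      # reference posterior uses all data
    y_kept = y_obs if keep_last is None else y_obs[-keep_last:]
    ranks = []
    for _ in range(num_simulations):
        theta_ref = rng.normal(mean_obs, sd_obs)
        y_rep = rng.normal(theta_ref, 1.0, size=len(y_obs))
        # Augmented posterior sees only the kept observations plus the replicates.
        mean_aug, sd_aug = normal_posterior(np.concatenate([y_kept, y_rep]))
        theta_aug = rng.normal(mean_aug, sd_aug, size=num_draws)
        ranks.append(np.sum(theta_aug < theta_ref))
    return float(np.mean(ranks))


y_obs = np.linspace(0.0, 4.0, 20)         # trending data, so the last points are atypical
consistent = mean_rank(y_obs)             # close to num_draws / 2
skewed = mean_rank(y_obs, keep_last=5)    # pulled well below num_draws / 2
```

Keeping only the last, larger observations shifts the augmented posterior upward, so the reference draws rank low among its draws.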

For comparison, we can replot the original Posterior SBC results. `compute_rank_statistics` re-evaluates the ranks, here with an identity transform in place of the squared test quantity, without re-running the simulations.

```{jupyter-execute}

sbc.compute_rank_statistics(lambda _, param_value: param_value)
plot_ecdf_pit(sbc.simulations, group="posterior_sbc", visuals={"xlabel": False})
```

## References

- Säilynoja, T., Schmitt, M., Bürkner, P.-C., & Vehtari, A. (2025).
*Posterior SBC: Simulation-Based Calibration Checking Conditional on Data*.
[arXiv:2502.03279](https://arxiv.org/abs/2502.03279)
- Talts, S., Betancourt, M., Simpson, D., Vehtari, A., & Gelman, A. (2020).
*Validating Bayesian Inference Algorithms with Simulation-Based Calibration*.
[arXiv:1804.06788](https://arxiv.org/abs/1804.06788)
Original file line number Diff line number Diff line change
@@ -9,7 +9,7 @@ kernelspec:
name: python3
---

# Simulation based calibration
# Prior Simulation-Based Calibration

```{jupyter-execute}

@@ -19,8 +19,8 @@ import simuk
style.use("arviz-variat")
```

## Out-of-the-box SBC
This example demonstrates how to use the `SBC` class for simulation-based calibration, supporting PyMC, Bambi and Numpyro models. By default, the generative model implied by the probabilistic model is used.
## Out-of-the-box Prior SBC
This example demonstrates how to use the `SBC` class for prior simulation-based calibration, supporting PyMC, Bambi and NumPyro models. By default, the generative model implied by the probabilistic model is used.


::::::{tab-set}
Binary file added docs/examples/img/posterior_sbc.png
File renamed without changes
69 changes: 67 additions & 2 deletions docs/index.rst
@@ -3,9 +3,9 @@ Overview

Simuk is a Python library for simulation-based calibration (SBC) and the generation of synthetic data.
Simulation-Based Calibration (SBC) is a method for validating Bayesian inference by checking whether the
posterior distributions align with the expected theoretical results derived from the prior.
posterior distributions align with the expected theoretical results derived from the prior (Prior SBC) or, conditional on observed data, from the posterior (Posterior SBC).

Quickstart
Prior SBC Quickstart
--------------------

This quickstart guide provides a simple example to help you get started. If you're looking for more examples
@@ -52,6 +52,71 @@ Plot the empirical CDF to compare the differences between the prior and posterio
The lines should be nearly uniform and fall within the oval envelope. It suggests that the prior and posterior distributions
are properly aligned and that there are no significant biases or issues with the model.

Posterior SBC Quickstart
------------------------

While Prior SBC checks the global validity of an inference algorithm across the entire prior space,
Posterior SBC evaluates validity locally, conditional on your observed data. It is currently implemented only for PyMC.
To use it, pass ``method="posterior"`` and the original ``trace`` to the ``SBC`` class.

.. warning::

    **Model requirements for Posterior SBC**

    Posterior SBC augments the observed data (concatenating original + replicated),
    which changes its size. For this to work, store observed data in ``pm.Data``
    containers, and specify size using the ``dims`` parameter instead of setting a static shape.
    If your model uses ``dims`` and ``coords``, you are also responsible for resizing them to match the augmented dataset via the ``update_data`` callback.
    Similarly, if your model has covariates, store them in ``pm.Data`` so they
    can be resized in the same callback.

.. code-block:: python

    # Define the model conforming to the Posterior SBC implementation requirements.
    import numpy as np
    import pymc as pm
    import simuk
    from arviz_plots import plot_ecdf_pit

    data = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
    sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

    with pm.Model(coords={"school": np.arange(8)}) as centered_eight:
        school_idx = pm.Data("school_idx", np.arange(8))
        y_data = pm.Data("y_data", data)
        sigma_data = pm.Data("sigma_data", sigma)

        mu = pm.Normal('mu', mu=0, sigma=5)
        tau = pm.HalfCauchy('tau', beta=5)
        theta = pm.Normal('theta', mu=mu, sigma=tau, dims="school")
        y_obs = pm.Normal('y', mu=theta[school_idx], sigma=sigma_data, observed=y_data)

    # Run the model and save the trace.
    with centered_eight:
        idata = pm.sample(progressbar=False)

    # Define necessary callbacks to resize our covariates
    def update_data(model, augmented_data, simulation_idx):
        with model:
            pm.set_data({
                "sigma_data": np.concatenate([sigma, sigma]),
                "school_idx": np.concatenate([np.arange(8), np.arange(8)])
            })

    # Run Posterior SBC
    post_sbc = simuk.SBC(
        centered_eight,
        method="posterior",
        trace=idata,
        update_data=update_data,
        num_simulations=100,
        sample_kwargs={'draws': 25, 'tune': 50},
        progress_bar=False
    )
    post_sbc.run_simulations()

    plot_ecdf_pit(post_sbc.simulations, group="posterior_sbc", visuals={"xlabel": False})

For more advanced use cases, such as custom data augmentation or re-evaluating rank statistics, check out the :doc:`Posterior SBC tutorial <examples/gallery/posterior_sbc>`.

.. toctree::
   :maxdepth: 1
   :hidden: