Tutorials on building PyRenew multisignal models#819
Conversation
for more information, see https://pre-commit.ci
…into mem_tutorial_h_e_approx
|
@cdc-mitzimorris can you address the CI website build failure? |
website should build now. I highly recommend rendering the docs and reading the rendered tutorial before diving into the Quarto markdown. |
…into mem_tutorial_h_e_approx
|
Thank you for your contribution @cdc-mitzimorris 🚀! Your github-pages is ready for download 👉 here 👈! |
| The components are: | ||
|
|
||
| - **Latent infections**: `PopulationInfections` produces a single aggregate infection trajectory. | ||
| - **$\mathcal{R}(t)$process**: `WeeklyTemporalProcess` samples weekly values and broadcasts them to the daily renewal equation. |
There was a problem hiding this comment.
| - **$\mathcal{R}(t)$process**: `WeeklyTemporalProcess` samples weekly values and broadcasts them to the daily renewal equation. | |
| - **$\mathcal{R}(t)$ process**: `WeeklyTemporalProcess` samples weekly values and broadcasts them to the daily renewal equation. |
| ) | ||
| ``` | ||
|
|
||
| This tutorial builds a small hospital + emergency department model using PyRenew's high-level components and `PyrenewBuilder` class to specify a `MultiSignalModel` object, then fits it to a synthetic dataset. |
There was a problem hiding this comment.
| This tutorial builds a small hospital + emergency department model using PyRenew's high-level components and `PyrenewBuilder` class to specify a `MultiSignalModel` object, then fits it to a synthetic dataset. | |
| This tutorial builds a small hospital (H) + emergency department (E) model using PyRenew's high-level components and `PyrenewBuilder` class to specify a `MultiSignalModel` object, then fits it to a synthetic dataset. |
| It requires the following: | ||
|
|
||
| - **Generation interval**: PMF for secondary infection timing. | ||
| - **Initial conditions ($I(0)$ and $\log(\mathcal{R}(0))$)**: Infection prevalence at time $0$ as a proportion of the population, and the starting point of the $\log(\mathcal{R}(t))$ trajectory. |
There was a problem hiding this comment.
| - **Initial conditions ($I(0)$ and $\log(\mathcal{R}(0))$)**: Infection prevalence at time $0$ as a proportion of the population, and the starting point of the $\log(\mathcal{R}(t))$ trajectory. | |
| - **Initial conditions ($I(0)$ and $\log(\mathcal{R}(0))$)**: Infection incidence at time $0$ as a proportion of the population, and the starting point of the $\log(\mathcal{R}(t))$ trajectory. |
| The logit-scale representation keeps sampled rates between 0 and 1 after transformation back to the probability scale. | ||
|
|
||
| We use a common generic baseline rate and a logit-scale standard deviation of $0.3$, matching the prior scale used by custom-H-E. | ||
| The prior correlation of $0.5$ says that the two rates may differ, but prior draws that are high for hospital ascertainment also tend to be high for ED ascertainment. |
There was a problem hiding this comment.
Mention that correlated ascertainment is important for identifiability.
| The day-of-week effect multiplies the expected daily ED visits before the count observation model is applied. | ||
| Each day of the week has its own multiplier: a value above 1 increases expected ED visits on that day, and a value below 1 decreases them. | ||
|
|
||
| Following the custom H+E model, we place a symmetric Dirichlet prior with concentration value $5$ on these weights and scale the draw by $7$. |
There was a problem hiding this comment.
@dylanhmorris Can you suggest a justification to include here?
| ) | ||
| hosp_plot_df["week_start"] = hosp_plot_df["week_end"] - pd.Timedelta(days=6) | ||
|
|
||
| raw_obs_plot = ( |
There was a problem hiding this comment.
I was thinking two dot plots would be a better visualization, but I kind of like that the thickness of the bars calls attention to the daily vs weekly scale.
| ) | ||
| ``` | ||
|
|
||
| We compare the model estimates to the data-generating parameters. |
There was a problem hiding this comment.
Can we make it easier to compare the true value to the posterior? By putting them in the same table?
| rt_comparison_df = pd.concat(rt_comparison_parts, ignore_index=True) | ||
| ``` | ||
|
|
||
| ```{python} |
| ```{python} | ||
| #| label: synthetic-recovery-check | ||
|
|
||
| rhat = pyrenew_parameter_summary["r_hat"].astype(float) | ||
| ess = pyrenew_parameter_summary["ess_bulk"].astype(float) | ||
| print(f"Max R-hat: {rhat.max():.3f}") | ||
| print(f"Min bulk ESS: {ess.min():.0f}") | ||
|
|
||
| latent_inf = idata_trimmed.posterior["latent_infections"] | ||
| inf_q05 = latent_inf.quantile(0.05, dim=["chain", "draw"]).values | ||
| inf_q50 = latent_inf.quantile(0.50, dim=["chain", "draw"]).values | ||
| inf_q95 = latent_inf.quantile(0.95, dim=["chain", "draw"]).values | ||
|
|
||
| true_infections = daily_infections["true_infections"].to_numpy() | ||
| n_compare = min(len(true_infections), len(inf_q05)) | ||
| inf_covered = (true_infections[:n_compare] >= inf_q05[:n_compare]) & ( | ||
| true_infections[:n_compare] <= inf_q95[:n_compare] | ||
| ) | ||
| print(f"90% interval coverage for true infections: {np.mean(inf_covered):.1%}") | ||
| ``` |
There was a problem hiding this comment.
This table needs more commentary. I'm not sure what it's about.
| ) | ||
| + theme_tutorial | ||
| ) | ||
| ``` |
There was a problem hiding this comment.
I think it would make more sense here to show the posterior predictive for E and H, not just a simple MA.
| ) | ||
| + theme_tutorial | ||
| ) | ||
| ``` |
There was a problem hiding this comment.
What do we learn from this plot that we didn't learn from the previous one? Why bin into two week estimates?
There was a problem hiding this comment.
After further reading, I see that this introduces the idea of relative widths, which are an important comparison point between the models.
| rt_plot | ||
| ``` | ||
|
|
||
| Rt is useful as a companion to latent infections because it focuses on epidemic shape rather than absolute infection scale. |
There was a problem hiding this comment.
| Rt is useful as a companion to latent infections because it focuses on epidemic shape rather than absolute infection scale. | |
| $\mathcal{R}(t)$ is useful as a companion to latent infections because it focuses on epidemic shape rather than absolute infection scale. |
| ``` | ||
|
|
||
| Rt is useful as a companion to latent infections because it focuses on epidemic shape rather than absolute infection scale. | ||
| The plot below compares Rt across all three models against the synthetic data-generating Rt trajectory. |
There was a problem hiding this comment.
The intro only mentions 2 models. Perhaps I was not reading carefully enough. Can we state what the three models are at the very beginning?
There was a problem hiding this comment.
I think the names could be a bit clearer, as well.
Something like
pyrenew-H-E -> independent-ascertainment
linked-pyrenew-H-E -> correlated-ascertainment
custom-H-E -> ratio-ascertainment
?
| .reset_index() | ||
| ) | ||
| latent_width_summary | ||
| ``` |
There was a problem hiding this comment.
The above table needs much more detail, I think. Mean relative width of what?
|
|
||
| ```{python} | ||
| #| label: linked-latent-infections | ||
| #| fig-cap: Posterior latent infections under the two PyRenew parameterizations |
There was a problem hiding this comment.
I just realized these figure captions aren't showing in the html (I have just been reviewing the html, not the qmd). We should figure out how to display them.
| ) | ||
| + theme_tutorial | ||
| ) | ||
| ``` |
| ) | ||
| + theme_tutorial | ||
| ) | ||
| ``` |
There was a problem hiding this comment.
yikes. They all fail to recover the IEDR and IHR. Can we plot the priors too? Perhaps in a facet above?
| ) | ||
| + theme_tutorial | ||
| ) | ||
| ``` |
There was a problem hiding this comment.
Would love to see priors here too.
There was a problem hiding this comment.
Really nice work @cdc-mitzimorris! I have only reviewed the new tutorial.


This PR adds tutorial "Multi-signal Model for Weekly Hospital and Daily ED Data" which compares a PyRenew
MultiSignalModelto the production H+E model fitting both on synthetic data.