101 changes: 101 additions & 0 deletions docs/conventions.md
@@ -0,0 +1,101 @@
# `cuthbert` conventions

On this page we'll explain and justify some of the conventions and design decisions
made in `cuthbert`.


## Unified Interface

{%
include-markdown "../cuthbert/README.md"
start="<!--unified-interface-start-->"
end="<!--unified-interface-end-->"
%}

!!! success "Justification"
    The unified interface encapsulates all inference-specific details within the
    `build_filter` arguments, so the subsequent methods are shared across all
    inference methods. The user can then swap between inference methods and use the
    model-agnostic `cuthbert.filter` and `cuthbert.smoother` methods.


## Filter as a unified operation (no `predict` or `update`)

`cuthbert` methods do not have individual `predict` or `update` methods. Instead,
the two are unified into a single `filter` method. In practice the user can still
perform separate `predict` and `update` steps if they so desire.

The user can achieve a `predict` step through a degenerate observation
i.e. $p(y_t \mid x_t) \propto 1$.

Similarly, an `update` step can be achieved through degenerate dynamics,
i.e. $p(x_t \mid x_{t-1}) = \delta(x_t - x_{t-1})$.

All `cuthbert` methods support these degenerate cases through appropriate specification
of `model_inputs` and the functions passed to `build_filter`.

> **Review comment:** Maybe I'm unfamiliar with the context. Why would a user want to
> do predict and update steps? When would they not just want the full filter step?
>
> **Author reply:** The main one that comes to mind is online inference, when the user
> wants to make decisions based on a predict step before observing the data.
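
To make the unification concrete, here is a minimal discrete-state sketch (plain
NumPy, not `cuthbert`'s API; `filter_step` and its arguments are hypothetical
illustrations) of how a single filter step internally performs predict followed by
update:

```python
import numpy as np

def filter_step(prev_filt, trans_matrix, obs_lls):
    """One discrete-state filter step: predict then update in a single call.

    prev_filt:    p(x_{t-1} | y_{1:t-1}) as a probability vector
    trans_matrix: p(x_t | x_{t-1}) as a row-stochastic matrix
    obs_lls:      log p(y_t | x_t) per state
    """
    predicted = prev_filt @ trans_matrix   # predict: p(x_t | y_{1:t-1})
    unnorm = predicted * np.exp(obs_lls)   # update: multiply by the likelihood
    return unnorm / unnorm.sum()           # normalise to p(x_t | y_{1:t})

prev = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1], [0.2, 0.8]])
obs_lls = np.log(np.array([0.7, 0.1]))

filt = filter_step(prev, trans, obs_lls)
```

Making either half degenerate, as described below, recovers a pure predict or a pure
update from this same single call.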

### Degenerate observation (predict)

- For discrete methods a degenerate observation is achieved by making
`get_obs_lls` return a constant array (i.e. all zeros).

- For Gaussian methods a degenerate observation is achieved by setting the observation
to an array of all `jnp.nan`.

- For SMC methods a degenerate observation is achieved by making
`log_potential` return a constant (i.e. zero).
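
As a quick sanity check of the discrete case (a NumPy sketch, not `cuthbert`'s API),
a constant log-likelihood cancels under normalisation, so the filter step reduces to
pure prediction:

```python
import numpy as np

prev = np.array([0.5, 0.5])                 # p(x_{t-1} | y_{1:t-1})
trans = np.array([[0.9, 0.1], [0.2, 0.8]])  # p(x_t | x_{t-1})

predicted = prev @ trans                    # predict step
obs_lls = np.zeros(2)                       # degenerate: all-zero log-likelihoods

unnorm = predicted * np.exp(obs_lls)        # "update" with a flat likelihood
posterior = unnorm / unnorm.sum()
# The constant likelihood normalises away: posterior equals predicted.
```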


### Degenerate dynamics (update)

- For discrete methods degenerate dynamics are achieved by making
`get_trans_matrix` return the identity matrix.

- For Gaussian methods degenerate dynamics are achieved by setting `chol_Q` to a zero
  matrix. The exception is `gaussian.taylor`, where the user does not define `chol_Q`
  directly; there, degenerate dynamics are achieved with a `linearization_point` of
  all `jnp.nan`.

- For SMC methods degenerate dynamics are achieved by making
`propagate_sample` return the previous state unchanged.
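
Again in the discrete case (a NumPy sketch, not `cuthbert`'s API), the identity
transition matrix makes the predict half a no-op, so the filter step reduces to a
pure Bayes update of the previous filtering distribution:

```python
import numpy as np

prev = np.array([0.5, 0.5])             # p(x_{t-1} | y_{1:t-1})
obs_lls = np.log(np.array([0.7, 0.1]))  # log p(y_t | x_t)

predicted = prev @ np.eye(2)            # degenerate dynamics: identity matrix
unnorm = predicted * np.exp(obs_lls)    # update with the real likelihood
posterior = unnorm / unnorm.sum()
# predicted == prev, so no dynamics are applied before the update.
```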


!!! success "Justification"
    The single `filter` method provides a simplified interface which is easier to test
    and maintain whilst giving the user only one method to call.



## No initial observation

`cuthbert` adopts the convention of defining state-space models as $p(x_{0:T} \mid y_{1:T})$
rather than the $p(x_{1:T} \mid y_{1:T})$ form common in some other implementations
(such as [`dynamax`](https://github.com/probml/dynamax)).

Given the above support for degenerate dynamics and observations, the user can achieve
a $p(x_{1:T} \mid y_{1:T})$ model by passing degenerate dynamics to the first step
of a $p(x_{0:T} \mid y_{1:T})$ model.
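
For instance (a discrete-state NumPy sketch, not `cuthbert`'s API), with identity
dynamics at the first step, $x_1$ simply inherits the prior placed on $x_0$,
recovering a model whose first latent state is effectively $x_1$:

```python
import numpy as np

prior_x0 = np.array([0.3, 0.7])             # place the intended p(x_1) prior on x_0
trans = np.array([[0.9, 0.1], [0.2, 0.8]])  # the model's real dynamics

x1_predicted = prior_x0 @ np.eye(2)  # step 1: degenerate (identity) dynamics
x2_predicted = x1_predicted @ trans  # later steps use the real dynamics
# x_1's prior is exactly the distribution placed on x_0.
```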

!!! success "Justification"
    The decision to omit an initial observation was made to support
    [factorial state-space models](https://doi.org/10.1093/jrsssc/qlae035), where the
    initialisation applies globally to all factors whereas filter steps act locally on
    a small number of factors. Including an initial observation resulted in awkward
    shape mismatches in the `init_prepare` function.



## Square root covariance matrices




## Why `prepare` and `combine`?




## What goes in `model_inputs`?




## No dedicated methods for parameter estimation
2 changes: 1 addition & 1 deletion docs/quickstart.md
@@ -1,4 +1,4 @@
-# Quick start
+# Quickstart

This guide will get you up and running with `cuthbert` for state-space model inference.
We'll walk through an example of ranking international football teams over
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -31,7 +31,8 @@ theme:

nav:
- Home: index.md
-- Quick Start: quickstart.md
+- Quickstart: quickstart.md
+- cuthbert conventions: conventions.md
> **Review comment:** Taking your headings, I think I would create a new docs
> subsection called "Under the hood" or "Backend" or something like that, with the
> following:
>
> - Under the hood
>     - What cuthbert does and does not do
>     - Backend design
>     - Conventions: conventions.md
>     - Parallel filter
>
> I would focus on why it was designed like this and not other ways.
>
> "What cuthbert does and does not do" would basically explain that cuthbert does the
> inner loop (list current methods, future methods, and possibly a roadmap for the
> library), not the parameter-estimation outer loop.
>
> Conventions can contain: no initial observation, square root covariance matrices.
>
> Backend design can contain:
>
> - Overview of the design / unified interface: `filter`, `smoother` (common) and
>   `build_filter` (the part that differs between inference algorithms). My notes
>   from someone fresh to the project may be helpful documentation: explain the
>   backend design #217
> - No `predict` and `update`. This is what's in the textbook, but we don't do it
>   because we want to make an affine map $m_{t+1|t+1} = A m_{t|t} + b$ so that
>   parallel inference is easier (I think that's why). And then some notes on
>   `prepare` and `combine` and how they work, with nuances for each method (noop
>   for SMC?).
> - Degenerate dynamics
>
> The parallel filter page can explain how the associative scan gets a speedup. Feel
> free to just copy/edit the notes I wrote here: #217 (comment)
>
> **Author reply:** Definitely agree on the "why it was designed like this and not
> other ways"!
>
> I'm currently mainly concerned that it's a bit dense and unreadable lol. Splitting
> it up should help with that!
- Contributing: contributing.md
- Examples:
- examples/index.md