Skip to content

General refactoring and cleanup in preparation for release v0.2#14

Draft
bgroenks96 wants to merge 28 commits intomainfrom
refactoring-v0.2
Draft

General refactoring and cleanup in preparation for release v0.2#14
bgroenks96 wants to merge 28 commits intomainfrom
refactoring-v0.2

Conversation

@bgroenks96
Copy link
Owner

@bgroenks96 bgroenks96 commented Jan 3, 2026

The primary goal of this PR is to make the package functional again with up-to-date dependencies (the v0.1.x versions currently have hard compat bounds that cause version conflicts for some package combinations) and to make several necessary changes to the API that should greatly improve clarity and usability.

Breaking changes

  • Removes hard dependencies on DiffEqBase, Zygote, and AbstractGPs.
  • Temporarily removes the Emulators module. This has always been arguably out-of-scope of the SimulationBasedInference package and increased the footprint by adding dependencies. My current plan is to re-introduce this code as a separate source module in the repository with the goal of eventually registering it as a standalone package. Also removes the GP likelihood for now.
  • Decouples SimulatorForwardProblem from the SciML problem interface by allowing it to directly handle arbitrary simulator functions/types.
  • Refactors the SimulatorObservable constructor(s) to be more clear and concise, as well as enabling do-syntax for the observable function.
  • Replaces ensemble_solve with a more general implementation for SimulatorForwardProblems that have matrix-valued parameters where columns correspond to parameter samples (ensemble members). This decouples the parallel/ensemble forward simulation code from the ensemble inference algorithms.
  • Simplify the usage of SimulationArrayStorage; e.g. ensemble matrices are now stored directly rather than being flattened into the input/output arrays.
  • Removes outdated Turing/Gen model constuctor code. These APIs were never fully tested and constituted an additional maintenance burden for relatively little gain. Users can still use the PPLs to define SimulationPriors or can directly incorporate SimulatorForwardProblem into a full probabilistic program on a case-by-case basis.

Other changes

  • Adds a new trait-based Simulator interface for defining external simulator types. The Simulator trait currently has three subtypes: ForwardMap, Iterative and Dynamical. The interfaces for Iterative and Dynamical simulators are extensions of the CommonSolve interface also used by SimulatorForwardProblem.
  • Adds a DiffEqBase extension module
  • Adds CI workflow for Julia 1.10 and 1.11
  • Updates all dependencies to their latest versions along with all necessary compatibility changes

This PR is still a WIP and I will update this with further changes.

@bgroenks96 bgroenks96 self-assigned this Jan 3, 2026
@rsenne
Copy link

rsenne commented Jan 16, 2026

The following code is not type stable:

struct SimulationArrayStorage{inputType, outputType, metaType} <: SimulationData{inputType, outputType}
    inputs::Vector{inputType}
    outputs::Vector{outputType}
    metadata::Vector{metaType}
end

SimulationArrayStorage(;
    input_type::Type = Any,
    output_type::Type = Any,
    metadata_type::Type = Any
) = SimulationArrayStorage(input_type[], output_type[], metadata_type[])

An alternate pattern is:

struct SimulationArrayStorage{I,O,M} <: SimulationData{I,O}
    inputs::Vector{I}
    outputs::Vector{O}
    metadata::Vector{M}
end

SimulationArrayStorage{I,O,M}() where {I,O,M} =
    SimulationArrayStorage(Vector{I}(), Vector{O}(), Vector{M}())

@rsenne
Copy link

rsenne commented Jan 16, 2026

You should consider adding tests using JET and Aqua. These will help keep the code clean and performant

@rsenne
Copy link

rsenne commented Jan 16, 2026

mutable struct SimulatorInferenceSolution{algType,probType,storageType}
    prob::probType
    alg::algType
    storage::storageType
    result::Any  
end

result should have its own type. Any is not type stable and lead to runtime dispatch

@rsenne
Copy link

rsenne commented Jan 16, 2026

I think my biggest comment overall is actually about types. In many places i feel types are either under-restricted (e.g., above). This can lead to degraded performance. I think we should lean on Julia's type promotion system.

Here is an example patter for EMSDAState that can be reused in many places.

mutable struct ESMDAState{ensType,meanType,covType,T<:Real} <: EnsembleState
    ens::ensType
    obs_mean::meanType
    obs_cov::covType
    prior::MvNormal
    loglik::Vector{T}     # Parametric but bounded by <:Real
    logprior::Vector{T}   
    iter::Int
    rng::AbstractRNG
end

# Convenience constructor with promotion
function ESMDAState(
    ens, obs_mean, obs_cov, prior,
    loglik::Vector{T1}=Float64[],
    logprior::Vector{T2}=Float64[],
    iter::Int=0,
    rng::AbstractRNG=Random.default_rng()
) where {T1<:Real, T2<:Real}
    T = promote_type(T1, T2)  # Promote to common type
    return ESMDAState{typeof(ens), typeof(obs_mean), typeof(obs_cov), T}(
        ens, obs_mean, obs_cov, prior,
        Vector{T}(loglik),
        Vector{T}(logprior),
        iter, rng
    )
end

# Usage is clean:
state = ESMDAState(ens, obs, obs_cov, prior)  # Defaults to Float64
state = ESMDAState(ens, obs, obs_cov, prior, Float32[], Float32[])  # Float32 for GPU

@bgroenks96
Copy link
Owner Author

bgroenks96 commented Jan 17, 2026

Hi @rsenne, thanks a lot for the feedback! Much appreciated.

I assure you that I am aware of type stability concerns and how Julia's type system works. As a general rule, if a struct in this package has abstract typed fields, it was done intentionally.

Fully type-parameterized structs do not come for free. They increase type-inference/compile-time as well as type complexity, making debugging (e.g. stack traces and error messages) messier and thus pose usability concerns.

Furthermore, the presence of an untyped or abstract typed field is not automatically a cause of type instability. It depends on how/where it is used; e.g. in this example that you pointed out:

mutable struct SimulatorInferenceSolution{algType,probType,storageType}
    prob::probType
    alg::algType
    storage::storageType
    result::Any  
end

result is not used by any performance critical code. It is simply assigned after the inference algorithm finishes. Thus, it does not cause any issues with type instability. It could pose an issue for user-code but this could be easily avoided with the use of function barriers. Note however that the whole API design around inference results still needs further re-design and improvement, so this specific case is a bit of a moot point.

Similarly here

struct SimulationArrayStorage{inputType, outputType, metaType} <: SimulationData{inputType, outputType}
    inputs::Vector{inputType}
    outputs::Vector{outputType}
    metadata::Vector{metaType}
end

SimulationArrayStorage(;
    input_type::Type = Any,
    output_type::Type = Any,
    metadata_type::Type = Any
) = SimulationArrayStorage(input_type[], output_type[], metadata_type[])

the use of Any as a default value is intentional because some simulator configurations may not have type information available for the state/output of the simulator. Thus, this alleviates the user of having to worry about precisely assigning types when specifying the simulator. Of course, for cases where type stability in the inference algorithm is a concern, the types can be passed to the constructor, thus making it type stable.

In general, however, I would not be terribly worried about degraded performance at the level of the inference algorithm. For SBI applications, it's rarely the bottleneck, at least in my experience. Usually, it's the simulator that you need to worry about.

So heeding the preeminent words of Donald Knuth -- "Premature optimization is the root of evil" -- I would make sure that any additional type complexity is justified by a specific, measurable performance issue due to type inference failure.

@bgroenks96
Copy link
Owner Author

That being said, you are absolutely right that we need more generic types for compatibility with GPU arrays and alternative number formats like Float32.

@rsenne
Copy link

rsenne commented Jan 18, 2026

Hi @bgroenks96!

I assure you that I am aware

Sorry! I wasn't trying to imply that you weren't aware, this is just what I saw on a cursory glance and you obviously know this code far better than I do. Your reasoning makes sense to me, so i agree that this isn't as big of a concern as my initial message suggested

@bgroenks96
Copy link
Owner Author

No worries, just wanted to make sure we're on the same page. Note that 3ed2928 adds generic types to the ensemble solvers and adds a type parameter for result (mostly because I think we will need it for dispatch) as you suggested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants