GPU-Accelerated Data-Driven Bayesian Uncertainty Quantification with Koopman Operators
Suppose you have data and a general model, and you would like to evaluate the probability that the fitted model's outcomes exhibit a given behavior. The purpose of this tutorial is to demonstrate a fast workflow for doing exactly this. It composes together a few different pieces of the SciML ecosystem:
- Bayesian parameter estimation with uncertainty, by integrating the differentiable differential equation solvers with the Turing.jl library.
- Fast calculation of probabilistic estimates of differential equation solutions with parametric uncertainty using the Koopman expectation.
- GPU-acceleration of batched differential equation solves.
Let's dive right in.
Bayesian Parameter Estimation with Uncertainty
Let's start by importing all the necessary libraries:
using OrdinaryDiffEq
using Turing, MCMCChains, Distributions
using KernelDensity
using SciMLExpectations, Cuba
using DiffEqGPU
using Plots, StatsPlots
using Random;
Random.seed!(1);
Random.TaskLocalRNG()
For this tutorial, we will use the Lotka-Volterra equation:
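With prey population x and predator population y, the system is

dx/dt = (α - β y) x
dy/dt = (δ x - γ) y

which we encode as: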
function lotka_volterra(du, u, p, t)
    @inbounds begin
        x = u[1]
        y = u[2]
        α = p[1]
        β = p[2]
        γ = p[3]
        δ = p[4]
        du[1] = (α - β * y) * x
        du[2] = (δ * x - γ) * y
    end
end
p = [1.5, 1.0, 3.0, 1.0]
u0 = [1.0, 1.0]
prob1 = ODEProblem(lotka_volterra, u0, (0.0, 10.0), p)
sol = solve(prob1, Tsit5())
plot(sol)
From the Lotka-Volterra equation, we will generate a dataset with known parameters:
sol1 = solve(prob1, Tsit5(), saveat = 0.1)
retcode: Success
Interpolation: 1st order linear
t: 101-element Vector{Float64}:
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
⋮
9.2
9.3
9.4
9.5
9.6
9.7
9.8
9.9
10.0
u: 101-element Vector{Vector{Float64}}:
[1.0, 1.0]
[1.0610780673356455, 0.8210842775886171]
[1.1440276717257598, 0.6790526689784503]
[1.2491712125724483, 0.5668931465841179]
[1.3776445705636384, 0.47881295137951546]
[1.5312308177480134, 0.4101564670866144]
[1.7122697558187643, 0.35726544879948385]
[1.923578275830157, 0.31734720616177164]
[2.1683910896994067, 0.288388843787324]
[2.4502506671402524, 0.2690537093960071]
⋮
[1.8172823219555856, 4.064946595043356]
[1.442761298838369, 3.5397375780465627]
[1.208908107884453, 2.9914550030314535]
[1.0685925969627899, 2.4820729201626373]
[0.991022962327608, 2.037244570196892]
[0.957421348475827, 1.6632055724974297]
[0.9569793912886565, 1.3555870283301439]
[0.9835609063200599, 1.1062868199420042]
[1.033758125602055, 0.9063703842886213]
Now let's assume our dataset has measurement noise. We can add this noise in and plot the noisy data against the generating solution:
odedata = Array(sol1) + 0.8 * randn(size(Array(sol1)))
plot(sol1, alpha = 0.3, legend = false);
scatter!(sol1.t, odedata');
Now let's assume that all we know is the data odedata and the model form. What we want to do is use the data to inform us of the parameters, but also get a probabilistic sense of the uncertainty around our parameter estimate. This is done via Bayesian estimation. For a full look at Bayesian estimation of differential equations, see the Bayesian differential equation tutorial from Turing.jl.
Following that tutorial, we choose a set of priors and perform NUTS sampling to arrive at the MCMC chain:
Turing.setadbackend(:forwarddiff)
@model function fitlv(data, prob1)
    σ ~ InverseGamma(2, 3) # prior on the observation noise
    α ~ truncated(Normal(1.5, 0.5), 1.0, 2.0)
    β ~ truncated(Normal(1.2, 0.5), 0.5, 1.5)
    γ ~ truncated(Normal(3.0, 0.5), 2, 4)
    δ ~ truncated(Normal(1.0, 0.5), 0.5, 1.5)
    p = [α, β, γ, δ]
    prob = remake(prob1, p = p)
    predicted = solve(prob, Tsit5(), saveat = 0.1)
    for i in 1:length(predicted)
        data[:, i] ~ MvNormal(predicted[i], σ)
    end
end
model = fitlv(odedata, prob1)
# This next command runs 3 independent chains without using multithreading.
chain = mapreduce(c -> sample(model, NUTS(0.45), 1000), chainscat, 1:3)
Chains MCMC chain (1000×17×3 Array{Float64, 3}):
Iterations = 501:1:1500
Number of chains = 3
Samples per chain = 1000
Wall duration = 40.13 seconds
Compute duration = 37.16 seconds
parameters = σ, α, β, γ, δ
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat e ⋯
Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯
σ 0.8512 0.0431 0.0052 67.3525 104.7005 1.0153 ⋯
α 1.5063 0.0460 0.0068 49.1516 153.0110 1.0922 ⋯
β 1.0641 0.0498 0.0070 51.9934 201.5784 1.0621 ⋯
γ 3.0610 0.1395 0.0184 58.5106 141.6427 1.0848 ⋯
δ 0.9846 0.0477 0.0069 48.0480 180.0053 1.0915 ⋯
1 column omitted
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
σ 0.7669 0.8222 0.8515 0.8775 0.9431
α 1.4243 1.4759 1.5024 1.5328 1.5984
β 0.9820 1.0269 1.0606 1.0972 1.1690
γ 2.7818 2.9803 3.0621 3.1570 3.3302
δ 0.8930 0.9541 0.9854 1.0151 1.0784
This chain gives a discrete approximation to the probability distribution of our desired quantities. We can plot the chains to see these distributions in action:
plot(chain)
Great! From our data, we have arrived at a probability distribution for our parameter values.
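As a quick sanity check (a sketch, not part of the original workflow; p_mean, prob_mean, and sol_mean are names we introduce here), we can pull the posterior means out of the chain, re-solve the ODE at those values, and overlay the fit on the noisy data:

# Sketch: resimulate the ODE at the posterior-mean parameter values
p_mean = [mean(Array(chain[s])) for s in (:α, :β, :γ, :δ)]
prob_mean = remake(prob1, p = p_mean)
sol_mean = solve(prob_mean, Tsit5(), saveat = 0.1)
plot(sol_mean, lw = 2, legend = false);
scatter!(sol1.t, odedata');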
Evaluating Model Hypotheses with the Koopman Expectation
Now, let's try to ask a question: what is the expected value of x (the first term in the differential equation) at time t=10, given the known uncertainties in our parameters? This is a good tutorial question because all other probabilistic statements can be phrased similarly. A question like “what is the probability that x(T) > 1 at the final time T?” can likewise be phrased as an expected value (probability statements are expected values of characteristic functions, which are 1 if true and 0 if false). So in general, the kinds of questions we want to ask and answer are expectations about the solutions of the differential equation.
The trivial way to solve this problem is to sample 100,000 sets of parameters from the parameter distribution given by the Bayesian estimation, solve the ODE 100,000 times, and then take the average. But is 100,000 ODE solves enough? It's hard to tell, and even then, the convergence of this approach is slow: this is the Monte Carlo approach, and its error only shrinks like 1/sqrt(N). Slow.
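To make the brute-force approach concrete, here is a rough sketch of that Monte Carlo estimate (for illustration only; posterior, mc_prob_func, mc_ensemble, and the trajectory count are choices we introduce here). It resamples parameter sets from the MCMC chain via the Ensemble interface and averages x at t=10:

# Sketch of the slow Monte Carlo approach: resample parameters from the chain,
# solve the ODE for each draw, and average x at t = 10.
posterior = hcat(vec(Array(chain[:α])), vec(Array(chain[:β])),
    vec(Array(chain[:γ])), vec(Array(chain[:δ]))) # (samples × 4) matrix of draws
mc_prob_func = (prob, i, repeat) -> remake(prob, p = posterior[rand(1:size(posterior, 1)), :])
mc_ensemble = EnsembleProblem(prob1, prob_func = mc_prob_func)
mc_sol = solve(mc_ensemble, Tsit5(), EnsembleThreads(), trajectories = 10_000,
    save_everystep = false)
mc_estimate = mean(s[1, end] for s in mc_sol) # Monte Carlo estimate of E[x(10)]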
However, the Koopman expectation can converge with far fewer points, allowing the use of higher-order quadrature methods to converge exponentially faster in many cases. To use the Koopman expectation, we first need to define our observable function g. This function designates the quantity of the solution whose expectation we wish to calculate. Thus, for our question “what is the expected value of x at time t=10?”, we would simply use:
function g(sol, p)
    sol[1, end]
end
g (generic function with 1 method)
Now we need to set up the expectation calculation, where we provide our initial condition and parameters as probability distributions. For this case, we will use the same constant u0 as before. But let's turn our Bayesian MCMC chains into distributions through kernel density estimation (the plots of the distribution above are just KDE plots!).
p_kde = [kde(vec(Array(chain[:α]))), kde(vec(Array(chain[:β]))),
kde(vec(Array(chain[:γ]))), kde(vec(Array(chain[:δ])))]
4-element Vector{KernelDensity.UnivariateKDE{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}}:
KernelDensity.UnivariateKDE{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}(1.3271793547629829:0.000215949797022359:1.7692285892677517, [1.678613864619649e-5, 1.7602666114058252e-5, 1.8626282555445073e-5, 1.986682995402944e-5, 2.1335972292479966e-5, 2.3047263154385433e-5, 2.501622158890271e-5, 2.726041621148312e-5, 2.9799557613507943e-5, 3.265559904219728e-5 … 1.9053268946550572e-5, 1.7971606347255698e-5, 1.7089682246673732e-5, 1.6401082844008652e-5, 1.5900974449645222e-5, 1.558609387783405e-5, 1.545474597125507e-5, 1.5506808429460728e-5, 1.574374407800616e-5, 1.6168620710299564e-5])
KernelDensity.UnivariateKDE{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}(0.8896366324720044:0.0002048008539062745:1.3088639804181483, [9.44742801811671e-6, 9.520591463640926e-6, 9.66729962481594e-6, 9.888376127697995e-6, 1.0185059313272404e-5, 1.055900602597859e-5, 1.1012296670109833e-5, 1.1547441493231503e-5, 1.2167388132144907e-5, 1.2875530392530976e-5 … 1.2874796798623134e-5, 1.2166762014764743e-5, 1.1546912179694058e-5, 1.101185509583269e-5, 1.0558644596869726e-5, 1.018477178449384e-5, 9.888157519899465e-6, 9.667146140424077e-6, 9.520500430182377e-6, 9.447397849804418e-6])
KernelDensity.UnivariateKDE{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}(2.378296020592484:0.0006378071622694851:3.68388728175812, [5.216507249716784e-6, 5.083453989085385e-6, 5.005219854359311e-6, 4.980911664898002e-6, 5.010059300758485e-6, 5.092610462220648e-6, 5.228927239864234e-6, 5.419784456783816e-6, 5.666369761847534e-6, 5.970285426482036e-6 … 1.0080908670967448e-5, 9.249974873242264e-6, 8.508554790388062e-6, 7.850821720183435e-6, 7.271528823330278e-6, 6.765984679266666e-6, 6.330030803492707e-6, 5.96002113377285e-6, 5.65280348592892e-6, 5.405702952696045e-6])
KernelDensity.UnivariateKDE{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}(0.773867962027943:0.00020051593398531342:1.1843240788958795, [1.530306887076449e-5, 1.494870660057046e-5, 1.472844814820462e-5, 1.464033005138532e-5, 1.4683256236658337e-5, 1.4856989487554983e-5, 1.5162146095981655e-5, 1.560019366131371e-5, 1.6173452033640956e-5, 1.6885097337215882e-5 … 2.7307886600699827e-5, 2.53010582479396e-5, 2.349988116179952e-5, 2.1892530292798307e-5, 2.0468313560846607e-5, 1.9217630350398807e-5, 1.813193334532226e-5, 1.7203693678247145e-5, 1.6426369455813106e-5, 1.5794377570793605e-5])
Now that we have our observable and our uncertainty distributions, let's calculate the expected value. Using GenericDistribution, SciMLExpectations can work with any distribution-like object that supports evaluating the probability density function (needed for the Koopman solver algorithm) and generating random numbers (needed for the MonteCarlo solver algorithm).
pdf_func(x) = prod(pdf.(p_kde, x))
rand_func() = nothing # not needed for Koopman
lb = [p_kde_i.x[begin] for p_kde_i in p_kde] #lower bound for integration
ub = [p_kde_i.x[end] for p_kde_i in p_kde] #upper bound for integration
gd = GenericDistribution(pdf_func, rand_func, lb, ub)
h(x, u, p) = u, x # maps a sampled/quadrature point x onto the initial condition (unchanged) and the parameters
sm = SystemMap(prob1, Tsit5())
exprob = ExpectationProblem(sm, g, h, gd)
sol = solve(exprob, Koopman(); reltol = 1e-2, abstol = 1e-2)
sol.u
Note how that gives the expectation and a residual for the error bound!
sol.resid
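The same machinery answers the probability question from earlier. As a sketch (g_prob, exprob_prob, and prob_estimate are names we introduce here, reusing the ingredients defined above), replace the observable with an indicator function and the expectation becomes the probability that x(10) > 1:

# Sketch: probability statements are expectations of indicator functions.
# The observable returns 1 when x(10) > 1 and 0 otherwise.
g_prob(sol, p) = sol[1, end] > 1 ? 1.0 : 0.0
exprob_prob = ExpectationProblem(sm, g_prob, h, gd)
prob_estimate = solve(exprob_prob, Koopman(); reltol = 1e-2, abstol = 1e-2)
prob_estimate.u # ≈ P(x(10) > 1) under the posterior parameter uncertainty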
GPU-Accelerated Expectations
Note: batch functionality is not yet fully implemented in version 2 of SciMLExpectations.
Are we done? No, we need to add some GPUs! As mentioned earlier, probability calculations can take quite a few ODE solves, so let's parallelize across the parameters. DiffEqGPU.jl allows you to GPU-parallelize across parameters by using the Ensemble interface. Note that you do not have to do any of the heavy lifting: all the conversion to GPU kernels is done automatically by simply specifying EnsembleGPUArray as the ensembling method. For example:
function lotka_volterra(du, u, p, t)
    @inbounds begin
        x = u[1]
        y = u[2]
        α = p[1]
        β = p[2]
        γ = p[3]
        δ = p[4]
        du[1] = (α - β * y) * x
        du[2] = (δ * x - γ) * y
    end
end
p = [1.5, 1.0, 3.0, 1.0]
u0 = [1.0, 1.0]
prob = ODEProblem(lotka_volterra, u0, (0.0, 10.0), p)
prob_func = (prob, i, repeat) -> remake(prob, p = rand(Float64, 4) .* p)
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)
@time sol = solve(monteprob, Tsit5(), EnsembleGPUArray(), trajectories = 10_000,
saveat = 1.0f0)
Let's now use this in the ensembling method. We need to specify a batch size for the number of ODEs solved at the same time and pass in our ensembling method. The following is a GPU-accelerated, uncertainty-quantified estimate of the expectation of the solution:
# batchmode = EnsembleGPUArray() # batch ensembling is not yet fully supported in SciMLExpectations v2 (see note above)
sol = solve(exprob, Koopman(), batch = 100, quadalg = CubaSUAVE())