Bayesian Inference of Pendulum Parameters

In this tutorial, we will perform Bayesian inference to estimate the parameters of a pendulum.

Set up simple pendulum problem

using DiffEqBayes, OrdinaryDiffEq, RecursiveArrayTools, Distributions, Plots, StatsPlots,
      BenchmarkTools, TransformVariables, StanSample, DynamicHMC

Let's define our simple pendulum problem. Here, our pendulum has a drag coefficient ω and a length L.

[Image: pendulum diagram]

Our pendulum obeys the second-order equation x'' + ω x' + (9.8 / L) sin(x) = 0, where x is the angle. We get first-order equations by taking the first state to be the angle x and the second state to be the angular velocity y, giving:

function pendulum(du, u, p, t)
    ω, L = p
    x, y = u
    du[1] = y
    du[2] = -ω * y - (9.8 / L) * sin(x)
end

u0 = [1.0, 0.1]
tspan = (0.0, 10.0)
prob1 = ODEProblem(pendulum, u0, tspan, [1.0, 2.5])
ODEProblem with uType Vector{Float64} and tType Float64. In-place: true
Non-trivial mass matrix: false
timespan: (0.0, 10.0)
u0: 2-element Vector{Float64}:
 1.0
 0.1

Solve the model and plot

To understand the model and generate data, let's solve and visualize the solution with the known parameters:

sol = solve(prob1, Tsit5())
plot(sol)
[Plot: solution of the pendulum model with the true parameters]

It's the pendulum, so you know roughly what it looks like. It's periodic, but since we have not made a small-angle assumption, it's not exactly a sine or cosine. Because the drag coefficient ω is 1, the drag term -ωy damps the oscillations over time, while the length L determines the period.
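
Since L determines the period, we can sanity-check that numerically without any packages. Below is a stdlib-only sketch that re-integrates the same equations with a hand-rolled RK4 step and estimates the period from successive upward zero crossings of the angle; the step size and the small-angle comparison are illustrative choices, not part of the tutorial's code:

```julia
# Sketch (stdlib only, no DiffEq packages): same pendulum right-hand side,
# integrated with a hand-rolled fixed-step RK4.
function pendulum_rhs(u, p)
    ω, L = p
    x, y = u
    return [y, -ω * y - (9.8 / L) * sin(x)]
end

function rk4_path(f, u0, p, t0, t1, h)
    ts, us = [t0], [u0]
    t, u = t0, u0
    while t < t1 - 1e-12
        k1 = f(u, p)
        k2 = f(u .+ (h / 2) .* k1, p)
        k3 = f(u .+ (h / 2) .* k2, p)
        k4 = f(u .+ h .* k3, p)
        u = u .+ (h / 6) .* (k1 .+ 2 .* k2 .+ 2 .* k3 .+ k4)
        t += h
        push!(ts, t); push!(us, u)
    end
    return ts, us
end

ts, us = rk4_path(pendulum_rhs, [1.0, 0.1], (1.0, 2.5), 0.0, 10.0, 0.001)
xs = [u[1] for u in us]
# Times where the angle crosses zero going upward: one per full period.
crossings = [ts[i] for i in 2:length(ts) if xs[i - 1] < 0 <= xs[i]]
period = (crossings[end] - crossings[1]) / (length(crossings) - 1)
small_angle = 2π * sqrt(2.5 / 9.8)   # undamped small-angle period, ≈ 3.17 s
```

With the true parameters, the estimated period comes out close to (slightly above) the small-angle value, since the 1 rad initial amplitude lengthens the swing.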

Create some dummy data to use for estimation

We now generate some dummy data to use for estimation:

t = collect(range(1, stop = 10, length = 10))
randomized = VectorOfArray([(sol(t[i]) + 0.01randn(2)) for i in 1:length(t)])
data = convert(Array, randomized)
2×10 Matrix{Float64}:
  0.0441402  -0.386183  0.126105   0.0833621  …  -0.00391038   0.0136207
 -1.2156      0.335783  0.305696  -0.238926      -0.0043068   -0.00118562

Let's see what our data looks like on top of the real solution

scatter!(data')
[Plot: noisy data points overlaid on the true solution]

This data captures the damping and the true period, making it perfect for attempting a Bayesian inference.

Perform Bayesian Estimation

Now let's fit the pendulum to the data. Since we know our model is correct, this should give us back the parameters that we used to generate the data! First, define priors on our parameters. In this case, let's assume we don't have much information: a prior belief that ω is small and positive, centered around 0.1, while the length of the pendulum L is probably around 3.0, with both truncated to be nonnegative:

priors = [
    truncated(Normal(0.1, 1.0), lower = 0.0),
    truncated(Normal(3.0, 1.0), lower = 0.0)
]
2-element Vector{Distributions.Truncated{Distributions.Normal{Float64}, Distributions.Continuous, Float64, Float64, Nothing}}:
 Truncated(Distributions.Normal{Float64}(μ=0.1, σ=1.0); lower=0.0)
 Truncated(Distributions.Normal{Float64}(μ=3.0, σ=1.0); lower=0.0)
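
The `truncated` call above simply restricts a Normal distribution to values at or above the bound. As a stdlib-only illustration of what that means (Distributions.jl uses a more efficient sampler internally; the helper name below is ours, not the library's):

```julia
# Hypothetical helper illustrating truncation by rejection sampling:
# draw from Normal(μ, σ) and discard anything below `lower`.
function truncated_normal_draws(μ, σ, lower, n)
    out = Float64[]
    while length(out) < n
        x = μ + σ * randn()
        x >= lower && push!(out, x)
    end
    return out
end

ω_draws = truncated_normal_draws(0.1, 1.0, 0.0, 10_000)
L_draws = truncated_normal_draws(3.0, 1.0, 0.0, 10_000)
# All draws are ≥ 0. Note that truncating Normal(0.1, 1) at zero shifts its
# mean well above 0.1 (to roughly 0.8), so the ω prior is vaguer than it looks.
```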

Finally, let's run the estimation routine from DiffEqBayes.jl with the Turing.jl backend to check if we indeed recover the parameters!

bayesian_result = turing_inference(prob1, Tsit5(), t, data, priors;
    syms = [:omega, :L], sample_args = (num_samples = 10_000,))
Chains MCMC chain (10000×17×1 Array{Float64, 3}):

Iterations        = 1001:1:11000
Number of chains  = 1
Samples per chain = 10000
Wall duration     = 133.99 seconds
Compute duration  = 133.99 seconds
parameters        = omega, L, σ[1]
internals         = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ⋯
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64   ⋯

       omega    1.0468    0.1852    0.0028   5186.6148   4506.2308    1.0003   ⋯
           L    2.5252    0.2139    0.0032   4764.7180   4093.9328    1.0015   ⋯
        σ[1]    0.1601    0.0389    0.0006   4769.4884   5075.8078    1.0000   ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

       omega    0.7627    0.9231    1.0199    1.1421    1.4904
           L    2.1138    2.3905    2.5210    2.6489    2.9725
        σ[1]    0.1013    0.1328    0.1544    0.1807    0.2532

Notice that while our prior means were wrong, the posterior means converged to the correct values, meaning that the sampler learned good posterior distributions for the parameters. To look at these posterior distributions, we can examine the chains:

plot(bayesian_result)
[Plot: posterior density and trace for each parameter]
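
The density panels in that plot are smoothed histograms of the raw posterior draws, and the quantile table earlier is just the empirical quantiles of those draws. A stdlib-only sketch of that computation, using synthetic draws as a stand-in for the actual `omega` samples stored in `bayesian_result`:

```julia
using Statistics

# Synthetic stand-in for the posterior draws of `omega`, chosen to mimic
# the reported mean ≈ 1.05 and std ≈ 0.19.
omega_draws = 1.05 .+ 0.19 .* randn(10_000)

q = quantile(omega_draws, [0.025, 0.25, 0.5, 0.75, 0.975])
# (q[1], q[5]) is a 95% credible interval, analogous to the 2.5% and 97.5%
# columns of the quantile table above.
```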

As a diagnostic, we will also check the parameter chains. A chain is the sequence of samples produced by the MCMC process. The chain should explore parameter space and converge reasonably well, and we should take many samples after convergence, since it is these post-convergence samples that form the posterior distribution!

plot(bayesian_result, colordim = :parameter)
[Plot: MCMC chains, colored by parameter]

Notice that after a while, these chains converge to a “fuzzy line”, meaning the sampler has found the region of highest posterior probability and is now sampling around it, which builds up the posterior distribution around the true mean.
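
The `rhat` column in the summary statistics quantifies this convergence: values near 1.0 mean different parts of the chain agree on the distribution being sampled. Here is a stdlib-only sketch of the split-R̂ idea, computed by hand on a synthetic well-mixed chain (production diagnostics, as in MCMCChains.jl, use multiple chains and further refinements):

```julia
using Statistics

# Split one chain in half and compare within-half and between-half variance;
# an R̂ near 1.0 means the two halves sample the same distribution.
function split_rhat(chain::Vector{Float64})
    n = length(chain) ÷ 2
    halves = [chain[1:n], chain[n+1:2n]]
    W = mean(var.(halves))          # average within-half variance
    B = n * var(mean.(halves))      # between-half variance, scaled by n
    return sqrt(((n - 1) / n * W + B / n) / W)
end

# Synthetic stand-in for a converged chain, e.g. the `omega` samples.
chain = 1.0 .+ 0.2 .* randn(10_000)
split_rhat(chain)   # ≈ 1.0 for a well-mixed chain, matching the tables above
```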

DiffEqBayes.jl allows a choice of Stan.jl, Turing.jl, or DynamicHMC.jl for MCMC; you can also use ApproxBayes.jl for Approximate Bayesian Computation algorithms. Let's compare the timings across the different MCMC backends. We'll stick with the default arguments and 10,000 samples in each. However, there is a lot of room for micro-optimization specific to each package and algorithm combination, so you may want to run your own experiments on specific problems to get a better understanding of the performance.

@btime bayesian_result = turing_inference(prob1, Tsit5(), t, data, priors;
    syms = [:omega, :L], sample_args = (num_samples = 10_000,))
Chains MCMC chain (10000×17×1 Array{Float64, 3}):

Iterations        = 1001:1:11000
Number of chains  = 1
Samples per chain = 10000
Wall duration     = 63.02 seconds
Compute duration  = 63.02 seconds
parameters        = omega, L, σ[1]
internals         = n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size, logprior, loglikelihood, logjoint

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ⋯
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64   ⋯

       omega    1.0436    0.1810    0.0027   5525.9002   4090.4425    1.0008   ⋯
           L    2.5254    0.2056    0.0028   5361.6854   3947.3259    1.0010   ⋯
        σ[1]    0.1590    0.0372    0.0005   4907.6031   3788.5165    1.0000   ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

       omega    0.7536    0.9214    1.0214    1.1410    1.4650
           L    2.1367    2.3954    2.5190    2.6490    2.9542
        σ[1]    0.1019    0.1324    0.1540    0.1801    0.2468
@btime bayesian_result = stan_inference(prob1, :rk45, t, data, priors;
    sample_kwargs = Dict(:num_samples => 10_000), print_summary = false)
Chains MCMC chain (10000×4×1 Array{Float64, 3}):

Iterations        = 1:1:10000
Number of chains  = 1
Samples per chain = 10000
parameters        = sigma1.1, sigma1.2, theta_1, theta_2
internals         = 

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ⋯
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64   ⋯

    sigma1.1    0.2626    0.0775    0.0010   7388.9488   6183.8533    1.0000   ⋯
    sigma1.2    0.2784    0.0840    0.0010   8014.2586   6629.5339    0.9999   ⋯
     theta_1    1.0732    0.2782    0.0033   7806.6908   6352.0649    1.0001   ⋯
     theta_2    2.5856    0.3457    0.0042   7024.5118   5119.8612    1.0001   ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

    sigma1.1    0.1514    0.2086    0.2492    0.3017    0.4502
    sigma1.2    0.1585    0.2190    0.2639    0.3228    0.4738
     theta_1    0.6493    0.8814    1.0307    1.2194    1.7454
     theta_2    1.9600    2.3623    2.5627    2.7813    3.3254
@btime bayesian_result = dynamichmc_inference(prob1, Tsit5(), t, data, priors;
    num_samples = 10_000)
(posterior = [(parameters = [0.990630173018456, 2.487515342244008], σ = [0.010385343973385216, 0.01562032332080324]), (parameters = [1.0133212591806906, 2.483345492433588], σ = [0.007204915274783321, 0.014410562323149228]), (parameters = [1.0154833678443507, 2.520833314727842], σ = [0.03217837829438977, 0.016315247332551578]), (parameters = [1.0067569552248725, 2.5065374585781677], σ = [0.028328295183784158, 0.013346183105061056]), (parameters = [1.0044510248033438, 2.5086255130195134], σ = [0.02885335810249998, 0.01318822243953091]), (parameters = [1.007314591056976, 2.493261739045255], σ = [0.007749254708931057, 0.011187153260157233]), (parameters = [1.0087603278589838, 2.504522313740702], σ = [0.009871526022251623, 0.01080849895842283]), (parameters = [1.0076527012474423, 2.5072608556438114], σ = [0.017656886568009723, 0.013201536978216268]), (parameters = [0.9708843362729699, 2.562318512936997], σ = [0.020768475162494537, 0.020294010668365427]), (parameters = [0.9818318790831404, 2.5246472207940323], σ = [0.011674739271341742, 0.022574628053244176])  …  (parameters = [0.9937670745116894, 2.492480311647731], σ = [0.018672961619401598, 0.022901546227841536]), (parameters = [1.0072963224594966, 2.506673524021265], σ = [0.01812829322920758, 0.014055412230158588]), (parameters = [0.9992038891609397, 2.5167230785831287], σ = [0.029507451677890883, 0.015745613903168997]), (parameters = [0.9879209257868881, 2.5067673929131513], σ = [0.015673492250189696, 0.011817108144748584]), (parameters = [1.0079639187914375, 2.505790355666559], σ = [0.01351053431401092, 0.017866174899937474]), (parameters = [1.018222637677098, 2.4826808632730533], σ = [0.01182432489199011, 0.023998500868982565]), (parameters = [0.9916121826283071, 2.5269160452365966], σ = [0.011561283622133562, 0.010957471273861182]), (parameters = [1.054297624347127, 2.4388888091855527], σ = [0.014375116601664989, 0.0229723516318586]), (parameters = [0.9990961552550609, 2.507180506247176], σ = 
[0.01374043840385554, 0.010023824050391364]), (parameters = [1.0018619136161515, 2.5044258127068253], σ = [0.01448622060714102, 0.00991349822037528])], posterior_matrix = [-0.009413999955659448 0.013233311396819847 … -0.0009042534588950615 0.0018601824035736052; 0.911284357767875 0.9096066399933534 … 0.9191588174805768 0.9180594917784134; -4.567359700066312 -4.932991808824869 … -4.287412085575009 -4.234557384335523; -4.159182435629685 -4.2397938466127805 … -4.6027906143764845 -4.613857993900952], tree_statistics = DynamicHMC.TreeStatisticsNUTS[DynamicHMC.TreeStatisticsNUTS(43.571886001967954, 3, turning at positions -4:3, 0.9652016523049702, 7, DynamicHMC.Directions(0x68c1103b)), DynamicHMC.TreeStatisticsNUTS(40.83412887102971, 3, turning at positions -3:4, 0.9225469702239824, 7, DynamicHMC.Directions(0x27a5eb9c)), DynamicHMC.TreeStatisticsNUTS(38.9500821924763, 3, turning at positions -7:0, 0.989522518028429, 7, DynamicHMC.Directions(0xeca297c0)), DynamicHMC.TreeStatisticsNUTS(37.41989868456175, 2, turning at positions -3:-6, 0.9440909026390207, 7, DynamicHMC.Directions(0xe3cb94a9)), DynamicHMC.TreeStatisticsNUTS(42.14010944081143, 2, turning at positions -2:1, 0.9993314286126546, 3, DynamicHMC.Directions(0xd5d111ed)), DynamicHMC.TreeStatisticsNUTS(41.75303707249556, 3, turning at positions -7:0, 0.9786026249857225, 7, DynamicHMC.Directions(0xef01d7d0)), DynamicHMC.TreeStatisticsNUTS(42.7242697565215, 2, turning at positions -3:0, 0.9352837633318921, 3, DynamicHMC.Directions(0x09466220)), DynamicHMC.TreeStatisticsNUTS(44.95989030923547, 3, turning at positions -6:1, 0.9842257651939718, 7, DynamicHMC.Directions(0xb3de81a9)), DynamicHMC.TreeStatisticsNUTS(37.15301280306391, 3, turning at positions 0:7, 0.8220175166328939, 7, DynamicHMC.Directions(0x25b2064f)), DynamicHMC.TreeStatisticsNUTS(38.520793662657624, 3, turning at positions -3:4, 0.9892609880116536, 7, DynamicHMC.Directions(0x2bb16e34))  …  DynamicHMC.TreeStatisticsNUTS(40.5447614162418, 3, turning at 
positions 0:7, 0.9999999999999999, 7, DynamicHMC.Directions(0x899fe94f)), DynamicHMC.TreeStatisticsNUTS(41.0695504661172, 3, turning at positions -5:2, 0.9936352695978732, 7, DynamicHMC.Directions(0x02419c12)), DynamicHMC.TreeStatisticsNUTS(40.45123525716805, 2, turning at positions 1:4, 0.9025006795016857, 7, DynamicHMC.Directions(0x89a026c4)), DynamicHMC.TreeStatisticsNUTS(40.1347125358481, 3, turning at positions -2:5, 0.9950765209891533, 7, DynamicHMC.Directions(0x2f2f396d)), DynamicHMC.TreeStatisticsNUTS(41.02325140407287, 3, turning at positions -7:0, 0.7778930414902906, 7, DynamicHMC.Directions(0xcb9a0860)), DynamicHMC.TreeStatisticsNUTS(42.04021451500599, 2, turning at positions 1:4, 0.9425707483984155, 7, DynamicHMC.Directions(0xf4aa358c)), DynamicHMC.TreeStatisticsNUTS(41.79488052415427, 3, turning at positions -7:0, 0.9848671701984658, 7, DynamicHMC.Directions(0xa6548ec0)), DynamicHMC.TreeStatisticsNUTS(38.79607338552382, 3, turning at positions 1:4, 0.722217499480879, 11, DynamicHMC.Directions(0xe6536bc8)), DynamicHMC.TreeStatisticsNUTS(38.50783774089423, 3, turning at positions -5:2, 0.9804831593574975, 7, DynamicHMC.Directions(0x17c0f712)), DynamicHMC.TreeStatisticsNUTS(44.94666681772992, 3, turning at positions -3:4, 0.9490610613028496, 7, DynamicHMC.Directions(0xdbbde5c4))], logdensities = [44.52444940117986, 41.89844020920544, 40.28071574180082, 42.4365204111535, 42.27049780619511, 43.59134145352303, 45.48199583208458, 45.26350514869423, 39.22886942390149, 42.74694186602532  …  42.126167802894365, 44.99112353031899, 41.33381936729745, 44.3715397443187, 44.81392092784077, 42.94673901518387, 44.13882943994933, 39.4425941729494, 45.60116770106705, 45.63981977053267], κ = Gaussian kinetic energy (Diagonal), √diag(M⁻¹): [0.01452414760035942, 0.006488741746964621, 0.2600159507664517, 0.269096290970012], ϵ = 0.47219680107345596)