Generalized Likelihood Inference

In this example, we will demonstrate the likelihood-based approach to parameter fitting. First, let's generate a dataset to fit. We will reuse the Lotka-Volterra equation, but in this case fit just two parameters.

using DifferentialEquations, DiffEqParamEstim, Optimization, OptimizationBBO
f1 = function (du, u, p, t)
    du[1] = p[1] * u[1] - p[2] * u[1] * u[2]
    du[2] = -3.0 * u[2] + u[1] * u[2]
end
p = [1.5, 1.0]
u0 = [1.0; 1.0]
tspan = (0.0, 10.0)
prob1 = ODEProblem(f1, u0, tspan, p)
sol = solve(prob1, Tsit5())
retcode: Success
Interpolation: specialized 4th order "free" interpolation
t: 34-element Vector{Float64}:
  0.0
  0.0776084743154256
  0.23264513699277584
  0.4291185174543143
  0.6790821987497083
  0.9444046158046306
  1.2674601546021105
  1.6192913303893046
  1.9869754428624007
  2.2640902393538296
  ⋮
  7.584863345264154
  7.978068981329682
  8.48316543760351
  8.719248247740158
  8.949206788834692
  9.200185054623292
  9.438029017301554
  9.711808134779586
 10.0
u: 34-element Vector{Vector{Float64}}:
 [1.0, 1.0]
 [1.0454942346944578, 0.8576684823217128]
 [1.1758715885138271, 0.6394595703175443]
 [1.419680960717083, 0.4569962601282089]
 [1.8767193950080012, 0.3247334292791134]
 [2.588250064553348, 0.26336255535952197]
 [3.860708909220769, 0.2794458098285261]
 [5.750812667710401, 0.522007253793458]
 [6.8149789991301635, 1.9177826328390826]
 [4.392999292571394, 4.1946707928506015]
 ⋮
 [2.6142539677883248, 0.26416945387526314]
 [4.24107612719179, 0.3051236762922018]
 [6.791123785297775, 1.1345287797146668]
 [6.26537067576476, 2.741693507540315]
 [3.780765111887945, 4.431165685863443]
 [1.816420140681737, 4.064056625315896]
 [1.1465021407690763, 2.791170661621642]
 [0.9557986135403417, 1.623562295185047]
 [1.0337581256020802, 0.9063703842885995]

This is a function with two parameters, [1.5, 1.0], which generates the same ODE solution as before. This time, let's generate 100 datasets where a little bit of randomness is added at each point:

using RecursiveArrayTools # for VectorOfArray
t = collect(range(0, stop = 10, length = 200))
function generate_data(sol, t)
    # Sample the true solution at each time point and add Gaussian noise
    # with standard deviation 0.01 to each component
    randomized = VectorOfArray([(sol(t[i]) + 0.01randn(2)) for i in 1:length(t)])
    data = convert(Array, randomized)
end
aggregate_data = convert(Array, VectorOfArray([generate_data(sol, t) for i in 1:100]))
2×200×100 Array{Float64, 3}:
[:, :, 1] =
 1.00612   1.03041   1.05593   1.10871   …  0.982801  1.0079    1.01625
 0.989917  0.899193  0.817097  0.746061     1.11306   0.998265  0.909498

[:, :, 2] =
 1.01409  1.0158    1.0589    1.10935   …  0.986381  0.99131  1.01279
 1.01164  0.901104  0.835527  0.752054     1.10925   1.00624  0.885856

[:, :, 3] =
 1.00302  1.03031   1.05286   1.12121   …  0.988349  1.00519  1.04277
 1.00768  0.899918  0.825591  0.740876     1.10143   1.00056  0.894057

;;; … 

[:, :, 98] =
 0.991195  1.03211  1.06583   1.1003    …  1.00434  1.00141  1.0399
 1.0058    0.89621  0.832592  0.747871     1.11426  0.9872   0.914876

[:, :, 99] =
 1.01133  1.02931   1.04764   1.1011    …  0.980993  1.01634  1.05003
 0.9946   0.917774  0.812422  0.725464     1.10117   1.007    0.909396

[:, :, 100] =
 1.00148  1.03285   1.05581   1.10123   …  0.969726  1.00974  1.01674
 1.01403  0.908687  0.799422  0.742787     1.1074    1.00289  0.909057

Here, with t, we measure the solution at 200 evenly spaced points. Thus, aggregate_data is a 2×200×100 array where aggregate_data[i,j,k] is the ith component at time j of the kth dataset. What we first want to do is get a matrix of distributions where distributions[i,j] is the fitted distribution of component i at time j. We can do this via fit_mle on a chosen distributional form. For simplicity, we choose the Normal distribution. aggregate_data[i,j,:] is the array of points at the given component and time, and thus we find the distribution parameters which fit best at each time point via:

using Distributions
distributions = [fit_mle(Normal, aggregate_data[i, j, :]) for i in 1:2, j in 1:200]
2×200 Matrix{Distributions.Normal{Float64}}:
 Distributions.Normal{Float64}(μ=1.00017, σ=0.00995663)  …  Distributions.Normal{Float64}(μ=1.03334, σ=0.0109348)
 Distributions.Normal{Float64}(μ=1.00003, σ=0.010534)       Distributions.Normal{Float64}(μ=0.906826, σ=0.010336)

Notice for example that we have:

distributions[1, 1]
Distributions.Normal{Float64}(μ=1.0001722581603771, σ=0.00995663362281762)

that is, the fitted distribution has its mean just about where our original solution was, and its standard deviation is about the amount of noise we added to the dataset. This is a good check to see that the distributions we are trying to fit our parameters to make sense.
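
As a quick numerical confirmation (this extra check is not part of the tutorial's original code), we can compare the fitted mean and standard deviation against the noiseless solution and the 0.01 noise level:

mean(distributions[1, 1]), sol(t[1])[1]  # fitted mean vs. true solution value at t = 0
std(distributions[1, 1])                 # roughly the 0.01 noise standard deviation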

Note that in this case the Normal distribution was a good choice, and it is often a nice go-to choice, but one should experiment with other choices of distributions as well. For example, a TDist can be an interesting way to incorporate robustness to outliers, since T-distributions with a low number of degrees of freedom act like Normal distributions but with heavier tails (though fit_mle does not work with a T-distribution, you can compute the means/variances and build appropriate distribution objects yourself).
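
As a rough sketch of that manual construction (the degrees-of-freedom value below is an arbitrary illustrative choice, and the sample standard deviation only approximately matches the t-distribution's scale), we can take the sample mean and standard deviation at each time point and build a location-scale t-distribution:

using Distributions, Statistics
ν = 4.0  # hypothetical degrees of freedom; lower values give heavier tails
t_distributions = [begin
                       x = aggregate_data[i, j, :]
                       # location-scale t-distribution centered at the sample mean;
                       # the sample std is used as the scale, which is only an
                       # approximate moment match for a TDist
                       mean(x) + std(x) * TDist(ν)
                   end
                   for i in 1:2, j in 1:200]

In principle, such a matrix of distributions can then be used in place of the Normal fits below.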

Once we have the matrix of distributions, we can build the objective function corresponding to that distribution fit:

obj = build_loss_objective(prob1, Tsit5(), LogLikeLoss(t, distributions),
                           maxiters = 10000, verbose = false)
(::SciMLBase.OptimizationFunction{true, SciMLBase.NoAD, DiffEqParamEstim.var"#29#30"{Nothing, typeof(DiffEqParamEstim.STANDARD_PROB_GENERATOR), Base.Pairs{Symbol, Integer, Tuple{Symbol, Symbol}, @NamedTuple{maxiters::Int64, verbose::Bool}}, SciMLBase.ODEProblem{Vector{Float64}, Tuple{Float64, Float64}, true, Vector{Float64}, SciMLBase.ODEFunction{true, SciMLBase.AutoSpecialize, Main.var"#1#2", LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}, SciMLBase.StandardODEProblem}, OrdinaryDiffEq.Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, LogLikeLoss{Vector{Float64}, Matrix{Distributions.Normal{Float64}}}, Nothing, Tuple{}}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED_NO_TIME), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}) (generic function with 1 method)
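
The returned objective maps a parameter vector to the negative log-likelihood of the data under the corresponding ODE solution. As a quick illustrative check (not part of the original code; the exact numbers depend on the random data), we can evaluate it directly:

obj([1.5, 1.0])  # negative log-likelihood at the true parameters
obj([2.0, 2.0])  # should be noticeably larger away from the optimum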

Now, let's use the objective function to plot the likelihood landscape:

using Plots;
plotly();
prange = 0.5:0.1:5.0
heatmap(prange, prange, [obj([j, i]) for i in prange, j in prange],
        yscale = :log10, xlabel = "Parameter 1", ylabel = "Parameter 2",
        title = "Likelihood Landscape")

[Figure: 2 Parameter Likelihood landscape]

Recall that this is the negative log-likelihood, and thus the minimum is the maximum of the likelihood. There is a clear valley where the first parameter is 1.5, while the second parameter's likelihood is more muddled. By taking a one-dimensional slice:

plot(prange, [obj([1.5, i]) for i in prange], lw = 3,
     title = "Parameter 2 Likelihood (Parameter 1 = 1.5)",
     xlabel = "Parameter 2", ylabel = "Objective Function Value")

[Figure: 1 Parameter Likelihood slice]

we can see that there's still a clear minimum at the true value. Thus, we will use the global optimizers from BlackBoxOptim.jl to find the values. We set our search range to be from 0.5 to 5.0 for both of the parameters and let it optimize:

bound1 = Tuple{Float64, Float64}[(0.5, 5), (0.5, 5)]
optprob = OptimizationProblem(obj, [2.0, 2.0], lb = first.(bound1), ub = last.(bound1))
OptimizationProblem. In-place: true
u0: 2-element Vector{Float64}:
 2.0
 2.0
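
We can then hand the problem to one of the BlackBoxOptim.jl solvers wrapped by OptimizationBBO. A minimal sketch of this step (the particular solver and iteration limit are one reasonable choice, not the only one):

# Adaptive differential evolution from BlackBoxOptim, via OptimizationBBO
result = solve(optprob, BBO_adaptive_de_rand_1_bin_radiuslimited(), maxiters = 11000)
result.u  # the estimated parameters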

The returned parameters are essentially the true values [1.5, 1.0], showing that the optimizer found the true parameters as the best fit to the likelihood.