Automatically Discover Missing Physics by Embedding Machine Learning into Differential Equations
One of the most time-consuming parts of modeling is building the model. How do you know when your model is correct? When you solve an inverse problem to calibrate your model to data, who you gonna call if there are no parameters that make the model the data? This is the problem that the Universal Differential Equation (UDE) approach solves: the ability to start from the model you have, and suggest minimal mechanistic extensions that would allow the model to fit the data. In this showcase, we will show how to take a partially correct model and auto-complete it to find the missing physics.
For a scientific background on the universal differential equation approach, check out Universal Differential Equations for Scientific Machine Learning
Starting Point: The Packages To Use
There are many packages which are used as part of this showcase. Let's detail what they are and how they are used. For the neural network training:
Module | Description |
---|---|
OrdinaryDiffEq.jl (DifferentialEquations.jl) | The numerical differential equation solvers |
SciMLSensitivity.jl | The adjoint methods, defines gradients of ODE solvers |
Optimization.jl | The optimization library |
OptimizationOptimisers.jl | The optimization solver package with Adam |
OptimizationOptimJL.jl | The optimization solver package with LBFGS |
LineSearches.jl | Line search algorithms package to be used with LBFGS |
For the symbolic model discovery:
Module | Description |
---|---|
ModelingToolkit.jl | The symbolic modeling environment |
DataDrivenDiffEq.jl | The symbolic regression interface |
DataDrivenSparse.jl | The sparse regression symbolic regression solvers |
Zygote.jl | The automatic differentiation library for fast gradients |
Julia standard libraries:
Module | Description |
---|---|
LinearAlgebra | Required for the norm function |
Statistics | Required for the mean function |
And external libraries:
Module | Description |
---|---|
Lux.jl | The deep learning (neural network) framework |
ComponentArrays.jl | For the ComponentArray type to match Lux to SciML |
Plots.jl | The plotting and visualization library |
StableRNGs.jl | Stable random seeding |
The deep learning framework Flux.jl could be used in place of Lux, though most tutorials in SciML generally prefer Lux.jl due to its explicit parameter interface, leading to nicer code. Both share the same internal implementations of core kernels, and thus have very similar feature support and performance.
# SciML Tools
using OrdinaryDiffEq, ModelingToolkit, DataDrivenDiffEq, SciMLSensitivity, DataDrivenSparse
using Optimization, OptimizationOptimisers, OptimizationOptimJL, LineSearches
# Standard Libraries
using LinearAlgebra, Statistics
# External Libraries
using ComponentArrays, Lux, Zygote, Plots, StableRNGs
gr()
# Set a random seed for reproducible behaviour
rng = StableRNG(1111)
StableRNGs.LehmerRNG(state=0x000000000000000000000000000008af)
Problem Setup
In order to know that we have automatically discovered the correct model, we will use generated data from a known model. This model will be the Lotka-Volterra equations. These equations are given by:
\[\begin{aligned} \frac{dx}{dt} &= \alpha x - \beta x y \\ \frac{dy}{dt} &= -\delta y + \gamma x y \\ \end{aligned}\]
This is a model of rabbits and wolves. $\alpha x$ is the exponential growth of rabbits in isolation, $-\beta x y$ and $\gamma x y$ are the interaction effects of wolves eating rabbits, and $-\delta y$ is the term for how wolves die hungry in isolation.
Now assume that we have never seen rabbits and wolves in the same room. We only know the two effects $\alpha x$ and $-\delta y$. Can we use Scientific Machine Learning to automatically discover an extension to what we already know? That is what we will solve with the universal differential equation.
Generating the Training Data
First, let's generate training data from the Lotka-Volterra equations. This is straightforward and standard DifferentialEquations.jl usage. Our sample data is thus generated as follows:
function lotka!(du, u, p, t)
α, β, γ, δ = p
du[1] = α * u[1] - β * u[2] * u[1]
du[2] = γ * u[1] * u[2] - δ * u[2]
end
# Define the experimental parameter
tspan = (0.0, 5.0)
u0 = 5.0f0 * rand(rng, 2)
p_ = [1.3, 0.9, 0.8, 1.8]
prob = ODEProblem(lotka!, u0, tspan, p_)
solution = solve(prob, Vern7(), abstol = 1e-12, reltol = 1e-12, saveat = 0.25)
# Add noise in terms of the mean
X = Array(solution)
t = solution.t
x̄ = mean(X, dims = 2)
noise_magnitude = 5e-3
Xₙ = X .+ (noise_magnitude * x̄) .* randn(rng, eltype(X), size(X))
plot(solution, alpha = 0.75, color = :black, label = ["True Data" nothing])
scatter!(t, transpose(Xₙ), color = :red, label = ["Noisy Data" nothing])
Definition of the Universal Differential Equation
Now let's define our UDE. We will use Lux.jl to define the neural network as follows:
rbf(x) = exp.(-(x .^ 2))
# Multilayer FeedForward
const U = Lux.Chain(Lux.Dense(2, 5, rbf), Lux.Dense(5, 5, rbf), Lux.Dense(5, 5, rbf),
Lux.Dense(5, 2))
# Get the initial parameters and state variables of the model
p, st = Lux.setup(rng, U)
const _st = st
(layer_1 = NamedTuple(), layer_2 = NamedTuple(), layer_3 = NamedTuple(), layer_4 = NamedTuple())
We then define the UDE as a dynamical system that is u' = known(u) + NN(u)
like:
# Define the hybrid model
function ude_dynamics!(du, u, p, t, p_true)
û = U(u, p, _st)[1] # Network prediction
du[1] = p_true[1] * u[1] + û[1]
du[2] = -p_true[4] * u[2] + û[2]
end
# Closure with the known parameter
nn_dynamics!(du, u, p, t) = ude_dynamics!(du, u, p, t, p_)
# Define the problem
prob_nn = ODEProblem(nn_dynamics!, Xₙ[:, 1], tspan, p)
ODEProblem with uType Vector{Float64} and tType Float64. In-place: true
timespan: (0.0, 5.0)
u0: 2-element Vector{Float64}:
3.1463924566781167
1.5423300037202512
Notice that the most important part of this is that the neural network does not have hard-coded weights. The weights of the neural network are the parameters of the ODE system. This means that if we change the parameters of the ODE system, then we will have updated the internal neural networks to new weights. Keep that in mind for the next part.
... and tada: now we have a neural network integrated into our dynamical system!
Even if the known physics is only approximate or correct, it can be helpful to improve the fitting process! Check out this JuliaCon talk which dives into this issue.
Setting Up the Training Loop
Now let's build a training loop around our UDE. First, let's make a function predict
which runs our simulation at new neural network weights. Recall that the weights of the neural network are the parameters of the ODE, so what we need to do in predict
is update our ODE to our new parameters and then run it.
For this update step, we will use the remake
function from the SciMLProblem interface. remake
works by specifying key = value
pairs to update in the problem fields. Thus to update u0
, we would add a keyword argument u0 = ...
. To update the parameters, we'd do p = ...
. The field names can be acquired from the problem documentation (or the docstrings!).
Knowing this, our predict
function looks like:
function predict(θ, X = Xₙ[:, 1], T = t)
_prob = remake(prob_nn, u0 = X, tspan = (T[1], T[end]), p = θ)
Array(solve(_prob, Vern7(), saveat = T,
abstol = 1e-6, reltol = 1e-6,
sensealg = QuadratureAdjoint(autojacvec = ReverseDiffVJP(true))))
end
predict (generic function with 3 methods)
There are many choices for the combination of sensitivity algorithm and automatic differentiation library (see Choosing a Sensitivity Algorithm. For example, you could have used sensealg=ForwardDiffSensitivity()
.
Now, for our loss function, we solve the ODE at our new parameters and check its L2 loss against the dataset. Using our predict
function, this looks like:
function loss(θ)
X̂ = predict(θ)
mean(abs2, Xₙ .- X̂)
end
loss (generic function with 1 method)
Lastly, what we will need to track our optimization is to define a callback as defined by the OptimizationProblem's solve interface. Because our function only returns one value, the loss l
, the callback will be a function of the current parameters θ
and l
. Let's setup a callback prints every 50 steps the current loss:
losses = Float64[]
callback = function (state, l)
push!(losses, l)
if length(losses) % 50 == 0
println("Current loss after $(length(losses)) iterations: $(losses[end])")
end
return false
end
#1 (generic function with 1 method)
Training
Now we're ready to train! To run the training process, we will need to build an OptimizationProblem
. Because we have a lot of parameters, we will use Zygote.jl. Optimization.jl makes the choice of automatic differentiation easy, just by specifying an adtype
in the OptimizationFunction
construction
Knowing this, we can build our OptimizationProblem
as follows:
adtype = Optimization.AutoZygote()
optf = Optimization.OptimizationFunction((x, p) -> loss(x), adtype)
optprob = Optimization.OptimizationProblem(optf, ComponentVector{Float64}(p))
OptimizationProblem. In-place: true
u0: ComponentVector{Float64}(layer_1 = (weight = [0.6538559794425964 0.7530555129051208; 0.5314245820045471 -1.146309733390808; … ; 0.6230413317680359 -0.9949618577957153; -0.267433762550354 -0.42296043038368225], bias = [0.6228184700012207, -0.36390432715415955, 0.5564076900482178, -0.6573317050933838, 0.0013557798229157925]), layer_2 = (weight = [-0.6952739953994751 -0.17967917025089264 … 0.24313239753246307 0.001123766996897757; -0.17176459729671478 -0.6643869280815125 … 0.18968744575977325 0.04383226856589317; … ; -0.6787672638893127 -0.4131874442100525 … 0.1658746600151062 -0.5179171562194824; -0.10373302549123764 -0.49185648560523987 … -0.04393347352743149 -0.21508421003818512], bias = [0.0037434629630297422, -0.18621833622455597, -0.20786269009113312, -0.3489077389240265, 0.14812849462032318]), layer_3 = (weight = [-0.5676549077033997 0.2754976153373718 … 0.5977734923362732 -0.04514680802822113; 0.763871967792511 0.0947229415178299 … -0.4744568467140198 -0.6932665705680847; … ; -0.030543562024831772 -0.18625907599925995 … -0.33154672384262085 0.035615935921669006; 0.5577947497367859 0.5849692225456238 … 0.20916323363780975 0.7067035436630249], bias = [-0.25027230381965637, 0.0903530940413475, 0.41296181082725525, 0.19502975046634674, -0.20660826563835144]), layer_4 = (weight = [0.22668680548667908 0.28912559151649475 … -0.49543461203575134 -0.13401341438293457; 0.6387283205986023 0.011346815153956413 … 0.20840883255004883 -0.3841484487056732], bias = [-0.30624446272850037, 0.07177023589611053]))
Now... we optimize it. We will use a mixed strategy. First, let's do some iterations of ADAM because it's better at finding a good general area of parameter space, but then we will move to BFGS which will quickly hone in on a local minimum. Note that if we only use ADAM it will take a ton of iterations, and if we only use BFGS we normally end up in a bad local minimum, so this combination tends to be a good one for UDEs.
Thus we first solve the optimization problem with ADAM. Choosing a learning rate of 0.1 (tuned to be as high as possible that doesn't tend to make the loss shoot up), we see:
res1 = Optimization.solve(
optprob, OptimizationOptimisers.Adam(), callback = callback, maxiters = 5000)
println("Training loss after $(length(losses)) iterations: $(losses[end])")
Current loss after 50 iterations: 124231.94202136253
Current loss after 100 iterations: 99525.79353133567
Current loss after 150 iterations: 73986.99608008424
Current loss after 200 iterations: 57253.859029620355
Current loss after 250 iterations: 46433.64120845539
Current loss after 300 iterations: 37891.22625079194
Current loss after 350 iterations: 30743.91701109269
Current loss after 400 iterations: 24605.353250294065
Current loss after 450 iterations: 19422.922187963344
Current loss after 500 iterations: 14986.695033950775
Current loss after 550 iterations: 11212.358093286573
Current loss after 600 iterations: 8098.753689897016
Current loss after 650 iterations: 5665.969091146896
Current loss after 700 iterations: 3872.1479110656296
Current loss after 750 iterations: 2601.3619778459697
Current loss after 800 iterations: 1721.9534653174674
Current loss after 850 iterations: 1122.8664627497449
Current loss after 900 iterations: 720.5492324164438
Current loss after 950 iterations: 454.518553999165
Current loss after 1000 iterations: 281.5842691759737
Current loss after 1050 iterations: 171.22540322985955
Current loss after 1100 iterations: 102.16454733706459
Current loss after 1150 iterations: 59.821881107961254
Current loss after 1200 iterations: 34.40378939080521
Current loss after 1250 iterations: 19.473611195203993
Current loss after 1300 iterations: 10.897067767711581
Current loss after 1350 iterations: 6.08124851467881
Current loss after 1400 iterations: 3.4392056779854028
Current loss after 1450 iterations: 2.023719791547296
Current loss after 1500 iterations: 1.2836012121429308
Current loss after 1550 iterations: 0.9062388134519439
Current loss after 1600 iterations: 0.7193557654830671
Current loss after 1650 iterations: 0.6285326618746371
Current loss after 1700 iterations: 0.585388530513248
Current loss after 1750 iterations: 0.5654290916117396
Current loss after 1800 iterations: 0.5564062582684649
Current loss after 1850 iterations: 0.5523966312765395
Current loss after 1900 iterations: 0.5506224146831474
Current loss after 1950 iterations: 0.5498169406636374
Current loss after 2000 iterations: 0.5494185547547528
Current loss after 2050 iterations: 0.5491853262932695
Current loss after 2100 iterations: 0.549016141814157
Current loss after 2150 iterations: 0.5488700478639679
Current loss after 2200 iterations: 0.5487309916486452
Current loss after 2250 iterations: 0.5485928947732501
Current loss after 2300 iterations: 0.5484535417637989
Current loss after 2350 iterations: 0.5483121625689491
Current loss after 2400 iterations: 0.5481685097183534
Current loss after 2450 iterations: 0.5480225187571808
Current loss after 2500 iterations: 0.5478741879745075
Current loss after 2550 iterations: 0.5477235374773672
Current loss after 2600 iterations: 0.5475705958616723
Current loss after 2650 iterations: 0.5474153960973511
Current loss after 2700 iterations: 0.5472579743618035
Current loss after 2750 iterations: 0.5470983697747643
Current loss after 2800 iterations: 0.5469366243947303
Current loss after 2850 iterations: 0.5467727832857205
Current loss after 2900 iterations: 0.5466068945998176
Current loss after 2950 iterations: 0.5464390096605017
Current loss after 3000 iterations: 0.5462691830426868
Current loss after 3050 iterations: 0.5460974726481311
Current loss after 3100 iterations: 0.5459239397755652
Current loss after 3150 iterations: 0.5457486491849941
Current loss after 3200 iterations: 0.5455716691556194
Current loss after 3250 iterations: 0.5453930715368189
Current loss after 3300 iterations: 0.5452129317915849
Current loss after 3350 iterations: 0.5450313290317826
Current loss after 3400 iterations: 0.5448483460446452
Current loss after 3450 iterations: 0.5446640693098068
Current loss after 3500 iterations: 0.5444785890062415
Current loss after 3550 iterations: 0.5442919990084443
Current loss after 3600 iterations: 0.544104396871173
Current loss after 3650 iterations: 0.5439158838020817
Current loss after 3700 iterations: 0.5437265646215974
Current loss after 3750 iterations: 0.5435365477093422
Current loss after 3800 iterations: 0.5433459449365368
Current loss after 3850 iterations: 0.543154871583682
Current loss after 3900 iterations: 0.5429634462429742
Current loss after 3950 iterations: 0.5427717907048729
Current loss after 4000 iterations: 0.5425800298282992
Current loss after 4050 iterations: 0.5423882913940021
Current loss after 4100 iterations: 0.5421967059406464
Current loss after 4150 iterations: 0.5420054065832866
Current loss after 4200 iterations: 0.5418145288138929
Current loss after 4250 iterations: 0.5416242102837753
Current loss after 4300 iterations: 0.5414345905677361
Current loss after 4350 iterations: 0.5412458109099295
Current loss after 4400 iterations: 0.5410580139514983
Current loss after 4450 iterations: 0.5408713434401783
Current loss after 4500 iterations: 0.5406859439221674
Current loss after 4550 iterations: 0.5405019604166919
Current loss after 4600 iterations: 0.5403195380738394
Current loss after 4650 iterations: 0.5401388218163727
Current loss after 4700 iterations: 0.5399599559535952
Current loss after 4750 iterations: 0.5397830838232627
Current loss after 4800 iterations: 0.5396083473815712
Current loss after 4850 iterations: 0.5394358867730494
Current loss after 4900 iterations: 0.5392658399196554
Current loss after 4950 iterations: 0.5390983420899249
Current loss after 5000 iterations: 0.5389335254617472
Training loss after 5000 iterations: 0.5389335254617472
Now we use the optimization result of the first run as the initial condition of the second optimization, and run it with BFGS. This looks like:
optprob2 = Optimization.OptimizationProblem(optf, res1.u)
res2 = Optimization.solve(
optprob2, LBFGS(linesearch = BackTracking()), callback = callback, maxiters = 1000)
println("Final training loss after $(length(losses)) iterations: $(losses[end])")
# Rename the best candidate
p_trained = res2.u
ComponentVector{Float64}(layer_1 = (weight = [0.752996988814218 0.8526379531687615; -0.2455948135447293 -1.4596518084355918; … ; -0.1216670775525581 -1.3829542399229324; -0.38231804919136736 -0.6882420033199484], bias = [0.7227089150130206, -0.7682288482839728, 0.5493962708599257, -0.7990058157648816, -0.1503414738798957]), layer_2 = (weight = [-0.5477410676980777 -0.01411941929938429 … 0.5153173824885828 0.2605940031008346; -0.21305474163197738 -0.7782851185719629 … 0.0845709352130633 -0.16055632773667028; … ; -0.6770546089203405 -0.2336714204956424 … 0.32515789505921944 -0.4067819432940765; -0.06795818285252915 -0.37552402841378507 … 0.06149754496677458 -0.35776963051909383], bias = [0.19580551055839335, -0.29527107645431777, 0.10765215566800727, -0.31200566341952785, 0.10561201008712083]), layer_3 = (weight = [-1.2402103638006894 -0.22936131561378903 … 1.1116816021542375 0.3478050112860267; 0.6214857535121526 -0.04530689034780612 … -0.6714088557831388 -0.8812183732846564; … ; -0.952107875087353 -0.8699981183871959 … 0.15862959431460563 0.40273421755539524; 1.0918842275825973 0.8028557933754301 … -0.5957483357340678 0.00596373341790026], bias = [0.2315486125523051, -0.09705921795292043, 0.8258066553810954, 0.655973043694172, -1.011919503137277]), layer_4 = (weight = [-1.2384010169639845 0.067316475710089 … -1.3191049149670129 -2.121929749783032; 1.7324887033489678 0.15913032931224355 … 1.1181207444646641 0.9084184365771087], bias = [-0.8472159164890658, 0.6753278327990444]))
and bingo, we have a trained UDE.
Visualizing the Trained UDE
How well did our neural network do? Let's take a look:
# Plot the losses
pl_losses = plot(1:5000, losses[1:5000], yaxis = :log10, xaxis = :log10,
xlabel = "Iterations", ylabel = "Loss", label = "ADAM", color = :blue)
plot!(5001:length(losses), losses[5001:end], yaxis = :log10, xaxis = :log10,
xlabel = "Iterations", ylabel = "Loss", label = "LBFGS", color = :red)
Next, we compare the original data to the output of the UDE predictor. Note that we can even create more samples from the underlying model by simply adjusting the time steps!
## Analysis of the trained network
# Plot the data and the approximation
ts = first(solution.t):(mean(diff(solution.t)) / 2):last(solution.t)
X̂ = predict(p_trained, Xₙ[:, 1], ts)
# Trained on noisy data vs real solution
pl_trajectory = plot(ts, transpose(X̂), xlabel = "t", ylabel = "x(t), y(t)", color = :red,
label = ["UDE Approximation" nothing])
scatter!(solution.t, transpose(Xₙ), color = :black, label = ["Measurements" nothing])
Let's see how well the unknown term has been approximated:
# Ideal unknown interactions of the predictor
Ȳ = [-p_[2] * (X̂[1, :] .* X̂[2, :])'; p_[3] * (X̂[1, :] .* X̂[2, :])']
# Neural network guess
Ŷ = U(X̂, p_trained, st)[1]
pl_reconstruction = plot(ts, transpose(Ŷ), xlabel = "t", ylabel = "U(x,y)", color = :red,
label = ["UDE Approximation" nothing])
plot!(ts, transpose(Ȳ), color = :black, label = ["True Interaction" nothing])
And have a nice look at all the information:
# Plot the error
pl_reconstruction_error = plot(ts, norm.(eachcol(Ȳ - Ŷ)), yaxis = :log, xlabel = "t",
ylabel = "L2-Error", label = nothing, color = :red)
pl_missing = plot(pl_reconstruction, pl_reconstruction_error, layout = (2, 1))
pl_overall = plot(pl_trajectory, pl_missing)
That looks pretty good. And if we are happy with deep learning, we can leave it at that: we have trained a neural network to capture our missing dynamics.
But...
Can we also make it print out the LaTeX for what the missing equations were? Find out more after the break!
Symbolic regression via sparse regression (SINDy based)
This part of the showcase is still a work in progress... shame on us. But be back in a jiffy and we'll have it done.
Okay, that was a quick break, and that's good because this next part is pretty cool. Let's use DataDrivenDiffEq.jl to transform our trained neural network from machine learning mumbo jumbo into predictions of missing mechanistic equations. To do this, we first generate a symbolic basis that represents the space of mechanistic functions we believe this neural network should map to. Let's choose a bunch of polynomial functions:
@variables u[1:2]
b = polynomial_basis(u, 4)
basis = Basis(b, u);
\[ \begin{align} \mathtt{\varphi{_1}} &= 1 \\ \mathtt{\varphi{_2}} &= u_{1} \\ \mathtt{\varphi{_3}} &= u_{1}^{2} \\ \mathtt{\varphi{_4}} &= u_{1}^{3} \\ \mathtt{\varphi{_5}} &= u_{1}^{4} \\ \mathtt{\varphi{_6}} &= u_{2} \\ \mathtt{\varphi{_7}} &= u_{1} u_{2} \\ \mathtt{\varphi{_8}} &= u_{1}^{2} u_{2} \\ \mathtt{\varphi{_9}} &= u_{1}^{3} u_{2} \\ \mathtt{\varphi{_{1 0}}} &= u_{2}^{2} \\ \mathtt{\varphi{_{1 1}}} &= u_{2}^{2} u_{1} \\ \mathtt{\varphi{_{1 2}}} &= u_{2}^{2} u_{1}^{2} \\ \mathtt{\varphi{_{1 3}}} &= u_{2}^{3} \\ \mathtt{\varphi{_{1 4}}} &= u_{2}^{3} u_{1} \\ \mathtt{\varphi{_{1 5}}} &= u_{2}^{4} \end{align} \]
Now let's define our DataDrivenProblem
s for the sparse regressions. To assess the capability of the sparse regression, we will look at 3 cases:
- What if we trained no neural network and tried to automatically uncover the equations from the original noisy data? This is the approach in the literature known as structural identification of dynamical systems (SINDy). We will call this the full problem. This will assess whether this incorporation of prior information was helpful.
- What if we trained the neural network using the ideal right-hand side missing derivative functions? This is the value computed in the plots above as
Ȳ
. This will tell us whether the symbolic discovery could work in ideal situations. - Do the symbolic regression directly on the function
y = NN(x)
, i.e. the trained learned neural network. This is what we really want, and will tell us how to extend our known equations.
To define the full problem, we need to define a DataDrivenProblem
that has the time series of the solution X
, the time points of the solution t
, and the derivative at each time point of the solution, obtained by the ODE solution's interpolation. We can just use an interpolation to get the derivative:
full_problem = ContinuousDataDrivenProblem(Xₙ, t)
Continuous DataDrivenProblem{Float64} ##DDProblem#3393 in 2 dimensions and 21 samples
Now for the other two symbolic regressions, we are regressing input/outputs of the missing terms, and thus we directly define the datasets as the input/output mappings like:
ideal_problem = DirectDataDrivenProblem(X̂, Ȳ)
nn_problem = DirectDataDrivenProblem(X̂, Ŷ)
Direct DataDrivenProblem{Float64} ##DDProblem#3395 in 2 dimensions and 41 samples
Let's solve the data-driven problems using sparse regression. We will use the ADMM
method, which requires we define a set of shrinking cutoff values λ
, and we do this like:
λ = 1e-1
opt = ADMM(λ)
DataDrivenSparse.ADMM{Float64, Float64}(0.1, 1.0)
This is one of many methods for sparse regression, consult the DataDrivenDiffEq.jl documentation for more information on the algorithm choices. Taking this, let's solve each of the sparse regressions:
options = DataDrivenCommonOptions(maxiters = 10_000,
normalize = DataNormalization(ZScoreTransform),
selector = bic, digits = 1,
data_processing = DataProcessing(split = 0.9,
batchsize = 30,
shuffle = true,
rng = StableRNG(1111)))
full_res = solve(full_problem, basis, opt, options = options)
full_eqs = get_basis(full_res)
println(full_res)
┌ Warning: Number of observations less than batch-size, decreasing the batch-size to 19
└ @ MLUtils ~/.cache/julia-buildkite-plugin/depots/0183cc98-c3b4-4959-aaaa-6c0d5f351407/packages/MLUtils/LmmaQ/src/batchview.jl:95
┌ Warning: Number of observations less than batch-size, decreasing the batch-size to 19
└ @ MLUtils ~/.cache/julia-buildkite-plugin/depots/0183cc98-c3b4-4959-aaaa-6c0d5f351407/packages/MLUtils/LmmaQ/src/batchview.jl:95
"DataDrivenSolution{Float64}" with 2 equations and 4 parameters.
Returncode: Success
Residual sum of squares: 84.99147506548839
options = DataDrivenCommonOptions(maxiters = 10_000,
normalize = DataNormalization(ZScoreTransform),
selector = bic, digits = 1,
data_processing = DataProcessing(split = 0.9,
batchsize = 30,
shuffle = true,
rng = StableRNG(1111)))
ideal_res = solve(ideal_problem, basis, opt, options = options)
ideal_eqs = get_basis(ideal_res)
println(ideal_res)
"DataDrivenSolution{Float64}" with 2 equations and 2 parameters.
Returncode: Success
Residual sum of squares: 12.2533642593345
options = DataDrivenCommonOptions(maxiters = 10_000,
normalize = DataNormalization(ZScoreTransform),
selector = bic, digits = 1,
data_processing = DataProcessing(split = 0.9,
batchsize = 30,
shuffle = true,
rng = StableRNG(1111)))
nn_res = solve(nn_problem, basis, opt, options = options)
nn_eqs = get_basis(nn_res)
println(nn_res)
"DataDrivenSolution{Float64}" with 2 equations and 2 parameters.
Returncode: Success
Residual sum of squares: 12.559447636106114
Note that we passed the identical options into each of the solve
calls to get the same data for each call.
We already saw that the full problem has failed to identify the correct equations of motion. To have a closer look, we can inspect the corresponding equations:
for eqs in (full_eqs, ideal_eqs, nn_eqs)
println(eqs)
println(get_parameter_map(eqs))
println()
end
Model ##Basis#3396 with 2 equations
States : u[1] u[2]
Parameters : p₁ p₂ p₃ p₄
Independent variable: t
Equations
Differential(t)(u[1]) = p₁*u[1] + p₂*u[2] + p₃*(u[2]^2)
Differential(t)(u[2]) = p₄*u[2]
Pair{SymbolicUtils.BasicSymbolic{Real}, Float64}[p₁ => 0.8, p₂ => -0.2, p₃ => -0.4, p₄ => -0.9]
Model ##Basis#3400 with 2 equations
States : u[1] u[2]
Parameters : p₁ p₂
Independent variable: t
Equations
φ₁ = p₁*u[1]*u[2]
φ₂ = p₂*u[1]*u[2]
Pair{SymbolicUtils.BasicSymbolic{Real}, Float64}[p₁ => -0.8, p₂ => 0.7]
Model ##Basis#3404 with 2 equations
States : u[1] u[2]
Parameters : p₁ p₂
Independent variable: t
Equations
φ₁ = p₁*u[1]*u[2]
φ₂ = p₂*u[1]*u[2]
Pair{SymbolicUtils.BasicSymbolic{Real}, Float64}[p₁ => -0.8, p₂ => 0.7]
Next, we want to predict with our model. To do so, we embed the basis into a function like before:
# Define the recovered, hybrid model
function recovered_dynamics!(du, u, p, t)
û = nn_eqs(u, p) # Recovered equations
du[1] = p_[1] * u[1] + û[1]
du[2] = -p_[4] * u[2] + û[2]
end
estimation_prob = ODEProblem(recovered_dynamics!, u0, tspan, get_parameter_values(nn_eqs))
estimate = solve(estimation_prob, Tsit5(), saveat = solution.t)
# Plot
plot(solution)
plot!(estimate)
We are still a bit off, so we fine tune the parameters by simply minimizing the residuals between the UDE predictor and our recovered parametrized equations:
function parameter_loss(p)
Y = reduce(hcat, map(Base.Fix2(nn_eqs, p), eachcol(X̂)))
sum(abs2, Ŷ .- Y)
end
optf = Optimization.OptimizationFunction((x, p) -> parameter_loss(x), adtype)
optprob = Optimization.OptimizationProblem(optf, get_parameter_values(nn_eqs))
parameter_res = Optimization.solve(optprob, Optim.LBFGS(), maxiters = 1000)
retcode: Success
u: 2-element Vector{Float64}:
-0.9007795900806872
0.8000884585760631
Simulation
# Look at long term prediction
t_long = (0.0, 50.0)
estimation_prob = ODEProblem(recovered_dynamics!, u0, t_long, parameter_res)
estimate_long = solve(estimation_prob, Tsit5(), saveat = 0.1) # Using higher tolerances here results in exit of julia
plot(estimate_long)
true_prob = ODEProblem(lotka!, u0, t_long, p_)
true_solution_long = solve(true_prob, Tsit5(), saveat = estimate_long.t)
plot!(true_solution_long)
Post Processing and Plots
c1 = 3 # RGBA(174/255,192/255,201/255,1) # Maroon
c2 = :orange # RGBA(132/255,159/255,173/255,1) # Red
c3 = :blue # RGBA(255/255,90/255,0,1) # Orange
c4 = :purple # RGBA(153/255,50/255,204/255,1) # Purple
p1 = plot(t, abs.(Array(solution) .- estimate)' .+ eps(Float32),
lw = 3, yaxis = :log, title = "Timeseries of UODE Error",
color = [3 :orange], xlabel = "t",
label = ["x(t)" "y(t)"],
titlefont = "Helvetica", legendfont = "Helvetica",
legend = :topright)
# Plot L₂
p2 = plot3d(X̂[1, :], X̂[2, :], Ŷ[2, :], lw = 3,
title = "Neural Network Fit of U2(t)", color = c1,
label = "Neural Network", xaxis = "x", yaxis = "y",
titlefont = "Helvetica", legendfont = "Helvetica",
legend = :bottomright)
plot!(X̂[1, :], X̂[2, :], Ȳ[2, :], lw = 3, label = "True Missing Term", color = c2)
p3 = scatter(solution, color = [c1 c2], label = ["x data" "y data"],
title = "Extrapolated Fit From Short Training Data",
titlefont = "Helvetica", legendfont = "Helvetica",
markersize = 5)
plot!(p3, true_solution_long, color = [c1 c2], linestyle = :dot, lw = 5,
label = ["True x(t)" "True y(t)"])
plot!(p3, estimate_long, color = [c3 c4], lw = 1,
label = ["Estimated x(t)" "Estimated y(t)"])
plot!(p3, [2.99, 3.01], [0.0, 10.0], lw = 1, color = :black, label = nothing)
annotate!([(1.5, 13, text("Training \nData", 10, :center, :top, :black, "Helvetica"))])
l = @layout [grid(1, 2)
grid(1, 1)]
plot(p1, p2, p3, layout = l)