Sparse Identification with noisy data
Many real-world data sources are corrupted with measurement noise, which can have a big impact on the recovery of the underlying equations of motion. This example shows how we can use a collocation method and batching to perform SINDy in the presence of noise.
using DataDrivenDiffEq
using LinearAlgebra
using OrdinaryDiffEq
using DataDrivenSparse
using StableRNGs
using Plots
gr()
rng = StableRNG(1337)
function pendulum(u, p, t)
x = u[2]
y = -9.81sin(u[1]) - 0.3u[2]^3 - 3.0 * cos(u[1]) - 10.0 * exp(-((t - 5.0) / 5.0)^2)
return [x; y]
end
u0 = [0.99π; -1.0]
tspan = (0.0, 15.0)
prob = ODEProblem(pendulum, u0, tspan)
sol = solve(prob, Tsit5(), saveat = 0.01);
We add random noise to our measurements.
X = sol[:, :] + 0.2 .* randn(rng, size(sol));
ts = sol.t;
plot(ts, X', color = :red)
plot!(sol, color = :black)
To estimate the system, we first create a DataDrivenProblem
, which requires measurement data. Using a collocation method, it automatically provides the derivative and smoothes the trajectory. Control signals can be passed in as a function (u,p,t)->control
or an array of measurements.
prob = ContinuousDataDrivenProblem(X, ts, GaussianKernel(),
U = (u, p, t) -> [exp(-((t - 5.0) / 5.0)^2)],
p = ones(2))
plot(prob, size = (600,600))
Now we infer the system structure. First we define a Basis
which collects all possible candidate terms. Since we want to use SINDy, we call solve
with an sparsifying algorithm
, in this case STLSQ
which iterates different sparsity thresholds and returns a Pareto optimal solution. Note that we include the control signal in the basis as an additional variable c
.
@variables u[1:2] c[1:1]
@parameters w[1:2]
u = collect(u)
c = collect(c)
w = collect(w)
h = Num[sin.(w[1] .* u[1]); cos.(w[2] .* u[1]); polynomial_basis(u, 5); c]
basis = Basis(h, u, parameters = w, controls = c);
Model ##Basis#332 with 24 equations
States : u[1] u[2]
Parameters : w[1] w[2]
Independent variable: t
Equations
φ₁ = sin(u[1]*w[1])
φ₂ = cos(u[1]*w[2])
φ₃ = 1
φ₄ = u[1]
...
φ₂₄ = c[1]
To solve the problem, we also define a DataProcessing
which defines randomly shuffled minibatches of our data and selects the best fit.
sampler = DataProcessing(split = 0.8, shuffle = true, batchsize = 30, rng = rng)
λs = exp10.(-10:0.1:0)
opt = STLSQ(λs)
res = solve(prob, basis, opt,
options = DataDrivenCommonOptions(data_processing = sampler, digits = 1))
"DataDrivenSolution{Float64}"
A more detailed description of the result can be printed via print(res, Val{true})
, which also includes the discovered equations and parameter values.
Where the resulting DataDrivenSolution
stores information about the inferred model and the parameters:
system = get_basis(res)
params = get_parameter_map(system)
Model ##Basis#335 with 2 equations
States : u[1] u[2]
Parameters : 19
Independent variable: t
Equations
Differential(t)(u[1]) = p₃*u[2]
Differential(t)(u[2]) = p₆ + c[1]*p₁₉ + p₁₀*u[2] + p₄*sin(u[1]*w[1]) + p₅*cos(u[1]*w[2]) + p₇*u[1] + p₁₁*u[1]*u[2] + p₁₄*(u[2]^2) + p₈*(u[1]^2) + p₁₂*(u[1]^2)*u[2] + p₁₅*u[1]*(u[2]^2) + p₁₇*(u[2]^3) + p₉*(u[1]^3) + p₁₃*(u[1]^3)*u[2] + p₁₆*(u[1]^2)*(u[2]^2) + p₁₈*u[1]*(u[2]^3)
Pair{SymbolicUtils.BasicSymbolic{Real}, Float64}[w[1] => 1.0, w[2] => 1.0, p₃ => 0.9, p₄ => -8.6, p₅ => -5.8, p₆ => 9.2, p₇ => 7.6, p₈ => 1.8, p₉ => 0.1, p₁₀ => 2.8, p₁₁ => 5.3, p₁₂ => 2.8, p₁₃ => 0.5, p₁₄ => 2.7, p₁₅ => 2.5, p₁₆ => 0.7, p₁₇ => 0.6, p₁₈ => 0.5, p₁₉ => -10.3]
We can see that even if there are other terms active, the most important terms are included inside the model.
And a visual check of the result can be performed by plotting the result
plot(
plot(prob), plot(res), layout = (1,2)
)
Copy-Pasteable Code
using DataDrivenDiffEq
using LinearAlgebra
using OrdinaryDiffEq
using DataDrivenSparse
using StableRNGs
rng = StableRNG(1337)
function pendulum(u, p, t)
x = u[2]
y = -9.81sin(u[1]) - 0.3u[2]^3 - 3.0 * cos(u[1]) - 10.0 * exp(-((t - 5.0) / 5.0)^2)
return [x; y]
end
u0 = [0.99π; -1.0]
tspan = (0.0, 15.0)
prob = ODEProblem(pendulum, u0, tspan)
sol = solve(prob, Tsit5(), saveat = 0.01);
X = sol[:, :] + 0.2 .* randn(rng, size(sol));
ts = sol.t;
prob = ContinuousDataDrivenProblem(X, ts, GaussianKernel(),
U = (u, p, t) -> [exp(-((t - 5.0) / 5.0)^2)],
p = ones(2))
@variables u[1:2] c[1:1]
@parameters w[1:2]
u = collect(u)
c = collect(c)
w = collect(w)
h = Num[sin.(w[1] .* u[1]); cos.(w[2] .* u[1]); polynomial_basis(u, 5); c]
basis = Basis(h, u, parameters = w, controls = c);
println(basis) # hide
sampler = DataProcessing(split = 0.8, shuffle = true, batchsize = 30, rng = rng)
λs = exp10.(-10:0.1:0)
opt = STLSQ(λs)
res = solve(prob, basis, opt,
options = DataDrivenCommonOptions(data_processing = sampler, digits = 1))
system = get_basis(res)
params = get_parameter_map(system)
println(system) # hide
println(params) # hide
# This file was generated using Literate.jl, https://github.com/fredrikekre/Literate.jl
This page was generated using Literate.jl.