Symbolic Regression of a Linear Time Continuous Systems

Note

Symbolic regression is using regularized evolution, simulated annealing, and gradient-free optimization to find suitable equations. Hence, the performance might differ and depends strongly on the hyperparameters of the optimization. This example might not recover the groundtruth, but is showing off the use within DataDrivenDiffEq.jl.

DataDrivenDiffEq offers an interface to SymbolicRegression.jl to infer more complex functions.

using DataDrivenDiffEq
using LinearAlgebra
using OrdinaryDiffEq
using DataDrivenSR
using Plots

A = [-0.9 0.2; 0.0 -0.5]
B = [0.0; 1.0]
u0 = [10.0; -10.0]
tspan = (0.0, 20.0)

f(u, p, t) = A * u .+ B .* sin(0.5 * t)

sys = ODEProblem(f, u0, tspan)
sol = solve(sys, Tsit5(), saveat = 0.01);

We will use the data provided by our problem, but add the control signal U = sin(0.5*t) to it. Instead of using a function, like in another example

X = Array(sol)
t = sol.t
U = permutedims(sin.(0.5 * t))
prob = ContinuousDataDrivenProblem(X, t, U = U)
Continuous DataDrivenProblem{Float64} ##DDProblem#393 in 2 dimensions and 2001 samples with controls

And plot the problems data.

plot(prob)
Example block output

To solve our problem, we will use EQSearch, which provides a wrapper for the symbolic regression interface. We will stick to simple operations, use a L1DistLoss, and limit the verbosity of the algorithm.

eqsearch_options = SymbolicRegression.Options(binary_operators = [+, *],
    loss = L1DistLoss(),
    verbosity = -1, progress = false, npop = 30,
    timeout_in_seconds = 60.0)

alg = EQSearch(eq_options = eqsearch_options)
EQSearch
  weights: Nothing nothing
  numprocs: Nothing nothing
  procs: Nothing nothing
  addprocs_function: Nothing nothing
  parallelism: Symbol serial
  runtests: Bool true
  eq_options: Options{SymbolicRegression.CoreModule.OptionsStructModule.ComplexityMapping{Int64, Int64}, OperatorEnum, Node, Expression, @NamedTuple{}, MutationWeights, false, false, nothing, Nothing, 5}

Again, we solve the problem to obtain a DataDrivenSolution. Note that any additional keyword arguments are passed onto symbolic regressions EquationSearch with the exception of niterations which is maxiters

res = solve(prob, alg, options = DataDrivenCommonOptions(maxiters = 100))
println(res)
"DataDrivenSolution{Float64}" with 2 equations and 0 parameters.
Returncode: Success
Residual sum of squares: 263685.7920194242

We can inspect the systems metrics, here the loglikelihood of the result.

loglikelihood(res)
-8380.115916799206
Note

Currently the parameters of the result found by EQSearch are not turned into symbolic parameters. This affects some functions like dof, aicc, bic.

system = get_basis(res)
Model ##Basis#400 with 2 equations
States : x₁ x₂
Independent variable: t
Equations
Differential(t)(x₁) = -u₁ - x₁ + x₂
Differential(t)(x₂) = 1.0 + u₁ + x₁ - x₂ + x₁*(-1.0 + x₂)

Copy-Pasteable Code

using DataDrivenDiffEq
using LinearAlgebra
using OrdinaryDiffEq
using DataDrivenSR

A = [-0.9 0.2; 0.0 -0.5]
B = [0.0; 1.0]
u0 = [10.0; -10.0]
tspan = (0.0, 20.0)

f(u, p, t) = A * u .+ B .* sin(0.5 * t)

sys = ODEProblem(f, u0, tspan)
sol = solve(sys, Tsit5(), saveat = 0.01);

X = Array(sol)
t = sol.t
U = permutedims(sin.(0.5 * t))
prob = ContinuousDataDrivenProblem(X, t, U = U)

eqsearch_options = SymbolicRegression.Options(binary_operators = [+, *],
    loss = L1DistLoss(),
    verbosity = -1, progress = false, npop = 30,
    timeout_in_seconds = 60.0)

alg = EQSearch(eq_options = eqsearch_options)

res = solve(prob, alg, options = DataDrivenCommonOptions(maxiters = 100))

loglikelihood(res)

system = get_basis(res)

# This file was generated using Literate.jl, https://github.com/fredrikekre/Literate.jl

This page was generated using Literate.jl.