Training Neural Stochastic Differential Equations with GPU acceleration (I)
In the previous exercises we had to fully specify a model. Now let's shift the burden of model specification onto the data by utilizing neural differential equations. A neural differential equation is a differential equation where the model equations are replaced, either in full or in part, by a neural network. For example, a neural ordinary differential equation is an equation $u^\prime = f(u,p,t)$ where $f$ is a neural network. We can learn this neural network from data using various methods, the easiest of which is the single shooting method: choose neural network parameters, solve the equation, and use the mismatch between the ODE's solution and the data as the loss.
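Schematically, a single-shooting loss might look like the following sketch, where prob, t, and data stand in for whatever problem and dataset are being fit:

# Single shooting sketch: solve the ODE under candidate parameters p,
# then penalize the pointwise mismatch against the observed data
function loss(p)
    sol = solve(prob, Tsit5(), p=p, saveat=t)
    sum(abs2, data .- Array(sol))
end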
In this example we will define and train various forms of neural differential equations. Note that all of the differential equation types are compatible with neural differential equations, so this is only going to scratch the surface of the possibilities!
Part 1: Constructing and Training a Basic Neural ODE
Use the DiffEqFlux.jl README to construct a neural ODE and train it against the following data:
using OrdinaryDiffEq

u0 = Float32[2.0; 0.0]
datasize = 30
tspan = (0.0f0, 1.5f0)

# True model: a cubic nonlinearity pushed through a fixed linear map
function trueODEfunc(du, u, p, t)
    true_A = [-0.1 2.0; -2.0 -0.1]
    du .= ((u.^3)'true_A)'
end

t = range(tspan[1], tspan[2], length=datasize)
prob = ODEProblem(trueODEfunc, u0, tspan)
ode_data = Array(solve(prob, Tsit5(), saveat=t))
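One possible single-shooting setup, assuming the Flux-era NeuralODE layer from DiffEqFlux.jl (the constructor and training API vary between versions, so treat this as a sketch rather than the canonical README code):

using DiffEqFlux, Flux

# The x.^3 input layer mirrors the cubic structure of the true model
dudt = Chain(x -> x.^3, Dense(2, 50, tanh), Dense(50, 2))
n_ode = NeuralODE(dudt, tspan, Tsit5(), saveat=t)

# Single shooting: solve with the current weights, compare against data
loss() = sum(abs2, ode_data .- reduce(hcat, n_ode(u0).u))
Flux.train!(loss, Flux.params(n_ode), Iterators.repeated((), 300), ADAM(0.05))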
Part 2: GPU-accelerating the Neural ODE Process
Use the gpu function from Flux.jl to move all of the calculations onto the GPU and train the neural ODE using a GPU-accelerated Tsit5 solve with adjoints.
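A minimal sketch of the GPU transfer, assuming a CUDA-capable setup and the same Flux-era NeuralODE API as above (recent versions select the adjoint method through a sensealg keyword, omitted here):

using Flux, DiffEqFlux, OrdinaryDiffEq

# Move the network, the initial condition, and the data onto the GPU
dudt_gpu = Chain(x -> x.^3, Dense(2, 50, tanh), Dense(50, 2)) |> gpu
u0_gpu = gpu(u0)
ode_data_gpu = gpu(ode_data)

n_ode = NeuralODE(dudt_gpu, tspan, Tsit5(), saveat=t)

# Keep the whole loss on the GPU by hcat-ing the saved states
loss() = sum(abs2, ode_data_gpu .- reduce(hcat, n_ode(u0_gpu).u))
Flux.train!(loss, Flux.params(n_ode), Iterators.repeated((), 300), ADAM(0.05))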
Part 3: Defining and Training a Mixed Neural ODE
Gather data from the Lotka-Volterra equation:
function lotka_volterra(du, u, p, t)
    x, y = u
    α, β, δ, γ = p
    du[1] = dx = α*x - β*x*y
    du[2] = dy = -δ*y + γ*x*y
end

u0 = [1.0, 1.0]
tspan = (0.0, 10.0)
p = [1.5, 1.0, 3.0, 1.0]
prob = ODEProblem(lotka_volterra, u0, tspan, p)
# Solve, then interpolate the continuous solution onto a uniform grid
sol = Array(solve(prob, Tsit5())(0.0:1.0:10.0))
Now use the mixed neural differential equations section of the documentation to define a mixed neural ODE where the functional form of $\frac{dx}{dt}$ is known, and try to learn a neural formulation for $\frac{dy}{dt}$ directly from the data.
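One possible sketch, using Flux's destructure to embed a network inside the otherwise-known equations (the network size, parameter layout, and loss are illustrative assumptions, not the documented recipe):

using Flux, OrdinaryDiffEq

nn = Chain(Dense(2, 16, tanh), Dense(16, 1))
p_nn, re = Flux.destructure(nn)
p_mixed = [1.5, 1.0, p_nn...]  # known dx/dt parameters, then the weights

function mixed_lv!(du, u, p, t)
    x, y = u
    du[1] = p[1]*x - p[2]*x*y   # dx/dt: known functional form
    du[2] = re(p[3:end])(u)[1]  # dy/dt: learned from the data
end

prob_mixed = ODEProblem(mixed_lv!, u0, tspan, p_mixed)
loss(p) = sum(abs2, sol .- Array(solve(prob_mixed, Tsit5(), p=p, saveat=0.0:1.0:10.0)))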
Part 4: Constructing a Basic Neural SDE
Generate data from the Lotka-Volterra equation with multiplicative noise:
using StochasticDiffEq

function lotka_volterra(du, u, p, t)
    x, y = u
    α, β, δ, γ = p
    du[1] = dx = α*x - β*x*y
    du[2] = dy = -δ*y + γ*x*y
end

# Diagonal multiplicative noise: each state is perturbed in proportion
# to its own magnitude
function lv_noise(du, u, p, t)
    du[1] = p[5]*u[1]
    du[2] = p[6]*u[2]
end

u0 = [1.0, 1.0]
tspan = (0.0, 10.0)
p = [1.5, 1.0, 3.0, 1.0, 0.1, 0.1]
prob = SDEProblem(lotka_volterra, lv_noise, u0, tspan, p)
sol = [Array(solve(prob, SOSRI())(0.0:1.0:10.0)) for i in 1:20] # 20 solution samples
Train a neural stochastic differential equation $dX = f(X)dt + g(X)dW_t$ where both the drift ($f$) and the diffusion ($g$) functions are neural networks. See if constraining $g$ can make the problem easier to fit.
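A hedged sketch using DiffEqFlux's diagonal-noise neural SDE layer NeuralDSDE (the constructor name and calling convention differ across versions); the single-layer diffusion network is one way to constrain $g$, and moment matching against the sample mean is one illustrative choice of loss:

using DiffEqFlux, Flux, StochasticDiffEq

drift = Chain(Dense(2, 32, tanh), Dense(32, 2))
diffusion = Chain(Dense(2, 2))  # small linear diffusion: a constrained g

n_sde = NeuralDSDE(drift, diffusion, tspan, SOSRI(), saveat=0.0:1.0:10.0)

# Moment matching: mean of simulated trajectories vs. mean of the data
mean_data = sum(sol) / length(sol)
predict() = sum(reduce(hcat, n_sde(u0).u) for _ in 1:20) / 20
loss() = sum(abs2, mean_data .- predict())
Flux.train!(loss, Flux.params(n_sde), Iterators.repeated((), 100), ADAM(0.01))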
Part 5: Optimizing the training behavior with minibatching (E)
Use minibatching on the data to improve the training procedure. An example can be found at this PR.
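For illustration only, here is one hypothetical time-window minibatching scheme, reusing the Part 3 data and the prob_mixed sketch from above (the window length and the remake-based re-solving are assumptions; the linked PR's approach may differ):

using Random

ts = collect(0.0:1.0:10.0)
data = sol      # the 2×11 Lotka-Volterra data matrix from Part 3
batch_len = 4

for i in shuffle(1:length(ts) - batch_len + 1)
    t_batch = ts[i:i + batch_len - 1]
    batch = data[:, i:i + batch_len - 1]
    # Re-solve only on this window, starting from the window's first datum
    _prob = remake(prob_mixed, u0=batch[:, 1], tspan=(t_batch[1], t_batch[end]))
    l = sum(abs2, batch .- Array(solve(_prob, Tsit5(), saveat=t_batch)))
    # ...take a gradient step on l with respect to the network weights...
end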