The DeepSplitting algorithm
Problems Supported:
1. `PIDEProblem`
2. `ParabolicPDEProblem`
HighDimPDE.DeepSplitting — Type

```julia
DeepSplitting(nn, K = 1, opt = Flux.Optimise.Adam(0.01), λs = nothing, mc_sample = NoSampling())
```

Deep splitting algorithm.
Arguments
- `nn`: a `Flux.Chain`, or more generally a functor.
- `K`: the number of Monte Carlo integrations.
- `opt`: optimizer to be used. By default, `Flux.Optimise.Adam(0.01)`.
- `λs`: the learning rates, used sequentially. Defaults to a single value taken from `opt`.
- `mc_sample::MCSampling`: sampling method for Monte Carlo integrations of the non-local term. Can be `UniformSampling(a, b)`, `NormalSampling(σ_sampling, shifted)`, or `NoSampling` (by default).
Example
```julia
d = 10 # size of the sample
hls = d + 50 # hidden layer size

# Neural network used by the scheme
nn = Flux.Chain(Dense(d, hls, tanh),
                Dense(hls, hls, tanh),
                Dense(hls, 1, x -> x^2))

alg = DeepSplitting(nn, K = 10, opt = Flux.Optimise.Adam(), λs = [5e-3, 1e-3],
                    mc_sample = UniformSampling(zeros(d), ones(d)))
```

CommonSolve.solve — Method

```julia
solve(
    prob::Union{PIDEProblem, ParabolicPDEProblem},
    alg::DeepSplitting,
    dt;
    batch_size,
    abstol,
    verbose,
    maxiters,
    use_cuda,
    cuda_device,
    verbose_rate
) -> PIDESolution{_A, _B, _C, Vector{_A1}, Vector{Any}, Nothing} where {_A, _B, _C, _A1}
```
Returns a PIDESolution object.
Arguments
- `maxiters`: number of iterations per time step. Can be a tuple, where `maxiters[1]` is used for the training of the neural network used in the first time step (which can be long) and `maxiters[2]` is used for the rest of the time steps.
- `batch_size`: the batch size.
- `abstol`: threshold for the objective function under which the training is stopped.
- `verbose`: print training information.
- `verbose_rate`: rate for printing training information (every `verbose_rate` iterations).
- `use_cuda`: set to `true` to use CUDA.
- `cuda_device`: integer, to set the CUDA device used in the training, if `use_cuda == true`.
The DeepSplitting algorithm reformulates the PDE as a stochastic learning problem.
The algorithm relies on two main ideas:
- The approximation of the solution $u$ by a parametric function ${\bf u}^\theta$. This function is generally chosen as a (Feedforward) Neural Network, as it is a universal approximator.
- The training of ${\bf u}^\theta$ by simulated stochastic trajectories of particles, through the link between linear PDEs and the expected trajectory of associated Stochastic Differential Equations (SDEs), explicitly stated by the Feynman-Kac formula.
The general idea 💡
Consider the PDE
\[\partial_t u(t,x) = \mu(t, x) \nabla_x u(t,x) + \frac{1}{2} \sigma^2(t, x) \Delta_x u(t,x) + f(x, u(t,x)) \tag{1}\]
with initial conditions $u(0, x) = g(x)$, where $u \colon [0,T] \times \mathbb{R}^d \to \mathbb{R}$.
Local Feynman-Kac formula
DeepSplitting solves the PDE iteratively over small time intervals by using an approximate Feynman-Kac representation locally.
More specifically, considering a small time step $dt = t_{n+1} - t_n$ one has that
\[u(t_{n+1}, X_{T - t_{n+1}}) \approx \mathbb{E} \left[ f(t_n, X_{T - t_{n}}, u(t_{n},X_{T - t_{n}}))(t_{n+1} - t_n) + u(t_{n}, X_{T - t_{n}}) | X_{T - t_{n+1}}\right]. \tag{3}\]
One can therefore use Monte Carlo integrations to approximate the expectations
\[u(t_{n+1}, X_{T - t_{n+1}}) \approx \frac{1}{\text{batch\_size}}\sum_{j=1}^{\text{batch\_size}} \left[ u(t_{n}, X_{T - t_{n}}^{(j)}) + (t_{n+1} - t_n)\sum_{k=1}^{K} \big[ f(t_n, X_{T - t_{n}}^{(j)}, u(t_{n},X_{T - t_{n}}^{(j)})) \big] \right]\]
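To make this estimator concrete, here is a minimal sketch of the Monte Carlo target for a local nonlinearity; the function name and the array layout (one particle per column) are illustrative and not part of the HighDimPDE.jl API.

```julia
# Sketch of the right-hand side above for a local nonlinearity f(t, x, u).
# `u_n` maps a (d, batch_size) matrix of particle positions to a (1, batch_size)
# row of values u(t_n, ⋅); all names here are illustrative.
function mc_target(u_n, f, X, t_n, t_np1)
    u_prev = u_n(X)                                      # u(t_n, X_{T - t_n}^{(j)})
    return u_prev .+ (t_np1 - t_n) .* f(t_n, X, u_prev)  # explicit Euler step of the nonlinearity
end
```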
Reformulation as a learning problem
The DeepSplitting algorithm approximates $u(t_{n+1}, x)$ by a parametric function ${\bf u}^\theta_{n+1}(x)$. It is advised to let this function be a neural network ${\bf u}_\theta \equiv NN_\theta$, as neural networks are universal approximators.
For each time step $t_n$, the DeepSplitting algorithm:
1. Generates the particle trajectories $X^{x, (j)}$ satisfying the SDE associated with Eq. (1) over the whole interval $[0,T]$.
2. Seeks ${\bf u}_{n+1}^{\theta}$ by minimizing the loss function
\[L(\theta) = ||{\bf u}^\theta_{n+1}(X_{T - t_{n+1}}) - \left[ f(t_n, X_{T - t_{n}}, {\bf u}_{n}(X_{T - t_{n}}))(t_{n+1} - t_{n}) + {\bf u}_{n}(X_{T - t_{n}}) \right] ||\]
This way, the PDE approximation problem is decomposed into a sequence of separate learning problems. In HighDimPDE.jl the right parameter combination $\theta$ is found by iteratively minimizing $L$ using stochastic gradient descent.
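As an illustration of this minimization, the following is a minimal sketch of one stochastic gradient step with Flux; it is not the actual HighDimPDE.jl internals, and `u_next`, `target`, and `train_step!` are names introduced here for the example only.

```julia
using Flux

# One gradient step: fit the network to the Monte Carlo target with a
# mean-squared-error loss, mirroring L(θ) above.
function train_step!(u_next, opt_state, X, target)
    loss, grads = Flux.withgradient(u_next) do nn
        Flux.mse(nn(X), target)
    end
    Flux.update!(opt_state, u_next, grads[1])
    return loss
end

# Typical usage (placeholder values):
# opt_state = Flux.setup(Flux.Adam(1e-2), u_next)
# loss = train_step!(u_next, opt_state, X, target)
```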
To solve with DeepSplitting, one needs to provide `solve` with:
- `dt`
- `batch_size`
- `maxiters`: the number of iterations for minimizing the loss function
- `abstol`: the absolute tolerance for the loss function
- `use_cuda`: recommended if you have an Nvidia GPU
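For instance, assuming `prob` and `alg` have been defined as in the examples above, a call might look like the following sketch (all numerical values are placeholders):

```julia
# Placeholder values; `prob` and `alg` are assumed to be defined already.
sol = solve(prob, alg, 0.1,        # dt
            batch_size = 1000,
            abstol = 1e-4,
            maxiters = 1000,       # can also be a tuple, e.g. (2000, 500)
            use_cuda = false)
```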
Solving point-wise or on a hypercube
Pointwise
DeepSplitting allows obtaining $u(t,x)$ at a single point $x \in \Omega$ with the keyword `x`.
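For example, the ingredients of Eq. (1) might be defined as in the following sketch before building the problem. The argument signatures, `μ(x, p, t)`, `σ(x, p, t)`, `g(x)`, and `f(x, y, v_x, v_y, ∇v_x, ∇v_y, p, t)`, follow the `PIDEProblem` interface documented elsewhere in HighDimPDE.jl, and all numerical values are placeholders.

```julia
using HighDimPDE

d = 10                                # dimension of the problem
x = fill(0f0, d)                      # point where u(t, x) is evaluated
tspan = (0f0, 5f-1)                   # time span
g(x) = exp.(-sum(x .^ 2, dims = 1))   # initial condition
μ(x, p, t) = 0f0                      # drift coefficient
σ(x, p, t) = 1f-1                     # diffusion coefficient
# local nonlinearity f(x, u(t, x)); the remaining arguments of the general
# signature are unused here
f(x, y, v_x, v_y, ∇v_x, ∇v_y, p, t) = max.(v_x, 0f0) .* (1f0 .- max.(v_x, 0f0))
```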
```julia
prob = PIDEProblem(μ, σ, x, tspan, g, f)
```

Hypercube
More generally, one may want to solve Eq. (1) on a $d$-dimensional cube $[a,b]^d$. This is offered by HighDimPDE.jl with the keyword `x0_sample`.
```julia
prob = PIDEProblem(μ, σ, x, tspan, g, f; x0_sample = x0_sample)
```

Internally, this is handled by assigning a random variable as the initial point of the particles, i.e.
\[X_t^\xi = \int_0^t \mu(X_s^\xi)ds + \int_0^t\sigma(X_s^\xi)dB_s + \xi,\]
where $\xi$ is a random variable uniformly distributed over $[a,b]^d$. This way, the neural network is trained on the whole hypercube $[a,b]^d$ instead of at a single point.
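For example, `x0_sample` can be built from one of the sampling methods exported by HighDimPDE.jl (the bounds below are placeholders):

```julia
# Sample the initial point ξ uniformly over the hypercube [-1, 1]^d
x0_sample = UniformSampling(fill(-1f0, d), fill(1f0, d))
```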
Non-local PDEs
DeepSplitting can solve non-local reaction-diffusion equations of the type
\[\partial_t u = \mu(x) \nabla_x u + \frac{1}{2} \sigma^2(x) \Delta u + \int_{\Omega}f(x,y, u(t,x), u(t,y))dy\]
The non-local term is handled by Monte Carlo integration:
\[u(t_{n+1}, X_{T - t_{n+1}}) \approx \frac{1}{\text{batch\_size}}\sum_{j=1}^{\text{batch\_size}} \left[ u(t_{n}, X_{T - t_{n}}^{(j)}) + \frac{(t_{n+1} - t_n)}{K}\sum_{k=1}^{K} \big[ f(t_n, X_{T - t_{n}}^{(j)}, Y_{X_{T - t_{n}}^{(j)}}^{(k)}, u(t_{n},X_{T - t_{n}}^{(j)}), u(t_{n},Y_{X_{T - t_{n}}^{(j)}}^{(k)})) \big] \right]\]
In practice, if you have a non-local model, you need to provide the sampling method and the number $K$ of Monte Carlo integrations through the keywords `mc_sample` and `K`.

```julia
alg = DeepSplitting(nn, opt = opt, mc_sample = mc_sample, K = 1)
```

`mc_sample` can be either `UniformSampling(a, b)` or `NormalSampling(σ_sampling, shifted)`.
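As a concrete illustration, here is a sketch of a non-local (Fisher-KPP-type) setup; it reuses `nn` from the DeepSplitting example above, the `f` signature follows the `PIDEProblem` interface mentioned earlier, and all values are illustrative rather than prescriptive.

```julia
# Non-local integrand f(x, y, u(t,x), u(t,y)); the gradient arguments are unused here
f(x, y, v_x, v_y, ∇v_x, ∇v_y, p, t) = max.(v_x, 0f0) .* (1f0 .- max.(v_y, 0f0))

# Sample the integration variable y from a normal distribution with standard
# deviation 0.1, shifted by the current particle position (shifted = true)
mc_sample = NormalSampling(0.1f0, true)

alg = DeepSplitting(nn, K = 10, opt = Flux.Optimise.Adam(0.01), mc_sample = mc_sample)
```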