# The DeepSplitting algorithm

`HighDimPDE.DeepSplitting` — Type

```julia
DeepSplitting(nn, K = 1, opt = ADAM(0.01), λs = nothing, mc_sample = NoSampling())
```

Deep splitting algorithm.

Arguments

• `nn`: a `Flux.Chain`, or more generally a functor.
• `K`: the number of Monte Carlo integrations.
• `opt`: the optimiser to be used. By default, `Flux.ADAM(0.01)`.
• `λs`: the learning rates, used sequentially. Defaults to a single value taken from `opt`.
• `mc_sample::MCSampling`: sampling method for the Monte Carlo integration of the nonlocal term. Can be `UniformSampling(a, b)`, `NormalSampling(σ_sampling, shifted)`, or `NoSampling()` (the default).

Example

```julia
d = 10       # dimension of the problem
hls = d + 50 # hidden layer size

# Neural network used by the scheme
nn = Flux.Chain(Dense(d, hls, tanh),
                Dense(hls, hls, tanh),
                Dense(hls, 1, x -> x^2))

alg = DeepSplitting(nn, K = 10, opt = ADAM(), λs = [5e-3, 1e-3],
                    mc_sample = UniformSampling(zeros(d), ones(d)))
```
`HighDimPDE.solve` — Method
```julia
solve(prob::PIDEProblem,
      alg::DeepSplitting,
      dt;
      batch_size = 1,
      abstol = 1f-6,
      verbose = false,
      maxiters = 300,
      use_cuda = false,
      cuda_device = nothing,
      verbose_rate = 100)
```

Returns a PIDESolution object.

Arguments

• `maxiters`: the number of iterations per time step. Can be a tuple, where `maxiters[1]` is used for training the neural network at the first time step (which can take long) and `maxiters[2]` is used for the remaining time steps (see the example below).
• `batch_size`: the batch size.
• `abstol`: threshold for the objective function under which the training is stopped.
• `verbose`: print training information.
• `verbose_rate`: rate at which training information is printed (every `verbose_rate` iterations).
• `use_cuda`: set to `true` to use CUDA.
• `cuda_device`: integer setting the CUDA device used for training, when `use_cuda == true`.
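
For instance, assuming a `prob::PIDEProblem` and the `alg` constructed in the example above, a call might look like the following sketch (the numeric values are illustrative only):

```julia
# maxiters as a tuple: 2000 iterations for the first time step, 500 for the rest
sol = solve(prob, alg, 0.1,
            batch_size = 1000,
            abstol = 1f-5,
            maxiters = (2000, 500),
            use_cuda = false)
```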

The DeepSplitting algorithm reformulates the PDE as a stochastic learning problem.

The algorithm relies on two main ideas:

• the approximation of the solution $u$ by a parametric function ${\bf u}^\theta$. This function is generally chosen as a (feedforward) neural network, as it is a universal approximator.

• the training of ${\bf u}^\theta$ on simulated stochastic trajectories of particles, exploiting the link between linear PDEs and the expected trajectories of associated Stochastic Differential Equations (SDEs), stated explicitly by the Feynman-Kac formula.

## The general idea 💡

Consider the PDE

$$\partial_t u(t,x) = \mu(t, x) \nabla_x u(t,x) + \frac{1}{2} \sigma^2(t, x) \Delta_x u(t,x) + f(x, u(t,x)) \tag{1}$$

with initial condition $u(0, x) = g(x)$, where $u \colon [0,T] \times \mathbb{R}^d \to \mathbb{R}$.

### Local Feynman Kac formula

DeepSplitting solves the PDE iteratively over small time intervals by using an approximate Feynman-Kac representation locally. Here, $X_t^x$ denotes the stochastic process associated with Eq. (1), i.e. the solution of the SDE

$$X_t^x = x + \int_0^t \mu(s, X_s^x)ds + \int_0^t \sigma(s, X_s^x)dB_s. \tag{2}$$

More specifically, considering a small time step $dt = t_{n+1} - t_n$, one has

$$u(t_{n+1}, X_{T - t_{n+1}}) \approx \mathbb{E} \left[ f(t_n, X_{T - t_{n}}, u(t_{n},X_{T - t_{n}}))(t_{n+1} - t_n) + u(t_{n}, X_{T - t_{n}}) \,\middle|\, X_{T - t_{n+1}}\right]. \tag{3}$$

One can therefore use Monte Carlo integration to approximate the expectation:

$$u(t_{n+1}, X_{T - t_{n+1}}) \approx \frac{1}{\text{batch\_size}}\sum_{j=1}^{\text{batch\_size}} \left[ u(t_{n}, X_{T - t_{n}}^{(j)}) + \frac{(t_{n+1} - t_n)}{K}\sum_{k=1}^{K} f\big(t_n, X_{T - t_{n}}^{(j)}, u(t_{n},X_{T - t_{n}}^{(j)})\big) \right]$$
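
To make this concrete, here is a minimal sketch (not the package's implementation) of how such a per-particle target could be assembled, assuming scalar $\mu$ and $\sigma$ and a hypothetical local nonlinearity `f_loc(t, x, v)`:

```julia
# Sketch: Monte Carlo target of Eq. (3) for one time step and a batch of particles.
# `u_n`   : current approximation x -> u(t_n, x), mapping a d × batch matrix to a 1 × batch row
# `f_loc` : hypothetical local nonlinearity with signature f_loc(t, x, v)
function mc_target(u_n, f_loc, μ, σ, X, t_n, dt)
    d, batch_size = size(X)
    # One Euler-Maruyama step: simulate X_{T - t_n} from X_{T - t_{n+1}}
    X_next = X .+ μ .* dt .+ σ .* sqrt(dt) .* randn(Float32, d, batch_size)
    v = u_n(X_next)                      # u(t_n, X_{T - t_n}^{(j)})
    # Per-particle target: u(t_n, X) + (t_{n+1} - t_n) f(t_n, X, u(t_n, X))
    return v .+ dt .* f_loc(t_n, X_next, v)
end
```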

### Reformulation as a learning problem

The DeepSplitting algorithm approximates $u(t_{n+1}, x)$ by a parametric function ${\bf u}^\theta_{n+1}(x)$. It is advised to choose a neural network for this function, ${\bf u}_\theta \equiv NN_\theta$, as neural networks are universal approximators.

For each time step $t_n$, the DeepSplitting algorithm

1. Generates the particle trajectories $X^{x, (j)}$ satisfying Eq. (2) over the whole interval $[0,T]$.

2. Seeks ${\bf u}_{n+1}^{\theta}$ by minimising the loss function

$$L(\theta) = \left\| {\bf u}^\theta_{n+1}(X_{T - t_{n+1}}) - \left[ f\big(t_n, X_{T - t_{n}}, {\bf u}_{n}(X_{T - t_{n}})\big)(t_{n+1} - t_{n}) + {\bf u}_{n}(X_{T - t_{n}}) \right] \right\|$$

This way, the PDE approximation problem is decomposed into a sequence of separate learning problems. In HighDimPDE.jl, the right parameter combination $\theta$ is found by iteratively minimising $L$ using stochastic gradient descent.
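
Schematically, one of these learning problems could look like the following training loop (a simplified sketch, not HighDimPDE.jl's internal implementation; `mc_target` is the hypothetical helper sketched above):

```julia
using Flux

# Sketch: fit the network nn ≡ u_{n+1}^θ to the Monte Carlo target at time step t_n.
function fit_timestep!(nn, opt, u_n, f_loc, μ, σ, t_n, dt, d;
                       batch_size = 1000, maxiters = 500, abstol = 1f-6)
    ps = Flux.params(nn)
    for _ in 1:maxiters
        X = randn(Float32, d, batch_size)           # particles X_{T - t_{n+1}}
        y = mc_target(u_n, f_loc, μ, σ, X, t_n, dt) # target of Eq. (3)
        gs = Flux.gradient(() -> Flux.mse(nn(X), y), ps)
        Flux.Optimise.update!(opt, ps, gs)          # one SGD-type step on L(θ)
        Flux.mse(nn(X), y) < abstol && break        # stop below the tolerance
    end
    return nn
end
```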

Tip

To solve with DeepSplitting, one needs to provide to `solve` (see the sketch after this list):

• `dt`
• `batch_size`
• `maxiters`: the number of iterations for minimising the loss function
• `abstol`: the absolute tolerance for the loss function
• `use_cuda`: recommended if you have an Nvidia GPU
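
Putting this together, a minimal end-to-end sketch might look as follows; the exact argument signatures expected for `g`, `f`, `μ`, and `σ` are assumptions here and may differ between package versions:

```julia
using HighDimPDE, Flux

d = 10                               # dimension of the problem
x0 = fill(0f0, d)                    # point where the solution is approximated
tspan = (0f0, 0.5f0)

g(x) = exp.(-sum(x .^ 2, dims = 1))  # initial condition
μ(x, p, t) = 0f0                     # drift coefficient (assumed signature)
σ(x, p, t) = 0.1f0                   # diffusion coefficient (assumed signature)
f(x, y, v_x, v_y, p, t) = v_x .* (1f0 .- v_x)  # Fisher-KPP-type nonlinearity (assumed signature)

hls = d + 50
nn = Flux.Chain(Dense(d, hls, tanh),
                Dense(hls, hls, tanh),
                Dense(hls, 1, x -> x^2))

prob = PIDEProblem(g, f, μ, σ, x0, tspan)
alg = DeepSplitting(nn, opt = ADAM(0.01))
sol = solve(prob, alg, 0.1f0,
            batch_size = 1000, abstol = 1f-5, maxiters = 400)
```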

## Solving point-wise or on a hypercube

### Pointwise

DeepSplitting allows obtaining $u(t,x)$ at a single point $x \in \Omega$ via the argument `x`:

```julia
prob = PIDEProblem(g, f, μ, σ, x, tspan)
```

### Hypercube

More generally, one may want to solve Eq. (1) on the $d$-dimensional cube $[a,b]^d$. This is offered by HighDimPDE.jl with the keyword `x0_sample`:

```julia
prob = PIDEProblem(g, f, μ, σ, x, tspan, x0_sample = x0_sample)
```

Internally, this is handled by assigning a random variable as the initial point of the particles, i.e.

$$X_t^\xi = \xi + \int_0^t \mu(s, X_s^\xi)ds + \int_0^t\sigma(s, X_s^\xi)dB_s,$$

where $\xi$ is a random variable uniformly distributed over $[a,b]^d$. This way, the neural network is trained on the whole hypercube $[a,b]^d$ instead of at a single point.
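
For instance, to train over the hypercube $[-1/2, 1/2]^d$, reusing the problem ingredients from the sketch above:

```julia
x0_sample = UniformSampling(fill(-0.5f0, d), fill(0.5f0, d))  # ξ ~ U([-1/2, 1/2]^d)
prob = PIDEProblem(g, f, μ, σ, x0, tspan, x0_sample = x0_sample)
```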

## Nonlocal PDEs

DeepSplitting can solve nonlocal reaction-diffusion equations of the type

$$\partial_t u = \mu(x) \nabla_x u + \frac{1}{2} \sigma^2(x) \Delta u + \int_{\Omega}f(x, y, u(t,x), u(t,y))dy$$

The nonlocal term is handled by Monte Carlo integration:

$$u(t_{n+1}, X_{T - t_{n+1}}) \approx \frac{1}{\text{batch\_size}}\sum_{j=1}^{\text{batch\_size}} \left[ u(t_{n}, X_{T - t_{n}}^{(j)}) + \frac{(t_{n+1} - t_n)}{K}\sum_{k=1}^{K} f\big(t_n, X_{T - t_{n}}^{(j)}, Y_{X_{T - t_{n}}^{(j)}}^{(k)}, u(t_{n},X_{T - t_{n}}^{(j)}), u(t_{n},Y_{X_{T - t_{n}}^{(j)}}^{(k)})\big) \right]$$

Tip

In practice, if you have a nonlocal model, you need to provide the sampling method and the number $K$ of Monte Carlo integration points through the keywords `mc_sample` and `K`:

```julia
alg = DeepSplitting(nn, opt = opt, mc_sample = mc_sample, K = 1)
```

`mc_sample` can be either `UniformSampling(a, b)` or `NormalSampling(σ_sampling, shifted)`.
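
For example, a nonlocal competition term sampled uniformly over $[-1/2, 1/2]^d$ might be set up as follows (the signature of `f`, in particular the arguments `y` and `v_y`, is an assumption based on the nonlocal formulation above):

```julia
# Nonlocal nonlinearity f(x, y, u(t,x), u(t,y)) = u(t,x)(1 - u(t,y))
f(x, y, v_x, v_y, p, t) = max.(v_x, 0f0) .* (1f0 .- max.(v_y, 0f0))

mc_sample = UniformSampling(fill(-0.5f0, d), fill(0.5f0, d))
alg = DeepSplitting(nn, opt = ADAM(0.01), mc_sample = mc_sample, K = 10)
```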

## References

• Boussange, V., Becker, S., Jentzen, A., Kuckuck, B., Pellissier, L., Deep learning approximations for non-local nonlinear PDEs with Neumann boundary conditions. arXiv (2022)
• Beck, C., Becker, S., Cheridito, P., Jentzen, A., Neufeld, A., Deep splitting method for parabolic PDEs. arXiv (2019)
• Han, J., Jentzen, A., E, W., Solving high-dimensional partial differential equations using deep learning. arXiv (2018)