The DeepSplitting algorithm

HighDimPDE.DeepSplitting - Type
DeepSplitting(nn, K=1, opt = ADAM(0.01), λs = nothing, mc_sample = NoSampling())

Deep splitting algorithm.

Arguments

  • nn: a Flux.Chain, or more generally a functor.
  • K: the number of Monte Carlo integrations.
  • opt: optimiser to be used. By default, Flux.ADAM(0.01).
  • λs: the learning rates, used sequentially. Defaults to a single value taken from opt.
  • mc_sample::MCSampling : sampling method for the Monte Carlo integration of the nonlocal term. Can be UniformSampling(a,b), NormalSampling(σ_sampling, shifted), or NoSampling() (default).

Example

d = 10 # dimension of the sample
hls = d + 50 # hidden layer size

# Neural network used by the scheme
nn = Flux.Chain(Dense(d, hls, tanh),
                Dense(hls,hls,tanh),
                Dense(hls, 1, x->x^2))

alg = DeepSplitting(nn, K=10, opt = ADAM(), λs = [5e-3,1e-3],
                    mc_sample = UniformSampling(zeros(d), ones(d)) )
HighDimPDE.solve - Method
solve(prob::PIDEProblem,
    alg::DeepSplitting,
    dt;
    batch_size = 1,
    abstol = 1f-6,
    verbose = false,
    maxiters = 300,
    use_cuda = false,
    cuda_device = nothing,
    verbose_rate = 100)

Returns a PIDESolution object.

Arguments

  • maxiters: number of iterations per time step. Can be a tuple, where maxiters[1] is used for the training of the neural network used in the first time step (which can be long) and maxiters[2] is used for the rest of the time steps.
  • batch_size : the batch size.
  • abstol : threshold for the objective function under which the training is stopped.
  • verbose : print training information.
  • verbose_rate : rate for printing training information (every verbose_rate iterations).
  • use_cuda : set to true to use CUDA.
  • cuda_device : integer, to set the CUDA device used in the training, if use_cuda == true.
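
For instance, assuming a problem prob and an algorithm alg have been defined as in the example above, a call to solve could look like the following sketch (the numerical values are purely illustrative and should be tuned to the problem at hand):

# Illustrative values only: tune dt, batch_size, maxiters and abstol to your problem.
sol = solve(prob, alg, 0.1;
            batch_size = 1000,
            abstol = 1f-4,
            maxiters = 1000,
            use_cuda = false,
            verbose = true)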

The DeepSplitting algorithm reformulates the PDE as a stochastic learning problem.

The algorithm relies on two main ideas:

  • the approximation of the solution $u$ by a parametric function ${\bf u}^\theta$. This function is generally chosen as a (feedforward) neural network, as it is a universal approximator.

  • the training of ${\bf u}^\theta$ on simulated stochastic trajectories of particles, through the link between linear PDEs and the expected trajectories of associated Stochastic Differential Equations (SDEs), explicitly stated by the Feynman-Kac formula.

The general idea 💡

Consider the PDE

\[\partial_t u(t,x) = \mu(t, x) \nabla_x u(t,x) + \frac{1}{2} \sigma^2(t, x) \Delta_x u(t,x) + f(x, u(t,x)) \tag{1}\]

with initial conditions $u(0, x) = g(x)$, where $u \colon \mathbb{R}^d \to \mathbb{R}$.

Local Feynman-Kac formula

DeepSplitting solves the PDE iteratively over small time intervals by using an approximate Feynman-Kac representation locally.

More specifically, let $X^x$ be the stochastic process associated with Eq. (1),

\[X_t^x = \int_0^t \mu(X_s^x)ds + \int_0^t\sigma(X_s^x)dB_s + x. \tag{2}\]

Considering a small time step $dt = t_{n+1} - t_n$, one has that

\[u(t_{n+1}, X_{T - t_{n+1}}) \approx \mathbb{E} \left[ f(t_n, X_{T - t_{n}}, u(t_{n},X_{T - t_{n}}))(t_{n+1} - t_n) + u(t_{n}, X_{T - t_{n}}) \,|\, X_{T - t_{n+1}}\right]. \tag{3}\]

One can therefore use Monte Carlo integrations to approximate the expectations

\[u(t_{n+1}, X_{T - t_{n+1}}) \approx \frac{1}{\text{batch\_size}}\sum_{j=1}^{\text{batch\_size}} \left[ u(t_{n}, X_{T - t_{n}}^{(j)}) + (t_{n+1} - t_n)\sum_{k=1}^{K} \big[ f(t_n, X_{T - t_{n}}^{(j)}, u(t_{n},X_{T - t_{n}}^{(j)})) \big] \right]\]
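
As an illustration, the following plain-Julia sketch (not the package internals) forms this Monte Carlo estimate for a single time step, assuming constant drift and diffusion coefficients, a single Euler-Maruyama step for the particle trajectories, and a placeholder u_prev standing for the approximation of $u(t_n, \cdot)$:

# Illustrative sketch of the Monte Carlo estimate in Eq. (3); all names are hypothetical.
d          = 10                   # dimension
batch_size = 1000
dt         = 0.1
μ, σ       = 0.0, 0.1             # hypothetical constant drift and diffusion coefficients
f(x, u)    = u * (1 - u)          # hypothetical nonlinearity
u_prev(x)  = exp(-sum(abs2, x))   # placeholder for u(t_n, ⋅)

x0 = zeros(d)
targets = map(1:batch_size) do j
    X = x0 .+ μ * dt .+ σ * sqrt(dt) .* randn(d)  # one Euler-Maruyama step of X^{(j)}
    u = u_prev(X)
    u + dt * f(X, u)              # one realisation of the bracketed term
end
estimate = sum(targets) / batch_size              # ≈ u(t_{n+1}, x0)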

Reformulation as a learning problem

The DeepSplitting algorithm approximates $u(t_{n+1}, x)$ by a parametric function ${\bf u}^\theta_{n+1}(x)$. In practice, this function is chosen to be a neural network ${\bf u}_\theta \equiv NN_\theta$, as neural networks are universal approximators.

For each time step $t_n$, the DeepSplitting algorithm

  1. Generates the particle trajectories $X^{x, (j)}$ satisfying Eq. (2) over the whole interval $[0,T]$.

  2. Seeks ${\bf u}_{n+1}^{\theta}$ by minimising the loss function

\[L(\theta) = ||{\bf u}^\theta_{n+1}(X_{T - t_{n+1}}) - \left[ f(t_n, X_{T - t_{n}}, {\bf u}_{n}(X_{T - t_{n}}))(t_{n+1} - t_{n}) + {\bf u}_{n}(X_{T - t_{n}}) \right] ||\]

This way the PDE approximation problem is decomposed into a sequence of separate learning problems. In HighDimPDE.jl the right parameter combination $\theta$ is found by iteratively minimizing $L$ using stochastic gradient descent.
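
A minimal sketch of this minimisation step with Flux is given below, assuming X is a d × batch_size matrix of particle positions and targets is a 1 × batch_size matrix of right-hand-side values of Eq. (3), built e.g. as in the previous sketch. This is illustrative only, not the internal training loop of HighDimPDE.jl:

using Flux

d, hls = 10, 60
nn = Flux.Chain(Dense(d, hls, tanh), Dense(hls, hls, tanh), Dense(hls, 1))

X       = randn(Float32, d, 1000)        # hypothetical particle positions
targets = rand(Float32, 1, 1000)         # hypothetical right-hand side of Eq. (3)

opt = Flux.ADAM(1e-3)
ps  = Flux.params(nn)
for iter in 1:1000
    gs = Flux.gradient(() -> Flux.mse(nn(X), targets), ps)  # L(θ) as a mean-squared error
    Flux.Optimise.update!(opt, ps, gs)
    Flux.mse(nn(X), targets) < 1f-6 && break                # abstol-like stopping rule
end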

Tip

When solving with DeepSplitting, one needs to provide the following arguments to solve:

  • dt
  • batch_size
  • maxiters: the number of iterations for minimising the loss function
  • abstol: the absolute tolerance for the loss function
  • use_cuda: recommended if you have an Nvidia GPU.

Solving point-wise or on a hypercube

Pointwise

DeepSplitting allows one to obtain $u(t,x)$ at a single point $x \in \Omega$ by passing the point as the argument x.

prob = PIDEProblem(g, f, μ, σ, x, tspan)

Hypercube

More generally, one may want to solve Eq. (1) over a $d$-dimensional hypercube $[a,b]^d$. This is offered by HighDimPDE.jl with the keyword x0_sample.

prob = PIDEProblem(g, f, μ, σ, x, tspan, x0_sample = x0_sample)

Internally, this is handled by assigning a random variable as the initial point of the particles, i.e.

\[X_t^\xi = \int_0^t \mu(X_s^\xi)ds + \int_0^t\sigma(X_s^\xi)dB_s + \xi,\]

where $\xi$ is a random variable uniformly distributed over $[a,b]^d$. This way, the neural network is trained on the whole hypercube $[a,b]^d$ instead of at a single point.
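
For intuition, drawing such starting points could look like the following plain-Julia sketch (illustrative only, not how HighDimPDE.jl implements x0_sample; the bounds are hypothetical):

d, batch_size = 10, 1000
a, b = fill(-0.5, d), fill(0.5, d)        # hypothetical bounds of the hypercube
ξ = a .+ (b .- a) .* rand(d, batch_size)  # one uniformly sampled starting point per particle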

Nonlocal PDEs

DeepSplitting can solve nonlocal reaction-diffusion equations of the type

\[\partial_t u = \mu(x) \nabla_x u + \frac{1}{2} \sigma^2(x) \Delta u + \int_{\Omega}f(x,y, u(t,x), u(t,y))dy\]

The nonlocal term is handled by Monte Carlo integration.

\[u(t_{n+1}, X_{T - t_{n+1}}) \approx \frac{1}{\text{batch\_size}}\sum_{j=1}^{\text{batch\_size}} \left[ u(t_{n}, X_{T - t_{n}}^{(j)}) + \frac{(t_{n+1} - t_n)}{K}\sum_{k=1}^{K} \big[ f(t_n, X_{T - t_{n}}^{(j)}, Y_{X_{T - t_{n}}^{(j)}}^{(k)}, u(t_{n},X_{T - t_{n}}^{(j)}), u(t_{n},Y_{X_{T - t_{n}}^{(j)}}^{(k)})) \big] \right]\]
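
To illustrate the inner sum over $K$ samples, here is a plain-Julia sketch for a single particle (all names, including the uniform sampling bounds and the integrand f, are hypothetical; this is not the package implementation):

d, K, dt = 10, 10, 0.1
a, b = fill(-0.5, d), fill(0.5, d)                 # hypothetical sampling bounds for y
u_prev(x) = exp(-sum(abs2, x))                     # placeholder for u(t_n, ⋅)
f(x, y, vx, vy) = vx * (1 - vy)                    # hypothetical nonlocal integrand

X  = zeros(d)                                      # one particle position X_{T - t_n}^{(j)}
ys = [a .+ (b .- a) .* rand(d) for _ in 1:K]       # K Monte Carlo samples of y
nonlocal = dt / K * sum(f(X, y, u_prev(X), u_prev(y)) for y in ys)
target   = u_prev(X) + nonlocal                    # one term of the outer batch average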

Tip

In practice, if you have a nonlocal model, you need to provide the sampling method and the number $K$ of Monte Carlo samples through the keywords mc_sample and K.

alg = DeepSplitting(nn, opt = opt, mc_sample = mc_sample, K = 1)

mc_sample can be either UniformSampling(a, b) or NormalSampling(σ_sampling, shifted).

References

  • Boussange, V., Becker, S., Jentzen, A., Kuckuck, B., Pellissier, L., Deep learning approximations for non-local nonlinear PDEs with Neumann boundary conditions. arXiv (2022)
  • Beck, C., Becker, S., Cheridito, P., Jentzen, A., Neufeld, A., Deep splitting method for parabolic PDEs. arXiv (2019)
  • Han, J., Jentzen, A., E, W., Solving high-dimensional partial differential equations using deep learning. arXiv (2018)