# The DeepSplitting algorithm
`HighDimPDE.DeepSplitting` — Type

```julia
DeepSplitting(nn, K=1, opt = ADAM(0.01), λs = nothing, mc_sample = NoSampling())
```

Deep splitting algorithm.
**Arguments**

- `nn`: a Flux.Chain, or more generally a functor.
- `K`: the number of Monte Carlo integrations.
- `opt`: optimiser to be used. By default, `Flux.ADAM(0.01)`.
- `λs`: the learning rates, used sequentially. Defaults to a single value taken from `opt`.
- `mc_sample::MCSampling`: sampling method for Monte Carlo integrations of the non-local term. Can be `UniformSampling(a, b)`, `NormalSampling(σ_sampling, shifted)`, or `NoSampling` (by default).
**Example**

```julia
using HighDimPDE, Flux

d = 10       # size of the sample
hls = d + 50 # hidden layer size
# Neural network used by the scheme
nn = Flux.Chain(Dense(d, hls, tanh),
                Dense(hls, hls, tanh),
                Dense(hls, 1, x -> x^2))
alg = DeepSplitting(nn, K = 10, opt = ADAM(), λs = [5e-3, 1e-3],
                    mc_sample = UniformSampling(zeros(d), ones(d)))
```
`HighDimPDE.solve` — Method

```julia
solve(prob::PIDEProblem,
      alg::DeepSplitting,
      dt;
      batch_size = 1,
      abstol = 1f-6,
      verbose = false,
      maxiters = 300,
      use_cuda = false,
      cuda_device = nothing,
      verbose_rate = 100)
```

Returns a `PIDESolution` object.
**Arguments**

- `maxiters`: number of iterations per time step. Can be a tuple, where `maxiters[1]` is used for the training of the neural network used in the first time step (which can be long) and `maxiters[2]` is used for the rest of the time steps.
- `batch_size`: the batch size.
- `abstol`: threshold for the objective function under which the training is stopped.
- `verbose`: print training information.
- `verbose_rate`: rate for printing training information (every `verbose_rate` iterations).
- `use_cuda`: set to `true` to use CUDA.
- `cuda_device`: integer, to set the CUDA device used in the training, if `use_cuda == true`.
The `DeepSplitting` algorithm reformulates the PDE as a stochastic learning problem.

The algorithm relies on two main ideas:

- the approximation of the solution $u$ by a parametric function ${\bf u}^\theta$. This function is generally chosen as a (Feedforward) Neural Network, as it is a universal approximator.
- the training of ${\bf u}^\theta$ by simulated stochastic trajectories of particles, through the link between linear PDEs and the expected trajectory of associated Stochastic Differential Equations (SDEs), explicitly stated by the Feynman-Kac formula.
## The general idea 💡
Consider the PDE
\[\partial_t u(t,x) = \mu(t, x) \nabla_x u(t,x) + \frac{1}{2} \sigma^2(t, x) \Delta_x u(t,x) + f(x, u(t,x)) \tag{1}\]
with initial conditions $u(0, x) = g(x)$, where $u \colon [0,T] \times \R^d \to \R$.
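The corresponding particle trajectories $X^x_t$ solve the SDE

\[X_t^x = \int_0^t \mu(X_s^x)ds + \int_0^t\sigma(X_s^x)dB_s + x, \tag{2}\]

which is the same process as the shifted version used in the hypercube section below, here started from the deterministic initial point $x$.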
## Local Feynman-Kac formula
`DeepSplitting` solves the PDE iteratively over small time intervals by using an approximate Feynman-Kac representation locally. More specifically, considering a small time step $dt = t_{n+1} - t_n$, one has that
\[u(t_{n+1}, X_{T - t_{n+1}}) \approx \mathbb{E} \left[ f(t, X_{T - t_{n}}, u(t_{n},X_{T - t_{n}}))(t_{n+1} - t_n) + u(t_{n}, X_{T - t_{n}}) | X_{T - t_{n+1}}\right] \tag{3}.\]
One can therefore use Monte Carlo integrations to approximate the expectations
\[u(t_{n+1}, X_{T - t_{n+1}}) \approx \frac{1}{\text{batch\_size}}\sum_{j=1}^{\text{batch\_size}} \left[ u(t_{n}, X_{T - t_{n}}^{(j)}) + \frac{(t_{n+1} - t_n)}{K}\sum_{k=1}^{K} \big[ f(t_n, X_{T - t_{n}}^{(j)}, u(t_{n},X_{T - t_{n}}^{(j)})) \big] \right]\]
## Reformulation as a learning problem
The `DeepSplitting` algorithm approximates $u(t_{n+1}, x)$ by a parametric function ${\bf u}^\theta_{n+1}(x)$. It is advised to let this function be a neural network ${\bf u}_\theta \equiv NN_\theta$, as neural networks are universal approximators.
For each time step $t_n$, the `DeepSplitting` algorithm

1. Generates the particle trajectories $X^{x, (j)}$ satisfying Eq. (2) over the whole interval $[0,T]$.
2. Seeks ${\bf u}_{n+1}^{\theta}$ by minimising the loss function

\[L(\theta) = ||{\bf u}^\theta_{n+1}(X_{T - t_{n+1}}) - \left[ f(t_n, X_{T - t_{n}}, {\bf u}_{n}(X_{T - t_{n}}))(t_{n+1} - t_{n}) + {\bf u}_{n}(X_{T - t_{n}}) \right] ||\]
This way, the PDE approximation problem is decomposed into a sequence of separate learning problems. In `HighDimPDE.jl`, the right parameter combination $\theta$ is found by iteratively minimizing $L$ using stochastic gradient descent.
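As an illustration, here is a minimal sketch (not the package's internal code) of the quantity minimised at the step $t_n \to t_{n+1}$; the names `nn_next`, `nn_prev`, `X_next`, `X_prev` and `f_term` are placeholders introduced for this example only:

```julia
using Flux

# Schematic sketch of L(θ) for the step t_n → t_{n+1}:
# - nn_next plays the role of u^θ_{n+1} (the network being trained),
# - nn_prev is the network obtained at the previous time step (u_n),
# - X_next and X_prev are d × batch_size matrices of particle positions
#   X_{T-t_{n+1}} and X_{T-t_n},
# - f_term(X) stands for the Monte Carlo estimate of f(t_n, X, u_n(X)).
loss(nn_next, nn_prev, X_next, X_prev, Δt, f_term) =
    Flux.mse(nn_next(X_next), nn_prev(X_prev) .+ Δt .* f_term(X_prev))
```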
To solve with `DeepSplitting`, one needs to provide to `solve`:

- `dt`: the time step,
- `batch_size`: the batch size,
- `maxiters`: the number of iterations for minimising the loss function,
- `abstol`: the absolute tolerance for the loss function,
- `use_cuda`: recommended if you have an Nvidia GPU.
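For instance, assuming `prob` and `alg` have been defined as above, a call could look like the following sketch (the numerical values are purely illustrative):

```julia
# dt is the third positional argument; the keywords match the `solve` docstring above.
sol = solve(prob, alg, 0.1,
            batch_size = 1000,
            abstol = 1e-3,
            maxiters = 500,
            use_cuda = false,
            verbose = true)
```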
## Solving point-wise or on a hypercube
### Pointwise
`DeepSplitting` allows one to obtain $u(t,x)$ at a single point $x \in \Omega$ via the argument `x`:
```julia
prob = PIDEProblem(g, f, μ, σ, x, tspan)
```
### Hypercube
More generally, one may want to solve Eq. (1) on a $d$-dimensional cube $[a,b]^d$. This is offered by `HighDimPDE.jl` with the keyword `x0_sample`:
```julia
prob = PIDEProblem(g, f, μ, σ, x, tspan, x0_sample = x0_sample)
```
Internally, this is handled by assigning a random variable as the initial point of the particles, i.e.

\[X_t^\xi = \int_0^t \mu(X_s^\xi)ds + \int_0^t\sigma(X_s^\xi)dB_s + \xi,\]

where $\xi$ is a random variable uniformly distributed over $[a,b]^d$. This way, the neural network is trained on the whole hypercube $[a,b]^d$ instead of at a single point.
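Putting this together, a hypercube setup might look like the sketch below. The problem data (`g`, `f`, `μ`, `σ`) are illustrative placeholders following the argument order used on this page; the nonlinearity is assumed to have the signature `f(x, y, v_x, v_y, ∇v_x, ∇v_y, p, t)`, which may differ in other versions of the package:

```julia
using HighDimPDE

d = 10                  # dimension of the problem
tspan = (0.0f0, 0.5f0)  # time horizon
x0 = fill(0.0f0, d)     # anchor point of the particles
# train on the hypercube [-1/2, 1/2]^d
x0_sample = UniformSampling(fill(-0.5f0, d), fill(0.5f0, d))

g(x) = exp.(-sum(x .^ 2, dims = 1))   # initial condition
μ(x, p, t) = 0.0f0                    # drift coefficient
σ(x, p, t) = 0.1f0                    # diffusion coefficient
# illustrative local (Fisher-KPP-like) nonlinearity
f(x, y, v_x, v_y, ∇v_x, ∇v_y, p, t) = max.(v_x, 0f0) .* (1f0 .- max.(v_x, 0f0))

prob = PIDEProblem(g, f, μ, σ, x0, tspan, x0_sample = x0_sample)
```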
## Nonlocal PDEs
`DeepSplitting` can solve non-local reaction-diffusion equations of the type
\[\partial_t u = \mu(x) \nabla_x u + \frac{1}{2} \sigma^2(x) \Delta u + \int_{\Omega}f(x,y, u(t,x), u(t,y))dy\]
The non-local term is handled by Monte Carlo integration:
\[u(t_{n+1}, X_{T - t_{n+1}}) \approx \frac{1}{\text{batch\_size}}\sum_{j=1}^{\text{batch\_size}} \left[ u(t_{n}, X_{T - t_{n}}^{(j)}) + \frac{(t_{n+1} - t_n)}{K}\sum_{k=1}^{K} \big[ f(t, X_{T - t_{n}}^{(j)}, Y_{X_{T - t_{n}}^{(j)}}^{(k)}, u(t_{n},X_{T - t_{n}}^{(j)}), u(t_{n},Y_{X_{T - t_{n}}^{(j)}}^{(k)})) \big] \right]\]
In practice, if you have a non-local model, you need to provide the sampling method and the number $K$ of Monte Carlo integrations through the keywords `mc_sample` and `K`.
```julia
alg = DeepSplitting(nn, opt = opt, mc_sample = mc_sample, K = 1)
```
`mc_sample` can be either `UniformSampling(a, b)` or `NormalSampling(σ_sampling, shifted)`.
## References
- Boussange, V., Becker, S., Jentzen, A., Kuckuck, B., Pellissier, L., Deep learning approximations for non-local nonlinear PDEs with Neumann boundary conditions. arXiv (2022)
- Beck, C., Becker, S., Cheridito, P., Jentzen, A., Neufeld, A., Deep splitting method for parabolic PDEs. arXiv (2019)
- Han, J., Jentzen, A., E, W., Solving high-dimensional partial differential equations using deep learning. arXiv (2018)