The DeepSplitting algorithm
Problems Supported:
1. `PIDEProblem`
2. `ParabolicPDEProblem`
HighDimPDE.DeepSplitting — Type

```julia
DeepSplitting(nn, K = 1, opt = Flux.Optimise.Adam(0.01), λs = nothing, mc_sample = NoSampling())
```

Deep splitting algorithm.
Arguments
- `nn`: a `Flux.Chain`, or more generally a functor.
- `K`: the number of Monte Carlo integrations.
- `opt`: optimizer to be used. By default, `Flux.Optimise.Adam(0.01)`.
- `λs`: the learning rates, used sequentially. Defaults to a single value taken from `opt`.
- `mc_sample::MCSampling`: sampling method for Monte Carlo integrations of the non-local term. Can be `UniformSampling(a, b)`, `NormalSampling(σ_sampling, shifted)`, or `NoSampling` (by default).
Example
```julia
d = 10 # size of the sample
hls = d + 50 # hidden layer size

# Neural network used by the scheme
nn = Flux.Chain(Dense(d, hls, tanh),
                Dense(hls, hls, tanh),
                Dense(hls, 1, x -> x^2))

alg = DeepSplitting(nn, K = 10, opt = Flux.Optimise.Adam(), λs = [5e-3, 1e-3],
                    mc_sample = UniformSampling(zeros(d), ones(d)))
```

CommonSolve.solve — Method

```julia
solve(
    prob::Union{PIDEProblem, ParabolicPDEProblem},
    alg::DeepSplitting,
    dt;
    batch_size,
    abstol,
    verbose,
    maxiters,
    use_cuda,
    cuda_device,
    verbose_rate
) -> PIDESolution{_A, _B, _C, Vector{_A1}, Vector{Any}, Nothing} where {_A, _B, _C, _A1}
```
Returns a PIDESolution object.
Arguments
- `maxiters`: number of iterations per time step. Can be a tuple, where `maxiters[1]` is used for the training of the neural network used in the first time step (which can be long) and `maxiters[2]` is used for the rest of the time steps.
- `batch_size`: the batch size.
- `abstol`: threshold for the objective function under which the training is stopped.
- `verbose`: print training information.
- `verbose_rate`: rate for printing training information (every `verbose_rate` iterations).
- `use_cuda`: set to `true` to use CUDA.
- `cuda_device`: integer, to set the CUDA device used in the training, if `use_cuda == true`.
The DeepSplitting algorithm reformulates the PDE as a stochastic learning problem.
The algorithm relies on two main ideas:
- The approximation of the solution $u$ by a parametric function ${\bf u}^\theta$. This function is generally chosen as a (Feedforward) Neural Network, as it is a universal approximator.
- The training of ${\bf u}^\theta$ by simulated stochastic trajectories of particles, through the link between linear PDEs and the expected trajectory of associated Stochastic Differential Equations (SDEs), explicitly stated by the Feynman-Kac formula.
The general idea 💡
Consider the PDE
\[\partial_t u(t,x) = \mu(t, x) \nabla_x u(t,x) + \frac{1}{2} \sigma^2(t, x) \Delta_x u(t,x) + f(x, u(t,x)) \tag{1}\]
with initial conditions $u(0, x) = g(x)$, where $u \colon [0,T] \times \mathbb{R}^d \to \mathbb{R}$.
Local Feynman-Kac formula
DeepSplitting solves the PDE iteratively over small time intervals by using an approximate Feynman-Kac representation locally.
More specifically, considering a small time step $dt = t_{n+1} - t_n$ one has that
\[u(t_{n+1}, X_{T - t_{n+1}}) \approx \mathbb{E} \left[ f(t_n, X_{T - t_{n}}, u(t_{n},X_{T - t_{n}}))(t_{n+1} - t_n) + u(t_{n}, X_{T - t_{n}}) | X_{T - t_{n+1}}\right]. \tag{3}\]
One can therefore use Monte Carlo integrations to approximate the expectations
\[u(t_{n+1}, X_{T - t_{n+1}}) \approx \frac{1}{\text{batch\_size}}\sum_{j=1}^{\text{batch\_size}} \left[ u(t_{n}, X_{T - t_{n}}^{(j)}) + (t_{n+1} - t_n)\sum_{k=1}^{K} \big[ f(t_n, X_{T - t_{n}}^{(j)}, u(t_{n},X_{T - t_{n}}^{(j)})) \big] \right]\]
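To make this estimator concrete, here is a minimal sketch of the Monte Carlo target for a local nonlinearity; the function name and the array layout (one particle per column) are illustrative and not part of the HighDimPDE.jl API.

```julia
# Sketch of the right-hand side above for a local nonlinearity f(t, x, u).
# `u_n` maps a (d, batch_size) matrix of particle positions to a (1, batch_size)
# row of values u(t_n, ⋅); all names here are illustrative.
function mc_target(u_n, f, X, t_n, t_np1)
    u_prev = u_n(X)                                      # u(t_n, X_{T - t_n}^{(j)})
    return u_prev .+ (t_np1 - t_n) .* f(t_n, X, u_prev)  # explicit Euler step of the nonlinearity
end
```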
Reformulation as a learning problem
The DeepSplitting algorithm approximates $u(t_{n+1}, x)$ by a parametric function ${\bf u}^\theta_{n+1}(x)$. It is advised to let this function be a neural network ${\bf u}_\theta \equiv NN_\theta$, as neural networks are universal approximators.
For each time step $t_n$, the DeepSplitting algorithm:
1. Generates the particle trajectories $X^{x, (j)}$ satisfying the SDE associated with Eq. (1) over the whole interval $[0,T]$.
2. Seeks ${\bf u}_{n+1}^{\theta}$ by minimizing the loss function
\[L(\theta) = ||{\bf u}^\theta_{n+1}(X_{T - t_{n+1}}) - \left[ f(t_n, X_{T - t_{n}}, {\bf u}_{n}(X_{T - t_{n}}))(t_{n+1} - t_{n}) + {\bf u}_{n}(X_{T - t_{n}}) \right] ||\]
This way, the PDE approximation problem is decomposed into a sequence of separate learning problems. In HighDimPDE.jl the right parameter combination $\theta$ is found by iteratively minimizing $L$ using stochastic gradient descent.
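As an illustration of this minimization, the following is a minimal sketch of one stochastic gradient step with Flux; it is not the actual HighDimPDE.jl internals, and `u_next`, `target`, and `train_step!` are names introduced here for the example only.

```julia
using Flux

# One gradient step: fit the network to the Monte Carlo target with a
# mean-squared-error loss, mirroring L(θ) above.
function train_step!(u_next, opt_state, X, target)
    loss, grads = Flux.withgradient(u_next) do nn
        Flux.mse(nn(X), target)
    end
    Flux.update!(opt_state, u_next, grads[1])
    return loss
end

# Typical usage (placeholder values):
# opt_state = Flux.setup(Flux.Adam(1e-2), u_next)
# loss = train_step!(u_next, opt_state, X, target)
```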
To solve with DeepSplitting, one needs to provide `solve` with:
- `dt`
- `batch_size`
- `maxiters`: the number of iterations for minimizing the loss function
- `abstol`: the absolute tolerance for the loss function
- `use_cuda`: recommended if you have an Nvidia GPU
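For instance, assuming `prob` and `alg` have been defined as in the examples above, a call might look like the following sketch (all numerical values are placeholders):

```julia
# Placeholder values; `prob` and `alg` are assumed to be defined already.
sol = solve(prob, alg, 0.1,        # dt
            batch_size = 1000,
            abstol = 1e-4,
            maxiters = 1000,       # can also be a tuple, e.g. (2000, 500)
            use_cuda = false)
```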
Solving point-wise or on a hypercube
Pointwise
DeepSplitting allows obtaining $u(t,x)$ at a single point $x \in \Omega$ with the keyword `x`.
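For example, the ingredients of Eq. (1) might be defined as in the following sketch before building the problem. The argument signatures, `μ(x, p, t)`, `σ(x, p, t)`, `g(x)`, and `f(x, y, v_x, v_y, ∇v_x, ∇v_y, p, t)`, follow the `PIDEProblem` interface documented elsewhere in HighDimPDE.jl, and all numerical values are placeholders.

```julia
using HighDimPDE

d = 10                                # dimension of the problem
x = fill(0f0, d)                      # point where u(t, x) is evaluated
tspan = (0f0, 5f-1)                   # time span
g(x) = exp.(-sum(x .^ 2, dims = 1))   # initial condition
μ(x, p, t) = 0f0                      # drift coefficient
σ(x, p, t) = 1f-1                     # diffusion coefficient
# local nonlinearity f(x, u(t, x)); the remaining arguments of the general
# signature are unused here
f(x, y, v_x, v_y, ∇v_x, ∇v_y, p, t) = max.(v_x, 0f0) .* (1f0 .- max.(v_x, 0f0))
```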
```julia
prob = PIDEProblem(μ, σ, x, tspan, g, f)
```

Hypercube
More generally, one may want to solve Eq. (1) on a $d$-dimensional cube $[a,b]^d$. This is offered by HighDimPDE.jl with the keyword `x0_sample`.
```julia
prob = PIDEProblem(μ, σ, x, tspan, g, f; x0_sample = x0_sample)
```

Internally, this is handled by assigning a random variable as the initial point of the particles, i.e.
\[X_t^\xi = \int_0^t \mu(X_s^\xi)ds + \int_0^t\sigma(X_s^\xi)dB_s + \xi,\]
where $\xi$ is a random variable uniformly distributed over $[a,b]^d$. This way, the neural network is trained on the whole hypercube $[a,b]^d$ instead of at a single point.
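For example, `x0_sample` can be built from one of the sampling methods exported by HighDimPDE.jl (the bounds below are placeholders):

```julia
# Sample the initial point ξ uniformly over the hypercube [-1, 1]^d
x0_sample = UniformSampling(fill(-1f0, d), fill(1f0, d))
```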
Non-local PDEs
DeepSplitting can solve non-local reaction-diffusion equations of the type
\[\partial_t u = \mu(x) \nabla_x u + \frac{1}{2} \sigma^2(x) \Delta u + \int_{\Omega}f(x,y, u(t,x), u(t,y))dy\]
The non-local term is handled by Monte Carlo integration:
\[u(t_{n+1}, X_{T - t_{n+1}}) \approx \frac{1}{\text{batch\_size}}\sum_{j=1}^{\text{batch\_size}} \left[ u(t_{n}, X_{T - t_{n}}^{(j)}) + \frac{(t_{n+1} - t_n)}{K}\sum_{k=1}^{K} \big[ f(t_n, X_{T - t_{n}}^{(j)}, Y_{X_{T - t_{n}}^{(j)}}^{(k)}, u(t_{n},X_{T - t_{n}}^{(j)}), u(t_{n},Y_{X_{T - t_{n}}^{(j)}}^{(k)})) \big] \right]\]
In practice, if you have a non-local model, you need to provide the sampling method and the number $K$ of Monte Carlo integrations through the keywords `mc_sample` and `K`.

```julia
alg = DeepSplitting(nn, opt = opt, mc_sample = mc_sample, K = 1)
```

`mc_sample` can be either `UniformSampling(a, b)` or `NormalSampling(σ_sampling, shifted)`.
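As a concrete illustration, here is a sketch of a non-local (Fisher-KPP-type) setup; it reuses `nn` from the DeepSplitting example above, the `f` signature follows the `PIDEProblem` interface mentioned earlier, and all values are illustrative rather than prescriptive.

```julia
# Non-local integrand f(x, y, u(t,x), u(t,y)); the gradient arguments are unused here
f(x, y, v_x, v_y, ∇v_x, ∇v_y, p, t) = max.(v_x, 0f0) .* (1f0 .- max.(v_y, 0f0))

# Sample the integration variable y from a normal distribution with standard
# deviation 0.1, shifted by the current particle position (shifted = true)
mc_sample = NormalSampling(0.1f0, true)

alg = DeepSplitting(nn, K = 10, opt = Flux.Optimise.Adam(0.01), mc_sample = mc_sample)
```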