EnsembleGPUArray

API

DiffEqGPU.EnsembleGPUArrayType
EnsembleGPUArray(backend,cpu_offload = 0.2)

An EnsembleArrayAlgorithm which utilizes the GPU kernels to parallelize each ODE solve with their separate ODE integrator on each kernel.

Positional Arguments

  • backend: the KernelAbstractions backend for performing the computation.
  • cpu_offload: the percentage of trajectories to offload to the CPU. Default is 0.2 or 20% of trajectories.

Limitations

EnsembleGPUArray requires being able to generate a kernel for f using KernelAbstractons.jl and solving the resulting ODE defined over CuArray input types. This introduces the following limitations on its usage:

  • Not all standard Julia f functions are allowed. Only Julia f functions which are capable of being compiled into a GPU kernel are allowed. This notably means that certain features of Julia can cause issues inside of kernel, like:

    • Allocating memory (building arrays)
    • Linear algebra (anything that calls BLAS)
    • Broadcast
  • Not all ODE solvers are allowed, only those from OrdinaryDiffEq.jl. The tested feature set from OrdinaryDiffEq.jl includes:

    • Explicit Runge-Kutta methods
    • Implicit Runge-Kutta methods
    • Rosenbrock methods
    • DiscreteCallbacks and ContinuousCallbacks
  • Stiff ODEs require the analytical solution of every derivative function it requires. For example, Rosenbrock methods require the Jacobian and the gradient with respect to time, and so these two functions are required to be given. Note that they can be generated by the modelingtoolkitize approach.

  • To use multiple GPUs over clusters, one must manually set up one process per GPU. See the multi-GPU tutorial for more details.

Warn

Callbacks with terminate! does not work well with EnsembleGPUArray as the entire integration will hault when any of the trajectories hault. Use with caution.

Example

using DiffEqGPU, CUDA, OrdinaryDiffEq
function lorenz(du, u, p, t)
    du[1] = p[1] * (u[2] - u[1])
    du[2] = u[1] * (p[2] - u[3]) - u[2]
    du[3] = u[1] * u[2] - p[3] * u[3]
end

u0 = Float32[1.0;0.0;0.0]
tspan = (0.0f0,100.0f0)
p = [10.0f0,28.0f0,8/3f0]
prob = ODEProblem(lorenz,u0,tspan,p)
prob_func = (prob,i,repeat) -> remake(prob,p=rand(Float32,3).*p)
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy=false)
@time sol = solve(monteprob,Tsit5(),EnsembleGPUArray(CUDADevice()),trajectories=10_000,saveat=1.0f0)
source
DiffEqGPU.EnsembleCPUArrayType
EnsembleCPUArray(cpu_offload = 0.2)

An EnsembleArrayAlgorithm which utilizes the CPU kernels to parallelize each ODE solve with their separate ODE integrator on each kernel. This method is meant to be a debugging counterpart to EnsembleGPUArray, having the same behavior and using the same KernelAbstractions.jl process to build the combined ODE, but without the restrictions of f being a GPU-compatible kernel function.

It is unlikely that this method is useful beyond library development and debugging, as almost any case should be faster with EnsembleThreads or EnsembleDistributed.

source