EnsembleGPUKernel

API

DiffEqGPU.EnsembleGPUKernelType
EnsembleGPUKernel(backend,cpu_offload = 0.2)

A massively-parallel ensemble algorithm which generates a unique GPU kernel for the entire ODE which is then executed. This leads to a very low overhead GPU code generation, but imparts some extra limitations on the use.

Positional Arguments

  • backend: the KernelAbstractions backend for performing the computation.
  • cpu_offload: the percentage of trajectories to offload to the CPU. Default is 0.0 or 0% of trajectories.

Limitations

  • Not all standard Julia f functions are allowed. Only Julia f functions which are capable of being compiled into a GPU kernel are allowed. This notably means that certain features of Julia can cause issues inside a kernel, like:

    • Allocating memory (building arrays)
    • Linear algebra (anything that calls BLAS)
    • Broadcast
  • Only out-of-place f definitions are allowed. Coupled with the requirement of not allowing for memory allocations, this means that the ODE must be defined with StaticArray initial conditions.

  • Only specific ODE solvers are allowed. This includes:

    • GPUTsit5
    • GPUVern7
    • GPUVern9
  • To use multiple GPUs over clusters, one must manually set up one process per GPU. See the multi-GPU tutorial for more details.

Example

using DiffEqGPU, CUDA, OrdinaryDiffEq, StaticArrays

function lorenz(u, p, t)
    σ = p[1]
    ρ = p[2]
    β = p[3]
    du1 = σ * (u[2] - u[1])
    du2 = u[1] * (ρ - u[3]) - u[2]
    du3 = u[1] * u[2] - β * u[3]
    return SVector{3}(du1, du2, du3)
end

u0 = @SVector [1.0f0; 0.0f0; 0.0f0]
tspan = (0.0f0, 10.0f0)
p = @SVector [10.0f0, 28.0f0, 8 / 3.0f0]
prob = ODEProblem{false}(lorenz, u0, tspan, p)
prob_func = (prob, i, repeat) -> remake(prob, p = (@SVector rand(Float32, 3)) .* p)
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)

@time sol = solve(monteprob, GPUTsit5(), EnsembleGPUKernel(), trajectories = 10_000,
                  adaptive = false, dt = 0.1f0)
source

Specialized Solvers

DiffEqGPU.GPUTsit5Type

GPUTsit5()

A specialized implementation of the 5th order Tsit5 method specifically for kernel generation with EnsembleGPUKernel. For a similar CPU implementation, see SimpleATsit5 from SimpleDiffEq.jl.

source
DiffEqGPU.GPUVern7Type

GPUVern7()

A specialized implementation of the 7th order Vern7 method specifically for kernel generation with EnsembleGPUKernel.

source
DiffEqGPU.GPUVern9Type

GPUVern9()

A specialized implementation of the 9th order Vern9 method specifically for kernel generation with EnsembleGPUKernel.

source
DiffEqGPU.GPUEMType

GPUEM()

A specialized implementation of the Euler-Maruyama GPUEM method with weak order 1.0. Made specifically for kernel generation with EnsembleGPUKernel.

source
DiffEqGPU.GPUSIEAType

GPUSIEA()

A specialized implementation of the weak order 2.0 for Ito SDEs GPUSIEA method specifically for kernel generation with EnsembleGPUKernel.

source
DiffEqGPU.GPURosenbrock23Type

GPURosenbrock23()

A specialized implementation of the W-method Rosenbrock23 method specifically for kernel generation with EnsembleGPUKernel.

source
DiffEqGPU.GPURodas4Type

GPURodas4()

A specialized implementation of the Rodas4 method specifically for kernel generation with EnsembleGPUKernel.

source
DiffEqGPU.GPURodas5PType

GPURodas5P()

A specialized implementation of the Rodas5P method specifically for kernel generation with EnsembleGPUKernel.

source
DiffEqGPU.GPUKvaerno3Type

GPUKvaerno3()

A specialized implementation of the Kvaerno3 method specifically for kernel generation with EnsembleGPUKernel.

source
DiffEqGPU.GPUKvaerno5Type

GPUKvaerno5()

A specialized implementation of the Kvaerno5 method specifically for kernel generation with EnsembleGPUKernel.

source

Lower Level API

DiffEqGPU.vectorized_solveFunction
vectorized_solve(probs, prob::Union{ODEProblem, SDEProblem}alg;
                 dt, saveat = nothing,
                 save_everystep = true,
                 debug = false, callback = CallbackSet(nothing), tstops = nothing)

A lower level interface to the kernel generation solvers of EnsembleGPUKernel with fixed time-stepping.

Arguments

  • probs: the GPU-setup problems generated by the ensemble.
  • prob: the quintessential problem form. Can be just probs[1]
  • alg: the kernel-based differential equation solver. Must be one of the EnsembleGPUKernel specialized methods.

Keyword Arguments

Only a subset of the common solver arguments are supported.

source
DiffEqGPU.vectorized_asolveFunction
vectorized_asolve(probs, prob::ODEProblem, alg;
                  dt = 0.1f0, saveat = nothing,
                  save_everystep = false,
                  abstol = 1.0f-6, reltol = 1.0f-3,
                  callback = CallbackSet(nothing), tstops = nothing)

A lower level interface to the kernel generation solvers of EnsembleGPUKernel with adaptive time-stepping.

Arguments

  • probs: the GPU-setup problems generated by the ensemble.
  • prob: the quintessential problem form. Can be just probs[1]
  • alg: the kernel-based differential equation solver. Must be one of the EnsembleGPUKernel specialized methods.

Keyword Arguments

Only a subset of the common solver arguments are supported.

source
DiffEqGPU.vectorized_map_solveFunction

Lower level API for EnsembleArrayAlgorithm. Avoids conversion of solution to CPU arrays.

vectorized_map_solve(probs, alg,
                     ensemblealg::Union{EnsembleArrayAlgorithm}, I,
                     adaptive)

Arguments

  • probs: the GPU-setup problems generated by the ensemble.
  • alg: the kernel-based differential equation solver. Most of the solvers from OrdinaryDiffEq.jl are supported.
  • ensemblealg: The EnsembleGPUArray() algorithm.
  • I: The iterator argument. Can be set to for e.g. 1:10_000 to simulate 10,000 trajectories.
  • adaptive: The Boolean argument for time-stepping. Use true to enable adaptive time-stepping.

Keyword Arguments

Only a subset of the common solver arguments are supported.

source