EnsembleGPUKernel

API

DiffEqGPU.EnsembleGPUKernel — Type

EnsembleGPUKernel(backend,cpu_offload = 0.2)

A massively-parallel ensemble algorithm which generates a unique GPU kernel for the entire ODE which is then executed. This leads to a very low overhead GPU code generation, but imparts some extra limitations on the use.

Positional Arguments

backend: the KernelAbstractions backend for performing the computation.
cpu_offload: the percentage of trajectories to offload to the CPU. Default is 0.0 or 0% of trajectories.

Limitations

Not all standard Julia f functions are allowed. Only Julia f functions which are capable of being compiled into a GPU kernel are allowed. This notably means that certain features of Julia can cause issues inside a kernel, like:
- Allocating memory (building arrays)
- Linear algebra (anything that calls BLAS)
- Broadcast
Only out-of-place f definitions are allowed. Coupled with the requirement of not allowing for memory allocations, this means that the ODE must be defined with StaticArray initial conditions.
Only specific ODE solvers are allowed. This includes:
- GPUTsit5
- GPUVern7
- GPUVern9
To use multiple GPUs over clusters, one must manually set up one process per GPU. See the multi-GPU tutorial for more details.

Example

using DiffEqGPU, CUDA, OrdinaryDiffEq, StaticArrays

function lorenz(u, p, t)
    σ = p[1]
    ρ = p[2]
    β = p[3]
    du1 = σ * (u[2] - u[1])
    du2 = u[1] * (ρ - u[3]) - u[2]
    du3 = u[1] * u[2] - β * u[3]
    return SVector{3}(du1, du2, du3)
end

u0 = @SVector [1.0f0; 0.0f0; 0.0f0]
tspan = (0.0f0, 10.0f0)
p = @SVector [10.0f0, 28.0f0, 8 / 3.0f0]
prob = ODEProblem{false}(lorenz, u0, tspan, p)
prob_func = (prob, i, repeat) -> remake(prob, p = (@SVector rand(Float32, 3)) .* p)
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)

@time sol = solve(monteprob, GPUTsit5(), EnsembleGPUKernel(), trajectories = 10_000,
                  adaptive = false, dt = 0.1f0)

source

Specialized Solvers

DiffEqGPU.GPUTsit5 — Type

GPUTsit5()

A specialized implementation of the 5th order Tsit5 method specifically for kernel generation with EnsembleGPUKernel. For a similar CPU implementation, see SimpleATsit5 from SimpleDiffEq.jl.

source

DiffEqGPU.GPUVern7 — Type

GPUVern7()

A specialized implementation of the 7th order Vern7 method specifically for kernel generation with EnsembleGPUKernel.

source

DiffEqGPU.GPUVern9 — Type

GPUVern9()

A specialized implementation of the 9th order Vern9 method specifically for kernel generation with EnsembleGPUKernel.

source

DiffEqGPU.GPUEM — Type

GPUEM()

A specialized implementation of the Euler-Maruyama GPUEM method with weak order 1.0. Made specifically for kernel generation with EnsembleGPUKernel.

source

DiffEqGPU.GPUSIEA — Type

GPUSIEA()

A specialized implementation of the weak order 2.0 for Ito SDEs GPUSIEA method specifically for kernel generation with EnsembleGPUKernel.

source

DiffEqGPU.GPURosenbrock23 — Type

GPURosenbrock23()

A specialized implementation of the W-method Rosenbrock23 method specifically for kernel generation with EnsembleGPUKernel.

source

DiffEqGPU.GPURodas4 — Type

GPURodas4()

A specialized implementation of the Rodas4 method specifically for kernel generation with EnsembleGPUKernel.

source

DiffEqGPU.GPURodas5P — Type

GPURodas5P()

A specialized implementation of the Rodas5P method specifically for kernel generation with EnsembleGPUKernel.

source

DiffEqGPU.GPUKvaerno3 — Type

GPUKvaerno3()

A specialized implementation of the Kvaerno3 method specifically for kernel generation with EnsembleGPUKernel.

source

DiffEqGPU.GPUKvaerno5 — Type

GPUKvaerno5()

A specialized implementation of the Kvaerno5 method specifically for kernel generation with EnsembleGPUKernel.

source

Lower Level API

DiffEqGPU.vectorized_solve — Function

vectorized_solve(probs, prob::Union{ODEProblem, SDEProblem}alg;
                 dt, saveat = nothing,
                 save_everystep = true,
                 debug = false, callback = CallbackSet(nothing), tstops = nothing)

A lower level interface to the kernel generation solvers of EnsembleGPUKernel with fixed time-stepping.

Arguments

probs: the GPU-setup problems generated by the ensemble.
prob: the quintessential problem form. Can be just probs[1]
alg: the kernel-based differential equation solver. Must be one of the EnsembleGPUKernel specialized methods.

Keyword Arguments

Only a subset of the common solver arguments are supported.

source

DiffEqGPU.vectorized_asolve — Function

vectorized_asolve(probs, prob::ODEProblem, alg;
                  dt = 0.1f0, saveat = nothing,
                  save_everystep = false,
                  abstol = 1.0f-6, reltol = 1.0f-3,
                  callback = CallbackSet(nothing), tstops = nothing)

A lower level interface to the kernel generation solvers of EnsembleGPUKernel with adaptive time-stepping.

Arguments

probs: the GPU-setup problems generated by the ensemble.
prob: the quintessential problem form. Can be just probs[1]
alg: the kernel-based differential equation solver. Must be one of the EnsembleGPUKernel specialized methods.

Keyword Arguments

Only a subset of the common solver arguments are supported.

source

DiffEqGPU.vectorized_map_solve — Function

Lower level API for EnsembleArrayAlgorithm. Avoids conversion of solution to CPU arrays.

vectorized_map_solve(probs, alg,
                     ensemblealg::Union{EnsembleArrayAlgorithm}, I,
                     adaptive)

Arguments

probs: the GPU-setup problems generated by the ensemble.
alg: the kernel-based differential equation solver. Most of the solvers from OrdinaryDiffEq.jl are supported.
ensemblealg: The EnsembleGPUArray() algorithm.
I: The iterator argument. Can be set to for e.g. 1:10_000 to simulate 10,000 trajectories.
adaptive: The Boolean argument for time-stepping. Use true to enable adaptive time-stepping.

Keyword Arguments

Only a subset of the common solver arguments are supported.

source