PreallocationTools.jl
PreallocationTools.jl is a set of tools for helping build non-allocating pre-cached functions for high-performance computing in Julia. Its tools handle edge cases of automatic differentiation to make it easier for users to get high performance, even in the cases where code generation may change the function that is being called.
DiffCache
DiffCache is a method for generating doubly-preallocated vectors which are compatible with non-allocating forward-mode automatic differentiation by ForwardDiff.jl. Since ForwardDiff uses chunked duals in its forward pass, two vector sizes are required in order for the arrays to be properly defined. DiffCache creates a dispatching type to solve this, so that by passing a qualifier it can automatically switch between the required caches. This method is fully type-stable and non-dynamic, made for when the highest performance is needed.
Using DiffCache
DiffCache(u::AbstractArray, N::Int = ForwardDiff.pickchunksize(length(u)); levels::Int = 1)
DiffCache(u::AbstractArray, N::AbstractArray{<:Int})
The DiffCache function builds a DiffCache object that stores both a version of the cache for u and for the Dual version of u, allowing use of pre-cached vectors with forward-mode automatic differentiation. Note that DiffCache, due to its design, is only compatible with arrays that contain concretely typed elements.
To access the caches, one uses:
get_tmp(tmp::DiffCache, u)
When the element type of u is a Dual number type, get_tmp returns the Dual version of the cache. Otherwise, it returns the standard cache (for use in calls without automatic differentiation).
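As a minimal sketch of this dispatch (the helper name sum_of_squares and the array size are illustrative and not part of the package), the same cache object can be used with plain floats and from inside a ForwardDiff call:

```julia
using ForwardDiff, PreallocationTools

# One cache object serves both plain and dual-number calls.
cache = DiffCache(zeros(3))

# Hypothetical helper: writes x.^2 into the cache and sums it.
function sum_of_squares(x, cache)
    tmp = get_tmp(cache, x)   # element type of tmp follows eltype(x)
    tmp .= x .^ 2
    return sum(tmp)
end

x = [1.0, 2.0, 3.0]
plain_value = sum_of_squares(x, cache)                          # standard Float64 cache
grad = ForwardDiff.gradient(z -> sum_of_squares(z, cache), x)   # Dual cache
```

Passing the cache as an argument rather than reading a global keeps the helper type-stable in this sketch.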
To preallocate to the right size, the DiffCache must be constructed with an N that matches the chunk size of the dual numbers or is larger. If the specified chunk size N is too large, get_tmp will automatically resize when dispatching; this remains type-stable and non-allocating, but comes at the expense of additional memory.
In a differential equation, optimization, etc., the default chunk size is computed from the state vector u, and thus if one creates the DiffCache via DiffCache(u) it will match the default chunking of the solver libraries.
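The following sketch shows how the default chunk size relates to the constructor arguments. The variable names are illustrative, and the vector length of 15 is chosen only to match the 5×3 matrix used in Example 1 of this page:

```julia
using ForwardDiff, PreallocationTools

u = zeros(15)

# Chunk size ForwardDiff picks by default for an input of this length.
N = ForwardDiff.pickchunksize(length(u))

cache_default = DiffCache(u)       # N chosen from length(u)
cache_explicit = DiffCache(u, N)   # the same N, passed explicitly
cache_small = DiffCache(u, 1)      # sized for duals with a single partial
```

A cache built with a small N, such as cache_small, is only a good fit when the differentiation call is known to use that chunk size, for example a scalar ForwardDiff.derivative.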
DiffCache is also compatible with nested automatic differentiation calls through the levels keyword (with N for each level computed based on the size of the state vector) or by specifying N as an array of integers of chunk sizes, which enables full control of chunk sizes on all differentiation levels.
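A small sketch of the nested case, assuming a second derivative computed by nesting two ForwardDiff.derivative calls (the helper cube_sum is hypothetical). Both ways of sizing the cache are shown:

```julia
using ForwardDiff, PreallocationTools

# Hypothetical helper: fills the cache with x and returns sum(tmp.^3),
# which equals 2x^3 for a cache of length 2.
function cube_sum(x, cache)
    tmp = get_tmp(cache, x)
    tmp .= x
    return sum(t -> t^3, tmp)
end

second_derivative(cache, x0) =
    ForwardDiff.derivative(y -> ForwardDiff.derivative(z -> cube_sum(z, cache), y), x0)

cache_levels = DiffCache(zeros(2), levels = 2)   # chunk size picked per level
cache_chunks = DiffCache(zeros(2), [1, 1])       # explicit chunk size for each level

d2_levels = second_derivative(cache_levels, 2.0)
d2_chunks = second_derivative(cache_chunks, 2.0)
```

Since the second derivative of 2x^3 is 12x, both calls should evaluate to 24 at x = 2.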
DiffCache Example 1: Direct Usage
using ForwardDiff, PreallocationTools
randmat = rand(5, 3)
sto = similar(randmat)
stod = DiffCache(sto)
function claytonsample!(sto, τ, α; randmat = randmat)
    sto = get_tmp(sto, τ)
    sto .= randmat
    τ == 0 && return sto
    n = size(sto, 1)
    for i in 1:n
        v = sto[i, 2]
        u = sto[i, 1]
        sto[i, 1] = (1 - u^(-τ) + u^(-τ) * v^(-(τ / (1 + τ))))^(-1 / τ) * α
        sto[i, 2] = (1 - u^(-τ) + u^(-τ) * v^(-(τ / (1 + τ))))^(-1 / τ)
    end
    return sto
end
ForwardDiff.derivative(τ -> claytonsample!(stod, τ, 0.0), 0.3)
ForwardDiff.jacobian(x -> claytonsample!(stod, x[1], x[2]), [0.3; 0.0])
In the above, the chunk size of the dual numbers has been selected based on the size of randmat, resulting in a chunk size of 8 in this case. However, since the derivative is calculated with respect to τ and the Jacobian is calculated with respect to τ and α, specifying the DiffCache with stod = DiffCache(sto, 1) or stod = DiffCache(sto, 2), respectively, would have been the most memory efficient way of performing these calculations (only really relevant for much larger problems).
DiffCache Example 2: ODEs
using LinearAlgebra, OrdinaryDiffEq
function foo(du, u, (A, tmp), t)
    mul!(tmp, A, u)
    @. du = u + tmp
    nothing
end
prob = ODEProblem(foo, ones(5, 5), (0.0, 1.0), (ones(5, 5), zeros(5, 5)))
solve(prob, TRBDF2())
This fails because tmp holds only real numbers, but during automatic differentiation tmp needs to be a cache of dual numbers. Since u is the value that will carry the dual numbers, we dispatch based on that:
using LinearAlgebra, OrdinaryDiffEq, PreallocationTools
function foo(du, u, (A, tmp), t)
    tmp = get_tmp(tmp, u)
    mul!(tmp, A, u)
    @. du = u + tmp
    nothing
end
chunk_size = 5
prob = ODEProblem(foo,
    ones(5, 5),
    (0.0, 1.0),
    (ones(5, 5), DiffCache(zeros(5, 5), chunk_size)))
solve(prob, TRBDF2(chunk_size = chunk_size))
or just using the default chunking:
using LinearAlgebra, OrdinaryDiffEq, PreallocationTools
function foo(du, u, (A, tmp), t)
    tmp = get_tmp(tmp, u)
    mul!(tmp, A, u)
    @. du = u + tmp
    nothing
end
prob = ODEProblem(foo, ones(5, 5), (0.0, 1.0), (ones(5, 5), DiffCache(zeros(5, 5))))
solve(prob, TRBDF2())
DiffCache Example 3: Nested AD calls in an optimization problem involving a Hessian matrix
using LinearAlgebra, OrdinaryDiffEq, PreallocationTools, Optim, Optimization
function foo(du, u, p, t)
    tmp = p[2]
    A = reshape(p[1], size(tmp.du))
    tmp = get_tmp(tmp, u)
    mul!(tmp, A, u)
    @. du = u + tmp
    nothing
end
coeffs = -collect(0.1:0.1:0.4)
cache = DiffCache(zeros(2, 2), levels = 3)
prob = ODEProblem(foo, ones(2, 2), (0.0, 1.0), (coeffs, cache))
realsol = solve(prob, TRBDF2(), saveat = 0.0:0.1:10.0, reltol = 1e-8)
function objfun(x, prob, realsol, cache)
    prob = remake(prob, u0 = eltype(x).(prob.u0), p = (x, cache))
    sol = solve(prob, TRBDF2(), saveat = 0.0:0.1:10.0, reltol = 1e-8)
    ofv = 0.0
    # The return code belongs to the solution as a whole, not to each saved state.
    if !SciMLBase.successful_retcode(sol)
        ofv = 1e12
    else
        ofv = sum((sol .- realsol) .^ 2)
    end
    return ofv
end
fn(x, p) = objfun(x, p[1], p[2], p[3])
optfun = OptimizationFunction(fn, Optimization.AutoForwardDiff())
optprob = OptimizationProblem(optfun, zeros(length(coeffs)), (prob, realsol, cache))
solve(optprob, Newton())
This solves an optimization problem for the coefficients, coeffs, appearing in a differential equation. The optimization is done with Optim.jl's Newton() algorithm. Since this involves automatic differentiation in the ODE solver and the calculation of Hessians, three automatic differentiations are nested within each other. Therefore, the DiffCache is specified with levels = 3.
FixedSizeDiffCache
FixedSizeDiffCache is a lot like DiffCache, but it stores dual numbers in its caches instead of a flat array. Because of this, it can avoid a view, making it a little more performant for generating caches of non-Array types. However, it is a lot less flexible than DiffCache, and is thus only recommended for cases where the chunk size is known in advance (for example, ODE solvers) and where u is not an Array.
The interface is almost exactly the same, except with the constructor:
FixedSizeDiffCache(u::AbstractArray, chunk_size = Val{ForwardDiff.pickchunksize(length(u))})
FixedSizeDiffCache(u::AbstractArray, chunk_size::Integer)
Note that the FixedSizeDiffCache can support duals that have a smaller chunk size than the preallocated ones, but not a larger one. Nested duals are not supported with this construct.
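A minimal usage sketch, assuming the integer chunk size constructor shown above. A plain Vector is used here only to keep the example short (the text above recommends FixedSizeDiffCache mainly for non-Array types), and the helper name scaled_norm2 is hypothetical:

```julia
using ForwardDiff, PreallocationTools

x = [1.0, 2.0, 3.0]

# Chunk size 3 matches what ForwardDiff.gradient uses for a length-3 input.
cache = FixedSizeDiffCache(zeros(3), 3)

# Hypothetical helper: computes sum((2x).^2) using the cache as scratch space.
function scaled_norm2(x, cache)
    tmp = get_tmp(cache, x)
    tmp .= 2 .* x
    return sum(abs2, tmp)
end

value = scaled_norm2(x, cache)
grad = ForwardDiff.gradient(z -> scaled_norm2(z, cache), x)
```

The calling code is the same as with DiffCache; only the constructor differs.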
LazyBufferCache
LazyBufferCache(f::F = identity)
A LazyBufferCache is a Dict-like type for the caches, which automatically defines new cache arrays on demand when they are required. The function f maps size_of_cache = f(size(u)), which by default creates cache arrays of the same size.
By default the created buffers are not initialized, but a function initializer! can be supplied which is applied to the buffer when it is created, for instance buf -> fill!(buf, 0.0).
Note that LazyBufferCache is type-stable and contains no dynamic dispatch. This gives it a ~15ns overhead. The upside of LazyBufferCache is that the user does not have to worry about potential issues with chunk sizes and such: LazyBufferCache is much easier!
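A short sketch of the three behaviors described above: on-demand creation with reuse, a size map f, and an initializer. It assumes that initializer! is passed as a keyword argument of that name, which is not shown in the signature above; the size map and variable names are illustrative only:

```julia
using PreallocationTools

u = rand(4)

# Default: the buffer has the same size as u and is reused on later lookups.
lbc = LazyBufferCache()
buf = lbc[u]
reused = lbc[u] === buf

# Size map: the buffer is twice as long as the input (illustrative choice).
doubled = LazyBufferCache(s -> 2 .* s)
big = doubled[u]

# Initializer (assumed keyword argument): new buffers start zero-filled.
zeroed = LazyBufferCache(; initializer! = b -> fill!(b, 0.0))
z = zeroed[u]
```

Without an initializer the contents of a fresh buffer are arbitrary, so code using lbc[u] should write to the buffer before reading from it.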
Example
using LinearAlgebra, OrdinaryDiffEq, PreallocationTools
function foo(du, u, (A, lbc), t)
    tmp = lbc[u]
    mul!(tmp, A, u)
    @. du = u + tmp
    nothing
end
prob = ODEProblem(foo, ones(5, 5), (0.0, 1.0), (ones(5, 5), LazyBufferCache()))
solve(prob, TRBDF2())
Note About ReverseDiff Support for LazyBufferCache
ReverseDiff support is done in SciMLSensitivity.jl to reduce the AD requirements on this package. Load that package if ReverseDiff overloads are required.
GeneralLazyBufferCache
GeneralLazyBufferCache(f = identity)
A GeneralLazyBufferCache is a Dict-like type for the caches, which automatically defines new caches on demand when they are required. The function f generates the cache matching the type of u, and subsequent indexing reuses that cache if that type of u has already been seen.
Note that the return value of GeneralLazyBufferCache is not type-inferred. This means it's the slowest of the preallocation methods, but it's the most general.
Example
In all the previous cases, our cache was an array. However, in this case, we want to preallocate a DifferentialEquations ODEIntegrator object. This object is the one created via DifferentialEquations.init(ODEProblem(ode_fnc, y₀, (0.0, T), p), Tsit5(); saveat = t), and we want to optimize p in a way that changes its type to a ForwardDiff dual number. Thus, what we can do is make a GeneralLazyBufferCache which holds these integrator objects, defined by p, and index it with p in order to retrieve the cache. The first time it's called it will build the integrator, and in subsequent calls it will reuse the cache.
Defining the cache as a function of p to build an integrator thus looks like:
lbc = GeneralLazyBufferCache(function (p)
    DifferentialEquations.init(ODEProblem(ode_fnc, y₀, (0.0, T), p), Tsit5(); saveat = t)
end)
Then lbc[p] builds the integrator on first use and reuses the cached one afterwards. A full example looks like the following:
using Random, DifferentialEquations, LinearAlgebra, Optimization, OptimizationNLopt,
      OptimizationOptimJL, PreallocationTools
lbc = GeneralLazyBufferCache(function (p)
    DifferentialEquations.init(ODEProblem(ode_fnc, y₀, (0.0, T), p), Tsit5(); saveat = t)
end)
Random.seed!(2992999)
λ, y₀, σ = -0.5, 15.0, 0.1
T, n = 5.0, 200
Δt = T / n
t = [j * Δt for j in 0:n]
y = y₀ * exp.(λ * t)
yᵒ = y .+ [0.0, σ * randn(n)...]
ode_fnc(u, p, t) = p * u
function loglik(θ, data, integrator)
    yᵒ, n, ε = data
    λ, σ, u0 = θ
    integrator.p = λ
    reinit!(integrator, u0)
    solve!(integrator)
    ε = yᵒ .- integrator.sol.u
    ℓ = -0.5n * log(2π * σ^2) - 0.5 / σ^2 * sum(ε .^ 2)
end
θ₀ = [-1.0, 0.5, 19.73]
negloglik = (θ, p) -> -loglik(θ, p, lbc[θ[1]])
fnc = OptimizationFunction(negloglik, Optimization.AutoForwardDiff())
ε = zeros(n)
prob = OptimizationProblem(fnc,
    θ₀,
    (yᵒ, n, ε),
    lb = [-10.0, 1e-6, 0.5],
    ub = [10.0, 10.0, 25.0])
solve(prob, LBFGS())
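The reuse-by-type behavior can also be seen without any ODE dependencies. The following is a lightweight sketch in which a named tuple stands in for an expensive cache object; the builder function and the counter are hypothetical and exist only to show when the cache is constructed:

```julia
using PreallocationTools

# Counts how often the builder runs (illustration only).
build_calls = Ref(0)

glbc = GeneralLazyBufferCache(function (u)
    build_calls[] += 1
    (; scratch = similar(u))   # stand-in for an expensive cache object
end)

a = glbc[zeros(3)]             # first Vector{Float64}: the builder runs
b = glbc[ones(3)]              # same type of u: the cached object is reused
c = glbc[zeros(Float32, 3)]    # new type of u: the builder runs again
```

After these three lookups the builder has run twice, once per distinct type of u, and a and b refer to the same cached object.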
Similar Projects
AutoPreallocation.jl tries to do this automatically at the compiler level. Alloc.jl tries to do this with a bump allocator.
Contributing
Please refer to the SciML ColPrac: Contributor's Guide on Collaborative Practices for Community Packages for guidance on PRs, issues, and other matters relating to contributing to SciML.
See the SciML Style Guide for common coding practices and other style decisions.
There are a few community forums:
- The #diffeq-bridged and #sciml-bridged channels in the Julia Slack
- The #diffeq-bridged and #sciml-bridged channels in the Julia Zulip
- On the Julia Discourse forums
- See also SciML Community page
Reproducibility
The documentation of this SciML package was built using these direct dependencies,
Status `~/work/PreallocationTools.jl/PreallocationTools.jl/docs/Project.toml`
[e30172f5] Documenter v1.9.0
[d236fae5] PreallocationTools v0.4.26 `~/work/PreallocationTools.jl/PreallocationTools.jl`
and using this machine and Julia version.
Julia Version 1.11.4
Commit 8561cc3d68d (2025-03-10 11:36 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 4 × AMD EPYC 7763 64-Core Processor
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
A more complete overview of all dependencies and their versions is also provided.
Status `~/work/PreallocationTools.jl/PreallocationTools.jl/docs/Manifest.toml`
[a4c015fc] ANSIColoredPrinters v0.0.1
[1520ce14] AbstractTrees v0.4.5
[79e6a3ab] Adapt v4.3.0
[4fba245c] ArrayInterface v7.18.0
[944b1d66] CodecZlib v0.7.8
[bbf7d656] CommonSubexpressions v0.3.1
[163ba53b] DiffResults v1.1.0
[b552c78f] DiffRules v1.15.1
[ffbed154] DocStringExtensions v0.9.3
[e30172f5] Documenter v1.9.0
[f6369f11] ForwardDiff v1.0.0
[d7ba0133] Git v1.3.1
[b5f81e59] IOCapture v0.2.5
[92d709cd] IrrationalConstants v0.2.4
[692b3bcd] JLLWrappers v1.7.0
[682c06a0] JSON v0.21.4
[0e77f7df] LazilyInitializedFields v1.3.0
[2ab3a3ac] LogExpFunctions v0.3.29
[1914dd2f] MacroTools v0.5.15
[d0879d2d] MarkdownAST v0.1.2
[77ba4419] NaNMath v1.1.2
[69de0a69] Parsers v2.8.1
[d236fae5] PreallocationTools v0.4.26 `~/work/PreallocationTools.jl/PreallocationTools.jl`
⌅ [aea7be01] PrecompileTools v1.2.1
[21216c6a] Preferences v1.4.3
[2792f1a3] RegistryInstances v0.1.0
[ae029012] Requires v1.3.1
[276daf66] SpecialFunctions v2.5.0
[1e83bf80] StaticArraysCore v1.4.3
[3bb67fe8] TranscodingStreams v0.11.3
[2e619515] Expat_jll v2.6.5+0
[f8c6e375] Git_jll v2.49.0+0
[94ce4f54] Libiconv_jll v1.18.0+0
[458c3c95] OpenSSL_jll v3.0.16+0
[efe28fd5] OpenSpecFun_jll v0.5.6+0
[0dad84c5] ArgTools v1.1.2
[56f22d72] Artifacts v1.11.0
[2a0f44e3] Base64 v1.11.0
[ade2ca70] Dates v1.11.0
[f43a241f] Downloads v1.6.0
[7b1f6079] FileWatching v1.11.0
[b77e0a4c] InteractiveUtils v1.11.0
[b27032c2] LibCURL v0.6.4
[76f85450] LibGit2 v1.11.0
[8f399da3] Libdl v1.11.0
[37e2e46d] LinearAlgebra v1.11.0
[56ddb016] Logging v1.11.0
[d6f4376e] Markdown v1.11.0
[a63ad114] Mmap v1.11.0
[ca575930] NetworkOptions v1.2.0
[44cfe95a] Pkg v1.11.0
[de0858da] Printf v1.11.0
[3fa0cd96] REPL v1.11.0
[9a3f8284] Random v1.11.0
[ea8e919c] SHA v0.7.0
[9e88b42a] Serialization v1.11.0
[6462fe0b] Sockets v1.11.0
[f489334b] StyledStrings v1.11.0
[fa267f1f] TOML v1.0.3
[a4e569a6] Tar v1.10.0
[8dfed614] Test v1.11.0
[cf7118a7] UUIDs v1.11.0
[4ec0a83e] Unicode v1.11.0
[e66e0078] CompilerSupportLibraries_jll v1.1.1+0
[deac9b47] LibCURL_jll v8.6.0+0
[e37daf67] LibGit2_jll v1.7.2+0
[29816b5a] LibSSH2_jll v1.11.0+1
[c8ffd9c3] MbedTLS_jll v2.28.6+0
[14a3606d] MozillaCACerts_jll v2023.12.12
[4536629a] OpenBLAS_jll v0.3.27+1
[05823500] OpenLibm_jll v0.8.1+4
[efcefdf7] PCRE2_jll v10.42.0+1
[83775a58] Zlib_jll v1.2.13+1
[8e850b90] libblastrampoline_jll v5.11.0+0
[8e850ede] nghttp2_jll v1.59.0+0
[3f19e933] p7zip_jll v17.4.0+2
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated -m`