Automatic Differentiation Construction Choice Recommendations
The choices for the auto-AD fill-ins with quick descriptions are:

- `AutoForwardDiff()`: The fastest choice for small optimizations
- `AutoReverseDiff(compile=false)`: A fast choice for large scalar optimizations
- `AutoTracker()`: Like ReverseDiff but GPU-compatible
- `AutoZygote()`: The fastest choice for non-mutating array-based (BLAS) functions
- `AutoFiniteDiff()`: Finite differencing, not optimal but always applicable
- `AutoModelingToolkit()`: The fastest choice for large scalar optimizations
- `AutoEnzyme()`: Highly performant AD choice for type-stable and optimized code
- `AutoMooncake()`: Like Zygote and ReverseDiff, but supports GPU and mutating code
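Whichever backend is selected, it is passed to `OptimizationFunction` in the same way. Below is a minimal sketch of the common pattern; it assumes Optimization.jl, OptimizationOptimJL.jl, and the backend package are installed, and the Rosenbrock objective is purely illustrative:

```julia
using Optimization, OptimizationOptimJL, ForwardDiff

# Classic Rosenbrock objective with parameters p; minimum at (1, 1)
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

u0 = zeros(2)      # initial guess
p = [1.0, 100.0]   # objective parameters

# Any backend from the list above can be swapped in here
optf = OptimizationFunction(rosenbrock, AutoForwardDiff())
prob = OptimizationProblem(optf, u0, p)
sol = solve(prob, BFGS())
```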
Automatic Differentiation Choice API
The following sections describe the Auto-AD choices in detail.
ADTypes.AutoForwardDiff — Type

AutoForwardDiff{chunksize} <: AbstractADType
An AbstractADType choice for use in OptimizationFunction for automatically generating the unspecified derivative functions. Usage:
OptimizationFunction(f, AutoForwardDiff(); kwargs...)
This uses the ForwardDiff.jl package. It is the fastest choice for small systems, especially with heavy scalar interactions. It is easy to use and compatible with most Julia functions which have loose type restrictions. However, because it's forward-mode, it scales poorly in comparison to other AD choices. Hessian construction is suboptimal as it uses the forward-over-forward approach.
- Compatible with GPUs
- Compatible with Hessian-based optimization
- Compatible with Hv-based optimization
- Compatible with constraints
Note that only the unspecified derivative functions are defined. For example, if a `hess` function is supplied to the `OptimizationFunction`, then the Hessian is not defined via ForwardDiff.
AutoForwardDiff{chunksize,T}
Struct used to select the ForwardDiff.jl backend for automatic differentiation.
Defined by ADTypes.jl.
Constructors
AutoForwardDiff(; chunksize=nothing, tag=nothing)
Type parameters
- `chunksize`: the preferred chunk size to evaluate several derivatives at once

Fields

- `tag::T`: a custom tag to handle nested differentiation calls (usually not necessary)
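Since ForwardDiff also generates Hessians, second-order optimizers can be used directly. A hedged sketch (the chunk size value and the Newton() solver from OptimizationOptimJL.jl are illustrative choices):

```julia
using Optimization, OptimizationOptimJL, ForwardDiff

rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

# An explicit chunk size may be set; here 2 matches the input dimension
optf = OptimizationFunction(rosenbrock, AutoForwardDiff(chunksize = 2))
prob = OptimizationProblem(optf, zeros(2), [1.0, 100.0])

# Newton's method consumes the ForwardDiff-generated Hessian
sol = solve(prob, Newton())
```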
ADTypes.AutoFiniteDiff — Type

AutoFiniteDiff{T1,T2,T3} <: AbstractADType
An AbstractADType choice for use in OptimizationFunction for automatically generating the unspecified derivative functions. Usage:
OptimizationFunction(f, AutoFiniteDiff(); kwargs...)
This uses FiniteDiff.jl. While not necessarily the most efficient, this is the only choice that doesn't require the `f` function to be automatically differentiable, which means it is always applicable. However, because it's using finite differencing, one needs to be careful, as this procedure introduces numerical error into the derivative estimates.
- Compatible with GPUs
- Compatible with Hessian-based optimization
- Compatible with Hv-based optimization
- Compatible with constraint functions
Note that only the unspecified derivative functions are defined. For example, if a `hess` function is supplied to the `OptimizationFunction`, then the Hessian is not defined via FiniteDiff.
Constructor
AutoFiniteDiff(; fdtype = Val(:forward), fdjtype = fdtype, fdhtype = Val(:hcentral))

- `fdtype`: the method used for defining the gradient
- `fdjtype`: the method used for defining the Jacobian of constraints
- `fdhtype`: the method used for defining the Hessian
For more information on the derivative type specifiers, see the FiniteDiff.jl documentation.
AutoFiniteDiff{T1,T2,T3}
Struct used to select the FiniteDiff.jl backend for automatic differentiation.
Defined by ADTypes.jl.
Constructors
AutoFiniteDiff(;
fdtype=Val(:forward), fdjtype=fdtype, fdhtype=Val(:hcentral),
relstep=nothing, absstep=nothing, dir=true
)
Fields
- `fdtype::T1`: finite difference type
- `fdjtype::T2`: finite difference type for the Jacobian
- `fdhtype::T3`: finite difference type for the Hessian
- `relstep`: relative finite difference step size
- `absstep`: absolute finite difference step size
- `dir`: direction of the finite difference step
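As an illustrative sketch, central differencing can be requested for the gradient when reduced truncation error is worth the extra function evaluations (the objective and solver here are arbitrary examples):

```julia
using Optimization, OptimizationOptimJL, FiniteDiff

f(u, p) = sum(abs2, u .- p)

# Val(:central) requests central differences for the gradient, with
# O(h^2) truncation error versus O(h) for the default forward differences
optf = OptimizationFunction(f, AutoFiniteDiff(fdtype = Val(:central)))
prob = OptimizationProblem(optf, zeros(3), [1.0, 2.0, 3.0])
sol = solve(prob, BFGS())
```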
ADTypes.AutoReverseDiff — Type

AutoReverseDiff <: AbstractADType
An AbstractADType choice for use in OptimizationFunction for automatically generating the unspecified derivative functions. Usage:
OptimizationFunction(f, AutoReverseDiff(); kwargs...)
This uses the ReverseDiff.jl package. `AutoReverseDiff` has a default argument, `compile`, which denotes whether the reverse pass should be compiled. `compile` should only be set to `true` if `f` contains no branches (if statements, while loops); otherwise it can produce incorrect derivatives!

`AutoReverseDiff` is generally applicable to many pure Julia codes, and with `compile=true` it is one of the fastest options on code with heavy scalar interactions. Hessian calculations are fast by mixing ForwardDiff with ReverseDiff for forward-over-reverse. However, its performance can falter when `compile=false`.
- Not compatible with GPUs
- Compatible with Hessian-based optimization by mixing with ForwardDiff
- Compatible with Hv-based optimization by mixing with ForwardDiff
- Not compatible with constraint functions
Note that only the unspecified derivative functions are defined. For example, if a `hess` function is supplied to the `OptimizationFunction`, then the Hessian is not defined via ReverseDiff.
Constructor
AutoReverseDiff(; compile = false)
Note: currently, compilation is not defined/used!
AutoReverseDiff{compile}
Struct used to select the ReverseDiff.jl backend for automatic differentiation.
Defined by ADTypes.jl.
Constructors
AutoReverseDiff(; compile::Union{Val, Bool} = Val(false))
Fields
- `compile::Union{Val, Bool}`: whether to allow pre-recording and reusing a tape (which speeds up the differentiation process).
  - If `compile=false` or `compile=Val(false)`, a new tape must be recorded at every call to the differentiation operator.
  - If `compile=true` or `compile=Val(true)`, a tape can be pre-recorded on an example input and then reused at every differentiation call.

The boolean version of this keyword argument is taken as the type parameter.
Pre-recording a tape only captures the path taken by the differentiated function when executed on the example input. If said function has value-dependent branching behavior, reusing pre-recorded tapes can lead to incorrect results. In such situations, you should keep the default setting `compile=Val(false)`. For more details, please refer to ReverseDiff's `AbstractTape` API documentation.

Despite what its name may suggest, the `compile` setting does not prescribe whether or not the tape is compiled with `ReverseDiff.compile` after being recorded. This is left as a private implementation detail.
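For example, a branch-free objective such as the quadratic below is safe to pair with `compile = true`, while any function with value-dependent control flow is not. This is a sketch; the solver choice is illustrative, and whether the tape is actually reused may depend on the Optimization.jl version (see the note above):

```julia
using Optimization, OptimizationOptimJL, ReverseDiff

# No value-dependent branches on u, so reusing a recorded tape is safe
quad(u, p) = sum(abs2, u) + p[1] * sum(u)

optf = OptimizationFunction(quad, AutoReverseDiff(compile = true))
prob = OptimizationProblem(optf, ones(10), [0.5])
sol = solve(prob, BFGS())
```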
ADTypes.AutoZygote — Type

AutoZygote <: AbstractADType
An AbstractADType choice for use in OptimizationFunction for automatically generating the unspecified derivative functions. Usage:
OptimizationFunction(f, AutoZygote(); kwargs...)
This uses the Zygote.jl package. This is the staple reverse-mode AD that handles a large portion of Julia with good efficiency. Hessian construction is fast via forward-over-reverse, mixing ForwardDiff.jl with Zygote.jl.
- Compatible with GPUs
- Compatible with Hessian-based optimization via ForwardDiff
- Compatible with Hv-based optimization via ForwardDiff
- Not compatible with constraint functions
Note that only the unspecified derivative functions are defined. For example, if a `hess` function is supplied to the `OptimizationFunction`, then the Hessian is not defined via Zygote.
AutoZygote
Struct used to select the Zygote.jl backend for automatic differentiation.
Defined by ADTypes.jl.
Constructors
AutoZygote()
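A sketch of the kind of non-mutating, array-based objective Zygote handles well (the quadratic-form objective and named-tuple parameters are illustrative):

```julia
using Optimization, OptimizationOptimJL, Zygote

# Non-mutating, BLAS-style objective: a quadratic form plus a linear term
f(u, p) = u' * p.A * u + p.b' * u

p = (A = [2.0 0.0; 0.0 4.0], b = [1.0, 1.0])
optf = OptimizationFunction(f, AutoZygote())
prob = OptimizationProblem(optf, zeros(2), p)
sol = solve(prob, BFGS())
```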
ADTypes.AutoTracker — Type

AutoTracker <: AbstractADType
An AbstractADType choice for use in OptimizationFunction for automatically generating the unspecified derivative functions. Usage:
OptimizationFunction(f, AutoTracker(); kwargs...)
This uses the Tracker.jl package. It is generally slower than ReverseDiff, but it is applicable to many pure Julia codes.
- Compatible with GPUs
- Not compatible with Hessian-based optimization
- Not compatible with Hv-based optimization
- Not compatible with constraint functions
Note that only the unspecified derivative functions are defined. For example, if a `hess` function is supplied to the `OptimizationFunction`, then the Hessian is not defined via Tracker.
AutoTracker
Struct used to select the Tracker.jl backend for automatic differentiation.
Defined by ADTypes.jl.
Constructors
AutoTracker()
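A sketch restricted to gradient-based solving, since Hessian-based and Hv-based optimization are not supported (the objective and solver are illustrative):

```julia
using Optimization, OptimizationOptimJL, Tracker

f(u, p) = sum(abs2, u .- p)

optf = OptimizationFunction(f, AutoTracker())
prob = OptimizationProblem(optf, zeros(3), [1.0, 2.0, 3.0])

# First-order method only: Tracker supplies gradients, not Hessians
sol = solve(prob, GradientDescent())
```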
ADTypes.AutoModelingToolkit — Type

AutoModelingToolkit <: AbstractADType
An AbstractADType choice for use in OptimizationFunction for automatically generating the unspecified derivative functions. Usage:
OptimizationFunction(f, AutoModelingToolkit(); kwargs...)
This uses the ModelingToolkit.jl package's `modelingtoolkitize` functionality to generate the derivatives and other fields of an `OptimizationFunction`. This backend creates the symbolic expressions for the objective and its derivatives as well as the constraints and their derivatives. Through `structural_simplify`, it enforces simplifications that can reduce the number of operations needed to compute the derivatives of the constraints. It automatically generates the expression graphs that some solver interfaces accessed through OptimizationMOI, such as AmplNLWriter.jl, require.
- Compatible with GPUs
- Compatible with Hessian-based optimization
- Compatible with Hv-based optimization
- Compatible with constraints
Note that only the unspecified derivative functions are defined. For example, if a `hess` function is supplied to the `OptimizationFunction`, then the Hessian is not generated via ModelingToolkit.
Constructor
AutoModelingToolkit(false, false)
- `obj_sparse`: indicates whether the objective Hessian is sparse
- `cons_sparse`: indicates whether the constraints' Jacobian and Hessian are sparse
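A hedged sketch with a single constraint, where the symbolic pass also generates the constraint Jacobian and Hessian. The constraint, bounds, and IPNewton() solver are illustrative, and ModelingToolkit.jl is assumed to be installed:

```julia
using Optimization, OptimizationOptimJL, ModelingToolkit

rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

# One constraint written in-place; bounds come from lcons/ucons below
cons(res, u, p) = (res .= [u[1]^2 + u[2]^2])

optf = OptimizationFunction(rosenbrock, AutoModelingToolkit(); cons = cons)
prob = OptimizationProblem(optf, zeros(2), [1.0, 100.0];
                           lcons = [-Inf], ucons = [1.0])
sol = solve(prob, IPNewton())
```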
ADTypes.AutoEnzyme — Type

AutoEnzyme <: AbstractADType
An AbstractADType choice for use in OptimizationFunction for automatically generating the unspecified derivative functions. Usage:
OptimizationFunction(f, AutoEnzyme(); kwargs...)
This uses the Enzyme.jl package. Enzyme performs automatic differentiation on the LLVM IR code generated from Julia. It is highly efficient, and its ability to perform AD on optimized code allows Enzyme to meet or exceed the performance of state-of-the-art AD tools.
- Compatible with GPUs
- Compatible with Hessian-based optimization
- Compatible with Hv-based optimization
- Compatible with constraints
Note that only the unspecified derivative functions are defined. For example, if a `hess` function is supplied to the `OptimizationFunction`, then the Hessian is not defined via Enzyme.
AutoEnzyme{M,A}
Struct used to select the Enzyme.jl backend for automatic differentiation.
Defined by ADTypes.jl.
Constructors
AutoEnzyme(; mode::M=nothing, function_annotation::Type{A}=Nothing)
Type parameters
- `A` determines how the function `f` to differentiate is passed to Enzyme. It can be:
  - a subtype of `EnzymeCore.Annotation` (like `EnzymeCore.Const` or `EnzymeCore.Duplicated`) to enforce a given annotation
  - `Nothing` to simply pass `f` and let Enzyme choose the most appropriate annotation
Fields
- `mode::M` determines the autodiff mode (forward or reverse). It can be:
  - an object subtyping `EnzymeCore.Mode` (like `EnzymeCore.Forward` or `EnzymeCore.Reverse`) if a specific mode is required
  - `nothing` to choose the best mode automatically
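A sketch using the default automatic mode and annotation selection; a specific mode object could instead be passed via the `mode` keyword. Enzyme.jl is assumed to be installed:

```julia
using Optimization, OptimizationOptimJL, Enzyme

rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

# mode and function_annotation are left at their defaults, letting
# Enzyme pick the most appropriate mode and annotation itself
optf = OptimizationFunction(rosenbrock, AutoEnzyme())
prob = OptimizationProblem(optf, zeros(2), [1.0, 100.0])
sol = solve(prob, BFGS())
```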
ADTypes.AutoMooncake — Type

AutoMooncake
Struct used to select the Mooncake.jl backend for automatic differentiation in reverse mode.
Defined by ADTypes.jl.
When forward mode became available in Mooncake.jl v0.4.147, another struct called `AutoMooncakeForward` was introduced. It was kept separate to avoid a breaking release of ADTypes.jl. `AutoMooncake` remains for reverse mode only.
Constructors
AutoMooncake(; config=nothing)
Fields
- `config`: either `nothing` or an instance of `Mooncake.Config`; see the docstring of `Mooncake.Config` for more information. `AutoMooncake(; config=nothing)` is equivalent to `AutoMooncake(; config=Mooncake.Config())`, i.e. the default configuration.
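A sketch with the default configuration (Mooncake.jl is assumed to be installed; the objective and solver are illustrative):

```julia
using Optimization, OptimizationOptimJL, Mooncake

rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

# config = nothing is equivalent to passing Mooncake.Config()
optf = OptimizationFunction(rosenbrock, AutoMooncake(; config = nothing))
prob = OptimizationProblem(optf, zeros(2), [1.0, 100.0])
sol = solve(prob, BFGS())
```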