# Sensitivity Algorithms for Differential Equations with Automatic Differentiation (AD)

SciMLSensitivity.jl's high level interface allows for specifying a sensitivity algorithm (sensealg) to control the method by which solve is differentiated in an automatic differentiation (AD) context by a compatible AD library. The underlying algorithms then use the direct interface methods, like ODEForwardSensitivityProblem and adjoint_sensitivities, to compute the derivatives without requiring the user to do any of the setup.

Current AD libraries whose calls are captured by the sensitivity system are:

## Using and Controlling Sensitivity Algorithms within AD

Take for example this simple differential equation solve on Lotka-Volterra:

using SciMLSensitivity, OrdinaryDiffEq, Zygote

function fiip(du,u,p,t)
du[1] = dx = p[1]*u[1] - p[2]*u[1]*u[2]
du[2] = dy = -p[3]*u[2] + p[4]*u[1]*u[2]
end
p = [1.5,1.0,3.0,1.0]; u0 = [1.0;1.0]
prob = ODEProblem(fiip,u0,(0.0,10.0),p)
sol = solve(prob,Tsit5())
loss(u0,p) = sum(solve(prob,Tsit5(),u0=u0,p=p,saveat=0.1))
du0,dp = Zygote.gradient(loss,u0,p)

This will compute the gradient of the loss function "sum of the values of the solution to the ODE at timepoints dt=0.1" using an adjoint method, where du0 is the derivative of the loss function with respect to the initial condition and dp is the derivative of the loss function with respect to the parameters.

Because the gradient is calculated by Zygote.gradient and Zygote.jl is one of the compatible AD libraries, this derivative calculation will be captured by the sensealg system, and one of SciMLSensitivity.jl's adjoint overloads will be used to compute the derivative. By default, if the sensealg keyword argument is not defined, then a smart polyalgorithm is used to automatically determine the most appropriate method for a given equation.

Likewise, the sensealg argument can be given to directly control the method by which the derivative is computed. For example:

loss(u0,p) = sum(solve(prob,Tsit5(),u0=u0,p=p,saveat=0.1))
du0,dp = Zygote.gradient(loss,u0,p)

## Choosing a Sensitivity Algorithm

There are two classes of algorithms: the continuous sensitivity analysis methods, and the discrete sensitivity analysis methods (direct automatic differentiation). Generally:

For an analysis of which methods will be most efficient for computing the solution derivatives for a given problem, consult our analysis in this arxiv paper. A general rule of thumb is:

• ForwardDiffSensitivity is the fastest for differential equations with small numbers of parameters (<100) and can be used on any differential equation solver that is native Julia. If the chosen ODE solver is not compatible with direct automatic differentiation, ForwardSensitivty may be used instead.
• Adjoint senstivity analysis is the fastest when the number of parameters is sufficiently large. There are three configurations of note. Using QuadratureAdjoint is the fastest but uses the most memory, BacksolveAdjoint uses the least memory but on very stiff problems it may be unstable and require a lot of checkpoints, while InterpolatingAdjoint is in the middle, allowing checkpointing to control total memory use.
• The methods which use direct automatic differentiation (ReverseDiffAdjoint, TrackerAdjoint, ForwardDiffSensitivity, and ZygoteAdjoint) support the full range of DifferentialEquations.jl features (SDEs, DDEs, events, etc.), but only work on native Julia solvers.
• For non-ODEs with large numbers of parameters, TrackerAdjoint in out-of-place form may be the best performer on GPUs, and ReverseDiffAdjoint
• TrackerAdjoint is able to use a TrackedArray form with out-of-place functions du = f(u,p,t) but requires an Array{TrackedReal} form for f(du,u,p,t) mutating du. The latter has much more overhead, and should be avoided if possible. Thus if solving non-ODEs with lots of parameters, using TrackerAdjoint with an out-of-place definition may be the current best option.
Note

Compatibility with direct automatic differentiation algorithms (ForwardDiffSensitivity, ReverseDiffAdjoint, etc.) can be queried using the SciMLBase.isautodifferentiable(::SciMLAlgorithm) trait function.

If the chosen algorithm is a continuous sensitivity analysis algorithm, then an autojacvec argument can be given for choosing how the Jacobian-vector product (J*v) or vector-Jacobian product (J'*v) calculation is computed. For the forward sensitivity methods, autojacvec=true is the most efficient, though autojacvec=false is slightly less accurate but very close in efficiency. For adjoint methods it's more complicated and dependent on the way that the user's f function is implemented:

• EnzymeVJP() is the most efficient if it's applicable on your equation.
• If your function has no branching (no if statements) but uses mutation, ReverseDiffVJP(true) will be the most efficient after Enzyme. Otherwise ReverseDiffVJP(), but you may wish to proceed with eliminating mutation as without compilation enabled this can be slow.
• If your on the CPU or GPU and your function is very vectorized and has no mutation, choose ZygoteVJP().
• Else fallback to TrackerVJP() if Zygote does not support the function.

## Special Notes on Non-ODE Differential Equation Problems

While all of the choices are compatible with ordinary differential equations, specific notices apply to other forms:

### Differential-Algebraic Equations

We note that while all 3 are compatible with index-1 DAEs via the derivation in the universal differential equations paper (note the reinitialization), we do not recommend BacksolveAdjoint one DAEs because the stiffness inherent in these problems tends to cause major difficulties with the accuracy of the backwards solution due to reinitialization of the algebraic variables.

### Stochastic Differential Equations

We note that all of the adjoints except QuadratureAdjoint are applicable to stochastic differential equations.

### Delay Differential Equations

We note that only the discretize-then-optimize methods are applicable to delay differential equations. Constant lag and variable lag delay differential equation parameters can be estimated, but the lag times themselves are unable to be estimated through these automatic differentiation techniques.

### Hybrid Equations (Equations with events/callbacks) and Jump Equations

ForwardDiffSensitivity can differentiate code with callbacks when convert_tspan=true. ForwardSensitivity is not compatible with hybrid equations. The shadowing methods are not compatible with callbacks. All methods based on discrete adjoint sensitivity analysis via automatic differentiation, like ReverseDiffAdjoint, TrackerAdjoint, or QuadratureAdjoint are fully compatible with events. This applies to ODEs, SDEs, DAEs, and DDEs. The continuous adjoint sensitivities BacksolveAdjoint, InterpolatingAdjoint, and QuadratureAdjoint are compatible with events for ODEs. BacksolveAdjoint and InterpolatingAdjoint can also handle events for SDEs. Use BacksolveAdjoint if the event terminates the time evolution and several states are saved. Currently, the continuous adjoint sensitivities do not support multiple events per time point.

## Manual VJPs

Note that when defining your differential equation the vjp can be manually overwritten by providing the AbstractSciMLFunction definition with a vjp(u,p,t) that returns a tuple f(u,p,t),v->J*v in the form of ChainRules.jl. When this is done, the choice of ZygoteVJP will utilize your VJP function during the internal steps of the adjoint. This is useful for models where automatic differentiation may have trouble producing optimal code. This can be paired with ModelingToolkit.jl for producing hyper-optimized, sparse, and parallel VJP functions utilizing the automated symbolic conversions.

## Sensitivity Algorithms

The following algorithm choices exist for sensealg. See the sensitivity mathematics page for more details on the definition of the methods.

SciMLSensitivity.ForwardSensitivityType
ForwardSensitivity{CS,AD,FDT} <: AbstractForwardSensitivityAlgorithm{CS,AD,FDT}

An implementation of continuous forward sensitivity analysis for propagating derivatives by solving the extended ODE. When used within adjoint differentiation (i.e. via Zygote), this will cause forward differentiation of the solve call within the reverse-mode automatic differentiation environment.

Constructor

function ForwardSensitivity(;
chunk_size=0,autodiff=true,
diff_type=Val{:central},
autojacvec=autodiff,
autojacmat=false)

Keyword Arguments

• autodiff: Use automatic differentiation in the internal sensitivity algorithm computations. Default is true.
• chunk_size: Chunk size for forward mode differentiation if full Jacobians are built (autojacvec=false and autodiff=true). Default is 0 for automatic choice of chunk size.
• autojacvec: Calculate the Jacobian-vector product via automatic differentiation with special seeding.
• diff_type: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with autodiff=false.

Further details:

• If autodiff=true and autojacvec=true, then the one chunk J*v forward-mode directional derivative calculation trick is used to compute the product without constructing the Jacobian (via ForwardDiff.jl).
• If autodiff=false and autojacvec=true, then the numerical direction derivative trick (f(x+epsilon*v)-f(x))/epsilon is used to compute J*v without constructing the Jacobian.
• If autodiff=true and autojacvec=false, then the Jacobian is constructed via chunked forward-mode automatic differentiation (via ForwardDiff.jl).
• If autodiff=false and autojacvec=false, then the Jacobian is constructed via finite differences via FiniteDiff.jl.

SciMLProblem Support

This sensealg only supports ODEProblems without callbacks (events).

SciMLSensitivity.ForwardDiffSensitivityType
ForwardDiffSensitivity{CS,CTS} <: AbstractForwardSensitivityAlgorithm{CS,Nothing,Nothing}

An implementation of discrete forward sensitivity analysis through ForwardDiff.jl. When used within adjoint differentiation (i.e. via Zygote), this will cause forward differentiation of the solve call within the reverse-mode automatic differentiation environment.

Constructor

ForwardDiffSensitivity(;chunk_size=0,convert_tspan=nothing)

Keyword Arguments

• chunk_size: the chunk size used by ForwardDiff for computing the Jacobian, i.e. the number of simultaneous columns computed.
• convert_tspan: whether to convert time to also be Dual valued. By default this is nothing which will only convert if callbacks are found. Conversion is required in order to accurately differentiate callbacks (hybrid equations).

SciMLProblem Support

This sensealg supports any SciMLProblems, provided that the solver algorithms is SciMLBase.isautodifferentiable. Note that ForwardDiffSensitivity can accurately differentiate code with callbacks only when convert_tspan=true.

SciMLSensitivity.BacksolveAdjointType
BacksolveAdjoint{CS,AD,FDT,VJP} <: AbstractAdjointSensitivityAlgorithm{CS,AD,FDT}

An implementation of adjoint sensitivity analysis using a backwards solution of the ODE. By default this algorithm will use the values from the forward pass to perturb the backwards solution to the correct spot, allowing reduced memory (O(1) memory). Checkpointing stabilization is included for additional numerical stability over the naive implementation.

Constructor

BacksolveAdjoint(;chunk_size=0,autodiff=true,
diff_type=Val{:central},
autojacvec=nothing,
checkpointing=true, noisemixing=false)

Keyword Arguments

• autodiff: Use automatic differentiation for constructing the Jacobian if the Jacobian needs to be constructed. Defaults to true.
• chunk_size: Chunk size for forward-mode differentiation if full Jacobians are built (autojacvec=false and autodiff=true). Default is 0 for automatic choice of chunk size.
• diff_type: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with autodiff=false.
• autojacvec: Calculate the vector-Jacobian product (J'*v) via automatic differentiation with special seeding. The default is true. The total set of choices are:
• false: the Jacobian is constructed via FiniteDiff.jl
• true: the Jacobian is constructed via ForwardDiff.jl
• TrackerVJP: Uses Tracker.jl for the vjp.
• ZygoteVJP: Uses Zygote.jl for the vjp.
• EnzymeVJP: Uses Enzyme.jl for the vjp.
• ReverseDiffVJP(compile=false): Uses ReverseDiff.jl for the vjp. compile is a boolean for whether to precompile the tape, which should only be done if there are no branches (if or while statements) in the f function.
• checkpointing: whether checkpointing is enabled for the reverse pass. Defaults to true.
• noisemixing: Handle noise processes that are not of the form du[i] = f(u[i]). For example, to compute the sensitivities of an SDE with diagonal diffusion
function g_mixing!(du,u,p,t)
du[1] = p[3]*u[1] + p[4]*u[2]
du[2] = p[3]*u[1] + p[4]*u[2]
nothing
end
correctly, noisemixing=true must be enabled. The default is false.

For more details on the vjp choices, please consult the sensitivity algorithms documentation page or the docstrings of the vjp types.

Applicability of Backsolve and Caution

When BacksolveAdjoint is applicable, it is a fast method and requires the least memory. However, one must be cautious because not all ODEs are stable under backwards integration by the majority of ODE solvers. An example of such an equation is the Lorenz equation. Notice that if one solves the Lorenz equation forward and then in reverse with any adaptive time step and non-reversible integrator, then the backwards solution diverges from the forward solution. As a quick demonstration:

using Sundials
function lorenz(du,u,p,t)
du[1] = 10.0*(u[2]-u[1])
du[2] = u[1]*(28.0-u[3]) - u[2]
du[3] = u[1]*u[2] - (8/3)*u[3]
end
u0 = [1.0;0.0;0.0]
tspan = (0.0,100.0)
prob = ODEProblem(lorenz,u0,tspan)
sol = solve(prob,Tsit5(),reltol=1e-12,abstol=1e-12)
prob2 = ODEProblem(lorenz,sol[end],(100.0,0.0))
sol = solve(prob,Tsit5(),reltol=1e-12,abstol=1e-12)
@show sol[end]-u0 #[-3.22091, -1.49394, 21.3435]

Thus one should check the stability of the backsolve on their type of problem before enabling this method. Additionally, using checkpointing with backsolve can be a low memory way to stabilize it.

For more details on this topic, see Stiff Neural Ordinary Differential Equations.

Checkpointing

To improve the numerical stability of the reverse pass, BacksolveAdjoint includes a checkpointing feature. If sol.u is a time series, then whenever a time sol.t is hit while reversing, a callback will replace the reversing ODE portion with sol.u[i]. This nudges the solution back onto the appropriate trajectory and reduces the numerical caused by drift.

SciMLProblem Support

This sensealg only supports ODEProblems, SDEProblems, and RODEProblems. This sensealg supports callback functions (events).

References

ODE: Rackauckas, C. and Ma, Y. and Martensen, J. and Warner, C. and Zubov, K. and Supekar, R. and Skinner, D. and Ramadhana, A. and Edelman, A., Universal Differential Equations for Scientific Machine Learning, arXiv:2001.04385

Hindmarsh, A. C. and Brown, P. N. and Grant, K. E. and Lee, S. L. and Serban, R. and Shumaker, D. E. and Woodward, C. S., SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers, ACM Transactions on Mathematical Software (TOMS), 31, pp:363–396 (2005)

Chen, R.T.Q. and Rubanova, Y. and Bettencourt, J. and Duvenaud, D. K., Neural ordinary differential equations. In Advances in neural information processing systems, pp. 6571–6583 (2018)

Pontryagin, L. S. and Mishchenko, E.F. and Boltyanskii, V.G. and Gamkrelidze, R.V. The mathematical theory of optimal processes. Routledge, (1962)

Rackauckas, C. and Ma, Y. and Dixit, V. and Guo, X. and Innes, M. and Revels, J. and Nyberg, J. and Ivaturi, V., A comparison of automatic differentiation and continuous sensitivity analysis for derivatives of differential equation solutions, arXiv:1812.01892

DAE: Cao, Y. and Li, S. and Petzold, L. and Serban, R., Adjoint sensitivity analysis for differential-algebraic equations: The adjoint DAE system and its numerical solution, SIAM journal on scientific computing 24 pp: 1076-1089 (2003)

SDE: Gobet, E. and Munos, R., Sensitivity Analysis Using Ito-Malliavin Calculus and Martingales, and Application to Stochastic Optimal Control, SIAM Journal on control and optimization, 43, pp. 1676-1713 (2005)

Li, X. and Wong, T.-K. L.and Chen, R. T. Q. and Duvenaud, D., Scalable Gradients for Stochastic Differential Equations, PMLR 108, pp. 3870-3882 (2020), http://proceedings.mlr.press/v108/li20i.html

SciMLSensitivity.InterpolatingAdjointType
InterpolatingAdjoint{CS,AD,FDT,VJP} <: AbstractAdjointSensitivityAlgorithm{CS,AD,FDT}

An implementation of adjoint sensitivity analysis which uses the interpolation of the forward solution for the reverse solve vector-Jacobian products. By default it requires a dense solution of the forward pass and will internally ignore saving arguments during the gradient calculation. When checkpointing is enabled it will only require the memory to interpolate between checkpoints.

Constructor

function InterpolatingAdjoint(;chunk_size=0,autodiff=true,
diff_type=Val{:central},
autojacvec=nothing,
checkpointing=false, noisemixing=false)

Keyword Arguments

• autodiff: Use automatic differentiation for constructing the Jacobian if the Jacobian needs to be constructed. Defaults to true.
• chunk_size: Chunk size for forward-mode differentiation if full Jacobians are built (autojacvec=false and autodiff=true). Default is 0 for automatic choice of chunk size.
• diff_type: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with autodiff=false.
• autojacvec: Calculate the vector-Jacobian product (J'*v) via automatic differentiation with special seeding. The default is true. The total set of choices are:
• false: the Jacobian is constructed via FiniteDiff.jl
• true: the Jacobian is constructed via ForwardDiff.jl
• TrackerVJP: Uses Tracker.jl for the vjp.
• ZygoteVJP: Uses Zygote.jl for the vjp.
• EnzymeVJP: Uses Enzyme.jl for the vjp.
• ReverseDiffVJP(compile=false): Uses ReverseDiff.jl for the vjp. compile is a boolean for whether to precompile the tape, which should only be done if there are no branches (if or while statements) in the f function.
• checkpointing: whether checkpointing is enabled for the reverse pass. Defaults to false.
• noisemixing: Handle noise processes that are not of the form du[i] = f(u[i]). For example, to compute the sensitivities of an SDE with diagonal diffusion
function g_mixing!(du,u,p,t)
du[1] = p[3]*u[1] + p[4]*u[2]
du[2] = p[3]*u[1] + p[4]*u[2]
nothing
end
correctly, noisemixing=true must be enabled. The default is false.

For more details on the vjp choices, please consult the sensitivity algorithms documentation page or the docstrings of the vjp types.

Checkpointing

To reduce the memory usage of the reverse pass, InterpolatingAdjoint includes a checkpointing feature. If sol is dense, checkpointing is ignored and the continuous solution is used for calculating u(t) at arbitrary time points. If checkpointing=true and sol is not dense, then dense intervals between sol.t[i] and sol.t[i+1] are reconstructed on-demand for calculating u(t) at arbitrary time points. This reduces the total memory requirement to only the cost of holding the dense solution over the largest time interval (in terms of number of required steps). The total compute cost is no more than double the original forward compute cost.

SciMLProblem Support

This sensealg only supports ODEProblems, SDEProblems, and RODEProblems. This sensealg supports callbacks (events).

References

Rackauckas, C. and Ma, Y. and Martensen, J. and Warner, C. and Zubov, K. and Supekar, R. and Skinner, D. and Ramadhana, A. and Edelman, A., Universal Differential Equations for Scientific Machine Learning, arXiv:2001.04385

Hindmarsh, A. C. and Brown, P. N. and Grant, K. E. and Lee, S. L. and Serban, R. and Shumaker, D. E. and Woodward, C. S., SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers, ACM Transactions on Mathematical Software (TOMS), 31, pp:363–396 (2005)

Rackauckas, C. and Ma, Y. and Dixit, V. and Guo, X. and Innes, M. and Revels, J. and Nyberg, J. and Ivaturi, V., A comparison of automatic differentiation and continuous sensitivity analysis for derivatives of differential equation solutions, arXiv:1812.01892

SciMLSensitivity.QuadratureAdjointType
QuadratureAdjoint{CS,AD,FDT,VJP} <: AbstractAdjointSensitivityAlgorithm{CS,AD,FDT}

An implementation of adjoint sensitivity analysis which develops a full continuous solution of the reverse solve in order to perform a post-ODE quadrature. This method requires the the dense solution and will ignore saving arguments during the gradient calculation. The tolerances in the constructor control the inner quadrature. The inner quadrature uses a ReverseDiff vjp if autojacvec, and compile=false by default but can compile the tape under the same circumstances as ReverseDiffVJP.

This method is O(n^3 + p) for stiff / implicit equations (as opposed to the O((n+p)^3) scaling of BacksolveAdjoint and InterpolatingAdjoint), and thus is much more compute efficient. However, it requires holding a dense reverse pass and is thus memory intensive.

Constructor

function QuadratureAdjoint(;chunk_size=0,autodiff=true,
diff_type=Val{:central},
autojacvec=nothing,abstol=1e-6,
reltol=1e-3,compile=false)

Keyword Arguments

• autodiff: Use automatic differentiation for constructing the Jacobian if the Jacobian needs to be constructed. Defaults to true.
• chunk_size: Chunk size for forward-mode differentiation if full Jacobians are built (autojacvec=false and autodiff=true). Default is 0 for automatic choice of chunk size.
• diff_type: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with autodiff=false.
• autojacvec: Calculate the vector-Jacobian product (J'*v) via automatic differentiation with special seeding. The default is true. The total set of choices are:
• false: the Jacobian is constructed via FiniteDiff.jl
• true: the Jacobian is constructed via ForwardDiff.jl
• TrackerVJP: Uses Tracker.jl for the vjp.
• ZygoteVJP: Uses Zygote.jl for the vjp.
• EnzymeVJP: Uses Enzyme.jl for the vjp.
• ReverseDiffVJP(compile=false): Uses ReverseDiff.jl for the vjp. compile is a boolean for whether to precompile the tape, which should only be done if there are no branches (if or while statements) in the f function.
• abstol: absolute tolerance for the quadrature calculation
• reltol: relative tolerance for the quadrature calculation
• compile: whether to compile the vjp calculation for the integrand calculation. See ReverseDiffVJP for more details.

For more details on the vjp choices, please consult the sensitivity algorithms documentation page or the docstrings of the vjp types.

SciMLProblem Support

This sensealg only supports ODEProblems. This sensealg supports events (callbacks).

References

Rackauckas, C. and Ma, Y. and Martensen, J. and Warner, C. and Zubov, K. and Supekar, R. and Skinner, D. and Ramadhana, A. and Edelman, A., Universal Differential Equations for Scientific Machine Learning, arXiv:2001.04385

Hindmarsh, A. C. and Brown, P. N. and Grant, K. E. and Lee, S. L. and Serban, R. and Shumaker, D. E. and Woodward, C. S., SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers, ACM Transactions on Mathematical Software (TOMS), 31, pp:363–396 (2005)

Rackauckas, C. and Ma, Y. and Dixit, V. and Guo, X. and Innes, M. and Revels, J. and Nyberg, J. and Ivaturi, V., A comparison of automatic differentiation and continuous sensitivity analysis for derivatives of differential equation solutions, arXiv:1812.01892

Kim, S., Ji, W., Deng, S., Ma, Y., & Rackauckas, C. (2021). Stiff neural ordinary differential equations. Chaos: An Interdisciplinary Journal of Nonlinear Science, 31(9), 093122.

SciMLSensitivity.ReverseDiffAdjointType
ReverseDiffAdjoint <: AbstractAdjointSensitivityAlgorithm{nothing,true,nothing}

An implementation of discrete adjoint sensitivity analysis using the ReverseDiff.jl tracing-based AD. Supports in-place functions through an Array of Structs formulation, and supports out of place through struct of arrays.

Constructor

ReverseDiffAdjoint()

SciMLProblem Support

This sensealg supports any DEProblem if the algorithm is SciMLBase.isautodifferentiable. Requires that the state variables are CPU-based Array types.

SciMLSensitivity.TrackerAdjointType
TrackerAdjoint <: AbstractAdjointSensitivityAlgorithm{nothing,true,nothing}

An implementation of discrete adjoint sensitivity analysis using the Tracker.jl tracing-based AD. Supports in-place functions through an Array of Structs formulation, and supports out of place through struct of arrays.

Constructor

TrackerAdjoint()

SciMLProblem Support

This sensealg supports any DEProblem if the algorithm is SciMLBase.isautodifferentiable Compatible with a limited subset of AbstractArray types for u0, including CuArrays.

Warn

TrackerAdjoint is incompatible with Stiff ODE solvers using forward-mode automatic differentiation for the Jacobians. Thus for example, TRBDF2() will error. Instead, use autodiff=false, i.e. TRBDF2(autodiff=false). This will only remove the forward-mode automatic differentiation of the Jacobian construction, not the reverse-mode AD usage, and thus performance will still be nearly the same, though Jacobian accuracy may suffer which could cause more steps to be required.

SciMLSensitivity.ZygoteAdjointType

An implementation of discrete adjoint sensitivity analysis using the Zygote.jl source-to-source AD directly on the differential equation solver.

Constructor

ZygoteAdjoint()

SciMLProblem Support

Currently fails on almost every solver.

SciMLSensitivity.ForwardLSSType
ForwardLSS{CS,AD,FDT,RType,gType} <: AbstractShadowingSensitivityAlgorithm{CS,AD,FDT}

An implementation of the discrete, forward-mode least squares shadowing (LSS) method. LSS replaces the ill-conditioned initial value probem (ODEProblem) for chaotic systems by a well-conditioned least-squares problem. This allows for computing sensitivities of long-time averaged quantities with respect to the parameters of the ODEProblem. The computational cost of LSS scales as (number of states x number of time steps). Converges to the correct sensitivity at a rate of T^(-1/2), where T is the time of the trajectory. See NILSS() and NILSAS() for a more efficient non-intrusive formulation.

Constructor

ForwardLSS(;
chunk_size=0,autodiff=true,
diff_type=Val{:central},
LSSregularizer=TimeDilation(10.0,0.0,0.0),
g=nothing)

Keyword Arguments

• autodiff: Use automatic differentiation for constructing the Jacobian if the Jacobian needs to be constructed. Defaults to true.
• chunk_size: Chunk size for forward-mode differentiation if full Jacobians are built (autojacvec=false and autodiff=true). Default is 0 for automatic choice of chunk size.
• diff_type: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with autodiff=false.
• LSSregularizer: Using LSSregularizer, one can choose between three different regularization routines. The default choice is TimeDilation(10.0,0.0,0.0).
• CosWindowing(): cos windowing of the time grid, i.e. the time grid (saved time steps) is transformed using a cosine.
• Cos2Windowing(): cos^2 windowing of the time grid.
• TimeDilation(alpha::Number,t0skip::Number,t1skip::Number): Corresponds to a time dilation. alpha controls the weight. t0skip and t1skip indicate the times truncated at the beginnning and end of the trajectory, respectively.
• g: instantaneous objective function of the long-time averaged objective.

SciMLProblem Support

This sensealg only supports ODEProblems. This sensealg does not support events (callbacks). This sensealg assumes that the objective is a long-time averaged quantity and ergodic, i.e. the time evolution of the system behaves qualitatively the same over infinite time independent of the specified initial conditions, such that only the sensitivity with respect to the parameters is of interest.

References

Wang, Q., Hu, R., and Blonigan, P. Least squares shadowing sensitivity analysis of chaotic limit cycle oscillations. Journal of Computational Physics, 267, 210-224 (2014).

Wang, Q., Convergence of the Least Squares Shadowing Method for Computing Derivative of Ergodic Averages, SIAM Journal on Numerical Analysis, 52, 156–170 (2014).

Blonigan, P., Gomez, S., Wang, Q., Least Squares Shadowing for sensitivity analysis of turbulent fluid flows, in: 52nd Aerospace Sciences Meeting, 1–24 (2014).

SciMLSensitivity.AdjointLSSType
AdjointLSS{CS,AD,FDT,RType,gType} <: AbstractShadowingSensitivityAlgorithm{CS,AD,FDT}

An implementation of the discrete, adjoint-mode least square shadowing method. LSS replaces the ill-conditioned initial value probem (ODEProblem) for chaotic systems by a well-conditioned least-squares problem. This allows for computing sensitivities of long-time averaged quantities with respect to the parameters of the ODEProblem. The computational cost of LSS scales as (number of states x number of time steps). Converges to the correct sensitivity at a rate of T^(-1/2), where T is the time of the trajectory. See NILSS() and NILSAS() for a more efficient non-intrusive formulation.

Constructor

AdjointLSS(;
chunk_size=0,autodiff=true,
diff_type=Val{:central},
LSSRegularizer=TimeDilation(10.0,0.0,0.0),
g=nothing)

Keyword Arguments

• autodiff: Use automatic differentiation for constructing the Jacobian if the Jacobian needs to be constructed. Defaults to true.
• chunk_size: Chunk size for forward-mode differentiation if full Jacobians are built (autojacvec=false and autodiff=true). Default is 0 for automatic choice of chunk size.
• diff_type: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with autodiff=false.
• LSSregularizer: Using LSSregularizer, one can choose between different regularization routines. The default choice is TimeDilation(10.0,0.0,0.0).
• TimeDilation(alpha::Number,t0skip::Number,t1skip::Number): Corresponds to a time dilation. alpha controls the weight. t0skip and t1skip indicate the times truncated at the beginnning and end of the trajectory, respectively. The default value for t0skip and t1skip is zero(alpha).
• g: instantaneous objective function of the long-time averaged objective.

SciMLProblem Support

This sensealg only supports ODEProblems. This sensealg does not support events (callbacks). This sensealg assumes that the objective is a long-time averaged quantity and ergodic, i.e. the time evolution of the system behaves qualitatively the same over infinite time independent of the specified initial conditions, such that only the sensitivity with respect to the parameters is of interest.

References

Wang, Q., Hu, R., and Blonigan, P. Least squares shadowing sensitivity analysis of chaotic limit cycle oscillations. Journal of Computational Physics, 267, 210-224 (2014).

SciMLSensitivity.NILSSType
struct NILSS{CS,AD,FDT,RNG,nType,gType} <: AbstractShadowingSensitivityAlgorithm{CS,AD,FDT}

An implementation of the forward-mode, continuous non-intrusive least squares shadowing method. NILSS allows for computing sensitivities of long-time averaged quantities with respect to the parameters of an ODEProblem by constraining the computation to the unstable subspace. NILSS employs the continuous-time ForwardSensitivity method as tangent solver. To avoid an exponential blow-up of the (homogenous and inhomogenous) tangent solutions, the trajectory should be divided into sufficiently small segments, where the tangent solutions are rescaled on the interfaces. The computational and memory cost of NILSS scale with the number of unstable (positive) Lyapunov exponents (instead of the number of states as in the LSS method). NILSS avoids the explicit construction of the Jacobian at each time step and thus should generally be preferred (for large system sizes) over ForwardLSS.

Constructor

NILSS(nseg, nstep; nus = nothing,
rng = Xorshifts.Xoroshiro128Plus(rand(UInt64)),
chunk_size=0,autodiff=true,
diff_type=Val{:central},
autojacvec=autodiff,
g=nothing)

Arguments

• nseg: Number of segments on full time interval on the attractor.
• nstep: number of steps on each segment.

Keyword Arguments

• nus: Dimension of the unstable subspace. Default is nothing. nus must be smaller or equal to the state dimension (length(u0)). With the default choice, nus = length(u0) - 1 will be set at compile time.
• rng: (Pseudo) random number generator. Used for initializing the homogenous tangent states (w). Default is Xorshifts.Xoroshiro128Plus(rand(UInt64)).
• autodiff: Use automatic differentiation in the internal sensitivity algorithm computations. Default is true.
• chunk_size: Chunk size for forward mode differentiation if full Jacobians are built (autojacvec=false and autodiff=true). Default is 0 for automatic choice of chunk size.
• autojacvec: Calculate the Jacobian-vector product via automatic differentiation with special seeding.
• diff_type: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with autodiff=false.
• g: instantaneous objective function of the long-time averaged objective.

SciMLProblem Support

This sensealg only supports ODEProblems. This sensealg does not support events (callbacks). This sensealg assumes that the objective is a long-time averaged quantity and ergodic, i.e. the time evolution of the system behaves qualitatively the same over infinite time independent of the specified initial conditions, such that only the sensitivity with respect to the parameters is of interest.

References

Ni, A., Blonigan, P. J., Chater, M., Wang, Q., Zhang, Z., Sensitivity analy- sis on chaotic dynamical system by Non-Intrusive Least Square Shadowing (NI-LSS), in: 46th AIAA Fluid Dynamics Conference, AIAA AVIATION Forum (AIAA 2016-4399), American Institute of Aeronautics and Astronautics, 1–16 (2016).

Ni, A., and Wang, Q. Sensitivity analysis on chaotic dynamical systems by Non-Intrusive Least Squares Shadowing (NILSS). Journal of Computational Physics 347, 56-77 (2017).

SciMLSensitivity.NILSASType
NILSAS{CS,AD,FDT,RNG,SENSE,gType} <: AbstractShadowingSensitivityAlgorithm{CS,AD,FDT}

An implementation of the adjoint-mode, continuous non-intrusive adjoint least squares shadowing method. NILSAS allows for computing sensitivities of long-time averaged quantities with respect to the parameters of an ODEProblem by constraining the computation to the unstable subspace. NILSAS employs SciMLSensitivity.jl's continuous adjoint sensitivity methods on each segment to compute (homogenous and inhomogenous) adjoint solutions. To avoid an exponential blow-up of the adjoint solutions, the trajectory should be divided into sufficiently small segments, where the adjoint solutions are rescaled on the interfaces. The computational and memory cost of NILSAS scale with the number of unstable, adjoint Lyapunov exponents (instead of the number of states as in the LSS method). NILSAS avoids the explicit construction of the Jacobian at each time step and thus should generally be preferred (for large system sizes) over AdjointLSS. NILSAS is favourable over NILSS for many parameters because NILSAS computes the gradient with respect to multiple parameters with negligible additional cost.

Constructor

NILSAS(nseg, nstep, M=nothing; rng = Xorshifts.Xoroshiro128Plus(rand(UInt64)),
chunk_size=0,autodiff=true,
diff_type=Val{:central},
g=nothing
)

Arguments

• nseg: Number of segments on full time interval on the attractor.
• nstep: number of steps on each segment.
• M: number of homogenous adjoint solutions. This number must be bigger or equal than the number of (positive, adjoint) Lyapunov exponents. Default is nothing.

Keyword Arguments

• rng: (Pseudo) random number generator. Used for initializing the terminate conditions of the homogenous adjoint states (w). Default is Xorshifts.Xoroshiro128Plus(rand(UInt64)).
• adjoint_sensealg: Continuous adjoint sensitivity method to compute homogenous and inhomogenous adjoint solutions on each segment. Default is BacksolveAdjoint(autojacvec=ReverseDiffVJP()).
• autojacvec: Calculate the vector-Jacobian product (J'*v) via automatic
differentiation with special seeding. The default is true. The total set of choices are:
• false: the Jacobian is constructed via FiniteDiff.jl
• true: the Jacobian is constructed via ForwardDiff.jl
• TrackerVJP: Uses Tracker.jl for the vjp.
• ZygoteVJP: Uses Zygote.jl for the vjp.
• EnzymeVJP: Uses Enzyme.jl for the vjp.
• ReverseDiffVJP(compile=false): Uses ReverseDiff.jl for the vjp. compile is a boolean for whether to precompile the tape, which should only be done if there are no branches (if or while statements) in the f function.
• autodiff: Use automatic differentiation for constructing the Jacobian if the Jacobian needs to be constructed. Defaults to true.
• chunk_size: Chunk size for forward-mode differentiation if full Jacobians are built (autojacvec=false and autodiff=true). Default is 0 for automatic choice of chunk size.
• diff_type: The method used by FiniteDiff.jl for constructing the Jacobian if the full Jacobian is required with autodiff=false.
• g: instantaneous objective function of the long-time averaged objective.

SciMLProblem Support

This sensealg only supports ODEProblems. This sensealg does not support events (callbacks). This sensealg assumes that the objective is a long-time averaged quantity and ergodic, i.e. the time evolution of the system behaves qualitatively the same over infinite time independent of the specified initial conditions, such that only the sensitivity with respect to the parameters is of interest.

References

Ni, A., and Talnikar, C., Adjoint sensitivity analysis on chaotic dynamical systems by Non-Intrusive Least Squares Adjoint Shadowing (NILSAS). Journal of Computational Physics 395, 690-709 (2019).

## Vector-Jacobian Product (VJP) Choices

SciMLSensitivity.ZygoteVJPType
ZygoteVJP <: VJPChoice

Uses Zygote.jl to compute vector-Jacobian products. Tends to be the fastest VJP method if the ODE/DAE/SDE/DDE is written with mostly vectorized functions (like neural networks and other layers from Flux.jl) and the f functions is given out-of-place. If the f function is in-place, then Zygote.Buffer arrays are used internally which can greatly reduce the performance of the VJP method.

Constructor

ZygoteVJP(;allow_nothing=false)

Keyword arguments:

• allow_nothing: whether nothings should be implicitly converted to zeros. In Zygote, the derivative of a function with respect to p which does not use p in any possible calculation is given a derivative of nothing instead of zero. By default, this nothing is caught in order to throw an informative error message about a potentially unintentional misdefined function. However, if this was intentional, setting allow_nothing=true will remove the error message.
SciMLSensitivity.EnzymeVJPType
EnzymeVJP <: VJPChoice

Uses Enzyme.jl to compute vector-Jacobian products. Is the fastest VJP whenever applicable, though Enzyme.jl currently has low coverage over the Julia programming language, for example restricting the user's defined f function to not do things like require garbage collection or calls to BLAS/LAPACK. However, mutation is supported, meaning that in-place f with fully mutating non-allocating code will work with Enzyme (provided no high level calls to C like BLAS/LAPACK are used) and this will be the most efficient adjoint implementation.

Constructor

EnzymeVJP(;chunksize=0)

Keyword Arguments

• chunksize: the default chunk size for the temporary variables inside of the vjp's right hand side definition. This is used for compatibility with ODE solves that default to using ForwardDiff.jl for the Jacobian of the stiff ODE solve, such as OrdinaryDiffEq.jl. This should be set to the maximum chunksize that can occur during an integration to preallocate the DualCaches for PreallocationTools.jl. It defaults to 0, using ForwardDiff.pickchunksize but could be decreased if this value is known to be lower to conserve memory.
SciMLSensitivity.TrackerVJPType
TrackerVJP <: VJPChoice

Uses Tracker.jl to compute the vector-Jacobian products. If f is in-place, then it uses a array of structs formulation to do scalarized reverse mode, while if f is out-of-place then it uses an array-based reverse mode.

Not as efficient as ReverseDiffVJP, but supports GPUs when doing array-based reverse mode.

Constructor

TrackerVJP(;allow_nothing=false)

Keyword arguments:

• allow_nothing: whether non-tracked values should be implicitly converted to zeros. In Tracker, the derivative of a function with respect to p which does not use p in any possible calculation is given an untracked return instead of zero. By default, this nothing Trackedness is caught in order to throw an informative error message about a potentially unintentional misdefined function. However, if this was intentional, setting allow_nothing=true will remove the error message.
SciMLSensitivity.ReverseDiffVJPType
ReverseDiffVJP{compile} <: VJPChoice

Uses ReverseDiff.jl to compute the vector-Jacobian products. If f is in-place, then it uses a array of structs formulation to do scalarized reverse mode, while if f is out-of-place then it uses an array-based reverse mode.

Usually the fastest when scalarized operations exist in the f function (like in scientific machine learning applications like Universal Differential Equations) and the boolean compilation is enabled (i.e. ReverseDiffVJP(true)), if EnzymeVJP fails on a given choice of f.

Does not support GPUs (CuArrays).

Constructor

ReverseDiffVJP(compile=false)

Keyword Arguments

• compile: Whether to cache the compilation of the reverse tape. This heavily increases the performance of the method but requires that the f function of the ODE/DAE/SDE/DDE has no branching.

## More Details on Sensitivity Algorithm Choices

The following section describes a bit more details to consider when choosing a sensitivity algorithm.

### Optimize-then-Discretize

The original neural ODE paper popularized optimize-then-discretize with O(1) adjoints via backsolve. This is the methodology BacksolveAdjoint When training non-stiff neural ODEs, BacksolveAdjoint with ZygoteVJP is generally the fastest method. Additionally, this method does not require storing the values of any intermediate points and is thus the most memory efficient. However, BacksolveAdjoint is prone to instabilities whenever the Lipschitz constant is sufficiently large, like in stiff equations, PDE discretizations, and many other contexts, so it is not used by default. When training a neural ODE for machine learning applications, the user should try BacksolveAdjoint and see if it is sufficiently accurate on their problem. More details on this topic can be found in Stiff Neural Ordinary Differential Equations

Note that DiffEqFlux's implementation of BacksolveAdjoint includes an extra feature BacksolveAdjoint(checkpointing=true) which mixes checkpointing with BacksolveAdjoint. What this method does is that, at saveat points, values from the forward pass are saved. Since the reverse solve should numerically be the same as the forward pass, issues with divergence of the reverse pass are mitigated by restarting the reverse pass at the saveat value from the forward pass. This reduces the divergence and can lead to better gradients at the cost of higher memory usage due to having to save some values of the forward pass. This can stabilize the adjoint in some applications, but for highly stiff applications the divergence can be too fast for this to work in practice.

To avoid the issues of backwards solving the ODE, InterpolatingAdjoint and QuadratureAdjoint utilize information from the forward pass. By default these methods utilize the continuous solution provided by DifferentialEquations.jl in the calculations of the adjoint pass. QuadratureAdjoint uses this to build a continuous function for the solution of adjoint equation and then performs an adaptive quadrature via Quadrature.jl, while InterpolatingAdjoint appends the integrand to the ODE so it's computed simultaneously to the Lagrange multiplier. When memory is not an issue, we find that the QuadratureAdjoint approach tends to be the most efficient as it has a significantly smaller adjoint differential equation and the quadrature converges very fast, but this form requires holding the full continuous solution of the adjoint which can be a significant burden for large parameter problems. The InterpolatingAdjoint is thus a compromise between memory efficiency and compute efficiency, and is in the same spirit as CVODES.

However, if the memory cost of the InterpolatingAdjoint is too high, checkpointing can be used via InterpolatingAdjoint(checkpointing=true). When this is used, the checkpoints default to sol.t of the forward pass (i.e. the saved timepoints usually set by saveat). Then in the adjoint, intervals of sol.t[i-1] to sol.t[i] are re-solved in order to obtain a short interpolation which can be utilized in the adjoints. This at most results in two full solves of the forward pass, but dramatically reduces the computational cost while being a low-memory format. This is the preferred method for highly stiff equations when memory is an issue, i.e. stiff PDEs or large neural DAEs.

For forward-mode, the ForwardSensitivty is the version that performs the optimize-then-discretize approach. In this case, autojacvec corresponds to the method for computing J*v within the forward sensitivity equations, which is either true or false for whether to use Jacobian-free forward-mode AD (via ForwardDiff.jl) or Jacobian-free numerical differentiation.

### Discretize-then-Optimize

In this approach the discretization is done first and then optimization is done on the discretized system. While traditionally this can be done discrete sensitivity analysis, this is can be equivalently done by automatic differentiation on the solver itself. ReverseDiffAdjoint performs reverse-mode automatic differentiation on the solver via ReverseDiff.jl, ZygoteAdjoint performs reverse-mode automatic differentiation on the solver via Zygote.jl, and TrackerAdjoint performs reverse-mode automatic differentiation on the solver via Tracker.jl. In addition, ForwardDiffSensitivty performs forward-mode automatic differentiation on the solver via ForwardDiff.jl.

We note that many studies have suggested that this approach produces more accurate gradients than the optimize-than-discretize approach