Optimisers.jl
Installation: OptimizationOptimisers.jl
To use this package, install the OptimizationOptimisers package:

import Pkg
Pkg.add("OptimizationOptimisers")

In addition to the optimisation algorithms provided by the Optimisers.jl package, this subpackage also provides the Sophia optimisation algorithm.
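Once installed, any of the rules listed below is passed as the solver argument to solve. The following is a minimal sketch, not a definitive recipe: it assumes a Rosenbrock test objective, the Zygote AD backend (Zygote must also be installed), an illustrative iteration budget, and that OptimizationOptimisers makes the Optimisers.jl rules available unqualified, as in the solve calls below.

using Optimization, OptimizationOptimisers, Zygote

# Illustrative objective (assumed example): the Rosenbrock function with fixed parameters p
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

u0 = zeros(2)      # initial guess
p = [1.0, 100.0]   # parameters of the objective

# Optimisers.jl rules are first-order methods, so an AD backend must be supplied
optf = OptimizationFunction(rosenbrock, Optimization.AutoZygote())
prob = OptimizationProblem(optf, u0, p)

# Adam with its default hyperparameters; maxiters sets the iteration budget
sol = solve(prob, Adam(); maxiters = 1000)

Any of the rules listed below can be substituted for Adam(); a sketch of overriding the default hyperparameters follows the list.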
List of optimizers
Optimisers.Descent: Classic gradient descent optimizer with learning rate
  - solve(problem, Descent(η))
  - η is the learning rate
  - Defaults: η = 0.1
Optimisers.Momentum: Classic gradient descent optimizer with learning rate and momentum
  - solve(problem, Momentum(η, ρ))
  - η is the learning rate
  - ρ is the momentum
  - Defaults: η = 0.01, ρ = 0.9
Optimisers.Nesterov: Gradient descent optimizer with learning rate and Nesterov momentum
  - solve(problem, Nesterov(η, ρ))
  - η is the learning rate
  - ρ is the Nesterov momentum
  - Defaults: η = 0.01, ρ = 0.9
Optimisers.RMSProp: RMSProp optimizer
  - solve(problem, RMSProp(η, ρ))
  - η is the learning rate
  - ρ is the momentum
  - Defaults: η = 0.001, ρ = 0.9
Optimisers.Adam: Adam optimizer
  - solve(problem, Adam(η, β::Tuple))
  - η is the learning rate
  - β::Tuple is the pair of momentum decay rates
  - Defaults: η = 0.001, β::Tuple = (0.9, 0.999)
Optimisers.RAdam: Rectified Adam optimizer
  - solve(problem, RAdam(η, β::Tuple))
  - η is the learning rate
  - β::Tuple is the pair of momentum decay rates
  - Defaults: η = 0.001, β::Tuple = (0.9, 0.999)
Optimisers.OAdam: Optimistic Adam optimizer
  - solve(problem, OAdam(η, β::Tuple))
  - η is the learning rate
  - β::Tuple is the pair of momentum decay rates
  - Defaults: η = 0.001, β::Tuple = (0.5, 0.999)
Optimisers.AdaMax: AdaMax optimizer
  - solve(problem, AdaMax(η, β::Tuple))
  - η is the learning rate
  - β::Tuple is the pair of momentum decay rates
  - Defaults: η = 0.001, β::Tuple = (0.9, 0.999)
Optimisers.ADAGrad: ADAGrad optimizer
  - solve(problem, ADAGrad(η))
  - η is the learning rate
  - Defaults: η = 0.1
Optimisers.ADADelta: ADADelta optimizer
  - solve(problem, ADADelta(ρ))
  - ρ is the gradient decay factor
  - Defaults: ρ = 0.9
Optimisers.AMSGrad: AMSGrad optimizer
  - solve(problem, AMSGrad(η, β::Tuple))
  - η is the learning rate
  - β::Tuple is the pair of momentum decay rates
  - Defaults: η = 0.001, β::Tuple = (0.9, 0.999)
Optimisers.NAdam: Nesterov variant of the Adam optimizer
  - solve(problem, NAdam(η, β::Tuple))
  - η is the learning rate
  - β::Tuple is the pair of momentum decay rates
  - Defaults: η = 0.001, β::Tuple = (0.9, 0.999)
Optimisers.AdamW: AdamW optimizer
  - solve(problem, AdamW(η, β::Tuple))
  - η is the learning rate
  - β::Tuple is the pair of momentum decay rates
  - decay is the weight decay applied to the parameters
  - Defaults: η = 0.001, β::Tuple = (0.9, 0.999), decay = 0
Optimisers.ADABelief: ADABelief variant of Adam
  - solve(problem, ADABelief(η, β::Tuple))
  - η is the learning rate
  - β::Tuple is the pair of momentum decay rates
  - Defaults: η = 0.001, β::Tuple = (0.9, 0.999)
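The defaults above can be overridden by constructing a rule with explicit hyperparameters before handing it to solve. A brief sketch under the same assumptions as the installation example (Rosenbrock test objective, Zygote AD backend, illustrative hyperparameter values and iteration budgets):

using Optimization, OptimizationOptimisers, Zygote

# Same illustrative Rosenbrock setup as above (assumed example)
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
optf = OptimizationFunction(rosenbrock, Optimization.AutoZygote())
prob = OptimizationProblem(optf, zeros(2), [1.0, 100.0])

# Adam with a non-default learning rate and momentum decay pair
sol = solve(prob, Adam(0.01, (0.8, 0.95)); maxiters = 2000)

# Plain gradient descent with a small fixed step size
sol = solve(prob, Descent(1e-3); maxiters = 5000)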