Optimisers.jl
Installation: OptimizationOptimisers.jl
To use this package, install the OptimizationOptimisers package:
import Pkg;
Pkg.add("OptimizationOptimisers");
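Once installed, load the subpackage together with Optimization.jl:

using Optimization, OptimizationOptimisers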
In addition to the optimisation algorithms provided by the Optimisers.jl package, this subpackage also provides the Sophia optimisation algorithm.
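All of the optimizers listed below are used through the common solve interface. A minimal sketch of the pattern, assuming the standard Rosenbrock test function and a Zygote-based automatic differentiation backend (these solvers are iterative, so an iteration budget such as maxiters must be given):

using Optimization, OptimizationOptimisers, Zygote

# Rosenbrock test function; for p = [1.0, 100.0] the minimum is at (1, 1)
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
u0 = zeros(2)
p = [1.0, 100.0]

# Attach an AD backend so gradients are available to the optimizer
optf = OptimizationFunction(rosenbrock, Optimization.AutoZygote())
prob = OptimizationProblem(optf, u0, p)

# Any optimizer from the list below can be passed in place of Adam
sol = solve(prob, Adam(0.05); maxiters = 1000)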
List of optimizers
Optimisers.Descent
: Classic gradient descent optimizer with learning rate
solve(problem, Descent(η))
η is the learning rate
Defaults:
η = 0.1
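For example, plain gradient descent with a non-default step size can be run against the prob constructed in the sketch above (the step size here is only an illustrative value):

# Reuses prob from the sketch above; 0.01 is an example step size, not a recommendation
sol = solve(prob, Descent(0.01); maxiters = 2000)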
Optimisers.Momentum
: Classic gradient descent optimizer with learning rate and momentum
solve(problem, Momentum(η, ρ))
η is the learning rate
ρ is the momentum
Defaults:
η = 0.01
ρ = 0.9
Optimisers.Nesterov
: Gradient descent optimizer with learning rate and Nesterov momentum
solve(problem, Nesterov(η, ρ))
η is the learning rate
ρ is the Nesterov momentum
Defaults:
η = 0.01
ρ = 0.9
Optimisers.RMSProp
: RMSProp optimizer
solve(problem, RMSProp(η, ρ))
η is the learning rate
ρ is the momentum
Defaults:
η = 0.001
ρ = 0.9
Optimisers.Adam
: Adam optimizer
solve(problem, Adam(η, β::Tuple))
η is the learning rate
β::Tuple are the decay rates of the momentum estimates
Defaults:
η = 0.001
β::Tuple = (0.9, 0.999)
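The defaults can be overridden by passing the hyperparameters positionally; a sketch with illustrative (not recommended) values, again reusing prob from the example above:

# Larger learning rate and lighter second-moment smoothing, for illustration only
opt = Adam(0.01, (0.9, 0.995))
sol = solve(prob, opt; maxiters = 1000)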
Optimisers.RAdam
: Rectified Adam optimizer
solve(problem, RAdam(η, β::Tuple))
η is the learning rate
β::Tuple are the decay rates of the momentum estimates
Defaults:
η = 0.001
β::Tuple = (0.9, 0.999)
Optimisers.OAdam
: Optimistic Adam optimizer
solve(problem, OAdam(η, β::Tuple))
η is the learning rate
β::Tuple are the decay rates of the momentum estimates
Defaults:
η = 0.001
β::Tuple = (0.5, 0.999)
Optimisers.AdaMax
: AdaMax optimizer
solve(problem, AdaMax(η, β::Tuple))
η is the learning rate
β::Tuple are the decay rates of the momentum estimates
Defaults:
η = 0.001
β::Tuple = (0.9, 0.999)
Optimisers.ADAGrad
: ADAGrad optimizer
solve(problem, ADAGrad(η))
η is the learning rate
Defaults:
η = 0.1
Optimisers.ADADelta
: ADADelta optimizer
solve(problem, ADADelta(ρ))
ρ is the gradient decay factor
Defaults:
ρ = 0.9
Optimisers.AMSGrad
: AMSGrad optimizer
solve(problem, AMSGrad(η, β::Tuple))
η is the learning rate
β::Tuple are the decay rates of the momentum estimates
Defaults:
η = 0.001
β::Tuple = (0.9, 0.999)
Optimisers.NAdam
: Nesterov variant of the Adam optimizer
solve(problem, NAdam(η, β::Tuple))
η is the learning rate
β::Tuple are the decay rates of the momentum estimates
Defaults:
η = 0.001
β::Tuple = (0.9, 0.999)
Optimisers.AdamW
: AdamW optimizer
solve(problem, AdamW(η, β::Tuple))
η is the learning rate
β::Tuple are the decay rates of the momentum estimates
decay is the weight decay coefficient
Defaults:
η = 0.001
β::Tuple = (0.9, 0.999)
decay = 0
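A sketch of AdamW with an explicit weight decay, assuming the decay is accepted as the third positional argument (as its position among the defaults above suggests); the values are illustrative only:

# η, β, and weight decay passed positionally; hyperparameters are example values
opt = AdamW(0.001, (0.9, 0.999), 1e-4)
sol = solve(prob, opt; maxiters = 1000)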
Optimisers.ADABelief
: ADABelief variant of Adam
solve(problem, ADABelief(η, β::Tuple))
η is the learning rate
β::Tuple are the decay rates of the momentum estimates
Defaults:
η = 0.001
β::Tuple = (0.9, 0.999)
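Because these rules are iterative, solve always needs an iteration budget (maxiters), and a callback can be supplied to monitor or stop the run. A sketch assuming the two-argument callback form of Optimization.jl, where returning true halts the solve:

# Print the loss each iteration and stop early once it is small enough
callback = function (state, loss)
    println("loss = ", loss)
    return loss < 1e-6  # returning true halts the optimization
end

sol = solve(prob, Adam(0.05); maxiters = 1000, callback = callback)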