Optimisers.jl

Installation: OptimizationOptimisers.jl

To use this package, install the OptimizationOptimisers package:

import Pkg;
Pkg.add("OptimizationOptimisers");

In addition to the optimisation algorithms provided by the Optimisers.jl package, this subpackage also provides the Sophia optimisation algorithm. Each optimizer below is passed directly to solve; a complete usage sketch is given after the list of optimizers.
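
As a quick illustration of Sophia, the following minimal sketch minimises a small test objective. The Rosenbrock function, the initial point, and the Zygote-based Optimization.AutoZygote() backend (which assumes Zygote is installed) are illustrative assumptions rather than part of this package, and Sophia is constructed with its default hyperparameters:

using Optimization, OptimizationOptimisers, Zygote

# Illustrative objective: two-parameter Rosenbrock function (an assumption for this sketch)
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
u0 = zeros(2)
p = [1.0, 100.0]

# Sophia needs derivative information, so the objective is wrapped with an AD backend
optf = OptimizationFunction(rosenbrock, Optimization.AutoZygote())
prob = OptimizationProblem(optf, u0, p)

# Run Sophia with its default hyperparameters for a fixed iteration budget
sol = solve(prob, OptimizationOptimisers.Sophia(), maxiters = 1000)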

List of optimizers

  • Optimisers.Descent: Classic gradient descent optimizer with learning rate

    • solve(problem, Descent(η))

    • η is the learning rate

    • Defaults:

      • η = 0.1
  • Optimisers.Momentum: Classic gradient descent optimizer with learning rate and momentum

    • solve(problem, Momentum(η, ρ))

    • η is the learning rate

    • ρ is the momentum

    • Defaults:

      • η = 0.01
      • ρ = 0.9
  • Optimisers.Nesterov: Gradient descent optimizer with learning rate and Nesterov momentum

    • solve(problem, Nesterov(η, ρ))

    • η is the learning rate

    • ρ is the Nesterov momentum

    • Defaults:

      • η = 0.001
      • ρ = 0.9
  • Optimisers.RMSProp: RMSProp optimizer

    • solve(problem, RMSProp(η, ρ))

    • η is the learning rate

    • ρ is the momentum

    • Defaults:

      • η = 0.001
      • ρ = 0.9
  • Optimisers.Adam: Adam optimizer

    • solve(problem, Adam(η, β::Tuple))

    • η is the learning rate

    • β::Tuple is the decay of momentums

    • Defaults:

      • η = 0.001
      • β::Tuple = (0.9, 0.999)
  • Optimisers.RAdam: Rectified Adam optimizer

    • solve(problem, RAdam(η, β::Tuple))

    • η is the learning rate

    • β::Tuple is the decay of momentums

    • Defaults:

      • η = 0.001
      • β::Tuple = (0.9, 0.999)
  • Optimisers.OAdam: Optimistic Adam optimizer

    • solve(problem, OAdam(η, β::Tuple))

    • η is the learning rate

    • β::Tuple is the decay of momentums

    • Defaults:

      • η = 0.001
      • β::Tuple = (0.5, 0.999)
  • Optimisers.AdaMax: AdaMax optimizer

    • solve(problem, AdaMax(η, β::Tuple))

    • η is the learning rate

    • β::Tuple is the decay of momentums

    • Defaults:

      • η = 0.001
      • β::Tuple = (0.9, 0.999)
  • Optimisers.AdaGrad: AdaGrad optimizer

    • solve(problem, AdaGrad(η))

    • η is the learning rate

    • Defaults:

      • η = 0.1
  • Optimisers.AdaDelta: AdaDelta optimizer

    • solve(problem, AdaDelta(ρ))

    • ρ is the gradient decay factor

    • Defaults:

      • ρ = 0.9
  • Optimisers.AMSGrad: AMSGrad optimizer

    • solve(problem, AMSGrad(η, β::Tuple))

    • η is the learning rate

    • β::Tuple is the decay of momentums

    • Defaults:

      • η = 0.001
      • β::Tuple = (0.9, 0.999)
  • Optimisers.NAdam: Nesterov variant of the Adam optimizer

    • solve(problem, NAdam(η, β::Tuple))

    • η is the learning rate

    • β::Tuple is the decay of momentums

    • Defaults:

      • η = 0.001
      • β::Tuple = (0.9, 0.999)
  • Optimisers.AdamW: AdamW optimizer

    • solve(problem, AdamW(η, β::Tuple, decay))

    • η is the learning rate

    • β::Tuple is the decay of momentums

    • decay is the weight decay applied to the parameters

    • Defaults:

      • η = 0.001
      • β::Tuple = (0.9, 0.999)
      • decay = 0
  • Optimisers.AdaBelief: AdaBelief variant of Adam

    • solve(problem, AdaBelief(η, β::Tuple))

    • η is the learning rate

    • β::Tuple is the decay of momentums

    • Defaults:

      • η = 0.001
      • β::Tuple = (0.9, 0.999)
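
Every optimizer in the list above is used the same way: build an OptimizationProblem, pass the optimizer to solve, and give an iteration budget via maxiters. The sketch below uses Adam with a non-default learning rate; the Rosenbrock objective and the Optimization.AutoZygote() backend (which assumes Zygote is installed) are illustrative assumptions, not part of this package:

using Optimization, OptimizationOptimisers, Zygote

# Illustrative objective, initial point, and parameters (assumptions for this sketch)
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
u0 = zeros(2)
p = [1.0, 100.0]

optf = OptimizationFunction(rosenbrock, Optimization.AutoZygote())
prob = OptimizationProblem(optf, u0, p)

# Adam with η = 0.05; any optimizer from the list can be substituted,
# e.g. Descent(0.1) or AMSGrad()
sol = solve(prob, Adam(0.05), maxiters = 1000)

sol.u          # minimiser found by the optimizer
sol.objective  # final objective value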