Optimisers.jl

Installation: OptimizationOptimisers.jl

To use this package, install the OptimizationOptimisers package:

import Pkg; Pkg.add("OptimizationOptimisers")
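
If it helps to see the pieces together, here is a minimal sketch of the usual workflow, assuming a Rosenbrock test objective and Zygote as the automatic-differentiation backend (neither is prescribed by this package); the Optimisers module is assumed to be available through OptimizationOptimisers (otherwise add `using Optimisers`):

using Optimization, OptimizationOptimisers
using Zygote  # AD backend assumed for this sketch; any backend supported by Optimization.jl works

# Two-variable Rosenbrock objective: u holds the optimization variables, p the fixed parameters.
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
u0 = zeros(2)
p = [1.0, 100.0]

# These rules are gradient-based, so the OptimizationFunction needs an AD choice.
optf = OptimizationFunction(rosenbrock, Optimization.AutoZygote())
prob = OptimizationProblem(optf, u0, p)

# Optimisers.jl rules are iterative, so pass an iteration budget via maxiters.
sol = solve(prob, Optimisers.Adam(0.05); maxiters = 1000)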

Local Unconstrained Optimizers

Each of these methods is used by passing its rule as the second argument to solve; a combined usage sketch follows the list.

  • Optimisers.Descent: Classic gradient descent optimizer with learning rate

    • solve(problem, Descent(η))
    • η is the learning rate
    • Defaults:
      • η = 0.1
  • Optimisers.Momentum: Classic gradient descent optimizer with learning rate and momentum

    • solve(problem, Momentum(η, ρ))
    • η is the learning rate
    • ρ is the momentum
    • Defaults:
      • η = 0.01
      • ρ = 0.9
  • Optimisers.Nesterov: Gradient descent optimizer with learning rate and Nesterov momentum

    • solve(problem, Nesterov(η, ρ))
    • η is the learning rate
    • ρ is the Nesterov momentum
    • Defaults:
      • η = 0.01
      • ρ = 0.9
  • Optimisers.RMSProp: RMSProp optimizer

    • solve(problem, RMSProp(η, ρ))
    • η is the learning rate
    • ρ is the momentum
    • Defaults:
      • η = 0.001
      • ρ = 0.9
  • Optimisers.Adam: Adam optimizer

    • solve(problem, Adam(η, β::Tuple))
    • η is the learning rate
    • β::Tuple is the decay of momentums
    • Defaults:
      • η = 0.001
      • β::Tuple = (0.9, 0.999)
  • Optimisers.RAdam: Rectified Adam optimizer

    • solve(problem, RAdam(η, β::Tuple))
    • η is the learning rate
    • β::Tuple is the decay of momentums
    • Defaults:
      • η = 0.001
      • β::Tuple = (0.9, 0.999)
  • Optimisers.OAdam: Optimistic Adam optimizer

    • solve(problem, OAdam(η, β::Tuple))
    • η is the learning rate
    • β::Tuple is the decay of momentums
    • Defaults:
      • η = 0.001
      • β::Tuple = (0.5, 0.999)
  • Optimisers.AdaMax: AdaMax optimizer

    • solve(problem, AdaMax(η, β::Tuple))
    • η is the learning rate
    • β::Tuple is the decay of momentums
    • Defaults:
      • η = 0.001
      • β::Tuple = (0.9, 0.999)
  • Optimisers.AdaGrad: AdaGrad optimizer

    • solve(problem, AdaGrad(η))
    • η is the learning rate
    • Defaults:
      • η = 0.1
  • Optimisers.AdaDelta: AdaDelta optimizer

    • solve(problem, AdaDelta(ρ))
    • ρ is the gradient decay factor
    • Defaults:
      • ρ = 0.9
  • Optimisers.AMSGrad: AMSGrad optimizer

    • solve(problem, AMSGrad(η, β::Tuple))
    • η is the learning rate
    • β::Tuple is the decay of momentums
    • Defaults:
      • η = 0.001
      • β::Tuple = (0.9, 0.999)
  • Optimisers.NAdam: Nesterov variant of the Adam optimizer

    • solve(problem, NAdam(η, β::Tuple))
    • η is the learning rate
    • β::Tuple is the decay of momentums
    • Defaults:
      • η = 0.001
      • β::Tuple = (0.9, 0.999)
  • Optimisers.AdamW: AdamW optimizer

    • solve(problem, AdamW(η, β::Tuple, decay))
    • η is the learning rate
    • β::Tuple is the decay of momentums
    • decay is the weight decay coefficient
    • Defaults:
      • η = 0.001
      • β::Tuple = (0.9, 0.999)
      • decay = 0
  • Optimisers.AdaBelief: AdaBelief variant of Adam

    • solve(problem, AdaBelief(η, β::Tuple))
    • η is the learning rate
    • β::Tuple is the decay of momentums
    • Defaults:
      • η = 0.001
      • β::Tuple = (0.9, 0.999)
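
As mentioned above, here is a short sketch of passing these rules to solve with explicit hyperparameters. It reuses the prob defined in the installation example, and the specific values are only illustrative, not recommendations:

# Gradient descent with a non-default learning rate η.
sol = solve(prob, Optimisers.Descent(0.01); maxiters = 5000)

# Momentum: positional arguments are the learning rate η and the momentum ρ.
sol = solve(prob, Optimisers.Momentum(0.01, 0.9); maxiters = 5000)

# Adam: learning rate η and the tuple β of momentum decays.
sol = solve(prob, Optimisers.Adam(1e-3, (0.9, 0.999)); maxiters = 5000)

# AdamW: as Adam, with the weight decay as a third positional argument.
sol = solve(prob, Optimisers.AdamW(1e-3, (0.9, 0.999), 1e-4); maxiters = 5000)

Any of the other rules listed above can be swapped in the same way; only the constructor and its hyperparameters change.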