Flux.jl

Installation: OptimizationFlux.jl

To use this package, install the OptimizationFlux package:

import Pkg; Pkg.add("OptimizationFlux")
Warning

Flux's optimizers are soon to be deprecated in favor of Optimisers.jl. Because of this, we recommend using the OptimizationOptimisers.jl setup instead of OptimizationFlux.jl; a minimal sketch of that route follows.
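The sketch below is an illustration, not part of this page's API: it assumes OptimizationOptimisers.jl and Optimisers.jl are installed, uses Optimization.AutoZygote() for gradients, and takes the Rosenbrock function as a stand-in objective.

using Optimization, OptimizationOptimisers, Optimisers

# Rosenbrock test function; minimum at (1.0, 1.0)
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

# Gradient-based optimizers need an AD backend, here Zygote
optf = OptimizationFunction(rosenbrock, Optimization.AutoZygote())
prob = OptimizationProblem(optf, zeros(2), [1.0, 100.0])

# Optimisers.Adam(η) plays the role of Flux's ADAM(η) documented below
sol = solve(prob, Optimisers.Adam(0.001), maxiters = 1000)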

Local Unconstrained Optimizers

Each optimizer below is passed as the second argument to solve; a complete usage sketch follows the list.

  • Flux.Optimise.Descent: Classic gradient descent optimizer with learning rate

    • solve(problem, Descent(η))
    • η is the learning rate
    • Defaults:
      • η = 0.1
  • Flux.Optimise.Momentum: Classic gradient descent optimizer with learning rate and momentum

    • solve(problem, Momentum(η, ρ))
    • η is the learning rate
    • ρ is the momentum
    • Defaults:
      • η = 0.01
      • ρ = 0.9
  • Flux.Optimise.Nesterov: Gradient descent optimizer with learning rate and Nesterov momentum

    • solve(problem, Nesterov(η, ρ))
    • η is the learning rate
    • ρ is the Nesterov momentum
    • Defaults:
      • η = 0.01
      • ρ = 0.9
  • Flux.Optimise.RMSProp: RMSProp optimizer

    • solve(problem, RMSProp(η, ρ))
    • η is the learning rate
    • ρ is the momentum
    • Defaults:
      • η = 0.001
      • ρ = 0.9
  • Flux.Optimise.ADAM: ADAM optimizer

    • solve(problem, ADAM(η, β::Tuple))
    • η is the learning rate
    • β::Tuple is the pair of momentum decay rates
    • Defaults:
      • η = 0.001
      • β::Tuple = (0.9, 0.999)
  • Flux.Optimise.RADAM: Rectified ADAM optimizer

    • solve(problem, RADAM(η, β::Tuple))
    • η is the learning rate
    • β::Tuple is the pair of momentum decay rates
    • Defaults:
      • η = 0.001
      • β::Tuple = (0.9, 0.999)
  • Flux.Optimise.AdaMax: AdaMax optimizer

    • solve(problem, AdaMax(η, β::Tuple))
    • η is the learning rate
    • β::Tuple is the pair of momentum decay rates
    • Defaults:
      • η = 0.001
      • β::Tuple = (0.9, 0.999)
  • Flux.Optimise.ADAGrad: ADAGrad optimizer

    • solve(problem, ADAGrad(η))
    • η is the learning rate
    • Defaults:
      • η = 0.1
  • Flux.Optimise.ADADelta: ADADelta optimizer

    • solve(problem, ADADelta(ρ))
    • ρ is the gradient decay factor
    • Defaults:
      • ρ = 0.9
  • Flux.Optimise.AMSGrad: AMSGrad optimizer

    • solve(problem, AMSGrad(η, β::Tuple))
    • η is the learning rate
    • β::Tuple is the pair of momentum decay rates
    • Defaults:
      • η = 0.001
      • β::Tuple = (0.9, 0.999)
  • Flux.Optimise.NADAM: Nesterov variant of the ADAM optimizer

    • solve(problem, NADAM(η, β::Tuple))
    • η is the learning rate
    • β::Tuple is the pair of momentum decay rates
    • Defaults:
      • η = 0.001
      • β::Tuple = (0.9, 0.999)
  • Flux.Optimise.ADAMW: ADAMW optimizer

    • solve(problem, ADAMW(η, β::Tuple, decay))
    • η is the learning rate
    • β::Tuple is the pair of momentum decay rates
    • decay is the weight decay applied to the parameters during optimization
    • Defaults:
      • η = 0.001
      • β::Tuple = (0.9, 0.999)
      • decay = 0
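Putting the pieces together, here is a minimal usage sketch under the same assumptions as above: Optimization.AutoZygote() for gradients and the Rosenbrock function as a stand-in objective. These optimizers run for a fixed number of iterations, so a maxiters keyword is typically required in the solve call.

using Optimization, OptimizationFlux, Flux

# Rosenbrock test function; minimum at (1.0, 1.0)
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

# Flux's optimizers are gradient-based, so an AD backend is required
optf = OptimizationFunction(rosenbrock, Optimization.AutoZygote())
prob = OptimizationProblem(optf, zeros(2), [1.0, 100.0])

# ADAM with learning rate η = 0.01, run for 1000 iterations
sol = solve(prob, Flux.ADAM(0.01), maxiters = 1000)

Any optimizer from the list above can be swapped in for Flux.ADAM; only the constructor and its hyperparameters change.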