Deep Echo State Networks
Deep Echo State Network architectures have recently gained traction. In this guide, we illustrate how to use ReservoirComputing.jl to build a deep ESN.
The network implemented in this library follows (Gallicchio and Micheli, 2017). It stacks reservoirs on top of each other, feeding the output of one into the next, and obtains the final states by merging the inner states of all stacked reservoirs. For a more in-depth explanation, refer to the paper linked above.
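The layer-to-layer state flow can be sketched in plain Julia. This is a minimal illustration under assumed sizes and a plain `tanh` update, not the library's implementation:

```julia
# Three stacked reservoirs of size 50, driven by a 3-dimensional input.
res_size, in_size, depth = 50, 3, 3
Ws = [0.1 .* randn(res_size, res_size) for _ in 1:depth]
# the first layer sees the raw input; deeper layers see the previous layer's state
Wins = [0.1 .* randn(res_size, l == 1 ? in_size : res_size) for l in 1:depth]

# one update step: each layer is driven by the layer below it
function deep_step(states, u, Ws, Wins)
    new_states = similar(states)
    layer_input = u
    for l in eachindex(Ws)
        new_states[l] = tanh.(Ws[l] * states[l] .+ Wins[l] * layer_input)
        layer_input = new_states[l]   # feed this layer's state into the next one
    end
    return new_states
end

states = [zeros(res_size) for _ in 1:depth]
states = deep_step(states, [1.0, 0.0, 0.0], Ws, Wins)
merged = reduce(vcat, states)   # merged state seen by the readout: length 150
```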
Lorenz Example
For this example, we are going to reuse the Lorenz data used in the Lorenz System Forecasting example.
```julia
using OrdinaryDiffEq

# define the Lorenz system
function lorenz!(du, u, p, t)
    du[1] = 10.0 * (u[2] - u[1])
    du[2] = u[1] * (28.0 - u[3]) - u[2]
    du[3] = u[1] * u[2] - (8 / 3) * u[3]
end

# solve and collect the data
prob = ODEProblem(lorenz!, [1.0, 0.0, 0.0], (0.0, 200.0))
data = solve(prob, ABM54(); dt=0.02)
data = reduce(hcat, data.u)

# determine shift length, training length and prediction length
shift = 300
train_len = 5000
predict_len = 1250

# split the data accordingly
input_data = data[:, shift:(shift + train_len - 1)]
target_data = data[:, (shift + 1):(shift + train_len)]
test_data = data[:, (shift + train_len + 1):(shift + train_len + predict_len)]
```
```
3×1250 Matrix{Float64}:
  4.2235   3.83533  3.55846  3.38262  …   9.44212  10.0034  10.5085
  1.98782  2.18137  2.43519  2.7396      12.3382   12.6956  12.8396
 25.6556  24.4858  23.3794  22.3394      24.3255   25.4374  26.6729
```
Note again that the data must be arranged as a matrix with features as rows and time steps as columns, as in this example. This holds even when the time series consists of single values.
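For instance, a scalar time series stored as a plain vector can be reshaped into the expected 1×N matrix (the data here is made up for illustration):

```julia
series = sin.(0.0:0.02:1.0)           # a scalar time series as a plain vector
input_matrix = reshape(series, 1, :)  # 1 feature (row) × 51 time steps (columns)
size(input_matrix)                    # (1, 51)
```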
Constructing a deep ESN is very similar to constructing a standard ESN; the only difference is that the reservoir can be passed as a vector of reservoirs.
```julia
using ReservoirComputing

reservoirs = [rand_sparse(; radius=1.1, sparsity=0.1),
    rand_sparse(; radius=1.2, sparsity=0.1),
    rand_sparse(; radius=1.4, sparsity=0.1)]

esn = DeepESN(input_data, 3, 200;
    reservoir=reservoirs,
    reservoir_driver=RNN(),
    nla_type=NLADefault(),
    states_type=StandardStates())
```
```
DeepESN{Int64, Matrix{Float64}, NLADefault, Vector{Matrix{Float64}}, RNN{typeof(NNlib.tanh_fast), Float64}, Vector{Matrix{Float64}}, Vector{Vector{Float32}}, StandardStates, Int64, Matrix{Float64}}(200, [-9.903289241214775 -9.691839850827593 … 6.13395766702855 5.3690197189757916; -9.018088588651105 -8.473011482849701 … 1.9972030630363637 1.862732065250123; 29.81517777140388 29.93605690750639 … 29.48232612752881 28.165385874059083], NLADefault(), [[0.013907174655313104 -0.05716939898613221 -0.006155524219557851; -0.08855201872557712 -0.006872478885491318 -0.024412371250722355; … ; -0.05567417608791839 0.020958240616840065 -0.05463514734412738; 0.08647650301013335 -0.004830822042974714 0.09832851499374105], [0.04213248726958181 -0.08713980785528797 … 0.09277984251585046 0.07165217731267358; 0.0594666227164502 -0.017200683557866192 … 0.001515976012656517 -0.016257560549243967; … ; -0.061682611611786814 -0.05278295296902507 … -0.0013732916261887996 -0.05357622749813545; -0.04561198555057036 -0.055659463829441695 … -0.037538777812366565 -0.06064975265589079]], RNN{typeof(NNlib.tanh_fast), Float64}(NNlib.tanh_fast, 1.0), [[0.0 0.0 … 0.0 0.0; 0.0 0.03679667105415618 … 0.0 -0.1504413499781463; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], [0.1378192372786613 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]], Vector{Float32}[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]], StandardStates(), 0, [0.191895004462442 -0.8727291477054525 … -0.9275994264160081 -0.9284136807209246; 0.207993968483316 0.966678677968091 … 0.12853425853116815 0.2223888777182524; … ; 0.0 -0.659747033987273 … -0.9023082828245332 -0.9005549127379939; 0.0 -0.12214704901669225 … 0.01667419825698542 -0.04947580062109448])
```
The input layer and bias can also be given as vectors, which must then have the same length as the vector of reservoirs. If a single value is passed instead of a vector, it will be used for all layers of the deep ESN.
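To illustrate the shapes involved, one input matrix and one bias vector per layer can also be pre-built by hand. Only the first layer sees the raw input; deeper layers are driven by the previous reservoir's 200-dimensional state, so their input matrices must match that size. The scalings below are illustrative, and passing these via the `input_layer` and `bias` keyword arguments is assumed to mirror how `reservoir` accepts a vector:

```julia
res_size, in_size = 200, 3
# one input matrix per layer: the first maps the 3-dimensional input,
# the others map the 200-dimensional state of the previous reservoir
input_layers = [0.1 .* (2 .* rand(res_size, in_size) .- 1),
    0.1 .* (2 .* rand(res_size, res_size) .- 1),
    0.1 .* (2 .* rand(res_size, res_size) .- 1)]
biases = [zeros(res_size) for _ in 1:3]   # one bias vector per layer
size.(input_layers)                       # [(200, 3), (200, 200), (200, 200)]
```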
In addition to using the provided functions for the construction of the layers, the user can also build their own matrix, or vector of matrices, and feed it into the ESN in the same way.
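A hand-built vector of reservoir matrices could be constructed like this. This is only a sketch of the common recipe (a sparse random matrix rescaled to a target spectral radius); the helper name `my_reservoir` is made up and the rescaling mirrors, but is not, the library's `rand_sparse`:

```julia
using LinearAlgebra, SparseArrays

# build a sparse random reservoir matrix and rescale it
# so that its spectral radius equals `radius`
function my_reservoir(res_size; radius=1.1, sparsity=0.1)
    W = Matrix(sprandn(res_size, res_size, sparsity))
    rho = maximum(abs, eigvals(W))   # current spectral radius
    return W .* (radius / rho)
end

reservoirs = [my_reservoir(200; radius=r) for r in (1.1, 1.2, 1.4)]
```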
The training and prediction follow the usual framework:
```julia
training_method = StandardRidge(0.0)
output_layer = train(esn, target_data, training_method)
output = esn(Generative(predict_len), output_layer)
```
```
3×1250 Matrix{Float64}:
  4.22356  3.83532  3.5583   3.38233  …   -7.58927   -8.00124   -8.44168
  1.98757  2.18088  2.43452  2.73873      -9.55775  -10.1418   -10.6934
 25.6556  24.4857  23.3794  22.3395      -22.8437   -23.1538   -23.6205
```
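Under the hood, `StandardRidge` fits a linear readout on the merged states. The closed-form solution can be sketched in plain Julia on toy data (a small regularization λ is used here where the example above uses 0.0; this is an illustration, not the library's exact code path):

```julia
using LinearAlgebra

# ridge regression readout: W_out = Y * X' * (X * X' + λI)^(-1)
states = randn(150, 400)    # merged deep-ESN states, one column per time step
targets = randn(3, 400)     # training targets, aligned column by column
λ = 1e-6
W_out = targets * states' / (states * states' + λ * I)
prediction = W_out * states[:, 1]   # one-step readout: a 3-element vector
```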
Plotting the results:
```julia
using Plots

ts = 0.0:0.02:200.0
lorenz_maxlyap = 0.9056
predict_ts = ts[(shift + train_len + 1):(shift + train_len + predict_len)]
lyap_time = (predict_ts .- predict_ts[1]) * (1 / lorenz_maxlyap)

p1 = plot(lyap_time, [test_data[1, :] output[1, :]]; label=["actual" "predicted"],
    ylabel="x(t)", linewidth=2.5, xticks=false, yticks=-15:15:15);
p2 = plot(lyap_time, [test_data[2, :] output[2, :]]; label=["actual" "predicted"],
    ylabel="y(t)", linewidth=2.5, xticks=false, yticks=-20:20:20);
p3 = plot(lyap_time, [test_data[3, :] output[3, :]]; label=["actual" "predicted"],
    ylabel="z(t)", linewidth=2.5, xlabel="max(λ)*t", yticks=10:15:40);
plot(p1, p2, p3; plot_title="Lorenz System Coordinates",
    layout=(3, 1), xtickfontsize=12, ytickfontsize=12, xguidefontsize=15,
    yguidefontsize=15, legendfontsize=12, titlefontsize=20)
```
References
- Gallicchio, C. and Micheli, A. (2017). Deep Echo State Network (DeepESN): A brief survey. arXiv preprint arXiv:1712.04323.