Getting Started with ReservoirComputing.jl

This is an introductory tutorial for ReservoirComputing.jl. We will showcase a typical use case by creating an Echo State Network (ESN) and training it to reproduce the dynamics of the chaotic Lorenz system.

Installing ReservoirComputing

ReservoirComputing.jl is registered in the General registry, so it can be installed through the Julia package manager:

using Pkg
Pkg.add("ReservoirComputing")

Copy-Pastable Simplified Example

If you just want to get some code running, the following block provides a simplified, runnable end-to-end example. The rest of this page goes into more detail, expanding on various aspects of the example.

using OrdinaryDiffEq
using Plots
using Random
using ReservoirComputing

Random.seed!(42)
rng = MersenneTwister(17)

function lorenz(du, u, p, t)
    du[1] = p[1] * (u[2] - u[1])
    du[2] = u[1] * (p[2] - u[3]) - u[2]
    du[3] = u[1] * u[2] - p[3] * u[3]
end

prob = ODEProblem(lorenz, [1.0f0, 0.0f0, 0.0f0], (0.0, 200.0), Float32[10.0, 28.0, 8/3])
data = Array(solve(prob, ABM54(); dt=0.02f0))
shift = 300
train_len = 5000
predict_len = 1250

input_data = data[:, shift:(shift + train_len - 1)]
target_data = data[:, (shift + 1):(shift + train_len)]
test = data[:, (shift + train_len):(shift + train_len + predict_len - 1)]

esn = ESN(3, 300, 3; init_reservoir=rand_sparse(; radius=1.2, sparsity=6/300),
    state_modifiers=NLAT2)

ps, st = setup(rng, esn)
ps, st = train!(esn, input_data, target_data, ps, st)
output, st = predict(esn, predict_len, ps, st; initialdata=test[:, 1])

plot(transpose(output)[:, 1], transpose(output)[:, 2], transpose(output)[:, 3];
    label="predicted")
plot!(transpose(test)[:, 1], transpose(test)[:, 2], transpose(test)[:, 3];
    label="actual")
[Plot: predicted vs. actual Lorenz attractor]

Congrats, you trained your first ESN! Now let's go into more detail.

Generating the data

Starting off the workflow, the first step is to obtain the data. We use OrdinaryDiffEq to generate the Lorenz system data. The data is passed to the model as a matrix, where the rows are the features and the columns represent the time steps.

using OrdinaryDiffEq

# define lorenz system
function lorenz!(du, u, p, t)
    du[1] = 10.0 * (u[2] - u[1])
    du[2] = u[1] * (28.0 - u[3]) - u[2]
    du[3] = u[1] * u[2] - (8 / 3) * u[3]
end

# solve and take data
prob = ODEProblem(lorenz!, [1.0, 0.0, 0.0], (0.0, 200.0))
data = solve(prob, ABM54(); dt=0.02)
data = reduce(hcat, data.u)
3×10001 Matrix{Float64}:
 1.0  0.868014    0.846922   0.912746   …  10.0936  10.8484  11.5179  12.0466
 0.0  0.511611    0.972162   1.43659       13.9941  14.4536  14.5604  14.2441
 0.0  0.00464843  0.0167238  0.0363891     23.6141  25.297   27.1539  29.0613

Now we split the data into training and testing sets. For an autoregressive forecast we want the model to be trained to predict the next step, so we shift the target data by one. Additionally, we discard the initial transient period.

#determine shift length, training length and prediction length
shift = 300
train_len = 5000
predict_len = 1250

#split the data accordingly
input_data = data[:, shift:(shift + train_len - 1)]
target_data = data[:, (shift + 1):(shift + train_len)]
test_data = data[:, (shift + train_len + 1):(shift + train_len + predict_len)]
3×1250 Matrix{Float64}:
  5.97702   6.52739   7.12578   7.77037  …  -1.11904  -1.33419  -1.5963
  8.6124    9.39855  10.2371   11.0985      -2.09217  -2.51955  -3.04071
 20.2622   20.3067   20.5579   21.0389       8.8609    8.45571   8.09576

It is important to note that the data must be formatted as a matrix with features as rows and time steps as columns, as in this example. This is required even if the time series has a single feature (i.e., the data could be represented by a vector).
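
Even a univariate series therefore needs to be arranged as a 1 × n matrix before being passed to the model. A plain-Julia sketch (the `series` here is an arbitrary stand-in):

```julia
# A univariate series stored as a vector...
series = sin.(0.0:0.1:10.0)           # 101-element Vector{Float64}

# ...must be reshaped into a 1 × n_steps matrix before use
series_matrix = reshape(series, 1, :)
size(series_matrix)                   # (1, 101)
```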

Building the Echo State Network

Once the data is ready, it is possible to define the parameters for the ESN and the ESN struct itself. In this example, the values from (Pathak et al., 2017) are loosely followed as general guidelines.

using ReservoirComputing

#define ESN parameters
res_size = 300
in_size = 3
res_radius = 1.2
res_sparsity = 6 / 300
input_scaling = 0.1

#build ESN struct
esn = ESN(in_size, res_size, in_size; #autoregressive so in_size = out_size
    init_reservoir = rand_sparse(; radius = res_radius, sparsity = res_sparsity),
    init_input = weighted_init(; scaling = input_scaling),
    state_modifiers = NLAT2,
    leak_coefficient=1.0, # default value
    init_state = randn32, # default value
    use_bias=false, # default value
    init_bias = zeros32, # default value, not used since use_bias=false
    readout_activation=identity, # default value
)
ESN(
    reservoir = StatefulLayer(ESNCell(3 => 300, use_bias=false)),
    state_modifiers = (WrappedFunction(NLAT2)),
    readout = LinearReadout(300 => 3, use_bias=false, include_collect=true)
)

In this case, a size of 300 has been chosen, so the reservoir matrix will be 300 × 300. However, this is not always the case, since some input layer constructions can modify the dimensions of the reservoir. Please make sure to read the API documentation of the initializer you intend to use if you think that is the cause of errors.

The res_radius determines the scaling of the spectral radius of the reservoir matrix; a proper scaling is necessary to ensure the echo state property. The default value in the rand_sparse method is 1.0, in accordance with the most commonly followed guidelines found in the literature (see (Lukoševičius, 2012) and references therein).
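
To illustrate what rescaling to a target spectral radius means, here is a plain-Julia sketch; the library's initializer handles this internally, so this is purely conceptual:

```julia
using LinearAlgebra

# Rescale an arbitrary square matrix so that its spectral radius
# (largest eigenvalue magnitude) equals a target value.
W = randn(50, 50)
target_radius = 1.2
current_radius = maximum(abs.(eigvals(W)))
W_scaled = W .* (target_radius / current_radius)

maximum(abs.(eigvals(W_scaled)))   # ≈ 1.2
```

Because eigenvalues scale linearly with the matrix, multiplying by `target_radius / current_radius` yields exactly the desired spectral radius.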

The value of input_scaling determines the upper and lower bounds of the uniform distribution from which the weights in weighted_init are drawn; the value of 0.1 used here is the default. The default input layer initializer is scaled_rand, a dense matrix. The details of the weighted version can be found in (Lu et al., 2017). For this example, this version returns the best results.

Training and Prediction

Training for ESNs usually means solving a linear regression. The library supports solvers from MLJLinearModels.jl, in addition to a custom implementation of ridge regression StandardRidge. In this example we will use the latter.
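
As a rough sketch of what ridge training computes (illustrative only; StandardRidge handles this internally), the readout weights solve a regularized least-squares problem with the closed-form solution W_out = Y Xᵀ (X Xᵀ + λI)⁻¹, where X holds the collected states and Y the targets:

```julia
using LinearAlgebra

# Stand-in data: states X (features × steps) and targets Y generated
# from a known linear map, so we can check recovery.
X = randn(50, 2000)
W_true = randn(3, 50)
Y = W_true * X

# Closed-form ridge regression with a small regularization λ.
λ = 1e-6
W_out = (Y * X') / (X * X' + λ * I)

maximum(abs.(W_out - W_true))   # tiny: the true map is recovered
```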

Since ReservoirComputing.jl builds on LuxCore.jl, we first need to set up the parameters and the state:

using Random
Random.seed!(42)
rng = MersenneTwister(17)

ps, st = setup(rng, esn)
((reservoir = (input_matrix = Float32[0.07099056 0.0 0.0; 0.044342138 0.0 0.0; … ; 0.0 0.0 0.04174912; 0.0 0.0 -0.017985344], reservoir_matrix = Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]), states_modifiers = (NamedTuple(),), readout = (weight = Float32[0.14745247 0.31642652 … 0.79039073 0.70270693; 0.50063944 0.76482 … 0.04451108 0.991169; 0.54582596 0.87618923 … 0.6872307 0.841046],)), (reservoir = (cell = (rng = Random.MersenneTwister(17, (0, 222892, 221442, 206)),), carry = nothing), states_modifiers = (NamedTuple(),), readout = NamedTuple()))

Now we can proceed with training the ESN model. Usually an initial transient is discarded to allow the dynamics of the ESN to settle. This can be done by passing the washout keyword argument to train!.
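
Conceptually, a nonzero washout simply drops the first columns of the collected states and targets before the regression is fit. A plain-Julia sketch with arbitrary stand-in arrays:

```julia
# Discard the first `washout` time steps before fitting the readout.
washout = 100
states = randn(300, 5000)    # stand-in for collected reservoir states
targets = randn(3, 5000)     # stand-in for the training targets

states_kept = states[:, (washout + 1):end]
targets_kept = targets[:, (washout + 1):end]

size(states_kept)   # (300, 4900)
```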

#define training method
training_method = StandardRidge(0.0)

ps, st = train!(esn, input_data, target_data, ps, st, training_method;
    washout = 0 # we use no washout
)
((reservoir = (input_matrix = Float32[0.07099056 0.0 0.0; 0.044342138 0.0 0.0; … ; 0.0 0.0 0.04174912; 0.0 0.0 -0.017985344], reservoir_matrix = Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]), states_modifiers = (NamedTuple(),), readout = (weight = [1.9344386565831162 1.6562549803918925 … -3.1849580651295346 -0.344731530029857; -3.3205596387187573 -0.5413116818330274 … -1.3444708046782436 -0.0739779239885029; -1.1705776872051683 -4.228598039588664 … -22.01006845321229 -6.156995779271625],)), (reservoir = (cell = (rng = Random.MersenneTwister(17, (0, 222892, 221442, 511)),), carry = ([0.783705692014893; 0.24665530843663533; … ; 0.9041354192988876; -0.6778029687722302;;],)), states_modifiers = (NamedTuple(),), readout = NamedTuple(), states = [-0.8906872435716704 -0.4978971528997346 … 0.7967611124768925 0.783705692014893; 0.0643670784276153 -0.2940836204726356 … 0.23875726643393388 0.24665530843663533; … ; 0.11395078172130134 -0.11604497671972241 … -0.13293119376425544 -0.13121365533741364; 0.03308747271752237 -0.5642452857900837 … -0.6845912857664761 -0.6778029687722302]))

ps now contains the trained parameters for the ESN.

Returning training states

The ESN states are used internally during training, but they are not returned by default. To inspect the states, set the boolean keyword argument return_states to true in the train! call.

(ps, st), states = train!(esn, input_data, target_data, ps, st, training_method;
    return_states = true
)

ReservoirComputing.jl provides additional utility functions for autoregressive forecasting:

output, st = predict(esn, predict_len, ps, st; initialdata=test_data[:, 1])
([6.6058979731365035 7.102524237384551 … -15.39348471913014 -15.994717587944182; 9.109339693261244 9.464431614926621 … -19.736392121400165 -17.508016759429296; 20.110584868144606 20.290597309889797 … 31.73396849608948 35.8293362526921], (reservoir = (cell = (rng = Random.MersenneTwister(17, (0, 222892, 221442, 511)),), carry = ([-0.031201269660067994; -0.32493621528452243; … ; 0.9661545685365969; -0.6108011781280058;;],)), states_modifiers = (NamedTuple(),), readout = NamedTuple()))
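
Autoregressive forecasting means each prediction is fed back as the next input. As a conceptual sketch (not the library's implementation), with `step` a hypothetical stand-in for one forward pass of the trained model:

```julia
# Generic autoregressive loop: feed each output back as the next input.
function autoregress(step, u0, n_steps)
    out = zeros(length(u0), n_steps)
    u = u0
    for t in 1:n_steps
        u = step(u)        # one-step-ahead prediction
        out[:, t] = u
    end
    return out
end

# Toy linear "model" for illustration only.
A = [0.9 0.1 0.0; 0.0 0.9 0.1; 0.1 0.0 0.9]
traj = autoregress(u -> A * u, [1.0, 0.0, 0.0], 10)
size(traj)   # (3, 10)
```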

To inspect the results, they can easily be plotted using an external library. In this case, we will use Plots.jl:

using Plots, Plots.PlotMeasures

ts = 0.0:0.02:200.0
lorenz_maxlyap = 0.9056
predict_ts = ts[(shift + train_len + 1):(shift + train_len + predict_len)]
lyap_time = (predict_ts .- predict_ts[1]) * (1 / lorenz_maxlyap)

p1 = plot(lyap_time, [test_data[1, :] output[1, :]]; label=["actual" "predicted"],
    ylabel="x(t)", linewidth=2.5, xticks=false, yticks=-15:15:15);
p2 = plot(lyap_time, [test_data[2, :] output[2, :]]; label=["actual" "predicted"],
    ylabel="y(t)", linewidth=2.5, xticks=false, yticks=-20:20:20);
p3 = plot(lyap_time, [test_data[3, :] output[3, :]]; label=["actual" "predicted"],
    ylabel="z(t)", linewidth=2.5, xlabel="max(λ)*t", yticks=10:15:40);

plot(p1, p2, p3; plot_title="Lorenz System Coordinates",
    layout=(3, 1), xtickfontsize=12, ytickfontsize=12, xguidefontsize=15,
    yguidefontsize=15,
    legendfontsize=12, titlefontsize=20)
[Plot: x(t), y(t), z(t) of actual vs. predicted trajectories over Lyapunov time]

References