Lorenz System Forecasting
This example expands on the readme Lorenz system forecasting to better showcase how to use the methods and functions provided in the library for Echo State Networks. The prediction method used here is Generative; for a more detailed explanation of the differences between Generative and Predictive, please refer to the other examples given in the documentation.
Generating the data
Starting off the workflow, the first step is to obtain the data. Leveraging OrdinaryDiffEq, it is possible to derive the Lorenz system data in the following way:
using OrdinaryDiffEq
#define lorenz system
function lorenz!(du, u, p, t)
du[1] = 10.0 * (u[2] - u[1])
du[2] = u[1] * (28.0 - u[3]) - u[2]
du[3] = u[1] * u[2] - (8 / 3) * u[3]
end
#solve and take data
prob = ODEProblem(lorenz!, [1.0, 0.0, 0.0], (0.0, 200.0))
data = solve(prob, ABM54(); dt=0.02)
data = reduce(hcat, data.u)
3×10001 Matrix{Float64}:
1.0 0.868014 0.846922 0.912746 … -5.41248 -4.48696 -3.72246
0.0 0.511611 0.972162 1.43659 -0.380216 -0.265298 -0.291495
0.0 0.00464843 0.0167238 0.0363891 29.841 28.3209 26.8714
After obtaining the data, it is necessary to determine the kind of prediction for the model. Since this example will use the Generative prediction type, the target data will be the input data shifted forward by one step. In addition, it is important to notice that the Lorenz system just obtained presents a transient period that is not representative of the general behavior of the system. This can easily be discarded by setting a shift parameter.
#determine shift length, training length and prediction length
shift = 300
train_len = 5000
predict_len = 1250
#split the data accordingly
input_data = data[:, shift:(shift + train_len - 1)]
target_data = data[:, (shift + 1):(shift + train_len)]
test_data = data[:, (shift + train_len + 1):(shift + train_len + predict_len)]
3×1250 Matrix{Float64}:
4.2235 3.83533 3.55846 3.38262 … 9.44212 10.0034 10.5085
1.98782 2.18137 2.43519 2.7396 12.3382 12.6956 12.8396
25.6556 24.4858 23.3794 22.3394 24.3255 25.4374 26.6729
It is important to notice that the data needs to be formatted as a matrix with the features as rows and the time steps as columns, as in this example. This is needed even if the time series consists of a single feature.
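For instance, a univariate series stored as a vector would first be reshaped into a one-row matrix before being passed to the ESN. A minimal sketch (the sine series here is purely illustrative):
#reshape a single-feature series from a vector into a 1 × n matrix
single_series = sin.(0.0:0.02:10.0)
single_input = reshape(single_series, 1, :)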
Building the Echo State Network
Once the data is ready, it is possible to define the parameters for the ESN and the ESN struct itself. In this example, the values from (Pathak et al., 2017) are loosely followed as general guidelines.
using ReservoirComputing
#define ESN parameters
res_size = 300
in_size = 3
res_radius = 1.2
res_sparsity = 6 / 300
input_scaling = 0.1
#build ESN struct
esn = ESN(input_data, in_size, res_size;
reservoir=rand_sparse(; radius=res_radius, sparsity=res_sparsity),
input_layer=weighted_init(; scaling=input_scaling),
reservoir_driver=RNN(),
nla_type=NLADefault(),
states_type=StandardStates())
ESN(3 => 300)
Most of the parameters chosen here mirror the default ones, so passing them explicitly is not necessary; the readme example is identical to this one except that it relies on those defaults. Going line by line to see what is happening, starting from res_size: this value determines the dimensions of the reservoir matrix. In this case, a size of 300 has been chosen, so the reservoir matrix will be 300 x 300. This is not always the case, since some input layer constructions can modify the dimensions of the reservoir, but in that case, everything is taken care of internally.
The res_radius determines the scaling of the spectral radius of the reservoir matrix; a proper scaling is necessary to ensure the Echo State Property. The default value in the rand_sparse method is 1.0, in accordance with the most commonly followed guidelines found in the literature (see (Lukoševičius, 2012) and references therein). The sparsity of the reservoir matrix is in this case obtained by choosing a degree of connections and dividing it by the reservoir size. Of course, it is also possible to simply choose any value between 0.0 and 1.0 to test behaviors for different sparsity values.
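As a minimal sketch of that alternative (the 0.05 value is purely illustrative), the sparsity can also be passed to rand_sparse directly:
#build a reservoir initializer with a directly chosen sparsity
reservoir_alt = rand_sparse(; radius=1.2, sparsity=0.05)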
The value of input_scaling determines the upper and lower bounds of the uniform distribution of the weights in weighted_init. The value of 0.1 used here is the default. The default input layer is scaled_rand, a dense matrix. The details of the weighted version can be found in (Lu et al., 2017); for this example, this version returns the best results.
The reservoir driver represents the dynamics of the reservoir. In the standard ESN definition, these dynamics are obtained through a Recurrent Neural Network (RNN), and this is reflected by calling the RNN driver for the ESN struct. This option is the default and, unless there is a need to change its parameters, it does not have to be specified. The full equation is the following:
\[\textbf{x}(t+1) = (1-\alpha)\textbf{x}(t) + \alpha \cdot \text{tanh}(\textbf{W}\textbf{x}(t)+\textbf{W}_{\text{in}}\textbf{u}(t))\]
where $\alpha$ represents the leaky coefficient and tanh can be replaced by any activation function. Also, $\textbf{x}$ represents the state vector, $\textbf{u}$ the input data, and $\textbf{W}, \textbf{W}_{\text{in}}$ are the reservoir matrix and input matrix, respectively. The default call to the RNN in the library is RNN(; activation_function=tanh, leaky_coefficient=1.0), where the meaning of the parameters is clear from the equation above. Instead of the hyperbolic tangent, any activation function can be used, either leveraging external libraries such as NNlib or creating a custom one.
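To make the update rule concrete, here is a minimal plain-Julia sketch of a single state update implementing the equation above (the library performs this internally; the function name and arguments here are only illustrative):
#one leaky RNN state update: x(t+1) = (1-α)x(t) + α*f(W*x(t) + W_in*u(t))
function rnn_step(x, u, W, W_in; alpha=1.0, f=tanh)
    return (1 - alpha) .* x .+ alpha .* f.(W * x .+ W_in * u)
end
A non-default driver could then be passed as, for example, reservoir_driver=RNN(; activation_function=tanh, leaky_coefficient=0.9).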
The final calls are modifications to the states in training or prediction. The default calls, shown in the example, do not make any modifications to the states. This is the safest bet if one is not sure how these work. The nla_type applies a non-linear algorithm to the states, while the states_type can expand them by concatenating them with the input data, or pad them by concatenating a constant value to all the states. More in-depth descriptions of these parameters are given in other examples in the documentation.
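For illustration, a non-default combination could look like the following sketch (NLAT2 and ExtendedStates are among the options exported by the library; the choice here is arbitrary):
#ESN with a non-linear transform of the states and input-extended states
esn_mod = ESN(input_data, in_size, res_size;
    reservoir=rand_sparse(; radius=res_radius, sparsity=res_sparsity),
    input_layer=weighted_init(; scaling=input_scaling),
    nla_type=NLAT2(),
    states_type=ExtendedStates())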
Training and Prediction
Now that the ESN has been created and all the parameters have been explained, it is time to proceed with the training. The full call of the readme example follows this general idea:
#define training method
training_method = StandardRidge(0.0)
#obtain output layer
output_layer = train(esn, target_data, training_method)
OutputLayer successfully trained with output size: 3
The training returns an OutputLayer struct containing the trained output matrix and other information needed for the prediction. The necessary elements in the train() call are the ESN struct created in the previous step and the target_data, which in this case is the one-step-ahead evolution of the Lorenz system. The training method chosen in this example is the standard one, so an equivalent way of calling the train function here is output_layer = train(esn, target_data), like the readme basic version. Likewise, the default value for the ridge regression parameter is set to zero, so the actual default training is Ordinary Least Squares regression. Other training methods are available and will be explained in the following examples.
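For instance, a small non-zero regularization could be used instead (the 1.0e-6 value is purely illustrative):
#train with a non-zero ridge regression parameter
output_layer_ridge = train(esn, target_data, StandardRidge(1.0e-6))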
Once the OutputLayer has been obtained, the prediction can be done following this procedure:
output = esn(Generative(predict_len), output_layer)
3×1250 Matrix{Float64}:
4.22376 3.83589 3.55846 3.38306 … -1.03036 -1.41823 -1.75763
1.98878 2.18157 2.43625 2.73974 -3.12407 -3.21149 -3.38362
25.6609 24.4894 23.394 22.359 22.2517 21.1782 20.1691
Both the prediction method and the output layer are needed in this call. The number of steps for the prediction must be specified in the Generative method. The output results are given in a matrix.
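For comparison, a sketch of the Predictive counterpart, where the input of each prediction step is supplied explicitly (using the test data as the prediction input is only an assumption for illustration; see the dedicated examples for details):
#Predictive prediction: one output step for every supplied input column
output_pred = esn(Predictive(test_data), output_layer)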
While the states are saved in the ESN struct during training, they are not saved by default during prediction. To inspect the states, it is necessary to pass the boolean keyword argument save_states to the prediction call, in this example using esn(...; save_states=true). This returns a tuple (output, states), where size(states) == (res_size, prediction_len).
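A minimal sketch of that call, using the objects defined above:
#retrieve both the output and the internal states during prediction
output, states = esn(Generative(predict_len), output_layer; save_states=true)
size(states) #(300, 1250) in this example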
To inspect the results, they can easily be plotted using an external library. In this case, Plots is adopted:
using Plots, Plots.PlotMeasures
ts = 0.0:0.02:200.0
lorenz_maxlyap = 0.9056
predict_ts = ts[(shift + train_len + 1):(shift + train_len + predict_len)]
lyap_time = (predict_ts .- predict_ts[1]) * (1 / lorenz_maxlyap)
p1 = plot(lyap_time, [test_data[1, :] output[1, :]]; label=["actual" "predicted"],
ylabel="x(t)", linewidth=2.5, xticks=false, yticks=-15:15:15);
p2 = plot(lyap_time, [test_data[2, :] output[2, :]]; label=["actual" "predicted"],
ylabel="y(t)", linewidth=2.5, xticks=false, yticks=-20:20:20);
p3 = plot(lyap_time, [test_data[3, :] output[3, :]]; label=["actual" "predicted"],
ylabel="z(t)", linewidth=2.5, xlabel="max(λ)*t", yticks=10:15:40);
plot(p1, p2, p3; plot_title="Lorenz System Coordinates",
layout=(3, 1), xtickfontsize=12, ytickfontsize=12, xguidefontsize=15,
yguidefontsize=15,
legendfontsize=12, titlefontsize=20)
References
- Lu, Z.; Pathak, J.; Hunt, B.; Girvan, M.; Brockett, R. and Ott, E. (2017). Reservoir observers: Model-free inference of unmeasured variables in chaotic systems. Chaos: An Interdisciplinary Journal of Nonlinear Science 27.
- Lukoševičius, M. (2012). A Practical Guide to Applying Echo State Networks. In: Neural Networks: Tricks of the Trade (Springer Berlin Heidelberg); pp. 659–686.
- Pathak, J.; Lu, Z.; Hunt, B. R.; Girvan, M. and Ott, E. (2017). Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data. Chaos: An Interdisciplinary Journal of Nonlinear Science 27.