## Random forests surrogate tutorial

Note

This surrogate requires the 'SurrogatesRandomForest' module which can be added by inputting "]add SurrogatesRandomForest" from the Julia command line.

Random forests is a supervised learning algorithm that randomly creates and merges multiple decision trees into one forest.

We are going to use a Random forests surrogate to optimize $f(x)=sin(x)+sin(10/3 * x)$.

First of all import Surrogates and Plots.

using Surrogates
using SurrogatesRandomForest
using Plots
default()

### Sampling

We choose to sample f in 4 points between 0 and 1 using the sample function. The sampling points are chosen using a Sobol sequence, this can be done by passing SobolSample() to the sample function.

f(x) = sin(x) + sin(10/3 * x)
n_samples = 5
lower_bound = 2.7
upper_bound = 7.5
x = sample(n_samples, lower_bound, upper_bound, SobolSample())
y = f.(x)
scatter(x, y, label="Sampled points", xlims=(lower_bound, upper_bound))
plot!(f, label="True function", xlims=(lower_bound, upper_bound), legend=:top)

### Building a surrogate

With our sampled points we can build the Random forests surrogate using the RandomForestSurrogate function.

randomforest_surrogate behaves like an ordinary function which we can simply plot. Addtionally you can specify the number of trees created using the parameter num_round

num_round = 2
randomforest_surrogate = RandomForestSurrogate(x ,y ,lower_bound, upper_bound, num_round = 2)
plot(x, y, seriestype=:scatter, label="Sampled points", xlims=(lower_bound, upper_bound), legend=:top)
plot!(f, label="True function",  xlims=(lower_bound, upper_bound), legend=:top)
plot!(randomforest_surrogate, label="Surrogate function",  xlims=(lower_bound, upper_bound), legend=:top)

### Optimizing

Having built a surrogate, we can now use it to search for minimas in our original function f.

To optimize using our surrogate we call surrogate_optimize method. We choose to use Stochastic RBF as optimization technique and again Sobol sampling as sampling technique.

@show surrogate_optimize(f, SRBF(), lower_bound, upper_bound, randomforest_surrogate, SobolSample())
scatter(x, y, label="Sampled points")
plot!(f, label="True function",  xlims=(lower_bound, upper_bound), legend=:top)
plot!(randomforest_surrogate, label="Surrogate function",  xlims=(lower_bound, upper_bound), legend=:top)

## Random Forest ND

First of all we will define the Bukin Function N. 6 function we are going to build surrogate for.

function bukin6(x)
x1=x
x2=x
term1 = 100 * sqrt(abs(x2 - 0.01*x1^2));
term2 = 0.01 * abs(x1+10);
y = term1 + term2;
end
bukin6 (generic function with 1 method)

### Sampling

Let's define our bounds, this time we are working in two dimensions. In particular we want our first dimension x to have bounds -5, 10, and 0, 15 for the second dimension. We are taking 50 samples of the space using Sobol Sequences. We then evaluate our function on all of the sampling points.

n_samples = 50
lower_bound = [-5.0, 0.0]
upper_bound = [10.0, 15.0]

xys = sample(n_samples, lower_bound, upper_bound, SobolSample())
zs = bukin6.(xys);
50-element Vector{Float64}:
194.9863428566373
337.5008932148196
50.077894542763865
278.1482001224633
144.741627014022
297.34172797144976
236.82569468282983
364.77730599103734
107.31580991739821
288.43082001226617
⋮
321.7658205306883
245.71026580746002
379.2949065629214
224.57861320373783
346.1622092497987
107.16557646894123
294.1946966016903
128.62475225964087
301.415259049581

### Building a surrogate

Using the sampled points we build the surrogate, the steps are analogous to the 1-dimensional case.

using SurrogatesRandomForest
RandomForest = RandomForestSurrogate(xys, zs,  lower_bound, upper_bound)
p1 = surface(x, y, (x, y) -> RandomForest([x y])) # hide
scatter!(xs, ys, zs, marker_z=zs) # hide
p2 = contour(x, y, (x, y) -> RandomForest([x y])) # hide
scatter!(xs, ys, marker_z=zs) # hide
plot(p1, p2, title="Surrogate") # hide

### Optimizing

With our surrogate we can now search for the minimas of the function.

Notice how the new sampled points, which were created during the optimization process, are appended to the xys array. This is why its size changes.

size(xys)
(50,)
surrogate_optimize(bukin6, SRBF(), lower_bound, upper_bound, RandomForest, SobolSample(), maxiters=20)
size(xys)
(50,)
p1 = surface(x, y, (x, y) -> RandomForest([x y])) # hide
xs = [xy for xy in xys] # hide
ys = [xy for xy in xys] # hide
zs = bukin6.(xys) # hide
scatter!(xs, ys, zs, marker_z=zs) # hide
p2 = contour(x, y, (x, y) -> RandomForest([x y])) # hide
scatter!(xs, ys, marker_z=zs) # hide
plot(p1, p2) # hide