Hyperparameter Optimization
Elliot supports hyperparameter optimization by integrating the HyperOpt library and extending it with exhaustive grid search.
Before continuing, let us recall how to include a recommendation model in an experiment:
experiment:
  models:
    PMF:
      meta:
        hyper_max_evals: 20
        hyper_opt_alg: tpe
        validation_rate: 1
        verbose: True
        save_weights: True
        save_recs: True
        validation_metric: nDCG@10
      lr: 0.0025
      epochs: 2
      factors: 50
      batch_size: 512
      reg: 0.0025
      reg_b: 0
      gaussian_variance: 0.1
As we can observe, the meta section contains two fields related to hyperparameter optimization: hyper_max_evals and hyper_opt_alg.
hyper_opt_alg is a string field that defines the hyperparameter tuning strategy. It can assume one of the following values: grid, tpe, atpe, rand, mix, and anneal.
grid corresponds to exhaustive grid search
tpe stands for Tree of Parzen Estimators, a type of Bayesian optimization (see the paper)
atpe stands for Adaptive Tree of Parzen Estimators
rand stands for random sampling in the search space
mix stands for mixture of search algorithms
anneal stands for simulated annealing
hyper_max_evals is an int field that, where applicable (all strategies but grid), defines the number of samples to consider for hyperparameter evaluation.
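The constraints on these two fields can be summarized in a small check. This is a minimal sketch, not Elliot's own code: the function name check_meta and its error messages are hypothetical, and only the field names and allowed values come from the description above. Falling back to grid when hyper_opt_alg is missing mirrors the grid-search shortcut this page describes.

```python
# Hypothetical helper, not part of Elliot: validates the two
# hyperparameter-optimization fields of a model's meta section.
ALLOWED_ALGS = {"grid", "tpe", "atpe", "rand", "mix", "anneal"}


def check_meta(meta):
    """Return the (strategy, max_evals) pair described by a meta dict."""
    # Assumption: a missing hyper_opt_alg means exhaustive grid search.
    alg = meta.get("hyper_opt_alg", "grid")
    if alg not in ALLOWED_ALGS:
        raise ValueError(f"unknown hyper_opt_alg: {alg!r}")
    if alg == "grid":
        # Grid search enumerates every combination, so no sample budget applies.
        return alg, None
    max_evals = meta.get("hyper_max_evals")
    if not isinstance(max_evals, int) or max_evals < 1:
        raise ValueError("hyper_max_evals must be a positive int")
    return alg, max_evals


print(check_meta({"hyper_opt_alg": "tpe", "hyper_max_evals": 20}))
```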
Once we choose the search strategy, we need to define the search space. To this end, Elliot provides two alternatives: a value list, and a function-parameters pair.
In the former case, we just need to provide a list of values for the parameter we want to optimize:
experiment:
  models:
    PMF:
      meta:
        hyper_max_evals: 20
        hyper_opt_alg: tpe
      lr: 0.0025
      epochs: 2
      factors: 50
      batch_size: 512
      reg: [0.0025, 0.005, 0.01]
      reg_b: 0
      gaussian_variance: 0.1
In the latter case, we can choose among the search space functions provided by HyperOpt: choice, randint, uniform, quniform, loguniform, qloguniform, normal, qnormal, lognormal, qlognormal. Each function and its parameters are documented in the Parameter Expressions section of the HyperOpt documentation.
Note that the label argument is handled internally, so you do not have to provide it.
Teaching Elliot to sample from any of these search spaces is straightforward: we assign to the parameter a list whose first element is the function name and whose remaining elements are the function's arguments.
An example of the syntax to define a search with loguniform for the learning rate parameter (lr) is:
experiment:
  models:
    PMF:
      meta:
        hyper_max_evals: 20
        hyper_opt_alg: tpe
      lr: [loguniform, -10, -1]
      epochs: 2
      factors: 50
      batch_size: 512
      reg: [0.0025, 0.005, 0.01]
      reg_b: 0
      gaussian_variance: 0.1
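In HyperOpt, loguniform takes its bounds in log space: a draw is exp(uniform(low, high)), so [loguniform, -10, -1] yields learning rates between e^-10 (about 4.5e-5) and e^-1 (about 0.37). The sketch below illustrates this with the Python standard library only. The name sample_from_spec is hypothetical, the function covers just two of the listed search space functions, and it is not how Elliot or HyperOpt implement sampling.

```python
import math
import random


def sample_from_spec(spec, rng):
    """Draw one value from a [function_name, *args] search-space list."""
    name, *args = spec  # first element: function name; rest: its arguments
    if name == "uniform":
        low, high = args
        return rng.uniform(low, high)
    if name == "loguniform":
        low, high = args  # bounds are exponents, not the values themselves
        return math.exp(rng.uniform(low, high))
    raise ValueError(f"function not covered by this sketch: {name}")


rng = random.Random(0)
lr_samples = [sample_from_spec(["loguniform", -10, -1], rng) for _ in range(5)]
# Check that every sample lies between exp(-10) and exp(-1).
print(all(math.exp(-10) <= lr <= math.exp(-1) for lr in lr_samples))
```

A practical consequence: to search learning rates between 0.0001 and 0.1, the bounds to write are log(0.0001) and log(0.1), roughly -9.21 and -2.30, not the raw values.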
Finally, Elliot provides a shortcut to perform an exhaustive grid search.
We can omit the hyper_opt_alg and hyper_max_evals fields and directly insert the lists of possible values for the parameters to optimize:
experiment:
  models:
    PMF:
      meta:
        validation_rate: 1
        verbose: True
        save_weights: True
        save_recs: True
        validation_metric: nDCG@10
      lr: [0.0025, 0.005, 0.01]
      epochs: 50
      factors: [10, 50, 100]
      batch_size: 512
      reg: [0.0025, 0.005, 0.01]
      reg_b: 0
      gaussian_variance: 0.1
In this case, Elliot recognizes that hyperparameter optimization is needed and automatically performs the grid search.
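To get a sense of the cost, the configuration above lists three candidate values each for lr, factors, and reg, so an exhaustive grid contains 3 × 3 × 3 = 27 combinations. The following sketch enumerates them with itertools.product. It only illustrates what exhaustive grid search means, and expand_grid is a hypothetical helper rather than Elliot's implementation.

```python
from itertools import product


def expand_grid(params):
    """Yield one dict per combination of the list-valued parameters."""
    # Parameters given as lists are searched; scalars stay fixed.
    searched = {k: v for k, v in params.items() if isinstance(v, list)}
    fixed = {k: v for k, v in params.items() if not isinstance(v, list)}
    names = list(searched)
    for values in product(*(searched[n] for n in names)):
        yield {**fixed, **dict(zip(names, values))}


pmf_params = {
    "lr": [0.0025, 0.005, 0.01],
    "epochs": 50,
    "factors": [10, 50, 100],
    "batch_size": 512,
    "reg": [0.0025, 0.005, 0.01],
    "reg_b": 0,
    "gaussian_variance": 0.1,
}
configs = list(expand_grid(pmf_params))
print(len(configs))  # 27
```

Since the grid grows multiplicatively with every additional list, a sampled strategy with a fixed hyper_max_evals budget may be preferable once several parameters are searched at the same time.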