Hyperopt - A bayesian Parameter Tuning Framework
Recently I was working on a in-class competition from the How to win a data science competition; Coursera course. You can start for free with the 7-day Free Trial. Learned a lot of new things from that about using XGBoost for time series prediction tasks.
The one thing that I tried out in this competition was the Hyperopt package - A bayesian Parameter Tuning Framework. And I was literally amazed. Left the machine with hyperopt in the night. And in the morning I had my results. It was really awesome and I did avoid a lot of hit and trial.
What really is Hyperopt?
From the site:
Hyperopt is a Python library for serial and parallel optimization over awkward search spaces, which may include real-valued, discrete, and conditional dimensions.
What the above means is that it is a optimizer that could minimize/maximize the loss function/accuracy(or whatever metric) for you.
All of us are fairly known to cross-grid search or random-grid search. Hyperopt takes as an input a space of hyperparams in which it will search, and moves according to the result of past trials.
To know more about how it does this, take a look at this paper by J Bergstra. Here is the documentation from github.
How?
Let me just put the code first. This is how I define the objective function. The objective function takes space(the hyperparam space) as the input and returns the loss(The thing you want to minimize.Or negative of the thing you want to maximize)
(X,y) and (Xcv,ycv) are the train and cross validation dataframes respectively.
We have defined a hyperparam space by using the variable space
which is actually just a dictionary. We could choose different distributions for different parameter values.
We use the fmin
function from the hyperopt package to minimize our fn
through the space
.
from sklearn.metrics import mean_squared_error
import xgboost as xgb
from hyperopt import hp, fmin, tpe, STATUS_OK, Trials
import numpy as np
def objective(space):
print(space)
clf = xgb.XGBRegressor(n_estimators =1000,colsample_bytree=space['colsample_bytree'],
learning_rate = .3,
max_depth = int(space['max_depth']),
min_child_weight = space['min_child_weight'],
subsample = space['subsample'],
gamma = space['gamma'],
reg_lambda = space['reg_lambda'],)
eval_set = [( X, y), ( Xcv, ycv)]
clf.fit(X, y,
eval_set=eval_set, eval_metric="rmse",
early_stopping_rounds=10,verbose=False)
pred = clf.predict(Xcv)
mse_scr = mean_squared_error(ycv, pred)
print "SCORE:", np.sqrt(mse_scr)
#change the metric if you like
return {'loss':mse_scr, 'status': STATUS_OK }
space ={'max_depth': hp.quniform("x_max_depth", 4, 16, 1),
'min_child_weight': hp.quniform ('x_min_child', 1, 10, 1),
'subsample': hp.uniform ('x_subsample', 0.7, 1),
'gamma' : hp.uniform ('x_gamma', 0.1,0.5),
'colsample_bytree' : hp.uniform ('x_colsample_bytree', 0.7,1),
'reg_lambda' : hp.uniform ('x_reg_lambda', 0,1)
}
trials = Trials()
best = fmin(fn=objective,
space=space,
algo=tpe.suggest,
max_evals=100,
trials=trials)
print best
Finally:
Running the above gives us pretty good hyperparams for our learning algorithm. In fact I bagged up the results from multiple hyperparam settings and it gave me the best score on the LB. If you like this and would like to get more information about such things, subscribe to the mailing list on the right hand side. Also I would definitely recommend this course about winning Kaggle competitions by Kazanova, Kaggle rank 3 . Do take a look.