
I've always been taught that decreasing the learning rate in GBDT models such as XGBoost, LightGBM, and CatBoost improves out-of-sample performance, assuming the number of boosting iterations is increased accordingly and all else is equal.

However, I see many people (including some Kaggle competition winners) include a search space for the learning rate when tuning hyperparameters through a grid search or with an algorithm like Optuna; for example, they might specify a search space of 0.01 to 0.2 for the learning rate. This does not make sense to me: if a lower learning rate always leads to better performance, then a search space for the learning rate should add no value, because the tuner should always gravitate towards the lower bound.
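For concreteness, here is a minimal sketch of the tuning pattern I mean, using Optuna with XGBoost. The 0.01 to 0.2 range matches the example above; the toy dataset, the extra max_depth parameter, and all other values are illustrative assumptions on my part, not anyone's actual winning setup:

```python
import optuna
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Toy data, purely for illustration.
X, y = make_regression(n_samples=2000, n_features=20, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

def objective(trial):
    # Learning rate searched on a log scale over the 0.01-0.2 range
    # mentioned above; max_depth is an illustrative extra parameter.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.2, log=True),
        "max_depth": trial.suggest_int("max_depth", 3, 8),
    }
    model = xgb.XGBRegressor(
        n_estimators=500,
        early_stopping_rounds=20,
        objective="reg:squarederror",
        **params,
    )
    model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
    return mean_squared_error(y_valid, model.predict(X_valid))

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=30)
print(study.best_params)  # often ends up near the lower bound, per my reasoning
```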

This leads me to the following questions:

  • Does a lower learning rate always lead to better out-of-sample performance, assuming the number of iterations is increased accordingly and all else is equal, or can out-of-sample performance also deteriorate with a lower learning rate?
  • Does it make sense to tune the learning rate through a grid search or Optuna? If a lower learning rate always leads to better performance, it would make more sense to me to fix the learning rate at the lowest value I can afford (taking into account the size of my dataset and the computational resources available) instead of specifying a search space for it (see the sketch after this list).
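
Here is a sketch of the alternative I have in mind: fix the learning rate at a small value, set a generous cap on the number of trees, and let early stopping on a validation set pick the effective number of iterations. The specific values (0.01, 10,000 trees, 50 rounds) and the toy data are assumptions for illustration, not recommendations:

```python
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=20, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

model = xgb.XGBRegressor(
    learning_rate=0.01,        # fixed at a low value, not tuned
    n_estimators=10_000,       # generous cap on boosting rounds
    early_stopping_rounds=50,  # validation loss chooses the stopping point
    objective="reg:squarederror",
)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
print("boosting rounds actually used:", model.best_iteration)
```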
