
I'm working on a prediction project where we have a lot of cyclical features, such as hour of the day, weekday, month, and day of year. After some searching I decided to follow the advice here.

Now I have the sin and cos component for every cyclical feature as a separate feature, so month becomes month_sin and month_cos. However, I don't know for sure whether the model can deal with this correlation, as both components need to be equally weighted in order for the feature to make sense. The model assigns different weights to the sin and cos components after training though. My intuition tells me that this is bad, but I'm not sure what to do about it.

Currently gbm (R) gives the best results. For a gradient boosting model, is it better to force equal weights on the two correlated features, or is it better to let the model figure it out even if it results in different weights on the two components? Or would you suggest an entirely different approach?

Stephen Rauch
Armen

1 Answer


> as both components need to be equally weighted in order for the feature to make sense

That is not the case.

For instance, if $\sin(\theta)$ of the cyclical feature is weighted strongly (and positively), it means that the original feature has its strongest positive effect on the output at $\theta = \frac{\pi}{2}$.

If the two features are weighted equally, then the strongest effect is around $\theta = \frac{\pi}{4}$.

In general, you should expect different weights to apply to the different components, depending on which hours of the day maximise the target variable and which minimise it.
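As a rough illustration of why unequal weights are expected, consider the simplest case of a linear combination of the two components. It can be rewritten as a single shifted wave, $w_s \sin(\theta) + w_c \cos(\theta) = R \cos(\theta - \theta^*)$, with $R = \sqrt{w_s^2 + w_c^2}$ and $\theta^* = \operatorname{atan2}(w_s, w_c)$, so the ratio of the weights sets where the peak falls. The sketch below assumes the hour is encoded as $\theta = 2\pi \cdot \text{hour} / 24$; the function names and the example weights are hypothetical. A tree-based model such as gbm does not have literal weights like this, so treat it as intuition only.

```python
import math

def peak_angle(w_sin, w_cos):
    """Angle in [0, 2*pi) at which w_sin*sin(t) + w_cos*cos(t) is largest."""
    return math.atan2(w_sin, w_cos) % (2 * math.pi)

def peak_hour(w_sin, w_cos, period=24.0):
    """Convert the peak angle to a position on a cycle of length `period`."""
    return peak_angle(w_sin, w_cos) / (2 * math.pi) * period

# Hypothetical weights, chosen only for illustration.
for w_sin, w_cos in [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (-1.0, 0.0)]:
    print(f"w_sin={w_sin:+.1f}, w_cos={w_cos:+.1f} -> peak near hour "
          f"{peak_hour(w_sin, w_cos):.1f}")
```

Under this encoding, a weight on the sin component alone puts the peak at hour 6, and equal positive weights put it at hour 3, matching the $\frac{\pi}{2}$ and $\frac{\pi}{4}$ cases above. Forcing equal weights would therefore pin the peak to one particular time of day.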

> For a gradient boosting model, is it better to force equal weights on the two correlated features, or is it better to let the model figure it out even if it results in different weights on the two components?

Definitely better to let the model figure it out in this case.

Your main concern would be whether a one-hot-encoded representation might be better than a cyclical representation for your problem. One-hot encoding allows an arbitrary relationship for each hour, but adds more dimensions, and so may require more examples of each time. A cyclical encoding has fewer dimensions, but is more likely to involve non-linear effects (for instance, if the minimum and maximum effects on the target variable are not exactly 12 hours apart), and thus may require a more complex model.
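To make the trade-off concrete, here is a minimal sketch of both encodings for an hour-of-day feature. The helper names are hypothetical and this is not tied to any particular library. It shows the difference in dimensionality, and that the cyclical encoding places hour 23 next to hour 0 while one-hot treats every pair of distinct hours as equally far apart.

```python
import math

def cyclical_encode(hour, period=24):
    """Two features: the hour mapped onto the unit circle."""
    angle = 2 * math.pi * hour / period
    return [math.sin(angle), math.cos(angle)]

def one_hot_encode(hour, period=24):
    """`period` features: a 1 in the slot for this hour, 0 elsewhere."""
    return [1 if i == hour else 0 for i in range(period)]

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print("dimensions:", len(cyclical_encode(0)), "vs", len(one_hot_encode(0)))

# Compare how far apart hours 23 and 12 are from hour 0 in each encoding.
for other in (23, 12):
    d_cyc = euclidean(cyclical_encode(0), cyclical_encode(other))
    d_hot = euclidean(one_hot_encode(0), one_hot_encode(other))
    print(f"hour 0 vs {other}: cyclical={d_cyc:.3f}, one-hot={d_hot:.3f}")
```

Which representation works better for a given dataset is an empirical question, so it is worth comparing both with cross-validation.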

Neil Slater