Is there a way to add more importance to points which are more recent when analyzing data with xgboost?
3 Answers
43
Just add weights based on your time labels to your xgb.DMatrix. The following example is written in R, but the same principle applies to xgboost in Python or Julia.
library(xgboost)

data <- data.frame(feature = rep(5, 5),
                   year = seq(2011, 2015),
                   target = c(1, 0, 1, 0, 0))

# Linear decay: the most recent year gets weight 1, each older year 0.05 less
weightsData <- 1 + (data$year - max(data$year)) * 5 * 0.01

# Now create the xgboost matrix with your data and weights
xgbMatrix <- xgb.DMatrix(as.matrix(data$feature),
                         label = data$target,
                         weight = weightsData)
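The same linear decay can be sketched in plain Python (variable names here are illustrative, not from any library): the newest year gets weight 1.0 and each older year loses 0.05.

```python
# Mirror of the R formula: weight = 1 + (year - max(year)) * 5 * 0.01
years = [2011, 2012, 2013, 2014, 2015]
max_year = max(years)
weights = [1 + (y - max_year) * 5 * 0.01 for y in years]
# oldest -> newest: approximately 0.8, 0.85, 0.9, 0.95, 1.0
```

This list can then be passed as the `weight` argument of `xgb.DMatrix` in the Python package as well.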
wacax
24
In Python you have a nice scikit-learn wrapper, so you can write simply:
import xgboost as xgb

# sample_weight takes one weight per training row,
# e.g. larger values for more recent points
exgb_classifier = xgb.XGBClassifier()
exgb_classifier.fit(X, y, sample_weight=sample_weights_data)
More information is available in the documentation: http://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBClassifier.fit
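One way to build `sample_weight` for the `fit()` call above is an exponential decay on row age. The helper below is a hypothetical sketch, not part of xgboost; the half-life of 2 years is an arbitrary illustrative choice.

```python
def recency_weights(years, half_life=2.0):
    """Weight 1.0 for the newest year, halving every `half_life` years."""
    newest = max(years)
    return [0.5 ** ((newest - y) / half_life) for y in years]

weights = recency_weights([2011, 2013, 2015])
# newest year gets 1.0; 2013 gets 0.5; 2011 gets 0.25
```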
lucidyan
15
You could try building multiple xgboost models, with some of them limited to more recent data, then weighting those results together. Another idea would be a customized evaluation metric that penalizes errors on recent points more heavily, which would give them more importance during training.
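The first idea could be blended as below: a sketch, assuming one model trained on all data and one on recent data only, with predictions combined by a weighted average. The 0.7 share for the recent-data model is illustrative, not a recommendation, and the prediction lists stand in for fitted boosters' outputs.

```python
def blend(pred_all, pred_recent, recent_share=0.7):
    """Weighted average of two prediction vectors, favoring the recent model."""
    return [(1 - recent_share) * a + recent_share * r
            for a, r in zip(pred_all, pred_recent)]

blended = blend([0.2, 0.6], [0.4, 0.8])
```

In practice the blend weight could itself be tuned on a validation set drawn from the most recent period.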
TBSRounder