
I train a simple XGBoost classifier with the following lines:

import xgboost as xgb

xgb_model = xgb.XGBClassifier(objective="binary:logistic", random_state=42)
xgb_model.fit(X_train, y_train)

# X_test_2 is a subset of X_test_1
ypred_1 = xgb_model.predict(X_test_1)
ypred_2 = xgb_model.predict(X_test_2)

Then I predict on two test data sets, where X_test_2 is a subset of X_test_1. For some samples that are identical in both data sets, the model gives different predictions. If I run the predictions in batches, which samples differ changes with the batch size. Only when I predict both test data sets one sample at a time are the predictions identical for all samples. I observed the same behavior when using xgboost.DMatrix.

Does anyone have an explanation for this behavior?


2 Answers


The mistake was using the DMatrix data type with categorical data in the dataset. I split the data first and then stored each split in a DMatrix with enable_categorical=True. Since the category encoding is inferred from the values actually present, the categorical data was encoded differently in the sub-dataset than in the entire test dataset, and because of that the predictions were different.
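To see why the encodings diverge, here is a minimal pandas illustration (toy data, not from my actual set):

import pandas as pd

full = pd.Series(["a", "b", "c"]).astype("category")
subset = pd.Series(["b", "c"]).astype("category")

print(full.cat.codes.tolist())    # [0, 1, 2]
print(subset.cat.codes.tolist())  # [0, 1] -- "b" is code 1 in the full set but 0 in the subset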

The better way to go: encode the categorical columns before splitting, and then store the splits in a DMatrix with enable_categorical=False.
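A minimal sketch of that order of operations (toy data; the column names and the simple integer-code encoding are just placeholders for whatever encoding you use):

import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Toy frame with one categorical and one numeric feature
df = pd.DataFrame({
    "catcol": ["a", "b", "c", "a", "b", "c", "a", "b"],
    "num": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "target": [0, 1, 0, 1, 0, 1, 0, 1],
})

# Encode on the FULL data set, so every later split shares the same codes
df["catcol"] = df["catcol"].astype("category").cat.codes

X = df[["catcol", "num"]]
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The codes are plain integers now, so no categorical handling is needed
dtrain = xgb.DMatrix(X_train, label=y_train, enable_categorical=False)
dtest = xgb.DMatrix(X_test, enable_categorical=False)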


I had this issue, too. I decided to use pandas and construct a CategoricalDtype, but this has to be done at training AND at prediction. The problem is that pandas can only encode the categories it actually sees when you call astype('category'). That is usually fine at training, where all possible values of a feature appear, but at prediction you often don't have examples for every possible feature category.

Instead of this:

df['catcol'].astype('category')

Do this:

df['catcol'].astype(pandas.CategoricalDtype(list(range(1, 10)), ordered=True))

Then you can create your DMatrix with enable_categorical=True.
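Putting it together, a minimal sketch (toy data; assumes an XGBoost version that supports categorical features in DMatrix):

import pandas as pd
import xgboost as xgb

# Declare the full category space once and reuse it everywhere
# (values 1..9 as in the snippet above; adjust to your feature's domain)
cat_type = pd.CategoricalDtype(list(range(1, 10)), ordered=True)

train_df = pd.DataFrame({"catcol": [1, 2, 3, 4, 5, 6, 7, 8, 9],
                         "target": [0, 1, 0, 1, 0, 1, 0, 1, 0]})
test_df = pd.DataFrame({"catcol": [2, 5]})  # only some categories appear

# Apply the SAME dtype at training and at prediction time
train_df["catcol"] = train_df["catcol"].astype(cat_type)
test_df["catcol"] = test_df["catcol"].astype(cat_type)

dtrain = xgb.DMatrix(train_df[["catcol"]], label=train_df["target"],
                     enable_categorical=True)
dtest = xgb.DMatrix(test_df[["catcol"]], enable_categorical=True)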