
Consider a setting in which I have an unbalanced dataset where the target class takes the value 1 in 0.01% of observations and the value 0 in the remaining 99.99% of observations.

I train a classification model, say `XGBClassifier`, and obtain `predict_proba`, which the documentation describes as:

probability of each X example being of a given class.
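For concreteness, a minimal sketch of what I mean, using a synthetic dataset in place of my real one (`make_classification`, the sample sizes, the class ratio, and the hyperparameters below are just illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Highly imbalanced synthetic data (the exact ratio is just for illustration).
X, y = make_classification(
    n_samples=200_000, n_features=20, weights=[0.999],
    flip_y=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0
)

model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X_train, y_train)

# predict_proba returns one column per class; column 1 is P(class = 1).
proba_pos = model.predict_proba(X_test)[:, 1]
```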

Now, suppose I want to rebalance the classes somewhat by undersampling, and train a second model on data where the target has the value 1 in 10% of cases and the value 0 in the remaining 90% of observations.
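The rebalanced second model would look roughly like this (continuing from the snippet above; the 9:1 undersampling ratio is again just illustrative):

```python
rng = np.random.default_rng(0)

pos_idx = np.flatnonzero(y_train == 1)
neg_idx = np.flatnonzero(y_train == 0)

# Keep all positives and 9x as many negatives -> ~10% positive rate.
neg_keep = rng.choice(neg_idx, size=9 * len(pos_idx), replace=False)
keep = np.concatenate([pos_idx, neg_keep])

model_balanced = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model_balanced.fit(X_train[keep], y_train[keep])

# Same kind of output, but the model was trained on a 10/90 class mix.
proba_pos_balanced = model_balanced.predict_proba(X_test)[:, 1]
```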

Is the interpretation of the predicted probabilities affected by this rebalancing?

Can I still say that if observation x_i gets a predicted probability of 0.4, then it is 40% likely to belong to class 1?

