
I have an imbalanced dataset. I used SMOTE to oversample the minority class and also undersampled the majority class. Now I want to check the test AUC using the model's predict_proba output.

I have two questions: 1. Do I have to correct the probabilities if I am comparing AUCs? 2. How can I correct them, given the combination of undersampling and oversampling?

Ben Reiniger
anat

1 Answer

  1. No. Any adjustment to the probabilities will presumably be monotonic, so the rank-ordering of the predictions stays the same. AUC depends only on that rank-ordering, so it will be the same too.

  2. See, e.g., https://datascience.stackexchange.com/a/58899/55122
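As a rough illustration of both points, here is a minimal sketch using a simple odds-ratio prior correction. This is one common adjustment and is an assumption here, not necessarily what the linked answer describes. The function names, the 0.5 and 0.05 class rates, and the simulated scores are all made up for the example; in practice the scores would come from `predict_proba(X_test)[:, 1]`.

```python
import random


def prior_corrected(p_resampled, train_pos_rate, orig_pos_rate):
    """Rescale a probability from a model trained on resampled data
    back toward the original class prior (hypothetical helper)."""
    # Ratio of the original positive-class odds to the odds in the
    # resampled training set (after both SMOTE and undersampling).
    ratio = (orig_pos_rate / (1 - orig_pos_rate)) / (
        train_pos_rate / (1 - train_pos_rate)
    )
    num = p_resampled * ratio
    return num / (num + (1 - p_resampled))


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked
    correctly, with ties counted as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


random.seed(0)
# Simulated test set with roughly 5% positives (assumed rate).
labels = [1 if random.random() < 0.05 else 0 for _ in range(2000)]
# Simulated scores standing in for predict_proba(X_test)[:, 1].
raw = [
    min(max(random.gauss(0.7 if y else 0.4, 0.15), 0.001), 0.999)
    for y in labels
]
corrected = [
    prior_corrected(p, train_pos_rate=0.5, orig_pos_rate=0.05) for p in raw
]

auc_raw = auc(labels, raw)
auc_corrected = auc(labels, corrected)
print(auc_raw, auc_corrected)  # the two values should match
```

Because the correction only needs the positive-class rate before and after resampling, it does not matter whether that rate was reached by oversampling, undersampling, or both. Treat the result as approximate: SMOTE's synthetic points are not real draws from the minority class.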

See also the more complex "probability calibration" techniques.

Also, if you see better results after SMOTE + undersampling, and can share your data and work, I'd be very interested. I haven't yet seen an example where training on the original dataset doesn't do just as well (with appropriate thresholding).

Ben Reiniger