
I have classification data with far more negative instances than positive instances. I have used class weights in my models and achieved the discrimination I want, but the predicted probabilities from the models do not match the actual probabilities in the modeling data.

Is there a way to adjust the predicted probabilities from the class-weighted models so that they match the actual probabilities in the data? I have seen equations for correcting probabilities after under-sampling (https://www3.nd.edu/~rjohns15/content/papers/ssci2015_calibrating.pdf), but they don't seem to work for class weights.

desertnaut

3 Answers


A more general adjustment for resampling (not just the simple undersampling in your linked paper) exists:

Add $\ln\left(\frac{p_1(1-r_1)}{(1-p_1)r_1}\right)$ to the log-odds of each prediction, where $p_1$ is the proportion of the positive class in the original dataset, and $r_1$ is the proportion of the positive class in the resampled dataset.

Equivalently, multiply the odds by the quantity inside the logarithm. (Unfortunately, this doesn't lead to a clean adjustment directly to the probabilities.)
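In code, the adjustment might look like the following minimal sketch (`adjust_probabilities` is an illustrative helper name, not a library function):

```python
import numpy as np

def adjust_probabilities(p_pred, p1, r1):
    """Map predictions back to the original base rate.

    p_pred: predicted probabilities (scalar or array) from the model
            trained on the resampled data
    p1:     proportion of the positive class in the original data
    r1:     proportion of the positive class in the resampled data
    """
    # Multiply the odds by p1*(1-r1) / ((1-p1)*r1), then convert back.
    odds = p_pred / (1 - p_pred)
    adjusted_odds = odds * (p1 * (1 - r1)) / ((1 - p1) * r1)
    return adjusted_odds / (1 + adjusted_odds)
```

For example, `adjust_probabilities(0.5, 0.05, 0.5)` returns 0.05: a model trained on 50/50 resampled data that outputs 0.5 is really predicting the 5% base rate of the original data.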


Let's do a little rewriting to see the connection to your linked paper. $1-r_1$ is the proportion of the negative class; call it $r_0$, and likewise write $p_0 = 1-p_1$. Use capitals $R_1, \dotsc$ to denote the number (or total weight) of samples rather than the proportions, and $P, R$ without subscripts to denote the total number (or weight) of samples before and after resampling. So the multiplier becomes $$\frac{p_1(1-r_1)}{(1-p_1)r_1} = \frac{p_1 r_0}{p_0 r_1} = \frac{(P_1/P) (R_0/R)}{(P_0/P) (R_1/R)} = \frac{P_1 R_0}{P_0 R_1}.$$ In the context of the linked paper, the positive class is not resampled, so $P_1=R_1$ and the adjustment simplifies to $R_0/P_0$, which is the parameter $\beta$ used in the paper.

Finally, using their equation (4), we check the change in odds: $$\text{new odds} = \frac{p}{1-p} = \frac{1}{\frac{1}{p} - 1} = \frac{1}{\frac{\beta p_s - p_s + 1}{\beta p_s} - 1} = \frac{\beta p_s}{1-p_s} = \beta\cdot\text{old odds}.$$


So, what about weighting instead of resampling? Well, class_weight might have different effects in different algorithms, but generally the idea is that a (positive) integer class weight should correspond to duplicating the samples of that class that many times, with fractional values interpolating. So it should be about right to use the multiplicative factor above. Using the size version rather than the proportion version, interpret $R_0$ and $R_1$ as the total weights of the relevant classes, as in the sketch below.
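Here is a minimal sketch of that, assuming scikit-learn's `LogisticRegression` on synthetic data (any estimator accepting `class_weight` should behave similarly):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Imbalanced synthetic data: roughly 95% negatives, 5% positives.
X, y = make_classification(n_samples=10_000, weights=[0.95], random_state=0)

w0, w1 = 1.0, 19.0  # class weights chosen to roughly balance the classes
clf = LogisticRegression(class_weight={0: w0, 1: w1}).fit(X, y)

# Original counts play the role of P0, P1; total weights play R0, R1.
P0, P1 = np.sum(y == 0), np.sum(y == 1)
R0, R1 = w0 * P0, w1 * P1

# Multiply the odds by P1*R0 / (P0*R1) (which here collapses to w0/w1,
# since the counts cancel), then convert back to probabilities.
p_s = clf.predict_proba(X)[:, 1]
odds = p_s / (1 - p_s) * (P1 * R0) / (P0 * R1)
p_adjusted = odds / (1 + odds)
```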

I've not been able to find a reference for this version, so I put together a short experiment; it seems to verify that this shift works: GitHub/Colab notebook.


Finally, this shift in log-odds will fail to produce well-calibrated probabilities if the classifier is poorly calibrated on the weighted data. In that case, you could look into calibration techniques, from Platt scaling to beta calibration to isotonic regression; if you calibrate, the shift above is probably superfluous.
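For instance, a sketch with scikit-learn's `CalibratedClassifierCV`, which offers Platt scaling as `method="sigmoid"` and isotonic regression as `method="isotonic"` (synthetic data here is just for illustration):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=10_000, weights=[0.95], random_state=0)

# Fit the weighted model on cross-validation folds and learn a sigmoid
# (Platt) mapping from its out-of-fold scores to calibrated probabilities.
base = LogisticRegression(class_weight="balanced")
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=5).fit(X, y)
p_calibrated = calibrated.predict_proba(X)[:, 1]
```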

Ben Reiniger

I had the same problem as you. However, when you apply the technique in the accepted answer, you cancel out the predictive power gained by the class weights. I also stumbled on the CalibratedClassifierCV method, but it had a similar impact to the accepted answer, so I did not use it in the end. As an alternative, I tried to collect more data (which I know is not always possible) and tried other models, such as decision-tree-based models.


The difference between predicted probabilities and actual probabilities is called training error.

There are many ways to reduce training error. Engineering better features and choosing a different machine learning algorithm are the most common.

Brian Spiering