
Say I have a multiclass classification problem with N classes. I have trained a classifier on a training set, and I use a validation set with a one-vs-rest approach to get N ROC curves.

The ROC curve is built by varying the threshold at which we classify a sample as $C_i$ or not $C_i$. We can then choose our optimal FPR/TPR trade-off and read off the corresponding threshold t. For example, with t=0.6 we classify a sample as $C_i$ if model_score>=0.6, and otherwise as "the rest", i.e. not $C_i$ (the blue marker in this picture from sklearn). [image]
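As a rough illustration of reading a threshold off a ROC curve, here is a minimal pure-Python sketch. The selection rule (maximising TPR minus FPR, often called Youden's J) is only one possible criterion and is an assumption, as are the helper names `roc_points` and `pick_threshold` and the toy data. In practice `sklearn.metrics.roc_curve` returns the FPR, TPR and threshold arrays directly.

```python
def roc_points(y_true, scores):
    """Return (threshold, fpr, tpr) for each distinct score, highest first."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        pred = [s >= t for s in scores]  # classify as C_i if score >= t
        tp = sum(p and y == 1 for p, y in zip(pred, y_true))
        fp = sum(p and y == 0 for p, y in zip(pred, y_true))
        points.append((t, fp / neg, tp / pos))
    return points


def pick_threshold(y_true, scores):
    # Hypothetical criterion: the threshold that maximises TPR - FPR.
    return max(roc_points(y_true, scores), key=lambda p: p[2] - p[1])[0]


# Made-up one-vs-rest labels for one class (1 = C_i, 0 = the rest) and scores.
y_true = [0, 0, 1, 1, 1, 0, 0]
scores = [0.2, 0.5, 0.6, 0.9, 0.4, 0.3, 0.1]
t = pick_threshold(y_true, scores)
print(t)
```

Repeating this once per class, with that class as the positive label, would give the N thresholds discussed next.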

The question is: in the multiclass problem we can use e.g. one-vs-rest and create N ROC curves (see below, also from sklearn)

[image: one-vs-rest ROC curves for three classes]

but now we have N different thresholds (in the plot, N=3 since there are three classes). Say that, in the one-vs-rest setting, we have chosen the optimal thresholds for the classes as

t1 = 0.8 (Class 1 vs rest)
t2 = 0.6 (Class 2 vs rest)
t3 = 0.4 (Class 3 vs rest)

We get a new sample and the model score is S = [0.3, 0.4, 0.3]. According to the thresholds, we would not label it as any class, since no score reaches its class threshold.
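The situation can be reproduced with a few lines of plain Python. This is only a sketch of the per-class check described above; the variable names are illustrative, and the numbers are the ones from the example.

```python
thresholds = [0.8, 0.6, 0.4]  # t1, t2, t3 from the example
scores = [0.3, 0.4, 0.3]      # model scores S for the new sample

# One-vs-rest decision per class: accept class i if its score reaches t_i.
accepted = [
    i + 1
    for i, (s, t) in enumerate(zip(scores, thresholds))
    if s >= t
]
print(accepted)  # empty list: no class passes its own threshold
```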

CutePoison

1 Answer


Without modifying the assumptions or the model, there are two choices:

  1. The system returns the single best guess. For example, Class 2, since that is the model's largest probability.
  2. The system returns "Not confident enough to make a prediction".
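The two choices could be sketched as follows. The function name `predict`, the `abstain` flag, and the tie-breaking rule when several classes pass their thresholds (take the highest-scoring one) are assumptions made for illustration, not part of any library.

```python
def predict(scores, thresholds, abstain=True):
    """Return a 1-based class label, or None for "not confident enough"."""
    passing = [
        i for i, (s, t) in enumerate(zip(scores, thresholds)) if s >= t
    ]
    if passing:
        # Assumption: if several classes pass, keep the highest-scoring one.
        return max(passing, key=lambda i: scores[i]) + 1
    if abstain:
        # Choice 2: refuse to predict.
        return None
    # Choice 1: fall back to the single best guess (largest score).
    return max(range(len(scores)), key=lambda i: scores[i]) + 1


thresholds = [0.8, 0.6, 0.4]
scores = [0.3, 0.4, 0.3]
best_guess = predict(scores, thresholds, abstain=False)
no_prediction = predict(scores, thresholds, abstain=True)
print(best_guess, no_prediction)
```

With the scores from the question, the fallback picks Class 2 and the abstaining variant returns `None`.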
Brian Spiering