
I'm using machine-learning algorithms to solve a binary classification problem (i.e., each instance is classified as 'good' or 'bad'). I'm using SVM-based algorithms, LibLinear in particular. When I get a classification result, the probability estimate attached to it is sometimes quite low. For example, I might get the classification 'good' with a probability of 52%; results like that I would rather throw away, or perhaps classify as 'unknown'.

EDITED, following D.W.'s suggestion

Just to be clearer: my output is not only the classification 'good' or 'bad'; I also get a confidence level (in %). For example, if I'm the weather forecaster and I report that it will rain tomorrow while being only 52% confident in my forecast, I'm pretty sure you won't take your umbrella when you leave home tomorrow, right? So in cases where my model does not have a high confidence level, I throw the prediction away and don't count it in my evaluation.
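For concreteness, here is a minimal sketch of that reject-option idea, assuming scikit-learn with a liblinear-solver logistic regression standing in for LibLinear's probability output; the toy data, variable names, and the 0.75 threshold are all illustrative, not part of the question:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: two features, labels 'good'/'bad'.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = np.where(X_train[:, 0] + X_train[:, 1] > 0, "good", "bad")

# 'liblinear' is one of the solvers scikit-learn exposes for logistic regression.
clf = LogisticRegression(solver="liblinear").fit(X_train, y_train)

X_new = rng.normal(size=(10, 2))
proba = clf.predict_proba(X_new)             # shape (n_samples, n_classes)
confidence = proba.max(axis=1)               # probability of the predicted class
labels = clf.classes_[proba.argmax(axis=1)]  # the predicted class names

THRESHOLD = 0.75                             # illustrative value, not a recommendation
predictions = labels.astype(object)
predictions[confidence < THRESHOLD] = "unknown"
print(list(zip(predictions, confidence.round(2))))
```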

Unfortunately, I can't find any articles about thresholding probability estimates...

Does anyone have an idea of what a reasonable threshold over the probability estimates would be, or can at least point me to a few articles about it?

Ziv Levy

3 Answers


There is no universal answer. Instead, it depends on your application. What counts as useful for your application? That determines what should count as a useful or good-enough machine learning algorithm. What counts as useful will vary widely from application to application; some applications require 99.99% accuracy, others might be happy with 52% accuracy.

In practice, there are also multiple ways to define accuracy. If you expect that (in the ground truth) half of the objects should be classified as 'good' and half as 'bad', then the obvious accuracy measure suffices, i.e., counting the fraction of instances on which your classifier outputs the correct answer. However, if 'good' objects are a lot more common than 'bad' objects, or if the penalty for mis-classifying a 'good' object as 'bad' is very different from the penalty for mis-classifying a 'bad' object as 'good', then you might want to measure accuracy differently. The best measure of accuracy is again application-dependent; there is no single universal answer.
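As a rough illustration of this point (assuming scikit-learn; the 90/10 imbalance and the cost numbers below are made up), plain accuracy can look good while a balanced or cost-weighted measure tells a different story:

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, confusion_matrix

# Imbalanced ground truth: 90% 'good', 10% 'bad'.
y_true = np.array(["good"] * 90 + ["bad"] * 10)
# A lazy classifier that always answers 'good'.
y_pred = np.array(["good"] * 100)

print(accuracy_score(y_true, y_pred))           # 0.9, looks impressive
print(balanced_accuracy_score(y_true, y_pred))  # 0.5, reveals it learned nothing

# If mis-classifying 'bad' as 'good' costs 10 and the reverse costs 1,
# a cost-weighted error tells yet another story.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=["bad", "good"]).ravel()
cost = 10 * fp + 1 * fn
print(cost)                                     # 100: all ten 'bad' cases slipped through
```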

D.W.

If your method is reasonable, then there is a correlation between your confidence and your accuracy: when the confidence is higher, so is the accuracy. Take a sample from the true distribution to use as a gold standard, and then study that correlation. At what confidence threshold do you start to lose accuracy?
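A sketch of that procedure, assuming you already have a held-out gold-standard set with true labels, predicted labels, and the model's confidence for each prediction (the toy data and names below are illustrative):

```python
import numpy as np

def accuracy_above_threshold(confidence, y_pred, y_true, thresholds):
    """For each candidate threshold, report the accuracy and coverage of the
    predictions whose confidence meets that threshold."""
    rows = []
    for t in thresholds:
        kept = confidence >= t
        if kept.sum() == 0:
            rows.append((t, float("nan"), 0.0))
            continue
        acc = np.mean(y_pred[kept] == y_true[kept])
        rows.append((t, acc, kept.mean()))
    return rows

# Toy gold-standard data where confidence loosely tracks correctness.
rng = np.random.default_rng(1)
n = 1000
confidence = rng.uniform(0.5, 1.0, size=n)
correct = rng.uniform(size=n) < confidence      # higher confidence -> more often correct
y_true = rng.integers(0, 2, size=n)
y_pred = np.where(correct, y_true, 1 - y_true)

for t, acc, cov in accuracy_above_threshold(confidence, y_pred, y_true,
                                            thresholds=[0.5, 0.6, 0.7, 0.8, 0.9]):
    print(f"threshold {t:.2f}: accuracy {acc:.2f}, coverage {cov:.2f}")
```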

D.W.

For each class you will have a gain $G$ for getting a prediction correct and a loss $L$ for getting a prediction wrong. Solving the break-even equation $Gp - L(1-p) = 0$ for the probability $p$ gives $p = \frac{L}{G+L}$; any prediction whose confidence falls below this $p$ should not be scored. For example, with $G = 70$ and $L = 10$, the threshold is $p = 12.5\%$. Each class will have a different threshold.
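In code, that break-even point is a one-liner (the symmetric $G = L = 1$ case below is just an extra illustration, not part of the answer):

```python
def break_even_threshold(gain, loss):
    """Smallest probability p at which the expected value gain*p - loss*(1-p)
    is non-negative, i.e. p = loss / (gain + loss)."""
    return loss / (gain + loss)

print(break_even_threshold(70, 10))  # 0.125, the 12.5% from the example above
print(break_even_threshold(1, 1))    # 0.5 when gain and loss are symmetric
```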

Yuval Filmus