9

Given a multiclass classification model with n features, how can I measure the model's uncertainty for a particular classification?

Let's say that for some class the model accuracy is amazing, but for another it's not. I would like to find a metric that lets me decide whether, for a particular sample, I should "listen" to the model or not.

I thought about using prediction intervals, but am not sure how to calculate them.

Ethan
  • 1,657
  • 9
  • 25
  • 39
Latent
  • 334
  • 3
  • 16

5 Answers

10

As an alternative to the accepted answer, another way to estimate the uncertainty of a specific prediction is to combine the probabilities returned by the model for each class using some function. This is common practice in "active learning", where, given a trained model, you select a subset of unlabelled instances to label (to augment the initial training dataset) based on some sort of uncertainty estimation. The three most common functions (called sampling strategies [1] in the literature) are listed below, followed by a short code sketch of all three:

  • Shannon entropy: you simply apply Shannon entropy to the probabilities returned by the model for each class. The higher the entropy, the higher the uncertainty.

  • Least confident: you simply look at the highest probability returned by the model among all classes. Intuitively, the certainty level is lower for a test instance whose highest probability is 'low' (e.g. [.6, .35, .05] --> .6) than for one whose highest probability is 'high' (e.g. [.9, .05, .05] --> .9).

  • Margin sampling: you subtract the second-highest probability from the highest probability (e.g. [.6, .35, .05] --> .6 - .35 = .25). It is conceptually similar to the least confident strategy, but a bit more reliable since you are looking at the distance between two probabilities rather than a single raw value. In this case, a small difference means a high uncertainty level.
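
A minimal NumPy sketch of the three strategies above, applied to a single predicted probability vector (the function name and the example vectors are just illustrative):

```python
import numpy as np

def uncertainty_scores(probs):
    """Compute the three sampling strategies for one predicted
    probability vector (assumed to sum to 1)."""
    probs = np.asarray(probs, dtype=float)

    # Shannon entropy: higher value = higher uncertainty
    entropy = -np.sum(probs * np.log(probs + 1e-12))

    # Least confident: the highest probability; a lower maximum = higher uncertainty
    least_confident = probs.max()

    # Margin sampling: difference between the two highest probabilities;
    # a smaller margin = higher uncertainty
    top_two = np.sort(probs)[-2:]
    margin = top_two[1] - top_two[0]

    return {"entropy": entropy, "least_confident": least_confident, "margin": margin}

print(uncertainty_scores([0.6, 0.35, 0.05]))  # lower max, smaller margin -> more uncertain
print(uncertainty_scores([0.9, 0.05, 0.05]))  # higher max, larger margin -> more certain
```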

Another, more interesting way to estimate the uncertainty level of a test instance, applicable to deep models with dropout layers, is deep active learning [2]. Basically, by leaving dropout active while making predictions you can bootstrap a set of different outcomes (in terms of probabilities for each class), from which you can estimate a mean and variance. The variance, in this case, tells you how uncertain the model is about that instance.
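
A minimal sketch of this idea, assuming a Keras classifier with dropout layers and a softmax output (the helper name and the sample count are illustrative, not from the paper):

```python
import numpy as np

def mc_dropout_predict(model, x, n_samples=50):
    """Monte-Carlo dropout: run several stochastic forward passes with
    dropout left active and summarise the spread of the predictions.
    Assumes `model` is a Keras classifier with dropout layers and a
    softmax output; calling it with training=True keeps dropout active."""
    samples = np.stack(
        [np.asarray(model(x, training=True)) for _ in range(n_samples)]
    )                                   # shape: (n_samples, n_instances, n_classes)
    mean_probs = samples.mean(axis=0)   # averaged class probabilities
    variance = samples.var(axis=0)      # spread across passes = uncertainty estimate
    return mean_probs, variance
```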

Anyway, consider that these are just crude approximations, using a model that specifically estimates the uncertainty of a particular prediction as suggested in the accepted answer is surely the best option. Nevertheless, these estimations can be useful because they are potentially applicable to every model that returns probabilities (and there are also adaptations for models like SVM).

[1] http://www.robotics.stanford.edu/~stong/papers/tong_thesis.pdf

[2] https://arxiv.org/abs/1808.05697

6

How best to get uncertainty depends on the model you use. If you used Bayesian optimization (there's a great package for it in Python), for example, you get a covariance matrix along with your expected values, and so inherently get an uncertainty measure. In this case, you can make predictions about the underlying function of your data, and the (co-)variance provides the level of uncertainty, as shown by the width of the green bands around the line in the figure below:

[figure: Gaussian-process fit through the red sample points, with the predicted mean as a line and the 95% confidence interval shown as green bands around it]

So the red points show where we have some sample data... notice that we have none at, e.g., X = 4 and X = -1, which is why we have high uncertainty there; the 95% confidence interval is very wide.
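
A minimal sketch of the same idea with scikit-learn's GaussianProcessRegressor, where `return_std=True` gives the per-point predictive standard deviation used to build the confidence band (the toy data and kernel choice are just assumptions):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy 1-D data with gaps (nothing near X = 4 or X = -1)
X_train = np.array([[-3.0], [-2.0], [0.0], [1.0], [2.5], [5.0]])
y_train = np.sin(X_train).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-6)
gp.fit(X_train, y_train)

# return_std=True gives the predictive standard deviation at each point
X_test = np.linspace(-4, 6, 101).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)

# 95% confidence band: widest where there is no training data
lower, upper = mean - 1.96 * std, mean + 1.96 * std
```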


If you use a standard deep neural network to perform classification, for example, there is no inherent measure of uncertainty. All you really have is your test accuracy, which tells you how well the model performs on hold-out data. I cannot remember where it is explained, but I believe it is not actually sound to interpret the class prediction values in terms of uncertainty.

For example, if you are predicting cat or dog for an image, and the two classes receive (normalized) logit values [0.51, 0.49] respectively, you cannot assume this means very low certainty.

n1k31t4
  • 15,468
  • 2
  • 33
  • 52
2

Recent work (2024) has developed a reasonably simple method for extracting the model uncertainty from Bayesian neural networks (NNs). The method is designed for either Monte-Carlo dropout or deep-ensemble NNs, but the critical requirement is being able to take samples from the posterior predictive distribution. So if you can approximate the posterior predictive distribution as described below, you can use the method:

[figure: the posterior predictive distribution is approximated by averaging the class probabilities $p(y \mid x, \theta_t)$ obtained from $T$ Monte-Carlo dropout forward passes or deep-ensemble members]

Extracting the model uncertainty is then as simple as calculating the entropy of the average of the samples (the total uncertainty) and subtracting the average entropy of the individual samples (the data uncertainty):

$$\text{model uncertainty} \;=\; \underbrace{\mathbb{H}\!\left[\frac{1}{T}\sum_{t=1}^{T} p(y \mid x, \theta_t)\right]}_{\text{total uncertainty}} \;-\; \underbrace{\frac{1}{T}\sum_{t=1}^{T} \mathbb{H}\!\left[p(y \mid x, \theta_t)\right]}_{\text{data uncertainty}}$$

where $\mathbb{H}[\cdot]$ denotes the Shannon entropy.
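
A minimal NumPy sketch of this decomposition, assuming you already have $T$ posterior predictive samples (one softmax vector per dropout pass or ensemble member; the function name and example values are illustrative):

```python
import numpy as np

def uncertainty_decomposition(probs, eps=1e-12):
    """Decompose predictive uncertainty from T posterior predictive samples.
    `probs` has shape (T, n_classes): one softmax vector per dropout pass
    or ensemble member."""
    probs = np.asarray(probs, dtype=float)

    # Total uncertainty: entropy of the averaged prediction
    mean_probs = probs.mean(axis=0)
    total = -np.sum(mean_probs * np.log(mean_probs + eps))

    # Data uncertainty: average entropy of the individual samples
    data = -np.sum(probs * np.log(probs + eps), axis=1).mean()

    # Model uncertainty: the difference between the two
    return total, data, total - data

# Samples that disagree strongly -> high model uncertainty
samples = [[0.9, 0.05, 0.05],
           [0.1, 0.80, 0.10],
           [0.2, 0.20, 0.60]]
print(uncertainty_decomposition(samples))
```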

Ref: Thuy, Arthur, and Dries F. Benoit. "Explainability through uncertainty: Trustworthy decision-making with neural networks." European Journal of Operational Research 317.2 (2024): 330-340.

JStrahl
  • 161
  • 4
1

Though they do not exactly measure uncertainty for a classification model, you could take a look at trust scores.

Tanguy
  • 290
  • 2
  • 10
-2

My answer is wrong, but I am keeping it because other people may make the same mistake, and the comments below my answer are valuable.

I think you are looking for

model.predict_proba()

In Python, lots of models have it, and with this function you can estimate how sure the model is about its answer.
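
For reference, a minimal sketch of that call assuming a scikit-learn classifier (keeping in mind the caveats in the other answers about reading these probabilities as calibrated uncertainty):

```python
# Minimal sketch, assuming a scikit-learn classifier
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# One probability per class for each sample; note that these scores are
# not necessarily well-calibrated uncertainty estimates (see the other answers).
print(clf.predict_proba(X[:2]))
```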

parvij
  • 791
  • 5
  • 18