45

I have created three different models using deep learning for multi-class classification, and each model gave me a different accuracy and loss value. The results of testing the models are as follows:

  • First Model: Accuracy: 98.1% Loss: 0.1882

  • Second Model: Accuracy: 98.5% Loss: 0.0997

  • Third Model: Accuracy: 99.1% Loss: 0.2544

My questions are:

  • What is the relationship between the loss and accuracy values?

  • Why is the loss of the third model the highest even though its accuracy is also the highest?

N.IT

7 Answers

59

There is no relationship between these two metrics.

  • Loss can be seen as a distance between the true values of the problem and the values predicted by the model. The larger the loss, the larger the errors you made on the data.
  • Accuracy can be seen as the proportion of predictions the model got right on the data. The larger the accuracy, the fewer misclassifications you made on the data.

That means:

  • large loss with small accuracy means you made huge errors on a lot of examples (worst case)
  • small loss with small accuracy means you made small errors on a lot of examples
  • small loss with large accuracy means you made small errors on a few examples (best case)
  • large loss with large accuracy means you made huge errors on a few examples (your case: the third model)

For your case, the third model can correctly predict more examples (large accuracy), but on those where it was wrong, it made larger errors (large loss - the distance between true value and predicted values is greater).
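
As a concrete toy sketch (hand-made probabilities, not your actual models): model B below is more accurate than model A, yet its loss is higher, because its single mistake is made with near-total confidence.

```python
import numpy as np

def accuracy(y_true, probs):
    """Fraction of samples whose highest-probability class matches the label."""
    return np.mean(np.argmax(probs, axis=1) == y_true)

def cross_entropy(y_true, probs):
    """Average negative log-probability assigned to the true class."""
    return -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))

y_true = np.array([0, 1, 2, 0, 1])  # five samples, three classes

# Model A: only 3/5 correct, but its mistakes are mild (true class still gets 0.4).
probs_a = np.array([[0.7, 0.2, 0.1],
                    [0.2, 0.7, 0.1],
                    [0.1, 0.2, 0.7],
                    [0.4, 0.5, 0.1],   # wrong, but not by much
                    [0.5, 0.4, 0.1]])  # wrong, but not by much

# Model B: 4/5 correct, but its single mistake is made with near-total confidence.
probs_b = np.array([[0.9, 0.05, 0.05],
                    [0.05, 0.9, 0.05],
                    [0.05, 0.05, 0.9],
                    [0.9, 0.05, 0.05],
                    [0.98, 0.01, 0.01]])  # true class 1 gets only 0.01

for name, p in [("A", probs_a), ("B", probs_b)]:
    print(f"Model {name}: accuracy={accuracy(y_true, p):.2f}, loss={cross_entropy(y_true, p):.3f}")
# Model A: accuracy=0.60, loss=0.581
# Model B: accuracy=0.80, loss=1.005
```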

NOTE:

Don't forget that loss is a subjective metric, which depends on the problem and the data. It's a distance between the true value and the prediction made by the model.

  • The significance of the loss value is relative to the data itself; if your data are between 0 and 1, a loss of 0.5 is huge, but if your data are between 0 and 255, an error of 0.5 is low.
  • The acceptability of a loss value depends on the problem; in cancer detection, an error of 0.1 is unacceptably large, whereas an error of 0.1 for image classification is fine.
Jérémy Blain
12

Actually, accuracy is a metric that can only be applied to classification tasks. It describes what percentage of your test data is classified correctly. For example, take binary classification of cats vs. non-cats. If 95 out of 100 test samples are classified correctly (i.e. the model correctly determined whether there is a cat in the picture or not), then your accuracy is 95%. By the way, a confusion matrix describes your model much better than accuracy alone.

Loss depends on how you predict classes for your classification problem. For example, your model uses probabilities between 0 and 1 to predict the binary classes cat and non-cat. So if the probability of cat is 0.6, then the probability of non-cat is 0.4, and the picture is classified as a cat. A crude loss would be the difference between the probability predicted for the true class and 1, summed over the test pictures; in reality, log loss is used for binary classification, but this gives the idea of what a loss is.
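
For reference, a minimal sketch of binary log loss on this cat/non-cat setup, with made-up probabilities (not from any real model):

```python
import numpy as np

def binary_log_loss(y_true, p_cat):
    """Average negative log-likelihood: -[y*log(p) + (1-y)*log(1-p)]."""
    y_true = np.asarray(y_true, dtype=float)
    p_cat = np.asarray(p_cat, dtype=float)
    return -np.mean(y_true * np.log(p_cat) + (1 - y_true) * np.log(1 - p_cat))

y_true = [1, 0, 1, 0]          # 1 = cat, 0 = non-cat
p_cat  = [0.6, 0.4, 0.9, 0.2]  # model's predicted probability of "cat"

# Every prediction lands on the right side of 0.5, so accuracy is 1.0,
# yet the loss is still nonzero because the probabilities are not 1.0 / 0.0.
acc = np.mean((np.array(p_cat) > 0.5) == np.array(y_true, dtype=bool))
print(acc, binary_log_loss(y_true, p_cat))  # 1.0, ~0.34
```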

DmytroSytro
9

The other answers give good definitions of accuracy and loss. To answer your second question, consider this example:

We have a problem of classifying images from a balanced dataset as containing either cats or dogs. Classifier 1 gives the right answer in 80/100 of cases, whereas classifier 2 gets it right in 95/100. Here, classifier 2 obviously has the higher accuracy.

However, on the 80 images classifier 1 gets right, it is extremely confident (for instance, when it thinks an image is of a cat, it is 100% sure that's the case), and on the 20 it gets wrong, it is not at all confident (e.g. when it said a cat image contained a dog, it was only 51% sure about that). In comparison, classifier 2 is extremely confident in its 5 wrong answers (it's 100% convinced that an image which actually shows a dog is a cat), and is not very confident about the 95 it gets right. In this case, classifier 2 would have the worse loss.
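
Putting rough numbers on this scenario with log loss (a sketch only: 0.99 stands in for "100% sure" so the logarithm stays finite, and 0.55, 0.49 and 0.01 are made-up values for the other confidences):

```python
import numpy as np

def avg_log_loss(true_probs):
    """Average -log(probability assigned to the correct class)."""
    return -np.mean(np.log(true_probs))

# Classifier 1: 80 confident correct answers, 20 barely-wrong answers
# (the true class still gets probability 0.49). Accuracy: 0.80.
clf1 = np.array([0.99] * 80 + [0.49] * 20)

# Classifier 2: 95 hesitant correct answers, 5 confidently wrong answers
# (the true class gets only 0.01). Accuracy: 0.95.
clf2 = np.array([0.55] * 95 + [0.01] * 5)

print("classifier 1 loss:", avg_log_loss(clf1))  # ~0.15
print("classifier 2 loss:", avg_log_loss(clf2))  # ~0.80
```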

rlms
8

Someone says that accuracy has no relationship to the loss, but from a theoretical perspective, there IS a relationship.

Accuracy is $1 - (error\ rate)$ and the error rate can be seen as the expectation of the 0-1 loss: \begin{equation} l_{01}(f(x), y) := \begin{cases} 0 & (f(x) = y) \\ 1 & (f(x) \neq y) \end{cases} \end{equation}

\begin{equation} error\ rate = \mathbb{P}_{x, y} \left[ f(x) \neq y \right] = \mathbb{E}_{x, y} \left[ l_{01}(f(x), y) \right] \end{equation} where $f$ is the model, $x$ is its input and $y$ is the ground truth label for $x$.

In order to maximize the accuracy, we want to minimize the error rate. However, due to the discontinuity of the 0-1 loss (its gradient is zero almost everywhere), this is practically impossible to do directly. Instead, a variety of "surrogate losses" are used. A surrogate loss function $l$ is required to have some properties:

  • $l$ is continuous.
  • $l$ is convex.
  • $l$ bounds $l_{01}$ from above.

Surrogate losses with these properties allow us to minimize them via the well-known gradient descent algorithm.

Popular classes of such surrogate losses include the hinge loss used in support vector machines (SVMs) and the logistic loss used in logistic regression and standard neural networks.
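
For concreteness, with labels encoded as $y \in \{-1, +1\}$ and a real-valued score $f(x)$ (so the predicted class is $\operatorname{sign}(f(x))$), these two surrogates can be written as \begin{equation} l_{\text{hinge}}(f(x), y) = \max\left(0,\, 1 - y f(x)\right), \qquad l_{\text{log}}(f(x), y) = \log_2\left(1 + e^{-y f(x)}\right). \end{equation} Both are continuous, convex in $y f(x)$, and sit above $l_{01}$: whenever $y f(x) < 0$ (a misclassification) they are at least $1$, and they are never negative. The base-2 logarithm is written here only so that the bound holds exactly; the natural-log version used in practice differs by a constant factor.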

So, from a theoretical viewpoint, the accuracy and the loss displayed in every epoch of your training have some relationship. That is,

  • Accuracy has a direct connection with the error rate, which we want to minimize in the training.
  • Loss (usually the cross-entropy loss, which is equivalent to the logistic loss in a sense) is a surrogate loss that bounds the error rate from above.
aest
0

In deep learning, during the training process, you typically monitor both the training and validation accuracy and loss to assess the performance of your model. Here's a brief explanation of each:

Training Accuracy and Loss:

  • Training Accuracy: measures the percentage of correctly classified samples in the training dataset. It indicates how well the model is performing on the data it is being trained on.
  • Training Loss: quantifies the difference between the model's predictions and the actual values in the training dataset. The goal during training is to minimize this loss function.

Validation Accuracy and Loss:

  • Validation Accuracy: measures the percentage of correctly classified samples in a separate validation dataset that the model hasn't seen during training. It provides an estimate of how well the model is generalizing to new, unseen data.
  • Validation Loss: similar to training loss, it measures how well the model is performing on the validation dataset and helps to identify overfitting or underfitting. Like training loss, the goal is to minimize it, but its value may increase if the model starts overfitting to the training data.

During the training process, these metrics are often plotted over epochs (iterations over the entire dataset) to visualize the performance of the model. Ideally, you want to see both training and validation accuracy increasing and both training and validation loss decreasing. If the training accuracy continues to increase while the validation accuracy stagnates or decreases, it might indicate overfitting. Similarly, if the training loss decreases while the validation loss increases, it might also indicate overfitting.

Monitoring these metrics helps in tuning hyperparameters, selecting models, and understanding how well the model is learning from the data.
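
A minimal sketch of what this monitoring can look like, assuming TensorFlow 2.x Keras and synthetic stand-in data (the history keys "loss", "val_loss", "accuracy" and "val_accuracy" correspond to compiling with metrics=["accuracy"]):

```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Random stand-in data: 1000 samples, 20 features, 3 classes (illustration only).
x = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 3, size=1000)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# validation_split holds out 20% of the data to compute val_loss / val_accuracy.
history = model.fit(x, y, epochs=20, validation_split=0.2, verbose=0)

# Plot the four curves; diverging training/validation loss suggests overfitting.
for key in ["loss", "val_loss", "accuracy", "val_accuracy"]:
    plt.plot(history.history[key], label=key)
plt.xlabel("epoch")
plt.legend()
plt.show()
```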

0

"There is no relationship between these two metrics." isn't really accurate. Of course, there is a relationship between those two. Indeed, not a linear one. As @JérémyBlain noted, one can't really decide how well your model is based on the loss. That's why loss is mostly used to debug your training. Accuracy, better represents the real world application and is much more interpretable. But, you lose the information about the distances. A model with 2 classes that always predicts 0.51 for the true class would have the same accuracy as one that predicts 0.99. –

Adding to the comments on the currently top answer: if we interpret the question in terms of the mechanism that produces these numbers, rather than only in terms of end-user metrics, it is worth noting that the loss function (and its response surface over the full training or validation dataset) is exactly what the method is trying to minimize over the dataset. With gradient descent, for example, training greedily follows the slope of the sampled, discretized version of that surface, and the loss function and problem are usually designed so that this local search still has a good chance of reaching something close to a global optimum. The loss evaluated on the data is therefore the quantity that drives the optimization which yields the final model, and it measures how well that under-the-hood optimization did its job (with the appropriate caveats about generalization). My point is that the loss is very much related to the method used to produce the final inference tool, even if the previous answers are right that an end user of the finished model may not need it. In all cases, these metrics depend on the target phenomenon, the data-sampling method, the split of that data into training and validation sets, the optimization method, and the internal metrics it uses. In that sense, the quoted statement "there is no relationship between these two metrics" is more rhetorical or explanatory, assuming an end-user premise to the question, than a technical truth. I hope this is a useful expansion for readers coming in with different assumptions in mind.

dbdb
-1

Please help me clear up this doubt.

After seeing this comment: "if your data are between 0 and 1, a loss of 0.5 is huge, but if your data are between 0 and 255, an error of 0.5 is low", I have the following questions:

I built a model with a categorical "Y" with 4 categories, which I labeled "0", "1", "2" and "3". Is it okay that I labeled them this way (starting at 0 and not at 1)? And would my data be between 0 and 4 for evaluating the test loss and accuracy?

This was my result: Test Loss: [0.9032219648361206, 3.8990705013275146]