Questions tagged [metric]

A metric is a way to evaluate the performance of a machine learning model. Depending on the task, different metrics may be used.

Some popular metrics often used in classification tasks include accuracy, precision, recall, F1 score, and AUC. Common metrics for regression tasks include RMSE and MSE.

Some popular metrics are defined below:

$Accuracy = {Correct\ Predictions \over Total\ Predictions}$

$Precision = {True\ Positive \over {True\ Positive\ +\ False\ Positive}}$

$Recall = {True\ Positive \over {True\ Positive\ +\ False\ Negative}}$
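As a concrete illustration (not part of the original tag wiki), the following minimal Python sketch computes these metrics directly from the true/false positive and negative counts, using made-up binary labels:

```python
# Minimal illustration of the metrics defined above, using made-up binary labels.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` (referenced in several of the questions below) compute the same quantities and additionally handle multi-class averaging.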

239 questions
45 votes · 7 answers

What is the relationship between the accuracy and the loss in deep learning?

I have created three different deep learning models for multi-class classification, and each model gave me a different accuracy and loss value. The results of testing the models are as follows: First Model: Accuracy: 98.1%, Loss: 0.1882; Second…
N.IT
30 votes · 2 answers

How to interpret classification report of scikit-learn?

As you can see, it is a binary classification with LinearSVC. Class 1 has a higher precision than class 0 (+7%), but class 0 has a higher recall than class 1 (+11%). How would you interpret this? And two other questions: what does…
user77241
23 votes · 1 answer

What's the difference between Sklearn F1 score 'micro' and 'weighted' for a multi class classification problem?

I have a multi-class classification problem with class imbalance. I searched for the best metric to evaluate my model. Scikit-learn has multiple ways of calculating the F1 score, and I would like to understand the differences. What do you recommend…
Fractale
18 votes · 3 answers

MAD vs RMSE vs MAE vs MSLE vs R²: When to use which?

In regression problems, you can use various metrics to check how well your model is doing: Mean Absolute Deviation (MAD): In $[0, \infty)$, the smaller the better; Root Mean Squared Error (RMSE): In $[0, \infty)$, the smaller the…
Martin Thoma
15 votes · 1 answer

Balanced Accuracy vs. F1 Score

I've read plenty of online posts with clear explanations about the difference between accuracy and F1 score in a binary classification context. However, when I came across the concept of balanced accuracy, explained e.g. in the following image…
Ric S
13 votes · 3 answers

Why is the F-measure preferred for classification tasks?

Why is the F-measure usually used for (supervised) classification tasks, whereas the G-measure (or Fowlkes–Mallows index) is generally used for (unsupervised) clustering tasks? The F-measure is the harmonic mean of the precision and recall. The…
Bruno Lubascher
12 votes · 4 answers

Can the F1 score be equal to zero?

As mentioned in the Wikipedia article on the F1 score, 'the F1 score reaches its best value at 1 (perfect precision and recall) and worst at 0'. What is the worst condition that was mentioned? Even if we consider the case where either precision or recall is…
12 votes · 1 answer

Finding linear transformation under which distance matrices are similar

I have $n$ sets of vectors, where each set $S_i$ contains $k$ vectors in $\mathbb{R}^d$. I know there is some unknown linear transformation $W$ under which the distance matrix $D_i$ (a $k\times k$ matrix) is approximately "the same" (i.e. has a low…
11 votes · 5 answers

Cosine similarity vs The Levenshtein distance

I wanted to know what the difference is between them and in what situations each works best. As per my understanding: cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the…
Pluviophile
10 votes · 1 answer

XGBoost custom objective for regression in R

I implemented a custom objective and metric for an XGBoost regression. In order to see if I'm doing this correctly, I started with a quadratic loss. The implementation seems to work well, but I cannot reproduce the results from a standard…
Peter
10 votes · 4 answers

Log loss vs accuracy for deciding between different learning rates?

While tuning a model using cross-validation and grid search, I plotted different learning rates against log loss and accuracy separately. Log loss: When I used log loss as the score in grid search to identify the best learning rate out of…
10 votes · 3 answers

Chi-square as an evaluation metric for nonlinear machine learning regression models

I am using machine learning models to predict an ordinal variable (values: 1, 2, 3, 4, and 5) using 7 different features. I posed this as a regression problem, so the final outputs of a model are continuous values. An evaluation box plot therefore looks…
Alex
9 votes · 3 answers

Why do we use a Gaussian kernel as a similarity metric?

In graph-based clustering, why is it preferred to use the Gaussian kernel rather than the distance between two points as the similarity metric?
zfb
8 votes · 2 answers

AUC-ROC for Multi-Label Classification

Hey guys, I'm currently reading about AUC-ROC. I understand the binary case, and I think I understand the multi-class case. Now I'm a bit confused about how to generalize it to the multi-label case, and I can't find any intuitive…
8 votes · 1 answer

What is Continuous Ranked Probability Score (CRPS)?

I came across an evaluation metric on Kaggle: Continuous Ranked Probability Score (CRPS). Mathematically, $C = \frac{1}{199N} \sum_{m=1}^{N} \sum_{n=-99}^{99} (P(y \le n) - H(n - Y_m))^2,$ where $P$ is the predicted distribution, $N$ is the number of…
user86099