F1 score vs accuracy, which metric is more important?

Question

I have two multiclass classification models for making predictions (three classes, to be precise). One is a Keras neural network, the other is a Gradient Boosted Classifier from the Scikit Learn library.

I have noticed that after training on the same data, the GBC has a higher accuracy score, while the Keras model has a higher F1 score. Which model should I use for making predictions on future data? Which metric is more important?

Yohanes Alfredo · Accepted Answer · 2019-12-24T03:32:13.330

Well, it depends heavily on your use case and how your data is distributed.

Let me break down the pros and cons in practice:

F1-Score

Pros :

Takes into account how the data is distributed. Useful when you have data with imbalanced classes.

Cons :

Less interpretable. Precision and recall are more interpretable than the F1 score, since they relate directly to type-1 errors (false positives) and type-2 errors (false negatives). The F1 score, however, only summarizes the trade-off between the two.
When the positive class is the minority class, the score is quite sensitive to changes in the predictions for samples whose ground truth is positive.

Accuracy

Pros :

Easy to understand.

Cons :

It does not take into account how the data is distributed (example case below). This blind spot can be critical and might lead to incorrect conclusions.

Here is an example illustrating the downside of accuracy. Without loss of generality, consider a binary classification task. Imagine a dataset of 100 samples, with 90 negative samples and 10 positive samples. Suppose you have a classifier that predicts negative for every sample. You will have an accuracy of 90%, but the F1 score will be 0, because your recall (which is a component of the F1 score) is 0.
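To make the arithmetic concrete, here is a minimal sketch of that 90/10 example in plain Python. The helper binary_metrics is a hypothetical name written for this illustration, not a library function, and it assumes the common convention of reporting a 0/0 precision or recall as 0.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, f1) for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Assumed convention: a 0/0 ratio is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1


# 90 negative samples (0) and 10 positive samples (1).
y_true = [0] * 90 + [1] * 10
# A classifier that predicts negative for every sample.
y_pred = [0] * 100

accuracy, precision, recall, f1 = binary_metrics(y_true, y_pred)
print(accuracy, recall, f1)  # 0.9 0.0 0.0
```

The classifier never finds a single positive sample, yet accuracy alone would make it look like a decent model.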

In practice, for a multi-class classification model (which is your use case), accuracy is mostly favored. F1 is usually used for multi-label or binary classification where the classes are highly imbalanced.
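If the three classes are imbalanced, though, the two metrics can disagree in exactly the way the question describes. The sketch below is only an illustration under assumptions: the labels and both sets of predictions are made up, the function names are hypothetical, and macro averaging (the unweighted mean of per-class F1) is assumed, since the question does not say which averaging was used.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging, assumed here)."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); reported as 0.0 when undefined.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up data: 80 samples of class 0, 10 of class 1, 10 of class 2.
y_true = [0] * 80 + [1] * 10 + [2] * 10

# Hypothetical model A: perfect on class 0, mostly misses classes 1 and 2.
pred_a = [0] * 80 + [1] * 2 + [0] * 8 + [2] * 2 + [0] * 8

# Hypothetical model B: some mistakes on class 0, good on classes 1 and 2.
pred_b = [0] * 66 + [1] * 7 + [2] * 7 + [1] * 8 + [0] * 2 + [2] * 8 + [0] * 2

print("A:", accuracy(y_true, pred_a), macro_f1(y_true, pred_a))
print("B:", accuracy(y_true, pred_b), macro_f1(y_true, pred_b))
```

With these made-up predictions, model A has the higher accuracy while model B has the higher macro F1, because B does much better on the two minority classes. Which model is preferable then comes down to how much the minority classes matter for your use case.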
