I know the general expression of the F1-score:

$$F_1 = 2 \cdot \frac{precision \cdot recall}{precision + recall}$$

And its $F_\beta$ variant (see: https://en.wikipedia.org/wiki/F-score):

$$F_\beta = (1+\beta^2) \cdot \frac{precision \cdot recall}{\beta^2 \cdot precision + recall}$$
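As a quick sanity check, the formula can be computed directly (a minimal Python sketch; the function name `f_beta` is my own):

```python
def f_beta(precision, recall, beta):
    """Weighted harmonic mean of precision and recall (the F-beta score)."""
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

print(f_beta(0.5, 0.5, beta=1.0))  # 0.5 (beta=1 recovers the ordinary F1)
print(f_beta(0.6, 0.8, beta=2.0))  # 0.75 (beta=2 pulls the score toward recall)
```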

I was wondering what optimisation problem it solves. That is, $F_1$ is the harmonic mean of precision and recall, and $F_\beta$ is a weighted harmonic mean in which recall is weighted $\beta$ times as much as precision. But I have a hard time linking this with a cost matrix.

That is, with the cost matrix:

| True \ Predicted | 0 | 1 |
|---|---|---|
| **0** | a | b |
| **1** | c | d |

In terms of $a$, $b$, $c$ and $d$, what is maximising $F_\beta$ achieving? Can the $\beta$ weighting be linked to an imbalance between $a$, $b$, $c$ and $d$?

Lucas Morin

1 Answer

There isn't a direct and perfect correspondence between the optimal cost-based objective and the optimal F1 score.

Here's an example to show that, though it's perhaps somewhat unsatisfactory because it essentially constructs a sufficiently imbalanced dataset to break things.

$\DeclareMathOperator{tp}{TP}\DeclareMathOperator{fp}{FP}\DeclareMathOperator{tn}{TN}\DeclareMathOperator{fn}{FN}$ I'll take as convention that the cost-based objective is $a\tn-b\fp-c\fn+d\tp$, with $a,b,c,d\geq0$, and we're seeking the maximum.

If $a,b>0$, then let $x=\max\{2c/a, 2d/b\}$, and consider the two confusion matrices

$$\begin{bmatrix}x&0\\1&0\end{bmatrix}, \begin{bmatrix}0&x\\0&1\end{bmatrix}$$

The first one has $F_1=0$ and cost-based objective $xa-c\geq c\geq0$ (since $xa\geq 2c$); the second one has $F_1=2/(2+x)>0$ and objective $d-bx\leq -d \leq 0$ (since $bx\geq 2d$). So the two metrics disagree about which confusion matrix is better.

(If one of $a,b$ is zero, I think this can still be salvaged; you just need to take $x$ large enough. If they're both zero, then it doesn't matter what $x$ is: the example works.)
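To make the counterexample concrete, here is a quick numeric check (a Python sketch; the costs $a=b=c=d=1$ are an arbitrary choice of mine):

```python
# Costs for TN, FP, FN, TP respectively (arbitrary illustrative values).
a, b, c, d = 1, 1, 1, 1
x = max(2 * c / a, 2 * d / b)  # x = 2 here

def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def cost_objective(tn, fp, fn, tp):
    return a * tn - b * fp - c * fn + d * tp

# Matrix 1: TN=x, FP=0, FN=1, TP=0
m1_f1, m1_obj = f1(0, 0, 1), cost_objective(x, 0, 1, 0)   # 0.0 and x*a - c = 1
# Matrix 2: TN=0, FP=x, FN=0, TP=1
m2_f1, m2_obj = f1(1, x, 0), cost_objective(0, x, 0, 1)   # 2/(2+x) = 0.5 and d - b*x = -1

# F1 prefers matrix 2, while the cost objective prefers matrix 1.
assert m2_f1 > m1_f1 and m1_obj > m2_obj
```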

I'd be happy to hear an example that didn't rely so much on imbalance.


From a higher level, rewrite $$F_1 = \frac{2\tp}{2\tp+\fp+\fn} = \frac{1}{1+\frac{\fp+\fn}{2\tp}},$$ to see that maximizing $F_1$ is equivalent to minimizing $\frac{\fp+\fn}{\tp}$. Thinking about the case $a=0$, $b=c=d=1$, we're still looking at optimizing the ratio of errors to true positives vs optimizing the difference between errors and true positives.
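A small sketch of that ratio-vs-difference distinction (the counts for the two hypothetical classifiers are made up):

```python
def f1(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)

def error_ratio(tp, fp, fn):
    # Maximizing F1 is equivalent to minimizing this ratio.
    return (fp + fn) / tp

def error_difference(tp, fp, fn):
    # The cost objective with a=0, b=c=d=1: tp - (fp + fn).
    return tp - (fp + fn)

A = (100, 60, 0)  # (tp, fp, fn) for hypothetical classifier A
B = (10, 2, 0)    # (tp, fp, fn) for hypothetical classifier B

# F1 and the error ratio always rank classifiers the same way...
assert f1(*B) > f1(*A) and error_ratio(*B) < error_ratio(*A)
# ...but the difference (the cost objective) can prefer the other one.
assert error_difference(*A) > error_difference(*B)
```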

(That leads to a nice 2d optimization, with feasible region a triangle, F1 level curves being lines through the origin of differing slopes, and objective level curves being parallel lines.)

Ben Reiniger