11

Common model validation statistics like the Kolmogorov–Smirnov (KS) statistic, AUROC, and the Gini coefficient are all functionally related. My question is how to prove these relationships. I haven't been able to find anything online, and I am genuinely interested in how the proofs work. For example, I know Gini=2AUROC-1, but my best proof involves pointing at a graph. I am interested in formal proofs. Any help would be greatly appreciated!

nealmcb
Steven

3 Answers

1

The result Gini=2*AUROC-1 is hard to prove because it is not necessarily true. The Wikipedia article on the Receiver Operating Characteristic curve gives the result as a definition of Gini, and the article by Hand and Till (cited by nealmcb) merely says that the graphic definition of Gini using the ROC curve leads to this formula.
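Under that graphical definition the formula follows in one line. This sketch assumes the ROC-based Gini is the area between the ROC curve and the chance diagonal, divided by the same area for a perfect classifier (whose ROC curve has area 1); that reading is an assumption here, not a quotation from either source:

$$ Gini = \frac{AUROC - \frac{1}{2}}{1 - \frac{1}{2}} = 2\,AUROC - 1 $$

So with this definition the identity is exact by construction, which is why it is stated as a definition rather than proved as a theorem.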

The catch is that this definition of Gini is used in the machine-learning and engineering communities, but a different definition is used by economists and demographers (going back to Gini's original paper). The Wikipedia article on the Gini coefficient sets out this definition, based on the Lorenz curve.

A paper by Schechtman & Schechtman (2016) sets out the relationship between AUC and the original Gini definition. But to see that they cannot be exactly the same, suppose that the proportion of events is p and that we have a perfect classifier. The ROC curve then passes through the top-left corner and AUROC is 1. However, the (flipped) Lorenz curve runs from (0,0) to (p,1) to (1,1), so the area under it is 1-p/2 and the economists' Gini (twice the area between the curve and the diagonal) is 1-p, which is nearly but not exactly 1 when events are rare.

If events are rare, then the relationship Gini=2*AUROC-1 is nearly but not exactly true using Gini's original definition. The relationship is only exactly true if Gini is redefined to make it true.
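The perfect-classifier case can be checked numerically. The following is a minimal sketch, not taken from any of the cited papers: the function names are hypothetical, and it assumes the Lorenz-based Gini is twice the area between the flipped Lorenz curve and the diagonal.

```python
def trapezoid_area(xs, ys):
    """Area under the piecewise-linear curve through the points (xs[i], ys[i])."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def perfect_classifier_ginis(p):
    """Compare the two Gini definitions for a perfect classifier.

    p is the proportion of events (0 < p < 1). Hypothetical helper for
    illustration only.
    """
    # ROC curve of a perfect classifier: (0,0) -> (0,1) -> (1,1).
    auroc = trapezoid_area([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])
    gini_roc = 2.0 * auroc - 1.0

    # Flipped Lorenz curve of a perfect classifier: (0,0) -> (p,1) -> (1,1).
    lorenz_area = trapezoid_area([0.0, p, 1.0], [0.0, 1.0, 1.0])
    # Assumed definition: twice the area between the curve and the diagonal.
    gini_lorenz = 2.0 * lorenz_area - 1.0

    return auroc, gini_roc, lorenz_area, gini_lorenz


if __name__ == "__main__":
    for p in (0.5, 0.1, 0.01):
        print(p, perfect_classifier_ginis(p))
```

For any p the ROC-based Gini comes out as 1, while the area under the flipped Lorenz curve is 1 - p/2 and the Lorenz-based Gini is 1 - p, so the two definitions only agree in the limit of rare events.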

PaulVD
1

The Wikipedia entry for Receiver operating characteristic references this paper for the Gini=2AUROC-1 result: Hand, David J.; and Till, Robert J. (2001); A simple generalization of the area under the ROC curve for multiple class classification problems, Machine Learning, 45, 171–186. But I'm afraid I don't have easy access to it to see how close it comes to what you want.

nealmcb
0

According to Adeodato and Melo (2016), there is a linear relationship between the area under the KS curve (AUKS) and the area under the ROC curve (AUROC), namely:

$$ AUROC = 0.5 + AUKS $$

A proof of the equivalence is included in the paper.
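The relationship can also be checked empirically. The sketch below is an illustration rather than the paper's construction: it assumes the KS curve is TPR - FPR at each threshold and that its area is taken along the false positive rate axis, and all function names and the toy data are hypothetical.

```python
def roc_points(scores, labels):
    """Cumulative (FPR, TPR) points, sweeping the threshold from high to low.

    Assumes binary labels in {0, 1}, both classes present, and no tied scores.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def trapezoid_area(xs, ys):
    """Area under the piecewise-linear curve through the points (xs[i], ys[i])."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def auroc_and_auks(scores, labels):
    """Return (AUROC, AUKS), with the KS curve assumed to be TPR - FPR vs FPR."""
    fpr, tpr = roc_points(scores, labels)
    ks_curve = [t - f for f, t in zip(fpr, tpr)]
    return trapezoid_area(fpr, tpr), trapezoid_area(fpr, ks_curve)


if __name__ == "__main__":
    # Toy data, made up for illustration.
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0, 0, 1, 0]
    auroc, auks = auroc_and_auks(scores, labels)
    print(auroc, auks, 0.5 + auks)
```

Under these assumptions the identity is immediate, because the area under the diagonal FPR-versus-FPR line is exactly 0.5, so integrating TPR - FPR gives AUROC - 0.5.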