
I would like to understand, in simple terms, what the Hellinger distance actually computes. What types of problems is it suited to, and what are the benefits of using it?

Smith Volka

1 Answer


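Hellinger distance is a metric that measures the difference between two probability distributions. It is the probabilistic analog of Euclidean distance: it is the Euclidean distance between the square roots of the two distributions, scaled so that it always lies between 0 (identical distributions) and 1 (distributions with no overlap).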

Given two probability distributions, $P$ and $Q$, Hellinger distance is defined as:

$$h(P,Q) = \frac1{\sqrt2}\cdot \|\sqrt{P}-\sqrt{Q}\|_2$$
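For discrete distributions, the definition above translates almost directly into code. The following is a minimal sketch, assuming $P$ and $Q$ are given as probability vectors over the same set of outcomes (non-negative entries that sum to 1); the function name `hellinger` is just an illustrative choice, not a library API.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    Assumes p and q are probability vectors of equal length.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Euclidean norm of the difference of element-wise square roots,
    # scaled by 1/sqrt(2) so the result lies in [0, 1].
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

# Identical distributions are at distance 0.
identical = hellinger([0.5, 0.5], [0.5, 0.5])

# Distributions with no overlapping outcomes are at distance 1.
disjoint = hellinger([1.0, 0.0], [0.0, 1.0])
```

Because of the scaling factor, the two extreme cases land at 0 and 1, and everything else falls in between.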

It is useful when you need to quantify the difference between two probability distributions. For example, suppose you estimate the distribution of each feature separately for users and non-users of a service. If the Hellinger distance between the two groups is small for a given feature, that feature does little to distinguish the groups and is not useful for segmentation.
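The segmentation idea can be sketched as follows. This is only an illustration: the feature names and category counts are made up, and the helpers `hellinger` and `to_distribution` are hypothetical names, not part of any library.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

def to_distribution(counts):
    """Normalize raw category counts into a probability vector."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

# Hypothetical category counts per feature: (users, non_users).
feature_counts = {
    "age_bucket": ([30, 40, 30], [28, 42, 30]),   # nearly the same in both groups
    "device_type": ([80, 15, 5], [20, 30, 50]),   # clearly different
}

# Distance between the two groups' distributions, per feature.
distances = {
    name: hellinger(to_distribution(users), to_distribution(non_users))
    for name, (users, non_users) in feature_counts.items()
}

# Features with the largest distance separate the groups best.
ranked = sorted(distances, key=distances.get, reverse=True)
print(ranked)
```

With these made-up counts, `device_type` ranks ahead of `age_bucket`, which matches the intuition that a feature distributed almost identically in both groups carries little information for telling them apart.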

timleathart
Brian Spiering