
The Kullback-Leibler divergence in the continuous case, for two probability densities $p$ and $q$, is

$$D_\text{kl}\left(p,q\right) = \int_{x\in\chi}\left(p(x)\log(p(x)) - p(x)\log(q(x))\right)\text{d}x$$

Then why is it that, for any choice of $q$, $D_\text{kl}\left(p,q\right)\geq0$?
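
Numerically the inequality does seem to hold; here is a minimal sketch, assuming two arbitrarily chosen Gaussian densities and a truncated integration grid (requires numpy and scipy):

```python
# Numerical sanity check of the definition above: estimate the integral on a
# truncated grid for two arbitrarily chosen Gaussian densities (hypothetical
# example parameters, just for illustration).
import numpy as np
from scipy.stats import norm

x = np.linspace(-10.0, 10.0, 100_001)    # grid truncating chi to [-10, 10]
dx = x[1] - x[0]
p = norm.pdf(x, loc=0.0, scale=1.0)      # p = density of N(0, 1)
q = norm.pdf(x, loc=1.0, scale=2.0)      # q = density of N(1, 4)

# Integrand p(x) log p(x) - p(x) log q(x); points where p vanishes contribute 0.
mask = p > 0
integrand = p[mask] * (np.log(p[mask]) - np.log(q[mask]))
d_kl = float(np.sum(integrand) * dx)     # simple Riemann-sum approximation

print(d_kl)   # ~0.443, in particular >= 0 for this choice of p and q
```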


I will attempt a proof by contradiction. Suppose that there exists a probability density $q$ for which $D_\text{kl}\left(p,q\right)<0$. An initial guess would be to take

$\forall x\in\chi: q(x)>p(x)$

But that cannot hold, since both $p$ and $q$ must integrate to one:

$$\int_{x\in\chi}q(x)\text{d}x>\int_{x\in\chi}p(x)\text{d}x = 1 \unicode{x21af}$$.

So $p$ and $q$ must intersect at least once, meaning we can split the integral into two parts: one over the region where $q\geq p$ and another over the region where $q<p$:

$$D_\text{kl}\left(p,q\right) = \int_{x\in\chi_{q\geq p}}\left(p(x)\log(p(x)) - p(x)\log(q(x))\right)\text{d}x + \int_{x\in\chi_{q< p}}\left(p(x)\log(p(x)) - p(x)\log(q(x))\right)\text{d}x$$.

In the second term, choose $q$ as close to $p$ as possible:

$$\forall x\in\chi_{q<p} : q(x) = p(x)-\epsilon(x)$$.

In the first term, we want $q\geq p$. However, if we choose $q>p$ strictly, then:

$$\int_{x\in\chi}q(x)\text{d}x = \int_{x\in\chi_{q\geq p}}q(x)\text{d}x + \int_{x\in\chi_{q<p}}q(x)\text{d}x>\int_{x\in\chi_{q\geq p}}p(x)\text{d}x + \int_{x\in\chi_{q<p}}p(x)\text{d}x - \int_{x\in\chi_{q<p}}\epsilon(x)\text{d}x$$ $$\int_{x\in\chi}q(x)\text{d}x>1- \int_{x\in\chi_{q<p}}\epsilon(x)\text{d}x$$.

In the limit $\epsilon(x)\to0$ we reach a contradiction. So the only suitable choice for $q$ is $q=p$.
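
To illustrate the split, here is a minimal numerical sketch (again assuming an arbitrarily chosen Gaussian pair): the contribution from $\chi_{q\geq p}$ is negative, the contribution from $\chi_{q<p}$ is positive, and their sum remains nonnegative:

```python
# Hypothetical Gaussian example, split over the two regions used above:
# the piece on chi_{q >= p} is <= 0, the piece on chi_{q < p} is >= 0,
# and their sum (the full divergence) is still nonnegative.
import numpy as np
from scipy.stats import norm

x = np.linspace(-10.0, 10.0, 100_001)
dx = x[1] - x[0]
p = norm.pdf(x, loc=0.0, scale=1.0)
q = norm.pdf(x, loc=1.0, scale=2.0)

integrand = p * (np.log(p) - np.log(q))
part_q_ge_p = float(np.sum(integrand[q >= p]) * dx)   # negative contribution
part_q_lt_p = float(np.sum(integrand[q <  p]) * dx)   # positive contribution

print(part_q_ge_p, part_q_lt_p, part_q_ge_p + part_q_lt_p)
```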

  • See Lemma 4.2 in https://kconrad.math.uconn.edu/blurbs/analysis/entropypost.pdf. – KCd Aug 21 '22 at 14:04

2 Answers


The most elementary proof uses the inequality $\log t\leq t -1$ for $t >0$, which can be verified by differentiation. Note that restricting the integration in the definition of $D_\text{kl}\left(p,q\right)$ to the set $\{x: p(x)>0\}$ does not affect the value of the integral. Therefore,

$$-D_\text{kl}\left(p,q\right) = \int_{p(x)>0} p(x)\log \frac{q(x)}{p(x)} \,\text{d}x $$ $$\leq \int_{p(x)>0} p(x) \Bigl(\frac{q(x)}{p(x)}-1 \Bigr) \,\text{d}x = \int_{p(x)>0} q(x) \,\text{d}x - \int_{p(x)>0} p(x) \,\text{d}x \leq 0\,.$$

There are other proofs using convexity and Jensen's inequality.
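
As a sanity check, the key inequality can also be verified numerically; here is a minimal sketch over an arbitrarily chosen grid of positive $t$ values:

```python
# Quick check of the elementary inequality log t <= t - 1 for t > 0 on an
# arbitrarily chosen grid of t values (requires numpy).
import numpy as np

t = np.linspace(1e-6, 10.0, 1_000_001)
gap = np.log(t) - (t - 1.0)

assert np.all(gap <= 0.0)
print(gap.max())   # ~0, attained near t = 1, the only point of equality
```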

– Yuval Peres

The non-negativity of the KL divergence goes by the name Gibbs' inequality, so you can find many proofs by searching for this name.

As for your proof, I believe it is flawed. At a high level, you are doing the following:

  1. Assume you have a counterexample to Gibbs' inequality, i.e. $p,q$ such that $D(p||q) < 0$

  2. Analyze $p, q$ of (two) particular forms

  3. Show that in these two particular cases, $D(p||q)<0$ implies a contradiction.

Any proof of the above needs to handle arbitrary $(p,q)$, though, not $p,q$ of one (or two) specific forms. So, as currently written, the proof has gaps.