
If $X$ is a discrete random variable, its entropy $H(X)$ is usually defined as something along the lines of $-\sum_x \mathbb{P}(x) \log_2 \mathbb{P}(x)$, where the sum ranges over all the possible values $x$ of $X$.
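For concreteness, here is a quick numerical sketch of this discrete definition (my own, using numpy; the example distributions are arbitrary):

```python
import numpy as np

def discrete_entropy(p):
    """Shannon entropy, in bits, of a finite probability vector p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # convention: 0 * log2(0) = 0
    return -np.sum(p * np.log2(p))

print(discrete_entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(discrete_entropy([1/8] * 8))    # uniform over 8 symbols: 3.0 bits
print(discrete_entropy([0.9, 0.1]))   # biased coin: ~0.469 bits
```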

I have seen a few expositions of the problem of extending this definition to continuous random variables. I take it that the standard counterpart to $H$ in the continuous case is the so-called differential entropy, denoted by $h(X)$, and defined as

$$ -\int f(x)\log_2(f(x))\, dx\,, $$

where $f$ is the probability density of $X$, and the integral runs over the (continuous) set of values of $X$.
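Just as a sanity check on this formula, here is a minimal numerical sketch of my own (using scipy and a standard normal, whose differential entropy has the known closed form $\tfrac{1}{2}\log_2(2\pi e \sigma^2)$):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

sigma = 1.0

def integrand(x):
    """-f(x) * log2(f(x)) for the N(0, sigma^2) density, with 0 * log(0) := 0."""
    f = norm.pdf(x, scale=sigma)
    return 0.0 if f <= 0 else -f * np.log2(f)

h_numeric, _ = quad(integrand, -np.inf, np.inf)
h_closed = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)

print(h_numeric, h_closed)   # both ≈ 2.047 bits
```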

This definition is always contingent on the existence of the integral, but even putting aside this existence question, I'm a bit confused about the proper way to interpret this integral.

None of the derivations (of the passage to the continuous case) that I have seen has been terribly concerned with mathematical subtleties such as well-definedness, convergence, etc. A possible explanation for this attitude is that all these derivations are based on the Riemann-integral formalism, which strikes me as particularly ill-suited for thinking about this particular integral's convergence.

I'm looking for a (hopefully mathematically rigorous) measure-theoretic approach to this generalization of the entropy to the continuous case. I'd appreciate any pointers.

kjo
  • For a rigorous discussion of Information Theory you can see http://ee.stanford.edu/~gray/it.html Unfortunately I don't think he uses the words "differential entropy". But maybe you can fit your question into his discussion of relative entropy? – Martin Leslie Apr 11 '14 at 00:07
  • Another reference: http://200.9.100.182/~josilva/review_pending/silva_convergence_I_measure.pdf This has a reasonable definition of differential entropy and talks about convergence. But I'm not sure it answers your question. – Martin Leslie Apr 11 '14 at 18:42
  • Related question: http://mathoverflow.net/questions/162301/intrinsic-significance-of-differential-entropy – echinodermata Apr 22 '14 at 03:17

1 Answer


The most general definition of information quantities is in terms of a KL divergence. This definition is discussed starting on page 107 of Entropy and Information Theory by Gray.

On a probability space $(\Omega,\mathcal{A})$ carrying two probability measures $P,Q$, and for a finite-alphabet random variable $Z$, define the relative entropy of the measurement $Z$ under $P$ with respect to $Q$ as:

$$H_{P\|Q}(Z)=\sum_{S\in\operatorname{Part}(Z)} \bigl(P(S),\,Q(S)\bigr)^{\ast},$$

where $\operatorname{Part}(Z)$ is the partition of $\Omega$ induced by the inverse map of $Z$, and $(\cdot,\cdot)^{\ast}:[0,1]\times[0,1] \to (-\infty,\infty]$ is the summand $a\log_2(a/b)$ extended by the usual conventions at the boundary: $$ (a,b)^{\ast}:= \left\lbrace \begin{array}{ll} a\cdot \log_2(a/b) & a,b>0 \\ \infty & a> 0,\ b= 0 \\ 0 & \text{otherwise} \end{array}\right.$$
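As a small illustration of this definition, here is a sketch (mine, with a made-up three-cell partition and arbitrary masses under $P$ and $Q$):

```python
import math

def pair_star(a, b):
    """(a, b)* = a*log2(a/b), with the conventions: 0 if a == 0, inf if a > 0 and b == 0."""
    if a > 0 and b > 0:
        return a * math.log2(a / b)
    if a > 0 and b == 0:
        return math.inf
    return 0.0

def relative_entropy(P_cells, Q_cells):
    """H_{P||Q}(Z): sum of (P(S), Q(S))* over the cells S of the partition induced by Z."""
    return sum(pair_star(p, q) for p, q in zip(P_cells, Q_cells))

# Hypothetical measurement Z inducing three cells, with masses under P and Q:
P = [0.5, 0.3, 0.2]
Q = [0.25, 0.25, 0.5]
print(relative_entropy(P, Q))   # ≈ 0.315 bits
```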

Then define the mutual information between two RVs $X,Y$ on $(\Omega,\mathcal{A})$ as:

$$I(X;Y)=\sup_{Z\ \text{finite alphabet}}H_{P_{XY}\| P_X \times P_Y}(Z).$$

Finally, define the entropy of an RV $X$ as $H(X)=I(X;X)$. Since it is defined as a supremum, it always exists (possibly equal to $+\infty$). The $Z$s play the role that simple functions play in the construction of the Lebesgue integral.
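To see the supremum at work, here is a rough numerical sketch (my own construction, not from Gray's book): quantize a jointly Gaussian pair into ever finer equiprobable cells and watch the plug-in estimate climb toward the closed form $-\tfrac{1}{2}\log_2(1-\rho^2)$. The same construction applied to $I(X;X)$ keeps growing without bound, consistent with the comment below.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.8, 200_000

# Jointly Gaussian (X, Y) with correlation rho.
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

def quantized_mi(x, y, bins):
    """Plug-in estimate of the mutual information between equiprobable quantizations."""
    qx = np.quantile(x, np.linspace(0, 1, bins + 1))
    qy = np.quantile(y, np.linspace(0, 1, bins + 1))
    pxy, _, _ = np.histogram2d(x, y, bins=[qx, qy])
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return np.sum(pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask]))

print(-0.5 * np.log2(1 - rho**2))            # closed form: ≈ 0.737 bits
for bins in (2, 4, 8, 16, 32):
    print(bins, quantized_mi(x, y, bins))    # increases toward the closed form

# The same estimator applied to (x, x) grows like log2(bins), i.e. without bound,
# matching H(X) = I(X;X) = ∞ for a continuous X.
```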

  • This seems like a more reasonable formulation of "continuous entropy" than differential entropy. But it should be clearly stated that this is not a derivation of differential entropy since differential entropy leads to the conclusion $I(X;X)=\infty$. This version gets it right. – cantorhead Feb 04 '18 at 01:18