
I'm grappling with the concept of entropy as it relates to discrete and continuous distributions, and I’d appreciate some help understanding a thought experiment that’s left me puzzled.

Thought Experiment:

Imagine a discrete probability distribution that is uniform over the vertices of a regular $n$-sided polygon centered at the origin of a 2D plane. Now imagine a uniform continuous distribution on the unit circle, also centered at the origin, with the $n$-sided polygon inscribed in the circle. As $n$ increases, the polygon’s discrete distribution converges (in distribution) to the continuous uniform distribution on the circle.

My Understanding:

In this limit, the discrete distribution becomes indistinguishable from the continuous uniform distribution on the circle. So, I would expect their entropies to converge as well. However, I understand that:

  • The discrete entropy $H(X) = \log(n)$ increases without bound as $n \to \infty$.
  • The differential entropy of the continuous uniform distribution on the circle is finite, since the probability density is spread over a set of finite measure; for the uniform distribution on the circumference the density is $\frac{1}{2\pi}$, so the differential entropy is $\log(2\pi)$. (A quick numerical sketch of both quantities follows below.)
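
To make this concrete, here is a minimal numerical sketch I put together (Python; the helper name `discrete_entropy` is just mine for illustration) comparing the two quantities:

```python
import numpy as np

def discrete_entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 * log(0) is taken to be 0
    return -np.sum(p * np.log(p))

# Differential entropy of the uniform density 1/(2*pi) on the circle's circumference.
h_circle = np.log(2 * np.pi)          # ≈ 1.8379 nats

for n in [4, 16, 256, 4096, 65536]:
    H_n = discrete_entropy(np.full(n, 1.0 / n))   # equals log(n) exactly
    print(f"n = {n:6d}   H(X_n) = {H_n:8.4f}   h(circle) = {h_circle:.4f}")
# H(X_n) = log(n) grows without bound, while h(circle) stays fixed at log(2*pi).
```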

I’m struggling with the intuition here: the distributions converge in the limit, yet the entropies behave completely differently. Specifically:

  • Why does the discrete entropy grow unbounded as $n$ increases, while the continuous entropy remains finite, even though the distributions become identical in the limit?
  • I understand that in the continuous case, we are measuring the spread of probability density, but what is the deeper reason for this divergence in entropy when approximating a continuous distribution with an increasingly fine discrete one? What conceptual or mathematical property leads to this difference?

I’d appreciate any intuition pumps or technical insights to help reconcile this gap between the discrete and continuous views of entropy, especially in relation to their limiting behaviors.

Thank you for your help!

  • For the millionth time: the differential entropy is not the entropy of a continuous variable https://math.stackexchange.com/questions/1398438/differential-entropy/1398471#1398471

    https://math.stackexchange.com/questions/2552895/uniformly-random-number-on-0-1-has-zero-entropy/2553036#2553036

    – leonbloy Sep 19 '24 at 14:22

1 Answer


Your confusion mostly comes from the fact that differential entropy and discrete entropy are not equivalent quantities; in particular, the former is not the natural limit of the latter. One way to see this mathematically is to apply a change of variables. Assume we deal with random variables taking values in $\mathbb R$. Let $\varphi: \mathbb R \rightarrow \mathbb R$ be a continuously differentiable, strictly increasing function (morally, this corresponds to a transformation that preserves information); then:

  • If $X$ is discrete and finitely supported on $\mathcal X$, then its entropy is $$ \begin{aligned} H(\varphi(X)) &= - \sum_{x \in \varphi (\mathcal X)} P(\varphi(X) = x) \log P(\varphi(X) = x) = -\sum_{x \in \mathcal X} P(\varphi(X) = \varphi(x)) \log P (\varphi(X) = \varphi(x)) \\ &= -\sum_{x \in \mathcal X} P (X = x) \log P(X = x) = H(X), \end{aligned} $$ where the last line uses that $\varphi$ is injective. Applying $\varphi$ merely relabels the support, so the discrete entropy is unchanged.
  • Now, assume $X$ is continuous and supported on $\mathcal X \subseteq \mathbb R$, with density $p_X$. Then the density of $Y = \varphi(X)$ is $p_Y(y) = p_X \circ \varphi^{-1}(y) \cdot \left|(\varphi ^{-1})^\prime(y)\right|$, so its differential entropy is $$ \begin{aligned} H_{\mathrm{diff}}(Y) &= - \int_{\varphi(\mathcal X)} p_Y(y) \log p_Y (y) \, dy \\ &= - \int_{\mathcal X} p_Y(\varphi (x)) \log p_Y(\varphi (x)) \, \varphi^\prime (x) \, dx \\ &= -\int_{\mathcal X} p_X(x) \log \left(p_X (x) \cdot (\varphi^{-1})^\prime(\varphi(x))\right) dx \\ &= - \int_{\mathcal X} p_X(x) \log (p_X(x)) \, dx - \int_{\mathcal X} p_X(x) \log \left( (\varphi^{-1})^\prime(\varphi(x))\right) dx. \end{aligned} $$ As you can see, we recover the differential entropy $H_{\mathrm{diff}}(X)$ plus an extra term that is typically nonzero, so in general $H_{\mathrm{diff}}(\varphi(X)) \neq H_{\mathrm{diff}}(X)$. A short numerical check of this contrast follows below.
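
Here is a small numerical sketch of that contrast (the choice $\varphi(x) = 3x$ and the sample-based entropy estimate are purely illustrative, not part of the argument above): relabelling a discrete support leaves $H$ untouched, while even a simple rescaling shifts the differential entropy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete case: applying an increasing map phi only relabels the support,
# so the probabilities -- and hence H -- do not change.
p = np.full(4, 0.25)                  # X uniform on 4 points
H = -np.sum(p * np.log(p))
print("H(X) = H(phi(X)) =", H)        # log(4) ≈ 1.3863 for both X and phi(X)

# Continuous case: X uniform on [0, 1] has differential entropy 0, but
# Y = phi(X) = 3*X is uniform on [0, 3] with density 1/3, so h(Y) = log(3).
def mc_diff_entropy(samples, density):
    """Monte Carlo estimate of -E[log density(X)]."""
    return -np.mean(np.log(density(samples)))

x = rng.uniform(0.0, 1.0, 10**6)
print("h(X)  ≈", mc_diff_entropy(x, lambda t: np.ones_like(t)))              # ≈ 0
print("h(3X) ≈", mc_diff_entropy(3.0 * x, lambda t: np.full_like(t, 1/3)))   # ≈ log(3) ≈ 1.0986
```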

For some extra intuition, the Wikipedia page on the limiting density of discrete points might help: https://en.wikipedia.org/wiki/Limiting_density_of_discrete_points. It makes precise what happens in your thought experiment: for a fine quantization of a continuous distribution into $n$ cells of width $\Delta_n$, one has $H(X_n) \approx H_{\mathrm{diff}}(X) - \log \Delta_n$, so the discrete entropy diverges like $\log n$ while the part left over after subtracting $\log n$ is (up to a constant) the differential entropy.
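
Below is a minimal sketch of that quantization relation, assuming some smooth density on the circle (the cosine-bump density and all numerical choices are just for illustration): as $n$ grows, $H_n - \log n$ converges to $H_{\mathrm{diff}} - \log(2\pi)$.

```python
import numpy as np

# Quantize a density p on the circle into n equal arcs of width 2*pi/n and
# compare the discrete entropy H_n with the differential entropy h:
# the claim is H_n - log(n) -> h - log(2*pi) as n grows.
def p(theta):
    return (1.0 + np.cos(theta)) / (2.0 * np.pi)   # integrates to 1 on [0, 2*pi)

# Reference differential entropy h = -∫ p log p dθ via a fine midpoint rule.
M = 1_000_000
theta = (np.arange(M) + 0.5) * (2 * np.pi / M)
vals = p(theta)
h = -np.sum(np.where(vals > 0, vals * np.log(vals), 0.0)) * (2 * np.pi / M)

for n in [8, 64, 512, 4096]:
    delta = 2 * np.pi / n
    # probability of each arc, again via a midpoint rule inside the arc
    sub = 200
    mids = np.arange(n)[:, None] * delta + (np.arange(sub) + 0.5)[None, :] * delta / sub
    p_i = p(mids).mean(axis=1) * delta
    H_n = -np.sum(np.where(p_i > 0, p_i * np.log(p_i), 0.0))
    print(f"n = {n:5d}   H_n - log(n) = {H_n - np.log(n):+.5f}"
          f"   h - log(2*pi) = {h - np.log(2 * np.pi):+.5f}")
# The divergent part of H_n is exactly log(n); what remains converges to the
# differential entropy (up to the constant log of the circumference).
```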