I am reading Elements of Information Theory by Cover and Thomas (2006) and am struggling with the definition of mutual information for continuous random variables (Chapter 9: Differential Entropy). For two random variables with a joint pdf $f(x, y)$, they define the mutual information as
\begin{equation}
I(X;Y) = \int f(x, y) \log \frac{f(x, y)}{f(x)f(y)} \,\mathrm{d}x\,\mathrm{d}y.
\end{equation}
Later, they give a more general definition in terms of the mutual information of discrete random variables,
\begin{equation}
I(X;Y) = \sup_{P, Q} I([X]_P; [Y]_Q),
\end{equation}
where $[X]_P$ (resp. $[Y]_Q$) is the quantization of $X$ (resp. $Y$) with respect to a finite partition $P$ of the range of $X$ (resp. $Q$ of the range of $Y$).

They then state that the two definitions are equivalent for random variables with a density, and that this can be shown similarly to the way they show that the mutual information of continuous random variables is the limit of the mutual information of their quantized versions. That argument relies on the theorem that $H(X^{\Delta}) + \log\Delta \rightarrow h(X)$ as $\Delta \rightarrow 0$, where $\Delta$ is the bin length used for the uniform quantization of $X$ and $X^{\Delta}$ is the corresponding quantized version of $X$. But the general definition allows arbitrary finite partitions, not just uniform ones, and without uniformity that limit theorem does not apply, so I cannot figure out the proof of equivalence. Can anyone help?
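To build some intuition, here is a small numerical sketch (my own, not from the book) for a bivariate Gaussian with correlation $\rho$, where $I(X;Y) = -\tfrac{1}{2}\log(1-\rho^2)$ nats in closed form; the plug-in estimate of $I([X]^{\Delta}; [Y]^{\Delta})$ from a 2-D histogram does seem to approach this value as $\Delta$ shrinks (sample size permitting):

```python
# Numerical sanity check: for a bivariate Gaussian with correlation rho,
# I(X;Y) = -0.5 * log(1 - rho^2) in nats.  The plug-in estimate of the
# mutual information of the uniformly quantized pair should approach this
# value as the bin width Delta shrinks (given enough samples).
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8
n = 1_000_000

# Sample (X, Y) from a bivariate Gaussian with unit variances and correlation rho.
x, y = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n).T

def quantized_mi(x, y, delta):
    """Plug-in estimate of I([X]^Delta; [Y]^Delta) from a 2-D histogram."""
    x_edges = np.arange(x.min(), x.max() + delta, delta)
    y_edges = np.arange(y.min(), y.max() + delta, delta)
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    pxy = counts / counts.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of [X]^Delta (column vector)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of [Y]^Delta (row vector)
    mask = pxy > 0
    return np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask]))

exact = -0.5 * np.log(1 - rho**2)
print(f"closed form: {exact:.4f} nats")
for delta in (2.0, 1.0, 0.5, 0.25, 0.1):
    print(f"Delta = {delta}: {quantized_mi(x, y, delta):.4f} nats")
```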
Hint: First try to show that for any quantisation, $I(X;Y) \ge I([X]_P;[Y]_Q)$. Second, try to show that with the uniform partitions, it holds that $I([X]_{\mathrm{Unif},\Delta}; [Y]_{\mathrm{Unif},\Delta}) \to I(X;Y)$. This is enough to establish the claim: due to the first part, $I(X;Y)$ dominates the supremum, and due to the second, there is a sequence of partitions with the quantised MI converging to $I(X;Y)$, and so $I(X;Y)$ cannot exceed the supremum. – stochasticboy321 Jan 28 '24 at 16:19
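Expanding the first step of that hint (a standard data-processing argument, not spelled out in the book): since $[X]_P$ is a deterministic function of $X$ and $[Y]_Q$ is a deterministic function of $Y$, applying the data processing inequality once in each coordinate gives
\begin{equation}
I([X]_P; [Y]_Q) \le I(X; [Y]_Q) \le I(X; Y),
\end{equation}
and taking the supremum over all finite partitions $P, Q$ then yields $\sup_{P,Q} I([X]_P; [Y]_Q) \le I(X;Y)$.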