
Suppose that $(\Omega,\mathcal{F})$, $(\mathcal{X},\mathcal{F}_{\mathcal{X}})$, and $(\mathcal{Y},\mathcal{F}_{\mathcal{Y}})$ are measurable spaces, that $X,\tilde{X}$ are measurable maps from $(\Omega,\mathcal{F})$ to $(\mathcal{X},\mathcal{F}_{\mathcal{X}})$, that $\varphi$ is a measurable map from $(\mathcal{X},\mathcal{F}_{\mathcal{X}})$ to $(\mathcal{Y},\mathcal{F}_{\mathcal{Y}})$, and that $\mathbb P$ is a probability measure on $(\Omega,\mathcal{F})$. Consider the Kullback-Leibler divergences $$\mathcal{D}_{\text{KL}}(\mathbb{P}_X,\mathbb{P}_{\tilde{X}}) := \int_{\mathcal{X}} \ln \bigg(\frac{\operatorname{d}\mathbb{P}_X}{\operatorname{d}\mathbb{P}_{\tilde{X}}}(x)\bigg) \operatorname{d}\mathbb{P}_X(x)\;,$$ and $$\mathcal{D}_{\text{KL}}(\mathbb{P}_{\varphi \circ X},\mathbb{P}_{\varphi \circ \tilde{X}}) := \int_{\mathcal{Y}} \ln \bigg(\frac{\operatorname{d}\mathbb{P}_{\varphi \circ X}}{\operatorname{d}\mathbb{P}_{\varphi \circ \tilde{X}}}(y)\bigg) \operatorname{d}\mathbb{P}_{\varphi \circ X}(y)\;.$$ Here $\mathbb{P}_X$ and $\mathbb{P}_{\tilde{X}}$ are the push-forward measures defined on $(\mathcal{X},\mathcal{F}_{\mathcal{X}})$ by $\mathbb{P}_X[A] := \mathbb{P}[X\in A]$ and $\mathbb{P}_{\tilde X}[A] := \mathbb{P}[\tilde{X}\in A]$ for each $A \in \mathcal{F}_{\mathcal{X}}$, while $\mathbb{P}_{\varphi \circ X}$ and $\mathbb{P}_{\varphi \circ \tilde{X}}$ are the push-forward measures defined on $(\mathcal{Y},\mathcal{F}_{\mathcal{Y}})$ by $\mathbb{P}_{\varphi \circ X}[B] := \mathbb{P}[\varphi \circ X\in B]$ and $\mathbb{P}_{\varphi \circ \tilde X}[B] := \mathbb{P}[\varphi \circ \tilde{X}\in B]$ for each $B \in \mathcal{F}_{\mathcal{Y}}$. Moreover, $\frac{\operatorname{d}\mathbb{P}_X}{\operatorname{d}\mathbb{P}_{\tilde{X}}} : \mathcal{X} \to [0,+\infty]$ is the Radon-Nikodym derivative of $\mathbb{P}_X$ with respect to $\mathbb{P}_{\tilde{X}}$ and $\frac{\operatorname{d}\mathbb{P}_{\varphi \circ X}}{\operatorname{d}\mathbb{P}_{\varphi \circ \tilde{X}}} : \mathcal{Y} \to [0,+\infty]$ is the Radon-Nikodym derivative of $\mathbb{P}_{\varphi \circ X}$ with respect to $\mathbb{P}_{\varphi \circ \tilde{X}}$.
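To keep these definitions concrete, here is a minimal numerical sketch (in Python, with hypothetical maps and a hypothetical uniform $\mathbb{P}$ on a six-point $\Omega$) of the push-forward measures and the first divergence on a finite space, where the Radon-Nikodym derivative is simply the ratio of the probability mass functions:

```python
import numpy as np

# Hypothetical finite sample space Omega = {0, ..., 5} with uniform P,
# and two measurable maps X, X_tilde into the finite space {0, 1, 2}.
p_omega = np.full(6, 1 / 6)                     # probability measure P on Omega
X       = np.array([0, 0, 1, 1, 2, 2])          # X(omega)
X_tilde = np.array([0, 1, 1, 2, 2, 2])          # X_tilde(omega)

def pushforward(T, p, n_states):
    """Push-forward measure P_T[A] = P[T in A] on a finite state space."""
    out = np.zeros(n_states)
    np.add.at(out, T, p)                        # accumulate mass of each fiber
    return out

P_X  = pushforward(X, p_omega, 3)               # [1/3, 1/3, 1/3]
P_Xt = pushforward(X_tilde, p_omega, 3)         # [1/6, 1/3, 1/2]

# On a finite space with strictly positive masses, dP_X/dP_Xt is the ratio
# of the mass functions, so the divergence is a finite sum.
kl = np.sum(P_X * np.log(P_X / P_Xt))
print(P_X, P_Xt, kl)
```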

Is it true that $$\mathcal{D}_{\text{KL}}(\mathbb{P}_{\varphi \circ X},\mathbb{P}_{\varphi \circ \tilde{X}}) \le \mathcal{D}_{\text{KL}}(\mathbb{P}_X,\mathbb{P}_{\tilde{X}})\;?$$

Intuitively, at least from an information-theoretic point of view, I expect this to be true, since the Kullback-Leibler divergence measures the "discrepancy" between two measures, and it seems plausible that a deterministic transformation cannot increase this discrepancy.

In fact, this is true at least when $\varphi(X)$ and $\varphi(\tilde{X})$ are discrete random variables and $\mathbb{P}_X$ and $\mathbb{P}_{\tilde{X}}$ are mutually absolutely continuous, since \begin{align} \mathcal{D}_{\text{KL}}(\mathbb{P}_{\varphi \circ X},\mathbb{P}_{\varphi \circ \tilde{X}}) &= \int_{\mathcal{Y}} \ln \bigg(\frac{\operatorname{d}\mathbb{P}_{\varphi \circ X}}{\operatorname{d}\mathbb{P}_{\varphi \circ \tilde{X}}}(y)\bigg) \operatorname{d}\mathbb{P}_{\varphi \circ X}(y) \\ &= \sum_{y \in \mathcal{Y}} \ln \bigg(\frac{\mathbb{P}[X \in \varphi^{-1}(y)]}{\mathbb{P}[\tilde{X} \in \varphi^{-1}(y)]}\bigg) \mathbb{P}[X \in \varphi^{-1}(y)] \\ &= \sum_{y \in \mathcal{Y}} -\ln \bigg(\frac{\mathbb{P}[\tilde{X} \in \varphi^{-1}(y)]}{\mathbb{P}[X \in \varphi^{-1}(y)]}\bigg) \mathbb{P}[X \in \varphi^{-1}(y)] \\ &= \sum_{y \in \mathcal{Y}} -\ln \bigg(\frac{1}{\mathbb{P}[X \in \varphi^{-1}(y)]} \int_{\varphi^{-1}(y)} \frac{\operatorname{d} \mathbb{P}_{\tilde{X}}}{\operatorname{d} \mathbb{P}_{X}}(x) \operatorname{d}\mathbb{P}_X (x)\bigg) \mathbb{P}[X \in \varphi^{-1}(y)] \\ &\le \sum_{y \in \mathcal{Y}} \bigg(\frac{1}{\mathbb{P}[X \in \varphi^{-1}(y)]} \int_{\varphi^{-1}(y)} -\ln \bigg( \frac{\operatorname{d} \mathbb{P}_{\tilde{X}}}{\operatorname{d} \mathbb{P}_{X}}(x) \bigg) \operatorname{d}\mathbb{P}_X (x)\bigg) \mathbb{P}[X \in \varphi^{-1}(y)] \\ &= \int_{\mathcal{X}} \ln \bigg(\frac{\operatorname{d}\mathbb{P}_X}{\operatorname{d}\mathbb{P}_{\tilde{X}}}(x)\bigg) \operatorname{d}\mathbb{P}_X(x) \\ &= \mathcal{D}_{\text{KL}}(\mathbb{P}_X,\mathbb{P}_{\tilde{X}}) \end{align} where the sum runs over the $y$ with $\mathbb{P}[X \in \varphi^{-1}(y)] > 0$, and the inequality is Jensen's inequality applied to the convex function $-\ln$ and the conditional probability measure $\mathbb{P}_X[\,\cdot \mid X \in \varphi^{-1}(y)]$, with respect to which the normalized integral in the fourth line is an average.
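For what it's worth, the discrete bound is easy to check numerically. The following sketch (hypothetical strictly positive distributions, with $\varphi$ a coarse-graining map $x \mapsto \lfloor x/2 \rfloor$) confirms that the divergence of the push-forwards does not exceed the original one:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical strictly positive pmfs for P_X and P_X_tilde on {0, ..., 9},
# so the measures are mutually absolutely continuous, as assumed above.
p = rng.dirichlet(np.ones(10)) + 1e-3
q = rng.dirichlet(np.ones(10)) + 1e-3
p, q = p / p.sum(), q / q.sum()

# Deterministic map phi: coarse-grain {0, ..., 9} onto {0, ..., 4} by pairs,
# i.e. phi(x) = x // 2, so phi^{-1}(y) = {2y, 2y + 1}.
phi = np.arange(10) // 2
p_phi = np.zeros(5); np.add.at(p_phi, phi, p)   # push-forward of P_X
q_phi = np.zeros(5); np.add.at(q_phi, phi, q)   # push-forward of P_X_tilde

def kl(a, b):
    return np.sum(a * np.log(a / b))

# Data-processing inequality: the divergence after phi is <= the one before.
assert kl(p_phi, q_phi) <= kl(p, q)
print(kl(p_phi, q_phi), kl(p, q))
```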

However, how should we proceed in the general case, when $\varphi(X)$ and $\varphi(\tilde{X})$ are not discrete and we cannot split the integral into a sum over the fibers $\varphi^{-1}(y)$?
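As a sanity check in the continuous setting, one can at least verify the inequality on an example where everything is explicit. With hypothetical Gaussian laws and an affine (hence invertible) $\varphi$, the closed-form Gaussian KL formula $\mathcal{D}_{\text{KL}}(\mathcal{N}(\mu_1,\sigma_1^2),\mathcal{N}(\mu_2,\sigma_2^2)) = \ln(\sigma_2/\sigma_1) + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac12$ gives equality, consistent with the conjectured bound:

```python
import math

def kl_gauss(mu1, s1, mu2, s2):
    """Closed-form KL(N(mu1, s1^2) || N(mu2, s2^2))."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

# Hypothetical example: X ~ N(0, 1), X_tilde ~ N(1, 1), phi(x) = 2x + 3.
# Then phi(X) ~ N(3, 4) and phi(X_tilde) ~ N(5, 4).
before = kl_gauss(0.0, 1.0, 1.0, 1.0)   # D_KL(P_X, P_X_tilde)           = 0.5
after  = kl_gauss(3.0, 2.0, 5.0, 2.0)   # D_KL(P_{phi.X}, P_{phi.X_til}) = 0.5

# An invertible phi preserves the divergence, so the bound holds with equality.
assert after <= before + 1e-12
print(before, after)
```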

