Randomized functions cannot increase statistical distance

Question

In these lecture notes instructor Chris Peikert states the following lemma without a proof

Let $f$ be a (randomized) function on the domain of $X$, $Y$. We have

$\triangle (f(X), f(Y)) \leq \triangle(X, Y )$

where triangle denotes statistical distance between two random variables.

First question: How can I prove this?

Second question:

Assume $X$ and $Y$ are independent uniformly distributed random variables. Define randomized algorithm $f$ like this:

$f(x) := \text{return 1 if $X = x$ and 0 otherwise}$

However, $\triangle (X, Y)=0$, but $\triangle (f(X), f(Y))$ is not zero? What I am missing here?

score 1 · Answer 1 · answered Dec 27 '23 at 16:09

I think the problem is due to that, the randomized process you define is dependent of $X$. I think the following slightly modified version of your statement is correct:

Consider any two random variables $X$ and $Y$ distributed over some finite universal set $U$. Consider a randomized process $f(t, z)$, which is a (deterministic) function $$ f: U\times R\to U, $$ where $R$ is the set of all random seeds.

Let $Z$ be a random variable distributed over $R$ that is independent of $X$ and $Y$.

Then $$\Delta(f(X, Z); f(Y, Z))\leq \Delta(X; Y).$$

Note that letting the range of $f$ to be $U$ is without loss of generality. In fact, we can also let $R=U$ w.l.o.g..

Here is my proof:

For all $u\in U$ , let set $S_u\subseteq U\times R$ be $$ S_u\overset{\text{def}}{=}\{(t, z)\in U\times R: f(t, z) = u\}. $$ Let set $S_{u|z}$ be $$ S_{u|z} \overset{\text{def}}{=}\{t\in U:f(t, z)=u\}. $$ Note that for different $u_1\neq u_2$ , $S_{u_1|z}$ and $S_{u_2|z}$ are disjoint. Hence the disjoint union of all $S_{u|z}$ is a subset of $U$ , i.e. $$ \bigcup_{u\in U} S_{u|z}\subseteq U. $$ What we want to prove is $$ \begin{align}2\Delta(f(X, Z); f(Y, Z))&=\sum_{u\in U}\left|\Pr[f(X, Z)=u]-\Pr[f(Y, Z)=u]\right|\\ &=\sum_{u\in U} \left|\Pr[(X, Z)\in S_u]-\Pr[(Y, Z)\in S_u]\right|\\ &=\sum_{u\in U}\left|\sum_{z\in R}\Pr[Z=z]\cdot \left(\Pr[X\in S_{u|z}]-\Pr[Y\in S_{u|z}]\right)\right|\\ &\leq\sum_{u\in U}\sum_{z\in R}\Pr[Z=z]\cdot \left|\Pr[X\in S_{u|z}]-\Pr[Y\in S_{u|z}]\right|\\ &=\sum_{z\in R}\Pr[Z=z]\cdot\left(\sum_{u\in U} \left|\Pr[X\in S_{u|z}]-\Pr[Y\in S_{u|z}]\right|\right)\\ &\leq\sum_{z\in R}\Pr[Z=z]\cdot\left(\sum_{t\in U} \left|\Pr[X=t]-\Pr[Y=t]\right| \right)&(*)\\ &=2\Delta(X; Y). \end{align} $$

In step $(*)$, we take the union of all (disjoint) $S_{u|z}$, which is contained in $U$.

This finishes our proof.

Hope this helps!

score 1 · Accepted Answer · answered Jul 20 '16 at 23:11

Second question:

$\Delta$ is defined on distributions so the distributions of $f(X)$ and $f(Y)$ are the same, since $X,Y$ are both uniform. The supremum definition makes this clearer.

First question:

Unless the map is one to one, hence a bijective, when equality holds, the probability that $f(X)=f(Y)$ can only increase compared to the probability that $X=Y$. Prove this for binary distributions and then subpartition and use induction to conclude the general case by using conditional probabilities.

Randomized functions cannot increase statistical distance

2 Answers2