
Let $X \in \{0,1\}^{2n}$ be a uniformly distributed random string and let $Y \in \{0,1\}^{2n}$ be such that $H(Y) = n$. Does this imply that $\Pr(X=Y) = 2^{-n}$? If not, is this probability necessarily negligible in $n$?

I've only been able to show that $H(X|Y) \geq n$, but I'm not sure what this implies about the probability...

I am not really used to using the concept of Shannon entropy, so forgive me if this is a bad question.

3 Answers


With much effort, one can get a bound, though I am unclear how tight it is. The bound is non-standard. There is also evidence that replacing your assumption on $Y$ would yield easier bounds. I'll separate this answer into three parts:

  1. A non-standard bound of questionable utility,
  2. an attempt at a standard argument (that does not work, and leads to questioning your assumption on $Y$), and
  3. showing that modifying your assumption on $Y$ yields a simple standard bound that seems relatively good.

Nonstandard Bound

Using Theorem 3 of this, we get that

$$ |H(X)-H(Y)| \leq 2n\Delta(X,Y) + h(\Delta(X,Y)). $$

Here, $\Delta(X,Y)$ is the total variation distance (defined in the next part), and $h(x)$ is the binary entropy function, which satisfies the bound $h(x) \leq 2\sqrt{x(1-x)} \leq 2\sqrt{x}$. Using that $H(X)-H(Y) = 2n - H(Y)\geq 0$, we can (with much effort) solve the resulting quadratic inequality in $\sqrt{\Delta(X,Y)}$. I get the bound

$$\left(\sqrt{1-\frac{H(Y)}{2n}}-\frac{1}{2n}\right)^2 \leq \Delta(X,Y) \leq 1 - \Pr[X = Y],$$

where the second inequality is the coupling characterization of the total variation distance discussed in the next part.
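
For the record, here is my reconstruction of the quadratic step (a sketch, so worth double-checking). Write $t := \sqrt{\Delta(X,Y)}$ and use $h(x)\leq 2\sqrt{x}$; Theorem 3 then gives $2n - H(Y) \leq 2nt^2 + 2t$. Since $t\geq 0$, $t$ must be at least the positive root of $2ns^2+2s-(2n-H(Y))=0$, i.e.

$$ t \geq \frac{-1+\sqrt{1+2n\left(2n-H(Y)\right)}}{2n} \geq \frac{\sqrt{2n\left(2n-H(Y)\right)}-1}{2n} = \sqrt{1-\frac{H(Y)}{2n}}-\frac{1}{2n}, $$

and squaring (once the right-hand side is nonnegative) gives the bound on $\Delta(X,Y)$ above.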

Plugging in $H(Y) = n$, we get that

$$ \Pr[X = Y] \leq \frac{1}{2} + \frac{1}{\sqrt{2}n}-\frac{1}{4n^2} $$

It is unclear to me how useful this bound is (due to the constant term $1/2$), but it is a bound in the direction you want (upper) on your quantity of interest ($\Pr[X = Y]$) in terms of your assumptions ($X$ uniform, $H(Y) = n$). Note here that we actually need the assumption that $H(Y) \leq n$ for things to go through.
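
As a quick sanity check of this chain of inequalities (a sketch of my own; the particular $Y$ below, uniform on the $2^n$ strings whose first $n$ bits are zero, is just a convenient test case with $H(Y) = n$):

```python
# Sanity check of the "nonstandard bound" for small n.  Test case: Y is
# uniform on the 2^n strings of {0,1}^(2n) whose first n bits are zero, so
# H(Y) = n exactly; X is uniform on all 2^(2n) strings.
from math import log2, sqrt

def h(x):
    """Binary entropy function."""
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

for n in [2, 4, 8, 16]:
    N = 2 ** (2 * n)          # alphabet size
    S = 2 ** n                # size of Y's support
    H_Y = log2(S)             # = n
    # total variation distance between this Y and the uniform X
    tv = 0.5 * (S * (1 / S - 1 / N) + (N - S) * (1 / N))   # = 1 - 2^(-n)
    # Theorem-3-style inequality: |H(X) - H(Y)| <= 2n*tv + h(tv)
    assert 2 * n - H_Y <= 2 * n * tv + h(tv) + 1e-9
    # derived lower bound on the total variation distance
    assert (sqrt(1 - H_Y / (2 * n)) - 1 / (2 * n)) ** 2 <= tv + 1e-9
    # even the best coupling (Pr[X = Y] = 1 - tv) respects the final bound
    assert 1 - tv <= 0.5 + 1 / (sqrt(2) * n) - 1 / (4 * n ** 2) + 1e-9
print("all checks passed")
```

For this particular $Y$ the true distance is $1 - 2^{-n}$, so the lower bound of roughly $1/2$ is far from tight here, consistent with the caveat above about the constant term.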


A standard bound that ends up in the "wrong direction"

The KL Divergence is defined to be

$$ \mathsf{KL}(P||Q) = \sum_x P(x)\log\frac{P(x)}{Q(x)} $$

Note that for $X$ uniform, we have that

$$\mathsf{KL}(Y||X) = \sum_x Y(x)\log Y(x) - \sum_x Y(x) \log \frac{1}{2^{2n}} = \sum_x Y(x)\log Y(x) +2n = 2n-H(Y)$$
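
For concreteness, this identity is easy to check numerically (a quick sketch; the random $Y$ below is an arbitrary test distribution, and the KL divergence is taken in bits to match the entropies):

```python
# Quick numerical check of KL(Y || X) = 2n - H(Y) for X uniform on {0,1}^(2n).
import random
from math import log2

n = 3
N = 2 ** (2 * n)

weights = [random.random() for _ in range(N)]
total = sum(weights)
Y = [w / total for w in weights]             # an arbitrary distribution on the alphabet

H_Y = -sum(p * log2(p) for p in Y if p > 0)
KL_Y_X = sum(p * log2(p * N) for p in Y if p > 0)   # X(x) = 1/N for every x

print(abs(KL_Y_X - (2 * n - H_Y)) < 1e-9)    # True
```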

The Bretagnolle–Huber inequality (which I will loosely refer to as a "Pinsker-type inequality" later; both do conceptually similar things, Pinsker's inequality is better known, but Bretagnolle–Huber is tighter when the KL divergence is large, as it is here) states that, for $\Delta(X, Y) := \frac{1}{2}\lVert X-Y\rVert_1$ the total variation distance (in cryptography this is equivalently known as the distinguishing advantage),

$$ \Delta(X, Y)\leq \sqrt{1-\exp(-\mathsf{KL}(Y||X))} \leq 1- \frac{1}{2}\exp(-\mathsf{KL}(Y||X)). $$
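
A quick numerical spot-check of this chain (a sketch; the $Y$'s are arbitrary random test distributions, and the KL divergence is taken in nats since the inequality is stated with $\exp$):

```python
# Spot-check of the Bretagnolle-Huber chain for X uniform and random Y.
import random
from math import exp, log, sqrt

n = 3
N = 2 ** (2 * n)

for _ in range(5):
    weights = [random.random() for _ in range(N)]
    total = sum(weights)
    Y = [w / total for w in weights]
    tv = 0.5 * sum(abs(p - 1 / N) for p in Y)
    kl = sum(p * log(p * N) for p in Y if p > 0)     # KL(Y || X) in nats
    assert tv <= sqrt(1 - exp(-kl)) + 1e-9
    assert sqrt(1 - exp(-kl)) <= 1 - 0.5 * exp(-kl)
print("ok")
```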

The total variation distance has what is known as its coupling characterization. A coupling $\Gamma$ of random variables $X$ on $\Omega_x$ and $Y$ on $\Omega_y$ is a joint distribution $\Gamma = (X',Y')$ on $\Omega_x\times \Omega_y$ whose marginals are the distributions of $X$ and $Y$. One coupling is the independent coupling

$$\Pr_{(X', Y')}(x,y) = \Pr_X(x)\Pr_Y(y)$$

but you can also have couplings where $X', Y'$ are dependent.

Anyway, the coupling characterization is that

$$\Delta(X,Y) = \inf_\Gamma \Pr_{\Gamma = (X', Y')}[X'\neq Y'].$$
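
To make this concrete, here is a small numerical sketch of the standard "maximal coupling" construction, which attains the infimum (the example distributions are arbitrary):

```python
# A coupling that attains the infimum: the "maximal coupling" of two
# distributions p and q, for which Pr[X' != Y'] equals the TV distance.
import numpy as np

rng = np.random.default_rng(0)
N = 16
p = rng.random(N); p /= p.sum()
q = rng.random(N); q /= q.sum()

tv = 0.5 * np.abs(p - q).sum()

# put min(p_i, q_i) mass on the diagonal, then spread the leftover mass
# off-diagonal proportionally to the positive parts of (p - q) and (q - p)
common = np.minimum(p, q)
joint = np.diag(common)
if tv > 0:
    joint += np.outer(p - common, q - common) / tv   # each excess sums to tv

# the marginals are p and q, and the mismatch probability equals tv
assert np.allclose(joint.sum(axis=1), p) and np.allclose(joint.sum(axis=0), q)
print(np.isclose(joint.sum() - np.trace(joint), tv))   # True
```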

By the "Pinsker-type" (Bretagnolle–Huber) inequality and our previous computation of the KL divergence (which was in bits, so $\exp(-\mathsf{KL})$ becomes $2^{-\mathsf{KL}}$ here), we get that

$$\inf_\Gamma \Pr_{\Gamma = (X', Y')}[X'\neq Y'] \leq 1-\frac{1}{2}\cdot 2^{H(Y)-2n}.$$

This is a fairly natural argument, but the inequalities are "pointing in the wrong direction". By this, I mean that we only get a lower bound on $\Pr[X = Y]$ (and even then only under the best coupling of $X$ and $Y$, not necessarily their actual joint distribution). One can generally reverse the inequalities in the above, but it typically comes at a high cost. The inequality to try to reverse is the "Pinsker-type" inequality (in the above, the Bretagnolle–Huber inequality). You can see an example of a reverse Pinsker inequality here, though such results typically require strong assumptions on the particular distributions of $X, Y$, so I won't assume those assumptions are reasonable for you or work out the resulting bound.


Replacing your assumption on $Y$

Alternatively, you can change your assumption on $Y$. A bound purely in terms of Shannon entropy is quite uncommon in cryptography. More typical is a min-entropy bound, or an average min-entropy bound. See here.

For example, if we define

$$ \tilde{H}_\infty(Y\mid X) = -\log \mathbb{E}_X[\max_y\Pr[Y = y\mid X = x]]$$

to be the average min-entropy, and note that

$$2^{-\tilde{H}_\infty(Y\mid X)} = \mathbb{E}_X[\max_y\Pr[Y = y\mid X = x]]$$

then it is simple to get the bound

\begin{align*} \Pr[Y = X] &= \sum_x \Pr[Y = x\mid X = x]\Pr[ X = x] \\ &= \mathbb{E}_X[\Pr[Y= x\mid X = x]]\\ &\leq \mathbb{E}_X[\max_y\Pr[Y = y\mid X = x]] \\ &= 2^{-\tilde{H}_\infty(Y\mid X)}. \end{align*}

This bound is symmetric in $X$ and $Y$ in the sense that the same argument works with the roles swapped, so you could replace your assumption that $H(Y) = n$ with the assumption that $\tilde{H}_\infty(Y\mid X) \geq n$, that $\tilde{H}_\infty(X\mid Y)\geq n$, or with something that implies one of these two.
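
Here is a small numerical check of this bound (a sketch; the random joint distribution and the tiny alphabet are arbitrary test choices):

```python
# Check of Pr[Y = X] <= 2^(-average min-entropy of Y given X) on an arbitrary
# random joint distribution over a small alphabet.
import numpy as np

rng = np.random.default_rng(1)
N = 8
joint = rng.random((N, N))
joint /= joint.sum()                         # joint[x, y] = Pr[X = x, Y = y]

p_x = joint.sum(axis=1)                      # marginal of X
cond = joint / p_x[:, None]                  # cond[x, y] = Pr[Y = y | X = x]

pr_equal = np.trace(joint)                   # Pr[X = Y]
avg_guess = (p_x * cond.max(axis=1)).sum()   # E_X[ max_y Pr[Y = y | X = x] ]
avg_min_entropy = -np.log2(avg_guess)

print(pr_equal <= 2.0 ** (-avg_min_entropy) + 1e-12)   # True
```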

Mark Schultz-Wu

You can get a very loose lower bound on the complementary probability via Fano's inequality: $$ H(X|Y) \leq H_2(e)+P(X\neq Y) \log\left(|{\cal X}|-1\right), $$ where $e$ is the binary indicator of the event $X\neq Y$, $H_2$ is the binary entropy, and ${\cal X} = \{0,1\}^{2n}$ is the alphabet; thus (using your lower bound) $$ n\leq H(X|Y) \leq H_2(e)+P(X\neq Y) \log(2^{2n}-1)\leq 1 + 2n P(X\neq Y), $$ yielding $$P(X\neq Y)\geq \frac{1}{2}-\frac{1}{2n}.$$
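
For what it's worth, Fano's inequality itself is easy to spot-check numerically (a sketch; the alphabet size $16$ and the random joint distribution are arbitrary test choices):

```python
# Spot-check of Fano's inequality
#   H(X|Y) <= H_2(P_e) + P_e * log2(|alphabet| - 1),  P_e = Pr[X != Y],
# on an arbitrary random joint distribution over a small alphabet.
import numpy as np

rng = np.random.default_rng(2)
N = 16
joint = rng.random((N, N)) + 1e-3            # strictly positive entries
joint /= joint.sum()                         # joint[x, y] = Pr[X = x, Y = y]

p_y = joint.sum(axis=0)
cond = joint / p_y[None, :]                  # cond[x, y] = Pr[X = x | Y = y]
H_X_given_Y = -(joint * np.log2(cond)).sum()

p_err = 1.0 - np.trace(joint)                # Pr[X != Y]
H2 = lambda t: -t * np.log2(t) - (1 - t) * np.log2(1 - t)

print(H_X_given_Y <= H2(p_err) + p_err * np.log2(N - 1) + 1e-9)   # True
```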

Edit: As far as I remember, the relationship between conditional Rényi entropy and conditional Shannon entropy is complicated, precluding a simple bound without further information on the joint distribution of $X$ and $Y$.

kodlu

No. It's not necessarily $1/2^n$, and it's not necessarily negligible in $n$. Consider the joint distribution on $X,Y$ induced by the following random process:

  • Pick $X=(X_1,\dots,X_{2n})$ uniformly at random from $\{0,1\}^{2n}$.
  • If $X_1=0$, let $Y=0^{2n}$ (the all-zero string).
  • If $X_1=1$, let $Y_1:=1$, $Y_2:=1$ and $Y_i: =X_i$ for $i=3,4,\dots,2n$.

Notice that $\Pr[X=Y] \geq 1/4$, since if $X_1=1$ and $X_2=1$, then $X=Y$. (In fact $\Pr[X=Y] = 1/4 + 2^{-2n}$, counting the case $X=0^{2n}$ as well.)

Also, you can verify that $H(Y)=n$ by a little calculation. Details below.

$$\begin{align*} H(Y) &= -\Pr[Y=0] \lg \Pr[Y=0]\\ &\phantom{= {}}\, -\sum_{u \in \{0,1\}^{2n-2}} \Pr[Y=11u] \lg \Pr[Y=11u]\\ &= -\frac{1}{2} \cdot (-1) - 2^{2n-2} {1 \over 2^{2n-1}} \cdot (-(2n-1))\\ &= \frac{1}{2} + \frac{2n-1}{2}\\ &= n. \end{align*}$$

Please check my calculation to see if it is correct or not. I am not confident in my answer.
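
For small $n$ the calculation (and the value of $\Pr[X=Y]$) can also be checked by brute force; here is a quick sketch with the arbitrary choice $n=3$:

```python
# Brute-force check of H(Y) and Pr[X = Y] for the construction above, for a
# small (arbitrary) choice of n.
from collections import Counter
from itertools import product
from math import log2

n = 3
y_dist = Counter()
pr_equal = 0.0

for x in product([0, 1], repeat=2 * n):      # X uniform over {0,1}^(2n)
    if x[0] == 0:
        y = (0,) * (2 * n)                   # Y = 0...0
    else:
        y = (1, 1) + x[2:]                   # Y = 1 1 X_3 ... X_{2n}
    y_dist[y] += 1 / 2 ** (2 * n)
    if x == y:
        pr_equal += 1 / 2 ** (2 * n)

H_Y = -sum(p * log2(p) for p in y_dist.values())
print(H_Y)        # 3.0, i.e. n
print(pr_equal)   # 0.265625, i.e. 1/4 + 2^(-2n)
```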

D.W.