
Suppose I sample a matrix $h$ from $\mathbb{Z}_2^{l \times n}$ where each entry of $h$ is $1$ with probability $1/2$. Suppose I also have a set $S \subset \{0,1\}^n$, and I define a random variable $X$ uniform over $S$, i.e. $$\mathrm{Pr}\left[X = x\right] = 1/|S| \quad \text{for every } x \in S.$$ The leftover hash lemma (Lemma 2.1 of *Fuzzy Extractors: How to Generate Strong Keys from Biometrics and Other Noisy Data* by Yevgeniy Dodis, Rafail Ostrovsky, Leonid Reyzin, and Adam Smith) states that if $$l \leq m - 2 \log_2(1/\epsilon) + 2,$$ where $m$ is the min-entropy of $X$ (in this case, $m = \log_2|S|$), then the statistical distance between the distribution of $h(X)$ and the uniform distribution over $\{0, 1\}^l$ is at most $\epsilon$.
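To make the setup concrete, here is a small experiment (the parameters $n$, $l$, $m$ and the set $S$ are toy choices of mine) that measures this statistical distance for a single sampled $h$:

```python
import random

# toy parameters (my choice): n-bit inputs, l-bit outputs, |S| = 2**m
n, l, m = 12, 4, 8
S = random.sample(range(2 ** n), 2 ** m)

# sample h: each of the l rows is a uniformly random n-bit vector,
# i.e. every entry of the l x n matrix is 1 with probability 1/2
h = [random.getrandbits(n) for _ in range(l)]

def apply_h(x):
    """Compute h(x): the i-th output bit is the GF(2) inner product <h_i, x>."""
    return sum((bin(row & x).count("1") % 2) << i for i, row in enumerate(h))

# distribution of h(X) when X is uniform over S
counts = [0] * (2 ** l)
for x in S:
    counts[apply_h(x)] += 1

sd = 0.5 * sum(abs(c / len(S) - 2 ** -l) for c in counts)
print(f"SD(h(X), uniform) = {sd:.4f}")  # the lemma's bound for these parameters is eps = 2**-3
```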

Given that I sample $h$ as above, am I always guaranteed to have at most this statistical distance from uniform? I ask because I've been reading a textbook that says something that seems to be in tension with this. Specifically, Theorem D.5 of *Computational Complexity: A Conceptual Perspective* by Oded Goldreich says that a fraction of the hash functions won't satisfy the guarantee. Am I missing something here? Does the leftover hash lemma not apply to specific hash functions chosen in the manner above, but only on average?

This answer seems relevant, and it makes me suspect that I'm misunderstanding which distribution the statistical distance is measured on. In my mind, the leftover hash lemma guarantees that the new variable $h(X)$ is within that statistical distance of the uniform distribution over $l$ bits. Is that not the case?

Germ

2 Answers


> Suppose I sample a matrix $h$ from $\mathbb{Z}_2^{l \times n}$ where each entry of $h$ is $1$ with probability $1/2$.

One such matrix in this distribution is the all-zero matrix.

> Given that I sample $h$ as above, am I always guaranteed to have at most this statistical distance from uniform?

What happens if what you sample for $h$ is the all-zero matrix?

  • What's the conditional distribution on $h(X)$ given that $h$ is the all-zero matrix?

  • What's the statistical distance of $h(X)$ from uniform given that $h$ is the all-zero matrix?
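Spelling these two questions out (my notation, writing $U_l$ for the uniform distribution on $\{0,1\}^l$): if $h$ is the all-zero matrix, then $h(X) = 0^l$ with probability $1$, so $$\mathrm{SD}\big(h(X),\,U_l\big) = \frac{1}{2}\Big[\big(1 - 2^{-l}\big) + \big(2^l - 1\big)\cdot 2^{-l}\Big] = 1 - 2^{-l},$$ which is as far from uniform as a point distribution can be.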

> I ask because I've been reading a textbook that says something that seems to be in tension with this. Specifically, Theorem D.5 of *Computational Complexity: A Conceptual Perspective* by Oded Goldreich says that a fraction of the hash functions won't satisfy the guarantee. Am I missing something here? Does the leftover hash lemma not apply to specific hash functions chosen in the manner above, but only on average?

The point is that the leftover hash lemma is a statement about the joint distribution over hash functions and inputs: it bounds the statistical distance between $(h, h(X))$ and $(h, U_l)$, which equals the *average* over $h$ of the distance of $h(X)$ from uniform. The probability of sampling the all-zero matrix (or other similarly pathological cases) is small enough in this joint distribution that the average stays within the bound, even though no guarantee holds for any one fixed $h$.
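A quick way to see this numerically, as a sketch with toy parameters of my own choosing (not a proof): sample many matrices $h$, compute the distance of $h(X)$ from uniform for each, and compare the average against the lemma's bound.

```python
import random

n, l, m = 10, 3, 6                       # toy sizes: n-bit inputs, l-bit outputs, |S| = 2**m
S = random.sample(range(2 ** n), 2 ** m)
# rearranging l <= m - 2*log2(1/eps) + 2 gives eps >= 2**(-(m - l + 2)/2)
eps_bound = 2 ** (-(m - l + 2) / 2)

def sd_from_uniform(h):
    """Statistical distance of h(X) from uniform over {0,1}^l, for a fixed h."""
    counts = [0] * (2 ** l)
    for x in S:
        y = 0
        for i, row in enumerate(h):
            y |= (bin(row & x).count("1") % 2) << i  # <row, x> over GF(2)
        counts[y] += 1
    return 0.5 * sum(abs(c / len(S) - 2 ** -l) for c in counts)

trials = 2000
dists = [sd_from_uniform([random.getrandbits(n) for _ in range(l)])
         for _ in range(trials)]

print(f"average SD over h: {sum(dists) / len(dists):.4f}  (lemma's bound: {eps_bound:.4f})")
print(f"worst sampled h:   {max(dists):.4f}  (a single h may exceed the bound)")
```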


It's a tricky question in that there may be two (not necessarily equal) answers: a theoretical one and a practical one.

Practically, and somewhat circuitously, we can go with NIST's recommendations. They posit that "full entropy" output is output whose statistical distance from uniform satisfies $\epsilon \le 2^{-64}$.

They also state that the following table of functions is acceptable for generating full entropy from an input:

[Table of vetted conditioning functions omitted; it lists various hash- and cipher-based constructions.]

It's a menagerie of hashes and ciphers. We can assume that your $\mathbb{Z}_2^{l \times n}$ matrices, and even Toeplitz matrices, would be included, as the latter are currently used in commercial TRNGs calibrated with the same leftover hash lemma.
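For illustration, here is a minimal sketch of Toeplitz-matrix extraction of that flavour (the parameters and layout are my own assumptions, not taken from any particular TRNG design):

```python
import random

n, l = 16, 4  # toy sizes: n raw input bits -> l extracted bits

# A binary Toeplitz matrix is constant along each diagonal, so it is fully
# described by n + l - 1 random bits rather than n * l, which is one reason
# hardware designs favour it.
diag = [random.getrandbits(1) for _ in range(n + l - 1)]

def toeplitz_hash(x_bits):
    """h[i][j] = diag[i - j + n - 1]; matrix-vector product over GF(2)."""
    return [sum(diag[i - j + n - 1] & x_bits[j] for j in range(n)) % 2
            for i in range(l)]

x = [random.getrandbits(1) for _ in range(n)]  # raw noise-source bits
print(toeplitz_hash(x))  # l extracted bits
```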

So my answer, in short, is that yes: empirically, commercially, and standards-wise, the lemma appears to hold for any decent (though not necessarily cryptographic) hash.

I'll look forward to a theoretical answer.


You'll find that if you use AES as a conditioner with its 128-bit block size and a 256-bit input, you get exactly $\epsilon = 2^{-64}$. That's the origin of the "losing one bit of entropy after hashing" meme.
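For reference, that figure is the lemma's bound rearranged in the common form without the $+2$ term (i.e. $l = m - 2\log_2(1/\epsilon)$): with min-entropy $m = 256$ and output length $l = 128$, $$\epsilon = 2^{-(m - l)/2} = 2^{-(256 - 128)/2} = 2^{-64}.$$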

Paul Uszak