
Assuming that $x$ is a sequence of $l$ bits and $0 \le n < l$, let $R(x, n)$ denote the result of the left bitwise rotation of $x$ by $n$ bits. For example, if $x = 0100110001110000$, then $$\begin{array}{l} R(x,0) = {\rm{0100110001110000}},\\ R(x,1) = {\rm{1001100011100000}},\\ R(x,2) = {\rm{0011000111000001}},\\ \ldots \\ R(x,15) = {\rm{0010011000111000}}. \end{array}$$

Let $A \oplus B$ denote the result of the XOR operation for two sequences of $l$ bits. For example, $$0100110001110000 \oplus 1010010001000010 = 1110100000110010.$$

Let $H(x)$ denote the number of non-zero bits in $x$ (i.e. the Hamming weight of $x$).
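The three operations above can be sketched in Python on bit strings. This is only an illustrative sketch: the function names `rotl`, `xor` and `hamming_weight` are hypothetical and not part of the question.

```python
def rotl(x: str, n: int) -> str:
    """R(x, n): left-rotate the bit string x by n positions (0 <= n < len(x))."""
    return x[n:] + x[:n]


def xor(a: str, b: str) -> str:
    """A XOR B for two bit strings of the same length."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return "".join("1" if p != q else "0" for p, q in zip(a, b))


def hamming_weight(x: str) -> int:
    """H(x): number of non-zero bits in x."""
    return x.count("1")


x = "0100110001110000"
print(rotl(x, 1))                    # R(x, 1) from the example above
print(rotl(x, 15))                   # R(x, 15) from the example above
print(xor(x, "1010010001000010"))    # the XOR example above
print(hamming_weight(x))
```

Strings keep the leading zeros visible, which makes it easy to compare the output with the examples; an integer representation would be faster for large experiments.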

Assuming that $x$ and $y$ are two bitstrings of the same length $l$, let $f(x, y)$ denote the minimal element (the smallest number) in the tuple $$\begin{array}{l} (H(x \oplus y),\\ H(x \oplus R(y,1)),\\ H(x \oplus R(y,2)),\\ \ldots \\ H(x \oplus R(y,l - 1))). \end{array}$$
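As a minimal sketch of this definition, the function below computes $f(x, y)$ for words stored as Python integers. The name `f` follows the notation above; passing the word length `l` as an explicit third argument is my own assumption, needed because integers do not carry their bit length.

```python
def f(x: int, y: int, l: int) -> int:
    """Minimum of H(x XOR R(y, n)) over all rotations n = 0, ..., l-1.

    x and y are l-bit words stored as non-negative integers below 2**l.
    """
    mask = (1 << l) - 1
    best = l  # a Hamming weight of an l-bit word never exceeds l
    for n in range(l):
        # Left rotation of y by n bits within an l-bit word.
        rotated = ((y << n) | (y >> (l - n))) & mask
        best = min(best, bin(x ^ rotated).count("1"))
    return best


print(f(0b0011, 0b0101, 4))
print(f(0b0001, 0b1000, 4))
```

Note that $f(x, y) = 0$ exactly when $x$ is some rotation of $y$.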

Suppose that we have a TRNG which generates sequences of random bits. Generate a sequence of $L = k \times l$ bits. Split this sequence into $k$ words (so the length of each word is $l$): $w_0, w_1, \ldots, w_{k-1}$. Then compute the following tuple $T$ of numbers:

$$\begin{array}{l} (f({w_0},{w_1}),\\ f({w_0},{w_2}),\\ \ldots \\ f({w_0},{w_{k - 1}}),\\ f({w_1},{w_2}),\\ f({w_1},{w_3}),\\ \ldots \\ f({w_1},{w_{k - 1}}),\\ f({w_2},{w_3}),\\ \ldots \\ f({w_{k - 2}},{w_{k - 1}})). \end{array}$$

In other words, for every pair of words $(w_i, w_j)$ such that $i < j$, compute the corresponding $f({w_i},{w_j})$.
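The whole procedure can be sketched as follows. This is an illustration under stated assumptions: Python's seeded `random.Random` stands in for the TRNG (it is a PRNG, used here only to make the run reproducible), and the names `tuple_T`, `M_T` and `A_T` are hypothetical helpers mirroring the notation above.

```python
import random
from itertools import combinations


def f(x: int, y: int, l: int) -> int:
    """Minimum Hamming weight of x XOR R(y, n) over n = 0, ..., l-1."""
    mask = (1 << l) - 1
    return min(
        bin(x ^ (((y << n) | (y >> (l - n))) & mask)).count("1")
        for n in range(l)
    )


def tuple_T(words, l):
    """f(w_i, w_j) for all i < j, in the same order as the tuple T above."""
    return [f(a, b, l) for a, b in combinations(words, 2)]


k, l = 8, 16
rng = random.Random(12345)          # assumption: stand-in for the TRNG
bits = rng.getrandbits(k * l)       # one sequence of L = k * l bits
mask = (1 << l) - 1
# Split into k words of l bits each (w_0 is taken from the low-order bits).
words = [(bits >> (i * l)) & mask for i in range(k)]

T = tuple_T(words, l)
M_T = min(T)                        # minimal element of T
A_T = sum(T) / len(T)               # average element of T
print(len(T), M_T, A_T)
```

Repeating this for many independent sequences gives empirical distributions of $M_T$ and $A_T$ for a given $k$ and $l$.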

Question 1: given $k$ and $l$, how to compute the expected value of the minimal number $M_T$ in $T$?

Question 2: given $k$ and $l$, how to compute the expected value of the average number $A_T$ in $T$? Here the number $A_T$ is computed as follows: sum all elements of $T$, then divide the sum by the total number of elements in $T$.

The expected number here implies the number with the maximum probability. For example, the expected number of zero bits in a sequence of $l$ random bits is $l/2$.

lyrically wicked

1 Answer


Edit: TL;DR

Editing this since a closed form answer is not tractable but a computation is.

All those pairwise Hamming distances are modeled as samples from a binomial distribution, specifically $\textrm{Bin}(\ell,1/2).$ We then need to consider the expectation of the minimum (and of the average) of a set of binomial random variables, under the assumption that a TRNG is used.

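Question 1: See pages 68-69 of reference A, where the mean of the minimum is computed. Their $N$ is your $\ell$ and their $n$ is your total number of samples. Each of the $\binom{k}{2}$ entries of $T$ is itself a minimum over $\ell$ rotations, so $n = \ell\binom{k}{2}$ if the individual Hamming distances are treated as independent. You can then use the formula $E[M_T]=\mu_{1:n}=\sum_{x=0}^{N-1}(1-F(x))^n,$ where $F(x)$ is the cumulative binomial distribution function given in (4.4.1).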
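A minimal sketch of this computation, assuming i.i.d. $\textrm{Bin}(N,1/2)$ samples. The helper names `binom_cdf` and `expected_min` are hypothetical, and the sample count used in the example (one sample per rotation per pair of words) is a modeling assumption rather than something taken from the reference.

```python
from math import comb


def binom_cdf(u: int, N: int) -> float:
    """F(u) = Pr[X <= u] for X ~ Binomial(N, 1/2)."""
    return sum(comb(N, j) for j in range(u + 1)) / 2 ** N


def expected_min(N: int, n: int) -> float:
    """Mean of the minimum of n i.i.d. Binomial(N, 1/2) samples:
    sum over x = 0, ..., N-1 of (1 - F(x))**n."""
    return sum((1.0 - binom_cdf(x, N)) ** n for x in range(N))


k, l = 8, 16
# Assumption: each of the C(k, 2) entries of T is a minimum over l rotations,
# and all these Hamming distances are treated as independent samples.
n = l * comb(k, 2)
print(expected_min(l, n))
```

With `n = 1` the function returns the plain binomial mean $N/2$, which is a quick sanity check. For very large `n` the floating-point powers lose accuracy in the tail, so exact rational arithmetic (for example `fractions.Fraction`) may be preferable there.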

A. Barry C. Arnold, N. Balakrishnan, H. N. Nagaraja, A First Course in Order Statistics, Classics in Applied Mathematics, SIAM, 2008.

Question 2:

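By the law of large numbers, and the assumption that the sequences are generated by a TRNG, the average $A_T$ will concentrate very rapidly, with very high probability, around the mean of a single entry $f(w_i,w_j)$. Treating the $\ell$ Hamming distances behind each entry as independent $\textrm{Bin}(\ell,1/2)$ samples, this mean is $\mu_{1:\ell}=\sum_{x=0}^{\ell-1}(1-F(x))^{\ell}$, which lies below $\ell/2$, the mean of a single Hamming distance before the minimum over rotations is taken.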

Original Answer:

Since you say the input strings are outputs of a TRNG, I shall take each of the terms as uniformly distributed random vectors.

You define
$$f(x, y)=\min\{H(x \oplus y),H(x \oplus R(y,1)),\ldots,H(x \oplus R(y,\ell-1))\} $$ which I shall model as the minimum Hamming weight of a set of $\ell$ independently and uniformly chosen binary $\ell$-tuples (one per rotation).

Each Hamming weight in this set is distributed as $\textsf{Binomial}(\ell,1/2).$ So we have the minimum of $\ell$ unbiased binomial samples on $\{0,1,\ldots,\ell\}$. The minimum is also called the first order statistic. Letting $F(u)=\mathbb{Pr}[X\leq u]$ where $X$ is binomially distributed as above (you are already using $x$ as a variable, hence the $u$), we have $$ \mathbb{Pr}[f(x,y)\leq u]=1-(1-F(u))^{\ell}. $$

If the randomness hypothesis holds, then your next step looks at all $\binom{k}{2}$ pairs of words and takes the minimum of these minima, so in effect you are looking at the minimum of a collection of $$\binom{k}{2}\ell$$ binomial samples. This is $O(k^2\ell)$ samples and the minimum will decrease toward zero as $k$ grows, if the randomness hypothesis above holds, so you'd want to consider $$ \mathbb{Pr}[M_T\leq u]=1-\left[(1-F(u))^{\ell}\right]^{\binom{k}{2}}=1-(1-F(u))^{\ell\binom{k}{2}}. $$
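Since the question asks for the value with maximum probability, it may help to tabulate the whole distribution of the minimum rather than only its mean. The sketch below uses the standard fact that the minimum of $n$ i.i.d. samples satisfies $\mathbb{Pr}[\min > u] = (1-F(u))^n$. The helper name `min_pmf` and the choice of `n` in the example are assumptions made for illustration.

```python
from math import comb


def min_pmf(l: int, n: int) -> list:
    """Pr[min = u] for u = 0, ..., l, where min is the minimum of
    n i.i.d. Binomial(l, 1/2) samples."""
    cum, cdf = 0, []
    for j in range(l + 1):
        cum += comb(l, j)
        cdf.append(cum / 2 ** l)                    # F(j)
    # ge[u] = Pr[min >= u] = (1 - F(u - 1))**n, with Pr[min >= 0] = 1.
    ge = [1.0] + [(1.0 - F) ** n for F in cdf]
    return [ge[u] - ge[u + 1] for u in range(l + 1)]


k, l = 8, 16
n = l * comb(k, 2)   # assumption: l rotations for each of the C(k, 2) pairs
pmf = min_pmf(l, n)
mode = max(range(l + 1), key=pmf.__getitem__)       # most probable minimum
mean = sum(u * p for u, p in enumerate(pmf))
print(mode, mean)
```

The mode and the mean generally differ, so it matters which of the two is meant by "expected".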

Based on this my tentative answers are:

Question 1: given $k$ and $l$, how to compute the expected value of the minimal number $M_T$ in $T$?

This will be close to zero for any sizable $k$ compared to $\ell.$ You might want to use the Gaussian approximation to the binomial and use the Gaussian cumulative distribution function to evaluate an estimate.
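A rough sketch of the Gaussian approximation mentioned here, using only the standard library. The continuity correction of $+0.5$ is my own addition, and the names `exact_cdf` and `normal_cdf` are hypothetical.

```python
from math import comb, erf, sqrt


def exact_cdf(u: int, l: int) -> float:
    """Exact F(u) = Pr[X <= u] for X ~ Binomial(l, 1/2)."""
    return sum(comb(l, j) for j in range(u + 1)) / 2 ** l


def normal_cdf(u: int, l: int) -> float:
    """Gaussian approximation to F(u): mean l/2, standard deviation sqrt(l)/2,
    with a +0.5 continuity correction (an assumption of this sketch)."""
    mean, sd = l / 2, sqrt(l) / 2
    z = (u + 0.5 - mean) / sd
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


l = 64
for u in (16, 24, 32):
    print(u, exact_cdf(u, l), normal_cdf(u, l))
```

The approximation is good near the centre of the distribution, but its relative error grows in the far lower tail, which is the region that determines the minimum when the number of samples is large. Comparing against `exact_cdf` for the parameters of interest is advisable before relying on it.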

Question 2: given $k$ and $l$, how to compute the expected value of the average number $A_T$ in $T$? Here the number $A_T$ is computed as follows: sum all elements of $T$, then divide the sum by the total number of elements in $T$.

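This is now the average of the individual order statistics instead of the minimum of the minima. It will very likely be comparable to the mean of a single $f(w_i,w_j)$, that is, the mean of the first order statistic whose distribution function $\mathbb{Pr}[f(x,y)\leq u]$ is given above.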

Remark: If you want to directly use approximations to the binomial you can see my answer to the following mathoverflow question. Note that for the binomial $\textsf{Binomial}(\ell,1/2),$ and for any $u \in \{0,1,\ldots,\ell\}$ we have $$ 2^{-\ell}\sum_{j\leq u-1} \binom{\ell}{j}\leq F(u)\leq 2^{-\ell}\sum_{j\leq u} \binom{\ell}{j} $$

kodlu