
Suppose Alice wants to send a message to Bob. They agree on an alphabet of $n$ letters, $\Omega = \{a_1, \cdots, a_n\}$, and on a shared secret $\omega = \omega_1 \cdots \omega_m$, with $\omega_i \in \Omega$ for all $i$, agreed upon before any message is sent. Unfortunately, Bob forgets the encoding of $\Omega$ and is only able to decode messages over a permuted alphabet $\sigma(\Omega) = \{\sigma(a_1), \cdots, \sigma(a_n)\}$, where $\sigma$ is a permutation of $\Omega$. If, e.g., $\Omega$ is the set of bits, this means that $0$ and $1$ may or may not be flipped.

Suppose Alice wants to prove her identity to Bob, so she sends the shared secret $\omega$; Bob then decodes the message $m := \sigma(\omega) = \sigma(\omega_1) \cdots \sigma(\omega_m)$. Bob accepts the message iff there exists a permutation $\tau$ s.t. $\tau(m) = \omega$.

Now suppose that the channel over which Bob receives messages is noisy. Bob wants to know how close a message is to the secret, so he computes $$ d_H^S (m,\omega) := \min_{\tau} d_H (\tau(m), \omega) = \min_{\tau} \sum_i d_H(\tau(m_i), \omega_i) $$ and accepts the message iff $d_H^S (m,\omega)$ is smaller than some threshold $C$. As an example, if $\Omega=\{0,1\}$, then $d_H^S(00,11) = 0$ (bits are flipped) while $d_H^S(00,10) = 1$ (whether or not bits are flipped, the distance is one).
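For concreteness, here is a minimal brute-force sketch of this distance in Python (function and argument names are mine, not from the question); it tries all $n!$ relabelings $\tau$, so it is only feasible for small alphabets:

```python
from itertools import permutations

def d_hs(msg, secret, alphabet):
    """Permutation-invariant Hamming distance: minimize the plain
    Hamming distance over all relabelings tau of the alphabet."""
    best = len(secret)
    for image in permutations(alphabet):
        tau = dict(zip(alphabet, image))
        dist = sum(tau[x] != y for x, y in zip(msg, secret))
        best = min(best, dist)
    return best

# The examples from the text, over the alphabet {0, 1}:
print(d_hs("00", "11", "01"))  # 0: flipping the bits matches exactly
print(d_hs("00", "10", "01"))  # 1: flipped or not, one position mismatches
```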

Now suppose Charlie wants to impersonate Alice by sending random messages to Bob. To choose the threshold $C$, a natural question arises:

Question: What is the average value of $d_H^S (a,b)$ between two random words $a$ and $b$ of $m$ letters each?

SRichoux
  • Interesting problem. For what it's worth, the minimization (for a given pair) can be regarded as a https://en.wikipedia.org/wiki/Assignment_problem and can be performed by the https://en.wikipedia.org/wiki/Hungarian_algorithm – leonbloy May 05 '22 at 17:52
  • There is an error in the "answer" reasoning: the result for Gaussians that is applied is valid for LARGE $n$. It is totally incorrect to apply it for small $n=2$ or $n=3$ as here. – Mikhail Lifshits Jun 21 '22 at 08:10

1 Answer


Here's a very rough (perhaps useless) approximation.

We have two words $a$, $b$, each of length $m$, over the alphabet $\{0, 1, \dots, n-1\}$.

Let's define the $n \times n$ matrix $C$ such that $c_{i,j} = \sum_{k=1}^m [a_k=i][b_k=j]$, i.e., the entry $c_{i,j}$ counts the pairs $(a_k,b_k) = (i,j)$, or equivalently the matches produced by mapping $j \to i$ in the word $b$. We have $\sum_{i,j} c_{i,j}=m$.

Our task is to choose a permutation $P$, which corresponds to selecting $n$ elements from the matrix $C$ such that no two elements share a row or column, maximizing the sum of those elements: $s=\max_P \sum_{(i,j) \in P} c_{i,j}$. The corresponding minimal distance is then $d = m - s$.

This is known as an assignment problem.
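Assuming SciPy is available, the whole computation (building $C$ and solving the assignment) can be sketched as follows; `min_distance` is a hypothetical name, and `scipy.optimize.linear_sum_assignment` with `maximize=True` solves exactly this maximization:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def min_distance(a, b, n):
    """a, b: equal-length int sequences over {0, ..., n-1}.
    Returns d = m - s, where s is the maximal number of matches
    over all relabelings of b (an assignment problem on C)."""
    m = len(a)
    C = np.zeros((n, n), dtype=int)
    # c[i, j] = #{k : a_k = i, b_k = j}
    np.add.at(C, (np.asarray(a), np.asarray(b)), 1)
    rows, cols = linear_sum_assignment(C, maximize=True)
    return m - C[rows, cols].sum()
```

For instance, `min_distance([0, 0], [1, 1], 2)` returns `0`, matching the flipped-bits example above.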

In our setting, $a,b$ are assumed to be random; the elements $c_{i,j}$ follow a multinomial distribution of dimension $n^2$, with $E[c_{i,j}]=m/n^2$. Of course, they are not independent.

First approximation: for large $n$, the multinomial should be well approximated by $n^2$ iid Poisson variables with $\lambda = m/n^2$ (Poissonization).

I didn't find any result about this setting, but I found this one for Gaussian iid variables. There it's stated that the expectation of the maximum is asymptotically equivalent to

$$s_0 = \sqrt{2 n \log(n!)}$$

for standard Gaussians. In general, for $N(\mu,\sigma^2)$ variables, this translates to $s_0 = \sqrt{2 n \log(n!)}\, \sigma + n\mu$.

Let's make our second (quite bold) approximation: our Poisson rvs might be approximated by Gaussians with $\mu=\sigma^2=m/n^2$. With that, one gets:

$$\frac{E[d]}{m} \approx 1 - \frac{1}{n} - \sqrt{\frac{ 2\log(n!)}{m \, n^2}} \tag 1$$

It's doubtful that this yields a valid approximation over any range of the parameters.

Added: Perhaps this reference is useful.

Added: Empirically, it seems plausible that, for large $m$, $\frac{E[d]}{m} \approx 1 - \frac{1}{n} + \alpha_n m^{-1/2}$.

The graph shows the results from simulations for $n=2$ and $n=3$, as a function of $m$, together with approximation $(1)$ as a thin green line. The approximation seems quite good (astonishingly good, I'd say) for $n=3$, so I'd now be more optimistic that it works for bigger $n$.

[Figure: simulated $E[d]/m$ for $n=2$ and $n=3$ as a function of $m$, with approximation $(1)$ as a thin green line and approximation $(2)$ as a red dotted line.]
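This empirical check can be reproduced along the following lines (a rough Monte Carlo sketch under the same assumptions as the code above; trial counts and parameter values are illustrative):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from math import lgamma, sqrt

rng = np.random.default_rng(0)

def mean_relative_distance(n, m, trials=2000):
    """Monte Carlo estimate of E[d]/m for uniform random words."""
    total = 0
    for _ in range(trials):
        a = rng.integers(n, size=m)
        b = rng.integers(n, size=m)
        C = np.zeros((n, n), dtype=int)
        np.add.at(C, (a, b), 1)
        r, c = linear_sum_assignment(C, maximize=True)
        total += m - C[r, c].sum()
    return total / (trials * m)

def approx1(n, m):
    """Right-hand side of approximation (1); lgamma(n+1) = log(n!)."""
    return 1 - 1/n - sqrt(2 * lgamma(n + 1) / (m * n**2))

for n, m in [(2, 100), (3, 100), (3, 1000)]:
    print(n, m, mean_relative_distance(n, m), approx1(n, m))
```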

Edit 3: For $n=2$ the only relabelings are the identity and the flip, so $d = \min(K, m-K)$, where $K \sim \mathrm{Binomial}(m, 1/2)$ counts the positions at which the words differ; using this we get the approximation

$$\frac{E[d]}{m} \approx \frac12 - \sqrt{\frac{1}{2 m \pi}}\tag2$$

Added in the graph (red dotted line).

Edit 4: Also, for $m=2$ it's easy to get the exact solution: the distance is $1$ precisely when exactly one of the two words repeats its letter (probability $2 \cdot \frac{1}{n}\left(1-\frac{1}{n}\right)$), and $0$ otherwise, so $$\frac{E[d]}{m} =\frac{n-1}{n^2}$$
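This is easy to confirm by exhaustive enumeration (a small sketch; the hypothetical `d_m2` encodes the repeated-letter argument above):

```python
from itertools import product

def d_m2(a, b):
    # For m = 2, an injective relabeling matching both positions exists
    # unless exactly one of the two words repeats its letter.
    return int((a[0] == a[1]) != (b[0] == b[1]))

for n in range(2, 7):
    words = list(product(range(n), repeat=2))
    mean = sum(d_m2(a, b) for a in words for b in words) / (len(words) ** 2 * 2)
    print(n, mean, (n - 1) / n**2)  # empirical E[d]/m vs (n-1)/n^2
```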

In summary:

$$ \frac{E[d]}{m} =\begin{cases} \displaystyle{\frac{n-1}{n^2}} & \text{if }m=2\\ \approx \displaystyle{1 - \frac{1}{n} - \sqrt{\frac{ a_n}{m }}} & \text{if }m>2\\ \end{cases}$$

where

$$ a_n=\begin{cases} \displaystyle{\frac{1}{2 \pi}} & \text{if }n=2\\ \displaystyle{\frac{2\log(n!)}{n^2}} \approx \displaystyle{\frac{2\log(n)}{n}} & \text{if }n>2 \end{cases}$$

leonbloy
  • I asked in MO about this model: https://mathoverflow.net/questions/422483/random-assignment-problem-under-multinomial-or-poisson-distribution – leonbloy May 18 '22 at 13:32