
Probabilistic pattern matching with matrices works like this:

Given two strings $x,y$ with $x\in\{0,1\}^n$, $y\in\{0,1\}^m$ and $n\ge m$, we want to find all occurrences of $y$ in $x$.

If we represent the digit $0$ by the matrix $M_0=\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ \end{pmatrix}$ and the digit $1$ by the matrix $M_1=\begin{pmatrix} 1 & 1 \\ 0 & 1 \\ \end{pmatrix}$, and define $M_x$ for a string $x$ as the product of the matrices of its digits (e.g. $M_{011}=M_0\cdot M_1 \cdot M_1$ for the string $011$), then it isn't hard to prove that $x=y \iff M_x=M_y$.
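To make the encoding concrete, here is a minimal Python sketch of it. The names `mat_mul` and `encode` are my own and purely illustrative. It builds $M_x$ by multiplying left to right and checks by brute force, for short strings only, that distinct strings give distinct matrices.

```python
from itertools import product

# Matrices for the two digits, as defined in the question.
M0 = ((1, 0), (1, 1))
M1 = ((1, 1), (0, 1))


def mat_mul(a, b):
    """Product of two 2x2 integer matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def encode(s):
    """Return M_s, the product of the digit matrices of s, left to right."""
    result = ((1, 0), (0, 1))  # identity matrix for the empty string
    for ch in s:
        result = mat_mul(result, M0 if ch == "0" else M1)
    return result


# Brute-force check (hypothetical cutoff: lengths 0..6) that the map is injective.
strings = ["".join(bits) for k in range(7) for bits in product("01", repeat=k)]
matrices = {encode(s) for s in strings}
print(len(strings) == len(matrices))
```

This only checks the claim for small lengths; it is not a proof.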

My question is the following:

If we uniformly sample a prime number $p\in \{0,\dots,n^4\}$ and do all the computations over the field $\mathbb{Z}_p$ (i.e. modulo $p$), what is the probability that for strings $a\ne b$ we will have $M_a \equiv M_b \pmod p$? We can assume that the strings $a,b$ are distributed uniformly over $\{0,1\}^n$.
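For experimenting with this setup, here is a rough Python sketch of the computation modulo $p$. The function names (`primes_up_to`, `fingerprint`, `estimate_collision_rate`) are hypothetical, and conditioning on $a\ne b$ by discarding equal pairs is my own reading of the question. It gives a Monte Carlo estimate only, not the bound asked about.

```python
import random


def primes_up_to(limit):
    """All primes <= limit via a simple sieve (assumes limit >= 2)."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(range(i * i, limit + 1, i))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def fingerprint(s, p):
    """Entries (a, b, c, d) of M_s = [[a, b], [c, d]], reduced modulo p."""
    a, b, c, d = 1, 0, 0, 1  # identity matrix
    for ch in s:
        if ch == "0":
            # Right-multiply by M_0 = [[1, 0], [1, 1]].
            a, b, c, d = (a + b) % p, b, (c + d) % p, d
        else:
            # Right-multiply by M_1 = [[1, 1], [0, 1]].
            a, b, c, d = a, (a + b) % p, c, (c + d) % p
    return (a, b, c, d)


def estimate_collision_rate(n, trials, seed=0):
    """Empirical fraction of pairs a != b with M_a = M_b modulo a random prime."""
    rng = random.Random(seed)
    primes = primes_up_to(n ** 4)
    collisions = 0
    counted = 0
    for _ in range(trials):
        a = "".join(rng.choice("01") for _ in range(n))
        b = "".join(rng.choice("01") for _ in range(n))
        if a == b:
            continue  # assumption: condition on a != b by rejection
        p = rng.choice(primes)  # uniform over the primes in {2, ..., n^4}
        counted += 1
        collisions += fingerprint(a, p) == fingerprint(b, p)
    return collisions / counted if counted else 0.0


print(estimate_collision_rate(n=8, trials=20000))
```

A small example of a collision: modulo $p=2$ the strings $00$ and $11$ both map to the identity matrix, so `fingerprint("00", 2) == fingerprint("11", 2)`.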

My first intuition was that the expected value of $p$ should be about $\frac{n^4}{2}$, and thus the probability should be at most $\frac{2}{n^4}$, but the exercise asks for a bound of $\frac{1}{n^2}$, which leads me to think that my intuition wasn't correct (even though $\frac{2}{n^4} \ll \frac{1}{n^2}$ for $n$ large enough).

Thanks.

Mickey
  • The expected value of $p$ is a bit less than $\frac{n^4}2$ because the primes are denser in the lower half of the interval, but that's just a small effect. The main problem is that you haven't specified a distribution for the strings, and without one it doesn't make sense to talk about probabilities. (E.g. for $a=b$ the probability that $a\ne b$ is zero.) – joriki Jul 03 '18 at 08:29
  • @joriki You are correct about the distribution for the strings. Let's assume that for each index $i$ we have $\Pr(a_i = 0) = 0.5$ (i.e. uniform distribution). I've edited it in. – Mickey Jul 03 '18 at 09:18
  • Thanks. (What you've edited into the question is stronger than that; it implies that the $a_i$ are independent.) – joriki Jul 03 '18 at 09:29
  • @joriki correct again, what I have written in the question is the relevant assumption (independent random variables) :) – Mickey Jul 03 '18 at 10:31
