
I am reading a paper on streaming algorithms titled "Turnstile streaming algorithms might as well be linear sketches" by Yi Li, Huy Nguyen and David Woodruff.

At some point they have a randomized algorithm (using a tape of random bits) that solves a problem over $\mathbb{Z}_{|m|}^{n} = \{-m,\ldots,m\}^n$ and succeeds with probability $1-\delta$.

They want to reduce the number of random bits used by that algorithm via the following statement:

Theorem 9: Let $A$ be a randomized automaton solving problem $P$ on $\mathbb{Z}_{|m|}^{n}$ with failure probability at most $\delta$. There is a randomized automaton $B$ that only needs to pick uniformly at random one of $O(\delta^{-2} n \log m)$ deterministic instances of $A$ and solves $P$ on $\mathbb{Z}_{|m|}^{n}$ with failure probability at most $2\delta$.

Proof: Let $A_1, A_2, \ldots, A_{O(n\delta^{-2}\log m)}$ be independent draws of deterministic automata picked by $B$.

Fix an input $x\in \mathbb Z_{|m|}^{n}$.

Let $p_A(x)$ be the fraction of automata among $A_1, A_2, \ldots, A_{O(n\delta^{-2}\log m)}$ that solve problem $P$ correctly on $x$, and let $p_B(x)$ be the probability that $B$ solves $P$ on $x$ correctly.

By a Chernoff bound, we have that $\Pr\{|p_A(x)-p_B(x)|\geq \delta\} \leq \exp(-O(n\log m)) < (2m+1)^{-2n}$.

Taking a union bound over all choices of $x \in \mathbb{Z}_{|m|}^{n}$, we have $\Pr\{|p_A(x)-p_B(x)| < \delta \text{ for all } x\} > 0$.

Therefore, there exists a set of $A_1, A_2, \ldots, A_{O(n\delta^{-2}\log m)}$ such that $|p_A(x)-p_B(x)|\leq \delta$ for all $x \in \mathbb{Z}_{|m|}^{n}$. The automaton $B$ simply samples uniformly at random from this set of deterministic automata.

I am having trouble understanding some parts of the proof:

The random variable $p_A(x)$ should be the number of $A_i$'s that succeed on $x$, divided by the number of automata that $B$ has sampled?

If so, then by the law of total probability it is easy to see that:

$$p_B(x) = \sum_{i}\Pr[B \text{ chooses } A_i]\,\Pr[B \text{ succeeds on } x \mid B \text{ chooses } A_i] = \sum_{i}\frac{1}{O(\delta^{-2}n\log m)} \Pr[A_i \text{ succeeds on } x] = p_A(x)$$

where the last step comes from the fact that each instance $A_i$ is deterministic, so $\Pr[A_i \text{ succeeds on } x]$ is either 1 (success) or 0 (failure).

So the quantity $|p_A(x)-p_B(x)|$ they use is actually 0.

I understand from the end of the proof that $|p_A(x)-p_B(x)|$ should somehow represent the distance between $A$'s success probability and $B$'s success probability, but I could not see how this happens.

Also, I did not understand how they apply the Chernoff bound here.

Matan L

1 Answer


Here is another way of looking at $p_A$ and $p_B$. We can think of $B$ as a deterministic algorithm which accepts an additional input $r$ representing the randomness. This input is drawn from some distribution $R$. Also, denote by $P(x)$ the correct answer. Then $$ p_B(x) = \Pr_{r \sim R} [B(x,r) = P(x)]. $$

Let $M = O(n\delta^{-2}\log m)$, and sample $r_1,\ldots,r_M \sim R$. The algorithm $A_i$ simply runs $B$ with randomness $r_i$, that is, $A_i(x) = B(x,r_i)$. We can define another algorithm $A$ which first chooses $i \in \{1,\ldots,M\}$ uniformly at random, and then runs $A_i$. The success probability of $A$ is $$ \begin{align*} p_A(x) &= \Pr_{i \in \{1,\ldots,M\}} [B(x,r_i) = P(x)] \\ &= \Pr_{r \in \{r_1,\ldots,r_M\}} [B(x,r) = P(x)]. \end{align*} $$

Thus in $B$, the randomness is chosen according to $R$, whereas in $A$, the randomness is chosen uniformly from the set $\{r_1,\ldots,r_M\}$. If we choose $r_1,\ldots,r_M \sim R$, then we expect the performance of $A$ to be very similar to that of $B$, and this can be quantified using Chernoff's inequality.
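To spell out that quantification, here is a sketch of the Chernoff step, assuming the additive Hoeffding form of the bound and an explicit constant $c$ in place of the $O(\cdot)$: for a fixed input $x$, write $X_i = \mathbf{1}[B(x,r_i) = P(x)]$. The $X_i$ are i.i.d. Bernoulli variables with mean $p_B(x)$, and $p_A(x) = \frac{1}{M}\sum_{i=1}^M X_i$, so $$ \Pr_{r_1,\ldots,r_M \sim R}\left[|p_A(x) - p_B(x)| \geq \delta\right] \leq 2e^{-2\delta^2 M}. $$ Taking $M = c\,n\delta^{-2}\log m$ with $c$ large enough makes the right-hand side smaller than $(2m+1)^{-2n}$; a union bound over the $(2m+1)^n$ inputs then leaves positive probability that $|p_A(x) - p_B(x)| < \delta$ holds for every $x$ simultaneously.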

The probability $p_A(x)$ itself depends on the choice of $r_1,\ldots,r_M$. What may confuse you is how we conceive of this two-step random process: we first choose $r_1,\ldots,r_M$, and this gives us the function $p_A(x)$, which is a random function depending on the choice of $r_1,\ldots,r_M$. Note that $p_A(x)$ does not take the probability over the choice of $r_1,\ldots,r_M$; if it did, we would just get $p_B(x)$.
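To make the two-step process concrete, here is a minimal Python simulation. The toy problem $P$, the failure mode of $B$, and all constants are made up purely for illustration; they are not from the paper.

```python
import random

def correct_answer(x):
    """The toy problem P: the correct answer on input x."""
    return x % 2

def B(x, r):
    """Randomized algorithm B(x, r): for this contrived failure
    mode, B errs with probability exactly 0.1 over r ~ Uniform[0,1)."""
    if (r * (x + 1)) % 1.0 < 0.1:
        return 1 - correct_answer(x)  # wrong answer
    return correct_answer(x)

random.seed(0)
M = 2000
seeds = [random.random() for _ in range(M)]  # r_1, ..., r_M drawn from R

x = 7  # fix one input x

# p_B(x) = Pr_{r ~ R}[B(x, r) = P(x)], estimated with many fresh draws of r
fresh = (random.random() for _ in range(200_000))
p_B = sum(B(x, r) == correct_answer(x) for r in fresh) / 200_000

# p_A(x): the empirical fraction of the *fixed* seeds on which B is correct;
# this is a random quantity determined by the draw of r_1, ..., r_M
p_A = sum(B(x, r) == correct_answer(x) for r in seeds) / M

print(f"p_B(x) ~= {p_B:.4f}")  # ~0.9 by construction
print(f"p_A(x)  = {p_A:.4f}")  # concentrates around p_B(x) as M grows
```

Rerunning with a fresh draw of `seeds` changes $p_A(x)$ slightly each time while $p_B(x)$ stays put; that fluctuation is exactly what the Chernoff bound controls.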

Yuval Filmus