I am trying to understand a step in the proof of Theorem 2 in the paper "A Universal Law of Robustness via Isoperimetry" by Bubeck and Sellke.


$$ \mathbb{P}\left(\sqrt{\frac{d}{c n L^{2}}} \sum_{i=1}^{n}\left(f\left(x_{i}\right)-\mathbb{E}[f]\right) z_{i} \geq t\right) \leq 2 \exp \left(-(t / 9)^{2}\right) \tag{1} $$
We rewrite the above as:
$$ \mathbb{P}\left(\frac{1}{n} \sum_{i=1}^{n}\left(f\left(x_{i}\right)-\mathbb{E}[f]\right) z_{i} \geq \frac{\epsilon}{8}\right) \leq 2 \exp \left(-\frac{\epsilon^{2} n d}{9^{4} c L^{2}}\right) \tag{2} $$
Since we assumed that the range of the functions is in $[-1,1]$, we have $\mathbb{E}[f] \in[-1,1]$ and hence:
$$ \mathbb{P}\left(\exists f \in \mathcal{F}: \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}[f] z_{i} \geq \frac{\epsilon}{8}\right) \leq \mathbb{P}\left(\left|\frac{1}{n} \sum_{i=1}^{n} z_{i}\right| \geq \frac{\epsilon}{8}\right) \tag{3} $$


  1. How does Equation 2 follow from Equation 1? Why can it be rewritten like this?

  2. How is Equation 3 obtained?

Update: Can anybody help me with this follow-up question?

2 Answers

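  1. Set $$\epsilon = 9t \sqrt{\frac{cL^2}{dn}}.$$ Then $(t/9)^2 = \frac{\epsilon^2 n d}{9^4 c L^2}$, and Equation 1 bounds the probability that $\frac{1}{n} \sum_{i=1}^n (f(x_i) - E[f]) z_i \ge \epsilon/9$. Since $\epsilon/8 \ge \epsilon/9$, the event in Equation 2 is contained in that event, so the same bound applies.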

  2. If $\frac{1}{n} \sum_{i=1}^n E[f] z_i \ge \epsilon / 8$, then $$E[f] \left(\frac{1}{n} \sum_{i=1}^n z_i\right) > 0$$ which implies $$\frac{1}{n} \sum_{i=1}^n E[f] z_i = \left|E[f]\right| \left|\frac{1}{n} \sum_{i=1}^n z_i\right| \le \left|\frac{1}{n} \sum_{i=1}^n z_i\right|.$$

angryavian

For the jump from Equation 1 to Equation 2, you can proceed as follows: \begin{align} 2\exp(-(t/9)^2) &\geq \mathbb{P}\left (\sqrt{\frac{d}{cnL^2}}\sum_{i=1}^n (f(x_i) - \mathbb{E}[f])z_i \geq t \right ) & (\text{Eq. 1})\\ &= \mathbb{P}\left (\sqrt{\frac{cL^2}{dn}} \sqrt{\frac{d}{cnL^2}}\sum_{i=1}^n (f(x_i) - \mathbb{E}[f])z_i \geq \sqrt{\frac{cL^2}{dn}}t \right ) \\ &= \mathbb{P}\left (\frac{1}{n} \sum_{i=1}^n (f(x_i) - \mathbb{E}[f])z_i \geq \sqrt{\frac{cL^2}{dn}}t \right ) \end{align} where the second line multiplies both sides of the inequality inside the probability by the same positive constant, and the third simplifies $\sqrt{\frac{cL^2}{dn}} \sqrt{\frac{d}{cnL^2}} = \frac{1}{n}$.

Now choose $t = \sqrt{\frac{dn}{cL^2}}\frac{\epsilon}{8}$. Plugging this into the first and last expressions gives \begin{align} \mathbb{P}\left (\frac{1}{n} \sum_{i=1}^n (f(x_i) - \mathbb{E}[f])z_i \geq \sqrt{\frac{cL^2}{dn}} \sqrt{\frac{dn}{cL^2}}\frac{\epsilon}{8} \right ) &\leq 2\exp\left (-\left (\frac{1}{9} \sqrt{\frac{dn}{cL^2}}\frac{\epsilon}{8}\right )^2 \right ) \\ \mathbb{P}\left (\frac{1}{n} \sum_{i=1}^n (f(x_i) - \mathbb{E}[f])z_i \geq \frac{\epsilon}{8} \right ) & \leq 2\exp\left (-\left (\frac{1}{9} \sqrt{\frac{dn}{cL^2}}\frac{\epsilon}{8}\right )^2 \right ) \\ &=2\exp\left(-\frac{1}{9^2}\frac{\epsilon^2}{8^2}\frac{dn}{cL^2}\right) \\ &\leq 2\exp\left (-\frac{\epsilon^2nd}{9^4cL^2} \right) \end{align} where the last step uses $8^2 \leq 9^2$, so replacing $8^2$ by $9^2$ in the denominator can only increase the right-hand side. This proves the first part.
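The constants in the chain above can be checked numerically. The following is a minimal Python sketch, not anything from the paper: the function name `exponent_bounds` and the sample values of $\epsilon, n, d, c, L$ are arbitrary choices made for illustration.

```python
import math

def exponent_bounds(eps, n, d, c, L):
    """Return (threshold, exact exponent, looser exponent used in Eq. 2)."""
    # Choice of t made in the answer: t = sqrt(dn / (c L^2)) * eps / 8
    t = math.sqrt(d * n / (c * L**2)) * eps / 8
    # Threshold on the averaged sum after rescaling Eq. 1; should equal eps / 8
    threshold = math.sqrt(c * L**2 / (d * n)) * t
    # Exponent (t / 9)^2 from Eq. 1 with this t
    exact = (t / 9) ** 2
    # Exponent eps^2 n d / (9^4 c L^2) appearing in Eq. 2
    loose = eps**2 * n * d / (9**4 * c * L**2)
    return threshold, exact, loose

# Arbitrary illustrative values (assumptions, not taken from the paper)
threshold, exact, loose = exponent_bounds(eps=0.1, n=1000, d=50, c=2.0, L=3.0)
assert math.isclose(threshold, 0.1 / 8)
assert exact >= loose  # so exp(-exact) <= exp(-loose)
print(threshold, exact, loose)
```

The ratio `exact / loose` is always $9^2/8^2$, which is why the final inequality holds for every admissible choice of the parameters.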

For the second part of the question, we need the basic fact that if "not $B$" implies "not $A$," then $\mathbb{P}(A) \leq \mathbb{P}(B)$. To see this, note that \begin{align}\mathbb{P}(A) &= \mathbb{P}(A \text{ and } B) + \mathbb{P}(A \text{ and not } B) \\ &=\mathbb{P}(A|B)\mathbb{P}(B) + \mathbb{P}(A|\text{not } B)\mathbb{P}(\text{not } B) \\ &\leq 1\mathbb{P}(B) + 0\mathbb{P}(\text{not } B) = \mathbb{P}(B) \end{align} With this in mind, let $B$ be the event $\left| \frac{1}{n}\sum_{i=1}^n z_i \right | \geq \frac{\epsilon}{8}$ and let $A$ be the event $$\exists f \in \mathcal{F} \text{ s.t. } \frac{1}{n}\sum_{i=1}^n \mathbb{E}[f]z_i \geq \frac{\epsilon}{8}.$$ Now note that since $|\mathbb{E}[f]| \in [0,1]$, if not $B$ holds then for every $f \in \mathcal{F}$ we have \begin{equation} \frac{\epsilon}{8} > \left| \frac{1}{n}\sum_{i=1}^n z_i \right | \geq |\mathbb{E}[f]|\left| \frac{1}{n}\sum_{i=1}^n z_i \right | \geq \frac{1}{n}\sum_{i=1}^n \mathbb{E}[f]z_i \end{equation} which means not $B$ implies not $A$, and we can apply the bound above.
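The containment of $A$ in $B$ can also be checked by simulation. This is only a rough sketch under stated assumptions: the $z_i$ are drawn uniformly from $[-1,1]$ and a handful of values in $[-1,1]$ stand in for $\mathbb{E}[f]$ over a small class $\mathcal{F}$. Neither choice comes from the paper, and the function names are made up for illustration.

```python
import random

def event_A(mus, z_mean, eps):
    # A: some f has E[f] * mean(z) >= eps / 8 (mus plays the role of the E[f] values)
    return any(mu * z_mean >= eps / 8 for mu in mus)

def event_B(z_mean, eps):
    # B: |mean(z)| >= eps / 8
    return abs(z_mean) >= eps / 8

def count_violations(trials=10000, n=20, eps=1.0, seed=0):
    """Count draws where A occurs but B does not (should never happen)."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        # Assumed noise distribution: uniform on [-1, 1]
        z = [rng.uniform(-1, 1) for _ in range(n)]
        z_mean = sum(z) / n
        # Assumed stand-ins for E[f], each in [-1, 1]
        mus = [rng.uniform(-1, 1) for _ in range(5)]
        if event_A(mus, z_mean, eps) and not event_B(z_mean, eps):
            violations += 1
    return violations

print(count_violations())
```

Because $A$ never occurs without $B$, the empirical frequency of $A$ can never exceed that of $B$, which is the content of Equation 3.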

  • Thanks for such a detailed explanation! Can you tell me why you wrote P(A given B) as 1 in the proof of the second part? –  Jun 07 '22 at 06:52
  • It should be: "if "not A" implies "not B," then P(A)≤P(B)". –  Jun 07 '22 at 08:58
  • Yes, the implication in that sentence was backwards; I corrected it to "if not B implies not A, then P(A) <= P(B)". I replaced P(A|B) with 1 because 1 is always an upper bound on a probability, and therefore P(A|B)P(B) <= P(B). This can also be seen by noting that P(A and B) <= P(B). – Matt Werenski Jun 07 '22 at 12:44
  • Okay. Yes, P(A and B) <= P(B), and by the same logic P(A and not B) <= P(not B), so why is it 0 * P(not B)? Am I missing something? Not B implies not A, and the second term contains P(A given not B); is that why it is 0? –  Jun 07 '22 at 18:27
  • The reason it's 0 is that we are assuming that "not B" implies "not A" and therefore P(not A | not B) = 1. By the sum rule P(A | not B) + P(not A | not B) = 1, and therefore P(A | not B) + 1 = 1, which implies P(A | not B) = 0. – Matt Werenski Jun 07 '22 at 18:38
  • @M.Werenki Can you help me with this? I am really struggling: https://math.stackexchange.com/questions/4471556/how-can-i-perform-union-operation-into-two-events –  Jun 13 '22 at 18:22