Suppose $X,X_1,X_2,X_3\dots$ is a $\mathbb{P}$-i.i.d. family of $[-1,1]$-valued random variables with $\mathbb{E}[X] = 0$. By Hoeffding's inequality, we know that \begin{equation*} \forall T \in \mathbb{N}, \forall \delta \in(0,1), \qquad\mathbb{P}\bigg[ \frac{1}{T} \sum_{t=1}^T X_t \ge \sqrt{\frac{2}{T} \log\Big(\frac{1}{\delta}\Big)} \bigg] \le \delta\;. \end{equation*} I'm wondering if a better upper bound than the $(2^T-1) \cdot \delta$ (that follows from a union bound) holds on the quantity \begin{equation*} \mathbb{P}\Bigg[ \bigcup_{\emptyset\neq A \subset \{1,\dots,T\}} \bigg\{\frac{1}{|A|} \sum_{t\in A} X_t \ge \sqrt{\frac{2}{|A|} \log\Big(\frac{1}{\delta}\Big)} \bigg\} \Bigg]\;, \end{equation*} where $|A|$ is the number of elements in $A$.
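A small sanity check on the event itself may help. The threshold depends on $A$ only through $|A|$, so for a fixed realization the union occurs exactly when, for some $k$, the sum of the $k$ largest values among $X_1,\dots,X_T$ reaches $\sqrt{2k\log(1/\delta)}$. The sketch below compares this sorted-prefix test with a brute-force enumeration of all subsets; the function names are mine, chosen for illustration only.

```python
import math
from itertools import combinations

def threshold(k, delta):
    # mean over A >= sqrt((2/|A|) log(1/delta))  <=>  sum over A >= sqrt(2 |A| log(1/delta))
    return math.sqrt(2 * k * math.log(1 / delta))

def union_event_bruteforce(xs, delta):
    """Check every nonempty subset A of indices (exponential in len(xs))."""
    T = len(xs)
    for k in range(1, T + 1):
        for A in combinations(range(T), k):
            if sum(xs[t] for t in A) >= threshold(k, delta):
                return True
    return False

def union_event_sorted(xs, delta):
    """Same event, using only the k largest values for each subset size k."""
    prefix = 0.0
    for k, x in enumerate(sorted(xs, reverse=True), start=1):
        prefix += x
        if prefix >= threshold(k, delta):
            return True
    return False
```

This reduces checking the event on a sample from $2^T-1$ subsets to one sort, which makes Monte Carlo experiments feasible for moderate $T$.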

Specifically, I'm looking for an upper bound of (nearly) the form $O(T^\alpha)\cdot \delta$, for some $\alpha > 0$.

I suspect this could be true, due to the highly entangled structure of this union. To get a more specific idea of why this could be the case, this answer to a question similar in spirit proves that the bound coming from a union bound is far from tight.

So far, I have tried to apply the aforementioned answer in the following way. We say that a family of sets $\{A_1, \dots, A_T\}$ is a string if $A_1 \subset \dots \subset A_T \subset \{1,\dots,T\}$ and $|A_k| = k$ for each $k \in \{1,\dots,T\}$. We say that a family of strings $\mathcal{A}_1,\dots,\mathcal{A}_m$ is a string-cover of $\{1,\dots,T\}$ if for each nonempty $A \subset \{1,\dots,T\}$ there exists $k \in \{1,\dots,m\}$ such that $A \in \mathcal{A}_k$. Then, if $\mathcal{A}_1,\dots,\mathcal{A}_m$ is a string-cover of $\{1,\dots,T\}$, by the previous answer and a union bound over the $m$ strings, the probability we are trying to control is upper bounded by $ m \cdot \log(T) \cdot e^2 \cdot \log(e/\delta) \cdot \delta $. However, each string contains exactly $T$ sets and there are $2^T-1$ nonempty subsets to cover, so any string-cover satisfies $m \ge \frac{2^T-1}{T}$. This idea therefore leads to a bound that is, at best, still exponential in $T$.
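To get a feel for how quickly the counting bound $m \ge (2^T-1)/T$ grows, here is a minimal sketch that evaluates it for a given $T$. The helper name is hypothetical, and the value is only the counting lower bound, not the size of an actual string-cover.

```python
def min_strings_lower_bound(T):
    """Counting lower bound on the number m of strings in any string-cover.

    Each string contains exactly T sets, and 2**T - 1 nonempty subsets
    must be covered, so m >= ceil((2**T - 1) / T).
    """
    total_subsets = 2**T - 1
    return -(-total_subsets // T)  # integer ceiling division

if __name__ == "__main__":
    for T in (5, 10, 20, 40):
        print(T, min_strings_lower_bound(T))
```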

Any other ideas?

Bob
    If your bound was true, I think it would give you very good control of $\sum |X_t|$. Maybe one can use this to show that your desired bound cannot hold? I might be wrong. – PhoemueX Jul 25 '22 at 18:28

1 Answer

Contrary to what I was hoping, rearrangements have a major impact: if $f$ is a polynomial, an upper bound of the form $f(T)\cdot \delta$ on the probability of the event cannot hold for each $T \in \mathbb{N}$ and $\delta \in (0,1)$.

What follows is a proof of this claim.

Let $(\Omega,\mathcal{F},\mathbb{P})$ be the probability space where $X,X_1,X_2,\dots$ are defined. Assume that $$\mathbb{P}[X=1] = 1/2 =\mathbb{P}[X=-1].$$ For each $T \in \mathbb{N}$, define $E_T:=\{\omega \in \Omega \mid \sum_{t=1}^TX_t \ge \sqrt{T}\}$. Since $\mathbb{E}[X]=0$ and $\mathrm{Var}(X) = 1$, by the CLT we have that $$\mathbb{P}[ E_T ] \to 1-\Phi(1) \;, \qquad T \to \infty \;,$$ where $\Phi$ is the cdf of the standard normal. It follows that there exists a $T_0 \in \mathbb{N}$ such that for each $T \ge T_0$ it holds that $$\mathbb{P}[ E_T ] \ge \frac{1-\Phi(1)}{2} \;.$$ For each $T \in \mathbb{N}$ and each $\omega \in \Omega$, define $A_T(\omega) := \big\{t \in \{1,\dots,T\} \mid X_t(\omega)=1\big\}$.

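Since each $X_t$ takes values in $\{-1,1\}$ (almost surely), on $E_T$ we have $\sum_{t=1}^T X_t = |A_T| - (T-|A_T|) = 2|A_T| - T \ge \sqrt{T}$, so $|A_T| \ge (T+\sqrt{T})/2 \ge \sqrt{T}$. In particular, $$\forall T \in \mathbb{N}, \forall \omega \in E_T, \qquad |A_T(\omega)| \ge \sqrt{T}.$$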

Suppose that $f : \mathbb{N} \to \mathbb{R}$ is such that $$\forall T \in \mathbb{N}, \forall \delta \in (0,1), \qquad \mathbb{P}\Bigg[ \bigcup_{\emptyset\neq A \subset \{1,\dots,T\}} \bigg\{\frac{1}{|A|} \sum_{t\in A} X_t \ge \sqrt{\frac{2}{|A|} \log\Big(\frac{1}{\delta}\Big)} \bigg\} \Bigg] \le f(T) \cdot \delta \;.$$ It follows that, for each $T \ge T_0$, if we define $\delta := e^{-\sqrt{T}/2}$, we have \begin{align*} f(T) e^{-\sqrt{T}/2} &= f(T) \cdot \delta \ge \mathbb{P}\Bigg[ \bigcup_{\emptyset\neq A \subset \{1,\dots,T\}} \bigg\{\frac{1}{|A|} \sum_{t\in A} X_t \ge \sqrt{\frac{2}{|A|} \log\Big(\frac{1}{\delta}\Big)} \bigg\} \Bigg] \\ &\ge \mathbb{P}\Bigg[ \bigg\{\omega \in E_T \; \bigg|\; \frac{1}{|A_T(\omega)|} \sum_{t\in A_T(\omega)} X_t(\omega) \ge \sqrt{\frac{2}{|A_T(\omega)|} \log\Big(\frac{1}{\delta}\Big)} \bigg\} \Bigg] \\ &= \mathbb{P}\Bigg[ \bigg\{\omega \in E_T \; \bigg|\; 1 \ge \sqrt{\frac{2}{|A_T(\omega)|} \log\Big(\frac{1}{\delta}\Big)} \bigg\} \Bigg] \\ &\ge \mathbb{P}\Bigg[ \bigg\{ \omega \in E_T \; \bigg|\; 1 \ge \sqrt{\frac{2}{\sqrt{T}} \log\Big(\frac{1}{\delta}\Big)} \bigg\} \Bigg] = \mathbb{P}[E_T] \ge \frac{1-\Phi(1)}{2} \;. \end{align*} It follows that for each $T \ge T_0$ we have $$f(T) \ge \frac{1-\Phi(1)}{2} e^{\sqrt{T}/2}\;,$$ meaning that $f$ can't be polynomial in $T$.
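The chain of inequalities above up to $\mathbb{P}[E_T]$ does not use $T \ge T_0$, so it gives $f(T) \ge \mathbb{P}[E_T]\, e^{\sqrt{T}/2}$ for every $T$. For uniform $\pm 1$ variables, $\mathbb{P}[E_T]$ is a binomial tail and can be computed exactly. The following sketch (function names are mine, for illustration) evaluates it and the resulting lower bound on $f(T)$.

```python
import math

def prob_E(T):
    """Exact P[X_1 + ... + X_T >= sqrt(T)] for i.i.d. uniform +-1 variables."""
    c = math.isqrt(T - 1) + 1   # smallest integer >= sqrt(T), valid for T >= 1
    # With n variables equal to +1 the sum is 2n - T, an integer,
    # so the event is 2n - T >= c, i.e. n >= ceil((T + c) / 2).
    n_min = (T + c + 1) // 2
    favourable = sum(math.comb(T, n) for n in range(n_min, T + 1))
    return favourable / 2**T

def f_lower_bound(T):
    """Lower bound P[E_T] * exp(sqrt(T)/2) forced on f(T) by delta = exp(-sqrt(T)/2)."""
    return prob_E(T) * math.exp(math.sqrt(T) / 2)

if __name__ == "__main__":
    for T in (100, 400, 2500):
        print(T, prob_E(T), f_lower_bound(T))
```

For large $T$ the value of `prob_E(T)` should settle near $1-\Phi(1) \approx 0.159$, while `f_lower_bound(T)` grows like $e^{\sqrt{T}/2}$.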

Bob