
Suppose that $X_1 ,\cdots,X_n$ have cdf $F$. Let $F_n (t) =\sum\limits_{i=1}^{n} \dfrac{1}{n}I(X_i \le t)$. From Hoeffding's inequality we have:

For each $t$: $\mathbb{P}(|F_n(t)-F(t)|>\epsilon) \le 2\exp(-2n\epsilon^2)$
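
A sketch of why this bound holds for a fixed $t$, assuming the $X_i$ are independent with common cdf $F$ (the symbol $Y_i$ is introduced here only for illustration):

$$Y_i = I(X_i \le t) \in [0,1], \qquad \mathbb{E}[Y_i] = F(t), \qquad F_n(t) = \frac{1}{n}\sum_{i=1}^{n} Y_i$$

$$\mathbb{P}\left(\left|\frac{1}{n}\sum_{i=1}^{n} Y_i - F(t)\right| > \epsilon\right) \le 2\exp\left(-\frac{2n^2\epsilon^2}{\sum_{i=1}^{n}(1-0)^2}\right) = 2\exp(-2n\epsilon^2)$$

The argument relies on $t$ being a fixed, non-random number, so that the $Y_i$ are independent Bernoulli variables with mean $F(t)$.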

Question:

This inequality holds for each $t \in \mathbb{R}$, so in particular it holds for a $t$ with a given property, if such a $t$ exists. Take that $t$ to be one at which $\sup\limits_t |F_n(t)-F(t)|$ is attained.

$\implies \mathbb{P}(\sup\limits_t|F_n(t)-F(t)|>\epsilon) \le 2\exp(-2n\epsilon^2)$

Please let me know where my reasoning went wrong.

Shadow
  • There is no reason why $\sup_t |F_n(t) - F(t)| = |F_n(t_0) - F(t_0)|$ for some constant $t_0$. – Project Book Dec 29 '17 at 07:57
  • Is this the only reason? For example if such a constant did exist then would my reasoning be valid? – Shadow Dec 29 '17 at 07:59
  • Yeah, if it did. – Project Book Dec 29 '17 at 08:00
  • Thanks, one last question: in general, can such a constant exist for $|F_n(t)-F(t)|$, where $F(t)$ is a cdf and $F_n(t)$ is the empirical cdf defined above?

    I meant to ask whether it fails to exist for cdfs in general, or whether that depends on the distribution of the random variables.

    – Shadow Dec 29 '17 at 08:03
  • No I have not learned about VC dimension yet. This is explained before VC dimension in the notes. – Shadow Dec 29 '17 at 09:37
  • The $(F_n(t))_{t\in \mathbb{R}}$ are random variables, so unless $F$ is trivial, the $t_0$ at which $\sup_t |F_n(t) - F(t)|$ is attained is going to be a random variable and not a constant. – Project Book Dec 29 '17 at 10:42
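
The point in the last comment can be checked numerically. The following is a minimal sketch, assuming a Uniform(0,1) sample so that $F(t) = t$; the function names `ks_sup_and_location` and `simulate` are hypothetical and chosen only for this illustration. It computes the largest gap between the empirical cdf and $F$ and the sample point where that gap occurs, and repeating it over independent samples suggests that this location changes from sample to sample.

```python
import random


def ks_sup_and_location(sample):
    """Return (sup_t |F_n(t) - t|, sample point where it occurs).

    Assumes the data come from Uniform(0,1), so the true cdf is F(t) = t.
    """
    xs = sorted(sample)
    n = len(xs)
    best_gap, best_t = 0.0, None
    for i, x in enumerate(xs, start=1):
        # F_n jumps from (i-1)/n to i/n at x, so check both sides of the jump.
        gap = max(i / n - x, x - (i - 1) / n)
        if gap > best_gap:
            best_gap, best_t = gap, x
    return best_gap, best_t


def simulate(n=50, trials=5, seed=0):
    """Draw `trials` independent samples of size n; return (gap, location) pairs."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        results.append(ks_sup_and_location(sample))
    return results


if __name__ == "__main__":
    for gap, t0 in simulate():
        print(f"sup gap = {gap:.3f} near t = {t0:.3f}")
```

Because the location depends on the sample, the pointwise bound, which is stated for a fixed non-random $t$, cannot simply be evaluated at that location.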
