I am self-studying empirical process theory. I have encountered the covering number $N(\delta,\mathcal{G},P)$, as well as the empirical version $N(\delta,\mathcal{G},P_n)$. It seems intuitive to expect some kind of convergence: $$ N(\delta,\mathcal{G},P_n)\rightarrow N(\delta,\mathcal{G},P) $$ Yet, I have no idea how to prove this. Can such a result be shown? Or are there counterexamples?

Definitions

Covering number: Let $P$ be a probability measure on the Borel $\sigma$-algebra over $\mathbb{R}$. For $p\in[1,\infty)$, let $L^p(P)$ be the set of Borel-measurable mappings $f:\mathbb{R}\rightarrow\mathbb{R}$ for which $\int_\mathbb{R} |f|^p dP<\infty$, with norm $\|f\|_p=\left(\int_\mathbb{R} |f|^p dP\right)^{1/p}$. Let $\mathcal{G}$ be a totally bounded subset of $L^p(P)$. For $\delta>0$, the covering number of $\mathcal{G}$ is the smallest $N\in\mathbb{N}$ such that there exists a subset $G\subset \mathcal{G}$ with $N$ elements and the following property: for any $g\in\mathcal{G}$, there exists an $h\in G$ such that $\|g-h\|_p<\delta$. This number is denoted by $N(\delta,\mathcal{G},P)$.

Empirical measure: Let $P$ be as above. Let $\{X_n\}_{n\in\mathbb{N}}$ be a sequence of independent $P$-distributed random variables. If $\delta_{X_i}$ denotes the Dirac measure at $X_i$, the empirical measure $P_n$ is defined as: $$ P_n:\mathcal{B}(\mathbb{R})\rightarrow[0,1],\quad E\mapsto \frac{1}{n}\sum_{i=1}^n\delta_{X_i}(E) $$
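To make the two definitions concrete, here is a minimal Python sketch. It assumes a fixed, hypothetical sample standing in for the draws $X_1,\dots,X_n$ and a small finite class of threshold functions $x\mapsto 1\{x\le t\}$, neither of which comes from the question itself. The helper names (`empirical_measure`, `empirical_lp_dist`, `greedy_cover`) are made up for illustration. The greedy routine only returns *some* valid $\delta$-cover, so its size is an upper bound on $N(\delta,\mathcal{G},P_n)$, not necessarily the covering number.

```python
def empirical_measure(sample, event):
    """P_n(E): fraction of sample points x for which event(x) is true."""
    return sum(1 for x in sample if event(x)) / len(sample)


def empirical_lp_dist(f, h, sample, p=1.0):
    """||f - h||_{P_n} = (1/n * sum_i |f(X_i) - h(X_i)|^p)^(1/p)."""
    n = len(sample)
    return (sum(abs(f(x) - h(x)) ** p for x in sample) / n) ** (1.0 / p)


def threshold(t):
    """Hypothetical class element: x -> 1{x <= t}."""
    return lambda x: 1.0 if x <= t else 0.0


def greedy_cover(functions, sample, delta, p=1.0):
    """Pick centers from the class so every function is within < delta of one.

    The result is a valid delta-cover with centers in the class, so its size
    is an upper bound on the empirical covering number (not the minimum).
    """
    centers = []
    for f in functions:
        # f becomes a new center only if no existing center covers it
        if all(empirical_lp_dist(f, c, sample, p) >= delta for c in centers):
            centers.append(f)
    return centers


# Hypothetical fixed sample in [0, 1] standing in for X_1, ..., X_n
sample = [(k + 0.5) / 10 for k in range(10)]
functions = [threshold(k / 10) for k in range(11)]

centers = greedy_cover(functions, sample, delta=0.25, p=1.0)
print(empirical_measure(sample, lambda x: x <= 0.5))
print(len(centers))
```

Replacing `sample` with a larger set of draws from $P$ is how one would explore the behaviour of $N(\delta,\mathcal{G},P_n)$ numerically, keeping in mind that the greedy count is only a bound.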

RobPratt
Idontgetit
  • I would start by trying to show first that the expectation of the empirical covering number is equal to the true covering number. Then maybe it's possible to show a uniform law of large numbers for $N$ under some special cases of $\mathcal G$ – dmh Mar 05 '21 at 18:56
  • This is the canonical approach to solve problems in empirical process theory. Calculating the expectation of the empirical covering number seems nontrivial to me. – Idontgetit Mar 07 '21 at 09:24
  • There's a difficulty here that the elements of the covering are not necessarily part of the class of functions being covered. If we took $\mathcal G$ to be a GC class, then the difference of functions in this class $g-h$ is also GC and we can consider the convergence of $\|g-h\|_{P_n}$, which converges to a value that is less than $\delta$ iff $\|g-h\|_{P}$ is less than $\delta$. However, this doesn't answer your question since the elements of the covering need not be in $\mathcal G$. Any ideas if this argument can be made to work? – dmh Mar 08 '21 at 16:09
  • I have a vague idea of how to deal with the elements of the cover not being from a GC class: approximate these elements by smooth versions $g_i^\epsilon$ which are by construction from a GC class. The claim holds for these elements. Now show that $\|g - g_i^\epsilon\|_{P_n}$ is close to $\|g - g_i\|_{P_n}$ for all $g \in \mathcal G$ to complete the argument. – dmh Mar 08 '21 at 22:06
  • I guess the latter statement is easy to show: $\|g-g_i^\epsilon\| = \|g-g_i^\epsilon + g_i - g_i\| \leq \|g-g_i\| + \|g_i - g_i^\epsilon\|$. The second term can be made arbitrarily small. – dmh Mar 08 '21 at 22:12
  • I'm sceptical about this approach. Intuitively, approximating a non-GC class with a GC class should not be possible without problems. I guess the approximation would be in the $P_n$-norm, so the approximating class would be random. – Idontgetit Mar 09 '21 at 08:04
  • I would already be satisfied (for now) and would accept an answer that shows such a result for GC classes (or an even smaller class). – Idontgetit Mar 09 '21 at 08:07

1 Answer

The answer is no, due to problems with equivalence classes. We will look at two cases: First, the elements of $L^p(P)$ are understood as equivalence classes ($f=g$ if $P(f(x)=g(x))=1$), then we take them as individual functions.

Case 1: Interpreting $L^p(P)$ as a set of equivalence classes.

In this case, calculating the empirical covering number makes no sense, unless $P$ is discrete.

To demonstrate this, let $\mathcal{B}$ be the set of all Borel-measurable mappings $\mathbb{R}\rightarrow\mathbb{R}$. Take $P$ as the Lebesgue measure on $[0,1]$ and $\mathcal{G}=\{[0]\}$. Here, $[0]$ is the set of all Borel-measurable mappings $\mathbb{R}\rightarrow\mathbb{R}$ which are $P$-almost surely equal to zero. Take a finite subset $\{x_1,\dots,x_n\}\subset \mathbb{R}$, playing the role of the observed sample. For any $g\in\mathcal{B}$, $$ x\mapsto \sum_{i=1}^n g(x)1\{x=x_i\}\in[0], $$ because this function vanishes outside a finite set. It also agrees with $g$ at every $x_i$, so its empirical $L^p$-distance to $g$ is zero. Therefore, for any $\delta>0$ and any $n\in\mathbb{N}$, the element $[0]$ can cover all of $\mathcal{B}$ in the empirical $L^p$-norm: $$ N(\delta,\mathcal{B},P_n)=1 $$ Trivially, $$ N(\delta,\mathcal{B},P)=+\infty $$ In fact $\mathcal{B}$ is so large that we cannot even define a countable covering for it.
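The key step of Case 1 can be checked numerically. The following Python sketch uses a hypothetical three-point sample and an arbitrary test function `g`; the helper names `empirical_lp_dist` and `restrict_to_sample` are assumptions made for illustration. It shows that $g$ and the function $x\mapsto g(x)1\{x\in\{x_1,\dots,x_n\}\}$ are indistinguishable in the empirical norm, although the latter is zero off a finite set.

```python
def empirical_lp_dist(f, h, sample, p=2.0):
    """||f - h||_{P_n} = (1/n * sum_i |f(x_i) - h(x_i)|^p)^(1/p)."""
    n = len(sample)
    return (sum(abs(f(x) - h(x)) ** p for x in sample) / n) ** (1.0 / p)


def restrict_to_sample(g, sample):
    """x -> g(x) * 1{x in sample}: nonzero only on a finite, Lebesgue-null set."""
    points = set(sample)
    return lambda x: g(x) if x in points else 0.0


sample = [0.1, 0.4, 0.7]          # hypothetical observations x_1, x_2, x_3
g = lambda x: 3.0 * x + 1.0       # arbitrary Borel-measurable test function
g_tilde = restrict_to_sample(g, sample)

# Zero empirical distance: g_tilde (a member of [0]) "covers" g under P_n
dist = empirical_lp_dist(g, g_tilde, sample)
# Away from the sample, g_tilde vanishes
off_sample_value = g_tilde(0.5)
print(dist, off_sample_value)
```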


Case 2: Interpreting $L^p(P)$ as a set of functions.

In this case, there is also a counterexample based on equivalence classes. But now it goes the other way.

Let $\mathcal{G}$ be the set of functions parametrized by $\alpha\in [0,1]$, $\beta>0$, which map: $$ g_{\alpha,\beta}:[0,1]\rightarrow\mathbb{R},\quad x\mapsto \beta 1\{x=\alpha\} $$ Let $P$ be the Lebesgue measure on $[0,1]$. For any $g,h\in\mathcal{G}$, it holds that $g=h$, $P$-almost surely, and hence $\|g-h\|_p=0$. So, for any $\delta>0$, $$ N(\delta,\mathcal{G},P)=1 $$ At the same time, suppose we have $n=1$ and we observe $x_1$. The empirical distance between elements of $\mathcal{G}$ with $\alpha=x_1$ is unbounded: $$ \|g_{x_1,\beta_1}-g_{x_1,\beta_2}\|_{P_1}=|\beta_1 1\{x_1=x_1\}-\beta_2 1\{x_1=x_1\}|=|\beta_1-\beta_2| $$ Since $\beta$ ranges over $(0,\infty)$, no finite subset of $\mathcal{G}$ can be a $\delta$-cover. So, $$ N(\delta,\mathcal{G},P_1)=+\infty,\quad\text{$P$-almost surely} $$ The same holds for any $n$: the observations are $P$-almost surely distinct, so $\|g_{x_1,\beta_1}-g_{x_1,\beta_2}\|_{P_n}=n^{-1/p}|\beta_1-\beta_2|$, which is again unbounded in $\beta_1,\beta_2$. Hence $$ N(\delta,\mathcal{G},P_n)=+\infty,\quad\text{$P$-almost surely} $$
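The distance computation in Case 2 can be sketched in Python as follows. The sample values are hypothetical distinct observations, and the helper names `g` and `empirical_lp_dist` are assumptions introduced for illustration. The sketch checks that two spikes at an observed point are $n^{-1/p}|\beta_1-\beta_2|$ apart under $P_n$, while two spikes at an unobserved point are at empirical distance zero.

```python
def g(alpha, beta):
    """g_{alpha,beta}(x) = beta * 1{x == alpha}."""
    return lambda x: beta if x == alpha else 0.0


def empirical_lp_dist(f, h, sample, p=2.0):
    """||f - h||_{P_n} = (1/n * sum_i |f(x_i) - h(x_i)|^p)^(1/p)."""
    n = len(sample)
    return (sum(abs(f(x) - h(x)) ** p for x in sample) / n) ** (1.0 / p)


sample = [0.25, 0.5, 0.75, 0.9]   # hypothetical distinct observations, n = 4
x1 = sample[0]
p = 2.0

# Spikes located at an observed point: distance n^(-1/p) * |beta1 - beta2|
d_observed = empirical_lp_dist(g(x1, 1.0), g(x1, 5.0), sample, p)
predicted = len(sample) ** (-1.0 / p) * abs(1.0 - 5.0)

# Spikes located at an unobserved point: invisible to P_n
d_unobserved = empirical_lp_dist(g(0.3, 1.0), g(0.3, 100.0), sample, p)

print(d_observed, predicted, d_unobserved)
```

Because `d_observed` grows linearly in $|\beta_1-\beta_2|$, one can place arbitrarily many functions $g_{x_1,\beta}$ at pairwise empirical distance at least $2\delta$, and no two of them can share a center of a $\delta$-cover.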

Idontgetit