I am having some trouble understanding the notion of the VC dimension. The definition I have is the following:

The VC dimension of a set of hypothesis functions $H$ is the cardinality of the largest set which $H$ can shatter. We say that $H$ shatters a set $S \subseteq \mathcal{X}$ if every labeling of $S$ can be realised by some function from $H$.

To me, this means that for any (countable) set of labels $L \subseteq \mathbb{Z}$, there is some $h \in H$ that provides a surjection $h: S \to T$, where $T \subseteq \mathcal{X}$ with $|T| = |L| \leq |S|$. (There is then a trivial bijection from $T$ to $L$.)

Is this the correct interpretation?

I ask because the above interpretation leads me to think that a singleton set $H = \{h\}$ has $VC(H) = 1$: it always sends a single point to a single point, so we can get any labelling we like on this single point by a trivial bijection.

That is, suppose we want to label the point $x \in \mathbb{R}$ as $l$. Then $h: y \in \mathbb{R} \mapsto 0$ will do, since we can just change the name of 0 to $l$. Note that this chosen $h$ does not shatter a two-point subset of $\mathbb{R}$, since we can only label the points in one distinct way.

However, I have read that the VC dimension of a singleton hypothesis set is 0. I don't understand this.

(I see hypothesis functions $g: \mathcal{X} \to \{1, \dots, n\}$ as belonging to an equivalence class of functions that sends $\mathcal{X}$ to any set of size $n$. Please correct me if this is the wrong intuition.)

user27182

2 Answers

A family of hypothesis functions on domain $\cal X$ is a subset of $\{0,1\}^{\cal X}$. A family $H$ shatters a set $S \subseteq \cal X$ if for every subset $T \subseteq S$ there exists a function $h \in H$ such that $h(s) = 1_{s \in T}$ for all $s \in S$, that is, $h(s) = 1$ if $s \in T$ and $h(s) = 0$ if $s \in S \setminus T$.

A singleton family $H$ cannot shatter any non-empty set $S$. Indeed, suppose that $H = \{h\}$ shatters $S \neq \emptyset$, and take any $s \in S$. Taking $T = \emptyset$, we see that $h(s) = 0$, whereas taking $T = S$, we see that $h(s) = 1$. This is a contradiction, so the only set that $H$ shatters is the empty set, and the VC dimension of $H$ is 0.
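The definition can be checked by brute force on a small finite domain. The following is a minimal sketch, not a definitive implementation: the helper names `shatters` and `vc_dimension`, the three-point domain, and the constant hypothesis `constant_zero` are all assumptions made for illustration.

```python
from itertools import combinations, product


def shatters(hypotheses, points):
    """Return True if every 0/1 labeling of `points` is realised by some hypothesis."""
    realised = {tuple(h(x) for x in points) for h in hypotheses}
    return realised == set(product((0, 1), repeat=len(points)))


def vc_dimension(hypotheses, domain):
    """Size of the largest shattered subset of a finite `domain` (brute force)."""
    best = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(hypotheses, subset) for subset in combinations(domain, k)):
            best = k
        else:
            # Subsets of a shattered set are shattered, so no larger set can work.
            break
    return best


# Hypothetical singleton family H = {h}, with h constantly 0.
constant_zero = lambda x: 0
singleton_family = [constant_zero]
domain = [0, 1, 2]

empty_is_shattered = shatters(singleton_family, [])    # only the empty labeling is needed
point_is_shattered = shatters(singleton_family, [1])   # the labeling (1,) is missing
singleton_vc = vc_dimension(singleton_family, domain)
```

With a single hypothesis, only one labeling of any non-empty set is ever realised, so `point_is_shattered` is false and `singleton_vc` comes out as 0, matching the argument above.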

Yuval Filmus

I'm not sure what you mean when you say that the VC dimension of a singleton is 0; I have always thought it was 1. Here is how I think about it: given a set $H$ of classifiers, I pick a single point in the domain $\mathcal{X} = \mathbb{R}$, and an adversary labels it whichever way he wants. If, no matter how the adversary labels the point, one of my classifiers in $H$ classifies it correctly, then we say that $H$ shatters one point, so its VC dimension is at least 1.

So let the classifier set be $H = \{ h_i : i \in \mathbb{R} \}$, where $h_i(x) = 1$ if $x = i$ and $0$ otherwise. Say, for instance, I pick the point $x = 6$. If the adversary labels it 1, I can give him a classifier that labels this point 1 (namely $h_6$). If he labels it 0 instead, I give him any classifier other than $h_6$, which is sure to label it 0.

Now we check whether $H$ can shatter two points. If we pick two distinct points $x_i, x_j \in \mathbb{R}$, the adversary can simply label both with 1, and I am unable to find a classifier in $H$ that labels both with 1. Write $h_i$ and $h_j$ for the classifiers that are 1 exactly at $x_i$ and $x_j$, respectively. If I give him $h_i$, it labels $x_i$ with 1 and $x_j$ with 0. If I give him $h_j$, it labels $x_i$ with 0 and $x_j$ with 1. None of the classifiers in $H$ can produce the $(1,1)$ labelling, because each one labels exactly one point with 1. So $H$ shatters one point but not two, and its VC dimension is 1.
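The two-point argument can be replayed on a finite stand-in for $\mathbb{R}$. This is only a sketch under stated assumptions: the domain `range(5)`, the helper names `make_indicator` and `realised_labelings`, and the chosen test points are all hypothetical and picked for illustration.

```python
from itertools import product


def make_indicator(i):
    """Classifier h_i: labels x with 1 exactly when x == i."""
    return lambda x: 1 if x == i else 0


def realised_labelings(hypotheses, points):
    """Set of label tuples that the hypotheses produce on `points`."""
    return {tuple(h(x) for x in points) for h in hypotheses}


# Assumption: a five-point domain stands in for the real line.
domain = list(range(5))
H = [make_indicator(i) for i in domain]

# One point: both labels are available, so the point is shattered.
one_point = realised_labelings(H, [3])

# Two points: compare against all four possible labelings.
two_points = realised_labelings(H, [1, 3])
missing = set(product((0, 1), repeat=2)) - two_points
```

Here `one_point` contains both `(0,)` and `(1,)`, while `missing` contains only `(1, 1)`, the labelling the adversary uses to defeat every classifier in this family.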

M. Fire