
Note: Also asked on Statistics Stack since I did not get an answer here.

I am trying to understand a paper about regularization in non-parametric regression and I am struggling to understand the RKHS involved there (for reference: the $v ||f||_{\mathcal{H}}$ term in Eq. 5 of Nonparametric Sparsity and Regularization). I know non-parametric regression is more of a statistics topic, but I think RKHS fits in both stacks.

My understanding of RKHS is that there are two perspectives on it (feel free to correct me!):

1.) When given a Hilbert space on $\Omega$, denoted by $\mathcal{H}(\Omega)$, I can investigate whether there are kernels $K: \Omega \times \Omega \rightarrow \mathbb{R}$ satisfying the reproducing property, i.e. $\langle K_x, f \rangle = f(x)$. If I can find such kernels, I can say that my Hilbert space has reproducing kernels and call it an RKHS. The example I found in two places is the space of sequences with inner product $\langle (a_n)_{n \in \mathbb{N}}, (b_n)_{n \in \mathbb{N}} \rangle = \sum_{i = 1}^{\infty} a_i b_i$. Then $K_p(y) = 1_{y = p}$ leads to $\langle (a_n)_{n \in \mathbb{N}}, K_p \rangle = a_p$. So the space has reproducing kernels (compare Stack: Calculating norm in RKHS and Wikipedia: RKHS Examples). (A small numerical sketch of this example follows after this list.)

2.) Again I am given a Hilbert space which is not necessarily an RKHS, e.g. $L^2(\Omega)$. Then I choose a kernel a priori, e.g. the Gaussian kernel, and filter out all the functions in my Hilbert space for which the reproducing property cannot be fulfilled. What I am left with are functions which are linear combinations of my kernels, i.e. $\forall f \in \mathcal{H}_K$ I can write $f(x) = \sum_{i = 1}^{\infty} \alpha_i K_{x_i}(x)$ (Stack: Understanding RKHS spaces). This is the point where my questions come in...
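To make perspective 1.) concrete for myself, here is a minimal numerical sketch of the sequence-space example above, using a finite truncation of the sequences (the truncation length, the random sequence, and the use of NumPy are my own illustrative choices):

```python
# Finite truncation of the sequence space with the standard l^2
# inner product <a, b> = sum_i a_i * b_i. The kernel K_p is the
# indicator sequence with a 1 in position p and 0 elsewhere.
import numpy as np

n = 10                       # truncation length (illustrative)
rng = np.random.default_rng(0)
a = rng.standard_normal(n)   # an arbitrary (truncated) sequence

p = 3
K_p = np.zeros(n)
K_p[p] = 1.0                 # K_p(y) = 1_{y = p}

# Reproducing property: <a, K_p> recovers the p-th entry of a.
assert np.isclose(np.dot(a, K_p), a[p])
```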

Here are my questions:

a) Coming back to the expression $v ||f||_{\mathcal{H}}$, which I first interpreted as an 'RKHS-norm': is my understanding correct that the norm here is the same one I started with when I just had a Hilbert space (and had not yet restricted it to contain only linear combinations of the kernel)? The only difference is that the functions I can plug into the norm now have a specific form, i.e. linear combinations of the kernel. So there is nothing like an 'RKHS-norm'; it is just a norm on a given Hilbert space, but with the functions restricted.

b) Now regarding regularization and so-called RKHS regression: is my understanding correct that a term like $v ||f||_{\mathcal{H}}$ can be chosen as a penalty in non-parametric regression because, when you estimate a function in RKHS regression, you write your function as a linear combination of the kernel with parameter vector $\alpha$, and then $||f||_{\mathcal{H}} = \sqrt{\langle f, f \rangle}$ is proportional to a norm of the parameter vector $\alpha$? The vector norm is then 'induced' by the norm I choose for my Hilbert space, e.g. if I choose the $L^2$ norm on my Hilbert space, $||f||_{\mathcal{H}}$ will be proportional to $||\alpha||_2$.

Are these statements correct?


1 Answer


Here is an attempt at an answer; feel free to ask for clarifications:

1.) When given a Hilbert space on $\Omega$, denoted by $\mathcal{H}(\Omega)$, I can investigate whether there are kernels $K: \Omega \times \Omega \rightarrow \mathbb{R}$ satisfying the reproducing property, i.e. $\langle K_x, f \rangle = f(x)$. If I can find such kernels, I can say that my Hilbert space has reproducing kernels and call it an RKHS. The example I found in two places is the space of sequences with inner product $\langle (a_n)_{n \in \mathbb{N}}, (b_n)_{n \in \mathbb{N}} \rangle = \sum_{i = 1}^{\infty} a_i b_i$. Then $K_p(y) = 1_{y = p}$ leads to $\langle (a_n)_{n \in \mathbb{N}}, K_p \rangle = a_p$. So the space has reproducing kernels (compare Stack: Calculating norm in RKHS and Wikipedia: RKHS Examples).

This is mostly correct, but not quite right: first, note that you don't investigate whether there are "kernels" but whether there exists one kernel, since by the Moore–Aronszajn theorem, if there exists a kernel with the reproducing property on $\mathcal H(\Omega)$, it is the unique kernel with that property. Second, given an arbitrary Hilbert space, if you want to check whether it is an RKHS, you do not "look for kernels"; rather, you check whether the pointwise evaluation $\delta_x : f\mapsto f(x)$ is continuous for all $x\in\Omega$. If it is, then the kernel you're looking for is given by the Riesz representation theorem.
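For a concrete illustration of why this check can fail (a standard textbook example, not taken from the paper): on $L^2([0,1])$, consider the hat functions $f_n(x) = \max(0, 1 - nx)$. Then $$\|f_n\|_{L^2}^2 = \int_0^{1/n} (1 - nx)^2 \, dx = \frac{1}{3n} \longrightarrow 0 \quad \text{while} \quad f_n(0) = 1 \text{ for all } n,$$ so $\delta_0$ is unbounded and $L^2([0,1])$ is not an RKHS (indeed, pointwise evaluation is not even well defined on $L^2$ equivalence classes).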

2.) Again I am given a Hilbert space which is not necessarily an RKHS, e.g. $L^2(\Omega)$. Then I choose a kernel a priori, e.g. the Gaussian kernel, and filter out all the functions in my Hilbert space for which the reproducing property cannot be fulfilled. What I am left with are functions which are linear combinations of my kernels, i.e. $\forall f \in \mathcal{H}_K$ I can write $f(x) = \sum_{i = 1}^{\infty} \alpha_i K_{x_i}(x)$ (Stack: Understanding RKHS spaces).

No, this "filtering" operation doesn't make much sense to me: remember that if a kernel $K$ has the reproducing property, it reproduces itself, i.e. $K(x,y) = \langle K(x,\cdot), K(y,\cdot)\rangle_{\mathcal H(\Omega)} $ for all $x,y\in\Omega$, but if $\mathcal H(\Omega)$ is arbitrary this will fail for many (almost all, actually) kernels, so basically you will have filtered out $K$ itself! If you want to find the RKHS associated with a kernel $K$ defined on $\Omega\times\Omega$, the only option (literally the only option, as the RKHS is unique) is to take the completion of $\text{span}\{K(\cdot, x) \mid x \in\Omega\} $ with respect to the norm induced by the inner product $$\left\langle \sum_{i=1}^n\alpha_i K(\cdot, {x_i}), \sum_{j=1}^m \beta_j K(\cdot,{y_j}) \right\rangle_K := \sum_{i=1}^n\sum_{j=1}^m \alpha_i\beta_j K(x_i,y_j)$$
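To illustrate this inner product numerically, here is a small sketch for two finite kernel expansions $f = \sum_i \alpha_i K(\cdot, x_i)$ and $g = \sum_j \beta_j K(\cdot, y_j)$; the Gaussian kernel and the particular points and coefficients are illustrative choices of mine, not taken from the paper:

```python
# <f, g>_K for finite expansions, computed as alpha^T C beta where
# C is the cross-kernel matrix with entries C_{ij} = K(x_i, y_j).
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-(x - y)^2 / (2 sigma^2)), applied elementwise."""
    return np.exp(-(x - y) ** 2 / (2 * sigma ** 2))

def inner_K(alpha, xs, beta, ys, kernel=gaussian_kernel):
    """<f, g>_K = sum_{i,j} alpha_i beta_j K(x_i, y_j)."""
    cross = kernel(xs[:, None], ys[None, :])  # C_{ij} = K(x_i, y_j)
    return alpha @ cross @ beta

xs, alpha = np.array([0.0, 1.0, 2.0]), np.array([1.0, -0.5, 0.3])
ys, beta = np.array([0.5, 1.5]), np.array([0.2, 0.7])
print(inner_K(alpha, xs, beta, ys))
```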

a) Coming back to the expression $v ||f||_{\mathcal{H}}$, which I first interpreted as a 'RKHS-norm'. Is my understanding correct, that the norm here is same I start with, when I just have a Hilbert space (and did not restrict it to only contain the linear combinations of the kernel)? The only difference is that the functions I can plug into the norm have a specific form now, i.e. lin. comb. of the kernel. So there is nothing like a 'RKHS-norm', it is just any norm on a given Hilbert space but the functions are restricted.

Yes, this is true: if you have a Hilbert space $\mathcal H$ (so with inner product $\langle\cdot,\cdot\rangle_{\mathcal H}$) which happens to be an RKHS with kernel $K$, then every $f\in\mathcal H$ is of the form $f = \sum_{i=1}^\infty f_i K(\cdot,x_i)$ (the limit here is understood with respect to the $\|\cdot\|_K$ norm defined above), and we have two equivalent representations of the norm: $$\|f\|_{\mathcal H}^2 = \left\|\sum_{i=1}^\infty f_i K(\cdot,x_i)\right\|_{\mathcal H}^2 = \left\|\sum_{i=1}^\infty f_i K(\cdot,x_i)\right\|_K^2 = \sum_{i,j=1}^\infty f_if_j K(x_i,x_j). $$ Regarding your last sentence: yes, it's true that the two norms are the same, but the power of this equivalence is that in practical applications of RKHS methods we often deal with functions of the form $f =\sum_{i=1}^n f_i K(\cdot,x_i)$, in which case the second representation is much more convenient computationally. (Also note that, generally speaking, there are infinitely many possible representations of an RKHS, for instance using different feature maps, orthogonal bases, etc.)
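One setting where both representations can be computed explicitly is the (truncated) sequence space from your question, where $\|\cdot\|_{\mathcal H}$ is the $\ell^2$ norm and $K_p$ is the indicator sequence. This small sketch (truncation length, indices, and coefficients are my own illustrative choices) checks that $\|f\|_{\mathcal H}^2$ and $\sum_{i,j} f_i f_j K(p_i, p_j)$ agree:

```python
# In the sequence space, f = sum_i f_i K_{p_i} has H-norm equal to the
# l^2 norm of its entries, while the kernel representation of the norm
# is the quadratic form sum_{i,j} f_i f_j K(p_i, p_j), K(p, q) = 1_{p = q}.
import numpy as np

n = 10
ps = np.array([1, 4, 7])              # expansion indices p_i
f_coef = np.array([0.7, -0.2, 1.1])   # coefficients f_i

# f as an element of the (truncated) sequence space:
f = np.zeros(n)
f[ps] = f_coef

norm_H_sq = np.dot(f, f)              # ||f||_H^2, the ambient l^2 norm

gram = (ps[:, None] == ps[None, :]).astype(float)  # K(p_i, p_j)
norm_K_sq = f_coef @ gram @ f_coef    # ||f||_K^2 via the kernel representation

assert np.isclose(norm_H_sq, norm_K_sq)
```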

b) Now regarding regularization and so-called RKHS regression: is my understanding correct that a term like $v ||f||_{\mathcal{H}}$ can be chosen as a penalty in non-parametric regression because, when you estimate a function in RKHS regression, you write your function as a linear combination of the kernel with parameter vector $\alpha$, and then $||f||_{\mathcal{H}} = \sqrt{\langle f, f \rangle}$ is proportional to a norm of the parameter vector $\alpha$? The vector norm is then 'induced' by the norm I choose for my Hilbert space, e.g. if I choose the $L^2$ norm on my Hilbert space, $||f||_{\mathcal{H}}$ will be proportional to $||\alpha||_2$.

I don't quite follow what you've written here, but going back to what I said above: if we know that our function is of the form $f =\sum_{i=1}^n \alpha_i K(\cdot,x_i)$ for some finite $n$, then we have that $$\|f\|_{\mathcal H}^2 = \|f\|_K^2 = \sum_{i=1}^n\sum_{j=1}^n \alpha_i\alpha_jK(x_i,x_j) = {\alpha}^T \tilde K \alpha $$ where $\alpha$ is the vector with entries $\alpha_1,\ldots,\alpha_n$ and $\tilde K$ is the $n\times n$ matrix with entries $\tilde K_{ij} = K(x_i,x_j)$. So in general, no, $\|f\|_{\mathcal H}$ will not be proportional to $\|\alpha\|_2$, but it is closely related.
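To see concretely that the penalty is a $\tilde K$-weighted quadratic form rather than a multiple of $\|\alpha\|_2^2$, here is a small sketch (again with a Gaussian kernel; the points and coefficients are my own illustrative choices):

```python
# Contrast the RKHS penalty alpha^T K~ alpha with the plain Euclidean
# norm ||alpha||_2^2 for a finite expansion f = sum_i alpha_i K(., x_i).
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-(x - y) ** 2 / (2 * sigma ** 2))

xs = np.array([0.0, 0.1, 2.0])          # expansion points x_1, ..., x_n
alpha = np.array([1.0, -1.0, 0.5])      # coefficient vector

K_tilde = gaussian_kernel(xs[:, None], xs[None, :])  # K~_{ij} = K(x_i, x_j)

rkhs_norm_sq = alpha @ K_tilde @ alpha  # ||f||_H^2 = alpha^T K~ alpha
eucl_norm_sq = alpha @ alpha            # ||alpha||_2^2

# The two disagree, and their ratio depends on the x_i, so the RKHS
# penalty is not a fixed multiple of ||alpha||_2^2.
print(rkhs_norm_sq, eucl_norm_sq)
```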

Hope this helps.

  • Thank you very much! I will work through it and maybe ask some questions afterwards if that is OK. For now, I just wanted to accept your answer. – Red Dec 22 '24 at 15:35