0

Consider a real normed vector space $X$ and a collection of linearly independent vectors $x_i \in X$ for $i = 1, 2, \ldots, n$. For a fixed $y \in X$, show that:

$\inf_{\alpha_i \in \mathbb{R},\, i=1,\ldots,n} \left\|y-\sum_{i=1}^{n}\alpha_i x_i\right\|$ is attained as a minimum.

This is from an engineering math course, and we're going over function fitting. First off, I know that infimum and minimum are not the same thing, but the way the problem is posed seems to imply that there is an absolute minimum and that it is equal to the infimum of the set. It may be that I'm not understanding his notation.

Conceptually, as I see it, if $X$ were the space of continuous functions, and we wanted to fit a polynomial to that continuous function, we certainly would want to minimize the norm difference between our function, $y$, and the fit polynomial $\sum_{i=1}^{n}\alpha_ix_i$. In this case the $x_i$ could be a basis of $n$th order polynomials. Hence, if we find a minimum norm value and the vector $\sum_{i=1}^{n}\alpha_ix_i$ that generates it, then we've found a "best" fit approximation for $y$.
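This polynomial-fitting picture can be sketched numerically. Below, $y$ is a sampled continuous function and the $x_i$ are monomials; the function `cos`, the sample grid, and the quadratic basis are all illustrative choices of mine, not part of the problem:

```python
import numpy as np

# Sample y = cos(t) at 50 points on [0, 1].
t = np.linspace(0.0, 1.0, 50)
y = np.cos(t)

# Columns x_i are the (linearly independent) monomials 1, t, t^2 sampled at t.
X = np.column_stack([t**0, t**1, t**2])

# Least squares picks the alpha_i minimising ||y - sum_i alpha_i x_i||_2.
alpha, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = np.linalg.norm(y - X @ alpha)
print("coefficients:", alpha, "residual norm:", residual)
```

The residual norm is the minimum value whose existence the exercise asks you to prove; here it is small but nonzero, matching the comment below that the infimum need not be $0$.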

My first thought to go about showing a minimum exists was to say that we could simply go through every possible $\sum_{i=1}^{n}\alpha_ix_i$, evaluate the norm with $y$, and then find one particular selection of $x_i$ and $\alpha_i$ that generates the minimum. However, since the $\alpha_i$ can be any numbers, I can't "check" every possible fit function. So this won't work.

Another small note I made is that $0$ is always a lower bound, by the properties of norms, but I'm unsure how to refine the minimum value from that point.

  • Aren't norms continuous? So close to zero, your norm is arbitrarily close to zero. – Sean Nemetz Oct 18 '17 at 05:00
  • I know the absolute minimum value it can obtain is zero, but I don't think I'm guaranteed that there exists a selection of $x_i$ and $\alpha_i$ to make the norm difference zero in all cases. – Leif Ericson Oct 18 '17 at 05:07
  • Could calculate the difference from $y$ to the space spanned by the "$x_i$"s. I think it's called the "perpendicular distance" from $y$ to this space. – Sean Nemetz Oct 18 '17 at 05:11
  • I believe an infimum is guaranteed here – Sean Nemetz Oct 18 '17 at 05:12
  • Look here maybe: https://math.stackexchange.com/questions/112728/how-do-i-exactly-project-a-vector-onto-a-subspace – Sean Nemetz Oct 18 '17 at 05:22

5 Answers

1

Here is one answer:

If a matrix $X$ has linearly independent columns, then there is some $k>0$ such that $\|Xz \| \ge k \|z\|$ for all $z$.

Let $f(\alpha) = \|X \alpha -y\|$, where $X$ is the matrix composed of the vectors $x_k$. Note that $f(\alpha) \ge 0$ and let $m = \inf_\alpha f(\alpha)$. We have $f(\alpha) \ge \|X \alpha \| - \|y\| \ge k \|\alpha \| - \|y\|$.

In particular, if $\| \alpha \| > {1 \over k} (m+1+\|y\|)$ we have $f(\alpha) > m+1$, and so the set $\{ \alpha \mid f(\alpha) \le m+1 \}$ is non-empty, closed, and bounded, hence compact; therefore $f$ attains a minimiser $\alpha_0$ on it, with $f(\alpha_0) = m$.
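For the Euclidean norm, the constant $k$ in the inequality $\|Xz\| \ge k\|z\|$ can be taken to be the smallest singular value of $X$, which is strictly positive exactly when the columns are linearly independent. A quick numerical spot-check (the random matrix is illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))  # 3 columns, linearly independent almost surely

# Smallest singular value: the best constant k with ||Xz|| >= k ||z||.
k = np.linalg.svd(X, compute_uv=False).min()
assert k > 0  # confirms the columns are linearly independent

# Spot-check the inequality on random vectors z.
for _ in range(100):
    z = rng.standard_normal(3)
    assert np.linalg.norm(X @ z) >= k * np.linalg.norm(z) - 1e-12
print("k =", k)
```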

Here is a more satisfactory answer, but it relies on the result that says that a differentiable convex function $f$ is minimised at a point $\alpha^*$ iff ${\partial f( \alpha^*) \over \partial \alpha} = 0$.

Let $\phi(\alpha) = {1 \over 2} \|X\alpha -y \|^2$ (the ${1 \over 2}$ is a conventional convenience; the square makes $\phi$ differentiable everywhere). It should be clear that minimising $\phi$ is the same as minimising $f$.

We can write $\phi(\alpha) = {1 \over 2}\langle \alpha , X^T X \alpha \rangle - \langle X^T y, \alpha \rangle + {1 \over 2} \|y\|^2$ and note that since $X^T X$ is positive semi definite (in fact definite, in this case) that $\phi$ is convex. Furthermore, ${\partial \phi(\alpha) \over \partial \alpha} = \alpha^T X^T X -y^T X = (X^T(X\alpha-y))^T$.

Hence $\alpha^*$ minimises $\phi$ (and hence $f$) iff $X^T(X\alpha^*-y) = 0$ (these are the normal equations).

In particular, there is always a solution to the normal equations (for any matrix $X$, not just one with linearly independent columns), and if $X^T X$ is invertible (iff the columns are linearly independent) then there is a unique minimiser given by $\alpha^* = (X^TX)^{-1} X^T y$.

If the columns of $X$ are not linearly independent, a solution still exists but is not unique.
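The closed form $\alpha^* = (X^TX)^{-1} X^T y$ is easy to check numerically; here is a minimal sketch (the random data and variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 3))  # linearly independent columns (a.s.)
y = rng.standard_normal(6)

# Unique minimiser from the normal equations: alpha* = (X^T X)^{-1} X^T y.
alpha_star = np.linalg.solve(X.T @ X, X.T @ y)

# It satisfies the normal equations X^T (X alpha* - y) = 0 ...
assert np.allclose(X.T @ (X @ alpha_star - y), 0.0)

# ... and no perturbation of it does better, i.e. it is a minimiser.
f = lambda a: np.linalg.norm(X @ a - y)
for _ in range(100):
    assert f(alpha_star) <= f(alpha_star + 1e-3 * rng.standard_normal(3)) + 1e-12
print("alpha* =", alpha_star)
```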

copper.hat
  • 178,207
0

The infimum is taken over the set of $\alpha = (\alpha_1, \ldots, \alpha_n)$, which is essentially $\mathbb{R}^n$.

You can approach the problem as follows (no solution, only outline):

Let $f(\alpha)$ denote the function you want to minimise, and fix some $d_0 = f(\alpha_0)$.

Show that there is a bounded closed set $D$ of such $\alpha$ such that $\alpha\notin D\Rightarrow f(\alpha) > d_0$.

Now you can minimise $f$ over $D$, since $D$ is compact and $f$ is continuous. The minimum will be attained in the interior of $D$, since it is at most $d_0$, while near the boundary $f$ already exceeds that fixed value.

Thomas
  • 23,023
  • Given the level of the exercise I'm not sure they know or can use Weierstrass theorem in this general form – Tancredi Oct 18 '17 at 05:28
0

View your problem this way: you are searching for $\inf_{w\in W}\|y-w\| =: \inf_{w\in W}f(w)$, where $W$ is the subspace spanned by $\{x_i\}_{i=1}^n$ ($f(w)$ is the distance of $w$ from $y$). Now let $\delta:=\inf_{w\in W}\|y-w\|$ and take a sequence $(z_i)_{i\in\mathbb{N}}$ of points of $W$ such that $f(z_i)\rightarrow\delta$. It is easy to see that this sequence is bounded, so you can extract a convergent subsequence $z_{i_k}\rightarrow \overline{z}$. But $W$ is finite-dimensional, hence closed, so $\overline{z}\in W$, and we conclude that $f(\overline{z})=\lim_{k\rightarrow\infty}f(z_{i_k})=\lim_{i\rightarrow\infty}f(z_i)=\delta$ by the continuity of the distance (and hence of $f$).

Tancredi
  • 1,562
0

Another way (much simpler if you like geometry), assuming $X$ carries an inner product, is to take the orthogonal complement $W^\bot$ of $W$ and its translate $S:=y+W^\bot$. Now $S\cap W=\{\overline{z}\}$ is the point of $W$ at minimum distance from $y$, the one you are searching for.
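In a Euclidean setting this geometric picture can be computed directly: the closest point is the orthogonal projection of $y$ onto $W$, and the residual $y - \overline{z}$ is perpendicular to every $x_i$. A small illustrative check (random data chosen by me):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 2))  # columns x_1, x_2 span W
y = rng.standard_normal(5)

# Orthogonal projection of y onto W = span(x_1, x_2), via least squares.
alpha, *_ = np.linalg.lstsq(X, y, rcond=None)
z_bar = X @ alpha

# The residual y - z_bar lies in W^perp: orthogonal to each column of X.
assert np.allclose(X.T @ (y - z_bar), 0.0)

# z_bar is the nearest point of W to y -- random points of W do no better.
for _ in range(100):
    w = X @ rng.standard_normal(2)
    assert np.linalg.norm(y - z_bar) <= np.linalg.norm(y - w) + 1e-12
print("projection:", z_bar)
```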

Tancredi
  • 1,562
-1

Let $\delta=\inf \left\{\left\lVert{y-\sum_{i=1}^{n}\alpha_ix_i}\right\rVert\,;\, (\forall i=1,\dots,n)\,\alpha_i\in\mathbb{R}\right\}$ and $S=\text{span}(x_1,\dots,x_n)$.
Notice that $\lVert x-y\rVert=d(x,y)$. Then $\delta=\inf_{v\in S} d(y,v)=d(y,S).$


Proposition: Let $(M,d)$ be a metric space with the property that closed and bounded sets are compact (for instance, any finite dimensional, normed vector space). Let $K\subset M$ be compact and $C\subset M$ closed.

Then there are $x\in C$ and $y\in K$ with $d(C,K)=d(x,y)$.

Proof: Let $(x_n,y_n)\subset C\times K$ be a minimizing sequence for $d(C,K)$, say with $$d(x_n,y_n)<d(C,K)+\frac1n.\tag{*}$$

The existence of such a sequence is guaranteed by the infimum property -- it shows up in the definition of $d(C,K)=\inf_{x\in C,\,y\in K}d(x,y)$.

Because $K$ is compact, there is a subsequence $(x_{n_k},y_{n_k})$ such that $\lim_{k\to\infty}y_{n_k}=y\in K$. In particular, this implies that $\big(d(y_{n_k},y)\big)_{k\in\mathbb{N}}$ is bounded, say by $M\geq 0$. Then:

$$d(x_{n_k},y)<d(x_{n_k},y_{n_k})+d(y_{n_k},y)<d(C,K)+(M+1)$$

With $R=d(C,K)+(M+1)$, it follows that $\big(x_{n_k}\big)_{k\in\mathbb{N}}$ lies in $\stackrel{\sim}{K}=\overline{B(R, y)}\cap C$.

As the intersection of two closed sets, $\stackrel{\sim}{K}$ is closed. Moreover, since $\overline{B(R, y)}$ is bounded, $\stackrel{\sim}{K}$ is also bounded. It follows that $\stackrel{\sim}{K}$ is compact.

Thus, there must be a subsequence $\left(x_{n_{k_j}},y_{n_{k_j}}\right)$ with $\lim_{j\to\infty} x_{n_{k_j}}=x\in\, \stackrel{\sim}{K}\subset C$.

Since $d:M\times M \longrightarrow \mathbb{R}$ is continuous, we have that

$$\lim_{j\to\infty} d\left(x_{n_{k_j}},y_{n_{k_j}}\right)=d(x,y)$$

Moreover, by the defining property $(*)$ of $(x_n,y_n)$, we must have $d(x,y)\leq d(C,K)$. Since $x\in C$ and $y\in K$, and $d(C,K)$ is the infimum over $x\in C$ and $y\in K$, it must be that $d(x,y)=d(C,K)$, which concludes our proof. $\square$.


To see that the proposition applies here, it suffices to consider $M=\text{span}(y,x_1,\dots,x_n)$, $d$ the metric induced by the norm of $X$, $K=\{y\}$ and $C=S$.

Fimpellizzeri
  • 23,321
  • First off, thanks for the reply. Seems Weierstrass is coming into play here. Contains a convergent subsequence, so $\alpha_n$ must be bounded. So $\alpha_n$ converges to an $\alpha$ that minimizes $f$? I have a little background in analysis so I can generally piece this together, but I'm curious now is there any other more intuitive way to approach this? I only ask since the class has no pre-requisites. Leads me to think that either the question does require these foundations, or there's some less foundational way to thinking about it. – Leif Ericson Oct 18 '17 at 05:46
  • Sorry, one other small question. We seem to only be minimizing over the $\alpha_i$ selection and not the $x_i$ selection. Is this because we can get rid of an $x_i$ by setting the corresponding $\alpha_i$ to 0? – Leif Ericson Oct 18 '17 at 05:54
  • Yes, the limit of the subsequence must minimize $f$ -- it must attain the infimum -- because of the inequality condition we used to define it.$${}$$ The 'intuitive' approach would be to visualize $S=\text{span}(x_1,\dots,x_n)$ as an $n$-hyperplane on $X$ and the minimum being the distance from $y$ to $S$ (which we generally picture in terms of orthogonal projections). Notice that $\{y\}$ is compact and $S$ is closed, so the distance is always realized as $d(y,s)$ for some $s\in S$. – Fimpellizzeri Oct 18 '17 at 06:33
  • In the answer, I'm minimizing over the $\alpha_i$ because that's what you specified the infimum was taken over, in the question, but you could minimize over $v \in S$ in much the same way -- this proof holds in great generality. All we're really need here is that $f$ be continuous and bounded from below. With these hypotheses, we're guaranteed the existence of a minimizing sequence and that this sequence lies in a compact subset of the domain, which in turn guarantees the existence of a convergent subsequence that by construction must converge to the infimum-now-minimum of $f$. – Fimpellizzeri Oct 18 '17 at 06:37
  • small proof of copper.hat's remark - the constant function $f=1$, which is continuous, has the whole space as the preimage of the compact set $\{1\}$. – Calvin Khor Oct 18 '17 at 15:13
  • @copper.hat Yes, this was a gross mistake I will correct. What follows is the closed-ness of $C$ which, by itself, does not solve the problem.. The fact that $f$ looks like 'distance to a point' is important here. – Fimpellizzeri Oct 18 '17 at 16:01
  • @LeifEricson Please do consider the updated answer, and see the comments as to why the previous answer was not correct. – Fimpellizzeri Oct 18 '17 at 16:52