
I am trying to solve the linear regression problem:

$$\displaystyle\arg\min_w\|y-Xw\|^2 $$

Setting the gradient of the objective to zero, I get the normal equations

$$\displaystyle X^TXw=X^Ty $$

In the case where $X^TX$ is invertible (i.e. the columns of $X$ are linearly independent), I can get the unique solution

$$\displaystyle w=(X^TX)^{-1}X^Ty $$

However, when the columns are linearly dependent, there is more than one solution.

Now, say I want to find a solution with minimal $\ell_2$ norm. I can define the new problem as:

$$\displaystyle\begin{align}\arg\min_w\ &\|w\| \\ \text{s.t. }\ &X^TXw=X^Ty \end{align}$$

How can I now use the SVD ($X=U\Sigma V^T$) to solve the above optimization problem?

Solving with the Lagrange method:

I tried optimizing the equivalent objective $0.5\|w\|^2$, and got the following Lagrangian:

$$ \mathcal{L}(w,\alpha)=0.5\|w\|^2+\alpha(X^Ty-X^TXw) $$

Setting the gradient w.r.t. $w$ to zero, I get:

$$w = \alpha X^TX\\ X^Ty=X^TXw $$

But I couldn't proceed from there.

  • Maybe you can write the KKT conditions. – LinAlg Jan 16 '17 at 15:29
  • @LinAlg Thanks, I added some more of what I tried. – user407363 Jan 16 '17 at 17:36
  • $\alpha$ is a vector, so in the Lagrangian you get $\alpha^T$, and finally you arrive at $w = X^TX \alpha$. Plugging this into the second equation you get $X^T y = X^TX X^T X \alpha$. Not sure how to proceed. – LinAlg Jan 16 '17 at 17:44

1 Answer


Let the SVD of $\mathrm X \in \mathbb R^{n \times p}$ be

$$\mathrm X = \mathrm U \Sigma \mathrm V^{\top} = \begin{bmatrix} \mathrm U_1 & \mathrm U_2\end{bmatrix} \begin{bmatrix} \Sigma_1 & \mathrm O\\ \mathrm O & \mathrm O\end{bmatrix} \begin{bmatrix} \mathrm V_1^{\top}\\ \mathrm V_2^{\top}\end{bmatrix}$$

where $r := \mbox{rank} (\mathrm X)$, $\Sigma_1 \in \mathbb R^{r \times r}$ is the diagonal block of positive singular values, and $\mathrm U_1 \in \mathbb R^{n \times r}$ and $\mathrm V_1 \in \mathbb R^{p \times r}$ contain the first $r$ columns of $\mathrm U$ and $\mathrm V$, respectively.

The eigendecomposition of $\mathrm X^{\top} \mathrm X$ is, thus,

$$\mathrm X^{\top} \mathrm X = \mathrm V \Sigma^{\top} \mathrm U^{\top} \mathrm U \Sigma \mathrm V^{\top} = \mathrm V \Sigma^{\top} \Sigma \mathrm V^{\top} = \begin{bmatrix} \mathrm V_1 & \mathrm V_2\end{bmatrix} \begin{bmatrix} \Sigma_1^2 & \mathrm O\\ \mathrm O & \mathrm O\end{bmatrix} \begin{bmatrix} \mathrm V_1^{\top}\\ \mathrm V_2^{\top}\end{bmatrix}$$
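
As a quick numerical sanity check of this identity, here is a sketch in NumPy (the rank-deficient matrix `X` below is an arbitrary example of mine, not taken from the question):

```python
import numpy as np

# Rank-deficient design matrix: the third column equals the first,
# so rank(X) = 2 while p = 3.
X = np.array([[1., 2., 1.],
              [3., 4., 3.],
              [5., 6., 5.],
              [7., 8., 7.]])

U, s, Vt = np.linalg.svd(X)            # full SVD: U is n x n, Vt is p x p
Sigma = np.zeros_like(X)               # n x p matrix of zeros
Sigma[:len(s), :len(s)] = np.diag(s)   # embed the singular values

# X^T X should equal V (Sigma^T Sigma) V^T
print(np.allclose(X.T @ X, Vt.T @ Sigma.T @ Sigma @ Vt))   # True
```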

Hence, the normal equations

$$\boxed{\mathrm X^{\top} \mathrm X \, \mathrm w = \mathrm X^{\top} \mathrm y}$$

can be written as follows

$$\mathrm V \begin{bmatrix} \Sigma_1^2 & \mathrm O\\ \mathrm O & \mathrm O\end{bmatrix} \mathrm V^{\top} \mathrm w = \mathrm V \begin{bmatrix} \Sigma_1 & \mathrm O\\ \mathrm O & \mathrm O\end{bmatrix} \mathrm U^{\top} \mathrm y$$

Let $\mathrm z := \mathrm V^{\top} \mathrm w$. Left-multiplying by $\mathrm V^{\top}$,

$$\begin{bmatrix} \Sigma_1^2 & \mathrm O\\ \mathrm O & \mathrm O\end{bmatrix} \begin{bmatrix} \mathrm z_1\\ \mathrm z_2\end{bmatrix} = \begin{bmatrix} \Sigma_1 & \mathrm O\\ \mathrm O & \mathrm O\end{bmatrix} \begin{bmatrix} \mathrm U_1^{\top} \mathrm y\\ \mathrm U_2^{\top} \mathrm y\end{bmatrix}$$

Since $\Sigma_1 \in \mathbb R^{r \times r}$ is invertible, multiplying the first block row by $\Sigma_1^{-2}$ yields

$$\begin{bmatrix} \mathrm I_r & \mathrm O\\ \mathrm O & \mathrm O\end{bmatrix} \begin{bmatrix} \mathrm z_1\\ \mathrm z_2\end{bmatrix} = \begin{bmatrix} \Sigma_1^{-1} & \mathrm O\\ \mathrm O & \mathrm O\end{bmatrix} \begin{bmatrix} \mathrm U_1^{\top} \mathrm y\\ \mathrm U_2^{\top} \mathrm y\end{bmatrix} = \begin{bmatrix} \Sigma_1^{-1}\mathrm U_1^{\top} \mathrm y\\ \mathrm 0_{p-r}\end{bmatrix}$$

which always has a solution. Note that $\mathrm z_2$ is free. Since $\mathrm w = \mathrm V \mathrm z = \mathrm V_1 \mathrm z_1 + \mathrm V_2 \mathrm z_2$, the solution set of the normal equations is the $(p-r)$-dimensional affine space parameterized as follows

$$\left\{ \mathrm V_1 \Sigma_1^{-1} \mathrm U_1^{\top} \mathrm y + \mathrm V_2 \eta \mid \eta \in \mathbb R^{p - r} \right\}$$
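
A sketch checking this parameterization numerically (self-contained, repeating the example above; the rank tolerance `1e-10` is my own choice):

```python
import numpy as np

# Same arbitrary rank-deficient example: rank(X) = 2, p = 3
X = np.array([[1., 2., 1.],
              [3., 4., 3.],
              [5., 6., 5.],
              [7., 8., 7.]])
y = np.array([1., 2., 3., 4.])

U, s, Vt = np.linalg.svd(X)
r = int(np.sum(s > 1e-10))                  # numerical rank of X
U1, V1, V2 = U[:, :r], Vt.T[:, :r], Vt.T[:, r:]

w_star = V1 @ ((U1.T @ y) / s[:r])          # V1 Sigma1^{-1} U1^T y

# Any eta in R^{p-r} yields another solution of the normal equations
eta = np.random.randn(X.shape[1] - r)
w = w_star + V2 @ eta
print(np.allclose(X.T @ X @ w, X.T @ y))    # True
```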

Note that

$$\| \mathrm V_1 \Sigma_1^{-1} \mathrm U_1^{\top} \mathrm y + \mathrm V_2 \eta \|_2^2 = \Bigg\| \mathrm V \begin{bmatrix} \Sigma_1^{-1}\mathrm U_1^{\top} \mathrm y\\ \eta\end{bmatrix} \Bigg\|_2^2 = \Bigg\| \begin{bmatrix} \Sigma_1^{-1}\mathrm U_1^{\top} \mathrm y\\ \eta\end{bmatrix} \Bigg\|_2^2 = \| \Sigma_1^{-1} \mathrm U_1^{\top} \mathrm y \|_2^2 + \| \eta \|_2^2$$

which is minimized when $\eta = 0_{p-r}$. Thus, the least-norm solution is simply

$$\boxed{\mathrm w^* := \mathrm V_1 \Sigma_1^{-1} \mathrm U_1^{\top} \mathrm y}$$
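
This is exactly the Moore-Penrose pseudoinverse applied to $\mathrm y$. A final sketch (self-contained as before): NumPy's `pinv`, and `lstsq` for rank-deficient problems, both return this minimum-norm solution.

```python
import numpy as np

# Same arbitrary rank-deficient example as above
X = np.array([[1., 2., 1.],
              [3., 4., 3.],
              [5., 6., 5.],
              [7., 8., 7.]])
y = np.array([1., 2., 3., 4.])

U, s, Vt = np.linalg.svd(X)
r = int(np.sum(s > 1e-10))                  # numerical rank of X
U1, V1 = U[:, :r], Vt.T[:, :r]

w_star = V1 @ ((U1.T @ y) / s[:r])          # w* = V1 Sigma1^{-1} U1^T y

print(np.allclose(w_star, np.linalg.pinv(X) @ y))   # True
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w_star, w_lstsq))                 # True
```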

  • Thanks so much! Can you please be more specific about the dimensions of $U_1$, $U_2$, $V_1$, $V_2$, $\Sigma_1$, etc.? It got me a bit confused. – user407363 Jan 18 '17 at 09:12
  • OK, I think I've got it. $U_1$ is the first $r$ columns of $U$, $U_2$ being the rest; same for $V_1$, $V_2$. $\Sigma_1$ is the top-left $r \times r$ diagonal block of $\Sigma$. It seems to work out now. Thanks again. – user407363 Jan 18 '17 at 09:25