why $x = \mathbf A^{\dagger}b$ is the one that minimizes $|x|$ among all mimizers of $|\mathbf Ax - b|$

Question

for arbitrary matrix $\mathbf A\in \mathbb R^{m \times n}$ and $rank(\mathbf A) = r$, solve the least squares: $$\min \|\mathbf Ax - b\|_2. $$ According to SVD, pseudo inverse of $\mathbf A$ is $$\mathbf A^{\dagger}=\mathbf{V} \mathbf \Sigma^{\dagger} {\mathbf U^T}$$ where $$\mathbf \Sigma^{\dagger}= \begin{bmatrix} \mathbf \Sigma_{r \times r}^{-1} & \mathbf 0 \\ \mathbf 0 & \mathbf 0 \end{bmatrix}_{n \times m}.$$ I get $x = \mathbf A^{\dagger}b$ is the canonical solution $(A^TA)^{-1}A^Tb$ for linear least squares when $m=n=r$. But don't know why it is the solution that minimizes $|x|$.

My favorite definition of the pseudoinverse is that it is the mapping that takes a vector $b$ as input and returns as output the least norm solution to $Ax= \hat b$, where $\hat b$ is the projection of $b$ onto the range of $A$. — littleO, Dec 04 '18 at 06:14

score 3 · Accepted Answer · answered Dec 04 '18 at 06:37

The column space of $A$ is the same as the span of the first $r$ columns of $U$; let $U_r$ be this $m \times r$ matrix. So the projection of $b$ onto the column space of $A$ is $\hat{b} := U_r (U_r^\top U_r)^{-1} U_r^\top b = U_r U_r^\top b$.

If $x$ is a solution to the optimization problem, then $Ax = \hat{b}$.

Thus, we can consider a second optimization problem: minimize $\|x\|_2$ subject to $Ax = \hat{b}$.

First, we check that $x := A^\dagger b$ is feasible, i.e. $A A^\dagger b = \hat{b}$. $$AA^\dagger b = U \Sigma V^\top V \Sigma^\dagger U^\top = U \begin{bmatrix} I_{r \times r} \\ & 0_{m-r \times m-r}\end{bmatrix} U^\top b = U_r U_r^\top b = \hat{b}.$$

Next we justify that it is minimum norm. Note that all other feasible $x$ can be written as $x = A^\dagger b + z$ for some $z$ in the nullspace of $A$. Note that the nullspace of $A$ is the same as the span of the last $n - r$ columns of $V$. On the other hand, $A^\dagger b = V\Sigma^\dagger U b$ lies in the span of the first $r$ columns of $V$. Thus $A^\dagger b$ and $z$ are orthogonal and we have $$\|x\|_2^2 = \|A^\dagger b\|_2^2 + \|z\|_2^2.$$ Thus choosing $z = 0$ minimizes $\|x\|_2$, so $A^\dagger b$ is the minimum norm solution.

why $x = \mathbf A^{\dagger}b$ is the one that minimizes $|x|$ among all mimizers of $|\mathbf Ax - b|$

1 Answers1