Here we show that $x=A^+b$, where $A^+$ is the Moore-Penrose pseudoinverse of the matrix $A$, solves the ordinary least squares (OLS) problem:
$$x=\operatorname{arg\,min}\|Ax-b\|_2$$
where $A\in M(\mathbb{C},m, n)$ and $b\in\mathbb{C}^n$, and that if $x'$ is any other solution, then $\|A^+b\|_2\leq\|x'\|_2$ (this is one of the properties that makes $A^+$ special).
Notice that if $b\in \operatorname{range}(A)$, then $x_*=A^+b$ is an exact solution to $Ax=b$, and among all solutions it has the smallest $\|\cdot\|_2$-norm.
First, some linear algebra facts:
From elementary linear algebra, we know that if $R$ is a linear subspace of $\mathbb{C}^n$, then
- (A) The orthogonal projection $P_R:\mathbb{C}^n\rightarrow R$, which assigns to each $\mathbf{y}\in \mathbb{C}^n$ the vector $P_R\mathbf{y}\in R$ such that
$$\|P_R\mathbf{y}-\mathbf{y}\|_2\leq \|\mathbf{z}-\mathbf{y}\|_2\qquad\text{for all }\mathbf{z}\in R$$
exists and is unique.
- (B) Furthermore, for each $\mathbf{y}\in\mathbb{C}^n$, $P_R\mathbf{y}$ is the unique vector that satisfies
$$\mathbf{y}- P_R\mathbf{y}\perp R$$
that is, $\langle \mathbf{y}- P_R\mathbf{y},\mathbf{z}\rangle=0$ for all $\mathbf{z}\in R$ where $\langle\cdot,\cdot\rangle$ is the standard inner product in $\mathbb{C}^n$ (hence the name orthogonal projection).
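As a concrete illustration of (A) and (B), here is a minimal numerical sketch, assuming numpy is available; the subspace $R$ is taken to be the column space of a made-up matrix $B$:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 2))    # columns of B span the subspace R
y = rng.standard_normal(5)

# P_R y = B w, where w minimizes ||B w - y||_2
w, *_ = np.linalg.lstsq(B, y, rcond=None)
proj = B @ w

# (B): the residual y - P_R y is orthogonal to every generator of R
print(B.T @ (y - proj))            # ~ [0, 0], up to rounding error
```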
Existence of solutions to the OLS problem:
Consider the linear subspace $R:=\{A\mathbf{w}:\mathbf{w}\in\mathbb{C}^m\}$, that is, the range of $A$. The geometric digression made above shows that, for each $\mathbf{y}\in\mathbb{C}^n$ (in the OLS problem, $\mathbf{y}=b$), there exists a unique vector $\widehat{\mathbf{y}}\in R$ such that
$$\widehat{\mathbf{y}}=\operatorname{arg\,min}_{\mathbf{z}\in R}\|\mathbf{y}-\mathbf{z}\|_2$$
namely the orthogonal projection of $\mathbf{y}$ onto $R$, $\widehat{\mathbf{y}}=P_R\mathbf{y}$. In particular, this shows that there is at least one $\boldsymbol{\omega}\in\mathbb{C}^m$ such that $\widehat{\mathbf{y}}=A\boldsymbol{\omega}$. In fact, for any solution $\mathbf{w}_H$ to the homogeneous equation
$A\mathbf{w}=\mathbf{0}$ we have that $\widehat{\mathbf{y}}=A(\boldsymbol{\omega}+\mathbf{w}_H)$.
The problem is now reduced to finding an $\boldsymbol{\omega}\in \mathbb{C}^m$ for which
$$\begin{align}
\widehat{\mathbf{y}}=A\boldsymbol{\omega}\tag{1}\label{one}
\end{align}$$
Now, as the range of the linear transformation $\mathbf{w}\mapsto A\mathbf{w}$ is spanned by the columns of $A$, and since $\langle \mathbf{y}-A\boldsymbol{\omega},A\mathbf{w}\rangle=\langle A^*(\mathbf{y}-A\boldsymbol{\omega}),\mathbf{w}\rangle$ for all $\mathbf{w}$, statement (B) shows that $\widehat{\mathbf{y}}=A\boldsymbol{\omega}$ is the unique vector in $\mathbb{C}^n$ satisfying
$$\begin{align}
A^*(\mathbf{y}-A\boldsymbol{\omega})=\mathbf{0}\tag{2}\label{two}
\end{align}$$
where $A^*$ is the conjugate transpose of $A$ (when $A\in M(\mathbb{R},m, n)$, then $A^*=A^\intercal$). Equivalently, any $\boldsymbol{\omega}\in\mathbb{C}^m$ satisfying \eqref{one} (which we have already established exists) satisfies the equation
$$\begin{align}
A^* A\boldsymbol{\omega}=A^*\mathbf{y}\tag{2'}\label{twop}
\end{align}$$
If the rank of $A$ is $m$ (i.e. $n\geq m$ and the columns of $A$ are linearly independent), then $A^*A$ is an $m\times m$ invertible matrix and so there is a unique solution
$$\boldsymbol{\omega}=(A^* A)^{-1}A^*\mathbf{y}$$
Conversely, if $A^* A$ is invertible, then the rank of $A$ is $m$.
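A minimal sketch of this full-column-rank case, assuming numpy and made-up data: solving the normal equations \eqref{twop} directly reproduces what a standard least-squares solver returns.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))    # independent columns (almost surely), so A*A is invertible
y = rng.standard_normal(6)

# unique solution of (2'): omega = (A* A)^{-1} A* y
omega = np.linalg.solve(A.T @ A, A.T @ y)

# agrees with a standard least-squares solver
omega_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(omega, omega_ls))   # True
```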
In any other case, there will be infinitely many solutions to \eqref{two} (still, $\widehat{\mathbf{y}}=P_R\mathbf{y}$ is unique), since then $\operatorname{rank}(A^* A)<m$, and so the homogeneous equation $A^* A\mathbf{w}=\mathbf{0}$, and thus the homogeneous equation $A\mathbf{w}=\mathbf{0}$, will have infinitely many solutions.
Solutions to \eqref{twop} with different properties (minimal norm, for example) can be obtained by appealing to generalized inverse matrices.
Minimal solution to the OLS problem and the Moore-Penrose pseudoinverse:
Recall that for any $A\in M(\mathbb{C},m, n)$, a matrix $X\in M(\mathbb{C}, n, m)$ is a generalized inverse matrix of $A$ if
$$AXA=A$$
Generalized inverse matrices always exist and, when $A$ is invertible, there is only one such matrix, namely $X=A^{-1}$.
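For instance (a toy example): with $A=\operatorname{diag}(1,0)$, every matrix of the form $X=\begin{pmatrix}1 & a\\ b & c\end{pmatrix}$ satisfies $AXA=A$, so generalized inverses are far from unique. A quick check, assuming numpy:

```python
import numpy as np

A = np.diag([1.0, 0.0])
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])                # a generalized inverse of A, but not A^+

print(np.allclose(A @ X @ A, A))          # True: AXA = A
print(np.allclose(X, np.linalg.pinv(A)))  # False: X is not the Moore-Penrose inverse
```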
Amongst all generalized inverse matrices of a matrix $A$, there is a very special one, denoted by $A^+$, which satisfies
$$\begin{align}
AXA&=A\\
XAX&=X\\
(AX)^*&=AX\\
(XA)^*&=XA\end{align}$$
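These four conditions are easy to check numerically; a short sketch using numpy's pinv (the rank-deficient matrix below is just an arbitrary example):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])       # rank 1, so A has no ordinary inverse
X = np.linalg.pinv(A)

print(np.allclose(A @ X @ A, A))                # AXA = A
print(np.allclose(X @ A @ X, X))                # XAX = X
print(np.allclose((A @ X).conj().T, A @ X))     # (AX)* = AX
print(np.allclose((X @ A).conj().T, X @ A))     # (XA)* = XA
```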
In terms of the singular value decomposition (SVD), if
$A= VDU$, where $V$ and $U$ are unitary matrices and $D$ has zero entries except on the main diagonal (which contains the singular values of $A$, that is, the nonnegative square roots of the eigenvalues of $A^*A$), then it is easy to see that $U^*D^+V^*$ is the Moore-Penrose pseudoinverse $A^+$ of $A$. It can also be shown that
$$A^+=(A^* A)^+A^*$$
There are different methods to show that $A^+$ exists and is unique; this is the Moore-Penrose inverse of $A$. The most efficient way to compute $A^+$ is through eigenvalue methods, for example the singular value decomposition. Many computer packages (R, Octave, etc.) compute $A^+$ this way (usually provided as pinv(A)).
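For concreteness, here is a sketch of that computation with numpy (made-up data). Note that numpy factors $A$ as a product whose first factor plays the role of $V$ above and whose last plays the role of $U$:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

# numpy returns A = W @ diag(s) @ Uh, i.e. W plays the role of V above
W, s, Uh = np.linalg.svd(A, full_matrices=False)
D_plus = np.diag(np.where(s > 1e-12, 1.0 / s, 0.0))  # invert only nonzero singular values

A_plus = Uh.conj().T @ D_plus @ W.conj().T           # A^+ = U* D^+ V*
print(np.allclose(A_plus, np.linalg.pinv(A)))        # True
```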
The most important property of $A^+$ is the following:
Theorem: The minimal norm solution to the problem
$$\begin{align}
\operatorname{min}_{\mathbf{x}\in\mathbb{C}^m}\|A\mathbf{x}-\mathbf{b}\|_2\tag{3}\label{three}
\end{align}
$$
is $A^+\mathbf{b}$. That is,
$$A^+\mathbf{b}=\operatorname{arg\,min}\{\|\mathbf{x}\|_2: \mathbf{x}=\operatorname{arg\,min}\|A\mathbf{x}-\mathbf{b}\|_2\}$$
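Before the proof, a quick numerical illustration (a sketch, assuming numpy, with made-up rank-deficient data and a kernel direction chosen by hand): any other minimizer differs from $A^+\mathbf{b}$ by a kernel vector of $A$ and is strictly longer.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))
A[:, 2] = A[:, 0] + A[:, 1]        # force rank(A) = 2, so ker(A) is nontrivial
b = rng.standard_normal(5)

x_star = np.linalg.pinv(A) @ b     # the minimal norm least-squares solution

# another minimizer: add a kernel vector (here (1, 1, -1) spans ker A)
x_other = x_star + 0.5 * np.array([1.0, 1.0, -1.0])

print(np.allclose(A @ x_star, A @ x_other))              # True: same residual
print(np.linalg.norm(x_star) < np.linalg.norm(x_other))  # True: x_star is shorter
```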
Here is a short proof:
Suppose $r=\operatorname{rank}(A)$, and let $A=VDU$ be a singular value decomposition of $A$, so that $D\in M(\mathbb{C},m, n)$ has diagonal entries $\sigma_1\geq\sigma_2\geq\ldots\geq\sigma_r>0$. $V$ and $U$ are unitary matrices that preserve distances in $\mathbb{C}^n$ and $\mathbb{C}^m$ respectively (or orthogonal matrices, if you are dealing with real spaces only), that is
$$\| Vy\|_2=\|y\|_2,\qquad \|Ux\|_2=\|x\|_2$$
for all $y\in\mathbb{C}^n$ and $x\in\mathbb{C}^m$. Thus
$$\begin{align}
\rho=\inf_x\|Ax-b\|_2&=\inf_x\|VDUx-b\|_2=\inf_x\|V^*(VDUx-b)\|_2\\
&=\inf_x\|DUx-V^*b\|_2=\inf_{y}\|Dy-c\|_2
\end{align}$$
where $y=Ux$ and $c=V^*b$.
Notice that
$$\|Dy-c\|^2_2=\sum^r_{k=1}|\sigma_ky_k-c_k|^2+\sum^n_{k=r+1}|c_k|^2$$
This is minimized when $y_k=c_k/\sigma_k$ for $1\leq k\leq r$, with the remaining components $y_{r+1},\ldots,y_m$ arbitrary. Therefore
$$\rho^2=\sum^n_{k=r+1}|c_k|^2$$
Amongst all vectors $y$ that minimize \eqref{three}, the one with $y_{r+1}=\cdots=y_m=0$ has minimum Euclidean norm. Clearly, this vector is given by
$$ y=D^+c$$
where $D^+\in M(\mathbb{R},n, m)$ is the Moore-Penrose pseudoinverse of $D$, namely the matrix with $\sigma^{-1}_1,\ldots,\sigma^{-1}_r,0,\ldots,0$ on the main diagonal and zeroes everywhere else.
Putting this together, the minimal norm solution to the least squares problem is
$$x=U^*y=U^*D^+c=U^*D^+V^*b=A^+b$$
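The proof translates directly into code; here is a sketch following the steps above with numpy (whose SVD convention puts the role of $V$ in the first factor and the role of $U$ in the last, and where the data are made up):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))
A[:, 2] = A[:, 0]                  # rank 2, so some components of y are free
b = rng.standard_normal(5)

W, s, Uh = np.linalg.svd(A, full_matrices=False)  # A = W @ diag(s) @ Uh
r = int(np.sum(s > 1e-12))         # numerical rank

c = W.conj().T @ b                 # c = V* b
y = np.zeros(Uh.shape[0])
y[:r] = c[:r] / s[:r]              # y_k = c_k / sigma_k for k <= r, zero otherwise

x = Uh.conj().T @ y                # x = U* y
print(np.allclose(x, np.linalg.pinv(A) @ b))   # True: the minimal norm solution
```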
Summary:
As we can see, if $(A^* A)^g$ is a generalized inverse of $A^*A$, then
$$\boldsymbol{\omega}=(A^* A)^gA^* \mathbf{y}$$
is a solution to \eqref{one}. Unless $A^* A$ is invertible, there are infinitely many generalized inverses of $A^* A$ and thus many solutions to \eqref{one}. (The kernel of $A$ provides the infinitely many solutions.)
The particular choice $(A^* A)^+$, however, provides the minimal norm solution to \eqref{one}, namely
$$\widehat{\boldsymbol{\omega}}=(A^* A)^+A^*\mathbf{y}=A^+\mathbf{y}$$
This is typically the solution that is implemented in OLS algorithms.
When $A$ is of full column rank, $(A^* A)^+=(A^*A)^{-1}$, and in this case $\widehat{\boldsymbol{\omega}}=A^+\mathbf{y}=(A^* A)^{-1}A^*\mathbf{y}$ is the unique solution to \eqref{one}.
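A final sanity check of these identities (a sketch assuming numpy, with made-up full-column-rank data):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3))    # full column rank (almost surely)

pinvA = np.linalg.pinv(A)
print(np.allclose(np.linalg.pinv(A.T @ A) @ A.T, pinvA))  # (A*A)^+ A* = A^+
print(np.allclose(np.linalg.inv(A.T @ A) @ A.T, pinvA))   # (A*A)^{-1} A* = A^+
```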