If we have a system of linear equations $Ax=b$, then in case $A$ is invertible it is easy to say that the solution is $x=A^{-1}b$. In all other cases, even if $A$ is invertible, the solution, if it exists, will be of the form $x=A^{+}b+(I-A^{+}A)w$, where $A^{+}$ is called the pseudoinverse of $A$, and $w$ is a vector of free parameters. My question here is: how do I compute $A^{+}$ for any matrix $A$? It is easy to compute $A^{+}$ if the columns are linearly independent (so that $m\geq n$), namely $A^{+}=(A^{T}A)^{-1}A^{T}$, and also if the rows are linearly independent (so that $m\leq n$), namely $A^{+}=A^{T}(AA^{T})^{-1}$, but I don't know how to compute $A^{+}$ if $A$ is a square non-invertible matrix, or if neither the columns nor the rows are linearly independent; the formulas above do not work then. If anyone has an idea about computing the pseudoinverse, or knows an easier method for finding the solution $x$ of the system $Ax=b$, please help me. Please keep in mind that I want the general form of the solution, if it exists, not just a solution for a particular $b$.
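(For illustration, here is a minimal sketch of the general-solution form above, assuming NumPy is available; it uses the library's built-in `np.linalg.pinv` rather than computing $A^{+}$ by hand, and the example matrix and vector are my own choices, not part of the question.)

```python
import numpy as np

# Minimal sketch of x = A+ b + (I - A+ A) w using NumPy's built-in
# pseudoinverse; A and b below are arbitrary illustrative choices.
A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.]])          # rank-deficient square matrix
b = np.array([1., 2., 3.])            # lies in the column space of A

A_plus = np.linalg.pinv(A)            # Moore-Penrose pseudoinverse
w = np.random.randn(3)                # vector of free parameters
x = A_plus @ b + (np.eye(3) - A_plus @ A) @ w

print(np.allclose(A @ x, b))          # True, since Ax = b is consistent
```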
-
What do you mean "even if $A$ is invertible"? – Vedran Šego Aug 03 '13 at 00:43
-
In my opinion, singular value decomposition is made for this. – Tunococ Aug 03 '13 at 00:45
3 Answers
Let $A = U \Sigma V^T$ be an SVD of $A$, i.e., $U$ and $V$ are orthogonal and $\Sigma = \mathop{\rm diag}(\sigma_1, \sigma_2, \dots, \sigma_r, 0, \dots, 0)$ is real diagonal with $\sigma_k > 0$ for all $k=1,\dots,r$. Then
$$A^+ = V \Sigma^+ U^T,$$
with
$$\Sigma^+ = \mathop{\rm diag}(\sigma_1^{-1}, \sigma_2^{-1}, \dots, \sigma_r^{-1}, 0, \dots, 0).$$
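For concreteness, a minimal sketch of this formula in NumPy (my own illustration, not part of the original answer); the cutoff `tol` for deciding which singular values count as zero is an assumption:

```python
import numpy as np

def pinv_via_svd(A, tol=1e-12):
    """Moore-Penrose pseudoinverse from the SVD A = U Sigma V^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    # Invert only the singular values that are (numerically) nonzero.
    s_plus = np.array([1.0 / x if x > tol else 0.0 for x in s])
    # Build Sigma^+ with the transposed shape (n x m).
    Sigma_plus = np.zeros((A.shape[1], A.shape[0]))
    np.fill_diagonal(Sigma_plus, s_plus)
    return Vt.T @ Sigma_plus @ U.T

A = np.array([[1., 1., 0.],
              [2., 2., 0.],
              [0., 0., 1.]])          # singular square matrix
print(np.allclose(pinv_via_svd(A), np.linalg.pinv(A)))  # True
```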
-
Thanks Vedran, but can you make it clearer by explaining how to find $U$, $\Sigma$, and $V$, please? I don't have a good idea of how to find them. – LoveMath Aug 03 '13 at 01:26
-
SVD is computed via various iterative algorithms. It's not something you do by hand. Instead, you'll most likely have to rely on functions in your programming environment (probably Matlab, Mathematica, Python,...). As far as I know, there is no "do it with pen and paper" way to find $A^+$ for a general matrix $A$. – Vedran Šego Aug 03 '13 at 01:47
-
But I don't think that you can invert the diagonal matrix if it has 0's on its diagonal right? – profPlum Sep 30 '21 at 22:18
-
@profPlum, the question is not about inverse but pseudo-inverse. Those always exist. For Moore-Penrose, they are also unique (not sure about other pseudo-inverses): https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse#Existence_and_uniqueness – Vedran Šego Oct 01 '21 at 13:08
-
@VedranŠego I don't think this is correct, there is an algorithmic way of computing the pseudo-inverse by hand, see my answer below. – Felix Leditzky Apr 29 '25 at 12:55
The reason the SVD is commonly used for the pseudoinverse is its reliance on unitary operations. Because the SVD applies only unitary transformations, the final results carry errors on the order of machine precision, which means the very small singular values may as well be treated as zero. This is generally useful, but it comes at the cost of being an iterative algorithm; there is no direct formula for the SVD.
However, if you have some reason to believe that the matrix is well conditioned (for its dimensions, whether it is square or not), or if you are using exact arithmetic, there is a direct (left inverse) formula, valid whenever $A^\top A$ is invertible, i.e., whenever the columns of $A$ are linearly independent: $$A^+ = A^\dagger =(A^\top A)^{-1}A^\top$$ I have written an algorithm here which avoids the extra matrix multiplications that the formula implies. I consider the algorithm the Gaussian-elimination equivalent for a non-square matrix. It operates on each column of the original matrix in turn, and a zero appears when the current column is already within the span of the previous columns. By comparison, Gaussian elimination reaches this state when a column of all zeros appears. The difference in my algorithm, though, is that such a column may simply be ignored, while noting that the dimensionality is reduced by one along with it.
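For concreteness, a minimal sketch of the direct formula (not the linked algorithm), assuming NumPy and a matrix with linearly independent columns; the helper name `left_pinv` is my own:

```python
import numpy as np

def left_pinv(A):
    """Direct formula A+ = (A^T A)^{-1} A^T, valid only when the
    columns of A are linearly independent (A^T A invertible)."""
    # Solving the normal equations is preferable to forming the inverse explicitly.
    return np.linalg.solve(A.T @ A, A.T)

A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])              # 3x2, full column rank
print(np.allclose(left_pinv(A), np.linalg.pinv(A)))  # True
```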
For keeping track of what kind of errors could be in your problem, it is imperative to use something like the SVD. Thus if the results ever seem off when using my algorithm, it is likely related to such errors, as would happen when the matrix is ill conditioned.
-
What if $(A^TA)$ is singular? For example, if $A= \begin{pmatrix} 1&1&0\\ 2&2&0\\ 0&0&1 \end{pmatrix}$ then $A^TA = \begin{pmatrix} 5&5&0\\ 5&5&0\\ 0&0&1 \end{pmatrix}$, singular. – Jay Dec 02 '18 at 10:17
-
@Jay the formula is then of course undefined. I believe the algorithm I linked here could be modified to get some useful information on the span that $A$ does encompass, but of course the inverse must be defined in order to use it in any way. – adam W Jul 07 '19 at 23:57
The way we teach this in our (real) linear algebra course is as follows:
- Given an $(m\times n)$-matrix $A$ over the reals of rank $r$, we first find the compact singular value decomposition:
- $A^TA$ is an $(n\times n)$-positive semidefinite matrix and hence by the spectral theorem there exists an orthonormal basis for $\mathbb{R}^n$ consisting of eigenvectors of $A^TA$.
- The rank of $A^TA$ is the same as the rank of $A$, so there are $r$ strictly positive eigenvalues $\lambda_i$, which we order by magnitude in non-increasing order, $\lambda_1\geq \lambda_2\geq\dots\geq \lambda_r > 0$.
- We define the $(n\times r)$-matrix $V_c$ whose $i$-th column $v_i$ is the eigenvector corresponding to eigenvalue $\lambda_i$. This is an isometry (i.e., $V_c^TV_c = I_r$) since the eigenvectors are orthonormal.
- We define the singular values $\sigma_i = \sqrt{\lambda_i}$ for $i=1,\dots,r$, and set $\Sigma_c = \operatorname{diag}(\sigma_1,\dots,\sigma_r)$.
- For $i=1,\dots,r$, we define vectors $u_i = \dfrac{1}{\sigma_i} A v_i$. These are automatically orthonormal, which can easily be checked using properties of the inner product. We define the $(m\times r)$-matrix $U_c$ whose $i$-th column is the vector $u_i$. This is again an isometry, $U_c^TU_c=I_r$.
- The matrices constructed this way give the compact singular value decomposition, $$A = U_c \Sigma_c V_c^T.$$
- We now define the pseudo-inverse of $A$ as $$A^+ = V_c \Sigma_c^{-1} U_c^T.$$ Note that $\Sigma_c$ is an $(r\times r)$-diagonal matrix with strictly positive elements $\sigma_i$ on its diagonal, so $$\Sigma_c^{-1} = \operatorname{diag}(\sigma_1^{-1}, \dots,\sigma_r^{-1}).$$
You can check that the matrix $A^+$ satisfies $AA^+A = A$, or in other words, $AA^+$ acts like the identity on the column space of $A$.
Moreover, if $A$ has linearly independent columns, then the $A^+$ defined via the above procedure satisfies $A^+ = (A^TA)^{-1}A^T$. If $A$ has linearly independent rows, then $A^+ = A^T(AA^T)^{-1}$.
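For concreteness, here is a minimal sketch of this procedure, assuming NumPy; the rank tolerance and the variable names are my own choices, and in practice `np.linalg.pinv` is the numerically safer route:

```python
import numpy as np

def pinv_compact_svd(A, tol=1e-12):
    """Pseudoinverse via the compact SVD built from eigenvectors of A^T A.
    For illustration only; numerically, np.linalg.pinv is preferable."""
    # Spectral decomposition of the symmetric PSD matrix A^T A.
    eigvals, eigvecs = np.linalg.eigh(A.T @ A)
    # Sort eigenvalues in non-increasing order and keep the positive ones.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    r = np.sum(eigvals > tol)
    sigma = np.sqrt(eigvals[:r])           # singular values
    V_c = eigvecs[:, :r]                   # n x r isometry
    U_c = (A @ V_c) / sigma                # m x r isometry, u_i = A v_i / sigma_i
    return V_c @ np.diag(1.0 / sigma) @ U_c.T

A = np.array([[1., 1., 0.],
              [2., 2., 0.],
              [0., 0., 1.]])
A_plus = pinv_compact_svd(A)
print(np.allclose(A @ A_plus @ A, A))          # A A+ A = A
print(np.allclose(A_plus, np.linalg.pinv(A)))  # matches NumPy's pinv
```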