I was following a linear algebra course, and I came upon an example where a linear regression was done by solving $A^{\rm T}Ax = A^{\rm T}b$, because $Ax = b$ could not be solved exactly: $b$ is not in the column space of $A$. The equation $A^{\rm T}Ax = A^{\rm T}b$ was derived by projecting $b$ onto the column space of $A$, and its solution was taken as an approximate solution.
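For reference, the derivation as I understood it from the course (I am paraphrasing, so any mistake here is mine): the best we can do is choose $x$ so that $Ax$ is the projection of $b$ onto the column space of $A$, which means the residual $b - Ax$ must be orthogonal to every column of $A$:

$$A^{\rm T}(b - Ax) = 0 \quad\Longrightarrow\quad A^{\rm T}Ax = A^{\rm T}b.$$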
What I am confused about is at what point in this derivation it became a least-squares approximation. How would I formulate the problem in a similar way (as a linear algebra problem), but minimize linear distance instead? (Edited: I originally wrote "Euclidean distance", which was the wrong term.)
What I mean to ask is: when fitting a line to a set of points, if I wanted to minimize the absolute error rather than the squared error, how would I formulate the problem?
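To make the distinction concrete (assuming my notation is right), least squares minimizes

$$\|Ax - b\|_2^2 = \sum_i \big((Ax)_i - b_i\big)^2,$$

whereas what I would like to minimize is the sum of absolute residuals,

$$\|Ax - b\|_1 = \sum_i \big|(Ax)_i - b_i\big|,$$

and I don't see how to turn that into a linear algebra problem the way the projection argument does for the squared error.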