
Per the title, I am a visual learner and would appreciate a conceptual explanation of why that equation works and why it stands in for $Ax = b$ in the least squares approximation.

If the answer could be related to linear transformations, even better.

Part of why I ask: to solve an ordinary equation $Ax=b$, we can multiply both sides by the inverse of $A$, whereas for least squares we multiply by $A^T$. I understand the conceptual intuition for the former, but am hoping for one for the latter. Are the two perhaps connected?

This equation is very relevant for those hoping to apply Linear Algebra to statistics so a satisfying answer would be really appreciated.

Thanks!

1 Answer


Here is how I think about it:

If $y=Ax$ is the element in the column space of $A$ that is closest to $b$, then the displacement vector $b-y$ must be perpendicular to all of $\text{Col}(A).$ This isn't too difficult to visualize with a quick sketch.

This means $$(b-y)\cdot Ac=0$$ for all vectors $c$. Since $y=Ax$ and $(b-y)\cdot Ac=(Ac)^T(b-y)=c^TA^T(b-Ax)$, this is equivalent to saying $$c\cdot (A^Tb-A^TAx)=0$$ Since this is true for all vectors $c$, we must have $A^Tb-A^TAx=0$, that is, $A^TAx=A^Tb$.
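
If it helps to see the perpendicularity numerically, here is a minimal sketch using NumPy. The matrix `A` and vector `b` are made-up values for illustration (a tall, non-invertible `A`), not anything from the question. It solves $A^TAx=A^Tb$ directly, compares the result with NumPy's built-in least-squares routine, and checks that the displacement $b-Ax$ is perpendicular to every column of $A$.

```python
import numpy as np

# Hypothetical overdetermined system: 4 equations, 2 unknowns.
# These numbers are invented purely for illustration.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Solve the normal equations A^T A x = A^T b.
# A^T A is square (2x2) and invertible here because the columns of A
# are linearly independent.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Reference solution from NumPy's least-squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# y = A x is the point of Col(A) closest to b. The displacement b - y
# should be perpendicular to every column of A, i.e. A^T (b - y) = 0.
y = A @ x_normal
residual = b - y
orthogonality = A.T @ residual

print("x from normal equations:", x_normal)
print("x from lstsq:           ", x_lstsq)
print("A^T (b - Ax):           ", orthogonality)
```

Up to floating-point rounding, the two solutions should agree and `orthogonality` should be the zero vector, which is exactly the statement $A^T(b-Ax)=0$.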

  • How did you get from the first equation to the second?

    Also, a follow-up question: since we can solve the equation $Ax = b$ analytically by multiplying both sides by $A^{-1}$, what is the relationship between $A^T$ and $A^{-1}$ (given that for least squares we multiply both sides by $A^T$)?

    – AviPraMar Jul 29 '23 at 21:44
  • @AviPraMar The point is that $A$ is not invertible – Andrew Jul 29 '23 at 22:01