Per the title, I am a visual learner and would appreciate a conceptual explanation of why that equation works and why it takes the place of $Ax = b$ in the least squares approximation.
If the answer could be related to linear transformations, even better.
Another reason I ask is that to solve an ordinary equation $Ax = b$, we can multiply both sides by the inverse of $A$, whereas for least squares, we multiply by $A^T$. I understand the conceptual intuition behind the former, but am hoping for some behind the latter. Are the two perhaps connected?
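To make the comparison concrete, here are the two manipulations I have in mind, written side by side. This is only a sketch of my understanding: I am assuming $A$ is square and invertible in the first case, and that $A$ has linearly independent columns (so that $A^TA$ is invertible) in the second. The notation $\hat{x}$ for the least squares solution is my own.

$$Ax = b \quad\longrightarrow\quad A^{-1}Ax = A^{-1}b \quad\longrightarrow\quad x = A^{-1}b$$

$$Ax \approx b \quad\longrightarrow\quad A^TA\hat{x} = A^Tb \quad\longrightarrow\quad \hat{x} = (A^TA)^{-1}A^Tb$$

In the first line, multiplying by $A^{-1}$ undoes the transformation exactly. In the second, $Ax = b$ generally has no exact solution, and multiplying by $A^T$ somehow produces a system that does. That second step is the one I cannot picture.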
This equation is very relevant to anyone hoping to apply linear algebra to statistics, so a satisfying answer would be really appreciated.
Thanks!