I was following a linear algebra course, and I came upon an example where a linear regression was done by solving $A^{\rm T}Ax = A^{\rm T}b$, because $Ax = b$ could not be solved exactly: $b$ is not in the column space of $A$. The equation $A^{\rm T}Ax = A^{\rm T}b$ was derived by projecting $b$ onto the column space of $A$, and its solution was taken as an approximate solution.
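For reference, the derivation as I understood it from the course (I am paraphrasing, so any mistake here is mine): the best we can do is choose $x$ so that $Ax$ is the projection of $b$ onto the column space of $A$, which means the residual $b - Ax$ must be orthogonal to every column of $A$:

$$A^{\rm T}(b - Ax) = 0 \quad\Longrightarrow\quad A^{\rm T}Ax = A^{\rm T}b.$$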
What I am confused about is at what point in this derivation it became a least-squares approximation. How would I formulate the problem in a similar way (as a linear algebra problem), but minimize linear distance instead? (Edited: I originally wrote "Euclidean distance", which was the wrong term.)
What I mean to ask is: when fitting a line to a set of points, if I wanted to minimize the absolute error rather than the squared error, how would I formulate the problem?
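To make the distinction concrete (assuming my notation is right), least squares minimizes

$$\|Ax - b\|_2^2 = \sum_i \big((Ax)_i - b_i\big)^2,$$

whereas what I would like to minimize is the sum of absolute residuals,

$$\|Ax - b\|_1 = \sum_i \big|(Ax)_i - b_i\big|,$$

and I don't see how to turn that into a linear algebra problem the way the projection argument does for the squared error.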