Why does Least Absolute Deviation regression exactly fit n measureemnts for a linear system with n independent variables?

Question

I am applying LAD regression to conduct some research. I have the following questions regarding LAD:

I know LAD exactly fits n measurements for a linear system with n variables. But I cannot easily explain this in a concise way. Would you please show me a simple and easy way to explain it? If a simple mathematical proof is impossible, a small example is also fine for me to illustrate this claim.
Why is LAD a median regression? This question relates to the first one.
Does the exact-fitting priority of LAD only hold for linear systems? Does it still hold for nonlinear systems?

Thanks a lot in advance.

Benson

score 0 · Answer 1 · answered Aug 17 '24 at 06:19

Assuming you refer to the model:

$$ \arg \min_{\boldsymbol{x}} {\left\| \boldsymbol{A} \boldsymbol{x} - \boldsymbol{b} \right\|}_{1} $$

Similar to the way $ \arg \min_{z} \sum_{i = 1}^{N} {\left\| z - {z}_{i} \right\|}_{2}^{2} $ is minimized by the mean of $\left\{ {z}_{i} \right\}$ then $ \arg \min_{z} \sum_{i = 1}^{N} {\left\| z - {z}_{i} \right\|}_{1} $ is minimized by the median of $\left\{ {z}_{i} \right\}$ (See my answer to The Median Minimizes the Sum of Absolute Deviations).
So this concept is borrowed to the linear model when inspecting the residuals of the model.

The ${L}_{1}$ based regression is known to be more robust to outliers.
In case the system is underdetermined, just like Least Squares, the solution can haze zero objective value.

Remark: I am not sure what you mean by exact fitting. In most cases regression is used with a noisy measurements, hence the regression model does not fit the data exactly.

score 0 · Answer 2 · answered Aug 17 '24 at 07:08

Suppose we have two distinct points in the plane $(x_1, y_1)$ and $(x_2, y_2)$ with $x_1\neq x_2$. If we include an intercept term in the regression, then we have $n=2$ sample points and $2$ predictor variables ($1$ and $x$). Clearly, we can fit a straight line through these two points with $0$ residuals. For larger $n$ values, the idea is still the same, i.e., it is possible draw a hyperplane such that all $y$ values lie in it. In other words, if the model spans $\mathbb{R}^n$, then the response as an $n$-vector has to lie within it.
I don't see why this question is related to the 1st.
It depends on whether your non-linear model includes $\mathbb{R}^n$ as a subset. If so, an exact fit is possible.

Why does Least Absolute Deviation regression exactly fit n measureemnts for a linear system with n independent variables?

2 Answers2