Let $\mu$ be a probability distribution over $\mathbb{R}^n$. All functions discussed below are from $\mathbb{R}^n$ to $\mathbb{R}$. Let $l^\ast$ be a linear function and let $f$ be a function such that $f=l^\ast$ on a set containing strictly more than half of the probability mass. Show that $l^\ast$ is a minimizer of the $L_1(\mu)$ error $\|f-l\|_{L_1(\mu)}$ over all linear functions $l$.
I tried to prove this for specific $f$'s to get intuition, but even for specific $f$'s there were many cases to handle. It didn't look nice at all, and I gave up.
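For concreteness, here is the kind of numerical sanity check I used for a specific case (a rough Python sketch; the one-dimensional setup, the particular corrupted $f$, and the grid search are just illustrative choices, not part of the problem):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D setup: mu is the empirical distribution of these samples.
x = rng.normal(size=10_000)

# Target linear function l*(x) = 2x + 1; f agrees with l* on ~60% of the mass
# and is heavily corrupted on the remaining ~40%.
def l_star(t):
    return 2 * t + 1

corrupted = rng.random(x.size) < 0.4
f = np.where(corrupted, l_star(x) + 50 * rng.normal(size=x.size), l_star(x))

def l1_error(a, b):
    """Monte Carlo estimate of ||f - l||_{L_1(mu)} for l(x) = a*x + b."""
    return np.mean(np.abs(f - (a * x + b)))

# Brute-force search over linear functions l(x) = a*x + b on a grid.
best = min(((l1_error(a, b), a, b)
            for a in np.linspace(-1, 5, 61)
            for b in np.linspace(-2, 4, 61)),
           key=lambda t: t[0])
print(best)                # the minimizing (a, b) should land at (2, 1)
print(l1_error(2.0, 1.0))  # error of l* itself, for comparison
```

Checks like this agree with the claim, but they did not suggest a proof to me.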
I also searched online for material related to "robust linear regression", but couldn't find anything that addresses this problem.
I also tried to draw inspiration from the various proofs that the median minimizes the $\ell_1$ error for a finite set of reals. In this direction, I tried to find the point at which the gradient of the objective function vanishes. Here too, I ran into a large number of cases to handle and gave up.
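For reference, the one-dimensional statement I was trying to mimic (a standard fact, recorded here just to fix notation): if $x_1, \dots, x_k$ are reals and $g(c) = \sum_{i=1}^k |x_i - c|$, then for any $c$ that is not one of the $x_i$,
$$g'(c) = \#\{i : x_i < c\} - \#\{i : x_i > c\},$$
so $g' \le 0$ to the left of the median and $g' \ge 0$ to the right, hence the median is a minimizer. My difficulty was in adapting this sign argument to the multivariate, measure-theoretic setting above.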