
The problem of finding $$\min_{x}\left( \|Ax-b\|^2_2+\lambda \|x\|_1\right),$$ where $\|\cdot\|_2$ and $\|\cdot\|_1$ are the $L_2$ and $L_1$ norms, respectively, is usually called the LASSO. Here $A$ is a matrix, and $x$ and $b$ are vectors.

I believe this problem is robust against introducing some redundancy into the matrix $A$, for example by appending to $A$ an extra column that is a linear combination of the original columns; call the augmented matrix $\tilde{A}$. Since the solution to the LASSO is sparse, I expect this procedure does not change it much, i.e. the solution to $$\min_{y}\left( \|\tilde{A}y-b\|^2_2+\lambda \|y\|_1\right)$$ should be very close to the original one.
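
For concreteness, here is a small numerical sketch of the comparison I have in mind (this assumes CVXPY is available; the dimensions, the value of $\lambda$, and the particular linear combination are arbitrary choices for illustration):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 30, 10
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
lam = 0.5

def lasso(M):
    """Solve min_x ||Mx - b||_2^2 + lam * ||x||_1 and return the minimizer."""
    x = cp.Variable(M.shape[1])
    cp.Problem(cp.Minimize(cp.sum_squares(M @ x - b) + lam * cp.norm1(x))).solve()
    return x.value

# Augment A with a redundant column: a linear combination of two original columns.
A_tilde = np.column_stack([A, 0.3 * A[:, 0] + 0.7 * A[:, 1]])

x_orig = lasso(A)
y_aug = lasso(A_tilde)
print("original solution: ", np.round(x_orig, 3))
print("augmented solution:", np.round(y_aug, 3))
```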

Can this be proved? Can I find an upper bound to how much the solution changes?


2 Answers


It's enlightening to think about what happens if $A$ contains two identical columns, $a_i$ and $a_j$. In that case, the objective function cannot distinguish between a solution with $x_i = 1, x_j = 0$ and a solution with $x_i = \theta, x_j = 1 - \theta$ (where $0 < \theta < 1$): the fit term depends only on $x_i + x_j = 1$, and $|x_i| + |x_j| = \theta + (1-\theta) = 1$ in either case, so the $L_1$ penalty is the same too. It seems most likely that if either $x_i$ or $x_j$ is active in the solution found by your LASSO solver, then both will be active. In other words, the solution you find will not be as sparse as it could have been. For that reason, it can sometimes be helpful to use a non-convex sparsity-inducing regularizer, such as $\sum_i |x_i|^{1/2}$.
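
A quick way to experiment with this (a sketch assuming CVXPY; the data and $\lambda$ are arbitrary, and which of the duplicate columns ends up active can depend on the solver):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
m = 50
a = rng.standard_normal(m)           # one column, duplicated below
other = rng.standard_normal((m, 3))  # a few unrelated columns
A = np.column_stack([a, a, other])   # columns 0 and 1 are identical
b = 2.0 * a + 0.1 * rng.standard_normal(m)
lam = 1.0

x = cp.Variable(A.shape[1])
cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b) + lam * cp.norm1(x))).solve()

# Any split of weight between the identical columns gives the same objective,
# so the reported coefficients on columns 0 and 1 are solver-dependent.
print(np.round(x.value, 3))
```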

  • But two identical columns is a very special case. In a more generic case, where the new column is a linear combination of at least two original columns, and they are normalized, I think maybe the solution is stable – thedude Dec 20 '19 at 16:25
  • Because $\theta_1+\theta_2-\sqrt{\theta_1^2+\theta_2^2}>0$ for positive parameters – thedude Dec 20 '19 at 16:28

Take $A=I$, $b = \mathbf{1}$, and $\lambda = 1$. For the sake of simplicity, let $A$ be a 3-by-3 matrix, and set $\bar{A} = [A \mid \mathbf{1}]$. The solution to

$$\min_{x} (||Ax-b||_2^2 + ||x||_1)$$

can be shown to be $x = [0.5; 0.5; 0.5]$ (the objective separates across coordinates, and minimizing $(x_i - 1)^2 + |x_i|$ gives $x_i = 0.5$). Of course, if you increase $\lambda$ it will tend toward the zero vector. Now, check

$$\min_{y} (||\bar{A}y-b||_2^2 + ||y||_1).$$

In particular, compare the case $y = [0.5; 0.5; 0.5; 0]$ to $y = [0; 0; 0; 1]$. The latter returns a much smaller value than the former ($1$ vs. $2.25$). And I wouldn't say $y(1:3) = [0; 0; 0]$ is close to $x_{\min} = [0.5; 0.5; 0.5]$. You can always crank up $\lambda$ to make them closer, but at that point, you are solving the wrong problem.
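
A quick numerical check of those two objective values (a minimal numpy sketch of the computation above):

```python
import numpy as np

b = np.ones(3)
A_bar = np.column_stack([np.eye(3), np.ones(3)])  # \bar{A} = [I | 1]

def objective(y, lam=1.0):
    """||A_bar y - b||_2^2 + lam * ||y||_1."""
    return np.sum((A_bar @ y - b) ** 2) + lam * np.sum(np.abs(y))

print(objective(np.array([0.5, 0.5, 0.5, 0.0])))  # 2.25
print(objective(np.array([0.0, 0.0, 0.0, 1.0])))  # 1.0
```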

  • So the difference between the solutions depends on $\lambda$. No problem (sparsity is to be expected for large $\lambda$ anyway). Can this be quantified? – thedude Dec 20 '19 at 16:24
  • As $\lambda$ grows, the contribution of $||x||_1$ to the cost function grows. This can be interpreted as "encouraging sparsity" but REALLY remind yourself that you are looking for a solution to $Ax=b$! That aside, I remember that there are some results in Tibshirani's 1996 paper, but they may be experimental. It has been a while since I read that. – Abdullah Ali Sivas Dec 20 '19 at 16:37