Consider $$ \min_{\mathbf{w} \in \mathbb{R}^d} \|\mathbf{X}^T\mathbf{w}\|_1 \qquad\text{subject to } \quad \|\mathbf{w}\|_2^2=1, $$ where $\mathbf{X}\in\mathbb{R}^{d\times m}$ is a set of $d$-dimensional points and $m > d$. How can I solve this problem?
Since $$\|X^T w\|_1 = \sum_{k = 1}^{m} \left| \sum_{j = 1}^{d} x_{j k} w_j \right|,$$ can we maybe just look at each summand separately, that is, solve the problem for $m = 1$: $$\min_{w \in \mathbb R^d} | x^T w | \quad \text{subject to} \quad \|w\|_2^2 = 1,$$ or am I missing something? – ViktorStein Mar 07 '22 at 13:08
I think that the problem posed in the above comment can be solved with the following strategy: suppose $x^T w > 0$; then $\partial_w |x^T w| = x$, so we would need $x = 0$ for optimality, which contradicts $x^T w > 0$. The same goes for $x^T w < 0$, so we have to look at the case $|x^T w| = 0$, that is, $x^T w = 0$, and at this point ($0$) the absolute value is not differentiable. So we have to do something else. – ViktorStein Mar 07 '22 at 21:36
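For what it's worth, the $m = 1$ subproblem from the comments has a simple answer when $d \ge 2$: the minimum is $0$, attained at any unit vector orthogonal to $x$. A quick numerical sanity check (a minimal NumPy sketch; using the SVD to produce a null vector is just one convenient choice):

```python
import numpy as np

# m = 1 case: minimize |x^T w| subject to ||w||_2 = 1.
# For d >= 2 the minimum is 0, attained at any unit w orthogonal to x.
x = np.array([3.0, -1.0, 2.0])

# The rows of Vt form an orthonormal basis of R^d whose first row is
# parallel to x, so the last row is a unit vector orthogonal to x.
_, _, Vt = np.linalg.svd(x.reshape(1, -1))
w = Vt[-1]

print(abs(x @ w))            # ~0 (up to floating-point error)
print(np.linalg.norm(w))     # ~1.0 (unit norm by construction)
```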
2 Answers
This isn't a completely rigorous proof, but starting from Ramanujan's comment that this is equivalent to finding
$$\min \sum_{k = 1}^{m} \left| \sum_{j = 1}^{d} x_{j k} w_j \right|\qquad \text{subject to }\sum_{j=1}^dw_j^2=1$$
it helps to think about it somewhat geometrically. The level sets of the objective function are boundaries of polytopes in $d$-dimensional space. If the objective value were equal to the minimum, the corresponding level set would intersect the $(d-1)$-sphere only at corners, where as many of the terms $\sum_{j = 1}^{d} x_{j k} w_j$ as possible are zero. Specifically, $d-1$ of them would equal $0$. Among all such candidate points, one arrangement yields the minimum.
So in order to find the minimum, choose $d-1$ columns of $\mathbf{X}$ to have dot product $0$ with $\mathbf w$ (i.e. $\sum_{j = 1}^{d} x_{j k} w_j=0$ for $d-1$ values of $k$). Together with $\sum_{j=1}^dw_j^2=1$ you then have $d$ equations in $d$ variables, so you can solve** for $\mathbf{w}$. In fact, the $d-1$ equations are linear, so you can use whatever linear-algebra technique you like (e.g. compute a null vector) and then scale to satisfy $\sum_{j=1}^dw_j^2=1$.
Since there are $\binom{m}{d-1}$ ways to choose the columns, this gives you just as many candidates for $\mathbf{w}$, so the answer is the value of $\mathbf{w}$ that minimizes the objective function.
**: The system might not be solvable if the chosen columns of $\mathbf{X}$ are linearly dependent; but in that case one of those columns is a combination of the others, so it can be deleted and the coefficients on the remaining columns adjusted. So no matter what, the problem can be reduced to the case where the chosen columns of $\mathbf{X}$ are linearly independent.
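The enumeration described above can be carried out directly. Below is a minimal NumPy sketch (the function name `min_l1_on_sphere` is mine; an SVD null vector handles the $d-1$ linear equations and the unit-norm constraint in one step):

```python
import itertools
import numpy as np

def min_l1_on_sphere(X):
    """Brute force: for each (d-1)-subset of columns of X, take a unit w
    orthogonal to those columns and keep the w with smallest ||X^T w||_1."""
    d, m = X.shape
    best_w, best_val = None, np.inf
    for cols in itertools.combinations(range(m), d - 1):
        A = X[:, cols].T                 # (d-1) x d system A w = 0
        _, _, Vt = np.linalg.svd(A)      # full SVD: Vt is d x d, rows orthonormal
        w = Vt[-1]                       # unit null vector of A
        val = np.abs(X.T @ w).sum()
        if val < best_val:
            best_val, best_w = val, w
    return best_w, best_val

# Small example: d = 2, m = 3; zeroing the third column gives the minimum.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
w, val = min_l1_on_sphere(X)
print(val)   # sqrt(2) = 1.414..., attained at w proportional to (1, -1)
```

This is exponential in $d-1$, of course, so it is only practical for small instances, but it is useful as a ground truth to check any local solver against.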
@Ramanujan Indeed, so I think it would be reduced to when $m=d$. – Varun Vejalla Mar 16 '22 at 03:49
We can use a standard trick to get rid of the nondifferentiability. The objective is $\|X^Tw\|_1=\sum_j|w^Tx_j|$. Each term $|w^Tx_j|$ equals the minimum of $t_j$ subject to $-t_j\le w^Tx_j\le t_j$. So the problem is equivalent to $$\min_{t,w}\ 1^Tt \quad \text{s.t. }\ \|w\|_2^2=1, \quad -t\preceq X^Tw\preceq t.$$ The problem can now be passed to a standard nonlinear programming routine.
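As a concrete illustration, here is a minimal SciPy sketch of that reformulation (the variable stacking $z = (w, t)$ and the choice of SLSQP are my assumptions; any NLP solver that handles a nonlinear equality constraint would do):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
d, m = 3, 8
X = rng.standard_normal((d, m))

# Stack the variables as z = (w, t) with w in R^d, t in R^m.
def objective(z):
    return z[d:].sum()           # 1^T t

constraints = [
    {"type": "eq",   "fun": lambda z: z[:d] @ z[:d] - 1.0},   # ||w||_2^2 = 1
    {"type": "ineq", "fun": lambda z: z[d:] - X.T @ z[:d]},   # t - X^T w >= 0
    {"type": "ineq", "fun": lambda z: z[d:] + X.T @ z[:d]},   # t + X^T w >= 0
]

# Feasible starting point: any unit w0, with t0 = |X^T w0|.
w0 = np.ones(d) / np.sqrt(d)
z0 = np.concatenate([w0, np.abs(X.T @ w0)])

res = minimize(objective, z0, method="SLSQP", constraints=constraints)
w = res.x[:d]
print(res.fun, np.abs(X.T @ w).sum())   # the two values should agree
```

Note that the sphere constraint keeps the problem nonconvex, so a local solver like this only returns a local minimum; restarting from several random unit vectors $w_0$ is advisable.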
Cool, I didn't know that trick! Does this make it easier to find a closed form solution? – ViktorStein Mar 14 '22 at 12:15
I define the Lagrangian $$L \colon \mathbb{R}^m\times \mathbb{R}^d\times \mathbb{R}\times\mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}, \qquad (t, w, \lambda, u, v) \mapsto 1^T t + \lambda( \| w \|_2^2 - 1) + u^T (t - X^T w) + v^T(X^T w + t),$$ whose derivative is $$\partial L(t, w, \lambda, u, v) = \begin{pmatrix} 1 + u + v \\ 2 \lambda w + X(v - u) \\ \| w \|_2^2 - 1 \\ t - X^T w \\ X^T w + t \end{pmatrix}.$$ Setting this to zero yields $$- X^T w = t = X^T w$$ and $$- 2 \lambda w = X (v - u)$$ and thus $$w = \frac{1}{2 \lambda} X (u - v)$$ and thus $$X^T w = \frac{1}{2 \lambda} X^T X (u - v).$$ – ViktorStein Mar 15 '22 at 09:51