I have been exploring the following hard problem for a long time, and I need some help with the (possibly) final steps. Specifically, \begin{equation}\tag{1} \min_{\mathbf{x}\in\mathbf{R}^n}\left\{ f(\mathbf{x}):= \frac{1}{2}\|\mathbf{x}-\mathbf{v}\|_2^2 + \lambda\|\mathbf{Ux}\|_2+\lambda_0\|\mathbf{x}\|_0\right\}, \end{equation} where $\mathbf{v}\in\mathbf{R}^n$ is a fixed vector, $\mathbf{U}=\mathbf{diag}(\mathbf{u})$ with $u_i>0$ for all $i=1,\ldots,n$, and $\lambda, \lambda_0>0$.
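(For numerically checking any candidate solution of (1) on small instances, I use the brute-force baseline below. This is a minimal Python sketch of my own, not part of the derivation: it enumerates all $2^n$ supports and minimizes the restricted convex problem on each with scipy's Nelder-Mead; the function names are mine.)

```python
import itertools
import numpy as np
from scipy.optimize import minimize

def f(x, v, u, lam, lam0):
    """Objective of problem (1)."""
    return (0.5 * np.sum((x - v) ** 2)
            + lam * np.linalg.norm(u * x)
            + lam0 * np.count_nonzero(x))

def brute_force(v, u, lam, lam0):
    """Enumerate all 2^n supports; on each support S, numerically minimize
    0.5*||y - v_S||^2 + lam*||u_S * y||_2 + lam0*|S|.  Only feasible for tiny n."""
    n = len(v)
    best_x, best_val = np.zeros(n), f(np.zeros(n), v, u, lam, lam0)
    for k in range(1, n + 1):
        for S in map(list, itertools.combinations(range(n), k)):
            g = lambda y: (0.5 * np.sum((y - v[S]) ** 2)
                           + lam * np.linalg.norm(u[S] * y) + lam0 * k)
            res = minimize(g, v[S], method="Nelder-Mead")
            if res.fun < best_val:
                best_val, best_x = res.fun, np.zeros(n)
                best_x[S] = res.x
    return best_x, best_val
```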
Since we have no idea how many nonzero elements the optimal $\mathbf{x}^*$ has, we define $\Omega^k=\{\mathbf{x}\in\mathbf{R}^n \mid \|\mathbf{x}\|_0=k\}$ to represent the set of all $n$-dimensional vectors with exactly $k$ nonzero elements. We can show that $\mathbf{x}^*=\mathbf{0}$ if $\|\mathbf{U}^{-1}\mathbf{v}\|_2\le \lambda$; this condition is sufficient but not necessary because of the $\ell_0$ penalty. Otherwise, we assume the optimal $\mathbf{x}^*\in\Omega^k$, and the original problem reduces to \begin{equation}\tag{2} \min_{\mathbf{x}_k\in\Omega^k}\left\{ f(\mathbf{x}_k):= \frac{1}{2}\|\mathbf{x}_k-\mathbf{v}\|_2^2 + \lambda\|\mathbf{U}_k\mathbf{x}_k\|_2+\lambda_0 k\right\}, \end{equation} where the subscript $k$ denotes the restriction to the $k$ nonzero entries, so $\mathbf{v}_k$ and $\mathbf{U}_k$ are the corresponding subvector and diagonal submatrix. Setting the gradient over the support to zero, $\mathbf{x}_k-\mathbf{v}_k+\lambda\mathbf{U}_k^2\mathbf{x}_k/\|\mathbf{U}_k\mathbf{x}_k\|_2=\mathbf{0}$, shows that the solution to (2) is characterized by the following fixed-point operator, for which I have shown a unique fixed point exists (perhaps you can propose a closed-form solution for this group-lasso problem with weighted parameters within the group; a small numerical sketch of the iteration is given right after this paragraph): $$ \mathbf{x}_k=T(\mathbf{x}_k) = \left(\mathbf{I}+\frac{\lambda \mathbf{U}_k^2}{\|\mathbf{U}_k\mathbf{x}_k\|_2}\right)^{-1}\mathbf{v}_k. $$ For convenience, let $\mathbf{A}_k=\left(\mathbf{I}+\frac{\lambda \mathbf{U}_k^2}{\|\mathbf{U}_k\mathbf{x}_k\|_2}\right)^{-1}$; then $\mathbf{A}_k$ is a positive definite diagonal matrix with diagonal entries $A_{ki}\in(0,1)$ for all $i=1,\ldots,k$. Plugging $\mathbf{x}_k=\mathbf{A}_k\mathbf{v}_k$ into (2) yields \begin{align}\tag{3} f(\mathbf{x}_k)=& \frac{1}{2}\|\mathbf{A}_k\mathbf{v}_k-\mathbf{v}\|_2^2 + \lambda\|\mathbf{U}_k\mathbf{A}_k\mathbf{v}_k\|_2+\lambda_0 k\\ =&\frac{1}{2}\|\mathbf{A}_k\mathbf{v}_k\|_2^2-\mathbf{v}_k^T\mathbf{A}_k\mathbf{v}_k + \lambda\|\mathbf{U}_k\mathbf{A}_k\mathbf{v}_k\|_2 +\frac{1}{2}\|\mathbf{v}\|_2^2+\lambda_0 k\\ =&-\mathbf{v}_k^T\left(\mathbf{A}_k-\frac{1}{2}\mathbf{A}^2_k\right)\mathbf{v}_k + \lambda\|\mathbf{U}_k\mathbf{A}_k\mathbf{v}_k\|_2 +\frac{1}{2}\|\mathbf{v}\|_2^2+\lambda_0 k, \end{align} where $\mathbf{A}_k-\frac{1}{2}\mathbf{A}^2_k\succ\mathbf{0}$, since each diagonal entry equals $A_{ki}(1-\frac{1}{2}A_{ki})>0$ for $A_{ki}\in(0,1)$. The intuition for this approach comes from the paper ''Neural Network Compression via $\ell_0$ Sparse Group Lasso on the Mobile System'', which proposes a solution to the following similar but simpler problem: \begin{equation}\tag{4} \min_{\mathbf{x}\in\mathbf{R}^n}\left\{ f(\mathbf{x}):= \frac{1}{2}\|\mathbf{x}-\mathbf{v}\|_2^2 + \lambda\|\mathbf{x}\|_2+\lambda_0\|\mathbf{x}\|_0\right\}. \end{equation} The only difference is that the group-lasso term in (4) carries no weights. At first glance this problem looks very hard, because finding the minimum objective value seems to require trying all $2^n$ support patterns; in fact only $n+1$ candidates need to be checked, which is great. The solution is motivated by the following simple derivation, using the well-known closed-form solution of the group-lasso proximal operator, $\mathbf{x}=\max(\|\mathbf{v}\|_2-\lambda,\,0)\frac{\mathbf{v}}{\|\mathbf{v}\|_2}=\max\left(1-\frac{\lambda}{\|\mathbf{v}\|_2},\,0\right)\mathbf{v}$.
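As promised above, here is a minimal Python sketch of the fixed-point iteration on a fixed support (the initialization at $\mathbf{v}_k$, the tolerance, and the early exit when $\|\mathbf{U}_k\mathbf{x}_k\|_2$ collapses toward zero are my own choices). Because every matrix involved is diagonal, the update $T$ is elementwise and each iteration costs $O(k)$:

```python
import numpy as np

def fixed_point_solve(v_k, u_k, lam, tol=1e-12, max_iter=1000):
    """Iterate x <- (I + lam * U_k^2 / ||U_k x||_2)^{-1} v_k on a fixed support.
    All matrices are diagonal, so the update is elementwise."""
    x = v_k.copy()                      # start from the unpenalized minimizer
    for _ in range(max_iter):
        w = np.linalg.norm(u_k * x)     # ||U_k x||_2
        if w < tol:                     # iterates collapsing toward zero:
            return np.zeros_like(v_k)   # no nonzero fixed point on this support
        x_new = v_k / (1.0 + lam * u_k ** 2 / w)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    return x
```

Now back to the simpler unweighted problem (4).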
Suppose $\mathbf{x}^*\in\Omega^k$ and $\|\mathbf{v}_k\|_2>\lambda$ (otherwise the proximal step returns $\mathbf{0}$). Then \begin{align} f(\mathbf{x}_k)=&\frac{1}{2}\|\mathbf{x}_k-\mathbf{v}\|_2^2 + \lambda\|\mathbf{x}_k\|_2+\lambda_0 k\\ =&\frac{1}{2}\left\|\left(1-\frac{\lambda}{\|\mathbf{v}_k\|_2}\right)\mathbf{v}_k-\mathbf{v}_k-\mathbf{v}_k^-\right\|_2^2 + \lambda\left\|(\|\mathbf{v}_k\|_2-\lambda)\frac{\mathbf{v}_k}{\|\mathbf{v}_k\|_2}\right\|_2+\lambda_0 k\\ =&\frac{1}{2}\left\|-\frac{\lambda}{\|\mathbf{v}_k\|_2}\mathbf{v}_k-\mathbf{v}_k^-\right\|_2^2 + \lambda(\|\mathbf{v}_k\|_2-\lambda)+\lambda_0 k\\ =&\frac{1}{2}\lambda^2+\frac{1}{2}\|\mathbf{v}_k^-\|_2^2 + \lambda\|\mathbf{v}_k\|_2-\lambda^2+\lambda_0 k\\ =&-\frac{1}{2}\lambda^2 + \lambda\|\mathbf{v}_k\|_2-\frac{1}{2}\|\mathbf{v}_k\|_2^2+\frac{1}{2}\|\mathbf{v}_k\|_2^2+\frac{1}{2}\|\mathbf{v}_k^-\|_2^2+\lambda_0 k\\ =&-\frac{1}{2}(\|\mathbf{v}_k\|_2-\lambda)^2+\frac{1}{2}\|\mathbf{v}\|_2^2+\lambda_0 k, \end{align} where $\mathbf{v}_k^-=\mathbf{v}-\mathbf{v}_k$, and the cross term $\mathbf{v}_k^T\mathbf{v}_k^-$ vanishes in the fourth line because $\mathbf{v}_k$ and $\mathbf{v}_k^-$ have disjoint supports. The above implies that, for fixed $k$, the greater $\|\mathbf{v}_k\|_2$ is, the smaller the objective value, since the other terms are fixed. Thus we only need to sort the elements of $\mathbf{v}$ by absolute value in descending order and, for each $k$, take $\mathbf{v}_k$ to be the top-$k$ entries. Finally, we compare the minimum over $k=1,\ldots,n$ with $\frac{1}{2}\|\mathbf{v}\|_2^2$ (the objective value of the solution $\mathbf{x}=\mathbf{0}$); a code sketch of this procedure is appended at the end of the post. For the weighted version, I got stuck at (3). I would appreciate any instructions or comments.
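For concreteness, here is the whole $n+1$-candidate procedure for the unweighted problem (4) as a minimal Python sketch. It follows the derivation above; the variable names and the skipping of candidates with $\|\mathbf{v}_k\|_2\le\lambda$ (which the soft-threshold step sends to $\mathbf{0}$, already covered by the $k=0$ candidate) are my own:

```python
import numpy as np

def solve_unweighted(v, lam, lam0):
    """Solve (4) by checking only n+1 candidates: x = 0 and, for each k,
    the k entries of v largest in absolute value."""
    n = len(v)
    order = np.argsort(-np.abs(v))            # indices sorted by |v_i|, descending
    sq_cumsum = np.cumsum(v[order] ** 2)      # ||v_k||_2^2 for k = 1..n
    best_k, best_val = 0, 0.5 * np.dot(v, v)  # k = 0, i.e. x = 0
    for k in range(1, n + 1):
        nv_k = np.sqrt(sq_cumsum[k - 1])      # ||v_k||_2
        if nv_k <= lam:                       # soft-threshold returns 0; skip
            continue
        # objective from the derivation: -(||v_k|| - lam)^2/2 + ||v||^2/2 + lam0*k
        val = -0.5 * (nv_k - lam) ** 2 + 0.5 * np.dot(v, v) + lam0 * k
        if val < best_val:
            best_k, best_val = k, val
    x = np.zeros(n)
    if best_k > 0:                            # closed-form group-lasso solution
        S = order[:best_k]
        x[S] = (1.0 - lam / np.sqrt(sq_cumsum[best_k - 1])) * v[S]
    return x, best_val
```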