Given a vector $z \in \mathbb{R}^n$ and $k < n$, finding the best $k$-sparse approximation to $z$ in the Euclidean distance means solving $$\min_{\{x \in \mathbb{R}^n : ||x||_0 \le k\}} ||z - x||_2.$$ This is easily done by choosing $x$ to agree with $z$ on the $k$ components of largest absolute value and to be zero in every other component (hard thresholding).
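For concreteness, the hard-thresholding rule can be sketched in a few lines of plain Python (the helper name `best_k_sparse` is my own, and ties among equal-magnitude entries are broken arbitrarily):

```python
def best_k_sparse(z, k):
    # Indices of the k entries of z with largest absolute value.
    top = sorted(range(len(z)), key=lambda i: abs(z[i]), reverse=True)[:k]
    keep = set(top)
    # Keep those entries, zero out every other component.
    return [z[i] if i in keep else 0.0 for i in range(len(z))]

# Example: keep the 2 largest-magnitude entries of [3, -5, 1, 2].
approx = best_k_sparse([3.0, -5.0, 1.0, 2.0], 2)  # -> [3.0, -5.0, 0.0, 0.0]
```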
I am now wondering what happens if we slightly modify the question to allow other metrics on $\mathbb{R}^n$. For example, what if we instead try to solve $$\min_{\{x \in \mathbb{R}^n : ||x||_0 \le k\}} (x-z)^TA(x-z)$$ for a symmetric positive definite matrix $A$? Note that for $A = I$ this reduces to the Euclidean case above, but a general $A$ couples the components, so the componentwise thresholding argument no longer applies. I suspect this is much harder, but are there good algorithms for it?