
I would like to least-squares-"solve" a set of linear equations ($\underset{\mathbf{x}}{\mathrm{argmin}}\; \|\mathbf{Ax-b}\|_2$).

In my case, $\mathbf{b=0}$, i.e. the system is homogeneous. I also happen to know that all parameters must be non-negative, $\mathbf{x} \geq 0$. (The math works out because half of the coefficients are negative.)

  • I am assuming that enforcing $\|\mathbf{x}\|_2=1$, i.e. constraining the solution to a hypersphere of constant radius, will take care of the homogeneity (otherwise we would simply get the trivial solution $\mathbf{x}=\mathbf{0}$).

  • I have found iterative methods (based on quadratic programming) to deal with the non-negativity constraint.

I have, however, not been able to find a way to combine the two constraints. When I add the fixed-norm constraint to the QP, the solver complains that the problem is non-convex. Intuitively, I had assumed this was a convex problem. Am I wrong, and if so, why? If not, how can I solve this problem in software?

sk29910
  • With the unit norm constraint, the problem is not convex because $(1,0,0,\ldots)$ and $(0,1,0,\ldots)$ are in the domain but the line segment joining them is not. –  May 12 '14 at 02:36
  • If you're willing to settle for enforcing $\|\mathbf x\|_1=1$ instead, then in conjunction with $\mathbf x\ge0$ this reduces to $\mathbf 1^T\mathbf x=1$ where $\mathbf 1$ is the vector of all ones, and you will obtain a quadratic programming problem that you can solve. –  May 12 '14 at 02:43
  • I see. Is there any downside to ditching the squares and going with a simple sum? – sk29910 May 12 '14 at 02:46
  • Yes: you will obtain a different answer with the one-norm than you would with the two-norm. But it is not clear to me that the answer you get from the two-norm is the only answer you want, anyway. –  May 12 '14 at 02:49
  • That isn't clear to me either. :) Is there any way to quantify the difference between the two? Are there other ways of "solving" homogeneous overconstrained systems? I appreciate your help! – sk29910 May 12 '14 at 03:03

2 Answers


Following Rahul's comment, let us consider the convex optimization problem \begin{equation} \min_{\mathbf{x}} \|\mathbf{Ax}\|_2 ~~~\text{subject to } \mathbf{x} \geq 0, ~~\mathbf 1^T\mathbf x =\alpha. \end{equation} For some $\alpha^\star$, the solution $x^\star$ of the problem above will coincide with the solution of your original problem.

For any $\alpha_1 < \alpha^\star$, it follows that $\|x_1^\star\| < 1$. On the other hand, for $\alpha_2 > \alpha^\star$, it follows that $\|x_2^\star\| > 1$ (this is of course not a rigorous proof). Therefore, one way to solve your optimization problem is to search for the true $\alpha$ by solving the problem stated above combined with a bisection search:

Start with $\alpha_1=0$ and $\alpha_2$ large enough that $\|x_2^\star\|>1$.

Try $\alpha=(\alpha_1+\alpha_2)/2$. If the corresponding solution of the convex optimization problem has a norm greater than 1, set $\alpha_2:=\alpha$; otherwise set $\alpha_1:=\alpha$. Then repeat.
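
A minimal sketch of this bisection, assuming CVXPY and NumPy are available; the matrix `A` below is a random placeholder for your actual coefficient matrix:

```python
# Sketch only: bisection on alpha, solving the convex subproblem with CVXPY.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))   # placeholder data (mixed-sign entries)

def solve_scaled_simplex(alpha):
    """min ||A x||_2  subject to  x >= 0,  1^T x = alpha."""
    x = cp.Variable(A.shape[1])
    cp.Problem(cp.Minimize(cp.norm(A @ x, 2)),
               [x >= 0, cp.sum(x) == alpha]).solve()
    return x.value

alpha_lo, alpha_hi = 0.0, 10.0     # alpha_hi chosen large enough that ||x|| > 1
for _ in range(50):                # bisect until ||x||_2 is (numerically) 1
    alpha = 0.5 * (alpha_lo + alpha_hi)
    x = solve_scaled_simplex(alpha)
    if np.linalg.norm(x) > 1.0:
        alpha_hi = alpha
    else:
        alpha_lo = alpha

print("alpha* ~", alpha, " ||x||_2 =", np.linalg.norm(x))
```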

  • That's an interesting approach, and I can picture why this is true, so I'll accept this. Is there a name for doing it like this, or some reference work where such tricks are listed? – sk29910 May 12 '14 at 19:29
  • To the best of my knowledge, there is no special name for this technique, but a quite similar strategy has been applied in "Quality of service and max-min fair transmit beamforming to multiple cochannel multicast groups" by Karipidis and Sidiropoulos. – The Pheromone Kid May 12 '14 at 19:41

Just for theoretical purposes, I'll write down the optimality conditions for both cases, $\|x\|_1 = 1$ and $\|x\|_2 = 1$.

The positivity constraint can be handled through the change of variable $x = y^{\odot 2}$, where $^{\odot 2}$ denotes a component-wise square. The problem becomes

$$\min_{\|y^{\odot 2}\|=1} \|Ay^{\odot 2}\|_2^2 = \min_{\|y^{\odot 2}\|=1} (y^{\odot 2})^\top A^\top A y^{\odot 2}\label{optimization}\tag{1}$$

It is possible to derive the following formulas, slightly more general than those I proved here: for any square matrix $M$ and any column vector $c$ of suitable size, and for any positive integer $k$, it holds that \begin{align} &\frac{\mathrm{d}}{\mathrm{d}y} (y^{\odot 2})^\top M y^{\odot 2} = 2\left((M+M^\top)y^{\odot 2}\right)\odot y\\ &\frac{\mathrm{d}}{\mathrm{d}y} c^\top y^{\odot k} = k\, c \odot y^{\odot (k-1)} \end{align} where $\odot$ denotes the element-wise product, also known as the Hadamard product. We will use these formulas to write the optimality condition $\frac{\mathrm{d}}{\mathrm{d}y} \mathcal{F}(y) = 0$, where $\mathcal{F}(y)$ is the cost function in \eqref{optimization}.
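
As a quick sanity check, the two identities above can be verified numerically with finite differences; this is just a sketch with random placeholder data ($M$, $c$, $y$), using NumPy only:

```python
# Sketch only: central-difference check of the two derivative formulas.
import numpy as np

rng = np.random.default_rng(1)
n, k, eps = 4, 4, 1e-6
M = rng.standard_normal((n, n))
c = rng.standard_normal(n)
y = rng.standard_normal(n)

def num_grad(f, y):
    """Central-difference gradient of a scalar function f at y."""
    g = np.zeros_like(y)
    for i in range(n):
        e = np.zeros_like(y)
        e[i] = eps
        g[i] = (f(y + e) - f(y - e)) / (2 * eps)
    return g

# d/dy (y^{o2})^T M y^{o2} = 2 ((M + M^T) y^{o2}) o y
f1 = lambda y: (y**2) @ M @ (y**2)
print(np.allclose(num_grad(f1, y), 2 * ((M + M.T) @ y**2) * y, atol=1e-5))

# d/dy c^T y^{ok} = k c o y^{o(k-1)}
f2 = lambda y: c @ y**k
print(np.allclose(num_grad(f2, y), k * c * y**(k - 1), atol=1e-5))
```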

CASE $\|x\|_1 = 1$

The Lagrangian is $$\mathcal{L}(y,\lambda) = \mathcal{F}(y) - \lambda (\mathbb{1}^\top y^{\odot 2} -1),$$ where $\mathbb{1}$ denotes a column vector of ones. We obtain the optimality conditions $$\boxed{\begin{aligned} &\left[2A^\top Ay^{\odot 2}-\lambda\mathbb{1}\right]\odot y = 0\\ &\mathbb{1}^\top y^{\odot 2} = 1 \end{aligned}}$$ In the $x$ variable, these conditions become $$\boxed{\begin{aligned} &\left[2A^\top Ax-\lambda\mathbb{1}\right]\odot x = 0\\ &\mathbb{1}^\top x = 1\\ &x \ge 0 \end{aligned}}$$

CASE $\|x\|_2 = 1$

The Lagrangian is $$\mathcal{L}(y,\lambda) = \mathcal{F}(y) - \lambda (\mathbb{1}^\top y^{\odot 4} -1).$$ We obtain the (quite different) optimality conditions $$\boxed{\begin{aligned} &\left[(A^\top A-\lambda I)y^{\odot 2}\right]\odot y = 0\\ &\mathbb{1}^\top y^{\odot 4} = 1 \end{aligned}}$$ In the $x$ variable, these conditions become $$\boxed{\begin{aligned} &\left[(A^\top A-\lambda I)x\right]\odot x = 0\\ &\mathbb{1}^\top x^{\odot 2} = 1\\ &x \ge 0. \end{aligned}}$$
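
Purely as an illustration, here is a sketch (not part of the derivation above) of how the change of variables could be used numerically for the $\|x\|_2 = 1$ case with SciPy's SLSQP; `A` is again a random placeholder, and since the problem is non-convex the solver may only return a local minimum:

```python
# Sketch only: solve the ||x||_2 = 1 case via the substitution x = y**2,
# minimizing ||A y^{o2}||^2 subject to sum(y^4) = 1 with SLSQP.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 5))          # placeholder data (mixed-sign entries)
n = A.shape[1]

objective  = lambda y: np.sum((A @ y**2) ** 2)                 # ||A y^{o2}||_2^2
constraint = {"type": "eq", "fun": lambda y: np.sum(y**4) - 1} # ||y^{o2}||_2 = 1

y0 = np.full(n, (1.0 / n) ** 0.25)        # feasible start: sum(y0^4) = 1
res = minimize(objective, y0, method="SLSQP", constraints=[constraint])

x = res.x ** 2                            # recover x >= 0 with ||x||_2 = 1
print("||x||_2 =", np.linalg.norm(x), "  ||Ax||_2 =", np.linalg.norm(A @ x))
```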

MathMax