
Let $A\in\mathbb R^{m\times n}$ be a real matrix. For $x,y\in\mathbb R^m$, we write $x\leq y$ if $x_i\leq y_i$ for $i=1,\dots,m$. For any matrix $B$, $\| B \|_F$ denotes the Frobenius norm, defined by $\| B \|_F^2 := \operatorname{Tr}\left(B^T B\right)$; recall that the trace has the cyclic property $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$ and that the trace of a $1 \times 1$ matrix is its single entry.

I am trying to find a numerical method to solve the following optimization problem

\begin{align*} \min_{\substack{u\in\mathbb R^m,v\in\mathbb R^n\\ 0\leq Av}} \left\| A - u v^T \right\|_F^2 \end{align*}

For practical purposes, we have $m\ll n$: $m$ is of the order of hundreds, while $n$ is typically in the millions. Similarly to the iterative computation procedure for PCA, I am hoping for a runtime of order $O(m^2n)$. To a lesser extent, I am also interested in anything related to the case $n<m$.

This is the problem of finding the best rank-$1$ approximation of $A$, with a constraint on that approximation. The problem is not convex, but it is biconvex on a convex subset of $\mathbb R^m\times \mathbb R^n$, so my first idea was to alternate between optimizing over $u$ and over $v$. Observe that we can write

\begin{align*} \| A - uv^T \|_F^2 &= \|A\|_F^2 + \mathrm{Tr} (vu^Tuv^T) - 2 \mathrm{Tr}(v u^T A)\\ &= \|A\|_F^2 + v^Tvu^Tu - 2 u^T A v\\ &= \| A \|_F^2+\|u\|^2\cdot \|v\|^2 - 2 u^T A v \end{align*}
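This expansion is easy to sanity-check numerically; a minimal sketch (the sizes and random data are arbitrary, chosen only for illustration):

```python
import numpy as np

# Numerical check of the expansion
#   ||A - u v^T||_F^2 = ||A||_F^2 + ||u||^2 ||v||^2 - 2 u^T A v
# on random data.
rng = np.random.default_rng(0)
m, n = 5, 8
A = rng.standard_normal((m, n))
u = rng.standard_normal(m)
v = rng.standard_normal(n)

lhs = np.linalg.norm(A - np.outer(u, v), "fro") ** 2
rhs = np.linalg.norm(A, "fro") ** 2 + (u @ u) * (v @ v) - 2 * u @ A @ v
print(abs(lhs - rhs) < 1e-9)  # True
```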

Differentiating w.r.t. $u$ gives $2\|v\|^2u-2Av$, and w.r.t. $v$ gives $2\|u\|^2 v - 2 A^T u$. For $v$ we need to compute the Lagrangian and the KKT conditions (which are sufficient for a fixed $u$, but not in general); with $\mu\in\mathbb R^m$ the vector of Lagrange multipliers, they are \begin{cases} 2\|u\|^2 v - 2 A^T u-A^T\mu=0\\ 0\leq Av\\ 0\leq \mu\\ \mu^TAv = 0 \end{cases}

Stationarity in $u$ also requires $\| v \|^2u=Av$, in addition to the previous conditions. I am not able to solve the system of KKT conditions for $v$, but solving its Wolfe dual problem might be easier. It is given by \begin{align*} \max_{v,\mu} \| u \|^2\cdot \| v\|^2-2 u^TAv-\mu^TAv \end{align*} under the constraints $2\| u\|^2\cdot v-2A^Tu-A^T\mu=0$ and $\mu\geq 0$. Solving the constraint for $v$ and substituting back into the objective gives, after some simplifications, \begin{align*} \max_{\mu\geq 0} -(2u+\mu)^TM(2u+\mu) \end{align*} where $M := \frac{AA^T}{4\|u\|^2}$ (the positive scalar factor does not affect the optimizer).

Therefore solving for $\mu$ is equivalent to solving $\min_{\mu\geq 0} (2u+\mu)^TM(2u+\mu)$. I made a second post about this here, because it is much simpler and this post is beginning to get cluttered; I will clean everything up when a solution is found.
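The simplification of the Wolfe dual can be checked numerically. In the sketch below I take $M = AA^T/(4\|u\|^2)$, which is my reading of the simplification (a positive factor does not change the optimizer over $\mu$):

```python
import numpy as np

# Check of the Wolfe-dual simplification: substituting the stationarity
# constraint v = A^T (2u + mu) / (2 ||u||^2) into the dual objective
#   ||u||^2 ||v||^2 - 2 u^T A v - mu^T A v
# gives -(2u + mu)^T M (2u + mu) with M = A A^T / (4 ||u||^2).
rng = np.random.default_rng(1)
m, n = 4, 7
A = rng.standard_normal((m, n))
u = rng.standard_normal(m)
mu = rng.random(m)             # mu >= 0
w = 2 * u + mu
v = A.T @ w / (2 * u @ u)      # stationarity constraint solved for v

dual = (u @ u) * (v @ v) - 2 * u @ A @ v - mu @ A @ v
M = A @ A.T / (4 * u @ u)
print(abs(dual - (-(w @ M @ w))) < 1e-9)  # True
```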

Let $A^\dagger\in\mathbb R^{n\times m}$ be the Moore-Penrose pseudoinverse of $A$, in particular $A^\dagger A$ and $A A^\dagger$ are the respective projections onto the span of $A^T$ and $A$.
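A quick numerical reminder of these projection identities (sizes are arbitrary):

```python
import numpy as np

# A A^+ is the orthogonal projector onto span(A) (column space),
# and A^+ A is the orthogonal projector onto span(A^T) (row space).
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 6))
P = A @ np.linalg.pinv(A)          # projector onto span(A)
Q = np.linalg.pinv(A) @ A          # projector onto span(A^T)
print(np.allclose(P @ P, P), np.allclose(P @ A, A))
print(np.allclose(Q @ Q, Q), np.allclose(A @ Q, A))
```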

Here are some facts I could obtain:


Fact 1: At an optimal solution $u,v$ of the original problem, $uv^T = 0$ if and only if $$\operatorname{span}(A) \cap [0,\infty[^m = \{ 0\}$$

Proof: If $uv^T\neq0$, then by the KKT conditions $0\leq u=\frac{Av}{\|v\|^2}\in\mathrm{span}(A)\cap [0,\infty[^m$, and $u\neq 0$, so the intersection is not $\{0\}$.

Conversely, if $0\neq a\in \mathrm{span}(A)\cap[0,\infty[^m$, then $b=A^\dagger a \neq 0$, the pair $(a/\|b\|^2, b)$ is feasible since $Ab=AA^\dagger a=a\geq 0$, and \begin{align*} \left\| A-\frac{ab^T}{\|b\|^2} \right\|_F^2 &= \| A \|_F^2+\frac{\|a\|^2\cdot \|b\|^2}{\| b\|^4} - 2 \frac{a^T A A^\dagger a}{\| b\|^2} \\ &=\| A \|_F^2+\frac{\|a\|^2}{\| b\|^2} - 2 \frac{\| a\|^2}{\| b\|^2} \\ &=\| A \|_F^2 - \frac{\| a\|^2}{\| b\|^2} \\ &<\| A \|_F^2 \end{align*} Therefore the zero matrix is suboptimal, i.e. $uv^T\neq 0$.
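The construction in this proof can be verified numerically; here $A$ is taken entrywise nonnegative so that $\mathrm{span}(A)$ certainly meets the cone (the test data is my choice):

```python
import numpy as np

# Check of the construction in Fact 1: for nonzero a in span(A) with
# a >= 0 and b = A^+ a, the rank-1 matrix a b^T / ||b||^2 satisfies
#   ||A - a b^T/||b||^2||_F^2 = ||A||_F^2 - ||a||^2/||b||^2 < ||A||_F^2.
rng = np.random.default_rng(3)
A = rng.random((4, 9))            # entrywise nonnegative demo matrix
a = A @ rng.random(9)             # nonnegative combination of columns => a >= 0
b = np.linalg.pinv(A) @ a

lhs = np.linalg.norm(A - np.outer(a, b) / (b @ b), "fro") ** 2
rhs = np.linalg.norm(A, "fro") ** 2 - (a @ a) / (b @ b)
print(abs(lhs - rhs) < 1e-8, lhs < np.linalg.norm(A, "fro") ** 2)
```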


Fact 2: If $uv^T\neq 0$ then $u\in \mathrm{span}(A)$ and $v\in\mathrm{span}(A^T)$, furthermore $u=A\frac{v}{\| v\|^2}$ and $v = A^T\frac{2u+\mu}{2\|u\|^2}$.

Proof: Immediate from the KKT conditions.


Fact 3: If $(a, \sigma, b)$ is a singular triple of $A$ (so $Ab=\sigma a$, $A^Ta=\sigma b$, $\|a\|=\|b\|=1$) with $0\leq a$, then $u=a$ and $v=\sigma b$ satisfy all KKT conditions.

Proof: $2\| v\|^2u-2Av = 2\sigma^2 a- 2 \sigma^2a=0$; selecting $\mu=0$ gives $2\|u\|^2 v - 2A^Tu=2\sigma b-2\sigma b=0$; finally $0\leq \sigma^2 a = Av$, and complementary slackness $\mu^TAv=0$ holds trivially.
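This can be checked on a nonnegative matrix, where Perron-Frobenius guarantees that the top left singular vector can be chosen entrywise nonnegative (the test setup is mine):

```python
import numpy as np

# Check of Fact 3: for a nonnegative matrix, the top left singular vector
# a can be chosen with a >= 0, and then u = a, v = sigma * b satisfies
# all KKT conditions with mu = 0.
rng = np.random.default_rng(4)
A = rng.random((4, 6))
U, S, Vt = np.linalg.svd(A)
a, sigma, b = U[:, 0], S[0], Vt[0]
if a.sum() < 0:                    # fix the global sign so that a >= 0
    a, b = -a, -b

u, v = a, sigma * b
print(np.all(a >= -1e-12))                             # sign choice works
print(np.allclose(2 * (v @ v) * u - 2 * A @ v, 0))     # stationarity in u
print(np.allclose(2 * (u @ u) * v - 2 * A.T @ u, 0))   # stationarity in v (mu = 0)
print(np.all(A @ v >= -1e-9))                          # primal feasibility 0 <= Av
```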

P. Quinton
  • What stops you from simply using any off-the-shelf optimization tool that supports linear constraints, like scipy.optimize.minimize? If you want a specially optimized solver for this problem, it could be worth looking into biconvex optimization. – Hyperplane Feb 28 '23 at 10:45
  • @Hyperplane Thank you for the input. I tried scipy, which seems to work, but it runs rather slowly (I have to run this many times), so I will take a look at biconvex optimization; if you have any source or input on that subject, it would be most welcome. – P. Quinton Feb 28 '23 at 13:39
  • This is related: http://www2.math.uni-wuppertal.de/~klamroth/publications/gopfkl07.pdf – P. Quinton Feb 28 '23 at 13:40
  • If speed is an issue, you should give some information about the size of the problem. Also consider using JAX to get the Jacobian and Hessian in JIT-compiled form and consider offloading to GPU if it is a large scale problem. – Hyperplane Feb 28 '23 at 13:46
  • @Hyperplane done, essentially we have $m\ll n$. I am not very familiar with the tools you mention; I will look into them, but if there is a way to solve this using biconvex optimization by alternating optimization over two parameters, especially if they are in $\mathbb R^m$, then I think this would be very nice. – P. Quinton Mar 01 '23 at 07:54
  • @RodrigodeAzevedo Yes I am aware that the singular triple with maximal singular value would be the answer without the constraint $0\leq Av$, now the thing is that in general the maximal singular value may not be in the positive cone, and actually there may be no all positive left singular vector. – P. Quinton Mar 01 '23 at 08:06
  • There could potentially be a problem here with unbounded solutions. If you investigate the $2\times 2$ case, it seems that for $A=I_2$, the solution is obtained for $u=[a,a]$ and $v=[x,x]$ with $ax=\frac12$ and $x\geq 0$. In particular, we have an unbounded solution manifold: $x=1/\varepsilon$ and $a=\varepsilon/2$ is a solution for any $\varepsilon>0$. I suggest you consider regularization of the problem. – Hyperplane Mar 01 '23 at 14:02
  • This might be the reason that https://www.geno-project.org/ bugs out for this problem. – Hyperplane Mar 01 '23 at 14:12
  • @Hyperplane The problem is equivalent to $\min\left\{ \| A - \sigma u v^T \|_F^2 : 0\leq \sigma,\ 0\leq u,\ \| u\|=1,\ \| v\|=1 \right\}$. This makes the parameter space not convex, but maybe your solver would be ok with that. – P. Quinton Mar 01 '23 at 14:45

2 Answers


You could try using https://www.geno-project.org with the query

parameters 
    matrix A
variables
    vector u, v
min
    norm2(A - u * v')^2
st
    A*v >= vector(0)

Their library https://github.com/slaue/genosolver/ automatically generates a GPU-capable solver based on L-BFGS-B.
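For comparison with the scipy route mentioned in the comments, here is a minimal sketch using scipy.optimize.minimize with SLSQP and the constraint $0\le Av$; the demo matrix, the sizes, and the starting point are my own choices, not part of the question:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m, n = 5, 12
A = rng.random((m, n))            # nonnegative demo matrix

def objective(x):
    # Stacked variable x = (u, v); objective ||A - u v^T||_F^2.
    u, v = x[:m], x[m:]
    return np.linalg.norm(A - np.outer(u, v), "fro") ** 2

# Feasible start: v0 = 1 gives A v0 >= 0 here; u0 is the optimal u for v0.
v0 = np.ones(n)
u0 = A @ v0 / (v0 @ v0)
x0 = np.concatenate([u0, v0])

# "ineq" means fun(x) >= 0, i.e. the constraint 0 <= A v.
res = minimize(objective, x0, method="SLSQP",
               constraints=[{"type": "ineq", "fun": lambda x: A @ x[m:]}])
print(np.all(A @ res.x[m:] >= -1e-6))               # constraint holds
print(res.fun <= np.linalg.norm(A, "fro") ** 2)     # beats the zero approximation
```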


Hyperplane

First of all, $n$ is almost totally irrelevant. You have $m$ vectors $a_i$ (the rows of $A$) and you are searching for a vector $v$ that has non-negative scalar products with all $a_j$ and maximizes the sum of squares of the projections of the $a_j$ onto the direction of $v$. Clearly, such a $v$ lies in the linear span of the $a_i$, so only the Gram matrix of the $a_i$ matters for finding the coefficients of the appropriate linear combination. Forming it takes time $m^2n$, indeed, but from there on you work in $\mathbb R^m$, so life becomes much easier.
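This reduction can be sketched in a few lines: after forming the Gram matrix $G=AA^T$ (the only $O(m^2n)$ step), every quantity in the problem can be expressed through $G$ and a coefficient vector $c\in\mathbb R^m$ with $v=A^Tc$ (the variable names are mine):

```python
import numpy as np

# Gram reduction: for v = A^T c we have A v = G c, ||v||^2 = c^T G c,
# and the sum of squared projections ||A v||^2 = c^T G^2 c, so the
# search for v lives entirely in R^m once G = A A^T is computed.
rng = np.random.default_rng(5)
m, n = 4, 1000
A = rng.standard_normal((m, n))
G = A @ A.T                        # m x m; the only place n enters

c = rng.standard_normal(m)
v = A.T @ c                        # v in the span of the rows of A
print(np.allclose(A @ v, G @ c))
print(np.isclose(v @ v, c @ G @ c))
print(np.isclose((A @ v) @ (A @ v), c @ G @ G @ c))
```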

Second, consider the simplest non-trivial case: two unit vectors in the upper half-plane of $\mathbb R^2$ making angles $\alpha$ and $\pi-\alpha$ with the positive $x$-semiaxis, for small $\alpha$. The admissible directions $v$ are those making an angle $\theta$, $|\theta|<\alpha$, with the vertical semiaxis, and the largest sum of squares of scalar products is attained at the endpoints of the corresponding arc on the unit circle ($\sin^2 2\alpha+0>2\sin^2\alpha$). So, if you change the length of one of the vectors slightly, you will have two local maxima, one of which is better than the other. If you use some Lagrange-multiplier or descent-related technique, you may easily end up in the wrong corner and get stuck there.

In higher dimensions, your problem is to find the point furthest from the origin (with respect to the metric given by the quadratic form $\sum_i(a_i,v)^2$) in the intersection of the unit ball with the cone given by the inequalities $(a_i,v)\ge 0$. So you are searching for the maximum of a quadratic form over a fairly irregular convex set, with the possibility of a lot of local maxima that have little to do with the global one. This is an ugly problem in general, and the only approach I can offer is to use something like differential evolution and pray that it ends up at a decently high value.
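A rough sketch of the differential-evolution suggestion on the Gram-reduced problem, using scipy.optimize.differential_evolution with the cone constraint $Gc\ge 0$. Maximizing the ratio $(c^TG^2c)/(c^TGc)$ is my rephrasing of the "sum of squared projections per unit length" objective, and the setup below is illustrative only:

```python
import numpy as np
from scipy.optimize import NonlinearConstraint, differential_evolution

rng = np.random.default_rng(6)
m, n = 3, 50
A = rng.random((m, n))
G = A @ A.T                       # Gram matrix of the rows

def neg_objective(c):
    # Negative of the Rayleigh-type ratio (c^T G^2 c)/(c^T G c),
    # which equals ||A v||^2 / ||v||^2 for v = A^T c.
    Gc = G @ c
    denom = c @ Gc
    if denom < 1e-12:
        return 0.0
    return -(Gc @ Gc) / denom

cone = NonlinearConstraint(lambda c: G @ c, 0.0, np.inf)   # constraint G c >= 0
res = differential_evolution(neg_objective, bounds=[(-1.0, 1.0)] * m,
                             constraints=(cone,), maxiter=100)
sigma_max_sq = np.linalg.eigvalsh(G)[-1]
print(-res.fun <= sigma_max_sq + 1e-6)   # the ratio is bounded by sigma_1^2
```

The bound in the last line holds for any candidate, since the ratio is a Rayleigh quotient of $G$ after the substitution $w=G^{1/2}c$.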

fedja
  • Thank you very much for your answer, it makes sense. I am currently experimenting a bit with the following method: we alternate between solving for $u$ and $\mu$, because they are low-dimensional. For a fixed $\mu_t$, we let $u_{t+1}\propto M(2u_t+\mu_t)$, and then $\mu_{t+1}$ is defined by a step in the projected gradient direction of the $(2u+\mu)^T M(2u+\mu)$ problem, where I also minimize its norm. If you want to describe this on the other question today, I will give the reward to you (therefore a single step based on gradient, projection and minimization of the step norm). – P. Quinton Mar 20 '23 at 08:53
  • PS: I would be interested in any improvement of my previous method, or in any method that you think could be efficient, at least at finding a saddle point of the problem if not the best one. – P. Quinton Mar 20 '23 at 08:55