Let $A\in\mathbb R^{m\times n}$ be a real matrix. For any $x,y\in\mathbb R^m$, we write $x\leq y$ if $x_i\leq y_i$ for $i=1,\dots,m$. For any matrix $B$, $\| B \|_F$ denotes the Frobenius norm, defined by $\| B \|_F^2 := \operatorname{Tr}\left(B^T B\right)$. Recall that the trace is cyclic, $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$, and that the trace of a $1 \times 1$ matrix is its single entry.
I am trying to find a numerical method to solve the following optimization problem
\begin{align*} \min_{\substack{u\in\mathbb R^m,v\in\mathbb R^n\\ 0\leq Av}} \left\| A - u v^T \right\|_F^2 \end{align*}
For practical purposes, we have $m\ll n$: $m$ is of the order of hundreds, while $n$ is typically in the millions or more. I expect that, similarly to the iterative procedure for computing PCA, a runtime of order $O(m^2n)$ should be achievable. To a lesser extent, I am also interested in anything related to the case $n<m$.
This is the problem of finding the best rank-$1$ approximation of $A$ subject to a constraint on the approximation. The problem is not convex, but it is biconvex on a convex subset of $\mathbb R^m\times \mathbb R^n$, so my first idea was to alternate between optimizing over $u$ and over $v$. Observe that we can write
\begin{align*} \| A - uv^T \|_F^2 &= \|A\|_F^2 + \mathrm{Tr} (vu^Tuv^T) - 2 \mathrm{Tr}(v u^T A)\\ &= \|A\|_F^2 + v^Tvu^Tu - 2 u^T A v\\ &= \| A \|_F^2+\|u\|^2\cdot \|v\|^2 - 2 u^T A v \end{align*}
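This expansion is easy to sanity-check numerically (purely illustrative, on random data):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 8
A = rng.standard_normal((m, n))
u = rng.standard_normal(m)
v = rng.standard_normal(n)

# ||A - u v^T||_F^2 expanded via the cyclic property of the trace.
lhs = np.linalg.norm(A - np.outer(u, v), "fro") ** 2
rhs = np.linalg.norm(A, "fro") ** 2 + (u @ u) * (v @ v) - 2 * u @ A @ v
assert np.isclose(lhs, rhs)
```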
Differentiating w.r.t. $u$ gives $2\|v\|^2u-2Av$, and differentiating w.r.t. $v$ gives $2\|u\|^2 v - 2 A^T u$. For $v$ we need to compute the Lagrangian and the KKT conditions (which are sufficient for a fixed $u$ but not in general); with $\mu\in\mathbb R^m$ the vector of Lagrange multipliers, they are \begin{cases} 2\|u\|^2 v - 2 A^T u-A^T\mu=0\\ 0\leq Av\\ 0\leq \mu\\ \mu^TAv = 0 \end{cases}
So it is also necessary that $\| v \|^2u=Av$, together with the conditions above. I am not able to solve the KKT system for $v$ directly, but solving its Wolfe dual might be easier; it is given by \begin{align*} \max_{v,\mu} \| u \|^2\cdot \| v\|^2-2 u^TAv-\mu^TAv \end{align*} subject to $2\| u\|^2\cdot v-2A^Tu-A^T\mu=0$ and $\mu\geq 0$. Solving the constraint for $v$ and substituting back into the objective gives, after some simplifications, \begin{align*} \max_{\mu\geq 0} -(2u+\mu)^TM(2u+\mu) \end{align*} where $M:=\frac{AA^T}{4\|u\|^2}$.
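The substitution can itself be sanity-checked numerically; the following sketch verifies that eliminating $v$ via the stationarity constraint reproduces $-(2u+\mu)^TM(2u+\mu)$ with $M=\frac{AA^T}{4\|u\|^2}$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 7
A = rng.standard_normal((m, n))
u = rng.standard_normal(m)
mu = rng.random(m)  # any mu >= 0

# Solve the constraint 2||u||^2 v - 2 A^T u - A^T mu = 0 for v.
w = 2 * u + mu
v = A.T @ w / (2 * u @ u)

# Dual objective with v substituted in ...
dual = (u @ u) * (v @ v) - 2 * u @ A @ v - mu @ A @ v
# ... equals -(2u + mu)^T M (2u + mu) with M = A A^T / (4 ||u||^2).
M = A @ A.T / (4 * u @ u)
assert np.isclose(dual, -w @ M @ w)
```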
Therefore solving for $\mu$ is equivalent to solving $\min_{\mu\geq 0} (2u+\mu)^TM(2u+\mu)$. I made a second post about this here, since that subproblem is much simpler and this post is getting cluttered; I will clean everything up once a solution is found.
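Note that with $M=\frac{AA^T}{4\|u\|^2}$ this subproblem is $\min_{\mu\geq 0}\frac{1}{4\|u\|^2}\|A^T\mu+2A^Tu\|^2$, i.e. a nonnegative least-squares problem in $\mu$, so it can be handed to `scipy.optimize.nnls`. Here is a sketch of the full alternating scheme under that reduction (the function name `alternating_rank1` is mine, and no convergence tuning is attempted):

```python
import numpy as np
from scipy.optimize import nnls

def alternating_rank1(A, iters=50, seed=0):
    """Sketch: alternating minimization for min ||A - u v^T||_F^2 s.t. 0 <= Av.

    v-step: the multiplier problem min_{mu >= 0} ||A^T mu + 2 A^T u||^2
    is nonnegative least squares; v is then recovered from stationarity.
    u-step: unconstrained closed form u = A v / ||v||^2.
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(m)
    v = np.zeros(n)
    for _ in range(iters):
        mu, _ = nnls(A.T, -2 * A.T @ u)
        v = A.T @ (2 * u + mu) / (2 * u @ u)
        if v @ v == 0:  # degenerate: best feasible rank-1 update is zero
            break
        u = A @ v / (v @ v)
        if u @ u == 0:
            break
    return u, v
```

For the stated sizes ($n$ in the millions) the `nnls` call on an $n\times m$ system is the bottleneck; it is only meant to illustrate the structure of the v-step.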
Let $A^\dagger\in\mathbb R^{n\times m}$ be the Moore-Penrose pseudoinverse of $A$; in particular, $A^\dagger A$ and $A A^\dagger$ are the orthogonal projections onto $\operatorname{span}(A^T)$ and $\operatorname{span}(A)$, respectively.
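As a quick illustration of these projection identities (purely a sanity check):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 6))
P = np.linalg.pinv(A) @ A  # projection onto span(A^T)
Q = A @ np.linalg.pinv(A)  # projection onto span(A)

assert np.allclose(P @ P, P) and np.allclose(Q @ Q, Q)  # idempotent
assert np.allclose(A @ P, A) and np.allclose(Q @ A, A)  # act as identity on A
```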
Here are some facts I was able to obtain:
Fact 1: At a solution $(u,v)$ of the original problem, $uv^T = 0$ if and only if $$\operatorname{span}(A) \cap [0,\infty[^m = \{ 0\}$$
Proof : If $uv^T\neq0$, then by the KKT conditions $0\leq u=\frac{Av}{\|v\|^2}\in\mathrm{span}(A)\cap [0,\infty[^m$ with $u\neq 0$, so the intersection is nontrivial.
Conversely, if $0\neq a\in \mathrm{span}(A)\cap[0,\infty[^m$, then $b=A^\dagger a \neq 0$, and $v=\frac{b}{\|b\|^2}$ is feasible since $Ab=AA^\dagger a=a\geq 0$; moreover \begin{align*} \left\| A-\frac{ab^T}{\|b\|^2} \right\|_F^2 &= \| A \|_F^2+\frac{\|a\|^2\cdot \|b\|^2}{\| b\|^4} - 2 \frac{a^T A A^\dagger a}{\| b\|^2} \\ &=\| A \|_F^2+\frac{\|a\|^2}{\| b\|^2} - 2 \frac{\| a\|^2}{\| b\|^2} \\ &=\| A \|_F^2 - \frac{\| a\|^2}{\| b\|^2} \\ &<\| A \|_F^2 \end{align*} Therefore the optimum satisfies $uv^T\neq 0$.
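Fact 1's condition can be tested numerically with a small linear program; the following is a sketch using `scipy.optimize.linprog` (the helper name `cone_intersects_span` and the box bounds are my own choices): maximize $\mathbf 1^T Ax$ subject to $Ax\geq 0$ and $-1\leq x_i\leq 1$; the intersection is nontrivial iff the optimum is positive.

```python
import numpy as np
from scipy.optimize import linprog

def cone_intersects_span(A, tol=1e-9):
    """Check whether span(A) meets the nonnegative orthant outside {0}.

    Any nonzero a = Ax >= 0 has 1^T a > 0, and x can be scaled into the
    box, so the LP optimum is positive exactly in that case.
    """
    m, n = A.shape
    res = linprog(
        c=-A.T @ np.ones(m),        # linprog minimizes, so negate
        A_ub=-A, b_ub=np.zeros(m),  # -Ax <= 0  <=>  Ax >= 0
        bounds=[(-1.0, 1.0)] * n,
    )
    return res.fun < -tol  # optimum of the max problem exceeds tol
```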
Fact 2: If $uv^T\neq 0$ then $u\in \mathrm{span}(A)$ and $v\in\mathrm{span}(A^T)$, furthermore $u=A\frac{v}{\| v\|^2}$ and $v = A^T\frac{2u+\mu}{2\|u\|^2}$.
Proof : Trivial from KKT conditions.
Fact 3: If $(a, \sigma, b)$ is a singular triple of $A$ with $0\leq a$, then $u=a$ and $v=\sigma b$ satisfy all the KKT conditions.
Proof : $2\| v\|^2u-2Av = 2\sigma^2 a- 2 \sigma^2a=0$; selecting $\mu=0$ gives $2\|u\|^2 v - 2A^Tu=2\sigma b-2\sigma b=0$; finally $0\leq \sigma^2 u = Av$ since $0\leq a$.
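Fact 3 can be checked numerically: for an entrywise positive $A$, Perron-Frobenius guarantees a top singular triple with $0\leq a$, and the KKT residuals can then be verified directly (a sketch; the joint sign flip relies on that positivity):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.random((4, 6))  # entrywise positive, so Perron-Frobenius applies

# Top singular triple; flip signs jointly so that 0 <= a.
U, S, Vt = np.linalg.svd(A)
s = np.sign(U[:, 0].sum())
a, sigma, b = s * U[:, 0], S[0], s * Vt[0]
assert np.all(a >= -1e-12)

u, v = a, sigma * b
assert np.allclose(2 * (v @ v) * u - 2 * A @ v, 0)    # stationarity in u
assert np.allclose(2 * (u @ u) * v - 2 * A.T @ u, 0)  # stationarity in v, mu = 0
assert np.all(A @ v >= -1e-12)                        # feasibility 0 <= Av
```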