
Assume I want to minimise this

$$ \min_{x,y} \left\| A - x y^T \right\|_{\text{F}}^2$$

then I am finding the best rank-$1$ approximation of $A$ in the squared-error sense, and this can be done via the SVD by choosing $x$ and $y$ as the left and right singular vectors corresponding to the largest singular value of $A$ (with one of them scaled by that singular value).

Now, instead, is it possible to solve the following, where $b$ is also fixed?

$$ \min_{x} \left\| A - x b^T \right\|_{\text{F}}^2$$

If this is possible, is there also a way to solve

$$ \min_{x} \left\| A - x b^T \right\|_{\text{F}}^2 + \left\| C - x d^T \right\|_{\text{F}}^2$$

where I think of $x$ as the best "average" solution between the two parts of the cost function?

I am of course longing for a closed-form solution, but a nice iterative optimisation approach would also be useful.

4 Answers


$$\| \mathrm A - \mathrm x \mathrm b^{\top} \|_{\text{F}}^2 = \cdots = \| \mathrm b \|_2^2 \, \| \mathrm x \|_2^2 - \langle \mathrm A \mathrm b , \mathrm x \rangle - \langle \mathrm x , \mathrm A \mathrm b \rangle + \| \mathrm A \|_{\text{F}}^2$$

Taking the gradient of this cost function,

$$\nabla_{\mathrm x} \| \mathrm A - \mathrm x \mathrm b^{\top} \|_{\text{F}}^2 = 2 \, \| \mathrm b \|_2^2 \, \mathrm x - 2 \mathrm A \mathrm b$$

which vanishes at the minimizer

$$\mathrm x_{\min} := \color{blue}{\frac{1}{\| \mathrm b \|_2^2} \mathrm A \mathrm b}$$

Note that

$$\mathrm A - \mathrm x_{\min} \mathrm b^{\top} = \mathrm A - \mathrm A \left( \frac{ \,\,\,\mathrm b \mathrm b^{\top} }{ \mathrm b^\top \mathrm b } \right) = \mathrm A \left( \mathrm I_n - \frac{ \,\,\,\mathrm b \mathrm b^{\top} }{ \mathrm b^\top \mathrm b } \right)$$

where

  • $\frac{ \,\,\,\mathrm b \mathrm b^{\top} }{ \mathrm b^\top \mathrm b }$ is the projection matrix that projects onto the line spanned by $\mathrm b$, which we denote by $\mathcal L$.

  • $\left( \mathrm I_n - \frac{ \,\,\,\mathrm b \mathrm b^{\top} }{ \mathrm b^\top \mathrm b } \right)$ is the projection matrix that projects onto the orthogonal complement of line $\mathcal L$.

Hence, the minimum is

$$\| \mathrm A - \mathrm x_{\min} \mathrm b^{\top} \|_{\text{F}}^2 = \left\| \mathrm A \left( \mathrm I_n - \frac{ \,\,\,\mathrm b \mathrm b^{\top} }{ \mathrm b^\top \mathrm b } \right) \right\|_{\text{F}}^2 = \color{blue}{\left\| \left( \mathrm I_n - \frac{ \,\,\,\mathrm b \mathrm b^{\top} }{ \mathrm b^\top \mathrm b } \right) \mathrm A^\top \right\|_{\text{F}}^2}$$

which is the sum of the squared $2$-norms of the projections of the rows of $\rm A$ (i.e., the columns of $\rm A^\top$) onto the orthogonal complement of line $\mathcal L$.
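
For concreteness, here is a minimal MATLAB sketch that checks the closed form and the projection identity numerically on random data (the variable names are my own):

% Check x = A*b/||b||^2 and the identity A - x*b' = A*(I - b*b'/(b'*b))
n = 5;
mA = randn(n);
vB = randn(n, 1);

vX = (mA * vB) / (vB.' * vB);        % closed-form minimizer
mP = (vB * vB.') / (vB.' * vB);      % projection onto the line spanned by b

disp(norm(mA - vX * vB.' - mA * (eye(n) - mP), 'fro'))                            % ~ 0
disp(abs(norm(mA - vX * vB.', 'fro')^2 - norm((eye(n) - mP) * mA.', 'fro')^2))    % ~ 0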


  • Great Answer! +1. – Royi Aug 04 '17 at 08:38
  • For the case:

    $$ \left\| A - x b^{T} \right\|_{\text{F}}^{2} + \left\| C - x d^{T} \right\|_{\text{F}}^{2} $$

    The solution is given by $ x_{\min} = \frac{ Ab + Cd }{ \left\| b \right\|_{2}^{2} + \left\| d \right\|_{2}^{2} } $.

    – Royi May 13 '18 at 13:05
  • Though I think you should insert it into your answer. If it is a credit thing, feel free to make it your own edit. – Royi May 13 '18 at 13:07
  • I validated it before posting. – Royi May 13 '18 at 13:19
  • @Royi The different LaTeX style bothers me. It clashes. In any case, I verified your work and it is indeed correct. – Rodrigo de Azevedo May 13 '18 at 13:19
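
One way to see the formula in the comment above is to note that the two-term cost is a single-term cost in disguise,

$$\left\| \mathrm A - \mathrm x \mathrm b^{\top} \right\|_{\text{F}}^2 + \left\| \mathrm C - \mathrm x \mathrm d^{\top} \right\|_{\text{F}}^2 = \left\| \begin{bmatrix} \mathrm A & \mathrm C \end{bmatrix} - \mathrm x \begin{bmatrix} \mathrm b \\ \mathrm d \end{bmatrix}^{\top} \right\|_{\text{F}}^2,$$

so the blue closed form applied to $\begin{bmatrix} \mathrm A & \mathrm C \end{bmatrix}$ and $\begin{bmatrix} \mathrm b \\ \mathrm d \end{bmatrix}$ gives $\mathrm x_{\min} = \dfrac{\mathrm A \mathrm b + \mathrm C \mathrm d}{\| \mathrm b \|_2^2 + \| \mathrm d \|_2^2}$, matching the comment.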

This is a Convex Optimization Problem and you can easily solve it using CVX:

% Problem data: a random A and a fixed vector b
numRows = 5;

mA = randn([numRows, numRows]);
vB = randn([numRows, 1]);

cvx_begin('quiet')
    cvx_precision('best');
    variable vX(numRows)
    % Minimize || A - x b^T ||_F; squaring the norm does not change the minimizer
    minimize( norm(mA - vX * vB.', 'fro') )
cvx_end

disp([' ']);
disp(['CVX Solution Summary']);
disp(['The CVX Solver Status - ', cvx_status]);
disp(['The Optimal Value Is Given By - ', num2str(cvx_optval)]);
disp(['The Optimal Argument Is Given By - [ ', num2str(vX.'), ' ]']);
disp([' ']);

If you formulate your expression using the trace operator, you'll also be able to solve it easily by other simple methods.
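
For instance, setting the gradient of the trace formulation to zero gives the closed form $x = Ab / \|b\|_2^2$ from the first answer above; here is a minimal check against the CVX output (assuming `mA`, `vB` and the solved `vX` from the snippet above are still in the workspace; `vXClosedForm` is my own variable name):

% Closed-form minimizer from the zero-gradient condition ||b||^2 x = A b
vXClosedForm = (mA * vB) / (vB.' * vB);
disp(['Max deviation from the CVX solution - ', num2str(max(abs(vXClosedForm - vX)))]);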

Royi
  • Why use CVX to solve an unconstrained least-squares problem? – Rodrigo de Azevedo Aug 03 '17 at 21:42
  • 1
    @RodrigodeAzevedo, I was short on time and I wrote this can be easily solved in other methods as you did :-). My point was to show this is a Convex Problem hence easily solveable as the Frobenius norm is a norm. – Royi Aug 04 '17 at 08:38

You already have solutions with CVX (convex optimization), but in fact you can solve this using simple ordinary linear least squares, representing the matrix multiplications with Kronecker products. Let $M_E$ denote multiplication by $E$ (from the right) and let $v_A, v_C, v_x$ be the respective vectorizations of $A, C, x$; then we can rewrite the problem as

$$\min_{v_x}\left\{\|v_A- M_{b^T}v_x\|_2^2 + \|v_C- M_{d^T}v_x\|_2^2\right\}$$

which you can expand using $\|y\|_2^2 = y^T y$, differentiate, sum up, set the derivative equal to $0$, and solve.

You can read more about the details of vectorization in the Wikipedia entry on Kronecker products.
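
A minimal MATLAB sketch of this vectorized least-squares approach (random data and variable names of my own choosing; $M_{b^T}$ is realized as `kron(b, I)`, since $\operatorname{vec}(x b^T) = (b \otimes I)\,x$):

% Stack both terms into one ordinary least-squares problem and solve with backslash
numRows = 5;
mA = randn(numRows); mC = randn(numRows);
vB = randn(numRows, 1); vD = randn(numRows, 1);

mM = [kron(vB, eye(numRows)); kron(vD, eye(numRows))];   % stacked M_{b^T}, M_{d^T}
vRhs = [mA(:); mC(:)];                                   % stacked vec(A), vec(C)
vX = mM \ vRhs;                                          % least-squares solution

% Compare with the closed form (A*b + C*d) / (||b||^2 + ||d||^2) from the comments above
vXClosed = (mA * vB + mC * vD) / (vB.' * vB + vD.' * vD);
disp(norm(vX - vXClosed))       % ~ 0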

mathreadler

$ \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\mt{\mapsto} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\Sj{\displaystyle\sum_{j=1}^n} \def\Sk{\displaystyle\sum_{k=1}^n} \def\P{{\large\Psi}} $Generalizing your second question a bit further, consider the following function

$$\eqalign{ \P &= \Sk\frob{A_k-xb_k^T}^2 }$$

which can be rewritten using Einstein Summation and Double-Dot Products

$$\eqalign{ \P &= \LR{A_k-xb_k^T}:\LR{A_k-xb_k^T} \\ &= \LR{xb_k^T:xb_k^T} \;-\; 2\LR{A_k:xb_k^T} \;+\; \LR{A_k:A_k} \\ &= (b_k^T b_k)\LR{x:x} \;-\; 2\LR{A_k b_k}:x \;+\; \LR{A_k:A_k} \\ }$$

Calculate the gradient (NB: dummy index $k$ replaced by $j$ in the first term)

$$\eqalign{ d\P &= (b_j^Tb_j)\LR{2x:dx} \;-\; 2\LR{A_k b_k}:dx \;+\; 0 \\ &= 2\LR{b_j^Tb_j x - A_k b_k}:dx \\ \grad{\P}{x} &= 2\LR{b_j^Tb_j\,x-A_k b_k} \\ }$$

and solve the $\sf zero\ gradient$ condition for $x$

$$\eqalign{ b_j^T b_j\,x = A_k b_k \qiq \c{x = \fracLR{A_k b_k}{b_j^T b_j}\equiv\fracLR{\Sk A_k b_k}{\Sj b_j^T b_j}} \\ }$$


For reference, here are the rules for manipulating double-dot products

$$\eqalign{ F:G &= \sum_{i=1}^m\sum_{j=1}^n F_{ij}G_{ij} \;=\; \trace{F^TG} \\ G:G &= \frob{G}^2 \qquad \{ {\rm Frobenius\;norm} \}\\ F:G &= G:F \;=\; G^T:F^T \\ \LR{XY}:G &= X:\LR{GY^T} \;=\; Y:\LR{X^TG} \\ }$$

where $\LR{F,G,X,Y}$ are arbitrary matrices with dimensions

$${ \def\R#1{{\mathbb R}^{#1}} F,G \in\R{m\times n} \qquad X \in\R{m\times p} \qquad Y \in\R{p\times n} \\ }$$
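
A quick MATLAB check of the general formula against the equivalent column-concatenated problem $\| [\,A_1\ \cdots\ A_n\,] - x\,[\,b_1;\ \cdots;\ b_n\,]^T \|_F^2$ (random data, variable names of my own choosing):

% Verify x = (sum_k A_k b_k) / (sum_j b_j' b_j) on random data
m = 4; n = 3;                                 % n terms; each A_k is m x m, b_k is m x 1
cA = arrayfun(@(k) randn(m), 1:n, 'UniformOutput', false);
cB = arrayfun(@(k) randn(m, 1), 1:n, 'UniformOutput', false);

vNum = zeros(m, 1); den = 0;
for k = 1:n
    vNum = vNum + cA{k} * cB{k};              % sum_k A_k b_k
    den  = den  + cB{k}.' * cB{k};            % sum_j b_j^T b_j
end
vX = vNum / den;

% The same minimizer comes from the concatenated single-term problem
mAall = cat(2, cA{:});                        % [A_1 ... A_n]
vBall = cat(1, cB{:});                        % [b_1; ...; b_n]
disp(norm(vX - (mAall * vBall) / (vBall.' * vBall)))   % ~ 0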

greg