
$$\begin{array}{ll} \text{minimize} & \| A - B \|_F^2\\ \text{subject to} & B x = v\end{array}$$

where $B$ is an $m \times n$ matrix and $x$ is an $n$-vector whose every entry is $1/n$ (an averaging vector). In layman's terms, I want to find the 'closest' matrix to $A$ whose row averages are given by $v$.
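For concreteness, here is a tiny NumPy illustration of the setup (the entries of `A` are just an example):

```python
import numpy as np

m, n = 3, 4
A = np.arange(m * n, dtype=float).reshape(m, n)  # example matrix
x = np.full(n, 1.0 / n)                          # averaging vector

# A @ x is the vector of row averages of A
print(A @ x)            # [1.5 5.5 9.5]
print(A.mean(axis=1))   # same thing
```

So the constraint $Bx=v$ prescribes the row averages of $B$.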

Now, I might be completely off, as this is the first time I have tried to solve such a problem, but I thought I could do something like the following. Please take me through the final steps, and if my idea is completely wrong, let me know why and what I should be doing instead.

Here is my attempt so far:

$$\text{Trace}\left[(A-B) (A-B)^{\mathsf{T}}\right]$$

I was hoping I could use the Lagrange method.

These are the identities I have found useful:

$$\frac{\partial \text{Trace}[X]}{\partial x}=\text{Trace}\left[\frac{\partial X}{\partial x}\right]$$

$$\text{Trace}[A+B]=\text{Trace}[A]+\text{Trace}[B]$$

$$\frac{\partial X^{\mathsf{T}}}{\partial x}=\left(\frac{\partial X}{\partial x}\right)^{\mathsf{T}}$$

So this is the problem I thought could be the way to solve it:

$$\text{Trace}\left[(A-B) (A-B)^{\mathsf{T}}\right]-\lambda (B x-v)$$

Taking the gradient (please let me know if I am wrong), setting it to zero, and solving for the transpose of $B$, I get:

$B^{\mathsf{T}}=A^{\mathsf{T}}+\frac{\lambda x}{2}$

However, I believe that what I have done is wrong, or there is something I am missing, because now $B$ is a matrix plus a vector as far as I can see, so the dimensions do not work.

Unfortunately, I am out of my depth here. I have started to read about optimization and how to solve similar problems, but I need an answer to this quickly.

I am happy for any comments, improvements, or corrections that would make this useful for other people.

Thank you very much for any feedback!

  • The problem separates into independent optimization problems, one for each row of $B$. The $i$th subproblem requires projecting the $i$th row of $A$ onto a particular hyperplane, which is a standard linear algebra problem. – littleO Feb 03 '18 at 23:19

5 Answers


The beginning looks fine. However, note that you need one Lagrange multiplier per constraint. Thus you need a vector $\lambda$.

The function to minimize is then $$\operatorname{tr}\left[(A^T-B^T)(A-B)\right] - \lambda^T (B x -v). $$

Taking the gradient with respect to $B$, we arrive at $$2 (B-A)- \lambda x^T =0. $$

We can solve this for $B$ with the result $$ B = A + \frac12 \lambda x^T\;.$$

The Lagrange multiplier has to be determined such that the constraint is fulfilled, i.e., $$B x = Ax + \frac12 \lambda x^Tx = v\;.$$ This leads to $$ \lambda = \frac{2}{x^T x}(v -Ax)$$ and thus to the explicit solution $$ B = A + \frac{1}{\Vert x \Vert^2} (v -Ax) x^T\;. $$
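A quick numerical sanity check of this closed-form solution (a minimal sketch with made-up data; `A`, `x`, and `v` are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))
x = np.full(n, 1.0 / n)        # the averaging vector from the question
v = rng.standard_normal(m)     # desired row averages

# Explicit solution: B = A + (v - A x) x^T / ||x||^2
B = A + np.outer(v - A @ x, x) / (x @ x)

print(np.allclose(B @ x, v))   # True: the constraint B x = v holds
```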

Fabian

Find orthogonal matrices $U,V$ such that $V^{-1}x\sim e_1$ and $Uv\sim e_1$. As $\|A-B\|_F=\|U(A-B)V\|_F$, we now want to find $B'=UBV$ with $B'e_1=\lambda e_1$ such that $\|A'-B'\|_F$ is minimized, where $A'=UAV$. As the first column of $B'$ is uniquely determined by the constraint whereas all the other entries of $B'$ are free to choose, we make all entries of $B'$ in columns $2,\ldots, n$ equal to the corresponding entries of $A'$. Thus $B'y=A'y$ for all $y\perp e_1$.

Unfolding this result to the original matrices, we see that $By=Ay$ for all $y\perp x$. Thus $B$ must be of the form $$B=A+wx^T$$ where $w$ is adjusted to guarantee $Bx=v$. From $$v=Bx=Ax+wx^Tx=Ax+\|x\|^2w,$$ we find $w=\frac1{\|x\|^2}(v-Ax)$ and thus finally arrive at $$B = A+\frac{(v-Ax)x^T}{\|x\|^2}. $$
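Here is a NumPy sketch of this construction, using Householder reflections to build $U$ and $V$; the helper `householder_to_e1` and the data are my own, not part of the answer above:

```python
import numpy as np

def householder_to_e1(u):
    """Return a symmetric orthogonal H with H @ u = ||u|| * e_1."""
    u = u / np.linalg.norm(u)
    e1 = np.zeros_like(u)
    e1[0] = 1.0
    w = u - e1
    if np.linalg.norm(w) < 1e-12:      # u is already ~ e_1
        return np.eye(len(u))
    w /= np.linalg.norm(w)
    return np.eye(len(u)) - 2.0 * np.outer(w, w)

rng = np.random.default_rng(1)
m, n = 3, 4
A = rng.standard_normal((m, n))
x = np.full(n, 1.0 / n)
v = rng.standard_normal(m)

U = householder_to_e1(v)      # U v = ||v|| e_1
V = householder_to_e1(x)      # V x = ||x|| e_1; V is its own inverse

Ap = U @ A @ V                # A'
Bp = Ap.copy()                # B': keep columns 2..n of A'
Bp[:, 0] = 0.0
Bp[0, 0] = np.linalg.norm(v) / np.linalg.norm(x)   # B' e_1 = lambda e_1

B = U.T @ Bp @ V              # undo the change of basis

print(np.allclose(B @ x, v))                                  # constraint holds
print(np.allclose(B, A + np.outer(v - A @ x, x) / (x @ x)))   # same closed form
```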


Find the general solution of the linear constraint. It will be the least-squares solution plus a contribution from the null space $$\eqalign{ Bx &= v \cr B &= vx^+ + C(I-xx^+) \cr }$$ where $x^+$ is the pseudoinverse of $x$ and $C$ is an arbitrary matrix.

Substituting this expression for $B$ yields an unconstrained problem in terms of $C$ $$\eqalign{ \phi &= \|B-A\|_F^2 = (B-A):(B-A) \cr d\phi &= 2(B-A):dB \cr &= 2(B-A):dC(I-xx^+) \cr &= 2(B-A)(I-xx^+):dC \cr \frac{\partial\phi}{\partial C} &= 2(B-A)(I-xx^+) \cr }$$ Set the gradient to zero and solve for $C$ $$\eqalign{ B(I-xx^+) &= A(I-xx^+) \cr vx^+(I-xx^+) + C(I-xx^+)(I-xx^+) &= A(I-xx^+) \cr C(I-xx^+) &= A(I-xx^+) \cr }$$ Substitute this into the parametric expression for $B$ $$\eqalign{ B &= vx^+ + C(I-xx^+) \cr &= vx^+ + A(I-xx^+) \cr &= A + (v-Ax)x^+ \cr }$$ Note that for a vector, we can write an explicit expression for the pseudoinverse $$x^+ = \frac{x^T}{x^Tx}$$ The nice thing about this approach is that it holds when the vectors $(x,v)$ are replaced by matrices $(X,V)$.

In the above, the trace/Frobenius product is denoted by a colon, i.e. $$A:B = {\rm tr}(A^TB)$$
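A numerical sketch of this approach using `np.linalg.pinv` (made-up data), including the matrix-valued case $(X,V)$ mentioned above; the constraint $BX=V$ is feasible here because the random $X$ has full column rank:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 3, 5, 2
A = rng.standard_normal((m, n))

# Vector case: the explicit formula x^+ = x^T / (x^T x)
x = rng.standard_normal(n)
print(np.allclose(np.linalg.pinv(x[:, None]), x[None, :] / (x @ x)))  # True

# Matrix-valued constraint B X = V (the vector problem is k = 1)
X = rng.standard_normal((n, k))
V = rng.standard_normal((m, k))

Xp = np.linalg.pinv(X)          # pseudoinverse X^+
B = A + (V - A @ X) @ Xp        # B = A + (V - A X) X^+

print(np.allclose(B @ X, V))    # True: the constraint holds
```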

frank

One thing you could use is the fact that the rows are independent: for row vectors $a_i,b_i$, you can minimize $\|a_i-b_i\|^2$ subject to $b_i\cdot \textbf 1=nv_i$ separately for each row.

We can use the Lagrange method directly on each scalar element to get $$b_{ij}=a_{ij}+v_i-\overline{a_i}$$

where $\overline{a_i}$ is the average of row $a_i$. Or as a matrix,

$$B=A+(v-Ax)\textbf{1}^{\mathsf T}$$ where $\textbf 1$ is the all-ones $n$-vector (note $n x^{\mathsf T} = \textbf 1^{\mathsf T}$).
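A quick check of this special case (made-up data; recall $x=\textbf 1/n$, so $Ax$ is the vector of row means):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 4
A = rng.standard_normal((m, n))
v = rng.standard_normal(m)          # desired row averages

# B = A + (v - A x) 1^T, and A x is just the vector of row means
B = A + np.outer(v - A.mean(axis=1), np.ones(n))

print(np.allclose(B.mean(axis=1), v))   # True: new row averages equal v
```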

Akababa

The constraint $\bf Bx=v$ holds if and only if $\|{\bf Bx-v}\|_2^2=0$. So you can add this norm to your cost function as a penalty:

$${\bf B}=\arg\min_{\bf B}\{\|{\bf A-B}\|_F^2+\lambda\|{\bf Bx-v}\|_2^2\}$$

The larger $\lambda$ is, the more strongly the constraint is enforced.

Then, finally, you can express the matrix–vector product $\bf Bx$ with vectorization and Kronecker products, which turns this into an ordinary least-squares problem.
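Here is a sketch of that formulation (made-up data). It uses the identity $\operatorname{vec}({\bf Bx})=({\bf x}^{\mathsf T}\otimes I_m)\operatorname{vec}({\bf B})$ with column-stacking $\operatorname{vec}$, and a large but finite $\lambda$, so the constraint only holds approximately:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 4
A = rng.standard_normal((m, n))
x = np.full(n, 1.0 / n)
v = rng.standard_normal(m)
lam = 1e8                                 # large penalty ~ hard constraint

K = np.kron(x[None, :], np.eye(m))        # x^T kron I_m, shape (m, m*n)
a = A.flatten(order="F")                  # vec(A), column-stacking

# Normal equations of ||b - a||^2 + lam * ||K b - v||^2 over b = vec(B)
b = np.linalg.solve(np.eye(m * n) + lam * (K.T @ K), a + lam * (K.T @ v))
B = b.reshape((m, n), order="F")

print(np.allclose(B @ x, v, atol=1e-6))   # constraint approximately satisfied
print(np.allclose(B, A + np.outer(v - A @ x, x) / (x @ x), atol=1e-6))
```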

mathreadler