
$$\begin{array}{ll} \text{minimize} & \| A - B \|_F^2\\ \text{subject to} & B x = v\end{array}$$

where $B$ is an $m \times n$ matrix and $x$ is an $n$-vector whose every entry is $1/n$ (an averaging vector). In layman's terms, I want to find the 'closest' matrix to $A$ whose row averages are given by $v$.
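For concreteness, here is a tiny NumPy illustration of the setup (the entries of `A` are just an example):

```python
import numpy as np

m, n = 3, 4
A = np.arange(m * n, dtype=float).reshape(m, n)  # example matrix
x = np.full(n, 1.0 / n)                          # averaging vector

# A @ x is the vector of row averages of A
print(A @ x)            # [1.5 5.5 9.5]
print(A.mean(axis=1))   # same thing
```

So the constraint $Bx=v$ prescribes the row averages of $B$.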

Now, I might be completely off, as this is the first time I have tried to solve such a problem, but I thought I could do something like the following. Please take me through the final steps, and if my idea is completely wrong, let me know why and what I should be doing instead.

Here is my attempt so far:

$$\text{Trace}\left[(A-B) (A-B)^{\mathsf{T}}\right]$$

I was hoping I could use the Lagrange method.

These are the identities I have found useful:

$$\frac{\partial \text{Trace}[X]}{\partial x}=\text{Trace}\left[\frac{\partial X}{\partial x}\right]$$

$$\text{Trace}[A+B]=\text{Trace}[A]+\text{Trace}[B]$$

$$\frac{\partial X^{\mathsf{T}}}{\partial x}=\left(\frac{\partial X}{\partial x}\right)^{\mathsf{T}}$$

So this is the problem I thought could be the way to solve it:

$$\text{Trace}\left[(A-B) (A-B)^{\mathsf{T}}\right]-\lambda (B x-v)$$

Taking the gradient (please let me know if I am wrong), setting it to zero, and solving for the transpose of $B$, I get:

$B^{\mathsf{T}}=A^{\mathsf{T}}+\frac{\lambda x}{2}$

However, I believe that what I have done is wrong, or there is something I am missing, because now $B$ is a matrix plus a vector as far as I can see, so the dimensions do not work.

Unfortunately, I am out of my depth here. I have started to read about optimization and how to solve similar problems, but I need an answer to this quickly.

I am happy for any comments, improvements, or corrections that would make this useful for other people.

Thank you very much for any feedback!

  • The problem separates into independent optimization problems, one for each row of $B$. The $i$th subproblem requires projecting the $i$th row of $A$ onto a particular hyperplane, which is a standard linear algebra problem. – littleO Feb 03 '18 at 23:19

5 Answers


The beginning looks fine. However, note that you need one Lagrange multiplier per constraint. Thus you need a vector $\lambda$.

The function to minimize is then $$\operatorname{tr}\left[(A^T-B^T)(A-B)\right] - \lambda^T (B x -v). $$

Taking the gradient with respect to $B$, we arrive at $$2 (B-A)- \lambda x^T =0. $$

We can solve this for $B$ with the result $$ B = A + \frac12 \lambda x^T\;.$$

The Lagrange multiplier has to be determined such that the constraint is fulfilled, i.e., $$B x = Ax + \frac12 \lambda x^Tx = v\;.$$ This leads to $$ \lambda = \frac{2}{x^T x}(v -Ax)$$ and thus to the explicit solution $$ B = A + \frac{1}{\Vert x \Vert^2} (v -Ax) x^T\;. $$
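A quick numerical sanity check of this closed-form solution (a minimal sketch with made-up data; `A`, `x`, and `v` are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))
x = np.full(n, 1.0 / n)        # the averaging vector from the question
v = rng.standard_normal(m)     # desired row averages

# Explicit solution: B = A + (v - A x) x^T / ||x||^2
B = A + np.outer(v - A @ x, x) / (x @ x)

print(np.allclose(B @ x, v))   # True: the constraint B x = v holds
```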

Fabian

Find orthogonal matrices $U,V$ such that $V^{-1}x\sim e_1$ and $Uv\sim e_1$. As $\|A-B\|_F=\|U(A-B)V\|_F$, we now want to find $B'=UBV$ with $B'e_1=\lambda e_1$ such that $\|A'-B'\|_F$ is minimized, where $A'=UAV$. As the first column of $B'$ is uniquely determined by the constraint whereas all the other entries of $B'$ are free to choose, we make all entries of $B'$ in columns $2,\ldots, n$ equal to the corresponding entries of $A'$. Thus $B'y=A'y$ for all $y\perp e_1$.

Unfolding this result to the original matrices, we see that $By=Ay$ for all $y\perp x$. Thus $B$ must be of the form $$B=A+wx^T$$ where $w$ is adjusted to guarantee $Bx=v$. From $$v=Bx=Ax+wx^Tx=Ax+\|x\|^2w,$$ we find $w=\frac1{\|x\|^2}(v-Ax)$ and thus finally arrive at $$B = A+\frac{(v-Ax)x^T}{\|x\|^2}. $$
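Here is a NumPy sketch of this construction, using Householder reflections to build $U$ and $V$; the helper `householder_to_e1` and the data are my own, not part of the answer above:

```python
import numpy as np

def householder_to_e1(u):
    """Return a symmetric orthogonal H with H @ u = ||u|| * e_1."""
    u = u / np.linalg.norm(u)
    e1 = np.zeros_like(u)
    e1[0] = 1.0
    w = u - e1
    if np.linalg.norm(w) < 1e-12:      # u is already ~ e_1
        return np.eye(len(u))
    w /= np.linalg.norm(w)
    return np.eye(len(u)) - 2.0 * np.outer(w, w)

rng = np.random.default_rng(1)
m, n = 3, 4
A = rng.standard_normal((m, n))
x = np.full(n, 1.0 / n)
v = rng.standard_normal(m)

U = householder_to_e1(v)      # U v = ||v|| e_1
V = householder_to_e1(x)      # V x = ||x|| e_1; V is its own inverse

Ap = U @ A @ V                # A'
Bp = Ap.copy()                # B': keep columns 2..n of A'
Bp[:, 0] = 0.0
Bp[0, 0] = np.linalg.norm(v) / np.linalg.norm(x)   # B' e_1 = lambda e_1

B = U.T @ Bp @ V              # undo the change of basis

print(np.allclose(B @ x, v))                                  # constraint holds
print(np.allclose(B, A + np.outer(v - A @ x, x) / (x @ x)))   # same closed form
```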


Find the general solution of the linear constraint. It will be the least-squares solution plus a contribution from the null space $$\eqalign{ Bx &= v \cr B &= vx^+ + C(I-xx^+) \cr }$$ where $x^+$ is the pseudoinverse of $x$ and $C$ is an arbitrary matrix.

Substituting this expression for $B$ yields an unconstrained problem in terms of $C$ $$\eqalign{ \phi &= \|B-A\|_F^2 = (B-A):(B-A) \cr d\phi &= 2(B-A):dB \cr &= 2(B-A):dC(I-xx^+) \cr &= 2(B-A)(I-xx^+):dC \cr \frac{\partial\phi}{\partial C} &= 2(B-A)(I-xx^+) \cr }$$ Set the gradient to zero and solve for $C$ $$\eqalign{ B(I-xx^+) &= A(I-xx^+) \cr vx^+(I-xx^+) + C(I-xx^+)(I-xx^+) &= A(I-xx^+) \cr C(I-xx^+) &= A(I-xx^+) \cr }$$ Substitute this into the parametric expression for $B$ $$\eqalign{ B &= vx^+ + C(I-xx^+) \cr &= vx^+ + A(I-xx^+) \cr &= A + (v-Ax)x^+ \cr }$$ Note that for a vector, we can write an explicit expression for the pseudoinverse $$x^+ = \frac{x^T}{x^Tx}$$ The nice thing about this approach is that it holds when the vectors $(x,v)$ are replaced by matrices $(X,V)$.

In the above, the trace/Frobenius product is denoted by a colon, i.e. $$A:B = {\rm tr}(A^TB)$$
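A numerical sketch of this approach using `np.linalg.pinv` (made-up data), including the matrix-valued case $(X,V)$ mentioned above; the constraint $BX=V$ is feasible here because the random $X$ has full column rank:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 3, 5, 2
A = rng.standard_normal((m, n))

# Vector case: the explicit formula x^+ = x^T / (x^T x)
x = rng.standard_normal(n)
print(np.allclose(np.linalg.pinv(x[:, None]), x[None, :] / (x @ x)))  # True

# Matrix-valued constraint B X = V (the vector problem is k = 1)
X = rng.standard_normal((n, k))
V = rng.standard_normal((m, k))

Xp = np.linalg.pinv(X)          # pseudoinverse X^+
B = A + (V - A @ X) @ Xp        # B = A + (V - A X) X^+

print(np.allclose(B @ X, V))    # True: the constraint holds
```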

frank

One thing you could use is the fact that the rows are independent: for row vectors $a_i,b_i$, you can minimize $\|a_i-b_i\|^2$ subject to $b_i\cdot \textbf 1=nv_i$ separately for each row.

We can use the Lagrange method directly on each scalar element to get $$b_{ij}=a_{ij}+v_i-\overline{a_i}$$

where $\overline{a_i}$ is the average of row $a_i$. Or as a matrix,

$$B=A+(v-Ax)\textbf{1}^{\mathsf T}$$ where $\textbf 1$ is the all-ones $n$-vector (note $n x^{\mathsf T} = \textbf 1^{\mathsf T}$).
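A quick check of this special case (made-up data; recall $x=\textbf 1/n$, so $Ax$ is the vector of row means):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 4
A = rng.standard_normal((m, n))
v = rng.standard_normal(m)          # desired row averages

# B = A + (v - A x) 1^T, and A x is just the vector of row means
B = A + np.outer(v - A.mean(axis=1), np.ones(n))

print(np.allclose(B.mean(axis=1), v))   # True: new row averages equal v
```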

Akababa

The constraint $\bf Bx=v$ holds if and only if $\|{\bf Bx-v}\|_2^2=0$. So you can add this norm to your cost function as a penalty:

$${\bf B}=\arg\min_{\bf B}\{\|{\bf A-B}\|_F^2+\lambda\|{\bf Bx-v}\|_2^2\}$$

The larger $\lambda$ is, the more strongly the constraint is enforced.

Then, finally, you can express the matrix–vector product $\bf Bx$ with vectorization and Kronecker products, which turns this into an ordinary least-squares problem.
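Here is a sketch of that formulation (made-up data). It uses the identity $\operatorname{vec}({\bf Bx})=({\bf x}^{\mathsf T}\otimes I_m)\operatorname{vec}({\bf B})$ with column-stacking $\operatorname{vec}$, and a large but finite $\lambda$, so the constraint only holds approximately:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 4
A = rng.standard_normal((m, n))
x = np.full(n, 1.0 / n)
v = rng.standard_normal(m)
lam = 1e8                                 # large penalty ~ hard constraint

K = np.kron(x[None, :], np.eye(m))        # x^T kron I_m, shape (m, m*n)
a = A.flatten(order="F")                  # vec(A), column-stacking

# Normal equations of ||b - a||^2 + lam * ||K b - v||^2 over b = vec(B)
b = np.linalg.solve(np.eye(m * n) + lam * (K.T @ K), a + lam * (K.T @ v))
B = b.reshape((m, n), order="F")

print(np.allclose(B @ x, v, atol=1e-6))   # constraint approximately satisfied
print(np.allclose(B, A + np.outer(v - A @ x, x) / (x @ x), atol=1e-6))
```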

mathreadler