
I have the following minimization problem, where I want to find $W$, \begin{align} &\min \mathrm{tr} (((W^TK)\circ(W^TK))^T((W^TK)\circ(W^TK))L)\\ &\text{s.t.} ~ W^TKHKW = I \end{align} where $\circ$ denotes the Hadamard product, and $L$, $H$ and $K$ are all symmetric and positive semi-definite. By taking the derivative of the objective function with respect to $W$ and using a Lagrange multiplier, I get the following condition \begin{align} &2K[[L((KW)\circ(KW))]\circ(KW)]=KHKW\phi \end{align} where $\phi$ is the Lagrange multiplier. From here I don't know how to proceed. Or is it impossible to find the $W$ I want? I'd really appreciate any help.
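For concreteness, here is a minimal numerical sketch of the objective and constraint, with random symmetric positive semi-definite matrices standing in for the real $H$, $K$, $L$ (the sizes and data below are assumptions, purely for illustration):

```python
import numpy as np

# Hypothetical random data standing in for the real H, K, L (all symmetric PSD).
rng = np.random.default_rng(0)
n, r = 5, 2
A1, A2, A3 = rng.standard_normal((3, n, n))
K, H, L = A1 @ A1.T, A2 @ A2.T, A3 @ A3.T

def objective(W):
    Yq = (W.T @ K) * (W.T @ K)          # Hadamard square of W^T K   (r x n)
    return np.trace(Yq.T @ Yq @ L)      # tr(((W^T K)o(W^T K))^T ((W^T K)o(W^T K)) L)

def constraint_residual(W):
    return W.T @ K @ H @ K @ W - np.eye(W.shape[1])

# A random W evaluates both pieces; it will not satisfy the constraint.
W = rng.standard_normal((n, r))
print(objective(W), np.linalg.norm(constraint_residual(W)))
```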

gouchuan
  • For optimization problems like this, if I didn't know how to solve them, maybe I could just use a solver to find a solution, but in that case, is the derivative that I'm taking useless? Or what can I do with this derivative information if I don't know how to solve it analytically? Thank you for your advice – gouchuan Jan 25 '23 at 14:29

1 Answer


$ \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vc#1{\op{vec}\LR{#1}} \def\unvc#1{\op{unvec}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\o{{\tt1}} \def\p{\partial} \def\l{\lambda} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\B{B^{-1}} $Define the following matrix variables in terms of $(H,K,L)$ $$\eqalign{ X &= KW &\qiq X^T = W^TK \\ Y &= X\odot X &\qiq Y^T = X^T\odot X^T \\ H &= C^TC &\qiq \big\{{\rm Cholesky}\big\} \\ Z &= CX &\qiq X = C^{-1}Z \\ }$$ Starting with an unconstrained matrix $U$, construct the semi-orthogonal matrix $Z$ $$\eqalign{ B &= B^T = \LR{U^TU}^{1/2} \\ Z &= U\B \qiq &Z^TZ &= \B U^TU B^{-1} = \B B^2 \B = I \\ && &\doteq X^TC^TCX = W^TKHKW \\ }$$
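As a quick numerical sanity check of this construction (a sketch with random data, assuming $K$ is invertible so that $W$ can be recovered at the end), any unconstrained $U$ yields a $W$ satisfying the constraint:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 5, 2
A1, A2 = rng.standard_normal((2, n, n))
K = A1 @ A1.T + np.eye(n)              # assume K is invertible
H = A2 @ A2.T + np.eye(n)
C = np.linalg.cholesky(H).T            # H = C^T C

def sqrtm_sym(S):                      # symmetric square root via eigendecomposition
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(w)) @ V.T

U = rng.standard_normal((n, r))        # arbitrary unconstrained matrix
Z = U @ np.linalg.inv(sqrtm_sym(U.T @ U))   # Z = U B^{-1}, semi-orthogonal
X = np.linalg.solve(C, Z)              # X = C^{-1} Z
W = np.linalg.solve(K, X)              # W = K^{-1} X

print(np.allclose(W.T @ K @ H @ K @ W, np.eye(r)))   # constraint holds for any U
```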

Now calculate differentials with respect to the unconstrained variable $$\eqalign{ B^2 &= U^TU \\ B\,dB+dB\,B &= U^TdU+dU^TU \\ (I\otimes B+B\otimes I)\,db &= \LR{I\otimes U^T+(U^T\otimes I)M}du \\ db &= P\,du \\ \\ Z &= UB^{-1} \\ dZ &= dU\,\B - UB^{-1}dB\,\B \\ &= dU\,\B - Z\,dB\,\B \\ dz &= \LR{\B\otimes I}\,du \;-\; \LR{\B\otimes Z}\,\c{db} \\ &= \LR{(\B\otimes I) - (\B\otimes Z)\c{P}}\,\c{du} \\ &= Q\,du \\ \\ X &= C^{-1}Z \\ dX &= C^{-1}dZ \\ dx &= \LR{I\otimes C^{-1}}\c{dz} \\ &= \LR{I\otimes C^{-1}}\c{Q\,du} \\ }$$ where $M$ is the Commutation Matrix associated with the vectorization of matrix equations.
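A minimal sketch of the commutation matrix $M$ (column-stacking $\operatorname{vec}$ assumed), verifying the defining property $M\,\operatorname{vec}(A)=\operatorname{vec}(A^T)$:

```python
import numpy as np

def commutation(m, p):
    """M such that M @ vec(A) = vec(A.T) for an m x p matrix A (column-major vec)."""
    M = np.zeros((m * p, m * p))
    for i in range(m):
        for j in range(p):
            M[j + i * p, i + j * m] = 1.0
    return M

vec = lambda X: X.flatten(order='F')   # column-stacking vectorization
A = np.arange(6.0).reshape(3, 2)
print(np.allclose(commutation(3, 2) @ vec(A), vec(A.T)))   # True
```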

Finally, calculate the gradient of the objective function $$\eqalign{ \phi &= \trace{YY^TL} \\ &= L:YY^T \\ d\phi &= L : \LR{dY\,Y^T+Y\,dY^T} \\ &= 2L : dY\,Y^T \\ &= 2LY : dY \\ &= 2LY : \LR{2X\odot dX} \\ &= 4\LR{X\odot LY} : dX \\ &= 4\c{\vc{X\odot LY}} : dx \\ &= 4\:\c{v} : dx \\ &= 4\:v : \LR{I\otimes C^{-1}}Q\,du \\ &= 4\:Q^T\LR{I\otimes C^{-1}}^Tv : du \\ \grad{\phi}{u} &= 4\:Q^T\LR{I\otimes C^{-1}}^Tv \\ }$$ Using this gradient, you can attempt a closed-form solution for the zero gradient condition, or you can use the gradient expression in a gradient descent algorithm. The former is so complicated that it is likely impossible, while the latter is very simple.
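A minimal numerical sketch of the gradient-descent route, assuming random symmetric positive-definite $H,K$ and positive semi-definite $L$ in place of the real data; the step size and iteration count are arbitrary illustration choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 6, 3
A1, A2, A3 = rng.standard_normal((3, n, n))
K = A1 @ A1.T + np.eye(n)              # assume K, H invertible
H = A2 @ A2.T + np.eye(n)
L = A3 @ A3.T
C = np.linalg.cholesky(H).T            # H = C^T C
vec = lambda A: A.flatten(order='F')   # column-stacking vectorization

def sqrtm_sym(S):                      # symmetric square root via eigendecomposition
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(w)) @ V.T

def commutation(m, p):                 # M @ vec(A) = vec(A.T) for A of shape (m, p)
    M = np.zeros((m * p, m * p))
    for i in range(m):
        for j in range(p):
            M[j + i * p, i + j * m] = 1.0
    return M

def phi_and_grad(U):
    B  = sqrtm_sym(U.T @ U)            # B = (U^T U)^{1/2}
    Bi = np.linalg.inv(B)
    Z  = U @ Bi                        # semi-orthogonal: Z^T Z = I
    X  = np.linalg.solve(C, Z)         # X = C^{-1} Z
    Y  = X * X                         # Y = X o X
    phi = np.trace(Y @ Y.T @ L)
    In, Ir, M = np.eye(n), np.eye(r), commutation(n, r)
    P = np.linalg.solve(np.kron(Ir, B) + np.kron(B, Ir),
                        np.kron(Ir, U.T) + np.kron(U.T, Ir) @ M)    # db = P du
    Q = np.kron(Bi, In) - np.kron(Bi, Z) @ P                        # dz = Q du
    v = vec(X * (L @ Y))
    g = 4 * Q.T @ np.kron(Ir, np.linalg.inv(C)).T @ v               # d(phi)/d(vec U)
    return phi, g

U = rng.standard_normal((n, r))
for _ in range(2000):                  # plain gradient descent on the unconstrained U
    phi, g = phi_and_grad(U)
    U = U - 1e-3 * g.reshape(n, r, order='F')
    U = U / np.linalg.norm(U)          # only the direction of U matters

# recover W and check the constraint W^T K H K W = I
Z = U @ np.linalg.inv(sqrtm_sym(U.T @ U))
W = np.linalg.solve(K, np.linalg.solve(C, Z))      # W = K^{-1} C^{-1} Z
print(phi, np.linalg.norm(W.T @ K @ H @ K @ W - np.eye(r)))
```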

Note that $Z$ depends only on the direction of $U$, not its length. Thus every iteration should renormalize $U$ to prevent numerical overflow.

After calculating the optimal $u$ vector, the corresponding $W$ matrix can be reconstructed $$\eqalign{ U &= \unvc{u} \\ Z &= U\LR{U^TU}^{-1/2} \\ X &= C^{-1}Z \\ W &= K^{-1}X \\ \\ }$$


In the above, a colon denotes the Frobenius product, which is incredibly useful in Matrix Calculus $$\eqalign{ A:Z &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}Z_{ij} \;=\; \trace{A^TZ} \\ A:A &= \big\|A\big\|^2_{\c F} \qquad \big\{{\rm\c{Frobenius}\:norm}\big\} \\ }$$ This is also called the double-dot or double contraction product.
When applied to vectors $(n=\o)$ it reduces to the standard dot product.

The properties of the underlying trace function allow the terms in a Frobenius product to be rearranged in many different ways, e.g. $$\eqalign{ A:Z &= Z:A \\ A:Z &= A^T:Z^T \\ B:\LR{A^TZ} &= \LR{BZ^T}:A^T &= \LR{AB}:Z \\ }$$
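A quick numerical illustration of these identities (random matrices assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
A, Z = rng.standard_normal((2, 4, 3))
B = rng.standard_normal((3, 3))
frob = lambda P, Q: np.trace(P.T @ Q)                         # A : Z = Tr(A^T Z)

print(np.isclose(frob(A, Z), np.sum(A * Z)))                  # elementwise-sum definition
print(np.isclose(frob(A, A), np.linalg.norm(A, 'fro')**2))    # A : A = ||A||_F^2
print(np.isclose(frob(B, A.T @ Z), frob(A @ B, Z)))           # B : (A^T Z) = (A B) : Z
```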

greg
  • First of all, thank you very much for your patience in answering my doubts. But what I didn't understand is that the problem I asked is constrained, and I don't know how to handle the constraint with gradient descent. And, forgive me for being stupid: I want to find a matrix, but your answer seems to produce a vector; doesn't that matter? – gouchuan Jan 28 '23 at 10:13
  • @gouchuan The columns of a matrix can be stacked to create a long vector (this is the vectorization mentioned above), and the long vector can be un-stacked to reconstitute the original matrix. So my answer works with long vectors for convenience. But the really important idea is to transform the problem in terms of the constrained $W$ variable into a problem in terms of the unconstrained $U$ variable. Lagrange Multipliers are another way to reformulate constrained problems. – greg Jan 28 '23 at 16:20
  • Sorry, there are some questions about the process of transforming the constrained problem into an unconstrained one that I don't understand yet. I think the optimization process should depend on $K$, but the answer you gave does not seem to involve $K$. Besides, when I follow your steps I cannot push forward, because $K$ is not used in the subsequent process. – gouchuan Feb 04 '23 at 05:14
  • You first define $Z$ in terms of the unknown $W$, then construct $Z$ in terms of the unknown $U$, which confuses me, because what I need after finding the optimal $W$ is $W^TK$, which means that for different $K$ the optimization problem should be different, and from your answer I can see that it does not satisfy this. – gouchuan Feb 04 '23 at 05:36
  • @gouchuan The intermediate variables $(B,C,X,Y,Z)$ were introduced to facilitate the gradient calculation. However, if you wish you can construct $W$ solely in terms of $(H,K,U)$ $$\large W = K^{-1}H^{-1/2}\,U(U^TU)^{-1/2}$$ It is trivial to verify that this construction satisfies the constraint for any choice of $U$. – greg Feb 04 '23 at 17:40