
I'm trying to obtain the derivative of the following function with respect to the matrix $U$:

$$R = \sum_{i=1}^n \left\| f(s \cdot U) \cdot V \cdot M_i - f(t \cdot U) \cdot V \right\|^2$$

where $s \in \mathbb{R}^{1\times3}$, $t \in \mathbb{R}^{1\times3}$, $U \in \mathbb{R}^{3\times3}$, $V \in \mathbb{R}^{3\times3}$, $M_i \in \mathbb{R}^{3\times3}$

I want to use this derivative in a gradient descent method.

Now when $f$ is a scalar function, i.e. $f:\mathbb{R}\to\mathbb{R}$ applied element-wise, there is no issue with evaluating the derivative:

suppose that: $$g = f(s \cdot U) \cdot V \cdot M_i - f(t \cdot U) \cdot V$$

then:

$$\frac{\partial{R}}{\partial{U}} = 2\sum_{i=1}^n \left[ s^T \cdot \left( (g \cdot (V \cdot M_i)^T) \odot f'(s \cdot U) \right) - t^T \cdot \left( (g \cdot V^T) \odot f'(t \cdot U) \right) \right]$$

where $f'$ denotes the element-wise derivative of $f$.
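As a sanity check, this scalar-case formula can be verified numerically against central finite differences. The sketch below is a tentative NumPy translation; the choices $f = \sin$ (so $f' = \cos$), the random matrices, and the number of $M_i$ terms are all illustrative assumptions, not part of the original problem.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal((1, 3))   # row vectors, as in the question
t = rng.standard_normal((1, 3))
U = rng.standard_normal((3, 3))
V = rng.standard_normal((3, 3))
Ms = [rng.standard_normal((3, 3)) for _ in range(4)]

f, fp = np.sin, np.cos  # element-wise f and its derivative f'

def R(U):
    """Objective: sum_i || f(sU) V M_i - f(tU) V ||^2."""
    return sum(np.sum((f(s @ U) @ V @ M - f(t @ U) @ V) ** 2) for M in Ms)

def grad(U):
    """Gradient from the formula above (Hadamard product = elementwise *)."""
    G = np.zeros_like(U)
    for M in Ms:
        g = f(s @ U) @ V @ M - f(t @ U) @ V
        G += 2 * (s.T @ (g @ (V @ M).T * fp(s @ U))
                  - t.T @ (g @ V.T * fp(t @ U)))
    return G

# Central finite differences as the ground truth.
num = np.zeros_like(U)
h = 1e-6
for i in range(3):
    for j in range(3):
        E = np.zeros_like(U)
        E[i, j] = h
        num[i, j] = (R(U + E) - R(U - E)) / (2 * h)

print(np.max(np.abs(num - grad(U))))  # should be tiny (finite-difference error)
```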

But when $f$ becomes a function of a vector argument, $f:\mathbb{R}^{1\times3}\to\mathbb{R}^{1\times3}$, i.e. $f$ accepts a row vector and returns a row vector, the derivative of $f$ becomes a $3\times3$ Jacobian, and it is unclear how to evaluate the Hadamard product between a $1\times3$ row vector on the left-hand side and a $3\times3$ Jacobian.

I haven't been able to wrap my head around this problem for a week now. Can anyone offer some thoughts on the issue?

Lu4

1 Answer


Change $\,(s,t)\,$ into column vectors and define these additional vectors $$\eqalign{ \def\LR#1{\left(#1\right)} \def\D#1{\operatorname{Diag}\LR{#1}} \def\f{f^\prime} \def\Sj{\sum_{j=1}^m} \def\Sk{\sum_{k=1}^n} \def\qiq{\quad\implies\quad} a &= f(U^Ts),\quad A=\D{\f(U^Ts)} &\qiq &da = A\;dU^Ts \\ b &= f(U^Tt),\quad B=\D{\f(U^Tt)} &\qiq &db = B\;dU^Tt \\ g_k &= M_k^T V^Ta - V^T b &\qiq &dg_k = M_k^T V^Tda - V^Tdb \\ q &= \Sk g_k,\quad r = \Sk M_kg_k \\ }$$ where $\f(z)$ denotes the ordinary derivative of the scalar function $f(z),$ and both functions are applied element-wise when given vector arguments.

The $\D{v}$ function creates a diagonal matrix from its vector argument, and this diagonal matrix can replace elementwise/Hadamard products involving the vector $$v\odot w =\D{v}\,w$$ Use these new variables to rewrite the objective function and calculate its gradient $$\eqalign{ \def\BR#1{\Big(#1\Big)} \def\o{{\tt1}} \def\p{\partial} \def\R{{\cal R}} \R &= \Sk g_k : g_k \\ d\R &= \Sk 2g_k : dg_k \\ &= 2\Sk\BR{g_k:M_k^T V^Tda \;-\; g_k:V^Tdb} \\ &= 2\Sk\BR{VM_kg_k:da \;-\; Vg_k:db} \\ &= 2\BR{Vr:da \;-\; Vq:db} \\ &= 2Vr:\LR{A\;dU^Ts} \;-\; 2Vq:\LR{B\;dU^Tt} \\ &= 2AVrs^T:dU^T \;-\; 2BVqt^T:dU^T \\ &= 2sr^TV^TA:dU \;-\; 2tq^TV^TB:dU \\ &= 2\BR{sr^TV^TA - tq^TV^TB} : dU \\ \frac{\p\R}{\p U} &= 2\BR{sr^TV^TA - tq^TV^TB} \\ \\ }$$
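The final formula can be checked numerically. The sketch below is an assumed NumPy translation of the derivation: it builds $a, b, A, B, g_k, q, r$ exactly as defined above and compares the closed-form gradient against central finite differences. The choice $f=\tanh$ and the random data are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.standard_normal((3, 1))   # column vectors, per the answer's convention
t = rng.standard_normal((3, 1))
U = rng.standard_normal((3, 3))
V = rng.standard_normal((3, 3))
Ms = [rng.standard_normal((3, 3)) for _ in range(5)]

f = np.tanh
fp = lambda z: 1.0 - np.tanh(z) ** 2  # element-wise derivative f'

def R(U):
    """R = sum_k g_k : g_k with g_k = M_k^T V^T a - V^T b."""
    a, b = f(U.T @ s), f(U.T @ t)
    return sum(np.sum((M.T @ V.T @ a - V.T @ b) ** 2) for M in Ms)

def grad(U):
    """Closed form: 2 (s r^T V^T A - t q^T V^T B)."""
    a, b = f(U.T @ s), f(U.T @ t)
    A, B = np.diagflat(fp(U.T @ s)), np.diagflat(fp(U.T @ t))
    gs = [M.T @ V.T @ a - V.T @ b for M in Ms]
    q = sum(gs)                              # q = sum_k g_k
    r = sum(M @ g for M, g in zip(Ms, gs))   # r = sum_k M_k g_k
    return 2 * (s @ r.T @ V.T @ A - t @ q.T @ V.T @ B)

# Central finite differences as the ground truth.
num = np.zeros_like(U)
h = 1e-6
for i in range(3):
    for j in range(3):
        E = np.zeros_like(U)
        E[i, j] = h
        num[i, j] = (R(U + E) - R(U - E)) / (2 * h)

print(np.max(np.abs(num - grad(U))))  # should be tiny (finite-difference error)
```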


In the above derivation, the symbol $(:)$ denotes the Frobenius product $$\eqalign{ X:Y &= \Sj\Sk X_{jk}Y_{jk} \;=\; {\rm trace}\!\LR{X^TY} \\ X:X &= \|X\|_F^2 \qquad \{ {\rm Frobenius\;norm} \} \\ }$$ When applied to vectors $(m=\o{\;\rm or\;}n=\o)$ it corresponds to the dot product.

The terms in such products can be rearranged in many useful ways, e.g. $$\eqalign{ X:Y &= Y:X \;=\; Y^T:X^T \\ X:\LR{YZ} &= \LR{XZ^T}:Y \;=\; \LR{Y^TX}:Z \\ }$$
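These rearrangement identities are easy to confirm numerically. A minimal NumPy check (the helper name `frob` and the matrices `X, Y, Z` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X, Y, Z = (rng.standard_normal((3, 3)) for _ in range(3))

def frob(A, B):
    """Frobenius product A : B = trace(A^T B) = sum of elementwise products."""
    return np.trace(A.T @ B)

assert np.isclose(frob(X, Y), np.sum(X * Y))           # definition
assert np.isclose(frob(X, X), np.linalg.norm(X) ** 2)  # X:X = ||X||_F^2
assert np.isclose(frob(X, Y), frob(Y, X))              # X:Y = Y:X
assert np.isclose(frob(X, Y), frob(Y.T, X.T))          # X:Y = Y^T:X^T
assert np.isclose(frob(X, Y @ Z), frob(X @ Z.T, Y))    # X:(YZ) = (XZ^T):Y
assert np.isclose(frob(X, Y @ Z), frob(Y.T @ X, Z))    # X:(YZ) = (Y^T X):Z
```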

greg
  • I have just one question: how did you come to this neat thought? I mean the fact that you can use $\operatorname{Diag}(f'(U^T s))$? @greg – Lu4 Apr 02 '23 at 12:01
  • 1
    @Lu4 I didn't invent it, I just read it somewhere. Once you've seen the $a\odot b={\rm Diag}(a)\,b$ substitution, you don't forget it since it's extremely useful. A similar substitution exists for the Hadamard product of a matrix, but it involves tensors so it's used less often. – greg Apr 02 '23 at 12:52