
According to Magnus and Neudecker, we can build a matrix calculus without running into higher order tensors by vectorizing the matrices before taking partial derivatives. This leads to the matrix Jacobian defined as $$ \mathrm{DZ}(X)=\frac{\partial \operatorname{vec} Z(X)}{\partial(\operatorname{vec} X)^{\mathsf{T}}} \,. $$
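To keep the convention concrete: for the linear map $Z(X) = AXB$ this Jacobian is $B^{\mathsf{T}} \otimes A$, since $\operatorname{vec}(AXB) = (B^{\mathsf{T}} \otimes A)\operatorname{vec}X$. A quick finite-difference check in Python/NumPy (my own sketch, assuming the column-major $\operatorname{vec}$ used by Magnus and Neudecker):

```python
import numpy as np

def vec(M):
    # column-major (Fortran-order) vectorization, matching Magnus & Neudecker
    return M.reshape(-1, order="F")

def jacobian(F, X, h=1e-6):
    # numerical vec-Jacobian  d vec F(X) / d (vec X)^T  via central differences
    fx = vec(F(X))
    J = np.zeros((fx.size, X.size))
    for j in range(X.size):
        E = np.zeros(X.size); E[j] = h
        E = E.reshape(X.shape, order="F")
        J[:, j] = (vec(F(X + E)) - vec(F(X - E))) / (2 * h)
    return J

rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 4)), rng.standard_normal((5, 2))
X = rng.standard_normal((4, 5))

J = jacobian(lambda X: A @ X @ B, X)
assert np.allclose(J, np.kron(B.T, A), atol=1e-5)   # D(AXB) = B^T ⊗ A
```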

Now suppose we have a function that takes two matrices and returns another matrix, $F:\mathbb{R}^{k \times l} \times \mathbb{R}^{m \times n} \to \mathbb{R}^{p \times q}$. Call the input matrices $A^{k \times l}$ and $B^{m \times n}$, and the output matrix $Y^{p \times q}$, so that $Y = F(A,B)$.

My question: how can I interchange the order of partial derivatives? For example, given $ \frac{\partial \frac{\partial Y}{\partial A}}{\partial B} ,$ how do I get $ \frac{\partial \frac{\partial Y}{\partial B}}{\partial A} ?$ (where I omitted $\operatorname{vec}$ and the transposes for brevity)

By trial and error with a symbolic toolbox I have arrived at the following identity, which by itself is not useful to me, since I cannot solve for one mixed partial in terms of the other: $$ \operatorname{vec} \frac{\partial \operatorname{vec} \left( \frac{\partial \operatorname{vec} Y}{\partial (\operatorname{vec} A)^\mathsf{T}}\right)^\mathsf{T}}{\partial(\operatorname{vec}B)^\mathsf{T} } = \operatorname{vec} \left( \frac{\partial\operatorname{vec}\frac{\partial \operatorname{vec} Y}{\partial (\operatorname{vec} B)^\mathsf{T}} }{\partial (\operatorname{vec} A)^\mathsf{T}}\right)^\mathsf{T} $$ I tried using the commutation matrix to eliminate the transpose on the RHS, but with no success. Any ideas?
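For reference in what follows: the commutation matrix $K^{(m,n)}$ is the $mn \times mn$ permutation matrix satisfying $K^{(m,n)}\operatorname{vec}A = \operatorname{vec}A^{\mathsf{T}}$ for every $A \in \mathbb{R}^{m\times n}$. A small NumPy construction (a sketch of my own) against which the manipulations can be checked:

```python
import numpy as np

def vec(M):
    # column-major (Fortran-order) vectorization
    return M.reshape(-1, order="F")

def commutation(m, n):
    # K^{(m,n)}: the mn x mn permutation with K vec(A) = vec(A^T), A in R^{m x n}
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1  # (vec A^T)[i*n+j] = (vec A)[j*m+i] = A[i,j]
    return K

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
K = commutation(3, 4)
assert np.allclose(K @ vec(A), vec(A.T))
assert np.allclose(K.T, commutation(4, 3))   # inverse = transpose = K^{(n,m)}
```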

Reference: J. Magnus and H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, 3rd edition, 2019, Wiley.

Adrian

2 Answers


$ \def\c{\cdot} \def\q{\quad} \def\qq{\qquad} \def\qiq{\q\implies\q} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vc#1{\op{vec}\LR{#1}} \def\phix{y_k} \def\p{{\partial}} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\hess#1#2#3{\frac{\p^2 #1}{\p #2\,\p #3^T}} $You have re-discovered the following result $$\eqalign{ \hess{\phi}{a}{b} &= \left(\hess{\phi}{b}{a}\right)^T \ne \hess{\phi}{b}{a} \\ }$$ where $$\eqalign{ a &= \vc A,\q b=\vc B,\q y=\vc Y,\qq \phi=\phix \\ }$$ Since $\phi$ is a scalar and $\{a,b\}$ are vectors of different lengths, from dimensional considerations alone one can see that

  • The LHS gradient is a rectangular matrix with the shape of the outer product $\,ab^T$
  • The RHS gradient is also rectangular, but it is shaped like the outer product $\:ba^T$


    Here is a simple concrete example $$\eqalign{ &\;\;\phi \;= \LR{Mb}\c a \;\,\,=\; \LR{M^Ta}\c b \\ g_a = &\grad{\phi}{a} = Mb \q \qiq \hess{\phi}{b}{a} = \grad{g_a}{b} = M \\ g_b = &\grad{\phi}{b} = M^Ta\; \qiq\hess{\phi}{a}{b} = \grad{g_b}{a} = M^T \\ }$$
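This example is easy to verify numerically (a quick sketch of my own with random $M$; since $\phi$ is bilinear, the mixed second derivative can be read off exactly as $\phi(e_i, e_j)$):

```python
import numpy as np

rng = np.random.default_rng(1)
ma, nb = 4, 3                      # len(a) = 4, len(b) = 3
M = rng.standard_normal((ma, nb))
phi = lambda a, b: a @ M @ b       # phi = (Mb)·a = (M^T a)·b

# phi is bilinear, so d^2 phi / d a_i d b_j = phi(e_i, e_j) exactly
H_ab = np.array([[phi(np.eye(ma)[i], np.eye(nb)[j])
                  for j in range(nb)] for i in range(ma)])
assert np.allclose(H_ab, M)        # d(dphi/da)/db^T = M, shaped like a b^T
assert np.allclose(H_ab.T, M.T)    # the other order gives M^T, shaped like b a^T
```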

  • greg

    Thank you greg for your answer. Your example motivated me to dig deeper and find the relation I needed for switching the order of partial derivatives.

    Is there any nice, explicit, function such that $ g\left(\frac{\partial \frac{\partial Y}{\partial B}}{\partial A}\right) = \frac{\partial \frac{\partial Y}{\partial A}}{\partial B} $? Yes! Moreover, it can be written in terms of $\operatorname{vec}$, the Kronecker product $\otimes$, and good old matrix multiplication, together with some identity and commutation matrices.

    In the following derivation I will omit writing $\operatorname{vec}$ and the transpose under the $\partial$ signs, with the understanding that matrices are always vectorized before taking partials, and that the partials are then arranged as a Jacobian.

    Starting from the experimental result above, it turns out we can manipulate it to eliminate all the transposes. On the right-hand side, $$ RHS = K^{(pqmn,kl)} \operatorname{vec} \frac{\partial\frac{\partial Y}{\partial B}}{\partial A} \,.$$ On the left-hand side there is more work to do: \begin{align} LHS &= \operatorname{vec}\frac{\partial K^{(pq,kl)}\operatorname{vec}\frac{\partial Y}{\partial A}}{\partial B} \\ &= \operatorname{vec} \left[ \left(I_1 \otimes K^{(pq,kl)} \right) \frac{\partial \operatorname{vec} \left(\operatorname{vec}\frac{\partial Y}{\partial A}\right)}{\partial B} \right] \\ &= \operatorname{vec} \left(K^{(pq,kl)} \frac{\partial \frac{\partial Y}{\partial A}}{\partial B} \right) \\ &= \left( I_{mn} \otimes K^{(pq,kl)} \right) \operatorname{vec}\frac{\partial \frac{\partial Y}{\partial A}}{\partial B} \,. \end{align} Equating $LHS = RHS$ now gives a closed-form relationship between the vectorized versions of our mixed partial derivatives, $$ \left( I_{mn} \otimes K^{(pq,kl)} \right) \operatorname{vec}\frac{\partial \frac{\partial Y}{\partial A}}{\partial B} = K^{(pqmn,kl)} \operatorname{vec} \frac{\partial\frac{\partial Y}{\partial B}}{\partial A} \,.$$ To obtain one in terms of the other, we need the inverse of vectorization, a sort of arcvec, that returns a matrix from a column vector. The subscript in $\operatorname{vec}_{m\times n}^{-1}$ indicates the size of the resulting matrix. 
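As a numerical sanity check of this relationship (my own sketch, not part of the derivation), take the bilinear map $Y = AMB$ for a fixed $M$; its mixed second derivatives are constant, so they can be read off exactly by evaluating the Jacobians at unit matrices:

```python
import numpy as np

def vec(M):
    return M.reshape(-1, order="F")  # column-major vec

def commutation(m, n):
    # K^{(m,n)}: permutation with K vec(A) = vec(A^T) for A in R^{m x n}
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1
    return K

def unit(r, c, idx):
    # column-major unit matrix E_idx in R^{r x c}
    E = np.zeros(r * c); E[idx] = 1
    return E.reshape((r, c), order="F")

k, l, m, n = 2, 3, 3, 2            # A: k x l,  B: m x n
p, q = k, n                         # Y = A M B is p x q
rng = np.random.default_rng(2)
M = rng.standard_normal((l, m))

# Jacobians of vec Y = vec(A M B), using vec(AXB) = (B^T ⊗ A) vec X:
J_A = lambda B: np.kron((M @ B).T, np.eye(k))   # d vec Y / d (vec A)^T
J_B = lambda A: np.kron(np.eye(n), A @ M)       # d vec Y / d (vec B)^T

# Mixed second derivatives (constant, since Y is bilinear):
H_AB = np.column_stack([vec(J_A(unit(m, n, j))) for j in range(m * n)])
H_BA = np.column_stack([vec(J_B(unit(k, l, i))) for i in range(k * l)])

lhs = np.kron(np.eye(m * n), commutation(p * q, k * l)) @ vec(H_AB)
rhs = commutation(p * q * m * n, k * l) @ vec(H_BA)
assert np.allclose(lhs, rhs)        # the closed-form relationship holds
```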
So $$\frac{\partial \frac{\partial Y}{\partial A}}{\partial B} = \operatorname{vec}_{pqkl\times mn}^{-1} \left[ \left( I_{mn} \otimes K^{(pq,kl)} \right)^{-1} K^{(pqmn,kl)} \operatorname{vec} \frac{\partial\frac{\partial Y}{\partial B}}{\partial A} \right]$$ which expanded out reads $$\frac{\partial \frac{\partial Y}{\partial A}}{\partial B} = \left[ (\operatorname{vec}I_{mn})^{\mathsf{T}} \otimes I_{pqkl} \right] \left( I_{mn} \otimes \left[ \left( I_{mn} \otimes K^{(pq,kl)} \right)^{-1} K^{(pqmn,kl)} \operatorname{vec} \frac{\partial \frac{\partial Y}{\partial B}}{\partial A}\right] \right) \,.$$
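The $\operatorname{vec}^{-1}$ expansion in the last step relies on the identity $\operatorname{vec}_{r\times c}^{-1}(v) = \left[(\operatorname{vec}I_c)^{\mathsf{T}} \otimes I_r\right](I_c \otimes v)$, which can be checked numerically for generic shapes (a quick sketch of my own; $r$ and $c$ stand for the $pqkl$ and $mn$ above):

```python
import numpy as np

def vec(M):
    return M.reshape(-1, order="F")   # column-major vec

def arcvec(v, r, c):
    # vec^{-1}_{r x c}(v) = ((vec I_c)^T ⊗ I_r)(I_c ⊗ v)
    left = np.kron(vec(np.eye(c))[None, :], np.eye(r))   # r x (c^2 r)
    right = np.kron(np.eye(c), v[:, None])               # (c^2 r) x c
    return left @ right

rng = np.random.default_rng(3)
X = rng.standard_normal((3, 4))
assert np.allclose(arcvec(vec(X), 3, 4), X)   # round-trips vec exactly
```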

    The other way around is analogous. I tested the formula in my symbolic toolbox and it seems correct. What is interesting to me is how this vectorization-based matrix calculus preserves short, nice expressions for the product rule and chain rule, but a simple task like changing the order of partial derivatives is a headache!

    Adrian