
Let $\Sigma$ be a symmetric, positive definite $p\times p$ covariance matrix, and let $f(\Sigma)$ be its Cholesky factor. That is, $f(\Sigma)$ is a lower triangular $p\times p$ matrix such that $\Sigma = f(\Sigma) f(\Sigma)^{\top}$. Further, let $\Lambda := \operatorname{diag}(f(\Sigma))$ be a diagonal matrix holding the diagonal elements of $f(\Sigma)$ on its diagonal, i.e. the standard deviations given by $\Sigma$, and finally let $P = \Lambda^{-1} \Sigma \Lambda^{-1}$ denote the correlation matrix.

I am wondering if, with $\mathcal{P} := P - I_p + \Lambda$, the derivative $$ \frac{\mathrm{d}\operatorname{vec}\left( \mathcal{P} \right)}{\mathrm{d} \operatorname{vec} \left( f(\Sigma) \right)} $$ is known, where $\operatorname{vec}$ is the vectorization function and $I_p$ the $p$-dimensional identity matrix.
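
To make the setup concrete, here is a minimal NumPy sketch of the quantities involved (the variable names are mine, chosen only for illustration):

```python
# Minimal sketch of the quantities in the question (NumPy; names are mine).
import numpy as np

p = 3
rng = np.random.default_rng(0)
M = rng.standard_normal((p, p))
Sigma = M @ M.T + p * np.eye(p)       # symmetric positive definite covariance

L = np.linalg.cholesky(Sigma)         # f(Sigma): lower-triangular Cholesky factor
Lam = np.diag(np.diag(L))             # Lambda: diagonal part of the factor
P = np.linalg.inv(Lam) @ Sigma @ np.linalg.inv(Lam)
scriptP = P - np.eye(p) + Lam         # the matrix whose vec is differentiated
```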

I found answers to related questions, for example here, here, and here; however, due to my limited knowledge of matrix calculus, I don't know how to combine these sources, nor whether a closed-form solution exists.

1 Answer


Let's use a naming convention where matrices and vectors are denoted by upper and lower case Latin letters, respectively. Further, the symbol $\odot$ will denote the Hadamard product and $\otimes$ the Kronecker product.

For ease of typing, use $\{S,A,P\}$ instead of $\,\{\Sigma,{\large\Lambda},{\cal P}\}\,$ and $\,X=f(\Sigma)\,$.

Then rewrite the problem using these conventions:
$$\eqalign{
S &= XX^T,\quad A = I\odot X,\quad V=A^{-1} \\
P &= VSV + A - I \\
}$$
Each of these matrices (except for $X$) is symmetric, and $(A,V,I)$ are diagonal.

Apply the vec operation ($K$ denotes the commutation matrix):
$$\eqalign{
y &= {\rm vec}(I) \\
x &= {\rm vec}(X) \quad\implies\quad {\rm vec}(X^T) &\doteq Kx \\
a &= y \odot x \;=\; {\rm Diag}(y)\,x &\doteq Yx \\
da &= Y\,dx \\
\\
s &= (I\otimes X)Kx \;=\; (X\otimes I)\,x \\
ds &= \Big((I\otimes X)K+(X\otimes I)\Big)\,dx &\doteq N\,dx \\
\\
p &= (V\otimes V)s + a-y &\doteq Bs + a-y \\
&= (VS\otimes I)v + a-y &\doteq Hv + a-y \\
&= (I\otimes VS)v + a-y &\doteq Jv + a-y \\
}$$
Finally, calculate the differentials of $v$ and $p$
$$\eqalign{
dv &= {\rm vec}(-V\,dA\,V) = -(V\otimes V)\,da \\
&= -B\,da \;=\; -BY\,dx \\
\\
dp &= da + B\,ds + H\,dv + J\,dv \\
&= Y\,dx + BN\,dx - (H+J)BY\,dx \\
&= \Big(Y + BN - (H+J)BY\Big)\,dx \\
}$$
and the gradient with respect to $x$
$$\eqalign{
\frac{\partial p}{\partial x}
&= Y + BN - (H+J)BY \\
&= Y + (V\otimes V)\Big((I\otimes X)K+(X\otimes I)\Big) - (VS\otimes I + I\otimes VS)(V\otimes V)Y \\
&= Y + (V\otimes VX)K+(VX\otimes V) - \big(VSV\otimes V + V\otimes VSV\big)Y \\
}$$
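
The closed form can be sanity-checked numerically. Below is a NumPy sketch (vec taken column-major so that `np.kron` matches the Kronecker products above; the helper functions and names are my own, not part of the derivation) comparing it against a central finite-difference Jacobian:

```python
# Finite-difference check of  dp/dx = Y + (V⊗VX)K + (VX⊗V) - (VSV⊗V + V⊗VSV)Y.
# Sketch only; vec is column-major, helper names are mine.
import numpy as np

def vec(M):
    return M.reshape(-1, order="F")              # column-major vectorization

def commutation(p):
    """Commutation matrix K with K @ vec(M) = vec(M.T)."""
    K = np.zeros((p * p, p * p))
    for i in range(p):
        for j in range(p):
            K[j * p + i, i * p + j] = 1.0
    return K

def P_of_X(X):
    """P = V S V + A - I with S = X X^T, A = I∘X, V = A^{-1}."""
    S = X @ X.T
    A = np.diag(np.diag(X))
    V = np.linalg.inv(A)
    return V @ S @ V + A - np.eye(X.shape[0])

p = 3
rng = np.random.default_rng(1)
X = np.tril(rng.standard_normal((p, p))) + p * np.eye(p)   # lower-triangular factor
S = X @ X.T
V = np.linalg.inv(np.diag(np.diag(X)))
I, K = np.eye(p), commutation(p)
Y = np.diag(vec(I))

J_closed = (Y + np.kron(V, V @ X) @ K + np.kron(V @ X, V)
            - (np.kron(V @ S @ V, V) + np.kron(V, V @ S @ V)) @ Y)

eps = 1e-6
J_fd = np.zeros((p * p, p * p))
for k in range(p * p):
    E = np.zeros((p, p))
    E[k % p, k // p] = eps                        # perturb the k-th entry of vec(X)
    J_fd[:, k] = (vec(P_of_X(X + E)) - vec(P_of_X(X - E))) / (2 * eps)

print(np.abs(J_closed - J_fd).max())              # should be tiny (≈1e-8 or below)
```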

greg
  • Thank you very much, this is wonderful!
    1. Do you have any reference where I could read up on the techniques you used?
    2. (With your notation) I am using your result to compute $\partial Tp / \partial Lx = T (\partial p / \partial x) D $ [Eq. 1], where $T$ is similar to an elimination matrix but with a different ordering, $L$ is an elimination matrix, and $D$ the corresponding duplication matrix. Using your main result I implemented Eq. 1, but on test cases I get results that differ from the true Jacobian, although they are similar. Is this because Eq. 1 is wrong, or is there a mistake in your derivation?
    –  Jul 23 '20 at 17:54
  • $X$ is not a symmetric matrix, so you cannot operate on it with elimination/duplication matrices to change between ${\rm vec}(X) \iff {\rm vech}(X)$. This interchange only works for symmetric matrices like $S$ and $P$. – greg Jul 23 '20 at 18:03
  • In this case $X$ is lower triangular. Shouldn't it work then? vec(X) and vech(X) contain the same information, although one needs to redefine the duplication matrix to map to zero whenever the vec index lies in the strictly upper part of $X$ (see the sketch after this thread). This method is used for example here https://mathoverflow.net/questions/150427/the-derivative-of-the-cholesky-factor. –  Jul 23 '20 at 18:25
  • Notice that that answer uses the non-standard notation ${\rm vech}_\Delta(X)$. If you want to use that operation you are free to do so, but don't confuse it with the standard notation -- which is what the elimination and duplication matrices are part of. – greg Jul 23 '20 at 18:30
  • Thank you for the clarification! –  Jul 23 '20 at 18:54
  • If I understand it correctly, the problem statement was about a correlation matrix, in which case $V$ as calculated here would be an incorrect standardizer, since its diagonal would instead have to be $\frac{1}{\sqrt{s_{i,i}}}$. – anymous.asker Jul 03 '22 at 18:15
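
Following up on the elimination/duplication discussion in the comments: below is a minimal sketch (my own naming, column-major vec assumed) of a selection matrix $T$ for $\operatorname{vech}(P)$ and a "vech$_\Delta$"-style duplication matrix $D$ for lower-triangular $X$, in terms of which the reduced Jacobian would be $T\,(\partial p/\partial x)\,D$:

```python
# Sketch of the reduction discussed in the comments (my own naming, column-major vec):
# T picks the lower-triangular entries of vec(P); D maps vech(X) back to vec(X),
# leaving the strictly-upper positions at zero because X is lower triangular.
import numpy as np

def lower_pairs(p):
    # lower-triangular positions (i >= j), in column-major vech order
    return [(i, j) for j in range(p) for i in range(j, p)]

def selection_T(p):
    """T @ vec(M) = vech(M)."""
    pairs = lower_pairs(p)
    T = np.zeros((len(pairs), p * p))
    for r, (i, j) in enumerate(pairs):
        T[r, j * p + i] = 1.0
    return T

def duplication_D_lower(p):
    """D @ vech(X) = vec(X) for lower-triangular X (no symmetrization)."""
    return selection_T(p).T

# With J = dp/dx from the answer above, the Jacobian of vech(P) with respect
# to vech(X) would then be  selection_T(p) @ J @ duplication_D_lower(p).
```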