
Given a symmetric positive definite matrix $\mathbf{C}$ $\in \mathbb{R}^{m \times m}$, and its (lower triangular) Cholesky decomposition factor $\mathbf{L}$ $\in \mathbb{R}^{m \times m}$ such that $\mathbf{C}$ $=$ $\mathbf{L}$ $\mathbf{L}^\top$, I need to obtain the Jacobian $\frac{\partial \text{vec}(\mathbf{L})}{\partial \text{vec}(\mathbf{C})}$ $\in \mathbb{R}^{m^2 \times m^2}$.

As you might expect, this is not a new problem, and several others have already claimed to solve it. Notably, I found the following prior work on this partial derivative:

  • This MathOverflow post contains several discussions of how to solve it

  • in the comments of that post, user "pete" links a textbook whose page 231/252 gives a proof (Theorem A.1) of the partial derivative of $\mathbf{L}$ w.r.t. $\mathbf{C}$, which can readily be used to construct $\frac{\partial \text{vec}(\mathbf{L})}{\partial \text{vec}(\mathbf{C})}$

  • and building on that, this paper by Iain Murray uses the above proof to derive $\frac{\partial \text{vec}(\mathbf{L})}{\partial \text{vec}(\mathbf{C})}$ on page 4/18 (equation 14).

These resources lead to the following derivation:

$\frac{\partial \text{vec}(\mathbf{L})}{\partial \text{vec}(\mathbf{C})} \ = \ \big( \mathbf{I}_m \otimes \mathbf{L} \big) \ \mathbf{Z} \ \big( \mathbf{L}^{-1} \otimes \mathbf{L}^{-1} \big)$

where $\mathbf{Z} = \text{diag} \big( \text{vec} ( \mathbf{Q} ) \big) \in \mathbb{R}^{m^2 \times m^2}$, and the lower triangular matrix $\mathbf{Q} \in \mathbb{R}^{m \times m}$ is defined by

$\mathbf{Q}_{(i,j)} = \begin{cases} 1 & i > j \\ \frac{1}{2} & i = j \\ 0 & i < j \end{cases}$

Note that the cited works define a function "Phi" acting on a square matrix $\mathbf{M}$; this is equivalent to the elementwise multiplication of $\mathbf{M}$ with the matrix $\mathbf{Q}$ I define above. I prefer the elementwise form for simplicity (it differentiates to $\text{diag} \big( \text{vec} ( \mathbf{Q} ) \big)$), since it appears to be more commonly used in matrix-calculus derivations.
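For concreteness, here is a minimal MATLAB sketch of how this derived Jacobian could be assembled (variable names are my own, and an SPD matrix C is assumed to already exist in the workspace):

% assemble the derived Jacobian from the formula above
m = size(C,1);
L = chol(C,'lower');
Q = tril(ones(m)) - eye(m)/2;    % 1 below the diagonal, 1/2 on it, 0 above
Z = diag(Q(:));                  % Z = diag(vec(Q))
Linv = inv(L);
JacobianDerived = kron(eye(m),L) * Z * kron(Linv,Linv);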

Now I needed to confirm that this Jacobian was correct, so I coded it in MATLAB and compared it against the true Jacobian that you would get by measuring how $\mathbf{L}$ changes as you change each value in $\mathbf{C}$ by some infinitesimal value $\epsilon$, e.g., $\epsilon = 10^{-10}$. The following MATLAB code can be used to obtain the true Jacobian, i.e., JacobianTrue = $\frac{\partial \text{vec}(\mathbf{L})}{\partial \text{vec}(\mathbf{C})}$:

MATLAB code to generate the true Jacobian of the Cholesky factor (listing not reproduced here).
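Since that listing is not shown, here is a sketch of the finite-difference check I mean (my reconstruction, not the original code; each entry is perturbed together with its mirror entry so that C stays symmetric):

% finite-difference approximation of d vec(L) / d vec(C)
m = size(C,1);
L = chol(C,'lower');
ep = 1e-10;                           % small finite-difference step
JacobianTrue = zeros(m^2, m^2);
for k = 1:m^2
    [i,j] = ind2sub([m m], k);        % entry of C for Jacobian column k
    dC = zeros(m);
    dC(i,j) = ep;  dC(j,i) = ep;      % symmetric perturbation of entry (i,j)
    Lp = chol(C + dC, 'lower');
    JacobianTrue(:,k) = (Lp(:) - L(:)) / ep;
end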

Unfortunately, I found that this JacobianTrue disagrees with JacobianDerived $= \big( \mathbf{I}_m \otimes \mathbf{L} \big) \ \mathbf{Z} \ \big( \mathbf{L}^{-1} \otimes \mathbf{L}^{-1} \big)$.

When inspecting the two Jacobians in a very simple $m=2$ case, where $\mathbf{C} \in \mathbb{R}^{2 \times 2}$ and thus the Jacobian is in $\mathbb{R}^{4 \times 4}$, I observed that JacobianDerived and JacobianTrue have columns that agree for the diagonal entries of $\mathbf{C}$ (columns 1 and 4 of the Jacobian) but disagree for the off-diagonal entries of $\mathbf{C}$ (columns 2 and 3 of the Jacobian).

It is worth noting that since $\mathbf{C}$ is symmetric, any two columns of the Jacobian $\frac{\partial \text{vec}(\mathbf{L})}{\partial \text{vec}(\mathbf{C})}$ corresponding to $\mathbf{C}_{(i,j)}$ and $\mathbf{C}_{(j,i)}$ must be equal. This holds for the true Jacobian but not for the derived one: with $m=2$, the 2nd column of the Jacobian should equal the 3rd, corresponding to $\mathbf{C}_{(1,2)}$ and $\mathbf{C}_{(2,1)}$, yet the derived Jacobian does not have this property.
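Continuing the hypothetical sketches above, for $m=2$ this symmetry is a one-line check:

% columns 2 and 3 correspond to C(2,1) and C(1,2) under column-major vec
norm(JacobianTrue(:,2) - JacobianTrue(:,3))        % ~0: columns agree
norm(JacobianDerived(:,2) - JacobianDerived(:,3))  % nonzero: columns differ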

So not only do I think the derived Jacobian is wrong, I suspect the last term, $\big( \mathbf{L}^{-1} \otimes \mathbf{L}^{-1} \big)$, is the one that is incorrect, and possibly another term as well. For the Jacobian to have equal 2nd and 3rd columns when $m=2$, the last matrix in the product (here $\mathbf{L}^{-1} \otimes \mathbf{L}^{-1}$) must itself have its 2nd column equal to its 3rd, and it is easy to show that this is not the case. I also wonder whether the middle term $\mathbf{Z}$ is incorrect, since I got an answer that slightly better resembled the true Jacobian when I set $\mathbf{Q}_{(i,j)} = 1$ for $i = j$.

I am wondering where to go from here. I really need the correct hand-derived form of $\frac{\partial \text{vec}(\mathbf{L})}{\partial \text{vec}(\mathbf{C})}$, but my tests indicate that the previously derived form is incorrect, and I see no clear way to repair the derivation.

Cal
  • Thanks greg. In the first post you linked, you were the one who provided the answer (in the form $\frac{\partial \text{vec}(\mathbf{G})}{\partial \text{vec}(\mathbf{L})}$, where $\mathbf{G}$ is the Cholesky factor and $\mathbf{L}$ is the symmetric matrix). Essentially, I am claiming that this derivation is incorrect (at least in the last term of the Jacobian), because it does not maintain the symmetry of the symmetric matrix when viewed through the columns of the Jacobian (one column per entry of the symmetric matrix). I'd like to see what people think about my claim. – Cal Mar 30 '25 at 02:01
  • What is the motivation for this question? – Rodrigo de Azevedo Mar 30 '25 at 08:55
  • Rodrigo, I'm actually trying to differentiate the QR decomposition of a matrix $\mathbf{X}$, i.e., obtain the derivatives of the orthonormal factor $\mathbf{Q}$ and the upper triangular factor $\mathbf{R}$ in the same vectorized (Fréchet-derivative-style) form as $\frac{\partial \text{vec}(\mathbf{L})}{\partial \text{vec}(\mathbf{C})}$. If I could get the derivative of one factor of the factorization, I could get the other, and given the relationship between $\mathbf{R}$ and the Cholesky factor $\mathbf{L}$ of $\mathbf{C} = \mathbf{X}^\top \mathbf{X}$, I thought to try this route. – Cal Mar 30 '25 at 18:39

2 Answers


Start with $$ \pmb C = \left[\begin{matrix} x & y \\ y & t\end{matrix} \right], \qquad (\operatorname{vec} \pmb C)^\top = \left[ \begin{matrix} x & y & y & t \end{matrix} \right]. $$ For the Cholesky decomposition to work, $\pmb C=\pmb C^\top$, so this matrix has only 3 independent variables. Since $\pmb L \pmb L^\top = \pmb C$, $$ \pmb L=\left[\begin{matrix}\sqrt{x} & 0\\\frac{y}{\sqrt{x}} & \sqrt{t - \frac{y^{2}}{x}}\end{matrix}\right], \qquad (\operatorname{vec} \pmb L)^\top = \left[ \begin{matrix} \sqrt{x} & \frac{y}{\sqrt{x}} & 0 & \sqrt{t-\frac{y^2}{x}} \end{matrix} \right]. $$ So we have $$ \frac{\partial \operatorname{vec} \pmb L}{\partial (\operatorname{vec} \pmb C)^\top} = \left[ \begin{matrix} \frac{1}{2\sqrt{x}} & 0 & 0 & 0 \\ -\frac{y}{2x^{\frac{3}{2}}} & \frac{1}{\sqrt{x}} & \frac{1}{\sqrt{x}} & 0 \\ 0 & 0 & 0 & 0 \\ \frac{y^2}{2x^2\sqrt{t-\frac{y^2}{x}}} & -\frac{y}{x\sqrt{t-\frac{y^2}{x}}} & -\frac{y}{x\sqrt{t-\frac{y^2}{x}}} & \frac{1}{2\sqrt{t-\frac{y^2}{x}}} \end{matrix} \right]. $$

To apply the solution by @greg using matrix calculus, first calculate $$ \pmb L^{-1}=\left[\begin{matrix}\frac{1}{\sqrt{x}} & 0\\- \frac{y}{x \sqrt{t - \frac{y^{2}}{x}}} & \frac{1}{\sqrt{t - \frac{y^{2}}{x}}}\end{matrix}\right]. $$ Then $$ \pmb L^{-1} \otimes \pmb L^{-1}= \left[\begin{matrix}\frac{1}{x} & 0 & 0 & 0\\- \frac{y}{x^{\frac{3}{2}} \sqrt{t - \frac{y^{2}}{x}}} & \frac{1}{\sqrt{x} \sqrt{t - \frac{y^{2}}{x}}} & 0 & 0\\- \frac{y}{x^{\frac{3}{2}} \sqrt{t - \frac{y^{2}}{x}}} & 0 & \frac{1}{\sqrt{x} \sqrt{t - \frac{y^{2}}{x}}} & 0\\\frac{y^{2}}{x \left(t x - y^{2}\right)} & - \frac{y}{t x - y^{2}} & - \frac{y}{t x - y^{2}} & \frac{x}{t x - y^{2}}\end{matrix}\right]. $$ We must also calculate $$ \pmb I \otimes \pmb L= \left[\begin{matrix}\sqrt{x} & 0 & 0 & 0\\\frac{y}{\sqrt{x}} & \sqrt{t - \frac{y^{2}}{x}} & 0 & 0\\0 & 0 & \sqrt{x} & 0\\0 & 0 & \frac{y}{\sqrt{x}} & \sqrt{t - \frac{y^{2}}{x}}\end{matrix}\right]. $$ Since $$ \pmb Q = \left[ \begin{matrix} \frac{1}{2} & 0 \\ 1 & \frac{1}{2} \end{matrix} \right], \qquad (\operatorname{vec} \pmb Q)^\top = \left[ \begin{matrix} \frac{1}{2} & 1 & 0 & \frac{1}{2} \end{matrix} \right], $$ we have $$ \pmb Z=\operatorname{diag}\operatorname{vec} \pmb Q= \left[\begin{matrix}\frac{1}{2} & 0 & 0 & 0\\0 & 1 & 0 & 0\\0 & 0 & 0 & 0\\0 & 0 & 0 & \frac{1}{2}\end{matrix}\right]. $$

If we use the formula $$ \frac{\partial \operatorname{vec} \pmb L}{\partial (\operatorname{vec} \pmb C)^\top} =(\pmb I \otimes \pmb L) \pmb Z (\pmb L^{-1} \otimes \pmb L^{-1}), $$ we get $$ \left( \pmb I \otimes \pmb L \right) \pmb Z \left( \pmb L^{-1} \otimes \pmb L^{-1} \right)= \left[\begin{matrix}\frac{1}{2 \sqrt{x}} & 0 & 0 & 0\\- \frac{y}{2 x^{\frac{3}{2}}} & \frac{1}{\sqrt{x}} & 0 & 0\\0 & 0 & 0 & 0\\\frac{y^{2}}{2 x^{2} \sqrt{t - \frac{y^{2}}{x}}} & - \frac{y}{2 x \sqrt{t - \frac{y^{2}}{x}}} & - \frac{y}{2 x \sqrt{t - \frac{y^{2}}{x}}} & \frac{1}{2 \sqrt{t - \frac{y^{2}}{x}}}\end{matrix}\right]. $$

There is perfect agreement between the terms that only involve $x$ and $t$. Because Cholesky requires $\pmb C$ to be symmetric, the terms that involve differentiation by $y$ do not agree.
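(For anyone who wants to reproduce this algebra mechanically, here is a sketch using MATLAB's Symbolic Math Toolbox, assuming it is available; it mirrors the steps above rather than the original tooling:)

syms x y t positive
C = [x y; y t];
L = [sqrt(x) 0; y/sqrt(x) sqrt(t - y^2/x)];
simplify(L*L.' - C)                      % zero matrix, confirming L L^T = C
Jhand = jacobian(L(:), [x y t]);         % derivatives w.r.t. x, y, t
Jhand = simplify(Jhand(:, [1 2 2 3]))    % repeat the y-column to match vec(C)
Linv = inv(L);
Q = [1/2 0; 1 1/2];
Jform = simplify(kron(eye(2),L) * diag(Q(:)) * kron(Linv,Linv))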

Ted Black
  • great analysis Ted. Both matrices you wrote here agree with my numerical tests (the true Jacobian vs the derived Jacobian). So what do you think, could the derived formula be incorrect? – Cal Mar 30 '25 at 18:50
  • @Cal thanks for posting this question. The expression for the Jacobian is supposed to be (for the $2 \times 2$ case) a derivative w.r.t. 4 independent variables. We are plugging in the condition that there are only 3 independent variables. This is where the derived formula breaks down. This is not because the derivation is incorrect; the derivation correctly assumes that, in $dL=GdG^T+dGG^T$, $L$ is a matrix with no special properties. – Ted Black Mar 31 '25 at 09:25
  • +1 Great post (as usual). I think the confusion is similar to what was described in the Srinivasan-Panda paper on symmetric gradients. Or perhaps it's due to the difference between the parametric/structured gradient versus the gradient wrt matrix elements. – greg Mar 31 '25 at 14:01

Here is a quick numerical test (in Julia) of the formula in question

using LinearAlgebra   # needed for I, cholesky, kron, Diagonal, LowerTriangular, norm

# Julia lacks a nice eye() function
n = 2;  Ia = Matrix(I,n,n) * 1.0;

# construct SPD test matrix
A = randn(n,n);  A = A'*A
#=  1.22053   0.184283
    0.184283  0.583172  =#

# Cholesky factor
L = cholesky(A).L
#=  1.10478   0.0
    0.166805  0.745217  =#

# construct a small symmetric perturbation
dA = randn(n,n)/2e6;  dA = dA + dA'
#=  -9.13143e-7  7.88079e-7
     7.88079e-7  2.74055e-8  =#

# calculate the perturbation of L
dL = cholesky(A+dA).L - L
#=  -4.1327e-7   0.0
     7.75736e-7  -1.55249e-7  =#

# use the formula to ESTIMATE dL
V = inv(L);  P = LowerTriangular(ones(n,n) - Ia/2);  dLest = L*(P.*(V*dA*V'))
#=  -4.1327e-7   0.0
     7.75735e-7  -1.55249e-7  =#

# error of the estimate
norm(dLest - dL)
#=  5.785647315602491e-13  =#

# vectorize the formulas
dl = vec(dL);  dlest = kron(Ia,L)*Diagonal(vec(P))*kron(V,V)*vec(dA);
[ dl dlest (dl-dlest) ]
#=  -4.1327e-7   -4.1327e-7   -7.72474e-14
     7.75736e-7   7.75735e-7   3.01813e-13
     0.0          0.0          0.0
    -1.55249e-7  -1.55249e-7  -4.87523e-13  =#

# the gradient matrix is lower triangular
# (corresponding to Ted's 2nd derivation)
G = kron(Ia,L)*Diagonal(vec(P))*kron(V,V)
#=  0.45258     0.0        0.0        0.0
   -0.0683329   0.90516    0.0        0.0
    0.0         0.0        0.0        0.0
    0.0152953  -0.101303  -0.101303   0.670946  =#

greg