For a positive definite matrix $\Sigma$, its Cholesky decomposition is defined as follows:
$$\Sigma = R^T R$$
where $R$ is an upper-triangular matrix with real entries. I want to compute the gradient of $\Sigma$ with respect to $R$.
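Written out entrywise, the factorization can be sketched as follows (the indices $i, j, k$ are introduced here only for illustration). Because $R_{ki} = 0$ whenever $k > i$, the sum stops at $\min(i, j)$:
$$\Sigma_{ij} = \sum_{k=1}^{\min(i,j)} R_{ki} R_{kj}$$
so each entry of $\Sigma$ depends only on the entries of $R$ on or above the diagonal.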
Here's what I tried (I'm not very familiar with matrix calculus):
\begin{align} \frac{d}{d R}\{R^T R\} &= \frac{d}{d R}\{R^T\} R + R^T \frac{d}{d R}\{ R\} \\ &= \left(\frac{d}{d R}\{R\}\right)^T R + R^T I \\ &= I^T R + R^T I \\ &= R + R^T \end{align}
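As a sanity check on the product-rule step only, here is a minimal NumPy sketch. It compares a finite-difference change in $R^T R$ against the differential $dR^T R + R^T dR$, with the perturbation $dR$ restricted to the upper triangle. The names `sigma`, `differential`, `dR`, and `eps` are illustrative assumptions, and the check says nothing about the later steps that replace $\frac{d}{dR}\{R\}$ with $I$.

```python
import numpy as np


def sigma(R):
    """Return Sigma = R^T R for a square matrix R."""
    return R.T @ R


def differential(R, dR):
    """First-order change in R^T R under a perturbation dR (product rule)."""
    return dR.T @ R + R.T @ dR


rng = np.random.default_rng(0)
n = 4

# Random upper-triangular R with a strictly positive diagonal.
R = np.triu(rng.normal(size=(n, n)))
R[np.diag_indices(n)] = np.abs(np.diag(R)) + 1.0

# Perturb only the upper-triangular entries, so R + eps * dR stays upper-triangular.
dR = np.triu(rng.normal(size=(n, n)))
eps = 1e-6

# Forward finite difference of Sigma in the direction dR.
fd = (sigma(R + eps * dR) - sigma(R)) / eps

# The mismatch should be of order eps, since the leftover term is eps * dR^T dR.
err = np.max(np.abs(fd - differential(R, dR)))
print("max abs mismatch:", err)
```

The differential form sidesteps the question of how to arrange the derivative of a matrix with respect to a matrix, which is one reason it is a convenient object to test numerically.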
However, since $R$ is upper-triangular, does that mean I should simply remove the lower-triangular part of the gradient, which is $R^T$?
In other words, is the correct gradient the following?
$$\frac{d}{d R}\{R^T R\} = R$$
Intuitively, this seems to say: "Yes, we do compute the gradient with respect to the entire $R$, since all entries participate in the computation, but we then realize that the entries below the diagonal are just constants that do not need to be changed by any optimization procedure."