The biggest challenge with this derivative is that matrices do not necessarily commute. This post should give you a taste of the difficulties one may encounter.
Nevertheless, let's give it a try. But first, let's generalize the problem a bit by considering the map
$$ f\colon\mathbb S^n_+\longrightarrow\mathbb S^n_+,\, X\longmapsto X^{\alpha} \overset{\text{def}}{=} \exp(\alpha\log(X))$$
which maps a symmetric, positive definite matrix to its matrix power, defined via matrix exponential and matrix logarithm. One can easily check that $\sqrt{X} = X^{1/2}$ by diagonalizing.
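Concretely, writing $X = U D U^\top$ with $U$ orthogonal and $D = \operatorname{diag}(\lambda_1,\dots,\lambda_n)$, $\lambda_i > 0$, we get
$$X^{\alpha} = \exp(\alpha\log(X)) = U \operatorname{diag}(\lambda_1^{\alpha},\dots,\lambda_n^{\alpha})\, U^\top,$$
and in particular $X^{1/2}\cdot X^{1/2} = U\operatorname{diag}(\lambda_1,\dots,\lambda_n)\,U^\top = X$.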
1. The commutative case
If $\Delta X$ commutes with $X$, then everything is easy, because for commuting matrices $A$ and $B$ we have $\log(A\cdot B)=\log(A) + \log(B)$ and $\exp(A+B) = \exp(A)\cdot\exp(B)$, identities which do not hold in the general case.
In the commuting case (note that then $X^{-1}$ and $X+\Delta X$ commute as well, which justifies combining the logarithms below) we have
$$\begin{aligned}
\log(X+\Delta X)-\log(X)
&= \log(X+\Delta X) + \log(X^{-1})
\\&= \log((X+\Delta X)X^{-1})
\\&= \log(I+X^{-1}\Delta X)
\\&= \sum_{k=1}^{\infty}(-1)^{k+1} \frac{(X^{-1}\Delta X)^{k}}{k}
\sim X^{-1}\Delta X \\
\exp(X+\Delta X) - \exp(X)
&= \exp(X)\exp(\Delta X) - \exp(X)
\\&= \exp(X)\big(\exp(\Delta X) -I\big)\\
&= \exp(X)\sum_{k=1}^{\infty} \frac{(\Delta X)^k}{k!} \sim \exp(X)\Delta X
\end{aligned}$$
Hence $\partial_{\Delta X}\log(X) = X^{-1}$ and $\partial_{\Delta X} \exp(X) = \exp(X)$ in the commuting case $[X, \Delta X]=0$. In particular, the chain rule gives $\partial_{\Delta X} X^\alpha = \alpha X^{\alpha-1}$, which matches our expectation.
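Indeed, writing $X^\alpha = \exp(\alpha\log(X))$ and composing the two derivatives above (everything commutes here, so the factors may be rearranged freely):
$$\partial_{\Delta X} X^{\alpha} = \exp(\alpha\log(X))\cdot \alpha X^{-1} = \alpha X^{\alpha} X^{-1} = \alpha X^{\alpha-1}.$$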
2. The non-commutative case
This one is much harder. One can try to use the Baker-Campbell-Hausdorff formula and/or the Zassenhaus formula, but the end result won't be pretty. In general, the nice formula from before does not hold anymore. For example, if $\alpha=2$ then one can easily check that
$$(X+\Delta X)^2-X^2 \sim X\cdot\Delta X + \Delta X \cdot X \neq 2X\cdot\Delta X$$
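This follows from expanding the square and dropping the second-order term:
$$(X+\Delta X)^2 - X^2 = X\cdot\Delta X + \Delta X\cdot X + (\Delta X)^2 \sim X\cdot\Delta X + \Delta X\cdot X,$$
and the symmetrized term equals $2X\cdot\Delta X$ only if $X$ and $\Delta X$ commute.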
We can verify this numerically as well, for example with the Python (3.7) program below. In the commutative case the maximal residuals stay small as you increase the sample size $N$, but in the general case they keep growing (randomly sampling matrices with an extremely large commutator is rather rare, so larger samples keep turning up worse cases).
import numpy as np
from scipy.linalg import norm, fractional_matrix_power as powm
from scipy.stats import ortho_group, wishart  # to sample orthogonal / SPD matrices

alphas = np.linspace(0.5, 5, 10)
N = 100       # sample size
eps = 10**-8  # perturbation size
n = 6         # matrix size

print("Commutative case")
# simultaneously diagonalizable => commuting
for a in alphas:
    r = 0
    for _ in range(N):
        D = np.diag(np.random.rand(n))
        S = np.diag(np.random.rand(n))
        U = ortho_group.rvs(n)
        X = U.T @ D @ U         # X and DX share the eigenbasis U,
        DX = eps * U.T @ S @ U  # hence [X, DX] = 0
        num = powm(X + DX, a) - powm(X, a)  # numerical derivative
        ana = a * powm(X, a - 1) @ DX       # formula
        r = max(r, norm(num - ana))
    print(f"alpha: {a:.2f}, max residual {r}")
# residuals should be numerically close to zero

print("General case")
for a in alphas:
    r = 0
    for _ in range(N):
        X = wishart(scale=np.eye(n)).rvs()         # random SPD matrix
        DX = eps * wishart(scale=np.eye(n)).rvs()  # perturbation, generally not commuting with X
        num = powm(X + DX, a) - powm(X, a)  # numerical derivative
        ana = a * powm(X, a - 1) @ DX       # formula
        r = max(r, norm(num - ana))
    print(f"alpha: {a:.2f}, max residual {r}")
# residuals should be much larger
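As a sanity check for the $\alpha=2$ example above, here is a minimal sketch (reusing the Wishart sampling from the program above) comparing the commutative formula $2X\cdot\Delta X$ with the symmetrized first-order term $X\cdot\Delta X + \Delta X\cdot X$:

import numpy as np
from scipy.linalg import norm
from scipy.stats import wishart

n, eps = 6, 10**-8
X = wishart(scale=np.eye(n)).rvs()         # random SPD matrix
DX = eps * wishart(scale=np.eye(n)).rvs()  # perturbation, generally not commuting with X

num = (X + DX) @ (X + DX) - X @ X  # exact difference for alpha = 2
naive = 2 * X @ DX                 # commutative formula
sym = X @ DX + DX @ X              # symmetrized first-order term

print(f"naive residual:       {norm(num - naive):.2e}")  # order eps (commutator term)
print(f"symmetrized residual: {norm(num - sym):.2e}")    # order eps**2

The symmetrized residual drops to the order of $\varepsilon^2$, while the commutative formula leaves a residual of order $\varepsilon$ coming from the commutator $[\Delta X, X]$.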