Finding the matrix derivative of $X^{-1}$ with respect to $X$

Question

Assume $X \in \mathbb{R}^{n \times n}$. I could not find a formula for the derivative of $X^{-1}$ with respect to $X$, but I did find a related formula involving the matrix inverse:

(1) $\frac{\partial}{\partial X} (a^TX^{-1}b) = -X^{-T}ab^TX^{-T}, \quad a, b \in \mathbb{R}^n$

Can anyone give some insight into how to derive a formula for the derivative of $X^{-1}$, or formula (1)?

Thank you in advance.

======

This post shared and discussed the same topic, and I was asked whether the current post is redundant. I believe the way the problem is stated and discussed in the two posts is different. Specifically, I was trying to learn a simple approach for finding the derivative of a matrix expression that contains the inverse of a matrix. I believe the detailed answer and discussion in this post are helpful to other learners like me (with an elementary understanding of calculus and matrices).

A formula for $X^{-1}$? Do you mean a formula for derivative of $X^{-1}$? And do you know the definition of derivative as in https://en.wikipedia.org/wiki/Fréchet_derivative? — edm, Aug 25 '17 at 02:35
@spaceisdarkgreen Thanks for your comment. I edited the post. — Crimson, Aug 25 '17 at 02:48
@angryavian Thanks for sharing the post. It looks similar to my question, but that question is stated in a way that is not easy for me to understand. — Crimson, Aug 25 '17 at 03:45

Guillermo Angeris · Accepted Answer · 2017-09-13T06:44:13.447

In this case, I imagine you want the matrix derivative of the above expression. As such, let $X(t)$ be a differentiable matrix-valued function that is invertible for $t$ in some neighbourhood of $0$; then, differentiating with the product rule,

$$ X^{-1}(t)X(t) = I \implies \frac{\partial X^{-1}(t)}{\partial t}X(t) + X^{-1}(t)\frac{\partial X(t)}{\partial t} = 0 $$

rearranging and multiplying on the right by the inverse yields $$ \frac{\partial X^{-1}(t)}{\partial t} = -X^{-1}(t)\frac{\partial X(t)}{\partial t} X^{-1}(t). $$
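
As a sanity check, the path-derivative formula above can be compared against a central finite difference. The following is only a minimal sketch using NumPy: the names `X0`, `Y`, `X`, and the step `h` are illustrative choices, the path is assumed to be the straight line $X(t) = X_0 + tY$, and the shift by $10I$ is just a convenient way to keep the random matrix well away from singular.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Illustrative base point: a random matrix shifted by 10*I so it is safely invertible.
X0 = rng.standard_normal((n, n)) + 10 * np.eye(n)
# Illustrative direction; along the path X(t) = X0 + t*Y we have dX/dt = Y.
Y = rng.standard_normal((n, n))


def X(t):
    """Straight-line path through X0 with velocity Y (an assumption of this sketch)."""
    return X0 + t * Y


def inv_derivative_fd(t, h=1e-6):
    """Central finite difference of t -> X(t)^{-1}."""
    return (np.linalg.inv(X(t + h)) - np.linalg.inv(X(t - h))) / (2 * h)


def inv_derivative_formula(t):
    """Closed form: -X^{-1}(t) X'(t) X^{-1}(t), with X'(t) = Y."""
    Xinv = np.linalg.inv(X(t))
    return -Xinv @ Y @ Xinv


# Largest entrywise discrepancy at t = 0; it should be tiny compared with the entries.
err = np.max(np.abs(inv_derivative_fd(0.0) - inv_derivative_formula(0.0)))
print(err)
```

Note the order of the factors: because matrices do not commute, the result is $-X^{-1}X'X^{-1}$ rather than the scalar-style $-X'X^{-2}$.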

This is probably the derivative you were looking for originally. From here, showing (1) is straightforward:

$$ \frac{\partial a^T X^{-1}(t)b}{\partial t} = a^T\frac{\partial X^{-1}(t)}{\partial t}b = -a^T X^{-1}(t)\frac{\partial X(t)}{\partial t} X^{-1}(t) b. $$

Assuming $X(t) = X + tY$, and evaluating at $t=0$ yields

$$ \frac{\partial a^T X^{-1}(t)b}{\partial t}\bigg|_{t=0} = a^T\frac{\partial X^{-1}(t)}{\partial t}\bigg|_{t=0}b =-a^T X^{-1} Y X^{-1} b $$ which is linear in $Y$. Rewriting it as $\operatorname{tr}(G^T Y)$ with $G = -X^{-T}ab^TX^{-T}$, so that it acts on a general $Y$, gives your solution.

I guess I should probably just complete the solution. For a differentiable function $F:\mathbb{R}^{n\times n} \to \mathbb{R}$, and with $e_{ij} = e_ie_j^T$ where the $e_i$ are the standard basis vectors, we usually define

$$ \left(\frac{\partial F(A)}{\partial X}\right)_{ij} \equiv \frac{\partial F(A+te_{ij})}{\partial t}\bigg|_{t=0} $$

Note that this is equivalent to taking component-wise derivatives with respect to the entries of $X$, evaluated at the 'point' (i.e. matrix) $A$.

Using this definition, i.e. taking $Y = e_{ij}$ in the directional derivative above, we get $$ \left(\frac{\partial a^T X^{-1}b}{\partial X}\right)_{ij} = -a^T X^{-1} e_{ij} X^{-1} b $$

or, writing out the multiplication explicitly using Kronecker deltas ($\delta_{ij} = 1$ when $i=j$ and $0$ otherwise, so that $(e_{ij})_{k\ell} = \delta_{ik}\delta_{j\ell}$) and using the Einstein summation convention (i.e. repeated indices are implicitly summed), we get

$$ \begin{align} \left(\frac{\partial a^T X^{-1}b}{\partial X}\right)_{ij} &= -\left(a^T X^{-1}\right)_{k} \delta_{ik}\delta_{j\ell} (X^{-1} b)_{\ell} \\ &= -\left(a^T X^{-1}\right)_{i}(X^{-1} b)_{j} \\ &= -\left(\left(a^T X^{-1}\right)^T(X^{-1} b)^T\right)_{ij}\\ &= -\left(X^{-T}ab^TX^{-T}\right)_{ij} \end{align} $$

as we wished.
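
The entrywise definition above also lends itself to a direct numerical check of formula (1). This is a minimal sketch, not part of the derivation: it assumes NumPy, uses randomly generated `X`, `a`, `b` purely for illustration, and perturbs one entry of $X$ at a time exactly as in the definition with $e_{ij}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
# Illustrative data: a non-symmetric, safely invertible X and arbitrary vectors a, b.
X = rng.standard_normal((n, n)) + 10 * np.eye(n)
a = rng.standard_normal(n)
b = rng.standard_normal(n)


def phi(M):
    """The scalar function a^T M^{-1} b, computed with a linear solve."""
    return a @ np.linalg.solve(M, b)


# Closed form from formula (1): -X^{-T} a b^T X^{-T}.
Xinv = np.linalg.inv(X)
grad_formula = -Xinv.T @ np.outer(a, b) @ Xinv.T

# Entry (i, j) of the gradient: derivative of t -> phi(X + t * e_ij) at t = 0,
# approximated here by a central difference with step h.
h = 1e-6
grad_fd = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        grad_fd[i, j] = (phi(X + h * E) - phi(X - h * E)) / (2 * h)

# Largest entrywise discrepancy between the two gradients.
max_err = np.max(np.abs(grad_fd - grad_formula))
print(max_err)
```

Using a non-symmetric `X` matters here: for symmetric $X$ the check could not tell $X^{-T}$ apart from $X^{-1}$.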

Thank you very much for your detailed answer. If you don't mind, I have a few questions. I didn't understand the role of $Y$, or why it was introduced in the formula. Is it just there in order to write $X$ as a function $X(t)$? — Crimson, Aug 25 '17 at 03:05
@Crimson Sorry, I just realized it was easier to finish up the derivation, overall! Perhaps the added details might help. — Guillermo Angeris, Aug 25 '17 at 03:13
I was not familiar with this way of solving derivative. Thanks again for explaining the details. — Crimson, Aug 25 '17 at 03:22
No worries! I think it's just a 'fancy' way of writing out what the normal definition of the component-wise derivative is, but the nice part is that writing it all out in terms of $X(t)$ reduces the matrix derivative problem over each of the entries to a one-dimensional (scalar) problem over $t$ (e.g. note that the first solution I give is a scalar one). Then after all of that, you can just go and plug in each element. — Guillermo Angeris, Aug 25 '17 at 03:24

Gribouillis · Answer 2 · 2017-08-25T07:27:18.543

A simpler way to present this uses the formal definition of the derivative of a function $f: E \to F$ where $E$ and $F$ are two normed spaces (such as a space of matrices). The function $f$ has a derivative (or differential) at point $X$ if there exists a linear map $f^\prime(X): E \to F$ such that when $\|H\| \to 0$ $$f(X+H) = f(X) + f^\prime(X)\cdot H + o(H)$$ where $o()$ is the little-o notation and $f^\prime(X)\cdot H$ means the image of $H$ by the linear map $f^\prime(X)$.

Let's apply this to $f(X) = X^{-1}$. First, when $H$ is small enough that $I+H$ is invertible (e.g. $\|H\| < 1$), we have $$(I+H)(I-H) = I - H^2\quad\Rightarrow\quad (I+H)^{-1} = I - H + (I+H)^{-1}H^2 = I - H + o(H),$$ since $\|(I+H)^{-1}H^2\| \le \|(I+H)^{-1}\|\,\|H\|^2$. This proves that the derivative at $I$ is $f^\prime(I)\cdot H = -H$.

Now let $X$ be invertible. For $H$ small enough, we have $$(X+H)^{-1} = (X (I+X^{-1}H))^{-1} = (I + X^{-1}H)^{-1}X^{-1} = (I - X^{-1}H + o(X^{-1} H))X^{-1} $$ Hence $f(X+H)= f(X) - X^{-1} H X^{-1} + o(H)$, and it follows that $$f^\prime(X)\cdot H = - X^{-1}H X^{-1}$$ This proof is very general: it works not only for matrices but also for inversion in normed algebras.
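
The first-order expansion can be probed numerically by looking at the remainder $(X+H)^{-1} - X^{-1} + X^{-1}HX^{-1}$ as $H$ shrinks. The sketch below assumes NumPy and uses an illustrative random `X` and a fixed unit-norm direction `D`; for this smooth map the remainder is expected to scale like $\|H\|^2$, which is stronger than the $o(H)$ the definition requires.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
# Illustrative, safely invertible base point.
X = rng.standard_normal((n, n)) + 10 * np.eye(n)
# Fixed perturbation direction, normalized to Frobenius norm 1.
D = rng.standard_normal((n, n))
D /= np.linalg.norm(D)
Xinv = np.linalg.inv(X)


def remainder_norm(eps):
    """Frobenius norm of (X+H)^{-1} - X^{-1} + X^{-1} H X^{-1} for H = eps * D."""
    H = eps * D
    R = np.linalg.inv(X + H) - Xinv + Xinv @ H @ Xinv
    return np.linalg.norm(R)


# If the remainder is second order, remainder / eps^2 should stay roughly constant.
eps_values = [1e-1, 1e-2, 1e-3]
ratios = [remainder_norm(eps) / eps**2 for eps in eps_values]
print(ratios)
```

A roughly constant ratio is consistent with the leading remainder term being $X^{-1}HX^{-1}HX^{-1}$, the next term of the Neumann series.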

Now if one takes $\phi(X) = a^T X^{-1} b$, we obtain $\phi^\prime(X)\cdot H = - a^T X^{-1} H X^{-1}b$. The notation $\big(\frac{\partial \phi(X)}{\partial X}\big)_{ij}$ that you are using is equal to $\phi^\prime(X)\cdot E_{ij}$, where $E_{ij}$ is the matrix whose entries are all $0$ except the entry in position $(i,j)$, which is $1$. It is easy to see that $\phi^\prime(X)\cdot E_{ij} = - v_i w_j$, where $v_i$ are the components of $X^{-T}a$ and $w_j$ are the components of $X^{-1}b$.
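
To spell out that last step (a short worked computation, writing $E_{ij} = e_i e_j^T$ with $e_i$ the standard basis vectors, and $v = X^{-T}a$, $w = X^{-1}b$):

$$ \phi^\prime(X)\cdot E_{ij} = -a^T X^{-1} e_i\, e_j^T X^{-1} b = -\left(e_i^T X^{-T} a\right)\left(e_j^T X^{-1} b\right) = -v_i w_j. $$

Collecting the entries into a matrix gives $-vw^T = -X^{-T}a\,(X^{-1}b)^T = -X^{-T}ab^TX^{-T}$, which is formula (1).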

@DanielV Yes, it's about the same definition in calculus of variations or in any place where there is a definition of the derivative such as differential geometry. The idea is always that the first order variation of a function is a linear one. — Gribouillis, Aug 25 '17 at 07:10
Isn't it more like the map $f \to f'$ is linear, rather than $f'$ being linear, or is $f'$ restricted to being linear because you are dealing with a normed space or something? Also, is it supposed to be $o(H^2)$? — DanielV, Aug 25 '17 at 08:15
@DanielV No, no, the map $H \to f^\prime(X)\cdot H$ is linear, that's the important point. In our case $H \to - X^{-1} H X^{-1}$ is linear. It is also true that $f\to f^\prime$ is linear, but that is only a consequence of the definition. Often the $o(H)$ can be made an $O(\|H\|^2)$ because many functions have a second derivative, but it is not always the case. — Gribouillis, Aug 25 '17 at 08:19
