
I have the following definition for the differentiability of a function:

A function $f:A\to \mathbb{R}^n$, where $A$ is a neighborhood of a point $a\in\mathbb{R}^m$, is differentiable at $a$ if there is an $n\times m$ matrix $B$ such that $$ \frac{f(a+h)-f(a)-B\cdot h}{|h|}\to 0 $$ as $h\to 0$. In this case $B$ is said to be the derivative of $f$ at $a$, and is written $B=Df(a)$.

But I got a question asking to show that the inverse matrix map $i:GL_n(\mathbb{R})\to M_n(\mathbb{R})$, mapping each invertible $n\times n$ matrix $A$ to its inverse $A^{-1}$, is differentiable (at every point).

Now I don't know how to formally interpret this case, because the definition is for functions between Euclidean spaces. I know there must be some generalization to normed linear spaces, but I was not able to find it here.

I can think of replacing $A$ and $\mathbb{R}^n$ with the normed linear spaces in question. Since $\det$ is continuous and $GL_n(\mathbb{R})=\det^{-1}(\mathbb{R}\setminus\{0\})$, the domain is open, so it is a neighborhood of each of its points. But then what would the derivative be, and how do I find an expression for it? Is it still an $n\times n$ matrix? I tried to see it as a linear transformation $i'(A):\mathbb{R}^n\to\mathbb{R}^n$, but I can't come up with an expression for it to evaluate the limit.

Thanks in advance!

  • Just use the identification $Mat_n({\mathbb R})={\mathbb R}^{n^2}$. – Moishe Kohan May 06 '20 at 19:51
  • Also, have a look here (basically your question): https://math.stackexchange.com/q/190424/532409 and https://math.stackexchange.com/q/962579/532409 – Quillo May 07 '20 at 09:09

1 Answer


A lot of standard differential calculus can be generalized to the setting of Banach spaces (finite-dimensional or infinite-dimensional), and in fact I think it is conceptually much clearer there. All the standard results, like the chain rule, the product rule, the inverse and implicit function theorems, and even the theory of ODEs, carry over without too much effort to the Banach space setting.

Here are the relevant definitions.

  • Let $\Bbb{F} \in \{\Bbb{R}, \Bbb{C}\}$ be either the real or complex field. A Banach space over $\Bbb{F}$ is a normed vector space $(E, \lVert \cdot\rVert)$ that is complete with respect to the norm (i.e. every Cauchy sequence converges to some point of $E$).

  • Let $(E_1, \lVert \cdot\rVert_1), (E_2, \lVert\cdot\rVert_2)$ be two Banach spaces over $\Bbb{F}$. Let $U \subset E_1$ be open, and let $f:U \to E_2$ be a given map. We say that $f$ is $\Bbb{F}$-differentiable at a point $a \in U$ if there is a continuous linear transformation $B: E_1 \to E_2$ such that \begin{align} \lim_{h \to 0} \dfrac{\lVert f(a+h) - f(a) - B(h) \rVert_2}{\lVert h\rVert_1} = 0. \end{align} In other words, we require that for every $\epsilon > 0$, there exists a $\delta > 0$ such that for all $h \in E_1$, if $0 < \lVert h \rVert_1 < \delta$, then $a+h \in U$ (this is possible since $U$ is open) and \begin{align} \dfrac{\lVert f(a+h) - f(a) - B(h) \rVert_2}{\lVert h\rVert_1} < \epsilon. \end{align}

Of course, if such a $B$ exists, one can prove it is unique; we may denote it by $Df_a$, $Df(a)$, $df_a$, $df(a)$, $f'(a)$, or anything else you like. The key point is that the derivative is a continuous (equivalently, bounded) linear transformation from $E_1$ into $E_2$.
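
To see the definition in action, here's a minimal numerical sketch (not part of any proof), using the squaring map $f(X) = X^2$ on $M_n(\Bbb{R})$, whose derivative at $A$ is the linear map $h \mapsto Ah + hA$: the difference quotient from the definition visibly tends to $0$.

```python
import numpy as np

# Sketch: for f(X) = X @ X on M_n(R), the candidate derivative at A is the
# linear map h |-> A @ h + h @ A.  Since f(A+h) - f(A) - (A h + h A) = h^2,
# the ratio ||remainder|| / ||h|| is at most ||h||, hence tends to 0.
rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))  # a fixed direction

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * H
    remainder = (A + h) @ (A + h) - A @ A - (A @ h + h @ A)
    print(t, np.linalg.norm(remainder) / np.linalg.norm(h))
# The printed ratios shrink roughly linearly in t, as expected for an
# O(||h||^2) remainder.
```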

Note that if the vector space is finite-dimensional, then we have the following facts:

  • We can always equip it with a norm.
  • It is a standard theorem that all norms on a finite-dimensional space are equivalent (i.e. they give rise to the same topology).
  • It is easily checked that if we replace the norm on the Banach spaces $E_1, E_2$ with equivalent norms, then the notion of continuity is unchanged (this is clear, because the topologies are unchanged, and continuity is a purely topological property) and differentiability is unchanged. So, in the finite-dimensional case, one doesn't have to be too explicit about which norm is being used on the vector spaces in the definition of differentiability.
  • Every linear transformation $B: E_1 \to E_2$ between finite-dimensional Banach spaces is continuous (so, in the definition of differentiability, one doesn't have to explicitly verify this).
  • Every continuous linear transformation $B: E_1 \to E_2$ (with $E_1, E_2$ not necessarily finite-dimensional) is differentiable everywhere, and for every $a \in E_1$, we have $DB_a = B$; see the sketch right after this list.
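
Here's a quick sketch of the last bullet: for a linear map (transposition, say), the remainder in the definition vanishes identically, so no limit computation is needed.

```python
import numpy as np

# For a linear map B (here matrix transposition on M_n(R)), linearity gives
# B(A + h) - B(A) - B(h) = 0 identically, so DB_a = B at every a.
rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
h = 1e-3 * rng.standard_normal((n, n))

B = lambda X: X.T  # transposition is linear
remainder = B(A + h) - B(A) - B(h)
print(np.linalg.norm(remainder))  # 0 up to floating-point rounding
```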

Your question seemed more focused on the general theory, which is why I addressed that first. For your actual question, the inversion map $i : GL_n(\Bbb{R}) \to M_n(\Bbb{R})$ is indeed defined on an open subset of a normed vector space (again, the space is finite-dimensional, so it doesn't matter which norm we use). If $A \in GL_n(\Bbb{R})$, then the derivative $Di_A$ will be a linear transformation $M_n(\Bbb{R}) \to M_n(\Bbb{R})$. If you really want to think in terms of matrices, you can of course introduce a basis $\beta$ for $M_n(\Bbb{R})$ and, since $Di_A$ is a linear transformation, consider its matrix representation $[Di_A]_{\beta}$. Note that this will be an $n^2 \times n^2$ matrix with real entries.
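
In case you do want to see that $n^2 \times n^2$ matrix, here's a sketch. With the column-stacking convention $\operatorname{vec}$, the standard identity $\operatorname{vec}(XYZ) = (Z^T \otimes X)\operatorname{vec}(Y)$ turns the linear map $h \mapsto -A^{-1}hA^{-1}$ (derived below) into the matrix $-(A^{-1})^T \otimes A^{-1}$; the code compares this against a finite-difference Jacobian.

```python
import numpy as np

# Sketch: in the column-stacking (vec) basis, vec(X Y Z) = (Z^T kron X) vec(Y)
# turns Di_A(h) = -A^{-1} h A^{-1} into the n^2 x n^2 matrix
# -(A^{-1})^T kron A^{-1}.  We compare with a finite-difference Jacobian.
rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n)) + n * np.eye(n)  # safely invertible here
Ainv = np.linalg.inv(A)

vec = lambda X: X.flatten(order="F")  # column-stacking vec
J_exact = -np.kron(Ainv.T, Ainv)      # claimed Jacobian of vec(A^{-1})

eps = 1e-6
J_fd = np.empty((n * n, n * n))
for j in range(n * n):
    E = np.zeros((n, n))
    E[j % n, j // n] = 1.0            # j-th basis matrix in vec ordering
    J_fd[:, j] = vec(np.linalg.inv(A + eps * E) - Ainv) / eps

print(np.max(np.abs(J_fd - J_exact)))  # small (finite-difference error only)
```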

However, I think introducing a basis is completely unnecessary, and in fact confusing. Here's an outline of the derivative calculation. I leave it to you to fill in the assumptions needed to make the reasoning work, and to carefully justify each equality below:

Fix $A \in GL_n(\Bbb{R})$, and take $h \in M_n(\Bbb{R})$ sufficiently small in norm so that $A+h \in GL_n(\Bbb{R})$, $I+ A^{-1}h \in GL_n(\Bbb{R})$, and $\lVert A^{-1}h\rVert < 1$ (why is it possible to choose such a small $h$?). Then, \begin{align} i(A+h) &= (A+h)^{-1} \\ &= \left[ A(I + A^{-1}h)\right]^{-1} \\ &= (I+A^{-1}h)^{-1} \cdot A^{-1} \\ &= \left( \sum_{k=0}^{\infty} (-A^{-1}h)^k \right) \cdot A^{-1} \\ &= \left( I - A^{-1}h + \mathcal{O}(\lVert h\rVert^2)\right) \cdot A^{-1} \\ &= A^{-1} - A^{-1}hA^{-1} + \mathcal{O}(\lVert h \rVert^2) \\ &= i(A) - A^{-1}hA^{-1} + \mathcal{O}(\lVert h \rVert^2). \end{align} I claim that from this it follows that $Di_A(h) = -A^{-1}hA^{-1}$ (a triple product of $n \times n$ matrices). Why is this true? What are you supposed to check?
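
If you want to sanity-check the claimed derivative numerically before proving it, here's a sketch: the difference quotient from the definition should shrink linearly in $\lVert h\rVert$, matching the $\mathcal{O}(\lVert h\rVert^2)$ remainder above.

```python
import numpy as np

# Sketch: check that ||i(A+h) - i(A) + A^{-1} h A^{-1}|| / ||h|| -> 0,
# and in fact shrinks linearly in ||h||, matching the O(||h||^2) remainder.
rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n)) + n * np.eye(n)  # safely invertible here
Ainv = np.linalg.inv(A)
H = rng.standard_normal((n, n))  # a fixed direction

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * H
    remainder = np.linalg.inv(A + h) - Ainv + Ainv @ h @ Ainv
    print(t, np.linalg.norm(remainder) / np.linalg.norm(h))
# Each row's ratio drops by roughly a factor of 10, i.e. linearly in t.
```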

Here's an answer I wrote a while back which talks about a related question.

peek-a-boo
  • Great answer! Now that you've mentioned that all norms are equivalent in finite dimensions: most of the time I use the induced norm on spaces of matrices, which is submultiplicative, and I use that fact a lot to show the differentiability of some functions. So if I want to use another norm (say, the max norm, which is not submultiplicative), would I have to find other proofs for those results, or are there results that require certain norms to be proved? – AnalyticHarmony May 07 '20 at 05:18
  • @AnalyticHarmony Any theorem regarding continuity/differentiability doesn't depend on the particular norm chosen. Depending on the norm you choose, your intermediate steps may look different, with different inequality estimates, but the final conclusion shouldn't change. Basically, choose the norm which simplifies your life the most. For matrices/linear operators, the operator norm is often a nice choice precisely because of the property you mention. – peek-a-boo May 07 '20 at 05:39