Why is $D(f\circ g)=Df\circ Dg$

Question

I was reading on Wikipedia about total derivatives of functions and they stated the following about the chain rule for total derivatives:

Let $f:\mathbb R^m\to \mathbb R^k$ and $g:\mathbb R^n \to \mathbb R^m$ be two differentiable functions and let $a \in \mathbb R^n$. Let $D_{g(a)}f$ denote the total derivative of $f$ at $g(a)$ and $D_a g$ denote the total derivative of $g$ at $a$. Then: $$D_a(f\circ g)=D_{g(a)}f\circ D_a g$$ or, for short: $$D(f\circ g)=Df\circ Dg$$

The thing I'm not understanding is the following: What does $Df\circ Dg$ mean?

Those two total derivatives are defined as functions: $Df: \mathbb R^m\to \mathcal L(\mathbb R^m,\mathbb R^k)$ and $Dg: \mathbb R^n\to \mathcal L(\mathbb R^n,\mathbb R^m)$.

So how is the composition $D(f\circ g)=Df\circ Dg$ defined? Am I missing something or is this a typo?

The composition of linear maps is well defined if their domains and codomains agree. The evaluation $Df_a,Dg_a$ is just a linear map, so they can be composed — FShrike, Jan 25 '22 at 18:48
The notation is slightly misleading. What they mean is that $\mathrm Df$ and $\mathrm Dg$ evaluated at a specific point are linear maps, and those should be composed. Not the maps mapping points to linear maps. — Vercassivelaunos, Jan 25 '22 at 18:49
Where does Wikipedia say that? I was unable to find it here. — José Carlos Santos, Jan 25 '22 at 18:50
The notation $D(f \circ g) = D f \circ D g$ can be made precise, not just a shorthand. See e.g. https://math.stackexchange.com/questions/2857459 or search for "derivative as a functor". — sdcvvc, Jan 25 '22 at 18:55

Arthur · Accepted Answer · 2022-02-07T20:56:24.250

I think you missed something here. Emphasis mine:

$$D_a(f\circ g)=D_{g(a)}f\circ D_a g$$ or, for short: $$D(f\circ g)=Df\circ Dg$$

If we just take the total derivative of a function, yes, you're right. But we're not doing that here. We are talking about the total derivatives of functions at specific points. And while it is true that $Df:\Bbb R^m\to \mathcal L(\Bbb R^m, \Bbb R^k)$, if we insert some $b\in\Bbb R^m$, we get $D_bf\in\mathcal L(\Bbb R^m, \Bbb R^k)$. And similarly for $D_ag$.
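To make the pointwise reading concrete, here is a minimal numerical sketch. The functions `f` and `g`, the point `a`, and the helpers `jacobian` and `matmul` are made up for illustration and are not part of the Wikipedia statement. It approximates the matrices of $D_a g$, $D_{g(a)}f$ and $D_a(f\circ g)$ by central differences and compares the matrix product with the derivative of the composition.

```python
import math


def g(x):
    # Hypothetical example map g: R^2 -> R^3
    u, v = x
    return [u * v, math.sin(u), v ** 2]


def f(y):
    # Hypothetical example map f: R^3 -> R^2
    p, q, r = y
    return [p + q * r, math.exp(p) * r]


def jacobian(func, point, h=1e-6):
    """Central-difference approximation of the matrix of D_point(func)."""
    out_dim = len(func(point))
    in_dim = len(point)
    columns = []
    for j in range(in_dim):
        plus = list(point)
        minus = list(point)
        plus[j] += h
        minus[j] -= h
        fp, fm = func(plus), func(minus)
        # Column j holds the partial derivatives with respect to x_j
        columns.append([(fp[i] - fm[i]) / (2 * h) for i in range(out_dim)])
    # Transpose so that rows index outputs and columns index inputs
    return [[columns[j][i] for j in range(in_dim)] for i in range(out_dim)]


def matmul(A, B):
    # Matrix product, i.e. composition of the corresponding linear maps
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


a = [0.7, -1.3]

# Left-hand side: matrix of D_a(f o g), a 2x2 matrix
lhs = jacobian(lambda x: f(g(x)), a)

# Right-hand side: D_{g(a)} f composed with D_a g, i.e. (2x3) times (3x2)
rhs = matmul(jacobian(f, g(a)), jacobian(g, a))

max_err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print("largest entrywise difference:", max_err)
```

Note that `jacobian(f, ...)` is evaluated at `g(a)`, not at `a`: that is exactly the subscript the short form $Df\circ Dg$ leaves out.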

And then they make the subscripted points implicit so that they don't have to type as much and we don't have to read as much. Which yes, is an abuse of notation, and it is ambiguous, as your question shows. But they do warn you by saying "for short". Presumably they won't use $D$ to mean the full "everywhere" total derivative any more, and will only use it for the total derivative at some implicit point, whether arbitrary or given.
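As a sketch of the bookkeeping, using only the notation from the quoted statement, the three linear maps involved have the types $$D_a g\in\mathcal L(\Bbb R^n,\Bbb R^m),\qquad D_{g(a)}f\in\mathcal L(\Bbb R^m,\Bbb R^k),\qquad D_{g(a)}f\circ D_a g\in\mathcal L(\Bbb R^n,\Bbb R^k),$$ so the composition on the right makes sense for each fixed $a$. Read as a statement about the "everywhere" derivative, the short form then says $$D(f\circ g):\Bbb R^n\to\mathcal L(\Bbb R^n,\Bbb R^k),\qquad a\mapsto D_{g(a)}f\circ D_a g.$$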
