For functions $f:\mathbb{R}^n\to\mathbb{R}^m$, the Chain Rule can be stated either in terms of the total derivative or in terms of partial derivatives. $\def\bU {\textbf{U}} \def\bV {\textbf{V}} \def\bW {\textbf{W}} \def\bu {\textbf{u}} \def\bv {\textbf{v}} \def\bw {\textbf{w}} \def\ba {\textbf{a}} \def\bb {\textbf{b}} \def\e{\varepsilon} \def\bzero{\textbf{0}}$ For some time now I have thought of the latter statement as a consequence of the former: if $f$ is totally differentiable at $\ba$, then all its partials exist at $\ba$ and take the form ... (you know the deal).
Recently I've been meaning to turn this around, as Gateaux differentiation is (in some sense, as it can be defined on arbitrary topological vector spaces) more general than Fréchet differentiation. I have seen the proof of the Chain Rule for Fréchet derivatives, and I would like to find a proof of the Chain Rule for Gateaux derivatives. Wikipedia does mention (part of) a statement, but neither a proof nor a citation is given; this post on MO cites two results: the first is Theorem 3.18 of The Convenient Setting of Global Analysis, which deals with $c^\infty$-open sets and thus escapes my understanding. The second is Theorem 1.2.7 of Differential Calculus in Topological Linear Spaces, which I have not been able to gain access to.
Having failed to find a proof, I decided to attempt one myself. Below I give some definitions and write (part of) what I think the statement of the Chain Rule for Gateaux differentiation is. Note that some necessary assumptions are probably missing!
I was hoping someone could help me finish the statement and proof (:
Throughout let $\bU$, $\bV$, and $\bW$ be topological vector spaces (assumed Hausdorff, so as to have unique limits), and let $U$, $V$, and $W$ be open subsets of $\bU$, $\bV$, and $\bW$ respectively.
Definition: Given a function $f:V\to\bW$, a point $\ba\in V$ and a vector $\bv\in\bV$, we say $f$ is differentiable at $\ba$ in the direction $\bv$ iff there exists $\bv_\ba\in \bW$ with either of the following (equivalent) properties:
- $\bv_\ba$ is given by the limit $$\lim_{t\rightarrow 0}\frac{f(\ba+t\bv)-f(\ba)}{t}.$$
- $\bv_\ba$ gives the best first-order approximation of $f(\ba+t\bv)$ near $t=0$, that is to say $$f(\ba+t\bv) = f(\ba) + t\bv_\ba + \e(t)$$ for an `error' function $\e$, defined for $t$ in a neighbourhood of $0$ in $\mathbb{R}$ and taking values in $\bW$, with the property $$\lim_{t\to 0} \frac{\e(t)}{t} = \bzero.$$ At times I will write $\e(t,\bv)$ to be more explicit. From now on I write $\partial_\bv f(\ba)$ for $\bv_\ba$.
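As a quick check that the two characterizations agree (a short sketch using only the notation above), solve the second property for the error term: $$\e(t) := f(\ba+t\bv) - f(\ba) - t\bv_\ba, \qquad\text{so that}\qquad \frac{\e(t)}{t} = \frac{f(\ba+t\bv)-f(\ba)}{t} - \bv_\ba \quad (t\neq 0).$$ Hence $\e(t)/t\to\bzero$ as $t\to 0$ exactly when the difference quotient converges to $\bv_\ba$, and the Hausdorff assumption makes that limit unique.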
Definition: we say $f$ is Gateaux differentiable at $\ba$ iff $\partial_\bv f(\ba)$ exists for every non-zero $\bv\in \bV$.
Definition: in the above setting, we say $f$ has continuous partials at $\ba$ iff the function $\bV\to\bW:\bv\mapsto \partial_\bv f(\ba)$ is continuous.
Result (?): Let $U$ and $V$ be open subsets of $\bU$ and $\bV$ respectively, and let $f:U\to\bV$ and $g:V\to \bW$ be Gateaux differentiable at $\ba\in U$ and $\bb=f(\ba)\in V$ respectively. If the following conditions are satisfied:
- $g$ has continuous partials at $\bb$.
- ?
then $g\circ f : U\to \bW$ is Gateaux differentiable at $\ba$ with $$\partial_\bu(g\circ f)(\ba) = \partial_\bv g(\bb) \quad \text{ where } \bv:= \partial_\bu f(\ba) \text{ and } \bb := f(\ba).$$
Proof attempt: choose an arbitrary $\bu\in\bU\setminus\{\bzero\}$, and write $f(\ba+t\bu) = f(\ba) + t\partial_\bu f(\ba) + \e_f(t)$. If we let $\bv(t) := \partial_\bu f(\ba) + t^{-1}\e_f(t)$, then for $t\neq 0$ close to zero we can write \begin{alignat*}{2} (g\circ f)(\ba + t\bu) & = g\big(f(\ba) + t\bv(t)\big)\\ & = g(\bb) + t\partial_{\bv(t)} g(\bb) + \e_g(t,\bv(t))\\ & = g(\bb) + t\partial_{\bv} g(\bb) + \e(t) \end{alignat*} where $\bv := \partial_\bu f(\ba)$ and $\e(t) := t\partial_{\bv(t)} g(\bb) - t\partial_\bv g(\bb) + \e_g(t,\bv(t))$. Since $\bv(t)\to\bv$ as $t\to 0$ and $g$ has continuous partials at $\bb$, we get $$\lim_{t\to 0}\frac{t\partial_{\bv(t)} g(\bb) - t\partial_\bv g(\bb)}{t} = \lim_{t\to 0}\Big[\partial_{\bv(t)} g(\bb) - \partial_\bv g(\bb)\Big] = \bzero.$$ It remains to be shown that $$\lim_{t\to 0}\frac{\e_g(t, \bv(t))}{t} = \lim_{t\to 0}\frac{g(\bb + t\bv(t)) - g(\bb) - t\partial_{\bv(t)}g(\bb)}{t} = \bzero,$$ which looks almost trivial, but which I have not been able to prove. The difficulty is that the direction $\bv(t)$ changes with $t$, whereas the definition of $\e_g$ only controls the error along a fixed direction.
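A concrete example may help explain why this last limit resists proof. What follows is a sketch of a standard finite-dimensional example; the specific $f$ and $g$ are chosen purely for illustration and are not part of the argument above. Take $\bU=\bW=\mathbb{R}$, $\bV=\mathbb{R}^2$, $\ba=0$, $\bu=1$, and $$f(t) := (t,t^2), \qquad g(x,y) := \begin{cases} 1 & \text{if } y=x^2 \text{ and } x\neq 0,\\ 0 & \text{otherwise.}\end{cases}$$ Then $\bb=f(0)=(0,0)$ and $\partial_\bu f(0) = \lim_{t\to 0}(1,t) = (1,0)$, so that $\bv(t) = (1,t)$. A line through the origin meets the parabola $y=x^2$ in at most one point other than the origin, so for each fixed $\bv$ we have $g(t\bv)=0$ for all sufficiently small $t$. Hence $\partial_\bv g(\bb)=0$ for every $\bv$, and in particular $\bv\mapsto\partial_\bv g(\bb)$ is continuous. However, $$(g\circ f)(t) = \begin{cases} 1 & \text{if } t\neq 0,\\ 0 & \text{if } t=0,\end{cases} \qquad\text{so}\qquad \frac{\e_g(t,\bv(t))}{t} = \frac{g(t,t^2) - g(0,0) - t\,\partial_{\bv(t)}g(\bb)}{t} = \frac{1}{t} \quad (t\neq 0),$$ which has no limit as $t\to 0$. So, unless I have slipped somewhere, continuity of the partials at $\bb$ alone cannot give the remaining limit, and the missing condition in the statement has to be a genuinely stronger hypothesis on $g$.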
Note 1: I am using $\partial$ notation for Gateaux derivatives, in particular I am using $\partial_\bu f(\ba)$ for what Wikipedia and the MO post write as $\text{d}f(\ba;\bu)$. Hopefully the post remains readable.
Note 2: the answer in the MO post mentions that, in the statement of the Chain Rule for Gateaux differentiation, the necessary and sufficient condition is for $g$ to be "Hadamard" differentiable at $\bb$. I do not know what that means, but I found this definition in Wikipedia (stated for Banach spaces, but it seems to generalize naturally to TVSs). If this is what is meant by Hadamard, then I struggle to see how that is either sufficient or necessary for the statement in question. I may ask a separate question about this in the future.