I'm trying to prove the chain rule. Could you please verify if my proof looks fine or contains logical gaps/errors? Thank you so much for your help!

Let $X,Y,G$ be normed vector spaces. Suppose $f: X \rightarrow Y$ is differentiable at $x_{0}$ and $g: Y \rightarrow G$ is differentiable at $y_{0}:=f\left(x_{0}\right)$. Then $g \circ f: X \rightarrow G$ is differentiable at $x_{0}$, and the derivative is given by $$\partial(g \circ f)\left(x_{0}\right) = \partial g\left(f\left(x_{0}\right)\right) \circ \partial f\left(x_{0}\right)$$


My attempt:

We have $f(x)=f\left(x_{0}\right) + \partial f\left(x_{0}\right)\left(x-x_{0}\right)+r(x)\left\|x-x_{0}\right\|$ for all $x \in X$ and $g(y) = g\left(y_{0}\right)+\partial g\left(y_{0}\right)\left(y-y_{0}\right)+s(y)\left\|y-y_{0}\right\|$ for $y \in Y$. Here $r: X \rightarrow Y$ and $s: Y \rightarrow G$ are continuous at $x_{0}$ and $y_{0}$ respectively. Moreover, $r\left(x_{0}\right)=0$ and $s\left(y_{0}\right)=0$.

Our goal is to find a function $t:X \to G$ such that $$(g \circ f)(x) = (g \circ f) \left(x_{0}\right) + \partial g\left(f\left(x_{0}\right)\right) \circ \partial f\left(x_{0}\right) \left(x-x_{0}\right)+t(x)\left\|x-x_{0}\right\|$$ for all $x \in X$, with $t$ continuous at $x_{0}$ and $t(x_0)=0$. We substitute $y=f(x)$ and get

$$\begin{aligned} (g \circ f)(x) &= g\left(y_{0}\right)+\partial g\left(y_{0}\right)\left(f\left(x_{0}\right)+\partial f\left(x_{0}\right)\left(x-x_{0}\right)+r(x)\left\|x-x_{0}\right\|-y_{0}\right)\\ & \quad + s(y)\left\|y-y_{0}\right\|\\ &= g\left(y_{0}\right)+\partial g\left(y_{0}\right)\left(\partial f\left(x_{0}\right)\left(x-x_{0}\right)+r(x)\left\|x-x_{0}\right\|\right)\\ & \quad+ s(y)\left\|y-y_{0}\right\|\\ &= g\left(f(x_0)\right)+\partial g\left(f(x_0)\right) \circ \partial f\left(x_{0}\right)\left(x-x_{0}\right) \\ &\quad+\partial g\left(f(x_0)\right) \circ r(x)\left\|x-x_{0} \right\|+ s(y)\left\|y-y_{0}\right\| \end{aligned}$$

Equating $$g\left(f(x_0)\right)+\partial g\left(f(x_0)\right) \circ \partial f\left(x_{0}\right)\left(x-x_{0}\right) + \partial g\left(f(x_0)\right) \circ r(x)\left\|x-x_{0} \right\|+ s(y)\left\|y-y_{0}\right\|$$ and $$(g \circ f) \left(x_{0}\right) + \partial g\left(f\left(x_{0}\right)\right) \circ \partial f\left(x_{0}\right) \left(x-x_{0}\right)+t(x)\left\|x-x_{0}\right\|$$ we get

$$t(x) \|x-x_0\| = \partial g\left(f(x_0)\right) \circ r(x)\left\|x-x_{0} \right\|+ s(y)\left\|y-y_{0}\right\|$$ and consequently $$\begin{aligned} t(x) &= \partial g\left(f(x_0)\right) \circ r(x) + s(y) \frac{\left\|y-y_{0}\right\|}{\left\|x-x_{0}\right\|}\\ &= \partial g\left(f(x_0)\right) \circ r(x) + s(f(x)) \left\| \frac{\partial f\left(x_{0}\right)\left(x-x_{0}\right)+r(x)\left\|x-x_{0}\right\|}{\|x-x_0\|} \right\| \\&= \partial g\left(f(x_0)\right) \circ r(x) + s(f(x)) \left\| \partial f\left(x_{0}\right) \frac{x-x_0}{\|x-x_0\|} +r(x) \right\|\end{aligned}$$ for all $x \neq x_0$. We further define $t(x_0)=0$. It is easy to check that $t$ satisfies our requirement. Hence $\partial(g \circ f)\left(x_{0}\right) = \partial g\left(f\left(x_{0}\right)\right) \circ \partial f\left(x_{0}\right)$.

Akira

1 Answer

Looks OK to me. I'll try to rewrite it a little more neatly. There are a lot of symbols, so one thing you can do is make some reductions: by considering $\tilde f(x)= f(x+x_0)$ instead of $f(x)$ we can assume that $x_0=0$. Then by considering $\tilde g(y) = g(y+f(0))$ instead of $g$, and $\hat{f}(x) = \tilde f(x)-\tilde f(0)$ instead of $\tilde f$, we can assume that $f(0) = 0$; see [*] below. Finally, adding a constant to $g$ doesn't change its derivative, so we can assume $g(0)=0$.

So now we can assume that \begin{align} f(h)&=\partial f(0)h + r(h)\|h\|_X, & h\xrightarrow{X} 0 \\ g(v) &= \partial g(0)v + s(v)\|v\|_Y, & v \xrightarrow{Y} 0 \end{align} where $r$ and $s$ are continuous at $0$ with $r(0)=0$ and $s(0)=0$.

The full proof follows. The expansion of $f$ implies that $f(h)\xrightarrow{Y}0$ when $h\xrightarrow{X} 0$, so for $h\neq 0$ \begin{align} (g\circ f)(h) &= g(f(h)) \\ &= \partial g(0) f(h) + s(f(h))\|f(h)\|_Y \\ &=\partial g(0)\big[\partial f(0)h + r(h)\|h\|_X\big ] + s(f(h))\|f(h)\|_Y \\ &= \partial g(0)\partial f(0)h +\big[ \partial g(0) r(h) + s(f(h))\frac{\|f(h)\|_Y}{\|h\|_X} \big]\|h\|_X \\ &= \partial g(0)\partial f(0)h +\left[ \partial g(0) r(h) + s(f(h))\left \|\partial f(0)\frac{h}{\|h\|_X} + r(h) \right\|_{Y} \right]\|h\|_X \\ &= \partial g(0)\partial f(0)h + t(h)\|h\|_X\end{align} where $t:X\to G$ is defined by $$ t(h) = \begin{cases}0 & h=0 \\ \partial g(0) r(h) + s(f(h))\left \|\partial f(0)\frac{h}{\|h\|_X} + r(h) \right\|_{Y} & h\neq 0\end{cases} $$

We only need $t$ to be continuous at $h=0$. As $h\to 0$, $$ \| \partial g(0) r(h) \|_G \le \| \partial g(0) \|_{Y\to G} \|r(h) \|_Y \to 0,$$ and, since $f(h)\to 0$ and $s$ is continuous at $0$ with $s(0)=0$, $$\left\|s(f(h)) \left\|\partial f(0)\frac{h}{\|h\|_X} + r(h) \right\|_{Y}\right \|_{G}\le \|s(f(h))\|_G\left (\|\partial f(0)\|_{X\to Y} + \|r(h) \|_{Y}\right) \to 0,$$ so $t(h)\xrightarrow{G} 0=t(0)$, and hence $t$ is continuous at $0$, which concludes the proof.
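As a numerical sanity check (not part of the proof), here is a minimal Python sketch of the remainder $t(h)$ shrinking as $h \to 0$. The maps `f` and `g` below are hypothetical examples chosen only for illustration, already normalized so that $f(0)=0$ and $g(0)=0$. Their derivatives at $0$ were computed by hand, and the check runs along a single direction only.

```python
import math

# Hypothetical example maps (assumptions, not from the proof), normalized as in
# the reduction above: f(0) = 0 and g(0) = 0.
def f(h):
    h1, h2 = h
    return (h1 + 2.0 * h2 + h1 * h2, math.sin(h2) + h1 * h1)

def g(v):
    v1, v2 = v
    return math.expm1(v1) + 3.0 * v2 + v1 * v2

# Derivatives at 0, computed by hand for these particular maps:
# df(0) = [[1, 2], [0, 1]] and dg(0) = [1, 3], so dg(0) df(0) = [1, 5].
def chain_rule_linear_part(h):
    h1, h2 = h
    return 1.0 * h1 + 5.0 * h2

def remainder(h):
    """Return |t(h)| = |g(f(h)) - dg(0) df(0) h| / ||h|| for h != 0."""
    norm_h = math.hypot(h[0], h[1])
    return abs(g(f(h)) - chain_rule_linear_part(h)) / norm_h

direction = (0.6, 0.8)  # a unit vector; other directions could be tried as well
scales = [10.0 ** (-k) for k in range(1, 6)]
remainders = [remainder((s * direction[0], s * direction[1])) for s in scales]

for s, t in zip(scales, remainders):
    print(f"||h|| = {s:.0e}   |t(h)| = {t:.3e}")
```

If the chain rule holds with the linear part $\partial g(0)\partial f(0)$, the printed values of $|t(h)|$ should decrease towards $0$ together with $\|h\|$.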

(Also compare with the Caratheodory definition of a derivative in 1D)


[*] Indeed, suppose we knew the result at $x=0$ for all functions $g,f$ with $f(0)=0$. Then for general functions $f,g$ we would have $$ \partial (g\circ f)(x_0) = \partial [g\circ (f(\bullet+x_0))](0) = \partial (g\circ \tilde f)(0) $$ and $$g(\tilde f(x)) = g(\tilde f(x)-\tilde f(0) + \tilde f(0)) = g(\hat f(x)+\tilde f(0)) = \tilde g \circ \hat f(x),$$ so $$ \partial (g\circ \tilde f)(0) = \partial (\tilde g\circ \hat f)(0) = \partial \tilde g(0)\circ \partial \hat f(0)= \partial g(\tilde f(0)) \partial \tilde f(0)=\partial g(f(x_0))\partial f(x_0).$$
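The reduction in [*] can also be checked numerically. The following is a small sketch in one dimension, with `f`, `g` and `x0` chosen arbitrarily for illustration (they are assumptions, not taken from the argument). It compares a central-difference derivative of $g\circ f$ at $x_0$ with that of $\tilde g\circ\hat f$ at $0$, and with the hand-computed product $g'(f(x_0))f'(x_0)$.

```python
import math

# Hypothetical 1D example (assumption): any smooth f, g and base point x0 would do.
def f(x):
    return math.sin(x) + x * x

def g(y):
    return math.exp(y)

x0 = 0.7

# The translated maps from the reduction step.
def f_tilde(x):
    return f(x + x0)                # moves the base point to 0

def f_hat(x):
    return f_tilde(x) - f_tilde(0.0)  # now f_hat(0) = 0

def g_tilde(y):
    return g(y + f_tilde(0.0))      # compensates for the shift in f

def central_diff(func, x, eps=1e-6):
    """Symmetric finite-difference approximation of func'(x)."""
    return (func(x + eps) - func(x - eps)) / (2.0 * eps)

original = central_diff(lambda x: g(f(x)), x0)
reduced = central_diff(lambda x: g_tilde(f_hat(x)), 0.0)
# Hand-computed g'(f(x0)) * f'(x0) for these particular f and g.
by_chain_rule = math.exp(f(x0)) * (math.cos(x0) + 2.0 * x0)

print(original, reduced, by_chain_rule)
```

All three printed numbers should agree up to finite-difference error, which reflects the identity $\tilde g\circ\hat f = g\circ\tilde f$ used above.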

Calvin Khor