I have just learned about the chain rule, but my book doesn't give a proof. I tried to write one myself but couldn't manage it. Can someone please explain the proof of the chain rule in elementary terms? I have only just started learning calculus.
4 Answers
Assuming everything behaves nicely ($f$ and $g$ are differentiable, and $g(x) \neq g(a)$ whenever $x$ is close to, but different from, $a$), the derivative of $f(g(x))$ at the point $x = a$ is given by
$$ \begin{align} & \lim_{x \to a}\frac{f(g(x)) - f(g(a))}{x-a} \\ &= \lim_{x\to a}\frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \cdot \frac{g(x) - g(a)}{x-a} \end{align} $$
where, as $x \to a$, we have $g(x) \to g(a)$ (differentiability implies continuity), so the first factor on the second line tends to $f'(g(a))$ and the second factor tends to $g'(a)$, by the definition of the derivative. Hence the limit is $f'(g(a)) \cdot g'(a)$.
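To see the factoring in action, here is a worked example (an illustration added here, not part of the original answer): take $f(u) = u^2$ and $g(x) = \sin x$, at a point $a$ with $\cos a \neq 0$, so that $\sin x \neq \sin a$ for $x$ near $a$. Then
$$ \lim_{x \to a} \frac{\sin^2 x - \sin^2 a}{x - a} = \lim_{x \to a} \frac{\sin^2 x - \sin^2 a}{\sin x - \sin a} \cdot \frac{\sin x - \sin a}{x - a} = 2\sin a \cdot \cos a, $$
which is exactly $f'(g(a)) \cdot g'(a)$.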
- There's a more complicated derivation that avoids having to treat separately the case where $g(x)=g(a)$ for $x$ in a neighborhood of $a$. Some authors such as the one quoted here point out an "incorrect argument" that looks like the proof above, but I think the objection applies only if you forget to specify that $g(x)\neq g(a)$. – David K Mar 08 '15 at 13:26
- You still need to deal with the case where $g(x) = g(a)$ as $x \to a$, and that is the part which requires some effort; otherwise it's just plain algebra of limits. Most authors deal with this case in overcomplicated ways. One just needs to remark that in this case $g'(a) = 0$ and use it to prove that $(f\circ g)'(a) = 0$. – Paramanand Singh Jun 13 '17 at 10:09
- @Arthur Is it correct to prove the rule using two cases: one where the derivative of $g$ is zero at $x$ (and as such the "total" derivative is zero), and one where it is not, so that the reciprocal $1/g'(x)$ exists (the case you presented)? It seems to work, but I wonder, because I haven't seen a proof done that way. – Dole Feb 08 '19 at 20:27
- @ParamanandSingh Just for completeness, the $g'(a) = 0$ case is written out in detail at https://math.stackexchange.com/a/2492797 – Joshua P. Swanson Oct 17 '23 at 22:27
First, let me give a careful statement of the theorem of the chain rule:
If $g$ is differentiable at $a$, and $f$ is differentiable at $g(a)$, then $f \circ g$ is differentiable at $a$, and $$ (f \circ g)'(a) = f'(g(a)) \cdot g'(a). $$
It is implicit in the hypotheses of the chain rule that $g$ and $f$ are real functions, $g$ is defined on an open interval containing $a$, and $f$ is defined on an open interval containing $g(a)$. For ease of notation, we shall give the proof in the case that $f$ and $g$ are functions $\mathbb R\to\mathbb R$, but nothing much changes in the general case.
Proof. We first define an auxiliary function $\phi:\mathbb R\to\mathbb R$ as follows: $$ \phi(t)=\begin{cases} \dfrac{f(t)-f(g(a))}{t-g(a)}&\text{if $t\neq g(a),$} \\[5pt] f'(g(a))&\text{if $t=g(a)$.} \end{cases} $$ By construction, $\phi$ is continuous at $g(a)$. Moreover, $g$ is continuous at $a$ because it is differentiable at $a$. Hence, $\phi \circ g$ is continuous at $a$. We use this below on line $\eqref{*}$.
Note that for all $x\neq a$, $$ \frac{f(g(x))-f(g(a))}{x-a}=\phi(g(x)) \cdot \frac{g(x)-g(a)}{x-a}. $$ (This is true even if $g(x)=g(a)$, as in that case both sides of the equation are equal to $0$.) Hence, \begin{align} (f \circ g)'(a)&=\lim_{x \to a}\frac{f(g(x))-f(g(a))}{x-a} \\[5pt] &= \lim_{x \to a}\phi(g(x)) \cdot \lim_{x \to a}\frac{g(x)-g(a)}{x-a} \\[5pt] &= \phi(g(a)) \cdot g'(a) \tag{*}\label{*} \\[5pt] &= f'(g(a)) \cdot g'(a), \end{align} as claimed.
For an explanation of how this proof was motivated, see chapter 10 of Michael Spivak's Calculus. (My definition of the function $\phi$ is different to Spivak's, though the difference is not major.) The basic idea is that $\phi$ provides an elegant way of dealing with the awkward case where $g(x)=g(a)$ for $x$ close to $a$. If, say, $g$ is a constant function, then we cannot write $$ \frac{f(g(x))-f(g(a))}{x-a}=\frac{f(g(x))-f(g(a))}{g(x)-g(a)}\cdot\frac{g(x)-g(a)}{x-a}, $$ and use the product law for limits, since the RHS is simply undefined.
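To see concretely how $\phi$ rescues the awkward case (a worked illustration added here, not from Spivak): suppose $g(x) = c$ for all $x$. Then $\phi(g(x)) = \phi(c) = \phi(g(a)) = f'(c)$, so for every $x \neq a$,
$$ \phi(g(x)) \cdot \frac{g(x) - g(a)}{x - a} = f'(c) \cdot \frac{c - c}{x - a} = 0 = \frac{f(g(x)) - f(g(a))}{x - a}, $$
and the computation in the proof goes through, giving $(f \circ g)'(a) = f'(c) \cdot 0 = 0$, as it must for a constant composite.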
- How do you claim that the auxiliary function will be continuous at $g(a)$? – L lawliet Oct 05 '23 at 19:39
- @Llawliet: Note that $\phi(g(a))=f'(g(a))=\lim_{t\to g(a)}\frac{f(t)-f(g(a))}{t-g(a)}=\lim_{t\to g(a)}{\phi(t)}$, with the last equality holding because $\frac{f(t)-f(g(a))}{t-g(a)}=\phi(t)$ for all $t\neq g(a)$. Does that help? – Joe Oct 05 '23 at 20:40
- Hi Joe. With regard to your comment about showing $\phi$ is continuous, if I'm not mistaken, what you have actually shown is $\phi(g(a))=f'(g(a))=\lim_{t\to g(a);\, t \in \mathbb{R} \setminus \{g(a)\}}\frac{f(t)-f(g(a))}{t-g(a)}=\lim_{t\to g(a);\, t \in \mathbb{R} \setminus \{g(a)\}}{\phi(t)}$. But continuity would require showing it for all $t \in \mathbb{R}$. – Paul Ash Jan 01 '25 at 21:48
- @PaulAsh: I'm going to assume that you are using the definition of a limit in Terence Tao's book. Suppose $x_0$ is a real number and $y:\mathbb R\to\mathbb R$ is a real function. If $\lim_{t\to x_0;\, t\in\mathbb R\setminus\{x_0\}} y(t)=y(x_0)$, then $\lim_{t\to x_0;\, t\in\mathbb R}y(t)=y(x_0)$. The reason is that, by definition, the first assertion means that for every $\varepsilon>0$, there is a $\delta>0$ such that, for all $t\in\mathbb R$, if $0<|t-x_0|<\delta$ then $|y(t)-y(x_0)|<\varepsilon$. – Joe Jan 01 '25 at 22:12
- But then it immediately follows that for every $\varepsilon>0$, there is a $\delta>0$ such that, for all $t\in\mathbb R$, if $|t-x_0|<\delta$ then $|y(t)-y(x_0)|<\varepsilon$. Does that answer your question? – Joe Jan 01 '25 at 22:12
- I think I get it -- because when $t = x_0$, then we have $|t - x_0| = 0 < \delta \implies |y(t) - y(x_0)| = 0 < \varepsilon$. – Paul Ash Jan 01 '25 at 22:23
- @PaulAsh: Yes, that's right. I will note at this point that when $E\subseteq\mathbb R$ and $f:E\to\mathbb R$ is a function, with $x_0\in\mathbb R$ a limit point of $E$, then when people write $\lim_{x\to x_0}f(x)$, what they usually mean is what, in Tao's notation, is written $\lim_{x\to x_0;\, x\in E\setminus\{x_0\}}f(x)$. See for instance Rudin's Principles of Mathematical Analysis (which is probably not a better book than Tao's, but I think uses more standard notation and language in this regard). – Joe Jan 01 '25 at 22:29
One approach is to use the fact that "differentiability" is equivalent to "approximate linearity", in the sense that if $f$ is defined in some neighborhood of $a$, then $$ f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}\quad\text{exists} $$ if and only if $$ f(a + h) = f(a) + f'(a) h + o(h)\quad\text{at $a$ (i.e., "for small $h$").} \tag{1} $$ (As usual, "$o(h)$" denotes a function satisfying $o(h)/h \to 0$ as $h \to 0$.)
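For instance (an added example, in the notation of this answer): with $f(x) = x^2$,
$$ f(a + h) = a^2 + 2ah + h^2 = f(a) + f'(a)\,h + h^2, $$
and the remainder $h^2$ is indeed $o(h)$, since $h^2 / h = h \to 0$ as $h \to 0$.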
If $f$ is differentiable at $a$ and $g$ is differentiable at $b = f(a)$, and if we write $b + k = y = f(x) = f(a + h)$, then $$ k = y - b = f(a + h) - f(a) = f'(a) h + o(h), $$ so $o(k) = o(h)$, i.e., any quantity negligible compared to $k$ is negligible compared to $h$. Now we simply compose the linear approximations of $g$ and $f$: \begin{align*} f(a + h) &= f(a) + f'(a) h + o(h), \\ g(b + k) &= g(b) + g'(b) k + o(k), \\ (g \circ f)(a + h) &= (g \circ f)(a) + g'\bigl(f(a)\bigr)\bigl[f'(a) h + o(h)\bigr] + o(k) \\ &= (g \circ f)(a) + \bigl[g'\bigl(f(a)\bigr) f'(a)\bigr] h + o(h). \end{align*} Since the right-hand side has the form of a linear approximation, (1) implies that $(g \circ f)'(a)$ exists, and is equal to the coefficient of $h$, i.e., $$ (g \circ f)'(a) = g'\bigl(f(a)\bigr) f'(a). $$ One nice feature of this argument is that it generalizes with almost no modifications to vector-valued functions of several variables.
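One step worth spelling out, echoing the comments below (this elaboration is mine, not the answerer's): why is $o(k) = o(h)$, even when $k = 0$ for some values of $h$? Since $k = f'(a) h + o(h)$, we have $|k| \le (|f'(a)| + 1)|h|$ for $|h|$ small. Now let $r(k)$ be any quantity with $r(k)/k \to 0$ as $k \to 0$ and $r(0) = 0$. When $k \neq 0$,
$$ \left| \frac{r(k)}{h} \right| = \left| \frac{r(k)}{k} \right| \cdot \left| \frac{k}{h} \right| \le \bigl( |f'(a)| + 1 \bigr) \left| \frac{r(k)}{k} \right| \to 0 \quad \text{as } h \to 0, $$
and when $k = 0$ we have $r(k)/h = 0$ outright; either way $r(k) = o(h)$.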
- The way $h, k$ are related, we have to deal with cases when $k=0$ as $h\to 0$ and verify in this case that $o(k) = o(h)$. This is not difficult but is crucial to the overall proof. – Paramanand Singh Jun 13 '17 at 10:15
- What happens in the third linear approximation that allows one to go from line 1 to line 2? I don't understand where the $o(k)$ goes. – Michael Andrew Bentley Oct 10 '17 at 14:47
- @meiji163 We want to prove that $\lim_{h \to 0} o(k)/h = 0$. We can write $o(k)/h$ as $f'(a)(o(h)/h) + o(o(h))/h$, which goes to 0 as $h \to 0$. – Iguana Mar 08 '25 at 07:22
As suggested by @Marty Cohen in [1], I went to [2] to find a proof. Under fair use, I include Hardy's proof here (more or less verbatim).
We write $\phi(x) = F(f(x))$ for the composite, and set $f(x) = y$, $f(x+h) = y + k$, so that $k \rightarrow 0$ when $h \rightarrow 0$ and $$ \frac{k}{h} \rightarrow f'(x). \tag{$*$} $$ We must now distinguish two cases.
I. Suppose that $f'(x) \neq 0$, and that $h$ is small, but not zero. Then $k \neq 0$ by $(*)$, and \begin{align*} \dfrac{\phi(x+h) - \phi(x)}{h} &= \dfrac{F(y+k) - F(y)}{k} \cdot \dfrac{k}{h} \rightarrow F'(y)\,f'(x). \end{align*}
II. Suppose that $f'(x) = 0$, and that $h$ is small, but not zero. There are now two possibilities:
II.A. If $k=0$, then \begin{align*} \dfrac{\phi(x+h) - \phi(x)}{h}&= \frac{F\left\{f(x+h)\right\}-F\left\{f(x )\right\}}{h} \\ &= \frac{F\left\{y\right\}-F\left\{y\right\}}{h} \\ &= \dfrac{0}{h} \\ &= 0 = F'(y)\,f'(x) \end{align*}
II.B. If $k\neq 0$, then \begin{align*} \dfrac{\phi(x+h) - \phi(x)}{h}&= \frac{F\left\{f(x+h)\right\}-F\left\{f(x )\right\}}{k}\,\dfrac{k}{h}. \end{align*} The first factor is nearly $F'(y)$, and the second is small because $k/h\rightarrow 0$. Hence $\dfrac{\phi(x+h) - \phi(x)}{h}$ is small in any case, and \begin{align*} \dfrac{\phi(x+h) - \phi(x)}{h}&\rightarrow 0 = F'(y)\,f'(x) \end{align*}
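To make Hardy's "nearly" and "small" quantitative (my gloss, not Hardy's words): given $\varepsilon > 0$, differentiability of $F$ at $y$ gives $\left| \frac{F(y+k) - F(y)}{k} \right| \le |F'(y)| + 1$ for $|k|$ small and nonzero, while $(*)$ with $f'(x) = 0$ gives $|k/h| < \varepsilon / (|F'(y)| + 1)$ for $|h|$ small. Hence in case II.B,
$$ \left| \frac{\phi(x+h) - \phi(x)}{h} \right| \le \bigl( |F'(y)| + 1 \bigr) \cdot \frac{\varepsilon}{|F'(y)| + 1} = \varepsilon, $$
and together with case II.A (where the quotient is exactly $0$), this shows that $\frac{\phi(x+h) - \phi(x)}{h} \to 0$.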
Bibliography
[2] G. H. Hardy, *A Course of Pure Mathematics*, 10th edition, Cambridge University Press, 1960, p. 217.