4

I would like to prove the chain rule: given $f$ and $g$ polynomial functions, $h = f \circ g$, and $a \in \mathbb{R}$, that $h'(a) = f'(g(a)) \cdot g'(a)$. However, I would like to do so without using the limit definition of the derivative or any sort of differentiation rules.

So far, the only lead I've got is that given $P(x)$ a polynomial function, by the division algorithm, $P(x) = (x-a)^2Q(x) + R(x)$, and $R(x)$ is the equation of the line tangent to $P(x)$ at $x = a$.
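As a quick sanity check of that lead (my addition, not part of the question), here is a small Python sketch: dividing $P(x)=x^3$ twice by $(x-1)$ recovers the tangent line $3x-2$ at $a=1$. The helper name `syndiv` is hypothetical, and coefficients are listed with the highest degree first.

```python
# Hypothetical helper: one pass of synthetic division by (x - a),
# on coefficient lists with the highest-degree coefficient first.
def syndiv(coeffs, a):
    """Return (quotient coefficients, remainder) of coeffs / (x - a)."""
    q = [coeffs[0]]
    for c in coeffs[1:]:
        q.append(c + a * q[-1])
    return q[:-1], q[-1]

# P(x) = x^3, a = 1.  Dividing twice by (x - 1) gives
# P(x) = (x - 1)^2 Q(x) + r1*(x - 1) + r0, so R(x) = r1*(x - a) + r0.
P = [1, 0, 0, 0]            # x^3
a = 1
q1, r0 = syndiv(P, a)       # P = (x - a) q1 + r0, with r0 = P(a)
q2, r1 = syndiv(q1, a)      # q1 = (x - a) q2 + r1
# Tangent line: R(x) = 3(x - 1) + 1 = 3x - 2
print(r1, r0)               # slope 3, value P(1) = 1
```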

Matt Samuel
  • 59,287
tostito
  • 305

5 Answers

5

Recall that the derivative is linear and satisfies the product rule $$(fg)'=f'g+fg'.$$ Combining this with induction, you can show that the derivative of $g(x)^n$ is $$ng(x)^{n-1}g'(x).$$ Thus if $$f(x)=\sum_n a_nx^n,$$ then $$(f(g(x)))'=\sum_n na_ng(x)^{n-1}g'(x)=f'(g(x))g'(x).$$ This is precisely the chain rule.
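A minimal numeric check of this identity (my sketch, not part of the answer), using coefficient lists with the lowest degree first; the helper names `padd`, `pmul`, `pder`, `pcomp` are hypothetical:

```python
# Minimal polynomial arithmetic on coefficient lists (lowest degree first),
# to check the identity (f o g)' = (f' o g) * g' on concrete polynomials.
def padd(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]

def pmul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def pder(p):
    # d/dx of sum p[i] x^i is sum i*p[i] x^(i-1)
    return [i * c for i, c in enumerate(p)][1:] or [0]

def pcomp(f, g):
    # Evaluate f at g by accumulating c_n * g^n
    out, power = [0], [1]
    for c in f:
        out = padd(out, pmul([c], power))
        power = pmul(power, g)
    return out

f = [1, 0, 2]        # f(x) = 1 + 2x^2
g = [0, 3, 1]        # g(x) = 3x + x^2
lhs = pder(pcomp(f, g))
rhs = pmul(pcomp(pder(f), g), pder(g))
print(lhs, rhs)      # both [0, 36, 36, 8]
```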

Matt Samuel
  • 59,287
0

Let $D:k[X]\to k[X]$ be the usual derivative, which is the unique $k$-linear map such that $D(fg)=D(f)g+fD(g)$ for all $f$, $g\in k[X]$ and $D(X)=1$. We want to show that $$D(f\circ g)=D(f)\circ g \cdot D(g)\qquad\text{for all $f$, $g\in k[X]$.}$$ Since $D$ is $k$-linear, composition with $g$ is $k$-linear in the first argument, and every $f\in k[X]$ is a linear combination of monomials, it is enough to prove this when $f=X^n$, that is (since we know that $D(X^n)=nX^{n-1}$), that $$D( g^n)=n g^{n-1} \cdot D(g)\qquad\text{for all $g\in k[X]$ and all $n\in\mathbb N_0$.}$$
This you can easily prove by induction on $n$. Indeed, it is obvious that it holds if $n=0$, and if it holds for some $n$ we have that \begin{align} D(g^{n+1}) &= D(g^n\cdot g)\\ &= D(g^n)\cdot g + g^n D(g) \\ &= ng^{n-1}D(g)\cdot g+g^n D(g) \\ &= (n+1)g^nD(g). \end{align}
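A concrete spot-check of the induction's conclusion (my addition, with hypothetical helper names `pmul` and `pder`), for $g(x)=1+x^2$ and $n=3$:

```python
# Spot-check of D(g^n) = n g^(n-1) D(g) for g(x) = 1 + x^2, n = 3,
# using coefficient lists with the lowest degree first.
def pmul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def pder(p):
    return [i * c for i, c in enumerate(p)][1:] or [0]

g = [1, 0, 1]                        # g(x) = 1 + x^2
g2 = pmul(g, g)                      # g^2
g3 = pmul(g2, g)                     # g^3 = 1 + 3x^2 + 3x^4 + x^6
lhs = pder(g3)                       # D(g^3)
rhs = pmul([3], pmul(g2, pder(g)))   # 3 g^2 D(g)
print(lhs, rhs)                      # both [0, 6, 0, 12, 0, 6]
```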

0

Using the differential of a function $y=f(u)$, $$dy=f'(u)\,du,$$ and likewise for $u=g(x)$, $$du=g'(x)\,dx,$$ so by substitution $$dy=f'(g(x))g'(x)\,dx.$$

Nosrati
  • 30,522
0

This is not a full solution, but it describes another approach to defining the derivative of a polynomial that does not involve limits and that is related to the division algorithm.

Let $P(x)$ be any polynomial and choose some real number $a$. If we divide $P(x)$ by $x-a$ we get a quotient $Q(x)$ and a constant remainder $R$. In fact, by the Remainder Theorem $R=P(a)$, so we have $P(x) = (x-a)Q(x) + P(a)$. Rearranging, $$Q(x) = \frac{P(x)-P(a)}{x-a}.$$ This equation has a natural interpretation: the quotient polynomial $Q(x)$ gives the slope of the secant line through the graph of $P(x)$ at the points $(a,P(a))$ and $(x,P(x))$. With this interpretation, we can recognize that the slope of the tangent line to $P(x)$ at $x=a$ is just $Q(a)$. So we define $P'(a) = Q(a)$. (Since $Q(x)$ is a polynomial, there is no need to take a limit here.)
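This limit-free definition is easy to compute with synthetic division. A small sketch (my addition; `syndiv` and `peval` are hypothetical helper names, coefficients highest degree first), for $P(x)=x^3$ and $a=2$, where $Q(x)=x^2+2x+4$ and $Q(2)=12=3\cdot 2^2$:

```python
# Illustration of the limit-free definition P'(a) = Q(a), where
# P(x) = (x - a) Q(x) + P(a).  Coefficients are highest degree first,
# so one pass of synthetic division produces Q and the remainder P(a).
def syndiv(coeffs, a):
    q = [coeffs[0]]
    for c in coeffs[1:]:
        q.append(c + a * q[-1])
    return q[:-1], q[-1]       # quotient Q, remainder P(a)

def peval(coeffs, x):
    # Horner evaluation, highest degree first
    v = 0
    for c in coeffs:
        v = v * x + c
    return v

P = [1, 0, 0, 0]               # P(x) = x^3
a = 2
Q, rem = syndiv(P, a)          # Q(x) = x^2 + 2x + 4, rem = P(2) = 8
print(peval(Q, a))             # P'(2) = Q(2) = 12, matching 3*2^2
```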

With this as background, let's set out to answer the question in the OP.

To compute the derivative of $f(g(x))$ at $x=a$ we divide $f(g(x))$ by $x-a$, obtaining a quotient $q_1(x)$, with $$f(g(x))=(x-a)q_1(x) + f(g(a))$$ and then the derivative is $q_1(a)$.

To compute $f'(g(a))$ we divide $f(x)$ by $x-g(a)$, obtaining a quotient $q_2(x)$, with $$f(x) = (x-g(a))q_2(x) + f(g(a))$$ and then $f'(g(a)) = q_2(g(a))$.

To compute $g'(a)$ we divide $g(x)$ by $x-a$, obtaining a quotient $q_3(x)$, with $$g(x) = (x-a)q_3(x) + g(a)$$ and then $g'(a) = q_3(a)$.

The chain rule is then expressed by the identity $$q_1(a) = q_2(g(a))\cdot q_3(a).$$ This is what we need to prove. Can you take it from here?
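The three divisions above can be checked numerically. This sketch (my addition, hypothetical helper names, coefficients highest degree first) verifies the identity for $f(x)=x^2+1$, $g(x)=x^2+x$, $a=2$:

```python
# Numeric check of q1(a) = q2(g(a)) * q3(a), with coefficient lists
# listed highest degree first.
def pmul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a_ in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a_ * b
    return out

def padd(p, q):
    n = max(len(p), len(q))
    p = [0] * (n - len(p)) + p
    q = [0] * (n - len(q)) + q
    return [x + y for x, y in zip(p, q)]

def pcomp(f, g):
    # Horner composition: f(g)
    out = [f[0]]
    for c in f[1:]:
        out = padd(pmul(out, g), [c])
    return out

def syndiv(p, a):
    # Divide p by (x - a): returns (quotient, remainder)
    q = [p[0]]
    for c in p[1:]:
        q.append(c + a * q[-1])
    return q[:-1], q[-1]

def peval(p, x):
    v = 0
    for c in p:
        v = v * x + c
    return v

f = [1, 0, 1]          # f(x) = x^2 + 1
g = [1, 1, 0]          # g(x) = x^2 + x
a = 2

q1, _ = syndiv(pcomp(f, g), a)   # f(g(x)) = (x - a) q1(x) + f(g(a))
q2, _ = syndiv(f, peval(g, a))   # f(x) = (x - g(a)) q2(x) + f(g(a))
q3, _ = syndiv(g, a)             # g(x) = (x - a) q3(x) + g(a)
print(peval(q1, a), peval(q2, peval(g, a)) * peval(q3, a))  # 60 60
```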

mweiss
  • 24,547
0

Slightly more generally than what Matt Samuel did, one can even show the multivariate chain rule. Let $A$ be any commutative unital ring and fix a positive integer $n$. For $1\leq r\leq n$, let $\partial_r:A[x_1,\dots,x_n]\to A[x_1,\dots,x_n]$ be the $r$-th partial derivative and $d/dt:A[t]\to A[t]$ the usual derivative. As Samuel noted, for $g\in A[t]$ and $m\geq 1$ one has $\frac{d}{dt}g^m=mg^{m-1}g'$, by induction on $m$ plus the Leibniz rule.

For $f\in A[x_1,\dots,x_n]$ and $g_1,\dots,g_n\in A[t]$, we claim $$ \frac{d}{dt}f(g_1,\dots,g_n)=\sum_{r=1}^n\frac{dg_r}{dt}\partial_rf(g_1,\dots,g_n). $$ Indeed, write $$ f=\sum_{i_1,\dots,i_n=0}^{+\infty}a_{i_1\cdots i_n}x_1^{i_1}\cdots x_n^{i_n}, $$ where almost all $a_{i_1\cdots i_n}\in A$ are zero. Then \begin{align*} \frac{d}{dt}f(g_1,\dots,g_n) &=\sum_{i_1,\dots,i_n=0}^{+\infty}a_{i_1\cdots i_n} \frac{d}{dt}(g_1^{i_1}\cdots g_n^{i_n})\\ &=\sum_{i_1,\dots,i_n=0}^{+\infty}a_{i_1\cdots i_n} \sum_{r=1}^ng_r'i_rg_1^{i_1}\cdots g_{r-1}^{i_{r-1}}g_r^{i_r-1}g_{r+1}^{i_{r+1}}\cdots g_n^{i_n}\\ &=\sum_{r=1}^n\sum_{i_1,\dots,i_n=0}^{+\infty} g_r'a_{i_1\cdots i_n}i_rg_1^{i_1}\cdots g_{r-1}^{i_{r-1}}g_r^{i_r-1}g_{r+1}^{i_{r+1}}\cdots g_n^{i_n}\\ &=\sum_{r=1}^ng_r'\sum_{i_1,\dots,i_n=0}^{+\infty} a_{i_1\cdots i_n}i_rg_1^{i_1}\cdots g_{r-1}^{i_{r-1}}g_r^{i_r-1}g_{r+1}^{i_{r+1}}\cdots g_n^{i_n}\\ &=\sum_{r=1}^ng_r'\partial_rf(g_1,\dots,g_n). \end{align*}
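As a concrete sanity check (my addition, not part of the answer), take $n=2$, $f=x_1x_2$, $g_1=t^2$, $g_2=t+1$:

```latex
% Worked check of the multivariate formula for f = x_1 x_2, g_1 = t^2, g_2 = t+1.
f(g_1, g_2) = t^2(t + 1) = t^3 + t^2,
\qquad
\frac{d}{dt} f(g_1, g_2) = 3t^2 + 2t.
% Right-hand side of the claimed identity:
g_1'\,\partial_1 f(g_1, g_2) + g_2'\,\partial_2 f(g_1, g_2)
  = 2t \cdot (t + 1) + 1 \cdot t^2 = 3t^2 + 2t.
```

Both sides agree, as the general computation above guarantees.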