Clarifying the chain rule terminology in differential geometry calculations

Question

Let $M$ be a manifold and $f:M\to\mathbb{R}$ a smooth function on it. Let $p\in M$ have the coordinates $\{x^i\}$ under the chart $(U,\phi)$. Finally, let $\gamma:I\to M$ be a curve ($I$ is an open interval in $\mathbb{R}$). Let $u$ be the generic argument of the $\gamma$ map, i.e. $u\in I$.

I'm trying to understand the chain rule: $$\frac{\partial f}{\partial u}=\frac{\partial f}{\partial x^i}\frac{\partial x^i}{\partial u}$$

Now I'm aware that the change in the function value as we move along the curve is $\frac{\partial f}{\partial u}\big|_{p\in M}$, which is actually $\frac{\partial (f\circ\gamma)}{\partial u}\big|_{\gamma^{-1}(p)\in I}$ if we want to reconcile the domains.

Similarly, $\frac{\partial f}{\partial x^i}\big|_{p\in M}$ is actually $\frac{\partial (f\circ\phi^{-1})}{\partial x^i}\big|_{\phi(p)\in \mathbb{R}^n}$.

For the last term, I have two ways of looking at it: either $\frac{\partial x^i}{\partial u}\big|_{x^i(p)\in \mathbb{R}^n}$, or $\frac{\partial x^i}{\partial u}\big|_{p\in M}$ . I'm not sure which is correct, so I'll leave that as is for now. The chain rule equation becomes $$\frac{\partial (f\circ\gamma)}{\partial u}\ \bigg|_{\gamma^{-1}(p)}=\frac{\partial (f\circ\phi^{-1})}{\partial x^i}\ \bigg|_{\phi(p)}\frac{\partial x^i}{\partial u}$$

My thought process was that I could express $f\circ\gamma$ as $(f\circ\phi^{-1})\circ(\phi\circ\gamma)$, but I haven't been able to understand just how the chain rule is working out.

I'd be grateful if someone could help me understand. I'm a beginner, so I would really appreciate a step-by-step answer aimed at a beginner to the subject, without omitting any details.

Apply the chain rule from calculus in $\mathbb{R}^n$ to your guess that $f \circ \gamma (t) = (f \circ \phi^{-1} \circ \phi \circ \gamma)(t) = \hat{f}(\gamma^1(t), \dots, \gamma^n(t) )$. — Kelvin Lois, Jun 23 '20 at 13:20
@Eumenes: Thanks, I appreciate the help, but it's not at all clear to me what you did on the right-hand side. Don't get me wrong: intuitively it's pretty clear. Changing $u$ will lead to a change in each of the coordinates $x^i$, and a change in any $x^i$ in turn corresponds to a change in $f$, so from that perspective it makes sense. But I'd be grateful if you could write out all the steps in a proper, systematic formulation of the chain rule, whenever you get time. You're much better at diff geom, so what seems obvious to you doesn't seem obvious to me at all. — Shirish, Jun 23 '20 at 20:35
Some good books that treat chain rule are Spivak's Calculus on Manifolds, Munkres's Analysis on Manifolds or Zorich's Mathematical Analysis I. — Kelvin Lois, Jun 23 '20 at 22:10

peek-a-boo · Accepted Answer · 2020-06-24T20:09:35.943

If you're having trouble with the notation in differential geometry, my suggestion is to completely avoid Leibniz's notation (temporarily) and write out everything in completely precise notation, being careful with what the function is vs where it is being evaluated.

Let's first study what happens in $\Bbb{R}^n$ before moving on to the manifold case. Consider a differentiable map $F: \Bbb{R}^n \to \Bbb{R}$ and a differentiable curve $\lambda: \Bbb{R} \to \Bbb{R}^n$ (of course you can restrict everything to open subsets, but I don't want to introduce too many new letters for new domains). We can now form the composite function $F \circ \lambda:\Bbb{R} \to \Bbb{R}$ and ask what its derivative is at a point $u \in \Bbb{R}$, i.e., what is $(F \circ \lambda)'(u)$? The answer is, of course, to use the chain rule; see Loomis and Sternberg, page $148$, Theorem $7.2$. The result is that \begin{align} (F \circ \lambda)'(u) &= DF_{\lambda(u)}\left[ \lambda'(u) \right] \end{align} This is the application of the linear transformation $DF_{\lambda(u)}:\Bbb{R}^n \to \Bbb{R}$ to the "velocity vector of the curve" $\lambda'(u) \in \Bbb{R}^n$. Or if you prefer matrices (which I do not), you can think of this as a matrix multiplication of the $1 \times n$ matrix $DF_{\lambda(u)}$ with the $n \times 1$ matrix (or column vector) $\lambda'(u)$. To write this in terms of partial derivatives, just recall what the entries of each matrix are: \begin{align} (F \circ \lambda)'(u) &= \sum_{i=1}^n (\partial_iF)_{\lambda(u)} \cdot (\lambda^i)'(u) \\ &= \sum_{i=1}^n (\partial_iF)_{\lambda(u)} \cdot (\text{pr}^i\circ\lambda)'(u) \tag{$*$} \end{align} where $\lambda(\cdot) = \left( \lambda^1(\cdot), \dots, \lambda^n(\cdot)\right)$, or said differently, $\lambda^i := \text{pr}^i \circ \lambda$ is the $i^{th}$ coordinate function of the curve $\lambda$ (here $\text{pr}^i(a^1, \dots, a^n) := a^i$ is the function which assigns to each $n$-tuple its $i^{th}$ entry). Also, the notation $(\partial_iF)_{\lambda(u)}$ means you first calculate the $i^{th}$ partial derivative $\partial_iF: \Bbb{R}^n \to \Bbb{R}$, and afterwards evaluate it at $\lambda(u) \in \Bbb{R}^n$.
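A quick numerical sanity check of $(*)$ may help. This is only a minimal sketch, not part of the original answer. The functions $F(a,b) = a^2 b + \sin b$ and $\lambda(u) = (\cos u, u^2)$ are arbitrary illustrative choices, and all derivatives are approximated by central differences, so the two sides agree only up to a small numerical error.

```python
import math

# Hypothetical example data, chosen only for illustration:
#   F(a, b) = a**2 * b + sin(b),   lambda(u) = (cos u, u**2)


def F(a, b):
    return a * a * b + math.sin(b)


def lam(u):
    # "lambda" is a Python keyword, so the curve is called lam here
    return (math.cos(u), u * u)


def d(g, t, h=1e-6):
    """Central-difference approximation of g'(t) for g: R -> R."""
    return (g(t + h) - g(t - h)) / (2 * h)


def partial(G, point, i, h=1e-6):
    """Central-difference approximation of (partial_i G) at `point`."""
    plus, minus = list(point), list(point)
    plus[i] += h
    minus[i] -= h
    return (G(*plus) - G(*minus)) / (2 * h)


def lhs(u):
    # (F o lambda)'(u): differentiate the composite directly
    return d(lambda t: F(*lam(t)), u)


def rhs(u):
    # formula (*): sum_i (partial_i F)_{lambda(u)} * (pr^i o lambda)'(u)
    point = lam(u)
    return sum(
        partial(F, point, i) * d(lambda t, i=i: lam(t)[i], u)
        for i in range(len(point))
    )


print(lhs(0.7), rhs(0.7))
```

The two printed numbers should be nearly equal; by hand, $(F\circ\lambda)'(u) = 2u\cos^2 u - 2u^2\sin u\cos u + 2u\cos(u^2)$.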

By using a chart $(U,\phi)$, the manifold case reduces directly to the $\Bbb{R}^n$ case. We wish to compute $(f \circ \gamma)'(u)$. Well, just write this as $(f \circ \phi^{-1}) \circ (\phi \circ \gamma)$, so we're considering $F= f \circ \phi^{-1}$ and $\lambda = \phi \circ \gamma$. Now, using $(*)$, we get \begin{align} (f \circ \gamma)'(u) &= \sum_{i=1}^n \partial_i(f \circ \phi^{-1})_{(\phi \circ \gamma)(u)} \cdot \left( \text{pr}^i \circ \left(\phi \circ \gamma\right)\right)'(u) \end{align}
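The same check can be carried out on a manifold by going through a chart, exactly as in the formula above. This is a hedged sketch with illustrative choices that are not from the answer: $M$ is the plane with the ray $\{(a,0) : a \le 0\}$ removed, $\phi$ is the polar-coordinate chart $(r,\theta)$ with $\theta \in (-\pi,\pi)$, $f(p) = p_1p_2 + p_1$ in Cartesian components, and $\gamma(u) = (2+\cos u, \sin u)$, a circle that stays inside $M$. Points of $M$ are stored as Cartesian pairs purely for convenience.

```python
import math

# Illustrative assumptions (not from the answer):
#   M     = plane minus the ray {(a, 0) : a <= 0}; points stored as (p1, p2)
#   phi   = polar chart on M, phi(p) = (r, theta) with theta in (-pi, pi)
#   f     = a smooth function on M, f(p) = p1 * p2 + p1
#   gamma = the curve u -> (2 + cos u, sin u), which never leaves M


def phi(p):
    return (math.hypot(p[0], p[1]), math.atan2(p[1], p[0]))


def phi_inv(q):
    r, theta = q
    return (r * math.cos(theta), r * math.sin(theta))


def f(p):
    return p[0] * p[1] + p[0]


def gamma(u):
    return (2 + math.cos(u), math.sin(u))


def d(g, t, h=1e-6):
    """Central-difference approximation of g'(t) for g: R -> R."""
    return (g(t + h) - g(t - h)) / (2 * h)


def partial(G, q, i, h=1e-6):
    """Central-difference approximation of (partial_i G)_q for G: R^n -> R."""
    qp, qm = list(q), list(q)
    qp[i] += h
    qm[i] -= h
    return (G(tuple(qp)) - G(tuple(qm))) / (2 * h)


def lhs(u):
    # (f o gamma)'(u), computed directly along the curve
    return d(lambda t: f(gamma(t)), u)


def rhs(u):
    # sum_i partial_i(f o phi^{-1})_{(phi o gamma)(u)} * (pr^i o phi o gamma)'(u)
    def f_hat(q):  # F = f o phi^{-1}, an ordinary function R^2 -> R
        return f(phi_inv(q))

    q = phi(gamma(u))  # lambda(u) = (phi o gamma)(u)
    return sum(
        partial(f_hat, q, i) * d(lambda t, i=i: phi(gamma(t))[i], u)
        for i in range(2)
    )


print(lhs(1.0), rhs(1.0))
```

For this $f$ and $\gamma$ one can also check by hand that $(f\circ\gamma)'(u) = 2\cos u + \cos 2u - \sin u$, so both printed numbers should be close to that value at $u = 1$.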

This is the full answer written out in gory detail. Now, if you want to obtain the more familiar looking formula, you have to make some new definitions for the notation.

The first step is rather than calling the chart map $\phi$, we shall call it $x$; so $x:U \to x[U] \subset \Bbb{R}^n$ is the chart map. Next, we define $x^i := \text{pr}^i \circ x: U \to \Bbb{R}$. With this, the above formula reads \begin{align} (f\circ \gamma)'(u) &= \sum_{i=1}^n\partial_i(f \circ x^{-1})_{x(\gamma(u))} \cdot (x^i \circ \gamma)'(u). \end{align}
The second step is to introduce the following shorthand notation: \begin{align} \dfrac{\partial f}{\partial x^i} := \left[ \partial_i(f \circ x^{-1})\right] \circ x \end{align} Or, evaluating at $\gamma(u) \in M$, \begin{align} \dfrac{\partial f}{\partial x^i}\bigg|_{\gamma(u)} \equiv \dfrac{\partial f}{\partial x^i}(\gamma(u)) := \partial_i(f \circ x^{-1})_{x(\gamma(u))} \end{align} The $\equiv$ means "same thing, different notation" (it's just a matter of where you want to indicate the point of evaluation, so it is an aesthetic rather than a mathematical matter), but the $:=$ means it's a definition. The RHS is an ordinary partial derivative of a function $\Bbb{R}^n \to \Bbb{R}$, so it is something we already know, but the LHS is a new convenient symbol which we define in order to mimic the classical notation as much as possible. With this, we can write the formula from the first step as \begin{align} (f \circ \gamma)'(u) &= \sum_{i=1}^n \dfrac{\partial f}{\partial x^i}\bigg|_{\gamma(u)} \cdot (x^i \circ \gamma)'(u) \end{align} Or if you insist on using Leibniz's notation, you can write this as \begin{align} \dfrac{d(f \circ \gamma)}{du}\bigg|_{u} &= \sum_{i=1}^n \dfrac{\partial f}{\partial x^i}\bigg|_{\gamma(u)} \cdot \dfrac{d(x^i \circ \gamma)}{du}\bigg|_u \end{align}
The final step to making things look very classical is to completely avoid writing the compositions with $\gamma$ (so, don't write $\circ \gamma$ anywhere), and completely suppress where everything is being evaluated. Then, we get the nice familiar looking formula \begin{align} \dfrac{df}{du} &= \sum_{i=1}^n \dfrac{\partial f}{\partial x^i} \cdot \dfrac{dx^i}{du} \end{align} This is the form of the chain rule which you wrote in your very first equation (though for some reason you have $\frac{\partial}{\partial u}$ instead of $\frac{d}{du}$).
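To make the shorthand from the second step concrete, here is a hedged sketch in which $\dfrac{\partial f}{\partial x^i}$ is literally built as the function $\left[\partial_i(f \circ x^{-1})\right] \circ x$ on $M$. The chart (polar coordinates $x = (r,\theta)$ on the plane with the ray $\{(a,0): a \le 0\}$ removed), the function $f(p) = p_1$, and the helper name `coordinate_partial` are illustrative assumptions; the partial derivative is approximated by a central difference.

```python
import math

# Illustrative assumptions: M = plane minus {(a, 0) : a <= 0}, points stored
# as Cartesian pairs, and x = polar chart x(p) = (r, theta).


def x(p):
    return (math.hypot(p[0], p[1]), math.atan2(p[1], p[0]))


def x_inv(q):
    r, theta = q
    return (r * math.cos(theta), r * math.sin(theta))


def coordinate_partial(f, i, h=1e-6):
    """Hypothetical helper returning df/dx^i := [partial_i(f o x^{-1})] o x.

    The result is a function on M: it sends p to the ordinary i-th partial
    derivative of f o x^{-1} (a map R^n -> R), evaluated at x(p).
    """
    def df_dxi(p):
        qp, qm = list(x(p)), list(x(p))
        qp[i] += h
        qm[i] -= h
        return (f(x_inv(tuple(qp))) - f(x_inv(tuple(qm)))) / (2 * h)

    return df_dxi


def f(p):
    # f(p) = first Cartesian component, so (f o x^{-1})(r, theta) = r cos(theta)
    return p[0]


df_dr = coordinate_partial(f, 0)      # should approximate cos(theta) = p1 / |p|
df_dtheta = coordinate_partial(f, 1)  # should approximate -r sin(theta) = -p2
print(df_dr((1.0, 1.0)), df_dtheta((1.0, 1.0)))
```

Because `df_dr` and `df_dtheta` take points of $M$ (not coordinate tuples) as input, evaluating them at $\gamma(u)$ gives exactly the factor $\frac{\partial f}{\partial x^i}\big|_{\gamma(u)}$ appearing in the chain rule.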

Remark.

I highly recommend you watch the series of Lectures on General Relativity by Frederic Schuller, and in particular this one. The first $6$ lectures provide an amazing introduction to the language of smooth manifolds, tangent spaces, tangent bundles, vector fields, and covector fields (one-forms). (Of course, you should watch as many as you can, but for the very basics of differential geometry, you should watch at least the first 6.)

EDIT: (in response to comment)

You have quite a few typos, and you've incorrectly applied the chain rule. Yes, there should be a $\text{pr}^j$ in the "numerator". The chain rule says (with summation over the repeated index $j$) \begin{align} \partial_i(f \circ \psi \circ \phi^{-1}) \bigg|_{\phi(p)} &= \partial_j(f \circ \tilde{\phi}^{-1})\bigg|_{\tilde{\phi}(\psi(p))} \cdot \partial_i(\text{pr}^j \circ \tilde{\phi} \circ \psi \circ \phi^{-1})\bigg|_{\phi(p)} \end{align} As I explained in the first and second steps above, if we instead denote the chart on $M$ by $(U,x)$ and the chart on $M'$ by $(V,y)$, then based on how I defined the notation above, this equality can be written as \begin{align} \frac{\partial(f \circ \psi)}{\partial x^i}\bigg|_{p} = \frac{\partial f}{\partial y^j}\bigg|_{\psi(p)} \cdot \frac{\partial(y^j \circ \psi)}{\partial x^i}\bigg|_{p} \tag{$\sharp$} \end{align} This says exactly the same thing as the previous equality, simply because of how I defined the notation. I think this is a completely good notation (though it takes a little time to get used to, in order to reconcile this new definition involving charts with how we ordinarily think of things). However, what I find absolutely terrible (at least when first learning the subject) is to completely avoid the composition symbol $\circ \psi$ and write \begin{align} \dfrac{\partial f}{\partial x^i} &= \dfrac{\partial f}{\partial y^j}\cdot \dfrac{\partial y^j}{\partial x^i} \end{align} This is an abuse of notation because the $f$ appearing on the two sides of the equation means different things, and the $y^j$ also has two different meanings, while in $(\sharp)$ we're not reusing the same symbol for two different purposes.
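Equation $(\sharp)$ can also be checked numerically. The sketch below is an illustration under assumptions not taken from the answer: $M = \Bbb{R}^2$ with the identity chart $x$, $M'$ the plane minus $\{(a,0) : a \le 0\}$ with the polar chart $y = (r,\theta)$, $\psi(p) = (p_1^2 + 3, p_2)$ (whose image lies in $M'$), and $f(q) = q_1q_2$. Every partial derivative is an ordinary partial derivative of a map $\Bbb{R}^2 \to \Bbb{R}$, approximated by central differences, and $\text{pr}^j$ appears as the indexing `[j]`.

```python
import math

# Illustrative assumptions (not from the answer):
#   M   = R^2 with the identity chart x, so x and x^{-1} do nothing
#   M'  = plane minus {(a, 0) : a <= 0}, with polar chart y(q) = (r, theta)
#   psi = the smooth map p -> (p1**2 + 3, p2), whose image lies in M'
#   f   = the smooth function q -> q1 * q2 on M'


def x(p):
    return p


def x_inv(c):
    return c


def y(q):
    return (math.hypot(q[0], q[1]), math.atan2(q[1], q[0]))


def y_inv(c):
    return (c[0] * math.cos(c[1]), c[0] * math.sin(c[1]))


def psi(p):
    return (p[0] ** 2 + 3, p[1])


def f(q):
    return q[0] * q[1]


def partial(G, c, i, h=1e-6):
    """Central-difference approximation of (partial_i G)_c for G: R^n -> R."""
    cp, cm = list(c), list(c)
    cp[i] += h
    cm[i] -= h
    return (G(tuple(cp)) - G(tuple(cm))) / (2 * h)


def lhs(p, i):
    # d(f o psi)/dx^i at p  :=  partial_i(f o psi o x^{-1}) evaluated at x(p)
    return partial(lambda c: f(psi(x_inv(c))), x(p), i)


def rhs(p, i):
    # sum_j [partial_j(f o y^{-1}) at y(psi(p))] * [partial_i(pr^j o y o psi o x^{-1}) at x(p)]
    total = 0.0
    for j in range(2):
        df_dyj = partial(lambda c: f(y_inv(c)), y(psi(p)), j)
        dyj_dxi = partial(lambda c, j=j: y(psi(x_inv(c)))[j], x(p), i)
        total += df_dyj * dyj_dxi
    return total


p0 = (1.0, 2.0)
print([lhs(p0, i) for i in range(2)], [rhs(p0, i) for i in range(2)])
```

Here $f\circ\psi(p) = (p_1^2+3)p_2$, so by hand the left side is $(2p_1p_2,\ p_1^2+3)$, which equals $(4, 4)$ at $p_0 = (1,2)$.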

Here's another scenario: if $\psi:M\to M'$ is a smooth map, and if $x^i$ denote coordinates of a point $p\in M$ and $y^i$ denote those of $\psi(p)\in M'$, then the book writes that $\partial(f\circ\psi)/\partial x^i=\frac{\partial f}{\partial y^j}\frac{\partial y^j}{\partial x^i}$. If $\phi,\tilde \phi$ are the chart maps for $M,M'$ respectively, then I know I can write $f\circ\psi$ as $f\circ\psi\circ\phi^{-1}$ (since we avoid writing compositions), which in turn is $(f\circ\tilde\phi^{-1})\circ(\tilde\phi\circ\psi\circ\phi^{-1})$. The derivative is then (cont'd)... — Shirish, Jun 24 '20 at 19:29
...$\partial_i((f\circ\tilde\phi^{-1})\circ(\tilde\phi\circ\psi\circ\phi^{-1}))\big|_{x^i}$. This can be written as $\partial_j(f\circ\tilde\phi^{-1})\big|_{\tilde\phi(\psi(\phi^{-1}(x^i)))}\cdot\frac{\partial(\tilde\phi\ \circ\ \psi\ \circ\ \phi^{-1})}{\partial x^i}$. The first term is perfectly fine and denotes $\partial f/\partial y^j$ in Leibniz notation. The second term involves the derivative of an $\mathbb{R}^n\to\mathbb{R}^{n'}$ function. Does the numerator in the second term need to be rephrased as $\text{pr}^j\circ\tilde\phi\circ\phi\circ\phi^{-1}$? — Shirish, Jun 24 '20 at 19:39
Thanks for the edit! I'm not clear about the typos though. You mean that I incorrectly wrote the "evaluated at part" as $x^i$ instead of $\phi(p)$? — Shirish, Jun 24 '20 at 20:14
Or, if you have access to Michael Spivak's Differential Geometry Volume 1, I would highly suggest you carefully read through a few pages starting from page 35 (that's where he talks about partial derivatives, the chain rule, and how to relate the classical notation to the use of charts, etc.) — peek-a-boo, Jun 24 '20 at 20:23
one typo I see is that at the end, you have $\phi \circ \phi^{-1}$ which should have been $\psi \circ \phi^{-1}$ (so... I'm not sure why I said "a few typos", perhaps at that point in time I read/misread something else) — peek-a-boo, Jun 24 '20 at 20:24

score 1 · Answer 2 · answered Jun 23 '20 at 21:56

Let $f : M \to \mathbb{R}$ be a smooth map and $\gamma : I \to M$ a smooth curve with $\gamma(t_0) = p$ for some $t_0\in I$, where $p$ is contained in a smooth chart $(U,\phi,x^i)$. Let's establish the following notation:

$\hat{\gamma}(t) \equiv\phi \circ \gamma (t) = (\gamma^1(t), \dots,\gamma^n(t))$,
$\hat{p} \equiv \phi(p) = \phi \circ \gamma(t_0) = (\gamma^1(t_0),\dots,\gamma^n(t_0))$,
$\hat{f}(x^1,\dots,x^n) \equiv (f \circ \phi^{-1})(x^1,\dots,x^n) $.

As you said, the derivative $(f \circ \gamma)'(t_0)$ can be computed as

\begin{align} (f \circ \gamma)'(t_0) &= (f \circ \phi^{-1} \circ \phi \circ \gamma)'(t_0)= (\hat{f} \circ \hat{\gamma})'(t_0) = \frac{d}{dt}(\hat{f} \circ \hat{\gamma})(t)\Big|_{t=t_0}, \\ &= \frac{d}{dt} \hat{f}(\gamma^1(t),\dots,\gamma^n(t))\Big|_{t=t_0}, \\ &=\sum_{i=1}^n \bigg[\frac{\partial\hat{f}(\gamma^1(t),\dots,\gamma^n(t))}{\partial x^i} \, \frac{d\gamma^i(t)}{dt}\bigg]_{t=t_0}, \, \color{blue}{\text{by chain rule in } \mathbb{R}^n}\\ &= \sum_{i} \frac{\partial \hat{f}}{\partial x^i}(\hat{p}) \, \dot{\gamma}^i(t_0). \end{align} We know that (you can find the proof in most books) $$ \frac{\partial f}{\partial x^i}(p) := \frac{\partial}{\partial x^i}\Big|_p f = \frac{\partial \hat{f}}{\partial x^i}(\hat{p}). $$ So the result can be written as $$ (f \circ \gamma)'(t_0) = \sum_{i} \frac{\partial f}{\partial x^i}(p) \,\dot{\gamma}^i(t_0). $$
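As a hand-checkable illustration of the final formula (the manifold, chart, function and curve below are assumptions made only for this example), take $M = \mathbb{R}^2 \setminus \{(a,0) : a \le 0\}$ with the polar chart $\phi$, so $x^1 = r$ and $x^2 = \theta \in (-\pi,\pi)$, let $f(p) = p_1 p_2$ in Cartesian components, and let $\gamma(t) = (t\cos t, t\sin t)$ for $t \in (0,\pi)$. Then $\hat{\gamma}(t) = (t,t)$, so $\dot{\gamma}^1 = \dot{\gamma}^2 = 1$, $\hat{p} = (t_0,t_0)$, and
\begin{align} \hat{f}(x^1,x^2) &= (x^1)^2\cos x^2 \sin x^2 = \tfrac{1}{2}(x^1)^2 \sin 2x^2, \\ \frac{\partial \hat{f}}{\partial x^1}(\hat{p}) &= x^1 \sin 2x^2\Big|_{(t_0,t_0)} = t_0 \sin 2t_0, \qquad \frac{\partial \hat{f}}{\partial x^2}(\hat{p}) = (x^1)^2 \cos 2x^2\Big|_{(t_0,t_0)} = t_0^2 \cos 2t_0, \\ (f\circ\gamma)'(t_0) &= t_0\sin 2t_0 \cdot 1 + t_0^2\cos 2t_0 \cdot 1 = t_0 \sin 2t_0 + t_0^2 \cos 2t_0. \end{align}
Computing directly instead, $(f\circ\gamma)(t) = t^2\cos t\sin t = \tfrac{1}{2}t^2\sin 2t$, whose derivative at $t_0$ is the same expression $t_0 \sin 2t_0 + t_0^2 \cos 2t_0$.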

score 0 · Answer 3 · answered Jun 23 '20 at 12:57


Let $\omega^i: {\mathbb R}^n\to {\mathbb R}$ be the $i$-th coordinate mapping $\omega^i(x) = x^i$, then $\frac{\partial x^i}{\partial u}$ in the above chain rule formula is the derivative of the map $\omega^i\circ \phi\circ \gamma$ from $I$ to ${\mathbb R}$ at point $u = \gamma^{-1}(p)$.
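A minimal sketch of this recipe, under illustrative assumptions that are not in the answer: $\phi$ is the polar chart on the plane minus $\{(a,0): a \le 0\}$, $\gamma(u) = (2+\cos u, \sin u)$, and the hypothetical helper `omega(i, a)` plays the role of $\omega^i$ with a 0-based index $i$. The derivative of $\omega^i\circ\phi\circ\gamma$ is approximated by a central difference.

```python
import math

# Illustrative assumptions (not from the answer): phi is the polar chart on the
# plane minus {(a, 0) : a <= 0}, and gamma(u) = (2 + cos u, sin u) stays in it.


def phi(p):
    return (math.hypot(p[0], p[1]), math.atan2(p[1], p[0]))


def gamma(u):
    return (2 + math.cos(u), math.sin(u))


def omega(i, a):
    """Coordinate mapping omega^i: R^n -> R, a -> a^i (0-based index i)."""
    return a[i]


def dxi_du(i, u, h=1e-6):
    """Approximate (omega^i o phi o gamma)'(u) by a central difference."""
    def coord(t):
        return omega(i, phi(gamma(t)))

    return (coord(u + h) - coord(u - h)) / (2 * h)


for u in (0.0, math.pi / 2):
    print(u, dxi_du(0, u), dxi_du(1, u))
```

Differentiating by hand, $r(u) = \sqrt{5 + 4\cos u}$ and $\theta(u) = \arctan\frac{\sin u}{2+\cos u}$ give $r'(u) = -\frac{2\sin u}{\sqrt{5+4\cos u}}$ and $\theta'(u) = \frac{1+2\cos u}{5+4\cos u}$, so at $u = 0$ the two printed derivatives should be approximately $0$ and $1/3$.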

Gribouillis

So in this notation, $\omega^i\circ\phi$ would be called the coordinate function, correct? (since it takes $p$ to its coordinate $x^i\in\mathbb{R}$) – Shirish Jun 23 '20 at 13:14
Yes why not, it is the $i$-th coordinate function in this chart. – Gribouillis Jun 23 '20 at 13:15
Thanks a lot! I guess I understand the intent of your answer: we can express $f\circ\gamma$ as $(f\circ\phi^{-1}\circ(\omega^i)^{-1})\circ(\omega^i\circ\phi\circ\gamma)$ and then use chain rule as in calculus. Still, it'd be awesome if you could elaborate on your answer with steps, whenever you find time. The reason being that I'll try to work out the steps myself based on what you've said so far, but I'd still like to verify that my attempt is indeed correct by comparing it to your steps. I'll only look at your answer after I've tried to work it out myself. Thanks again! – Shirish Jun 23 '20 at 13:27
You cannot write $f\circ\gamma$ this way because $\omega^i$ is not at all invertible. I'll think about elaborating on this but it may take until tomorrow because I'm almost on my way to the beach! Start from the proof of the formula in your textbook. – Gribouillis Jun 23 '20 at 13:33
Ah yes of course - stupid of me. And yeah I'd appreciate that, please take your time. – Shirish Jun 23 '20 at 13:43
