First things first, note that what Schutz terms a gradient:
In fact, the gradient is a one-form, and understanding why helps us to
develop an intuitive understanding of one-forms.
is typically termed a differential in most modern mathematical books I have seen. Schutz is by no means the only one to do this, however: Misner, Thorne, and Wheeler do the same in their book Gravitation, Sean Carroll does so in his lecture notes on GR, and you can find something similar even on this page. So at this point I assume this is just accepted physics convention. Most modern mathematical books I have read, however, define the gradient of $f$ at $x$ as the vector $\nabla f(x)$ such that $\langle \nabla f(x), v\rangle = df(x)(v)$, where $df(x)$ is the differential of $f$ at $x$ and $\langle \cdot, \cdot\rangle$ is an inner product. Note that with this definition you would get different gradients for different inner products, and if you don't have an inner product you wouldn't even be able to define the gradient, while the differential can be defined even without an inner product, e.g., in Banach spaces. Moreover, if the function is not differentiable, the gradient would also not be defined, in contrast to the definition on the encyclopediaofmath page (where they are really defining the Jacobian matrix).
Basis and Canonical Dual Basis in Finite-Dimensional Spaces
A basis $A = \begin{bmatrix} \vec{a}_1 & \ldots & \vec{a}_n\end{bmatrix} \in U^{1\times n}$, for a vector space $(U,\mathbb{F},+,\cdot)$, is a set of vectors in $U$ such that there exists a unique coordinate representation $[\vec{u}]^A$ for any vector $\vec{u}\in U$:
\begin{equation*}
\vec{u} = \sum_{j=1}^n \vec{a}_j ([\vec{u}]^A)^j = A[\vec{u}]^A, \quad [\vec{u}]^A\in\mathbb{F}^n.
\end{equation*}
Mathematically the existence part is typically stated as "$A$ spans $U$" ($\textrm{span}(A)=U$), while the uniqueness is in the form of "the vectors in $A$ are linearly independent".
A natural question is how to find the coordinates $[\vec{u}]^A$ of $\vec{u}$ in the basis $A$. That is, we want functions $\alpha^i:U\to\mathbb{F}$ such that $\alpha^i(\vec{u}) = ([\vec{u}]^A)^i$. If we choose the $\alpha^i$ to be linear and such that $\alpha^i(\vec{a}_j) = \delta^i_j$ then this holds, since
\begin{equation*}
\alpha^i(\vec{u}) = \alpha^i\left(\sum_{j=1}^n \vec{a}_j ([\vec{u}]^A)^j \right) \stackrel{linearity}{=}
\sum_{j=1}^n \alpha^i(\vec{a}_j) ([\vec{u}]^A)^j \stackrel{\alpha^i(\vec{a}_j) = \delta^i_j}{=} \sum_{j=1}^n \delta^i_j ([\vec{u}]^A)^j = ([\vec{u}]^A)^i.
\end{equation*}
The above functionals are the canonical dual basis $A^* = \begin{bmatrix} \alpha^1 \\ \vdots \\ \alpha^n\end{bmatrix}\in (U^*)^{n\times 1}$ of $A$ for the dual space $(U^*,\mathbb{F},+,\cdot)$ where $U^*$ is the space of (continuous) linear maps from $U$ to $\mathbb{F}$. They span the space since any $\omega \in U^*$ can be written as $\omega = \sum_{j=1}^n \omega(\vec{a}_j) \alpha^j$ (their linear independence follows directly from $\alpha^i(\vec{a}_j) = \delta^i_j$):
\begin{equation*}
\omega(\vec{u}) = \omega\left(\sum_{j=1}^n \vec{a}_j \alpha^j(\vec{u})\right) \stackrel{linearity}{=}
\sum_{j=1}^n \omega(\vec{a}_j)\alpha^j(\vec{u}) \implies
\omega = \sum_{j=1}^n \omega(\vec{a}_j)\alpha^j = \sum_{j=1}^n ([\omega]_A)_j\alpha^j = [\omega]_AA^*.
\end{equation*}
So you see that $\alpha^i$ give the coordinates $([\vec{u}]^A)^i = \alpha^i(\vec{u})$ of (contravariant) vectors in the basis $A$, while $\vec{a}_j$ give the coordinates $([\omega]_A)_j = \omega(\vec{a}_j)$ of linear functionals (covariant vectors) in the basis $A^*$. To reiterate:
\begin{align*}
\vec{u} &= A[\vec{u}]^A = \sum_{j=1}^n \vec{a}_j ([\vec{u}]^A)^j = \sum_{j=1}^n \vec{a}_j\alpha^j(\vec{u}) = AA^* \vec{u} , \\
\omega &= [\omega]_AA^* = \sum_{j=1}^n ([\omega]_A)_j \alpha^j = \sum_{j=1}^n \omega(\vec{a}_j)\alpha^j = \omega AA^*. \\
\end{align*}
The products $A[\vec{u}]^{A}$, $[\omega]_AA^*$, and $AA^*$ are standard matrix-matrix multiplications:
\begin{equation*}
AA^* = \begin{bmatrix} \vec{a}_1 & \ldots & \vec{a}_n \end{bmatrix}
\begin{bmatrix}
\alpha^1 \\ \vdots \\ \alpha^n
\end{bmatrix} = \sum_{j=1}^n \vec{a}_j \alpha^j.
\end{equation*}
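To make the matrix picture concrete, here is a small numerical sketch (my own illustration, not part of the original argument) using numpy with an invertible $A\in\mathbb{R}^{3\times 3}$, so that the dual basis functionals $\alpha^i$ are simply the rows of $A^{-1}$:

```python
import numpy as np

# A basis for R^3: the vectors a_1, a_2, a_3 are the columns of A.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

# Canonical dual basis: alpha^i is the i-th row of A^{-1},
# since A^{-1} A = I means alpha^i(a_j) = delta^i_j.
A_dual = np.linalg.inv(A)

u = np.array([2.0, -1.0, 3.0])
coords_u = A_dual @ u                       # ([u]^A)^i = alpha^i(u)
assert np.allclose(A @ coords_u, u)         # u = A [u]^A

omega = np.array([[1.0, 2.0, 3.0]])         # a covector, acting by a row-vector product
coords_omega = omega @ A                    # ([omega]_A)_j = omega(a_j)
assert np.allclose(coords_omega @ A_dual, omega)   # omega = [omega]_A A^*
```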
Canonical Dual Basis: Polynomial Space Example
Some examples are in order. Consider the space of real polynomials of degree at most $n-1$:
$$U = \left\{p:\mathbb{R}\to\mathbb{R}\,|\, p(x) = \sum_{j=0}^{n-1} c^j x^j,\,\, c^j\in\mathbb{R}\right\},$$
with the usual function-function addition $(p+q)(x) = p(x)+q(x)$, scalar-function multiplication $(s\cdot p)(x) = s\cdot p(x)$, and the field being $\mathbb{F} = \mathbb{R}$. The basis is explicit in the above definition, namely the monomial basis $\vec{a}_j(x) = x^{j-1}$. The dual basis is somewhat funny: $\alpha^i = ((i-1)!)^{-1}\delta_0 \circ d_x^{i-1}$ (here $\delta_0$ is the evaluation functional at zero and $d_x^{i-1}$ is the $(i-1)$-th derivative operator). Indeed you can check that:
\begin{align*}
\alpha^i(\vec{a}_j) &= ((i-1)!)^{-1}\,\delta_0\bigl( d_x^{i-1}\, x^{j-1}\bigr) \\
&= \delta_0\left(
\begin{cases}
((i-1)!)^{-1}(j-1)(j-2)\cdots (j-i+1)\, x^{j-i}, &\text{for} \,\, j\geq i, \\
0, &\text{for} \,\, j<i
\end{cases}
\right) \\
&=
\begin{cases}
0, &\text{for} \,\, j<i, \\
1, &\text{for} \,\, j=i, \\
0, &\text{for} \,\, j>i
\end{cases}
\\
&= \delta^i_j.
\end{align*}
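To double-check this with a computer algebra system, here is a short sympy sketch (the helper `alpha` and the choice $n=5$ are mine):

```python
import sympy as sp
from math import factorial

x = sp.symbols('x')
n = 5

# Monomial basis a_j(x) = x^(j-1), j = 1, ..., n.
basis = [x**(j - 1) for j in range(1, n + 1)]

# Dual basis alpha^i(p) = (1/(i-1)!) * (d^(i-1) p / dx^(i-1)) evaluated at x = 0.
def alpha(i, p):
    return sp.diff(p, x, i - 1).subs(x, 0) / factorial(i - 1)

# Check alpha^i(a_j) = delta^i_j.
for i in range(1, n + 1):
    for j in range(1, n + 1):
        assert alpha(i, basis[j - 1]) == (1 if i == j else 0)

# The coordinates of a polynomial in the monomial basis are exactly its coefficients.
p = 3 - 2*x + 5*x**3
print([alpha(i, p) for i in range(1, n + 1)])   # [3, -2, 0, 5, 0]
```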
Canonical Dual Basis: Euclidean Subspace Example
Another example consists of an $n$-dimensional subspace $U$ of $\mathbb{R}^{m\times 1}$ (i.e. $n\leq m$). Let $A=\begin{bmatrix} \vec{a}_1 & \ldots & \vec{a}_n\end{bmatrix} \in\mathbb{R}^{m\times n}$ be a basis for $U$. Let $A^{-}\in\mathbb{R}^{n\times m}$ be any left inverse of $A\in\mathbb{R}^{m\times n}$ (i.e. $A^{-}A = I_n$), and $(A^{-})^i \in \mathbb{R}^{1\times m}$ be the $i$-th row of $A^{-}$. Define $\alpha^i(\vec{u}) = (A^{-})^i*\vec{u}$, where $*$ is the standard matrix-matrix multiplication. It is then clear that $\alpha^i(\vec{a}_j) = \delta^i_j$ because $A^{-}A = I_n$. I intentionally emphasized $*$, since that's the main difference between the row-vector $(A^{-})^i$ and the functional $\alpha^i$. One may object that for $n<m$ the above does not provide a unique canonical dual basis since there are infinitely many left inverses $A^{-}$. However, if their domains are restricted to $U$, they all coincide, and thus the canonical dual basis is indeed unique in $U^*$ (it is however not unique in $(\mathbb{R}^m)^*$).
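Here is a small numpy sketch of this example (my own construction; I use the Moore-Penrose pseudoinverse as one particular left inverse and then build a second left inverse to illustrate the non-uniqueness on $\mathbb{R}^m$ versus the uniqueness on $U$):

```python
import numpy as np

m, n = 4, 2

# A basis for a 2-dimensional subspace U of R^4 (the columns of A).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# One particular left inverse: the Moore-Penrose pseudoinverse, with A^- A = I_n,
# so the i-th row of A^- acts as the dual basis functional alpha^i.
A_left = np.linalg.pinv(A)
assert np.allclose(A_left @ A, np.eye(n))

# The alpha^i recover the coordinates of any u in U.
c = np.array([2.0, -3.0])
u = A @ c
assert np.allclose(A_left @ u, c)

# A different left inverse (still A^- A = I_n) agrees with the first one on U,
# even though it is a different matrix on all of R^4.
C = np.ones((n, m))
A_left2 = A_left + C @ (np.eye(m) - A @ A_left)
assert np.allclose(A_left2 @ A, np.eye(n))
assert np.allclose(A_left2 @ u, c)
assert not np.allclose(A_left2, A_left)
```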
Canonical Dual Basis: Partial Derivatives
Consider the space of directional derivative operators evaluated at $p\in\mathbb{R}^n$ that act on continuously differentiable functions with domain $\mathbb{R}^n$ and co-domain $\mathbb{R}$, i.e., $C^1(\mathbb{R}^n,\mathbb{R})$:
\begin{equation*}
U_p = \left\{\delta_p \circ \partial_{u} : C^1(\mathbb{R}^n,\mathbb{R})\to\mathbb{R}\,\biggr|\, (\delta_p \circ \partial_{u})(f) := \lim_{t\to 0} \frac{f(p+tu)-f(p)}{t}, \,\, u\in \mathbb{R}^n\right\}.
\end{equation*}
Since the domain consists of functions in $C^1$, we can express any directional derivative in terms of the partial derivatives:
\begin{equation*}
(\delta_p \circ \partial_{u})(f) = (\delta_p \circ \partial_{\sum_{i=1}^n u^ie_i})(f) = \sum_{i=1}^n u^i (\delta_p \circ \partial_{e_i})(f).
\end{equation*}
In other words $A = \begin{bmatrix} \delta_p\circ \partial_{e_1} & \ldots & \delta_p \circ \partial_{e_n}\end{bmatrix} \in U_p^{1\times n}$ is a basis for $U_p$.
Since $U_p$ is a subspace of $C^{1}(\mathbb{R}^n,\mathbb{R})^*$ the dual space $U_p^*$ is a subspace of the double dual $C^{1}(\mathbb{R}^n,\mathbb{R})^{**}\cong C^1(\mathbb{R}^n,\mathbb{R})$. We then have the natural isomorphism $f\in C^1(\mathbb{R}^n,\mathbb{R})\mapsto \delta_f\in C^1(\mathbb{R}^n,\mathbb{R})^{**}$, thus
$$\delta_f(\delta_p\circ \partial_{u}) = (\delta_p\circ \partial_{u})(f) = \partial_u f(p).$$
It is clear that $\delta_f$ is linear on $U_p$ since it's just an evaluation functional. The addition and scalar multiplication for $\delta_{a\cdot f + b\cdot g}$ are also defined as one would expect:
\begin{equation*}
\delta_{a\cdot f + b\cdot g}(\delta_p\circ \partial_{u}) = a\cdot (\delta_p\circ \partial_{u})(f) + b\cdot (\delta_p\circ \partial_{u})(g).
\end{equation*}
Now consider the coordinate functions $\epsilon^i(v) = v^i$ (these form the canonical dual basis for the standard basis $e_j$, i.e., $\epsilon^i(e_j) = e^i_j = \delta^i_j$). Their evaluation functionals $\delta_{\epsilon^i}$ give you the canonical dual basis:
\begin{equation*}
A^* = \begin{bmatrix}
\delta_{\epsilon^1}
\\
\vdots
\\
\delta_{\epsilon^n}
\end{bmatrix} \in (U_p^*)^{n\times 1}, \quad
\delta_{\epsilon^i}(\delta_p \circ \partial_{e_j}) =
\lim_{t\to 0}\frac{\epsilon^i(p+te_j)-\epsilon^i(p)}{t} = \epsilon^i(e_j) = \delta^i_j.
\end{equation*}
Now define the notation $dx^i := \delta_{\epsilon^i}$, $\frac{\partial}{\partial x^j} := \delta_p \circ \partial_{e_j}$, $\frac{\partial x^i}{\partial x^j} := \frac{\partial \epsilon^i}{\partial x^j} = (\delta_p \circ \partial_{e_j})(\epsilon^i) = \delta^i_j$.
This should now look familiar.
At this point it should probably be mentioned that it's not the derivatives that are the basis of the covectors (as the title of your question would imply). It is these $\delta_{\epsilon^i}$ evaluation functionals that are the canonical dual basis to the partial derivatives, and it is they that provide a basis for the covectors in $U_p^*$. First remember that the coordinates of an element $(\delta_p \circ \partial_{u})\in U_p$ are given through the canonical dual basis:
\begin{equation*}
\delta_{\epsilon^i}(\delta_p \circ \partial_{u}) = \delta_{\epsilon^i}\left(\delta_p \circ \sum_{j=1}^n u^j \partial_{e_j}\right) =
\delta_{\epsilon^i}\left(\sum_{j=1}^n u^j(\delta_p \circ \partial_{e_j})\right) =
\sum_{j=1}^n u^j \delta^i_j = u^i.
\end{equation*}
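A short sympy sketch of these identities (the helper `directional_derivative` and the particular $p$ and $u$ are my own choices): the limit defining $\delta_p\circ\partial_u$ is computed symbolically and applied to the coordinate functions $\epsilon^i$.

```python
import sympy as sp

n = 3
X = sp.symbols('x1:4')            # the coordinates on R^3
p = (1, 2, -1)                    # the point p
u = (2, 0, 5)                     # a direction u

def directional_derivative(f, direction):
    """(delta_p o partial_direction)(f) = d/dt f(p + t*direction) at t = 0."""
    t = sp.symbols('t')
    shifted = f.subs({X[k]: p[k] + t*direction[k] for k in range(n)})
    return sp.diff(shifted, t).subs(t, 0)

# The coordinate functions epsilon^i(x) = x^i, whose evaluation functionals are dx^i.
eps = list(X)

# dx^i(partial/partial x^j) = delta^i_j:
e = [tuple(1 if k == j else 0 for k in range(n)) for j in range(n)]
for i in range(n):
    for j in range(n):
        assert directional_derivative(eps[i], e[j]) == (1 if i == j else 0)

# dx^i(delta_p o partial_u) = u^i: the dual basis extracts the components of u.
assert [directional_derivative(eps[i], u) for i in range(n)] == list(u)
```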
Now suppose $\delta_f \in U_p^*$ is a $1$-form. Then you can apply it to an element $\delta_p \circ \partial_{u} \in U_p$:
\begin{align*}
\delta_f(\delta_p\circ \partial_u) = \delta_f\left(\delta_p \circ \sum_{j=1}^n u^j \partial_{e_j}\right) = \sum_{j=1}^nu^j \delta_f(\delta_p \circ \partial_{e_j}) = \sum_{j=1}^n \delta_f(\delta_p \circ \partial_{e_j})\delta_{\epsilon^j}(\delta_p\circ \partial_u).
\end{align*}
Now removing the argument leads to:
\begin{equation*}
\delta_f = \sum_{j=1}^n \delta_f(\delta_p \circ \partial_{e_j})\delta_{\epsilon^j} = \sum_{j=1}^n ([\delta_f]_A)_j \delta_{\epsilon^j} = [\delta_f]_AA^* = \delta_f AA^*.
\end{equation*}
The partial derivatives $\partial_{e_j}f(p)$ are the coordinates of $\delta_f$ w.r.t. the basis $A^*$ made up of the $\delta_{\epsilon^j}$.
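To see the claim in action, here is a sympy sketch (with an $f$, $p$, and $u$ of my choosing): applying $\delta_f$ to $\delta_p\circ\partial_u$ directly gives the same number as expanding $\delta_f = \sum_j \partial_{e_j}f(p)\,\delta_{\epsilon^j}$ and using $\delta_{\epsilon^j}(\delta_p\circ\partial_u) = u^j$.

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t')
f = x**2 * sp.sin(y) + sp.exp(z)        # an example f in C^1(R^3, R)
p = {x: 1, y: sp.pi/2, z: 0}            # the point p
u = (3, -1, 2)                          # a direction u

# delta_f applied to delta_p o partial_u, i.e. the directional derivative of f at p:
shifted = f.subs({x: p[x] + t*u[0], y: p[y] + t*u[1], z: p[z] + t*u[2]})
direct = sp.diff(shifted, t).subs(t, 0)

# The coordinates of delta_f in the basis of the delta_{eps^j}: the partials of f at p.
coords = [sp.diff(f, v).subs(p) for v in (x, y, z)]

# delta_{eps^j}(delta_p o partial_u) = u^j, so the expansion pairs coordinates with u:
expanded = sum(c * uj for c, uj in zip(coords, u))

assert sp.simplify(direct - expanded) == 0
```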
Partial Derivative Fields and Their Covector Fields
As another note, you can define the space of "vector fields" of partial derivative operators as follows:
\begin{equation*}
U = \left\{\partial_{u} \in C^1\bigl(\mathbb{R}^n,C^1(\mathbb{R}^n, \mathbb{R})^*\bigr)\,\biggr|\, (\partial_{u})(p) := \partial_{u(p)}, \,\, u\in C^1(\mathbb{R}^n,\mathbb{R}^n)\right\}.
\end{equation*}
Note that $u$ is now a function of $p$. You could also make your basis for $\mathbb{R}^n$ a function of $p$ if it's not constant in space, i.e., $B = \begin{bmatrix} b_1 & \ldots & b_n\end{bmatrix} \in (C^1(\mathbb{R}^n, \mathbb{R}^n))^{1\times n}$. Similarly for its canonical dual basis
$$B^* = \begin{bmatrix} \beta^1 \\ \vdots \\ \beta^n\end{bmatrix} \in (C^1(\mathbb{R}^n,(\mathbb{R}^n)^*))^{n\times 1}.$$
Then you get a location-dependent basis for $U$:
$$A=\begin{bmatrix} \partial_{b_1} & \ldots & \partial_{b_n}\end{bmatrix} \in U^{1\times n}.$$
Similarly for the canonical dual:
$$A^* = \begin{bmatrix} \delta_{\beta^1} \\ \vdots \\ \delta_{\beta^n} \end{bmatrix} \in (U^*)^{n\times 1}.$$
Then you have for any element $\partial_u \in U$:
$$\partial_u\in U \implies (\partial_{u})(p) = \partial_{u(p)} =
\partial_{\sum_{j=1}^n \beta^j(p)(u(p))\, b_j(p)} = \sum_{j=1}^n \delta_{\beta^j(p)}(\partial_{u(p)})\, \partial_{b_j(p)} = \sum_{j=1}^n \partial_{b_j(p)}
([\partial_{u(p)}]^{A(p)})^j = A(p) [\partial_{u(p)}]^{A(p)} = A(p) A^*(p) \partial_{u(p)} = (AA^*\partial_u)(p).$$
Fields of covectors, i.e., elements of $U^*$, are what are termed differential $1$-forms (I have no idea why they termed it "differential forms" instead of fields of linear functionals):
$$\omega\in U^* \implies \omega(p) = \sum_{i=1}^n \omega(p)(\partial_{b_i(p)})\delta_{\beta^i(p)} = \sum_{i=1}^n ([\omega(p)]_{A(p)})_i \delta_{\beta^i(p)} = [\omega(p)]_{A(p)}A^*(p) = \omega(p) A(p)A^*(p) = (\omega AA^*)(p).$$
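As a closing numerical sketch (again my own construction, using numpy, with a basis field, vector field, and $f$ of my choosing): take a position-dependent basis $B(p)$ on $\mathbb{R}^2$, a vector field $u$, and the $1$-form field $\omega$ whose coordinates at $p$ are the directional derivatives $\partial_{b_i(p)}f(p)$; pairing $\omega(p)$ with $\partial_{u(p)}$ directly agrees with going through the coordinates.

```python
import numpy as np

def B(p):
    """A position-dependent basis for R^2 (columns), invertible away from the origin."""
    x, y = p
    return np.array([[x, -y],
                     [y,  x]])

def B_dual(p):
    """The canonical dual basis at p: the rows of B(p)^{-1}."""
    return np.linalg.inv(B(p))

def u(p):
    """A vector field on R^2."""
    x, y = p
    return np.array([y, x**2])

def grad_f(p):
    """Gradient of f(x, y) = x*y**2 in the standard basis, so that
    the directional derivative is partial_v f(p) = grad_f(p) . v."""
    x, y = p
    return np.array([y**2, 2*x*y])

p = np.array([1.0, 2.0])

# Coordinates of the vector field in the basis A(p) = [partial_{b_1(p)}, partial_{b_2(p)}]:
coords_u = B_dual(p) @ u(p)
assert np.allclose(B(p) @ coords_u, u(p))

# Coordinates of the 1-form omega at p in the dual basis {delta_{beta^i(p)}}:
# omega(p)(partial_{b_i(p)}) = partial_{b_i(p)} f(p) = grad_f(p) . b_i(p).
coords_omega = grad_f(p) @ B(p)

# Pairing omega(p) with partial_{u(p)} directly vs. through the coordinates:
direct = grad_f(p) @ u(p)
via_coords = coords_omega @ coords_u
assert np.isclose(direct, via_coords)
```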