If $V,W$ are normed vector spaces (the dimension is irrelevant) and $A\subset V$ is open and $f:A\to W$ is differentiable, then the derivative at a point $a\in A$ is by definition a (bounded) linear map $Df_a:V\to W$, i.e. $Df_a\in\text{Hom}(V,W)$. In particular, when $W=\Bbb{R}$, we have $Df_a\in\text{Hom}(V,\Bbb{R})=:V^*$. At this stage I’ll also refer you to my answer Is the identification of dx as a 1-form, and $\partial/\partial x$ as a vector, arbitrary? which should hopefully convince you that a derivative $Df_a$ (being a limit of finite differences $\Delta f_a$) really needs to be this sort of complicated object: it needs to eat something in $V$ before it can spit out something in $W$. So, very literally in the definition of the derivative we’re forced to consider dual spaces, and more generally the space of all linear maps. Dealing with elements of $V$ or $W$ by themselves is not enough.
The next thing I should point out is that having an isomorphism alone isn’t sufficient for most purposes, because you have to ensure that the thing you’re defining is independent of the isomorphism, or at the very least know to what extent it depends on the isomorphism. All real vector spaces of dimension $n$ are isomorphic to $\Bbb{R}^n$, and all separable real Hilbert spaces are isomorphic to $\ell^2(\Bbb{Z})$. So, why do we bother with more complicated vector spaces, and more complicated looking Hilbert spaces (e.g. the $L^2$ spaces, the $H^k$ Sobolev spaces, etc)? Well, because unless the question you ask is something like ‘are the cardinalities equal’ or ‘are the spaces isomorphic’, simply having an isomorphism is all but useless to us; by transport of structure one can formally move any construction across an isomorphism, but the result then depends on the isomorphism chosen.
> I'm not concerned with generalizations and abstractions to other fields that were made decades after
A little off-topic, but… functional analysis was kicking around in full gear by the 1920s–30s with Banach and Riesz (with many of the motivating questions coming in the late 1800s), so this is quite literally the time (Elie) Cartan was in his prime. So, the issue of dealing with all these different spaces (and actually treating them as different) was in full effect back then as well. Furthermore, the difference and importance of the different types of tensors (i.e. in modern language, the necessity of the full strength of the tensor bundles $T^{k}_l(TM)$) was known earlier too. And certainly after Einstein put forth GR, people knew the various levels of structure, even if not precisely formulated. And it’s a common feature that the ‘inventor’ of a concept knows less about it than we do with all our hindsight, which is why I don’t understand your last objection:
> theoretical frameworks like category theory were developed half a century after Cartan was using forms, I wouldn't take slick definitions involving categories and morphisms as some kind of valid argumentum ad abstractum.
yet in the comments you say
> What is meant rigorously by metric and coordinate independence? I found no precise definition in Lee's book. Is there some hidden canonical property or equivalence class? Can you give a very concrete and simple example/answer, with no abuse of notation or weasel words?
The language of category theory tells us in what precise sense things are ‘natural’ and ‘independent of choices’. Sure, Cartan and company may not have had the categorical language available to them, but they definitely recognized the importance of treating different things differently, observing that some things behaved one way while others behaved differently. Try reading Cartan’s lectures on Riemannian Geometry in an Orthogonal Frame; parts of it are super insightful (as to how he thought about/was motivated by things), while others are barely comprehensible (because the notation/language hadn’t been ‘perfected’ yet: in fact he was simultaneously developing and using the language of vector-valued differential forms, which made his work hard to read back then).
Now, let me address your questions slightly more directly.
> It appears that rather than fussing around with a cotangent bundle, you can define the interaction of forms with vectors as simply being (some) binary operation on the tangent bundle: $(TM)^2 \xrightarrow{\cdot} \mathbb F$.
Then you’re going to have to keep track of which concepts from here on out depend on this bilinear bundle morphism, and to what extent (i.e. if someone else chooses a different bilinear bundle morphism, how are your two answers going to differ?). The very concept of a gradient vector field, $\text{grad}(f)$, depends on a choice of non-degenerate bilinear form, whereas $df$ depends only on the function $f$. I’m not saying that the gradient vector field should never be introduced, but if one is trying to ‘artificially’ introduce it (e.g. by artificially introducing a Riemannian metric, or a symplectic form into the problem) with the sole intention of avoiding mention of dual spaces, then that’s just silly. Very concretely, if I give you the function $f:\Bbb{R}^4\to\Bbb{R}$ defined as $f(\alpha,\beta,\gamma,\delta)=\alpha^2+\beta^2+\gamma^2+\delta^2$, then there are infinitely many ways to talk about a ‘gradient vector field’. This is particularly the case for a manifold like $\Bbb{R}^4$, because there are several ‘natural’ looking geometries here:
- the ‘usual’ Euclidean-Riemannian metric
- the ‘usual’ Minkowski-Lorentzian metric
- the ‘usual’ symplectic form
Which do we use? Is there a preference for one over the other? What if you’re doing some classical mechanics and for some reason you have both a Riemannian metric and symplectic form, then which gradient should we consider? We could avoid all of this if we just deal with $df$ alone. Of course there’s a time and place to consider gradient vector fields, but it’s not so that we can avoid treating $df$ as what it actually is.
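To make the metric-dependence concrete, here is a small sketch (assuming Python with `sympy`, which is of course not part of the discussion above) computing the components of $df$ and of two different ‘gradients’ for the function above: $df$ is the same in both cases, while the gradient flips a sign the moment we swap the Euclidean metric for the Minkowski one.

```python
import sympy as sp

# f(alpha, beta, gamma, delta) = alpha^2 + beta^2 + gamma^2 + delta^2
a, b, c, d = sp.symbols('alpha beta gamma delta', real=True)
coords = [a, b, c, d]
f = a**2 + b**2 + c**2 + d**2

# df is just the row of partial derivatives -- no metric needed.
df = sp.Matrix([[sp.diff(f, x) for x in coords]])

# A 'gradient' additionally needs a nondegenerate bilinear form g:
# grad(f) = g^{-1} (df)^T.
g_euclidean = sp.eye(4)
g_minkowski = sp.diag(-1, 1, 1, 1)
grad_euc = g_euclidean.inv() * df.T
grad_min = g_minkowski.inv() * df.T

# Same df, two different gradient vector fields: the first component flips sign.
assert grad_euc.T == sp.Matrix([[2*a, 2*b, 2*c, 2*d]])
assert grad_min.T == sp.Matrix([[-2*a, 2*b, 2*c, 2*d]])
```

The point of the sketch is only that `df` is computed before any metric enters the picture; every choice of `g` manufactures a different ‘gradient’ from the same covector.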
> Some of the toys differential geometers use feel arbitrarily defined on forms, such as the pullback. Lee defines it on page 284 as:…
Pullback is the way it is quite literally because of the way we define maps and their compositions. Given a map $f:X\to Y$, it can be combined with other maps in two ways: composition on the domain or composition on the target, i.e
- given any set $E$, I can define a map $\text{Func}(Y,E)\to \text{Func}(X, E)$ as $\phi\mapsto \phi\circ f$.
- given any set $E$, I can define a map $\text{Func}(E,X)\to \text{Func}(E,Y)$ as $\phi\mapsto f\circ \phi$.
Now let’s stick to the land of vector spaces and linear maps, so $X,Y$ are vector spaces, and $f:X\to Y$ is a linear map. To apply the above idea, I need a third vector space $E$. Well, the only ‘natural’ ones to this problem are $X,Y$ and of course the underlying field $\Bbb{F}$ (and of course anything built from these guys). So, as the simplest case of all let’s take $E=\Bbb{F}$. Then,
- the first bullet point above becomes the dual map $f^t:Y^*\to X^*$, $\phi\mapsto \phi\circ f$.
- the second bullet point becomes the map $\widetilde{f}:\text{Hom}(\Bbb{F},X)\to\text{Hom}(\Bbb{F},Y)$, $\phi\mapsto f\circ \phi$. However, here we have a canonical isomorphism $\Phi_X:\text{Hom}(\Bbb{F},X)\to X$ given by $\Phi_X(T):= T(1)$, i.e. evaluation on $1\in\Bbb{F}$. Similarly, we have a canonical isomorphism $\Phi_Y$. We now see that the maps $\widetilde{f}$ and $f$ are canonically related as $f=\Phi_Y\circ\widetilde{f}\circ\Phi_X^{-1}$. Again, this is canonical, independent of any bases, inner products, bilinear forms, anything (and if you want a precise definition of this independence then I’m sorry but you must accept the categorical explanation, even if Elie Cartan didn’t utter those words… though I’m sure his son Henri would be fine with it). So, long story short, in this bullet point, we actually just get the map $f$ itself, so we’re not ‘creating’ anything new here.
The dual space $X^*$ on the other hand is not canonically isomorphic to $X$, so the first bullet point actually gives us something new here. Now, in the special case that the vector spaces $X,Y$ are both equipped with inner products (or symplectic forms, or Lorentzian metrics etc) then we can use the induced isomorphisms to define an adjoint, relative to the inner products, $f^*:Y\to X$. But you shouldn’t introduce an inner product for the sole purpose of avoiding dual spaces.
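In coordinates, the first bullet point is just the transpose: if covectors are row vectors, precomposition with $f$ is right-multiplication by the matrix of $f$. A quick numerical sanity check (assuming Python with `numpy`; the particular names and random data are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

# A linear map f: R^3 -> R^2 as a matrix, a covector phi in Y^* = (R^2)^*,
# and a vector x in X = R^3.
A = rng.standard_normal((2, 3))   # the map f
phi = rng.standard_normal(2)      # covector on Y (row vector)
x = rng.standard_normal(3)

# The dual map f^t: Y^* -> X^* is phi |-> phi o f; in matrix terms this is
# right-multiplication by A, equivalently left-multiplication by A^T.
pullback_phi = phi @ A

# The defining property of f^t: (phi o f)(x) == phi(f(x)).
assert np.allclose(pullback_phi @ x, phi @ (A @ x))
```

No inner product appears anywhere: the transpose here encodes precomposition, not an adjoint.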
At the level of manifolds and vector bundles, this manifests itself as follows: given a smooth map $f:M\to N$, we have the tangent map $Tf:TM\to TN$, which then naturally induces vector bundle morphisms $(Tf)^{\otimes k}: (TM)^{\otimes k}\to (TN)^{\otimes k}$ over $f$ and $(Tf)^{\wedge^k}:\bigwedge^k(TM)\to \bigwedge^k(TN)$, and hence
- for every vector bundle morphism $\xi:(TN)^{\otimes k}\to \Bbb{R}$, we get a new vector bundle morphism $f^*\xi:= \xi\circ (Tf)^{\otimes k}: (TM)^{\otimes k}\to \Bbb{R}$.
- for every vector bundle morphism $\omega:\bigwedge^k(TN)\to\Bbb{R}$, we get $f^*\omega:=\omega\circ (Tf)^{\wedge^k}:\bigwedge^k(TM)\to\Bbb{R}$.
These are of course respectively the pullback of $(0,k)$ tensor fields and $k$-forms from $N$ to $M$ along $f$.
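As a tiny worked instance of the second bullet point (a sketch assuming Python with `sympy`): pulling back the $2$-form $dx\wedge dy$ along the polar-coordinate map picks up the Jacobian determinant, recovering the familiar $f^*(dx\wedge dy)=r\,dr\wedge d\theta$.

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)

# The map f: (r, theta) |-> (x, y) = (r cos theta, r sin theta).
x = r * sp.cos(th)
y = r * sp.sin(th)

# The pullback of the 2-form dx ^ dy is (det Df) dr ^ dtheta.
J = sp.Matrix([[sp.diff(x, r), sp.diff(x, th)],
               [sp.diff(y, r), sp.diff(y, th)]])
coeff = sp.simplify(J.det())

# f^*(dx ^ dy) = r dr ^ dtheta
assert coeff == r
```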
> Other properties that came off as exclusive to forms also seemed to be perfectly definable using vector fields and an appropriate choice of binary operation.
Again, you should really get out of the habit of arbitrarily wanting to introduce bilinear forms, just to avoid duals. Given a vector space $V$, the dual space $V^*$ is a perfectly natural space to consider (it is made up only from $V$ and the field $\Bbb{F}$). Also the natural bilinear maps are scalar multiplication $\Bbb{F}\times V\to V$ and evaluation $V^*\times V\to\Bbb{F}$… I mean the second one is literally begging to be done: plug in a vector into an element of the dual.
There isn’t a natural bilinear map $V\times V\to\Bbb{F}$. It’s like trying to stick two north poles together; you need someone to put them together (technically impossible but you get what I mean).
> Properties of forms that made them seem so-called metric independent seemed to me to come instead out of the derivative. It feels like the derivative is the actual thing making all this machinery work…
Yes! A lot of the nice properties of forms come out of how derivatives and the chain rule work. In particular, the fact that for $f:A\subset V\to W$, the derivative at a point $a\in A$ is a linear map $Df_a:V\to W$. And hence, if we have linear maps out of $W$, then we can compose with $Df_a$ to get linear maps out of $V$ (i.e. the starting point of pullback by $f$). Likewise, the chain rule $D(f\circ g)_a=Df_{g(a)}\circ Dg_a$ very directly implies the pullback property $(f\circ g)^*=g^*\circ f^*$, and also (for $1$-forms) that pullback commutes with the exterior derivative (for higher $k$, I agree this is harder to see). I can agree that it would be good to emphasize once again the important role played by the derivative, as a linear map, in the further treatment of forms, but really this is also pretty apparent in how often one refers back to calculus on $\Bbb{R}^n$/Banach spaces when developing the differential calculus on manifolds.
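Here is a sketch verifying, for a $0$-form, that pullback commutes with $d$: symbolically, $d(F^*h)=F^*(dh)$ is exactly the chain rule. (Assuming Python with `sympy`; the particular map $F$ and function $h$ are arbitrary choices of mine.)

```python
import sympy as sp

u, v = sp.symbols('u v', real=True)
xs, ys = sp.symbols('x y', real=True)

# A smooth map F: (u, v) |-> (x, y) and a 0-form (function) h on the target.
x = u**2 - v
y = sp.sin(u) + v**3
h = xs**2 * ys

# Pullback of the function: F^*h = h o F.
h_pulled = h.subs({xs: x, ys: y})

# d(F^*h): partial derivatives of the pulled-back function.
d_of_pullback = [sp.diff(h_pulled, u), sp.diff(h_pulled, v)]

# F^*(dh): components of dh, composed with F, contracted with the Jacobian of F.
dh = [sp.diff(h, xs), sp.diff(h, ys)]
J = sp.Matrix([[sp.diff(x, u), sp.diff(x, v)],
               [sp.diff(y, u), sp.diff(y, v)]])
pullback_of_dh = [sum(dh[i].subs({xs: x, ys: y}) * J[i, j] for i in range(2))
                  for j in range(2)]

# d(F^*h) == F^*(dh), component by component: the chain rule in disguise.
assert all(sp.simplify(d_of_pullback[j] - pullback_of_dh[j]) == 0
           for j in range(2))
```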
Let me emphasize that commuting pullbacks with exterior derivatives is one of the nicest things about exterior derivatives. We can pullback by any smooth map, not just isometries of some Riemannian metric, or not just symplectic-isomorphisms between certain symplectic manifolds. Again, if you want precise statements, you can’t avoid categorical language. And precisely because of how differential calculus in $\Bbb{R}^n$ works, the only natural objects to differentiate are differential forms, not vector fields.
A vector field $X$ on $M$ is a smooth map $X:M\to TM$ (such that $\pi\circ X=\text{id}_M$), so the only form of differentiation we can a priori apply to it is the tangent map $TX:TM\to T(TM)$. This is despite the fact that $X$ takes values in a vector bundle (i.e. the next best thing to a vector space, at least as far as differential geometry is concerned).
In fact more generally, if we have a vector bundle $(E,\pi,M)$ and we specify a connection $\nabla$ on this vector bundle, then even without having a connection on $TM$ itself, we can still define a collection of first order differential operators, called the exterior covariant derivatives, $d_{\nabla}:\Omega^k(M;E)\to \Omega^{k+1}(M;E)$, which take $E$-valued $k$-forms on $M$ to $E$-valued $(k+1)$-forms on $M$. Notice that the target space hasn’t enlarged to $T^kE$ or $T^{k+1}E$. But of course, this connection is an extra piece of data on top of the vector bundle.
> … (technically unnecessary for the theory of smooth manifolds by Whitney embedding theorem) …
Don’t bring in embeddings unless you really have to. I view Whitney’s theorem as a great comforting fact that the ‘abstract’ subject we’re studying is in principle the same as if we’d done everything within submanifolds of $\Bbb{R}^n$. Also, every time you bring in extra information, you’re going to have to work that much harder to decipher which results are actually intrinsic.
> I even felt that the major motivating results of the field (like integrals and the generalized stokes theorem) could come off just by viewing these things as a sort of binary operation on oriented vector fields.
That’s a completely arbitrary way of doing it. Anyway, honestly speaking, differential forms aren’t the ‘right’ thing to be integrating. The ‘true’ objects to be integrating on a smooth manifold are scalar densities; these are objects which, when transformed between two charts, pick up a suitable absolute value of the Jacobian determinant. It is only in the presence of an orientation that one can convert a top-form into a scalar density and then integrate it. Furthermore, one has a corresponding divergence theorem; see Loomis and Sternberg Chapter 10 for such a presentation.
If differential forms aren’t technically the right things to be integrating, why do we do it? Well, because oriented manifolds occur quite often, and in this case we have Stokes’ theorem. Furthermore, there is a very nice calculus of differential forms: wedge products, pullback, interior product, Lie derivatives, and all the accompanying Cartan calculus (which relates $L_X,d,\iota_X$). So, this is a great harmony between differential calculus, integral calculus, and the basic algebra of the objects involved.
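A minimal instance of Stokes’ theorem in the plane (Green’s theorem), checked symbolically in a sketch that assumes Python with `sympy`: for $\omega=-y\,dx+x\,dy$ on the closed unit disk $D$, both $\int_{\partial D}\omega$ and $\int_D d\omega$ come out to $2\pi$.

```python
import sympy as sp

t, r, th = sp.symbols('t r theta', nonnegative=True)

# omega = -y dx + x dy on the unit disk; d(omega) = 2 dx ^ dy.
# Boundary integral: parametrize the unit circle by (cos t, sin t).
x, y = sp.cos(t), sp.sin(t)
boundary = sp.integrate((-y) * sp.diff(x, t) + x * sp.diff(y, t),
                        (t, 0, 2 * sp.pi))

# Interior integral: 2 dx ^ dy pulled back to polar coordinates is 2 r dr ^ dtheta.
interior = sp.integrate(2 * r, (r, 0, 1), (th, 0, 2 * sp.pi))

# Stokes: both sides agree (each equals 2*pi).
assert sp.simplify(boundary - interior) == 0
```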
> In Arnol'd's book the Legendre transform seems to have properties that can easily be written out with a binary operation instead of dualizing the vector space for momentum.
If $X,Y$ are vector bundles over $M$ and $\phi:X\to Y$ is a smooth fiber-preserving map (not necessarily fiberwise linear) then the fiber derivative is a vector bundle morphism $\mathbb{F}\phi:X\to \text{Hom}(X,Y)$. So, in the case $Y=M\times \Bbb{R}$ is the trivial bundle, this means we get a vector bundle morphism $\mathbb{F}\phi:X\to X^*$. The underlying principle here is that for a smooth map $f:A\subset V\to\Bbb{R}$, the derivative $Df$ is a map $A\to V^*$. No honest way to avoid duals here! See Why is the Legendre transform (of vector bundles) a smooth morphism $\mathbf FL:E\to E^*$? for a more detailed explanation of the notation.
So, long story short, the natural place things live (on the target) is the dual. In a special case like when the Lagrangian $L:TQ\to\Bbb{R}$ comes from a Riemannian metric $g$ on $Q$, i.e via the equation $L(v)=\frac{1}{2}g(v,v)$ (perhaps plus a potential term pulled back from the base) then the fiber derivative is actually equal to the musical isomorphism, i.e $\mathbf{F}(L_g)=g^{\flat}:TQ\to T^*Q$.
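A quick check of this last identity on a single fiber (a sketch assuming Python with `sympy`; the metric coefficients below are an arbitrary choice of mine): for $L(v)=\frac{1}{2}g(v,v)$, the fiber derivative $p_i=\partial L/\partial v^i$ is exactly $g^{\flat}(v)$, i.e. $g_{ij}v^j$.

```python
import sympy as sp

v1, v2 = sp.symbols('v1 v2', real=True)
v = sp.Matrix([v1, v2])

# A (constant-coefficient) metric g on a 2-dimensional fiber, and the
# kinetic Lagrangian L(v) = (1/2) g(v, v).
g = sp.Matrix([[2, 1],
               [1, 3]])
L = sp.Rational(1, 2) * (v.T * g * v)[0]

# Fiber derivative: p_i = dL/dv^i.  For this L it equals g^flat(v) = g v.
p = sp.Matrix([sp.diff(L, v1), sp.diff(L, v2)])

assert all(sp.simplify(e) == 0 for e in (p - g * v))
```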
But really, you still can’t avoid the cotangent bundle, because it is $T^*Q$ which is naturally a symplectic manifold. In general, $TQ$ is not naturally a symplectic manifold. If however the fiber derivative $\mathbf{F}L:TQ\to T^*Q$ is a vector bundle isomorphism, then we can use this to pullback the natural symplectic form $\omega$ on $T^*Q$ to a symplectic form $\omega_L:=(\mathbf{F}L)^*\omega$ on $TQ$.
But at this stage when we’re already talking about symplectic forms, I legitimately do not see a point in avoiding dual spaces (and by extension, the cotangent bundle and its exterior powers).
> I don't deny that highly useful tools like the pullback, wedge product, interior product, and exterior derivative are invaluable, but I don't see why you can't just define these tools over vector fields, and simply treat all this notation and terminology for the dual space as being the notation and terminology for the tangent vector space?
Because the pullback and pushforward of vector fields is not defined, unless you’re considering diffeomorphisms. So that halts the naturality of everything else you try to do.
But I will say that wedge and interior products can be defined with one less dual. For every vector space $V$, one can define the exterior power $\bigwedge^k(V)$; in particular, we can replace $V$ with $V^*$ and we still get $\bigwedge^k(V^*)$ and we can show that for finite-dimensional $V$, this is canonically isomorphic to the dual $\left(\bigwedge^k(V)\right)^*$.
Next, regarding interior products, we can think of it as a bilinear map $V^*\times \bigwedge^k(V)\to \bigwedge^{k-1}(V)$, and perhaps this is the ‘true’ starting point. In particular, replacing $V$ with $V^*$ and then invoking the canonical isomorphism $V^{**}\cong V$, here we get the more commonly presented notion of $V\times \bigwedge^k(V^*)\to \bigwedge^{k-1}(V^*)$.
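Both identifications can be checked numerically in a small sketch (assuming Python with `numpy`; all names and random data are mine): the pairing of $\phi\wedge\psi\in\bigwedge^2(V^*)$ with a pair of vectors $(u,w)$ is a $2\times 2$ determinant, and contracting the first slot gives the interior product.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two covectors phi, psi on R^4 and two vectors u, w.
phi, psi = rng.standard_normal(4), rng.standard_normal(4)
u, w = rng.standard_normal(4), rng.standard_normal(4)

# Pairing of phi ^ psi with (u, w) is a 2x2 determinant:
# (phi ^ psi)(u, w) = phi(u) psi(w) - phi(w) psi(u).
wedge_eval = (phi @ u) * (psi @ w) - (phi @ w) * (psi @ u)
det_form = np.linalg.det(np.array([[phi @ u, phi @ w],
                                   [psi @ u, psi @ w]]))
assert np.isclose(wedge_eval, det_form)

# Interior product: contracting the first slot with u gives the covector
# iota_u (phi ^ psi) = phi(u) psi - psi(u) phi, so that
# (iota_u (phi ^ psi))(w) = (phi ^ psi)(u, w).
iota_u = (phi @ u) * psi - (psi @ u) * phi
assert np.isclose(iota_u @ w, wedge_eval)
```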
Final Remarks.
I think as a general principle, you should use the right tool for the job, and treat things the way they’re meant to be treated. If that thing turns out to be complicated then so be it. You can of course try to trick yourself and simplify it in an arbitrary manner, e.g. thinking of the curvature of a connection as a bunch of numbers stored in a funny way with certain (anti)symmetry properties… and perhaps this might work for a while and even be simpler (I personally was hesitant to delve into differential geometry fully). But, when you start treating things ‘properly’, you can start to access some of the deeper facts about the things you’re studying.
Having said that, of course I completely sympathize with the deluge of details, especially in a subject like differential geometry. But that’s a part of math: you have to figure out which details are important, which are not, how the various ideas fit together, and how things change if done differently. And if you feel like the modern presentations are too abstract, then you should definitely try reading some of the classics (e.g. Cartan’s lectures I mentioned above, or Spivak’s discussion of the classic works in Vol. II of his differential geometry series); this can be enlightening in some respects, but I’ve also found myself utterly confused as to what they’re even saying (so I wouldn’t dwell on those details too much).