
Question

I am confused about differentials and deriving Jacobians. Can someone try to understand this rambling mess and help identify what parts I am missing?

I understand that this may take some patience to read. I have done my best to keep this simple and clear. I have spoken to my professors and I usually just get ignored or left on read regarding this confusion. So I leave it here.

I want to state upfront that I am aware that differentials are more formally treated as one-forms, and that we can take their wedge products and this should all be easy-peasy. Maybe it's just that I took a physics degree instead of a math degree, but the cross product of two scalars, or of two differentials, isn't defined in typical vector algebra. Since the wedge product is supposed to be an extension of the cross product to $\mathbb{R}^n$, it still doesn't jibe with me to write something like $dx \wedge dy$ unless $dx$ is being used in lieu of $\hat{e}_1\,dx$.

I am going to make this very simple. When I was introduced to Jacobians, a lot of hand-waving occurred and we were told to just ignore some of the questions we had for now. Having since taken vector algebra and several other courses, I realize I still have never gotten a good answer.

So let's begin with polar coordinates in 2D.

The Jacobian is derived from the coordinate equations $x=x(r,\theta)=r\cos\theta$ and $y=y(r,\theta)=r\sin\theta$ by considering the differentials of these functions:
$$ \begin{cases} dx =\dfrac{\partial x}{\partial r}dr+\dfrac{\partial x}{\partial \theta}d\theta\\ dy =\dfrac{\partial y}{\partial r}dr+\dfrac{\partial y}{\partial \theta}d\theta \end{cases} \;\Rightarrow\; \begin{cases} dx =\cos\theta\,dr-r\sin\theta\,d\theta\\ dy =\sin\theta\,dr+r\cos\theta\,d\theta \end{cases} $$
If we simply multiply these together we get something close to the right answer:
$$ dx\,dy = \cos\theta\sin\theta\,dr^2 + r\cos^2\theta\,dr\,d\theta - r\sin^2\theta\,d\theta\,dr - r^2\sin\theta\cos\theta\,d\theta^2. $$
Applying the rule that "something small squared is approximately zero," we simplify:
$$ dx\,dy = r\cos^2\theta\,dr\,d\theta - r\sin^2\theta\,d\theta\,dr. $$
Then we must add a new, seemingly arbitrary rule, $d\theta\,dr = -dr\,d\theta$, so that this works out correctly:
$$ dx\,dy = r( \cos^2\theta + \sin^2\theta )\,dr\,d\theta = r\,dr\,d\theta. $$
But now we have a rule, absent from standard calculus courses, that exchanging the order of integration introduces a negative sign. This seems directly contradicted by the fact that when we integrate over a rectangular domain the result is independent of the order of integration. For example, the area of a circle of radius $R$ is a separable integral in polar coordinates:
$$ \iint_{\text{circle of radius }R} dx\,dy = \int_{-R}^R \int_{-\sqrt{R^2-y^2}}^{\sqrt{R^2-y^2}} dx\,dy = \int_{-R}^R \int_{-\sqrt{R^2-x^2}}^{\sqrt{R^2-x^2}} dy\,dx = \int_{0}^{2\pi}\int_0^R r\,dr\,d\theta $$
Evaluating the Cartesian integral requires a lot of patience, but the polar integral can be done first over $r$, first over $\theta$, or by separation:
$$ \int_{0}^{2\pi}\int_0^R r\,dr\,d\theta = \int_0^R 2\pi r\,dr = \int_0^{2\pi} \frac{R^2}{2}\,d\theta = \int_0^{2\pi} d\theta \int_0^R r\,dr = \pi R^2 $$
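(A quick SymPy sanity check of that last chain of integrals, just `sympy.integrate` in both orders plus the Cartesian version; the variable names are my own, for illustration only:)

```python
import sympy as sp

r, theta, R = sp.symbols('r theta R', positive=True)
y = sp.symbols('y', real=True)

# Polar form: integrate r dr dtheta over [0, R] x [0, 2*pi], in either order.
area_theta_first = sp.integrate(sp.integrate(r, (theta, 0, 2*sp.pi)), (r, 0, R))
area_r_first = sp.integrate(sp.integrate(r, (r, 0, R)), (theta, 0, 2*sp.pi))

# Cartesian form: the inner dx integral gives 2*sqrt(R^2 - y^2), then integrate over y.
area_cartesian = sp.integrate(2*sp.sqrt(R**2 - y**2), (y, -R, R))

print(area_theta_first, area_r_first, area_cartesian)   # pi*R**2, three times
```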

At some point I saw someone factor these differential relations into a matrix equation, which neatly presents the Jacobian matrix, something I had never seen 'derived' before. $$ \begin{bmatrix}dx\\ dy\end{bmatrix} = \begin{bmatrix}\cos\theta&-r\sin\theta\\\sin\theta&r\cos\theta\end{bmatrix} \begin{bmatrix}dr\\ d\theta\end{bmatrix} $$ In this presentation the Jacobian is merely the absolute value of the determinant of the partial derivatives matrix. This, like many things in calculus, is given without explanation or derivation.
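(For what it's worth, this partial-derivative matrix and its determinant are easy to check with SymPy; a minimal sketch, names mine:)

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
x = r * sp.cos(theta)
y = r * sp.sin(theta)

# Matrix of partial derivatives of (x, y) with respect to (r, theta)
J = sp.Matrix([x, y]).jacobian([r, theta])
print(J)                      # Matrix([[cos(theta), -r*sin(theta)], [sin(theta), r*cos(theta)]])
print(sp.simplify(J.det()))   # r, so |det J| = r, the factor in dx dy = r dr dtheta
```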

This is just the matrix-factored version of the equations above. But now there is no natural way to simply multiply things together: on the left-hand side we have one object with two components, and there is no standard matrix operation that multiplies the first component by the second without first reducing the matrix equation back to a system of separate equations. This is fine in principle, but I feel there should be such a method.

To this end, I noticed that if we right-multiply this equation by its transpose we obtain the following. $$ \begin{align} \begin{bmatrix}dx\\dy\end{bmatrix} \begin{bmatrix}dx\\dy\end{bmatrix}^T &= \begin{bmatrix}dx\\dy\end{bmatrix} \begin{bmatrix}dx&dy\end{bmatrix}\\ &= \begin{bmatrix}dx^2&dxdy\\dydx&dy^2\end{bmatrix}\tag{1} \end{align} $$ This is so close and yet so far from the identity Jacobian of the Cartesian coordinates. Proceeding, we obtain for the right-hand side $$ \begin{bmatrix} \cos\theta&-r\sin\theta\\ \sin\theta& r\cos\theta \end{bmatrix} \begin{bmatrix}dr\\d\theta\end{bmatrix} \left( \begin{bmatrix} \cos\theta&-r\sin\theta\\ \sin\theta& r\cos\theta \end{bmatrix} \begin{bmatrix}dr\\d\theta\end{bmatrix} \right)^T = \begin{bmatrix} \cos\theta&-r\sin\theta\\ \sin\theta& r\cos\theta \end{bmatrix} \begin{bmatrix}dr\\ d\theta\end{bmatrix} \begin{bmatrix}dr & d\theta\end{bmatrix} \begin{bmatrix} \cos\theta& \sin\theta\\ -r\sin\theta& r\cos\theta \end{bmatrix} \\ = \begin{bmatrix} \cos\theta&-r\sin\theta\\ \sin\theta& r\cos\theta \end{bmatrix} \begin{bmatrix}dr^2& dr d\theta\\ d\theta dr & d\theta^2\end{bmatrix} \begin{bmatrix} \cos\theta& \sin\theta\\ -r\sin\theta& r\cos\theta \end{bmatrix} $$ But simplifying this last matrix product is excessively long and convoluted to type out; in the process I have made several mistakes, and while a CAS can do it in microseconds, I cannot easily copy and paste the result here. Besides, this is pretty obviously the wrong computation anyway.
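(Since I mentioned a CAS: here is a minimal SymPy sketch of exactly this computation, treating $dr$ and $d\theta$ as ordinary commuting symbols; all names are mine, for illustration. It confirms that the right-hand side is just the symmetric outer product again, with determinant identically zero:)

```python
import sympy as sp

r, theta, dr, dth = sp.symbols('r theta dr dtheta')

J = sp.Matrix([[sp.cos(theta), -r*sp.sin(theta)],
               [sp.sin(theta),  r*sp.cos(theta)]])
d_polar = sp.Matrix([dr, dth])

dx, dy = J * d_polar                       # dx, dy expressed in dr, dtheta

# Right-hand side: J (d_polar d_polar^T) J^T
rhs = J * (d_polar * d_polar.T) * J.T
# It equals the symmetric matrix [[dx^2, dx dy], [dy dx, dy^2]] ...
lhs = sp.Matrix([[dx**2, dx*dy], [dy*dx, dy**2]])
print((rhs - lhs).expand())                # zero matrix
# ... and, being a rank-one outer product, its determinant vanishes identically.
print(sp.expand(rhs.det()))                # 0
```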

My professors waved away questions about the values of $dx^2$ and $dy^2$, saying they are zero because $dx$ is already small, so $dx^2$ is basically zero. Or rather, they did this hand-wave with the real approximations $\Delta x$ to the infinitesimals, $\Delta x \approx dx$. There is also the version of all of this that comes up in some other books in terms of virtual displacements $\delta x$ and so forth. Moreover, Google returns results on the value of squared differentials in integrals, albeit perhaps with niche applications.

Additionally, there is another problem: it should be the case that $dx\,dy=dy\,dx$, but if we take the determinant of (1) we get zero. If we instead take the squared differentials to be zero, we get the wrong sign for the determinant. It must be the case that $dx\,dy=-dy\,dx$ for the computation to work out, and we also must throw in a square root so we aren't left with the squares going to zero again. This just reiterates the issues from above.

In other words, I think it's best to express all of this with a bit more of the formalism I have read and seen elsewhere. Let $\vec{r}=\vec{r}(x,y)=\vec{r}(r,\theta)$ be our position/displacement vector, $$ \vec{r} =\hat{e}_1 x +\hat{e}_2 y =\hat{e}_1 r\cos\theta +\hat{e}_2 r\sin\theta. $$ The differential of position is $$ d\vec{r} = \frac{\partial \vec{r}}{\partial x}dx+\frac{\partial \vec{r}}{\partial y}dy = \frac{\partial \vec{r}}{\partial r}dr+\frac{\partial \vec{r}}{\partial \theta}d\theta. $$ Using the same matrix-factoring method, $$ \begin{align} d\vec{r} &= \hat{e}_1 dx + \hat{e}_2 dy\\ &= \begin{bmatrix}\hat{e}_1&\hat{e}_2\end{bmatrix} \begin{bmatrix}dx\\dy\end{bmatrix} = \begin{bmatrix}dx&dy\end{bmatrix} \begin{bmatrix}\hat{e}_1\\\hat{e}_2\end{bmatrix}\\ &= (\cos\theta\hat{e}_1+\sin\theta\hat{e}_2) dr + (-r\sin\theta\hat{e}_1+r\cos\theta\hat{e}_2) d\theta\\ &= \begin{bmatrix}\hat{e}_1&\hat{e}_2\end{bmatrix} \begin{bmatrix} \cos\theta&-r\sin\theta\\ \sin\theta& r\cos\theta \end{bmatrix} \begin{bmatrix}dr\\d\theta\end{bmatrix}\\ \end{align} $$ So far nothing is truly new. The only objection I can think of to this presentation is that "matrices can only hold numbers," which is true I suppose, but then the question becomes what exactly a number is. Since every field is a vector space whose elements can be multiplied, this point seems moot. Further, I will also point to the determinant method for taking the vector cross product in $\mathbb{R}^3$, where basis vectors also sit inside a matrix.
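(Just as a consistency check on this factoring, here is a minimal SymPy sketch with commuting stand-ins $e_1, e_2$ for the basis vectors; all names are mine:)

```python
import sympy as sp

r, theta, dr, dth = sp.symbols('r theta dr dtheta')
e1, e2 = sp.symbols('e1 e2')   # commuting stand-ins for the basis vectors e_1, e_2

basis_row = sp.Matrix([[e1, e2]])
J = sp.Matrix([[sp.cos(theta), -r*sp.sin(theta)],
               [sp.sin(theta),  r*sp.cos(theta)]])
d_polar = sp.Matrix([dr, dth])

d_r_polar = (basis_row * J * d_polar)[0]   # [e1 e2] J [dr dtheta]^T
dx, dy = J * d_polar
d_r_cart = e1*dx + e2*dy                   # e1 dx + e2 dy

print(sp.expand(d_r_polar - d_r_cart))     # 0: the two factorings agree
```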

Regardless, I know I am not unique in using this method to describe linear combinations since EigenChris uses this same representation in his series on Tensors for Beginners.

In fact, these thoughts come back nearly full circle to the following. Consider taking the previous process but, instead of simple matrix multiplication, applying the wedge product in between. With the basis vectors written inside matrices, there is no need to use the transpose operation to make anything work out neatly; we can just use the fact that the component matrices commute with the basis matrix to put any operation in a convenient form. $$ \begin{align} \begin{bmatrix}\hat{e}_1&\hat{e}_2\end{bmatrix} \begin{bmatrix}dx\\dy\end{bmatrix} \wedge \begin{bmatrix}\hat{e}_1&\hat{e}_2\end{bmatrix} \begin{bmatrix}dx\\dy\end{bmatrix} =& \begin{bmatrix}dx&dy\end{bmatrix} \begin{bmatrix}\hat{e}_1\\\hat{e}_2\end{bmatrix} \wedge \begin{bmatrix}\hat{e}_1&\hat{e}_2\end{bmatrix} \begin{bmatrix}dx\\dy\end{bmatrix}\\ =& \begin{bmatrix}dx&dy\end{bmatrix} \begin{bmatrix} \hat{e}_1 \wedge \hat{e}_1 & \hat{e}_1 \wedge \hat{e}_2\\ \hat{e}_2 \wedge \hat{e}_1 & \hat{e}_2 \wedge \hat{e}_2 \end{bmatrix} \begin{bmatrix}dx\\dy\end{bmatrix}\\ =& \begin{bmatrix}dx&dy\end{bmatrix} \begin{bmatrix} 0 & \hat{e}_1 \wedge \hat{e}_2\\ - \hat{e}_1 \wedge \hat{e}_2 & 0 \end{bmatrix} \begin{bmatrix}dx\\dy\end{bmatrix}\\ =& \begin{bmatrix}dx&dy\end{bmatrix} \begin{bmatrix} 0 + \hat{e}_1 \wedge \hat{e}_2 dy\\ - \hat{e}_1 \wedge \hat{e}_2 dx + 0 \end{bmatrix}\\ =& dx \hat{e}_1 \wedge \hat{e}_2 dy - dy \hat{e}_1 \wedge \hat{e}_2 dx \\ =& (dxdy - dydx) \hat{e}_1 \wedge \hat{e}_2 \end{align} $$ So now we have traded one issue for another: before, we were stuck with a matrix and had to arbitrarily take its determinant; now, either $dy\,dx=dx\,dy$ and we have a zero differential area element, or $dy\,dx=-dx\,dy$ and we have to introduce an arbitrary factor of one half.
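(For what it's worth, the same bookkeeping can be done in SymPy by representing a one-form $a\,dr + b\,d\theta$ by its coefficient pair and wedging antisymmetrically; a minimal sketch under that representation, where the helper `wedge` is my own:)

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)

# Represent a one-form a*dr + b*dtheta by its coefficient pair (a, b).
dx = (sp.cos(theta), -r*sp.sin(theta))
dy = (sp.sin(theta),  r*sp.cos(theta))

def wedge(w1, w2):
    """Coefficient of dr ^ dtheta in w1 ^ w2, using
    dr ^ dr = dtheta ^ dtheta = 0 and dtheta ^ dr = -dr ^ dtheta."""
    return sp.simplify(w1[0]*w2[1] - w1[1]*w2[0])

print(wedge(dx, dy))   #  r : dx ^ dy =  r dr ^ dtheta
print(wedge(dy, dx))   # -r : dy ^ dx = -dx ^ dy (antisymmetric, no factor of 1/2 needed)
```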

What am I missing? I feel as though there should be a neat resolution here where this all just simplifies quite nicely but I have never seen it.

Gerald
  • I KNOW THIS IS A MESS OF A QUESTION. I will modify it for anyone who is willing to help understand me and where I am. THANK YOU! – Gerald Oct 14 '24 at 16:57
  • Part of your issue is notation. The notation $dx,dy$ is used formally as part of how we denote an area integral, and doesn't have an independent existence. But there is also the common practice of letting $x, y$ denote the coordinate functions $\mathbf x(x,y) = x$ and $\mathbf y(x,y) = y$, whence $d\mathbf x, d\mathbf y$ are the total (Fréchet) differentials of the coordinate functions where e.g. $d\mathbf x(a\hat e_x+b\hat e_y) = a$ is a one-form. – Nicholas Todoroff Oct 14 '24 at 19:48
  • We can combine these one-forms into a two-form $d\mathbf x\wedge d\mathbf y$, and then use the general notion of integrating differential forms to integrate it; and this is not exactly the same thing as the area integral I mentioned previously, though of course they are closely related. – Nicholas Todoroff Oct 14 '24 at 19:48
  • This is a little related to, but different from this question about the area element in polar. – Mark S. Oct 14 '24 at 22:26
  • So from a naive perspective $d\vec{x}=\hat{x}dx=\hat{e}_1 dx$? I would really like to make a direct connection from the undergraduate presentation of vectors and integration to the higher-level topics of 1-forms and 2-forms. For me it has never held meaning to simply take the wedge of $dx$ and $dy$, since to me these are infinitesimals that live inside integrals, and while I do understand them to have some notion of directionality, it's a one-dimensional notion that doesn't live in a larger vector space. – Gerald Oct 15 '24 at 16:18
  • @Gerald If by $d\vec x$ you are referring to the $d\mathbf x$ that I wrote, then no because this $d\mathbf x$ is a covector/one-form, not a vector, so $d\mathbf x \overset?= \hat e_1\,dx$ makes no sense. If we're using the standard inner product however, then we could say that $d\mathbf x(v) = \hat e_1\cdot v$. Integrating $d\mathbf x$ along a line is exactly $\int\hat e_1\cdot d\vec r$: it means summing up all of the $x$-components of the infinitesimal tangent vectors $d\vec r$ along the line. In general, integrating a one-form $\omega$ means $\int\omega(d\vec r)$. – Nicholas Todoroff Oct 16 '24 at 14:49
  • @NicholasTodoroff Yes, by $d\vec{x}$ I am referring to the $d\mathbf{x}$ that you wrote. I just didn't know the exact MathJax you used to write it. So earlier you wrote $d\vec{x}(a\hat{e}_x+b\hat{e}_y)=a$. I would translate that as $d\vec{x}=\hat{e}_x\cdot=\hat{e}_x^T$, so that when we follow through with the inner product or matrix product we are left with the scalar component $a$. In other words, $d\vec{x}$ is little more than a notation change for writing inner products with the basis vectors. What is the magnitude of $d\vec{x}$? – Gerald Oct 18 '24 at 03:38
  • @Gerald Yes, and I wrote the same thing. The "magnitude" of $d\mathbf x$ with respect to the standard inner product is $1$ since by definition $$d\mathbf x \cdot d\mathbf x = d\mathbf x^\sharp\cdot d\mathbf x^\sharp = \hat e_1\cdot\hat e_1 = 1$$ where $\omega \mapsto \omega^\sharp$ is the inverse of $v \mapsto v\cdot({-})$. – Nicholas Todoroff Oct 18 '24 at 04:02
  • So $d\mathbf{x}$ has nothing to do anymore, at a graduate level, with 'an infinitesimal change in x' and is instead replaced with the notion that $dx$ represents an element of a covector field? I think I remember EigenChris saying something similar. – Gerald Oct 19 '24 at 16:31

1 Answer


The relation $$\mathrm{d}x \wedge \mathrm{d}y = -\mathrm{d}y \wedge \mathrm{d}x$$ isn't "a new arbitrary rule"; it captures the fact, familiar to physicists from the right-hand rule, that $$ \vec{a} \times \vec{b} = - \vec{b} \times \vec{a} \text{.} $$ To rephrase: cross products and wedge products anticommute.
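(A two-line numeric check of the anticommutation, using nothing but SymPy's ordinary cross product; the vectors are arbitrary examples:)

```python
import sympy as sp

a = sp.Matrix([1, 2, 3])
b = sp.Matrix([4, 5, 6])

print(a.cross(b))   # Matrix([[-3], [6], [-3]])
print(b.cross(a))   # Matrix([[3], [-6], [3]]), exactly the negative: a x b = -(b x a)
```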

Alternatively, if you are using the right-hand rule to orient infinitesimal areas (on a surface used for applying Gauss's law, for instance), the normals point inward if you order the infinitesimals one way and outward if you swap the order. So of course the integral is negated under exchange of variables. (In undergraduate land, we aren't careful about this: we manually point the normals outward, because symmetry lets us intuit which way the field passes through the surface patches. In graduate land, one may have surfaces with patches that are not so easy to orient by hand, so define your infinitesimals carefully and let the wedge product track the orientation for you.)

It may help to realize that cross products don't actually give you vectors; they give you axial vectors (pseudovectors).

When you say "the Jacobian is merely the absolute value of the determinant of the partial derivatives matrix", you are repeating the undergraduate-land version of "you always know how to properly orient things." Generically, you do not.

When you take $\mathrm{d}x\, \mathrm{d}y = \mathrm{d}y \, \mathrm{d}x$, you throw away all information about orientation. The only way this can be true for oriented quantities is if $\mathrm{d}x\, \mathrm{d}y = 0 = \mathrm{d}y \, \mathrm{d}x$. Then some of what you write about happens. But if we track orientations, the correct version of this is $$|\mathrm{d}x\, \mathrm{d}y| = |\mathrm{d}y \, \mathrm{d}x|$$ or, equivalently, $$|\vec{a} \times \vec{b}| = |\vec{b} \times \vec{a}| \text{,} $$ where we explicitly delete the orientation information.
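(A concrete way to see what deleting the orientation does, as a small SymPy sketch with my own variable names: swapping the order of the differentials swaps the columns of the partial-derivative matrix, which flips the sign of the determinant but not its absolute value.)

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)

J = sp.Matrix([[sp.cos(theta), -r*sp.sin(theta)],
               [sp.sin(theta),  r*sp.cos(theta)]])
# Swapping the (dr, dtheta) ordering corresponds to swapping the columns of J.
J_swapped = sp.Matrix.hstack(J.col(1), J.col(0))

print(sp.simplify(J.det()))          #  r
print(sp.simplify(J_swapped.det()))  # -r : the sign tracks orientation; |det| is unchanged
```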

Eric Towers
  • From a naive perspective, would you say that $dx$ as you have written it should be written as $d\vec{x}=\hat{x}dx=\hat{e}_1dx$? – Gerald Oct 16 '24 at 13:44
  • @Gerald : No. "$\mathrm{d}x$" is not an element of the vector space spanned by the $\hat{e}_i$. In undergraduate land, what you have written can be done but only because the orientation information is being added by hand elsewhere in the computation. – Eric Towers Oct 16 '24 at 16:34
  • So I have a question for you. I have done some more reading, and the couple of textbooks that I have either barely touch on this topic or assume a complete understanding of the material. But I found this answer https://math.stackexchange.com/a/664674/167701 where $d\mathbf{x}$ is identified with the unit covector $\tilde{e}_1$, $d\mathbf{x}=\tilde{e}_1$. To me this means that we are throwing away the undergraduate notion of $dx$ representing an infinitesimal change in $x$ and replacing it with the notion that $dx$ represents an inner product. Is there a better way to phrase this? – Gerald Oct 19 '24 at 16:29
  • @Gerald : That the "$\mathrm{d}x$" acts like a dot product is very clear in "Real Analysis" with the study of Lebesgue integration and measures, where one writes $\int f(z)\,\mathrm{d}\mu(z)$. One can think of that integral as a dot product between the function $f$ and the co-function $\mu$. Another setting where dot-product-ness is clearly laid out is the theory of orthogonal polynomials. So yes, applying covectors to vectors is analogous to a dot product. – Eric Towers Oct 19 '24 at 17:42