32

Why is it okay to consider that $(\mathrm d x)^n=0$ for any $n$ greater than $1$? I can understand that $\mathrm d x$ is infinitesimally small (but greater than $0$), and hence its square or cube should be approximately equal to $0$, not exactly $0$.

But if this is so, then how can we expect the results obtained from calculus (like the slope or the area under a curve) to be exact and not just approximate?

I have also noticed some anomalies: $\sqrt{ (\mathrm d x)^2 + (\mathrm d y)^2 }$ is $0$, but $\mathrm d x\sqrt{1+ (\mathrm d y/\mathrm d x)^2 }$ is not $0$, even though these two expressions are apparently the same. Moreover, we can claim that

$$(\mathrm d x)^2=(\mathrm d x)^3=(\mathrm d x)^4 = \cdots = 0$$

which is quite hard to believe.

Can you help me figure out the logic behind these things?

  • 7
    There are basically three answers to this. In one of them, called smooth infinitesimal analysis, you have nilpotent infinitesimals which are used to power derivatives. In particular, the ones used to power the first derivative are by definition "nilsquare" i.e. they have $dx \neq 0$ but $dx^2=0$. This requires some fairly sophisticated logical trickery to avoid contradictions. – Ian Sep 26 '16 at 10:29
  • 6
    In the second one, called hyperreal analysis, there are infinitesimals $dx$ but they are not nilpotent. Instead, we define our operations in terms of the "standard part" of certain quantities involving infinitesimals. For example $st \left ( \frac{(x+dx)^2-x^2}{dx} \right ) = st \left ( \frac{2xdx+dx^2}{dx} \right ) = st(2x+dx)=2x$, by the definition of the standard part. Thus in some sense we "took an approximation" by throwing out an infinitesimal term. – Ian Sep 26 '16 at 10:29
  • 9
    This is analogous to taking a finite difference quotient $\frac{f(x+h)-f(x)}{h}$ and then taking a limit as $h \to 0$, which is how we do it in standard analysis. Standard analysis is the last of these frameworks. In it, there are no infinitesimals at all, and you have to define everything in terms of bounds on real numbers. You are probably best off trying to understand everything in the framework of standard analysis. – Ian Sep 26 '16 at 10:29
  • 12
    The intuitive content of all of these frameworks is not really that $dx^2=0$ or even really $dx^2 \approx 0$ (since such an approximation throws out all information). Rather it is that $dx+dx^2 \approx dx$. Thus for instance $\sqrt{dx^2+dy^2}$ should not be treated as just being zero (and indeed it is the same as $dx \sqrt{1+(dy/dx)^2}$ under normal circumstances). By contrast, in our squaring example we throw out the $dx^2$ term because we had a $dx$ term being added to it. There is a crude parallel between this and machine epsilon in floating point arithmetic. – Ian Sep 26 '16 at 10:36
  • If this has answered your question, please say as much so that I may clean it up a bit to make an answer. – Ian Sep 26 '16 at 18:52
  • I understood the 3rd and 4th comments. –  Sep 27 '16 at 02:54

7 Answers

20

Classical authors like Pierre de Fermat and Gottfried Wilhelm Leibniz discarded higher-order terms in the infinitesimal $E$ (in the case of Fermat) or $dx$ (in the case of Leibniz) while fully understanding that the terms are not being set to zero but rather discarded. In other words, they used a generalized relation of equality "up to" a negligible term.

Fermat specifically introduced a term that is translated into English as adequality to refer to such a more general relation. Leibniz is quite specific in his writing (for example in his published response to Nieuwentijt in 1695) that he is working with such a generalized relation of equality.

In modern infinitesimal theories, this type of relation is formalized in terms of what is known as the standard part function (or shadow). Thus, the calculation of the ratio $\frac{\Delta y}{\Delta x}$ for $y=x^2$ will yield not the expected $2x$ but rather the infinitely close quantity $2x+\Delta x$ where $\Delta x$ is infinitesimal. To calculate the derivative at a real point $x=c$ one takes the standard part of $2c+\Delta x$ to obtain $2c$, the expected answer.
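As a loose numerical sketch of this (with an ordinary small real $\Delta x$ standing in for a genuine infinitesimal, and function names of my choosing), the ratio $\frac{\Delta y}{\Delta x}$ for $y=x^2$ really is $2x+\Delta x$ on the nose:

```python
# Numerical sketch (names are mine): for y = x^2 the ratio
# Delta y / Delta x at x = c is *exactly* 2c + dx, so "taking the
# standard part" corresponds to discarding the dx term.  Here dx is an
# ordinary small real, not a genuine infinitesimal.
def difference_quotient(f, c, dx):
    return (f(c + dx) - f(c)) / dx

f = lambda x: x ** 2
c = 3.0
for dx in (1e-1, 1e-3, 1e-5):
    q = difference_quotient(f, c, dx)
    print(dx, q, q - 2 * c)   # the leftover q - 2c is just dx itself
```

The leftover term never becomes zero; it is discarded by the standard part operation, not by the arithmetic.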

Thus when expanding the expression $(x+dx)^2=x^2+2x\,dx+dx^2$ one does not set the term $dx^2$ equal to zero, even though superficially it may seem that one is doing just that. One has to see the broader picture of how these expressions are set in relation to one another to understand what is going on.

A broader perspective on these developments can be found in this recent article. For additional articles in this area see this page.

Mikhail Katz
  • 47,573
  • I am very touched. I have never been awarded a SME bounty by the "community" robot which is apparently what happened in this case. @robjohn can you clarify? – Mikhail Katz Oct 07 '16 at 08:43
  • I just noticed your comment, so I apologize for the 4 year delay. The Community bot awarded the bounty because the OP's account was removed while the bounty was active. Although my answer had 9 votes and yours had 5 at the time the bounty was awarded, yours was selected because it was written after the bounty was started (my answer was written 2 days before the bounty was started). The Community bot would have awarded the full bounty if your answer had been accepted by the OP, but since it was not, you received half of the bounty posted. – robjohn May 19 '21 at 22:23
  • Thanks. I am still touched :-) – Mikhail Katz Sep 19 '23 at 16:24
15

It is not the square of $\mathrm{d}x$ that is $0$. It is $\mathrm{d}x\land\mathrm{d}x$ that is zero.

This comes into play in differential forms and in changes of variables. Suppose that $u=x-y$ and $v=x+y$. Then $$ \begin{align} \iint f\,\mathrm{d}u\,\mathrm{d}v &=\iint f\,\mathrm{d}(x-y)\,\mathrm{d}(x+y)\\ &=\iint f\,\mathrm{d}x\,\mathrm{d}x+\iint f\,\mathrm{d}x\,\mathrm{d}y-\iint f\,\mathrm{d}y\,\mathrm{d}x-\iint f\,\mathrm{d}y\,\mathrm{d}y\\ &=\iint f\,\mathrm{d}x\,\mathrm{d}y-\iint f\,\mathrm{d}y\,\mathrm{d}x\tag{1} \end{align} $$ Why do we have $\iint f\,\mathrm{d}x\,\mathrm{d}x=\iint f\,\mathrm{d}y\,\mathrm{d}y=0$? Well, inside the outer integral, $x$ is supposed to be held constant, so the inner $\mathrm{d}x$ will vanish. The same goes for the double $y$ integral.

Another consequence of this follows from $$ \begin{align} 0 &=\iint f\,\mathrm{d}v\,\mathrm{d}v\\ &=\iint f\,\mathrm{d}(x+y)\,\mathrm{d}(x+y)\\ &=\iint f\,\mathrm{d}x\,\mathrm{d}x+\iint f\,\mathrm{d}x\,\mathrm{d}y+\iint f\,\mathrm{d}y\,\mathrm{d}x+\iint f\,\mathrm{d}y\,\mathrm{d}y\\ &=\iint f\,\mathrm{d}x\,\mathrm{d}y+\iint f\,\mathrm{d}y\,\mathrm{d}x\tag{2} \end{align} $$ That is, $\mathrm{d}y\land\mathrm{d}x=-\mathrm{d}x\land\mathrm{d}y$. Thus, the integral in $(1)$ is equal to $$ 2\iint f\,\mathrm{d}x\,\mathrm{d}y\tag{3} $$


Note

It is not the case that $\sqrt{\mathrm{d}x^2+\mathrm{d}y^2}=0$. It is the same as $\mathrm{d}x\sqrt{1+\left(\frac{\mathrm{d}y}{\mathrm{d}x}\right)^2}$ in whatever context they both make sense.

robjohn
  • 353,833
  • 1
    I'd like to point out that we often use the same symbols for infinitesimals as for differential forms, but they should not really be thought of as being the same thing. In particular, $dx \wedge dx$ is the zero differential form which should be thought of as a completely different object from the number zero. – Ian Sep 27 '16 at 13:48
  • @Ian: well $(\mathrm{d}x)^2$ is the square of an area, so saying $(\mathrm{d}x)^2=0$ does not mean $0\in\mathbb{R}$ either. $0$ simply means the zero in whatever units, ring, etc. we are working. It doesn't prevent many from saying $\mathrm{d}x\land\mathrm{d}x=0$. – robjohn Sep 27 '16 at 14:30
  • To me it seems that the OP had a completely different thing in mind when he asked this question. In particular, it seems to me that his $\mathrm{d}x^2$ is not an alternating, but rather a symmetric form. – Alex M. Sep 27 '16 at 14:40
  • 1
    @AlexM.: As I said at the beginning, the fact that the OP was confused between $\sqrt{\mathrm{d}x^2+\mathrm{d}y^2}$ and $\sqrt{1+\left(\frac{\mathrm{d}y}{\mathrm{d}x}\right)^2}\,\mathrm{d}x$ meant they were thinking about the normal (symmetric) product. – robjohn Sep 27 '16 at 17:50
  • This explanation is beyond me. What does $\mathrm{d}x\land\mathrm{d}x$ mean? –  Sep 29 '16 at 05:39
  • 1
    @Avi: until you get a more precise notion studying Differential Forms, think of $\mathrm{d}x\land\mathrm{d}y$ as the successive $\mathrm{d}x$ and $\mathrm{d}y$ that appear in an integration. The variable of the outer differential is held constant while computing the integral involving the inner differential. It is like a cross product in several ways: it essentially computes an area and it is anti-symmetric ($\mathrm{d}x\land\mathrm{d}y=-\mathrm{d}y\land\mathrm{d}x$). – robjohn Sep 29 '16 at 15:24
  • 1
    In the context of one-variable calculus, the expression $dx^2$ should be thought of as a symmetric square rather than an antisymmetric square, so the discussion of differential forms here is irrelevant to the question posed by the OP, which concerns the apparently magic disappearance of higher-order terms in $dx$ in a typical calculation of a derivative. – Mikhail Katz Oct 02 '16 at 09:05
  • @MikhailKatz: it seemed to me that the confusion was between one variable and two variable integration. Never have I seen in a one-dimensional integral $\sqrt{\mathrm{d}x^2+\mathrm{d}y^2}=0$. This seemed like a confusion between the two dimensional $\iint f(x,y)\,\mathrm{d}x\,\mathrm{d}x=0$ and the one dimensional $\sqrt{\mathrm{d}x^2+\mathrm{d}y^2}=\sqrt{1+\left(\frac{\mathrm{d}y}{\mathrm{d}x}\right)^2}\,\mathrm{d}x$. The tag [infinitesimals] was added after I answered. – robjohn Oct 06 '16 at 14:52
  • Would the downvoter care to comment? – robjohn Oct 06 '16 at 14:53
    The kind of $dx$ the OP is referring to is properly understood in the context of one-variable calculus. Even for Leibniz's characteristic triangle with $ds^2=dx^2+dy^2$, the squaring operation is a symmetric square rather than the antisymmetric square, so (anticommuting) differential forms are really quite irrelevant here, as is the fact that the square of a differential 1-form is zero. – Mikhail Katz Oct 06 '16 at 16:09
  • I don't think it is clear what kind of $\mathrm{d}x$ the OP is referring to since they asked why $\mathrm{d}x^2=0$. In my answer, I explained that $\mathrm{d}x^2\ne0$ in the expression $\sqrt{\mathrm{d}x^2+\mathrm{d}y^2}$ and that what they might have seen is $\iint\dots\mathrm{d}x^2=0$. – robjohn Oct 06 '16 at 18:23
  • @robjohn I am sorry but I have never seen $dx^2$ appear in a double integral. The expression $\sqrt{dx^2+dy^2}$ makes no sense if $dx$ and $dy$ are differential 1-forms. The expression clearly suggests the Leibnizian "characteristic triangle". – Mikhail Katz Oct 07 '16 at 08:42
    @MikhailKatz: I think you are confusing the two things I am trying to separate. The $\mathrm{d}x$ in $\mathrm{d}x\land\mathrm{d}x=0$ is the differential $1$-form: see $(1)$ and $(2)$ in my answer (two dimensional integral). The $\mathrm{d}x$ and $\mathrm{d}y$ in $\sqrt{\mathrm{d}x^2+\mathrm{d}y^2}$ represent the "legs" of the right triangle whose "hypotenuse" is $\mathrm{d}s$ (one dimensional integral). – robjohn Oct 07 '16 at 23:41
  • The OP asked a question about calculus, where $dx$ is interpreted as either a memory of a limiting process in an A-track approach, or as a genuine infinitesimal in the B-track approach. His question does not seem to be about differential forms which is a more advanced topic. – Mikhail Katz Oct 09 '16 at 07:15
  • Note that what the OP means when he writes $\sqrt{dx^2+dy^2}=0$ is that if one assumes that the square of a $dx$ is zero then necessarily this square root will be zero also (which it obviously isn't, as the OP seems to recognize). – Mikhail Katz Oct 09 '16 at 07:40
14

In standard analysis there are no infinitesimals. $dx$ is merely an element of syntax used in expressing $\frac{df}{dx}$ and $\int f(x) dx$ and nothing more. Instead everything gets defined in terms of bounds on real numbers. In particular limits are defined in terms of bounds on real numbers, which gets you derivatives and integrals. In this setting, a situation where you would see $dx^2$ if you were using infinitesimals might be differentiating $x^2$. In this case you find $\frac{(x+h)^2-x^2}{h}=2x+h$. This $h$ term is not zero...but if $x$ is not zero and $h$ is going to zero then it is much smaller than the $2x$ to which it is being added. That is, the leading order term of $(x+h)^2$ is $x^2$; the first order correction is $2xh$.

Much of calculus is purely concerned with leading order terms and first order corrections. Much of the rest of it confines attention to second order corrections. Despite this, if you had $h^k$ by itself for some large integer $k$, you would not think of it as actually being zero; you only neglect it when it is being added to something much larger than itself. Thus in the infinitesimal language you shouldn't really think of $dx^2$ as being zero, but rather so much smaller than $dx$ that $dx+dx^2$ can be treated like $dx$. (In particular, under normal circumstances $\sqrt{(dx)^2+(dy)^2}$ can be interpreted as $|dx| \sqrt{1+(dy/dx)^2}$.)
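A numerical sketch of that last point, under the assumption of a finite partition standing in for infinitesimals: over each small step, $\sqrt{\Delta x^2+\Delta y^2}$ and $|\Delta x|\sqrt{1+(\Delta y/\Delta x)^2}$ are literally the same quantity, and summing either one approximates arc length.

```python
import math

# Sketch (finite steps standing in for infinitesimals): over each small
# step, sqrt(dx^2 + dy^2) and |dx| * sqrt(1 + (dy/dx)^2) are the same
# quantity, and summing either over a partition approximates arc length.
def arc_length_sums(f, a, b, n):
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    total_root = total_factored = 0.0
    for x0, x1 in zip(xs, xs[1:]):
        dx, dy = x1 - x0, f(x1) - f(x0)
        total_root += math.sqrt(dx ** 2 + dy ** 2)
        total_factored += abs(dx) * math.sqrt(1 + (dy / dx) ** 2)
    return total_root, total_factored

# arc length of y = x^2 on [0, 1]; the exact value is
# sqrt(5)/2 + asinh(2)/4, roughly 1.4789
root_sum, factored_sum = arc_length_sums(lambda x: x ** 2, 0.0, 1.0, 10_000)
print(root_sum, factored_sum)
```

Neither sum is anywhere near zero; the "infinitesimal hypotenuses" add up to a finite length.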

This infinitesimal language can be formalized, resulting in theories which are referred to as nonstandard analysis. There are basically two ways to do this: include nilpotent infinitesimals or don't. One way to do the former is smooth infinitesimal analysis: this system contains nonzero numbers with some power of them being zero. For instance for a nilsquare infinitesimal $dx$ you have $f(x+dx)=f(x)+f'(x)dx$ as an exact equality in SIA.

SIA is a somewhat foreign theory, for at least two reasons. First, some finesse with logic is required to make it work without contradictions. You can't define SIA in classical logic; it is an inconsistent theory there, because (as you hinted at) one can use excluded middle and the field axioms to prove that $(dx)^2=0$ implies $dx=0$. Intuitionistic logic dodges this issue. Second, SIA, as the name suggests, describes a "smooth universe": all the functions in it are infinitely differentiable. Standard analysis deals with less regular functions quite routinely.

The main system containing infinitesimals, all powers of which are nonzero, is hyperreal analysis. Hyperreal analysis is suited to describe exactly the same things as standard analysis, in a certain precise and very strong sense. Rather than using nilpotent infinitesimals to implement things like linear approximation, hyperreal analysis uses the "standard part" operation, which takes a number with an ordinary real part and an infinitesimal part and "discards" the infinitesimal part.

I only mention these so that you know that there is some power beyond just intuition in the use of infinitesimals. Nevertheless I would strongly encourage you to learn the meaning of everything in the standard framework.

Revising based on the bounty commentary: first of all, one should not view $\sqrt{dx^2+dy^2}$ (intuitively the length of an infinitesimal line segment) as being zero. It is exactly the same as $|dx| \sqrt{1+(dy/dx)^2}$. (We might need the absolute value because $x$ might rise or fall along the path.) A more general way to handle this would be to parametrize the curve in terms of an additional variable $t$, so that $\sqrt{dx^2+dy^2}=dt \sqrt{(dx/dt)^2 + (dy/dt)^2}$. Now $t$ only goes up (by our choice) so no absolute value is required.

As for writing $dx+dx^2 \approx dx$, it really depends on the context. With derivatives, the whole point is not to exactly write down the function, it's all about linear approximation. Thus for instance when I write $(x+h)^2 \approx x^2+2xh$, I am doing that because I don't want to pay attention to terms of higher order than $h$, because those first two terms (the largest ones, if $h$ is small enough) are enough for whatever purpose I have.

On the other hand, a basic philosophy in calculus and (standard) analysis is that one can prove that two things are equal by proving that they are arbitrarily close together. So to follow your example, when you expand out a proof that $\int_0^\pi \sin(x) dx = 2$, you might show that there is a lower sum for $\int_0^\pi \sin(x) dx$ which is at least $2-\epsilon$ and an upper sum which is at most $2+\epsilon$, for each $\epsilon>0$. The partition depends on $\epsilon$, and that dependence is exactly where the "limit" operation is hidden. (In practice we don't do this, we just use the FTC, but the FTC is proven in this fashion.)
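Sketching that squeeze numerically (a crude stand-in for the actual $\epsilon$-argument, with names of my choosing): for even $n$, the uniform partition of $[0,\pi]$ contains $\pi/2$, so $\sin$ is monotone on every subinterval and the endpoint values give the true infimum and supremum there.

```python
import math

# Crude numerical stand-in for the epsilon-argument: lower and upper
# Riemann sums for sin on [0, pi] squeeze the exact value 2.  For even n
# the uniform partition contains pi/2, so sin is monotone on every
# subinterval and the endpoint min/max are the true inf/sup there.
def riemann_bounds(n):
    xs = [math.pi * i / n for i in range(n + 1)]
    lower = upper = 0.0
    for x0, x1 in zip(xs, xs[1:]):
        y0, y1 = math.sin(x0), math.sin(x1)
        lower += min(y0, y1) * (x1 - x0)
        upper += max(y0, y1) * (x1 - x0)
    return lower, upper

lo, hi = riemann_bounds(1000)
print(lo, hi)   # lo <= 2 <= hi, and hi - lo shrinks like 1/n
```

Refining the partition makes the gap as small as you like, which is exactly the "arbitrarily close" argument hidden in the limit.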

Ian
  • 104,572
  • Ian, the OP specifically included a [tag:infinitesimals] tag, so it's a bit odd to start an answer with a comment like "in standard analysis there are no infinitesimals". You may provide some valuable mathematical details, but strictly speaking you are not answering the question. – Mikhail Katz Sep 20 '23 at 12:01
10

This post is meant as an extended comment rather than an answer.

Ian points out that there are three ways to interpret the symbol "$dx$":

  1. infinitesimal analysis;
  2. hyperreal analysis;
  3. differential forms.

The first two approaches are somewhat less standard, I think, and indeed, I know very little about either. As such, I'd like to comment on the question from the perspective of (3) differential forms.


In the theory of differential forms, the following five objects should be distinguished: $$dx, \ \ d(x^2), \ \ (dx)^2, \ \ \ dx \wedge dx, \ \ d(dx).$$

  • The object $dx$ is a "differential $1$-form." It is not zero.
  • The object $d(x^2)$ is equal to $2x\,dx$, which is also a "differential $1$-form." It is also not zero.
  • The object $(dx)^2$ is a "smooth quadratic form." It is not zero. Here, the squaring is an operation called "symmetric product."
  • The object $dx \wedge dx$ is a "differential $2$-form." It is equal to zero. The $\wedge$ symbol is an operation called "wedge product." The wedge product has the funny property that $dx \wedge dx = 0$, whereas $dx \wedge dy = -dy \wedge dx$ is not zero.
  • The object $d(dx)$ is a "differential $2$-form." It is equal to zero. In fact, the symbol $d$ is called the "exterior derivative," and has the funny property that $d(df) = 0$ for any function $f$.
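For readers who like to compute: the antisymmetry of the wedge product can be sketched at a single point by modeling $dx$ and $dy$ as coordinate projections of tangent vectors, and $a \wedge b$ as the antisymmetrized product. This is a toy model of my own, not a full treatment of differential forms.

```python
# Toy model (not a full treatment of forms): at a point of the plane,
# a 1-form is a linear map on tangent vectors -- here dx and dy are the
# coordinate projections -- and the wedge product of two 1-forms is the
# antisymmetrized product (a ^ b)(u, v) = a(u) b(v) - a(v) b(u).
def dx(v): return v[0]
def dy(v): return v[1]

def wedge(a, b):
    return lambda u, v: a(u) * b(v) - a(v) * b(u)

u, v = (1.0, 2.0), (3.0, 4.0)
print(wedge(dx, dx)(u, v))   # 0.0: dx wedge dx vanishes identically
print(wedge(dx, dy)(u, v))   # 1*4 - 3*2 = -2.0, a signed area
print(wedge(dy, dx)(u, v))   # 2*3 - 4*1 = +2.0 = -(dx wedge dy)(u, v)
```

The vanishing of $dx \wedge dx$ here is an algebraic fact about the antisymmetrized product, quite unlike the symmetric square $(dx)^2$.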

While this does not answer the question per se, I hope this clarification will be useful to understanding.

Jesse Madnick
  • 32,819
2

Let us try to understand stuff at the intuitive level with the help of a toy problem. If you are looking for advanced mathematics, please skip this answer.

Suppose, a class of children is going from a place A to B. At the beginning of the journey, the teacher says, "Hi class! There is a bit of a problem. The speedometer of the bus is not working. But, we would need to calculate the speed of the bus for some time. Can we do it? I can tell you that for the next few seconds, the distance travelled by the bus $x = t^2$, where $x$ is in meters and $t$ is in seconds. Specifically, I want you to find out the speed at $t=2$ and $t=3$ seconds."

The class which has no concept of calculus, is puzzled at first. But slowly, they try to figure out some approximations.

Siddhartha: If we want to find the speed at $t=2$ seconds, we can have a look at the distance travelled between $t=1$ and $t=2$. That would be 3 meters, so we can say that the speed is greater than 3 m/s.

Akanksha: Good point. But instead of the previous one second, we can have a look at the next one second. In the next second, the bus travels 5 m. So, the speed is less than 5 m/s. In fact, we can say that the speed is between 3 m/s and 5 m/s at $t=2$ seconds.

Harsh: But why are we taking the time gap to be 1 second? If we reduce the time gap, we will get a better approximation, no?

Siddhartha: Lovely! Let's do it with a time gap of 1/2 second. Then... (starts putting up numbers on paper and doing some addition and subtraction) Wow. So, with a time gap of 1/2 second, we can say that our speed is between 3.5 and 4.5 m/s.

Akanksha: And we can repeat this process for smaller times as well. In fact, I have a feeling that if we take the time gap to be 1/4 second, we will get a speed between 3.75 and 4.25 m/s.

Teacher: Why don't you check that?

After a few seconds, Harsh verifies the claim. At this point, the teacher asks them to prove that this holds for general $t$ and $\Delta t$.

So, the students do the calculation $v = ((t + \Delta t)^2 - t^2)/\Delta t = (2t\,\Delta t + (\Delta t)^2)/\Delta t = 2t + \Delta t$

So, if we take the time gap to be $\Delta t$, we can say that our velocity lies between, $2t -\Delta t$ and $2t + \Delta t$. So, if we put our $\Delta t$ to be very small (approximately zero), we get our velocity as $2t$. We can call this our velocity just now.

Teacher: Excellent! The technical term for this is instantaneous velocity. Can you repeat the same procedure if I gave you $x = t^3$ instead?

Students (all excited): Yes sure!

$v = ((t + \Delta t)^3 - t^3)/\Delta t = (3t^2\,\Delta t + 3t(\Delta t)^2 + (\Delta t)^3)/\Delta t = 3t^2 + 3t\,\Delta t + (\Delta t)^2$

Siddhartha: Teacher, I am getting this expression. What should I do now?

Teacher: Try for $t=2$. See, what happens?

Siddhartha: If I put $\Delta t$ to be very small, say 0.0001, I get values very close to 12.

Teacher: Lovely. What about $t=3$? General $t$?

Siddhartha: I can always put $\Delta t$ to be very very small. So, the only term which remains is $3t^2$.

Teacher (after waiting for others to catch up): Excellent! Now, do you notice that, in effect, when we are expanding $(t + \Delta t)^n$, we can for our purposes ignore all powers of $\Delta t$ greater than $1$? So, we could have expanded $(t + \Delta t)^2$ as $t^2 + 2t\,\Delta t$ and $(t + \Delta t)^3$ as $t^3 + 3t^2\,\Delta t$ and still got the same answer.

Students fall silent for some time. After some time, a student breaks the silence.

Akanksha: It is because, in the division we have the power of $\Delta t$ as 1. So, any terms of higher power would become very small, when we make $\Delta t$ small. In fact, if we take $\Delta t$ to be almost zero, the higher powers would all be almost zero, since if $\Delta t = 0.0001$, its higher powers would be even smaller, in fact, much smaller.

Teacher: Excellent thinking Akanksha. In fact, all of you have done a great job. You have figured out the basics of calculus by yourself. Let me just fill in some nomenclature so that we can share with others our line of thought.

When we say that $\Delta t$ is almost zero, we write it as $\lim_{\Delta t \rightarrow 0}$. Since this is used many, many times, we actually save a lot of effort just by writing $dt$ instead of writing $\Delta t$ under $\lim_{\Delta t \rightarrow 0}$.

So, when the denominator has $dt$ to the first power, we can safely put $dt^2$, $dt^3, \ldots$ as $0$. However, if the denominator has a higher power of $dt$, then we obviously cannot do this.

Can you understand this, my dear students?

Harsh: So, you are saying that we can ignore all powers of $dt$ higher than the lowest power in denominator.

Teacher: Yes.

Harsh: Would it also hold for non-integral powers?

Teacher: You say?

Harsh: It should, since $0.0001^{3/2}$ is still smaller than 0.0001, which we are taking to be almost zero.

Teacher: Lovely!


In our case, we can't say that $\sqrt{(dx)^2 + (dy)^2} = 0$, without more context. Specifically, the context required is whether or not we can ignore infinitesimal change in $x$. Neither can we say that $dx \sqrt{1 + (dy/dx)^2}$ is not zero for the same reason.

In fact, the two ($dx \sqrt{1 + (dy/dx)^2}$ and $\sqrt{(dx)^2 + (dy)^2}$) are identical. If one is zero, the other has to be.

What we can say is that $\sqrt{1 + (dy/dx)^2}$ is non-zero: it is the square root of ($1$ plus the square of something), hence the square root of something that is at least $1$.
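The classroom computation for $x = t^3$ can also be checked numerically; as in the dialogue, the terms carrying $\Delta t$ fade as the time gap shrinks (a sketch with function names of my choosing):

```python
# Numerical sketch of the classroom computation (names are mine): the
# difference quotient for x = t^3 is exactly 3t^2 + 3t*dt + dt^2, and
# the terms carrying dt fade as the time gap shrinks.
def speed_estimate(t, dt):
    return ((t + dt) ** 3 - t ** 3) / dt

t = 2.0
for dt in (0.5, 0.01, 0.0001):
    print(dt, speed_estimate(t, dt))   # approaches 3 * t**2 = 12
```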

1

The question was: why is it considered that $(dx)^2 = 0$? In my view there are two answers, and they are related to one another.

One answer is that $dx$ can be regarded as a differential form, and then we have the Grassmann algebra of differential forms, which is designed to work correctly for integration, area, volume, and so on. This has been discussed in other posts.

Another answer is that $dx$ could be interpreted as a nilpotent infinitesimal, and then $dx^2 = 0$ is motivated by thinking that $dx$ is so small that $dx^2$ can be ignored. In that theory, Bell and Lawvere use the notion that an element $*$ with square $0$ can be regarded as indistinguishable from $0$. This requires the use of intuitionistic logic in the theory. But if you adjoin a formal element $*$ with square $0$ to the real numbers, extending them to $\hat{\mathbb{R}} = \{ a + b* \mid a, b \text{ standard reals}\}$, then you obtain a domain where a lot of calculus can be done. Again, this is discussed here in other posts. So an element $*$ with square $0$ is also a way to think of the $dx$ with $dx^2 = 0$.

Now my point is that if you have $dx$ and $dy$ linearly independent over $\mathbb{R}$, and you assume that $(a\,dx + b\,dy)^2 = 0$ for all real numbers $a$ and $b$, then you deduce (assuming we are generating an associative algebra) that $dx\,dy = -dy\,dx$. Thus the Grassmann algebra can be viewed as a consequence of the principle that all "infinitesimals" should have square zero. This remark is meant to be a chapter in the discussion of nilpotent infinitesimals; the usual definitions of differential forms are done on a different basis. I hope this makes my previous post clear.
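A loose sketch of that extension in code (often called "dual numbers"; the class and method names here are illustrative, not from any particular library): multiplication simply drops the $*^2$ term, and evaluating a polynomial at $x + *$ reads the derivative off the coefficient of $*$.

```python
# Sketch of the extension { a + b* } with *^2 = 0 ("dual numbers";
# illustrative names): multiplication drops the *^2 term, and evaluating
# a polynomial at x + * reads the derivative off the coefficient of *.
class Dual:
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b   # represents a + b*
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + b*)(c + d*) = ac + (ad + bc)*  since *^2 = 0
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)
    __rmul__ = __mul__

star = Dual(0.0, 1.0)            # the nilpotent element *
x = Dual(3.0) + star             # 3 + *
y = x * x                        # 9 + 6*, so the derivative of x^2 at 3 is 6
print(y.a, y.b)
```

Note that $*$ itself is not the number $0$; only its square is, which is exactly what makes the first-order term survive.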

0

While there are several outstanding answers here, there are a couple of intuitive ideas I would point out that are made rigorous through differential forms, Grassmann algebras, and other such topics, showing that this can be context dependent. Intuitively, in pre-modern math/physics/classical geometry etc., we can look at a differential as only "mattering" when it is on the order of the system we are working with. For example, if the lowest-order form is a 2-form, like $dx\,dy$, it is the "largest", so upon a double integration, 3-forms (like $dx\,dy\,dz$) are still being multiplied by an infinitesimal, returning $0$. This is nonrigorous and depends on the context (don't take it too seriously); however (as in the drawing below), it can be quite elegant.

My favorite (context-dependent) proof that differentials square to zero is from geometry. Take a square, given by the points $(0,0),(0,t),(t,t),(t,0)$. Then the area is $A = t^2$. Taking the differential area, we get

$$ A + dA = A + tdt + tdt + dt^{2} $$ $$ \implies dA = 2tdt + dt^{2} $$

comparing this with the difference quotient, we find that $$ dA = 2tdt $$

But this means that

$$ 2tdt + dt^{2} = 2tdt $$

ergo $dt^{2} = 0$.

[figure: a square of side $t$ enlarged to side $t+dt$, showing two strips of area $t\,dt$ and a corner square of area $dt^2$]

It is worth noticing that for higher-order geometry the same type of behavior occurs with, for example, the volume of a box: we find that $dV = 3t^{2}dt$, which corresponds to the infinitesimals of the faces, while the edges' and vertices' differentials are of higher order and thus zero.
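As a numerical footnote to the geometric argument above (a sketch, with illustrative names): the extra area is exactly $2t\,dt + dt^2$, and the corner's share of the increment becomes negligible as $dt$ shrinks.

```python
# Numerical footnote to the geometric argument (illustrative names):
# growing the square from side t to t + dt adds exactly 2*t*dt (two
# strips) plus dt^2 (the corner), and the corner's share of the total
# increment becomes negligible as dt shrinks.
def area_increment(t, dt):
    dA = (t + dt) ** 2 - t ** 2   # exactly 2*t*dt + dt**2
    corner_share = dt ** 2 / dA   # fraction contributed by the corner
    return dA, corner_share

for dt in (0.1, 0.001, 0.00001):
    print(dt, area_increment(3.0, dt))
```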