6

I've read the other posts on this site about whether you can treat $\frac{dy}{dt}$ as a fraction. There are a lot of conflicting opinions, but many seem to be saying that treating it as a fraction works fine for single-variable calculus as long as you don't write anything obviously counterproductive, like $$\left( \frac{dy}{dx} \right)^2 = \frac{(dy)^2}{(dx)^2} \ \ \text{ or } \ \ 2^{dy/dx} = \sqrt[dx]{2^{dy}}.$$

I've also read that $y''(t)$ can be written as $\frac{d^2 y}{dt^2}$ with the reasoning that $$y''(t) = \frac{d\frac{dy}{dt}}{dt} = \frac{d}{dt}\frac{dy}{dt} = \frac{d^2 y}{(dt)^2} = \frac{d^2 y}{dt^2}$$

This seems to make sense until I consider an expression like $\frac{dy'}{dy}$ (the derivative of $y'(t)$ with respect to $y$). If I work with Leibniz notation in the same way as before, I get $$\frac{dy'}{dy}=\frac{d\frac{dy}{dt}}{dy}=\frac{d}{dy}\frac{dy}{dt}=\frac{d}{dt}\frac{dy}{dy}=\frac{d}{dt}1=0$$

But from my understanding of derivatives and the chain rule, it makes sense to say $$\frac{d\frac{dy}{dt}}{dy} \equiv \frac{d^2y}{dt^2}\frac{dt}{dy}$$ which is not necessarily zero.
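
As a sanity check on this formula (an illustrative example chosen here, not a general proof), assume $y=t^2$ with $t>0$, so that $t=\sqrt{y}$ and $y'=2t=2\sqrt{y}$. Both routes give the same nonzero value: $$\frac{d^2y}{dt^2}\frac{dt}{dy} = 2\cdot\frac{1}{2t} = \frac{1}{t}, \qquad \frac{d}{dy}\left(2\sqrt{y}\right) = \frac{1}{\sqrt{y}} = \frac{1}{t}.$$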

My question is, are there any consistent rules for how to use Leibniz notation? Maybe you have to treat $\frac{d}{dt}$ and $\frac{dy}{dt}$ as "atomic" even in single-variable calculus, which would mean the $\frac{d^2y}{dt^2}$ notation is extremely misleading. Or am I doing/understanding something wrong here?

Edit: Here's one last example of confusing Leibniz notation. A seemingly reasonable but incorrect justification for the 2nd derivative chain rule: $$\frac{d^2y}{(dx)^2} = \frac{d^2y}{(dt)^2}\frac{(dt)^2}{(dx)^2} = \frac{d^2y}{(dt)^2}\left(\frac{dt}{dx}\right)^2$$ The correct but very notationally confusing 2nd derivative chain rule: $$\frac{d^2y}{dt^2}=\frac{d^2y}{dx^2}\left(\frac{dx}{dt}\right)^2+\frac{dy}{dx}\frac{d^2x}{dt^2}$$
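
To see the difference concretely (a test case assumed here for illustration), take $y=x^2$ and $x=t^2$ with $t>0$, so $y=t^4$. The correct rule reproduces $\frac{d^2y}{dt^2}=12t^2$: $$\frac{d^2y}{dx^2}\left(\frac{dx}{dt}\right)^2+\frac{dy}{dx}\frac{d^2x}{dt^2} = 2\,(2t)^2 + 2x\cdot 2 = 8t^2 + 4t^2 = 12t^2,$$ whereas the naive cancellation predicts $$\frac{d^2y}{dx^2} = \frac{d^2y}{dt^2}\left(\frac{dt}{dx}\right)^2 = 12t^2\cdot\frac{1}{4t^2} = 3,$$ which disagrees with the actual value $\frac{d^2y}{dx^2}=2$.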

  • 1
    No, every notation has limitations. Although at times it creates exotic cases that happen to be true, they are not true because of the notation. What is really behind the notation needs to be proved. The best you can do is to look at infinitesimal algebras – jimjim Jul 11 '24 at 10:15
  • 2
    "Consistent rules" are only the ones motivated by a proof. Every notation can be manipulated enough to create some sort of contradiction. Remember that what you are using are shortcuts to write derivatives in an easier way, what lies behind are rules that have been proven to be true – Zima Jul 11 '24 at 10:32
  • Ah, the first issue is on your third line (mathmode). The dy in the numerator and the dy in the denominator are slightly different things. I've added some parentheses for clarity $\frac{d(y')}{dy}=\frac{d(\frac{d(y)}{dt})}{dy}=\frac{d}{dy}(\frac{d(y)}{dt})$ – ness Jul 11 '24 at 12:05
  • 1
    You'll kind of run into some issues assuming $\frac{d}{dy}\frac{d}{dt}=\frac{d}{dt}\frac{d}{dy}$ too, but this may be a little pedantic as it is often the case – ness Jul 11 '24 at 12:10
  • I addressed your concerns about the formula for the second derivative in the updated version of my answer. – Mikhail Katz Jul 15 '24 at 09:12

3 Answers

2

The justification you gave for denoting the expression $y''(dt)^2$ by $d^2y$ is not really a proof but rather a heuristic argument, and therefore unreliable in general (as you yourself have illustrated). Viewing the derivative and the second derivative as quotients is possible in the context of a framework that allows for infinitesimals, such as Robinson's infinitesimal analysis. Before we get to the second derivative, let us see how using $\frac{dy}{dt}$ for the first derivative is justified.

We start with an infinitesimal increment $\Delta t$ of the independent variable $t$, and form the corresponding change of the dependent variable $y$ by setting $\Delta y=f(t+\Delta t)-f(t)$. Then we can form the ratio of infinitesimals $\frac{\Delta y}{\Delta t}$ but in general it will not be the derivative but only infinitely close to it. Thus we define the derivative as the standard part (or shadow) of the ratio $\frac{\Delta y}{\Delta t}$ (when it exists). We then define a new dependent variable $dy$ by setting $dy=f'(t)dt$, where $dt=\Delta t$ (for the independent variable, there is no difference between $\Delta t$ and $dt$). Then $\Delta y$ and $dy$ differ by an infinitesimal term that's negligible compared to $\Delta y$.
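
For instance (a worked case assumed here for illustration), with $y=f(t)=t^2$ and $t\neq 0$: $$\Delta y = (t+\Delta t)^2 - t^2 = 2t\,\Delta t + (\Delta t)^2, \qquad \frac{\Delta y}{\Delta t} = 2t + \Delta t, \qquad f'(t) = \operatorname{st}\!\left(2t+\Delta t\right) = 2t.$$ Then $dy = 2t\,dt$, and $\Delta y - dy = (\Delta t)^2$, so that $\frac{\Delta y - dy}{\Delta y} = \frac{\Delta t}{2t+\Delta t}$ is infinitesimal.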

For the second derivative, the pertinent observation is that one can define $f''(t)$ (when it exists) as the standard part of the ratio $\frac{f(t+\Delta t) +f(t-\Delta t) -2f(t)}{\Delta t^2}$. In this sense it is a "second difference" (to use a Leibnizian term). This motivates the definition of $d^2y$ as the expression $f''(t)(\Delta t)^2$.
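
As an illustration (with $f(t)=t^3$ assumed here as a test function), the second difference can be computed exactly: $$f(t+\Delta t)+f(t-\Delta t)-2f(t) = \left(t^3+3t^2\Delta t+3t\,\Delta t^2+\Delta t^3\right)+\left(t^3-3t^2\Delta t+3t\,\Delta t^2-\Delta t^3\right)-2t^3 = 6t\,\Delta t^2,$$ so the ratio equals $6t=f''(t)$ and, under the definition above, $d^2y = 6t\,(\Delta t)^2$.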

Incidentally, writing $\left( \frac{dy}{dx} \right)^2 = \frac{(dy)^2}{(dx)^2}$ or $2^{dy/dx} = \sqrt[dx]{2^{dy}}$ is perfectly legitimate in the context of Robinson's infinitesimal analysis, by the transfer principle.

Note that the formula for the chain rule, $\frac{dy}{dt}=\frac{dy}{dx} \frac{dx}{dt}$, is not proved by formal cancellation of $dx$ in the numerator and the denominator; rather, a careful proof is required. The problem here is that the $dx$ in the denominator is an independent variable whereas the $dx$ in the numerator is a dependent variable, so they can't be canceled without further argument. Similarly, in the formula for the second derivative obtained by applying the Leibniz rule to the chain rule formula, in the expression $\frac{d^2y}{dx^2}\big(\frac{dx}{dt}\big)^2$ the $dx$ in the numerator and the $dx$ in the denominator cannot be canceled, because the $dx$ in the denominator is an independent variable and the $dx$ in the numerator is a dependent variable (i.e., dependent on $t$). In any piece of mathematical notation, one has to strike a balance between being concise, on the one hand, and being explicit, on the other. Normally one would have to write $dy(x,dx)$ in place of $dy$, and so on, to be completely explicit, but then the formulas would be harder to read.

At any rate, paradoxes arise only if one insists on applying formal manipulations of symbols without thinking of their meaning.

Mikhail Katz
  • I don't see how infinitesimal analysis motivates the 2nd derivative notation of $\frac{d^2y}{dx^2}$. From my understanding of this paper, viewing derivatives as quotients necessitates we write the second derivative as $\frac{d^2y}{dx^2} - \frac{dy}{dx}\frac{d^2x}{dx^2}$. – Ishaan Jain Jul 12 '24 at 09:31
  • To clarify my question, is the 2nd derivative notation of $\frac{d^2y}{dx^2}$ legitimate in the context of infinitesimal analysis? I'm skeptical because the formula for the chain rule for the second derivative: $\frac{d^2y}{dt^2}=\frac{d^2y}{dx^2}(\frac{dx}{dt})^2+\frac{dy}{dx}\frac{d^2x}{dt^2}$ looks very counterintuitive with this notation and doesn't seem to lend itself to an interpretation of derivatives being quotients that can be algebraically manipulated. – Ishaan Jain Jul 13 '24 at 01:24
  • With our normal understanding of independent and dependent variables, $\frac{dx}{dx}=1$ and $\frac{d^2x}{dx^2}=0$. @IshaanJain – Mikhail Katz Jul 14 '24 at 09:43
  • @MikhailKatz - in the notation discussed in the paper, there is no need to identify dependent and independent variables. Just as we generally don't need to think about the nature of $x$ when multiplying/dividing both sides by $x$ in algebra (except for the exceptional case of $x = 0$), if we use a proper notation that supports it there is no reason that we need to think about the nature of $dx$ when multiplying/dividing both sides by $dx$ (except again in the exceptional case of $dx = 0$). – johnnyb Jul 15 '24 at 14:45
  • @johnnyb, I am not familiar with a rigorous approach to infinitesimal differentials that does not distinguish between independent and dependent variables. – Mikhail Katz Jul 16 '24 at 07:53
  • @MikhailKatz - I agree - and the reason is because they are using a notation with fundamental failures. If you fix the notation you no longer need the distinction. I essentially treat all variables as functions of an unknown independent variable, even if it winds up being an identity function. This makes everything 1000x more straightforward. – johnnyb Jul 16 '24 at 16:28
  • @MikhailKatz - let's call our hypothetical independent variable $q$. Therefore, $y$ is actually $y(q)$ and $x$ is actually $x(q)$. We can then define the differential $dy$ to be $y(q + \epsilon) - y(q)$. For more information see Section 4.8 of "Total and Partial Differentials as Algebraically Manipulable Entities". – johnnyb Jul 16 '24 at 16:30
  • @johnnyb, this is interesting work but you can't change established notation without causing great confusion (and also doubts about your work). Since time immemorial, the notation $\frac{d^2 y}{dx^2}$ has been used for the second derivative. You can't just change that at will. If you like to give a different meaning to this piece of notation, you could use another letter in place of $d$ (I don't know if $\delta$ would be appropriate here). But attaching a different meaning to an established piece of notation is generally not a good idea in mathematics, for better or worse. – Mikhail Katz Jul 17 '24 at 06:34
  • @MikhailKatz - I don't know why we must perpetuate a problem just because it is widespread. I'm keeping the meaning of $dy$ and $dx$ from the first derivative. The standard notation changes the meaning of these in higher order derivatives. Note that my notation stems ONLY from applying the quotient rule to the quotient $\frac{dy}{dx}$. That's it! I'm merely maintaining the definition of terms from the first derivative. – johnnyb Jul 17 '24 at 12:32
  • "I'm keeping the meaning of dy and dx from the first derivative": Can you elaborate? How do you define the x-increment and the corresponding change in y without speaking of independent and dependent variables? @johnnyb – Mikhail Katz Jul 18 '24 at 08:08
  • Just using a dummy variable $q$ to be the independent variable, and then everything else is a dependent variable. So it is the $q$-increment that is fundamental, but is just left out because it isn't actually needed practically. – johnnyb Jul 18 '24 at 14:26
  • If you tell calculus students that they need a third variable $q$ to differentiate $y=x^2$ they would probably think again. @johnnyb – Mikhail Katz Jul 18 '24 at 14:31
  • Students don't need it, just the formalism. – johnnyb Jul 18 '24 at 14:33
  • It doesn't seem appropriate to teach a formalism without explaining how and why it works. @johnnyb – Mikhail Katz Jul 18 '24 at 15:09
1

For higher-order differentials to work algebraically, you need to adopt a notation that is a little non-standard. The typical notation, $\frac{d^2y}{dx^2}$ does not allow for algebraic manipulations.

However, if you take the first derivative, $\frac{dy}{dx}$, and take the derivative of it, you will notice that it is indeed a quotient, and therefore needs to use the quotient rule. The result will be that the notation for the second derivative is $\frac{d^2y}{dx^2} - \frac{dy}{dx}\frac{d^2x}{dx^2}$. This is fully algebraically manipulable. For instance, by straightforward algebraic manipulations, you can establish an inverse function theorem for the second derivative. Because the second derivative of $x$ with respect to $y$ is (using this notation) $\frac{d^2x}{dy^2} - \frac{dx}{dy}\frac{d^2y}{dy^2}$, we can see how to get there as follows:

$$\begin{aligned} y'' &= \frac{d^2y}{dx^2} - \frac{dy}{dx}\frac{d^2x}{dx^2} \\ y''\, dx^3 &= d^2y\,dx - dy\,d^2x \\ y''\, \frac{dx^3}{dy^3} &= \frac{d^2y}{dy^2}\frac{dx}{dy} - \frac{d^2x}{dy^2} \\ -y''\,\frac{dx^3}{dy^3} &= \frac{d^2x}{dy^2} - \frac{d^2y}{dy^2}\frac{dx}{dy} \\ -y'' \left(\frac{1}{y'}\right)^3 &= x'' \end{aligned}$$
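
A quick check of the final line (using $y=x^2$ with $x>0$, a test function assumed here for illustration): $y'=2x$ and $y''=2$, so the formula gives $$x'' = -\frac{y''}{(y')^3} = -\frac{2}{(2x)^3} = -\frac{1}{4x^3},$$ which matches differentiating $x=y^{1/2}$ directly: $\frac{d}{dy}\left(\tfrac12 y^{-1/2}\right) = -\tfrac14 y^{-3/2} = -\frac{1}{4x^3}$.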

This is more fully fleshed out in two papers I was involved in, "Extending the Algebraic Manipulability of Differentials" and "Total and Partial Differentials as Algebraically Manipulable Entities". But, in short, there is no problem in manipulating differentials algebraically as long as you use the proper notation, with notation being especially important for higher-order differentials and partial differentials. Most of the theoretical objections have come from general objections to infinitesimals, but the hyperreal number system has demonstrated that there is no logical problem with such numbers.

To address your specific example, though, the problem is that you are treating $d$ by itself as an entity. It is not. It is an operator. $dx$ is actually $d(x)$. You can think about it kind of like a function, such as $\sin(x)$, at least when considering parenthesis movement. The problem in your example is that you haven't properly parenthesized it. I will expand parentheses here so it is easy to spot the problem:

$$\frac{dy'}{dy} = \frac{d(y')}{d(y)} = \frac{d\left(\frac{d(y)}{d(x)}\right)}{d(y)}$$

As you can see, you can't get the $dy$s to cancel out because one is inside the other differential. Now, we can compute the differential, which yields:

$$\frac{\frac{d^2y}{dx} - dy \frac{d^2x}{dx^2}}{dy} = \frac{d^2y}{dx\,dy} - \frac{d^2x}{dx^2}$$

So, as you can see, you can do manipulation, it's just that you have to keep in mind the type of operations you are actually doing and what they are being applied to. Similar to functions, you can't just willy-nilly move the argument of an operator inside or outside the operator without justification.
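
To illustrate with concrete differentials (a sketch under assumptions not stated in the original: a hypothetical independent parameter $q>0$ with $d^2q=0$, and $x=q^2$, $y=x^2=q^4$), we have $$dx = 2q\,dq,\qquad d^2x = 2\,dq^2,\qquad dy = 4q^3\,dq,\qquad d^2y = 12q^2\,dq^2,$$ and therefore $$\frac{d^2y}{dx\,dy} - \frac{d^2x}{dx^2} = \frac{12q^2}{8q^4} - \frac{2}{4q^2} = \frac{3}{2q^2} - \frac{1}{2q^2} = \frac{1}{q^2} = \frac{1}{x}.$$ This agrees with the direct computation $\frac{d(y')}{dy} = \frac{d(2x)}{dy} = 2\,\frac{dx}{dy} = \frac{2}{2x} = \frac{1}{x}$. Note that $\frac{d^2x}{dx^2} = \frac{1}{2q^2}$ is not zero in this setup.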

Edit: OP asked for elaboration on the chain rule for the second derivative.

For the chain rule for the second derivative, the chain rule still "works" but it is not necessary. In other words, the chain rule simply tells you which algebraic manipulations will bring you from one derivative to another.

So, let's look at what the chain rule is (we will use Arbogast/Euler D-notation to avoid confusion):

$$ D^2_t(y) = D^2_x(y)\,(D^1_t(x))^2 + D^1_x(y)\,D^2_t(x) $$

So, the left side is $\frac{d^2y}{dt^2} - \frac{dy}{dt}\frac{d^2t}{dt^2}$. So, let's just do the algebra of the right-hand side and see what happens. I'll over-parenthesize so you can more easily see the components.

$$\begin{aligned} & \left(\frac{d^2y}{dx^2} - \frac{dy}{dx}\frac{d^2x}{dx^2}\right)\, \left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dx}\right)\,\left(\frac{d^2x}{dt^2} - \frac{dx}{dt}\frac{d^2t}{dt^2}\right) \\ =\; & \left(\frac{d^2y}{dx^2} - \frac{dy}{dx}\frac{d^2x}{dx^2}\right)\, \left(\frac{dx^2}{dt^2}\right) + \left(\frac{dy}{dx}\right)\,\left(\frac{d^2x}{dt^2} - \frac{dx}{dt}\frac{d^2t}{dt^2}\right) \\ =\; & \left(\frac{d^2y}{dt^2} - \frac{dy}{dx}\frac{d^2x}{dt^2}\right) + \left(\frac{dy}{dx}\frac{d^2x}{dt^2} - \frac{dy}{dt}\frac{d^2t}{dt^2}\right) \\ =\; & \frac{d^2y}{dt^2} + \left(-\frac{dy}{dx}\frac{d^2x}{dt^2} + \frac{dy}{dx}\frac{d^2x}{dt^2}\right) - \frac{dy}{dt}\frac{d^2t}{dt^2} \\ =\; & \frac{d^2y}{dt^2} - \frac{dy}{dt}\frac{d^2t}{dt^2} \end{aligned}$$

So, you can see, the chain rule for the second derivative simply encodes the steps we would need to algebraically manipulate one derivative into the other.

johnnyb
  • So if I understand correctly, $\frac{dy}{dx}, \frac{d^2y}{dx^2},..., \frac{d^ny}{dx^n}$ are the conventional Leibniz notations for derivatives, and are clearly "written" as quotients of differentials, but out of them only $\frac{dy}{dx}$ is the appropriate notation for a derivative when written as a quotient of differentials? – Ishaan Jain Jul 17 '24 at 23:40
  • @IshaanJain - correct. It is the only one where you can freely multiply/divide differentials and continue to get valid answers. – johnnyb Jul 18 '24 at 00:07
  • 1
    Ok thanks. I didn't think notation could be this inconsistent so it was hard to wrap my head around initially. For completeness, could you please include in your answer how the revised notation looks in the context of the 2nd derivative chain rule? – Ishaan Jain Jul 18 '24 at 03:24
  • 1
    @IshaanJain - updated it for you. – johnnyb Jul 18 '24 at 15:03
0

Leibniz notation is consistent and, I would go further, useful: it allows shortcuts and lets you carry out computations with lighter notation. It does take some practice, but you will get used to it.

Unfortunately, there is no real set of rules ensuring consistency, or perhaps there exists a single one implicitly: each step has to be justified. In the concrete case you presented, the "wrong move" lies in the equality $D_yD_ty = D_tD_yy$, since the (total) derivatives with respect to $y$ and $t$ don't commute, because $y$ depends on $t$.
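
A small example of this failure to commute (with $y=e^t$ assumed here for illustration): since $D_ty = e^t = y$, $$D_yD_ty = D_y\,y = 1, \qquad\text{whereas}\qquad D_tD_yy = D_t\,1 = 0.$$ The first value agrees with the chain rule, $\frac{d^2y}{dt^2}\frac{dt}{dy} = e^t\cdot e^{-t} = 1$.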

Also, I wouldn't pay too much attention to the paper you linked (see comment below). Indeed, its authors make non-standard manipulations. If I have understood them well, they try to redefine the second-order derivative by considering the first-order derivative $y'$ as a bivariate function of $x$ and $y$, i.e. $y' = y'(x,y)$, before applying the chain rule (because $y$ itself depends on $x$). However, they end up defining the second-order derivative $y''$ from itself and a useless vanishing term (because $x'' = 0$) in order to tally with a chain rule structure. In consequence, they haven't proven anything, apart from rewriting $y''$ in Leibniz style.

Finally, it is to be noted that higher-order derivatives in Leibniz notation are commonly treated as differentials, such as in the case of the second-order variation of a bivariate function $f(x,y)$ for example: $$ \mathrm{d}^2f(x,y) = \frac{\partial^2f}{\partial x^2}\mathrm{d}x^2 + 2\frac{\partial^2f}{\partial x\partial y}\mathrm{d}x\mathrm{d}y + \frac{\partial^2f}{\partial y^2}\mathrm{d}y^2 $$ Then you can divide this expression by a "second-order infinitesimal", i.e. $\mathrm{d}x^2$, $\mathrm{d}y^2$ or $\mathrm{d}x\mathrm{d}y$ in order to select the suitable term. You may even do it with respect to other parameters, such as $\mathrm{d}t^2$ or $\mathrm{d}s\mathrm{d}t$, which will induce the chain rule automatically if $x,y$ depend on them.

Abezhiko
  • 1
    Is there any situation where Leibniz notation for higher order derivatives is useful by allowing shortcuts or permitting to carry computations with lightened notations? From my understanding of this paper, Leibniz-style notation for a second derivative that's consistent and useful would look like $\frac{d^2y}{dx^2} - \frac{dy}{dx}\frac{d^2x}{dx^2}$. – Ishaan Jain Jul 12 '24 at 09:10
  • @IshaanJain I've edited my answer to tackle your questions, have a look. – Abezhiko Jul 12 '24 at 14:36
  • The example you gave of the second-order variation of $f(x,y)$ is interesting, but I don't understand how in that example it is valid to algebraically manipulate the differentials, while for $\frac{d^2y}{dx^2}$, doing the same could result in justifying an incorrect chain rule formula: $\frac{d^2y}{(dx)^2} = \frac{d^2y}{(dt)^2}\frac{(dt)^2}{(dx)^2} = \frac{d^2y}{(dt)^2}(\frac{dt}{dx})^2$. On a related note, the true 2nd derivative chain rule formula $\frac{d^2y}{dt^2}=\frac{d^2y}{dx^2}(\frac{dx}{dt})^2+\frac{dy}{dx}\frac{d^2x}{dt^2}$ seems to fly in the face of algebraic manipulability. – Ishaan Jain Jul 13 '24 at 11:23
  • $\frac{d^2x}{dx^2} \neq 0$ (generally). This is just a problem of thinking in the typical notation. The second derivative of $x$ with respect to itself IS zero, but that is not $\frac{d^2x}{dx^2}$ in the revised notation. In fact, if you use the revised notation, the fact that the second derivative of $x$ with respect to itself is zero becomes immediately obvious. $\frac{d^2x}{dx^2} - \frac{dx}{dx}\frac{d^2x}{dx^2}$. $\frac{dx}{dx} = 1$, therefore this becomes $\frac{d^2x}{dx^2} - \frac{d^2x}{dx^2} = 0$. – johnnyb Jul 15 '24 at 18:29