Understanding an Inconsistency for the Multivariable Chain Rule

Question

To motivate my question, let's start with an example:

Example: Find $\tfrac{df}{dt}$ and $\tfrac{\partial f}{\partial t}$ if $f(x,t)=xt$, where $x=x(t)$.

From my current understanding, the total derivative $\tfrac{df}{dt}$ accounts for the dependence of $x$ on $t$, so we substitute $x=x(t)$ to get

$$\dfrac{df}{dt}=\dfrac{d}{dt}(xt)=\dfrac{d}{dt}(x(t)\cdot t)=\dfrac{dx}{dt}t+x$$

and the partial derivative $\tfrac{\partial f}{\partial t}$ doesn't care about the substitution, as it sees $x$ and $t$ as two separate variables and keeps $x$ constant for the differentiation:

$$\dfrac{\partial f}{\partial t} = \dfrac{\partial}{\partial t}(xt)=x$$
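As a numerical sanity check of the two results above, here is a minimal sketch. The choice $x(t)=t^2$, the evaluation point $t_0=3$, and the helper `derivative` are assumptions made only for illustration; the example itself leaves $x(t)$ unspecified.

```python
def x_of(t):
    return t * t              # hypothetical choice x(t) = t^2

def f(x, t):
    return x * t              # f(x, t) = x t, as in the example

def derivative(func, at, h=1e-6):
    """Central-difference estimate of func'(at)."""
    return (func(at + h) - func(at - h)) / (2 * h)

t0 = 3.0
x0 = x_of(t0)

# Total derivative: x is allowed to move with t.
total = derivative(lambda t: f(x_of(t), t), t0)

# Partial derivative: x is frozen at its current value x0.
partial_t = derivative(lambda t: f(x0, t), t0)

# The product-rule formula dx/dt * t + x from the example.
formula = derivative(x_of, t0) * t0 + x0

print(total, partial_t, formula)
```

With this choice of $x(t)$, the total derivative and the formula $\tfrac{dx}{dt}t+x$ should agree (both near $27$), while the partial derivative should come out near $x_0=9$.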

What throws me off is when we have a function such as $f(x(u,v),y(u,v))$. For example, $\tfrac{\partial f}{\partial u}$ would be found by doing

$$\dfrac{\partial f}{\partial u}=\dfrac{\partial f}{\partial x}\dfrac{\partial x}{\partial u}+\dfrac{\partial f}{\partial y}\dfrac{\partial y}{\partial u}$$

My question is this: Why doesn't $\tfrac{\partial f}{\partial u}=0$? The function $f$ is only a function of $x$ and $y$ originally, and there are no $u$'s to be found in the original function. You may say we perform the substitution to get $x=x(u,v)$ and $y=y(u,v)$, making $f$ a function of $u$ and $v$, but in the example above, for $\tfrac{\partial f}{\partial t}$ we did not substitute $x=x(t)$ and kept $x$ as is. So why are functions of two variables different, and why do they require the use of the multivariable chain rule?

The issue is bad notation, especially with an overload of the symbol $f$, and I beat the matter to death in this answer of mine. — peek-a-boo, Jan 07 '25 at 20:41

score 3 · Answer 1 · edited Jan 07 '25 at 19:39

As was mentioned before, this is just a matter of notation. In your second example the use of $f$ is ambiguous (but convenient). The proper way would be to say that $g(u,v) = f(x(u,v), y(u,v))$, and that $$ \frac{\partial g}{\partial u} (u,v) = \frac{\partial f}{\partial x}(x(u,v),y(u,v))\cdot \frac{\partial x}{\partial u}(u,v) + \frac{\partial f}{\partial y}(x(u,v),y(u,v))\cdot \frac{\partial y}{\partial u}(u,v). $$
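The distinction between $f$ and $g$ can be made explicit in code, where the two are necessarily different functions with different arguments. This is only a sketch: the particular $f$, $x$, $y$, the evaluation point, and the finite-difference helper `partial` are all hypothetical choices made for illustration.

```python
def f(x, y):
    return x * x * y          # hypothetical outer function f(x, y) = x^2 y

def x_of(u, v):
    return u * v              # hypothetical inner function x(u, v) = u v

def y_of(u, v):
    return u + v              # hypothetical inner function y(u, v) = u + v

def g(u, v):
    # The composite: a different function from f, with arguments (u, v).
    return f(x_of(u, v), y_of(u, v))

def partial(func, args, i, h=1e-6):
    """Central-difference estimate of the partial derivative of func
    with respect to its i-th argument, evaluated at args."""
    hi, lo = list(args), list(args)
    hi[i] += h
    lo[i] -= h
    return (func(*hi) - func(*lo)) / (2 * h)

u, v = 1.5, -0.7
x, y = x_of(u, v), y_of(u, v)

# Left-hand side: differentiate g with respect to its first argument.
lhs = partial(g, (u, v), 0)

# Right-hand side: partials of f are taken at (x, y), partials of x and y at (u, v).
rhs = (partial(f, (x, y), 0) * partial(x_of, (u, v), 0)
       + partial(f, (x, y), 1) * partial(y_of, (u, v), 0))

print(lhs, rhs)
```

Here `partial(f, ..., 0)` never sees $u$ at all; $u$ only enters through `g`, which is why the quantity being asked for is really $\tfrac{\partial g}{\partial u}$.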

score 1 · Answer 2 · answered Jan 05 '25 at 07:49

I'll write a short answer. Hopefully this is enough. I think the confusion is mostly due to you using the variable $t$ twice with slightly different meanings. This is a slight abuse of notation for convenience. Let's be a bit more pedantic. You have $f(x,t)$. Then let's use e.g. $\tau$ instead of $t$ for the compound functions, i.e. $f(x(\tau),t(\tau))$ where $x(\tau)$ is still the same function as before and $t(\tau)=\tau$. Now, instead of $\tfrac{df}{dt}$ calculate $\tfrac{df}{d\tau}$ and the logic should work just like in your second example.
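Carried out for $f(x,t)=xt$ from the question, this should look like the following, where $\tfrac{\partial f}{\partial x}=t$ and $\tfrac{\partial f}{\partial t}=x$ are the partial derivatives of the two-argument function and $\tfrac{dt}{d\tau}=1$ because $t(\tau)=\tau$:

$$\dfrac{d}{d\tau}f(x(\tau),t(\tau))=\dfrac{\partial f}{\partial x}\dfrac{dx}{d\tau}+\dfrac{\partial f}{\partial t}\dfrac{dt}{d\tau}=t\,\dfrac{dx}{d\tau}+x\cdot 1$$

This matches the total derivative $\tfrac{dx}{dt}t+x$ computed in the question once $\tau$ is renamed back to $t$.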

user326210 · Accepted Answer · 2025-01-07T20:30:04.447

Yes, this is an ambiguity in Leibniz notation, not a fault of your understanding.

The question you are really being asked is: when you change $u$ by an infinitesimal amount, how much does $f(x(u,v),\,y(u,v))$ change?

And you can figure that out using the chain rule: it is the amount $f(\__1, \__2)$ changes when you change its first argument $\__1$, times the amount $x(u,v)$ changes when you change $u$, plus the amount $f(\__1, \__2)$ changes when you change its second argument $\__2$, times the amount $y(u,v)$ changes when you change $u$.

But in Leibniz notation, this question is written "What is $\frac{\partial f}{\partial u}$?" which doesn't tell you everything you need to know. The information is incomplete.

This notation just uses the name of the outermost function ($f$) as a nickname for what it really wants you to compute. This means the notation uses the same nickname $f$ whether you are supposed to compute $\frac{\partial}{\partial u}f(x,y)$ or $\frac{\partial}{\partial u}f(\alpha(u,\beta(u,v)), u)$ or $\frac{\partial}{\partial u} f(x(u,v), y(u,v))$. You have to look at the surrounding context to guess which derivative is meant.

It would be less ambiguous to ask "What is $\frac{\partial}{\partial u} f(x(u,v), y(u,v))$?". But the notation is designed to prioritize being short, at the expense of being ambiguous.

To use a concrete example, if $f(m)$ is the function that converts meters into feet, and $g(t)$ is the height of a rocket over time, I might ask: what is $\frac{\partial f}{\partial t}$?

And you would be correct to say "$\frac{\partial}{\partial t}f = 0$" because the process for converting meters into feet doesn't change over time— $f$ doesn't depend on $t$.

But you can guess that I meant to ask about a more interesting question, which is computing how the height of my rocket (in feet) changes over time, i.e. how $f(g(t))$ changes over time. So you would know from context that when I write the shorthand expression $\frac{\partial f}{\partial t}$, I am in this case really asking for $\frac{\partial}{\partial t} f(g(t))$.

The notation itself remains ambiguous, though: the nickname $\frac{\partial f}{\partial t}$ could just as well refer to $\frac{\partial}{\partial t}f(m)=0$.
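The two readings of the rocket example can be told apart in code, because there each derivative has to be taken of an explicit function. This is a rough sketch: the trajectory `g`, the sample time, the fixed height `5.0`, and the helper `derivative` are made up for illustration, and the conversion factor is approximate.

```python
FEET_PER_METER = 3.28084      # approximate conversion factor

def f(m):
    return FEET_PER_METER * m             # meters -> feet

def g(t):
    return 100.0 * t - 4.9 * t * t        # hypothetical rocket height in meters

def derivative(func, at, h=1e-6):
    """Central-difference estimate of func'(at)."""
    return (func(at + h) - func(at - h)) / (2 * h)

t = 2.0

# Reading 1: f on its own, at some fixed height; t is not an argument of f,
# so varying t changes nothing.
literal = derivative(lambda s: f(5.0), t)

# Reading 2: the composite f(g(t)), i.e. height in feet as a function of time.
rate_feet = derivative(lambda s: f(g(s)), t)

# Chain rule for reading 2: f'(g(t)) * g'(t).
chain = derivative(f, g(t)) * derivative(g, t)

print(literal, rate_feet, chain)
```

The first value is exactly zero, while the second and third should agree with each other, which is the sense in which the same shorthand covers two different questions.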
