
The role of $x$ in $\frac{\mathrm{d}}{\mathrm{d}x} y$ not only confuses my calculus students, it has also puzzled some well-known mathematicians. Questions one might ask are:

  • Does the $x$ in the denominator bind the $x$ in $y$? (Clearly no, since $\frac{dx^2}{dx}=2x$.)

  • Can one substitute for $x$ in the denominator? (Apparently not: what would $\frac{dx^2}{d3}$ mean?)

  • Is the $x$ in the denominator itself bound?

I was wondering if interpreting that $x$ as a symbol, in the sense of Bob Harper's Practical Foundations for Programming Languages (Chapter 1.2 on abstract binding trees) might solve these riddles, and if this had already been worked out by someone?

Here's a quote from PFPL:

It will often be necessary to consider languages whose abstract syntax cannot be specified by a fixed set of operators, but rather requires that the available operators be sensitive to the context in which they occur. For our purposes it will suffice to consider a set of symbolic parameters, or symbols, that index families of operators so that as the set of symbols varies, so does the set of operators. [...] The only difference between symbols and variables is that the only operation on symbols is renaming; there is no notion of substitution for a symbol.

2 Answers

2

Some mathematicians find it natural to substitute for the variable in the denominator of a derivative, writing things like $\frac{d \log V}{d\log p}$. This suggests that the $x$ in the denominator of $\frac{dy}{dx}$ is neither bound nor binding nor a symbol.
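To make the substitution concrete, here is a small worked example. The power law $V = c\,p^{-\gamma}$, with constants $c > 0$ and $\gamma$ and with $p > 0$, is only an assumption chosen for illustration:

$$\log V = \log c - \gamma \log p \quad\Longrightarrow\quad d\log V = -\gamma\, d\log p \quad\Longrightarrow\quad \frac{d\log V}{d\log p} = -\gamma.$$

The same value comes out of the chain rule, $\frac{d\log V}{d\log p} = \frac{dV}{dp}\cdot\frac{p}{V} = -\gamma c\,p^{-\gamma-1}\cdot\frac{p}{c\,p^{-\gamma}} = -\gamma$, so here the "denominator" $\log p$ behaves like an ordinary variable quantity whose differential $d\log p = dp/p$ is nonzero.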

Rather $\frac{dy}{dx}$ seems to be an operation on "variable quantities" $y,x$ requiring some side conditions, similar to how the usual division $a/b$ requires $b\neq 0$, or how the operation $\frac{v}{v_1^0\ldots v_i^1\ldots v_n^0}$ in my comment requires $v_1,\ldots,v_n$ to be linearly independent and $v$ to lie in their span.

I suspect that the side condition for $\frac{dy}{dx}$ should be analogous: $dx$ needs to be linearly independent (that is, nonzero) and $dy$ has to be a multiple of $dx$.

With these side conditions we are not allowed to substitute a constant for $x$ in $\frac{dy}{dx}$, since then $dx=0$ and so $dx$ is not linearly independent. That would explain why we are not allowed to write $\frac{d x^2}{d3}$, or to take the derivative of the equation $x=0$ w.r.t. $x$ to conclude $1=0$.

I'd be interested in hearing if someone sees an immediate problem with this interpretation.

(I see some subtlety in the fact that $d(x|_{x=3})$ cannot mean the same as $(dx)|_{x=3}$, since otherwise I'd expect $\frac{dy}{dx}|_{x=3}$ to be the same as $\frac{d(y|_{x=3})}{d(x|_{x=3})}$ which would not be allowed by the side condition. On the other hand, that is not so strange from a differential geometric point of view if we read $|_{x=a}$ as restriction: there is a difference between restricting a differential form and pulling it back. But I need to think more about this.)

1

Functions of a single variable

We can define an operator $\mathcal{D}$ on functions $f: \mathbb{R} \to \mathbb{R}$ so that $\mathcal{D}(f)$ is the first derivative of $f$. It is common to write this operator without parentheses, i.e., write it as $\mathcal{D} f$ instead of $\mathcal{D}(f)$ and write $\mathcal{D} f(x)$ instead of $(\mathcal{D}(f))(x)$ or $\mathcal{D}(f)(x)$.
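As a rough illustration of $\mathcal{D}$ as a function-to-function operator, here is a minimal Python sketch. It approximates the derivative numerically with a central difference, which is my own simplification. The names `D`, `square` and the step size `h` are hypothetical choices for this example only.

```python
def D(f, h=1e-5):
    """Take a function f: R -> R and return (an approximation of) its derivative,
    which is again a function R -> R. The step size h is an arbitrary choice."""
    def df(x):
        # Central difference quotient around x.
        return (f(x + h) - f(x - h)) / (2 * h)
    return df


def square(x):
    return x ** 2


d_square = D(square)       # a function, corresponding to "D f"
value_at_3 = D(square)(3)  # a number, corresponding to "(D f)(3)", close to 6
```

Note that `D` never mentions a variable name: it receives a function and returns a function, and only the result `d_square` is applied to a number.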

The same idea extends to functions of multiple variables. We can define an operator $\mathcal{D}_1$ on functions $f: \mathbb{R}^n \to \mathbb{R}$ so that $\mathcal{D}_1(f)$ is the partial derivative of $f$ with respect to its first argument, i.e., $(\mathcal{D}_1 f)(x_1,\dots,x_n) = {\partial \over \partial x_1} f(x_1,\dots,x_n)$. Then $\mathcal{D}_1(f)$ is a function with signature $\mathbb{R}^n \to \mathbb{R}$; the operators $\mathcal{D}_i$ for the other arguments are defined in the same way.

These operators $\mathcal{D},\mathcal{D}_i$ have a clear interpretation that avoids the ambiguities you mentioned.
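The partial-derivative operators can be sketched the same way. This is only an illustrative numerical approximation under my own assumptions: the name `D_i`, the 1-based index convention, the step size `h`, and the example function `g` are all hypothetical.

```python
def D_i(i, f, h=1e-5):
    """Return (an approximation of) the partial derivative of f with respect
    to its i-th argument, counting from 1. The result takes the same
    arguments as f."""
    def dfi(*xs):
        up = list(xs)
        down = list(xs)
        up[i - 1] += h      # nudge only the i-th argument upwards
        down[i - 1] -= h    # and downwards
        return (f(*up) - f(*down)) / (2 * h)
    return dfi


def g(x1, x2):
    return x1 ** 2 * x2


d1_g = D_i(1, g)            # a function of (x1, x2), like "D_1 g"
d1_value = d1_g(3.0, 2.0)   # close to 2 * 3 * 2 = 12
d2_value = D_i(2, g)(3.0, 2.0)  # close to 3 ** 2 = 9
```

The argument is selected by position, not by name, which is why no variable appears in the operator itself.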

Now let's interpret the more standard notation, which is ambiguous. When someone writes $x^2$ in a math textbook, there are two things they might mean. If $x$ is left unspecified (a free variable, informally speaking), this might represent a function, namely the function $\lambda x . x^2$. Or, if $x$ has already been fixed to a particular value by the context (a bound variable, informally speaking), it might represent a number: the value of $x$, squared. Since the same notation is used for both, the reader has to infer which was intended from the surrounding context. That's OK for mathematical exposition, but problematic for programming languages, where we need expressions to have an unambiguous meaning.

The same ambiguity infects the notation surrounding functions and derivatives. The expression $f(x)$ is sometimes used to represent the function $f$, and sometimes to represent the value obtained by evaluating $f$ at the input $x$. The expression ${df \over dx}$ is sometimes intended to represent the function $\mathcal{D} f$, and sometimes to represent the value of that function evaluated at the input $x$, i.e., $(\mathcal{D} f)(x)$.

When you see someone write something like ${d \over dx}f(x)$ or ${df \over dx}(x)$ where $f$ is a function of one variable, that might be intended to represent $\mathcal{D} f$ or $(\mathcal{D} f)(x)$: you have to look at context to guess what was intended. Mentioning "$x$" in the denominator is a bit sloppy since $x$ is a bound variable of the expression defining $f$; the $\mathcal{D}$ notation makes it clearer that $\mathcal{D}$ is an operator that takes a function and returns another function. This operator doesn't really care what name you give the bound variable.
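The claim that the operator doesn't care about the name of the bound variable can be checked in a small Python sketch. The numerical `D` below is a hypothetical central-difference approximation of my own, and the names `f_x`, `f_t` and `sample_points` are chosen only for this illustration.

```python
def D(f, h=1e-5):
    """Approximate the derivative of f by a central difference (an assumption
    made for illustration; any derivative operator would do)."""
    return lambda a: (f(a + h) - f(a - h)) / (2 * h)


# Two definitions of the same function that differ only in the name of the
# bound variable.
f_x = lambda x: x ** 2
f_t = lambda t: t ** 2

sample_points = [0.0, 1.5, -2.0]

# D sees only the function, so both spellings give the same derivative values.
same_everywhere = all(D(f_x)(a) == D(f_t)(a) for a in sample_points)
```

Here `same_everywhere` comes out true because `D` never inspects the parameter name. This is the sense in which writing "$x$" in the denominator adds nothing for a function of one variable.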

What about ${dy \over dx}$ or ${d \over dx} y$? Sometimes, the context makes clear that $y$ is implicitly a function of $x$, i.e., $y = f(x)$. Then, this notation might refer to the function $\mathcal{D} f$, or it might refer to the number $(\mathcal{D} f)(x)$ -- you have to guess from context. Or, if you prefer, it might represent $\mathcal{D} \lambda x . \cdots x \cdots$, where "$\cdots x \cdots$" is some expression that describes how $y$ is computed as a function of $x$, or it might represent $(\mathcal{D} \lambda t . \cdots t \cdots)(x)$. In other words, $y$ might represent either a function (of $x$) or a number. In the former case, we are basically writing $y$ as a shorthand for $\lambda x . \cdots x \cdots$; the context has made clear how $y$ varies as a function of $x$, so in communication with humans we don't bother writing it out a second time.

Functions of multiple variables

What about functions of multiple variables? When you see someone write something like ${\partial \over \partial x_1} f(x_1,\dots,x_n)$, this is implicitly the same as either $(\mathcal{D}_1 f)(x_1,\dots,x_n)$ or $\mathcal{D}_1 f$ (you have to guess from context which was intended). Everything should now follow from the discussion above.

If we take this understanding that ${d \over dx}$ is syntactic sugar for $\mathcal{D}$ and $\mathcal{D}_i$, then it becomes clearer how to answer your questions. In particular, we only need to answer your questions for the $\mathcal{D}$ operators. And most of your questions go away, because $\mathcal{D}$ no longer mentions a variable $x$, so we don't need to answer whether $x$ is bound or not, whether you can substitute for $x$ in the denominator, etc.

Last question: is $\mathcal{D}$ a symbol in the sense that Harper meant? I don't know. You'll have to check the definition in that book.

D.W.