36

In college, I've come across many instances where we multiply a derivative by a function and the result somehow becomes the derivative of that function, i.e. $\frac{d}{dx}\times f=\frac{df}{dx}$, as if we're multiplying "operators" with functions in a purely algebraic way. This has always puzzled me, because it seems to work, but I don't fully understand why it works.

Here are a few examples to illustrate my confusion:


1. Curl of a vector field:

$$ \text{curl } \mathbf{F} = \nabla \times (M, N, P) = \left( \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z} \right) \times (M, N, P) $$

When evaluating the determinant in the cross product definition, it seems like we’re "multiplying" the components of a vector of operators (whatever that means) with the components of a vector field. But that feels mysterious. Why is it okay to do something like:

$$ \left( \frac{\partial}{\partial x} \right) \cdot M $$

and call the result just $\frac{\partial M}{\partial x}$? What allows us to manipulate the gradient operator $\nabla$ as though it were a vector?
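
Just to pin down what the determinant recipe actually produces, here is a small sympy check with made-up components (this only confirms the mechanics of "operator times component = partial derivative of that component", not the why):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

# hypothetical components of a vector field F = (M, N, P)
M = x*y*z
N = sp.sin(x) + z**2
P = x + y

# expanding the determinant: each "product" like (d/dy) * P just means
# the partial derivative of that component, here diff(P, y)
curl_F = (sp.diff(P, y) - sp.diff(N, z),
          sp.diff(M, z) - sp.diff(P, x),
          sp.diff(N, x) - sp.diff(M, y))

print(curl_F)   # (1 - 2*z, x*y - 1, cos(x) - x*z)
```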

If this is valid, why can't I do something like:

$$ \operatorname{pie}(M,N,P) = (!, \sqrt{\ }, \wedge 2) \times (M, N, P) $$

where “!” is the factorial operator and $\wedge 2$ is the squaring operator, just for the sake of argument? It doesn't make any more or less sense than the definition of curl, in my opinion.


2. Divergence of a vector field:

$$ \text{div } \mathbf{F} = \nabla \cdot (M, N, P) = \left( \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z} \right) \cdot (M, N, P) $$

Same idea — we’re dotting a vector of operators with a vector of functions. What justifies treating the differential operator as if it were a vector component that can be dotted and crossed in this way?


3. Operator methods in ODEs:

In solving linear differential equations like:

$$ a y'' + b y' + c y = f $$

we often introduce the differential operator $ D = \frac{d}{dx} $ and rewrite the equation as:

$$ (aD^2 + bD + c)y = f $$

Then we manipulate this expression algebraically, even dividing by the operator polynomial, e.g.,

$$ y = \frac{1}{aD^2 + bD + c} f $$

In some cases, we go even further and treat operator expressions like $\frac{1}{1 - D}$ (I also have no idea how to define $1/(1-D)$) as a geometric series:

$$ \frac{1}{1 - D} = 1 + D + D^2 + \dots $$

and apply this to a function $f$, obtaining:

$$ f + Df + D^2f + \dots $$

How does this make sense? Why are we allowed to expand differential operators like power series and apply them this way?

More about my doubts about this method here.


4. Schrödinger equation and time evolution:

In quantum mechanics, I remember (though somewhat vaguely) that we would manipulate expressions like:

$$ i\hbar \frac{\partial}{\partial t} \psi = H \psi $$

and treat $\frac{\partial}{\partial t}$ almost as an algebraic quantity. At one point, I recall solving for $\psi(t)$ by exponentiating the Hamiltonian operator and applying it to $\psi(0)$, like:

$$ \psi(t) = e^{-iHt/\hbar} \psi(0) $$

This again treats differential operators as algebraic objects, able to be exponentiated, composed, and manipulated.
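
For what it's worth, when $H$ is just a finite matrix I can at least verify the recipe symbolically; here is a sympy sketch with a made-up $2\times 2$ Hamiltonian and $\hbar$ set to $1$. It confirms the exponentiation is consistent in finite dimensions, though not why it is legitimate for genuine differential operators:

```python
import sympy as sp

t = sp.symbols('t', real=True)

# made-up 2x2 Hermitian "Hamiltonian", with hbar set to 1
H = sp.Matrix([[0, 1], [1, 0]])
psi0 = sp.Matrix([1, 0])

# psi(t) = exp(-i H t) psi(0): the exponentiated operator applied to psi(0)
psi = (-sp.I * H * t).exp() * psi0

# check the Schrodinger equation  i * d/dt psi = H psi
print(sp.simplify(sp.I * psi.diff(t) - H * psi))   # zero vector
```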

We also did this:

$$ \frac{\partial \psi(x,t)}{\partial x} = \frac{\partial}{\partial x}\times \cancel{\psi(x,t)} = \frac{2i\pi P}{h}\,\cancel{\psi(x,t)} $$

and "cancelled" $\psi(x,t)$ on both sides to get $\frac{\partial}{\partial x} = \frac{2i\pi P}{h}$.


My Question:

All of these cases seem to rely on treating differential operators like algebraic objects — manipulating them in ways that resemble ordinary algebra (adding, multiplying, factoring, even inverting or expanding them in power series). Why is this allowed? What mathematical framework makes this rigorous?

I've asked professors and classmates, but usually get vague or unsatisfying answers like "it just works" or "it's a notation thing". I feel there must be a more rigorous explanation, but I don't know where to look.

Is there a general theory or justification that explains why and when it’s valid to treat differential operators this way?

pie
  • 8,483
  • I think I have encountered more of these, but these are the ones that I can recall now; if I remember more I will add them. – pie Jun 07 '25 at 08:37
  • 2
    Your number 3 perplexed me to no end when I learned it, decades ago, in university math. I look for a compelling answer here. Thanks. ($+1$) – David G. Stork Jun 07 '25 at 19:43
  • 1
    Why can't I do something like… pie()? You can, but what use would it have? The curl, by contrast, 1. appears in Stokes’ Theorem; 2. corresponds to the exterior derivative of a 1-form; 3. produces another vector field that transforms like a vector should under the group of rotations; and 4. appears frequently in physics, such as Maxwell’s equations. What is your pie() good for? Nothing. – Ghoster Jun 07 '25 at 21:11
  • @Ghoster If I am allowed to do that, why? What is the definition of $(!, \sqrt{\ }, \wedge 2)$ and why does it work? – pie Jun 08 '25 at 02:50
  • I would expect your $pie(M,N,P)$ to mean $(\sqrt P-M^2, N^2-P!, M!-\sqrt N)$. You compute a cross product of two triples in the usual way and let each “operator” in the first triple operate on the appropriate component of the second triple. What would you expect it to mean? I don’t know what “work” is supposed to mean; math just needs to be well-defined and free of contradictions. Unlike this silly computation, the curl is meaningful because of how partial derivatives transform under rotations; $\nabla$ is a vector operator. – Ghoster Jun 08 '25 at 03:27
  • Note that “vector operator” doesn’t mean “has three components”. Your operator has three components but isn’t a vector operator. – Ghoster Jun 08 '25 at 03:36
  • Pseudo linear algebra studies properties common to linear differential and difference (recurrence) operators. – Bill Dubuque Jun 08 '25 at 03:54
  • For a “purely algebraic” approach, check out Mikusiński Calculus. You can interpret differential operators as inverse convolutions. – user3716267 Jun 08 '25 at 20:58
  • @user3716267 Can you elaborate more on that? – pie Jun 09 '25 at 00:26
  • @pie consider the ring of half-line functions under convolution. We can construct a field of fractions of this ring, the same way we do when we construct rational numbers from the ring of integers. This field of “convolution fractions” includes the differential operator, as the inverse of the integral operator (which is just the unit step function). – user3716267 Jun 09 '25 at 01:26
  • @BillDubuque I can't find that book, but I am more surprised by the fact that the answer to this question is in a CS book, not a pure math one – pie Jun 09 '25 at 02:59
  • 1
    @pie That "pseudo linear algebra" paper is published in the (prestigious) journal Theoretical Computer Science because it is concerned with effective algorithms (both authors did much work on computer algebra systems). As I mentioned in the linked post, many of the ideas date back to Ore and Jacobson (among others). You can probably find a free copy by googling the title and authors. $\ \ $ – Bill Dubuque Jun 09 '25 at 03:05
  • I had thought that Heaviside invented this stuff, but there were others before him, according to this historical article. – Simon Crase Jun 09 '25 at 05:41
  • Theory and examples at : Operator Calculus – Han de Bruijn Jun 15 '25 at 18:33

4 Answers

26

The short answer is ‘yes’ there is a general theory of these things.

The story starts when you start thinking of collections of functions that all satisfy a certain property (let’s say: smooth real-valued functions on the real line) as infinite dimensional vector spaces. Then one thinks of differential operators as linear maps between such spaces. Often the space of all linear maps between two spaces is itself a vector space, and so one can indeed start to manipulate differential operators as if they are ‘objects’ in their own right, e.g. add them together. And often, but not always, the space of linear maps is in fact an algebra: you can multiply together the elements, which naturally corresponds to composing the linear maps they represent.

Some keywords to read more: unbounded operator, Banach space, operator algebra, spectral theory, etc.
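
For instance, one can check directly in sympy that composing the maps $f \mapsto f' + 2f$ and $f \mapsto f' + f$ agrees with applying the "algebraic product" $D^2 + 3D + 2$ (a small illustrative sketch with a made-up example):

```python
import sympy as sp

x = sp.symbols('x')
f = sp.Function('f')(x)

D = lambda g: sp.diff(g, x)                # the linear map "differentiate"

op1  = lambda g: D(g) + g                  # the map corresponding to D + 1
op2  = lambda g: D(g) + 2*g                # the map corresponding to D + 2
prod = lambda g: D(D(g)) + 3*D(g) + 2*g    # the map corresponding to D^2 + 3D + 2

# composing the maps agrees with applying the product of the operators
print(sp.simplify(op1(op2(f)) - prod(f)))  # 0
```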

SBK
  • 3,633
  • 12
  • 17
  • 1
    Can you give me a reference for "you can multiply together the elements, which naturally corresponds to composing the linear maps they represent." – pie Jun 08 '25 at 02:51
  • @pie : The most common example is multiplication of matrices which is one way for composing (finite dimensional) linear maps among vector spaces. See https://en.wikipedia.org/wiki/Linear_algebra#Matrices , https://en.wikipedia.org/wiki/Hilbert_space#Operators_on_Hilbert_spaces , and https://en.wikipedia.org/wiki/Differential_operator . See also https://en.wikipedia.org/wiki/Derivation_(differential_algebra) . – Eric Towers Jun 08 '25 at 03:37
  • 1
    I would add pseudodifferential operators and symbols of to your keywords – AwkwardWhale Jun 08 '25 at 21:56
  • @EricTowers From what I can recall, for a linear transformation $T$ from $F^n$ to $F^m$ there is an $m\times n$ matrix $A$ such that $T(X)=AX$, but how can we apply this to the derivative? I recall there is a matrix for the derivative operator, but that one only works on the vector space of polynomials, not on all differentiable functions. Also, even if such a matrix exists, in these examples we multiply the function by the operator itself, not by a matrix, i.e. $T\times X$, which doesn't make sense to me – pie Jun 09 '25 at 00:35
  • @pie The point is that matrix multiplication already represents a form of composition (of actions-on-vectors). A matrix $M$ defines a (linear) action on vectors $\phi_M:v\mapsto Mv.$ Matrix multiplication tells you how to simplify compositions of these actions: $\phi_N\circ\phi_M=\phi_{NM}.$ (The content of this equation is the associativity of matrix-matrix-vector products--that, for all $v,$ $(NM)v=N(Mv).$) So it should not be surprising that composition of operators forms a reasonable definition for the multiplication of operator algebras. – HTNW Jun 09 '25 at 12:09
  • To see why multiplying matrices is the same as composition of linear maps in the $2\times 2$ case, see my answer to the question https://math.stackexchange.com/questions/271927/why-historically-do-we-multiply-matrices-as-we-do. – KCd Jun 09 '25 at 17:13
  • I'd tend to emphasize "operator algebra" here, and maybe add "(noncommutative) ring theory". So, there is a ring (or a $\mathbb{R}$-algebra or $\mathbb{C}$-algebra) of differential operators, which could be constructed as the subring/subalgebra of an endomorphism ring/algebra generated by $\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \ldots$. – Daniel Schepler Jun 09 '25 at 20:58
7

It comes down to the fact that the derivative is a linear operator. That is, it distributes over addition:

$$ D (f + g) = (D f) + (D g) $$

and commutes with scalar multiplication:

$$ D (a f) = a (D f) $$

Thus it can be notationally treated as a kind of multiplication.
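
Both identities can be checked symbolically; a minimal sympy sketch with undefined functions $f, g$ and a constant symbol $a$:

```python
import sympy as sp

x, a = sp.symbols('x a')            # a plays the role of a scalar constant
f = sp.Function('f')(x)
g = sp.Function('g')(x)

D = lambda h: sp.diff(h, x)

print(sp.simplify(D(f + g) - (D(f) + D(g))))   # 0: distributes over addition
print(sp.simplify(D(a*f) - a*D(f)))            # 0: commutes with scalar multiplication
```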

user76284
  • 6,408
  • 17
    "you know your notation is good when it can be meaningfully abused" is probably the proper philosophy here. – Sidharth Ghoshal Jun 08 '25 at 03:06
  • 9
    You’re missing the key fact that it can be composed (so we have a (unital) algebra). Hence, polynomials in $D$ make sense (since it generates a commutative algebra), and under suitable conditions we can make sense of continuous functions/Borel functions of $D$. In other words, it’s not so much the linearity of $D$ which is important, but rather that it lives inside of a vector space (actually an algebra). – peek-a-boo Jun 08 '25 at 03:23
  • 3
    After revisiting this I see @peek-a-boo's point a bit more clearly. It may feel obvious perhaps that $\frac{d}{dx} \circ \frac{d}{dx} = \frac{d^2}{dx^2}$ and that differential expressions can be added via like-terms but that is also critical here and doesn't immediately follow from this answer. – Sidharth Ghoshal Jun 08 '25 at 21:37
2

I will try to show how differential operators can be treated like algebraic variables, which simplifies solving linear differential equations. This explanation is intuitive and without detailed proofs.


Definitions and Notation

Define the differential operator $D$ by $Df = \frac{df}{dx}$, so that:

$$ D^n f = \frac{d^n f}{dx^n} $$

which simplifies notation and allows algebraic manipulation of differential equations.


Operator Polynomial

Let

$$ \phi(D) = a_n D^n + a_{n-1} D^{n-1} + \cdots + a_1 D + a_0 $$

where the coefficients $a_k$ are constants in all theorems except Theorem 1, where they may also be functions of $x$.


Theorems (Without Proof)

Theorem 1:
Any linear differential equation

$$ a_n y^{(n)} + a_{n-1} y^{(n-1)} + \cdots + a_1 y' + a_0 y = f(x) $$

can be expressed as:

$$ \phi(D) y = f(x) $$


Theorem 2:
If $a, b$ are constants, then

$$ (a D^m)(b D^n) y = ab\, D^{m+n} y $$


Theorem 3:
If $\phi(D) = \phi_1(D)\,\phi_2(D)$, where $\phi_1$ and $\phi_2$ have no common roots, then the general solution of $\phi(D) y = 0$ is $$ y = Y_1 + Y_2 $$

where $Y_1$ and $Y_2$ are the general solutions of $\phi_1(D) y = 0$ and $\phi_2(D) y = 0$ respectively.


Results for Particular Solutions

Result 1:
If $ \phi(D) y = e^{a x} $ and $\phi(a) \neq 0$, then

$$ y = \frac{e^{a x}}{\phi(a)} $$


Result 2:
If $\phi(a) = 0$ and

$$ \phi(D) = (D - a)^m \phi_1(D), \quad \phi_1(a) \neq 0 $$

then

$$ y = \frac{x^m}{m!} \frac{e^{a x}}{\phi_1(a)} $$


These two results show the benefit of using the definition:

$$ y = \frac{1}{\phi(D)} e^{ax} $$

to find particular solutions. Using Results 1 and 2 along with this definition, we have:

  • For $\phi(a) \neq 0$:
    $$ \frac{1}{\phi(D)} e^{ax} = \frac{e^{ax}}{\phi(a)} $$

  • For $\phi(a) = 0$ and $\phi(D)=(D-a)^m \phi_1(D)$ with $\phi_1(a) \neq 0$:
    $$ \frac{1}{\phi(D)} e^{ax} = \frac{x^m}{m!} \frac{e^{ax}}{\phi_1(a)} $$
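
A quick sanity check of the first bullet above in sympy, using a made-up $\phi(D) = D^2 + 3D + 2$ and $a = 1$ (so $\phi(1) = 6 \neq 0$); it only confirms the formula on one example:

```python
import sympy as sp

x = sp.symbols('x')
a = 1                                       # exponent in e^{a x}

# made-up phi(D) = D^2 + 3D + 2, so phi(a) = phi(1) = 6 != 0
phi   = lambda g: sp.diff(g, x, 2) + 3*sp.diff(g, x) + 2*g
phi_a = a**2 + 3*a + 2

y = sp.exp(a*x) / phi_a                     # claimed particular solution
print(sp.simplify(phi(y) - sp.exp(a*x)))    # 0, i.e. phi(D) y = e^{a x}
```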

Result 3:
If $ \phi(D) = \phi_1(D) \phi_2(D) $ and $$ Y_1 = \frac{1}{\phi_1(D)} f(x) $$

then

$$ \frac{1}{\phi(D)} f(x) = \frac{1}{\phi_2(D)} Y_1 $$


Why does $\frac{1}{1 - D} = 1 + D + D^2 + \cdots$ work?

Consider the equation $(1 - D) y = f(x)$. The claim is that a solution is given by

$$ y = (1 + D + D^2 + \cdots) f(x) $$

The right-hand side means

$$ y = f(x) + f'(x) + f''(x) + \cdots $$

Substituting this into the left-hand side:

$$ y - y' = \big(f + f' + f'' + \cdots \big) - \big(f' + f'' + f''' + \cdots \big) = f(x), $$

so $y$ indeed satisfies $(1 - D)y = f(x)$, which shows the equality holds, at least formally (it is unproblematic, for example, when $f$ is a polynomial, since the series then terminates).
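
The same check can be done mechanically in sympy; here is a small sketch with a made-up polynomial $f$, for which the series terminates:

```python
import sympy as sp

x = sp.symbols('x')
f = x**3 + 2*x                 # polynomial input, so D^n f vanishes for large n

# y = (1 + D + D^2 + ...) f : only finitely many terms are nonzero here
y = sum(sp.diff(f, x, n) for n in range(10))

# verify (1 - D) y = f
print(sp.simplify(y - sp.diff(y, x) - f))   # 0
```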

For a more general statement, see Theorem 5.


Theorem 5:
If

$$ y = \frac{1}{1 - \phi(D)} f(x) $$

then

$$ y = (1 + \phi(D)+ \phi^2(D) + \cdots) f(x) $$


Faoler
  • 2,568
  • 6
  • 18
  • 4
    You might wish to elaborate in what sense your series of operators converges :) in some sense this is the matter of the problem. If we were talking bounded operators you could get away with basic power series, not so much in this setting. – Severin Schraven Jun 09 '25 at 00:40
1

This is a comment, not an answer.

You can remove a lot of mystery by simply noting that if we had two linear operators $E_1, E_2$ given by:

$$ E_1 = a_0 I + a_1 \frac{d}{dx} + a_2 \frac{d^2}{dx^2} + ... $$ $$ E_2 = b_0 I + b_1 \frac{d}{dx} + b_2 \frac{d^2}{dx^2} + ... $$

Then we can compute

$$E_1 [E_2] = a_0 b_0 + (a_0 b_1 + b_0 a_1) \frac{d}{dx} + ...$$

But now independent of this suppose we had two analytic functions

$$ f(z) = a_0 + a_1z + a_2z^2 + ... $$ $$ g(z) = b_0 + b_1z + b_2z^2 + ... $$

Then $$f(z)\cdot g(z) = a_0 b_0 + (a_0 b_1 + b_0 a_1)\cdot z + ... $$

So, without deeply knowing WHY, we can still SEE CLEARLY that composing the operators $E_1, E_2$ gives the same coefficients as multiplying the analytic functions $f,g$. From here all the results you want arise intuitively (such as integration being division, factoring operators, the Euler-Maclaurin formula as a jacked cousin of $\frac{1}{1-e^{x}}$, etc...)

The WHY is given by @user76284 's answer. And it's good to explore this.

Yet simultaneously you want to learn how to manipulate these even without understanding too deeply why. You see, back in the late 1700s and 1800s, folks were wildly manipulating these expressions without consciously being aware of such formal considerations, because they noticed the specific pattern I mentioned above. So getting used to this, similar to how a young child learns to add numbers (not understanding what base-N arithmetic IS, just knowing that you can consistently ADD and CARRY), is also useful in its own right.

A good rite of passage in seeing if you "really get it" is if you can derive this from the expression $\frac{I}{I - e^{\frac{d}{dx}}}$ and explain why that makes sense as a summation formula. (Hint: you should first try to be absolutely clear as to why $e^{\frac{d}{dx}}[f] = f(x+1)$ )
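
For the hint, a quick sympy check (on a polynomial, so the exponential series terminates) that $e^{\frac{d}{dx}}$ really acts as the unit shift:

```python
import sympy as sp

x = sp.symbols('x')
f = x**4 - 3*x + 1             # polynomial, so the exponential series terminates

# e^{d/dx} f = sum_n D^n f / n!   (Taylor's theorem, read as an operator identity)
shifted = sum(sp.diff(f, x, n) / sp.factorial(n) for n in range(10))

print(sp.simplify(shifted - f.subs(x, x + 1)))   # 0, i.e. e^{D} f = f(x+1)
```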

KCd
  • 55,662
  • 1
    the “why” isn’t really given in user76284’s answer, rather in the other answer (albeit briefly, but understandably so, since operator theory and spectral theory is a massive subject (and along similar lines, the theory of pseudo differential operators etc)). – peek-a-boo Jun 08 '25 at 03:29
  • 3
    I'm not so sure I would say so. The other answer is, of course useful too, but really the point is that the function $f(c) = c*x $ and the operator $ O[f] = \frac{d}{dx}[f]$ are both linear and so "analytic functions on the left hand side" are really identical in a sense to the "generic linear operators" on the right hand side and hence this correspondence exists. I find personally that: @user76284's answer gets right to the heart of the matter this way. – Sidharth Ghoshal Jun 08 '25 at 03:32
  • A more formal way to say this is that: there is an isomorphism between the ring of "analytic functions under multiplication" and the ring of "linear operators under composition" given by $\phi: a_0 + a_1 z + a_2 z^2 + ... \rightarrow a_0 + a_1 \frac{d}{dx} + a_2 \frac{d^2}{dx^2} + ... $ and $\phi^{-1}$ similarly defined (when either is defined). – Sidharth Ghoshal Jun 08 '25 at 03:35
  • @peek-a-boo I left a comment on user76284s answer. I think you were right that there was a bit more to be desired there. – Sidharth Ghoshal Jun 08 '25 at 21:38