9

Let $f:M \subset \mathbb{R}^2 \rightarrow N \subset \mathbb{R}^3$.

  • The function $f$ is a vector function.
  • Its differential $\mathrm{d}f \in \mathbb{R}^3$ represents the infinitesimal change in the function, where by $\mathrm{d}f$, I mean $\mathrm{d}f(x)$.
  • Its Jacobian (matrix) $J \in \mathbb{R}^{3 \times 2}$ maps vectors between tangent spaces $T_x M$ and $T_{f(x)} N$.

The relation between the two is $\mathrm{d}f = J\,\mathrm{d}x$, where $\mathrm{d}x \in \mathbb{R}^2$.
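To make this concrete, here is a minimal NumPy sketch (the particular $f$ is a toy example of mine, not from any reference) checking that $f(x+\mathrm{d}x)-f(x) \approx J\,\mathrm{d}x$ for a small $\mathrm{d}x$:

```python
import numpy as np

# A toy map f: R^2 -> R^3 (made up for illustration).
def f(x):
    return np.array([x[0]**2, x[0]*x[1], np.sin(x[1])])

# Its Jacobian J in R^{3x2}, computed by hand from the partials.
def J(x):
    return np.array([[2*x[0], 0.0],
                     [x[1],   x[0]],
                     [0.0,    np.cos(x[1])]])

x  = np.array([1.0, 2.0])
dx = 1e-6 * np.array([0.3, -0.7])          # a small displacement dx in R^2

df = f(x + dx) - f(x)                      # the change in f, a vector in R^3
print(np.allclose(df, J(x) @ dx, atol=1e-10))   # True: df ≈ J dx
```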

However, if $f$ is considered a "mapping", then is the differential of the mapping $\mathrm{d}f$ equal to the Jacobian $J$?


From some of the answers, it seems that I took some things for granted (common knowledge or agreed by all). Moreover, there seems to be confusion between the differential, the derivative, and their notation.

So first, let's agree that the differential (total derivative) and the derivative (Jacobian) are not the same thing.

Next, as per Wikipedia, let's agree on notation. Each of $f'(x)$, $Df(x)$, $\frac{\mathrm{d} f}{\mathrm{d} x}$, and $J$ refers to the derivative. The notation $\mathrm{d}f$ is reserved for the differential.

Now, back to my question.

  • The derivative of $f$ is the Jacobian matrix $f'(x)=Df(x)=J \in \mathbb{R}^{3 \times 2}$.

  • The differential of $f$ is the 3D vector $\mathrm{d}f = J \mathrm{d}x$.

For some reason, there are people who confusingly use the term "differential of a mapping" to refer to the derivative, as if they don't distinguish between the derivative and the differential.

My question is: What's up with that, and what am I missing?

Why is that important? For a long time, I wasn't clear about what exactly the differential is. It became an issue when I used matrix calculus to calculate the Hessian of a matrix function. The book Matrix Differential Calculus with Applications in Statistics and Econometrics cleared it all up for me. It properly and distinctly defines the Jacobian, gradient, Hessian, derivative, and differential. The distinction between the Jacobian and the differential is crucial for the matrix-function differentiation process and for the identification of the Jacobian (e.g. the first identification table in the book).

At this point, I am mildly annoyed (with myself) that I previously wrote things (which it is too late to fix now) and blindly (relying on previous work) used the term "differential of a mapping". So, currently, I either look for some justification for this misnomer or else suggest that the community reconsider it.


I tried to track down the culprit for this "weird fashion", and I went as far as the differential geometry bible. Looking at do Carmo, definition 1 in chapter 2 appendix, pg. 128 (pg. 127 in the first edition), the definition of $dF_p$ is fine (grammar aside): it's a linear map that is associated with each point in the domain.

But then, in example 10 (pg. 130), he uses the same notation to denote both the Jacobian and the differential. (This is probably what Ulrich meant by "almost the same thing".) More specifically, he "applies it twice": once to get the Jacobian and once to get the differential. He uses $df(\cdot)$ to denote the Jacobian, a non-linear map into a matrix target, and $df_{(\cdot)}(\cdot)$ to denote the differential, a linear map into a vector target, and he calls both a differential.


Another point why I find it confusing is that for me the Jacobian is a matrix of partial derivatives and the differential is an operator. For example, to differentiate the matrix function $f:\mathbb{R}^{2 \times 2} \rightarrow \mathbb{R}$:

$f(X) = \operatorname{tr}(AX)$

I would use the differential operator:

$df(X; dX) = \operatorname{tr}(A\,dX)$

And from the Jacobian identification table (Magnus19), I'll get:

$Df(X) = A'$
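A quick NumPy sanity check of this identification (random $A$ and $X$ as stand-ins; derivatives by finite differences): the matrix of partials $\partial f/\partial X_{ij}$ comes out as $A'$, and $\operatorname{tr}(A\,dX)$ matches the change in $f$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))            # stand-in for A
X = rng.standard_normal((2, 2))            # stand-in for X

f = lambda X_: np.trace(A @ X_)            # f(X) = tr(AX)

# Matrix of partials df/dX_ij by finite differences: it equals A'.
h, D = 1e-6, np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        E = np.zeros((2, 2)); E[i, j] = h
        D[i, j] = (f(X + E) - f(X)) / h
print(np.allclose(D, A.T))                 # True

# And the differential identification: df = tr(A dX).
dX = 1e-6 * rng.standard_normal((2, 2))
print(np.allclose(f(X + dX) - f(X), np.trace(A @ dX)))  # True (f is linear)
```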

Note that the differential isn't a trivial linear map anymore.

It also leads to another point. The differential has a linear-approximation meaning: basically, it denotes the change in the function. If it's a scalar-valued function, the change is a scalar, and thus the differential maps to a scalar. If the domain is matrices, then the Jacobian is a matrix (a non-linear map from matrices to matrices). I would definitely find it confusing if someone treated them as the same.


Let's do another example, $f:\mathbb{R}^{2 \times 2} \rightarrow \mathbb{R}^{2 \times 2}$:

$f(X) = AX$

Using the differential operator:

$df(X; dX) = A\,dX$

$\operatorname{vec} df(X; dX) = (I_2 \otimes A) \operatorname{vec} dX$

From the Jacobian identification table:

$Df(X) = I_2 \otimes A$
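The Kronecker identity can be sanity-checked numerically as well (NumPy sketch; random $A$ and direction $dX$, with $\operatorname{vec}$ the column-stacking operator):

```python
import numpy as np

rng = np.random.default_rng(0)
A  = rng.standard_normal((2, 2))           # stand-in for A
dX = rng.standard_normal((2, 2))           # an arbitrary direction dX

vec = lambda M: M.flatten(order="F")       # column-stacking vec operator

# vec(A dX) = (I_2 ⊗ A) vec(dX), so the identified Jacobian is I_2 ⊗ A.
lhs = vec(A @ dX)
rhs = np.kron(np.eye(2), A) @ vec(dX)
print(np.allclose(lhs, rhs))               # True
```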

In this case, I'm not sure I'd consider the differential $df$ and Jacobian $Df$ almost the same thing (I'm not so good with tensors). This is the root of my issue. It's not always a simple matrix multiplication, and one needs to be mindful about the difference between the differential and Jacobian.

Not to mention the second order differential and the Hessian identification.


I corresponded with a couple of Caltech guys who settled it for me, and I can live with that. To paraphrase:

Math is a living language like any other; it evolves and changes. As long as we clearly define the terms in the context, there shouldn't be a problem: call it whatever you want.

  • To start with, the terms function and variable do not now have their original meanings, used at the origin of differential calculus. Worse, the terms are often used in two meanings interchangeably, creating great confusion, while the original meanings are not properly formalized in modern mathematics. To understand it, I would recommend reading the original writings by Leibniz and others. – Alexey May 04 '25 at 13:31
  • In short, if $y = f(x)$, then $dy = f'(x)dx = y'dx$, where $f$ and $f'$ are functions in the modern sense, while $y$, $y'$, $dy$ are functions in the original sense. (Moreover, $x$ and $dx$ might also be implicit functions of $y$, $dy$, etc.) Here the variable quantities $dx$ and $dy$ are differentials of the variable quantities $x$ and $y$. – Alexey May 04 '25 at 13:35

7 Answers

6

Short answer: if $f$ is differentiable, then $Df(x)$ is the linear map and, if $f$ has continuous partial derivatives, then $Jf(x)$ is the matrix representation of the linear map $Df(x)$. That's all.

I will elaborate.

Let $f:\mathbb R^n \to \mathbb R^m$ be a differentiable function (we could restrict to open sets). By definition, this means that for every $x\in\mathbb R^n$ there is a linear map $T:\mathbb R^n \to \mathbb R^m$ such that

$$\lim_{y\to x}\frac{\|f(y)-f(x)-T(y-x)\|}{\|y-x\|}=0.$$

One can prove that the linear map $T$ is uniquely specified by $f$ and $x$, so we can use the notation $Df(x) := T$. This linear map is called the differential of $f$ at $x$ or the derivative of $f$ at $x$.

Let $f_1,\dots,f_m$ be the component functions of $f:\mathbb R^n\to\mathbb R^m$. That is, for every $x\in\mathbb R^n$ we have $f(x)=(f_1(x),\dots,f_m(x))$. Then for every point $x$, we define the Jacobian matrix of $f$ at $x$ as the matrix $Jf(x)\in M_{m\times n}(\mathbb R)$ whose entry in the $i$th row and $j$th column is the number $\partial_jf_i(x)$. One can also prove that if the partial derivatives $\partial_jf_i$ are all continuous, then $Jf(x)$ is the matrix that represents the linear map $Df(x):\mathbb R^n\to\mathbb R^m$ with respect to the standard bases of $\mathbb R^n$ and $\mathbb R^m$.
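To illustrate the two definitions together, a small NumPy sketch (the $f$ below is a made-up example, and the Jacobian is approximated by forward differences): build $Jf(x)$ column by column from partial difference quotients, and watch the defining quotient tend to $0$.

```python
import numpy as np

# A made-up f: R^2 -> R^3 with component functions f_1, f_2, f_3.
f = lambda x: np.array([x[0]*x[1], np.exp(x[0]), x[1]**3])

def Jf(x, h=1e-7):
    """Jacobian by forward differences: column j approximates the j-th partials."""
    m, n = len(f(x)), len(x)
    J = np.empty((m, n))
    for j in range(n):
        e = np.zeros(n); e[j] = h
        J[:, j] = (f(x + e) - f(x)) / h
    return J

x = np.array([0.5, -1.0])
for t in [1e-2, 1e-3, 1e-4]:
    y = x + t * np.array([1.0, 1.0])
    print(np.linalg.norm(f(y) - f(x) - Jf(x) @ (y - x)) / np.linalg.norm(y - x))
# The printed quotients shrink toward 0, as in the defining limit.
```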

With regards to notation, sometimes a lowercase $d$ is used instead of $D$. It is also convenient not to abuse the notation and distinguish clearly between $Df$, $Df(x)$, $J$, $Jf$ and $Jf(x)$. If we denote the set of linear maps $\mathbb R^n\to\mathbb R^m$ by $L(\mathbb R^n,\mathbb R^m)$, then $Df$ is the function $\mathbb R^n\to L(\mathbb R^n,\mathbb R^m)$ given by $x\mapsto Df(x)$. Similarly, $Jf$ is the function $\mathbb R^n\to M_{m\times n}\mathbb R$ given by $x\mapsto Jf(x)$. Personally I don't like to use $J$ by itself because it can give rise to confusion as to which object we are talking about.

Jackozee Hakkiuz
  • 6,119
  • Okay, but I asked about the differential. Please see my added "EDIT 2". – Zohar Levi Jul 24 '20 at 22:33
  • Could you elaborate on why the partial derivatives need to be continuous? Is it because we are looking for a continuous/bounded linear functional $T$? And the continuity of the derivatives implies the continuity of the map $J$? – lightxbulb May 17 '23 at 14:32
  • 1
    @lightxbulb it can happen that the partial derivatives of $f$ exist at $x$, and hence $Jf(x)$ exists (because it's just the matrix of partial derivatives of $f$ at $x$) but yet $Df(x)$ doesn't exist and hence $Jf(x)$ cannot be its matrix representation. The culprit of this anomaly is the lack of continuity of the partial derivatives.

    However, if the partial derivatives of $f$ exist at $x$ and are continuous at $x$, then $Df(x)$ exists and $Jf(x)$ is its matrix representation.

    – Jackozee Hakkiuz Jun 01 '23 at 23:48
  • @JackozeeHakkiuz Thank you. Just to make sure I have understood things correctly: whenever $Df(x)$ exists, then $Jf(x)$ is its matrix representation. And the partial derivatives of $f$ being continuous implies that $Jf(x)$ is linear and thus continuous and then $Jf(x) = Df(x)$ exists. I assume the converse is not true? That is, could I have discontinuous $Jf(x)$ but $Df(x)$ exists and $Df(x)=Jf(x)$? Also if the directional derivatives are non-linear (in the direction) then I guess $Df(x)$ doesn't exist? – lightxbulb Jun 02 '23 at 10:03
3

If $M\subset \mathbb{R}^n$ is an open set and $f:M\to \mathbb{R}^k$ is differentiable, then for $p\in M$ we have the derivative $d_pf:\mathbb{R}^n\to\mathbb{R}^k$, a linear map. In your situation it is not necessary (but certainly possible) to think about tangent spaces. The matrix that describes the linear map $d_pf$ with respect to the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^k$ I would denote by $f'(p)$ (you call it $J$). So it is just a matter of applying a linear map to a vector versus multiplying this vector by a matrix: $$d_pf(Y)=f'(p)\cdot Y$$ Almost the same thing...

  • That's fine, but I asked about the differential. Please see my added "EDIT2". – Zohar Levi Jul 24 '20 at 22:31
  • I think I'm starting to understand the confusion. For you, differential is synonymous with derivative. For me, differential is what you call variation (your $\delta$ operator in Chao10). – Zohar Levi Jul 25 '20 at 02:10
  • So your function $f$ depends on an additional variable $t$, so it really is a function of three variables? And your "differential" is the partial derivative with respect to $t$? For me, that would be the meaning of a "variation". And this is usually the way one can make sense of the $\delta$-notation. And yes, this notation is used in Chao et al., where I am one of the authors, but please do not blame me. I hate that notation. – Ulrich Pinkall Jul 26 '20 at 07:33
  • I assumed the fancy math in appendix B in that paper was your doing. – Zohar Levi Jul 26 '20 at 22:42
2

I use the word "differential" (aside from the abuse the term gets in beginning calculus texts) only when referring to $1$-forms. So, of course, for a mapping $f\colon M\to\Bbb R$, the differential $1$-form at $p$ ($df(p)$) coincides with the derivative at $p$ as a linear map $Df(p)\colon T_pM\to\Bbb R$. (Some people write this $df(p)$, $Df_p$, $df_p$, and who knows what else.) Sometimes you will see that for vector-valued functions $f\colon M\to\Bbb R^k$, some of us will refer to the differential as a vector-valued $1$-form; this, too, coincides with the derivative as a linear map.

Ted Shifrin
  • 125,228
  • Let's consider scalar-valued functions. I'm fine with your differential 1-form definition that maps a vector from $M$ to a scalar: https://en.wikipedia.org/wiki/One-form#Differential. I'm not clear, though, why you call it derivative/Jacobian/gradient, where these entities map a vector to a vector. For example, do you agree with the definitions and answers ($df(2,2;v)=12$, $\nabla f(1,1)=(2,2)$) in https://math.stackexchange.com/questions/3071033/difference-between-differential-and-derivative – Zohar Levi Jul 25 '20 at 01:13
  • Gradient is out of play here. It's the vector dual to the derivative/differential. That answer was sloppy because it confused a row vector (linear map) and a column vector (e.g. gradient). – Ted Shifrin Jul 25 '20 at 02:13
  • I agree (like Magnus19) that the gradient (column vector) is the transpose of the Jacobian (row vector). Therefore, up to transposition, they represent the derivative. The main point is, though, that these are vectors while the differential is a scalar (and I chose to use $\nabla$ to distinguish it from $d$). – Zohar Levi Jul 25 '20 at 02:23
  • No, the differential is not a scalar. It's a linear map. – Ted Shifrin Jul 25 '20 at 03:41
  • Fine; does it map to a scalar or a vector? What about the Jacobian/derivative? – Zohar Levi Jul 25 '20 at 07:52
2

One thing I think differential topology does really nicely is to disambiguate spaces which look identical in Euclidean space. By making it clear that the spaces are different, it's also clear that the objects which inhabit them are different.

Given a map between differentiable manifolds $f: M \to N$, the differential (or total derivative, or pushforward) of $f$ at $p\in M$ is

$$\text d_p f: T_pM \to T_q N$$

where $q := f(p)$. In other words, it maps tangent vectors at $p$ to tangent vectors at $q$. Since each space of tangent vectors is indeed a vector space, each $\text d_p f$ is a linear map.

Given a choice of bases for $T_p M$ and $T_q N$ (or more accurately, a choice of local coordinates on $M$ and $N$ near $p$ and $q$), there is a unique matrix $J_p f \in \Bbb R^{n\times m}$ representing the linear map $\text d_p f$, where $m := \dim M$ and $n := \dim N$. This matrix is called the Jacobian of $f$ at $p$.

In a fairly straightforward manner, we can "aggregate" these into a bundle map

$$\text df: TM \to TN$$

called the differential (or total derivative, or pushforward) of $f$. In general, the tangent bundles are not linear spaces. However, when $M = \Bbb R^m$ and $N = \Bbb R^n$, then $TM \simeq \Bbb R^m \times \Bbb R^m$ and $TN \simeq \Bbb R^n \times \Bbb R^n$. This gives us coordinates for the map

$$\text df: (p,v) \mapsto (q,\text d_pf(v))$$

where $\text d_p f$ is linear (as above). With a bit of massaging, we can turn this into

$$Df: \Bbb R^m \to \text{Hom}(\Bbb R^m,\Bbb R^n) \\ Df: p \mapsto \text d_pf$$

i.e. a linear map at each point in the domain, which can be represented as a matrix $Jf\in \Bbb R^{n\times m}$ with entries dependent on $p\in \Bbb R^m$. This matrix(-valued function) is called the Jacobian of $f$.
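In coordinates the aggregated map is easy to mimic numerically. A sketch (toy $f$ of my own; `numerical_jacobian` is a hypothetical forward-difference helper, not a library routine):

```python
import numpy as np

def numerical_jacobian(f, p, h=1e-7):
    """Matrix of d_p f in the standard bases, by forward differences."""
    m, n = len(f(p)), len(p)
    J = np.empty((m, n))
    for j in range(n):
        e = np.zeros(n); e[j] = h
        J[:, j] = (f(p + e) - f(p)) / h
    return J

def bundle_map(f):
    """df: TM -> TN in coordinates, (p, v) |-> (f(p), d_p f(v))."""
    return lambda p, v: (f(p), numerical_jacobian(f, p) @ v)

f  = lambda p: np.array([p[0]*p[1], p[0] + p[1]**2])   # toy f: R^2 -> R^2
df = bundle_map(f)
q, w = df(np.array([1.0, 2.0]), np.array([0.0, 1.0]))
print(q, w)   # base point q = f(p) and pushed-forward vector d_p f(v)
```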

Alex Jones
  • 10,028
1

No. The derivative of the map $f$ is the Jacobian, $J$: $$ \frac{\mathrm{d}f}{\mathrm{d}x} = J \text{.} $$ Then the relation between the differentials, $$ \mathrm{d}f = J \, \mathrm{d}x \text{,} $$ is "algebra". (It's not. It's a lot of machinery for handling linear approximations. But it looks like algebra due to a judicious choice of notation.)

Eric Towers
  • 70,953
  • "No" what? And the Leibniz-notation "algebra" that you are referring to may hold for univariate, scalar functions, but not in general. Here, $df$ and $dx$ are vectors. – Zohar Levi Jul 21 '20 at 06:27
  • @ZoharLevi : There is only one question in the Question and it is the last sentence: "However, if f is considered a "mapping", then the differential of the mapping df is equal to the Jacobian J?" The answer to that question is "No." – Eric Towers Jul 21 '20 at 13:19
  • @ZoharLevi : The derivative of the map $f$ with respect to its input is the local linear approximation $J$. This is a map from displacement vectors (from $\vec{x}$) in the tangent space (at $\vec{x}$) of the domain space of $f$ to displacement vectors (from $f(\vec{x})$) in the tangent space (at $f(\vec{x})$) of the codomain of $f$. In your setting, it is the linear map $J$. – Eric Towers Jul 21 '20 at 13:25
  • @ZoharLevi : The "algebra" is not even true for univariate, scalar functions. The indecomposable object $\frac{\mathrm{d}f}{\mathrm{d}x}$ is not a ratio of differentials. As I wrote in the Answer, both equations are correct, but one does not pass between them by operations in some field, but instead by first defining derivatives and differentials, then showing that it is a fortuitous feature of our notation that something as simple as a (formal) field operation can syntactically transform the one equation into the other, (continued...) – Eric Towers Jul 21 '20 at 13:31
  • @ZoharLevi : but only by disregarding the type mismatch between the derivative $J$ and the differential $J , \mathrm{d}x$. – Eric Towers Jul 21 '20 at 13:33
  • That wasn't a yes/no question. It started with the title and ended with "how come?". If you think that this isn't the definition, then see e.g. https://en.wikipedia.org/wiki/Pushforward_(differential)#The_differential_of_a_smooth_map https://math.stackexchange.com/questions/2134224/differential-of-a-map Now, you don't need to convince me that it's not the differential since that was my point. Nevertheless, this is how many define it, it's inconsistent--actually wrong, and the question is why (do they do that..). – Zohar Levi Jul 21 '20 at 22:47
  • By the way, when I said $df=J dx$, I didn't do some silly Leibniz algebra. It was derived from the derivative definition, e.g. the Jacobian identification table in Magnus19. – Zohar Levi Jul 21 '20 at 22:59
  • @ZoharLevi : Your Question completely fails to ask your question. The term "differential" has many meanings. What else would you call the exterior derivative of an abstract (argument-less) function? – Eric Towers Jul 22 '20 at 02:51
  • First, my function has arguments, and I expressed everything in coordinates. So, exterior calc has nothing to do with this. Actually, the exterior calc definition does make sense: the differential is the directional derivative, i.e. $J$ the derivative times the direction $dx$. – Zohar Levi Jul 22 '20 at 06:50
  • @ZoharLevi : $\mathrm{d}f(x)$ has arguments, $\mathrm{d}f$ does not. You ask about argumentless $\mathrm{d}f$. – Eric Towers Jul 22 '20 at 14:01
  • Fine, edited to clarify. Still looking for an answer. – Zohar Levi Jul 22 '20 at 22:57
1

The way I stay sane with all this is to remember that everything can be derived from considering directional derivatives.

Given a function $f: A \rightarrow B$, where each of $A, B$ can be the real line, an open subset of $\mathbb{R}^n$ or an open subset of the space of $n$-by-$m$ matrices or a manifold, then the directional derivative of $f$ at $p\in A$ in the direction $v$ is defined to be $$ D_vf(p) = (f\circ c)'(0), $$ where $c: I \rightarrow A$ is a curve such that $c(0)=p$ and the velocity of $c$ at $p$ is $c'(0)=v$. This works because $f\circ c$ is a curve in $B$ and therefore its derivative is a velocity vector at $f(c(0))$.

The question now is what kind of thing is a velocity vector $v$? This can be worked out on a case by case basis. If $A$ is an open set in $\mathbb{R}^n$, then $v$ can be any vector in $\mathbb{R}^n$. If $A$ is an open set of $n$-by-$m$ matrices, then $v$ can be any $n$-by-$m$ matrix. If $A$ is a manifold, then $v$ can be any element of $T_pA$.

Once you have a clear understanding of what a directional derivative is, then you can fix $p$ and observe that the directional derivative defines a map from the set of all possible velocity vectors at $p\in A$ to the set of all possible velocity vectors at $f(p) \in B$. This turns out to be a linear map and is called, depending on the context, the differential, derivative, or Jacobian of $f$.

For example, if $A$ is an open set of $n$-by-$m$ matrices and $B$ is an open set of $r$-by-$q$ matrices, then the derivative/differential/Jacobian of $f$ at $p \in A$ is a linear map from the space of $n$-by-$m$ matrices to the space of $r$-by-$q$ matrices. Note that $A$ and $B$ don't even have to be the same type of space. If $B$ is instead a manifold, then the differential of $f$ is a linear map from the space of $n$-by-$m$ matrices to the tangent space $T_{f(p)}B$.
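For instance, here is a quick NumPy sketch (my own toy example): with $f(X)=X^2$ on $2$-by-$2$ matrices and the curve $c(t)=P+tV$, the recipe gives $D_Vf(P)=PV+VP$, which a central difference confirms.

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.standard_normal((2, 2))      # base point: a 2x2 matrix
V = rng.standard_normal((2, 2))      # velocity vector: also a 2x2 matrix

f = lambda X: X @ X                  # toy f from matrices to matrices

# D_V f(P) = (f ∘ c)'(0) with c(t) = P + tV, via a central difference.
t = 1e-6
DVf = (f(P + t*V) - f(P - t*V)) / (2*t)

print(np.allclose(DVf, P @ V + V @ P))   # True: matches the hand computation
```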

So, for example, if I'm asked to compute the derivative of a function of a matrix, I always start by computing the directional derivative. I find doing anything else to be too confusing.

Deane
  • 10,298
  • "This turns out to be a linear map" - it can turn out to be a nonlinear map. "then the derivative/differential/Jacobian of $f$ at $p\in A$ is a linear map" - the derivative, differential, and Jacobian are different things. Sometimes they are used interchangeably, but that is typically abuse of terminology - and usually leads to confusion when going to a more general setting. The Jacobian is the coordinate representation of the linear map $df(x_0)$. The differential is typically $df$ which is linear in its second slot, but accepts $x_0$ in its first slot. – lightxbulb May 04 '25 at 17:29
  • @lightxbulb, my answer assumes the domain and codomain are finite-dimensional. If the function is sufficiently smooth ($C^1$ suffices) the directional derivative is always a linear function of the vector. It is of course not necessarily a linear function of the “first slot”, i.e., the point at which the derivative is being taken. – Deane May 04 '25 at 18:23
  • Certainly - it's minor nitpicks - though I must say those details did throw me off as a student. For example nowadays I have to work with FEM, where in the simplest setting the function space is $C^0$ and not $C^1$ - this also applies in general when one works with meshes in discrete differential geometry. As far as conflating $df(x_0)$ and $Jf(x_0)$ goes, I must say I really dislike it - it's partially why I wrote my extremely verbose answer. This conflation seemed to also be a confusion point of the OP. – lightxbulb May 04 '25 at 18:28
0

Resources

I have tried to give an answer where the definitions are chosen such that they will be compatible with as many fields as I could think of. For example one may choose to define the differential to not necessarily be linear, but I have not taken this approach. Most of what I have written is based on various resources that I have listed below, but I emphasize once again that the definitions in these resources may clash with other uses of differential in the literature.

I highly recommend looking at the presentation on "Differentiation in Linear Spaces" by Simovici - especially for the definition of differential. He also has examples in the second part of the presentation. For discussion of terminology and definitions around Fréchet and Gateaux derivatives and variations I have found this appendix on "Differentiation in Abstract Spaces" by Tapia to be very nice (jump to figure 7.1 for a visual summary). For counterexamples of Gateaux differentiable functions that are not Fréchet differentiable see the nice figures in this handout on calculus of variations by Slastikov and Kitavtsev. You can also look at "Introduction of Fréchet and Gateaux Derivative" by Bemardi and Enyari and the section on generalized derivatives in Jahn's book "Introduction to the Theory of Nonlinear Optimization".


Total/Fréchet derivative for $f:\mathbb{R}^n\to\mathbb{R}^m$

The (total) derivative of a function $f:\mathbb{R}^n\to\mathbb{R}^m$ at point $x_0\in \mathbb{R}^n$, is defined as the (bounded) linear map $L$ such that $$\lim_{v\to 0}\frac{\|f(x_0+v)-f(x_0)-L(v)\|}{\|v\|} = 0.$$ If there exists an $L$ satisfying the above, we say that $f$ is differentiable at $x_0$, and one can show that the derivative $L$ is unique (i.e. there is no $L'\ne L$ such that it also satisfies the above).

One typically uses the notation $D_{x_0}f, Df(x_0), d_{x_0}f, df(x_0), f'_{x_0}, f'(x_0)$ instead of $L$. The function-like notation is intentional, as will become evident when I define the differential in a subsection below. The above definition of the total derivative is equivalent to there existing a linear map $df(x_0)$ such that $$f(x_0+v) = f(x_0) + df(x_0)(v) + R(x_0,v), \quad \lim_{v\to 0}\frac{\|R(x_0,v)\|}{\|v\|} = 0.$$ That is, you can use it in the Taylor expansion.

Moreover, whenever the total derivative exists it agrees with the directional derivatives $$Df(x_0)(v) = \partial_v f(x_0) := \lim_{\epsilon\to 0}\frac{f(x_0+\epsilon v)-f(x_0)}{\epsilon}.$$ Note that the directional derivatives existing does not guarantee that the total derivative exists.
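The last caveat can be seen numerically (a NumPy sketch of the standard counterexample $f(x,y)=x^2y/(x^4+y^2)$, $f(0,0)=0$): every directional derivative at the origin exists, but they are not linear in $v$, so no total derivative can exist there.

```python
import numpy as np

# Standard counterexample: all directional derivatives at 0 exist,
# but they are not linear in v, so the total derivative cannot exist.
def f(x):
    if x[0] == 0.0 and x[1] == 0.0:
        return 0.0
    return x[0]**2 * x[1] / (x[0]**4 + x[1]**2)

def dir_deriv(v, eps=1e-8):
    v = np.asarray(v, dtype=float)
    return (f(eps * v) - 0.0) / eps        # f(0) = 0

print(dir_deriv([1, 1]))   # ≈ 1.0  (the limit is a^2/b for v = (a,b), b != 0)
print(dir_deriv([1, 0]))   # 0.0
print(dir_deriv([2, 1]))   # ≈ 4.0, yet (2,1) = (1,1) + (1,0): not additive in v
```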


The differential

Suppose that the total derivative of $f:\mathbb{R}^n\to\mathbb{R}^m$ exists on a subset $X\subseteq\mathbb{R}^n$. Then you can define the differential (see slide 6 of Simovici's presentation for this definition) as the function $\delta f : X \times \mathbb{R}^n \to \mathbb{R}^m$ such that $\delta f(x_0; v) = df(x_0)(v)$. Note that the total derivative of $f$ at $x_0$ is a linear map $df(x_0):\mathbb{R}^n\to\mathbb{R}^m$, while the differential is a function of two arguments $\delta f : X \times \mathbb{R}^n \to \mathbb{R}^m$ that is linear in its second argument. For convenience one often reuses the notation of the derivative with an omitted point evaluation argument $df, Df, f'$ for the differential $\delta f$. Then $df = \delta f$, i.e., $df: X\times \mathbb{R}^n\to\mathbb{R}^m$. You can think of this as producing the derivative $df(x_0)$ from the differential $df$ through currying.

On a separate note - the notation $\delta f$ is often reserved for the Gateaux variation, so I would really use $df, Df, f'$ unlike Simovici's $\delta f$.


Relation to the derivative for $f:\mathbb{R}\to\mathbb{R}$

In single variable calculus one defines the derivative at $x_0$ as $$\frac{df}{dx}(x_0) := \lim_{v\to 0} \frac{f(x_0+v)-f(x_0)}{v}.$$ The above says that for any $\epsilon>0$ we can find a $\delta(\epsilon)>0$ such that for $|v|<\delta$ we have $$|(f(x_0+v)-f(x_0))/v-\frac{df}{dx}(x_0)|<\epsilon.$$ But then this is also equivalent to $$\lim_{v\to 0} \frac{|f(x_0+v)-f(x_0)-\frac{df}{dx}(x_0)v|}{|v|} = 0.$$

In other words $df(x_0)(v) = \frac{df}{dx}(x_0)\cdot v$. Note that $df(x_0)$ is the total derivative, which is a linear map, while what we call the derivative in basic calculus, i.e. $df/dx(x_0) = df(x_0)(1)$, is really the coordinate representation of $df(x_0):\mathbb{R}\to\mathbb{R}$, similar to how matrices are coordinate representations of linear maps (see my section on the Jacobian below). I suppose that one uses the term "derivative" for both the coordinate representation and the map because in $\mathbb{R}$ it doesn't matter too much, as you have a standard basis. It's similar to how one informally refers to a matrix $M\in\mathbb{R}^{m\times n}$ as a linear map, even though it's really $M\cdot{} :\mathbb{R}^n\to\mathbb{R}^m$ that is the linear map ($\cdot$ here being matrix-vector multiplication). If you go to abstract vector spaces where there is no canonical choice of basis, I would argue that it's better to reserve the term derivative for the linear map, as the Jacobian depends on the choice of basis, while the definition of $df(x_0)$ does not depend on the choice of basis (you can find an elaboration of this argument in the introduction of Tapia's "Differentiation in Abstract Spaces").
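A one-line numerical restatement of this (NumPy; $f=\sin$ is my own stand-in): the calculus derivative is the value of the linear map at $1$, and $df(x_0)(v)$ approximates the change in $f$.

```python
import numpy as np

f, fprime = np.sin, np.cos        # a toy f and its classical derivative f'

x0 = 0.7
df = lambda v: fprime(x0) * v     # the linear map df(x0): R -> R

print(df(1.0) == fprime(x0))      # True: df/dx(x0) = df(x0)(1)
print(np.isclose(f(x0 + 1e-6) - f(x0), df(1e-6)))   # True: df(x0)(v) ≈ Δf
```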


The Jacobian

In finite-dimensional spaces any linear map $L:U\to V$ between vector spaces $U$ and $V$ can be written as a matrix with respect to specific bases of $U$ and $V$. Suppose that $A=[a_1,\ldots,a_n] \in U^{1\times n}$ is a basis for $U$ and $B=[b_1,\ldots,b_m] \in V^{1\times m}$ is a basis for $V$. The canonical dual basis for the continuous dual space $V^*$ corresponding to $B$ is $b^1,\ldots,b^m: V \to\mathbb{R}$ satisfying the biorthogonality condition $b^i(b_j) = \delta^i_j$. Then the coordinate representation of $L$ wrt the two bases is given as $$([L]^A_B)^i_j = b^i(L(a_j)) \implies [L]^A_B\in\mathbb{R}^{m\times n}.$$ This is just a matrix, but note that it depends on the choice of bases.

If you take $U=\mathbb{R}^n$ and $V=\mathbb{R}^m$ one usually chooses the standard basis $a_j = e_j$ and $b^i(w) = e^i(w) = e_i^T\cdot w = w^i$. So now suppose you have the derivative $df(x_0):\mathbb{R}^n\to\mathbb{R}^m$, then its coordinate representation w.r.t. the standard basis is $$([df(x_0)]^A_B)^i_j = e^i(df(x_0)(e_j)) = e^i(\partial_{e_j} f(x_0)) = (\partial_{e_j} f(x_0))^i = \partial_{e_j} f^i(x_0).$$ This is the Jacobian matrix $Jf(x_0)=[df(x_0)]^A_B$ at $x_0$. To make this even clearer you can write the above as \begin{align} f &= \begin{bmatrix} f^1 \\ \vdots \\ f^m\end{bmatrix} : \mathbb{R}^n\to\mathbb{R}^m \\ Jf(x_0) &= \begin{bmatrix} \partial_{e_1}f^1(x_0) & \ldots & \partial_{e_n} f^1(x_0) \\ \vdots & & \vdots \\ \partial_{e_1}f^m(x_0) & \ldots & \partial_{e_n} f^m(x_0) \end{bmatrix} \implies df(x_0)(v) = J_f(x_0)\cdot v. \end{align}
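In coordinates all of this is just a change of basis. A small NumPy sketch (random map and bases as stand-ins): the dual-basis recipe $([L]^A_B)^i_j = b^i(L(a_j))$ amounts to the matrix product $B^{-1}LA$ when the bases are stored as the columns of matrices $A$ and $B$.

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.standard_normal((3, 2))   # a linear map R^2 -> R^3, standard basis
A = rng.standard_normal((2, 2))   # columns a_1, a_2: a basis of R^2
B = rng.standard_normal((3, 3))   # columns b_1, b_2, b_3: a basis of R^3

# The dual functionals b^i are the rows of B^{-1}: biorthogonality
# b^i(b_j) = delta^i_j is exactly B^{-1} B = I.  Stacking the numbers
# ([L]^A_B)^i_j = b^i(L(a_j)) therefore gives the matrix B^{-1} L A.
L_AB = np.linalg.inv(B) @ L @ A

# Consistency check: take A-coordinates u, map them with [L]^A_B, and
# reassemble with B; this must agree with applying L to the vector A u.
u = rng.standard_normal(2)
print(np.allclose(B @ (L_AB @ u), L @ (A @ u)))   # True
```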

You could also write $df(x_0) = \sum_{j=1}^n \partial_{e_j}f(x_0)\, e^j$ - remember that $e^j:\mathbb{R}^n\to\mathbb{R}$, so this is indeed a function from $\mathbb{R}^n$ to $\mathbb{R}^m$. This way of writing it makes it obvious that it is related to the exterior derivative from differential geometry, although there one uses $dx^i$ for $e^i$ and $\frac{\partial f}{\partial x^i}(x_0)$ for $\partial_{e_i}f(x_0)$, that is $$df(x_0) = Jf(x_0) \cdot dx = \sum_{i=1}^n \frac{\partial f}{\partial x^i}(x_0)\,dx^i \implies df = \sum_{i=1}^n \frac{\partial f}{\partial x^i}\,dx^i.$$

If your spaces were not $U=\mathbb{R}^m$ and $V=\mathbb{R}^n$, and if you didn't have a standard/canonical choice of basis, then it may not be as obvious wrt which bases you have defined the Jacobian so you might have to specify that as $[df(x_0)]^A_B$ instead of just writing $Jf(x_0)$. When you see $Jf(x_0)$ it is defined w.r.t. some bases which should be clear from the context where this appears. The point is that $df(x_0)$ is the primary notion, not $Jf(x_0)$, the latter is just a coordinate representation of the linear map. Of course, for computations, you usually start by computing $Jf(x_0)$ and do not care that $df(x_0)$ is the primary notion, as you typically work in some basis.


Generalizations: Fréchet and Gateaux derivatives

You can take $f:X\to T$ where $(S,\|\cdot\|_S)$ and $(T,\|\cdot\|_T)$ are normed vector spaces (they can be infinite dimensional too - then consider Banach spaces), and $X$ is an open subset of $S$. Then the Fréchet derivative $df(x_0): S \to T$ of $f$ at $x_0$ is the bounded linear map that satisfies $$\lim_{v\to 0}\frac{\|f(x_0+v)-f(x_0)-df(x_0)(v)\|_T}{\|v\|_S} = 0.$$ Equivalently $$f(x_0+v) = f(x_0) + df(x_0)(v) + R(x_0,v), \quad \lim_{v\to 0}\frac{\|R(x_0,v)\|_T}{\|v\|_S} = 0.$$

So the total derivative is the Fréchet derivative for $S=\mathbb{R}^n$ and $T=\mathbb{R}^m$ with the Euclidean norms $\|\cdot\|_S=\|\cdot\|_2$ and $\|\cdot\|_T=\|\cdot\|_2$.

There's also a Gateaux derivative at $x_0$ defined as the bounded linear map $df_G(x_0)$ such that for any $v\in S$ $$df_G(x_0)(v) = \lim_{\epsilon\to 0}\frac{f(x_0+\epsilon v)-f(x_0)}{\epsilon}.$$ Equivalently you can write $$f(x_0+v) = f(x_0) + df_G(x_0)(v) + R(x_0,v), \quad \lim_{\epsilon\to 0}\frac{R(x_0,\epsilon v)}{\epsilon} = 0.$$

It's a weaker derivative than the Fréchet derivative in the sense that we care only about convergence along lines, while for the Fréchet derivative we require convergence along any path. This means that any Fréchet derivative is also a Gateaux derivative, but the converse is not necessarily true. You can find various examples illustrating the differences between the two in the figures in this handout on calculus of variations by Slastikov and Kitavtsev. Typically one does not write $df_G$ and rather just writes $df$, so whether $df$ refers to the Gateaux or Fréchet derivative should typically be deduced from the context.
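To see the gap concretely, here is a NumPy sketch of one standard counterexample, $f(x,y)=x^3y/(x^6+y^2)$ with $f(0,0)=0$: the Gateaux derivative at the origin exists and is the zero map, but the Fréchet defining quotient does not vanish along the curve $y=x^3$.

```python
import numpy as np

# Gateaux but not Fréchet differentiable at 0 (a standard counterexample):
def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x**3 * y / (x**6 + y**2)

# Along every line through 0 the difference quotient tends to 0,
# so the Gateaux derivative at 0 is the zero map (linear, bounded):
for t in [1e-2, 1e-4, 1e-6]:
    print(f(t, t) / t)                       # -> 0

# But along the curve y = x^3 the Fréchet quotient blows up:
for s in [1e-2, 1e-4, 1e-6]:
    v = np.array([s, s**3])
    print(abs(f(*v)) / np.linalg.norm(v))    # ≈ 0.5/s -> infinity
```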

The Gateaux derivative is quite useful for example in the calculus of variations. Note also that there are conflicting definitions of Gateaux differentiability in the literature, where for example linearity may not be required. Personally here I used the definitions from the presentation by Simovici "Differentiation in Linear Spaces" and Tapia's "Differentiation in Abstract Spaces". As in Tapia's treatment I would rather reserve the word variation for the setting where the "derivative" is not necessarily linear or bounded.


The gradient $\nabla f$ for $f:\mathbb{R}^n\to\mathbb{R}$

While in the above definitions the Fréchet and Gateaux derivatives required normed vector spaces (in fact a topological vector space is sufficient for a Gateaux variation), the definition of the gradient requires an inner product. Then you can define the (Gateaux) gradient $\nabla f(x_0)$ as the element that satisfies $df(x_0)(v) = \langle \nabla f(x_0), v\rangle$ where $\langle\cdot,\cdot\rangle$ is the inner product and $df(x_0)$ is the Gateaux derivative. You will notice a notational clash if you read the article on the (Gateaux) gradient in the Encyclopedia of Math and in Tapia's treatment. I would prefer to stick to $\nabla f$ for gradient, and reserve $f'$ for derivatives. In either case, the gradient technically depends on your choice of inner product. For $f:\mathbb{R}^n\to\mathbb{R}$ with respect to the standard dot product it simply becomes $$\nabla f(x_0) = \begin{bmatrix} \partial_{e_1} f(x_0) \\ \vdots \\ \partial_{e_n} f(x_0) \end{bmatrix}.$$ But if I were to define an inner product such that the Gramian is $G_{ij} = \langle e_i, e_j\rangle$ (i.e. $\langle u,v\rangle = u^TGv$), then the gradient w.r.t. this inner product is given as $$\nabla_G f(x_0) = G^{-1}\begin{bmatrix} \partial_{e_1} f(x_0) \\ \vdots \\ \partial_{e_n} f(x_0) \end{bmatrix} = G^{-1}(Jf(x_0))^{T}.$$ You can verify that with this definition $df(x_0)(v) = \langle \nabla_G f(x_0), v\rangle$. Note that the gradient is a vector, while the derivative is a linear map - this is often confused, and you can even find misleading answers on physics.stack and math.stack that conflate the two.
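A quick NumPy check of the $G$-gradient formula (a toy $f$ and a made-up SPD Gramian $G$): the gradient vector changes with the inner product, but $\langle\nabla_G f(x_0),v\rangle_G$ still returns the same directional derivative $df(x_0)(v)$.

```python
import numpy as np

# Toy f: R^2 -> R and its hand-computed (standard) gradient.
f     = lambda x: x[0]**2 * x[1] + np.sin(x[1])
gradf = lambda x: np.array([2*x[0]*x[1], x[0]**2 + np.cos(x[1])])

x0 = np.array([1.0, 2.0])
G  = np.array([[2.0, 1.0],
               [1.0, 3.0]])               # a made-up SPD Gramian

grad_G = np.linalg.solve(G, gradf(x0))    # grad_G f(x0) = G^{-1} (Jf(x0))^T

v    = np.array([0.3, -0.5])
df_v = gradf(x0) @ v                      # df(x0)(v), the directional derivative
print(np.isclose(df_v, grad_G @ G @ v))   # True: df(x0)(v) = <grad_G f(x0), v>_G
```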

A more interesting example is to consider the Gateaux gradient of the functional $E(f) = \frac{1}{2}\int \|\nabla f\|^2$, w.r.t. the standard inner product $\langle f, g\rangle = \int f g$. You can show that it is $\nabla E(f) = -\Delta f$, where $\Delta$ is the Laplacian.


The gradients $\nabla f^i$ for $f:\mathbb{R}^n\to\mathbb{R}^m$

If you have a vector-valued function $f:\mathbb{R}^n\to\mathbb{R}^m$ you could define gradients for each component: \begin{equation} f = \begin{bmatrix} f^1 \\ \vdots \\ f^m\end{bmatrix} \implies df^i(x_0) (v) = \langle \nabla_G f^i(x_0), v\rangle \implies \nabla_G f^i = G^{-1}\begin{bmatrix} \partial_{e_1} f^i(x_0) \\ \vdots \\ \partial_{e_n} f^i(x_0) \end{bmatrix}. \end{equation}


Relation to exterior derivative and differential forms

Let $f:\mathbb{R}^n\to\mathbb{R}$, then the exterior derivative of $f$ is $df = \sum_{i=1}^n \partial_{e_i} f\, e^i$. This is precisely the differential $df$ of $f$, which happens to be a one-form (a field of linear functionals). In differential geometry one often writes $dx^i$ for $e^i$ and $\frac{\partial f}{\partial x^i}$ for $\partial_{e_i} f$. This is the case because $f$ is typically defined over some manifold $M$ and you have coordinate functions $x^i:U\subseteq M\to\mathbb{R}$. Then the $\frac{\partial}{\partial x^j}|_{p}$ form a basis for the tangent space $T_pM$ of $M$ at $p$, and the $dx^i$ form the canonical dual basis for the dual space $(T_pM)^*$ such that $dx^i(\frac{\partial}{\partial x^j}|_{p}) = \delta^i_j$. In exterior calculus the exterior derivative is also defined for higher order forms, however. That is, one may consider fields of antisymmetric $k$-linear maps, i.e. differential $k$-forms, and define the exterior derivative to produce a $(k+1)$-form.

lightxbulb
  • 2,378