A differential geometer will tell you that the differential $Df|_p$ of a smooth function $f : \mathbb R^n \to \mathbb R$ at a point $p$ is the linear map $\mathbb R^n \to \mathbb R$ which sends a column vector $v \in \mathbb R^n$ to the inner product $(\partial_1 f|_p,\dots,\partial_n f|_p) \cdot v$. The (total) differential $Df$ is the smooth map $\mathbb R^n \to (\mathbb R^n)^*$ (the dual as a vector space)¹ which sends $p$ to $Df|_p$. It makes sense to define the differential for a function $f : \mathbb R^n \to \mathbb R^m$ too; it will be a smooth map $\mathbb R^n \to \operatorname{Hom}(\mathbb R^n, \mathbb R^m)$.¹
The gradient $\nabla f$ is a vector field (a tangent vector at each point) characterized by the property that $\langle (\nabla f)(p), v \rangle = Df|_p(v)$ for all $v$, where $\langle \cdot, \cdot \rangle$ is the standard inner product on $\mathbb R^n$. That is, $(\nabla f)(p) = (\partial_1 f|_p,\dots,\partial_n f|_p)^T$.
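To make the distinction concrete, here is a small worked example; the function $f(x,y) = x^2 y$ and the point $p = (1,2)$ are chosen purely for illustration. The partial derivatives are $\partial_1 f = 2xy$ and $\partial_2 f = x^2$, so

$$
Df|_p(v) = (4,\ 1)\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = 4v_1 + v_2,
\qquad
(\nabla f)(p) = \begin{pmatrix} 4 \\ 1 \end{pmatrix},
\qquad
\langle (\nabla f)(p), v \rangle = 4v_1 + v_2 = Df|_p(v).
$$

The same two numbers appear in both objects, but $Df|_p$ is a row vector (a linear functional on $\mathbb R^2$), while $(\nabla f)(p)$ is a column vector (an element of $\mathbb R^2$ itself).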
The differential $Df$ generalizes to any smooth map $f : M \to N$ between differentiable manifolds $M, N$: at each point $p \in M$ it is a linear map $Df|_p : T_pM \to T_{f(p)}N$ from the tangent space at $p$ to the tangent space at its image $f(p)$. If $N = \mathbb R$, then $Df$ is a differential $1$-form. The gradient generalizes to Riemannian or pseudo-Riemannian manifolds, where each tangent space carries a non-degenerate inner product that lets us convert the covector $Df|_p$ into a tangent vector.
The differential of a smooth real-valued function is a section of the cotangent bundle $T^*M$, that is, an element of $\Omega^1(M)$. Its gradient is a section of the tangent bundle $TM$, which is the dual bundle of $T^*M$.
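As a sketch of how the metric enters, take local coordinates $x^1,\dots,x^n$ and write $g_{ij}$ for the components of the metric and $g^{ij}$ for the inverse matrix (this notation is an assumption of the example and is not used elsewhere in this answer). Then

$$
Df = \sum_j \partial_j f \, dx^j,
\qquad
\nabla f = \sum_{i,j} g^{ij}\, \partial_j f \, \frac{\partial}{\partial x^i}.
$$

For instance, on $\mathbb R^2 \setminus \{0\}$ in polar coordinates $(r,\theta)$, the Euclidean metric is $g = dr^2 + r^2\, d\theta^2$, so $g^{rr} = 1$ and $g^{\theta\theta} = r^{-2}$, which gives

$$
Df = \partial_r f \, dr + \partial_\theta f \, d\theta,
\qquad
\nabla f = \partial_r f \, \frac{\partial}{\partial r} + \frac{1}{r^2}\, \partial_\theta f \, \frac{\partial}{\partial \theta}.
$$

The components of $Df$ are just the partial derivatives in any coordinate system, whereas the components of $\nabla f$ pick up the factor $r^{-2}$ from the metric. The two only look alike in coordinates where $g_{ij}$ is the identity matrix.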
¹ The differentiable structures on $(\mathbb R^n)^*$ and $\operatorname{Hom}(\mathbb R^n, \mathbb R^m)$ are obtained by transporting the structure of some $\mathbb R^N$ along a linear bijection, and the result does not depend on the choice of linear bijection. That is, you can take any basis, and a map will be smooth iff its components in that basis are smooth.