First things first, note that what Schutz terms a gradient:
In fact, the gradient is a one-form, and understanding why helps us to
develop an intuitive understanding of one-forms.
is typically termed a differential in most modern mathematical books I have seen. Schutz is by no means the only one to do this, however: Misner, Thorne, and Wheeler do the same in their book Gravitation, Sean Carroll does so in his lecture notes on GR, and you can find something similar even on this page. So at this point I assume this is just accepted physics convention. Most modern mathematical books I have read, however, define the gradient of $f$ at $x$ as the vector $\nabla f(x)$ such that $\langle \nabla f(x), v\rangle = df(x)(v)$, where $df(x)$ is the differential of $f$ at $x$ and $\langle \cdot, \cdot\rangle$ is an inner product. Note that with this definition you would get different gradients for different inner products, and if you don't have an inner product you wouldn't even be able to define the gradient, while the differential can be defined even without an inner product, e.g., in Banach spaces. Moreover, if the function is not differentiable, the gradient would also not be defined, in contrast to the definition on the encyclopediaofmath page (where they are really defining the Jacobian matrix).
Basis and Canonical Dual Basis in Finite-Dimensional Spaces
A basis $A = \begin{bmatrix} \vec{a}_1 & \ldots & \vec{a}_n\end{bmatrix} \in U^{1\times n}$, for a vector space $(U,\mathbb{F},+,\cdot)$, is a set of vectors in $U$ such that there exists a unique coordinate representation $[\vec{u}]^A$ for any vector $\vec{u}\in U$:
\begin{equation*}
\vec{u} = \sum_{j=1}^n \vec{a}_j ([\vec{u}]^A)^j = A[\vec{u}]^A, \quad [\vec{u}]^A\in\mathbb{F}^n.
\end{equation*}
Mathematically the existence part is typically stated as "$A$ spans $U$" ($\textrm{span}(A)=U$), while the uniqueness is in the form of "the vectors in $A$ are linearly independent".
A natural question is how to find the coordinates $[\vec{u}]^A$ of $\vec{u}$ in the basis $A$. That is, we want functions $\alpha^i:U\to\mathbb{F}$ such that $\alpha^i(\vec{u}) = ([\vec{u}]^A)^i$. If we choose the $\alpha^i$ to be linear and such that $\alpha^i(\vec{a}_j) = \delta^i_j$ then this holds, since
\begin{equation*}
\alpha^i(\vec{u}) = \alpha^i\left(\sum_{j=1}^n \vec{a}_j ([\vec{u}]^A)^j \right) \stackrel{linearity}{=}
\sum_{j=1}^n \alpha^i(\vec{a}_j) ([\vec{u}]^A)^j \stackrel{\alpha^i(\vec{a}_j) = \delta^i_j}{=} \sum_{j=1}^n \delta^i_j ([\vec{u}]^A)^j = ([\vec{u}]^A)^i.
\end{equation*}
The above functionals are the canonical dual basis $A^* = \begin{bmatrix} \alpha^1 \\ \vdots \\ \alpha^n\end{bmatrix}\in (U^*)^{n\times 1}$ of $A$ for the dual space $(U^*,\mathbb{F},+,\cdot)$ where $U^*$ is the space of (continuous) linear maps from $U$ to $\mathbb{F}$. They span the space since any $\omega \in U^*$ can be written as $\omega = \sum_{j=1}^n \omega(\vec{a}_j) \alpha^j$ (their linear independence follows directly from $\alpha^i(\vec{a}_j) = \delta^i_j$):
\begin{equation*}
\omega(\vec{u}) = \omega\left(\sum_{j=1}^n \vec{a}_j \alpha^j(\vec{u})\right) \stackrel{linearity}{=}
\sum_{j=1}^n \omega(\vec{a}_j)\alpha^j(\vec{u}) \implies
\omega = \sum_{j=1}^n \omega(\vec{a}_j)\alpha^j = \sum_{j=1}^n ([\omega]_A)_j\alpha^j = [\omega]_AA^*.
\end{equation*}
So you see that $\alpha^i$ give the coordinates $([\vec{u}]^A)^i = \alpha^i(\vec{u})$ of (contravariant) vectors in the basis $A$, while $\vec{a}_j$ give the coordinates $([\omega]_A)_j = \omega(\vec{a}_j)$ of linear functionals (covariant vectors) in the basis $A^*$. To reiterate:
\begin{align*}
\vec{u} &= A[\vec{u}]^A = \sum_{j=1}^n \vec{a}_j ([\vec{u}]^A)^j = \sum_{j=1}^n \vec{a}_j\alpha^j(\vec{u}) = AA^* \vec{u} , \\
\omega &= [\omega]_AA^* = \sum_{j=1}^n ([\omega]_A)_j \alpha^j = \sum_{j=1}^n \omega(\vec{a}_j)\alpha^j = \omega AA^*. \\
\end{align*}
The products $A[\vec{u}]^{A}$, $[\omega]_AA^*$, and $AA^*$ are standard matrix-matrix multiplications:
\begin{equation*}
AA^* = \begin{bmatrix} \vec{a}_1 & \ldots & \vec{a}_n \end{bmatrix}
\begin{bmatrix}
\alpha^1 \\ \vdots \\ \alpha^n
\end{bmatrix} = \sum_{j=1}^n \vec{a}_j \alpha^j.
\end{equation*}
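To make the matrix picture concrete, here is a small numerical sketch (my own illustration, not part of the original argument) using numpy with an invertible $A\in\mathbb{R}^{3\times 3}$, so that the dual basis functionals $\alpha^i$ are simply the rows of $A^{-1}$:

```python
import numpy as np

# A basis for R^3: the vectors a_1, a_2, a_3 are the columns of A.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

# Canonical dual basis: alpha^i is the i-th row of A^{-1},
# since A^{-1} A = I means alpha^i(a_j) = delta^i_j.
A_dual = np.linalg.inv(A)

u = np.array([2.0, -1.0, 3.0])
coords_u = A_dual @ u                       # ([u]^A)^i = alpha^i(u)
assert np.allclose(A @ coords_u, u)         # u = A [u]^A

omega = np.array([[1.0, 2.0, 3.0]])         # a covector, acting by a row-vector product
coords_omega = omega @ A                    # ([omega]_A)_j = omega(a_j)
assert np.allclose(coords_omega @ A_dual, omega)   # omega = [omega]_A A^*
```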
Canonical Dual Basis: Polynomial Space Example
Some examples are in order. Consider the space of real polynomials of degree at most $n-1$:
$$U = \left\{p:\mathbb{R}\to\mathbb{R}\,|\, p(x) = \sum_{j=0}^{n-1} c^j x^j,\,\, c^j\in\mathbb{R}\right\},$$
with the usual function-function addition $(p+q)(x) = p(x)+q(x)$, scalar-function multiplication $(s\cdot p)(x) = s\cdot p(x)$, and the field being $\mathbb{F} = \mathbb{R}$. The basis is explicit in the above definition, namely the monomial basis $\vec{a}_j(x) = x^{j-1}$. The dual basis is somewhat funny: $\alpha^i = ((i-1)!)^{-1}\delta_0 \circ d_x^{i-1}$ (here $\delta_0$ is the evaluation functional at zero and $d_x^{i-1}$ is the $(i-1)$-th derivative operator). Indeed you can check that:
\begin{align*}
\alpha^i(\vec{a}_j) &= ((i-1)!)^{-1}\,\delta_0\bigl( d_x^{i-1}\, x^{j-1}\bigr) \\
&= \delta_0\left(
\begin{cases}
((i-1)!)^{-1}(j-1)(j-2)\cdots (j-i+1)\, x^{j-i}, &\text{for} \,\, j\geq i, \\
0, &\text{for} \,\, j<i
\end{cases}
\right) \\
&=
\begin{cases}
0, &\text{for} \,\, j<i, \\
1, &\text{for} \,\, j=i, \\
0, &\text{for} \,\, j>i
\end{cases}
\\
&= \delta^i_j.
\end{align*}
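To double-check this with a computer algebra system, here is a short sympy sketch (the helper `alpha` and the choice $n=5$ are mine):

```python
import sympy as sp
from math import factorial

x = sp.symbols('x')
n = 5

# Monomial basis a_j(x) = x^(j-1), j = 1, ..., n.
basis = [x**(j - 1) for j in range(1, n + 1)]

# Dual basis alpha^i(p) = (1/(i-1)!) * (d^(i-1) p / dx^(i-1)) evaluated at x = 0.
def alpha(i, p):
    return sp.diff(p, x, i - 1).subs(x, 0) / factorial(i - 1)

# Check alpha^i(a_j) = delta^i_j.
for i in range(1, n + 1):
    for j in range(1, n + 1):
        assert alpha(i, basis[j - 1]) == (1 if i == j else 0)

# The coordinates of a polynomial in the monomial basis are exactly its coefficients.
p = 3 - 2*x + 5*x**3
print([alpha(i, p) for i in range(1, n + 1)])   # [3, -2, 0, 5, 0]
```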
Canonical Dual Basis: Euclidean Subspace Example
Another example consists of an $n$-dimensional subspace $U$ of $\mathbb{R}^{m\times 1}$ (i.e. $n\leq m$). Let $A=\begin{bmatrix} \vec{a}_1 & \ldots & \vec{a}_n\end{bmatrix} \in\mathbb{R}^{m\times n}$ be a basis for $U$. Let $A^{-}\in\mathbb{R}^{n\times m}$ be any left inverse of $A\in\mathbb{R}^{m\times n}$ (i.e. $A^{-}A = I_n$), and $(A^{-})^i \in \mathbb{R}^{1\times m}$ be the $i$-th row of $A^{-}$. Define $\alpha^i(\vec{u}) = (A^{-})^i*\vec{u}$, where $*$ is the standard matrix-matrix multiplication. It is then clear that $\alpha^i(\vec{a}_j) = \delta^i_j$ because $A^{-}A = I_n$. I intentionally emphasized $*$, since that's the main difference between the row-vector $(A^{-})^i$ and the functional $\alpha^i$. One may object that for $n<m$ the above does not provide a unique canonical dual basis since there are infinitely many left inverses $A^{-}$. However, if their domains are restricted to $U$, they all coincide, and thus the canonical dual basis is indeed unique in $U^*$ (it is however not unique in $(\mathbb{R}^m)^*$).
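Here is a small numpy sketch of this example (my own construction; I use the Moore-Penrose pseudoinverse as one particular left inverse and then build a second left inverse to illustrate the non-uniqueness on $\mathbb{R}^m$ versus the uniqueness on $U$):

```python
import numpy as np

m, n = 4, 2

# A basis for a 2-dimensional subspace U of R^4 (the columns of A).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# One particular left inverse: the Moore-Penrose pseudoinverse, with A^- A = I_n,
# so the i-th row of A^- acts as the dual basis functional alpha^i.
A_left = np.linalg.pinv(A)
assert np.allclose(A_left @ A, np.eye(n))

# The alpha^i recover the coordinates of any u in U.
c = np.array([2.0, -3.0])
u = A @ c
assert np.allclose(A_left @ u, c)

# A different left inverse (still A^- A = I_n) agrees with the first one on U,
# even though it is a different matrix on all of R^4.
C = np.ones((n, m))
A_left2 = A_left + C @ (np.eye(m) - A @ A_left)
assert np.allclose(A_left2 @ A, np.eye(n))
assert np.allclose(A_left2 @ u, c)
assert not np.allclose(A_left2, A_left)
```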
Canonical Dual Basis: Partial Derivatives
Consider the space of directional derivative operators evaluated at $p\in\mathbb{R}^n$ that act on continuously differentiable functions with domain $\mathbb{R}^n$ and co-domain $\mathbb{R}$, i.e., $C^1(\mathbb{R}^n,\mathbb{R})$:
\begin{equation*}
U_p = \left\{\delta_p \circ \partial_{u} : C^1(\mathbb{R}^n,\mathbb{R})\to\mathbb{R}\,\biggr|\, (\delta_p \circ \partial_{u})(f) := \lim_{t\to 0} \frac{f(p+tu)-f(p)}{t}, \,\, u\in \mathbb{R}^n\right\}.
\end{equation*}
Since the domain consists of functions in $C^1$, we can express any directional derivative in terms of the partial derivatives:
\begin{equation*}
(\delta_p \circ \partial_{u})(f) = (\delta_p \circ \partial_{\sum_{i=1}^n u^ie_i})(f) = \sum_{i=1}^n u^i (\delta_p \circ \partial_{e_i})(f).
\end{equation*}
In other words $A = \begin{bmatrix} \delta_p\circ \partial_{e_1} & \ldots & \delta_p \circ \partial_{e_n}\end{bmatrix} \in U_p^{1\times n}$ is a basis for $U_p$.
Since $U_p$ is a subspace of $C^{1}(\mathbb{R}^n,\mathbb{R})^*$ the dual space $U_p^*$ is a subspace of the double dual $C^{1}(\mathbb{R}^n,\mathbb{R})^{**}\cong C^1(\mathbb{R}^n,\mathbb{R})$. We then have the natural isomorphism $f\in C^1(\mathbb{R}^n,\mathbb{R})\mapsto \delta_f\in C^1(\mathbb{R}^n,\mathbb{R})^{**}$, thus
$$\delta_f(\delta_p\circ \partial_{u}) = (\delta_p\circ \partial_{u})(f) = \partial_u f(p).$$
It is clear that $\delta_f$ is linear on $U_p$ since it's just an evaluation functional. The addition and scalar multiplication for $\delta_{a\cdot f + b\cdot g}$ are also defined as one would expect:
\begin{equation*}
\delta_{a\cdot f + b\cdot g}(\delta_p\circ \partial_{u}) = a\cdot (\delta_p\circ \partial_{u})(f) + b\cdot (\delta_p\circ \partial_{u})(g).
\end{equation*}
Now consider the coordinate functions $\epsilon^i(v) = v^i$ (these form the canonical dual basis for the standard basis $e_j$, i.e., $\epsilon^i(e_j) = e^i_j = \delta^i_j$). Their evaluation functionals $\delta_{\epsilon^i}$ give you the canonical dual basis:
\begin{equation*}
A^* = \begin{bmatrix}
\delta_{\epsilon^1}
\\
\vdots
\\
\delta_{\epsilon^n}
\end{bmatrix} \in (U_p^*)^{n\times 1}, \quad
\delta_{\epsilon^i}(\delta_p \circ \partial_{e_j}) =
\lim_{t\to 0}\frac{\epsilon^i(p+te_j)-\epsilon^i(p)}{t} = \epsilon^i(e_j) = \delta^i_j.
\end{equation*}
Now define the notation $dx^i := \delta_{\epsilon^i}$, $\frac{\partial}{\partial x^j} := \delta_p \circ \partial_{e_j}$, $\frac{\partial x^i}{\partial x^j} := \frac{\partial \epsilon^i}{\partial x^j} = (\delta_p \circ \partial_{e_j})(\epsilon^i) = \delta^i_j$.
This should now look familiar.
At this point it should probably be mentioned that it's not the derivatives that are the basis of the covectors (as the title of your question would imply). It is these $\delta_{\epsilon^i}$ evaluation functionals that are the canonical dual basis to the partial derivatives, and it is they that provide a basis for the covectors in $U_p^*$. First remember that the coordinates of an element $(\delta_p \circ \partial_{u})\in U_p$ are given through the canonical dual basis:
\begin{equation*}
\delta_{\epsilon^i}(\delta_p \circ \partial_{u}) = \delta_{\epsilon^i}\left(\delta_p \circ \sum_{j=1}^n u^j \partial_{e_j}\right) =
\delta_{\epsilon^i}\left(\sum_{j=1}^n u^j(\delta_p \circ \partial_{e_j})\right) =
\sum_{j=1}^n u^j \delta^i_j = u^i.
\end{equation*}
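A short sympy sketch of these identities (the helper `directional_derivative` and the particular $p$ and $u$ are my own choices): the limit defining $\delta_p\circ\partial_u$ is computed symbolically and applied to the coordinate functions $\epsilon^i$.

```python
import sympy as sp

n = 3
X = sp.symbols('x1:4')            # the coordinates on R^3
p = (1, 2, -1)                    # the point p
u = (2, 0, 5)                     # a direction u

def directional_derivative(f, direction):
    """(delta_p o partial_direction)(f) = d/dt f(p + t*direction) at t = 0."""
    t = sp.symbols('t')
    shifted = f.subs({X[k]: p[k] + t*direction[k] for k in range(n)})
    return sp.diff(shifted, t).subs(t, 0)

# The coordinate functions epsilon^i(x) = x^i, whose evaluation functionals are dx^i.
eps = list(X)

# dx^i(partial/partial x^j) = delta^i_j:
e = [tuple(1 if k == j else 0 for k in range(n)) for j in range(n)]
for i in range(n):
    for j in range(n):
        assert directional_derivative(eps[i], e[j]) == (1 if i == j else 0)

# dx^i(delta_p o partial_u) = u^i: the dual basis extracts the components of u.
assert [directional_derivative(eps[i], u) for i in range(n)] == list(u)
```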
Now suppose $\delta_f \in U_p^*$ is a $1$-form. Then you can apply it to an element $\delta_p \circ \partial_{u} \in U_p$:
\begin{align*}
\delta_f(\delta_p\circ \partial_u) = \delta_f\left(\delta_p \circ \sum_{j=1}^n u^j \partial_{e_j}\right) = \sum_{j=1}^nu^j \delta_f(\delta_p \circ \partial_{e_j}) = \sum_{j=1}^n \delta_f(\delta_p \circ \partial_{e_j})\delta_{\epsilon^j}(\delta_p\circ \partial_u).
\end{align*}
Now removing the argument leads to:
\begin{equation*}
\delta_f = \sum_{j=1}^n \delta_f(\delta_p \circ \partial_{e_j})\delta_{\epsilon^j} = \sum_{j=1}^n ([\delta_f]_A)_j \delta_{\epsilon^j} = [\delta_f]_AA^* = \delta_f AA^*.
\end{equation*}
The partial derivatives $\partial_{e_j}f(p)$ are the coordinates of $\delta_f$ w.r.t. the basis $A^*$ made up of the $\delta_{\epsilon^j}$.
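To see the claim in action, here is a sympy sketch (with an $f$, $p$, and $u$ of my choosing): applying $\delta_f$ to $\delta_p\circ\partial_u$ directly gives the same number as expanding $\delta_f = \sum_j \partial_{e_j}f(p)\,\delta_{\epsilon^j}$ and using $\delta_{\epsilon^j}(\delta_p\circ\partial_u) = u^j$.

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t')
f = x**2 * sp.sin(y) + sp.exp(z)        # an example f in C^1(R^3, R)
p = {x: 1, y: sp.pi/2, z: 0}            # the point p
u = (3, -1, 2)                          # a direction u

# delta_f applied to delta_p o partial_u, i.e. the directional derivative of f at p:
shifted = f.subs({x: p[x] + t*u[0], y: p[y] + t*u[1], z: p[z] + t*u[2]})
direct = sp.diff(shifted, t).subs(t, 0)

# The coordinates of delta_f in the basis of the delta_{eps^j}: the partials of f at p.
coords = [sp.diff(f, v).subs(p) for v in (x, y, z)]

# delta_{eps^j}(delta_p o partial_u) = u^j, so the expansion pairs coordinates with u:
expanded = sum(c * uj for c, uj in zip(coords, u))

assert sp.simplify(direct - expanded) == 0
```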
Partial Derivative Fields and Their Covector Fields
As another note, you can define the space of "vector fields" of partial derivative operators as follows:
\begin{equation*}
U = \left\{\partial_{u} \in C^1\bigl(\mathbb{R}^n,C^1(\mathbb{R}^n, \mathbb{R})^*\bigr)\,\biggr|\, (\partial_{u})(p) := \partial_{u(p)}, \,\, u\in C^1(\mathbb{R}^n,\mathbb{R}^n)\right\}.
\end{equation*}
Note that $u$ is now a function of $p$. You could also make your basis for $\mathbb{R}^n$ a function of $p$ if it's not constant in space, i.e., $B = \begin{bmatrix} b_1 & \ldots & b_n\end{bmatrix} \in (C^1(\mathbb{R}^n, \mathbb{R}^n))^{1\times n}$. Similarly for its canonical dual basis
$$B^* = \begin{bmatrix} \beta^1 \\ \vdots \\ \beta^n\end{bmatrix} \in (C^1(\mathbb{R}^n,(\mathbb{R}^n)^*))^{n\times 1}.$$
Then you get a location-dependent basis for $U$:
$$A=\begin{bmatrix} \partial_{b_1} & \ldots & \partial_{b_n}\end{bmatrix} \in U^{1\times n}.$$
Similarly for the canonical dual:
$$A^* = \begin{bmatrix} \delta_{\beta^1} \\ \vdots \\ \delta_{\beta^n} \end{bmatrix} \in (U^*)^{n\times 1}.$$
Then you have for any element $\partial_u \in U$:
$$\partial_u\in U \implies (\partial_{u})(p) = \partial_{u(p)} =
\partial_{\sum_{j=1}^n \beta^j(p)(u(p))\, b_j(p)} = \sum_{j=1}^n \delta_{\beta^j(p)}(\partial_{u(p)})\, \partial_{b_j(p)} = \sum_{j=1}^n \partial_{b_j(p)}
([\partial_{u(p)}]^{A(p)})^j = A(p) [\partial_{u(p)}]^{A(p)} = A(p) A^*(p) \partial_{u(p)} = (AA^*\partial_u)(p).$$
Fields of covectors, i.e., elements of $U^*$, are what are termed differential $1$-forms (I have no idea why they termed it "differential forms" instead of fields of linear functionals):
$$\omega\in U^* \implies \omega(p) = \sum_{i=1}^n \omega(p)(\partial_{b_i(p)})\delta_{\beta^i(p)} = \sum_{i=1}^n ([\omega(p)]_{A(p)})_i \delta_{\beta^i(p)} = [\omega(p)]_{A(p)}A^*(p) = \omega(p) A(p)A^*(p) = (\omega AA^*)(p).$$
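As a closing numerical sketch (again my own construction, using numpy, with a basis field, vector field, and $f$ of my choosing): take a position-dependent basis $B(p)$ on $\mathbb{R}^2$, a vector field $u$, and the $1$-form field $\omega$ whose coordinates at $p$ are the directional derivatives $\partial_{b_i(p)}f(p)$; pairing $\omega(p)$ with $\partial_{u(p)}$ directly agrees with going through the coordinates.

```python
import numpy as np

def B(p):
    """A position-dependent basis for R^2 (columns), invertible away from the origin."""
    x, y = p
    return np.array([[x, -y],
                     [y,  x]])

def B_dual(p):
    """The canonical dual basis at p: the rows of B(p)^{-1}."""
    return np.linalg.inv(B(p))

def u(p):
    """A vector field on R^2."""
    x, y = p
    return np.array([y, x**2])

def grad_f(p):
    """Gradient of f(x, y) = x*y**2 in the standard basis, so that
    the directional derivative is partial_v f(p) = grad_f(p) . v."""
    x, y = p
    return np.array([y**2, 2*x*y])

p = np.array([1.0, 2.0])

# Coordinates of the vector field in the basis A(p) = [partial_{b_1(p)}, partial_{b_2(p)}]:
coords_u = B_dual(p) @ u(p)
assert np.allclose(B(p) @ coords_u, u(p))

# Coordinates of the 1-form omega at p in the dual basis {delta_{beta^i(p)}}:
# omega(p)(partial_{b_i(p)}) = partial_{b_i(p)} f(p) = grad_f(p) . b_i(p).
coords_omega = grad_f(p) @ B(p)

# Pairing omega(p) with partial_{u(p)} directly vs. through the coordinates:
direct = grad_f(p) @ u(p)
via_coords = coords_omega @ coords_u
assert np.isclose(direct, via_coords)
```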