Let $V,W$ be Banach spaces (i.e. complete normed vector spaces; think of them as finite-dimensional if you wish). Then, given a mapping $f:V\to W$ and a point $x\in V$, the higher order derivatives of $f$ at $x$ are, by definition, multilinear maps. More explicitly,
the first derivative at $x$ is by definition a linear map $Df_x:V\to W$, so it eats a vector $h\in V$ and spits out a vector $Df_x(h)\in W$. In your case, $V=\Bbb{R}^n$ and $W=\Bbb{R}$, so $Df_x\in (\Bbb{R}^n)^*$, which is why you can use the "standard inner product" to identify it with the gradient vector $\nabla f(x)\in\Bbb{R}^n$, satisfying $\langle \nabla f(x), h\rangle = Df_x(h)$ (this is your first term in the Taylor expansion). I would highly suggest you get out of the habit of thinking in terms of the gradient vector, because it unnecessarily complicates issues (you may disagree temporarily, since the gradient vector is probably what you and I and everyone else were first taught, but trust me on this) and obscures the higher order generalization.
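For concreteness, here is what this looks like for a specific $f$ of my own choosing: take $V=\Bbb{R}^2$, $W=\Bbb{R}$ and $f(x_1,x_2)=x_1^2x_2$. Then at a point $x=(x_1,x_2)$, the first derivative is the linear map
\begin{align}
Df_x(h) &= 2x_1x_2\,h_1 + x_1^2\,h_2, \qquad h=(h_1,h_2)\in\Bbb{R}^2,
\end{align}
and the gradient identification merely repackages the coefficients as the vector $\nabla f(x)=(2x_1x_2,\ x_1^2)$, so that $\langle\nabla f(x),h\rangle=Df_x(h)$.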
The second derivative at $x$ is by definition a bilinear map $D^2f_x:V\times V\to W$, so it eats a pair of vectors $(h_1,h_2)$ and spits out an element $D^2f_x[(h_1,h_2)]$ of $W$. In the case $V=\Bbb{R}^n$ and $W=\Bbb{R}$, by choosing the standard basis one can identify this bilinear map with an $n\times n$ matrix, namely the Hessian matrix $Hf(x)$, but again this completely obscures the general idea, so I would suggest you not think in terms of it.
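Continuing the same example: the second derivative at $x$ is the bilinear map
\begin{align}
D^2f_x[(u,v)] &= \sum_{a,b=1}^2\frac{\partial^2 f}{\partial x_a\partial x_b}(x)\,u_av_b = u^\top Hf(x)\,v, \qquad Hf(x)=\begin{pmatrix}2x_2 & 2x_1\\ 2x_1 & 0\end{pmatrix},
\end{align}
so the Hessian matrix is just the coefficient array of this bilinear map relative to the standard basis.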
Once you get to the third derivative, it becomes clear that the vector and matrix pictures $\nabla f(x)$ and $Hf(x)$ do not generalize easily. The "correct" definition is simply a trilinear map $D^3f_x:V\times V\times V\to W$.
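In coordinates (for $V=\Bbb{R}^n$, $W=\Bbb{R}$) this trilinear map acts as
\begin{align}
D^3f_x[(u,v,w)] &= \sum_{a,b,c=1}^n\frac{\partial^3 f}{\partial x_a\partial x_b\partial x_c}(x)\,u_av_bw_c,
\end{align}
so the coefficients now form a three-index array rather than a vector or a matrix, which is exactly why the multilinear-map viewpoint scales better. In the example above, the only nonvanishing third partials are $\frac{\partial^3 f}{\partial x_1\partial x_1\partial x_2}=2$ and its permutations, so $D^3f_x[(u,v,w)]=2(u_1v_1w_2+u_1v_2w_1+u_2v_1w_1)$.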
In general, the $j^{th}$ derivative is a $j$-fold multilinear map $D^jf_x:\underbrace{V\times \cdots \times V}_{\text{$j$ times}}\to W$ (and by convention for $j=0$ we define $D^0f_x := f(x)\in W$).
In this setting, a $k^{th}$ order Taylor polynomial about the point $x$ is the mapping $T_{k,x,f}:V\to W$ defined as
\begin{align}
T_{k,x,f}(h):=\sum_{j=0}^k\frac{(D^jf)_x[(h)^j]}{j!}\tag{$*$}
\end{align}
where for the $j^{th}$ summand we have $(h)^j:= \underbrace{(h,\cdots, h)}_{\text{$j$ times}}$. Taylor's theorem then tells you that if $f$ is $k$-times differentiable at the point $x$, then
\begin{align}
f(x+h)&= T_{k,x,f}(h)+ o(\|h\|^k).
\end{align}
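For instance, writing out $(*)$ with $k=2$ in the case $V=\Bbb{R}^n$, $W=\Bbb{R}$ and using the gradient and Hessian identifications above recovers the formula you have probably already seen:
\begin{align}
f(x+h) &= f(x)+\langle\nabla f(x),h\rangle+\frac{1}{2}h^\top Hf(x)\,h + o(\|h\|^2).
\end{align}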
See this previous answer of mine for a more explicit statement of the theorem, and also a pretty complete outline of the proof along with some references where you can learn more.
In this sense, the Taylor polynomial is an immediate generalization of the single variable case:
\begin{align}
T_{k,x,f}(h)&=\sum_{j=0}^k\frac{f^{(j)}(x)\cdot h^j}{j!}\tag{$**$}
\end{align}
(just look at how similar the formulas $(*)$ and $(**)$ are; hopefully this will convince you that thinking in terms of the gradient vector, Hessian matrix, etc. is not the right mindset).