I'm trying to work out the derivatives for Taylor's theorem in a specific case (using the general framework). In fact, Wikipedia derives it for my case, but I think I'm getting a little sloppy with the dimensions and I can't see how to write it out.
We have Taylor's theorem for $f: \mathbb{R}^n \rightarrow \mathbb{R}$, where $\alpha$ is a multi-index: $$f(\mathbf{x}) = f(\mathbf{a}) + \sum_{1 \le |\alpha| \le k} \frac{1}{\alpha !}D^{\alpha}f(\mathbf{a})(\mathbf{x} - \mathbf{a})^{\alpha} + \sum_{|\alpha| = k} h_{\alpha}(\mathbf{x})(\mathbf{x} - \mathbf{a})^{\alpha}$$
Here's my specific case: $g: \mathbb{R}^2 \rightarrow \mathbb{R}$, $g(x, y) = z$. Then we have $Dg(\mathbf{x}) = [g_x \: g_y]$. But isn't this also just a function $Dg: \mathbb{R}^2 \rightarrow \mathbb{R}$? And then wouldn't it also be the case that $D(Dg): \mathbb{R}^2 \rightarrow \mathbb{R}$, with $D(Dg) = [g_{xx} \: g_{yy}]$? I guess I don't see how we get the Hessian (which is in $\mathbb{R}^{2 \times 2}$) here.
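A rough numerical sketch of what I mean, assuming a hypothetical test function $g(x, y) = x^2 y + \sin(y)$ (my own choice, purely for illustration). It treats $Dg$ as a map that returns both components $[g_x \: g_y]$ and differentiates each component with respect to both $x$ and $y$ by central differences. The helper names `grad_g` and `jacobian_of_gradient` are made up for this example.

```python
import math

# Hypothetical test function (an assumption for illustration):
# g(x, y) = x**2 * y + sin(y)
def grad_g(x, y):
    """First derivative Dg as a length-2 list [g_x, g_y], worked out by hand."""
    return [2 * x * y, x**2 + math.cos(y)]

def jacobian_of_gradient(x, y, h=1e-5):
    """Central-difference Jacobian of the map (x, y) -> [g_x, g_y].

    Row i holds the partial derivatives of the i-th gradient component
    with respect to x and then y, so the result is a 2x2 nested list.
    """
    at_x_plus, at_x_minus = grad_g(x + h, y), grad_g(x - h, y)
    at_y_plus, at_y_minus = grad_g(x, y + h), grad_g(x, y - h)
    return [
        [(at_x_plus[i] - at_x_minus[i]) / (2 * h),
         (at_y_plus[i] - at_y_minus[i]) / (2 * h)]
        for i in range(2)
    ]

J = jacobian_of_gradient(1.0, 0.5)
for row in J:
    print(row)
```

At $(1, 0.5)$ the hand-computed values are $g_{xx} = 1$, $g_{xy} = g_{yx} = 2$ and $g_{yy} = -\sin(0.5)$, so differentiating the row vector with respect to both variables gives four numbers rather than two. That seems to be where the $2 \times 2$ shape comes from, but I'd like to understand it properly.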
ChatGPT is telling me to use this "quadratic form": $$g(\mathbf{x}) \approx g(\mathbf{x_0}) + \nabla g(\mathbf{x_0})^T(\mathbf{x} - \mathbf{x_0}) + \frac{1}{2}(\mathbf{x} - \mathbf{x_0})^T H_g(\mathbf{x_0})(\mathbf{x} - \mathbf{x_0})$$
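As a sanity check on this quadratic form, here is a minimal sketch, again assuming the hypothetical test function $g(x, y) = x^2 y + \sin(y)$ with its gradient and Hessian written out by hand. The names `grad_g`, `hess_g` and `taylor2` are my own and not from any library.

```python
import math

# Hypothetical test function (an assumption for illustration):
# g(x, y) = x**2 * y + sin(y)
def g(x, y):
    return x**2 * y + math.sin(y)

def grad_g(x, y):
    # Gradient [g_x, g_y], derived by hand.
    return [2 * x * y, x**2 + math.cos(y)]

def hess_g(x, y):
    # Hessian [[g_xx, g_xy], [g_yx, g_yy]], derived by hand.
    return [[2 * y, 2 * x], [2 * x, -math.sin(y)]]

def taylor2(x, y, x0, y0):
    """Second-order Taylor approximation of g around (x0, y0)."""
    d = [x - x0, y - y0]
    grad = grad_g(x0, y0)
    H = hess_g(x0, y0)
    linear = sum(grad[i] * d[i] for i in range(2))
    # Quadratic form d^T H d, written as an explicit double sum.
    quadratic = sum(d[i] * H[i][j] * d[j] for i in range(2) for j in range(2))
    return g(x0, y0) + linear + 0.5 * quadratic

x0, y0 = 1.0, 0.5
for step in (1e-1, 1e-2, 1e-3):
    exact = g(x0 + step, y0 + step)
    approx = taylor2(x0 + step, y0 + step, x0, y0)
    print(step, abs(exact - approx))
```

If the form is right, the printed error should shrink roughly like the cube of the step. The hand-computed Hessian is symmetric for this example, so transposing it would not change the quadratic term here.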
Is there a relation here to the Hessian being the transpose of the Jacobian of the gradient? Also it seems that with this form we could transpose anyway and get the Jacobian of the gradient.
Why is it that we use the Jacobian (total derivative) of the gradient (or, with the Hessian, the transpose of the Jacobian of the gradient), instead of the Jacobian (total derivative) of the gradient transpose (the first total derivative)? Would it even make sense to do that? I couldn't find answers online that tie all of these together. (I saw some material about notational differences and matrix conventions, but it seems unsatisfying and I think I'm still missing something.) All in all, I feel like there are some basic concepts that I'm missing here, combined with maybe some more advanced generalizations that unify these concepts.
(Aside: Is there a generalization of Taylor's theorem to functions $g: \mathbb{R}^n \rightarrow \mathbb{R}^m$?)
Thanks.