The Jacobian, as you wrote it, really only works after vectorizing with respect to the components of $A$. More generally, given a function $f\colon X\to Y$, you can think of the derivative as a $(\operatorname{rank} Y, \operatorname{rank} X)$-tensor, which lets you keep the matrix structure of the input intact.
In this case, since the input is a matrix, i.e. a rank-$2$ tensor, and the output is a vector, i.e. a rank-$1$ tensor, you can think of $dy/dA$ as the $(1,2)$-tensor with components
$$\Big(\frac{\rm d y}{\rm d A}\Big)^k_{ij}=\frac{\partial y_k}{\partial A_{ij}} \leadsto D = \frac{\rm d y}{\rm d A} = \begin{bmatrix} \begin{bmatrix} x_1 & x_2 \\ 0& 0\end{bmatrix}, \begin{bmatrix}0& 0 \\ x_1 & x_2\end{bmatrix}\end{bmatrix}$$
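If you want to see this concretely, here's a quick numerical sketch with JAX (the values $x_1=1$, $x_2=2$ are arbitrary stand-ins): `jax.jacobian` keeps the matrix shape of the input, so the result is exactly this tensor as an array of shape $(2,2,2)$.

```python
import jax.numpy as jnp
from jax import jacobian

x = jnp.array([1.0, 2.0])   # arbitrary stand-ins for x_1, x_2

def f(A):                    # f(A) = A x
    return A @ x

# Jacobian of a (2,2)-matrix -> 2-vector map: an array of shape (2, 2, 2),
# indexed as D[k, i, j] = dy_k / dA_ij
D = jacobian(f)(jnp.zeros((2, 2)))
print(D)
# [[[1. 2.]
#   [0. 0.]]
#
#  [[0. 0.]
#   [1. 2.]]]
```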
This tensor encodes a linear map given by the corresponding tensor contraction
$$(D\cdot X)^k = \sum_{ij} D^{k}_{ij}\, X^{ij}$$
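Numerically, this contraction is just an `einsum` over the two input indices, and it reproduces $f$ (same setup as above, with a placeholder test matrix $X$):

```python
import jax.numpy as jnp
from jax import jacobian

x = jnp.array([1.0, 2.0])
f = lambda A: A @ x
D = jacobian(f)(jnp.zeros((2, 2)))

X = jnp.array([[3.0, 4.0], [5.0, 6.0]])
# (D . X)^k = sum_ij D[k, i, j] X[i, j] -- the contraction recovers f(X)
print(jnp.einsum('kij,ij->k', D, X))  # [11. 17.]
print(f(X))                           # [11. 17.]
```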
In particular, note that for $f(A)=Ax$ we have the first-order Taylor expansion around $A=0$
$$ T_f^{(1)}(A) = f(0) + Df(0)\cdot A =\begin{bmatrix}0 \\ 0\end{bmatrix}+ \begin{bmatrix}A_{11}x_1 + A_{12}x_2 \\ A_{21}x_1 + A_{22}x_2\end{bmatrix} = f(A)$$
That is, the function is equal to its first-order Taylor expansion, as expected for a linear function.
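The same machinery verifies this numerically (again with placeholder values):

```python
import jax.numpy as jnp
from jax import jacobian

x = jnp.array([1.0, 2.0])
f = lambda A: A @ x

A0 = jnp.zeros((2, 2))
D = jacobian(f)(A0)                         # Df(0)
A = jnp.array([[3.0, 4.0], [5.0, 6.0]])

T1 = f(A0) + jnp.einsum('kij,ij->k', D, A)  # f(0) + Df(0) . A
print(jnp.allclose(T1, f(A)))               # True: f equals its expansion
```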