Let $x_{1}, \dots, x_{N}$ be a sequence of vectors in $\mathbb{R}^{n}$ and $A$ be an $n \times n$ matrix. Let $f: \mathbb{R} \to \mathbb{R}$ be a smooth (as smooth as you want) function. We define $$ h_{i} = f(Ax_{i}) $$ for $i = 1, \dots, N$, where $f$ is applied to the vector $Ax_{i}$ element-wise.
Suppose $G$ is a scalar function of the vectors $h_{1}, \dots, h_{N}$ (i.e. it maps to $\mathbb{R}$), and we want to find $\nabla_{A} G$, i.e. the gradient of $G$ with respect to the matrix $A$: the $n \times n$ matrix whose $(i, j)$ entry is $\frac{\partial G}{\partial A_{ij}}$. Suppose we already know all the gradients $\nabla_{h_{i}} G$ and want to combine them with the chain rule to find $\nabla_{A} G$. How would we go about doing that?
I know that the chain rule says that for generic vectors $x, y$ and a function $F$ mapping to $\mathbb{R}$, $$ \nabla_{x} F = \left(\frac{dy}{dx}\right)^{T} \nabla_{y} F $$ where $\frac{dy}{dx}$ is the Jacobian of $y$ w.r.t. $x$. But how do I apply this to the setup above?
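
One way to make the question concrete is to test a candidate formula numerically. Write $z_{i} = A x_{i}$ and use $(j, k)$ for matrix entries so they do not clash with the sequence index $i$. The Jacobian of $h_{i}$ w.r.t. $z_{i}$ is $\mathrm{diag}(f'(z_{i}))$, and $\frac{\partial z_{i,j}}{\partial A_{jk}} = x_{i,k}$, which suggests $$ \nabla_{A} G = \sum_{i=1}^{N} \left( f'(A x_{i}) \odot \nabla_{h_{i}} G \right) x_{i}^{T}, $$ with $\odot$ the element-wise product. The sketch below compares this with central finite differences. The choices $f = \tanh$, $G = \frac{1}{2} \sum_{i} \lVert h_{i} \rVert^{2}$ (so $\nabla_{h_{i}} G = h_{i}$), the sizes $n = 3$, $N = 4$, and all function names are assumptions made only for illustration, not part of the setup above.

```python
import math
import random

def matvec(A, x):
    """Return the matrix-vector product A x for a list-of-rows matrix A."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

# Assumed example nonlinearity: f = tanh, with derivative 1 - tanh^2.
def f(t):
    return math.tanh(t)

def f_prime(t):
    return 1.0 - math.tanh(t) ** 2

def forward(A, xs):
    """Compute h_i = f(A x_i), with f applied element-wise."""
    return [[f(z) for z in matvec(A, x)] for x in xs]

# Assumed example objective: G = 0.5 * sum_i ||h_i||^2, so grad_{h_i} G = h_i.
def G(hs):
    return 0.5 * sum(v * v for h in hs for v in h)

def grad_G_wrt_h(hs):
    return [list(h) for h in hs]

def grad_A(A, xs, grads_h):
    """Candidate formula: sum_i (f'(A x_i) * grad_{h_i} G) x_i^T."""
    n = len(A)
    grad = [[0.0] * n for _ in range(n)]
    for x, g in zip(xs, grads_h):
        z = matvec(A, x)
        # delta = diag(f'(z_i)) applied to grad_{h_i} G
        delta = [f_prime(zj) * gj for zj, gj in zip(z, g)]
        for j in range(n):
            for k in range(n):
                grad[j][k] += delta[j] * x[k]  # outer product delta x_i^T
    return grad

def finite_difference_grad(A, xs, eps=1e-6):
    """Central-difference estimate of dG/dA_{jk}, one entry at a time."""
    n = len(A)
    grad = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            A_plus = [row[:] for row in A]
            A_minus = [row[:] for row in A]
            A_plus[j][k] += eps
            A_minus[j][k] -= eps
            diff = G(forward(A_plus, xs)) - G(forward(A_minus, xs))
            grad[j][k] = diff / (2 * eps)
    return grad

random.seed(0)
n, N = 3, 4
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
xs = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(N)]

hs = forward(A, xs)
analytic = grad_A(A, xs, grad_G_wrt_h(hs))
numeric = finite_difference_grad(A, xs)

# Largest entry-wise disagreement between the formula and finite differences.
max_err = max(
    abs(analytic[j][k] - numeric[j][k]) for j in range(n) for k in range(n)
)
print("max abs difference:", max_err)
```

If the two gradients agree up to finite-difference error, that supports the formula for this particular $f$ and $G$. It is a numerical sanity check, not a proof.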