Suppose $J$ is a function of $\bf{a}$, $\bf{a}$ is a function of $\bf{z}$ and $\bf{z}$ is a function of $\bf{\Theta}$ where $J$ is a scalar, $\bf{a}$ and $\bf{z}$ are vectors and $\bf{\Theta}$ is a matrix. What is the form of the chain rule to evaluate the partial derivative $\dfrac{\partial J}{\partial \bf{\Theta}}$?

I was thinking $$\dfrac{\partial J}{\partial \bf{\Theta}} = \dfrac{\partial J}{\partial \bf{a}}\dfrac{\partial \bf{a}}{\partial \bf{z}}\dfrac{\partial \bf{z}}{\partial \bf{\Theta}}$$ but I'm not sure of the order or any transposes.

The vectors are of dimension $n\times 1$ and the matrix is of dimension $n \times m$. (What if the dimension of $\bf{a}$ were $n+1$?)

Rlanls

I think it makes sense, as long as you define the final term. In indicial terms:
$$ \frac{\partial J}{\partial \Theta_{ij}} = \sum_\alpha\sum_k \frac{\partial J}{\partial a_k} \frac{\partial a_k}{\partial z_\alpha} \frac{\partial z_\alpha}{\partial \Theta_{ij}} $$
where $a:\mathbb{R}^m\rightarrow\mathbb{R}^{n}$, $J:\mathbb{R}^n\rightarrow\mathbb{R}$, $z:\mathbb{R}^{n_1\times n_2}\rightarrow\mathbb{R}^{m}$. (Note that I'm choosing slightly more general sizes; in particular $a$ and $z$ need not have the same length, so the $n+1$ case is covered.) We can plausibly write this as:
$$ \frac{\partial J}{\partial \Theta} = \frac{\partial J}{\partial a} \frac{\partial a}{\partial z} \frac{\partial z}{\partial \Theta} $$

The only issue is that $\partial_\Theta z$ is not really a matrix: it is an array of size $m\times n_1\times n_2$ (some might call it an order-3 tensor). But we can reasonably define a slightly generalized "array multiplication" via:
$$ \underbrace{\frac{\partial J}{\partial \Theta} }_{1\times n_1\times n_2} = \underbrace{\frac{\partial J}{\partial a} }_{1\times n}\; \underbrace{\frac{\partial a}{\partial z} }_{n\times m}\;\, \underbrace{\frac{\partial z}{\partial \Theta}}_{m\times n_1\times n_2} $$
With $\partial J/\partial a$ taken as a row, the factors multiply in the order written and no transposes are needed.

This is essentially tensor contraction, which I would write indicially (summing over the repeated indices $k$ and $\alpha$) as:
$$ \mathcal{J}_{ij}=\Psi_k\mathcal{Q}^k_\alpha\mathcal{R}^\alpha_{ij} $$
where $\Psi=\partial_aJ$, $\mathcal{Q}=\partial_za$, $\mathcal{R}=\partial_\Theta z$, although none of these quantities is, strictly speaking, necessarily a tensor.
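To make the contraction concrete, here is a minimal NumPy sketch under an assumed setup that does not appear in the question: $z = \Theta x$ for a fixed vector $x$, $a$ is $\tanh(z)$ with a constant 1 prepended (so $a$ has one more entry than $z$, mirroring the $n+1$ case), and $J = \tfrac12\|a\|^2$. The names `forward`, `x`, `Psi`, `Q`, `R` are illustrative only. The sketch builds $\Psi$, $\mathcal{Q}$ and $\mathcal{R}$ explicitly, contracts them with `np.einsum`, and compares the result against central finite differences.

```python
import numpy as np

# Hypothetical example (assumption, not from the question):
#   z = Theta @ x          Theta: (n1, n2), x: (n2,), so m = n1
#   a = [1, tanh(z)]       a constant "bias" entry prepended, so n = m + 1
#   J = 0.5 * sum(a**2)    a scalar loss
rng = np.random.default_rng(0)
n1, n2 = 3, 4
Theta = rng.standard_normal((n1, n2))
x = rng.standard_normal(n2)


def forward(Theta):
    """Return (z, a, J) for a given parameter matrix."""
    z = Theta @ x
    a = np.concatenate(([1.0], np.tanh(z)))
    return z, a, 0.5 * np.sum(a**2)


z, a, J = forward(Theta)
m, n = z.size, a.size  # here m = 3, n = 4

# Psi = dJ/da: the 1 x n row, stored with shape (n,)
Psi = a
# Q = da/dz, shape (n, m): first row is zero (the constant entry), rest is diag(1 - tanh^2)
Q = np.vstack((np.zeros((1, m)), np.diag(1.0 - np.tanh(z) ** 2)))
# R = dz/dTheta, shape (m, n1, n2): dz_alpha/dTheta_ij = delta_{alpha i} * x_j
R = np.zeros((m, n1, n2))
for alpha in range(m):
    R[alpha, alpha, :] = x

# Contraction J_ij = Psi_k Q^k_alpha R^alpha_ij (summing over k and alpha)
grad = np.einsum("k,ka,aij->ij", Psi, Q, R)

# Central finite differences as an independent check
eps = 1e-6
fd = np.zeros_like(Theta)
for i in range(n1):
    for j in range(n2):
        E = np.zeros_like(Theta)
        E[i, j] = eps
        fd[i, j] = (forward(Theta + E)[2] - forward(Theta - E)[2]) / (2 * eps)

# For this linear z = Theta @ x the contraction collapses to an outer product:
# dJ/dTheta = (dJ/dz)^T x^T, where dJ/dz = Psi @ Q is a 1 x m row.
outer = np.outer(Psi @ Q, x)

print(np.allclose(grad, fd, atol=1e-6), np.allclose(grad, outer))  # True True
```

Building $\mathcal{R}$ explicitly is only for illustration: for a linear $z$ it is mostly zeros, which is why the result reduces to $(\partial J/\partial z)^{\top} x^{\top}$, and that is where a transpose appears if you insist on writing the gradient with ordinary matrix products. Note also that $\mathcal{Q}$ is $n\times m$ with $n \neq m$ here, so nothing in the contraction requires $a$ and $z$ to have the same length.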

user3658307