If A has the dimension [m$\times$n] and X [n$\times$1], $\frac{\partial AX}{\partial A}$ is said to be $X^T$ (according to Section 2.3.1 here). However, shouldn't $\frac{\partial AX}{\partial A}$ have the dimension [m$\times$m$\times$n] instead of [1$\times$n]? How is this result derived?
-
$m\times m\times n$? I'm struggling to see why it should become three-dimensional; what makes you think that? – man_in_green_shirt Dec 05 '16 at 00:37
-
@man_in_green_shirt: Because AX has the dimension [m$\times$1] and A [m$\times$n]. I thought the derivative is computed over all combinations of their entries. Please correct me if I am wrong. – Steven Dec 05 '16 at 00:40
-
For example, if AX were a scalar f instead, then $\frac{\partial f}{\partial A}$ would have the dimension [m $\times$ n]. – Steven Dec 05 '16 at 00:59
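For the scalar case, take $f = a^TAx$ with $a$ an m-vector. The identity $\frac{\partial a^TXb}{\partial X} = ab^T$ (with $A$ in the role of $X$ and $b = x$) then gives the [m $\times$ n] gradient $ax^T$. A minimal finite-difference check in plain Python; the sizes, the values of `A`, `x`, `a`, and the helper names `f` and `numeric_grad` are arbitrary illustrations:

```python
# Finite-difference check that d(a^T A x)/dA is the m x n matrix a x^T.
# Sizes and values are arbitrary illustrations, not taken from the thread.

def f(A, a, x):
    """Scalar f(A) = a^T A x."""
    return sum(a[i] * A[i][k] * x[k]
               for i in range(len(a)) for k in range(len(x)))

def numeric_grad(A, a, x, h=1e-6):
    """Central differences: entry (j, k) approximates df/dA_jk."""
    m, n = len(A), len(A[0])
    G = [[0.0] * n for _ in range(m)]
    for j in range(m):
        for k in range(n):
            Ap = [row[:] for row in A]
            Am = [row[:] for row in A]
            Ap[j][k] += h
            Am[j][k] -= h
            G[j][k] = (f(Ap, a, x) - f(Am, a, x)) / (2 * h)
    return G

m, n = 3, 2
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # m x n
x = [0.5, -1.5]                            # n x 1
a = [2.0, -1.0, 3.0]                       # m x 1

G = numeric_grad(A, a, x)
expected = [[a[j] * x[k] for k in range(n)] for j in range(m)]  # a x^T
print(len(G), len(G[0]))  # 3 2  -> an m x n matrix
print(all(abs(G[j][k] - expected[j][k]) < 1e-6
          for j in range(m) for k in range(n)))  # True
```

Choosing `a` as the i-th unit vector $e_i$ turns $f$ into the single component $y_i$ and its gradient into $e_ix^T$, which is why $a$ has to be a vector rather than an identity matrix in that identity.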
-
Ah ok, I see what you mean, sorry. From what I understand, in effect every entry in the derivative of $Ax$ w.r.t. $A$ is an $m\times m$ matrix. So you have $n$ copies of an $m\times m$ matrix stacked upon each other, and if you work out the derivatives, each one just ends up having $X^T$ in one row and zeros everywhere else. – man_in_green_shirt Dec 05 '16 at 00:59
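The full set of partials $\partial y_i/\partial A_{jk}$ of $y = Ax$ does have $m\times m\times n$ entries, and it can be sliced two ways: fixing $i$ gives an $m\times n$ matrix $e_iX^T$ ($X^T$ in row $i$, zeros elsewhere; $m$ such slices), while fixing $k$ gives an $m\times m$ matrix $x_kI$ ($n$ such slices). A minimal finite-difference sketch in plain Python that checks both views; the example values and the helper names `matvec` and `partials_wrt_A` are assumptions for illustration:

```python
# Finite-difference array of partials of y = A x with respect to A.
# D[i][j][k] approximates dy_i / dA_jk, so D holds m * m * n numbers.

def matvec(A, x):
    """y = A x for a list-of-rows matrix A."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in A]

def partials_wrt_A(A, x, h=1e-6):
    m, n = len(A), len(A[0])
    D = [[[0.0] * n for _ in range(m)] for _ in range(m)]
    for j in range(m):
        for k in range(n):
            Ap = [row[:] for row in A]
            Am = [row[:] for row in A]
            Ap[j][k] += h
            Am[j][k] -= h
            yp, ym = matvec(Ap, x), matvec(Am, x)
            for i in range(m):
                D[i][j][k] = (yp[i] - ym[i]) / (2 * h)
    return D

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # m = 3, n = 2 (arbitrary)
x = [0.5, -1.5]
m, n = len(A), len(x)
D = partials_wrt_A(A, x)
print(len(D), len(D[0]), len(D[0][0]))  # 3 3 2

# View 1: fixing i gives an m x n matrix equal to e_i x^T.
for i in range(m):
    e_i_xT = [[x[k] if j == i else 0.0 for k in range(n)] for j in range(m)]
    assert all(abs(D[i][j][k] - e_i_xT[j][k]) < 1e-6
               for j in range(m) for k in range(n))

# View 2: fixing k gives an m x m matrix equal to x_k * I.
for k in range(n):
    slice_k = [[D[i][j][k] for j in range(m)] for i in range(m)]
    xk_I = [[x[k] if i == j else 0.0 for j in range(m)] for i in range(m)]
    assert all(abs(slice_k[i][j] - xk_I[i][j]) < 1e-6
               for i in range(m) for j in range(m))
print("both views match")
```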
-
Having said that, I can't actually see this identity you're asking about in section 2.3.1 – man_in_green_shirt Dec 05 '16 at 01:00
-
@man_in_green_shirt: Got it. Then my confusion is how the stacked m$\times$m matrices (each containing $X^T$) are considered equivalent to $X^T$. For example, what happens when I multiply this result by another matrix? Should I use just $X^T$ or the stacked matrices? In Section 2.3.1, I was actually referring to $\frac{\partial a^TXb}{\partial X} = ab^T$ and setting $a$ to an identity matrix. – Steven Dec 05 '16 at 01:06
-
I'm honestly not sure either - perhaps someone else can help us out? – man_in_green_shirt Dec 05 '16 at 12:35
-
Though keep in mind that $a$ is probably a vector, not a matrix, so it doesn't really make sense to set it to an identity matrix – man_in_green_shirt Dec 05 '16 at 12:37
1 Answer
In index notation (with summation over the repeated index $n$, and using $\frac{\partial A_{in}}{\partial A_{jk}} = \delta_{ij}\delta_{nk}$)
$$\eqalign{ y_i &= A_{in}x_n \cr dy_i &= dA_{in}x_n \cr \frac{\partial y_i}{\partial A_{jk}} &= \delta_{ij}\delta_{nk}x_n = \delta_{ij}x_k }$$
You could write this result using the dyadic $(\star)$ product
$$\frac{\partial y}{\partial A} = I\star x$$
or by using the $4^{th}$-order isotropic tensor ${\mathbb E}_{ijkl} = \delta_{ij}\delta_{kl}$, since ${\mathbb E}_{ijkl}x_l = \delta_{ij}x_k$:
$$\frac{\partial y}{\partial A} = {\mathbb E}\,x$$
but neither of these notations is common.
greg
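The answer's index formula can be checked directly, and it also shows how the $m\times m\times n$ object is used: contracting $\delta_{ij}x_k$ with a perturbation $dA_{jk}$ over $j,k$ returns $dy_i = (dA\,x)_i$, which is the usual sense in which such a derivative gets "multiplied" by another matrix in a chain rule. A minimal plain-Python sketch; the sizes, `x`, `dA` and the `delta` helper are arbitrary illustrations:

```python
# Build T[i][j][k] = delta_ij * x_k (the answer's dy_i/dA_jk) two ways:
# directly, and as the contraction E_ijkl x_l with E_ijkl = delta_ij delta_kl.

def delta(p, q):
    """Kronecker delta."""
    return 1.0 if p == q else 0.0

m, n = 3, 2
x = [0.5, -1.5]

T = [[[delta(i, j) * x[k] for k in range(n)] for j in range(m)]
     for i in range(m)]

# E_ijkl x_l: k and l both run over the n columns of A.
TE = [[[sum(delta(i, j) * delta(k, l) * x[l] for l in range(n))
        for k in range(n)] for j in range(m)] for i in range(m)]
print(T == TE)  # True

# Using the derivative: dy_i = sum_jk T_ijk dA_jk should equal (dA x)_i.
dA = [[0.1, -0.2], [0.3, 0.0], [-0.4, 0.5]]
dy_from_T = [sum(T[i][j][k] * dA[j][k] for j in range(m) for k in range(n))
             for i in range(m)]
dy_direct = [sum(dA[i][k] * x[k] for k in range(n)) for i in range(m)]
print(all(abs(u - v) < 1e-12 for u, v in zip(dy_from_T, dy_direct)))  # True
```

So the stacked array never collapses to $X^T$ on its own; $X^T$ appears as the slice belonging to a single component $y_i$, and contracting the whole array with $dA$ simply reproduces $dA\,x$.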