Derivative with respect to a matrix in matrix multiplication

Question

I have this very simple problem, but I cannot seem to understand how it can be computed. I need to find the derivative with respect to a matrix that is part of a matrix multiplication: $$A_{(m,n)}*W_{(n,p)} = C_{(m,p)} \\ \frac{dC}{dW} = ?$$ I would need a matrix with the same dimensions as $W$, so $n$ by $p$, but whatever resources I find just confuse me more. Is it not possible to express this in terms of $A$, the constant matrix?

Jan Bohr · Accepted Answer · 2018-02-02T18:11:13.627

Let's first recall how the derivative of a function $f\colon\mathbb{R}^a \rightarrow\mathbb{R}^b$ is characterized (provided it exists). Given a point $x\in\mathbb{R}^a$, the derivative of $f$ at $x$ is the unique linear map $df_x\colon\mathbb{R}^a\rightarrow \mathbb{R}^b$ (which may be represented by a matrix, known as the Jacobian) such that $$ \Vert f(x+h)-f(x)-df_x(h) \Vert=o(\Vert h\Vert). $$ Now translate this to your question: Let $F\colon M(n\times p)\rightarrow M(m\times p)$ be the map between real matrices defined by $F(W)=A \cdot W$, where $A\in M(m\times n)$ is a fixed matrix. Then given a "point" $X\in M(n\times p)$, the derivative of $F$ at $X$ is the unique linear map $dF_X\colon M(n\times p)\rightarrow M(m\times p)$ with $$ \Vert F(X+H)-F(X)-dF_X(H) \Vert=o(\Vert H\Vert). $$ But $F(X+H)-F(X)=A\cdot(X + H) - A \cdot X= A \cdot H$, which is already a linear function of $H$, so the remainder vanishes identically. This shows that $dF_X(H)=A \cdot H$, independent of the choice of $X$.
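
For concrete numbers, the identity $F(X+H)-F(X)=A\cdot H$ can be checked numerically. The following is a minimal NumPy sketch, not part of the original answer; the sizes, the random seed, and the names `F`, `increment`, and `derivative_applied` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 2, 3, 4  # arbitrary illustrative sizes (assumption)

A = rng.standard_normal((m, n))  # the fixed matrix A
X = rng.standard_normal((n, p))  # the "point" at which we differentiate
H = rng.standard_normal((n, p))  # a perturbation of X


def F(W):
    """The map F(W) = A W from n-by-p matrices to m-by-p matrices."""
    return A @ W


# Left side: the actual change in F when X is perturbed by H.
increment = F(X + H) - F(X)

# Right side: the claimed derivative dF_X applied to H.
derivative_applied = A @ H

# The two agree up to floating-point rounding, for any X and any H.
print(np.max(np.abs(increment - derivative_applied)))
```

Because $F$ is linear, the agreement holds for perturbations of any size, not only small ones; for a nonlinear map the difference would only be $o(\Vert H\Vert)$.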

EDIT: In order to connect this with the derivative of a function of one variable, consider the case $a=b=1$. There you would define $$ f'(x):=\lim_{h\rightarrow 0}\frac{f(x+h)-f(x)}{h}. $$ Equivalently, $$ \lim_{h\rightarrow 0}\frac{f(x+h)-f(x)-f'(x)h}{h}=0 $$ or $$ {\vert f(x+h)-f(x)-f'(x)h \vert} = o(\vert h\vert ) $$ in Landau notation. Thus in this case $df_x$ is the linear map with $df_x(h)=f'(x)h$. Now for the matrix case the expression $$ \lim_{H\rightarrow 0} H^{-1}( F(X+H)-F(X)) $$ only makes sense for square matrices and invertible $H$. That is why in general one uses the definition $$ \Vert F(X+H)-F(X)-dF_X(H) \Vert=o(\Vert H\Vert), $$ which, as demonstrated above, is equivalent to the usual definition in the one-variable case. Without referring to the $o()$ notation, you could also say that $dF_X$ is the unique linear map such that $$ \lim_{\Vert H \Vert\rightarrow 0 }\frac{\Vert F(X+H)-F(X)-dF_X(H) \Vert}{ \Vert H\Vert } = 0. $$

To be honest, I am not as familiar with this notation. Usually when I think about an $h$ term with regard to derivatives, it is approaching 0 (as in $\lim_{h\to0} \frac{f(x+h)-f(x)}{h}$). It does make sense that $X$ would have no effect on the derivative (kind of like $(ax)'=a$ with scalars). However, I cannot imagine how this would play out with concrete numbers. How do I resolve this $H$ term? Sorry if I am too incompetent... — KGS, Feb 02 '18 at 17:59
Ok thanks, now I understand the notation better (I totally didn't take into account what $^{-1}$ would mean for a matrix). But I still don't understand how I'd actually compute this if I am given $A$. — KGS, Feb 02 '18 at 19:06
Well, what do you want to compute? It's perfectly fine to say that the derivative $dF_X$ is just "multiplication by $A$". Writing down the Jacobian (i.e. a matrix representation of that linear map) is then an exercise in linear algebra, but that does not give you any new insights. — Jan Bohr, Feb 02 '18 at 22:23
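
One way to write down such a matrix representation, assuming the column-stacking convention for $\operatorname{vec}$ (a choice made here, not stated in the thread), is $\operatorname{vec}(AH) = (I_p \otimes A)\operatorname{vec}(H)$. Under that convention the Jacobian is the $mp \times np$ block-diagonal matrix $I_p \otimes A$, with $p$ copies of $A$ on the diagonal. A minimal NumPy sketch follows; the sizes, the seed, and the helper `vec` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 2, 3, 4  # arbitrary illustrative sizes (assumption)

A = rng.standard_normal((m, n))  # the fixed matrix A
H = rng.standard_normal((n, p))  # an arbitrary direction H


def vec(M):
    """Stack the columns of M into one long vector (column-major order)."""
    return M.reshape(-1, order="F")


# Jacobian of W -> A W with respect to vec(W): p copies of A on the diagonal.
J = np.kron(np.eye(p), A)  # shape (m*p, n*p)

# Applying the Jacobian to vec(H) reproduces vec(A H).
lhs = J @ vec(H)
rhs = vec(A @ H)
print(J.shape, np.allclose(lhs, rhs))
```

With a row-stacking convention instead, the same linear map would be represented by $A \otimes I_p$; the map "multiplication by $A$" is the same either way, only its matrix representation changes.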

score 1 · Answer 2 · answered Feb 02 '18 at 17:31

If $C(W) = AW$, then $C(W+H) = C(W) + AH$, so the derivative is $DC(W)(H) = AH$.

copper.hat

Could you help me out on this similar question? https://math.stackexchange.com/questions/4854753/simple-matrix-calculus-and-yet-i-am-struggling-to-understand @copper.hat – wrek, Feb 01 '24 at 04:57
