In the matrix cookbook there is an identity $$\frac{\partial (a^TX^T b)}{\partial X} = ba^T$$
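To convince myself of such identities numerically, I check them with central finite differences like this (a quick sketch; the shapes are arbitrary choices of mine):

```python
import numpy as np

# Finite-difference check of the cookbook identity d(a^T X^T b)/dX = b a^T.
rng = np.random.default_rng(0)
m, n = 4, 3
X = rng.standard_normal((m, n))
a = rng.standard_normal(n)   # X^T b is n-dimensional, so a must be too
b = rng.standard_normal(m)

f = lambda X: a @ X.T @ b    # scalar-valued function of the matrix X

eps = 1e-6
grad_fd = np.zeros_like(X)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(X)
        E[i, j] = eps
        grad_fd[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

print(np.allclose(grad_fd, np.outer(b, a)))  # True: gradient matches b a^T
```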
I recently ran into a problem where I had to compute $$\frac{\partial (X^T b)}{\partial X}$$
but I couldn't find a formula for it. However, it seems that, at least in my example,
$$\frac{\partial (X^T b)}{\partial X} = b$$.
Does this formula hold in general?
Does it even make sense to take the derivative $$\frac{\partial (X^T b)}{\partial X}$$?
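Part of my confusion is about shapes: since $$X^T b$$ is a vector and X is a matrix, the full derivative should be a third-order tensor rather than the single vector b. Here is a small numerical sketch of that tensor's structure (my own check, arbitrary shapes):

```python
import numpy as np

# D[i, j, k] = d(X^T b)_i / dX[j, k].  Since (X^T b)_i = sum_j X[j, i] * b[j],
# componentwise we should get D[i, j, k] = b[j] * delta_{ik}.
rng = np.random.default_rng(1)
m, n = 4, 3
X = rng.standard_normal((m, n))
b = rng.standard_normal(m)

eps = 1e-6
D = np.zeros((n, m, n))
for j in range(m):
    for k in range(n):
        E = np.zeros_like(X)
        E[j, k] = eps
        D[:, j, k] = ((X + E).T @ b - (X - E).T @ b) / (2 * eps)

expected = np.einsum('j,ik->ijk', b, np.eye(n))
print(np.allclose(D, expected))  # True: the slice D[i] is the matrix b e_i^T
```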
The problem where this came up was Section 3.1.5 of Pattern Recognition and Machine Learning, specifically taking the derivative with respect to W of equation (3.33):
$$\ln p(T|X,W,\beta)=\frac{NK}{2}\ln\left(\frac{\beta}{2\pi}\right) - \frac{\beta}{2}\sum_{n=1}^N \| t_n -W^T \phi (x_n)\|^2,$$ where I used the chain rule to compute:
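In code, my reading of (3.33) is the following (a minimal sketch; Phi stacks the $$\phi(x_n)^T$$ as rows of an N-by-M design matrix and T stacks the $$t_n^T$$ as rows of an N-by-K matrix, names being my own):

```python
import numpy as np

def log_likelihood(T, Phi, W, beta):
    """ln p(T | X, W, beta) as in PRML eq. (3.33).
    T: (N, K) targets, Phi: (N, M) design matrix, W: (M, K) weights."""
    N, K = T.shape
    residuals = T - Phi @ W          # row n is (t_n - W^T phi(x_n))^T
    sq_err = np.sum(residuals ** 2)  # sum_n ||t_n - W^T phi(x_n)||^2
    return N * K / 2 * np.log(beta / (2 * np.pi)) - beta / 2 * sq_err
```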
$$\frac{\partial}{\partial W}\ln p(T|X,W,\beta)=- \frac{\beta}{2}\sum_{n=1}^N \frac{\partial}{\partial W}(t_n -W^T \phi (x_n))^T(t_n -W^T \phi (x_n)) $$
$$=- \frac{\beta}{2}\sum_{n=1}^N \frac{\partial}{\partial (t_n-W^T \phi (x_n))}(t_n -W^T \phi (x_n))^T(t_n -W^T \phi (x_n))\frac{\partial}{\partial W} (t_n-W^T \phi (x_n)) $$
Then I used $$\frac{\partial (x^Tx)}{\partial x}=2x,$$ and to compute the derivative
$$\frac{\partial}{\partial W} (t_n-W^T \phi (x_n))$$ I used
$$\frac{\partial (X^T b)}{\partial X} = b$$
which seems to give the correct results.
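If I assemble the two pieces (matching shapes), I get $$\beta \sum_{n=1}^N \phi(x_n)(t_n - W^T\phi(x_n))^T$$ for the gradient, and this does agree with finite differences (again a sketch with my own variable names):

```python
import numpy as np

# Check the assembled gradient beta * sum_n phi(x_n) (t_n - W^T phi(x_n))^T,
# i.e. beta * Phi^T (T - Phi W) in matrix form, against finite differences.
rng = np.random.default_rng(2)
N, M, K, beta = 5, 3, 2, 1.7
Phi = rng.standard_normal((N, M))
T = rng.standard_normal((N, K))
W = rng.standard_normal((M, K))

ll = lambda W: (N * K / 2 * np.log(beta / (2 * np.pi))
                - beta / 2 * np.sum((T - Phi @ W) ** 2))

eps = 1e-6
grad_fd = np.zeros_like(W)
for j in range(M):
    for k in range(K):
        E = np.zeros_like(W)
        E[j, k] = eps
        grad_fd[j, k] = (ll(W + E) - ll(W - E)) / (2 * eps)

print(np.allclose(grad_fd, beta * Phi.T @ (T - Phi @ W)))  # True
```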
Furthermore, a proof in a similar vein to this seems to work, although I'm not sure whether it is valid.