
In the matrix cookbook there is an identity $$\frac{\partial (a^TX^T b)}{\partial X} = ba^T$$

I recently ran into a problem where I had to compute $$\frac{\partial (X^T b)}{\partial X}$$

but I couldn't find a formula for this. However, it seems that, at least for my example,

$$\frac{\partial (X^T b)}{\partial X} = b$$.

Does this formula hold in general?

Does it even make sense to take the derivative $$\frac{\partial (X^T b)}{\partial X}?$$

The problem where this came up was Section 3.1.5 of Pattern Recognition and Machine Learning, specifically taking the derivative with respect to $W$ of (3.33):

$$\ln p(T|X,W,\beta)=\frac{NK}{2}\ln\left(\frac{\beta}{2\pi}\right) - \frac{\beta}{2}\sum_{n=1}^N \| t_n -W^T \phi (x_n)\|^2$$ where I used the chain rule to compute:

$$\frac{\partial}{\partial W}ln(p(T|X,W,\beta))=- \frac{\beta}{2}\sum_{n=1}^N \frac{\partial}{\partial W}(t_n -W^T \phi (x_n))^T(t_n -W^T \phi (x_n)) $$

$$=- \frac{\beta}{2}\sum_{n=1}^N \frac{\partial}{\partial (t_n-W^T \phi (x_n))}(t_n -W^T \phi (x_n))^T(t_n -W^T \phi (x_n))\frac{\partial}{\partial W} (t_n-W^T \phi (x_n)) $$

Then I used $$\frac{\partial (x^Tx)}{\partial x}=2x,$$ and to compute the derivative

$$\frac{\partial}{\partial W} (t_n-W^T \phi (x_n))$$ I used

$$\frac{\partial (X^T b)}{\partial X} = b$$

which seems to give the correct results.

Furthermore, a proof in a similar vein to this seems to work, although I'm not sure whether it is valid.
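The gradient my chain-rule argument produces can at least be sanity-checked numerically. The sketch below (my own check, not from the book) compares the claimed gradient $\frac{\partial}{\partial W}\|t_n - W^T\phi(x_n)\|^2 = -2\,\phi(x_n)(t_n - W^T\phi(x_n))^T$ against central finite differences for one term of the sum:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 4, 3                      # phi(x_n) in R^M, t_n in R^K, W in R^{M x K}
W = rng.standard_normal((M, K))
phi = rng.standard_normal(M)
t = rng.standard_normal(K)

def sq_err(W):
    """One summand of the log-likelihood: ||t - W^T phi||^2."""
    r = t - W.T @ phi
    return r @ r

# Gradient suggested by the chain-rule argument: -2 phi (t - W^T phi)^T
grad_analytic = -2.0 * np.outer(phi, t - W.T @ phi)

# Central finite-difference check of every entry of W
eps = 1e-6
grad_numeric = np.zeros_like(W)
for i in range(M):
    for j in range(K):
        Wp = W.copy(); Wp[i, j] += eps
        Wm = W.copy(); Wm[i, j] -= eps
        grad_numeric[i, j] = (sq_err(Wp) - sq_err(Wm)) / (2 * eps)

print(np.allclose(grad_analytic, grad_numeric, atol=1e-5))  # True
```

So even if the intermediate step abuses notation, the resulting $M \times K$ gradient matrix matches the numerical derivative.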

Saad
  • 35,369
  • This seems possibly relevant https://math.stackexchange.com/q/2044191/280789 – tail_recursion Jun 04 '21 at 06:42
  • 1
    Be careful that for the first formula to make sense (the one with $a^T X^Tb$), you need $X$ to be a matrix, while for the second (the one you are asking to prove) $X$ is a vector – SolubleFish Jun 04 '21 at 07:01
  • @SolubleFish Why does $X$ need to be a vector in the second formula? – tail_recursion Jun 04 '21 at 11:53
  • 1
    Because if $X$ is a matrix and $b$ a vector, then $f(X) = X^Tb$ is a vector value function, therefore the gradient $\nabla_Xf$ is a $(2,1)$-tensor, which cannot be equal to $b$ – SolubleFish Jun 04 '21 at 13:10

3 Answers


In this case it is much clearer to think of the derivative as a linearization of the map $$ F\colon \mathbb R^{m\times n} \to \mathbb R^n,\, X\mapsto X^T b. $$ Indeed since $$ F(X+V)=(X+V)^Tb = X^Tb+V^Tb=F(X)+V^Tb $$ we find that $$ D_XF(V)=\frac{\partial (X^Tb)}{\partial X}(V)=V^Tb. $$

With this there is no need for partial derivatives or for writing out the matrices at play. Generally, this definition of the derivative (see here) is often helpful in similar settings.
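Because $F$ is linear in $X$, the linearization is exact: $F(X+V) - F(X)$ equals $V^Tb$ with no remainder term. A quick numerical illustration of this (dimensions $m, n$ chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
X = rng.standard_normal((m, n))
b = rng.standard_normal(m)

F = lambda X: X.T @ b            # F : R^{m x n} -> R^n

# For a linear map, the difference along any direction V is exactly
# the derivative applied to V: F(X + V) - F(X) = V^T b.
V = rng.standard_normal((m, n))
print(np.allclose(F(X + V) - F(X), V.T @ b))  # True
```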

Alex
  • 943

Yes, this formula is true. Assuming $X\in\mathbb R^n$ and $b\in\mathbb R^n$, we obtain $$X^T b = \sum_{i=1}^n X_i b_i.$$

The partial derivative of this linear combination with respect to $X_k$ is $b_k$, which proves your formula:

\begin{align*} \frac{\partial(X^T b)}{\partial X} = b. \end{align*}
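In this vector case the claim is easy to verify numerically: the finite-difference gradient of the scalar $X^Tb$ with respect to the vector $X$ recovers $b$ componentwise. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
X = rng.standard_normal(n)       # here X is a vector in R^n
b = rng.standard_normal(n)

f = lambda X: X @ b              # X^T b is a scalar

# Central difference in each coordinate direction e_k
eps = 1e-6
e = np.eye(n)
grad = np.array([(f(X + eps * e[k]) - f(X - eps * e[k])) / (2 * eps)
                 for k in range(n)])
print(np.allclose(grad, b))  # True
```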

L. Milla
  • 785

Let's consider $X \in \mathbb{R}^{m \times n}$, so $b \in \mathbb{R}^m$ and $X^Tb \in \mathbb{R}^n$.

The derivative $\frac{\partial X^Tb}{\partial X}$ has $n \times m \times n$ entries, so it is helpful to compute each slice $\frac{\partial (X^Tb)_k}{\partial X}$ separately:

$$\frac{\partial (X^Tb)_k}{\partial X} = \begin{bmatrix} \frac{\partial (X^Tb)_k}{\partial X_{11}} & \frac{\partial (X^Tb)_k}{\partial X_{12}} & \cdots & \frac{\partial (X^Tb)_k}{\partial X_{1n}} \\ \frac{\partial (X^Tb)_k}{\partial X_{21}} & \frac{\partial (X^Tb)_k}{\partial X_{22}} & \cdots & \frac{\partial (X^Tb)_k}{\partial X_{2n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial (X^Tb)_k}{\partial X_{m1}} & \frac{\partial (X^Tb)_k} {\partial X_{m2}} & \cdots & \frac{\partial (X^Tb)_k}{\partial X_{mn}} \\ \end{bmatrix}$$

If we compute for each element:

$$\frac{\partial (X^Tb)_k}{\partial X_{ij}} = \frac{\partial}{\partial X_{ij}} \sum_{l = 1}^{m}X^T_{kl}b_l = \frac{\partial}{\partial X_{ij}} \sum_{l = 1}^{m}X_{lk}b_l = \begin{cases} 0 & \text{if $k \ne j$} \\ b_i & \text{if $k = j$} \end{cases}$$

So the columns of the matrix are all zero except the $k$th one:

$$\frac{\partial (X^Tb)_k}{\partial X} = \begin{bmatrix} 0 & \cdots & b_1 & \cdots & 0 \\ \vdots & & \vdots & & \vdots \\ 0 & \cdots & b_m & \cdots & 0 \\ \end{bmatrix}$$

As you mentioned, in this example the results come out correct if we simply take $\frac{\partial X^Tb}{\partial X} = b$, but a derivative of a vector with respect to a matrix is not guaranteed to collapse to a vector in general.

To avoid dealing with tensors, I recommend not using the chain rule and instead computing each element of the derivative separately.
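The tensor structure derived above can be confirmed numerically: the slice $T[k,i,j] = \partial (X^Tb)_k / \partial X_{ij}$ should equal $b_i$ when $j = k$ and $0$ otherwise. A sketch (dimensions chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 4, 3
X = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Build the full n x m x n derivative tensor by central differences:
# T[k, i, j] = d(X^T b)_k / dX_{ij}
eps = 1e-6
T = np.zeros((n, m, n))
for i in range(m):
    for j in range(n):
        Xp = X.copy(); Xp[i, j] += eps
        Xm = X.copy(); Xm[i, j] -= eps
        T[:, i, j] = ((Xp.T @ b) - (Xm.T @ b)) / (2 * eps)

# Expected structure: the k-th slice has b in its k-th column, zeros elsewhere
expected = np.zeros((n, m, n))
for k in range(n):
    expected[k, :, k] = b
print(np.allclose(T, expected, atol=1e-5))  # True
```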

fraxea
  • 23