I'm new to deep learning and am attempting to calculate the derivative of the following function with respect to the matrix w:
$$p(a) = \frac{e^{w_a^T x}}{\sum_{d} e^{w_d^T x}}$$
Using the quotient rule, I get: $$\frac{\partial p(a)}{\partial w} = \frac{x e^{w_a^T x}\sum_{d} e^{w_d^T x} - e^{w_a^T x}\sum_{d} x e^{w_d^T x}}{\left[\sum_{d} e^{w_d^T x}\right]^2} = 0$$
I believe I'm doing something wrong: softmax is used as an activation function throughout deep learning, so its derivative cannot be identically 0. I've gone over similar questions, but they seem to gloss over this step of the calculation.
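For what it's worth, here is a quick numerical sanity check I put together (the dimensions, variable names, and the choice of differentiating with respect to the row $w_a$ are just placeholders of my own): a central finite-difference estimate of $\partial p(a)/\partial w_a$ for a random $w$ and $x$ comes out clearly nonzero, which is why I'm convinced my algebra above is off.

```python
# Finite-difference check that the softmax gradient is not identically zero.
# This is only a sanity-check sketch; D, K, and the random inputs are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
D, K = 4, 3                     # input dimension, number of classes
x = rng.normal(size=D)
w = rng.normal(size=(K, D))     # row d holds the vector w_d

def p(w, a):
    """Softmax probability of class a for input x."""
    logits = w @ x
    logits -= logits.max()      # shift for numerical stability
    exps = np.exp(logits)
    return exps[a] / exps.sum()

a = 0
eps = 1e-6
grad_fd = np.zeros(D)
for i in range(D):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[a, i] += eps
    w_minus[a, i] -= eps
    grad_fd[i] = (p(w_plus, a) - p(w_minus, a)) / (2 * eps)

print(grad_fd)                  # clearly nonzero entries
```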
I'd appreciate any pointers in the right direction.