I'm new to deep learning and am attempting to calculate the derivative of the following function with respect to the matrix w:
$$p(a) = \frac{e^{w_a^T x}}{\sum_{d} e^{w_d^T x}}$$
Using the quotient rule, I get: $$\frac{\partial p(a)}{\partial w} = \frac{x e^{w_a^T x}\sum_{d} e^{w_d^T x} - e^{w_a^T x}\sum_{d} x e^{w_d^T x}}{\left[\sum_{d} e^{w_d^T x}\right]^2} = 0$$
I believe I'm doing something wrong: softmax is used as an activation function throughout deep learning, so its derivative cannot be identically 0. I've gone over similar questions, but they seem to gloss over this step of the calculation.
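For what it's worth, here is a quick numerical sanity check I put together (the dimensions, variable names, and the choice of differentiating with respect to the row $w_a$ are just placeholders of my own): a central finite-difference estimate of $\partial p(a)/\partial w_a$ for a random $w$ and $x$ comes out clearly nonzero, which is why I'm convinced my algebra above is off.

```python
# Finite-difference check that the softmax gradient is not identically zero.
# This is only a sanity-check sketch; D, K, and the random inputs are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
D, K = 4, 3                     # input dimension, number of classes
x = rng.normal(size=D)
w = rng.normal(size=(K, D))     # row d holds the vector w_d

def p(w, a):
    """Softmax probability of class a for input x."""
    logits = w @ x
    logits -= logits.max()      # shift for numerical stability
    exps = np.exp(logits)
    return exps[a] / exps.sum()

a = 0
eps = 1e-6
grad_fd = np.zeros(D)
for i in range(D):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[a, i] += eps
    w_minus[a, i] -= eps
    grad_fd[i] = (p(w_plus, a) - p(w_minus, a)) / (2 * eps)

print(grad_fd)                  # clearly nonzero entries
```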
I'd appreciate any pointers in the right direction.