Questions tagged [softmax]

66 questions
80 votes · 6 answers

Cross-entropy loss explanation

Suppose I build a neural network for classification. The last layer is a dense layer with Softmax activation. I have five different classes to classify. Suppose for a single training example, the true label is [1 0 0 0 0] while the predictions be…
enterML • 3,091
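
A quick worked example of what this question is asking about (the prediction vector below is made up, since the excerpt is truncated): with a one-hot label, cross-entropy collapses to the negative log of the probability the model assigned to the true class.

    import numpy as np

    y_true = np.array([1, 0, 0, 0, 0])            # one-hot label, class 0 is correct
    y_pred = np.array([0.6, 0.1, 0.1, 0.1, 0.1])  # hypothetical softmax output

    # Cross-entropy: -sum(y_true * log(y_pred)); with a one-hot label only
    # the true class's term survives.
    loss = -np.sum(y_true * np.log(y_pred))
    print(loss)  # -log(0.6) ≈ 0.511
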
36 votes · 4 answers

Gumbel-Softmax trick vs Softmax with temperature

From what I understand, the Gumbel-Softmax trick is a technique that enables us to sample discrete random variables, in a way that is differentiable (and therefore suited for end-to-end deep learning). Many papers and articles describe it as a way…
4-bit • 461
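
For readers comparing the two techniques, a minimal NumPy sketch of the distinction (logits and temperature are made-up values): softmax with temperature is deterministic given the logits, while Gumbel-Softmax first adds i.i.d. Gumbel(0, 1) noise, which is what turns it into a (relaxed, differentiable) sampling procedure.

    import numpy as np

    def softmax(z):
        z = z - z.max()                  # shift for numerical stability
        e = np.exp(z)
        return e / e.sum()

    logits = np.array([2.0, 1.0, 0.1])
    tau = 0.5                            # temperature

    # Softmax with temperature: same output on every call.
    probs = softmax(logits / tau)

    # Gumbel-Softmax: perturb logits with Gumbel(0, 1) noise, then apply the
    # tempered softmax; each call gives a different near-one-hot sample.
    g = -np.log(-np.log(np.random.uniform(size=logits.shape)))
    sample = softmax((logits + g) / tau)
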
6 votes · 5 answers

Do I need to standardize my one-hot encoded labels?

I'm trying to do a simple softmax regression where I have features (2 columns) and a one-hot encoded vector of labels (two categories: left = 1 and right = 0). Do I need to standardize just the vector of features, or also the vector of labels? When…
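
The usual convention (stated here as background, since the answers aren't shown): standardize only the feature columns; a one-hot label already lives on a fixed {0, 1} scale and encodes class identity rather than magnitude. A minimal scikit-learn sketch with made-up data:

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    X = np.array([[1.0, 200.0],
                  [2.0, 150.0],
                  [3.0, 300.0]])   # two feature columns
    y = np.array([1, 0, 1])        # labels (left = 1, right = 0)

    X_scaled = StandardScaler().fit_transform(X)  # features only
    # y stays untouched: standardizing it would destroy the class encoding.
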
5 votes · 1 answer

Can I turn any binary classification algorithm into a multiclass algorithm using softmax and cross-entropy loss?

Softmax + cross-entropy loss for multiclass classification is used in ML algorithms such as softmax regression and (the last layer of) neural networks. I wonder if this method could turn any binary classification algorithm into a multiclass one? For…
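
For context, a minimal NumPy sketch of the softmax-regression setup the question refers to (dimensions are made up): one weight vector per class, a softmax over the K scores, and cross-entropy against the one-hot label.

    import numpy as np

    K, d = 3, 4                        # classes, features
    rng = np.random.default_rng(0)
    W = rng.normal(size=(K, d))        # one weight vector per class
    x = rng.normal(size=d)
    y = 1                              # true class index

    scores = W @ x                     # one score per class
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()               # softmax
    loss = -np.log(probs[y])           # cross-entropy with a one-hot label
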
5 votes · 1 answer

What is the advantage of using Euler's number ($e^x$) instead of another base in the softmax equation?

I understand the softmax equation is $\boldsymbol{P}(y=j \mid x)=\frac{e^{x_{j}}}{\sum_{k=1}^{K} e^{x_{k}}}$. My question is: why use $e^x$ instead of, say, $3^x$? I understand $e^x$ is its own derivative, but how is that advantageous in this…
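
One step worth making explicit: changing the base only rescales the logits, since $3^x = e^{x \ln 3}$, so $\frac{3^{x_j}}{\sum_k 3^{x_k}} = \frac{e^{x_j \ln 3}}{\sum_k e^{x_k \ln 3}}$. A base-3 softmax is just the base-$e$ softmax at temperature $1/\ln 3$, which a two-line check confirms:

    import numpy as np

    x = np.array([0.5, 1.5, -0.2])
    base3 = 3.0 ** x / (3.0 ** x).sum()
    scaled_e = np.exp(x * np.log(3)) / np.exp(x * np.log(3)).sum()
    assert np.allclose(base3, scaled_e)   # identical distributions
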
4 votes · 2 answers

Multiclass Classification with Decision Trees: Why do we calculate a score and apply softmax?

I'm trying to figure out why, when using decision trees for multi-class classification, it is common to calculate a score and apply softmax instead of just averaging the terminal nodes' probabilities. Let's say our model is two trees. A…
Caleb • 151
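
To illustrate the setup the question describes (numbers are made up; this mirrors what gradient-boosted trees such as XGBoost do for multi-class problems): each class accumulates a raw margin score across trees, and only the summed margins are mapped to probabilities.

    import numpy as np

    # Raw (margin) scores from two boosting rounds, one score per class.
    tree_scores = np.array([[0.3, -0.1,  0.2],   # round 1
                            [0.1,  0.4, -0.2]])  # round 2

    margins = tree_scores.sum(axis=0)            # additive ensemble
    probs = np.exp(margins - margins.max())
    probs /= probs.sum()                         # softmax over class margins
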
3 votes · 1 answer

Difference in performance Sigmoid vs. Softmax

For the same binary image classification task, if in the final layer I use 1 node with a sigmoid activation function and the binary_crossentropy loss function, then the training process goes through pretty smoothly (92% accuracy after 3 epochs on…
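
A fact that often comes up under this question: a single sigmoid unit is mathematically a two-class softmax with one logit pinned to zero, so any performance gap comes from parameterization and optimization dynamics, not from expressiveness. A quick check (logit value is arbitrary):

    import numpy as np

    z = 1.3
    sigmoid = 1.0 / (1.0 + np.exp(-z))

    # Two-class softmax over logits [z, 0]: the first component is sigmoid(z).
    e = np.exp([z, 0.0])
    softmax2 = e / e.sum()
    assert np.isclose(sigmoid, softmax2[0])
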
3 votes · 3 answers

Dot product for similarity in word to vector computation in NLP

In NLP, when computing word2vec we try to maximize $\log P(o \mid c)$, where $P(o \mid c)$ is the probability that $o$ is the outside word given that $c$ is the center word, $u_o$ is the word vector for the outside word, $v_c$ is the word vector for the center word, and $T$ is the number of words in…
Vivek Dani • 129
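
For readers missing the truncated formula: in skip-gram, $P(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w=1}^{V} \exp(u_w^\top v_c)}$, i.e. a softmax over the dot products of the center vector with every candidate outside vector. A small sketch with toy vectors:

    import numpy as np

    V, d = 5, 3                      # vocabulary size, embedding dimension
    rng = np.random.default_rng(1)
    U = rng.normal(size=(V, d))      # outside-word vectors u_w
    v_c = rng.normal(size=d)         # center-word vector

    scores = U @ v_c                 # dot-product similarities
    p = np.exp(scores - scores.max())
    p /= p.sum()                     # P(o | c) for every candidate o
    log_p = np.log(p[2])             # log P(o = 2 | c), the term to maximize
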
3 votes · 1 answer

Softmax activation predictions not summing to 1

I am a beginner with RNNs; consider this sample code:

    from tensorflow import keras
    import numpy as np

    if __name__ == '__main__':
        model = keras.Sequential((
            keras.layers.SimpleRNN(5, activation="softmax", input_shape=(1, 3)),
        ))
    …
user329387 • 31
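
Whatever is going on in this specific snippet, the conventional layout (shown below as a sketch, not as the asker's fix) keeps a nonlinear activation such as tanh inside the recurrence and puts the softmax in a separate Dense head, which guarantees per-example probabilities that sum to 1.

    from tensorflow import keras

    # Conventional layout: tanh inside the recurrence, softmax in a Dense head.
    model = keras.Sequential([
        keras.layers.SimpleRNN(5, activation="tanh", input_shape=(1, 3)),
        keras.layers.Dense(3, activation="softmax"),  # rows sum to 1
    ])
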
3 votes · 1 answer

PyTorch cross-entropy loss when the predictions already have probabilities

So, normally categorical cross-entropy could be applied using a cross-entropy loss function in PyTorch, or by combining a log-softmax with the negative log-likelihood function, as follows:

    m = nn.LogSoftmax(dim=1)
    loss = nn.NLLLoss()
    pred = …
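
Completing the pattern the excerpt starts (tensor shapes are made up): nn.CrossEntropyLoss applied to raw logits is exactly nn.LogSoftmax followed by nn.NLLLoss, which is why the inputs should not already be probabilities for either variant.

    import torch
    import torch.nn as nn

    logits = torch.randn(4, 5)            # batch of 4, 5 classes, raw scores
    target = torch.tensor([0, 2, 1, 4])

    m = nn.LogSoftmax(dim=1)
    nll = nn.NLLLoss()
    loss_a = nll(m(logits), target)       # log-softmax + negative log-likelihood

    loss_b = nn.CrossEntropyLoss()(logits, target)  # the same thing in one call
    assert torch.allclose(loss_a, loss_b)
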
3 votes · 1 answer

Why softmax training is more stable

I'm wondering which activation function will be easier to train with (better accuracy / smaller loss) for a multiclass classification problem: softmax or sigmoid. According…
3 votes · 1 answer

Which activation function for multi-class classification gives true probability (softmax vs sigmoid)

I'm wondering which activation function for a multi-class classification problem gives true probabilities. According to https://ai.stackexchange.com/questions/37889/are-softmax-outputs-of-classifiers-true-probabilities, it seems that the output…
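
A concrete way to see the contrast the linked answer discusses (logits are made up): softmax produces a single distribution over mutually exclusive classes, while per-class sigmoids are independent scores with no constraint on their sum.

    import numpy as np

    z = np.array([2.0, 0.5, -1.0])       # one logit per class

    soft = np.exp(z - z.max())
    soft /= soft.sum()
    sig = 1.0 / (1.0 + np.exp(-z))

    print(soft.sum())  # 1.0: one distribution over exclusive classes
    print(sig.sum())   # ≈ 1.77 here: independent per-class scores
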
2 votes · 0 answers

Precision-Recall Curve Intuition for Multi-Class Classification Utilizing SoftMax Activation

I am running a CNN image multi-class classification model with Keras/TensorFlow and have established about 90% overall accuracy with my best model trial. I have 10 unique classes I am trying to classify. However, I want to present a PRC for the…
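
A common recipe for this situation (sketched with random stand-ins, since the asker's data isn't shown): binarize the labels one-vs-rest and draw one precision-recall curve per class directly from the softmax scores.

    import numpy as np
    from sklearn.preprocessing import label_binarize
    from sklearn.metrics import precision_recall_curve

    n_classes = 10
    y_true = np.random.randint(0, n_classes, size=200)      # stand-in labels
    y_score = np.random.dirichlet(np.ones(n_classes), 200)  # stand-in softmax outputs

    Y = label_binarize(y_true, classes=list(range(n_classes)))  # one-vs-rest targets
    for k in range(n_classes):
        prec, rec, _ = precision_recall_curve(Y[:, k], y_score[:, k])
        # plot rec vs. prec for class k here
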
2 votes · 1 answer

Problem with chain rule in softmax layer when differentiated separately

I have some problems with backpropagation in the softmax output layer. I know how it should work, but if I try to apply the chain rule in the classical way, I get different results compared to when softmax is differentiated together with the cross-entropy error. Here's an…
Display name • 153
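
The identity this question is circling: for $p = \mathrm{softmax}(z)$ with one-hot target $y$ and cross-entropy loss $L$, the full Jacobian chain collapses to $\partial L / \partial z = p - y$. A numerical gradient check confirms it:

    import numpy as np

    def loss(z, y):
        p = np.exp(z - z.max())
        p /= p.sum()
        return -np.log(p[y])               # cross-entropy, integer target y

    z = np.array([0.2, -1.0, 0.7])
    y = 2
    p = np.exp(z - z.max())
    p /= p.sum()
    analytic = p - np.eye(3)[y]            # the combined derivative p - y

    eps = 1e-6
    numeric = np.array([(loss(z + eps * np.eye(3)[i], y) -
                         loss(z - eps * np.eye(3)[i], y)) / (2 * eps)
                        for i in range(3)])
    assert np.allclose(analytic, numeric, atol=1e-5)
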
2 votes · 1 answer

Why use different variations of Softmax in training and validation for neural networks with Pytorch?

Specifically, I'm working on a modeling project, and I see someone else's code that looks like:

    def forward(self, x):
        x = self.fc1(x)
        x = self.activation1(x)
        x = self.fc2(x)
        x = self.activation2(x)
        x = self.fc3(x)
        x = …
Anon • 123
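
A plausible reading of that pattern (the full code isn't shown): F.log_softmax pairs with nn.NLLLoss during training because folding the log into the softmax is numerically stabler, while plain softmax is used at validation/inference when readable probabilities are wanted. The two agree up to an exp, so the predicted class is identical:

    import torch
    import torch.nn.functional as F

    logits = torch.randn(2, 4)

    log_probs = F.log_softmax(logits, dim=1)  # training: feeds nn.NLLLoss
    probs = F.softmax(logits, dim=1)          # eval: human-readable probabilities

    # exp(log_softmax) == softmax, so argmax (the predicted class) matches.
    assert torch.allclose(log_probs.exp(), probs)
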