Questions tagged [softmax]
66 questions
80
votes
6 answers
Cross-entropy loss explanation
Suppose I build a neural network for classification. The last layer is a dense layer with Softmax activation. I have five different classes to classify. Suppose for a single training example, the true label is [1 0 0 0 0] while the predictions are…
enterML
- 3,091
- 9
- 28
- 38
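A minimal NumPy sketch of the loss this question asks about; the prediction vector below is made up for illustration, since the question's own numbers are cut off:

import numpy as np

# One-hot true label from the question and a hypothetical softmax output.
y_true = np.array([1, 0, 0, 0, 0])
y_pred = np.array([0.7, 0.1, 0.1, 0.05, 0.05])  # sums to 1

# Cross-entropy: -sum_j y_j * log(p_j). With a one-hot target this
# reduces to -log of the probability assigned to the true class.
loss = -np.sum(y_true * np.log(y_pred))
print(loss)  # -log(0.7) ≈ 0.357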
36
votes
4 answers
Gumbel-Softmax trick vs Softmax with temperature
From what I understand, the Gumbel-Softmax trick is a technique that enables us to sample discrete random variables, in a way that is differentiable (and therefore suited for end-to-end deep learning).
Many papers and articles describe it as a way…
4-bit
- 461
- 1
- 4
- 3
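A small NumPy sketch of the contrast the question draws, with made-up logits: temperature alone reshapes the softmax deterministically, while adding Gumbel noise turns each call into a (soft) random sample:

import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def gumbel_softmax_sample(logits, temperature=1.0):
    # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1)
    u = rng.uniform(1e-12, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    return softmax((logits + g) / temperature)

logits = np.array([1.0, 2.0, 0.5])
print(softmax(logits / 0.5))               # same output every call
print(gumbel_softmax_sample(logits, 0.5))  # different on each call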
6
votes
5 answers
Do I need to standardize my one hot encoded labels?
I'm trying to do a simple softmax regression where I have features (2 columns) and a one-hot encoded vector of labels (two categories: Left = 1 and Right = 0). Do I need to standardize just the vector of features, or also the vector of labels? When…
José Lucas Araújo dos Santos
- 61
- 1
- 2
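A short scikit-learn sketch of the usual practice (the data here is invented): standardize the feature columns only; the 0/1 labels are class indicators, not measurements, and are left untouched:

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 220.0]])  # 2 feature columns
y = np.array([1, 0, 1])                                   # Left = 1, Right = 0

X_std = StandardScaler().fit_transform(X)  # features standardized
# y stays as-is: standardizing one-hot labels would break the loss.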
5
votes
1 answer
Can I turn any binary classification algorithms into multiclass algorithms using softmax and cross-entropy loss?
Softmax + cross-entropy loss for multiclass classification is used in ML algorithms such as softmax regression and (the last layer of) neural networks. I wonder if this method could turn any binary classification algorithm into a multiclass one? For…
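One common reading of the question, sketched below with invented helper names: train one binary scorer per class (one-vs-rest) and softmax the per-class scores into a distribution:

import numpy as np
from sklearn.linear_model import LogisticRegression

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def ovr_softmax_proba(X_train, y_train, X_test, n_classes):
    # One binary classifier per class, scored against the rest.
    scores = np.column_stack([
        LogisticRegression()
        .fit(X_train, (y_train == k).astype(int))
        .decision_function(X_test)
        for k in range(n_classes)
    ])
    return softmax(scores)  # normalize the K scores into probabilities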
5
votes
1 answer
What is the advantage of using Euler's number ($e^x$) instead of another base in the softmax equation?
I understand the softmax equation is
$\boldsymbol{P}(y=j \mid x)=\frac{e^{x_{j}}}{\sum_{k=1}^{K} e^{x_{k}}}$
My question is: why use $e^x$ instead of, say, $3^x$? I understand $e^x$ is its own derivative, but how is that advantageous in this…
Codedorf
- 53
- 5
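One way to see why the base barely matters: for any base $b > 0$, $b^{x_j} = e^{x_j \ln b}$, so

$\frac{b^{x_j}}{\sum_{k=1}^{K} b^{x_k}} = \frac{e^{x_j \ln b}}{\sum_{k=1}^{K} e^{x_k \ln b}}$

That is, switching from $e$ to $3$ just multiplies every logit by $\ln 3$, a fixed temperature change the network's weights can absorb. Base $e$ is preferred because it gives the cleanest derivative, $\frac{\partial}{\partial x_j} e^{x_j} = e^{x_j}$.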
4
votes
2 answers
Multiclass Classification with Decision Trees: Why do we calculate a score and apply softmax?
I'm trying to figure out why, when using decision trees for multi-class classification, it is common to calculate a score and apply softmax instead of just averaging the terminal nodes' probabilities.
Let's say our model is two trees. A…
Caleb
- 151
- 1
- 1
- 3
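A toy NumPy contrast between the two recipes the question compares (the scores are invented; boosting-style ensembles such as XGBoost use the first recipe):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

tree1 = np.array([0.4, -0.1, 0.2])   # raw per-class scores from tree 1
tree2 = np.array([0.3, 0.2, -0.4])   # raw per-class scores from tree 2

print(softmax(tree1 + tree2))                 # sum scores, then one softmax
print((softmax(tree1) + softmax(tree2)) / 2)  # average per-tree probabilities
# The two generally disagree: softmax of a sum is not the mean of softmaxes.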
3
votes
1 answer
Difference in performance Sigmoid vs. Softmax
For the same Binary Image Classification task, if in the final layer I use 1 node with Sigmoid activation function and binary_crossentropy loss function, then the training process goes through pretty smoothly (92% accuracy after 3 epochs on…
Confucius Cat
- 51
- 4
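For the binary case the two heads are mathematically equivalent, so performance gaps usually come from how the loss and labels are wired up rather than from the activations themselves. A quick check with made-up logits:

import numpy as np

z0, z1 = 0.3, 1.2  # hypothetical logits for the two classes

p_softmax = np.exp(z1) / (np.exp(z0) + np.exp(z1))  # 2-way softmax, class 1
p_sigmoid = 1 / (1 + np.exp(-(z1 - z0)))            # sigmoid of logit gap
print(np.isclose(p_softmax, p_sigmoid))             # True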
3
votes
3 answers
Dot product for similarity in word to vector computation in NLP
In NLP, when computing word-to-vector (word2vec) representations, we try to maximize $\log P(o \mid c)$, where $P(o \mid c)$ is the probability that $o$ is the outside word given that $c$ is the center word.
$u_o$ is the word vector for the outside word,
$v_c$ is the word vector for the center word, and
$T$ is the number of words in…
Vivek Dani
- 129
- 1
- 5
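Assuming $T$ is the vocabulary size (the excerpt is truncated), the standard skip-gram formula behind the question is

$P(o \mid c) = \frac{\exp(u_o^{\top} v_c)}{\sum_{w=1}^{T} \exp(u_w^{\top} v_c)}$

The dot product $u_o^{\top} v_c$ scores how similar the outside and center vectors are, and softmax normalizes those scores over the whole vocabulary; maximizing $\log P(o \mid c)$ pulls $u_o$ and $v_c$ together while pushing $v_c$ away from the other word vectors.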
3
votes
1 answer
Softmax activation predictions not summing to 1
I am a beginner with RNNs; consider this sample code:
from tensorflow import keras
import numpy as np

if __name__ == '__main__':
    model = keras.Sequential((
        keras.layers.SimpleRNN(5, activation="softmax", input_shape=(1, 3)),
    ))
    …
user329387
- 31
- 1
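A hedged sketch of the usual fix (not necessarily the accepted answer): keep the RNN's default tanh activation and apply softmax once, in a Dense output layer, so the normalization runs over the class dimension:

from tensorflow import keras
import numpy as np

model = keras.Sequential([
    keras.Input(shape=(1, 3)),
    keras.layers.SimpleRNN(5),                    # default tanh activation
    keras.layers.Dense(5, activation="softmax"),  # normalize over 5 classes
])
x = np.random.rand(2, 1, 3)
print(model.predict(x).sum(axis=-1))  # each row sums to 1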
3
votes
1 answer
Pytorch doing a cross entropy loss when the predictions already have probabilities
So, normally categorical cross-entropy could be applied using a cross-entropy loss function in PyTorch, or by combining LogSoftmax with the negative log-likelihood function, as follows:
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
pred =…
user3023715
- 203
- 2
- 5
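If the model already outputs probabilities rather than logits, one option (a sketch, not a PyTorch-mandated recipe) is to pass their log to NLLLoss, which expects log-probabilities; running LogSoftmax on probabilities would normalize them a second time:

import torch
import torch.nn as nn

probs = torch.tensor([[0.7, 0.2, 0.1]])  # already softmax-normalized
target = torch.tensor([0])

loss = nn.NLLLoss()(torch.log(probs), target)
print(loss)  # -log(0.7) ≈ 0.357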
3
votes
1 answer
Why softmax training is more stable
I'm wondering which activation function will be easier to train with (better accuracy / smaller loss) for a multiclass classification problem: softmax or sigmoid?
According…
user3668129
- 769
- 4
- 15
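One concrete sense in which softmax implementations are "stable" is the log-sum-exp trick: a naive softmax overflows for large logits, while subtracting the max (as log-softmax layers do internally) leaves the result unchanged and finite. A NumPy sketch:

import numpy as np

x = np.array([1000.0, 1001.0, 1002.0])
# np.exp(x) / np.exp(x).sum()  ->  nan, since exp(1000) overflows

def stable_softmax(x):
    e = np.exp(x - x.max())  # shift by the max; the ratio is unchanged
    return e / e.sum()

print(stable_softmax(x))  # ≈ [0.090, 0.245, 0.665]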
3
votes
1 answer
Which activation function for multi-class classification gives true probability (softmax vs sigmoid)
I'm wondering which activation function for a multi-class classification problem gives true probabilities.
According to:
https://ai.stackexchange.com/questions/37889/are-softmax-outputs-of-classifiers-true-probabilities
it seems that the output…
user3668129
- 769
- 4
- 15
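A small numeric illustration of the difference, with invented logits: softmax always returns a proper distribution over mutually exclusive classes, while independent sigmoids score each class on its own and need not sum to 1 (and neither output is automatically calibrated; post-hoc temperature scaling is the usual remedy):

import numpy as np

z = np.array([2.0, 1.0, 0.5])  # hypothetical class logits

p_softmax = np.exp(z) / np.exp(z).sum()
p_sigmoid = 1 / (1 + np.exp(-z))

print(p_softmax, p_softmax.sum())  # sums to exactly 1
print(p_sigmoid, p_sigmoid.sum())  # sums to ~2.23: not a distribution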
2
votes
0 answers
Precision-Recall Curve Intuition for Multi-Class Classification Utilizing SoftMax Activation
I am running a CNN image multi-class classification model with Keras/TensorFlow and have established about 90% overall accuracy with my best model trial. I have 10 unique classes I am trying to classify. However, I want to present a PRC for the…
Coldchain9
- 159
- 5
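A scikit-learn sketch of the usual one-vs-rest recipe for this (all data below is random placeholder data): binarize the labels and draw one precision-recall curve per class against that class's softmax column:

import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import precision_recall_curve

n_classes = 10
y_true = np.random.randint(0, n_classes, size=200)           # integer class ids
y_score = np.random.dirichlet(np.ones(n_classes), size=200)   # softmax outputs

y_bin = label_binarize(y_true, classes=range(n_classes))
for k in range(n_classes):
    precision, recall, _ = precision_recall_curve(y_bin[:, k], y_score[:, k])
    # plot (recall, precision) for class k here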
2
votes
1 answer
Problem with chain rule in softmax layer when differentiated separately
I have some problems with backpropagation in the softmax output layer. I know how it should work, but if I try to apply the chain rule in the classical way, I get different results compared to when softmax is differentiated together with the cross-entropy error. Here's an…
Display name
- 153
- 1
- 4
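A NumPy check of what the two routes must agree on (logits and target invented): chaining the cross-entropy gradient through the full softmax Jacobian reproduces the well-known combined shortcut $\partial L / \partial z = p - y$:

import numpy as np

z = np.array([0.2, -1.0, 0.5])  # hypothetical logits
y = np.array([0.0, 1.0, 0.0])   # one-hot target

p = np.exp(z - z.max())
p /= p.sum()

J = np.diag(p) - np.outer(p, p)  # softmax Jacobian: dp_i/dz_j = p_i(δ_ij - p_j)
dL_dp = -y / p                   # cross-entropy gradient w.r.t. probabilities

print(J.T @ dL_dp)  # chain rule applied separately, in full
print(p - y)        # combined derivation: identical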
2
votes
1 answer
Why use different variations of Softmax in training and validation for neural networks with Pytorch?
Specifically, I'm working on a modeling project, and I see someone else's code that looks like
def forward(self, x):
    x = self.fc1(x)
    x = self.activation1(x)
    x = self.fc2(x)
    x = self.activation2(x)
    x = self.fc3(x)
    x =…
Anon
- 123
- 4
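The pattern such code usually reflects, sketched with invented tensors: during training the loss wants raw logits or log-probabilities (CrossEntropyLoss applies log-softmax internally), while plain softmax is only applied at validation/inference to read off probabilities:

import torch
import torch.nn as nn

logits = torch.randn(4, 3)          # hypothetical final-layer outputs
target = torch.randint(0, 3, (4,))

loss = nn.CrossEntropyLoss()(logits, target)  # training: feed raw logits

probs = torch.softmax(logits, dim=1)  # inference: explicit softmax
preds = probs.argmax(dim=1)           # predicted class labels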