
I know that for a problem with multiple classes we usually use softmax, but can we also use sigmoid? I have tried to implement digit classification with a sigmoid at the output layer, and it works. What I don't understand is how it works.

bharath chandra

5 Answers


If your task is a classification problem in which the labels are mutually exclusive, i.e. each input has exactly one label, you have to use softmax. If the inputs of your classification task can have multiple labels each, your classes are not mutually exclusive and you can use a sigmoid for each output. In the former case, you should choose the output entry with the maximum value as the prediction. In the latter case, each class has an activation value from its own sigmoid, and if an activation is greater than 0.5 you can say that class is present in the input.
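
A small sketch of the two decision rules described above (plain NumPy; the logits and the 0.5 threshold are illustrative, not from the answer):

    import numpy as np

    logits = np.array([1.2, 0.3, -0.8, 2.5])  # raw scores for 4 hypothetical classes

    # Mutually exclusive labels: softmax, then pick the entry with the maximum value
    probs = np.exp(logits) / np.exp(logits).sum()
    predicted_class = int(np.argmax(probs))           # -> 3

    # Multi-label: an independent sigmoid per class, each thresholded at 0.5
    activations = 1.0 / (1.0 + np.exp(-logits))
    present_classes = np.where(activations > 0.5)[0]  # -> classes 0, 1 and 3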

Green Falcon

softmax() gives you a probability distribution, which means all the outputs sum to 1, while sigmoid() only ensures that each neuron's output value lies between 0 and 1.

In the case of digit classification with sigmoid(), you will have 10 output neurons, each with a value between 0 and 1. You can then take the biggest of them and classify the input as that digit.
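
That rule is just an argmax over the ten sigmoid activations (a sketch; the activation values below are made up):

    import numpy as np

    # Ten sigmoid outputs, one per digit 0-9 (illustrative values)
    activations = np.array([0.02, 0.10, 0.05, 0.91, 0.30,
                            0.07, 0.01, 0.44, 0.12, 0.08])
    predicted_digit = int(np.argmax(activations))  # -> 3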

Preet

@bharath chandra A softmax function will never give 3 as an output; it always outputs real values between 0 and 1. A sigmoid function also gives outputs between 0 and 1. The difference is that with the former the sum of all the outputs is always equal to 1 (because the classes are mutually exclusive), while with the latter the sum of all the outputs need not equal 1 (because the outputs are independent).

PS Nayak

For beginners: you may read this Quora answer, which explains the pros and cons of sigmoid activations and softmax probabilities (there are 6 answers at the time of writing, for inclusiveness): Sigmoid vs Softmax

Answer highlights:

  • If you look at the softmax function, the sum of all softmax units is supposed to be 1. With sigmoid this is not required.

  • In binary classification, sigmoid and a two-class softmax are equivalent, whereas in multi-class classification we use the softmax function.

  • If you're using one-hot encoded labels, then I strongly recommend using softmax.

What I noticed: to the best of my knowledge, softmax is a probability distribution over the various possible classes (multi-class) in our sample space, and all classes must be predefined before anything is passed to the softmax activation layer, usually via one-hot encoding; for example, tokenization and word stemming in NLP are used to homogenize the data first.
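
A minimal sketch of that pairing in Keras (the layer sizes, input shape, and optimizer here are illustrative assumptions, not from the answer): one-hot labels go with a softmax output and categorical cross-entropy.

    from tensorflow import keras

    num_classes = 10
    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(784,)),
        keras.layers.Dense(num_classes, activation="softmax"),  # one probability per class, summing to 1
    ])
    # categorical_crossentropy expects one-hot targets, matching the softmax output
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])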

For non-beginners: on the official Keras page, the softmax documentation is given as:

softmax

keras.activations.softmax(x, axis=-1)

Softmax activation function.

Arguments

    x: Input tensor.
    axis: Integer, axis along which the softmax normalization is applied.

Returns

Tensor, output of softmax transformation.

Raises

    ValueError: In case dim(x) == 1.

While that for sigmoid is given as:

sigmoid

keras.activations.sigmoid(x)

Sigmoid activation function.

Arguments

    x: Input tensor.

Returns

The sigmoid activation: 1 / (1 + exp(-x)). 
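
You can check both behaviors directly with these two functions (a sketch assuming TensorFlow 2.x, where they are exposed as tf.keras.activations.softmax and tf.keras.activations.sigmoid; note the extra batch dimension, since softmax raises a ValueError on a 1-D tensor):

    import tensorflow as tf

    logits = tf.constant([[2.0, 1.0, 0.1]])
    soft = tf.keras.activations.softmax(logits)  # normalizes along the last axis
    sig = tf.keras.activations.sigmoid(logits)   # squashes each entry independently

    print(soft.numpy())  # approx. [[0.659 0.242 0.099]] -- sums to 1
    print(sig.numpy())   # approx. [[0.881 0.731 0.525]] -- sums to about 2.14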
nikhil swami

If you want to classify multiple classes using logistic regression, you have to use multinomial logistic regression (softmax regression) instead.
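
A minimal sketch of that with scikit-learn (the digits dataset, solver, and train/test split are illustrative choices, not from the answer):

    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Multinomial logistic regression = a linear model with a softmax output
    clf = LogisticRegression(multi_class="multinomial", solver="lbfgs", max_iter=1000)
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))  # mean accuracy over the 10 digit classes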

jangui