An activation function is a non-linear transformation, usually applied in neural networks to the output of a linear or convolutional layer. Common activation functions: sigmoid, tanh, ReLU, etc.
Questions tagged [activation-function]
177 questions
62
votes
3 answers
LeakyReLU vs PReLU
I thought that both PReLU and Leaky ReLU are:
$$f(x) = \max(x, \alpha x) \qquad \text{ with } \alpha \in (0, 1)$$
Keras, however, has both functions in the docs.
Leaky ReLU
Source of LeakyReLU:
return K.relu(inputs, alpha=self.alpha)
Hence (see relu…
Martin Thoma
- 19,540
- 36
- 98
- 170
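A minimal sketch of the distinction asked about in the question above, assuming the tf.keras layer API (argument names vary by Keras version, e.g. alpha vs negative_slope): LeakyReLU uses a fixed, user-chosen slope for negative inputs, while PReLU treats that slope as a trainable weight updated by backpropagation.

import numpy as np
import tensorflow as tf

x = np.array([[-2.0, -1.0, 0.0, 1.0]], dtype=np.float32)

# LeakyReLU: the negative slope is a fixed hyperparameter (not trained).
leaky = tf.keras.layers.LeakyReLU(alpha=0.1)   # may be negative_slope in newer Keras
print(leaky(x).numpy())                        # [[-0.2 -0.1  0.   1. ]]

# PReLU: the slope starts at an initial value and is learned during training.
prelu = tf.keras.layers.PReLU(alpha_initializer=tf.keras.initializers.Constant(0.1))
print(prelu(x).numpy())                        # same output before any training...
print(prelu.trainable_weights)                 # ...but alpha shows up as a trainable variable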
48
votes
4 answers
Why is ReLU used as an activation function?
Activation functions are used to introduce non-linearities in the linear output of the type w * x + b in a neural network.
This I am able to understand intuitively for activation functions like sigmoid.
I understand the advantages of ReLU,…
Bunny Rabbit
- 603
- 1
- 6
- 6
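A small numeric sketch of why the non-linearity matters (plain NumPy, illustrative shapes and values only): stacking two purely linear layers collapses into a single linear map, while inserting ReLU between them does not.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)

# Two linear layers with no activation collapse into one linear layer.
h_linear = (x @ W1 + b1) @ W2 + b2
W_collapsed, b_collapsed = W1 @ W2, b1 @ W2 + b2
print(np.allclose(h_linear, x @ W_collapsed + b_collapsed))   # True

# With ReLU in between, no single (W, b) reproduces the mapping in general.
relu = lambda z: np.maximum(z, 0.0)
h_relu = relu(x @ W1 + b1) @ W2 + b2
print(np.allclose(h_relu, x @ W_collapsed + b_collapsed))      # False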
44
votes
2 answers
What is GELU activation?
I was going through the BERT paper, which uses GELU (Gaussian Error Linear Unit) and states the equation as
$$\mathrm{GELU}(x) = xP(X \le x) = x\Phi(x),$$ which in turn is approximated by
$$0.5x\left(1 + \tanh\left[\sqrt{2/\pi}\,(x + 0.044715x^3)\right]\right).$$
Could you simplify the equation…
thanatoz
- 2,495
- 4
- 20
- 41
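A short sketch (plain NumPy/SciPy, with my own helper names) comparing the exact GELU, x times the standard normal CDF, against the tanh approximation quoted in the question.

import numpy as np
from scipy.stats import norm

def gelu_exact(x):
    # GELU(x) = x * P(X <= x) = x * Phi(x), with Phi the standard normal CDF
    return x * norm.cdf(x)

def gelu_tanh(x):
    # Tanh approximation quoted in the question above
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.linspace(-4, 4, 9)
print(np.max(np.abs(gelu_exact(x) - gelu_tanh(x))))   # small (~1e-3 or less): the two agree closely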
36
votes
4 answers
How to use LeakyReLU as an activation function in a sequential DNN in Keras? When does it perform better than ReLU?
How do you use LeakyReLU as an activation function in a sequential DNN in Keras?
If I want to write something similar to:
model = Sequential()
model.add(Dense(90, activation='LeakyRelu'))
What is the solution? Do I use LeakyReLU in the same way as ReLU?
Second…
user10296606
- 1,906
- 6
- 18
- 33
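For reference, a hedged sketch of the usual workaround (assuming tf.keras; the input shape and layer sizes are just example values): since 'LeakyRelu' is not a built-in activation string in older Keras versions, LeakyReLU is added as its own layer after a Dense layer that has no activation.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LeakyReLU

model = Sequential()
model.add(tf.keras.Input(shape=(64,)))     # 64 input features is an assumed example
# No activation on the Dense layer; apply LeakyReLU as a separate layer.
model.add(Dense(90))
model.add(LeakyReLU(alpha=0.1))            # argument may be negative_slope in newer Keras
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()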
23
votes
2 answers
Why is ReLU better than the other activation functions?
Here the answer refers to the vanishing and exploding gradients that occur in sigmoid-like activation functions, but I guess ReLU has a disadvantage, namely its expected value. There is no limitation on the output of ReLU, so its expected…
Green Falcon
- 14,308
- 10
- 59
- 98
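A tiny numeric illustration of the vanishing-gradient point raised in the question above (plain NumPy): the sigmoid derivative shrinks towards 0 for large inputs, while the ReLU derivative stays 1 for any positive input.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # at most 0.25, decays towards 0 for large |x|

def relu_grad(x):
    return float(x > 0)           # 1 for positive inputs, 0 otherwise

for x in [0.0, 2.0, 5.0, 10.0]:
    print(x, sigmoid_grad(x), relu_grad(x))
# sigmoid_grad: 0.25, ~0.105, ~0.0066, ~0.000045  -> gradient vanishes
# relu_grad:    0.0,  1.0,    1.0,     1.0        -> gradient does not shrink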
22
votes
1 answer
Difference of Activation Functions in Neural Networks in general
I have studied the activation function types for neural networks. The functions themselves are quite straightforward, but the application difference is not entirely clear.
It's reasonable that one differentiates between logical and linear type…
Hendrik
- 8,767
- 17
- 43
- 55
21
votes
2 answers
Why do deep learning models still use ReLU instead of SELU as their activation function?
I am trying to understand the SELU activation function, and I was wondering why deep learning practitioners keep using ReLU, with all its issues, instead of SELU, which enables a neural network to converge faster and internally normalizes each…
Konstantinos Skoularikis
- 413
- 1
- 3
- 10
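A hedged sketch of what SELU computes (plain NumPy; the constants are the values published in the SELU paper, lambda ≈ 1.0507 and alpha ≈ 1.6733), shown next to ReLU for comparison.

import numpy as np

SELU_LAMBDA = 1.0507009873554805
SELU_ALPHA = 1.6732632423543772

def selu(x):
    # Scaled Exponential Linear Unit: lambda * x for x > 0,
    # lambda * alpha * (exp(x) - 1) for x <= 0
    return SELU_LAMBDA * np.where(x > 0, x, SELU_ALPHA * np.expm1(x))

def relu(x):
    return np.maximum(x, 0.0)

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(selu(x))
print(relu(x))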
20
votes
3 answers
How to create custom Activation functions in Keras / TensorFlow?
I'm using Keras and I want to add my own activation function myf to the TensorFlow backend. How do I define the new function and make it operational? So instead of the line of code:
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
I'll write…
Basta
- 201
- 1
- 2
- 4
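A hedged sketch of one common way to do this in tf.keras (myf is the question's placeholder name; the body below is just an illustrative swish-like function, and the input shape is an assumed example): define a plain function of tensors and pass the callable as the activation.

import tensorflow as tf
from tensorflow.keras import layers, models

def myf(x):
    # Illustrative custom activation; any tensor-in, tensor-out
    # function built from TF ops will work here.
    return x * tf.sigmoid(x)

model = models.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(64, (3, 3), activation=myf),   # pass the callable instead of 'relu'
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])
model.summary()

If the string form is needed (activation='myf'), one option is to register the function by name, e.g. via tf.keras.utils.get_custom_objects(), but passing the callable directly is the simplest route.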
14
votes
1 answer
Input normalization for ReLU?
Let's assume a vanilla MLP for classification with a given activation function for hidden layers.
I know it is a known best practice to normalize the input of the network between 0 and 1 if sigmoid is the activation function, and between -0.5 and 0.5 if tanh…
Taiko
- 243
- 1
- 2
- 6
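A small sketch of the two conventions mentioned in the question (plain NumPy, made-up data): min-max scaling to [0, 1], shifting to [-0.5, 0.5], and, as a common alternative often used with ReLU networks, standardization to zero mean and unit variance.

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(low=10.0, high=50.0, size=(100, 3))    # made-up raw features

# Min-max scaling to [0, 1] (the sigmoid convention in the question's framing)
X_01 = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Shifted to [-0.5, 0.5] (the tanh convention mentioned above)
X_centered = X_01 - 0.5

# Standardization to zero mean / unit variance, a common default for ReLU nets
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_01.min(), X_01.max())                          # 0.0 1.0
print(X_centered.min(), X_centered.max())              # -0.5 0.5
print(X_std.mean(axis=0).round(6), X_std.std(axis=0).round(6))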
13
votes
5 answers
How does Sigmoid activation work in multi-class classification problems
I know that for a problem with multiple classes we usually use softmax, but can we also use sigmoid? I have tried to implement digit classification with sigmoid at the output layer, and it works. What I don't understand is how it works.
bharath chandra
- 131
- 1
- 1
- 4
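A small numeric sketch of the difference (plain NumPy, made-up logits for a 4-class problem): softmax produces a distribution over classes that sums to 1, while per-class sigmoids produce independent scores that need not sum to 1; the predicted class can still be taken as the argmax.

import numpy as np

logits = np.array([2.0, 1.0, 0.1, -1.0])   # made-up output-layer logits

softmax = np.exp(logits) / np.exp(logits).sum()
sigmoid = 1.0 / (1.0 + np.exp(-logits))

print(softmax, softmax.sum())                   # sums to 1: a distribution over classes
print(sigmoid, sigmoid.sum())                   # independent per-class scores, sum != 1
print(np.argmax(softmax), np.argmax(sigmoid))   # argmax agrees here: class 0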
11
votes
3 answers
Why is Leaky ReLU not so common in real practice?
Since Leaky ReLU does not drive any value to 0, training always continues. And I can't think of any disadvantages it has.
Yet Leaky ReLU is less popular than ReLU in real practice. Can someone tell why this is?
Prashant Gupta
- 221
- 2
- 5
10
votes
4 answers
Activation function vs Squashing function
This may seem like a very simple and obvious question, but I haven't actually been able to find a direct answer.
Today, in a video explaining deep neural networks, I came across the term Squashing function. This is a term that I have never heard or…
Mate de Vita
- 203
- 1
- 2
- 8
10
votes
1 answer
Backpropagation: in second-order methods, would the ReLU derivative be 0? And what is its effect on training?
ReLU is an activation function defined as $h = \max(0, a)$ where $a = Wx + b$.
Normally, we train neural networks with first-order methods such as SGD, Adam, RMSprop, Adadelta, or Adagrad. Backpropagation in first-order methods requires first-order…
Rizky Luthfianto
- 2,256
- 2
- 21
- 22
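A quick piecewise check of the claim in the question (plain NumPy, finite differences away from the kink at 0): the first derivative of ReLU is a step function, so the second derivative is 0 everywhere except at 0, where it is undefined.

import numpy as np

relu = lambda a: np.maximum(a, 0.0)
relu_grad = lambda a: (a > 0).astype(float)    # 1 for a > 0, 0 for a < 0

a = np.array([-2.0, -0.5, 0.5, 2.0])           # points away from the kink at 0
eps = 1e-4

# Numerical first derivative matches the step function...
d1 = (relu(a + eps) - relu(a - eps)) / (2 * eps)
print(d1, relu_grad(a))                        # approximately [0. 0. 1. 1.] in both cases

# ...and the numerical second derivative is ~0 at every point shown
# (exactly 0 in exact arithmetic, up to floating-point noise here).
d2 = (relu(a + eps) - 2 * relu(a) + relu(a - eps)) / eps**2
print(d2)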
10
votes
2 answers
ReLU vs Leaky ReLU vs ELU with pros and cons
I am unable to understand when to use ReLU, Leaky ReLU and ELU.
How do they compare to other activation functions (like sigmoid and tanh), and what are their pros and cons?
Ayazzia01
- 113
- 1
- 1
- 6
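For reference, a compact sketch of the three functions being compared (plain NumPy; the alpha values are just common defaults, not a recommendation).

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.01):
    # small fixed slope for negative inputs instead of a hard zero
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # smooth exponential saturation towards -alpha for negative inputs
    return np.where(x > 0, x, alpha * np.expm1(x))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
for f in (relu, leaky_relu, elu):
    print(f.__name__, f(x))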
8
votes
2 answers
How does one derive the modified tanh activation proposed by LeCun?
In "Efficient Backprop" (http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf), LeCun and others propose a modified tanh activation function of the form:
$$ f(x) = 1.7159 \tanh\left(\tfrac{2}{3}x\right) $$
They argue that:
It is easier to approximate with…
Lucas Morin
- 2,775
- 5
- 25
- 47
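A quick numeric check of a property often cited for this choice of constants (plain Python; the values come from the formula quoted above): with gain 1.7159 and slope 2/3, f maps +-1 to approximately +-1, so standardized inputs keep their outputs near unit scale.

import numpy as np

f = lambda x: 1.7159 * np.tanh((2.0 / 3.0) * x)

print(f(1.0), f(-1.0))     # approximately +1.0 and -1.0: the constants pin f(+-1) to +-1

# With standardized inputs, the output spread also stays close to unit scale.
x = np.random.default_rng(0).standard_normal(100_000)
print(f(x).std())          # close to (though a bit below) 1 for standard normal inputs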