An activation function is a non-linear transformation, usually applied in neural networks to the output of a linear or convolutional layer. Common activation functions: sigmoid, tanh, ReLU, etc.
Questions tagged [activation-function]
177 questions
62
votes
3 answers
LeakyReLU vs PReLU
I thought that both PReLU and Leaky ReLU are:
$$f(x) = \max(x, \alpha x) \qquad \text{ with } \alpha \in (0, 1)$$
Keras, however, has both functions in the docs.
Leaky ReLU
Source of LeakyReLU:
return K.relu(inputs, alpha=self.alpha)
Hence (see relu…
Martin Thoma
- 19,540
- 36
- 98
- 170
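A minimal sketch of the distinction asked about in the question above, assuming the tf.keras layer API (argument names vary by Keras version, e.g. alpha vs negative_slope): LeakyReLU uses a fixed, user-chosen slope for negative inputs, while PReLU treats that slope as a trainable weight updated by backpropagation.

import numpy as np
import tensorflow as tf

x = np.array([[-2.0, -1.0, 0.0, 1.0]], dtype=np.float32)

# LeakyReLU: the negative slope is a fixed hyperparameter (not trained).
leaky = tf.keras.layers.LeakyReLU(alpha=0.1)   # may be negative_slope in newer Keras
print(leaky(x).numpy())                        # [[-0.2 -0.1  0.   1. ]]

# PReLU: the slope starts at an initial value and is learned during training.
prelu = tf.keras.layers.PReLU(alpha_initializer=tf.keras.initializers.Constant(0.1))
print(prelu(x).numpy())                        # same output before any training...
print(prelu.trainable_weights)                 # ...but alpha shows up as a trainable variable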
48
votes
4 answers
Why is ReLU used as an activation function?
Activation functions are used to introduce non-linearities in the linear output of the type w * x + b in a neural network.
This I am able to understand intuitively for activation functions like sigmoid.
I understand the advantages of ReLU,…
Bunny Rabbit
- 603
- 1
- 6
- 6
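A small numeric sketch of why the non-linearity matters (plain NumPy, illustrative shapes and values only): stacking two purely linear layers collapses into a single linear map, while inserting ReLU between them does not.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)

# Two linear layers with no activation collapse into one linear layer.
h_linear = (x @ W1 + b1) @ W2 + b2
W_collapsed, b_collapsed = W1 @ W2, b1 @ W2 + b2
print(np.allclose(h_linear, x @ W_collapsed + b_collapsed))   # True

# With ReLU in between, no single (W, b) reproduces the mapping in general.
relu = lambda z: np.maximum(z, 0.0)
h_relu = relu(x @ W1 + b1) @ W2 + b2
print(np.allclose(h_relu, x @ W_collapsed + b_collapsed))      # False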
44
votes
2 answers
What is GELU activation?
I was going through the BERT paper, which uses GELU (Gaussian Error Linear Unit) and states the equation as
$$\mathrm{GELU}(x) = xP(X \le x) = x\Phi(x),$$ which in turn is approximated by
$$0.5x\left(1 + \tanh\left[\sqrt{2/\pi}\,(x + 0.044715x^3)\right]\right).$$
Could you simplify the equation…
thanatoz
- 2,495
- 4
- 20
- 41
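A short sketch (plain NumPy/SciPy, with my own helper names) comparing the exact GELU, x times the standard normal CDF, against the tanh approximation quoted in the question.

import numpy as np
from scipy.stats import norm

def gelu_exact(x):
    # GELU(x) = x * P(X <= x) = x * Phi(x), with Phi the standard normal CDF
    return x * norm.cdf(x)

def gelu_tanh(x):
    # Tanh approximation quoted in the question above
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.linspace(-4, 4, 9)
print(np.max(np.abs(gelu_exact(x) - gelu_tanh(x))))   # small (~1e-3 or less): the two agree closely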
36
votes
4 answers
How to use LeakyReLU as an activation function in a sequential DNN in Keras? When does it perform better than ReLU?
How do you use LeakyReLU as an activation function in a sequential DNN in Keras?
If I want to write something similar to:
model = Sequential()
model.add(Dense(90, activation='LeakyRelu'))
What is the solution? Do I use LeakyReLU in the same way as ReLU?
Second…
user10296606
- 1,906
- 6
- 18
- 33
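For reference, a hedged sketch of the usual workaround (assuming tf.keras; the input shape and layer sizes are just example values): since 'LeakyRelu' is not a built-in activation string in older Keras versions, LeakyReLU is added as its own layer after a Dense layer that has no activation.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LeakyReLU

model = Sequential()
model.add(tf.keras.Input(shape=(64,)))     # 64 input features is an assumed example
# No activation on the Dense layer; apply LeakyReLU as a separate layer.
model.add(Dense(90))
model.add(LeakyReLU(alpha=0.1))            # argument may be negative_slope in newer Keras
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()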
23
votes
2 answers
Why is ReLU better than the other activation functions?
Here the answer refers to the vanishing and exploding gradients that occur in sigmoid-like activation functions, but I guess ReLU has a disadvantage, namely its expected value. There is no limitation on the output of ReLU, so its expected…
Green Falcon
- 14,308
- 10
- 59
- 98
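A tiny numeric illustration of the vanishing-gradient point raised in the question above (plain NumPy): the sigmoid derivative shrinks towards 0 for large inputs, while the ReLU derivative stays 1 for any positive input.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # at most 0.25, decays towards 0 for large |x|

def relu_grad(x):
    return float(x > 0)           # 1 for positive inputs, 0 otherwise

for x in [0.0, 2.0, 5.0, 10.0]:
    print(x, sigmoid_grad(x), relu_grad(x))
# sigmoid_grad: 0.25, ~0.105, ~0.0066, ~0.000045  -> gradient vanishes
# relu_grad:    0.0,  1.0,    1.0,     1.0        -> gradient does not shrink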
22
votes
1 answer
Difference of Activation Functions in Neural Networks in general
I have studied the activation function types for neural networks. The functions themselves are quite straightforward, but the application difference is not entirely clear.
It's reasonable that one differentiates between logical and linear type…
Hendrik
- 8,767
- 17
- 43
- 55
21
votes
2 answers
Why do deep learning models still use ReLU instead of SELU as their activation function?
I am trying to understand the SELU activation function, and I was wondering why deep learning practitioners keep using ReLU, with all its issues, instead of SELU, which enables a neural network to converge faster and internally normalizes each…
Konstantinos Skoularikis
- 413
- 1
- 3
- 10
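A hedged sketch of what SELU computes (plain NumPy; the constants are the values published in the SELU paper, lambda ≈ 1.0507 and alpha ≈ 1.6733), shown next to ReLU for comparison.

import numpy as np

SELU_LAMBDA = 1.0507009873554805
SELU_ALPHA = 1.6732632423543772

def selu(x):
    # Scaled Exponential Linear Unit: lambda * x for x > 0,
    # lambda * alpha * (exp(x) - 1) for x <= 0
    return SELU_LAMBDA * np.where(x > 0, x, SELU_ALPHA * np.expm1(x))

def relu(x):
    return np.maximum(x, 0.0)

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(selu(x))
print(relu(x))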
20
votes
3 answers
How to create custom Activation functions in Keras / TensorFlow?
I'm using Keras and I want to add my own activation function myf to the TensorFlow backend. How do I define the new function and make it operational? So instead of the line of code:
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
I'll write…
Basta
- 201
- 1
- 2
- 4
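A hedged sketch of one common way to do this in tf.keras (myf is the question's placeholder name; the body below is just an illustrative swish-like function, and the input shape is an assumed example): define a plain function of tensors and pass the callable as the activation.

import tensorflow as tf
from tensorflow.keras import layers, models

def myf(x):
    # Illustrative custom activation; any tensor-in, tensor-out
    # function built from TF ops will work here.
    return x * tf.sigmoid(x)

model = models.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(64, (3, 3), activation=myf),   # pass the callable instead of 'relu'
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])
model.summary()

If the string form is needed (activation='myf'), one option is to register the function by name, e.g. via tf.keras.utils.get_custom_objects(), but passing the callable directly is the simplest route.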
14
votes
1 answer
Input normalization for ReLU?
Let's assume a vanilla MLP for classification with a given activation function for hidden layers.
I know it is a known best practice to normalize the input of the network between 0 and 1 if sigmoid is the activation function, and between -0.5 and 0.5 if tanh…
Taiko
- 243
- 1
- 2
- 6
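A small sketch of the two conventions mentioned in the question (plain NumPy, made-up data): min-max scaling to [0, 1], shifting to [-0.5, 0.5], and, as a common alternative often used with ReLU networks, standardization to zero mean and unit variance.

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(low=10.0, high=50.0, size=(100, 3))    # made-up raw features

# Min-max scaling to [0, 1] (the sigmoid convention in the question's framing)
X_01 = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Shifted to [-0.5, 0.5] (the tanh convention mentioned above)
X_centered = X_01 - 0.5

# Standardization to zero mean / unit variance, a common default for ReLU nets
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_01.min(), X_01.max())                          # 0.0 1.0
print(X_centered.min(), X_centered.max())              # -0.5 0.5
print(X_std.mean(axis=0).round(6), X_std.std(axis=0).round(6))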
13
votes
5 answers
How does Sigmoid activation work in multi-class classification problems
I know that for a problem with multiple classes we usually use softmax, but can we also use sigmoid? I have tried to implement digit classification with sigmoid at the output layer, and it works. What I don't understand is how it works.
bharath chandra
- 131
- 1
- 1
- 4
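A small numeric sketch of the difference (plain NumPy, made-up logits for a 4-class problem): softmax produces a distribution over classes that sums to 1, while per-class sigmoids produce independent scores that need not sum to 1; the predicted class can still be taken as the argmax.

import numpy as np

logits = np.array([2.0, 1.0, 0.1, -1.0])   # made-up output-layer logits

softmax = np.exp(logits) / np.exp(logits).sum()
sigmoid = 1.0 / (1.0 + np.exp(-logits))

print(softmax, softmax.sum())                   # sums to 1: a distribution over classes
print(sigmoid, sigmoid.sum())                   # independent per-class scores, sum != 1
print(np.argmax(softmax), np.argmax(sigmoid))   # argmax agrees here: class 0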
11
votes
3 answers
Why is Leaky ReLU not so common in real practice?
Since Leaky ReLU does not drive any value to 0, training always continues. And I can't think of any disadvantages it has.
Yet Leaky ReLU is less popular than ReLU in real practice. Can someone tell why this is?
Prashant Gupta
- 221
- 2
- 5
10
votes
4 answers
Activation function vs Squashing function
This may seem like a very simple and obvious question, but I haven't actually been able to find a direct answer.
Today, in a video explaining deep neural networks, I came across the term Squashing function. This is a term that I have never heard or…
Mate de Vita
- 203
- 1
- 2
- 8
10
votes
1 answer
Backpropagation: in second-order methods, would the ReLU derivative be 0? And what is its effect on training?
ReLU is an activation function defined as $h = \max(0, a)$ where $a = Wx + b$.
Normally, we train neural networks with first-order methods such as SGD, Adam, RMSprop, Adadelta, or Adagrad. Backpropagation in first-order methods requires first-order…
Rizky Luthfianto
- 2,256
- 2
- 21
- 22
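A quick piecewise check of the claim in the question (plain NumPy, finite differences away from the kink at 0): the first derivative of ReLU is a step function, so the second derivative is 0 everywhere except at 0, where it is undefined.

import numpy as np

relu = lambda a: np.maximum(a, 0.0)
relu_grad = lambda a: (a > 0).astype(float)    # 1 for a > 0, 0 for a < 0

a = np.array([-2.0, -0.5, 0.5, 2.0])           # points away from the kink at 0
eps = 1e-4

# Numerical first derivative matches the step function...
d1 = (relu(a + eps) - relu(a - eps)) / (2 * eps)
print(d1, relu_grad(a))                        # approximately [0. 0. 1. 1.] in both cases

# ...and the numerical second derivative is ~0 at every point shown
# (exactly 0 in exact arithmetic, up to floating-point noise here).
d2 = (relu(a + eps) - 2 * relu(a) + relu(a - eps)) / eps**2
print(d2)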
10
votes
2 answers
ReLU vs Leaky ReLU vs ELU with pros and cons
I am unable to understand when to use ReLU, Leaky ReLU and ELU.
How do they compare to other activation functions (like sigmoid and tanh), and what are their pros and cons?
Ayazzia01
- 113
- 1
- 1
- 6
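For reference, a compact sketch of the three functions being compared (plain NumPy; the alpha values are just common defaults, not a recommendation).

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.01):
    # small fixed slope for negative inputs instead of a hard zero
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # smooth exponential saturation towards -alpha for negative inputs
    return np.where(x > 0, x, alpha * np.expm1(x))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
for f in (relu, leaky_relu, elu):
    print(f.__name__, f(x))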
8
votes
2 answers
How does one derive the modified tanh activation proposed by LeCun?
In "Efficient Backprop" (http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf), LeCun and others propose a modified tanh activation function of the form:
$$ f(x) = 1.7159 \tanh\left(\tfrac{2}{3}x\right) $$
They argue that:
It is easier to approximate with…
Lucas Morin
- 2,775
- 5
- 25
- 47
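A quick numeric check of a property often cited for this choice of constants (plain Python; the values come from the formula quoted above): with gain 1.7159 and slope 2/3, f maps +-1 to approximately +-1, so standardized inputs keep their outputs near unit scale.

import numpy as np

f = lambda x: 1.7159 * np.tanh((2.0 / 3.0) * x)

print(f(1.0), f(-1.0))     # approximately +1.0 and -1.0: the constants pin f(+-1) to +-1

# With standardized inputs, the output spread also stays close to unit scale.
x = np.random.default_rng(0).standard_normal(100_000)
print(f(x).std())          # close to (though a bit below) 1 for standard normal inputs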