Questions tagged [gradient]
37 questions
15
votes
4 answers
How to prevent vanishing gradient or exploding gradient?
What causes vanishing or exploding gradients, and what measures can be taken to prevent them?
yashdk
- 189
- 1
- 1
- 4
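A hedged PyTorch sketch of two standard mitigations (network, data, and hyperparameters are assumed for illustration): ReLU with He initialization against vanishing gradients, and gradient-norm clipping against exploding gradients.

```python
import torch
import torch.nn as nn

# Toy network: ReLU avoids sigmoid/tanh saturation, a common vanishing-gradient cause.
model = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

# He (Kaiming) initialization keeps activation variance stable across ReLU layers.
for m in model.modules():
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        nn.init.zeros_(m.bias)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))

loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
# Clip the global gradient norm before the update to tame exploding gradients.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```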
4
votes
3 answers
Forward pass vs backward pass vs backpropagation
As mentioned in the question, I have some trouble understanding the differences between these terms.
From what I have understood:
Forward pass: compute the output of the network given the input data
Backward pass: compute the output error…
Mattia Surricchio
- 421
- 3
- 5
- 15
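A minimal PyTorch sketch (toy model and data assumed) that separates the three steps the question asks about: the forward pass producing the output, the backward pass backpropagating the loss into gradients, and the parameter update that uses them.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
x, target = torch.randn(8, 4), torch.randn(8, 1)

output = model(x)                       # forward pass: input -> prediction
loss = nn.functional.mse_loss(output, target)

model.zero_grad()
loss.backward()                         # backward pass: backpropagation fills .grad
with torch.no_grad():
    for p in model.parameters():
        p -= 0.01 * p.grad              # update step using the backpropagated gradients
```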
4
votes
1 answer
Differentiable approximation for counting negative values in array
I have an array of arrival times and I want to convert it to count data using PyTorch in a differentiable way.
Example arrival times:
arrival_times = [2.1, 2.9, 5.1]
and let's say the total range is 6 seconds. What I want to have is:
counts = [0,…
iRestMyCaseYourHonor
- 159
- 5
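A hedged sketch of one common workaround: replace the hard comparison "arrival ≤ t" with a sigmoid, so the counts stay differentiable with respect to the arrival times (the temperature value below is an assumed hyperparameter, not from the question).

```python
import torch

arrival_times = torch.tensor([2.1, 2.9, 5.1], requires_grad=True)
t_grid = torch.arange(1, 7, dtype=torch.float32)   # the 6-second range: t = 1..6
temperature = 0.05

# soft_counts[k] ~ number of arrivals with time <= t_grid[k]
soft_counts = torch.sigmoid(
    (t_grid[:, None] - arrival_times[None, :]) / temperature
).sum(dim=1)

soft_counts.sum().backward()    # gradients flow back into arrival_times
print(soft_counts)              # approaches the hard counts [0, 0, 2, 2, 2, 3] as temperature -> 0
print(arrival_times.grad)
```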
3
votes
1 answer
How does a batch normalization layer resolve the vanishing gradient problem?
According to this article:
https://towardsdatascience.com/the-vanishing-gradient-problem-69bf08b15484
The vanishing gradient problem occurs when using the sigmoid activation function because sigmoid maps a large input space into a small output range, so the…
user3668129
- 769
- 4
- 15
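A minimal PyTorch sketch (architecture assumed) of the usual intuition behind that claim: batch normalization keeps each layer's pre-activations roughly zero-mean and unit-variance, so a following sigmoid operates near its steep central region rather than its flat, gradient-killing tails.

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(32, 64),
    nn.BatchNorm1d(64),    # normalizes pre-activations before the sigmoid
    nn.Sigmoid(),
    nn.Linear(64, 1),
)

x = torch.randn(16, 32) * 10             # deliberately large-scale inputs
h = net[1](net[0](x))                     # pre-activations after BatchNorm
print(h.mean().item(), h.std().item())    # close to 0 and 1, so sigmoid'(h) is not tiny
```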
3
votes
3 answers
Why is the sign of the gradient (plus or minus) not enough for finding the steepest ascent?
Consider a simple 1-D function $y = x^2$, on which we want to find a maximum with the gradient ascent method.
If we start at the point $x = 3$ on the x-axis:
$$ \frac{\partial f}{\partial x} \biggr\rvert_{x=3} = 2x \biggr\rvert_{x=3} = 6 $$
This means that a direction in which…
Kenenbek Arzymatov
- 189
- 6
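A hedged two-dimensional illustration (not from the original question) of why the component magnitudes matter and not only the signs: take $f(x_1, x_2) = x_1^2 + 10x_2^2$ at the point $(3, 1)$, so
$$ \nabla f(3,1) = (2x_1,\ 20x_2)\big|_{(3,1)} = (6,\ 20). $$
Moving along the sign-only direction $(1,1)/\sqrt{2}$ gives a directional derivative of $(6+20)/\sqrt{2} \approx 18.4$, while moving along the normalized gradient gives $\sqrt{6^2+20^2} \approx 20.9$; only the full gradient, with its component magnitudes, picks out the steepest ascent direction.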
3
votes
1 answer
Gradient Checking: Mean Squared Error. Why does a huge epsilon improve the discrepancy?
I am using custom C++ code, and have coded a simple "Mean Squared Error" layer.
I'm temporarily using it for a classification task, not simple regression... maybe this causes the issues?
I don't have anything else before this layer - not even a simple…
Kari
- 2,756
- 2
- 21
- 51
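The question is about custom C++ code; the following is only a hedged NumPy sketch of the standard central-difference check for a mean-squared-error layer, showing how the discrepancy is usually measured for a given epsilon.

```python
import numpy as np

def mse(w, x, t):
    return np.mean((x @ w - t) ** 2)

rng = np.random.default_rng(0)
x, t = rng.normal(size=(5, 3)), rng.normal(size=5)
w = rng.normal(size=3)

analytic = 2.0 / len(t) * x.T @ (x @ w - t)      # closed-form dMSE/dw

eps = 1e-4                                        # typical choice; 1e-2 is much coarser
numeric = np.zeros_like(w)
for i in range(len(w)):
    e = np.zeros_like(w)
    e[i] = eps
    numeric[i] = (mse(w + e, x, t) - mse(w - e, x, t)) / (2 * eps)

# Relative discrepancy between analytic and numeric gradients.
rel = np.linalg.norm(analytic - numeric) / (np.linalg.norm(analytic) + np.linalg.norm(numeric))
print(rel)    # ~1e-8 or smaller for a correct gradient at this epsilon
```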
2
votes
1 answer
Why does my manual derivative of Layer Normalization imply no gradient flow?
I recently tried computing the derivative of the layer norm function (https://arxiv.org/abs/1607.06450), an essential component of transformers, but the result suggests that no gradient flows through the operation, which can't be true.
Here's my…
Alex
- 23
- 5
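A hedged PyTorch check (no affine parameters assumed) comparing the usual closed-form layer-norm backward formula against autograd; the gradient is not zero, it merely sums to zero across the normalized dimension, which is easy to misread as "no gradient flow".

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 8, requires_grad=True)
eps = 1e-5

mu = x.mean(dim=-1, keepdim=True)
var = x.var(dim=-1, unbiased=False, keepdim=True)
xhat = (x - mu) / torch.sqrt(var + eps)

g = torch.randn_like(xhat)      # some upstream gradient dL/dxhat
xhat.backward(g)

# Manual backward: dL/dx = (g - mean(g) - xhat * mean(g * xhat)) / sqrt(var + eps)
manual = (g - g.mean(dim=-1, keepdim=True)
            - xhat * (g * xhat).mean(dim=-1, keepdim=True)) / torch.sqrt(var + eps)

print(torch.allclose(x.grad, manual, atol=1e-5))   # True
print(x.grad.sum(dim=-1))                          # ~0 per row, but not zero elementwise
```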
2
votes
1 answer
Gradient passthrough in PyTorch
I need to quantize the inputs, but the method I need to do so (bucketize) is not differentiable. I can of course detach the tensor, but then I lose the flow of gradients to earlier weights. I guess the question is quite simple: how do you continue…
user3023715
- 203
- 2
- 5
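A hedged sketch of the common straight-through workaround: use the quantized values in the forward pass, but let the backward pass treat the operation as the identity so gradients keep flowing to earlier weights.

```python
import torch

boundaries = torch.tensor([0.25, 0.5, 0.75])
x = torch.rand(10, requires_grad=True)

hard = torch.bucketize(x, boundaries).float()   # non-differentiable bucket indices
# Forward value equals `hard`; backward sees only the identity on `x`.
quantized = x + (hard - x).detach()

quantized.sum().backward()
print(x.grad)    # all ones: the gradient passed straight through to x
```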
2
votes
1 answer
How to choose appropriate epsilon value while approximating gradients to check training?
While approximating gradients, using the actual epsilon to shift the weights results in wildly big gradient approximations, as the "width" of the approximation triangle is disproportionately small. In Andrew Ng's course he uses 0.01, but I…
Dávid Tóth
- 145
- 5
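A hedged one-dimensional NumPy sketch (toy function assumed) of why epsilon has a sweet spot: too large and the finite-difference truncation error dominates, too small and floating-point round-off takes over.

```python
import numpy as np

f, df = np.sin, np.cos          # toy function with a known derivative
x0 = 1.0
for eps in [1e-1, 1e-2, 1e-4, 1e-7, 1e-12]:
    approx = (f(x0 + eps) - f(x0 - eps)) / (2 * eps)
    print(f"eps={eps:g}  abs error={abs(approx - df(x0)):.2e}")
# The error shrinks down to roughly eps ~ 1e-5, then grows again as
# cancellation in float64 dominates the central difference.
```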
2
votes
1 answer
Tensorflow.Keras: How to get the gradient for an output class w.r.t. a given input?
I have implemented and trained a sequential model using tf.keras. Say I am given an input array of size 8x8 and an output [0,1,0,...(rest all 0)].
How do I calculate the gradient w.r.t. the input for the given output?
model = ...
output =…
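A hedged TensorFlow sketch (model and shapes assumed to match the 8x8 input and 10-class output described above) of getting the gradient of one output class score with respect to the input tensor.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(8, 8)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

x = tf.random.normal((1, 8, 8))
target_class = 1                        # the "1" position in [0, 1, 0, ...]

with tf.GradientTape() as tape:
    tape.watch(x)                       # x is a plain tensor, not a tf.Variable
    probs = model(x)
    class_score = probs[:, target_class]

grad = tape.gradient(class_score, x)    # d(class score) / d(input), shape (1, 8, 8)
print(grad.shape)
```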
2
votes
1 answer
Vanishing Gradient vs Exploding Gradient as Activation function?
ReLU is used as an activation function that serves two purposes:
Breaking linearity in a DNN (i.e. introducing non-linearity).
Helping to handle the vanishing gradient problem.
For the exploding gradient problem, we use the gradient clipping approach, where we set a max threshold limit of…
vipin bansal
- 1,282
- 11
- 19
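A hedged PyTorch sketch of the clipping step the excerpt refers to, in its two common variants; note that neither is an activation function, both act on gradients after `backward()`.

```python
import torch
import torch.nn as nn

layer = nn.Linear(10, 10)
loss = (layer(torch.randn(4, 10)) ** 2).sum() * 1e3   # exaggerated loss scale
loss.backward()

# Option 1: clip every gradient element to [-0.5, 0.5].
torch.nn.utils.clip_grad_value_(layer.parameters(), clip_value=0.5)
# Option 2: rescale so the global gradient norm is at most 1.0.
torch.nn.utils.clip_grad_norm_(layer.parameters(), max_norm=1.0)

print(torch.cat([p.grad.flatten() for p in layer.parameters()]).norm())  # <= 1.0
```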
2
votes
1 answer
What does it mean for a method to be invariant to diagonal rescaling of the gradients?
In the paper which describes Adam: a method for stochastic optimization, the author states:
The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the…
dhulmul
- 121
- 2
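A hedged sketch of the usual reading of that claim (with Adam's $\epsilon$ ignored and all operations elementwise): if every gradient is rescaled by a fixed diagonal matrix $D = \operatorname{diag}(d_1,\dots,d_n)$ with $d_i > 0$, then the bias-corrected first moment becomes $\hat m_t \to D \hat m_t$ and the second moment $\hat v_t \to D^2 \hat v_t$, so the update
$$ \Delta\theta_t = -\alpha\,\frac{\hat m_t}{\sqrt{\hat v_t}} \;\longrightarrow\; -\alpha\,\frac{D\hat m_t}{\sqrt{D^2\hat v_t}} = -\alpha\,\frac{\hat m_t}{\sqrt{\hat v_t}}, $$
i.e. the step sizes do not change, which is what "invariant to diagonal rescaling of the gradients" refers to.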
1
vote
0 answers
Which Neural Network or Gradient Boosting framework is the simplest for Custom Loss Functions?
I need to implement a custom loss function.
The function is relatively simple:
$$-\sum \limits_{i=1}^m [O_{1,i} \cdot y_i-1] \ \cdot \ \operatorname{ReLu}(O_{1,i} \cdot \hat{y_i} - 1)$$
With $O$ being some external attribute specific to each case.
I…
Borut Flis
- 199
- 3
- 7
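A hedged PyTorch sketch of that loss (variable names are assumptions read off the formula: `O` the external per-case attribute, `y` the target, `y_hat` the model output); any framework with eager autograd makes such a custom loss straightforward.

```python
import torch
import torch.nn.functional as F

def custom_loss(y_hat, y, O):
    # -sum_i (O_i * y_i - 1) * relu(O_i * y_hat_i - 1)
    return -torch.sum((O * y - 1) * F.relu(O * y_hat - 1))

# Tiny usage example with made-up values.
O = torch.tensor([1.5, 0.8, 2.0])
y = torch.tensor([1.0, 0.0, 1.0])
y_hat = torch.tensor([0.9, 0.2, 0.7], requires_grad=True)

loss = custom_loss(y_hat, y, O)
loss.backward()                  # gradients are nonzero wherever O * y_hat > 1
print(loss.item(), y_hat.grad)
```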
1
vote
1 answer
Vanishing gradient and zero gradient
There is a well-known problem of vanishing gradients in backpropagation training of feedforward neural networks (FNNs) (here we don't consider the vanishing gradient of recurrent neural networks).
I don't understand why a vanishing gradient does not mean the…
user6703592
- 127
- 5
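A hedged PyTorch sketch of the distinction the question is after: in a deep sigmoid network the early-layer gradients can be tiny (vanishing) while the loss is still far from any minimum, so a near-zero gradient in those layers does not mean training has reached an optimum.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layers = []
for _ in range(20):
    layers += [nn.Linear(32, 32), nn.Sigmoid()]
net = nn.Sequential(*layers, nn.Linear(32, 1))

x, t = torch.randn(64, 32), torch.randn(64, 1)
loss = nn.functional.mse_loss(net(x), t)
loss.backward()

first = net[0].weight.grad.abs().mean().item()
last = net[-1].weight.grad.abs().mean().item()
print(f"loss={loss.item():.3f}  first-layer grad={first:.2e}  last-layer grad={last:.2e}")
# The first-layer gradient is typically orders of magnitude smaller than the
# last-layer one, even though the loss itself is nowhere near zero.
```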
1
vote
1 answer
Can mini-batch gradient descent outperform batch gradient descent?
As I was going through the second course of Andrew Ng's deep learning specialization, I came across a sentence that said:
With a well-tuned mini-batch size, usually it outperforms either gradient descent or stochastic gradient descent…
mitra mirshafiee
- 153
- 3