Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding a minimum of a function. It iteratively calculates the gradient (the vector of partial derivatives) of the function and descends in steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Gradient descent is a first-order iterative optimization algorithm used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function.

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.

https://en.wikipedia.org/wiki/Gradient_descent
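
As a concrete illustration of the update rule described above, here is a minimal sketch of gradient descent on a least-squares cost; the toy data, learning rate, and step count are illustrative choices.

```python
import numpy as np

def gradient(w, X, y):
    # Vector of partial derivatives of the mean squared error with respect to w
    return 2 * X.T @ (X @ w - y) / len(y)

def gradient_descent(X, y, lr=0.1, n_steps=100):
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        w -= lr * gradient(w, X, y)   # step proportional to the negative gradient
    return w

# Toy data generated from y = 3*x0 - 2*x1 plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 0.01 * rng.normal(size=200)
print(gradient_descent(X, y))   # should end up close to [3, -2]
```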

446 questions
79 votes · 7 answers

What is the difference between Gradient Descent and Stochastic Gradient Descent?

What is the difference between Gradient Descent and Stochastic Gradient Descent? I am not very familiar with these; can you describe the difference with a short example?
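
A minimal sketch of the contrast the question asks about, using a toy linear-model gradient: batch gradient descent averages the per-example gradients and takes one step per pass over the data, while stochastic gradient descent takes a (noisier) step after every example.

```python
import numpy as np

def grad_i(w, x, t):
    # Illustrative per-example gradient: squared error of a linear model w · x
    return 2 * (w @ x - t) * x

def batch_gd(w, X, y, lr=0.1, epochs=50):
    # One update per pass over the data, using the averaged gradient
    for _ in range(epochs):
        g = np.mean([grad_i(w, x, t) for x, t in zip(X, y)], axis=0)
        w = w - lr * g
    return w

def sgd(w, X, y, lr=0.05, epochs=50, seed=0):
    # One (noisier) update per example, visiting examples in a shuffled order;
    # a smaller step size compensates for the extra noise
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            w = w - lr * grad_i(w, X[i], y[i])
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.5, -0.5])
print(batch_gd(np.zeros(2), X, y))   # both should approach [1.5, -0.5]
print(sgd(np.zeros(2), X, y))
```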
47 votes · 5 answers

Does gradient descent always converge to an optimum?

I am wondering whether there is any scenario in which gradient descent does not converge to a minimum. I am aware that gradient descent is not always guaranteed to converge to a global optimum. I am also aware that it might diverge from an optimum…
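
One failure mode the question allows for can be seen on f(x) = x²: with a step size that is too large, the iterates oscillate with growing magnitude instead of converging (a toy sketch; the step sizes are illustrative).

```python
def descend(x, lr, steps=20):
    # Gradient descent on f(x) = x**2, whose gradient is 2*x
    for _ in range(steps):
        x = x - lr * 2 * x
    return x

print(descend(1.0, lr=0.1))   # shrinks toward the minimum at 0
print(descend(1.0, lr=1.1))   # |1 - 2*1.1| = 1.2 > 1, so the iterate blows up
```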
28 votes · 4 answers

Scikit-learn: Getting SGDClassifier to predict as well as a Logistic Regression

A way to train a Logistic Regression is by using stochastic gradient descent, which scikit-learn offers an interface to. What I would like to do is take scikit-learn's SGDClassifier and have it score the same as a Logistic Regression here…
hlin117 • 685
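
A minimal sketch of the kind of side-by-side comparison the question describes, with illustrative hyperparameters (matching scores in general also requires tuning the learning-rate schedule and regularization):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

log_reg = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Same model family, fitted by stochastic gradient descent.
# On older scikit-learn versions the loss is spelled 'log' instead of 'log_loss'.
sgd = SGDClassifier(loss="log_loss", max_iter=1000, tol=1e-3, random_state=0).fit(X_tr, y_tr)

print("LogisticRegression:", log_reg.score(X_te, y_te))
print("SGDClassifier     :", sgd.score(X_te, y_te))
```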
24 votes · 1 answer

How does Gradient Descent and Backpropagation work together?

Please forgive me as I am new to this. I have attached a diagram trying to model my understanding of neural networks and back-propagation. From videos on Coursera and resources online I formed the following understanding of how neural network…
23 votes · 2 answers

Why is ReLU better than the other activation functions?

The answer here refers to the vanishing and exploding gradients seen with sigmoid-like activation functions, but I guess ReLU has a disadvantage of its own: its expected value. There is no limit on the output of ReLU, so its expected…
15 votes · 4 answers

How to prevent vanishing gradient or exploding gradient?

What causes vanishing or exploding gradients, and what measures can be taken to prevent them?
yashdk • 189
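
One widely used measure against exploding gradients is gradient clipping; below is a minimal, framework-agnostic sketch of clipping by global norm (the threshold is an illustrative value).

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    # Rescale the whole list of gradient arrays if their combined L2 norm
    # exceeds max_norm; this bounds the size of a single update step.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

grads = [np.array([30.0, -40.0]), np.array([120.0])]
print(clip_by_global_norm(grads))  # rescaled so the global norm is 5
```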
15 votes · 4 answers

Is Gradient Descent central to every optimizer?

I want to know whether gradient descent is the main algorithm underlying optimizers like Adam, Adagrad, RMSProp, and several others.
15 votes · 2 answers

Why averaging the gradient works in Gradient Descent?

In full-batch gradient descent or minibatch GD we get gradients from several training examples. We then average them to obtain a "high-quality" gradient from several estimates and finally use it to correct the network at once. But why…
Kari • 2,756
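
A small numerical sketch of the intuition behind the question: averaging per-example gradients over a larger minibatch yields an estimate that is, on average, closer to the full-batch gradient (toy linear-regression setup; the batch sizes are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 2))
y = X @ np.array([2.0, -1.0])
w = np.zeros(2)

def grad(w, Xb, yb):
    # Mean gradient of the squared error over the examples in the (mini)batch
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

full = grad(w, X, y)                      # the full-batch gradient
for batch_size in (1, 32, 1024):
    errs = []
    for _ in range(200):
        idx = rng.choice(len(y), size=batch_size, replace=False)
        errs.append(np.linalg.norm(grad(w, X[idx], y[idx]) - full))
    print(batch_size, np.mean(errs))      # larger batches drift less from the full gradient
```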
12 votes · 1 answer

What feature engineering is necessary with tree based algorithms?

I understand data hygiene, which is probably the most basic feature engineering. That is, making sure all your data is properly loaded, making sure N/As are treated as a special value rather than a number between -1 and 1, and tagging your…
12 votes · 4 answers

In training a neural network, why don’t we take the derivative with respect to the step size in gradient descent?

This is one of those questions where I know I am wrong, but I don't know how. I understand that when training a neural network, we calculate the derivatives of the loss function with respect to the parameters. I also understand that these…
Leo Juhlin • 123
11 votes · 2 answers

Why is learning rate causing my neural network's weights to skyrocket?

I am using TensorFlow to write simple neural networks for a bit of research, and I have had many problems with 'nan' weights while training. I tried many different solutions like changing the optimizer, changing the loss, the data size, etc., but with…
10 votes · 4 answers

What is momentum in neural network?

While using "Two-Class Neural Network" in Azure ML, I encountered the "Momentum" property. The documentation, which is not clear, says: "For The momentum, type a value to apply during learning as a weight on nodes from previous…"
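
Independently of the Azure ML documentation quoted above, the classical momentum update keeps a running velocity of past gradients and moves the weights along it; a minimal sketch with illustrative coefficients.

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.01, momentum=0.9):
    # Classical momentum: the velocity accumulates a decaying sum of past
    # gradients, so consistent directions speed up and oscillations damp out.
    v = momentum * v - lr * grad
    return w + v, v

w = np.array([5.0])
v = np.zeros(1)
for _ in range(200):
    g = 2 * w                 # gradient of f(w) = w**2
    w, v = momentum_step(w, v, g)
print(w)                      # approaches the minimum at 0
```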
10 votes · 1 answer

How flexible is the link between objective function and output layer activation function?

It seems standard in many neural network packages to pair up the objective function to be minimised with the activation function in the output layer. For instance, for a linear output layer used for regression it is standard (and often the only choice)…
Neil Slater • 29,388
10 votes · 1 answer

What is the difference between the SGD classifier and Logistic regression?

To my understanding, the SGD classifier and Logistic regression seem similar. An SGD classifier with loss = 'log' implements Logistic regression, and loss = 'hinge' implements a linear SVM. I also understand that logistic regression uses gradient…
10 votes · 3 answers

Why do we use gradients instead of residuals in Gradient Boosting?

I have found mentions of two advantages of using gradients instead of actual residuals: 1) Using gradients will allow us to plug in any loss function (not just MSE) without having to change our base learners to make them compatible with the loss…
eyio • 101
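
A short check of the connection the question is about, assuming a squared-error loss L = 0.5·(y − F)²: the negative gradient of L with respect to the current prediction F is exactly the residual y − F, which is why fitting base learners to negative gradients generalizes the "fit the residuals" view to other losses.

```python
import numpy as np

y = np.array([3.0, -1.0, 2.5])
F = np.array([2.0,  0.5, 2.0])   # current ensemble predictions

# Squared-error loss L = 0.5 * (y - F)**2, elementwise
residual = y - F
neg_gradient = -(F - y)          # -dL/dF = -(F - y) = y - F

print(residual)                  # [ 1.  -1.5  0.5]
print(neg_gradient)              # identical: for this loss the negative gradient is the residual
```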