Questions tagged [gradient-descent]

47 questions
7
votes
0 answers

Algorithms for curve construction

I am interested in algorithms that construct continuous curves between two points in such a way as to minimize an energy functional of the curve. What sorts of algorithms are most commonly used for such tasks? More formally, given two points $a$ and $b$, and…
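One straightforward family of methods (a sketch under stated assumptions, not a survey answer): discretize the curve into points and run gradient descent on a discrete energy. Here the energy is an illustrative Dirichlet-type $E = \sum_k \|x_{k+1} - x_k\|^2$, whose minimizer with fixed endpoints is the straight segment; the step size and resolution are arbitrary choices.

    import numpy as np

    def descend_curve(a, b, n=50, steps=2000, lr=0.1):
        # Straight-line initialization between the fixed endpoints a and b.
        ts = np.linspace(0.0, 1.0, n + 2)[:, None]
        x = (1 - ts) * a + ts * b                # shape (n+2, d)
        for _ in range(steps):
            # Gradient of sum_k ||x_{k+1} - x_k||^2 at the interior points:
            # a discrete Laplacian, 2 * (2 x_k - x_{k-1} - x_{k+1}).
            grad = 2 * (2 * x[1:-1] - x[:-2] - x[2:])
            x[1:-1] -= lr * grad
        return x

    curve = descend_curve(np.array([0.0, 0.0]), np.array([1.0, 1.0]))

Swapping in a different functional only changes the gradient line (e.g. adding a potential term $\sum_k V(x_k)$ adds its gradient at each interior point).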
4
votes
1 answer

How to show that cross entropy is minimized?

This question is taken from the book Neural Networks and Deep Learning by Michael Nielsen. The question: for a single neuron, it is argued that the cross-entropy is small if σ(z) ≈ y for all training inputs. The argument relied on y being equal to either…
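A one-line check of the claim (standard in that chapter of Nielsen's book): for a single sigmoid neuron with output $a = \sigma(z)$ and binary target $y$, the cross-entropy cost is $$C = -\big[\, y \ln a + (1-y) \ln(1-a) \,\big], \qquad \frac{\partial C}{\partial a} = \frac{a - y}{a(1-a)},$$ so the derivative vanishes exactly at $a = y$, and for $y \in \{0, 1\}$ the cost itself goes to $0$ as $a \to y$ (e.g. $y = 1$ gives $C = -\ln a \to 0$ as $a \to 1$).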
4
votes
1 answer

Why do we use the log in gradient-based reinforcement algorithms?

I've been reading some papers on reinforcement learning. $$\Delta w = \frac{\partial \ln p_w}{\partial w} r$$ I often see expressions similar to the one above, where the weights (denoted by $w$) are updated following the partial derivative of the…
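For context, the $\ln$ comes from the likelihood-ratio (score-function) identity $\nabla_w p_w = p_w \nabla_w \ln p_w$, which gives $$\nabla_w\, \mathbb{E}_{a \sim p_w}[r(a)] = \mathbb{E}_{a \sim p_w}\big[ r(a)\, \nabla_w \ln p_w(a) \big],$$ so sampling actions from $p_w$ and weighting $\nabla_w \ln p_w$ by the reward yields an unbiased estimate of the gradient of the expected reward.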
4
votes
1 answer

MDS minimization with gradient descent

I have the following multiple dimensional scaling (MDS) minimization problem in vectors $v_1, v_2, \dots, v_n \in \mathbb R^2$ $$\min_{v_1, v_2, \dots, v_n} \sum_{i,j} \left( \|v_i - v_j\| - d_{i,j} \right)^2$$ which I wish to solve numerically…
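A minimal numpy sketch of plain gradient descent on this stress function (assuming the sum runs over all ordered pairs and $d_{i,i} = 0$; the step size and iteration count are illustrative):

    import numpy as np

    def stress_grad(V, D):
        # V: (n, 2) current positions; D: (n, n) target distances.
        diff = V[:, None, :] - V[None, :, :]     # v_i - v_j, shape (n, n, 2)
        dist = np.linalg.norm(diff, axis=-1)     # ||v_i - v_j||, shape (n, n)
        r = dist - D                             # residuals
        np.fill_diagonal(dist, 1.0)              # avoid 0/0 on the diagonal
        # d/dv_i of the ordered-pair sum: sum_j 4 r_ij (v_i - v_j) / ||v_i - v_j||
        return 4.0 * np.sum((r / dist)[..., None] * diff, axis=1)

    def mds(D, n_iter=500, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        V = rng.standard_normal((D.shape[0], 2))  # random 2-D start
        for _ in range(n_iter):
            V -= lr * stress_grad(V, D)
        return V

Note the objective is non-convex (and non-smooth where points coincide), so random restarts or a SMACOF-style majorization scheme are often used in practice.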
4
votes
1 answer

Why update weights and biases after training a neural network on the whole set of training samples

I am reading the book Neural Networks and Deep Learning by Michael Nielsen. In the second chapter of his book, he describes the following algorithm for updating weights and biases for a neural network: In the second step, the algorithm computes the…
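Illustrative pseudocode of the pattern that chapter describes (grad_cost, a per-example gradient routine, is a hypothetical name): gradients are accumulated over every training example first, and only then is a single update applied with their average.

    def batch_update(w, b, data, eta, grad_cost):
        gw_sum, gb_sum = 0.0, 0.0
        for x, y in data:                    # step 2: per-example gradients
            gw, gb = grad_cost(w, b, x, y)
            gw_sum += gw
            gb_sum += gb
        n = len(data)
        # step 3: one step with the average gradient over the whole set
        return w - eta * gw_sum / n, b - eta * gb_sum / n

Updating per example instead (stochastic gradient descent) follows the same arithmetic with n = 1.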
3
votes
1 answer

Is it possible to solve the Mountain Car reinforcement learning task with linear Q-Learning using the state as direct input?

I'm trying to solve the Mountain Car task on OpenAI Gym (reach the top in 110 steps or less, having a maximum of 200 steps per episode) using linear Q-learning (the algorithm in figure 11.16, except using maxQ at s' instead of the actual a', as…
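A minimal sketch of that update rule (phi, alpha, gamma and the per-action weight layout are illustrative assumptions, not the asker's code): $Q(s, a) = w_a \cdot \phi(s)$, with the bootstrap target taken from the max over actions at $s'$.

    import numpy as np

    def q_update(w, phi, s, a, r, s_next, done, alpha=0.01, gamma=0.99):
        # w: array of one weight vector per action; phi: feature map.
        q_sa = w[a] @ phi(s)
        q_next = max(w[b] @ phi(s_next) for b in range(len(w)))
        target = r if done else r + gamma * q_next
        w[a] += alpha * (target - q_sa) * phi(s)   # semi-gradient TD step
        return w

With $\phi(s)$ just the raw (position, velocity) pair, $Q$ is linear in the state, which is generally too weak for Mountain Car; tile coding or similar nonlinear features are the usual remedy.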
3
votes
1 answer

Why does updating only a part of all neural network weights not work?

I am having a problem with my deep neural network program written using Theano. In my deep neural network, I have several layers that predict an output given a certain input. Because of an issue when compiling Theano, I have to debug my…
3
votes
1 answer

Speeding up quadratic function minimization with the FFT

I'm trying to understand the following excerpt from a paper: Subproblem 1: computing $S$. The $S$ estimation subproblem corresponds to minimizing $$ \sum_{p}(S_p - I_p)^2 + \beta((\partial_xS_p - h_p)^2 + (\partial_yS_p - v_p)^2) \tag…
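For context, under periodic boundary assumptions that quadratic is diagonalized by the 2-D FFT; setting its gradient to zero yields the usual closed form $$ S = \mathcal{F}^{-1}\!\left( \frac{\mathcal{F}(I) + \beta \big( \overline{\mathcal{F}(\partial_x)}\, \mathcal{F}(h) + \overline{\mathcal{F}(\partial_y)}\, \mathcal{F}(v) \big)}{\mathcal{F}(1) + \beta \big( |\mathcal{F}(\partial_x)|^2 + |\mathcal{F}(\partial_y)|^2 \big)} \right), $$ where the division is componentwise and the bar denotes complex conjugation, so $S$ is obtained with a handful of FFTs instead of an iterative solve.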
3
votes
2 answers

Gradient descent overshoot - why does it diverge?

I'm thinking about gradient descent, but I don't get it. I understand that it can overshoot the minimum when the learning rate is too large. But I can't understand why it would diverge. Let's say we have $$J(\theta_0, \theta_1) =…
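A one-dimensional example makes the divergence concrete: for $J(\theta) = \theta^2$ the update is $$\theta_{t+1} = \theta_t - \eta\, J'(\theta_t) = (1 - 2\eta)\,\theta_t, \qquad \theta_t = (1 - 2\eta)^t\, \theta_0,$$ so for $\eta > 1$ we have $|1 - 2\eta| > 1$: every step overshoots the minimum and lands farther up the opposite side than where it started, and the iterates grow geometrically.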
3
votes
1 answer

Mathematical optimization with a thresholded objective function

Gradient descent can be used to minimize an objective function $\Phi:\mathbb{R}^d \to \mathbb{R}$, if we know how to evaluate $\Phi$ on any input of our choice. However, my situation is a little different. I have an objective function $\Phi$ of the…
2
votes
1 answer

Is there a universal learning rate for neural networks?

I'm currently creating a neural network with backpropagation/gradient descent. There is a hyperparameter called the "learning rate" (η), which has to be chosen to guarantee not overshooting the minimum of the cost function when doing…
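A small worked case (standard, not specific to this question) shows why no universal η exists: on $J(\theta) = \tfrac{c}{2}\theta^2$ the update is $\theta_{t+1} = (1 - \eta c)\,\theta_t$, which converges iff $|1 - \eta c| < 1$, i.e. $0 < \eta < 2/c$. The admissible range shrinks as the curvature $c$ grows, so the safe learning rate is problem-dependent (for $L$-smooth objectives the analogous bound is $\eta < 2/L$).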
2
votes
0 answers

About gradient descent on non-convex functions

There is this "folklore" result that gradient descent on a non-convex function takes $O(\frac n {\epsilon^2})$ steps to get to a point whose gradient norm is below $\epsilon$ and with SGD this takes $O(\frac {1}{\epsilon^4})$ steps. Can someone…
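For reference, the deterministic half of the folklore follows from the descent lemma: if $f$ is $L$-smooth, gradient descent with step $1/L$ satisfies $f(x_{t+1}) \le f(x_t) - \tfrac{1}{2L} \|\nabla f(x_t)\|^2$, and telescoping over $T$ steps gives $$\min_{t < T} \|\nabla f(x_t)\|^2 \le \frac{2L \big( f(x_0) - f^\star \big)}{T},$$ so $T = O(1/\epsilon^2)$ iterations suffice; the extra factor of $n$ in the question presumably counts the $n$ component-gradient evaluations needed for each full gradient of a finite-sum objective.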
2
votes
1 answer

Calculating gradient in a neural net using batches

I am a CS student learning about neural nets. Currently I am confused about how to train a neural net in batches. If I calculate the error for a batch, I get a vector of errors, e.g. real1 - predicted1, real2 - predicted2, etc. How do I then…
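A minimal numpy sketch of the usual answer (the linear model and names are illustrative): the raw error vector is not applied directly; each example contributes a gradient, and the update uses their mean.

    import numpy as np

    def batch_grad_step(w, X, Y, eta=0.1):
        # Squared-error loss on a linear model y_hat = X @ w.
        errors = X @ w - Y                  # one residual per example
        grad = X.T @ errors / len(Y)        # mean of per-example gradients
        return w - eta * grad

The same pattern holds for a neural net: backpropagate each example's error to a gradient and average the gradients over the batch before taking the step.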
2
votes
0 answers

Lazy Stochastic Gradient Descent: Multiplicative vs Additive

I am reading Bob Carpenter's note at http://lingpipe.files.wordpress.com/2008/04/lazysgdregression.pdf and William Cohen's note at http://www.cs.cmu.edu/~wcohen/10-605/notes/sgd-notes.pdf. They described the same technique to lazily decay the…
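A toy sketch of the two variants those notes contrast, applied when a weight is next touched after k skipped steps (eta is the learning rate, lam the L2 penalty; the framing is illustrative):

    def lazy_decay_multiplicative(w_j, k, eta, lam):
        # Compound the per-step shrinkage (1 - eta * lam) over k steps.
        return w_j * (1.0 - eta * lam) ** k

    def lazy_decay_additive(w_j, k, eta, lam):
        # First-order approximation: k decay steps taken at the old value.
        return w_j * (1.0 - k * eta * lam)

For small eta * lam the two agree to first order; they diverge as k * eta * lam grows, and the additive form can even flip the sign of w_j.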
2
votes
0 answers

Computing $\mathrm{tr}(X^{-1}Y)$ efficiently

I know that one can compute the expression $X^{-1}\mathbf{v}$ quickly with the conjugate gradient method. Is there a similar approach for computing $\mathrm{tr}(X^{-1}Y)$? Similarly interesting to me are $\mathrm{tr}(X^{-1})$ and…
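One common approach (a sketch, not necessarily what the asker needs): combine the Hutchinson trace estimator $\mathrm{tr}(A) = \mathbb{E}[z^\top A z]$ for Rademacher $z$ with one conjugate-gradient solve per probe, assuming $X$ is symmetric positive definite.

    import numpy as np
    from scipy.sparse.linalg import cg

    def trace_inv_product(X, Y, n_probes=100, seed=0):
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        total = 0.0
        for _ in range(n_probes):
            z = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe vector
            u, _ = cg(X, Y @ z)                  # u ~= X^{-1} (Y z)
            total += z @ u                       # one sample of z^T X^{-1} Y z
        return total / n_probes

Taking Y to be the identity (i.e. solving $Xu = z$) gives the same estimator for $\mathrm{tr}(X^{-1})$.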