For an MSE loss with a linear model, you will have a convex loss surface
Gradient Descent on a convex function is guaranteed to converge if the learning rate is not too high. The only thing that can go wrong is a learning rate that is too high, which makes the learning diverge
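To make this concrete, here is a minimal sketch (not from the text) of batch Gradient Descent on an MSE loss for a one-feature linear model; the toy data and the two learning rates are illustrative assumptions, chosen so that one run converges and one diverges.

```python
# Minimal sketch (assumed toy data): batch Gradient Descent on an MSE loss
# for a one-feature linear model. A small learning rate converges; a learning
# rate that is too high makes the parameters blow up.
import numpy as np

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=100)       # y = 4 + 3x + noise
Xb = np.c_[np.ones((100, 1)), X]                 # prepend a bias column

def batch_gd(lr, n_iters):
    theta = np.zeros(2)
    for _ in range(n_iters):
        grad = 2 / len(y) * Xb.T @ (Xb @ theta - y)   # gradient of the MSE
        theta -= lr * grad
    return theta

print(batch_gd(lr=0.1, n_iters=200))   # converges near the true [4, 3]
print(batch_gd(lr=1.5, n_iters=20))    # too high: values grow without bound
```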
Impact of one record per learning step - Light on computing but a zigzagging gradient path
Working on a single instance at a time makes the algorithm much faster because it has very little data to manipulate at every iteration.
On the other hand, due to its stochastic (i.e., random) nature, this algorithm is much less regular than Batch Gradient Descent: instead of gently decreasing until it reaches the minimum, the cost function will bounce up and down, decreasing only on average.
$\hspace{2cm}$[Ref] - Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, by Aurélien Géron
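The bouncing behaviour can be seen with a small sketch of one-record-at-a-time (stochastic) updates; the data, learning rate, and step count below are assumed for illustration.

```python
# Minimal sketch (assumed data and hyper-parameters): Stochastic Gradient
# Descent using one randomly picked record per update. The full-data MSE
# printed along the way bounces around, decreasing only on average.
import numpy as np

rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=100)
Xb = np.c_[np.ones((100, 1)), X]

theta = np.zeros(2)
lr = 0.05
for step in range(1000):
    i = rng.integers(len(y))                    # pick a single record
    xi, yi = Xb[i], y[i]
    grad = 2 * xi * (xi @ theta - yi)           # gradient on that one record
    theta -= lr * grad
    if step % 200 == 0:
        print(step, round(np.mean((Xb @ theta - y) ** 2), 3))

print(theta)                                    # noisy, but close to [4, 3]
```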
Impact of multi-variate gradient -
Multi-variate means that multiple features now contribute to the resultant gradient. Everything else stays the same, and the learning follows the net gradient path, as sketched below.
One thing that changes is that in high dimensions the loss surface is long and flat in most places, which can slow down learning if the learning rate is not scheduled. Convergence is still guaranteed if the learning rate is not too high.
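As a rough illustration (with assumed data and an assumed decay schedule), a batch update with several features still follows a single net gradient, one partial derivative per feature:

```python
# Rough illustration (assumed data, assumed decay schedule): batch Gradient
# Descent with five features. Each feature contributes one partial derivative,
# and the update follows the resulting net gradient.
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.random((n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.1 * rng.normal(size=n)

w = np.zeros(d)
for t in range(500):
    grad = 2 / n * X.T @ (X @ w - y)   # net gradient: one partial per feature
    lr = 0.5 / (1 + 0.01 * t)          # simple assumed learning-rate decay
    w -= lr * grad

print(np.round(w, 2))                  # approaches true_w
print(np.round(true_w, 2))
```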
For Intuition -
For more than 3-D, you will have to trust the extrapolation and visualize it in your own way.
$\hspace{3cm}$[Figure] Individual record Gradient and Overall gradient - 1 Feature
$\hspace{3cm}$[Figure] A Loss surface in 3-D - 2 Features
$\hspace{3cm}$[3-D plot credit] - https://www.monroecc.edu/faculty/paulseeburger/calcnsf/CalcPlot3D/