I am only just getting familiar with gradient descent through learning logistic regression. I understand that the directional component of the gradient vector is correct information, derived from the slope of the cost with respect to the weights. But what about the magnitude? Why isn't it arbitrary? If this magnitude is not even roughly proportional to the distance from the minimum, then it seems arbitrary to me. As for the direction, the magnitude isn't needed: the vector itself already tells us the direction.
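To make the question concrete, this is the update I have in mind (a plain gradient-descent step, with weights $w$, learning rate $\eta$, and cost $J$):

$$w \leftarrow w - \eta \, \nabla_w J(w)$$

My question is about what information the factor $\|\nabla_w J(w)\|$ contributes to the size of this step.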
I understand that this magnitude is the derivative of the cost function, which reflects the rate of change of the cost at a given point. What information does this rate of change give us, then? Is it about the distance to the minimum? It can't be, because local steepness (which is what the gradient magnitude represents) tells us nothing about the distance to the minimum: the slope can be very steep at one point while the minimum is right next to it. On the other hand, if the magnitude were arbitrary, we wouldn't multiply it by the learning rate in the update. So I don't understand what information we are actually getting from this value.
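To illustrate the "steep slope right next to the minimum" case I mean, here is a minimal Python sketch I put together. The two toy 1-D cost functions are hypothetical examples of my own (not from logistic regression): for $f_1(x) = x^2$ the slope magnitude is proportional to the distance from the minimum at $x = 0$, while for $f_2(x) = \sqrt{|x|}$ the slope actually grows as you get closer to the minimum.

```python
import numpy as np

# Two toy 1-D cost functions, both with their minimum at x = 0.
# f1(x) = x^2:       |grad| = 2|x|, shrinks as we approach the minimum.
# f2(x) = sqrt(|x|): |grad| = 0.5/sqrt(|x|), grows as we approach the minimum,
#                    so the gradient can be very steep right next to it.

def grad_f1(x):
    return 2.0 * x

def grad_f2(x):
    return np.sign(x) * 0.5 / np.sqrt(abs(x))

def gradient_descent(grad, x0, lr=0.1, steps=5):
    x = x0
    for i in range(steps):
        x = x - lr * grad(x)  # plain update: step size = lr * |gradient|
        print(f"step {i + 1}: x = {x:.4f}, |grad| = {abs(grad(x)):.4f}")
    return x

print("f(x) = x^2 (slope proportional to distance):")
gradient_descent(grad_f1, x0=1.0)

print("f(x) = sqrt(|x|) (slope steepest next to the minimum):")
gradient_descent(grad_f2, x0=1.0)
```

Running this shows the gradient magnitude shrinking toward the minimum for $f_1$ but growing for $f_2$, which is why I don't see how the magnitude can encode distance to the minimum in general.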
(I asked ChatGPT, but it just gave circular explanations, so it wasn't much help.)
