
What do warmup steps and warmup proportion mean? How do I select the number of warmup steps?

With warmup steps = 1, does the learning rate change for each batch or for each epoch?

SS Varshini

2 Answers


Answering your four questions:

  1. Warmup steps: a set of training steps at the beginning of training that use a very low learning rate.
  2. Warmup proportion ($wu$): the ratio of the number of warmup steps to the total number of training steps.
  3. Selecting the number of warmup steps varies depending on each case.
    • This research paper discusses warmup proportions of 0%, 2%, 4%, and 6%, all of which reflect significantly fewer warmup steps than in BERT.
    • This particular user had better performance with 165k warmup steps; kindly refer to this forum.
  4. As per this deep-learning documentation, the warmup is applied per epoch.
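To make points 1 and 2 concrete, here is a minimal sketch of a linear warmup schedule. The function name and the numbers are illustrative, not from any of the linked sources; the idea is just that the warmup proportion times the total steps gives the number of warmup steps, during which the learning rate ramps up to its base value.

```python
def lr_at_step(step, total_steps, base_lr, warmup_proportion):
    """Linearly ramp the LR from ~0 up to base_lr during warmup,
    then hold it at base_lr (real schedules usually decay afterwards)."""
    warmup_steps = int(total_steps * warmup_proportion)
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

# With 10,000 total steps and a 2% warmup proportion,
# warmup lasts int(10_000 * 0.02) = 200 steps.
print(lr_at_step(0, 10_000, 1e-3, 0.02))    # tiny LR at the very start
print(lr_at_step(500, 10_000, 1e-3, 0.02))  # past warmup: base LR
```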


Archana David

I will quote from several resources that explain this well.

Reddit.

a) Warm-up: A phase in the beginning of your neural network training where you start with a learning rate much smaller than your "initial" learning rate and then increase it over a few iterations or epochs until it reaches that "initial" learning rate.

Another nice explanation. This one also includes example code and a graph.

Warmup is a method of warming up the learning rate, mentioned in the ResNet paper. At the beginning of training, it uses a small learning rate to train for some epochs or steps (for example, 4 epochs or 10,000 steps), and then switches to the preset learning rate for the rest of training.
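That epoch-based variant can be sketched as a simple step function. The values below (4 warmup epochs, the two learning rates) are assumed for illustration, not taken from the ResNet paper itself:

```python
def warmup_by_epoch(epoch, warmup_epochs=4, warmup_lr=0.01, preset_lr=0.1):
    """Constant small LR for the first warmup_epochs, then the preset LR."""
    return warmup_lr if epoch < warmup_epochs else preset_lr

for e in range(6):
    print(e, warmup_by_epoch(e))  # epochs 0-3 use 0.01, epochs 4+ use 0.1
```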

Now, carefully read this one from Stack Overflow:

A training step is one gradient update. In one step batch_size examples are processed. An epoch consists of one full cycle through the training data. This is usually many steps. As an example, if you have 2,000 images and use a batch size of 10 an epoch consists of:

2,000 images / (10 images / step) = 200 steps.
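The arithmetic above can be written as a one-line helper (the function name is mine; `ceil` handles the case where the last batch is smaller than `batch_size`):

```python
import math

def steps_per_epoch(num_examples, batch_size):
    """One epoch = one full pass over the data = ceil(N / batch_size) steps."""
    return math.ceil(num_examples / batch_size)

print(steps_per_epoch(2000, 10))  # 2,000 images / (10 images/step) = 200 steps
```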

desertnaut
Mr. Panda