
Via this MIT document I studied the single-layer perceptron convergence proof (i.e., the bound on the maximum number of update steps).

In the convergence proof in this document, the learning rate is implicitly set to 1. After studying it, I tried to redo the proof by myself, this time inserting a generic learning rate $\eta$.

The result is that the bound remains the same:

$k\leq \frac{R^{2}\left \| \theta ^{*} \right \|^{2}}{\gamma ^{2}}$

that is, the learning rate $\eta$ cancels out in the proof.

Is this possible, or did I make a mistake in my proof?

Qwerto

1 Answer


Your conclusion is correct.

Note that the classifier is of the form

$$f(x; \theta) = \operatorname{sign}(\theta^Tx)$$

while the update rule is

$$\theta^{(k+1)}=\theta^{(k)}+\eta\, y_t x_t,$$ which is applied only when the example $(x_t, y_t)$ is misclassified. In classification we only care about the sign, and

$$\operatorname{sign}(\theta^Tx)=\operatorname{sign}(\eta \theta^Tx)$$

as long as $\eta$ is positive.

The learning rate only scales $\theta^Tx$, but our concern is just the sign, hence it is irrelevant.
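To see this concretely, here is a minimal sketch (the function name, toy dataset, and constants are my own, not from the question or the MIT notes): starting from $\theta^{(0)}=0$, two runs with different positive learning rates make exactly the same sequence of mistakes and end with weight vectors that differ only by the factor $\eta$, so their predictions are identical.

```python
import numpy as np


def train_perceptron(X, y, eta, max_epochs=100):
    """Perceptron starting from theta = 0, updating theta <- theta + eta * y_t * x_t
    on every misclassified example. Returns the final weights and the ordered list
    of indices at which mistakes were made."""
    theta = np.zeros(X.shape[1])
    mistakes = []
    for _ in range(max_epochs):
        clean_pass = True
        for t, (x_t, y_t) in enumerate(zip(X, y)):
            if y_t * np.dot(theta, x_t) <= 0:   # misclassified (or on the boundary)
                theta = theta + eta * y_t * x_t
                mistakes.append(t)
                clean_pass = False
        if clean_pass:                          # no mistakes in a full pass: converged
            break
    return theta, mistakes


# Hypothetical toy data, linearly separable through the origin with a clear margin.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
X = X[np.abs(X @ w_true) > 0.5]                 # discard points too close to the boundary
y = np.sign(X @ w_true)

theta_a, mistakes_a = train_perceptron(X, y, eta=1.0)
theta_b, mistakes_b = train_perceptron(X, y, eta=0.01)

print(mistakes_a == mistakes_b)                 # True: identical sequence of mistakes
print(np.allclose(theta_b, 0.01 * theta_a))     # True: weights differ only by the factor eta
print(np.array_equal(np.sign(X @ theta_a),      # True: identical predictions
                     np.sign(X @ theta_b)))
```

Because both runs start from the zero vector, each weight vector is just $\eta$ times the corresponding $\eta=1$ weight vector throughout training, which is why the mistake sequences coincide.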

Remark: You might like to note the assumptions used in the proof, though. For instance, there is the assumption that $\theta^*$ exists, i.e., that the data can be perfectly separated by a hyperplane.
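For completeness, here is a sketch of the standard argument with a generic $\eta > 0$, assuming $\theta^{(0)}=0$, $\left\|x_t\right\|\leq R$ for all $t$, and a margin $y_t\,(\theta^{*T}x_t)\geq\gamma$ for all $t$ (notation as in the question; the MIT notes may state the constants slightly differently). After $k$ mistakes,

$$\theta^{(k)}\cdot\theta^{*}=\sum_{i=1}^{k}\eta\,y_{t_i}\,(x_{t_i}\cdot\theta^{*})\geq k\,\eta\,\gamma,$$

while on each mistake $y_t\,\theta^{(k)}\cdot x_t\leq 0$, so

$$\left\|\theta^{(k+1)}\right\|^{2}=\left\|\theta^{(k)}\right\|^{2}+2\eta\,y_t\,\theta^{(k)}\cdot x_t+\eta^{2}\left\|x_t\right\|^{2}\leq\left\|\theta^{(k)}\right\|^{2}+\eta^{2}R^{2},$$

giving $\left\|\theta^{(k)}\right\|^{2}\leq k\,\eta^{2}R^{2}$. Combining the two bounds via Cauchy-Schwarz,

$$k\,\eta\,\gamma\leq\theta^{(k)}\cdot\theta^{*}\leq\left\|\theta^{(k)}\right\|\left\|\theta^{*}\right\|\leq\sqrt{k}\,\eta\,R\left\|\theta^{*}\right\|,$$

and the factor $\eta$ cancels, leaving $k\leq\frac{R^{2}\left\|\theta^{*}\right\|^{2}}{\gamma^{2}}$, exactly the bound in the question.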

Siong Thye Goh