Swin Transformer Relative Position Biases

Asked Sep 08 '23 at 17:23

Active Sep 08 '23 at 17:35

Viewed 100 times

I was reading the Swin Transformer paper and looking at the GitHub implementation. I noticed that, when calculating the relative position bias, the input to the log function before the CPB MLP is scaled to a range of 0 to 8. I couldn't find any mention of this in the original paper. My intuition is that this makes the output of the log fall in the range 0 to 1. Is this reasoning correct?
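For reference, here is a minimal pure-Python sketch of the coordinate transform being described. It assumes the implementation first normalises each relative offset by `window_size - 1`, multiplies by 8, and then applies a signed `log2(|x| + 1) / log2(8)`. That is my reading of the code and may not match it exactly; the function name `log_spaced_coord` and its parameters are hypothetical.

```python
import math


def log_spaced_coord(delta, window_size, scale=8.0):
    """Map a relative offset to a log-spaced coordinate (assumed formula)."""
    # Normalise the offset to [-1, 1], then stretch it to [-scale, scale].
    normalized = delta / (window_size - 1)
    scaled = normalized * scale
    # Sign of the scaled value as -1, 0 or 1.
    sign = (scaled > 0) - (scaled < 0)
    # Signed log, divided by log2(scale) so the result is of order 1.
    return sign * math.log2(abs(scaled) + 1.0) / math.log2(scale)


if __name__ == "__main__":
    for delta in (-7, -1, 0, 1, 7):
        print(delta, log_spaced_coord(delta, window_size=8))
```

Under these assumptions the result is symmetric around zero, and the endpoints sit slightly outside ±1, because log2(9) / log2(8) is about 1.057 rather than exactly 1.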

I also noticed that the output of the MLP is passed through a sigmoid function and then scaled by a factor of 16. I couldn't find this mentioned in the paper either. What is the underlying reasoning?
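To make the effect of that step concrete, here is a small sketch assuming the bias is computed as `16 * sigmoid(mlp_output)`. The names `sigmoid` and `bounded_bias` are hypothetical helpers, not functions from the repository. It only illustrates the arithmetic: whatever the MLP outputs, the resulting bias stays between 0 and 16.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bounded_bias(mlp_output, max_bias=16.0):
    """Squash an unbounded MLP output into the interval (0, max_bias)."""
    return max_bias * sigmoid(mlp_output)


if __name__ == "__main__":
    for raw in (-20.0, -1.0, 0.0, 1.0, 20.0):
        print(raw, bounded_bias(raw))
```

An MLP output of 0 maps to the midpoint 8, and large positive or negative outputs saturate towards 16 or 0, so the bias added to the attention logits cannot grow without bound.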

I have also just noticed that the logit scale parameter is initialised to ln(10). Is there a reason for this?
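As a rough illustration of what that initial value implies, the sketch below assumes the parameter is stored in log space, exponentiated before use, and multiplied with the cosine similarity of the query and key. The cap of 100 on the scale is also an assumption on my part, and `cosine_logit` is a hypothetical name.

```python
import math


def cosine_logit(q, k, logit_scale=math.log(10.0), max_scale=100.0):
    """Scaled cosine similarity between two vectors (assumed formulation)."""
    dot = sum(a * b for a, b in zip(q, k))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_k = math.sqrt(sum(b * b for b in k))
    cosine = dot / (norm_q * norm_k)
    # The parameter lives in log space, so exp() keeps the scale positive;
    # the min() caps the effective scale at max_scale (an assumption here).
    scale = math.exp(min(logit_scale, math.log(max_scale)))
    return cosine * scale


if __name__ == "__main__":
    print(cosine_logit([1.0, 0.0], [1.0, 0.0]))   # identical directions
    print(cosine_logit([1.0, 0.0], [0.0, 1.0]))   # orthogonal directions
    print(cosine_logit([1.0, 0.0], [-1.0, 0.0]))  # opposite directions
```

With ln(10) as the starting value, exp(ln(10)) = 10, so the cosine similarity in [-1, 1] is stretched to logits in roughly [-10, 10] at initialisation, which corresponds to a temperature of 0.1.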

Thank you for any assistance.

edited Sep 08 '23 at 17:35

asked Sep 08 '23 at 17:23

jack


0 Answers