I was reading the Swin Transformer paper and looking at the GitHub implementation, and I noticed that when calculating the relative position bias, the input to the log function before the CPB MLP is scaled to a range of 0 to 8. I couldn't find any mention of this in the original paper. My intuition is that this scaling makes the output of the log fall in the range 0 to 1. Is this the correct reasoning?
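To check my intuition, here is a minimal sketch of the transform as I understand it, written in plain Python rather than copied from the repository. The function name `log_spaced`, the default `scale=8.0`, the `log2(|x| + 1) / log2(scale)` form, and the sign handling are my assumptions about what the code does.

```python
import math


def log_spaced(coord, scale=8.0):
    """Map a normalized relative coordinate in [-1, 1] to log-spaced form.

    Assumed form: sign(x) * log2(|x| + 1) / log2(scale), where x = coord * scale.
    """
    x = coord * scale  # stretch the normalized coordinate to [-scale, scale]
    magnitude = math.log2(abs(x) + 1.0) / math.log2(scale)
    return math.copysign(magnitude, x)  # keep the sign of the original offset


# Print the mapping for a few normalized offsets.
for c in (0.0, 0.25, 0.5, 1.0):
    print(c, log_spaced(c))
```

Under these assumptions the largest offset maps to log2(9) / log2(8), which is roughly 1.057, so the output would be close to, but not exactly within, the range 0 to 1.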

I also noticed that the output of the MLP is passed through a sigmoid function and then scaled by a factor of 16. I couldn't find this mentioned in the paper either. What is the underlying reasoning?
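To make sure I am describing this step correctly, this is a minimal sketch of what I believe happens to each MLP output. The name `bounded_bias` and the parameter `max_bias` are hypothetical and only used for illustration.

```python
import math


def bounded_bias(mlp_out, max_bias=16.0):
    """Squash a raw MLP output with a sigmoid, then scale it by max_bias."""
    sigmoid = 1.0 / (1.0 + math.exp(-mlp_out))  # value in the open interval (0, 1)
    return max_bias * sigmoid


# Moderate inputs only; very large negative inputs would overflow math.exp here.
for raw in (-20.0, -1.0, 0.0, 1.0, 20.0):
    print(raw, bounded_bias(raw))
```

If this reading is right, the bias is always positive and bounded above by 16, whatever the MLP produces.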

I have also just noticed that the logit scale parameter is initialised to ln(10). Is there a reason for this?
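For context, this is how I think the parameter is used. It is a sketch under assumptions, not the repository code: the parameter is exponentiated and multiplies the cosine similarity between a query and a key. The function name `scaled_cosine_logit` and the upper clamp at ln(100) are my assumptions.

```python
import math


def scaled_cosine_logit(q, k, logit_scale=math.log(10.0), max_scale=math.log(100.0)):
    """Cosine similarity of q and k multiplied by exp(logit_scale).

    The clamp at max_scale is an assumption; it would cap the multiplier at 100.
    """
    dot = sum(a * b for a, b in zip(q, k))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_k = math.sqrt(sum(b * b for b in k))
    cosine = dot / (norm_q * norm_k)  # always in [-1, 1]
    scale = math.exp(min(logit_scale, max_scale))  # exp(ln(10)) is 10 at initialisation
    return cosine * scale


print(scaled_cosine_logit([1.0, 0.0], [1.0, 0.0]))  # parallel vectors
print(scaled_cosine_logit([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

With this reading, initialising to ln(10) means the attention logits start in the range -10 to 10, which is the same as dividing the cosine similarity by a temperature of 0.1.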

Thank you for any assistance.

jack
