When I generate a src_mask like this (size 5 just to match the output below):

import torch

size = 5
mask = torch.triu(
    torch.ones(size, size).bool(),
    diagonal=0,
)
>> tensor([[ True,  True,  True,  True,  True],
           [False,  True,  True,  True,  True],
           [False, False,  True,  True,  True],
           [False, False, False,  True,  True],
           [False, False, False, False,  True]])
then the transformer only produces NaN values. If I change it to diagonal=1 it works, but I don't really understand why. The goal of the mask is to prevent the transformer from attending to any sample after the current one (and later I want to increase the number of masked positions so the model has to predict further into the future).
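Here is a minimal sketch of what I believe happens inside the attention step (assuming PyTorch's convention that `True` in a boolean mask means "do not attend", i.e. the score is replaced with `-inf` before the softmax). With `diagonal=0` the diagonal itself is masked, so the first row has no unmasked position left and its softmax is NaN; with `diagonal=1` every row keeps at least the current position:

```python
import torch

scores = torch.zeros(5, 5)  # dummy attention scores

# diagonal=0: the diagonal is True too, so row 0 is masked everywhere
full_mask = torch.triu(torch.ones(5, 5).bool(), diagonal=0)
attn = torch.softmax(scores.masked_fill(full_mask, float("-inf")), dim=-1)
print(attn[0])  # softmax over only -inf values -> all NaN

# diagonal=1: the diagonal stays False, every row keeps >= 1 unmasked entry
shifted_mask = torch.triu(torch.ones(5, 5).bool(), diagonal=1)
attn2 = torch.softmax(scores.masked_fill(shifted_mask, float("-inf")), dim=-1)
print(attn2)  # no NaNs anywhere
```

So my understanding is that the NaNs come from the fully masked first row, and those NaNs then propagate through the rest of the network.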