
When I generate a src_mask like this

import torch

batch_size = 5  # matches the 5x5 mask shown below

mask = torch.triu(
    torch.ones(batch_size, batch_size).bool(),
    diagonal=0
)

>> tensor([[ True,  True,  True,  True,  True],
           [False,  True,  True,  True,  True],
           [False, False,  True,  True,  True],
           [False, False, False,  True,  True],
           [False, False, False, False,  True]])

then the transformer only generates NaN values. If I change it to diagonal=1 it works, but I don't really understand why. The goal of the mask is to prevent the transformer from attending to any sample after the current one (and I want to increase the number of masked positions later so that the model predicts further into the future).
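
For reference, here is a minimal sketch of how the mask gets passed to the encoder; the encoder dimensions and layer counts are just placeholders, not my actual model:

import torch
import torch.nn as nn

sz = 5  # same size as the mask above

# The diagonal=1 variant that works: only positions strictly after
# the current one are True.
mask = torch.triu(torch.ones(sz, sz).bool(), diagonal=1)

# Placeholder encoder, just to show where the mask is passed in.
encoder_layer = nn.TransformerEncoderLayer(d_model=16, nhead=4)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

src = torch.randn(sz, 1, 16)    # (sequence, batch, d_model)
out = encoder(src, mask=mask)   # src_mask goes in via the `mask` argument
print(out.shape)                # torch.Size([5, 1, 16])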
