
As most of us know, attention focuses on the specific parts of the input sequence that are most relevant for generating the output sequence.

Ex: The driver could not drive the car fast because it had a problem.
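
For context, each token's query vector is compared against every token's key vector; the resulting scores are turned into weights with a softmax, and those weights mix the value vectors. Below is a minimal sketch of single-head scaled dot-product attention (the toy shapes and random inputs are just placeholders):

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v):
        # q, k, v: (seq_len, d) projections of the same token embeddings
        d = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d ** 0.5  # how well each query matches every key
        weights = F.softmax(scores, dim=-1)          # each row sums to 1: one token's attention distribution
        return weights @ v, weights                  # context vectors and the attention map

    # toy example: 5 tokens with 8-dimensional projections
    q = k = v = torch.randn(5, 8)
    context, attn_map = scaled_dot_product_attention(q, k, v)
    print(attn_map)  # row i shows how much token i attends to every token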

  1. How does attention find the specific parts of the input (here, the token 'it'), and how does it assign a score to that token?

  2. Is attention a context-based model?

  3. How do I obtain attention maps (query, key, value)?

  4. On what basis does attention assign higher weights to some input tokens?

tovijayak

1 Answer

  1. Neural networks are considered black boxes because they are not interpretable: we don't know why they compute the results they do.

  2. Yes.

  3. To get attention maps, you can use the BERTViz library (see the usage sketch after this list), e.g.:

    [BERTViz attention-map visualization]

  4. Same answer as 1).
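
Regarding 3), a minimal usage sketch with BERTViz (assuming a Jupyter notebook plus the transformers and bertviz packages; bert-base-uncased and the example sentence are just placeholders):

    from transformers import AutoTokenizer, AutoModel
    from bertviz import head_view

    model_name = "bert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_attentions=True)

    sentence = "The driver could not drive the car fast because it had a problem."
    inputs = tokenizer.encode(sentence, return_tensors="pt")
    attention = model(inputs).attentions                # one tensor per layer: (batch, heads, seq, seq)
    tokens = tokenizer.convert_ids_to_tokens(inputs[0])

    head_view(attention, tokens)  # renders an interactive attention map in the notebook

In the resulting map you can inspect, layer by layer and head by head, which tokens 'it' attends to most strongly.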

noe