I am implementing a custom algorithm inspired by the NMT architecture. In the decoder, if the query is the target language, shouldn't the "value" be the target language too? Only the "key" should be the encoder output (the encoded source language). After much heartburn I have made peace with the idea that a "query" is what you are trying to find out, and a "key" is some sort of index for the "values", from which you choose your answer (based on the best score generated by the attention algorithm). So if I want to translate French to English, and my encoder is encoding French, then my query has to be in English, and so the values must be in English too, right? Why does the TF code (the NMT tutorial) use the French encoding as both the key and the value?
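To make the setup concrete, here is a minimal sketch of the attention step as I understand it, in plain Python. It assumes scaled dot-product scoring (the tutorial's scoring function may differ), and `attend`, `encoder_out`, and `decoder_state` are hypothetical names with made-up toy numbers, not anything taken from the TF code.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    # One score per source position: how well the query matches each key.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The result is a weighted average of the value vectors.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context

# Hypothetical encoder outputs: one vector per French source token.
encoder_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical decoder state summarising the English prefix so far.
decoder_state = [2.0, 0.0]

# Same tensor passed as both key and value, as in the setup I am asking about.
weights, context = attend(decoder_state, keys=encoder_out, values=encoder_out)
```

With key and value both set to the encoder output, `context` is just a weighted mix of the French encoder vectors. As far as I can tell, the English word is not read out of the values directly; it comes from the decoder layers that consume this context vector afterwards.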
Or is the right interpretation this: since the query (the masked input, English) and the key (the encoded French) are first "dot producted" (sorry) together, the "values" are in fact learned during training, based on the loss calculated from the difference between the input context so far (English) and the next predicted word? And at inference time, do the key-value pairs then act like a French-English dictionary (a very smart dict at that, which gives the nearest word based on context)?
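The "smart dict" picture can be sketched as a soft lookup: instead of returning the value for one exactly matching key, it returns a score-weighted average of all values. This is only an illustration of the analogy under my own assumptions; `soft_lookup`, `table`, and the `sharpness` parameter are hypothetical, and the numbers are made up.

```python
import math

def soft_lookup(query, table, sharpness=1.0):
    """Return a score-weighted average of all values in `table`.

    `table` is a list of (key, value) pairs of float lists. A larger
    `sharpness` (hypothetical knob) pushes the result toward the value
    of the single best-matching key, like an ordinary dict lookup.
    """
    scores = [sharpness * sum(q * k for q, k in zip(query, key)) for key, _ in table]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(table[0][1])
    return [sum(w * value[i] for w, (_, value) in zip(weights, table)) for i in range(dim)]

# Hypothetical (key, value) pairs.
table = [
    ([1.0, 0.0], [10.0, 0.0]),
    ([0.0, 1.0], [0.0, 10.0]),
]
query = [0.9, 0.1]  # closer to the first key, but not an exact match

blended = soft_lookup(query, table)                      # mix of both values
nearly_hard = soft_lookup(query, table, sharpness=50.0)  # almost the first value only
```

A plain dict would fail on `query` because no key matches exactly. The soft version always returns something, weighted toward the closest key, which is the "nearest word based on context" behaviour I have in mind.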