Transformer (machine learning model)



;Encoder-decoder attention
The decoder also uses encoder-decoder attention, in which the keys and values are generated from the encoder's output embeddings (i.e. the representation of the input sentence) using the attention layer's own weight matrices, while the queries are generated from the decoder input (i.e. the previously generated output).
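The following is a minimal sketch of this mechanism, assuming single-head scaled dot-product attention without masking or multi-head splitting; the function name <code>encoder_decoder_attention</code> and the weight matrices <code>W_q</code>, <code>W_k</code>, <code>W_v</code> are illustrative placeholders rather than the API of any particular library:

<syntaxhighlight lang="python">
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def encoder_decoder_attention(decoder_input, encoder_output, W_q, W_k, W_v):
    """Single-head cross-attention: queries come from the decoder input,
    keys and values come from the encoder's output embeddings."""
    Q = decoder_input @ W_q    # (target_len, d_k): queries from the decoder side
    K = encoder_output @ W_k   # (source_len, d_k): keys from the encoder output
    V = encoder_output @ W_v   # (source_len, d_v): values from the encoder output
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # scaled dot-product similarities
    weights = softmax(scores, axis=-1)       # attention over source positions
    return weights @ V                       # (target_len, d_v)

# Toy example: 4 source tokens, 3 target tokens, model width 8
d_model, d_k = 8, 8
rng = np.random.default_rng(0)
enc_out = rng.normal(size=(4, d_model))  # encoder output embeddings
dec_in = rng.normal(size=(3, d_model))   # previously generated decoder inputs
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(encoder_decoder_attention(dec_in, enc_out, W_q, W_k, W_v).shape)  # (3, 8)
</syntaxhighlight>

Each decoder position thus attends over all encoder positions, so the output mixes information from the input sentence according to how relevant each source token is to the current target token.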


===Encoder===