Transformer (machine learning model)



;Encoder-decoder attention
The decoder also uses encoder-decoder attention, in which the keys and values are generated from the encoder's output embeddings (i.e. the representation of the input sentence) using the attention layer's own weight matrices, while the queries are generated from the decoder input (i.e. the previously generated output).
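The following is a minimal sketch of this mechanism, assuming single-head scaled dot-product attention without masking or multi-head splitting; the function name <code>encoder_decoder_attention</code> and the weight matrices <code>W_q</code>, <code>W_k</code>, <code>W_v</code> are illustrative placeholders rather than the API of any particular library:

<syntaxhighlight lang="python">
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def encoder_decoder_attention(decoder_input, encoder_output, W_q, W_k, W_v):
    """Single-head cross-attention: queries come from the decoder input,
    keys and values come from the encoder's output embeddings."""
    Q = decoder_input @ W_q    # (target_len, d_k): queries from the decoder side
    K = encoder_output @ W_k   # (source_len, d_k): keys from the encoder output
    V = encoder_output @ W_v   # (source_len, d_v): values from the encoder output
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # scaled dot-product similarities
    weights = softmax(scores, axis=-1)       # attention over source positions
    return weights @ V                       # (target_len, d_v)

# Toy example: 4 source tokens, 3 target tokens, model width 8
d_model, d_k = 8, 8
rng = np.random.default_rng(0)
enc_out = rng.normal(size=(4, d_model))  # encoder output embeddings
dec_in = rng.normal(size=(3, d_model))   # previously generated decoder inputs
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(encoder_decoder_attention(dec_in, enc_out, W_q, W_k, W_v).shape)  # (3, 8)
</syntaxhighlight>

Each decoder position thus attends over all encoder positions, so the output mixes information from the input sentence according to how relevant each source token is to the current target token.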


===Encoder===