===Decoder===
Each decoder consists of a self-attention layer, an encoder-decoder attention layer, and a feed-forward layer.
As with the encoder, each of these layers is followed by an add-and-normalize step (a residual connection followed by layer normalization).
The encoder-decoder attention layer draws its queries from the decoder itself and generates its keys and values from the output of the last encoder block.
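The following is a minimal sketch of one such decoder layer in PyTorch, assuming the standard post-layer-normalization arrangement; the names <code>DecoderLayer</code>, <code>d_model</code>, <code>n_heads</code>, and <code>d_ff</code> are illustrative rather than taken from the original paper.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """Illustrative decoder layer: self-attention, encoder-decoder
    attention, and feed-forward, each followed by add-and-normalize."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        # Self-attention over the decoder's own (masked) inputs.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Encoder-decoder attention: queries from the decoder,
        # keys and values from the last encoder block's output.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Position-wise feed-forward layer.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        # One layer norm per add-and-normalize step.
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, encoder_out, causal_mask=None):
        # Sublayer 1: masked self-attention, then add-and-normalize.
        a, _ = self.self_attn(x, x, x, attn_mask=causal_mask)
        x = self.norm1(x + a)
        # Sublayer 2: encoder-decoder attention; keys and values
        # come from the encoder output, queries from the decoder.
        a, _ = self.cross_attn(x, encoder_out, encoder_out)
        x = self.norm2(x + a)
        # Sublayer 3: feed-forward, then add-and-normalize.
        return self.norm3(x + self.ffn(x))

# Example usage with hypothetical sizes:
layer = DecoderLayer(d_model=512, n_heads=8, d_ff=2048)
tgt = torch.randn(2, 10, 512)        # (batch, target length, d_model)
enc_out = torch.randn(2, 16, 512)    # output of the last encoder block
# Causal mask: True entries mark positions a query may not attend to.
mask = torch.triu(torch.ones(10, 10, dtype=torch.bool), diagonal=1)
out = layer(tgt, enc_out, causal_mask=mask)  # shape (2, 10, 512)
</syntaxhighlight>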


==Code==