===Decoder===
Each decoder block consists of a self-attention layer, an encoder-decoder attention layer, and a feed-forward layer. The decoder's self-attention is masked so that each position can attend only to itself and to earlier positions.
As with the encoder, each sub-layer is followed by an add-and-normalize step: a residual connection followed by layer normalization.
The encoder-decoder attention takes its queries from the preceding decoder sub-layer and generates its keys and values from the output of the last encoder block.
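The data flow through one decoder block can be sketched in NumPy. This is a minimal illustration under simplifying assumptions, not a reference implementation. It uses a single attention head, no bias terms, no attention output projection, no learned layer-norm gain or bias, a ReLU feed-forward layer, and random weights. The names `decoder_block`, `attention`, `layer_norm`, and the `params` keys (`Wq1`, `Wk2`, `W1`, ...) are hypothetical. Each sub-layer is followed by add-and-normalize (post-norm). The self-attention uses a causal mask. The encoder-decoder attention takes queries from the decoder and keys and values from the encoder output `enc_out`.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each position (row) to zero mean and unit variance.
    # Assumption: no learned gain/bias, for brevity.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def attention(q, k, v, mask=None):
    # Scaled dot-product attention: softmax(q k^T / sqrt(d)) v
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    if mask is not None:
        # Blocked positions get a huge negative score, i.e. ~zero weight.
        scores = np.where(mask, scores, -1e9)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def decoder_block(x, enc_out, params):
    # x: (target_len, d_model) decoder input; enc_out: (source_len, d_model)
    t = x.shape[0]
    causal = np.tril(np.ones((t, t), dtype=bool))  # position i sees 0..i only

    # 1. Masked self-attention, then add & normalize.
    q, k, v = x @ params["Wq1"], x @ params["Wk1"], x @ params["Wv1"]
    x = layer_norm(x + attention(q, k, v, mask=causal))

    # 2. Encoder-decoder attention: queries from the decoder,
    #    keys and values from the last encoder block's output.
    q = x @ params["Wq2"]
    k, v = enc_out @ params["Wk2"], enc_out @ params["Wv2"]
    x = layer_norm(x + attention(q, k, v))

    # 3. Position-wise feed-forward (ReLU), then add & normalize.
    h = np.maximum(0.0, x @ params["W1"])
    return layer_norm(x + h @ params["W2"])

# Usage with random (untrained) weights.
rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
params = {name: rng.normal(scale=0.1, size=(d_model, d_model))
          for name in ["Wq1", "Wk1", "Wv1", "Wq2", "Wk2", "Wv2"]}
params["W1"] = rng.normal(scale=0.1, size=(d_model, d_ff))
params["W2"] = rng.normal(scale=0.1, size=(d_ff, d_model))

x = rng.normal(size=(5, d_model))        # 5 target positions
enc_out = rng.normal(size=(7, d_model))  # 7 source positions
y = decoder_block(x, enc_out, params)
print(y.shape)  # (5, 8)
```

The output keeps the decoder's sequence length (5) even though the encoder output has 7 positions. Cross-attention produces one row per query, so the source length only determines how many positions each decoder position attends over.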
==Code==