===Decoder===
Each decoder consists of a self-attention layer, an encoder-decoder attention layer, and a feed-forward layer.
As with the encoder, each of these layers is followed by an add-and-normalize step (a residual connection followed by layer normalization).
The encoder-decoder attention layer draws its queries from the decoder itself and generates its keys and values from the output of the last encoder block.
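The following is a minimal sketch of one such decoder layer in PyTorch, assuming the standard post-layer-normalization arrangement; the names <code>DecoderLayer</code>, <code>d_model</code>, <code>n_heads</code>, and <code>d_ff</code> are illustrative rather than taken from the original paper.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """Illustrative decoder layer: self-attention, encoder-decoder
    attention, and feed-forward, each followed by add-and-normalize."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        # Self-attention over the decoder's own (masked) inputs.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Encoder-decoder attention: queries from the decoder,
        # keys and values from the last encoder block's output.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Position-wise feed-forward layer.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        # One layer norm per add-and-normalize step.
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, encoder_out, causal_mask=None):
        # Sublayer 1: masked self-attention, then add-and-normalize.
        a, _ = self.self_attn(x, x, x, attn_mask=causal_mask)
        x = self.norm1(x + a)
        # Sublayer 2: encoder-decoder attention; keys and values
        # come from the encoder output, queries from the decoder.
        a, _ = self.cross_attn(x, encoder_out, encoder_out)
        x = self.norm2(x + a)
        # Sublayer 3: feed-forward, then add-and-normalize.
        return self.norm3(x + self.ffn(x))

# Example usage with hypothetical sizes:
layer = DecoderLayer(d_model=512, n_heads=8, d_ff=2048)
tgt = torch.randn(2, 10, 512)        # (batch, target length, d_model)
enc_out = torch.randn(2, 16, 512)    # output of the last encoder block
# Causal mask: True entries mark positions a query may not attend to.
mask = torch.triu(torch.ones(10, 10, dtype=torch.bool), diagonal=1)
out = layer(tgt, enc_out, causal_mask=mask)  # shape (2, 10, 512)
</syntaxhighlight>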


==Code==