Transformer (machine learning model): Difference between revisions

===Encoder===
The encoder receives as input the sum of the input embedding and a positional encoding.<br>
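As a rough sketch, the positional encoding can be computed with the sinusoidal scheme used in the original Transformer; the function name and dimensions below are illustrative, and an even `d_model` is assumed:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Sinusoidal positional encoding (assumes d_model is even):
    #   PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    #   PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]      # (1, d_model / 2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)               # even feature indices
    pe[:, 1::2] = np.cos(angles)               # odd feature indices
    return pe

# The encoding is added element-wise to the input embeddings,
# so it must share their shape (seq_len, d_model).
```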
The encoder consists of N=6 identical blocks, each with 2 layers.<br>
Each block contains a multi-headed attention layer followed by a feed-forward layer.<br>
Both layers employ a residual connection followed by layer normalization.<br>
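The structure of one encoder block can be sketched as follows. This is a minimal NumPy illustration with untrained random weights standing in for learned parameters: it shows only the data flow (multi-head self-attention, then a position-wise feed-forward layer, each wrapped in a residual connection and layer normalization); all function names and sizes are illustrative, not an actual library API.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each position's feature vector to zero mean, unit variance.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Random projections stand in for the learned Wq, Wk, Wv, Wo matrices.
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))
    # Project, then split the feature dimension across heads: (heads, seq, d_head).
    q = (x @ Wq).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product attention per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    out = softmax(scores) @ v                             # (heads, seq, d_head)
    # Concatenate heads and apply the output projection.
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

def feed_forward(x, d_ff, rng):
    # Position-wise two-layer network with a ReLU in between.
    d_model = x.shape[-1]
    W1 = rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model)
    W2 = rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff)
    return np.maximum(x @ W1, 0.0) @ W2

def encoder_block(x, num_heads=8, d_ff=2048, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    # Layer 1: multi-head self-attention, residual connection, layer norm.
    x = layer_norm(x + multi_head_attention(x, num_heads, rng))
    # Layer 2: feed-forward network, residual connection, layer norm.
    x = layer_norm(x + feed_forward(x, d_ff, rng))
    return x
```

In the full encoder, N=6 such blocks are stacked, with the output of one block feeding the next; the input and output of every block share the same shape (seq_len, d_model).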


===Decoder===