Transformer (machine learning model)
A neural network architecture introduced by researchers at Google in the paper "Attention Is All You Need" (Vaswani et al., 2017).
It has become the dominant architecture for natural language processing (NLP) tasks and has largely replaced RNNs in that domain.
- $Q$ represents the queries (each query is a vector)
- $K$ represents the keys
- $V$ represents the values
The attention block can be represented as the following equation:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where $d_k$ is the dimension of the key (and query) vectors.
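As a concrete illustration, here is a minimal NumPy sketch of scaled dot-product attention; the function name, shapes, and the use of NumPy are illustrative choices, not something specified in this note.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Returns one d_v-dimensional output vector per query.
    """
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the keys (each row of weights sums to 1)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted average of the value vectors
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))    # 4 queries, d_k = 8
K = rng.standard_normal((6, 8))    # 6 keys, d_k = 8
V = rng.standard_normal((6, 16))   # 6 values, d_v = 16
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 16)
```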
The encoder receives as input the input embedding added to a positional encoding.
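A small sketch of the sinusoidal positional encoding used in the paper, assuming NumPy and an even model dimension (the function name is my own):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same angle).

    Assumes d_model is even.
    """
    positions = np.arange(seq_len)[:, None]   # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]      # (1, d_model / 2)
    angles = positions / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

# The encoder input is the token embeddings plus this encoding (same shape):
# x = embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```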
The encoder consists of N = 6 identical layers, each with 2 sublayers.
Each layer contains a multi-head self-attention sublayer followed by a position-wise feed-forward sublayer.
Each sublayer is wrapped in a residual connection followed by layer normalization, as in the sketch below.
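A rough NumPy sketch of this layer structure. Here `self_attention` and `feed_forward` are placeholder callables standing in for the learned sublayers, and the learned layer-norm scale/bias parameters are omitted; this is an assumption-laden illustration, not the paper's reference implementation.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each position's vector; learned scale/bias omitted for brevity
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def encoder_layer(x, self_attention, feed_forward):
    # Sublayer 1: multi-head self-attention, wrapped in a residual connection
    x = layer_norm(x + self_attention(x))
    # Sublayer 2: position-wise feed-forward network, also residual
    x = layer_norm(x + feed_forward(x))
    return x

def encoder(x, layers):
    # A stack of N = 6 identical layers, applied in sequence
    for self_attention, feed_forward in layers:
        x = encoder_layer(x, self_attention, feed_forward)
    return x
```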