Transformer (machine learning model)

{{main | Wikipedia: Transformer (machine learning model)}}
[https://arxiv.org/abs/1706.03762 Attention is all you need paper]<br>
A neural network architecture introduced by researchers at Google in 2017.<br>
It achieves state-of-the-art results on many NLP tasks and has mostly replaced RNNs for these tasks.


==Architecture==
The Transformer uses an encoder-decoder architecture.
Both the encoder and decoder are composed of multiple identical layers, each of which has
attention and feedforward sublayers.<br>
[[File:Transformer architecture.png|500px]]
===Attention===
Attention is the main contribution of the transformer architecture.<br>
[[File:Transformer attention.png|500px]]
The attention block outputs a weighted average of values in a dictionary of key-value pairs.<br>
In the image above:<br>
* <math>Q</math> represents queries (each query is a vector)
* <math>K</math> represents keys
* <math>V</math> represents values
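The attention block can be represented as the following equation, where <math>d_k</math> is the dimension of the key vectors:
* <math>\operatorname{Attention}(Q, K, V) = \operatorname{SoftMax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V</math>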
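As a rough illustration of the equation above, here is a minimal pure-Python sketch of scaled dot-product attention. The function names <code>softmax</code> and <code>attention</code> and the toy matrices are made up for this example; a real implementation would use a tensor library and batched matrix multiplication.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: SoftMax(Q K^T / sqrt(d_k)) V.

    Q is n_q x d_k, K is n_k x d_k, V is n_k x d_v (lists of lists).
    """
    d_k = len(K[0])
    d_v = len(V[0])
    scale = math.sqrt(d_k)
    output = []
    for q in Q:
        # Dot product of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        # Softmax turns the scores into weights that sum to 1.
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)])
    return output

# Toy inputs (illustrative numbers only): one query, two key-value pairs.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out = attention(Q, K, V)
```

Because the query matches the first key more closely, the output is pulled toward the first value vector. A query of all zeros scores every key equally and returns the plain average of the values.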
===Encoder===
The encoder receives as input the input embedding added to a positional encoding.<br>
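The sketch below shows one way the encoder input could be formed. It assumes the fixed sinusoidal positional encoding described in the original paper (this page does not say which encoding is used), and the function names and the all-zero placeholder embeddings are hypothetical.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding (assumed, from the original paper):
    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    """
    pe = []
    for pos in range(seq_len):
        row = []
        for j in range(d_model):
            # Dimensions 2i and 2i+1 share one frequency; j - j % 2 recovers 2i.
            angle = pos / (10000 ** ((j - j % 2) / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def encoder_input(embeddings):
    """Add the positional encoding to the token embeddings element-wise."""
    seq_len, d_model = len(embeddings), len(embeddings[0])
    pe = positional_encoding(seq_len, d_model)
    return [[e + p for e, p in zip(e_row, p_row)]
            for e_row, p_row in zip(embeddings, pe)]

# Placeholder embeddings: 3 tokens, model dimension 4, all zeros,
# so the result is just the positional encoding itself.
embeddings = [[0.0] * 4 for _ in range(3)]
x = encoder_input(embeddings)
```

Since the embeddings and the encoding are added rather than concatenated, both must have the same dimension.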
The encoder is composed of N=6 layers, each with 2 sublayers.<br>
Each layer contains a multi-headed attention sublayer followed by a feed-forward sublayer.<br>
Both sublayers are residual blocks: the sublayer's input is added to its output, followed by layer normalization.<br>
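The layer structure described above can be sketched as follows. This is only a structural illustration: <code>toy_attention</code> and <code>toy_feed_forward</code> are parameter-free placeholders invented for this example, not the real sublayers, and the layer normalization step (without learned gain or bias) is assumed from the original paper.

```python
import math

N = 6  # number of encoder layers, as stated in the text

def layer_norm(vec, eps=1e-6):
    """Normalize one vector to zero mean and (roughly) unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def residual_block(seq, sublayer):
    """Apply a sublayer, add its input back, then normalize each position."""
    out = sublayer(seq)
    return [layer_norm([a + b for a, b in zip(x, y)]) for x, y in zip(seq, out)]

def toy_attention(seq):
    """Placeholder for multi-headed attention: each position gets the sequence mean."""
    d = len(seq[0])
    mean = [sum(x[j] for x in seq) / len(seq) for j in range(d)]
    return [mean[:] for _ in seq]

def toy_feed_forward(seq):
    """Placeholder for the feed-forward sublayer: position-wise ReLU."""
    return [[max(0.0, v) for v in x] for x in seq]

def encoder_layer(seq):
    """One layer: attention sublayer, then feed-forward sublayer, both residual."""
    seq = residual_block(seq, toy_attention)
    return residual_block(seq, toy_feed_forward)

def encoder(seq, num_layers=N):
    """Stack num_layers identical layers."""
    for _ in range(num_layers):
        seq = encoder_layer(seq)
    return seq

# Toy input (illustrative numbers only): 2 positions, dimension 4.
seq = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
out = encoder(seq)
```

In a trained model each of the N layers has its own learned weights; the placeholders here have none, so the sketch only shows how data flows through the residual blocks and that the sequence shape is preserved from layer to layer.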
===Decoder===
==Resources==
;Guides and explanations
* [https://nlp.seas.harvard.edu/2018/04/03/attention.html The Annotated Transformer]
* [https://www.youtube.com/watch?v=iDulhoQ2pro YouTube video]