{{main | Wikipedia: Transformer (machine learning model)}}
[https://arxiv.org/abs/1706.03762 Attention is all you need paper]<br>
A neural network architecture by Google.<br>
It is currently the dominant architecture for NLP tasks and has largely replaced RNNs for them.
==Architecture== | |||
The Transformer uses an encoder-decoder architecture. | |||
Both the encoder and the decoder consist of a stack of identical layers, each containing
attention and feed-forward sublayers.<br>
[[File:Transformer architecture.png|500px]] | |||
===Attention=== | |||
Attention is the main contribution of the Transformer architecture.<br>
[[File:Transformer attention.png|500px]] | |||
The attention block outputs a weighted average of the values in a dictionary of key-value pairs, where each value's weight is determined by the similarity between the query and that value's key.<br>
In the image above:<br> | |||
* <math>Q</math> represents queries (each query is a vector) | |||
* <math>K</math> represents keys | |||
* <math>V</math> represents values | |||
The attention block can be represented as the following equation, where <math>d_k</math> is the dimensionality of the keys:
* <math>\operatorname{Attention}(Q, K, V) = \operatorname{SoftMax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V</math>
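The equation above can be sketched in a few lines of NumPy (an illustrative sketch only; the function and variable names are assumptions, not the paper's code):

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the row max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]                   # dimensionality of the queries/keys
    scores = Q @ K.T / np.sqrt(d_k)     # (num_queries, num_keys) similarity matrix
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted average of the values

Q = np.random.randn(4, 8)    # 4 queries of dimension 8
K = np.random.randn(6, 8)    # 6 keys of dimension 8
V = np.random.randn(6, 16)   # 6 values of dimension 16
out = attention(Q, K, V)     # shape (4, 16)
```

Scaling by <math>\sqrt{d_k}</math> keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with vanishing gradients.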
===Encoder=== | |||
The encoder receives as input the input embedding added to a positional encoding.<br>
The encoder is composed of N=6 identical layers, each with 2 sublayers.<br>
Each layer contains a multi-headed attention sublayer followed by a feed-forward sublayer.<br>
Each sublayer is wrapped in a residual connection followed by layer normalization.<br>
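The residual sublayer pattern can be sketched as follows (a minimal sketch; the stand-in sublayers passed at the bottom are toy assumptions, whereas the real layers use learned multi-headed attention and feed-forward networks):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize each position's feature vector to zero mean and unit variance
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def encoder_layer(x, self_attention, feed_forward):
    # each sublayer is wrapped in a residual connection followed by layer norm
    x = layer_norm(x + self_attention(x))   # sublayer 1: multi-headed attention
    x = layer_norm(x + feed_forward(x))     # sublayer 2: feed-forward network
    return x

# toy stand-in sublayers, purely for illustration
x = np.random.randn(10, 8)  # 10 positions, model dimension 8
out = encoder_layer(x, lambda h: 0.5 * h, lambda h: np.tanh(h))
```

The residual connections let gradients flow directly through the stack, which is part of why N=6 layers train stably.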
===Decoder===
The decoder is also composed of N=6 identical layers.<br>
In addition to the two sublayers found in each encoder layer, each decoder layer has a third sublayer which performs multi-headed attention over the output of the encoder.<br>
The decoder's self-attention sublayer is masked so that each position can only attend to earlier positions in the output sequence.<br>
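In the paper, the decoder's self-attention masks out future positions so generation stays autoregressive. A minimal sketch of such a causal mask (the function name is an assumption):

```python
import numpy as np

def causal_mask(n):
    # entry [i, j] is True when query position i may attend to key position j (j <= i)
    return np.tril(np.ones((n, n), dtype=bool))

scores = np.random.randn(4, 4)  # raw self-attention scores for 4 positions
masked = np.where(causal_mask(4), scores, -np.inf)
# -inf scores become zero weight after the softmax, so position i
# never attends to positions i+1, i+2, ...
```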
==Resources== | |||
;Guides and explanations
* [https://nlp.seas.harvard.edu/2018/04/03/attention.html The Annotated Transformer]
* [https://www.youtube.com/watch?v=iDulhoQ2pro Youtube Video]