
Transformer (machine learning model)

===Attention===
Attention is the main contribution of the transformer architecture.<br>
[[File:Transformer attention.png|500px]]<br>
The attention block outputs a weighted average of values in a dictionary of key-value pairs.<br>
In the image above:<br>
The attention block can be represented as the following equation:
* <math>\operatorname{SoftMax}(\frac{QK^T}{\sqrt{d_k}})V</math>
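The following short NumPy sketch illustrates the equation above; the array shapes, names, and random example data are illustrative choices rather than anything taken from the figure.
<syntaxhighlight lang="python">
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """SoftMax(Q K^T / sqrt(d_k)) V: a weighted average of the values V."""
    d_k = K.shape[-1]
    # Similarity score of every query against every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise SoftMax turns scores into attention weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: each query gets a weighted average of the values
    return weights @ V

# Example (hypothetical sizes): 3 queries, 4 key-value pairs, dimension 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
</syntaxhighlight>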
===Encoder===
The encoder receives as input the input embedding added to a positional encoding.<br>
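A brief NumPy sketch of forming this encoder input, assuming the sinusoidal positional encoding of the original transformer paper; the sequence length and model dimension below are arbitrary example values.
<syntaxhighlight lang="python">
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding (one assumption; other encodings exist)."""
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # even dimensions 0, 2, ...
    angles = positions / (10000 ** (dims / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even indices: sine
    pe[:, 1::2] = np.cos(angles)                   # odd indices: cosine
    return pe

# Encoder input = token embeddings + positional encodings (element-wise sum)
seq_len, d_model = 10, 16                          # hypothetical sizes
token_embeddings = np.random.default_rng(0).normal(size=(seq_len, d_model))
encoder_input = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
print(encoder_input.shape)  # (10, 16)
</syntaxhighlight>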