Transformer (machine learning model): Difference between revisions


@@ Line 11: / Line 11: @@
 ===Attention===
 Attention is the core mechanism of the transformer architecture.<br>
-[[File:Transformer attention.png|500px]]
+[[File:Transformer attention.png|500px]]<br>
 The attention block outputs a weighted average of values in a dictionary of key-value pairs.<br>
 In the image above:<br>
@@ Line 19: / Line 19: @@
 The attention block can be represented as the following equation:
 * <math>\operatorname{SoftMax}(\frac{QK^T}{\sqrt{d_k}})V</math>
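The equation can be read row by row. Each query is compared with every key, the scores are scaled by <math>\sqrt{d_k}</math> and normalised with SoftMax, and the resulting weights average the value vectors. The sketch below shows this in plain Python. It assumes Q, K and V are nested lists of row vectors, and the function names are illustrative, not from any library.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Compute SoftMax(Q K^T / sqrt(d_k)) V on plain nested lists.

    Q: n_q x d_k, K: n_k x d_k, V: n_k x d_v.
    Returns an n_q x d_v list of lists.
    """
    d_k = len(K[0])
    d_v = len(V[0])
    scale = math.sqrt(d_k)
    output = []
    for q in Q:
        # One row of Q K^T: dot product of this query with every key, scaled.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        # Row-wise softmax: the weights are positive and sum to 1.
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)])
    return output
```

For example, with a query `[1.0, 0.0]`, keys `[[1.0, 0.0], [0.0, 1.0]]` and values `[[10.0, 0.0], [0.0, 10.0]]`, the first key matches the query better. It therefore gets the larger weight, and the output leans toward the first value vector. When all keys are identical, the weights are uniform and the output is the plain mean of the values.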
 ===Encoder===
 The encoder receives as input the input embedding added to a positional encoding.<br>
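The text does not say which positional encoding is used. The minimal sketch below assumes the sinusoidal encoding from the original transformer paper. In that scheme, even dimensions use <math>\sin(pos/10000^{2i/d_{model}})</math> and odd dimensions use the matching cosine, and the encoding is added element-wise to the embeddings. The function names are hypothetical, and learned positional embeddings are a common alternative.

```python
import math


def sinusoidal_positional_encoding(seq_len, d_model):
    # Assumed scheme: PE[pos][2i] = sin(pos / 10000^(2i/d_model)),
    #                 PE[pos][2i+1] = cos(pos / 10000^(2i/d_model)).
    pe = []
    for pos in range(seq_len):
        row = []
        for j in range(d_model):
            i = j // 2  # dimensions 2i and 2i+1 share one frequency
            angle = pos / (10000 ** (2 * i / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def encoder_input(embeddings):
    # Add the positional encoding element-wise to the token embeddings
    # (embeddings: seq_len x d_model nested list).
    seq_len, d_model = len(embeddings), len(embeddings[0])
    pe = sinusoidal_positional_encoding(seq_len, d_model)
    return [[e + p for e, p in zip(emb_row, pe_row)]
            for emb_row, pe_row in zip(embeddings, pe)]
```

Because the encoding is added rather than concatenated, the encoder input keeps the embedding dimension <math>d_{model}</math>.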