Transformer (machine learning model)

From David's Wiki
Jump to navigation Jump to search
\( \newcommand{\P}[]{\unicode{xB6}} \newcommand{\AA}[]{\unicode{x212B}} \newcommand{\empty}[]{\emptyset} \newcommand{\O}[]{\emptyset} \newcommand{\Alpha}[]{Α} \newcommand{\Beta}[]{Β} \newcommand{\Epsilon}[]{Ε} \newcommand{\Iota}[]{Ι} \newcommand{\Kappa}[]{Κ} \newcommand{\Rho}[]{Ρ} \newcommand{\Tau}[]{Τ} \newcommand{\Zeta}[]{Ζ} \newcommand{\Mu}[]{\unicode{x039C}} \newcommand{\Chi}[]{Χ} \newcommand{\Eta}[]{\unicode{x0397}} \newcommand{\Nu}[]{\unicode{x039D}} \newcommand{\Omicron}[]{\unicode{x039F}} \DeclareMathOperator{\sgn}{sgn} \def\oiint{\mathop{\vcenter{\mathchoice{\huge\unicode{x222F}\,}{\unicode{x222F}}{\unicode{x222F}}{\unicode{x222F}}}\,}\nolimits} \def\oiiint{\mathop{\vcenter{\mathchoice{\huge\unicode{x2230}\,}{\unicode{x2230}}{\unicode{x2230}}{\unicode{x2230}}}\,}\nolimits} \)

Attention is all you need paper
A neural network architecture by Google.
It is currently the best at NLP tasks and has mostly replaced RNNs for these tasks.

Architecture

The Transformer uses an encoder-decoder architecture. Both the encoder and decoder are comprised of multiple identical layers which have attention and feedforward sublayers.
Transformer architecture.png

Attention

Attention is the main contribution of the transformer architecture.
Transformer attention.png
The attention block outputs a weighted average of values in a dictionary of key-value pairs.
In the image above:

  • \(\displaystyle Q\) represents queries (each query is a vector)
  • \(\displaystyle K\) represents keys
  • \(\displaystyle V\) represents values

The attention block can be represented as the following equation:

  • \(\displaystyle \operatorname{SoftMax}(\frac{QK^T}{\sqrt{d_k}})V\)

Encoder

The receives as input the input embedding added to a positional encoding.
The encoder is comprised of N=6 layers, each with 2 sublayers.
Each layer contains a multi-headed attention sublayer followed by a feed-forward sublayer.
Both sublayers are residual blocks.

Decoder

Resources

Guides and explanations

References