===Long Short Term Memory (LSTMs)===
Goal: to mitigate the vanishing and exploding gradient problems of plain RNNs.
An LSTM cell has several gates (a single-step code sketch follows the list):
* Input gate: <math>i_t = \sigma(W_{xi}x_t + W_{hi}h_{t-1} + b_i)</math>
* Forget gate: <math>f_t = \sigma(W_{xf}x_t + W_{hf}h_{t-1} + b_f)</math>
* Candidate cell state: <math>\tilde{c}_t = \tanh(W_{xc}x_t + W_{hc}h_{t-1} + b_c)</math>
* Output gate: <math>o_t = \sigma(W_{xo}x_t + W_{ho}h_{t-1} + b_o)</math>
* Cell state: <math>c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t</math>
* Hidden state: <math>h_t = o_t \odot \tanh(c_t)</math>
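A minimal NumPy sketch of a single LSTM step, following the gate equations above. The <code>lstm_step</code> name, the weight dictionaries <code>W</code>/<code>b</code>, and the <code>sigmoid</code> helper are illustrative placeholders, not a reference implementation:
<syntaxhighlight lang="python">
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step; W and b hold the input/forget/candidate/output weights."""
    i_t = sigmoid(W["xi"] @ x_t + W["hi"] @ h_prev + b["i"])       # input gate
    f_t = sigmoid(W["xf"] @ x_t + W["hf"] @ h_prev + b["f"])       # forget gate
    c_tilde = np.tanh(W["xc"] @ x_t + W["hc"] @ h_prev + b["c"])   # candidate cell state
    o_t = sigmoid(W["xo"] @ x_t + W["ho"] @ h_prev + b["o"])       # output gate
    c_t = f_t * c_prev + i_t * c_tilde                             # new cell state
    h_t = o_t * np.tanh(c_t)                                       # new hidden state
    return h_t, c_t
</syntaxhighlight>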
;Bidirectional RNNs
The first LSTM reads the input sequence in its original order.
The second LSTM reads the input sequence in reverse order.
The outputs of both LSTMs are concatenated at each time step (see the sketch below).
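A rough sketch of the bidirectional wrapper, reusing the hypothetical <code>lstm_step</code> from the previous sketch; the helper names and the zero-initialised states are assumptions made for illustration:
<syntaxhighlight lang="python">
import numpy as np

def run_lstm(xs, W, b, hidden_size):
    """Run lstm_step over a whole sequence and collect the hidden states."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    outputs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, b)
        outputs.append(h)
    return outputs

def bidirectional_lstm(xs, W_fwd, b_fwd, W_bwd, b_bwd, hidden_size):
    """Forward pass on xs, backward pass on reversed xs, concatenate per time step."""
    fwd = run_lstm(xs, W_fwd, b_fwd, hidden_size)
    bwd = run_lstm(xs[::-1], W_bwd, b_bwd, hidden_size)[::-1]  # re-align to original order
    return [np.concatenate([f, bkw]) for f, bkw in zip(fwd, bwd)]
</syntaxhighlight>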
===Attention===
Goal: to help the model handle long source sentences in machine translation, rather than compressing the whole sentence into a single fixed-length vector.
;Encoder-decoder attention: at each decoding step, the decoder attends over the encoder's hidden states.
;Self-Attention: a sequence attends over its own positions.
===Transformer===
;Positional encoding: injects information about token position, since self-attention by itself is order-invariant.
;Self-Attention
Each token is projected into queries, keys, and values.
Multiply the queries with the keys (dot products, scaled by <math>\sqrt{d_k}</math>), pass the scores through a softmax, then multiply by the values: <math>\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\tfrac{QK^\top}{\sqrt{d_k}}\right)V</math>.
This yields the attention of every word with respect to every other word.
The original Transformer used <math>h = 8</math> attention heads, each with its own query, key, and value projections (a code sketch follows).
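A minimal NumPy sketch of scaled dot-product self-attention with multiple heads. The function names, the random projection matrices, and the <code>d_head</code> size are placeholders chosen for illustration:
<syntaxhighlight lang="python">
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # similarity of every query with every key
    weights = softmax(scores, axis=-1)        # attention of every word w.r.t. every other word
    return weights @ V

def multi_head_self_attention(X, n_heads=8, d_head=64, rng=np.random.default_rng(0)):
    """Project X into per-head queries/keys/values, attend, and concatenate the heads."""
    d_model = X.shape[-1]
    heads = []
    for _ in range(n_heads):
        W_q = rng.normal(size=(d_model, d_head))
        W_k = rng.normal(size=(d_model, d_head))
        W_v = rng.normal(size=(d_model, d_head))
        heads.append(scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v))
    return np.concatenate(heads, axis=-1)      # shape (seq_len, n_heads * d_head)
</syntaxhighlight>
In the full Transformer, the concatenated heads are passed through one more linear output projection; that step is omitted here for brevity.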
;Architecture
Encoder layers are stacked on top of each other; the decoder side is a similar stack of decoder layers.
==Misc==