Long short-term memory
Primarily used for time-series or sequential data.
Previously state of the art for NLP tasks, but it has since been surpassed by the Transformer (machine learning model).
See this video for an explanation:
https://www.youtube.com/watch?v=XymI5lluJeU
Architecture
The LSTM architecture has two memory components:
- A long-term memory (the cell state) \(\displaystyle c\)
- A short-term memory (the hidden state) \(\displaystyle h\)
On top of a traditional RNN cell, the architecture adds the following gates (sketched in the code below):
- A forget gate, which decides what to discard from the long-term memory (the first sigmoid)
- An input gate, which decides what to write into the long-term memory (the second sigmoid)
- An output gate, which decides what the short-term memory/output exposes
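
As a concrete illustration, here is a minimal NumPy sketch of a single LSTM step; it is not code from the original article, and names such as lstm_step, W_f, and U_f are assumptions for illustration. Each gate is a sigmoid over the current input and the previous short-term memory; the long-term memory is then updated as \(\displaystyle c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\) and the output as \(\displaystyle h_t = o_t \odot \tanh(c_t)\).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W, U, b each hold four parameter sets: forget (f), input (i),
    # output (o), and the candidate cell update (c).
    W_f, W_i, W_o, W_c = W
    U_f, U_i, U_o, U_c = U
    b_f, b_i, b_o, b_c = b

    f = sigmoid(W_f @ x + U_f @ h_prev + b_f)        # forget gate: what to erase from c
    i = sigmoid(W_i @ x + U_i @ h_prev + b_i)        # input gate: what to write into c
    o = sigmoid(W_o @ x + U_o @ h_prev + b_o)        # output gate: what h exposes
    c_tilde = np.tanh(W_c @ x + U_c @ h_prev + b_c)  # candidate content for the long-term memory

    c = f * c_prev + i * c_tilde   # update the long-term memory
    h = o * np.tanh(c)             # update the short-term memory / output
    return h, c

# Toy usage: a length-5 sequence of 8-dimensional inputs with a 16-unit cell.
d_in, d_hid = 8, 16
rng = np.random.default_rng(0)
W = [rng.standard_normal((d_hid, d_in)) * 0.1 for _ in range(4)]
U = [rng.standard_normal((d_hid, d_hid)) * 0.1 for _ in range(4)]
b = [np.zeros(d_hid) for _ in range(4)]
h = np.zeros(d_hid)
c = np.zeros(d_hid)
for x in rng.standard_normal((5, d_in)):
    h, c = lstm_step(x, h, c, W, U, b)
```

Processing a sequence is just a matter of feeding the returned \(\displaystyle h\) and \(\displaystyle c\) back into the next call, as in the toy loop above.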