Long short-term memory

Primarily used for time-series or sequential data.<br>
Previously state-of-the-art for NLP tasks, but it has since been surpassed by the [[Transformer (machine learning model)]].<br>
See this video for an explanation:<br>
[https://www.youtube.com/watch?v=XymI5lluJeU https://www.youtube.com/watch?v=XymI5lluJeU]
==Architecture==
[[File:The_LSTM_cell.png | thumb | 400px | LSTM picture from Wikipedia]]
The LSTM architecture has two memory components:
* A long-term memory <math>c</math> (the cell state)
* A short-term memory <math>h</math> (the hidden state, which also serves as the output)
The architecture adds the following gates to a traditional RNN (see the equations and code sketch below):
* A forget gate for the long-term memory (sigmoid 1 in the figure)
* An input gate for the long-term memory (sigmoid 2 in the figure)
* An output gate for the short-term memory/output
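Written out, one common formulation of the update at time step <math>t</math> is as follows (here <math>\sigma</math> is the sigmoid function, <math>\odot</math> is element-wise multiplication, <math>x_t</math> is the input, and the weights <math>W</math>, <math>U</math> and biases <math>b</math> are learned):
* Forget gate: <math>f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)</math>
* Input gate: <math>i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)</math>
* Output gate: <math>o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)</math>
* Candidate memory: <math>\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)</math>
* Long-term memory update: <math>c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t</math>
* Short-term memory/output: <math>h_t = o_t \odot \tanh(c_t)</math>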
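Below is a minimal NumPy sketch of a single LSTM step that mirrors the equations above. The function and parameter names (<code>lstm_step</code>, <code>W_f</code>, and so on) are illustrative, not taken from any particular library.
<syntaxhighlight lang="python">
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step.

    x: input vector; h_prev: previous short-term memory (hidden state);
    c_prev: previous long-term memory (cell state);
    params: dict of weight matrices W_*, U_* and bias vectors b_*.
    """
    # Forget gate (sigmoid 1): how much of the long-term memory to keep
    f = sigmoid(params["W_f"] @ x + params["U_f"] @ h_prev + params["b_f"])
    # Input gate (sigmoid 2): how much new information to write
    i = sigmoid(params["W_i"] @ x + params["U_i"] @ h_prev + params["b_i"])
    # Output gate: how much of the updated memory to expose as output
    o = sigmoid(params["W_o"] @ x + params["U_o"] @ h_prev + params["b_o"])
    # Candidate values for the long-term memory
    c_tilde = np.tanh(params["W_c"] @ x + params["U_c"] @ h_prev + params["b_c"])
    # Update long-term memory c, then derive short-term memory h from it
    c = f * c_prev + i * c_tilde
    h = o * np.tanh(c)
    return h, c

# Example usage with random weights (hidden size 4, input size 3)
rng = np.random.default_rng(0)
n_h, n_x = 4, 3
params = {}
for g in ("f", "i", "o", "c"):
    params[f"W_{g}"] = rng.standard_normal((n_h, n_x))
    params[f"U_{g}"] = rng.standard_normal((n_h, n_h))
    params[f"b_{g}"] = np.zeros(n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
for t in range(5):  # run the cell over a short random sequence
    h, c = lstm_step(rng.standard_normal(n_x), h, c, params)
</syntaxhighlight>
Note how the cell state <math>c</math> is only ever scaled (by the forget gate) and added to, which is what lets gradients flow over long sequences.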