Long short-term memory

Primarily used for time-series or sequential data<br>
Previously the state of the art for NLP tasks, but it has since been surpassed by the [[Transformer (machine learning model)]]
See this video for an explanation:<br>
[https://www.youtube.com/watch?v=XymI5lluJeU https://www.youtube.com/watch?v=XymI5lluJeU]
==Architecture==
[[File:The_LSTM_cell.png | thumb | 400px | LSTM picture from Wikipedia]]
The LSTM architecture has two memory components:
* A long term memory <math>c</math>, also called the cell state
* A short term memory <math>h</math>, also called the hidden state
In addition to the structure of a traditional RNN, the architecture has the following gates (see the equations and code sketch below):
* A forget gate for the long term memory (sigmoid 1)
* An input gate for the long term memory (sigmoid 2)
* An output gate for the short term memory/output (sigmoid 3)
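In the standard formulation, with <math>\sigma</math> the logistic sigmoid, <math>\odot</math> elementwise multiplication, and <math>[h_{t-1}, x_t]</math> the concatenation of the previous short term memory with the current input, one step of the cell computes:
:<math>
\begin{align}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) \\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{align}
</math>
The forget gate <math>f_t</math> and input gate <math>i_t</math> decide what to erase from and write to <math>c</math>, while the output gate <math>o_t</math> decides how much of <math>c</math> to expose as <math>h</math>.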

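Below is a minimal NumPy sketch of a single step, matching the equations above. The names <code>lstm_cell</code>, <code>W</code>, and <code>b</code> (with the four gate blocks stacked into one weight matrix) are illustrative choices for this sketch, not from any particular library.
<syntaxhighlight lang="python">
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h_prev, c_prev, W, b):
    # Illustrative layout: W has shape (4n, n + d) and b has shape (4n,),
    # stacking the forget, input, output, and candidate blocks;
    # n = len(h_prev) is the hidden size, d = len(x) the input size.
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0 * n:1 * n])  # forget gate (sigmoid 1): what to drop from c
    i = sigmoid(z[1 * n:2 * n])  # input gate (sigmoid 2): what to write to c
    o = sigmoid(z[2 * n:3 * n])  # output gate (sigmoid 3): what to expose as h
    g = np.tanh(z[3 * n:4 * n])  # candidate values for the long term memory
    c = f * c_prev + i * g       # update the long term memory (cell state)
    h = o * np.tanh(c)           # update the short term memory (output)
    return h, c

# Example: run a short random sequence through one cell.
rng = np.random.default_rng(0)
n, d = 8, 4                      # hidden size, input size
W = rng.normal(0.0, 0.1, (4 * n, n + d))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(5, d)):
    h, c = lstm_cell(x, h, c, W, b)
print(h.shape, c.shape)          # (8,) (8,)
</syntaxhighlight>
Because <math>c_t</math> is updated additively (scaled by the forget gate) rather than squashed through a nonlinearity at every step, gradients flow along it more easily than through a plain RNN's hidden state, which is what makes it the "long term" memory.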