Long short-term memory
Primarily used for time-series or other sequential data.
Previously state of the art for NLP tasks, but it has since been surpassed by the Transformer architecture.
See this video for an explanation:
https://www.youtube.com/watch?v=XymI5lluJeU
Architecture
The LSTM architecture has two memory components:
- A long-term memory (the cell state) \(\displaystyle c\)
- A short-term memory (the hidden state) \(\displaystyle h\)
In addition to a traditional RNN's recurrence, the architecture has three gates, each computed with a sigmoid activation (see the equations and code sketch after this list):
- A forget gate, which decides what to erase from the long-term memory (sigmoid 1)
- An input gate, which decides what to write to the long-term memory (sigmoid 2)
- An output gate, which decides how much of the long-term memory to expose as the short-term memory/output
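Concretely, in the standard formulation, the updates at each time step \(\displaystyle t\) are (here \(\displaystyle \sigma\) is the sigmoid function, \(\displaystyle \odot\) is element-wise multiplication, and \(\displaystyle W_*, U_*, b_*\) are learned parameters):
\(\displaystyle f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)\) (forget gate)
\(\displaystyle i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)\) (input gate)
\(\displaystyle o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)\) (output gate)
\(\displaystyle \tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)\) (candidate memory)
\(\displaystyle c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\) (long-term memory update)
\(\displaystyle h_t = o_t \odot \tanh(c_t)\) (short-term memory/output)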
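The following is a minimal NumPy sketch of a single LSTM step following the equations above; the parameter layout (a dict of W, U, b triples per gate) is an illustrative choice for this sketch, not a reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step. `params` maps each gate to a
    hypothetical (W, U, b) triple of weights and bias."""
    W_f, U_f, b_f = params["f"]  # forget gate parameters
    W_i, U_i, b_i = params["i"]  # input gate parameters
    W_o, U_o, b_o = params["o"]  # output gate parameters
    W_c, U_c, b_c = params["c"]  # candidate memory parameters

    f = sigmoid(W_f @ x_t + U_f @ h_prev + b_f)  # sigmoid 1: what to keep in c
    i = sigmoid(W_i @ x_t + U_i @ h_prev + b_i)  # sigmoid 2: what to write to c
    o = sigmoid(W_o @ x_t + U_o @ h_prev + b_o)  # what to expose as h
    c_tilde = np.tanh(W_c @ x_t + U_c @ h_prev + b_c)

    c = f * c_prev + i * c_tilde  # update long-term memory
    h = o * np.tanh(c)            # update short-term memory / output
    return h, c

# Toy usage: input size 3, hidden size 4, random parameters.
rng = np.random.default_rng(0)
n_in, n_h = 3, 4
params = {k: (rng.standard_normal((n_h, n_in)),
              rng.standard_normal((n_h, n_h)),
              np.zeros(n_h)) for k in "fioc"}
h, c = np.zeros(n_h), np.zeros(n_h)
for x_t in rng.standard_normal((5, n_in)):  # a 5-step sequence
    h, c = lstm_step(x_t, h, c, params)
```

Note the key design choice: the long-term memory is updated additively (\(\displaystyle c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\)) rather than being repeatedly squashed through a nonlinearity, which helps gradients flow across many time steps and mitigates the vanishing-gradient problem of plain RNNs.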