# Long short-term memory

Primarily used for time-series or other sequential data.
Previously state-of-the-art for NLP tasks, but has since been surpassed by the Transformer (machine learning model).

See this video for an explanation:
https://www.youtube.com/watch?v=XymI5lluJeU

## Architecture

LSTM picture from Wikipedia

The LSTM architecture has two memory components:

• A long-term memory $$\displaystyle c$$ (the cell state)
• A short-term memory $$\displaystyle h$$ (the hidden state)

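Compared to a traditional RNN, the architecture adds the following gates:

• A forget gate for the long-term memory (sigmoid 1), which controls how much of the previous long-term memory is kept
• An input gate for the long-term memory (sigmoid 2), which controls how much new information is written to the long-term memory
• An output gate for the short-term memory/output, which controls how much of the long-term memory is exposed as the new short-term memory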
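
To make the roles of the gates concrete, here is a minimal sketch of a single LSTM time step in NumPy. It follows the standard LSTM formulation rather than any particular library. The names `lstm_step`, `W`, `b`, the stacking of all four weight blocks into one matrix, and the toy sizes are assumptions made for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step (illustrative sketch, not a library API).

    x:      input vector, shape (input_size,)
    h_prev: previous short-term memory, shape (hidden_size,)
    c_prev: previous long-term memory, shape (hidden_size,)
    W:      stacked weights, shape (4 * hidden_size, input_size + hidden_size)
    b:      stacked biases, shape (4 * hidden_size,)
    """
    n = h_prev.shape[0]
    # All gates see the current input and the previous short-term memory.
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0 * n:1 * n])  # forget gate: how much of c_prev to keep
    i = sigmoid(z[1 * n:2 * n])  # input gate: how much new content to write
    g = np.tanh(z[2 * n:3 * n])  # candidate content for the long-term memory
    o = sigmoid(z[3 * n:4 * n])  # output gate: how much of c to expose as h
    c = f * c_prev + i * g       # updated long-term memory
    h = o * np.tanh(c)           # updated short-term memory / output
    return h, c

# Run the cell over a short random sequence (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden_size, input_size + hidden_size))
b = np.zeros(4 * hidden_size)
h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):
    h, c = lstm_step(x, h, c, W, b)
```

Note that the long-term memory $$\displaystyle c$$ is only ever scaled by the forget gate and added to, which is what lets information persist across many time steps.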