Long short-term memory

Primarily used for time-series or other sequential data
Previously state of the art for NLP tasks, but has since been surpassed by the Transformer

See this video for an explanation:
https://www.youtube.com/watch?v=XymI5lluJeU

Architecture

The LSTM architecture has two memory components:

A long-term memory \(\displaystyle c\) (the cell state)
A short-term memory \(\displaystyle h\) (the hidden state)

On top of the layer found in a traditional RNN, the architecture adds the following gates:

A forget gate for the long-term memory (sigmoid 1)
An input gate for the long-term memory (sigmoid 2)
An output gate for the short-term memory/output
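
The gates above can be sketched as a single time step of a standard LSTM cell. This is a minimal illustration in plain Python, not a reference implementation: the function names (`lstm_step`, `init_params`, `affine`), the parameter layout, and the label `g` for the tanh candidate (the part inherited from the traditional RNN) are assumptions made for this example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, b, v):
    """Return W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a standard LSTM cell (illustrative sketch).

    x      -- input vector at this time step
    h_prev -- previous short-term memory h
    c_prev -- previous long-term memory c
    params -- hypothetical layout: maps "f", "i", "g", "o" to a (W, b) pair
              applied to the concatenation [h_prev, x]
    """
    z = h_prev + x  # list concatenation of short-term memory and input
    f = [sigmoid(a) for a in affine(*params["f"], z)]    # forget gate
    i = [sigmoid(a) for a in affine(*params["i"], z)]    # input gate
    g = [math.tanh(a) for a in affine(*params["g"], z)]  # candidate values (RNN-style tanh layer)
    o = [sigmoid(a) for a in affine(*params["o"], z)]    # output gate
    # Long-term memory: keep part of the old value, add part of the candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # Short-term memory/output: gated, squashed view of the long-term memory.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases, for demonstration only."""
    rng = random.Random(seed)
    def layer():
        W = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden + n_in)]
             for _ in range(n_hidden)]
        return W, [0.0] * n_hidden
    return {name: layer() for name in ("f", "i", "g", "o")}

# Usage: run a two-step sequence through the cell.
params = init_params(n_in=2, n_hidden=3)
h, c = [0.0] * 3, [0.0] * 3
for x in [[1.0, 0.0], [0.0, 1.0]]:
    h, c = lstm_step(x, h, c, params)
```

Because the forget and input gates only scale and add to \(\displaystyle c\), the long-term memory can pass through many steps with little change, while \(\displaystyle h\) is recomputed from it at every step.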