LSTM
See: Recurrent Neural Network - MIT 6.S191 2020, RNN and Transformers (MIT 6.S191 2022) for links to the lecture videos.
Key Concepts:
- Maintain a cell state separate from what is output
- Use gates to control the flow of information
- Forget gate gets rid of irrelevant information
- Store relevant information from current input
- Selectively update cell state
- Output gate returns a filtered version of the cell state
- Backpropagation through time with uninterrupted gradient flow
Figure: Uninterrupted flow of gradient through the cell state
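As a concrete sketch of these ideas, here is a minimal hand-rolled LSTM cell step in PyTorch (the class and variable names are illustrative; in practice torch.nn.LSTM or torch.nn.LSTMCell would be used):

    import torch
    import torch.nn as nn

    class LSTMCellSketch(nn.Module):
        # One linear layer produces all four gate pre-activations at once.
        def __init__(self, input_size, hidden_size):
            super().__init__()
            self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)

        def forward(self, x_t, h_prev, c_prev):
            z = self.gates(torch.cat([x_t, h_prev], dim=1))
            f, i, g, o = z.chunk(4, dim=1)
            f = torch.sigmoid(f)       # forget gate: drop irrelevant cell state
            i = torch.sigmoid(i)       # store (input) gate: how much new info to keep
            g = torch.tanh(g)          # candidate values for the cell state
            o = torch.sigmoid(o)       # output gate: filter the cell state
            c_t = f * c_prev + i * g   # selective, largely additive cell-state update
            h_t = o * torch.tanh(c_t)  # hidden state is a filtered view of the cell
            return h_t, c_t

The additive form of the cell-state update is what allows gradients to flow through time with little interruption.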
1. History of LSTM
Long short-term memory (LSTM) networks were invented by Hochreiter and Schmidhuber in 1997 and set accuracy records in multiple application domains.
LSTM broke records in machine translation, language modeling, and multilingual language processing. LSTM combined with convolutional neural networks (CNNs) improved automatic image captioning.
2. Forget Store Update Output
2.1. Forget
Decide what information is going to be thrown away from the cell state, depending on the previous hidden state \(h_{t-1}\) and the current input \(x_t\)
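In the standard formulation (with \(\sigma\) the sigmoid and \(W_f\), \(b_f\) learned parameters), the forget gate is

\[ f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \]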
2.2. Store
Decide what part of the new information is important and store it in the cell state
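In the standard formulation, an input gate \(i_t\) decides how much to store, while \(\tilde{C}_t\) proposes candidate values:

\[ i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), \qquad \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \]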
2.3. Update
Use the relevant parts of the prior information and the current input to selectively update the cell state values
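Combining the forget and store steps, the cell state is updated as

\[ C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \]

where \(\odot\) is elementwise multiplication; this largely additive update is what keeps gradient flow through the cell state uninterrupted.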
2.4. Output
Decide what information stored in the cell state is used to compute the hidden state carried over to the next time step
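In the standard formulation, the output gate \(o_t\) filters the cell state to produce the hidden state:

\[ o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \qquad h_t = o_t \odot \tanh(C_t) \]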
3. Links
- Handle sequences of different lengths for batch processing (see the sketch below): https://stackoverflow.com/questions/49466894/how-to-correctly-give-inputs-to-embedding-lstm-and-linear-layers-in-pytorch/49473068#49473068
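A minimal sketch of the packing approach from the linked answer (vocabulary size, layer dimensions, and the example batch are illustrative), assuming a zero-padded batch of token ids and the true sequence lengths:

    import torch
    import torch.nn as nn
    from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

    embedding = nn.Embedding(num_embeddings=1000, embedding_dim=32, padding_idx=0)
    lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)

    # Padded batch of token ids (batch size 2) with true lengths 4 and 2.
    batch = torch.tensor([[5, 7, 9, 2],
                          [4, 8, 0, 0]])
    lengths = torch.tensor([4, 2])

    # Packing lets the LSTM skip the padding positions entirely.
    packed = pack_padded_sequence(embedding(batch), lengths,
                                  batch_first=True, enforce_sorted=False)
    packed_out, (h_n, c_n) = lstm(packed)

    # Unpack back to a padded (batch, seq, hidden) tensor if needed.
    output, _ = pad_packed_sequence(packed_out, batch_first=True)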