
Date: [2023-03-19 Sun]

Recurrent Neural Network - 6.S191 2020


1. Sequence Modelling

1.1. Using a Fixed Window won't work because it can't capture long-term dependencies

1.2. Use Entire Sequence as Set of Counts - Bag of Words

But a bag of words doesn't preserve order

1.3. Use a REALLY Big Fixed Window

mpv-screenshot0JVeei.png

Parameters are tied to window positions, so something learned at one position does not transfer when it appears at another position; the network effectively has to relearn it there.

2. Sequence Modeling: Design Criteria

To model sequences, we need to:

  1. Handle variable-length sequences
  2. Track long-term dependencies
  3. Maintain information about order
  4. Share parameters across the sequence

RNNs meet all of these design criteria.

3. Recurrent Neural Networks for Sequence Modeling

mpv-screenshotN68cMG.png

Recurrent because information is passed within the cell through time

RNNs maintain an internal state \(h_t\), updated from the previous state and the current input: \(h_t = f_W(h_{t-1}, x_t)\). The same function with the same set of parameters is applied at every time step, i.e. \(f_W\) stays fixed across time, and it is what we learn.

3.1. RNN State Update and Output

mpv-screenshot4gIooJ.png

There are three weight matrices: \(W_{hh}\) (hidden-to-hidden), \(W_{xh}\) (input-to-hidden), and \(W_{hy}\) (hidden-to-output). The state update and output are

\(h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)\) and \(\hat{y}_t = W_{hy} h_t\)
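
A minimal sketch of one such step in numpy, assuming toy dimensions (the variable names and sizes here are illustrative, not from the lecture):

import numpy as np

# Hidden size H and input size D are arbitrary choices for this sketch.
H, D = 4, 3
rng = np.random.default_rng(0)
W_hh = rng.normal(0, 0.1, (H, H))   # hidden-to-hidden
W_xh = rng.normal(0, 0.1, (H, D))   # input-to-hidden
W_hy = rng.normal(0, 0.1, (1, H))   # hidden-to-output

def rnn_step(h_prev, x):
    h = np.tanh(W_hh @ h_prev + W_xh @ x)   # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y = W_hy @ h                            # y_t = W_hy h_t
    return h, y

h = np.zeros(H)
for x in rng.normal(size=(5, D)):   # a toy sequence of five inputs
    h, y = rnn_step(h, x)           # same f_W, same weights at every time step

Note how the same three matrices are reused at every step; that is the parameter sharing from the design criteria above.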

4. Backpropagation Through Time

4.1. The Problem of Long-Term Dependencies: Vanishing Gradient

Backpropagating through time multiplies the gradient by a factor involving \(W_{hh}\) at every step, so across many time steps these repeated multiplications can shrink the gradient exponentially, and errors from distant time steps contribute almost nothing to learning.
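
A toy numpy illustration of the repeated-multiplication effect (this ignores the activation-derivative factors, so it is a sketch rather than the full BPTT derivation):

import numpy as np

rng = np.random.default_rng(0)
W_hh = rng.normal(0, 0.1, (8, 8))   # small weights -> largest singular value < 1
grad = rng.normal(size=8)
for t in range(50):
    grad = W_hh.T @ grad            # one chain-rule step back through time
    if (t + 1) % 10 == 0:
        print(t + 1, np.linalg.norm(grad))   # norm decays toward zero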

4.1.1. Trick 1: Activation Function - ReLU (derivative is 1 for positive inputs, 0 otherwise)

mpv-screenshotSyWU5h.png

4.1.2. Trick 2: Initialize the weights to the identity matrix and biases to zero
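
A short sketch of this initialization (the hidden size is an assumed example):

import numpy as np

H = 8
W_hh = np.eye(H)   # identity: the untrained cell initially copies its state forward
b = np.zeros(H)    # zero bias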

4.1.3. Trick 3: Gated Cells (LSTM, GRU, etc.) - Best

Use a more complex recurrent unit with gates to control what information is passed through.

5. LSTMs

Links: LSTM.

5.1. Standard RNN

mpv-screenshotkCtu5q.png

5.2. LSTMs

mpv-screenshot4K4faK.png

5.3. Information is added or removed through structures called gates: a sigmoid neural net layer followed by pointwise multiplication
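
In isolation, a gate looks like this (a hypothetical sketch; the weight names are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_g, b_g = rng.normal(0, 0.1, (4, 4)), np.zeros(4)
v = rng.normal(size=4)            # information to be filtered
gate = sigmoid(W_g @ v + b_g)     # per-component values in (0, 1): how much to keep
filtered = gate * v               # pointwise multiplication passes information selectively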

5.4. Forget Store Update Output

5.4.1. Forget

Decide what information is going to be thrown away from the cell state, based on the previous hidden state \(h_{t-1}\) and the current input \(x_t\).

mpv-screenshotwa7mQC.png

5.4.2. Store

Decide what part of the new information is important and store it in the cell state.

mpv-screenshotsJIKH0.png

5.4.3. Update

Use the relevant parts of prior information and the current input to selectively update the cell state values.

mpv-screenshotrQkDTz.png

5.4.4. Output

Decide what information stored in the cell state is used to compute the hidden state \(h_t\) that is carried over to the next time step.

mpv-screenshotblSzjU.png
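
Putting the four steps together, a minimal numpy sketch of one LSTM step (weight names and dimensions are assumptions for illustration; a real implementation would use a framework's fused cell):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H, D = 4, 3
rng = np.random.default_rng(0)
# One weight matrix and bias per gate, each acting on the concatenation [h_{t-1}; x_t].
W_f, W_i, W_g, W_o = (rng.normal(0, 0.1, (H, H + D)) for _ in range(4))
b_f = b_i = b_g = b_o = np.zeros(H)

def lstm_step(h_prev, c_prev, x):
    hx = np.concatenate([h_prev, x])
    f = sigmoid(W_f @ hx + b_f)    # forget: what to throw away from the cell state
    i = sigmoid(W_i @ hx + b_i)    # store: which new values to keep
    g = np.tanh(W_g @ hx + b_g)    #        candidate values
    c = f * c_prev + i * g         # update: selectively revise the cell state
    o = sigmoid(W_o @ hx + b_o)    # output: filter the cell state
    h = o * np.tanh(c)             # hidden state carried to the next time step
    return h, c

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):
    h, c = lstm_step(h, c, x)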

5.5. Uninterrupted flow of gradients through the cell state

mpv-screenshota0DSl0.png

5.6. LSTMs: Key Concepts

  1. Maintain a separate cell state from what is outputted
  2. Use gates to control the flow of information
    • Forget gate gets rid of irrelevant information
    • Store relevant information from current input
    • Selectively update cell state
    • Output gate returns a filtered version of the cell state
  3. Backpropagation through time with uninterrupted gradient flow
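
In practice one would use a framework's built-in cell rather than hand-rolling the gates; a minimal usage sketch with PyTorch's torch.nn.LSTM (the shapes here are arbitrary illustrations):

import torch

lstm = torch.nn.LSTM(input_size=3, hidden_size=4, batch_first=True)
x = torch.randn(2, 5, 3)                    # (batch, time, features)
output, (h_n, c_n) = lstm(x)                # output holds h_t for every step
print(output.shape, h_n.shape, c_n.shape)   # (2, 5, 4) (1, 2, 4) (1, 2, 4)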
