
Date: [2023-03-19 Sun]

Recurrent Neural Network - 6.S191 2020


1. Sequence Modelling

1.1. Using a Fixed Window won't work because it can't capture long-term dependencies

1.2. Use Entire Sequence as Set of Counts - Bag of Words

But a bag of words doesn't preserve order

1.3. Use a REALLY Big Fixed Window

mpv-screenshot0JVeei.png

Parameters are tied to window positions, so something learned at one position does not transfer when it appears at another position; the network effectively has to relearn it there.

2. Sequence Modeling: Design Criteria

To model sequences, we need to:

  1. Handle variable-length sequences
  2. Track long-term dependencies
  3. Maintain information about order
  4. Share parameters across the sequence

RNNs meet all of these design criteria.

3. Recurrent Neural Networks for Sequence Modeling

mpv-screenshotN68cMG.png

Recurrent because information is passed within the cell through time

RNNs maintain an internal state \(h_t\), updated from the previous state and the current input: \(h_t = f_W(h_{t-1}, x_t)\). The same function with the same set of parameters is applied at every time step, i.e. \(f_W\) stays fixed across time, and it is what we learn.

3.1. RNN State Update and Output

mpv-screenshot4gIooJ.png

There are three weight matrices: \(W_{hh}\) (hidden-to-hidden), \(W_{xh}\) (input-to-hidden), and \(W_{hy}\) (hidden-to-output). The state update and output are

\(h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)\) and \(\hat{y}_t = W_{hy} h_t\)
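
A minimal sketch of one such step in numpy, assuming toy dimensions (the variable names and sizes here are illustrative, not from the lecture):

import numpy as np

# Hidden size H and input size D are arbitrary choices for this sketch.
H, D = 4, 3
rng = np.random.default_rng(0)
W_hh = rng.normal(0, 0.1, (H, H))   # hidden-to-hidden
W_xh = rng.normal(0, 0.1, (H, D))   # input-to-hidden
W_hy = rng.normal(0, 0.1, (1, H))   # hidden-to-output

def rnn_step(h_prev, x):
    h = np.tanh(W_hh @ h_prev + W_xh @ x)   # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y = W_hy @ h                            # y_t = W_hy h_t
    return h, y

h = np.zeros(H)
for x in rng.normal(size=(5, D)):   # a toy sequence of five inputs
    h, y = rnn_step(h, x)           # same f_W, same weights at every time step

Note how the same three matrices are reused at every step; that is the parameter sharing from the design criteria above.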

4. Backpropagation Through Time

4.1. The Problem of Long-Term Dependencies: Vanishing Gradient

Backpropagating through time multiplies the gradient by a factor involving \(W_{hh}\) at every step, so across many time steps these repeated multiplications can shrink the gradient exponentially, and errors from distant time steps contribute almost nothing to learning.
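
A toy numpy illustration of the repeated-multiplication effect (this ignores the activation-derivative factors, so it is a sketch rather than the full BPTT derivation):

import numpy as np

rng = np.random.default_rng(0)
W_hh = rng.normal(0, 0.1, (8, 8))   # small weights -> largest singular value < 1
grad = rng.normal(size=8)
for t in range(50):
    grad = W_hh.T @ grad            # one chain-rule step back through time
    if (t + 1) % 10 == 0:
        print(t + 1, np.linalg.norm(grad))   # norm decays toward zero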

4.1.1. Trick 1: Activation Function - ReLU (derivative is 1 for positive inputs, 0 otherwise)

mpv-screenshotSyWU5h.png

4.1.2. Trick 2: Initialize the weights to the identity matrix and biases to zero
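
A short sketch of this initialization (the hidden size is an assumed example):

import numpy as np

H = 8
W_hh = np.eye(H)   # identity: the untrained cell initially copies its state forward
b = np.zeros(H)    # zero bias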

4.1.3. Trick 3: Gated Cells (LSTM, GRU, etc.) - Best

Use a more complex recurrent unit with gates to control what information is passed through.

5. LSTMs

Links: LSTM.

5.1. Standard RNN

mpv-screenshotkCtu5q.png

5.2. LSTMs

mpv-screenshot4K4faK.png

5.3. Information is added or removed through structures called gates: a sigmoid neural net layer followed by pointwise multiplication
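
In isolation, a gate looks like this (a hypothetical sketch; the weight names are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_g, b_g = rng.normal(0, 0.1, (4, 4)), np.zeros(4)
v = rng.normal(size=4)            # information to be filtered
gate = sigmoid(W_g @ v + b_g)     # per-component values in (0, 1): how much to keep
filtered = gate * v               # pointwise multiplication passes information selectively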

5.4. Forget Store Update Output

5.4.1. Forget

Decide what information is going to be thrown away from the cell state, based on the previous hidden state \(h_{t-1}\) and the current input \(x_t\).

mpv-screenshotwa7mQC.png

5.4.2. Store

Decide what part of the new information is important and store it in the cell state.

mpv-screenshotsJIKH0.png

5.4.3. Update

Use the relevant parts of prior information and the current input to selectively update the cell state values.

mpv-screenshotrQkDTz.png

5.4.4. Output

Decide what information stored in the cell state is used to compute the hidden state \(h_t\) that is carried over to the next time step.

mpv-screenshotblSzjU.png
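
Putting the four steps together, a minimal numpy sketch of one LSTM step (weight names and dimensions are assumptions for illustration; a real implementation would use a framework's fused cell):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H, D = 4, 3
rng = np.random.default_rng(0)
# One weight matrix and bias per gate, each acting on the concatenation [h_{t-1}; x_t].
W_f, W_i, W_g, W_o = (rng.normal(0, 0.1, (H, H + D)) for _ in range(4))
b_f = b_i = b_g = b_o = np.zeros(H)

def lstm_step(h_prev, c_prev, x):
    hx = np.concatenate([h_prev, x])
    f = sigmoid(W_f @ hx + b_f)    # forget: what to throw away from the cell state
    i = sigmoid(W_i @ hx + b_i)    # store: which new values to keep
    g = np.tanh(W_g @ hx + b_g)    #        candidate values
    c = f * c_prev + i * g         # update: selectively revise the cell state
    o = sigmoid(W_o @ hx + b_o)    # output: filter the cell state
    h = o * np.tanh(c)             # hidden state carried to the next time step
    return h, c

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):
    h, c = lstm_step(h, c, x)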

5.5. Uninterrupted flow of gradients through the cell state

mpv-screenshota0DSl0.png

5.6. LSTMs: Key Concepts

  1. Maintain a separate cell state from what is outputted
  2. Use gates to control the flow of information
    • Forget gate gets rid of irrelevant information
    • Store relevant information from current input
    • Selectively update cell state
    • Output gate returns a filtered version of the cell state
  3. Backpropagation through time with uninterrupted gradient flow
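
In practice one would use a framework's built-in cell rather than hand-rolling the gates; a minimal usage sketch with PyTorch's torch.nn.LSTM (the shapes here are arbitrary illustrations):

import torch

lstm = torch.nn.LSTM(input_size=3, hidden_size=4, batch_first=True)
x = torch.randn(2, 5, 3)                    # (batch, time, features)
output, (h_n, c_n) = lstm(x)                # output holds h_t for every step
print(output.shape, h_n.shape, c_n.shape)   # (2, 5, 4) (1, 2, 4) (1, 2, 4)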
