MIT 6.S191: Introduction to Deep Learning
Table of Contents
These are notes from the MIT 6.S191 (2022) lectures by:
- Alexander Amini
- Ava Soleimany
The list of lectures and their corresponding notes is:
- Lecture 1: Introduction to Deep Learning
- Lecture 2: RNN and Transformers
- Lecture 3: Convolutional Neural Networks
- Lecture 4: Deep Generative Modeling
- Lecture 5: Deep Reinforcement Learning
- Lecture 6: Deep Learning New Frontiers
Lecture 1: Introduction to Deep Learning: In which we explore what deep learning is, why it is popular now, how deep neural networks are trained, and some cautions on overfitting.
1. What is Deep Learning
@ 00:06:36
Figure 1: AI - ML - DL
- Artificial Intelligence: Any technique that enables computers to mimic human behaviour
- Machine Learning: Ability to learn without explicitly being programmed
- Deep Learning: Extract patterns from data using neural networks
2. Why is it popular now?
@ 00:12:15
- Big Data
- Faster Hardware
- Software
- Improved techniques
- New Models
- Toolboxes (TensorFlow, PyTorch)
3. Training Deep Neural Networks
3.1. Optimization Algorithms:
- SGD
- Adam
- Adadelta
- Adagrad
- RMSProp
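All of these are available off the shelf in the toolboxes mentioned earlier. Below is a minimal TensorFlow sketch, assuming the tf.keras API, of how one of these optimizers is chosen and used in a single gradient-descent step; the model, loss function, and data are placeholders, not code from the lecture.

```python
import tensorflow as tf

# The optimizers listed above, as exposed by tf.keras.optimizers.
# The learning rate is the main hyperparameter they all share.
optimizers = {
    "SGD":      tf.keras.optimizers.SGD(learning_rate=1e-2),
    "Adam":     tf.keras.optimizers.Adam(learning_rate=1e-3),
    "Adadelta": tf.keras.optimizers.Adadelta(learning_rate=1.0),
    "Adagrad":  tf.keras.optimizers.Adagrad(learning_rate=1e-2),
    "RMSProp":  tf.keras.optimizers.RMSprop(learning_rate=1e-3),
}

# One generic training step: compute the loss, take gradients with respect
# to the weights, and let the chosen optimizer apply the update.
def train_step(model, optimizer, loss_fn, x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```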
3.2. Learning Rate
@ 00:39:35
- Low learning rate: slow convergence, and the optimization may get stuck in local minima
- Large learning rate: the optimization may overshoot and diverge
How to find a good learning rate?
- Try different learning rates and check which works best (a small sketch follows this list)
- Use an adaptive learning rate that adjusts as training progresses
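As a rough illustration of both approaches, here is a hedged TensorFlow sketch: the candidate learning-rate values and the `make_model` factory are hypothetical placeholders, not values from the lecture.

```python
import tensorflow as tf

# Option 1: try several fixed learning rates and keep the one with the
# lowest validation loss (the candidate values below are just examples).
candidate_lrs = [1e-1, 1e-2, 1e-3, 1e-4]

def val_loss_for(lr, make_model, train_ds, val_ds):
    model = make_model()  # hypothetical factory returning a fresh, uncompiled model
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr),
                  loss="sparse_categorical_crossentropy")
    history = model.fit(train_ds, validation_data=val_ds, epochs=5, verbose=0)
    return min(history.history["val_loss"])

# Option 2: use an adaptive optimizer (e.g. Adam), which scales the step size
# per parameter, so a single starting learning rate usually suffices.
adaptive_opt = tf.keras.optimizers.Adam(learning_rate=1e-3)
```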
3.3. Mini Batches
- The actual loss is a sum over the entire dataset, which is expensive to compute.
- Using only one example gives a very noisy gradient estimate.
- So, compute the loss on a subset of the dataset with, say, \(B\) samples. This is called mini-batching (the resulting gradient estimator is written out after this list).
This allows:
- Smoother convergence
- Larger learning rate
- Parallelization of gradient computation
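In symbols, the mini-batch gradient estimate is the standard mini-batch SGD estimator, where \(J_k(W)\) denotes the loss on the \(k\)-th sample of the batch:

\[
\frac{\partial J(W)}{\partial W} \;\approx\; \frac{1}{B} \sum_{k=1}^{B} \frac{\partial J_k(W)}{\partial W}
\]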
3.4. Overfitting
@ 00:44:57 Overfitting gives good performance on the training data, but the model does not generalize well: it performs poorly on the test dataset, or when there is a distributional shift in the data.
Figure 2: Overfitting
3.4.1. Regularization
Regularization is a technique that constrains our optimization problem to discourage complex models. This improves the generalization of the model on unseen data.
Techniques for Regularization
- Dropout: randomly set activations of hidden-layer neurons to 0 during training
- Early Stopping: stop training before the model has a chance to overfit (both techniques are sketched in code after Figure 3)
Figure 3: Early Stopping (Regularization)
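Both techniques are available out of the box in the toolboxes mentioned in Section 2. The snippet below is a minimal sketch assuming the tf.keras API; the layer sizes, the dropout rate of 0.5, and the commented-out `train_ds`/`val_ds` datasets are hypothetical.

```python
import tensorflow as tf

# Dropout: tf.keras.layers.Dropout(rate) randomly zeroes activations of the
# previous layer during training (rate = 0.5 here is only an example value).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Early stopping: halt training when the validation loss stops improving,
# and restore the weights from the best epoch seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[early_stop])
```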