makemore
makemore is a character-level language model.
1. Bigram Model
The bigram model uses the probability distribution of pairs of adjacent characters.
1:02:02 Model Smoothing: add a count of 1 to every count so that no probability is zero. This prevents infinity when taking the log likelihood.
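A minimal sketch of the counting approach with +1 smoothing, assuming the names.txt dataset used in makemore and a 27-character vocabulary (26 letters plus a '.' start/end token); details may differ from the video's exact code.

```python
import torch

words = open("names.txt").read().splitlines()
chars = sorted(set("".join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi["."] = 0  # '.' marks the start/end of a word

# Count all adjacent character pairs (bigrams)
N = torch.zeros((27, 27), dtype=torch.int32)
for w in words:
    cs = ["."] + list(w) + ["."]
    for c1, c2 in zip(cs, cs[1:]):
        N[stoi[c1], stoi[c2]] += 1

# Model smoothing: +1 so no probability is zero and log never hits -inf
P = (N + 1).float()
P = P / P.sum(dim=1, keepdim=True)  # each row is a probability distribution
```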
2. Neural Network
1:06:07 Takes a character as input and gives a probability distribution over the next character.
1:10:17 You can't plug an integer index into a neural network, 1:10:50 so use one-hot encoding.
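A quick sketch of that encoding step using PyTorch's F.one_hot (the index values here are arbitrary examples):

```python
import torch
import torch.nn.functional as F

xs = torch.tensor([0, 5, 13])                 # integer character indices
xenc = F.one_hot(xs, num_classes=27).float()  # shape (3, 27), ready for a linear layer
```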
1:22:59 Log of count = logit, i.e. the network's raw outputs are interpreted as log counts.
1:27:47 Softmax normalizes: it takes the logits, exponentiates them to get counts, then divides by the sum of the counts. The outputs of softmax are therefore between 0 and 1 and sum to 1, so softmax converts logits to probabilities.
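Written out in PyTorch (the logits here are random placeholders), the softmax step looks like this:

```python
import torch

logits = torch.randn(3, 27)
counts = logits.exp()                            # exponentiate: always positive, like counts
probs = counts / counts.sum(dim=1, keepdim=True) # normalize: rows sum to 1
# equivalent to torch.softmax(logits, dim=1)
```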
A neural network with a single weight matrix, i.e. Input Layer -> Output Layer -> Softmax -> Negative Log Likelihood, gives the same results as the bigram model.
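A sketch of that single-weight-matrix setup, assuming xs/ys are integer tensors of bigram (input, target) pairs; the toy tensors below are placeholders for pairs built from the real dataset as in the count model above.

```python
import torch
import torch.nn.functional as F

# Toy bigram (input, target) index pairs; in practice built from the dataset
xs = torch.tensor([0, 5, 13, 1])
ys = torch.tensor([5, 13, 1, 0])

g = torch.Generator().manual_seed(2147483647)
W = torch.randn((27, 27), generator=g, requires_grad=True)

for _ in range(100):
    xenc = F.one_hot(xs, num_classes=27).float()            # one-hot inputs
    logits = xenc @ W                                       # log counts
    counts = logits.exp()
    probs = counts / counts.sum(1, keepdim=True)            # softmax
    loss = -probs[torch.arange(len(ys)), ys].log().mean()   # negative log likelihood
    W.grad = None
    loss.backward()
    W.data += -50 * W.grad                                  # gradient descent step
```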
1:46:09 But a neural network can be made more complex and is more scalable than the bigram model. If we condition on the last 10 characters, the 10-gram count table would get way too large (1:47:33).
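As a rough check (assuming the 27-character vocabulary above): a count table indexed by the last 10 characters would need 27^10 ≈ 2 × 10^14 rows, which is why the counting approach doesn't scale while the neural network does.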