Cross Entropy
Intuition: the average message length (in bits) when events drawn from p are encoded with a code built for q. https://youtu.be/ErfnhcEV1O8?t=300
H(p, q) = - ∑ₓ p(x) log q(x)
p : the true distribution of events
q : the distribution used to encode events (or the model's predicted distribution); encoding event x costs -log q(x) bits
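A minimal sketch (not from the source) of the formula above: cross entropy in bits between two small discrete distributions, using base-2 logs. The example values of p and q are made up.

```python
import numpy as np

def cross_entropy(p, q):
    # -sum over x of p(x) * log2 q(x); log2 gives the result in bits.
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log2(q))

p = [0.5, 0.25, 0.25]   # true distribution of events
q = [0.25, 0.25, 0.5]   # distribution the code (or model) assumes

print(cross_entropy(p, q))   # 1.75 bits on average per event
```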
H(p,q) = H(p) + DKL(p || q)
i.e. cross entropy is always at least the entropy of the true distribution, H(p), and the gap between them is the Kullback-Leibler divergence (DKL).
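A quick check of the decomposition, assuming the same made-up p and q as above: H(p) plus DKL(p || q) should equal the cross entropy H(p, q).

```python
import numpy as np

def entropy(p):
    # H(p) = -sum p(x) * log2 p(x)
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log2(p))

def kl_divergence(p, q):
    # DKL(p || q) = sum p(x) * log2(p(x) / q(x))
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log2(p / q))

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

# H(p) = 1.5 bits, DKL(p || q) = 0.25 bits, so H(p, q) = 1.75 bits.
print(entropy(p) + kl_divergence(p, q))   # 1.75, matching cross_entropy(p, q)
```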