Probability Distributions and Bayesian Degrees of Belief
The video introduces probability distributions as functions mapping possible states to numerical degrees of belief, adopting a Bayesian interpretation (probabilities as quantified subjective belief) rather than a purely frequentist one (probabilities as long-run frequencies of repeated trials).
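As a minimal sketch (not from the video), a discrete distribution can be written as a plain mapping from states to degrees of belief; the weather states and numbers below are purely illustrative.

```python
# A discrete probability distribution as a mapping from possible
# states to Bayesian degrees of belief (illustrative values).
belief = {"sunny": 0.6, "cloudy": 0.3, "rainy": 0.1}

# Degrees of belief over all possible states must sum to 1.
assert abs(sum(belief.values()) - 1.0) < 1e-9
```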
Surprisal, Entropy, and Average Information Content
Entropy is introduced as the average surprise, or expected information content, of outcomes generated by a probability distribution. It builds on the surprisal of an individual event, the negative logarithm of its probability, so improbable events are more surprising and carry more information.
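A short sketch of both quantities, assuming a discrete distribution and base-2 logarithms so the units are bits; the distribution below is made up for illustration.

```python
import numpy as np

def surprisal(p):
    """Information content of an event with probability p, in bits."""
    return -np.log2(p)

def entropy(dist):
    """Average surprisal of outcomes drawn from dist: H(P) = -sum_x p(x) log2 p(x)."""
    p = np.asarray(dist, dtype=float)
    return float(np.sum(p * surprisal(p)))

p = [0.5, 0.25, 0.25]   # illustrative distribution
print(surprisal(0.25))  # 2.0 bits: rarer events are more surprising
print(entropy(p))       # 1.5 bits of average surprise
```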
Cross Entropy and Mismatch Between Reality and Models
Cross entropy quantifies the average surprise experienced when data are generated by a true distribution (P) but interpreted using a model distribution (Q). It is never smaller than the entropy of P, and the excess captures the cost of believing in the wrong model.
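Reusing the same illustrative distribution, cross entropy H(P, Q) = -sum_x p(x) log2 q(x) can be computed directly: when Q equals P it reduces to the entropy of P, and any mismatch raises the average surprise.

```python
import numpy as np

def cross_entropy(p, q):
    """Average surprise under model q when data come from p:
    H(P, Q) = -sum_x p(x) log2 q(x)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(-np.sum(p * np.log2(q)))

p = [0.5, 0.25, 0.25]   # true distribution (illustrative)
q = [1/3, 1/3, 1/3]     # mismatched uniform model

print(cross_entropy(p, p))  # 1.5 bits: equals the entropy of P
print(cross_entropy(p, q))  # ~1.585 bits: the wrong model adds surprise
```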
KL Divergence and Training Objectives in Generative Models
KL divergence emerges as the key quantity measuring how far a model distribution (Q) is from a target distribution (P), and thus as a natural training objective for generative models. Because the entropy of P is fixed by the data, minimizing cross entropy with respect to Q minimizes the KL divergence, which is why cross-entropy loss is the standard form of this objective.
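A sketch of the identity D_KL(P || Q) = H(P, Q) - H(P), using the same made-up distributions as above; the divergence is positive under a mismatched model and vanishes exactly when the model matches the target.

```python
import numpy as np

def kl_divergence(p, q):
    """Extra average surprise from modeling P with Q:
    D_KL(P || Q) = sum_x p(x) log2(p(x) / q(x))."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log2(p / q)))

p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]

# Equals cross entropy minus entropy (~1.585 - 1.5), so with P fixed,
# minimizing cross entropy in Q also minimizes the KL divergence.
print(kl_divergence(p, q))  # ~0.085 bits
print(kl_divergence(p, p))  # 0.0: no divergence from itself
```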