* Maximum entropy - given what we know, the least surprising distribution consistent with that knowledge
* Conditional entropy
** <math>H(Y|X) = \sum\limits_{x \in X} \Pr(x) H(Y|X=x)</math>
* Chain rule of entropy
** <math>H(X, Y) = H(X) + H(Y|X)</math>
** Entropy of a pair of RVs = entropy of one + conditional entropy of the other (checked numerically in the sketch below)
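
A minimal numerical sketch of the conditional entropy and chain rule formulas above. The 2&times;2 joint distribution and all names are illustrative assumptions, not part of the original notes.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative joint distribution Pr(X = x, Y = y) over two binary RVs
# (rows = values of X, columns = values of Y); the numbers are arbitrary.
p_xy = np.array([[0.30, 0.20],
                 [0.10, 0.40]])

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_x = p_xy.sum(axis=1)                                   # marginal Pr(x)
h_y_given_x = sum(p_x[i] * entropy(p_xy[i] / p_x[i])     # sum_x Pr(x) H(Y|X=x)
                  for i in range(len(p_x)))

print("H(X,Y)        =", entropy(p_xy.flatten()))        # joint entropy
print("H(X) + H(Y|X) =", entropy(p_x) + h_y_given_x)     # chain rule: same value
</syntaxhighlight>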
===Divergence===
* Relative entropy
** A measure of "distance" between two distributions (not a true metric: it is not symmetric)
** A measure of the inefficiency of assuming that the distribution is q when the true distribution is p
** If we use distribution q to construct a code, we need <math>H(p) + D(p \| q)</math> bits on average to describe the RV
* Divergence - the additional uncertainty induced by using probabilities from one distribution to describe another distribution
* How we use information entropy to measure how far a model is from the target distribution (see the sketch below)
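
A short sketch of relative entropy and the coding interpretation above: coding samples from p with a code built for q costs <math>H(p) + D(p \| q)</math> bits per symbol on average (the cross-entropy). The distributions p and q are illustrative assumptions.

<syntaxhighlight lang="python">
import numpy as np

p = np.array([0.5, 0.25, 0.125, 0.125])  # true distribution (illustrative)
q = np.array([0.25, 0.25, 0.25, 0.25])   # assumed distribution used to build the code

entropy_p     = -np.sum(p * np.log2(p))     # H(p): best achievable average code length
kl_divergence = np.sum(p * np.log2(p / q))  # D(p || q): extra bits from assuming q
cross_entropy = -np.sum(p * np.log2(q))     # average code length when coding with q

print("H(p)           =", entropy_p)                  # 1.75 bits
print("D(p || q)      =", kl_divergence)              # 0.25 bits of overhead
print("H(p) + D(p||q) =", entropy_p + kl_divergence)  # 2.0 bits
print("cross-entropy  =", cross_entropy)              # also 2.0: matches H(p) + D(p||q)
</syntaxhighlight>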