The Shannon Entropy of a distribution $P$ is the expected amount of information in an event drawn from that distribution. It gives a lower bound on the average number of bits needed to encode symbols drawn from $P$.
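For a discrete distribution $P$ with probabilities $p_i$, this is usually written (with a base-2 logarithm, so the result is in bits) as:

$$H(P) = -\sum_i p_i \log_2 p_i$$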
The Shannon Entropy measures:
the degree of uncertainty in the value of a random variable
the amount of "surprise" in observing a particular outcome
how much information is conveyed, on average, by a draw from the distribution
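As a minimal numeric sketch of the definition above (assuming NumPy; the base-2 logarithm gives results in bits, and the function name is just illustrative):

```python
import numpy as np

def shannon_entropy(p, base=2):
    """H(P) = -sum_i p_i * log(p_i); base=2 gives the answer in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # outcomes with zero probability contribute nothing
    return -np.sum(p * np.log(p)) / np.log(base)

print(shannon_entropy([0.5, 0.5]))  # fair coin: 1.0 bit of uncertainty
print(shannon_entropy([0.9, 0.1]))  # biased coin: ~0.47 bits, less surprise on average
```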
Kullback-Leibler Divergence
The Kullback-Leibler Divergence measures the dissimilarity between two probability distributions $P$ and $Q$. It can also be seen as the Relative Entropy between the two distributions.
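For discrete distributions this is usually written as:

$$D_{KL}(P \,\|\, Q) = \sum_i p_i \log \frac{p_i}{q_i}$$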
$D_{KL}(P \| Q) \ge 0$, and $D_{KL}(P \| Q) = 0$ iff $P = Q$
$D_{KL}(P \| Q)$ represents the amount of information lost when $Q$ is used to approximate $P$
In general, $D_{KL}(P \| Q) \neq D_{KL}(Q \| P)$: the divergence is not symmetric
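A small sketch of these properties (assuming NumPy and base-2 logarithms; the helper name is illustrative):

```python
import numpy as np

def kl_divergence(p, q, base=2):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) / np.log(base)

p, q = [0.5, 0.5], [0.9, 0.1]
print(kl_divergence(p, q))  # ~0.74 bits
print(kl_divergence(q, p))  # ~0.53 bits: not symmetric
print(kl_divergence(p, p))  # 0.0: zero exactly when the distributions match
```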
Mutual Information
The Mutual Information between two random variables $X$ and $Y$ measures the dissimilarity between the joint distribution $P(X, Y)$ and the factored distribution $P(X)P(Y)$. Mutual Information also measures the reduction in uncertainty about one variable given a known value of the other.
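In the discrete case this is the KL Divergence between the joint distribution and the product of the marginals:

$$I(X; Y) = D_{KL}\big(P(X, Y) \,\|\, P(X)P(Y)\big) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}$$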
$I(X; Y) = 0$ iff $X$ and $Y$ are independent
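A small sketch computing this from a table of joint probabilities (assuming NumPy; the 2x2 tables are toy examples):

```python
import numpy as np

def mutual_information(joint, base=2):
    """I(X; Y) = D_KL(P(X, Y) || P(X)P(Y)); `joint` is a 2-D table of joint probabilities."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)  # marginal P(X)
    py = joint.sum(axis=0, keepdims=True)  # marginal P(Y)
    factored = px * py                     # what the joint would be under independence
    mask = joint > 0
    return np.sum(joint[mask] * np.log(joint[mask] / factored[mask])) / np.log(base)

# Perfectly correlated: knowing X removes all uncertainty about Y.
print(mutual_information([[0.5, 0.0],
                          [0.0, 0.5]]))  # 1.0 bit

# Independent: the joint already factors, so the mutual information is zero.
print(mutual_information([[0.25, 0.25],
                          [0.25, 0.25]]))  # 0.0
```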
Information Gain
Information Gain measures the reduction in entropy (or surprise) achieved by splitting a dataset on a given value of a random variable.
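A common form is $IG(S, A) = H(S) - \sum_v \frac{|S_v|}{|S|} H(S_v)$, where $S_v$ is the subset of $S$ for which the splitting variable $A$ takes value $v$. A minimal sketch under that definition (assuming NumPy; the labels and feature values are a made-up toy dataset):

```python
import numpy as np

def entropy(labels):
    """Entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature):
    """IG = H(labels) - sum over values v of P(feature == v) * H(labels where feature == v)."""
    labels, feature = np.asarray(labels), np.asarray(feature)
    weighted_child_entropy = sum(
        (feature == v).mean() * entropy(labels[feature == v])
        for v in np.unique(feature)
    )
    return entropy(labels) - weighted_child_entropy

# Toy split: the feature separates the classes perfectly,
# so the gain equals the full parent entropy of 1 bit.
labels  = ["yes", "yes", "no", "no"]
feature = ["a",   "a",   "b",  "b"]
print(information_gain(labels, feature))  # 1.0
```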