Information Theory

Shannon Entropy

Shannon Entropy H(X) of a distribution is the expected amount of information in an event drawn from that distribution. It gives a lower bound on the average number of bits needed to encode symbols drawn from the distribution P.

H(X) = -\sum_{x \in X} p(x) \log_2 [p(x)]

The Shannon Entropy therefore measures the average uncertainty, or surprise, of outcomes drawn from the distribution.
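
As a minimal sketch of the formula above, the snippet below computes the entropy in bits of a discrete distribution given as a probability vector. The helper name shannon_entropy and the example probabilities are illustrative assumptions, not taken from the text.

```python
import numpy as np

def shannon_entropy(p):
    """Entropy in bits of a discrete distribution given as a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # skip zero-probability outcomes (0 * log 0 = 0)
    return -np.sum(p * np.log2(p))

# A fair coin carries 1 bit of information per flip; a biased coin carries less.
print(shannon_entropy([0.5, 0.5]))   # 1.0
print(shannon_entropy([0.9, 0.1]))   # ~0.469
```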

Kullback-Leibler Divergence

Kullback-Leibler Divergence KL is a method for measuring the dissimilarity between two probability distributions P and Q. It can also be seen as the Relative Entropy between the two distributions.

KL(P \,||\, Q) = \sum_{i=1}^{N} P(i) \log \left[ \frac{P(i)}{Q(i)} \right]
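The sum translates directly into a short NumPy sketch. The name kl_divergence and the example distributions are assumptions for illustration; the code also assumes Q(i) > 0 wherever P(i) > 0.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P || Q) in nats for discrete distributions p and q over the same support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                      # terms with P(i) = 0 contribute nothing
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = [0.4, 0.4, 0.2]
q = [0.3, 0.3, 0.4]
print(kl_divergence(p, q))           # not symmetric: KL(P||Q) != KL(Q||P)
print(kl_divergence(q, p))
```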

Mutual Information

Mutual Information MI between two random variables X and Y measures the dissimilarity between the joint distribution p(X, Y) and the factored distribution p(X)p(Y). Mutual Information also measures the reduction in uncertainty about one variable given a known value of the other variable.

MI(X, Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \left[ \frac{p(x, y)}{p(x)\, p(y)} \right]
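A sketch of the double sum, assuming the joint distribution is supplied as a probability table with rows indexed by x and columns by y. The name mutual_information and the toy tables are illustrative assumptions.

```python
import numpy as np

def mutual_information(joint):
    """MI(X, Y) in nats from a joint probability table p(x, y) (rows = x, cols = y)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = joint > 0
    return np.sum(joint[mask] * np.log(joint[mask] / (px * py)[mask]))

# Independent variables give MI = 0; perfectly dependent variables give MI = H(X).
independent = np.outer([0.5, 0.5], [0.5, 0.5])
dependent = np.array([[0.5, 0.0],
                      [0.0, 0.5]])
print(mutual_information(independent))  # 0.0
print(mutual_information(dependent))    # ln(2) ~ 0.693
```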

Information Gain

Information Gain IG measures the reduction in entropy, or surprise, achieved by splitting a dataset according to a given value of a random variable.

IG(Y, X) = H(Y) - H(Y \mid X)
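
A sketch of this difference computed from a vector of labels y and a splitting feature x, as in a decision-tree split. The helper names (entropy, information_gain) and the toy data are assumptions for illustration.

```python
import numpy as np

def entropy(labels):
    """H(Y) in bits from a sequence of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y, x):
    """IG(Y, X) = H(Y) - H(Y | X), where x is the feature used to split y."""
    y, x = np.asarray(y), np.asarray(x)
    h_y_given_x = 0.0
    for value in np.unique(x):
        subset = y[x == value]                          # labels falling in this split
        h_y_given_x += len(subset) / len(y) * entropy(subset)
    return entropy(y) - h_y_given_x

# A feature that perfectly separates the labels recovers all of H(Y).
y = [0, 0, 1, 1]
x = ['a', 'a', 'b', 'b']
print(information_gain(y, x))   # 1.0
```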