In probability theory and information theory, the Kullback–Leibler divergence[1][2][3] (also information divergence, information gain, relative entropy, or KLIC) is a non-symmetric measure of the difference between two probability distributions P and Q. KL divergence measures the expected number of extra bits required to code samples from P when using a code based on Q rather than a code based on P. Typically P represents the "true" distribution of data, observations, or a precisely calculated theoretical distribution, while Q represents a theory, model, description, or approximation of P.
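For distributions on the same discrete sample space, this quantity has the standard closed form (stated here for the discrete case; the continuous case replaces the sum with an integral over densities):

D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}

With the logarithm taken to base 2 the value is measured in bits, which matches the coding interpretation above; taking the natural logarithm gives the value in nats.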
Although it is often intuited as a distance metric, the KL divergence is not a true metric: for example, the divergence from P to Q is not in general equal to the divergence from Q to P, and it does not satisfy the triangle inequality.
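A minimal Python sketch illustrating the asymmetry (the function name kl_divergence and the two example Bernoulli distributions are illustrative, not part of the original text):

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in bits for discrete distributions given as
    probability sequences; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.9, 0.1]  # Bernoulli(0.9)
q = [0.5, 0.5]  # Bernoulli(0.5)

print(kl_divergence(p, q))  # ~0.531 bits
print(kl_divergence(q, p))  # ~0.737 bits, so D(P||Q) != D(Q||P)
```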
KL divergence is a special case of a broader class of divergences called f-divergences. It was originally introduced by Solomon Kullback and Richard Leibler in 1951.