On the monotonicity of Shannon entropy

This post continues my series on favorite theorems of the 21st century. For an overview of the categories and earlier selections, see this post.

My choice for 2003 in probability theory is a theorem of Artstein, Ball, Barthe and Naor confirming Lieb’s conjecture on the monotonicity of Shannon entropy. The theorem was proved in 2003 and published in 2004 (Artstein et al. 2004).

A central result in probability theory is the central limit theorem (CLT). It states that if are independent and identically distributed random variables with finite mean and finite variance , then converges in distribution to the normal distribution . In the special case where the follow a Bernoulli distribution, this result goes back to de Moivre in 1733. A general form was proved by Lyapunov (Lyapunov 1901) in 1901.

Assume now, for simplicity, that . Then the CLT says that the normalized sums converge in distribution to a normal random variable, no matter what the original distribution of the looks like, provided the variance is finite. The question behind Lieb’s conjecture is whether this convergence can be made monotone: is , in a precise sense, always closer to the normal distribution than ?

The sense of closeness in Lieb’s conjecture is given by Shannon entropy (Shannon 1948). For a real-valued random variable with density , its continuous Shannon entropy is whenever this integral is well defined. Among all real-valued random variables with a fixed variance, the normal distribution has the largest entropy. Thus, if the entropy of increases with , then the normalized sums become increasingly Gaussian from the entropic point of view.

This monotonicity was known in a special dyadic form before Lieb’s conjecture. Shannon (Shannon and Weaver 1949) outlined the argument that and Stam (Stam 1959) later gave a rigorous proof. Applying this inequality repeatedly gives Lieb (Lieb 1978) conjectured that the full sequence is monotone: Before the work of Artstein, Ball, Barthe and Naor, this conjecture remained open even for the first non-dyadic step, namely the comparison between and . Their theorem settled the conjecture completely.

The main difficulty in proving Theorem is that there is no obvious way to compare the sum of independent copies of a random variable with the sum of independent copies. The dyadic inequality avoids this problem because can be written as the normalized sum of two independent copies of . No such direct decomposition is available when one passes from to .

Artstein, Ball, Barthe and Naor overcame this obstacle by shifting the problem from entropy to Fisher information. For a random variable with smooth positive density , the Fisher information is Using de Bruijn’s identity, the desired entropy inequality can be reduced to the Fisher-information inequality They then proved a new formula for the Fisher information of a normalized sum. This formula is flexible enough to compare sums of different lengths, and it is precisely what makes the passage from to possible.

The result is beautiful because it strengthens the central limit theorem from a statement about eventual convergence to a statement about steady progress. The CLT tells us that the normalized sums become Gaussian in the limit. The theorem of Artstein, Ball, Barthe and Naor says that, measured by entropy, they move toward the Gaussian distribution at every single step.

References

Artstein, Shiri, Keith Ball, Franck Barthe, and Assaf Naor. 2004. “Solution of Shannon’s Problem on the Monotonicity of Entropy.” J. Amer. Math. Soc. 17 (4): 975–82.
Lieb, E. H. 1978. “Proof of Wehrl’s Entropy Conjecture.” Comm. Math. Phys. 62: 35–40.
Lyapunov, Aleksandr Mikhailovich. 1901. “A General Proposition of Probability Theory.” CR Acad. Sci. Paris 132: 814–15.
Shannon, Claude E. 1948. “A Mathematical Theory of Communication.” Bell System Technical Journal 27 (3): 379–423.
Shannon, Claude Elwood, and Warren Weaver. 1949. The Mathematical Theory of Communication. University of Illinois Press.
Stam, Aart J. 1959. “Some Inequalities Satisfied by the Quantities of Information of Fisher and Shannon.” Information and Control 2 (2): 101–12.

No comment found.

Add a comment

You must log in to post a comment.