Next: An Algebraic Interpretation Up: Resource Letter ITP:1: Information Previous: A Third Thread

The Physical Connection

The principal rigorous connection of information theory to physics came somewhat indirectly, with the realization by Edwin Jaynes that Shannon had actually uncovered a fundamental element of probability theory. Namely, the measure of Eq.(2) can be interpreted as describing a property of any probability distribution. Whereas Shannon envisioned the set $\{p_i\}$ as given in communication theory, Jaynes turned the interpretation around to utilize available information to determine the probabilities. In this sense, the entropy of a probability distribution on an exhaustive set of mutually exclusive alternatives ($\sum_i p_i = 1$) is defined as the functional

$$S = -K \sum_i p_i \ln p_i \tag{6}$$

In this form S represents the uncertainty in a probability distribution, or the information required to render the distribution unique. In all but the most trivial problems of science one rarely has sufficient information to construct a unique probability assignment in the same sense that declaration of an honest coin unambiguously assigns $(\tfrac{1}{2}, \tfrac{1}{2})$ to the possible choices. The entropy of Eq.(6) provides a quantitative measure of just how much information is required to accomplish this in general.

A short digression is in order here to point out that Khinchin also clearly understood in 1953 that Shannon's entropy was a fundamental element of probability theory (Ref.6). He writes, "There is no doubt that in the years to come the study of entropy will become a permanent part of probability theory; ..." He applied information theory in this sense to Markov chains in some detail, but does not seem to have taken the probability theory connection much further.

Jaynes went on to enunciate a principle of maximum entropy (PME), which can be phrased as follows (Ref.31): the distribution $\{p_i\}$ that maximizes S subject to constraints imposed by the available information is the least biased description of what we know about the set of alternatives $\{x_i\}$. The PME is a rule for rational inference that provides a variational procedure for constructing prior probabilities based on given evidence. On the one hand, if that evidence implies nothing more than that the alternatives are equally probable, the only constraint is that $\sum_i p_i = 1$ and maximization of S yields the uniform distribution {1/n, ..., 1/n}. In this event $S = K \ln n$, the missing information is maximal, and one can make no definite predictions. On the other hand, the evidence may indicate that one alternative is certain, rendering all others impossible, in which case S=0 and there is no uncertainty whatsoever. The bulk of scientific inference lies somewhere in between, where one must generally cope with incomplete information. A direct proof that any choice of information measure other than (6) will lead to inconsistencies if pursued far enough, and that the PME is essentially unique, was subsequently provided by Shore and Johnson (Ref.37).
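The two limiting cases just described can be checked numerically. The following is a minimal sketch, assuming the entropy has the form S = -K sum_i p_i ln p_i with K = 1 and the usual convention 0 ln 0 = 0; the function name `shannon_entropy` and the sample distributions are illustrative choices, not taken from the references.

```python
import math

def shannon_entropy(p, K=1.0):
    """Return S = -K * sum_i p_i ln p_i, using the convention 0 ln 0 = 0."""
    if any(x < 0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a normalized probability distribution")
    return -K * sum(x * math.log(x) for x in p if x > 0)

n = 4
uniform = [1.0 / n] * n          # only the normalization constraint is known
certain = [1.0, 0.0, 0.0, 0.0]   # one alternative is certain
partial = [0.7, 0.1, 0.1, 0.1]   # incomplete information (illustrative values)

S_uniform = shannon_entropy(uniform)   # equals K ln n, the maximum
S_certain = shannon_entropy(certain)   # equals 0, no uncertainty
S_partial = shannon_entropy(partial)   # lies strictly in between
```

Any normalized distribution on n alternatives gives a value between these two extremes, which is the sense in which most inference problems fall "somewhere in between."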

As an aside, we note a slight generalization of Shannon's measure introduced by Kullback (Ref.7):

$$S(p \mid q) = -K \sum_i p_i \ln \frac{p_i}{q_i}$$

which is interpreted as the entropy of the set $\{p_i\}$ relative to the set $\{q_i\}$, and sometimes called the cross-entropy. It is useful when part of the initial information consists of a set of prior probabilities, and it provides for a straightforward generalization to continuous distributions, since there can be no ambiguity regarding the basic measure on the space of alternatives.
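To illustrate in what sense this is a generalization, here is a small sketch. It assumes the sign convention -K sum_i p_i ln(p_i/q_i) with K = 1, where q denotes the set of prior probabilities; both the convention and the names `relative_entropy`, `p`, and `uniform_prior` are assumptions made for the example. With a uniform prior the relative entropy differs from Shannon's measure only by the constant ln n.

```python
import math

def relative_entropy(p, q, K=1.0):
    """Return -K * sum_i p_i ln(p_i / q_i) (assumed sign convention)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                raise ValueError("q_i must be positive wherever p_i > 0")
            total += pi * math.log(pi / qi)
    return -K * total

p = [0.5, 0.25, 0.25]            # illustrative distribution
uniform_prior = [1.0 / 3.0] * 3  # prior carrying no information

shannon = -sum(x * math.log(x) for x in p)
rel = relative_entropy(p, uniform_prior)   # equals shannon - ln 3
rel_self = relative_entropy(p, p)          # zero when p coincides with the prior
```

With this sign the quantity is never positive and vanishes only when the two sets coincide, so maximizing it selects the distribution closest to the prior that still satisfies the constraints.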

There is no logical reason at all to associate S with any physical quantity at this point, and the PME is first and foremost a rule of probability theory. But if one applies that theory to physical problems it is expected to take on physical (and maybe experimental) meaning, in the same way mathematical symbols do in any other theory. If it is applied to any other area of probable reasoning it takes on an appropriately significant meaning there. In making such applications, however, it is first necessary to express the available information in the form of mathematically well-defined constraints, and this procedure may not always be transparent.

In his 1957 papers (Refs.31,32) Jaynes made the compelling application to statistical mechanics and thermodynamics, having noted that the constraints provided by macroscopic information could be expressed as expectation values. He also observed that this was just the problem Gibbs had solved long ago in deriving his ensembles by minimizing his "average index of probability of phase" subject to constraints on average total energy, or that plus average total particle numbers. Gibbs gave no explanation for why this particular function should be minimized, but this procedure is exactly what we call the PME.
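As a concrete illustration of a constraint expressed as an expectation value, the sketch below maximizes the entropy subject to normalization and a fixed mean energy. The standard Lagrange-multiplier result is the canonical form p_i = exp(-beta E_i)/Z, and the multiplier beta is found here by bisection. The energy levels, the target mean, and the function names are hypothetical choices for the example only.

```python
import math

def canonical(energies, beta):
    """Maximum-entropy distribution for a mean-energy constraint: p_i = exp(-beta E_i) / Z."""
    weights = [math.exp(-beta * e) for e in energies]
    Z = sum(weights)  # partition function
    return [w / Z for w in weights]

def mean_energy(energies, beta):
    return sum(pi * e for pi, e in zip(canonical(energies, beta), energies))

def solve_beta(energies, target, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisect for beta; the mean energy decreases monotonically as beta grows."""
    if not (min(energies) < target < max(energies)):
        raise ValueError("target must lie strictly between the extreme energies")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_energy(energies, mid) > target:
            lo = mid   # mean too high, so beta must be larger
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

energies = [0.0, 1.0, 2.0, 3.0]    # hypothetical energy levels
target = 1.0                       # hypothetical measured mean energy
beta = solve_beta(energies, target)
p_maxent = canonical(energies, beta)

# Another distribution with the same mean energy, for comparison.
p_other = [0.4, 0.3, 0.2, 0.1]
S_maxent, S_other = entropy(p_maxent), entropy(p_other)
```

Both distributions reproduce the same mean energy, but the canonical one has the larger entropy; it commits to nothing beyond the stated constraint.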

With this interpretation of Shannon's information measure, along with the PME, Jaynes and others have clarified considerably the foundations of statistical mechanics, relating it ultimately to a problem of information in a way that seems to have been appreciated implicitly by the founders over a century ago. That is, S measures the degree of information about the microstate conveyed by data on macroscopic thermodynamic variables. For equilibrium systems the entropy (6) and the probabilities become equivalent to the canonical ensemble of Gibbs, with K being identified with Boltzmann's constant k. Because the canonical ensemble is known to predict experimental values, one can now safely relate the theoretical (maximum) entropy to the experimental entropy of Clausius. Quantum mechanically one employs the density matrix $\rho$ and von Neumann's form of the entropy,

$$S = -K \, \mathrm{Tr}(\rho \ln \rho) \tag{7}$$

Maximization of S subject to available information yields the least-biased probability assignment over the quantum states of the system. Since the theoretical function S in the form (7) is invariant under unitary transformation, it is often argued that this expression cannot describe the second law. But Jaynes (Ref.34) later demonstrated that, in fact, it is just this property that allows one to derive the second law, which is a statement about experimental entropy.
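The invariance under unitary transformation can be seen in a small numerical sketch. It assumes von Neumann's form S = -K Tr(rho ln rho) with K = 1 and restricts attention to a real symmetric 2x2 density matrix, for which the eigenvalues are available in closed form and a plane rotation serves as a simple (real) unitary transformation. The matrix entries, the rotation angle, and the function names are illustrative assumptions.

```python
import math

def von_neumann_entropy_2x2(rho, K=1.0):
    """S = -K Tr(rho ln rho) for a real symmetric 2x2 density matrix, via its eigenvalues."""
    (a, b), (_, d) = rho
    mean = 0.5 * (a + d)
    half_gap = 0.5 * math.sqrt((a - d) ** 2 + 4.0 * b * b)
    eigenvalues = [mean + half_gap, mean - half_gap]
    # Eigenvalues at (numerical) zero contribute nothing: 0 ln 0 = 0.
    return -K * sum(x * math.log(x) for x in eigenvalues if x > 1e-15)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rotate(rho, theta):
    """Return U rho U^T, where U is a rotation by theta (real orthogonal, hence unitary)."""
    c, s = math.cos(theta), math.sin(theta)
    U = [[c, -s], [s, c]]
    Ut = [[c, s], [-s, c]]
    return matmul(matmul(U, rho), Ut)

rho = [[0.8, 0.0], [0.0, 0.2]]   # illustrative mixed state, diagonal in this basis
rho_rot = rotate(rho, 0.7)       # same state expressed after a unitary transformation

S_before = von_neumann_entropy_2x2(rho)
S_after = von_neumann_entropy_2x2(rho_rot)   # agrees with S_before up to rounding
```

The rotated matrix has nonzero off-diagonal elements, yet its eigenvalues, and therefore its entropy, are unchanged.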

A large portion of the subsequent involvement of information theory with problems of physics stems from the maximum-entropy formalism. In addition, there have been numerous other uses of information-theoretic concepts in physics not directly related to the PME, many of which are noted below. Prior to surveying these applications, though, there is another path emanating from the Wiener-Shannon formulation that requires explication.





W.T. Grandy Jr.
Wed Nov 20 17:12:25 GMT-0600 1996