The principal rigorous connection of information theory to physics came
somewhat indirectly, with the realization by Edwin Jaynes that Shannon
had actually uncovered a fundamental element of probability theory. Namely,
the measure of Eq.(2) can be interpreted as describing a property of any
probability distribution. Whereas Shannon envisioned the set $\{p_i\}$
as given in communication theory, Jaynes turned the interpretation
around to utilize available information to determine the probabilities.
In this sense, the entropy of a probability distribution $\{p_i\}$ on an exhaustive
set of $n$ mutually exclusive alternatives is defined as the functional

$$S = -K \sum_{i=1}^{n} p_i \ln p_i . \tag{6}$$

In this form S represents the uncertainty in a probability distribution,
or the information required to render the distribution unique. In all but
the most trivial problems of science one rarely has sufficient information
to construct a unique probability assignment in the same sense that declaration
of an honest coin unambiguously assigns the probabilities ($\frac{1}{2}$, $\frac{1}{2}$)
to the possible choices. The entropy of Eq.(6) provides a quantitative
measure of just how much information is required to accomplish this in
general.
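
As a concrete illustration of the entropy functional, the following minimal Python sketch evaluates $S = -K \sum_i p_i \ln p_i$ for a few two-alternative distributions. The function name `entropy`, the default $K = 1$, and the sample distributions are assumptions made here for illustration; they do not appear in the original text.

```python
import math

def entropy(probs, K=1.0):
    """Return S = -K * sum(p * ln p), with the term 0 * ln 0 taken as 0."""
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probs must be non-negative and sum to 1")
    return -K * sum(p * math.log(p) for p in probs if p > 0)

fair_coin = entropy([0.5, 0.5])    # honest coin: equals ln 2, the two-alternative maximum
biased_coin = entropy([0.9, 0.1])  # smaller value: less information is missing
certain = entropy([1.0, 0.0])      # one alternative certain: no missing information
```

The honest coin gives the largest value possible for two alternatives, and the value falls to zero as one alternative becomes certain.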
A short digression is in order here to point out that Khinchin also
clearly understood in 1953 that Shannon's entropy was a fundamental element
of probability theory (Ref.6). He writes, ``There is no doubt that in the
years to come the study of entropy will become a permanent part of probability
theory;" He applied information theory in this sense to Markov chains in
some detail, but does not seem to have taken the probability theory connection
much further.
Jaynes went on to enunciate a principle of maximum entropy (PME),
which can be phrased as follows (Ref.31): the distribution $\{p_i\}$
that maximizes S subject to constraints imposed by the available
information is the least biased description of what we know about the set
of alternatives. The PME is a rule for rational inference that provides a variational
procedure for constructing prior probabilities based on given evidence.
On the one hand, if that evidence implies nothing more than that the alternatives
are equally probable, the only constraint is the normalization $\sum_i p_i = 1$,
and maximization of S yields the uniform distribution $\{1/n, \ldots, 1/n\}$.
In this event $S = K \ln n$, the missing information is maximal, and one can make no definite predictions.
On the other hand, the evidence may indicate that one alternative is certain,
rendering all others impossible, in which case S=0 and there is
no uncertainty whatsoever. The bulk of scientific inference lies somewhere
in between, where one must generally cope with incomplete information.
A direct proof that any choice of information measure other than (6) will
lead to inconsistencies if pursued far enough, and that the PME is essentially
unique, was subsequently provided by Shore and Johnson (Ref.37).
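
The simplest case mentioned above, where normalization is the only constraint, can be worked out explicitly. The sketch below assumes the entropy has the form $S = -K \sum_i p_i \ln p_i$ with $K > 0$ and uses a Lagrange multiplier $\lambda$; the symbols $\mathcal{L}$ and $\lambda$ are introduced here for illustration only.

```latex
\begin{align}
\mathcal{L} &= -K \sum_{i=1}^{n} p_i \ln p_i
               - \lambda \Bigl( \sum_{i=1}^{n} p_i - 1 \Bigr), \\
\frac{\partial \mathcal{L}}{\partial p_j} &= -K (\ln p_j + 1) - \lambda = 0
  \;\Longrightarrow\; p_j = e^{-1 - \lambda / K}, \\
\sum_{j=1}^{n} p_j = 1 &\;\Longrightarrow\; p_j = \frac{1}{n}, \qquad
S_{\max} = -K \sum_{j=1}^{n} \frac{1}{n} \ln \frac{1}{n} = K \ln n .
\end{align}
```

Because $\partial^2 S / \partial p_j^2 = -K / p_j < 0$, the stationary point is a maximum. Further constraints add further multipliers in the same way.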
As an aside, we note a slight generalization of Shannon's measure introduced by Kullback (Ref.7):

$$S(p, q) = -K \sum_{i} p_i \ln \frac{p_i}{q_i} ,$$

which is interpreted as the entropy of the set $\{p_i\}$
relative to the set $\{q_i\}$, and sometimes called the cross-entropy. It is useful when part of the
initial information consists of a set of prior probabilities, and it provides
for a straightforward generalization to continuous distributions, since
there can be no ambiguity regarding the basic measure on the space of alternatives.
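
To make the role of the prior probabilities concrete, here is a small Python sketch of a relative-entropy measure. It assumes the sign convention $-K \sum_i p_i \ln(p_i/q_i)$, chosen so that the quantity is maximized like S; Kullback's own convention has the opposite sign. The function names and sample numbers are hypothetical.

```python
import math

def relative_entropy(p, q, K=1.0):
    """Return -K * sum(p_i * ln(p_i / q_i)); terms with p_i == 0 contribute 0."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    if any(pi > 0 and qi == 0 for pi, qi in zip(p, q)):
        raise ValueError("q_i must be positive wherever p_i is positive")
    return -K * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def shannon_entropy(p, K=1.0):
    """Return -K * sum(p_i * ln p_i) for comparison."""
    return -K * sum(pi * math.log(pi) for pi in p if pi > 0)

p = [0.5, 0.3, 0.2]          # hypothetical distribution
uniform = [1 / 3] * 3        # uniform prior over three alternatives

# With a uniform prior the measure is the Shannon entropy shifted by K ln n.
offset = shannon_entropy(p) - relative_entropy(p, uniform)
# The measure vanishes when the distribution coincides with the prior.
same = relative_entropy(p, p)
```

Under this convention the measure is never positive, and with a uniform prior it differs from Shannon's measure only by the constant $K \ln n$, which is the sense in which it generalizes that measure.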
There is no logical reason at all to associate S with any physical quantity at this point, and the PME is first and foremost a rule of probability theory. But if one applies that theory to physical problems it is expected to take on physical (and maybe experimental) meaning, in the same way mathematical symbols do in any other theory. If it is applied to any other area of probable reasoning it takes on an appropriately significant meaning there. In making such applications, however, it is first necessary to express the available information in the form of mathematically well-defined constraints, and this procedure may not always be transparent.
In his 1957 papers (Refs.31,32) Jaynes made the compelling application to statistical mechanics and thermodynamics, having noted that the constraints provided by macroscopic information could be expressed as expectation values. He also observed that this was just the problem Gibbs had solved long ago in deriving his ensembles by minimizing his ``average index of probability of phase" subject to constraints on average total energy, or that plus average total particle numbers. Gibbs gave no explanation for why this particular function should be minimized, but this procedure is exactly what we call the PME.
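
The following Python sketch illustrates the kind of calculation involved when the constraint is an average energy. Maximizing the entropy subject to normalization and a fixed mean energy gives probabilities of the form $p_i \propto e^{-\beta E_i}$, where $\beta$ is the Lagrange multiplier for the energy constraint. The three-level spectrum, the target mean, and the use of bisection to find $\beta$ are illustrative assumptions, not a procedure taken from Jaynes or Gibbs.

```python
import math

def canonical(energies, beta):
    """Return p_i = exp(-beta * E_i) / Z for the given multiplier beta."""
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def mean_energy(energies, beta):
    """Expectation value of the energy in the canonical distribution."""
    return sum(p * e for p, e in zip(canonical(energies, beta), energies))

def solve_beta(energies, target, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisect for beta; the mean energy decreases monotonically with beta."""
    if not mean_energy(energies, hi) < target < mean_energy(energies, lo):
        raise ValueError("target mean energy is outside the bracketed range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_energy(energies, mid) > target:
            lo = mid   # mean too high: a larger beta is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

energies = [0.0, 1.0, 2.0]   # hypothetical three-level system
target = 0.6                 # hypothetical measured average energy
beta = solve_beta(energies, target)
probs = canonical(energies, beta)
```

Any other distribution over the same levels with the same mean energy, for instance (0.5, 0.4, 0.1), has a lower entropy than `probs`.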
With this interpretation of Shannon's information measure, along with
the PME, Jaynes and others have clarified considerably the foundations
of statistical mechanics, relating it ultimately to a problem of information
in a way that seems to have been appreciated implicitly by the founders
over a century ago. That is, S measures the degree of information
about the microstate conveyed by data on macroscopic thermodynamic variables.
For equilibrium systems the entropy (6) and the probabilities become equivalent
to the canonical ensemble of Gibbs, with K being identified with
Boltzmann's constant k. Because the canonical ensemble is known
to predict experimental values, one can now safely relate the theoretical
(maximum) entropy to the experimental entropy of Clausius. Quantum mechanically
one employs the density matrix $\rho$ and von Neumann's form of the entropy,

$$S = -k \, \mathrm{Tr}(\rho \ln \rho) . \tag{7}$$

Maximization of S subject to available information yields the least-biased probability assignment over the quantum states of the system. Since the theoretical function S in the form (7) is invariant under unitary transformation, it is often argued that this expression cannot describe the second law. But Jaynes (Ref.34) later demonstrated that, in fact, it is just this property that allows one to derive the second law, which is a statement about experimental entropy.
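
The invariance under unitary transformation can be checked numerically in the smallest nontrivial case. The sketch below assumes the von Neumann form $S = -k\,\mathrm{Tr}(\rho \ln \rho)$ with $k = 1$, restricts attention to a real symmetric 2x2 density matrix, and uses a real rotation as a special case of a unitary transformation. The function names and the sample matrix are hypothetical.

```python
import math

def von_neumann_entropy_2x2(rho, k=1.0):
    """S = -k Tr(rho ln rho) for a real symmetric 2x2 matrix, via its eigenvalues."""
    (a, b), (c, d) = rho
    tr = a + d
    det = a * d - b * c
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    eigenvalues = [(tr + disc) / 2.0, (tr - disc) / 2.0]
    # Zero eigenvalues contribute nothing, since x ln x -> 0 as x -> 0.
    return -k * sum(x * math.log(x) for x in eigenvalues if x > 0)

def matmul(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][m] * y[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

def rotate(rho, theta):
    """Return U rho U^T for a real rotation U through the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    u = [[c, -s], [s, c]]
    u_t = [[c, s], [-s, c]]
    return matmul(matmul(u, rho), u_t)

rho = [[0.75, 0.0], [0.0, 0.25]]   # hypothetical mixed state
s_before = von_neumann_entropy_2x2(rho)
s_after = von_neumann_entropy_2x2(rotate(rho, 0.7))
```

The two values agree to rounding error because the transformation leaves the eigenvalues of $\rho$ unchanged.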
A large portion of the subsequent involvement of information theory with problems of physics stems from the maximum-entropy formalism. In addition, there have been numerous other uses of information-theoretic concepts in physics not directly related to the PME, many of which are noted below. Prior to surveying these applications, though, there is another path emanating from the Wiener-Shannon formulation that requires explication.