In the study of a many-body system the most impressive feature -- indeed, the essential definition of such a system -- is the lack of microscopic control that can be exercised over experimentally reproducible processes therein. An inability to control or even know microscopic initial conditions, or obtain any other microscopic information, forces us to employ probabilities in relating the deterministic microscopic physical behavior of the constituents to the thermodynamic description of the system. (In so-called chaotic systems one has also lost the ability to control the macroscopic initial conditions, but for reasons quite different from those associated with statistical mechanics -- it is the nonlinear macroscopic Hamiltonian, rather than the large number of particles, that is the culprit.) But the situation is worse than that, for there is rarely even enough macroscopic information available with which to determine a unique probability distribution over the states of the system. One is therefore compelled to seek a means for discriminating among all those possible distributions that may be in agreement with the meager information we have -- a problem solved long ago for equilibrium systems by Boltzmann and Gibbs.
A short digression is in order here, prior to further discussion of
this specific physical problem. Consider an experiment, of the
`random' type, for which there are $m$ possible results at each
trial, and thus for which there are $m^n$ conceivable outcomes in
$n$ trials. Each outcome yields a set of sample numbers $\{n_i\}$,
along with frequencies $\{f_i = n_i/n,\ 1 \le i \le m\}$.
If in $n$ trials the $i$th result occurs $n_i$ times,
then out of the $m^n$ possible outcomes the number of those
yielding a particular set of frequencies $\{f_i\}$ is given by the
multinomial coefficient, or multiplicity factor

$$W = \frac{n!}{n_1!\,n_2!\cdots n_m!} = \frac{n!}{(nf_1)!\,(nf_2)!\cdots(nf_m)!}. \tag{1}$$

We now ask for that set $\{f_i\}$ that can be realized in the
greatest number of ways, which means maximizing $W$ subject to any
constraints we may have upon the problem. At a minimum we require
that $\sum_{i=1}^{m} n_i = n$, or $\sum_{i=1}^{m} f_i = 1$.
For large $n$ it is useful to note that an equivalent procedure is
to maximize $n^{-1}\log W$, for Stirling's formula then encourages us to consider
the quantity

$$H = -\sum_{i=1}^{m} f_i \log f_i. \tag{2}$$
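The large-$n$ equivalence between maximizing $W$ and maximizing $H$ can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names and the sample numbers are hypothetical, chosen only for illustration. It evaluates $n^{-1}\log W$ exactly through the log-gamma function and compares it with $H$ as the counts are scaled up at fixed frequencies.

```python
import math


def log_multiplicity(counts):
    """Return log W for W = n! / (n_1! ... n_m!), via lgamma to avoid overflow."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)


def entropy(freqs):
    """Return H = -sum_i f_i log f_i (natural log); zero frequencies contribute nothing."""
    return -sum(f * math.log(f) for f in freqs if f > 0)


def compare(counts):
    """Return the pair ((1/n) log W, H) for the given sample numbers."""
    n = sum(counts)
    freqs = [k / n for k in counts]
    return log_multiplicity(counts) / n, entropy(freqs)


if __name__ == "__main__":
    base = [1, 2, 3, 4]  # hypothetical sample numbers for m = 4 results
    for scale in (10, 1000, 100000):
        counts = [scale * k for k in base]  # same frequencies, larger n
        per_trial, h = compare(counts)
        print(f"n = {sum(counts):>8d}  (1/n) log W = {per_trial:.6f}  H = {h:.6f}")
```

The gap between the two columns should shrink roughly like $(\log n)/n$, which is the content of Stirling's formula in this setting.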
Let us emphasize what the result of this variational problem
yields: for large n we obtain that set of frequencies which can
be realized in the greatest number of ways, a course which common
sense tells us to pursue in any event. To see this in more detail
we examine a somewhat simple problem that has become known as
the Brandeis Dice Problem.
Let a single die be
thrown a large number of times -- say,
. Suppose the
results are recorded and the only information we have about the
experiment is the average number of spots that appeared `up' in
these n throws, which for an honest die would be 3.5. But
suppose, instead, we are told that the average is
This is just a special case of the above scenario, so that maximization of H subject to the additional constraint (3) yields the optimal estimate for the set of frequencies that led to the result (3):
Clearly the die is biased! The maximum value H=1.553 is to be contrasted with the value 1.792 attained by the unconstrained, or uniform distribution.
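For readers who wish to reproduce a calculation of this kind, here is a minimal sketch under stated assumptions. Maximizing $H$ subject to normalization and a prescribed mean gives frequencies of the form $f_i \propto e^{\lambda i}$, with the multiplier $\lambda$ fixed by the mean constraint; the code finds $\lambda$ by bisection. The function names are hypothetical, and the mean of 4.5 used in the demonstration is an illustrative assumption, not necessarily the value in Eq.(3).

```python
import math


def maxent_die(mean, faces=6, tol=1e-12):
    """Maximum-entropy frequencies on faces 1..faces with a prescribed mean.

    The maximizer has the form f_i proportional to exp(lam * i); the
    multiplier lam is located by bisection on the mean constraint.
    """
    if not 1 < mean < faces:
        raise ValueError("mean must lie strictly between 1 and the number of faces")

    def freqs(lam):
        exps = [lam * i for i in range(1, faces + 1)]
        top = max(exps)  # shift exponents for numerical stability
        weights = [math.exp(e - top) for e in exps]
        z = sum(weights)
        return [w / z for w in weights]

    def mean_of(lam):
        return sum(i * f for i, f in enumerate(freqs(lam), start=1))

    lo, hi = -50.0, 50.0  # mean_of(lam) increases monotonically with lam
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_of(mid) < mean:
            lo = mid
        else:
            hi = mid
    return freqs(0.5 * (lo + hi))


def entropy(freqs):
    """Return H = -sum_i f_i log f_i (natural log)."""
    return -sum(f * math.log(f) for f in freqs if f > 0)


if __name__ == "__main__":
    assumed_mean = 4.5  # illustrative assumption, not taken from the text
    f = maxent_die(assumed_mean)
    print("frequencies:", [round(x, 4) for x in f])
    print("H =", round(entropy(f), 4), " uniform H =", round(math.log(6), 4))
```

For a mean of 3.5 the routine returns the uniform distribution, whose entropy $\log 6 \approx 1.792$ is the unconstrained value quoted above; any other mean yields a smaller $H$.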
As an aside, we note that the above scenario could have been
interpreted differently by merely asking a different question.
Namely, after $n$ throws of the die, what are the
probabilities $\{p_i\}$ of obtaining a particular number of spots `up' on the
next throw, based on the evidence of Eq.(3)? That set of
probabilities is also given by the numbers in Eq.(4), obtained by
maximizing the entropy of the probability distribution,
$H = -\sum_{i=1}^{6} p_i \log p_i$,
subject to that constraint and $\sum_{i=1}^{6} p_i = 1$. This, of course,
answers a different question than that asked above, though the
results are numerically the same. (Shannon's choice of the word
`entropy' here to broaden the meaning of that originally intended
by Clausius is almost as unfortunate as appropriation of the word
`chaos' by Yorke.)
But, if the set of frequencies (4) is that which can be realized in
the greatest number of ways, precisely how good an estimate is
this? Consider any other set of frequencies $\{f_i'\}$
thought to be more reasonable, in that they might better fit the
facts (3). The entropy $H'$ of that set
must, of course, be less than
$H$ -- a difference of 0.1, say ($\approx 6\%$). Then the ratio of
the number of ways $\{f_i\}$ can be realized to the number of ways
$\{f_i'\}$ can be realized is, again using Stirling's
formula,

$$\frac{W}{W'} \sim e^{n(H-H')}, \tag{5}$$

up to an irrelevant constant. In the present case this ratio is
, indicating that for every way in which
can be realized, there are more than
in which the maximum-entropy distribution
can be
realized. Similarly, it is just such numerical leverage that
renders probability distributions having maximum entropy subject
to available constraints so useful. And it is this principle that
provides us with a unique choice of distribution in the face of
overwhelmingly insufficient information.
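
The size of this leverage is easily tabulated. The sketch below is an illustration only: it uses the Stirling estimate $W/W' \sim e^{n(H-H')}$ with the entropy deficit of 0.1 quoted above, drops the constant prefactor, and takes several hypothetical values of $n$, none of which is claimed to be the number of throws in the text. The function name is likewise hypothetical.

```python
import math


def log10_multiplicity_ratio(n, delta_h):
    """Return log10 of W/W' ~ exp(n * delta_h), ignoring the constant prefactor."""
    return n * delta_h / math.log(10)


if __name__ == "__main__":
    delta_h = 0.1  # entropy deficit H - H' quoted in the text
    for n in (100, 1000, 10000):  # hypothetical numbers of throws
        exponent = log10_multiplicity_ratio(n, delta_h)
        print(f"n = {n:>6d}: W/W' ~ 10^{exponent:.1f}")
```

Working with the logarithm avoids overflow and makes plain that the ratio grows exponentially in $n$ for any fixed entropy deficit.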
[If the reader feels uncomfortable about this solution to the dice problem, he or she is invited to construct an alternative. In particular, suppose a person performs the above experiment with that questionable die, holds a gun to your head, and compels you to bet your life on the next throw. How will you proceed? You may lose your life at any rate, but it is extremely unlikely your chances of survival will be represented better than by the set in Eq.(4). You can bet on it!]