Time Evolution In Macroscopic Systems. 
... Department of Physics & Astronomy, University of Wyoming 
... Laramie, Wyoming 82071 
Abstract. Time evolution of macroscopic systems is reexamined primarily through further analysis and extension of the equation of motion for the density matrix $\rho(t)$. Because $\rho$ contains both classical and quantum-mechanical probabilities it is necessary to account for changes in both in the presence of external influences, yet standard treatments tend to neglect the former. A model of time-dependent classical probabilities is presented to illustrate the required type of extension to the conventional time-evolution equation, and it is shown that such an extension is already contained in the definition of the density matrix.
1. Introduction
A principal tenet of statistical thermodynamics for over a century has been that the microscopic constituents of
macroscopic systems obey the fundamental dynamical laws of physics, but today there is still no broad
consensus as to exactly how this translates into the time evolution of the macroscopic system itself. At the heart
of the difficulty is the fact that the behavior of macroscopic systems, as determined by the dynamics of their microscopic
constituents, requires a probabilistic description, so that a primary
concern must lie with the time development of probabilities themselves, and rarely has this concern been addressed
from first principles. In a quantum description the problem is further exacerbated by the presence of two
kinds of probabilities in the density matrix: one intrinsic to the underlying quantum mechanics, and another
pertaining to incomplete information in the context of classical probability theory. The aim of the following
discussion is to disentangle these two contributions, at least conceptually, in a manner that leads to unambiguous
equations of motion for
macroscopic systems and at the same time clarifies the foundation of the latter in the microscopic laws.
A classical description of a many-body system begins by introducing an ensemble of like systems along with a phase function f exhibiting their distribution in the phase space. We shall find it more convenient to employ a quantum-mechanical description instead, not only because the mathematical expressions are a bit less unwieldy, but also because it allows us to focus more readily and naturally on the single system under study. For an isolated system^{1} the standard stages in such a study are the following: (i) construct an initial density matrix $\rho_0$ describing the initial macroscopic state at time $t_0$; (ii) let the system evolve under its Hamiltonian $H$, thereby evolving the density matrix $\rho$ by the deterministic microscopic equation of motion
$$ i\hbar\,\dot\rho = [H,\rho] \;, \eqno(1) $$
where the superposed dot denotes a total time derivative; and, (iii) at time $t$ use the time-evolved $\rho(t)$ to predict the expectation value of a system variable $C$ as
$$ \langle C(t)\rangle = {\rm Tr}[\rho(t)\,C] = {\rm Tr}[\rho_0\,C(t)] \;. \eqno(2) $$
This last expression illustrates the equivalence of the Schrödinger and Heisenberg pictures, for Eq.(1) itself is equivalent to a unitary transformation:
$$ \rho(t) = U(t)\,\rho_0\,U^{\dagger}(t) \;, \eqno(3) $$
where the time-evolution operator $U(t)$ is determined by
$$ i\hbar\,\dot U(t) = H\,U(t) \;, \eqno(4) $$
with initial condition $\rho(t_0) = \rho_0$.
For the moment we shall consider only the Schrödinger picture in which $\rho(t)$ evolves in time; the density matrix is very definitely not a Heisenberg operator. If $\rho$ is stationary, $[H,\rho]=0$, it is a constant of the motion; if $\rho$ is a functional only of constants of the motion the macroscopic state is one of thermodynamic equilibrium, corresponding to maximum entropy, and stage (ii) is solved immediately.
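These three stages can be made concrete in a few lines of code. The following is a minimal numerical sketch (not from the paper; the two-level Hamiltonian $H=\sigma_z$, the initial mixed state, and the observable are arbitrary illustrative choices, with $\hbar=1$) confirming the equality of the two pictures in Eq.(2):

```python
import cmath

# 2x2 complex-matrix helpers
def mul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dag(A):
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def tr(A):
    return A[0][0] + A[1][1]

RHO0 = [[0.5, 0.3], [0.3, 0.5]]   # illustrative initial mixed state rho_0
C = [[0, 1], [1, 0]]              # observable C (sigma_x)

def U(t):
    # U(t) = exp(-iHt) for H = sigma_z (diagonal), hbar = 1
    return [[cmath.exp(-1j*t), 0], [0, cmath.exp(1j*t)]]

def schrodinger_picture(t):
    # stage (iii) in the Schrodinger picture: Tr[rho(t) C], rho(t) = U rho_0 U†
    rho_t = mul(mul(U(t), RHO0), dag(U(t)))
    return tr(mul(rho_t, C)).real

def heisenberg_picture(t):
    # the same prediction in the Heisenberg picture: Tr[rho_0 C(t)], C(t) = U† C U
    C_t = mul(mul(dag(U(t)), C), U(t))
    return tr(mul(RHO0, C_t)).real
```

Because $H$ is diagonal here, $U(t)$ can be written down directly; for a general Hamiltonian one would exponentiate numerically.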
It would appear, at least in principle, that proceeding through these three stages must lead to a complete description of time-varying processes in a macroscopic system in terms of the underlying microscopic dynamics. An exact solution of (1) is practicably out of the question for any macroscopic system with a nontrivial Hamiltonian, of course, so that many approximations have been pursued over the years. Classically the ensemble density of $N$-particle systems, $f_N(q,p,t)$, where $\{q,p\}$ represents the collection of all $6N$ coordinates of a system in phase space, satisfies the Liouville equation: $\partial_t f_N = \{H, f_N\}$, where the right-hand side is a Poisson bracket. Integration over all coordinates but those of a single particle yields the one-particle distribution $f_1$, for which an explicit equation of motion at low densities is readily derived, the well-known Boltzmann equation. In a quantum-mechanical formulation much effort has been devoted to deriving the so-called master equation for a coarse-grained probability distribution over system states (Pauli, 1928). As emphasized by van Kampen (1962), the solutions describe a single system and there is no need for the notion of ensembles in this approach. Other lines of attack include projection-operator techniques (Zwanzig, 1961; Mori, 1965), and the notion from the Brussels school that there may be an intrinsic irreversibility within the microscopic equations themselves. All of these attempts at obtaining macroscopic equations of motion tend to create some sort of irreversibility, but it is difficult to establish whether it is the real thing or simply an artifact of the approximations. We now demonstrate, however, that even imagining we could solve (1) exactly leads us into serious difficulties.
There are two simple applications of our three-stage scenario that lead us directly to some disturbing questions concerning its general applicability. The first of these involves the response of the system to the presence of a well-defined external field. The effects of this field on the system can often be sensibly described by including an additional term in the Hamiltonian, as in
$$ H_t = H_0 - v(t)\,A \;, \eqno(5) $$
where $v(t)$ describes the time dependence of the external force and $A$ is a system variable (operator) coupling that force to the medium. A formal solution to (1) is given by
$$ \rho(t) = \rho_0 + \frac{i}{\hbar}\int_{t_0}^{t} U(t,t')\,v(t')\,\big[A,\rho_0\big]\,U^{\dagger}(t,t')\,{\rm d}t' \;, \eqno(6) $$
and $t_0$ is the time at which the external force is turned on. The interpretation here is that prior to $t_0$ the system is in thermal equilibrium, and for $t > t_0$ the density matrix evolves unitarily under the operator $U(t,t')$ determined by (4) with Hamiltonian $H_t$. At any later time the response of the system, described by the departure of expectation values of other operators $C$ from their equilibrium values, is found by substitution of (6) into (2) to be
$$ \langle C(t)\rangle - \langle C\rangle_0 = \int_{t_0}^{t} \Phi_{AC}(t,t')\,v(t')\,{\rm d}t' \;, \eqno(7) $$
where $\langle\;\rangle_0$ denotes an equilibrium expectation value, and
$$ \Phi_{AC}(t,t') \equiv \frac{1}{i\hbar}\,\big\langle [A, C(t,t')]\big\rangle_0 \eqno(8) $$
is called the dynamical response functional. The time dependence of $C$ is given by $C(t,t') = U^{\dagger}(t,t')\,C\,U(t,t')$, which is effectively now a Heisenberg operator. Most often $U$ is approximated by $\exp[-i(t-t')H_0/\hbar]$, leading to the well-known linear response theory. Although this approximation has been criticized as inappropriately approximating the microscopic dynamics (e.g., van Kampen, 1971), this is really the least of the problems that arise.
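The chain (5)-(8) can be checked numerically for a two-level system. In this sketch all choices are illustrative and not from the paper: $H_0=\sigma_z$, $A=C=\sigma_x$, a diagonal equilibrium $\rho_0$, a constant force $v(t')=\varepsilon$, and $\hbar=1$. The exact unitary response is compared with the linear-response integral built from the kernel (8):

```python
import math, cmath

def mul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dag(A):
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def tr(A):
    return A[0][0] + A[1][1]

SX = [[0, 1], [1, 0]]          # A = C = sigma_x (illustrative coupling and observable)
RHO0 = [[0.8, 0], [0, 0.2]]    # equilibrium rho_0, diagonal in the H_0 basis

def U_exact(t, eps):
    # exp(-i H_t t) for constant v = eps, H_t = H_0 - eps*A, H_0 = sigma_z.
    # H_t^2 = (1 + eps^2) I, so exp(-i H_t t) = cos(|n|t) I - i sin(|n|t) H_t/|n|.
    H = [[1, -eps], [-eps, -1]]
    n = math.sqrt(1 + eps*eps)
    c, s = math.cos(n*t), math.sin(n*t)
    return [[(c if i == j else 0) - 1j*s*H[i][j]/n for j in range(2)]
            for i in range(2)]

def exact_response(t, eps):
    # exact <C(t)> - <C>_0 by full unitary evolution of rho_0
    U = U_exact(t, eps)
    rho_t = mul(mul(U, RHO0), dag(U))
    return (tr(mul(rho_t, SX)) - tr(mul(RHO0, SX))).real

def phi(tau):
    # Eq.(8) with the linear-response approximation U -> exp(-i H_0 tau)
    U0 = [[cmath.exp(-1j*tau), 0], [0, cmath.exp(1j*tau)]]
    C_tau = mul(mul(dag(U0), SX), U0)
    M1, M2 = mul(SX, C_tau), mul(C_tau, SX)
    comm = [[M1[i][j] - M2[i][j] for j in range(2)] for i in range(2)]
    return (tr(mul(RHO0, comm))/1j).real

def linear_response(t, eps, n=2000):
    # Eq.(7) with constant v(t') = eps, trapezoid rule
    h = t/n
    vals = [phi(t - k*h) for k in range(n + 1)]
    return eps*h*(0.5*vals[0] + sum(vals[1:-1]) + 0.5*vals[-1])
```

For small $\varepsilon$ the exact and linear responses agree to $O(\varepsilon^2)$, which is the content of linear response theory; here the kernel works out analytically to $\Phi(\tau) = 2\sin(2\tau)\,(1-2\rho_{00})$.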
Unquestionably the exact time-evolved $\rho(t)$ will predict the correct value of $\langle C(t)\rangle$ at time $t > t_0$, for both the quantum mechanics and associated mathematics are impeccable. But, as noted in (3), that time evolution is equivalent to a unitary transformation, under which the eigenvalues of $\rho$ are unchanged. These eigenvalues are the probabilities for the system to be found in any of its possible macrostates under given macroscopic constraints, hence there would seem to be no macroscopic change in the system; but there is such a change, of course, as indicated in (7). The equivalent classical observation is that the Liouville equation moves the ensemble distribution around the phase space subject to given constraints, but does not alter those constraints. A further difficulty is that the von Neumann entropy $S = -k\,{\rm Tr}(\rho\ln\rho)$, where $k$ is Boltzmann's constant, is also invariant under unitary transformation, indicating the absence of irreversible behavior during the process despite the possibility of energy having been added to the system throughout the time interval.
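The invariance is easy to exhibit numerically. In this sketch (matrices illustrative, $\hbar = k = 1$) the eigenvalues and the von Neumann entropy of $U\rho_0 U^{\dagger}$ are computed directly and seen to be constants of the unitary evolution:

```python
import math, cmath

def mul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dag(A):
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def eig_herm(A):
    # eigenvalues of a 2x2 Hermitian matrix [[a, b], [b*, c]]
    a, c = A[0][0].real, A[1][1].real
    m = 0.5*(a + c)
    r = math.sqrt((0.5*(a - c))**2 + abs(A[0][1])**2)
    return m - r, m + r

def von_neumann(A, k=1.0):
    # S = -k Tr(rho ln rho), computed from the eigenvalues
    return -k*sum(p*math.log(p) for p in eig_herm(A) if p > 1e-15)

RHO0 = [[0.5, 0.3], [0.3, 0.5]]   # illustrative mixed state; eigenvalues 0.8 and 0.2

def rho_t(t):
    # unitary evolution under H = sigma_z (hbar = 1)
    U = [[cmath.exp(-1j*t), 0], [0, cmath.exp(1j*t)]]
    return mul(mul(U, RHO0), dag(U))
```

However long the evolution runs, the spectrum $\{0.8, 0.2\}$ and hence the entropy never change, which is exactly the difficulty described above.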
A similar but more general application is the common task of heating a pot of water, on an electric stove, say. To describe this process in complete detail we have to specify the total system Hamiltonian $H_0$ of the closed system consisting of water, pot, electric burner, and interactions among them:
$$ H_0 = H_{\rm water} + H_{\rm pot} + H_{\rm burner} + H_{\rm int} \;. \eqno(9) $$
When the burner is turned on the voltage and current can be enfolded into an external contribution $H_{\rm ext}(t)$, which leads to a total Hamiltonian $H_{\rm tot} = H_0 + H_{\rm ext}$ for the heating process. With the water initially in thermal equilibrium with the rest of the system at temperature $T_i$, we know that the initial density matrix is given by the canonical distribution
$$ \rho(0) = \frac{e^{-H_0/kT_i}}{Z} \;, \qquad Z = {\rm Tr}\,e^{-H_0/kT_i} \;. \eqno(10) $$
If the burner is turned on for a period $(0,t)$, then the density matrix for the isolated system at time $t$ is obtained by unitary transformation, as above.
Upon turning off the switch one expects the system to relax almost immediately to a final equilibrium state at temperature $T_f$, and the conventional teaching is that $\rho(t)$ somehow goes over into the final canonical density matrix known to describe thermal equilibrium,
$$ \rho(t) \rightarrow \rho_f = \frac{e^{-H_0/kT_f}}{Z_f} \;, \qquad Z_f = {\rm Tr}\,e^{-H_0/kT_f} \;. \eqno(11) $$
But this cannot happen: because the eigenvalues of $\rho_f$ and $\rho(t)$ are in general different, the two density matrices are incompatible; the eigenvalues of $\rho(t)$ are just those of $\rho_i = \rho(0)$. Indeed, as in the previous example, the theoretical entropy of the final state is the same as that of the initial state, whereas we are certain that the initial and final measured entropies cannot be the same. Where is the irreversibility of this process to be found? We shall address that question in Part II of this discussion (Grandy, 2003; following paper, herein referred to as II); our task here is to sort out the details of the time-evolution process per se in the presence of external influences.
2. Origin of The Difficulties
The source of the above difficulties is that the Hamiltonian governs only the microscopic behavior of the system. While there is little doubt that the $\rho(t)$ evolved under $H$ will make correct predictions of macroscopic expectation values, it does so by including only the local effects of the external forces, with no reference to either the external sources or the macroscopic constraints; and it is the changes in those constraints that constitute much of the thermodynamics. Time development by unitary transformation alone affects only the basic quantum-mechanical probabilities, but not those of classical probability theory that are determined by the macroscopic constraints. Similarly, in the classical formulation the ensemble density $f_N$ is evolved by Liouville's equation, and the meaning of the partial time derivative there is that an observer fixed in phase space would see the distribution move by without changing its shape. In either formulation the canonical microscopic equations of motion are ultimately responsible for the changing state of the system, to be sure, but both the impetus for these changes and the observed effects are macroscopic and must be included as such in any realistic analysis.
To explore the origins of these matters more deeply and systematically it will be useful to return to the task of stage (i) mentioned earlier, and recall some of the details involved in constructing an initial density matrix, or probability distribution.^{2} We adopt the view that the probability for any one in a set of $n$ mutually exclusive and exhaustive alternatives $\{x_i\}$ is contingent on given information $I$, and will be written $P_i = P(x_i|I)$. As first developed by Gibbs (1902), and elucidated further by Shannon (1948) and Jaynes (1957), $P_i$ is determined uniquely by maximizing the information entropy
$$ S_I = -K\sum_i P_i \ln P_i \;, \qquad K = {\rm constant} > 0 \;, \eqno(12) $$
subject to constraints provided by $I$. The subscript $I$ emphasizes that this theoretical entropy is defined in the context, and as a part, of probability theory. This is an important caveat, for otherwise it is easy to confuse $S_I$ with physical or thermodynamic entropy. In fact, merely by making this distinction we see that the invariance of the von Neumann entropy $S = -k\,{\rm Tr}(\rho\ln\rho)$ under unitary transformation is not as great a problem as first thought; it, too, should be considered an $S_I$, and it is only its maximum subject to physical constraints that corresponds to thermodynamic entropy. Note that $S_I$ has the form of an expectation value, $S_I = -K\langle\ln P\rangle$, which is just the negative of Gibbs' `average index of probability' that he minimized to define the equilibrium state.
With the advent of the Shannon-Jaynes insights into construction of prior probabilities based on given evidence, the reasoning behind the Gibbs algorithm is immediately transparent. Given information in terms of a function $f(x)$, such that $x$ can take one, but only one, of the values $\{x_i\}$, the optimal choice of a probability distribution over $\{x_i\}$ is obtained by maximizing the entropy of the probability distribution (12) subject to constraints
$$ \sum_{i=1}^{n} P_i = 1 \;, \qquad P_i = P(x_i|I) > 0 \;, \eqno(13a) $$
$$ I:\quad F \equiv \langle f(x)\rangle = \sum_{i=1}^{n} P_i\,f(x_i) \;. \eqno(13b) $$
This is the principle of maximum entropy (PME), and in this form presumes the information to be given in the form of an expectation value.
As is well known, the solution to this variational problem is most readily effected by the method of Lagrange multipliers, so that the desired probability distribution is given by
$$ P_i = \frac{1}{Z}\,e^{-\lambda f(x_i)} \;, \qquad Z(\lambda) = \sum_i e^{-\lambda f(x_i)} \;, \eqno(14) $$
and the normalization factor $Z$ is called the partition function. Substitution of $P_i$ into (13b) provides the differential equation formally determining the Lagrange multiplier $\lambda$:
$$ F = -\frac{\partial}{\partial\lambda}\ln Z(\lambda) \;, \eqno(15) $$
and the expected value for any other function $g(x)$ is given by
$$ \langle g(x)\rangle = \sum_i P_i\,g(x_i) \;. $$
It must be stressed that the expectation value on the left-hand side of (13b) is a known number $F$ that we have identified in this way so as to incorporate the given information or data mathematically into a probability distribution. Whether we use one notation or the other will depend on which feature of these data we wish to emphasize in a particular discussion. This scenario is readily generalized to include several pieces of data in terms of several functions $\{f_r\}$, although it is also useful to note that not all these data need be specified at once. For example, a distribution can be constructed via the PME based on a datum $\langle f_1\rangle$; subsequently further information may emerge in the form $\langle f_2\rangle$, and the new distribution is obtained by remaximizing $S_I$ subject to both pieces of data. If the new information contradicts the old there will be no solutions for real $\lambda_2$, and if the new datum is irrelevant it will simply drop out of the distribution. This procedure provides a method for incorporating new information into an updated probability estimate, in the spirit of Bayes' theorem in elementary probability theory.
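As a worked illustration of (13)-(16) (the numbers are not from the paper): for an ordinary six-sided die with the single datum $\langle x\rangle = 4.5$, bisection on Eq.(15) determines $\lambda$, and the maximum of the entropy then satisfies (16):

```python
import math

faces = [1, 2, 3, 4, 5, 6]   # hypothetical sample space {x_i}
F = 4.5                      # illustrative datum <x>

def probs(lam):
    # Eq.(14): P_i = exp(-lam*x_i)/Z
    w = [math.exp(-lam*x) for x in faces]
    Z = sum(w)
    return [wi/Z for wi in w]

def mean(lam):
    # left-hand side of Eq.(13b) as a function of lam
    return sum(x*p for x, p in zip(faces, probs(lam)))

def solve_lambda(F, lo=-10.0, hi=10.0, tol=1e-12):
    # mean(lam) decreases monotonically in lam, so bisect Eq.(15)
    while hi - lo > tol:
        mid = 0.5*(lo + hi)
        if mean(mid) > F:
            lo = mid      # mean still too large: need larger lam
        else:
            hi = mid
    return 0.5*(lo + hi)

lam = solve_lambda(F)
P = probs(lam)
# Eq.(16) with K = 1: S_theor = ln Z + lam * <f>
Z = sum(math.exp(-lam*x) for x in faces)
S_theor = math.log(Z) + lam*F
```

Since the datum exceeds the uniform mean of 3.5, the multiplier comes out negative and the distribution tilts monotonically toward the high faces, as in Jaynes' classic dice analysis.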
Maximization of $S_I$ over all probability distributions subject to given constraints transforms the context of the discussion into one involving the maximum as a function of the data specific to this application; it is no longer a functional of probabilities, for they have been `integrated out'. To acknowledge this distinction we shall denote the maximum entropy by $S_{\rm theor}$, and recognize that it is now a function only of the measured expectation values or constraint variables. That is, $S_{\rm theor}$ is the entropy of the macrostate, and the impetus provided by information theory is no longer relevant. What remains of the notion of information is now only to be found on the right-hand side of $P(A|I)$; we are here applying probability theory, not information theory. Substitution of (14) into (12) provides an explicit expression for the maximum entropy,
$$ S_{\rm theor} = K\ln Z + K\lambda\,\langle f\rangle \;. \eqno(16) $$
The scenario described by Eq.(14) is precisely that leading to the canonical distribution (10) when the single piece of data involves the Hamiltonian, or total energy $E \in \{E_i\}$, where $\{E_i\}$ is the set of possible system energies. For constants of the motion, $H$ in this case, the PME provides most of elementary classical thermodynamics and solves the tasks of stages (i) and (ii) in a single step. Stage (iii), of course, is very well developed mathematically for equilibrium systems, and we only note here that the expectation value minimizes the root-mean-square error in estimating $f$.
When the specified functions or operators are not constants of the motion, or they vary in time, then $\rho$ no longer commutes with $H$, Eq.(1) must be addressed explicitly, and we return to the conundrums raised above. Small changes in the problem defined by Eqs.(13) can occur through changes in the set of possible values $\{f_i \equiv f(x_i)\}$, as well as from changes $\delta P_i$ in the assigned probabilities. A small change in the expectation value is then
$$ \delta\langle f\rangle = \sum_i P_i\,\delta f_i + \sum_i \delta P_i\, f_i \;, \eqno(17) $$
and one readily verifies that the corresponding change in the information entropy is
$$ \delta S_I = S - S_0 = -K\sum_i \delta P_i \ln P_i \;. \eqno(18) $$
The first sum on the right-hand side of (17) is just $\langle\delta f\rangle$, the expected change in $f$, so we can rewrite that expression as
$$ \delta\langle f\rangle - \langle\delta f\rangle = \delta Q_f \;, \eqno(19) $$
where $\delta Q_f \equiv \sum_i \delta P_i\,f_i$. Also, from (18), $\delta S_I = K\lambda\,\delta Q_f$.
Equation (19) can be interpreted as a generalized First Law of thermodynamics, which is suggested by taking $f = E$, the total system energy. In that case we can interpret $\langle E\rangle$ as the predicted internal energy $U$ and, since $\delta E_i$ is the work done on the system when it is in state $E_i$, it must be that $\delta W = -\langle\delta E\rangle$ is the predicted work done by the system. In this case, then, (19) has the form $\delta U + \delta W = \delta Q$, and $\delta Q$ is unambiguously identified as heat. The latter is usually thought of as energy transfer through degrees of freedom over which we have no control, whereas work takes place through those we do control. But this is now seen as a special case of a more general rule in probability theory: a small change in any expectation value consists of a small change in the physical quantity (``generalized work'') and a small change in the probability distribution (``generalized heat''). Just as with ordinary applications of the First Law, we see that the three ways to generate changes in any scenario are interconnected, and specifying any two determines the third.
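The decomposition (17)-(19), together with the entropy relation following (18), can be verified to first order by perturbing both the values $f_i$ and the multiplier $\lambda$ of an arbitrary maximum-entropy distribution (all numbers below are illustrative, with $K=1$):

```python
import math

f  = [1.0, 2.0, 4.0]          # hypothetical values f_i
df = [0.3, -0.1, 0.2]         # direction of the small change in the f_i
lam, dlam, eps = 0.7, 0.5, 1e-5

def maxent(lam, f):
    # Eq.(14)
    w = [math.exp(-lam*fi) for fi in f]
    Z = sum(w)
    return [wi/Z for wi in w]

def entropy(P):
    # Eq.(12) with K = 1
    return -sum(p*math.log(p) for p in P)

P0 = maxent(lam, f)
f1 = [fi + eps*di for fi, di in zip(f, df)]          # perturbed values
P1 = maxent(lam + eps*dlam, f1)                      # perturbed distribution

d_mean  = sum(p*fi for p, fi in zip(P1, f1)) - sum(p*fi for p, fi in zip(P0, f))
mean_df = sum(p*(g - fi) for p, fi, g in zip(P0, f, f1))   # <delta f>, "generalized work"
dQ      = sum((q - p)*fi for p, q, fi in zip(P0, P1, f))   # delta Q_f,  "generalized heat"
dS      = entropy(P1) - entropy(P0)
```

To first order in the perturbation, $\delta\langle f\rangle = \langle\delta f\rangle + \delta Q_f$ and $\delta S_I = \lambda\,\delta Q_f$; the residuals are of second order.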
The important point for the current discussion is that $\delta Q_f$ is effectively a source term, and it arises only from a change in the probability distribution. From (18) it is then clear that any change in the information entropy can only be induced by a $\delta Q$. Thus, since a unitary transformation corresponding to the time-evolution equation (1) leaves $S_I$ invariant, any complete description of the system evolution must contain some explicit reference to the sources inducing that evolution. A source serves to change the macroscopic constraints on the system, which the microscopic equations alone cannot do, and this can lead to changes in the maximum entropy. In the case of thermal equilibrium this is simply thermodynamics at work: a small change in $\langle f\rangle$ induced by a source $\delta Q_f$ results in a new macroscopic state corresponding to a remaximized entropy, a readjustment brought about by the underlying particle dynamics, of course.
A deeper issue uncovered here has to do with the nature of probability itself. Many writers subscribe to the view that objects and systems possess intrinsic probabilities that are actually physical properties like mass or charge. While most physicists would surely reject such a stance, the idea often seems to lurk in the background of many probabilistic discussions. One consequence of this viewpoint is that one may be led to believe that a density matrix $\rho(t)$ is a physically real quantity that is completely determined by the usual dynamical equations of motion, rather than representing a state of knowledge about a physical situation. This may work for an isolated system for which the only information available is that encoded in the initial value $\rho(0)$, but we have seen above that this cannot be the case when external influences are operative. The probabilities $\langle n|\rho|n\rangle$ can change only if the information on which they are based changes. Thus, the resolution of the difficulties discussed above is to be found through reexamination of time evolution when changes in external constraints are explicitly taken into consideration.
3. A Time-Dependent Probability Model
The question of how to define time-dependent probabilities unambiguously is an old one, but no general theory seems to have been developed. In the real world any kind of change must have a physical origin, and so most expositions tend to focus on, or adapt, physical equations of motion in one way or another to describe time-varying probabilities. But this again risks viewing a probability as a physical object or property, rather than a representation of a state of knowledge. While quantum-mechanical probabilities are governed by microscopic physical laws, this is not necessarily the case for the macroscopic probabilities of interest here. The point of this section, then, is to develop a mathematical model that may provide some insight into the type of extension of the physical equations that we are seeking.
Our problem is that of defining a time-dependent probability unambiguously. An understanding that all probabilities are conditional on some kind of information leads to the realization that $P(A_i|I)$ can change in time only if the information $I$ is changing in time, while the propositions $A_i$ are taken as fixed. For example, if the trajectory $y(t)$ of a particle is changing erratically owing to the presence of an unknown time-varying field, the allowed values of $y$ do not change, but information on the current position and some time-dependent effects of that field might permit construction of probabilities for which values of $y$ might be realized at some later time $t$.
Armed with this insight the path to extending the PME algorithm in a straightforward manner is clear. We shall do this in steps by introducing an abstract probability model that avoids possible confusion with physical laws for the time being. Equations (13)-(15) summarized very briefly the maximum-entropy procedure for constructing a probability distribution based on a single piece of time-independent information. Unless there is a definite reason to suppose that the observed value of $\langle f(x)\rangle$ is unvarying, as is the case with equilibrium statistical mechanics where only constants of the motion are considered, there is little to persuade us that a subsequent observation will not produce a different value. One might, for example, harbor such thoughts while rolling a die made from a sugar cube. To generalize our model somewhat let us suppose that $\langle f(x)\rangle$ is observed at a series of times, and ultimately over a continuous time interval $[t_0,\tau]$. Since there is a piece of data given at every instant in the interval, there is likewise a Lagrange multiplier defined at each point as well.
We thus accumulate a body of information that, in the same manner as above, leads to the maximum-entropy prescription
$$ P_i(t_0,\tau) = Z^{-1}[\lambda]\,\exp\Big\{-\int_{t_0}^{\tau}\lambda(t')\,f_i(t')\,{\rm d}t'\Big\} \;, \eqno(20a) $$
$$ Z[\lambda] = \sum_i \exp\Big\{-\int_{t_0}^{\tau}\lambda(t')\,f_i(t')\,{\rm d}t'\Big\} \;, \eqno(20b) $$
$$ \langle f(x,t)\rangle = -\frac{\delta}{\delta\lambda(t)}\ln Z[\lambda] \;, \qquad t_0 \le t \le \tau \;, \eqno(20c) $$
$$ \langle g(x)\rangle = \sum_i P_i(t_0,\tau)\,g_i \;. \eqno(20d) $$
Thus, $Z$ is now a partition functional, $\lambda(t)$ a Lagrange-multiplier function defined only on the interval $[t_0,\tau]$, and this function is determined through functional differentiation. Although nothing in (20) is to be considered explicitly time dependent, it is true that $\langle f(x,t)\rangle$, $\lambda(t)$, and $f(t)$ vary over the interval $[t_0,\tau]$, but we know only that they do so there; indeed, $\lambda(t)$ is defined only on that interval. The meaning of $\langle f(x,t)\rangle$ here is that at each point of the interval we know a definite value of $f(x_i,t)$, and $\lambda(t)$ is determined such that the mass of the probability distribution resides squarely on these points over that interval. This scenario is just a generalization of stage (i) considered earlier and does nothing more than provide an initial probability distribution at time $t=\tau$.
Some further features of (20) are worthy of note, beginning with the observation that $\{P_i\}$ goes over into the uniform distribution as $\tau \to t_0$, as it should. If $f \ne f(t)$ it can be removed from the integrals and Eqs.(20) reduce to the time-independent case. While physical processes must be causal, it can be shown (e.g., Jaynes, 1979) that logical inferences can propagate either forward or backward in time, as in geology and astrophysics, say. Thus, $\langle g(x)\rangle$ can not only be predicted for $t > \tau$, but also retrodicted for $t < t_0$. Generally, as $t$ increases beyond $\tau$ the accuracy of predictions made by this distribution can be expected to deteriorate continually, especially if $f$ continues to vary; only new data can contribute to a better estimate of $\{P_i\}$ at this point.
In the previous model we thought of the data as being collected over a definite time interval, after which we maximized the entropy. Having made this first generalization we can see at once the next step. Information gathered in one interval can certainly be followed by collection in another a short time later, and can continue to be collected in a series of such intervals, the entropy being remaximized after each interval. Now let those collection intervals become shorter and the gaps between them smaller, so that by an obvious limiting procedure they all blend into one continuous interval whose upper endpoint is always the present moment. Thus, there is nothing to prevent us from imagining a situation in which our information or data are continually changing in time; weather forecasts come to mind. With experiments performed in a more controlled manner, such information can be specified in detail and sources turned on and off at will; a common example is the slow heating of that pot of water.
The leap made here is to imagine remaximization occurring at every moment, rather than all at once. There is no fundamental conceptual difference between the two scenarios, however, for in either case $f(x,t)$ must be known on the set $\{x_i\}$ during the basic information-gathering time interval. Yet, how do we justify the notion of continual remaximization? The key point to realize is that there is no causal signal involved here, and no physical readjustment to be made. For any imaginable set of constraints there is a corresponding unique maximum entropy, just as for a first-order differential equation there is a unique solution for any given initial condition, and it is unnecessary to actually solve the equation to know the solution exists. When you warm up your half-cup of coffee by pouring more in from the pot, you have just remaximized the entropy of the coffee in the cup to conform to the new $N$ and $E$.
Without further ado, we now envision continuous data in the form of a time-varying expectation value,
$$ \langle f(x,t)\rangle_t = \sum_i f_i(t)\,P_i(t_0,t) \;, \qquad f_i(t) \equiv f(x_i,t) \;, \quad P_i(t) \equiv P(x_i;\,t_0,t) \;. \eqno(21) $$
That is, $f(x_i,t)$ is given at these points on $[t_0,t]$, and is specified to continue so until further notice. Then, in analogy to (20),

$$ P_i(t) = Z_t^{-1}[\lambda]\,\exp\Big\{-\int_{t_0}^{t}\lambda(t')\,f_i(t')\,{\rm d}t'\Big\} \;, \eqno(22a) $$
$$ Z_t[\lambda] = \sum_i \exp\Big\{-\int_{t_0}^{t}\lambda(t')\,f_i(t')\,{\rm d}t'\Big\} \;, \eqno(22b) $$
$$ \langle f(x,t')\rangle_{t'} = -\frac{\delta}{\delta\lambda(t')}\ln Z_{t'}\big[\lambda(t')\big] \;, \qquad t_0 \le t' \le t \;, \eqno(22c) $$
$$ \langle g(x,t)\rangle_t = \sum_i P_i(t)\,g_i(t) \;. \eqno(22d) $$
In these expressions the subscript $t$ denotes expectation values computed with $\{P_i(t)\}$, and we note that the function $g$ need not itself depend explicitly on time. Also, none of these quantities is necessarily a continuous function of $x$; rather, $x$ simply denotes the sampling space for the discrete set $\{x_i\}$. If $\langle f(x,t)\rangle$ is specified to be constant for all time, then $t \to \infty$ and we regain Eqs.(14). And once again the distribution is uniform at $t = t_0$, whereas if the specified time variation is halted at some time $t = \tau$ Eqs.(20) are regained.
The actual time variation of $P_i(t)$ is described by
$$ \partial_t P_i(t) = -\lambda(t)\,\Delta f_i(t)\,P_i(t) \;, \eqno(23a) $$
where
$$ \Delta f_i(t) \equiv f_i(t) - \langle f(x,t)\rangle_t \;. \eqno(23b) $$
We verify that this integrates back into the original $P_i(t)$ by performing a functional integration in (22), and in doing so obtain the useful alternative expression
$$ Z_t[\lambda] = \exp\Big\{-\int_{t_0}^{t}\lambda(t')\,\langle f(x,t')\rangle_{t'}\,{\rm d}t'\Big\} \;. \eqno(24) $$
Equation (23a) has the form of a `master' equation that is often introduced into time-dependent scenarios; here it is exact.
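A discretized sketch can make this exactness explicit (the functions $\lambda(t)$ and $f_i(t)$ below are arbitrary illustrations, not from the text): integrating the master equation (23a) forward from the uniform distribution at $t_0=0$ reproduces the closed form (22a).

```python
import math

F0 = [1.0, 2.0, 3.0]                        # hypothetical base values f_i

def f(i, t):
    return F0[i]*(1.0 + 0.3*math.sin(t))    # illustrative f_i(t)

def lam(t):
    return 0.5 + 0.2*math.cos(t)            # illustrative lambda(t)

def closed_form(t, n=2000):
    # Eq.(22a): P_i proportional to exp(-int lam(t') f_i(t') dt'), trapezoid rule
    h = t/n
    expo = []
    for i in range(3):
        vals = [lam(k*h)*f(i, k*h) for k in range(n + 1)]
        expo.append(-h*(0.5*vals[0] + sum(vals[1:-1]) + 0.5*vals[-1]))
    w = [math.exp(e) for e in expo]
    Z = sum(w)
    return [wi/Z for wi in w]

def rhs(P, t):
    # master equation (23a): dP_i/dt = -lam(t) (f_i(t) - <f>_t) P_i
    mean = sum(P[i]*f(i, t) for i in range(3))
    return [-lam(t)*(f(i, t) - mean)*P[i] for i in range(3)]

def integrate(t_end=1.0, steps=2000):
    # classical 4th-order Runge-Kutta from the uniform distribution at t_0 = 0
    P, t, h = [1.0/3]*3, 0.0, t_end/steps
    for _ in range(steps):
        k1 = rhs(P, t)
        k2 = rhs([P[i] + 0.5*h*k1[i] for i in range(3)], t + 0.5*h)
        k3 = rhs([P[i] + 0.5*h*k2[i] for i in range(3)], t + 0.5*h)
        k4 = rhs([P[i] + h*k3[i] for i in range(3)], t + h)
        P = [P[i] + h/6.0*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i]) for i in range(3)]
        t += h
    return P
```

Note that (23a) automatically preserves normalization, since $\sum_i \Delta f_i(t)P_i(t) = 0$ identically.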
Direct differentiation in (22d) yields the `equation of motion'
$$ \frac{{\rm d}}{{\rm d}t}\langle g(x,t)\rangle_t = \sum_i P_i(t)\Big[\dot g_i(t) - \lambda(t)\,g_i(t)\,\Delta f_i(t)\Big] = \langle \dot g(x,t)\rangle_t - \lambda(t)\,K_{fg}(x,t) \;, \eqno(25) $$
where we introduce the equal-time covariance function
$$ K_{gf}(x,t) = K_{fg}(x,t) \equiv \langle g(x,t)\,f(x,t)\rangle_t - \langle g(x,t)\rangle_t\,\langle f(x,t)\rangle_t \;. \eqno(26) $$
Note that one can choose $g=f$; otherwise, $g$ need not depend explicitly on $t$, in which case the first term on the right-hand side of (25) vanishes. While of formal interest, these equations of motion are somewhat redundant in view of (22d); in physical applications, however, they lead to several important insights.
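Eq.(25) can be checked against a finite difference. In this sketch ($\lambda(t)$, $f_i(t)$, and $g_i$ are arbitrary illustrative choices) $g$ is taken time-independent, so the first term on the right-hand side drops out and ${\rm d}\langle g\rangle_t/{\rm d}t$ should equal $-\lambda(t)K_{fg}$:

```python
import math

F0 = [1.0, 2.0, 3.0]                        # hypothetical base values f_i
G0 = [2.0, -1.0, 0.5]                       # hypothetical time-independent g_i

def f(i, t):
    return F0[i]*(1.0 + 0.3*math.sin(t))

def lam(t):
    return 0.5 + 0.2*math.cos(t)

def probs(t, n=4000):
    # Eq.(22a) evaluated by the trapezoid rule
    h = t/n
    expo = []
    for i in range(3):
        vals = [lam(k*h)*f(i, k*h) for k in range(n + 1)]
        expo.append(-h*(0.5*vals[0] + sum(vals[1:-1]) + 0.5*vals[-1]))
    w = [math.exp(e) for e in expo]
    Z = sum(w)
    return [wi/Z for wi in w]

def mean_g(t):
    # Eq.(22d) with static g
    return sum(p*gi for p, gi in zip(probs(t), G0))

def rhs_eq25(t):
    # right-hand side of Eq.(25) for static g: -lam(t) * K_fg(x,t), Eq.(26)
    P = probs(t)
    mf = sum(p*f(i, t) for i, p in enumerate(P))
    mg = sum(p*G0[i] for i, p in enumerate(P))
    K = sum(p*G0[i]*f(i, t) for i, p in enumerate(P)) - mg*mf
    return -lam(t)*K
```

A central difference of $\langle g\rangle_t$ then reproduces the covariance form of (25) to within discretization error.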
Because $\{P_i\}$ is now time dependent the information-theoretic entropy analogous to (16), and which was maximized continuously to obtain Eqs.(22), also depends on the time. There is no implied relation to the thermodynamic entropy, of course, so we introduce yet another entropic notation and write this functional as
$$ H_t[P] = -K\sum_{i=1}^{n} P_i(t)\ln P_i(t) \;, \qquad K > 0 \;. \eqno(27) $$
Upon maximization this depends on the initial data only, so is a functional of $\langle f(x,t)\rangle_t$. Substitution from (22) then yields
$$ H_t = \ln Z_t[\lambda] + \int_{t_0}^{t}\lambda(t')\,\langle f(x,t')\rangle_t\,{\rm d}t' = \int_{t_0}^{t}\lambda(t')\,\langle f(x,t')\rangle_t\,{\rm d}t' - \int_{t_0}^{t}\lambda(t')\,\langle f(x,t')\rangle_{t'}\,{\rm d}t' \;, \eqno(28) $$
the integrands in the final two integrals differing only in the subscripts.
Note that a functional differentiation yields an alternative expression for $\lambda(t)$,
$$ \lambda(t) = \frac{\delta H_t}{\delta\langle f(x,t)\rangle_t} \;, \eqno(29) $$
and a time derivative provides a rate of `entropy production',
$$ \partial_t H_t = -\lambda(t)\int_{t_0}^{t}\lambda(t')\,K_{ff}(t,t')\,{\rm d}t' \;, \eqno(30) $$
where $K_{ff}(t,t') \equiv \langle f(x,t)f(x,t')\rangle_t - \langle f(x,t)\rangle_t\langle f(x,t')\rangle_t$ is the two-time generalization of the covariance (26). These equations are remarkably similar to many of those found in writings on irreversible thermodynamics (e.g., de Groot and Mazur, 1962). Though no application to physical models is made here, one recognizes the analogs of fluxes and forces, and Onsager-like reciprocity is immediately evident in (26). But no linear approximations are made in this model, so the current scenario is considerably more general. It must be emphasized once again, however, that the time dependencies derived above are based entirely on the supposition of information supplied in the form of specified time-varying expectation values or source functions; we actually know how $f(x,t)$ varies in time, allowing us to predict the variation of $g(x,t)$. A possible general application of these considerations might be made to driven noise, in which the noise amplitude varies in a known way.
4. The Physical Problem
Precisely how to adapt the preceding probability model to macroscopic systems will be taken up in Part II of this discussion, while here we shall complete the study of Eq.(1) and the fundamental time-evolution equation. If we believe that only an external source can produce changing macroscopic constraints and time-varying information $I(t)$, then $\rho(t)$ must evolve in a manner over and above that determined by the Hamiltonian alone. In fact, such an additional evolution is already implied by the density-matrix formalism, as we now demonstrate.
In the equation of motion (1) for the density matrix we noted that the superposed dot represented a total time derivative. But in many works the equation is commonly written as
$$ i\hbar\,\partial_t \rho = [H,\rho] \;, \eqno(31) $$
where $H$ may be time dependent. The standard argument is that the derivative in (31) is indeed a partial derivative because this expression is derived directly from the Schrödinger equation, which contains a partial time derivative, although it makes no difference in (31) since $\rho$ depends only on $t$. This comment would not be notable were it not for an additional interpretation made in most writings on statistical mechanics, where $\rho$ describes the entire macroscopic system.
Equation (31) is often compared with the Heisenberg equation of motion for an operator F(t) in the
Heisenberg picture,

    dF/dt = (1/iℏ)[F, H] + ∂_{t}F ,                                  (32)

whereupon it is concluded, by analogy with Liouville's theorem, that dρ/dt = 0 and that (31) is just the
quantum-mechanical version of Liouville's equation in classical statistical mechanics. But nothing in
quantum mechanics requires this conclusion, for ρ(t) is not a Heisenberg operator; it is basically a
projection operator constructed from state vectors, and in any event (31) refers to the Schrödinger picture.
Heisenberg operators are analogous to functions of phase in classical mechanics; ρ is not. We shall argue here that
the derivative in (31) should be considered a total time derivative, as asserted earlier for (1); this
follows from a careful derivation of that equation.
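The opposite commutator signs of the two pictures, central to the argument above, can be checked numerically. In the sketch below (illustrative only: a random 3×3 Hermitian H, a random pure state, and ℏ = 1) a Schrödinger-picture ρ(t) = U ρ(0) U† and a Heisenberg-picture operator F(t) = U† F(0) U are differentiated by finite differences and compared with the respective commutators.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
H = (A + A.conj().T) / 2                       # random Hermitian Hamiltonian, hbar = 1

evals, V = np.linalg.eigh(H)
U = lambda t: V @ np.diag(np.exp(-1j * evals * t)) @ V.conj().T   # U(t) = exp(-iHt)

psi = rng.standard_normal(3) + 1j * rng.standard_normal(3)
psi /= np.linalg.norm(psi)
rho0 = np.outer(psi, psi.conj())               # pure-state density matrix
F0 = np.diag([1.0, 2.0, 3.0]).astype(complex)  # some Hermitian observable

rho = lambda t: U(t) @ rho0 @ U(t).conj().T    # Schroedinger picture
F = lambda t: U(t).conj().T @ F0 @ U(t)        # Heisenberg picture

t, dt = 0.7, 1e-6
drho = (rho(t + dt) - rho(t - dt)) / (2 * dt)  # central finite differences
dF = (F(t + dt) - F(t - dt)) / (2 * dt)

comm = lambda X, Y: X @ Y - Y @ X
ok1 = np.allclose(1j * drho, comm(H, rho(t)), atol=1e-6)   # (31): i d(rho)/dt = [H, rho]
ok2 = np.allclose(1j * dF, comm(F(t), H), atol=1e-6)       # (32): i dF/dt = [F, H]
print(ok1, ok2)
```

Both checks pass, exhibiting numerically that ρ(t) obeys a commutator of the opposite sign to a Heisenberg operator, which is why ρ cannot be treated as one.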
A density matrix represents a partial state of knowledge of a system. Based on
that information we conclude that with probability w_{1} the system may be
in a pure state ψ_{1}, or in state ψ_{2} with probability w_{2}, etc.
Although the various alternatives ψ_{i} are not necessarily mutually
orthogonal, they can be expanded in terms of a complete orthonormal set
{u_{k}}:

    ψ_{i}(r,t) = Σ_{k} a_{ik}(t) u_{k}(r) ,                          (33)

such that ⟨u_{k}|u_{j}⟩ = δ_{kj}. The quantum-mechanical expectation value of a
Hermitian operator F in state ψ_{i} is

    ⟨F⟩_{i} ≡ ⟨ψ_{i}|F|ψ_{i}⟩ = Σ_{k,n} a_{ki} a*_{ni} ⟨u_{n}|F|u_{k}⟩ .   (34)

The expected value of F over all possibilities (in the sense of classical probability theory)
is then

    ⟨F⟩ = Σ_{i} w_{i} ⟨F⟩_{i} ,    Σ_{i} w_{i} = 1 .                 (35)

This last expression can be written more compactly (and generally) in matrix form as

    ⟨F⟩ = Tr(ρF) ,                                                   (36)

where the density matrix ρ is defined in terms of its matrix elements:

    ρ_{kn} ≡ Σ_{i} a_{ki} a*_{ni} w_{i} .                            (37)

This expression is equivalent to writing ρ as a weighted sum of projection operators
onto the states ψ_{i}:

    ρ = Σ_{i} w_{i} |ψ_{i}⟩⟨ψ_{i}| .
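Equations (35)-(37) can be illustrated with a small numerical sketch (weights, states, and observable all invented for the purpose): the classically weighted average of quantum expectation values in (35) coincides with Tr(ρF), with ρ built as the weighted sum of projectors.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two non-orthogonal normalized states psi_i in a 3-dimensional basis {u_k},
# with classical weights w_i -- all numbers invented for illustration.
psi = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
w = np.array([0.3, 0.7])                       # w_i >= 0, sum_i w_i = 1

A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
F = (A + A.conj().T) / 2                       # Hermitian observable

# Eq. (35): classical average of quantum expectation values
avg_35 = sum(wi * (p.conj() @ F @ p).real for wi, p in zip(w, psi))

# Eq. (37): rho as the weighted sum of projectors |psi_i><psi_i|
rho = sum(wi * np.outer(p, p.conj()) for wi, p in zip(w, psi))

# Eq. (36): <F> = Tr(rho F)
avg_36 = np.trace(rho @ F).real
ok = np.isclose(avg_35, avg_36)
print(ok)
```

Note that Tr ρ = Σ_{i} w_{i} = 1 automatically, since each projector has unit trace.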
To find an equation of motion for ρ we recall that each ψ_{i} must satisfy the
Schrödinger equation iℏ ∂_{t}ψ_{i} = Hψ_{i}, and from (33) we find that this
implies the equations of motion

    iℏ ȧ_{ij} = Σ_{k} a_{ik} H_{jk} ,    H_{jk} ≡ ⟨u_{j}|H|u_{k}⟩ .  (38)

The superposed dot here denotes a total time derivative, for a_{ij} describes a particular
state and depends only on the time. One thus derives an equation of motion for ρ by direct
differentiation in (37), but this requires some prefatory comment.
Usually the weights w_{i} are taken to be constants, determined by some means outside the
quantum theory itself. In fact, they are probabilities and can be determined in principle
from the PME under constraints representing information characterizing the state
of the system. As noted earlier, however, if that information is
changing in time, as with a nonequilibrium state, then the probabilities will also be
time dependent. Hence, quite generally one should consider w_{i}=w_{i}(t); if such time
dependence is absent we recover the usual situation.
An equation of motion for ρ is now found in a straightforward manner,
with the help of (37) and (38), by computing its total time variation:

    iℏ ρ̇_{kn} = Σ_{q} ( H_{kq} ρ_{qn} − ρ_{kq} H_{qn} ) + iℏ Σ_{i} ẇ_{i} a_{ki} a*_{ni} ,   (39)

or in operator notation

    iℏ ρ̇ = [H, ρ] + iℏ ∂_{t}ρ .                                     (40)

The term iℏ ∂_{t}ρ is meant to convey only the time variation of the w_{i}.
Comparison of (40) with (32) confirms that the former is not a Heisenberg equation
of motion: the commutators have opposite signs. Indeed, in the Heisenberg picture the only time
variation in the density matrix is given by ∂_{t}ρ, which arises qualitatively in the same general
manner as P_{i}(t) in the preceding probability model.
If, in fact, the probabilities w_{i} are constant,
as in equilibrium states, then (40) verifies (31), but with a total time derivative.
Otherwise, (40) is the general equation of motion for the density matrix, such that the first term
on the right-hand side describes the usual unitary time development of an isolated system. The
presence of external sources, however, can lead to an explicit time dependence as represented
by the second term, and thus the evolution is not completely unitary; classically, Liouville's theorem is
not applicable in this case.
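The extended equation (40) admits a direct numerical check in a toy model (ℏ = 1; the Hamiltonian, the states, and the prescribed weight history w_{i}(t) below are all invented): each ψ_{i} evolves unitarily while the w_{i} vary, and a finite-difference iρ̇ is compared against [H,ρ] + i Σ_{i} ẇ_{i} |ψ_{i}⟩⟨ψ_{i}|.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
H = (A + A.conj().T) / 2                       # random Hermitian H, hbar = 1
evals, V = np.linalg.eigh(H)
U = lambda t: V @ np.diag(np.exp(-1j * evals * t)) @ V.conj().T

psi0 = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
psi0 /= np.linalg.norm(psi0, axis=1, keepdims=True)

# Invented time-dependent weights with w_1(t) + w_2(t) = 1 at all times.
w = lambda t: np.array([0.5 + 0.3 * np.sin(t), 0.5 - 0.3 * np.sin(t)])
wdot = lambda t: np.array([0.3 * np.cos(t), -0.3 * np.cos(t)])

def rho(t):
    psit = psi0 @ U(t).T                       # rows: psi_i(t) = U(t) psi_i(0)
    return sum(wi * np.outer(p, p.conj()) for wi, p in zip(w(t), psit))

t, dt = 0.4, 1e-6
lhs = 1j * (rho(t + dt) - rho(t - dt)) / (2 * dt)          # i * d(rho)/dt

psit = psi0 @ U(t).T
dtrho = sum(wd * np.outer(p, p.conj()) for wd, p in zip(wdot(t), psit))
rhs = H @ rho(t) - rho(t) @ H + 1j * dtrho                 # [H, rho] + i * sum_i wdot_i |psi_i><psi_i|
ok = np.allclose(lhs, rhs, atol=1e-6)
print(ok)
```

The agreement shows concretely how the second term of (40) is driven entirely by the ẇ_{i}, vanishing whenever the weights are held constant.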
An additional source term of this kind also appears in the work of Zubarev et al. (1996) and
Kubo et al. (1985), but of considerably different origin and unrelated to the basic probabilities.
5. Summary
Equation (40) is the sought-after extension to
macroscopic systems of the equation of motion for the density matrix. The difference between this equation
and the canonical version (1) is evident; it is the differences in their solutions that are most important.
No matter what the solution to (1), it is always equivalent to a unitary transformation of the initial density
matrix, and hence incapable of describing an irreversible process completely; some approximations to that equation,
however, may exhibit various aspects of irreversible behavior. The ρ(t) evolved by (1) in conjunction with
the total Hamiltonian is certainly a correct result of quantum mechanics, but from a macroscopic viewpoint
it is incomplete; it contains no new macroscopic information about the processes taking place. It can predict
changes in the macroscopic constraints, yet is not itself affected subsequently by those changes; it reflects changes in the
quantum-mechanical probabilities but not in the w_{i} in (37), which are determined by the external constraints.
An example illustrating these points is given in II.
To elaborate further on these differences, let us recall Boltzmann's enormously insightful relation between the
maximum entropy and phase-space volumes (or manifolds in a Hilbert space). In the form articulated by Planck
this is

    S_{B} = k ln W ,                                                 (41)

where W is a measure of the set of microscopic states compatible with the macroscopic constraints on the
system; it is a multiplicity factor. We emphasize that this expression is to be a representation of the
maximum of the information entropy and, as Boltzmann himself observed, it is not restricted to equilibrium
states. Although it is sometimes stated that (41) constitutes a proper definition of time-dependent entropy for
nonequilibrium states (e.g., Lebowitz, 1999), there is no theoretical or mathematical basis for this
assertion. Rather, Boltzmann's formulation, which can actually be expressed as a theorem (e.g.,
Grandy, 1980), provides a deep physical explanation of what is achieved by maximizing the information entropy
in the manner of Gibbs; namely, S_{B} characterizes that huge set of microstates compatible with the
macroscopic constraints, each microstate contributing to W having probability roughly equal to W^{−1}.
In addition, (41) also illustrates, through Liouville's theorem on conservation of phase
volume, why the maximum entropy itself remains unchanged under canonical (or unitary) transformation; that is, no
macroscopic information is either gained or lost during evolution under (1). Yet (41) must change under
the action of external forces; but how?
Let us return to the scenario of heating a pot of water on an electric burner, where initially the entire system
is in equilibrium with multiplicity W_{i}. As energy ΔQ is added to the water its temperature rises and
the number of microscopic configurations corresponding to the new constraints increases enormously, until at
some point the plug is pulled and the entire system relaxes to a final equilibrium state with multiplicity
W_{f} ≫ W_{i}. As a consequence, the entropy increases to S_{B}(final) > S_{B}(initial), and this is
completely equivalent to re-maximizing the information entropy subject to the constraint E_{f} = E_{i} + ΔQ.
Note, however, that one can imagine carrying out such a re-maximization at any time during the process, for
that maximization is also equivalent to acknowledging the existence of a definite phase volume of compatible
microscopic states at any instant.
Thus, the multiplicity factor W increases to its final value owing to a change in the macroscopic constraint
provided by the total energy. In turn, this can come about only through an evolution of the weights w_{i} in (39).
Because the only time variation in ρ(t) in the Heisenberg picture is that of these weights, we suspect
there may be more direct ways to determine the appropriate density matrix than by trying
to solve an incredibly complex differential equation. After all, from a macroscopic standpoint we need only know
ΔQ and the heat capacity of the water to predict its final temperature and energy.
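That macroscopic shortcut can be made concrete with illustrative numbers (assumed here, not taken from the text): given ΔQ and a constant heat capacity C, the final temperature follows directly, and the equilibrium entropy change is ΔS = C ln(T_{f}/T_{i}).

```python
import math

# Illustrative numbers, not from the text: about 1 kg of water on a burner.
C = 4186.0            # heat capacity of 1 kg of water, J/K (taken as constant)
T_i = 293.15          # initial temperature, K (20 C)
dQ = 1.0e5            # energy added, J

T_f = T_i + dQ / C                  # final temperature from macroscopic data alone
dS = C * math.log(T_f / T_i)        # equilibrium entropy change, J/K

print(round(T_f, 2), round(dS, 1))
```

Two macroscopic numbers, ΔQ and C, suffice for the prediction; no microscopic differential equation needs to be solved.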
Explicit construction of ρ(t) and S(t) appropriate for describing nonequilibrium
phenomena in these systems is carried out in Part II (following paper).
- de Groot, S.R. and P. Mazur (1962), Non-Equilibrium Thermodynamics, North-Holland, Amsterdam.
- Gibbs, J.W. (1902), Elementary Principles in Statistical Mechanics, Yale University Press, New Haven, Conn.
- Grandy, W.T. (1980), "Principle of Maximum Entropy and Irreversible Processes," Phys. Repts. 62, 175.
- Grandy, W.T. (2004), "Time Evolution in Macroscopic Systems. II: The Entropy," Found. Phys. 34, 16.
- Jaynes, E.T. (1957), "Information Theory and Statistical Mechanics," Phys. Rev. 106, 620.
- Kubo, R., M. Toda, and N. Hashitsume (1985), Statistical Physics II, Springer-Verlag, Berlin.
- van Kampen, N.G. (1962), "Fundamental Problems in Statistical Mechanics of Irreversible Processes,"
in Fundamental Problems in Statistical Mechanics, E.G.D. Cohen (ed.), North-Holland, Amsterdam; p. 173.
- van Kampen, N.G. (1971), "The case against linear response theory," Physica Norvegica 5, 279.
- Lebowitz, J.L. (1999), "Statistical mechanics: A selective review of two central issues," Rev. Mod. Phys. 71, S346.
- Mori, H. (1965), "Transport, Collective Motion, and Brownian Motion," Prog. Theor. Phys. 33, 423.
- Pauli, W. (1928), "Über das H-Theorem vom Anwachsen der Entropie vom Standpunkt der neuen Quantenmechanik,"
in Probleme der modernen Physik, Arnold Sommerfeld zum 60. Geburtstage gewidmet von seinen Schülern, Leipzig.
- Shannon, C. (1948), "A Mathematical Theory of Communication," Bell System Tech. J. 27, 379, 623.
- Zubarev, D., V. Morozov, and G. Röpke (1996), Statistical Mechanics of Nonequilibrium Processes.
Volume 1: Basic Concepts, Kinetic Theory, Akademie Verlag, Berlin.
- Zwanzig, R. (1961), "Statistical Mechanics of Irreversibility," in Lectures in Theoretical Physics, Vol. III,
W.E. Brittin, B.W. Downs, and J. Downs (eds.), Interscience, New York.
Footnotes:
^{1} In many common cases the system can be merely closed, or even open, as long as fluctuations are small; the
main requirement is that there be no net external forces. Alternatively, relaxation times are much shorter than those
required for external influences to cause appreciable perturbations.
^{2} For simplicity we shall consider a discrete set of states or alternatives, which
is equivalent to employing a representation in which ρ is diagonal. The description then appears
independent of any particular physical application.