Next: The Second Thread Up: Resource Letter ITP:1: Information Previous: Resource Letter ITP:1: Information

Historical Introduction

It has long been understood that physics and the notion of information are intimately related; indeed, information is the lifeblood of all science. In a very real sense the differential equations of physics are simply algorithms for processing the information contained in initial conditions. Data obtained by experiment and observation, sense perceptions, and communication either are, or contain, information forming the basis of our understanding of nature. Yet an unambiguous, clear-cut definition of information remains as slippery as that of randomness, say, or complexity. Is it merely a set of data? Or is it itself physical? If the latter, then, as Einstein once remarked of the ether, it has no definite spacetime coordinates. Most physicists would agree that the only valid means of knowing the physical world is by obtaining information through observation and measurement. A general definition of the term nevertheless remains elusive, and the considerable effort devoted to the task has reached no definite conclusions (Refs. 13, 14).

The difficulty is somewhat similar to that of attempting to explain the origin and meaning of inertia to beginning students. While the term can seem a bit obscure in its meaning, there is no ambiguity in defining inertial mass as its measure, and the concept becomes scientifically useful. Similarly, the general notion of information becomes a scientific one only if it is made measurable. This was first done in a tentative way by Nyquist (Ref. 1) in 1924, and then quite clearly by Hartley (Ref. 2) in 1928. Hartley was interested in the transmission of information in telegraphy and telephony, and concluded that a proper quantitative measure of the information in a message (symbol sequence) is the logarithm of the number of equivalent messages that might have been sent. For example, if a message consists of a sequence of k choices made from n symbols at each selection, then the number of equivalent messages is $n^k$ and transmission of any one of these conveys an amount of information $k \log n$. Implicit here is a presumption that all messages are equally likely.
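Hartley's measure is easy to check numerically. The following is a minimal sketch, assuming the measure is the logarithm of n^k, the number of equally likely messages; the function name `hartley_information` and the choice of base 2 as default are illustrative assumptions, not taken from Hartley's paper.

```python
import math


def hartley_information(n_symbols: int, k_choices: int, base: float = 2.0) -> float:
    """Hartley's measure: log of the number of equally likely messages, n**k.

    The name and the default base are assumptions made for illustration.
    """
    if n_symbols < 1 or k_choices < 0:
        raise ValueError("need n_symbols >= 1 and k_choices >= 0")
    # log(n**k) = k * log(n); the product form avoids building a huge integer.
    return k_choices * math.log(n_symbols, base)


if __name__ == "__main__":
    # Five selections from a 26-letter alphabet, measured in bits.
    print(hartley_information(26, 5))
```

Because the measure is logarithmic, doubling the message length doubles the information, which is the additivity Hartley was after.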

Quite appropriately, modern information theory had its origins in the theory of communication, though this is only one of the threads in the tapestry. From these heuristic beginnings there developed an elegant and complete theory by 1948, produced almost simultaneously by Norbert Wiener (Ref. 3) and Claude Shannon (Ref. 4). Wiener's contribution first appears in his book Cybernetics, the scope of his interests indicated by the subtitle "control and communication in the animal and the machine." Influenced by John von Neumann, he introduces as a measure of the information associated with a probability density function f(x) the quantity

$$ \int_{-\infty}^{\infty} f(x) \log_2 f(x) \, dx \qquad (1) $$

and applies it to a theory of messages in various systems. The similarity of this expression to some encountered in statistical mechanics did not escape Wiener's attention.

At virtually the same time, Shannon realized that the basic problem in sending and receiving messages was a statistical one, and he extended Hartley's ideas to situations in which the possible messages were not all equally probable (though they are presumed to constitute an exhaustive and mutually exclusive set, so that the probabilities sum to unity). If messages are composed from an alphabet A with n symbols having probabilities of transmission $\{p_1, \ldots, p_n\}$, the amount of information in a message is defined as

$$ H(A) = -K \sum_{i=1}^{n} p_i \log p_i , \qquad (2) $$

where K is a positive units-dependent constant. Shannon arrived at this expression through arguments of common sense and consistency, along with requirements of continuity and additivity. Because information is often transmitted in strings of binary digits (0s and 1s), it is conventional in communication theory to take the logarithm to the base 2 and measure H in bits. Thus, H quantifies the average information per symbol of input, measured in bits. Note that if the symbols are equally probable then, because $\sum_i p_i = 1$, each $p_i = 1/n$ and we regain Hartley's result of maximum information. If, however, one symbol is transmitted with unit probability it follows that H(A)=0 and no information is contained in a message whose content is known in advance.
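The two limiting cases just described can be verified directly. This is a minimal sketch, assuming Shannon's measure in the standard form H = -K Σ p_i log p_i with K = 1 and base-2 logarithms, so that H is in bits per symbol; the function name `shannon_entropy` is an illustrative choice.

```python
import math
from typing import Sequence


def shannon_entropy(probs: Sequence[float], base: float = 2.0) -> float:
    """Average information per symbol, -sum(p * log p), with K = 1 (assumed)."""
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to unity")
    # Symbols with p == 0 are skipped: p * log p tends to 0 as p tends to 0.
    return sum(-p * math.log(p, base) for p in probs if p > 0)


if __name__ == "__main__":
    n = 4
    uniform = [1.0 / n] * n
    certain = [1.0, 0.0, 0.0, 0.0]
    skewed = [0.5, 0.25, 0.125, 0.125]
    print("uniform:", shannon_entropy(uniform), "log2(n):", math.log2(n))
    print("certain:", shannon_entropy(certain))
    print("skewed: ", shannon_entropy(skewed))
```

The uniform distribution reproduces Hartley's log n per symbol, the certain message carries nothing, and any other distribution falls in between.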

One might object that there is indeed information in this latter event, but it is simply not useful. It is not the intent of the definition (2) to judge usefulness, however, nor is there any meaning to be attributed to a piece of information. Shannon originally thought of naming his measure "uncertainty", but von Neumann urged him to call it entropy (perhaps accounting for the Greek letter H), arguing that a similar expression already existed in statistical mechanics. Thus was unleashed a flood of mischief that has yet to abate completely.

With this measure of the information required to estimate which message had been sent, Shannon laid the foundations of the modern mathematical theory of communication. If a communication channel (e.g., a telephone line) is noise-free, then one can expect the output message to reproduce faithfully the input. This is rarely the case, so one is led to introduce as well an output alphabet B with m symbols. The properties of the noisy channel can be described by conditional probabilities p(i|j), in terms of which one defines the mutual information

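$$ H(B;A) = K \sum_{i=1}^{n} \sum_{j=1}^{m} p(i|j) \, q_j \log \frac{p(i|j)}{p_i} , \qquad (3) $$

where $q_j$ is the probability of receiving output symbol j, and p(i|j) is taken to be the probability that input symbol i was sent given that j was received.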

It is this quantity, which reduces to H(A) in a noiseless channel, that Shannon employed to state one of the most important results of his theory. The capacity C of a communication channel is the maximum rate, in bits per second, at which information can be transmitted from input to output. It is then a theorem that, with suitable coding and decoding, information can be transmitted with arbitrarily small probability of error at any rate below C, whereas any attempt to transmit at a rate beyond capacity inevitably results in errors. Formally, C is proportional to the maximum of H(B;A) over all possible input probability distributions $\{p_i\}$. As an example, for a single channel with additive white Gaussian noise having power spectrum S, bandwidth B, and signal power P, the capacity is

$$ C = B \log_2 \left( 1 + \frac{P}{SB} \right) . \qquad (4) $$
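A short numerical sketch may help make the channel quantities concrete. It rests on several assumptions not fixed by the text: the channel is described by forward transition probabilities `channel[i][j]` (probability of output j given input i), the mutual information is computed in its standard symmetric form with K = 1 and base-2 logarithms, and the Gaussian-channel capacity is taken in the standard Shannon-Hartley form C = B log2(1 + P/(S B)), with S read as noise power per unit bandwidth. The names `mutual_information` and `awgn_capacity` are illustrative.

```python
import math
from typing import Sequence


def mutual_information(p_in: Sequence[float],
                       channel: Sequence[Sequence[float]]) -> float:
    """Mutual information in bits between channel input and output.

    channel[i][j] is assumed to be P(output j | input i); each row sums to 1.
    """
    n_out = len(channel[0])
    for row in channel:
        if len(row) != n_out or not math.isclose(sum(row), 1.0, abs_tol=1e-9):
            raise ValueError("each channel row must be a distribution")
    # Joint probabilities p(i, j) and output marginals q_j.
    joint = [[p * t for t in row] for p, row in zip(p_in, channel)]
    q_out = [sum(joint[i][j] for i in range(len(p_in))) for j in range(n_out)]
    total = 0.0
    for i, p in enumerate(p_in):
        for j, q in enumerate(q_out):
            pij = joint[i][j]
            if pij > 0:  # zero-probability pairs contribute nothing
                total += pij * math.log2(pij / (p * q))
    return total


def awgn_capacity(bandwidth_hz: float, signal_power: float,
                  noise_density: float) -> float:
    """Shannon-Hartley capacity in bits per second (assumed form, see lead-in)."""
    noise_power = noise_density * bandwidth_hz
    return bandwidth_hz * math.log2(1.0 + signal_power / noise_power)


if __name__ == "__main__":
    p_in = [0.5, 0.25, 0.25]
    noiseless = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print("noiseless channel:", mutual_information(p_in, noiseless))
    # Binary symmetric channel with 10% crossover probability.
    bsc = [[0.9, 0.1], [0.1, 0.9]]
    print("noisy binary channel:", mutual_information([0.5, 0.5], bsc))
    print("AWGN capacity:", awgn_capacity(1000.0, 3.0, 0.001))
```

For the noiseless channel the mutual information equals the input entropy, as stated above, and noise can only reduce it. Maximizing `mutual_information` over `p_in` for a fixed channel would give the capacity per symbol; that optimization is left out of this sketch.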





W.T. Grandy Jr.
Wed Nov 20 16:12:26 GMT-0600 1996