Katz Information Theory

Information

1. Definition of information

Information is the entity which makes the difference between knowing and not knowing, between being faced with a number of possibilities and knowing the one that actually prevails. To define it quantitatively, we consider a simple case of choice between n possibilities. We hope that other cases will be reducible to this simple one or that a suitable generalization of our definition will prove possible.

For the moment let us consider the simple case in which we are faced with a choice among n possibilities. As an example we may consider an object hidden in one of n similar boxes; we do not know which one. We shall assume that the n possibilities are a complete set: one of them must be true (in the example, the object certainly is in one of the boxes). We shall assume them to be mutually exclusive: the object cannot be inside more than one box. We shall also assume them to be equally probable: no knowledge we possess tends to make us prefer one possibility to another. Our inability to decide among the n possibilities reflects a certain lack of information. It seems reasonable to take the amount of information which is missing to be a function of n, the number of possibilities. We denote the missing information by I and write

$$ I = I(n). \qquad (1.1) $$

It is natural to assume that the larger n is, the more information is missing. We therefore require that I be an increasing function of n:

$$ I(n) > I(m) \quad \text{if } n > m. \qquad (1.2) $$

When the missing information is supplied, one possibility is chosen. We then face a problem of choice with only one choice, $n = 1$. Since at this stage we consider our information complete, we require that the missing information vanish when a single possibility remains:

$$ I(1) = 0. \qquad (1.3) $$

Finally let us consider a problem of choice which consists of two independent problems of choice: one with n possibilities, the other with m possibilities. The total number of possibilities is now nm.
In the example of the boxes, imagine that each of the n boxes is divided into m compartments. To locate the object we have to know both the box and the compartment. Since the total number of compartments is nm, the missing information is now $I(nm)$. However, it can be supplied in two steps: by first identifying the box, thus supplying information in the amount $I(n)$, and only later the compartment, supplying information in the amount $I(m)$. This achieves exactly the same result as conveying the whole missing information $I(nm)$ at once. We now require that the information be additive. That is, when information is supplied piece by piece (without, however, repeating the same information), the total amount of information conveyed must be the sum of the information conveyed at each step. This requirement leads to

$$ I(nm) = I(n) + I(m). \qquad (1.4) $$

We shall now solve the requirements (1.1) through (1.4) to find the choice of functions $I(n)$ that satisfy them all.*

The function $I(n)$ is so far only supposed to be defined on the natural numbers (the positive integers). However, assuming that a function $I(n)$ is known, we can readily generalize it to all positive rational numbers by the further definition

$$ I\Big(\frac{n}{l}\Big) = I(n) - I(l). \qquad (1.5) $$

Equation (1.5) defines I for all nonintegral rational numbers $n/l$. If $n/l$ happens to be an integer, Eq. (1.5) coincides with (1.4), which by assumption is satisfied by $I(n)$. Similarly, Eq. (1.4) guarantees that $I(m/n) = I(p/q)$ whenever $m/n = p/q$ (then $mq = np$, so that $I(m) + I(q) = I(n) + I(p)$), so that the values of I are uniquely defined for all positive rational arguments.

We next notice that from Eqs. (1.2) and (1.5) it follows that I is a monotonic rising function also on the positive rational numbers. This enables us to extend the function I to all positive real numbers. For any irrational x we choose a monotonic increasing sequence of rationals $a_n$ with x as limit,

$$ \lim_{n\to\infty} a_n = x, \qquad (1.6) $$

and define $I(x)$ as

$$ I(x) = \lim_{n\to\infty} I(a_n). \qquad (1.7) $$

The sequence $I(a_n)$ is also monotonic increasing and is bounded above by $I(r)$ for any rational $r > x$; therefore the limit (1.7) exists.
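The requirements can be tested mechanically on any candidate function. The sketch below is only an illustration, not part of the derivation: it assumes the logarithmic form that Eq. (1.21) establishes, with $k = 1$ (natural logarithms), and contrasts it with a hypothetical rival $I(n) = n - 1$, which satisfies (1.2) and (1.3) but fails the additivity requirement (1.4). The last line checks that the extension (1.5) gives the same value for 2/3 and 4/6. The function names are illustrative.

```python
import math

def log_candidate(n, k=1.0):
    """Candidate I(n) = k log n; k > 0 only fixes the unit (k = 1: natural log)."""
    return k * math.log(n)

def linear_candidate(n):
    """Hypothetical rival I(n) = n - 1, used only for contrast."""
    return n - 1

def check_requirements(I, n_max=30):
    """Test (1.2) monotonicity, (1.3) I(1) = 0 and (1.4) additivity on 1..n_max."""
    ns = range(1, n_max + 1)
    increasing = all(I(n + 1) > I(n) for n in range(1, n_max))
    vanishes_at_one = math.isclose(I(1), 0.0, abs_tol=1e-12)
    additive = all(math.isclose(I(n * m), I(n) + I(m), abs_tol=1e-9)
                   for n in ns for m in ns)
    return {"(1.2)": increasing, "(1.3)": vanishes_at_one, "(1.4)": additive}

def extend_to_ratio(I, n, l):
    """Extension (1.5) to positive rationals: I(n/l) = I(n) - I(l)."""
    return I(n) - I(l)

print(check_requirements(log_candidate))     # all True
print(check_requirements(linear_candidate))  # (1.4) is False
# (1.5) is well defined: 2/3 and 4/6 receive the same value
print(math.isclose(extend_to_ratio(log_candidate, 2, 3),
                   extend_to_ratio(log_candidate, 4, 6)))
```

The rival candidate shows that monotonicity and $I(1) = 0$ alone leave the form of I wide open; it is additivity that singles out the logarithm.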
The limit (1.7) is also independent of the choice of sequence. If two sequences $a_n$, $b_n$, both monotonic increasing and tending to x, are given, we may rearrange them into a single monotonic increasing sequence tending to x. On the points of the combined sequence, I tends to a single limit, which must also be the limit of each sequence separately.

* Information is the most basic concept in the present presentation. It was therefore judged appropriate to go in some detail into the considerations identifying the unique definition which satisfies our requirements. However, the reader not interested in this mathematical part may skip it and pass to Eq. (1.21), where the form of $I(n)$ is given.

Notice that $I(x)$, now defined for all real $x > 0$, is monotonic increasing. If $x > y$ and $\lim_{n\to\infty} a_n = x$, $\lim_{n\to\infty} b_n = y$, we have

$$ I(x) - I(y) = \lim_{n\to\infty}\big[I(a_n) - I(b_n)\big] > 0. \qquad (1.8) $$

Also, since $\lim_{n\to\infty} a_n b_n = xy$, and since from Eqs. (1.4) and (1.5) we have $I(a_n b_n) = I(a_n) + I(b_n)$ for rational $a_n$, $b_n$, it follows for all positive x, y* that

$$ I(xy) = I(x) + I(y). \qquad (1.9) $$

Let us now consider a special value of x, the logarithm of which is rational:

$$ \log x = \frac{m}{n}, \qquad x = e^{m/n}; \quad m, n \text{ positive integers}. \qquad (1.10) $$

For this choice of x we have

$$ x^n = e^m. \qquad (1.11) $$

By successive application of Eq. (1.9) we obtain

$$ I(x^n) = n\,I(x), \qquad (1.12) $$

$$ I(e^m) = m\,I(e), \qquad (1.13) $$

so that

$$ n\,I(x) = m\,I(e) \qquad (1.14) $$

and

$$ I(x) = \frac{m}{n}\,I(e) = (\log x)\,I(e) = k\log x, \qquad (1.15) $$

where k stands for $I(e)$. Equation (1.15) provides an explicit form for $I(x)$ for the special values of x of Eq. (1.10). But since these special values are dense in the positive real numbers, they should suffice to fix the value of $I(x)$ everywhere if only $I(x)$ is continuous. This is the next point to check. Consider

$$ I(x + \epsilon) - I(x) = I\Big(1 + \frac{\epsilon}{x}\Big). \qquad (1.16) $$

Continuity of $I(x)$ means that

$$ \lim_{\epsilon\to 0} I\Big(1 + \frac{\epsilon}{x}\Big) = \lim_{\delta\to 0} I(1 + \delta) = 0 = I(1), \qquad (1.17) $$

and $I(x)$ is continuous for every x if it is continuous for $x = 1$; in the opposite case $I(x)$ is everywhere discontinuous. A monotonic function cannot be everywhere discontinuous.
In fact, a monotonic function is always continuous except possibly on a set of measure zero. The possibility of a discontinuous $I(x)$ is thus ruled out, and we find that $I(x)$ is continuous.

* To apply the same arguments for rational x, choose $a_n = x$ for all n.
† I owe this device to Y. Frishman.

It was not really necessary to rely on general theorems on monotonic functions, since it is quite easy to prove Eq. (1.17) directly. Consider a sequence $\epsilon_i$ which tends to zero. For every such sequence, one can find a sequence $\delta_i > 0$ with the following properties:

1. $\delta_i$ is rational.
2. $\lim_{i\to\infty} \delta_i = 0$. (1.18)
3. For every i, $e^{-\delta_i} < 1 + \epsilon_i < e^{\delta_i}$. (1.19)

Since $\log e^{\delta_i} = \delta_i$ is rational, Eq. (1.15) applies to $e^{\delta_i}$, and by monotonicity and Eq. (1.9) we now have

$$ |I(1 + \epsilon_i)| < I(e^{\delta_i}) = k\,\delta_i, \qquad (1.20) $$

from which Eq. (1.17) follows.

Now that we know that $I(x)$ is continuous, we may adopt Eq. (1.15) for all x. Going back to the integers and to our problem of choice, we find that

$$ I(n) = k\log n. \qquad (1.21) $$

This expression satisfies all our requirements. The reasoning that brought us to Eq. (1.21) shows that it is the only one that does. Equation (1.2) restricts k to be positive. Otherwise, k is left undetermined. It is an arbitrary constant which embodies the choice of units for information. We leave it arbitrary for the time being. We shall make a choice later (Chapter VI).

2. An illustration

Let us estimate the amount of information that can be conveyed by a page of symbols made by an English typewriter. There are 26 letters, each in upper and lower case, and 9 numerals (1 usually missing and replaced by l). There are punctuation marks and several other symbols, and the typewriter can also produce a space. Assume that this makes 75 different symbols, and suppose that our page may contain N such symbols. With a choice of 75 possibilities for each symbol, the total number of different pages that the typewriter may produce is $75^N$. The information missing before we look at the page, and conveyed by it, is thus

$$ I = k\log 75^N = Nk\log 75. $$

The information per symbol is $k\log 75$.
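The count can be reproduced in a few lines. The sketch below assumes, as the text does, 75 symbols; the page length N = 3000 is a hypothetical value, and the constant k is fixed as $1/\ln 2$ so that the answer comes out in bits (a unit choice the text itself defers to Chapter VI). It computes $Nk\log 75$ directly and also as $k\log 75^N$ from the exact integer count of distinct pages, to show that the two agree.

```python
import math

# Symbol count from the text: letters in two cases, numerals, punctuation,
# other symbols and the space, assumed to total 75.
SYMBOLS = 75

def page_information(n_symbols, alphabet=SYMBOLS, base=2):
    """Maximum missing information N k log(alphabet) for a page of n_symbols.

    Choosing k = 1/ln(base) expresses the result in base-`base` units (bits for 2).
    """
    return n_symbols * math.log(alphabet, base)

def page_information_by_counting(n_symbols, alphabet=SYMBOLS):
    """Same quantity from the exact number of distinct pages, log2(75**N)."""
    return math.log2(alphabet ** n_symbols)  # math.log2 accepts huge ints

N = 3000  # hypothetical page length; the text leaves N open
print(page_information(1))                # bits per symbol, about 6.23
print(page_information(N))                # about 18686 bits
print(page_information_by_counting(N))    # same value, via 75**N
```

Because the symbols are treated as independent, the information of the page is just N times the information per symbol; the exact count and the product form differ only by floating-point rounding.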
We assumed the symbols to be independent; therefore the total information is the sum of the information conveyed by each symbol.

The estimate $Nk\log 75$ is of course a maximum estimate. It is true only if we have no prior knowledge of the contents of the page. It should be valid for a page typed by an infant or in an unknown code. If we happen to know beforehand that the page is typed in English, the missing information is greatly reduced. For now we know that letters shall appear in groups with spaces in between, the groups being words existing in English. The words will be grouped into sentences, starting with a capital and ending in a full stop. Only such sentences shall appear as are allowed by English grammar. All this advance knowledge greatly reduces the amount of information missing before the page is read. If one also has some foreknowledge of the contents of the page, the information conveyed is reduced even further. For example, a pamphlet of a given party before elections conveys very little information.

To estimate the reduction in missing information in the situations described above may be quite a complicated problem. For this the reader is referred to texts on information theory. There is one aspect of the reduction of the missing information that may be understood in the present text. It is this: in any language there is a known frequency with which the various letters appear. The reduction in missing information resulting from this is treated in the next section, which is devoted to a problem of choice with preassigned probabilities.

To close the present section, let us stress that our quantitative concept of information pertains only to knowledge and has nothing to do with understanding and credibility. Thus a written page, when coupled with previously acquired experience and intelligence, may lead us to understand a great deal about a variety of things.
We are then emotionally inclined to credit the page with having taught us a great deal. What has happened is that the information contained in the page was so chosen as to be very useful to us. The quantitative concept of information has nothing to do with usefulness and measures only the amount of information, useful or useless, to which knowledge of the contents of the page corresponds. Usefulness or relevance of information is disregarded.

It may also happen that the page tells us about many things outside itself. Although it is composed of just a finite number of symbols and corresponds to one of a finite number of choices, it may give answers to a problem of choice with an infinite number of possibilities. By using a formula such as $A_n = 2n + 3n^2$, it defines an infinite sequence and answers a problem of choice between a continuous infinity of possibilities. However, whenever information about something outside the page is conveyed, it may be either true or false. The credibility of the information must be decided on the basis of other information outside the page. The page itself conveys only the knowledge of its contents, and that is what I measures.

3. Missing information when a probability distribution is given

Let us return to our problem of choice between n possibilities, but this time let there be a probability assigned to each possibility. The probabilities are nonnegative numbers,

$$ P_\alpha \ge 0, \qquad \alpha = 1, \ldots, n, \qquad (3.1) $$

which add up to one:

$$ \sum_{\alpha=1}^{n} P_\alpha = 1. \qquad (3.2) $$

To evaluate the missing information we have to reduce this case to the former case of equally probable possibilities. For this we have to be clear about what we mean in saying that the probability of the possibility α is $P_\alpha$. In this we are getting into deep water.
The only way of turning probabilities into a definite statement is to say that if faced with not one but a large number of similar independent problems of choice, then the frequencies of the different answers should become proportional to the probabilities as the number of problems grows to infinity. In fact, whenever a physical theory predicts probabilities, the way to check the prediction experimentally is to repeat the experiments many times. However, not all experiments can be repeated. In everyday life many things may be tried only once. An executive contemplating a given investment may think in terms of probabilities of success and failure. But the given investment may be tried only once. Also, in case of failure the executive may never find himself in a position to try any other investment again. Strictly speaking, the universe changes continuously, and no experiment can be tried more than once under exactly the same conditions. Nevertheless, probabilities are a useful tool of thinking. When our theory of guessing is complete (Chapter IV), we shall be assigning probabilities even without experiment. There is more in the concept of probability than a prediction of frequencies.

But when we have to evaluate the missing information we must resort to imagining that the problem of choice between n possibilities may be replaced by N independent similar problems. As $N \to \infty$ we know that the possibility α is the correct one in $NP_\alpha$ cases. What we do not know is the order in which the various results will appear. Which of the problems will be the $NP_\alpha$ with answer α? All orders are equally possible, and their total number is $N!/\prod_\alpha (NP_\alpha)!$. When the order is specified, no more information is missing. The missing information for the N problems is thus

$$ I_N = k\log\frac{N!}{\prod_\alpha (NP_\alpha)!} = k\Big[\log N! - \sum_\alpha \log\,(NP_\alpha)!\Big]. \qquad (3.3) $$

We only want $I_N$ as $N \to \infty$. We may thus use the asymptotic formula*

$$ \log N! = N\log N - N + O(\log N) \qquad (3.4) $$

and restate Eq. (3.3) as

$$ I_N = k\Big\{N\log N - N - \sum_\alpha\big[NP_\alpha\log(NP_\alpha) - NP_\alpha\big]\Big\} + O(\log N). $$
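The passage from the exact count of orders in Eq. (3.3) to its large-N form can be checked numerically. The sketch below is a minimal illustration, assuming natural logarithms ($k = 1$) and a hypothetical three-outcome distribution chosen so that every $NP_\alpha$ is an integer. Instead of Stirling's formula it uses `math.lgamma(x + 1)` $= \log x!$, so the exact per-problem value $I_N/N$ can be compared with the value $-k\sum_\alpha P_\alpha\log P_\alpha$ it should approach.

```python
import math

def missing_information(p, k=1.0):
    """Per-problem value -k sum_alpha P_alpha log P_alpha; P_alpha = 0 adds nothing."""
    return -k * sum(pa * math.log(pa) for pa in p if pa > 0)

def exact_per_problem(p, N, k=1.0):
    """Eq. (3.3) divided by N: k log[N! / prod (N P_alpha)!] / N.

    Assumes every N * P_alpha is an integer; lgamma(x + 1) equals log x!.
    """
    counts = [round(N * pa) for pa in p]
    assert sum(counts) == N, "N * P_alpha must be integers summing to N"
    log_orders = math.lgamma(N + 1) - sum(math.lgamma(c + 1) for c in counts)
    return k * log_orders / N

p = [0.5, 0.25, 0.25]  # hypothetical distribution
for N in (4, 40, 400, 4000, 40000):
    # exact value per problem creeps up toward the limiting value
    print(N, exact_per_problem(p, N), missing_information(p))
```

The gap between the two columns shrinks roughly like $(\log N)/N$, which is the $O(\log N)$ remainder divided by N. The same helper reproduces the limiting cases: a uniform distribution over n outcomes gives $k\log n$, and a distribution concentrated on one outcome gives zero.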
Since $\sum_\alpha P_\alpha = 1$, the sum over α in the restated form of Eq. (3.3) equals $N\log N + N\sum_\alpha P_\alpha\log P_\alpha - N$, and the expression reduces to

$$ I_N = -Nk\sum_\alpha P_\alpha\log P_\alpha + O(\log N). \qquad (3.5) $$

The missing information is thus proportional to N, the number of problems. The missing information per problem is

$$ I = -k\sum_\alpha P_\alpha\log P_\alpha. \qquad (3.6) $$

* The formula $\log N! = N\log N - N$ also happens to hold for $N = 0$ (with $0\log 0 = 0$). This takes care of the possibility that some of the $P_\alpha$ vanish, in which case $NP_\alpha$ does not become large as $N \to \infty$.

Coming back to a single problem with a given probability distribution, we adopt Eq. (3.6) as the expression for the information still missing after the probability distribution is known. When all possibilities are equally probable, $P_\alpha = 1/n$ for each α, and Eq. (3.6) becomes identical with the original definition (1.21). We may now view Eq. (1.21) as a special case of (3.6). Also, when a choice $\alpha_0$ is definitely known and we have $P_\alpha = \delta_{\alpha\alpha_0}$, Eq. (3.6) vanishes, as it should.

4. An infinite choice problem

We shall now generalize our considerations to a problem of choice among an infinite number of possibilities. Let us first consider a discrete infinity. We have the possibilities α, with $\alpha = 1, \ldots, \infty$. This may be considered the limit of the choice between n possibilities as $n \to \infty$. If nothing is known, then the missing information, according to Eq. (1.21), is

$$ I = \lim_{n\to\infty} k\log n = \infty. \qquad (4.1) $$

An infinite amount of information is necessary to choose one of an infinity of equally probable possibilities. On the other hand, if a probability distribution is given, from Eq. (3.6) we have

$$ I = -k\sum_{\alpha=1}^{\infty} P_\alpha\log P_\alpha, \qquad (4.2) $$

and this expression may well be finite.

5. A continuous choice problem

Let us now suppose that we are faced with a choice between a continuous set of possibilities labeled by a variable x, varying in the range $a \le x \le b$. Let us also assume for this case a continuous probability distribution function $P(x)$, which satisfies

$$ P(x) \ge 0, \qquad \int_a^b P(x)\,dx = 1. \qquad (5.1) $$

The meaning of $P(x)$ is that the probability that the actual possibility will be labeled by a value between x and $x + dx$ is* $P(x)\,dx$.

* We consider here only $P(x)$ which are continuous functions. In general, a probability distribution may be a more general function or even a measure (which is a generalized function and need not be a function). The more general cases may be considered as limits of sequences of continuous functions. We shall not need them. The arguments that follow apply directly to any $P(x)$ which is Riemann-integrable.

We must now consider our continuous problem of choice to be the limiting case of a discrete problem, so that our expression for the missing information may apply. The simplest way to achieve this is to divide the segment from a to b into n equal subsegments (we shall call them cells), choose between the cells, and then let $n \to \infty$. (The name cell derives from the same problem in many dimensions.) The cells are labeled by $\alpha = 1, \ldots, n$. The cell α lies between†

$$ x = a + (\alpha - 1)\frac{b - a}{n} \quad\text{and}\quad x = a + \alpha\,\frac{b - a}{n}. $$

The probability that the cell α contains the right answer is

$$ P^n_\alpha = \int_{a + (\alpha - 1)(b - a)/n}^{a + \alpha(b - a)/n} P(x)\,dx. \qquad (5.2) $$

The information required to specify the cell containing the right answer is, by our previous formalism,

$$ I_n = -k\sum_{\alpha=1}^{n} P^n_\alpha\log P^n_\alpha = -k\sum_{\alpha=1}^{n}\Big[\int_{\text{cell }\alpha} P(x)\,dx\Big]\log P^n_\alpha. \qquad (5.3) $$

We now express $P^n_\alpha$ of Eq. (5.2) as

$$ P^n_\alpha = \frac{b - a}{n}\,\bar P_\alpha, \qquad (5.4) $$

where $\bar P_\alpha$ is a number between the upper and lower bounds of $P(x)$ within the cell and $(b - a)/n$ is the size of the cell. Substituting this value of $P^n_\alpha$ into Eq. (5.3) and using $\sum_\alpha P^n_\alpha = 1$, we obtain

$$ I_n = -k\sum_\alpha \frac{b - a}{n}\,\bar P_\alpha\Big(\log\bar P_\alpha + \log\frac{b - a}{n}\Big) = -k\sum_\alpha \frac{b - a}{n}\,\bar P_\alpha\log\bar P_\alpha - k\log\frac{b - a}{n}. \qquad (5.5) $$

When n tends to infinity, the first term in Eq. (5.5) becomes $-k\int_a^b P(x)\log P(x)\,dx$, and the second term becomes $+\infty$. Thus the information required to pick out one point in a continuum is infinite. However, we notice that the infinite term in Eq. (5.5) depends only on the size of the cell and is one and the same for all probability distributions. The difference between the missing information for two probability distributions $P_1(x)$ and $P_2(x)$ can still be meaningful and is the difference between the integrals

$$ -k\int_a^b P_1(x)\log P_1(x)\,dx \quad\text{and}\quad -k\int_a^b P_2(x)\log P_2(x)\,dx. $$

† Since the probability of any single point is zero, we do not bother to specify to which cell the limit points themselves belong.

We therefore "renormalize" the missing information by subtracting the infinite constant $-k\lim_{n\to\infty}\log[(b - a)/n]$ and adopt the form

$$ I = -k\int_a^b P(x)\log P(x)\,dx. \qquad (5.6) $$

By the renormalization we have violated the requirement that the missing information vanish when the answer is specified. As $P(x)$ approaches a δ-function distribution, the new I of Eq. (5.6) tends to $-\infty$, but all other requirements on I are still satisfied. We cannot really hope to do any better in the continuous case, since for any reasonable probability distribution, however sharply peaked, there exists a neighborhood of the peak where $P(x)$ is almost constant. This neighborhood contains an infinity of points to choose from, which requires, by the considerations of the last section, infinite information.

We must next notice that the form (5.6) results from the use of equal cells in the variable x. However, the various possibilities could also have been specified by any other variable $y = \varphi(x)$, with $\varphi(x)$ monotonic and continuous between a and b. The probability of finding the answer between $y_1$ and $y_2$, both between $\varphi(a)$ and $\varphi(b)$, is

$$ \int_{x(y_1)}^{x(y_2)} P(x)\,dx = \int_{y_1}^{y_2} P[x(y)]\,\frac{dx}{dy}\,dy, \qquad (5.7) $$

with $x(y)$ the solution for x of $y = \varphi(x)$. We see that we could have started with the variable y, using the probability distribution $P[x(y)]\,dx(y)/dy$. If we did this, we should use equal cells in y (which usually are not equal cells in x) and should be led to

$$ I_y = -k\int_{\varphi(a)}^{\varphi(b)} P[x(y)]\frac{dx}{dy}\log\Big\{P[x(y)]\frac{dx}{dy}\Big\}\,dy = -k\int_a^b P(x)\log P(x)\,dx + k\int_a^b P(x)\log\frac{dy}{dx}\,dx. \qquad (5.8) $$

This result differs from Eq. (5.6) by the second term, which does not generally vanish. We thus see that in the continuous case a choice has to be made of a basic variable to use before the (renormalized) missing information can be uniquely defined. This variable is the one in which equal cells are considered as "equally probable, a priori," meaning that if no information on probabilities were available we should judge them as equally probable.
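The renormalization and the dependence on the basic variable can be illustrated numerically. The sketch below assumes natural logarithms ($k = 1$) and a hypothetical density $P(x) = 2x$ on $[0, 1]$, for which the cell probabilities (5.2) are exact: $(\alpha/n)^2 - ((\alpha - 1)/n)^2$. It shows the discrete information (5.3) growing like $\log n$, the renormalized value approaching $-\int P\log P\,dx = 1/2 - \log 2$ (negative, which the renormalized form allows), and, for the hypothetical change of variable $y = x^2$, the correction term of Eq. (5.8) cancelling that value, since $P[x(y)]\,dx/dy = 1$ is uniform in y. Integrals are approximated with a plain midpoint rule, an assumption not taken from the text.

```python
import math

def P(x):
    """Hypothetical density on [0, 1]; it integrates to 1."""
    return 2.0 * x

def cell_information(n, k=1.0):
    """Eq. (5.3): -k sum P_alpha log P_alpha over n equal cells of [0, 1].

    For P(x) = 2x the cell probability (5.2) is exactly
    (alpha/n)**2 - ((alpha - 1)/n)**2, which is positive for every alpha.
    """
    total = 0.0
    for alpha in range(1, n + 1):
        p_alpha = (alpha / n) ** 2 - ((alpha - 1) / n) ** 2
        total -= k * p_alpha * math.log(p_alpha)
    return total

def renormalized(n, k=1.0):
    """Eq. (5.5) minus its divergent term -k log(cell size); cell size is 1/n here."""
    return cell_information(n, k) + k * math.log(1.0 / n)

def midpoint(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

exact = 0.5 - math.log(2.0)  # -integral of 2x log(2x) over [0, 1]

# Eq. (5.6) in the variable x
I_x = midpoint(lambda x: -P(x) * math.log(P(x)), 0.0, 1.0)
# Hypothetical change of variable y = x**2, so dy/dx = 2x: second term of Eq. (5.8)
correction = midpoint(lambda x: P(x) * math.log(2.0 * x), 0.0, 1.0)
I_y = I_x + correction  # near 0, because the density in y is uniform on [0, 1]

for n in (10, 100, 1000, 10_000):
    print(n, round(cell_information(n), 4), round(renormalized(n), 4))
print("exact", round(exact, 4), "I_x", round(I_x, 4), "I_y", round(I_y, 6))
```

The unrenormalized column keeps growing with n, while the renormalized one settles; the last line shows that the same distribution gets different renormalized values in x and in y, which is the point made after Eq. (5.8).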
The generalization to a continuous probability distribution defined over an infinite range is straightforward.

3 Statistical Mechanics

1. Introduction

We shall now go into the formalism of statistical mechanics. Any such formalism is of course superimposed over a microscopic mechanical theory. As our microscopic theory we could choose either classical mechanics or quantum mechanics. In principle we could have done everything quantum mechanically. As far as we know, quantum mechanics is the correct description of nature. Also, the passage to classical theory may always be achieved by taking the $h \to 0$ limit. There are cases, however, in which the quantum treatment is considerably more complicated than the classical treatment, and no quantum effects in the results are of any importance. In these cases it is preferable to use the classical formalism to begin with.

In the present text we shall carry both the classical and the quantum mechanical formalisms side by side. This will not be difficult, since the statistical treatment of both formalisms is very similar. It will also allow readers not familiar with quantum mechanics to follow the presentation of the statistical theory. Following the above policy, the present chapter will deal first with classical statistical mechanics (Sections 2 to 5), then with quantum statistical mechanics (Sections 6 to 8). In both the classical and the quantum mechanical parts we shall use the nonrelativistic (Galilean) theory. Our principles could have been applied to the relativistic case just as well.

2. Classical mechanics

In this section and the next we review classical mechanics as we shall use it. In classical mechanics the state of our system is specified by a set of generalized coordinates $q_i$, with $i = 1, \ldots, N_f$, and their conjugate canonical momenta $p_i$, with $i = 1, \ldots, N_f$, where $N_f$ is the "number of degrees of freedom." The simplest example of canonical coordinates is provided by the components of
