
CHAPTER 1

STATISTICAL PRELIMINARIES

1.1 Introductory Remarks

In the large, the subject of this book is statistical communication theory. In detail, it is an application of modern statistical methods to random phenomena which influence the design and operation of communication systems. The specific treatment is confined to certain classes of electrical and electronic devices, e.g., radio, radar, etc., and their component elements, although the general statistical methods are applicable in other areas as well. Among such may be mentioned the theory of the Brownian motion, the turbulent flow of liquids and gases (in hydro- and aerodynamics), corresponding communication problems in acoustical media, and various phenomena in astronomy, in cosmic radiation, and in the actuarial and economic fields.

In the present case, the phenomena we have mainly to deal with are the fluctuating, or random, currents and voltages that appear at different stages of the communication process—i.e., the inherent, or background, noise which arises in the various system elements, noise and signals originating in the medium of propagation, and the desired messages, or signals, which are generated in or impressed upon the system at various points.† Phenomena, or processes, of this type are characterized by unpredictable changes in time: they exhibit variations from observation to observation which no amount of effort or control in the course of a run or trial can remove. However, if they show regularities or stabilized properties as the number of such runs or observations is increased under similar conditions, these regularities are called statistical properties and it is for these that a mathematical theory can be constructed. Physical processes in the natural, or real, world which possess wholly or in part a random mechanism in their structure and therefore exhibit this sort of behavior are called random or stochastic‡ processes, the latter term being frequently used to call attention to their time-dependent nature. We shall also refer to their mathematical description as a stochastic process, remembering that the analytical model here is always a more or less precise representation of a corresponding set, or ensemble, of possible events in the physical world, possessing properties of regularity in

† For a more general definition of signal and noise, see Sec. 1.3-5.
‡ From στόχος, meaning "chance."


the above sense. Since the theory of probability is the mathematical theory of phenomena which show statistical regularity, our methods of treating random processes are based on such concepts. The necessity of a statistical approach stems, of course, from the fact not only that it is usually impossible to specify the initial states of many physical systems with sufficient accuracy to yield unique descriptions of final states but also that the very laws of nature which we invoke are themselves idealizations, which ignore all but the principal characteristics of the model and of necessity omit the perturbations. Even then, a detailed application is often unworkable because of the inherent complexity of the system, so that a statistical treatment alone is productive.

It has been widely recognized since about 1945 that an effective study of communication systems and devices cannot generally be carried out in terms of an individual message or signal alone,1,2† nor can such a study safely neglect the inhibiting effects of the background noise or other interference on system performance.3 Rather, one must consider the set, or ensemble, of possible signals for which the system is designed, the ensemble appropriate to the accompanying noise, and the manner in which they combine in the communication process itself. For those systems and system inputs, then, which possess statistical regularity (and fortunately these are usually the physically important ones here), we expect probability methods to provide the needed approach. Our concern with the properties of individual signals and associated system behavior, characteristic of earlier treatments, remains, but the emphasis is shifted now to the properties of the ensemble as a whole. It is these which are ultimately significant in analysis, design, and performance. Thus, when we speak of the probability of an event at a single observation, such as the presence (or absence) of a particular signal in a particular noise sample, we recall that this has meaning only in the context of an ensemble of such observations and that system performance is properly judged in terms of this ensemble, and not on the basis of an individual run alone.

Communication systems may be broadly described in terms of the operations which they perform on postulated classes of inputs. Chief among such operations electronically are the typical linear‡ ones of amplification, differentiation, smoothing (or integration, e.g., linear filtering) and such nonlinear‡ ones as modulation, demodulation, distortion, detection, etc., included under the general term of rectification. From the statistical viewpoint, these operations have their mathematical counterparts in corresponding transformations (e.g., translation, multiplication, differentiation, integration, etc.) of the signal and noise ensembles. In this way, the study of communication systems in a general sense becomes one of determining the statistical properties associated with the new ensembles that result from

† Superscript numerals are keyed to the References at the end of the chapter.
‡ For a definition of linear and nonlinear operation, see Sec. 2.1-3(2).


these various linear and nonlinear transformations. Just what transformations are important in a given situation depends, of course, on the purpose or intent of the particular communication process. Therefore, from a still broader viewpoint not only have we to consider ensemble properties—i.e., the statistical properties of signals and noise, including those for transformed ensembles—but we have also to determine which ensembles and transformations to select when the consequences of an action or decision are specifically made a part of system function itself. For this purpose, statistical decision theory4 provides necessary concepts and techniques.5,6

Before moving on to examine the salient features of signal and noise ensembles, let us consider very briefly a few results of probability theory, after which it will be possible to discuss more precisely the notion of a stochastic process and to proceed thus to the main task of the book.

1.2 Probability Distributions and Distribution Densities†

In this section, we summarize in a formal way some of the results of probability theory that are required for our subsequent discussion; for a detailed and rigorous treatment, the reader is referred to standard works.‡ Let us begin, then, with the notion of a random variable.§

Consider a definite random experiment E, which may be repeated under similar (controlled) conditions a large number of times. Furthermore, let the result of each particular trial, or run, of this experiment be given by a (single real) quantity X. Then let us introduce a corresponding variable point X in a probability, or measure, space R₁. X is called a (one-dimensional) random variable. Accordingly, if a value xₖ is associated with a particular event or outcome of the experiment E on a particular trial, we say that the random variable X has a probability P(X = xₖ) = pₖ of assuming the realized value xₖ. In this way, probability is defined as a number which is associated with a possible outcome of the random experiment E; pₖ, of course, is equal to or greater than 0 but cannot exceed 1. In fact, pₖ = 0 is to be interpreted that X takes the value xₖ a negligible fraction of the total number of trials in the ensemble of possible runs, while pₖ = 1 is equivalent to certainty that X has the value xₖ on all (but a negligible portion of the total possible number of) runs in the ensemble.
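The interpretation of pₖ as the long-run fraction of trials on which X takes the value xₖ can be seen in a short numerical experiment. The sketch below is a modern illustration added here, not part of the original text; the particular values and probabilities are arbitrary choices.

```python
import numpy as np

# Hypothetical discrete random variable: values x_k with assigned probabilities p_k.
x_vals = np.array([0.0, 1.0, 2.5])
p_vals = np.array([0.2, 0.5, 0.3])

rng = np.random.default_rng(0)
for n_trials in (10, 100, 10_000, 1_000_000):
    samples = rng.choice(x_vals, size=n_trials, p=p_vals)
    # Relative frequency of each value x_k over this ensemble of trials
    freqs = [(samples == x).mean() for x in x_vals]
    print(n_trials, np.round(freqs, 3))
# As the number of trials grows, the relative frequencies stabilize near
# p_k = (0.2, 0.5, 0.3): the "statistical regularity" on which the
# probability measure is based.
```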

The concept of random variable can be extended. If the outcome of a particular trial is the set of values x = (x₁, . . . , xₙ), we then introduce a corresponding variable vector point X = (X₁, . . . , Xₙ) in probability, or measure, space Rₙ and say that X is a vector random variable, or an n-dimensional random variable. We remark, for both X and the vector X, that the distribution of the points in measure space, or, more suggestively, the "mass" (i.e., the pₖ for X, etc., above), or weighting associated with the points of measure space, is equal to the probability measure, or, equivalently, the probability, associated with the corresponding set of values of the random variable.

† Section 1.2 may be omitted by the reader familiar with probability theory.
‡ Among these, see in particular Cramér.7 The modern mathematical foundations and development of probability theory stem mainly from the work of Kolmogoroff,8 Borel,9 Fréchet,10 and others. From a more applied point of view, we mention, for example, Uspensky11 and Feller.12 We shall use Cramér in this section as our principal reference for the detailed exposition and points of rigor as well as for an extensive bibliography of the pertinent mathematical literature.
§ As an introduction, see Cramér, op. cit., chap. 13, and also Arley and Buch,13 chaps. 1 and 2; fundamental definitions and axioms are considered in Cramér, chap. 14.

A SINGLE RANDOM VARIABLE

1.2-1 Distribution Functions.† Consider a single random variable X; from the above remarks we write the distribution of X as the function

$$D(x) = P(X \le x) \tag{1.1}$$

where P(X ≤ x) is the probability that X takes any one of the allowed values x, or less, on any given trial of the experiment E; in short form, we write D(x) = d.f. of X. The general properties that the distribution must possess‡ are

(1) D(x) ≥ 0, −∞ < x < ∞, all x
(2) 0 ≤ D(x) ≤ 1; D(∞) = 1, D(−∞) = 0
(3) Nondecreasing (point) function as x → ∞        (1.2)
(4) Everywhere continuous to the right

The probability that our random variable X takes a value in the interval (a < X ≤ b) is thus

$$D(b) - D(a) = P(a < X \le b) \tag{1.3}$$

The derivative of D(x), when it exists, is defined by

$$\lim_{\epsilon \to 0} \frac{D(x + \epsilon) - D(x - \epsilon)}{2\epsilon} = w(x) \tag{1.4}$$

if x is a point of continuity; w(x) is the density of "mass" at x and is called the probability density (p.d.), or frequency function (f.f.), of the distribution D. Properties of w(x), corresponding to those of D(x), are easily deduced from the above.

(1) w(x) ≥ 0, all x
(2) $\int_{-\infty}^{\infty} w(x)\,dx = 1$        (1.5)
(3) $D(x) = \int_{-\infty}^{x+} w(x)\,dx$

The quantity w(x) dx is the probability of the random variable X having the value x in the range (x, x + dx) and is often called the probability element of x.†

† Cramér, op. cit., chap. 15.
‡ Ibid., chap. 6, sec. 6.6.

1.2-2 Functions of a Random Variable.‡ Any (real) function Y = g(X) of the (real) random variable X is itself a random variable, with its own probability measure. Thus, its distribution function is

$$D(y) = P(Y \le y) \tag{1.6}$$

where y are the values corresponding to x, obtained from the transformation y = g(x); D(y) possesses exactly the same general properties [Eq. (1.2)] as does D(x). These notions may be extended to complex functions X = X₁ + iX₂ (X₁, X₂ real), Y = Y₁ + iY₂ (Y₁, Y₂ real).§ One has now a pair of random variables X₁, X₂, or Y₁, Y₂, with their appropriate distributions, etc. (see Secs. 1.2-8 to 1.2-14).

1.2-3 Discrete and Continuous Distributions. Most of our applied problems fall into three classes, those possessing (1) discrete, (2) continuous, or (3) mixed distributions. The discrete distribution (class 1) is one for which the weighting of the distribution of the random variable X is concentrated at a discrete set of points, such that any finite interval in measure space contains at most a finite number of such "mass" points. As an example, consider the mass points x₁, x₂, . . . , xₙ, with the corresponding "masses," or probabilities, p₁, p₂, . . . , pₙ, such that

$$\sum_{k=1}^{n} p_k = 1$$

(for unity measure), with pₖ > 0 (k = 1, . . . , n). As shown in Fig. 1.1, the d.f. of X is here

$$D(x) = P(X \le x) = \sum_{x_k \le x} p_k \tag{1.7}$$

where P(X = xₖ) = pₖ and P(X ≠ xₖ) (all k) vanishes.

FIG. 1.1. Distribution of X in the discrete case.

† Cf. ibid., sec. 15.1, p. 167.
‡ Ibid., sec. 14.5.
§ Ibid.


For random variables possessing a continuous distribution (class 2), we require that the d.f. is everywhere continuous, and that the f.f. w(x) = D′(x) [cf. Eq. (1.4)] exists and is continuous for all x (except possibly for a finite number of points in any finite interval). Here

$$D(x) = P(X \le x) = \int_{-\infty}^{x} w(x)\,dx \tag{1.8}$$

In the specific example of the normal frequency function $w(x) = e^{-x^2/2}/(2\pi)^{1/2}$ (all x), we have from Eq. (1.8) the corresponding d.f.

$$D(x) = \frac{1}{2}\left[1 + \Theta\!\left(\frac{x}{\sqrt{2}}\right)\right] \qquad \Theta(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-y^2}\,dy \tag{1.9}$$

where Θ is the error function;† D(x) is sketched in Fig. 1.2. Note that the probability that X takes a value belonging to the interval (a,b) is P(a < X ≤ b) = D(b) − D(a), the difference of the two ordinates indicated in Fig. 1.2.

FIG. 1.2. The continuous distribution D(x) = ½[1 + Θ(x/√2)].

Finally, we have the case of the mixed distribution (class 3)‡

$$D(x) = d_1 D_1(x) + d_2 D_2(x) \qquad d_1,\, d_2 \ge 0;\quad d_1 + d_2 = 1 \tag{1.10}$$

FIG. 1.3. Distribution of X in the mixed case (3).

† For properties of this function, see, for example, W. Magnus and F. Oberhettinger, "Special Functions of Mathematical Physics," chap. 6, par. 4, Chelsea, New York, 1949, and also Tables of the Error Function and Its First Twenty Derivatives, Ann. Harvard Computation Lab., January, 1952.
‡ Cramér, op. cit., sec. 6.6, p. 58.


Here D₁ is a discrete distribution and D₂ a continuous one. In our two examples (classes 1 and 2), we see that d₁ = 1, d₂ = 0 and d₁ = 0, d₂ = 1, respectively, representing the two extremes of the mixed case. A typical distribution curve, combining these two examples according to Eq. (1.10), is shown in Fig. 1.3.

1.2-4 Distribution Densities. We have observed above in the case of continuous distributions that the d.f. of X possesses a derivative D′(x) = w(x), called the frequency function, or d.d., of X, expressed in terms of the values x which the random variable X can assume on any one trial of the experiment E. For discrete distributions (cf. class 1), no such density function strictly exists. However, probabilities and d.f.'s may still be rigorously expressed in integral form with the help of the Riemann-Stieltjes, or Lebesgue-Stieltjes,† representations,

$$D(x) = \int_{[x]} dD(x) \tag{1.11a}$$

which become for operations on a function g of X, such as integration,

$$Z(a,b) = \int_{a}^{b} g(x)\,dD(x) \tag{1.11b}$$

When the distribution is continuous, dD(x) is simply the probability element w(x) dx (cf. class 1).

Here, however, because of its convenience in manipulation and its familiarity to the engineer and physicist, we shall introduce the formalism of the Dirac singular ("impulse" or "delta") function δ to write probability-density operators [in place of dD(x)], which for all practical purposes give results identical with the distribution functions of the continuous cases and, upon integration, for the discrete cases as well.‡ Thus, we include, within the usual representation of a Riemann integral which combines the ordinary integration procedures with the properties (described below) of this density operator, the notion of probability density, extended formally to discrete distributions.§

The salient properties of the δ function needed here are briefly summarized.‖ We remark first that the δ function can be regarded as a limiting form of a sequence of functions:

† Ibid., sec. 7.4 for L-S and sec. 7.5 for R-S representations. For more general discussions of the Lebesgue-Stieltjes integral, see chap. 7; for Lebesgue measure, chaps. 8, 9 (based on chaps. 1-3, 4-7).
‡ In this connection, no other meaning is to be imparted to this p.d. operator; from a probabilistic point of view, it is only the distribution functions in the discrete cases which can have physical significance.
§ Of course, the δ function has other interpretations under other, nonprobabilistic circumstances.
‖ See Van der Pol and Bremmer,14 chap. 5, especially pp. 56-83. This reference contains a full discussion of the δ function, its interpretation, and its relation to the rigorous treatment. See also a paper by Clavier.16 In addition, a particularly convenient reference for a systematic treatment is given by M. J. Lighthill.16a


$$\lim_{\alpha \to 0} \delta_\alpha(x - x') = \delta(x - x')$$

when

$$\delta_\alpha = \frac{\alpha}{\pi[\alpha^2 + (x - x')^2]} \tag{1.12a}$$

or

$$\delta_\alpha = \frac{\sin[\alpha^{-1}(x - x')]}{\pi(x - x')} \tag{1.12b}$$

or

$$\delta_\alpha = \alpha^{-1} e^{-(x - x')/\alpha}, \quad x - x' > 0; \qquad \delta_\alpha = 0, \quad x - x' < 0 \tag{1.12c}$$

with the properties that, as α → 0, δ_α → 0, x ≠ x′, and δ_α → ∞, x = x′, in such a way that

$$\int_a^b \delta(x - x')\,dx = \begin{cases} 1 & a < x' < b \\ \tfrac{1}{2} & a = x' \text{ or } b = x' \\ 0 & x' < a,\ x' > b \quad (b > a) \end{cases} \tag{1.13}$$

where a symmetrical representation, like Eqs. (1.12a), (1.12b), is used; that is, δ(x − x′) = δ(x′ − x). The functions are of two general types: in one case, the singularity appears as an infinite "spike" at a single point x = x′, with the function smooth elsewhere, while, in the other, the infinite spike still occurs at x = x′, but now the function oscillates rapidly everywhere except at this point. Note that the third example [Eq. (1.12c)] is a one-sided delta function in the limit. Here we shall always use the two-sided, or symmetrical, form, with the added convention that if the point x′ = a (or b) is to be included, i.e., if the interval (a,b) is to be closed, we write

$$\int_{a-\epsilon}^{b} \delta(x - x')\,dx = 1 \quad \epsilon > 0 \qquad \text{or} \qquad \int_{a-}^{b} \delta(x - x')\,dx = 1 \quad x' = a \tag{1.14}$$

where ε is a very small positive number. Otherwise, Eq. (1.13) applies: the average value ½ at x′ = a (or b) is taken. The principal property possessed by these functions, which justifies its formal use in applications, is

$$\int_a^b F(x)\,\delta(x - x')\,dx = \begin{cases} F(x') & a < x' < b \\ \tfrac{1}{2}F(x') & x' = a \text{ or } b \\ 0 & x' > b \text{ or } < a \end{cases} \tag{1.15}$$

and $\int_{a-}^{b} F(x)\,\delta(x - a)\,dx = F(a)$, etc.

Particularly useful forms of the δ function are the integral representations

$$\int_{-\infty}^{\infty} e^{\pm 2\pi i x t}\,dt = \delta(x - 0) \qquad \int_{-\infty}^{\infty} e^{\pm 2\pi i (x - x')t}\,dt = \delta(x - x') \tag{1.16a}$$

which are simpler examples of the many-dimensional vector δ function

$$\int \cdots \int_{-\infty}^{\infty} \exp\left[\pm 2\pi i(\mathbf{x} - \mathbf{x}') \cdot \mathbf{t}\right] dt_1 \cdots dt_n = \delta(\mathbf{x} - \mathbf{x}') = \prod_{j=1}^{n} \delta(x_j - x_j') \tag{1.16b}$$


in which $(\mathbf{x} - \mathbf{x}') \cdot \mathbf{t} = \sum_{j=1}^{n} (x_j - x_j')\,t_j$, with our usual vector notation.†
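The limiting behavior described by Eqs. (1.12)-(1.15) can be checked numerically. The following sketch is an added illustration (not part of the original text); it uses the Lorentzian representation of Eq. (1.12a) with an arbitrary smooth test function and an arbitrary interior point, and verifies that the integral tends to F(x′) as α shrinks.

```python
import numpy as np

def delta_a(x, xp, alpha):
    # Lorentzian representation, Eq. (1.12a): tends to delta(x - x') as alpha -> 0
    return alpha / (np.pi * (alpha**2 + (x - xp)**2))

F = np.cos                         # a smooth test function F(x)
xp = 0.7                           # the point x' picked out by the delta function
dx = 1e-4
x = np.arange(-20.0, 20.0, dx)     # integration grid over (a, b) with x' well inside

for alpha in (0.1, 0.01, 0.001):
    integral = np.sum(F(x) * delta_a(x, xp, alpha)) * dx
    print(f"alpha = {alpha:6.3f}   integral = {integral:.6f}")
print(f"F(x') = {F(xp):.6f}")
# As alpha shrinks, the integral approaches F(x') = cos(0.7),
# the sifting property stated in Eq. (1.15) for a < x' < b.
```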

The delta function δ(x − x′) is also recognized as the derivative of the discontinuous unit function u(x − x′) = 1 (x > x′), u(x − x′) = 0 (x < x′).

With the help of the above, we can now consider probability densities for discrete distributions. For class 1 in Sec. 1.2-3, we may write the p.d. of X as

$$w(x) = \sum_{k=1}^{n} p_k\,\delta(x - x_k) \tag{1.17}$$

FIG. 1.4. The d.d. of X in the discrete case.

From the properties (1.13), it is clear that the pₖ are the "masses" associated with the values xₖ of the random variable X, and δ(x − xₖ) is an operator which picks out from the continuum of point values [x] only that one, xₖ, for which there is a nonzero weighting. Observe that

$$\int_{-\infty}^{\infty} w(x)\,dx = \sum_{k=1}^{n} p_k \int_{-\infty}^{\infty} \delta(x - x_k)\,dx = \sum_{k=1}^{n} p_k = 1 \tag{1.18}$$

as required: the measure is unity. Furthermore, the d.f. of X is now

$$D(x) = \int_{-\infty}^{x} w(x)\,dx = \sum_{k=1}^{n} p_k \int_{-\infty}^{x} \delta(x - x_k)\,dx = \sum_{x_k \le x} p_k \tag{1.19}$$

in agreement with Eq. (1.7). The distribution density [Eq. (1.17)] is sketched in Fig. 1.4; the relative heights of the "spikes" at x = xₖ are determined by the relative values of the weightings pₖ. (The representation of Fig. 1.4 is purely conventional in that a finite, rather than an infinite, ordinate is used for convenience.)

† For the properties, definitions, etc., of vectors and tensors, see, for example, H. Margenau and G. M. Murphy, "The Mathematics of Physics and Chemistry," chap. 5, Van Nostrand, Princeton, N.J., 1943; L. Page, "Introduction to Theoretical Physics," 2d ed., introduction, Van Nostrand, Princeton, N.J., 1935; and in particular P. M. Morse and H. Feshbach, "Methods of Theoretical Physics," secs. 1.1-1.6, McGraw-Hill, New York, 1953.


Our δ-function formalism enables us also to treat mixed distributions [cf. class 3 in Sec. 1.2-3] in similar fashion: the distribution density is then a mixture of the type

$$w(x) = d_1 \sum_{k=1}^{n} p_k\,\delta(x - x_k) + d_2 w_2(x) \qquad d_1,\, d_2 \ge 0;\quad d_1 + d_2 = 1 \tag{1.20}$$

where w₂(x) belongs to the continuous part of the distribution D(x), sketched already in Fig. 1.3.
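As a numerical counterpart of Eqs. (1.17)-(1.19), the sketch below (an added illustration; the mass points and weights are arbitrary choices) builds the step distribution D(x) of a purely discrete random variable directly from the masses pₖ and checks that the total measure is unity.

```python
import numpy as np

# Hypothetical discrete distribution: mass points x_k with weights p_k, cf. Eq. (1.17)
x_k = np.array([-1.0, 0.5, 2.0, 3.5])
p_k = np.array([0.1, 0.4, 0.3, 0.2])
assert abs(p_k.sum() - 1.0) < 1e-12        # unity measure, Eq. (1.18)

def D(x):
    # Distribution function D(x) = sum of p_k over x_k <= x, Eq. (1.19)
    return p_k[x_k <= x].sum()

for x in (-2.0, 0.0, 1.0, 3.0, 5.0):
    print(f"D({x:4.1f}) = {D(x):.2f}")
# D(x) is a nondecreasing step function rising from 0 to 1, as in Fig. 1.1.
```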

1.2-5 Mean Values and Moments. Let w(x) be the density function of a random variable X, which may exhibit either continuous or singular properties, or both, as described above. Consider now some real function g(X) of the original random variable, integrable over (−∞, ∞) with respect to w(x). We interpret†

$$\overline{g(X)} = E\{g(X)\} = \int_{-\infty}^{\infty} g(x)\,w(x)\,dx \tag{1.21}$$

as the weighted mean, mean value, or expectation of g(X), with respect to the d.d. w(x). Here E is the expectation operator, defined according to Eq. (1.21). Note that, for purely discrete distributions, from Eqs. (1.17) and (1.15) this becomes

$$\overline{g(X)} = E\{g(X)\} = \sum_{k=1}^{n} g(x_k)\,p_k \tag{1.22}$$

while for the mixed distribution [Eq. (1.10)] the weighting is of the type of Eq. (1.20), so that for g(X) one gets in addition to an expression like Eq. (1.22) an integral over the continuous portion of w(x).

Letting g(x) = |x|^ν (ν > 0), we write the absolute νth moment of X as‡

$$\overline{|x|^{\nu}} = \int_{-\infty}^{\infty} |x|^{\nu}\,w(x)\,dx \tag{1.23}$$

In most applications, ν = n is a positive integer, so that, with g(x) = xⁿ, $\overline{x^n}$ is called the nth moment of X, when it exists, i.e., when the integral

$$\int_{-\infty}^{\infty} |x|^n\,w(x)\,dx$$

is absolutely convergent. Moments of particular interest are $\bar{x}$, $\overline{x^2}$, the mean and mean-square values of X, while

$$E\{(X - E\{X\})^2\} = \overline{(x - \bar{x})^2} = \overline{x^2} - \bar{x}^2 = \int_{-\infty}^{\infty} (x - \bar{x})^2\,w(x)\,dx \equiv \sigma_x^2 \tag{1.24}$$

is called the variance of X and σ_x its standard deviation. Moments of the type $\mu_n = \overline{(x - \bar{x})^n}$ (n = 1, 2, . . .) are known as central moments.

† Cramér, op. cit., sec. 15.3.
‡ Here and in the following discussion, we shall not make a special distinction between a random variable X and the values x that it may assume (cf. the beginning of Sec. 1.2), using X or x interchangeably, as the case may be.
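The moment definitions (1.21)-(1.24) translate directly into numerical quadrature. The sketch below is an added illustration, not part of the original text; it evaluates the mean, mean square, and variance of the zero-mean, unit-variance normal frequency function used earlier, for which the exact values are 0, 1, and 1.

```python
import numpy as np

def w(x):
    # Normal frequency function with zero mean and unit variance
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

dx = 1e-4
x = np.arange(-10.0, 10.0, dx)

def expect(g):
    # E{g(X)} = integral of g(x) w(x) dx, Eq. (1.21), by a simple Riemann sum
    return np.sum(g(x) * w(x)) * dx

mean = expect(lambda t: t)                   # first moment, x-bar
mean_sq = expect(lambda t: t**2)             # second moment
variance = expect(lambda t: (t - mean)**2)   # Eq. (1.24): sigma_x^2
print(mean, mean_sq, variance)               # approximately 0, 1, 1
```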


1.2-6 The Characteristic Function.† A mean value of particular importance is represented by the characteristic function‡

$$E\{e^{i\xi X}\} = \int_{-\infty}^{\infty} w(x)\,e^{i\xi x}\,dx \equiv F_x(\xi) \tag{1.25}$$

with the properties, following immediately from the definition above,§

$$F_x(0) = 1 \qquad |F_x(\xi)| \le 1 \qquad F_x(\xi)^* = F_x(-\xi) \tag{1.26}$$

where the asterisk denotes the complex conjugate. Furthermore, we have¶

$$\int_{-\infty}^{\infty} F_x(\xi)\,e^{-i\xi x}\,\frac{d\xi}{2\pi} = \int_{-\infty}^{\infty} dx'\,w(x') \int_{-\infty}^{\infty} e^{i\xi(x' - x)}\,\frac{d\xi}{2\pi} = \int_{-\infty}^{\infty} w(x')\,\delta(x' - x)\,dx' = w(x) \tag{1.27}$$

from Eq. (1.15). Accordingly, the characteristic function (abbreviated c.f.) and the distribution density are a Fourier-transform pair,

$$F_x(\xi) = \overline{e^{i\xi x}} = \int_{-\infty}^{\infty} w(x)\,e^{i\xi x}\,dx = \mathfrak{F}\{w(x)\} \qquad \text{by definition} \tag{1.28a}$$

$$w(x) = \int_{-\infty}^{\infty} F_x(\xi)\,e^{-i\xi x}\,\frac{d\xi}{2\pi} = \mathfrak{F}^{-1}\{F_x(\xi)\} \tag{1.28b}$$

as indicated symbolically by the Fourier-transform operator $\mathfrak{F}$ and its inverse $\mathfrak{F}^{-1}$. Moreover, this relation is unique:‖ once the c.f. is given, so also is the corresponding d.d., and vice versa; that is, $\mathfrak{F}^{-1}\{\mathfrak{F}\} = \mathfrak{F}\{\mathfrak{F}^{-1}\}$.

The characteristic function is of great importance in applications, because it is usually analytically simpler to deal with than the corresponding p.d. function w(x).
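The Fourier-transform pair (1.28a,b) can be verified numerically for the zero-mean, unit-variance normal density, whose c.f. is e^{−ξ²/2} [a special case of Eq. (1.30)]. The sketch below is an added illustration: it computes F_x(ξ) by direct quadrature of Eq. (1.28a) and then recovers w(x) at a few points from Eq. (1.28b).

```python
import numpy as np

dx = 1e-3
x = np.arange(-10.0, 10.0, dx)
w = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)       # normal d.d.

def cf(xi):
    # Eq. (1.28a): F_x(xi) = integral of w(x) exp(i xi x) dx
    return np.sum(w * np.exp(1j * xi * x)) * dx

print(abs(cf(1.0) - np.exp(-0.5)))               # ~0: agrees with exp(-xi^2/2)

dxi = 1e-3
xi = np.arange(-40.0, 40.0, dxi)
F = np.exp(-xi**2 / 2)                           # c.f. of the normal density

def inv(x0):
    # Eq. (1.28b): w(x) = integral of F_x(xi) exp(-i xi x) dxi / (2 pi)
    return np.real(np.sum(F * np.exp(-1j * xi * x0)) * dxi / (2 * np.pi))

for x0 in (0.0, 1.0, 2.0):
    print(x0, inv(x0), np.exp(-x0**2 / 2) / np.sqrt(2 * np.pi))
# The inverted values reproduce the original density, illustrating the
# uniqueness of the Fourier-transform pair.
```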

For the discrete distribution [Eqs. (1.7), (1.17)], we see that the c.f. is

$$F_x(\xi) = \sum_{k=1}^{n} p_k \int_{-\infty}^{\infty} e^{i\xi x}\,\delta(x - x_k)\,dx = \sum_{k=1}^{n} p_k\,e^{i\xi x_k} \tag{1.29}$$

and, conversely, inserting this into Eq. (1.28b), we get Eq. (1.17) once again, as expected. Observe that, for these singular d.d.'s, F_x(±∞) oscillates indefinitely. On the other hand, for continuous distributions, F_x(±∞)

† Cramér, op. cit., sec. 15.9. Mathematical proofs and properties are discussed in detail in chap. 10, especially secs. 10.1-10.4.
‡ This is a special case of the moment-generating function E{e^{θX}}, where θ = iξ.
§ We shall also write F_x(iξ) for F_x(ξ) on occasion, emphasizing the fact that F_x is a function of iξ.
¶ Cf. Cramér, op. cit., sec. 10.3.
‖ Loc. cit.


vanishes. As an example, consider the normal frequency function $w(x) = e^{-(x - \bar{x})^2/2\sigma_x^2}/(2\pi\sigma_x^2)^{1/2}$; from Eq. (1.28a), one has

$$F_x(\xi) = \int_{-\infty}^{\infty} e^{i\xi x}\,\frac{e^{-(x - \bar{x})^2/2\sigma_x^2}}{(2\pi\sigma_x^2)^{1/2}}\,dx = e^{i\xi\bar{x} - \sigma_x^2\xi^2/2} \tag{1.30}$$

from which it is observed that Eq. (1.30) does indeed possess the general properties of Eq. (1.26), with F_x(±∞) = 0.

When the c.f. possesses all derivatives about the point ξ = 0, or, equivalently, when all moments $\overline{x^n}$ (n ≥ 0) exist, we can write the Taylor's expansion

$$F_x(\xi) = \sum_{n=0}^{\infty} \overline{x^n}\,\frac{(i\xi)^n}{n!} = \sum_{n=0}^{\infty} \frac{(i\xi)^n}{n!} \int_{-\infty}^{\infty} x^n\,w(x)\,dx \tag{1.31a}$$

from the series for $e^{i\xi x}$. Thus,

$$\frac{d^n F_x}{d(i\xi)^n}\bigg|_{\xi = 0} = \overline{x^n} \tag{1.31b}$$

showing that the coefficient of (iξ)ⁿ/n! in the Taylor's series for the characteristic function is simply the nth moment of X. When $\bar{x} = 0$, then $\mu'_n = \mu_n = \overline{(x - \bar{x})^n}$. Even when the characteristic function is not expandable beyond a certain order n′ of derivative, i.e., when moments of the distribution higher than n′ do not exist, we can still get the lower-order moments by the above process. An example where none of the moments exists (absolutely) is given by the Cauchy distribution,† whose f.f. and c.f. are

$$w(x) = [\pi(1 + x^2)]^{-1} \qquad F_x(\xi) = e^{-|\xi|} \tag{1.32}$$
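That the Cauchy distribution of Eq. (1.32) has no (absolute) moments shows up clearly in simulation: the running sample mean never settles down as the number of observations grows, in contrast with a distribution whose mean exists. The sketch below is an added illustration, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

cauchy = rng.standard_cauchy(n)    # f.f. w(x) = [pi (1 + x^2)]^{-1}, Eq. (1.32)
normal = rng.standard_normal(n)    # a variable whose mean exists (and is 0)

for m in (10**3, 10**4, 10**5, 10**6):
    print(f"n = {m:>8d}   Cauchy running mean = {cauchy[:m].mean():9.3f}"
          f"   normal running mean = {normal[:m].mean():9.5f}")
# The normal running mean tends to 0, while the Cauchy running mean keeps
# wandering: none of the moments of Eq. (1.32) exists absolutely
# (cf. the footnote on the Cauchy principal value).
```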

1.2-7 Generalizations; Semi-invariants. The definition of the characteristic function can be extended. We define the c.f. of any (real) random variable g(X) as the mean value

$$F_g(\xi) = E\{e^{i\xi g(X)}\} = \int_{-\infty}^{\infty} e^{i\xi g(x)}\,w(x)\,dx \tag{1.33}$$

Now, however, we no longer have the Fourier-transform relations (1.28) in x unless g(x) is linear. Of course, with g as the new random variable, the Fourier-transform pair [F_g(ξ), W(g)], corresponding to Eq. (1.28) with g instead of x, is defined as before. Accordingly, let us write with the help of Eqs. (1.28b) and (1.33)

$$W(g) = \mathfrak{F}^{-1}\{F_g\} = \int_{-\infty}^{\infty} F_g(\xi)\,e^{-i\xi g}\,\frac{d\xi}{2\pi} = \int_{-\infty}^{\infty} \frac{d\xi}{2\pi} \int_{-\infty}^{\infty} w(x)\,e^{-i\xi[g - g(x)]}\,dx = \int_{-\infty}^{\infty} w(x)\,\delta[g - g(x)]\,dx \tag{1.34}$$

† One can, however, define a mean, in the sense of a Cauchy principal value, for example, $\lim_{a \to \infty} \int_{-a}^{a} x\,w(x)\,dx$, which is seen to be zero for Eq. (1.32).


which gives the distribution density W(g), once the transformation g = g(x) and the d.d. w(x) have been specified. The relation (1.34) may be interpreted as a mapping operation, with δ[g − g(x)] the mapping operator which takes the points on the x axis over into the new set g in the interval [G], when [G] is the portion of the (real) axis g, defined explicitly by the transformation g = g(x). We shall find Eq. (1.34) and its extensions to two or more random variables especially useful in actual manipulations; for an example, see Prob. 1.1.
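Equation (1.34) can be checked in a simple case. For g(X) = X² with X normal (zero mean, unit variance), carrying out the integral against the δ-function gives W(g) = [w(√g) + w(−√g)]/(2√g) for g > 0 [cf. Prob. 1.1(b)]. The sketch below (an added illustration) compares this with a histogram of g = x² formed from samples of X.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(2_000_000)
g = x**2                                     # the transformation g = g(x) = x^2

def w(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def W(gv):
    # Result of Eq. (1.34) for g(x) = x^2, cf. Prob. 1.1(b):
    # W(g) = [w(sqrt g) + w(-sqrt g)] / (2 sqrt g),  g > 0
    return (w(np.sqrt(gv)) + w(-np.sqrt(gv))) / (2 * np.sqrt(gv))

hist, edges = np.histogram(g, bins=500, range=(0.0, 25.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for gv in (0.5, 1.0, 2.0, 4.0):
    i = np.searchsorted(centers, gv)
    print(f"g = {gv:3.1f}   histogram = {hist[i]:.4f}   W(g) from Eq. (1.34) = {W(gv):.4f}")
# The sampled histogram closely follows the mapped density W(g).
```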

Closely related to the moments μ′ₙ (and the central moments μₙ) are the semi-invariants,† or cumulants, λₘ, which are obtained from the characteristic function by taking its logarithm and developing it in a Maclaurin series according to

$$\log F_x(\xi) = \sum_{m=1}^{\infty} \lambda_m\,\frac{(i\xi)^m}{m!} \tag{1.35}$$

By comparing the two series (1.31a) and (1.35), we can easily establish the relations between the various moments and semi-invariants (cf. Prob. 1.2).
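Problem 1.2 lists the first few of these relations: λ₁ = x̄, λ₂ = μ₂, λ₃ = μ₃, λ₄ = μ₄ − 3μ₂². The sketch below is an added illustration that checks them against a unit-exponential sample, whose cumulants are known to be λₘ = (m − 1)!.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=5_000_000)   # unit-exponential sample

mean = x.mean()
mu2 = ((x - mean)**2).mean()
mu3 = ((x - mean)**3).mean()
mu4 = ((x - mean)**4).mean()

# Relations of Prob. 1.2(a): lambda_1 = mean, lambda_2 = mu_2,
# lambda_3 = mu_3, lambda_4 = mu_4 - 3 mu_2^2
lam = [mean, mu2, mu3, mu4 - 3 * mu2**2]
print(np.round(lam, 2))   # approximately [1, 1, 2, 6], i.e. (m-1)! for m = 1..4
```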

TWO OR MORE RANDOM VARIABLES

In much of our subsequent work, we shall have to deal with more than one random variable and, in some cases, with a very large number. The extension of the preceding discussion to situations involving two or more random variables is briefly summarized, to indicate some of the important new features which appear under these conditions.

1.2-8 Two Random Variables.‡ If X and Y are two random variables, with x and y their respective values that can be assumed on any one trial, then we can write for the joint distribution D(x,y) and the corresponding joint distribution density W(x,y)

$$D(x,y) = P(X \le x,\ Y \le y) = \int_{-\infty}^{x} \int_{-\infty}^{y} W(x,y)\,dx\,dy \tag{1.36}$$

with

$$D_1(x) = \int_{-\infty}^{x} dx \int_{-\infty}^{\infty} W(x,y)\,dy = \int_{-\infty}^{x} w_1(x)\,dx \qquad w_1(x) = \int_{-\infty}^{\infty} W(x,y)\,dy \tag{1.37}$$

and similar expressions for D₂(y), w₂(y). Here a statistical relationship is in general implied between X and Y; D₁(x), D₂(y) are the marginal distributions of X and Y, respectively, while w₁(x), w₂(y) are the corresponding marginal d.d.'s. [For discrete and mixed distributions, Eqs. (1.36), (1.37) still apply, with the aid of the formalism of Sec. 1.2-4.] The usual properties of Eqs. (1.2), (1.5), etc., are extended, in an obvious fashion, so that

† Cramér, op. cit., sec. 15.10.
‡ Ibid., chap. 21.


D(x,y) ≥ 0, D(∞,∞) = 1. D(b₁,b₂) + D(a₁,a₂) − D(a₁,b₂) − D(b₁,a₂) is similarly interpreted as the joint probability that the pair of random variables X, Y takes values in the regions (a₁ < X ≤ b₁), (a₂ < Y ≤ b₂), and so forth. For example, in the purely discrete case we can write

$$P(X = x_k;\ Y = y_l) = p_{kl} \qquad \text{with} \quad \sum_{k,l}^{m,n} p_{kl} = 1 \tag{1.38}$$

for measure unity, (xₖ, yₗ) being the points at which the "masses" pₖₗ are situated. The marginal distributions of X and Y are here

$$p_{k(\cdot)} = P(X = x_k) = \sum_{l} p_{kl} \qquad p_{(\cdot)l} = P(Y = y_l) = \sum_{k} p_{kl} \tag{1.39}$$

One also has the characteristic function, with its unique Fourier transform†

$$F_{x,y}(\xi_1,\xi_2) = E\{e^{i\xi_1 X + i\xi_2 Y}\} = \iint_{-\infty}^{\infty} e^{i\xi_1 x + i\xi_2 y}\,W(x,y)\,dx\,dy = \mathfrak{F}\{W(x,y)\} \tag{1.40a}$$

$$W(x,y) = \iint_{-\infty}^{\infty} e^{-i\xi_1 x - i\xi_2 y}\,F_{x,y}(\xi_1,\xi_2)\,\frac{d\xi_1\,d\xi_2}{(2\pi)^2} = \mathfrak{F}^{-1}\{F_{x,y}(\xi_1,\xi_2)\} \tag{1.40b}$$

and the joint moments (provided that they exist) are

$$\overline{x^k y^l} = \iint_{-\infty}^{\infty} x^k y^l\,W(x,y)\,dx\,dy = (-i)^{k+l}\,\frac{\partial^{k+l} F_{x,y}}{\partial \xi_1^k\,\partial \xi_2^l}\bigg|_{\xi_1 = \xi_2 = 0} \tag{1.41}$$

as obtained by the double Taylor's-series expansion of $e^{i\xi_1 x + i\xi_2 y}$ in Eq. (1.40a), etc. [cf. Eq. (1.31)]. The marginal c.f.'s are

$$F_x(\xi_1) = F_{x,y}(\xi_1, 0) \qquad F_y(\xi_2) = F_{x,y}(0, \xi_2) \tag{1.42}$$

corresponding to w₁(x), w₂(y). Complex random variables can also be defined. If Z = X + iY (X, Y real) and X, Y obey a joint distribution D(x,y), then Z is a complex random variable whose distribution and distribution density are taken to be the joint distribution and d.d. of X and Y, since by definition the distribution and d.d. are always real and positive (or zero). Moments of Z, however, may be complex, viz.,

$$E\{Z\} = E\{X\} + iE\{Y\} \tag{1.42a}$$

We shall encounter examples of complex random variables later when dealing with the spectral properties of random processes with the help of Eq. (1.28b).

† Ibid., sec. 21.3.

1.2-9 Conditional Distributions;† Mean Values and Moments. We can also define conditional distributions and their d.d.'s. Thus, the conditional distribution of a random variable Y relative to the hypothesis that another random variable X has a value in the range (x, x + ε) is‡ for continuous distributions

$$P(Y \le y \mid x < X \le x + \epsilon) = \frac{\displaystyle\int_x^{x+\epsilon} dx \int_{-\infty}^{y} W(x,y)\,dy}{\displaystyle\int_x^{x+\epsilon} dx \int_{-\infty}^{\infty} W(x,y)\,dy} \tag{1.43}$$

The corresponding frequency function for Y, given X = x, is

$$W(y|x) = \frac{\partial}{\partial y} \lim_{\epsilon \to 0} P(Y \le y \mid x < X \le x + \epsilon) = \frac{\partial}{\partial y} \int_{-\infty}^{y} \frac{W(x,y')}{w_1(x)}\,dy' = \frac{W(x,y)}{w_1(x)} \tag{1.44}$$

and with our formalism (cf. Sec. 1.2-4) we include the discrete and mixed cases as well. Thus, W(y|x) dy is the probability that Y takes a value in the range (y, y + dy) when X has the value x. The corresponding expression for W(x|y) is obtained in the same way, so that we can write alternatively

$$W(x,y) = w_1(x)\,W(y|x) = w_2(y)\,W(x|y) \tag{1.45}$$

[W(x|y) and W(y|x) are, of course, different functions of their arguments, in general.] These conditional densities must also satisfy the usual conditions imposed by measurability and their probabilistic interpretation, viz.,

$$W(y|x) \ge 0 \qquad \int_{-\infty}^{\infty} W(y|x)\,dy = 1 \qquad \text{etc.} \tag{1.46}$$

Eliminating the joint density W(x,y) with the aid of Eqs. (1.44), (1.45), we can write still another expression for the conditional densities:

$$W(y|x) = \frac{W(x,y)}{w_1(x)} = \frac{w_2(y)\,W(x|y)}{w_1(x)} \tag{1.47}$$

Conditional averages and moments can also be defined. Consider the conditional average of g(Y), given X = x.

† Ibid., sec. 14.3.
‡ Two notations for conditional distributions, densities, averages, etc., will be used throughout. In the first, the quantity following the vertical bar is the given quantity or represents the stated hypothesis, while that preceding this vertical bar is the argument of the function, for example, W(y|x), x given, y the functional argument. The second notation just reverses that of the first. The former is common in mathematical work, while the latter is often used in physical applications (cf. Sec. 1.4 and Chap. 10). In any case, the proper interpretation will be clear in each instance.


$$E\{g(Y) \mid X = x\} = \int_{-\infty}^{\infty} g(y)\,W(y|x)\,dy \tag{1.48a}$$

$$E\{Y^m \mid X = x\} = \int_{-\infty}^{\infty} y^m\,W(y|x)\,dy \tag{1.48b}$$

This last is the conditional mth moment of the random variable Y when X takes the value x. E{g(X)|Y = y}, E{Xᵐ|Y = y}, etc., are defined in the same way. As will become evident as we proceed, conditional distributions, densities, and averages are of fundamental importance in the study of random processes and in our particular applications of that study to statistical communication theory.
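As a concrete case, for the bivariate normal pair of Prob. 1.8 (unit variances, correlation ρ) the conditional density W(y₂|y₁) is again normal, with mean ρy₁ and variance 1 − ρ² [Prob. 1.8(c)]. The sketch below is an added illustration (the value of ρ and the conditioning point are arbitrary): it estimates the conditional mean and variance from samples falling in a narrow slice around a fixed y₁, in the spirit of Eq. (1.43), and compares them with these values.

```python
import numpy as np

rho = 0.6
rng = np.random.default_rng(4)
n = 5_000_000
y1 = rng.standard_normal(n)
y2 = rho * y1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)   # unit variances, correlation rho

y1_fixed, eps = 1.0, 0.02
slice_ = y2[np.abs(y1 - y1_fixed) < eps]       # condition on y1 near y1_fixed, cf. Eq. (1.43)
print("conditional mean:", slice_.mean(), "  expected rho*y1   =", rho * y1_fixed)
print("conditional var :", slice_.var(),  "  expected 1 - rho^2 =", 1 - rho**2)
```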

1.2-10 Statistical Independence.† It may happen that the mechanism underlying the random variable X in no way affects that of Y, and vice versa. We say then that X and Y are statistically independent. A necessary and sufficient condition that this be true mathematically is that the joint d.d. W(x,y) factors into the two marginal densities w₁(x), w₂(y), viz.,

$$W(x,y) = w_1(x)\,w_2(y) \tag{1.49}$$

An alternative condition, equivalent to this, is that the c.f. F_{x,y}(ξ₁,ξ₂) factors into the two marginal c.f.'s:

$$F_{x,y}(\xi_1,\xi_2) = F_x(\xi_1)\,F_y(\xi_2) \tag{1.50}$$

The extension to discrete and mixed d.d.'s is made as before. From Eqs. (1.45) or (1.47) and (1.49), we see that

$$W(y|x) = w_2(y) \qquad W(x|y) = w_1(x) \tag{1.51}$$

since here knowledge of x (or y) in no way influences the distribution of y (or x).

When all moments $\overline{x^k y^l}$ exist, a third condition, equivalent to the above [Eqs. (1.49), (1.50)] for statistical independence, is $\overline{x^k y^l} = \overline{x^k} \cdot \overline{y^l}$ (all k, l ≥ 0). It is not enough that just the joint second moment factors, for example, $\overline{xy} = \bar{x} \cdot \bar{y}$, for the higher moments may not, and then X and Y are not statistically independent. An example is provided by the two random variables X = cos φ, Y = sin φ, where φ is uniformly distributed over the interval (0 < φ < 2π). Then $\bar{x} = \bar{y} = 0$, $\overline{xy} = \bar{x} \cdot \bar{y} = 0$, but since $\overline{x^2} = \overline{y^2} = \tfrac{1}{2}$ and $\overline{x^2 y^2} = \tfrac{1}{8} \ne \overline{x^2} \cdot \overline{y^2}$, we can say only that X and Y are linearly independent.
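The sketch below (an added illustration) repeats this check numerically for X = cos φ, Y = sin φ with φ uniform on (0, 2π): the joint second moment factors, but the fourth does not, so the variables are linearly, not statistically, independent.

```python
import numpy as np

rng = np.random.default_rng(5)
phi = rng.uniform(0.0, 2.0 * np.pi, size=2_000_000)
x, y = np.cos(phi), np.sin(phi)

print("xbar, ybar        :", x.mean(), y.mean())                     # ~0, ~0
print("E{xy}, xbar*ybar  :", (x * y).mean(), x.mean() * y.mean())    # both ~0: the second moment factors
print("E{x^2 y^2}        :", (x**2 * y**2).mean())                   # ~1/8
print("E{x^2} E{y^2}     :", (x**2).mean() * (y**2).mean())          # ~1/4: the fourth moment does not factor
```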

1.2-11 The Covariance. A quantity of considerable importance in subsequent applications is the covariance K_{xy}, defined by

$$K_{xy} = E\{(X - \bar{X})(Y - \bar{Y})\} = \overline{(x - \bar{x})(y - \bar{y})} \tag{1.52}$$

with the properties that $K_{xy} \le \sigma_x\sigma_y$, $K_{xx} = \sigma_x^2$, $K_{yy} = \sigma_y^2$. If X and Y are statistically independent, K_{xy} vanishes, but if K_{xy} vanishes, the most we can say is that X and Y are linearly independent (cf. the example above).

† Cramér, op. cit., secs. 15.11, 21.3.


In a similar way, for a pair of complex random variables Z₁ = X₁ + iY₁, Z₂ = X₂ + iY₂ (cf. Sec. 1.2-8) we can define a complex covariance by

$$K_{12} = E\{(Z_1 - \bar{z}_1)(Z_2 - \bar{z}_2)^*\} = \overline{(z_1 - \bar{z}_1)(z_2 - \bar{z}_2)^*} \tag{1.52a}$$

$$K_{12} = \overline{(x_1 - \bar{x}_1)(x_2 - \bar{x}_2)} + \overline{(y_1 - \bar{y}_1)(y_2 - \bar{y}_2)} + i\left\{\overline{(x_2 - \bar{x}_2)(y_1 - \bar{y}_1)} - \overline{(y_2 - \bar{y}_2)(x_1 - \bar{x}_1)}\right\} \tag{1.52b}$$

$$K_{12} = K_{x_1x_2} + K_{y_1y_2} + i(K_{x_2y_1} - K_{x_1y_2}) \tag{1.52c}$$

Statistical independence of Z₁ and Z₂ requires that W(x₁,y₁;x₂,y₂) = W(x₁,y₁)W(x₂,y₂), while for linear independence it is enough that all K_{x_1x_2}, K_{y_1y_2}, etc., in Eq. (1.52c) vanish.

1.2-12 Multivariate Cases.† The treatment of two random variables is readily extended to greater numbers. Such systems, of two or more random variables, are called multivariate, or vector, systems. A few of the important extensions are listed below, where it is now convenient to introduce matrix notation.‡ Thus, y is a column vector of n components; K is an n × n matrix; ỹ is a transposed vector, i.e., a row vector, viz.,

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \\ \vdots \\ y_n \end{bmatrix} \qquad \tilde{\mathbf{y}} = [y_1, y_2, \ldots, y_k, \ldots, y_n] \qquad \mathbf{K} = \begin{bmatrix} K_{11} & K_{12} & \cdots & K_{1n} \\ K_{21} & K_{22} & & \vdots \\ \vdots & & \ddots & \\ K_{n1} & \cdots & & K_{nn} \end{bmatrix} \tag{1.53}$$

The tilde (~) indicates the transposed matrix. The nth-order multivariate distribution Dₙ and distribution density Wₙ are given in the following relation:

$$D_n(y_1, \ldots, y_n) = \int_{-\infty}^{y_1} dy_1 \cdots \int_{-\infty}^{y_n} dy_n\,W_n(y_1, \ldots, y_n) \tag{1.54}$$

Similarly, we have a variety of marginal distributions and densities. For example, in the latter instance

$$W_m(y_1, \ldots, y_m) = \int_{-\infty}^{\infty} dy_{m+1} \cdots \int_{-\infty}^{\infty} dy_n\,W_n(y_1, \ldots, y_n) \qquad 1 \le m < n \tag{1.55}$$

with the usual properties $W_n \ge 0$, $W_m \ge 0$; $\int_{-\infty}^{\infty} dy_1 \cdots \int_{-\infty}^{\infty} dy_n\,W_n = 1$, etc. Various conditional distributions and densities can also be constructed along the lines of Sec. 1.2-9 [cf. Eqs. (1.45) et seq.].§ We shall see examples of these presently, in Sec. 1.3.

† Ibid., chap. 22.
‡ See, for example, ibid., chap. 11; Margenau and Murphy, op. cit., chap. 10; and R. A. Frazer, W. J. Duncan, and A. R. Collar, "Elementary Matrices," chap. 1, Cambridge, New York, 1950; etc.
§ Cramér, op. cit., sec. 22.1.


1.2-13 Characteristic Functions and Semi-invariants. Like the one-dimensional cases, the corresponding characteristic functions for Wₙ, Dₙ, etc., are defined as the expectation of $e^{i\tilde{\boldsymbol{\xi}}\mathbf{y}}$, viz.,

$$F_{y_1,\ldots,y_n}(\xi_1, \ldots, \xi_n) = E\{e^{i\tilde{\boldsymbol{\xi}}\mathbf{y}}\} = \int \cdots \int_{-\infty}^{\infty} d\mathbf{y}\,W_n(y_1, \ldots, y_n)\,e^{i\tilde{\boldsymbol{\xi}}\mathbf{y}} = F_{\mathbf{y}}(\boldsymbol{\xi}) = \mathfrak{F}\{W_n(y_1, \ldots, y_n)\} \tag{1.56a}$$

$$W_n(y_1, \ldots, y_n) = \int \cdots \int_{-\infty}^{\infty} e^{-i\tilde{\boldsymbol{\xi}}\mathbf{y}}\,F_{\mathbf{y}}(\boldsymbol{\xi})\,\frac{d\boldsymbol{\xi}}{(2\pi)^n} = \mathfrak{F}^{-1}\{F_{\mathbf{y}}(\boldsymbol{\xi})\} \tag{1.56b}$$

where we have used the abbreviation dy for dy₁ ⋯ dyₙ, ξ for (ξ₁, . . . , ξₙ), etc. Again F_y and Wₙ are uniquely related by the Fourier-transform pair (1.56). The moments, when they exist, follow from

$$\mu'_{m_1,\ldots,m_n} = \overline{y_1^{m_1} \cdots y_n^{m_n}} = (-i)^M\,\frac{\partial^M F_{\mathbf{y}}}{\partial \xi_1^{m_1} \cdots \partial \xi_n^{m_n}}\bigg|_{\boldsymbol{\xi} = 0} \qquad M = \sum_{j=1}^{n} m_j \ge 0 \tag{1.57a}$$

$$\mu'_{m_1,\ldots,m_n} = \int \cdots \int_{-\infty}^{\infty} d\mathbf{y}\;y_1^{m_1} \cdots y_n^{m_n}\,W_n(y_1, \ldots, y_n) \tag{1.57b}$$

The semi-invariants, or cumulants λ_{(L)}, are found from

$$\log F_{\mathbf{y}}(\boldsymbol{\xi}) = \sum_{L > 0} \lambda_{l_1,\ldots,l_n}\,\frac{(i\xi_1)^{l_1}(i\xi_2)^{l_2} \cdots (i\xi_n)^{l_n}}{l_1!\,l_2! \cdots l_n!} \qquad L = \sum_j l_j > 0 \tag{1.58}$$

$$\log F_{\mathbf{y}}(\boldsymbol{\xi}) = \mathcal{K}(i\boldsymbol{\xi}) \tag{1.58a}$$

where 𝒦 is the cumulant function and the cumulants λ_{(L)} themselves can be expressed in terms of the moments μ_{(M)}, as in the one-variable case [cf. Sec. 1.2-7, Eq. (1.35)]. A necessary and sufficient condition that the yₖ (k = 1, . . . , n) be statistically independent is that the c.f. F_y(ξ) (or the d.d. Wₙ) factors:

$$F_{\mathbf{y}}(\boldsymbol{\xi}) = \prod_{k=1}^{n} F_{y_k}(\xi_k) \qquad \text{or} \qquad W_n(\mathbf{y}) = \prod_{k=1}^{n} w_k(y_k) \tag{1.59}$$

An example of general interest here and elsewhere is provided by the normal, or gaussian, distribution for the vector random variable y = (y₁, . . . , yₙ), defined by the cumulant function

$$\mathcal{K}(i\boldsymbol{\xi}) = i\tilde{\boldsymbol{\xi}}\bar{\mathbf{y}} - \tfrac{1}{2}\tilde{\boldsymbol{\xi}}\mathbf{K}\boldsymbol{\xi} \tag{1.60}$$

K is now the covariance matrix,†

† The covariance matrix is positive definite, that is, $\tilde{\mathbf{a}}\mathbf{K}\mathbf{a} \ge 0$, with equality only if a = 0 identically, since if we set $\mathbf{y} - \bar{\mathbf{y}} = \mathbf{y}'$, then $\tilde{\mathbf{a}}\mathbf{K}\mathbf{a} = \sum_{k,l} a_k a_l\,\overline{y_k' y_l'} = \overline{\left(\sum_k a_k y_k'\right)^2} \ge 0$.


$$\mathbf{K} = \|K_{kl}\| = \left\|\,\overline{(y_k - \bar{y}_k)(y_l - \bar{y}_l)}\,\right\| \qquad k,\ l = 1, \ldots, n \tag{1.60a}$$

The corresponding c.f. and d.d. become in this instance†

$$F_{\mathbf{y}}(\boldsymbol{\xi}) = e^{\mathcal{K}(i\boldsymbol{\xi})} = e^{i\tilde{\boldsymbol{\xi}}\bar{\mathbf{y}} - \tilde{\boldsymbol{\xi}}\mathbf{K}\boldsymbol{\xi}/2} \tag{1.61a}$$

$$W_n(\mathbf{y}) = (2\pi)^{-n/2}\,(\det \mathbf{K})^{-1/2} \exp\left[-\tfrac{1}{2}\,\widetilde{(\mathbf{y} - \bar{\mathbf{y}})}\,\mathbf{K}^{-1}(\mathbf{y} - \bar{\mathbf{y}})\right] \tag{1.61b}$$

with K⁻¹ the inverse of K and det K the latter's determinant. In Chap. 7, we shall discuss the normal distribution and normal process in more detail.
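Equation (1.61a) can be checked directly by simulation: samples drawn with a prescribed mean vector ȳ and covariance matrix K should give a sample average of e^{iξ̃y} close to exp(iξ̃ȳ − ½ξ̃Kξ). The sketch below is an added illustration; the particular ȳ, K, and ξ are arbitrary choices (K is taken positive definite).

```python
import numpy as np

rng = np.random.default_rng(6)
y_bar = np.array([1.0, -0.5, 2.0])
K = np.array([[2.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 0.5]])          # covariance matrix, cf. Eq. (1.60a)

y = rng.multivariate_normal(y_bar, K, size=2_000_000)
xi = np.array([0.4, -0.7, 0.2])

empirical = np.mean(np.exp(1j * (y @ xi)))                        # sample average of exp(i xi~ y)
theory = np.exp(1j * (xi @ y_bar) - 0.5 * (xi @ K @ xi))          # Eq. (1.61a)
print(empirical)
print(theory)
# The two complex numbers agree to within sampling error.
```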

1.2-14 Generalizations. As in the one- and two-variable cases (cf. Sec. 1.2-7), the concept of the characteristic function may be extended to a new vector random variable G, obtained from Y by the set of transformations g = (g₁, . . . , g_m) (m ≤ n), where gₖ = gₖ(y₁, . . . , yₙ). We write

$$F_{\mathbf{g}}(\boldsymbol{\xi}) = E\{e^{i\tilde{\boldsymbol{\xi}}\mathbf{g}(\mathbf{y})}\} = \int \cdots \int_{-\infty}^{\infty} dy_1 \cdots dy_n\,e^{i\xi_1 g_1 + \cdots + i\xi_m g_m}\,W_n(y_1, \ldots, y_n) \tag{1.62}$$

in which the corresponding p.d. of G is found to be

$$W_m(g_1, \ldots, g_m) = \int \cdots \int_{-\infty}^{\infty} W_n(y_1, \ldots, y_n)\,\delta[\mathbf{g} - \mathbf{g}(y_1, \ldots, y_n)]\,dy_1 \cdots dy_n \tag{1.63}$$

and where δ[g − g(y)] is the m-dimensional delta function δ[g₁ − g₁(y)] δ[g₂ − g₂(y)] ⋯ δ[g_m − g_m(y)] [cf. Eq. (1.16b)]. In conjunction with the integral form (1.16b) for δ, Eq. (1.63) provides a powerful method for dealing explicitly with many types of transformations and mappings.‡

† See Cramér, op. cit., chap. 11, sec. 11.12, for the explicit evaluation of Eq. (1.56b), to give Eq. (1.61b) when Eq. (1.61a) is the c.f., and also chap. 24. See Sec. 7.3-1 here.
‡ In the case m = n and the y distribution of the continuous type, with the following conditions on the transformation satisfied:
1. gₖ everywhere unique and continuous, with continuous partial derivatives ∂gₗ/∂xₖ, ∂xₗ/∂gₖ
2. The inverse transformation yₖ = hₖ(g₁, . . . , gₙ) existing and unique
it can be shown that the probability element of g in G-space is obtained from the probability element of y according to

$$W_m(g_1, \ldots, g_n)\,dg_1 \cdots dg_n = W_n[h_1(g_1, \ldots, g_n), \ldots, h_n(g_1, \ldots, g_n)]\,|J|\,dg_1 \cdots dg_n$$

where J is the jacobian ∂(y₁, . . . , yₙ)/∂(g₁, . . . , gₙ) (Cramér, op. cit., sec. 22.2). Even when m < n, we can use this approach as an alternative to the δ-function technique above, introducing n − m additional relations y_{k′} = h_{k′}(g₁, . . . , gₙ) (k′ = m + 1, . . . , n). Apart from the conditions 1, 2, these relations are arbitrary and may be selected to facilitate the (subsequent) integrations and transformations. However, in most cases of interest to us, the transformations implied by Eq. (1.63) are more convenient (cf. Chaps. 17, 19, 21, 22).


Transformations of this type, where m < n, are sometimes called irreversible transformations, since they define a unique transformation from Y- to G-space, but not from G- to Y-space. That is, given g = (g₁, . . . , g_m), it is not possible uniquely to find y = (y₁, . . . , yₙ), yₖ = yₖ(g₁, . . . , g_m).

As a very simple, but important, example, consider the problem of determining the d.d. of a random variable G, which is the sum of n independent random variables Y₁, Y₂, . . . , Yₙ. Thus, $g = \sum_{k=1}^{n} y_k$, and from Eq. (1.62) we write

$$F_g(\xi) = \overline{e^{i\xi(y_1 + \cdots + y_n)}} = \int \cdots \int_{-\infty}^{\infty} e^{i\xi \Sigma_k y_k}\,W_n(y_1, \ldots, y_n)\,dy_1 \cdots dy_n \tag{1.64}$$

Since the Yₖ's are statistically independent, however, we can use Eq. (1.59) to write

$$F_g(\xi) = \prod_{k=1}^{n} \int_{-\infty}^{\infty} e^{i\xi y_k}\,w_k(y_k)\,dy_k = \prod_{k=1}^{n} F_{y_k}(\xi) \tag{1.65a}$$

Therefore

$$W_1(g) = \int_{-\infty}^{\infty} e^{-i\xi g} \prod_{k=1}^{n} F_{y_k}(\xi)\,\frac{d\xi}{2\pi} \tag{1.65b}$$

which is the desired distribution density.
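Equations (1.64)-(1.65b) can be exercised numerically. The sketch below is an added illustration, not part of the original text: it takes, for concreteness, two unit-exponential variables, whose c.f. 1/(1 − iξ) is a standard result, forms the product of c.f.'s as in Eq. (1.65a), and inverts it as in Eq. (1.65b); the result is compared with the known density g e^{−g} of the sum.

```python
import numpy as np

def F_exp(xi):
    # Characteristic function of a unit-exponential variable: 1 / (1 - i xi)
    return 1.0 / (1.0 - 1j * xi)

dxi = 0.01
xi = np.arange(-2000.0, 2000.0, dxi)       # integration grid for Eq. (1.65b)
F_sum = F_exp(xi)**2                       # product of c.f.'s, Eq. (1.65a), n = 2

def W1(g):
    # Eq. (1.65b): inverse transform of the product of characteristic functions
    return np.real(np.sum(np.exp(-1j * xi * g) * F_sum) * dxi / (2 * np.pi))

for g in (0.5, 1.5, 3.0):
    print(g, W1(g), g * np.exp(-g))        # exact density of the sum: g e^{-g}, g > 0
```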

dependence on time—which is needed for most of our subsequent oper-ations. This is simply done if we agree in our experiment E (and its mathe-matical counterpart) that the various values yi, . . . , yn that our vectorrandom variable Y = (Fi, . . . , Yn) can assume represent values observedat particular times (h, . . . ,Jn). For example, y± = y(h), y2 = y(t2), etc.,and the j/i, . . . , yn are thus stochastic variables or functions of the time t,where for the moment t is allowed to have a discrete set of values (ti, . . . ,tn).Often, we shall order these values of time, so that fa < t2 < • * • < tn, andallow the fo's to be chosen in a finite or infinite interval. The probabilitydensities and characteristic functions (Sec. 1.2-6) are now more fully written

So far, we have not introduced explicitly the notion of process—i.e., dependence on time—which is needed for most of our subsequent operations. This is simply done if we agree in our experiment E (and its mathematical counterpart) that the various values y₁, . . . , yₙ that our vector random variable Y = (Y₁, . . . , Yₙ) can assume represent values observed at particular times (t₁, . . . , tₙ). For example, y₁ = y(t₁), y₂ = y(t₂), etc., and the y₁, . . . , yₙ are thus stochastic variables or functions of the time t, where for the moment t is allowed to have a discrete set of values (t₁, . . . , tₙ). Often, we shall order these values of time, so that t₁ < t₂ < ⋯ < tₙ, and allow the tₖ's to be chosen in a finite or infinite interval. The probability densities and characteristic functions (Sec. 1.2-6) are now more fully written

$$W_n(y_1,t_1;\ \ldots\ ;y_n,t_n); \qquad F_{y_1,\ldots,y_n}(\xi_1,t_1;\ \ldots\ ;\xi_n,t_n) = E\{e^{i\tilde{\boldsymbol{\xi}}\mathbf{y}}\} \tag{1.66}$$

exhibiting the explicit dependence on the n parameters t₁, . . . , tₙ. We shall see in the next section how these probability concepts may be used for the mathematical description of a random process.


PROBLEMS

1.1 (a) If the random variables X, Y are related by Y = aX + b, show that

$$W(y) = |a|^{-1}\,w\!\left(\frac{y - b}{a}\right) \tag{1}$$

(b) If Y = X², show that

$$D_Y(y) = \begin{cases} D(\sqrt{y}\,) - D(-\sqrt{y}\,) & y \ge 0 \\ 0 & y < 0 \end{cases} \tag{2}$$

$$W(y) = \begin{cases} \dfrac{1}{2\sqrt{y}}\left[w(\sqrt{y}\,) + w(-\sqrt{y}\,)\right] & y > 0 \\ 0 & y < 0 \end{cases} \tag{3}$$

(c) If Y = A₀ cos (ω₀t + φ), where φ is uniformly distributed in the interval (0 < φ < 2π), show that

$$W(y) = \begin{cases} \left[\pi\sqrt{A_0^2 - y^2}\,\right]^{-1} & |y| < |A_0| \\ 0 & |y| > |A_0| \end{cases} \qquad F_y(\xi) = J_0(A_0\xi) \tag{4}$$

1.2 (a) In terms of the central moments μₙ of the random variable X, obtain the following relations between them and the semi-invariants λₘ:

$$\lambda_1 = \bar{x} \qquad \lambda_2 = \mu_2 = \sigma^2 \qquad \lambda_3 = \mu_3 \qquad \lambda_4 = \mu_4 - 3\mu_2^2$$
$$\lambda_5 = \mu_5 - 10\mu_2\mu_3 \qquad \lambda_6 = \mu_6 - 15\mu_2\mu_4 - 10\mu_3^2 + 30\mu_2^3$$

What are the corresponding expressions for μₙ in terms of the λₘ?
(b) For n independent random variables Xₖ (k = 1, . . . , n) whose first, second, and third moments, at least, exist, show for the random variable Z that is their sum that its mean, variance, and third central moment are

$$\bar{Z} = \bar{X}_1 + \bar{X}_2 + \cdots + \bar{X}_n \qquad \sigma_z^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2 \qquad (\mu_3)_z = \mu_3^{(1)} + \mu_3^{(2)} + \cdots + \mu_3^{(n)} \tag{1}$$

(c) For the semi-invariants of Z in (b), show also that

$$(\lambda_m)_z = \lambda_m^{(1)} + \lambda_m^{(2)} + \cdots + \lambda_m^{(n)} \qquad m \ge 1 \tag{2}$$

It is this simple linear combination which makes the semi-invariant so convenient in manipulations.

1.3 Let the random variable Xₖ take the (real) values Xₖ = a, with probability p, or Xₖ = b (≠ a), with probability q (= 1 − p) on the kth trial of an experiment E, and let each trial be independent of every other. Show that
(a) The d.d. of $U = \sum_{k=1}^{n} X_k$ is

$$W(u) = \sum_{k=0}^{n} {}_nC_k\,p^k q^{n-k}\,\delta\{u - [ka + b(n - k)]\} \qquad {}_nC_k = n!/(n - k)!\,k! \tag{1}$$


What is the corresponding c.f.? (The distribution of U is called the binomial distribution.)
(b) E{U} = n(ap + bq); E{U²} = n(a²p + b²q) + n(n − 1)(ap + bq)², and hence the variance is

$$\sigma_U^2 = npq(a - b)^2 \tag{2}$$

1.4 In Prob. 1.3, consider the case p = λ/n [λ > 0 (fixed)], letting n → ∞.
(a) Show that the d.d. of U, with values u = aₖ (k ≥ 0), is the Poisson distribution density

$$W(u) = \sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!}\,\delta(u - a_k) \tag{1}$$

and that the c.f. when aₖ = ka is

$$F_u(\xi) = e^{\lambda(e^{ia\xi} - 1)} \tag{2}$$

(b) Verify that E{U} = λa; E{(U − ū)²} = σ_u² = λa² (aₖ = ka).

1.5 (a) Show that the sum of any number of independent, normally distributedrandom variables is itself normally distributed with

Mean u — ) %u

VVariance <ru

2 == Y <rk2 (1)

that is, W(u)

ke-(w~w)2/2«rtt2

\/27r<rM

(6) Show that any linear function of independent normal random variables is itselfnormal.

1.6 (a) Let X be a normal random variable with zero mean and unity variance. First show that, for Z = X², w(z) = e^{−z/2}/√(2πz) (z > 0), w(z) = 0 (z < 0), and then verify that its c.f. is (1 − 2iξ)^{−1/2}.
(b) Now, letting X1, . . . , Xn be n independent random variables of the above type, show that the d.d. of U = Σ_{k=1}^n Xk² = χ² is

wn(u) = 2^{−n/2}Γ(n/2)^{−1} e^{−u/2} u^{n/2 − 1}        u > 0
      = 0                                               u < 0        (1)

Sketch the curves of wn(u) for n = 0, 1, 2, 3, 4. [Equation (1) is closely related to the χ² distribution.]
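As an aside, Eq. (1) of Prob. 1.6 can be checked by simulation. The sketch below (an assumption-laden illustration, not part of the text) histograms the sum of n squared unit normals and compares it with the stated density; n = 4, the sample size, and the bin count are arbitrary choices.

import numpy as np
from math import gamma

# Monte Carlo check of Prob. 1.6(b), Eq. (1): U = sum of n squared unit normals.
rng = np.random.default_rng(0)
n, M = 4, 200_000
u = np.sum(rng.standard_normal((M, n))**2, axis=1)

def w_n(u, n):
    # 2^{-n/2} Gamma(n/2)^{-1} e^{-u/2} u^{n/2 - 1}, u > 0, as in Eq. (1)
    return 2.0**(-n / 2) / gamma(n / 2) * np.exp(-u / 2) * u**(n / 2 - 1)

hist, edges = np.histogram(u, bins=80, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - w_n(centers, n))))   # small if Eq. (1) is correct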

1.7 (a) Show that if one has n + 1 independent normal random variables X, X1, . . . , Xn, each with zero means and variance σ², the random variable U = X/Z, Z = [n^{−1} Σ_{k=1}^n Xk²]^{1/2}, has the d.d.

Sn(u) = Γ[(n + 1)/2] / [Γ(n/2)√(nπ)] (1 + u²/n)^{−(n+1)/2}        (1)

Note that Sn(u) does not depend on σ². [Equation (1) is known as Student's d.d.]
(b) Obtain the c.f. for Sn(u), viz.,

F_U(ξ) = 2^{1−n/2}Γ(n/2)^{−1}(√n|ξ|)^{n/2} K_{n/2}(√n|ξ|)        (2)


where K is a modified Bessel function of the second kind.† Note that, for n = 1, F_U(ξ) = e^{−|ξ|}, the c.f. of the Cauchy d.d. [Eq. (1.32)].

(c) Verify that

E{U^{2m}} = (2m)! n^m / [2^m m!(n − 2)(n − 4) · · · (n − 2m)]        m ≤ (n − 1)/2,        E{U^{2m+1}} = 0        (3)

† Cf. G. N. Watson, "Theory of Bessel Functions," 2d ed., pp. 78-80, Macmillan, New York, 1944.

1.8 (a) If F_{y1y2}(ξ1,ξ2) = exp[−ψ(ξ1² + ξ2² + 2ξ1ξ2ρ)/2] is the characteristic function of the two random variables Y1, Y2, show that

W2(y1,y2) = exp[−(y1² + y2² − 2ρy1y2)/2ψ(1 − ρ²)] / [2πψ√(1 − ρ²)]        (1)

(W2 is now the bivariate normal distribution density of y1 and y2.)
(b) Verify that

ȳ1 = ȳ2 = 0        E{y1²} = E{y2²} = ψ        E{y1y2} = ψρ        (2)

What is the covariance matrix? Find the semi-invariants.
(c) Obtain the conditional probability density and its associated characteristic function.

W2(y2|y1) = [2πψ(1 − ρ²)]^{−1/2} exp[−(y2 − ρy1)²/2ψ(1 − ρ²)]        (3a)
F_{y2}(ξ2|y1) = exp[iξ2ρy1 − ψ(1 − ρ²)ξ2²/2]        (3b)

(d) Verify that the d.d. in the more general case for which

F_{y1y2}(ξ1,ξ2) = exp[iξ1ȳ1 + iξ2ȳ2 − ½(σ1²ξ1² + σ2²ξ2² + 2ρσ1σ2ξ1ξ2)]        (4a)

is

W2(y1,y2) = [2πσ1σ2√(1 − ρ²)]^{−1} exp{−[((y1 − ȳ1)/σ1)² + ((y2 − ȳ2)/σ2)² − 2ρ((y1 − ȳ1)/σ1)((y2 − ȳ2)/σ2)]/[2(1 − ρ²)]}        (4b)
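A Monte Carlo check of the conditional law in part (c) may clarify Eq. (3a). The sketch below samples the bivariate normal of Eq. (1) and verifies that, near a fixed y1, the conditional mean and variance of y2 are ρy1 and ψ(1 − ρ²); the values ψ = 2.0, ρ = 0.6, y1 = 1.0 and the slab width are illustrative assumptions.

import numpy as np

# Monte Carlo check of Prob. 1.8(c): for the bivariate normal of Eq. (1),
# y2 given y1 is normal with mean rho*y1 and variance psi*(1 - rho^2).
rng = np.random.default_rng(1)
psi, rho, M = 2.0, 0.6, 500_000
cov = psi * np.array([[1.0, rho], [rho, 1.0]])
y = rng.multivariate_normal([0.0, 0.0], cov, size=M)

y1_0, eps = 1.0, 0.05
sel = y[np.abs(y[:, 0] - y1_0) < eps, 1]        # keep y2 for y1 near y1_0
print(sel.mean(), rho * y1_0)                   # conditional mean, Eq. (3a)
print(sel.var(), psi * (1.0 - rho**2))          # conditional variance, Eq. (3a)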

1.3 Description of Random Processes

As mentioned in Sec. 1.1, a random or stochastic process from the physical point of view is a process in time, occurring in the real world and governed at least in part by a random mechanism. From the mathematical viewpoint, a stochastic process is an ensemble of events in time for which there exists a probability measure, descriptive of the statistical properties, or regularities, that can be exhibited in the real world in a long series of trials under similar conditions. Of course, in practice one almost never has the complete ensemble physically, but only a few members (i.e., a subensemble), from which, however, it is often possible to deduce the statistical properties characteristic of the corresponding mathematical model. This analytical description, like all quantitative accounts of the physical world, is at best an approximation, albeit a good one if sufficient pertinent data can be obtained.

More concisely, then, we may say that a stochastic process, y(t), is an


ensemble, or set, of real-valued† functions of time t, for example, y^(1)(t), y^(2)(t), . . . , y^(j)(t), . . . , y^(M)(t) (M finite or infinite), together with a suitable probability measure by which we may determine the probability that any one member (or group of members) of the ensemble has certain observable properties.‡ By suitable probability measure here is meant one of total measure unity, obeying the usual conditions for measurability (see Sec. 1.2 and references). A function y^(j)(t) belonging to the process is called a member function of the ensemble, or a representation, and the process itself y(t) is a stochastic or random variable, whose distribution and frequency functions accordingly exhibit the same general properties as the random variables discussed in Sec. 1.2.§ The member functions y^(j)(t) may be defined for discrete values of the time t, namely t1, t2, . . . , on a finite interval (0,T) or on an infinite one (−∞,∞); or they may be defined over a continuum of values of t for such intervals. The former is usually called a random series and the latter a random process (with the term process applied as well in the more general sense given above).

For example, consider the ensemble defined by the set of sinusoids

y(t,φ) = A0 cos(ω0t + φ)        (1.67)

where A0 and ω0 are fixed and the phase φ has random properties. The ensemble is also defined as a collection of different units, y^(k)(t) (k = 1, . . . , N), and these units (indicated by the superscript k) are taken to be the points of our probability or measure space (sometimes called index space), upon which the probability measure characteristic of a stochastic process is defined. Each of these different units may consist of one or more of the member functions y^(j)(t) (j = 1, . . . , M) of the ensemble, so that M ≥ N, which occurs when various of the ensemble members are identical. Then, with each unit (or set of units) is associated a number, which is the probability that on any one trial a member function y^(j)(t) is selected which has the particular value y^(k)(t) associated with the unit k.

To illustrate this, let us suppose that the ensemble (1.67) contains a finite number N of units; that is, N different values of y, namely, y^(k)(t) (k = 1, . . . , N), are possible [for any allowed t in the selected time interval (0,T), (−∞,∞), etc.], depending on a discrete set (M) of values φ^(j), say, in the interval (0 ≤ φ < 2π). If we require further that to every distinct unit k there corresponds a single value of φ^(j), then j = k, M = N, and all representations are equally important. Consequently, the probability of picking

† Only real-valued functions are considered here for the time being.
‡ For our purposes, these are to be identified ultimately with measurable quantities in the physical world.
§ Henceforth, we shall not distinguish between the random variable Y, or Y(t), and its values y(t). This identity always implies (at least a possible) experiment whereby the values y(t) can be observed physically. See Sec. 1.2.


the jth member on a given trial is the same as that for any other member i; that is, pj = pi = M^{−1} = N^{−1}. Similarly, the probability that on a given observation a member function is selected belonging to the subensemble (j, . . . , j + l), say, is simply (l + 1)/M, and so forth.

Usually, however, there are more member functions y^(j) in an ensemble than there are units y^(k); that is, M > N. In our example above, this corresponds to several identical representations for each distinct value of φ. Here not every unit is equally significant, and thus different probabilities or weightings pk (≥ 0) may exist for each k = 1, . . . , N, with Σ_{k=1}^N pk = 1 for unity measure. A similar argument can be applied to the cases where φ has a continuum of values, and the ensemble an infinite set of distinguishable units, as well as member functions. In many cases, the probability distributions and probability densities are then continuous or can be represented by our formalism of Sec. 1.2-4, the process in question being assumed measurable. Our next example is typical.

We remark, finally, that the representations of the ensemble need not be entirely random; members containing definite periodic or aperiodic components are not excluded. In fact, sets of functions exhibiting no random mechanism at all, e.g., Eq. (1.67), where φ also has but a single, specified value, are included as limiting cases in the general notion of ensemble. These are ensembles of a single unit, where all members are identical. Of course, not any ensemble can be used to describe a process: we must place some restrictions on the ensemble properties, namely, that the probability distributions for the function values do exist. When this is so, we have a suitable random process from which predictions and estimates can be made.

1.3-1 A Random Noise Process. To show how the various distribution densities which embody the statistical properties of the ensemble may in principle be obtained for a stochastic process, let us consider as our second example the case of an ensemble of random noise voltages y(t). Here y^(j)(t) is a typical member function, such as might be observed at the output of a cathode-ray oscilloscope when shot or thermal noise voltages are applied at the input of the device. Assuming the existence of this ensemble, then, and observing that y can take a continuum of values, we exclude for the time being any deterministic (i.e., steady, periodic, or nonrandom aperiodic) components, so that y(t) is a continuous, entirely random process. [Later on, we shall deal with mixed processes, having deterministic components. These, we may expect, are described in terms of mixed densities (cf. Sec. 1.2-4).]

Consider now the ensemble shown in Fig. 1.5, consisting of representations specified in the interval (t0, t0 + T), and let us begin by determining the first-order p.d. W1(y1,t1). First let the ensemble contain a finite number M of these representations, where M is very large. Next, divide y into a set


FIG. 1.5. An ensemble of random noise voltages y^(1), y^(2), . . . , y^(M).

of N (< M) units, each of width Δy, and then select an interval (y1, y1 + Δy), where Δy is suitably small and y1 = l Δy. We then count the number n1(M) of member functions y^(j) (j = 1, . . . , m, . . . , M) which at time t1 fall in the interval (y1, y1 + Δy), i.e., which belong to the unit corresponding to (y1, y1 + Δy). The probability measure associated with this interval or


unit is then simply

P(y1 < y ≤ y1 + Δy; t1) = [n1(M)/M]_{t1}        (1.68)

Taking the limits M → ∞, Δy → dy, we assume that n1(M) → ∞ in such a way that†

lim_{M→∞, Δy→dy} P(y1 < y ≤ y1 + Δy; t1) → W1(y1,t1) dy1
    ≡ probability that y at time t1 falls in the interval (y1, y1 + dy1)        (1.69)

Repeating this counting operation for each unit (y1, y1 + Δy), allowing y1 to range over all possible values, then gives in the limit† the desired probability density W1(y1,t1).


FIG. 1.6. First-order d.d.'s at two different times (cf. Fig. 1.5).

A similar procedure is followed for the second-order density W2(y1,t1;y2,t2). Here we select as a typical unit the joint interval (y1, y1 + Δy), (y2, y2 + Δy) and then count the number of pairs n12(M) of crossings in the two regions (cf. Fig. 1.5) for all possible values of (y1,y2). We accordingly define the joint probability P(y1 < y ≤ y1 + Δy; t1; y2 < y ≤ y2 + Δy; t2) = n12(M)/M, which in the limit M → ∞, Δy → dy1, dy2 is assumed to become

lim_{M→∞; Δy→dy1,dy2} P(y1 < y ≤ y1 + Δy; t1; y2 < y ≤ y2 + Δy; t2) → W2(y1,t1;y2,t2) dy1 dy2
    ≡ joint probability that y takes values in range (y1, y1 + dy1) at time t1 and values in interval (y2, y2 + dy2) at time t2        (1.70)

The higher-order densities are obtained in a similar fashion. The counting process is laborious, even for W2, while for Wn (n ≥ 3) it may prove entirely impractical in actual physical situations, although conceptually possible in the mathematical model. It is important to note the times (t1, t2, . . . , tn)

† For the physical systems considered here, this limit will always exist, at least in the sense of Sec. 1.2-4, within our powers of observation. For the corresponding mathematical model above, we postulate the existence of these limits.


at which these probabilities are computed. The actual structure of the d.d.'s may be quite different for different times, as shown, for example, in Fig. 1.6.
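The counting procedure of Eqs. (1.68) to (1.70) translates directly into a computation on a finite ensemble. The following sketch is only illustrative: the noise model (crudely low-pass-filtered Gaussian samples), the ensemble size M, and the chosen observation indices are assumptions, not quantities from the text. It estimates W1 and W2 by counting and checks their normalization and marginal consistency.

import numpy as np

# Counting estimate of W1 and W2 as in Eqs. (1.68)-(1.70), on an assumed ensemble.
rng = np.random.default_rng(2)
M, n_t = 5000, 400
white = rng.standard_normal((M, n_t))
smooth = lambda v: np.convolve(v, np.ones(20) / 20.0, mode="same")
y = np.apply_along_axis(smooth, 1, white)       # M member functions y^(j)(t_i)

edges = np.linspace(-1.0, 1.0, 41)
dy = np.diff(edges)

def W1(i1):
    # n1(M)/(M*dy): fraction of members per unit interval at time index i1
    counts, _ = np.histogram(y[:, i1], bins=edges)
    return counts / (M * dy)

def W2(i1, i2):
    # n12(M)/(M*dy1*dy2): joint counting at two time indices, Eq. (1.70)
    counts, _, _ = np.histogram2d(y[:, i1], y[:, i2], bins=[edges, edges])
    return counts / (M * np.outer(dy, dy))

w1, w2 = W1(100), W2(100, 150)
print(np.sum(w1 * dy))                                  # ~ 1: normalization of W1
print(np.max(np.abs(np.sum(w2 * dy, axis=1) - w1)))     # marginal of W2 ~ W1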

1.3-2 Mathematical Description of a Random Process. The hierarchy of distribution (densities) W1, W2, . . . , Wn, as n → ∞,† accordingly provides the mathematical description of the random process. Since the W1, W2, . . . , W_{n−1} are marginal d.d.'s of Wn, all statistical information for the process y is essentially contained in Wn, in general as n → ∞. Knowing Wn, as n → ∞, we may say that the process is completely‡ described [provided, of course, that certain conditions of regularity for the realizations y^(1)(t), y^(2)(t), etc., are imposed, so that these distribution densities exist [Eqs. (1.69) and (1.70)], for all t in the interval (t0, t0 + T)]. It sometimes happens that knowledge of W1 or W2 is sufficient to give us Wn for all n, so that a complete description in the above sense follows. We shall see examples of this in Sec. 1.4.

1.3-3 Some Other Types of Process. Besides the continuous random process (a) discussed above, there are several other categories to be distinguished (cf. Fig. 1.7). For instance, if we allow the stochastic variable y(t) to take a continuum of values but restrict the parameter t to a discrete set t1, . . . , tn (which may be finite or infinite), we call y(t) a continuous random series (b). An important example is the classical problem of the random walk,19,20 where the "steps," or displacements, at any one move may take a value y^(j)(tk) from a continuum (a ≤ y ≤ b), say, but these moves are made only at the discrete instants t1, t2, . . . , tk, etc. A third possibility arises when y is restricted as well to a discrete set of values y(tk)1, y(tk)2, y(tk)3, . . . at discrete times tk. Such processes are often called discrete random series (c), one instance of which is provided by a sequence of tossed coins. The tk represent the times at which the coin is tossed; the two possible values y(tk)1, y(tk)2 correspond to heads and tails, respectively. Finally, there are the cases where y again can take only a set of discrete values, but now t represents a continuum such as occurs, for example, in the output of a clamped servomechanism, or counter system, etc. (where time moves on continuously and the output of such systems maintains one of a discrete set of possible values for various periods, changing abruptly to new ones). These are discrete random processes (d).§

For most of the applications in this book, we shall be concerned with the continuous random series (b) and with the continuous random process (a).

† The description is equally well given in terms of the characteristic functions F_{y1}(ξ1,t1), F_{y1y2}(ξ1,t1;ξ2,t2), etc.
‡ Strictly speaking, we must require, in addition, the topological property of separability of the process (see Doob,16 chap. 2, sec. 2).
§ In distinguishing the various types, it is helpful to observe that the terms discrete and continuous refer to the stochastic variable y, while process and series are applied to the parameter t.


Besides providing a useful model of the detailed structure of various noise mechanisms (cf. Part 2), processes of type (b) occur frequently when continuous data are quantized (in time) by sampling procedures (cf. Part 4). On the other hand, the continuous process is appropriate for the macroscopic treatment of electronic noise and other fluctuation phenomena, observed at a scale well above the level where the fundamentally discrete nature of the physical mechanism is apparent. It is appropriate, also, to many situations in communication theory where continuous (and discrete) data are processed


FIG. 1.7. Four types of random processes: (a) continuous random process, a random noise voltage; (b) continuous random series, a one-dimensional random walk; (c) discrete random series, sequence of tossed coins; (d) discrete random process, a clamped servo output.

continuously, e.g., linear and nonlinear filtering, modulation, rectification, etc. Moreover, it is often necessary to regard the continuous process, or continuous random wave, as a limiting form of the continuous series, such as occurs, for example, in the analysis of optimum decision systems for the detection and extraction of signals from noise backgrounds (cf. Part 4, Chaps. 18 to 20). We remark that one can, of course, equally well discuss these processes in terms of their distributions, instead of their distribution densities, although the δ formalism (Sec. 1.2-4) provides the appropriate d.d. for "pure" processes, and for mixed ones as well, containing deterministic components, e.g., steady periodic terms, etc. The more refined mathematical concepts16,18,21-25 of a random process, besides the comparatively simple notions of continuity, introduce questions of bounded variation, differentiability, convergence in probability, expandability, etc., to


which for our purposes we shall here give no more than the briefest of treatments (cf. Secs. 2.1-1, 2.1-2); the interested reader is referred to the appropriate literature.

1.3-4 Deterministic Processes. Besides the entirely random processes (a) to (d) (Secs. 1.3-2, 1.3-3), there are degenerate forms of such processes. Consider, for example, the process y(t) = A0 cos(ω0t + φ), where A0, ω0 are fixed and φ has some d.d. in (0,2π). Then, representations or realized values of φ [for example, φ^(j) (j = 1, . . .)] can be deduced by measurements of y^(j)(t), and the future behavior of each y^(j)(t), corresponding to each φ^(j), is accordingly specified exactly. When a finite number (here two) of such measurements determines the future behavior of any representation of the ensemble, the process is completely deterministic. Deterministic processes possess a definite functional dependence on time, while their random character appears parametrically. We shall normally reserve the term stochastic or random process for those cases, e.g., thermal noise, shot noise, etc., to which such definite functional structures (in time) cannot be given and shall call those processes "mixed" which contain in addition a deterministic component. A typical example of interest is the additive mixture of noise and signal ensembles, representations of which constitute the received waves in many communication problems. Sometimes, too, the mixture is multiplicative, as well as additive: the desired signal may be a noise-modulated carrier, for example, N1(t) cos(ω0t + φ), appearing in a noise background N2(t), and so forth. A variety of mixed processes will occur in practical applications, as we shall see in subsequent chapters.

1.3-5 Signals and Noise. We define in general a signal to be any desired component of a transmitted or received wave, while noise is the accompanying undesired component. Signals may be deterministic or entirely random, while the noise, usually random, may in many cases possess the deterministic properties frequently associated with signal ensembles. In particular problems, it will be evident which is which.

We may use these ideas to describe the signal types with which we have to deal. Writing

ys(t) = As(t;ε,θ)        (1.71)

for a general signal ensemble, we note that A is a measure of the power content (i.e., scale in amplitude) of this signal, while s(t), a normalized waveform, indicates its structure in time, which is assumed given. Here θ represents all other descriptive parameters, which may or may not possess continuous or discrete p.d.'s, while ε is an epoch, relating the wave in time to some arbitrary origin, as shown, for example, in the ensemble of Fig. 1.8. The epoch is measured from some definite (i.e., functionally determined) point in the waveform to some chosen origin of time.

Suppose θ has measure unity for θ = θ0. A signal ensemble of (infinitely) many units is produced if we allow ε to have a (continuous) d.d. in the inter-


val (a ≤ ε ≤ b). For periodic waves, this usually becomes (0 ≤ ε < T0), where T0 is the period of a typical member. For aperiodic waves, a similar situation exists for 0 ≤ ε < Ts, with a p.d. for ε in this interval. We observe also that the descriptive parameters (A,ε,θ) may be themselves functions of time, albeit stochastic ones. For example, our signal may be the triangular wave shown in Fig. 1.8, but with random periods (for example, T0 possesses a d.d.); or the periods may be fixed [w(T) = δ(T − T0)], but the amplitude (i.e., scale) of the wave may be subject to random variations, either from member to member (but determined for any one representation) or from time to time (for each member of the ensemble). Still other possibilities


FIG. 1.8. An ensemble of periodic signals, of fixed amplitude and structure, but random epoch ε in (0,T0).

may occur. The signal may be of the form S(t) = AN(t), for example, where A is determined and N(t) is an entirely random process. Clearly, any combination of random and deterministic components is possible; exactly the same sort of statement may be made about the accompanying noise, although in most cases considered here the noise is assumed to contain no deterministic components, at least before it is processed with the signal.
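As a concrete illustration of Eq. (1.71) and Fig. 1.8, the sketch below generates a finite signal ensemble with a fixed normalized waveform s(t), a fixed scale A, and a random epoch ε uniform over one period; the triangular waveform and all numerical values are assumptions made for the example only.

import numpy as np

# A finite signal ensemble in the sense of Eq. (1.71): y_s(t) = A*s(t + eps),
# with s a normalized triangular wave of period T0 and eps uniform in (0, T0).
rng = np.random.default_rng(3)
A, T0, M = 2.0, 1.0, 1000
t = np.linspace(0.0, 4.0 * T0, 800)

def s(t):
    # normalized triangular wave, unit period and unit peak amplitude
    x = np.mod(t, T0) / T0
    return 4.0 * np.abs(x - 0.5) - 1.0

eps = rng.uniform(0.0, T0, size=M)              # one epoch per member
ensemble = A * s(t[None, :] + eps[:, None])     # member functions y_s^(j)(t)
print(ensemble.shape)                           # (M, number of time points)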

1.3-6 Stationarity and Nonstationarity. The process is said to be nonstationary when its ensemble properties, i.e., the descriptive hierarchy of

FIG. 1.9. A stationary ensemble of random noise voltages y^(1), y^(2), . . . , y^(M).

probability densities (cf. Sec. 1.3-2), depend on the absolute origin of time, through t1, t2, . . . , tn, and not alone on the differences t2 − t1, t3 − t1, . . . . Expressed somewhat more concisely, a process is nonstationary if its ensemble does not remain invariant under an arbitrary (linear) time displacement. Thus, if the ensemble y(t) goes into a different ensemble y′(t) = y(t + t′), this is equivalent to a change of measures (W1, W2, . . . , Wn, . . .) to a new set (W1′, . . . , Wn′, . . .) with t′. Physically, the underlying physical


mechanism changes in the course of time, as shown, for example, in Figs. 1.5, 1.6: W1 at t1 is quite different from W1 at t2 (≠ t1), and so on.

Conversely, a process is said to be stationary26 if the ensemble remains invariant under an arbitrary (linear) time shift, y(t) → y(t + t′) = y(t); the ensemble remains unchanged, or equivalently all probability measures remain unaffected by this displacement, viz.,

Wn(y1,t1; . . . ;yn,tn) = Wn(y1, t1 + t′; . . . ;yn, tn + t′)        n ≥ 1        (1.72)

The statistical properties of the ensemble are independent of the absolute origin of time and are functions only of the various time differences between observations, so that we may write

Wn(y1,t1;y2,t2; . . . ;yn,tn) = Wn(y1,0;y2,t2 − t1; . . . ;yn,tn − t1)
                             = Wn(y1;y2,t2; . . . ;yn,tn)        n ≥ 1        (1.73)

letting t1 = 0 for convenience. Equivalently, for every arbitrary (linear) shift t → t′, each representation y^(j) of the original ensemble goes over into another member y^(k) of the new ensemble in such a way that the weighting or measure associated with corresponding units in the two ensembles remains unchanged. Figure 1.9 shows a typical ensemble of stationary random noise voltage waves, where each representation persists over the interval (−∞ < t < ∞).†

The hierarchy of defining probability densities is determined as before in the nonstationary case (cf. Sec. 1.3-1). We write

W1(y1) dy1 = W1(y) dy = probability that y takes values in interval (y, y + dy)

W2(y1; y2, t2 − t1) dy1 dy2 = joint probability that y takes values in range (y1, y1 + dy1) at time t1 and values in interval (y2, y2 + dy2) at another time t2

Wn(y1; y2, t2 − t1; . . . ; yn, tn − t1) dy1 · · · dyn = joint probability of finding y in intervals (y1, y1 + dy1), (y2, y2 + dy2), . . . , (yn, yn + dyn) at an initial time t1 and times t2 − t1, . . . , tn − t1, respectively

(1.74)

where usually, to simplify the notation and the discussion, we shall arbitrarily take t1 = 0 as the time origin [cf. Eq. (1.73)]. In contrast with the nonstationary cases, the underlying physical mechanism of these random processes does not change with time.

† Note that, if the interval (t0 ≤ t ≤ t0 + T) over which the member functions are in general different from zero is finite while the members are zero outside, the process is nonstationary, even if within that interval the measures remain invariant: one can always find an arbitrary translation |t′| > T which takes one outside the original interval and therefore yields a different ensemble, each member of which is now zero. The process of Fig. 1.5 is nonstationary for this reason, in addition to the obvious variation of its scale in the interval (t0, t0 + T).


Processes for which Eq. (1.72) is true are also called strictly, or completely, stationary processes.† A less strict form of stationarity arises when we require only that the covariance function of the process depend on time differences alone,

E{[Y(tk) − E{Y(tk)}][Y(tl) − E{Y(tl)}]} = K_Y(|tl − tk|)        E{Y²} < ∞, |E{Y}| < ∞        (1.75)

[cf. Eqs. (1.52), (1.60a)]. This, of course, is not enough to ensure that the distribution densities W1, W2, etc., remain invariant under an arbitrary shift t′ [cf. Eq. (1.72)]. For example, consider the process (e.g., series) for which

E{Yk (= ȳk)} = 0        E{Yk²} = σ² (> 0)        K_Y(k) = 0, k ≠ 0        K_Y(0) = σ²        (1.76)

where the yk are statistically independent (cf. Sec. 1.2-10). Then unless the wk(yk) are all alike, higher moments, for example, the averages of yk²yl²ym² and yk³yl²ym, will in general depend on k, l, m, not on the differences l − k, m − k, etc., and the process is not completely stationary.

When a process obeys Eq. (1.75), it is said to be wide-sense stationary, or stationary to the second order.‡ It is clear that processes that are strictly stationary are also wide-sense stationary, but the reverse is not true, except in the very important special case of the normal, or gaussian, process, where the process itself is determined entirely by the structure of the covariance function (or matrix) (and the expectations ȳ) [cf. Eqs. (1.60), (1.61), and Chap. 7]. The conditions for the existence of processes stationary in the wide sense are thus determined by the conditions for the existence of the covariance function.
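The example of Eq. (1.76) can be made concrete with a short simulation: a series of independent, zero-mean samples with a common variance is wide-sense stationary, yet if the first-order laws wk(yk) differ from index to index a higher moment (here the fourth) betrays the lack of strict stationarity. The particular pair of distributions and the sample sizes below are illustrative assumptions.

import numpy as np

# Wide-sense but not strictly stationary, in the spirit of Eq. (1.76): the
# even-indexed samples are Gaussian and the odd-indexed samples two-valued,
# both zero-mean with the same variance sigma^2 (an assumed construction).
rng = np.random.default_rng(4)
M, n, sigma = 200_000, 6, 1.0
y = np.empty((M, n))
y[:, 0::2] = sigma * rng.standard_normal((M, (n + 1) // 2))
y[:, 1::2] = sigma * rng.choice([-1.0, 1.0], size=(M, n // 2))

print(y.mean(axis=0).round(3))        # means ~ 0 for every index k
print(y.var(axis=0).round(3))         # variances ~ sigma^2: Eq. (1.75) holds
print((y**4).mean(axis=0).round(3))   # ~ 3*sigma^4 vs sigma^4, depending on k:
# the first-order law changes with the index, so the series is not strictly stationary.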

Of course, stationarity, either in the wide or in the strict sense, is an idealization. Physically, no process can have started at t = −∞, nor can it continue to t → ∞ without changes in the underlying physical mechanism. In actuality, one always deals with a finite sample, but if the fluctuation time of the process (roughly, the mean spacing between successive "zeros" in time of any member function y^(j)(t)) is small compared with the observation time, or sample length, and if the underlying random mechanism does not change in this interval, we may say that the process is practically stationary. Then we expect that results calculated on the assumption of true stationarity will agree closely with those determined by taking the finite duration of the ensemble into account. Accordingly, our first applications of probability methods assume strict stationarity, either for the ensemble itself or for the various subensembles comprising the process. A more refined theory follows which considers the effects of finite sample (and ensemble) size in time (cf. Chaps. 18 to 23).

† Doob,16 chap. 2, sec. 8; Bartlett,17 sec. 6.4.
‡ Doob, op. cit., chap. 2, sec. 8, pp. 94, 95; Bartlett, op. cit., sec. 6.1.


1.3-7 Examples of Stationary and Nonstationary Processes. To illustrate the remarks of Sec. 1.3-6, let us consider a number of simple examples:

(1) The process y(t,φ) = A0 cos(ω0t + φ). Here we have an ensemble of sinusoids (A0 and ω0 fixed), where φ has a p.d. w(φ) defined in the primary interval (0,2π). [Because of the periodic structure of the member functions, w(φ) is similarly defined in all other period intervals of the same ensemble.] Testing the definition of (strict) stationarity above, we see that the arbitrary shift t → t′ = t + t0′ gives

y(t + t0′, φ) = A0 cos(ω0t + ω0t0′ + φ) = A0 cos(ω0t + φ′)        φ′ = φ + ω0t0′        w(φ) → w′(φ′)        (1.77)

where w′(φ′) is the corresponding p.d. for φ′. Unless φ is uniformly distributed in the primary interval (0,2π), φ′ will have a different distribution in its own primary interval, as can be seen in Fig. 1.10a and b, and the measures w, w′ will not be the same. For the former d.d. w, then, the process is stationary.† For the latter, it is not.

FIG. 1.10. Probability densities of φ, φ′ for the (a) stationary and (b) nonstationary ensemble [cf. Eq. (1.77)].

† Strictly speaking, we must show that all Wn (n ≥ 1) for y remain invariant [cf. Eq. (1.72)]. However, here it is clear that the transformed ensemble has the same form and that, since w(φ) = w′(φ′), every subensemble [e.g., those y's corresponding to φ and to φ′ in the interval (0,π/2)] has the same measure, so that the ensemble itself remains unchanged. Using the techniques of Sec. 1.3-8, the reader may find it instructive to show that this is equivalent to the invariance of Wn (n ≥ 1) for this particular ensemble y (see Prob. 1.10).
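The stationarity test of example (1) can also be carried out empirically: sample the ensemble at two different times and compare the resulting first-order histograms. In the sketch below, the uniform-phase case yields matching histograms while a non-uniform phase law does not; A0, ω0, the two sampling times, and the alternative phase density are assumptions chosen only for illustration.

import numpy as np

# First-order histograms of y(t, phi) = A0*cos(w0*t + phi) at two times.
rng = np.random.default_rng(5)
A0, w0, M = 1.0, 2.0 * np.pi, 100_000
t1, t2 = 0.0, 0.37
edges = np.linspace(-A0, A0, 41)

def hist_at(t, phi):
    h, _ = np.histogram(A0 * np.cos(w0 * t + phi), bins=edges, density=True)
    return h

phi_uniform = rng.uniform(0.0, 2.0 * np.pi, M)              # stationary case
phi_peaked = np.mod(rng.normal(0.0, 0.5, M), 2.0 * np.pi)   # assumed non-uniform law

print(np.max(np.abs(hist_at(t1, phi_uniform) - hist_at(t2, phi_uniform))))  # small
print(np.max(np.abs(hist_at(t1, phi_peaked) - hist_at(t2, phi_peaked))))    # large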


(1a) The process y(t,φ) = A0 cos(ω0t + φ) + N(t). Here N(t) is a noise ensemble. If N(t) is stationary, then y is also stationary, provided that φ is uniformly distributed in the primary interval (0,2π), as we have noted above. On the other hand, if either (or both) of the subensembles A0 cos(ω0t + φ), N(t) is nonstationary, so also is y.

(2) The continuous process y(t,φ) = A0[1 + λ cos(ωm t + φ)] cos(ω0t + φ + φ0). This is an ensemble of sinusoidally modulated carriers, A0, ωm, φ0, ω0 fixed, and λ chosen so that A0[ ] ≥ 0. To test it for stationarity, let t → t′ = t0′ + t, so that

y(t + t0′, φ) = A0{1 + λ cos[ωm t + φ′ + (ωm − ω0)t0′]} cos(ω0t + φ0 + φ′)        φ′ = φ + ω0t0′        (1.78)

Even if w(φ) is uniform, so that w′(φ′) is also [in its primary interval (cf. Fig. 1.10a)], the same ensemble does not result for arbitrary t′ [cf. Eq. (1.78)], and so y(t,φ) here is not stationary. This is an example where the W1, W2, W3, . . . , for y do not remain invariant,† although w(φ) does.‡

1.3-8 Distribution Densities for Deterministic Processes. In many practical cases, the signals which one has to deal with are deterministic. Their ensemble properties are accordingly derived from the probability measures associated with the various descriptive parameters A, ε, θ [cf. Eq. (1.71)] of the signal. From the methods of Sec. 1.2-14, especially Eqs. (1.62), (1.63), we may represent the hierarchy of d.d.'s of ys = As(t;ε,θ) by

Wn(y1,t1; . . . ;yn,tn)s = ∫· · ·∫_{−∞}^{∞} e^{−i(ξ1y1 + · · · + ξnyn)} F_{y1,...,yn}(ξ1,t1; . . . ;ξn,tn)s dξ1 · · · dξn/(2π)^n        n ≥ 1        (1.79)

where specifically the characteristic function Fs is

F_{y1,...,yn}(ξ1,t1; . . . ;ξn,tn)s = ∫· · ·∫_{−∞}^{∞} exp{iA[ξ1s(t1;ε,θ) + · · · + ξn s(tn;ε,θ)]} w(A,ε,θ) dA dε dθ        (1.80)

In the still more general cases where the defining parameters are also time-dependent, Eq. (1.80) becomes

Fs = ∫· · ·∫_{−∞}^{∞} exp[i Σ_{k=1}^n ξk Ak s(tk;εk,θk)] w(A1, . . . ,An; ε1, . . . ,εn; θ1, . . . ,θn) dA dε dθ1 · · · dθn        (1.81)

† See p. 34.
‡ For deterministic processes, it is usually possible to decide at once by inspection whether or not the same ensemble is generated as t → t′ = t0′ + t, without having explicitly to demonstrate the invariance or noninvariance of Wn.


As an example, let us consider the simple ensemble of periodic waves ys = s(t,ε) = s(t + ε), with period T0, so that s(t + T0; ε) = s(t;ε), etc., where the epoch ε is uniformly distributed in a period interval (0,T0) (cf. Fig. 1.8). This is equivalent to a uniform distribution of the phase φ = 2πε/T0 in (0,2π). Applying Eq. (1.80) to Eq. (1.79) for the case n = 1, we can write at once

W1(y1,t1)s = ∫_{−∞}^{∞} (dξ1/2π) e^{−iξ1y1} ∫_0^{2π} e^{iξ1s(0;φ)} (dφ/2π) = W1(y1)s        (1.82)

since the process is stationary under these conditions (cf. Prob. 1.9a), and hence independent of t1, which we set equal to zero here for convenience. In a similar way, the higher-order densities depend on t only through the differences t2 − t1, t3 − t1, . . . . As an example, Fig. 1.11 shows W1(y)s for the ensemble of sinusoids ys = A0 cos(ω0t + φ), where specifically from Eq. (1.82) we get (cf. Prob. 1.11)

W1(y)s = [π√(A0² − y²)]^{−1}        −A0 < y < A0
       = 0                          |y| > A0        (1.82a)

FIG. 1.11. First-order probability density W1(y) of the ensemble of sine waves y(t,φ) = A0 cos(ω0t + φ), φ uniformly distributed in (0,2π). (Experimental points after N. Knudtzon, Experimental Study of Statistical Characteristics of Filtered Random Noise, MIT Research Lab. Electronics Tech. Rept. 115, July 15, 1949.)

An experimental demonstration of this has been given by Knudtzon.27
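A numerical version of this comparison is straightforward: histogram samples of y = A0 cos(ω0t + φ), with φ uniform in (0,2π), and compare with Eq. (1.82a). In the sketch below A0, the observation time, the sample size, and the decision to stay away from the endpoints (where the density diverges) are all illustrative assumptions.

import numpy as np

# Histogram check of Eq. (1.82a) for y = A0*cos(w0*t + phi), phi uniform in (0, 2*pi).
rng = np.random.default_rng(6)
A0, w0, t, M = 1.0, 2.0 * np.pi, 0.123, 400_000
y = A0 * np.cos(w0 * t + rng.uniform(0.0, 2.0 * np.pi, M))

edges = np.linspace(-0.95 * A0, 0.95 * A0, 60)   # keep clear of the integrable endpoints
counts, _ = np.histogram(y, bins=edges)
hist = counts / (M * np.diff(edges))             # fraction of all samples per unit y
centers = 0.5 * (edges[:-1] + edges[1:])
w1 = 1.0 / (np.pi * np.sqrt(A0**2 - centers**2))
print(np.max(np.abs(hist - w1)))                 # small if Eq. (1.82a) holds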

While Eqs. (1.79), (1.80) give these higher-order densities in a general fashion, we can take advantage of the periodic and stationary character of this ensemble s(t;φ) to obtain (Wn)s more simply. Let us illustrate with


(W2)s, introducing the conditional density W2(y1|y2; t2 − t1)s through the relation†

W2(y1; y2, t2 − t1)s = W1(y1)s W2(y1|y2; t2 − t1)s        (1.83)

Now, y(t1) = s(t1 + ε) and y(t2) = s(t2 + ε), where ε is uniformly distributed over a period interval, since y is assumed to be stationary. Furthermore, since y is deterministic, we can invert these relations to obtain

t1 + ε = s^{−1}(y1)        t2 + ε = s^{−1}(y2)        (1.84)

Also, because of the deterministic structure of y, once s is known at a time t1 + ε, it is specified (functionally) for all other times, so that the conditional density in Eq. (1.83) becomes simply

W2(y1|y2; t2 − t1)s = δ[y2 − s(t2 + ε)] = δ[y2 − s(t2 − t1 + t1 + ε)] = δ{y2 − s[t2 − t1 + s^{−1}(y1)]}        (1.85)

The second-order density [Eq. (1.83)] can then be written

W2(y1; y2, t2 − t1)s = W1(y1)s δ{y2 − s[t2 − t1 + s^{−1}(y1)]}        (1.86)

where W1(y1)s is obtained from Eq. (1.82). Higher-order densities may be constructed in the same way (cf. Prob. 1.13). Thus, with Wn (n ≥ 1) for entirely random processes and (Wn)s (n ≥ 1) for deterministic processes, we can now, in principle at least, give a complete description (cf. Sec. 1.3-2) of the mixed processes which occur most frequently in applications.
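The delta-function structure of Eq. (1.86) is easiest to see for a waveform that is invertible over one period. The sketch below uses a unit sawtooth (an assumed waveform, chosen so that s^{-1} is single-valued) and verifies that, once y1 is observed, y2 at a later time is fixed exactly at s[t2 − t1 + s^{-1}(y1)], i.e., all the probability lies on the ridge described by Eq. (1.86).

import numpy as np

# Eq. (1.86) for an invertible periodic waveform: unit sawtooth s(t) = mod(t, T0)/T0,
# whose inverse over one period is s^{-1}(y) = T0*y. T0, t1, t2, M are assumptions.
rng = np.random.default_rng(7)
T0, t1, t2, M = 1.0, 0.2, 0.55, 100_000

def s(t):
    return np.mod(t, T0) / T0

eps = rng.uniform(0.0, T0, M)                   # uniform epoch over one period
y1, y2 = s(t1 + eps), s(t2 + eps)
y2_pred = s(t2 - t1 + T0 * y1)                  # s[t2 - t1 + s^{-1}(y1)]
print(np.max(np.abs(y2 - y2_pred)))             # ~ 0: probability on the ridge only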

PROBLEMS

1.9 (a) Show that any ensemble of periodic waves with stationary parameters is stationary.
(b) Verify that the ensemble of periodically modulated carriers, f(t) cos(ω0t + φ), f(t) = f(t + T0), is in general not stationary. When is it stationary? When is it wide-sense stationary?

1.10 If φ is uniformly distributed over the primary interval (0,2π), show with the help of Sec. 1.3-8 that the ensemble y = A0 cos(ω0t + φ) (A0, ω0 fixed) is stationary; i.e., show the invariance of Wn(y1,t1; . . . ;yn,tn) (n ≥ 1) under an arbitrary (linear) time shift.

1.11 For the ensemble of Prob. 1.10, show that

W1(y) = [π√(A0² − y²)]^{−1}        −A0 < y < A0
      = 0                          |y| > A0        (1)

F_y(ξ) = J0(A0ξ)        Bessel function of the first kind        (2)

† See Sec. 1.2-9, especially Eqs. (1.45) et seq. Note, however, that here we use the alternative notation: the quantity preceding the vertical bar is assumed given; that following is the argument of the function.


(See Fig. 1.11.) Show also that the distribution function of y is

D1(y) = 0                                  y < −A0
      = 1/2 + (1/π) sin^{−1}(y/A0)         −A0 ≤ y ≤ A0
      = 1                                  y > A0        (3)

1.12 Show for the ensemble y = A0 cos(ω0t + φ) [A0, ω0 fixed, φ uniformly distributed in (0,2π)] that

F_{y1y2}(ξ1, ξ2; t2 − t1) = Σ_{m=0}^{∞} (−1)^m εm Jm(A0ξ1) Jm(A0ξ2) cos mω0(t2 − t1)        (ε0 = 1; εm = 2, m ≥ 1)        (1)

and

W2(y1, y2; t2 − t1) = [π√(A0² − y1²)]^{−1} δ{y2 − A0 cos[cos^{−1}(y1/A0) + (t2 − t1)ω0]}        (2a)

or

W2(y1, y2; t2 − t1) = (1/π²) Σ_{m=0}^{∞} εm cos mω0(t2 − t1) cos[m sin^{−1}(y1/A0) + mπ/2] cos[m sin^{−1}(y2/A0) + mπ/2] / [√(A0² − y1²) √(A0² − y2²)]        −A0 < y1, y2 < A0
                    = 0        |y1|, |y2| > A0        (2b)

Since y is stationary, W2 depends only on the time difference t2 − t1.
1.13 Show for the ensemble of Prob. 1.12 that

Wn(y1; y2,t2; . . . ;yn,tn) = [π√(A0² − y1²)]^{−1} Π_{k=1}^{n−1} δ{y_{k+1} − A0 cos[cos^{−1}(y1/A0) + t_{k+1}ω0]}        −A0 < yk < A0
                            = 0        otherwise        (1)

1.14 (a) For the ensemble of Prob. 1.12, show that the joint first-order density of y and ẏ is

W1(y,ẏ) = W1(y)[δ(ẏ + ω0√(A0² − y²)) + δ(ẏ − ω0√(A0² − y²))]/2        (1)

where W1(y) is given in Prob. 1.11, Eq. (1), if we observe that

F_{y,ẏ}(ξ1,ξ2) = J0(A0√(ξ1² + ω0²ξ2²))        (2)

(b) Show also that y and ẏ are uncorrelated, that is, E{yẏ} = 0, but that y and ẏ are not statistically independent.
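Both statements of Prob. 1.14(b) can be confirmed by simulation. The sketch below draws the phase uniformly, forms y and ẏ, and checks that their product averages to zero while |ẏ| is completely determined by y; A0, ω0, the observation time, and the sample size are assumptions for the illustration.

import numpy as np

# Prob. 1.14(b): y = A0*cos(w0*t + phi), ydot = -A0*w0*sin(w0*t + phi), phi uniform.
rng = np.random.default_rng(8)
A0, w0, t, M = 1.0, 2.0 * np.pi, 0.4, 500_000
phi = rng.uniform(0.0, 2.0 * np.pi, M)
y = A0 * np.cos(w0 * t + phi)
ydot = -A0 * w0 * np.sin(w0 * t + phi)

print(np.mean(y * ydot))                                          # ~ 0: uncorrelated
print(np.max(np.abs(np.abs(ydot) - w0 * np.sqrt(A0**2 - y**2))))  # ~ 0: |ydot| fixed by y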

1.15 (a) Repeat Prob. 1.14, except that now one is to show that the joint first-order density of y and ÿ is

W1(y,ÿ) = W1(y) δ(ÿ + ω0²y)        (1)

with the characteristic function

F_{y,ÿ}(ξ1,ξ2) = J0[A0(ξ1 − ω0²ξ2)]        (2)

(b) Verify that y and ÿ are not statistically independent, and show specifically that

E{yÿ} = −A0²ω0²/2        (3)


1.4 Further Classification of Random Processes

We have seen in Sec. 1.3-2 that knowledge of the probability densities Wn for all n ≥ 1 completely describes the random process. Many physical processes, however, are not so statistically complex that all Wn are needed to provide complete information about the process; it is frequently sufficient to know W1 or W2 alone. As we shall observe presently, once these distribution densities have been found to contain all the possible statistical information, we can then determine from them the higher-order densities Wn (n → ∞), completing the descriptive hierarchy. An example is the periodic deterministic process of Sec. 1.3-8, where (Wn)s is completely determined once (W1)s has been given [cf. Eq. (1.86)]. Other examples of entirely random processes are considered below in Secs. 1.4-2, 1.4-3.

Classification is important, because often, for the solution of particular problems, we need to know whether or not W1, or W2, etc., really contains all the information about the process. For example, in the theory of the Brownian motion, W1(r,ṙ)† is not sufficient to describe the stochastic process completely. One needs in addition the joint second-order density W2(r1, ṙ1; r2, ṙ2; t2 − t1). Similar remarks apply in the theory of thermal noise (cf. Chap. 11). Furthermore, although a second-order density is adequate for many uses (cf. Chaps. 5 and 12 to 15), all orders of distribution density may be required for both linear and nonlinear cases when the more general methods of statistical decision theory are applied to system analysis and design (cf. Chaps. 18 to 23).

1.4-1 Conditional Distribution Densities. It is convenient at this stage to extend the notion of conditional probability mentioned in Sec. 1.2-9 and very briefly in Sec. 1.3-8. We define the simplest conditional p.d. for a random process y(t) by‡

W2(y1,t1|y2,t2) dy2 = probability that, if y has value y1 at time t1, y will have values in range (y2, y2 + dy2) at time t2 later        (1.87)

Higher-order densities may be defined in a similar way. We have as one generalization

Wn(y1,t1; . . . ;y_{n−1},t_{n−1}|yn,tn) dyn = probability that, if y takes values y1, . . . , y_{n−1} at respective times t1, . . . , t_{n−1}, y will have values in interval (yn, yn + dyn) at time tn later (tn > t_{n−1}, . . .)        (1.88)

† Here r is the vector displacement and ṙ the velocity of the heavy particle in suspension.
‡ Here the hypothesis precedes the argument, which is the reverse of the notation in Sec. 1.2-9 (cf. second footnote, p. 17).


From the definitions (1.45), etc., we have the following relations for Wn(|) and Wn( ),

W2(y1,t1;y2,t2) = W1(y1,t1) W2(y1,t1|y2,t2)        (1.89)

W3(y1,t1;y2,t2;y3,t3) = W2(y1,t1;y2,t2) W3(y1,t1;y2,t2|y3,t3)
                      = W1(y1,t1) W2(y1,t1|y2,t2) W3(y1,t1;y2,t2|y3,t3)        (1.90)

and so on, for all n ≥ 2. Since Wn(|) is a probability density, it must satisfy the conditions

Wn(y1,t1; . . . ;y_{n−1},t_{n−1}|yn,tn) ≥ 0        (1.91a)
∫ Wn(y1,t1; . . . ;y_{n−1},t_{n−1}|yn,tn) dyn = 1        (1.91b)
∫· · ·∫ W_{n−1}(y1,t1; . . . ;y_{n−1},t_{n−1}) Wn(y1,t1; . . . ;y_{n−1},t_{n−1}|yn,tn) dy1 · · · dy_{n−1} = W1(yn,tn)        (1.91c)

which follow at once from the definition of conditional probability. Note that here the times are ordered, namely (t1 ≤ t2 ≤ · · · ≤ tn). Other conditional distribution densities exist also. For instance, one can also define a marginal conditional probability density of the form W_{m,n} = W_{m,n}(y1,t1; . . . ;ym,tm|y_{m+1},t_{m+1}; . . . ;yn,tn), where as before the quantities to the left of the vertical bar are given and those to the right are to be predicted. In subsequent applications, however, Wn(. . . |yn,tn) is sufficient. Note that other properties of Wn(. . . | . . .) are consequences of the defining relations (1.87), (1.88),

lim_{t2 − t1 → 0} W2(y1,t1|y2,t2) = δ(y2 − y1)
and therefore        (1.92)
lim_{t2 − t1 → 0} W2(y1,t1;y2,t2) = W1(y1,t1) δ(y2 − y1)

since it is certain that y2 = y1, a given quantity, when the times of observation coincide [remember that W2(|) is a probability density; cf. Sec. 1.2-4]. At the other extreme, we have

lim_{t2 − t1 → ∞} W2(y1,t1|y2,t2) = W1(y2,t2)
and therefore        (1.93)
lim_{t2 − t1 → ∞} W2(y1,t1;y2,t2) = W1(y1,t1) W1(y2,t2)

which, for the entirely random processes assumed here, is simply a statement of the fact that there is no "memory," or statistical dependence of values of y observed at two different times sufficiently separated in time.† Similar interpretations follow directly for W_{m,n} as t2 − t1 → ∞ or 0.

1.4-2 The Purely Random Process.‡ We return now to classification. The simplest type of process y(t) is said to be purely random when

† When the process contains a steady or periodic component, then there is memory for t2 − t1 → ∞; the values of y at t2 are dependent in this respect on values at t1. See Sec. 3.1-5, for example.
‡ Note the distinction between purely random and entirely random (cf. Sec. 1.3-1).


successive values of y are in no way correlated, i.e., when successive values of the random variable are statistically independent of the preceding ones (no matter how small the interval between observations). This can be written from Eq. (1.88) as

Wn(y1,t1; . . . ;y_{n−1},t_{n−1}|yn,tn) = W1(yn,tn)        (1.94)

Substitution into Eqs. (1.89), (1.90), etc., for n ≥ 2 gives us the hierarchy

W2(y1,t1;y2,t2) = W1(y1,t1) W1(y2,t2)        (1.95a)

Wn(y1,t1; . . . ;yn,tn) = Π_{k=1}^n W1(yk,tk)        (1.95b)

for all orders n. From Eq. (1.59), we may say that the y1, . . . , yn are statistically independent for every tj ≠ tk. All information about the process is contained in the first-order d.d. W1, which now enables us to describe the process completely, according to Eq. (1.95b) (cf. Sec. 1.3-2). In practical cases, it is easy to give examples of a random series, i.e., when t is discrete (cf. Sec. 1.3-3). The toss of a coin at stated times, yielding a sequence of heads and tails, is one example. The method of random sampling in classical statistics is another. However, when t is continuous, the purely random process is strictly a limiting class; in physical situations, y1 and y2 are always correlated when the corresponding time difference t2 − t1 is finite (> 0) though small. In fact, it is the nature and degree of such correlations that are of particular interest in physical cases.

1.4-3 The Simple Markoff Process.† The next more complicated process arises when all the information is contained in the second-order probability density W2(y1,t1;y2,t2). These are called Markoff processes,28 after the Russian mathematician who first studied them. They form a class of considerable importance, since in one sense or another most noise and many signal processes are of this type. By a simple Markoff process‡ we mean a stochastic process y(t) such that values of the random variable at any time tn depend on values of y at any set of previous times (t1, . . . , t_{n−1}) only through the last available value y_{n−1}. Thus, we write for the conditional density [Eq. (1.88)]

Wn(y1,t1; . . . ;y_{n−1},t_{n−1}|yn,tn) = W2(y_{n−1},t_{n−1}|yn,tn)        t1 < t2 < · · · < tn        (1.96)

† For a discussion of Markoff processes in the discrete case, e.g., Markoff chains, or discrete Markoff processes, see Feller, op. cit., chap. 15, and also Bartlett, op. cit., sec. 2.2, and chap. 2 in general (secs. 6.3-6.5 consider continuous Markoff processes). For some applications to random walk, renewal, and queueing problems, see Bartlett, chap. 2. Further references are contained in Bartlett's bibliography.
‡ Here we consider only simple, or first-order, Markoff processes.


and with this and the definition of a conditional d.d., viz.,

Wk(y1,t1; . . . ;yk,tk) = W_{k−1}(y1,t1; . . . ;y_{k−1},t_{k−1}) W2(y_{k−1},t_{k−1}|yk,tk)        k ≥ 2        (1.97)

we see that

Wn(y1,t1; . . . ;yn,tn) = W1(y1,t1) W2(y1,t1|y2,t2) W3(y1,t1;y2,t2|y3,t3) · · ·
                       = W1(y1,t1) W2(y1,t1|y2,t2) W2(y2,t2|y3,t3) · · · W2(y_{n−1},t_{n−1}|yn,tn)
                       = W1(y1,t1) Π_{k=2}^n W2(y_{k−1},t_{k−1}|yk,tk)        all n ≥ 2        (1.98)

and the process is completely described (cf. Sec. 1.3-2), it being assumed, of course, that W1 and the various W2(|) exist. The Markoff condition (1.96) also implies that the process y is entirely random. Then, since

lim_{tk − t_{k−1} → ∞} W2(y_{k−1},t_{k−1}|yk,tk) = W1(yk,tk)
lim_{tk − t_{k−1} → ∞} W2(y_{k−1},t_{k−1};yk,tk) = W1(y_{k−1},t_{k−1}) W1(yk,tk)        (1.99)

from Eq. (1.93), it is easily seen that our more general expression Wn [Eq. (1.98)] reduces to that for a purely random process [Eq. (1.95b)] when all observation times tk (k = 1, . . . , n) are infinitely far apart. Similarly, at the other extreme where all tk are identical, we use Eq. (1.92) to write Eq. (1.98) as

lim_{tn → t_{n−1} → · · · → t1} Wn(y1,t1; . . . ;yn,tn) = W1(y1,t1) Π_{k=2}^n δ(yk − y_{k−1})        (1.100)

In the former case, there is no memory between successive values; in the latter, the memory is perfect: yn = y_{n−1} = · · · = y1 with probability 1. When the process contains a deterministic component, these relations must be suitably modified. Equations (1.96) to (1.100) then apply only to the entirely random portion of the process. Finally, we observe that more complicated, or higher-order, Markoff processes can easily be constructed by altering the condition (1.96). For example, we can require dependence on two previous events, e.g.,

Wn(y1,t1; . . . ;y_{n−1},t_{n−1}|yn,tn) = W3(y_{n−2},t_{n−2};y_{n−1},t_{n−1}|yn,tn)        t1 < t2 < · · · < tn        (1.101)

and so on. Note that, for stationary processes (cf. Sec. 1.3-6), the Markoff condition (1.96) reduces to W2(y_{n−1}|yn, tn − t_{n−1}), so that Eq. (1.98) is simply

Wn(y1; y2,t2; . . . ;yn,tn) = W1(y1) Π_{k=2}^n W2(y_{k−1}|yk, Δtk)        Δtk = tk − t_{k−1}        (1.102)


where it is only the various time differences Δtk that are now significant, not the absolute times t1, . . . , tn.
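A familiar concrete instance of a simple Markoff series in the sense of Eq. (1.96) is the Gauss-Markov recursion yk = a y_{k−1} + nk with independent Gaussian nk, for which the transition density W2(y_{k−1}|yk; Δt) is normal and Wn factors as in Eq. (1.98). The sketch below, with an assumed coefficient a and a noise variance chosen to make the series stationary with unit variance, checks the defining property numerically: conditioning additionally on an earlier value does not change the conditional mean of a later one.

import numpy as np

# Gauss-Markov series y_k = a*y_{k-1} + n_k: a simple Markoff process.
# a, the noise variance, and the conditioning bins are assumptions.
rng = np.random.default_rng(9)
a, M, n = 0.8, 200_000, 4
s2 = 1.0 - a**2                                 # keeps the series at unit variance
y = np.zeros((M, n))
y[:, 0] = rng.standard_normal(M)
for k in range(1, n):
    y[:, k] = a * y[:, k - 1] + np.sqrt(s2) * rng.standard_normal(M)

sel2 = np.abs(y[:, 2] - 0.5) < 0.1              # condition on y_2 near 0.5
sel1 = sel2 & (y[:, 1] > 0.5)                   # condition also on the earlier y_1
print(y[sel2, 3].mean(), y[sel1, 3].mean(), a * 0.5)   # both ~ a*y_2: Eq. (1.96)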

1.4-4 The Smoluchowski Equation. In the treatment of Markoff processes, we cannot take W2(|) or W2 as an arbitrary function of its variables, although W2(|) (or W2) completely describes the process [cf. Eq. (1.98)]. Not only must the general conditions (1.91) be satisfied, but so also must the Smoluchowski equation28-30 hold (or, equivalently, the equation of Chapman and Kolmogoroff30), viz.,

W2(y1,t1|y2,t2) = ∫ W2(y1,t1|y,t0) W2(y,t0|y2,t2) dy        t1 < t0 < t2        (1.103a)

which in the stationary case is

W2(y1|y2;t) = ∫ W2(y1|y,t0) W2(y|y2, t − t0) dy        0 < t0 < t        (1.103b)

The Smoluchowski equation is essentially an expression of the transition (it need not be continuous), or "unfolding," of probability from instant to instant for different choices of t = t2 − t1. Starting with y1 at a time t1, one can go to a value y at some arbitrary later time t0. Then, given this particular value of y as a new starting point, one has a certain probability of ending finally in the range (y2, y2 + dy2) at the still later time t2 (> t0 > t1). The unfolding is repeated for all allowed values of y, and the W2(|) appearing in the integrand of Eqs. (1.103) are accordingly interpreted as transition probabilities. This is illustrated schematically in Fig. 1.12.

FIG. 1.12. Diagram showing transitions from y1 at time t1 to y2 at time t2 through possible intermediate values y at time t0.

To see how Smoluchowski's equation is established, let us start with thethird-order density for the process, namely, Wz(yijti;y,to;y2jt2) (ti < to < t2).From the definition of conditional probability, we write at once

WiiyuhWjto'^to) = Wi(yitti\y.to)W^yhU;ytto\y2lti)


Integrating over y and applying the Markoff condition (1.96) gives

W_2(y_1,t_1;\, y_2,t_2) = \int W_2(y_1,t_1;\, y,t_0)\, W_2(y,t_0|y_2,t_2)\, dy \qquad (1.104)

With the help of Eq. (1.89), W_1(y_1,t_1) can be factored from both members of Eq. (1.104), and Eq. (1.103a) is the result.
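A quick numerical check of Eq. (1.103b) can be made for any assumed transition density. The sketch below uses a Gaussian transition density with exponentially decaying correlation r(t) = e^{-βt} (an Ornstein-Uhlenbeck-type choice, not taken from the text) and verifies by quadrature that unfolding through an intermediate time t_0 reproduces the direct transition.

```python
import numpy as np

# Minimal numerical check of the Smoluchowski (Chapman-Kolmogoroff) equation
# (1.103b) for an assumed Gaussian transition density with r(t) = exp(-beta*t).

beta, sigma2 = 0.5, 1.0   # assumed decay rate and stationary variance

def W2_cond(y1, y2, t):
    """Transition density W_2(y_1 | y_2; t): density of y_2 given y_1 at lag t."""
    r = np.exp(-beta * t)
    var = sigma2 * (1.0 - r**2)
    return np.exp(-(y2 - r * y1)**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

y1, y2 = 0.3, -0.7
t, t0 = 2.0, 0.8          # 0 < t0 < t, as required by Eq. (1.103b)

# Right-hand side: unfold through all intermediate values y at time t0.
y = np.linspace(-10.0, 10.0, 4001)
rhs = np.trapz(W2_cond(y1, y, t0) * W2_cond(y, y2, t - t0), y)

# Left-hand side: direct transition over the full interval t.
lhs = W2_cond(y1, y2, t)

print(lhs, rhs)   # the two values agree to within the quadrature error
```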

1.4-5 Higher-order Processes and Projections.^{29} We can continue as in Secs. 1.4-2 and 1.4-3 for the higher-order cases by appropriate extensions of Eq. (1.96). Thus, W_3(y_1,t_1;\, y_2,t_2|y_3,t_3) or W_3 describes completely the next more complex class of process; W_4(|) or W_4, the next; and so on.

For the majority of purely physical problems, however, the simple Markoff process is the one of chief importance. Often, an apparently non-Markovian process can be transformed into a Markoff system by introducing one or more new random variables, such as z = ẏ, u = ÿ, etc., or by letting z represent a coordinate of another system. Then, it may happen that for y and z together we have a simple Markoff process, for which the Smoluchowski equation now is

W_2(y_1,z_1,t_1|y_2,z_2,t_2) = \iint W_2(y_1,z_1,t_1|y,z,t_0)\, W_2(y,z,t_0|y_2,z_2,t_2)\, dy\, dz \qquad t_1 < t_0 < t_2 \qquad (1.105)

If we can satisfy Eq. (1.105) with our new distribution density, we have as marginal d.d.

W_2(y_1,t_1;\, y_2,t_2) = \iint W_2(y_1,z_1,t_1;\, y_2,z_2,t_2)\, dz_1\, dz_2 \qquad (1.106)

and in general W_2(y_1,t_1;\, y_2,t_2) no longer describes a Markoff process. If we can find a conditional density W_2(|) which obeys Eq. (1.105), the joint density W_2(y_1,t_1;\, y_2,t_2) may be regarded as a projection of the more complicated conditional Markovian distribution density W_2(y_1,z_1,t_1|y_2,z_2,t_2). That W_2(y_1,t_1;\, y_2,t_2) or W_2(y_1,t_1|y_2,t_2) by itself does not describe a Markoff process is due to the fact that we have not originally given a complete enough statistical account of the system. Our ability to select appropriate additional random variables z, u, v, . . . in order to extend the given process to one that is Markovian depends on the physical mechanism of the process itself, as we shall see presently in Chap. 11.
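The augmentation idea can be illustrated with a hypothetical discrete-time example: a sequence generated by a second-order regression is not a simple Markoff process in y alone, but the pair (y_k, z_k) with z_k = y_{k-1} is, since one matrix step (plus noise entering the first component) carries the present pair into the next. The coefficients below are assumed values used only for the illustration.

```python
import numpy as np

# Hypothetical illustration of the augmentation idea: samples y_k of a
# second-order autoregression depend on the two preceding values, so y alone
# is not a simple Markoff process; the pair (y_k, z_k) with z_k = y_{k-1} is.

rng = np.random.default_rng(0)
a1, a2 = 1.5, -0.7          # assumed regression coefficients
n = 10_000

y = np.zeros(n)
for k in range(2, n):
    y[k] = a1 * y[k - 1] + a2 * y[k - 2] + rng.normal()

# Markoff state: the vector (y_k, z_k) = (y_k, y_{k-1}); one matrix step
# advances it, with the random forcing entering only the first component.
A = np.array([[a1, a2],
              [1.0, 0.0]])
state = np.array([y[-1], y[-2]])
next_mean = A @ state        # conditional mean of the next state given the present one
print(next_mean)
```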

PROBLEMS

1.16 (a) Show that for a simple Markoff process one can also write

W_n(y_1,t_1;\, \ldots;\, y_n,t_n) = \frac{\prod_{k=2}^{n} W_2(y_{k-1},t_{k-1};\, y_k,t_k)}{\prod_{k=2}^{n-1} W_1(y_k,t_k)} \qquad n \ge 3 \qquad (1)



(b) Verify that, if y(t) is a process which is completely described by a third-order density, W_n is given by

W_n(y_1,t_1;\, \ldots;\, y_n,t_n) = \frac{\prod_{k=2}^{n-1} W_3(y_{k-1},t_{k-1};\, y_k,t_k;\, y_{k+1},t_{k+1})}{\prod_{k=2}^{n-2} W_2(y_k,t_k;\, y_{k+1},t_{k+1})} \qquad n \ge 4 \qquad (2)

(c) Show that the generalization of the Smoluchowski equation for (b) is

W_3(y_1,t_1;\, y_2,t_2|y_3,t_3) = \int W_3(y_1,t_1;\, y_2,t_2|y,t_0)\, W_3(y_2,t_2;\, y,t_0|y_3,t_3)\, dy \qquad (3)

for all t_0 such that t_2 < t_0 < t_3 (t_1 < t_2 < t_3).
1.17 Suppose that W_2(y|x) = W_2(x|y), where x, y are continuous random variables

in (-∞ < x, y < ∞).
(a) Show that, if W_2(y|x) vanishes nowhere in the finite part of the x, y plane, both x and y must be uniformly distributed.
(b) For the stationary random process y(t), where x = y(t_1) = y_1, y = y(t_2) = y_2, we have

W_2(y_1|y_2;\, t_2 - t_1) = W_2(y_2|y_1;\, t_1 - t_2) \qquad (1)

We assume in addition that W_1(y) = W_1(-y) and is a monotonically decreasing, nonvanishing continuous function in the interval (0 ≤ y < ∞). Show that

W_2(y_1|y_2;\, t_2 - t_1) = A\, \delta(y_1 - y_2) + (1 - A)\, \delta(y_1 + y_2) \qquad 0 ≤ A ≤ 1 \qquad (2)

and A may depend on t_2 - t_1, that is, A = A(t_2 - t_1).

1.5 Time and Ensemble Averages

We have seen in Sec. 1.3 how from the ensemble y(t) one can obtain statistical information about the process in the form of probability densities (or distributions). It is also natural to ask what can be obtained from a single representation y^{(j)}(t) when this member function is considered over some period of time (t, t + T), since in most physical situations we shall not have the entire ensemble actually at our disposal at any one instant, but only one, or at most several, more or less representative members. In this case, certain useful quantities depending on the given y^{(j)}(t) can be obtained from a suitable measurement over an observation period and are generally known as time averages. Analogous quantities can be defined over the ensemble at specified times t_1, t_2, . . . (cf. Figs. 1.5, 1.9) and are called statistical, or ensemble, averages. After defining these average quantities here, we shall examine in the next section (Sec. 1.6) how and under what conditions the two types of averages can be related.

1.5-1 Time Averages. Let us begin rather generally by considering some jth member, G^{(j)}(y_1^{(j)}, . . . , y_n^{(j)}), of the ensemble G, which is itself a function of the ensemble y(t). Here, y_k = y(t_0 + t_k) (k = 1, . . . , n), and the t_k are parameters representing n times at which observations on the


y ensemble yield the particular members y_k^{(j)} = y^{(j)}(t_0 + t_k). Then we define the time average of G^{(j)} as

\langle G^{(j)}[y^{(j)}(t_1), \ldots, y^{(j)}(t_n)] \rangle \equiv \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} dt_0\, G^{(j)}[y^{(j)}(t_0 + t_1), \ldots, y^{(j)}(t_0 + t_n)] \qquad (1.107)

This definition applies for members of entirely random, deterministic, or mixed processes (cf. Sec. 1.3-4), provided that G^{(j)} is suitably bounded so that the limit exists.

It is, of course, possible to introduce other definitions of the time average. Here Eq. (1.107) is called the principal limit,† where the interval (-T/2, T/2) is extended (in the limit T → ∞) to include the entire member function G^{(j)} for all -∞ < t < ∞, but one can also consider the one-sided limits

\langle G^{(j)} \rangle_{+\infty} \equiv \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} dt_0\, G^{(j)}[y^{(j)}(t_0 + t_1), \ldots, y^{(j)}(t_0 + t_n)] \qquad (1.108a)

\langle G^{(j)} \rangle_{-\infty} \equiv \lim_{T \to \infty} \frac{1}{T} \int_{-T}^{0} dt_0\, G^{(j)}[y^{(j)}(t_0 + t_1), \ldots, y^{(j)}(t_0 + t_n)] \qquad (1.108b)

which, as noted in the next section, are under certain conditions equal to the corresponding principal limit for nearly all member functions of the ensemble. Still other possibilities, like

\langle G^{(j)} \rangle_{t_1} \equiv \lim_{T \to \infty} \frac{1}{T} \int_{t_1}^{t_1 + T} dt_0\, G^{(j)}[y^{(j)}(t_0 + t_1), \ldots, y^{(j)}(t_0 + t_n)] \qquad (1.108c)

can be considered, but for nearly all cases in the present work we shall find it convenient to employ the principal limit and in certain cases the one-sided form \langle G^{(j)} \rangle_{+\infty} when the two are equivalent [cf. Sec. 1.6, Eqs. (1.126), (1.127)].

At this point, observe that the existence of the principal limit [Eq. (1.107)] implies that \langle G^{(j)} \rangle depends on the time differences t_2 - t_1, . . . , t_n - t_1 and is independent of the absolute time scale of the process.‡ For if we let t_0' = t_0 + t_1 in Eq. (1.107), we can write

\langle G^{(j)} \rangle = \lim_{T \to \infty} \frac{1}{T} \int_{t_1 - T/2}^{t_1 + T/2} dt_0'\, G^{(j)}[y^{(j)}(t_0'), \ldots, y^{(j)}(t_n - t_1 + t_0')] \qquad (1.109)

† This is also often written \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} G^{(j)}\, dt_0. It is assumed that the y^{(j)}, and hence the G^{(j)}, persist for all time (-∞ < t < ∞) and that these various limits exist.
‡ This is not the same thing as saying that G^{(j)} belongs to a stationary process, since it does not necessarily follow that the statistical properties W_n (n > 1) for G remain invariant under a (linear) time shift, even if we add the condition that \langle G^{(j)} \rangle exists for almost all j [cf. Eq. (1.110)].


The integral can now be split into three parts,

\frac{1}{T} \int_{t_1 - T/2}^{t_1 + T/2} (\;)\, dt_0' = \frac{1}{T} \int_{-T/2}^{T/2} (\;)\, dt_0' - \frac{1}{T} \int_{-T/2}^{t_1 - T/2} (\;)\, dt_0' + \frac{1}{T} \int_{T/2}^{T/2 + t_1} (\;)\, dt_0'

only the first of which, in the limit T → ∞, is in general different from zero, so that Eq. (1.109) becomes, on dropping primes,

\langle G^{(j)} \rangle = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} dt_0\, G^{(j)}[y^{(j)}(t_0), \ldots, y^{(j)}(t_n - t_1 + t_0)] \qquad (1.110)

which establishes the statement that \langle G^{(j)} \rangle depends only on the time differences t_2 - t_1, . . . , t_n - t_1. From this we might be tempted to say that the limit exists only if y^{(j)} (and hence G^{(j)}) belongs to a stationary process, but the limit can exist also when y(t) is not stationary. Consider the ensemble of Sec. 1.3-7(2), and let G = y^2, for example. Then we have explicitly in Eq. (1.110)

\langle G^{(j)} \rangle = \lim_{T \to \infty} \frac{A_0^2}{T} \int_{-T/2}^{T/2} [1 + \lambda \cos(\omega_m t_0 + \phi^{(j)})]^2 \cos^2(\omega_0 t_0 + \phi^{(j)} + \phi_0)\, dt_0
    = \frac{A_0^2}{2} \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} [1 + \lambda \cos(\omega_m t_0 + \phi^{(j)})]^2 [1 + \cos(2\omega_0 t_0 + 2\phi^{(j)} + 2\phi_0)]\, dt_0
    = \frac{A_0^2}{2} \left( 1 + \frac{\lambda^2}{2} \right) \qquad \omega_m, \omega_0 \text{ incommensurable} \qquad (1.110a)
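The result (1.110a) is easily checked numerically for one member of the ensemble; the following sketch integrates y^{(j)}(t_0)^2 over a long but finite interval and compares it with (A_0^2/2)(1 + λ^2/2). All parameter values are assumptions chosen only for the illustration.

```python
import numpy as np

# Minimal numerical check of Eq. (1.110a): the time-averaged square of one
# member of the amplitude-modulated ensemble approaches (A0^2/2)(1 + lam^2/2)
# when the modulation and carrier frequencies are incommensurable.

A0, lam = 2.0, 0.5                    # assumed amplitude and modulation index
w_m, w_0 = 1.0, np.sqrt(2.0)          # incommensurable frequencies (assumed)
phi_j, phi_0 = 0.9, 0.3               # arbitrary fixed phases for this member

T = 20_000.0
t0 = np.linspace(-T / 2, T / 2, 1_000_001)
y = A0 * (1.0 + lam * np.cos(w_m * t0 + phi_j)) * np.cos(w_0 * t0 + phi_j + phi_0)

time_avg = np.trapz(y**2, t0) / T
print(time_avg, (A0**2 / 2.0) * (1.0 + lam**2 / 2.0))   # the two values agree closely
```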

Of course, finite time averages like

\frac{1}{T} \int_{0}^{T} dt_0\, G^{(j)}[y^{(j)}(t_0 + t_1), \ldots, y^{(j)}(t_0 + t_n)] = \frac{1}{T} \int_{t_1}^{T + t_1} dt_0\, G^{(j)}[y^{(j)}(t_0), \ldots, y^{(j)}(t_n - t_1 + t_0)]

do depend on the absolute origin of time, through t_1. We observe also that similar remarks hold for more general situations, e.g., where G^{(j)} depends on several random processes x(t), y(t), z(t), . . . , so that G^{(j)} = G^{(j)}[x^{(j)}(t_1 + t_0), . . . ; y^{(j)}(t_1 + t_0), . . . ; z^{(j)}(t_1 + t_0), . . . ; . . .].

Time averages of particular interest in practical applications are:
1. The mean value

\langle y^{(j)}(t) \rangle = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} y^{(j)}(t_0 + t)\, dt_0 = \langle y^{(j)} \rangle \qquad (1.111)

2. The mean square or intensity

\langle y^{(j)}(t)^2 \rangle = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} y^{(j)}(t_0 + t)^2\, dt_0 = \langle (y^{(j)})^2 \rangle \qquad (1.112)


3. The autocorrelation function

\langle y^{(j)}(t_1)\, y^{(j)}(t_2) \rangle = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} y^{(j)}(t_0 + t_1)\, y^{(j)}(t_0 + t_2)\, dt_0 = R_y^{(j)}(t_2 - t_1) \qquad (1.113)

(cf. Prob. 1.18c).
4. The cross-correlation function

\langle y^{(j)}(t_1)\, z^{(j)}(t_2) \rangle = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} y^{(j)}(t_0 + t_1)\, z^{(j)}(t_0 + t_2)\, dt_0 = R_{yz}^{(j)}(t_2 - t_1) \qquad (1.114)

The first item (1) gives the steady, or "d-c," component in the wave y^{(j)}(t), if any, while item 2 is a measure of the intensity of the disturbance and is proportional to or equal to the mean power in this wave (depending on the units in which y^{(j)} is given). The auto- and cross-correlation functions (1.113), (1.114) are averages of special importance in practical operation, as we shall see later in Part 3. A discussion of their salient features is reserved for Chap. 3. Note from the argument of Eqs. (1.108) to (1.110) that the first-order time moments or time averages \langle y^{(j)} \rangle, \langle y^{(j)2} \rangle, . . . , \langle y^{(j)n} \rangle, etc., are independent of t, while the second-order averages depend only on the differences t_2 - t_1 [or t_1 - t_2] in some fashion, regardless of whether y^{(j)} is a member of a stationary or nonstationary ensemble [cf. Eqs. (1.113), (1.114) in the simplest cases]. For periodic waves, we observe that if we divide the interval (t_1 - T/2, t_1 + T/2) into N periods, plus a possible remainder which never exceeds the contribution of a single period interval T_0, we can write the familiar result

\langle y^{(j)}(t)_{per} \rangle = \lim_{N \to \infty} \left[ \frac{1}{N T_0} \int_{-N T_0/2}^{N T_0/2} y^{(j)}(t_0 + t_1)\, dt_0 + O\!\left(\frac{1}{N}\right) \right] = \frac{1}{T_0} \int_{(-T_0/2,\,0)}^{(T_0/2,\,T_0)} y^{(j)}(t_0)_{per}\, dt_0 \qquad (1.115)

since the integral over N periods is equal to N times the integral over a single period. Generalizations for periodic y^{(j)}(t)^n, G^{(j)}, etc., are directly made.
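In practice the limits (1.111) to (1.113) are approximated by finite-T averages over sampled data. The sketch below computes the finite-T mean, intensity, and autocorrelation of one hypothetical sampled member function; the particular waveform used is an assumption made only to exercise the estimates.

```python
import numpy as np

# Finite-T estimates of the time averages (1.111)-(1.113) for a single sampled
# member function y_j(t).  The member used here (steady component + cosine +
# noise) is an assumed example, not a case taken from the text.

dt = 0.01
t0 = np.arange(0.0, 2_000.0, dt)                 # observation interval of length T
rng = np.random.default_rng(1)
y_j = 0.5 + np.cos(2.0 * np.pi * t0 + rng.uniform(0, 2 * np.pi)) \
      + 0.3 * rng.normal(size=t0.size)           # one member function, sampled

mean_val = y_j.mean()                            # d-c component, Eq. (1.111)
intensity = np.mean(y_j**2)                      # mean square, Eq. (1.112)

def autocorr(y, max_lag):
    """Finite-T estimate of R_y(t) of Eq. (1.113) for lags 0, dt, 2*dt, ..."""
    return np.array([np.mean(y[:y.size - k] * y[k:]) for k in range(max_lag)])

R = autocorr(y_j, max_lag=500)
print(mean_val, intensity, R[:3])
```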

1.5-2 Ensemble Averages. The other type of average (as already noted in Secs. 1.2, 1.3) is defined over the ensemble, at one or more times t_1, t_2, . . . , as distinct from the time averages above, which are performed on a single member of the ensemble over an infinite period. Let us consider again the general ensemble G = G(y_1,t_1; . . . ; y_n,t_n) and for the moment restrict it to a finite number of members M. Thus, we define the ensemble, or statistical,


average of G as

\bar{G}(y_1,t_1;\, \ldots;\, y_n,t_n) = \frac{1}{M} \sum_{j=1}^{M} G^{(j)}[y^{(j)}(t_1), \ldots, y^{(j)}(t_n)] = \bar{G} \qquad (1.116)

In terms of the measure, or weighting, P(G) of G, this is equivalent to†

\bar{G} = \int_G G\, dP(G) = \int_G G\, W_1(G)\, dG \qquad (1.117)

where, as previously, we may use the δ formalism (cf. Sec. 1.2-4) if the distribution of G contains discrete mass points. When, as is usually the case, the number M of representations becomes infinite, we extend the definition to

\bar{G}(y_1,t_1;\, \ldots;\, y_n,t_n) = \lim_{M \to \infty} \frac{1}{M} \sum_{j=1}^{M} G^{(j)}[y^{(j)}(t_1), \ldots, y^{(j)}(t_n)] \qquad (1.118a)

where it is assumed in addition that this limit exists and converges to the expectation of G, so that Eq. (1.117) applies.‡ In terms of the random variables y_1, . . . , y_n, we may write also

\bar{G} = E\{G\} = \int \cdots \int G(y_1,t_1;\, \ldots;\, y_n,t_n)\, W_n(y_1,t_1;\, \ldots;\, y_n,t_n)\, dy_1 \cdots dy_n \qquad (1.118b)

and the observation times t_1, . . . , t_n appear now explicitly as parameters. Equations (1.116), (1.118a) suggest the "counting" operations that one goes through in determining \bar{G} from a given ensemble, just as Eqs. (1.68) to (1.70) indicate how the hierarchy of probability densities may be computed for the process itself.
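The "counting" operations of Eqs. (1.116) and (1.118a) can be mimicked directly: draw M member functions, evaluate each at the fixed observation times, and average over the index j. The sketch below does this for the random-phase cosine ensemble (an assumed example) and compares the counted averages with their theoretical values.

```python
import numpy as np

# Ensemble ("counting") averages of Eqs. (1.116), (1.118a) for an assumed
# random-phase cosine ensemble y(t, phi) = A0*cos(w0*t + phi).

rng = np.random.default_rng(2)
M = 100_000
A0, w0 = 1.0, 2.0 * np.pi
phi = rng.uniform(0.0, 2.0 * np.pi, size=M)      # one phase per member

t1, t2 = 0.37, 0.62                              # fixed observation times
y1 = A0 * np.cos(w0 * t1 + phi)                  # y^(j)(t1) for every member j
y2 = A0 * np.cos(w0 * t2 + phi)

mean_t1 = y1.mean()                              # first moment, Eq. (1.119) with n = 1
msq_t1 = np.mean(y1**2)                          # second moment, n = 2
corr_12 = np.mean(y1 * y2)                       # second-order average, Eq. (1.120a)

# For this ensemble the theoretical values are 0, A0^2/2, and (A0^2/2) cos w0(t2-t1).
print(mean_t1, msq_t1, corr_12, 0.5 * A0**2 * np.cos(w0 * (t2 - t1)))
```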

Setting G(y_1, . . . , y_n) equal to y_1^n = y(t_1)^n (n = 1, . . .), we obtain the first-order moments

\overline{y_1^n} = \lim_{M \to \infty} \frac{1}{M} \sum_{j=1}^{M} [y^{(j)}(t_1)]^n = \int y_1^n\, W_1(y_1,t_1)\, dy_1 \qquad (1.119)

and, like Eqs. (1.111), (1.112), \bar{y}_1, \overline{y_1^2} are analogous measures of the mean and mean intensity, but now with respect to the ensemble at time t_1, as distinguished from the steady component and mean intensity of a particular representation throughout T → ∞. More complicated statistical averages are also possible:

\overline{y(t_1)y(t_2)} = \iint y_1 y_2\, W_2(y_1,t_1;\, y_2,t_2)\, dy_1\, dy_2 = M_y(t_1,t_2) \qquad (1.120a)

\overline{y(t_1)^k y(t_2)^l y(t_3)^m} = \iiint y_1^k y_2^l y_3^m\, W_3(y_1,t_1;\, y_2,t_2;\, y_3,t_3)\, dy_1\, dy_2\, dy_3 \qquad (1.120b)

† Cramér, op. cit., secs. 7.4, 7.5 and chaps. 7-9. See also Sec. 1.2-4 above and comments. Specifically, for moments or statistical averages, see Sec. 1.2-5 above and Cramér, sec. 15.3.
‡ W_1(G) will, of course, be a different density function here, ordinarily.


TABLE 1.1. STATISTICAL AND CORRESPONDING TIME AVERAGES OF PARTICULAR INTEREST
[The entries of Table 1.1 are not legible in this copy.]


and still more generally, where y, z, x, etc., are different random variables,

\overline{G(y_1, z_2, \ldots, x_n)} = \int \cdots \int dy_1\, dz_2 \cdots dx_n\, G(y_1,t_1;\, z_2,t_2;\, \ldots;\, x_n,t_n)\, W_n(y_1,t_1;\, z_2,t_2;\, \ldots;\, x_n,t_n) \qquad (1.120c)

Statistical and corresponding time averages of particular interest are summarized in Table 1.1.

For stationary processes (cf. Sec. 1.3-6), we can set t_1 = 0, remembering that t_2, t_3, etc., are measured now from t_1 = 0, and write Eq. (1.118b) accordingly. Specifically, we have

\overline{y_1^n} = \overline{y(t_1)^n} = \int y^n\, W_1(y)\, dy = \overline{y^n} \qquad (1.121a)

M_y(t_1,t_2) = \overline{y_1 y_2} = \overline{y(t_1)y(t_2)} = \iint y_1 y_2\, W_2(y_1, y_2;\, t_2 - t_1)\, dy_1\, dy_2 = \iint y_1 y_2\, W_2(y_1,y_2;\, t)\, dy_1\, dy_2 = M_y(t), \text{ etc.} \qquad (1.121b)

where we adopt the convention that t_2 - t_1 = t, recalling that only the time differences t_2 - t_1 = t, t_3 - t_1, etc., between observations are important now. In particular, observe that, in the limit t_2 - t_1 → ∞, we obtain from Eq. (1.99)

\lim_{t_2 - t_1 \to \infty} \overline{y_1 y_2} = \iint y_1 y_2 \lim_{t_2 - t_1 \to \infty} W_2(y_1, y_2;\, t_2 - t_1)\, dy_1\, dy_2 = \iint y_1 y_2\, W_1(y_1)\, W_1(y_2)\, dy_1\, dy_2 = \bar{y}^2 \qquad (1.122)

for stationary processes, while in nonstationary cases this becomes \lim_{t_2 - t_1 \to \infty} \overline{y_1 y_2} = \bar{y}_1 \bar{y}_2. For purely random processes (Sec. 1.4-2), this is true for all finite |t_2 - t_1| > 0 and for all moments (that exist). Note, finally, that, if we go to the other extreme (t_2 - t_1 → 0) of observations at identical times, we may use Eq. (1.100) to write

\lim_{t_2 - t_1 \to 0} \overline{y_1 y_2} = \iint y_1 y_2\, W_1(y_1)\, \delta(y_2 - y_1)\, dy_1\, dy_2 = \int y_1^2\, W_1(y_1)\, dy_1 = \overline{y_1^2} = \overline{y^2} \qquad (1.123)

with \overline{y_1^2} = \overline{y(t_1)^2} for nonstationary situations. From this and Eq. (1.122), it is clear that we can readily obtain the lower-order moments \overline{y^2}, \bar{y}^2 directly from \overline{y_1 y_2} by the suitable limit on t_2 - t_1. This is a consequence, of course, of the circumstance that W_1(y_1), W_1(y_2) are marginal distribution densities of W_2 and reflects the fact that first-order information is completely contained in the second- (and higher-) order distributions.
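The two limits (1.122) and (1.123) are readily observed numerically. The sketch below draws pairs y_1 = y(t_1), y_2 = y(t_2) from an assumed stationary Gaussian ensemble with mean m and correlation e^{-t} and shows that the ensemble average of y_1 y_2 passes from the mean square m^2 + σ^2 at zero separation to the squared mean m^2 at wide separations.

```python
import numpy as np

# Numerical illustration of the limits (1.122) and (1.123) for an assumed
# stationary Gaussian ensemble with mean m, variance sigma2, and correlation
# coefficient r(t) = exp(-t) between samples separated by t.

rng = np.random.default_rng(3)
m, sigma2 = 1.0, 0.5
M = 200_000

def sample_pair(t):
    """Draw (y1, y2) from M members of the ensemble at separation t."""
    r = np.exp(-t)
    y1 = m + np.sqrt(sigma2) * rng.normal(size=M)
    y2 = m + r * (y1 - m) + np.sqrt(sigma2 * (1.0 - r**2)) * rng.normal(size=M)
    return y1, y2

for t in (0.0, 0.5, 10.0):
    y1, y2 = sample_pair(t)
    print(t, np.mean(y1 * y2))
# expected: near m^2 + sigma2 at t = 0 [Eq. (1.123)] and near m^2 at large t [Eq. (1.122)]
```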

PROBLEMS

1.18 (a) Does Eq. (1.122) hold for deterministic processes? Discuss, considering Secs. 1.3-4, 1.3-5. Compare stationary and nonstationary cases.
(b) Show, for the ensemble y(t,φ) = A_0 cos(ω_0 t + φ) [A_0, ω_0 fixed, φ uniformly distri-


buted in the primary interval (0, 2π)], that by an actual calculation of \overline{y_1 y_2} and \langle y_1^{(j)} y_2^{(j)} \rangle

R_y^{(j)}(t_2 - t_1) = \langle y^{(j)}(t_1)\, y^{(j)}(t_2) \rangle = \overline{y_1 y_2} = \tfrac{1}{2} A_0^2 \cos \omega_0 (t_2 - t_1) \qquad \text{probability 1} \qquad (1)

(c) Show that R_y^{(j)}(t_2 - t_1) = R_y^{(j)}(t_1 - t_2) and that R_y(t_2 - t_1) = R_y(t_1 - t_2).
(d) Show also that 0 ≤ |R_y^{(j)}(t_2 - t_1)| ≤ R_y^{(j)}(0) and, for the ensemble, that 0 ≤ |K_y(t_2 - t_1)| ≤ K_y(0), 0 ≤ |M_y(t_2 - t_1)| ≤ M_y(0).
1.19 (a) For the ensemble of additive and independent signal and noise processes, y(t) = A_0 cos(ω_0 t + φ) + N(t), where S(t,φ) = A_0 cos(ω_0 t + φ) has the properties of Prob. 1.18b, show that the autocorrelation function of y is

R_y^{(j)}(t_2 - t_1) = R_S^{(j)}(t_2 - t_1) + R_N^{(j)}(t_2 - t_1) \qquad (1)

and if N(t) is normal, with W_2(N_1, N_2;\, t_2 - t_1) given by Prob. 1.8a, show that

K_y(t_2 - t_1) = \frac{A_0^2}{2} \cos \omega_0(t_2 - t_1) + \overline{N^2}\, \rho_N(t_2 - t_1) = K_S(t_2 - t_1) + K_N(t_2 - t_1) \qquad (2)

(b) If the noise contains a steady component \bar{N} ≠ 0, modify Eq. (2) explicitly.
(c) If the noise contains a deterministic component ta (a fixed), what is R_y^{(j)}(t)? What now is the form of the covariance function K_y?
(d) In (a), let us suppose that signal and noise are no longer independent. Show that the autocorrelation function of y is now

R_y^{(j)}(t) = R_S^{(j)}(t) + R_N^{(j)}(t) + R_{SN}^{(j)}(t) + R_{NS}^{(j)}(t) = R_S^{(j)}(t) + R_N^{(j)}(t) + R_{SN}^{(j)}(t) + R_{SN}^{(j)}(-t) \qquad (3)

1.20 (a) Prove Wiener's lemma:^{26,36} If g(t) > 0 for all (-∞ < t < ∞), and if \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} g(t_0)\, dt_0 exists, then \lim_{T \to \infty} \frac{1}{T} \int_{-T/2 + t_1}^{T/2 + t_1} g(t_0)\, dt_0 exists for all (real) t_1 and these two limits are equal.
(b) Show that if g(t) is not bounded at least from one side, i.e., if g(t) > 0 or g(t) < 0 does not hold for all (-∞ < t < ∞), Wiener's lemma fails.
(c) Extend (a) to cases where g(t) > C or g(t) < C for all (-∞ < t < ∞), for some real constant C.

1.6 Ergodicity and Applications

Although an effective theory of noise and signals in communication systems is necessarily based on probability methods (cf. Sec. 1.1), it is not usually possible to deal experimentally with the entire ensemble of possible signal and noise waves. One considers, instead, a single member (sometimes several) of the ensemble y(t) and observes this member's behavior in the course of time, obtaining, for example, the time averages \langle y^{(j)} \rangle, \langle (y^{(j)})^2 \rangle, \langle y^{(j)}(t_1)\, y^{(j)}(t_1 + t) \rangle, etc., discussed in Sec. 1.5.

Now, for a theory to be useful in applications, it must permit us to relate the a priori quantities, predicted by the theory from the assumed statistical properties of the ensemble, to the corresponding quantities actually observed experimentally (i.e., a posteriori) from the given member of the ensemble in the course of time. In other words, we wish to know (1) in what sense and under what conditions time and ensemble averages can be related and (2)


whether our particular representation y^{(j)}(t) belongs to an ensemble where this is possible. Sets for which a definite relationship between time and ensemble averages exists, so that (1) can be applied, are then said to be ergodic, and the conditions for an interchangeability of time and ensemble averages are established by a form of the ergodic theorem. The condition (2) may be called an ergodic hypothesis. This is the assumption, usually made, that we are actually dealing with a member of an ergodic ensemble and hence can apply the ergodic theorem, although in physical situations this is never strictly true, since it is not possible to maintain the required conditions, as we shall see in more detail presently. The notion of ergodicity appeared first in kinetic theory and classical statistical mechanics, mainly through the work^{31} of Maxwell, Boltzmann, Clausius, and Gibbs, with more refined concepts culminating in various forms of ergodic theory in the researches of G. D. Birkhoff,^{32} Von Neumann,^{33} Hopf,^{34} and others during the 1930s and subsequently and with important application to the theory of turbulence, also at about this time.^{35} Applications of this concept to communication theory are more recent, dating in large part from the work of Wiener^{36} in the early 1940s and appearing since in studies of signals and noise in electronic systems.^{37}

1.6-1 The Ergodic Theorem. Let us begin by stating first what is meant by an ergodic ensemble:

An ensemble is ergodic if (a) it is stationary (in the strict sense; cf. Sec. 1.3-6) and (b) it contains no strictly stationary subensemble(s) with measure other than 1 or 0. \qquad (1.124)

To illustrate, let us consider again the ensemble y(t,φ) = A_0 cos(ω_0 t + φ) of Sec. 1.3-7(1). If φ is uniformly distributed in (0, 2π) and A_0, ω_0 are independent of time, then y(t,φ) is strictly stationary, as we have already shown. Moreover, if A_0, ω_0 are fixed, i.e., have measure 1 at a single value (A_0, ω_0), then from the above we observe also that y(t,φ) is an ergodic ensemble, since it contains no stationary subsets of measure different from 1 or 0. This is easily seen from the following example: Consider the subset for [0 < φ < a (< 2π)]. Then, with an arbitrary (linear) translation, one gets φ', with the φ' subset as indicated in Fig. 1.13. The measure of the two subsets (in their respective primary intervals) is now quite different. The measure is no longer invariant, and the subset (0 < φ < a) is accordingly not stationary. On the other hand, if now we allow A_0 to have a distribution density of the form W(A_0) = A_0 e^{-A_0^2/2σ^2}/σ^2 (A_0 ≥ 0), W(A_0) = 0 (A_0 < 0), with φ uniformly distributed as before in (0, 2π), the ensemble is still strictly stationary but it is no longer ergodic. The subset of functions for which A_0 lies between 1 and 2, for example, is also stationary and possesses a measure \int_1^2 W(A_0)\, dA_0 which is not 1 or 0.
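The contrast between the two ensembles just described can be seen in a few lines of simulation: with A_0 fixed, the time average of y^2 along any single member reproduces the ensemble value A_0^2/2, while with A_0 Rayleigh distributed each member retains its own amplitude forever and the time average varies from member to member. The sampling interval and parameter values below are assumptions for the illustration.

```python
import numpy as np

# Ergodic vs non-ergodic ensembles of y(t, phi) = A0*cos(w0*t + phi).
# Fixed A0: every member's time average of y^2 is A0^2/2 = the ensemble value.
# Rayleigh-distributed A0: each member keeps its own amplitude, so the time
# average of y^2 depends on which member was drawn (no single-member ergodicity).

rng = np.random.default_rng(4)
w0 = 2.0 * np.pi
t = np.arange(0.0, 500.0, 0.01)

def time_avg_square(A0, phi):
    y = A0 * np.cos(w0 * t + phi)
    return np.mean(y**2)

# Ergodic case: every member gives (nearly) the same value, A0^2/2 = 0.5.
print([round(time_avg_square(1.0, rng.uniform(0, 2 * np.pi)), 3) for _ in range(3)])

# Non-ergodic case: Rayleigh amplitudes (scale sigma) give member-dependent values,
# although the ensemble mean square of y remains fixed at sigma^2.
sigma = 1.0
A0s = sigma * np.sqrt(-2.0 * np.log(rng.uniform(size=3)))   # Rayleigh draws
print([round(time_avg_square(a, rng.uniform(0, 2 * np.pi)), 3) for a in A0s])
```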

We can now state the ergodic theorem in a form which allows us to

Page 55: 1.1 Introductory Remarks$ Among these, see in particular Cramer.7 The modern mathematical foundations and development of probability theory stem mainly from the work of Kolmogoroff,8

SEC. 1.61 STATISTICAL PRELIMINARIES 57

interchange time and ensemble averages in almost all cases, when external conditions permit. Given that the ensemble in question is ergodic [Eq. (1.124)], the following has been proved:^{32-34,38}

Ergodic Theorem. For an ergodic ensemble, the average of a function of the random variables over the ensemble is equal with probability unity to the average (of the corresponding function) over all possible time translations† of a particular member function of the ensemble, except for a subset of representations of measure zero.‡

In brief, except for a set of measure zero, time and statistical averages are equal (with probability 1) when the ensemble is ergodic. Thus, if

FIG. 1.13. A nonstationary subset (0 < φ < a) of the ensemble y(t,φ) = A_0 cos(ω_0 t + φ).

G^{(j)}(y_1, . . . , y_n) (cf. Sec. 1.5-1) is some (jth) member function of an ergodic ensemble, then the ergodic theorem above says that

\langle G^{(j)}[y^{(j)}(t_1), \ldots, y^{(j)}(t_n)] \rangle = \bar{G}(y_1,t_1;\, y_2,t_2;\, \ldots;\, y_n,t_n) \qquad \text{probability 1} \qquad (1.125a)

or

\langle G \rangle = E\{G\} = \int_G G\, W_1(G)\, dG \qquad \text{probability 1} \qquad (1.125b)

if we wish to express the measure in the G rather than in the (y_1, . . . , y_n) probability space.

The essential force of the theorem, in cruder language, is that, if the ensemble is ergodic, essentially any one function of the ensemble is typical of the ensemble as a whole and, consequently, operations performed in time on the typical member yield the same results as corresponding operations taken over the entire ensemble at any particular instant. It is easy to see that the condition of (strict) stationarity [Eq. (1.124a)] is necessary, for if it did not hold, ensemble averages at various times t_1, t_2, . . . would have different values in general and, moreover, would not be equal to the corre-

† The time translations which take the ensemble into itself, i.e., leave its measure invariant in the sense of strict stationarity (cf. Sec. 1.3-6), are said in this case to be metrically transitive; i.e., they have here the further property [Eq. (1.124b)] that only subsets in the ensemble which are stationary have measure 1 or 0.
‡ This theorem is also known as the strong law of large numbers for (strictly) stationary processes (cf. Doob^{38}).


sponding time average over a particular member function (cf. Figs. 1.5, 1.6 for noise or Fig. 1.10b for a deterministic process). Strict stationarity is not enough, however, if we are to equate time and ensemble averages. The condition [Eq. (1.124b)] of metrical transitivity is needed to ensure that the jth member is truly representative. For example, in the case above where the amplitudes A_0 have the measure W(A_0) = A_0 e^{-A_0^2/2σ^2}/σ^2 (A_0 ≥ 0), members in the subset (0 < A_0 < 2σ) are much more likely to be chosen than are members in the subset (2σ < A_0 < ∞). Time averages on members from neither set can then be expected to give the same statistical average, when the whole ensemble is considered, since the members of neither set are representative of the entire ensemble. For some examples where metrical transitivity does hold, see the "geometrical" cases discussed by Kampé de Fériet^{35} (chap. V, sec. 8).

The time average referred to in the statement (1.125) of the ergodic theorem is the principal limit [Eq. (1.107)]. However, if G^{(j)} is chosen from an ergodic ensemble, then it is also true^{39} that the one-sided averages of Eq. (1.108) exist and are equal, with probability 1, that is, for almost every member G^{(j)}, so that

\langle G^{(j)} \rangle_{+\infty} = \langle G^{(j)} \rangle_{-\infty} \qquad \text{probability 1} \qquad (1.126)

and from this it follows at once that

\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} G^{(j)}(t_0)\, dt_0 = \frac{1}{2} \left( \langle G^{(j)} \rangle_{+\infty} + \langle G^{(j)} \rangle_{-\infty} \right) = \langle G^{(j)} \rangle_{\pm\infty} = \langle G^{(j)} \rangle \qquad \text{probability 1} \qquad (1.127)

since the left-hand member of Eq. (1.127) is equivalent to \langle G^{(j)} \rangle [Eq. (1.107)]. Because of ergodicity, \langle G^{(j)} \rangle_{\pm\infty} = \langle G^{(j)} \rangle = \bar{G} (probability 1), and the limits are identical for (almost) all members of the ensemble. However, even without the additional condition of metrical transitivity required for ergodicity [i.e., Eq. (1.124b)], it is still true that Eqs. (1.126), (1.127) hold, provided only that the process G is strictly stationary and measurable with E\{|G|\} < ∞.^{39} The actual values of these limits vary, of course, with j (i.e., over the parameter of distribution), assuming the common limiting or ensemble value \bar{G} with probability 1 when the process G is also metrically transitive.

As Wiener has emphasized, the equivalence of \langle G^{(j)} \rangle_{+\infty}, \langle G^{(j)} \rangle_{-\infty}, and \langle G^{(j)} \rangle under the above conditions is a very important result, for it permits us to use the "past" of an individual member (where now we take t_0 = 0 in \langle G^{(j)} \rangle_{-\infty} to be the "present" of the observation) to predict its "future" in some average sense. Thus, time averages carried out on the past (-T, 0) are equivalent (probability 1) to those which are to be carried out (0, T) with respect to the disturbance's future behavior (in the limit T → ∞). If, moreover, G is ergodic, in almost all cases we can determine beforehand from the given ensemble data just what this average should be, provided that it is


possible actually to achieve these limiting operations in a physical system. The precise future structure of individual waveforms cannot, of course, be determined from the past, unless the process is completely deterministic, which is clearly a trivial case for communication purposes. The information-bearing members are all essentially unpredictable in this strict sense: it is only the ensemble properties, and the time averages related to them, that are predictable and hence useful quantitatively.

1.6-2 Applications to Signal and Noise Ensembles. In many problems of statistical communication theory, the signal, noise, and noise and signal ensembles together are not ergodic, even for the idealized situations where stationarity is assumed. The theorem [Eq. (1.125)] cannot then be applied directly. However, it often happens that these more complicated ensembles are made up of subensembles which are themselves ergodic, and to these latter the theorem can be applied separately. In other cases, both time and statistical averages may be required to determine the expected behavior of a system. Let us illustrate these remarks with some typical examples.

Example 1. As the first illustration of the interchangeability of time and ensemble averages under suitable conditions, consider the ensemble y(t) = a(t) cos [ω_0 t + φ(t)], where a(t), φ(t) are, respectively, an envelope and phase that are entirely random; ω_0 is fixed. If a(t) and φ(t) are stationary, and if φ(t) is uniformly distributed over (0, 2π), the ensemble y(t) is also stationary with \bar{y} = 0. Furthermore, if both a(t) and φ(t) are ergodic [cf. Eq. (1.124)], then so also is y. Now suppose that a(t), φ(t) are statistically independent, that we are given the particular data y^{(j)}(t) over (-∞ < t < ∞), and that we wish to find the autocorrelation function \langle y^{(j)}(t_0)\, y^{(j)}(t_0 + t) \rangle [cf. Eq. (1.113)], viz.,

R_y^{(j)}(t) = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} y^{(j)}(t_0)\, y^{(j)}(t_0 + t)\, dt_0 \qquad (1.128)

This we can do directly by performing the indicated operations of Eq. (1.128) on the given data. However, if the statistical properties of the ensemble are known, we can also obtain R^{(j)}(t) directly from the covariance function K_y(|t|) (cf. Table 1.1) with the help of the ergodic theorem (1.125). We have

R_y^{(j)}(t) = \langle y_1^{(j)} y_2^{(j)} \rangle = K_y(t) = \overline{y_1 y_2} = \overline{y(t_1)\, y(t_2)} \qquad t = t_2 - t_1,\ \bar{y} = 0, \text{ probability 1} \qquad (1.129)

Explicitly, this becomes

R_y^{(j)}(t) = \int \cdots \int a(t_1)\, a(t_2) \cos [\omega_0 t_1 + \phi(t_1)] \cos [\omega_0 t_2 + \phi(t_2)]\, W_2(a_1,\phi_1;\, a_2,\phi_2;\, t)\, da_1 \cdots d\phi_2
    = \int \cdots \int a_1 a_2 \cos(\omega_0 t_1 + \phi) \cos(\omega_0 t_2 + \phi)\, W_2(a_1,a_2;\, \phi;\, t)\, da_1\, da_2\, d\phi \qquad \text{probability 1} \qquad (1.130a)

since φ is uniformly distributed. Also, since the (a, φ) processes are sta-


tionary and independent, we have [\cos(\omega_0 t_1 + \phi) \cos(\omega_0 t_2 + \phi)]_\phi = [\cos \phi' \cos(\omega_0 t + \phi')]_{\phi'} = \tfrac{1}{2} \cos \omega_0 t, so that finally

R_y^{(j)}(t) = \tfrac{1}{2} \cos \omega_0 t \iint_0^\infty a_1 a_2\, W_2(a_1,a_2;\, t)\, da_1\, da_2 = \tfrac{1}{2}\, \overline{a_1 a_2} \cos \omega_0 t \qquad \text{probability 1} \qquad (1.130b)

The y(t) may represent an ensemble of narrowband noise voltage or current waves such as are encountered in radio or radar reception or an amplitude modulation of a carrier cos(ω_0 t + φ) (cf. Chaps. 12, 13, 17 to 19 for specific applications).
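A rough numerical companion to Example 1: the sketch below builds one member of a hypothetical narrowband wave from slowly varying random quadrature components (an assumed construction, used only to produce an envelope-and-phase wave of the kind described) and estimates the time-averaged autocorrelation (1.128); the estimate displays the cos ω_0 t factor of Eq. (1.130b).

```python
import numpy as np

# Estimate the time-averaged autocorrelation (1.128) of one member of an
# assumed narrowband wave and observe the cos(w0*t) factor of Eq. (1.130b).

rng = np.random.default_rng(5)
dt, n = 0.01, 200_000
w0 = 2.0 * np.pi * 5.0                      # assumed carrier frequency (rad/s)

def slow_gaussian():
    """Slowly varying zero-mean, unit-variance Gaussian component (one-pole smoothing)."""
    x = np.empty(n)
    x[0], r = 0.0, 0.999
    w = rng.normal(size=n)
    for k in range(1, n):
        x[k] = r * x[k - 1] + np.sqrt(1 - r**2) * w[k]
    return x

t = np.arange(n) * dt
xc, xs = slow_gaussian(), slow_gaussian()
y = xc * np.cos(w0 * t) - xs * np.sin(w0 * t)      # narrowband member y^(j)(t)

lags = np.arange(0, 200)
R = np.array([np.mean(y[:n - k] * y[k:]) for k in lags])
# R should oscillate essentially as (slow envelope term) * cos(w0 * lag * dt).
print(R[:5])
print(np.cos(w0 * lags[:5] * dt))
```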

Example 2. Mixed ensembles, consisting of signal and noise processes in some linear or nonlinear combination, offer further opportunity for the interchange of time and ensemble averages. Even if the combined process is ergodic, the most convenient form of result is often a mixture of both types of average, as our example below indicates. Accordingly, instead of noise alone, let us consider now the ensemble z = g(y), where g is a one-valued function of y and y = y(t) is the sum of a noise ensemble N(t) and a signal set S(t), for example, y = S + N, with the additional conditions that S and N are statistically independent and ergodic. The function, or transformation, g, for example, may represent the effect of an amplifier or a second detector in a radio receiver, or some other linear or nonlinear† operation, such that the instantaneous output of the device in question is

z^{(j)}(t) = g[y^{(j)}(t)] \qquad y^{(j)}(t) = S^{(j)}(t) + N^{(j)}(t) \qquad (1.131)

when y^{(j)}(t) is the input wave.
Let us calculate the correlation function of this output wave, viz.,

R_z^{(j)}(t) = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} g[y^{(j)}(t_0)]\, g[y^{(j)}(t_0 + t)]\, dt_0 \qquad (1.132)

from the corresponding statistical average \overline{g_1 g_2}, with the help of the ergodic theorem. We have

R_z^{(j)}(t) = \overline{g(t_0)\, g(t_0 + t)} = E\{g(t_0)\, g(t_0 + t)\} = \iint g(y_1)\, g(y_2)\, W_2(y_1,y_2;\, t)\, dy_1\, dy_2 = \overline{g_1 g_2} \qquad t = t_2 - t_1, \text{ probability 1} \qquad (1.133a)

When the signal is at least partially deterministic, usually it is easier to evaluate the integrals in Eq. (1.133a) by taking advantage of the explicit signal structure. Writing

R_z^{(j)}(t) = \overline{g(S_1 + N_1)\, g(S_2 + N_2)} = \int \cdots \int g(S_1 + N_1)\, g(S_2 + N_2)\, W_2(S_1,N_1;\, S_2,N_2;\, t)_{S+N}\, dS_1 \cdots dN_2 \qquad (1.133b)

† For a definition of linear and nonlinear operation, see Sec. 2.1-3(2).

Page 59: 1.1 Introductory Remarks$ Among these, see in particular Cramer.7 The modern mathematical foundations and development of probability theory stem mainly from the work of Kolmogoroff,8

SEC. 1.6] STATISTICAL PRELIMINAKIES 61

where (W_2)_{S+N} now is the joint second-order density of S and N, and remembering that S and N are independent, we see that (W_2)_{S+N} factors, namely, (W_2)_{S+N} = (W_2)_S (W_2)_N. Next, we employ the ergodic theorem again, now to replace the statistical average over the signal components by the equivalent time average, which makes direct use of the signal's functional form. The final result is a mixed expression containing the statistical average over the noise terms and the time average over the signal elements, viz.,

R_z^{(j)}(t) = \overline{g_1 g_2} = \left\langle \iint g(S_1^{(j)} + N_1)\, g(S_2^{(j)} + N_2)\, W_2(N_1,N_2;\, t)\, dN_1\, dN_2 \right\rangle_S = \left\langle \overline{g(S_1^{(j)} + N_1)\, g(S_2^{(j)} + N_2)}^{(N)} \right\rangle_S \qquad \text{probability 1} \qquad (1.133c)

where, following the convention of Sec. 1.5, the bar refers to the ensemble average and the pointed brackets \langle\ \rangle_S to the time average.

Example 3. In Examples 1 and 2, we considered stationary (and ergodic) systems only. What happens now when we wish to calculate averages for nonstationary systems? Here the ergodic theorem cannot be applied in toto to interchange time and ensemble averages. Instead, for useful results we must apply both time and statistical averages. Let us illustrate by modifying Example 2, extending it to include the case of a periodic amplitude modulation of a carrier cos ω_0 t, so that the signal S(t) has the deterministic structure S(t) = V_{mod}(t) cos ω_0 t [V_{mod}(t) ≥ 0]. Then the ensemble y(t) = S(t) + N(t) for the input to a nonlinear system g, and the corresponding ensemble z = g(y) for the output, are no longer stationary (cf. Sec. 1.3-7, also Prob. 1.9b) and therefore not ergodic. In this case, we cannot compute the correlation function (for either y or z) a priori from the ensemble properties of y and the transformation g, but only the average of this quantity over the ensemble (when such an average exists). Thus, we have

R_z^{(j)}(t) = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} g[y^{(j)}(t_0)]\, g[y^{(j)}(t_0 + t)]\, dt_0 = \left\langle \overline{g(y_1^{(j)})\, g(y_2^{(j)})}^{(C,N)} \right\rangle_{mod} \qquad (1.134a)

R_z^{(j)}(t) = \left\langle \int \cdots \int g(S_1^{(j)} + N_1)\, g(S_2^{(j)} + N_2)\, W_2(S_1,N_1;\, S_2,N_2;\, t)_{C+N}\, dN_1 \cdots dS_2 \right\rangle_{mod} \qquad (1.134b)

R_z^{(j)}(t) = \left\langle \left\langle \overline{g(S_1^{(j)} + N_1)\, g(S_2^{(j)} + N_2)}^{(N)} \right\rangle_C \right\rangle_{mod} \qquad (1.134c)

where we have used the fact that for fixed phases (or epochs t') of the modulation the subensemble formed of modulated carrier plus noise, for example, y' = V_{mod}(t') cos(ω_0 t + φ) + N(t) (t' fixed), is ergodic. Finally, we perform a time (or equivalent statistical) average over the epochs t' of the modulation, as indicated by \langle\ \rangle_{mod} in Eqs. (1.134b), (1.134c); \langle\ \rangle_C is the time average over the carrier, cos ω_0 t, which with \langle\ \rangle_{mod} takes advantage of the deterministic structure of the signal. For nonstationary systems, then, both

Page 60: 1.1 Introductory Remarks$ Among these, see in particular Cramer.7 The modern mathematical foundations and development of probability theory stem mainly from the work of Kolmogoroff,8

62 STATISTICAL COMMUNICATION THEORY [CHAP. 1

types of averaging are needed if one is to obtain results independent of the particular member of the ensemble originally available, unlike the entirely ergodic cases of Examples 1 and 2, where only one averaging process is required.

Usually, we are interested in the (ergodic) signal or modulation subensemble after reception in noise. To calculate its correlation function, for example, or to observe and obtain the latter experimentally from the data of a particular representation, we must be able to separate it from the carrier ensemble. The ergodic theorem may then be applied directly to this subensemble, as in Example 1 or 2. A radio or radar receiver actually performs this separation. The process of detection and filtering embodied in the transformation g enables us to pick out the desired signal (contaminated by noise, of course), after which the various time or ensemble averages can be performed upon it. When the modulation is not ergodic, both types of average are again required. Finally, we emphasize once more that ergodicity here is an assumption, an idealization which can give only an approximate account of actual physical situations, because of finite observation times, e.g., finite samples in the statistical sense, and because of the ultimately nonstationary character of all physical processes. Nevertheless, for many applications in statistical communication theory the notion of ergodicity is a useful one and is a convenient starting point for the technically more involved and realistic, but conceptually no more difficult treatment which specifically considers the finite observation period or sample size (cf. Chaps. 17 to 23).

REFERENCES

1. Wiener, N.: "Extrapolation, Interpolation, and Smoothing of Stationary Time Series," Technology Press, Cambridge, Mass., and Wiley, New York, 1949; published originally at Massachusetts Institute of Technology, Feb. 1, 1942, as a report on DIC Contract 6037, National Research Council, sec. D2.
2. Shannon, C. E.: A Mathematical Theory of Communication, Bell System Tech. J., 27: 379, 623 (1948).
3. Rice, S. O.: Mathematical Analysis of Random Noise, Bell System Tech. J., 23: 282 (1944), 24: 46 (1945).
4. Wald, A.: "Statistical Decision Functions," Wiley, New York, 1950.
5. Van Meter, D., and D. Middleton: Modern Statistical Approaches to Reception in Communication Theory, IRE Trans. on Inform. Theory, PGIT-4: 119 (September, 1954).
6. Middleton, D., and D. Van Meter: Detection and Extraction of Signals in Noise from the Point of View of Statistical Decision Theory, J. Soc. Ind. Appl. Math., 3(4): 192 (December, 1955), 4(2): 86 (June, 1956).
7. Cramér, H.: "Mathematical Methods of Statistics," Princeton University Press, Princeton, N.J., 1946.
8. Kolmogoroff, A.: "Grundbegriffe der Wahrscheinlichkeitsrechnung," Berlin, 1933.
9. Borel, E., et al.: "Traité du calcul des probabilités et de ses applications," Paris, 1924ff.

Page 61: 1.1 Introductory Remarks$ Among these, see in particular Cramer.7 The modern mathematical foundations and development of probability theory stem mainly from the work of Kolmogoroff,8

STATISTICAL PRELIMINARIES 63

10. Fréchet, M.: "Recherches théoriques modernes sur la théorie des probabilités," Ref. 9, vol. 1, pt. III, 1937, 1938.
11. Uspensky, J. V.: "Introduction to Mathematical Probability," McGraw-Hill, New York, 1937.
12. Feller, W.: "An Introduction to Probability Theory and Its Applications," Wiley, New York, 1950.
13. Arley, N., and K. R. Buch: "Introduction to the Theory of Probability and Statistics," Wiley, New York, 1950.
14. Van der Pol, B., and H. Bremmer: "Operational Calculus, Based on the Two-sided Laplace Transform," Cambridge, New York, 1950.
15. Clavier, P. A.: Some Applications of the Laurent Schwartz Distribution Theory to Network Problems, Proc. Symposium on Modern Network Synthesis, 1956, pp. 249-265.
15a. Lighthill, M. J.: "An Introduction to Fourier Analysis and Generalized Functions," Cambridge, New York, 1958.
16. Doob, J. L.: "Stochastic Processes," Wiley, New York, 1953. See chap. 2, secs. 1 and 2, for example.
17. Bartlett, M. S.: "An Introduction to Stochastic Processes," secs. 1-3, Cambridge, New York, 1955.
18. Blanc-Lapierre, A., and R. Fortet: "Théorie des fonctions aléatoires," chaps. 3, 4, Masson et Cie, Paris, 1953.
19. Rayleigh, Lord: "Scientific Papers," vol. 1, p. 491; vol. 3, p. 473; vol. 4, p. 370; vol. 6, p. 604, 1920. For further references, see Chap. 7.
20. Ref. 17, chap. 2, also parts of chap. 3.
21. Wiener, N.: Generalized Harmonic Analysis, Acta Math., 55: 117 (1930).
22. Kolmogoroff, A.: Korrelationstheorie der stationären stochastischen Prozesse, Math. Ann., 109: 604 (1934).
23. Lévy, P.: "Processus stochastiques et mouvement brownien," Paris, 1948.
24. Moyal, J. E.: Stochastic Processes and Statistical Physics, J. Roy. Stat. Soc., B11: 167 (1949).
25. Ref. 12, chap. 17.
26. For a concise discussion of the notion of a stationary random function of time, from a more rigorous mathematical viewpoint, see J. Kampé de Fériet, Introduction to the Statistical Theory of Turbulence. III, J. Soc. Ind. Appl. Math., 2: 244 (1954), especially chap. V, sec. 6.

27. Knudtzon, N.: Experimental Study of Statistical Characteristics of Filtered Random Noise, MIT Research Lab. Electronics Tech. Rept. 115, July 15, 1949; cf. fig. 2-10.
28. Markoff, A.: Extension of the Law of Large Numbers to Dependent Events, Bull. Soc. Phys. Math. Kazan, (2)16: 135-156 (1906); "Wahrscheinlichkeitsrechnung," Leipzig, 1912.
29. Wang, M. C., and G. E. Uhlenbeck: On the Theory of the Brownian Motion. II, Revs. Modern Phys., 17: 323 (1945).
30. Ref. 12, sec. 17.9; Ref. 17, sec. 2.2, for example.
31. Ter Haar, D.: Foundations of Statistical Mechanics, Revs. Modern Phys., 27: 289 (1955); detailed discussion and very extensive bibliography. The interested reader is also referred to Ter Haar, "Elements of Statistical Mechanics," app. I, Rinehart, New York, 1954. See also A. I. Khintchine, "Statistical Mechanics," chaps. II, III, Dover, New York, 1949.
32. Birkhoff, G. D.: Proof of the Ergodic Theorem, Proc. Natl. Acad. Sci. U.S., 17: 650, 656 (1931).
33. Von Neumann, J.: Proof of the Quasi-ergodic Hypothesis, Proc. Natl. Acad. Sci. U.S., 18: 70 (1932).
34. Hopf, E.: "Ergodentheorie," Chelsea, New York, 1948.

Page 62: 1.1 Introductory Remarks$ Among these, see in particular Cramer.7 The modern mathematical foundations and development of probability theory stem mainly from the work of Kolmogoroff,8

64 STATISTICAL COMMUNICATION THEORY

35. For a detailed discussion, see Ref. 26, I, 2: 1 (1954); II, 2: 143 (1954); III, 2: 244 (1954); IV, 3: 90 (1955), with an extensive bibliography of earlier work in 2: 143 (1954); cf. pp. 172-174 and also especially pt. III, chap. V.
36. Ref. 1, secs. 0.8, 1.4.
37. See, for example, Ref. 3; D. Middleton, Some General Results in the Theory of Noise through Non Linear Devices, Quart. Appl. Math., 5: 445 (1948); also, Chap. 13.
38. Doob, J. L.: "Stochastic Processes," Wiley, New York, 1953; cf. p. 464 and also pp. 515ff.
39. Ref. 38, pp. 515ff.