Markov Models and Metaphors


Markov Models as Time-Process Metaphors and as Behavioral Models

by John Michael Williams

[email protected]

2012-07-28

A detailed elementary presentation of Markov chains, focussed upon their application toward human behavioral insight, ethology, and experimental psychology.

Keywords: Markov process, Markov chain, semi-Markov, Markov model, exponential distribution, metaphor, ethogram, ethology, time process, stochastic process, stationary process, lumpable chain, waiting time, absorbing chain, animal courtship, behaviorism, experimental psychology, Monte Carlo model, game theory, information theory.

Copyright (c) 2012 by John Michael Williams. All rights reserved.


Preface

This is a rewrite of an old paper which I wrote while I was at Columbia University. Although I have not done anything in psychology for many years, at the time I was studying abnormal psychology under Professor Howard Hunt, to whom I owe a great debt of gratitude for his patience and encouragement.

The references cited are dated in the middle twentieth century, but this should not be crucial to understanding the subject material, which is not very dependent upon ephemeral facts or opinions. The text here is, for the most part, independent of the references, many of which now are accessible only in libraries or technical collections. Unfortunately, the final chapters have been lost; happily, what remains is coherent and is independent of that loss.

I hope that the reader, whether a scientist, therapist, or just an interested individual, can benefit from some further study of quantitative methods as guides to understanding.

This paper is divided into five more-or-less linked chapters, as follows:

1. Introduction: Metaphors and models of behavior.

2. Markov processes and chains.

3. The organism-environment metaphor: Experimental psychology and ethology distinguished.

4. The ethogram and Markov models in ethology.

5. Markov models in experimental psychology.

met-a-phor -- noun. A figure of speech in which one object is likened to another by speaking of it as if it were that other, as in "He was a lion in battle": distinguished from simile by not employing any word of comparison, such as "like" or "as".

-- after Funk & Wagnalls Standard College Dictionary.


I. Introduction: Space, Time, and Mathematics

How far back can you remember?

How many hours is it from New York to Chicago?

-- Commonplace questions.

The ancient Egyptians and Babylonians both possessed well-developed empirical geometries and arithmetics. These skills could be used to manipulate -- and thus to comprehend -- the spatial interrelationships of the component parts of vast works of institutional architecture. The Babylonians, at least, recorded temporal astronomical data by means of numbers (Boyer, 1949, pp. 15 - 16).

Later, the less-ancient Greeks applied the essentially grammatical rules of logic to the manipulation of spatial relationships, thus creating -- and comprehending -- a generalizable "Euclidean" geometry (see Kramer, 1951, ch. 2). Subsequently, the grammatical rules of algebra began to develop (see Kramer, 1951, ch. 3); and, following Descartes' demonstration that by analytic geometry the language of algebra could be translated to and from the language of geometry, Western scholars began to have as clear a comprehension of space as their species-specific learning-set for verbal discrimination learning would allow them. The clarification of the human feel for time, a somewhat vaguer, more "ordinal" (and often more nominal) feeling, was delayed until recent centuries, perhaps mainly because of a lag in the development of accurate instruments for the measurement of short periods of time.

It is historically reasonable, then, to assert that the modern Western precise comprehension of time should be based upon the earlier-acquired precise comprehension of space. More succinctly, it is historically reasonable that the modern metaphorical basis of time should be space (cf. Embler, 1966, pp. viii-ix, 143). Modern scientists and others are using the space-time metaphor whenever they use terms such as "time interval" (an interval on a time line of some sort), "time process" (a process as a string of events in time), "temporal pattern" (a pattern or gestalt in time -- a pattern perhaps not recognized until the data are displayed spatially in a printout), "equally spaced intervals in time", and so on ad infinitum.

The fabric of modern scientific thought is woven on a frame of space and time and mathematics. Scientists think of time either as being "continuous", like the points or numbers along the real line, or as being "discrete", like the integers -- all depending upon how these scientists gather, analyze, or present their data. However, it should be recognized that the purportedly real underlying continuity of the "flow" of time is no less a construction of the human mind than is the continuity of the real numbers; and, in fact, it should be clear that comprehension of the underlying "continuity" of time depends on prior exposure to the continuity of the points on the real line.


According to the current history of mathematics, first the integers were constructed, then the rational numbers, then the algebraic numbers, then the real numbers. We are taught, however, that the integers, the rationals, and the algebraics all are "special cases" of the reals -- that all are overlapping subsets of the uncountable set of all real numbers. The integers are discrete; the real numbers are continuous. Correspondingly, discrete time processes are viewed as defined only on sequences of discrete, countable sets of points in time; whereas, by contrast, continuous time processes are viewed as defined on uncountable sets of points in time. Those points in time which are elements of the latter uncountable sets are seen as distributed continuously along what might be called the real time line. Points in time for which discrete time processes are defined also are viewed as distributed (discretely) along the same real time line.

We may bring this to a summary by pointing out that terms such as "the real time line", "discreteness", and "continuity", like terms such as "function", "variable", "organism", and "environment", all are products of the human metaphorical operations that are implicated in, and in fact constitute, all operationally definable scientific concepts.

Metaphors and Mathematical Models

Insofar as our present comprehensions of space and time have developed hand-in-hand with our comprehension of mathematics, it is not surprising that mathematics should be of primary importance in most of our scientific work.

In addition to the mathematical (and spatial) metaphors which are used to guide the comprehension of scientists, there also are mathematical models which serve as tools to be guided by scientists in order that those tools might be caused to fit the observed or known data produced in experiments. This kind of fit is just an attempt to describe the experimental data. If the fit is good, then the scientists can comprehend, and thus manipulate, the experimental conditions in terms of that model. The model then acquires dual functionality both as a model to describe the data and also as a metaphor which further guides scientific thought concerning the experimental situation. For example, a behavioral scientist who uses a game-theory model to describe the behavior of human subjects in a learning experiment may come to "see" his subjects (via the metaphor) as "using strategies" and "maximizing their gains"; thus, the game theory metaphor soon takes over from the descriptive game theory model.

In the present writer's opinion, the primary use of mathematics in science should not be for model building as such, but, it should be, rather, for the purpose of making available an endless supply of metaphors by which scientists can comprehend their data. It seems likely that the operation of any heuristic device may be seen as involving the emergence of a new metaphor guiding the scientist toward a way of attempting a new solution to whatever might be the problem. In this context, Feller has been quoted as saying,

". . . for a mathematical theory to be applicable, it is by no means necessary that it should be able to provide accurate models of observable phenomena. Very often, in applications [model-building] is less important than the economy of thought and experimentation resulting from the ease with which qualitatively reasonable working hypotheses can be eliminated by mathematical arguments . . .. Mathematical theory can become an indispensable guide not only to a better understanding, but even to a proper formulation of scientific problems."

-- First Berkeley Symposium (Cane, 1959, pp. 57 - 58).

Long before Feller, Poincaré was saying as much when he wrote that

"If a new [mathematical] result is to have any value, it must unite elements long since known, but until then scattered and seemingly foreign to each other . . .. Then, it enables us to see at a glance each of these elements in the place it occupies in the whole. . . . It is economy of thought that we should aim at."

-- Poincaré (1914, pp. 30; 33)

We now move on to the topic of mathematical models proper.

Models of Time Processes

In general, mathematical models of time processes can be classified either as deterministic or stochastic (Bartholomew, 1967, p. 2). Deterministic models specify with certainty the time-change which will occur in the state of the system under consideration. Stochastic models specify with certainty time-changes in conditional probability distributions defined on the various possible states ("events") of the system. Stochastic models do not specify with certainty the states ("outcomes") actually successively entered by the system in time. A deterministic process is a sequence of events in time; a stochastic process is a sequence of (dependent) random variables in time (Feller, 1968, p. 419 and note).

By a curious complementarity, mathematical analysis of deterministic continuous processes is far simpler than that of deterministic discrete processes, which latter analysis, in all strictness, would require the solution of Diophantine equations (see Bell, 1951, pp. 236 - 238). On the other hand, mathematical analysis of stochastic continuous processes is far more complicated than that of stochastic discrete processes (but, compare Feller, 1966, p. 2). Our discussion of time processes will confine itself primarily to continuous deterministic, and discrete stochastic, models.

Deterministic Processes

Memoryless processes. The simplest and most familiar deterministic time processes are those such that, given complete knowledge of the state of the system at time t, the states at all future times t + Δt are specified with certainty -- which is to say, are specified so that they are perfectly predictable. One is reminded of Laplace's Divine Calculator who, "knowing the velocities and positions of all the particles in the world at a particular instant, could calculate . . . all that would happen in the future" (Mason, 1962, p. 296). These simplest of deterministic processes have no memories, and any past history leading the process to a given state x implies a future entirely determined by that state x alone.

Memoryless deterministic processes also often are such that the previous value of the state x at time t not only determines the future but also the past. Such processes are reversible in that a reversal of the direction of change of the time parameter leads the system into a completely determined progression of states corresponding to the system's sequential past history. The world of Laplace's Divine Calculator was reversible, because, as it happened, that device also could calculate "all that had happened in the past" (Mason, 1962, op. cit.).

Much of classical thermodynamics is based on the theory of reversible thermodynamic processes (e. g., Resnick and Halliday, 1966, p. 296). A reversible thermodynamic process moves very gradually through a succession of states of thermodynamic equilibrium; if the motion takes place too rapidly for equilibrium to be maintained continuously, then the process becomes irreversible. In this context, irreversible means that the final state x after the said rapid change will be such that it can not, alone, furnish complete knowledge of exactly how the rapid change occurred.

Another example of a memoryless deterministic process can be found in the elementary theory of infinitesimal strain: By Hooke's Law, the force F stretching a spring of length L by a distance ΔL is entirely determined by the distance of the stretch, provided that ΔL is small compared with L. In this theory, F = −k ΔL, for ΔL/L small. The history of the spring -- how it was stretched or compressed in the past -- has no relevance, and no memory of that history is retained by the spring.

Memoried processes. In the Hooke's Law example above, if ΔL / L becomes too large, if the spring is stretched too far, then the spring will be irreversibly plastically deformed, leading to a condition describable by finite strain theory in which the precise history of the deformation becomes necessary in order to predict the spring's subsequent behavior under stress. The deterministic process acquires a memory.

In an example from everyday experience, if a rubber eraser on a writing pencil is deformed by kneading and twisting it back and forth, then the eraser's precise shape at a given time during or after deformation depends on its structural memory of exactly in which directions, how far, how quickly, and in what order the deformations took place. Once bent, such an eraser never will return to its exact original shape. The eraser remembers its past.

These examples of memoried deterministic time processes illustrate processes the future states of which depend not only upon their present states at time t, but also upon their exact past histories up to time t. Without historical knowledge, prediction for memoried processes can not be precise.


Stochastic Processes

In psychology, and in twentieth century social sciences in general, deterministic models usually are inappropriate. They can be inappropriate either (a) because the states of the systems under study are too complex to be known precisely, as in the problem of knowing the total physiological state of a vertebrate animal (or even an amoeba); or, (b) because the systems themselves fundamentally are indeterminate. In this latter case, one might consider a social worker trying to determine, face-to-face, the present state of mind of an accomplished con-man or a serial killer. Or a physician trying to describe the population rate of infection of a new infective disease. Because of such problems, a social scientist frequently must have recourse to probabilistic methods of approach -- in particular, the science may require use of stochastic models.

In the present context, it is worth mentioning here that probabilistic mathematical models do exist which are not time-process models at all; therefore, they can not be stochastic models. One example would be that of static game theory models. The predictions of nonstochastic models often can be tested against those of stochastic models. For further information on this, see Luce, et al, (1963, pp. 571 ff.) for a contrasting of game theories versus stochastic learning theories. Sometimes, other probabilistic models can be seen as equivalent to models of stochastic processes; on this topic, Miller (Ed., 1964, pp. 178 - 183) has published a "sequential situation" application of game theory which was developed by Altmann (1965, pp. 508 ff.) to the extent of expressing a stochastic process in information theory terms.

Memoried processes.

Stochastic processes may or may not have memories. A stochastic process has a memory if and only if, for any given time t, the probability distribution of states entered later than t depends upon states entered earlier than t.

For discrete-time stochastic processes, in which t only can take on integral values, a more precise definition of memory is not difficult:

By definition, a system undergoing a time process enters, or reenters, various states x successively in time. Consider the set of all possible states x in which a given system can exist. Call this set [x]. Then, for the process to be stochastic, a process random variable Xt must exist for every integral time t for which the process is defined. This process random variable Xt must be such that the frequency function f of Xt, defined on the set [x], gives the probabilities of the system being in the various states x at time t. Now consider the time t - 1, during which the system is in the state xt - 1: At time t - 1, the frequency function f(Xt) gives the probability of the state the system will be in next, at time (t - 1) + 1 = time t. In general, f(Xt) will be a function of xt - 1. Now, if f(Xt) depends upon, or is in any measurable way affected by, any state or states xs, where s < t - 1, then the stochastic process in question is said to have a memory.
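For the reader who would like something concrete, the distinction just defined can be simulated directly. The following sketch is in the Python language with the NumPy library; the two-state system, the transition tables, and the dependence on the state before last are invented purely for illustration and come from none of the cited sources.

```python
import numpy as np

rng = np.random.default_rng(0)

def markov_step(state, P):
    # Memoryless: the next-state distribution depends only on the present state.
    return rng.choice(len(P), p=P[state])

def memoried_step(prev, state, P_repeat, P_change):
    # Memoried: the next-state distribution also depends on the state before
    # last, so f(X_t) is affected by x_(t-2) and the process is not Markovian.
    P = P_repeat if prev == state else P_change
    return rng.choice(len(P), p=P[state])

P_repeat = np.array([[0.8, 0.2], [0.3, 0.7]])  # used when x_(t-2) == x_(t-1)
P_change = np.array([[0.4, 0.6], [0.6, 0.4]])  # used when x_(t-2) != x_(t-1)

# Generate a short memoried sequence, starting from the states 0, 1:
seq = [0, 1]
for _ in range(10):
    seq.append(memoried_step(seq[-2], seq[-1], P_repeat, P_change))
print(seq)
```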

Feller (1968, pp. 421 - 423) gives several examples of what he calls "non-Markovian processes" -- stochastic processes with memories.


Memoryless processes.

Corresponding to the memoryless deterministic processes discussed above are the memoryless stochastic Markov processes. Markov processes may be seen as probabilistic analogues of the deterministic processes of classical mechanics (Feller, 1968, p. 420) or of those of classical thermodynamics.

We now proceed to the main topic of this chapter, Markov processes and chains.


II. Markov Processes and Chains

Markov Processes

Feller (1968, p. 420) defines a Markov process as a stochastic process such that, given the present state xr, nothing concerning states of the system in the past can alter the conditional probability of the state x at a future time. Karlin (1966, p. 19) defines a Markov process as "a process with the property that, given a value of [a random variable] Xt, the values of Xs, s > t, do not depend upon the values Xu, u < t", and this is equivalent to defining a Markov process as a stochastic process which has no memory. In fact, lack of memory in a stochastic process is called the Markov property (e. g., Feller, 1966, pp. 8 - 9, 93 - 94; cf. Bartholomew, 1967, p. 9). Mathematically more formal definitions of Markov processes may be found in Feller (1968, p. 420), Kemeny and Snell (1960, p. 24), and Karlin (1966, p. 21). Probabilistic examples of some typical Markov processes are available in Feller (1968, pp. 375 - 382).

Henceforth, unless otherwise specified, it will be assumed that all process random variables X will be discrete and defined only upon finite (and therefore countable) sets of system states [x]. Thus, the processes under discussion will be finite Markov processes. We also shall assume that time proceeds in discrete as opposed to continuous steps, an assumption equivalent to one that the process time parameters t take on integer values only. Thus, all processes with which we shall be concerned will be discrete-time Markov processes. Also, in the following discussion, because every distinct random variable X may be considered associated with its own specific, distinct frequency function f(X), it often will be taken as understood that reference to a particular random variable X also is a reference to its frequency function f(X).

Because a discrete-time Markov process is a sequence of dependent random variables X, the distribution of those variables X at time t, Xt, in general will be a function of the particular state xt−1 in which the system was just prior to time t. The distribution Xt also in general will be a function of the time t at which the process is observed.

For a finite number n of possible process states xi, i = 1, 2, . . ., n, let us call (pjk)t the probability that a transition from state xj to state xk will occur in exactly one step at a given time t. These (pjk)t will be conditional probabilities which are determined (a) by the dependency between Xt and Xt - 1 and (b) by the time t. The set of all (pjk)t can be arranged in a matrix P⃗ t of one-step conditional transition probabilities p, in which P⃗ t is given by the following expression:

P⃗t = ( p11  p12  . . .  p1k  . . .  p1n
       p21  p22  . . .  p2k  . . .  p2n
        .    .           .           .
       pj1  pj2  . . .  pjk  . . .  pjn
        .    .           .           .
       pn1  pn2  . . .  pnk  . . .  pnn )        (1)

Here, P⃗ t is a square matrix with n rows and n columns. In (1), p12 gives the conditional probability, given that the system was in state x1 up until time t, that the system will move in a single step to state x2 at time t. The diagonal entries of the matrix in (1), those at p11, p22, p33, . . ., pnn, of course give the probabilities that the system will not change state at time t if the system was in state x1, x2, x3, . . ., xn just before time t.
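As a computational aside, the action of P⃗t on a distribution is a single matrix multiplication: the row vector f(Xt) equals f(Xt−1) multiplied on the right by P⃗t. A minimal sketch in Python with the NumPy library follows; the three-state matrix and the starting distribution are hypothetical numbers chosen only to make the arithmetic visible.

```python
import numpy as np

# A hypothetical one-step transition matrix; entry P[j, k] is p_jk, the
# conditional probability of moving from state x_j to state x_k in one step.
# Each row must sum to 1.
P = np.array([[0.70, 0.20, 0.10],
              [0.30, 0.40, 0.30],
              [0.20, 0.20, 0.60]])

f0 = np.array([1.0, 0.0, 0.0])  # start in state x_1 with certainty
f1 = f0 @ P                     # distribution over states at time 1
f2 = f1 @ P                     # distribution over states at time 2
print(f1)                       # [0.7 0.2 0.1] -- the first row of P
print(f2)                       # a mixture of the rows, weighted by f1
```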

Markov Chains

If P⃗ t in (1) above is a constant function of t, in other words if P⃗ t does not change over time, then the Markov process is said to be a Markov chain (see Kemeny and Snell, 1960, p.25; Luce, et al, 1963, p. 568; Feller, 1968, pp. 444 - 445). However, there is some terminology overlap, because the terms "Markov process" and "Markov chain" are used more or less interchangeably by Feller (1968, pp. 372, 374, and 420 - 421) and by Karlin (1966, pp. 27 - 28). In the present work, a Markov chain always will be considered a Markov process for which P⃗ t is stationary (constant) for all t.

A Markov chain is completely defined by its one-step transition probability matrix P⃗ and the specification of a probability distribution f(X0) on the system-states [x] of the process at time 0 (see Karlin, 1966, p. 40).

The simple definition of Markov chains by their P⃗ and X0 alone makes them convenient basic models of stochastic processes of several types. Often, P⃗ is estimated easily from observed data, as exemplified in Thorpe and Zangwill (1961, pp. 363 - 366), Altmann (1965), and also in Cohen (1958) -- although Cohen's entire model, as defined, would seem to be of dubious utility (see the final chapter below).
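The estimation alluded to above is, at bottom, mere counting: tally the observed one-step transitions and normalize each row. A minimal sketch in Python with NumPy follows; the observed sequence is fabricated for illustration and stands in for, say, a coded behavioral record.

```python
import numpy as np

def estimate_P(seq, n_states):
    # Count the observed one-step transitions, then normalize each row so
    # that row j estimates the conditional probabilities p_jk.
    counts = np.zeros((n_states, n_states))
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

# A fabricated observed sequence of system states, coded 0, 1, 2:
seq = [0, 0, 1, 2, 2, 2, 0, 1, 1, 2, 0, 0, 2, 2, 1, 0]
print(estimate_P(seq, 3))
```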

Models related to Markov chains. Given the basic Markov chain definition, several related stochastic models can be generated.

For example, Altmann's sequential analysis information theory approach mentioned above can be understood as equivalent to a Markov chain approach: A sequential process is studied for which

". . . [the] conditional uncertainty of the nth event when the preceding n - 1 events are known is the nth order approximation to the uncertainty of the system. These conditional uncertainties are a monotonic increasing function of n: observing antecedent events will decrease . . . the average uncertainty of our predictions . . . except for the case of sequential independence [for which the average uncertainty will remain the same]" (Altmann, 1965, p. 509).

With a slight change of notation to make his (n + 1)th order approximation the more standard nth order approximation, Altmann's set of approximations of increasingly higher orders can be generated by adding just a consideration of the order of the highest required n-step transition probability matrix P⃗n to the initial distribution f(X0) to describe completely the stochastic process under consideration (cf. Altmann, 1965, pp. 500 ff, and Ash, 1965, p. 194).

In a more general vein, if the states x entered successively by the system are independent, then the process can be described completely by means of a zero-order approximation. The zero-order sequential approximation consists (a) of specification of the initial distribution f(X0) and (b) of specification of another distribution f(X) which applies when the process is running and is not dependent on the states x actually successively entered over time. An example is Cane's Bernoulli model (Cane, 1959, p. 37), in which f(X) is a binomial distribution.

In such contexts, the f(X) distribution is equivalent to a one-step transition probability matrix P⃗0 in which all rows are identical. The distribution of X0 may or may not be the same as f(X). A first-order approximation requires specification of X0 and a one-step transition probability matrix P⃗1 = P⃗ and completely describes processes equivalent to the Markov chains discussed above. A first-order sequential approximation thus describes what might be called a first-order Markov chain, a process completely described by (a) the order of the required transition probability matrix P⃗1 and (b) the value of X0.

Here are the meanings of the various degrees of order of Markov chains: A zero-order sequential approximation can describe completely only a stochastic process which in a sense is oblivious to everything but differences between the states x of the system. A first-order approximation describes processes such as first-order Markov chains which can tell differences between states and can account for which state they are in at present, but which have no memory of past states. A second-order approximation, by analogy, would describe processes which can remember (= store) the state in which they last were, the current state, and the differences between those states. In this context, then, a second-order Markov chain would be described completely by a second-order approximation and thus would require specification of (a) X0, of (b) a one-step transition probability matrix P⃗1, and of (c) a two-step transition probability matrix P⃗2 in order to be defined completely. Likewise, a third-order Markov chain would require specification of an X0, P⃗1, P⃗2, and P⃗3 -- and so on, up to nth-order Markov chains. As the order n of the chain increases, so does the length of the memory of the stochastic process being described. Sequential approximations per se are discussed somewhat differently by Altmann (1965, pp. 495 - 496 and 500 ff). In our work to follow, we always shall assume we are dealing with a first-order Markov process unless otherwise stated.

Notice the metaphorical relationship of "length of memory" to "order of chain" in the preceding discussion.

Markov chains have been a fertile ground for model-building. Another model generated from the Markov chain is the semi-Markov process, also called the Markov renewal process. This adaptation is defined by Bartholomew (1967, p. 28; see also Cane, 1959) as a process in which "changes of state occur [at times t'] according to a Markov chain and in which the time intervals between changes [i. e., between successive times t'] are random variables". We then may accept that, for any stochastic process to be a Markov chain, decision points (i. e., Bartholomew's times t') must occur (a) at regular, equally-spaced intervals in time or (b) randomly in time (Bartholomew, 1967, pp. 29 - 30).

Some properties of Markov chains. Three properties of Markov chains which are useful in general, and which will assist in our later discussion, are: reversibility of the chain, equilibrium of the chain, and lumpability (and expandability) of the chain.

The sequence of states successively entered by a system may or may not be observed in the actual order of entry. A Markov process always is a Markov process of some sort whether the process is observed in forward or reversed order -- that is, whether the time parameter t is increasing or decreasing. A Markov chain, then, always will be a Markov process when t is reversed; however, Markov chains are not necessarily Markov chains when reversed, because, in general, reversal of t makes the process's reversed transition probabilities functions of t (see Kemeny and Snell, 1960, p. 26).

If a Markov chain with one-step transition probability matrix P⃗ happens to be a Markov chain when observed in reverse order, then that chain may be a reversible chain. Defining the first useful property, a Markov chain is a reversible chain if and only if it is the same Markov chain, with matrix P⃗ , whether run in forward or reversed order (Kemeny and Snell, 1960, p. 105; Feller, 1968, p. 414).
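Reversibility can be tested numerically. The sketch below, in Python with NumPy, uses the detailed-balance criterion -- that πi·pij = πj·pji for the stationary probabilities πi -- which characterizes reversibility for a chain run in equilibrium; this is a standard computational shortcut rather than Kemeny and Snell's own formulation.

```python
import numpy as np

def stationary(P):
    # Stationary distribution: the left eigenvector of P for eigenvalue 1,
    # normalized so that its entries sum to 1.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

def is_reversible(P, tol=1e-10):
    # Detailed balance: pi_i * p_ij must equal pi_j * p_ji for all i, j.
    pi = stationary(P)
    flow = pi[:, None] * P      # equilibrium probability flow i -> j
    return np.allclose(flow, flow.T, atol=tol)

# Any symmetric transition matrix is reversible (its pi is uniform):
P_sym = np.array([[0.5, 0.3, 0.2],
                  [0.3, 0.4, 0.3],
                  [0.2, 0.3, 0.5]])
print(is_reversible(P_sym))   # True
```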

As has been mentioned, a Markov chain is completely defined by P⃗ and the specification of the probability distribution of a random variable X0 defined on the set [x] of the states of the system at time 0.

As our first example, suppose that we have a Markov chain M1 with a set of system states [x1, x2, x3] for which P⃗1 is given by

P⃗1 = ( .90  .05  .05
       .50  0.0  .50
       0.0  .05  .95 )        (2)

Suppose, also, that for M1, X0 is given by

f(X0) = (p1, p2, p3)0 = (1, 0, 0) .        (3)


Let's look at how the Markov chain M1 might get started:

The value of the particular X0 given by (3) means that at the initial starting time 0 (viz., t0 = 0), the probability of M1 being in state x1 is unity -- certainty -- because, as given in (3), p1 = 1. The probability that M1 might start in state x2 or x3 is 0 = p2 = p3.

Continuing with this example, from the first row of (2), then, the probability that at time 1 (= t0 + 1 = t1) the process M1 will be in state x1 is given by the transition probability p11 = .90; the probability that M1 will be in x2 at t1 is given by p12 = .05; and, the probability that M1 will be in x3 at t1 is given by p13 = .05. Therefore, with X0 given as in (3), M1 most likely will be in state x1 at t1.

Still continuing, at time 2 (= t2), looking at matrix P⃗1 in (2) above, it is clear also that M1 again most likely will remain in state x1 -- although, if M1 had entered state x3 at t1, then, from the third row of (2), it is clear that M1 could not be in x1 at t2 (because p31 = 0).

Now let us look at another example, a different Markov chain M1' for which the transition probability matrix P⃗1' is the same as P⃗1 in (2) but for which X0' is given by,

f(X0') = (0, 0, 1) .        (4)

Proceeding then as for M1, considering (4) and the third row of (2), we easily see that at time t1, the probability of state x1 will be p31 = 0, that of state x2 will be p32 = .05, and that of state x3 will be p33 = .95. So, for X0' as given by (4), M1' most likely will be in state x3 at time t1.

Finally, let us consider a third Markov chain M1'' in which X0'' now is given by,

f(X0'') = (0, 1, 0) .        (5)

Again, from (5) and the second row of (2) above, at time t1 the probability of M1'' in state x1 will be p21 = .50, of state x2 will be p22 = 0, and of state x3 will be p23 = .50. So, for X0'' as given by (5), M1'' is equally likely to be in state x1 or x3 at time t1, but it cannot be in state x2.

If not obvious, it can be seen from the preceding discussions of M1, M1', and M1'' that the probabilities of finding a Markov chain in a given state at time t1 depend strongly upon the state of the system at t0. The state of the system at time t0, in turn, is governed by X0, which gives the initial distribution at t0. However, the influence of X0 decreases with time, as the final comment on the first chain M1 above suggests: If the processes M1, M1', and M1'' were observed at time t10, say, then the differences due to the different X0, X0', and X0'' would be much less noticeable than they were at time t1. At time t10, M1, M1', and M1'' would have very similar probabilities of being in the various states x1, x2, and x3. With time, the processes become more and more under the control of P⃗ and less and less under the influence of X0.


So, it should not be surprising that it is a fundamental theorem for Markov chains that, given certain mild restrictions, the longer the process is permitted to run, the less do the probabilities for finding the process in a given state depend upon X0 (see Kemeny and Snell, 1960, pp. 70 - 72; Feller, 1968, p. 456). After running for a long enough time, such a process can be said to become probabilistically stationary in the sense that the probabilities of finding the process in a given state at time te, for e a large integer, become independent of X0 (see Karlin, 1966, pp. 20 - 21).

Given a long enough running time te, then, the probabilities pi of finding the system in a given state xi are equal for all Markov chains M with equal one-step transition probability matrices P⃗. These probabilities pi also are stationary (unchanging) at all times te + k or te − k not greatly different from te.

A Markov chain is said to become a stationary process after a long enough running time. For the M1 in our example above, with P⃗1 as in (2) and given any X0, the stationary probabilities (p1, p2, p3)e of finding M1 in states [x1, x2, x3] after a long enough running time te can be calculated easily and in fact are

(p1, p2, p3)e = f(X̄e) = (.24, .05, .71) ,        (6)

within a small rounding error. For the method of calculation, see Kemeny and Snell (1960, pp. 72 - 73).
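The calculation also can be checked by brute force: iterate f ← f·P⃗1 from each of the three starting distributions (3), (4), and (5), and watch the three processes forget their origins. A sketch in Python with NumPy; 500 steps is an arbitrary but ample running time.

```python
import numpy as np

P1 = np.array([[0.90, 0.05, 0.05],
               [0.50, 0.00, 0.50],
               [0.00, 0.05, 0.95]])

# Iterate f <- f P1 from the starting distributions (3), (4), and (5);
# all three converge to the same stationary vector, equation (6).
for f0 in ([1, 0, 0], [0, 0, 1], [0, 1, 0]):
    f = np.array(f0, dtype=float)
    for _ in range(500):
        f = f @ P1
    print(np.round(f, 2))   # -> [0.24 0.05 0.71] each time
```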

A second useful property is that a Markov chain which has become probabilistically stationary as described above is said to have come into equilibrium (Kemeny and Snell, 1960, p. 80; Feller, 1968, pp. 394 - 395; Karlin, 1966, p. 20). A Markov chain can be started at t0 in equilibrium if and only if the initial probability distribution f(X0) has been set equal to the limiting stationary distribution f(X̄e) which eventually would be reached for any f(X0), given enough time for the process to come into equilibrium. The term "stationary process" often is used and is equivalent to "process in equilibrium"; this term should not be confused with "stationary transition matrix", which refers to a property of all Markov chains, whether or not they are in equilibrium (Karlin, 1966, p. 21).

In a deeper sense, our use of the words "reversible" and "equilibrium" is metaphorical. "Reversibility" and "equilibrium" derive directly from the deterministic thermodynamic models discussed previously. Reversibility and equilibrium are concepts guiding our thoughts about stochastic processes according to what we know about deterministic processes. Thus, reversibility and equilibrium are metaphors which transform thoughts about stochastic processes along deterministic lines. It should not be surprising, then, that it is found that a Markov chain in equilibrium is a reversible chain (Kemeny and Snell, 1960, p. 105).

The deterministic-stochastic metaphor is not exact, of course, because, if it were, deterministic thermodynamics and stochastic Markov chains would be perfectly redundant, both operationally and conceptually. Some of the words coincide, as they must, for a metaphor. This coincidence extends to the usage in which both thermodynamic processes and Markov chains which are in "equilibrium" may be said to be "reversible". It must be emphasized, though, that the state of a given classical thermodynamic process in equilibrium either is constant or is changing very gradually, by differentially small increments (e. g., Resnick and Halliday, 1966, pp. 620 - 621). By contrast, the state of a given Markov chain in equilibrium is fluctuating just as randomly (as given by P⃗) as when it was not in equilibrium. It is this randomness, in fact, which is the difference between deterministic and stochastic processes.

In limiting cases, the meanings of some deterministic and stochastic words may be shown to be operationally, but not conceptually, equivalent, as in the equivalence of classical thermodynamic "entropy" to information theory "uncertainty". However, such an equivalency is not a one-to-one equivalence of deterministic to stochastic system states; it is, instead, an equivalence of differentially changing deterministic states to differentially changing stationary probability distributions. The equivalence is strengthened but not justified because changes in the averages of large numbers of stochastic processes may be shown, in the limit, to be equivalent to changes in a single deterministic process. Feller warns against confusing deterministic and stochastic "equilibria" (Feller, 1968, p. 456; see also p. 395).

In the present context, for a large number N of identical, independent Markov chains simultaneously running in equilibrium, the expected values E(p1), E(p2), . . . E(pn) of the proportions of the N chains in each of the n states x of the system at a given instant te are stationary expected values and are, state for state, equal. These expected values are, in fact, given by the equilibrium distribution f(X̄e). That is,

[E(p1), E(p2), . . . E(pn)]e = f(X̄e) = (p1, p2, . . . pn)e .        (7)

Returning to our example chain M1 above, if there were N = 100 Markov chains M1, as given by P⃗1 in (2), all running simultaneously in equilibrium, then (6) above would lead us to expect that, at any time of observation te, 24 of those 100 chains would be in state x1, 5 in state x2, and 71 in state x3.
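This expectation is easy to verify by Monte Carlo methods. The sketch below, in Python with NumPy, starts 100 independent copies of M1 in the equilibrium distribution (6) and advances each copy one step; the counts per state hover near 24, 5, and 71, as (7) predicts. The random seed is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)

P1 = np.array([[0.90, 0.05, 0.05],
               [0.50, 0.00, 0.50],
               [0.00, 0.05, 0.95]])
pi = np.array([0.24, 0.05, 0.71])   # equilibrium distribution, equation (6)

# Start N = 100 independent chains in equilibrium, then advance each one
# step; in equilibrium, the expected state counts do not change.
states = rng.choice(3, size=100, p=pi / pi.sum())
states = np.array([rng.choice(3, p=P1[s]) for s in states])
print(np.bincount(states, minlength=3))   # roughly [24  5 71]
```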

At this point, it seems reasonable to emphasize, again, that as we have defined them, Markov processes do not have memories, regardless of whether they are Markov chains and regardless of whether they are in equilibrium or are reversible or not.

A third useful property is that a Markov chain may or may not be lumpable with respect to a given partition of its states.

In this context, a partition is a set which is formed by pooling or classifying together some of the members of another set. More formally, a partition [s'] of a set [s] is a collection of mutually exclusive nonempty subsets of [s] such that every element of [s] is in one of the subsets of the collection. A more complete definition is given by Kinsolving (1967, p. 27), but we require no more for now.


For example, the first two elements of the set [s] above might be pooled together as follows:

[s] = [s1, s2, s3, . . . , sn]  becomes  [(s1 ∪ s2), s3, . . . , sn] .

Now, if the subset (s1 ∪ s2) is called s'1, and if s3 is called s'2, and so forth, the new partition [s'] can be defined as follows:

[(s1 ∪ s2), s3, . . . , sn] = [s'1, s'2, . . . , s'n−1] = [s'] .

We shall be dealing with finite Markov chains -- which is to say, with Markov chains for which the set [x] of distinct states which may be entered by the system consists of a finite, integral number of them. To see how such states might combine, suppose that two or more states of a given Markov chain M are pooled to form the partition [x']. The stochastic process M' resulting from such a pooling of states will be a different process from the original M. In an extreme example, if all the states of the chain M were pooled, then the new process M' would consist of one which could exist in only one state; the partition [x'] would contain only one element; and, in this example, the one-step transition probability matrix P⃗ of M would degenerate to a one-row, one-column matrix P⃗' given by P⃗' = (1).

Now, a Markov chain M is said to be lumpable with respect to a given partition [x'] if, for every choice of initial starting distribution f(X0)', the pooled-states process M' with states [x'] is a Markov chain, the one-step transition probability matrix P⃗' being independent of the choice of f(X0)'. On this definition, see Kemeny and Snell (1960, p. 124), where their partition A is the present author's partition [x'].

Pooling the states of a lumpable chain produces a Markov chain, whereas, in general, pooling the states of a Markov chain results in a process which is not a Markov chain (Bartholomew, 1967, p. 18). The choice of a particular partition [x'] determines whether or not a given Markov chain M will be lumpable with respect to [x']. Examples of chains which are lumpable with respect to one partition [x'] but not with respect to another can be found in Kemeny and Snell (1960, pp. 125 and 134).
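Lumpability with respect to a given partition can be checked mechanically. The sketch below, in Python with NumPy, applies the row-sum criterion associated with Kemeny and Snell: for every pair of blocks, the one-step probability of moving into a block must be the same from every state within a block. The example matrix and partition are invented for illustration.

```python
import numpy as np

def is_lumpable(P, partition, tol=1e-12):
    # Row-sum criterion: the chain is lumpable with respect to the partition
    # iff, for every pair of blocks (A, B), the one-step probability of
    # moving into block B is the same from every state in block A.
    for A in partition:
        for B in partition:
            into_B = [P[i, B].sum() for i in A]
            if not np.allclose(into_B, into_B[0], atol=tol):
                return False
    return True

P = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.40, 0.40],
              [0.2, 0.10, 0.70]])
# Lumpable with respect to [{0}, {1, 2}]: states 1 and 2 each send
# probability 0.2 into block {0} and 0.8 into block {1, 2}.
print(is_lumpable(P, [[0], [1, 2]]))   # True
```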

Choice of [x'] also can result in a pooled-states process M' such that there will exist at least one choice of f(X0)' for which M' is a Markov chain. If there exists at least one such choice of f(X0)', then the process M' is said to be weakly lumpable with respect to [x'] (see Kemeny and Snell, 1960, pp. 132 and 134).

Two important theorems connecting the lumpability of Markov chains with the two previously-discussed properties of reversibility and equilibrium will be stated next, without proof:

Theorem 1. A reversible Markov chain is reversible when lumped. This can be found in Kemeny and Snell (1960, p. 137). Stated in other words, if a reversible Markov chain M is lumpable with respect to a certain partition [x'], then the pooled-state chain M' also will be reversible. The reversibility of a Markov chain is not lost by pooling states, provided the chain is lumpable with respect to the pooling process.

Theorem 2. For a reversible Markov chain, weak lumpability implies lumpability. This also can be found in Kemeny and Snell (1960, p. 138). This theorem means the following: Suppose the states of a Markov chain M are pooled in a certain way [x']. Suppose, too, that it is found that the resulting pooled-states process M' is a Markov chain, provided that M' is started in any one specific initial distribution f(X0)' defined on [x'], which is the set of pooled states. This theorem states that if the original chain M is reversible, then any choice of starting distribution f(X0)' will make M' a Markov chain.

The two theorems immediately above can be shown to hold with certain mild restrictions on the nature of the Markov chain involved. The second theorem, plus the stationarity theorem stated previously, yields the result -- assumed by the present author -- that, with respect to a given partition [x'], a weakly lumpable Markov chain in equilibrium is lumpable. Recalling that equilibrium refers to the long-run expected distribution of states of many simultaneously-running, identical, and independent Markov chains, this result leads to the conclusion that, after a long-enough running time, a process M' will be, on the average, indistinguishable from a Markov chain; this, provided that M' was formed by the partitioning of a Markov chain M such that M is weakly lumpable with respect to that partition.

Lumpability of a Markov chain would seem not to have any useful analogue for deterministic processes. One would not expect such an analogue because, as was mentioned above, deterministic processes for the most part involve continuous-time systems which pass through uncountably infinite numbers of (continuous) states. For this reason, identification of the state of a deterministic process at any specific time t generally consists of the measurement of the value(s) of some continuously varying dependent variable, or collection of variables, at that time t.

Calculus and other tools of mathematical analysis enter directly into the definitions of the majority of deterministic models, and only a sort of conceptual clumsiness would result from discrete-time manipulations such as the pooling of values, if pooling were applied to the range of a deterministic dependent variable with the misguided intention of forming a discrete number of intervals along that range.

Sometimes, deterministic problems can be solved by discrete-valued computer operations; and, sometimes the formal bases of certain branches of mathematical analysis can be derived from discrete partitioning of continuous variables. One example of this last would be the setting up of the Riemann sums which yield the integral calculus (Johnson and Kiokemeister, 1964, pp. 183 ff.; Rudin, 1964, pp. 104 ff.). These applications notwithstanding, to the present author's knowledge, no deterministic model of a time process employs discrete-valued system states.

Thus, lumpability, unlike reversibility or equilibrium, has no function in current models of deterministic time processes. So, the classical thermodynamic metaphor being destined to fail, a scientist can use the Markov chain concept of lumpability only with such clarity of comprehension as might be derived (a) from everyday experience, where "lumping" means the sticking together of (discrete) things, (b) from experience with mathematical operations (metaphors) applied to probabilistic models involving the pooling of discrete categories, states, or outcomes, or (c) from practice with the lumpability concept itself, in the context of Markov chain models.

To complete our discussion of these properties of Markov chains, it should be mentioned in a somewhat picayune way that the lumpability metaphor should be sidestepped when considering the seemingly symmetrical operation of the expanding of a Markov chain. Indeed, one would expect the name of the operation opposite to lumping to be, say, "separating". One does not "expand" lumped things -- one separates them, or unlumps them, or, maybe, unpacks them. But, when a Markov chain M is expanded, a new Markov chain M' is formed from M by defining a new one-step transition probability matrix P⃗' on the pairs of states entered successively by the original process M in accordance with the original transition probability matrix P⃗ (Kemeny and Snell, 1960, pp. 144 - 145). In the case of expanding, the lumpability metaphor justly is sidestepped, because, as may be concluded from Kemeny and Snell's definition (1960, pp. 140 - 141), a Markov chain can be expanded in only one way, whereas the same chain can be lumped with respect to any number of possible partitions. The fact that the symmetry of the words "lumpable" and "expandable" is less than exact also helps keep straight the nonsymmetrical operations involved.

The distribution of waiting times. It was mentioned previously, in connection with semi-Markov processes, that a Markov chain entails decision points which must vary either regularly or randomly in time. One implication of this is that the sampling of a running Markov chain must be performed at fixed or at random intervals if the sampled result itself is to be a Markov chain (see Bartholomew, 1967, pp. 28 - 30; Cane, 1959, pp. 45 - 46).

Intervals between the decision points at which changes of state occur are called waiting times. For a Markov chain, waiting times tij are defined to describe changes of state from any state xi to any other state xj. The waiting time tij is the length of time in xi before the system enters xj. The waiting time refers to time determined by the chain itself, not to any possible triggering process which might be imposed by a non-Markov controlling device.

The lengths tij of Markov chain waiting times must be randomly distributed in time. This randomness follows because the probability that a given change of state will occur at a given decision point always may be assumed to be greater than zero and at least sometimes less than 1, for the mildly restricted kinds of Markov chains to which, so far, we have limited the discussion; these chains are called regular Markov chains. A different kind of Markov chain, an absorbing chain, allows the system to enter an "absorbing" state which cannot be exited thereafter. For an absorbing state xi, the ith entry of the ith row of the chain's one-step transition probability matrix P⃗ is equal to pii = 1. A general discussion of absorbing Markov chains may be found in Karlin (1966, pp. 30 - 34); a detailed treatment is in Kemeny and Snell (1960, ch. 3); and, interesting related expositions are in Hersh and Griego (1969) and Gardner (1969).


Before going further, it should be specified what was meant when it was stated that Markov chain waiting times must be randomly distributed in time. The question is, Within a given Markov chain, what sort of random variable T can describe the distribution of lengths tij of waiting time for passage from state xi to state xj?

The answer can be found in the primary restriction on the choice of T: The random variable T must have no memory.

Otherwise stated, the distribution of T must in no way depend upon the past. In particular, the probability that a given waiting time tij will terminate at time t (i. e., the probability that the system will change from state xi to state xj at time t) must not depend upon the length of the waiting time before t; if it did, the chain would have a memory.

Recall that, for a Markov chain, the one-step transition probability matrix P⃗ does not change with time. T therefore must be a random variable such that the probability that a given waiting time tij beginning at time t1 will terminate at a certain time t2, with t2 > t1, must be constant for all time intervals of length t2 − t1. Following this line of reasoning, it can be proven that, for discrete-time Markov chains, T must be geometrically distributed (Feller, 1968, pp. 164 - 166; 268 - 269). Of all random variables, only those which are geometrically distributed can describe discrete durations of probabilistic phenomena which have no memory (see Feller, 1968, pp. 328 - 329).

To illustrate this point, a typical geometric frequency function has been graphed here in Figure 1.

Fig. 1: The discrete-time geometric probability distribution f(T) = (1/6)·(5/6)^t. After Equation (9) below, and photocopied from the original handwritten jpeg image.


The graph of Figure 1 can be interpreted to describe a distribution of waiting times tij with constant probability of termination p = 1/6 ≈ .17. In general, the frequency function f(T) for the geometric distribution with parameter p is given by

f(T) = { p(1 − p)^t ,   t = 0, 1, 2, . . .
       { 0 ,            otherwise,        (8)

which, for p = 1/6, becomes

f(T) = (1/6)·(5/6)^t ,   t = 0, 1, 2, . . . ,        (9)

omitting the "otherwise" for brevity. Equation (9), graphed above in Figure 1, should be interpreted to give the probabilities that the system will change from state xi of present time t = 0 to its next, succeeding state xj at the time t = 1, t = 2, etc. For example, from Figure 1, the probability P [tij = 0] that T will assume the value t = 0 and terminate the waiting time immediately is just P = 1/6 ≈ .17 -- as it must be, because p = 1/6 from above.

Continuing with the mathematical considerations, as the discrete time intervals become more and more finely subdivided, the vertical bars of Figure 1 become more and more numerous; eventually, the points marking the upper tips of the bars become a continuous line. The discrete random variable T becomes, for all practical purposes, a continuous random variable Z. In the limit, the geometric distribution can be shown to approach the exact form of the exponential distribution (see Feller, 1966, pp. 1 - 2; 8; 1968, p. 458).

For continuous-time Markov chains, then, waiting times will be exponentially distributed.

The exponential distribution can be proven to be the only continuous frequency distribution, also referred to as a density, which has the Markov characteristic -- that is, which has a complete lack of memory (Feller, 1966, p. 8; 1968, p. 458). A rigorous proof of the unique lack of memory of exponentially distributed random variables can be found in Feller (1968, pp. 459 - 460) and is related to Karlin's comments on the Poisson process (Karlin, 1966, pp. 181 - 183). The general relationship between Markov and Poisson processes is discussed in Feller (1968, pp. 444 - 460).

The exponential density g(Z) with parameter a is given in general by

g(Z) = { a·e^(−at) ,   0 ≤ t < ∞
       { 0 ,           otherwise.        (10)

For the special case of a = 1/5,

g(Z) = (1/5)·e^(−t/5) ,   0 ≤ t < ∞ .        (11)

It should be recalled that continuous frequency functions yield probability densities, not probabilities, in the ordinate values of their graphs; thus, the probability that the random variable Z above, say, will assume any (point) value along the continuous t-axis must be 0, because such values are uncountably infinite in number along any interval of finite length. For continuous random variables such as Z, nonzero probabilities will be found associated only with finite lengths or intervals along the t-axis.

So, from g(Z) in (11) above, the probability P[0 ≤ tij ≤ 1] that the system will change states so as to terminate the waiting time tij at some time t between t = 0 and t = 1 is given by

P = ∫₀¹ g(t) dt = ∫₀¹ (1/5)·e^(−t/5) dt = [−e^(−t/5)]₀¹ ≈ .18 .        (12)

As a result, we have P[0 ≤ tij ≤ 1] ≈ .18 for the continuous case under consideration, and this clearly is very close in magnitude to the f(0) = .17 = P[tij = 0] of the intentionally similar discrete geometric distribution graphed in Figure 1. For a direct comparison with Figure 1, the exponential density of equation (11) is sketched in Figure 2.

Figure 2: The exponential density function g(Z) = (1/5)·e^(−t/5). After Equation (11) above, and photocopied from the original handwritten jpeg image.
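The two probabilities just compared can be reproduced in a few lines of plain Python (standard library only); the printed values .181 and .167 round to the .18 and .17 quoted above.

```python
import math

a, p = 1 / 5, 1 / 6

# Equation (12): probability that an exponential waiting time with a = 1/5
# terminates between t = 0 and t = 1.
P_exp = 1 - math.exp(-a)     # = [-e^(-t/5)] evaluated from 0 to 1

# The corresponding geometric probability from Figure 1: f(0) = p.
f0_geo = p

print(round(P_exp, 3), round(f0_geo, 3))   # 0.181 and 0.167
```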

Considering the general exponential density g(Z) of (10) above, let us calculate G(Z), its cumulative distribution function, plotted from the left. G(Z) is the distribution function which gives the probability that Z will assume some value greater than or equal to a given value Z = t. Thus,

G(Z) = P[Z ≥ t] = ∫ₜ^∞ g(u) du = ∫ₜ^∞ a·e^(−au) du ; or,

G(Z) = e^(−at) ,   0 ≤ t < ∞ .        (13)


Thus, G(Z) is an exponential function with exponent equal to the exponent in the density function g(Z). The left-cumulative distribution function G(Z) and the density function g(Z) for parameter value a = 1/5 have been graphed together in Figure 3, below.

For this Markov chain model, the left-cumulative distribution G(Z) in Figure 3 gives the frequency of waiting times tij which are equal to, or longer than, the value of t on the abscissa.

Figure 3: The left-cumulative distribution function G(Z) = e^(−at) with a = 1/5 for the exponential density of Figure 2. The density from Figure 2 is replotted -- notice the change in ordinate scale, compared with that of Figure 2. After Equation (13) above, and photocopied from the original handwritten jpeg image.


Finally, in Figure 4 below, f(T), g(Z), and G(Z) have been plotted on a semilogarithmic scale; the semilog graph of G(Z) has a special name; it is called a "survivorship curve" and is discussed later.

Figure 4: The graphs of Figures 1 - 3 plotted on a semilogarithmic frequency scale. When plotted in semilogarithmic format, the left-cumulative distribution of waiting times is referred to as a survivorship curve. Photocopied from the original handwritten jpeg image.

Because the exponents of g(Z) and G(Z) are equal (equations (10) and (13) above), their graphs become parallel straight lines in Figure 4. In Figure 4, the y-intercept of g(Z) is 0.20, which is the value of a; also, that of G(Z) is 1.0, and that of f(0) is of course 0.17.

On a scale with semilogarithmic ordinate, all exponential functions graph as straight lines, but with varying slopes and intercepts. Notice in Figure 4 that the graph of the geometric frequency distribution exponential f(T) systematically crosses that of the exponential density g(Z), with f(T) graphing lower than g(Z) for t less than 8; it becomes higher than g(Z) for t equal to, or greater than, 8. The mean slope of the graph of f(T) is less negative than the slope of g(Z), because g(Z), as given by equation (11), is not the limiting distribution of T. In regard to this, recall that the parameter of g(Z) is a = 1/5 and the parameter of f(T) is p = 1/6; if a and p were equal, then f(T) would not cross g(Z) when plotted in Figure 4. For a = p, g(Z) and f(T) approximately would coincide.

The graph of the left-cumulative distribution function G(Z) in Figure 4 is called a survivorship curve and is used by Nelson (1964a, 1964b) to detect nonrandomness (viz., memory) in waiting time data. The graph of G(Z) will be a straight line if and only if Z is exponentially distributed. If the graph of an obtained G(Z) were convex upward, then the random variable Z would have to have been such that changes of state from xi to xj were too regularly spaced in time for Z to have been exponentially distributed. If the graph of an obtained G(Z) were convex downward, then Z must have been such as to cause states xi and xj to tend to follow one another too quickly for Z to have been exponentially distributed (see Nelson, 1964a, p. 530).

As might be gathered from Figure 4 above, the left-cumulative (discrete) distribution function of a discrete random variable T also would plot as centered along a linear survivorship curve if and only if T were geometrically distributed.
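As a sketch of how such a survivorship check might be automated -- this is not Nelson's procedure, only a modern Python illustration of the straight-line criterion, and the function name and comparison distributions are our own assumptions -- one can compare the semilog slope of the empirical G(Z) on the lower and upper halves of the data:

    import numpy as np

    rng = np.random.default_rng(0)

    def semilog_slopes(waits):
        """Slopes of log G(t) fitted to the lower and upper halves of the data.

        For memoryless (exponentially distributed) waits the two slopes agree,
        because the survivorship curve is one straight line on semilog axes;
        disagreement signals the convexity discussed in the text.
        """
        t = np.sort(waits)
        g = 1.0 - np.arange(len(t)) / len(t)        # empirical P[Z >= t]
        keep = g > 0.01                             # avoid log(0) in the tail
        t, log_g = t[keep], np.log(g[keep])
        half = len(t) // 2
        s_low = np.polyfit(t[:half], log_g[:half], 1)[0]
        s_high = np.polyfit(t[half:], log_g[half:], 1)[0]
        return s_low, s_high

    # Exponential waits (mean 5): both slopes come out near -a = -0.2.
    print(semilog_slopes(rng.exponential(scale=5.0, size=5000)))
    # "Too regular" waits (uniform on [0, 10]): the two slopes differ markedly.
    print(semilog_slopes(rng.uniform(0.0, 10.0, size=5000)))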

The preceding discussion of waiting times concludes the presentation of the theoretical background of stochastic time processes in general and of Markov chains in particular.

We now are ready to turn to the ethological applications of Markov chain models and to the related metaphors.


III. The organism-environment metaphor: Experimental psychology and ethology distinguished.

Metaphor and the Operational Definition

As Embler writes, when we use a word metaphorically, we are using a fact of nature with which to form an idea (Embler, 1968, p. 393). Artifacts, in particular, are visible, familiar, public, operational facts of nature. On this topic, a reference article on the bases of language is worth some contemplation:

". . . the ways of talking about artifacts and about how to make [and use] them constitute the bulk of the vocabulary of any language and the largest proportion of all actual speech. . . . In all kinds of metaphors, or even in plane talk about other areas of culture, there is constant [metaphorical] use of vocabulary items and turns of phrase coming from technology and the allied fields ('Strike while the iron is hot,' 'As ye sow, so shall ye reap,' 'The mills of the gods grind slowly,' . . . and many more) . . . The systems involving the use of materials in making symbols are of fundamental importance to the symbolic systems as such, especially communication."

-- Trager (1957, p. 698)

Notice Trager's own metaphorical use of "allied fields" -- fields allied to farmland, or maybe vehicle parking, in which, for example, ideas might be sown.

Not all metaphors are based on artifacts, of course, but one of the defining characteristics of a metaphor is that it always leads from a well-comprehended situation to another situation. In particular, all operational definitions are definitions by metaphor, because operational definitions take perceivable, public, often artifactual sets of actions as models and then transform the usages of the defined words according to the ways by which those sets of actions are thought. Because their models are perceivable, operational definitions tend to be unambiguous, at least among human beings who are sharing the same cultural backgrounds.

For example, take "intelligence," which may be defined operationally as what IQ tests measure. If we are familiar with IQ tests (how they are made up, answered, scored, etc.), then the definition guides us to a certain (partial) comprehension of intelligence.

More generally, operational definitions in a given science will yield insight into the phenomena defined in that science if and only if one is familiar with the tools and experimental procedures of that science. Related to this are assertions such as the one stating that "if you can not do physics problems, then you don't understand physics".

The Organism and the Environment

The Organism-Environment Metaphor

As was mentioned above in passing (p. 4), the terms "organism" and "environment" reflect the presence of a commonplace metaphor in the behavioral sciences. This organism-environment metaphor might seem to be based upon the model of a biological "organism" which lives in a certain "environment". However, in experimental psychology, the organism most often is thought of as a system which is controlled by its environmental energy inputs, as in the system-environment model of, say, classical thermodynamics (see Denbigh, 1955, pp. 5 - 6 for three kinds of thermodynamic system-environment model). The organism is thought of as a system which receives stimuli across its boundaries and which changes its state as a result.

As a single instance in experimental psychology, it probably is historically accurate to say that, guided by the system-environment metaphor, a series of Skinner-box experiments was performed by Skinner and others (Skinner, 1959); a Skinner-box model of behavior was developed; and, at the same time, a Skinner-box metaphor was formed. Such a model thereby was revealed by the terminology, and, within the metaphor, this model then enabled those familiar with the experiments to comprehend behavior in general, at least in a Skinner-box way.

What is E?

In experimental psychology, the experimenter, and the contingencies invented for experimentation, often are identified with the environment. For instance, in discussing the problem of how to decide which events should be classified together to define a "response" in an experiment in learning theory, Logan has proposed that

"Responses are separated if the environment . . . distinguishes between them in administering rewards . . .. Thus, if the reward is given independently of how [a] rat gets into the correct goal box of a T-maze, then the various ways of doing it can be classified together [as the same response]"

-- Logan, as cited in Luce, et al (1963, p. 20).

Obviously, Logan meant experimenter when he wrote environment, for, surely, lacking spiritual collaboration, an "environment" cannot distinguish between responses or administer rewards. Or can it? -- Are the anthropomorphic metaphors still of value to a scientist in comprehending the operation of the conceptual entities defining scientific thinking? Probably yes, but probably it is progress of some sort that modern scientific prose style should demand the use of the passive voice of impersonal thought, a voice crying out from the collective mind of a sort of omni-absent Presence who sets the stage for the action to occur scientifically.

It would be interesting to analyze how Western scientific thought still is guided by metaphors based on cultural artifacts of one or another god, the systematic theological artifacts that succeeded the anthropomorphic gods of the Mediterranean agricultural city-states.

Ethology

One of the defining characteristics of ethology is that identification of the experimenter with the environment carefully is avoided: The environment, for an ethologist, is the "natural" surroundings of an organism in which, presumably, that organism has evolved. The natural surroundings of some organisms such as rats may, of course, include human beings as important factors.

Ethology is the study of behavioral patterns as they interrelate a given organism (a) with its conspecifics and (b) with the environment in which the species has evolved. In ethology, the experimenter primarily is an observer, and, if experiments are performed, then the purpose is to clarify the behavioral relation of the organism with respect to some facet of its environment. The purpose is not to learn how to change the state of the organism. The ethological metaphor is one of biology, not of physics.

Classification of states in ethology. Because the experimenter is not seen as manipulating the organism in ethology, a problem arises as to how to classify the "states" of the organism at a given time. In particular, if the states of an organism are to be seen as undergoing a stochastic process, then, How should these states be identified? Which behavioral elements should be seen as making up the behavioral state-sets on which the process random variable(s) should be defined?

There can not be any one answer to these questions, even for a specific organism, simply because the current metaphor of a given ethologist will determine what is meant by the word "state". Let us look at three different points of view:

1. An ethologist's states may be an exhaustive, finite set such as "eating" or "not eating". But, oversimplification can lead to problems; for example, it is important to realize that an animal is in a different state at the beginning of a meal than at the end (see Skellam, in Cane, 1959, p. 55). Likewise, a dog can be barking or not barking -- but, the condition of the dog's tail as wagging or not wagging often is an important behavioral index, too, and the definition of the dog's state often should take the dog's tail into account. The importance of the dog's tail is an example of the crux of the ethologist's problem in classifying states: If the ethologist were a postal service employee or a judge at a dog show, then a criterion easily could be established which would decide whether the wagging tail was important, and how. But, the ethologist is supposed to be studying states of the organism which are important to the organism itself -- and its species -- in a purposeless "natural" environment. An ethologist can only hope that the observations actually made represent discoveries of what is important for the theoretical model being used or built.

2. An ethologist's set of defined behavioral states may be finite and nonexhaustive; the set may be outlined in a list which can be expanded or further broken down on the basis of behavioral observations during the course of the study or afterwards. Such a list simply may consist of all acts which are readily codable, thus reflecting a trend of the late twentieth century (see Luce, et al, 1963, p. 567). But, in this case, there is a problem of deciding when the list is complete and whether the acts as coded are important. On these issues, for a statistical decision procedure, see Goodman (1949); on the question of importance, see Cane (1959, p. 55) as above and also Altmann (1965, pp. 520 - 521).

3. The ethologist may believe that behavior in some sense is continuous. Consistent with such a belief, or, better, defining it, the ethologist may not code discrete "fixed action patterns" or suchlike; instead, the study of behavior may be in terms of the monitoring of continuous quantities such as weight or location of the subject animals; or, the entire investigation may consist of sound-recordings or photographs of the animals. In any case, both before and after the data are gathered, the ethologist must decide what is important. Regardless of how slavishly the ethologist worships "objectivity", the interests of the science can not be satisfied by just describing the data, or with just making a model to fit either the data or the numbers used to represent the data. Any such model must clarify the situation for the ethologist, or the model is worse than useless. The model must lead to or expand some metaphor, or it fails any purpose at all.

In summary, an ethologist is in a difficult position when deciding how to classify the behavioral states of the organism(s) being studied. The ethologist must be an observer, only; thus, in the search for criteria for classification of states, there may not be any recourse either (a) to personal ethological purposes or (b) to the outside purposes of fellow scientists or the general public.

Otherwise stated, an ethological experimenter may not be the environment of the organism(s) studied. Thus, the importance of a given observable behavioral element, as reflected by the way in which it is classified, must be decided on grounds of importance to the organism and/or to the species of that organism.

An experimental psychologist can invent contingencies and make them stick, like the rules of those human artifacts called games; an ethologist, however, whether realizing it or not, must infer which contingencies, existing in nature, will make the ethological classifications of state important or trivial. For example, an ethologist must infer how "natural selection" shapes behavior so that "pro-survival" behavioral traits are "preserved". Because such inferences are guided by ethological metaphors, the ethologist can only have faith that the models constructed somehow do correspond to the actual states of affairs which are reflected in the contingencies laid down by nature.

Metaphor in ethology. As we have seen, metaphors determine how an ethologist (or anyone else) comprehends a given situation. It follows, then, that the number of ways in which the observed states of an organism can be classified will be a function of the number and kinds of metaphors available for comprehending that organism's behavior.


Each different metaphor guides the ethologist to a different comprehension of the states of the organism being studied. Each metaphor yields at least one different set of states -- states which perceptually are constructed by the ethologist, that is, by the way the ethologist thinks. If the study is scientific, then the ethologist will be gathering new information continually, and each new piece of information will be comprehended in turn, fulfilling or altering the model, confirming or changing the metaphor.

Let us define an accurate metaphor as a metaphor based upon a model which correctly fits some set of data. We assert again that it is metaphor, whether accurate or not, which guides the thoughts and the behavior of any experimenter. In particular, a metaphor guides the experimenter's behavior when he designs or uses artifacts or tools.

In general, if an experimenter's metaphor is not accurate, and if the experimenter should care to make predictions, those predictions will not be confirmed by the data, which is to say, by the situation being studied. Such an experimenter will find the situation surprising or confusing, or both. In fact, with an inaccurate metaphor guiding all thinking, an experimenter probably will not be able to recall accurately the data gathered while forming the badly-fitted model. But, this is a whole other subject, the study of metaphor and mnemonics.

In ethology, the thoughts and behavior of the experimenter ideally are not relevant to the behavior of the animal(s) being studied. Certainly, the ethologist's own behavior, and the functioning of the ethological experimental artifacts, if any, can be concealed well from the subject(s), if necessary, with no ill effect upon the data. This is not so in experimental psychology, where the contingencies in their effects are direct results of the behavior of the experimenter and/or of the functioning of the experimental artifacts; otherwise stated, the experimental psychologist's metaphor physically affects the subject(s). To use a playful metaphor, the contingencies, or rules of the game, in experimental psychology reflect the metaphors of the experimental psychologist, and that experimental psychologist aims at finding out how the subject plays the game. By contrast, the rules of the game in ethology are the inferred rules of nature, and the ethologist's ultimate aim is to form a metaphor consistent with that of the Deity, perhaps, or of something equivalent.

In ethology, then, as opposed to experimental psychology, there is some justification for considering the experimenter as isolated both from the organism under study and from its environment. Ideally, the metaphor-consistent working practice of an ethologist physically will not alter the data gathered to test a model, whether or not the data happen, in fact, to be important. The ethologist, like the experimental psychologist, can design valid scientific models to fit observed changes of state of the organism in its native environment. However, unlike the experimental psychologist, who cannot be included as such in the model, the ethologist can design valid scientific models to fit observed changes of state of the isolated organism-environment system as a whole. Both the experimental psychologist and the ethologist can see the states of an organism as undergoing a stochastic time-process and can validly test classifications of those states; such classifications may be of sets of behavioral elements taken to represent theoretical states, for example to determine whether the organismic process involved might be, say, a Markov process.

For ethology, though, regardless of how accurate might be a metaphor for comprehending the data, and regardless of how valid might be the tests of a formal model, the importance of the data, as gathered, must remain an open question -- open for so long as ethology separates the experimenter from the environment.


IV. The ethogram and Markov models in ethology.

Metaphor and the Ethogram

An ethogram is defined as a collection of field observations which catalogues all the observed behavior of some species over the entire life cycle (see Hess, 1962, pp. 159 - 160). Ethologists ideally believe that all the facts on behavior must be acquired before hypotheses are formulated (Hess, 1962, p. 160). The history of this belief is outlined by Hess, who also has discussed how ethograms might be used once they have been formulated (Hess, 1962, p. 160; pp. 199 ff.).

Problems of the Ethogram

Inherent obsolescence. Some of the inherent difficulties of behavioral catalogues such as ethograms were hinted previously above, where classification of states was discussed. What was hinted was the idea that if any objective behavioral catalogue were used to formulate an hypothesis, which is to say, was made important to the ethologist, then it would be necessary to check the validity of that hypothesis against actual animal behavior, not just against the catalogued data. This hint just alludes to the difference between the study of recorded history and the scientific method.

For example, suppose an ethogram of the Impeyan pheasant contained the observation that at a certain phase of courtship, the male raises and spreads his tail and wings while bending his head low, and that the female then begins to search for food in front of him (see Hess, 1962, p. 204).

Now, suppose a certain ethologist wished to conclude from the preceding observation that the male's behavior, as outlined, was a "releaser" for food-seeking behavior in the female. If so, the ethologist would have to make a separate set of field observations of the behavior of the Impeyan pheasant in order to gather data to verify the new, releaser hypothesis. Our ethologist would have to gather new data for this standard, universal reason: The ethogram, because it was objectively drawn up with no such releaser hypothesis in mind, might have misclassified or omitted details and/or large-scale aspects of the Impeyan pheasant's courtship behavior which might be crucial to the hypothesis and which would be obvious only to an observer who was aware of the releaser hypothesis in question.

Again, this is just the scientific method: It is assumed that scientists change with time, and that scientific progress is possible. Therefore, it is assumed that after formulating an hypothesis, a scientist will be able to do more in the experimental situation than that scientist would have been able to do before. In particular, it is assumed that something new will be visible to a scientist after a new hypothesis has been formed. It is assumed that such a scientist's new hypothetical model will alter the metaphor associated with it. Therefore, the scientist's data records must be revised continually so that they will reflect that scientist's increasingly sharper comprehension of the world.

In the Impeyan pheasant example, the ethogram might not have included an observation that, say, the female's food-searching behavior is elicited only when the birds are oriented so that the male's shadow falls upon the ground directly in front of the female. The shadow might be the releaser, not the male's own bodily movement. Discovery of the shadow releaser would mean that the entire ethogram would have to be revised to include time-of-day and compass-direction information for the birds during courtship. The old ethogram would be obsolete.

In any case, any ethogram is inherently obsolescent to the extent that it contains scientifically valid data. In science, new models continually are becoming new metaphors; and, the new metaphors continually are introducing new comprehensions to replace the old.

Elusiveness of "importance to the species". As previously discussed, ethologists must deal with the inherent problem of deciding which behavioral acts of a given organism are important to its species. As implied in the preceding paragraphs, it is certain that, at any given time of observation, important behavioral acts or classifications of state will be invisible both to the ethologist and to any artifacts -- pieces of recording equipment -- in use in the field. These conditions will be invisible simply because the ethologist's current metaphor does not permit them to be thought, and because, therefore, the available apparatus has not been built or operated to detect them.

Examples of invisible but potentially important classifications of state are very numerous, because they collectively are equivalent to the history of behavioral science. In experimental psychology, for example, almost no experimental data gathered before 1930 is of any use in confirming or refuting any 1960's hypothesis put forth in the field of operant conditioning. Before 1930, no one knew what an operant looked like; the idea of an operant, right or wrong, didn't yet exist.

In science, new experiments must be performed, because the old-time experimenters just didn't pay attention to what is now important. The same holds for the current year of 2012. Our present-day comprehensions of scientific phenomena depend only in part, and in general, on the comprehensions of past experimenters.

Some ethologists attempt to sidestep the importance-to-the-species problem by saying that they will concentrate upon classifying behavioral acts which are relatively inflexible, species-specific instinctive acts, because such behavioral classifications will be "so overwhelmingly apparent" to them (Thorpe and Zangwill, 1961, p. 87). Yes, ethologists do seem to recognize that problems exist in classifying the units of behavior; after all, that is where the problems involved with Markovian splitting and lumping units are to be found (Altmann, 1965, pp. 492 - 493). However, ethologists rarely seem to recall that their units, their instinctive acts, and their fixed action patterns, which were so "overwhelmingly apparent" to them in 1970, were not at all overwhelmingly apparent even to the best educated, most intelligent naturalists before Darwin.

Models lead to metaphors, metaphors guide thoughts, and very few data that can't be thought are recorded. Few of these ethologists recognize that it is their metaphors which underlie their behavioral classifications at any given time.

Similar problems in field anthropology. Although the present chapter is not the place to comment on human behavior as such, it is interesting to note that anthropological field studies are plagued with the same recording problems as those of the ethogram -- but, only more so! A field anthropologist sees and records only what that anthropologist can comprehend. Especially, such persons see only what is important to their hypotheses or what is important in the light of what that person considers "objective". Examples of such recording problems, problems in which different field anthropologists have seen wholly different cultural psychologies in the same societies, are cited by Krech, et al (1962, pp. 358, 360). These examples include the famous study by Margaret Mead of the New Guinea Arapesh. Without intending any specific failure, one does wonder whether Margaret Mead actually saw (= a metaphor) men's "womb-envy" of women (e. g., Mead, 1949, pp. 78 - 104), or whether her "womb-envy" was more a simple comeback to counter Sigmund Freud's childhood-based metaphor of women's "penis-envy" of men. Metaphorical dreamers, both of them -- like us all. The model versus metaphor question with regard to womb-envy or penis-envy or of any other explicitly postulated model meant to govern human behavior is one aspect of a fundamental indeterminacy which will be discussed further in a future work.

Avoiding the Ethogram Problem in Markov Chain Models

As suggested by Cane (1959, pp. 45 - 46), a behavioral record such as an ethogram can be produced as a system in its own right, not merely as a record of the states of some unspecified behavioral system. In particular, in studying a given set of observations as a stochastic process, the record of the set of observations may be considered as the thing being sampled (Cane, 1959, pp. 45 - 46); thus, all classifications of state become internals of a single observer-centric system. States classified by the observer become valid data as sampled. If, as it happens, these data are observed as a Markov chain, then the observation itself, as sampled, also will be a Markov chain (Cane, 1959, p. 46).

Thus, if the "real" process in fact is a Markov chain, then, by studying the record rather than the behavior as such, the ethogram problems discussed above can be avoided: Out of sight, out of mind. However, the sampled record approach does not remove sampling biases which might distort inferences concerning certain non-Markov-chain models (see Cane, 1959, pp. 46 - 48).


Applications of Markov Models in Ethology

In this section, we shall be studying in detail certain ethological writings by the following authors: Landau, Cane, Altmann, and Nelson. Before beginning, we should recall that one of the features that distinguish stochastic models from, say, game theory models is that stochastic models incorporate psychological parameters applicable to the subjects or systems undergoing the process of interest (see Luce, et al, 1963, pp. 571 - 572). If such psychological parameters are dependent upon the past history of the process, then the process cannot be a Markov process. Such stochastic models also may incorporate environmental parameters such as contingencies of reinforcement; such contingencies also must be memoryless, for a Markov process (see Luce, et al, 1963, p. 572; Cane, 1961, p. 372).

The question of Markovian status has been resolved differently by Luce, et al (1965, pp. 254 - 255; 1963, pp. 569 - 570): For any fixed pair of subjects, the "social factor" of rank difference would be a "memoryless" psychological parameter if the process was defined on the states of a "two-headed" subject.

A Markovian Model of Social Dynamics (Landau)

Landau has inferred that, if dominance were solely determined by the inherent individual (psychological) characteristics of a group's members, then dominance hierarchies rarely would be created (Landau, 1951, p. 254). However, such hierarchies are known to occur commonly, for example in groups of domestic hens; so, Landau suggests that social factors or relations among members may be sufficient causes of such hierarchies. Landau therefore has built a model based on the idea that social rank throughout a hierarchy can be considered to be the result of a continuously ongoing round-robin tournament among the members (Landau, 1951, p. 258). In this model, each encounter between any two members is seen to constitute one contest of the tournament, and the winner of each such contest becomes dominant over the loser, at least until the next contest between them. No member of the group can remember the results of previous contests. The extent to which round-robin dominance is transitive throughout the group is the extent to which a dominance hierarchy exists.

Landau's model is implemented as a Markov chain the states of which are the overall dominance configurations within the group. Transitivity of dominance in each state measures the extent to which that state is a hierarchy. As Luce, et al (1963, p. 570) explain, for Landau's model, P⃗ would be the one-step transition probability matrix for transitions between group dominance configurations, and the equilibrium distribution f(X̄e) would represent the relative frequencies with which the different dominance structures would be expected to be occurring a sufficient time after the group first formed (compare Landau, 1951, pp. 256 - 258).
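The flavor of Landau's memoryless round-robin can be conveyed by a small simulation. The sketch below, in Python, uses Landau's hierarchy index h (Landau, 1951), which equals 1.0 for a perfect linear hierarchy; the fair-coin contest rule, the group size, and all code names are our illustrative assumptions, not Landau's implementation:

    import numpy as np

    rng = np.random.default_rng(1)

    def landau_h(dominates):
        """Landau's hierarchy index: 1.0 for a perfect linear hierarchy."""
        n = dominates.shape[0]
        v = dominates.sum(axis=1)                  # contests won by each member
        return 12.0 / (n**3 - n) * ((v - (n - 1) / 2.0) ** 2).sum()

    def random_tournament(n):
        """One memoryless round-robin: each pair meets once, a fair coin decides."""
        d = np.zeros((n, n), dtype=int)
        for i in range(n):
            for j in range(i + 1, n):
                if rng.random() < 0.5:
                    d[i, j] = 1
                else:
                    d[j, i] = 1
        return d

    hs = np.array([landau_h(random_tournament(10)) for _ in range(2000)])
    # With chance alone deciding contests, near-perfect hierarchies are rare:
    print(f"mean h = {hs.mean():.2f}; P[h > 0.9] = {(hs > 0.9).mean():.3f}")

Runs of this kind echo Landau's opening point: chance contests among psychologically identical members seldom yield the strong hierarchies actually observed in, say, groups of domestic hens.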

From the Markov model, Landau is able to conclude, among other things, that if there is a group-wide uniform bias against an individual contest resulting in a reversal of dominance, then, no matter how large the bias, so long as the probabilities of reversal are not zero, the bias will have no effect on the equilibrium hierarchy structure (Landau, 1951, p. 259).

Yet more interestingly, assuming some degree of transitivity of dominance to exist momentarily, the Markov model yields the conclusion that social factors which restrict winnable contests only to individuals of approximately equal rank also will cause a hierarchy to be the only stable social structure for the group (Landau, 1951, pp. 261 - 262). While for a large group, the identities of the individuals occupying specific positions in the hierarchy will be changing constantly, the hierarchical structure, once attained, will be stable and permanent (Landau, 1951, pp. 261 - 262).

Landau's model, like any model of dominance, is particularly interesting because one can see its human applications: The Landau model can provide answers, right or wrong, to such human questions as, Who should I challenge? Can I get ahead if I act as though people won't remember whether I have dominated them or not? Machiavelli would say yes -- but that was very long, and very ignorant, ago. What company should I keep if I want to maintain my position? Or improve it? Some beautiful implications are half-concealed in the Landau model; but, they are soft implications, and, like the soft, half-concealed implications of other kinds of model, they can be discriminative stimuli for certain ill-fated, naughty responses.

At this point, it should be understood that the ingenuity of any math model of dominance almost certainly will be less than the inventiveness of the dominant members of a real human hierarchy. It is a fact that mathematicians do not rule the world, any more than theologians rule the church.

Behaviors As Semi-Markov Chains (Cane)

One of the few papers discussing applications of stochastic models specifically in ethology is Cane's, published in 1959. Cane's study is a methodological survey which compares Markov, semi-Markov, and "ethological" models as they could be applied to the data in three different ethological papers: One by van Iersel, published in 1953, on stickleback (minnow) courtship; a second by Hinde in 1958, on nest-building behavior of canaries; and, a third by Bastock and Manning in 1958, on drosophila (fruit fly) courtship, but including reference to an hypothesis concerning excitatory centers in the brain. Cane's paper emphasizes applications of the semi-Markov model, only.

Cane's (1959) thesis is that the stochastic organization of behavior may well be more important in recording behavior than is the simple frequency of occurrence of the behavioral events (see Cane, 1959, p. 58). Possible results of some applications of her thesis to human psychology will be discussed later.

Cane's Markov model. Of Cane's three stochastic models, the Markov model was, of course, discussed already, in an earlier part of the present paper. Cane's own discussion of the Markov model is brief and scattered; it also is confined to side comments on bout lengths and on the parameters needed to specify the model.

Cane's semi-Markov model. The idea of the semi-Markov model was introduced briefly above. In addition to Cane's 1959 writings, another discussion of the semi-Markov model can be found in Cane's 1961 paper (pp. 367 ff.); a detailed treatment of this model is available in Schrady (1966).

As Cane points out, the semi-Markov model has an advantage over the Markov chain model in that the categorizing together of semi-Markov states yields, in general, another semi-Markov process (Cane, 1959, p. 39). By contrast, Markov chains are not in general lumpable, as was explained above. On the other hand, according to Armitage (in Cane, 1959, pp. 49 - 50), the distribution of waiting times ("bout lengths") for the semi-Markov model is not known very clearly in general, although Schrady (1966) has presented some such waiting times for special cases.

It is of considerable importance to keep in mind, as mentioned earlier, that a semi-Markov process is not memoryless unless its decision points are exponentially (geometrically) distributed in time.

Cane's ethological model. According to Cane (1959, p. 37), the model which ethologists seem to have in mind when they summarize behavioral records is what she calls the ethological model. Of course, we understand such a "model" as a metaphor. The Cane ethological model is such that it

"assumes that the organism begins in one state, say S1, continues in[S1] until it has finished with that sort of activity, changes to

alternative state S2, and, after a certain time in S2, changes back to S1, and so on. The times spent in S1 [are variants of a random variable X and] have a fixed distribution [ f (X )] , since this is regarded as steady-state behavior, of mean length μ1 , say; similarly there is a fixed distribution [ g(Y )] for the lengths of stay in S2, with mean μ2' '

-- Cane (1959, p. 37)

Thus, according to Cane, ethologists already have in mind a stochastic model for animal behavior. As Cox (in Cane, 1959, pp. 53 - 54) explains, the ethological model involves two sequences of independent random variables; these sequences are

{X1, X2, X3, . . . }

for state S1 bout lengths, and

{Y1, Y2, Y3, . . . }

for state S2 bout lengths. The independent random variables Xi all have frequency function f(X); the independent random variables Yi all have frequency function g(Y); and, the order of behavior is given by the sequence

{X1, Y1, X2, Y2, X3, Y3, . . . } .

Cox, cited by Cane, goes on to develop some theory; he also discusses past applications of the ethological model to the study of the precision of systematic sampling in industry.
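A few lines of Python make the alternating-bout structure of the ethological model concrete. The gamma bout-length distributions and the mean values below are illustrative assumptions only; the model itself requires merely that every Xi share one fixed distribution and every Yi another:

    import numpy as np

    rng = np.random.default_rng(2)

    def ethological_record(n_cycles, mu1=4.0, mu2=2.0):
        """Alternate S1/S2 bouts: X_i ~ f(X) with mean mu1, Y_i ~ g(Y) with mean mu2."""
        record = []
        for _ in range(n_cycles):
            record.append(("S1", rng.gamma(shape=2.0, scale=mu1 / 2.0)))  # an X_i
            record.append(("S2", rng.gamma(shape=2.0, scale=mu2 / 2.0)))  # a Y_i
        return record

    rec = ethological_record(1000)
    s1 = [length for state, length in rec if state == "S1"]
    s2 = [length for state, length in rec if state == "S2"]
    print(f"mean S1 bout = {np.mean(s1):.2f}, mean S2 bout = {np.mean(s2):.2f}")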

Rhesus Monkey Markov Chains (Altmann)

Altmann made a two-year effort to compile a sort of ethogram for communication acts of rhesus monkeys living wild on the island of Cayo Santiago in the West Indies (Altmann, 1965, p. 492). Altmann's 1965 paper presents a series of stochastic models for rhesus behavior equivalent to n-th order Markov chains and based on the sequential record he gathered for his ethogram.

Altmann claims three interpretations for his models: (a) As descriptions of rhesus behavior, (b) as predictors of rhesus behavior, and (c) as measures of the memories of rhesus monkeys (Altmann, 1965, p. 513). He reports data from his analyses up through Markov chains of the third order (Altmann, 1965, p. 507).

Altmann's methodology. Because Altmann uses the sampled-record approach in order to avoid certain problems with the ethogram (see above), one effect is inevitable: Namely, the more the order of the Markov chain model is increased, the better the model fits the data in the sampled record. This effect is inevitable because, when the order of the chain becomes so large as to span the entire record, the fit necessarily must become perfect; Altmann's Figure 4 (1965, p. 510), which summarizes his results, reflects this effect, at least to some extent.

Altmann samples his record so that his decision points occur effectively at equal intervals in time. His record, itself, thus retains only ordinal aspects of the behavioral events being classified (see Altmann, 1965, p. 494). Therefore, no meaningful inferences concerning waiting times can be made. However, because the fits of Altmann's models are summarized in terms of conditional uncertainty left after nth-order approximations have been made to the record, the fits of the Markov model waiting times are not crucial to the validity of his approach.
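The conditional-uncertainty bookkeeping behind such nth-order approximations can be sketched in Python as follows. This is a toy reconstruction, not Altmann's computation; the two-act first-order chain used to generate the toy record is our own example:

    import math
    import random
    from collections import Counter

    def conditional_uncertainty(seq, n):
        """H(next act | previous n acts), in bits, estimated from the record."""
        context, joint = Counter(), Counter()
        for i in range(n, len(seq)):
            c = tuple(seq[i - n:i])
            context[c] += 1
            joint[c + (seq[i],)] += 1
        total = sum(joint.values())
        return -sum((cnt / total) * math.log2(cnt / context[key[:n]])
                    for key, cnt in joint.items())

    # A toy record generated by a first-order chain over two acts: the
    # uncertainty should drop sharply from n = 0 to n = 1, then level off,
    # and the leveling-off point estimates the "memory" of the process.
    random.seed(3)
    seq, state = [], "A"
    for _ in range(20000):
        seq.append(state)
        weights = [0.9, 0.1] if state == "A" else [0.4, 0.6]
        state = random.choices("AB", weights=weights)[0]

    for n in range(4):
        print(n, round(conditional_uncertainty(seq, n), 3))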

The "memory" interpretation of Altmann's models. What is most interesting about Altmann's rhesus study is that he uses the order of approximation needed to describe completely the monkeys' behavior as a measure of the monkeys' social memory (Altmann, 1965, p. 513; also pp. 509, 515 - 516). The higher the order of approximation needed, the more remote are the past events which affect present behavior, and, hence, the longer must be the monkeys' memories.

Using the general semantics term "time-binding" to describe the ability of the monkeys to base their present behavior on past experiences of the species, Altmann also points out some evolutionary advantages of long social memories; in this context, he discusses implications of the stereotypy of behavioral patterns which he found in the monkeys (see, e. g., Altmann, 1965, pp. 491, 515 - 516). For information on the original usage of "time-binding" as the defining difference between humans and animals, see, for example, Korzybski (1950, pp. 147 - 149).

Altmann's behavioral nonreleasers. One more observation should be made concerning Altmann's paper: Altmann finds that none of the behavioral patterns he defined had an invariant antecedent (or successor) pattern (Altmann, 1965, p. 518). He advances this as evidence contradicting Lorenz's simple theoretical picture of behavior as a stereotyped chain of releasers, and he observes that, "a behavioral system in which each behavior pattern has one and only one antecedent 'releaser' is a determined system" (see Altmann, 1965, p. 517). Altmann also cites Aronson as having pointed out how simple, unqualified observation of animal behavior can cause an ethologist to gain the impression that certain acts in effect are "releasers" (Altmann, 1965, p. 517).

Markov chains in fish courtship (Nelson)

Nelson, in his paper on temporal patterns in fish courtship (Nelson, 1964b), has presented the only ethological evidence known to the present author, as of 1970, in which an animal's behavior could be described well by a Markov chain model.

Nelson's computational methodology. To obtain his data, Nelson observed long-term courtship behavior of pairs of members of four species of glandulocaudine fish. He recorded coded behavioral actions using a keyboard coupled to a 20-channel recording instrument. Whereas Altmann (above) was recording behavioral sequences, Nelson was recording the temporal patterning of behavior; in this way, Nelson obtained valid data on bout lengths and waiting times between actions. The difference between sequences and patterning is discussed further by Klopfer and Hailman (1967, p. 212).

In a way slightly reminiscent of Altmann's approach, Nelson defined a courtship sequence as a series of statistically dependent events bounded by an intersequence interval at each end. The reader should note here the very obvious space-time metaphor in use. Given the preceding breakdown, an intersequence interval then was defined as an interval which separated statistically independent events with probability of error α = .05 (Nelson, 1964b, p. 100). Chi-square tests were used to establish independence and thus locate the beginnings and ends of courtship sequences.

With the data thus determined, the Markov model then was applied to describe events within courtship sequences. Cumulative distributions of bout lengths within sequences were plotted as survivorship curves, a display format which has been described here previously.

Nelson's courtship results and conclusions. The conclusions calculated for the species studied were:

1. Courtship in Pseudocorynopoma doriae is non-Markovian (Nelson, 1964b, p. 125).

2. Courtship in Coelurichthys tenuis and Glandulocauda inequalis generally is Markovian, except for a certain pattern ("gulping") in Glandulocauda which tends to occur rhythmically (Nelson, 1964b, pp. 128, 130).


3. Courtship in Corynopoma riisei varies: In the male, it is Markovian, while in the female, it is not (Nelson, 1964b, pp. 110 ff.).

As summarized in Klopfer and Hailman (1967, pp. 51, 212 - 215), Nelson concentrated on Corynopoma, leaving a detailed discussion of Glandulocauda courtship for another paper (Nelson, 1964a). Nelson presented an evolutionary argument for the differences in courtship behavior between the Corynopoma male, which is physically very active, and the female, which hardly responds in any way at all. The Markovian organization of the male's courtship is described as less efficient than memoried courtship but more efficient than, say, sequences of mutually independent actions. Nelson suggests that the Markov chain may be the most determinate courtship activity possible for the male, in view of the unresponsiveness of the female (Nelson, 1964b, pp. 134 - 135).

Nelson also claims that because Corynopoma is internally fertilized, there is a selective advantage to having the male always ready to impregnate but not subject to the drastically-fluctuating internal motivations which would result from memoried courtship; for the phlegmatic female, he claims that mating readiness is best induced by a massive courtship bombardment initiated by the male (Nelson, 1964b, pp. 138 - 139) -- but, a little facetiously, in compensation, should the male be rejected, he never can get depressed.

Nelson's discovery of the Markovian nature of Corynopoma courtship probably was helped greatly because he was able to plot survivorship curves for the waiting times: In several instances, classifications of state yielded non-Markovian processes, but the curves enabled Nelson to see how to reclassify states and thus make the resulting separate- or combined-state survivorship curves linear. For example, this last is described for Glandulocauda in Nelson, 1964b (p. 120 and pp. 129 - 130).

A solution to some of Nelson's discrepancies. Before closing our discussion of this topic, we should point out that Nelson reports that, during his chi-square testing for within-sequence second-order independence, he came across certain non-Markovian discrepancies in the frequencies of waiting times occurring in the early stages of the Corynopoma sequences (see Nelson, 1964b, p. 107). When data only from the longer sequences were used, these discrepancies were found to be less (Nelson, 1964b, pp. 107 - 108), and Nelson explains the discrepant effects as caused by longer intervals between early intrasequence events, longer intervals making events at the beginnings of sequences more independent of one another because of nonstationary one-step transition probabilities pij (Nelson, 1964b, pp. 107 - 109, 110, 120). On this effect, it should be mentioned that Anderson and Goodman (1957) have devised a statistical decision procedure for testing whether the pij in fact are constant.
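A rough diagnostic in the spirit of -- but much cruder than -- the Anderson and Goodman procedure is to estimate the pij separately on the early and late portions of a record and compare. Everything in the Python sketch below, including the two-state chain used for the check, is an illustrative assumption:

    import numpy as np

    rng = np.random.default_rng(4)

    def transition_matrix(seq, n_states):
        counts = np.zeros((n_states, n_states))
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
        return counts / counts.sum(axis=1, keepdims=True)

    def stationarity_gap(seq, n_states=2):
        """Largest |p_ij(first half) - p_ij(second half)|: a crude screen for
        nonstationary transition probabilities, not Anderson and Goodman's test."""
        half = len(seq) // 2
        return np.abs(transition_matrix(seq[:half], n_states)
                      - transition_matrix(seq[half:], n_states)).max()

    # Check on a genuinely stationary two-state chain: the gap stays small.
    P = np.array([[0.8, 0.2], [0.3, 0.7]])
    seq, s = [], 0
    for _ in range(10000):
        seq.append(s)
        s = rng.choice(2, p=P[s])
    print(round(stationarity_gap(seq), 3))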

However, a different explanation of the Nelson discrepancies is possible: Suppose that the pij indeed were stationary, as is required of a Markov chain, and suppose that the "true" underlying behavioral chain was weakly lumpable with respect to Nelson's particular partitioning (= classification) of states. If this were so, then the real chain would not become lumpable until after it had been running for some time after the start of each courtship sequence. Assuming all this, given a suitable, slightly nonequilibrium distribution f(X0) at the start of each sequence, the discrepancies then would be explained, because the Nelson process as classified by him would not have become a Markov chain until after it had become lumpable. And, such a process would not become lumpable until it had had time to reach equilibrium shortly after the start of each courtship sequence.

Thus, according to this explanation, the underlying real process still might be assumed to have been a Markov chain, and the observed discrepancies may be assumed to have been caused by problems with our old friend, classification of state.


V. Markov Models in Experimental Psychology

Language is sometimes called a cemetery of dead metaphors.

-- John Lotz (1956)

Touching upon Behaviorism and Experimental Psychology

Behaviorism

In one variation of the organism-environment metaphor, one considers the state of the organism as being completely defined by the state of the organism's boundary. Under this assumption, the organism is to be seen as identical to its boundary with the environment, a boundary which may be studied in macroscopic or microscopic detail, and before or after partial vivisection. This variation of the organism-environment metaphor generates what is called behaviorism. This metaphor may be rephrased as the central premise of behaviorism, namely, that the organism ends at its skin.

Not that behaviorism doesn't depend upon experimentation; rather, the behavioristic metaphor is a premise which says to ignore any invisible "images", "thoughts", "sensations", etc. which might be assumed to reside inside the skin of the organism(s) being studied. This premise postulates that the study of the behavior of this skin alone, a skin which moves and can be seen and touched and manipulated, is sufficient to exhaust all meaningful research in psychology. Only the organism's skin, its outermost boundary, should be perceived by the experimenter, and all such perceptions are virtually guaranteed to be public and "operational".

It is this behavioristic variation of the organism-environment metaphor which enables behaviorists to think in operational terms.

Experimental Psychology

In this field, in the present author's view, the experimenter and associated artifacts must constitute important facets of the subject's environment. This importance, as discussed in the beginning chapter above, is taken to be the primary distinction separating animal experimental psychology from ethology. Not that experimental psychology doesn't depend upon behavior; rather, in experimental psychology, the contingencies invented by the experimenter (E) and expressed in E's behavior and/or in the operation of E's experimental artifacts are said to "hold" for the subject (S). E need not infer whether S is responding "correctly" or not, "adaptively" or not, "cooperatively" or not with other Ss, etc., because E -- not god, not nature -- is the important structuring force in S's environment, by definition.

Here is why the study of learning dominates the field of experimental psychology as of the late twentieth century: E knows what will be judged to be correct, adaptive, etc., so what could be more natural than to study how S finds it out? Or, in the case of the study of sensation and perception, what could be more natural than to study how S sees, hears, etc. what is correct, what is presented by E?

The remainder of the present chapter will be concerned primarily with applications of Markov models in the study of human behavior and in experimental psychology. Applications of Markov models in learning, in group dynamics and role theory, and in the study of language all will be touched upon.

Markov Learning Models

The most widespread application of Markov models in psychology to date has been in the field of learning. A vast literature has grown up around Markov learning models, and it would be impossible, and if possible, pointless, to survey this literature in the present paper. Partial surveys may be found in Suppes and Atkinson (1960, pp. 42 - 46) and in Bush (1960). References to nearly all the literature may be found scattered throughout Volumes II and III of the Handbook of Mathematical Psychology (Luce, et al, 1963, 1965b) and in Bush and Mosteller (1955). A formal, mathematical presentation of one of W. K. Estes' learning models may be found in Kemeny and Snell (1960, pp. 182 - 191). The Handbook, of course, especially Volume II, overflows with explanations and examples of Markov learning models. Suppes and Atkinson (1960) also treat several models clearly and in great detail.

In our present chapter, we shall discuss just one example of a Markov learning model in some detail; a few remarks then will be provided on some of the relationships between Markov (and other stochastic) models and typical information theory or game theory approaches.

A Markov Learning Model: Noncontingent Reinforcement

According to at least one stimulus sampling theory, stimulus elements can be sampled (a) by component, which is to say, in sets of mutually exclusive elements; or (b) by pattern, which is to say, by interelemental gestalten (see Luce, et al, 1963, p. 153). To avoid complications on this issue, it will be assumed for the present example that only one stimulus "element" exists, and that that one element is presented on each trial. In the interests of further simplification, it also will be assumed that reinforcement is noncontingent -- which is to say that reinforcement depends in no way upon responses made by S, or upon past reinforcements.

Given this, we follow an example of Luce, et al (1963, p. 569), by defining the state xt of the subject S at time t as a set, or tendency, to emit one of two possible responses the next time the stimulus is presented, namely, at time t + 1. So, if the subject is in state 1 at time t, then response A1 should be the one which occurs at time t + 1; if S is in state 2 at time t, then response A2 should be the one which occurs at time t + 1. Thus, the response set up by the state of the subject is conditioned to the stimulus.

Two parameters then enter the model, keeping in mind that reinforcement typically consists of delivery of a reward of some kind which is intended to reinforce (= render more likely) whatever the subject has done to deserve it:

(a) A reinforcement parameter π which gives the constant probability that response A1 will be reinforced on any trial. Reinforcement being noncontingent, 1 − π gives the probability that A2 will be reinforced on any trial.

(b) A behavioral parameter θ, unique for each subject, which gives the constant probability that the stimulus will be conditioned to response A1 if A1 is reinforced on trial t. Here, 1 − θ gives the probability that on trial t the stimulus will retain whatever conditioning it had previously. So, if response A2 happens to be reinforced on trial t, then θ gives the probability that the stimulus will be conditioned to A2; 1 − θ gives the probability that no change in previous conditioning will occur.

To complete the model, all that remains is to determine the transition probability matrix for a Markov model of the learning process. This matrix will be 2 × 2, of course, because the system (subject) can be in two different states. So, we will have

P⃗ =  ( p11   p12 )
      ( p21   p22 ) .    (14)

The terms in this matrix can be determined very straightforwardly. It should be understood that, throughout this exercise, the term "time t" equivalently may be read as "trial t".

For example, finding p11 is equivalent to finding the probability that S will be in state 1 at time t + 1, given that S was in state 1 at time t. But we know that for S to remain in state 1 for two consecutive trials, on the second trial the stimulus either must retain its previous conditioning to A1 or must be reconditioned to A1. From the above, then, the probability p1 that the stimulus will retain its previous conditioning is just p1 = 1 − θ; the probability p2 that reconditioning will take place is just θ multiplied by the independent probability π that A1 will be reinforced, which is p2 = θ⋅π. Therefore, because retention and reconditioning are assumed to be mutually exclusive on any trial, the probability p11 that the subject will remain in state 1 given that the subject last was in state 1 is just p1 + p2 = θ⋅π + (1 − θ). Thus, p11 = θ⋅π + (1 − θ).

Reasoning in (14) above as we just have done for p11, the other three transition probabilities p12, p21, and p22 can be found easily, and the matrix P⃗ for the postulated Markov chain can be seen to be

P⃗ =  ( θπ + (1 − θ)   θ(1 − π) )
      ( θπ             1 − θπ   ) ,    (15)

and this is exactly the matrix presented in the example of Luce, et al (1963, p. 569).
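As a check on (15), the Python sketch below builds the matrix for given θ and π, verifies that its rows are probability vectors, and extracts the equilibrium distribution. One pleasant by-product is that the equilibrium probability of state 1 turns out to equal π. The function names and framework here are our own, not those of Luce, et al:

    import numpy as np

    def learning_matrix(theta, pi):
        """The one-step transition matrix of equation (15):
        state 1 = conditioned to A1, state 2 = conditioned to A2."""
        return np.array([
            [theta * pi + (1 - theta), theta * (1 - pi)],
            [theta * pi,               1 - theta * pi],
        ])

    P = learning_matrix(theta=0.1, pi=0.9)
    assert np.allclose(P.sum(axis=1), 1.0)        # each row sums to one

    # Equilibrium distribution: the left eigenvector of P for eigenvalue 1.
    w, v = np.linalg.eig(P.T)
    x_eq = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    x_eq /= x_eq.sum()
    print(x_eq)          # [0.9, 0.1]: equilibrium P[state 1] equals pi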


Luce, et al do go on to state, slightly inaccurately, that P⃗ as in (15) above "defines the Markov chain, which determines the entire time course of the process" (p. 569); however, as we have seen in the earlier parts of the present paper, an initial starting distribution f(X0) also must be specified in order to define the chain. But, this is a small point.

A Monte Carlo experiment. To illustrate a Markov learning model in action, the present author used a table of random numbers to create a group of ten "Monte Carlo" subjects. These Ss, it should be mentioned, were found to be more docile, easy to care for, and cooperative than even the most tame of inbred albino rats; their use in future psychological experiments is recommended highly.

Anyway, the learning conditions were as in the noncontingent reinforcement case above, with the parameters π set at 0.9 and θ at 0.1. From (15) above, these parameter values yielded the matrix,

P⃗ =  ( .99   .01 )
      ( .09   .91 ) .    (16)

Five Ss were started in state 1 and five in state 2. Of course, the values of the parameters chosen, as in (16), strongly reinforced response A1 rather than A2.

The experiment was terminated after 50 trials, and Figure 5 below displays the Markov process average learning curve obtained, with trials grouped in blocks of five.

Figure 5: Acquisition of the A1 response for a group of 10 Monte Carlo subjects under the noncontingent reinforcement schedule of the text example (16). The ordinate is mean number of A1 responses per subject per block of five trials.
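For the reader who would rather not use a table of random numbers, the experiment is easy to replicate by computer. The following minimal Python sketch makes two assumptions not fixed by the text (the random seed, and the convention that a subject in state 1 emits A1 on a trial before the state changes); it runs ten simulated Ss for 50 trials under the matrix of (16):

    import random

    # Transition matrix (16): theta = 0.1, pi = 0.9.
    # Row i gives the probabilities of moving from state i+1 to states 1 and 2.
    P = [[0.99, 0.01],
         [0.09, 0.91]]

    rng = random.Random(1)            # arbitrary seed, assumed for repeatability
    N_TRIALS, BLOCK = 50, 5
    subjects = [0] * 5 + [1] * 5      # five Ss start in state 1, five in state 2
    block_totals = [0] * (N_TRIALS // BLOCK)

    for start in subjects:
        state = start                 # 0 here stands for state 1 (conditioned to A1)
        for t in range(N_TRIALS):
            if state == 0:            # an S in state 1 emits response A1
                block_totals[t // BLOCK] += 1
            state = 0 if rng.random() < P[state][0] else 1   # Markov transition

    # Mean A1 responses per subject per block of five trials, as in Figure 5
    for b, total in enumerate(block_totals):
        print(f"block {b + 1}: {total / len(subjects):.2f}")

Nothing here is the author's original procedure; it is simply one way to realize the same Monte Carlo design.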


As an aid to understanding the meaning of the parameters in the transition probability matrix P⃗ of equation (15) above, consider the case in which π = 1, yielding CRF (continuous reinforcement) of response A1; the resulting matrix is

$$ \vec{P}' = \begin{pmatrix} 1 & 0 \\ \theta & 1-\theta \end{pmatrix} . \qquad (17) $$

Here, as one might expect, P⃗ ' is the matrix for an absorbing Markov chain: once S enters state 1, there can be no exit; state 1 is a sure thing. However, looking again at P⃗ ', it also should be evident that, until S has entered state 1, the value of the behavioral parameter θ still governs how long, on the average, that particular S will take to latch on to state 1.
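That "on the average" can be made exact. Under (17), an S that starts in state 2 enters state 1 on trial t with the geometric probability θ(1 − θ)^(t−1), so the mean number of trials to absorption is

$$ E[T] = \sum_{t=1}^{\infty} t\,\theta(1-\theta)^{t-1} = \frac{1}{\theta} , $$

ten trials, on the average, for the θ = 0.1 of the Monte Carlo example above.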

As a similar aid, setting π = 0 reverses the contingencies, yielding CRF of response A2. From P⃗ of expression (15), we then obtain a new matrix

$$ \vec{P}'' = \begin{pmatrix} 1-\theta & \theta \\ 0 & 1 \end{pmatrix} , \qquad (18) $$

and, again, according to the model, S will be absorbed into an unending orgy of responses in the reinforced state. A mathematical model for nirvana!

A Touch of Information Theory

An interesting discussion of the relationship of Markov chains to information theory, enzymes, and physical entropy can be found in Hockett's 1953 review of Shannon and Weaver's The Mathematical Theory of Communication. Although Hockett's discussion was more directly related to Markov models of language behavior, it is mentioned here because of that review's relevance to language, and therefore to learning, as well as to the importance of memory for the temporal integration of behavioral acts.

To skirt around the edges of information theory, one might recall the Markov learning model above and note that Luce, et al (1965b, pp. 425 - 426) have pointed out that a noncontingent schedule of reinforcement can be defined as one in which E sets up the situation so that the probability that reinforcement will follow a given response to the present stimulus is independent of past history, which is to say, of any information concerning any past response.

This definition of noncontingency means: (a) that the conditional uncertainty (conditional upon past responses) of whether or not a reinforcement will follow also is independent of past responses; (b) that the discrete time-process of successive reinforcements is a sequence of independent random variables; and, (c) that a zero-order approximation to this sequence is sufficient, with no further information, to describe the process involved.
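In these terms, the only uncertainty a noncontingent schedule carries is the fixed per-trial uncertainty of a single Bernoulli event. A minimal Python illustration (no such computation appears in the cited sources; the function is an invention for this note):

    from math import log2

    def H(p):
        # Shannon entropy, in bits, of one Bernoulli reinforcement event
        return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

    # Under a noncontingent schedule, past responses carry no information about
    # the next reinforcement, so the conditional uncertainty equals the marginal
    # uncertainty H(pi) on every trial.
    print(f"H(0.9) = {H(0.9):.3f} bits per trial")   # pi = 0.9, as in (16)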

A memoryless subject can, of course, learn a noncontingent schedule of reinforcement as well as a subject with memory can. As a clear example of this, the reader will recall from above that Nelson's male Corynopoma had no discernible courtship memory


and yet courted successfully; so did the female, who in effect made it easier on the male by being so noncontingent as to exhibit no discernible courtship behavior at all.

To accommodate the differing influence of information under different schedules of reinforcement, Luce, et al (1965b, as above) went on to define simple contingent schedules, double contingent schedules, etc., relating them to nth-order Markov chains and thus to nth-order sequential approximations of the uncertainties in the sequences of reinforcements to be delivered experimentally.
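To see concretely what an nth-order sequential approximation involves, the sketch below tabulates P(next symbol | preceding n symbols) from a single observed sequence; at n = 0 the estimate collapses to the zero-order, independent-trials description appropriate to a noncontingent schedule. The function name and the toy sequence are inventions for this example:

    from collections import Counter, defaultdict

    def nth_order_estimate(seq, n):
        # Estimate P(next symbol | preceding n symbols) from one sequence.
        counts = defaultdict(Counter)
        for i in range(n, len(seq)):
            counts[tuple(seq[i - n:i])][seq[i]] += 1
        return {ctx: {s: c / sum(ctr.values()) for s, c in ctr.items()}
                for ctx, ctr in counts.items()}

    seq = "AABABBAABAABBA"               # a toy record of responses
    print(nth_order_estimate(seq, 0))    # zero order: one context, the empty tuple
    print(nth_order_estimate(seq, 1))    # first order: contexts ('A',) and ('B',)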

This last Luce, et al treatment may be seen as directly related to Luce's (1960, pp. 93 - 95) discussion of findings on the relationship of meaningfulness, textual redundancy, and recall of English text in humans. Human memory, like that of Altmann's (1965) rhesus monkeys, can be described partially in information-theory terms. Indeed, the evolutionary appearance of complex temporally integrated actions may well be a monotonic function of some measure of the development of the cortex of the brain in mammals (see Lashley, 1951, p. 182). However, Chomsky (1956) has shown that nth-order sequential approximations to this temporal integration are not sufficient to describe it for language behavior, at least in humans.

The Relevance of Game Theory

Throughout Suppes and Atkinson (1960), there reverberates the recurrent theme that game theory simply is not very promising as a descriptive, empirical approach to learning behavior. In experimental contexts of several sorts, Suppes and Atkinson repeatedly found (e. g., pp. 90, 282) that their (human) subjects simply did not learn to approximate an optimal game strategy. In addition, the asymptotic values (average payoffs) of these games to the players nearly always were found to be better predicted by a stochastic model than by a corresponding game theory model (Suppes and Atkinson, 1960, pp. 90 - 92, 133, 151, 178, 189; cf. p. 198).

An interesting example of how the beauties of the game theory metaphor can becloud the inadequacies of its applications can be found in Davenport (1960): In an anthropological study, Davenport purports to show how a Jamaican fishing village has "learned", over the years, to "play" a strategy which maximizes the income of its fishermen. In the Davenport model, the player-opponent of the village is presented as a combination of nature (irregular water currents) and the fish merchants of the outside general market economy of the island.

At first glance, Davenport's results seem encouraging in regard to the use of game theory: His predictions of the village's strategy are perfect. On closer examination, though, his reasoning can be shown to be tautological, in that he implicitly incorporates the village fishermen's strategies of where to fish when he computes the payoffs of his game matrix for the village's "game". Davenport then proceeds to derive those same strategies back out of the matrix, as "predictions" of the fishermen's behavior. Davenport does not seem to realize that he is dealing with an ethological problem for which a "payoff matrix" cannot be induced except from the vagaries of fate and the arbitrariness of the value of money; he is not dealing with an experimental-psychology problem for which it might meaningfully be claimed


that the rules of the game have been laid down by him and must hold by definition.

Thus, although appearing otherwise at first glance, Davenport's study does nothing to weaken the rejection of game theory suggested by Suppes and Atkinson.

Suppes and Atkinson (1960, p. 33) also have pointed out that, like classical economics, game theory is a static, equilibrium-based approach which is concerned with the ideal behavior of a rational human being. Game theory thus is not behavioristic in the sense defined above because, if the organism ends at its skin, then there is no room within it for "rationality". This is why Markovian and other stochastic learning models contain behavioral parameters and are "mechanical", leaving no room for "insight", "strategy", or "logic" (Luce, et al, 1963, p. 572). These parameters are the behavioristic correlates of rationality. In the stochastic learning theory metaphor, the rationality of the subject is replaced with the orderliness of the parameters, with numbers, and thus is pushed back into the mind of E.

It is interesting that Chomsky (1956) has been able to show that language behavior cannot be linearly mechanical. One wonders whether adequate models of language behavior, should any such exist, will turn out to require the postulation of nonbehavioristic entities, entities such as Skinner's "speaker" and "listener", existing within the skin of every human being and, unobservably, controlling one another's verbal behavior (Skinner, 1957, pp. 73 - 74). Or, on the other hand, perhaps an equally unobservable, transcendent "verbal community" might be required, an entity continuously "arranging contiguities" so that humans will emit self-descriptive responses (Skinner, 1963, p. 954). One might speculate, somewhat discouragingly, that where language is concerned, perhaps the entire math-models approach, dealing only with often trite "countable acts" (see Luce, et al, 1963, p. 567), like so many other approaches, may succeed only in drawing another blank.

As was, after all, Suppes and Atkinson's main theme, stochastic models often do seem to be useful in learning situations in which language behavior is not involved. Indeed, Cane has suggested that a semi-Markov model might be of use in studying the acquisition of slowly learned skills in humans. The suggestion (Cane, 1959, p. 48) was that such a model might provide the basis for deciding whether the improvement shown by mentally defective trainees during manual skill training was due to greater speed on the individual component tasks or to a better appreciation of the correct order of those tasks.
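Cane's distinction is easy to render as a sketch. In a semi-Markov model the transition matrix carries the order of the component tasks, while a separate holding-time distribution carries the speed within each task, so the two sources of improvement can be separated. Everything named below (the tasks, the matrix, the mean times) is invented for illustration; nothing of the sort appears in Cane (1959):

    import random

    # A minimal semi-Markov sketch: states are component tasks; P orders them,
    # and each task has its own (here exponential) holding-time distribution.
    TASKS = ["grasp", "align", "fasten"]
    P = {"grasp":  [0.0, 0.9, 0.1],    # row: probabilities of the next task
         "align":  [0.1, 0.0, 0.9],
         "fasten": [0.5, 0.5, 0.0]}
    MEAN_TIME = {"grasp": 2.0, "align": 3.0, "fasten": 1.5}   # seconds, assumed

    def total_time(n_steps, seed=0):
        rng = random.Random(seed)
        task, elapsed = "grasp", 0.0
        for _ in range(n_steps):
            elapsed += rng.expovariate(1.0 / MEAN_TIME[task])  # time held in task
            task = rng.choices(TASKS, weights=P[task])[0]      # semi-Markov jump
        return elapsed

    # Training can act on MEAN_TIME (faster components) or on P (better
    # ordering); the semi-Markov decomposition lets the two be estimated apart.
    print(f"total time for 20 component-task steps: {total_time(20):.1f} s")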


Conclusion

In the end, a model is a metaphor, and a metaphor is a thought.

We think in tools and in actions; we think in what we've seen and felt: As scientists, we can appeal to reality; but, in the end, we only appeal to what we can manipulate. When we think otherwise, when we think of what we say, we beg too many questions -- we appeal too much.

We can know all the answers, but only if we don't use thoughts.

(Note: The final chapters of this paper have been lost)


References

Altmann, S. A. Sociobiology of rhesus monkeys. II: Stochastics of social communication. J. Theoret. Biol., 1965, 8, 490 - 522.

Anderson, T. W. and Goodman, L. A. Statistical inference about Markov chains. In Luce, et al (Eds.), Readings in mathematical psychology, vol. I. New York: John Wiley and Sons, 1963.

Ash, R. Information theory. New York: John Wiley & Sons, 1965.

Bartholomew, D. J. Stochastic models for social processes. London: John Wiley & Sons, 1967.

Bastock, M. and Manning, A. The courtship of Drosophila melanogaster. Behavior, 1955, 8, 85 - 111.

Bell, E. T. Mathematics: Queen and servant of science. New York: McGraw-Hill, 1951.

Boyer, C. B. The history of the calculus and its conceptual development. New York: Dover, 1949.

Bush, R. R. A survey of mathematical learning theory. In Luce (Ed.) Developments in mathematical psychology. Glencoe, Illinois: The Free Press, 1960, 121 - 165.

Bush, R. R. and Mosteller, F. Stochastic models in learning. New York: John Wiley and Sons, 1955.

Cane, V. R. Behavioral sequences as semi-Markov chains. Royal Statistical Society Journal, Series B, 1959, 21, 36 - 58.

Cane, V. R. Some ways of describing behavior. In W. H. Thorpe and O. L. Zangwill, (Eds.) Current problems in animal behavior. London: Cambridge University Press, 1961.

Chomsky, N. Three models for the description of language. IRE Transactions on information theory, vol. IT-2, #3: Proceedings of the 1956 symposium on information theory, September 1956, pp. 113 - 124.

Cohen, B. P. A probability model for conformity. Sociometry, 1958, 21, 69 - 81.

Davenport, W. Jamaican fishing: A game theory analysis. Yale Univ. Pubs. in Anthro., 1960, 59, 3 - 11.

Denbigh, K. The principles of chemical equilibrium. Cambridge: Cambridge University Press, 1955.

Embler, W. Metaphor and meaning. Deland, Florida: Everett/Edwards, Inc., 1966.

Embler, W. The metaphor of the underground. ETC: A Review of General Semantics, 1968, 25, 392 - 406.


Feller, W. An introduction to probability theory and its applications, Vol. I. New York: John Wiley & Sons, 1968 (updated).

Feller, W. An introduction to probability theory and its applications, Vol. II. New York: John Wiley & Sons, 1966.

Gardner, M. Mathematical games: The rambling random walk and its gambling equivalent. Scientific American, 1969, 220(5), 118 - 124.

Goodman, L. A. On the estimation of the number of classes in a population. Annals of Mathematical Statistics, 1949, 20, 572 - 579.

Hess, E. H. Ethology. In Brown, et al, New directions in psychology. New York: Holt, Rinehart and Winston, 1962.

Hersh, R. and Griego, R. J. Brownian motion and potential theory. Scientific American, 1969, 220(3), 66 - 74.

Hinde, R. A. The nest-building behavior of domestic canaries. Proc. Zool. Soc. London, 1958, 131, 1 - 48.

Hockett, C. F. Review of The Mathematical Theory of Communication by Claude E. Shannon and Warren Weaver. Language, 1953, 29, 69 - 93. Republished in S. Saporta (Ed.) Psycholinguistics: A book of readings. New York: Holt, Rinehart and Winston, 1961, 44 - 67.

Johnson, R. E. and Kiokemeister, F. L. Calculus with analytic geometry (3rd ed.). Boston: Allyn and Bacon, 1964.

Karlin, S. A first course in stochastic processes. New York: Academic Press, 1966.

Kemeny, J. G. and Snell, J. L. Finite Markov chains. Princeton, N. J.: Van Nostrand, 1960.

Kinsolving, M. R. Set theory and the number systems. Scranton: International Textbook Co., 1967.

Klopfer, P. H. and Hailman, J. P. An introduction to animal behavior. Englewood Cliffs, N. J.: Prentice-Hall, 1967.

Korzybski, A. The manhood of humanity. Garden City, N. Y.: Country Life Press, 1950.

Kramer, E. E. The main stream of mathematics. Greenwich, Conn.: Fawcett Publications, 1951.

Krech, D., Crutchfield, R. S. and Ballachey, E. L. Individual in society. New York: McGraw-Hill, 1962.

Landau, H. G. On dominance relations and the structure of animal societies: II. Some effects of possible social factors. In Luce, et al (Eds.), Readings in mathematical psychology, vol. II. New York: John Wiley and Sons, 1965.


Lashley, K. S. The problem of serial order in behavior. In L. A. Jeffress (Ed.) Cerebral Mechanisms of Behavior. New York: John Wiley and Sons, 1951. Republished in S. Saporta (Ed.) Psycholinguistics: A book of readings. New York: Holt, Rinehart and Winston, 1961, 180 - 198.

Lotz, J. Linguistics: Symbols make man. In Frontiers of Knowledge, L. White, Jr. (Ed.). New York: Harper and Brothers, 1956. Republished in S. Saporta (Ed.) Psycholinguistics: A book of readings. New York: Holt, Rinehart and Winston, 1961, 1 - 15.

Luce, R. D., Bush, R. R., and Galanter, E. (Eds.). Handbook of mathematical psychology, Vol. II. New York: John Wiley & Sons, 1963.

Luce, R. D., Bush, R. R., and Galanter, E. (Eds.). Handbook of mathematical psychology, Vol. III. New York: John Wiley & Sons, 1965b.

Luce, R. D., Bush, R. R., and Galanter, E. (Eds.). Readings in mathematical psychology, Vol. II. New York: John Wiley & Sons, 1965.

Mason, S. F. A history of the sciences. New York: Collier Books, 1962.

Mead, M. Male and female. New York: William Morrow, 1949.

Miller, G. A. (Ed.) Mathematics and psychology. New York: John Wiley & Sons, 1964.

Nelson, K. The evolution of a pattern of sound production associated with courtship in the characid fish Glandulocauda inequalis. Evolution, 1964a, 18, 526 - 540.

Nelson, K. The temporal patterning of courtship behavior in the glandulocaudine fishes (Ostariophysi, Characidae). Behavior, 1964b, 24, 90 - 146.

Poincaré, H. Science and method. Tr. Francis Maitland. London: Nelson, 1914. (Republished: New York: Dover, 1952).

Resnick, R. and Halliday, D. Physics, part I. New York: John Wiley & Sons, 1966.

Rudin, W. Principles of mathematical analysis. New York: McGraw-Hill, 1964.

Schrader, D. A. Individual choice behavior and Markov renewal processes (Doctoral dissertation, Case Institute of Technology). Ann Arbor, Mich.: University Microfilms, 1966 (No. 66-14, 317).

Skinner, B. F. Verbal behavior. New York: Appleton-Century-Crofts, 1957. Introductory pp. 1 - 12 republished as "A functional analysis of verbal behavior" in S. Saporta (Ed.) Psycholinguistics: A book of readings. New York: Holt, Rinehart and Winston, 1961, 67 - 74.

Skinner, B. F. Cumulative record. New York: Appleton-Century-Crofts, 1959.

Skinner, B. F. Behaviorism at fifty. Science, 1963, 140, 951 - 958.

Suppes, P. and Atkinson, R. C. Markov learning models for multipersonal interactions. Stanford, Calif.: Stanford University Press, 1960.


Thorpe, W. H. and Zangwill, O. L. (Eds.) Current problems in animal behavior. London: Cambridge University Press, 1961.

Trager, G. L. Language. Encyclopedia Britannica, 1957, vol. 13, 696 - 703.

van Iersel, J. J. A. Parental behavior of the male three-spined stickleback. Behavior, 1953, Supplement 3.