Dissertation

Queues, Multiple values, and Information Cascades

A literature review and experimental design for investigating the impact of waiting costs and multiple values on information cascades.

Student: Jethro Elsden

Student number: 4248665

Supervisor: Professor Sonderegger

This Dissertation is presented in part fulfillment of the requirement for the completion of an MSc in the School of Economics, University of Nottingham. The work is the sole responsibility of the candidate.

Table of Contents

1. Introduction
2. Literature Review
3. Experiment
3.1. Outline
3.2. Basic Treatment
3.3. Waiting Treatment
3.4. Multi-value Treatment
3.5. Experiment Summary
3.6. Technical Analysis
3.6.1. Analysis of Basic Treatment
3.6.2. Analysis of Waiting Treatment
3.6.3. Analysis of Multi-value Treatment
3.7. Running the Experiment
3.8. Extensions and Improvements
4. Conclusion
5. Appendix
6. Bibliography

1. Introduction

Information cascades are a widespread phenomenon that can have large economic

consequences and are important in explaining the large amounts of conformity present in

human societies (Bikhchandani, et al., 1992). There is a large literature existing on the topic

of information cascades, and the subject is examined both theoretically and experimentally.

However, there are several interesting aspects of the topic that are relatively unexplored, and

the experimental literature has focussed on only a few of the issues concerning the topic.

This study first reviews the key theoretical and experimental literature on the topic of

information cascades. An innovative experiment is then developed which focuses on

exploring the impact on cascade formation and information aggregation of queueing (and the

associated waiting costs) and a larger number of options than have previously been offered to

subjects in experiments concerning information cascades.

2. Literature Review

Theoretical Literature

The seminal theoretical paper on information cascades is Bikhchandani, et al (1992); most

theoretical and experimental papers on the topic have been informed and influenced by this

paper. Of particular importance has been the underlying mechanism contained within the

paper. The authors seek to explain conformity and imitation using information cascades,

defined as an agent ignoring their own information and instead basing their decision on the

actions of predecessors. This contrasts with the traditional explanations for conformity:

deviation sanctions, positive payoff externalities, conformity preferences, and

communication. These four factors imply robust conformity which will strengthen the longer

it lasts. But this implication is empirically suspect: conformity is often fragile, not robust;

small shocks can lead to large shifts in behaviour.

The underlying mechanism:

Imagine a scenario where individuals choose to adopt or reject an action having first observed

the decisions of all prior agents and use Bayesian learning. If the cumulative weight of prior

decisions outweighs the individual’s private information, then it is optimal for them to imitate

these previous decisions and ignore their private information. This individual’s decision is

now no more informative than the prior decisions upon which it was based; because the

decision doesn’t transmit any of the individual’s private information it is useless to

subsequent agents. These subsequent agents will have the same information as the current

individual except for their private signals, and so they may make the same decision and also

imitate. Hence these subsequent decisions will also transmit no additional information; this is

an information cascade.

The above illustrates the primary issue with cascades: It is rational for individuals to imitate,

but imitation doesn’t transmit any of their private information. This can lead to social

inefficiency, because every agent that imitates reduces the volume of private information that

is publicly aggregated. This may result in poorer decision making by subsequent agents and

this will produce a social loss.

This issue is compounded by the fact that cascades can begin on the basis of only a small

amount of information: All that is required for a cascade to begin is for the accumulated

decisions of prior agents to outweigh the present agent’s private information; this might arise

after only a few agents. For instance, if the first two agents both adopt, even if the third agent

has private information indicating that they should reject the action, the fact that the first two

agents adopted may be enough to cause the third agent to ignore their private information and

also adopt. If all agents receive signals of similar strength and precision, then all agents

following the third agent are likely to make the same decision and imitate. Therefore an

information cascade has been started based on only the private information of the first two

agents. This is likely to further worsen decisions and increase the social loss; it also means

that cascades are likely to be fragile, with cascades occurring close to the borderline between

adoption and rejection. A small amount of new contrary information (such as a public

information announcement) could break or reverse the cascade.

The history of medicine provides many examples of unproven medical fads, and encapsulates

the main issues surrounding information cascades. For many years tonsillectomy was, despite

no conclusive evidence, routinely performed unnecessarily on healthy children, often to their

detriment (Robin, 1984). The prime motivation for surgeons performing these operations

seems to have been the fact that other surgeons were performing them, resulting in naïve

imitation and cascade behaviour. This point is supported by data on the prevalence of

tonsillectomy across England, which shows wide variation between regions, suggesting the

presence of local cascades (Taylor, 1979). However it should be emphasised that imitation

and conformity are not universally disadvantageous; they can be either harmful or beneficial

depending on the context. For instance, inexperienced agents may find it beneficial to copy

the actions of more experienced agents, but these experienced individuals may still make

errors1, and if they do, this could result in a large social loss.

It is important to note, as Bikhchandani, et al (1992) do, that it is entirely rational for

individuals to take account of other agents’ actions when making their own decision, even if

they place no value on conformity for its own sake. Imitation is widespread throughout nature

and society; among animals it is common when selecting territory, foraging, or

mating. The main contribution of this paper is modelling imitation as information cascades:

The actions of other agents, provided they aren’t imitating, are based on the private

information of those agents, so the current agent can infer this private information from the

actions. Therefore, considering the actions of other agents allows the current agent to

accumulate more information than just their own private information. Having more

information should produce improved decisions. Furthermore individuals who act in this

manner don’t need to have a preference for conformity; they need not care whether they conform to or

deviate from the majority decision. Such individuals decide to imitate not because they want

to conform, but because, having observed the actions of other agents, they judge that copying these

actions and ignoring their own information will maximise their payoff.

1 Here an individual makes an error if they fail to choose the option that would maximise their utility. In this model, factors such as conformity preferences are excluded, so utility is purely the return an agent receives from their action. Therefore an individual commits an error if they fail to choose the action that would maximise their return given the information they hold at that point. In reality, individuals may derive utility from conforming or even from deviating. If so, an individual may maximise their utility by choosing an action that doesn’t maximise their return; this would not be an error, since they are maximising their utility even though they aren’t maximising their return.

Of key importance is the form of information transmission: here agents only observe the

actions of other agents. They don’t observe the other agents’ private information; they have to

infer this from those agents’ actions. A different setup where agents do observe the private

information of other agents will yield significantly different results: even if a cascade starts,

private information will continue to be publicly aggregated, so better decisions are likely to

be made with lower social loss.

In real life it is an unrealistic assumption that agents won’t observe other agents’ private

information. Agents are likely (at least at a local level, such as among family and friends) to

tell each other some of their private information. Therefore agents will probably have more

information than just their private information. But they will nevertheless have only a small

fraction of all agents’ private information, so the assumption is still reasonable. Furthermore

it is assumed that “actions speak louder than words”, i.e. actions are more credible than talk.

So agents infer more information from actions than from words.

In an attempt to ensure greater comprehension Bikhchandani et al (1992) first present a

restricted specific model, which they later extend into a general model. Their model consists

in essence of a sequence of agents deciding whether to adopt or reject a certain action. This

action has a cost and value, denoted C and V respectively, which are uniform across agents.

In the specific model, V is restricted to 0 or 1, whereas in the general model, V is drawn from

a finite set of possible values. In both models, C is set constant at ½. An agent should only

adopt if the value of the action exceeds the cost of the action, i.e. V > C. However, agents

don’t know what V is; instead they each receive a private signal indicating the level of V (i.e.

what the state of the world is). In the specific model, the signal is either H, implying V = 1, or

it is L, implying V = 0. In the general model the signal is drawn from a finite set

corresponding to the set of V. However, these signals are noisy: they indicate the true

state of the world only with some probability, denoted p. Each agent observes the decisions of

any prior agents and their own private signal and then decides whether to adopt or reject the

action. In cases of indifference there is a tie-breaking convention: in the specific model

agents randomly mix between adoption and rejection, whereas in the general model any

indifferent agents adopt.

The decision rule agents use will vary depending on their position in the sequence, because

the information agents have varies according to their position. The first agent has no

predecessors to observe, so will decide purely on the basis of their private signal. The second

agent observes this choice and infers the first agent’s signal. If the first and second agents’

signals match then they make the same decision; if the signals differ then the second agent

computes the expected value of adoption, which equals ½, so the second agent randomly

mixes between adoption and rejection.
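The value of ½ can be checked directly. The following is a minimal sketch, assuming (as in the specific model of Bikhchandani et al (1992)) that V = 1 and V = 0 are equally likely a priori and that signals are independent conditional on V, so that the second agent effectively holds one H and one L signal:

```latex
\[
E[V \mid H, L] = \Pr(V = 1 \mid H, L)
= \frac{\tfrac{1}{2}\,p(1-p)}{\tfrac{1}{2}\,p(1-p) + \tfrac{1}{2}\,(1-p)\,p}
= \frac{1}{2} = C .
\]
```

The two opposing signals cancel exactly, which is why the tie-breaking convention determines the second agent's behaviour in this case.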

The third agent observes the actions of the first two agents. If both prior agents acted

identically then the third agent imitates, ignoring their own private signal, and thus a cascade

begins.2 However, if the prior agents chose different actions, then the third agent faces the

same problem as the first agent, so their private signal determines their choice. In this

scenario the fourth agent faces the same problem as the second agent and the fifth agent faces

the same problem as the third agent and so on.
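The decision rules described above can be sketched as a short simulation. This is an illustrative sketch only, not code from Bikhchandani et al (1992): the function names, the 0/1 coding of actions and signals, and the `net` counter (adoptions minus rejections among pre-cascade actions) are assumptions introduced here.

```python
import random


def simulate_sequence(n_agents, p, true_value=1, rng=None):
    """Simulate one sequence of agents in the specific (binary) model.

    Actions and signals are coded 1 (adopt / H) and 0 (reject / L).
    Each private signal matches the true state with probability p.
    """
    rng = rng or random.Random()
    actions = []
    net = 0  # adoptions minus rejections among pre-cascade actions
    for _ in range(n_agents):
        signal = true_value if rng.random() < p else 1 - true_value
        if net >= 2:
            action = 1            # 'up' cascade: imitate, ignore signal
        elif net <= -2:
            action = 0            # 'down' cascade: imitate, ignore signal
        elif net == 0:
            action = signal       # same problem as the first agent
        else:
            lean = 1 if net > 0 else 0
            if signal == lean:
                action = lean     # signal agrees with the predecessor
            else:
                action = rng.randint(0, 1)  # indifferent: mix at random
        if abs(net) < 2:          # only pre-cascade actions update the count
            net += 1 if action == 1 else -1
        actions.append(action)
    return actions


def cascade_outcome(actions):
    """Return 'up', 'down' or 'none' for a simulated action history."""
    net = 0
    for action in actions:
        net += 1 if action == 1 else -1
        if net >= 2:
            return "up"
        if net <= -2:
            return "down"
    return "none"
```

For example, `cascade_outcome(simulate_sequence(10, 0.7))` classifies one simulated run of ten agents. Repeating this many times gives empirical cascade frequencies that can be compared with the closed-form probabilities.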

The decision rules outlined above are used by Bikhchandani et al to compute the probability

of an ‘up’ cascade, ‘down’ cascade, and no cascade occurring after an even number of agents,

which can be seen in the set of equations (1). They also compute the probability of a correct

cascade (here an ‘up’ cascade), an incorrect cascade, and no cascade occurring after an even

number of agents, which can be seen in the set of equations (3).

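\[ \frac{1-(p-p^{2})^{n/2}}{2}, \qquad \frac{1-(p-p^{2})^{n/2}}{2}, \qquad (p-p^{2})^{n/2}, \tag{1} \]

\[ \frac{p(p+1)\left[1-(p-p^{2})^{n/2}\right]}{2(1-p+p^{2})}, \qquad \frac{(p-2)(p-1)\left[1-(p-p^{2})^{n/2}\right]}{2(1-p+p^{2})}, \qquad (p-p^{2})^{n/2}, \tag{3} \]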

The authors use these equations to show that as the number of agents (n) and the precision (p)

of the information agents receive increase, cascades become more likely and are more likely

to occur earlier3. Furthermore as the volume of information agents receive increases the

likelihood of the correct cascade occurring increases.
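These comparative statics can be explored numerically. The sketch below evaluates the closed-form probabilities for an even number of agents n, taking the no-cascade probability to be (p − p²)^(n/2) and splitting the remaining probability between the two cascades. The function names are assumptions introduced here for illustration.

```python
def cascade_probabilities(p, n):
    """Return (up, down, none) cascade probabilities after n agents (n even)."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    none = (p - p**2) ** (n // 2)
    up = down = (1 - none) / 2   # symmetric before the true value is known
    return up, down, none


def correct_cascade_probabilities(p, n):
    """Return (correct, incorrect, none) cascade probabilities after n agents."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    none = (p - p**2) ** (n // 2)
    denom = 2 * (1 - p + p**2)
    correct = p * (p + 1) * (1 - none) / denom
    incorrect = (p - 2) * (p - 1) * (1 - none) / denom
    return correct, incorrect, none
```

Evaluating `correct_cascade_probabilities` over a grid of p and n shows the no-cascade term shrinking as n grows and the correct-cascade term rising with p.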

2 Cascades in Bikhchandani et al (1992) are split into ‘up’ where agents adopt, and ‘down’ where agents reject.

3 As n increases, the probability of no cascade declines exponentially.

The most significant result from the general model is that after applying only a few mild

assumptions it can be shown that a cascade will eventually start: Provided an agent is far

enough along the sequence, then by the strong law of large numbers, they will be able to infer

the true value of V from prior actions with near certainty, thus the agent will ignore their

private signal and start a cascade. But, as the authors emphasise, cascades will tend to begin

before the agent can infer V with near certainty, meaning they will often be incorrect4 (see

also Banerjee (1992) and Welch (1992)).

Bikhchandani et al (1992) emphasise the fragile nature of cascades, and this is the key

distinction of conformity arising from information cascades compared to the traditional

explanations of conformity (deviation sanctions etc). This fragility is down to the lack of

information aggregation within a cascade: Agents in a cascade imitate their predecessors, so

none of their own information is transmitted through their action; consequently subsequent

agents make identical inferences from the decision history and will also imitate. Therefore

decisions in a cascade are based on less information than if no prior agent had imitated.

Additionally, cascades can begin on the basis of a small amount of information, and the

authors suggest this will be true for many cascades; in their model a cascade starts if the first

two agents receive the same signal. But if cascades begin on the basis of only a small amount

of information, the converse also holds: a small amount of new contradictory information can

cause the cascade to break down or even reverse. However this fragility helps partially to

alleviate the lack of information aggregation, since it means that multiple cascades are more

likely, which will increase the amount of information aggregation. Additionally fragile

cascades will mean that incorrect cascades will be more likely to be reversed.

Bikhchandani et al (1992) suggest that the fragility of cascades and the resulting cascade

reversals can explain fads: sudden, often inexplicable changes in social behaviour. If agents

initially converge on an incorrect cascade, this is obviously non-optimal, but because of the

fragility of cascades any new information suggesting the current cascade is incorrect can lead

to the rapid breakdown and reversal of the cascade. This process can occur multiple times

with agents potentially alternating between adoption and rejection cascades. Additionally, if

there is some possibility (even if minute), that the underlying value will change, this can also

cause fads, since agents will try to anticipate when the underlying value will change.

Therefore behaviour may change more often than the underlying value.

4 Even for very informative signals, high p.

What form could this new information take? It is unlikely to be the private information agents

receive, since this can only affect one agent at a time; a more likely candidate would be

public information releases such as government advice or the widely publicised findings of a

scientific study. A key conclusion of Bikhchandani et al (1992) is that public information

releases may not always be socially beneficial. If before a cascade, ‘noisy’ public information

is released, i.e. an inconclusive study whose findings may subsequently be overturned, then

this can result in harmful transient fads. The authors illustrate this point using real life

examples, for instance oat bran: an initial study suggested that consuming oat bran could

lower cholesterol levels; this led to a fad, however after a later study contradicted this finding

the fad died. A further study later found that the original findings may have been somewhat

correct, suggesting the original cascade may not have been incorrect. Although agents were

not directly harmed, this example does illustrate how ‘noisy’ public information releases can

result in non-beneficial behaviour change, which may be costly in itself.

Conversely, public information releases will definitely be socially beneficial if they are

conclusive, since they then won’t trigger a harmful fad. Furthermore, once in a cascade all

public information releases are unambiguously beneficial, because they are a source of new

information. The release of even a small amount of public information (e.g. less than a single

agent’s signal) can be sufficient to shatter a cascade. Here the authors underline the fragility

of cascades, by pointing out that even a long lasting cascade with a large number of agents,

can be shattered, provided that the public information release can offset the information

transmitted by the last agent prior to the start of the cascade.

Significantly the authors also find that even if public information releases are infrequent,

provided there is a positive probability that one occurs, then the correct cascade will

eventually occur: By the law of large numbers, as public information releases accumulate the

correct choice becomes clearer and the correct cascade becomes more likely. The authors

support this point using a numerical example, which shows that even if public information

releases occur at only 10 in every 1000 agents, the correct cascade is substantially more likely

to occur.

The authors examine the predictions of their model in a number of contexts. For instance they

examine the effect of agents with heterogeneous precision levels. They find that if high

precision agents decide first, then a cascade will arise sooner and even less information will

be aggregated. This is because low precision agents are liable to imitate their high precision

peers, believing this will result in a better choice (further supported by findings from

experimental psychology (Deutsch & Gerard, 1955)). The authors interpret this finding as

implying that sudden, inexplicable changes in social behaviour may be due to cascades

arising from the actions of so called “early adopters” or community leaders, rather than grand

causal forces. If the wider community perceives such individuals as better decision makers,

they will likely imitate their actions, resulting in a cascade.

Furthermore the authors posit that socially, it may be optimal for less precise agents to be

near the start of the sequence and more precise agents to be near the end. High precision

agents trust their signal more, so are more likely to deviate and break a cascade. This is

important since if the initial cascade is wrong and all higher precision agents are at the start

of the sequence, then the incorrect cascade won’t reverse. In other words, having high

precision agents towards the end of the sequence is socially beneficial since it increases the

chances of multiple cascades, meaning greater information aggregation and thus better

choices. However the authors concede that such a sequence is unlikely to arise naturally

given the propensity for imitation by low precision agents.

The authors challenge a common explanation for conformity: peer pressure, which can be

interpreted as a coercive non-monetary sanction against deviants. Instead they argue that

rather than peer pressure, inexperienced agents imitate their more experienced peers in order

to obtain better information. This is how Deutsch & Gerard (1955) interpret the results of

Asch’s (1952) line-length group evaluation experiment.

The authors also examine the issue of stigma: the negative typecasting of agents who deviate

from social norms. Drawing on Ainlay, et al (1986), they describe stigma as a localized phenomenon, which

agents learn from other group members, especially parents. The authors argue that stigma

could result from a ‘down’ cascade: Initial bad information leads to rejection of a job

applicant, this rejection transmits a signal leading to further rejection and so on resulting in a

cascade and the stigmatisation of the job applicant. The authors point out that the reverse may

be true: Good early performance could earn an agent a reputation as a star performer, other

firms would then be more likely to hire the agent simply based on their reputation, thus an

‘up’ cascade could occur. However as the earlier analysis of the fragility of cascades

demonstrated, such a reputation is likely to be brittle: a few poor performances could soon

break the ‘up’ cascade.

A limitation that the authors acknowledge is that their model doesn’t effectively address fashions, i.e.

cases where there is no true underlying value and the value of adoption instead depends on how other

agents act. However, as the authors point out, in such situations, where agents are attempting

to forecast the actions of other agents, information cascades will still play a prominent role. It

is a weakness of the paper that the authors do not propose how their model could be modified

to account for such scenarios.

Another criticism of the paper would be that the authors do not address the issue of learning.

Potentially this could have a big impact; since cascades are widespread they will affect a

diverse range of contexts, so there would be plenty of scope for learning. The paper also does

not examine situations in which agents face more than just two options. Such situations are

widespread, for instance there is a large range of models to choose from when buying a car.

The paper also doesn’t address the importance of agent error or communication, both of

which could have a substantial impact.

The authors conclude their paper by summarizing the main findings and implications; they

also suggest how the paper could be extended. For practical applications they advise

including other factors alongside information cascades, such as deviation sanctions or

conformity preference, so as to provide a richer analysis. An extension the authors suggest is

heterogeneous but correlated values of adoption, i.e. V is no longer uniform across agents. An

alternative avenue of investigation would be ‘liaison’ agents, who straddle multiple groups

and can therefore spread or break local cascades across groups. They also advocate the use of

experimental tests to investigate how cascades form and change in order to deepen our

understanding of social change.

There are several other theoretical papers on information cascades that make contributions

that are worth considering. Bikhchandani, et al (1998) return to the topic of information

cascades in a later paper in which they cover much of the same ground and use the same

model as in their seminal paper (Bikhchandani, et al., 1992). They reach broadly similar

conclusions; however, they do arrive at some new findings. They discuss the case of more

than just two alternatives, concluding that as the number of alternatives increases, cascades

will still arise but will take longer to start and more information will be aggregated.

Furthermore they conclude that a degree of finiteness is key to cascade formation: if the set of

alternatives is continuous (e.g. all points in the interval [1,10]) then a cascade cannot arise

because all agents, even those late in the sequence, will adjust their choice based on their

private signal even if only slightly. This means that subsequent agents can infer the private

information of prior agents and this prevents a cascade from arising.

Chamley (2004) devotes an entire chapter of “Rational Herds: Economic Models of Social

Learning” to the topic of information cascades. In particular the author discusses the

difference between a cascade and a herd, which can be summarised as follows. A cascade occurs

when all agents herd on the basis of a sufficiently strong public belief and private information

is disregarded; there is no social learning and the cascade lasts forever. In a herd, by contrast, all

agents take the same decision but not all agents may be herding; some may make their

decision based on private information. Because some agents may not be herding and there is

some probability that the herd may be broken, some social learning is possible although the

amount is likely to be low. In other words, an agent who deviates from the herd reveals that they

have received a strong private signal; subsequent agents infer this and act accordingly. The

author also raises an interesting point, which is that queues may increase social welfare. Because

queues carry a cost, they may induce agents to deviate from the cascade and

increase experimentation with alternative options. This experimentation may lead to useful

discoveries and increased information aggregation which will increase social welfare.

Banerjee (1992) also developed a theoretical model of information cascades and herding.

This model differs somewhat from that of Bikhchandani, et al (1992). For example in the

general model of Bikhchandani, et al (1992) indifferent agents follow their own information,

whereas in Banerjee (1992) an indifferent agent ignores their own information and copies their

predecessor. Banerjee justifies this tie-breaking convention by arguing that if a second agent

receives a signal different to that which the first agent received then precisely because the

first agent has already made their decision the second agent will disregard their own

information and imitate.

Banerjee (1992) reaches similar conclusions to Bikhchandani, et al (1992), finding that in

equilibrium herding will be extensive in his model. Further, as the probability of an agent

receiving a signal and the precision of the signal increase, the likelihood of the correct

cascade occurring increases. Herding makes agents less responsive to their own information;

this makes the agents’ decisions less informative to subsequent agents, which can result in

inefficient outcomes. So as to prevent this it may be in the social interest that some agents

are constrained so that they have to use their own information.

However Banerjee (1992) also raises some interesting points. For instance the author

suggests that often there is a payoff to being an early adopter of the correct option; this

creates an incentive to deviate and go against the majority, which acts against herding.

Banerjee (1992) also suggests that waiting costs may impact the result, since they may alter

the order of choice: If waiting costs are low enough, the order could rearrange so that agents

that are of higher precision or are better informed choose first, and lower precision or worse

informed agents choose later, leading to an efficient outcome. However, even if waiting costs

are low some low precision agents may choose before their high precision peers, i.e. the order

of choice doesn’t fully rearrange; this would result in an inefficient outcome.

However Banerjee (1992) has been criticised because the idiosyncrasies of the structure of

the model mean the robustness of its properties can’t be analysed (Chamley, 2004). Further,

Banerjee (1992) doesn’t emphasise the fragility of cascades to the extent that Bikhchandani,

et al (1992) do; thus it may be less useful in explaining the sudden changes that are

characteristic of fads.

Experimental Literature

Since the publication of Bikhchandani, et al (1992) there have been a number of papers

published involving experimental tests of information cascades. The most influential

experiment is that developed by Anderson and Holt (1996), summarised as follows. In

the experiment there are two urns, denoted A and B. In urn A there are 2 red marbles and 1

blue marble; in urn B there are 2 blue marbles and 1 red marble. Prior to the start of the

experiment a coin flip is used to select one of the urns, the contents of which are then

transferred into a third urn. Subjects, having been randomly sorted into a sequence, observe a

private draw from the third urn and then guess which urn has been selected; these guesses are

recorded and made public. This experimental design has formed the basis for most
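The Bayesian inference a subject faces in this design can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes draws are made with replacement, that earlier guesses fully reveal the draws behind them (which holds only before a cascade starts), and a prior of ½ on each urn from the coin flip. The function name is hypothetical.

```python
from fractions import Fraction


def posterior_urn_a(red_draws, blue_draws):
    """Posterior probability that urn A (2 red, 1 blue) was selected.

    Assumes a prior of 1/2 on each urn and independent draws with
    replacement; red_draws and blue_draws count the observed or
    inferred draws of each colour.
    """
    like_a = Fraction(2, 3) ** red_draws * Fraction(1, 3) ** blue_draws
    like_b = Fraction(1, 3) ** red_draws * Fraction(2, 3) ** blue_draws
    return like_a / (like_a + like_b)
```

Under these assumptions a third subject who infers two red draws from the first two guesses and then privately draws blue has `posterior_urn_a(2, 1)` equal to 2/3, so guessing urn A against their own draw is optimal. This is the cascade logic of the theoretical model in the urn setting.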

subsequent experimental investigations of information cascades. Several other papers have

been authored by Anderson & Holt (1997; 2000; 2008) in which they build upon the design,

adding modifications such as incentivising subjects (Anderson & Holt, 1997).

Çelen & Kariv (2004) adapt the experiment by using a continuous rather than a discrete

signal, i.e. the signal is drawn from the range [-10, 10]. They also elicit subject beliefs by asking

subjects to choose a cut-off point for their signal which will determine their action. These

alterations are made so as to be able to distinguish between information cascades and

herding. This is difficult because subjects in a herd and in a cascade act identically, and it is

therefore necessary to know the beliefs of subjects. In two further papers (Çelen & Kariv,

2003; 2005) the same authors make further modifications: Investigating the different

outcomes arising under perfect versus imperfect information, the authors simulate imperfect

information by limiting subjects to observing only their immediate predecessor.

Hung & Plott (2001) modify the payoff institutions of the standard design: they include the

usual individual payoff institution where the payoff of subjects is determined solely by that

subject’s actions. Additionally a majority payoff institution is included, under which subjects’

payoffs depend on whether the majority decision was correct. Further they include a

conformity payoff institution, under which subjects’ payoffs depend on whether their decision is the same

as the majority.

Other studies that also adapt the Anderson and Holt experimental setup include Ziegelmeyer,

et al (2010), Nöth & Weber (2003), and Kübler & Weizsäcker (2004). Despite its widespread

use in experiments investigating information cascades and herding, there are alternatives to

the Anderson and Holt urn experiment. For instance Camerer & Weigelt (1991) conduct an

experiment into very transient cascades in the financial markets. Their experimental setup

involved subjects taking part in a laboratory double-oral auction having been given an

endowment of assets. A certain number of subjects were designated as ‘insiders’ and given

information about the returns on assets that other subjects were ignorant of.

Camerer & Weigelt also provide an important justification for using experiments to

investigate information cascades: although cascades are widespread, it would be difficult to study

them using real world data since many cascades are very short term. Furthermore

laboratory experiments allow the experimenters to control the flow of information, which is

key for distinguishing between a herd and a cascade.

Huck & Oechssler (2000) develop an interesting experimental design to test the link between

rational Bayesian learning and information cascades. They design a series of decision tasks

which they include as part of an undergraduate economics exam. To test whether those

subjects whose decisions conform to Bayesian learning are intentionally doing so, subjects

are asked to explain the reasoning process behind their decision.

In terms of the results of experiments concerning information cascades, there is strong

support for many of the theoretical conclusions of Bikhchandani, et al (1992) provided by

Anderson & Holt (1996; 1997; 2000; 2008). They find that cascades are commonplace,

forming on 80% of the occasions where it is possible for them to do so. They also find that

cascades are fragile and easily broken by errors. These results have been replicated in further

studies, for instance Hung & Plott (2001).

However, they also find that errors, when incorporated into the model as a possibility (which

is not the case in Bikhchandani, et al (1992)), help to explain the experimental results. Further,

when subjects are incentivised (Anderson & Holt, 1997; 2008) this substantially reduces the

number of errors subjects make; it is key that there is an incentive, but the size of the

incentive doesn’t seem to matter.

Anderson & Holt also raise questions about the use of Bayesian learning in explaining subject

behaviour: previous controlled experiments have found that subjects deviate from Bayesian

learning and employ other methods such as simple counting heuristics. Furthermore much of

the evidence used to support the standard theoretical explanation is anecdotal, and hence open

to scrutiny (Anderson & Holt, 1997). However they conclude from their results that Bayesian

learning is widely employed by subjects, although it is not universal; additionally, deviations

from Bayesian learning are mainly due to error.

Çelen & Kariv (2004) find that cascades occur frequently and that incorrect cascades are very

rare (Hey & Allsopp (1999) make the same finding). They also find that subjects tend to

overvalue their own information over public information but over time this weakens and

subjects move closer to Bayesian updating. Building upon Anderson & Holt (1997; 2008)

they incorporate errors into a Bayesian learning model; the resulting model successfully

predicts subject behaviour in the experiment. This suggests that the theoretical assumption

that subjects use Bayesian learning is generally correct and that subject error can largely

explain deviations.

In their other papers (Çelen & Kariv, 2003; 2005) the same authors examine a different aspect of

information cascades. They compare perfect information, under which subjects observe all

prior decisions, with imperfect information, under which subjects observe only a fraction of

all prior decisions. The theoretical discussion of information cascades (including

Bikhchandani, et al (1992)) assumes perfect information, but in reality imperfect information

is more likely. Their results show that under imperfect information, imitation is substantially

reduced; hence the frequency of cascades is reduced. The authors conclude that Bayesian

learning performs well in explaining subject behaviour under perfect information; however

when there is imperfect information its performance is reduced. The authors also find that

imperfect information can help to improve modelling of cascades; in particular imperfect

information explains the episodic instability characterising fads and fashions. Under

imperfect information agents will face uncertainty about whether the cascade is correct; this

will result in sharp reversals and because subjects will continue to operate under imperfect

information these reversals may occur many times.

The study of Camerer & Weigelt (1991) looks at very short term cascades in the financial

markets called ‘mirages’, i.e. trades based on the belief that another trade was motivated by

private information even if it wasn’t. Their results suggest that such cascades occur

seemingly at random, without any tell-tale signs. The authors conclude that ‘mirages’ may be

caused by behavioural factors such as the representativeness bias: subjects overgeneralise

from their past experiences of mirages. Alternatively ‘mirages’ may arise if Bayesian

learning is compromised, which the presence of noise traders would accomplish. This

supports the argument of Anderson & Holt (1997; 2008) and Çelen & Kariv (2004) that

deviations from Bayesian learning are mainly due to error.

Kübler & Weizsäcker (2004) conduct an experiment in which subjects may purchase signals.

They find that early subjects over-purchase signals but that later subjects don’t. The

explanation for this is that subjects believe that they are less likely to make errors than are

other subjects, so they place more faith in their own information over public information (this

overconfidence of subjects is supported by the findings of Nöth & Weber (2003)). Early

subjects therefore lack confidence in the decisions of predecessors and so purchase a signal,

but eventually enough subjects will have made their decision that subjects will have enough

confidence and so won’t buy a signal. This suggests that, contrary to the theoretical

assumptions of Bikhchandani, et al (1992), the fragility of a cascade will vary: as it lengthens,

subjects grow more confident that it is correct and it therefore becomes less fragile. This

argument is further supported by the findings of Ziegelmeyer, et al (2010).

The experimental evidence of subject overconfidence (Çelen & Kariv, 2004; Kübler &

Weizsäcker, 2004; Nöth & Weber, 2003) suggests that the tie-breaking convention of

Bikhchandani, et al (1992), that indifferent agents follow their own information⁵, is more

realistic than the tie-breaking convention of Banerjee (1992), where indifferent agents ignore

their own information and imitate their predecessor. The evidence suggests that subjects put

greater weight on their own private information than public information, which suggests that

if indifferent they will rely on their own information.

⁵ This is the tie-breaking convention for the general model.

Huck & Oechssler (2000) conclude from their study that a simple heuristic, such as following

one’s own signal, is better able to explain experimental data than Bayesian learning (this is also a

finding of Hey & Allsopp (1999)). In the study about 50% of subjects made decisions

consistent with Bayesian rationality; however, when asked to explain their decision very few

could successfully explain Bayes’ rule. The authors re-examine Anderson & Holt (1996) and

find that a simple heuristic would work just as well as Bayes’ rule much of the time. This

suggests that although it may appear that Bayesian rationality is being employed by subjects,

in fact it is not and simpler heuristics are being used. However a counterpoint to this would

be that although subjects aren’t using Bayesian learning it is as if they are (Friedman, 1966);

evidence for this is provided by the fact that the decisions of subjects are the same as if they

had applied Bayesian learning.

In summary, the key theoretical paper concerned with information cascades is Bikhchandani,

et al (1992). The authors show that while it is individually rational to imitate, this limits the

transmission of private information, which means that less information is publicly

aggregated. This reduces the quality of later decisions, leading to a reduction in social

welfare. Furthermore they emphasise that cascades are innately fragile precisely because only

a little information is aggregated. The experimental evidence supports many of the

conclusions of Bikhchandani, et al (1992), in particular that of Anderson & Holt (1996; 1997; 2000;

2008), who develop the standard experiment that has been used and modified in most later

experimental studies of the topic. However other experimental results raise questions about

the rationality assumptions: Bayesian learning doesn’t appear to be universally applied,

simpler heuristics are often used by subjects (Huck & Oechssler, 2000), and evidence of

overconfidence is also found (Kübler & Weizsäcker, 2004).

3. Experiment

3.1. Outline

We now present an experiment designed to test the following hypotheses:

The first hypothesis is developed to address a point made by Chamley (2004, p. 84), which is

that since queueing carries an opportunity cost it will increase experimentation by agents,

which should improve information discovery. Thus a queue could improve social efficiency.

The hypothesis is stated as:

i) H0: The imposition of a queue in which subjects face a cost of waiting will result

in increased deviation from the cascade and more information aggregating.

Ha: The imposition of a queue in which subjects face a cost of waiting will not

result in increased deviation from the cascade and more information aggregating.

The second hypothesis arises from a point raised in Bikhchandani, et al (1998), that if there

are more than just two options, cascades will take longer to form and more information will

be aggregated. The hypothesis is stated as:

ii) H0: If the number of options increases from two to three, then this will result in

cascades arising later and more information aggregating.

Ha: If the number of options increases from two to three, then this will not result

in cascades arising later and more information aggregating.

We first give an outline of the three treatments we run. We then present a section detailing

the technical analysis outlining our predictions for subject behaviour in each of our

treatments. A further section details how the experiment can be run and any practical issues

which may arise. Finally we explore how the experiment could be further extended.

In all treatments in this experiment we define a cascade as: a subject ignoring their own

information and instead imitating the choice of prior subjects. In this setup cascades are

infinitely long: since the precision of subjects and signals doesn’t vary, the information and

inferences of the subject who started the cascade will be the same for all subsequent subjects,

who should therefore make an identical choice, and so the cascade is propagated. The only way

a cascade can be broken in our setup is through subject error.

3.2. Basic Treatment

This treatment is based on the urn experiments of Anderson and Holt (1996; 1997; 2008);

however, those experiments are here modified so that other treatments may be run. Anderson

and Holt had subjects guess which urn they were observing a draw from, whereas in this

experiment subjects choose the urn from which they wish to receive a draw and, following

their draw, guess the ‘true value’ of that urn. Subjects know the probability that a draw

from each urn is the true value; in this experiment the probability is constant at 80% for both

urns. We run this treatment to act as a control for the other treatments in our experiment.

This treatment can be summarised by the following: There are 2 urns, each containing 10 balls of

two types (red and blue). In each of these urns one of the ball types is the ‘true value’ and

yields an increased payoff. Each urn contains a certain number of each ball type; the

proportion that are the true value ball type represents the probability of the urn, i.e. the

likelihood that a draw from the urn will be the true value. In all the treatments of this

experiment 4/5 or 80% of the balls in an urn are the true value; hence the probability of the

urn is 80%. Subjects are informed of the probability of the urns prior to the start of the

experiment.

Subjects sequentially choose one of the urns, from which they then observe a private draw;

based on this draw and the past decisions of other subjects they then select the ball type that

they believe is the true value of the urn. The subject’s urn and true value choices are made

public and subsequent subjects can hence aggregate this information to make more accurate

judgements. Once all subjects have performed this process the round ends and subjects are

privately informed of their earnings from that round. Prior to the start of the next round new

true values are randomly chosen for the urns; the process then starts again, and this continues

until all rounds are complete.

3.3. Waiting Treatment

This treatment was developed to test the first hypothesis that if subjects face a queue with a

waiting cost, then deviation from the cascade will increase and more information will be

aggregated.

There are several motivations for researching this aspect of information cascades. Queueing

is a widespread real life phenomenon; the opportunity costs of queueing can be high. Many

people are willing to queue for hours or even days to gain access to a music festival or to be

able to purchase the latest technological innovation. Furthermore, many of these queues are

motivated at least in part by fads and fashions, thus analysing queueing from an information

cascade perspective may yield useful and interesting findings. However there has been little

or no investigation into queueing in either the theoretical or experimental literature.

A further motivation is that in the literature the main inefficiency of information cascades is

that while it is rational for individual agents to conform, such conformity at the aggregate

level results in social loss. However if, as Chamley (2004, p. 84) suggests, queueing acts to

reduce the social loss of cascades, then it represents a ready-made institution for alleviating

the costs of cascades. Given the high occurrence of queueing, it also suggests that the current

literature may have exaggerated the inefficiency arising from cascades. Thus investigating

queueing allows us to provide clarity about the social costs of cascades, and to what extent

these are alleviated by queueing.

The following analogy, similar to that used by Easley and Kleinberg (2010, pp. 425 - 430),

will help to further explore the queueing aspect of information cascades: Imagine a street

with two new restaurants, which we denote A and B respectively. Customers have no

information about the relative quality of the restaurants, but they infer that the busier

restaurant is better and so choose it. Assume the first customer chooses restaurant A; the next

customer observes this and infers that restaurant A is superior and so imitates, as does the

next, and a cascade occurs with all customers choosing restaurant A over restaurant B. The

restaurant soon fills up and a queue forms outside it. However queueing carries

costs: first there is the opportunity cost of waiting, and secondly, if the queue is long enough,

customers at the back of it will not get served; thus for each customer in the queue there

is some probability that they will not get served. These costs of queueing can be interpreted

as a negative externality of the information cascade. They arise because it is rational for each

individual agent to conform and imitate the choices of prior agents; however this results in

too many agents making the same choice, leading to the queue. To summarise, the aggregate

conformity of agents forces individual agents to pay a cost in the form of a queue.

It is important to note that the externality doesn’t affect all agents, only those agents in the

queue; early agents get to the restaurant before a queue forms and thus avoid the externality.

Further, the effect of the externality on queueing agents varies depending on the agent’s

position in the queue: an agent toward the start of the queue will queue for only a short

period of time and the probability that they don’t get served will be very low, so the

externality will be small, whereas an agent near the end of the queue will have to wait a long

time and face a high probability (if the queue is long) that they won’t get served; hence the

externality will be large.

If the costs of queueing are sufficiently high customers will view it as in their interest to leave

the queue (deviation from the cascade) and enter the empty restaurant B. Of course since

restaurant B is empty they have no indication of its relative quality. However it may be that

the first customer chose restaurant A at random or by error in which case the cascade is

incorrect. If this is the case and both restaurants are of equal quality then deviating customers

will be better off since they will avoid the cost of queueing and they will consume a meal of

equal quality, so they reduce their costs without lowering their benefits. In other words

deviating customers increase their individual efficiency, but how will this impact social

efficiency? The answer to this depends on how the information transmitted by deviating

customers is interpreted by other customers still in the queue, i.e. whether they view it as true or

due to error. If the information is interpreted as arising from erroneous decisions then it will

be ignored and any social efficiency increase will be minor. But if it is interpreted as true then

it will be aggregated and further agents will deviate, thus social efficiency will increase

significantly.

Whether information arising from deviations is interpreted as true and thus aggregated may

depend on how many customers are willing to deviate and experiment. If only one customer

is willing to deviate then this is more likely to be interpreted as an error, but if several

customers are willing to experiment when faced by a queue, then other customers are more

likely to interpret this as an accurate reflection of quality and also deviate.

This raises a question: Why would some customers be more open to experimenting than

others? One answer to this would be that behavioural biases will have a considerable impact

when subjects decide whether to deviate; in particular the degree to which an agent is risk averse

will be important. A highly risk averse agent won’t want to switch since this would mean

losing all the aggregated information about restaurant A and switching to restaurant B, where

no information has been accumulated. A less risk averse or risk loving agent, by contrast, will be

much more willing to switch: they will be less concerned about the loss of aggregated

information and more attracted by avoiding the costs arising from queueing.

Risk aversion will reduce experimentation to a lesser extent as the queue lengthens and the

probability of missing a turn increases. The risk of missing a turn has increased, while the

risk of switching to the other restaurant has remained constant. Thus risk aversion will act in

two ways: first to discourage experimentation due to the risk of lost information from

switching to the other restaurant, secondly to encourage experimentation due to the risk of

losing a turn because of a long queue. If the queue is long enough then the latter effect will

become more important and risk aversion will act to encourage experimentation.

Therefore differences in the degree of risk aversion between agents will cause some agents to

experiment by switching, while others will prefer to queue. Of course other biases may also

have an impact, for instance agents may have conformity preferences which will discourage

switching even if the queue is extremely long.

Here note that the information transmitted by deviations from the cascade isn’t simply

constrained to the act of the deviator leaving the queue and entering the other restaurant. In a

practical sense the information could also include how satisfied these deviators seem in the

restaurant and on leaving the restaurant; if they seem satisfied then other queueing customers

can infer that the other restaurant is of adequate quality and thus it is in their own interest to

also deviate.

3.4. Multi-value Treatment

This treatment was designed to test the second hypothesis that if the number of options

subjects can choose from increases from two to three, then cascades will arise later and more

information will be aggregated. It is the same as the basic treatment except for one

modification, which is that there are now three ball types (red, blue, and yellow) instead of

two (red and blue). It arises from a point raised in Bikhchandani, et al (1998), that if there

are more than just two options, cascades will take longer to form and more information will

be aggregated. This treatment is designed to test this assertion by adding a third ball type

(yellow). Furthermore, the treatment also acts as a robustness test of prior experimental

results, i.e. will the additional option curtail imitation and prevent cascades from arising?

Many real life situations involve more than two options, for instance when buying a car there

is usually a large range to choose from. If experimental results can’t be replicated when the

range of options rises from two to three, this implies that they can’t be generalised and their

practical applications are limited.

3.5. Experiment Summary

- 2 Urns: Urn A, Urn B.

- Each urn contains 10 balls of either 2 or 3 types depending on treatment.

- Series of subjects (10 – 20).

- 3 treatments: Basic, Waiting, Multi-value.

- Series of rounds for each treatment.

- Series of rooms: one for subjects to wait in, another containing the urns, and a third for

subjects to wait in until the experiment is over. The two waiting rooms should be separate so

that communication between subjects who have observed a draw and subjects who are

waiting to observe a draw is not possible.

Treatments:

- Basic: this is the control treatment. 2 ball types: Red and Blue; each type has a ½ probability

of being randomly chosen as the true value. Out of the 10 balls in the urn 8 will be the true

value and 2 will be the non-true value, meaning there is a 4/5 probability of the true value

being drawn.

- Waiting: same setup as in Basic, except that once a cascade develops later agents are forced

to queue.

- Multi-value: 3 ball types: Red, Blue, and Yellow; each type has a 1/3 probability of being

randomly chosen as the true value. Out of the 10 balls 8 will be the true value and 1 ball each

will be the two non-true values, so there is a 4/5 probability of the true value being drawn.

Thus if Red is the true value then P(R) = 4/5, P(B) = 1/10, P(Y) = 1/10.
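
The urn compositions listed above can be checked with a short sketch. The Python below is illustrative only: the function names build_urn and draw_probabilities are hypothetical and not part of the experimental protocol. It assumes, as stated above, 10 balls per urn, 8 of which are of the true-value type, with the remainder split evenly between the other types.

```python
from collections import Counter
from fractions import Fraction


def build_urn(true_value, ball_types, size=10, n_true=8):
    """Return the list of balls in one urn (hypothetical helper).

    Assumes the non-true balls are split evenly between the other types,
    as in the Basic (2 types) and Multi-value (3 types) treatments.
    """
    others = [b for b in ball_types if b != true_value]
    per_other, remainder = divmod(size - n_true, len(others))
    if remainder:
        raise ValueError("non-true balls cannot be split evenly")
    urn = [true_value] * n_true
    for b in others:
        urn += [b] * per_other
    return urn


def draw_probabilities(urn):
    """Probability of drawing each ball type in a single random draw."""
    counts = Counter(urn)
    return {ball: Fraction(n, len(urn)) for ball, n in counts.items()}


# Red is taken as the true value in both examples.
basic = draw_probabilities(build_urn("R", ["R", "B"]))
multi = draw_probabilities(build_urn("R", ["R", "B", "Y"]))
print(basic)
print(multi)
```

With red as the true value, the multi-value urn gives 4/5 for red and 1/10 each for blue and yellow, matching the probabilities stated above.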

Summary:

Subjects sequentially observe prior history, they then choose an urn from which they observe

a private draw, which they combine with prior history to choose a value which they think is

the urn’s true value. At the end of the round subjects learn their earnings from that round.

3.6. Technical Analysis

In the following analysis we follow the example of Bikhchandani, et al (1992) by using

Bayesian learning to model how subjects make their urn and value decisions. Subjects know

the prior probability of each value being the true value and the probability of the true value being drawn from

each urn; Bayes’ theorem is used to calculate the posterior probability of a value being the true

value given the signals the subject has observed. For a fuller exposition on the use of

Bayesian learning in this study see appendix A at the end of this paper; the appendix also

includes the calculations of the posterior probabilities used in this study. Subjects are also

assumed to be rational utility maximisers. Furthermore in our technical analysis subjects are

assumed to derive no utility from conformity etc.; they derive utility only from the return they

receive from their decisions in the experiment. So, as in Bikhchandani, et al (1992), a rational

utility maximiser aims solely to maximise the return they receive.
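
For the two-value treatments, the Bayesian updating described above can be written compactly. The expression below is a sketch rather than a reproduction of the appendix: it assumes equal priors of 1/2, a signal precision of 4/5, and conditionally independent draws. The symbols n_R and n_B (the numbers of red and blue signals a subject has observed or inferred for the urn) are notation introduced here.

```latex
\begin{align*}
P(\text{red} \mid n_R, n_B)
  &= \frac{\tfrac12 \left(\tfrac45\right)^{n_R} \left(\tfrac15\right)^{n_B}}
          {\tfrac12 \left(\tfrac45\right)^{n_R} \left(\tfrac15\right)^{n_B}
         + \tfrac12 \left(\tfrac15\right)^{n_R} \left(\tfrac45\right)^{n_B}} \\
  &= \frac{4^{\,n_R - n_B}}{4^{\,n_R - n_B} + 1}.
\end{align*}
```

Under these assumptions only the difference between the two counts matters: a difference of 0 gives 50%, a difference of 1 gives 4/5 = 80%, a difference of 2 gives 16/17 ≈ 94.12%, and a difference of 3 gives 64/65 ≈ 98.46%.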

3.6.1. Analysis of Basic Treatment

Prior to the start of the round, red or blue is randomly and independently selected as the true

value of urns A and B; for convenience let us assume that the true values are red and blue

respectively.

There is no prior history of choices for the first agent to observe, so they randomly choose

between the urns⁶, and this choice will determine which urn subsequent agents choose, since

it will give the chosen urn an information advantage. For convenience let us assume that they

select urn A. The first agent determines their value choice by the signal they observe, i.e. if

they observe a red ball they will select red as the true value.

The second subject observes the first subject’s choices and infers their signal. We predict

they will also choose urn A, because the first subject choosing urn A gives it an information

advantage over urn B: if the second subject switches to urn B the probability that their signal

is the true value is 80%; however, if the second subject stays with urn A and receives the same

signal as the first subject, the probability is 94.12%, a significant difference. Of course if the second agent

receives a different signal from urn A then they would have been better off switching; however, the

choice of urn must be made before the subject receives their private signal. We judge that the

high degree of certainty the subject enjoys if they receive the same signal will outweigh the

lower degree of certainty if the signal differs; therefore we feel our assumption is justified⁷.

The second agent will determine their value choice according to the signal they receive. This

assumption is backed up by the experimental findings of Kübler & Weizsäcker (2004) who

found that subjects generally believe other subjects are more error prone than they are, so

they place more weight on their own information than public information. Thus this implies

that if the second subject received a different signal from the first subject, they would follow

their own signal.

⁶ The only exception to this would be if the probabilities of the urns differed, in which case the subject would select the urn with the higher probability; however, here both urns have probabilities of 80%.

⁷ It may be that risk aversion will affect which urn the agent chooses: subjects exhibiting higher risk aversion will be more likely to switch to urn B and vice versa. However other behavioural biases may also have an impact, for instance conformity preferences. Trying to incorporate these aspects into our analysis would unnecessarily complicate things, and given that the different biases may act against each other, we have left them out.

The third subject observes the prior choices and infers the prior signals. Again we predict that

they will also choose urn A because of its information advantage, which is more pronounced

for the third subject: unlike the second agent there are no circumstances in which switching

to urn B confers an advantage over urn A: for instance the sequence R,B,R yields an 80%

probability of the true value being red, which is the same as the probability if the subject

switched, while if the sequence were R,R,R the probability would be 98.46%. The third

subject can begin a cascade if the first two choices are identical, in which case the subject

should imitate these prior choices and ignore their own signal. For instance Bayesian

inference implies that under the sequence R,R,B there is an 80% probability that red is the true value; thus

the subject should ignore their own blue signal and choose red, initiating a cascade. If

however the prior choices differ then the third subject will determine their choice based on

their private signal.
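
The posterior probabilities quoted for the second and third subjects can be reproduced with a few lines of Python. This is a minimal sketch under the assumptions of the basic treatment (equal priors, 80% signal precision, independent draws from the same urn); the function name posterior_red is hypothetical.

```python
def posterior_red(signals, precision=0.8, prior_red=0.5):
    """Posterior probability that red is the urn's true value.

    `signals` is a string of 'R' and 'B' draws from the same urn, each
    assumed to be an independent draw that shows the true value with
    probability `precision`.
    """
    like_red = prior_red          # joint weight if red is the true value
    like_blue = 1.0 - prior_red   # joint weight if blue is the true value
    for s in signals:
        if s == "R":
            like_red *= precision
            like_blue *= 1.0 - precision
        elif s == "B":
            like_red *= 1.0 - precision
            like_blue *= precision
        else:
            raise ValueError(f"unknown signal: {s!r}")
    return like_red / (like_red + like_blue)


# Sequences discussed in the text for the first three subjects.
for seq in ["R", "RR", "RB", "RBR", "RRR", "RRB"]:
    print(f"{seq:>3}: P(red) = {posterior_red(seq):.2%}")
```

Running the loop gives 80% for R, R,B,R and R,R,B, 94.12% for R,R, 50% for R,B, and 98.46% for R,R,R, in line with the figures used in the analysis.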

If the third agent initiates a cascade then all subsequent subjects should imitate and join the

cascade. Since only the first two signals are aggregated, all subsequent subjects are in the

same position as the third subject and will make the same inferences, resulting in the same

choice to imitate. Of course in an alternative setup this may not be the case: if the precision of

subjects or the signals they receive varies, then subjects may deviate from the cascade.

However in our setup the precision of subjects and signals is constant. A further point is worth

noting: only odd numbered subjects can begin a cascade; even numbered subjects can’t.

The fourth subject observes prior history and infers prior signals. Again we predict, because

of the information advantage, that they will choose urn A. If a cascade is in progress then the

fourth agent should imitate and join it. In the absence of a cascade the fourth agent should

choose according to their private signal.

The fifth subject observes prior history and again chooses the same urn. If a cascade is in

progress then they imitate. If no cascade exists, they can start one if the two prior subjects

received identical signals, as in the sequence R,B,R,R: whatever signal the fifth subject

receives, Bayesian inference implies that they should ignore it and instead imitate by

choosing red. However if the two prior subjects’ choices differ then the fifth subject should

follow their own signal.

The above analysis can be applied to all subsequent subjects: If there is a cascade, then

imitate. If there is no cascade, then depending on prior choices and the subject’s position

either start one by being the first agent to imitate, or choose according to the private signal.
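
The decision rule described above can be sketched as a short routine. It is an illustration only, assuming error-free subjects, a single urn, and the tie-breaking convention that an indifferent subject follows their own signal; predicted_choices and its return format are hypothetical names introduced here.

```python
def predicted_choices(signals):
    """Predict value choices for a sequence of private signals ('R'/'B').

    Returns (choices, cascade_start), where cascade_start is the 1-based
    position of the first subject who imitates regardless of their own
    signal, or None if no cascade forms.
    """
    lead = 0              # revealed red signals minus revealed blue signals
    choices = []
    cascade_start = None
    for position, signal in enumerate(signals, start=1):
        if abs(lead) >= 2:
            # Two net signals outweigh any single private signal: imitate.
            choices.append("R" if lead > 0 else "B")
            if cascade_start is None:
                cascade_start = position
            # The choice reveals nothing new, so `lead` is not updated.
        else:
            # No cascade: follow the private signal, which is thereby revealed.
            choices.append(signal)
            lead += 1 if signal == "R" else -1
    return choices, cascade_start


print(predicted_choices("RRBBB"))
print(predicted_choices("RBRRB"))
print(predicted_choices("RBRB"))
```

Because the revealed lead starts at zero and changes by one with each revealed signal, it can only reach two after an even number of revealed signals, which is why only odd numbered subjects can start a cascade in this sketch.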

Although it is possible that a cascade never arises (for example the sequence:

R,B,R,B,R,B,…), it is highly likely, as Bikhchandani, et al (1992) conclude, that one will

eventually occur. This is intuitive, since all a cascade requires is that the two subjects prior to

an odd numbered subject receive an identical signal; given that the true value will be drawn

with an 80% probability, this is very likely to occur at some point, and hence a cascade is

highly likely to occur.
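
How quickly a cascade becomes likely can be illustrated with a back-of-the-envelope calculation. The sketch below assumes error-free subjects who follow the decision rule set out above, so that a cascade starts as soon as one pair of consecutive subjects (1 and 2, 3 and 4, and so on) receives identical signals; the function name prob_no_cascade is hypothetical.

```python
def prob_no_cascade(pairs, precision=0.8):
    """Probability that none of the first `pairs` signal pairs agree.

    Conditional on the true value, two independent draws agree with
    probability precision**2 + (1 - precision)**2.
    """
    p_agree = precision ** 2 + (1.0 - precision) ** 2
    return (1.0 - p_agree) ** pairs


for pairs in range(1, 6):
    subject = 2 * pairs + 1  # the odd numbered subject following these pairs
    print(f"P(no cascade started by subject {subject}) = "
          f"{prob_no_cascade(pairs):.4f}")
```

With 80% precision a pair disagrees with probability 0.32, so under these assumptions the probability that no cascade has started falls to roughly 1% by the ninth subject.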

3.6.2. Analysis of Waiting Treatment

This is identical to the basic treatment until the point at which a cascade develops, after

which all subsequent subjects have to queue behind the ‘cascade urn’. Queueing carries a

penalty of a 50% probability that subjects will miss their turn⁸; however they can avoid this

by switching to the other urn. Thus up to the point that a cascade develops and subjects have

to queue the technical analysis is identical to that in the basic treatment⁹.

We will illustrate this using an example, for convenience let us assume that by the fifth

subject a cascade has developed in urn A. Thus from the sixth subject a queue develops

behind urn A. The sixth agent knows that if they queue then there is a 50% probability that

they will miss their turn and forgo any potential earnings from the round. Alternatively they

can avoid this by switching to urn B, guaranteeing them a turn, but at the cost of losing all the

information that has aggregated in urn A. We predict that if the sixth agent switched, they

would be the first agent to receive a draw from that urn; thus urn A has a substantial

information advantage over urn B. If they switch they face an 80% probability that their signal

is the true value, whereas queueing and joining the cascade in urn A yields either a 98.46%

probability of red being the true value if they receive a red signal, or an 80% probability if they

receive a blue signal. Thus the sixth subject must judge whether the cost of the 50% probability

of missing a turn is larger or smaller than the information cost of switching urns. If the former

cost is larger than the latter, then the subject should switch, and vice versa.

8 It might be more realistic if the probability of missing a turn varied with the subject’s position in the queue, i.e. subjects at the start of the queue face a low or negligible probability and, as the queue lengthens, the probability rises. Eventually the queue will be so long that those subjects at the end will be certain to miss their turn. However, for the purposes of simplicity and convenience a constant probability of 50% is used in this experiment.

9 Note here that risk aversion will have a different effect on urn selection in this treatment compared to the basic treatment. This is because there is now a risk for queueing agents of missing their turn, which was not present in the basic treatment. Hence risk aversion has two contrasting effects. Firstly, there is the risk of switching due to the loss of aggregated information; this will discourage the agent from switching. Secondly, there is the risk of queueing due to the 50% probability of missing their turn; this will encourage the agent to switch. Furthermore, the above is only true if a cascade has already formed. If a cascade has yet to form then risk aversion, as noted in footnote 2 in the basic treatment, can encourage switching. For instance the second agent can be 80% sure that their signal will be the true value if they switch, but if they don’t switch and receive a different signal to the first agent then they can only be 50% certain that their signal is the true value. In this situation a risk averse second agent may choose to switch.
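The comparison facing the sixth subject can be sketched numerically. The following is a minimal illustration, not part of the experimental design: it assumes a risk-neutral subject, the payoffs from the Payment section (2 ECU for a correct guess, 1 ECU for an incorrect one, nothing for a missed turn), and that urn A carries two net red signals, which is what the 98.46% and 80% figures above imply. The function name `expected_payoff` is introduced here for illustration only.

```python
from fractions import Fraction

# Parameters taken from the text.
ACCURACY = Fraction(4, 5)      # probability a draw matches the urn's true value
PAY_CORRECT, PAY_WRONG = 2, 1  # ECU per round
P_MISS = Fraction(1, 2)        # chance a queueing subject misses their turn


def expected_payoff(p_correct):
    """Expected ECU from a guess that is right with probability p_correct."""
    return p_correct * PAY_CORRECT + (1 - p_correct) * PAY_WRONG


# Switching: first draw from urn B, so following the signal is right 4/5 of the time.
switch = expected_payoff(ACCURACY)

# Queueing: with two net red signals in urn A, red is the true value with
# probability 16/17 before the subject draws. A subject in the cascade chooses
# red whatever they draw, so 16/17 is also their chance of guessing correctly.
p_red = ACCURACY**2 / (ACCURACY**2 + (1 - ACCURACY) ** 2)
queue = (1 - P_MISS) * expected_payoff(p_red)  # a missed turn pays 0

print(switch, float(switch))  # 9/5 1.8
print(queue, float(queue))
```

Under these assumptions switching has the higher expected payoff. A risk-averse subject may rank the two options differently, as footnote 9 discusses.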

Suppose the sixth subject switches, observes a blue signal and chooses blue. The

seventh agent infers from this that if they also switch and receive a blue signal then there is a

94.12% probability that the true value of urn B is blue; in this scenario it is better to be the

second switcher than the first. However, if the seventh subject receives a red signal

contradicting the sixth subject's signal, then blue and red are equally likely (50% each) to be

the true value; in this scenario it is better to be the first switcher than the second.

If both the sixth and seventh subjects switched and received the same signal, then the eighth

subject should switch and start a cascade. From this point on all queueing subjects should

switch and join the cascade, since urn A no longer has an information advantage over urn B.

Alternatively, if the sixth and seventh subjects switched and received contradictory signals,

then the eighth subject is in the same position as the sixth agent: whatever signal they receive,

there is an 80% probability that it is the true value.
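The posteriors quoted for the switchers can be reproduced with a short calculation. This is a sketch under the assumptions of the basic and waiting treatments (two equally likely values, signal accuracy 4/5); the function name `posterior_two_values` is hypothetical.

```python
from fractions import Fraction


def posterior_two_values(signals, target, accuracy=Fraction(4, 5)):
    """P(true value == target | signals) with two equally likely values."""
    matches = sum(1 for s in signals if s == target)
    misses = len(signals) - matches
    # The equal priors (1/2 each) cancel, so only the likelihoods matter.
    like_target = accuracy**matches * (1 - accuracy) ** misses
    like_other = accuracy**misses * (1 - accuracy) ** matches
    return like_target / (like_target + like_other)


# Signals observed in urn B by successive switchers.
print(posterior_two_values("B", "B"))    # sixth subject alone: 4/5
print(posterior_two_values("BB", "B"))   # seventh agrees: 16/17
print(posterior_two_values("BR", "B"))   # seventh disagrees: 1/2
print(posterior_two_values("BRB", "B"))  # eighth after a disagreement: 4/5
```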

The above analysis can be applied to all subsequent subjects. A major conclusion we draw is

that the more subjects that switch the more likely a cascade is to occur and thus the more

attractive it becomes to switch. Furthermore it is better to be a later switcher than an earlier

switcher, since more information will have been aggregated and it is more likely that a

cascade will have already begun.

A scenario worth exploring is the second subject choosing a different urn to the first subject;

this could be due to error or subject preference. How would this alter the above analysis? We

conclude that while it would have some effect on the waiting treatment it would not

fundamentally affect the analysis of any of the three treatments included in this experiment.

Our explanation is thus: imagine the first subject chooses urn A and receives a red signal. The

second subject for whatever reason chooses urn B and receives a blue signal. The third

subject should be indifferent between the two urns, so randomly chooses urn A and also

receives a red signal. The fourth subject faced with the choice between urn A and urn B,

should choose urn A, imitate and begin a cascade. Alternatively the third agent could choose

urn A and receive a signal contrary to that of the first subject. If the fourth subject then chooses

urn A, whatever signal they receive has an 80% probability of being the true value, whereas

choosing urn B and receiving the same blue signal as the second subject yields a probability of

94.12%. Let us assume the fourth subject chooses urn B. If they receive a blue signal then the fifth

agent should also choose urn B and start a cascade by imitating. If they receive a red signal

the fifth subject will be indifferent between the two urns; whichever they choose will give

that urn an information advantage, so the sixth agent should then choose that urn. Eventually

two consecutive subjects will receive the same signal from one of the urns; this will start a

cascade and all subsequent subjects will herd into this urn. Our analysis would only be

significantly affected if subjects kept making errors and failing to herd into one urn, so that a

cascade could develop in both urns; however we judge the likelihood of such an occurrence

to be negligible.

Thus even if the first two subjects choose different urns, eventually a cascade will begin in

one of the urns, resulting in all other subjects choosing that urn. Although the information

advantage will be smaller than in our standard analysis it will still exist. In other words if the

second subject chooses urn B this will reduce the cost of switching for later subjects, but our

main conclusions are unaffected.

3.6.3. Analysis of Multi-value treatment

This is identical to the basic and waiting treatments except for one modification: 3 rather than

2 ball types. This means that the probability of each type being the true value is one third (33.3%) rather

than 50%. The probability of the true value being drawn remains 80% but because there are

now two non-true values each has a 10% probability of being drawn (i.e. if red is the true

value then out of 10 balls, 8 will be red, 1 will be blue and 1 will be yellow). This compares

to the 20% probability of the single non-true value being drawn in the basic or waiting

treatments.

The theoretical analysis of this treatment is very similar to that in the basic treatment: The

first agent has no history to observe, so will choose randomly between urn A and B, this

choice will then determine which urn all subsequent agents choose due to the information

advantage of the chosen urn. The first agent will choose their value according to the signal

they observe. The second subject will choose the same urn and also determine their value

according to their signal. Subsequent agents observe prior history and their own private

signal, through Bayesian inference they determine which ball type is most likely the true

value, which they choose.

A difference between this treatment and the basic treatment is that the true value probabilities

change, e.g. the sequence R, R, B yields an 80% probability of red being the true value in the

basic treatment but an 87.67% probability in the multi-value treatment. This results from there

being an extra ball type, which lowers the probability of each type being the true value.

However, these differences do not fundamentally alter our analysis, since the choices of

subjects remain the same as in the basic treatment.
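The two figures for the sequence R, R, B can be checked with one general function. This is a minimal sketch assuming a uniform prior, signal accuracy 4/5, and that the remaining probability is split equally among the non-true values, as in the text; the name `posterior` and its arguments are introduced here for illustration.

```python
from fractions import Fraction


def posterior(signals, values, accuracy=Fraction(4, 5)):
    """Posterior over `values` given independent signals and a uniform prior."""
    wrong = (1 - accuracy) / (len(values) - 1)  # each non-true value equally likely
    weights = {}
    for v in values:
        w = Fraction(1)
        for s in signals:
            w *= accuracy if s == v else wrong
        weights[v] = w
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


basic = posterior("RRB", values="RB")    # two ball types
multi = posterior("RRB", values="RBY")   # three ball types
print(basic["R"], multi["R"])  # 4/5 64/73
```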

A more important difference between treatments is that in the basic or waiting treatment

cascades can only begin with odd-numbered agents, but in the multi-value treatment an

even-numbered agent can sometimes begin a cascade. This is best illustrated using an example.

Table 1 shows the signal sequence RYRB. After the signals R, Y, R, Bayesian inference implies

that the 4th subject should choose according to their signal if they receive an R or Y signal

(with a Y signal they will be indifferent between R and Y, but we assume that in that case they

follow their signal). If they receive a B signal, as they do here, then they should ignore it and

imitate by choosing R, since R is then the true value with 80% probability. Thus they

begin a cascade. This implies that cascades are more likely when there are three possible

values than when there are only two. It is beyond the scope of this paper to establish whether

in general cascades become more likely as the number of options increases.

Table 1

Agent            1      2        3      4

Private Signal   Red    Yellow   Red    Blue

Public Choice    Red    Yellow   Red    Red
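The fourth agent's decision in Table 1 can be checked for each signal they might receive. The sketch below assumes the multi-value parameters from the text (uniform prior, accuracy 4/5, 1/10 for each non-true value) and the tie-breaking rule stated above (follow the own signal when indifferent); `posterior` and `best_choice` are hypothetical helper names.

```python
from fractions import Fraction

ACCURACY = Fraction(4, 5)
WRONG = Fraction(1, 10)  # probability of drawing each of the two non-true values
VALUES = "RYB"


def posterior(signals):
    """Posterior over the three values given the signals and a uniform prior."""
    weights = {}
    for v in VALUES:
        w = Fraction(1)
        for s in signals:
            w *= ACCURACY if s == v else WRONG
        weights[v] = w
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def best_choice(history, own_signal):
    """Most probable value; ties are broken in favour of the own signal."""
    post = posterior(history + own_signal)
    top = max(post.values())
    best = [v for v in VALUES if post[v] == top]
    return own_signal if own_signal in best else best[0]


# The first three choices reveal the signals R, Y, R.
for signal in VALUES:
    print(signal, "->", best_choice("RYR", signal))
```

With a blue signal the agent chooses red, so their choice no longer reveals their signal.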

3.7. Running the Experiment

Each subject is allocated an I.D. and the order of subjects is randomly determined. Subjects

are given the instructions for the experiment and experimenters answer any questions subjects

may have. At the end of the experiment subjects privately collect payment of their earnings

from the experimenter.

Basic Treatment:

1) Prior to the start of the experiment, the experimenters randomly choose a true value

independently for urn A and B (This could be done using a coin toss, or by a draw

from a third urn with equal red and blue balls).

2) Subjects are brought into a waiting room, the urns are located in the next room so as

to ensure privacy when subjects observe their private draw.

3) Subjects sequentially leave the waiting room and enter the urn room, they choose one

of the urns.

4) Subjects observe a private draw from their chosen urn.

5) Subjects then choose a value.

6) Subjects leave into a third room, their choices are publicly recorded so that all other

subjects may view them (it may be simplest to write the choices on a large

whiteboard at the front of the waiting room). The choices of all subjects remain

publicly recorded for the entirety of the round so that subsequent subjects are aware of

the whole choice history.

7) Once all subjects have been through this process the round is over and subjects are

privately informed of their earnings from the round (this could be achieved by the use

of sealed envelopes).

8) Prior to the start of the following round, new true values are selected and the choice

histories are erased.
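The choices predicted for a round of the basic treatment can be sketched as a short function. This is an illustration only: it assumes every subject uses the same urn, behaves as a Bayesian, and follows their own signal when indifferent. Under those assumptions a subject ignores their signal once earlier choices reveal two more signals of one colour than the other. The function name `run_round` is hypothetical.

```python
def run_round(signals):
    """Return the public choices for a sequence of private signals ('R' or 'B')."""
    net_red = 0  # revealed red signals minus revealed blue signals
    choices = []
    for signal in signals:
        if net_red >= 2:
            choice = "R"  # red cascade: the own signal is ignored
        elif net_red <= -2:
            choice = "B"  # blue cascade: the own signal is ignored
        else:
            choice = signal  # the choice reveals the signal to later subjects
            net_red += 1 if signal == "R" else -1
        choices.append(choice)
    return choices


print(run_round("RRBBB"))  # cascade starts with the third subject
print(run_round("RBRRB"))  # cascade starts with the fifth subject
```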

Waiting treatment:

Same as the basic treatment until the point that a cascade starts, then

1) All subsequent subjects are sequentially asked to choose between queueing behind the

cascade urn, or switching to the non-cascade urn. They are informed that if they queue

there is a 50% chance that they will miss their turn this round.

2) If subjects switch then they proceed with their turn as normal, receiving a draw from

the non-cascade urn and then choosing a value which is publicly recorded.

3) Once all subjects have chosen whether to queue or switch, a coin is flipped to decide

if all queueing subjects miss their turn. If they don’t miss their turn, then they proceed

as normal with their turn.

4) As in basic setup, at the end of the round subjects are informed of their earnings, new

true values are selected and the choice histories are erased.

Note: the queue does not have to start as soon as a cascade begins. The experimenter can start the

queue at any point once a cascade has started, e.g. if a cascade began with the third subject the

experimenter could wait until the sixth subject before implementing the queue.

Also note that subjects in this treatment are informed prior to the start of the experiment

about the queue and the resulting possibility that they may miss their turn and lose any

earnings from the round. It is an open question whether results would be different if they

were uninformed about the queue; perhaps their actions would be more instinctual and closer

to how they would actually behave in the real world. Whatever benefits might arise from

failing to inform subjects about the queue, it represents the defining characteristic of this

treatment. Therefore failing to inform subjects of it or its consequences is, in our opinion,

tantamount to deception or close enough as to be unpalatable.

Multi-Value treatment:

This is identical to the basic treatment but with 3 values instead of 2; thus to select the

true value the experimenters should conduct a random draw from a nine ball urn with 3 balls

of each value.

Treatment Rounds:

How many rounds should be run in each treatment? One approach is for subjects to

participate in 10 rounds of whichever of the three treatments they are randomly chosen for.

The advantage of this approach is that only one set of instructions has to be given and the

experiment can proceed without interruption, however there is a potential issue in that

differences in the results of the experimental treatments may be caused by subject effects, i.e.

differences between subjects lead to differences between treatments.

An alternative approach that avoids this problem is for the experiment to be split into two

parts: In the first part, subjects take part in 5 rounds of the basic treatment. In the second part,

subjects then take part in another 5 rounds of either the waiting treatment or the multi-value

treatment. Any subject effects should be substantially weakened under this approach, instead

differences in results between treatments should be due to treatment effects. However this

approach would mean giving instructions for the basic treatment at the start of the experiment

and then half way through interrupting to give a further set of instructions for the other

treatment.

Furthermore, learning effects may have some impact: by the time the second treatment starts,

subjects may have learnt enough to alter how they would otherwise have behaved in the

second treatment. This could be controlled for by running some subjects in a version where

the basic treatment comes first and is then followed by the waiting or multi-value treatment,

and then running other subjects so that the order is reversed.

Payment:

Subjects are paid at the end of the experiment; they earn 2 ECUs (Experimental Currency

Units) per round for correctly guessing the true value, or 1 ECU for guessing a non-true value.

Each subject takes part in 10 rounds in total. In the basic and multi-value treatments this

means that each subject earns a minimum of 10 ECUs and a maximum of 20 ECUs.

However, in the waiting treatment, because there is a 50% probability that a queueing subject

will miss their turn, a subject could potentially earn 0 ECUs. To ensure that all subjects earn

something, each subject receives a show-up fee of 10 ECUs. Including this show-up fee means

that subjects can earn a maximum of 30 ECUs.
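The earnings bounds above follow from simple arithmetic, sketched below. The per-round payoffs and the show-up fee are taken from the text; the outcome labels and the function name `total_earnings` are introduced here for illustration.

```python
PAY_CORRECT, PAY_WRONG, SHOW_UP_FEE = 2, 1, 10  # ECU, as stated in the text


def total_earnings(outcomes, show_up_fee=SHOW_UP_FEE):
    """outcomes: one entry per round, each 'correct', 'wrong' or 'missed'."""
    per_round = {"correct": PAY_CORRECT, "wrong": PAY_WRONG, "missed": 0}
    return show_up_fee + sum(per_round[o] for o in outcomes)


print(total_earnings(["correct"] * 10))               # maximum with the fee: 30
print(total_earnings(["missed"] * 10))                # every turn missed: 10
print(total_earnings(["wrong"] * 10, show_up_fee=0))  # round earnings only: 10
```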

3.8. Extensions and Improvements

There are a number of extensions and improvements that could be made to our experiment.

Firstly, the multi-value treatment could be extended to establish whether the contention of

Bikhchandani, et al (1998), that as the number of alternatives increases cascades start later

and more information is aggregated, is generally true. Does the point at which a cascade

begins continue to be pushed back each time an extra alternative is added, and does the

amount of information aggregated continue to increase? Or is the effect more specific, i.e. once

you move beyond three or four alternatives does the effect disappear or become so small as to be

negligible?

Secondly, the number of urns that subjects can choose between could be increased. In this

experiment the number of alternative values that subjects could choose from was increased,

but subjects also choose between urns and varying the number of urns might have a different

impact.

Thirdly, the probabilities of the urns could be adjusted. For example urn A would have a 9/10

probability of the true value being drawn, while urn B could have only a 3/5 probability. This

would be especially useful in the waiting treatment, since the experimenter, by adjusting the

probabilities during the experiment, could investigate how willing subjects are to switch from

a high probability urn to a low probability urn, if this means they avoid a waiting cost.

Finally, the waiting cost could be varied. The waiting cost here is a 50% probability that a

queueing subject will lose their turn, but it is unrealistic for this to remain constant throughout

the queue. It would therefore be more descriptively realistic if the probability varied with the

length of the queue, starting low but steadily rising as the queue length increases. Clearly at

the extremes, where the probability is very low or very high, the behaviour of subjects is easy

to predict, but it is much more difficult to predict how subjects will behave towards the

middle of the probability range.
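One way to locate the "middle of the probability range" is to compute the miss probability at which a risk-neutral subject would be indifferent between queueing and switching. The sketch below is illustrative only. It assumes risk neutrality, the payoffs of 2 ECU and 1 ECU, and a cascade urn carrying two net signals of the same colour. The linear schedule `miss_probability` (5 percentage points per place in the queue) is entirely hypothetical and is not part of the experimental design.

```python
from fractions import Fraction

ACCURACY = Fraction(4, 5)
PAY_CORRECT, PAY_WRONG = 2, 1


def expected_payoff(p_correct):
    """Expected ECU from a guess that is right with probability p_correct."""
    return p_correct * PAY_CORRECT + (1 - p_correct) * PAY_WRONG


switch = expected_payoff(ACCURACY)                   # first draw from the other urn
queue_if_served = expected_payoff(Fraction(16, 17))  # imitating the cascade

# Queueing pays (1 - p) * queue_if_served, so indifference requires:
break_even = 1 - switch / queue_if_served


def miss_probability(position, step=Fraction(1, 20)):
    """Hypothetical schedule: the miss probability rises with queue position."""
    return min(Fraction(1), position * step)


# First queue position at which switching has the higher expected payoff.
first_switcher = next(k for k in range(1, 100) if miss_probability(k) > break_even)
print(break_even, float(break_even), first_switcher)
```

Under these assumptions the break-even probability is low, so behaviour in the intermediate range is likely to depend heavily on risk attitudes.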

4. Conclusion

This study reviews the theoretical and experimental literature on the topic of information

cascades. It then presents the design for an experiment that builds on existing experiments,

but adds important innovations so as to be able to extend the range of issues surrounding

information cascades that can be investigated experimentally. Three treatments are

developed: a control treatment similar to the experimental design of Anderson & Holt (1996;

1997; 2008); a waiting treatment designed to investigate the effects of queueing on

information cascades; and a multi-value treatment designed to investigate whether increasing the

number of options subjects can choose from has an impact on the timing of when cascades occur.

Technical analysis is presented for each of these treatments detailing our predictions for how

subjects should behave, and the practical steps necessary to run the experiment are discussed.

Finally, we examine how the experiment might be extended and improved.

5. Appendices

Appendix A: Bayesian Learning

Bayesian learning or inference involves adjusting prior beliefs given new data; posterior

probabilities are calculated using Bayes' theorem. A simple version of Bayes' theorem as

stated in Upton & Cook (2002) is:

\[
P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(A')\,P(B \mid A')}
\]

P(A|B) is the probability that event A occurs given that event B has been observed, for

instance this could be the probability that red is the true value given that the first agent

observes a red signal. P(A)P(B|A) is the prior probability of event A occurring, multiplied by

the probability of event B occurring if event A has occurred. A' is the alternative event that

could occur instead of event A. Thus P(A')P(B|A') is the prior probability that alternative

event A' occurs, multiplied by the probability of event B occurring if alternative event A' has

occurred.

The general form of the theorem is:

\[
P(A_j \mid B) = \frac{P(A_j)\,P(B \mid A_j)}{\sum_{k=1}^{n} P(A_k)\,P(B \mid A_k)}
\]

where events A_1, A_2, ..., A_n are mutually exclusive and exhaustive, i.e. no two of these

events can occur simultaneously and exactly one of them must occur. In our context A_1, A_2, ..., A_n

represent all the possible true values of the urn. While the simple form can be used in the

basic and waiting treatments, in the multi-value treatment where there are three ball types it is

necessary to use the general form.
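The general form translates directly into a few lines of code. The following is a minimal sketch using the multi-value parameters from the text (prior 1/3 for each value, accuracy 4/5, 1/10 for each non-true value). The names `bayes` and `draw_likelihood` are hypothetical. Because draws are independent, the posterior after one draw can serve as the prior for the next.

```python
from fractions import Fraction


def bayes(priors, likelihoods):
    """General form of Bayes' theorem.

    priors[j] is P(A_j) and likelihoods[j] is P(B | A_j);
    the result maps each j to P(A_j | B).
    """
    joint = {j: priors[j] * likelihoods[j] for j in priors}
    total = sum(joint.values())  # the denominator: sum over k of P(A_k) P(B | A_k)
    return {j: joint[j] / total for j in joint}


def draw_likelihood(colour):
    """P(draw == colour | t == v) for each candidate true value v."""
    return {v: Fraction(4, 5) if v == colour else Fraction(1, 10) for v in "RBY"}


belief = {v: Fraction(1, 3) for v in "RBY"}  # uniform prior
for colour in "RBR":                         # observations Red, Blue, Red
    belief = bayes(belief, draw_likelihood(colour))

print(belief["R"])  # 64/73
```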

To illustrate further imagine that the first three observations in the multi-value treatment are,

Red, Blue, Red. The posterior probability that the true value is red is thus:

\[
P(t=R \mid RBR) = \frac{P(t=R)\,P(RBR \mid t=R)}{P(t=R)\,P(RBR \mid t=R) + P(t=B)\,P(RBR \mid t=B) + P(t=Y)\,P(RBR \mid t=Y)}
\]

The probability of the true value being drawn is 4/5, and the probability of each non-true

value being drawn is 1/10. So if red is the true value there is an 80% chance that red will be

drawn, a 10% chance that blue will be drawn and a 10% chance that yellow will be drawn; these

probabilities remain constant as the number of draws increases because each draw is

independent. The prior probability of each value being the true value is 1/3. Thus the

posterior probability that red is the true value given the sequence of observations Red, Blue,

Red is calculated as:

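\[
\begin{aligned}
P(t=R \mid RBR) &= \frac{\frac{1}{3}\left(\frac{4}{5}\times\frac{1}{10}\times\frac{4}{5}\right)}{\frac{1}{3}\left(\frac{4}{5}\times\frac{1}{10}\times\frac{4}{5}\right) + \frac{1}{3}\left(\frac{1}{10}\times\frac{4}{5}\times\frac{1}{10}\right) + \frac{1}{3}\left(\frac{1}{10}\times\frac{1}{10}\times\frac{1}{10}\right)} \\
&= \frac{\frac{8}{375}}{\frac{8}{375} + \frac{1}{375} + \frac{1}{3000}} = \frac{64}{73} = 87.67\%
\end{aligned}
\]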

It is unnecessary to calculate every single possibility. For instance, if the first three signals are

blue, red, blue, then the third agent clearly does not need to calculate the probability that the

true value is red or yellow, since it is obvious that blue is more likely to be the true value.

Furthermore, once the probability has been calculated for a sequence such as Red, Blue,

Red, e.g. P(t = R|RBR) = 4/5 in the basic treatment, there is no need to recalculate for the

sequence Blue, Red, Blue, since the sequences mirror each other, i.e. P(t = B|BRB) =

P(t = R|RBR) = 4/5. For this reason the calculations below are shown only for the true value

being red. Additionally, if we calculate the posterior probability for a sequence such as Red, Blue, Red,

Blue, Yellow, we do not also calculate it for the sequence Red, Yellow, Red,

Blue, Yellow. Whether there are 2 blue signals and 1 yellow or 2 yellows and 1

blue, the probability of the true value being red will be the same.
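These shortcuts hold because, with a uniform prior and independent draws, the posterior depends only on how many times each colour has been observed. The sketch below illustrates this under the parameters used in the text; the function name `posterior` and its arguments are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def posterior(signals, target, values="RBY", accuracy=Fraction(4, 5)):
    """P(t == target | signals), computed from the signal counts alone."""
    wrong = (1 - accuracy) / (len(values) - 1)
    counts = Counter(signals)  # colours that were never drawn count as 0
    n = len(signals)
    # The uniform prior cancels, leaving one likelihood weight per value.
    weight = {v: accuracy ** counts[v] * wrong ** (n - counts[v]) for v in values}
    return weight[target] / sum(weight.values())


# Mirrored sequences in the basic treatment give the same answer.
print(posterior("RBR", "R", values="RB"), posterior("BRB", "B", values="RB"))

# Swapping the counts of the two non-red colours leaves P(t = R) unchanged.
print(posterior("RBRBY", "R"), posterior("RYRBY", "R"))
```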

Basic and waiting treatments:

In these treatments there are two ball types, Red and Blue. The probability that either is the

true value is 1/2, the probability that the true value will be drawn is 4/5, the probability that

the non-true value will be drawn is 1/5.

If the first agent receives a red signal, then the posterior probability of the true value of the

urn being red (denoted by t = R) is:

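\[
P(t=R \mid R) = \frac{P(t=R)\,P(R \mid t=R)}{P(t=R)\,P(R \mid t=R) + P(t=B)\,P(R \mid t=B)} = \frac{\frac{1}{2}\times\frac{4}{5}}{\left(\frac{1}{2}\times\frac{4}{5}\right) + \left(\frac{1}{2}\times\frac{1}{5}\right)} = \frac{4}{5} = 80\%
\]

ii)

\[
P(t=R \mid RRRRB) = \frac{P(t=R)\,P(RRRRB \mid t=R)}{P(t=R)\,P(RRRRB \mid t=R) + P(t=B)\,P(RRRRB \mid t=B)} = \frac{64}{65} = 98.46\%
\]

iii)

\[
P(t=R \mid RBRBR) = \frac{P(t=R)\,P(RBRBR \mid t=R)}{P(t=R)\,P(RBRBR \mid t=R) + P(t=B)\,P(RBRBR \mid t=B)} = \frac{4}{5} = 80\%
\]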

Multi-value treatment:

There are three types of ball in this treatment so the calculations change: P(t = R) = P(t = B) = P(t = Y) = 1/3

The probability that the true value is drawn from the urn remains the same at 4/5, but because

there are now two alternative ball types instead of only one, the probability that a non-true

value is drawn from the urn is 1/10 for each of the two non-true ball types. The probability

of a red draw under each possible true value is therefore:

P(R|t = R) = 4/5, P(R|t = B) = 1/10, P(R|t = Y) = 1/10

First agent

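i)

\[
\begin{aligned}
P(t=R \mid R) &= \frac{P(t=R)\,P(R \mid t=R)}{P(t=R)\,P(R \mid t=R) + P(t=B)\,P(R \mid t=B) + P(t=Y)\,P(R \mid t=Y)} \\
&= \frac{\frac{1}{3}\times\frac{4}{5}}{\left(\frac{1}{3}\times\frac{4}{5}\right) + \left(\frac{1}{3}\times\frac{1}{10}\right) + \left(\frac{1}{3}\times\frac{1}{10}\right)} = \frac{\frac{4}{15}}{\frac{4}{15} + \frac{1}{30} + \frac{1}{30}} = \frac{4}{5} = 80\%
\end{aligned}
\]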

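Second agent

i)

\[
\begin{aligned}
P(t=R \mid RR) &= \frac{P(t=R)\,P(RR \mid t=R)}{P(t=R)\,P(RR \mid t=R) + P(t=B)\,P(RR \mid t=B) + P(t=Y)\,P(RR \mid t=Y)} \\
&= \frac{\frac{1}{3}\times\frac{16}{25}}{\left(\frac{1}{3}\times\frac{16}{25}\right) + \left(\frac{1}{3}\times\frac{1}{100}\right) + \left(\frac{1}{3}\times\frac{1}{100}\right)} = \frac{\frac{16}{75}}{\frac{16}{75} + \frac{1}{300} + \frac{1}{300}} = \frac{32}{33} = 96.97\%
\end{aligned}
\]

ii)

\[
P(t=R \mid RB) = \frac{\frac{1}{3}\times\frac{2}{25}}{\left(\frac{1}{3}\times\frac{2}{25}\right) + \left(\frac{1}{3}\times\frac{2}{25}\right) + \left(\frac{1}{3}\times\frac{1}{100}\right)} = \frac{\frac{2}{75}}{\frac{2}{75} + \frac{2}{75} + \frac{1}{300}} = \frac{8}{17} = 47.06\%
\]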

Third agent

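i)

\[
P(t=R \mid RRR) = \frac{\frac{1}{3}\times\frac{64}{125}}{\left(\frac{1}{3}\times\frac{64}{125}\right) + \left(\frac{1}{3}\times\frac{1}{1000}\right) + \left(\frac{1}{3}\times\frac{1}{1000}\right)} = \frac{\frac{64}{375}}{\frac{64}{375} + \frac{1}{3000} + \frac{1}{3000}} = \frac{256}{257} = 99.61\%
\]

ii)

\[
P(t=R \mid RBR) = \frac{\frac{1}{3}\times\frac{8}{125}}{\left(\frac{1}{3}\times\frac{8}{125}\right) + \left(\frac{1}{3}\times\frac{1}{125}\right) + \left(\frac{1}{3}\times\frac{1}{1000}\right)} = \frac{\frac{8}{375}}{\frac{8}{375} + \frac{1}{375} + \frac{1}{3000}} = \frac{64}{73} = 87.67\%
\]

iii)

\[
P(t=R \mid RBY) = \frac{\frac{1}{3}\times\frac{1}{125}}{\left(\frac{1}{3}\times\frac{1}{125}\right) + \left(\frac{1}{3}\times\frac{1}{125}\right) + \left(\frac{1}{3}\times\frac{1}{125}\right)} = \frac{\frac{1}{375}}{\frac{1}{375} + \frac{1}{375} + \frac{1}{375}} = \frac{1}{3} = 33.\dot{3}\%
\]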

Fourth agent

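i)

\[
P(t=R \mid RRRR) = \frac{\frac{256}{1875}}{\frac{256}{1875} + \frac{1}{30000} + \frac{1}{30000}} = \frac{2048}{2049} = 99.95\%
\]

ii)

\[
P(t=R \mid RRRB) = \frac{\frac{32}{1875}}{\frac{32}{1875} + \frac{1}{3750} + \frac{1}{30000}} = \frac{512}{521} = 98.27\%
\]

iii)

\[
P(t=R \mid RBRB) = \frac{\frac{4}{1875}}{\frac{4}{1875} + \frac{4}{1875} + \frac{1}{30000}} = \frac{64}{129} = 49.61\%
\]

iv)

\[
P(t=R \mid RBYR) = \frac{\frac{4}{1875}}{\frac{4}{1875} + \frac{1}{3750} + \frac{1}{3750}} = \frac{4}{5} = 80\%
\]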

Fifth agent

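i)

\[
P(t=R \mid RRRRR) = \frac{\frac{1024}{9375}}{\frac{1024}{9375} + \frac{1}{300000} + \frac{1}{300000}} = \frac{16384}{16385} = 99.99\%
\]

ii)

\[
P(t=R \mid RRRRB) = \frac{\frac{128}{9375}}{\frac{128}{9375} + \frac{1}{37500} + \frac{1}{300000}} = \frac{4096}{4105} = 99.78\%
\]

iii)

\[
P(t=R \mid RBRBR) = \frac{\frac{16}{9375}}{\frac{16}{9375} + \frac{2}{9375} + \frac{1}{300000}} = \frac{512}{577} = 88.73\%
\]

iv)

\[
P(t=R \mid RBRYR) = \frac{\frac{16}{9375}}{\frac{16}{9375} + \frac{1}{37500} + \frac{1}{37500}} = \frac{32}{33} = 96.97\%
\]

v)

\[
P(t=R \mid RBRBY) = \frac{\frac{2}{9375}}{\frac{2}{9375} + \frac{2}{9375} + \frac{1}{37500}} = \frac{8}{17} = 47.06\%
\]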
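The multi-value posteriors in this appendix can be recomputed in a few lines as a cross-check. This is a sketch under the stated parameters (prior 1/3, accuracy 4/5, 1/10 for each non-true value); the function name `posterior_red` is hypothetical. The sequence labels RRRRB and RBRBY for the fifth agent's cases ii) and v) are inferred from the likelihood terms in those calculations.

```python
from fractions import Fraction


def posterior_red(signals, accuracy=Fraction(4, 5), wrong=Fraction(1, 10)):
    """P(t = R | signals) in the multi-value treatment."""
    weights = {}
    for value in "RBY":
        w = Fraction(1, 3)  # uniform prior
        for s in signals:
            w *= accuracy if s == value else wrong
        weights[value] = w
    return weights["R"] / sum(weights.values())


sequences = ["R", "RR", "RB", "RRR", "RBR", "RBY", "RRRR", "RRRB", "RBRB",
             "RBYR", "RRRRR", "RRRRB", "RBRBR", "RBRYR", "RBRBY"]
for seq in sequences:
    p = posterior_red(seq)
    print(f"{seq:<6} {str(p):<12} {float(p):.2%}")
```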

6. Bibliography

Acemoglu, D., Dahleh, M. A., Lobel, I. & Ozdaglar, A., 2011. Bayesian Learning in Social Networks. Review of Economic Studies, Volume 78, pp. 1201 - 1236.

Ainlay, S. C., Becker, G. & Coleman, L. M., 1986. The Dilemma of Difference: A Multidisciplinary View of Stigma. New York: s.n.

Akerlof, G. A., 1980. A Theory of Social Custom, of which Unemployment may be One Consequence. The Quarterly Journal of Economics, 94(4), pp. 749 - 775.

Anderson, L. R. & Holt, C. A., 1996. Classroom Games: Information Cascades. The Journal of Economic Perspectives, 10(4), pp. 187-193.

Anderson, L. R. & Holt, C. A., 1997. Information Cascades in the Laboratory. The American Economic Review, 87(5), pp. 847 - 862.

Anderson, L. R. & Holt, C. A., 2000. Information Cascades and Rational Conformity. In: Encyclopedia of Cognitive Science. s.l.:Macmillan Reference Ltd, pp. 540 - 544.

Anderson, L. R. & Holt, C. A., 2008. Information Cascade Experiments. In: C. R. Plott & V. L. Smith, eds. Handbook of Experimental Economics Results. s.l.:s.n., pp. 335 - 343.

Arthur, W. B., 1989. Competing Technologies, Increasing Returns, and Lock-In by Historical Events. The Economic Journal, 99(394), pp. 116 - 131.

Asch, S. E., 1952. Social Psychology. Englewood Cliffs, N.J.: Prentice-Hall.

Banerjee, A. V., 1992. A Simple Model of Herd Behaviour. The Quarterly Journal of Economics, 107(3), pp. 797 - 817.

Banerjee, A. V., 1993. The Economics of Rumours. Review of Economic Studies, 60(2), pp. 309 - 327.

Becker, G. S., 1991. A Note on Restaurant Pricing and Other Examples of Social Influences on Prices. The Journal of Political Economy, 99(5), pp. 1109 - 1116.

Bikhchandani, S., Hirshleifer, D. & Welch, I., 1992. A Theory of Fads, Fashion, Custom and Cultural Change as Informational Cascades. Journal of Political Economy, 100(5), pp. 992 - 1026.

Bikhchandani, S., Welch, I. & Hirshleifer, D., 1998. Learning from the Behavior of Others: Conformity, Fads, and Informational Cascades. Journal of Economic Perspectives, 12(3), pp. 151 - 170.

Camerer, C. & Weigelt, K., 1991. Information Mirages in Experimental Asset Markets. The Journal of Business, 64(4), pp. 463 - 493.

Çelen, B. & Kariv, S., 2004. Distinguishing Informational Cascades from Herd Behavior in the Laboratory. The American Economic Review, 94(3), pp. 484-498.

Çelen, B. & Kariv, S., 2004. Observational learning under imperfect information. Games and Economic Behavior, 47(1), pp. 72 - 86.

Çelen, B. & Kariv, S., 2005. An experimental test of observational learning under imperfect information. Economic Theory, Volume 26, pp. 677 - 699.

Chamley, C. P., 2004. Rational Herds: Economic Models of Social Learning. Cambridge: Cambridge University Press.

Chamley, C., Scaglione, A. & Li , L., 2013. Models for the Diffusion of Beliefs in Social Networks: An Overview. IEEE Signal Processing Magazine, 30(3), pp. 16 - 29.

Conlisk, J., 1980. Costly Optimizers Versus Cheap Imitators. Journal of Economic Behaviour and Organisation, Volume 1, pp. 275 - 293.

Deutsch, M. & Gerard, H. B., 1955. A Study of Normative and Informational Social Influences upon Individual Judgement. Abnormal and Social Psychology, Volume 51, pp. 629 - 636.

Easley, D. & Kleinberg, J., 2010. Networks, Crowds, And Markets. Cambridge: Cambridge University Press.

Elsden, J., 2016. Review of: “A Theory of Fads, Fashion, Custom, and Cultural Change as Informational Cascades”, Nottingham: University of Nottingham.

Friedman, M., 1966. The Methodology of Positive Economics. In: Essays in Positive Economics. Chicago: University of Chicago Press, pp. 3-16, 30-43.

Gale, D., 1996. What have we learned from social learning?. European Economic Review, 40(3 - 5), pp. 617 - 628.

Hey, J. & Allsopp, L., 1999. Two Experiments to Test a Model of Herd Behaviour. The University of York: Discussion Papers in Economics, 1999(24), pp. 1 - 24.

Huck, S. & Oechssler, J., 2000. Informational cascades in the laboratory: Do they occur for the right reasons?. Journal of Economic Psychology, Volume 21, pp. 661 - 671.

Hung, A. A. & Plott, C. R., 2001. Information Cascades: Replication and an Extension to Majority Rule and Conformity-Rewarding Institutions. American Economic Review, 91(5), pp. 1508 - 1520.

Katz, M. & Shapiro, C., 1986. Technology Adoption in the Presence of Network Externalities. Journal of Political Economy, 94(4), pp. 822 - 841.

Kübler, D. & Weizsäcker, G., 2004. Limited Depth of Reasoning and Failure of Cascade Formation in the Laboratory. The Review of Economic Studies, April, 71(2), pp. 425 - 441.

Lee, I. H., 1993. On the Convergence of Informational Cascades. Journal of Economic Theory, 61(2), pp. 395 - 411.

Nöth, M. & Weber, M., 2003. Information Aggregation with Random Ordering: Cascades and Overconfidence. The Economic Journal, 113(484), pp. 166 - 189.

Raafat, R. M., Chater, N. & Frith, C., 2009. Herding in Humans. Trends in Cognitive Science, 13(10), pp. 420 - 428.

Robin, E. D., 1984. Matters of Life and Death: Risks vs. Benefits of Medical Care. New York: Freeman.

Smith, L. & Sørensen, P., 2000. Pathological Outcomes of Observational Learning. Econometrica, 68(2), pp. 371 - 398.

Smith, V. L., Suchanek, G. L. & Williams, A. W., 1988. Bubbles, Crashes, and Endogenous Expectations in Experimental Spot Asset Markets. Econometrica, 56(5), pp. 1119 - 1151.

Taylor, R., 1979. Medicine out of Control: The Anatomy of a Malignant Technology. Melbourne: Sun Books.

Upton, G. & Cook, I., 2002. Dictionary of Statistics. Oxford: Oxford University Press.

Welch, I., 1992. Sequential Sales, Learning, and Cascades. The Journal of Finance, 47(2), pp. 695 - 732.

Ziegelmeyer, A., Koessler, F., Bracht, J. & Winter, E., 2010. Fragility of information cascades: an experimental study using elicited beliefs. Experimental Economics, 13(2), pp. 121 - 145.
