Stuck in the Wisdom of Crowds: Information, Knowledge, and ...


Stuck in the Wisdom of Crowds: Information, Knowledge, and Heuristics

Yunwen He† Jaimie W. Lien‡ Jie Zheng±

Initial Version: July 1st, 2020

Current Draft: July 13th, 2021

Abstract:1

Collective knowledge is significantly affected by information about others’ viewpoints. However, under what conditions does the “wisdom of crowds” help versus harm knowledge of factual information? In this experiment, we present subjects with the task of answering 50 factual true-or-false trivia questions, with the potential opportunity to revise their answers after receiving different levels of information about other subjects’ answers and self-assessed confidence levels from an independent session. We find that information about others’ answers improves performance on easy questions but tends to harm performance on difficult questions. In addition, information about other subjects’ answers mainly improves performance for those with lower initial knowledge levels. Subjects in our Moderate-Information condition outperform those in either the Low- or Full-Information conditions, implying an optimal level of social information provision, in which the Majority Rule and the Maximum Confidence rule complement one another. Although the Maximum Confidence rule can improve performance, yielding the lowest overall error rate of the heuristics considered, subjects generally underutilize the information on other subjects’ confidence levels in favor of the Majority Rule heuristic. These findings shed light on possible policy directions for cultivating factual knowledge on online opinion platforms.

JEL Codes: C91, D81, D83

Keywords: wisdom of crowds, information provision, decision heuristics, majority rule, surprising popularity, maximum confidence

1 † He: Department of Economics, School of Economics and Management, Tsinghua University, Beijing, China; ‡ Lien: Department of Decision Sciences and Managerial Economics, The Chinese University of Hong Kong, Shatin, Hong Kong, China; ± Zheng: Department of Economics, School of Economics and Management, Tsinghua University, Beijing, China. Guangying Chen’s excellent research assistance is greatly appreciated. We are thankful for comments from Shouying Liu, Xianghong Wang, Chun-lei Yang, Maoliang Ye, Boyu Zhang, and session participants at the China Behavioral and Experimental Economics Forum (2020) in Beijing. We gratefully acknowledge funding from the National Natural Science Foundation of China, Tsinghua University, The Chinese University of Hong Kong, and the Research Grants Council of Hong Kong. All errors are our own.


1. Introduction

Many statements can be challenging to verify or refute on the spot. Such uncertainties about facts are the origin of many controversies and disagreements. In such situations, decision-makers may rely heavily on the perspectives, assessments, and comments of other people to formulate their own judgments. Particularly in the digital era, online news and discussion forums have considerably expanded decision-makers’ access to the opinions and suggestions of others.

Previous studies have demonstrated that groups can outperform the average individual in arriving at correct answers, sometimes even outperforming the best individual response, a phenomenon often referred to as the “wisdom of the crowd” effect. Harnessing collective intelligence has the potential to solve many important decision-making problems involving prediction or fact-finding, such as predicting stock markets (Chen et al., 2014; Gottschlich & Hinz, 2014), political changes (Wolfers & Zitzewitz, 2004; Mellers et al., 2014), or weather conditions (Baars & Mass, 2005), alleviating microfinance problems (Yum et al., 2012), and improving medical diagnostics (Wolf et al., 2015; Kurvers et al., 2016).

In most studies on the wisdom of crowds, the most common aggregation method is majority voting (or averaging, for continuous estimation problems). However, decisions based solely on the majority opinion are often criticized for ignoring diverse views and discarding potentially valuable information held by a minority of individuals. As a consequence, the effectiveness of a majority rule approach may be limited to simpler tasks, and it may fail in many complicated settings (Morton et al., 2019). For similar reasons, simple averaging of judgments risks neglecting experts by assigning equal weight to each person, which limits the potential benefits of the wisdom of crowds (Lee et al., 2011).

Algorithms that may better extract the wisdom of crowds have been proposed, for example, weighting judgments by individuals’ confidence levels (Cooke, 1991; Koriat, 2012) or knowledge levels (Raykar et al., 2010; Aspinall, 2010; Wang et al., 2011; Budescu & Chen, 2014), selecting the answer that is more popular than predicted (Prelec et al., 2017), and pivoting and recombining private and shared information optimally (Palley & Soll, 2019), among others. However, none of these approaches is guaranteed to succeed in practice, since the heuristics that individuals actually use to access the wisdom of crowds have not been identified and tested empirically.
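To make the contrast between these aggregation rules concrete, the Python sketch below applies simple majority voting, confidence-weighted voting, and the “surprisingly popular” rule to responses about a single true/false statement. It is a minimal illustration under our own assumptions, not the cited authors’ implementations: the `Response` record, the function names, the arbitrary tie-breaking, and the example crowd are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """One respondent's answer to a single true/false statement (hypothetical record)."""
    answer: bool           # True if the respondent judges the statement to be true
    confidence: float      # self-reported confidence in own answer, in [0.5, 1.0]
    predicted_true: float  # predicted share of respondents answering True, in [0, 1]


def majority_vote(responses: list[Response]) -> bool:
    """Choose the answer given by more respondents; ties go to True (arbitrary)."""
    n_true = sum(r.answer for r in responses)
    return n_true >= len(responses) - n_true


def confidence_weighted_vote(responses: list[Response]) -> bool:
    """Weight each vote by self-reported confidence; ties go to True (arbitrary)."""
    weight_true = sum(r.confidence for r in responses if r.answer)
    weight_false = sum(r.confidence for r in responses if not r.answer)
    return weight_true >= weight_false


def surprisingly_popular(responses: list[Response]) -> bool:
    """Choose True iff its actual share exceeds the crowd's average predicted share."""
    actual_true = sum(r.answer for r in responses) / len(responses)
    predicted_true = sum(r.predicted_true for r in responses) / len(responses)
    return actual_true > predicted_true


# Invented crowd: a less confident majority says True and expects an even larger
# majority; a confident minority says False.
crowd = [Response(True, 0.6, 0.8)] * 6 + [Response(False, 0.95, 0.7)] * 4
print(majority_vote(crowd), confidence_weighted_vote(crowd), surprisingly_popular(crowd))
# prints: True False False
```

In this invented crowd the rules disagree: the majority says True, while both the confident minority and the gap between actual popularity (60%) and average predicted popularity (76%) point to False.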

In this paper, we establish several facts about how individuals form their beliefs about statements when provided with information about other individuals’ beliefs and decisions. We also study whether and when individuals change their minds when presented with differing degrees of information. We compare three treatments with different levels of information to identify the determinants of subjects’ answer changes. The comparison between the baseline treatment without information and the three successively more informative treatments allows us to study how access to other respondents’ information influences subjects’ final performance.

A key research area relevant to our work is the impact of social information on individuals’ decision making. Lorenz et al. (2011) find that knowledge about other respondents’ estimates undermines the “wisdom of crowds” effect by narrowing the diversity of opinions over successive rounds of a quantity estimation task. Bazazi et al. (2019) provide further evidence for a negative effect, while showing that diminished crowd wisdom and increased social conformity are present only under an individual incentive structure and not under a collective incentive structure. In an online experiment with artificial investment games, Chen et al. (2020) find that forecasts about the financial market become more concentrated within groups that share forecasts, failing to reduce prediction error but generating higher variability across groups. However, social information does not necessarily erode collective intelligence when individual decisions are made sequentially, due to the tendency to imitate successful individuals (King et al., 2012), or when information is provided in a complete, disaggregated format (Silva & Correia, 2016; Jayles & Kurvers, 2020). The size of the group and the size of the majority are also potentially important determinants of social influence (Brown, 2000).

While most of these prior studies focus on the conditions that increase or decrease the performance of collective knowledge, our study seeks to establish, at a more detailed level, when and how “crowd wisdom” helps or harms individual decisions. Prior experimental studies primarily focus on quantitative estimation tasks while varying the form of shared information across treatments. However, the literature largely lacks studies of the effect of different levels of information along multiple dimensions, and of their impact on successfully persuading individuals to change their prior answers.

We address this issue by implementing simple binary choice tasks, in which subjects are asked to give incentivized answers to 50 factual and uncontroversial true/false trivia questions. Following Prelec et al. (2017), we further incentivize subjects to report their confidence in their own answer, their estimate of the share of others giving the same answer, and their estimate of the average confidence that all participants report in their own answers. In the information provision treatments, we additionally give subjects the opportunity to revise their answers after receiving different levels of aggregated information, from low to high, generated from a baseline treatment.

First, we show that information provision improves accuracy on easy questions and undermines it on difficult questions, where difficulty is measured by each question’s correct rate in the baseline treatment. This result follows the spirit of the insights in Lorenz et al. (2011) and Tump et al. (2018), who propose that social information has an asymmetric impact, improving average individual performance on easy questions and worsening it on difficult questions.

We also find that subjects with lower knowledge levels generally benefit the most from social information in terms of improved answer accuracy. Furthermore, confident subjects forgo potential benefits from information because they are less influenced by it, which is consistent with the notion that confidence and accuracy are related to sensitivity to social influence (Yaniv & Milyavsky, 2007; Chacoma & Zanette, 2015; Jayles et al., 2017).

We find that there exists an optimal level of information provision in our setting, reminiscent of studies that find an optimal group size for exchanging opinions in quantity estimation tasks (Jayles & Kurvers, 2020). The Moderate-Information condition, which reveals other participants’ choice frequencies and confidence levels, generates the best performance, avoiding information overload while enabling comparisons between options along multiple dimensions.

To examine how subjects use information in revising their opinions, we focus on three specific heuristics that can be tested using the information provided to subjects in the experiment: Majority Rule, the Maximum Confidence rule, and the Surprising Popularity rule. Our experimental results show that Majority Rule is the heuristic most attractive to decision-makers. Furthermore, subjects rely more heavily on Majority Rule over the Maximum Confidence rule in our Full-Information treatment than in the Moderate-Information treatment. This suggests that the greater complexity of information processing in the Full-Information treatment leads individuals to process less of the available information. In our experiment, we find little evidence that subjects use either the standard Surprising Popularity rule proposed by Prelec et al. (2017) (that is, comparing the average estimated percentage of “True” answers with the actual percentage) or the egocentric Surprising Popularity rule (that is, comparing one’s own estimated percentage of “True” answers with the actual percentage).
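Each of these heuristics can be written as a simple decision rule over the statistics displayed in the second stage. The Python sketch below is one possible formalization, not the procedure used in the paper’s analysis. The field and function names are hypothetical, and two readings are our own assumptions: the Maximum Confidence rule follows the side whose supporters report the higher average confidence, and the average predicted share of “True” is reconstructed as a choice-weighted average of the two groups’ average agreement estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayedInfo:
    """Hypothetical record of the baseline (NI) statistics shown for one statement.

    Shares are fractions in [0, 1]; confidences are fractions in [0.5, 1.0].
    """
    share_true: float       # actual share of baseline subjects answering True (LI, MI, FI)
    mean_conf_true: float   # average confidence of those answering True (MI, FI)
    mean_conf_false: float  # average confidence of those answering False (MI, FI)
    est_agree_true: float   # True-answerers' average estimate of agreement with them (FI)
    est_agree_false: float  # False-answerers' average estimate of agreement with them (FI)


def majority_rule(info: DisplayedInfo) -> bool:
    """Follow the answer chosen by more baseline subjects (ties go to True)."""
    return info.share_true >= 0.5


def maximum_confidence_rule(info: DisplayedInfo) -> bool:
    """Follow the side whose supporters report higher average confidence (ties go to True)."""
    return info.mean_conf_true >= info.mean_conf_false


def standard_sp_rule(info: DisplayedInfo) -> bool:
    """Answer True iff True is more popular than the crowd's average prediction."""
    # True-answerers predict a True share of est_agree_true;
    # False-answerers predict a True share of 1 - est_agree_false.
    predicted_true = (info.share_true * info.est_agree_true
                      + (1 - info.share_true) * (1 - info.est_agree_false))
    return info.share_true > predicted_true


def egocentric_sp_rule(info: DisplayedInfo, own_answer: bool, own_est_agree: float) -> bool:
    """Answer True iff True is more popular than the subject's own first-stage prediction."""
    own_predicted_true = own_est_agree if own_answer else 1 - own_est_agree
    return info.share_true > own_predicted_true


# Invented display: a lukewarm majority says True, a confident minority says False.
info = DisplayedInfo(share_true=0.7, mean_conf_true=0.65, mean_conf_false=0.85,
                     est_agree_true=0.8, est_agree_false=0.45)
print(majority_rule(info), maximum_confidence_rule(info),
      standard_sp_rule(info), egocentric_sp_rule(info, own_answer=False, own_est_agree=0.6))
# prints: True False False True
```

The example shows how the rules can pull in different directions on the same display, which is what makes it possible to infer from revisions which rule a subject appears to follow.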

Despite its prevalent usage, Majority Rule has serious limitations and tends to harm knowledge on difficult questions. Our regression analyses of correct rates demonstrate that, although information provision generally plays a positive role, a higher proportion of participants sharing a subject’s own answer is associated with inferior final performance, especially when the relevant information is revealed. A joint analysis of performance across the two stages (initial answer and revision) also indicates that the relative risk ratio of always giving the wrong answer is significantly higher when a subject’s original view is widely accepted.

Our study fills a gap in the literature on the wisdom of crowds by examining the effects of social conditions, in terms of the amount of information provided along multiple dimensions, on individual factual knowledge and opinion changes. Furthermore, we focus on binary choice questions, which provide a convenient framework for analyzing the heuristics used on factual problems. Our experimental approach reflects many realistic situations, such as voting, making investments, and fact-finding in online discussion forums, where decision-makers need to take a position after potentially being exposed to social information. Thus, our work complements previous studies by showing that more social information is not always better for factual knowledge, and that there tends to be a strong individual bias in favor of following the majority.

The rest of the paper is organized as follows: Section 2 describes the experimental design. Section 3 evaluates final performance in our task at the question level and the subject level, and compares the accuracy of three heuristic rules. Section 4 investigates the determinants of subjects’ answer changes and the associated rules. Section 5 discusses the overall impact of information provision and initial answers on final performance. Section 6 discusses the findings and concludes.

2. Experimental Design

2.1 Experiment treatments

We implement an experiment with 50 factual true/false trivia questions of varying expected difficulty that test participants’ real-world knowledge. Participants are incentivized to answer the 50 questions correctly to the best of their ability. Subjects are also asked to report their confidence in their own answer (ranging from 50% to 100%), their estimate of the proportion of others giving the same answer (ranging from 1% to 100%), and their estimate of the average confidence of all participants in their own answers (ranging from 50% to 100%).

Quiz questions and follow-up questions are incentivized as follows. Subjects receive 1 RMB for each correctly answered quiz question. For self-reported confidence, the computer draws a random number Ri% from the interval [0%, 100%]; if the reported confidence is smaller than Ri%, the subject additionally receives 2×Ri% RMB; otherwise, the subject receives 0.2 RMB if the corresponding quiz question is answered correctly and 0 if it is answered incorrectly. This is the Becker-DeGroot-Marschak (1964) mechanism, used to incentivize subjects to report their true confidence levels. Finally, 0.2 RMB is added to their payment if their estimate of the share of others giving the same answer, or their estimate of all participants’ average confidence in their own answers, is within 5 percentage points of the true value. Detailed descriptions of the experimental instructions and the true/false questions can be found in the Appendix.
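The incentive logic of the Becker-DeGroot-Marschak mechanism can be checked numerically. The Python sketch below uses the textbook lottery form of the mechanism, in which both branches pay the same generic `prize` (0.2 by default) and, when the random number R exceeds the report, the subject receives a lottery paying `prize` with probability R. It illustrates why truthful reporting maximizes expected earnings in that form; it is not a transcription of this experiment’s payment formula, and the function names are hypothetical.

```python
def expected_bdm_payoff(report: float, p_correct: float, prize: float) -> float:
    """Expected earnings from a confidence report under a textbook BDM lottery.

    R is drawn uniformly from [0, 1]. If report < R, the subject receives a lottery
    paying `prize` with probability R; otherwise the subject receives `prize` only if
    the quiz answer is correct, which the subject believes has probability p_correct.
    """
    # E[prize * R * 1{R > report}] for R ~ Uniform(0, 1)
    lottery_part = prize * (1 - report ** 2) / 2
    # P(R <= report) = report, times the expected payment from the own answer
    own_answer_part = report * prize * p_correct
    return lottery_part + own_answer_part


def best_report(p_correct: float, prize: float = 0.2, step: float = 0.01) -> float:
    """Grid-search the report in the allowed range [0.5, 1.0] maximizing expected earnings."""
    grid = [0.5 + i * step for i in range(int(round(0.5 / step)) + 1)]
    return max(grid, key=lambda r: expected_bdm_payoff(r, p_correct, prize))


print(round(best_report(0.8), 2))  # prints 0.8
```

Differentiating the expected payoff with respect to the report gives prize × (p_correct − report), which is zero at report = p_correct, so in this form a subject does best by reporting the true belief; beliefs below 50% are clipped to the lower end of the allowed range.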

In the baseline treatment (NI; No-Information), the only experimental task for subjects is to submit their answers and beliefs as described above. They receive monetary payments based on their performance on all quiz questions and the corresponding follow-up questions on beliefs.

In the three information provision treatments, subjects first complete the task under the same rules as in the NI treatment, without being informed that there will be a second stage. After the first stage has ended, they are informed that they have an opportunity to revise their answers after receiving some information generated from a session conducted earlier. The information presented to subjects in this second stage comes from the baseline NI treatment. Monetary payments are then based on their finalized answers to the quiz questions as well as their self-reported estimates from the first stage.

In the Low-Information treatment (LI), subjects are informed only of the actual choice distribution of answers given by respondents in the baseline treatment. In the Moderate-Information treatment (MI), the average confidence levels of those who agree and those who disagree with the statement in the baseline treatment are additionally provided. In the Full-Information treatment (FI), in addition to the actual choice distribution and the average self-reported confidence levels, subjects are also shown the baseline respondents’ average estimates about others’ answers and confidence levels. Table 1 summarizes the treatments.

Table 1: Treatment Overview

| Treatment | First stage: question answering and belief elicitation | Second stage: information and answer revision opportunity | Information provided in the second stage (generated from NI treatment) |
|---|---|---|---|
| NI (No Information) | Yes | No | - |
| LI (Low Information) | Yes | Yes | 1) Actual choice frequencies of answers. |
| MI (Moderate Information) | Yes | Yes | 1) Actual choice frequencies of answers; 2) Average confidence level of those who agree and disagree with the statement. |
| FI (Full Information) | Yes | Yes | 1) Actual choice frequencies of answers; 2) Average confidence level of those who agree and disagree with the statement; 3) Average estimate about percent agreement with own answer, by those who agree and disagree with the statement; 4) Average estimate about average confidence level in own answer, by those who agree and disagree with the statement. |
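The second-stage displays in Table 1 are group-level aggregates of the NI responses. A minimal Python sketch of how they could be computed for one statement follows; the function name, dictionary keys, and input format are hypothetical, and it assumes that “agree” means answering “True” and that each group average is an unweighted mean.

```python
from statistics import mean


def second_stage_info(answers, confidences, est_agree, est_conf, treatment):
    """Aggregate baseline (NI) responses to one statement into the displayed statistics.

    answers:     list of bools, True if the baseline subject agrees with the statement
    confidences: self-reported confidence in own answer, in [0.5, 1.0]
    est_agree:   estimated share of others giving the same answer, in [0.01, 1.0]
    est_conf:    estimated average confidence of all participants, in [0.5, 1.0]
    treatment:   "LI", "MI", or "FI"
    """
    agree = [i for i, a in enumerate(answers) if a]
    disagree = [i for i, a in enumerate(answers) if not a]

    def group_mean(values, idx):
        # Average over one group; None if no baseline subject chose that side
        return mean(values[i] for i in idx) if idx else None

    info = {"share_agree": len(agree) / len(answers)}  # shown in LI, MI, and FI
    if treatment in ("MI", "FI"):
        info["conf_agree"] = group_mean(confidences, agree)
        info["conf_disagree"] = group_mean(confidences, disagree)
    if treatment == "FI":
        info["est_agree_agree"] = group_mean(est_agree, agree)
        info["est_agree_disagree"] = group_mean(est_agree, disagree)
        info["est_conf_agree"] = group_mean(est_conf, agree)
        info["est_conf_disagree"] = group_mean(est_conf, disagree)
    return info


# Invented baseline responses from four subjects to one statement
answers = [True, True, True, False]
confidences = [0.6, 0.7, 0.8, 0.9]
est_agree = [0.7, 0.8, 0.6, 0.4]
est_conf = [0.7, 0.7, 0.7, 0.8]
print(second_stage_info(answers, confidences, est_agree, est_conf, "LI"))
# prints {'share_agree': 0.75}
```

Because each treatment’s display is a superset of the previous one, the treatments differ only in which of these aggregates are revealed, not in how they are computed.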


2.2 Experiment procedure

Four experimental sessions of 128 subjects in total (32 subjects per treatment), were conducted

on December 22nd, 2019 at the experimental laboratory of Beijing Foreign Studies University, with

university undergraduate students as the subject pool. The experiment was programmed and conducted

using the software z-Tree (Fischbacher, 2007).

At the beginning of each session, subjects were seated in front of their terminals and received a copy of the experiment instructions. The terminals were separated by partitions, and verbal communication between subjects was not allowed. Subjects were instructed not to access their mobile devices while participating in the experiment. The experiment began once all subjects understood the experiment rules and payoff functions.

Each session lasted approximately 60 minutes, and the average payment received was 43.43 RMB per subject (including a 10 RMB show-up fee), which is within the standard range of payment for experiment participation in mainland China.

3. Analysis of Performance

3.1 Overall evaluation

Figure 1 (left panel) presents the mean correct rates by treatment. There is no significant difference across treatments in the original (first-stage) mean correct rate, which is close to 0.5.² In fact, the correct answer is “True” for 24 of the 50 questions, so a participant can achieve roughly the average performance by always choosing “True” (a correct rate of 0.48) or always choosing “False” (0.52).

However, the gaps in correct rates across treatments widen in the second stage, as seen in the right panel. With the possibility of revising answers, the final correct rates in all three information provision treatments are higher than that in the control group (No-Information treatment), which suggests that information generally plays a positive role. On the other hand, the correct rate does not increase monotonically with the amount of information provided; instead, it traces an inverted-U shape as a function of the information provided.

The information provided in the Low-Information treatment leads only to a slight improvement in subjects’ performance, suggesting that merely knowing the majority opinion provides little help. The correct rate in the Moderate-Information treatment is the highest, indicating that knowing the average confidence of those who agree and disagree with the statement is effective for answer accuracy. However, the additional information about other respondents’ estimates and beliefs provided in the Full-Information treatment worsens the outcome compared with the Moderate-Information treatment. While the figures illustrate the aggregate results, detailed statistical tests are provided in the next section.

2 Specifically, for two-sided t-tests at the question level: No-Information vs Low-Information: p=0.524; No-Information vs Moderate-Information: p=0.235; No-Information vs Full-Information: p=0.594; Low-Information vs Moderate-Information: p=0.403; Moderate-Information vs Full-Information: p=0.436; Low-Information vs Full-Information: p=0.915. For two-sided Welch t-tests at the subject level: No-Information vs Low-Information: p=0.698; No-Information vs Moderate-Information: p=0.372; No-Information vs Full-Information: p=0.762; Low-Information vs Moderate-Information: p=0.542; Moderate-Information vs Full-Information: p=0.499; Low-Information vs Full-Information: p=0.929.
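The subject-level comparisons in footnote 2 use Welch’s t-test, which does not assume equal variances across treatments. A minimal standard-library Python sketch of the statistic and its Welch-Satterthwaite degrees of freedom is below; the function name is our own, and in this setting the two inputs would be per-subject correct rates from two treatments.

```python
import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    # Squared standard errors of each sample mean (sample variance / n)
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t([1, 2, 3], [4, 5, 6])
print(round(t, 3), round(df, 3))  # prints -3.674 4.0
```

Turning the statistic into a p-value requires the Student t distribution; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the p-value directly.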


Figure 1: Mean of Correct Rates by Treatment. Note: Confidence intervals are shown at the 95 percent level.

Figure 2 presents the standard deviation of correct rates at the question level (top panels) and at the subject level (bottom panels). Compared with the first stage (left panels), the variation in question-level correct rates becomes larger in the second stage (right panels), while the variation in subject-level correct rates declines moderately. In the following subsections, we show that the chance to revise answers makes performance even more dispersed across individual questions. In particular, the correct rates of difficult questions are further lowered, because the minority who originally gave the right answer are misled by the majority opinion in the second stage. On the other hand, the correct rates of easy questions are further raised, because the minority who originally gave the incorrect answer benefit from the wisdom of crowds in the second stage. Furthermore, performance at the subject level becomes more uniform with information disclosure: subjects who initially perform exceptionally well become worse off, while those who initially perform poorly improve in the second stage.
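The two dispersion measures in Figure 2 come from one matrix of correct/incorrect outcomes, averaged along different axes. The Python sketch below computes both; the function name and the use of the population standard deviation are our assumptions, and the toy data are invented to mimic the mechanism described above (everyone adopting a shared majority answer polarizes questions while making subjects more alike), not taken from the experiment.

```python
from statistics import pstdev


def correct_rate_dispersion(correct: list[list[int]]) -> tuple[float, float]:
    """SDs of question-level and subject-level correct rates.

    `correct` is a subjects x questions matrix with 1 for a correct answer, 0 otherwise.
    Returns (SD across questions, SD across subjects), using the population SD.
    """
    n_subjects, n_questions = len(correct), len(correct[0])
    by_question = [sum(row[q] for row in correct) / n_subjects for q in range(n_questions)]
    by_subject = [sum(row) / n_questions for row in correct]
    return pstdev(by_question), pstdev(by_subject)


# Invented data: 4 subjects x 4 questions (Q1-Q2 easy, Q3-Q4 hard).
stage1 = [[1, 1, 1, 1],
          [1, 1, 0, 0],
          [1, 0, 0, 0],
          [0, 1, 1, 0]]
# Suppose every subject adopts a displayed majority that is right on Q1-Q2
# and wrong on Q3-Q4.
stage2 = [[1, 1, 0, 0]] * 4
print(correct_rate_dispersion(stage1), correct_rate_dispersion(stage2))
```

In the toy example, the question-level SD rises from about 0.21 to 0.5 while the subject-level SD falls from about 0.27 to 0, the same qualitative pattern as in Figure 2. Replacing `pstdev` with `statistics.stdev` gives the sample version.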

Figure 2: Standard Deviation of Correct Rates by Treatment


3.2 Question-based Analysis

Throughout the analysis below, we rank question difficulty by performance in the No-Information treatment. For example, questions such as “The last name of Confucius is Zi.” (Question 50) and “The Italian scientist and astronomer Copernicus was burnt to die for his adherence to the heliocentric theory.” (Question 22) are difficult for subjects, with correct rates below 20%. Examples of easy questions are “The Pyrenees is a natural border between Spain and France.” (Question 20) and “The father of Dayu, who is the hero of controlling flood in the history of China, was called Gu.” (Question 23), for which the correct rates are above 80%.

Given that there are no significant differences in correct rates across treatments in the first stage, we focus our comparisons on the second stage and can attribute outcome differences to the information disclosure process. In the analyses below, we present results from the raw data and use both distributional and means tests to examine statistical differences.

3.2.1 Treatment Effects of Information

We first compare the three information provision treatments with the baseline treatment to test for the impact of information provision.

Recall that in the Low-Information treatment, for each question, subjects have access to the percentages of subjects in the No-Information treatment who agree or disagree with the statement. Figure 3 (upper left panel) depicts the gap in each question’s correct rate between the Low-Information treatment and the No-Information treatment (upper sub-panel). Subjects in the Low-Information treatment perform better on half of the questions and worse on the other half. Since we can observe subjects’ original first-stage answers before information disclosure, we also plot the change in correct rates within the Low-Information treatment (lower sub-panel). This change shares a similar pattern with the gap, which reinforces that there are no systematic differences between participants in the two treatments.
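Both quantities plotted in Figure 3 are per-question differences, ordered by baseline difficulty: the gap compares a treatment’s final correct rate with the No-Information rate, and the change compares it with the same treatment’s first-stage rate. A short Python sketch follows; the function name, dictionary inputs keyed by question, and example rates are hypothetical.

```python
def question_effects(ni_rates: dict[str, float],
                     stage1_rates: dict[str, float],
                     stage2_rates: dict[str, float]) -> list[tuple[str, float, float]]:
    """Per-question (gap vs. NI, within-treatment change), hardest questions first.

    ni_rates:     correct rates in the No-Information treatment (define difficulty)
    stage1_rates: first-stage correct rates in one information treatment
    stage2_rates: second-stage (final) correct rates in the same treatment
    """
    ordered = sorted(ni_rates, key=ni_rates.get)  # lowest baseline correct rate first
    return [(q, stage2_rates[q] - ni_rates[q], stage2_rates[q] - stage1_rates[q])
            for q in ordered]


# Invented rates for three questions: A (easy), B (hard), C (medium).
ni_rates = {"A": 0.85, "B": 0.15, "C": 0.50}
li_stage1 = {"A": 0.80, "B": 0.20, "C": 0.50}
li_stage2 = {"A": 0.95, "B": 0.05, "C": 0.55}
for question, gap, change in question_effects(ni_rates, li_stage1, li_stage2):
    print(question, round(gap, 2), round(change, 2))
```

When first-stage rates in a treatment are close to the No-Information rates, as the text argues, the gap and the change track each other question by question.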


Figure 3: Comparisons between the Information Treatments and the No-Information Treatment

In the Moderate-Information treatment, for each question, subjects obtain information not only about the actual choice distribution of answers in the No-Information treatment, but also about the average confidence levels of those who agree and those who disagree with the statement. Figure 3 (upper right panel) shows that, compared with those in the No-Information treatment, subjects in the Moderate-Information treatment perform better on most of the easy questions and on some of the difficult questions. They also improve their second-stage performance within the Moderate-Information treatment (lower sub-panel), mainly on the relatively straightforward questions, which indicates that information about other participants’ confidence levels helps subjects make correct judgments more frequently.

In the Full-Information treatment, the information revealed in the second stage includes the actual choice distribution of answers, the average confidence of those who agree or disagree with the statement, the average estimate of the percent agreement with one’s own answer, and the average estimate of the mean confidence, all from the No-Information treatment. Once again, subjects in the Full-Information treatment perform worse on difficult questions and better on easy questions, both compared with those in the No-Information treatment and compared with their own first-stage answers (Figure 3, bottom panel).

Overall, the statistical tests indicate significant differences in final correct rates between the No-Information treatment and the Moderate-Information treatment (Wilcoxon matched-pairs signed-ranks test, p=0.002; one-sided t-test, p=0.001) or the Full-Information treatment (Wilcoxon matched-pairs signed-ranks test, p=0.009; one-sided t-test, p=0.004), yet no evidence of a difference is found between the Low-Information treatment and the No-Information treatment (Wilcoxon matched-pairs signed-ranks test, p=0.333; two-sided t-test, p=0.359).
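These question-level comparisons pair each question’s final correct rate in one treatment with its rate in another. The Python sketch below computes the two matched-pairs statistics using only the standard library. It is our own simplified version (normal approximation, no tie or continuity correction, hypothetical function names), so its p-values may differ somewhat from statistical packages such as `scipy.stats.wilcoxon`, which offer exact and corrected variants; the paired t statistic would still need the Student t distribution with n − 1 degrees of freedom for a p-value.

```python
import math
from statistics import mean, stdev


def paired_t(x: list[float], y: list[float]) -> float:
    """Paired t statistic for matched observations (df = n - 1)."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


def wilcoxon_signed_rank(x: list[float], y: list[float]) -> tuple[float, float]:
    """Signed-rank statistic W+ and a two-sided normal-approximation p-value.

    Zero differences are dropped and tied |differences| receive average ranks;
    the variance is not corrected for ties.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p_two_sided


# Invented question-level correct rates in two treatments (same questions, same order)
treatment_a = [0.9, 0.8, 0.7, 0.6, 0.5]
treatment_b = [0.5, 0.5, 0.5, 0.5, 0.5]
print(wilcoxon_signed_rank(treatment_a, treatment_b)[0])  # prints 10.0
```

The signed-rank test uses only the ordering of the paired differences, while the paired t-test uses their magnitudes; reporting both, as the text does, guards against conclusions driven by a few large differences.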

To summarize the observed treatment effects of information:

Result 1: Information at all levels (LI, MI, FI) improves performance on easy questions but does not necessarily improve performance on difficult questions. Consequently, information about others’ choices and beliefs creates a more dispersed distribution of correct rates across questions.

3.2.2 Comparisons Across Information Conditions

The previous subsection discussed the general treatment effects of information compared to the No-Information control. Here, we compare the three information treatments with one another to test the marginal impact of increasing information content.

When comparing the second-stage performance of subjects between the Low-Information treatment and the Moderate-Information treatment, Figure 4 (upper left panel) shows that providing information about subjects’ confidence levels improves the correct rates of difficult questions. The distribution test indicates a significant difference between the two treatments (Wilcoxon matched-pairs signed-ranks test, p=0.025). The means test also confirms that subjects in the Moderate-Information treatment perform better than those in the Low-Information treatment on average (one-sided t-test, p=0.005).

As illustrated in Figure 4 (upper right panel), subjects in the Full-Information treatment generally perform worse than those in the Moderate-Information treatment (Wilcoxon matched-pairs signed-ranks test, p=0.105; one-sided t-test, p=0.044). One possibility is that the additional information about former participants’ estimates and beliefs confuses subjects in the Full-Information treatment. The correct rates of difficult questions are especially negatively affected. Overall, at this level, more information appears to be counterproductive for improving answer accuracy.

Although the positive influence of the information about subjects’ confidence levels is partially offset by that of subjects’ estimates, subjects in the Full-Information treatment outperform those in the Low-Information treatment overall (Figure 4, bottom panel). The distributional difference between the two treatments is significant at the 10% level (Wilcoxon matched-pairs signed-ranks test, p=0.073). Meanwhile, the means test shows that subjects in the Full-Information treatment perform significantly better than those in the Low-Information treatment (one-sided t-test, p=0.021). The final correct rates of difficult questions are relatively higher in the Full-Information treatment, while there is little difference on the easy questions (Figure 4, bottom panel).

Regarding the marginal treatment effects of information, we therefore observe:

Result 2: The Moderate-Information condition yields the best performance of the three information treatments, mainly due to improvement on difficult questions. Although the Full-Information treatment yields lower overall performance than the Moderate-Information treatment, it still generates higher performance on difficult questions than the Low-Information treatment, compensating for its negative impact on some of the easy questions.

Figure 4: Comparisons between Information Treatments

3.3 Subject-based Analysis

While the previous subsections focused on the question-level effects of the information provided, another relevant angle is the effect on individual subjects’ performance. In this section, we analyze performance across treatments according to each subject’s initial performance, rather than by question difficulty. In particular, this analysis shows the degree to which initially well-performing or poorly performing individuals are helped by the various information treatments.

3.3.1 Treatment effects

First, the Wilcoxon rank-sum test shows no significant difference in subjects’ performance between the No-Information treatment and the Low-Information treatment (p=0.427). The two-sided Welch t-test for unpaired data likewise finds no significant difference in mean correct rates between the two treatments (p=0.404).

Compared with those in the No-Information treatment, subjects in the Moderate-Information treatment appear to perform better on the whole. The distribution test shows a significant difference in correct rates between the two treatments (Wilcoxon rank-sum test, p=0.001), and the mean correct rate in the No-Information treatment is significantly lower than that in the Moderate-Information treatment (one-sided Welch t-test, p<0.001), consistent with the question-based results. We also observe that information mainly improves performance at the lower end of the performance distribution, i.e., for subjects whose original correct rates are relatively low, rather than at the higher end. This is intuitive. Subjects at the higher end of the distribution are more knowledgeable, so the choices of others can offer them only limited help. Subjects at the lower end of the initial performance distribution, by contrast, are more likely to benefit from the wisdom of crowds by inferring correct answers from other respondents’ choices.

Consistent with the question-based comparison between the No-Information and Full-Information treatments, subjects in the Full-Information treatment perform significantly better than those in the No-Information treatment (Wilcoxon rank-sum test, p=0.038; one-sided Welch t-test, p=0.016). Again, the information mainly improves the performance of subjects at the lower end of the performance distribution, as we show further in the next subsection.

Further evidence comes from the within-treatment Pearson correlation coefficients between the change in each subject’s correct rate (second stage minus first stage) and that subject’s first-stage correct rate (Low-Information: corr = -0.288, p = 0.109; Moderate-Information: corr = -0.562, p = 0.001; Full-Information: corr = -0.483, p = 0.005). Thus, under Moderate or Full Information, the performance gains are significantly larger for subjects with relatively low knowledge levels.
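The correlation above can be computed from per-subject correct rates with a few lines of Python. This is a minimal sketch. It assumes that the first- and second-stage correct rates of the subjects in one treatment are available as two equal-length lists. The function names and the five-subject example are hypothetical. The p-values reported in the text would require an additional significance test, which is not shown here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


def improvement_correlation(stage1, stage2):
    """Correlate each subject's change in correct rate (stage 2 minus
    stage 1) with that subject's stage-1 correct rate."""
    changes = [s2 - s1 for s1, s2 in zip(stage1, stage2)]
    return pearson(stage1, changes)


# Hypothetical correct rates for five subjects in one treatment.
stage1 = [0.36, 0.42, 0.48, 0.54, 0.62]
stage2 = [0.50, 0.52, 0.54, 0.56, 0.62]
r = improvement_correlation(stage1, stage2)  # negative: low scorers gain most
```

A negative coefficient, as in all three treatments in the text, means that the larger gains go to subjects who started with lower correct rates.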

We also examine the Pearson correlation coefficients between the changes in correct rates and subjects’ own average confidence. In the Low- and Moderate-Information treatments, performance improvements are concentrated among less confident subjects. No significant relationship appears in the Full-Information treatment (Low-Information: corr = -0.368, p = 0.038; Moderate-Information: corr = -0.367, p = 0.039; Full-Information: corr = 0.158, p = 0.387).3

Our main results in this section can be summarized as follows:

Result 3: At the individual subject level, performance improves from the first to the second stage in the Moderate- and Full-Information treatments. The individual performance improvements are negatively correlated with initial performance and initial confidence, indicating that the performance gains are primarily concentrated among subjects with low initial knowledge and confidence levels.

3.3.2 Comparisons Across Information Conditions

In Figure 5, the subjects in each information treatment are ordered by their first-stage correct rates, from low to high (left to right). Their performance is normalized by subtracting the mean correct rate of the baseline No-Information (NI) treatment (0.482).

Information about the average confidence level gives a substantial advantage to the less knowledgeable subjects in the Moderate-Information treatment, whereas subjects in the Low-Information treatment do not seem to benefit from the information provided in their treatment. The two treatments differ significantly in both the distribution (Wilcoxon rank-sum test, p=0.001) and the mean (one-sided Welch t-test, p<0.001) of correct rates.

As the previous analyses show, the additional information in the Full-Information treatment about other respondents’ estimates and beliefs does not raise question-level correct rates relative to the Moderate-Information treatment. It does, however, change the distribution of individual subjects’ correct rates, and this change is significant at the 10% level (Wilcoxon rank-sum test, p=0.086). Specifically, the average correct rate is lower (one-sided Welch t-test, p=0.045) and the variance of correct rates is larger in the Full-Information treatment, as shown in Figure 2 (bottom panels).

3 Note that throughout the paper, we define difficult questions and knowledgeable individuals based on correct rates rather than self-reported confidence levels. We could also take confidence level as an ex-ante measure of question difficulty and knowledge. The choice between the two measures depends on our focus. One focus is whether questions are objectively difficult (and subjects objectively knowledgeable). The other is whether subjects think the questions are difficult (and regard themselves as knowledgeable). Since we are interested in the wisdom of crowds as an information mechanism, we focus on the former.

In the subject-level comparison between the Low-Information and Full-Information treatments, the distributional test does not indicate a significant difference (Wilcoxon rank-sum test, p=0.161). The latter treatment, however, has a marginally significantly higher average correct rate (one-sided Welch t-test, p=0.051). Figure 5 shows that subjects with moderate knowledge levels improve noticeably in the Full-Information treatment, which is not the case in the Low-Information treatment. This suggests that the additional information about others’ confidence and estimates in the Full-Information treatment is still beneficial compared to the basic information provided in the Low-Information treatment.

Figure 5: Relative Correct Rate for Each Subject in Three Information Treatments

Note: Nonparametric kernel estimation with Epanechnikov function used to obtain smooth curve.
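The construction of Figure 5 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' code. We assume that the plotted quantity is each subject's second-stage correct rate minus the No-Information mean (0.482), and that subjects are indexed by their rank in first-stage correct rate. We also assume the smoothing is a local-constant (Nadaraya-Watson) kernel regression. The figure note specifies only the Epanechnikov kernel, so the estimator type, the bandwidth, and the helper names are hypothetical choices.

```python
NI_MEAN = 0.482  # mean correct rate in the No-Information baseline (from the text)


def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def relative_curve(stage1, stage2, baseline=NI_MEAN):
    """Order subjects by stage-1 correct rate (low to high) and return their
    ranks and their stage-2 correct rates minus the baseline mean."""
    order = sorted(range(len(stage1)), key=lambda i: stage1[i])
    xs = [rank + 1 for rank in range(len(order))]
    ys = [stage2[i] - baseline for i in order]
    return xs, ys


def kernel_smooth(xs, ys, grid, bandwidth):
    """Local-constant (Nadaraya-Watson) estimate of y at each grid point."""
    fitted = []
    for g in grid:
        w = [epanechnikov((x - g) / bandwidth) for x in xs]
        total = sum(w)
        if total == 0:
            fitted.append(float("nan"))  # no observations within the bandwidth
        else:
            fitted.append(sum(wi * yi for wi, yi in zip(w, ys)) / total)
    return fitted


# Hypothetical four-subject treatment.
xs, ys = relative_curve([0.40, 0.30, 0.50, 0.60], [0.50, 0.44, 0.52, 0.60])
smooth = kernel_smooth(xs, ys, grid=xs, bandwidth=2.0)
```

Values above zero on the smoothed curve mark subjects who, after seeing the information, outperform the average No-Information subject.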

These observations lead to the following result regarding the marginal effects of information levels on subjects with heterogeneous initial knowledge levels:

Result 4: Subjects with relatively low initial knowledge levels benefit most from the moderate amount of information (MI).

To examine more directly how the effects of different levels of information vary with initial knowledge, we divide the subjects in each information treatment into three knowledge groups of approximately equal size, based on their original correct rates. The low knowledge group has 11 individuals, the moderate knowledge group 10, and the high knowledge group 11. The average improvements in correct rates (second stage minus first stage) for the three groups are 0.022, 0.012, and -0.002, respectively, in the Low-Information treatment; 0.116, 0.054, and 0.022 in the Moderate-Information treatment; and 0.075, 0.058, and 0.005 in the Full-Information treatment. Thus, in each information treatment there is a negative monotonic relationship between initial knowledge and information-driven improvement. The statistical tests are as follows.
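The grouping step can be sketched in Python as below. This is a hypothetical helper. It assumes that per-subject first- and second-stage correct rates are available for one treatment. It also assumes that ties in first-stage correct rates are broken by subject order, since the text does not say how ties at group boundaries were handled.

```python
def group_improvements(stage1, stage2, sizes=(11, 10, 11)):
    """Split subjects into low/moderate/high knowledge groups by stage-1
    correct rate and return each group's mean improvement (stage 2 - stage 1)."""
    if sum(sizes) != len(stage1) or len(stage1) != len(stage2):
        raise ValueError("group sizes must add up to the number of subjects")
    # Stable sort: subjects with equal stage-1 rates keep their input order.
    order = sorted(range(len(stage1)), key=lambda i: stage1[i])
    means, start = [], 0
    for size in sizes:
        members = order[start:start + size]
        means.append(sum(stage2[i] - stage1[i] for i in members) / size)
        start += size
    return means


# Hypothetical treatment of 32 subjects whose gains shrink with initial knowledge.
stage1 = [0.30 + 0.01 * i for i in range(32)]
stage2 = [rate + (0.10 if i < 11 else 0.05 if i < 21 else 0.0)
          for i, rate in enumerate(stage1)]
low, moderate, high = group_improvements(stage1, stage2)
```

The cross-treatment tests that follow then compare the individual improvements within one group across two treatments.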

For the comparisons between the Moderate-Information and Low-Information treatments, both the distribution test and the means test show that the additional information about the average confidence level significantly increases the second-stage improvement of the less knowledgeable subjects (low knowledge group: $p_{WCX}<0.001$, $p_{T1}<0.001$; moderate knowledge group: $p_{WCX}=0.083$, $p_{T1}=0.047$; high knowledge group: $p_{WCX}=0.208$, $p_{T2}=0.225$). Here $p_{WCX}$ is the p-value of the two-sided Wilcoxon rank-sum test, $p_{T1}$ is the p-value of the one-sided Welch t-test, and $p_{T2}$ is the p-value of the two-sided Welch t-test. By contrast, no significant treatment effects are detected in any of the three groups between the Moderate-Information and Full-Information treatments (low knowledge group: $p_{WCX}=0.155$, $p_{T2}=0.133$; moderate knowledge group: $p_{WCX}=0.879$, $p_{T2}=0.905$; high knowledge group: $p_{WCX}=0.320$, $p_{T2}=0.463$).

Finally, subjects with a low or moderate level of knowledge in the Full-Information treatment experience greater average improvement in the second stage than their counterparts in the Low-Information treatment. The difference is significant at the 5% level only for the low knowledge group (low knowledge group: $p_{WCX}=0.014$, $p_{T1}=0.014$; moderate knowledge group: $p_{WCX}=0.268$, $p_{T1}=0.063$; high knowledge group: $p_{WCX}=0.921$, $p_{T2}=0.753$).

3.4 Regression Analysis, Performance

An econometric analysis of the link between the information attributes and final performance provides further support for the basic results above. Table 2 reports logit regressions of the binary outcome that a subject gives the correct answer in the second stage.

Note that in the regressions, the independent variables are transformed from the information objectively displayed in the experiment into information framed from the perspective of the subject’s own answer, i.e., information that is consistent or inconsistent with that answer. We do this because, when deciding whether to revise, subjects presumably evaluate the information relative to their own original answer. For example, in the Low-Information treatment, subjects likely pay attention to how many participants gave the same answer as their own, rather than to the raw proportion of participants agreeing with the original statement, as presented on the screen. For each subject and question, we define the subject’s SameGroup as the No-Information respondents who agree with the subject’s answer to that question, and the subject’s OppoGroup as the No-Information respondents who disagree with it.
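The reframing can be illustrated with a small Python sketch. It assumes that, for each question, the screen shows three quantities: the share of No-Information respondents who chose “True”, and the mean confidence of those choosing “True” and of those choosing “False”. The field names and the keys SameConf and OppoConf are hypothetical labels. RealSame follows the definition used for Table 2.

```python
from dataclasses import dataclass


@dataclass
class DisplayedInfo:
    """Information on screen, framed relative to the statement (hypothetical fields)."""
    share_true: float  # share of No-Information respondents who chose "True"
    conf_true: float   # their mean confidence
    conf_false: float  # mean confidence of those who chose "False"


def frame_by_own_answer(own_answer, info):
    """Re-express the displayed information relative to the subject's own
    first-stage answer: the SameGroup agrees with it, the OppoGroup does not."""
    if own_answer:
        return {"RealSame": info.share_true,
                "SameConf": info.conf_true, "OppoConf": info.conf_false}
    return {"RealSame": 1.0 - info.share_true,
            "SameConf": info.conf_false, "OppoConf": info.conf_true}


info = DisplayedInfo(share_true=0.7, conf_true=0.80, conf_false=0.75)
framed = frame_by_own_answer(False, info)  # a subject who answered "False"
```

The same screen thus maps to different regressors for subjects who gave opposite first-stage answers.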

Table 2 reports the regression results for the information treatments and other relevant variables on final performance. The variables are defined as follows:
- Cfdc is subjects’ self-reported confidence in their own answer.
- EstiSame is their estimate of the proportion of others giving the same answer.
- EstiCfdc is their estimate of all participants’ average confidence in their own answers.
- RealSame is the actual proportion of subjects making the same choice as their own.

Subjects’ confidence levels, knowledge levels, and the ease of questions are consistently positively associated with final performance.

Among the information treatments, the regressions confirm the earlier observation that performance is best in the Moderate-Information treatment (Columns 2 and 5, which differ by the inclusion of fixed effects). The coefficients on the interaction terms between the actual choice distribution (RealSame) and the information treatment dummies are negative (Columns 3 and 6, which differ by the inclusion of fixed effects). This indicates that information provision strengthens the negative relationship between the percentage agreement with one’s own answer (RealSame) and answer accuracy. The rationale is that, other factors being equal, a larger proportion of participants sharing the decision-maker’s view boosts their confidence in the original answer. However, judging by the coefficient magnitudes and significance levels, this strengthening effect appears to weaken somewhat as more information is provided.

Table 2: Logit Regression: Dependent Variable: Correct Answer in 2nd Stage

Dependent variable: I(NewTorF), correct answer in the 2nd stage (marginal effects)

                          (1)        (2)        (3)        (4)        (5)        (6)
Cfdc                      0.237***   0.234***   0.263***   0.300***   0.300***   0.347***
                          (0.044)    (0.044)    (0.083)    (0.050)    (0.050)    (0.098)
EstiSame                  -0.215***  -0.226***  -0.176*    -0.238***  -0.238***  -0.246**
                          (0.047)    (0.047)    (0.095)    (0.051)    (0.051)    (0.116)
EstiCfdc                  -0.051     -0.047     -0.043     -0.117     -0.117     -0.087
                          (0.066)    (0.066)    (0.126)    (0.075)    (0.075)    (0.160)
RealSame                  -0.147***  -0.142***  0.001      -0.181***  -0.181***  -0.060
                          (0.035)    (0.035)    (0.086)    (0.043)    (0.043)    (0.094)
LI                                   0.016      0.297***              0.037      0.267**
                                     (0.014)    (0.096)               (0.084)    (0.131)
MI                                   0.071***   0.247**               0.164**    0.310**
                                     (0.015)    (0.099)               (0.075)    (0.128)
FI                                   0.044***   0.135                 0.131*     0.238*
                                     (0.015)    (0.096)               (0.077)    (0.128)
Cfdc×LI                                         0.120                            0.086
                                                (0.121)                          (0.138)
Cfdc×MI                                         -0.175                           -0.206
                                                (0.125)                          (0.140)
Cfdc×FI                                         -0.043                           -0.077
                                                (0.120)                          (0.135)
EstiSame×LI                                     0.009                            0.086
                                                (0.129)                          (0.149)
EstiSame×MI                                     -0.199                           -0.108
                                                (0.142)                          (0.157)
EstiSame×FI                                     -0.043                           0.028
                                                (0.128)                          (0.147)
EstiCfdc×LI                                     -0.383**                         -0.373*
                                                (0.179)                          (0.212)
EstiCfdc×MI                                     0.269                            0.231
                                                (0.192)                          (0.218)
EstiCfdc×FI                                     0.101                            0.040
                                                (0.182)                          (0.211)
RealSame×LI                                     -0.191*                          -0.174*
                                                (0.106)                          (0.103)
RealSame×MI                                     -0.187*                          -0.135
                                                (0.109)                          (0.106)
RealSame×FI                                     -0.183*                          -0.161
                                                (0.107)                          (0.104)
Knowledge                 0.761***   0.738***   0.757***
                          (0.055)    (0.056)    (0.056)
Easiness                  1.127***   1.128***   1.129***
                          (0.015)    (0.015)    (0.015)
Individual fixed effects  N          N          N          Y          Y          Y
Question fixed effects    N          N          N          Y          Y          Y
Observations              6,400      6,400      6,400      6,400      6,400      6,400
Pseudo R2                 0.269      0.272      0.275      0.311      0.311      0.313

Notes: Logit estimation of the effect of information provision on final answer accuracy. The dependent variable is whether a subject correctly answered a given question in the second stage. Cfdc denotes subjects’ self-reported confidence in their own answer; EstiSame denotes their estimate of the proportion of others giving the same answer; EstiCfdc denotes their estimate of all participants’ average confidence in their own answers; RealSame denotes the proportion of subjects making the same choice as their own. LI, MI, and FI are dummy variables for the Low-Information, Moderate-Information, and Full-Information treatments, respectively. Knowledge is each subject’s aggregate correct rate in the first stage, and Easiness is each question’s aggregate correct rate in the No-Information treatment. Coefficients displayed are marginal effects. Robust standard errors are in parentheses. *p<0.1, **p<0.05, ***p<0.01.
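For readers less familiar with logit marginal effects, the sketch below shows one common way to obtain them. It computes the average marginal effect of a continuous regressor, i.e., the sample mean of β_k p_i (1 − p_i), where p_i is the fitted probability. The table notes do not say whether average marginal effects or marginal effects at the means were reported, nor how dummy variables were treated. This Python sketch is therefore an illustrative assumption with hypothetical inputs, not the estimation used for Table 2.

```python
from math import exp


def logistic(z):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + exp(-z))


def average_marginal_effect(rows, beta, k):
    """Average over observations of dP(y=1)/dx_k = beta_k * p_i * (1 - p_i)
    in a logit model, for a continuous regressor k."""
    total = 0.0
    for x in rows:
        p = logistic(sum(b * xj for b, xj in zip(beta, x)))
        total += beta[k] * p * (1.0 - p)
    return total / len(rows)


# Hypothetical design: an intercept column plus one regressor (index 1).
rows = [[1.0, 0.0], [1.0, 0.0]]
ame = average_marginal_effect(rows, beta=[0.0, 2.0], k=1)  # p = 0.5, so 2 * 0.25
```

Because p(1 − p) never exceeds 0.25, a logit marginal effect is always smaller in magnitude than a quarter of the raw coefficient.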

3.5 Summary of Information Content

Before investigating the mechanisms behind the observed results through subjects’ decision heuristics, we first describe some features of the information generated in the baseline (No-Information) treatment. This is the information on which subjects in the information treatments base their second-round decisions.

Subjects’ real confidence levels are significantly higher than their estimates of the mean confidence level (Wilcoxon matched-pairs signed-ranks test, p<0.001), which implies that individuals are typically more confident than they believe others are. In addition, the average actual confidence on some difficult questions is even higher than on the easy questions. For example, all subjects in the No-Information treatment choose “False” for the statement “The last name of Confucius is Zi,” with an average confidence level as high as 90.13%, although the statement is actually true.4 Such questions are not as simple as they seem, and subjects are often unaware of their underlying difficulty.

Since our design provides separate information for respondents agreeing and disagreeing with the statement, we separately analyze two groups. The Right group consists of subjects who answer a question correctly, and the Wrong group consists of subjects who answer it incorrectly.

Across all questions, the average confidence level of the Right group is 74.01%, which is 2.07 percentage points higher than that of the Wrong group (71.94%) (Wilcoxon matched-pairs signed-ranks test, p=0.013). However, the Wrong group is more confident than the Right group for some difficult questions. Examples (translated from Chinese) are “The first woman to proclaim herself emperor in the history of China is Zetian Wu.” (Question 47), “Severe winter refers to the twelfth month of the lunar calendar.” (Question 46), and “The Italian scientist and astronomer Copernicus was burned to death for his adherence to heliocentric theory.” (Question 22).

The pattern for the average estimated mean confidence is similar to that for average real confidence; however, the difference between the two groups’ estimates is not statistically significant (Wilcoxon matched-pairs signed-ranks test, p=0.201). As a consequence, subjects in the Right group are more likely to exhibit overconfidence in their answers, i.e., their average real confidence level is higher than their average estimated confidence level (Wilcoxon matched-pairs signed-ranks test, p<0.001). The phenomenon is not significant in the Wrong group (Wilcoxon matched-pairs signed-ranks test, p=0.141).

We also examine overconfidence at the individual level. As an overall measure for each subject, we define the overconfidence degree as the relative deviation of his average self-reported confidence from his original correct rate, i.e., (average confidence − original correct rate) / original correct rate. The overconfidence degree ranges from 0.02 to 1.35 in the No-Information treatment, and those who exhibit severe overconfidence are typically less knowledgeable than average. Under this measure, the overconfidence degree of the Right group is lower than that of the Wrong group (Wilcoxon matched-pairs signed-ranks test, p<0.001).5 The two measures thus point in opposite directions. Compared with their beliefs about others’ average confidence, subjects in the Right group appear more overconfident than those in the Wrong group. Compared with their actual performance, they appear less overconfident.
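The individual overconfidence degree is simple arithmetic, shown in the minimal Python sketch below. It assumes confidence is recorded on a 0–1 scale (the text reports percentages) and correctness as booleans. The function names are hypothetical. The second function is the difference-based variant mentioned in footnote 5.

```python
def overconfidence_degree(confidences, correct):
    """(average self-reported confidence - correct rate) / correct rate."""
    n = len(confidences)
    rate = sum(correct) / n
    if rate == 0:
        raise ValueError("undefined for a subject with no correct answers")
    return (sum(confidences) / n - rate) / rate


def overconfidence_gap(confidences, correct):
    """Difference-based variant: average confidence minus correct rate."""
    n = len(confidences)
    return sum(confidences) / n - sum(correct) / n


conf = [0.9, 0.8, 0.7, 0.6]          # hypothetical confidence on four questions
right = [True, False, True, False]   # correct rate 0.5
degree = overconfidence_degree(conf, right)  # approximately 0.5
gap = overconfidence_gap(conf, right)        # approximately 0.25
```

Dividing by the correct rate makes the same absolute gap count for more among subjects with low correct rates. This is one reason why the severely overconfident subjects tend to be the less knowledgeable ones.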

In addition to confidence levels, we also collect each subject’s estimate of the proportion of participants who make the same choice as their own. Subjects tend to overestimate this proportion (Wilcoxon matched-pairs signed-ranks test, p<0.001 in both the Right and Wrong groups). The average estimate for each question is above 50%. In the raw data, there are very few cases in which a subject predicts that less than 50% of other subjects agree with his answer to a given question. In other words, nearly every subject believes that the majority of other respondents answer the same way as they do, and such consensus bias (Marks and Miller, 1987; Brosig et al., 2003) also prevails in our information treatments.

4 Since no subject in the No-Information treatment agrees with this particular statement, we have no information about the average confidence, estimates, or beliefs of those agreeing with it.
5 This result still holds if we define the overconfidence degree as the difference between a subject’s average self-reported confidence and correct rate.

Result 5: Subjects tend to be overconfident, reporting confidence levels that exceed their actual correct rates. Furthermore, they tend to overestimate the proportion of participants making the same choice as their own.

3.6 Investigation of Decision Heuristics

We now evaluate three simple heuristics in the context of our experiment: the Majority Rule (MR), the Maximum Confidence rule (MC), and the Surprising Popularity rule (SP) (Prelec et al., 2017).

The three rules are defined as follows:
- Majority Rule (MR): an answer is the “majority” answer if at least 50% of respondents in the No-Information treatment selected it.
- Maximum Confidence (MC): an answer is consistent with this rule if the average confidence level of subjects agreeing with it is not lower than that of subjects disagreeing with it.
- Surprising Popularity (SP): an answer satisfies this rule if the actual proportion of subjects choosing it is not lower than its average predicted proportion.6

The Surprising Popularity rule can produce the best answer under certain behavioral assumptions, as shown in Prelec et al. (2017). As an illustration, in the classic Philadelphia question (Prelec et al., 2017), most respondents mistakenly regard Philadelphia as the capital of Pennsylvania. The confidence levels associated with ‘yes’ and ‘no’ votes are roughly similar, so both the Majority Rule and the Maximum Confidence rule fail. However, the two groups form different expectations about others. Respondents voting ‘yes’ believe that most others will agree with them. Respondents voting for the correct answer ‘no’ expect that few others share their specialized knowledge. As a result, the average predicted percentage of ‘no’ votes falls short of the actual percentage. The answer ‘no’ is therefore surprisingly popular, and it is in fact the right answer.
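The three rules can be written as short functions of the No-Information responses to a single question. In this Python sketch, each response is assumed to hold the chosen answer, the self-reported confidence, and the predicted share of others giving the same answer (the EstiSame variable). Two details are our assumptions rather than the paper's. First, the average predicted proportion for the SP rule is taken over all respondents, after converting each prediction into a predicted share of “True”. Second, ties and questions with no respondents on one side are resolved as described in the comments. The example numbers are hypothetical and only loosely modeled on the Philadelphia question.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Response:
    answer: bool        # True if the respondent agrees with the statement
    confidence: float   # self-reported confidence in own answer (0.5 to 1.0)
    esti_same: float    # predicted share of others giving the same answer


def majority_rule(rs):
    """Answer chosen by at least 50% of respondents (a tie returns True)."""
    return sum(r.answer for r in rs) / len(rs) >= 0.5


def maximum_confidence(rs):
    """Answer whose supporters have the higher mean confidence (a tie returns
    True). If nobody chose one side, the only observed answer is returned."""
    conf_true = [r.confidence for r in rs if r.answer]
    conf_false = [r.confidence for r in rs if not r.answer]
    if not conf_true or not conf_false:
        return bool(conf_true)
    return mean(conf_true) >= mean(conf_false)


def surprising_popularity(rs):
    """Answer whose actual share is not lower than its average predicted share."""
    actual_true = sum(r.answer for r in rs) / len(rs)
    # Convert each prediction into a predicted share of "True" answers.
    predicted_true = mean(r.esti_same if r.answer else 1 - r.esti_same for r in rs)
    return actual_true >= predicted_true


# Philadelphia-style case: 7 wrong "yes" votes, 3 correct "no" votes.
yes = [Response(True, 0.80, 0.80)] * 7
phil = yes + [Response(False, 0.75, 0.30)] * 3       # experts expect to be a minority
consensus = yes + [Response(False, 0.75, 0.60)] * 3  # "no" voters also expect a majority
```

In the first case, MR and MC both return the wrong answer “yes” (True), and only SP recovers “no” (False). In the second case, the minority also expects to be in the majority, and SP follows the incorrect majority. This illustrates the difficulty discussed in the next paragraph.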

Ideally, as shown in Prelec et al. (2017), the Surprising Popularity rule is superior to the Maximum Confidence rule and the Majority Rule as long as subjects have enough evidence to determine the correct answer, since it can extract more information from the available evidence. In practice, however, this heuristic is not as successful in our experiment. Empirically, in our No-Information treatment, subjects choosing “True” and those choosing “False” both expect, on average, to be in the majority, a pattern also observed in the experiments of Baillon et al. (2020). This makes it difficult for the Surprising Popularity rule to override an incorrect majority answer.

Figure 6 compares the accuracies of the three decision rules (MR, MC, SP) in our experiment. On the vertical axis, a value of 1 (solid bar) indicates that the rule gives the true answer, while a value of 0 indicates that it gives the incorrect answer. The Majority Rule yields an overall correct rate of 52%. By definition, these correct answers are fully concentrated on the easier questions, since the majority answer is correct exactly when at least half of the respondents answer the question correctly. This demonstrates the serious limitations of the heuristic. The distribution of accuracies for the SP rule is quite similar in our experiment, with a correct rate of 50%, mostly concentrated on the easier questions. This does not necessarily invalidate the heuristic. It does indicate that the heuristic requires the correct answer to be more popular in practice than respondents’ beliefs about others’ answers would predict, and this condition apparently is not met for our subjects and questions. Compared to the two other rules, the accuracy of the Maximum Confidence rule is more uniformly distributed across question difficulty levels. It yields an overall correct rate of 68%, which far exceeds the original stage-1 performance of 98.44% (126/128) of subjects.

6 Note that the “not lower than” criterion is never binding when we define the Maximum Confidence and Surprisingly Popular answers. There is only one statement (Question 1) for which exactly 50% of respondents agree, so treating ties as satisfying the Majority Rule does not affect the results.

Figure 6: Accuracies of Three Decision Rules

Note: Questions are ordered from difficult to easy (left to right).

The overlap of the answers given by the three rules can also be inferred from Figure 6. The Maximum Confidence rule gives a different answer from the Majority Rule (Surprising Popularity rule) on 20 (23) questions. The Majority Rule, however, is difficult to distinguish from the Surprising Popularity rule, since their predictions coincide on 45 of the 50 questions. To investigate which decision rule subjects favor most, we restrict attention to the questions on which the pair of rules under consideration yields different answers. We then compare how often subjects’ final answers coincide with the answer implied by each rule.

We begin with the comparison between the Maximum Confidence rule and the Majority Rule, which are potential competing approaches in the Moderate-Information and Full-Information treatments. In the Moderate-Information (Full-Information) treatment, the answers of 16 (24) of 32 subjects coincide with the Majority Rule more often than with the Maximum Confidence rule. Only 12 (6) of 32 subjects show the reverse pattern, favoring the Maximum Confidence rule. This indicates that the Majority Rule is generally favored. The pattern is robust to a larger threshold gap between the rules, for example defining a rule as “favored” if a subject’s answers coincide with it over 20% more often. Under this criterion, in the Moderate-Information (Full-Information) treatment, the answers of 8 (22) subjects coincide with the Majority Rule over 20% more often, and only 4 (4) subjects show the opposite pattern.
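The classification of subjects by favored rule can be sketched as follows. The Python below assumes answers are booleans. It reads “used over 20% more often” as a gap of more than 0.20 between the two coincidence rates. The text does not say whether the threshold is absolute or relative, so the margin parameter and the helper names are hypothetical.

```python
def coincidence_rates(final, rule_a, rule_b):
    """On questions where the two rules disagree, return the shares of the
    subject's final answers matching rule A and rule B (None if no such question)."""
    differing = [q for q in range(len(final)) if rule_a[q] != rule_b[q]]
    if not differing:
        return None
    match_a = sum(final[q] == rule_a[q] for q in differing) / len(differing)
    match_b = sum(final[q] == rule_b[q] for q in differing) / len(differing)
    return match_a, match_b


def favored_rule(final, rule_a, rule_b, margin=0.0):
    """'A' or 'B' if that rule's coincidence rate exceeds the other's by more
    than margin, otherwise 'neither'."""
    rates = coincidence_rates(final, rule_a, rule_b)
    if rates is None:
        return "neither"
    a, b = rates
    if a - b > margin:
        return "A"
    if b - a > margin:
        return "B"
    return "neither"


final = [True, True, False, False]   # hypothetical subject's final answers
mr = [True, True, True, False]       # hypothetical Majority Rule answers
mc = [False, False, False, False]    # hypothetical Maximum Confidence answers
```

With true/false answers, a final answer on a question where the two rules disagree matches exactly one of them, so the two rates always sum to one. The perceived-difficulty analysis below amounts to running the same comparison separately on answers with confidence below, and at or above, 75%.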

Comparing the Maximum Confidence rule to the Surprising Popularity rule, which are possible approaches in the Full-Information treatment, we find that more subjects favor the Surprising Popularity rule (22 subjects) than the Maximum Confidence rule (10 subjects) on the 23 eligible questions. This is consistent with the earlier finding comparing the Majority Rule to the Maximum Confidence rule, given the high similarity between the Majority Rule and the Surprising Popularity rule in our setting.

The opportunity to distinguish between the Majority Rule and the Surprising Popularity rule is limited, because the two rules’ predictions largely coincide in our setting. For the five questions on which the two rules do imply different answers, 81.25% of subjects choose answers consistent with the Majority Rule more often. Thus, the Majority Rule appears to be the most favored heuristic behaviorally.

Table 3: Decision rules and question difficulty, as measured by confidence

                                               Easy questions         Difficult questions
                                               (confidence ≥ 75%)     (confidence < 75%)
Decision rules (former vs. latter)  Treatment  Former     Latter      Former     Latter
Majority rule vs. Max. confidence   MI         53.125%    31.25%      43.75%     43.75%
Majority rule vs. Max. confidence   FI         84.375%    15.625%     64.516%    25%
SP rule vs. Max. confidence         FI         68.75%     21.875%     62.5%      28.125%
Majority rule vs. SP rule           FI         70%        10%         60%        20%

Note: "Former" ("Latter") gives the percentage of subjects favoring the former (latter) rule; confidence refers to self-reported confidence.

To explore further whether subjects’ reliance on a given rule depends on the perceived difficulty of questions, we examine subsamples, using self-reported confidence as a measure of a question’s perceived difficulty. A question is considered relatively difficult for a subject if his self-reported confidence is strictly below 75%, and easy otherwise. The 75% cutoff is the midpoint between 50% and 100% and is close to the mean confidence level of all subjects (74%). The results show that, whether questions are perceived as difficult or easy, whenever the answers suggested by two of the three rules differ, the Majority Rule (or the closely related Surprising Popularity rule) is favored by at least as many subjects as the competing rule. However, as shown in Table 3, subjects are comparatively more likely to favor the Maximum Confidence rule over the other rules for difficult questions than for easy questions.

To summarize the results of this section on subjects’ use of decision heuristics:

Result 6: Decision-makers’ answers are most frequently consistent with Majority Rule, where

discrepancies between the three rules exist, even though following the Maximum Confidence rule can

yield better performance.

One possible explanation for this result is the cognitive burden associated with the different heuristics. Determining the answer through the Majority Rule is relatively simple, whereas considering other subjects’ confidence levels and applying the Maximum Confidence rule may require more cognitive processing. In addition, since high mean confidence levels are reported for many questions, subjects may doubt others’ reported confidence or suspect others of overconfidence, leading them to discount this potentially valuable information.

4. Analysis of Answer Changes

The previous section analyzed how performance varies with the amount of information provided. To gain further insight into the effects of different information levels, we now analyze subjects’ answer changes in stage 2 at the individual level. We first present results from the raw data and then estimate regression models that control for variables that could affect answer-change decisions.

We define a subject’s revision rate as the share of the 50 questions on which they revise their answer. The overall revision rates are 18.81% in the Moderate-Information treatment and 18.19% in the Full-Information treatment. Both are significantly higher than the 12.94% in the Low-Information treatment (Wilcoxon rank-sum test, LI vs MI: p=0.004; LI vs FI: p=0.005). We cannot reject the null hypothesis that the revision rates in the Moderate-Information and Full-Information treatments are equal (Wilcoxon rank-sum test, p=0.968). The relatively low revision rates in our data are in line with the egocentric judgment widely documented in the literature (e.g., Chambers & Windschitl, 2004; Yaniv & Milyavsky, 2007; Tump et al., 2018; Niu et al., 2019). This suggests that subjects tend to discount public information in favor of their own original perspective.

In addition, we define the question-based revision rate as the proportion of subjects who change their answer to a given question from their initial choice, and we explore the interaction between question difficulty and information attributes.
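To make the two definitions concrete, the following Python sketch computes subject-level and question-level revision rates from a subjects × questions table of True/False answers in the two stages. The data layout and the function names are hypothetical, chosen for illustration; this is not the authors’ code.

```python
from typing import List

# Hypothetical layout: rows = subjects, columns = questions, True = "agree".
Answers = List[List[bool]]


def subject_revision_rates(first: Answers, second: Answers) -> List[float]:
    """Share of questions on which each subject changed the first-stage answer."""
    return [
        sum(a != b for a, b in zip(row1, row2)) / len(row1)
        for row1, row2 in zip(first, second)
    ]


def question_revision_rates(first: Answers, second: Answers) -> List[float]:
    """Share of subjects who changed their answer to each question."""
    n_subjects = len(first)
    n_questions = len(first[0])
    return [
        sum(first[s][q] != second[s][q] for s in range(n_subjects)) / n_subjects
        for q in range(n_questions)
    ]


# Hypothetical toy data: 2 subjects, 4 questions.
stage1 = [[True, False, True, True], [False, False, True, False]]
stage2 = [[True, True, True, True], [True, False, False, False]]
print(subject_revision_rates(stage1, stage2))   # [0.25, 0.5]
print(question_revision_rates(stage1, stage2))  # [0.5, 0.5, 0.5, 0.0]
```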

4.1 Role of Self-reported Confidence Levels

Each subject’s self-reported confidence level is a reasonable indicator of how certain they are of the answers they chose in the first stage. We would naturally expect subjects who are less confident in their answers to be more influenced by the information provided, whereas more confident subjects are likely to require stronger evidence before revising their original answers.

We calculate the revision rates over low-confidence answers (self-reported confidence < 75%) and high-confidence answers (self-reported confidence ≥ 75%) separately for each subject and each question.7 As seen in Figure 7, revision rates over low-confidence answers exceed those over high-confidence answers at both the subject level and the question level. In addition, revision rates are higher in the Moderate-Information and Full-Information treatments than in the Low-Information treatment for both high- and low-confidence answers, as seen by comparing the rows of Figure 7.
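A minimal sketch of the split described above, assuming confidence is recorded as a fraction between 0 and 1 so that the 75% cut-off becomes 0.75. The function name is hypothetical. A group with no answers returns None, which covers the case (noted in footnote 7) of a subject who never reports a confidence level of 75% or more.

```python
from typing import Optional, Sequence, Tuple


def revision_rates_by_confidence(
    confidences: Sequence[float],
    revised: Sequence[bool],
    threshold: float = 0.75,
) -> Tuple[Optional[float], Optional[float]]:
    """Return (low-confidence revision rate, high-confidence revision rate).

    Answers with confidence < threshold count as low confidence and answers with
    confidence >= threshold as high confidence; an empty group yields None.
    """
    low = [r for c, r in zip(confidences, revised) if c < threshold]
    high = [r for c, r in zip(confidences, revised) if c >= threshold]

    def rate(flags):
        return sum(flags) / len(flags) if flags else None

    return rate(low), rate(high)


# Hypothetical answers of one subject: confidence and whether the answer was revised.
low_rate, high_rate = revision_rates_by_confidence(
    [0.6, 0.9, 0.7, 0.8, 0.5], [True, False, False, False, True]
)
print(low_rate, high_rate)  # 0.6666666666666666 0.0
```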

In Figure 8, we distinguish between high- and low-knowledge subjects. For around half of the subjects, knowledge is fairly well matched with confidence. The average revision rate is generally higher among subjects with less knowledge. When the amount of information is low or moderate, high-confidence subjects (average self-reported confidence ≥ 75%) revise their answers less frequently than low-confidence subjects, conditional on having low knowledge levels (original correct rate < 50%). By contrast, conditional on having high knowledge levels (original correct rate ≥ 50%), high-confidence subjects are more likely to revise their answers than low-confidence ones.

7 Note that across the three information treatments, only two subjects never modify their answers. In addition, all subjects encounter both questions in which they have high confidence and questions in which they have low confidence, except for one subject in the Low-Information treatment who always reports a confidence level below 75%.

Figure 7: Revision Rate and Self-reported Confidence

Notes: Subject number is ordered by subjects’ first-stage correct rates within each treatment, from low to high; question number is ordered by question difficulty in the No-Information treatment, from difficult to easy.

Figure 8: Revision Rate and Consistency between Confidences and Performance

4.2 Influence of Other Respondents’ Choices

Comparing the percentage of agreement for each question across treatments: for statements with which over half of subjects in the No-Information treatment agree, the average second-stage rate of agreement is 84.00% in the Low-Information treatment (71.40% in the first stage), 72.92% in the Moderate-Information treatment (67.42% in the first stage), and 76.99% in the Full-Information treatment (67.05% in the first stage). Furthermore, the average second-stage agreement rates in all three information treatments are higher than both the rate in the No-Information treatment (70.83%) and the rate in the corresponding first stage, demonstrating a consensus-building effect of information.


Similarly, for statements with which over half of subjects in the No-Information treatment disagree, the average second-stage rate of agreement falls to only 28.13% in the Low-Information treatment (39.06% in the first stage), 25.39% in the Moderate-Information treatment (38.09% in the first stage), and 22.85% in the Full-Information treatment (37.70% in the first stage), all lower than the rate in the No-Information treatment (35.74%). Thus, when the majority in the No-Information treatment agrees or disagrees with a statement, subjects in the information treatments tend to gravitate towards the majority’s judgment, further strengthening it.
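The comparison in the two paragraphs above amounts to averaging each treatment’s agreement rates separately over statements that the No-Information majority accepts or rejects. A minimal sketch, with hypothetical inputs given as per-statement agreement shares; leaving out statements with exactly 50% baseline agreement is our assumption.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def agreement_by_baseline_majority(
    baseline: Sequence[float], treatment: Sequence[float]
) -> Tuple[Optional[float], Optional[float]]:
    """Average treatment agreement over statements the No-Information majority
    agrees with (baseline > 0.5) and over those it disagrees with (baseline < 0.5).

    baseline[i] and treatment[i] are shares of subjects agreeing with statement i;
    statements with exactly 0.5 baseline agreement are left out of both groups.
    """
    agreed = [t for b, t in zip(baseline, treatment) if b > 0.5]
    disagreed = [t for b, t in zip(baseline, treatment) if b < 0.5]
    return (mean(agreed) if agreed else None, mean(disagreed) if disagreed else None)


# Hypothetical agreement shares for four statements.
agree_avg, disagree_avg = agreement_by_baseline_majority(
    [0.8, 0.6, 0.3, 0.2], [0.9, 0.7, 0.2, 0.1]
)
```

A consensus-building effect shows up as the first average lying above, and the second below, the corresponding No-Information averages.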

Examining subjects’ responses to information about the actual distribution of answers, we find that subjects rarely change answers that are consistent with the majority, especially in the Low-Information treatment. Revisions are concentrated among answers that are inconsistent with most others’ answers (see Figure 9). Given that subjects also tend to expect themselves to be in the majority, many of them revise their answers in the second stage upon learning that they are in the minority, consistent with the Majority Rule. Conditional on having an initial answer against the majority, revision rates on the easiest and the most difficult questions far exceed those on moderately difficult questions, indicating that the strength of the majority also matters.

Figure 9: Revision Rates and Majority Answer

Notes: Subject number is ordered by subjects’ first-stage correct rates within each treatment, from low to high; question number is ordered by question difficulty in the No-Information treatment, from difficult to easy.

4.3 Influence of Other Respondents’ Confidence Levels

For subjects in the Moderate-Information treatment, a straightforward way to use the information provided about average confidence levels is to compare the average confidence of those who agree and those who disagree with the statement, and to favor the answer with the higher average confidence, i.e., the Maximum Confidence rule.
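The Majority Rule and the Maximum Confidence rule can be sketched as two functions that map the first-stage answers (and confidences) for one statement to a recommended answer. This is a minimal illustration under stated assumptions: the function names are hypothetical, and returning None on an exact tie, or the unanimous answer when no one disagrees, are our choices rather than details given in the paper.

```python
from statistics import mean
from typing import Optional, Sequence


def majority_answer(answers: Sequence[bool]) -> Optional[bool]:
    """Majority Rule: the answer given by more respondents; None on an exact tie."""
    n_true = sum(answers)
    n_false = len(answers) - n_true
    if n_true == n_false:
        return None
    return n_true > n_false


def max_confidence_answer(
    answers: Sequence[bool], confidences: Sequence[float]
) -> Optional[bool]:
    """Maximum Confidence rule: the answer whose supporters report the higher
    average confidence; None on an exact tie or with no respondents."""
    if not answers:
        return None
    conf_true = [c for a, c in zip(answers, confidences) if a]
    conf_false = [c for a, c in zip(answers, confidences) if not a]
    if not conf_false:  # unanimous "True": no opposing group to compare with
        return True
    if not conf_true:   # unanimous "False"
        return False
    avg_true, avg_false = mean(conf_true), mean(conf_false)
    if avg_true == avg_false:
        return None
    return avg_true > avg_false


# Hypothetical statement: a less confident majority says True,
# a more confident minority says False.
answers = [True, True, True, False, False]
confidences = [0.6, 0.7, 0.65, 0.95, 0.9]
print(majority_answer(answers))                     # True
print(max_confidence_answer(answers, confidences))  # False
```

The toy example shows the case in which the two rules disagree; when they agree, both point subjects the same way.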

As shown in Figure 10, while having an answer consistent with the majority is clearly influential, the revision rate also depends on average confidence levels. Subjects seldom revise answers that are simultaneously in line with both the majority answer and the maximum-confidence answer, and are most likely to revise answers that fall in the minority and minimum-confidence category.8

Figure 10: Revision Rate in the Moderate-Information and Full-Information Treatments

Notes: Subject number is ordered by subjects’ first-stage correct rates within each treatment, from low to high; question number is ordered by question difficulty in the No-Information treatment, from difficult to easy.

The patterns in the Full-Information treatment differ slightly, in that subjects tend to put relatively more weight on the majority answer. Compared with the Moderate-Information treatment, the revision rates of subjects whose answers are in the minority but have maximum confidence are much higher than those of subjects whose answers are in the majority but have minimum confidence. One possible explanation is that some subjects in the Full-Information treatment have difficulty processing all the information provided, leading them to focus on the information about the majority answer.

From the above results on the determinants of answer revision in the second stage, we can summarize:

Result 7: Although subjects’ own confidence level matters, subjects are significantly influenced by the answers and confidences of others in their answer revision choices.

8 In fact, 60% of the majority answers are associated with higher average confidence.


4.4 Regression Analysis, Revision Decisions

Having established some basic findings on the conditions under which individuals change their judgments in response to differing degrees of persuasive information, we now test whether the raw-data results are robust to conditioning on control variables. Specifically, we regress a binary indicator of whether a subject revises his answer to a question in the second stage on the information provided and the treatment variables.
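The tables below report logit marginal effects rather than raw coefficients. The paper does not state which marginal-effect definition is used, so the following sketch assumes the common average marginal effect for a continuous regressor: the average over observations of β_j·p·(1 − p), where p is the fitted probability. For a dummy regressor such as MI, a discrete change in the predicted probability would typically be used instead. All names and numbers are hypothetical.

```python
import math
from typing import Sequence


def logistic(z: float) -> float:
    """Logistic CDF, i.e. P(y = 1 | x) in a logit model with index z = x·beta.

    Written in two branches so that math.exp never overflows.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def average_marginal_effect(
    beta: Sequence[float], X: Sequence[Sequence[float]], j: int
) -> float:
    """Average over observations of dP(y=1|x)/dx_j = beta_j * p * (1 - p)
    for a continuous regressor j in a logit model."""
    total = 0.0
    for x in X:
        p = logistic(sum(b * xi for b, xi in zip(beta, x)))
        total += beta[j] * p * (1.0 - p)
    return total / len(X)


# Hypothetical coefficients: intercept 0.0 and slope 2.0 on one regressor.
print(average_marginal_effect([0.0, 2.0], [[1.0, 0.0]], 1))  # 0.5
```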

The regression results reported in Table 4 show that subjects’ own confidence levels (Cfdc) and the actual percentage of participants giving the same answer as their own (RealSame) have consistently negative effects on the revision decision, significant at the 1% level, in the information treatments. If we measure question easiness (Easiness) by the corresponding correct rate in the No-Information treatment, and subjects’ knowledge (Knowledge) by their initial correct rate in the first stage, we find that knowledgeable subjects are less affected by the information provided; however, we do not find strong evidence that the tendency to revise answers differs with question difficulty. Revision decisions in the Moderate-Information and Full-Information treatments are less affected by the actual distribution of answers than in the Low-Information treatment, regardless of whether we control for individual and question fixed effects. These results are largely consistent with the basic findings described above.

Table 4: Logit Regression of Revision Decisions
Dependent variable: I(Revision); marginal effects

                          (1)         (2)         (3)         (4)         (5)         (6)
Cfdc                      -0.319***   -0.324***   -0.485***   -0.333***   -0.333***   -0.448***
                          (0.043)     (0.043)     (0.088)     (0.046)     (0.046)     (0.103)
EstiSame                  0.043       0.039       -0.130*     0.075       0.075       0.041
                          (0.041)     (0.041)     (0.076)     (0.046)     (0.046)     (0.093)
EstiCfdc                  0.115*      0.102       0.406***    -0.050      -0.050      0.017
                          (0.063)     (0.064)     (0.123)     (0.070)     (0.070)     (0.147)
RealSame                  -0.759***   -0.753***   -1.176***   -0.719***   -0.719***   -1.146***
                          (0.022)     (0.023)     (0.078)     (0.024)     (0.024)     (0.081)
MI                                    0.066***    -0.228***               -0.041      -0.420***
                                      (0.012)     (0.081)                 (0.061)     (0.115)
FI                                    0.050***    -0.138                  -0.067      -0.278**
                                      (0.011)     (0.084)                 (0.052)     (0.113)
Cfdc×MI                                           0.192*                              0.125
                                                  (0.115)                             (0.129)
Cfdc×FI                                           0.183                               0.133
                                                  (0.112)                             (0.125)
EstiSame×MI                                       0.229**                             0.048
                                                  (0.102)                             (0.120)
EstiSame×FI                                       0.217**                             0.036
                                                  (0.103)                             (0.119)
EstiCfdc×MI                                       -0.393**                            -0.031
                                                  (0.160)                             (0.186)
EstiCfdc×FI                                       -0.364**                            -0.099
                                                  (0.164)                             (0.183)
RealSame×MI                                       0.657***                            0.649***
                                                  (0.093)                             (0.094)
RealSame×FI                                       0.417***                            0.411***
                                                  (0.099)                             (0.097)
Knowledge                 -0.228***   -0.246***   -0.275***
                          (0.054)     (0.053)     (0.055)
Easiness                  0.045       0.043       0.041
                          (0.029)     (0.029)     (0.028)
Individual fixed effects  N           N           N           Y           Y           Y
Question fixed effects    N           N           N           Y           Y           Y
Observations              4,800       4,800       4,800       4,700       4,700       4,700
Pseudo R2                 0.259       0.267       0.286       0.355       0.355       0.372

Notes: Logit estimation of the effect of information provision on the probability of revision. Dependent variable is whether a subject revised his answer to a particular question in the second stage. Cfdc denotes subjects’ self-reported confidence in their own answer, EstiSame denotes their estimate of others’ giving the same answer, EstiCfdc denotes their estimate of the average confidence in own answer of all participants, RealSame denotes the proportion of subjects having the same choice as their own, MI and FI are dummy variables for the Moderate-Information treatment and the Full-Information treatment respectively, Knowledge and Easiness refer to the aggregate correct rate of each subject in the first stage and the aggregate correct rate of each question in the No-Information treatment respectively. Coefficients displayed are marginal effects. Robust standard errors are displayed in parentheses. *p<0.1, **p<0.05, ***p<0.01.

The regressions also show that subjects’ own beliefs matter, which influences how subjects use the Surprising Popularity rule. Recall that the Surprising Popularity algorithm compares the average predicted percentage of “True” answers with the actual percentage of “True” answers. However, subjects have also formed their own predictions in the first stage. They might regard “False” (“True”) as surprisingly popular if the actual percentage of “True” (“False”) answers is lower than their own prediction. In other words, they may put more weight on their own judgment than on the group’s consensus, and be inclined to revise their answers if the opposite answer is subjectively surprising to them.

Such a Surprising Popularity rule (which we refer to as the ‘egocentric SP rule’), based on the subject’s private prediction and the actual distribution of answers, can be applied in any of the Low-Information, Moderate-Information, and Full-Information treatments. Since the standard SP rule produces the same answer as the Majority Rule in most cases, and we find little evidence of its use in the Full-Information treatment, in this section we examine the egocentric SP rule as a variant of the Surprising Popularity rule.
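The difference between the standard and the egocentric SP rule can be sketched as follows. One assumption is ours and should be read as such: subjects predict the share of others giving the *same* answer (EstiSame), so we convert each prediction into a predicted share of “True” before averaging, as the standard rule requires. The function names are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def predicted_true_share(own_answer: bool, esti_same: float) -> float:
    """Convert a predicted share giving the *same* answer into a predicted share of "True"."""
    return esti_same if own_answer else 1.0 - esti_same


def standard_sp_answer(
    answers: Sequence[bool], esti_same: Sequence[float]
) -> Optional[bool]:
    """Standard SP rule: "True" if its actual share exceeds the average predicted
    share of "True", "False" if it falls short, None on an exact tie."""
    actual_true = sum(answers) / len(answers)
    predicted_true = mean(predicted_true_share(a, e) for a, e in zip(answers, esti_same))
    if actual_true == predicted_true:
        return None
    return actual_true > predicted_true


def egocentric_sp_answer(own_answer: bool, esti_same: float, real_same: float) -> bool:
    """Egocentric SP rule: switch if fewer people share the subject's answer than the
    subject predicted (RealSame < EstiSame); otherwise keep the original answer."""
    return (not own_answer) if real_same < esti_same else own_answer


# Hypothetical statement: 60% answer True, but respondents predicted 66% on average.
answers = [True, True, True, False, False]
esti_same = [0.8, 0.9, 0.7, 0.6, 0.5]
print(standard_sp_answer(answers, esti_same))  # False
print(egocentric_sp_answer(True, 0.8, 0.6))    # False (switch: fewer agree than expected)
```

The standard rule pools everyone’s predictions, whereas the egocentric rule compares the realized share only with the subject’s own prediction, so two subjects facing the same answer distribution can receive different recommendations.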

To test whether the three heuristic rules are used, and which one is used most commonly in subjects’ revision decisions, we define three dummy variables at the individual level: IsMajority, GroupCfdc_IsLower, and IsSurprising. Specifically, IsMajority = 1 if the subject’s first-stage answer turns out to be in the majority, i.e., RealSame ≥ 50%, and IsMajority = 0 otherwise; GroupCfdc_IsLower = 1 if the average confidence level of his SameGroup (participants giving the same answer) is strictly lower than that of his OppoGroup (participants giving the opposite answer), i.e., SameGroup_Cfdc < OppoGroup_Cfdc, and GroupCfdc_IsLower = 0 otherwise; IsSurprising = 1 if the actual percentage of participants giving the same answer as his is strictly lower than the subject’s own prediction (so that the opposite answer turns out to be unexpectedly popular), i.e., RealSame < EstiSame, and IsSurprising = 0 otherwise.
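The three indicator definitions translate directly into code. In this minimal sketch, shares are assumed to be stored as fractions in [0, 1], and the container class is a hypothetical convenience. The variable names follow the paper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FirstStageInfo:
    """Hypothetical container for one subject-question observation (shares in [0, 1])."""
    real_same: float        # RealSame: actual share giving the same answer
    esti_same: float        # EstiSame: subject's own predicted share giving the same answer
    same_group_cfdc: float  # SameGroup_Cfdc: average confidence of those with the same answer
    oppo_group_cfdc: float  # OppoGroup_Cfdc: average confidence of those with the opposite answer


def heuristic_indicators(info: FirstStageInfo) -> Dict[str, int]:
    """Dummy variables for the Majority, Maximum Confidence and egocentric SP rules."""
    return {
        "IsMajority": int(info.real_same >= 0.5),
        "GroupCfdc_IsLower": int(info.same_group_cfdc < info.oppo_group_cfdc),
        "IsSurprising": int(info.real_same < info.esti_same),
    }


print(heuristic_indicators(FirstStageInfo(0.4, 0.7, 0.72, 0.85)))
# {'IsMajority': 0, 'GroupCfdc_IsLower': 1, 'IsSurprising': 1}
```

Note the asymmetric inequalities: a 50/50 split counts as being in the majority, while both confidence and surprise comparisons use strict inequalities.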

Table 5 reports the Logit regressions for the Low-Information treatment in detail. The coefficients on the confidence variable are again always significant at the 1% level. Adding the actual distribution of answers (Column 2) or the majority indicator (Column 3) absorbs more variation and substantially improves the Pseudo R2 (from 0.142 to 0.577 and 0.569, respectively). However, further adding the IsSurprising variable and its interaction term yields statistically insignificant coefficients and barely changes the Pseudo R2 (Columns 4-5); thus, there is little evidence for the use of the egocentric SP rule.

Table 5: Logit Regression of Revision Decisions, Low-Information Treatment
Dependent variable: I(Revision); marginal effects

                          (1)         (2)         (3)         (4)         (5)
Cfdc                      -0.279***   -0.284***   -0.266***   -0.264***   -0.264***
                          (0.092)     (0.073)     (0.072)     (0.072)     (0.072)
EstiSame                  -0.180*     -0.026      -0.052      -0.085      -0.083
                          (0.094)     (0.070)     (0.068)     (0.073)     (0.076)
EstiCfdc                  -0.019      0.040       0.036       0.047       0.046
                          (0.138)     (0.114)     (0.113)     (0.110)     (0.111)
RealSame                              -0.932***
                                      (0.111)
IsMajority                                        -0.321***   -0.279***   -0.286***
                                                  (0.016)     (0.033)     (0.087)
IsSurprising                                                  0.057       0.050
                                                              (0.042)     (0.083)
IsMajority×IsSurprising                                                   0.009
                                                                          (0.090)
Individual fixed effects  Y           Y           Y           Y           Y
Question fixed effects    Y           Y           Y           Y           Y
Observations              1,488       1,488       1,488       1,488       1,488
Pseudo R2                 0.142       0.577       0.569       0.571       0.571

Notes: Logit estimation of the effect of information provision on the probability of revision. Dependent variable is whether a subject revised his answer to a certain question in the second stage. Cfdc denotes subjects’ self-reported confidence in their own answer, EstiSame denotes their estimate of others’ giving the same answer, EstiCfdc denotes their estimate of the average confidence in own answer of all participants, RealSame denotes the proportion of subjects having the same choice as their own, IsMajority and IsSurprising are indicator variables. Robust standard errors are displayed in parentheses. *p<0.1, **p<0.05, ***p<0.01.

In the Moderate-Information treatment, we find that the average confidence of the participants who share the subject’s answer (SameGroup_Cfdc) and the average confidence of those who do not (OppoGroup_Cfdc) are also influential (Column 2 of Table 6). Intuitively, they act in opposite directions with similar magnitudes. A one-standard-deviation (0.099) increase in the average confidence of those giving the same answer is associated with a 0.125 drop in the probability of revision, and a one-standard-deviation (0.098) increase in the average confidence of those giving the opposite answer raises the probability of revision by 0.123. Again, the variables IsMajority and GroupCfdc_IsLower can explain more of the observed variation in revisions, and none of the coefficients on variables related to IsSurprising is significant (Columns 3-6 of Table 6).
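The one-standard-deviation effects appear to follow from multiplying the Column 2 marginal effects in Table 6 by the reported standard deviations. This is our reconstruction of the calculation: a linear approximation that treats the marginal effect as constant over the one-standard-deviation change.

```latex
\begin{aligned}
\Delta \Pr(\text{revise}) &\approx \widehat{ME}_{\text{SameGroup\_Cfdc}} \times \sigma_{\text{SameGroup\_Cfdc}} = -1.260 \times 0.099 \approx -0.125,\\
\Delta \Pr(\text{revise}) &\approx \widehat{ME}_{\text{OppoGroup\_Cfdc}} \times \sigma_{\text{OppoGroup\_Cfdc}} = 1.253 \times 0.098 \approx 0.123.
\end{aligned}
```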

In addition, as seen in Figure 10, subjects in the Moderate-Information treatment generally treat the majority answer and the maximum-confidence answer as complements. The two heuristics are also well reflected in the marginal effects on revision decisions (Column 5 of Table 6). The odds ratio of revising from the minority answer to the majority answer (or from the minimum-confidence answer to the maximum-confidence answer) is around 12. The estimates of the interaction terms in Column 6 suggest that the marginal effect of having an answer with lower average confidence on the revision rate is significantly larger if the answer is in the majority than if it is in the minority.

Table 6: Logit Regression of Revision Decisions, Moderate-Information Treatment
Dependent variable: I(Revision); marginal effects

                                (1)         (2)         (3)         (4)         (5)         (6)
Cfdc                            -0.479***   -0.301***   -0.445***   -0.444***   -0.348***   -0.361***
                                (0.107)     (0.093)     (0.097)     (0.097)     (0.091)     (0.091)
EstiSame                        -0.069      0.022       0.054       0.043       -0.069      -0.054
                                (0.104)     (0.084)     (0.094)     (0.099)     (0.091)     (0.094)
EstiCfdc                        0.138       -0.012      0.030       0.030       0.071       0.062
                                (0.151)     (0.134)     (0.138)     (0.138)     (0.129)     (0.129)
RealSame                                    -0.656***
                                            (0.054)
SameGroup_Cfdc                              -1.260***
                                            (0.221)
OppoGroup_Cfdc                              1.253***
                                            (0.231)
IsMajority                                              -0.251***   -0.246***   -0.223***   -1.509***
                                                        (0.016)     (0.022)     (0.022)     (0.112)
IsSurprising                                                        0.009       0.030       -0.036
                                                                    (0.030)     (0.028)     (0.075)
GroupCfdc_IsLower                                                               0.229***    -1.027***
                                                                                (0.018)     (0.111)
IsMajority×IsSurprising                                                                     0.054
                                                                                            (0.066)
IsMajority×GroupCfdc_IsLower                                                                2.466***
                                                                                            (0.185)
IsSurprising×GroupCfdc_IsLower                                                              0.025
                                                                                            (0.061)
Individual fixed effects        Y           Y           Y           Y           Y           Y
Question fixed effects          Y           Y           Y           Y           Y           Y
Observations                    1,600       1,568       1,600       1,600       1,568       1,568
Pseudo R2                       0.139       0.405       0.273       0.273       0.392       0.394

Notes: Logit estimation of the effect of information provision on the probability of revision. Dependent variable is whether a subject revised his answer to a certain question in the second stage. Cfdc denotes subjects’ self-reported confidence in their own answer, EstiSame denotes their estimate of others’ giving the same answer, EstiCfdc denotes their estimate of the average confidence in own answer of all participants, RealSame denotes the proportion of subjects having the same choice as their own, SameGroup_Cfdc denotes the average confidence of those giving the same answer, OppoGroup_Cfdc denotes the average confidence of those giving the opposite answer, IsMajority, IsSurprising and GroupCfdc_IsLower are indicator variables. Coefficients displayed are marginal effects. Robust standard errors are displayed in parentheses. *p<0.1, **p<0.05, ***p<0.01.

In the Full-Information treatment, information about the answer distribution and the average confidence of those holding the opposite opinion (OppoGroup_Cfdc) still has a significant impact on revision decisions, while information about other respondents’ estimates is not significantly influential (Column 2 of Table 7). One possibility is that information complexity and limited attention lead subjects to use simpler heuristics, thus reducing the amount of information they consider.

Finally, as Column 5 of Table 7 shows, subjects mainly rely on their own confidence levels and the Majority Rule, followed by the Maximum Confidence rule, while the impact of the egocentric SP rule is economically insignificant. The odds of revising from the minority answer to the majority answer are 40 times those of revising from the majority answer to the minority answer, while the odds of revising from the minimum-confidence answer to the maximum-confidence answer (from the surprisingly unpopular answer to the surprisingly popular answer) are only 14.508 (3.212) times those of revising from the maximum-confidence answer to the minimum-confidence answer (from the surprisingly popular answer to the surprisingly unpopular answer). Once we add the interaction terms to the model, the coefficients of variables related to IsSurprising are no longer statistically significant. Therefore, subjects in the Full-Information treatment seldom employ the egocentric SP rule. When they happen to be in the majority, they are more likely to rely on confidence information (Column 6 of Table 7).


Table 7: Logit Regression of Revision Decisions, Full-Information Treatment
Dependent variable: I(Revision); marginal effects

                                (1)         (2)         (3)         (4)         (5)         (6)
Cfdc                            -0.432***   -0.303***   -0.313***   -0.312***   -0.310***   -0.311***
                                (0.093)     (0.077)     (0.076)     (0.077)     (0.079)     (0.080)
EstiSame                        -0.003      0.087       0.084       -0.012      -0.051      -0.046
                                (0.092)     (0.078)     (0.087)     (0.091)     (0.082)     (0.091)
EstiCfdc                        -0.172      -0.130      -0.162      -0.146      -0.102      -0.103
                                (0.147)     (0.113)     (0.128)     (0.128)     (0.114)     (0.117)
RealSame                                    -0.836***
                                            (0.056)
SameGroup_Cfdc                              1.479
                                            (1.108)
OppoGroup_Cfdc                              2.535**
                                            (1.114)
SameGroup_EstiSame                          -0.708
                                            (1.655)
OppoGroup_EstiSame                          -1.428
                                            (1.632)
SameGroup_EstiCfdc                          -1.745
                                            (1.699)
OppoGroup_EstiCfdc                          -0.083
                                            (1.633)
IsMajority                                              -0.313***   -0.262***   -0.294***   -1.268***
                                                        (0.013)     (0.021)     (0.033)     (0.134)
IsSurprising                                                        0.089***    0.093***    0.116
                                                                    (0.030)     (0.027)     (0.139)
GroupCfdc_IsLower                                                               0.213***    -0.694***
                                                                                (0.031)     (0.109)
IsMajority×IsSurprising                                                                     0.035
                                                                                            (0.102)
IsMajority×GroupCfdc_IsLower                                                                1.905***
                                                                                            (0.172)
IsSurprising×GroupCfdc_IsLower                                                              -0.063
                                                                                            (0.085)
Individual fixed effects        Y           Y           Y           Y           Y           Y
Question fixed effects          Y           Y           Y           Y           Y           Y
Observations                    1,550       1,519       1,550       1,550       1,519       1,519
Pseudo R2                       0.147       0.469       0.405       0.410       0.477       0.478

Notes: Logit estimation of the effect of information provision on the probability of revision. Dependent variable is whether a subject revised his answer to a certain question in the second stage. Cfdc denotes subjects’ self-reported confidence in their own answer, EstiSame denotes their estimate of others’ giving the same answer, EstiCfdc denotes their estimate of the average confidence in own answer of all participants, RealSame denotes the proportion of subjects having the same choice as their own, SameGroup_Cfdc denotes the average confidence of those giving the same answer, OppoGroup_Cfdc denotes the average confidence of those giving the opposite answer, SameGroup_EstiSame denotes the average estimate of others’ giving the same answer from those giving the same answer, OppoGroup_EstiSame denotes the average estimate of others’ giving the same answer from those giving the opposite answer, SameGroup_EstiCfdc denotes the average estimate of the mean confidence in own answer from those giving the same answer, OppoGroup_EstiCfdc denotes the average estimate of the mean confidence in own answer from those giving the opposite answer, IsMajority, IsSurprising and GroupCfdc_IsLower are indicator variables. Coefficients displayed are marginal effects. Robust standard errors are displayed in parentheses. *p<0.1, **p<0.05, ***p<0.01.

Our individual-level analysis finds that participants commonly adopt the Majority Rule when deciding whether to change their previous answers. When information about other respondents’ confidence levels is provided as well, individuals take both the Majority Rule and the Maximum Confidence rule into consideration. However, the additional information about other respondents’ estimates and beliefs may serve as a distraction, leading them to rely relatively more heavily on the Majority Rule.

5. Initial Performance and Subsequent Revision

Finally, we evaluate the more general impact of social information on revision decisions and the subsequent changes in performance. Measures such as Type I and Type II errors are often used to describe possible errors in a statistical decision process. A Type I error refers to rejecting a true null hypothesis, while a Type II error refers to failing to reject a false null hypothesis. In our experimental context, a Type I error refers to revising a correct answer, and a Type II error corresponds to not revising an incorrect answer.
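Under these definitions, the Type I error rate is the share of correct first-stage answers that get revised, and the Type II error rate is the share of incorrect first-stage answers that are left unrevised. A minimal sketch with a hypothetical function name follows. It pools one subject’s answers. Table 8 reports averages across subjects, which one could obtain by applying the function per subject and averaging; that aggregation step is our assumption.

```python
from typing import Sequence, Tuple


def type_i_ii_error_rates(
    first_correct: Sequence[bool], revised: Sequence[bool]
) -> Tuple[float, float]:
    """Type I rate: share of correct first-stage answers that were revised.
    Type II rate: share of incorrect first-stage answers that were not revised.
    A rate whose denominator is empty is returned as NaN."""
    right = [r for c, r in zip(first_correct, revised) if c]
    wrong = [r for c, r in zip(first_correct, revised) if not c]
    type_i = sum(right) / len(right) if right else float("nan")
    type_ii = sum(not r for r in wrong) / len(wrong) if wrong else float("nan")
    return type_i, type_ii


# Hypothetical answers of one subject: four right and two wrong in the first stage.
print(type_i_ii_error_rates([True, True, True, True, False, False],
                            [False, True, False, False, True, False]))  # (0.25, 0.5)
```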

Table 8 displays the two types of errors in each information treatment. Although Type I errors are on average lowest in the Low-Information treatment, this is accompanied by the highest level of Type II errors. Comparatively, subjects in the Moderate-Information treatment make Type II errors least frequently, with an intermediate level of Type I errors.

Table 8: Average Type I and Type II Errors

      LI                    MI                    FI
      Revise    Not Revise  Revise    Not Revise  Revise    Not Revise
R     12.09%    87.91%      12.28%    87.72%      13.92%    86.08%
W     13.76%    86.24%      25.44%    74.56%      22.28%    77.72%

Notes: R (W) denotes that the answer submitted in the first stage is right (wrong). In each treatment, the average Type I error is the R/Revise cell and the average Type II error is the W/Not Revise cell.

To further understand how the heuristic rules considered could potentially improve decisions, we assume that subjects adhere strictly to a specific rule (the Majority Rule, the Maximum Confidence rule, the standard SP rule, or the egocentric SP rule), calculate the errors for each rule accordingly, and compare them with the actual values.

As shown in Figure 11, if subjects strictly followed the Majority Rule, they would on average revise about 30% of their correct answers and decline to revise about 60% of their incorrect answers. The errors of the standard SP rule are close to those of the Majority Rule, since the two rules suggest the same answer in most cases. The egocentric SP rule has a lower Type II error at the expense of doubling the chance of rejecting a correct answer, which is due to subjects’ overestimation of the percentage of others agreeing with their own answer. By contrast, the Maximum Confidence rule is relatively reliable, producing only a moderate rate of mistaken decisions.

Generally speaking, subjects do not fully exploit the wisdom of crowds via the heuristics examined, since the actual Type II error they make is substantially higher than that of any of the specific heuristics discussed here. In our setting, Type I errors are generally rarer than Type II errors; the Maximum Confidence rule performs best among the heuristics considered because it maintains low Type I errors while also minimizing Type II errors.

Figure 11: Type I and Type II Errors of Different Heuristics

Notes: Actual denotes the actual value. MR, EgoSP, MC and SP denote the errors for the Majority Rule, the egocentric SP rule, the Maximum Confidence rule and the standard SP rule, respectively.

Finally, we estimate multinomial logit regressions in which subjects are classified by the state of their answers to a given question across both stages. Table 9 reports the relative risk ratios for subjects giving the wrong answer in the first stage but the right answer in the second stage (W&NewR), subjects giving the right answer in the first stage but the wrong answer in the second stage (R&NewW), and subjects giving the right answer in both stages (R&NewR), with those giving the wrong answer in both stages (W&NewW) as the comparison group.
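The outcome coding and the reading of a relative risk ratio can be sketched as follows. The estimation itself is not reproduced, and the function names are hypothetical. In a multinomial logit, exp(coefficient) is the factor by which a one-unit increase in a regressor multiplies P(outcome)/P(W&NewW). If variables such as Easiness or SameGroup_Cfdc are measured as shares between 0 and 1, as their definitions suggest, a one-unit increase spans their entire range. That would help explain the very large and very small ratios in Table 9.

```python
import math


def two_stage_state(first_correct: bool, second_correct: bool) -> str:
    """Outcome label used in Table 9, e.g. "W&NewR" = wrong in stage 1, right in stage 2."""
    return ("R" if first_correct else "W") + "&" + ("NewR" if second_correct else "NewW")


def relative_risk_ratio(coefficient: float) -> float:
    """exp(coefficient): multiplicative change in P(outcome) / P(base outcome)
    for a one-unit increase in the regressor."""
    return math.exp(coefficient)


print(two_stage_state(False, True))  # W&NewR
print(relative_risk_ratio(0.0))      # 1.0 (no change in relative risk)
```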

The results show that the relative risk of always giving the right answer over always giving the wrong answer is significantly higher for knowledgeable subjects and easy questions. In addition, for easy questions, correcting an initially incorrect answer in the second stage (W&NewR) is significantly more likely than leaving it unrevised (W&NewW). Subjects whose first-stage answer is shared by a larger proportion of respondents are more likely to have poor overall performance.

In the Moderate-Information and Full-Information treatments, the relative risk of always giving the right answer over always giving the wrong answer is much higher for subjects whose answer is endorsed by participants with higher average confidence. We also find a pronounced increase in the relative risk of correcting a wrong answer (W&NewR) if that answer was refuted by participants with high average confidence in the first stage. In the Full-Information treatment, mistakenly revising a right answer (R&NewW) is more likely to occur when the answer is agreed upon by the group with higher average confidence, or refuted by the group with a higher average estimate of the mean confidence of all participants.


Table 9: Multinomial Logit Regression of Two-stage Performance

Panel A. Models for the Low-Information treatment

LI Relative Risk Ratio

(1a) (2a) (3a)

W&NewR R&NewW R&NewR

Cfdc 0.205 0.007*** 17.062***

(0.284) (0.010) (11.597)

EstiSame 0.283 0.043*** 0.193**

(0.353) (0.046) (0.130)

EstiCfdc 0.766 62.481** 0.050***

(1.546) (103.659) (0.048)

RealSame 2.42e-06*** 1.55e-05*** 0.158**

(2.90e-06) (1.29e-05) (0.114)

Knowledge 88.644** 0.259 756.683***

(171.836) (0.401) (714.509)

Easiness 5.060e+05*** 0.379 8,412.899***

(6.163e+05) (0.395) (4,266.940)

Constant 0.084** 838.613*** 0.004***

(0.100) (1,049.001) (0.003)

Panel B. Models for the Moderate-Information treatment

MI Relative Risk Ratio

(1b) (2b) (3b)

W&NewR R&NewW R&NewR

Cfdc 0.057*** 0.286 3.297*

(0.060) (0.345) (2.171)

EstiSame 0.471 1.420 0.104***

(0.447) (1.560) (0.074)

EstiCfdc 2.876 0.100 1.114

(4.202) (0.159) (1.134)

RealSame 0.007*** 0.0005*** 0.231***

(0.004) (0.0003) (0.106)

SameGroup_Cfdc 3.30e-05*** 0.009** 18,476.853***

(5.20e-05) (0.016) (19,407.700)

OppoGroup_Cfdc 4.957e+05*** 62.187** 1.89e-05***

(7.258e+05) (111.054) (2.13e-05)

Knowledge 0.135* 0.642 78.815***

(0.161) (0.881) (70.493)

Easiness 69.704*** 0.537 756.798***

(41.028) (0.423) (275.620)

Constant 0.962 199.268*** 0.039***

(1.156) (299.243) (0.034)

Panel C. Models for the Full-Information treatment

FI Relative Risk Ratio

(1c) (2c) (3c)

Page 34: Stuck in the Wisdom of Crowds: Information, Knowledge, and ...

34

W&NewR R&NewW R&NewR

Cfdc 0.276 0.003*** 2.058

(0.289) (0.004) (1.222)

EstiSame 0.714 1.175 0.234**

(0.766) (1.260) (0.139)

EstiCfdc 0.867 13.399 1.614

(1.411) (21.642) (1.455)

RealSame 0.001*** 0.0003*** 0.449

(0.001) (0.0002) (0.232)

SameGroup_Cfdc 0.024 1.327e+06*** 2.882e+06***

(0.068) (4.126e+06) (5.691e+06)

OppoGroup_Cfdc 8,284.578*** 0.0001** 1.46e-06***

(23,507.025) (0.001) (2.27e-06)

SameGroup_EstiSame 0.008 1.323 0.052

(0.024) (4.326) (0.112)

OppoGroup_EstiSame 1.65e-06*** 0.001* 0.129

(5.20e-06) (0.003) (0.204)

SameGroup_EstiCfdc 0.027 6.04e-10*** 0.001**

(0.130) (3.53e-09) (0.003)

OppoGroup_EstiCfdc 3.65e+07*** 7.24e+08*** 1.06e+05***

(1.99e+08) (4.09e+09) (2.97e+05)

Knowledge 0.384 4.620 173.623***

(0.527) (6.969) (147.432)

Easiness 620.973*** 0.179* 678.707***

(421.727) (0.161) (252.711)

Constant 0.434 233.316** 0.003***

(0.750) (499.690) (0.003)

Notes: Multinomial logit estimation of the effect of information provision on two-stage performance. The dependent variable classifies whether a subject answered a given question correctly in the first stage and in the second stage. W&NewR denotes that the subject gives the wrong answer in the first stage but the right answer in the second stage; R&NewW denotes that the subject gives the right answer in the first stage but the wrong answer in the second stage; R&NewR denotes that the subject gives the right answer in both stages (without revision). The comparison group consists of those who give the wrong answer in both stages. Cfdc denotes the subject’s self-reported confidence in their own answer; EstiSame denotes their estimate of the proportion of participants giving the same answer; EstiCfdc denotes their estimate of the average confidence of all participants in their own answers; RealSame denotes the actual proportion of subjects giving the same answer as the subject; SameGroup_Cfdc denotes the average confidence of those giving the same answer; OppoGroup_Cfdc denotes the average confidence of those giving the opposite answer; SameGroup_EstiSame and OppoGroup_EstiSame denote the average estimates of the proportion giving the same answer among those giving the same answer and the opposite answer, respectively; SameGroup_EstiCfdc and OppoGroup_EstiCfdc denote the average estimates of the mean confidence in own answers among those giving the same answer and the opposite answer, respectively. Knowledge refers to each subject’s aggregate correct rate in the first stage, and Easiness refers to each question’s aggregate correct rate in the No-Information treatment. Coefficients displayed are relative risk ratios. Robust standard errors are in parentheses. *p<0.1, **p<0.05, ***p<0.01.
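The outcome coding and the relative risk ratios reported above can be illustrated with a small sketch. The label W&NewW for the comparison group and all function names are our own, and any coefficients passed in are made up rather than estimates from this paper. In a multinomial logit, a reported relative risk ratio equals exp(β): the multiplicative change in P(category)/P(comparison group) for a one-unit increase in a covariate. Because Knowledge and Easiness are correct rates between 0 and 1, a one-unit change spans their entire range, which may help explain why some of the reported ratios are very large.

```python
import math
from typing import Dict, List

BASE = "W&NewW"  # hypothetical label for the comparison group (wrong in both stages)


def classify(first_correct: bool, second_correct: bool) -> str:
    """Map first- and second-stage correctness on one question to an outcome category."""
    if first_correct and second_correct:
        return "R&NewR"  # right in both stages (no revision)
    if first_correct:
        return "R&NewW"  # right first, revised to wrong
    if second_correct:
        return "W&NewR"  # wrong first, revised to right
    return BASE          # wrong in both stages


def category_probs(x: List[float], betas: Dict[str, List[float]]) -> Dict[str, float]:
    """Multinomial logit probabilities with BASE as the reference category.

    x includes a leading 1.0 for the constant; betas maps each non-base
    category to its coefficient vector, so ln(P_k / P_BASE) = x . beta_k.
    """
    scores = {k: sum(xi * bi for xi, bi in zip(x, b)) for k, b in betas.items()}
    denom = 1.0 + sum(math.exp(s) for s in scores.values())
    probs = {k: math.exp(s) / denom for k, s in scores.items()}
    probs[BASE] = 1.0 / denom
    return probs


def relative_risk_ratio(beta: float) -> float:
    """Factor by which P_k / P_BASE is multiplied when the covariate rises by one unit."""
    return math.exp(beta)
```

Comparing `category_probs` at two covariate vectors that differ by one unit in a single covariate reproduces `relative_risk_ratio` of that covariate's coefficient for every category, which is exactly what the table entries report.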


6. Conclusion and Discussion

A group can exhibit wisdom and intelligence; however, the ‘wisdom of crowds’ can also lead individual decision-makers astray. In particular, collective decisions can become more extreme and ultimately less correct than choices made by individuals alone. The key to understanding the validity of the wisdom of crowds is understanding how individuals process the relevant information about a crowd’s viewpoints in their decision-making. Using a laboratory experiment, our study tests whether more information is necessarily better for collective knowledge, and how the heuristics individuals adopt depend on the amount of information about others’ views that is provided.

Our experiment shows that information creates a more dispersed distribution of correct rates at the question level, as accuracy improves on easy questions but deteriorates on difficult questions. Yet we find that the extent to which “crowd wisdom” helps or harms knowledge is crucially mediated by the amount of information. Subjects in our Moderate-Information treatment perform better on most easy questions and on some of the difficult questions. At the individual level, by contrast, correct response rates become more evenly distributed with information provision than in the baseline treatment, mainly because the information significantly improves the overall performance of those with less initial knowledge.

The raw data patterns are reinforced by our regression results: participants’ answer revision rates are negatively correlated with their self-reported confidence. It is also relatively rare for subjects to revise their answers when their own answer is in the majority. Furthermore, the proportion of respondents giving the same answer as the subject’s original answer is generally negatively associated with the subject’s final performance.

Our experiment also allows us to explore how information affects answer revisions. When Full-Information is provided, participants put more weight on the Majority answer relative to the Maximum Confidence answer than those in the Moderate-Information treatment do. Overall, the actual Type II error of subjects’ revision decisions is much higher than that of any of the heuristic rules discussed, indicating subjects’ tendency to adhere to their original choices. Furthermore, subjects are more likely to give the wrong answer in both stages without revision if their original response is in line with the majority opinion. On the other hand, the relative risk of giving the right answer in stage two, rather than maintaining a wrong answer, is significantly higher when that answer is endorsed with higher average confidence by the others who give it.

The accuracy of the different revision heuristics is determined by the features of the information generated in our setting. In our data, individuals tend to overestimate the proportion of respondents giving the same answer as their own, expecting themselves to be in the majority. This contributes to the popularity of the Majority Rule and the failure of the standard Surprising Popularity rule. The Maximum Confidence rule yields the best performance because those who answer the questions correctly exhibit high confidence relative to their beliefs about the average confidence level of others. Our analysis shows that despite subjects’ overall preference for the Majority Rule, the Maximum Confidence rule performs best of the rules considered in terms of minimizing Type I and Type II errors together. Hence, overall performance could be enhanced if subjects were to adopt this heuristic.
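The three rules compared in the paragraph above can be sketched for a single true/false question as follows. This is a minimal illustration under our own assumptions, not the authors’ implementation: the Majority Rule picks the answer given by more than half of earlier respondents; the Maximum Confidence rule picks the answer whose supporters report the higher mean confidence (the paper’s exact definition may also involve beliefs about others’ confidence, as the paragraph above suggests); and the Surprising Popularity rule follows the common formulation of Prelec, Seung and McCoy (2017), picking the answer whose actual share exceeds its mean predicted share. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class QuestionInfo:
    share_true: float      # fraction of respondents answering "True"
    conf_true: float       # mean confidence of "True" answerers
    conf_false: float      # mean confidence of "False" answerers
    predicted_true: float  # mean predicted fraction answering "True"


def summarize(answers: List[bool], conf: List[float], same: List[float]) -> QuestionInfo:
    """answers: first-stage answers; conf: question (b) reports as fractions;
    same: question (c) reports (predicted share giving the same answer as
    oneself, including oneself) as fractions."""
    t_conf = [c for a, c in zip(answers, conf) if a]
    f_conf = [c for a, c in zip(answers, conf) if not a]
    # A "False" answerer who predicts share s agreeing with them implicitly
    # predicts 1 - s answering "True".
    pred_true = [s if a else 1.0 - s for a, s in zip(answers, same)]
    return QuestionInfo(
        share_true=sum(answers) / len(answers),
        conf_true=mean(t_conf) if t_conf else 0.0,  # empty group: treated as no confidence
        conf_false=mean(f_conf) if f_conf else 0.0,
        predicted_true=mean(pred_true),
    )


def majority_rule(q: QuestionInfo) -> Optional[bool]:
    if q.share_true == 0.5:
        return None  # tie: no recommendation
    return q.share_true > 0.5


def maximum_confidence_rule(q: QuestionInfo) -> Optional[bool]:
    if q.conf_true == q.conf_false:
        return None
    return q.conf_true > q.conf_false


def surprising_popularity_rule(q: QuestionInfo) -> Optional[bool]:
    if q.share_true == q.predicted_true:
        return None
    return q.share_true > q.predicted_true
```

With made-up data in which the majority answers “True” with low confidence and overestimates its own popularity, the Majority Rule recommends “True” while the other two rules recommend “False”, which mirrors the kind of disagreement between heuristics discussed above.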

Our findings have potentially important policy implications for current social issues, as the views of other members of society, and the intensity with which those views are held, are readily available to individuals through social media and online platforms.

Firstly, our study shows that even when abundant information on others’ answers and incentivized confidence levels is available, decision-makers rely heavily on the Majority Rule as a favored heuristic, perhaps due to its simplicity. Providing information about other individuals’ confidence in their answers does help to an extent, as shown by the best performance arising in the Moderate-Information treatment, although the marginal information provided in that treatment (confidence levels) remains under-utilized in subjects’ decisions. The fact that the Majority Rule is heavily used even when other available social information would help fact-finding indicates that the composition of individuals participating in social media or online platforms can be highly influential, since the proportion of proponents and opponents of a statement tends to sway public opinion the most.

In addition, our study points to the potential for information about other individuals’ incentivized confidence reports to improve collective knowledge. While the information on confidence levels in our experiment was not used to its fullest extent, our analysis shows that it could have improved performance further. This suggests a possible role for incentivized confidence elicitation as a mechanism in online discussion platforms and social media. Most current mechanisms for evaluating online statements, such as users’ ability to rate one another’s comments, still follow a majority-rule logic. A first policy step could be to introduce a simple method for online users to weight their endorsements of their own statements according to their true confidence levels, while a follow-up policy might direct users’ attention to these confidence statements, counteracting their tendency to focus on majority opinions within a platform. Such policies may help realize the potential of information about others’ confidence levels to promote better performance outcomes.

Finally, our study indicates that there is a limit to the amount of information about others’ views that decision-makers can effectively process. This is exemplified by the reduction in overall performance in the Full-Information treatment compared with the Moderate-Information treatment. In particular, decision-makers may not know how to use the higher-order information provided in the Full-Information treatment, and may even be confused by it, such that on net it offsets some of the beneficial effects obtained in the Moderate-Information treatment. Thus, one policy implication is that a mechanism seeking to improve collective knowledge need not collect or present higher-order information to users, since such information may not be used in a beneficial manner.

We believe there are promising directions for future research based on the findings of this study. One interesting finding in our setting is that the Surprising Popularity rule does not deliver the hypothesized improvement in performance, mainly due to the lack of Surprisingly Popular answers in our data. Future research can explore the question-based conditions under which this and other heuristics serve as effective decision-making tools. For example, particular phrasing or framing of statements may create better conditions for realizing the potential of the heuristics examined.

Another possible direction for future work is to consider other types of social information in a High-Information treatment besides the variables considered here. Although Full-Information did not help our subjects on the margin, subjects might find other types of information about others’ viewpoints useful in making better decisions. In addition, while our current study examined a range of question difficulty levels and tested for the marginal effects of information provision across that spectrum, we found relatively limited scope for improvement on very difficult questions given the information provided to subjects in this study. An important future direction could be to study more specifically the types of information that can help individuals obtain more accurate answers to highly difficult questions.


References:

Aspinall, W., 2010. A route to more tractable expert advice. Nature, 463(7279), 294-295.

Baars, J. A., and Mass, C. F., 2005. Performance of National Weather Service forecasts compared to operational, consensus, and weighted model output statistics. Weather and Forecasting, 20, 1034-1047.

Baillon, A., Tereick, B., and Wang, T. V., 2020. Follow the money, not the majority: Incentivizing and aggregating expert opinions with Bayesian markets. Working paper.

Bazazi, S., von Zimmermann, J., Bahrami, B., and Richardson, D., 2019. Self-serving incentives impair collective decisions by increasing conformity. PLoS One, 14(11), e0224725.

Becker, G. M., DeGroot, M. H., and Marschak, J., 1964. Measuring utility by a single-response sequential method. Behavioral Science, 9, 226-232.

Brosig, J., Weimann, J., and Yang, C.-L., 2003. The hot versus cold effect in a simple bargaining experiment. Experimental Economics, 6, 75-90.

Brown, R., 2000. Group processes. Malden, MA: Blackwell.

Budescu, D. V., and Chen, E., 2014. Identifying expertise to extract the wisdom of crowds. Management Science, 61(2), 267-280.

Chacoma, A., and Zanette, D. H., 2015. Opinion formation by social influence: From experiments to modelling. PLoS One, 10(10), e0140406.

Chambers, J. R., and Windschitl, P. D., 2004. Biases in social comparative judgments: The role of nonmotivated factors in above-average and comparative-optimism effects. Psychological Bulletin, 130, 813-838.

Chen, G., Lien, J. W., and Zheng, J., 2017. The value of the knowledge of the others. Technical report.

Chen, H., De, P., Hu, Y., and Hwang, B., 2014. Wisdom of crowds: The value of stock opinions transmitted through social media. Review of Financial Studies, 27(5), 1367-1403.

Chen, X., Hong, F., and Zhao, X., 2020. Concentration and variability of forecasts in artificial investment games: An online experiment on WeChat. Experimental Economics, 1-33.

Cooke, R., 1991. Experts in uncertainty: Opinion and subjective probability in science. Oxford University Press, USA.

Fischbacher, U., 2007. z-Tree: Zurich toolbox for ready-made economic experiments. Experimental Economics, 10(2), 171-178.

Gottschlich, J., and Hinz, O., 2014. A decision support system for stock investment recommendations using collective wisdom. Decision Support Systems, 59, 52-62.

Jayles, B., Kim, H., Escobedo, R., Cezera, S., Blanchet, A., Kameda, T., Sire, C., and Theraulaz, G., 2017. How social information can improve estimation accuracy in human groups. Proceedings of the National Academy of Sciences, 114(47), 12620-12625.

Jayles, B., and Kurvers, R. H. J. M., 2020. Exchanging small amounts of opinions outperforms sharing aggregated opinions of large crowds. Working paper.

King, A. J., Cheng, L., Starke, S. D., and Myatt, J. P., 2012. Is the true ‘wisdom of the crowd’ to copy successful individuals? Biology Letters, 8(2), 197-200.

Koriat, A., 2012. When are two heads better than one and why? Science, 336, 360-362.

Kurvers, R. H. J. M., Herzog, S. M., Hertwig, R., Krause, J., Carney, P. A., Bogart, A., Argenziano, G., Zalaudek, I., and Wolf, M., 2016. Boosting medical diagnostics by pooling independent judgments. Proceedings of the National Academy of Sciences, 113(31), 8777-8782.

Lee, M. D., Zhang, S., and Shi, J., 2011. The wisdom of the crowd playing The Price Is Right. Memory and Cognition, 39(5), 914-923.

Lorenz, J., Rauhut, H., Schweitzer, F., and Helbing, D., 2011. How social influence can undermine the wisdom of crowd effect. Proceedings of the National Academy of Sciences, 108(22), 9020-9025.

Marks, G., and Miller, N., 1987. Ten years of research on the false-consensus effect: An empirical and theoretical review. Psychological Bulletin, 102, 72-90.

Mellers, B., Ungar, L., Baron, J., Ramos, J., Gurcay, B., Fincher, K., Scott, S. E., Moore, D., Atanasov, P., Swift, S. A., Murray, T., Stone, E., and Tetlock, P. E., 2014. Psychological strategies for winning a geopolitical forecasting tournament. Psychological Science, 25, 1106-1115.

Morton, R. B., Piovesan, M., and Tyran, J.-R., 2019. The dark side of the vote: Biased voters, social information, and information aggregation through majority voting. Games and Economic Behavior, 113, 461-481.

Niu, X., Li, J., Browne, G. J., Li, D., Cao, Q., Liu, X., Wang, G., and Wang, P., 2019. Transcranial stimulation over right inferior frontal gyrus increases the weight given to private information during sequential decision-making. Social Cognitive and Affective Neuroscience, 14(1), 59-71.

Palley, A. B., and Soll, J. B., 2019. Extracting the wisdom of crowds when information is shared. Management Science, 65, 2291-2309.

Prelec, D., Seung, H. S., and McCoy, J., 2017. A solution to the single-question crowd wisdom problem. Nature, 541, 532-535.

Raykar, V. C., Yu, S., Zhao, L. H., Valadez, G. H., Florin, C., Bogoni, L., and Moy, L., 2010. Learning from crowds. Journal of Machine Learning Research, 11, 1297-1322.

Silva, S., and Correia, L., 2016. An experiment about the impact of social influence on the wisdom of the crowd effect. Working paper.

Tump, A. N., Wolf, M., Krause, J., and Kurvers, R. H. J. M., 2018. Individuals fail to reap the collective benefits of diversity because of over-reliance on personal information. Journal of the Royal Society Interface, 15, 20180155.

Wang, G., Kulkarni, S. R., Poor, H. V., and Osherson, D. N., 2011. Aggregating large sets of probabilistic forecasts by weighted coherent adjustment. Decision Analysis, 8(2), 128-144.

Wolf, M., Krause, J., Carney, P. A., Bogart, A., and Kurvers, R. H. J. M., 2015. Collective intelligence meets medical decision-making: The collective outperforms the best radiologist. PLoS One, 10(8), e0134269.

Wolfers, J., and Zitzewitz, E., 2004. Prediction markets. Journal of Economic Perspectives, 18, 107-126.

Yaniv, I., and Milyavsky, M., 2007. Using advice from multiple sources to revise and improve judgments. Organizational Behavior and Human Decision Processes, 103, 104-120.

Yum, H., Lee, B., and Chae, M., 2012. From the wisdom of crowds to my own judgment in microfinance through online peer-to-peer lending platforms. Electronic Commerce Research and Applications, 11(5), 469-483.


Appendix A. Question-based Comparisons between Treatments (Raw Data Graphs)

Figure A1: Comparisons between the Information Treatments and the No-Information Treatment

Notes: Questions are ordered by their difficulty in the No-Information treatment, from most difficult to easiest. The y-axis shows the second-stage correct rate. Smooth curves are obtained by nonparametric kernel estimation with the Epanechnikov kernel function.
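The notes state only that the curves come from nonparametric kernel estimation with the Epanechnikov function; they do not give the estimator or bandwidth. A minimal sketch, assuming a local-constant (Nadaraya-Watson) smoother of correct rate on question rank with a user-chosen bandwidth; the function names and any bandwidth value are our own illustrative choices, not the authors’ code.

```python
import math
from typing import List, Sequence


def epanechnikov(u: float) -> float:
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| <= 1, and 0 otherwise."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_smooth(x: Sequence[float], y: Sequence[float],
                  grid: Sequence[float], bandwidth: float) -> List[float]:
    """Local-constant (Nadaraya-Watson) estimate of E[y | x] at each grid point.

    x: question rank (e.g. 1..50, most difficult first); y: correct rate.
    Returns NaN at grid points where no observation falls within the bandwidth.
    """
    fitted = []
    for g in grid:
        weights = [epanechnikov((xi - g) / bandwidth) for xi in x]
        total = sum(weights)
        if total > 0:
            fitted.append(sum(w * yi for w, yi in zip(weights, y)) / total)
        else:
            fitted.append(math.nan)
    return fitted
```

A larger bandwidth averages over more neighboring questions and gives a smoother curve; because the kernel has compact support, each fitted value depends only on questions within one bandwidth of the grid point.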


Figure A2: Comparisons between the Information Treatments

Notes: Questions are ordered by their difficulty in the No-Information treatment, from most difficult to easiest. The y-axis shows the second-stage correct rate. Smooth curves are obtained by nonparametric kernel estimation with the Epanechnikov kernel function.


Appendix B. Question-based Confidences and Answer Distributions in the No-Information Treatment

Figure B1: Average Real and Estimated Confidence

Notes: Questions are ordered by their difficulty in the No-Information treatment, from most difficult to easiest. The y-axis shows subjects’ average confidence level.

Figure B2: Average Real and Estimated Percentage of Agreement with Own Answer

Notes: Questions are ordered by their difficulty in the No-Information treatment, from most difficult to easiest. The y-axis shows the average percentage of proponents (i.e., participants choosing the same answer as the subject), aggregated across subjects. There is no significant difference between the Right group and the Wrong group in their estimates of the proportion of participants giving the same answer (Wilcoxon matched-pairs signed-ranks test, p = 0.958).
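Figure B2 compares, separately for the Right and Wrong groups, the actual percentage of participants giving the same answer with the average estimate from question (c). A minimal sketch of that per-question comparison, assuming reports are stored in percent; the function name and the example numbers are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def agreement_by_group(answers: List[bool], same_estimates: List[float],
                       truth: bool) -> Dict[str, Tuple[float, float]]:
    """For one question, return {group: (actual % agreeing, mean estimated % agreeing)}.

    same_estimates are question (c) reports in percent. For a subject, the
    actual value is the percentage of all participants (including the subject)
    who gave the same answer, which is the same for everyone in a group.
    """
    n = len(answers)
    result = {}
    for label, is_right in (("Right", True), ("Wrong", False)):
        members = [i for i, a in enumerate(answers) if (a == truth) == is_right]
        if members:  # skip a group with no members
            actual = 100.0 * len(members) / n
            estimated = mean(same_estimates[i] for i in members)
            result[label] = (actual, estimated)
    return result
```

For example, with three “True” answers, two “False” answers and a correct answer of “False”, the Right group’s actual agreement is 40% and the Wrong group’s is 60%; a mean estimate above the actual value for a group indicates that its members overestimate how many others share their answer.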


Appendix C. Experimental Instructions (translated from original Chinese version)

C.1 Experimental instructions for the first-stage tasks (common to all the treatments)

Thank you for participating in this experiment! Please read the following instructions carefully. If you have any

questions, feel free to ask us. Please note that you cannot communicate with other participants during the experiment.

Experimental task

You will see 50 trivia questions, each of which contains 4 sub-questions.

Question (a) contains a statement. Please choose “True” or “False” based on your knowledge.

Question (b) requires you to estimate the probability that your answer to question (a) is correct.

Question (c) requires you to estimate the proportion of participants (including yourself) in the experiment who

give the same answer to question (a) as you.

Question (d) requires you to estimate the average confidence in their own answers reported in question (b) by all participants (including yourself) in the experiment.

Experimental payoff

For each question, you will get 10 points if you answer question (a) correctly, and 0 points otherwise.

For question (b), we will randomly generate a number from [0%, 100%] (a new number is generated for each question). If your answer (i.e., the percentage entered in the question) is smaller than the number, your score is the random number × 2 points. If your answer is greater than or equal to the number, your score depends on your answer to the corresponding question (a): you will get 2 points if question (a) is answered correctly, and 0 points otherwise.

For questions (c) and (d), you will get 2 points if the difference between your answer and the actual result is

within the range [-5%, 5%]; you will get 0 points otherwise.

Each point can be exchanged for 0.1 RMB. Therefore, your total payoff in the experiment equals your total score multiplied by 0.1 RMB.
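The per-question scoring rules above can be written as small functions. This is a sketch under our reading of the instructions, not the experiment software: reports and the random number are percentages, the case “equal to the number” pays according to question (a), and the [-5%, 5%] band for questions (c) and (d) is read as an inclusive band of 5 percentage points. All function names are hypothetical.

```python
POINT_VALUE_RMB = 0.1  # each point is worth 0.1 RMB


def score_a(answer: bool, truth: bool) -> float:
    """Question (a): 10 points for a correct True/False answer, 0 otherwise."""
    return 10.0 if answer == truth else 0.0


def score_b(report: float, draw: float, a_correct: bool) -> float:
    """Question (b): report and draw are percentages in [0, 100]."""
    if report < draw:
        return draw / 100.0 * 2.0  # paid the random number times 2 points
    return 2.0 if a_correct else 0.0  # paid according to the answer to (a)


def score_within_five(estimate: float, actual: float) -> float:
    """Questions (c) and (d): 2 points if within 5 percentage points of the actual value."""
    return 2.0 if abs(estimate - actual) <= 5.0 else 0.0


def payoff_rmb(total_points: float) -> float:
    """Convert the total score into RMB."""
    return total_points * POINT_VALUE_RMB
```

For instance, a report of 40% against a random draw of 70% pays 1.4 points regardless of the answer to (a), whereas a report of 80% against the same draw pays 2 points only if (a) is answered correctly.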

Example

For example, you will see the following 4 questions:

(a) The Russians celebrated the October Revolution in October. True/ False

(b) What is your estimate of the probability that you are correct? (an integer between 50 and 100, %)

(c) Among all the participants in this experiment (including yourself), what do you think is the proportion of

participants who have the same answer to question (a) as you? (an integer between 1 and 100, %)

(d) Among all the participants in this experiment (including yourself), what do you think is the average value of

answers given to question (b)? (an integer between 50 and 100, %)

Suppose there are 5 participants in an experiment. Their answers are shown in the following table.

Subject (a) (b) Xi% (c) Yi% (d) Zi%

N1 True X1 Y1 Z1

N2 False X2 Y2 Z2

N3 False X3 Y3 Z3

N4 True X4 Y4 Z4

N5 True X5 Y5 Z5


Suppose the correct answer to question (a) is "False". Then the participants N1, N4 and N5 who answer “True”

will get 0 points for question (a); the participants N2 and N3 who answer “False” will get 10 points for question (a).

Question (b) requires you to estimate the probability that your answer to question (a) is correct. For each subject $i$, the computer will generate a random number $R_i\%$ in [0%, 100%]. If $X_i < R_i$, he will get $R_i\% \times 2$ points for question (b); if $X_i \ge R_i$, he will get 2 points for question (b) if he answers question (a) correctly, and 0 points otherwise.

Let us analyze whether the subject has an incentive to report his true estimate. Suppose the true estimate of subject $i$ is $X_i^*$, but he reports $X_i \neq X_i^*$.

If $X_i > X_i^*$, he gets $R_i\% \times 2$ points when $R_i > X_i$; he gets 2 points (answering question (a) correctly) or 0 points (answering question (a) incorrectly) when $R_i \le X_i^*$ or $X_i^* < R_i \le X_i$. However, when $X_i^* < R_i \le X_i$, his expected score is $X_i^*\% \times 2 + (1 - X_i^*\%) \times 0 = X_i^*\% \times 2$, which is smaller than the $R_i\% \times 2$ points he would have received by reporting truthfully. Therefore, it is not beneficial for the subject to report $X_i > X_i^*$.

If $X_i < X_i^*$, he gets $R_i\% \times 2$ points when $R_i > X_i^*$ or $X_i < R_i \le X_i^*$; he gets 2 points (answering question (a) correctly) or 0 points (answering question (a) incorrectly) when $R_i \le X_i$. However, when $X_i < R_i \le X_i^*$, his score $R_i\% \times 2$ is no greater than the expected score $X_i^*\% \times 2 + (1 - X_i^*\%) \times 0 = X_i^*\% \times 2$ that he would have obtained by reporting truthfully. Therefore, it is also not beneficial for the subject to report $X_i < X_i^*$.

Question (c) requires you to estimate the proportion of participants (including yourself) in the experiment who give the same answer to question (a) as you.

There are 3 participants answering “True” and 2 participants answering “False” in the example. That is, 60% of

the participants answer “True” and 40% of the participants answer “False”.

For subject N1, the proportion of participants giving the same answer is 60%. He will get 2 points for question (c) if $|Y_1 - 60| \le 5$, and 0 points otherwise. For subject N2, the proportion of participants giving the same answer is 40%. He will get 2 points for question (c) if $|Y_2 - 40| \le 5$, and 0 points otherwise. The scores for subjects N3, N4 and N5 are calculated in the same way.

Question (d) requires you to estimate the average confidence in their own answers reported in question (b) by all participants (including yourself) in the experiment. Subject $i$ will get 2 points if $\left| Z_i - \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5} \right| \le 5$, and 0 points otherwise.

If you have any questions, please raise your hand. The experiment will start once no participant has any remaining questions about the experimental procedures.

C.2 Experimental instructions for the second-stage tasks

Note. Subjects were informed of the second stage only when all the participants had finished the experimental task

in the first stage.

Thank you for answering the above 50 questions. Next, we will provide you with the answers submitted by the

participants in an earlier session to each of the 50 questions, and also provide you with an opportunity to revise your

answers. For each question (a), you can feel free to revise your previous answers or simply choose not to revise them.

If you decide to revise the previous answer, your answer will be automatically updated according to the


following rule: the modified answer becomes "False" if your previous answer is "True"; and the modified answer

becomes "True" if your previous answer is "False".

If you decide not to revise the previous answer, your answer remains the same as your previous answer.

Your final payoff for question (a) will depend on your answer in this stage.

Ⅰ. Example [specific to the treatment LI]

If you have answered the question "The Russians celebrated the October Revolution in October" before, you will see the question again as follows:

The Russians celebrated the October Revolution in October.

Your previous answer is: True / False

Do you want to modify your answer: Yes / No

At the same time, you will see on the screen the answers submitted by the participants in an earlier session. The details are as follows:

Participants in an earlier session who agreed with the statement accounted for AW% of all participants.

Participants in an earlier session who disagreed with the statement accounted for DW% of all participants.

Suppose there were 5 participants in the earlier session. Their answers are shown in the following table.

Subject (a) (b) Xi% (c) Yi% (d) Zi%

N1 True X1 Y1 Z1

N2 False X2 Y2 Z2

N3 False X3 Y3 Z3

N4 True X4 Y4 Z4

N5 True X5 Y5 Z5

Therefore, we have AW% = 60% and DW% = 40%.

Ⅱ. Example [specific to the treatment MI]

If you have answered the question "The Russians celebrated the October Revolution in October" before, you will see the question again as follows:

The Russians celebrated the October Revolution in October.

Your previous answer is: True / False

Do you want to modify your answer: Yes / No

At the same time, you will see on the screen the answers submitted by the participants in an earlier session. The details are as follows:

Participants in an earlier session who agreed with the statement accounted for AW% of all participants; on average, they estimated their probability of being correct at AX%.

Participants in an earlier session who disagreed with the statement accounted for DW% of all participants; on average, they estimated their probability of being correct at DX%.

Suppose there were 5 participants in the earlier session. Their answers are shown in the following table.


Subject (a) (b) Xi% (c) Yi% (d) Zi%

N1 True X1 Y1 Z1

N2 False X2 Y2 Z2

N3 False X3 Y3 Z3

N4 True X4 Y4 Z4

N5 True X5 Y5 Z5

Therefore, we have AW% = 60%, $AX\% = \frac{X_1 + X_4 + X_5}{3} \times 100\%$; DW% = 40%, $DX\% = \frac{X_2 + X_3}{2} \times 100\%$.

Ⅲ. Example [specific to the treatment FI]

If you have answered the question "The Russians celebrated the October Revolution in October" before, you will see the question again as follows:

The Russians celebrated the October Revolution in October.

Your previous answer is: True / False

Do you want to modify your answer: Yes / No

At the same time, you will see on the screen the answers submitted by the participants in an earlier session. The details are as follows:

Participants in an earlier session who agreed with the statement accounted for AW% of all participants; on average, they estimated their probability of being correct at AX%, estimated that AY% of participants agreed with the statement, and estimated the average probability that each subject believed himself to be correct at AZ%.

Participants in an earlier session who disagreed with the statement accounted for DW% of all participants; on average, they estimated their probability of being correct at DX%, estimated that DY% of participants disagreed with the statement, and estimated the average probability that each subject believed himself to be correct at DZ%.

Suppose there were 5 participants in the earlier session. Their answers are shown in the following table.

Subject (a) (b) Xi% (c) Yi% (d) Zi%

N1 True X1 Y1 Z1

N2 False X2 Y2 Z2

N3 False X3 Y3 Z3

N4 True X4 Y4 Z4

N5 True X5 Y5 Z5

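Therefore, we have AW% = 60%, $AX\% = \frac{X_1 + X_4 + X_5}{3} \times 100\%$, $AY\% = \frac{Y_1 + Y_4 + Y_5}{3} \times 100\%$, $AZ\% = \frac{Z_1 + Z_4 + Z_5}{3} \times 100\%$; DW% = 40%, $DX\% = \frac{X_2 + X_3}{2} \times 100\%$, $DY\% = \frac{Y_2 + Y_3}{2} \times 100\%$, $DZ\% = \frac{Z_2 + Z_3}{2} \times 100\%$.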
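The displays in the three information treatments are simple group averages of the earlier session’s first-stage reports. A minimal sketch, assuming each report is stored as a percentage value (as in the column headers Xi%, Yi%, Zi%), so averages come out directly in percent; the function name, the treatment labels passed as strings, and returning None for an empty group are our own choices.

```python
from statistics import mean
from typing import Dict, List, Optional


def _group_mean(values: List[float], members: List[int]) -> Optional[float]:
    """Average of values over the given indices, or None if the group is empty."""
    return mean(values[i] for i in members) if members else None


def second_stage_display(answers: List[bool], X: List[float], Y: List[float],
                         Z: List[float], treatment: str) -> Dict[str, Optional[float]]:
    """Summary statistics (in percent) shown in the second stage, by treatment.

    answers: earlier-session answers (True = agreed with the statement);
    X, Y, Z: each participant's reports to questions (b), (c), (d), in percent.
    """
    agree = [i for i, a in enumerate(answers) if a]
    disagree = [i for i, a in enumerate(answers) if not a]
    n = len(answers)
    info: Dict[str, Optional[float]] = {"AW": 100.0 * len(agree) / n,
                                        "DW": 100.0 * len(disagree) / n}
    if treatment in ("MI", "FI"):  # add each group's mean confidence
        info["AX"] = _group_mean(X, agree)
        info["DX"] = _group_mean(X, disagree)
    if treatment == "FI":  # add each group's mean answers to (c) and (d)
        info["AY"] = _group_mean(Y, agree)
        info["DY"] = _group_mean(Y, disagree)
        info["AZ"] = _group_mean(Z, agree)
        info["DZ"] = _group_mean(Z, disagree)
    return info
```

With the five-participant layout above (N1, N4 and N5 agreeing), AX averages X1, X4 and X5 while DX averages X2 and X3, matching the formulas; the LI treatment shows only AW and DW.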


Appendix D. 50 Quiz Questions Used in the Experiment (translated from original Chinese version)

1. The national anthem of Spain has no lyrics.

2. Bangladesh has a smaller population than Russia.

3. Pluto has not completed a single orbit around the sun since its discovery.

4. Russia has the most time zones of any country.

5. Nata de coco is made from coconut meat.

6. Liu Bang was three years younger than Qin Shihuang.

7. Uranus has a moon named after a character in King Lear written by Shakespeare.

8. The farthest place from the center of the earth is the Himalayas.

9. We can send WeChat messages on Mount Qomolangma.

10. “亖” is pronounced [sì].

11. Sharks are the animals with the most teeth in the world.

12. The gas produced by melting gold is green.

13. “Sworn brothers” can include friends of different generations.

14. The name of the Polish special forces is “Giant Palm”.

15. The Arabia in Arabian Nights refers to India.

16. The Chinese saying “This is just between you and me” was first said by Yang Zhen of the Han Dynasty when he refused to accept a gift.

17. Among the four major basins in China, the Tarim Basin in Xinjiang Province has a special kind of soil called

“purple soil”.

18. The Chinese expression “桃李年华” refers to a woman aged 24.

19. The dying wish of Goethe was to be buried beside the poet Schiller.

20. The Pyrenees form a natural border between Spain and France.

21. The candidates for the 2012 FIFA Ballon d’Or award include Lionel Messi, Cristiano Ronaldo and Neymar.

22. The Italian scientist and astronomer Copernicus was burned to death for his adherence to the heliocentric theory.

23. The father of Dayu, the hero who controlled the floods in Chinese history, was called Gu.

24. Duan Jingzhu ranks 108th in the Water Margin and is nicknamed “Golden Retriever”.

25. In the classic FC game Super Mario, when Mario can fire bullets, he wears blue clothing.

26. Zishi in ancient China corresponds to the period from 23:00 to 00:59 in modern timekeeping.

27. After Amsterdam, the second most populous city in the Netherlands is The Hague.

28. The top left corner of the flag of Australia is the flag of the United Kingdom.

29. Czechoslovakia, a central European country, had split from the Czech Republic.

30. In the Chinese story “No 300 taels of silver buried here”, Wang Er stole the silver.

31. The reverse side of the 5th edition 5-yuan RMB banknote shows Huangshan Mountain in China.

32. The Indus River is located in India.

33. Among the “six top ancient water-towns in the southern Yangtze River area”, the three ancient towns in Zhejiang


Province are Wuzhen, Zhouzhuang and Xitang.

34. The capital of the South American country of Panama is Panama City.

35. The first “9” in the C919 medium-size airliner means everlasting.

36. The Hundred Years’ War lasted 116 years.

37. The “black box” on an airliner is purple.

38. The Canary Islands in the Atlantic Ocean are named after the Canary Dog.

39. The grade numbers of diesel oil, such as 0 and -10, are based on its solidifying point.

40. In the history of the Golden Horse Awards in Taiwan, only two actors, Jackie Chan and Tony Leung Chiu-wai,

have won two Best Actor awards in a row.

41. “东风” in the Chinese poem line “东风不与周郎便，铜雀春深锁二乔” refers to Borrowing Arrows with Thatched Boats.

42. Zhang Qian brought corn back from the Western Regions in the Tang Dynasty.

43. “Beat the snake seven inches” means beating its heart.

44. The Bentley logo has 10 large feathers on each side of the wing.

45. Sea cucumbers go dormant in winter.

46. Severe winter refers to the twelfth month of the lunar calendar.

47. The first woman to proclaim herself emperor in the history of China is Wu Zetian.

48. The name “84 Disinfectant” derives from its successful development in 1984.

49. “Grand finale” was originally a Chinese opera term, referring to the last part of a drama performance.

50. The last name of Confucius is Zi.