PODER Experimental Design Catherine C. Eckel – Texas A&M University Rick K. Wilson – Rice...

Capeto

wn

PODER

Experimental Design

Catherine C. Eckel – Texas A&M University

Rick K. Wilson – Rice University

1

7/7

/14

2

Capeto

wn

IN-CLASS EXPERIMENT 1

7/7

/14

3

Capeto

wn

EXPERIMENT

Rules: Your cards are private and may not be revealed to

anyone except the experimenter No talking unless you are asked to do so Keep your earnings private You must pass 2 cards to the experimenter

Earnings in Round: Black Cards are worth 0 Red Cards held in your hand are worth 1 Red Cards in the experimenter stack are

worth .10

7/7

/14

4

Capeto

wn

EXPERIMENT

Payoffs:

1 x (# Red Cards Kept) + 0.1*(# Red Cards in Deck)

7/7

/14

5

Capeto

wn

WHAT IS THIS?

What is the experiment? What were the treatments? What kind of design was used?

Problems with execution?

7/7

/14

6

Capeto

wn

A WELL PLANNED EXPERIMENT ONLY NEEDS

T C

OR

7/7

/14

7

Capeto

wn

WHAT IS AN EXPERIMENT?

Definition: A test of a hypothesis or demonstration of a fact under conditions manipulated by a researcher.

Key elements: Control, control, control Simplify, simplify, simplify Randomize, randomize, randomize

7/7

/14

OBJECTIVES OF EXPERIMENTS Testing theories Establish empirical regularities as a basis for

new theories Testing institutions and environments Policy advice and wind-tunnel experiments Measurement

Preferences: Goods, risk, fairness, time

8

Capeto

wn

7/7

/14

9

Capeto

wn

WHAT AN EXPERIMENT DOESN’T DO

Substitute for thinking Generate hypotheses Not an all-purpose tool

7/7

/14

OBJECTIVES I:THEORY AND EXPERIMENTS

Test a theory or discriminate between theoriesFormal theory provides the basis for

experimental designTest a theory on its own domain:

Implement the conditions of the theory (e.g., preference assumptions, technology assumptions, institutional assumptions)

(Best to have an alternative hypothesis)Compare the prediction(s) with the

experimental outcome10

Capeto

wn

7/7

/14

THEORY, CONT’D What if the results reject theory? Explore the causes of a theory’s failure

Check each of the assumptions Explore parameter space Find out when the theory fails and when it succeeds Design proper control treatments that allows causal

inferences about why the theory fails

11

Capeto

wn

7/7

/14

12

Capeto

wn

OBJECTIVES II:ESTABLISHING REGULARITIES

Finding patterns Lab is inexpensive Manipulations are easy

Pinpointing effects Interventions, GOTV

Fishing … … in the right Lake

7/7

/14

OBJECTIVES III:INSTITUTIONS AND ENVIRONMENTS

(What is an institution?) Compare environments within the same

institution How robust are the results across different

environments? (e.g., CPRs) Compare institutions within the same

environment Allows for comparisons even when no theory about

the effects of the institution is available (Example: cheap talk vs. face-to-face communications in PG)

How to compare? Usual aggregate welfare measure: Aggregate

amount of money earned divided by the maximum that could be earned

13

Capeto

wn

7/7

/14

OBJECTIVES IV:POLICY AND WIND-TUNNEL EXPERIMENTS

Evaluate policy proposals Which auctions generate the higher revenue? (e.g.,

in arts auctions or broadband license auctions) Do emission permits allow efficient pollution control? How should airport slots be allocated?

The laboratory as a wind tunnel for new institutions What is the distributional consequence of

implementing a “transferable quota” system in a fishery?

Market for research space on space station?

14

Capeto

wn

7/7

/14

OBJECTIVES V:MEASUREMENT: ELICITATION OF PREFERENCES

Valuation to Inform PolicyHow much should the government spend on

avoiding traffic injuries?How much should be spend on the

conservation of nature? Measuring people’s valuations is hard

Are people risk seeking/averse?Who is cooperative?

Requires a theory of individual preferences and knowledge about the strength of particular “motives” (preferences). 15

Capeto

wn

7/7

/14

16

Capeto

wn

EXAMPLE: RISK AVERSION

Economists have a theory of decision making under uncertainty: Expected utility

Suggests that ANY measure of the curvature of the utility function will do: Valuation of lotteries

Willingness to pay/accept Becker Degroot Marschak method

Choices between lotteries and certain amounts Choices among lotteries

But these give different answers. (More on request....)

7/7

/14

17

Capeto

wn

HOW THE READINGS FIT

Establishing Regularities: Chaudhuri (2011); Cardenas and Carpenter (2008); Herrmann, Thöni and Gächter (2008)

Measurement/Elicitation of Preferences: Rustagi, Engel and Kosfeld (2010); Karlan (2005)

Institutions and Environments: Rustagi, Engel and Kosfeld (2010); Heinrich et al. (2010)

7/7

/14

18

Capeto

wn

OBJECTIONS TO EXPERIMENTS

Objection: “Experiments are unrealistic.” All models are unrealistic

They leave out many aspects of reality. Simplicity is a virtue – focuses on critical aspects of a

situation (a causal mechanism or logic of a complex relationship)

Experiments are like models They leave out many aspects of reality Focus on critical aspects (cause or precision of

estimate) Realism may be important but so is control.

7/7

/14

19

Capeto

wn

OBJECTIONS CONTINUED

Objection: “Experiments are artificial.” Biased subject pool (students) Low stakes Small number of participants Inexperienced subjects Anonymity

All can be tested in the lab Such testing rarely affects important results

Example: ultimatum game

7/7

/14

20

Capeto

wn

OBJECTIONS CONTINUED

Objection: “Experiments say nothing about the real world.” External validity Generalization

The experiment involves real decisions for the subjects: real consequences

What is the aim of the experiment? Internal validity – ensuring that the causal

inference is correct External validity: secondary

7/7

/14

21

Capeto

wn

LIMITS OF EXPERIMENTS

Control is never perfect (more later) Weather, Laboratory environment No real control over all other motives (no

dominance) Self-selection: who takes part in the experiment?

Randomization is difficult Experiments (like models) are never general,

just examples: Applies to both lab and field experiments

7/7

/14

22

Capeto

wn

WHAT EXPERIMENTS DO WELL

Test for causal claims Inform theory (e.g. fairness) Allow for replication Develop measures (problem of reliability and

validity) Explore parameters of interest (MPCR) Develop counterfactuals (e.g., macro)

7/7

/14

23

Capeto

wn

CAUSAL CONSIDERATIONS

7/7

/14

24

Capeto

wn

RUEBEN CAUSAL MODEL (RCM)

The dilemma:

The same “i” can’t be in two states at the same time!

FPCI: fundamental problem of causal inference

δi =

iT

-

iCTrue Treatment Effect

7/7

/14

25

Capeto

wn

RCM – BEST CORRECTION

iTjT

kTmC

nC

pC

-ATE =

Mean Treatment Mean Control

Analog: y = α + β1X1 + ε, Xi = (0, 1)

7/7

/14

26

Capeto

wn

SUTVA – STABLE UNIT TREATMENT VALUE ASSUMPTION

Treatment ONLY affects the treated.

iTjT

kTmC

nC

pC

7/7

/14

27

Capeto

wn

SUTVA – AUXILIARY ASSUMPTIONS (FOR INFERENCE)

Assumption 1: Average treatment effect is homogeneous across individuals

iT jT kT

= =

7/7

/14

28

Capeto

wn


Assumption 2: Treatment is invariant to manner delivered

iT jT kT

= =

Computer Paper Oral

7/7

/14

29

Capeto

wn


iTjT

kT

Assumption 3: All possible states of the world are observed

=

Place 1 Place 2

7/7

/14

30

Capeto

wn


Assumption 4: Causal question of interest is historically bound to the data. (e.g., no history effects)

Sept. 9, 2001 Sept. 13, 2001

7/7

/14

iTjT

kT

≠

31

Capeto

wn


Assumption 5: The treatment precedes the outcome

7/7

/14

32

Capeto

wn

FIXES FOR THREATS TO INFERENCE, I

Threats Lab Field

One-sided noncompliance

Rare ITT problem

Two-sided noncompliance

Rare ITT problem

Attrition Rare Common observables among treated and untreated?

Interference between experimental units

Rare Source of spillover?IV or matching

Heterogeneous treatment effects

Rare, increase sample size

Subgroup analysis; clustered designs

Covariates Blocking Stratification

7/7

/14

33

Capeto

wn

FIXES FOR THREATS TO INFERENCE, II

Threats Lab Field

Multiple Outcomes Rare – split into distinct orthogonal treatments

Often necessary because of cost of carrying out the design

Hawthorne/John Henry Effects

Rare Within subject/unit data

Generalization Problematic Context dependent?Replication is expensive

Selection to treatment

Rare under random assignment

May be due to NGO demands. Matching or IV might help

7/7

/14

34

Capeto

wn

DESIGN CONSIDERATIONS

7/7

/14

Claim by Rustagi et al. (2010): Forest management can only be successful if: (1) there are large numbers of conditional cooperators and (2) there is an investment in monitoring and punishment.

What would this claim look like under different designs?

COMMON RESEARCH DESIGNS: ONE-SHOT DESIGN (NON-EXPERIMENTAL)

35

Capeto

wn

T O

7/7

/14

Rustagi et al. (2010): With their observational data they have field measures of outcomes (potential crop trees) and survey measures of costly monitoring (amount of time spent on forest patrol).

COMMON DESIGNS: ONE-SHOT DESIGN (NON-EXPERIMENTAL)

36

Capeto

wn

T O

Inference:

Statistics: descriptive or kitchen sink

y = α + β1X1 + β2X2 + … + βmXm + ε

SUTVA violations: Everyone is treated (maybe)?

Rustagi et al. (2010): Using observation data only, know nothing about “types” of actors. “Treatment” might be self-report of time spent in monitoring. But monitoring may be purely social or status enhancing – lots of potential explanations.

7/7

/14

PRE/POST-TEST DESIGN(NON-EXPERIMENTAL)

37

Capeto

wn

T O2O1

7/7

/14

Rustagi et al. (2010): With their observational data they have field measures of outcomes (potential crop trees) and survey measures of costly monitoring (amount of time spent on forest patrol). An improvement would be to measure monitoring time at time 1 and again at time 2 once there was some intervention.

PRE/POST-TEST DESIGN(NON-EXPERIMENTAL)

38

Capeto

wn

T O2O1Inference: Treatment might have caused a difference

Statistics: O1 ≠ O2

SUTVA violations: Everyone is treated (maybe)?Other things may have changed at the same timeAll possible states of the world are not observed.

Rustagi et al. (2010): Again, know nothing about “types” of actors. “Treatment” might be some intervention affecting monitoring. But all 49 CPRs are given same tenure rights.

7/7

/14

STATIC GROUP COMPARISON(NON-EXPERIMENTAL)

39

Capeto

wn

T

O1B

O1AGroup A

Group B

7/7

/14

Rustagi et al. (2010): With their observational data they have field measures of outcomes (potential crop trees) and survey measures of costly monitoring (amount of time spent on forest patrol). Treatment could be differences in attention to tenure rights for harvesting the forest.

STATIC GROUP COMPARISON(NON-EXPERIMENTAL)

40

Capeto

wn

T

O1B

O1AGroup A

Group B (Control group?No, not Randomized.)

Inference: Treatment might have caused a difference

Statistics: O1A ≠ O1B

SUTVA violations: Not clear that treatment produced any observed difference between treated and untreated

Rustagi et al. (2010): Suppose selection into the treatment by CPR (e.g., previously more successful CPRs adopt tenure rules)

y = α + β1X1 + β2X2 + … + βmXm + ε

7/7

/14

RANDOMIZED GROUP COMPARISON(EXPERIMENTAL)

41

Capeto

wn

T

O1B

O1ART

RC

7/7

/14

Rustagi et al. (2010): With their observational data they have field measures of outcomes (potential crop trees) and survey measures of costly monitoring (amount of time spent on forest patrol). Experimental measure is (1) cooperation and (2) beliefs.

RANDOMIZED GROUP COMPARISON(EXPERIMENTAL)

42

Capeto

wn

T

O1B

O1ART

RCInference: Very likely the treatment caused the difference

Statistics: O1A ≠ O1BSUTVA violations:

All possible states of the world may not be observedHistorical boundedness IF everything is not run at

same time

Rustagi et al. (2010): Endogenous selection to “type” – no real assignment.

y = α + β1X1 + ε

7/7

/14

PRE/POST CONTROL(EXPERIMENTAL)

43

Capeto

wn

T O2TO1T

O2CO1C

RTRC

7/7

/14

Rustagi et al. (2010): With their observational data they have one measure of outcomes (potential crop trees). Experimental measure is (1) cooperation and (2) beliefs.

PRE/POST CONTROL(EXPERIMENTAL)

44

Capeto

wn

T O2TO1T

O2CO1C

RTRC

Inference: Treatment probably caused a difference

Statistics: O1T = O1C ; O2T ≠ O2C SUTVA violations:

Treatment homogeneous across individuals?Treatment invariant to delivery method?

Rustagi et al. (2010): Treatment 1 is endogenously selected (conditional cooperator or not) and treatment 2 is a mediator (beliefs about cooperation of others).

y = α + β1X1 + ε

where y = O2 – O1

7/7

/14

SOLOMON FOUR-GROUP(EXPERIMENTAL)

45

Capeto

wn

T O2T1O1T1

O2C1O1C1

RT1RC1

T O2T2

O2C2

RT2RC2

7/7

/14


46

Capeto

wn

T O2T1O1T1

O2C1O1C1

RT1RC1

T O2T2

O2C2

RT2RC2

Inference: Treatment very likely caused a difference

Statistics: O1T1 = O1C1 = O2C1 = O2C2 ; O2T1 = O2T2 ≠ O2C1 = O2C2

7/7

/14


47

Capeto

wn

T O2T1O1T1

O2C1O1C1

RT1RC1

T O2T2

O2C2

RT2RC2

NOTE: This is easy to accomplish in the Lab, but often not efficient. It is a nightmare in the field.

7/7

/14

48

Capeto

wn

EFFICIENCY IN DESIGN:COUNTER-BALANCE/WITHIN SUBJECT

Counter-balanced designs O1 TA O2 TB O3 and O1 TB O2 TA O3

Builds within subject design Decreases the number of trials Accounts for treatment ordering effect

Cross-over designs O1 TA O2 TB O3 TA O4

Accounts for treatment plus learning

7/7

/14

49

Capeto

wn

EFFICIENCY IN DESIGN:BLOCKING (AND MATCHING) Randomized Blocking Designs

Suppose a 2(H,L)x2(B,S) within subject design with a 4 game ordering effect (HB, HS, LB,LS). Would require 16 cells under complete factorial design Blocking allows 4 cells: (1) HB,HS,LB,LS (2)

HS,LB,LS,HB(3) LB,LS,HB,HS (4) LS,HB,HS,LB

Assumptions Blocks must be homogeneous Blocks must be randomly assigned

Blocking on “nuisance” variables Sex is not randomly assigned

y = α + β1X1 + β2X2 + ε

Treatment Sex of Subject

7/7

/14

50

Capeto

wn

CALIBRATION

Keep in mind that you are producing a data set

Include a “baseline” in the experimental design

Set parameters so you can be sure to tell if hypothesis is supported

Try to factor in “noise” in behavior – variability in the performance of the subjects. Lots of noise means results are hard to determine.

Develop criteria for rejection

7/7

/14

51

Capeto

wn

ADVICE ON CONTROLS

Control everything that might be controlled (and collect data on everything that might be uncontrolled)

Randomize everything else Maximize contrasts with treatments (do you

really need to use low, medium and high manipulations?)

If a “nuisance” variable is suspected to interact with a treatment, then make it a separate treatment.

If a “nuisance” variable is too much of a problem, then make it a constant (blocking) – comparative statics will then play out.

7/7

/14

52

Capeto

wn

GENERAL REMARK

Whether the conditions implemented in the laboratory are also present in reality will always be subject to some uncertainty.

Therefore, laboratory experiments are no substitute for the analysis of field happenstance data for the conduct and the analysis of field

experiments and for survey data.

7/7

/14

PODER Experimental Design Catherine C. Eckel – Texas A&M University Rick K. Wilson – Rice...

Documents

Transcript of PODER Experimental Design Catherine C. Eckel – Texas A&M University Rick K. Wilson – Rice...