Measurement

What is measurement?

“the assignment of a value on a variable to a unit of measurement in accordance with an operational definition” (Kleinnijenhuis 1999: 83, in Pennings et al. 1999)

Components of measurement

– A value: a categorization or number (e.g. whether or not a country is a democracy; or its level of democratization)

– A variable: a characteristic that can vary in value among observations

– A unit of measurement: an observation/case (e.g. a country in a particular year)

– An operational definition: a definition of a concept that facilitates the assignment of a value to an observation (e.g. what countries would be classified as democracies)

From concept to operational definition

• Reduce abstraction to facilitate measurement

• E.g. Political party positions / electoral appeals

– Concept: the salience of themes to political parties

– Operational definition: the percentage of a manifesto devoted to each of a set of defined themes (including “free enterprise”, “welfare state expansion”, “expansion of military capabilities”)
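The operational definition above can be sketched in a few lines of code. This is a minimal illustration, not the coding scheme of any particular manifesto project: it assumes that each sentence of a manifesto has already been assigned to exactly one theme, and the function name `theme_salience` and the ten-sentence manifesto are hypothetical.

```python
from collections import Counter

def theme_salience(coded_sentences):
    """Return the percentage of coded sentences devoted to each theme."""
    counts = Counter(coded_sentences)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {theme: 100 * n / total for theme, n in counts.items()}

# Hypothetical manifesto: one theme code per sentence (assumed unit of coding).
manifesto = (
    ["free enterprise"] * 6
    + ["welfare state expansion"] * 3
    + ["expansion of military capabilities"] * 1
)

salience = theme_salience(manifesto)
for theme, pct in salience.items():
    print(f"{theme}: {pct:.1f}%")
```

The concept (salience) stays abstract; the percentages are the values the operational definition assigns to one unit of measurement (one party manifesto).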

Levels of measurement

• Nominal

– Categories with no ranking (e.g. types of democracies / types of political parties / ethnic groups); categories should be mutually exclusive and collectively exhaustive (MECE)

• Ordinal

– Some observations score higher/lower than others, but we do not know how much more or less (e.g. Left-Right position of parties / intensity of attachment to ethnic group)

• Interval

– The differences between the values of the observations are meaningful

• Ratio

– There is a meaningful zero point (e.g. number of decisions taken by referenda / cross-border trade)

Guidelines for constructing measures

• Reveal assumptions implicit in operationalising a concept: there is often nothing intrinsic in a concept that implies a particular level of measurement

• Maximise information: Wherever sensible, aim for as high a level of measurement as possible

• Address trade-off between comparability and descriptive richness

Lijphart’s (1977) typology of democratic systems

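Types by elite behaviour (coalescent / adversarial) and structure of society (homogeneous / plural)

– Coalescent elites, homogeneous society: Depoliticized (Austria, 1966-)

– Coalescent elites, plural society: Consociational (Belgium, Netherlands, Switzerland, Austria 1945-66)

– Adversarial elites, homogeneous society: Centripetal (Finland, Denmark, UK, USA, Norway, Sweden, Germany)

– Adversarial elites, plural society: Centrifugal (France, Italy, Canada)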

Evaluating measurement

• Validity and reliability

• Valid: does the measure capture the concept it purports to measure?

– Face validity

– Correlational validity (internal validity)

– Predictive validity (external validity)

An example of a correlational validity test

• Compared key informants’ judgements on “what were the controversial issues” in a particular decision situation with documentation

– Key informants identified five controversial issues in decision making on a legislative proposal in the EU (the tobacco directive)

– To what extent and in what way were these related to the controversial issues identified through analysis of documentation?

For details see:

http://www.dhv-speyer.de/tkoenig/DEU_manuscript_Oct2005.htm

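An example of a correlational validity test (cont.)

Issues from documents, by whether they were related to the experts’ issues:

– Related: 8 issues; actors with positions on the point of contention, average 7.9 (range 4-17); Council meetings at which the point was raised, average 3.0 (range 1-6)

– Unrelated: 23 issues; actors with positions on the point of contention, average 4.8 (range 2-9); Council meetings at which the point was raised, average 1.7 (range 1-4)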

31 controversial issues on the basis of documentation

5 issues from the informants

How many of the 31 were related to the 5?

What distinguished document-sourced issues that were related to informants’ issues from those that were not?

Reliability

• Reliable: repeated measures on the same cases give the same results

– Intra-observer reliability

– Inter-observer reliability

Another example of testing validity and reliability

• The enactment of election pledges in a coalition system of government: The Netherlands

(Thomson, R. 2001. The programme to policy linkage: The fulfilment of election pledges on socio-economic policy in the Netherlands, 1986-1998. European Journal of Political Research 40: 171-197)

Face validity

• Do the election pledges appear to be substantively important?

• Do the patterns of fulfilment match with qualitative descriptions of the main policy events in the period?


Reliability tests

• Inter-coder reliability

– Did other readers of the same election manifestos identify the same statements as pledges?

• Inter-coder reliability

– Did subject-area specialists identify the same pledges as unfulfilled or fulfilled?

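An inter-coder reliability test

Statements identified as pledges, by coder:

– Identified by both Thomson and the 2nd coder: 199 (a)

– Identified by the 2nd coder but not by Thomson: 13 (b)

– Identified by Thomson but not by the 2nd coder: 14 (c)

– Total identified by the 2nd coder: 212

– Total identified by Thomson: 213

Inter-coder agreement = a / (a + b + c) = 199/226 = .88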
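The arithmetic behind the inter-coder reliability test can be reproduced as follows. This is a small sketch using the counts reported above; the function name `pledge_agreement` is hypothetical. The table gives no count for statements that neither coder identified as a pledge, so the measure uses only the three cells a, b and c.

```python
def pledge_agreement(both, only_second, only_first):
    """Share of statements flagged by either coder that both coders flagged: a / (a + b + c)."""
    return both / (both + only_second + only_first)

a = 199  # identified as a pledge by both Thomson and the 2nd coder
b = 13   # identified by the 2nd coder only
c = 14   # identified by Thomson only

agreement = pledge_agreement(a, b, c)
print(f"total identified by 2nd coder: {a + b}")
print(f"total identified by Thomson: {a + c}")
print(f"agreement: {agreement:.2f}")
```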

Measurement error

• Aim at unbiased and efficient measures

• Bias: systematic error

– We either overestimate or underestimate the true value

• Inefficiency: non-systematic error

– On average, we estimate the true value correctly, but with considerable variation

The ideal is unbiased and efficient estimates

[Figure: three panels comparing the measure/estimate with the true value: unbiased but inefficient; unbiased and efficient; biased]
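The three situations can also be illustrated with a small simulation. This is only a sketch under stated assumptions: the true value of 50, the size of the bias, the noise levels and the helper `simulate` are all hypothetical and chosen for illustration.

```python
import random
import statistics

random.seed(1)
TRUE_VALUE = 50.0  # hypothetical true value being measured

def simulate(bias, noise_sd, n=5000):
    """Draw n measurements with a systematic error (bias) and a random error (noise_sd)."""
    return [TRUE_VALUE + bias + random.gauss(0, noise_sd) for _ in range(n)]

unbiased_efficient = simulate(bias=0.0, noise_sd=1.0)
unbiased_inefficient = simulate(bias=0.0, noise_sd=10.0)
biased = simulate(bias=5.0, noise_sd=1.0)

for name, measures in [
    ("unbiased and efficient", unbiased_efficient),
    ("unbiased, but inefficient", unbiased_inefficient),
    ("biased", biased),
]:
    mean = statistics.mean(measures)
    spread = statistics.stdev(measures)
    print(f"{name}: mean = {mean:.2f}, standard deviation = {spread:.2f}")
```

The unbiased measures centre on the true value and differ only in their spread; the biased measure is tightly clustered, but around the wrong value.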

The impact of bias on inference

• Bias / systematic error

– Consistent over- or underestimation of the true value

– E.g. consistent overestimation of annual income

• Leads to bias in description

• If it affects all cases equally, it does not lead to bias in estimating causal effects
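A short simulation can show why a constant systematic error distorts description but not the estimated effect. It is a sketch under assumptions: the income figures, the outcome variable, the size of the overstatement and the helper `ols_slope` are hypothetical, and the causal effect is estimated with a simple bivariate least-squares slope.

```python
import random

random.seed(3)

def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data: true annual income (in thousands) and an outcome that depends on it.
true_income = [random.uniform(20, 80) for _ in range(1000)]
outcome = [10 + 0.5 * inc + random.gauss(0, 5) for inc in true_income]

# Systematic error: every respondent overstates income by the same amount.
OVERSTATEMENT = 5.0
reported_income = [inc + OVERSTATEMENT for inc in true_income]

description_error = sum(reported_income) / 1000 - sum(true_income) / 1000
slope_true = ols_slope(true_income, outcome)
slope_reported = ols_slope(reported_income, outcome)

print(f"error in mean income: {description_error:.2f}")
print(f"slope using true income: {slope_true:.3f}")
print(f"slope using reported income: {slope_reported:.3f}")
```

Adding the same constant to every case shifts the mean (the description is biased) but leaves the deviations from the mean, and therefore the slope, unchanged.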

The impact of inefficiency on inference

• Hampers description, although descriptions are correct on average

• Hampers explanation, but the impact differs depending on whether inefficiency pertains to the IV or DV

• Inefficiently measured DV: causal effects will be estimated less precisely, but the estimates will be correct on average (unbiased)

• Inefficiently measured IV: causal effects will be estimated less precisely, and the estimates will be incorrect on average (i.e. biased)

Why inefficiently measured DVs do not bias estimates of causal effects

[Figure: relationship between unemployment (IV) and violence (DV)]
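The unemployment and violence example can be simulated to illustrate the point. This is a sketch, not a reconstruction of the original figure: the true slope of 0.5, the noise levels, the sample sizes and the helpers `ols_slope` and `one_study` are assumptions made for illustration.

```python
import random
import statistics

random.seed(7)
TRUE_SLOPE = 0.5  # hypothetical true effect of unemployment on violence

def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def one_study(dv_noise_sd, n=200):
    """Estimate the slope in one sample where violence is measured with random error."""
    unemployment = [random.uniform(0, 20) for _ in range(n)]
    violence = [2 + TRUE_SLOPE * u + random.gauss(0, 1) for u in unemployment]
    measured_violence = [v + random.gauss(0, dv_noise_sd) for v in violence]
    return ols_slope(unemployment, measured_violence)

precise_dv_slopes = [one_study(dv_noise_sd=0.0) for _ in range(500)]
noisy_dv_slopes = [one_study(dv_noise_sd=5.0) for _ in range(500)]

for name, slopes in [("precise DV", precise_dv_slopes), ("noisy DV", noisy_dv_slopes)]:
    print(f"{name}: mean slope = {statistics.mean(slopes):.3f}, "
          f"spread of slopes = {statistics.stdev(slopes):.3f}")
```

Random error in the DV moves points up and down around the same line, so individual estimates vary more from sample to sample while their average stays close to the true slope.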

Why inefficiently measured IVs bias estimates of causal effects

[Figure: relationship between unemployment (IV) and violence (DV)]
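A matching simulation with random error in the IV illustrates the bias. It is a sketch under assumptions: the error in measured unemployment is taken to be purely random and unrelated to true unemployment (classical measurement error), and the true slope, noise level, sample sizes and helper names are hypothetical. Under those assumptions the standard result is that the bivariate slope shrinks towards zero by the factor var(x) / (var(x) + var(error)).

```python
import random
import statistics

random.seed(11)
TRUE_SLOPE = 0.5   # hypothetical true effect of unemployment on violence
IV_NOISE_SD = 5.0  # hypothetical random error in measured unemployment

def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def one_study(iv_noise_sd, n=200):
    """Estimate the slope in one sample where unemployment is measured with random error."""
    unemployment = [random.uniform(0, 20) for _ in range(n)]
    violence = [2 + TRUE_SLOPE * u + random.gauss(0, 1) for u in unemployment]
    measured_unemployment = [u + random.gauss(0, iv_noise_sd) for u in unemployment]
    return ols_slope(measured_unemployment, violence)

noisy_iv_slopes = [one_study(IV_NOISE_SD) for _ in range(500)]
mean_slope = statistics.mean(noisy_iv_slopes)

# Expected attenuation under classical measurement error.
var_x = 20 ** 2 / 12  # variance of a uniform(0, 20) variable
expected_slope = TRUE_SLOPE * var_x / (var_x + IV_NOISE_SD ** 2)

print(f"true slope: {TRUE_SLOPE}")
print(f"mean estimated slope with noisy IV: {mean_slope:.3f}")
print(f"slope expected under attenuation: {expected_slope:.3f}")
```

Random error in the IV spreads the points horizontally, which flattens the fitted line, so the estimates are too small on average and not merely less precise.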
