Questionnaire Design and Evaluation

Mark Shevlin

Types of Psychological Tests

Psychological tests can be used to measure:
– General ability (IQ)
– Specific abilities
– Attitudes
– Interests
– Clinical pathology
– Personality traits

Types of Psychological Tests

The guidelines in this lecture relate to:
– Attitudes
– Interests
– Personality traits

Always check that a suitable published scale is not already available.

Guidelines in Scale Construction

– What do you want to measure?
– Generate an item pool
– Decide on an appropriate response format
– Initial item review and administration to a development sample
– Evaluate items
– Optimise scale content

What do you want to measure?

You will be attempting to measure a variable: a dimension along which people differ.

The variable will be latent, that is, unobservable.

Developing a scale requires a clear and concise understanding of what you are trying to measure.

Level of generality

Variables can be measured at different levels of specificity.

Specificity refers to the breadth of the construct under consideration.

Some measures tap a very specific, small group of behaviours (e.g. punctuality).

Some measures tap a very broad, general group of behaviours (e.g. introversion).

Level of generality

The level of generality influences the ‘bandwidth-fidelity trade-off’.

A measure with narrow bandwidth (specific) should be good at predicting a small number of behaviours, but poor at predicting a range of behaviours.

A measure with broad bandwidth (general) should be reasonable at predicting a large number of behaviours, but poor at predicting specific behaviours.

Level of generality: Narrow

A punctuality measure would be good at predicting time of arrival at classes, how often a person is late for work, etc.

A punctuality measure would be poor at predicting social or interpersonal behaviour.

Level of generality: Broad

A sociability measure would be poor at predicting time of arrival at classes, how often a person is late for work, etc.

A sociability measure would be good at predicting many social or interpersonal behaviours.

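Example

[Figure: hierarchy with Extraversion at the broad level and three narrower facets beneath it: Sociability, Activity and Excitability.]

Example items:
– ‘Do you enjoy meeting new people?’
– ‘Do you like plenty of bustle and excitement around you?’
– ‘Do you like mixing with people?’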

Exercise

Name three general variables that may interest psychologists. What types of behaviour would they predict?

Name three specific, or narrow, variables that may interest psychologists. What types of behaviour would they predict?

Item Pool

An item pool is a large number of initial questions that may be included in the final questionnaire.

Item pools can be generated simply by thinking of items that reflect the variable of interest.

Preferably, you should use a blueprint.

Item Pool

A blueprint, or test specification, is a framework for developing the questionnaire.

It requires you to specify content areas. The content areas should cover everything that is relevant to the purpose of the questionnaire.

Manifestations refer to the ways in which the content areas may manifest themselves.

Item Pool

More specifically, different types of manifestation should be identified:
– Behavioural: instances of behaviour related to the content area
– Cognitive: ways of thinking related to the content area
– Affective: the way a person feels in relation to the content area

Item Pool

The content areas and manifestations should form the axes of a grid.

[Figure: grid with content areas across the top and manifestations down the side.]

Item Pool

You should use between 4 and 7 categories for each axis.

An example blueprint for measuring social anxiety (defined as an anxiety response to social interaction) follows.

Each cell should be completed to show how each content area may become manifest, BUT NOT NOW.

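Content areas
A. Anxiety at meeting new people
B. Anxiety at speaking publicly
C. Anxiety at being in a public place

Manifestations
A. Avoidance
B. Tension
C. Feelings of worry
D. Thinking people do not like me

Grid (manifestations down the side, content areas across the top; cells left blank):

Manifestation | A | B | C
A | | |
B | | |
C | | |
D | | |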

Exercise

Construct a test specification (5 x 5) for one of the following variables:
– Fear of technology
– Trust
– Loneliness
– Happiness

Weighting content areas and manifestations

You may decide that not all content areas and manifestations are equally important in representing the variable of interest.

You may want to weight some areas and manifestations more heavily, depending on their importance.

First, determine the number of items.


Determining the number of items:
– At least 20.
– Fewer if the sample is elderly or very young.
– Remember that 50% of the items may be removed.
– A rough guide is between 40 and 100.

In this example, 100 items will initially be developed.



It is believed that anxiety at meeting new people is a very important content area, and that all the manifestations are equally important.

The blueprint could be specified as follows.

Content areas (weight)
A. Anxiety at meeting new people (60%)
B. Anxiety at speaking publicly (20%)
C. Anxiety at being in a public place (20%)

Manifestations (weight)
A. Avoidance (25%)
B. Tension (25%)
C. Feelings of worry (25%)
D. Thinking people do not like me (25%)


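If 100 items are to be developed, the number to be written for each cell can be calculated as the total number of items × the content area weight × the manifestation weight (e.g. 100 × 60% × 25% = 15).

Items per cell (manifestations down the side, content areas across the top):

Manifestation (weight) | A (60%) | B (20%) | C (20%) | Total
A (25%) | 15 | 5 | 5 | 25
B (25%) | 15 | 5 | 5 | 25
C (25%) | 15 | 5 | 5 | 25
D (25%) | 15 | 5 | 5 | 25
Total | 60 | 20 | 20 | 100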
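The cell counts above follow from multiplying the total number of items by the two weights. A minimal Python sketch of that calculation, using the social anxiety weights from the example; the function name `allocate_items` and the shortened labels are hypothetical, for illustration only:

```python
# Sketch: allocate an item budget across blueprint cells.
# Weights come from the social anxiety example; labels are shortened.
content_weights = {
    "A. Meeting new people": 0.60,
    "B. Speaking publicly": 0.20,
    "C. Being in a public place": 0.20,
}
manifestation_weights = {
    "A. Avoidance": 0.25,
    "B. Tension": 0.25,
    "C. Feelings of worry": 0.25,
    "D. Thinking people do not like me": 0.25,
}


def allocate_items(total, content_w, manifest_w):
    """Return {(manifestation, content_area): number of items to write}."""
    cells = {}
    for manifestation, w_m in manifest_w.items():
        for area, w_c in content_w.items():
            # Items for a cell = total x manifestation weight x content weight,
            # rounded to a whole number of items.
            cells[(manifestation, area)] = round(total * w_m * w_c)
    return cells


cells = allocate_items(100, content_weights, manifestation_weights)
for (manifestation, area), n in cells.items():
    print(f"{manifestation:<36}{area:<28}{n}")
print("Total items:", sum(cells.values()))
```

Rounding each cell separately can make the grand total drift slightly from the target when the weights do not divide the total evenly; if that happens, adjust a few cells by hand.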

Writing Items

Writing items involves constructing questions or statements relating to each cell in the test specification.

The nature of the statements will depend on the response format used.

There are some guidelines for writing good items.


Items should be concise, clear and unambiguous.

Avoid long, wordy items.

Construct your items to be compatible with the target sample in terms of reading difficulty (e.g. children or the elderly).


Avoid double negatives:
– ‘I am not in favour of the government not making drugs legal’

Avoid double-barrelled items that include two or more issues:
– ‘I agree that crime should always be punished and hanging should return’


Avoid floor effects (all respondents scoring low or negatively), which occur when items are too extreme:
– ‘I try to kill myself regularly’
– ‘I hear voices telling me what to do’
– ‘I am too nervous to speak to anyone’
– ‘I drink more than 300 units of alcohol each week’


Avoid ceiling effects (all respondents scoring high or positively), which occur when items are too easy or too extreme:
– ‘I have some positive attributes’
– ‘What is 1+1?’
– ‘I am too nervous to speak to anyone’


Include some negatively worded items to reduce response set, or acquiescence (agreeing with all the items). Remember to reverse-code these items. In the following examples, the last two items are negatively worded:
– I feel I have a number of good qualities
– On the whole, I am satisfied with myself
– I feel useless at times
– I feel I do not have a lot to be proud of
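Reverse coding maps each response onto the opposite end of the scale, so that a high score always means more of the variable. A minimal sketch, assuming a 1 to 5 Likert format and hypothetical responses from one person to the four self-esteem items above, with the last two treated as negatively worded:

```python
def reverse_code(score, min_option=1, max_option=5):
    """Reverse a Likert response: on a 1-5 scale 1<->5, 2<->4 and 3 stays 3."""
    if not min_option <= score <= max_option:
        raise ValueError(f"score {score} is outside {min_option}-{max_option}")
    return (min_option + max_option) - score


# Hypothetical responses (1 = strongly disagree ... 5 = strongly agree).
responses = {
    "I feel I have a number of good qualities": 4,
    "On the whole, I am satisfied with myself": 5,
    "I feel useless at times": 2,
    "I feel I do not have a lot to be proud of": 1,
}
negatively_worded = {
    "I feel useless at times",
    "I feel I do not have a lot to be proud of",
}

# Reverse only the negatively worded items, then sum to a scale score.
scored = {
    item: reverse_code(r) if item in negatively_worded else r
    for item, r in responses.items()
}
total = sum(scored.values())
print(scored)
print("Scale score:", total)  # 4 + 5 + 4 + 5 = 18
```

The same formula works for a 7-point scale by passing `max_option=7` (a response of 2 then becomes 6).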

Response Format

Types of scaling:
– Likert
– Semantic differential
– Visual analog
– Forced choice binary

Likert

The item is presented as a declarative statement, and the response options reflect varying degrees of agreement or disagreement.

Between 5 and 7 options is usual.

The respondent is asked to circle the appropriate category.

Likert

The categories should be labelled so as to represent equal intervals.

An optional midpoint can be used, but:
– how is it scored?
– what does it mean?

Score the items so that a high level of the variable you are measuring is reflected in a high category value.

Likert

1. I enjoy going to parties.
Strongly disagree 1 2 3 4 5 Strongly agree

1. I enjoy going to parties.
1 = Strongly disagree, 2 = Disagree, 3 = Neither agree nor disagree, 4 = Agree, 5 = Strongly agree

Likert: Assessing frequency

1. I feel gloomy
1 = Never, 2 = Hardly ever, 3 = Occasionally, 4 = Sometimes, 5 = Always

1. I feel gloomy
1 = Less than once a month, 2 = Once a month, 3 = Once a week, 4 = Some days, 5 = All day

Semantic differential

Typically used in attitudinal research (Osgood & Tannenbaum, 1955).

It is generally used in reference to one or more stimuli, such as a particular person, political party, or racial/religious group.

The target stimulus is followed by a list of adjective pairs representing opposite ends of a continuum.

Semantic differential

The adjective pairs can be unipolar:
– Unfriendly ____ Friendly
or bipolar:
– Hostile ____ Friendly

The respondent is required to place a mark between the adjectives to indicate the appropriate level of their response.

Semantic differential example. Target stimulus: Students

Happy __ __ __ __ __ __ __ Sad
Hard working __ __ __ __ __ __ __ Lazy
Stressed __ __ __ __ __ __ __ Relaxed

Visual Analog

The visual analog scale is similar to the semantic differential in that the respondent is required to mark their response between a pair of descriptors.

The difference is that the visual analog scale uses a continuous line rather than discrete categories.

Visual Analog

At the dentist I feel:
Relaxed ______________________ Frightened
Comfortable ______________________ Uncomfortable
No pain ______________________ A lot of pain

Visual Analog

The visual analog scale is very sensitive and can detect smaller changes than Likert or semantic differential scales.

It is therefore useful if an intervention is being assessed, or if the variable is transient (e.g. mood).

Memory effects are minimal with the visual analog scale, since respondents find it hard to recall exactly where they placed a previous mark.
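Because the visual analog scale is a continuous line, a response is usually scored by measuring the distance of the mark from one end. A minimal sketch, assuming the distance is measured in millimetres from the left-hand descriptor and rescaled to 0 to 100; the 100 mm default line length and the function name `vas_score` are assumptions, not part of the slides:

```python
def vas_score(mark_mm, line_length_mm=100.0):
    """Convert the distance of a mark from the left anchor into a 0-100 score.

    0 means the mark is on the left descriptor (e.g. 'Relaxed'),
    100 means it is on the right descriptor (e.g. 'Frightened').
    """
    if not 0 <= mark_mm <= line_length_mm:
        raise ValueError("the mark must lie on the line")
    return 100.0 * mark_mm / line_length_mm


# Hypothetical marks from one respondent before and after an intervention.
before = vas_score(72)   # 72 mm along a 100 mm line
after = vas_score(64)    # a shift of 8 mm
print(before, after, before - after)  # 72.0 64.0 8.0

# Rescaling lets lines printed at different lengths be compared.
print(vas_score(55.5, line_length_mm=150))  # 37.0
```

A shift of 8 points on a 0 to 100 line could easily stay within a single category of a 5-point Likert item, which is one way to see why the continuum picks up smaller changes.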

Forced Choice

Forced choice usually involves a binary choice, such as ‘yes/no’ or ‘agree/disagree’.

It is generally considered inappropriate for clinical symptom, mood or aptitude measures.

It can be effective at discriminating between different ‘types’.


Some forced-choice formats include a ‘don’t know’ or ‘?’ option. A decision has to be made on how to score this response.

Many respondents find the format too restrictive.

Many items are needed to generate variability.


1. Does your mood often go up and down? YES NO
2. Do you take much notice of what people think? YES NO
3. If you say you will do something, do you always keep your promise no matter how inconvenient it might be? YES NO
4. Are you a talkative person? YES NO

All Questionnaires

All questionnaires should include:
– Background information, with space for demographic details
– Instructions: clear and concise, with an example if thought necessary
– A clear layout

All Questionnaires

Do not mix types of response format.

Do not mix labels on a Likert scale within the same scale.

Different scales can be included in a questionnaire, but make sure that each section has its own information and instructions.

Initial item review

The initial pool of items should be reviewed by experts in the content area on the basis of:
– relevance
– clarity and conciseness
– content area omissions
– alternative manifestations

Initial scale administration

The new scale needs to be administered to a large sample. Nunnally (1978) recommends no fewer than 300.

If the scale is measuring a single construct with few items, a smaller sample may be used.

Ensure that the sample is as representative of your target population as possible.

Exercise

Using the test specification from the first exercise:
– decide on a weighting scheme
– write three items for each cell
– decide on a response format, and explain why
– decide what sample the scale would be administered to

Give a 5-minute presentation of your work.

Evaluate items

Items must be evaluated in terms of reliability and validity.

A necessary prerequisite is determining how many variables, or factors, are being measured. This is done using factor analysis.

Each subscale is then analysed separately.

Reliability: Item to total

All the items should be highly correlated with one another.

Each item can be correlated with the total of the scale items, either including or excluding the item itself. Excluding it (the corrected item-total correlation) avoids inflating the correlation, because an item always correlates with a total that contains it.

Items with low item-to-total correlations will have low reliability.
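The item-to-total correlations can be computed directly from a respondents-by-items table. A minimal sketch in plain Python, assuming the items are already reverse-coded; the data are hypothetical, and the fourth item is deliberately unrelated to the others so that it stands out:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_total_correlations(data, corrected=True):
    """Correlate each item with the scale total.

    data: one list of item scores per respondent.
    corrected=True excludes the item from the total it is correlated with;
    corrected=False uses the full total, which includes the item itself.
    """
    totals = [sum(row) for row in data]
    results = []
    for j in range(len(data[0])):
        item = [row[j] for row in data]
        rest = [t - x for t, x in zip(totals, item)] if corrected else totals
        results.append(pearson(item, rest))
    return results


# Hypothetical data: 6 respondents x 4 items on a 1-5 scale.
data = [
    [1, 2, 1, 3],
    [2, 2, 2, 5],
    [3, 3, 3, 1],
    [4, 4, 3, 4],
    [5, 4, 5, 2],
    [5, 5, 4, 3],
]
for j, r in enumerate(item_total_correlations(data), start=1):
    print(f"Item {j}: corrected item-total r = {r:.2f}")
```

Here the fourth item's corrected correlation is negative, flagging it for removal. Running with `corrected=False` gives item 1 a higher value than the corrected version, which illustrates how including the item in the total tends to inflate the correlation.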

Reliability: Coefficient alpha

Coefficient alpha gives an estimate of the scale's reliability.

It usually falls between 0.0 and 1.0, with higher values indicating higher reliability.

There is a positive relationship between the number of items in a scale and estimates of alpha.
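Coefficient alpha is conventionally computed as alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. A minimal sketch of that formula, together with the standardized form k·r / (1 + (k - 1)·r), where r is the average inter-item correlation; holding r fixed, the standardized form shows why alpha rises with the number of items. The data are hypothetical:

```python
from statistics import variance


def cronbach_alpha(data):
    """Coefficient alpha for a respondents x items table of (reverse-coded) scores."""
    k = len(data[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in data]) for j in range(k)]
    total_var = variance([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def standardized_alpha(k, mean_r):
    """Alpha implied by k items whose average inter-item correlation is mean_r."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)


# Hypothetical data: 6 respondents x 3 items on a 1-5 scale.
data = [
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [4, 4, 3],
    [5, 4, 5],
    [5, 5, 4],
]
print(f"alpha = {cronbach_alpha(data):.3f}")

# Same average inter-item correlation, more items: alpha goes up.
for k in (5, 10, 20):
    print(k, "items:", round(standardized_alpha(k, 0.3), 2))
```

With an average inter-item correlation of 0.3, the standardized formula gives about .68 for 5 items, .81 for 10 and .90 for 20, so a long scale can reach a high alpha even when its items are only modestly related.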

Item analysis: Item variances

The variance (σ²) of an item indicates its variability.

If an item has a relatively low variance, it is not differentiating between individuals.
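A minimal sketch of screening item variances, using hypothetical responses on a 1 to 5 scale. The cut-off of 0.5 is an arbitrary illustration: the slides give no threshold, and what counts as 'relatively low' depends on the other items in the scale.

```python
from statistics import variance


def item_variances(data):
    """Sample variance of each item (column) in a respondents x items table."""
    return [variance([row[j] for row in data]) for j in range(len(data[0]))]


def low_variance_items(data, threshold):
    """Indices of items whose variance falls below a chosen threshold."""
    return [j for j, v in enumerate(item_variances(data)) if v < threshold]


# Hypothetical data: 5 respondents x 3 items. Almost everyone answers 4 to item 2.
data = [
    [1, 4, 2],
    [3, 4, 4],
    [5, 4, 1],
    [2, 5, 5],
    [4, 4, 3],
]
print(item_variances(data))
print(low_variance_items(data, threshold=0.5))  # [1], i.e. the second item
```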

Item analysis: Item means

Extremely low or high means for individual items suggest that the wording of the item is too extreme and that floor or ceiling effects are occurring.

Such items will have little power to discriminate and should therefore be discarded.
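Item means can be screened in the same spirit. A minimal sketch that flags items whose mean lies close to either end of the response scale; the data, the item wording in the comments and the 0.5-point margin are all hypothetical choices:

```python
from statistics import mean


def flag_extreme_means(data, min_option, max_option, margin):
    """Flag items whose mean lies within `margin` of either end of the scale.

    A mean near the minimum suggests a floor effect; near the maximum,
    a ceiling effect. Returns {item_index: (label, mean)}.
    """
    flags = {}
    for j in range(len(data[0])):
        m = mean([row[j] for row in data])
        if m <= min_option + margin:
            flags[j] = ("floor", m)
        elif m >= max_option - margin:
            flags[j] = ("ceiling", m)
    return flags


# Hypothetical 1-5 agreement ratings: 5 respondents x 3 items, e.g.
# item 1 'I try to kill myself regularly', item 2 a moderate item,
# item 3 'I have some positive attributes'.
data = [
    [1, 2, 5],
    [1, 4, 5],
    [1, 3, 4],
    [2, 5, 5],
    [1, 1, 5],
]
print(flag_extreme_means(data, min_option=1, max_option=5, margin=0.5))
```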

Criterion-referenced items

Items can be selected on their ability to predict some external criterion.

For a conservatism scale, items that can predict political preferences should be retained.

For an IQ test, items that can predict school/university performance should be retained.
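Criterion-referenced selection amounts to correlating each item with the external criterion and keeping the items that predict it. A minimal sketch, assuming a hypothetical conservatism scale and a binary criterion (1 = votes for a conservative party, 0 = does not); with a 0/1 criterion the Pearson correlation is the point-biserial correlation. The 0.3 cut-off and the function name `select_by_criterion` are illustrative assumptions:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def select_by_criterion(data, criterion, min_r):
    """Return (indices of retained items, all item-criterion correlations)."""
    rs = [pearson([row[j] for row in data], criterion) for j in range(len(data[0]))]
    keep = [j for j, r in enumerate(rs) if r >= min_r]
    return keep, rs


# Hypothetical: 6 respondents x 3 conservatism items (1-5), plus voting criterion.
data = [
    [5, 2, 4],
    [4, 3, 5],
    [2, 4, 1],
    [1, 2, 2],
    [4, 3, 4],
    [2, 4, 1],
]
votes_conservative = [1, 1, 0, 0, 1, 0]

keep, rs = select_by_criterion(data, votes_conservative, min_r=0.3)
print("item-criterion r:", [round(r, 2) for r in rs])
print("retained items:", keep)  # [0, 2]
```

Items are assumed to be scored so that a high value means more of the variable; otherwise a strongly negative correlation would wrongly lead to rejection.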
