Questionnaire Design and Evaluation Mark Shevlin.
Type of Psychological Tests
Psychological tests can be used to measure
– General ability (IQ)
– Specific abilities
– Attitudes
– Interests
– Clinical pathology
– Personality traits
Type of Psychological Tests
The guidelines in this lecture relate to
– Attitudes
– Interests
– Personality traits
Always make sure that there is not an already published scale available.
Guidelines in Scale Construction
– What do you want to measure
– Generate an item pool
– Decide on appropriate response format
– Initial item review and development sample
– Evaluate items
– Optimise scale content
What do you want to measure
You will be attempting to measure a variable, a dimension along which people are different.
The variable will be a latent, unobservable variable.
Developing a scale requires a clear and concise understanding of what you are trying to measure.
Level of generality
Variables can be measured at different levels of specificity.
Specificity refers to the breadth of the construct under consideration.
Some measures tap a very specific, small group of behaviours (e.g. punctuality).
Some measures tap a very broad and general group of behaviours (e.g. introversion).
Level of generality
The level of generality has an influence on the ‘bandwidth fidelity trade-off’.
A measure with narrow bandwidth (specific) should be good at predicting a small number of behaviours, but poor at predicting a range of behaviours.
A measure with broad bandwidth (general) should be reasonable at predicting a large number of behaviours, but poor at predicting specific behaviours.
Level of generality: Narrow
A punctuality measure would be good at predicting time of arrival at classes, how often a person was late for work etc.
A punctuality measure would be poor at predicting social or interpersonal behaviour.
Level of generality: Broad
A sociability measure would be poor at predicting time of arrival at classes, how often a person was late for work etc.
A sociability measure would be good at predicting many social or interpersonal behaviours.
Example
Extraversion
– Sociability
– Activity
– Excitability
Example items: ‘Do you enjoy meeting new people?’, ‘Do you like plenty of bustle and excitement around you?’, ‘Do you like mixing with people?’
Exercise
Name three general variables that may interest psychologists. What type of behaviours would they predict?
Name three specific, or narrow, variables that may interest psychologists. What type of behaviours would they predict?
Item Pool
An item pool is a large number of initial questions that may be included in the final questionnaire.
Item pools can be generated simply by thinking of items that reflect the variable of interest.
Preferably you should use a blueprint.
Item Pool
A blueprint, or test specification, is a framework for developing the questionnaire.
It requires you to specify content areas. The content areas should cover everything that is relevant to the purpose of the questionnaire.
Manifestations refer to the way that the content areas may manifest themselves.
Item Pool
More specifically, different types of manifestations should be identified
– Behavioural: instances of behaviour related to a content area
– Cognitive: the way of thinking related to a content area
– Affective: the way a person feels related to a content area
Item Pool
The content areas and manifestations should form the axes of a grid.
[Grid: content areas along one axis, manifestations along the other]
Item Pool
You should use between 4 and 7 categories for each axis.
An example of a blueprint for measuring social anxiety (defined as an anxiety response to social interaction).
Each cell should be completed showing how each content area may become manifest - BUT NOT NOW
Content areas
A. Anxiety at meeting new people
B. Anxiety at speaking publicly
C. Anxiety at being in a public place
Manifestations
A. Avoidance
B. Tension
C. Feelings of worry
D. Thinking people do not like me

                  Content areas
Manifestations    A      B      C
A
B
C
D
Exercise
Construct a test specification (5 x 5) for one of the following variables.
– Fear of technology
– Trust
– Loneliness
– Happiness
Weighting content areas and manifestations
You may decide that not all content areas and manifestations are equally important in representing the variable of interest.
You may want to weight some areas and manifestations more heavily depending on their importance.
First, determine number of items.
Weighting content areas and manifestations
Determining number of items.
– At least 20.
– Smaller numbers if sample is elderly or very young.
– Remember that 50% of the items may be removed.
– Rough guide is between 40 and 100.
In this example 100 items will be initially developed.
Weighting content areas and manifestations
In this example 100 items will be initially developed.
It is believed that anxiety at meeting new people is a very important content area, and that all the manifestations are equally important.
The blueprint could be specified as follows.
Content areas
A. Anxiety at meeting new people (60%)
B. Anxiety at speaking publicly (20%)
C. Anxiety at being in a public place (20%)
Manifestations
A. Avoidance (25%)
B. Tension (25%)
C. Feelings of worry (25%)
D. Thinking people do not like me (25%)

                  Content areas
Manifestations    A (60%)   B (20%)   C (20%)
A (25%)
B (25%)
C (25%)
D (25%)
Weighting content areas and manifestations
If 100 items are to be developed, the number to be written for each cell can be calculated.

                  Content areas
Manifestations    A (60%)   B (20%)   C (20%)   Total
A (25%)           15        5         5         25
B (25%)           15        5         5         25
C (25%)           15        5         5         25
D (25%)           15        5         5         25
Total             60        20        20        100
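The cell counts in the grid above follow from multiplying the total number of items by the content area weight and the manifestation weight. A minimal sketch of that arithmetic is given below; the function name `allocate_items` and the dictionary layout are illustrative assumptions, not part of the lecture.

```python
# Hypothetical helper: the name and data layout are illustrative only.
def allocate_items(total_items, content_weights, manifestation_weights):
    """Return {manifestation: {content_area: item_count}} for a blueprint grid."""
    grid = {}
    for m_label, m_weight in manifestation_weights.items():
        grid[m_label] = {
            # items in a cell = total x content weight x manifestation weight
            c_label: round(total_items * c_weight * m_weight)
            for c_label, c_weight in content_weights.items()
        }
    return grid


# Weights taken from the social anxiety example in the lecture.
content = {"A": 0.60, "B": 0.20, "C": 0.20}
manifest = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}

cells = allocate_items(100, content, manifest)
for label, row in cells.items():
    print(label, row, "row total:", sum(row.values()))
```

With other weights or totals the rounded cell counts may not add up exactly to the intended total, so the grid should be checked and adjusted by hand.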
Writing Items
Writing items involves constructing questions or statements relating to each cell in the test specification.
The nature of the statements will depend on the response format used.
There are some guidelines to writing good items.
Writing Items
Items should be concise, clear and unambiguous.
You should avoid long, wordy items.
Construct your items to be compatible with the target sample in terms of reading difficulty (e.g. children or elderly).
Writing Items
Avoid double negatives
– ‘I am not in favour of the government not making drugs legal’
Avoid double barrelled items that include two or more issues
– ‘I agree that crime should always be punished and hanging should return’
Writing Items
Try to avoid floor effects (all respondents scoring low or negatively), which are caused by making items too extreme.
– ‘I try to kill myself regularly’
– ‘I hear voices telling me what to do’
– ‘I am too nervous to speak to anyone’
– ‘I drink more than 300 units of alcohol each week’
Writing Items
Try to avoid ceiling effects (all respondents scoring high or positively), which are caused by making items too easy to endorse or answer.
– ‘I have some positive attributes’
– ‘What is 1+1?’
– ‘I am too nervous to speak to anyone’
Writing Items
Include some negatively worded items to reduce response set, or acquiescence (agreeing with all the items). Remember to reverse code these items.
– I feel I have a number of good qualities
– On the whole, I am satisfied with myself
– I feel useless at times
– I feel I do not have a lot to be proud of
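Reverse coding can be sketched as follows, assuming a 5-point response scale scored 1 to 5 (the lecture does not fix the scale for these items). The item keys and the responses are made up for illustration.

```python
def reverse_code(score, low=1, high=5):
    """Flip a score on a low..high scale, e.g. 1 <-> 5 and 2 <-> 4 on a 1-5 scale."""
    return low + high - score


# One respondent's made-up answers to the four items above (1-5 agreement scale).
responses = {"good_qualities": 4, "satisfied": 5, "useless": 2, "not_proud": 1}

# The two negatively worded items must be reversed before scores are summed.
negatively_worded = {"useless", "not_proud"}

scored = {
    item: reverse_code(value) if item in negatively_worded else value
    for item, value in responses.items()
}
total = sum(scored.values())
print(scored, total)
```

After reverse coding, a high score on every item points in the same direction, so the item scores can be added into a scale total.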
Response Format
Types of scaling
– Likert
– Semantic differential
– Visual analog
– Forced choice binary
Likert
The item is presented as a declarative statement and the response options reflect varying degrees of agreement or disagreement.
Between 5 and 7 options is usual.
The respondent is asked to circle the appropriate category.
Likert
The categories should be labelled so as to represent equal intervals.
An optional midpoint can be used, but
– how is it scored?
– what does it mean?
Scale the items so that a high level of the variable you are measuring is reflected in a high category value.
Likert
1. I enjoy going to parties.
Strongly Disagree   1   2   3   4   5   Strongly Agree
1. I enjoy going to parties.
1 = Strongly Disagree, 2 = Disagree, 3 = Neither agree nor disagree, 4 = Agree, 5 = Strongly Agree
Likert: Assessing frequency
1. I feel gloomy
1 = Never, 2 = Hardly ever, 3 = Occasionally, 4 = Sometimes, 5 = Always
1. I feel gloomy
1 = Less than once a month, 2 = Once a month, 3 = Once a week, 4 = Some days, 5 = All day
Semantic differential
Typically used in attitudinal research (Osgood & Tannenbaum, 1955).
Is generally used in reference to one or more stimuli, such as a particular person, political party, or racial/religious group.
The target stimulus is followed by a list of adjective pairs representing opposite ends of a continuum.
Semantic differential
The adjective pairs can be unipolar
– Unfriendly ... Friendly
or bipolar
– Hostile ... Friendly
The respondent is required to place a mark between the adjectives to indicate the appropriate level of their response.
Semantic differential
Students
Happy          __ __ __ __ __ __ __   Sad
Hard Working   __ __ __ __ __ __ __   Lazy
Stressed       __ __ __ __ __ __ __   Relaxed
Visual Analog
The visual analog scale is similar to the semantic differential in that the respondent is required to mark their response between a pair of descriptors.
The difference is that the visual analog uses a continuum.
Visual Analog
At the dentist I feel
Relaxed       ______________________   Frightened
Comfortable   ______________________   Uncomfortable
No pain       ______________________   A lot of pain
Visual Analog
The visual analog scale is very sensitive and can detect smaller changes than the Likert or semantic differential scales.
It is therefore useful if an intervention is being assessed, or if the variable is transient (e.g. mood).
Memory effects are minimal with the visual analog scale.
Forced Choice
Forced choice usually involves a binary choice such as ‘yes/no’ or ‘agree/disagree’.
Generally considered inappropriate for clinical symptoms, mood or aptitude measures.
Can be effective at discriminating between different ‘types’.
Forced Choice
Some forced choice formats may include a ‘don’t know’ or ‘?’ option. A decision has to be made on how to score this response.
Found by many respondents to be too restrictive.
Many items needed to generate variability.
Forced Choice
1. Does your mood often go up and down? YES NO
2. Do you take much notice of what people think? YES NO
3. If you say you will do something, do you always keep your promise no matter how inconvenient it might be? YES NO
4. Are you a talkative person? YES NO
All Questionnaires
All questionnaires should include
– Background information, with space for demographic details
– Instructions; clear and concise with example if thought necessary
– Keep layout clear
All Questionnaires
Do not mix types of response format.
Do not mix labels on a Likert scale in the same scale.
Different scales can be included in a questionnaire, but make sure that there is information and instructions for each section.
Initial item review
The initial pool of items should be reviewed by experts in the content area on the basis of
– relevance
– clarity and conciseness
– content area omissions
– alternative manifestations
Initial scale administration
The new scale needs to be administered to a large sample. Nunnally (1978) recommends no fewer than 300.
If the scale is measuring a single construct, with few items, a smaller sample size may be used.
Ensure that the sample is as representative of your target population as possible.
Exercise
Using the test specification from the first exercise
– decide on a weighting scheme
– write three items for each cell
– decide on a response format: explain why
– what sample would the scale be administered to?
5 minute presentation of work.
Evaluate items
Items must be evaluated in terms of reliability and validity.
A necessary prerequisite is determining how many variables, or factors, are being measured. This is done by using factor analysis.
Each subscale is then analysed separately.
Reliability: Item to total
All the items should be highly correlated.
Each item can be correlated with the total scale score, calculated either including or excluding the item itself.
Items with low item to scale correlations will have low reliability.
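As a rough sketch, the version that excludes the item itself (often called the corrected item to total correlation) can be computed as below. The function names and the small data set are made up for illustration; the fourth item is deliberately unrelated to the others.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (fails if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(data):
    """data: one list of item scores per respondent. Returns one r per item."""
    correlations = []
    for j in range(len(data[0])):
        item = [row[j] for row in data]
        rest = [sum(row) - row[j] for row in data]  # total excluding the item itself
        correlations.append(pearson(item, rest))
    return correlations


# Made-up responses: 6 respondents x 4 items on a 1-5 scale.
data = [
    [1, 2, 1, 5],
    [2, 2, 3, 1],
    [3, 3, 3, 4],
    [4, 4, 5, 2],
    [5, 4, 4, 3],
    [5, 5, 5, 1],
]
r_values = corrected_item_total(data)
for number, r in enumerate(r_values, start=1):
    print(f"item {number}: r = {r:.2f}")
```

In this made-up data the first three items correlate strongly with the rest of the scale, while the fourth does not and would be a candidate for removal.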
Reliability: Coefficient alpha
This gives an estimate of the scale's reliability.
Scaled between 0.0 and 1.0, with higher values indicating higher reliability.
There is a positive relationship between the number of items in a scale and estimates of alpha.
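The lecture does not give the formula. The sketch below uses the standard definition of coefficient alpha, k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total score), where k is the number of items. The data are made up for illustration.

```python
from statistics import variance


def cronbach_alpha(data):
    """data: one list of item scores per respondent (at least two items)."""
    k = len(data[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in data]) for j in range(k)]
    # Sample variance of the total scale score.
    total_variance = variance([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up responses: 6 respondents x 3 items on a 1-5 scale.
data = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 3, 3],
    [4, 4, 5],
    [5, 4, 4],
    [5, 5, 5],
]
alpha = cronbach_alpha(data)
print(f"alpha = {alpha:.2f}")
```

Negatively worded items must be reverse coded before alpha is calculated, otherwise they will pull the estimate down.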
Item analysis: Item variances
The variance (σ²) of an item indicates its variability.
If an item has a relatively low variance, this indicates that it is not differentiating individuals.
Item analysis: Item means
Extremely low or high means for individual items suggest that the wording of the item is too extreme and floor or ceiling effects are occurring.
Such items will have little power to discriminate and therefore should be discarded.
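A simple screen on item means and variances might look like the sketch below. The cut-off values (`MARGIN`, `MIN_VARIANCE`) are hypothetical and not taken from the lecture; they assume a 1 to 5 response scale and should be adjusted to the scale in use. The data are made up.

```python
from statistics import mean, variance

# Hypothetical thresholds for a 1-5 response scale.
LOW, HIGH = 1, 5
MARGIN = 0.5        # how close an item mean may get to either end of the scale
MIN_VARIANCE = 0.5  # below this, an item barely differentiates respondents


def screen_items(data):
    """data: one list of item scores per respondent. Returns one report per item."""
    reports = []
    for j in range(len(data[0])):
        scores = [row[j] for row in data]
        item_mean, item_variance = mean(scores), variance(scores)
        problems = []
        if item_mean <= LOW + MARGIN:
            problems.append("possible floor effect")
        if item_mean >= HIGH - MARGIN:
            problems.append("possible ceiling effect")
        if item_variance < MIN_VARIANCE:
            problems.append("low variance")
        reports.append({"item": j + 1, "mean": item_mean,
                        "variance": item_variance, "problems": problems})
    return reports


# Made-up responses: 5 respondents x 3 items; nearly everyone scores 5 on item 3.
data = [
    [2, 4, 5],
    [3, 1, 5],
    [4, 5, 5],
    [1, 2, 4],
    [5, 3, 5],
]
reports = screen_items(data)
for report in reports:
    print(report)
```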
Criterion referenced items
Items can be selected on their ability to predict some external criterion.
For a conservatism scale, items should be retained that can predict political preferences.
For an IQ test, items should be retained that can predict school/university performance.
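One simple way to apply this is to correlate each item with the criterion and keep the items that reach a chosen minimum. The sketch below assumes a numeric criterion; the cut-off of 0.3, the item names and the data are hypothetical and for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (fails if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


MIN_R = 0.3  # hypothetical minimum item-criterion correlation

# Made-up data: an external criterion score and two items for 5 respondents.
criterion = [1, 2, 3, 4, 5]
items = {
    "item_a": [1, 2, 2, 4, 5],  # tracks the criterion closely
    "item_b": [4, 1, 3, 5, 2],  # unrelated to the criterion
}

validities = {name: pearson(scores, criterion) for name, scores in items.items()}
retained = [name for name, r in validities.items() if abs(r) >= MIN_R]
print(validities, retained)
```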