Validity: Conceptual Issues Furr & Bacharach Chapter 8.
Contrasting Reliability & Validity
Both fundamental to a sophisticated understanding of psychometrics
Must have a clear understanding of the relationship between the two
Definitions: notice the differences
Reliability
Degree to which differences in test scores reflect differences among people in their levels of the trait that affects those scores, whatever that trait may be
Quantitative property of the test scores
Validity
Tied to the interpretation of test scores
Tied to theory and to the implications of scores
Link: validity requires reliability
Stable traits (intelligence and IQ): measure at two points in time; scores should be stable across time (test-retest reliability). If not, the test cannot be a valid test of IQ
States (depression and BDI): if internal consistency is poor, the test can't be valid
Reliability does not imply validity
Stable trait (autism and AQ): the test may have excellent test-retest reliability or good internal consistency, but its scores may still not be interpreted in a valid manner
Iowa story: don't want to hire people who might abuse clients anymore! Personality tests…
Is there a test that measures the construct?
Does it validly measure abusive personality?
Is there a test that was designed to predict the likelihood that a particular individual will abuse people?
Validity: Definition
Basic definition: the degree to which a test measures what it is supposed to measure
Contemporary definition:
“The degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses” of the test
Validity is about the interpretation and use of test scores
NEO-PI-R Conscientiousness scale: 48 items
High scores reflect an “active process of planning, organizing and carrying out tasks,” and people with high scores on this scale are “purposeful, strong willed, and determined”
NEO-PI-R Conscientiousness Scale
What is the correct question about the scale’s validity or invalidity?
Are the test items valid or invalid?
Are the test scores valid or invalid?
Is the interpretation of the test scores valid or invalid?
Not “are items or scores valid or invalid?”
The question is: Are the authors’ interpretations of the scores valid or invalid?
Are conscientiousness scores validly interpreted in terms of planfulness, organization, and determination?
Proposed use of scores…
Employers may use NEO-PI-R Conscientiousness Scale to screen potential employees
Belief: the scale differentiates potentially better from potentially worse employees?
Predictive power of Conscientiousness scale scores?
Simplistic & inaccurate to say…
“Conscientiousness scale is valid without regard to the way in which it will be interpreted and used”
Rather (what is accurate):
Scores can be interpreted validly as an indicator of conscientiousness
The scale is not valid as a measure of intelligence or extraversion
Not a valid predictor of successful employment
Compare:
“Scores on the Conscientiousness scale of the NEO-PI-R are validly interpreted as a measure of conscientiousness.”
vs. “The Conscientiousness scale of the NEO-PI-R is valid.”
Implication 2: Validity is a matter of degree
Strong vs. weak, NOT valid vs. invalid
Select a test if there is strong enough evidence supporting its intended interpretation and use
http://www.wired.com/wired/archive/9.12/aqtest.html
Concern about the Autism Spectrum Quotient…
Marginal internal consistency, so reliability is already a concern
What about validity? Is it valid to interpret a high score on the test as reflecting a high degree of autism traits?
Interpretation of AQ: correlations of the Autism Spectrum Quotient with

Scale                    Pearson r   Sig. (2-tailed)   N
Magical Ideation         .371**      .000              114
Physical Anhedonia       .231*       .013              114
Perceptual Aberration    .230*       .014              114
Social Anhedonia         .573**      .000              114

Scale                    Pearson r   Sig. (2-tailed)   N
SCID-II Paranoid         .399**      .000              179
SCID-II Schizotypy       .314**      .000              179
SCID-II Schizoid         .309**      .000              179
SCL-90 Paranoid          .255**      .001              178
SCL-90 Psychoticism      .194**      .010              178

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
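The Sig. values in these tables depend on both r and N. As a rough check, the sketch below approximates the two-tailed p for several of the correlations with the Fisher z transform. This is a swapped-in large-sample approximation, not necessarily how the original output was produced (output like this is usually based on a t test with N - 2 degrees of freedom), so the third decimal may differ slightly, for example for the .194 correlation. The function name is hypothetical.

```python
import math

def approx_two_tailed_p(r, n):
    """Approximate two-tailed p for H0: rho = 0 using the Fisher z transform.

    Under H0, z = atanh(r) * sqrt(n - 3) is approximately standard normal,
    so p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

# (label, r, N) copied from the AQ tables above
rows = [
    ("Physical Anhedonia", 0.231, 114),
    ("Perceptual Aberration", 0.230, 114),
    ("Social Anhedonia", 0.573, 114),
    ("SCL-90 Psychoticism", 0.194, 178),
]
for label, r, n in rows:
    print(f"{label}: r = {r}, N = {n}, approx. p = {approx_two_tailed_p(r, n):.3f}")
```

The point for validity: with N over 100, even correlations around .2 reach significance, so "significant" here says the AQ is related to these scales, not that the relation is strong.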
Regret vs. Autism? (r = .45)
Regret Scale
1. Whenever I make a choice, I’m curious about what would have happened if I had chosen differently.
2. Whenever I make a choice, I try to get information about how the other alternatives would have turned out.
3. If I make a choice and it turns out well, I still feel like somewhat of a failure if I find out another choice would have turned out better.
4. When I think about how I’m doing in life, I often assess opportunities I have passed up.
5. Once I make a decision, I don’t look back.
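Scoring a short self-report scale like this one usually means re-keying items worded in the opposite direction before combining them. The sketch below is illustrative only: it assumes a 1 to 7 agreement format, treats item 5 as reverse-keyed (its wording runs opposite to items 1 to 4), and uses the item mean as the scale score. None of these scoring details are given in the slides, and `score_regret` is a hypothetical name.

```python
# Assumptions (not stated in the slides): responses on a 1..7 agreement scale,
# item 5 ("Once I make a decision, I don't look back.") is reverse-keyed,
# and the scale score is the mean of the re-keyed items.
SCALE_MIN, SCALE_MAX = 1, 7
REVERSE_KEYED = {5}  # 1-based item numbers worded opposite to the construct

def score_regret(responses):
    """Return the mean Regret score for one respondent's five item responses."""
    if len(responses) != 5:
        raise ValueError("expected 5 responses")
    keyed = []
    for item_number, value in enumerate(responses, start=1):
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"item {item_number}: {value} outside {SCALE_MIN}..{SCALE_MAX}")
        if item_number in REVERSE_KEYED:
            value = SCALE_MIN + SCALE_MAX - value  # maps 7 -> 1, 6 -> 2, ..., 1 -> 7
        keyed.append(value)
    return sum(keyed) / len(keyed)

# Hypothetical respondent: agrees with items 1-4 and also with "I don't look back"
print(score_regret([6, 5, 6, 7, 7]))
```

Forgetting to re-key item 5 would pull this respondent's score toward the middle and blur what the total is supposed to mean, which is itself a threat to a valid interpretation.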
Maximization Scale
1. When I watch TV, I channel surf, often scanning through the available options even while attempting to watch one program.
2. When I am in the car listening to the radio, I often check other stations to see if something better is playing, even if I’m relatively satisfied with what I’m listening to.
3. I treat relationships like clothing; I expect to try a lot on before I get the perfect fit.
4. No matter how satisfied I am with my job, it’s only right for me to be on the lookout for better opportunities.
5. I often fantasize about living in ways that are quite different from my actual life.
6. I ’m a big fan of lists that attempt to rank things (the best movies, the best singers, the best athletes, the best novels, etc.).
7. I often find it difficult to shop for a gift for a friend.
8. When shopping, I have a hard time finding clothing that I really love.
9. Renting videos is really difficult. I’m always struggling to pick the best one.
10. I find that writing is very difficult, even if it’s just a letter to a friend, because it’s so hard to word things just right. I often do several drafts of even simple things.
11. No matter what I do, I have the highest standards for myself.
12. I never settle for second best.
13. Whenever I’m faced with a choice, I try to imagine what all the other possibilities are, even ones that aren’t present at the moment.
What is to be measured?
What are the relative strengths of the alternatives that are available to measure that construct?
Select the best measures of the specific characteristics to be assessed
Implication 3: Validity of a test’s interpretation is based on evidence and theory
Human resources: “…in her experience, use of NEO-PI-R was useful in selection”
“Personality Color Test”
Based on color psychology (Max Lüscher): color preferences reveal something about your personality
A survey of the scientific literature finds almost no empirical evidence for the validity of color preferences as a measure of personality characteristics
Evidence for the “color test” is less than clear, yet the site implies validity. Web site:
“Is the test reliable? We leave that to your opinion. We can only say that there are a number of corporations and colleges that use the Lûscher test as part of their hiring/admissions processes. It can be a useful tool for doctors and psychologists as well and is used to get a quick overview of potential issues patients may have in their lives.”
http://colorquiz.com/
“Color Quiz”
Is the test useful as a measure of personality?
Denied employment based on such a test?
Empirical evidence & theoretical underpinnings?
Data from high quality research must be available.
Theory alone is not adequate.
Contemporary view of validity
Although validity has traditionally been divided into three forms (content, criterion, and construct), the contemporary perspective highlights CONSTRUCT VALIDITY
Standards
Standards for Educational and Psychological Testing, revised (1999)
Co-published by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME)
Standards outline 5 types of evidence relevant for establishing validity of test interpretations (AERA, APA, NCME, 1999)
Construct Validity, based on evidence from:
Associations With Other Variables
Internal Structure
Test Content
Response Processes
Consequences of Use
Validity Evidence: Test Content
Match between the actual content of a test and the content that should be included in the test.
Psychological nature of the construct should dictate the appropriate content of the test.
Face Validity
Face validity – the degree to which a measure appears to be related to a specific construct in the judgment of non-experts such as test takers and representatives of the legal system.
A test that LOOKS relevant may be more likely to be well received by users and test takers
Threats to content validity
Construct-irrelevant content: e.g., the test includes questions on content not covered in the book, lecture, or discussion
Construct underrepresentation: e.g., the test content fails to represent the full scope of the content implied by the construct
Related practical issues: e.g., time, respondent fatigue, and respondent attention. Is the content a fair representation?
Content Validity vs. Face Validity
Content validity is the degree to which the content reflects the full domain of the construct and can only be evaluated by experts who have a deep understanding of the construct
Face validity is the degree to which non-experts perceive the test to be relevant to what they believe is being measured by it
Validity Evidence: Internal Structure of the Test
For a test to be validly interpreted as a measure of a particular construct, the actual structure of the test should match the theoretically based structure of the construct
Does the theoretical basis suggest a unidimensional or a multi-dimensional structure?
Internal structure is often assessed by examining the test's factor structure (factor analysis)
Items that are more strongly correlated with each other than with other items form clusters called factors…
Factor analysis should clarify the number of factors within a set of test questions
Example: Self-esteem: is the construct unidimensional or multidimensional?
Factor analysis
1. Clarifies the number of factors
2. Reveals associations among the factors within a multidimensional test
3. Identifies which items are linked to which factors
Rosenberg Self-Esteem Inventory (RSEI; Rosenberg 1989)
1. On the whole, I am satisfied with myself
2. At times, I think I am no good at all
3. I feel that I have a number of good qualities
4. I am able to do things as well as most other people
5. I feel I do not have much to be proud of
6. I certainly feel useless at times
7. I feel that I’m a person of worth, at least on an equal plane with others
8. I wish I could have more respect for myself
9. All in all, I am inclined to feel that I am a failure
10. I take a positive attitude toward myself
RSEI: Scree Plot
How many factors are evident in the plot?
Question: This scree plot provides evidence for what type of structure?
a. Unidimensional
b. Multidimensional
[Scree Plot: eigenvalues (y-axis, 0 to 6) plotted against factor number (x-axis, 0 to 9)]
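A scree plot graphs the eigenvalues of the item correlation matrix from largest to smallest; one dominant eigenvalue followed by a flat "elbow" points to a unidimensional structure. The sketch below, assuming NumPy is available, simulates hypothetical responses to 10 items that all load on a single trait and prints the eigenvalues a scree plot would display. It also counts eigenvalues above 1 (the Kaiser rule of thumb), a swapped-in heuristic added for illustration. The data are simulated, not the RSEI data behind the plot, and the function names are hypothetical.

```python
import numpy as np

def scree_eigenvalues(scores):
    """Eigenvalues of the item correlation matrix, largest first (what a scree plot shows).

    scores: array of shape (n_people, n_items).
    """
    corr = np.corrcoef(scores, rowvar=False)          # items are the columns
    return np.sort(np.linalg.eigvalsh(corr))[::-1]    # eigvalsh: symmetric input, ascending output

def kaiser_count(eigenvalues):
    """Number of factors suggested by the 'eigenvalue > 1' rule of thumb."""
    return int(np.sum(eigenvalues > 1.0))

# Hypothetical data: 10 items that all load 0.7 on ONE latent trait (unidimensional model)
rng = np.random.default_rng(0)
n_people, n_items, loading = 2000, 10, 0.7
trait = rng.standard_normal((n_people, 1))
noise = rng.standard_normal((n_people, n_items))
scores = loading * trait + np.sqrt(1 - loading**2) * noise

eig = scree_eigenvalues(scores)
print(np.round(eig, 2))
print("factors by eigenvalue > 1:", kaiser_count(eig))
```

Plotting `eig` against 1..10 would give one tall first point and a flat tail, the pattern that supports answer (a) for a unidimensional measure. A two-factor simulation would instead show two large eigenvalues before the elbow.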
Validity Evidence: Response Processes
Match between the psychological processes that respondents actually use when completing a measure and the processes that they should use.
When I say start, raise your finger when you feel 10 s have elapsed.
Assumption: you should use “feel” (it feels like time is up), but you could use another process such as covert counting, copying others, or looking at the second hand on a watch
Response processes
If respondents use a response process different from the one the test assumes, the scores may not be interpretable as the test developer intended
Example: attending to the internal feel of time passing vs. deliberately using some other process to mark the passage of time
Validity Evidence: Associations With Other Variables
Match between a measure’s actual associations with other measures and the associations that the test should have with the other measures.
Discriminant evidence
Degree to which test scores are uncorrelated with tests of unrelated constructs
A measure of autism should be uncorrelated with measures of schizophrenia, if the two are unrelated constructs
Data: the AQ correlation tables shown above under “Interpretation of AQ”
Support for Crespi & Badcock’s (C & B) theory?
No: convergent evidence; the autism measure correlated positively with schizophrenia (SZ) measures
Finding: autism (AU) and SZ are related constructs? That is, Crespi & Badcock are wrong?
Or not necessarily: one could instead assume that the strong correlations indicate weak validity of the AQ as a measure of the autism construct
Concurrent validity evidence
The degree to which test scores are correlated with other relevant variables that are measured at the same time as the primary test of interest
Is the SAT a measure of skills needed for academic success? Compare SAT scores from the senior year of high school with senior-year high school GPA
Predictive validity evidence
The degree to which test scores are correlated with relevant variables that are measured at a future point in time.
Is the SAT a measure of skills needed for academic success? Compare SAT scores from the senior year of high school with freshman-year college GPA
Validity Evidence: Consequences of Testing
Social consequences of test are a facet of validity…
Standards for Educational and Psychological Testing: validity includes “the intended and unintended consequences of test use”
E.g., do a construct and its measurement benefit one group?
Not all agree that the consequences of a testing program should be considered a facet of the scientific evaluation of the meaning of a test score.
Some feel that this is an intrusion of politics into science…
Can science be separated from personal and social values?
Summary
Conceptual basis for validity
Construct Validity
Associations With Other Variables
Internal Structure
Test Content
Response Processes
Consequences of Use
Validity
Standards for Educational and Psychological Testing (1999): the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses of a test