Validating Assessment Centers
Kevin R. Murphy
Department of Psychology
Pennsylvania State University, USA
A Prototypic AC
• Groups of candidates participate in multiple exercises
• Each exercise designed to measure some set of behavioral dimensions or competencies
• Performance/behavior in exercises is evaluated by sets of assessors
• Information from multiple assessors is integrated to yield a range of scores
Common But Deficient Validation Strategies
• Criterion-related validity studies
– Correlate OAR with criterion measures
• e.g., OAR correlates .40 with performance measures, but written ability tests do considerably better (.50’s)
– There may be practical constraints on using tests, but psychometric purists are not concerned with the practical
Common But Deficient Validation Strategies
• Construct Validity Studies
– Convergent and Discriminant Validity assessments
• AC scores often show relatively strong exercise effects and relatively weak dimension/competency effects
• This is probably not the right model for assessing construct validity, but it is the one that has dominated much of the literature
Common But Deficient Validation Strategies
• Content validation
– Map competencies/behavioral descriptions onto the job
– If competencies measured by the AC show reasonable similarity to job competencies, content validity is established
• Track record for ACs is nearly perfect because job information is used to select competencies, but evidence that competencies are actually measured is often scant
Ask the Wrong Question, Get the Wrong Answer
• Too many studies ask “Are Assessment Centers Valid?”
• The question should be “Valid for What?”
– That is, validity is not determined by the measurement procedure or even by the data that arise from that procedure. Validity is determined by what you attempt to do with the data
Sources of Validity Information
• Validity for what?
– Determine the ways you will use the data coming out of an AC. ACs are not valid or invalid in general; they are valid for specific purposes
• Cast a wide net!
– Virtually everything you do that gives you insight into what the data coming out of an AC mean can be thought of as part of the validation process
Sources of Validity Information
• Raters
– Rater training, expertise, agreement
• Exercises
– What behaviors are elicited, what situational factors affect behaviors
• Dimensions
– Is there evidence to map from AC behavior to dimensions to job?
Sources of Validity Information
• Scores
– A wide range of assessments of the relationships among the different scores obtained in the AC process provides validity information
• Processes
– Evidence that the processes used in an AC tend to produce reliable and relevant data is part of the assessment of validity
Let’s Validate An Assessment Center!
• Design the AC
• Identify the data that come out of an AC
• Determine how you want to use that data
• Collect and evaluate information relevant to those uses
– Data from pilot tests
– Analysis of AC outcome data
– Evaluations of AC components and process
– Lit reviews, theory and experience
Design
• Job: Entry-level Human Resource Manager
• Competencies
– Active Listening
– Speaking
– Management of Personnel Resources
– Social Perceptiveness
• Being aware of others' reactions and understanding why they react as they do.
– Coordination
• Adjusting actions in relation to others' actions.
– Critical Thinking
– Reading Comprehension
– Judgment and Decision Making
– Negotiation
– Complex Problem Solving
Design
Populate the matrix – which competencies and what exercises?

               Exercise #1   Exercise #2   Exercise #3
Competency 1
Competency 2
Competency 3
Competency 4
Competency 5
Competency 6
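A minimal sketch of what a populated design matrix might look like in code. The exercise names and the assignment of competencies to exercises are hypothetical illustrations, not part of the original design; the point is that once the matrix exists as data, coverage checks become trivial.

```python
from collections import Counter

# Hypothetical competency-by-exercise design matrix (names are placeholders).
design = {
    "In-Basket":      ["Critical Thinking", "Judgment and Decision Making",
                       "Complex Problem Solving"],
    "Role Play":      ["Active Listening", "Social Perceptiveness", "Negotiation"],
    "Group Exercise": ["Speaking", "Coordination", "Negotiation",
                       "Judgment and Decision Making"],
}

# Coverage check: flag competencies measured by only one exercise, since
# single-exercise measurement confounds the competency with the exercise.
counts = Counter(c for comps in design.values() for c in comps)
single = [c for c, n in counts.items() if n < 2]
print("Competencies measured by only one exercise:", single)
```

A check like this makes the design decision explicit before any data are collected: every competency that appears in only one cell of the matrix will be impossible to separate from that exercise's situational demands.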
Assessors
• How many, what type, which exercises?
Exercise 1 Exercise 2 Exercise 3
Assessor 1
Assessor 2
Assessor 3
Assessor 4
Assessor 5
Assessment Data
• Individual behavior ratings?
– How will we set these up so that we can assess their accuracy or consistency?
• Individual competency ratings?
– How will we set these up so that we can assess their accuracy or consistency?
• Pooled ratings
– What level of analysis?
• OAR
Uses of Assessment Data
• Competency
– Is it important to differentiate strengths and weaknesses?
• Exercise
– Is the AC working as expected? (Exercise effects might or might not be confounds)
• OAR
– Do you care how people did overall?
• Other
– Process tracing for integration. Is it important how ratings change in this process?
Validation
• The key question in all validation efforts is whether the inferences you want to draw from the data can be supported or justified
– A question that often underlies this assessment involves determining whether the data are sufficiently credible to support any particular use
Approaches to Validation
• Assessment of the Design
– Competency mapping
• Do exercises engage the right competencies?
• Are competency demonstrations in the AC likely to generalize?
• Are these the right competencies?
– Can assessors discriminate competencies?
• Are the assessors any good?
– Do we know how good they are?
Approaches to Validation
• Assessment of the Data
– Inter-rater agreement
– Distributional assessments
– Reliability and generalizability analysis
– Internal structure
– External correlates
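To make the first item concrete, here is a minimal sketch of an inter-rater agreement check. The ratings are fabricated illustrative numbers (one OAR per candidate per assessor, on a 1-5 scale); any real analysis would also look at exact and within-one agreement rates, not just correlation.

```python
# Hypothetical OARs: one list per assessor, one entry per candidate.
ratings = {
    "assessor_1": [3.0, 4.5, 2.0, 4.0, 3.5, 1.5],
    "assessor_2": [3.5, 4.0, 2.5, 4.5, 3.0, 2.0],
}

def pearson(x, y):
    """Pearson correlation between two equal-length rating lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson(ratings["assessor_1"], ratings["assessor_2"])
print(f"Inter-rater correlation: {r:.2f}")
```

Note that a high correlation shows the assessors rank candidates similarly; it does not show they are calibrated to the same scale, which is why the distributional checks below are a separate item.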
Approaches to Validation
• Assessment of the Process
– Did assessors have opportunities to observe relevant behaviors?
– What is the quality of the behavioral information that was collected?
– How were behaviors translated into evaluations?
– How were observations and evaluations integrated?
Approaches to Validation
• Assessment of the Track Record
– Relevant theory and literature
– Relevant experience with similar ACs
• Outcomes with dissimilar ACs
Assessment of the Design: Competencies
• Competency Mapping (content validation)
– Do exercises elicit behaviors that illustrate the competency?
• Are we measuring the right competencies?
• Evidence that exercises reliably elicit the competencies
– Generalizability from AC to world of work
Assessment of the Design: Assessor Characteristics
• Training and expertise
• What do we know about their performance as assessors?
– One piece of evidence for validity might be information that will allow us to evaluate the performance or the likely performance of our assessors
Assessment of the Data
• Distributional assessments
– Does the distribution of scores make sense?
– Is the calibration of assessors reasonable given the population being assessed?
– Is the variability in scores reasonable?
Assessment of the Data
• Reliability and Generalizability analyses
– The distinction between reliability and validity is not as fundamental as most people believe
– Assessments of reliability are an important part of validation
– The natural structure of AC data fits nicely with generalizability theory
Assessment of the Data
• Generalizability
– AC data can be classified according to a number of factors – rater, ratee, competency, exercise
– ANOVA is the starting point for generalizability analysis – i.e., identifying the major sources of variability
• Complexity of the ANOVA design depends largely on whether the same assessors evaluate all competencies and exercises, or only some
Assessment of the Data
• Generalizability – an example
– Use ANOVA to examine the variability of scores as a function of:
• Candidates
• Dimensions (competencies)
• Exercises (potential source of irrelevant variance)
• Assessors
Assessment of the Data
Candidates   Overall differences in candidate performance
Dimensions   Does the pool of candidates show more strength in some competency areas than others?
Assessors    Are assessors calibrated?
C x D        Do candidates show different strengths and weaknesses?
C x A        Do assessors agree about candidates?
A x D        Do assessors agree about dimensions (competencies)?
C x D x A    Do assessors agree in their evaluations of the patterns of strength and weakness of different candidates?
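A minimal sketch of the ANOVA decomposition that starts such an analysis, in plain Python. The ratings are fabricated: a small fully crossed Candidate x Dimension x Assessor design with one rating per cell, and only the main-effect sums of squares are shown (interaction terms follow the same pattern from cell means).

```python
# ratings[c][d][a]: candidate c, dimension d, assessor a (illustrative 1-5 scale).
ratings = [
    [[3, 4], [2, 3]],   # candidate 0
    [[5, 5], [4, 4]],   # candidate 1
    [[2, 3], [2, 2]],   # candidate 2
]
C, D, A = len(ratings), len(ratings[0]), len(ratings[0][0])
N = C * D * A

flat = [ratings[c][d][a] for c in range(C) for d in range(D) for a in range(A)]
grand = sum(flat) / N

def main_effect_ss(means, n_per_level):
    """Sum of squares for a main effect, from its level means."""
    return n_per_level * sum((m - grand) ** 2 for m in means)

cand_means = [sum(ratings[c][d][a] for d in range(D) for a in range(A)) / (D * A)
              for c in range(C)]
dim_means  = [sum(ratings[c][d][a] for c in range(C) for a in range(A)) / (C * A)
              for d in range(D)]
ass_means  = [sum(ratings[c][d][a] for c in range(C) for d in range(D)) / (C * D)
              for a in range(A)]

ss_total = sum((x - grand) ** 2 for x in flat)
ss_c = main_effect_ss(cand_means, D * A)   # overall candidate differences
ss_d = main_effect_ss(dim_means, C * A)    # pool stronger on some dimensions?
ss_a = main_effect_ss(ass_means, C * D)    # assessor calibration

print(f"SS candidate: {ss_c:.2f}, SS dimension: {ss_d:.2f}, "
      f"SS assessor: {ss_a:.2f}, SS total: {ss_total:.2f}")
```

A large candidate component relative to the assessor component is what you hope to see; the agreement questions in the table correspond to the interaction terms, computed analogously from the two-way and three-way cell means.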
Assessment of the Data
• Internal Structure
– Early in the design phase, articulate your expectations regarding the relationships among competencies and dimensions
– This articulation becomes the foundation for subsequent assessments
• It is impossible to tell if the correlations among ratings of competencies are too high or too low unless you have some idea of the target you are shooting for
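A sketch of comparing observed competency intercorrelations against the design-phase targets. All numbers here are fabricated for illustration; in practice the expected values would come from your articulated expectations and the observed ones from the actual AC ratings.

```python
# Design-phase expectations about competency intercorrelations (hypothetical).
expected = {
    ("Active Listening", "Negotiation"): 0.5,
    ("Active Listening", "Critical Thinking"): 0.2,
    ("Negotiation", "Critical Thinking"): 0.3,
}
# Correlations computed from AC competency ratings (hypothetical).
observed = {
    ("Active Listening", "Negotiation"): 0.80,
    ("Active Listening", "Critical Thinking"): 0.70,
    ("Negotiation", "Critical Thinking"): 0.75,
}

# Flag pairs whose observed correlation departs sharply from the target.
# Uniformly high observed values, as here, suggest halo or an exercise effect.
for pair, target in expected.items():
    gap = observed[pair] - target
    if abs(gap) > 0.25:
        print(f"{pair}: expected {target:.2f}, observed {observed[pair]:.2f}")
```

The threshold of .25 is arbitrary and purely illustrative; the substantive point is that "too high" only has meaning relative to a stated target.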
Assessment of the Data
• Internal Structure
– Confirmatory factor analysis is much better than exploratory for making sense of the internal structure
– Exercise effects are not necessarily a bad thing. No matter how good assessors are, they cannot ignore overall performance levels
• Halo is not necessarily an error; it is part of the judgment process all assessors use
Assessment of the Data
• Confirmatory Factor Models
– Exercise only
• Does this model provide a reasonable fit?
– Competency
• Does this model provide a reasonable fit?
– Competency + exercise
• How much better is the fit when you include both sets of factors?
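One way to make this three-model comparison concrete is to write the competing measurement models in lavaan-style syntax. This is only a sketch: the indicator names (e.g., `r_listening_ex1` for the Active Listening rating from Exercise 1) are hypothetical placeholders, and model fit would be compared with chi-square difference tests or indices such as CFI and RMSEA.

```
# Model 1: exercise factors only
Ex1 =~ r_listening_ex1 + r_speaking_ex1 + r_negotiation_ex1
Ex2 =~ r_listening_ex2 + r_speaking_ex2 + r_negotiation_ex2

# Model 2: competency factors only
Listening   =~ r_listening_ex1 + r_listening_ex2
Speaking    =~ r_speaking_ex1 + r_speaking_ex2
Negotiation =~ r_negotiation_ex1 + r_negotiation_ex2

# Model 3: competency + exercise factors
# (both sets of loadings, typically with exercise factors
#  specified as uncorrelated with competency factors)
```

If Model 3 fits much better than Model 2 alone, exercise variance is real but not necessarily fatal; the question is how much competency variance remains once it is modeled.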
Assessment of the Data
• External Correlates
– The determination of external correlates depends strongly on:
• The constructs/competencies you are trying to measure
• The intended uses of the data
Assessment of the Data
• External Correlates– Alternate measures of competencies– Measures of the likely outcomes and correlates of
these competencies
Competencies
– Active Listening
– Speaking
– Management of Personnel Resources
– Social Perceptiveness
– Coordination
– Critical Thinking
– Reading Comprehension
– Judgment and Decision Making
– Negotiation
– Complex Problem Solving
Alternative Measures

Critical Thinking, Reading Comprehension   Standardized tests
Judgment and Decision Making               Supervisory ratings, Situational Judgment Tests
Possible Correlates
• Active Listening– Success in coaching assignments– Sought as mentor
• Speaking– Asked to serve as spokesman, public speaker
• Negotiation– Success in bargaining for scarce resources
Assessments of the Process
• Opportunities to observe
– Frequency with which target behaviors are recorded
• Quality of the information that is recorded
– Detail and consistency
• Influenced by format – e.g., narrative vs. checklist
Assessments of the Process
• Observations to evaluations
– How is this done?
– Consistent across assessors?
• Integration
– Clinical vs. statistical
• Statistical integration should always be present but should not necessarily trump consensus
• The process by which consensus moves away from the statistical summary should be transparent and documented
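A minimal sketch of documenting that movement. The candidate names, ratings, and the 0.5-point flagging threshold are all fabricated for illustration: the statistical summary here is just the mean of assessor OARs, and any consensus OAR that moves well away from it gets flagged for a documented rationale.

```python
# Hypothetical integration records: assessor OARs plus the consensus OAR.
candidates = {
    "cand_1": {"assessor_ratings": [3.0, 3.5, 4.0], "consensus_oar": 3.5},
    "cand_2": {"assessor_ratings": [2.0, 2.5, 2.0], "consensus_oar": 3.5},
}

for name, rec in candidates.items():
    stat = sum(rec["assessor_ratings"]) / len(rec["assessor_ratings"])
    shift = rec["consensus_oar"] - stat
    # Large moves away from the statistical summary should be documented.
    flag = "  <- document rationale" if abs(shift) > 0.5 else ""
    print(f"{name}: statistical {stat:.2f}, consensus {rec['consensus_oar']:.2f}, "
          f"shift {shift:+.2f}{flag}")
```

The point of the log is not to forbid consensus adjustments but to make them visible, so the process evidence described above can actually be evaluated.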
Assessment of the Track Record
• The history of similar ACs forms part of the relevant research record
• The history of dissimilar ACs is also relevant
The Purpose-Driven AC
• What are you trying to accomplish with this AC?
• Is there evidence this AC or ones like it have accomplished or will accomplish this thing?
– Suppose the AC is intended principally to serve as part of leadership development. Identifying this principal purpose helps to identify relevant criteria
Criteria
• Advancement
• Leader success
• Follower satisfaction
• Org success in dealing with turbulent environments
– The process of identifying criteria is largely one of thinking through what the people and the organization would be like if your AC worked
An AC Validation Report
• Think of validating an AC the same way a pilot does his or her pre-flight checklist
• The more you know about each of the items on the checklist, the more compelling the evidence that the AC is valid for its intended purpose
AC Validity Checklist
• Do you know (and how do you know) whether:
– The exercises elicit behaviors that are relevant to the competencies you are trying to measure
– These AC demonstrations of competency are likely to generalize
AC Validity Checklist
• Do you know (and how do you know) whether:
– Raters have the skill, training, expertise needed?
– Raters agree in their observations and evaluations
– Their resolutions of disagreements make sense
AC Validity Checklist
• Do you know (and how do you know) whether:
– Score distributions make sense
• Are there differences in scores received by candidates?
• Can you distinguish strengths from weaknesses?
AC Validity Checklist
• Do you know (and how do you know) whether:
– Analyses of Candidate x Dimension x Assessor yield sensible outcomes
• Assessor – are assessors calibrated?
• C x D – do candidates show patterns of strength and weakness?
• A x D – do assessors agree about dimensions?
• C x D x A – do assessors agree about evaluations of patterns of strengths and weaknesses?
AC Validity Checklist
• Do you know (and how do you know) whether:
– Factor structure makes sense, given what you are trying to measure
• Do you know anything about the relationships among competencies?
• Is this reflected in the sorts of factor models that fit?
AC Validity Checklist
• Do you know (and how do you know) whether:
– Competency scores are related to:
• Alternate measures of these competencies
• Likely outcomes and correlates of these competencies
AC Validity Checklist
• Do you know (and how do you know) whether:
– There are Competency x Treatment interactions
• Identifying individual strengths and weaknesses is most useful when different patterns will lead to different treatments (training programs, development opportunities) and when making the right treatment decision for each individual leads to better outcomes than treating everyone the same
AC Validity Checklist
• Do you know (and how do you know) whether:
– The process supports good measurement
• Do assessors have opportunities to observe relevant behaviors?
• Do they record the right sort of information?
• Is there a sensible process for getting from behavior observation to competency judgment?
AC Validity Checklist
• Do you know (and how do you know) whether:
– The integration process helps or hurts
• How is integration done?
• Is it the right method given the purpose of the AC?
• How much does the integration process change the outcomes?
AC Validity Checklist
• Do you know (and how do you know) whether:
– Other similar ACs have worked well
– Other dissimilar ACs have worked better, worse, etc.
AC Validity Checklist
• Don’t overdo the checklist metaphor
– A pilot will not take off unless everything on the list checks out
– Validation is not an all-or-none thing
• More evidence is better
• Broadly based evidence is better than lots of one kind
• The validation checklist can help you improve the AC
– Your goal is not to make an AC perfect, but to accumulate evidence