
TIME WILL TELL: A META-ANALYTIC INVESTIGATION OF THE DYNAMIC

    CRITERIA PHENOMENON

    by

    DAVID BRENT BIRKELBACH

    (Under the Direction of Charles Lance)

    ABSTRACT

    Dynamic Criteria refers to the systematic instability of criterion measures and

    predictive validities examined across longitudinal time periods. To date, much of the

    research used to support the dynamic criteria phenomenon has been fraught with

    methodological flaws (Barrett et al., 1985), limited by the utilization of single-task

    performance as the principal criterion of interest, and has failed to establish boundary

    conditions for qualitatively distinct predictor constructs. For the current study, meta-

    analytic techniques were used to examine the criterion-related validities of two common

    selection instruments, namely cognitive ability assessments and personality inventories, in

    relation to time-bound performance appraisals. In addition, performance trajectories were

    investigated through the use of weighted least squares multiple regression analyses to

    establish the systematic nature of change in predictive-validity coefficient trends over time.

    Results indicated that the criterion-related validities specific to the General Mental Ability,

    Emotional Stability, and Openness to Experience predictors do, in fact, change over time

    when measured against general and/or specific criterion types. Performance

    trajectories for each of the aforementioned predictors offer support for the simplex-like

    patterns traditionally ascribed to changes in predictive validities over time (Henry &

    Hulin, 1987). Findings are discussed in the context of Murphy’s (1989) dynamic model of

    job performance.

    INDEX WORDS: Dynamic Criteria, Meta-Analysis, Weighted Least Squares Multiple

    Regression, Cognitive Ability, Personality

TIME WILL TELL: A META-ANALYTIC INVESTIGATION OF THE DYNAMIC

    CRITERIA PHENOMENON

    by

    DAVID BRENT BIRKELBACH

    B.A., Southwestern University, 2001

    M.S., Saint Mary’s University, 2007

    A Dissertation Submitted to the Graduate Faculty of The University of Georgia in Partial

    Fulfillment of the Requirements for the Degree

    DOCTOR OF PHILOSOPHY

    ATHENS, GEORGIA

    2013

© 2013

    David Brent Birkelbach

    All Rights Reserved

TIME WILL TELL: A META-ANALYTIC INVESTIGATION OF THE DYNAMIC

    CRITERIA PHENOMENON

    by

    DAVID BRENT BIRKELBACH

    Major Professor: Charles Lance

    Committee: Nathan Carter
               Robert Mahan

    Electronic Version Approved:
    Maureen Grasso
    Dean of the Graduate School
    The University of Georgia
    May 2013


    TABLE OF CONTENTS

    LIST OF TABLES

    LIST OF FIGURES

    CHAPTER

    1 INTRODUCTION

        Historical Overview

        Definitions of Dynamic Criteria

        Murphy’s (1989) Dynamic Model of Performance

        Criticisms and Limitations of the Previous Dynamic Criteria Literature

    2 CURRENT STUDY

        Purpose

        Cognitive Ability Measures

        Cognitive Ability and Dynamic Criteria

        Personality Tests

        Personality and Dynamic Criteria

    3 METHODS

        Literature Search per Selection Device

        Criteria for Inclusion

        Coding Procedures

        Data Analysis

        Moderator Detection

        Moderator Estimation

    4 RESULTS

        Overall Validity Coefficients

        Overall Continuous Moderator Analysis

        Validity Coefficients by Criterion Type

        Continuous Moderator Analysis by Criterion Type

        Tests for Availability Bias

    5 DISCUSSION

        GMA-Performance Relationships over Time

        FFM-Performance Relationships over Time

        Implications and Future Research

        Limitations

        Conclusion

    REFERENCES

    REFERENCES FOR GMA META-ANALYSES

    REFERENCES FOR FFM META-ANALYSES

    APPENDICES

    A ORIGINS OF DYNAMIC CRITERIA

    B ACKERMAN’S MODEL OF SKILL ACQUISITION

    C CHANGING TASKS AND CHANGING SUBJECTS MODELS

    D PERFORMANCE TRAJECTORIES


    LIST OF TABLES

    Table 1: GMA Studies Used in the Meta-Analyses

    Table 2: Big Five Personality Studies Used in the Meta-Analyses

    Table 3: Meta-Analysis Results for the Criterion-Related Validities between GMA, the Big Five Personality Dimensions, and Performance

    Table 4: Results for Continuous Moderators of Predictor-Performance Relationships

    Table 5: Results for Continuous Moderators of Predictor-Performance Relationships Without Outliers

    Table 6: Meta-Analysis Results for the Criterion-Related Validities between Predictors and Criteria Type

    Table 7: Results for Continuous Moderators of Predictor-Training Performance Relationships

    Table 8: Results for Continuous Moderators of Predictor-Training Performance Relationships Without Outliers

    Table 9: Results for Continuous Moderators of Predictor-Job Performance Relationships

    Table 10: Results for Continuous Moderators of Predictor-Job Performance Relationships Without Outliers

    Table 11: Results from File-Drawer Test for Availability Bias

    Table 12: Intercorrelations of Semester Grades in Electrical Engineering, Humphreys (1960)

    Table 13: Intercorrelations of Pattern Comprehension over Repeated Trials, Fleishman and Hempel (1955)


    LIST OF FIGURES

    Figure 1: GMA-General Performance Validity over Time

    Figure 2: GMA-General Performance Validity over Time without Outliers

    Figure 3: Emotional Stability-General Performance Validity over Time

    Figure 4: Openness-General Performance Validity over Time

    Figure 5: Openness-General Performance Validity over Time without Outliers

    Figure 6: GMA-Training Performance Validity over Time

    Figure 7: GMA-Training Performance Validity over Time without Outliers

    Figure 8: GMA-Job Performance Validity over Time

    Figure 9: Emotional Stability-Training Performance Validity

    Figure 10: Openness-Job Performance Validity over Time

    Figure 11: Openness-Job Performance Validity over Time without Outliers

    Figure 12: Ackerman’s Model of Skill Acquisition


    CHAPTER 1

    INTRODUCTION

    The relationship between an individual’s personal qualities and their ability to

    perform in a given position has been the cornerstone of industrial psychology since the

    advent of the Army Alpha and Beta tests of mental ability during World War I. The goal of

    selecting and promoting employees who could succeed in the workplace has led to more

    than a century of validity studies designed to identify the individual differences that best

    result in increased efficiency, effectiveness, and productivity. The importance of predictive

    validity in personnel selection is due, in part, to the direct proportional relationship

    between predictive validity coefficients and the practical utility of the selection method

    (Schmidt & Hunter, 1998). In other words, economic gains largely rest on the accuracy of a

    selection measure to predict job performance.
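
    To make the proportionality concrete, here is a minimal sketch of the Brogden-Cronbach-Gleser utility model that underlies Schmidt and Hunter’s (1998) argument. All numeric inputs are hypothetical illustrations, not values from this study, and testing costs are omitted for simplicity.

```python
# Brogden-Cronbach-Gleser utility: dU = N * T * r * SDy * z_bar (testing
# costs omitted). Utility is directly proportional to the validity
# coefficient r, so any change in r over time changes economic gains in kind.

def utility_gain(n_hired, tenure_years, validity, sd_y, mean_z_hired):
    """Estimated dollar gain from selecting n_hired workers who average
    mean_z_hired SDs above the applicant pool on the predictor."""
    return n_hired * tenure_years * validity * sd_y * mean_z_hired

# Hypothetical example: 100 hires, 5-year tenure, SDy = $20,000, z_bar = 1.0.
print(utility_gain(100, 5, 0.45, 20_000, 1.0))   # validity of .45
print(utility_gain(100, 5, 0.225, 20_000, 1.0))  # halving r halves the gain
```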

    One issue specific to the current study that can potentially affect the estimates of

    predictive validity involves the stability of the criteria over time. Performance criteria have

    been treated as a static concept throughout the history of validity studies in industrial-

    organizational (I-O) psychology as evidenced by the practice of collecting criterion data at a

    single time-point, the use of aggregate scores or composites, the overwhelming use of

    cross-sectional data, and the practice of validating instruments with initial performance

    (Henry & Hulin, 1987). However, a growing body of research has provided support for the

    notion that criteria are not static and that job performance varies systematically when

    examined longitudinally (Austin & Villanova, 1992).


    Also known as dynamic criteria, the concept that performance does not remain

    temporally stable has profound consequences for the conduct of validity studies and

    subsequent utility of selection devices. For instance, if criteria do change over time,

    assumptions regarding the longitudinal stability of predictive estimates for selection into

    schools, advanced training programs, employment, and promotion may be founded on a

    flawed premise, thus limiting the opportunity to identify true, sustainable talent. Since the

    majority of selection and placement programs utilize criteria gathered at a single point in

    time, or validate with the use of cross-sectional data, validity estimates may be greatly

    distorted and only reveal part of a greater picture (Henry & Hulin, 1989). The current

    study contributes to the issue of dynamic criteria by examining the criterion-related

    validities of two common selection devices (i.e., cognitive ability measures and personality

    inventories) in relation to job performance over time through the use of meta-analytic

    techniques. Steps will be taken to determine the nature of the performance trends in terms

    of directional change, magnitude, and linearity.
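
    As a sketch of how such trends can be tested, the code below regresses hypothetical validity coefficients on time using weighted least squares, weighting each coefficient by its study sample size; the linear and quadratic terms speak to direction and linearity, respectively. The data and variable names are illustrative, not drawn from this study.

```python
import numpy as np
import statsmodels.api as sm

time = np.array([1.0, 3, 6, 12, 24, 36])             # months since entry (hypothetical)
validity = np.array([.42, .40, .35, .31, .28, .27])  # observed r's (hypothetical)
n = np.array([120, 95, 150, 80, 60, 75])             # study sample sizes as WLS weights

# Linear term tests direction and magnitude of change; quadratic tests linearity.
X = sm.add_constant(np.column_stack([time, time ** 2]))
fit = sm.WLS(validity, X, weights=n).fit()
print(fit.params)    # intercept, linear slope, quadratic curvature
print(fit.pvalues)
```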

    Historical Overview

    As evidence of unstable criteria and decaying predictive-validities began to emerge

    in the industrial psychology literature (e.g. Adams, 1953; Fleishman & Hempel, 1954, 1955;

    Rothe, 1946, 1947, 1951; Tiffin, 1942; Worbois, 1951), Ghiselli (1956) called into question

    the field’s stance of job performance as a stable construct and advocated for research that

    explored what he termed “dynamic criteria.” According to Ghiselli (1956), the study and

    use of static criteria did not account for the instability of criteria over time, but simply

    relegated criteria to the mere summation of data collected at a single time point.

    Furthermore, Ghiselli (1956) provided two operational methods to identify the dynamic


    nature of performance. First, he suggested that intercorrelations among criterion

    measures at different time points could be used to ascertain an overall pattern of

    performance. Ideally, correlations examined over a long time period, such as a span of

    years, could inform the extent that the criterion systematically varies with time. Second,

    Ghiselli (1956) suggested that changes in predictive validity could be accounted for by

    examining the correlations between scores on selection tests and production measures at

    varying time points.
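
    The following is a minimal sketch of Ghiselli’s (1956) two operationalizations, assuming a hypothetical panel in which `test` holds each worker’s selection-test score and each column of `perf` holds criterion scores at one time point (the declining weights deliberately build a validity decrement into the simulated data):

```python
import numpy as np

rng = np.random.default_rng(0)
test = rng.normal(size=200)                               # predictor scores
perf = rng.normal(size=(200, 6)) + test[:, None] * np.linspace(0.6, 0.2, 6)

# Method 1: intercorrelations among criterion measures at different time points.
criterion_intercorrelations = np.corrcoef(perf, rowvar=False)   # 6 x 6 matrix

# Method 2: predictor-criterion correlations (validities) at each time point.
validities = [np.corrcoef(test, perf[:, t])[0, 1] for t in range(perf.shape[1])]
print(np.round(validities, 2))   # drifts downward here by construction
```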

    Ghiselli and Haire (1960) and Bass (1962) were the first to directly implement

    Ghiselli’s (1956) suggestions into empirical field studies. For example, Ghiselli and Haire

    (1960) examined a sample of newly hired taxicab drivers over their first 18 weeks of

    employment. Intercorrelations among criteria generally declined, suggesting that the rank

    order of performance had changed with time. Validity coefficients between a test battery

    and the criterion also generally declined over the 18-week period, although this was not the

    case for all predictors. Bass (1962) extended the length of time to 48 months in an

    examination of sales personnel. Consistent with Ghiselli and Haire’s (1960) findings,

    intercorrelations of the criteria across time periods began to decline with the greatest

    reduction occurring between the first and last ratings. In this case, all predictive validity

    coefficients declined over the 48-month period.

    While Ghiselli (1956), Ghiselli and Haire (1960), and Bass (1962) sought to

    specifically examine dynamic criteria in the workplace, researchers exploring the temporal

    reliability of performance measures (e.g. Rambo, Chomiak, & Price, 1983; Rambo, Chomiak,

    & Roundtree, 1987; Rothe, 1946a, 1946b, 1947, 1951, 1970, 1978; Rothe & Nye, 1958,

    1959, 1961; Tiffin, 1942) and those uncovering simplex patterns in ability-performance


    coefficients (e.g. Bass, 1962; Deadrick & Madigan, 1990; Dennis, 1954, 1956; Dunham,

    1974; Fleishman, 1960; Fleishman & Hempel, 1954, 1955; Fleishman & Rich, 1963; Ghiselli

    & Haire, 1960; Hanges, Schneider, & Niles, 1990; Henry & Hulin 1987; Humphreys, 1960,

    1968; Lin & Humphreys, 1977; Parker & Fleishman, 1959) also indirectly contributed to

    the growing dynamic performance criteria literature by providing evidence of the

    phenomenon (See Appendix A for a full summary of both temporal reliability and simplex

    pattern studies).

    Definitions of Dynamic Criteria

    After the initial conceptualizations of dynamic performance, a series of critical

    reviews based on the extant literature at the time provoked debates concerning definitions

    of dynamic performance, the ubiquity of unstable criteria, proper methods to identify

    changes in performance, alternative explanations, and underlying causes. Barrett,

    Caldwell, and Alexander (1985) were the first to question what they coined “the received

    doctrine of dynamic performance.” They consolidated the earlier literature in an attempt

    to clarify and distinguish the various operationalizations of dynamic performance, as well

    as, provide a critical reanalysis of the evidence for each. Referring to previous sources,

    they identified three definitions of dynamic criteria: (a) Changes in group average

    performance over time (Cascio, 1982; Ghiselli, 1956; Hanges et al., 1990; McCormick &

    Ilgen, 1980), (b) changes in the rank-ordering of scores on the criterion over time (Bass,

    1962; Blum & Naylor, 1968; Deadrick & Madigan, 1990; Ghiselli, 1956; Ghiselli & Haire,

    1960; Hanges et al., 1990; Korman, 1971; MacKinney, 1967; McCormick & Ilgen, 1980), and

    (c) changes in predictive validity over time (Austin, Humphreys, & Hulin, 1989; Blum &

    Naylor, 1968; Cascio, 1982; Ghiselli, 1956; Guion, 1965; Korman, 1971; MacKinney, 1967;


    Prien, 1966; Smith, 1976; Steele-Johnson, Osburn, & Pieper, 2000). The following is a

    synopsis of the key arguments made by researchers pertaining to the merits of each of the

    aforementioned definitions of dynamic performance, the literature and methods used to

    support each definition, and the conclusions drawn about the legitimacy of the dynamic

    criteria phenomenon.

    Changes in mean performance over time. In their earlier works, Ghiselli and Haire

    (1960) and McCormick and Ilgen (1980) proposed that dynamic criteria be defined as

    changes in average group performance over time. This definition of dynamic performance

    is usually measured by grouping a sample into categories, such as age, taking the mean

    performance of each group, and comparing the means longitudinally. Most studies that

    utilize this approach are concerned with the concept of job tenure (often mislabeled job

    experience), and how differences in tenure relate to job performance (e.g. Avolio,

    Waldman, & McDaniel, 1990; Gordon & Fitzgibbons, 1982; Gordon & Johnson, 1982;

    Hoffman, Jacobs, & Guerra, 1992; Jacobs, Hofmann, & Kriska, 1990; McDaniel, Schmidt, &

    Hunter, 1988; McEvoy & Cascio, 1989; Medoff & Abraham, 1980, 1981; Schmidt, Hunter, &

    Outerbridge, 1986; Schmidt, Hunter, Outerbridge, & Goff, 1988). This definition has been

    criticized as being conceptually and operationally weak (Austin et al., 1989; Barrett et al.,

    1985) because average performance may not reflect the individual performances

    comprising them. Group-level performance could even change while individuals’

    performance remains constant if the performance level of those leaving the organization

    differed from the performance level of those entering (Boudreau & Berger, 1985).

    Austin et al. (1989) continued the criticism by stating that while mean performance could

    be used to capture systematic changes over repeated practice, the measurement of


    relationships over time, ideally identified through simplex matrices, should be the principal

    focus when studying dynamic criteria.

    Changes in the rank-ordering of criterion scores over time. The change in the rank-ordering of scores on the criterion over time directly

    addresses the issue of stability (Hanges, et al., 1990). Changes in rank-order would imply,

    as an extreme example, that high performers may eventually become low performers, and

    vice versa (Ployhart & Hakel, 1998). This second definition is often measured through the

    examination of correlations between criterion scores at multiple points in time (Barrett et

    al., 1985; Deadrick & Madigan, 1990; Hanges et al., 1990). Such studies have been framed

    as considering the test–retest reliability or the stability of performance ratings. If

    performance is truly dynamic, the criterion correlations are proposed to decrease as time

    points increase, essentially forming a simplex-like pattern.
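
    The expected pattern can be illustrated directly. In a first-order autoregressive (simplex) process, the correlation between time points j and k is rho raised to the power |j - k|, so correlations shrink as measurement occasions grow further apart; the values below are illustrative only.

```python
import numpy as np

rho, k = 0.8, 6
lags = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))  # |j - k| for each cell
simplex = rho ** lags            # 6 x 6 criterion intercorrelation matrix
print(np.round(simplex[0], 2))   # [1.   0.8  0.64 0.51 0.41 0.33]
# Correlations decay monotonically away from the diagonal: the simplex signature.
```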

    Hulin, Henry, and Noon (1990) used meta-analytic techniques to investigate the

    stability of performance measures across time by examining Time Period by Time Period

    matrices of performance intercorrelations and found that all 23 validity sequences

    examined in their study decreased over time. Abundant empirical evidence has verified the

    definition of changing rank order of individual performance scores (Deadrick & Madigan,

    1990; Hanges et al., 1990; Henry & Hulin, 1987; Hofmann, Jacobs, & Baratta, 1993;

    Hofmann et al., 1992) principally through examinations of simplex matrices.

    Changes in predictive validity over time. Central to the current study is the definition

    that dynamic performance occurs when predictive validities change over time. If

    predictive relationships are temporally variant, continued validity assessment may be

    required. Research using this definition has focused on examinations of the criterion-

    related validity of predictors such as intelligence and psychomotor ability for predicting


    task performance over multiple time periods. While stability coefficients tend to decrease

    over time across studies (Deadrick & Madigan, 1990; Hanges et al., 1990; Henry & Hulin,

    1987; Hofmann et al., 1992, 1993), there is some debate as to the nature of changes in

    predictive validities. Some argue that dynamic criteria universally leads to a degradation in

    validity over time (Austin et al., 1989; Henry & Hulin, 1987, 1989; Hulin et al., 1990; Keil &

    Cortina, 2001). Whereas others suggest that the nature of change in predictive validities is

    determined by the predictor in question or external factors that may influence

    performance over time as evidenced in some studies where predictive validities either

    remained stable or increased with time (Ackerman, 1987, 1988, 1989, 1992; Barrett et al.,

    1985; Barrett & Alexander, 1989; Deadrick & Madigan, 1990; Hanges et al., 1990; Murphy,

    1989).

    Hulin et al. (1990) conducted a meta-analysis to determine if time was a source of

    systematic variance in test validities by utilizing literature on temporal ability-performance

    relationships that spanned organizational, educational, and developmental research. The

    authors found that time accounted for variance in predictive validities beyond the variance

    attributable to statistical artifacts. In general, predictive validities decreased monotonically

    over time. Of all the validity sequences analyzed, 44 out of 54 showed negative slopes for

    the regressions of predictive validity onto time.

    Keil and Cortina (2001) also expanded on Hulin et al. (1990) through the addition of

    potential moderators to examine changes in predictive validities over time. Furthermore,

    they tested the nature of the relationships using polynomial equations. Their findings

    provide strong evidence that validities do deteriorate over time as observed across

    predictors (i.e., cognitive ability, perceptual speed ability, and psychomotor ability), criteria


    (i.e. consistent and inconsistent task performance), and time periods (i.e. short-term and

    long-term performance). Patterns were also found that suggested ability-performance

    relationships began to decay in the early stages of task performance for both consistent and

    inconsistent tasks.

    Of particular interest are Keil and Cortina’s (2001) findings concerning curvilinear

    effects. Both quadratic and cubic effects were found for all three abilities under all

    moderating conditions. Keil and Cortina (2001) attributed the curvilinear relationships to

    a “Eureka effect” where individuals with high levels of ability maintain or steadily increase

    their level of performance over time, then come up with an insight that results in a sudden

    jump in performance. The Eureka effect can be captured by a bifurcation in the ability-

    performance variables. Keil and Cortina (2001) offered two alternatives for how the

    bifurcation could be utilized in research. Each bifurcation may cause a “different predictor

    to wane in importance such that it can be used to predict while performance remains on a

    given plateau” (Keil & Cortina, 2001, p. 689), or knowing when a bifurcation is likely to

    occur may inform researchers of the length of time they have before a predictor diminishes

    in utility.
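
    As a rough sketch of how such curvilinearity can be probed, the snippet below fits linear, quadratic, and cubic polynomials to a hypothetical validity-by-time series and compares their fit; it mirrors the logic of Keil and Cortina’s (2001) polynomial tests without reproducing their data or method details.

```python
import numpy as np

time = np.array([1.0, 2, 4, 8, 16, 32, 64])               # hypothetical occasions
validity = np.array([.45, .41, .36, .30, .28, .29, .31])  # hypothetical r's

for degree in (1, 2, 3):
    coefs = np.polyfit(time, validity, degree)
    predicted = np.polyval(coefs, time)
    ss_res = np.sum((validity - predicted) ** 2)
    ss_tot = np.sum((validity - validity.mean()) ** 2)
    print(degree, round(1 - ss_res / ss_tot, 3))   # R^2 for each polynomial degree
# A jump in R^2 from degree 1 to degree 2 or 3 signals curvilinear validity decay.
```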

    Murphy’s (1989) Dynamic Model of Performance

    While evidence was mounting in support of the dynamic criteria phenomenon,

    researchers began to speculate about the theoretical causes of changes in performance over

    time. The theoretical impetus for the current study is based on Murphy’s (1989) dynamic

    model of performance. In response to the growing acceptance that cognitive ability-

    performance relationships remained invariant over time (Schmidt et al., 1986), Murphy

    (1989) offered a model of job performance that focused on two classes of predictors:


    Abilities and dispositional variables. Murphy’s (1989) definition of abilities included both

    higher and lower order abilities such as general cognitive ability and perceptual speed.

    Dispositional variables included individual differences in personality, interests, values, and

    motivation. In the dynamic model of job performance, rank order changes in job

    performance over time and declining predictive validities are the result of fluctuations in

    activities requiring varying levels of either abilities or dispositional variables.

    Building on Ackerman’s three stages of skill acquisition as it applied to the

    workplace (See Appendix B for review of Ackerman’s model), Murphy (1989) posited the

    dynamic model of job performance as a progression between two distinct stages: The

    transition stage and the maintenance stage. During the transition stage, employees are

    faced with some manner of change. They may be new to a job or recently promoted, or an

    organizational intervention may have fundamentally changed the job duties required of the

    employee. In such cases of transition, the employee must heavily rely on the use of

    cognitive ability and sound judgment to learn the new duties, goals, and strategies for

    execution. In the maintenance stage, major requirements for the job are well-learned and

    do not draw heavily on cognitive ability to be performed. At this point, dispositional

    variables, such as personality and motivation, have a greater influence on job performance

    than cognitive ability.

    Deadrick and Madigan (1990) provided empirical evidence for Murphy’s (1989)

    dynamic performance model in their attempt to distinguish between the predictive

    influences of employee experience. Concerned that the standard definitions of dynamic

    criteria did not adequately distinguish between actual changes in job performance with

    changes in the performance evaluation context, Deadrick and Madigan (1990) defined


    criterion changes as attributes of either individual differences (i.e., performance

    consistency), the organizational context (i.e., evaluation consistency), or changes in

    measurement procedure (i.e., measurement reliability). To test the performance

    consistency definition, Deadrick and Madigan (1990) collected periodic measures of both

    objective (i.e. weekly output) and subjective (i.e. supervisory ratings of production

    quantity) performance for sewing machine workers over a period of six months.

    Distinctions were made between both experienced and inexperienced employees.

    Predictors also included cognitive and psychomotor ability. The results for performance

    consistency strongly supported the simplex pattern for stability measures regardless of previous

    experience, but failed to do so for the tests of predictive validity, where the validity of cognitive

    ability actually began to increase after training and that of psychomotor ability remained relatively

    stable. The conflicting results were interpreted as evidence of Murphy’s (1989) dynamic

    model of job performance, as dispositional variables such as motivation were proposed to

    account for the changing patterns in predictive validities.

    Hanges et al. (1990) applied the interactionist perspective of psychology to

    Murphy’s (1989) dynamic performance model as a means to account for aberrations to the

    simplex pattern in stability measures. According to interactionist psychology, behavior is

    not merely determined by either the person or situational variables but is a function of the

    interaction between person and situation. Furthermore, a simplex pattern is expected

    when the stability of performance over perceptually different situations is explored,

    however, when the situations are similar, behavior should remain relatively stable over

    time.


    Murphy’s (1989) dynamic model of performance is clear regarding predictive

    validities during the transition phase, but the maintenance stage can be confounded by

    both stable and dynamic dispositional variables. Hanges et al. (1990) maintained that the

    interactionist perspective can help clarify the effects situational variables have on

    performance during the maintenance stage. For example, as situations become more stable

    over time, such as those found in the maintenance phase, an individual’s performance in

    that situation would become stable as well. Hanges et al. (1990) empirically evaluated the

    interactionist perspective by examining student evaluations of university professors over

    time (i.e., the person), the particular courses taught by the professors (i.e., the situation),

    and the professors who taught the same course over time (i.e., the person-situation

    interaction). Results showed that a simplex pattern was observed in the situation and

    person analyses, but not in the person-situation interaction analysis, thus supporting the

    utility of the interactionist perspective in predicting the conditions where a simplex pattern

    may or may not appear.

    Criticisms and Limitations of the Previous Dynamic Criteria Literature

    While Barrett et al. (1985) did concede that, in some cases, predictive validities may

    deteriorate over long time spans, the authors speculated that dynamic criteria are quite

    plausibly the result of changes in the abilities and skills of the worker (i.e. the changing-

    subjects model; Adams, 1957) or changes in the job itself (i.e. the changing-task model;

    Woodrow, 1938a; 1938b; Fleishman, 1960; 1972). The changing-subjects model is based

    on the hypothesis that abilities change over time even as the tasks remain relatively stable.

    The changing-task model assumes that the structure of the task is the variable component

    that undergoes change during skill acquisition (Alvares & Hulin, 1972, 1973; See Appendix


    C for review of changing-task and changing-person models of performance). In the few

    instances where they did find significant change over time, Barrett et al. (1985) felt that

    dynamic criteria were more the result of methodological artifacts than systematic

    variation. The authors pointed out that the studies used to support the dynamic criteria

    phenomenon were so rife with methodological flaws that any fluctuations in predictor-

    criteria validities were most likely the result of a number of study design related artifacts,

    including: (a) temporal unreliability of the criterion, (b) contamination from unmatched

    samples (i.e., criterion scores were based on individuals with differing levels of experience

    and tenure), and (c) the lack of a standardized measure of performance. In light of these

    findings, Barrett et al. (1985) claimed that the error variance caused by the unreliability of

    the criterion measure probably accounted for a majority of fluctuations in validity

    coefficients. Unfortunately, many of the studies that followed Barrett et al.’s (1985)

    literature review continued to suffer from the design flaws noted by the authors.

    In terms of limitations, studies that proposed the ubiquity of dynamic predictive

    validities across all forms of ability did not distinguish between classes of individual

    differences and, thus, failed to establish boundary conditions for examining the predictor-

    criteria relationships over time. For example, in their meta-analysis to determine the

    systematic variability of predictive validities as a function of time, Hulin et al. (1990)

    gathered data from areas that spanned the research regarding the prediction of

    performance (e.g. experimental studies, studies of academic performance, and growth and

    development research) but did not provide an inclusion criterion to classify the type of

    predictors used. Consequently, an entire host of individual differences ranging from

    psychomotor skills to aerial orientation were lumped together. While the results of Hulin


    et al.’s (1990) meta-analysis provided evidence that the majority of predictor-criterion

    relationships systematically follow a decreasing temporal trend, valuable information may

    have been lost through the act of indiscriminately clustering predictors.

    Given the implications that changes in predictive validities over time have on human

    resource practices such as selection, promotion, and interventions, the lack of research

    dedicated to dynamic criteria under specific boundary conditions is surprising. If

    predictive validities associated with construct-based selection measures change over time,

    the specific predictability trends should be evaluated to improve decision-making

    procedures and the utility of the selection device under consideration. By grouping

    predictors under construct-based selection measures, both initial performance and

    subsequent performance curves can be used to inform longitudinal policy decisions. Henry

    and Hulin (1987) further articulated the point by stating that “the failures of researchers to

    develop models that address long-term predictions and build into predictive equations

    measures that will reflect expected changes in the abilities of the selected employees or

    students is a source of serious concern” (Henry & Hulin, 1987, p. 461). Currently, very little

    empirical support has been provided to determine the temporal validities of common

    selection devices and their underlying constructs.

    Another limitation found in the dynamic performance literature concerns the

    operational definitions of the criteria. Many of the studies used to support the dynamic

    criteria phenomenon utilized experimental designs conducted in laboratories where

    analysis centered on a task performance criterion. For example, all criteria used in Kiel and

    Cortina’s (2000) study were characterized as either task performance or GPA, with only

    one criterion indicative of job performance ratings (i.e. Deadrick & Madigan, 1990).


    Substantively speaking, job performance differs from task performance in that job

    performance is multidimensional and made up of many tasks, while task performance is

    typically represented by a single facet of the job. Researchers have questioned the

    generalizability of using tasks as a criterion, especially those measured in short time frames

    (i.e. a matter of minutes), noting that they bore little resemblance to job performance

    criteria measured in the environment of an applied setting (Barrett et al., 1989; Farrell &

    McDaniel, 2001).

    As a consequence, experiments used to examine skill acquisition in task

    performance over time consist mostly of student samples or of individuals taken out of the

    job context. Such studies have a tendency to isolate the individual from real and complex

    high-stakes scenarios where adapting, understanding, and successfully performing the

    elements that comprise a job is imperative. In cases where actual workers were included

    in an applied context, inquiries into dynamic criteria failed to design predictive studies

    with an expressed point of entry into a new job, a training period, a new position, or after

    an organizational intervention. Such studies, instead, capitalized on samples made up of

    individuals with differing experience levels, and, possibly, in separate employee stages (i.e.

    transitional or maintenance stages). To ensure shared equivalent histories, samples

    should consist of a cohort that is in the same or comparable level of entry, training, or

    promotion.

    Finally, while increasing empirical evidence has been used to verify the changes in

    predictive validities over time through the examinations of simplex matrices (Ackerman,

    1987, 1988, 1992; Deadrick & Madigan, 1990; Henry & Hulin, 1987; Hofmann et al., 1992,

    1993; Hulin et al., 1990), the use of simplex patterns to support dynamic criteria suffers from


    many limitations. For instance, the simplex pattern provides little information about

    intraindividual change (i.e. changes within an individual) over time and does not shed light

    on the nature of the pattern changes. A growing body of research has begun to transition

    from the use of autoregressive simplex patterns which primarily allow for modeling the

    effects of past performance scores on future performance scores, to investigations of

    intraindividual change in latent trajectories (Deadrick, Bennett, & Russell, 1997; Hofmann

    et al., 1992, 1993; Ployhart & Hakel, 1998; Stewart & Nandkeolyar, 2006; Sturman &

    Trevor, 2001; Thoreson, Bradley, Bliese, & Thoreson, 2004; See Appendix D for a full

    review of latent performance trajectories in examining dynamic criteria).

    While not fully absorbed into the mainstream literature, the notion that job

    performance varies over time for a given employee is becoming increasingly accepted in

    the field of I-O psychology. Furthermore, the relative ubiquity of the simplex pattern in

    almost all studies concerning ability-performance relationships over time, and the

    examinations of latent performance trajectories have contributed to a firm empirical

    foundation for support of the dynamic criteria phenomenon. In light of these findings, it

    seems necessary to reevaluate the practice of using a single indicator of performance in

    validation studies, and address the aforementioned criticisms and limitations in an effort to

    move toward a more accurate understanding of how selection devices fare over time. By

    not examining dynamic criteria in relation to even the most common of selection methods,

    I-O researchers may limit key conceptual understanding of the evolutionary sources of

    variance in performance and forgo potential increases in economic gains from proper

    selection-method choice.


    CHAPTER 2

    CURRENT STUDY

    The current study draws heavily on Murphy’s (1989) dynamic model of

    performance to address the limitations concerning boundary conditions, study design,

    participants utilized, and operational definitions of criteria. Consistent with Murphy’s

    distinction between ability and dispositional variables, two sets of analyses were

    conducted. The first involved selection devices used to measure cognitive ability as

    representative of the ability variables identified by Murphy. The second set of analyses

    consisted of Big Five personality inventory dimensions as representative of dispositional

    variables. In order to capture the progression from the transitional stage to the

    maintenance stage, criterion-related predictive validity studies containing an initial

    starting point of entry into a new job, a training period, a new position, or an

    organizational intervention were identified. As Murphy’s model is specifically associated

    with conditions within a business environment, participants and the criteria of interest

    were represented by actual workers in the field appraised through job performance

    measures or participants in a real job training scenario. Stable work cohorts

    at the same or a comparable level of entry, training, or promotion were used to

    address Barrett et al.’s (1985) criticism of contamination from unmatched samples and

    ensured equivalent sample histories. Furthermore, meta-analytic techniques

    were used to address Barrett et al.’s (1985) claim that dynamic criteria are the product of

    temporal unreliability, range restriction, and insufficient power.


    Purpose

    The purpose of this study was to separately determine the criterion-related

    validities of common selection devices, namely cognitive ability measures and personality

    inventories, in relation to job performance over time through the use of meta-analytic

    techniques. When predictive validities of the common selection devices were, indeed,

    dynamic, further steps were taken to determine the nature of the performance trends in

    terms of directional changes in magnitude and linearity. The following section is an

    overview of the two selection devices (i.e. cognitive ability tests and personality

    inventories) chosen for the current study, their relation to the job performance criterion as

    determined by previous research, and an examination of how time has been explored as a

    source of systematic variance in test validities for each predictor classification.
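
    For orientation, the following is a minimal “bare-bones” sketch of the psychometric meta-analysis logic applied here: a sample-size-weighted mean validity is computed, and the observed variance is decomposed into sampling error and residual (potentially true) variance. The formulas follow standard Hunter-Schmidt treatments; the r’s and N’s are hypothetical.

```python
import numpy as np

r = np.array([.30, .25, .42, .18, .33])   # observed validity coefficients
n = np.array([150, 80, 200, 60, 120])     # study sample sizes

r_bar = np.sum(n * r) / np.sum(n)                     # weighted mean validity
var_obs = np.sum(n * (r - r_bar) ** 2) / np.sum(n)    # observed variance of r
var_err = (1 - r_bar ** 2) ** 2 / (n.mean() - 1)      # expected sampling-error variance
var_residual = max(var_obs - var_err, 0.0)            # variance left for moderators

print(round(r_bar, 3), round(var_residual, 5))
```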

    Cognitive Ability Measures

    The one consistent finding concerning the dynamic nature of the predictor-criterion

    relationship is that time-lagged correlations between ability measures and performance

    have a tendency to deteriorate over increasing intervals (Henry & Hulin, 1987; Hulin et al.,

    1990; Keil & Cortina, 2001). The majority of ability measures in the dynamic performance

    literature are generally characterized by assessments of cognitive ability (Alvares & Hulin,

    1972; Bass, 1962; Ghiselli & Haire, 1960; Fleishman & Hempel, 1954, 1955; Fleishman &

    Rich, 1963; Humphreys, 1968; Lin & Humphreys, 1977; Parker & Fleishman, 1959),

    psychomotor ability (Ghiselli & Haire, 1960; Fleishman, 1960; Fleishman & Hempel, 1954,

    1955; Fleishman & Rich, 1963; Hinrichs, 1970; Parker & Fleishman, 1959) and sensory

    perception (Ackerman, 1988, 1990; Ackerman & Kanfer, 1993; Ackerman, Kanfer, & Goff,

    1995; Fleishman, 1960; Fleishman & Hempel, 1954, 1955; Fleishman & Rich, 1963;


    Hinrichs, 1970; Parker & Fleishman, 1959; Powers, 1982) in relation to experimental task-

    performance and educational assessments over time.

    Due to the inherent differences between controlled task-based experiments and the

    workplace, it is difficult to fully generalize task-proficiency as a criterion to job

    performance. Unfortunately, there are surprisingly few studies that explore the criterion-

    related validity of general mental ability (GMA or g) over subsequent measures of job

    performance. The purpose of this portion of the study is to contribute to the dynamic

    criteria literature by exploring the dynamic nature of individual GMA-job performance

    validities over time by addressing the following questions: Are GMA-performance

    validities, indeed, dynamic? If so, what is the nature and direction of the validity patterns

    when plotted across time, and, finally, what implications do systematically changing

    validity patterns have on validity generalization and utility issues? The following is an

    overview of the current state of the literature regarding the use of GMA as a selection tool

    and the predictive validities found in terms of job performance.

    Interest in the relationship between cognitive ability and job performance has

    predominately been approached in I-O psychology through the use of Spearmanian

    frameworks (Lang, Kersting, Hülsheger, & Lang, 2010). In 1904, Charles Spearman

    proposed a two-factor theory of abilities that included general cognitive ability (g) and one

    or more specific abilities (s). The conceptualization of GMA was used to explain the

    positive manifold present across a set of ability tests. Specific abilities refer to unique test

    properties that correspond to the variance in ability tests not attributed to a latent GMA

    construct or error. When applied to certain factor analytic techniques, cognitive ability

    tests reveal a multiple factor solution, but second-order factor analyses based on the


    correlation matrices of the first-order dimensions do commonly result in a single factor

    (Carroll, 1993). As a result, GMA is characterized as a higher-order factor that accounts for

    the variance in narrower first-order content ability factors.
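
    As an illustration of that higher-order idea, the sketch below extracts the first principal component from a hypothetical correlation matrix among first-order ability dimensions; this is only a principal-component approximation to a formal second-order factor analysis, and the correlations are invented for the example.

```python
import numpy as np

# Hypothetical intercorrelations among four first-order ability dimensions.
R = np.array([[1.00, 0.55, 0.48, 0.50],
              [0.55, 1.00, 0.52, 0.46],
              [0.48, 0.52, 1.00, 0.44],
              [0.50, 0.46, 0.44, 1.00]])

eigvals, eigvecs = np.linalg.eigh(R)                  # eigenvalues in ascending order
g_loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])    # loadings on the largest component
print(np.round(np.abs(g_loadings), 2))                # all dimensions load on one factor
```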

    Research findings have clearly established GMA as an important predictor of job

    performance (Campbell, Gasser, & Oswald, 1996; Ree & Earles, 1992; Schmidt & Hunter,

    1998). From a theoretical perspective, GMA is linked to general models of job performance

    by directly influencing both declarative and procedural knowledge. According to

    Campbell’s (1990) model of job performance, declarative and procedural knowledge are

    determinants of job performance; thus, GMA influences the level of job performance

    indirectly (e.g. Ackerman, 1987; Schmidt & Hunter, 1993, 1998; Schmidt et al., 1986). As

    such, the acquisition of knowledge and the necessary skills to perform a job during training

    and the maintenance of that knowledge and those skills throughout an employee’s tenure are highly

    influenced by GMA (Jensen, 1998; Ree, Earles, & Carretta, 1998). Abundant empirical

    evidence demonstrates that GMA predicts training and job performance across numerous

    jobs and job families (Carretta, Perry, & Ree, 1996; Chan, 1996; Crawley, Pinder, & Herriot,

    1990; Hunter & Hunter, 1984; Ree & Earles, 1992; Roth & Campion, 1992; Salgado, 1995;

    Schmidt & Hunter, 1998; Vineburg & Taylor, 1972). For example, Hunter and Hunter

    (1984) conducted a broad-based meta-analysis to assess the validity of GMA for both

    training and job performance criteria. Their analysis included several hundred jobs across

    numerous job families, as well as reanalysis of data from previous studies. The authors

    estimated a true validity of GMA as .54 for training criteria and .45 for job performance

    with the predictive validity of GMA increasing as a function of job complexity.


    Cognitive Ability and Dynamic Criteria

    The substantial body of research conducted to examine the predictive validity of

    GMA and job performance (Hunter & Hunter, 1984; Jensen, 1986; Ree & Earles, 1992; Ree

    et al., 1994; Schmidt, 2002; Schmidt & Hunter, 1998) treated performance as a stable

    criterion, and therefore collected data during a single period of time, used a cross-sectional

    sample, or validated the measures through concurrent design, thus resulting in a lack of

    evidence to support the notion of unstable predictive validities in GMA-job performance

    relationships. In light of the limited resources in applied psychology, a number of studies

    have used cognitive based entrance exams such as the Scholastic Aptitude Test (SAT, e.g.,

    Butler & McCauley, 1987; Mael & Hirsch, 1993) and the Law School Admission Tests (LSAT,

    e.g., Hathaway, 1984; Powers, 1982) as predictors of Grade Point Average (GPA) over

    subsequent semesters or years. Other researchers have relied on previous GPA or aptitude

    composites as predictors of future GPA (Humphreys, 1960, 1968; Humphreys & Tabet,

    1973; Lin & Humphreys, 1977; Powers, 1982; Winterbottom, Pitcher, & Miller, 1963).

    Overall, results showed a general deterioration of predictive validities over time, but this

    finding is not consistent across all educational studies (e.g. Powers, 1982; Winterbottom et

    al., 1963). Barrett and Alexander (1989) attributed the mixed results in educational

    studies and the “fleeting nature of the prediction of grades” to incomparable metrics for the

    criteria. They argued that GPAs from different schools, across different courses, and

    curricula did not comprise the same measurement scale.

    Much of the dynamic criteria literature produced from experimental psychology

    utilized task performance as the central criteria (Ackerman, 1986, 1988, 1992; Ackerman &

    Kanfer, 1993; Ackerman et al., 1995; Ackerman & Woltz, 1994; Fleishman & Hempel, 1954,


    1955, Fleishman & Rich, 1963; Keil & Cortina, 2001; Parker & Fleishman, 1959).

    Recognizing the limitations associated with the use of task performance as a criterion, a

    number of studies introduced criteria that directly represented the elements comprising

    job performance (e.g. Farrell & McDaniel, 2001; Kolz, McFarland, & Silverman, 1998;

    Schmidt et al., 1988). Unfortunately, these studies suffered from the use of cross-sectional

    data, which, in the context of dynamic performance, provides no opportunity for examining

    within-person changes in individual differences (Hulin et al., 1990) and relies on two

    critical assumptions: That the mean level of the characteristic does not vary with time (i.e.

    cohort equivalence), and that characteristics of the hiring process remain stable over time

    (Sturman, 2007). If the two assumptions are not met, specification error may distort the

    results. Of the handful of studies that do examine changes in GMA-job performance relationships

    using a longitudinal design (i.e. Bass, 1962; Deadrick & Madigan, 1990; Deadrick et al.,

    1997; Ghiselli & Haire, 1960), mixed results have been found in regard to the directionality

    of the predictive validities over time.

    Personality Tests

    Inquiries into the phenomena of systematically decaying predictor-criteria

    relationships primarily focus on individual differences in abilities as predictors of

    performance (Austin et al., 1989; Henry & Hulin, 1987; Hulin et al., 1990), but little effort

    has been made to determine if the predictive validities of dispositional variables, such as

    personality, behave in a similar fashion. Henry and Hulin (1987) claimed that the principle

    of decreasing predictive validities can be found in nearly every longitudinal study involving

    any type of individual differences, including personality. Unfortunately, longitudinal

examinations of personality-performance relationships are rare, making it difficult to verify


    Henry and Hulin’s (1987) claim. The paucity of information regarding the influence of time

    on personality-performance relationships has left a vacuum in the dynamic criteria

    literature that requires further exploration. The purpose of this portion of the study is to

    fill in the gaps concerning the dynamic nature of individual personality trait-performance

validities over time by answering the following questions: Are personality-performance

    validities, in fact, dynamic? If so, what is the nature and direction of the validity patterns

    when plotted across time, and, finally, what implications do systematically changing

    validity patterns have on validity generalization and utility issues? The following is an

    overview of the current state of the literature regarding the use of personality inventories

as selection devices and their predictive validities with respect to job performance.

    Prior to the 1990s, personality testing was generally considered an inferior method

for selecting employees. This view stemmed from low validities in personality-job performance relationships (Hogan, 2005; Schmitt, Gooding, Noe, & Kirsch, 1984) and the

    lack of standardized frameworks to support and organize the dizzying array of available

    personality measures (Barrick & Mount, 1991; Hurtz & Donovan, 2000; Ones, Mount,

    Barrick, & Hunter, 1994). Renewed interest in personality inventories began as mounting

    evidence of a five-dimension factor solution emerged across qualitatively different studies

    (Cattell, 1946; Digman & Inouye, 1986; Fiske, 1949; Goldberg, 1981, 1990; John, 1990;

    McCrae & Costa, 1985, 1987; Peabody & Goldberg, 1989; Saucier & Goldberg, 1996; Tupes

    & Christal, 1961). The prominence of a five-factor model of personality, later dubbed the

    “Big Five” by Goldberg (1981), resulted in the creation of multiple personality inventories

    ranging from Trait Descriptive Adjectives (TDA, Goldberg, 1990, 1992), questionnaires

(NEO Personality Inventory Revised, NEO PI R, Costa & McCrae, 1992; NEO FFI, Costa &


    McCrae, 1989, 1992), and short phrase assessments (Big Five Inventory, BFI, John &

Srivastava, 1999). The prototypical Big Five personality factors are commonly identified

as Extraversion, Agreeableness, Conscientiousness, Emotional Stability (also referred to by its reverse pole, Neuroticism), and Openness to Experience. Each broad personality trait is composed of several narrow facets varying in number and substance depending on the measure in question.

    Conceptually, Extraversion (Factor I) implies an energetic disposition toward the

    social and material world, and refers to the extent to which a person is talkative, lively,

    assertive, excitable, and emotionally positive. Agreeableness (Factor II) contrasts a

    prosocial and communal orientation with antagonism, and refers to the extent to which a

    person is good-natured, helpful, trusting, and cooperative. Conscientiousness (Factor III)

    describes socially prescribed impulse control that facilitates task and goal directed

    behavior, such as thinking before acting, delaying gratification, and following rules.

Conscientiousness also refers to the extent to which a person is consistent, organized,

    careful, self-disciplined, and responsible. Neuroticism (Factor IV) contrasts emotional

    stability and even-temperedness with negative emotionality, such as feelings of

    nervousness and anxiety. Finally, Openness to Experience (Factor V) describes the

    breadth, depth, originality, and complexity of an individual’s mental and experiential life.

    People high in Openness are commonly described as imaginative, independent, and having

    a preference for variety (John & Srivastava, 1999).

    The application of the five-factor model as a legitimate selection tool coincided with

    notable meta-analytic findings from Barrick and Mount (1991) and Tett, Jackson, and

    Rothstein (1991). Both studies identified Conscientiousness as one of the few viable Big


    Five personality traits for predicting job performance. Conscientiousness has been shown

    to provide consistent positive associations with job performance across a multitude of

    occupations and job situations (Barrick & Mount, 1991; Barrick, Mount, Judge, 2001; Hurtz

& Donovan, 2000; Salgado, 1997; Tett et al., 1991; Vinchur, Schippmann, Switzer, & Roth,

1998). Furthermore, Conscientiousness tests are recognized as adding an 18 percent increase in validity beyond cognitive ability alone when predicting job performance (Schmidt & Hunter, 1998).

    Aside from Conscientiousness, the rest of the superordinate Big Five personality

    dimensions have shown little generalizable predictive relationships with performance

    across jobs, and in many cases validities approach zero (Barrick & Mount, 1991; Barrick et

    al., 2001; Hurtz & Donovan, 2000; Salgado, 1997). However, there are specific occupations

and situations where personality traits, such as Extraversion and Openness, manifest as meaningful predictors. Extraversion, for instance, does seem to have particular salience for

    sales effectiveness (Barrick, Stewart, & Piotrowski, 2002; Vinchur et al., 1998). Likewise,

    Openness has been linked to the ability to adapt to changing work roles and demands

    (Stewart & Nandkeolyar, 2006). Judge, Thoresen, Pucik, and Welbourne (1999) reported a

    statistically significant positive relationship between Openness and a manager’s ability to

    cope with various organizational changes, including mergers, acquisitions, and downsizing.

Similarly, LePine, Colquitt, and Erez (2000) found that Openness helped participants adapt

    to changing task demands in a computerized decision-making simulation.

    Personality and Dynamic Criteria

    While very little empirical data have been gathered regarding the temporal nature

    of the personality-performance relationship, there are two competing perspectives that can


    be used to hypothesize the pattern of directionality and linearity of the projected validity

    coefficients. The first perspective involves the precedent of a universal simplex pattern set

    by previous inquiries into dynamic criteria. Humphreys (1985) argued that the simplex

    pattern of correlations can be found in any data pertaining to individual differences and

    performance over time. If personality dimensions do follow the assumptions of a simplex

    pattern, predictive validities would degrade over time in a manner consistent with results

    reported for time-lagged ability-performance estimates. In their examination of GMA, the

Big Five personality dimensions, and career success, Judge, Higgins, Thoresen, and Barrick

    (1999) reported that each Big Five trait produced decreasing validities when related to

    career success across five time intervals. Burrus (2006) also found evidence of decreasing

    predictive validities for Conscientiousness over 16 task trials given to students in a

    laboratory study designed to examine dynamic performance. The study did suffer from key

    limitations: Simulated tasks did not reflect the complexity and multi-dimensionality of job

    performance, sample size was not large enough for adequate power, and the trials only

    took place over the course of a week.

    The second perspective is based on claims that predictive validities for personality

    dimensions actually increase over time as opposed to following a simplex-like pattern.

Such alternative views originate from Helmreich, Sawin, and Carsrud's (1986) examination of

    the strength of the personality-performance relationship across time within a relatively

consistent job context. According to Helmreich et al. (1986), cognitive ability was an important determinant of early performance, but its predictive value eventually declined. The non-cognitive

    measures (i.e. measures of achievement motivation and interpersonal orientation), on the

    other hand, increased in predictive validity from a relatively low starting point. Helmreich


    et al. (1986) attributed the switch in predictive magnitude from cognitive ability to

    personality to, what they described as, the “honeymoon effect.” The honeymoon effect is

    characterized as the time period early in a job when everything is new and exciting. During

    this period the employee utilizes cognitive ability to absorb the organization’s culture,

    values, work systems, and the necessary knowledge and skills to perform the job. Once the

    novelty begins to wane some employees become increasingly disenchanted. At this point,

    personality becomes more salient as a predictor of job performance.

Murphy (1989) expanded on Helmreich et al.'s (1986) conceptualization of the

    honeymoon effect in his model of dynamic performance. Progression from the transition to

the maintenance stage, in essence, represents a shift in an employee's reliance from GMA to dispositional variables. In this case, the rank order of performance scores is argued to change with time, but movement from the transition to the maintenance stage

    results in increasing personality-performance validity coefficients as opposed to simplex-

    like decreases.

    While empirical examinations of Murphy’s (1989) model of dynamic performance

    solely focused on the relationships between GMA and job performance over time (Deadrick

& Madigan, 1990; Deadrick et al., 1997), Thoresen et al. (2004) was the first to draw on

    Murphy’s (1989) distinction between transition and maintenance stages to examine Big

    Five personality traits in the context of individual changes in latent performance

    trajectories over time. Thoresen et al. (2004) reported that Openness and Agreeableness

    were positively associated with mean performance and performance change over time

    during the transitional stage but not in the maintenance stage. Conscientiousness and

    Extraversion were positively associated with mean performance and performance change


    over time in the maintenance stage but not in the transition stage. Unfortunately, Thoresen

et al. (2004) did not longitudinally examine a single sample as its members gradually transitioned from one stage to the other but, instead, compared two samples, each characterized by either the transition or the maintenance scenario, over four three-month periods. A single sample may

    have revealed a more complete picture of the latent trajectories of the Big Five dimensions.

    Thoresen et al.’s (2004) results did, however, provide empirical evidence that

    certain predictive validities did, in fact, increase over time, at least within the designated

    employment stage. Of particular interest is the finding that personality traits play separate

    roles based on the situational characteristics found in either the transition or maintenance

    stages. For example, Openness has been associated with an individual’s responsiveness to

    changes in job demands (Stewart & Nandkeolyar, 2006), and may play a more significant

    role in the transitional stage where an individual is faced with novel challenges. On the

    other hand, Conscientiousness is argued to positively influence job performance for both

stages due to its generalizability across occupations and job situations (Barrick & Mount, 1991; Barrick et al., 2001; Hurtz & Donovan, 2000; Mount & Barrick, 1995;

    Salgado, 1997), but may produce positively sloping performance growth trajectories as

employees transition into the maintenance stage as suggested by Helmreich et al. (1986)

and Murphy (1989). Zyphur, Bradley, Landis, and Thoresen (2008) also found Conscientiousness to increase in predictive validity in a study examining the extent to which cognitive ability and Conscientiousness predict initial GPA and changes in


performance over the course of college students' careers. Consistent with Murphy's

    (1989) theoretical assertions, cognitive ability did predict initial performance, but

beyond the third semester, Conscientiousness became a better predictor of student performance than cognitive ability.

    The influence of time on the relationship between common selection devices

    and job performance remains elusive in the I-O literature. A majority of validation

    studies utilize concurrent designs over predictive methods and very rarely compare

    estimates at separate time points. Studies that have longitudinally examined

    construct specific predictive validities have produced mixed results as to the

    directionality of coefficient patterns, or have been plagued by study artifacts. The

    goal of the current study was to clarify the aforementioned issues by meta-

analytically examining the criterion-related validities of two common selection

    instruments (i.e. GMA and FFM measures) in relation to performance over time. To

date, no study has provided a quantitative review of either the degree to which time contributes to observed variance in validities for distinct constructs within a real work context or the systematic patterns of change derived therefrom.

    Scientific progress in understanding the nature of the dynamic criteria

phenomenon is optimally based on the evaluation and extension of theoretical and

    empirical findings from within-person data. Cross-sectional designs can rely on

untenable assumptions and are fundamentally limited for understanding individual-level change processes; however, time-bound predictive validity studies, especially those containing multiple time points through a longitudinal design, provide an

    optimal basis for describing change patterns. Unfortunately, longitudinal


    examinations of changes in predictive validities are relatively scarce due to the time,

    resources, and effort required. In the case of the current study, time-bound

    criterion-related validity studies, consisting of either single or multiple time points,

    were integrated to create generalized validity estimates across a progressive

timeframe through meta-analytic techniques.

    The meta-analysis approach also allowed for the examination of moderators

    that are difficult to examine in primary studies alone. In the case of the current

study, the principal moderator of interest was time. Time points from each primary

    study were progressively plotted to establish the directional trends in changing

    validities through the use of weighted least squares (WLS) regression. Finally,

    polynomial terms were introduced into the regression equation in an effort to

    model linear and curvilinear trends in the relationships between time and the

    indicated predictor-performance coefficients.
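For illustration, a minimal sketch of this WLS procedure appears below (Python with statsmodels and invented toy values; the study's actual analyses were conducted with SAS-based routines, so the variable names and inputs here are assumptions, not the study's code):

```python
# Hypothetical Python sketch of the WLS trend analysis described above.
# Toy values only; the study's analyses used SAS-based procedures.
import numpy as np
import statsmodels.api as sm

days = np.array([30, 90, 180, 365, 730, 1095], dtype=float)  # elapsed time (days)
r = np.array([0.38, 0.35, 0.31, 0.27, 0.24, 0.22])           # validity coefficients
n = np.array([120, 85, 150, 200, 95, 60], dtype=float)       # study sample sizes

# Linear trend: each coefficient weighted by its study sample size.
linear = sm.WLS(r, sm.add_constant(days), weights=n).fit()

# Curvilinear trend: add a squared (polynomial) time term.
X_quad = sm.add_constant(np.column_stack([days, days ** 2]))
quadratic = sm.WLS(r, X_quad, weights=n).fit()

print(linear.params)      # intercept and linear slope
print(quadratic.params)   # intercept, linear, and quadratic terms
print(linear.rsquared, quadratic.rsquared)
```

Comparing the fit of the linear and quadratic models in this fashion is what allows simplex-like decay (a negative linear slope) to be distinguished from curvilinear change.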


    CHAPTER 3

    METHODS

    In the current study, a separate meta-analysis was conducted for each of the

designated selection devices (i.e., GMA and Big Five personality inventories) in an effort to establish construct-based boundary distinctions; combining the predictors in a single analysis would otherwise confound valuable insights into the nature of dynamic performance. The meta-

    analyses were used to examine the time-bound relationships between the indicated

    selection instrument as a predictor and a performance criterion specific to work-related

    samples. Further analysis consisted of examinations of the data categorized by criterion

    type (i.e. training and job performance).

    Literature Search per Selection Device

    Cognitive Ability Measures: An extensive literature search was conducted to identify

    studies with explicitly time-bound predictive validity coefficients for GMA measures and

    performance. First, meta-analyses conducted by Hunter and Hunter (1984), Levine,

    Spector, Menon, Narayanan, and Cannon-Bowers (1996), Schmitt et al. (1984), and Schmidt

    and Hunter (1998) were used to locate previously identified criterion-related validity

    studies that utilized some form of job or training performance as a criterion and GMA as a

    predictor. Second, published studies were identified using a computer-based literature

search in PsycInfo, Business Source Complete, and ProQuest Dissertations and Theses

    using keywords such as cognitive ability, general mental ability, intelligence, g, Armed

Forces Qualification Test, Wonderlic, and the General Aptitude Test Battery with


    performance, job performance, training performance, selection, promotion, and validation.

    Search items included peer-reviewed articles, popular-press articles, books, edited book

    chapters, and unpublished dissertations. Third, studies were identified using a manual

    search in the following journals: Journal of Applied Psychology, Personnel Psychology,

    Academy of Management Journal, Human Performance, Journal of Management, and

    Organizational Behavior and Human Decision Processes. The literature search yielded an

    initial total of 4,245 articles, reports and dissertations.

    Personality Inventories: A comprehensive literature search was conducted to

identify studies with explicitly time-bound predictive validity coefficients published between January 1992 and September 2012. According to Hurtz and Donovan (2000),

    previous meta-analyses that examined the role of Big Five personality dimensions in

relation to job performance (i.e., Barrick & Mount, 1991; Salgado, 1997; Tett et al., 1991) mapped personality predictors not explicitly designed to measure Big Five dimensions onto actual Big Five dimensions, a practice that can potentially threaten construct validity and lead to inaccurate conclusions. The year 1992 marks the beginning of the development of

    empirically validated Big Five personality inventories (e.g. NEO Personality Inventory,

    Costa & McCrae, 1992; Goldberg’s Big Five markers, Goldberg, 1992) for application in a

business context. First, the meta-analysis conducted by Hurtz and Donovan (2000) was used

    as a starting point to identify relevant criterion-related validity studies. The authors

    limited their article search to studies with established Five-Factor Model inventories as

    predictors and performance as the criterion. Second, published studies were located using

a computer-based literature search in PsycInfo (1992–2012), Business Source Complete (1992–2012), and ProQuest Dissertations and Theses using keywords such as personality,


    five factor model, big five, conscientiousness, extraversion, emotional stability, neuroticism,

    openness, and agreeableness, with performance, job performance, training performance,

    selection, promotion, and validation. Search items included peer-reviewed articles, popular-

    press articles, books, edited book chapters, and unpublished dissertations. Third, studies

were identified using a manual search in the following journals for the previously designated

    period of time: Journal of Applied Psychology, Personnel Psychology, Academy of

    Management Journal, Human Performance, Journal of Management, and Organizational

    Behavior and Human Decision Processes. The literature search yielded an initial total of

    1,519 articles, reports and dissertations.

    Criteria for Inclusion

    For a study to be included in the present meta-analyses, seven criteria had to be

    met:

    1. The study had to use actual workers as participants. Educational studies were

    generally excluded in cases where experiments were conducted on students over the

    duration of a college semester. However, students in educational settings that could be

construed as vocational training (e.g., medical school, trade school, specialty training) were

    considered.

    2. The study had to include one of the two selection devices (i.e., Cognitive Ability

    Measures, and Personality Inventories) as a predictor of interest. Due to the relatively

stable nature of both general intelligence and personality, it was not necessary for the

    researchers to gather either cognitive ability or personality inventory data during the

    hiring or promotion process. Data pertaining to cognitive ability or personality could be

    collected at any point and used as a predictor.


    A number of stipulations for personality inventories were also required: The

    personality measures used for each study had to either have been explicitly designed a

    priori to measure one or all dimensions of the Five Factor Model (i.e., Extraversion,

Conscientiousness, Emotional Stability/Neuroticism, Agreeableness, and Openness to

    Experience) or sizeable empirical evidence had to show that a measure could be

    significantly reduced to load on the Big Five dimensions. Five established a priori

    measures were identified in the studies collected for the present analysis: the NEO

    Personality Inventory Revised (NEO-PI-R) and the five factor inventory versions (NEO-FFI;

Costa & McCrae, 1992), the International Personality Item Pool (IPIP; Goldberg, 1992),

    the Personal Characteristics Inventory (PCI; Barrick & Mount 1993), the Big Five Inventory

(BFI; John & Srivastava, 1999), and Saucier's Mini-Markers (Saucier, 1994).

    3. The study had to include an explicit measure of job or training performance as a

    criterion.

4. In terms of criterion-related validity, the study had to utilize a predictive design with an expressed point of entry into a new job, a training period, a new position, or an organizational intervention. GMA and Big Five studies that utilized a concurrent design

    were included if the time period between entry and the criterion measurement was clearly

    designated and the sample shared equivalent histories.

    5. Primary study samples had to consist of a cohort with equivalent histories (i.e.

    the same or comparable level of entry, training, or promotion).

    6. The study had to report the sample size for each correlation presented.

    7. Finally, the time between a point of entry into a new job, training, a new position,

    or after an organizational intervention and the criterion measurement had to be firmly


    established. The timeframe had to be greater than a week. In cases where all other

inclusion criteria were met, attempts were made to contact the principal authors concerning the timeframe of the study.

    Coding Procedure

For each individual study, the correlations between the selection instrument and the

    performance criterion were coded along with sample sizes and scale reliabilities when

    available. Longitudinal studies that included coefficients for more than one time point

    were divided into the number of time points present (e.g. entry point to year 1, entry point

    to year 2, entry point to year 3, etc.) and treated as independent data points for the

    analysis. Information for the time moderator was also coded, and converted to the smallest

    increment of measurement present in the analyses (i.e. days). All personality variables

were based on broad dimensions (e.g., Agreeableness, Extraversion) as opposed to the narrow facets that comprise them. If a primary study

    solely relied on a narrow dimension in relation to performance, or in cases where more

than one narrow dimension of the same principal construct was used, the correlations

    were averaged and subsumed under the coinciding Big Five dimension.
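A minimal sketch of these coding rules follows (Python; the function and record names are hypothetical and the values invented, intended only to make the splitting, time-conversion, and facet-averaging steps concrete):

```python
# Hypothetical coding helpers; names and records are illustrative only.
UNIT_TO_DAYS = {"days": 1, "weeks": 7, "months": 30, "years": 365}  # approximate conversions

def to_days(value, unit):
    """Convert a reported time lag to days, the smallest increment coded."""
    return value * UNIT_TO_DAYS[unit]

def code_study(correlations_by_time, n):
    """Split a longitudinal study into independent records, one per time point.

    correlations_by_time maps (value, unit) -> r or [r1, r2, ...]; a list holds
    narrow-facet correlations that are averaged into one broad dimension.
    """
    records = []
    for (value, unit), rs in correlations_by_time.items():
        rs = rs if isinstance(rs, list) else [rs]
        r = sum(rs) / len(rs)  # average narrow facets into the broad trait
        records.append({"days": to_days(value, unit), "r": round(r, 3), "n": n})
    return records

# A study reporting two Conscientiousness facets at year 1 and one composite at year 2:
print(code_study({(1, "years"): [0.21, 0.27], (2, "years"): 0.18}, n=140))
```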

    Of the initial total, 30 articles were considered for inclusion in the GMA analysis and

    30 for the Big Five analyses. The 30 GMA articles yielded a total of 49 validity coefficients.

    For the Big Five analyses, a total of three articles were excluded due to substandard FFM

measures (i.e., measures that produced low correlations when compared to established Big Five

    inventories). The 27 remaining studies yielded a total of 37 validity coefficients for

    Extraversion, 36 for Agreeableness, 42 for Conscientiousness, 39 for Neuroticism, and 37

    for Openness.


    An outlier analysis was conducted using the Sample-Adjusted Meta-Analysis

Deviancy technique (SAMD; Arthur, Bennett, & Huffcutt, 2001). The SAMD identifies potential

outliers by comparing the value of each study coefficient to the mean sample-weighted

    coefficient computed without the coefficient in the analysis. The difference is then adjusted

    for the sample size in the study. Scree plots of the SAMD values and subjective

    comparisons were used to isolate individual study coefficients. Given the nature of the

current study, outliers could possibly represent systematic changes in coefficients as a function of elapsed time and sample size. No outliers were identified for any of the GMA

or FFM dimensions save two coefficients for Extraversion (i.e., Lievens, Ones, & Dilchert, 2009: Time 1; Ployhart, Lim, & Chan, 2001: AC performance) and one for Agreeableness (Rothstein, Paunonen, Rush, & King, 1994). With these cases removed, both the Extraversion and Agreeableness analyses were left with 35 predictive validities each. Detailed lists

    of the articles used for GMA and the FFM dimensions are provided in Tables 1 and 2.
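The leave-one-out logic of the SAMD can be sketched as follows (Python; this simplified version uses the ordinary sampling-error variance for the sample-size adjustment, so it approximates rather than reproduces the exact statistic in Arthur et al., 2001):

```python
# Simplified, hypothetical sketch of the SAMD leave-one-out comparison; the
# published statistic's exact sample-size adjustment may differ in detail.
import numpy as np

def samd(r, n):
    r, n = np.asarray(r, float), np.asarray(n, float)
    out = np.empty_like(r)
    for i in range(len(r)):
        mask = np.arange(len(r)) != i
        r_wo = np.average(r[mask], weights=n[mask])      # weighted mean without study i
        se = np.sqrt((1 - r_wo ** 2) ** 2 / (n[i] - 1))  # sampling-error SD for study i
        out[i] = (r[i] - r_wo) / se                      # deviation adjusted for sample size
    return out

# Sort |SAMD| values in descending order and inspect a scree plot for breaks:
values = samd([0.45, 0.30, 0.28, -0.05, 0.33], [80, 150, 60, 40, 200])
print(np.sort(np.abs(values))[::-1])
```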

    Data Analysis

Meta-analytic procedures based on a correlation model were conducted for each

    selection device using Arthur et al.’s (2001) SAS PROC MEANS. Arthur et al.’s method

assumes a random-effects model, which is in line with the study's assumption that

    population effect sizes are variable. Sampling error was calculated in a manner consistent

with Hunter and Schmidt's (2004) methods, whereby sample-weighted average correlations,

    sample-weighted variances, and sampling error variances were computed to identify and

    remove the variance attributed to sampling error from the total variance across

    coefficients. Furthermore, the current study utilizes meta-analytic techniques developed

    by Raju, Burke, Normand, and Langlois (1991) to correct for attenuating artifacts (i.e.


    measurement error and range restriction) as a departure from traditional validity

    generalization correlation methods originated by Schmidt and Hunter (1977). Raju et al.’s

    (1991) procedure allows for estimating mean population-level correlations and

    population-level variances when attenuating artifact information is only sporadically

    presented in the primary studies. Traditional correlational methods rely on population

    values to correct for measurement error and range restriction. The general lack of

    available population values undermines researchers’ efforts to produce accurate results

    under the assumptions of the traditional correlational method. Previous researchers have

    attempted to overcome this limitation through the use of hypothetical artifact distributions

    or sample-based artifact distributions. Unfortunately, hypothetical artifact distributions

    can limit the accuracy of mean and variance estimates of validities if they do not closely

match true artifact distributions, and sample-based artifact distributions are subject to sampling error within the attenuating artifacts, which may bias results (Raju et al., 1991).

Raju et al.'s (1991) procedure provides a more accurate method for estimating the mean and variance of the population correlation by relaxing the assumption that population correlations (ρ), predictor reliabilities, and criterion reliabilities are uncorrelated across populations. In other words, the procedure allows observed correlations and study reliabilities to covary across the studies in the analysis. Arthur et al.'s (2001)

    SAS PROC MEANS was adapted to meet Raju et al.’s (1991) assumptions.
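As an illustrative aside, the core Hunter and Schmidt (2004) quantities referenced above can be sketched in a few lines (Python, toy inputs; the Raju et al., 1991, corrections for measurement error and range restriction are deliberately omitted, so this is a bare-bones approximation rather than the study's full procedure):

```python
# Bare-bones sketch of sample-weighted meta-analytic quantities; artifact
# corrections (Raju et al., 1991) are intentionally omitted for brevity.
import numpy as np

def bare_bones(r, n):
    r, n = np.asarray(r, float), np.asarray(n, float)
    r_bar = np.average(r, weights=n)                    # sample-weighted mean correlation
    var_obs = np.average((r - r_bar) ** 2, weights=n)   # sample-weighted observed variance
    var_err = (1 - r_bar ** 2) ** 2 / (n.mean() - 1)    # expected sampling-error variance
    var_res = max(var_obs - var_err, 0.0)               # residual variance after sampling error
    return r_bar, var_obs, var_err, var_res

# Toy usage with invented coefficients and sample sizes:
print(bare_bones([0.30, 0.25, 0.18, 0.35], [100, 250, 80, 150]))
```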

    Moderator Detection

    Multiple tests for homogeneity were conducted to detect possible moderators: The

75% Rule (Hunter & Schmidt, 2004), the Q-statistic (Hunter & Schmidt, 2004), and credibility and confidence intervals (Hunter & Schmidt, 2004; Whitener, 1990). The 75% Rule states


that if the ratio of artifactual variance to observed variance shows that 75% or more of the variance is accounted for by artifacts, then the remaining variance is considered a function of uncorrectable artifacts. If less than 75% of the variance is accounted for, then a moderator

    may be present. The Q-statistic tests the hypothesis that the observed variance is the

    product of sampling error and attenuating artifacts. A significant chi-square value

    indicates the presence of a potential moderator in the research domain. Both the 75% Rule

    and the Q-statistic have the lowest occurrences of Type I error and the highest power rates

for meta-analyses consisting of 60 to 100 studies when compared to other moderator-detection techniques (Sagie & Koslowsky, 1993). Credibility intervals are computed around the corrected mean correlation using the corrected population standard deviation and describe the variability of individual correlations in the population, as well as providing lower-bound estimates. A wide credibility interval, or one that includes zero, is indicative of a moderating variable. Confidence

    intervals provide an estimate of the variability of the corrected mean correlation due to

    sampling error and are computed around the corrected population correlation using the

    standard error of the mean correlation. The confidence interval provides a range of values

for the mean effect size and indicates whether the corrected effects differ from zero.
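A sketch of these homogeneity checks, building on the bare-bones quantities above, might look as follows (Python with scipy; one common textbook form of each statistic is used here and the names are this sketch's own, so details may differ from the exact computations reported later):

```python
# Illustrative homogeneity checks: 75% Rule, Q-statistic, and credibility and
# confidence intervals, each in one common textbook (approximate) form.
import numpy as np
from scipy import stats

def moderator_checks(r, n, alpha=0.05):
    r, n = np.asarray(r, float), np.asarray(n, float)
    k = len(r)
    r_bar = np.average(r, weights=n)
    var_obs = np.average((r - r_bar) ** 2, weights=n)
    var_err = (1 - r_bar ** 2) ** 2 / (n.mean() - 1)

    pct_accounted = 100 * var_err / var_obs          # 75% Rule: < 75 suggests a moderator
    q = n.sum() * var_obs / (1 - r_bar ** 2) ** 2    # Q-statistic, ~chi-square on k - 1 df
    q_p = stats.chi2.sf(q, k - 1)                    # significant p -> potential moderator

    z = stats.norm.ppf(1 - alpha / 2)
    sd_rho = np.sqrt(max(var_obs - var_err, 0.0))    # residual (population) SD
    credibility = (r_bar - z * sd_rho, r_bar + z * sd_rho)  # spread of population correlations
    confidence = (r_bar - z * np.sqrt(var_obs / k),         # precision of the mean correlation
                  r_bar + z * np.sqrt(var_obs / k))
    return pct_accounted, (q, q_p), credibility, confidence

print(moderator_checks([0.30, 0.25, 0.18, 0.35, 0.10], [100, 250, 80, 150, 60]))
```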

    Moderator Estimation