
TIME WILL TELL: A META-ANALYTIC INVESTIGATION OF THE DYNAMIC

    CRITERIA PHENOMENON

    by

    DAVID BRENT BIRKELBACH

    (Under the Direction of Charles Lance)

    ABSTRACT

    Dynamic Criteria refers to the systematic instability of criterion measures and

    predictive validities examined across longitudinal time periods. To date, much of the

    research used to support the dynamic criteria phenomenon has been fraught with

    methodological flaws (Barrett et al., 1985), limited by the utilization of single-task

    performance as the principal criterion of interest, and has failed to establish boundary

    conditions for qualitatively distinct predictor constructs. For the current study, meta-

    analytic techniques were used to examine the criterion-related validities of two common

    selection instruments, namely cognitive ability assessments and personality inventories, in

    relation to time-bound performance appraisals. In addition, performance trajectories were

    investigated through the use of weighted least squares multiple regression analyses to

    establish the systematic nature of change in predictive-validity coefficient trends over time.

    Results indicated that the criterion-related validities specific to the General Mental Ability,

    Emotional Stability, and Openness to Experience predictors do, in fact, change over time

    when measured against general and/or specific criterion types. Performance

    trajectories for each of the aforementioned predictors offer support for the simplex-like

    patterns traditionally ascribed to changes in predictive validities over time (Henry &

    Hulin, 1987). Findings are discussed in the context of Murphy’s (1989) dynamic model of

    job performance.

    INDEX WORDS: Dynamic Criteria, Meta-Analysis, Weighted Least Squares Multiple

    Regression, Cognitive Ability, Personality

TIME WILL TELL: A META-ANALYTIC INVESTIGATION OF THE DYNAMIC

    CRITERIA PHENOMENON

    by

    DAVID BRENT BIRKELBACH

    B.A., Southwestern University, 2001

    M.S., Saint Mary’s University, 2007

    A Dissertation Submitted to the Graduate Faculty of The University of Georgia in Partial

    Fulfillment of the Requirements for the Degree

    DOCTOR OF PHILOSOPHY

    ATHENS, GEORGIA

    2013

© 2013

    David Brent Birkelbach

    All Rights Reserved

TIME WILL TELL: A META-ANALYTIC INVESTIGATION OF THE DYNAMIC

    CRITERIA PHENOMENON

    by

    DAVID BRENT BIRKELBACH

    Major Professor: Charles Lance

    Committee: Nathan Carter
               Robert Mahan

    Electronic Version Approved:
    Maureen Grasso
    Dean of the Graduate School
    The University of Georgia
    May 2013


    TABLE OF CONTENTS

    LIST OF TABLES

    LIST OF FIGURES

    CHAPTER

    1 INTRODUCTION

        Historical Overview

        Definitions of Dynamic Criteria

        Murphy’s (1989) Dynamic Model of Performance

        Criticisms and Limitations of the Previous Dynamic Criteria Literature

    2 CURRENT STUDY

        Purpose

        Cognitive Ability Measures

        Cognitive Ability and Dynamic Criteria

        Personality Tests

        Personality and Dynamic Criteria

    3 METHODS

        Literature Search per Selection Device

        Criteria for Inclusion

        Coding Procedures

        Data Analysis

        Moderator Detection

        Moderator Estimation

    4 RESULTS

        Overall Validity Coefficients

        Overall Continuous Moderator Analysis

        Validity Coefficients by Criterion Type

        Continuous Moderator Analysis by Criterion Type

        Tests for Availability Bias

    5 DISCUSSION

        GMA-Performance Relationships over Time

        FFM-Performance Relationships over Time

        Implications and Future Research

        Limitations

        Conclusion

    REFERENCES

    REFERENCES FOR GMA META-ANALYSES

    REFERENCES FOR FFM META-ANALYSES

    APPENDICES

    A ORIGINS OF DYNAMIC CRITERIA

    B ACKERMAN’S MODEL OF SKILL ACQUISITION

    C CHANGING TASKS AND CHANGING SUBJECTS MODELS

    D PERFORMANCE TRAJECTORIES


    LIST OF TABLES

    Table 1: GMA Studies Used in the Meta-Analyses

    Table 2: Big Five Personality Studies Used in the Meta-Analyses

    Table 3: Meta-Analysis Results for the Criterion-Related Validities between GMA, the Big Five Personality Dimensions, and Performance

    Table 4: Results for Continuous Moderators of Predictor-Performance Relationships

    Table 5: Results for Continuous Moderators of Predictor-Performance Relationships Without Outliers

    Table 6: Meta-Analysis Results for the Criterion-Related Validities between Predictors and Criteria Type

    Table 7: Results for Continuous Moderators of Predictor-Training Performance Relationships

    Table 8: Results for Continuous Moderators of Predictor-Training Performance Relationships Without Outliers

    Table 9: Results for Continuous Moderators of Predictor-Job Performance Relationships

    Table 10: Results for Continuous Moderators of Predictor-Job Performance Relationships Without Outliers

    Table 11: Results from File-Drawer Test for Availability Bias

    Table 12: Intercorrelations of Semester Grades in Electrical Engineering, Humphreys (1960)

    Table 13: Intercorrelations of Pattern Comprehension over Repeated Trials, Fleishman and Hempel (1955)


    LIST OF FIGURES

    Figure 1: GMA-General Performance Validity over Time

    Figure 2: GMA-General Performance Validity over Time without Outliers

    Figure 3: Emotional Stability-General Performance Validity over Time

    Figure 4: Openness-General Performance Validity over Time

    Figure 5: Openness-General Performance Validity over Time without Outliers

    Figure 6: GMA-Training Performance Validity over Time

    Figure 7: GMA-Training Performance Validity over Time without Outliers

    Figure 8: GMA-Job Performance Validity over Time

    Figure 9: Emotional Stability-Training Performance Validity

    Figure 10: Openness-Job Performance Validity over Time

    Figure 11: Openness-Job Performance Validity over Time without Outliers

    Figure 12: Ackerman’s Model of Skill Acquisition


    CHAPTER 1

    INTRODUCTION

    The relationship between an individual’s personal qualities and their ability to

    perform in a given position has been the cornerstone of industrial psychology since the

    advent of the Army Alpha and Beta tests of mental ability during World War I. The goal of

    selecting and promoting employees who could succeed in the workplace has led to more

    than a century of validity studies designed to identify the individual differences that best

    result in increased efficiency, effectiveness, and productivity. The importance of predictive

    validity in personnel selection is due, in part, to the direct proportional relationship

    between predictive validity coefficients and the practical utility of the selection method

    (Schmidt & Hunter, 1998). In other words, economic gains largely rest on the accuracy of a

    selection measure to predict job performance.
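
    To make the proportionality concrete, here is a minimal sketch of the Brogden-Cronbach-Gleser utility model that underlies Schmidt and Hunter’s (1998) argument. All numeric inputs are hypothetical illustrations, not values from this study, and testing costs are omitted for simplicity.

```python
# Brogden-Cronbach-Gleser utility: dU = N * T * r * SDy * z_bar (testing
# costs omitted). Utility is directly proportional to the validity
# coefficient r, so any change in r over time changes economic gains in kind.

def utility_gain(n_hired, tenure_years, validity, sd_y, mean_z_hired):
    """Estimated dollar gain from selecting n_hired workers who average
    mean_z_hired SDs above the applicant pool on the predictor."""
    return n_hired * tenure_years * validity * sd_y * mean_z_hired

# Hypothetical example: 100 hires, 5-year tenure, SDy = $20,000, z_bar = 1.0.
print(utility_gain(100, 5, 0.45, 20_000, 1.0))   # validity of .45
print(utility_gain(100, 5, 0.225, 20_000, 1.0))  # halving r halves the gain
```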

    One issue specific to the current study that can potentially affect the estimates of

    predictive validity involves the stability of the criteria over time. Performance criteria have

    been treated as a static concept throughout the history of validity studies in industrial-

    organizational (I-O) psychology as evidenced by the practice of collecting criterion data at a

    single time-point, the use of aggregate scores or composites, the overwhelming use of

    cross-sectional data, and the practice of validating instruments with initial performance

    (Henry & Hulin, 1987). However, a growing body of research has provided support for the

    notion that criteria are not static and that job performance varies systematically when

    examined longitudinally (Austin & Villanova, 1992).


    Also known as dynamic criteria, the concept that performance does not remain

    temporally stable has profound consequences for the conduct of validity studies and

    subsequent utility of selection devices. For instance, if criteria do change over time,

    assumptions regarding the longitudinal stability of predictive estimates for selection into

    schools, advanced training programs, employment, and promotion may be founded on a

    flawed premise, thus limiting the opportunity to identify true, sustainable talent. Since the

    majority of selection and placement programs utilize criteria gathered at a single point in

    time, or validate with the use of cross-sectional data, validity estimates may be greatly

    distorted and only reveal part of a greater picture (Henry & Hulin, 1989). The current

    study contributes to the issue of dynamic criteria by examining the criterion-related

    validities of two common selection devices (i.e., cognitive ability measures and personality

    inventories) in relation to job performance over time through the use of meta-analytic

    techniques. Steps will be taken to determine the nature of the performance trends in terms

    of directional change, magnitude, and linearity.
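
    As a sketch of how such trends can be tested, the code below regresses hypothetical validity coefficients on time using weighted least squares, weighting each coefficient by its study sample size; the linear and quadratic terms speak to direction and linearity, respectively. The data and variable names are illustrative, not drawn from this study.

```python
import numpy as np
import statsmodels.api as sm

time = np.array([1.0, 3, 6, 12, 24, 36])             # months since entry (hypothetical)
validity = np.array([.42, .40, .35, .31, .28, .27])  # observed r's (hypothetical)
n = np.array([120, 95, 150, 80, 60, 75])             # study sample sizes as WLS weights

# Linear term tests direction and magnitude of change; quadratic tests linearity.
X = sm.add_constant(np.column_stack([time, time ** 2]))
fit = sm.WLS(validity, X, weights=n).fit()
print(fit.params)    # intercept, linear slope, quadratic curvature
print(fit.pvalues)
```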

    Historical Overview

    As evidence of unstable criteria and decaying predictive-validities began to emerge

    in the industrial psychology literature (e.g. Adams, 1953; Fleishman & Hempel, 1954, 1955;

    Rothe, 1946, 1947, 1951; Tiffin, 1942; Worbois, 1951), Ghiselli (1956) called into question

    the field’s stance of job performance as a stable construct and advocated for research that

    explored what he termed “dynamic criteria.” According to Ghiselli (1956), the study and

    use of static criteria did not account for the instability of criteria over time, but simply

    relegated criteria to the mere summation of data collected at a single time point.

    Furthermore, Ghiselli (1956) provided two operational methods to identify the dynamic


    nature of performance. First, he suggested that intercorrelations among criterion

    measures at different time points could be used to ascertain an overall pattern of

    performance. Ideally, correlations examined over a long time period, such as a span of

    years, could inform the extent that the criterion systematically varies with time. Second,

    Ghiselli (1956) suggested that changes in predictive validity could be accounted for by

    examining the correlations between scores on selection tests and production measures at

    varying time points.
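
    The following is a minimal sketch of Ghiselli’s (1956) two operationalizations, assuming a hypothetical panel in which `test` holds each worker’s selection-test score and each column of `perf` holds criterion scores at one time point (the declining weights deliberately build a validity decrement into the simulated data):

```python
import numpy as np

rng = np.random.default_rng(0)
test = rng.normal(size=200)                               # predictor scores
perf = rng.normal(size=(200, 6)) + test[:, None] * np.linspace(0.6, 0.2, 6)

# Method 1: intercorrelations among criterion measures at different time points.
criterion_intercorrelations = np.corrcoef(perf, rowvar=False)   # 6 x 6 matrix

# Method 2: predictor-criterion correlations (validities) at each time point.
validities = [np.corrcoef(test, perf[:, t])[0, 1] for t in range(perf.shape[1])]
print(np.round(validities, 2))   # drifts downward here by construction
```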

    Ghiselli and Haire (1960) and Bass (1962) were the first to directly implement

    Ghiselli’s (1956) suggestions into empirical field studies. For example, Ghiselli and Haire

    (1960) examined a sample of newly hired taxicab drivers over their first 18 weeks of

    employment. Intercorrelations among criteria generally declined, suggesting that the rank

    order of performance had changed with time. Validity coefficients between a test battery

    and the criterion also generally declined over the 18-week period, although this was not the

    case for all predictors. Bass (1962) extended the length of time to 48 months in an

    examination of sales personnel. Consistent with Ghiselli and Haire’s (1960) findings,

    intercorrelations of the criteria across time periods began to decline with the greatest

    reduction occurring between the first and last ratings. In this case, all predictive validity

    coefficients declined over the 48-month period.

    While Ghiselli (1956), Ghiselli and Haire (1960), and Bass (1962) sought to

    specifically examine dynamic criteria in the workplace, researchers exploring the temporal

    reliability of performance measures (e.g. Rambo, Chomiak, & Price, 1983; Rambo, Chomiak,

    & Roundtree, 1987; Rothe, 1946a, 1946b, 1947, 1951, 1970, 1978; Rothe & Nye, 1958,

    1959, 1961; Tiffin, 1942) and those uncovering simplex patterns in ability-performance


    coefficients (e.g. Bass, 1962; Deadrick & Madigan, 1990; Dennis, 1954, 1956; Dunham,

    1974; Fleishman, 1960; Fleishman & Hempel, 1954, 1955; Fleishman & Rich, 1963; Ghiselli

    & Haire, 1960; Hanges, Schneider, & Niles, 1990; Henry & Hulin 1987; Humphreys, 1960,

    1968; Lin & Humphreys, 1977; Parker & Fleishman, 1959) also indirectly contributed to

    the growing dynamic performance criteria literature by providing evidence of the

    phenomenon (See Appendix A for a full summary of both temporal reliability and simplex

    pattern studies).

    Definitions of Dynamic Criteria

    After the initial conceptualizations of dynamic performance, a series of critical

    reviews based on the extant literature at the time provoked debates concerning definitions

    of dynamic performance, the ubiquity of unstable criteria, proper methods to identify

    changes in performance, alternative explanations, and underlying causes. Barrett,

    Caldwell, and Alexander (1985) were the first to question what they coined “the received

    doctrine of dynamic performance.” They consolidated the earlier literature in an attempt

    to clarify and distinguish the various operationalizations of dynamic performance, as well

    as, provide a critical reanalysis of the evidence for each. Referring to previous sources,

    they identified three definitions of dynamic criteria: (a) Changes in group average

    performance over time (Cascio, 1982; Ghiselli, 1956; Hanges et al., 1990; McCormick &

    Ilgen, 1980), (b) changes in the rank-ordering of scores on the criterion over time (Bass,

    1962; Blum & Naylor, 1968; Deadrick & Madigan, 1990; Ghiselli, 1956; Ghiselli & Haire,

    1960; Hanges et al., 1990; Korman, 1971; MacKinney, 1967; McCormick & Ilgen, 1980), and

    (c) changes in predictive validity over time (Austin, Humphreys, & Hulin, 1989; Blum &

    Naylor, 1968; Cascio, 1982; Ghiselli, 1956; Guion, 1965; Korman, 1971; MacKinney, 1967;


    Prien, 1966; Smith, 1976; Steele-Johnson, Osburn, & Pieper, 2000). The following is a

    synopsis of the key arguments made by researchers pertaining to the merits of each of the

    aforementioned definitions of dynamic performance, the literature and methods used to

    support each definition, and the conclusions drawn about the legitimacy of the dynamic

    criteria phenomenon.

    Changes in mean performance over time. In their earlier works, Ghiselli and Haire

    (1960) and McCormick and Ilgen (1980) proposed that dynamic criteria be defined as

    changes in average group performance over time. This definition of dynamic performance

    is usually measured by grouping a sample into categories, such as age, taking the mean

    performance of each group, and comparing the means longitudinally. Most studies that

    utilize this approach are concerned with the concept of job tenure (often mislabeled job

    experience), and how differences in tenure relate to job performance (e.g. Avolio,

    Waldman, & McDaniel, 1990; Gordon & Fitzgibbons, 1982; Gordon & Johnson, 1982;

    Hoffman, Jacobs, & Guerra, 1992; Jacobs, Hofmann, & Kriska, 1990; McDaniel, Schmidt, &

    Hunter, 1988; McEvoy & Cascio, 1989; Medoff & Abraham, 1980, 1981; Schmidt, Hunter, &

    Outerbridge, 1986; Schmidt, Hunter, Outerbridge, & Goff, 1988). This definition has been

    criticized as being conceptually and operationally weak (Austin et al., 1989; Barrett et al.,

    1985) because average performance may not reflect the individual performances

    comprising them. Group-level performance could even change while individuals’

    performance remains constant if the performance level of those leaving the organization

    differed from the performance level of those entering (Boudreau & Berger, 1985).

    Austin et al. (1989) continued the criticism by stating that while mean performance could

    be used to capture systematic changes over repeated practice, the measurement of


    relationships over time, ideally identified through simplex matrices, should be the principal

    focus when studying dynamic criteria.

    Changes in the rank-ordering of criterion scores over time. The change in the rank-ordering of scores on the criterion over time directly

    addresses the issue of stability (Hanges, et al., 1990). Changes in rank-order would imply,

    as an extreme example, that high performers may eventually become low performers, and

    vice versa (Ployhart & Hakel, 1998). This second definition is often measured through the

    examination of correlations between criterion scores at multiple points in time (Barrett et

    al., 1985; Deadrick & Madigan, 1990; Hanges et al., 1990). Such studies have been framed

    as considering the test–retest reliability or the stability of performance ratings. If

    performance is truly dynamic, the criterion correlations are proposed to decrease as time

    points increase, essentially forming a simplex-like pattern.
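
    The expected pattern can be illustrated directly. In a first-order autoregressive (simplex) process, the correlation between time points j and k is rho raised to the power |j - k|, so correlations shrink as measurement occasions grow further apart; the values below are illustrative only.

```python
import numpy as np

rho, k = 0.8, 6
lags = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))  # |j - k| for each cell
simplex = rho ** lags            # 6 x 6 criterion intercorrelation matrix
print(np.round(simplex[0], 2))   # [1.   0.8  0.64 0.51 0.41 0.33]
# Correlations decay monotonically away from the diagonal: the simplex signature.
```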

    Hulin, Henry, and Noon (1990) used meta-analytic techniques to investigate the

    stability of performance measures across time by examining Time Period by Time Period

    matrices of performance intercorrelations and found that all 23 validity sequences

    examined in their study decreased over time. Abundant empirical evidence has verified the

    definition of changing rank order of individual performance scores (Deadrick & Madigan,

    1990; Hanges et al., 1990; Henry & Hulin, 1987; Hofmann, Jacobs, & Baratta, 1993;

    Hofmann et al., 1992) principally through examinations of simplex matrices.

    Changes in predictive validity over time. Central to the current study is the definition

    that dynamic performance occurs when predictive validities change over time. If

    predictive relationships are temporally variant, continued validity assessment may be

    required. Research using this definition has focused on examinations of the criterion-

    related validity of predictors such as intelligence and psychomotor ability for predicting


    task performance over multiple time periods. While stability coefficients tend to decrease

    over time across studies (Deadrick & Madigan, 1990; Hanges et al., 1990; Henry & Hulin,

    1987; Hofmann et al., 1992, 1993), there is some debate as to the nature of changes in

    predictive validities. Some argue that dynamic criteria universally leads to a degradation in

    validity over time (Austin et al., 1989; Henry & Hulin, 1987, 1989; Hulin et al., 1990; Keil &

    Cortina, 2001). Whereas others suggest that the nature of change in predictive validities is

    determined by the predictor in question or external factors that may influence

    performance over time as evidenced in some studies where predictive validities either

    remained stable or increased with time (Ackerman, 1987, 1988, 1989, 1992; Barrett et al.,

    1985; Barrett & Alexander, 1989; Deadrick & Madigan, 1990; Hanges et al., 1990; Murphy,

    1989).

    Hulin et al. (1990) conducted a meta-analysis to determine if time was a source of

    systematic variance in test validities by utilizing literature on temporal ability-performance

    relationships that spanned organizational, educational, and developmental research. The

    authors found that time accounted for variance in predictive validities beyond the variance

    attributable to statistical artifacts. In general, predictive validities decreased monotonically

    over time. Of all the validity sequences analyzed, 44 out of 54 showed negative slopes for

    the regressions of predictive validity onto time.

    Keil and Cortina (2001) also expanded on Hulin et al. (1990) through the addition of

    potential moderators to examine changes in predictive validities over time. Furthermore,

    they tested the nature of the relationships using polynomial equations. Their findings

    provide strong evidence that validities do deteriorate over time as observed across

    predictors (i.e., cognitive ability, perceptual speed ability, and psychomotor ability), criteria


    (i.e. consistent and inconsistent task performance), and time periods (i.e. short-term and

    long-term performance). Patterns were also found that suggested ability-performance

    relationships began to decay in the early stages of task performance for both consistent and

    inconsistent tasks.

    Of particular interest are Keil and Cortina’s (2001) findings concerning curvilinear

    effects. Both quadratic and cubic effects were found for all three abilities under all

    moderating conditions. Keil and Cortina (2001) attributed the curvilinear relationships to

    a “Eureka effect” where individuals with high levels of ability maintain or steadily increase

    their level of performance over time, then come up with an insight that results in a sudden

    jump in performance. The Eureka effect can be captured by a bifurcation in the ability-

    performance variables. Keil and Cortina (2001) offered two alternatives for how the

    bifurcation could be utilized in research. Each bifurcation may cause a “different predictor

    to wane in importance such that it can be used to predict while performance remains on a

    given plateau” (Keil & Cortina, 2001, p. 689), or knowing when a bifurcation is likely to

    occur may inform researchers of the length of time they have before a predictor diminishes

    in utility.
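
    As a rough sketch of how such curvilinearity can be probed, the snippet below fits linear, quadratic, and cubic polynomials to a hypothetical validity-by-time series and compares their fit; it mirrors the logic of Keil and Cortina’s (2001) polynomial tests without reproducing their data or method details.

```python
import numpy as np

time = np.array([1.0, 2, 4, 8, 16, 32, 64])               # hypothetical occasions
validity = np.array([.45, .41, .36, .30, .28, .29, .31])  # hypothetical r's

for degree in (1, 2, 3):
    coefs = np.polyfit(time, validity, degree)
    predicted = np.polyval(coefs, time)
    ss_res = np.sum((validity - predicted) ** 2)
    ss_tot = np.sum((validity - validity.mean()) ** 2)
    print(degree, round(1 - ss_res / ss_tot, 3))   # R^2 for each polynomial degree
# A jump in R^2 from degree 1 to degree 2 or 3 signals curvilinear validity decay.
```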

    Murphy’s (1989) Dynamic Model of Performance

    While evidence was mounting in support of the dynamic criteria phenomenon,

    researchers began to speculate about the theoretical causes of changes in performance over

    time. The theoretical impetus for the current study is based on Murphy’s (1989) dynamic

    model of performance. In response to the growing acceptance that cognitive ability-

    performance relationships remained invariant over time (Schmidt et al., 1986), Murphy

    (1989) offered a model of job performance that focused on two classes of predictors:


    Abilities and dispositional variables. Murphy’s (1989) definition of abilities included both

    higher and lower order abilities such as general cognitive ability and perceptual speed.

    Dispositional variables included individual differences in personality, interests, values, and

    motivation. In the dynamic model of job performance, rank order changes in job

    performance over time and declining predictive validities are the result of fluctuations in

    activities requiring varying levels of either abilities or dispositional variables.

    Building on Ackerman’s three stages of skill acquisition as it applied to the

    workplace (See Appendix B for review of Ackerman’s model), Murphy (1989) posited the

    dynamic model of job performance as a progression between two distinct stages: The

    transition stage and the maintenance stage. During the transition stage, employees are

    faced with some manner of change. They may be new to a job or recently promoted, or an

    organizational intervention may have fundamentally changed the job duties required of the

    employee. In such cases of transition, the employee must heavily rely on the use of

    cognitive ability and sound judgment to learn the new duties, goals, and strategies for

    execution. In the maintenance stage, major requirements for the job are well-learned and

    do not draw heavily on cognitive ability to be performed. At this point, dispositional

    variables, such as personality and motivation, have a greater influence on job performance

    than cognitive ability.

    Deadrick and Madigan (1990) provided empirical evidence for Murphy’s (1989)

    dynamic performance model in their attempt to distinguish between the predictive

    influences of employee experience. Concerned that the standard definitions of dynamic

    criteria did not adequately distinguish between actual changes in job performance with

    changes in the performance evaluation context, Deadrick and Madigan (1990) defined


    criterion changes as attributes of either individual differences (i.e., performance

    consistency), the organizational context (i.e., evaluation consistency), or changes in

    measurement procedure (i.e., measurement reliability). To test the performance

    consistency definition, Deadrick and Madigan (1990) collected periodic measures of both

    objective (i.e. weekly output) and subjective (i.e. supervisory ratings of production

    quantity) performance for sewing machine workers over a period of six months.

    Distinctions were made between both experienced and inexperienced employees.

    Predictors also included cognitive and psychomotor ability. The results for performance

    consistency strongly supported the simplex pattern for stability measures regardless of previous

    experience, but failed to do so for the tests of predictive validity, where the validity of cognitive

    ability actually began to increase after training and that of psychomotor ability remained relatively

    stable. The conflicting results were interpreted as evidence of Murphy’s (1989) dynamic

    model of job performance, as dispositional variables such as motivation were proposed to

    account for the changing patterns in predictive validities.

    Hanges et al. (1990) applied the interactionist perspective of psychology to

    Murphy’s (1989) dynamic performance model as a means to account for aberrations to the

    simplex pattern in stability measures. According to interactionist psychology, behavior is

    not merely determined by either the person or situational variables but is a function of the

    interaction between person and situation. Furthermore, a simplex pattern is expected

    when the stability of performance over perceptually different situations is explored,

    however, when the situations are similar, behavior should remain relatively stable over

    time.


    Murphy’s (1989) dynamic model of performance is clear regarding predictive

    validities during the transition phase, but the maintenance stage can be confounded by

    both stable and dynamic dispositional variables. Hanges et al. (1990) maintained that the

    interactionist perspective can help clarify the effects situational variables have on

    performance during the maintenance stage. For example, as situations become more stable

    over time, such as those found in the maintenance phase, an individual’s performance in

    that situation would become stable as well. Hanges et al. (1990) empirically evaluated the

    interactionist perspective by examining student evaluations of university professors over

    time (i.e., the person), the particular courses taught by the professors (i.e., the situation),

    and the professors who taught the same course over time (i.e., the person-situation

    interaction). Results showed that a simplex pattern was observed in the situation and

    person analyses, but not in the person-situation interaction analysis, thus supporting the

    utility of the interactionist perspective in predicting the conditions where a simplex pattern

    may or may not appear.

    Criticisms and Limitations of the Previous Dynamic Criteria Literature

    While Barrett et al. (1985) did concede that, in some cases, predictive validities may

    deteriorate over long time spans, the authors speculated that dynamic criteria are quite

    plausibly the result of changes in the abilities and skills of the worker (i.e. the changing-

    subjects model; Adams, 1957) or changes in the job itself (i.e. the changing-task model;

    Woodrow, 1938a; 1938b; Fleishman, 1960; 1972). The changing-subjects model is based

    on the hypothesis that abilities change over time even as the tasks remain relatively stable.

    The changing-task model assumes that the structure of the task is the variable component

    that undergoes change during skill acquisition (Alvares & Hulin, 1972, 1973; See Appendix


    C for review of changing-task and changing-person models of performance). In the few

    instances where they did find significant change over time, Barrett et al. (1985) felt that

    dynamic criteria were more the result of methodological artifacts than systematic

    variation. The authors pointed out that the studies used to support the dynamic criteria

    phenomenon were so rife with methodological flaws that any fluctuations in predictor-

    criteria validities were most likely the result of a number of study design related artifacts,

    including: (a) temporal unreliability of the criterion, (b) contamination from unmatched

    samples (i.e., criterion scores were based on individuals with differing levels of experience

    and tenure), and (c) the lack of a standardized measure of performance. In light of these

    findings, Barrett et al. (1985) claimed that the error variance caused by the unreliability of

    the criterion measure probably accounted for a majority of fluctuations in validity

    coefficients. Unfortunately, many of the studies that followed Barrett et al.’s (1985)

    literature review continued to suffer from the design flaws noted by the authors.

    In terms of limitations, studies that proposed the ubiquity of dynamic predictive

    validities across all forms of ability did not distinguish between classes of individual

    differences and, thus, failed to establish boundary conditions for examining the predictor-

    criteria relationships over time. For example, in their meta-analysis to determine the

    systematic variability of predictive validities as a function of time, Hulin et al. (1990)

    gathered data from areas that spanned the research regarding the prediction of

    performance (e.g. experimental studies, studies of academic performance, and growth and

    development research) but did not provide an inclusion criterion to classify the type of

    predictors used. Consequently, an entire host of individual differences ranging from

    psychomotor skills to aerial orientation were lumped together. While the results of Hulin


    et al.’s (1990) meta-analysis provided evidence that the majority of predictor-criterion

    relationships systematically follow a decreasing temporal trend, valuable information may

    have been lost through the act of indiscriminately clustering predictors.

    Given the implications that changes in predictive validities over time have on human

    resource practices such as selection, promotion, and interventions, the lack of research

    dedicated to dynamic criteria under specific boundary conditions is surprising. If

    predictive validities associated with construct-based selection measures change over time,

    the specific predictability trends should be evaluated to improve decision-making

    procedures and the utility of the selection device under consideration. By grouping

    predictors under construct-based selection measures, both initial performance and

    subsequent performance curves can be used to inform longitudinal policy decisions. Henry

    and Hulin (1987) further articulated the point by stating that “the failures of researchers to

    develop models that address long-term predictions and build into predictive equations

    measures that will reflect expected changes in the abilities of the selected employees or

    students is a source of serious concern” (Henry & Hulin, 1987, p. 461). Currently, very little

    empirical support has been provided to determine the temporal validities of common

    selection devices and their underlying constructs.

    Another limitation found in the dynamic performance literature concerns the

    operational definitions of the criteria. Many of the studies used to support the dynamic

    criteria phenomenon utilized experimental designs conducted in laboratories where

    analysis centered on a task performance criterion. For example, all criteria used in Kiel and

    Cortina’s (2000) study were characterized as either task performance or GPA, with only

    one criterion indicative of job performance ratings (i.e. Deadrick & Madigan, 1990).


    Substantively speaking, job performance differs from task performance in that job

    performance is multidimensional and made up of many tasks, while task performance is

    typically represented by a single facet of the job. Researchers have questioned the

    generalizability of using tasks as a criterion, especially those measured in short time frames

    (i.e. a matter of minutes), noting that they bore little resemblance to job performance

    criteria measured in the environment of an applied setting (Barrett et al., 1989; Farrell &

    McDaniel, 2001).

    As a consequence, experiments used to examine skill acquisition in task

    performance over time consist mostly of student samples or of individuals taken out of the

    job context. Such studies have a tendency to isolate the individual from real and complex

    high-stakes scenarios where adapting, understanding, and successfully performing the

    elements that comprise a job is imperative. In cases where actual workers were included

    in an applied context, inquiries into dynamic criteria failed to design predictive studies

    with an expressed point of entry into a new job, a training period, a new position, or after

    an organizational intervention. Such studies, instead, capitalized on samples made up of

    individuals with differing experience levels, and, possibly, in separate employee stages (i.e.

    transitional or maintenance stages). To ensure shared equivalent histories, samples

    should consist of a cohort that is in the same or comparable level of entry, training, or

    promotion.

    Finally, while increasing empirical evidence has been used to verify the changes in

    predictive validities over time through the examinations of simplex matrices (Ackerman,

    1987, 1988, 1992; Deadrick & Madigan, 1990; Henry & Hulin, 1987; Hofmann et al., 1992,

    1993; Hulin et al., 1990), the use of simplex patterns to support dynamic criteria suffers from


    many limitations. For instance, the simplex pattern provides little information about

    intraindividual change (i.e. changes within an individual) over time and does not shed light

    on the nature of the pattern changes. A growing body of research has begun to transition

    from the use of autoregressive simplex patterns which primarily allow for modeling the

    effects of past performance scores on future performance scores, to investigations of

    intraindividual change in latent trajectories (Deadrick, Bennett, & Russell, 1997; Hofmann

    et al., 1992, 1993; Ployhart & Hakel, 1998; Stewart & Nandkeolyar, 2006; Sturman &

    Trevor, 2001; Thoreson, Bradley, Bliese, & Thoreson, 2004; See Appendix D for a full

    review of latent performance trajectories in examining dynamic criteria).

    While not fully absorbed into the mainstream literature, the notion that job

    performance varies over time for a given employee is becoming increasingly accepted in

    the field of I-O psychology. Furthermore, the relative ubiquity of the simplex pattern in

    almost all studies concerning ability-performance relationships over time, and the

    examinations of latent performance trajectories have contributed to a firm empirical

    foundation for support of the dynamic criteria phenomenon. In light of these findings, it

    seems necessary to reevaluate the practice of using a single indicator of performance in

    validation studies, and address the aforementioned criticisms and limitations in an effort to

    move toward a more accurate understanding of how selection devices fare over time. By

    not examining dynamic criteria in relation to even the most common of selection methods,

    I-O researchers may limit key conceptual understanding of the evolutionary sources of

    variance in performance and forgo potential increases in economic gains from proper

    selection-method choice.


    CHAPTER 2

    CURRENT STUDY

    The current study draws heavily on Murphy’s (1989) dynamic model of

    performance to address the limitations concerning boundary conditions, study design,

    participants utilized, and operational definitions of criteria. Consistent with Murphy’s

    distinction between ability and dispositional variables, two sets of analyses were

    conducted. The first involved selection devices used to measure cognitive ability as

    representative of the ability variables identified by Murphy. The second set of analyses

    consisted of Big Five personality inventory dimensions as representative of dispositional

    variables. In order to capture the progression from the transitional stage to the

    maintenance stage, criterion-related predictive validity studies containing an initial

    starting point of entry into a new job, a training period, a new position, or an

    organizational intervention were identified. As Murphy’s model is specifically associated

    with conditions within a business environment, participants and the criteria of interest

    were represented by actual workers in the field appraised through job performance

    measures or participants in a real job training scenario. Stable work cohorts

    at the same or a comparable level of entry, training, or promotion were used to

    address Barrett et al.’s (1985) criticism of contamination from unmatched samples and

    ensured equivalent sample histories. Furthermore, meta-analytic techniques

    were used to address Barrett et al.’s (1985) claim that dynamic criteria are the product of

    temporal unreliability, range restriction, and insufficient power.


    Purpose

    The purpose of this study was to separately determine the criterion-related

    validities of common selection devices, namely cognitive ability measures and personality

    inventories, in relation to job performance over time through the use of meta-analytic

    techniques. When predictive validities of the common selection devices were, indeed,

    dynamic, further steps were taken to determine the nature of the performance trends in

    terms of directional changes in magnitude and linearity. The following section is an

    overview of the two selection devices (i.e. cognitive ability tests and personality

    inventories) chosen for the current study, their relation to the job performance criterion as

    determined by previous research, and an examination of how time has been explored as a

    source of systematic variance in test validities for each predictor classification.
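
    For orientation, the following is a minimal “bare-bones” sketch of the psychometric meta-analysis logic applied here: a sample-size-weighted mean validity is computed, and the observed variance is decomposed into sampling error and residual (potentially true) variance. The formulas follow standard Hunter-Schmidt treatments; the r’s and N’s are hypothetical.

```python
import numpy as np

r = np.array([.30, .25, .42, .18, .33])   # observed validity coefficients
n = np.array([150, 80, 200, 60, 120])     # study sample sizes

r_bar = np.sum(n * r) / np.sum(n)                     # weighted mean validity
var_obs = np.sum(n * (r - r_bar) ** 2) / np.sum(n)    # observed variance of r
var_err = (1 - r_bar ** 2) ** 2 / (n.mean() - 1)      # expected sampling-error variance
var_residual = max(var_obs - var_err, 0.0)            # variance left for moderators

print(round(r_bar, 3), round(var_residual, 5))
```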

    Cognitive Ability Measures

    The one consistent finding concerning the dynamic nature of the predictor-criterion

    relationship is that time-lagged correlations between ability measures and performance

    have a tendency to deteriorate over increasing intervals (Henry & Hulin, 1987; Hulin et al.,

    1990; Keil & Cortina, 2001). The majority of ability measures in the dynamic performance

    literature are generally characterized by assessments of cognitive ability (Alvares & Hulin,

    1972; Bass, 1962; Ghiselli & Haire, 1960; Fleishman & Hempel, 1954, 1955; Fleishman &

    Rich, 1963; Humphreys, 1968; Lin & Humphreys, 1977; Parker & Fleishman, 1959),

    psychomotor ability (Ghiselli & Haire, 1960; Fleishman, 1960; Fleishman & Hempel, 1954,

    1955; Fleishman & Rich, 1963; Hinrichs, 1970; Parker & Fleishman, 1959) and sensory

    perception (Ackerman, 1988, 1990; Ackerman & Kanfer, 1993; Ackerman, Kanfer, & Goff,

    1995; Fleishman, 1960; Fleishman & Hempel, 1954, 1955; Fleishman & Rich, 1963;


    Hinrichs, 1970; Parker & Fleishman, 1959; Powers, 1982) in relation to experimental task-

    performance and educational assessments over time.

    Due to the inherent differences between controlled task-based experiments and the

    workplace, it is difficult to fully generalize task-proficiency as a criterion to job

    performance. Unfortunately, there are surprisingly few studies that explore the criterion-

    related validity of general mental ability (GMA or g) over subsequent measures of job

    performance. The purpose of this portion of the study is to contribute to the dynamic

    criteria literature by exploring the dynamic nature of individual GMA-job performance

    validities over time by addressing the following questions: Are GMA-performance

    validities, indeed, dynamic? If so, what is the nature and direction of the validity patterns

    when plotted across time, and, finally, what implications do systematically changing

    validity patterns have on validity generalization and utility issues? The following is an

    overview of the current state of the literature regarding the use of GMA as a selection tool

    and the predictive validities found in terms of job performance.

    Interest in the relationship between cognitive ability and job performance has

    predominately been approached in I-O psychology through the use of Spearmanian

    frameworks (Lang, Kersting, Hülsheger, & Lang, 2010). In 1904, Charles Spearman

    proposed a two-factor theory of abilities that included general cognitive ability (g) and one

    or more specific abilities (s). The conceptualization of GMA was used to explain the

    positive manifold present across a set of ability tests. Specific abilities refer to unique test

    properties that correspond to the variance in ability tests not attributed to a latent GMA

    construct or error. When applied to certain factor analytic techniques, cognitive ability

    tests reveal a multiple factor solution, but second-order factor analyses based on the


    correlation matrices of the first-order dimensions do commonly result in a single factor

    (Carroll, 1993). As a result, GMA is characterized as a higher-order factor that accounts for

    the variance in narrower first-order content ability factors.
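
    As an illustration of that higher-order idea, the sketch below extracts the first principal component from a hypothetical correlation matrix among first-order ability dimensions; this is only a principal-component approximation to a formal second-order factor analysis, and the correlations are invented for the example.

```python
import numpy as np

# Hypothetical intercorrelations among four first-order ability dimensions.
R = np.array([[1.00, 0.55, 0.48, 0.50],
              [0.55, 1.00, 0.52, 0.46],
              [0.48, 0.52, 1.00, 0.44],
              [0.50, 0.46, 0.44, 1.00]])

eigvals, eigvecs = np.linalg.eigh(R)                  # eigenvalues in ascending order
g_loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])    # loadings on the largest component
print(np.round(np.abs(g_loadings), 2))                # all dimensions load on one factor
```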

    Research findings have clearly established GMA as an important predictor of job

    performance (Campbell, Gasser, & Oswald, 1996; Ree & Earles, 1992; Schmidt & Hunter,

    1998). From a theoretical perspective, GMA is linked to general models of job performance

    by directly influencing both declarative and procedural knowledge. According to

    Campbell’s (1990) model of job performance, declarative and procedural knowledge are

    determinants of job performance; thus, GMA influences the level of job performance

    indirectly (e.g. Ackerman, 1987; Schmidt & Hunter, 1993, 1998; Schmidt et al., 1986). As

    such, the acquisition of knowledge and the necessary skills to perform a job during training

    and the maintenance of that knowledge and those skills throughout an employee’s tenure are highly

    influenced by GMA (Jensen, 1998; Ree, Earles, & Carretta, 1998). Abundant empirical

    evidence demonstrates that GMA predicts training and job performance across numerous

    jobs and job families (Carretta, Perry, & Ree, 1996; Chan, 1996; Crawley, Pinder, & Herriot,

    1990; Hunter & Hunter, 1984; Ree & Earles, 1992; Roth & Campion, 1992; Salgado, 1995;

    Schmidt & Hunter, 1998; Vineburg & Taylor, 1972). For example, Hunter and Hunter

    (1984) conducted a broad-based meta-analysis to assess the validity of GMA for both

    training and job performance criteria. Their analysis included several hundred jobs across

    numerous job families, as well as reanalysis of data from previous studies. The authors

    estimated a true validity of GMA as .54 for training criteria and .45 for job performance

    with the predictive validity of GMA increasing as a function of job complexity.


    Cognitive Ability and Dynamic Criteria

    The substantial body of research conducted to examine the predictive validity of

    GMA and job performance (Hunter & Hunter, 1984; Jensen, 1986; Ree & Earles, 1992; Ree

    et al., 1994; Schmidt, 2002; Schmidt & Hunter, 1998) treated performance as a stable

    criterion, and therefore collected data during a single period of time, used a cross-sectional

    sample, or validated the measures through concurrent design, thus resulting in a lack of

    evidence to support the notion of unstable predictive validities in GMA-job performance

    relationships. In light of the limited resources in applied psychology, a number of studies

    have used cognitive based entrance exams such as the Scholastic Aptitude Test (SAT, e.g.,

    Butler & McCauley, 1987; Mael & Hirsch, 1993) and the Law School Admission Tests (LSAT,

    e.g., Hathaway, 1984; Powers, 1982) as predictors of Grade Point Average (GPA) over

    subsequent semesters or years. Other researchers have relied on previous GPA or aptitude

    composites as predictors of future GPA (Humphreys, 1960, 1968; Humphreys & Tabet,

    1973; Lin & Humphreys, 1977; Powers, 1982; Winterbottom, Pitcher, & Miller, 1963).

    Overall, results showed a general deterioration of predictive validities over time, but this

    finding is not consistent across all educational studies (e.g. Powers, 1982; Winterbottom et

    al., 1963). Barrett and Alexander (1989) attributed the mixed results in educational

    studies and the “fleeting nature of the prediction of grades” to incomparable metrics for the

    criteria. They argued that GPAs from different schools, across different courses, and

    curricula did not comprise the same measurement scale.

    Much of the dynamic criteria literature produced from experimental psychology

    utilized task performance as the central criteria (Ackerman, 1986, 1988, 1992; Ackerman &

    Kanfer, 1993; Ackerman et al., 1995; Ackerman & Woltz, 1994; Fleishman & Hempel, 1954,


    1955, Fleishman & Rich, 1963; Keil & Cortina, 2001; Parker & Fleishman, 1959).

    Recognizing the limitations associated with the use of task performance as a criterion, a

    number of studies introduced criteria that directly represented the elements comprising

    job performance (e.g. Farrell & McDaniel, 2001; Kolz, McFarland, & Silverman, 1998;

    Schmidt et al., 1988). Unfortunately, these studies suffered from the use of cross-sectional

    data, which, in the context of dynamic performance, provides no opportunity for examining

    within-person changes in individual differences (Hulin et al., 1990) and relies on two

    critical assumptions: That the mean level of the characteristic does not vary with time (i.e.

    cohort equivalence), and that characteristics of the hiring process remain stable over time

    (Sturman, 2007). If the two assumptions are not met, specification error may distort the

    results. Of the handful of studies that do examine changes in GMA-job performance relationships

    using a longitudinal design (i.e. Bass, 1962; Deadrick & Madigan, 1990; Deadrick et al.,

    1997; Ghiselli & Haire, 1960), mixed results have been found in regard to the directionality

    of the predictive validities over time.

    Personality Tests

    Inquiries into the phenomena of systematically decaying predictor-criteria

    relationships primarily focus on individual differences in abilities as predictors of

    performance (Austin et al., 1989; Henry & Hulin, 1987; Hulin et al., 1990), but little effort

    has been made to determine if the predictive validities of dispositional variables, such as

    personality, behave in a similar fashion. Henry and Hulin (1987) claimed that the principle

    of decreasing predictive validities can be found in nearly every longitudinal study involving

    any type of individual differences, including personality. Unfortunately, longitudinal

examinations of personality-performance relationships are rare, making it difficult to verify


    Henry and Hulin’s (1987) claim. The paucity of information regarding the influence of time

    on personality-performance relationships has left a vacuum in the dynamic criteria

    literature that requires further exploration. The purpose of this portion of the study is to

    fill in the gaps concerning the dynamic nature of individual personality trait-performance

validities over time by answering the following questions: Are personality-performance

    validities, in fact, dynamic? If so, what is the nature and direction of the validity patterns

    when plotted across time, and, finally, what implications do systematically changing

    validity patterns have on validity generalization and utility issues? The following is an

    overview of the current state of the literature regarding the use of personality inventories

as selection devices and their predictive validities with respect to job performance.

    Prior to the 1990s, personality testing was generally considered an inferior method

for selecting employees. This view stemmed from low validities in personality-job performance relationships (Hogan, 2005; Schmitt, Gooding, Noe, & Kirsch, 1984) and the

    lack of standardized frameworks to support and organize the dizzying array of available

    personality measures (Barrick & Mount, 1991; Hurtz & Donovan, 2000; Ones, Mount,

    Barrick, & Hunter, 1994). Renewed interest in personality inventories began as mounting

    evidence of a five-dimension factor solution emerged across qualitatively different studies

    (Cattell, 1946; Digman & Inouye, 1986; Fiske, 1949; Goldberg, 1981, 1990; John, 1990;

    McCrae & Costa, 1985, 1987; Peabody & Goldberg, 1989; Saucier & Goldberg, 1996; Tupes

    & Christal, 1961). The prominence of a five-factor model of personality, later dubbed the

    “Big Five” by Goldberg (1981), resulted in the creation of multiple personality inventories

    ranging from Trait Descriptive Adjectives (TDA, Goldberg, 1990, 1992), questionnaires

(NEO Personality Inventory Revised, NEO PI R, Costa & McCrae, 1992; NEO FFI, Costa &


    McCrae, 1989, 1992), and short phrase assessments (Big Five Inventory, BFI, John &

Srivastava, 1999). The prototypical Big Five personality factors are commonly identified

as Extraversion, Agreeableness, Conscientiousness, Emotional Stability (also referred to by its reverse pole, Neuroticism), and Openness to Experience. Each broad personality trait is composed of several narrow facets varying in number and substance depending on the measure in question.

    Conceptually, Extraversion (Factor I) implies an energetic disposition toward the

    social and material world, and refers to the extent to which a person is talkative, lively,

    assertive, excitable, and emotionally positive. Agreeableness (Factor II) contrasts a

    prosocial and communal orientation with antagonism, and refers to the extent to which a

    person is good-natured, helpful, trusting, and cooperative. Conscientiousness (Factor III)

    describes socially prescribed impulse control that facilitates task and goal directed

    behavior, such as thinking before acting, delaying gratification, and following rules.

Conscientiousness also refers to the extent to which a person is consistent, organized,

    careful, self-disciplined, and responsible. Neuroticism (Factor IV) contrasts emotional

    stability and even-temperedness with negative emotionality, such as feelings of

    nervousness and anxiety. Finally, Openness to Experience (Factor V) describes the

    breadth, depth, originality, and complexity of an individual’s mental and experiential life.

    People high in Openness are commonly described as imaginative, independent, and having

    a preference for variety (John & Srivastava, 1999).

    The application of the five-factor model as a legitimate selection tool coincided with

    notable meta-analytic findings from Barrick and Mount (1991) and Tett, Jackson, and

    Rothstein (1991). Both studies identified Conscientiousness as one of the few viable Big


    Five personality traits for predicting job performance. Conscientiousness has been shown

    to provide consistent positive associations with job performance across a multitude of

    occupations and job situations (Barrick & Mount, 1991; Barrick, Mount, Judge, 2001; Hurtz

& Donovan, 2000; Salgado, 1997; Tett et al., 1991; Vinchur, Schippmann, Switzer, & Roth,

1998). Furthermore, Conscientiousness tests are recognized as adding an 18 percent increase in validity beyond cognitive ability alone when predicting job performance (Schmidt & Hunter, 1998).

    Aside from Conscientiousness, the rest of the superordinate Big Five personality

    dimensions have shown little generalizable predictive relationships with performance

    across jobs, and in many cases validities approach zero (Barrick & Mount, 1991; Barrick et

    al., 2001; Hurtz & Donovan, 2000; Salgado, 1997). However, there are specific occupations

and situations where personality traits, such as Extraversion and Openness, manifest as meaningful predictors. Extraversion, for instance, does seem to have particular salience for

    sales effectiveness (Barrick, Stewart, & Piotrowski, 2002; Vinchur et al., 1998). Likewise,

    Openness has been linked to the ability to adapt to changing work roles and demands

    (Stewart & Nandkeolyar, 2006). Judge, Thoresen, Pucik, and Welbourne (1999) reported a

    statistically significant positive relationship between Openness and a manager’s ability to

    cope with various organizational changes, including mergers, acquisitions, and downsizing.

Similarly, LePine, Colquitt, and Erez (2000) found that Openness helped participants adapt

    to changing task demands in a computerized decision-making simulation.

    Personality and Dynamic Criteria

    While very little empirical data have been gathered regarding the temporal nature

    of the personality-performance relationship, there are two competing perspectives that can


    be used to hypothesize the pattern of directionality and linearity of the projected validity

    coefficients. The first perspective involves the precedent of a universal simplex pattern set

    by previous inquiries into dynamic criteria. Humphreys (1985) argued that the simplex

    pattern of correlations can be found in any data pertaining to individual differences and

    performance over time. If personality dimensions do follow the assumptions of a simplex

    pattern, predictive validities would degrade over time in a manner consistent with results

    reported for time-lagged ability-performance estimates. In their examination of GMA, the

Big Five personality dimensions, and career success, Judge, Higgins, Thoresen, and Barrick

    (1999) reported that each Big Five trait produced decreasing validities when related to

    career success across five time intervals. Burrus (2006) also found evidence of decreasing

    predictive validities for Conscientiousness over 16 task trials given to students in a

    laboratory study designed to examine dynamic performance. The study did suffer from key

    limitations: Simulated tasks did not reflect the complexity and multi-dimensionality of job

    performance, sample size was not large enough for adequate power, and the trials only

    took place over the course of a week.

    The second perspective is based on claims that predictive validities for personality

    dimensions actually increase over time as opposed to following a simplex-like pattern.

Such alternative views originate from Helmreich, Sawin, and Carsrud's (1986) examination of

    the strength of the personality-performance relationship across time within a relatively

consistent job context. According to Helmreich et al. (1986), cognitive ability was an important determinant of early performance, but its predictive value eventually declined. The non-cognitive

    measures (i.e. measures of achievement motivation and interpersonal orientation), on the

    other hand, increased in predictive validity from a relatively low starting point. Helmreich


    et al. (1986) attributed the switch in predictive magnitude from cognitive ability to

    personality to, what they described as, the “honeymoon effect.” The honeymoon effect is

    characterized as the time period early in a job when everything is new and exciting. During

    this period the employee utilizes cognitive ability to absorb the organization’s culture,

    values, work systems, and the necessary knowledge and skills to perform the job. Once the

    novelty begins to wane some employees become increasingly disenchanted. At this point,

    personality becomes more salient as a predictor of job performance.

Murphy (1989) expanded on Helmreich et al.'s (1986) conceptualization of the

    honeymoon effect in his model of dynamic performance. Progression from the transition to

the maintenance stage, in essence, represents a shift in an employee's reliance from GMA to dispositional variables. In this case, the rank order of performance scores is argued to change with time, but movement from the transition to the maintenance stage

    results in increasing personality-performance validity coefficients as opposed to simplex-

    like decreases.

    While empirical examinations of Murphy’s (1989) model of dynamic performance

    solely focused on the relationships between GMA and job performance over time (Deadrick

& Madigan, 1990; Deadrick et al., 1997), Thoresen et al. (2004) was the first to draw on

    Murphy’s (1989) distinction between transition and maintenance stages to examine Big

    Five personality traits in the context of individual changes in latent performance

    trajectories over time. Thoresen et al. (2004) reported that Openness and Agreeableness

    were positively associated with mean performance and performance change over time

    during the transitional stage but not in the maintenance stage. Conscientiousness and

    Extraversion were positively associated with mean performance and performance change


    over time in the maintenance stage but not in the transition stage. Unfortunately, Thoresen

et al. (2004) did not longitudinally examine a single sample as its members gradually transitioned from one stage to the other but, instead, compared two samples, each characterized by either the transition or the maintenance scenario, over four three-month periods. A single sample may

    have revealed a more complete picture of the latent trajectories of the Big Five dimensions.

    Thoresen et al.’s (2004) results did, however, provide empirical evidence that

    certain predictive validities did, in fact, increase over time, at least within the designated

    employment stage. Of particular interest is the finding that personality traits play separate

    roles based on the situational characteristics found in either the transition or maintenance

    stages. For example, Openness has been associated with an individual’s responsiveness to

    changes in job demands (Stewart & Nandkeolyar, 2006), and may play a more significant

    role in the transitional stage where an individual is faced with novel challenges. On the

    other hand, Conscientiousness is argued to positively influence job performance for both

stages due to its generalizability across occupations and job situations (Barrick & Mount, 1991; Barrick et al., 2001; Hurtz & Donovan, 2000; Mount & Barrick, 1995;

    Salgado, 1997), but may produce positively sloping performance growth trajectories as

employees transition into the maintenance stage as suggested by Helmreich et al. (1986)

and Murphy (1989). Zyphur, Bradley, Landis, and Thoresen (2008) also found Conscientiousness to increase in predictive validity in a study examining the extent to which cognitive ability and Conscientiousness predict initial GPA and changes in


performance over the course of college students' careers. Consistent with Murphy's

    (1989) theoretical assertions, cognitive ability did predict initial performance, but

beyond the third semester, Conscientiousness became a better predictor of student performance than cognitive ability.

    The influence of time on the relationship between common selection devices

    and job performance remains elusive in the I-O literature. A majority of validation

    studies utilize concurrent designs over predictive methods and very rarely compare

    estimates at separate time points. Studies that have longitudinally examined

    construct specific predictive validities have produced mixed results as to the

    directionality of coefficient patterns, or have been plagued by study artifacts. The

    goal of the current study was to clarify the aforementioned issues by meta-

analytically examining the criterion-related validities of two common selection

    instruments (i.e. GMA and FFM measures) in relation to performance over time. To

date, no study has provided a quantitative review of either the degree to which time contributes to observed variance in validities for distinct constructs within a real work context or the systematic patterns of change derived therefrom.

    Scientific progress in understanding the nature of the dynamic criteria

phenomenon is optimally based on the evaluation and extension of theoretical and

    empirical findings from within-person data. Cross-sectional designs can rely on

untenable assumptions and are fundamentally limited for understanding individual-level change processes; however, time-bound predictive validity studies, especially those containing multiple time points through a longitudinal design, provide an

    optimal basis for describing change patterns. Unfortunately, longitudinal


    examinations of changes in predictive validities are relatively scarce due to the time,

    resources, and effort required. In the case of the current study, time-bound

    criterion-related validity studies, consisting of either single or multiple time points,

    were integrated to create generalized validity estimates across a progressive

timeframe through meta-analytic techniques.

    The meta-analysis approach also allowed for the examination of moderators

    that are difficult to examine in primary studies alone. In the case of the current

study, the principal moderator of interest was time. Time points from each primary

    study were progressively plotted to establish the directional trends in changing

    validities through the use of weighted least squares (WLS) regression. Finally,

    polynomial terms were introduced into the regression equation in an effort to

    model linear and curvilinear trends in the relationships between time and the

    indicated predictor-performance coefficients.
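For illustration, a minimal sketch of this WLS procedure appears below (Python with statsmodels and invented toy values; the study's actual analyses were conducted with SAS-based routines, so the variable names and inputs here are assumptions, not the study's code):

```python
# Hypothetical Python sketch of the WLS trend analysis described above.
# Toy values only; the study's analyses used SAS-based procedures.
import numpy as np
import statsmodels.api as sm

days = np.array([30, 90, 180, 365, 730, 1095], dtype=float)  # elapsed time (days)
r = np.array([0.38, 0.35, 0.31, 0.27, 0.24, 0.22])           # validity coefficients
n = np.array([120, 85, 150, 200, 95, 60], dtype=float)       # study sample sizes

# Linear trend: each coefficient weighted by its study sample size.
linear = sm.WLS(r, sm.add_constant(days), weights=n).fit()

# Curvilinear trend: add a squared (polynomial) time term.
X_quad = sm.add_constant(np.column_stack([days, days ** 2]))
quadratic = sm.WLS(r, X_quad, weights=n).fit()

print(linear.params)      # intercept and linear slope
print(quadratic.params)   # intercept, linear, and quadratic terms
print(linear.rsquared, quadratic.rsquared)
```

Comparing the fit of the linear and quadratic models in this fashion is what allows simplex-like decay (a negative linear slope) to be distinguished from curvilinear change.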


    CHAPTER 3

    METHODS

    In the current study, a separate meta-analysis was conducted for each of the

designated selection devices (i.e., GMA and Big Five personality inventories) in an effort to establish construct-based boundary distinctions; combining the predictors in a single analysis would otherwise confound valuable insights into the nature of dynamic performance. The meta-

    analyses were used to examine the time-bound relationships between the indicated

    selection instrument as a predictor and a performance criterion specific to work-related

    samples. Further analysis consisted of examinations of the data categorized by criterion

    type (i.e. training and job performance).

    Literature Search per Selection Device

    Cognitive Ability Measures: An extensive literature search was conducted to identify

    studies with explicitly time-bound predictive validity coefficients for GMA measures and

    performance. First, meta-analyses conducted by Hunter and Hunter (1984), Levine,

    Spector, Menon, Narayanan, and Cannon-Bowers (1996), Schmitt et al. (1984), and Schmidt

    and Hunter (1998) were used to locate previously identified criterion-related validity

    studies that utilized some form of job or training performance as a criterion and GMA as a

    predictor. Second, published studies were identified using a computer-based literature

search in PsycInfo, Business Source Complete, and ProQuest Dissertations and Theses

    using keywords such as cognitive ability, general mental ability, intelligence, g, Armed

Forces Qualification Test, Wonderlic, and the General Aptitude Test Battery with


    performance, job performance, training performance, selection, promotion, and validation.

    Search items included peer-reviewed articles, popular-press articles, books, edited book

    chapters, and unpublished dissertations. Third, studies were identified using a manual

    search in the following journals: Journal of Applied Psychology, Personnel Psychology,

    Academy of Management Journal, Human Performance, Journal of Management, and

    Organizational Behavior and Human Decision Processes. The literature search yielded an

    initial total of 4,245 articles, reports and dissertations.

    Personality Inventories: A comprehensive literature search was conducted to

identify studies with explicitly time-bound predictive validity coefficients published between January 1992 and September 2012. According to Hurtz and Donovan (2000),

    previous meta-analyses that examined the role of Big Five personality dimensions in

relation to job performance (i.e., Barrick & Mount, 1991; Salgado, 1997; Tett et al., 1991) mapped personality predictors not explicitly designed to measure Big Five dimensions onto actual Big Five dimensions, a practice that can potentially threaten construct validity and lead to inaccurate conclusions. The year 1992 marks the beginning of the development of

    empirically validated Big Five personality inventories (e.g. NEO Personality Inventory,

    Costa & McCrae, 1992; Goldberg’s Big Five markers, Goldberg, 1992) for application in a

business context. First, the meta-analysis conducted by Hurtz and Donovan (2000) was used

    as a starting point to identify relevant criterion-related validity studies. The authors

    limited their article search to studies with established Five-Factor Model inventories as

    predictors and performance as the criterion. Second, published studies were located using

a computer-based literature search in PsycInfo (1992–2012), Business Source Complete (1992–2012), and ProQuest Dissertations and Theses using keywords such as personality,


    five factor model, big five, conscientiousness, extraversion, emotional stability, neuroticism,

    openness, and agreeableness, with performance, job performance, training performance,

    selection, promotion, and validation. Search items included peer-reviewed articles, popular-

    press articles, books, edited book chapters, and unpublished dissertations. Third, studies

were identified using a manual search in the following journals for the previously designated

    period of time: Journal of Applied Psychology, Personnel Psychology, Academy of

    Management Journal, Human Performance, Journal of Management, and Organizational

    Behavior and Human Decision Processes. The literature search yielded an initial total of

    1,519 articles, reports and dissertations.

    Criteria for Inclusion

    For a study to be included in the present meta-analyses, seven criteria had to be

    met:

    1. The study had to use actual workers as participants. Educational studies were

    generally excluded in cases where experiments were conducted on students over the

    duration of a college semester. However, students in educational settings that could be

construed as vocational training (e.g., medical school, trade school, specialty training) were

    considered.

    2. The study had to include one of the two selection devices (i.e., Cognitive Ability

    Measures, and Personality Inventories) as a predictor of interest. Due to the relatively

stable nature of both general intelligence and personality, it was not necessary for the

    researchers to gather either cognitive ability or personality inventory data during the

    hiring or promotion process. Data pertaining to cognitive ability or personality could be

    collected at any point and used as a predictor.


    A number of stipulations for personality inventories were also required: The

    personality measures used for each study had to either have been explicitly designed a

    priori to measure one or all dimensions of the Five Factor Model (i.e., Extraversion,

Conscientiousness, Emotional Stability/Neuroticism, Agreeableness, and Openness to

    Experience) or sizeable empirical evidence had to show that a measure could be

    significantly reduced to load on the Big Five dimensions. Five established a priori

    measures were identified in the studies collected for the present analysis: the NEO

    Personality Inventory Revised (NEO-PI-R) and the five factor inventory versions (NEO-FFI;

Costa & McCrae, 1992), the International Personality Item Pool (IPIP; Goldberg, 1992),

    the Personal Characteristics Inventory (PCI; Barrick & Mount 1993), the Big Five Inventory

(BFI; John & Srivastava, 1999), and Saucier's Mini-Markers (Saucier, 1994).

    3. The study had to include an explicit measure of job or training performance as a

    criterion.

4. In terms of criterion-related validity, the study had to utilize a predictive design with an expressed point of entry into a new job, a training period, a new position, or an organizational intervention. GMA and Big Five studies that utilized a concurrent design

    were included if the time period between entry and the criterion measurement was clearly

    designated and the sample shared equivalent histories.

    5. Primary study samples had to consist of a cohort with equivalent histories (i.e.

    the same or comparable level of entry, training, or promotion).

    6. The study had to report the sample size for each correlation presented.

    7. Finally, the time between a point of entry into a new job, training, a new position,

    or after an organizational intervention and the criterion measurement had to be firmly


    established. The timeframe had to be greater than a week. In cases where all other

inclusion criteria were met, attempts were made to contact the principal authors concerning the timeframe of the study.

    Coding Procedure

For each individual study, the correlations between the selection instrument and the

    performance criterion were coded along with sample sizes and scale reliabilities when

    available. Longitudinal studies that included coefficients for more than one time point

    were divided into the number of time points present (e.g. entry point to year 1, entry point

    to year 2, entry point to year 3, etc.) and treated as independent data points for the

    analysis. Information for the time moderator was also coded, and converted to the smallest

    increment of measurement present in the analyses (i.e. days). All personality variables

were based on broad dimensions (e.g., Agreeableness, Extraversion) as opposed to the narrow facets that comprise them. If a primary study

    solely relied on a narrow dimension in relation to performance, or in cases where more

than one narrow dimension of the same principal construct was used, the correlations

    were averaged and subsumed under the coinciding Big Five dimension.
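A minimal sketch of these coding rules follows (Python; the function and record names are hypothetical and the values invented, intended only to make the splitting, time-conversion, and facet-averaging steps concrete):

```python
# Hypothetical coding helpers; names and records are illustrative only.
UNIT_TO_DAYS = {"days": 1, "weeks": 7, "months": 30, "years": 365}  # approximate conversions

def to_days(value, unit):
    """Convert a reported time lag to days, the smallest increment coded."""
    return value * UNIT_TO_DAYS[unit]

def code_study(correlations_by_time, n):
    """Split a longitudinal study into independent records, one per time point.

    correlations_by_time maps (value, unit) -> r or [r1, r2, ...]; a list holds
    narrow-facet correlations that are averaged into one broad dimension.
    """
    records = []
    for (value, unit), rs in correlations_by_time.items():
        rs = rs if isinstance(rs, list) else [rs]
        r = sum(rs) / len(rs)  # average narrow facets into the broad trait
        records.append({"days": to_days(value, unit), "r": round(r, 3), "n": n})
    return records

# A study reporting two Conscientiousness facets at year 1 and one composite at year 2:
print(code_study({(1, "years"): [0.21, 0.27], (2, "years"): 0.18}, n=140))
```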

    Of the initial total, 30 articles were considered for inclusion in the GMA analysis and

    30 for the Big Five analyses. The 30 GMA articles yielded a total of 49 validity coefficients.

    For the Big Five analyses, a total of three articles were excluded due to substandard FFM

measures (i.e., measures that produced low correlations when compared to established Big Five

    inventories). The 27 remaining studies yielded a total of 37 validity coefficients for

    Extraversion, 36 for Agreeableness, 42 for Conscientiousness, 39 for Neuroticism, and 37

    for Openness.


    An outlier analysis was conducted using the Sample-Adjusted Meta-Analysis

Deviancy technique (SAMD; Arthur, Bennett, & Huffcutt, 2001). The SAMD identifies potential

outliers by comparing the value of each study coefficient to the mean sample-weighted

    coefficient computed without the coefficient in the analysis. The difference is then adjusted

    for the sample size in the study. Scree plots of the SAMD values and subjective

    comparisons were used to isolate individual study coefficients. Given the nature of the

current study, outliers could possibly represent systematic changes in coefficients as a function of elapsed time and sample size. No outliers were identified for any of the GMA

or FFM dimensions save two coefficients for Extraversion (i.e., Lievens, Ones, & Dilchert, 2009: Time 1; Ployhart, Lim, & Chan, 2001: AC performance) and one for Agreeableness (Rothstein, Paunonen, Rush, & King, 1994). With these cases removed, both the Extraversion and Agreeableness analyses were left with 35 predictive validities each. Detailed lists

    of the articles used for GMA and the FFM dimensions are provided in Tables 1 and 2.
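The leave-one-out logic of the SAMD can be sketched as follows (Python; this simplified version uses the ordinary sampling-error variance for the sample-size adjustment, so it approximates rather than reproduces the exact statistic in Arthur et al., 2001):

```python
# Simplified, hypothetical sketch of the SAMD leave-one-out comparison; the
# published statistic's exact sample-size adjustment may differ in detail.
import numpy as np

def samd(r, n):
    r, n = np.asarray(r, float), np.asarray(n, float)
    out = np.empty_like(r)
    for i in range(len(r)):
        mask = np.arange(len(r)) != i
        r_wo = np.average(r[mask], weights=n[mask])      # weighted mean without study i
        se = np.sqrt((1 - r_wo ** 2) ** 2 / (n[i] - 1))  # sampling-error SD for study i
        out[i] = (r[i] - r_wo) / se                      # deviation adjusted for sample size
    return out

# Sort |SAMD| values in descending order and inspect a scree plot for breaks:
values = samd([0.45, 0.30, 0.28, -0.05, 0.33], [80, 150, 60, 40, 200])
print(np.sort(np.abs(values))[::-1])
```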

    Data Analysis

Meta-analytic procedures based on a correlation model were conducted for each

    selection device using Arthur et al.’s (2001) SAS PROC MEANS. Arthur et al.’s method

assumes a random-effects model, which is in line with the study's assumption that

    population effect sizes are variable. Sampling error was calculated in a manner consistent

with Hunter and Schmidt's (2004) methods, whereby sample-weighted average correlations,

    sample-weighted variances, and sampling error variances were computed to identify and

    remove the variance attributed to sampling error from the total variance across

    coefficients. Furthermore, the current study utilizes meta-analytic techniques developed

    by Raju, Burke, Normand, and Langlois (1991) to correct for attenuating artifacts (i.e.


    measurement error and range restriction) as a departure from traditional validity

    generalization correlation methods originated by Schmidt and Hunter (1977). Raju et al.’s

    (1991) procedure allows for estimating mean population-level correlations and

    population-level variances when attenuating artifact information is only sporadically

    presented in the primary studies. Traditional correlational methods rely on population

    values to correct for measurement error and range restriction. The general lack of

    available population values undermines researchers’ efforts to produce accurate results

    under the assumptions of the traditional correlational method. Previous researchers have

    attempted to overcome this limitation through the use of hypothetical artifact distributions

    or sample-based artifact distributions. Unfortunately, hypothetical artifact distributions

    can limit the accuracy of mean and variance estimates of validities if they do not closely

match true artifact distributions, and sample-based artifact distributions are subject to sampling error within the attenuating artifacts, which may bias results (Raju et al., 1991).

Raju et al.'s (1991) procedure provides a more accurate method for estimating the mean and variance of the population correlation by relaxing the assumption that population correlations (ρ), predictor reliabilities, and criterion reliabilities are uncorrelated across populations. In other words, the procedure allows observed correlations and study reliabilities to covary across the studies in the analysis. Arthur et al.'s (2001)

    SAS PROC MEANS was adapted to meet Raju et al.’s (1991) assumptions.
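As an illustrative aside, the core Hunter and Schmidt (2004) quantities referenced above can be sketched in a few lines (Python, toy inputs; the Raju et al., 1991, corrections for measurement error and range restriction are deliberately omitted, so this is a bare-bones approximation rather than the study's full procedure):

```python
# Bare-bones sketch of sample-weighted meta-analytic quantities; artifact
# corrections (Raju et al., 1991) are intentionally omitted for brevity.
import numpy as np

def bare_bones(r, n):
    r, n = np.asarray(r, float), np.asarray(n, float)
    r_bar = np.average(r, weights=n)                    # sample-weighted mean correlation
    var_obs = np.average((r - r_bar) ** 2, weights=n)   # sample-weighted observed variance
    var_err = (1 - r_bar ** 2) ** 2 / (n.mean() - 1)    # expected sampling-error variance
    var_res = max(var_obs - var_err, 0.0)               # residual variance after sampling error
    return r_bar, var_obs, var_err, var_res

# Toy usage with invented coefficients and sample sizes:
print(bare_bones([0.30, 0.25, 0.18, 0.35], [100, 250, 80, 150]))
```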

    Moderator Detection

    Multiple tests for homogeneity were conducted to detect possible moderators: The

75% Rule (Hunter & Schmidt, 2004), the Q-statistic (Hunter & Schmidt, 2004), and credibility and confidence intervals (Hunter & Schmidt, 2004; Whitener, 1990). The 75% Rule states


that if the ratio of artifactual variance to observed variance shows that 75% or more of the variance is accounted for by artifacts, then the remaining variance is considered a function of uncorrectable artifacts. If less than 75% of the variance is accounted for, then a moderator

    may be present. The Q-statistic tests the hypothesis that the observed variance is the

    product of sampling error and attenuating artifacts. A significant chi-square value

    indicates the presence of a potential moderator in the research domain. Both the 75% Rule

    and the Q-statistic have the lowest occurrences of Type I error and the highest power rates

for meta-analyses consisting of 60 to 100 studies when compared to other moderator-detection techniques (Sagie & Koslowsky, 1993). Credibility intervals are computed around the corrected mean correlation using the corrected population standard deviation and describe the variability of individual correlations in the population, as well as providing lower-bound estimates. A wide credibility interval, or one that includes zero, is indicative of a moderating variable. Confidence

    intervals provide an estimate of the variability of the corrected mean correlation due to

    sampling error and are computed around the corrected population correlation using the

    standard error of the mean correlation. The confidence interval provides a range of values

for the mean effect size and indicates whether the corrected effects differ from zero.
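A sketch of these homogeneity checks, building on the bare-bones quantities above, might look as follows (Python with scipy; one common textbook form of each statistic is used here and the names are this sketch's own, so details may differ from the exact computations reported later):

```python
# Illustrative homogeneity checks: 75% Rule, Q-statistic, and credibility and
# confidence intervals, each in one common textbook (approximate) form.
import numpy as np
from scipy import stats

def moderator_checks(r, n, alpha=0.05):
    r, n = np.asarray(r, float), np.asarray(n, float)
    k = len(r)
    r_bar = np.average(r, weights=n)
    var_obs = np.average((r - r_bar) ** 2, weights=n)
    var_err = (1 - r_bar ** 2) ** 2 / (n.mean() - 1)

    pct_accounted = 100 * var_err / var_obs          # 75% Rule: < 75 suggests a moderator
    q = n.sum() * var_obs / (1 - r_bar ** 2) ** 2    # Q-statistic, ~chi-square on k - 1 df
    q_p = stats.chi2.sf(q, k - 1)                    # significant p -> potential moderator

    z = stats.norm.ppf(1 - alpha / 2)
    sd_rho = np.sqrt(max(var_obs - var_err, 0.0))    # residual (population) SD
    credibility = (r_bar - z * sd_rho, r_bar + z * sd_rho)  # spread of population correlations
    confidence = (r_bar - z * np.sqrt(var_obs / k),         # precision of the mean correlation
                  r_bar + z * np.sqrt(var_obs / k))
    return pct_accounted, (q, q_p), credibility, confidence

print(moderator_checks([0.30, 0.25, 0.18, 0.35, 0.10], [100, 250, 80, 150, 60]))
```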

    Moderator Estimation