
MEASUREMENT AND STATISTICAL ISSUES IN HUMAN RESOURCE

MANAGEMENT

A Primer for the Non-Expert

Timothy A. Judge

Department of Management

Mendoza College of Business

University of Notre Dame

©Timothy A. Judge, 2013


OUTLINE

I. INTRODUCTION
     Importance of Measurement
     Importance of Statistical Analysis

II. FUNDAMENTALS OF STATISTICAL ANALYSIS
     Central Tendency
     Dispersion
     Standard Scores
     Normal Distribution
     Hypothesis Testing
     Errors
     Correlation
     Regression
     Multiple Regression

III. PROBLEMS IN ESTABLISHING CAUSALITY

IV. MEASURING INDIVIDUAL DIFFERENCES
     Reliability
     Standard Error of Measurement
     Validity of Measures
     Criterion-Related Validity
     Content Validity
     Face Validity
     Construct Validity
     Cross-Validation
     Validity Generalization

V. CONFIRMATORY RESEARCH
     Decision Analysis
     Utility Analysis
     Meta-Analysis

VI. COMPUTER PACKAGES

VII. SUMMARY


I. INTRODUCTION

After many years of saving, Jack had accumulated enough cash to buy a local

ice cream shop. One of Jack's first tasks was to figure out how to staff the shop. Being

a novice at this, Jack consulted his friend, Margaret, owner of the local hardware store.

Margaret advised Jack that she used the interview to get "the most knowledgeable

people possible," and recommended it to Jack because her people had "generally

worked out well."

While Jack greatly respected Margaret's advice, upon reflection several

questions came to mind. Given that there are several qualities important to a good ice

cream shop employee, how does one go about identifying and measuring the best

indicators of those qualities? Does Margaret's use of the interview mean that it meets

Jack's requirements? Jack also wondered, if he used the interview, how confident he could be that his judgments would be the same as someone else's? Jack also needed

to hire a store manager. What characteristics would he need to look for in a strong

leader? Finally, how could Jack test if his chosen method of selecting employees was

effective or ineffective?

Jack also had another set of decisions to make. How could he determine if the

wage he offers differs greatly from the relevant labor market? Jack has heard that

entry-level employees often engage in counterproductive behaviors—stealing, showing

up late, taking off early, giving free ice cream to friends, etc. By what means could he

predict employees’ tendencies to engage in these behaviors in advance? How could

these relationships be compared with findings from other organizations? By what

means could Jack evaluate the effectiveness of a training and development program?


Finally, how can Jack ensure that his human resource decisions are fair and non-

discriminatory? Jack was unsure how to go about answering these questions.

These questions faced by Jack are just a few of the issues confronting

managers of human resources every day. While answering each question requires

knowledge of the specific practice under consideration, it is also essential that the

manager understand the measurement and analytical issues underlying each

question. Without measurement and statistical analysis, evaluation of practices

must be as subjective as Margaret's answer to Jack's question. The purpose of this

primer is to introduce you to the measurement concepts and statistical tools

essential to answer the questions facing managers of human resources, a few of

which were presented above.

Importance of Measurement

Imagine a world in which measurement of individual differences did not

exist, except within the mind of each individual. Every person would have his or her

own measure of a man or woman, but the standard would dwell solely within the

opinions and values of the individual. Inferences made about, and debates over, the

characteristics of individuals would be entirely subjective. Efforts to understand

and predict could not be undertaken because no knowledge would be generally

held. Further, because each individual would have his or her own set of standards

and measurements, general knowledge about people would be difficult to achieve.

Accepted standards of measurement provide a common metric against which

differences between individuals can be judged. To be sure, there is still room for

subjectivity and disagreements. However, measures allow the debate of individual


differences to reach a higher plane. Accepted standards of measurement enable us

to draw inferences based on procedures that have been tried and tested, allowing us

to be more objective and systematic in investigating our attribute(s) of interest.

The better the measure, the less decision error one risks over the true level

of the attribute. This has direct implications for managers. For example, the better

measure of friendliness Jack chooses, the fewer customers will be driven away by

employees (mistakenly identified as friendly) providing poor customer service.

Further, if Jack has difficulty measuring friendliness, accurately appraising whether

this is a wise selection strategy will be an arduous task. Finally, selection and

appraisal procedures that are not accurate predictors of true performance often

place one in jeopardy of litigation from disgruntled applicants.

Importance of Statistical Analysis

As just explained, measurement is an essential issue for the manager of

human resources to consider. Yet without analysis of those measures, measurement

itself is futile. It is probably safe to conclude that rather than being beset by a lack of

measurement information, most managers are overwhelmed by too much

information. For example, in formulating selection decisions the manager may have

information on hundreds of candidates on several different predictors. Statistics are used to make sense of this mass of information.

As evidenced by Jack's dilemma, the typical manager is faced with a great

deal of uncertainty. While statistical analysis does not eliminate the uncertainty, it

provides the basis for better decisions to be made based on the data at hand.

Further, statistics are the tools that allow us to make inferences about our measures.


How reliable or consistent are the measures of the attribute(s) of interest? How

accurate or valid are they? This paper will introduce the ways in which we can

describe and make inferences about our measures of concern.

II. FUNDAMENTALS OF STATISTICAL ANALYSIS

Measuring individual differences is a detailed issue that will be addressed in

the next section. However, a pertinent question is: once we have a measurement,

what do we do with it? It is essential that the manager be able to analyze the

numbers measurement provides. Statistics are the methods we use to make sense

out of numbers, both to describe measures of attributes, and to infer knowledge

from them. In short, descriptive statistics are concerned with summarizing data in a

digestible manner; inferential statistics are concerned with estimating the likelihood

of certain phenomena given the results at hand. The statistics reviewed below can

be used for both descriptive and inferential purposes, depending on the goal of the

manager.

Central Tendency

Central tendency designates the typical response of a distribution. There

are three statistics commonly used to indicate central tendency. The mode refers to

the most frequent value. The median is the middle observation, or the point at

which half the observations fall above and half fall below. The mean of a set of

observations is the arithmetic average, or the sum of the set divided by the total

number of observations in the set. The mean is calculated using the following

formula:


$M = \frac{\sum x}{n}$

Where:	M = mean
	∑x = the sum of the observations, x
	n = the number of observations

As an example, suppose we had the following set of performance scores from a

sample of Jack's employees (on a 100 point scale):

49,54,68,68,75,78,84,91,100

There is only one value, 68, that occurs twice. Therefore, it is the mode. The median

is 75—four observations fall above 75 and four fall below. The mean is 74.1, which

is the sum of scores (667) divided by the number of scores (9).
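These three statistics can be verified in a few lines. Here is a minimal Python sketch using only the standard library (the variable name scores is ours):

    import statistics

    scores = [49, 54, 68, 68, 75, 78, 84, 91, 100]  # Jack's performance scores

    print(statistics.mean(scores))    # 74.111..., about 74.1 (the mean)
    print(statistics.median(scores))  # 75 (the median)
    print(statistics.mode(scores))    # 68 (the mode)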

What are the advantages and disadvantages of each measure of central

tendency? The mode is most appropriate for summarizing qualitative data. For

example, if one was curious about the number of women working at a company

(perhaps to compare female representation of one's company to the relevant labor

market), the mode would describe the most common gender indicated. It may make

less sense to discuss mean or median gender. However, the mode suffers from

several disadvantages that limit its use. First, there may be more than one mode. If

another 91 were added to the above distribution, there would be two modes,

making it an ambiguous measure of central tendency. Second, the mode is very

sensitive to changes in a single value in the distribution. For example, if one of the

applicants scoring 68 instead scored 100, the mode would jump from 68 to 100

even though only one score changed! For these reasons, the mode is generally only

used in describing qualitative data.


The median has the advantage of not being sensitive to extreme values in the

distribution. If the person who scored 68 instead scored 25, the median would not

change (four scores still fall below 75), whereas the mean would change

considerably (69.3). On the other hand, this insensitivity to extreme values can be a

disadvantage. Consider the following tests:

Test #1: 19,25,51,52,53

Test #2: 50,50,51,97,99

The median (51) is the same for both tests even though the placement of values is

radically different. The mean is capable of reflecting this difference (40 for test #1

versus 69.4 for test #2). Thus, sensitivity to extreme values can be both illustrative

and misleading. If the median and mean are vastly different, one should investigate

the cause of the difference, as each may provide an important piece of information

in describing the data.

While the mean and median are both acceptable methods of describing

central tendency, the mean has one characteristic that makes it the most widely

used measure of central tendency: its importance in drawing inferences about

central tendency (for example, to see if the average scores for the above two tests are

significantly different). The median has computational properties that make it

problematic in inferential statistics. Thus, the mean is employed as the measure of

central tendency in most statistical analyses. In a subsequent section we will

illustrate the use of the mean in drawing inferences.

Dispersion


The obvious fact in studying individual differences is that individuals differ.

Dispersion, or variability, indicates the degree to which observations on individuals

depart from central tendency. The most common means of expressing dispersion is

the standard deviation, which indicates how far the observations on average deviate

from central tendency. The equation for the standard deviation (s) is:

$s = \sqrt{\frac{\sum (x_i - M)^2}{n - 1}}$

Where:	∑(x_i − M)² = the squared deviation of the ith observation, x_i, from the mean of the observations, M, summed over all observations
	n = number of observations1

From the previous example, the standard deviation of the first test is 16.6. The

standard deviation of the second test is 26.1. The higher standard deviation of test

#2 indicates that the scores are more dispersed.
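A quick check of these values, using the standard library's sample standard deviation (which divides by n − 1, as in the formula above):

    import statistics

    test1 = [19, 25, 51, 52, 53]
    test2 = [50, 50, 51, 97, 99]

    print(round(statistics.stdev(test1), 1))  # 16.6
    print(round(statistics.stdev(test2), 1))  # 26.1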

Standard Scores

When comparing scores between two or more samples, often the raw value

alone does not provide full information on the relative status of the score. For

example, an individual scoring 80 on test #1 (with a mean of 40) is very different

from scoring 80 on test #2 (where the mean is 69.4). The former is 40 points above

the mean, the latter only 10.6. It is also important to consider, and control for, how

variable the scores are about the mean.

1 In finding the average deviation, why not simply average the deviations about the mean by subtracting each observation from the mean and dividing by the number of observations? The difficulty is that the average signed deviation from the mean is always zero. Therefore, one must take the absolute average deviation. The easiest way to do this is to square each deviation and then return it to its original units by taking the square root. If the square root is not taken, the result is known as the variance.

Standard Scores (Z Scores)


Standard scores show the relative status of a score within a distribution, or,

as in the above example, between distributions. It indicates the number of standard

deviations the particular observation is above or below the mean. Therefore, it

adjusts for unequal means and variances between samples. It is calculated as

follows:

$Z = \frac{x - M}{s}$

where the terms are as previously defined. Continuing the example of the two tests,

we can calculate a standard score for someone scoring 80 on each test:

$Z_1 = \frac{80 - 40.0}{16.6} = 2.41$

$Z_2 = \frac{80 - 69.4}{26.1} = 0.41$

The person in the first test, scoring 2.41 standard deviations above the mean, did

relatively better than the individual in the second scoring 0.41 standard deviations

above the mean—even though their absolute scores are the same. Standardizing

variables gives us a more complete picture of where the scores stand relative to

others within a distribution or across distributions.2
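A minimal Python sketch of the computation (the helper name z_score is ours):

    def z_score(x, mean, sd):
        """Number of standard deviations x lies above (+) or below (-) the mean."""
        return (x - mean) / sd

    print(round(z_score(80, 40.0, 16.6), 2))  # 2.41 on test #1
    print(round(z_score(80, 69.4, 26.1), 2))  # 0.41 on test #2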

Percentiles

Another way of reporting standard scores is with a score with which the

reader undoubtedly has some experience: the percentile rank. A percentile rank refers to the percentage of scores in a frequency distribution that are equal to or lower than a given score. For example, if someone scores at the 80th percentile on a measure, the

2 The mean of standardized scores is always 0 and the standard deviation 1.


person scored equal to or higher than 80% of the other people who completed the

measure. The formula for computing percentile rank is:

$PR = \frac{C_l + 0.5 F_i}{N} \times 100$

Where: PR = percentile rank

Cl = the count of all scores less than the score of interest

Fi = the frequency of the score of interest

N = the number of individuals in the sample.

Returning to Jack’s distribution of scores:

Test #1: 19,25,51,52,53

Test #2: 50,50,51,97,99

For either test, the person who scored 51 would be at the following

percentile:

$PR = \frac{2 + 0.5(1)}{5} \times 100 = 50$ (50th percentile)

For Test #2, the person who scored 50 would be at the following percentile:

$PR = \frac{0 + 0.5(2)}{5} \times 100 = 20$ (20th percentile)

As you can see, percentile rankings change depending on the number and

distribution of scores. For example, if 50 still tied for the lowest score on Test #2 out

of 100 (as opposed to 5) test takers, the percentile rank becomes:

$PR = \frac{0 + 0.5(2)}{100} \times 100 = 1$ (1st percentile)
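The formula translates directly into a short function. A minimal Python sketch (the helper name percentile_rank is ours):

    def percentile_rank(scores, x):
        """PR = (count below x + half the count equal to x) / N * 100."""
        below = sum(1 for s in scores if s < x)
        equal = sum(1 for s in scores if s == x)
        return (below + 0.5 * equal) / len(scores) * 100

    test2 = [50, 50, 51, 97, 99]
    print(percentile_rank(test2, 51))  # 50.0 (50th percentile)
    print(percentile_rank(test2, 50))  # 20.0 (20th percentile)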

Other Standard Scores

There are other ways of standardizing scores, often for the purpose of

providing feedback. Stanine scores standardize scores on a nine-point scale with a


mean of five and a standard deviation of two. So, for example, the bottom 4% of

scores represent the 1st stanine, the middle 20% of scores represent the 5th stanine,

and the top 4% of scores represent the 9th stanine. T-scores standardize scores so

that the mean is 50 and the standard deviation is 10. T-scores are computed as

follows:

$T = 50 + \frac{10(X - M_x)}{s_x}$

Where: X = Raw score of individual

Mx = Mean score of sample

sx = Standard deviation of sample scores

Returning again to Jack’s scores, the person who scored 51 on Test #1 would have:

$T = 50 + \frac{10(51 - 40)}{16.58} = 56.63$

The person who scored 99 on Test #2 would have:

$T = 50 + \frac{10(99 - 69.4)}{26.12} = 61.33$
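A minimal Python sketch of the same computations (the helper name t_score is ours):

    def t_score(x, mean, sd):
        """Rescale so that the mean is 50 and the standard deviation is 10."""
        return 50 + 10 * (x - mean) / sd

    print(round(t_score(51, 40.0, 16.58), 2))  # 56.63 on test #1
    print(round(t_score(99, 69.4, 26.12), 2))  # 61.33 on test #2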

Figure 1

Relationships Among Various Standard Score Measures


Figure 1 shows the relationships among z-scores, percentiles, stanines, T

scores, and the normal distribution. If scores are normally distributed, the

percentile rank is directly analogous to probabilities derived from the normal

distribution, a topic to which we turn next.

Normal Distribution

Observe Figure 2, which could be, for example, a distribution of scores on an employment test.

Figure 2

The Normal Distribution


Note that the distribution is centered on (and has its greatest frequency about) the mean and is bell shaped, with decreasing frequency of observations as one gets farther from the mean. Also note that the distribution is

symmetric about the mean. Such a distribution is called a normal distribution.

One rather interesting property of the normal distribution is that approximately

68% of the scores fall within 1 standard deviation of the mean, approximately 95%

within 2, and approximately 99% within 3 standard deviations of the mean.

Figure 3

Height and the Normal Distribution


Height is one of many variables that is normally distributed.

As we will see, though, it is important to remember that not everything is normally distributed.


The normal distribution is referred to as "the workhorse of inferential

statistics" because once raw scores have been transformed into z scores, it is very

easy to refer them to tabled values of the standard normal distribution to find

probabilities associated with finding a value within the particular range of interest.

For example, if the population of scores for test #1 is normally distributed, the

probability of observing a z-score greater than 2.41 is about .008, indicating that fewer than 1% of individuals taking the test can be expected to score above 80. By contrast, roughly 34% of individuals taking test #2 can be expected to score over 80.
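These tail probabilities can be checked with Python's standard library, which includes the normal distribution (no table lookup required):

    from statistics import NormalDist

    std_normal = NormalDist(mu=0, sigma=1)

    # Probability of scoring above 80 on each test, given the z-scores above
    print(round(1 - std_normal.cdf(2.41), 3))  # 0.008 -> under 1% exceed 80 on test #1
    print(round(1 - std_normal.cdf(0.41), 3))  # 0.341 -> about 34% exceed 80 on test #2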

While some attributes are approximately normally distributed (height,

weight, intelligence), many are not (income). One cannot use the normal

distribution for inferential purposes without assuming the values are approximately

normally distributed. The Central Limit Theorem, however, offers a way around this problem.

Figure 4


Not All Variables Are Normally Distributed

As you can see from this graph of income in the United Kingdom, income is one of those variables that is not normally distributed. (Source: Life in the Middle - The Untold Story of Britain’s Average Earners.)

The Central Limit Theorem allows us to assume that the distribution of sample means is approximately normally distributed as long as the

sample size is sufficiently large (usually at least 30), regardless of the distribution of

individual values. Therefore, even if the population is not normally distributed, the

distribution of sample means drawn from the population is. This allows

determination of probabilistic properties associated with mean observations from

the standard normal.

The standard normal distribution applies when the population standard

deviation is known. In practice, one seldom knows values of the entire population.

When the population variance is unknown, the Student's t-distribution can be used,


which closely resembles the standard normal. Tables for the t-distribution are also

widely published in statistics texts, and are precisely estimated by computer

packages (see section VI).

Hypothesis Testing

Human resource managers often want to make inferences about a population

or populations from which samples have been drawn. Remember that one of the

questions in Jack's mind was how his company's compensation level compared with

the relevant labor market. He may, for example, wish to compare the wage he is

offering to that of a competing company. As another example, Jack may wish to

compare pass rates on his selection measure between minorities and nonminorities

to assess whether his hiring procedure adversely impacts minorities. For both these

investigations, Jack could take a sample of each group to assess if the means from

each population are equal or unequal. Since the sample drawn will not perfectly

reflect the population, the means will vary due to sampling error. Hypothesis

testing seeks to answer the question: at what point does the difference between the

means become so large that we dismiss the hypothesis that the two population

means are equal? The null hypothesis, denoted H0, is the hypothesis assumed to be true in producing the sampling distribution used in the test. Typically, the null hypothesizes no difference between the populations. The alternative hypothesis, H1, is assumed to be true when the null

is false. It typically posits a difference between the means.

The exact procedures to execute the test vary, depending on the particular

assumptions and samples underlying the test. The computations are explained in


most introductory statistics texts, or conducted on computer (see section VI).

Suffice it to say that a t-statistic is calculated (in place of a z-score because σ is unknown) and compared to the t-distribution.3 As explained above, the sample means will almost certainly differ, and this difference can mean one of two things. The difference could

simply be due to sampling error or chance variation because we do not have a

perfect picture of the population. On the other hand, it could be indication that the

two population means are in fact not equal and the difference is not due to error.

Convention is to use .05 (5 chances out of 100 that the difference arises by chance

variation if there is no true difference) as the probability level at which we would

reject the null hypothesis that the means are equal. A t-statistic of 2 is a good benchmark, as the probability of observing a t-statistic of 2 or larger by chance is about .05 in moderately large samples. To be sure,

5 times out of 100 we can expect to be wrong in rejecting the null of equal means.

However, .05 is a point at which most are willing to chance a mistake in order to

make inferences about the true nature of events.
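In practice, such a test is a one-line call in most statistical packages (see section VI). Here is a minimal Python sketch using SciPy, one widely used package; the wage figures below are hypothetical, purely for illustration:

    from scipy import stats

    # Hypothetical hourly wages: a sample from Jack's shop and from a competitor
    jacks_wages = [9.50, 10.00, 10.25, 9.75, 10.50, 9.25]
    competitor_wages = [10.75, 11.00, 10.50, 11.25, 10.00, 10.80]

    t_stat, p_value = stats.ttest_ind(jacks_wages, competitor_wages)
    if p_value < .05:
        print("Reject H0: the population means likely differ.")
    else:
        print("Fail to reject H0: the difference may be sampling error.")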

Errors

Effective management of human resources necessitates the use of statistics

to make "best guesses" about the true state of affairs when incomplete information

and measurement error exists. Obviously, these educated guesses are not always

correct. In statistical lexicon, mistakes that arise from erroneous inferences are

termed Type I and Type II errors.4

3 We are allowed to compare the mean value to the t-distribution because we can assume the means are approximately normally distributed through the Central Limit Theorem.

4 There is nothing magical (or, according to some, even logical) about the p < .05 standard. The origin of this p-value is one of the towering figures in statistics, Sir Ronald A. Fisher. In 1925, Fisher suggested the use of a boundary between significance and nonsignificance that was based on probability. Fisher set this boundary at p = .05; its widespread adoption has led many to question the wisdom of the standard in theory and in practice (see Cohen, 1994).


If the null hypothesis is true, but Jack rejected it, he has made a Type I error.

This is also represented by the Greek letter α ("alpha"), or the significance level. When one makes a Type I error, the means differed by a significant amount, but the difference was due to chance variation (sampling error). This is not the only mistake Jack needs to concern himself with. He could also make a Type II error: falsely accepting the null hypothesis of equal means when they are in fact not equal. This error is represented by the Greek letter β ("beta"). When one lowers the probability of rejecting a true null (decreases α), it is more likely that one has accepted a false null (increases β). For most decisions, it is best to make it difficult to reject the hypothesis the weight of past evidence supports (the null). That is why α is generally set quite low (thus increasing β). However, one must be aware of

both errors. Each can be costly. And, all else equal, decreasing one error increases

the probability of committing the other.

Figure 4

Results of Hypothesis Tests

                              NATURE OF NULL
DECISION          H0 true                 H0 false
Accept H0         Correct (1 − α)         Type II error (β)
Reject H0         Type I error (α)        Correct (1 − β, power)

Figure 4 illustrates the decisions and results. The probability of accepting a true

null is equal to 1 − α. On the other hand, the probability of rejecting a false null, the other correct


decision, is 1 − β and is often referred to as the power of the test. Alpha and beta are

as previously defined.

Correlation

Remember one of the questions in Jack's mind was how to hire a store

manager. Suppose a friend of Jack’s—Sallie—gave him a dataset from the lifeguard

service she manages (in reality, the data in Figures 5 and 6 are actually on

lifeguards). Sallie’s data shows a relationship between a lifeguard’s personality and

his or her leadership effectiveness. Graphically, the relationship might look like

Figure 5 for Sallie’s lifeguards. Each point on the graph, called a scatterplot,

represents a lifeguard, having both a score on extraversion and a rating of

leadership effectiveness. By visual inspection one could see that there is a positive

association between extraversion and leadership. Those who are extraverted seem

to make better leaders. However, it is important to have a precise numerical

measure of the association between two variables. A correlation coefficient is a

standardized (controls for differing levels of variance) measure of linear covariation

between two variables. The population correlation, like the population mean and

standard deviation, is unknown and must be estimated from sample data. The

sample correlation coefficient is calculated by the following formula:

$r_{xy} = \frac{\sum (x - M_x)(y - M_y)}{\sqrt{\sum (x - M_x)^2 \, \sum (y - M_y)^2}}$

With standardized values (z scores), the equation simplifies to:

$r_{xy} = \frac{\sum z_x z_y}{n}$


The correlation can range from +1.0 (perfect positive relation between the

two variables) to -1.0 (perfect negative relation). A correlation of 1 indicates that

knowing the value of one variable allows exact determination of the other's value. A

correlation of 0.0 signifies no relationship between the variables, indicating that

knowing the value of one variable gives us no information about the value of the

other. In the extraversion and leadership example above, the correlation is +.42,

consistent with the visual inspection of Figure 5.5
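The correlation formula above translates directly into code. A minimal Python sketch (the helper pearson_r and the eight data points are ours, for illustration only):

    from math import sqrt

    def pearson_r(x, y):
        """Sample correlation: covariation divided by the product of variations."""
        mx, my = sum(x) / len(x), sum(y) / len(y)
        num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        den = sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))
        return num / den

    # Hypothetical extraversion scores and leadership ratings
    extraversion = [2, 4, 5, 3, 6, 7, 5, 8]
    leadership = [3, 4, 4, 5, 5, 8, 6, 7]
    print(round(pearson_r(extraversion, leadership), 2))  # 0.81 in this made-up sample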

Let’s say Jack also received data from Margaret’s hardware store—in this

case, prediction of the degree to which the employees engaged in counterproductive

work behaviors. This variable of interest—counterproductive work behaviors—is

graphed with conscientiousness in Figure 6. Each data point represents an

employee with a score on conscientiousness and a supervisor rating of the degree to

which the employee engages in counterproductive work behaviors. A visual

inspection gives one the impression that the variables are negatively related. To the point, the correlation is −.41. Higher levels of employee conscientiousness are

associated with lower degrees of counterproductive behaviors (as perceived by the employee's supervisor).

Figure 5

The Relationship Between Extraversion and Leadership

5 The reader can be forgiven for underestimating the correlation in Figure 5 from a visual inspection of the graph. As Hunter and Schmidt (2004) note, when interpreting raw data, we tend to underestimate the true relationship and overestimate the variability in that relationship (in other words, think the data are “all over the place” when in fact there is a consistent relationship).

From Figure 6, Jack might interpret these data as indicating

that when staffing the ice cream shop, he should give applicants a personality test

(to assess conscientiousness). From Figure 5, Jack might wish to give a measure of

extraversion to those individuals he is considering for store manager. (Shortly, we

will address a question that might come to mind: Can we have any confidence that

validity for one organization or one type of job [in this case, lifeguards or hardware

store employees] would generalize to another organization or another job type [in

this case, ice cream shop employees or store manager]?)

Figure 6

The Relationship Between Conscientiousness and Counterproductive Work Behaviors


Given many possible correlation coefficients based on many different

possible samples from the population, how does one determine if there is a "true"

relationship between the variables? In much the same way as comparing means, we

may test the hypothesis of no relationship between the variables (correlation

coefficient equal to zero) against the alternative of a significant relationship. As in

comparing population means, a test statistic is calculated (here rxy), compared to a

probability distribution (generally the t-distribution) and a probability level

derived. If the probability is less than the significance level, the hypothesis of no

relationship between the variables is rejected. In such a case we would conclude the

"true" relationship is likely to be other than zero.


The larger the sample size, the easier it is to achieve a significant correlation.

For example, a correlation of rxy=.97 is not significantly different from zero at

the .05 level when the sample size is 3. However, when n=100 a correlation of

rxy=.19 is significant.6

Squaring the correlation coefficient, or r2, represents the proportion of total

variance of one variable explained by the other. For example, a correlation of .38 between pay and performance would represent 14% of the variance in performance explained by variation in pay, leaving 86% unexplained by pay (explained by

other factors). When trying to predict what a person will do in the future, errors are

common. This simply serves to illustrate that human behavior is somewhat

unpredictable. Thus, it is relatively rare for one variable to explain a majority of

variance in another. This issue will be revisited in subsequent sections.

Regression

Suppose Jack has operated the store for a year and now wants to estimate his

staffing needs for the upcoming summer ice cream rush. Jack could use past data on

the daily high temperature and the estimated number of workers required that day

(recorded each day over the last year) to predict his staffing requirements for the

upcoming summer. Regression, a prediction of the level of one variable based on

the level of one or more other variables, is perfectly suited for this type of problem.

Suppose Jack had past data on demand for ice cream and numbers of workers

6 The significance test for the correlation coefficient relies on the assumption that the population values of both distributions are normally distributed. When this assumption is in doubt or the sample size is small, one should use Spearman's rank-order correlation coefficient. The computational formula is contained in nearly all statistics texts.


required for the past year. Figure 7 represents these values. Each data point

represents a day in the past year when Jack recorded the daily high temperature

and wrote down his estimate of the optimal number of employees on that day. The

line fitted through the data is called a regression line, which represents the "best fit"

line, as the sum of squared deviations from the line is the smallest of all possible

straight lines. It represents the prediction line for the number of workers

demanded for a corresponding high temperature. From this line, the number of

workers Jack needs to hire, based on the forecast high, can be projected.

In regression, the dependent variable is the variable whose value is

influenced by (or depends on) the value of another. In this case, the dependent

variable is the number of workers demanded (total number of workers needed to

staff three shifts). The independent variable is that which induces changes in the

dependent variable. Here, the independent variable is the daily high temperature.

The regression line is estimated by:

$y = a + bx + e$

Where:	y = score on dependent variable
	a = intercept value
	b = slope of the regression line (regression coefficient)
	x = score on independent variable
	e = error term

Like all other statistics, the population regression equation must be

estimated from sample data. Errors result when the regression line does not

perfectly predict values of the dependent variable.

Figure 7

Predicted Demand for Workers

[Scatterplot of the daily high temperature (x-axis) against the estimated number of workers needed over three 6-hour shifts (y-axis), with the fitted regression line y = −7.008 + 0.2916x. Each estimate comes from the past year's data, when Jack wrote down the optimal number of workers needed each day.]

In our example, the error term includes all

factors other than temperature that influence demand for workers. The estimated

regression function is y = −7.008 + 0.2916x, where y is the predicted value.

Accordingly, for any given value of x (i.e., daily high temperature) we can predict y

(the number of workers required). For example, if the daily high is 60 degrees, Jack

will need an estimated 10-11 workers on his payroll (the actual predicted value is

10.487 workers). If the high temperature is 90 degrees, Jack will need a predicted

19 (exact predicted value = 19.234) workers. The slope value indicates that a 1 unit


change in x induces a b unit change in y. In our example, an increase of 10 degrees

leads to approximately 3 more workers required.7
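Once the coefficients are estimated, prediction is simple arithmetic. A minimal Python sketch using the coefficients from the text (the small differences from 10.487 and 19.234 reflect rounding of the coefficients):

    # Jack's estimated regression function: y = -7.008 + 0.2916x
    a, b = -7.008, 0.2916

    def predicted_workers(daily_high):
        """Predicted number of workers for a given daily high temperature."""
        return a + b * daily_high

    print(round(predicted_workers(60), 3))  # 10.488 -> staff 10-11 workers
    print(round(predicted_workers(90), 3))  # 19.236 -> staff about 19 workers
    # Slope check: a 10-degree increase adds roughly 3 workers
    print(round(predicted_workers(70) - predicted_workers(60), 2))  # 2.92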

Several assumptions are required for regression analysis: the independent

variables and error terms are uncorrelated; the mean of all errors is 0; all errors

have equal variances; and the errors are not correlated with one another. The

implications of violating these assumptions are discussed in Kennedy (2008).

It would be useful to determine what proportion of total variability in the

dependent variable is explained by the regression of Y on X. The coefficient of

determination, denoted R2, is the proportion of total sample variability of the

dependent variable explained by the independent variable. It is calculated by

dividing variability explained by the independent variable by total variability (which

is the variance of Y). For example, R2=.68 in the equation in our example, meaning

68% of the variability in number of ice cream workers required is explained by its

linear dependence on consumer demand for ice cream. In "simple" regression (one

independent variable) such as this, R2 = rxy2. When predicting human thoughts,

feelings, or action, one generally has to settle for less variance explained. People are

complicated.

As with other statistics, we are able to test the b coefficient against zero to

determine if the independent variable is a significant predictor of the dependent

variable.8 We do this by dividing the coefficient estimate by its standard error

7 When using standardized variables, the intercept drops out (remember z scores have a mean of zero), and the b coefficient represents the correlation between the dependent and independent variable.

8 In order to do this, it is necessary to assume that the prediction errors, e, are normally distributed. This assumption is also dealt with in Kennedy (2008).


(remember because the population regression coefficients are estimated with

sample data, and because the prediction is not perfect, they are estimated with

error). Calculation of b coefficients is quite laborious and is therefore conducted

using computer packages (see section VI). The null hypothesis is generally b=0 (a

slope of zero), indicating no relationship between the variables. Once the test

statistic is calculated, it is referred to the t-distribution. If the statistic is large

enough to be statistically significant, the null is rejected and it is asserted that values

of Y significantly depend on values of X (or that X significantly predicts Y).

Multiple Regression

Remember the example from a few pages earlier regarding the effect of

conscientiousness on counterproductive behaviors? Jack observed a correlation of

-.41 and concluded that hiring conscientious individuals should reduce

counterproductive behaviors such as absence, lateness, theft, etc. However, this

conclusion might be suspect without considering the job held by the individual.

Individuals in higher-level positions (like managers) may be less likely to engage in

counterproductive behaviors—taking a day off may simply leave more work for the

next day. Therefore, job level might confound the relationship between

conscientiousness and counterproductivity. Luckily, there is a procedure that

allows us to control for other influences when investigating the relationship

between two variables. Multiple regression, as a generalization of simple

regression, allows investigation of multiple influences on the dependent variable.

The general form of the equation can be represented as:

$Y = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k + e$


Where x1,x2,...,xk represent 1 through k independent variables; all other terms are

as previously defined.

The interpretation of the effect of an independent variable is similar to

simple regression, except that it now measures the effect of one variable holding the

others in the equation constant. Each regression coefficient in multiple regression is

known as a partial regression coefficient because it expresses the partial effect of

its independent variable on the dependent variable.

The power of multiple regression to the human resource manager should not

be underestimated. By controlling for the influence of all variables the investigator

wishes to specify, it allows inferences regarding the influence of one independent

variable on the dependent variable, controlling for the effect of other possible

influences. In our earlier example, it is possible to investigate the effect of

conscientiousness on counterproductive behaviors controlling for job held. In other

words, for those having the same position in the organization, what is the effect of

conscientiousness on counterproductivity?

Multiple regression is ideally suited for prediction based on multiple sources

of information. For example, suppose Jack decided to predict job performance

based on two selection predictors, collected data on the predictors and the criterion,

and estimated the following regression equation with his sample data:

$Y = 10 + .3X_1 + .6X_2$

Jack may then use this equation for future selection decisions. For example, Jack

may wish to predict subsequent job performance on an applicant who scored 50 on


test 1 and 80 on test 2. Assume 65 is the minimum acceptable performance rating.

The applicant's predicted job performance is:

$Y = 10 + .3(50) + .6(80) = 73$

Thus, this applicant would be predicted to be successful, albeit marginally, on the

job. If Jack needed to fill 25 positions, he would probably hire the 25 applicants with the highest predicted job performance.
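A minimal Python sketch of this selection rule, using Jack's estimated equation (applicants B and C are hypothetical additions for illustration):

    def predicted_performance(x1, x2):
        """Jack's estimated equation: Y = 10 + .3*X1 + .6*X2."""
        return 10 + 0.3 * x1 + 0.6 * x2

    # Applicant scores on the two predictors; all but "A" are hypothetical
    applicants = {"A": (50, 80), "B": (40, 60), "C": (70, 90)}
    cutoff = 65  # minimum acceptable performance rating

    for name, (x1, x2) in applicants.items():
        y = predicted_performance(x1, x2)
        print(name, y, "predict success" if y >= cutoff else "predict failure")
    # A -> 73.0 (the applicant from the text), B -> 58.0, C -> 85.0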

It is often held that because the weight on X2 is greater than that on X1, X2 is a more

important predictor of the dependent variable (e.g., job performance). This is an

incorrect assertion because the variables are measured in different units. For

example, measuring pay in dollars versus thousands of dollars would yield a

coefficient one thousand times smaller even though the relationship is no different.

Regression with standardized variables eliminates this problem as all the variables

are forced into the same units. In fact, with standardized variables, each regression

coefficient is equivalent to a partial correlation coefficient between the particular

independent variable and the dependent variable. Therefore, it provides

information on the strength of the separate relationship between the independent

and dependent variables, partialling out (i.e., holding constant) the effect of the

other variables. Squaring the partial correlation coefficient indicates the proportion

of variance in the dependent variable explained by the independent variable, once

the influence of the other variables is removed. In our example, if once standardized

X2 had a larger coefficient than X1, X2 would explain more variance in performance.


Thus, without generalizing beyond the sample, X2 would be a stronger (more

important) predictor of the dependent variable.

The coefficient of determination in multiple regression has a comparable

interpretation to simple regression. R2 reflects the proportion of total variance in

the dependent variable explained by the set of independent variables. For example,

R2=.50 indicates that 50% of the variance in the dependent variable is explained by

the independent variables.9

III. PROBLEMS IN ESTABLISHING CAUSALITY

One must be cautious in attributing causality using correlation and

regression. By themselves, they cannot establish the direction of causality between variables.

Consider a correlation one might find between pay and performance such that those who earn more have higher performance ratings. How does one interpret this? High performers are generally paid more for their accomplishments (performance → pay). However, high pay also serves as an incentive to greater effort (pay → performance). Thus, in this example it is impossible to attribute causal direction merely by looking at a correlation or regression coefficient. In such cases,

tighter controls, either in research designs or statistical controls, are needed before

causal inferences can be drawn (see Schwab & Trevor, 2012, for further discussion).

9 Non-linear regression models can be estimated, often with a substantial increase in prediction. For example, one can see that the scatterplot in Figure 7 is not linear—as you might expect, changes in temperature lead to greater differences in estimated demand for workers at high temperatures than at low temperatures (i.e., the difference between a high of 80° and 70° leads to a greater change in workers needed than the difference between a high of 30° and 20°). The distribution is exponential, and there are various ways to model such distributions (see Kennedy, 2008).


IV. MEASURING INDIVIDUAL DIFFERENCES

From Plato to Darwin to managers in search of productive workers, the

fundamental differences among individuals have at once been an obvious fact and a source

of fascination. The first task of a manager making differentiations between people

(whether for hiring, compensating, training, or appraising employees) is to measure

the differences. Measurement is the assignment of numbers to objects, attributes,

or events. In many cases, measurement is both critical and difficult. In trying to

assess human thought and behavior, measurement is particularly difficult. Two

central means of evaluating the quality of our measures are reliability and validity.

Each will be explored in turn.

Reliability

Remember Jack's concern whether his judgment when interviewing

applicants would be consistent with others'? For example, if Jack's assistant

manager also interviewed applicants, to what degree would their evaluations agree?

This is an issue of reliability, or the consistency or reproducibility of a measuring

instrument. If Jack found that their judgments were often quite different, Jack might

question the reliability of their evaluations, and the usefulness of the procedure. A

test, set of evaluations, or survey items that do not correlate well with themselves

can hardly be expected to correlate with any variable of interest. Thus, reliability is

an essential starting point in measurement and statistical analysis.

Reliability theory posits that variation in scores, for example on an

employment test, appraised performance, or job satisfaction survey, is composed of


variation in "true" scores (i.e., reflecting variation in true ability or performance)

plus variation due to error in the measuring instrument. Or,

$\sigma^2 = \sigma_t^2 + \sigma_e^2$

Where:	σ² = total variance in scores, as defined earlier
	σ²_t = variance in "true" scores
	σ²_e = error variance

The more total variance is due to true differences between the individuals and less

to inconsistencies (which produce variance) in the measuring instrument, the more

reliable the measuring device.

In classical reliability theory, the reliability coefficient is represented as:

$r_{xx} = \frac{\sigma_t^2}{\sigma^2} = 1 - \frac{\sigma_e^2}{\sigma^2}$

The higher the proportion of "true" variance to total variability (or the lower the proportion of

error to total variance), the higher the reliability of the measuring instrument. Just

as r2 tells us the percentage of total variance shared by the variables, and R2

indicates the proportion of variance in the dependent variable explained by the

independent variable(s), the reliability coefficient, theoretically,

reveals the proportion of total variance in the measured variable due to "true"

differences in individuals. If we had true scores, we could calculate reliability in this

manner.


Figure 8

A Tale of Two Tests

[Pie charts contrasting two tests with equal total variance: the proportion of "true" variance versus error variance in Test #1 and Test #2.]

Variability alone does not determine reliability; it is the proportion of true

variance to total. For example, Figure 8 shows two tests with the same level of total

variance, σ². Yet test 1 is much more reliable than test 2, as 80% of the total

variance in test 1 is due to variation in individual characteristics ("true" variance)

and only 20% due to error. However, in test 2, only 40% is "true" variance, and

60% measurement error.

In practice, since true scores are never known, reliability must be estimated

from the data obtained from our measuring instruments. One of the more obvious

means of estimating reliability is test-retest, where the same form of a test is


administered twice to the same applicants (after a suitable time period) and the two

scores are correlated. One potential drawback of the test-retest estimate is that any variable that influences one administration and not the other will reduce reliability.

Another problem with the test-retest method is that the individual may remember

responses from the first test or assessment, or consistently guess in the same

manner on both tests.

Perhaps the most popular method of estimating reliability is internal

consistency, which holds that items from the same test should predict the total score

equally well regardless of where they are placed in the test. One approach is to

correlate one half of the test with the other half, a split-half reliability. Because

reliability increases with test length and the split-half method cuts length in half, the

obtained correlation is a conservative estimate of the true reliability of the test. The

Spearman-Brown prophecy formula is often used to correct for this reduced

reliability:

$r_{11} = \frac{2 r_{xx}}{1 + r_{xx}}$

Where r11 is the corrected correlation and rxx is the correlation between the

halves. Perhaps the most sophisticated measure of internal consistency is

Cronbach's alpha (Cronbach, 1951), which yields the mean of all possible split-half estimates. Cronbach's alpha is available in most computer packages (see

section VI). It can be calculated manually with the following formula:

$\alpha = \frac{N \bar{r}}{1 + (N - 1)\bar{r}}$

Where:	α = coefficient alpha


N = number of items in measure

r̄ = average correlation among items

For example, if Jack wished to measure extraversion with a 10-item scale, and the average correlation among those 10 items was r = .40, then α is:

α = (10 × .40) / (1 + [10 − 1] × .40)
= 4 / (1 + 3.6)
= 4 / 4.6
= .87
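Both the Spearman-Brown correction and coefficient alpha are easy to script. Below is a minimal Python sketch; the raw-data version assumes item responses are stored in a NumPy array with one row per respondent and one column per item.

```python
import numpy as np

def spearman_brown(r_half: float) -> float:
    """Full-test reliability corrected from a split-half correlation."""
    return 2 * r_half / (1 + r_half)

def alpha_from_mean_r(n_items: int, mean_r: float) -> float:
    """Coefficient alpha from the item count and average inter-item correlation."""
    return (n_items * mean_r) / (1 + (n_items - 1) * mean_r)

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha from raw responses (rows = respondents, columns = items)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Jack's 10-item extraversion scale with an average inter-item r of .40:
print(round(alpha_from_mean_r(10, 0.40), 2))  # 0.87
```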

What is an acceptable level of reliability? It depends on several factors. Though most researchers appear to adhere to a ".70 is acceptable, .80 is good" rule, such simplistic rules do as much harm as good. For example, longer tests can be expected to be more reliable than shorter tests, and internal consistency estimates can be expected to be higher than inter-rater estimates. A coefficient alpha of .70 on a long multi-item test might be considered only marginally reliable, whereas a correlation of .60 between interviewer judgments might be thought of as quite good. Reliabilities below .50 are seldom considered adequate regardless of the method used to estimate reliability.

There are many factors that influence the reliability of a measuring instrument. As mentioned earlier, larger sample sizes (more is known about the population) and a greater number of test items or raters (10 items are more likely to yield a consistent estimate of a person's ability than a single item) increase reliability. Finally, heterogeneity in the individual difference being measured serves to increase reliability, as there is more true variance to be explained.

Standard Error of Measurement

The standard error of measurement indicates the degree of error expected

in an individual's score. If an individual were to take the test (or be evaluated)


many times, his or her scores would vary, and we expect that variance to follow a

normal distribution. More scores should be near the individual's true score than far

away. The mean of this distribution is the individual's true score, and the standard deviation is the standard error of measurement (abbreviated σmeas). The σmeas represents the average error in the measurement device. As with all normal distributions, 68% of the scores lie within 1 standard deviation of the mean, 95% within 2, and so on. The standard error of measurement may be expressed as:

σmeas = σx × √(1 − rxx)

As one can see, σmeas is determined by both the variability of the scores and the reliability of measurement. If reliability is perfect (rxx = 1.0), there is no error in estimating an individual's true score. Perhaps the most important use of σmeas for human resource managers is that it enables us to make inferences about true scores.

For example, if the standard deviation on Jack's employment test is σx = 4, and reliability for the test is rxx = .80, then σmeas = 1.79. If an individual scores 80, Jack can be 68% confident that the individual's true score is within 1.79 points of the obtained score (roughly between 78 and 82), and 95% confident that the true score is between 76.4 and 83.6 (±3.58 points). This also provides useful information in determining whether two scores are significantly different: if the lower limit of the higher score is above the upper limit of the lower score, we can conclude the two scores are significantly different. For example, following the example above, if one applicant scored 80 and another scored 72, Jack can be 95% confident that the two scores are different (that the first applicant truly has a higher score).
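These calculations are simple enough to script. A minimal Python sketch using the numbers from Jack's test (σx = 4, rxx = .80):

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: sd * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

s_meas = sem(sd=4.0, reliability=0.80)
print(round(s_meas, 2))  # 1.79

# 95% confidence band around an observed score of 80 (roughly +/- 2 SEMs):
score = 80
low, high = score - 2 * s_meas, score + 2 * s_meas
print(round(low, 1), round(high, 1))  # 76.4 83.6

# Two scores differ (at roughly 95% confidence) if their bands do not overlap:
other = 72
print(other + 2 * s_meas < score - 2 * s_meas)  # True
```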


Validity of Measures

Suppose that Jack and his assistant manager each interviewed applicants and

then rated them on a 1 to 10 point scale. Jack found that the correlation between

their ratings was r=.75. One might be tempted to conclude that Jack and his

assistant must do a good job of selecting applicants since they have fairly consistent

evaluations. However, reliability of measurement does not necessarily imply

accuracy of judgment. For example, weight can be measured quite reliably, but

surely is not an accurate predictor of performance for most jobs. Similarly, while

Jack and his hand-picked assistant's judgments are consistent, it could be because

they both evaluate applicants on criteria not strongly related to job performance

(e.g., appearance).

The above example illustrates the importance of validity in human resource

management. Validity refers to how well the instrument measures or predicts the

criterion. If we have information from a measurement device, how much does that

information help in predicting the criterion of interest? If the highest (lowest)

scores on a predictor always led to the highest (lowest) scores on the criterion, our

predictor would be perfectly valid. Unfortunately, in practice this does not occur.

The question then becomes: how does one design a measure to be as valid as possible, and how does one evaluate whether a given measure is valid? Strategies used to establish validity depend on both the specific use of the measuring instrument and the data collection constraints imposed on the organization.

The primary validation strategies can be classified as either empirical or

logical. Empirical strategies estimate the validity of a procedure by examining the


correlation or regression coefficient between the predictor and the criterion. High

correlation coefficients imply high validities.10 The most important empirical

strategy is criterion-related validity. Logical strategies establish validity by

evaluating how well the measuring device samples the criterion. The important

logical strategy is content validation. Face validity is an informal method, neither

logical nor empirical. Construct validity is actually a combination of empirical and

logical strategies that enable us to understand the factors that cause variation in the

criterion. Each will be explained in turn.

Criterion-Related Validity

Criterion-related validation is employed when one wishes to quantitatively

estimate the relationship between a predictor and the criterion. For example, if Jack

were to relate (using correlation or regression) interview or test scores to job

performance in evaluating the accuracy of the predictor, he would be using a

criterion-related strategy. Those predictors explaining the most variance in the

criterion are the most valid and will be preferred. There are two specific variants of

criterion-related strategies: predictive and concurrent.

Predictive Validation

In predictive validation, the predictor is measured at one point in time and

information on the criterion is gathered at a later date. Then, the two sets of

information are correlated. Perhaps the "purest" way to conduct predictive validation in selection decisions, for example, is to gather information on the predictor and then select applicants on the basis of some other predictor.

10 The following formula is used to estimate the validity if there were no measurement error (i.e., if reliability were perfect):

rxy = rxy(obs) / (√rxx × √ryy)

Where rxy = estimated true correlation; rxy(obs) = observed correlation; rxx = reliability of predictor; ryy = reliability of criterion. This is the best estimate of the true validity of the measure.

For most organizations this "pure" method of validation is impractical. It is

costly to administer a test when no direct result is forthcoming. This is particularly

true in the case of the purest predictive design, which would require hiring all

applicants. A more realistic but still costly method would entail giving applicants

the test but ignoring the results from this test when making hiring decisions. A

problem with both of these methods is that managers need validation information

quickly to avoid costly mistakes in the immediate future. “If the test you want me to

use is so good,” Jack might ask, “Why can’t I use it now?”

Imagine if we used a predictive validation design whereby we administer the measuring device (e.g., an employment test) to applicants, select on the basis of those scores, and later correlate predictor scores with measures of job performance. Why is this a problem? The primary problem with this strategy is that the correlation underestimates the true relationship between the test and performance because of restriction of range in the predictor. Because only those who scored above the cutoff point on the predictor were hired, we never know how those who were not hired would have scored on the criterion (job performance). Figure 9a shows a "true" relationship between the test and job performance of r = .56. If the organization were to select on the basis of test scores, Figure 9b indicates that, because the range is restricted, only information to the right of the cutoff Xc is considered, and the obtained correlation coefficient would drop to r = .19 even though the true relationship was still r = .56.

Figure 9
Effect of Range Restriction on Observed Correlation

Figure 9a. Relationship Without Range Restriction (rxy = .56)
[Scatterplot titled "Validity of Test": Score on Selection Test (70-140) on the x-axis, Performance Rating (1-10) on the y-axis, for the full applicant pool.]

Figure 9b. Observed Relationship With Range Restriction (rxy = .19)
[The same scatterplot, restricted to applicants scoring above the cutoff.]

Note: Only those applicants with scores above 100 on the selection test were hired. Thus, in validating the test, performance ratings of those not hired are not available. This range restriction downwardly biases the observed correlation (if they had been hired, the observed correlation would have been r = .56).

It is possible to estimate what the correlation between the predictor and criterion would have been if no restriction of range existed.

The formula to correct observed validities for range restriction is relatively complicated. It is provided below:

rt = [(st/sr) × rxy] / √(1 − rxy² + rxy² × (st²/sr²))

Where: rt = estimated "true" correlation between predictor and criterion

rxy = observed correlation between predictor and criterion

st = standard deviation of predictor for total sample (estimated on

applicant pool)

sr = standard deviation of restricted sample

In essence, this formula estimates what the distribution of test scores and job

performance would have looked like if all applicants were hired. As such, it is a

hypothetical means of projecting what the validity would be if all information were available.
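A sketch of this correction in Python; the observed correlation and the two standard deviations below are illustrative values, not the data behind Figure 9.

```python
import math

def correct_range_restriction(r_obs: float, sd_total: float, sd_restricted: float) -> float:
    """Estimated validity had no applicants been screened out."""
    k = sd_total / sd_restricted  # how much hiring shrank the predictor's spread
    return (k * r_obs) / math.sqrt(1 - r_obs**2 + (r_obs**2) * k**2)

# Illustrative values: observed r of .19, with hiring cutting the predictor SD
# from 15 (applicant pool) to 7 (those hired).
print(round(correct_range_restriction(0.19, 15.0, 7.0), 2))  # 0.38
```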

Concurrent Validation

Perhaps the most expedient method of empirical validation is concurrent

validation. In this case, present employees are administered the employment test,

and their most recent performance ratings are correlated with their test scores.

While this approach is convenient, particularly under time constraints, there are

several potential problems. First, it is not clear that current job holders are as

motivated to do well on the predictor (after all, their employment does not hinge on

performance on the test) as actual applicants for the job. Further, how would those


who quit or were fired have tested? This restriction in range of the criterion

attenuates the observed predictor-criterion relationship. Perhaps most importantly,

concurrent designs may be biased by the effect of job experience on the test. Almost

certainly individuals learn skills on the job that are related to the skills assessed by

the employment test. One approach to mitigate this bias is to control (using

multiple regression) for experience in predicting job performance.

Content Validity

Criterion-related validation strategies concern the extent to which the

predictor is a significant sign of the criterion. Content validation concerns the

degree to which the measurement device is an adequate sample of the criterion. In

other words, a test is content valid if it adequately represents the criterion of

interest. For example, Jack might consider content valid a test that entails evaluating how nimbly the applicant scoops ice cream into the cone and serves it to customers. Though there are metrics or statistics to assess content validity, typically it is ascertained by subjective judgment.

If one does not use quantitative results to evaluate the content validity of a

test, how does one go about establishing validity? Typically, an expert or experts

evaluate how well the content of the test represents job performance. In short, the

knowledge, skills, and abilities (as identified by a job description and specification)

required to perform the job must be reflected in the test for it to be judged content

valid. Because content validity is judgmental, it is crucial that those who evaluate

the content of the test be experts regarding the job in question, and be supplied with

accurate information on the test and criterion.


Face Validity

Face validity refers to whether individuals taking the test believe it to be a

valid measure of the criterion. In short, is the test valid on "the face of it?" While

this is an informal and entirely subjective method, it can be very important to

organizations. If applicants view the test as a poor method of selection, using the

test might generate more resentment than it is worth. Although applicants judge a

test as fair or unfair on many grounds, content validity would appear to be one way

to increase face validity. For example, a work sample test (e.g., in Jack’s case, having

applicants scoop ice cream and serve it to customers) would likely be judged to have

high content validity because it samples a key aspect of performance. For the same

reason it should also have high face validity. Thus, content valid tests will almost

always be face valid, although the reverse is not necessarily true.

Construct Validity

Construct validation has as its goal understanding the trait or construct that the test measures. Because it entails more than prediction or sampling, it is a more rigorous method of validation.

different forms, several of the more common are: 1) correlations between several

different measures of the construct; 2) expert judgment regarding the

appropriateness of the test in sampling or predicting the underlying construct; 3)

correlational relationships between the measures and behaviors purportedly

manifested by the construct.11

11 There are more advanced methods (such as factor analysis) and concepts (such as convergent and discriminant validity) designed to assess construct validity (see Schwab & Trevor, 2012).


For example, suppose Jack wished to assess the construct validity of an integrity test. Does the test allow Jack to understand what integrity is and how well it is measured by the test? If Jack knew that his test was highly correlated with other measures of honesty (#1), was rated as appropriate by experts on the subject (#2), and was strongly negatively correlated with stealing (#3), he would have some evidence of the construct validity of the measure. While construct validation is rigorous, the conclusions one can draw about applicants based on the test are stronger, as one has a better idea of what factors cause variation in the construct.

Cross-Validation

How does one know if a validity coefficient calculated from one sample will

apply to other samples of interest? Cross-validation is the procedure by which one

demonstrates whether a predictor validated from the present sample continues to

be a valid predictor when applied to another sample. Cross-validation is important

in selection because a prediction scheme (for example, weights on various

predictors) is often applied to many samples subsequent to the one in which it was

originally developed. It is crucial, therefore, to investigate how valid this scheme is

on the various samples to which it might be applied.

Cross-validation generally begins by gathering predictor and criterion information on the current sample and then calculating a correlation coefficient or regression equation. Next, predictor information is gathered on a separate, independent group. Criterion scores for this group are then predicted using the weights from the original sample. Finally, the predicted criterion values are correlated with the actual criterion values. The higher this correlation, the greater the confidence that the selection method is valid across


samples. Perhaps the most practical approach to cross-validation is to split the sample in half, using one half to develop the prediction scheme and the other half to test it. Regardless of the method used, the cross-validated coefficient can be expected to "shrink," because the original scheme capitalized on idiosyncrasies of the first sample that do not generalize to the second. If the shrinkage is great, doubt is cast on the usefulness of the predictor(s) beyond the sample on which the scheme was originally based.
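A minimal sketch of the split-sample approach on simulated data (the data-generating numbers are arbitrary): develop a regression on one half, predict criterion scores for the hold-out half, and correlate predicted with actual values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated predictor and criterion scores for 200 applicants (values arbitrary).
x = rng.normal(100, 15, size=200)
y = 0.04 * x + rng.normal(0, 1, size=200)

# Develop the prediction scheme on the first half of the sample...
half = len(x) // 2
slope, intercept = np.polyfit(x[:half], y[:half], deg=1)

# ...then apply those weights to the hold-out half.
y_pred = intercept + slope * x[half:]

derivation_r = np.corrcoef(x[:half], y[:half])[0, 1]
cross_validated_r = np.corrcoef(y_pred, y[half:])[0, 1]
print(round(derivation_r, 2), round(cross_validated_r, 2))  # some "shrinkage" is typical
```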

Validity Generalization

One of the traditional views of personnel psychology was that validities for

employment tests are situation-specific. This was based on empirical results

showing considerable variation in validity coefficients across populations. This

opinion carried great weight in the formulation of early standards and laws

governing employee tests, which advised against borrowing validity evidence from

other populations unless it could be demonstrated that work behaviors and the

organizational context between the populations were very similar.

Schmidt and Hunter have convincingly argued that the apparent situational specificity of validity coefficients might be due to artifacts in the measuring procedures. For example, small sample sizes, differences in the reliability of the predictor and criterion, and differences in range restriction are only several of the possible factors that cause validity estimates to vary across samples, irrespective of the true validity. Schmidt, Hunter, and colleagues have found that nearly all of the variance in validity estimates is due to these artifacts. Their findings indicate that validity coefficients are much more generalizable than has typically been assumed. The implication is


that managers may not be forced to "re-invent the wheel" for their staffing

decisions. They may be able to rely on others who have demonstrated the test to be

valid.12 In fact, meta-analysis, introduced in the next section, will show how the

organization can use findings compiled across many organizations in making human

resource decisions.

V. CONFIRMATORY RESEARCH

Obviously, a central part of a manager's job is to make decisions. But how

can one determine the quality of those decisions? Successful outcomes are the

ultimate standard, but final outcomes (e.g., profitability, market share) give us very

poor information about exactly where decision might be improved. Confirmatory

research enables us to investigate the accuracy of human resource decisions, the

cost of errors associated with particular practices, and how to compile findings in

hope of making better decisions in the future.

Decision Analysis

After Jack institutes his new hiring procedure, he might like to see his

"batting average." Remember from hypothesis testing that we discussed four types

of decisions: accepting the null hypothesis when it is true; accepting the null when

it is false (Type II error); rejecting the null when it is true (Type I error); and

rejecting the null when it is false (power). Decision analysis is another 2 × 2

procedure that provides information on the immediate consequences of human

resource decisions. For the purposes of decision analysis, we assume that the null

hypothesis is that the individual will be considered successful on the job. Accepting

12 Not all courts have accepted this standard.


the applicant when he or she will in fact be successful is obviously a correct

decision. However, rejecting the applicant when he or she would have been

successful is an error. Rather than being labeled a Type I error, in decision analysis such

a mistake is termed a false negative (applicants falsely predicted to be

unsuccessful). Rejecting the applicant who would have been considered

unsuccessful is a correct decision. Finally, accepting an applicant who turns out to

be unsuccessful is a false positive (positive performance was falsely predicted).

Figure 10 shows a scatterplot of predictor-criterion scores like that in Figure 9, with a validity coefficient of r = .54. Point Xc represents the cutoff point for predictor

scores (in this case, the cutoff is 94). Applicants scoring to the right of Xc are hired,

those to the left are rejected. Cutoffs are set based on the desired number of

employees hired, minimum qualifications needed, or both factors. Point Ys

represents the minimum performance required to be judged successful on the job

(in this case, the minimum performance baseline is the scale midpoint—5.5 on the

1-10 scale; where the baseline is set depends, of course, on the job, the performance

standards, and so forth). Those above it are considered successful employees; those

below it are not. Applicants in Quadrant I were hired and were above the baseline

(considered successful). Applicants in Quadrant III were not hired and, had they been, would have been below the performance baseline (considered unsuccessful). Thus,

Quadrants I and III are correct decisions. Applicants in Quadrant II were not hired

but, had they been, would have been above the baseline (considered successful).

Applicants in Quadrant IV were hired, but performed below the baseline. Thus,


whereas Quadrants I and III represent correct decisions, Quadrants II and IV

represent errors. Applicants in Quadrant II are false negatives. Employees in

Quadrant IV are false positives.13

Setting a cutoff score defines the selection ratio, or the proportion of applicants hired. It can be calculated using the number of individuals in each quadrant in the following formula:

Selection Ratio = (I + IV) / (I + II + III + IV)

The lower (higher) the cutoff, the higher (lower) the selection ratio. Because 48 out of 67 applicants in Figure 10 were hired, the selection ratio is (48/67) = .72.14

The base rate is the proportion of applicants who would be considered successful if all applicants were hired. It is represented by the following formula:

Base Rate = (I + II) / (I + II + III + IV)

In Figure 10, 45 out of 67 applicants would be considered successful. Therefore, the base rate is .67 (45/67).
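Both ratios follow directly from the quadrant counts, as in this Python sketch. The individual counts below are hypothetical but chosen to reproduce the totals in the text (48 of 67 hired, 45 of 67 successful).

```python
# Hypothetical quadrant counts: I and III are correct decisions,
# II = false negatives, IV = false positives.
q1, q2, q3, q4 = 36, 9, 10, 12

total = q1 + q2 + q3 + q4
selection_ratio = (q1 + q4) / total  # proportion of applicants hired
base_rate = (q1 + q2) / total        # proportion successful if all were hired

print(round(selection_ratio, 2))  # 0.72 (48 of 67 hired)
print(round(base_rate, 2))        # 0.67 (45 of 67 successful)
```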

13 Of course, in a predictive validation design (where the selection measure is used to hire from an applicant pool), if the test is used in making decisions, Quadrants II and III are missing (since applicants who scored below the cut line were never hired). However, as discussed previously, there are several options: (1) until the measure is validated, hiring decisions can be made without regard to scores on the selection measure; (2) simulated results for those quadrants can be constructed based on range restriction; (3) a concurrent validation design can be used such that the selection measure is given to current employees.

14 Selection ratios vary dramatically by job type, industry, and labor market conditions. For example, one would expect a very high selection ratio in hiring packing plant workers in good economic conditions (I worked with one such organization that hired virtually every able-bodied applicant). In contrast, the selection ratio in hiring a professor may be .01, which is precisely what it was with a search committee I chaired in 2012.

Figure 10
Decision Analysis of Predictor-Criterion Scores
[Scatterplot divided into Quadrants I-IV by the predictor cutoff Xc (vertical line) and the success baseline Ys (horizontal line).]

Obviously the goal is to eliminate the errors. One way to reduce the overall

error rate would be to choose a more valid selection procedure. A validity

coefficient of 1.0 (a straight line of scores) would lead to no errors. A coefficient of

0.0 (a circle of scores) would lead to as many errors as correct decisions. The

selection ratio and base rate also have implications for errors. Moving the cutoff or

minimum level of acceptable performance decreases one error while increasing the

other. However, there is a point at which total errors are minimized. The highest


number of correct decisions is where the number of false positives exactly equals

the number of false negatives.

The optimal place to set the cutoff, however, depends on the cost of each

error to organizations. False positives are undoubtedly more salient to managers.

Hiring applicants who later turn out to be poor matches is very visible. Conversely,

those who got away are often unnoticed. A strategy designed to minimize false

positives would mean hiring fewer applicants. The balance, therefore, is between meeting one's labor force requirements and minimizing the number of applicants who are incorrectly hired.

The benefit of decision analysis is not that it makes the staffing decisions for

the manager. Rather, the advantage is that it presents the manager with

consequences of human resource judgments he or she must make. Further, the

natural tradeoff between false positives and false negatives forces managers to

consider the costs of both errors in formulating their selection strategies.

Utility Analysis

It is a truism that profit and loss are the bottom line for most organizations.

Utility analysis concerns the evaluation of implications of human resource (staffing

in particular) decisions on organizations in dollar terms. As such, it is a powerful

means to understand the costs and benefits of decisions managers must make

regarding selection.15

Suppose that Jack wishes to hire 50 employees, and has 100 applicants for

the positions. The selection ratio is .50 (50/100). Jack has the choice of using two

15 Cascio and Aguinis (2010) also analyze the costs associated with other human resource management activities (turnover, absenteeism, training programs).


different predictors but is unsure of which to use (he cannot afford to use both).

Schmidt, Hunter, McKenzie and Muldrow (1979) provide a framework to analyze

which predictor will yield the biggest dollar improvement over random selection.

Suppose the two predictors Jack is considering are the interview (denoted

P1) and a work sample test that entails scooping ice cream and serving it to the

customer (denoted P2). It costs $250 to interview an applicant and $325 per

applicant to administer the work sample (these costs mostly comprise the staff time

required to interview applicants or administer the work sample to them). In the

past Jack has found a correlation of .30 between his ratings of applicants based on

the interview and job performance, and a correlation of .35 between scores on the

work sample and job performance ratings. If the selection ratio is .50, the average

predictor score of the top 50% of applicants is z=.80 (.80 standard deviations above

the mean).16 The final piece of information Jack needs is the standard deviation of

performance in dollars. Cascio and Aguinis (2010) present several methods for

calculating the standard deviation of dollar-valued performance. The simplest

method is to assume that SDy is 40% of employees’ average annual salary. Assume

that Jack finds the standard deviation to be $6,000, indicating that an employee who

performs one standard deviation above the mean is worth $6,000 more to Jack than

the average employee.

Schmidt et al. use the following formula to estimate the net increase in

dollars to the organization using the selection procedure in question over random

selection:

16 Cascio and Aguinis (2010) provide tables for estimating this figure.


U = (Ns × rxy × SDy × zxs) − (Nt × c)

Where:

U = utility (net gain from using selection procedure)

Ns = number of applicants selected

rxy = correlation between predictor and job performance

SDy = standard deviation of performance in dollars

zxs = average standard score on predictor for applicants selected

Nt = total number of applicants

c = cost of predictor per applicant

The net gain over random selection for the interview would be:

UP1 = (50 × .30 × $6,000 × .80) − (100 × $250) = $47,000

The net gain for the work sample would be:

UP2 = (50 × .35 × $6,000 × .80) − (100 × $325) = $51,500

Although both are a substantial improvement over random selection, it appears that Jack would be better off using the work sample, even though it costs more to administer. Use of the work sample is expected to result in a $4,500 annual net gain over using the interview as a predictor.
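The utility comparison is easily scripted. A minimal Python sketch using the figures from Jack's example:

```python
def utility(n_selected: int, validity: float, sd_dollars: float,
            mean_z_selected: float, n_applicants: int, cost_per_applicant: float) -> float:
    """Net dollar gain over random selection (Schmidt et al., 1979)."""
    gain = n_selected * validity * sd_dollars * mean_z_selected
    cost = n_applicants * cost_per_applicant
    return gain - cost

interview = utility(50, 0.30, 6000, 0.80, 100, 250)
work_sample = utility(50, 0.35, 6000, 0.80, 100, 325)
print(round(interview), round(work_sample))  # 47000 51500
```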

One can see that the potential payoff from a selection procedure is a function of several factors. The lower the selection ratio (that is, the more selective the hiring), the greater the utility of a valid predictor; in fact, if the selection ratio were quite high, the work sample would lose money compared to random selection. The validity of the test also increases the utility: if the validity of either test were .10, Jack would lose money using either method relative to random selection. However, since a more valid selection procedure may be more

expensive to administer, one must balance the extra cost against the savings from

better predictors. Finally, as job performance becomes more valuable, it pays the

organization to have a more valid selection procedure.

Meta-Analysis

Remember that Jack wondered how to examine results from other

organizations in formulating his own human resource management policies? He

could rely on information from surveys of other organizations, or he may have

information on his closest competitor(s). However, samples from disparate

populations may be difficult for Jack to assimilate in a systematic manner. Further,

he has no way of determining if his sample is representative. Meta-analysis refers

to the statistical analysis of empirical results accumulated from individual studies.

It allows the collection of data from various studies in an objective and systematic

manner, permitting the manager to make more informed and comprehensive

judgments about the relationship(s) of interest.

The particular methods of meta-analysis vary, depending on the data

available and the preferences of the investigator. The general approach is to

combine findings in a certain manner to arrive at the average result. For example,

suppose Jack had results from 5 organizations on their findings regarding the

relationship between satisfaction with pay and intent to leave the organization.

Their results are described in Figure 11.

Figure 11
Correlation Between Pay Satisfaction and Intent to Leave Organization for 5 Companies

Company    rxy     n
#1        -.25    110
#2        -.30     52
#3        -.45     98
#4        -.29     28
#5        -.51    205

How would Jack interpret these findings? We could find the average correlation between pay satisfaction and intent to leave using the following formula:

r̄ = Σrxy / nr

Where: r̄ = the average correlation
rxy = the correlation from each study
nr = the number of studies

For our example the average correlation is:

r̄ = [(−.25) + (−.30) + (−.45) + (−.29) + (−.51)] / 5 = −.36

One could also calculate a weighted mean, so that studies with larger sample sizes are given proportionately greater weight (thus reducing the influence of sampling error). Again using our example:

r̄ = [(−.25 × 110) + (−.30 × 52) + (−.45 × 98) + (−.29 × 28) + (−.51 × 205)] / 493 = −.41

Based on the weighted result, then, satisfaction with pay explains about 17% of the variance in intent to leave ((−.41)² ≈ .17). If Jack has a problem with turnover, he may want to increase employees' satisfaction with their pay.
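A sketch of both averages, using the correlations and sample sizes from Figure 11:

```python
# Correlations and sample sizes from Figure 11.
rs = [-0.25, -0.30, -0.45, -0.29, -0.51]
ns = [110, 52, 98, 28, 205]

simple_mean = sum(rs) / len(rs)
weighted_mean = sum(r * n for r, n in zip(rs, ns)) / sum(ns)

print(round(simple_mean, 2))    # -0.36
print(round(weighted_mean, 2))  # -0.41
```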


One of the strengths of meta-analysis is that it is possible to combine studies

reporting differing statistics into an overall effect. For example, t-statistics,

correlations, and z-scores can all be transformed into the same metric, enabling

interpretation of the overall relationship despite the differing statistics. The

manager will not often conduct a meta-analysis. However, the increasing proliferation of meta-analytic results in professional journals allows the manager to consult published sources for an overall summary statistic in formulating his or her policies.

Another advantage of meta-analysis is that statistical corrections for study

artifacts can be made. Computing an average correlation weighted by sample size

corrects for sampling error (removing the bias that would be created by giving small

sample correlations or effects the same weight as large sample correlations or

effects). However, other corrections can be made as well, including corrections for

predictor and criterion unreliabilities and for range restriction (each using the

formulae provided earlier).

VI. COMPUTER PACKAGES

The statistics and measurement techniques reviewed in this paper can be calculated, as they typically are, using computer packages. While the packages available are too numerous to mention, PC Magazine reviewed 49 of the most popular statistical packages. The editor recommends five advanced packages: SPSS, Stata, SAS, Minitab, and R. Each performs all the statistics reviewed in this paper: mode, median, mean, standard deviation, correlation, reliability, difference between means, and regression. More advanced statistics are also within the packages' capabilities. The price of these packages averages about $795. The


article also reviews basic packages that are cheaper and easier for the novice to use.

R is particularly noteworthy because it is free (see http://www.r-project.org/),

though it is more technically oriented (and flexible) than other packages.

Spreadsheets such as Excel also perform all of the basic statistical analyses

mentioned above, though they can be quite cumbersome to use (there are add-ins—

such as EZAnalyze [http://www.ezanalyze.com/]—that make analyzing data with

Excel somewhat easier).

VII. SUMMARY

Statistics are the methods used to summarize data, and to infer knowledge

based upon it. Statistics indicating central tendency describe the typical value of a

distribution. Dispersion indicates how variable the scores are from the mean. Both

dispersion and central tendency can be used for inferential purposes. The normal

distribution is used to make probabilistic inferences about variables following such

a distribution.

These inferences are made based upon the null (e.g., no significant

relationship or difference) and alternative (a significant difference or relationship)

hypotheses. Rejecting a null of no differences indicates an inferred difference

between variables. A correlation coefficient is a standardized measure of linear

association between two variables. High correlation coefficients indicate the two variables are strongly related. Regression is the prediction of one variable based on the level of one or more other variables.

Measurement is the assignment of numbers to objects, attributes, or events.

The quality of the measuring device directly affects managers. Good measures


provide important information about the attributes of interest. The two primary

means of evaluating measures are reliability and validity. Reliability indicates the

consistency of the measuring instrument. If a measuring instrument is inconsistent,

serious doubt is cast on its usefulness as a way of gaining information about the

attribute. Standard error of measurement indicates the average degree of error

in the measuring instrument. Validity refers to how well the instrument predicts

the criterion. Valid measures provide much information about the criterion. There

are several different forms of validity and validation, depending on the data and the

goal of the investigator.

There are several ways the manager can investigate and improve on the

quality of his or her decisions. Decision analysis refers to the analysis of mistakes

in human resource decisions. Utility analysis concerns the evaluation of

implications of human resource (particularly staffing) decisions in dollar terms.

Finally, meta-analysis is the empirical analysis of results accumulated from

individual studies.


References

Cascio, W. F., & Aguinis, H. (2010). Applied Psychology in Human Resource Management (7th ed.). Upper Saddle River, NJ: Prentice Hall.

Cohen, J. (1994). The Earth Is Round (p < .05). American Psychologist, 49, 997-1003.

Cronbach, L. J. (1951). Coefficient Alpha and the Internal Structure of Tests. Psychometrika, 16, 297-334.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of Meta-Analysis: Correcting Error and Bias in Research Findings (2nd ed.). Newbury Park, CA: Sage.

Kennedy, P. (2008). A Guide to Econometrics (6th ed.). Hoboken, NJ: Wiley-Blackwell.

Schmidt, F. L., Hunter, J. E., McKenzie, R. C., & Muldrow, T. W. (1979). Impact of Valid Selection Procedures on Work-Force Productivity. Journal of Applied Psychology, 64, 609-626.

Schwab, D. P., & Trevor, C. O. (2012). Research Methods for Organizational Studies (3rd ed.). Florence, KY: Routledge Academic.
