Class 2: Tues., Sept. 14th Correlation (2.2) Introduction to Measurement Theory: –Reliability of...

Class 2: Tues., Sept. 14th • Correlation (2.2) • Introduction to Measurement Theory: – Reliability of measurements and correlation – Example that demonstrates reliability is not the same as accuracy – Validity
Date posted: 21-Dec-2015

Class 2: Tues., Sept. 14th

• Correlation (2.2)

• Introduction to Measurement Theory:
  – Reliability of measurements and correlation
  – Example that demonstrates reliability is not the same as accuracy
  – Validity

Strength of association

• Strength of the association: A measure of how strong the positive or negative association is. Statistical associations are overall tendencies, not ironclad rules.

• If there is a strong association between two variables, then knowing one helps a lot in predicting the other. But when there is a weak association, information about one variable does not help much in guessing the other.

[Figure: Overlay plots (A) and (B). Each panel plots Alcohol Use (vertical axis, 0 to 25) against Sensation Seeking (horizontal axis; -0.1 to 1 in panel A, -2 to 1.8 in panel B).]

For which data set, (A) or (B), is X more strongly associated with Y?

Correlation

• Motivation: We would like a numerical measure of association.

• Correlation (r): numerical measure of how close X and Y are to a straight line (the straight line that best summarizes the relationship of X and Y in the scatterplot).

• Formula (do not need to use, we use JMP to calculate):

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

• Correlation is always between –1 and 1. Correlations near –1 indicate strong negative association (X and Y close to a downward sloping line), correlations near 0 indicate little association and correlations near 1 indicate strong positive association (X and Y close to an upward sloping line).
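As a minimal sketch of what the software computes, the Python snippet below evaluates r as the sum of products of standardized values divided by n - 1, where the means and sample standard deviations (divisor n - 1) are taken over the data. The function name `correlation` and the data are made up for illustration; they are not from the course data sets or from JMP.

```python
import numpy as np

def correlation(x, y):
    """Correlation r as the average product of standardized values (divisor n - 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    zx = (x - x.mean()) / x.std(ddof=1)   # standardized x, using the sample SD
    zy = (y - y.mean()) / y.std(ddof=1)   # standardized y, using the sample SD
    return float(np.sum(zx * zy) / (n - 1))

# Made-up illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(x, y)
r_numpy = float(np.corrcoef(x, y)[0, 1])   # NumPy's built-in calculation, for comparison
print(r, r_numpy)
```

The two printed values should agree up to rounding error, and reversing the sign of every y value reverses the sign of r.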

[Figure: Six scatterplots (Bivariate Fit) illustrating different correlations:
  y1 by x1: Correlation = -0.06
  y2 by x2: Correlation = 0.41
  y3 by x3: Correlation = 0.92
  y4 by x4: Correlation = -0.40
  y5 by x5: Correlation = -0.92
  y6 by x6: Correlation = 0.99]

Computing Correlation in JMP

• Click Analyze, Multivariate Methods, Multivariate. Put the variables for which you want to compute the correlation into Y, Columns. This produces a scatterplot matrix with a matrix of correlations above it.

• In the scatterplot matrix, each ellipse contains approximately 95% of the points.

Correlations in the Penn Alcohol Data Set

Multivariate Correlations

                     Life Change   Sensation Seeking   Alcohol Use
Life Change               1.0000             -0.0144        0.0790
Sensation Seeking        -0.0144              1.0000        0.5912
Alcohol Use               0.0790              0.5912        1.0000

10 rows not used due to missing values.

[Figure: Scatterplot matrix of Life Change (500 to 3000), Sensation Seeking (0 to 1) and Alcohol Use (0 to 25).]

Properties of Correlation

1. Correlation makes no use of the distinction between response and explanatory variables. It makes no difference what we call X and what we call Y.

2. Correlation requires both variables to be quantitative. We can’t compute correlation between religion and alcohol use.

3. The correlation is dimensionless. The correlation does not change if we change the units of measurement (e.g., change inches to feet) of X or Y.

4. The correlation also does not change if we add a constant to X or Y.
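Properties 1, 3 and 4 can be checked numerically. The sketch below uses simulated heights and weights (made-up numbers, not course data) and a small helper `r` built on NumPy's `corrcoef`; it compares the correlation before and after swapping the variables, converting inches to feet, and adding a constant.

```python
import numpy as np

rng = np.random.default_rng(0)
height_in = rng.normal(68, 3, size=50)                      # simulated heights in inches
weight_lb = 2.5 * height_in + rng.normal(0, 10, size=50)    # simulated weights in pounds

def r(x, y):
    """Correlation between x and y."""
    return float(np.corrcoef(x, y)[0, 1])

base = r(height_in, weight_lb)
swapped = r(weight_lb, height_in)          # property 1: roles of X and Y exchanged
in_feet = r(height_in / 12, weight_lb)     # property 3: inches converted to feet
shifted = r(height_in + 100, weight_lb)    # property 4: constant added to X

print(base, swapped, in_feet, shifted)
```

All four printed values should be the same up to rounding error.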

• To summarize the relationship between X and Y numerically, we need to know the means and standard deviations of X and Y in addition to the correlation.

• Example 2.9: Competitive divers are scored on their form by a panel of judges who use a scale from 1 to 10. We have the scores awarded by two judges, Ivan and George, on a large number of dives. How well do they agree? We do some calculation and find that the correlation between their scores is r = 0.9. But the mean of Ivan’s scores is 3 points lower than George’s mean, so a high correlation alone does not show that the judges agree.

Missing Data

• Missing data is a fact of life in most real data sets.

• In surveys, people often refuse or don’t bother to answer certain questions.

Correlations in Class Survey: 25 total surveys returned

Multivariate Correlations

                              Height     Haircut      Random    Expected         CDs   Home town     Cups of
                                           Price      Number    Studying              population      coffee
                                                                    Time
Height                        1.0000     -0.5391      0.0685     -0.1182      0.0071      0.4002     -0.2356
Haircut Price                -0.5391      1.0000     -0.1864     -0.1045      0.0323     -0.1699      0.5660
Random Number                 0.0685     -0.1864      1.0000     -0.2198     -0.0019      0.1752     -0.0368
Expected Studying Time       -0.1182     -0.1045     -0.2198      1.0000     -0.4539      0.1727     -0.0375
CDs                           0.0071      0.0323     -0.0019     -0.4539      1.0000     -0.2985      0.0662
Home town population          0.4002     -0.1699      0.1752      0.1727     -0.2985      1.0000      0.0787
Cups of coffee               -0.2356      0.5660     -0.0368     -0.0375      0.0662      0.0787      1.0000

15 rows not used due to missing values.

How JMP handles missing data for correlations

• In Analyze, Multivariate Methods, Multivariate, the Correlations Multivariate output that comes up by default uses only the units for which every variable listed in Y, Columns is recorded.

• To obtain pairwise correlations that use all the units for which both of the two variables being considered are recorded, click on the red triangle next to Multivariate and click Pairwise Correlations after obtaining the scatterplot matrix.
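The same two conventions can be illustrated outside JMP. The sketch below uses pandas on a small made-up survey (the column names and values are invented, not the class survey): dropping every row with a missing value before calling `corr()` mimics the default complete-case behavior, while calling `corr()` directly uses, for each pair of columns, all rows where both are recorded.

```python
import numpy as np
import pandas as pd

# Made-up survey responses; np.nan marks an unanswered question
survey = pd.DataFrame({
    "height":         [64, 70, 68, np.nan, 72, 66, 69, 71],
    "haircut_price":  [45, 15, np.nan, 30, 12, 40, 20, np.nan],
    "cups_of_coffee": [3, 0, 1, 2, np.nan, 2, 1, 0],
})

# Complete-case correlations: only rows with every variable recorded
complete_cases = survey.dropna()
listwise = complete_cases.corr()

# Pairwise correlations: each pair uses all rows where both variables are recorded
pairwise = survey.corr()

# Number of rows available for each pair of variables
recorded = survey.notna().astype(int)
counts = recorded.T @ recorded

print(len(complete_cases), "complete rows out of", len(survey))
print(listwise.round(4))
print(pairwise.round(4))
print(counts)
```

The two matrices generally differ, and the counts table shows how many units stand behind each pairwise correlation.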

Multivariate Correlations

                              Height     Haircut      Random    Expected   Home town     Cups of
                                           Price      Number    Studying  population      coffee
                                                                    Time
Height                        1.0000     -0.5420      0.0398     -0.1166      0.4029     -0.2156
Haircut Price                -0.5420      1.0000     -0.1571     -0.1051     -0.1732      0.5432
Random Number                 0.0398     -0.1571      1.0000     -0.2139      0.1519     -0.0151
Expected Studying Time       -0.1166     -0.1051     -0.2139      1.0000      0.1731     -0.0348
Home town population          0.4029     -0.1732      0.1519      0.1731      1.0000      0.0868
Cups of coffee               -0.2156      0.5432     -0.0151     -0.0348      0.0868      1.0000

14 rows not used due to missing values.

Pairwise Correlations

Variable                  by Variable               Correlation   Count
Haircut Price             Height                        -0.5463      23
Random Number             Height                        -0.0458      24
Random Number             Haircut Price                 -0.1704      22
Expected Studying Time    Height                        -0.1862      24
Expected Studying Time    Haircut Price                 -0.0338      22
Expected Studying Time    Random Number                 -0.0958      23
Home town population      Height                         0.3769      20
Home town population      Haircut Price                 -0.1948      19
Home town population      Random Number                  0.1503      19
Home town population      Expected Studying Time         0.1839      20
Cups of coffee            Height                        -0.2353      25
Cups of coffee            Haircut Price                  0.5350      23
Cups of coffee            Random Number                 -0.0447      24
Cups of coffee            Expected Studying Time         0.0256      24
Cups of coffee            Home town population           0.0687      20

More Properties of Correlation

• When looking for relationships between variables, what size of correlation should we get excited about? Unfortunately, there is no absolute answer. Time series of two variables are usually highly correlated, often above 0.95. In social science research on the relationship between people’s attitudes and their characteristics (e.g., income level), researchers are often excited by a correlation of 0.25.

• Resistant statistic: Statistic that is not strongly affected by a few outliers. Correlation is not resistant.
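To see how little it takes to move a correlation, the sketch below uses five made-up points with correlation exactly 0 and then adds a single extreme observation at (10, 10). The data are invented purely for illustration.

```python
import numpy as np

# Made-up points with no linear association: r is 0 by construction
x = np.array([-1.0, -1.0, 1.0, 1.0, 0.0])
y = np.array([-1.0, 1.0, -1.0, 1.0, 0.0])
r_without_outlier = float(np.corrcoef(x, y)[0, 1])

# Add one extreme observation at (10, 10)
x_out = np.append(x, 10.0)
y_out = np.append(y, 10.0)
r_with_outlier = float(np.corrcoef(x_out, y_out)[0, 1])

print(r_without_outlier, r_with_outlier)   # the second value is roughly 0.95
```

One point out of six is enough to turn "no association" into what looks like a very strong positive association.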

Nonlinear relationships

• Correlation measures how close the observations in a scatterplot are to a straight line. Correlation is only a good measure of the association between two variables if the mean of Y given X roughly follows a straight line as X increases.

Correlation and Nonlinear Relationships

[Figure: Bivariate Fit of Mileage By Speed. Scatterplot of Mileage (vertical axis, 5 to 40) against Speed (horizontal axis, 0 to 110).]

Correlation = -0.005, but strong association.

Moral: Don’t use correlation to summarize association when the relationship is nonlinear.
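A made-up curve shows the same effect. In the sketch below, mileage is an exact function of speed (a hypothetical inverted parabola peaking at 55, not the data in the plot), yet the correlation between the two is essentially zero because the rise and the fall cancel out.

```python
import numpy as np

# Hypothetical speeds, symmetric around 55
speed = np.arange(10, 101, 10)

# Hypothetical mileage curve: best at 55, worse at both low and high speeds
mileage = 35 - 0.01 * (speed - 55) ** 2

r_speed = float(np.corrcoef(speed, mileage)[0, 1])

# Against the squared distance from 55, the relationship is a perfect straight line
r_curve = float(np.corrcoef((speed - 55) ** 2, mileage)[0, 1])

print(r_speed, r_curve)
```

Here `r_speed` is 0 up to rounding error even though speed determines mileage completely, while `r_curve` is -1 because mileage is exactly linear in the squared distance from 55.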

Correlation and Reliability of Measurements

• Measurement theory: Branch of applied statistics that attempts to describe, categorize, evaluate and improve the quality of measurements. Measurement theory for psychological attributes such as intellectual ability or personality is called psychometrics.

• Reliability of a measurement: The degree of consistency with which a trait or attribute is measured. A perfectly reliable measurement will produce the same value each time assuming the trait or attribute remains constant.

• Validity of a measurement: The degree to which a measurement measures what it purports to measure.

• The reliability of a measurement is often determined by the correlation between repeated measurements of the same trait/attribute.
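The idea of reliability as a correlation between repeated measurements can be sketched with a simulation. The numbers below are assumptions chosen for illustration (true pulse with standard deviation 10, independent measurement errors with standard deviation 10/3); they are not the class pulse data.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical population: each person has a fixed true pulse, and each
# measurement adds an independent error. With these SDs,
# var(true) / (var(true) + var(error)) = 100 / (100 + 100/9) = 0.9.
true_pulse = rng.normal(75, 10, size=n)
measurement_1 = true_pulse + rng.normal(0, 10 / 3, size=n)
measurement_2 = true_pulse + rng.normal(0, 10 / 3, size=n)

# Reliability estimated as the correlation between the two measurements
reliability = float(np.corrcoef(measurement_1, measurement_2)[0, 1])
print(round(reliability, 2))
```

With this many simulated people the estimate should land very close to 0.9.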

Reliability of Pulse Measurement

[Figure: Scatterplot of Pulse measurement 1 (vertical axis, 50 to 110) against Pulse measurement 2 (horizontal axis, 50 to 110).]

Correlation = 0.9021

When is a Measurement Reliable Enough?

• It is often said that a reliability (correlation) of greater than 0.90 is high.

• For example, a reliability of 0.90 for educational tests has been considered adequate to assure the quality of standardized tests and large scale assessment programs.

Shoe Shopping and the Reliability Coefficient

• Citation: This example was developed by David Rogosa of Stanford University.

Dedicated to Al Bundy

A man who cares as much about good measurement as he does about his own children.

What reliability would you assign this man?

Reliability and Accuracy

Try this on

1. A population of male and female shoe-shoppers who have true shoe sizes between size 5 and size 15 (e.g., the small sizes are female feet translated to the male shoe-size scale).

2. Mr. Bundy measures each shopper’s shoe size as either too large or too small with equal probability.

• On a good day Mr. Bundy misses the correct shoe size by one-half size too big or one-half size too small.

• On other days Mr. Bundy misses the correct shoe size by a full size too big or a full size too small.

In each case the shoe size measurement error has mean 0 (overall and at each level of shoe size) and is uncorrelated with actual shoe size.

• The accuracy of shoe fitting on the good day is poor (as most wearers would notice a half-size misfitting), and on the other days the accuracy is totally unacceptable (as a full-size misfitting would presumably be unwearable).

• The reliability coefficient for Al Bundy on the good day is .973 (better than any standardized test, even though accuracy is poor). The reliability coefficient for Al Bundy making errors of a full shoe size is .902 (comparable to many standardized tests, even though accuracy is unacceptable).

• Moral: Reliability is not the same as accuracy. We’ll discuss more about this and how to measure accuracy on Thursday.
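The two reliability coefficients quoted for Mr. Bundy can be reproduced under one assumption that the slides do not state: true shoe sizes are spread evenly over the half sizes 5, 5.5, ..., 15. The sketch uses the classical test theory definition of the reliability coefficient, var(true) / (var(true) + var(error)), which is the correlation between two measurements whose errors are independent and have the same variance. The function name `reliability` is made up for illustration.

```python
import numpy as np

# Assumption (not stated on the slides): true sizes are spread evenly over
# the half sizes 5, 5.5, ..., 15.
true_sizes = np.linspace(5, 15, 21)
var_true = float(true_sizes.var())   # population variance of the true sizes

def reliability(miss):
    """Reliability when every measurement is off by +miss or -miss with equal probability."""
    var_error = miss ** 2            # a +/- miss error with mean 0 has variance miss^2
    return var_true / (var_true + var_error)

good_day = reliability(0.5)   # half-size misses
bad_day = reliability(1.0)    # full-size misses

print(round(good_day, 3), round(bad_day, 3))   # 0.973 0.902
```

Doubling the size of every miss barely moves the coefficient, because the spread of true sizes across shoppers is large compared with the error. The coefficient describes consistency relative to that spread, not how close each measurement is to the truth.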

Validity

• Validity of a measurement: The degree to which a measurement measures what it purports to measure.

• To what extent is pulse a valid measure of one’s general state of health?

• What time did you go to bed last night?
  – Is it reliable?
  – Is it valid as a measure of what time you actually went to bed?
  – Is it valid as a measure of time spent studying?

Summary

• Correlation:
  – Use: Measure of a certain type of association: how close the points are to a straight line.
  – Caveat: Check in the scatterplot that the mean of Y given X is roughly a straight line as X increases. Otherwise, correlation is not a good measure of association.
  – Not good for measuring association that is nonlinear.

• Measurement Theory:
  – Reliability: The correlation between repeated measurements of the same trait; it quantifies how consistent the measurement is. Not the same as accuracy.
  – Validity: Does the measurement measure what it purports to measure?