Econ stat1


Page 1: Econ stat1


Economic Statistics

University of St. La Salle

Bacolod City

Page 2: Econ stat1

Stages of Research Process:

1. Problem Identification
2. Generating Hypothesis
3. Conducting the Research
4. Statistical Analysis (Descriptive and Inferential Statistics)
5. Drawing Conclusion

Page 3: Econ stat1

Descriptive statistics    Inferential statistics

Mean                      t-test
Median                    Analysis of variance (ANOVA)
Mode                      Correlation
Standard deviation        Multiple regression
Variance                  Factor analysis
Range                     Discriminant analysis
                          Chi square
                          Repeated measures ANOVA

Page 4: Econ stat1


Page 5: Econ stat1

Statistics

- Statistics is a collection of theory and methods applied for the purpose of understanding data.

- It is the art and science of collecting, analyzing, presenting, and interpreting data.

Page 6: Econ stat1

Why Study Econometrics?

Economic theory makes statements or hypotheses. Theories alone do not provide the necessary measure of the strength of a relationship (a numerical estimate of the relationship) or the proper functional relationship between variables.

Example: Law of Demand. A reduction in the price of a commodity is expected to increase the quantity demanded of that commodity.

Econometrics is studied to provide empirical verification of theories.

Page 7: Econ stat1

Data, Data Set, Elements, Variables and Observations

Data are facts and figures that are collected, analyzed, and summarized for presentation and interpretation.

Data set refers to all the data collected in a particular study.

Elements are the entities on which data are collected.

Variable is a characteristic of interest for the elements.

Observation is a set of measurements obtained for a particular element.

Page 8: Econ stat1

Qualitative, Quantitative, Cross-section and Time Series Data

Qualitative data are labels or names used to identify an attribute of each element.

Quantitative data are numeric values that indicate how much or how many.

Qualitative variable is a variable with qualitative data.

Quantitative variable is a variable with quantitative data.

Page 9: Econ stat1

Cross-sectional data are data collected at the same or approximately the same point in time.

Time series data are data collected over several time periods.

Pooled data are data with elements of both cross-sectional and time series data.

Panel data are data in which the same cross-sectional unit, say, a family or firm, is surveyed over time.

Page 10: Econ stat1

ITEM                    1990

PHILIPPINES          9266287
CAR                   165585
ILOCOS                847691
CAGAYAN VALLEY       1164758
CENTRAL LUZON        1910930
S. TAGALOG - A        904297
BICOL                 686998
WESTERN VISAYAS       886732
CENTRAL VISAYAS       182940
EASTERN VISAYAS       337459
WESTERN MINDANAO      350313
NORTHERN MINDANAO     306069
SOUTHERN MINDANAO     649812
CENTRAL MINDANAO      443068
ARMM                  203718
CARAGA                225917

Cross-sectional Data: Volume of Palay Production (000MT), Philippines, 1990.

Page 11: Econ stat1

YEAR REGION    QRICE   QCORN  QSUGAR   QCOCO  QTOBAC   QSPOT  QCASVA

74   1        433810   33845   37367   68289   27694   57910   10351
75   1        603625   31285   38350   69646   25600   71097   12708
76   1        529395   19790   46503   73869   33852   88490   15817
77   1        596155   29155   50320   76316   31190  103983   18587
78   1        660195   30335   36143   81293   60845  107114   16640
79   1        676595   32945   38441   71253   68996  103589   16819
80   1        623640   40785  216334   74249   51149  100693   17928
81   1        730140   41600  218270   81387   54547   93099   19414
82   1        867890   54160  224672   85673   61888   90123   17728
83   1        733795   60200  253069   87104   64188   77201   16245
84   1        764640   63440  198391   88771   72296   62721   14633
85   1        840570   69680  189334   91436   56005   66949   12962
86   1        876690   64700  157268   99855   54370   65743   13686
87   1        729222   74500  136842   98082   60814   65053   13782
88   1        897616   78811   33005   88108   54469   66405   13863
89   1        837404   76686   32260   83421   57732   70148   13771
90   1        896563   97379   44446   84335   67617   38584   12891
91   1        947694  106248   31789   87010   65839   44800   13414
92   1        872891   78380   27402   83180   96460   44323   13140
93   1        886319   85992   30323   79650   84982   41443   14664
94   1        966001  121115   30170   78288   45593   44368   15064
95   1        939839  148501   17561   82704   49666   44488   14696
96   1       1052784  176874   15594  102112   50884   38025   14950
97   1       1137998  216994   13623   95459   49283   28326   15304
98   1        908964  232634       0  109484   49058   26772   15494
99   1       1151794  203720       0  102334   39466   27243   15794
00   1       1280443  195929       0  103868   38164   27320   16143

Time Series Data: Volume of Production (000 MT) of Selected Crops, Ilocos Region, Philippines 1974-2000.

Page 12: Econ stat1

YEAR REGION       QRICE       QCORN     QSUGAR     QCOCO    QTOBAC     QSPOT    QCASVA

74   2        669500.00   255170.00       0.00  12850.98  25251.00  31132.60   1341.49
75   2        779430.00   300220.00       0.00  18883.58  17295.00  38221.79   1646.96
76   2        805030.00   287875.00       0.00  24577.35  14856.40  47572.31   2049.87
77   2        866540.00   330380.00   14488.21  32450.32  12500.00  55901.46   2408.77
78   2        844310.00   336665.00   94404.66  20446.00  12536.00  56975.00   2335.00
79   2        900815.00   338085.00  182568.24  19138.00  12700.00  55920.00   2349.00
80   2        775620.00   194415.00  201052.00  20039.00  13571.00  51298.12   2362.37
81   2        799815.00   278325.00  275132.00  21154.00  13063.00  49318.28   2293.15
82   2        814050.00   259075.00  364464.00  20146.00  13361.00  52396.00   2133.41
83   2        786420.00   230060.00  322158.00  19292.00  10951.00  48100.71   1884.06
84   2        947270.00   301080.00  243165.00  19896.00  11573.00  16789.82   1721.51
85   2       1144755.00   361240.00  231623.00  21103.71   7023.00  17336.10   1975.04
86   2       1158825.00   369180.00  124873.00  21714.92   9963.00  18569.20   2204.20
87   2       1112431.00   409935.00   82786.00  21517.85  11421.00  17938.50   2263.09
88   2       1221445.00   455325.00   66733.00  20192.33  11060.00  18593.50   2200.47
89   2       1206537.00   463414.00   62865.00  19159.40  11557.00  16684.04   2112.56
90   2       1281471.00   566449.00   55459.00  19395.22   6351.00  15684.00   1624.00
91   2       1137064.00   477085.00   68861.00  20490.27   4117.00  15911.00   1395.00
92   2       1198382.00   665865.00   56548.00  21739.50  10066.00  18380.00   1401.00
93   2       1010573.00   459298.00   89208.00  24049.79   9359.00  18140.00   1391.00
94   2       1379936.00   526872.00   93574.00  29282.45   7126.00  20132.00   1643.00
95   2       1489157.00   626654.00   91955.00  23125.95   9320.00  19842.00   1899.00
96   2       1590645.00   486497.00  156503.00  28391.39   9634.00  31565.00  13066.00
97   2       1702006.00   694466.00  241821.00  26691.46  11878.00  33527.00  20116.00
98   2       1224493.00   593341.00  193184.00  30377.32   9040.00  25886.00  16514.00
99   2       1860273.00  1073854.00  190147.00  28475.03   8914.00  30021.00  17987.00
00   2       1968404.00  1001836.00  192020.00  28867.16   7938.00  28161.00  17354.00

Time Series Data: Volume of Production (000 MT) of Selected Crops, Cagayan Valley, Philippines 1974-2000.

Page 13: Econ stat1

Scales of Measurement

The nominal scale has no mathematical value. It is also called a categorical scale. Numbers are assigned to categories of nominal data/variables only to facilitate data processing.

An ordinal scale is a measure in which the data or categories of a variable are ordered or ranked into two or more levels or degrees, such as from low to high or least to most.

An interval scale has the characteristics of an ordinal scale, but in addition, the distance between points in interval scales is equal.

A ratio scale is almost like the interval scale, except that the ratio scale has a real zero point.

Page 14: Econ stat1

Descriptions and Examples of the Four Scales of Measurement

Nominal
  Description: Categories do not have mathematical values. One is not higher or lower than the other.
  Examples: Sex: male, female. Color: red, white, yellow. Civil Status: single, married.

Ordinal
  Description: Categories can be ranked. The difference between the first and the second rank is not necessarily the same as the difference between the second and the third ranks.
  Examples: Degree of malnutrition: 1st degree, 2nd degree, 3rd degree. Honor roll: 1st, 2nd, 3rd. Level of anger: not angry, very angry.

Interval
  Description: The data have numerical value. The distance between two points is the same, but there is no zero point or it may be arbitrary.
  Examples: Body temperature in Fahrenheit: 30 degrees, 40 degrees, 50 degrees. Business capital (PhP): 1m, 2m, 3m.

Ratio
  Description: The same as interval data but the zero point is fixed.
  Examples: No. of children: 0, 1, 2, 3, 4. Hrs. spent in studying: 0, 5, 10.

Page 15: Econ stat1

Data

Qualitative Data
  Tabular Methods:
    - Frequency Distribution
    - Relative Frequency Distribution
    - Percent Frequency Distribution
  Graphical Methods:
    - Bar Graph
    - Pie Chart

Quantitative Data
  Tabular Methods:
    - Frequency Distribution
    - Relative Frequency Distribution
    - Percent Frequency Distribution
    - Cumulative Frequency Distribution
    - Cumulative Relative Frequency Distribution
    - Cumulative Percent Frequency Distribution
  Graphical Methods:
    - Histogram
    - Scatter Diagram

Page 16: Econ stat1

Frequency Distribution: Qualitative Data

A Frequency Distribution is a tabular summary of data showing the number (frequency) of items in each of several nonoverlapping classes.

Page 17: Econ stat1

Coke Classic, Sprite, Pepsi-Cola, Diet Coke, Coke Classic,
Coke Classic, Pepsi-Cola, Diet Coke, Coke Classic, Diet Coke,
Coke Classic, Coke Classic, Coke Classic, Diet Coke, Pepsi-Cola,
Coke Classic, Coke Classic, Dr. Pepper, Dr. Pepper, Sprite,
Coke Classic, Diet Coke, Pepsi-Cola, Diet Coke, Pepsi-Cola,
Coke Classic, Pepsi-Cola, Pepsi-Cola, Coke Classic, Pepsi-Cola,
Coke Classic, Coke Classic, Pepsi-Cola, Dr. Pepper, Pepsi-Cola,
Pepsi-Cola, Sprite, Coke Classic, Coke Classic, Coke Classic,
Sprite, Dr. Pepper, Diet Coke, Dr. Pepper, Pepsi-Cola,
Coke Classic, Pepsi-Cola, Sprite, Coke Classic, Diet Coke

Data From a Sample of 50 Soft Drink Purchases

Page 18: Econ stat1

Frequency Distribution of Softdrink Purchases

Softdrink       Frequency

Coke Classic       19
Diet Coke           8
Dr. Pepper          5
Pepsi-Cola         13
Sprite              5
Total              50

Page 19: Econ stat1

Relative Frequency Distribution

A relative frequency distribution is a tabular summary of data showing the relative frequency for each class.

$\text{Relative frequency of a class} = \dfrac{\text{Frequency of the class}}{n}$

where n = number of observations.

Percent Frequency Distribution

A percent frequency distribution is a tabular summary of data showing the percent frequency for each class. The percent frequency of a class is its relative frequency multiplied by 100.

Page 20: Econ stat1

Relative and Percent Frequency Distributions of Softdrink Purchases

Softdrink       Relative Frequency   Percent Frequency

Coke Classic          0.38                  38
Diet Coke             0.16                  16
Dr. Pepper            0.10                  10
Pepsi-Cola            0.26                  26
Sprite                0.10                  10
Total                 1.00                 100

n = 50
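The relative and percent frequencies can be reproduced from the frequency counts. The following is a minimal Python sketch; the variable names are illustrative and not part of the slides.

```python
# Frequencies from the soft drink frequency distribution (n = 50 purchases).
counts = {
    "Coke Classic": 19,
    "Diet Coke": 8,
    "Dr. Pepper": 5,
    "Pepsi-Cola": 13,
    "Sprite": 5,
}

n = sum(counts.values())

# Relative frequency = class frequency / n; percent frequency = relative frequency * 100.
relative = {drink: f / n for drink, f in counts.items()}
percent = {drink: 100 * r for drink, r in relative.items()}

for drink in counts:
    print(f"{drink:<13} {counts[drink]:>3}  {relative[drink]:.2f}  {percent[drink]:.0f}")
```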

Page 21: Econ stat1


A bar graph is a graphical device depicting data that have been summarized in a frequency, relative frequency, or percent frequency distribution.

The pie chart is a graphical device for presenting relative frequency and percent frequency distributions.

Page 22: Econ stat1

[Figure: Bar Graph of Soft Drink Purchases. Horizontal axis: Soft Drinks (Coke Classic, Diet Coke, Dr. Pepper, Pepsi-Cola, Sprite); vertical axis: Frequency]

Page 23: Econ stat1

[Figure: Pie Chart of Soft Drink Purchases. Coke Classic 38%, Diet Coke 16%, Dr. Pepper 10%, Pepsi-Cola 26%, Sprite 10%]

Page 24: Econ stat1

Sex       Number   Percent

Male        45      39.13
Female      70      60.87
Total      115     100

Frequency Distribution of Students According to Sex

Page 25: Econ stat1


Nutritional status Number Percent

Normal 30 40

1st degree malnourished 20 26.7

2nd degree malnourished 15 20

3rd degree malnourished 10 13.3

Total 75 100

Frequency Distribution of Children by Nutritional Status

Page 26: Econ stat1

Frequency Distribution: Quantitative Data

1. Determine the number of nonoverlapping classes.

2. Determine the width of each class.

3. Determine the class limits.

Page 27: Econ stat1

Number of Classes: Five or six classes

Width of the Classes:

$\text{Approximate class width} = \dfrac{\text{Largest data value} - \text{Smallest data value}}{\text{Number of classes}}$

Class Limits:

The lower class limit identifies the smallest possible data value assigned to the class.

The upper class limit identifies the largest possible data value assigned to the class.

Page 28: Econ stat1

12  14  19  18
15  15  18  17
20  27  22  23
22  21  33  28
14  18  16  13

Audit Times (In Days)

Page 29: Econ stat1

Audit Time   Frequency

10-14            4
15-19            8
20-24            5
25-29            2
30-34            1
Total           20

Frequency Distribution for the Audit-time Data
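The three steps (number of classes, class width, class limits) can be sketched in Python for the audit-time data. The starting limit of 10 and the rounded width of 5 are assumptions chosen to match the classes in the table; the variable names are illustrative.

```python
audit_times = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
               22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

num_classes = 5
# Approximate class width = (largest - smallest) / number of classes = (33 - 12) / 5
approx_width = (max(audit_times) - min(audit_times)) / num_classes

width = 5          # assumed: approximate width rounded up to a convenient value
lower_start = 10   # assumed: convenient first lower class limit

frequency = {}
for k in range(num_classes):
    lower = lower_start + k * width
    upper = lower + width - 1          # classes 10-14, 15-19, ...
    frequency[(lower, upper)] = sum(lower <= t <= upper for t in audit_times)

for (lower, upper), f in frequency.items():
    print(f"{lower}-{upper}: {f}")
```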

Page 30: Econ stat1

[Figure: Histogram for Audit-Time Data. Horizontal axis: Audit Time in Days (10-14, 15-19, 20-24, 25-29, 30-34); vertical axis: Frequency]

Page 31: Econ stat1

Audit Time (days)   Relative Frequency   Percent Frequency

10-14                     .20                  20
15-19                     .40                  40
20-24                     .25                  25
25-29                     .10                  10
30-34                     .05                   5
Total                    1.00                 100

Relative and Percent Frequency Distributions for the Audit-Time Data

n = 20

Page 32: Econ stat1

Cumulative Frequency Distribution shows the number of data items with values less than or equal to the upper class limit of each class.

Cumulative Relative Frequency distribution shows the proportion of data items with values less than or equal to the upper class limit of each class.

Cumulative Percent Frequency distribution shows the percentage of data items with values less than or equal to the upper class limit of each class.

Page 33: Econ stat1

Cumulative Frequency Distribution

Audit Time (days)            Cumulative   Cumulative Relative   Cumulative Percent
                             Frequency    Frequency             Frequency

Less than or equal to 14          4            0.20                   20
Less than or equal to 19         12            0.60                   60
Less than or equal to 24         17            0.85                   85
Less than or equal to 29         19            0.95                   95
Less than or equal to 34         20            1.00                  100

Cumulative Frequency, Cumulative Relative Frequency, and Cumulative Percent Frequency Distributions for the Audit-Time Data
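The cumulative columns are running totals of the class frequencies. A short Python sketch, using the audit-time class frequencies (names are illustrative):

```python
from itertools import accumulate

upper_limits = [14, 19, 24, 29, 34]
frequency = [4, 8, 5, 2, 1]          # audit-time class frequencies
n = sum(frequency)

cumulative = list(accumulate(frequency))            # running total of frequencies
cumulative_relative = [c / n for c in cumulative]   # proportion at or below each upper limit
cumulative_percent = [100 * r for r in cumulative_relative]

for limit, c, r, p in zip(upper_limits, cumulative, cumulative_relative, cumulative_percent):
    print(f"<= {limit}: {c:>2}  {r:.2f}  {p:.0f}")
```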

Page 34: Econ stat1

A scatter diagram is a graphical presentation of the relationship between two quantitative variables.

Page 35: Econ stat1

Week   Number of Commercials (x)   Sales in $100s (y)

 1               2                        50
 2               5                        57
 3               1                        41
 4               3                        54
 5               4                        54
 6               1                        38
 7               5                        63
 8               3                        48
 9               4                        59
10               2                        46

Sample Data for the Stereo and Sound Equipment Store

Page 36: Econ stat1

[Figure: Scatter Diagram for the Stereo and Sound Equipment Store. Horizontal axis: No. of Commercials (0 to 6); vertical axis: Sales Volume (0 to 70)]

Page 37: Econ stat1

Summation Notation

Page 38: Econ stat1

Summation Notation

Σ = "sum of"; X is a variable such as family income.

Then total family income across N observations is

$\sum_{i=1}^{N} X_i = X_1 + X_2 + \dots + X_N$

Page 39: Econ stat1

Summation Notation

Summation of a constant times a variable is equal to the constant times the summation of that variable:

$\sum_{i=1}^{N} kX_i = kX_1 + kX_2 + \dots + kX_N = k\sum_{i=1}^{N} X_i$

Page 40: Econ stat1

Summation Notation

Summation of the sum of observations on two variables is equal to the sum of their summations:

$\sum_{i=1}^{N} (X_i + Y_i) = \sum_{i=1}^{N} X_i + \sum_{i=1}^{N} Y_i$

Page 41: Econ stat1

Summation Notation

Summation of a constant over N observations equals the product of the constant and N:

$\sum_{i=1}^{N} k = kN$
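The three summation rules can be checked numerically. The values of X, Y, and k below are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical data, used only to illustrate the rules.
X = [3.0, 5.0, 8.0]
Y = [1.0, 4.0, 2.0]
k = 2.5
N = len(X)

# Rule 1: sum of (k * X_i) equals k * sum of X_i
rule_constant_multiple = math.isclose(sum(k * x for x in X), k * sum(X))

# Rule 2: sum of (X_i + Y_i) equals sum of X_i plus sum of Y_i
rule_sum_of_sums = math.isclose(sum(x + y for x, y in zip(X, Y)), sum(X) + sum(Y))

# Rule 3: sum of a constant k over N observations equals k * N
rule_constant = math.isclose(sum(k for _ in range(N)), k * N)

print(rule_constant_multiple, rule_sum_of_sums, rule_constant)
```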

Page 42: Econ stat1

Measures of Central Tendency: Mean, Median and Mode

The mean is the average of all values. It is useful in analyzing interval and ratio data. The mean is derived by adding all the values and dividing the sum by the number of cases.

Example: Achievement can be measured by a score in a 100-item test. Scores of 15 students in the test:

82 83 85 87 87 88 90 91 93 93 94 95 95 95 96

Mean = (82 + 83 + 85 + 87 + ... + 96) / 15 = 1354/15 ≈ 90.27

Page 43: Econ stat1

The median is the value in the middle when the data are arranged in order (from lowest to highest or from highest to lowest).

For example:

Scores: 82 83 85 87 87 88 90 91 93 93 94 95 95 95 96

With 15 scores, the median is the 8th value, 91.

Note: For an odd number of observations, the median is the middle value. For an even number of observations, the median is the average of the two middle values.

Scores: 82 83 85 87 87 88 90 91 93 93 94 95 95 95 96 98

With 16 scores, the median is the average of the 8th and 9th values: (91 + 93)/2 = 92.

Page 44: Econ stat1

The mode is the value that occurs with the greatest frequency in a set of figures.

Example: 82 83 85 87 87 88 90 90 90 91 93 93 96 97 97

Here the mode is 90, which occurs three times.
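The three measures can be computed with Python's standard `statistics` module. This is a minimal sketch using the score lists from the examples; summing the 15 listed test scores gives 1354, so the mean is 1354/15.

```python
import statistics

scores = [82, 83, 85, 87, 87, 88, 90, 91, 93, 93, 94, 95, 95, 95, 96]

mean_score = statistics.mean(scores)             # sum of the scores divided by 15
median_odd = statistics.median(scores)           # middle (8th) value of 15 scores
median_even = statistics.median(scores + [98])   # average of the 8th and 9th of 16 scores

mode_example = [82, 83, 85, 87, 87, 88, 90, 90, 90, 91, 93, 93, 96, 97, 97]
mode_score = statistics.mode(mode_example)       # most frequently occurring value

print(round(mean_score, 2), median_odd, median_even, mode_score)
```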

Page 45: Econ stat1

Describing the Variance in the Data (Univariate)

The range is a simple measure of variation calculated as the highest value in a distribution minus the lowest value.

Example: 82 83 85 87 87 88 90 90 90 91 93 93 96 97 97

Range = Highest value - Lowest value

97 - 82 = 15

Page 46: Econ stat1

Variance

The variance is a measure of variability that utilizes all the data. The variance is based on the difference between the value of each observation (x_i) and the mean. The difference between each x_i and the mean (x̄ for a sample, μ for a population) is called a deviation about the mean.

Page 47: Econ stat1

Population Variance

$\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$

Sample Variance

$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$

Page 48: Econ stat1

Number of Students   Mean Class   Deviation About   Squared Deviation
in Class (x_i)       Size (x̄)     the Mean          About the Mean

      46                44             2                   4
      54                44            10                 100
      42                44            -2                   4
      46                44             2                   4
      32                44           -12                 144
   Total                               0                 256

Computation of Deviations and Squared Deviations About the Mean for the Class-Size Data

$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} = \dfrac{256}{4} = 64$

Page 49: Econ stat1

Standard Deviation

The standard deviation is defined as the positive square root of the variance. The standard deviation is easier to interpret than the variance because it is measured in the same units as the data.

Sample Standard Deviation: $s = \sqrt{s^2}$

Population Standard Deviation: $\sigma = \sqrt{\sigma^2}$

Page 50: Econ stat1

For the class-size data (46, 54, 42, 46, 32; mean 44), the squared deviations about the mean sum to 256, so

$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} = \dfrac{256}{4} = 64$

$s = \sqrt{64} = 8$

Page 51: Econ stat1

The coefficient of variation is a relative measure of variability; it measures the standard deviation relative to the mean. It is computed as follows:

$\text{Coefficient of variation} = \dfrac{\text{Standard deviation}}{\text{Mean}} \times 100 = \dfrac{s}{\bar{x}} \times 100$

Page 52: Econ stat1

For the class-size data (46, 54, 42, 46, 32), the mean is 44 and the sample standard deviation is 8, so

$\dfrac{s}{\bar{x}} \times 100 = \dfrac{8}{44} \times 100 = 18.2$
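The variance, standard deviation, and coefficient of variation for the class-size data can be reproduced with the standard `statistics` module, whose `variance` and `stdev` functions use the sample (n - 1) divisor. A minimal sketch:

```python
import statistics

class_sizes = [46, 54, 42, 46, 32]

mean_size = statistics.mean(class_sizes)             # 220 / 5
sample_variance = statistics.variance(class_sizes)   # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(class_sizes)            # positive square root of the variance
cv_percent = sample_sd / mean_size * 100             # standard deviation relative to the mean

print(mean_size, sample_variance, sample_sd, round(cv_percent, 1))
```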

Page 53: Econ stat1

The z-score is often called the standardized value. The standardized value or z-score, z_i, can be interpreted as the number of standard deviations x_i is from the mean x̄. The z-score for any observation can be interpreted as a measure of the relative location of the observation in a data set.

$z_i = \dfrac{x_i - \bar{x}}{s}$

Page 54: Econ stat1

Z-Scores for the Class-Size Data

Number of Students   Deviation About   z-score
in Class             the Mean

      46                  2             2/8 =  0.25
      54                 10            10/8 =  1.25
      42                 -2            -2/8 = -0.25
      46                  2             2/8 =  0.25
      32                -12           -12/8 = -1.50

$z_i = \dfrac{x_i - \bar{x}}{s}$
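The z-scores in the table can be reproduced as follows; this is a minimal sketch using the sample standard deviation, with illustrative variable names.

```python
import statistics

class_sizes = [46, 54, 42, 46, 32]

x_bar = statistics.mean(class_sizes)   # 44
s = statistics.stdev(class_sizes)      # sample standard deviation, 8

# z_i = (x_i - x_bar) / s: number of standard deviations each value lies from the mean
z_scores = [(x - x_bar) / s for x in class_sizes]

for x, z in zip(class_sizes, z_scores):
    print(f"{x}: {z:+.2f}")
```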

Page 55: Econ stat1

12  14  19  18
15  15  18  17
20  27  22  23
22  21  33  28
14  18  16  13

Audit Times (In Days)

$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{385}{20} = 19.25 \approx 19.3$

Page 56: Econ stat1

Audit Time   Frequency

10-14            4
15-19            8
20-24            5
25-29            2
30-34            1
Total           20

Frequency Distribution for the Audit-time Data

Page 57: Econ stat1

Sample Mean for Grouped Data

$\bar{x} = \dfrac{\sum f_i M_i}{n}$

M_i = the midpoint for class i

f_i = the frequency for class i

n = Σ f_i = the sample size

Page 58: Econ stat1

Audit Time   Class Midpoint   Frequency
(Days)           M_i             f_i       f_i M_i

10-14            12               4           48
15-19            17               8          136
20-24            22               5          110
25-29            27               2           54
30-34            32               1           32
Total                            20          380

$\bar{x} = \dfrac{\sum f_i M_i}{n} = \dfrac{380}{20} = 19 \text{ days}$

Page 59: Econ stat1

Sample Variance for Grouped Data

$s^2 = \dfrac{\sum f_i (M_i - \bar{x})^2}{n - 1}$
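As an illustration, the grouped-data formulas can be applied to the audit-time frequency distribution. This is a minimal sketch with illustrative names; the grouped variance it produces is not shown on the slides.

```python
midpoints = [12, 17, 22, 27, 32]     # M_i for classes 10-14, 15-19, ..., 30-34
frequencies = [4, 8, 5, 2, 1]        # f_i

n = sum(frequencies)                 # sample size = sum of the class frequencies

# Grouped mean: sum of f_i * M_i divided by n
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Grouped sample variance: sum of f_i * (M_i - mean)^2 divided by (n - 1)
grouped_variance = sum(
    f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints)
) / (n - 1)

print(grouped_mean, grouped_variance)
```

The grouped mean of 19 days differs slightly from the mean of the raw data (19.25) because each observation is approximated by its class midpoint.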

Page 60: Econ stat1

Among the measures of central tendency discussed, the mean is by far the most widely used.

The mean is not appropriate for highly skewed distributions and is less efficient than other measures of central tendency when extreme scores are possible.

The geometric mean is a viable alternative if all the scores are positive and the distribution has a positive skew.

Page 61: Econ stat1

A distribution is skewed if one of its tails is longer than the other.

This distribution has a positive skew. This means that it has a long tail in the positive direction. Distributions with positive skew are sometimes called "skewed to the right".

Page 62: Econ stat1

The distribution below has a negative skew since it has a long tail in the negative direction, so it is "skewed to the left".

Page 63: Econ stat1

The third distribution is symmetric and has no skew.

Page 64: Econ stat1

Pearsonian Coefficient of Skewness

$SK = \dfrac{3(\text{mean} - \text{median})}{\text{standard deviation}}$

Page 65: Econ stat1

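  X     Y

595    68
520    55
715    65
405    42
680    64
490    45
565    56
580    59
615    56
435    42
440    38
515    50
380    37
510    42
565    53

For X: $\bar{x} = 534$, $s_x = 96.53$, Median = 520

For Y: $\bar{y} = 51.47$, $s_y = 10.11$, Median = 53

$SK = \dfrac{3(\text{mean} - \text{median})}{\text{standard deviation}}$

For X: SK = 3(534 - 520)/96.53 = 0.43

For Y: SK = 3(51.47 - 53)/10.11 = -0.45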
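The two skewness coefficients can be reproduced from the raw X and Y values. The helper name `pearson_skewness` is hypothetical; the sketch uses the sample standard deviation, which matches the 96.53 and 10.11 reported above.

```python
import statistics

def pearson_skewness(values):
    """Pearsonian coefficient of skewness: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)      # sample standard deviation (n - 1 divisor)
    return 3 * (mean - median) / sd

X = [595, 520, 715, 405, 680, 490, 565, 580, 615, 435, 440, 515, 380, 510, 565]
Y = [68, 55, 65, 42, 64, 45, 56, 59, 56, 42, 38, 50, 37, 42, 53]

sk_x = pearson_skewness(X)   # positive: X is skewed to the right
sk_y = pearson_skewness(Y)   # negative: Y is skewed to the left

print(round(sk_x, 3), round(sk_y, 3))
```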

Page 66: Econ stat1

[Figure: Relative positions of the mean, median, and mode in symmetric and skewed distributions]

Page 67: Econ stat1

Measures of Association Between Two Variables

- Covariance
- Correlation Coefficient

Page 68: Econ stat1

Week   Number of Commercials (x)   Sales in $100s (y)

 1               2                        50
 2               5                        57
 3               1                        41
 4               3                        54
 5               4                        54
 6               1                        38
 7               5                        63
 8               3                        48
 9               4                        59
10               2                        46

Sample Data for the Stereo and Sound Equipment Store

Page 69: Econ stat1

[Figure: Scatter Diagram for the Stereo and Sound Equipment Store. Horizontal axis: No. of Commercials (0 to 6); vertical axis: Sales Volume (0 to 70)]

Page 70: Econ stat1

Sample Covariance

$s_{xy} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

Page 71: Econ stat1

 x_i    y_i    x_i - x̄    y_i - ȳ    (x_i - x̄)(y_i - ȳ)

  2     50       -1         -1              1
  5     57        2          6             12
  1     41       -2        -10             20
  3     54        0          3              0
  4     54        1          3              3
  1     38       -2        -13             26
  5     63        2         12             24
  3     48        0         -3              0
  4     59        1          8              8
  2     46       -1         -5              5
Totals
 30    510        0          0             99

Calculations for the Sample Covariance

$s_{xy} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \dfrac{99}{10 - 1} = 11$

Page 72: Econ stat1

[Figure: Scatter Diagram for the Stereo and Sound Equipment Store, divided into quadrants I to IV by the lines x̄ = 3 and ȳ = 51. Horizontal axis: No. of Commercials; vertical axis: Sales Volume]

Page 73: Econ stat1

Correlation Coefficient

$r_{xy} = \dfrac{s_{xy}}{s_x s_y}$

r_xy = sample correlation coefficient
s_xy = sample covariance
s_x = sample standard deviation of x
s_y = sample standard deviation of y
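For the stereo and sound equipment store data, the sample covariance and correlation coefficient can be computed directly from these definitions. A minimal sketch with illustrative variable names; the correlation value it prints is not shown on the slides.

```python
import statistics

commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]        # x
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]    # y, in $100s

n = len(commercials)
x_bar = statistics.mean(commercials)   # 3
y_bar = statistics.mean(sales)         # 51

# Sample covariance: sum of cross-products of deviations divided by (n - 1)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(commercials, sales)) / (n - 1)

# Correlation coefficient: covariance scaled by the two sample standard deviations
r_xy = s_xy / (statistics.stdev(commercials) * statistics.stdev(sales))

print(s_xy, round(r_xy, 2))
```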

Page 74: Econ stat1

x_i   y_i

  5    10
 10    30
 15    50

[Figure: Scatter Diagram Depicting a Perfect Linear Relationship. Horizontal axis: x (5, 10, 15); vertical axis: y (0 to 60)]

Page 75: Econ stat1

Pearson r (Pearson product-moment correlation coefficient)

$r_{xy} = \dfrac{\sum z_x z_y}{n - 1}$

Page 76: Econ stat1

   X      Y      z_x      z_y     z_x z_y

  595     68     0.63     1.64     1.03
  520     55    -0.15     0.35    -0.05
  715     65     1.88     1.34     2.51
  405     42    -1.34    -0.94     1.25
  680     64     1.51     1.24     1.87
  490     45    -0.46    -0.64     0.29
  565     56     0.32     0.45     0.14
  580     59     0.48     0.74     0.35
  615     56     0.84     0.45     0.38
  435     42    -1.03    -0.94     0.96
  440     38    -0.97    -1.33     1.30
  515     50    -0.20    -0.15     0.03
  380     37    -1.60    -1.43     2.28
  510     42    -0.25    -0.94     0.23
  565     53     0.32     0.15     0.05
Totals
 8010    772     0.00     0.00    12.64

$\bar{x} = 534$, $s_x = 96.53$; $\bar{y} = 51.47$, $s_y = 10.11$

For the first pair of scores:

$z_x = \dfrac{595 - 534}{96.53} = 0.63$

$z_y = \dfrac{68 - 51.47}{10.11} = 1.64$

$r_{xy} = \dfrac{\sum z_x z_y}{n - 1} = \dfrac{12.64}{14} = 0.90$

Data for Calculating the Pearson Product-Moment Correlation Coefficient
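The z-score form of the Pearson r can be checked against the raw X and Y values. This sketch standardizes with the sample standard deviation, as in the worked z-scores; the variable names are illustrative.

```python
import statistics

X = [595, 520, 715, 405, 680, 490, 565, 580, 615, 435, 440, 515, 380, 510, 565]
Y = [68, 55, 65, 42, 64, 45, 56, 59, 56, 42, 38, 50, 37, 42, 53]

n = len(X)
x_bar, s_x = statistics.mean(X), statistics.stdev(X)
y_bar, s_y = statistics.mean(Y), statistics.stdev(Y)

# Standardize each score, then average the cross-products over (n - 1)
z_x = [(x - x_bar) / s_x for x in X]
z_y = [(y - y_bar) / s_y for y in Y]
sum_zz = sum(a * b for a, b in zip(z_x, z_y))
r_xy = sum_zz / (n - 1)

print(round(z_x[0], 2), round(z_y[0], 2), round(sum_zz, 2), round(r_xy, 2))
```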

Page 77: Econ stat1

Spearman rho (ρ)

Spearman rho is applicable to research studies in which the data consist of ranks, or in which the raw scores can be converted to rankings. Spearman rho is a special case of the Pearson r, applied to rankings (ordinal data).

$\rho = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)}$

where
n = number of paired ranks
d = difference between the paired ranks

Page 78: Econ stat1

   X      Y    X rank   Y rank     d      d²

  595     68     4.0      1.0     3.0    9.00
  520     55     8.0      7.0     1.0    1.00
  715     65     1.0      2.0    -1.0    1.00
  405     42    14.0     12.0     2.0    4.00
  680     64     2.0      3.0    -1.0    1.00
  490     45    11.0     10.0     1.0    1.00
  565     56     6.5      5.5     1.0    1.00
  580     59     5.0      4.0     1.0    1.00
  615     56     3.0      5.5    -2.5    6.25
  435     42    13.0     12.0     1.0    1.00
  440     38    12.0     14.0    -2.0    4.00
  515     50     9.0      9.0     0.0    0.00
  380     37    15.0     15.0     0.0    0.00
  510     42    10.0     12.0    -2.0    4.00
  565     53     6.5      8.0    -1.5    2.25
Totals
 8010    772                      0     36.50

$\rho = 1 - \dfrac{6(36.50)}{15(225 - 1)} = 1 - 0.07 = 0.93$
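The ranking and the rho computation can be sketched in Python. The helper `average_ranks` is hypothetical; it ranks from largest (rank 1) to smallest and gives tied values the mean of the positions they occupy, which is how the tied ranks 6.5, 5.5, and 12.0 in the table arise.

```python
def average_ranks(values):
    """Rank values from largest (1) to smallest; tied values share their mean rank."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1           # first position occupied by v
        count = ordered.count(v)               # number of tied observations
        rank_of[v] = first + (count - 1) / 2   # mean of the tied positions
    return [rank_of[v] for v in values]

X = [595, 520, 715, 405, 680, 490, 565, 580, 615, 435, 440, 515, 380, 510, 565]
Y = [68, 55, 65, 42, 64, 45, 56, 59, 56, 42, 38, 50, 37, 42, 53]

x_rank = average_ranks(X)
y_rank = average_ranks(Y)

n = len(X)
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(x_rank, y_rank))
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(sum_d2, round(rho, 2))
```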

Page 79: Econ stat1

Size of Correlation                    Interpretation

0.90 to 1.00 (-0.90 to -1.00)          Very high positive (negative) correlation
0.70 to 0.90 (-0.70 to -0.90)          High positive (negative) correlation
0.50 to 0.70 (-0.50 to -0.70)          Moderate positive (negative) correlation
0.30 to 0.50 (-0.30 to -0.50)          Low positive (negative) correlation
0.00 to 0.30 (0.00 to -0.30)           Little if any correlation

Rule of Thumb for Interpreting the Size of a Correlation Coefficient

A correlation coefficient can take on values between -1.0 and +1.0, inclusive. The sign indicates the direction of the relationship. A plus sign indicates that the relationship is positive; a minus sign indicates that the relationship is negative. The absolute value of the coefficient indicates the magnitude of the relationship.

Page 80: Econ stat1

Pearson r
  Variable X: Interval/Ratio (e.g., Number of Commercials; Salary)
  Variable Y: Interval/Ratio (e.g., Sales; Years of Schooling)

Spearman rho (ρ)
  Variable X: Ordinal (Ranking)
  Variable Y: Ordinal (Ranking)

Point-Biserial
  Variable X: Nominal (Dichotomous) (e.g., Gender)
  Variable Y: Interval/Ratio (e.g., Test Scores)

Phi (Φ)
  Variable X: Nominal (Dichotomous) (e.g., Gender; Gender)
  Variable Y: Nominal (Dichotomous) (e.g., Political Party Affiliation; Issues)

Rank-Biserial
  Variable X: Nominal (Dichotomous) (e.g., Marital Status)
  Variable Y: Ordinal (e.g., Socio-economic Status)

Lambda (λ)
  Variable X: Nominal (more than two classification levels) (e.g., Level of Education)
  Variable Y: Nominal (more than two classification levels) (e.g., Occupational Choice)

Matrix Showing Correlation Coefficients Appropriate for Scales of Measurement for Variable X and Variable Y

Page 81: Econ stat1

Student    IQ    Ranked IQ   Dichotomy

   1      103        5           1
   2       94        7           0
   3      117        1           1
   4      112        2           1
   5       89        9           0
   6       93        8           0
   7       99        6           0
   8      107        4           1
   9       87       10           0
  10      110        3           1

Page 82: Econ stat1

Subject   Item Score (X)   Test Score (Y)

   A            1               10
   B            1               12
   C            1               16
   D            1               10
   E            1               11
   F            0                7
   G            0                6
   H            0               11
   I            0                8
   J            0                5
Totals          5               96

X = nominal data with two classification levels (a dichotomous variable). The value 1 is assigned to a correct response to item 1 of the 20-item test and the value 0 to an incorrect response.

Y = data on the total test scores for ten students.

The aim is to correlate success on one item of a test (the dichotomy: either right or wrong) with the total score on the test.

Data for Calculating the Point-Biserial Correlation Coefficient

Page 83: Econ stat1

The point-biserial correlation coefficient

$r_{pb} = \dfrac{\bar{Y}_1 - \bar{Y}_0}{s_y} \sqrt{pq}$

$\bar{Y}_1$ = mean of the Y scores for those individuals with X scores equal to 1

$\bar{Y}_0$ = mean of the Y scores for those individuals with X scores equal to 0

$s_y$ = standard deviation of all Y scores

p = proportion of individuals with an X score of 1
q = proportion of individuals with an X score of 0

The resulting correlation coefficient is the index of the relationship between performance on one test item and performance on the test as a whole.

Page 84: Econ stat1

$r_{pb} = \dfrac{11.80 - 7.40}{3.07} \sqrt{(0.50)(0.50)} = 0.716$

Subjects scoring high on the total test tended to answer item 1 correctly, and those with lower scores tended to answer item 1 incorrectly.
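The point-biserial computation can be reproduced from the item and test scores. One assumption is worth flagging: the value 3.07 for the standard deviation of Y corresponds to the population form (divisor n), so the sketch uses `statistics.pstdev` rather than `statistics.stdev`.

```python
import math
import statistics

item = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]          # X: 1 = correct, 0 = incorrect on item 1
total = [10, 12, 16, 10, 11, 7, 6, 11, 8, 5]   # Y: total test score

y1 = [y for x, y in zip(item, total) if x == 1]
y0 = [y for x, y in zip(item, total) if x == 0]

p = len(y1) / len(item)          # proportion with X = 1
q = 1 - p                        # proportion with X = 0
s_y = statistics.pstdev(total)   # assumed: population standard deviation (divisor n)

r_pb = (statistics.mean(y1) - statistics.mean(y0)) / s_y * math.sqrt(p * q)

print(statistics.mean(y1), statistics.mean(y0), round(s_y, 2), round(r_pb, 3))
```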

Page 85: Econ stat1

Person   Gender (X)   Political Affiliation (Y)

  A          1                  1
  B          1                  1
  C          1                  0
  D          1                  1
  E          1                  1
  F          0                  0
  G          0                  1
  H          0                  1
  I          0                  0
  J          0                  0
Totals       5                  6

Gender: 1 = FEMALE, 0 = MALE
Political affiliation: 1 = PRO-ADMIN, 0 = ANTI-ADMIN

Data for Calculating the Phi (Φ) Coefficient

X and Y are nominal dichotomous variables.

Page 86: Econ stat1

                                    Gender
Political affiliation      Male (0)   Female (1)   Totals

Pro-Admin (1)                  2           4           6
Anti-Admin (0)                 3           1           4
Totals                         5           5          10

                                  Variable X
Variable Y                     0           1         Totals

1                              A           B         A + B
0                              C           D         C + D
Totals                       A + C       B + D         N

Phi (Φ) coefficient:

$\Phi = \dfrac{BC - AD}{\sqrt{(A + B)(C + D)(A + C)(B + D)}}$

2x2 Contingency Table for Computing the Phi (Φ) Coefficient

Page 87: Econ stat1

$\Phi = \dfrac{(4)(3) - (2)(1)}{\sqrt{(2 + 4)(3 + 1)(2 + 3)(4 + 1)}} = 0.408$

This coefficient indicates that there is a low positive relationship between gender and political affiliation. Females tend to be pro-admin and males tend to be anti-admin.

This direction is evidenced by the positive correlation, which indicates that scores of 1 tend to be associated with scores of 1 (1 = female, pro-admin) and zeros with zeros (0 = male, anti-admin).
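The 2x2 cell counts and the phi coefficient can be derived from the person-level data. In this sketch the cell labels A, B, C, D follow the layout of the 2x2 contingency table (Y = 1 in the top row, X = 0 in the left column); variable names are illustrative.

```python
import math

gender = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]        # X: 1 = female, 0 = male
affiliation = [1, 1, 0, 1, 1, 0, 1, 1, 0, 0]   # Y: 1 = pro-admin, 0 = anti-admin

pairs = list(zip(gender, affiliation))

A = pairs.count((0, 1))   # male, pro-admin
B = pairs.count((1, 1))   # female, pro-admin
C = pairs.count((0, 0))   # male, anti-admin
D = pairs.count((1, 0))   # female, anti-admin

phi = (B * C - A * D) / math.sqrt((A + B) * (C + D) * (A + C) * (B + D))

print(A, B, C, D, round(phi, 3))
```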

Page 88: Econ stat1

                           Less      HS         Some      College    Graduate
                           than HS   Graduate   College   Graduate   Degree     Total

Laborer/Farmers              347       128         84        37          5        601
Skilled Crafts               164       277        103        43         36        623
Sales/Clerical                30        77        217       147         80        551
Professional/Managerial        2        34         82       198        267        583
Total                        543       516        486       425        388       2358

Data for Determining the Relationship Between Level of Education and Occupational Choice

Lambda (λ) coefficient

$\lambda = \dfrac{\sum_{j=1}^{J} n_{mj} + \sum_{i=1}^{I} n_{im} - n_{+m} - n_{m+}}{2n - n_{+m} - n_{m+}}$

n_mj = largest frequency in the jth column
n_im = largest frequency in the ith row
n_m+ = largest marginal row total
n_+m = largest marginal column total
n = number of observations

Page 89: Econ stat1

$\sum_{j=1}^{J} n_{mj} = 347 + 277 + 217 + 198 + 267 = 1306$

$\sum_{i=1}^{I} n_{im} = 347 + 277 + 217 + 267 = 1108$

n_m+ = 623

n_+m = 543

n = 2358

$\lambda = \dfrac{1306 + 1108 - 623 - 543}{2(2358) - 623 - 543} = \dfrac{1248}{3550} = 0.352$

There is a moderate relationship between level of education and occupational choice. Based on the data, those individuals with more education tend to have sales/clerical or professional/managerial positions, whereas those with less education tend to have laborer/farmer or skilled-crafts positions.
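The lambda computation can be sketched from the cross-tabulation of occupation (rows) by education (columns). Variable names are illustrative. With the table as transcribed here, the formula evaluates to 1248/3550, or about 0.352.

```python
# Rows: Laborer/Farmers, Skilled Crafts, Sales/Clerical, Professional/Managerial
# Columns: Less than HS, HS Graduate, Some College, College Graduate, Graduate Degree
table = [
    [347, 128, 84, 37, 5],
    [164, 277, 103, 43, 36],
    [30, 77, 217, 147, 80],
    [2, 34, 82, 198, 267],
]

columns = list(zip(*table))
n = sum(sum(row) for row in table)

col_max_sum = sum(max(col) for col in columns)    # sum of largest frequency in each column
row_max_sum = sum(max(row) for row in table)      # sum of largest frequency in each row
max_row_total = max(sum(row) for row in table)    # largest marginal row total
max_col_total = max(sum(col) for col in columns)  # largest marginal column total

lam = (col_max_sum + row_max_sum - max_col_total - max_row_total) / (
    2 * n - max_col_total - max_row_total
)

print(col_max_sum, row_max_sum, max_row_total, max_col_total, round(lam, 3))
```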

Page 90: Econ stat1

Person   Immigrating Generation (X)   Rank of Socio-economic Status (Y)

  A                 1                               1
  B                 1                               2
  C                 1                               3
  D                 0                               4
  E                 0                               5
  F                 1                               6
  G                 1                               7
  H                 0                               8
  I                 1                               9
  J                 0                              10
  K                 0                              11
  L                 0                              12

Data for Calculating the Rank-Biserial Correlation Coefficient

The aim is to determine the relationship between the fact that an individual is at least a second-generation American (X) and socio-economic status (Y).

The X variable (immigration status) is considered a nominal dichotomy (0 = less than second generation; 1 = second generation or greater). The data for the Y variable (socio-economic status) are ranked with 1 = highest status, 2 = next highest status, and so on.

Page 91: Econ stat1

Rank-Biserial Correlation Coefficient

$r_{rb} = \dfrac{2}{n} (\bar{Y}_1 - \bar{Y}_0)$

n = number of observations

$\bar{Y}_1$ = mean rank for individuals with X scores equal to 1

$\bar{Y}_0$ = mean rank for individuals with X scores equal to 0

Page 92: Econ stat1

$r_{rb} = \dfrac{2}{12} \left( \dfrac{28}{6} - \dfrac{50}{6} \right) = \dfrac{1}{6} (4.667 - 8.333) = -0.611$

This negative coefficient indicates that those who are at least second-generation Americans tend to have higher socio-economic status (recall that rank 1 is the highest status).
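The rank-biserial computation can be reproduced from the twelve person-level records. A minimal sketch with illustrative variable names:

```python
import statistics

generation = [1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0]   # X for persons A to L
ses_rank = list(range(1, 13))                       # Y: rank 1 = highest status

n = len(generation)
ranks_1 = [r for g, r in zip(generation, ses_rank) if g == 1]   # second generation or greater
ranks_0 = [r for g, r in zip(generation, ses_rank) if g == 0]   # less than second generation

mean_rank_1 = statistics.mean(ranks_1)   # 28 / 6
mean_rank_0 = statistics.mean(ranks_0)   # 50 / 6

r_rb = (2 / n) * (mean_rank_1 - mean_rank_0)

print(sum(ranks_1), sum(ranks_0), round(r_rb, 3))
```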

Page 93: Econ stat1

Aside from the Spearman rank correlation, there are other correlations that are applied to two ordinal variables. These correlation coefficients are distribution free and are usually applied to the ranks of the two variables. Examples are the Goodman and Kruskal Gamma and Kendall's Tau.

Page 94: Econ stat1


Goodman and Kruskal Gamma

The Gamma is a simple symmetric correlation. It does not correct for tied ranks. It is one of many indicators of monotonicity that may be applied. Monotonicity is measured by the proportion of concordant changes from one value in one variable to paired values in the other variable.

Concordance (C)--when the change in one variable is positive and the corresponding change in the other variable is also positive.

Discordance (D) --when the change in one variable is positive and the corresponding change in the other variable is negative.

Page 95: Econ stat1


Kendall's Tau a

The number of concordances minus the number of discordances is compared to the total number of pairs, n(n-1)/2.
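Counting concordant and discordant pairs can be sketched as follows. The rankings are hypothetical, chosen to include one tie, and the Gamma formula (C - D)/(C + D) is the standard definition rather than something shown on these slides. Tau a follows the description above: (C - D) divided by the total number of pairs, n(n - 1)/2.

```python
from itertools import combinations

def concordance_counts(x, y):
    """Count concordant (C) and discordant (D) pairs; tied pairs count as neither."""
    c = d = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x2 - x1) * (y2 - y1)
        if direction > 0:
            c += 1       # both variables change in the same direction
        elif direction < 0:
            d += 1       # the variables change in opposite directions
    return c, d

# Hypothetical ranks on two ordinal variables (one tie in y).
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 3, 5]

C, D = concordance_counts(x, y)
n = len(x)

gamma = (C - D) / (C + D)              # ignores tied pairs
tau_a = (C - D) / (n * (n - 1) / 2)    # compares C - D to all n(n-1)/2 pairs

print(C, D, round(gamma, 3), tau_a)
```

Because Gamma drops tied pairs from the denominator while Tau a does not, Gamma is the larger of the two whenever ties are present.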

Page 96: Econ stat1

Kendall's Tau b

The Kendall's Tau b statistic controls for tied ranks.

Page 97: Econ stat1

Asymmetric Somers' D

Computed with Y and with X as the dependent variable, respectively.

Page 98: Econ stat1

Other correlations:

1. Tetrachoric correlation
2. Biserial correlation
3. Polychoric correlation
4. Polyserial correlation
5. Partial correlation
6. Multiple correlation