Descriptive Statistics Part II: Graphical Description


What are raw data and processed data?


Bar Chart and Pie Chart

Bar Chart:

An x-y chart in which the x-axis represents values of a quantitative variable or labels of a qualitative variable and the y-axis represents the corresponding values depicted by bars

Pie Chart:

A circle divided into wedge-shaped pieces that represent areas proportional to the frequencies or relative frequencies

Example: The data in the table below show the average points per game for eight NFL teams after 10 games of the 2010 season. Construct a bar chart to compare the performance of the eight teams. Which teams had the highest points per game, and which teams had the lowest?

Team              Average Points/Game
Atlanta           25.6
New England       28.9
Baltimore         23.3
New York Jets     23.8
Pittsburgh        23.5
Philadelphia      28.4
Green Bay         25.2
New Orleans       23.5

Average points per game of eight NFL teams after 10 games in the 2010 season

[Figure: A Bar Chart of Average Points per Game. x-axis: NFL Team; y-axis: Average Points per Game, 0 to 35]
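The comparison asked for in the NFL example can also be checked numerically. The following is a minimal Python sketch, not part of the original slides; the dictionary name `points_per_game` is an illustrative choice, and the text-mode bars only approximate what a charting tool would draw.

```python
# Average points per game, copied from the table in the example.
points_per_game = {
    "Atlanta": 25.6,
    "New England": 28.9,
    "Baltimore": 23.3,
    "New York Jets": 23.8,
    "Pittsburgh": 23.5,
    "Philadelphia": 28.4,
    "Green Bay": 25.2,
    "New Orleans": 23.5,
}

# Teams with the largest and smallest averages.
highest = max(points_per_game, key=points_per_game.get)
lowest = min(points_per_game, key=points_per_game.get)

# Text-mode bar chart: one '#' per point (rounded), longest bar first.
ranked = sorted(points_per_game.items(), key=lambda kv: kv[1], reverse=True)
for team, value in ranked:
    print(f"{team:<14}{'#' * round(value)} {value}")

print("Highest:", highest, "| Lowest:", lowest)
```

Sorting the bars before drawing them makes the highest and lowest teams easy to spot, which is the main purpose of a bar chart used for comparison.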

Using Excel® to Construct a Bar Chart


Example: The data shown below give the percentage of deaths by cause among U.S. residents aged 18 to 24. Construct a pie chart illustrating the causes of death. Which cause contributes the most to deaths in this age range?

Death Cause      Percent
Car Accidents    89
Firearms         2
Poison           4
Other            5

Percent of death incidents and their causes

[Figure: A Pie Chart of the Main Causes of Death of U.S. Residents (Age 18-24). Wedges: Car Accidents 89%, Firearms 2%, Poison 4%, Other 5%]
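Because each wedge of a pie chart is proportional to its relative frequency, the wedge angles follow directly from the percentages. The sketch below is an illustration added here, assuming the usual convention that the full circle is 360 degrees; the variable names are hypothetical.

```python
# Percent of deaths by cause, copied from the table in the example.
percent = {"Car Accidents": 89, "Firearms": 2, "Poison": 4, "Other": 5}

# The percentages should cover the whole circle.
total_percent = sum(percent.values())

# Each wedge angle is its share of the full 360 degrees.
angles = {cause: 360 * p / 100 for cause, p in percent.items()}

# The largest wedge answers "which cause contributes the most?"
largest = max(percent, key=percent.get)

for cause, angle in angles.items():
    print(f"{cause:<14}{percent[cause]:>3}%  {angle:6.1f} degrees")
print("Largest wedge:", largest)
```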

Using Excel® to Construct a Pie Chart


Working Problem 3.3.b:

The bar chart shown below illustrates the financial aid of some U.S. universities (2006).

(c) Which university has the highest financial aid?

(d) Which university has the lowest financial aid?

[Figure: bar chart of Financial Aid, vertical axis from $0 to $16,000.
Bar labels, in the order extracted: Georgia Institute of Technology; University of Tennessee; University of Mississippi; University of Kentucky; Louisiana State University; University of Florida; University of Virginia; University of South Carolina; University of North Carolina; University of Georgia; University of Alabama; University of California (UCLA); North Dakota State University; Florida State University.
Data labels, in the order extracted: $8,222; $6,954; $7,532; $7,861; $8,006; $10,566; $13,449; $9,501; $9,687; $7,320; $7,980; $13,462; $5,487; $8,269.]

Histogram or a Frequency Distribution

This is a distribution that results from grouping of data into mutually exclusive classes of values and displaying frequencies or the number of observations in each class

EXAMPLE: Students’ Grades

Student Grade (the Variable)   Mid-Point   Number of Students (Frequency)
50 to 60                       55          4
60 to 70                       65          6
70 to 80                       75          12
80 to 90                       85          9
90 to 100                      95          4

[Figure: histogram of students’ grades. x-axis: Grade Values (the Variable), class mid-points 55 to 95; y-axis: Number of Students (or Frequency), 0 to 14]

Describing a Histogram or a Frequency Distribution (Basic points)

[Figure: histogram of students’ grades, as in the previous example]

How many students took the test?
35 students (add up the bar heights)

Approximately, what are the minimum and maximum grades, or what is the range?
Min ~55, Max ~95, Range ~40 (read from the class mid-points)

What grade did the largest number of students earn (the mode)?
Mode ~75 (or 70% to 80%)

Describing a Histogram or a Frequency Distribution (More information)

[Figure: histogram of students’ grades, as in the previous example]

Approximately, how many students failed the test (<60%)?
4 students

Approximately, how many students passed the test with a C (70%) or more?
25 students

Approximately, what percent of the students earned an “A” (90% or above)?
~100*(4/35) = 11.429%
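The answers read from the grade histogram can be reproduced from the frequency table. This is a small Python sketch added for illustration; the tuple layout `(lower, upper, frequency)` is an assumption made here, not something the slides prescribe.

```python
# (lower bound, upper bound, frequency) for each grade class in the table.
classes = [(50, 60, 4), (60, 70, 6), (70, 80, 12), (80, 90, 9), (90, 100, 4)]

# Total number of students: add up the bar heights.
total = sum(f for _, _, f in classes)

# The modal class is the one with the tallest bar; report its mid-point.
mode_lower, mode_upper, _ = max(classes, key=lambda c: c[2])
mode_mid = (mode_lower + mode_upper) // 2

# Students below 60% (failed) and at 70% or above (C or better).
failed = sum(f for _, upper, f in classes if upper <= 60)
passed_c = sum(f for lower, _, f in classes if lower >= 70)

# Percent of students in the 90-to-100 class (an "A").
pct_a = 100 * classes[-1][2] / total

print(total, mode_mid, failed, passed_c, round(pct_a, 3))
```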

Histogram or a Frequency Distribution


EXAMPLE: NY-College System Teachers Salaries ($)

NY-College System Teachers Salaries ($)   Mid-Point   Percent Frequency (%)
30,000 to 40,000                          35,000      6
40,000 to 50,000                          45,000      17
50,000 to 60,000                          55,000      26
60,000 to 70,000                          65,000      18
70,000 to 80,000                          75,000      12
80,000 to 90,000                          85,000      8
90,000 to 100,000                         95,000      5
100,000 to 110,000                        105,000     4
110,000 to 120,000                        115,000     4

[Figure: percent frequency histogram of salaries. x-axis: class mid-points, 35,000 to 115,000; y-axis: percent frequency, 0 to 30]

(a) Approximately, what percent of the teachers earn less than $40,000?
(b) Approximately, what percent of the teachers earn more than $100,000?
(c) How much does the largest group of teachers earn (the mode)?

How do we construct a histogram-The Five Key Steps? Example

55   50   47   50   55   81   80   98
62   38   67   70   60   69   78   39
70   65   99   55   64   89   85   65
75   56   75   50  100   68   95   85
50   30   60   66   85   79   85   70

Values of driving-to-work distances (miles)

Step 1: Find the minimum and the maximum values of the data set: min = 30 and max = 100. This gives you an idea of the span, or range, of the entire data set, which is 70 miles.

Step 2: Decide on the class width that you wish to use. This depends on the number of classes (categories) that you wish to use; commonly, at least 5 classes are used. Let us select a class width of 10 miles.

Step 3: Form a frequency table, starting with the first column (listing the classes).

Step 4: Form the second column of the frequency table, which is simply the mid-point of each class.

Step 5: Count the number of observations in each class and place the count in the third column, labeled ‘frequency’.

Classes        Mid-Points (x)   Frequency (f)
25 to < 35     30               1
35 to < 45     40               2
45 to < 55     50               5
55 to < 65     60               8
65 to < 75     70               9
75 to < 85     80               6
85 to < 95     90               5
95 to < 105    100              4
Total                           40

The mid-points (x) form the horizontal axis of the histogram, and the frequencies (f) form the vertical axis.

[Figure: histogram of driving-to-work distances. x-axis: Distance (miles), labeled at the class mid-points; y-axis: Frequency, 0 to 10]
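The five steps can be traced in a few lines of Python. This is a sketch for illustration only: the starting lower bound of 25 is taken from the frequency table in the example (so that 30 becomes a mid-point), and the variable names are hypothetical.

```python
# Driving-to-work distances (miles), copied from the example.
distances = [
    55, 50, 47, 50, 55, 81, 80, 98,
    62, 38, 67, 70, 60, 69, 78, 39,
    70, 65, 99, 55, 64, 89, 85, 65,
    75, 56, 75, 50, 100, 68, 95, 85,
    50, 30, 60, 66, 85, 79, 85, 70,
]

# Step 1: minimum, maximum, and range.
lo, hi = min(distances), max(distances)
data_range = hi - lo

# Step 2: choose the class width (10 miles, as in the example).
width = 10
first_lower = 25  # assumption: same starting bound as the example's table

# Steps 3 to 5: list the classes, their mid-points, and their counts.
table = []
lower = first_lower
while lower <= hi:
    upper = lower + width
    mid = (lower + upper) // 2
    # Each class includes its lower bound and excludes its upper bound.
    count = sum(lower <= d < upper for d in distances)
    table.append((lower, upper, mid, count))
    lower = upper

for lower, upper, mid, count in table:
    print(f"{lower:>3} to < {upper:<4} mid={mid:<4} f={count:<2} {'#' * count}")
print("Total =", sum(row[3] for row in table))
```

Using half-open classes (lower bound included, upper bound excluded) keeps the classes mutually exclusive, so a value such as 55 is counted exactly once.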

What is a relative frequency distribution?

The absolute number of observations corresponding to a certain class or interval can be converted into a relative frequency value, or a fraction of the total frequency, using the following expression:

$$ rf_i = \frac{f_i}{\sum_{j=1}^{k} f_j} $$

where fi is the frequency of the ith class, k is the number of classes, and the denominator is the sum of all frequencies.

We can also obtain the percent relative frequency by multiplying the relative frequency by 100:

$$ rf_i\,\% = 100 \times rf_i $$

What is a relative frequency distribution?

Classes        Mid-Points (x)   Frequency (f)   Relative Frequency (rf)   Percent Relative Frequency (rf %)
25 to < 35     30               1               (1/40) or 0.025           2.5
35 to < 45     40               2               (2/40) or 0.050           5
45 to < 55     50               5               0.125                     12.5
55 to < 65     60               8               0.200                     20
65 to < 75     70               9               0.225                     22.5
75 to < 85     80               6               0.150                     15
85 to < 95     90               5               0.125                     12.5
95 to < 105    100              4               0.100                     10
Total                           40              1.000                     100

[Figure: relative frequency histogram. x-axis: Distance; y-axis: Relative Frequency, 0 to 0.25]

[Figure: percent relative frequency histogram. x-axis: Distance; y-axis: Percent Relative Frequency, 0 to 25]

Cumulative Frequency Curve (or Ogive):

This is a curve or a line graph used to display the cumulative frequency of each class at its upper class boundary. The horizontal axis should represent the upper boundaries of different classes and the vertical axis should represent the cumulative frequencies.


What is a cumulative frequency distribution?

Classes        Mid-Points (x)   Frequency (f)   Relative Frequency (rf)   Percent Relative Frequency (rf %)   Cumulative Frequency (CF)   Percent Cumulative Frequency (CF%)
25 to < 35     30               1               0.025                     2.5                                 1                           2.5
35 to < 45     40               2               0.050                     5                                   3                           7.5
45 to < 55     50               5               0.125                     12.5                                8                           20
55 to < 65     60               8               0.200                     20                                  16                          40
65 to < 75     70               9               0.225                     22.5                                25                          62.5
75 to < 85     80               6               0.150                     15                                  31                          77.5
85 to < 95     90               5               0.125                     12.5                                36                          90
95 to < 105    100              4               0.100                     10                                  40                          100
Total                           40              1                         100

[Figure: cumulative percent curve. x-axis: Distance; y-axis: Cumulative Percent, 0.0 to 100.0]

[Figure: Cumulative Frequency Distribution of Driving-to-Work Distance. x-axis: Distance; y-axis: Cumulative Percent, 0.0 to 100.0; the point p(d < 75) = 62.5% is marked]

What is the key information provided by a cumulative frequency distribution?

The percent of data observations exhibiting values less than a certain value, p(x < xo).

Q: What is the percent of employees driving less than 75 miles to work?

p(d < 75) = 62.5%
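The relative, cumulative, and percent cumulative columns can all be derived from the class frequencies. The sketch below is an added illustration using only the standard library; it assumes, as the ogive definition does, that each cumulative value is read at the upper class boundary.

```python
from itertools import accumulate

# Upper class boundaries and frequencies from the driving-distance table.
upper_bounds = [35, 45, 55, 65, 75, 85, 95, 105]
freq = [1, 2, 5, 8, 9, 6, 5, 4]

total = sum(freq)

# Relative frequency: each class frequency divided by the total.
rel_freq = [f / total for f in freq]

# Cumulative frequency: running sum of the class frequencies.
cum_freq = list(accumulate(freq))

# Percent cumulative frequency at each upper boundary.
cum_pct = [100 * cf / total for cf in cum_freq]

# Percent of employees driving less than 75 miles: read the curve at 75.
p_less_75 = cum_pct[upper_bounds.index(75)]

for ub, cf, cp in zip(upper_bounds, cum_freq, cum_pct):
    print(f"x < {ub:<4} CF = {cf:<3} CF% = {cp}")
print("p(d < 75) =", p_less_75, "%")
```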

Example: The cumulative frequency distribution of touchdowns of the 32 NFL teams after the tenth week of the 2010 season is shown below.

• How many teams scored 20 touchdowns or less?
• How many teams scored 30 touchdowns or more?

[Figure: Cumulative frequency distribution of touchdowns of the NFL teams after the tenth week of the 2010 season. x-axis: Touchdowns, 6 to 44; y-axis: Cumulative Frequency (%), 0.0 to 100.0]

Reading the cumulative frequency curve:

P(x < 20) ≈ 16%, or about 0.16 × 32 ≈ 5 teams

P(x < 30) ≈ 72%, so P(x ≥ 30) ≈ 100 - 72 = 28%, or about 0.28 × 32 ≈ 9 teams

Using Excel® for constructing frequency distributions: Case study: Ceramic tile area

Example: The data in the table represent a sample of 100 tiles selected randomly from a tile operation producing tiles of consistent thickness and nominal dimensions of 16 x 16 inches, i.e. a nominal area of 256 square inches.

258.5  255.0  255.5  256.4  256.6  258.4  257.2  257.4  256.3  259.5
255.7  254.9  256.1  254.5  253.3  257.9  259.1  256.8  257.7  255.1
254.1  255.5  256.5  256.1  255.0  255.9  255.1  254.6  255.1  255.1
255.4  254.3  258.5  256.3  255.6  256.5  257.5  253.8  256.2  256.1
256.2  255.7  257.1  256.7  256.1  257.4  255.0  256.2  254.6  257.0
255.5  256.9  255.8  254.7  256.2  256.9  256.4  255.6  254.8  255.6
257.3  256.8  256.0  254.9  256.0  256.2  257.7  252.7  255.6  255.5
253.9  256.3  255.4  256.1  256.0  254.0  257.8  252.7  256.4  256.6
255.5  255.6  255.1  256.6  254.5  255.4  254.1  256.0  256.9  256.9
254.6  254.8  256.3  255.5  256.4  253.8  254.8  254.6  255.4  255.2

Values of ceramic tile areas (square inches)

All data should be aligned in one column.

[Screenshot: Arrangement of Ceramic Tile Data in one Column, under the header "Ceramic Tile Area (Sq. inch)"]

Using Excel® to perform Descriptive Statistics of Ceramic Tile Data

Excel® Output of Descriptive Statistics of Ceramic Tile Data

• The mean area = 255.87 square inches
• The median = 255.95 square inches
• The mode = 255.5 square inches

Using Excel® Histogram Tool to construct a histogram of Ceramic Tile Data: Step 1

(1) Form a bin range.

Using Excel® Histogram Tool to construct a histogram of Ceramic Tile Data: Steps 2 & 3

(2) Go to Data Analysis.

(3) Select Histogram in Data Analysis and press OK.

Using Excel® Histogram Tool to construct a histogram of Ceramic Tile Data: Steps 4 & 5

(4, 5) Select Cumulative Percentage and Chart Output, and press OK.

Excel® Histogram Output of Ceramic Tile Data

Closing the gap between bars in Excel® Histogram Output

Bin      Frequency   Cumulative %
252.0    0           0.00%
252.5    0           0.00%
253.0    2           2.00%
253.5    1           3.00%
254.0    4           7.00%
254.5    5           12.00%
255.0    13          25.00%
255.5    16          41.00%
256.0    13          54.00%
256.5    20          74.00%
257.0    11          85.00%
257.5    6           91.00%
258.0    4           95.00%
258.5    3           98.00%
259.0    0           98.00%
259.5    2           100.00%
260.0    0           100.00%

Example: Suppose that, in the tile area example discussed above, a consumer would like to purchase tiles with the following specifications: target = 256 square inches, tolerance = 256 ± 1. In other words, the consumer plans to use tiles with an area of 256 square inches but is willing to tolerate tiles ranging in area from 255 to 257 square inches. What percent of tiles will meet these specifications?

Classes            Bin      Frequency   Cumulative %
> 251.5 to 252     252.0    0           0.00%
> 252 to 252.5     252.5    0           0.00%
> 252.5 to 253     253.0    2           2.00%
> 253 to 253.5     253.5    1           3.00%
> 253.5 to 254     254.0    4           7.00%
> 254 to 254.5     254.5    5           12.00%
> 254.5 to 255     255.0    13          25.00%
> 255 to 255.5     255.5    16          41.00%
> 255.5 to 256     256.0    13          54.00%
> 256 to 256.5     256.5    20          74.00%
> 256.5 to 257     257.0    11          85.00%
> 257 to 257.5     257.5    6           91.00%
> 257.5 to 258     258.0    4           95.00%
> 258 to 258.5     258.5    3           98.00%
> 258.5 to 259     259.0    0           98.00%
> 259 to 259.5     259.5    2           100.00%
> 259.5 to 260     260.0    0           100.00%

Target = 256 square inches, tolerance = 256 ± 1

[Figure 3.21 Frequency Distribution: Tiles within the Specification Limits. x-axis: Ceramic Tile Area (Sq. inch), in classes from > 251.5-252 upward; y-axis: Frequency, 0 to 22. The specified area, 256 ± 1.0, is marked in the middle, with a "Reject" region on either side]

[Figure: cumulative frequency curve of tile area. x-axis: Area, 252.0 to 260.0; y-axis: Cumulative Frequency (%), 0.00% to 110.00%]

P(x ≤ 255) = 25%

P(x ≤ 257) = 85%

Percent of tiles of area between 255 and 257 square inches: 85% - 25% = 60%
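The same reading can be reproduced from the bin table of the histogram output. This is an illustrative Python sketch; the helper name `cum_percent` is hypothetical, and it follows the slides' approach of subtracting the cumulative percent at 255 from the cumulative percent at 257.

```python
# (bin upper limit, frequency) pairs copied from the histogram output table.
bins = [
    (252.0, 0), (252.5, 0), (253.0, 2), (253.5, 1), (254.0, 4), (254.5, 5),
    (255.0, 13), (255.5, 16), (256.0, 13), (256.5, 20), (257.0, 11),
    (257.5, 6), (258.0, 4), (258.5, 3), (259.0, 0), (259.5, 2), (260.0, 0),
]

total = sum(f for _, f in bins)


def cum_percent(limit):
    """Percent of tiles whose bin upper limit is at or below `limit`."""
    return 100 * sum(f for upper, f in bins if upper <= limit) / total


p_le_255 = cum_percent(255.0)
p_le_257 = cum_percent(257.0)

# Share of tiles above 255 and up to 257 square inches.
within_spec = p_le_257 - p_le_255

print(p_le_255, p_le_257, within_spec)
```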

Working Problem 3.9:

The data shown here represent the prices ($) of a sample of 30 used cars selected randomly from a large used car lot. Using descriptive statistics, answer the following questions:

(a) Determine the mean, the mode, and the median.
(b) Determine the range, the standard deviation, and the variance.

 8000   11000
 9000    4000
 7000    8000
12000    9000
11000    7000
21000   12000
 8000    6000
 7500   21000
19000    8000
12000    7500
10050    7400
14000    6200
14000   18200
 7000   14230
12000    9000

Using the same data, construct a histogram and a cumulative frequency curve.


Given the histogram of car prices below, answer the following questions:

(a) What is the most common price of cars in the used car lot (the mode)?
(b) What is the range of prices of used cars in the used car lot?
(c) What percent of cars have a price less than $10,000?
(d) What percent of cars have a price greater than $20,000?

[Figure: percent histogram of car prices. x-axis: Car Price ($), 4,000 to 22,000; y-axis: Percent, 0 to 30]

Car Price ($)                                           cumulative
lower       upper     midpoint   freq.   percent   freq.   percent
 4,000  <    6,000     5,000      1        3.3       1       3.3
 6,000  <    8,000     7,000      8       26.7       9      30.0
 8,000  <   10,000     9,000      7       23.3      16      53.3
10,000  <   12,000    11,000      3       10.0      19      63.3
12,000  <   14,000    13,000      4       13.3      23      76.7
14,000  <   16,000    15,000      3       10.0      26      86.7
16,000  <   18,000    17,000      0        0.0      26      86.7
18,000  <   20,000    19,000      2        6.7      28      93.3
20,000  <   22,000    21,000      2        6.7      30     100.0
                                 30      100.0

Comparing frequency distributions: The case of second-hand smokers

Example: Children are a primary target of many studies of second-hand smoking. In this regard, the important variable to be considered is the level of cotinine in the blood.

The data in the table represent values of blood cotinine level for two random samples of children aged 4 to 17. The first set (unexposed children) represents children who were not exposed to tobacco smoke on a regular basis. The second set (exposed children) represents children who were exposed to tobacco smoke on a regular basis. Blood cotinine level is measured in nanograms per milliliter (ng/ml).

Blood-cotinine levels for two groups of children

Unexposed (ng/ml)   Exposed (ng/ml)   Unexposed (ng/ml)   Exposed (ng/ml)
0.54                1.88              0.44                1.78
0.62                1.57              0.4                 2.04
0.65                2.13              0.55                1.93
0.49                2.61              0.35                2.44
0.51                2.25              0.8                 2.4
0.48                2.02              0.33                1.91
0.39                1.65              0.6                 1.74
0.63                2                 1.1                 1.63
0.6                 4                 0.51                2.2
0.43                1.77              0.41                1.61
0.39                2.21              0.51                1.97
0.5                 1.53              0.42                3.5
0.45                2.17              1.2                 1.78
0.57                2.21              0.61                1.81
0.38                1.41              0.48                2.23

• Questions:

• Construct histograms and cumulative frequency curves of the two data sets.
• According to the Centers for Disease Control and Prevention, non-smokers exposed to low levels of ETS (environmental tobacco smoke) typically have blood cotinine concentrations of less than 1 ng/ml.
  - What percent of the unexposed children have a blood cotinine level of more than 1 ng/ml?
  - What percent of the exposed children have a blood cotinine level of more than 1 ng/ml?

                      Unexposed children   Exposed children
Mean                  0.545                2.079
Median                0.505                1.985
Mode                  0.51                 2.21
Standard Deviation    0.195                0.542
Sample Variance       0.03799              0.29398
Range                 0.87                 2.59
Minimum               0.33                 1.41
Maximum               1.2                  4
Sum                   16.34                62.38
Count                 30                   30

Comparison of blood-cotinine (ng/ml) for two groups of children
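The comparison table can be recomputed with Python's standard `statistics` module. The sketch below is an added illustration; the helper name `pct_above` is hypothetical, and the values are typed in from the blood-cotinine table.

```python
import statistics

# Blood-cotinine levels (ng/ml), copied from the table (30 values per group).
unexposed = [
    0.54, 0.62, 0.65, 0.49, 0.51, 0.48, 0.39, 0.63, 0.6, 0.43,
    0.39, 0.5, 0.45, 0.57, 0.38, 0.44, 0.4, 0.55, 0.35, 0.8,
    0.33, 0.6, 1.1, 0.51, 0.41, 0.51, 0.42, 1.2, 0.61, 0.48,
]
exposed = [
    1.88, 1.57, 2.13, 2.61, 2.25, 2.02, 1.65, 2, 4, 1.77,
    2.21, 1.53, 2.17, 2.21, 1.41, 1.78, 2.04, 1.93, 2.44, 2.4,
    1.91, 1.74, 1.63, 2.2, 1.61, 1.97, 3.5, 1.78, 1.81, 2.23,
]


def pct_above(data, threshold):
    """Percent of observations strictly greater than `threshold`."""
    return 100 * sum(x > threshold for x in data) / len(data)


for name, data in (("Unexposed", unexposed), ("Exposed", exposed)):
    print(
        name,
        "mean =", round(statistics.mean(data), 3),
        "median =", round(statistics.median(data), 3),
        "stdev =", round(statistics.stdev(data), 3),
        "range =", round(max(data) - min(data), 2),
        "% above 1 ng/ml =", round(pct_above(data, 1.0), 1),
    )
```

Note that `statistics.stdev` is the sample standard deviation (divisor n - 1), which is assumed here to be the convention behind the "Sample Variance" row of the table.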

[Figure: Histograms of blood-cotinine (ng/ml) for unexposed and exposed children. x-axis: Blood-cotinine level (ng/ml), 0 to 4.25 in steps of 0.25; y-axes: Frequency (exposed) and Frequency (unexposed)]

[Figure: Cumulative Frequency Curves of blood-cotinine (ng/ml) for unexposed and exposed children. x-axis: Blood-cotinine level (ng/ml), 0 to 4.5; y-axis: Cumulative Frequency (%)]

Unexposed: P(x ≤ 1.0) = 93%, P(x > 1.0) = 7%

Exposed: P(x ≤ 1.0) = 0%, P(x > 1.0) = 100%

The shape of the frequency distribution

[Figure: Common Shapes of Frequency Distribution. Each panel plots Frequency against x.
(a) Symmetrical shape (mean, median, and mode coincide)
(b) Skewed to the right (positively skewed), with mean, median, and mode marked
(c) Skewed to the left (negatively skewed), with mean, median, and mode marked
(d) Steep shape
(e) Flat shape
(f) Bimodal or multimodal, with Mode 1, Mode 2, and the mean marked]

Working Problem 3.12: In the following histograms:
- Determine if the shape of the distribution is symmetric, uniform, negatively skewed, positively skewed, or bimodal.
- Determine the relationship between mean, mode, and median for each shape.

[Figures: five histograms, (a) to (e), each plotting Percent against x]

(a) …………. (b) ………………….. (c) ………………….

(d) …………. (e) …………………..

Mean ≈ Mode ≈ Median (……)
Mean > Mode > Median (……)
Mean < Mode < Median (……)
None of the above (……)

Mean ≈ Median (……)
Mean > Median (……)
Mean < Median (……)
None of the above (……)

The weighted mean and standard deviation

Example: Suppose you had a party with 20 people visiting you. You went to a McDonald’s restaurant and bought five Big Mac meals for $6 each, eight Happy Meals for $2.50 each, and seven grilled chicken meals for $5.50 each. How much did you pay, and what is the average price per meal? What is the variance of the meal price?

Price (xi)   Frequency (fi)   xi fi   xi - mean   (xi - mean)^2   fi (xi - mean)^2
6.0          5                30.0     1.575      2.48            12.40
2.5          8                20.0    -1.925      3.71            29.65
5.5          7                38.5     1.075      1.16             8.09
Sum          n = 20           88.5                                50.14

Total paid = $88.50; weighted mean = 88.5 / 20 = $4.425 per meal.
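The columns of the meal-price table can be recomputed as follows. This is a sketch added for illustration: the slides do not show which divisor they use for the variance, so both the sample form (divide by n - 1) and the population form (divide by n) are computed, and the choice between them is left as an assumption.

```python
import math

# Meal prices and the number of meals bought at each price.
prices = [6.0, 2.5, 5.5]
counts = [5, 8, 7]

n = sum(counts)

# Total paid and weighted mean price per meal.
total_paid = sum(x * f for x, f in zip(prices, counts))
mean = total_paid / n

# Weighted sum of squared deviations from the mean.
sum_sq = sum(f * (x - mean) ** 2 for x, f in zip(prices, counts))

# Assumption: the variance divisor is not shown on the slide.
sample_variance = sum_sq / (n - 1)
population_variance = sum_sq / n

print("total paid:", total_paid)
print("weighted mean:", mean)
print("sum of f(x - mean)^2:", round(sum_sq, 2))
print("sample variance:", round(sample_variance, 3),
      "| sample std dev:", round(math.sqrt(sample_variance), 3))
print("population variance:", round(population_variance, 3))
```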

Working Problem 3.14: In a stat test, 4 students made 75/100, 24 students made 85/100, and 12 students made 95/100. What are the mean, standard deviation, and variance of the grade?

The Seno car body shop pays its hourly employees $10.50, $15.00, or $20.00 per hour. There are 26 hourly employees, 14 of whom are paid at the $10.50 rate, 10 at the $15.00 rate, and 2 at the $20.00 rate.

- What is the mean hourly rate paid?
- What is the standard deviation?
- What is the variance?

Use the frequency distribution shown in the table below to determine the mean, standard deviation, and variance of the property tax value for the sample of 50 houses.

Tax ($)   Frequency (fi)
6000      6
7000      8
8000      16
9000      11
10000     5
11000     4

What is Chebyshev’s theorem?

Chebyshev’s Theorem: In a frequency distribution of data, the proportion of the values that lie within k standard deviations of the mean is at least 1 - 1/k^2, where k is any constant greater than one.

Example: The figure below shows the frequency distributions of the annual salary of professors at two different colleges. As you can see in this figure, both colleges share the same average annual salary, but college B has twice the standard deviation, which is reflected in a much wider distribution than that of college A.

• At least what percent of the salaries lie within plus and minus 2.0 standard deviations of the mean?
• At least what percent of the salaries lie within plus and minus 3.0 standard deviations of the mean?

[Figure: Annual Salaries of Full Professors in two Different Colleges. x-axis: Salary, with the common mean at $80,000; y-axis: Relative Frequency (%). College A, s = $10,000; College B, s = $20,000]
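Chebyshev's bound depends only on k, so the same percentages apply to both colleges; only the salary ranges differ. A minimal Python sketch, with the hypothetical helper name `chebyshev_lower_bound`, is given below as an added illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2


mean_salary = 80_000
colleges = {"College A": 10_000, "College B": 20_000}  # standard deviations

for k in (2, 3):
    bound = chebyshev_lower_bound(k)
    for name, s in colleges.items():
        low, high = mean_salary - k * s, mean_salary + k * s
        print(f"k = {k}, {name}: at least {100 * bound:.1f}% "
              f"of salaries lie between ${low:,} and ${high:,}")
```

The bound is "at least 75%" for k = 2 and "at least about 88.9%" for k = 3, regardless of the shape of the distribution.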

Working Problem 3.17: In remodeling your home, you used many contractors that were paid hourly an average wage of $50. The standard deviation of wages was $15.

- At least what percent of the wages lie within plus 2 standard deviations and minus 2 standard deviations of the average wage?

- At least what percent of the wages lie within plus 3 standard deviations and minus 3 standard deviations of the average wage?


What is the empirical rule?

Empirical Rule: For a symmetrical, bell-shaped frequency distribution, 68.26 percent of the observations will lie within plus and minus one standard deviation of the mean; 95.44 percent of the observations will lie within plus and minus two standard deviations of the mean; and 99.74 percent will lie within plus and minus three standard deviations of the mean

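[Figure: symmetrical, bell-shaped distribution in which the mean, median, and mode coincide. x-axis: x; y-axis: Relative Frequency (%). Marked ranges: m +/- 1 s, 68.26%; m +/- 2 s, 95.44%; m +/- 3 s, 99.74%]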

What is the empirical rule?

Example: Assuming that the College ‘A’ frequency distribution of salary shown in the previous figure represents a normal distribution, where the mean of professors’ salaries is $80,000 and the standard deviation is $10,000, answer the following questions:

• What is the percent of salaries within the range of the mean plus and minus 2 standard deviations?
• What is the percent of salaries within the range of the mean plus and minus 3 standard deviations?

[Figure: Annual Salaries of Full Professors in two Different Colleges. x-axis: Salary, with the common mean at $80,000; y-axis: Relative Frequency (%). College A, s = $10,000; College B, s = $20,000]

The percent of salaries within the range of the mean plus and minus 2 standard deviations (from $60,000 to $100,000) is 95.44%.

The percent of salaries within the range of the mean plus and minus 3 standard deviations (from $50,000 to $110,000) is 99.74%.
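The empirical-rule percentages come from the normal distribution and can be computed rather than memorized. The sketch below is an added illustration using the error function from Python's `math` module; the helper name `share_within` is hypothetical. The computed values (about 68.27%, 95.45%, and 99.73%) differ slightly from the rounded table values quoted on the slides.

```python
import math


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


mean_salary, s = 80_000, 10_000  # College A, as in the example

for k in (1, 2, 3):
    low, high = mean_salary - k * s, mean_salary + k * s
    print(f"mean +/- {k} s (${low:,} to ${high:,}): "
          f"{100 * share_within(k):.2f}%")
```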

Working Problem 3.18: A course average grade is typically 80%, and the standard deviation of the grade is 5%. Assuming that the grade value has a bell-shaped, symmetrical distribution:

- What percent of students will make a ‘C’ grade or lower (C = 70%)?
- What percent of students will make a ‘B’ grade or better (B = 80%)?
- What percent of students will make an ‘A’ grade or better (A = 90%)?
- What percent of students will make a ‘C’ grade or better (C = 70%)?

A course average grade is typically 75%, and the standard deviation of the grade is 2.5%. Assuming that the grade value has a bell-shaped, symmetrical distribution:

- What percent of students will make a ‘C’ grade or lower (C = 70%)?
- What percent of students will make a ‘B’ grade or better (B = 80%)?
- What percent of students will make an ‘A’ grade or better (A = 90%)?
- What percent of students will make a ‘C’ grade or better (C = 70%)?

Heights of men have a bell-shaped distribution with a mean of 176 cm and a standard deviation of 7 cm. Using the empirical rule, what is the approximate percentage of men:

a. Between 169 cm and 183 cm?
b. Between 155 cm and 197 cm?
c. Taller than 190 cm?

Other Forms of Graphical Data Description

Dot Plot

[Figure: DotPlot of x, axis from 10 to 35]

Stem and Leaf Plot

Frequency   Stem   Leaf
10          1      2 4 5 6 6 8 8 8 9 9
14          2      0 2 2 2 2 2 2 2 2 2 3 4 4 5
 2          3      0 2
26

Box Plot

[Figure: BoxPlot of x, axis from 10 to 40]

Dot Plot

Example: Suppose you are searching for an apartment to rent near your college, and a random sample of apartments had the following monthly rents ($): 500, 500, 400, 400, 350, 350, 400, 400, 500, 500, 500, 500, 600, 600, 600, 700, 700

Construct a dot plot to illustrate the frequency distribution of apartment rent.

[Figure: Dot Plot of Apartment Rents. x-axis: Monthly Rent ($), 300 to 750]
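A dot plot simply stacks one dot per observation above each value. The following text-mode Python sketch is an added illustration (it draws the dots sideways rather than stacked vertically) using `collections.Counter` from the standard library.

```python
from collections import Counter

# Monthly rents ($) from the example.
rents = [500, 500, 400, 400, 350, 350, 400, 400, 500,
         500, 500, 500, 600, 600, 600, 700, 700]

# Count how many apartments share each rent value.
counts = Counter(rents)

# One row per distinct rent, one 'o' per apartment.
for rent in sorted(counts):
    print(f"{rent:>4} | {'o ' * counts[rent]}")

most_common_rent, most_common_count = counts.most_common(1)[0]
print("Most frequent rent:", most_common_rent)
```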

Stem & Leaf Plot

G1 G2 G3 G4 G5 G6 G7 G8 G9 G10

119 128 133 135 139 140 143 203 149 211

120 129 133 135 139 140 143 147 150 153

122 130 133 136 139 142 144 147 150 198

123 130 134 137 139 142 144 147 150 153

124 130 134 137 139 142 144 147 152 190

126 131 135 137 139 142 144 147 152 155

126 131 135 137 139 142 198 148 152 215

128 131 135 137 140 142 145 148 152 156

128 131 135 139 140 143 211 149 152 159

128 132 135 139 140 143 147 149 152 160

Example: Area values of composite plates (mm2)

An engineer takes 10 sample groups of composite plates, each of 10 plates, and measures the area of each plate (mm2). The reported values are shown in the table above. Construct a Stem & Leaf Plot.

Stem & Leaf Plot

Frequency   Stem   Leaf
  1         11     9
 11         12     0 2 3 4 6 6 8 8 8 8 9
 35         13     0 0 0 1 1 1 1 2 3 3 3 4 4 5 5 5 5 5 5 5 6 7 7 7 7 7 9 9 9 9 9 9 9 9 9
 31         14     0 0 0 0 0 2 2 2 2 2 2 3 3 3 3 4 4 4 4 5 7 7 7 7 7 7 8 8 9 9 9
 14         15     0 0 0 2 2 2 2 2 2 3 3 5 6 9
  1         16     0
  0         17
  0         18
  3         19     0 8 8
  1         20     3
  3         21     1 1 5
100
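The stem and leaf plot above can be built mechanically by splitting each value into a stem (all digits but the last) and a leaf (the last digit). The Python sketch below is an added illustration; the integer split by 10 is an assumption that suits these three-digit whole-number areas.

```python
from collections import defaultdict

# Plate areas (mm2), one row per line of the table (columns G1 to G10).
rows = [
    [119, 128, 133, 135, 139, 140, 143, 203, 149, 211],
    [120, 129, 133, 135, 139, 140, 143, 147, 150, 153],
    [122, 130, 133, 136, 139, 142, 144, 147, 150, 198],
    [123, 130, 134, 137, 139, 142, 144, 147, 150, 153],
    [124, 130, 134, 137, 139, 142, 144, 147, 152, 190],
    [126, 131, 135, 137, 139, 142, 144, 147, 152, 155],
    [126, 131, 135, 137, 139, 142, 198, 148, 152, 215],
    [128, 131, 135, 137, 140, 142, 145, 148, 152, 156],
    [128, 131, 135, 139, 140, 143, 211, 149, 152, 159],
    [128, 132, 135, 139, 140, 143, 147, 149, 152, 160],
]
values = [v for row in rows for v in row]

# Stem = tens and hundreds digits, leaf = units digit.
stems = defaultdict(list)
for value in sorted(values):
    stems[value // 10].append(value % 10)

# Print every stem in range, including empty ones, with its frequency.
for stem in range(min(stems), max(stems) + 1):
    leaves = stems.get(stem, [])
    print(f"{len(leaves):>3}  {stem:>3} | {' '.join(map(str, leaves))}")
print(f"{len(values):>3}")
```

Printing the empty stems (17 and 18 here) keeps the gap between the main body of the data and the high values visible, which is what makes the unusually large areas stand out.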

Statistics            Value
Mean                  144.30
Median                140.24
Mode                  198
Standard Deviation    18.68
Sample Variance       348.78
Range                 96.35
Minimum               118.65
Maximum               215

Percentiles and Box Plot

Example: Data of property tax of a sample of houses in Monmouth County, New Jersey

6121   7748   7094
8419   7828   8390
8452   8289   7178
7141   6644   7886
8461   7963   7914

Sorting the data gives:

n     Annual Property Tax ($)
1     6121
2     6644
3     7094
4     7141
5     7178
6     7748
7     7828
8     7886
9     7914
10    7963
11    8289
12    8390
13    8419
14    8452
15    8461

A set of data can be divided into percentiles, with the location of the desired percentile obtained from the following equation:

$$ l_p = (n + 1)\,\frac{p}{100} $$

where lp represents the location of a certain percentile, p is the required percentile, and n is the number of observations in the data set.

Percentiles and Box Plot

Applying the percentile location equation to the sorted property tax data (n = 15):

$$ l_{25} = (15 + 1)\,\frac{25}{100} = 4 \quad \text{(lower quartile)} $$

$$ l_{50} = (15 + 1)\,\frac{50}{100} = 8 \quad \text{(median)} $$

$$ l_{75} = (15 + 1)\,\frac{75}{100} = 12 \quad \text{(upper quartile)} $$

These locations correspond to the 4th, 8th, and 12th sorted observations: 7141, 7886, and 8390.

[Figure: box plot of Annual Property Tax ($), axis from 6000 to 8600, marking the minimum value, Q1 (lower quartile, lower 25%), the median, Q3 (upper quartile, 75%), and the maximum value]
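The percentile location formula can be applied programmatically. The sketch below is an added illustration: the function name `percentile` is hypothetical, and the linear interpolation used when the location is not a whole number is an assumption, since the slides only show whole-number locations. Other quartile conventions (for example, the interpolation used by spreadsheet quartile functions) can give slightly different values for the same data.

```python
# Annual property tax ($) for the 15 houses, in the order given.
taxes = [6121, 7748, 7094, 8419, 7828, 8390, 8452, 8289,
         7178, 7141, 6644, 7886, 8461, 7963, 7914]


def percentile(data, p):
    """Value at location l_p = (n + 1) * p / 100 in the sorted data (1-based)."""
    ordered = sorted(data)
    location = (len(ordered) + 1) * p / 100
    lower = int(location)        # whole part of the location
    frac = location - lower      # fractional part, if any
    if lower < 1:
        return ordered[0]
    if lower >= len(ordered):
        return ordered[-1]
    # Assumption: interpolate linearly between neighbouring observations.
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


q1 = percentile(taxes, 25)
median = percentile(taxes, 50)
q3 = percentile(taxes, 75)

print("Q1 =", q1, "| median =", median, "| Q3 =", q3)
print("min =", min(taxes), "| max =", max(taxes))
```

Together with the minimum and maximum, these three values are the five numbers needed to draw the box plot.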

Working Problem 3.21: Listed below are the property taxes ($) paid by a sample of 15 houses in Montgomery, Alabama, in 2009:

2000 1500 1800 1050 2000 2050 1800 1670 1800 1400 2800 950 1800 1200 1300

- Locate the median, the first quartile, and the third quartile for the property taxes.
- Construct a stem and leaf plot.
- Construct a histogram.
- Construct a box plot.


For each of the following three data sets:
(a) Determine the mean, median, mode, range, standard deviation, skewness, and kurtosis.
(b) Construct a histogram, a dot plot, and a box plot.
(c) Using these graphs, explain how each graph reveals a different type of information.
(d) Compare the three data sets using the histogram, dot plot, and box plot.

Data Set A:

14  17  19  18  21
14  17  19  18  21
15  17  19  18  22
15  17  19  18  22
16  18  20  18  18
16  18  20  18  18
16  18  20  18  18

Data Set B:

20  21  23  24  25
19  21  23  24  25
18  22  23  24  25
19  22  24  25  25
18  22  24  25  25
20  23  24  25  25
19  23  24  25  25
20  23  24  25  25

Data Set C:

22  16  24  19  22
21  14  23  22  19
22  16  21  20  19
19  17  24  14  16
24  14  14  14  24
14  24  24  20  22
20  21  15  21  20
19  16  19  16  18

Example: Determine the skewness of the data in the table below and explain how the result compares with the box plot.

6121   7748   7094
8419   7828   8390
8452   8289   7178
7141   6644   7886
8461   7963   7914

Descriptive statistics of tax data

Annual Property Tax ($)
Mean                  7701.87
Median                7886
Standard Deviation    717.80
Sample Variance       515241.6
Kurtosis              -0.01347
Skewness              -0.86623
Range                 2340
Minimum               6121
Maximum               8461

What does a negative skew mean?
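The reported skewness can be reproduced with the adjusted sample skewness formula, n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations. That this is the formula behind the table is an assumption made for this sketch, although it does reproduce the tabulated value for these data. A negative result indicates a longer tail on the left, consistent with the mean (7701.87) lying below the median (7886).

```python
import statistics

# Annual property tax ($) for the 15 houses.
taxes = [6121, 7748, 7094, 8419, 7828, 8390, 8452, 8289,
         7178, 7141, 6644, 7886, 8461, 7963, 7914]

n = len(taxes)
mean = statistics.mean(taxes)
s = statistics.stdev(taxes)  # sample standard deviation (divisor n - 1)

# Adjusted sample skewness (assumed to match the table's convention).
skewness = n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in taxes)

print("mean =", round(mean, 2))
print("median =", statistics.median(taxes))
print("std dev =", round(s, 2))
print("skewness =", round(skewness, 5))
```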

[Figure: Box Plot and Histogram of Property Tax Data. Box plot of Annual Property Tax ($), axis from 6000 to 9000, with Q1 = 7159, Median = 7886, Q3 = 8339.5; histogram of Taxes ($), 6000 to 8500, against Frequency, 0 to 6]

Scatter Plot

How many hours do you study every week? How does this impact your grade?

[Figure: empty scatter plot axes. x-axis: The Number of Hours you Study Every Week, 1 to 8; y-axis: Grade (%), 40 to 100]

Scatter Plot: Example

Number of Hours of Study per Week   Grade (out of 100%)
4                                   81
7                                   72
6                                   85
5                                   75
7                                   90
8                                   90
1                                   60
1                                   45
5                                   89
6                                   88
7                                   88
7                                   75
8                                   81
8                                   95
7                                   87
6                                   97
3                                   75
6                                   65
5                                   85
3                                   80

You can develop a scatter plot manually or using Excel.

Developing a Scatter Plot Using Excel: Steps 1 through 3

(1) Select the x and y columns of the data of interest using the mouse.

(2) Click the Insert button.

(3) Click the Charts button.

Developing a Scatter Plot Using Excel: Step 4

(4) Select XY (Scatter) and click OK. You may also select a particular form of scatter plot from the right window.

Developing a Scatter Plot Using Excel: Step 5

(5) The output scatter plot is shown below. You can double-click the graph to open the customization menu, which allows you to customize your graph.

Developing a Scatter Plot Using Excel: Step 6: Adding a Trendline

(6) Click on the points of the plot and then click the right mouse button. A menu appears in which you can choose to add a trendline.

Developing a Scatter Plot Using Excel: Step 7: Trendline Menu

(7) When the trendline menu opens, you will find that the default is linear (a straight line). You can also check the equation and R-squared boxes to obtain the equation representing the x-y relationship and the strength of the relationship between x and y, respectively.

Developing a Scatter Plot Using Excel

[Figure: Scatter Plot Showing the Relationship between the Number of Study Hours and Student’s Grade. x-axis: Hours of Study, 0 to 9; y-axis: Test Grade Percent, 0 to 110. Trendline: f(x) = 3.95977011494253 x + 58.3712643678161, R² = 0.452819307429819]
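The linear trendline and R-squared shown on the chart can be reproduced with the ordinary least-squares formulas. The sketch below is an added illustration in plain Python, with the data typed in from the study-hours table; it is one way to obtain the line, not a description of how the spreadsheet computes it internally.

```python
import math

# Hours of study per week and the corresponding grade (%), from the table.
hours = [4, 7, 6, 5, 7, 8, 1, 1, 5, 6, 7, 7, 8, 8, 7, 6, 3, 6, 5, 3]
grades = [81, 72, 85, 75, 90, 90, 60, 45, 89, 88,
          88, 75, 81, 95, 87, 97, 75, 65, 85, 80]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(grades) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in grades)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, grades))

# Least-squares line y = slope * x + intercept.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Coefficient of determination and correlation coefficient.
r_squared = sxy ** 2 / (sxx * syy)
r = math.copysign(math.sqrt(r_squared), slope)

print(f"f(x) = {slope:.4f} x + {intercept:.4f}")
print(f"R^2 = {r_squared:.4f}, r = {r:.4f}")
```

The correlation coefficient r is the square root of R-squared, carrying the sign of the slope, so it can be obtained from the trendline output even though the chart displays only R-squared.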

Working Problem 3.23: The data in the table below show a random sample of houses with their area in square feet and their price. Use Excel® to develop a scatter plot relating house price to area in square feet. Use the trendline option to obtain the equation and the coefficient of correlation r.

House Square Feet   House Price ($)
3200                410,000
3600                440,000
2600                400,000
2000                260,000
3000                425,000
2800                400,000
2000                280,000
2500                380,000
2500                400,000

Working Problem 3.24: The data in the table below show the points per game and touchdowns (TDs) of a random sample of college teams. Use Excel® to develop a scatter plot relating touchdowns to points/game. Use the trendline option to obtain the equation and the coefficient of correlation r.

TDs   Points/Game
27    23.5
26    23.3
25    21.5
21    19.2
31    26.8
28    25.7
28    24.4
25    22
42    28.9
28    23.8
27    21.3
15    17.2
34    27.4
29    24.3
24    23.8
26    21.7

What is an outlier?

In general, an outlier is a data observation that does not seem to belong to the family of data under consideration.

Examples:

• Students’ Grades (%): 55, 65, 99, 80, 78, 80, 80, 77, 69, 68, 89, 92, 110, 78, 86

• Adult Weight (lb): 140, 82, 122, 98, 110, 165, 138, 200, 175, 204, 290, 188, 145, 167, 389, 220, 175

• A value of $1 million in a data set of a company’s annual wages with a plausible range from $40,000 to $220,000

• A value of 180 in a data set of people’s ages

• A value of 7 years old in a data set of college students’ ages

• A value of $200 million in a data set of the net worth of the top 100 billionaires

Example: Detect outliers in the following data set of monthly on-the-job accidents recorded over a one-year evaluation in a firm of 100 people.

Accidents on the Job

Month       Accidents/Month
January     3
February    4
March       1
April       2
May         1
June        2
July        3
August      15
September   2
October     1
November    3
December    2

• Judgment comes first.

• Another rule used for detecting outliers defines an outlier as “a value that is more than 1.5 times the inter-quartile range below the lower quartile or above the upper quartile”.

Descriptive statistics          Accidents/Month
count                           12
mean                            3.25
sample variance                 14.57
sample standard deviation       3.82
minimum                         1
maximum                         15
range                           14
1st quartile                    1.75
median                          2
3rd quartile                    3
interquartile range             1.25
mode                            2
low extremes                    0
low outliers                    0
high outliers                   0
high extremes                   1

[Figures: BoxPlot and DotPlot of Accidents/Month, each on an axis from 0 to 16]
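The 1.5 × IQR rule can be applied to the accident data as sketched below. This is an added illustration: it uses `statistics.quantiles` with the "inclusive" method, which reproduces the quartiles in the table (1.75 and 3), and it assumes the common convention that values beyond 3 × IQR from the quartiles are labelled "extremes", which would explain why the table counts August as a high extreme rather than a high outlier.

```python
import statistics

# Accidents per month, January through December.
accidents = [3, 4, 1, 2, 1, 2, 3, 15, 2, 1, 3, 2]

# Quartiles with the inclusive method (matches the table: 1.75, 2, 3).
q1, q2, q3 = statistics.quantiles(accidents, n=4, method="inclusive")
iqr = q3 - q1

# Fences at 1.5 x IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
flagged = [x for x in accidents if x < lower_fence or x > upper_fence]

# Assumed convention: beyond 3 x IQR counts as an "extreme" value.
extremes = [x for x in accidents if x < q1 - 3 * iqr or x > q3 + 3 * iqr]

print("Q1 =", q1, "| median =", q2, "| Q3 =", q3, "| IQR =", iqr)
print("fences:", lower_fence, "to", upper_fence)
print("flagged by the 1.5 x IQR rule:", flagged)
print("beyond 3 x IQR (extremes):", extremes)
```

Whether the August value should actually be discarded is still a matter of judgment, as the slide notes: the rule only flags it for a closer look.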

Working Problem 3.25: The data shown in the table below indicate the property tax ($) of a random sample of houses.

Property Tax ($)
 6000    8000    9000
 6800    4800    7000
 5800   11000   10000
 7000   10050   12000
 6800    8000    9000
 4000    6700    7000
28000    6800   10000
10000    4000   11000
12000    5999   10050
 9000   10000    4800
 7000   12000   11000
10000    9500    6800
 5600    7500    5800
 4800   21000    7000
11000    5600    6800
10050    4200    4000

- Determine the mean, the median, and the mode of the property tax.
- Determine the range, standard deviation, and variance.
- Determine the outliers in the data set.
