Chapter 2 | Descriptive Statistics 67 2|DESCRIPTIVE STATISTICS
Descriptive Statistics Part II: Graphical Description
DESCRIPTIVE STATISTICS
Part II: Graphical Description
What are raw data and processed data?
Bar Chart and Pie Chart
Bar Chart:
An x-y chart in which the x-axis represents values of a quantitative variable or labels of a qualitative variable and the y-axis represents the corresponding values depicted by bars
Pie Chart:
A circle divided into wedge-shaped pieces that represent areas proportional to the frequencies or relative frequencies
Example: The data in the table below show the average points per game for eight NFL teams after 10 games of the 2010 season. Construct a bar chart to compare the performance of the eight teams. Which teams had the highest points per game, and which teams had the lowest?
Team             Average Points/Game
Atlanta          25.6
New England      28.9
Baltimore        23.3
New York Jets    23.8
Pittsburgh       23.5
Philadelphia     28.4
Green Bay        25.2
New Orleans      23.5
Average points per game of eight NFL teams after 10 games in 2010 season
[Bar chart: x-axis NFL Team (the eight teams listed in the table); y-axis Average Points per Game, 0 to 35]
A Bar Chart of Average Points per Game
Using Excel® to Construct a Bar Chart
Example: The data shown below give the percent of deaths by cause among U.S. residents aged 18 to 24. Construct a pie chart illustrating the causes of death. Which cause contributes the most to the deaths of residents in this age range?
Death Cause      Percent
Car Accidents    89
Firearms         2
Poison           4
Other            5
Percent of death incidents and their causes
[Pie chart: Car Accidents 89%, Firearms 2%, Poison 4%, Other 5%]
A Pie Chart of the Main Causes of Death of U.S. Residents (Age 18-24)
Using Excel® to Construct a Pie Chart
Working Problem 3.3.b:
The bar chart shown below illustrates the financial aid of some U.S. universities (2006).
(c) Which university has the highest financial aid?
(d) Which university has the lowest financial aid?
[Bar chart: y-axis Financial Aid, $0 to $16,000; x-axis lists 14 universities: Georgia Institute of Technology, University of Tennessee, University of Mississippi, University of Kentucky, Louisiana State University, University of Florida, University of Virginia, University of South Carolina, University of North Carolina, University of Georgia, University of Alabama, University of California (UCLA), North Dakota State University, Florida State University]
[Bar labels as extracted; their pairing with the universities is not recoverable from the transcript: $8,222; $6,954; $7,532; $7,861; $8,006; $10,566; $13,449; $9,501; $9,687; $7,320; $7,980; $13,462; $5,487; $8,269]
Histogram or a Frequency Distribution
This is a distribution that results from grouping of data into mutually exclusive classes of values and displaying frequencies or the number of observations in each class
EXAMPLE: Students' Grades
[Histogram: x-axis Grade Values (the Variable), class mid-points 55 to 95; y-axis Number of Students (or Frequency), 0 to 14]
Student Grade (the Variable)   Mid-Point   Number of Students (Frequency)
50 to 60                       55          4
60 to 70                       65          6
70 to 80                       75          12
80 to 90                       85          9
90 to 100                      95          4
Describing a Histogram or a Frequency Distribution (Basic points)
[Histogram of students' grades: x-axis Grade Values (the Variable); y-axis Number of Students (or Frequency)]
How many students took the test?
35 students (add up the bar heights: 4 + 6 + 12 + 9 + 4)
Approximately, what are the minimum and maximum grades, or what is the range?
Min ~55, Max ~95, Range ~40
What grade did the largest number of students earn (the Mode)?
Mode ~75 (the 70% to 80% class)
Describing a Histogram or a Frequency Distribution (More information)
[Histogram of students' grades: x-axis Grade Values (the Variable); y-axis Number of Students (or Frequency)]
Approximately, how many students failed the test (< 60%)?
4 students
Approximately, how many students passed the test with a C (70%) or more?
12 + 9 + 4 = 25 students
Approximately, what percent of the students earned an "A" (90% or above)?
~100*(4/35) = 11.43%
Histogram or a Frequency Distribution
EXAMPLE: NY-College System Teachers Salaries ($)
NY-College System Teachers Salaries ($)   Mid-Point   Percent Frequency (%)
30,000 to 40,000                          35,000      6
40,000 to 50,000                          45,000      17
50,000 to 60,000                          55,000      26
60,000 to 70,000                          65,000      18
70,000 to 80,000                          75,000      12
80,000 to 90,000                          85,000      8
90,000 to 100,000                         95,000      5
100,000 to 110,000                        105,000     4
110,000 to 120,000                        115,000     4
[Histogram: x-axis salary class mid-points, 35,000 to 115,000; y-axis percent frequency, 0 to 30]
(a) Approximately, what percent of the teachers earn less than $40,000?
(b) Approximately, what percent of the teachers earn more than $100,000?
(c) How much do the largest number of teachers (the Mode) earn?
How do we construct a histogram? The Five Key Steps: Example
Values of driving-to-work distances (miles)
55  50  47  50  55  81  80  98
62  38  67  70  60  69  78  39
70  65  99  55  64  89  85  65
75  56  75  50  100 68  95  85
50  30  60  66  85  79  85  70
Step 1: Find the minimum and the maximum values of the data set: min = 30 and max = 100. This gives you an idea about the span or the range of the entire data set, which is 70 miles.
Step 2: Decide on the class width that you wish to use. This will depend on the number of classes or categories that you wish to use. Commonly, the minimum number of classes used is 5. Let us select a class width of 10 miles.
Step 3: Form a frequency table starting with the first column (listing classes).
Step 4: Form the second column of the frequency table, which is simply the mid-point of each class.
Step 5: Count the number of observations in each class and place the count in the third column labeled 'frequency'.
Classes        Mid-Points (x)   Frequency (f)
25 to < 35     30               1
35 to < 45     40               2
45 to < 55     50               5
55 to < 65     60               8
65 to < 75     70               9
75 to < 85     80               6
85 to < 95     90               5
95 to < 105    100              4
                                Total = 40
The mid-points form the horizontal axis of the histogram (x) and the frequencies form the vertical axis (f).
[Histogram: x-axis Distance (miles), class mid-points 30 to 100; y-axis Frequency, 0 to 10]
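The five steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Excel® procedure used in these slides; the function name `frequency_table` and its parameters are hypothetical, and the classes are assumed to be of the form "lower to < upper" as in the table.

```python
# Driving-to-work distances (miles) from the example.
distances = [
    55, 50, 47, 50, 55, 81, 80, 98,
    62, 38, 67, 70, 60, 69, 78, 39,
    70, 65, 99, 55, 64, 89, 85, 65,
    75, 56, 75, 50, 100, 68, 95, 85,
    50, 30, 60, 66, 85, 79, 85, 70,
]

def frequency_table(data, start, width, n_classes):
    """Return (lower, upper, mid_point, frequency) for classes lower <= x < upper.

    Hypothetical helper: start, width and n_classes are chosen by the analyst
    (Steps 1 and 2); the function performs Steps 3 to 5.
    """
    rows = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        count = sum(1 for x in data if lower <= x < upper)
        rows.append((lower, upper, (lower + upper) / 2, count))
    return rows

# Step 1: span of the data.
print("min =", min(distances), "max =", max(distances))

# Steps 2 to 5: class width of 10 miles, first class starting at 25.
table = frequency_table(distances, start=25, width=10, n_classes=8)
frequencies = [row[3] for row in table]
for lower, upper, mid, f in table:
    print(f"{lower} to < {upper}   mid-point {mid:g}   frequency {f}")
print("Total =", sum(frequencies))
```

The counts reproduce the frequency column of the table (1, 2, 5, 8, 9, 6, 5, 4).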
What is a relative frequency distribution?
The absolute number of observations corresponding to a certain class or interval can be converted into a relative frequency value, or a fraction of the total frequency, using the following expression:
$$ rf_i = \frac{f_i}{\sum_{j=1}^{k} f_j} $$
where $f_i$ is the frequency of the ith class, k is the number of classes, and $\sum_{j=1}^{k} f_j$ is the sum of all frequencies.
We can also obtain the percent relative frequency by multiplying the relative frequency by 100:
$$ rf_i\,\% = 100 \times rf_i $$
Classes        Mid-Points (x)   Frequency (f)   Relative Frequency (rf)   Percent Relative Frequency (rf %)
25 to < 35     30               1               (1/40) or 0.025           2.5
35 to < 45     40               2               (2/40) or 0.050           5
45 to < 55     50               5               0.125                     12.5
55 to < 65     60               8               0.200                     20
65 to < 75     70               9               0.225                     22.5
75 to < 85     80               6               0.150                     15
85 to < 95     90               5               0.125                     12.5
95 to < 105    100              4               0.100                     10
               Total =          40              1.000                     100
[Histogram: x-axis Distance; y-axis Relative Frequency, 0 to 0.25]
[Histogram: x-axis Distance; y-axis Percent Relative Frequency, 0 to 25]
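As a minimal sketch of the conversion described above, the relative and percent relative frequencies of the driving-distance classes can be computed as follows. The variable names are illustrative only.

```python
# Class frequencies for the driving-to-work distances (25 to < 35, ..., 95 to < 105).
frequencies = [1, 2, 5, 8, 9, 6, 5, 4]

total = sum(frequencies)                      # sum of all frequencies
relative = [f / total for f in frequencies]   # rf_i = f_i / total
percent = [100 * rf for rf in relative]       # rf_i % = 100 * rf_i

for f, rf, pct in zip(frequencies, relative, percent):
    print(f"f = {f:2d}   rf = {rf:.3f}   rf % = {pct:.1f}")
print("sum of rf =", round(sum(relative), 3))
```

The relative frequencies always add up to 1 (and the percent relative frequencies to 100), which is a quick check on the arithmetic.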
Cumulative Frequency Curve (or Ogive):
This is a curve or a line graph used to display the cumulative frequency of each class at its upper class boundary. The horizontal axis should represent the upper boundaries of different classes and the vertical axis should represent the cumulative frequencies.
What is a cumulative frequency distribution?
Classes        Mid-Points (x)   Frequency (f)   Relative Frequency (rf)   Percent Relative Frequency (rf %)   Cumulative Frequency (CF)   Percent Cumulative Frequency (CF %)
25 to < 35     30               1               0.025                     2.5                                 1                           2.5
35 to < 45     40               2               0.050                     5                                   3                           7.5
45 to < 55     50               5               0.125                     12.5                                8                           20
55 to < 65     60               8               0.200                     20                                  16                          40
65 to < 75     70               9               0.225                     22.5                                25                          62.5
75 to < 85     80               6               0.150                     15                                  31                          77.5
85 to < 95     90               5               0.125                     12.5                                36                          90
95 to < 105    100              4               0.100                     10                                  40                          100
               Total =          40              1                         100
[Cumulative frequency curve: x-axis Distance, 15 to 95; y-axis Cumulative Percent, 0.0 to 100.0]
Cumulative Frequency Distribution of Driving-to-Work Distance
What is the key information provided by a cumulative frequency distribution?
The percent of data observations exhibiting values less than a certain value x_o, that is, P(x < x_o).
Q: What is the percent of employees driving less than 75 miles to work?
P(d < 75) = 62.5%
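The cumulative columns and the answer to the question above can be reproduced with a short sketch. The helper name `percent_below` is hypothetical, and it only answers questions at the upper class boundaries, which is where the ogive is defined.

```python
from itertools import accumulate

# Upper class boundaries and class frequencies for the driving-distance data.
upper_boundaries = [35, 45, 55, 65, 75, 85, 95, 105]
frequencies = [1, 2, 5, 8, 9, 6, 5, 4]

cumulative = list(accumulate(frequencies))                          # CF
cumulative_pct = [100 * cf / cumulative[-1] for cf in cumulative]   # CF %

def percent_below(boundary):
    """Percent of observations below an upper class boundary, P(x < boundary)."""
    return cumulative_pct[upper_boundaries.index(boundary)]

print("CF   =", cumulative)
print("CF % =", cumulative_pct)
print("P(d < 75) =", percent_below(75), "%")
```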
Example: The cumulative frequency distribution of touchdowns of the 32 NFL teams after the tenth week of the 2010 season is shown below.
• How many teams scored 20 touchdowns or less?
• How many teams scored 30 touchdowns or more?
[Cumulative frequency curve: x-axis Touchdowns, 6 to 44; y-axis Cumulative Frequency (%), 0.0 to 100.0]
Cumulative frequency distribution of touchdowns of the NFL teams after the tenth week of 2010 season
P(x < 20) ≈ 16%, or about 0.16 × 32 ≈ 5 teams
P(x < 30) ≈ 72%, so P(x ≥ 30) ≈ 100 – 72 = 28%, or about 0.28 × 32 ≈ 9 teams
Using Excel® for constructing frequency distributions: Case study: Ceramic tile area
258.5 255.0 255.5 256.4 256.6 258.4 257.2 257.4 256.3 259.5
255.7 254.9 256.1 254.5 253.3 257.9 259.1 256.8 257.7 255.1
254.1 255.5 256.5 256.1 255.0 255.9 255.1 254.6 255.1 255.1
255.4 254.3 258.5 256.3 255.6 256.5 257.5 253.8 256.2 256.1
256.2 255.7 257.1 256.7 256.1 257.4 255.0 256.2 254.6 257.0
255.5 256.9 255.8 254.7 256.2 256.9 256.4 255.6 254.8 255.6
257.3 256.8 256.0 254.9 256.0 256.2 257.7 252.7 255.6 255.5
253.9 256.3 255.4 256.1 256.0 254.0 257.8 252.7 256.4 256.6
255.5 255.6 255.1 256.6 254.5 255.4 254.1 256.0 256.9 256.9
254.6 254.8 256.3 255.5 256.4 253.8 254.8 254.6 255.4 255.2
Values of ceramic tile areas
Example: The data in the table represent a sample of 100 tiles selected randomly from a tile operation producing tiles of consistent thickness and nominal dimensions of 16 x 16 inches, or 256 square inches in area.
All data should be aligned in one column.
[Screenshot: Arrangement of Ceramic Tile Data in one Column, under the heading "Ceramic Tile Area (Sq. inch)"]
Using Excel® to perform Descriptive Statistics of Ceramic Tile Data
Excel® Output of Descriptive Statistics of Ceramic Tile Data
• The mean area = 255.87 square inch
• The median = 255.95 square inch
• The mode = 255.5 square inch
Using Excel® Histogram Tool to construct a histogram of Ceramic Tile Data
Step 1: Form a Bin Range.
Step 2: Go to Data Analysis.
Step 3: Select Histogram in Data Analysis and press OK.
Steps 4 & 5: Select Cumulative Percentage and Chart Output, and press OK.
Excel® Histogram Output of Ceramic Tile Data
Closing the gap between bars in Excel® Histogram Output
Bin      Frequency   Cumulative %
252.0    0           0.00%
252.5    0           0.00%
253.0    2           2.00%
253.5    1           3.00%
254.0    4           7.00%
254.5    5           12.00%
255.0    13          25.00%
255.5    16          41.00%
256.0    13          54.00%
256.5    20          74.00%
257.0    11          85.00%
257.5    6           91.00%
258.0    4           95.00%
258.5    3           98.00%
259.0    0           98.00%
259.5    2           100.00%
260.0    0           100.00%
Example: Suppose in the tile area example discussed above, a consumer would like to purchase tiles of the following specifications: target = 256 square inch, tolerance = 256 ± 1. In other words, the consumer has a plan to use tiles of area of 256 square inch, but he is willing to tolerate tiles ranging in area from 255 to 257 square inch. What percent of tiles will meet these specifications?
Target = 256 square inch, tolerance = 256 ± 1
Each bin value in the Excel® output is the upper limit of its class: bin 252.0 counts areas > 251.5 to 252, bin 252.5 counts areas > 252 to 252.5, and so on up to bin 260.0 (> 259.5 to 260).
[Histogram: x-axis Ceramic Tile Area (Sq. inch), one bar per class; y-axis Frequency, 0 to 22; the Specified Area 256 ± 1.0 is marked, with a Reject region on each side]
Figure 3.21 Frequency Distribution: Tiles within the Specification Limits
[Cumulative frequency curve: x-axis Area, 252.0 to 260.0; y-axis Cumulative Frequency (%), 0% to 110%]
P(x ≤ 255) = 25%
P(x ≤ 257) = 85%
Percent of Tiles of Area between 255 and 257 square inch: P(255 < x ≤ 257) = 85% – 25% = 60%
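The percent of tiles inside the specification limits can be checked directly from the bin table. The sketch below assumes, as in the Excel® output, that each bin value is the upper limit of its class, so tiles of exactly 255.0 square inch fall in the bin below the lower limit; the variable names are illustrative.

```python
from itertools import accumulate

# Excel-style bins (upper class limits) and frequencies from the histogram output.
bins = [252.0 + 0.5 * i for i in range(17)]          # 252.0, 252.5, ..., 260.0
frequencies = [0, 0, 2, 1, 4, 5, 13, 16, 13, 20, 11, 6, 4, 3, 0, 2, 0]

total = sum(frequencies)
cumulative_pct = {
    b: 100 * cf / total for b, cf in zip(bins, accumulate(frequencies))
}

lower_limit, upper_limit = 255.0, 257.0               # target 256 +/- 1
within_spec = cumulative_pct[upper_limit] - cumulative_pct[lower_limit]

print("P(x <= 255) =", cumulative_pct[lower_limit], "%")
print("P(x <= 257) =", cumulative_pct[upper_limit], "%")
print("Percent within 256 +/- 1 =", within_spec, "%")
```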
Working Problem 3.9:
The data shown here represent the prices ($) of a sample of 30 used cars selected randomly from a large used car lot. Using descriptive statistics, answer the following questions:
(a) Determine the mean, the mode, and the median
(b) Determine the range, the standard deviation, and the variance
8000   11000  9000   4000   7000   8000
12000  9000   11000  7000   21000  12000
8000   6000   7500   21000  19000  8000
12000  7500   10050  7400   14000  6200
14000  18200  7000   14230  12000  9000
Working Problem 3.10:
Using the same sample of 30 used car prices, construct a histogram and a Cumulative Frequency Curve.
Working Problem 3.11:
Given the Histogram of Car Prices given below, answer the following questions:
(a) What is the price of the majority of cars in the used car lot?
(b) What is the range of prices of used cars in the used car lot?
(c) What percent of cars have a price less than $10,000?
(d) What percent of cars have a price greater than $20,000?
[Histogram: x-axis Car Price ($), 4,000 to 22,000; y-axis Percent, 0 to 30]
Car Price ($)                               cumulative
lower      upper     midpoint  freq.  percent  freq.  percent
4,000   <  6,000     5,000     1      3.3      1      3.3
6,000   <  8,000     7,000     8      26.7     9      30.0
8,000   <  10,000    9,000     7      23.3     16     53.3
10,000  <  12,000    11,000    3      10.0     19     63.3
12,000  <  14,000    13,000    4      13.3     23     76.7
14,000  <  16,000    15,000    3      10.0     26     86.7
16,000  <  18,000    17,000    0      0.0      26     86.7
18,000  <  20,000    19,000    2      6.7      28     93.3
20,000  <  22,000    21,000    2      6.7      30     100.0
                               30     100.0
Comparing frequency distributions: the case of second-hand smokers
Example: Children represent a primary target of many studies of second-hand smoking. In this regard, the important variable to be considered is the level of blood cotinine in the body.
The data in the table represent values of blood cotinine level of two random samples of children aged 4 to 17. The first set (unexposed children) represents children who were not exposed to tobacco smoking on a regular basis. The second set (exposed children) represents children who were exposed to tobacco smoking on a regular basis. Blood cotinine level is measured in nanograms per milliliter (ng/ml).
Blood-cotinine levels for two groups of children
Unexposed (ng/ml)   Exposed (ng/ml)   Unexposed (ng/ml)   Exposed (ng/ml)
0.54                1.88              0.44                1.78
0.62                1.57              0.4                 2.04
0.65                2.13              0.55                1.93
0.49                2.61              0.35                2.44
0.51                2.25              0.8                 2.4
0.48                2.02              0.33                1.91
0.39                1.65              0.6                 1.74
0.63                2                 1.1                 1.63
0.6                 4                 0.51                2.2
0.43                1.77              0.41                1.61
0.39                2.21              0.51                1.97
0.5                 1.53              0.42                3.5
0.45                2.17              1.2                 1.78
0.57                2.21              0.61                1.81
0.38                1.41              0.48                2.23
• Questions:
• Construct histograms and cumulative frequency curves of the two data sets.
• According to the Centers for Disease Control and Prevention, non-smokers exposed to low levels of environmental tobacco smoke (ETS) typically have blood cotinine concentrations less than 1 ng/ml.
  - What percent of the unexposed children have a blood cotinine level of more than 1 ng/ml?
  - What percent of the exposed children have a blood cotinine level of more than 1 ng/ml?
                     Unexposed children   Exposed children
Mean                 0.545                2.079
Median               0.505                1.985
Mode                 0.51                 2.21
Standard Deviation   0.195                0.542
Sample Variance      0.03799              0.29398
Range                0.87                 2.59
Minimum              0.33                 1.41
Maximum              1.2                  4
Sum                  16.34                62.38
Count                30                   30
Comparison of blood-cotinine (ng/ml) for two groups of children
[Histograms: x-axis Blood-cotinine level (ng/ml), 0 to 4.25 in steps of 0.25; one vertical axis for Frequency (exposed), 0 to 10, and one for Frequency (unexposed), 0 to 16]
Histograms of blood-cotinine (ng/ml) for unexposed and exposed children
[Cumulative frequency curves: x-axis Blood-cotinine level (ng/ml), 0 to 4.5; y-axis Cumulative Frequency (%), 0% to 120%; one curve for Unexposed and one for Exposed]
Unexposed: P(x ≤ 1.0) = 93%, P(x > 1.0) = 7%
Exposed: P(x ≤ 1.0) = 0%, P(x > 1.0) = 100%
Cumulative Frequency Curves of blood-cotinine (ng/ml) for unexposed and exposed children
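The two percentages read from the cumulative curves can also be counted directly from the raw data. The sketch below is only a cross-check; the helper name `percent_above` is hypothetical.

```python
# Blood-cotinine levels (ng/ml) from the table of the two groups of children.
unexposed = [
    0.54, 0.62, 0.65, 0.49, 0.51, 0.48, 0.39, 0.63, 0.6, 0.43,
    0.39, 0.5, 0.45, 0.57, 0.38, 0.44, 0.4, 0.55, 0.35, 0.8,
    0.33, 0.6, 1.1, 0.51, 0.41, 0.51, 0.42, 1.2, 0.61, 0.48,
]
exposed = [
    1.88, 1.57, 2.13, 2.61, 2.25, 2.02, 1.65, 2, 4, 1.77,
    2.21, 1.53, 2.17, 2.21, 1.41, 1.78, 2.04, 1.93, 2.44, 2.4,
    1.91, 1.74, 1.63, 2.2, 1.61, 1.97, 3.5, 1.78, 1.81, 2.23,
]

def percent_above(data, threshold):
    """Percent of observations strictly greater than the threshold."""
    return 100 * sum(1 for x in data if x > threshold) / len(data)

pct_unexposed = percent_above(unexposed, 1.0)
pct_exposed = percent_above(exposed, 1.0)
print(f"Unexposed above 1 ng/ml: {pct_unexposed:.1f}%")
print(f"Exposed above 1 ng/ml:   {pct_exposed:.1f}%")
```

Two of the 30 unexposed children (1.1 and 1.2 ng/ml) exceed 1 ng/ml, which is the roughly 7% read from the curve.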
The shape of the frequency distribution
[Six sketches of Frequency (vertical axis) against x, one for each shape below]
(a) Symmetrical shape: Mean = Median = Mode
(b) Skewed to the right (positively skewed): Mode < Median < Mean
(c) Skewed to the left (negatively skewed): Mean < Median < Mode
(d) Steep Shape
(e) Flat Shape
(f) Bimodal or Multimodal: two peaks (Mode 1 and Mode 2), with the Mean marked
Common Shapes of Frequency Distribution
Working Problem 3.12: In the following histograms:
- Determine if the shape of the distribution is symmetric, uniform, negatively skewed, positively skewed, or bimodal.
- Determine the relationship between mean, mode, and median for each shape.
[Five histograms, (a) to (e), each with x on the horizontal axis and Percent on the vertical axis]
(a) ………….   (b) …………………..   (c) ………………….
(d) ………….   (e) …………………..
Answer choices shown next to four of the histograms:
Mean ≈ Mode ≈ Median (……)
Mean > Mode > Median (……)
Mean < Mode < Median (……)
None of the above (……)
Answer choices shown next to the remaining histogram:
Mean ≈ Median (……)
Mean > Median (……)
Mean < Median (……)
None of the above (……)
The weighted mean and standard deviation
Example: Suppose you had a party with 20 people visiting you. You went to McDonald’s restaurant and bought five Big Mac meals for $6 each, eight happy meals for $2.50 each, and seven grilled chicken meals for $5.50 each. How much did you pay and what is the average price per meal? What is the variance of the meal price?
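Price (x_i)   Frequency (f_i)   x_i f_i   (x_i - mean)   (x_i - mean)^2   f_i (x_i - mean)^2
6.0           5                 30.0       1.575          2.48             12.40
2.5           8                 20.0      -1.925          3.71             29.65
5.5           7                 38.5       1.075          1.16             8.09
Sum           n = 20            88.5                                       50.14
Total paid = $88.50. Weighted mean price = 88.5 / 20 = $4.425 per meal.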
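The table columns can be reproduced with a short sketch. The source table stops at the sum of weighted squared deviations (50.14); dividing by n - 1 to obtain a sample variance is an assumption made here, chosen because the other examples in these slides report a "Sample Variance".

```python
from math import sqrt

prices = [6.0, 2.5, 5.5]   # price per meal, x_i
counts = [5, 8, 7]         # number of meals bought, f_i

n = sum(counts)
total_paid = sum(x * f for x, f in zip(prices, counts))
mean = total_paid / n

# Sum of f_i * (x_i - mean)^2, the last column of the table.
sum_sq = sum(f * (x - mean) ** 2 for x, f in zip(prices, counts))

# Assumption: sample variance (divide by n - 1); use n for a population variance.
variance = sum_sq / (n - 1)
std_dev = sqrt(variance)

print(f"n = {n}, total paid = {total_paid:.2f}, mean = {mean:.3f}")
print(f"sum of weighted squared deviations = {sum_sq:.2f}")
print(f"sample variance = {variance:.3f}, standard deviation = {std_dev:.3f}")
```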
Working Problem 3.14: In a stat test, 4 students made 75/100, 24 students made 85/100, and 12 students made 95/100. What are the mean, standard deviation, and variance of the grade?
Working Problem 3.15:
The Seno car body shop pays its hourly employees $10.50, $15.00, or $20.00 per hour. There are 26 hourly employees, 14 of which are paid at the $10.50 rate, 10 at the $15.00 rate, and 2 at the $20.00 rate.
- What is the mean hourly rate paid?
- What is the standard deviation?
- What is the variance?
Working Problem 3.16:
Use the frequency distribution shown in the table below to determine the mean, standard deviation and variance of the property tax value for the sample of 50 houses.
Tax ($)   Frequency (fi)
6000      6
7000      8
8000      16
9000      11
10000     5
11000     4
What is Chebyshev's theorem?
Chebyshev's Theorem: In a frequency distribution of data, the proportion of the values that lie within k standard deviations of the mean is at least $1 - 1/k^2$, where k is any constant greater than one.
Example: The figure below shows the frequency distributions of annual salary of professors of two different colleges. As you can see in this figure, both colleges share the same average annual salary, but college B has twice the standard deviation, which is reflected in a much wider distribution than that of college A.
• At least what percent of the salaries lie within plus and minus 2.0 standard deviations of the mean?
• At least what percent of the salaries lie within plus and minus 3.0 standard deviations of the mean?
[Figure: Relative Frequency (%) against Salary for two colleges with the same Mean of $80,000; College A, s = $10,000; College B, s = $20,000]
Annual Salaries of Full Professors in two Different Colleges
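A minimal sketch of the bound in Chebyshev's theorem, which answers the two questions above for any distribution shape. The function name `chebyshev_min_percent` is hypothetical.

```python
def chebyshev_min_percent(k):
    """Minimum percent of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 100 * (1 - 1 / k ** 2)

for k in (2.0, 3.0):
    print(f"k = {k}: at least {chebyshev_min_percent(k):.2f}% of the values")
```

For k = 2 the bound is 75%, and for k = 3 it is about 88.89%. Because the bound does not depend on the standard deviation itself, it is the same for College A and College B; only the salary interval it refers to changes.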
Working Problem 3.17: In remodeling your home, you used many contractors that were paid hourly an average wage of $50. The standard deviation of wages was $15.
- At least what percent of the wages lie within plus 2 standard deviations and minus 2 standard deviations of the average wage?
- At least what percent of the wages lie within plus 3 standard deviations and minus 3 standard deviations of the average wage?
What is the empirical rule?
Empirical Rule: For a symmetrical, bell-shaped frequency distribution, about 68.26 percent of the observations will lie within plus and minus one standard deviation of the mean; about 95.44 percent of the observations will lie within plus and minus two standard deviations of the mean; and about 99.74 percent will lie within plus and minus three standard deviations of the mean.
[Figure: bell-shaped curve of Relative Frequency (%) against x, with Mean = Median = Mode at the center; bands m ± 1s (68.26%), m ± 2s (95.44%), and m ± 3s (99.74%)]
What is the empirical rule?
Example: Assuming that the College 'A' frequency distribution of salary shown in the previous figure represents a normal distribution, where the mean of professors' salaries is $80,000 and the standard deviation is $10,000, answer the following questions:
• What is the percent of salaries within the range of the mean plus and minus 2 standard deviations?
• What is the percent of salaries within the range of the mean plus and minus 3 standard deviations?
[Figure: Annual Salaries of Full Professors in two Different Colleges; Relative Frequency (%) against Salary, Mean $80,000; College A, s = $10,000; College B, s = $20,000]
The percent of salaries within the range of the mean plus and minus 2 standard deviations, or from $60,000 to $100,000, is 95.44%.
The percent of salaries within the range of the mean plus and minus 3 standard deviations, or from $50,000 to $110,000, is 99.74%.
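As a cross-check on the empirical rule, the sketch below computes the same percentages from a normal distribution using Python's standard `statistics.NormalDist`. The percentages quoted in the slides are rounded table values; direct computation gives approximately 68.27%, 95.45%, and 99.73%, which differ only in the second decimal place. The helper name `percent_within` is hypothetical.

```python
from statistics import NormalDist

# College A salaries, assumed normal as in the example.
mean, std_dev = 80_000, 10_000
salaries = NormalDist(mu=mean, sigma=std_dev)

def percent_within(k):
    """Percent of a normal distribution within k standard deviations of the mean."""
    low, high = mean - k * std_dev, mean + k * std_dev
    return 100 * (salaries.cdf(high) - salaries.cdf(low))

for k in (1, 2, 3):
    low, high = mean - k * std_dev, mean + k * std_dev
    print(f"${low:,} to ${high:,}: {percent_within(k):.2f}%")
```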
Working Problem 3.18: A course average grade is typically 80%, and the standard deviation of grade is 5%. Assuming that the grade value has bell-shaped symmetrical distribution:
- What percent of students will make a 'C' grade or lower (C = 70%)?
- What percent of students will make a 'B' grade or better (B = 80%)?
- What percent of students will make an 'A' grade or better (A = 90%)?
- What percent of students will make a 'C' grade or better (C = 70%)?
Working Problem 3.19:
A course average grade is typically 75%, and the standard deviation of grade is 2.5%. Assuming that the grade value has bell-shaped symmetrical distribution:
- What percent of students will make a 'C' grade or lower (C = 70%)?
- What percent of students will make a 'B' grade or better (B = 80%)?
- What percent of students will make an 'A' grade or better (A = 90%)?
- What percent of students will make a 'C' grade or better (C = 70%)?
Working Problem 3.20:
Heights of men have a bell-shaped distribution with a mean of 176 cm and a standard deviation of 7 cm. Using the empirical rule, what is the approximate percentage of men:
a. Between 169 cm and 183 cm?
b. Between 155 cm and 197 cm?
c. Taller than 190 cm?
Other Forms of Graphical Data Description
Dot Plot
[Dot plot: x-axis x, 10 to 35]
Stem and Leaf Plot
Frequency   Stem   Leaf
10          1      2 4 5 6 6 8 8 8 9 9
14          2      0 2 2 2 2 2 2 2 2 2 3 4 4 5
2           3      0 2
26
Box Plot
[Box plot: x-axis x, 10 to 40]
Dot Plot
Example: Suppose you are searching for an apartment for rent near your college and a random sample of apartments had the following monthly rents ($): 500, 500, 400, 400, 350, 350, 400, 400, 500, 500, 500, 500, 600, 600, 600, 700, 700
Construct a dot plot to illustrate the frequency distribution of apartment rent.
[Dot plot: x-axis Monthly Rent ($), 300 to 750]
Dot Plot of Apartments Rents
Stem & Leaf Plot
G1 G2 G3 G4 G5 G6 G7 G8 G9 G10
119 128 133 135 139 140 143 203 149 211
120 129 133 135 139 140 143 147 150 153
122 130 133 136 139 142 144 147 150 198
123 130 134 137 139 142 144 147 150 153
124 130 134 137 139 142 144 147 152 190
126 131 135 137 139 142 144 147 152 155
126 131 135 137 139 142 198 148 152 215
128 131 135 137 140 142 145 148 152 156
128 131 135 139 140 143 211 149 152 159
128 132 135 139 140 143 147 149 152 160
Example: Area values of composite plates (mm2)
An engineer takes 10 sample groups of composite plates, each of 10 plates, and measures the area of each plate (mm2). The reported values are shown in the table above. Construct a Stem & Leaf Plot.
Stem & Leaf Plot
Frequency   Stem   Leaf
1           11     9
11          12     0 2 3 4 6 6 8 8 8 8 9
35          13     0 0 0 1 1 1 1 2 3 3 3 4 4 5 5 5 5 5 5 5 6 7 7 7 7 7 9 9 9 9 9 9 9 9 9
31          14     0 0 0 0 0 2 2 2 2 2 2 3 3 3 3 4 4 4 4 5 7 7 7 7 7 7 8 8 9 9 9
14          15     0 0 0 2 2 2 2 2 2 3 3 5 6 9
1           16     0
0           17
0           18
3           19     0 8 8
1           20     3
3           21     1 1 5
100
Statistics           Value
Mean                 144.30
Median               140.24
Mode                 198
Standard Deviation   18.68
Sample Variance      348.78
Range                96.35
Minimum              118.65
Maximum              215
Percentiles and Box Plot
Example: Data of property tax of a sample of houses in Monmouth County, New Jersey
6121  7748  7094
8419  7828  8390
8452  8289  7178
7141  6644  7886
8461  7963  7914
Sort the data:
n    Annual Property Tax ($)
1    6121
2    6644
3    7094
4    7141
5    7178
6    7748
7    7828
8    7886
9    7914
10   7963
11   8289
12   8390
13   8419
14   8452
15   8461
A set of data can be divided into percentiles, with the location of the desired percentile obtained from the following equation:
$$ l_p = (n + 1)\frac{p}{100} $$
where $l_p$ represents the location of a certain percentile, p is the required percentile, and n is the number of observations in the data set.
Percentiles and Box Plot
For the n = 15 sorted property tax values, $l_p = (n + 1)\frac{p}{100}$ gives:
$$ l_{25} = (15 + 1)\frac{25}{100} = 4 \quad \text{(lower quartile)} $$
$$ l_{50} = (15 + 1)\frac{50}{100} = 8 \quad \text{(median)} $$
$$ l_{75} = (15 + 1)\frac{75}{100} = 12 \quad \text{(upper quartile)} $$
[Box plot: x-axis Annual Property Tax ($), 6000 to 8600; marked from left to right: Minimum Value, Q1 (lower quartile, lower 25%), Median, Q3 (upper quartile, upper 75%), Maximum Value]
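The percentile-location formula can be applied as in the sketch below. The helper names are hypothetical, and the linear interpolation used for fractional locations is an assumption (the example only needs whole-number locations). Spreadsheet quartile functions may use a different interpolation rule and can therefore report slightly different quartiles for the same data.

```python
taxes = sorted([
    6121, 7748, 7094, 8419, 7828, 8390, 8452, 8289,
    7178, 7141, 6644, 7886, 8461, 7963, 7914,
])

def percentile_location(p, n):
    """Location l_p = (n + 1) * p / 100, counted from 1 in the sorted data."""
    return (n + 1) * p / 100

def value_at_location(sorted_data, location):
    """Value at a 1-based location; interpolates linearly if it is fractional.

    Assumes 1 <= location <= len(sorted_data).
    """
    whole = int(location)
    fraction = location - whole
    value = sorted_data[whole - 1]
    if fraction:
        value += fraction * (sorted_data[whole] - sorted_data[whole - 1])
    return value

n = len(taxes)
for p, label in ((25, "lower quartile"), (50, "median"), (75, "upper quartile")):
    loc = percentile_location(p, n)
    print(f"l_{p} = {loc:g} -> {value_at_location(taxes, loc)} ({label})")
```

With this rule the 4th, 8th, and 12th sorted values are 7141, 7886, and 8390.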
Working Problem 3.21: Listed below are the property taxes ($) paid by a sample of 15 houses in Montgomery, Alabama, in 2009:
- Locate the median, the first quartile, and the third quartile for the property taxes.
- Construct a stem and leaf plot
- Construct a histogram
- Construct a box plot
2000 1500 1800 1050 2000 2050 1800 1670 1800 1400 2800 950 1800 1200 1300
Working Problem 3.22:
For each of the following three data sets:
(a) Determine mean, median, mode, range, standard deviation, skewness, and kurtosis for each data set
(b) Construct histogram, dot plot, and box plot
(c) Using these graphs, explain how each graph reveals a different type of information
(d) Compare the three data sets using histogram, dot plot, and box plot
Data Set A:
14 17 19 18 21
14 17 19 18 21
15 17 19 18 22
15 17 19 18 22
16 18 20 18 18
16 18 20 18 18
16 18 20 18 18
Data Set B:
20 21 23 24 25
19 21 23 24 25
18 22 23 24 25
19 22 24 25 25
18 22 24 25 25
20 23 24 25 25
19 23 24 25 25
20 23 24 25 25
Data Set C:
22 16 24 19 22
21 14 23 22 19
22 16 21 20 19
19 17 24 14 16
24 14 14 14 24
14 24 24 20 22
20 21 15 21 20
19 16 19 16 18
Example: Determine the skewness of the data in Table below and explain how the result is comparable to that of the Box Plot.
6121  7748  7094
8419  7828  8390
8452  8289  7178
7141  6644  7886
8461  7963  7914
                     Annual Property Tax ($)
Mean                 7701.87
Median               7886
Standard Deviation   717.80
Sample Variance      515241.6
Kurtosis             -0.01347
Skewness             -0.86623
Range                2340
Minimum              6121
Maximum              8461
Descriptive statistics of tax data
What does a negative skew mean?
[Box plot: x-axis Annual Property Tax ($), 6000 to 9000; Q1 = 7159, Median = 7886, Q3 = 8339.5]
[Histogram: x-axis Taxes ($), 6000 to 8500; y-axis Frequency, 0 to 6]
Box Plot and Histogram of Property Tax Data
Scatter Plot
How many hours do you study every week? How does this impact your grade?
[Scatter plot: x-axis The Number of Hours you Study Every Week, 1 to 8; y-axis Grade (%), 40 to 100]
Scatter Plot: Example
Number of hours of study per week   Grade (out of 100%)
4                                   81
7                                   72
6                                   85
5                                   75
7                                   90
8                                   90
1                                   60
1                                   45
5                                   89
6                                   88
7                                   88
7                                   75
8                                   81
8                                   95
7                                   87
6                                   97
3                                   75
6                                   65
5                                   85
3                                   80
You can develop a scatter plot manually or using Excel.
Developing a Scatter Plot Using Excel Program
(1) Block (select) the x and y columns of the data of interest using the mouse.
(2) Click on the Insert button.
(3) Click on the Charts button.
(4) Select XY (Scatter) and click OK. You may also select a particular form of scatter plot from the right window.
(5) The output scatter plot is shown. You can double-click the graph to make the customization menu appear, which allows you to customize your graph.
(6) Adding a trendline: click on the points of the plot and click the right mouse button. You will see a menu in which you select adding a trendline.
(7) Trendline menu: when the trendline menu opens, you will find that the default is linear (a straight line). You can also check the equation and R-Square boxes to obtain the equation representing the x-y relationship and the strength of the relationship between x and y, respectively.
[Scatter plot with linear trendline: x-axis Hours of Study, 0 to 9; y-axis Test Grade Percent, 0 to 110]
f(x) = 3.9598 x + 58.371, R² = 0.4528
Scatter Plot Showing the Relationship between the Number of Study Hours and Student's Grade
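The trendline equation and R² reported by Excel can be reproduced with an ordinary least-squares fit. The sketch below is not the Excel procedure itself; the function name `linear_trendline` is hypothetical and the fit is written out from the usual sums of squares.

```python
# Hours of study per week and grades (%) from the example table.
hours = [4, 7, 6, 5, 7, 8, 1, 1, 5, 6, 7, 7, 8, 8, 7, 6, 3, 6, 5, 3]
grades = [81, 72, 85, 75, 90, 90, 60, 45, 89, 88,
          88, 75, 81, 95, 87, 97, 75, 65, 85, 80]

def linear_trendline(x, y):
    """Least-squares line y = slope * x + intercept, with R squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

slope, intercept, r_squared = linear_trendline(hours, grades)
print(f"f(x) = {slope:.4f} x + {intercept:.4f}")
print(f"R squared = {r_squared:.4f}, r = {r_squared ** 0.5:.4f}")
```

For a straight-line fit with a positive slope, the coefficient of correlation r is the positive square root of R².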
Working Problem 3.23: The data in table below shows a random sample of houses with their square feet area and the price. Use Excel® to develop a scatter plot relating house price to the area in square feet. Use trendline option to obtain the equation and the coefficient of correlation r.
House Square Feet   House Price ($)
3200                410,000
3600                440,000
2600                400,000
2000                260,000
3000                425,000
2800                400,000
2000                280,000
2500                380,000
2500                400,000
Working Problem 3.24: The data in the table below show the points per game and touchdowns (TDs) of a random sample of college teams. Use Excel® to develop a scatter plot relating touchdowns to points/game. Use the trendline option to obtain the equation and the coefficient of correlation r.
TDs   Points/Game
27    23.5
26    23.3
25    21.5
21    19.2
31    26.8
28    25.7
28    24.4
25    22
42    28.9
28    23.8
27    21.3
15    17.2
34    27.4
29    24.3
24    23.8
26    21.7
What is an outlier?
In general, an outlier is a data observation that does not seem to belong to the family of data under consideration.
Examples:
• Students' grades (%): 55, 65, 99, 80, 78, 80, 80, 77, 69, 68, 89, 92, 110, 78, 86
• Adult weight (lb): 140, 82, 122, 98, 110, 165, 138, 200, 175, 204, 290, 188, 145, 167, 389, 220, 175
• A value of $1 million in a data set of a company's annual wages with a plausible range from $40,000 to $220,000
• A value of 180 in a data set of people's ages
• A value of 7 years old in a data set of college students' ages
• A value of $200 million in a data set of the net worth of the top 100 billionaires
Example: Detect outliers in the following data set of monthly accidents on the job during a one-year evaluation in a firm of 100 people.
Month       Accidents/Month
January     3
February    4
March       1
April       2
May         1
June        2
July        3
August      15
September   2
October     1
November    3
December    2
Accidents on the Job
• Judgment comes first.
• Another rule used for detecting outliers is based on defining an outlier as "a value that is more than 1.5 times the inter-quartile range below the lower quartile or above the upper quartile".
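The 1.5 × inter-quartile-range rule can be sketched as follows. The quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which is an assumption about the quartile convention; for this data set it gives a lower quartile of 1.75 and an upper quartile of 3. The variable names are illustrative.

```python
from statistics import quantiles

accidents = {
    "January": 3, "February": 4, "March": 1, "April": 2,
    "May": 1, "June": 2, "July": 3, "August": 15,
    "September": 2, "October": 1, "November": 3, "December": 2,
}

# Quartiles by the inclusive method (an assumed convention).
q1, median, q3 = quantiles(accidents.values(), n=4, method="inclusive")
iqr = q3 - q1

# Fences: 1.5 * IQR below the lower quartile and above the upper quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = {
    month: count for month, count in accidents.items()
    if count < lower_fence or count > upper_fence
}

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}")
print(f"fences: {lower_fence} to {upper_fence}")
print("outliers:", outliers)
```

Only the August value of 15 lies outside the fences; the February value of 4 is below the upper fence of 4.875.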
Descriptive statistics       Accidents/Month
count                        12
mean                         3.25
sample variance              14.57
sample standard deviation    3.82
minimum                      1
maximum                      15
range                        14
1st quartile                 1.75
median                       2
3rd quartile                 3
interquartile range          1.25
mode                         2
low extremes                 0
low outliers                 0
high outliers                0
high extremes                1
[Box plot: x-axis Accidents/Month, 0 to 16]
[Dot plot: x-axis Accidents/Month, 0 to 16]
Working Problem 3.25: The data shown in the table below indicate the property tax ($) of a random sample of houses.
- Determine the mean, the median, and the mode of property tax
- Determine the range, standard deviation, and variance
- Determine the outliers in the data set
Property Tax ($)
6000    8000    9000
6800    4800    7000
5800    11000   10000
7000    10050   12000
6800    8000    9000
4000    6700    7000
28000   6800    10000
10000   4000    11000
12000   5999    10050
9000    10000   4800
7000    12000   11000
10000   9500    6800
5600    7500    5800
4800    21000   7000
11000   5600    6800
10050   4200    4000