Hildebrand, Ott & Gray, Basic Statistical Ideas for Managers, 2nd edition, Chapter 2. Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Chapter 2:
Summarizing Data
Hildebrand, Ott and Gray
Basic Statistical Ideas for Managers
Second Edition
Learning Objectives for Ch. 2
• Understanding the different ways of classifying data
• Understanding the difference between descriptive statistics and inferential statistics
• Graphing data to reveal the basic pattern of data
• Understanding how the sample mean, trimmed sample mean and sample median measure the central tendency of data
• Understanding how the interquartile range and standard deviation measure the variability of data
• Understanding when and how the Empirical
Rule should be used
• Understanding the idea of outliers
• Understanding the need for calculators and
statistical software
• Understanding the basic idea about statistical
quality control
Introduction
• Variable is the measurable characteristic of an entity.
Exercise 2.60:
An office supply company does a third of its business
supplying local government and school districts. This
business is done by competitive bids. Each potential sale
requires a clerk to prepare a bid form. The firm had no
real idea of how much effort the bid preparation required,
so the bid clerk was asked to record the start and stop
times for a sample of 65 bids. The data were recorded
two ways: minutes spent per bid (MINPRBID in the output
below) and bids per hour BIDPERHR = 60/MINPRBID.
CASE   MINPRBID   BIDPERHR
1      155.000    0.387
...    ...        ...
65     126.000    0.476
• MINPRBID is a variable.
• BIDPERHR is a variable.
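The relation BIDPERHR = 60/MINPRBID can be checked against the two cases shown in the listing. The sketch below is only an illustration; the function name `bids_per_hour` is an assumption and does not come from the text.

```python
def bids_per_hour(minutes_per_bid: float) -> float:
    """Convert minutes spent per bid into bids per hour (60 / minutes)."""
    return 60.0 / minutes_per_bid


# The two cases visible in the listing: (case number, MINPRBID).
listed_cases = [(1, 155.0), (65, 126.0)]

for case, minutes in listed_cases:
    print(f"case {case}: MINPRBID = {minutes:.3f}, "
          f"BIDPERHR = {bids_per_hour(minutes):.3f}")
```

Both variables carry the same information on different scales, so either one could be summarized.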
Example: Monthly percentage returns for IBM from November 2000 through September 2003 is a variable.
• Both examples illustrate observational data as opposed to experimental data.
• Exercise 2.60 illustrates cross-sectional data.
• The RIBM example illustrates time-series data.
• The R^DJI and RIBM are quantitative variables.
• The use of the values 1, 2 and 3 to designate a voter’s political affiliation (Independent, Democrat, Republican) is a qualitative variable.
• Descriptive Statistics vs. Inferential Statistics
• Descriptive Statistics - Data summarization
• Inferential Statistics - Use of sample data to make
inferences about a population
parameter.
• Population: the collection of objects upon which
measurements could be taken.
• Sample: a subset of the population.
• An example illustrating how Descriptive Statistics assists
in the Inferential Statistics process follows.
Example: A poll to determine the percentage favoring a national policy
• Using the sample percentage to make inference about the unknown population percentage.
• Using numerical and graphical tools to summarize the sample of 500 people chosen.
• Treat data as being a sample from a population.
Population
  Parameter of interest: percentage of people favoring the national policy = ?
Sample
  Of 500 people randomly chosen, 300 support the policy.
  Sample percentage: 300/500
[Descriptive statistics]: summarizing the sample
[Inferential statistics]: using the sample percentage to infer the population percentage
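As a small illustration of the descriptive step in the poll example, the sample percentage can be computed directly from the counts. This is a sketch; the function name `sample_percentage` is an assumption.

```python
def sample_percentage(successes: int, n: int) -> float:
    """Return the sample percentage, a descriptive summary of the sample."""
    return 100.0 * successes / n


# Counts taken from the poll example: 300 of 500 support the policy.
pct = sample_percentage(300, 500)
print(f"Sample percentage favoring the policy: {pct:.1f}%")
```

The population percentage remains unknown; the sample percentage is only the starting point for inference about it.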
Section 2.1
The Distribution of Values of a Variable
(Graphical Procedures)
Example: Monthly percentage returns for IBM and ^DJI.
• A partial listing of RIBM and R^DJI
Monthly Percentage Returns for IBM and ^DJI

Month    RIBM(%)  Data   Sorted RIBM(%)  Ordered   R^DJI(%)  Data   Sorted R^DJI(%)  Ordered
Nov-00    -4.95   Y1        -22.65       Y(1)        -5.07   Y1        -12.37        Y(1)
Dec-00    -9.10   Y2        -19.46       Y(2)         3.59   Y2        -11.08        Y(2)
Jan-01    31.77   Y3        -10.84       Y(3)         0.92   Y3         -6.87        Y(3)
Feb-01   -10.70   Y4        -10.81       Y(4)        -3.60   Y4         -6.23        Y(4)
Mar-01    -3.72   Y5        -10.70       Y(5)        -5.87   Y5         -5.87        Y(5)
Apr-01    19.71   Y6        -10.50       Y(6)         8.67   Y6         -5.48        Y(6)
...
Mar-03     0.62   Y29         7.71       Y(29)        1.28   Y29         3.59        Y(29)
Apr-03     8.24   Y30         8.24       Y(30)        6.11   Y30         4.37        Y(30)
May-03     3.89   Y31        10.31       Y(31)        4.37   Y31         5.94        Y(31)
Jun-03    -6.28   Y32        17.83       Y(32)        1.53   Y32         6.11        Y(32)
Jul-03    -1.52   Y33        19.71       Y(33)        2.76   Y33         8.56        Y(33)
Aug-03     1.14   Y34        31.77       Y(34)        1.97   Y34         8.67        Y(34)
Sep-03     7.71   Y35        35.39       Y(35)       -1.49   Y35        10.60        Y(35)
• The RIBM and R^DJI will be used to illustrate the
graphical and numerical procedures that follow.
• Histogram: a graphical display of quantitative data.
• Horizontal axis shows intervals or classes (adjacent,
mutually exclusive).
• Vertical axis shows frequency or relative frequency
per class.
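The class-counting idea behind a histogram can be sketched as follows. This is only an illustration under stated assumptions: it uses the 13 RIBM values visible in the partial listing (not the full sample of 35), and the class width of 8 starting at -24 is assumed, since the slides do not state the bin edges. The function name `histogram_counts` is hypothetical.

```python
import math

# RIBM values for the 13 months visible in the partial listing (not all 35).
ribm_partial = [-4.95, -9.10, 31.77, -10.70, -3.72, 19.71,
                0.62, 8.24, 3.89, -6.28, -1.52, 1.14, 7.71]


def histogram_counts(values, start, width, n_classes):
    """Count values per class [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        i = math.floor((v - start) / width)
        if 0 <= i < n_classes:
            counts[i] += 1
    return counts


# Assumed classes: adjacent, mutually exclusive, width 8, starting at -24.
START, WIDTH, N_CLASSES = -24, 8, 8
counts = histogram_counts(ribm_partial, START, WIDTH, N_CLASSES)
rel_freq = [c / len(ribm_partial) for c in counts]

# Text rendering: one row per class, one star per observation.
for i, c in enumerate(counts):
    lo = START + i * WIDTH
    print(f"[{lo:>4}, {lo + WIDTH:>4})  {'*' * c}  ({c})")
```

Half-open classes make the intervals mutually exclusive, so each observation is counted exactly once.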
• Side-by-side histograms
[Figure: Side-by-Side Histograms of R^DJI and RIBM. Two panels (R^DJI, RIBM); horizontal axis ticks from -24 to 32; vertical axis: Frequency, 0 to 20.]
• Is there a “typical” return for either ^DJI or IBM?
Which intervals have the greatest frequency?
• Which return has more variability?
• Are the histograms “mound-shaped”?
• Procedure to obtain side-by-side histograms using Minitab:
  - Click on Graph > Histograms > Simple
  - Click on “OK”
  - In “Graph Variables” box, enter desired variables: R^DJI and RIBM
  - Click on “Labels”, enter title
  - Click on “OK”
  - Click on “Multiple Graphs”
  - Under “Show Graph Variables”, select “In separate panels of the same graph”
  - Under “Same Scales for Graphs”, select “Same Y” and “Same X including bins”
  - Click on “OK”
• Superimposed histograms facilitate comparison
of R^DJI and RIBM.
[Figure: Superimposed Histograms of RIBM(%) and R^DJI(%). Horizontal axis: Data, ticks from -24 to 32; vertical axis: Frequency, 0 to 20; legend: R^DJI, RIBM.]
• For what interval(s), did R^DJI dominate RIBM?
• Procedure to obtain superimposed histograms using Minitab:
  - Click on Graph > Histograms > With Outline and Groups
  - Click “OK”
  - In “Graph Variables” box, enter desired variables: R^DJI and RIBM
  - Click on “Labels”, enter title
  - Click on “OK”
  - Click on “Multiple Graphs”
  - Under “Show Graph Variables”, select “Overlaid on the same graph”
  - Click on “OK”
• Stem-and-Leaf Diagram
• Truncate data if necessary.
• Separate each truncated value into a stem
component and a leaf component.
• Leaves are displayed individually in ascending order
for each stem.
Example: Monthly percentage returns for IBM.
• In general, the first digit is placed in the stem and the second digit is the leaf.
• Sometimes, the first two digits go in the stem.
Example: Suppose the data range from 170 to 240.
• For a reasonable number of groups, we sometimes split the stem.
Example: Suppose the data range from 20 to 40.

RIBM        Truncated Values
-22.6463    -22
-19.4639    -19
-10.8393    -10
...         ...
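The truncate-then-split step can be sketched in a few lines. This is a minimal illustration for a leaf unit of 1.0, as in the RIBM display; the function name `stem_and_leaf_parts` is an assumption, and the sketch does not reproduce Minitab's splitting of stems into several rows.

```python
import math


def stem_and_leaf_parts(value: float):
    """Truncate to an integer, then split into (truncated, stem, leaf)."""
    truncated = math.trunc(value)            # drops decimals: -22.6463 -> -22
    sign = "-" if value < 0 else ""
    stem = f"{sign}{abs(truncated) // 10}"   # tens digit; sign kept so -0 and 0 differ
    leaf = abs(truncated) % 10               # units digit
    return truncated, stem, leaf


# The three RIBM values shown in the truncation table.
for x in [-22.6463, -19.4639, -10.8393]:
    t, stem, leaf = stem_and_leaf_parts(x)
    print(f"{x:>9}  truncated {t:>4}  stem {stem:>3}  leaf {leaf}")
```

Keeping the sign in the stem label matters because small negative and small positive returns belong to different stems ("-0" versus "0").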
Example: Monthly percentage returns for IBM and ^DJI.
1) RIBM
Stem-and-leaf of RIBM N = 35
Leaf Unit = 1.0
1 -2 2
2 -1 9
6 -1 0000
11 -0 98876
(8) -0 44332210
16 0 001134
10 0 67778
5 1 0
4 1 79
2 2
2 2
2 3 1
1 3 5
2) R^DJI
Stem-and-leaf of R^DJI N = 35
Leaf Unit = 1.0
1 -1 2
2 -1 1
2 -0
4 -0 66
9 -0 55554
13 -0 3332
17 -0 1100
(8) 0 00111111
10 0 2223
6 0 45
4 0 6
3 0 88
1 1 0
• The stem-and-leaf diagram is a histogram turned on its side with more refinement.
• Procedure to obtain stem-and-leaf plot using Minitab:
  - Click on Graph > Stem-and-Leaf
  - In “Graph Variables” box, enter R^DJI and RIBM
  - Click on “OK”
• The boxplot, another graphical procedure, is
presented as part of Section 2.4.
• Graphical displays can answer the following:
• Shape of the distribution (symmetric or skewed)
• Existence of a “typical value”
• Degree of variation
• Presence of outliers
Section 2.2
Two-Variable Summaries
• Both Variables are Qualitative
Exercise 9.41:
A personnel director for a large, research-oriented firm categorizes colleges and universities as most desirable, good, adequate, and undesirable for purposes of hiring their graduates. Data are collected on 156 recent graduates and each is rated by a supervisor as outstanding, average or poor. It has been suggested that the school type makes a difference in the rating decision of the supervisor.
• The data are presented in a two-way frequency table or cross-tabulation table:
Supervisor’s Rating
School Type Outstanding Average Poor
Most Desirable 21 25 2
Good 20 36 10
Adequate 4 14 7
Undesirable 3 8 6
• Both factors or variables, “school type” and “rating” are qualitative.
• Several approaches can be used to determine if there is a relationship between school type and rating.
• One approach is to look at the row percentages for each type of school. These are shown below.
(Each cell shows the count, then the row percentage below it. The Total column shows the row count and its percentage of all 156 graduates.)
                 Outstanding   Average    Poor     Total
Most Desirable       21          25         2        48
                     43.8        52.1       4.2      30.77
Good                 20          36        10        66
                     30.3        54.5      15.2      42.31
Adequate              4          14         7        25
                     16.0        56.0      28.0      16.03
Undesirable           3           8         6        17
                     17.6        47.1      35.3      10.90
Total                48          83        25       156
                     30.77       53.21     16.03    100.00
• There appears to be a relation between the two factors since there is a tendency for the “outstanding” percentage to decrease and the “poor” percentage to increase as one moves from the “Most Desirable” to the “Undesirable” school types.
• This is shown in the Excel “100% Stacked Column” graph that follows.
• The blue columns (Outstanding Rating) increase as one goes from Undesirable to Most Desirable type of school from which to recruit. The white columns (Poor Rating) decrease as one goes from Undesirable to Most Desirable type of school.
[Figure: Excel “100% Stacked Column” graph of row percentages. Horizontal axis: school type (Undesirable, Adequate, Good, Most Desirable); vertical axis: 0% to 100%; series: Outstanding, Average, Poor.]
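The row percentages behind this graph can be reproduced from the two-way frequency table. The sketch below uses the counts from Exercise 9.41; the function name `row_percentages` is an assumption made for illustration.

```python
ratings = ["Outstanding", "Average", "Poor"]

# Counts from the two-way frequency table (rows: school type).
counts = {
    "Most Desirable": [21, 25, 2],
    "Good":           [20, 36, 10],
    "Adequate":       [4, 14, 7],
    "Undesirable":    [3, 8, 6],
}


def row_percentages(table):
    """Convert each row of counts to percentages of that row's total."""
    return {row: [100.0 * c / sum(cells) for c in cells]
            for row, cells in table.items()}


row_pct = row_percentages(counts)

print(f"{'':<15}" + "".join(f"{r:>12}" for r in ratings))
for school, pcts in row_pct.items():
    print(f"{school:<15}" + "".join(f"{p:12.1f}" for p in pcts))
```

Each row sums to 100%, which is what allows school types with different numbers of graduates to be compared directly.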
• Another approach is to look at the column percentages
for each type of rating.
• These column percentages can be viewed as “given that
a person has received a specific rating, what is the
chance that the individual graduated from a certain type
of school?”
• This is shown in the Excel “100% Stacked Column” graph
that follows.
• The green columns (Most Desirable Type of School) decrease as one goes from Outstanding to Poor supervisory rating. The blue columns (Undesirable Type of School) increase as one goes from Outstanding to Poor supervisory rating.
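[Figure: Excel “100% Stacked Column” graph of column percentages. Horizontal axis: supervisor’s rating (Outstanding, Average, Poor); vertical axis: 0% to 100%; series: Undesirable, Adequate, Good, Most Desirable.]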
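The column percentages can be computed in the same spirit, dividing each count by its column (rating) total instead of its row total. This is a sketch using the counts from the table; the variable names are assumptions.

```python
schools = ["Most Desirable", "Good", "Adequate", "Undesirable"]
ratings = ["Outstanding", "Average", "Poor"]

# Counts from the two-way frequency table (rows follow `schools`).
counts = [
    [21, 25, 2],
    [20, 36, 10],
    [4, 14, 7],
    [3, 8, 6],
]

# Column totals: number of graduates receiving each rating.
col_totals = [sum(col) for col in zip(*counts)]

# Percentage of each rating group that came from each school type.
col_pct = [[100.0 * c / t for c, t in zip(row, col_totals)] for row in counts]

print(f"{'':<15}" + "".join(f"{r:>12}" for r in ratings))
for school, pcts in zip(schools, col_pct):
    print(f"{school:<15}" + "".join(f"{p:12.1f}" for p in pcts))
```

Here each column sums to 100%, matching the "given a rating, which school type?" reading of the table.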
• Another perspective is obtained by using Excel’s “3-D Column” graph.
• Average ratings are greatest for graduates from “Good” schools.
• Outstanding ratings are greatest for graduates from “Most Desirable” schools.
• This problem will be revisited and reanalyzed in Chapter 9.
[Figure: Excel “3-D Column” graph of rating counts (Outstanding, Average, Poor) by school type (Undesirable, Adequate, Good, Most Desirable); vertical axis: count, 0 to 40.]
• Both Variables are Quantitative
• Suppose one variable is labeled X and the other Y. A scatterplot is a
graph of the (X,Y) pairs and is used to assess the simultaneous
behavior of two quantitative variables.
• The scatterplot was introduced in Chapter 1 to assess whether RIBM
(the Y variable) increases, stays constant, or decreases as R^DJI (the X
variable) increases. The scatterplot follows.
• It is seen that as R^DJI increases, RIBM tends to increase.
[Figure: Scatterplot of RIBM vs. R^DJI. Horizontal axis: R^DJI, -15 to 10; vertical axis: RIBM, -30 to 40.]
Section 2.3
On the Average: Typical Values
(Numerical Methods for Summarizing Data)
• Data: y1, y2, ..., yn (n denotes size of sample)
Measures of Central Tendency or Location
• Sample Mean
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
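A direct translation of the sample mean formula is sketched below. The three data values are the small illustrative set used later in the degrees-of-freedom example; the function name `sample_mean` is an assumption.

```python
def sample_mean(values):
    """Sample mean: the sum of the observations divided by the sample size n."""
    n = len(values)
    return sum(values) / n


data = [7, 3, 2]   # small illustrative sample, n = 3
print(f"n = {len(data)}, sample mean = {sample_mean(data)}")
```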
Example: Monthly percentage returns for IBM and ^DJI.
Descriptive Statistics: RIBM, R^DJI
Variable N N* Mean SE Mean TrMean StDev Variance Minimum Q1
RIBM 35 0 0.437 2.08 -0.315 12.30 151.21 -22.65 -8.24
R^DJI 35 0 -0.341 0.894 -0.251 5.287 27.948 -12.369 -4.399
Variable Median Q3 Maximum Range IQR
RIBM -1.52 7.09 35.39 58.03 15.33
R^DJI 0.194 2.764 10.605 22.973 7.164
• RIBM: $\bar{y} = 0.437$
• R^DJI: $\bar{y} = -0.341$
Using only the mean return, IBM is the preferable investment.
[Past returns may not reflect future returns.]
• Procedure to obtain descriptive statistics using Minitab:
  - Click on Stat > Basic Statistics > Display Descriptive Statistics
  - In “Variables” box, enter R^DJI and RIBM
  - Click on “Statistics”
  - Select additional statistics of interest, such as Interquartile Range
  - Click on “OK”
• Trimmed Mean (TRMEAN)
• Trim off the largest 5%, for example, and the smallest
5% of the observations and then calculate the sample
mean of the remaining 90% of the data values.
• Purpose: Minimize the effect of unusual observations.
Example: Monthly percentage returns for IBM and ^DJI.
• Trimmed mean for RIBM = -0.315
• Trimmed mean for R^DJI = -0.251
• The trimmed mean for IBM differs greatly from $\bar{y} = 0.437$ because of the two very high returns for IBM.
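The trimming procedure can be sketched as follows. Two things here are assumptions rather than statements from the slides: the number of observations dropped from each end is taken as 5% of n rounded to the nearest integer, and the data are a hypothetical sample chosen so that one unusual value dominates the ordinary mean.

```python
def trimmed_mean(values, proportion=0.05):
    """Drop the smallest and largest `proportion` of the data, then average.

    Assumption: the count trimmed from each end is n * proportion
    rounded to the nearest integer.
    """
    ordered = sorted(values)
    n = len(ordered)
    k = round(n * proportion)            # observations trimmed from each end
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


# Hypothetical sample: 1, 2, ..., 19 plus one unusually large value.
data = list(range(1, 20)) + [100]

print("ordinary mean:", sum(data) / len(data))
print("5% trimmed mean:", trimmed_mean(data))
```

Under the same rounding assumption, a sample of 35 would lose two observations from each end, so a pair of very high returns would no longer pull the average upward.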
• Sample Median
The middle value after the data are arranged in
ascending order: y(1), y(2), ..., y(n)
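$$\text{Median} = \begin{cases} y_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd;} \\[4pt] \frac{1}{2}\left[\, y_{\left(\frac{n}{2}\right)} + y_{\left(\frac{n}{2}+1\right)} \right], & \text{if } n \text{ is even.} \end{cases}$$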
Example: Monthly percentage returns for IBM and ^DJI.
n = 35 ⇒ (n+1)/2 = 18
Median = y(18) {18th ordered observation}
• For RIBM: y(18) = -1.52
• For R^DJI: y(18) = 0.19
Using only the median return, ^DJI
is the preferable investment.
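The odd and even cases of the median definition can be sketched as below. The function name `sample_median` and the small data sets are assumptions for illustration; the index arithmetic shifts the 1-based positions of the definition to Python's 0-based lists.

```python
def sample_median(values):
    """Median by the ordered-position definition (1-based positions)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the observation at position (n + 1) / 2.
        return ordered[(n + 1) // 2 - 1]
    # Even n: the average of the observations at positions n/2 and n/2 + 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(sample_median([7, 3, 2]))      # odd n
print(sample_median([4, 1, 3, 2]))   # even n

# For n = 35 the median is the 18th ordered observation.
print((35 + 1) // 2)
```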
• Sample Mode
Don’t use as a measure of central tendency.
Section 2.4
Measuring Variability
• Sample Range (denoted by R)
R = Largest observation
- Smallest observation
Example: Monthly percentage returns for IBM and ^DJI
• RIBM: R = 35.39 – ( - 22.65) = 58.04 with the rounded values shown (the Minitab output reports 58.03, presumably from unrounded data)
• R^DJI: R = 10.60 – ( - 12.37) = 22.97
• Primary area of application: Statistical Process Control
• “The range is very sensitive to outliers …” (H, O & G)
• “... as the sample size increases, the range tends to
increase …” (H, O & G)
• Sample Variance (denoted by $s^2$)
• Preliminary concept: How do you measure distance between two points a and b on the real number line?
Measure of distance: $(b - a)^2$
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$$
Measures squared distance between each observation and the sample mean
• The divisor (n – 1) is the “degrees of freedom”.
Example: Let n = 3. Suppose we have three data values: $y_1 = 7,\ y_2 = 3,\ y_3 = 2 \Rightarrow \bar{y} = 4$.
The building blocks of $s^2$ are deviations: $y_1 - \bar{y} = 3,\ y_2 - \bar{y} = -1,\ y_3 - \bar{y} = -2$.
• There is one constraint on these deviations: $\sum (y_i - \bar{y}) = 0$.
• For this example, the degrees of freedom = 3 - 1 = 2.
• You have freedom to specify any 2 of the 3 deviations.
• Once you specify any two of the deviations, the third deviation has to be a value so that all deviations add to 0.
• In general, the degrees of freedom = n - 1.
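The n = 3 example can be worked through numerically. The sketch below computes the deviations, confirms that they sum to zero, and applies the (n - 1) divisor; the function name `sample_variance` is an assumption.

```python
import math


def sample_variance(values):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    n = len(values)
    y_bar = sum(values) / n
    return sum((y - y_bar) ** 2 for y in values) / (n - 1)


data = [7, 3, 2]
y_bar = sum(data) / len(data)
deviations = [y - y_bar for y in data]

print("mean:", y_bar)
print("deviations:", deviations)
print("sum of deviations:", sum(deviations))   # the single constraint
s2 = sample_variance(data)
print("s^2:", s2, " s:", round(math.sqrt(s2), 4))
```

Because the deviations must add to zero, knowing any two of them fixes the third, which is why only n - 1 = 2 of them are free.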
• Sample Variance
$s^2$ for RIBM = 151.21, $s^2$ for R^DJI = 27.948
• Sample Standard Deviation
$$s = \sqrt{s^2}$$
$s$ for RIBM = 12.30%, $s$ for R^DJI = 5.287%
Using only the standard deviation as the criterion, ^DJI is the preferable investment.
• Linking the histogram with the sample mean and sample standard deviation.
• Empirical Rule:
For a set of measurements having a mound-shaped histogram, the interval
  $\bar{y} \pm 1s$ contains approximately 68% of the measurements;
  $\bar{y} \pm 2s$ contains approximately 95% of the measurements;
  $\bar{y} \pm 3s$ contains approximately all of the measurements.
• The approximation may be poor if the data are severely skewed or bimodal, or contain outliers.
Example: Monthly percentage returns for IBM and ^DJI.
• For the RIBM and R^DJI data, determine the intervals $\bar{y} \pm s$ and $\bar{y} \pm 2s$.
• Then determine the actual percentage of observations within each of these intervals.
• How do these results compare with the percentages specified by the Empirical Rule?
• Do the results correspond or disagree?
• State the reason for your answer.
RIBM                                          Actual Percentage   E.R. Percentage   Percentage Difference
$\bar{y} \pm s$  = [-11.86%, 12.74%]          29/35 = 82.9%       68%               21.9%
$\bar{y} \pm 2s$ = [-24.16%, 25.04%]          33/35 = 94.3%       95%               -0.74%

R^DJI                                         Actual Percentage   E.R. Percentage   Percentage Difference
$\bar{y} \pm s$  = [-5.63%, 4.95%]            25/35 = 71.4%       68%               5.04%
$\bar{y} \pm 2s$ = [-10.92%, 10.23%]          32/35 = 91.4%       95%               -3.75%
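The intervals and percentage differences in the table can be recomputed from the summary statistics. This is a sketch: the function names are assumptions, the counts inside each interval (29 of 35) are taken from the table rather than recounted from the raw data, and the percentage difference is read as a relative difference, which is the interpretation that reproduces the tabled values.

```python
def empirical_rule_interval(mean, s, k):
    """Interval mean ± k standard deviations."""
    return mean - k * s, mean + k * s


def percentage_difference(actual_pct, er_pct):
    """Relative difference between actual and Empirical Rule percentages."""
    return 100.0 * (actual_pct - er_pct) / er_pct


# Summary statistics for RIBM from the Minitab output.
ribm_mean, ribm_s = 0.437, 12.30

one_s = empirical_rule_interval(ribm_mean, ribm_s, 1)
two_s = empirical_rule_interval(ribm_mean, ribm_s, 2)

actual_1s = 100.0 * 29 / 35          # 29 of 35 observations, as tabled
diff_1s = percentage_difference(actual_1s, 68)

print(f"one-s interval: [{one_s[0]:.2f}, {one_s[1]:.2f}]")
print(f"two-s interval: [{two_s[0]:.2f}, {two_s[1]:.2f}]")
print(f"actual {actual_1s:.1f}% vs 68%: difference {diff_1s:.1f}%")
```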
• For the RIBM data set, the actual percentage within one standard deviation of the mean does not correspond with the value specified by the Empirical Rule. Look at the histogram for RIBM. This is not a mound-shaped histogram. We wouldn’t expect the E.R. to work well. Actually, it should not even be applied to the RIBM data. However, the correspondence between the actual percentage and that specified by the E.R. is fairly good for the two standard deviation interval, even though the histogram shows the E.R. should not have been used. The message here is that the E.R. can give results close to the actual percentage even though it should not have been used.
• If the actual and E.R. percentages do not correspond, this is a signal that the histogram is not mound-shaped.
• For the ^DJI data set, the actual percentage and that specified by the E.R. are closer as would be expected, because the histogram for R^DJI is closer to being mound-shaped.
• A fuller explanation of the E.R. is provided in Chapter 5.
• Quartiles
• Separate data into 4 sections
• First Quartile = Q1 = y(k) , where k = (n + 1) / 4
• Third Quartile = Q3 = y(3k)
Example: Monthly percentage returns for IBM and ^DJI.
k = (35+1)/4 = 9 ⇒ Q1 = y(9), Q3 = y(27)

      RIBM     R^DJI
Q1    -8.24    -4.399
Q3     7.09     2.764
• How are the quartiles used to measure variability?
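The position rule k = (n + 1)/4 can be sketched as below. When k is a whole number, as with n = 35, the quartiles are simply ordered observations. The linear interpolation used when k is fractional is an assumption added for completeness; the slides only cover the whole-number case. The function name `quartiles` is hypothetical.

```python
def quartiles(values):
    """Return (Q1, Q3) using positions k = (n + 1) / 4 and 3k (1-based).

    Assumption: fractional positions are handled by linear interpolation
    between the two neighbouring ordered observations.
    """
    ordered = sorted(values)
    n = len(ordered)

    def value_at(position):
        lower = int(position)            # whole part of the 1-based position
        frac = position - lower
        if lower < 1:
            return ordered[0]
        if lower >= n:
            return ordered[-1]
        return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

    k = (n + 1) / 4
    return value_at(k), value_at(3 * k)


# With n = 35 the positions are 9 and 27, so on the data 1..35
# the quartiles are the 9th and 27th ordered values.
q1, q3 = quartiles(range(1, 36))
print("Q1 =", q1, " Q3 =", q3, " IQR =", q3 - q1)
```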
• Interquartile Range (IQR)
IQR = Q3 – Q1
Example: Monthly percentage returns for IBM and ^DJI.
For RIBM, IQR = 7.09 – (- 8.24) = 15.33
For R^DJI, IQR = 2.764 – (- 4.399) = 7.164
• Does the value of the IQR change if the smallest
(largest) observation gets smaller (larger)?
• The IQR is used in the boxplot.
• The boxplot uses 5 numbers to represent the data distribution.
[Figure: annotated boxplot. The box spans the interquartile range, from Q1 (25th percentile) to Q3 (75th percentile), with the median marked inside the box. The left-hand whisker extends to the smallest observation that is not an outlier; the right-hand whisker extends to the largest observation that is not an outlier. An outlier is plotted as a separate point beyond the whisker.]
Example: Monthly percentage returns for IBM and ^DJI.
• RIBM has greater variation since its IQR is wider.
[Figure: Side-by-Side Boxplots of R^DJI and RIBM. Vertical axis: Data, -30 to 40.]
• Procedure to obtain side-by-side boxplots using Minitab:
  - Click on Graph > Boxplot > Multiple Y’s (Simple)
  - Click on “OK”
  - In “Graph variables” box, enter columns where data are stored
  - Select “Labels” and enter title
  - Select “Data View” and choose desired options, such as “Median Symbol” and “Median Connect Line”
  - Click on “OK”
• Determination of outliers and serious outliers.
• An outlier is an observation outside the inner fences.
• Lower Inner Fence (LIF) = Q1 – (1.5)(IQR)
• Upper Inner Fence (UIF) = Q3 + (1.5)(IQR)
Example: Monthly percentage returns for IBM and ^DJI.
For RIBM: LIF = -8.24 – (1.5)(15.33) = -31.235
UIF = 7.09 + (1.5)(15.33) = 30.085
For R^DJI: LIF = -4.399 – (1.5)(7.164) = -15.145
UIF = 2.764 + (1.5)(7.164) = 13.51
• A serious outlier is an observation outside the outer fences.
• For outer fences, replace (1.5) by (3.0).
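The fence calculations can be sketched with the RIBM quartiles from the Minitab output. The function names `fences` and `classify` are assumptions; the multipliers 1.5 and 3.0 are the ones given above.

```python
def fences(q1, q3, multiplier):
    """Lower and upper fences: Q1 - m*IQR and Q3 + m*IQR."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


def classify(value, q1, q3):
    """Label a value using the inner (1.5) and outer (3.0) fences."""
    inner_lo, inner_hi = fences(q1, q3, 1.5)
    outer_lo, outer_hi = fences(q1, q3, 3.0)
    if value < outer_lo or value > outer_hi:
        return "serious outlier"
    if value < inner_lo or value > inner_hi:
        return "outlier"
    return "not an outlier"


# RIBM quartiles from the descriptive statistics.
Q1_RIBM, Q3_RIBM = -8.24, 7.09

lif, uif = fences(Q1_RIBM, Q3_RIBM, 1.5)
print(f"inner fences: [{lif:.3f}, {uif:.3f}]")

# The smallest RIBM return and the two largest from the sorted listing.
for r in (-22.65, 31.77, 35.39):
    print(r, "->", classify(r, Q1_RIBM, Q3_RIBM))
```

The two largest returns fall beyond the upper inner fence but inside the upper outer fence, so they are outliers without being serious outliers.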
• Boxplot of RIBM and R^DJI with inner fences:
• RIBM has 2 outliers.
[Figure: Side-by-Side Boxplots of R^DJI and RIBM. Vertical axis: Data, -30 to 40.]
Section 2.5
Calculators and Statistical Software
• Facilitates the statistical analysis
• Software falls in two categories:
  • Dedicated statistical software, such as Minitab®, SPSS®, and SAS®.
  • Spreadsheet software.
• Should one use a spreadsheet or statistical software?
• The interested reader should refer to the following web sites:
  www.amstat.org/education/ASA_endorsement.html in the section titled “Support”
  www-unix.oit.umass.edu/~evagold/excel.html
  www.seismo.unr.edu//ftp/pub/updates/louie/mccullough.pdf
Section 2.6
Statistical Methods
and Quality Improvement
• History of Quality
• 1931 – Economic Control of Quality of Manufactured Product,
Walter Shewhart
• Concept of statistical variation was unveiled
• Process control charts were introduced
• 1985 – Out of the Crisis,
Dr. W. Edwards Deming
• “14 Points for Management” were established
• Variation is part of any process
• Two types of variation:
• Common cause
• Inherent to every system
• Accounts for 80-90% of observed variation
• Special cause
• External sources
• Not inherent to system
• Accounts for 10-20% of observed variation
• Control chart is used to distinguish between common
cause and special cause variation.
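• Generic Control Chart
[Figure: generic control chart. Horizontal axis: Time (0 to 5); vertical axis: Value of Some Statistic. Horizontal lines mark the UCL (Upper Control Limit), CL (Center Line) and LCL (Lower Control Limit). Points between the limits are labeled Common Cause Variation; a point outside the limits is labeled Special Cause Variation.]
• If the statistic plots outside the control limits, this indicates that special cause variation may be present.
• It could also be a false alarm.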
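The out-of-limits check can be sketched as below. Everything numeric here is hypothetical: the slides do not say how the limits are set, so the sketch assumes the common convention of a center line plus or minus three standard deviations, and the center, sigma and observations are made-up values. The function names are also assumptions.

```python
def control_limits(center, sigma, width=3.0):
    """Return (LCL, CL, UCL), assuming limits at center ± width * sigma."""
    return center - width * sigma, center, center + width * sigma


def out_of_control(points, lcl, ucl):
    """Indices (time order) of points plotting outside the control limits."""
    return [i for i, v in enumerate(points) if v < lcl or v > ucl]


# Hypothetical process: center 10.0, standard deviation 1.0.
lcl, cl, ucl = control_limits(center=10.0, sigma=1.0)

# Hypothetical values of the plotted statistic at times 0..5.
observations = [10.2, 9.5, 11.1, 13.6, 9.9, 10.4]

signals = out_of_control(observations, lcl, ucl)
print(f"LCL = {lcl}, CL = {cl}, UCL = {ucl}")
print("possible special cause variation at times:", signals)
```

A flagged point is only a signal to investigate; as noted above, it may turn out to be a false alarm.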
Keywords: Chapter 2
• Descriptive Statistics
• Inferential Statistics
• Relative Frequency
• Histogram
• Mound-shaped
• Stem-and-Leaf Graph
• Boxplot
• Outlier
• Median
• Trimmed Mean
• Mean
• Variance
• Standard Deviation
• Empirical Rule
• Interquartile Range
• Control Chart
Summary of Chapter 2
• The different ways of classifying data
• The difference between descriptive statistics and inferential statistics
• Graphing data to understand how the values are distributed
• The rationale underlying the mean, trimmed mean and median as measures of central tendency
• The rationale underlying the standard deviation and interquartile range as measures of variability
• How the Empirical Rule links the histogram with
the mean and standard deviation
• An objective criterion used to determine outliers
• The need for calculators and statistical software
in analyzing data
• The difference between common and special
causes in statistical quality control
• How control charts distinguish between special
and common causes
Graphical and Numerical Methods for Summarizing Data
Graphical Methods
• Histogram
• Stem-and-leaf Plot
• Boxplot
Objective method for determining outliers: inner fences
Q1 – (1.5)(IQR)
Q3 + (1.5)(IQR)
Numerical Methods
Measures of central tendency
• Sample mean ($\bar{y}$)
• Trimmed mean
• Median
Measures of dispersion
• Sample standard deviation (s)
• IQR (Q3 – Q1)
Empirical Rule: links the histogram with the sample mean and sample standard deviation