Chapter 01 - Introduction and Descriptive Statistics


8/13/2019 Chapter 01 - Introduction and Descriptive Statistics

http://slidepdf.com/reader/full/chapter-01-introduction-and-descriptive-statistics 1/11

International University (IU)

Powered by statisticsforbusinessiuba.blogspot.com | Statistics for Business

Chapter 01: Introduction and Descriptive Statistics

STATISTICS FOR BUSINESS [IUBA]  

CHAPTER 01

INTRODUCTION AND DESCRIPTIVE STATISTICS

1. SAMPLES AND POPULATIONS

- Population: the set of all measurements in which the investigator is interested.

- Sample: a subset of measurements selected from the population.

- Random sample: sampling from a population is often done randomly. A random sample is a sample selected in such a way that every possible sample of $n$ elements has an equal chance of being selected.
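As an illustration of drawing a random sample, the sketch below uses Python's standard `random.sample`, which picks `k` distinct items without replacement so that, as we understand its documented behavior, every `k`-element subset is equally likely. The population of 100 numbered items and the seed are hypothetical choices made only for this example.

```python
import random

# Hypothetical population: 100 numbered items (e.g., customer IDs 1..100)
population = list(range(1, 101))

# A seeded generator only makes this illustration reproducible; drop the seed in practice
rng = random.Random(2019)

# Draw a random sample of n = 10 distinct elements (sampling without replacement)
n = 10
sample = rng.sample(population, k=n)
print(sorted(sample))
```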

2. PERCENTILES AND QUARTILES

- Percentiles: The $P$th percentile of a group of numbers is the value below which lie $P\%$ ($P$ percent) of the numbers in the group. The position of the $P$th percentile in the ordered data is given by $(n + 1)P/100$, where $n$ is the number of data points. When the position is not a whole number, interpolate between the two neighboring observations.

- Quartiles: the percentiles that break down the data set into quarters: the first, second, third, and fourth quarters.

+ The 1st quartile (lower quartile) is the 25th percentile.

+ The median (2nd, or middle, quartile) is the 50th percentile.

+ The 3rd quartile (upper quartile) is the 75th percentile.
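The position rule can be sketched as a small Python function. It is a minimal illustration, not the handout's own method: the function name `percentile` and the sample data are hypothetical, and clamping positions that fall below 1 or above $n$ to the smallest or largest observation is an assumption, since the rule above does not cover those cases.

```python
import math

def percentile(data, p):
    """Return the p-th percentile using the position rule (n + 1) * p / 100.

    Fractional positions are resolved by linear interpolation between the two
    neighboring ordered observations. Positions outside 1..n are clamped to the
    smallest or largest observation (an assumption; the rule itself is silent).
    """
    xs = sorted(data)
    n = len(xs)
    pos = (n + 1) * p / 100           # 1-based position in the ordered data
    if pos <= 1:
        return float(xs[0])
    if pos >= n:
        return float(xs[-1])
    k = math.floor(pos)               # whole part: the k-th ordered observation
    frac = pos - k                    # fractional part: share of the gap to add
    lower, upper = xs[k - 1], xs[k]   # k-th and (k + 1)-th observations
    return lower + (upper - lower) * frac

# Hypothetical data with n = 9
data = [3, 1, 4, 1, 5, 9, 2, 6, 5]
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))  # 1.5 4.0 5.5
```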


+ Interquartile Range (IQR) = 3rd Quartile – 1st Quartile

= Upper Quartile – Lower Quartile

= 75th Percentile – 25th Percentile

+ Range = Largest Observation – Smallest Observation

Example: The following data are the numbers of passengers on flights of Delta Air Lines between San Francisco and Seattle over 33 days in April and early May.

128, 121, 134, 136, 136, 118, 123, 109, 120, 116, 125, 128, 121, 129, 130,

131, 127, 119, 114, 134, 110, 136, 134, 125, 128, 123, 128, 133, 132, 136,

134, 129, 132

Find the lower, middle, and upper quartiles of this data set. Also find the 10th, 15th, and 65th percentiles. What is the interquartile range?

(Hint: Use $(n + 1)P/100$.)

Solution: First, let’s order the data from smallest to largest:

109, 110, 114, 116, 118, 119, 120, 121, 121, 123,

123, 125, 125, 127, 128, 128, 128, 128, 129, 129,

130, 131, 132, 132, 133, 134, 134, 134, 134, 136,

136, 136, 136

n = 33

- The lower quartile is the observation in position (33 + 1)25/100 = 8.5. The 8th and 9th observations are both 121, so the lower quartile is 121.

- The middle quartile (median) is the observation in position (33 + 1)50/100 = 17, which is 128.

- The upper quartile is the observation in position (33 + 1)75/100 = 25.5, halfway between the 25th observation (133) and the 26th (134), which is 133.5.

- The 10th percentile is the observation in position (33 + 1)10/100 = 3.4, which is 114 + (116 − 114)(0.4) = 114.8.

- The 15th percentile is the observation in position (33 + 1)15/100 = 5.1, which is 118 + (119 − 118)(0.1) = 118.1.

- The 65th percentile is the observation in position (33 + 1)65/100 = 22.1, which is 131 + (132 − 131)(0.1) = 131.1.

- The interquartile range is equal to third quartile – first quartile = 133.5 − 121 = 12.5.
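To double-check the hand calculations, the sketch below uses Python's standard-library `statistics.quantiles` (Python 3.8+). With `method="exclusive"` (its default), it places the i-th of m cut points at position i(n + 1)/m in the ordered data and interpolates linearly. As far as we understand it, this is the same $(n + 1)P/100$ rule used above whenever the position lies between 1 and n. The range (largest minus smallest observation) is included for completeness.

```python
import statistics

passengers = [128, 121, 134, 136, 136, 118, 123, 109, 120, 116, 125, 128, 121, 129, 130,
              131, 127, 119, 114, 134, 110, 136, 134, 125, 128, 123, 128, 133, 132, 136,
              134, 129, 132]

# Quartiles: the 'exclusive' method puts the i-th cut point at position i(n + 1)/4
q1, median, q3 = statistics.quantiles(passengers, n=4, method="exclusive")

# Percentiles: 99 cut points, where index p - 1 holds the p-th percentile
pct = statistics.quantiles(passengers, n=100, method="exclusive")
p10, p15, p65 = pct[9], pct[14], pct[64]

iqr = q3 - q1
data_range = max(passengers) - min(passengers)

print(q1, median, q3)                       # 121.0 128.0 133.5
print(f"{p10:.1f} {p15:.1f} {p65:.1f}")     # 114.8 118.1 131.1
print(iqr, data_range)                      # 12.5 27
```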


3. MODE/MEAN/VARIANCE/STANDARD DEVIATION

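Mode

The mode of a data set is the value that occurs most frequently.

Mean

The mean of a set of observations is their average.
Sample: $\bar{x} = \sum x / n$
Population: $\mu = \sum x / N$

Variance

The variance of a set of observations is the average squared deviation of the data points from their mean. For a sample, the sum of squared deviations is divided by $n - 1$ instead of $n$.
Sample: $s^2 = \sum (x - \bar{x})^2 / (n - 1)$
Population: $\sigma^2 = \sum (x - \mu)^2 / N$

Standard Deviation

The standard deviation of a set of observations is the (positive) square root of the variance of the set.
Sample: $s = \sqrt{s^2} = \sqrt{\sum (x - \bar{x})^2 / (n - 1)}$
Population: $\sigma = \sqrt{\sigma^2} = \sqrt{\sum (x - \mu)^2 / N}$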
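The formulas can be written out directly in Python and cross-checked against the standard `statistics` module. This is a minimal sketch: the function name `describe` and the eight-value data set are hypothetical. The only difference between the two cases is the divisor, N for a population and n − 1 for a sample.

```python
import math
import statistics

def describe(data, population=False):
    """Return (mean, variance, standard deviation) using the formulas above.

    population=True divides the sum of squared deviations by N;
    otherwise the data are treated as a sample and divided by n - 1.
    """
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations
    var = ss / n if population else ss / (n - 1)
    return mean, var, math.sqrt(var)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data set

print(statistics.mode(data))                 # 4
print(describe(data, population=True))       # (5.0, 4.0, 2.0)
mean, var, sd = describe(data)               # sample formulas
print(f"{var:.3f} {sd:.3f}")                 # 4.571 2.138
```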


CALCULATOR INSTRUCTIONS FOR STATISTICS

Note: This page is relevant only for the CASIO scientific calculator FX-570ES.

Computing Mean and Standard Deviation of Sample / Population.

(Chapter 01 | Introduction and Descriptive Statistics)

Step 01: Press MODE + 3: STAT 

Step 02: Press 1: 1 – VAR 

Step 03: Input the data

Step 04: Press SHIFT + 1 [STAT] 

Step 05: Press 5: VAR 

Step 06:

Press 2: x̄ to compute the sample mean or population mean

Press 3: xσn to compute the population standard deviation

Press 4: xσn−1 to compute the sample standard deviation


Example

Case of population: The Euroyen future is the price of the Japanese yen as traded in the European futures market. The following are 30-day Euroyen prices on an index from 0 to 100%:

99.24, 99.37, 98.33, 98.91, 98.51, 99.38, 99.71, 99.21, 98.63, 99.10

Find the mean, standard deviation, and variance, viewed as a population. (Hint: Use the calculator.)

Case of sample: The daily expenditure on food by a traveler, in dollars, in summer 2006 was as follows:

17.5, 17.6, 18.3, 17.9, 17.4, 16.9, 17.1, 17.1, 18.0, 17.2, 18.3, 17.8, 17.1, 18.3, 17.5, 17.4

Find the mean, standard deviation, and variance, viewed as a sample. (Hint: Use the calculator.)

Solution: In both cases, it is not necessary to order the data from smallest to largest. Steps 01 to 05 are identical for the two data sets.

Step 01: Press MODE + 3: STAT

Step 02: Press 1: 1 – VAR

Step 03: Input the data

Step 04: Press SHIFT + 1 [STAT]

Step 05: Press 5: VAR

Step 06 (case of population):

Press 2: x̄ to compute the population mean. The result is 99.039.

Press 3: xσn to compute the population standard deviation. The result is 0.414.

Finally, to compute the population variance, square the standard deviation: $\sigma^2 = (0.4142)^2 \approx 0.172$. (Square the unrounded value; squaring the rounded 0.414 would give 0.171.)

Step 06 (case of sample):

Press 2: x̄ to compute the sample mean. The result is 17.588.

Press 4: xσn−1 to compute the sample standard deviation. The result is 0.466.

Finally, to compute the sample variance, square the standard deviation: $s^2 = (0.466)^2 \approx 0.217$.

Conclusion

Population mean $\mu$ = 99.039

Population standard deviation $\sigma$ = 0.414

Population variance $\sigma^2$ = 0.172

Sample mean $\bar{x}$ = 17.588

Sample standard deviation $s$ = 0.466

Sample variance $s^2$ = 0.217
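Both calculator results can be cross-checked with a short sketch that uses Python's standard `statistics` module. In that module, `pstdev` and `pvariance` divide by N (population), while `stdev` and `variance` divide by n − 1 (sample). The unrounded sample mean is 17.5875, which the text rounds to 17.588.

```python
import statistics

euroyen = [99.24, 99.37, 98.33, 98.91, 98.51, 99.38, 99.71, 99.21, 98.63, 99.10]
food = [17.5, 17.6, 18.3, 17.9, 17.4, 16.9, 17.1, 17.1, 18.0, 17.2,
        18.3, 17.8, 17.1, 18.3, 17.5, 17.4]

# Case of population: divide by N
mu = statistics.mean(euroyen)
sigma = statistics.pstdev(euroyen)
sigma2 = statistics.pvariance(euroyen)

# Case of sample: divide by n - 1
x_bar = statistics.mean(food)
s = statistics.stdev(food)
s2 = statistics.variance(food)

print(f"{mu:.3f} {sigma:.3f} {sigma2:.3f}")   # 99.039 0.414 0.172
print(f"{x_bar:.4f} {s:.3f} {s2:.3f}")        # 17.5875 0.466 0.217
```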


4. CHEBYSHEV’S THEOREM AND THE EMPIRICAL RULE

Chebyshev’s Theorem

Condition: none. Chebyshev’s theorem applies to any data set, whatever the shape of its distribution.

1.   At least three-quarters of the observations in a set will lie within 2 standard

deviations of the mean.

2.   At least eight-ninths of the observations in a set will lie within 3 standard deviations

of the mean.
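Both statements are special cases of the general form of Chebyshev’s theorem: for any $k > 1$, at least $1 - 1/k^2$ of the observations lie within $k$ standard deviations of the mean. The two rules follow from $k = 2$ and $k = 3$:

```latex
\text{Proportion within } k \text{ standard deviations} \;\ge\; 1 - \frac{1}{k^2}, \qquad k > 1

k = 2:\quad 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4}

k = 3:\quad 1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9}
```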

PROCEDURE OF CHEBYSHEV’S THEOREM

STEP 01: Determine the sample mean ($\bar{x}$) and the sample standard deviation ($s$).

STEP 02: Choose the rule of Chebyshev’s theorem to check and determine the value of $k$ (rule 1: $k = 2$; rule 2: $k = 3$).

STEP 03: Calculate the interval $\bar{x} \pm ks$.

STEP 04: Determine the percentage of observations lying in the specified range $\bar{x} \pm ks$ (divide the number of observations lying in the range by the total number of observations in the data set).

STEP 05: Draw a conclusion.
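The five steps can be sketched in Python as follows. The function name `chebyshev_check` and the eight-value data set are hypothetical. Observations lying exactly on an end of the interval are counted as inside, which is an assumption.

```python
import statistics

def chebyshev_check(data, k):
    """Follow STEPs 01-05: return the interval x_bar +/- k*s, the share of
    observations inside it, and Chebyshev's lower bound 1 - 1/k**2."""
    x_bar = statistics.mean(data)              # STEP 01: sample mean
    s = statistics.stdev(data)                 # STEP 01: sample SD (divides by n - 1)
    low, high = x_bar - k * s, x_bar + k * s   # STEP 03: the interval
    inside = sum(low <= x <= high for x in data)
    share = inside / len(data)                 # STEP 04: proportion in the range
    bound = 1 - 1 / k**2                       # 3/4 for k = 2, 8/9 for k = 3
    return (low, high), share, bound

# Hypothetical data set; STEP 02: rule 1, so k = 2
(low, high), share, bound = chebyshev_check([2, 4, 4, 4, 5, 5, 7, 9], k=2)
print(f"[{low:.2f}, {high:.2f}] holds {share:.0%}; bound {bound:.0%}; satisfied: {share >= bound}")
```

STEP 05 is the final comparison `share >= bound`. Because Chebyshev’s theorem is a guaranteed minimum, a data set should never fail it.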


Empirical Rule

Condition: The empirical rule can apply if the distribution of the data is mound-shaped—

that is, if the histogram of the data is more or less symmetric with a single mode or high

point.

1.   Approximately 68% of the observations will be within 1 standard deviation of the mean.

2.   Approximately 95% of the observations will be within 2 standard deviations of the mean.

3.   A vast majority of the observations (all, or almost all) will be within 3 standard

deviations of the mean.

PROCEDURE OF THE EMPIRICAL RULE

STEP 01: Draw the histogram of the data and check whether the distribution of the data is mound-shaped. If it is, follow the next five steps. If not, the empirical rule does not apply and nothing more needs to be done.

STEP 02: Determine the sample mean ($\bar{x}$) and the sample standard deviation ($s$).

STEP 03: Choose the rule of the empirical rule to check and determine the value of $k$ (rule 1: $k = 1$; rule 2: $k = 2$; rule 3: $k = 3$).

STEP 04: Calculate the interval $\bar{x} \pm ks$.

STEP 05: Determine the percentage of observations lying in the specified range $\bar{x} \pm ks$ (divide the number of observations lying in the range by the total number of observations in the data set).

STEP 06: Draw a conclusion.
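STEPs 02 to 05 can be sketched in Python for all three values of $k$ at once. STEP 01 (judging whether the histogram is mound-shaped) is left as a visual check and is assumed to have been passed. The function name `empirical_shares` and the nine-value data set are hypothetical.

```python
import statistics

def empirical_shares(data):
    """STEPs 02-05 for k = 1, 2, 3: share of observations within x_bar +/- k*s.

    STEP 01 (checking that the histogram is mound-shaped) is not automated
    here; it is assumed to have been done visually first.
    """
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)              # sample standard deviation
    shares = {}
    for k in (1, 2, 3):
        low, high = x_bar - k * s, x_bar + k * s
        shares[k] = sum(low <= x <= high for x in data) / len(data)
    return shares

# Hypothetical, roughly mound-shaped data set
shares = empirical_shares([4, 5, 5, 6, 6, 6, 7, 7, 8])
for k, target in zip((1, 2, 3), ("about 68%", "about 95%", "all or almost all")):
    print(f"within {k} SD: {shares[k]:.0%} (rule: {target})")
```

With only nine points the shares are coarse (78%, 100%, 100%). The rule’s percentages are approximations that fit better as mound-shaped data sets grow larger.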


Example: Check the applicability of Chebyshev’s theorem and the empirical rule for the following data set:

12.5, 13, 14.8, 11, 16.7, 9, 8.3, 1.2, 3.9, 15.5, 16.2, 18, 11.6, 10, 9.5

Solution

Chebyshev’s Theorem:

We found that:

- the sample mean $\bar{x} = 11.41$

- the sample standard deviation $s = 4.69$

According to rule 1 of Chebyshev’s theorem, $k = 2$, and the interval is $\bar{x} \pm ks = 11.41 \pm 2 \times 4.69 = [2.03, 20.79]$. From the data set itself, we see that 14 of the 15 observations (every observation except 1.2) lie in this interval: $14/15 \approx 0.9333 = 93.33\%$ are within the specified range. So the rule that at least three-quarters will be within the range is satisfied.

The Empirical Rule:

Since the distribution of the data is not mound-shaped, the empirical rule cannot apply.
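The numbers in this example can be reproduced with the sketch below, which uses Python's `statistics` module. Because it keeps the unrounded mean and standard deviation, the upper end of the interval prints as 20.80 rather than 20.79, but the count of 14 observations is unaffected. The crude text histogram is one way to carry out the visual mound-shape check (STEP 01 of the empirical rule). Its bin width of 4 is an arbitrary choice.

```python
import statistics

data = [12.5, 13, 14.8, 11, 16.7, 9, 8.3, 1.2, 3.9, 15.5, 16.2, 18, 11.6, 10, 9.5]

x_bar = statistics.mean(data)
s = statistics.stdev(data)               # sample SD (divides by n - 1)
k = 2                                    # rule 1 of Chebyshev's theorem
low, high = x_bar - k * s, x_bar + k * s
inside = sum(low <= x <= high for x in data)

print(f"mean = {x_bar:.2f}, s = {s:.2f}")                  # mean = 11.41, s = 4.69
print(f"interval = [{low:.2f}, {high:.2f}]")               # interval = [2.03, 20.80]
print(f"{inside}/{len(data)} = {inside / len(data):.2%}")  # 14/15 = 93.33%

# Crude text histogram (bin width 4 is an arbitrary choice) for the mound-shape check
width = 4
for start in range(0, 20, width):
    count = sum(start <= x < start + width for x in data)
    print(f"[{start:2d}, {start + width:2d}) {'*' * count}")
```

The five bins hold 2, 0, 6, 4, and 3 observations. The two low values sit apart from the rest, which is consistent with the judgment that the data are not mound-shaped.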


5. BOX PLOT

  Introduction

+ A box plot (also called a box-and-whisker plot) is another way of looking at a data set in an effort to determine its central tendency, spread, skewness, and the existence of outliers.

+ A box plot is built from a set of five summary measures of the distribution of the data:

1.  The median of the data

2.  The lower quartile

3.  The upper quartile

4.  The smallest observation

5.  The largest observation

  The elements of a box plot

- The median is marked as a vertical line across the box.

- The hinges of the box are the upper and lower quartiles (the rightmost and leftmost sides of the box).

- The interquartile range (IQR) is the distance from the upper quartile to the lower quartile (the length of the box from hinge to hinge): $IQR = Q_U - Q_L$, where $Q_L$ and $Q_U$ are the lower and upper quartiles.

- The upper inner fence is the point at a distance of $1.5(IQR)$ above the upper quartile, $Q_U + 1.5(IQR)$; similarly, the lower inner fence is $Q_L - 1.5(IQR)$.

- The outer fences are defined similarly but are at a distance of $3(IQR)$ above or below the appropriate hinge: $Q_U + 3(IQR)$ and $Q_L - 3(IQR)$.
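The fence definitions translate directly into a small classifier. This is a minimal sketch: the function names `fences` and `classify` and the quartile values in the usage lines are hypothetical. Treating a point that falls exactly on a fence as inside that fence is an assumption.

```python
def fences(q_l, q_u):
    """Inner and outer fences from the lower (Q_L) and upper (Q_U) quartiles."""
    iqr = q_u - q_l
    inner = (q_l - 1.5 * iqr, q_u + 1.5 * iqr)
    outer = (q_l - 3 * iqr, q_u + 3 * iqr)
    return inner, outer

def classify(x, q_l, q_u):
    """Label an observation relative to the fences.

    Points exactly on a fence are treated as inside it (an assumption).
    """
    (li, ui), (lo, uo) = fences(q_l, q_u)
    if x < lo or x > uo:
        return "outlier"             # beyond an outer fence
    if x < li or x > ui:
        return "suspected outlier"   # between an inner and an outer fence
    return "not an outlier"

# Hypothetical quartiles Q_L = 10 and Q_U = 20, so IQR = 10
print(fences(10, 20))   # ((-5.0, 35.0), (-20, 50))
for x in (15, 40, 60):
    print(x, classify(x, 10, 20))
```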


[Figure: The elements of a box plot]

  Box plots are very useful for the following purposes.

1.  To identify the location of a data set based on the median.

2.  To identify the spread of the data based on the length of the box, hinge to hinge (the interquartile range), and the length of the whiskers (the range of the data without extreme observations: outliers or suspected outliers).

3.  To identify possible skewness of the distribution of the data set. If the portion of the box to the

right of the median is longer than the portion to the left of the median, and/or the right whisker

is longer than the left whisker, the data are right-skewed. Similarly, a longer left side of the box

and/or left whisker implies a left-skewed data set. If the box and whiskers are symmetric, the

data are symmetrically distributed with no skewness.

4.  To identify suspected outliers (observations beyond the inner fences but within the outer fences)

and outliers (points beyond the outer fences).

5.  To compare two or more data sets. By drawing a box plot for each data set and displaying the box plots on the same scale, we can compare several data sets.


Example: Construct a box plot for the following data set:

5, 8, 6, 9, 17, 24, 10, 5, 6, 13, 5, 3, 6, 12, 11, 10, 9, 10, 14, 15

Solution: Let’s order the data from smallest to largest:

3, 5, 5, 5, 6, 6, 6, 8, 9, 9, 10, 10, 10, 11, 12, 13, 14, 15, 17, 24

n = 20

The median is the observation in position (20 + 1)50/100 = 10.5, halfway between the 10th observation (9) and the 11th (10), which is 9.5.

The lower quartile is the observation in position (20 + 1)25/100 = 5.25. The 5th and 6th observations are both 6, so the lower quartile is 6.

The upper quartile is the observation in position (20 + 1)75/100 = 15.75, which is 12 + (13 − 12)(0.75) = 12.75.

The smallest observation is 3.

The largest observation is 24.

Table 1

               Smallest      Lower      Median   Upper      Largest
               observation   quartile            quartile   observation
Position       -             5.25       10.5     15.75      -
Observation    3             6          9.5      12.75      24

IQR = Upper Quartile – Lower Quartile = 12.75 – 6 = 6.75

Lower inner fence = $Q_L - 1.5(IQR) = 6 - 10.125 = -4.125$

Upper inner fence = $Q_U + 1.5(IQR) = 12.75 + 10.125 = 22.875$

Lower outer fence = $Q_L - 3(IQR) = 6 - 20.25 = -14.25$

Upper outer fence = $Q_U + 3(IQR) = 12.75 + 20.25 = 33$

Table 2

           Lower outer     Lower inner       Median   Upper inner       Upper outer
           fence           fence                      fence             fence
Formula    Q_L − 3(IQR)    Q_L − 1.5(IQR)    -        Q_U + 1.5(IQR)    Q_U + 3(IQR)
Value      −14.25          −4.125            9.5      22.875            33

[Figure: Box plot of the data set]

Conclusion:

Based on the box plot, the distribution of the data is relatively symmetric, and there is one suspected outlier, 24. It lies beyond the upper inner fence (22.875) but within the upper outer fence (33).
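The whole box-plot calculation can be checked with a short sketch that uses Python's `statistics.quantiles` with `method="exclusive"`. As we understand it, this method applies the same (n + 1) position rule with linear interpolation. Treating points exactly on a fence as inside it is an assumption.

```python
import statistics

data = [5, 8, 6, 9, 17, 24, 10, 5, 6, 13, 5, 3, 6, 12, 11, 10, 9, 10, 14, 15]

# Quartiles by the (n + 1) position rule ('exclusive' method)
q_l, median, q_u = statistics.quantiles(data, n=4, method="exclusive")
iqr = q_u - q_l

lower_inner, upper_inner = q_l - 1.5 * iqr, q_u + 1.5 * iqr
lower_outer, upper_outer = q_l - 3 * iqr, q_u + 3 * iqr

# Suspected outliers: beyond an inner fence but within the outer fences
suspected = [x for x in data
             if lower_outer <= x < lower_inner or upper_inner < x <= upper_outer]
# Outliers: beyond an outer fence
outliers = [x for x in data if x < lower_outer or x > upper_outer]

print(min(data), q_l, median, q_u, max(data))              # 3 6.0 9.5 12.75 24
print(lower_outer, lower_inner, upper_inner, upper_outer)  # -14.25 -4.125 22.875 33.0
print(suspected, outliers)                                 # [24] []
```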