BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.


Page 1: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

BCOR 1020 Business Statistics

Lecture 4 – January 29, 2008

Page 2: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Overview

• Chapter 4 – Descriptive Statistics…
– Numerical Description
– Central Tendency
– Dispersion

Page 3: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Numerical Description

Population (Size = N): Characterized by parameters, e.g., $\mu$ = pop. mean, $\sigma$ = pop. std. dev.

Sample (Size = n): Statistics are computed and estimate parameters, e.g., $\bar{x}$ = sample mean, S = sample std. dev.

Recall:

• Statistics are descriptive measures derived from a sample (n items).

• Parameters are descriptive measures derived from a population (N items).

Page 4: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Numerical Description

There are three key characteristics of numerical data:

Characteristic – Interpretation

• Central Tendency – Where are the data values concentrated? What seem to be typical or middle data values?

• Dispersion – How much variation is there in the data? How spread out are the data values? Are there unusual values?

• Shape – Are the data values distributed symmetrically? Skewed? Sharply peaked? Flat? Bimodal?

Page 5: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Numerical Description

Example: Vehicle Quality

• Consider the data set of vehicle defect rates from J. D. Power and Associates.

• Numerical statistics can be used to summarize this random sample of brands.

• Must allow for sampling error since the analysis is based on sampling.

• Defect rate = (total no. defects / no. inspected) × 100
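The defect-rate formula can be sketched in a few lines of Python. The function name and the counts below are hypothetical, chosen only to illustrate the arithmetic; they are not taken from the J. D. Power data.

```python
def defect_rate(total_defects, vehicles_inspected):
    """Defects per 100 vehicles: (total defects / vehicles inspected) x 100."""
    if vehicles_inspected <= 0:
        raise ValueError("vehicles_inspected must be positive")
    return 100 * total_defects / vehicles_inspected


# Hypothetical counts: 250 defects found across 200 inspected vehicles.
print(defect_rate(250, 200))  # 125.0
```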

Page 6: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Numerical Description

• Number of defects per 100 vehicles, 2004 models.

Page 7: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Numerical Description

• Sorted data provides insight into central tendency and dispersion.

Page 8: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Numerical Description

Visual Displays:

• The dot plot offers a visual impression of the data.

• Histograms with 5 bins (suggested by Sturges’ Rule) and 10 bins are shown below.

• Both are symmetric with no extreme values and show a modal class toward the low end.

Page 9: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Numerical Description

• We can compute descriptive statistics using Excel and discuss measures of central tendency and dispersion…
– Figures 4.4 and 4.5 in your text detail the Excel menus for computing descriptive statistics.
– Figure 4.7 in your text details the MegaStat menus for computing descriptive statistics.

Page 10: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Numerical Description

MegaStat output…

Page 11: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Central Tendency

• The central tendency is the middle or typical values of a distribution.

• Central tendency can be assessed using a dot plot, histogram or, more precisely, with numerical statistics.

• The text presents six measures of central tendency…
– Mean
– Median
– Mode
– Midrange
– Geometric Mean (G)
– Trimmed Mean

• The mean and median are the most frequently used, but we will discuss the merits of all six.

Page 12: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Central Tendency

Mean –

• A familiar measure of central tendency.

• In Excel, use function =AVERAGE(Data) where Data is an array of data values.

Population Formula: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

Sample Formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

• For the sample of n = 37 car brands:

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{87 + 93 + 98 + \dots + 159 + 164 + 173}{37} = \frac{4639}{37} = 125.38$
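As a rough check of the sample-mean formula, here is a minimal Python sketch. The helper name `sample_mean` is an assumption for illustration. The slides give only the total (4639) and the count (37) for the car brands, so the full data set is not reproduced; the six-value list is borrowed from the median example in this lecture.

```python
def sample_mean(values):
    """Arithmetic mean: the sum of the observations divided by their count."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


# Total and sample size reported on the slide for the 37 car brands.
print(round(4639 / 37, 2))  # 125.38

# Small data set reused from the median example in this lecture.
print(sample_mean([11, 12, 15, 17, 21, 32]))  # 18.0
```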

Page 13: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Central Tendency

Characteristics of the Mean:

• Arithmetic mean is the most familiar average.

• Affected by every sample item.

• The balancing point or fulcrum for the data.

• Regardless of the shape of the distribution, the signed deviations of the data points from the mean always sum to zero:

$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
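The balancing-point property is easy to verify numerically. The sketch below uses a hypothetical helper name, `deviations_from_mean`, and the six values from the median example in this lecture; a tolerance is used in the final check because floating-point sums are not always exactly zero.

```python
import math


def deviations_from_mean(values):
    """Return the signed deviation (x_i - mean) for every observation."""
    x_bar = sum(values) / len(values)
    return [x - x_bar for x in values]


data = [11, 12, 15, 17, 21, 32]
devs = deviations_from_mean(data)
print(devs)  # [-7.0, -6.0, -3.0, -1.0, 3.0, 14.0]

# Negative and positive deviations cancel out around the mean.
print(math.isclose(sum(devs), 0.0, abs_tol=1e-9))  # True
```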

Page 14: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Central Tendency

Median (M) – the 50th percentile or midpoint of the sorted sample data.

• Use Excel’s function =MEDIAN(Data) where Data is an array of data values.

• M separates the upper and lower half of the sorted observations.
– If n is even, the median is the average of the middle two observations in the data array.
– If n is odd, the median is the middle observation in the data array.

Page 15: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Central Tendency

Median:

• To compute the median by hand, sort the n observations in the data:

$x_1, x_2, x_3, \dots, x_n$ where $x_1 \le x_2 \le x_3 \le \dots \le x_n$

For even n, Median = $\frac{x_{n/2} + x_{(n/2)+1}}{2}$

For odd n, Median = $x_{(n+1)/2}$

Page 16: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Central Tendency

Example:

• Consider the following n = 6 data values:

11 12 15 17 21 32

• What is the median?

For even n, Median = $\frac{x_{n/2} + x_{(n/2)+1}}{2}$

n/2 = 6/2 = 3 and n/2 + 1 = 6/2 + 1 = 4

$M = (x_3 + x_4)/2 = (15 + 17)/2 = 16$
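The by-hand rule translates directly into code. This is a minimal sketch (the function name `median_by_hand` is an assumption); the only subtlety is that the slide's positions are 1-based while Python lists are 0-based.

```python
def median_by_hand(values):
    """Median via the even/odd position rule applied to the sorted data."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    if n % 2 == 0:
        # 1-based positions n/2 and n/2 + 1 are 0-based n//2 - 1 and n//2.
        return (xs[n // 2 - 1] + xs[n // 2]) / 2
    # 1-based position (n + 1)/2 is 0-based n//2.
    return xs[n // 2]


print(median_by_hand([11, 12, 15, 17, 21, 32]))  # 16.0 (even n)
print(median_by_hand([80, 20, 75, 40, 70]))      # 70 (odd n, unsorted input)
```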

Page 17: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Clickers

Consider the following n = 7 data values:

12 23 23 25 27 34 41

What is the median?

A = 24

B = 25

C = 26

D = 27

Page 18: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Central Tendency

Median

• For the 37 vehicle quality ratings (odd n) the position of the median is (n+1)/2 = (37+1)/2 = 19.

• So, the median is $x_{19}$ = 121.

• When there are several duplicate data values, the median does not provide a clean “50-50” split in the data.

Page 19: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Central Tendency

Characteristics of the Median

• The median is insensitive to extreme data values.

• For example, consider the following quiz scores for 3 students:

Tom’s scores: 20, 40, 70, 75, 80 (Mean = 57, Median = 70, Total = 285)
Jake’s scores: 60, 65, 70, 90, 95 (Mean = 76, Median = 70, Total = 380)
Mary’s scores: 50, 65, 70, 75, 90 (Mean = 70, Median = 70, Total = 350)

• What does the median for each student tell you?
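The quiz-score comparison can be reproduced with Python's standard `statistics` module. The sketch below recomputes the three summaries from the scores on the slide, then replaces Tom's lowest score with a hypothetical 0 to show that the mean moves while the median does not.

```python
from statistics import mean, median

quiz_scores = {
    "Tom": [20, 40, 70, 75, 80],
    "Jake": [60, 65, 70, 90, 95],
    "Mary": [50, 65, 70, 75, 90],
}

for student, scores in quiz_scores.items():
    print(f"{student}: mean={mean(scores)}, median={median(scores)}, total={sum(scores)}")

# Hypothetical change: Tom's lowest score drops from 20 to 0.
altered_tom = [0, 40, 70, 75, 80]
print(mean(altered_tom), median(altered_tom))  # the mean falls, the median stays at 70
```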

Page 20: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Central Tendency

Mode – The most frequently occurring data value.

• Similar to mean and median if data values occur often near the center of sorted data.

• May have multiple modes or no mode.

• Easy to define, not easy to calculate in large samples.

• Use Excel’s function =MODE(Array)
– will return #N/A if there is no mode.
– will return first mode found if multimodal.

• May be far from the middle of the distribution and not at all typical.

• Generally isn’t useful for continuous data since data values rarely repeat.
– Best for attribute data or a discrete variable with a small range (e.g., Likert scale).
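A small sketch of mode-finding that handles the "multiple modes or no mode" cases explicitly. The function name `find_modes` and the convention of returning an empty list when no value repeats are assumptions made for illustration; they mimic, but are not, Excel's =MODE behavior. The last data set is hypothetical.

```python
from collections import Counter


def find_modes(values):
    """Return all most-frequent values, or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: treat as "no mode"
    # Counter keeps first-seen order, so modes come back in data order.
    return [value for value, count in counts.items() if count == top]


print(find_modes([12, 23, 23, 25, 27, 34, 41]))  # [23]
print(find_modes([11, 12, 15, 17, 21, 32]))      # []
print(find_modes([1, 1, 2, 2, 3]))               # [1, 2] (hypothetical bimodal data)
```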

Page 21: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Central Tendency

Mode:

• A bimodal distribution refers to the shape of the histogram rather than the mode of the raw data.

• Occurs when dissimilar populations are combined in one sample. For example,

Page 22: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Central Tendency

Skewness:

• Compare mean and median or look at histogram to determine degree of skewness.

Mean, Median & Skewness:
– If median > mean, skewed left.
– If median = mean, symmetric.
– If median < mean, skewed right.

Mean, Mode & Skewness:
– If mode > mean, skewed left.
– If mode = mean, symmetric.
– If mode < mean, skewed right.
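The mean-versus-median rule of thumb can be written as a tiny classifier. This is only a sketch: the function name `skew_hint` and the optional `tolerance` parameter (for treating nearly equal values as symmetric) are assumptions, and the rule is a rough indicator rather than a formal skewness statistic. The inputs are the means and medians of the quiz scores from this lecture.

```python
def skew_hint(mean_value, median_value, tolerance=0.0):
    """Classify skewness by comparing the mean with the median."""
    difference = mean_value - median_value
    if abs(difference) <= tolerance:
        return "symmetric"
    # A mean pulled above the median suggests a long right tail, and vice versa.
    return "skewed right" if difference > 0 else "skewed left"


print(skew_hint(57, 70))  # skewed left  (Tom: mean 57, median 70)
print(skew_hint(76, 70))  # skewed right (Jake: mean 76, median 70)
print(skew_hint(70, 70))  # symmetric    (Mary: mean 70, median 70)
```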

Page 23: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Central Tendency

Midrange – the point halfway between the lowest and highest values of X.

• Easy to use but sensitive to extreme data values.

Midrange = $\frac{x_{\min} + x_{\max}}{2}$
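A short sketch of the midrange and its sensitivity to extreme values. The function name is an assumption; the first data set comes from the median example in this lecture, and the second replaces its largest value with a hypothetical outlier.

```python
def midrange(values):
    """Point halfway between the smallest and largest observation."""
    return (min(values) + max(values)) / 2


print(midrange([11, 12, 15, 17, 21, 32]))   # 21.5

# Hypothetical outlier: one extreme value drags the midrange far from the bulk of the data.
print(midrange([11, 12, 15, 17, 21, 320]))  # 165.5
```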

Page 24: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Clickers

Consider the J. D. Power quality data (n=37):

What is the midrange?

A = 121 B = 122

C = 130 D = 173

Page 25: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Central Tendency

Trimmed Mean:

• To calculate the trimmed mean, first remove the highest and lowest k percent of the observations.

• To determine how many observations to trim, multiply k × n:
– Remove (k × n) highest and lowest observations.

• Mitigates the effects of extreme values.

• May exclude relevant data values.
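The trimming steps can be sketched as follows. The slides do not say what to do when k × n is not a whole number, so rounding down is an assumption made here, as is the function name; k is passed as a proportion (0.20 for 20 percent). The data are Tom's quiz scores from this lecture.

```python
def trimmed_mean(values, k):
    """Mean after dropping the k*n lowest and k*n highest observations."""
    if not 0 <= k < 0.5:
        raise ValueError("k must be a proportion in [0, 0.5)")
    xs = sorted(values)
    trim = int(k * len(xs))  # assumption: round k*n down to a whole count
    kept = xs[trim:len(xs) - trim]
    return sum(kept) / len(kept)


tom = [20, 40, 70, 75, 80]
print(trimmed_mean(tom, 0.0))   # 57.0, the ordinary mean
print(trimmed_mean(tom, 0.20))  # mean of [40, 70, 75] after trimming one value per end
```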

Page 26: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Dispersion

• Variation is the “spread” of data points about the center of the distribution in a sample. The text considers the following measures of dispersion:
– Range
– Variance ($S^2$)
– Standard Deviation (S)
– Coefficient of Variation (CV)
– Mean Absolute Deviation (MAD)

• The variance and standard deviation are the most frequently used, but we will briefly discuss the merits of all five.

Page 27: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Dispersion

Range – The difference between the largest and smallest observation.

• Easy to calculate, but sensitive to extreme data values.

Range = $x_{\max} - x_{\min}$

Page 28: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Dispersion

Variance:

• The population variance ($\sigma^2$) is defined as the sum of squared deviations around the mean divided by the population size:

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

• For the sample variance ($s^2$), we divide by n – 1 instead of n, otherwise $s^2$ would tend to underestimate the unknown population variance $\sigma^2$:

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
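To make the N versus n – 1 distinction concrete, here is a minimal sketch that codes both formulas directly from their definitions. The function names are assumptions; the data are Mary's quiz scores from this lecture, whose squared deviations from the mean of 70 sum to 850.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


mary = [50, 65, 70, 75, 90]
print(population_variance(mary))  # 170.0 (850 / 5)
print(sample_variance(mary))      # 212.5 (850 / 4)
```

Dividing by the smaller number n – 1 always gives the larger of the two results, which is how the sample formula compensates for the tendency to underestimate.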

Page 29: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Dispersion

Standard Deviation – The square root of the variance.

• Explains how individual values in a data set vary from the mean.

• Units of measure are the same as X.

Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

• For the 37 vehicle quality ratings …

Page 30: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Dispersion

$\bar{x} = \frac{1}{37}(87 + 93 + 98 + \dots + 173) = 125.38$

$S = \sqrt{\frac{1}{37 - 1}\left((87 - 125.38)^2 + (93 - 125.38)^2 + \dots + (173 - 125.38)^2\right)} = 22.89$

Page 31: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Dispersion

Calculating Standard Deviation:

• Excel’s built in functions are…

Statistic | Excel population formula | Excel sample formula
Variance | =VARP(Array) | =VAR(Array)
Standard deviation | =STDEVP(Array) | =STDEV(Array)

• The standard deviation is nonnegative because deviations around the mean are squared.

• When every observation is exactly equal to the mean, the standard deviation is zero.

• Standard deviations can be large or small, depending on the units of measure.

• Compare standard deviations only for data sets measured in the same units and only if the means do not differ substantially.
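For readers working outside Excel, Python's standard `statistics` module offers what appear to be direct counterparts of the four Excel functions. The pairing in the comments is my reading of the two sets of definitions, not something stated in the text. The data are Jake's quiz scores from this lecture.

```python
import statistics

jake = [60, 65, 70, 90, 95]

# Population versions divide by N (compare =VARP and =STDEVP).
pop_var = statistics.pvariance(jake)
pop_sd = statistics.pstdev(jake)

# Sample versions divide by n - 1 (compare =VAR and =STDEV).
samp_var = statistics.variance(jake)
samp_sd = statistics.stdev(jake)

print(pop_var, samp_var)
print(round(pop_sd, 2), round(samp_sd, 2))

# A constant data set has zero spread, so its standard deviation is zero.
print(statistics.pstdev([5, 5, 5, 5]))
```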

Page 32: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Dispersion

Coefficient of Variation – A unit-free measure of dispersion.

• Expressed as a percent of the mean.

• Useful for comparing variables measured in different units or with different means.

• Only appropriate for nonnegative data. It is undefined if the mean is zero or negative.

$CV = 100 \times \frac{s}{\bar{x}}$
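A minimal sketch of the coefficient of variation, using the sample standard deviation and refusing to compute when the mean is zero or negative, as the slide cautions. The function name is an assumption; the data are Tom's and Mary's quiz scores from this lecture.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percent of the mean."""
    x_bar = mean(values)
    if x_bar <= 0:
        raise ValueError("CV is undefined when the mean is zero or negative")
    return 100 * stdev(values) / x_bar


tom = [20, 40, 70, 75, 80]
mary = [50, 65, 70, 75, 90]

# Tom's scores vary more relative to his mean than Mary's do relative to hers.
print(round(coefficient_of_variation(tom), 1))
print(round(coefficient_of_variation(mary), 1))
```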

Page 33: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Clickers

Recall from the J. D. Power quality data (n=37):

$\bar{x} = \frac{1}{37}(87 + 93 + 98 + \dots + 173) = 125.38$

$S = \sqrt{\frac{1}{37 - 1}\left((87 - 125.38)^2 + (93 - 125.38)^2 + \dots + (173 - 125.38)^2\right)} = 22.89$

What is the Coefficient of Variation?

A = 5.48%

B = 18.26%

C = 22.89%

D = 125.38%

Page 34: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Dispersion

Mean Absolute Deviation (MAD) – reveals the average distance from an individual data point to the mean (center of the distribution).

• Uses absolute values of the deviations around the mean.

• Excel’s function is =AVEDEV(Array).

$MAD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
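The MAD formula can be sketched directly from its definition. The function name is an assumption; the data are Mary's quiz scores from this lecture, whose absolute deviations from the mean of 70 are 20, 5, 0, 5 and 20.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the observations from their mean."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n


mary = [50, 65, 70, 75, 90]
print(mean_absolute_deviation(mary))  # 10.0 (50 / 5)
```

Unlike the signed deviations, which always cancel to zero, the absolute deviations accumulate, so the MAD is zero only when every observation equals the mean.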

Page 35: BCOR 1020 Business Statistics Lecture 4 – January 29, 2008.

Chapter 4 – Dispersion

Central Tendency vs. Dispersion: Manufacturing

• Consider the histograms of hole diameters drilled in a steel plate during manufacturing.

• The desired distribution is outlined in red.

Machine A: Desired mean (5 mm) but too much variation.

Machine B: Acceptable variation but mean is less than 5 mm.