MBA7025_04.ppt/Jan 27, 2015/Page 1 Georgia State University - Confidential
MBA 7025
Statistical Business Analysis
Descriptive Statistics
Jan 27, 2015
Agenda
Central Limit Theorem
Descriptive Summary Measures
1. Measures of Central Location: Mean, Median, Mode
2. Measures of Variation: The Range, Percentile, Variance and Standard Deviation
3. Measures of Relative Variation: Coefficient of Variation
Confidence Interval
Mean
1. It is the Arithmetic Average of data values. Sample mean:
   $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
2. The Most Common Measure of Central Tendency
3. Affected by Extreme Values (Outliers)
[Figure: two data sets plotted on number lines (0 to 10 and 0 to 14); Mean = 5 and Mean = 6]
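A minimal Python sketch of the sample-mean formula, applied to a small hypothetical data set; the standard library's `statistics.mean` is used only as a cross-check.

```python
import statistics

def sample_mean(values):
    """Arithmetic average: the sum of the x_i divided by n."""
    if not values:
        raise ValueError("need at least one value")
    return sum(values) / len(values)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical data
print(sample_mean(data))             # 5.0
print(statistics.mean(data))         # same value from the standard library
```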
Median
1. Important Measure of Central Tendency
2. In an ordered array, the median is the "middle" number.
   • If n is odd, the median is the middle number.
   • If n is even, the median is the average of the 2 middle numbers.
3. Not Affected by Extreme Values
[Figure: two data sets plotted on number lines (0 to 10 and 0 to 14); Median = 5 and Median = 5]
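The odd/even rule above can be sketched in Python as follows. The data are hypothetical: replacing the largest value with an extreme one moves the mean a lot but leaves the median unchanged.

```python
import statistics

def median(values):
    """Middle value of the ordered data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

base = [1, 2, 3, 4, 5, 6, 7, 8, 9]      # hypothetical data (n = 9, odd)
with_outlier = base[:-1] + [90]         # largest value replaced by an extreme one

print(statistics.mean(base), median(base))                  # mean 5, median 5
print(statistics.mean(with_outlier), median(with_outlier))  # mean 14, median 5
print(median([4, 1, 3, 2]))                                 # even n: (2 + 3) / 2 = 2.5
```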
Mode
1. A Measure of Central Tendency
2. Value that Occurs Most Often
3. Not Affected by Extreme Values
4. There May Not be a Mode
5. There May be Several Modes
6. Used for Either Numerical or Categorical Data
[Figure: a data set on a 0 to 14 number line with Mode = 9, and a data set on a 0 to 6 number line with No Mode]
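A small Python sketch of the mode that returns every most-frequent value, so it covers "several modes" and categorical data. The data are hypothetical, and the rule "no mode when no value repeats" is an assumed convention; textbooks differ on ties.

```python
from collections import Counter

def modes(values):
    """Every value that occurs most often.

    Assumed convention: if no value repeats, return [] (no mode).
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []          # every value occurs once, so there is no mode
    return [v for v, c in counts.items() if c == top]

print(modes([1, 3, 9, 9, 9, 12]))                            # one mode
print(modes([0, 1, 2, 3, 4, 5, 6]))                          # no mode
print(modes(["red", "blue", "red", "blue", "green"]))        # two modes, categorical data
```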
Shape
• Describes How Data Are Distributed
• Measures of Shape: Symmetric or Skewed
[Figure: three distributions. Left-Skewed: Mean < Median < Mode. Symmetric: Mean = Median = Mode. Right-Skewed: Mode < Median < Mean.]
Agenda
Central Limit Theorem
Descriptive Summary Measures
1. Measures of Central Location: Mean, Median, Mode
2. Measures of Variation: The Range, Percentile, Variance and Standard Deviation
3. Measures of Relative Variation: Coefficient of Variation
Confidence Interval
The Range
• Measure of Variation
• Difference Between Largest & Smallest Observations: Range = $x_{\text{Largest}} - x_{\text{Smallest}}$
• Ignores How Data Are Distributed:
[Figure: two data sets on a 7 to 12 number line with different spreads; each has Range = 12 − 7 = 5]
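A short Python sketch of why the range ignores how data are distributed. The two hypothetical data sets have very different shapes but the same largest and smallest values.

```python
def data_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)

spread_out = [7, 8, 9, 10, 11, 12]    # hypothetical: values spread evenly
clustered = [7, 10, 10, 10, 10, 12]   # hypothetical: values bunched near 10

print(data_range(spread_out), data_range(clustered))   # 5 5
```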
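Percentile
1. Arrange data in ascending order.
2. The middle number is the median (the 50th percentile).
3. The middle number of the lower half of the data (the values below the median) is the first quartile, Q1 (the 25th percentile).
4. The middle number of the upper half of the data (the values above the median) is the third quartile, Q3 (the 75th percentile).
5. A number with (no more than) 66% of the values less than it is the 66th percentile, and so forth.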
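A Python sketch of the steps above, assuming the "median of each half" convention for quartiles (the middle value is left out of both halves when n is odd). Other conventions, such as the interpolation used by Excel's QUARTILE or Python's `statistics.quantiles`, can give slightly different values on the same data.

```python
import statistics

def quartiles(values):
    """(Q1, median, Q3) using the median-of-each-half convention."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    lower = ordered[: n // 2]          # values below the median
    upper = ordered[(n + 1) // 2 :]    # values above the median
    return statistics.median(lower), statistics.median(ordered), statistics.median(upper)

def percent_below(values, x):
    """Percentage of observations strictly less than x."""
    return 100 * sum(v < x for v in values) / len(values)

print(quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9]))      # (2.5, 5, 7.5)
print(percent_below(list(range(1, 11)), 7))        # 60.0
```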
Percentile
Obs Medals   Obs Medals   Obs Medals   Obs Medals   Obs Medals
  1    110    12     24    23     10    34      6    45      3
  2    100    13     19    24      9    35      6    46      3
  3     72    14     18    25      8    36      6    47      2
  4     47    15     18    26      8    37      5    48      2
  5     46    16     16    27      7    38      5    49      2
  6     41    17     15    28      7    39      5    50      2
  7     40    18     14    29      7    40      4    51      2
  8     31    19     13    30      6    41      4    52      1
  9     28    20     11    31      6    42      4    53      1
 10     27    21     10    32      6    43      4    54      1
 11     25    22     10    33      6    44      3    55      1
2008 Olympic Medal Tally for the top 55 nations. What is the percentile score for a country with 9 medals? What is the 50th percentile?
Percentile Solutions
Order all data (ascending or descending).
1. The country with 9 medals ranks 24th out of 55. There are 31 nations (56.36%) below it and 23 nations (41.82%) above it. Counting the country itself, 32 nations (58.18%) are at or below it. Hence it can be considered a 57th or 58th percentile score.
2. The medal tally that corresponds to the 50th percentile is the one in the middle of the group, the 28th country, with 7 medals. Hence the 50th percentile (Median) is 7.
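The two solutions can be checked with a short Python sketch using the medal totals from the 2008 medal table (Obs 1 to 55). The two percentile-rank conventions printed at the end (counting the country as half below, or as fully at-or-below) are standard choices, shown here to explain the "57th or 58th" answer.

```python
import statistics

# Medal totals from the 2008 Olympic table, Obs 1 to 55.
medals = [110, 100, 72, 47, 46, 41, 40, 31, 28, 27, 25,
          24, 19, 18, 18, 16, 15, 14, 13, 11, 10, 10,
          10, 9, 8, 8, 7, 7, 7, 6, 6, 6, 6,
          6, 6, 6, 5, 5, 5, 4, 4, 4, 4, 3,
          3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1]

n = len(medals)
below = sum(m < 9 for m in medals)   # nations with fewer than 9 medals
above = sum(m > 9 for m in medals)   # nations with more than 9 medals
print(n, below, above)                                        # 55 31 23
print(round(100 * below / n, 2), round(100 * above / n, 2))   # 56.36 41.82

# Percentile rank: count the country as half below, or as fully at-or-below.
print(round(100 * (below + 0.5) / n, 2), round(100 * (below + 1) / n, 2))  # 57.27 58.18

print(statistics.median(medals))     # 7, the 28th value in order
```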
Box Plot
[Figure: box plot marking the Smallest value, Q1, Median, Q3, and the Largest value]
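Variance
• Important Measure of Variation
• Shows Variation About the Mean:
• For the Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$ or $\sigma^2 = \frac{SS}{N}$
• For the Sample: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ or $s^2 = \frac{SS}{n - 1}$
where SS (sum of squares) is the sum of the squared deviations from the mean.
For the Population: use N in the denominator.
For the Sample: use n − 1 in the denominator.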
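A Python sketch of the two denominators, on hypothetical data. The standard library's `statistics.pvariance` (divide by N) and `statistics.variance` (divide by n − 1) are used as cross-checks.

```python
import statistics

def sum_of_squares(values):
    """SS: sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

def population_variance(values):
    return sum_of_squares(values) / len(values)          # divide by N

def sample_variance(values):
    return sum_of_squares(values) / (len(values) - 1)    # divide by n - 1

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean 5, SS = 32
print(population_variance(data), statistics.pvariance(data))   # 32 / 8
print(sample_variance(data), statistics.variance(data))        # 32 / 7
```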
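Standard Deviation
• Most Important Measure of Variation
• Shows Variation About the Mean:
• For the Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$ or $\sigma = \sqrt{\frac{SS}{N}}$
• For the Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$ or $s = \sqrt{\frac{SS}{n - 1}}$
For the Population: use N in the denominator.
For the Sample: use n − 1 in the denominator.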
Computing Sample Variance and Standard Deviation
Mean of X = 6

 X   Deviation From Mean   Squared
 3           -3                9
 4           -2                4
 6            0                0
 8            2                4
 9            3                9

Sum of Squares (SS) = 26
Variance = SS/(n − 1) = 26/4 = 6.50
Stdev = √Variance = 2.55
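The worked table can be reproduced step by step in Python with the same five values of X.

```python
import math

x = [3, 4, 6, 8, 9]
mean = sum(x) / len(x)                    # 6.0
deviations = [v - mean for v in x]        # -3, -2, 0, 2, 3
squared = [d ** 2 for d in deviations]    # 9, 4, 0, 4, 9

for v, d, sq in zip(x, deviations, squared):
    print(f"{v:>2} {d:>6.0f} {sq:>6.0f}")

ss = sum(squared)                         # sum of squares: 26.0
variance = ss / (len(x) - 1)              # sample variance: 26 / 4 = 6.5
stdev = math.sqrt(variance)               # about 2.55
print(ss, variance, round(stdev, 2))
```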
The Normal Distribution
A property of normally distributed data is as follows:
Distance from Mean          Percent of observations included in that range
± 1 standard deviation      Approximately 68%
± 2 standard deviations     Approximately 95%
± 3 standard deviations     Approximately 99.7%
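These percentages follow from the standard normal distribution: the share within ±k standard deviations is erf(k/√2). A short Python check using only the standard library:

```python
import math

def share_within(k):
    """P(|Z| < k) for a standard normal Z, computed as erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"±{k} SD: {100 * share_within(k):.2f}%")   # 68.27%, 95.45%, 99.73%
```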
Comparing Standard Deviations
[Figure: three data sets plotted on an 11 to 21 number line, all with Mean = 15.5. Data A: s = 3.338; Data B: s = 0.9258; Data C: s = 4.57]
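A Python sketch of the same idea: three hypothetical data sets of size 8, constructed so that each has mean 15.5 and the standard deviations listed above. These values are illustrative and are not taken from the slide's figure.

```python
import statistics

# Hypothetical data sets with the same mean but different spreads.
data_sets = {
    "A": [11, 12, 13, 16, 16, 17, 18, 21],
    "B": [14, 15, 15, 15, 16, 16, 16, 17],
    "C": [11, 11, 11, 12, 19, 20, 20, 20],
}

for name, values in data_sets.items():
    # statistics.stdev uses the sample (n - 1) formula
    print(name, statistics.mean(values), round(statistics.stdev(values), 4))
```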
Outliers
• Typically, a value more than a certain number of standard deviations from the mean is considered an outlier.
• In many cases, a value beyond 3 standard deviations (about a 0.27% chance of occurring for normally distributed data) is considered an outlier.
• If identifying outliers is more critical, one can make the rule more stringent and use 2 standard deviations as the limit, which flags more values.
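A Python sketch of the k-standard-deviation rule, with k as a parameter so the 3-SD and 2-SD limits can be compared. The readings are hypothetical; here the value 30 lies just under 3 sample standard deviations from the mean, so only the stricter 2-SD rule flags it.

```python
import statistics

def flag_outliers(values, k=3.0):
    """Values more than k sample standard deviations from the sample mean."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [v for v in values if abs(v - m) > k * s]

readings = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10, 30]   # hypothetical data

print(flag_outliers(readings, k=3))   # []
print(flag_outliers(readings, k=2))   # [30]
```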
Agenda
Central Limit Theorem
Descriptive Summary Measures
1. Measures of Central Location: Mean, Median, Mode
2. Measures of Variation: The Range, Percentile, Variance and Standard Deviation
3. Measures of Relative Variation: Coefficient of Variation
Confidence Interval
Coefficient of Variation
• Measure of Relative Variation
• Always a %
• Shows Variation Relative to Mean
• Used to Compare 2 or More Groups
• Formula (for Sample): $CV = \left(\frac{s}{\bar{X}}\right) \cdot 100\%$
Computing Coefficient of Variation
• Stock A: Average Price last year = $50, Standard Deviation = $5
• Stock B: Average Price last year = $100, Standard Deviation = $5
Coefficient of Variation: $CV = \left(\frac{s}{\bar{X}}\right) \cdot 100\%$
Stock A: CV = (5/50) × 100% = 10%
Stock B: CV = (5/100) × 100% = 5%
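The stock comparison as a short Python sketch: both stocks have the same standard deviation, but relative to its average price Stock A varies twice as much.

```python
def coefficient_of_variation(stdev, mean):
    """CV as a percentage: 100 * StDev / mean."""
    return 100 * stdev / mean

print(coefficient_of_variation(5, 50))    # 10.0 (Stock A)
print(coefficient_of_variation(5, 100))   # 5.0  (Stock B)
```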
Agenda
Central Limit Theorem
Descriptive Summary Measures
Confidence Interval
Central Limit Theorem
• Regardless of the population distribution, the distribution of the sample means is approximately normal for sufficiently large sample sizes (a common rule of thumb is n ≥ 30), with
  – mean of sample means = population mean: $\mu_{\bar{x}} = \mu$, and
  – standard error = population standard deviation / √n: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
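A small simulation sketch of the theorem. It assumes a hypothetical, strongly right-skewed population (exponential with rate 1, so μ = 1 and σ = 1), draws 5,000 samples of size 30, and checks that the sample means center on μ, have spread close to σ/√n, and fall within ±1.96 standard errors about 95% of the time, as a normal distribution would.

```python
import math
import random
import statistics

def sample_means(draw, n, repetitions, rng):
    """Means of `repetitions` independent samples of size n drawn with draw(rng)."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(repetitions)]

rng = random.Random(2015)       # fixed seed so the run is repeatable
n = 30
# Hypothetical skewed population: exponential with rate 1 (mean 1, SD 1).
means = sample_means(lambda r: r.expovariate(1.0), n=n, repetitions=5000, rng=rng)

se = 1 / math.sqrt(n)           # sigma / sqrt(n) with sigma = 1
share_within = sum(abs(m - 1) < 1.96 * se for m in means) / len(means)

print(round(statistics.fmean(means), 3))   # close to the population mean, 1
print(round(statistics.stdev(means), 3))   # close to 1 / sqrt(30), about 0.183
print(round(share_within, 3))              # close to 0.95
```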
Level of Significance & Level of Confidence
• Level of Significance, α (alpha), equals the maximum allowed probability of error. If the maximum allowed error is 5%, then α = 0.05.
• Level of Confidence is the desired degree of certainty. A 95% Confidence Level is the most common and corresponds to a 95% Confidence Interval of the Mean. This means that if the sampling were repeated many times, about 95% of the intervals calculated this way would contain the actual population mean. A 95% Confidence Level corresponds to a 5% level of significance, or α = 0.05. The Confidence Level therefore equals 1 − α.
Why Does the Central Limit Theorem Work?
As Sample Size Increases:
1. Most Sample Means will be Close to the Population Mean.
2. Some Sample Means will be Relatively Far Above or Below the Population Mean.
3. A Few Sample Means will be Very Far Above or Below the Population Mean.
Agenda
Confidence Interval
Descriptive Summary Measures
Central Limit Theorem
Confidence Intervals
• The population mean μ is within 2 Standard Errors (SE) of the sample mean about 95% of the time.
• Thus, μ is in the range defined by $\bar{x} \pm 2 \cdot SE$, about 95% of the time.
• (2 × SE) is also called the Margin of Error (MOE). 95% is called the confidence level.
• Sample Mean ± Margin of Error (MOE) is called a Confidence Interval.
The Standard Normal Distribution
[Figure: standardized histogram of X Bar, a normal distribution with mean 0 and standard error 1; horizontal axis: X Bar, number of SEs from the mean (−4 to 3); vertical axis: frequency (0 to 500); about 68% of sample means fall within ±1 SE, 95% within ±2 SE, and 99.7% within ±3 SE]
Confidence Interval for Mean
• In general, the confidence interval for μ is given by $\bar{x} \pm z \frac{\sigma}{\sqrt{n}}$.
• $\bar{x}$ is the sample mean.
• z is the confidence factor. It is the number of standard errors one has to go from the mean in order to include a certain percent of observations. For 95% confidence the value is 1.96 (approximately 2.00).
• $\frac{\sigma}{\sqrt{n}}$ is the standard error of the sample means.
In Excel, compute z with a 95% confidence level (i.e., level of significance = 0.05): z score = NORMSINV(1 − 0.05/2) = 1.96
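A Python sketch of the same computation. `statistics.NormalDist().inv_cdf` plays the role of Excel's NORMSINV; the example inputs (x̄ = 50, σ = 10, n = 100) are hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value, like Excel's NORMSINV(1 - alpha/2)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_ci_known_sigma(xbar, sigma, n, confidence=0.95):
    """x-bar ± z * sigma / sqrt(n), assuming the population sigma is known."""
    se = sigma / n ** 0.5            # standard error of the sample mean
    moe = z_critical(confidence) * se
    return xbar - moe, xbar + moe

print(round(z_critical(0.95), 3))           # 1.96
print(mean_ci_known_sigma(50, 10, 100))     # about (48.04, 51.96)
```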
Confidence Interval for Mean
• Since σ is generally not known, we substitute the sample standard deviation, s. This changes the distribution of the standardized sample mean from z (standard normal) to a t-distribution with n − 1 degrees of freedom, a close relative:
$\bar{x} \pm t \frac{s}{\sqrt{n}}$
• The t value is slightly larger than z for a given confidence level, thereby increasing the margin of error. That is the price of using s in place of σ.
Confidence Interval for Mean (Example 1): Gas Price
• A sample of 49 gas stations nationwide shows an average price of unleaded of $3.87 and a standard deviation of $0.15. Estimate the mean price of gas nationwide with 95% confidence.
In Excel, compute t with 5% error and n − 1 = 48 degrees of freedom: =TINV(0.05, 48) = 2.010635, rounded to 2.01.
95% CI for the Mean is: $\bar{x} \pm t \frac{s}{\sqrt{n}}$
= 3.87 ± [2.01 × (0.15/√49)] = $3.87 ± 0.043
Thus, $3.827 < μ < $3.913
Interpret the result!
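A Python sketch of the t-interval for the gas-price example. The Python standard library has no t quantile function, so the critical value is passed in (2.01, from TINV(0.05, 48) above); if SciPy is available, `scipy.stats.t.ppf(0.975, 48)` gives the same two-sided value.

```python
import math

def mean_ci_t(xbar, s, n, t_crit):
    """x-bar ± t * s / sqrt(n); t_crit must match the confidence level and n - 1 df."""
    moe = t_crit * s / math.sqrt(n)
    return moe, xbar - moe, xbar + moe

# Gas-price example: n = 49, x-bar = 3.87, s = 0.15, t = 2.01
moe, low, high = mean_ci_t(3.87, 0.15, 49, 2.01)
print(f"MOE = {moe:.3f}, CI = ({low:.3f}, {high:.3f})")   # MOE = 0.043, CI = (3.827, 3.913)
```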
Confidence Interval for Mean (Example 2): Federal Aid Problem
• Suppose a census tract with 5,000 families is eligible for aid under program HR-247 if the average income of families of 4 is between $7,500 and $8,500 (those with an average below $7,500 are eligible under a different program). A random sample of 12 families yields the data below.
Representative Sample:
7,300 7,700 8,100 8,400 7,800 8,300 8,500 7,600 7,400 7,800 8,300 8,600
$\bar{x} = \$7{,}983$
$s = \sqrt{\frac{(7{,}300 - 7{,}983)^2 + \cdots + (8{,}600 - 7{,}983)^2}{12 - 1}} = \$441$
$\bar{x} \pm MOE = 7{,}983 \pm MOE$
In Excel, compute t with 5% error and n − 1 = 11 degrees of freedom: =TINV(0.05, 11) = 2.201.
95% CI for the Mean is: $\bar{x} \pm t \frac{s}{\sqrt{n}}$
= 7,983 ± MOE
= 7,983 ± [2.201 × (441/√12)] = 7,983 ± 280
Thus, $7,703 < μ < $8,263
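The Federal Aid numbers can be reproduced in Python from the 12 sampled incomes. The t value 2.201 is taken from the TINV(0.05, 11) result above rather than computed, since the standard library has no t quantile function.

```python
import math
import statistics

incomes = [7300, 7700, 8100, 8400, 7800, 8300, 8500, 7600, 7400, 7800, 8300, 8600]
n = len(incomes)
xbar = statistics.mean(incomes)      # about 7,983.33
s = statistics.stdev(incomes)        # sample SD, about 440.73
t_crit = 2.201                       # TINV(0.05, 11)
moe = t_crit * s / math.sqrt(n)      # about 280

print(round(xbar), round(s), round(moe))          # 7983 441 280
print(round(xbar - moe), round(xbar + moe))       # 7703 8263
```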
Interpretation of Confidence Interval
• We are 95% confident that the interval $7,983 ± $280 contains the unknown population (not sample) mean income.
• If we selected 1,000 samples of size 12 and constructed 1,000 confidence intervals, about 950 would contain the unknown population mean and about 50 would not.
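The "about 950 of 1,000" interpretation can be checked by simulation. This sketch assumes a hypothetical normal population with mean $8,000 and standard deviation $450 (values chosen only for illustration), draws 1,000 samples of size 12, builds a 95% t-interval from each, and counts how many contain the true mean.

```python
import random
import statistics

def coverage(mu, sigma, n, t_crit, trials, seed=0):
    """Number of t-intervals (out of `trials`) that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        moe = t_crit * statistics.stdev(sample) / n ** 0.5
        if xbar - moe < mu < xbar + moe:
            hits += 1
    return hits

# Hypothetical population: normal, mean 8,000, SD 450; samples of size 12, t = 2.201.
hits = coverage(mu=8000, sigma=450, n=12, t_crit=2.201, trials=1000, seed=7)
print(hits, "of 1000 intervals contain the population mean")   # close to 950
```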
Confidence Interval for Proportions
• For proportions:
  • p = population proportion
  • $\hat{p}$ = sample proportion
• The Confidence Interval for p is given by
$\hat{p} \pm z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
Confidence Interval for Proportions (Example 1): Presidential Election
• The Wall Street Journal for Sept 10, 2008 reports that a poll of 860 people shows 46% support for Sen. Obama as President. Find the 95% CI for the proportion of the population that supports him.
In Excel, compute z with a 95% confidence level (i.e., level of significance = 0.05): z score = NORMSINV(1 − 0.05/2) = 1.960
95% CI for the Proportion is: $0.46 \pm 1.96 \sqrt{\frac{0.46(1 - 0.46)}{860}} = 0.46 \pm 0.033$
Thus, .427 < p < .493
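A Python sketch of the proportion interval applied to the poll (p̂ = 0.46, n = 860), with `statistics.NormalDist().inv_cdf` standing in for Excel's NORMSINV.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """p-hat ± z * sqrt(p-hat * (1 - p-hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return moe, p_hat - moe, p_hat + moe

moe, low, high = proportion_ci(0.46, 860)
print(f"MOE = {moe:.3f}, CI = ({low:.3f}, {high:.3f})")   # MOE = 0.033, CI = (0.427, 0.493)
```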
Confidence Interval for Proportions (Example 2): Japan Business Survey
Is Japan the Foremost Economic Power Today?
• n = 200 Californians
• Yes = 116
• No = 84
$\hat{p} = \frac{116}{200} = 0.58$
In Excel, compute z with a 95% confidence level (i.e., level of significance = 0.05): z score = NORMSINV(1 − 0.05/2) = 1.960
95% CI for the Proportion is: $0.58 \pm 1.96 \sqrt{\frac{0.58(1 - 0.58)}{200}} = 0.58 \pm 0.068$
Thus, .512 < p < .648
In Excel, compute z with a 90% confidence level (i.e., level of significance = 0.10): z score = NORMSINV(1 − 0.10/2) = 1.645
90% CI for the Proportion is: $0.58 \pm 1.645 \sqrt{\frac{0.58(1 - 0.58)}{200}} = 0.58 \pm 0.057$
Thus, .523 < p < .637
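A Python sketch comparing the two confidence levels for the Japan survey (116 of 200). It shows the trade-off in the example: the lower confidence level uses a smaller z and so gives a narrower interval.

```python
import math
from statistics import NormalDist

def proportion_moe(p_hat, n, confidence):
    """Margin of error z * sqrt(p-hat * (1 - p-hat) / n) for a two-sided interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat = 116 / 200
for conf in (0.95, 0.90):
    moe = proportion_moe(p_hat, 200, conf)
    print(f"{conf:.0%}: {p_hat - moe:.3f} < p < {p_hat + moe:.3f}")
# 95%: 0.512 < p < 0.648
# 90%: 0.523 < p < 0.637
```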
Sample Means versus Sample Proportions
Mean:
• Income/Loss
• Time to Complete Loan Papers
• Number of Fat Calories in Burger
• Breaking Strength of Cellular Phone Housing
Proportion of:
• Americans Who Believe that Japan is #1 Economic Power
• Circuit Boards with One or More Failed Solder Connections
• African-Americans Who Pass CPA
Means and Proportions Are Not the Same!
Similarities and Differences Between Sample Means and Proportions
• Sample Means Are Computed from Data that Are Measured. They Estimate Population Means.
• Sample Proportions Are Computed from Data that Are Counted. They Estimate Population Proportions.