Summary of Statistics II
Uploaded by: jackflash2010
8/12/2019 Summary of Statistics II
1/8
8/26/20
Numerical Summaries of Data
Numerical Summary (primarily for quantitative variables)
Two kinds of summary: Location and Variation
Measures of Location
Give middle or typical values, or central tendency.
Measures of Variation
Describe spread, scatter, or dispersion in the data.
Measures of Location
1. Mean
2. Median
3. Mode
Measures of Location
1. Mean
The center of gravity of the data (histogram).
Population mean = $\mu$; sample mean = $\bar{X}$
Formula for the mean
Sample mean = sum of observations divided by sample size:
$$\bar{X} = \frac{\sum X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
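As a minimal sketch of the formula above, the sample mean can be computed directly as the sum divided by the count. The function name `sample_mean` is an illustrative choice, not something from the slides; the data are the seven values used in Example 1 of this deck.

```python
def sample_mean(xs):
    """Return the sum of the observations divided by the sample size."""
    if not xs:
        raise ValueError("sample_mean requires at least one observation")
    return sum(xs) / len(xs)


# Data from Example 1 of these slides.
data = [14, 18, 20, 12, 24, 15, 14]
mean_value = sample_mean(data)
print(round(mean_value, 2))  # 16.71
```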
Median - midpoint of the distribution
At least half of the observations are less than or equal to the median, and at least half are greater than or equal to the median.
Note: For n observations, the median is located at the $\frac{n+1}{2}$-th observation in the ordered sample.
Example 1
Data: 14, 18, 20, 12, 24, 15, 14 (n = 7, odd)
Step 1: Order the data: 12, 14, 14, 15, 18, 20, 24
Location of median: (7 + 1)/2 = 4th
Median = 15
- Median is the middle value.
- At least half the values are at or greater; at least half are at or lower.
Example 2
Data: 14, 18, 20, 12, 94, 15, 14 (n = 7, odd; the value 24 is replaced by the outlier 94)
Step 1: Order the data: 12, 14, 14, 15, 18, 20, 94
Mean of original data: $\bar{X}$ = 16.71
Mean with outlier: $\bar{X}$ = 26.71
- Median is still the middle value.
- Median is resistant to outliers.
Example 3
Data: 14, 18, 20, 12, 24, 15, 14, 214 (n = 8, even, with outlier)
Step 1: Order the data: 12, 14, 14, 15, 18, 20, 24, 214
Location of median: (8 + 1)/2 = 4.5th
Median = (15 + 18)/2 = 16.5
- Median is the average of the two middle values.
- Exactly half the values are greater, half lower.
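The median rule used in the three examples can be sketched as follows. This is an illustrative implementation, not code from the slides; the Example 2 list assumes the outlier 94 replaces 24, which is consistent with the two means quoted there (16.71 and 26.71).

```python
def median(xs):
    """Median of a non-empty sequence: the middle value for odd n,
    the average of the two middle values for even n."""
    ordered = sorted(xs)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # Position (n + 1) / 2 in 1-based counting is index n // 2 here.
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


example_1 = [14, 18, 20, 12, 24, 15, 14]
example_2 = [14, 18, 20, 12, 94, 15, 14]  # assumed: 24 replaced by 94
example_3 = [14, 18, 20, 12, 24, 15, 14, 214]

print(median(example_1))  # 15
print(median(example_2))  # 15
print(median(example_3))  # 16.5
```

The median stays at 15 when the outlier 94 is introduced, while the mean moves from 16.71 to 26.71.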
Summary for Finding Median
1. Order the data.
2. For odd n, the median is the center observation.
3. For even n, the median is the average of the two center observations.
3. Mode - most frequently occurring value
In a histogram, the modal class is the one having the largest frequency, i.e., the highest bar.
Good for a discrete quantitative variable (few values) or a categorical variable.
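A small sketch of finding the mode by counting occurrences. The helper name `modes` and the list of colors are hypothetical and only for illustration; the numeric list is the Example 1 data. The function returns every value tied for the highest count, since a data set can have more than one mode.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


# Discrete quantitative data (Example 1 of these slides).
print(modes([14, 18, 20, 12, 24, 15, 14]))  # [14]

# Hypothetical categorical data, where an average would be meaningless.
colors = ["red", "blue", "red", "green"]
print(modes(colors))  # ['red']
```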
What type of variable is it?
- If categorical, use the mode. An average is meaningless; look at percentages of occurrences.
- If the variable is quantitative, first look at a graph:
  - Skewed or outliers? Use median.
  - More or less symmetric? Use mean.
Numerical Summary
Location: Mean, Median, Mode, Percentiles, Quartiles
Variation: Range, Std. Deviation, IQR
Mutual Fund Selection
Two mutual funds; annual returns (%) for the last three years from each.
Which fund would you choose?
Fund A    Fund B
  8.0      60.0
 12.0     -20.0
 10.0     -10.0
Fund A: $\bar{X}$ = 10.0%    Fund B: $\bar{X}$ = 10.0%
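To make the comparison concrete, here is a rough sketch that computes the mean and the range (highest minus lowest) for each fund. It assumes the first three returns listed belong to Fund A and the last three to Fund B, which is consistent with both means being 10.0%; the helper `value_range` is an illustrative name.

```python
from statistics import mean


def value_range(values):
    """Highest minus lowest value."""
    return max(values) - min(values)


# Assumed assignment of the six returns to the two funds.
fund_a = [8.0, 12.0, 10.0]
fund_b = [60.0, -20.0, -10.0]

for name, returns in (("Fund A", fund_a), ("Fund B", fund_b)):
    print(f"{name}: mean = {mean(returns):.1f}%, "
          f"range = {value_range(returns):.1f} points")
# Fund A: mean = 10.0%, range = 4.0 points
# Fund B: mean = 10.0%, range = 80.0 points
```

The two funds have the same average return, so a measure of location alone cannot tell them apart.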
Why does variation matter?
Measures of Variation
1. Range
2. Variance & Standard Deviation
3. Interquartile Range (IQR)
1. Range
Highest minus lowest value in the sample.
Example 4: 3, 4, 1, 7, 4, 5
Range = Hi - Lo = 7 - 1 = 6
Example 5: 1, 1, 1, 7, 7, 7
Range = Hi - Lo = 7 - 1 = 6
[Figures: dot plots of Examples 4 and 5 on an axis from 1 to 7]
2. Variance & Standard Deviation
How far are the data from the middle, on average?
Notation:
Sample variance = $s^2$; sample std. dev. = $s$
Population variance = $\sigma^2$; population std. dev. = $\sigma$
Example 4: 3, 4, 1, 7, 4, 5 (miles)
$\bar{X}$ = 4.0
[Figure: dot plot on an axis from 1 to 7, with deviations from the mean of -3, -1, +1, and +3 marked]
Note:
- The average of the deviations from the mean will always be zero.
- We cannot let the negatives cancel out the positives.
Avoid this by using either
1. absolute value or
2. squaring
of the differences.
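A quick numerical check of this note, using the Example 4 data: the raw deviations from the mean sum to zero, while absolute or squared deviations do not. The variable names are illustrative.

```python
data = [3, 4, 1, 7, 4, 5]  # Example 4, in miles
x_bar = sum(data) / len(data)  # 4.0

deviations = [x - x_bar for x in data]

print(sum(deviations))                   # 0.0  (negatives cancel positives)
print(sum(abs(d) for d in deviations))   # 8.0  (absolute values)
print(sum(d ** 2 for d in deviations))   # 20.0 (squares)
```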
Equation for Variance (for a sample):
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{20}{6 - 1} = 4.0 \text{ miles}^2$$
Example 4 data:
  x       x - x̄     (x - x̄)²
  3        -1          1
  4         0          0
  1        -3          9
  7         3          9
  4         0          0
  5         1          1
Total 24    0         20
x̄ = 4.0
Standard deviation:
$$s = \sqrt{s^2} = \sqrt{4.0} = 2.0 \text{ mi.}$$
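The worked table can be reproduced step by step. This is a sketch that follows the slide's sample formula (divide by n - 1) and cross-checks the result against Python's standard `statistics.variance` and `statistics.stdev`, which use the same divisor.

```python
import math
from statistics import stdev, variance

data = [3, 4, 1, 7, 4, 5]  # Example 4, in miles
n = len(data)
x_bar = sum(data) / n

# Column 3 of the table: squared deviations from the mean.
squared_deviations = [(x - x_bar) ** 2 for x in data]

s2 = sum(squared_deviations) / (n - 1)  # sample variance, miles squared
s = math.sqrt(s2)                       # sample standard deviation, miles

print(s2, s)  # 4.0 2.0

# Cross-check against the standard library.
assert math.isclose(s2, variance(data))
assert math.isclose(s, stdev(data))
```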
Variance
Equation for Variance:
For a population:
$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$
For a sample:
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$
Advantages: Good properties; uses all the data.
Disadvantages: Units are squared. Not resistant.
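The only difference between the two equations is the center used and the divisor (N versus n - 1). The sketch below applies both to the Example 4 values; treating those six values as an entire population is purely an assumption for illustration. The function names are hypothetical, and the results are cross-checked against `statistics.pvariance` and `statistics.variance`.

```python
import math
from statistics import pvariance, variance


def population_variance(xs):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    if len(xs) < 2:
        raise ValueError("sample_variance requires at least two observations")
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)


data = [3.0, 4.0, 1.0, 7.0, 4.0, 5.0]

print(round(population_variance(data), 3))  # 3.333
print(sample_variance(data))                # 4.0

# Cross-check against the standard library.
assert math.isclose(population_variance(data), pvariance(data))
assert math.isclose(sample_variance(data), variance(data))
```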
Standard Deviation
The square root of the variance:
$$s = \sqrt{s^2} = \sqrt{4.0} = 2.0 \text{ miles}$$
Advantages: Easier to interpret than variance; units same as data.
Sample 100p-th percentile:
If x = the 100p-th percentile, then at least 100p% of the data is $\le x$, and at least 100(1-p)% of the data is $\ge x$.
Example: You are told you scored 47; then you hear 47 is at the 82nd percentile.
82% of the sample have scores $\le 47$, AND 18% have scores $\ge 47$.
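The percentile definition can be turned into a simple check. This sketch assumes the comparison symbols missing from the extracted slide are "less than or equal to" and "greater than or equal to", matching the deck's earlier definition of the median; the function name `is_percentile` is hypothetical, and the data are from Example 1.

```python
def is_percentile(x, data, p):
    """Check whether x satisfies the 100p-th percentile definition:
    at least 100p% of the data is <= x and
    at least 100(1 - p)% of the data is >= x."""
    n = len(data)
    if n == 0:
        raise ValueError("data must not be empty")
    at_or_below = sum(1 for v in data if v <= x) / n
    at_or_above = sum(1 for v in data if v >= x) / n
    # Note: floating-point rounding could matter on exact boundaries.
    return at_or_below >= p and at_or_above >= 1 - p


data = [12, 14, 14, 15, 18, 20, 24]  # ordered data from Example 1

print(is_percentile(15, data, 0.5))  # True: 15 is the median
print(is_percentile(24, data, 0.5))  # False: only 1 of 7 values is >= 24
```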
Five Number Summary
1. Minimum
2. 1st Quartile, Q1 = 25th percentile
3. Median
4. 3rd Quartile, Q3 = 75th percentile
5. Maximum
Quartiles:
1st Quartile (25th percentile): 25% of the data values lie at or below it.
3rd Quartile (75th percentile): 75% of the data values lie at or below it.
Quartiles
Q1: 25% of the data set is below the first quartile
Q2: 50% of the data set is below the second quartile
Q3: 75% of the data set is below the third quartile
[Figure: ordered data split by Q1, Q2, and Q3 into four parts, each holding 25% of the data]
Method 1: Percentile method
Q1 located at position (n+1)*1/4
Q2 located at position (n+1)*2/4
Q3 located at position (n+1)*3/4
 n     Q1      Q2     Q3
 5     1.5     3      4.5
 8     2.25    4.5    6.75
11     3       6      9
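The position table follows directly from the three formulas. The sketch below only computes the positions; the slides do not say how to read off a value at a fractional position such as 2.25, so that step is left out. The function name is illustrative.

```python
def quartile_positions(n):
    """Positions of Q1, Q2, Q3 in the ordered sample (1-based),
    using (n + 1) * k / 4 for k = 1, 2, 3."""
    return tuple((n + 1) * k / 4 for k in (1, 2, 3))


for n in (5, 8, 11):
    print(n, quartile_positions(n))
# 5 (1.5, 3.0, 4.5)
# 8 (2.25, 4.5, 6.75)
# 11 (3.0, 6.0, 9.0)
```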
Method 2: Median method
Q1 = median of observations below the position of the median.
Q3 = median of observations above the position of the median.
Example 6
Ordered data: 12, 14, 16, 18, 19, 21, 22, 25, 27
Min = 12.0
Q1 = 15.0
Median = 19.0
Q3 = 23.5
Max = 27.0
IQR = 8.5
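Example 6 can be reproduced with a short sketch of the median method. One assumption is made explicit here: for odd n the middle observation itself is excluded from both halves, which is what yields the slide's Q1 = 15.0 and Q3 = 23.5. The function names are illustrative.

```python
def median(xs):
    """Median of a non-empty sequence."""
    ordered = sorted(xs)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles_median_method(xs):
    """Return (Q1, median, Q3), where Q1 and Q3 are the medians of the
    observations below and above the position of the median."""
    ordered = sorted(xs)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n, skip the middle observation itself.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)


data = [12, 14, 16, 18, 19, 21, 22, 25, 27]
q1, med, q3 = quartiles_median_method(data)

print(min(data), q1, med, q3, max(data))  # 12 15.0 19 23.5 27
print(q3 - q1)                            # 8.5
```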
3. Interquartile Range (IQR)
IQR = Q3 - Q1
- IQR is the range of the middle 50% of the data.
- Observations more than 1.5 IQRs beyond the quartiles are considered outliers.
Which summary statistics should I use?
Symmetric: Use mean and std. dev.
Skewed right: Use median and IQR.
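The 1.5 IQR outlier rule from the IQR slide can be sketched as a pair of cutoffs, often called fences (a term not used in the slides). The function names and the test value 40 are hypothetical; the quartiles are those of Example 6.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs lying k IQRs beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, q1, q3):
    """True if x is more than 1.5 IQRs below Q1 or above Q3."""
    low, high = iqr_fences(q1, q3)
    return x < low or x > high


# Example 6: Q1 = 15.0, Q3 = 23.5, so IQR = 8.5.
data = [12, 14, 16, 18, 19, 21, 22, 25, 27]
print(iqr_fences(15.0, 23.5))                        # (2.25, 36.25)
print([x for x in data if is_outlier(x, 15.0, 23.5)])  # []

# Hypothetical extra value, for illustration only.
print(is_outlier(40, 15.0, 23.5))                    # True
```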
Boxplot
A graphical display of the five number summary
(also called a box-and-whiskers plot)
[Figure: boxplot of the Example 6 data on an axis from 12 to 28, labeled with Minimum, Q1, Median, Q3, and Maximum]
Min = 12.0, Q1 = 15.0, Median = 19.0, Q3 = 23.5, Max = 27.0; IQR = 8.5
Note: The middle 50% of the data are within the box.