1 Descriptive Statistics: Numerical Methods Chapter III.
1
Descriptive Statistics: Numerical Methods
Chapter III
2
Key Learning Objectives and Topics in this Chapter
Measures of Location: (Mean, Median, Mode, Percentiles, Quartiles)
Measures of Dispersion/Variability (Range, Variance, Standard Deviation, Coefficient of Variation)
Measures of distribution shape, and association between two variables
3
Important Note
In all cases:
Know the formulas, learn the computation procedures (i.e., apply the formulas), and know the meaning (interpretation) of the measures computed.
Use Excel; practice, practice, and practice!
4
3.1. Introduction
When describing data, we usually focus our attention on two types of measures:
Central location (e.g., the average)
Variability or spread
These measures can be computed for a population (parameters) or for a sample (statistics).
5
3.2 Measures of Central Location
A center is a reference point. Thus a good measure of central location is expected to reflect the locations of all the actual points in the data.
How?
With one data point, the central location is clearly at the point itself.
With two data points, the central location should fall in the middle between them (in order to reflect the location of both of them).
If a third data point appears on the left-hand side of the center, it should “pull” the central location to the left.
6
Measures of Location
Mean, Median, Mode, Percentiles, Quartiles
If the measures are computed for data from a sample, they are called sample statistics.
If the measures are computed for data from a population, they are called population parameters.
A sample statistic is referred to as the point estimator of the corresponding population parameter.
7
i) The Arithmetic Mean (µ)
This is the most popular and useful measure of central location.
Mean = (Sum of the observations) / (Number of observations)
8
i) The Arithmetic Mean
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
In both formulas the numerator is the sum of the values of the observations in the data. $n$ is the number of observations in the sample (sample size), and $N$ is the number of observations in the population (population size).
9
i) The Arithmetic Mean
• Example 1
Time (hours) spent by 10 adults on the Internet: 0, 7, 12, 5, 33, 14, 8, 0, 9, 22 hours.
Based on this data, compute the mean (average) amount of time spent on the Internet.
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{0 + 7 + 12 + 5 + 33 + 14 + 8 + 0 + 9 + 22}{10} = \frac{110}{10} = 11$ hours
Based on this data, the average amount of time spent on the Internet by a typical adult is 11 hours.
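The arithmetic in Example 1 can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative and not part of the slides.

```python
from statistics import mean

# Hours spent on the Internet by the 10 adults in Example 1
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

# Mean = sum of the observations / number of observations
manual_mean = sum(hours) / len(hours)

# The standard library computes the same quantity
library_mean = mean(hours)

print(manual_mean, library_mean)
```

Both values should agree with the 11 hours obtained by hand.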
10
ii) The Median
The median of a set of observations is the value that falls in the middle when the data are arranged in order (ascending or descending).
It is the value that divides the observations into two equal halves.
To find the median, put the data in an array (in increasing or decreasing order).
If the total number of observations in the data set is ODD, the median is the middle value.
If the total number of observations in the data set is EVEN, the median is the AVERAGE of the middle two values.
12
ii) The Median
Example 2a
Find the median for the following observations.
0, 7, 12, 5, 14, 8, 0, 9, 22
Step-1: Arrange the data in increasing/decreasing order:
0, 0, 5, 7, 8, 9, 12, 14, 22
Step-2: Count the total number of observations in the data (9). This is an odd number of observations, so the median is the middle (5th) value.
Median = 8
13
ii) The Median
Example 2b
Find the median for the following observations.
0, 7, 12, 5, 33, 14, 8, 0, 9, 22
Step-1: Arrange the data in increasing/decreasing order:
0, 0, 5, 7, 8, 9, 12, 14, 22, 33
Step-2: Count the total number of observations in the data (10). This is an even number of observations, so the median is the average of the two middle (5th and 6th) values.
Median = (8 + 9)/2 = 8.5
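The two-step procedure used in Examples 2a and 2b can be sketched in Python as follows. The function name `median` and the variable names are illustrative choices, not something defined in the slides.

```python
def median(values):
    """Median by the two-step procedure: sort, then take the middle."""
    ordered = sorted(values)      # Step 1: arrange in increasing order
    n = len(ordered)              # Step 2: count the observations
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: the middle value
        return ordered[mid]
    # Even number of observations: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

example_2a = [0, 7, 12, 5, 14, 8, 0, 9, 22]
example_2b = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

print(median(example_2a))
print(median(example_2b))
```

The results should match the hand calculations: 8 for the nine observations and 8.5 for the ten observations.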
ii) The Median
Note: The median of a data set with an odd number of observations (8 in Example 2a) is a member of the data values.
The median of a data set with an even number of observations (8.5 in Example 2b) is not necessarily a member of the set of values.
Unlike the mean, the median is not affected by extreme values (very large or very small observations) in the data set.
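A small experiment makes the contrast between the mean and the median concrete. The sketch below reuses the Internet-hours data and then, as a purely hypothetical variation that is not in the slides, replaces the largest value 33 with 330.

```python
from statistics import mean, median

hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

# Hypothetical variation: the largest value 33 is replaced by 330
hours_with_outlier = [0, 7, 12, 5, 330, 14, 8, 0, 9, 22]

mean_before, median_before = mean(hours), median(hours)
mean_after, median_after = mean(hours_with_outlier), median(hours_with_outlier)

print(mean_before, median_before)
print(mean_after, median_after)
```

The mean moves from 11 to 40.7, while the median stays at 8.5, because only the ordering of the middle values matters for the median.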
iii) The Center: Mode
The mode is the value that occurs most frequently in the data, i.e., the value with the highest frequency.
In any data set there is only one value for the mean and one for the median. However, a data set may have more than one mode.
16
iii) The Center: Mode
[Figure: histograms of an income distribution, one with one modal class and one with two modal classes]
17
iii) The Center: Mode
Example 3: What is the mode for the following data?
0, 7, 12, 5, 33, 14, 8, 0, 9, 22
Solution: All observations except “0” occur once. There are two “0” values. Thus, the mode is zero.
Is this a good measure of central location?
No. The value “0” does not reside at the center of this set (compare with the mean = 11.0 and the median = 8.5).
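Finding the mode amounts to counting frequencies. The sketch below returns every value that shares the highest frequency, so it also covers data sets with more than one mode. The helper name `modes` and the second, bimodal data set are hypothetical illustrations.

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]   # data from Example 3
bimodal = [1, 1, 2, 3, 3]                    # hypothetical data with two modes

print(modes(hours))
print(modes(bimodal))
```

For the Example 3 data the only mode is 0; the hypothetical second list has two modes, 1 and 3.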
18
Comparing Measures of Central Tendency: Mean, Median, Mode
• If mean = median = mode, the shape of the distribution is symmetric.
19
Comparing Measures of Central Tendency: Mean, Median, Mode
• If mode < median < mean, the distribution trails to the right and is positively skewed (“skewed to the right”).
• If mode > median > mean, the distribution trails to the left and is negatively skewed (“skewed to the left”).
[Figures: a positively skewed distribution with Mode, Median, and Mean marked from left to right; a negatively skewed distribution with Mean, Median, and Mode marked from left to right]
20
Percentiles
A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
It is a measure of relative location, but not necessarily of central location.
A percentile tells us the proportion of observations that lie below or above a certain value in the data.
Example: Admission test scores for colleges and universities are frequently reported in terms of percentiles.
21
Percentiles
Definition:
The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
22
Computing Percentiles
Arrange the data in ascending order.
Compute the position i of the pth percentile: $i = \frac{p}{100} \times n$
If i is not an integer, round up. The pth percentile is the value in the ith position.
If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
23
Compute the 75th percentile of the following data
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
The calculation below uses the first ten values (n = 10):
i = (p/100)n = (75/100) × 10 = 7.5
Rounding 7.5 up to 8, we note that the 8th data value is 435.
The 75th Percentile = 435
For all 70 values, i = (75/100) × 70 = 52.5, which rounds up to 53; the 53rd data value is 525.
24
Compute the 50th percentile of the following data
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
The calculation below uses the first ten values (n = 10):
i = (p/100)n = (50/100) × 10 = 5
Averaging the 5th and 6th data values, we get
50th Percentile = (435 + 435)/2 = 435
For all 70 values, i = (50/100) × 70 = 35; averaging the 35th and 36th data values gives (475 + 475)/2 = 475.
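The percentile procedure (compute i = (p/100)n, round up if i is not an integer, otherwise average positions i and i + 1) can be sketched in Python. The function name `percentile` is an illustrative choice, and the sketch assumes p is a whole number strictly between 0 and 100. Integer arithmetic is used to decide whether i is an integer, which avoids floating-point surprises.

```python
def percentile(values, p):
    """pth percentile by the round-up rule (p a whole number, 0 < p < 100)."""
    ordered = sorted(values)
    # i = (p/100) * n, split into its whole part and a remainder out of 100
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder:
        # i is not an integer: round up to position whole + 1 (index whole)
        return ordered[whole]
    # i is an integer: average the values in positions i and i + 1
    return (ordered[whole - 1] + ordered[whole]) / 2

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

# First ten values only (n = 10), then all 70 values
print(percentile(rents[:10], 75), percentile(rents[:10], 50))
print(percentile(rents, 75), percentile(rents, 50))
```

With only the first ten values, both percentiles come out as 435. With all 70 values, the same rule gives 525 for the 75th percentile and 475 for the 50th.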
25
Quartiles
Quartiles are specific percentiles.
First Quartile = 25th Percentile
Second Quartile = 50th Percentile = the Median
Third Quartile = 75th Percentile
26
Quartiles divide a data set into four equal parts.
$Q_1 = \frac{1(N+1)}{4}; \quad Q_2 = \frac{2(N+1)}{4}; \quad Q_3 = \frac{3(N+1)}{4}$
where $Q_i$ is the location of the $i$th quartile.
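The location formulas are easy to evaluate in code. The sketch below computes the three locations for a data set of N observations; the function name `quartile_locations` is illustrative. The slides do not say how to turn a fractional location into a value, so the second part relies on an assumption: it uses Python's `statistics.quantiles` with `method="exclusive"`, which, to my understanding, places quartiles at the same (N + 1)-based positions and interpolates linearly between neighbouring values.

```python
from statistics import quantiles

def quartile_locations(n_obs):
    """Locations of Q1, Q2, Q3 in the ordered data: k(N + 1)/4 for k = 1, 2, 3."""
    return [k * (n_obs + 1) / 4 for k in (1, 2, 3)]

# Ordered data from Example 2a (N = 9)
ordered = [0, 0, 5, 7, 8, 9, 12, 14, 22]

locations = quartile_locations(len(ordered))
# Assumption: fractional locations are resolved by linear interpolation
values = quantiles(ordered, n=4, method="exclusive")

print(locations)
print(values)
```

For N = 9 the locations are 2.5, 5, and 7.5, so Q2 is the 5th value (the median, 8), while Q1 and Q3 fall halfway between two neighbouring observations.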
27
3.2 Measures of Variability
28
Measures of central location fail to tell the whole story about the distribution.
A question of interest that remains unanswered even after obtaining measures of central location is how spread out are the observations around the central (say, mean) value?
• Variability is important in business decisions.
• For example, in choosing between two suppliers A and B, we might consider not only the average delivery time for each, but also the variability in delivery time for each.
29
Measures of Variability
Range
Inter-Quartile Range
Variance
Standard Deviation
Coefficient of Variation
30
i) The Range
The range of a set of observations is the difference between the largest and smallest observations, i.e., the distance between the smallest and the largest data value in the set.
• Range = largest value – smallest value
Its major advantage is the ease with which it can be computed.
Its major shortcoming is its failure to provide information on the dispersion of the observations between the two end points.
It is also very sensitive to the smallest and largest data values.
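As a quick illustration, the range of the Internet-hours data from the earlier examples can be computed as below. The function name `value_range` is an illustrative choice.

```python
def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

hours_range = value_range(hours)
print(hours_range)
```

Only the two end points (0 and 33) enter the calculation, which is why the range says nothing about how the other eight observations are spread between them.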
31
ii) Inter-Quartile Range
Inter-quartile range = Q3 – Q1
It is a measure of the spread of the middle 50% of the observations.
A large value indicates a large spread of the observations.
It is not sensitive to extreme data values.
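The sketch below computes the inter-quartile range for the Internet-hours data, taking Q1 and Q3 as the 25th and 75th percentiles found with the round-up percentile rule. The helper names are illustrative, and the second data set, in which 33 is replaced by 330, is a hypothetical variation used only to show the effect of an extreme value.

```python
def percentile(values, p):
    """pth percentile by the round-up rule (p a whole number, 0 < p < 100)."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder:
        return ordered[whole]                 # round up to position whole + 1
    return (ordered[whole - 1] + ordered[whole]) / 2

def iqr(values):
    """Inter-quartile range = Q3 - Q1."""
    return percentile(values, 75) - percentile(values, 25)

hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
hours_with_outlier = [0, 7, 12, 5, 330, 14, 8, 0, 9, 22]   # hypothetical

print(iqr(hours), max(hours) - min(hours))
print(iqr(hours_with_outlier), max(hours_with_outlier) - min(hours_with_outlier))
```

In both cases Q1 = 5 and Q3 = 14, so the inter-quartile range stays at 9, while the range jumps from 33 to 330.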
32
iii) The Variance
The variance is a measure of variability that utilizes all the data.
It is the average of the squared differences between each data value and the measure of central location (the mean).
It is calculated differently for a population and for a sample.
33
iii) The Variance
Variance of a population: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Variance of a sample: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
34
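iii) The Variance
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Why square the differences?
The sum of the deviations from the mean is always zero, so the unsquared deviations cannot measure spread.
Why divide by n - 1 instead of n?
It gives a better approximation (estimate) of the population variance.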
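Both answers can be seen numerically. The sketch below uses the four sample values 9, 11, 8, 12 that this chapter works with in its variance example; the variable names are illustrative.

```python
sample = [9, 11, 8, 12]
n = len(sample)
x_bar = sum(sample) / n

# Deviations from the mean always cancel out
deviations = [x - x_bar for x in sample]
sum_of_deviations = sum(deviations)

# Squaring removes the signs, so the sum no longer cancels
sum_of_squares = sum(d ** 2 for d in deviations)

# Dividing by n versus dividing by n - 1
divide_by_n = sum_of_squares / n
divide_by_n_minus_1 = sum_of_squares / (n - 1)

print(sum_of_deviations, sum_of_squares)
print(divide_by_n, divide_by_n_minus_1)
```

The deviations sum to zero, the squared deviations sum to 10, and dividing by n - 1 gives a slightly larger value (about 3.33) than dividing by n (2.5).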
35
Example: Computing the Variance Based on Sample Data
Find the variance of the following sample observations:
9 11 8 12
Variance of a sample: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
36
Computing Variance of a sample
Step-1: Find the mean
$\bar{x} = \frac{9 + 11 + 8 + 12}{4} = \frac{40}{4} = 10$
Step-2: Compute deviations from the mean
8 - 10 = -2
9 - 10 = -1
11 - 10 = +1
12 - 10 = +2
Step-3: Square the deviations, add them together, and divide the sum of the squared deviations by n - 1
$s^2 = \frac{(-2)^2 + (-1)^2 + 1^2 + 2^2}{4 - 1} = \frac{10}{3} = 3.33$
37
iv) Standard Deviation
The standard deviation of a set of observations is the square root of the variance.
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$
38
Why Standard Deviation?
The standard deviation is reported in the actual unit of measure in which the data are recorded.
Thus it can be used to compare the variability of several distributions that are measured in the same units.
It can also be used to make a statement about the general shape of a distribution (Kurtosis).
39
Computing the standard deviation
Step-1: Find the mean
$\bar{x} = \frac{9 + 11 + 8 + 12}{4} = \frac{40}{4} = 10$
Step-2: Compute deviations from the mean
8 - 10 = -2
9 - 10 = -1
11 - 10 = +1
12 - 10 = +2
Step-3: Square the deviations, add them together, and divide the sum of the squared deviations by n - 1
$s^2 = \frac{(-2)^2 + (-1)^2 + 1^2 + 2^2}{4 - 1} = \frac{10}{3} = 3.33$
Step-4: Take the square root of the variance
$s = \sqrt{s^2} = \sqrt{3.33} \approx 1.82$
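The four steps translate directly into code. This is a minimal sketch with illustrative variable names; the last line compares the result with the standard library's `statistics.stdev`, which also divides by n - 1.

```python
import math
from statistics import stdev

sample = [9, 11, 8, 12]
n = len(sample)

x_bar = sum(sample) / n                           # Step 1: the mean
deviations = [x - x_bar for x in sample]          # Step 2: deviations from the mean
s2 = sum(d ** 2 for d in deviations) / (n - 1)    # Step 3: sample variance
s = math.sqrt(s2)                                 # Step 4: square root of the variance

print(x_bar, deviations, s2, s)
print(math.isclose(s, stdev(sample)))
```

Because the code keeps the variance unrounded (10/3 rather than 3.33), the standard deviation comes out as roughly 1.826, marginally higher than the value obtained from the rounded variance.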
40
v) Coefficient of Variation
The coefficient of variation is a measure of how large the standard deviation is relative to the mean.
The coefficient of variation is computed as follows:
$CV = \frac{s}{\bar{x}} \times 100\%$ for a sample
$CV = \frac{\sigma}{\mu} \times 100\%$ for a population
41
Why Coefficient of Variation?
Example: Is a standard deviation of 10 large?
A standard deviation of 10 may be perceived as large when the mean value is 100, but it is only moderately large if the mean value is 500.
The coefficient of variation can be used to compare variability in data sets that are measured in different units.
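The comparison in this example can be put into numbers with a one-line function. The name `coefficient_of_variation` is an illustrative choice, and the sketch assumes a nonzero mean.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Standard deviation as a percentage of the mean (mean must be nonzero)."""
    return 100 * std_dev / mean_value

cv_mean_100 = coefficient_of_variation(10, 100)
cv_mean_500 = coefficient_of_variation(10, 500)

print(cv_mean_100, cv_mean_500)
```

The same standard deviation of 10 is 10% of a mean of 100 but only 2% of a mean of 500. Because the result is a percentage with no units attached, it can be compared across data sets recorded in different units.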
42
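Variance, Standard Deviation, and Coefficient of Variation
(for the sample rent data on 70 efficiency apartments, mean = 490.80)
Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2,996.16$
Standard Deviation: $s = \sqrt{s^2} = \sqrt{2996.16} = 54.74$
Coefficient of Variation: $\frac{s}{\bar{x}} \times 100\% = \frac{54.74}{490.80} \times 100\% = 11.15\%$
The standard deviation is about 11% of the mean.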
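These three figures can be reproduced from the 70 apartment rents used throughout this chapter. The sketch below relies on Python's `statistics` module, whose `variance` and `stdev` functions use the sample (n - 1) formulas; the variable names are illustrative.

```python
from statistics import mean, stdev, variance

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = mean(rents)         # sample mean
s2 = variance(rents)        # sample variance (divides by n - 1)
s = stdev(rents)            # sample standard deviation
cv = s / x_bar * 100        # coefficient of variation, in percent

print(round(x_bar, 2), round(s2, 2), round(s, 2), round(cv, 2))
```

Rounded to two decimals, the results agree with the slide: a mean of 490.80, a variance of 2,996.16, a standard deviation of 54.74, and a coefficient of variation of 11.15%.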
44
Compute every single measure of central location and variability you have learned in this chapter for the following sample rent data on 70 efficiency apartments:
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615