Lecture 2 Describing Data II ©. Summarizing and Describing Data Frequency distribution and the...
Lecture 2
Describing Data II
Summarizing and Describing Data
- Frequency distribution and the shape of the distribution
- Measures of variability
1. Frequency distribution and the shape of the distribution
In the previous lecture, we saw that the mean of household savings gives an inflated picture of the savings of a “normal household”.
This was because the shape of the histogram was not symmetric.
It is important to look at how the observations are distributed.
Japanese household savings
[Figure: Histogram of Japanese Household Savings. x-axis: savings in thousand yen, in bins of 2,000 from "below 2,000" to "above 40,000"; y-axis: percentage (0 to 16). Percentages by bin, from lowest to highest: 14.1, 10.6, 9.5, 8.2, 6.9, 6.2, 5.1, 4.5, 3.5, 3, 3, 2.7, 2, 2, 1.9, 1.7, 1.2, 1.3, 1, 1, 10.7.]
Sample average = 17,280,000
Median = 10,520,000
1-1 Frequency Distribution
The frequency table that we used in the previous lecture is also called the frequency distribution. A frequency distribution describes how the observations are distributed. A plot of the frequency table is called a histogram.
A histogram usually shows the number of observations in a specific range. However, sometimes, it shows the percentage of observations in a specific range.
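As a minimal sketch of the two ways a histogram can be labelled, the following Python builds a frequency table with both counts and percentages. The `frequency_table` helper, the bin width, and the `ages` sample are hypothetical and chosen only for illustration; they are not the lecture's data.

```python
from collections import Counter


def frequency_table(values, bin_width):
    """Return (lower, upper, count, percent) rows for equal-width bins."""
    # Each value v falls in the bin [k * bin_width, (k + 1) * bin_width).
    counts = Counter(int(v // bin_width) for v in values)
    n = len(values)
    rows = []
    for k in range(min(counts), max(counts) + 1):
        count = counts.get(k, 0)  # empty bins are kept with a count of 0
        rows.append((k * bin_width, (k + 1) * bin_width, count, 100.0 * count / n))
    return rows


# Hypothetical ages, not the lecture's client data.
ages = [23, 27, 31, 34, 35, 38, 41, 42, 44, 47, 52, 58]
table = frequency_table(ages, bin_width=10)

for lower, upper, count, percent in table:
    print(f"{lower:>3}-{upper:<3} count={count:2d} percent={percent:5.1f}")
```

The count column and the percent column describe the same shape; the percentages are just the counts divided by the number of observations.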
1-2 Shape of the Distribution
The shape of the distribution refers to the shape of the Histogram.
1-3 Symmetric Distribution
The shape of the distribution is said to be symmetric if the observations are balanced, or evenly distributed, about the mean. The shape of the distribution is symmetric if the shape of the histogram is symmetric.
Symmetric Distribution
[Figure: Symmetric Distribution. x-axis: values 1 to 9; y-axis: frequency (0 to 10).]
Note: For a symmetric distribution, the mean and median are equal.
Symmetric Distribution: An example
The age distribution of the clients (from the previous lecture note) is nearly symmetric.
[Figure: Histogram of clients' ages. x-axis: clients' age range; y-axis: frequency (0 to 12).]
1-4 Skewed Distribution
A distribution is skewed if the observations are not symmetrically distributed above and below the mean. A positively skewed (or skewed to the right) distribution has a tail that extends to the right, in the direction of positive values. A negatively skewed (or skewed to the left) distribution has a tail that extends to the left, in the direction of negative values.
Positively skewed distribution
[Figure: Positively Skewed Distribution. x-axis: values 1 to 9; y-axis: frequency (0 to 12).]
Positively skewed distribution: An example
The household saving histogram (from the previous lecture) is an example of a positively skewed distribution.
[Figure: Histogram of Japanese Household Savings. x-axis: savings in thousand yen, in bins of 2,000 from "below 2,000" to "above 40,000"; y-axis: percentage (0 to 16).]
Sample average = 17,280,000
Median = 10,520,000
Positively skewed distribution: A note
For a positively skewed distribution, the mean is typically greater than the median.
Negatively skewed distribution
[Figure: Negatively Skewed Distribution. x-axis: values 1 to 9; y-axis: frequency (0 to 12).]
Note: For a negatively skewed distribution, the mean is typically less than the median.
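The relationship between shape, mean, and median can be checked on small samples. The sketch below uses three hypothetical data sets (not from the lecture) that were chosen to be symmetric, right-skewed, and left-skewed, and compares the mean with the median using Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical samples, chosen only to illustrate the three shapes.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_skewed = [1, 1, 1, 2, 2, 3, 4, 7, 15]       # long tail to the right
left_skewed = [1, 9, 12, 13, 14, 14, 15, 15, 15]  # long tail to the left

samples = [
    ("symmetric", symmetric),
    ("right-skewed", right_skewed),
    ("left-skewed", left_skewed),
]
for name, data in samples:
    print(f"{name:>12}: mean={mean(data):.2f} median={median(data)}")
```

In these samples the few extreme values in the tail pull the mean toward the tail, while the median stays near the bulk of the observations.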
2. Measures of Variability
- Variance
- Standard deviation
Example
The data set “Sales at two different stores” contains daily sales data for two different stores. The data were collected over 60 days.
Store A’s average daily sales is 231,800 yen. Store B’s average daily sales is 230,500 yen.
Can we say that they are similar stores? Look at the following graphs.
Daily sales of the two stores
[Figure: Store A: Daily Sales. x-axis: day (0 to 70); y-axis: daily sales in 1000 yen (0 to 450). Average = 231,800 yen.]
[Figure: Store B: Daily Sales. x-axis: day (0 to 70); y-axis: daily sales in 1000 yen (0 to 450). Average = 230,500 yen.]
Daily sales of the two stores
The difference between the two stores is that Store A’s sales vary much more than Store B’s sales.
We need a measure of variability in data.
2-1 How to measure the variability (1)
Take Store A’s data as an example. The variability of each observation can be seen from the difference between the observation and the mean.
But, how do we measure the overall variability of the data?
[Figure: Store A: Daily Sales. x-axis: day (0 to 70); y-axis: daily sales in 1000 yen (0 to 450). Average = 231,800 yen. Annotation: for each observation, you can compute the difference from the average.]
How to measure the variability (2)
Overall variability
How about taking the average of all differences?
This is not a good idea: the positive and negative differences cancel, so the differences from the mean always sum to zero.
Therefore, we take the square of each difference. This is the first step in computing the “variance”, a measure of overall variability.
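A short numerical check makes the cancellation visible. The `sales` values below are hypothetical daily sales in thousand yen (the lecture's Store A data are not reproduced here); the sketch compares the sum of the differences from the mean with the sum of the squared differences.

```python
from statistics import mean

# Hypothetical daily sales in thousand yen (not the lecture's Store A data).
sales = [150, 310, 225, 180, 295]
average = mean(sales)

deviations = [x - average for x in sales]
squared_deviations = [d ** 2 for d in deviations]

print(deviations)               # a mix of negative and positive differences
print(sum(deviations))          # the differences cancel each other out
print(sum(squared_deviations))  # squares are non-negative, so they do not cancel
```

Because the plain differences cancel, their average carries no information about spread; the squared differences do.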
[Figure: Store A: Daily Sales. x-axis: day (0 to 70); y-axis: daily sales in 1000 yen (0 to 450). Average = 231,800 yen. Annotation: for each observation, you can compute the difference from the average.]
2-2 Variance: A measure of variability
Variance is computed in the following way.
1. Subtract the mean from each observation (that is, compute the difference between each observation and the mean; note that the difference can be negative).
2. Square each difference.
3. Sum all the squared differences.
4. Divide the sum of squared differences by n-1 (the number of observations minus 1).
We will learn why we divide the sum of squares by n-1 after we learn the concept of expectation.
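The four steps can be sketched directly in Python. The `sample_variance` function and the `sales` values are hypothetical illustrations, not the lecture's data file; the result is compared with `statistics.variance` from the standard library, which also divides by n-1.

```python
from statistics import variance


def sample_variance(data):
    """Sample variance computed with the four steps listed above."""
    n = len(data)
    average = sum(data) / n
    differences = [x - average for x in data]  # step 1: subtract the mean
    squared = [d ** 2 for d in differences]    # step 2: square each difference
    sum_of_squares = sum(squared)              # step 3: sum the squares
    return sum_of_squares / (n - 1)            # step 4: divide by n - 1


# Hypothetical daily sales in thousand yen.
sales = [150, 310, 225, 180, 295]
result = sample_variance(sales)

print(result)
print(variance(sales))  # standard-library value for comparison
```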
Computation of the variance: Exercise
Open the data “Computation of Variance”, and compute the variance of Store A’s daily sales
Compute the variance of Store B’s daily sales
Computation of the variance: Exercise
Store A: Average daily sales = 231.8 thousand yen; Variance = 4979.9
Store B: Average daily sales = 230.5 thousand yen; Variance = 335.9
Notice that the variance for Store A is higher than that for Store B. This is because the variation in daily sales is higher for Store A.
Variance: note
In the previous slide, we did not use any unit of measurement for variance. (For example, we do not say that the variance for Store A is 4979.9 thousand yen.)
This is because, when we compute the variance, we square the differences from the mean. Therefore, the unit of measurement for variance is “square of thousand yen”, which is not a meaningful unit.
Therefore, we use the standard deviation, another measure of variation.
2-3 A measure of variability: Standard deviation
Standard deviation is the square root of the variance.

$$\text{Standard deviation} = \sqrt{\text{Variance}}$$

Exercise: Compute the standard deviation of the daily sales for Store A and Store B.
Standard Deviation: Store sales data example
Standard deviation of Store A’s daily sales = 70.57 thousand yen.
Standard deviation of Store B’s daily sales = 18.33 thousand yen.
Roughly speaking, this means that Store A’s daily sales typically deviate from their average by about 70.6 thousand yen, and Store B’s daily sales by about 18.3 thousand yen.
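These figures can be reproduced from the variances reported earlier in the lecture (4979.9 for Store A and 335.9 for Store B). The sketch below only takes square roots of those reported values; the variable names are illustrative.

```python
import math

# Variances reported in the lecture, in "square of thousand yen".
variance_a = 4979.9
variance_b = 335.9

# The standard deviation is the square root of the variance,
# which brings the unit back to thousand yen.
sd_a = math.sqrt(variance_a)
sd_b = math.sqrt(variance_b)

print(round(sd_a, 2), round(sd_b, 2))
```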
Standard deviation and variance as measures of risk (or uncertainty)
Often standard deviation and variance are used as measures of uncertainty or risk.
If you would like to work as a store manager, then Store B may be the better store to work for: although its average sales are almost the same as Store A’s, the uncertainty is lower (lower standard deviation).
Standard deviation and variance as measures of risk (or uncertainty)
In the store sales data, the average sales for both stores are similar.
However, on many other occasions, higher return (higher average sales) comes with higher risk (higher standard deviation).
One makes a decision by choosing a good combination of return and risk. For example, if you invest in a stock, you would choose a stock with a combination of return and risk that suits your preference.
Therefore, standard deviation and variance are important numerical summaries of data for decision-making purposes.
2-4. Understanding the mathematical notation of the variance
Most of the time, we only have sample data (not population data).
Variance computed from a sample is called sample variance. We denote sample variance by $s^2$.
When we have population data (which does not happen often), we can compute the population variance. We denote the population variance by $\sigma^2$.
Understanding the mathematical notation of sample variance

| Observation id | Variable X |
|---|---|
| 1 | $x_1$ |
| 2 | $x_2$ |
| 3 | $x_3$ |
| ⋮ | ⋮ |
| n | $x_n$ |

The typical data we use comes in this format. Using this format, we would like to represent variance in a mathematical form.
Understanding the mathematical notation of sample variance

| Obs id | Variable X | Each observation - the mean | (Each observation - the mean)² |
|---|---|---|---|
| 1 | $x_1$ | $x_1 - \bar{X}$ | $(x_1 - \bar{X})^2$ |
| 2 | $x_2$ | $x_2 - \bar{X}$ | $(x_2 - \bar{X})^2$ |
| 3 | $x_3$ | $x_3 - \bar{X}$ | $(x_3 - \bar{X})^2$ |
| ⋮ | ⋮ | ⋮ | ⋮ |
| n | $x_n$ | $x_n - \bar{X}$ | $(x_n - \bar{X})^2$ |
| Average | $\bar{X}$ | | |

The first steps of computing the variance are written in the table.
The variance can be computed by summing the last column and dividing the sum by (n-1).
Therefore, mathematically, the sample variance, $s^2$, can be written as shown on the next slide.
Understanding the mathematical notation for sample variance
Mathematically, sample variance, denoted as $s^2$, can be written as

$$s^2 = \frac{(x_1-\bar{X})^2 + (x_2-\bar{X})^2 + (x_3-\bar{X})^2 + \cdots + (x_n-\bar{X})^2}{n-1} = \frac{\sum_{i=1}^{n}(x_i-\bar{X})^2}{n-1}$$
Mathematical notation for population variance
Though not often, we may have population data. Then we can compute the population variance. We use the notation $\sigma^2$ to denote the population variance. We also use upper case $N$ to denote the number of observations. The mathematical notation for the population variance, where $\mu$ denotes the population mean, is

$$\sigma^2 = \frac{(x_1-\mu)^2 + (x_2-\mu)^2 + (x_3-\mu)^2 + \cdots + (x_N-\mu)^2}{N} = \frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N}$$

Unlike the case for sample variance, we do not have to divide the sum of squares by N-1. We simply divide it by N.
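The difference between the two divisors can be seen with Python's standard `statistics` module, where `variance` divides by n-1 and `pvariance` divides by N. The `data` values are hypothetical, and treating the same five numbers first as a sample and then as a whole population is done purely for illustration.

```python
from statistics import pvariance, variance

# Hypothetical observations, used once as a sample and once as a population.
data = [150, 310, 225, 180, 295]
n = len(data)

s2 = variance(data)       # sample variance: sum of squares divided by n - 1
sigma2 = pvariance(data)  # population variance: sum of squares divided by N

print(s2, sigma2)
# Same sum of squares, different divisor, so sigma2 = s2 * (n - 1) / n.
print(s2 * (n - 1) / n)
```

With only five observations the two values differ noticeably; the gap shrinks as the number of observations grows.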
2-5. Mathematical notation for the sample standard deviation
The sample standard deviation, $s$, is written as

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{X})^2}{n-1}}$$
Mathematical Notation for population standard deviation
The population standard deviation, $\sigma$, is written as follows, with $\mu$ the population mean:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N}}$$
2-6. Short-cut formula for sample variance
The short-cut formula for the sample variance is:

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{X}^2}{n-1}$$
Exercise
Compute the variance for the sales of Store A by applying the short-cut formula for sample variance, and show that this indeed coincides with our previous calculation.
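As a sketch of how such a check might look, the code below computes the sample variance both from the definition and from the short-cut formula. The function names and the `sales` values are hypothetical; the lecture's Store A data are not reproduced here, so the numbers differ from the 4979.9 reported earlier.

```python
def variance_definition(data):
    """Sample variance from the definition: squared differences over n - 1."""
    n = len(data)
    average = sum(data) / n
    return sum((x - average) ** 2 for x in data) / (n - 1)


def variance_shortcut(data):
    """Sample variance from the short-cut formula: (sum of x^2 - n * mean^2) / (n - 1)."""
    n = len(data)
    average = sum(data) / n
    return (sum(x ** 2 for x in data) - n * average ** 2) / (n - 1)


# Hypothetical daily sales in thousand yen.
sales = [150, 310, 225, 180, 295]

by_definition = variance_definition(sales)
by_shortcut = variance_shortcut(sales)
print(by_definition, by_shortcut)
```

The two results agree here. With floating-point data whose mean is very large relative to its spread, the short-cut form subtracts two nearly equal numbers and can lose precision, so the definition is usually the safer choice in code.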
Other Measures of Variability
1. The Range
The range in a set of data is the difference between the largest and smallest observations.
Other Measures of Central Tendency
2. Mode
The mode, if one exists, is the most frequently occurring observation in the sample or population.
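Both the range and the mode are easy to compute. The sketch below uses hypothetical data and `statistics.multimode` from the Python standard library (available from Python 3.8), which returns every value that is tied for most frequent.

```python
from statistics import multimode

# Hypothetical observations, for illustration only.
data = [3, 7, 7, 2, 9, 7, 4, 2]

data_range = max(data) - min(data)  # largest minus smallest observation
modes = multimode(data)             # list of the most frequent value(s)

print(data_range)
print(modes)

# When no value repeats, every value is tied for most frequent,
# which corresponds to the case where no meaningful mode exists.
print(multimode([1, 2, 3]))
```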
This lecture note covers:
Textbook P23~P28: Frequency distribution
Textbook 3.1, 3.2: Measures of central tendency and variability