Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · ·...
Chapter 5 Exploring Data: Distributions
5.1. Displaying Distributions: Histograms
Learning Objective
• How to read statistical data?
• How to visualize statistical data?
How to draw a histogram (a bar graph of grouped data)
A small part of a data set collected from the students in a large statistics class through anonymous responses to a class questionnaire.
Notation
• First column: student number, 2 to 8
• Column A: sex (female or male)
• Column B: handedness (right-handed or left-handed)
• Column C: height in inches
• Column D: time spent studying, in minutes per weeknight
• Column E: number of coins each student is carrying
How can we read the data in the table?
• What is the height of student number 2?
• Who spends the least amount of time studying per weeknight?
• How many students are right-handed?
• How many students study at least 2 hours per weeknight?
• What does each row represent?
• What does each column represent?
Terminology
• A variable is any characteristic of an individual.
• Individuals are the objects described by a set of data.
Variables, Individuals
• What are variables in the table?
• How many individuals are we considering?
The distribution of a variable gives information (as a table, graph, or formula) about how often the variable takes certain values or intervals of values.
Examples of Frequency Distributions
• Distribution of Sex
• Distribution of Coins
Value F M
Frequency
Relative frequency distribution
The relative frequency distribution of a variable states all observed values of the variable and what fraction (or percentage) of the time each value occurs.
$$\text{Relative frequency} = \frac{\text{Frequency}}{\text{Total number of individuals}}$$
Relative frequency distribution
• Conversion of decimal to percent and percent to decimal
• Relative frequency distribution of Sex
• Relative frequency distribution of Coins
Grouped frequency distribution
When there are many individuals, it is usually better to analyze the data with a grouped frequency distribution.
Remark: A histogram (a bar graph of the class counts) is the visualization of a grouped frequency distribution.
Individuals, Variables
• What are individuals?
• How many variables do we have?
Procedure to make a grouped frequency distribution
1. Find the minimum and maximum data values.
2. Group neighboring data values into consecutive non-overlapping intervals. You need to choose an interval length that shows the data effectively.
3. Record the frequency for each interval.
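The three steps above can be sketched in Python. This is only an illustration under stated assumptions: the function name `grouped_frequency` and its `width` parameter are made up for this sketch, the data are assumed to be whole numbers, and the grades are borrowed from the Miss Smith test example in this section.

```python
def grouped_frequency(values, width):
    """Count how many whole-number values fall in each interval of length `width`."""
    lowest, highest = min(values), max(values)      # step 1: minimum and maximum
    start = (lowest // width) * width               # round the minimum down to a multiple of width
    counts = {}
    left = start
    while left <= highest:                          # step 2: consecutive non-overlapping intervals
        counts[(left, left + width)] = 0            # interval covers left, ..., left + width - 1
        left += width
    for value in values:                            # step 3: record the frequencies
        left = start + ((value - start) // width) * width
        counts[(left, left + width)] += 1
    return counts


grades = [84, 91, 75, 68, 98, 78, 77, 86, 94, 64, 59, 54, 89, 76]
table = grouped_frequency(grades, width=10)
for (left, right), count in table.items():
    print(f"{left} - {right - 1}: {count}")
```

Calling the same function with `width=5` gives a finer table with ten classes, which is one way to compare how the choice of interval length changes the picture.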
Method to record the grouped frequency distribution: table or histogram (bar graph)
Class Count
0.0 to 4.9 27
5.0 to 9.9 13
10.0 to 14.9 2
15.0 to 19.9 4
20.0 to 24.9 0
25.0 to 29.9 1
30.0 to 34.9 2
35.0 to 39.9 0
40.0 to 44.9 1
• In the table above, the data values range from 0.7 to 42.1. In this case it is convenient to group the data values into classes that cover 0 to 45 with an interval length of 5.
• The data values are recorded to one decimal place. This affects how the classes are written in the table (0.0 to 4.9, 5.0 to 9.9, and so on), not how the histogram is drawn.
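As a small sketch, the relative frequency formula from earlier in this section can be applied to the class counts in the table above. The variable names are chosen for this illustration only, and the row of asterisks is just a rough text stand-in for the bars of a histogram.

```python
classes = [
    "0.0 to 4.9", "5.0 to 9.9", "10.0 to 14.9", "15.0 to 19.9", "20.0 to 24.9",
    "25.0 to 29.9", "30.0 to 34.9", "35.0 to 39.9", "40.0 to 44.9",
]
counts = [27, 13, 2, 4, 0, 1, 2, 0, 1]

total = sum(counts)                                   # total number of individuals
relative = [count / total for count in counts]        # relative frequency = frequency / total

for label, count, fraction in zip(classes, counts, relative):
    # class, count, relative frequency as a decimal and as a percent, then a text "bar"
    print(f"{label:>12}  {count:>2}  {fraction:.2f}  {fraction:>4.0%}  {'*' * count}")
```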
Histogram: What is the interpretation of bars a, b, and c?
Miss Smith's math class has just taken a test. In order to come up with meaningful grades, Miss Smith will make a histogram to represent the distribution of grades.
Data
Student Grade
Bullwinkle 84
Rocky 91
Bugs 75
Daffy 68
Wylie 98
Mickey 78
Minnie 77
Lucy 86
Linus 94
Asterix 64
Obelix 59
Donald 54
Sam 89
Taz 76
1. What is the highest score?
2. What is the lowest score?
3. What does the horizontal axis in the histogram represent?
4. What does the vertical axis in the histogram represent?
5. Fill in the blanks in the table with five bins and draw the histogram.
Class Count
50 - 59
60 - 69
70 - 79
80 - 89
90 - 99
6. Construct a table with 10 bins and draw the histogram.
7. Which histogram shows the students’ overall academic performance better? Why?
5.2. Interpreting Histograms
Global shape of histograms: Judging the global shape of a histogram can be somewhat subjective, although sometimes the pattern is clear.
Skewed to the left (negatively skewed): the longer tail of the histogram is on the left side.
Skewed to the right (positively skewed): the longer tail of the histogram is on the right side.
Symmetric: the right and left sides of the histogram are approximately mirror images of each other
Non-specific
Outlier: an individual value (observation) that falls outside the overall pattern. An outlier may reveal something special about the data, but it may also indicate an error in recording.
Important aspects to discuss when interpreting a histogram:
outliers, shape (peaks, skewness, symmetry), center (Section 5.4), spread (Sections 5.5 and 5.7)
Example (Percent of Adult Population of Hispanic Origin by State in 2000 Census revisited)
Class Count
0.0 to 4.9 27
5.0 to 9.9 13
10.0 to 14.9 2
15.0 to 19.9 4
20.0 to 24.9 0
25.0 to 29.9 1
30.0 to 34.9 2
35.0 to 39.9 0
40.0 to 44.9 1
How many peaks are in the graph?
What is the pattern of the graph? Skewed to the right, skewed to the left, or symmetric?
Which observations are outliers?
5.4. Describing Center: Mean and Median
Terminology
• Mean: the average of the data values
$$\bar{x} = \frac{\text{sum of all data values}}{\text{number of data values}} = \frac{x_1 + \cdots + x_n}{n}$$
• Median: the middle number in the ordered list
$$\text{Median} = \tfrac{1}{2}(n + 1)\text{th value}$$
When n is even, this position falls halfway between two values, and the median is the average of those two values.
• Mode: the value that is repeated more often than any other. If no value is repeated, then there is no mode for the list.
To compute the mean, median, and mode, we need the individual data values, because the calculations use the actual values. Thus, a histogram alone does not give enough information to find the mean and median exactly. However, the mean, median, and mode do give some information about the shape of the histogram.
Example
Find the mean, median, and mode for the following list of values:
13, 18, 13, 14, 13, 16, 14, 21, 13
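One way to check a hand calculation for this list is Python's standard `statistics` module. This is a sketch for verification, not part of the original slides; the variable names are chosen for illustration.

```python
from statistics import mean, median, mode

values = [13, 18, 13, 14, 13, 16, 14, 21, 13]

ordered = sorted(values)            # the median is read from the ordered list
n = len(ordered)                    # n = 9, so the median is the (9 + 1) / 2 = 5th value

list_mean = mean(values)            # sum of all data values divided by n
list_median = median(values)        # middle value of the ordered list
list_mode = mode(values)            # most frequently repeated value

print(ordered)
print("mean:", list_mean, "median:", list_median, "mode:", list_mode)
```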
Remark
The mean and the median do not have to be values from the original list.
Remark
About half of the values in the data set lie at or below the median, and about half lie at or above it.
Remark
The median is the figure most commonly quoted for property prices. Using the median avoids the problem that the mean property price is pulled up by a few expensive properties that are not representative of the general property market.
Example
Find the mean, median, and mode for the following list of values:
8, 9, 10, 10, 10, 11, 11, 11, 12, 13
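This list has an even number of values, so the median is the average of the two middle values, and two values tie for most frequent. The sketch below uses `statistics.multimode` (available in Python 3.8 and later) to report every most-frequent value; how a tie should be reported is a convention, and the slides do not say which one they intend.

```python
from statistics import mean, median, multimode

values = [8, 9, 10, 10, 10, 11, 11, 11, 12, 13]

n = len(values)                                  # n = 10, an even number of values
middle_pair = (values[n // 2 - 1], values[n // 2])   # 5th and 6th values of the ordered list

list_mean = mean(values)
list_median = median(values)                     # average of the middle pair
list_modes = multimode(values)                   # all values tied for most frequent

print("mean:", list_mean)
print("middle pair:", middle_pair, "median:", list_median)
print("modes:", list_modes)
```

Here neither the mean nor the median is a value from the original list.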
Group Discussion
The marks of nine students in a physics test that had a maximum possible mark of 50 are given below:
50 35 37 32 38 39 36 34 35
Find the mean, median and mode of this set of data values. Round to the nearest tenth.
Group Discussion
We have a distribution which is skewed to the left.
1. Draw any histogram which is skewed to the left. You don’t have to mark any values.
2. List the median, mean, and mode in order from least to greatest. Explain why you think so.
5.5. Describing Spread: The Quartiles
Remark:
• The mean can be strongly affected by outliers. For example, several very expensive houses in Barbourville can raise the average value of houses in Barbourville.
• The median is affected much less by outliers.
• Comparing the mean and the median therefore tells us something about the distribution. To describe the spread more clearly, we consider the range and the quartiles.
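The effect described in this remark can be illustrated with the nine physics marks from the earlier group discussion, where the top mark of 50 stands apart from the rest. Treating 50 as an outlier is an assumption made for this sketch, and the variable names are illustrative.

```python
from statistics import mean, median

marks = [50, 35, 37, 32, 38, 39, 36, 34, 35]
without_top = [m for m in marks if m != 50]      # drop the one unusually high mark

mean_shift = mean(marks) - mean(without_top)     # how far the single value moves the mean
median_shift = median(marks) - median(without_top)   # how far it moves the median

print(f"mean:   {mean(marks):.2f} with 50, {mean(without_top):.2f} without")
print(f"median: {median(marks)} with 50, {median(without_top)} without")
print(f"shift in mean: {mean_shift:.2f}, shift in median: {median_shift}")
```

In this data set the single high mark moves the mean by more than three times as much as it moves the median.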
Terminology:
• Range = largest value − smallest value
Quartiles
• First quartile (Q1) = lower quartile = 25th percentile: the value that cuts off the lowest 25% of the data
• Second quartile (Q2) = median M = 50th percentile: the value that cuts the data set in half
• Third quartile (Q3) = upper quartile = 75th percentile: the value that cuts off the highest 25% (equivalently, the lowest 75%) of the data
Remark:
The method of calculation differs slightly depending on whether the number of data values is even or odd.
Example 1 (even number of data values)
After sorting, the city mileages of the 12 gasoline-powered midsized cars are:
15, 16, 18, 19, 20, 20, 21, 21, 21, 22, 24, 27
1. Find the range.
2. Find the first quartile, median, third quartile.
Example 2 (odd number of data values)
After sorting, the city mileages of 13 midsized cars (the 12 cars in Example 1 plus one with a mileage of 48) are:
15, 16, 18, 19, 20, 20, 21, 21, 21, 22, 24, 27, 48
1. Find the range.
2. Find the first quartile, median, third quartile.
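Both examples can be checked with a short sketch. It assumes the common textbook convention that Q1 is the median of the values below the median's position and Q3 is the median of the values above it, with the middle value left out of both halves when the count is odd. Other conventions (including the default of Python's `statistics.quantiles`) can give slightly different quartiles. The function name `quartiles` is made up for this illustration.

```python
from statistics import median


def quartiles(sorted_values):
    """Return (Q1, M, Q3) using the median-of-each-half convention."""
    n = len(sorted_values)
    half = n // 2
    lower = sorted_values[:half]                 # values to the left of the median position
    upper = sorted_values[half + n % 2:]         # values to the right (skips the middle value if n is odd)
    return median(lower), median(sorted_values), median(upper)


example1 = [15, 16, 18, 19, 20, 20, 21, 21, 21, 22, 24, 27]        # even count
example2 = example1 + [48]                                          # odd count

for data in (example1, example2):
    data_range = data[-1] - data[0]              # range = largest value - smallest value
    q1, m, q3 = quartiles(data)
    print(f"n = {len(data)}: range = {data_range}, Q1 = {q1}, M = {m}, Q3 = {q3}")
```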
5.6. The Five-Number Summary and Boxplots
The five-number summary
Minimum Q1 M Q3 Maximum
Boxplot (or box-and-whisker diagram)
• A visualization of the five-number summary
• Boxplots can be drawn either horizontally or vertically.
• It helps us picture the rough shape of the histogram by showing spread, skewness, and outliers.
Figure: boxplot, vertical visualization and horizontal visualization
Group Activity
Below are the exam scores of 30 students. Make a boxplot of these data.
24 31 38 49 51 55 56 59 62
63 65 66 69 72 72 74 76 81
84 84 86 86 86 88 88 88 91
91 92 99
Answer
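The five numbers that a boxplot of the 30 exam scores is built from can be sketched as follows. The helper name `five_number_summary` is hypothetical, and the quartiles are taken as the medians of the lower and upper halves of the sorted data, which is an assumption about the convention intended here.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using medians of the two halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                  # lower half of the sorted data
    upper = data[half + n % 2:]          # upper half (middle value excluded if n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]


scores = [
    24, 31, 38, 49, 51, 55, 56, 59, 62,
    63, 65, 66, 69, 72, 72, 74, 76, 81,
    84, 84, 86, 86, 86, 88, 88, 88, 91,
    91, 92, 99,
]

minimum, q1, m, q3, maximum = five_number_summary(scores)
print("Minimum:", minimum, "Q1:", q1, "M:", m, "Q3:", q3, "Maximum:", maximum)
```

The box of the boxplot spans Q1 to Q3 with a line at M, and the whiskers extend to the minimum and maximum.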
Group Activity
Below are the ages of 30 people who died in a city hospital in one month. Make a boxplot of these data.
7 22 25 31 37 38 41 48 49
50 55 58 62 62 64 65 66 66
72 75 76 76 76 85 86 88 88
88 92 94
Answer
5.7. Describing Spread: The Standard Deviation
Word meaning of deviation
Deviate: to turn aside or move away from what is considered a correct or normal course, standard of behavior, way of thinking, etc (deviated, deviating)
Survey on the quality of a restaurant
Rating scale: Terrible (1), Poor (2), Average (3), Very Good (4), Excellent (5)
Number of responses per category (empty rating cells omitted):
Food: 1, 99
Service: 5, 15, 30, 35, 15
Value: 25, 75
Atmosphere: 10, 90
Figure: bar graph of the responses, vertical axis from 0 to 120
Terminology
(Sample) standard deviation s of n observations x1, …, xn: it measures the standard, or typical, amount by which the observations deviate from their mean.
Formula:
$$s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}}$$
Remark
• The standard deviation of a whole population (or of a distribution such as a normal curve) is denoted by σ.
• The standard deviation is zero only when there is no spread, that is, when all observations are equal.
• The standard deviation is even more sensitive to outliers than the mean, because the deviations are squared. If there are outliers or the distribution is strongly skewed, the standard deviation does not describe the spread well.
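The sensitivity to outliers can be seen with the city mileage data from Section 5.5, computed with and without the value 48. This is an illustrative sketch using `statistics.stdev`, which divides by n − 1 as in the sample formula; the variable names are chosen for this example.

```python
from statistics import mean, stdev

mileages = [15, 16, 18, 19, 20, 20, 21, 21, 21, 22, 24, 27]
with_outlier = mileages + [48]                   # the same list plus one very large value

sd_without = stdev(mileages)                     # sample standard deviation (n - 1 divisor)
sd_with = stdev(with_outlier)

print(f"mean: {mean(mileages):.2f} without 48, {mean(with_outlier):.2f} with 48")
print(f"s:    {sd_without:.2f} without 48, {sd_with:.2f} with 48")
```

One extra observation raises the mean by about a tenth, but it more than doubles the standard deviation.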
In the following figures, the standard deviation of a is bigger than the standard deviation of b. However, the standard deviation of c may be bigger than that of a because of outliers.
Calculation of Standard deviation
• Calculate the mean.
• Write the list of deviations: deviation = observation value − mean.
• Write the list of squared deviations.
• Add all the values in the list of squared deviations.
• Divide by n − 1, where n is the number of observations.
• Take the square root of the result.
$$s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}}$$
Example
On six consecutive Sundays, a tow-truck operator received 9, 7, 11, 10, 13, 7 service calls. Calculate s.
• Calculate the mean.
• Use
$$s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}}$$
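The calculation for the tow-truck data can be followed step by step in a short sketch. The variable names are chosen for illustration, and the final line compares the result with `statistics.stdev`, which uses the same n − 1 divisor.

```python
from math import sqrt
from statistics import stdev

calls = [9, 7, 11, 10, 13, 7]
n = len(calls)

x_bar = sum(calls) / n                           # calculate the mean
deviations = [x - x_bar for x in calls]          # deviation = observation value - mean
squared = [d ** 2 for d in deviations]           # squared deviations
total = sum(squared)                             # add the squared deviations
variance = total / (n - 1)                       # divide by n - 1
s = sqrt(variance)                               # take the square root

print("mean:", x_bar)
print("deviations:", deviations)
print("sum of squared deviations:", total)
print(f"s = {s:.3f} (statistics.stdev gives {stdev(calls):.3f})")
```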
Group Activity
Find the mean and then the standard deviation for the following data series:
12, 6, 7, 3, 15, 10, 18, 5
5.8. Normal distribution
Normal distribution (bell-shaped distribution): a distribution whose shape is described by a normal curve
Normal curve
• A normal curve is a smoothed-out histogram. It is symmetric, and its mean, median, and mode are all equal.
• The area under the curve is exactly 1.
Normal curve
The area under the curve between two vertical lines equals the proportion (often given as a percentage) of all values of the variable that lie in that interval.
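As a sketch of this idea, Python's `statistics.NormalDist` can compute the area under a normal curve between two vertical lines. The example uses the standard normal curve (mean 0, standard deviation 1) purely for illustration, and the helper name `area_between` is made up here.

```python
from statistics import NormalDist

curve = NormalDist(mu=0, sigma=1)                # standard normal curve, used as an example


def area_between(a, b):
    """Area under the curve between the vertical lines x = a and x = b."""
    return curve.cdf(b) - curve.cdf(a)


within_one_sd = area_between(-1, 1)              # proportion within one standard deviation of the mean
left_half = area_between(-10, 0)                 # the curve is symmetric about its mean
almost_all = area_between(-10, 10)               # practically the whole area under the curve

print(f"within one standard deviation: {within_one_sd:.1%}")
print(f"left of the mean: {left_half:.1%}")
print(f"between -10 and 10: {almost_all:.4f}")
```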
Examples of normal distributions
heights of men or women, blood pressure, marks on a standardized test such as the SAT
Remark: If you say “That man is tall” or “That man has normal height”, you are speaking in a rough statistical sense. Relate these statements to the figure above.
Learning Objective of the remaining section
• Understand the geometric meaning of the standard deviation in a normal curve.
• Use the standard deviation to obtain the first quartile (25%) and the third quartile (75%) of a normal curve.
![Page 68: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/68.jpg)
Concave Up and Down
Concave up
Concave down
![Page 69: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/69.jpg)
Change of Curvature
The point where the curve changes its concavity, either from concave down to concave up or from concave up to concave down.
![Page 70: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/70.jpg)
![Page 71: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/71.jpg)
Standard deviation and Change of Curvature in normal curve
(Geometric Meaning)
Standard deviation = distance from the center to the change-of-curvature points on either side
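The geometric statement above can be checked with a short calculus sketch. The symbols $\mu$ (mean), $\sigma$ (standard deviation) and $f$ (the normal curve) are notation assumed here for illustration; the slides state the result without a formula.

```latex
% Normal curve with mean \mu and standard deviation \sigma
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

% First and second derivatives
f'(x)  = -\frac{x-\mu}{\sigma^2}\, f(x)
f''(x) = \frac{(x-\mu)^2 - \sigma^2}{\sigma^4}\, f(x)

% Since f(x) > 0, the concavity changes exactly where (x-\mu)^2 = \sigma^2:
f''(x) = 0 \iff x = \mu - \sigma \ \text{ or } \ x = \mu + \sigma
```

Between $\mu - \sigma$ and $\mu + \sigma$ the second derivative is negative (concave down), and outside that interval it is positive (concave up), so the change-of-curvature points sit one standard deviation from the center.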
![Page 72: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/72.jpg)
Standard Deviation and Quartiles in Normal Distribution
(Quantitative meaning of standard deviation)
• First quartile = mean − (0.67 × standard deviation)
• Third quartile = mean + (0.67 × standard deviation)
![Page 73: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/73.jpg)
Example
The distribution of heights of American women aged 18-24 is approximately normal with mean 64.5 inches and standard deviation 2.5 inches.
a. What is the first quartile? What is the interpretation of this?
b. What is the third quartile?
c. Between what two values do the middle 50% of heights lie?
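A minimal sketch of how the quartile rule applies to this example. The function name `normal_quartiles` is made up for illustration, and 0.67 is the rounded multiplier used in these slides.

```python
def normal_quartiles(mean, sd, z=0.67):
    """Approximate (Q1, Q3) of a normal distribution as mean -/+ z * sd."""
    return mean - z * sd, mean + z * sd

# Heights of American women aged 18-24: mean 64.5 in, standard deviation 2.5 in.
q1, q3 = normal_quartiles(64.5, 2.5)

print(f"a. First quartile: {q1:.3f} inches (about 25% of the women are shorter)")
print(f"b. Third quartile: {q3:.3f} inches (about 75% of the women are shorter)")
print(f"c. The middle 50% of heights lie between {q1:.3f} and {q3:.3f} inches")
```

With these inputs the quartiles come out to 62.825 and 66.175 inches.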
![Page 74: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/74.jpg)
Example
The scores of students on a standardized test form a normal distribution with a mean score of 500 and a standard deviation of 100. Between what two values do the middle 50% of scores lie?
![Page 75: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/75.jpg)
Example
The distribution of the scores on a standardized exam is approximately normal with mean 250 and standard deviation 20. Between what two values do the middle 50% of scores lie?
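The two standardized-test examples follow the same pattern: the middle 50% runs from the first quartile to the third quartile. A small sketch, where the helper name `middle_half` is hypothetical and 0.67 is the rounded multiplier from the slides:

```python
def middle_half(mean, sd, z=0.67):
    """Return the interval (Q1, Q3) that holds the middle 50% of a normal distribution."""
    return mean - z * sd, mean + z * sd

examples = [
    ("mean 500, standard deviation 100", 500, 100),
    ("mean 250, standard deviation 20", 250, 20),
]

for label, mean, sd in examples:
    low, high = middle_half(mean, sd)
    print(f"{label}: middle 50% of scores lie between {low:.1f} and {high:.1f}")
```

This gives 433 to 567 for the first test and 236.6 to 263.4 for the second.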
![Page 76: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/76.jpg)
Summary of Normal Curve
• Area under the normal curve = 1
• Middle 50% = from the first quartile to the third quartile
• First quartile = mean − (0.67 × standard deviation)
• Third quartile = mean + (0.67 × standard deviation)
![Page 77: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/77.jpg)
5.9. The 68-95-99.7 Rule
![Page 78: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/78.jpg)
The 68-95-99.7 rule lets you interpret normally distributed data more effectively. The following figure illustrates the rule when the mean is 0 and the standard deviation is 1.
![Page 79: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/79.jpg)
Normal Distributions 68-95-99.7 Rule
• 68% of the observations fall within 1 standard deviation of the mean.
• 95% of the observations fall within 2 standard deviations of the mean.
• 99.7% of the observations fall within 3 standard deviations of the mean.
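The three percentages are rounded values. As a rough check, the share of a normal distribution within k standard deviations of the mean equals erf(k/√2), which Python's standard library can evaluate. The function name `share_within` is chosen here for illustration.

```python
import math

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * share_within(k):.2f}%")
```

The computed shares are about 68.27%, 95.45% and 99.73%, which the rule rounds to 68, 95 and 99.7.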
![Page 80: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/80.jpg)
Example 1 ( Heights of American Women)
The distribution of heights of American women aged 18-24 is approximately normal with mean 64.5 inches and standard deviation 2.5 inches. Use the 68-95-99.7 rule to interpret the data.
Calculate the intervals of 1, 2 and 3 standard deviations:
• The interval of 1 standard deviation:
• The interval of 2 standard deviations:
• The interval of 3 standard deviations:
![Page 81: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/81.jpg)
Example 1 ( Heights of American Women)
• The interval of 1 standard deviation: [64.5 − 2.5, 64.5 + 2.5] = [62, 67]
• The interval of 2 standard deviations: [64.5 − 2 × 2.5, 64.5 + 2 × 2.5] = [59.5, 69.5]
• The interval of 3 standard deviations: [64.5 − 3 × 2.5, 64.5 + 3 × 2.5] = [57, 72]
Apply the 68-95-99.7 rule.
• About 68% of young women are between 62 and 67 inches tall.
• About 95% of young women are between 59.5 and 69.5 inches tall.
• About 99.7% of young women are between 57 and 72 inches tall.
• About 2.5% of young women are taller than 69.5 inches, because the 5% outside the 2-standard-deviation interval is split evenly between the two tails.
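The interval arithmetic for this example can be sketched in a few lines. The names `sd_interval` and `RULE` are hypothetical helpers, and the percentages are the rounded values of the 68-95-99.7 rule.

```python
def sd_interval(mean, sd, k):
    """Interval covering k standard deviations on either side of the mean."""
    return mean - k * sd, mean + k * sd

# Rounded percentages from the 68-95-99.7 rule.
RULE = {1: 68.0, 2: 95.0, 3: 99.7}

mean, sd = 64.5, 2.5  # heights of American women aged 18-24, in inches
for k, pct in RULE.items():
    low, high = sd_interval(mean, sd, k)
    print(f"About {pct}% of young women are between {low} and {high} inches tall.")

# The share outside the 2-standard-deviation interval splits evenly into two tails.
upper_tail = (100 - RULE[2]) / 2
print(f"About {upper_tail}% of young women are taller than {sd_interval(mean, sd, 2)[1]} inches.")
```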
![Page 82: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/82.jpg)
Example
The scores of students on a standardized test form a normal distribution with a mean of 400 and a standard deviation of 30. One thousand students took the test. Find the number of students who score above 460.
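One way to work this example with the 68-95-99.7 rule is sketched below. The variable names are illustrative, and the approach assumes the cutoff falls exactly 1, 2 or 3 standard deviations above the mean, as it does here.

```python
mean, sd = 400, 30
n_students = 1000
cutoff = 460

# How many standard deviations above the mean is the cutoff?
k = (cutoff - mean) / sd  # (460 - 400) / 30 = 2.0

# Rounded percentages from the 68-95-99.7 rule.
pct_within = {1: 68.0, 2: 95.0, 3: 99.7}[int(k)]

# The remaining percentage is split evenly between the lower and upper tails.
pct_above = (100 - pct_within) / 2
n_above = n_students * pct_above / 100

print(f"{cutoff} is {k:.0f} standard deviations above the mean")
print(f"About {pct_above}% of scores are above {cutoff}, i.e. about {n_above:.0f} students")
```

So about 2.5% of the 1000 students, roughly 25, score above 460.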
![Page 83: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/83.jpg)
Example
The distribution of the scores on a standardized exam is approximately normal with mean 100 and standard deviation 15. What percentage of scores lie between 115 and 130?
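A short sketch of the reasoning for this example, using only the rounded percentages of the 68-95-99.7 rule and the symmetry of the normal curve. Variable names are illustrative.

```python
mean, sd = 100, 15

# 115 is 1 standard deviation above the mean, 130 is 2 standard deviations above.
k_low = (115 - mean) / sd   # 1.0
k_high = (130 - mean) / sd  # 2.0

within_1_sd = 68.0  # percent between 85 and 115
within_2_sd = 95.0  # percent between 70 and 130

# The band between 1 and 2 standard deviations appears on both sides of the mean,
# so the upper band alone holds half of the difference.
pct_between = (within_2_sd - within_1_sd) / 2

print(f"About {pct_between}% of scores lie between 115 and 130")
```

This gives about 13.5% of scores between 115 and 130.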
![Page 84: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/84.jpg)
Chapter 7 Data for Decisions
![Page 85: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/85.jpg)
Sampling Terminology and Examples
• Population: The population is the entire group from which you are getting information.
  Example: The population for a study of childhood cancer in the USA is all childhood cancer patients in the USA.
• Sample: A sample is used when data are collected from only part of the population. This sample must be representative of the population. Valid conclusions are obtained when the sample results represent those of the population.
  Example: A sample might be the childhood cancer patients in the largest children’s hospital in each state.
• Census: When a census is conducted, data are collected from the entire population.
![Page 86: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/86.jpg)
Potential Problem
Problems may arise if a person does not consider bias, use of language, ethics, cost and time, timing, privacy, and cultural sensitivity.
![Page 87: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/87.jpg)
Potential Problem, What it means, Example
• Bias
  What it means: The question influences responses in favor of, or against, the topic of the data collection.
  Example: Suppose a person asks: ‘Don’t you think calories of McDonald’s foods are too high?’ This person has a bias against the calories of McDonald’s foods. The bias influences how the survey questions are written.
• Use of Language
  What it means: The use of language in a question could lead people to give a particular answer.
  Example: ‘Don’t you think calories of McDonald’s foods are too high?’ may lead people to answer yes. A better question would be ‘Do you think calories of McDonald’s foods are too low, low, medium, high, or too high?’
• Timing
  What it means: When the data are collected could lead to particular results.
  Example: A survey is conducted to find opinions on the need for winter tires. The answers may vary depending on whether Barbourville has had a lot of snow or not.
![Page 88: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/88.jpg)
Potential Problem, What it means, Example
• Privacy
  What it means: If the topic of the data collection is personal, a person may not want to participate or may give an untrue answer on purpose. Anonymous surveys may help.
  Example: Suppose you are a grade 9 teacher and plan to conduct a survey about smoking in the classes you teach. Students who smoke may be afraid of punishment and may try to avoid participating in the survey.
• Cultural Sensitivity
  What it means: Cultural sensitivity means you are aware of other cultures. You must avoid being offensive and asking questions that do not apply to that culture.
  Example: You go to a Muslim community and survey people about their favorite method of cooking pork. For example: circle your favorite method of cooking pork: Fry, Barbeque, Bake.
![Page 89: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/89.jpg)
Potential Problem, What it means, Example
• Ethics
  What it means: Ethics dictate that the collected data must not be used for purposes other than those told to the participants. Otherwise, your actions are considered unethical.
  Example: Suppose you tell your classmates that you want to know their favorite snacks to help you plan your birthday party. If you use that information to sell favorite snacks to your classmates, then it is unethical.
• Cost
  What it means: The cost of collecting data must be taken into account.
  Example: Printing questionnaires; paying people to collect data.
• Time
  What it means: The time needed for collecting data must be considered.
  Example: A survey that takes an hour to complete may be too long. This will limit the number of people who are willing to participate.
![Page 90: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/90.jpg)
Inferences
![Page 91: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/91.jpg)
Learning Objective
Statistical Inference: How do we generalize from data collected in samples to the whole population?
• Parameter vs. Statistic
• Confidence interval: related to 68-95-99.7 rule
• Central limit theorem: related to the sizes of each sample
• Law of large numbers: related to the number of samplings (experiments)
![Page 92: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/92.jpg)
A survey question:
‘I like buying new clothes, but shopping is often frustrating and time-consuming.’
Circle your opinion:
Agree   Disagree
Sampling method: nationwide random selection of 2500 adults
![Page 93: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/93.jpg)
Terminology, Meaning, Example
• Statistical inference
  Meaning: Methods for drawing conclusions about the entire population on the basis of data from a sample.
  Example: Drawing conclusions about an entire population of 230 million American adults.
![Page 94: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/94.jpg)
Terminology, Meaning, Example
• Parameter (notation: $p$)
  Meaning: A fixed (usually unknown) number that describes a population, such as a proportion, mean or standard deviation.
  Example: Suppose that 60% of the entire American population agreed. Then 0.6 (or 60%) is the parameter.
• Statistic (notation: $\hat{p}$)
  Meaning: A number that describes a sample. It is known for the sample we take, varies from sample to sample, and is useful for estimating an unknown parameter.
  Example: Suppose that 1650 adults from the random sample of 2500 adults answered that they agree. Then the statistic is the proportion $\hat{p} = 1650/2500 = 0.66$ (or 66%).
![Page 95: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/95.jpg)
To draw a conclusion for the entire American adult population, we take the following steps:
Steps, Terminology, Property and Example
• Simulation: Drawing many samples at random from a population that we specify.
  Assumption in SRS: All possible samples of n objects are equally likely to occur.
  Example: SRS (Simple Random Sampling): Use a computer program to draw 1000 separate samples of size 100 from a population that we suppose has a parameter value p = 0.6.
![Page 96: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/96.jpg)
To draw a conclusion for the entire American adult population, we take the following steps:
Steps, Terminology, Property and Example
• Step: Take a large number of random samples from the same population.
  Example: Draw 1000 separate samples of size 100 from a population that we suppose has a parameter value $p = 0.6$.
• Step: Calculate the sample proportion $\hat{p}$ for each sample.
  Example: $\hat{p} = \frac{\text{count of successes in the sample}}{\text{size of sample}} = \frac{\text{number of Agree}}{100}$
![Page 97: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/97.jpg)
To draw a conclusion for the entire American adult population, we take the following steps:
Steps, Terminology,
Property
Example
Make a histogram of the values of $\hat{p}$.
![Page 98: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/98.jpg)
To draw a conclusion for the entire American adult population, we take the following steps:
Steps, Terminology, Property and Example
• Step: Examine the distribution displayed in the histogram for shape, center, and spread, as well as outliers or other deviations.
  Example: When we analyze the curve, we get the following information. Center and spread are based on the actual calculation with the specific sampling (1000 separate samples of size 100).
  Shape: The sampling distribution of $\hat{p}$ is approximately normal because each sample has size 100.
  Center: 0.598
  Spread: Standard deviation = 0.051
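The simulation steps can be sketched with Python's standard library. This is a minimal illustration, not the program used for the slides: the seed and the names `P_TRUE`, `N`, `NUM_SAMPLES` and `sample_proportion` are assumptions, and each answer is drawn independently, which approximates an SRS from a very large population. The exact center and spread depend on the random draws, so they will not match 0.598 and 0.051 digit for digit.

```python
import random
import statistics

random.seed(0)  # arbitrary seed so that the run is repeatable

P_TRUE = 0.6        # parameter value assumed for the simulated population
N = 100             # size of each sample
NUM_SAMPLES = 1000  # number of separate samples

def sample_proportion(p, n):
    """Draw one sample of size n and return the proportion of 'Agree' answers."""
    agrees = sum(random.random() < p for _ in range(n))
    return agrees / n

# Steps 1 and 2: take many samples and compute the sample proportion of each.
p_hats = [sample_proportion(P_TRUE, N) for _ in range(NUM_SAMPLES)]

# Step 3: a rough text histogram with bins 4 percentage points wide.
counts = {}
for p_hat in p_hats:
    bin_start = round(p_hat * 100) // 4 * 4
    counts[bin_start] = counts.get(bin_start, 0) + 1
for bin_start in sorted(counts):
    bar = "#" * (counts[bin_start] // 5)
    print(f"{bin_start / 100:.2f}-{(bin_start + 3) / 100:.2f} {bar}")

# Step 4: center and spread of the simulated sampling distribution.
center = statistics.mean(p_hats)
spread = statistics.stdev(p_hats)
print(f"Center: {center:.3f}")
print(f"Spread (standard deviation): {spread:.3f}")
```

The printed bars should look roughly bell-shaped and centered near 0.6.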
![Page 99: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/99.jpg)
Theorem: (Sampling Distribution of a Sample Proportion)
Assumption: Choose an SRS of size $n$ from a large population that contains population proportion $p$ of successes.
• Shape: For large sample sizes ($n \ge 30$), the sampling distribution of $\hat{p}$ is approximately normal (Central limit theorem).
• Center: The mean of the sampling distribution of $\hat{p}$ = the parameter $p$.
• Spread: Standard deviation of the sampling distribution of $\hat{p}$ = $\sqrt{p(1-p)/n}$.
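The spread formula in the theorem is easy to evaluate directly. A small sketch, with a hypothetical function name and sample sizes chosen only for illustration, showing how the standard deviation of the sample proportion shrinks as the sample size grows:

```python
import math

def sd_of_sample_proportion(p, n):
    """Standard deviation of the sampling distribution of the sample proportion."""
    return math.sqrt(p * (1 - p) / n)

p = 0.6  # assumed population proportion
for n in (30, 100, 2500):
    print(f"n = {n:>4}: standard deviation = {sd_of_sample_proportion(p, n):.4f}")
```

Larger samples give sample proportions that cluster more tightly around the parameter.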
![Page 100: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/100.jpg)
Theorem: Sampling Distribution of a Sample Proportion
Example (Continued)
• Assumption
  Theorem: Choose an SRS of size $n$ from a large population that contains population proportion $p$ of successes.
  Example: $n = 100$. We took a simple random sample from a population with proportion $p = 0.6$ of successes.
![Page 101: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/101.jpg)
Theorem: Sampling Distribution of a Sample Proportion
Example (Continued)
• Shape
  Theorem: For large sample sizes ($n \ge 30$), the sampling distribution of $\hat{p}$ is approximately normal (Central limit theorem).
  Example: The sampling distribution of $\hat{p}$ is approximately normal because each sample has size 100.
• Center
  Theorem: The mean of the sampling distribution of $\hat{p}$ = the parameter $p$.
  Example: Mean of the sampling distribution of $\hat{p}$ = 0.598 ≈ 0.6 = $p$ (the parameter $p$).
• Spread
  Theorem: Standard deviation of the sampling distribution of $\hat{p}$ = $\sqrt{p(1-p)/n}$.
  Example: Standard deviation of the sampling distribution of $\hat{p}$ = $\sqrt{0.6(1-0.6)/100} = \sqrt{0.0024} \approx 0.04899$
![Page 102: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/102.jpg)
Terminology and Example
• Margin of error = $2\sqrt{\hat{p}(1-\hat{p})/n}$
  Meaning: We are 95% confident that the true $p$ is within the range $\hat{p} \pm 2\sqrt{\hat{p}(1-\hat{p})/n}$.
  Why do we use a margin of error? In real life, we do not know what the true parameter $p$ is. Thus, we want to draw a conclusion about the true parameter $p$ from the chosen simple random sample.
  Example: With $\hat{p} = 0.598$ and $n = 100$ from the sampling we have been considering, the margin of error is $2\sqrt{0.598(1-0.598)/100} \approx 0.098$. We are 95% confident that the true $p$ is within the range $0.598 \pm 0.098$, that is, from 0.500 to 0.696.
  Remark: We can see that the true parameter $p$ (= 0.6) is in the range from 0.500 to 0.696.
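As a further illustration, the margin-of-error formula can be applied to the survey sample described earlier, in which 1650 of 2500 adults agreed. This is a sketch; the function names are hypothetical.

```python
import math

def margin_of_error(p_hat, n):
    """Margin of error for 95% confidence: 2 * sqrt(p_hat * (1 - p_hat) / n)."""
    return 2 * math.sqrt(p_hat * (1 - p_hat) / n)

def confidence_interval_95(p_hat, n):
    """Range p_hat -/+ margin of error."""
    m = margin_of_error(p_hat, n)
    return p_hat - m, p_hat + m

n = 2500
p_hat = 1650 / n  # sample proportion 0.66
m = margin_of_error(p_hat, n)
low, high = confidence_interval_95(p_hat, n)

print(f"Sample proportion: {p_hat:.2f}")
print(f"Margin of error: {m:.3f}")
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
```

For this sample the margin of error is about 0.019, so the interval runs from about 0.641 to 0.679.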
![Page 103: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/103.jpg)
Law of large numbers( informal )
As the number of times a situation is repeated becomes larger and larger, the proportion of successes (i.e., trials in which the expected outcome or event actually happens) tends to come closer and closer to the actual probability of success.
![Page 104: Chapter 5 Exploring Data: Distributions - Seongchun Kwon's …skwon.org/Statistics.pdf · · 2012-10-15Group neighboring data values into consecutive non-overlapping ... 40 60 80](https://reader031.fdocuments.in/reader031/viewer/2022022504/5ab59db77f8b9a6e1c8d0879/html5/thumbnails/104.jpg)
Law of large numbers(formal)
Observe any random phenomenon having numerical outcomes with finite mean 𝜇. As the random phenomenon is repeated a large number of times,
• The proportion of trials on which each outcome occurs gets closer and closer to the probability of that outcome.
• The mean x̄ of the observed values gets closer and closer to 𝜇.
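The law of large numbers can be watched in a small simulation. This is a sketch under assumed settings: the success probability 0.6, the seed and the checkpoints are chosen only for illustration.

```python
import random

random.seed(1)  # arbitrary seed so that the run is repeatable

P_SUCCESS = 0.6  # assumed probability of success on each trial
CHECKPOINTS = {10, 100, 1_000, 10_000, 100_000}

successes = 0
running = {}  # number of trials -> proportion of successes so far
for trial in range(1, 100_001):
    successes += random.random() < P_SUCCESS  # True counts as 1
    if trial in CHECKPOINTS:
        running[trial] = successes / trial
        print(f"after {trial:>6} trials: proportion of successes = {running[trial]:.4f}")
```

The early proportions can be far from 0.6, while the later ones settle close to it.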