Section 6A Characterizing a Data Distribution
Pages 380-388
Definition: The distribution of a variable (or data set) describes the values taken on by the variable and the frequency (or relative frequency) of these values.
Example: Lengths of words in the Gettysburg Address
word length Frequency
1 1
2 5
3 49
4 53
5 59
6 35
7 24
8 19
9 5
10 10
11 7
total 267
[Figure: Histogram of word length. Horizontal axis: word length; vertical axis: frequency.]
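As a rough illustration of the definition, the frequency table for the Gettysburg Address can be turned into relative frequencies and a simple text histogram. This is only a sketch: the helper names relative_frequencies and text_histogram are made up for this example and are not part of the original slides.

```python
# Frequency table from the Gettysburg Address example: word length -> frequency.
word_length_freq = {1: 1, 2: 5, 3: 49, 4: 53, 5: 59, 6: 35,
                    7: 24, 8: 19, 9: 5, 10: 10, 11: 7}


def relative_frequencies(freq):
    """Return {value: frequency / total} for a frequency table."""
    total = sum(freq.values())
    return {value: count / total for value, count in freq.items()}


def text_histogram(freq, scale=1):
    """Draw one row of '#' marks per value (one mark per `scale` observations)."""
    rows = []
    for value in sorted(freq):
        bar = "#" * (freq[value] // scale)
        rows.append(f"{value:>2} | {bar} {freq[value]}")
    return "\n".join(rows)


total_words = sum(word_length_freq.values())
rel = relative_frequencies(word_length_freq)

print("total words:", total_words)
for length in sorted(rel):
    print(f"length {length:>2}: relative frequency {rel[length]:.3f}")
print(text_histogram(word_length_freq))
```

The relative frequencies always add up to 1, which is a quick check that no row of the table was dropped.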
How do we characterize a data distribution?
Center = Average
- Mean
- Median
- Mode
- Effect of an Outlier
- Confusion
Shape of a Distribution
- Number of Peaks
- Symmetry or Skewness
- Variation (more in Section 6B)
Averages
The word “average” actually has several meanings.
Generally, average = center of a distribution, or a typical (representative) value.
6-A
Measures of Center in a Distribution
The mean is what we most commonly call the average value. It is defined as follows:
mean = (sum of all values) / (total number of values)
The median is the middle value in the sorted data set (or halfway between the two middle values if the number of values is even).
The mode is the most common value (or group of values) in a distribution.
Mean Distance
Planet   Distance from sun (millions of miles)
Mercury  36
Venus    67
Earth    93
Mars     142
Jupiter  484
Saturn   887
Uranus   1,765
Neptune  2,791
*Pluto   3,654
mean = (sum of all values) / (total number of values)
Sum of distances = 36 + 67 + 93 + 142 + 484 + 887 + 1,765 + 2,791 + 3,654 = 9,919
Mean distance = 9,919 / 9 = 1,102.1 million miles
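The mean calculation can be checked with a few lines of Python. This is a minimal sketch that uses the nine distances from the table; the dictionary name distances and the helper mean are chosen for this example only.

```python
# Distances from the sun in millions of miles, as listed in the table.
distances = {
    "Mercury": 36, "Venus": 67, "Earth": 93, "Mars": 142, "Jupiter": 484,
    "Saturn": 887, "Uranus": 1765, "Neptune": 2791, "Pluto": 3654,
}


def mean(values):
    """mean = (sum of all values) / (total number of values)."""
    values = list(values)
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


total = sum(distances.values())
mean_distance = mean(distances.values())
print(f"sum = {total:,}  n = {len(distances)}  mean = {mean_distance:,.1f} million miles")
```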
Median Distance
The median is the middle value in the sorted data set (or halfway between the two middle values if the number of values is even).
Planet   Distance from sun (millions of miles)
Mercury  36
Venus    67
Earth    93
Mars     142
Jupiter  484    (median: 4 below, 4 above)
Saturn   887
Uranus   1,765
Neptune  2,791
*Pluto   3,654
Steps for Finding the Median
1. Sort the data (put it in order)!!!
2. Count the data (n pieces). Decide if n is odd or even.
3. If n is odd, the median will be in position (n+1)/2.
4. If n is even, the median will be located halfway between the numbers in positions n/2 and n/2 + 1.
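The steps for finding the median can be sketched directly in Python. Positions in the slides count from 1, while Python lists count from 0, so each position is shifted by one when indexing. For even n the two middle values sit at positions n/2 and n/2 + 1. The function name median is just a label for this sketch.

```python
def median(values):
    """Find the median by sorting, counting, and picking the middle."""
    data = sorted(values)              # Step 1: sort the data
    n = len(data)                      # Step 2: count the data
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:                     # Step 3: n is odd
        position = (n + 1) // 2        # 1-based position of the middle value
        return data[position - 1]
    lower = data[n // 2 - 1]           # Step 4: value in position n/2
    upper = data[n // 2]               #         value in position n/2 + 1
    return (lower + upper) / 2


nine_planets = [36, 67, 93, 142, 484, 887, 1765, 2791, 3654]
eight_planets = nine_planets[:-1]      # the same list without Pluto

print(median(nine_planets))    # 484
print(median(eight_planets))   # 313.0
```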
Median Distance – ‘Real’ Planet List
Planet   Distance from sun (millions of miles)
Mercury  36
Venus    67
Earth    93
Mars     142
Jupiter  484
Saturn   887
Uranus   1,765
Neptune  2,791
Median is halfway between 142 and 484, so
median = (142+484)/2 = 313 million miles
Comment about the Median
1. The median splits the data into two equal-sized pieces.
2. Half the data (50%) will be below the median.
3. Half the data (50%) will be above the median.
Mode Distance
Planet   Distance from sun (millions of miles)
Mercury  36
Venus    67
Earth    93
Mars     142
Jupiter  484
Saturn   887
Uranus   1,765
Neptune  2,791
*Pluto   3,654
The mode is the most common value (or group of values) in a distribution.
Every distance in this list occurs exactly once, so this data set has no mode.
Mode Examples
a. 5 5 5 3 1 5 1 4 3 5        Mode is 5
b. 1 2 2 2 3 4 5 6 6 6 7 9    Bimodal (modes are 2 and 6)
c. 1 2 2 6 6 8 9 10 8         Trimodal (modes are 2, 6, and 8)
The Mode
- You may not have one!
- Could have multiple modes!
- The mode is easy to spot in a graph: it occurs at the peak.
- The mode is the only measure of “center” available for categorical data, e.g. gender.
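A short sketch of how the mode could be found by counting. The function modes is a name made up for this example. It assumes the convention used in these slides that a data set in which no value repeats has no mode, and it returns every value tied for the highest count so that bimodal and trimodal data are reported in full.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most common value(s); [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # Assumed convention: no repeated value means no mode.
        return []
    return sorted(value for value, count in counts.items() if count == top)


print(modes([5, 5, 5, 3, 1, 5, 1, 4, 3, 5]))        # [5]
print(modes([1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9]))  # [2, 6]
print(modes([1, 2, 2, 6, 6, 8, 9, 10, 8]))          # [2, 6, 8]
print(modes(["F", "M", "F", "F", "M"]))             # ['F']
```

The last call shows the point about categorical data: counting works even when a mean or median would not make sense.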
Outliers
An outlier is an observation that is much higher (or much lower) than all the other values in your list, i.e. an extremely unusual observation.
Note: not every set of data has outliers. The minimum and maximum values are not necessarily outliers!
The Effect of an Outlier
Definition: An outlier is a data value that is much higher or much lower than almost all other values.
Five graduating seniors on a college basketball team receive the following first-year contract offers to play in the National Basketball Association: $0, $0, $0, $0, $3,500,000
mean = (0 + 0 + 0 + 0 + 3,500,000) / 5 = $700,000
median: $0, $0, $0, $0, $3,500,000, so median = $0
mode: $0, $0, $0, $0, $3,500,000, so mode = $0
- Including an outlier can pull the mean significantly upward or downward.
- Including an outlier does not significantly affect the median.
- Including an outlier does not affect the mode.
The Effect of an Outlier
A track coach wants to determine an appropriate heart rate for her athletes during their workouts. In the middle of the workout, she reads the following heart rates (beats/min) from five athletes: 130, 135, 140, 145, 325.
Clearly 325 is an outlier. Clearly 325 is a mistake (faulty heart monitor?)
With the outlier:
mean = (130 + 135 + 140 + 145 + 325) / 5 = 175 bpm
median: 130, 135, 140, 145, 325, so median = 140 bpm
mode: none
Throw out the outlier?
mean = (130 + 135 + 140 + 145) / 4 = 137.5 bpm
median: 130, 135, 140, 145, so median = 137.5 bpm
mode: none
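The effect of an outlier on the mean and the median can be compared side by side with Python's standard statistics module. The helper center_shift is a hypothetical name for this sketch; it simply recomputes both measures after removing one suspect value.

```python
import statistics


def center_shift(values, suspect):
    """Compare the mean and median with and without one suspect value."""
    trimmed = list(values)
    trimmed.remove(suspect)            # drops the first occurrence only
    return {
        "mean_with": statistics.mean(values),
        "mean_without": statistics.mean(trimmed),
        "median_with": statistics.median(values),
        "median_without": statistics.median(trimmed),
    }


heart_rates = center_shift([130, 135, 140, 145, 325], suspect=325)
offers = center_shift([0, 0, 0, 0, 3_500_000], suspect=3_500_000)

print(heart_rates)
print(offers)
```

For the heart rates the mean drops from 175 to 137.5 bpm while the median only moves from 140 to 137.5 bpm. For the contract offers the mean falls from $700,000 to $0 and the median stays at $0.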
Mean vs. Median
A news article reports that of the 411 players on the NBA roster in February, 1988, only 139 “made more than the league average salary of $2.36 million.”
Recall that the word “average” can have several interpretations. In this case, is $2.36 million the mean or the median salary for 1988 NBA players? Explain.
Confusion about “Average”
A newspaper surveys wages for assembly workers and reports an average of $22 per hour. The workers at one large firm immediately request a pay raise, claiming that they work as hard as workers at other companies but their average wage is only $19. The management rejects their request, telling them that they are overpaid because their average wage, in fact, is $23 per hour. Can they both be right?
Possibility 1: the workers quote the median and the management quotes the mean.
salaries: $19, $19, $19, $19, $39
median = $19
mean = (19 + 19 + 19 + 19 + 39) / 5 = $115 / 5 = $23
Possibility 2: the workers quote the mean and the management quotes the median.
salaries: $6, $20, $23, $23, $23
median = $23
mean = (6 + 20 + 23 + 23 + 23) / 5 = $95 / 5 = $19
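Both wage lists from the slides can be checked quickly with the standard statistics module. This is a sketch; the variable names are labels chosen for this example.

```python
import statistics

# Hourly wages from the two example salary lists.
wages_first_list = [19, 19, 19, 19, 39]
wages_second_list = [6, 20, 23, 23, 23]

for label, wages in [("first list", wages_first_list),
                     ("second list", wages_second_list)]:
    print(f"{label}: mean = ${statistics.mean(wages)}, "
          f"median = ${statistics.median(wages)}")
```

In the first list the mean is $23 and the median is $19; in the second the two are swapped. Either way, both sides can quote a correct "average" of the same data.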
Confusion about “Average”
All 100 first-year students at a small college take three courses in the Core Studies Program. The first two courses are taught in large lectures, with all 100 students in a single class. The third course is taught in ten classes of 10 students each. The students claim that the mean size of their Core Studies classes is 70. The administrators claim that the mean class size is only 25 students. Explain.
Students say my average class size is:
(100 + 100 + 10) / 3 = 70    (mean class size per student)
Administrators say the average Core Studies class size is:
(total students enrolled in all Core Studies classes) / (number of Core Studies classes) = 300 / 12 = 25    (mean number of students per class)
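The two class-size averages can be reproduced from one list of class sizes. In this sketch the administrators' figure averages over classes, while the students' figure averages over enrollments, so each class is weighted by the number of students sitting in it. The variable names are assumptions made for this example.

```python
# Two lectures of 100 students and ten sections of 10 students: 12 classes.
class_sizes = [100, 100] + [10] * 10

# Administrators: mean number of students per class.
total_enrollments = sum(class_sizes)                 # 300 seats filled
mean_per_class = total_enrollments / len(class_sizes)

# Students: mean class size experienced per enrollment.
# A class of size s is experienced by s students, so it counts s times.
mean_per_student = sum(s * s for s in class_sizes) / total_enrollments

print(mean_per_class)     # 25.0
print(mean_per_student)   # 70.0
```

Every student sits in the same mix of classes (100, 100, and 10), so the weighted result matches the per-student calculation (100 + 100 + 10) / 3 = 70.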
Describing a distribution
Shape of a Distribution: Symmetry and Skewness
SYMMETRIC
A distribution is symmetric if its left half is a mirror image of its right half.
Mode = Mean = Median
SKEWED LEFT (negatively)
A distribution is left-skewed if its ‘tail’ is on the left.
[Figure labels, typically in this order from left to right: Mean, Median, Mode]
SKEWED RIGHT (positively)
A distribution is right-skewed if its ‘tail’ is on the right.
[Figure labels, typically in this order from left to right: Mode, Median, Mean]
Symmetric and Skewed Distributions
- Symmetric: use the mean to describe the center.
- Skewed left or skewed right: use the median to describe the center.
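One rough way to apply this guidance is to compare the mean with the median: a mean well above the median hints at a right skew, and a mean well below it hints at a left skew. The sketch below is only a heuristic, not a formal test of skewness. The function name suggest_center and the 5% tolerance are assumptions made for this example.

```python
import statistics


def suggest_center(values, tolerance=0.05):
    """Guess the shape from mean vs. median and suggest a measure of center.

    Heuristic only: the mean and median count as "close" when they differ
    by at most `tolerance` times the distance from minimum to maximum.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    width = max(values) - min(values)
    if width == 0 or abs(mean - median) <= tolerance * width:
        return ("roughly symmetric", "mean")
    if mean > median:
        return ("right-skewed", "median")
    return ("left-skewed", "median")


print(suggest_center([1, 2, 3, 4, 5]))            # ('roughly symmetric', 'mean')
print(suggest_center([0, 0, 0, 0, 3_500_000]))    # ('right-skewed', 'median')
print(suggest_center([6, 20, 23, 23, 23]))        # ('left-skewed', 'median')
```

A histogram is still the more reliable check; the comparison above can be misled by small samples or by distributions with several peaks.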
Shape of a Distribution: Symmetry and Skewness
Do you expect the distribution of heights of 100 women to be symmetric, left-skewed, or right-skewed? Explain.
Do you expect the distribution of speeds of cars on a road where a visible patrol car is using radar to be symmetric, left-skewed, or right-skewed? Explain.
Variation = horizontal spread
[Figure: three distributions labeled High variation, Moderate variation, and Low variation]
How would you expect the variation to differ between times in the Olympic marathon and times in the New York Marathon? Explain.
Describing a distribution
Shape
- Number of peaks, symmetry/skewness
- Outliers?
Center
- Use the mean if the data is symmetric
- Use the median if there is a strong skew or there are outliers
Spread
- Horizontal spread: is the data tightly clustered around the center? (low or high variation?)
Homework
Pages 388-390
# 10, 14, 18, 21, 22, 27, 28, 30, 35, 38*
* It is not necessary to draw the sketch for this one.