Posted on 07-Mar-2018
What is statistics?
Statistics is the science of:
Collecting information
Organizing and summarizing the information collected
Analyzing the information collected in order to draw conclusions
Two types of Statistics
Descriptive Statistics
Organizing and summarizing the information collected.
Inferential Statistics
Drawing conclusions from the information collected.
Chapter 1
Exploring Data
Lesson 1-1, Displaying Distributions with Graphs
Bar Graphs and Pie Charts
Data
Individuals are the objects described by a set of data. Individuals may be people, animals, or things.
A variable is any characteristic of an individual. A variable can take different values for different individuals.
Types of Variables
Categorical variable: allows individuals to be classified into groups based on some attribute or characteristic.
Quantitative variable: provides numerical measurements of individuals.
Example, Page 7, #1.2
Data from a medical study contain values of many variables for each of the people who were subjects of the study. Which of the following variables are categorical and which are quantitative?
a) Gender (female or male): categorical
b) Age (years): quantitative
c) Race (Asian, black, white, or other): categorical
d) Smoker (yes or no): categorical
e) Systolic blood pressure (millimeters of mercury): quantitative
f) Level of calcium in blood (micrograms per milliliter): quantitative
Distribution
The distribution of a variable tells us what values the variable takes and how often it takes each value.
Displaying Distributions
Categorical Variables
Bar Graphs
Pie Charts
Quantitative Variables
Dotplots
Stemplots
Histograms
Example – Page 11, #1.6
In 1997 there were 92,353 deaths from accidents in the United States. Among these were 42,340 deaths from motor vehicle accidents, 11,858 from falls, 10,163 from poisoning, 4,051 from drowning, and 3,601 from fires.
A) Find the percent of accidental deaths from each of these causes, rounded to the nearest percent. What percent of accidental deaths were due to other causes?
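Accidents        Number    Percentage
Motor Vehicle    42,340    46%
Falls            11,858    13%
Poisoning        10,163    11%
Drowning          4,051     4%
Fires             3,601     4%
Other Causes     20,340    22%
Total            92,353   100%
Motor vehicle: 42,340 / 92,353 ≈ 0.458, or 45.8% ≈ 46%. The other percentages are found the same way.
Other causes: 92,353 - (42,340 + 11,858 + 10,163 + 4,051 + 3,601) = 20,340 deaths, or about 22%.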
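The percentage column can be checked with a short Python sketch. Python is our choice here; the slides themselves use a TI-83. The sketch uses only the counts given in the problem. It finds the "other causes" count by subtraction and rounds each share to the nearest percent.

```python
# Accidental deaths in the United States, 1997 (counts from Example 1.6)
causes = {
    "Motor Vehicle": 42_340,
    "Falls": 11_858,
    "Poisoning": 10_163,
    "Drowning": 4_051,
    "Fires": 3_601,
}
total = 92_353

# "Other causes" is whatever remains after the five listed causes
causes["Other Causes"] = total - sum(causes.values())

# Share of all accidental deaths, rounded to the nearest percent
percents = {name: round(100 * count / total) for name, count in causes.items()}

for name, count in causes.items():
    print(f"{name:<14}{count:>8,}{percents[name]:>6}%")
print(f"{'Total':<14}{total:>8,}{sum(percents.values()):>6}%")
```

The rounded percentages happen to add to exactly 100% here. With other data, rounding each category separately can leave the total at 99% or 101%.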
B) Make a well-labeled bar graph of the distribution of
causes of accidental deaths. Be sure to include an
“other causes” bar.
[Bar graph: US Accidental Deaths – 1997. Horizontal axis: Causes of Accidental Deaths (MV, Falls, Poison, Drown, Fires, OC); vertical axis: Percentage of Accidental Deaths.]
C) Would it also be correct to use a pie chart to display these data? If so, construct the pie chart. If not, explain why not.
Yes. Together with “other causes,” the categories account for every accidental death, so they are parts of one whole.
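Pie Chart
Angle of each slice = (proportion of the total) × 360°
Motor Vehicle: 0.46 × 360° = 165.6° ≈ 166°
Falls: 0.13 × 360° ≈ 47°
Poisoning: 0.11 × 360° ≈ 40°
Drowning: 0.04 × 360° ≈ 14°
Fires: 0.04 × 360° ≈ 14°
Other Causes: 0.22 × 360° ≈ 79°
Total: 360°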
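This sketch checks the slice angles by converting each rounded percentage from part A into degrees. Like the slide, it starts from the rounded percentages. Rounding each slice separately totals exactly 360° here, but that is not guaranteed for other data.

```python
# Rounded percentages from part A of Example 1.6
percents = {
    "Motor Vehicle": 46,
    "Falls": 13,
    "Poisoning": 11,
    "Drowning": 4,
    "Fires": 4,
    "Other Causes": 22,
}

# Each slice's central angle is its share of the full 360-degree circle
angles = {name: round(p * 360 / 100) for name, p in percents.items()}

for name, angle in angles.items():
    print(f"{name:<14}{angle:>4} degrees")
print("Total:", sum(angles.values()), "degrees")
```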
[Pie chart: US Accidental Deaths - 1997. Motor Vehicle 46%, Falls 13%, Poisoning 11%, Drowning 4%, Fires 4%, Other Causes 22%.]
Lesson 1-1, Displaying Distributions with Graphs
Dot Plots and Stem-and-Leaf Plots
Overall Pattern of a Distribution (Quantitative Variables)
Center: the value that divides the data in half
Spread: from the smallest to the largest value
Shape: the symmetry or skewness of the data
Outliers: individual values that fall outside the overall pattern
Example – Page 16, #1.8
Are you driving a gas guzzler? Table 1.3 displays the highway
gas mileage for 32 model year 2000 midsize cars.
A) Make a dot plot of these data.
[Dot plot: Highway Gas Mileage (mpg), horizontal axis from 21 to 33.]
B) Describe the shape, center, and spread of the distribution
of gas mileages. Are there any potential outliers?
The shape of the distribution is skewed to the left, with a major peak at 28 mpg and a minor peak at 24 mpg. The spread is relatively narrow (21 to 32 mpg). The two observations at 21 mpg and the observation at 32 mpg appear to be outliers. The center is 28 mpg.
Example – Page 35, #1.28
In 1798 the English scientist Henry Cavendish measured the density of the earth by careful work with a torsion balance. The variable recorded was the density of the earth as a multiple of the density of water. Here are Cavendish’s 29 measurements:
5.50 5.61 4.88 5.07 5.26 5.55 5.36 5.29 5.58 5.65
5.57 5.53 5.62 5.29 5.44 5.34 5.79 5.10 5.27 5.39
5.42 5.47 5.63 5.34 5.46 5.30 5.75 5.68 5.85
Present these measurements graphically in a stemplot. Discuss the shape, center, and spread of the distribution. Are there any outliers? What is your estimate of the density of the earth based on these measurements?
Density of the Earth
48 8
49
50 7
51 0
52 6 7 9 9
53 0 4 4 6 9
54 2 4 6 7
55 0 3 5 7 8
56 1 2 3 5 8
57 5 9
58 5
Key: 48|8 = 4.88
The shape of the distribution is roughly symmetric, with one possible outlier at 4.88 that is somewhat low. The measurements range from 4.88 to 5.85. The center of the distribution is between 5.4 and 5.5. The median, which is the 15th of the 29 ordered values, is 5.46. Based on the plot, we would estimate the earth’s density to be about halfway between 5.4 and 5.5, or about 5.45 times the density of water.
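The stems and leaves can also be generated mechanically. This Python sketch uses a function name of our own, `stemplot`. It converts each measurement to hundredths, uses all but the last digit as the stem, and uses the last digit as the leaf. Empty stems are kept so that the gap at 49 stays visible.

```python
from collections import defaultdict
from statistics import median

# Cavendish's 29 measurements of the earth's density (multiples of water's density)
density = [
    5.50, 5.61, 4.88, 5.07, 5.26, 5.55, 5.36, 5.29, 5.58, 5.65,
    5.57, 5.53, 5.62, 5.29, 5.44, 5.34, 5.79, 5.10, 5.27, 5.39,
    5.42, 5.47, 5.63, 5.34, 5.46, 5.30, 5.75, 5.68, 5.85,
]

def stemplot(values):
    """Map each stem (value in hundredths // 10) to its sorted leaves (last digit)."""
    leaves = defaultdict(list)
    for v in values:
        h = round(v * 100)               # 4.88 -> 488; round() avoids float truncation errors
        leaves[h // 10].append(h % 10)   # 488 -> stem 48, leaf 8
    low, high = min(leaves), max(leaves)
    # Include empty stems (such as 49) so gaps in the data stay visible
    return {stem: sorted(leaves[stem]) for stem in range(low, high + 1)}

plot = stemplot(density)
for stem, leaf_list in plot.items():
    print(stem, " ".join(str(leaf) for leaf in leaf_list))
print("Key: 48|8 = 4.88")
print("Median:", median(density))
```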
Lesson 1-1, Displaying Distributions with Graphs
Histograms and Relative Frequency Graphs
Histogram and categories
[Histogram: Age of Spring 1998 Stat 250 Students, n = 92 students. Horizontal axis: Age (in years), about 18 to 28; vertical axis: Frequency (Count).]
[Histogram: GPAs of Spring 1998 Stat 250 Students, n = 92 students. Horizontal axis: GPA, 2 to 4; vertical axis: Frequency (Count).]
Too few categories (left) versus too many categories (right).
Example – Histogram
Suppose you are considering investing in a Roth IRA. You collect the data in the table below, which represent the three-year rate of return (in percent) for 40 small-capitalization growth mutual funds.
27.4 12.7 22.6 32.1 18.2 23.7 18.4 14.7
16.7 28.5 29.6 47.7 32.0 14.7 21.3 37.0
10.8 22.2 11.6 10.9 25.5 12.8 27.0 19.2
24.1 18.4 45.9 18.4 23.7 31.1 19.6 18.5
35.9 17.4 16.6 23.3 38.1 21.9 18.5 29.1
A) Construct a histogram to display these data. Record your class intervals and counts.
Step 1 – Find the class intervals
Locate the smallest number (10.8) and the largest number (47.7).
Start the first class at 10.0 and use a class width of 5. The eight classes 10.0 – 14.9, 15.0 – 19.9, ..., 45.0 – 49.9 then cover every value.
3-yr Rate of Return    Frequency
10.0 – 14.9             7
15.0 – 19.9            11
20.0 – 24.9             8
25.0 – 29.9             6
30.0 – 34.9             3
35.0 – 39.9             3
40.0 – 44.9             0
45.0 – 49.9             2
Total                  40
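The tallying in Step 1 can be sketched in Python as an alternative to the TI-83. The sketch assumes the classes described above: a start of 10.0, a width of 5, and eight classes. Each return is placed in the half-open interval [lower, lower + 5).

```python
# Three-year rates of return (%) for 40 small-cap growth mutual funds
returns = [
    27.4, 12.7, 22.6, 32.1, 18.2, 23.7, 18.4, 14.7,
    16.7, 28.5, 29.6, 47.7, 32.0, 14.7, 21.3, 37.0,
    10.8, 22.2, 11.6, 10.9, 25.5, 12.8, 27.0, 19.2,
    24.1, 18.4, 45.9, 18.4, 23.7, 31.1, 19.6, 18.5,
    35.9, 17.4, 16.6, 23.3, 38.1, 21.9, 18.5, 29.1,
]

start, width, n_classes = 10.0, 5.0, 8   # classes 10.0-14.9, 15.0-19.9, ..., 45.0-49.9

counts = [0] * n_classes
for r in returns:
    # Index of the class whose interval [lower, lower + width) contains r
    counts[int((r - start) // width)] += 1

for i, c in enumerate(counts):
    lower = start + i * width
    print(f"{lower:.1f} - {lower + width - 0.1:.1f}: {c}")
print("Total:", sum(counts))
```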
Step 2 – Graph it using the TI-83: press 2nd Y= (STAT PLOT) to set up the histogram, adjust the WINDOW settings, then press GRAPH and TRACE.
[Histogram: 3 – Year Rate of Return of Mutual Funds. Horizontal axis: Rate of Return (10 to 50); vertical axis: Frequency.]
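B) Describe the distribution of 3-year rates of return.
The shape of the distribution is skewed to the right, with its peak in the 15.0% – 19.9% class. The center (median) falls in the 20.0% – 24.9% class, because the 20th and 21st of the 40 ordered returns lie there. The two funds in the 45.0% – 49.9% class are potential outliers. The returns spread from about 10% to 50%.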
Shape of a Distribution
Uniform (symmetric)
Bell-shaped (symmetric)
Skewed Right
Skewed Left
Example – Relative Cumulative Frequency
Use the three-year rates of return for the same 40 small-capitalization growth mutual funds as in the histogram example.
Class         Freq   Relative Freq   Cumulative Freq   Relative Cumulative Freq
10.0 – 14.9    7     0.175            7                0.175
15.0 – 19.9   11     0.275           18                0.45
20.0 – 24.9    8     0.20            26                0.65
25.0 – 29.9    6     0.15            32                0.80
30.0 – 34.9    3     0.075           35                0.875
35.0 – 39.9    3     0.075           38                0.95
40.0 – 44.9    0     0               38                0.95
45.0 – 49.9    2     0.05            40                1
Total         40     1
Relative frequency = class frequency / total. For example, 7/40 = 0.175.
Class Freq Rel Freq Cum Freq Rel Cum Freq
20.0 – 24.9 8 0.2 26 0.65
45.0 – 49.9 2 0.05 40 1
26 of the 40 mutual funds had a 3-year rate of return of 24.9% or less.
65% of the mutual funds had a 3-year rate of return of 24.9% or less.
A mutual fund with a 3-year rate of return of 45% or higher is outperforming 95% of its peers.
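The relative, cumulative, and relative cumulative columns all follow mechanically from the class frequencies. This Python sketch starts from the counts in the frequency table. It uses `itertools.accumulate` for the running totals and divides by n = 40.

```python
from itertools import accumulate

# Class frequencies from the histogram example (40 funds in all)
classes = ["10.0-14.9", "15.0-19.9", "20.0-24.9", "25.0-29.9",
           "30.0-34.9", "35.0-39.9", "40.0-44.9", "45.0-49.9"]
freq = [7, 11, 8, 6, 3, 3, 0, 2]
n = sum(freq)

rel_freq = [f / n for f in freq]            # e.g. 7 / 40 = 0.175
cum_freq = list(accumulate(freq))           # running total of the counts
rel_cum_freq = [c / n for c in cum_freq]    # running total as a proportion of n

for cls, f, rf, cf, rcf in zip(classes, freq, rel_freq, cum_freq, rel_cum_freq):
    print(f"{cls}  {f:>2}  {rf:.3f}  {cf:>2}  {rcf:.3f}")
```

For example, the row for 20.0-24.9 shows a cumulative frequency of 26 and a relative cumulative frequency of 0.650. That is the "65% returned 24.9% or less" statement.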
On the TI-83, enter the upper class limits in list L3 and the relative cumulative frequencies in list L4.
[Relative cumulative frequency graph: 3 Year Rate of Return for Small Capitalization Mutual Funds. Horizontal axis: Rate of Return (upper class limits 14.9 to 49.9); vertical axis: Cumulative Relative Frequency (0 to 1).]
Lesson 1-2, Describing Distributions with Numbers
Measuring the center
Mean
$\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{\sum x_i}{n}$
To find the sample mean, add up all of the observations and divide by the number of observations.
The mean is affected by unusual values called outliers.
Median
Another name for the 50th percentile
Is resistant: it is not strongly affected by unusual values called outliers
The median is the midpoint of a distribution, such that half the observations are smaller and the other half are larger.
Center and Distribution
Mean < Median: skewed left
Mean ≈ Median: symmetric
Mean > Median: skewed right
Measuring the Spread
Range
Quartiles
Boxplots
Standard Deviation
Variance
Range
The range is the difference between the largest and smallest observations.
$R = x_{max} - x_{min}$
Quartiles
Quartiles divide the ordered observations into fourths, or four equal parts. Going from the smallest data value to the largest, the cut points are Q1, Q2 (the median), and Q3. Each of the four parts holds about 25% of the data.
Interquartile Range (IQR)
The interquartile range (IQR) is the distance between the first and third quartiles:
$IQR = Q_3 - Q_1$
Outliers
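Upper cutoff: $Q_3 + 1.5(IQR)$
Lower cutoff: $Q_1 - 1.5(IQR)$
An observation above the upper cutoff or below the lower cutoff is flagged as a potential outlier.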
Five Number Summary
Smallest observation (minimum)
Quartile 1
Quartile 2 (median)
Quartile 3
Largest observation (maximum)
Example – Page 41, #1.32
The Survey of Study Habits and Attitudes (SSHA) is a psychological test that evaluates college students’ motivation, study habits, and attitudes toward school. A private college gives the SSHA to a sample of 18 of its incoming first-year women students. Their scores are:
154 109 137 115 152 140 154 178 101
103 126 126 137 165 165 129 200 148
A) Make a stemplot of these data. The overall shape of the
distribution is irregular, as often happens when only a few
observations are available. Are there any potential
outliers? About where is the center of the distribution (the
score with half the scores above it and half below)?
What is the spread of the scores (ignoring any outliers)?
10 1 3 9
11 5
12 6 6 9
13 7 7
14 0 8
15 2 4 4
16 5 5
17 8
18
19
20 0
Key: 10|1 = 101
200 is a potential outlier. The center is approximately 140. The spread (excluding 200) is 178 – 101 = 77.
B) Find the mean.
$\bar{x} \approx 141.06$
C) Find the median of these scores. Which is larger: the median or the mean? Explain why.
Median = 138.5
The mean is larger than the median because the outlier at 200 pulls the mean toward the long right tail of the distribution.
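One way to see the outlier's pull is to recompute both measures of center with the score of 200 left out. This Python sketch uses the standard library's `statistics.mean` and `statistics.median`. Dropping the outlier is done purely for illustration and is not part of the exercise.

```python
from statistics import mean, median

# SSHA scores for 18 first-year women (Example 1.32)
women = [154, 109, 137, 115, 152, 140, 154, 178, 101,
         103, 126, 126, 137, 165, 165, 129, 200, 148]

print(f"All 18 scores: mean = {mean(women):.2f}, median = {median(women)}")

# Drop the potential outlier at 200 to see which measure of center moves more
without_200 = [x for x in women if x != 200]
print(f"Without 200:   mean = {mean(without_200):.2f}, median = {median(without_200)}")
```

When the outlier is removed, the mean falls by about 3.5 points (141.06 to 137.59) but the median falls by only 1.5 (138.5 to 137). This shows that the median is the more resistant of the two.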
Example – Page 47, #1.36
Here are the scores on the Survey of Study Habits and Attitudes (SSHA) for 18 first-year college women:
154 109 137 115 152 140 154 178 101 103 126 126 137 165 165 129 200 148
and for 20 first-year college men:
108 140 114 91 180 115 126 92 169 146 109 132 75 88 113 151 70 115 187 104
A) Make side-by-side boxplots to compare the distributions.
[Side-by-side boxplots: SSHA Scores for Men and Women, horizontal axis from 0 to 200.]
B) Compute the numerical summaries for these two
distributions.
         Mean     Min   Q1    Median   Q3    Max
Women    141.06   101   126   138.5    154   200
Men      121.25    70    98   114.5    143   187
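The summaries in this table can be reproduced in Python. Calculators and software use different quartile conventions. This sketch assumes the median-of-halves rule: Q1 and Q3 are the medians of the lower and upper halves, and the middle value is left out when n is odd. That rule reproduces the table. Interpolated percentile methods can give slightly different Q1 and Q3. The helper `five_number_summary` is our own, not a library function.

```python
from statistics import mean, median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the ordered
    data; when n is odd, the middle observation is left out of both halves.
    """
    xs = sorted(data)
    n = len(xs)
    lower_half = xs[: n // 2]
    upper_half = xs[(n + 1) // 2 :]
    return xs[0], median(lower_half), median(xs), median(upper_half), xs[-1]

women = [154, 109, 137, 115, 152, 140, 154, 178, 101,
         103, 126, 126, 137, 165, 165, 129, 200, 148]
men = [108, 140, 114, 91, 180, 115, 126, 92, 169, 146,
       109, 132, 75, 88, 113, 151, 70, 115, 187, 104]

for label, scores in [("Women", women), ("Men", men)]:
    summary = five_number_summary(scores)
    q1, q3 = summary[1], summary[3]
    iqr = q3 - q1
    # 1.5 x IQR rule: observations beyond these cutoffs are potential outliers
    low_cut, high_cut = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    flagged = [x for x in scores if x < low_cut or x > high_cut]
    print(f"{label}: mean = {mean(scores):.2f}, five-number summary = {summary}, "
          f"IQR = {iqr}, potential outliers = {flagged}")
```

For the women, the upper cutoff is 154 + 1.5(28) = 196, so the 1.5 × IQR rule flags the score of 200. For the men, no score is flagged.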
C) Write a paragraph comparing SSHA scores for men and
women.
All the displays and descriptions reveal that women generally score higher than men. The men’s scores are more spread out than the women’s (IQR = 45 for the men versus 28 for the women), even if we don’t ignore the women’s outlier at 200. The shapes of the distributions are reasonably similar, with each displaying right skewness.
Describing Distributions with Numbers
Standard Deviation and Variance
Standard Deviation
The standard deviation (s) measures spread by looking at how far the observations are from their mean. Roughly, it is the average distance of the observations from the mean.
Example, Page 52, #1.40
The levels of various substances in the blood influence our health. Here are measurements of the level of phosphate in the blood of a patient, in milligrams of phosphate per deciliter of blood, made on 6 consecutive visits to a clinic:
5.6 5.2 4.6 4.9 5.7 6.4
A. Find the mean.
$\bar{x} = \dfrac{5.6 + 5.2 + 4.6 + 4.9 + 5.7 + 6.4}{6} = \dfrac{32.4}{6} = 5.4$
[Number line from 4.5 to 6.5 marking the mean 5.4, the observation 4.6 (deviation -0.8), and the observation 6.4 (deviation 1).]
Observation x_i   Deviation x_i - x̄      Squared deviation (x_i - x̄)²
5.6               5.6 - 5.4 = 0.2          (0.2)² = 0.04
5.2               5.2 - 5.4 = -0.2         0.04
4.6               4.6 - 5.4 = -0.8         0.64
4.9               4.9 - 5.4 = -0.5         0.25
5.7               5.7 - 5.4 = 0.3          0.09
6.4               6.4 - 5.4 = 1            1
                  Sum = 0                  Sum = 2.06
B) Find the standard deviation (s) from its definition.
$s^2 = \dfrac{1}{n-1}\sum (x_i - \bar{x})^2 = \dfrac{2.06}{6 - 1} = \dfrac{2.06}{5} = 0.412$
$s = \sqrt{0.412} \approx 0.6419$
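The hand computation follows the definition step by step: mean, deviations, squared deviations, division by n - 1, then the square root. This Python sketch repeats those steps and cross-checks the result against the standard library's `statistics.stdev`, which uses the same n - 1 divisor.

```python
from math import sqrt
from statistics import stdev

# Phosphate levels (mg/dl) from 6 consecutive clinic visits (Example 1.40)
phosphate = [5.6, 5.2, 4.6, 4.9, 5.7, 6.4]

n = len(phosphate)
x_bar = sum(phosphate) / n                       # 32.4 / 6 = 5.4
deviations = [x - x_bar for x in phosphate]      # deviations from the mean sum to 0
sq_dev_sum = sum(d ** 2 for d in deviations)     # 2.06
variance = sq_dev_sum / (n - 1)                  # divide by n - 1, not n
s = sqrt(variance)

print(f"mean = {x_bar:.1f}")
print(f"sum of squared deviations = {sq_dev_sum:.2f}")
print(f"variance = {variance:.3f}, s = {s:.4f}")
print(f"statistics.stdev = {stdev(phosphate):.4f}")  # library cross-check
```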
C) Use your TI-83 (STAT key) to find x̄ and s. Do the results agree with part B?
Standard Deviation
Standard deviation (s) is the square root of the variance (s²).
Its units are the original units of the data.
It measures spread about the mean and should be used only when the mean is chosen as the measure of center.
If s = 0, there is no spread: all observations have the same value.
As s gets larger, the observations are more spread out.
It is highly affected by outliers and is best for symmetric data.
Variance
Variance (s²) measures, roughly, the average squared deviation of the observations from the mean.
Its units are the squares of the original units.
It is highly affected by outliers.
How to Choose?
Skewed Distribution or Outliers
Five number summary
Symmetric Distribution or No Outliers
Mean
Standard Deviation
Homework
HW, page 52, #1.41, #1.43
Read pages 53 – 61
Linear Transformation
A linear transformation changes the original variable x into the new variable x_new given by an equation of the form
$x_{new} = a + bx$
Adding the constant a shifts all values of x upward or
downward by the same amount.
Multiplying by the positive constant b changes the size
of the unit of measurement.
Example – Page 56, #1.44
Maria measures the lengths of 5 cockroaches that she finds at school. Here are her results in inches:
1.4 2.2 1.1 1.6 1.2
A. Find the mean and standard deviation.
B) Maria’s science teacher is furious to discover that she has measured the cockroaches’ lengths in inches rather than centimeters. (There are 2.54 cm in 1 inch.) Find the mean and standard deviation of the 5 cockroaches in centimeters.
In inches, x̄ = 1.5 and s ≈ 0.436. Multiplying by b = 2.54 gives:
x̄ = 1.5(2.54) = 3.81 cm
s ≈ 0.436(2.54) ≈ 1.107 cm
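This sketch applies the change of units directly to the data. It converts each length with the linear transformation x_new = 0 + 2.54x and recomputes the mean and standard deviation with Python's `statistics` module. Both results come out as exactly 2.54 times the values in inches.

```python
from statistics import mean, stdev

CM_PER_INCH = 2.54

# Maria's cockroach lengths in inches (Example 1.44)
inches = [1.4, 2.2, 1.1, 1.6, 1.2]
# Linear transformation x_new = a + b*x with a = 0 and b = 2.54
cm = [CM_PER_INCH * x for x in inches]

print(f"inches: mean = {mean(inches):.3f}, s = {stdev(inches):.3f}")
print(f"cm:     mean = {mean(cm):.3f}, s = {stdev(cm):.3f}")
```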
C) Considering the 5 cockroaches that Maria found
as a small sample from the population of all
cockroaches at her school, what would you
estimate as the average length of the population
of cockroaches? How sure of your estimate
are you?
The average cockroach length can be estimated as the mean length of the 5 sampled cockroaches, 1.5 inches. This is a questionable estimate, because the sample is so small.
Example – Page 63, #1.56
A change of units that multiplies each observation by b, such as the change from inches x to centimeters x_new, multiplies our usual measures of spread by b. This is true of the IQR and the standard deviation. What happens to the variance when we change units this way?
$x_{new} = 0 + 2.54x$
The variance is multiplied by b² = 2.54² = 6.4516.
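The factor b² follows directly from the definitions of the mean and variance applied to x_new = a + bx. The derivation below is a short worked check using the same symbols as the slides. Here a cancels out and b is squared.

```latex
\bar{x}_{new} = \frac{1}{n}\sum_{i=1}^{n}(a + b x_i) = a + b\bar{x}

s_{new}^2 = \frac{1}{n-1}\sum_{i=1}^{n}\bigl[(a + b x_i) - (a + b\bar{x})\bigr]^2
          = \frac{b^2}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = b^2 s^2
```

With a = 0 and b = 2.54, the variance is multiplied by 2.54² = 6.4516. Taking square roots gives s_new = 2.54s, which matches the cockroach example.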
Homework
HW, Page 56, #1.45
HW, Page 63, #1.55
1-2 Describing Distributions with Numbers
Comparing Distributions
Example – Page 59, #1.48
The table below gives the distribution of grades earned by students taking the AP Calculus AB and AP Statistics exams in 2000.
Grade          5       4       3       2       1
Calculus AB    16.8%   23.2%   23.5%   19.6%   16.8%
Statistics      9.8%   21.5%   22.4%   20.5%   25.8%
A. Make a graphical display to compare the AP exam grades for Calculus AB and Statistics.
[Side-by-side bar graph: 2000 AP Exam. Horizontal axis: Grade on Exam (1 to 5); vertical axis: % of Students Earning Grade (0 to 30); bars for Calculus AB and Statistics.]
B) Write a few sentences comparing the two distributions of exam grades. Do you now know which exam is easier? Why or why not?
The distributions are very similar for grades 2, 3, and 4. The major difference occurs for grades 1 and 5. A larger proportion of Statistics students received a grade of 1 (25.8% versus 16.8%), and a smaller proportion received a grade of 5 (9.8% versus 16.8%).
This suggests that the Statistics exam is harder, in the sense that students are more likely to get a poor grade on the Statistics exam than on the Calculus AB exam.
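The comparison can be made concrete by computing the percentage-point gap at each grade. This Python sketch subtracts the Calculus AB percentage from the Statistics percentage, using the values in the table. The gaps are small for grades 2 through 4 and large at grades 1 and 5.

```python
# Percent of students earning each grade on the 2000 AP exams (Example 1.48)
calculus_ab = {5: 16.8, 4: 23.2, 3: 23.5, 2: 19.6, 1: 16.8}
stats_exam = {5: 9.8, 4: 21.5, 3: 22.4, 2: 20.5, 1: 25.8}

# Percentage-point gap (Statistics minus Calculus AB) for each grade
gap = {g: round(stats_exam[g] - calculus_ab[g], 1) for g in calculus_ab}
for grade, diff in gap.items():
    print(f"grade {grade}: {diff:+.1f} points")
```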
Example – Page 63, #1.54
The mean x̄ and standard deviation s measure center and spread but are not a complete description of a distribution. Data sets with different shapes can have the same mean and standard deviation. To demonstrate this fact, use your calculator to find x̄ and s for the following two small data sets. Then make a stemplot of each and comment on the shape of each distribution.
Data A 9.14 8.14 8.74 8.77 9.26 8.10 6.13 3.10 9.13 7.26 4.74
Data B 6.58 5.76 7.71 8.84 8.47 7.04 5.25 5.56 7.91 6.89 12.50
Set A
3 1
4 7
5
6 1
7 2
8 1 1 7 7
9 1 1 2
Key: 3|1 = 3.1 (leaves are truncated to one decimal place, so 9.14 appears as 9|1)
Set B
5 2 5 7
6 5 8
7 0 7 9
8 4 8
9
10
11
12 5
Key: 12|5 = 12.5
The means and standard deviations are essentially the same (x̄ ≈ 7.50 and s ≈ 2.03 for both sets). Set A is skewed to the left, while Set B has a high outlier at 12.5.
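The calculator step can be mirrored in Python with `statistics.mean` and `statistics.stdev` (sample standard deviation, n - 1 divisor). The two sets produce the same summary numbers even though their stemplots look very different.

```python
from statistics import mean, stdev

data_a = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
data_b = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]

for label, data in [("A", data_a), ("B", data_b)]:
    print(f"Set {label}: mean = {mean(data):.2f}, s = {stdev(data):.2f}")
```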
Homework
HW, Page 59, #1.47, #1.49
HW, Page 62, #1.51, #1.57