MATH& 146
Lesson 11
Section 1.6
Categorical Data
Frequency
The first step to organizing categorical data is to count the number of data values there are in each category of interest.
We can organize these counts (or frequencies) into a frequency table, which records the totals and the category names.
Frequency
A class with 20 students had the following
distribution of grades:
A, A, A, B, B, B, B, B, C, C, C, D, D, D, D, D, D, F, F, F
GRADE FREQUENCY
A 3
B 5
C 3
D 6
F 3
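The counts in the frequency table can be reproduced with a short script. This is only a sketch using Python's standard library; the variable names grades and frequency are illustrative and not part of the lesson.

```python
from collections import Counter

# The 20 grades listed in the lesson.
grades = "A, A, A, B, B, B, B, B, C, C, C, D, D, D, D, D, D, F, F, F".split(", ")

# Count how many data values fall in each category.
frequency = Counter(grades)

print("GRADE FREQUENCY")
for grade in sorted(frequency):
    print(grade, frequency[grade])
```

The counts should add up to the number of students in the class, which is a quick way to catch a tallying mistake.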
GRADE FREQUENCY RELATIVE FREQUENCY
A 3 0.15
B 5 0.25
C 3 0.15
D 6 0.30
F 3 0.15
Relative Frequency
A relative frequency is the proportion of times a
category occurs. Relative frequencies can be
written as fractions, decimals, or percents.
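As a rough illustration, each relative frequency is the category count divided by the total count. The sketch below prints every relative frequency as a fraction, a decimal, and a percent; the names frequency, total, and relative are assumptions made for this example.

```python
from fractions import Fraction

# Frequencies from the grade table.
frequency = {"A": 3, "B": 5, "C": 3, "D": 6, "F": 3}
total = sum(frequency.values())  # 20 students

relative = {}
for grade, count in frequency.items():
    as_fraction = Fraction(count, total)  # reduced automatically, e.g. 5/20 becomes 1/4
    as_decimal = count / total
    relative[grade] = as_decimal
    print(f"{grade}  {as_fraction}  {as_decimal:.2f}  {as_decimal:.0%}")
```

The relative frequencies of all categories together add up to 1 (or 100%).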
GRADE   FREQUENCY   RELATIVE FREQUENCY   CUMULATIVE RELATIVE FREQUENCY
A       3           0.15                 0.15
B       5           0.25                 0.40
C       3           0.15                 0.55
D       6           0.30                 0.85
F       3           0.15                 1.00
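Cumulative Relative Frequency
Cumulative relative frequency is the running total of the relative frequencies: the relative frequency of a category added to those of all the categories before it. The last cumulative relative frequency is always 1.00.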
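One way to sketch the running total is to accumulate the counts and divide by the overall total. The code below assumes the grade frequencies from the table; the names counts and cumulative_relative are illustrative.

```python
from itertools import accumulate

grades = ["A", "B", "C", "D", "F"]
counts = [3, 5, 3, 6, 3]
total = sum(counts)  # 20 students

# Running totals of the counts: 3, 8, 11, 17, 20.
running_counts = list(accumulate(counts))

# Dividing each running total by the overall total gives the
# cumulative relative frequency.
cumulative_relative = [running / total for running in running_counts]

for grade, value in zip(grades, cumulative_relative):
    print(f"{grade}  {value:.2f}")
```

Accumulating the counts first, rather than the rounded decimals, avoids carrying rounding error down the column.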
Example 1
Fifty part-time students were asked how many courses they were taking this term. The (incomplete) results are shown below:

# of Courses   Frequency   Relative Frequency   Cumulative Relative Frequency
1              30          0.6
2              15
3

a. Fill in the blanks in the table above.
b. What percent of students take exactly two courses?
c. What percent of students take at most two courses?
Graphs of Categorical Data
There are two simple visual summaries that are used for categorical data:
Circle graphs (pie charts) show the amount of data that belong to each category as a proportional part of the whole.
Bar graphs consist of bars that are separated from each other. The bars can be rectangles or rectangular boxes, and they can be vertical or horizontal.
Graphs of Categorical Data
To get a better sense of graphing categorical data, consider the following table about the Titanic. The table lists the number and percentage of people in each class on the Titanic's voyage.
CLASS    FREQUENCY   RELATIVE FREQUENCY
First    325         14.77%
Second   285         12.95%
Third    706         32.08%
Crew     885         40.21%
Total    2201        100.01%
Pie Charts
When you are interested in relative frequencies, a pie chart might be your display of choice. Pie charts slice the circle into pieces whose size is proportional to the fraction of the whole in each category.
Pie Charts
There are two rules to follow when creating a pie chart:
1) The pieces have to add up to 100%.
2) No person can be represented in more than one piece.
[Figure: a bad pie chart. Its slices total 271% even without an "Other" category.]
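The first rule can be checked mechanically before drawing a chart. The sketch below is only an illustration: the function name slices_form_a_whole and the tolerance of 0.1 percentage points (to allow for rounding) are assumptions, and the second list of percentages is made up to show a failing case. The check cannot detect by itself whether a person was counted in more than one piece; it only flags totals that are not 100%.

```python
def slices_form_a_whole(percentages, tolerance=0.1):
    """Return True when the slices add up to 100% within a rounding tolerance."""
    return abs(sum(percentages) - 100.0) <= tolerance


# Percentages from the Titanic class table (they total 100.01% due to rounding).
titanic = [14.77, 12.95, 32.08, 40.21]

# Hypothetical percentages from a survey where people could pick several answers.
overlapping = [60.0, 55.0, 45.0]

print(slices_form_a_whole(titanic))      # within tolerance of 100%
print(slices_form_a_whole(overlapping))  # totals 160%, not a valid pie chart
```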
Example 2
Which set of percentages
would best fit this pie
chart?
A. 54%, 8%, 30%, 8%
B. 47%, 23%, 8%, 22%
C. 51%, 17%, 15%, 17%
D. 27%, 26%, 24%, 23%
Bar Charts
A bar chart displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison. Notice that the bars are separated from one another.
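To illustrate the idea without any plotting library, the sketch below draws a rough horizontal bar chart in plain text from the Titanic class counts. The function name text_bar_chart and the scale of one "#" per 25 people are assumptions chosen for this example.

```python
def text_bar_chart(counts, scale):
    """Build one text line per category, with one '#' per `scale` individuals."""
    lines = []
    for category, count in counts.items():
        bar = "#" * round(count / scale)
        lines.append(f"{category:<7} {bar} {count}")
    return lines


# Counts from the Titanic class table.
counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}

lines = text_bar_chart(counts, scale=25)
for line in lines:
    print(line)
```

Because each bar starts at the same position, the lengths can be compared directly, which is the main advantage of a bar chart.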
Pie Charts vs. Bar Charts
While pie charts are well known, they are not
typically as useful as other charts. It is generally
more difficult to compare group sizes in a pie chart
than in a bar chart, especially when categories
have nearly identical counts or proportions.
Example 3
Use the graphs to rank the categories from largest
to smallest.
Example 4
Which category is largest? Which is smallest?
The Titanic
Here is part of a data matrix about the passengers
and crew aboard the Titanic. Each case (row) of
the data table represents a person on board the
ship.
Survived   Age     Sex      Class
Died       Adult   Male     Third
Survived   Adult   Male     Crew
Died       Child   Male     Third
Survived   Child   Female   First
Died       Adult   Male     Third
Died       Adult   Female   Crew
The Titanic
The problem with data matrices is that you can't
see what's going on. And seeing is just what we
want to do. We need ways to show the data so
that we can see patterns, relationships, trends,
and exceptions.
The Titanic
To look at two categorical variables together, we
often arrange the counts in a two-way table. Here
is a two-way table of those aboard the Titanic,
classified according to class of ticket and whether
or not they survived.
                        Class
Survival   First   Second   Third   Crew   Total
Survived   203     118      178     212    711
Died       122     167      528     673    1490
Total      325     285      706     885    2201
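A two-way table is built by counting how many cases fall into each combination of the two variables. The sketch below does this for the six example rows of the data matrix only, so its counts are far smaller than those in the full table of 2201 people. The variable names rows and cells are illustrative.

```python
from collections import Counter

# The six example cases from the data matrix: (Survived, Age, Sex, Class).
rows = [
    ("Died", "Adult", "Male", "Third"),
    ("Survived", "Adult", "Male", "Crew"),
    ("Died", "Child", "Male", "Third"),
    ("Survived", "Child", "Female", "First"),
    ("Died", "Adult", "Male", "Third"),
    ("Died", "Adult", "Female", "Crew"),
]

# Count each (Survival, Class) combination; Age and Sex are ignored here.
cells = Counter((survived, ticket_class) for survived, _age, _sex, ticket_class in rows)

classes = ["First", "Second", "Third", "Crew"]
for outcome in ["Survived", "Died"]:
    # A Counter returns 0 for combinations that never occur.
    print(outcome, [cells[(outcome, c)] for c in classes])
```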
The Titanic
Because the table shows how the individuals are
distributed along each variable, contingent on the
value of the other variable, such a table is called a
contingency table.
Contingency Tables
The margins of the table, both on the right and at the bottom, give totals. The bottom line is just the frequency table of the variable Class.
Class    Frequency
First    325
Second   285
Third    706
Crew     885
Total    2201
Contingency Tables
The right column of the table is the frequency table of the variable Survival.
Survival   Frequency
Survived   711
Died       1490
Total      2201
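Both margins can be recovered from the cell counts alone: summing across a row gives a Survival total, and summing down a column gives a Class total. A minimal sketch, with illustrative variable names:

```python
classes = ["First", "Second", "Third", "Crew"]

# Cell counts from the Titanic two-way table, one list per row.
table = {
    "Survived": [203, 118, 178, 212],
    "Died": [122, 167, 528, 673],
}

# Right margin: the frequency table of Survival.
row_totals = {outcome: sum(counts) for outcome, counts in table.items()}

# Bottom margin: the frequency table of Class.
column_totals = dict(zip(classes, (sum(column) for column in zip(*table.values()))))

grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```

The grand total is the same whether the row totals or the column totals are added up.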
Contingency Tables
Each cell of the table gives the count for a combination of values of the two variables. For example, the highlighted cell shows that 118 second-class passengers survived.
So what does the green highlighted cell show?
Row Proportions
The table below shows the row proportions for
the Titanic data set. The row proportions are
computed as the counts divided by their row totals.
                                            Class
Survival   First             Second            Third             Crew              Total
Survived   203/711 = .286    118/711 = .166    178/711 = .250    212/711 = .298    711/711 = 1.000
Died       122/1490 = .082   167/1490 = .112   528/1490 = .354   673/1490 = .452   1490/1490 = 1.000
Total      325/2201 = .148   285/2201 = .129   706/2201 = .321   885/2201 = .402   2201/2201 = 1.000
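The row proportions can be recomputed by dividing every count by the total of its own row. The sketch below assumes the Titanic cell counts and rounds to three decimal places as the table does; the names table and row_proportions are illustrative.

```python
classes = ["First", "Second", "Third", "Crew"]

# Cell counts from the Titanic two-way table, one list per row.
table = {
    "Survived": [203, 118, 178, 212],
    "Died": [122, 167, 528, 673],
}

row_proportions = {}
for outcome, counts in table.items():
    row_total = sum(counts)  # 711 for Survived, 1490 for Died
    row_proportions[outcome] = [round(count / row_total, 3) for count in counts]

for outcome, proportions in row_proportions.items():
    print(outcome, dict(zip(classes, proportions)))
```

Each row of proportions describes how that row's group (survivors, or those who died) is split among the classes.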
Row Proportions
So what does 203/711 = .286 (first column, first
row) represent?
It corresponds to the proportion of survivors who
were in first class.
Example 5
a) What does 167/1490 = .112 (second column,
second row) represent in the table?
b) What does 885/2201 = .402 (fourth column,
third row) represent in the table?
Column Proportions
A contingency table of the column proportions is
computed in a similar way, where each column
proportion is computed as the count divided by the
corresponding column total.
                                            Class
Survival   First             Second            Third             Crew              Total
Survived   203/325 = .625    118/285 = .414    178/706 = .252    212/885 = .240    711/2201 = .323
Died       122/325 = .375    167/285 = .586    528/706 = .748    673/885 = .760    1490/2201 = .677
Total      325/325 = 1.000   285/285 = 1.000   706/706 = 1.000   885/885 = 1.000   2201/2201 = 1.000
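Column proportions follow the same recipe as row proportions, except that each count is divided by the total of its own column. A minimal sketch, assuming the Titanic cell counts and using illustrative variable names:

```python
classes = ["First", "Second", "Third", "Crew"]
survived = [203, 118, 178, 212]
died = [122, 167, 528, 673]

survived_proportions = []
died_proportions = []
for s, d in zip(survived, died):
    column_total = s + d  # everyone in that class
    survived_proportions.append(round(s / column_total, 3))
    died_proportions.append(round(d / column_total, 3))

for name, ps, pd in zip(classes, survived_proportions, died_proportions):
    print(f"{name:<7} survived {ps:.3f}  died {pd:.3f}")
```

Within each class the two proportions add up to 1, since every person in the class either survived or died.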
Example 6
a) What does 167/285 = .586 (second column,
second row) represent in the table?
b) What does 711/2201 = .323 (fifth column, first
row) represent in the table?
Column Proportions
In the table, the value 0.625 indicates that 62.5%
of first class passengers survived. This rate of
survival is much higher compared to second class
passengers (41.4%), third class passengers
(25.2%), or crew members (24.0%).
Column Proportions
Because these differences in survival rates between the classes are unlikely to result from random chance alone, they provide evidence that the class and survival variables are associated. We say the two variables are dependent.
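One informal way to see the association is to compare each class's survival rate with the overall survival rate of 711/2201. If class and survival were unrelated, the class rates would all sit close to the overall rate. The sketch below only describes the size of the gaps; it is not a formal test of whether they could be due to chance, and the variable names are illustrative.

```python
classes = ["First", "Second", "Third", "Crew"]
survived = [203, 118, 178, 212]
died = [122, 167, 528, 673]

# Overall proportion of people aboard who survived: 711/2201.
overall_rate = sum(survived) / (sum(survived) + sum(died))

differences = {}
for name, s, d in zip(classes, survived, died):
    class_rate = s / (s + d)
    differences[name] = round(class_rate - overall_rate, 3)
    print(f"{name:<7} rate {class_rate:.3f}  difference from overall {differences[name]:+.3f}")
```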
Example 7
A random sample of 100 people who have pets was polled to see if there was an association between gender and whether they preferred a dog or a cat. The results of the survey are below.
         Dog   Cat   Total
Male     40    10    50
Female   20    30    50
Total    60    40    100
a) Compute and interpret the column proportions.
b) Does there appear to be an association between gender and type of pet? Explain.
Example 8
There are 10 boys and 12 girls in Mr. Fleck's fourth grade class and 15 boys and 18 girls in Mrs. Parker's fourth grade class. One student is randomly selected to be hall monitor.
a) Use this information to complete the contingency table below.
                    Gender
Teacher       Boy   Girl   Total
Mr. Fleck
Mrs. Parker
Total
Example 8 continued
b) Compute and interpret the row proportions.
c) Does there appear to be an association between teacher and student's gender? Explain.
                    Gender
Teacher       Boy   Girl   Total
Mr. Fleck     10    12     22
Mrs. Parker   15    18     33
Total         25    30     55