8/7/2019 Chapter 03 - Processed
Copyright 2009 Pearson Education, Inc.
Chapter 3: Displaying and Describing Categorical Data
Frequency Tables: Making Piles
- We can pile the data by counting the number of data values in each category of interest.
- This table is called a frequency table. It records the totals and the category names.
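Piling the data amounts to counting. The minimal Python sketch below builds a frequency table with `collections.Counter`. The raw data here are a stand-in: a list rebuilt from the Titanic class counts used in this chapter (325 First, 285 Second, 706 Third, 885 Crew). In practice each entry would come from a data file.

```python
from collections import Counter

# Stand-in raw data: one entry per person, rebuilt from the chapter's
# Titanic class counts.
passengers = (["First"] * 325 + ["Second"] * 285
              + ["Third"] * 706 + ["Crew"] * 885)

# Counting the data values in each category gives the frequency table.
freq_table = Counter(passengers)

for category in ["First", "Second", "Third", "Crew"]:
    print(f"{category:<8}{freq_table[category]:>6}")
print(f"{'Total':<8}{sum(freq_table.values()):>6}")
```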
- A relative frequency table is similar, but gives the percentages (instead of counts) for each category.
Relative Frequency vs. Percentage
- The table on the previous page may be called a relative frequency table, but it displays percentages, not relative frequencies.
Class    Frequency   Relative Frequency        %
First          325               0.1477    14.77
Second         285               0.1295    12.95
Third          706               0.3208    32.08
Crew           885               0.4021    40.21
Total         2201               1.000     100.0
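The difference between the two columns is just a factor of 100. A short Python sketch recomputes both from the counts in the table: relative frequency = count / total, and percentage = 100 × relative frequency.

```python
# Titanic counts by ticket class, from the table above.
counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}
total = sum(counts.values())  # 2201

# Relative frequency is a proportion between 0 and 1.
rel_freq = {c: n / total for c, n in counts.items()}
# Percentage is the same proportion multiplied by 100.
percent = {c: 100 * p for c, p in rel_freq.items()}

for c in counts:
    print(f"{c:<8}{counts[c]:>6}{rel_freq[c]:>10.4f}{percent[c]:>8.2f}")
```

The relative frequencies sum to 1 and the percentages to 100. Any small mismatch in a printed table comes from rounding.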
What's Wrong With This Picture?
- You might think that a good way to show the Titanic data is with this display:
[Figure: the Titanic data drawn as ship pictures, one per class; axis: Frequency]
The Area Principle
- The ship display makes it look like most of the people on the Titanic were crew members, with a few passengers along for the ride.
- When we look at each ship, we see the area taken up by the ship, instead of the length of the ship.
- The ship display violates the area principle:
  - The area occupied by a part of the graph should correspond to the magnitude of the value it represents.
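The slides do not say exactly how the ship pictures were scaled. The sketch below assumes that each ship's length and height were both scaled in proportion to the count. Under that assumption it shows why the display misleads: the eye judges area, and area grows with the square of the scale factor.

```python
# Titanic counts by ticket class.
counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}

def length_and_area_ratio(a, b):
    """Compare two categories drawn as uniformly scaled pictures.

    Assumption: both length and height scale with the count,
    so the picture's area scales with the count squared.
    """
    length_ratio = counts[a] / counts[b]
    area_ratio = length_ratio ** 2
    return length_ratio, area_ratio

lr, ar = length_and_area_ratio("Crew", "Second")
print(f"Crew/Second: counts differ by {lr:.2f}x, areas by {ar:.2f}x")
```

The counts differ by a factor of about 3, but the pictures differ in area by a factor of about 9. A bar chart avoids this problem by keeping every bar the same width.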
Bar Charts
- A bar chart displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison.
Watch my video on bar charts: http://web.utk.edu/~leon/stat201/fall09/Video/Bar%20Charts/Bar%20Charts.htm
- A bar chart stays true to the area principle.
- Thus, it is a better display for the ship data.
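A text-only bar chart shows the same idea without plotting software. Here every bar is one character tall, so its area is proportional to its length, and its length is proportional to the count. This is a minimal Python sketch. The 40-character `width` is an arbitrary choice.

```python
counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}

def text_bar_chart(counts, width=40):
    """Return one line per category; bar length is proportional to count.

    Bars share the same height (one text row), so bar area is
    proportional to the count, as the area principle requires.
    """
    biggest = max(counts.values())
    lines = []
    for category, n in counts.items():
        bar = "#" * round(width * n / biggest)
        lines.append(f"{category:<8}|{bar} {n}")
    return lines

for line in text_bar_chart(counts):
    print(line)
```

To get a relative frequency bar chart, divide each count by the total before drawing. The bar lengths stay the same and only the labels change.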
- A relative frequency bar chart displays the relative proportion of counts for each category.
- A relative frequency bar chart also stays true to the area principle.
Output from JMP Software
[Figure: JMP output with two panels, "Bar Chart" and "Relative Frequency Bar Chart"]
A Bar Chart with Sorted Categories: 168 Late Arrivals to Work
http://en.wikipedia.org/wiki/Pareto_chart
- The categories are ordered by their frequency.
- It is clear that the best way to reduce late arrivals is by eliminating traffic-related late arrivals, say, by leaving earlier or taking a faster route.
- The Pareto chart is used in quality improvement efforts because it identifies the high-leverage actions that would improve quality.
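The late-arrival counts appear only in the chart, which is not reproduced here. This Python sketch therefore uses the Titanic class counts as stand-in data. It shows the two steps behind a Pareto-style display: sort the categories from most to least frequent, then track the cumulative percentage so you can see how few categories account for most of the cases.

```python
counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}

def pareto_table(counts):
    """Sort categories by descending count and add a cumulative percent."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for category, n in ordered:
        running += n
        rows.append((category, n, 100 * running / total))
    return rows

for category, n, cum_pct in pareto_table(counts):
    print(f"{category:<8}{n:>6}{cum_pct:>8.1f}%")
```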
Pie Charts
- When you are interested in parts of the whole, a pie chart might be your display of choice.
- Pie charts show the whole group of cases as a circle.
- They slice the circle into pieces whose size is proportional to the fraction of the whole in each category.
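Because each slice's size is proportional to its category's fraction of the whole, its angle is 360° times that fraction. A small Python sketch computes this for the Titanic class counts:

```python
counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}

def slice_angles(counts):
    """Angle in degrees for each pie slice: 360 * (count / total)."""
    total = sum(counts.values())
    return {c: 360 * n / total for c, n in counts.items()}

angles = slice_angles(counts)
for c, deg in angles.items():
    print(f"{c:<8}{deg:6.1f} degrees")
```

The angles always add up to 360°, which is why a pie chart only makes sense when the categories are non-overlapping parts of one whole.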
Pie Charts from JMP
[Figure: JMP pie charts with two panels, "Counts" and "Percentages"]
Contingency Tables
- A contingency table allows us to look at two categorical variables together.
- It shows how individuals are distributed along each variable, contingent on the value of the other variable.
- Example: we can examine the class of ticket and whether a person survived the Titanic.
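The contingency table itself appears only as an image in the slides. The Python sketch below rebuilds it from the widely published Titanic survival counts. These counts agree with the class totals (325, 285, 706, 885), the overall total of 2201, and the 673 crew deaths quoted in this chapter. The layout is an assumption: survival status in the rows, ticket class in the columns. The sketch then adds the margins: row totals on the right and column totals along the bottom.

```python
classes = ["First", "Second", "Third", "Crew"]

# Titanic counts: rows are survival status, columns are ticket class.
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}

# Margins: row totals (right) and column totals (bottom).
row_totals = {status: sum(row.values()) for status, row in table.items()}
col_totals = {c: sum(table[s][c] for s in table) for c in classes}
grand_total = sum(row_totals.values())

print(f"{'':<7}" + "".join(f"{c:>8}" for c in classes) + f"{'Total':>8}")
for status, row in table.items():
    print(f"{status:<7}" + "".join(f"{row[c]:>8}" for c in classes)
          + f"{row_totals[status]:>8}")
print(f"{'Total':<7}" + "".join(f"{col_totals[c]:>8}" for c in classes)
      + f"{grand_total:>8}")
```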
- The margins of the table, both on the right and on the bottom, give totals and the frequency distributions for each of the variables.
- The marginal distribution of Survival is given by the totals in the right-hand margin.
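A marginal distribution uses only the totals in one margin. The sketch below assumes the widely published Titanic counts (rows: survival status, columns: ticket class). It turns each Survival row total into a share of all 2201 people.

```python
# Titanic counts (rows: survival status, columns: ticket class).
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}

row_totals = {status: sum(row.values()) for status, row in table.items()}
grand_total = sum(row_totals.values())

# Marginal distribution of Survival: each row total as a percent of everyone.
marginal_survival = {s: 100 * n / grand_total for s, n in row_totals.items()}
for s, pct in marginal_survival.items():
    print(f"{s:<6}{row_totals[s]:>6}{pct:>7.1f}%")
```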
- Each cell of the table gives the count for a combination of values of the two variables.
- For example, the second cell in the crew column tells us that 673 crew members died when the Titanic sank.
Conditional Distributions
- A conditional distribution shows the distribution of one variable for just the individuals who satisfy some condition on another variable.
- Example: the conditional distribution of ticket Class for those who survived.
Conditional Distributions (cont.)
- Example: the conditional distribution of ticket Class for those who perished.
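A conditional distribution restricts attention to one row of the table and converts that row to percentages of its own total. This Python sketch assumes the widely published Titanic counts. It computes the distribution of Class for each survival group.

```python
classes = ["First", "Second", "Third", "Crew"]
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}

def conditional_distribution(table, condition):
    """Percent in each class, using only the row that meets the condition."""
    row = table[condition]
    row_total = sum(row.values())
    return {c: 100 * row[c] / row_total for c in classes}

for status in table:
    dist = conditional_distribution(table, status)
    print(status, {c: round(p, 1) for c, p in dist.items()})
```

For example, First-class passengers make up about 28.6% of the survivors but only about 8.2% of those who perished.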
- The conditional distributions tell us that there is a difference in class for those who survived and those who perished.
- Is this a good way to compare the two conditional distributions?
[Figure: two pie charts of Class (First, Second, Third, Crew), one for those who survived and one for those who perished; the First-class slices are labeled 28.6% and 8.2%]
- Pie charts of the two conditional distributions are a better way to show that the distribution of Class differs for those who survived and those who perished.
- We see that the distribution of Class for the survivors is different from that of the non-survivors.
- This leads us to believe that Class and Survival are associated, that is, that they are not independent.
- The variables would be considered independent when the distribution of one variable in a contingency table is the same for all categories of the other variable.
- For example, X and Y would be independent if the percentages for X-A and X-B were the same.
- Actually, the percentages for X-A and X-B do not have to be exactly alike, because if these percentages come from a sample there will be some sampling variation.
46/86
Slide 3- 46Copyright 2009Pearson Education, Inc.
Segmented Bar Charts
[Figure: segmented bar chart with two bars, each divided into Class segments (First, Second, Third, Crew)]
- A segmented bar chart displays the same information as a pie chart, but in the form of bars instead of circles.
JMP's Mosaic Plot
- Unlike the graphic on the previous page, the mosaic plot has bar widths proportional to the frequencies of the categories on the horizontal axis.
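A minimal Python sketch of the arithmetic behind a mosaic plot, assuming Class is on the horizontal axis (the slide does not say which variable JMP placed there). Each column's width is that class's share of all people. Each tile's height is the conditional share of Alive or Dead within the class. So each tile's area equals the cell count divided by the grand total. The counts are the widely published Titanic figures.

```python
classes = ["First", "Second", "Third", "Crew"]
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}
grand_total = sum(sum(row.values()) for row in table.values())

tiles = {}
for c in classes:
    col_total = sum(table[s][c] for s in table)
    width = col_total / grand_total          # bar width ~ class frequency
    for s in table:
        height = table[s][c] / col_total     # conditional share within class
        tiles[(c, s)] = (width, height, width * height)

for (c, s), (w, h, area) in tiles.items():
    print(f"{c:<7}{s:<6} width={w:.3f} height={h:.3f} area={area:.3f}")
```

Because tile area tracks the joint relative frequency, the mosaic plot respects the area principle in both directions.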
Example: Spring 2009 Survey Data (from 100 randomly selected UT students)
- Gender:
  - female
  - male
- Where do you sit in a classroom?
  - near the front
  - around the middle
  - near the back
  - no preference
1. Are these two variables independent?
2. Do males and females have different sitting preferences?
Spring 2009 Survey Data
Where Do You Prefer to Sit? (rows: Gender)
Gender   Front   Middle   Back   No Pref.   Total
Female      19       26      5          2      52
Male        15       18     10          5      48
Total       34       44     15          7     100
- What percent of females who answered the survey say that they prefer to sit near the back?
- Percentage of females who prefer to sit near the back = 5/52 = 9.6%
- What percent of the males who answered the survey say that they prefer to sit near the back?
- Percentage of males who prefer to sit near the back = 10/48 = 20.8%
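The two percentages above are single entries from the conditional distributions of seat preference by gender. This Python sketch uses the survey counts from the table and computes each full conditional distribution. It also computes the marginal distribution for all 100 students.

```python
seats = ["Front", "Middle", "Back", "No Pref."]
survey = {
    "Female": [19, 26, 5, 2],
    "Male":   [15, 18, 10, 5],
}

# Conditional distribution of seat preference within each gender.
conditional = {
    gender: [100 * n / sum(counts) for n in counts]
    for gender, counts in survey.items()
}
for gender, pcts in conditional.items():
    print(gender, [f"{p:.1f}%" for p in pcts])

# Marginal distribution of seat preference for all students.
column_totals = [f + m for f, m in zip(survey["Female"], survey["Male"])]
print("All", [f"{100 * n / sum(column_totals):.1f}%" for n in column_totals])
```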
Spring 2009 Survey Data (Cont.)
- Percentage of females who prefer to sit near the back = 5/52 = 9.6%
- Percentage of males who prefer to sit near the back = 10/48 = 20.8%
- Percentage of students who prefer to sit near the back = 15/100 = 15%
- Percentage of males who prefer to sit near the back = 10/48 = 21%
- What does this difference suggest?
- Is this difference big enough to conclude that gender and sitting preference are dependent?
Foreshadowing Chapter 26 Methods: Tests of Hypotheses
- Since the P-values (Prob>ChiSq) are not smaller than 0.05, we cannot conclude that there is a difference between males and females in sitting preference.
- The differences we see could be the result of sampling variation.
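The JMP output itself is not reproduced here. As a rough cross-check, this Python sketch computes the Pearson chi-square statistic for the seating table by hand. We assume it corresponds to one of the Prob>ChiSq lines JMP reports, but it is not a copy of that output. The p-value uses the closed-form upper tail of a chi-square distribution, which is valid only for 3 degrees of freedom (2 rows × 4 columns). Two expected counts fall below 5 (3.64 and 3.36), so the chi-square approximation is rough for this table.

```python
import math

survey = {
    "Female": [19, 26, 5, 2],
    "Male":   [15, 18, 10, 5],
}

def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(rows):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n  # count if independent
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df

def chi_square_p_value_df3(x):
    """Upper-tail probability of a chi-square with exactly 3 degrees of freedom:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

stat, df = pearson_chi_square(survey)
p = chi_square_p_value_df3(stat)
print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p:.3f}")
```

The statistic comes out near 4.7 with a p-value well above 0.05. That agrees with the slide's conclusion that the observed differences could be sampling variation.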
The Three Rules of Data Analysis
- The three rules of data analysis won't be difficult to remember:
1. Make a picture: things may be revealed that are not obvious in the raw data. These will be things to think about.
2. Make a picture: important features of and patterns in the data will show up. You may also see things that you did not expect.
3. Make a picture: the best way to tell others about your data is with a well-chosen picture.
Simpson's Paradox - Another Example
- Moe argues that he's a better pilot than Jill:
  - Moe managed to land on time on 83% of his last 120 flights.
  - Jill managed to land on time on 78% of her last 120 flights.
- Is Moe right in arguing this way?
Simpson's Paradox - Example (Cont.)
- Jill outperforms Moe for day flights (95% vs. 90%), and she outperforms Moe at night (75% vs. 50%).
- But Jill has a poorer on-time record than Moe overall (78% vs. 83%).
- This seems to be a contradiction (or paradox).
- The explanation is that:
  - Jill has mostly night flights (more difficult).
  - Moe has mostly day flights (easier).
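The slides give only percentages, so the flight counts in this Python sketch are hypothetical. They were chosen to reproduce the slides' day and night rates (Moe 90% and 50%, Jill 95% and 75%) and overall rates of about 83% and 78%, with 120 flights each. The sketch shows the mechanism: an overall rate is a weighted average of the day and night rates, weighted by each pilot's mix of flights.

```python
# Hypothetical counts (on time, total), chosen to match the slides' percentages.
flights = {
    "Moe":  {"day": (90, 100), "night": (10, 20)},   # mostly day flights
    "Jill": {"day": (19, 20),  "night": (75, 100)},  # mostly night flights
}

def rate(on_time, total):
    """On-time percentage."""
    return 100 * on_time / total

def overall(pilot):
    """Overall on-time percentage, pooling day and night flights."""
    record = flights[pilot]
    on_time = sum(ok for ok, _ in record.values())
    total = sum(n for _, n in record.values())
    return rate(on_time, total)

for pilot, record in flights.items():
    by_time = {t: rate(*record[t]) for t in record}
    print(pilot, by_time, f"overall {overall(pilot):.1f}%")
```

Jill wins in each group, but pooling gives Moe the higher overall rate because most of his flights are in the easier group.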
- Taking an overall average is misleading.
- If you were Jill, what statistics would you report?
- What if you were Moe?
Simpson's Paradox: Does It Matter in the Real World?
- University of California sex-bias study in graduate admissions.
- Is there a gender bias?
- Note that, department by department, if anything there seems to be discrimination against males!
Resolving the Paradox
- Women are applying to departments that are harder to get into!
What Should You Do?
- Look at the variables separately.
What Can Go Wrong?
- Slight departures from perfect independence do not prove dependence.
- Since the P-values are not smaller than 0.05, we cannot conclude that the observed differences between males and females are sufficient to disprove independence.
- Don't use unfair or silly averages: this could lead to Simpson's Paradox.
- Do not violate the area principle.
  - Which pie chart do you prefer?
What Can Go Wrong? (cont.)
- Make sure your display shows what it says it shows.
[Figure: "Percentage of high-school students who engage in specified dangerous behaviors"]
What have we learned?
- We can summarize categorical data by counting the number of cases in each category (expressing these as counts or percents).
- We can display the distribution in a bar chart or pie chart.
- We can examine two-way tables called contingency tables, examining marginal and/or conditional distributions of the variables.
- If conditional distributions of one variable are the same for every category of the other, the variables are independent (i.e., not related).