Statistics That Deceive

Simpson’s Paradox

It is a widely accepted rule that the larger the data set, the better.

Simpson’s Paradox demonstrates that a great deal of care has to be taken when combining smaller data sets into a larger one.

Sometimes the conclusion drawn from the larger data set is the opposite of the conclusions drawn from the smaller data sets.

Example: Simpson’s Paradox

Baseball batting statistics for two players:

Player        First Half   Second Half   Total Season
Carson        .400         .250          .264
Kennington    .350         .200          .336

How could Carson beat Kennington for both halves individually, but then have a lower total season batting average?

Example Continued

We weren’t told how many at bats each player had:

Player        First Half       Second Half      Total Season
Carson        4/10 (.400)      25/100 (.250)    29/110 (.264)
Kennington    35/100 (.350)    2/10 (.200)      37/110 (.336)

Carson’s dismal second half and Kennington’s great first half had higher weights than the other two values: each covers 100 at bats instead of 10, so it dominates that player’s season total.
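The weighting can be checked directly from the hits and at bats in the table. The following is a minimal sketch, not part of the original slides; the names `season` and `season_average` are made up for illustration.

```python
# (hits, at_bats) for the first and second half, copied from the table
season = {
    "Carson": [(4, 10), (25, 100)],
    "Kennington": [(35, 100), (2, 10)],
}

def season_average(halves):
    """Season batting average: total hits divided by total at bats."""
    total_hits = sum(hits for hits, _ in halves)
    total_at_bats = sum(at_bats for _, at_bats in halves)
    return total_hits / total_at_bats

for player, halves in season.items():
    per_half = [round(hits / at_bats, 3) for hits, at_bats in halves]
    print(player, per_half, round(season_average(halves), 3))
# Carson [0.4, 0.25] 0.264
# Kennington [0.35, 0.2] 0.336
```

Carson wins each half, but his season total is built almost entirely from his weak second half, while Kennington's is built almost entirely from his strong first half.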

Another Example

Average college physics grades for students in an engineering program:

                      HS Physics   No HS Physics
Number of Students    50           5
Average Grade         80           70

Average college physics grades for students in a liberal arts program:

                      HS Physics   No HS Physics
Number of Students    5            50
Average Grade         95           85

It appears that in both classes, taking high school physics improves your college physics grade by 10 points.

Example continued

In order to get better results, let’s combine our datasets.

In particular, let’s combine all the students that took high school physics.

More precisely, combine the students in the engineering program that took high school physics with those students in the liberal arts program that took high school physics.

Likewise, combine the students in the engineering program that did not take high school physics with those students in the liberal arts program that did not take high school physics.

But be careful! You can’t just take the average of the two averages, because each dataset has a different number of values.
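To see why the average of the two averages is wrong, compare it with an average weighted by the number of students. This is an illustrative sketch using the counts for students who took high school physics; `weighted_mean` is a hypothetical helper, not something from the slides.

```python
from statistics import mean

# (number of students, average grade) for students who took HS physics
took_hs_physics = [(50, 80), (5, 95)]  # engineering, liberal arts

def weighted_mean(groups):
    """Average of group averages, each weighted by its group size."""
    total_students = sum(n for n, _ in groups)
    return sum(n * avg for n, avg in groups) / total_students

naive = mean(avg for _, avg in took_hs_physics)  # ignores group sizes
weighted = weighted_mean(took_hs_physics)        # respects group sizes

print(naive)               # 87.5
print(round(weighted, 2))  # 81.36
```

The unweighted figure treats 5 liberal arts students as if they counted as much as 50 engineering students, which is why it lands about six points too high.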

Example continued

Average college physics grades for students who took high school physics:

Program        # Students   Grade   Weighted Grade
Engineering    50           80      50/55 * 80 = 72.73
Lib Arts       5            95      5/55 * 95 = 8.64
Total          55
Average                             72.73 + 8.64 ≈ 81.4

Average college physics grades for students who did not take high school physics:

Program        # Students   Grade   Weighted Grade
Engineering    5            70      5/55 * 70 = 6.36
Lib Arts       50           85      50/55 * 85 = 77.27
Total          55
Average                             6.36 + 77.27 ≈ 83.6

Did the students that did not have high school physics actually do better?

Example another way

Average college physics grades for students who took high school physics:

Program        # Students   Grade   Grade Pts
Engineering    50           80      4000
Lib Arts       5            95      475
Total          55                   4475
Average                             4475/55 ≈ 81.4

Average college physics grades for students who did not take high school physics:

Program        # Students   Grade   Grade Pts
Engineering    5            70      350
Lib Arts       50           85      4250
Total          55                   4600
Average                             4600/55 ≈ 83.6

Did the students that did not have high school physics actually do better?
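The combined averages can be cross-checked by dividing total grade points by total students. The sketch below does this for both groups; the variable and function names are invented for illustration.

```python
# program -> (number of students, average grade), from the tables
took_hs = {"Engineering": (50, 80), "Lib Arts": (5, 95)}
no_hs = {"Engineering": (5, 70), "Lib Arts": (50, 85)}

def combined_average(groups):
    """Total grade points divided by total number of students."""
    total_points = sum(n * grade for n, grade in groups.values())
    total_students = sum(n for n, _ in groups.values())
    return total_points / total_students

avg_took = combined_average(took_hs)  # 4475 / 55
avg_no = combined_average(no_hs)      # 4600 / 55

print(round(avg_took, 1), round(avg_no, 1))  # 81.4 83.6
print(avg_no > avg_took)                     # True
```

The arithmetic is correct, and the combined "no high school physics" group really does score higher, even though within each program the high school physics students are 10 points ahead.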

The Problem

Two problems with combining the data:

1. There was a larger percentage of one type of student in each table.

2. The engineering students had a more rigorous physics class than the liberal arts students, thus there is a hidden variable.

So be very careful when you combine data into a larger set.
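Both problems show up when the combined groups are broken back down by program. The following sketch, which assumes the counts and averages from the two program tables and uses hypothetical helper names, reports how much of each combined group is engineering students and what the grade gap is inside each program.

```python
# (program, took HS physics?) -> (number of students, average grade)
data = {
    ("Engineering", True): (50, 80),
    ("Engineering", False): (5, 70),
    ("Liberal Arts", True): (5, 95),
    ("Liberal Arts", False): (50, 85),
}

def engineering_share(took_hs_physics):
    """Fraction of a combined group that comes from the engineering program."""
    eng = data[("Engineering", took_hs_physics)][0]
    lib = data[("Liberal Arts", took_hs_physics)][0]
    return eng / (eng + lib)

def gap_within(program):
    """HS-physics average minus no-HS-physics average inside one program."""
    return data[(program, True)][1] - data[(program, False)][1]

print(round(engineering_share(True), 2))   # 0.91
print(round(engineering_share(False), 2))  # 0.09
print(gap_within("Engineering"), gap_within("Liberal Arts"))  # 10 10
```

About 91% of the combined high school physics group sat the harder engineering class, against about 9% of the other group, so the combined comparison mostly measures which class a student took rather than the effect of high school physics.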

More …

There are many real examples of this type of situation, which leads to an apparent contradiction.

The deceptive result is based on this [remember this]: If you view the same data in 2 different ways or break it into 2 different parts, you CAN get different results!