Section 4.4: Simpson’s Paradox
Section 4.5: Linearizing an Association Between Two Variables by Performing a Mathematical Transformation


Page 1:

Section 4.4: Simpson’s Paradox

Section 4.5: Linearizing an association between two variables by performing a mathematical transformation

Page 2:

Consider the following study: accident rates in California. A study showed that male teenagers have twice the accident rate of female teenagers.

                           Male     Female
Proportion of accidents:   0.162    0.075

The study did not take into account the confounding variable: number of miles driven per year!

                                                    Male     Female
Accident rate                                       0.162    0.075
Average number of miles per person                  9,557    4,643
Average number of accidents per 100,000 miles       1.78     1.77

This more accurate study shows NO DIFFERENCE! The higher proportion of accidents for male teenagers is explained away by the fact that men typically drive more.

This is an example of Simpson’s paradox!!

Simpson’s Paradox

Page 3:

Another example: medical study of a treatment: http://qjmed.oxfordjournals.org/cgi/content/full/95/4/247

Table 1. Number of patients responding to treatment A vs. treatment B (A is better than B):

              Response   No response   Response rate
Treatment A   20         20            20/40 = 50%
Treatment B   16         24            16/40 = 40%

Table 2. Number of patients with high serum X responding to treatment A vs. treatment B (in this subgroup, B is better than A):

              Response   No response   Response rate
Treatment A   18         12            18/30 = 60%
Treatment B   7          3             7/10 = 70%

Table 3. Number of patients with low serum X responding to treatment A vs. treatment B (in this subgroup too, B is better than A):

              Response   No response   Response rate
Treatment A   2          8             2/10 = 20%
Treatment B   9          21            9/30 = 30%
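The reversal in Tables 1 to 3 can be checked directly from the counts. The following is a minimal Python sketch; the dictionary layout and the helper names (`response_rate`, `pooled_counts`, `better_treatment`) are illustrative choices, not part of the original study.

```python
# Counts (responders, non-responders) copied from Tables 2 and 3.
subgroups = {
    "high serum X": {"A": (18, 12), "B": (7, 3)},
    "low serum X": {"A": (2, 8), "B": (9, 21)},
}

def response_rate(responders, non_responders):
    """Fraction of patients who responded."""
    return responders / (responders + non_responders)

def pooled_counts(treatment):
    """Add up one treatment's counts across all subgroups (this rebuilds Table 1)."""
    responders = sum(groups[treatment][0] for groups in subgroups.values())
    non_responders = sum(groups[treatment][1] for groups in subgroups.values())
    return responders, non_responders

def better_treatment(rate_a, rate_b):
    """Name of the treatment with the higher response rate."""
    return "A" if rate_a > rate_b else "B"

# Winner inside each serum X subgroup.
subgroup_winners = {
    name: better_treatment(response_rate(*g["A"]), response_rate(*g["B"]))
    for name, g in subgroups.items()
}
# Winner after the subgroups are collapsed into one table.
overall_winner = better_treatment(
    response_rate(*pooled_counts("A")), response_rate(*pooled_counts("B"))
)

for name, g in subgroups.items():
    print(f"{name}: A = {response_rate(*g['A']):.0%}, B = {response_rate(*g['B']):.0%}")
print(f"pooled: A = {response_rate(*pooled_counts('A')):.0%}, "
      f"B = {response_rate(*pooled_counts('B')):.0%}")
print("subgroup winners:", subgroup_winners, "| overall winner:", overall_winner)
```

Treatment B wins in both subgroups, yet treatment A wins once the counts are pooled, which is the reversal the tables illustrate.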

Page 4:

Conclusion:

http://qjmed.oxfordjournals.org/cgi/content/full/95/4/247

“Thus, if the patient's serum X level is unknown, treatment A seems to be better, but if serum X is known, treatment B is preferable (and one can better predict the response rate of a patient). This phenomenon is a result of the aggregation of two (or more) subgroups [1]. The numbers of the example are kept simple to demonstrate this phenomenon of severe confounding, but there are a number of real examples in the literature, including the medical literature [2–4]. This aggregation effect can occur in the case of an uneven distribution of a ‘latent variable’ (in this case the serum X level) among the groups studied.”

Page 5:

• Simpson’s Paradox describes a situation in which an association between two variables reverses or disappears when:

• data are collapsed across a sub-classification (in the previous example, across the different serum X levels), so that the aggregated result may not represent what is really happening within each subgroup;

• there is a lurking variable and/or data from groups of unequal size are combined into a single data set. The unequal group sizes, in the presence of a lurking variable, can weight the results incorrectly.
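The weighting effect can be made concrete with the serum X example. Each treatment's overall response rate is a weighted average of its subgroup rates, with weights equal to the share of that treatment's 40 patients in each subgroup. The worked arithmetic below uses only the counts from the treatment tables (high serum X first, low serum X second):

```latex
\begin{align*}
\text{Treatment A: } \tfrac{30}{40}(60\%) + \tfrac{10}{40}(20\%) &= 45\% + 5\% = 50\% \\
\text{Treatment B: } \tfrac{10}{40}(70\%) + \tfrac{30}{40}(30\%) &= 17.5\% + 22.5\% = 40\%
\end{align*}
```

Treatment A's patients sit mostly in the high-response subgroup and treatment B's mostly in the low-response subgroup, so A's overall rate ends up higher even though B has the higher rate within each subgroup.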

Page 6:

Exponential relationship: $y = ab^x$

Nonlinear Regression

Page 7:

Power relationship: $y = ax^b$

Page 8:

• Apply a logarithmic transformation to re-express the exponential and power functions above as linear functions.

• Use the properties of logarithms:

$\log_a (MN) = \log_a M + \log_a N$

$\log_a M^r = r \log_a M$

(M, N, and a are positive real numbers, a ≠ 1, and r is any real number.)

Linearization

Page 9:

$y = ab^x$   Exponential Model

$\log y = \log (ab^x)$   Take the common logarithm of both sides

$\log y = \log a + \log b^x$

$\log y = \log a + x \log b$

$Y = A + Bx$, where $Y = \log y$, $A = \log a$, and $B = \log b$

$b = 10^B$, $a = 10^A$
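The linearization steps for the exponential model can be sketched in Python: regress Y = log y on x, then back-transform the intercept and slope. The function names and the data are hypothetical; the data are generated exactly from y = 3 · 2^x so that the recovered parameters are easy to check.

```python
import math

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_exponential(xs, ys):
    """Fit y = a * b**x by regressing Y = log10(y) on x."""
    log_ys = [math.log10(y) for y in ys]
    A, B = least_squares_line(xs, log_ys)
    return 10 ** A, 10 ** B  # back-transform: a = 10^A, b = 10^B

# Hypothetical, noise-free data generated from y = 3 * 2**x (not from the slides).
xs = [0, 1, 2, 3, 4, 5]
ys = [3 * 2 ** x for x in xs]

a, b = fit_exponential(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Because the data are noise-free, the fit recovers a = 3 and b = 2 up to floating-point rounding. With real data the least-squares line is fitted on the log scale, so the back-transformed curve minimizes squared error in log y, not in y itself.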

Page 10:

$y = ax^b$   Power Model

$\log y = \log (ax^b)$   Take the common logarithm of both sides

$\log y = \log a + \log x^b$

$\log y = \log a + b \log x$

$Y = A + bX$, where $Y = \log y$, $X = \log x$, and $A = \log a$, so $a = 10^A$

Page 11:

Example: The statistics of poverty and inequality

Data from the U.N.E.S.C.O. 1990 Demographic Year Book. For 97 countries in the world, data are given for birth rates and for an index of the Gross National Product.

Exponential relation!

Page 12:

Linearization using LOG function:

The previous plot shows a nonlinear association. We can make it linear by applying the natural-log transformation to GNP.

[Plot: Birth rate vs. Log G.N.P.]
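The effect of log-transforming the predictor can be illustrated with a small Python sketch. The U.N.E.S.C.O. values are not reproduced in these slides, so the numbers below are synthetic: they are generated exactly from an assumed relationship, birth rate = 60 − 5 ln(GNP), chosen only to show how the correlation changes after the transformation.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic values, NOT the U.N.E.S.C.O. data: y is generated as 60 - 5 * ln(x).
gnp = [100, 300, 1000, 3000, 10000, 30000]
birth_rate = [60 - 5 * math.log(g) for g in gnp]

r_raw = correlation(gnp, birth_rate)                         # birth rate vs. GNP
r_log = correlation([math.log(g) for g in gnp], birth_rate)  # birth rate vs. ln(GNP)

print(f"r (raw GNP) = {r_raw:.3f}")
print(f"r (ln GNP)  = {r_log:.3f}")
```

On these synthetic values the correlation with ln(GNP) is essentially −1, while the correlation with raw GNP is noticeably weaker, because the raw relationship is curved.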

Page 13:

EXAMPLE Finding the Curve of Best Fit to a Power Model

Cathy wishes to measure the relation between a light bulb’s intensity and the distance from some light source. She measures a 40-watt light bulb’s intensity 1 meter from the bulb and at 0.1-meter intervals up to 2 meters from the bulb and obtains the following data.

Distance (meters)   Intensity
1.0                 0.0972
1.1                 0.0804
1.2                 0.0674
1.3                 0.0572
1.4                 0.0495
1.5                 0.0433
1.6                 0.0384
1.7                 0.0339
1.8                 0.0294
1.9                 0.0268
2.0                 0.0224

Page 14:

(a) Draw a scatter diagram of the data treating the distance, x, as the predictor variable.

(b) Determine X = log x and Y = log y and draw a scatter diagram treating X = log x as the predictor variable and Y = log y as the response variable. Comment on the shape of the scatter diagram.

(c) Find the least-squares regression line of the transformed data.

(d) Determine the power equation of best fit and graph it on the scatter diagram obtained in part (a).

(e) Use the power equation of best fit to predict the intensity of the light if you stand 2.3 meters away from the bulb.
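Parts (b) through (e) can be sketched numerically in Python (the scatter diagrams are omitted). The data are copied from the table in the example; the function and variable names are illustrative, and the result is one least-squares fit on the log-log scale, not necessarily the exact figures a particular calculator or textbook would report.

```python
import math

# Distance (meters) and intensity, copied from the data table in the example.
distance = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]
intensity = [0.0972, 0.0804, 0.0674, 0.0572, 0.0495, 0.0433,
             0.0384, 0.0339, 0.0294, 0.0268, 0.0224]

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# (b) Transform both variables with the common logarithm.
X = [math.log10(x) for x in distance]
Y = [math.log10(y) for y in intensity]

# (c) Least-squares line Y = A + b X on the transformed data.
A, b = least_squares_line(X, Y)

# (d) Back-transform the intercept to get the power model y = a * x**b.
a = 10 ** A

def predict_intensity(x):
    """Predicted intensity at distance x (meters) from the fitted power model."""
    return a * x ** b

# (e) Predict the intensity 2.3 meters from the bulb.
prediction = predict_intensity(2.3)

print(f"Y = {A:.4f} + ({b:.4f}) X")
print(f"y = {a:.4f} * x^({b:.4f})")
print(f"predicted intensity at 2.3 m: {prediction:.4f}")
```

The fitted exponent comes out close to −2, in line with intensity falling off roughly as the inverse square of distance. Note that 2.3 meters lies outside the observed range of 1 to 2 meters, so the prediction in part (e) is an extrapolation.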
