Heads Up! Sept 22 – Oct 4 Probability Perceived by many as a difficult topic Get ready ahead of...


Page 1:

Heads Up! Sept 22 – Oct 4

Probability

Perceived by many as a difficult topic

Get ready ahead of time

Page 2:

Last Time:

Least Squares Regression

(Simple Linear Regression)

Correlation

Page 3:

In Least-Squares Regression:

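$$a = \bar{Y} - b\bar{X}, \qquad b = \frac{\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N}(X_i - \bar{X})^2}$$

Computational Formula:

$$b = \frac{N\sum_{i=1}^{N} X_iY_i - \left(\sum_{i=1}^{N} X_i\right)\left(\sum_{i=1}^{N} Y_i\right)}{N\sum_{i=1}^{N} X_i^2 - \left(\sum_{i=1}^{N} X_i\right)^2}$$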
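As a quick check, the two slope formulas can be compared in a short Python sketch. The function names `slope_deviation`, `slope_computational` and `intercept` are hypothetical and chosen only for this illustration. The toy data lie exactly on Y = 2X + 1, so both versions should recover b = 2 and a = 1.

```python
def slope_deviation(xs, ys):
    """Slope b from the definitional (deviation-score) formula."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def slope_computational(xs, ys):
    """Slope b from the computational formula (raw sums only)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


def intercept(xs, ys, b):
    """Intercept a = Y-bar - b * X-bar."""
    return sum(ys) / len(ys) - b * sum(xs) / len(xs)


# Toy data lying exactly on Y = 2X + 1.
xs = [1, 2, 3, 4, 5]
ys = [2 * x + 1 for x in xs]

b_dev = slope_deviation(xs, ys)
b_comp = slope_computational(xs, ys)
a = intercept(xs, ys, b_dev)
print(b_dev, b_comp, a)  # prints 2.0 2.0 1.0
```

The computational version needs only the five column sums. That is why it suits hand calculation from a table of X, X², Y, Y² and XY.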

Page 4:

$$b = \frac{N\sum_{i=1}^{N} X_iY_i - \left(\sum_{i=1}^{N} X_i\right)\left(\sum_{i=1}^{N} Y_i\right)}{N\sum_{i=1}^{N} X_i^2 - \left(\sum_{i=1}^{N} X_i\right)^2}$$

Can we do this?

X       X Squared   Y       Y Squared   XY
86      7396        82.6    6822.76     7103.6
109.3   11946.49    112.6   12678.76    12307.18
73.3    5372.89     70      4900        5131
80.6    6496.36     76.6    5867.56     6173.96
86.6    7499.56     84      7056        7274.4
85.3    7276.09     86      7396        7335.8
83.3    6938.89     82.6    6822.76     6880.58
78.6    6177.96     81.3    6609.69     6390.18
92      8464        86.6    7499.56     7967.2
76      5776        75.3    5670.09     5722.8

Totals: X = 851, X Squared = 73,344.24, Y = 837.6, Y Squared = 71,323.18, XY = 72,286.7
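The Totals row can be reproduced with a few lines of Python. This minimal sketch assumes the ten (X, Y) pairs are exactly the rows of the table. The variable names `practice` and `trial` anticipate the PRACTICE/TRIAL labels used later in the lecture.

```python
# The ten (X, Y) pairs from the table: X = PRACTICE, Y = TRIAL.
practice = [86, 109.3, 73.3, 80.6, 86.6, 85.3, 83.3, 78.6, 92, 76]
trial = [82.6, 112.6, 70, 76.6, 84, 86, 82.6, 81.3, 86.6, 75.3]

# One entry per column of the table.
totals = {
    "X": sum(practice),
    "X Squared": sum(x * x for x in practice),
    "Y": sum(trial),
    "Y Squared": sum(y * y for y in trial),
    "XY": sum(x * y for x, y in zip(practice, trial)),
}
for column, value in totals.items():
    print(f"{column:>9}: {value:,.2f}")
```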

Page 5:

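Calculating the Least-Squares Regression Line (contd.)

$$b = \frac{10(72{,}286.70) - (851)(837.6)}{10(73{,}344.24) - 851^2} = \frac{722{,}867 - 712{,}797.6}{733{,}442.4 - 724{,}201} = \frac{10{,}069.4}{9{,}241.4} \approx 1.09$$

$$a = \bar{Y} - b\bar{X} = \frac{837.6}{10} - 1.09\left(\frac{851}{10}\right) \approx -9$$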
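Here is the same arithmetic as a Python sketch that plugs in the column totals from the table (N = 10). If the slope is kept unrounded, the intercept comes out at about -8.96, which rounds to the -9 used on the slides.

```python
# Column totals from the table (N = 10 cases).
n = 10
sum_x, sum_x2 = 851, 73344.24
sum_y, sum_xy = 837.6, 72286.7

# Computational formula for the slope.
numerator = n * sum_xy - sum_x * sum_y    # 722,867 - 712,797.6 = 10,069.4
denominator = n * sum_x2 - sum_x ** 2     # 733,442.4 - 724,201 = 9,241.4
b = numerator / denominator

# Intercept a = Y-bar - b * X-bar, using the unrounded slope.
a = sum_y / n - b * (sum_x / n)
print(f"b = {b:.4f}, a = {a:.2f}")
```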

Page 6:

Slope is 1.09

Intercept is -9

You can't see it in this graph.

Regression Equation: TRIAL = 1.09 PRACTICE - 9

Page 7:

A view from further away…

[Figure: plot of Y against X, with both axes starting at 0]

Page 8:

Look at the residuals:

[Figure: Residual Plot, residuals against X]

We want a shotgun-blast shape, i.e., a random blob.
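To see what goes into such a plot, the residuals can be computed directly. This sketch assumes the plot shows the PRACTICE/TRIAL regression from the earlier slides, since the slide does not name the data. It fits the line with the deviation formula and lists each residual, Y_i - (a + bX_i).

```python
practice = [86, 109.3, 73.3, 80.6, 86.6, 85.3, 83.3, 78.6, 92, 76]   # X
trial = [82.6, 112.6, 70, 76.6, 84, 86, 82.6, 81.3, 86.6, 75.3]      # Y

n = len(practice)
x_bar, y_bar = sum(practice) / n, sum(trial) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(practice, trial))
     / sum((x - x_bar) ** 2 for x in practice))
a = y_bar - b * x_bar

# Residual = observed Y minus predicted Y; the residual plot shows these against X.
residuals = [y - (a + b * x) for x, y in zip(practice, trial)]
for x, e in zip(practice, residuals):
    print(f"X = {x:6.1f}   residual = {e:+.2f}")

# Least-squares residuals sum to zero (up to floating-point error).
print(f"sum of residuals = {sum(residuals):.1e}")
```

Plotting `residuals` against `practice` gives the residual plot. A patternless scatter around zero is the "random blob" we want.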

Page 9:

Look at Residuals & Line Fit

[Figures: Residual Plot; Line Fit Plot]

Problem: the relationship is not linear.

Page 10:

Look at Residuals & Line Fit

[Figure: Residual Plot]

Problem: predictions are very precise for small predicted values but very imprecise for large predicted values, because the spread of the residuals grows with the predicted value. (Not good.)

Page 11:

Look at Residuals

[Figure: Residual Plot, horizontal axis from 1 to 12]

Problem: lurking (third) variables (?)

Here: a seasonal trend?

Page 12:

Correlation: How strong is the linear relationship between two variables X and Y?

Slope in the regression of the standardized variables

$$Z_X = \frac{X - \bar{X}}{S_X}, \qquad Z_Y = \frac{Y - \bar{Y}}{S_Y}$$

This slope tells me how much a given change (in standardized units) of X translates into a change (in standardized units) of Y.

Page 13:

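Correlation: How strong is the linear relationship between two variables X and Y?

Correlation Coefficient

$$r = \frac{1}{N-1}\sum_{i=1}^{N}\left(\frac{X_i - \bar{X}}{S_X}\right)\left(\frac{Y_i - \bar{Y}}{S_Y}\right)$$

Computational Formula:

$$r = \frac{\sum_{i=1}^{N} X_iY_i - \frac{1}{N}\left(\sum_{i=1}^{N} X_i\right)\left(\sum_{i=1}^{N} Y_i\right)}{(N-1)\,S_X S_Y}$$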
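The two formulas for r can be checked against each other on the PRACTICE/TRIAL data with a small Python sketch; the function names are illustrative. `statistics.stdev` uses N - 1 in the denominator, which matches the 1/(N - 1) factor in the definition.

```python
import statistics

practice = [86, 109.3, 73.3, 80.6, 86.6, 85.3, 83.3, 78.6, 92, 76]   # X
trial = [82.6, 112.6, 70, 76.6, 84, 86, 82.6, 81.3, 86.6, 75.3]      # Y


def r_definitional(xs, ys):
    """Sum of products of z-scores, divided by N - 1."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)   # sample SDs (N - 1)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)


def r_computational(xs, ys):
    """Raw-sum version: (sum XY - (sum X)(sum Y) / N) / ((N - 1) S_X S_Y)."""
    n = len(xs)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - sum(xs) * sum(ys) / n) / ((n - 1) * sx * sy)


r_def = r_definitional(practice, trial)
r_comp = r_computational(practice, trial)
print(f"{r_def:.4f} {r_comp:.4f}")
```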

Page 14:

Properties of Correlation

• Symmetric Measure (You can exchange X and Y and get the same value)

• -1 ≤ r ≤ 1

• -1 is “perfect” negative correlation

• 1 is “perfect” positive correlation

• Not affected by linear transformations of X and Y, except that multiplying one of them by a negative number flips the sign of r

• Measures linear relationship only
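The symmetry and linear-transformation properties are easy to check numerically. The sketch below uses the PRACTICE/TRIAL data and a hypothetical helper, `pearson_r`. It swaps X and Y, then applies positive linear transformations (the constants 3, 5, 2 and -1 are arbitrary), and finally multiplies X by -1, which only flips the sign of r.

```python
import statistics


def pearson_r(xs, ys):
    """Pearson correlation via deviation scores and sample SDs."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)


practice = [86, 109.3, 73.3, 80.6, 86.6, 85.3, 83.3, 78.6, 92, 76]
trial = [82.6, 112.6, 70, 76.6, 84, 86, 82.6, 81.3, 86.6, 75.3]

r = pearson_r(practice, trial)
r_swapped = pearson_r(trial, practice)                    # exchange X and Y
r_rescaled = pearson_r([3 * x + 5 for x in practice],      # positive linear transformations
                       [2 * y - 1 for y in trial])
r_flipped = pearson_r([-x for x in practice], trial)       # negative multiplier on X
print(f"r = {r:.4f}, swapped = {r_swapped:.4f}, "
      f"rescaled = {r_rescaled:.4f}, X negated = {r_flipped:.4f}")
```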

Page 15:

Let's try it out on our data set: X = PRACTICE, Y = TRIAL

Data Set

Case      PRACTICE   TRIAL
CASE 1    86         82.6
CASE 2    109.3      112.6
CASE 3    73.3       70
CASE 4    80.6       76.6
CASE 5    86.6       84
CASE 6    85.3       86
CASE 7    83.3       82.6
CASE 8    78.6       81.3
CASE 9    92         86.6
CASE 10   76         75.3

X       Z_X            Y       Z_Y            Z_X Z_Y
86      0.088816751    82.6    -0.10192166    -0.009052
109.3   2.388183737    112.6   2.533983252    6.0516176
73.3    -1.164486285   70      -1.20900172    1.4078659
80.6    -0.444083753   76.6    -0.62910264    0.2793743
86.6    0.148027918    84      0.021087239    0.0031215
85.3    0.019737056    86      0.196814233    0.0038845
83.3    -0.177633501   82.6    -0.10192166    0.0181047
78.6    -0.641454309   81.3    -0.2161442     0.1386466
92      0.680928421    86.6    0.249532331    0.1699137
76      -0.898036033   75.3    -0.74332518    0.6675328

Sums: X = 851, Y = 837.6, Z_X Z_Y = 8.7310092

Means: X = 85.1, Y = 83.76

r = 8.7310092 / (10 - 1) = 0.9701121

Check this calculation at home!
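To check the calculation at home, here is a minimal Python sketch that rebuilds the z-score table. It assumes, as the table's values imply, that S_X and S_Y are sample standard deviations (N - 1 in the denominator).

```python
import statistics

practice = [86, 109.3, 73.3, 80.6, 86.6, 85.3, 83.3, 78.6, 92, 76]   # X
trial = [82.6, 112.6, 70, 76.6, 84, 86, 82.6, 81.3, 86.6, 75.3]      # Y
n = len(practice)

mean_x, mean_y = statistics.mean(practice), statistics.mean(trial)
sd_x, sd_y = statistics.stdev(practice), statistics.stdev(trial)   # N - 1 denominator

z_x = [(x - mean_x) / sd_x for x in practice]
z_y = [(y - mean_y) / sd_y for y in trial]
products = [zx * zy for zx, zy in zip(z_x, z_y)]

print(f"{'X':>6} {'Z_X':>10} {'Y':>6} {'Z_Y':>10} {'Z_X Z_Y':>10}")
for row in zip(practice, z_x, trial, z_y, products):
    print("{:6.1f} {:10.6f} {:6.1f} {:10.6f} {:10.6f}".format(*row))

r = sum(products) / (n - 1)
print(f"sum of products = {sum(products):.7f}, r = {r:.7f}")
```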

Page 16:

Today

• Finish Theory on Regression

• Pathologies and Traps in Linear Regression and Correlation

• Relationships between Categorical Variables

Page 17:

Regression on Standardized Variables

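$$\hat{Z}_{Y_i} = r\,Z_{X_i} \qquad \text{(slope: } r\text{, intercept: } 0\text{)}$$

$$r = \frac{1}{N-1}\sum_{i=1}^{N} Z_{X_i} Z_{Y_i}$$

$$\frac{\hat{Y}_i - \bar{Y}}{S_Y} = r\,\frac{X_i - \bar{X}}{S_X} \quad\Longrightarrow\quad \hat{Y}_i = \bar{Y} + r\,\frac{S_Y}{S_X}\,(X_i - \bar{X})$$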

Page 18:

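$$\hat{Z}_{Y_i} = r\,Z_{X_i} \quad\Longrightarrow\quad \hat{Y}_i = \bar{Y} + r\,\frac{S_Y}{S_X}\,(X_i - \bar{X}) = \left(\bar{Y} - r\,\frac{S_Y}{S_X}\,\bar{X}\right) + r\,\frac{S_Y}{S_X}\,X_i$$

Compare with $\hat{Y}_i = a + bX_i$?

$$b = r\,\frac{S_Y}{S_X}, \qquad a = \bar{Y} - b\bar{X}$$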
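Here is a numerical check of b = r S_Y / S_X on the PRACTICE/TRIAL data. This sketch computes the slope by un-standardizing the regression of Z_Y on Z_X. It then compares that slope with the least-squares slope from the deviation formula; the two should agree up to floating-point rounding.

```python
import statistics

practice = [86, 109.3, 73.3, 80.6, 86.6, 85.3, 83.3, 78.6, 92, 76]   # X
trial = [82.6, 112.6, 70, 76.6, 84, 86, 82.6, 81.3, 86.6, 75.3]      # Y
n = len(practice)

mx, my = statistics.mean(practice), statistics.mean(trial)
sx, sy = statistics.stdev(practice), statistics.stdev(trial)
sxy = sum((x - mx) * (y - my) for x, y in zip(practice, trial))
r = sxy / ((n - 1) * sx * sy)

# Un-standardizing Z_Y-hat = r * Z_X gives the raw-score slope and intercept.
b_from_r = r * sy / sx
a_from_r = my - b_from_r * mx

# Least-squares slope from the deviation formula, for comparison.
b_ls = sxy / sum((x - mx) ** 2 for x in practice)
print(f"b from r = {b_from_r:.6f}, least-squares b = {b_ls:.6f}, a = {a_from_r:.2f}")
```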

Page 19:

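Given $\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_N$ from $\hat{Y}_i = a + bX_i$:

What is the variance of $\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_N$?

We know that it must be $b^2 S_X^2$ (adding the constant $a$ does not change the variance, and multiplying by $b$ multiplies it by $b^2$).

We also know: $b = r\,\dfrac{S_Y}{S_X}$

Therefore: $b^2 S_X^2 = r^2 S_Y^2$

Thus:

$$\frac{b^2 S_X^2}{S_Y^2} = r^2$$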

Page 20:

Thus:

$$r^2 = \frac{b^2 S_X^2}{S_Y^2} = \frac{\text{variance of predicted } Y\text{'s}}{\text{variance of observed } Y\text{'s}}$$

r² is the proportion of variance of the observed Y's that is accounted for by the regression.

Proportion of Variance explained
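The identity can also be checked numerically. In this sketch, which again uses the PRACTICE/TRIAL data, the variance of the fitted values divided by the variance of the observed Y's should equal r², about .94 for these data.

```python
import statistics

practice = [86, 109.3, 73.3, 80.6, 86.6, 85.3, 83.3, 78.6, 92, 76]   # X
trial = [82.6, 112.6, 70, 76.6, 84, 86, 82.6, 81.3, 86.6, 75.3]      # Y

mx, my = statistics.mean(practice), statistics.mean(trial)
sxy = sum((x - mx) * (y - my) for x, y in zip(practice, trial))
sxx = sum((x - mx) ** 2 for x in practice)
syy = sum((y - my) ** 2 for y in trial)

b = sxy / sxx
a = my - b * mx
r = sxy / (sxx * syy) ** 0.5

predicted = [a + b * x for x in practice]
explained = statistics.variance(predicted) / statistics.variance(trial)
print(f"r^2 = {r ** 2:.4f}, variance(predicted) / variance(observed) = {explained:.4f}")
```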

Page 21:

Thus:

$$r^2 = \frac{b^2 S_X^2}{S_Y^2}$$

r² is the proportion of variance of the observed Y's that is accounted for by the regression.

Proportion of Variance explained

Note: If you exchange X and Y in the regression, you find the same r and r squared

Page 22:

Correlation only checks the magnitude of linear relationships!

It can happen that r = 0 even though X and Y are highly related to each other!

You need to look at the scatter plot and the residual plot to make sure that you don't miss an obvious relationship overlooked by linear regression!

Page 23:

How does a linear regression model $\hat{Y}_i = a + bX_i$ approximate $Y = X^2$ (for X = 1, 2, …, 15)?

[Figure: Y = X-squared Line Fit Plot, Y against X]

[Figure: Y = X-squared Residual Plot, residuals against X]

For these particular data the regression model finds a = -45, b = 16.

The residuals have a systematic trend!!

This linear regression is inappropriate!!
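The slide's numbers can be reproduced with a short Python sketch; the helper `fit_line` is illustrative. It fits the least-squares line to Y = X² for X = 1, …, 15 and prints the sign of each residual. The U-shaped sign pattern is the systematic trend visible in the residual plot.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b (deviation formula)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


xs = list(range(1, 16))        # X = 1, 2, ..., 15
ys = [x ** 2 for x in xs]      # Y = X squared
a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"a = {a:.2f}, b = {b:.2f}")   # prints a = -45.33, b = 16.00
# Residual signs: + at both ends, - in the middle (a systematic U shape).
signs = "".join("+" if e > 0 else "-" for e in residuals)
print(signs)                         # prints +++---------+++
```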

Page 24:

How does a linear regression model $\hat{Y}_i = a + bX_i$ approximate $Y = X^2$ (for X = -8, -7, …, 7, 8)?

For these particular data the regression model finds a = 24, b = 0.

The residuals have a systematic trend!!

This linear regression is inappropriate!!

[Figure: Y = X-squared Line Fit Plot, Y against X]

[Figure: Y = X-squared Residual Plot, residuals against X]

Page 25:

How does a linear regression model $\hat{Y}_i = a + bX_i$ approximate $Y = X^2$ (for X = -8, -7, …, 7, 8)?

For these particular data the regression model finds a = 24, b = 0, and r = 0.

[Figure: Y = X-squared Line Fit Plot, Y against X]

Correlation is zero: no LINEAR relationship.

Is there “no relationship” between X and Y?

There is an extremely strong (nonlinear) relationship here!
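Here is a Python sketch of the symmetric case; the helper name `fit_line_and_r` is hypothetical. For X = -8, …, 8 and Y = X², the covariance between X and Y is exactly zero, so b = 0 and r = 0. Yet correlating Y with X² (rather than with X) gives r = 1.

```python
import math


def fit_line_and_r(xs, ys):
    """Least-squares intercept a, slope b, and correlation r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx
    return my - b * mx, b, sxy / math.sqrt(sxx * syy)


xs = list(range(-8, 9))        # X = -8, -7, ..., 7, 8
ys = [x ** 2 for x in xs]      # Y is completely determined by X
a, b, r = fit_line_and_r(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}, r = {r:.2f}")   # prints a = 24.00, b = 0.00, r = 0.00

# Correlating Y with X squared instead of X reveals the perfect relationship.
_, _, r_with_x2 = fit_line_and_r([x ** 2 for x in xs], ys)
print(f"r between Y and X squared = {r_with_x2:.2f}")
```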

Page 26:

How does a linear regression model $\hat{Y}_i = a + bX_i$ approximate $Y = \ln(X)$ (for X = 1, 2, …, 15)?

For these particular data the regression model finds a = .54, b = .16.

The residuals have a systematic trend!!

This linear regression is inappropriate!!

[Figure: Y = ln(X) Line Fit Plot, Y against X]

[Figure: Y = ln(X) Residual Plot, residuals against X]
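The same check for Y = ln(X) can be done with a Python sketch. The unrounded estimates come out near a = .546 and b = .164, close to the values on the slide. The residuals are negative at both ends and positive in the middle, the opposite pattern to the Y = X² case.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept a and slope b (deviation formula)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


xs = list(range(1, 16))               # X = 1, 2, ..., 15
ys = [math.log(x) for x in xs]        # Y = ln(X), natural logarithm
a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"a = {a:.3f}, b = {b:.3f}")
# Residual signs: - at both ends, + in the middle (an inverted U).
print("".join("+" if e > 0 else "-" for e in residuals))
```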

Page 27:

Correlation is not Causation!

Correlation between the size of your big toe and your performance on reading tasks is highly positive!

??

Lurking Third Variable: AGE

Page 28:

Correlation is not Causation!

Only experimentation allows us to attribute causation to the relationship between independent and dependent variables.

Page 29:

Ecological Correlation: correlations between averages are typically higher than correlations between individuals

[Figures: scatter plot of individual Y against X; scatter plot of Y group averages against X group averages]
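A small simulation illustrates the effect; all numbers in it are made up for illustration. Twenty groups share a group-level effect on both X and Y, while each individual adds a lot of independent noise. Averaging within groups cancels most of the individual noise, so the correlation of the group averages comes out much higher than the correlation between individuals.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation from deviation scores."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)              # fixed seed so the run is reproducible
n_groups, group_size = 20, 100

ind_x, ind_y = [], []               # individual-level data
avg_x, avg_y = [], []               # one average per group
for _ in range(n_groups):
    g = rng.gauss(0, 1)             # group effect shared by X and Y
    xs = [g + rng.gauss(0, 2) for _ in range(group_size)]   # large individual noise
    ys = [g + rng.gauss(0, 2) for _ in range(group_size)]
    ind_x.extend(xs)
    ind_y.extend(ys)
    avg_x.append(statistics.fmean(xs))
    avg_y.append(statistics.fmean(ys))

r_individuals = pearson_r(ind_x, ind_y)
r_groups = pearson_r(avg_x, avg_y)
print(f"individuals: r = {r_individuals:.2f}; group averages: r = {r_groups:.2f}")
```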

Page 30:

Problem of Restricted Range

[Figure: Success in Graduate School plotted against GRE scores, with regions labeled “Strong Linear Relationship” and “No Linear Relationship”]
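Here is a simulated sketch of the restricted-range problem; the data are hypothetical, not real GRE scores. X and Y are generated with a population correlation of .8. Then only cases with X more than 1 standard deviation above the mean are kept, mimicking a program that admits only high scorers. The correlation within that restricted group is noticeably weaker.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation from deviation scores."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(2)              # fixed seed for reproducibility
# Hypothetical standardized scores: X stands in for GRE, Y for graduate-school success.
xs = [rng.gauss(0, 1) for _ in range(5000)]
ys = [0.8 * x + rng.gauss(0, 0.6) for x in xs]   # population correlation 0.8

r_full = pearson_r(xs, ys)

# Restrict the range: only cases more than 1 SD above the mean on X are "admitted".
admitted = [(x, y) for x, y in zip(xs, ys) if x > 1.0]
r_restricted = pearson_r([x for x, _ in admitted], [y for _, y in admitted])
print(f"full range: r = {r_full:.2f}; restricted range: r = {r_restricted:.2f}")
```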

Page 31:

Extrapolations are Dangerous

[Figure: Number of Passengers plotted against Year]

Page 32:

Regression toward the Mean

The term “Regression” is associated with Sir Francis Galton (1822 – 1911)

Picture taken from http://www.gene.ucl.ac.uk/

Galton (1885), “Regression towards Mediocrity in Hereditary Stature,” Journal of the Anthropological Institute

Page 33:

Regression toward the Mean

Suppose:

X: IQ of father

Y: IQ of son

Correlation between X and Y: r = .60

$$\hat{Z}_Y = .6\,Z_X$$

Page 34:

Regression toward Mediocrity??

X: IQ of father

Y: IQ of son

Correlation between X and Y: r = .60, so $\hat{Z}_Y = .6\,Z_X$

Very intelligent father: $Z_X = 2.0$. We will predict a more mediocre son: $\hat{Z}_Y = .6(2.0) = 1.2$

Very dumb father: $Z_X = -2.0$. We will predict a less dumb son: $\hat{Z}_Y = .6(-2.0) = -1.2$

Predictions are closer to zero (the mean) than the observations!!

Page 35:

[Figure: scatter plot of Z_Y against Z_X for r = .60, with the values 2.0 and 1.2 marked]

Page 36:

[Figure: scatter plot of Z_Y against Z_X for r = .60, with the values 2.0 and 1.2 marked]

Among families where the father is approximately 2 standard deviations above the mean, the average son is only about 1.2 standard deviations above the mean.

Page 37:

Regression toward Mediocrity??

Do the sons just become more similar to each other than their fathers were?

Page 38:

Regression toward Mediocrity??

Variance of $Z_X$: $S_{Z_X}^2 = 1$

Variance of $Z_Y$: $S_{Z_Y}^2 = 1$

Variability of the Z scores is the same!

No slide into mediocrity!!
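Both points can be seen in a simulation. This sketch assumes that father and son z-scores are bivariate normal with r = .60; the lecture does not specify a distribution. Sons of fathers with Z_X near 2.0 average roughly 1.2, yet the standard deviation of the sons' z-scores stays at 1, just like the fathers'.

```python
import math
import random
import statistics


def sd(values):
    """Standard deviation with N in the denominator (fine for a large sample)."""
    m = statistics.fmean(values)
    return math.sqrt(statistics.fmean((v - m) ** 2 for v in values))


rng = random.Random(3)              # fixed seed for reproducibility
r = 0.60
n = 100_000

# Father z-scores are standard normal; son z-scores are built to have SD 1 and corr r.
z_father = [rng.gauss(0, 1) for _ in range(n)]
z_son = [r * zf + math.sqrt(1 - r ** 2) * rng.gauss(0, 1) for zf in z_father]

# Sons of fathers roughly 2 SD above the mean.
sons = [zs for zf, zs in zip(z_father, z_son) if 1.9 <= zf <= 2.1]
mean_son = statistics.fmean(sons)
print(f"average son z-score: {mean_son:.2f} (prediction .6 * 2.0 = 1.2)")

# No slide into mediocrity: both generations have the same spread.
print(f"SD of fathers: {sd(z_father):.3f}, SD of sons: {sd(z_son):.3f}")
```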

Page 39:

Regression toward the mean

When you have a lucky, exceptionally good performance in an exam, you expect to do worse next time, because there is no reason to believe that you will be so exceptionally lucky again.

When you have a mental block and an exceptionally bad performance in an exam, you expect to do better next time, because there is no reason to believe that you will be so exceptionally unlucky again.

This does not mean that you are becoming more and more average as time progresses.

It means that using your average performance as a reasonable predictor of future performance leads to this pattern of relationships between observed and predicted performance.

Page 40:

Regression toward the mean

Your roommate makes a huge mess in your room. You complain. The next few days are cleaner.

Your roommate has cleaned up the room. You praise your roommate. The next few days the room gets dirtier.

Does this mean that punishment leads to better performance and reward leads to worse performance?

No….

Page 41:

Regression toward the mean

Your roommate makes a huge mess in your room. You do nothing. The next few days are cleaner.

Your roommate has cleaned up the room. You do nothing. The next few days the room gets dirtier.

Your roommate simply makes messes, cleans them, makes messes, cleans them …

Your best guess for the future is an “average” level of messiness.

Page 42:

Implications for Research

It is very risky to study anything based on the selection of extreme groups.

Test, retest: extremes become less extreme.

This may look like a treatment effect!
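Here is a minimal simulation of the test-retest trap with hypothetical data. Each score is a stable true ability plus independent luck, and nothing happens between the two measurements. The group selected for extreme first-test scores still averages much lower on the retest, which could easily be mistaken for a treatment effect.

```python
import random
import statistics

rng = random.Random(4)                                 # fixed seed for reproducibility
n = 20_000
ability = [rng.gauss(0, 1) for _ in range(n)]          # stable true ability
first = [t + rng.gauss(0, 1) for t in ability]         # first test: ability + luck
second = [t + rng.gauss(0, 1) for t in ability]        # retest: new luck, no treatment

# "Extreme group": the top 10% on the first test.
cutoff = sorted(first)[int(0.9 * n)]
extreme = [i for i in range(n) if first[i] >= cutoff]

mean_first = statistics.fmean(first[i] for i in extreme)
mean_second = statistics.fmean(second[i] for i in extreme)
print(f"extreme group: first test {mean_first:.2f}, retest {mean_second:.2f}")
```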

Page 43:

Relationships between Categorical Variables

Baby Held   Right-Handed Mother   Left-Handed Mother   Total
Left        212                   25                   237
Right       43                    7                    50
Total       255                   32                   287

Marginal Distributions: the Total row and the Total column

Page 44:

Theory

“Mothers tend to hold their babies with the non-dominant hand,

so that the dominant hand is available to do stuff.”

Page 45:

Relationships between Categorical Variables

Marginal Proportions (Percentages)

Baby held on the left: .826 (82.6%)
Baby held on the right: .174 (17.4%)
Right-handed mother: .889 (88.9%)
Left-handed mother: .111 (11.1%)

The vast majority of babies are held on the left.

The vast majority of mothers are right-handed.
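The marginal counts and proportions can be computed from the four cell counts. In this Python sketch the dictionary layout, (side, handedness) -> count, is an illustrative choice.

```python
# Cell counts from the table: (side the baby is held, mother's handedness) -> count.
counts = {
    ("left", "right-handed"): 212, ("left", "left-handed"): 25,
    ("right", "right-handed"): 43, ("right", "left-handed"): 7,
}
total = sum(counts.values())

# Marginal distributions: sum over the other variable.
side_totals, hand_totals = {}, {}
for (side, hand), count in counts.items():
    side_totals[side] = side_totals.get(side, 0) + count
    hand_totals[hand] = hand_totals.get(hand, 0) + count

for label, count in {**side_totals, **hand_totals}.items():
    print(f"{label:>12}: {count:3d}  proportion {count / total:.3f}")
```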

Page 46:

Relationships between Categorical Variables

Baby Held   Right-Handed Mother   Left-Handed Mother   Total
Left        .895                  .105                 1 (100%)
Right       .860                  .140                 1 (100%)

Conditional proportions, given the side on which the baby is held (absolute sizes not taken into account)

Page 47:

Relationships between Categorical Variables

Baby Held   Right-Handed Mother   Left-Handed Mother
Left        .831                  .781
Right       .169                  .219
Total       1 (100%)              1 (100%)

Conditional proportions, given the dexterity of the mother (absolute sizes not taken into account)
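Both sets of conditional proportions come from dividing each cell by the total of the row or column being conditioned on. This sketch uses a hypothetical helper, `conditional`, whose `given_index` argument picks the variable to condition on: 0 for the side the baby is held, 1 for the mother's handedness.

```python
# Cell counts from the table: (side the baby is held, mother's handedness) -> count.
counts = {
    ("left", "right-handed"): 212, ("left", "left-handed"): 25,
    ("right", "right-handed"): 43, ("right", "left-handed"): 7,
}


def conditional(counts, given_index):
    """Divide each cell by the total of its level of the conditioning variable."""
    totals = {}
    for key, count in counts.items():
        level = key[given_index]
        totals[level] = totals.get(level, 0) + count
    return {key: count / totals[key[given_index]] for key, count in counts.items()}


given_side = conditional(counts, 0)   # proportions within each side (rows sum to 1)
given_hand = conditional(counts, 1)   # proportions within each handedness (columns sum to 1)
for key in counts:
    print(key, f"given side: {given_side[key]:.4f}", f"given handedness: {given_hand[key]:.4f}")
```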

Page 48:

Relationships between Categorical Variables

Baby Held   Right-Handed Mother   Left-Handed Mother
Left        .831                  .781
Right       .169                  .219
Total       1 (100%)              1 (100%)

(Absolute sizes not taken into account.)

For any given dexterity of the mother, there is an overwhelming tendency to hold the baby on the left-hand side.

Page 49:

Segmented Bargraphs

[Figure: Segmented Bargraph of Frequency by side the baby is held (left holding, right holding), with segments for left-handed and right-handed mothers]

Page 50:

Segmented Bargraphs

[Figure: Segmented Bargraph of Frequency by Dexterity (right-handed, left-handed), with segments for right holding and left holding]

Page 51:

Conclusion??

Lurking Third Variable?

Heartbeat helps the baby calm down

Page 52:

Simpson’s Paradox

Business School

Gender   Admit   Deny
Male     480     120
Female   180     20

Law School

Gender   Admit   Deny
Male     10      90
Female   100     200

Page 53:

Simpson’s Paradox

Overall:

Gender   Admit   Deny   Total
Male     490     210    700
Female   280     220    500

Overall conditional proportions per gender:

Gender   Admit   Deny
Male     .70     .30
Female   .56     .44

Men privileged!! Gender discrimination!!
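Pooling the two schools is just cell-by-cell addition of the Business School and Law School counts, as this Python sketch shows. The (school, gender) dictionary layout is an illustrative choice. The sketch reproduces the overall counts and admission rates above.

```python
# (school, gender) -> (admitted, denied), from the Business School and Law School tables.
data = {
    ("business", "male"): (480, 120), ("business", "female"): (180, 20),
    ("law", "male"): (10, 90), ("law", "female"): (100, 200),
}

# Pool the schools: add admits and denials per gender.
overall = {}
for (school, gender), (admit, deny) in data.items():
    a, d = overall.get(gender, (0, 0))
    overall[gender] = (a + admit, d + deny)

for gender, (admit, deny) in overall.items():
    rate = admit / (admit + deny)
    print(f"{gender:>6}: {admit} admitted, {deny} denied, admission rate {rate:.2f}")
```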

Page 54:

Simpson’s Paradox

Business School

Gender   Admit   Deny   Total
Male     480     120    600
Female   180     20     200

Gender   Admit   Deny
Male     .80     .20
Female   .90     .10

Women privileged!?!

Law School

Gender   Admit   Deny   Total
Male     10      90     100
Female   100     200    300

Gender   Admit   Deny
Male     .10     .90
Female   .33     .67

Women privileged!?!

Page 55:

Simpson’s Paradox

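Business School

Gender   Admit   Deny   Total
Male     480     120    600
Female   180     20     200

Gender   Admit   Deny
Male     .80     .20
Female   .90     .10

Law School

Gender   Admit   Deny   Total
Male     10      90     100
Female   100     200    300

Gender   Admit   Deny
Male     .10     .90
Female   .33     .67

However: the admission rate is much higher in the male-dominated discipline (Business School: 660 of 800 applicants admitted, versus 110 of 400 in Law School).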
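The paradox disappears once the overall rate is written as a weighted average of the school-level rates. The weights are the shares of each gender's applicants who applied to each school. In this sketch, men's overall rate is dominated by the high-admission Business School (600 of 700 male applicants). Most women applied to Law School (300 of 500), where admission is hard for everyone. The helper `rate` is illustrative.

```python
# (school, gender) -> (admitted, denied), from the tables above.
data = {
    ("business", "male"): (480, 120), ("business", "female"): (180, 20),
    ("law", "male"): (10, 90), ("law", "female"): (100, 200),
}


def rate(admit, deny):
    """Admission rate = admitted / applicants."""
    return admit / (admit + deny)


overall_rate = {}
for gender in ("male", "female"):
    applicants = {s: sum(data[(s, gender)]) for s in ("business", "law")}
    n = sum(applicants.values())
    # Overall rate = sum over schools of (share of applicants) * (school admission rate).
    overall_rate[gender] = sum(applicants[s] / n * rate(*data[(s, gender)]) for s in applicants)
    shares = ", ".join(f"{s} {applicants[s] / n:.0%}" for s in applicants)
    print(f"{gender:>6}: applied to {shares}; overall rate {overall_rate[gender]:.2f}")
```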