. Please start your Daily Portfolio Introduction to Statistics for the Social Sciences SBS200,...


Page 1: . Please start your Daily Portfolio Introduction to Statistics for the Social Sciences SBS200, COMM200, GEOG200, PA200, POL200, or SOC200 Lecture Section.

Please start your Daily Portfolio

Page 2:

Introduction to Statistics for the Social Sciences

SBS200, COMM200, GEOG200, PA200, POL200, or SOC200
Lecture Section 001, Summer Session II, 2013

9:00 - 11:20am Monday - Friday
Room 312 Social Sciences (Mondays – Thursdays)

Room 480 Marshall Building (Fridays)

http://www.youtube.com/watch?v=oSQJP40PcGI

Page 3:

My last name starts with a letter somewhere between

A. A – D
B. E – L
C. M – R
D. S – Z

Please click in

Please double-check that all cell phones and other electronic devices are turned off and stowed away

Page 4:

Homework due – Wednesday

On class website: Please print and complete homework worksheet #13

Multiple Regression

Page 5:

Schedule of readings

Before Friday

Please read chapters 10 – 14

Please read Chapters 17 and 18 in Plous
Chapter 17: Social Influences
Chapter 18: Group Judgments and Decisions

Study Guide is online

Page 6:

Next couple of lectures 7/30/13

Use this as your study guide

Simple and Multiple Regression
Using correlation for predictions

r versus r²

Regression uses the predictor variable (independent) to make predictions about the predicted variable (dependent)

Coefficient of correlation is the name for “r”
Coefficient of determination is the name for “r²”

(remember it is always positive – no direction info)

Standard error of the estimate is our measure of the variability of the dots around the regression line

(average deviation of each data point from the regression line – like standard deviation)

Coefficient of regression will be “b” for each variable (like slope)

Page 7:

Other Problems

Review

Prediction line: Y’ = a + b1X1 (a is the Y-intercept, b1 is the slope)

The expected frequency of teeth brushing for having one cavity is

Frequency of teeth brushing = 5.5 + (-.91) Cavities

If “Cavities” = 3, what is the prediction for “Frequency of teeth brushing”?

Frequency of teeth brushing = 5.5 + (-.91) Cavities
Frequency of teeth brushing = 5.5 + (-.91) (3)
Frequency of teeth brushing = 5.5 + (-2.73) = 2.77
Plot the point (3.0, 2.77)

If number of cavities = 3, frequency of teeth brushing will be 2.77
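The worked prediction on this slide can be reproduced in a few lines of Python. This is only a minimal sketch: the function name `predict_brushing` is an illustrative choice, and the default intercept (5.5) and slope (-0.91) are the values quoted on the slide.

```python
def predict_brushing(cavities, intercept=5.5, slope=-0.91):
    """Return the predicted brushing frequency Y' = a + b1 * X1."""
    return intercept + slope * cavities


if __name__ == "__main__":
    # Three cavities: 5.5 + (-0.91)(3)
    print(round(predict_brushing(3), 2))
```

Calling the function with a different number of cavities gives the matching point on the prediction line.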

Page 8:

Review

Draw a regression line and regression equation

r = -0.85
b1 = -0.91 (slope)
b0 = 5.5 (intercept)

Prediction line: Y’ = b1X1 + b0

Y’ = (-.91)X1 + 5.5

Page 9:

Review

Correlation - let’s predict how often they brushed their teeth

[Scatterplot: number of times per day teeth are brushed (Y axis, 0 to 5) versus number of cavities (X axis, 0 to 5)]

Find prediction line: Y’ = b1 X + b0

Y’ = (-0.91) X + 5.5

Plot line - predict Y’ from X
- Pick an X. Let’s try X of 1: Y’ = (-0.91) 1 + 5.5 = 4.59 (plot 1, 4.59)
- Pick another X. Let’s try X of 5: Y’ = (-0.91) 5 + 5.5 = 0.95 (plot 5, 0.95)

Page 10:

Review

r = -0.85
b1 = -0.91
b0 = 5.5

Y’ = b1 X + b0

Y’ = (-0.91) X + 5.5

Y’ = (-0.91) 1 + 5.5 = 4.59

Y’ = (-0.91) 2 + 5.5 = 3.68

Y’ = (-0.91) 3 + 5.5 = 2.77

Y’ = (-0.91) 5 + 5.5 = .95

X  Y
1  5
3  4
2  3
3  2
5  1

[Scatterplot: number of times per day teeth are brushed (Y axis, 0 to 5) versus number of cavities (X axis, 0 to 5)]
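As a rough check on the slide's r, b1, and b0, the five (X, Y) pairs from the table can be run through the usual least-squares formulas. This is a sketch under stated assumptions: the helper name `fit_line` is hypothetical, and the data are the table values read as (cavities, brushings) pairs.

```python
from math import sqrt

# (X, Y) pairs from the table: number of cavities, brushings per day
cavities = [1, 3, 2, 3, 5]
brushings = [5, 4, 3, 2, 1]


def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = fit_line(cavities, brushings)
print(round(slope, 2), round(intercept, 2), round(r, 2))
```

The slope and r round to the slide's -0.91 and -0.85. The intercept comes out near 5.545, which the slides show as 5.5.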

Page 11:

Correlation - Evaluating the prediction line

Does the prediction line perfectly predict the Ys from the Xs?

No, let’s see

How much “error” is there? Exactly?

Prediction line: Y’ = b1X1 + b0

Y’ = (-.91)X1 + 5.5

[Scatterplot with prediction line: number of times per day teeth are brushed (Y axis, 0 to 5) versus number of cavities (X axis, 0 to 5)]

Residuals

The green lines show how much “error” there is in our prediction line… how much we are wrong in our predictions

Page 12:

Correlation

Perfect correlation = +1.00 or -1.00

The more closely the dots approximate a straight line (the less spread out they are), the stronger the relationship is.

One variable perfectly predicts the other

No variability in the scatterplot

The dots approximate a straight line

Any Residuals?

Page 13:

How well does the prediction line predict the Ys from the Xs?

[Scatterplot with prediction line: number of times per day teeth are brushed (Y axis, 0 to 5) versus number of cavities (X axis, 0 to 5)]

Residuals

• Shorter green lines suggest better prediction – smaller error

• Longer green lines suggest worse prediction – larger error

• Why are green lines vertical? Remember, we are predicting the variable on the Y axis. So, error would be how we are wrong about Y (vertical)

A note about curvilinear relationships and patterns of the residuals

Page 14:

How well does the prediction line predict the Ys from the Xs?

[Scatterplot with prediction line: number of times per day teeth are brushed (Y axis, 0 to 5) versus number of cavities (X axis, 0 to 5)]

Residuals

• Slope doesn’t give “variability” info
• Intercept doesn’t give “variability” info
• Correlation “r” does give “variability” info
• Residuals do give “variability” info

Page 15:

What if we want to know the “average deviation score”? Finding the standard error of the estimate (line)

Standard error of the estimate:

• a measure of the average amount of predictive error
• the average amount that Y’ scores differ from Y scores
• a mean of the lengths of the green lines

Sound familiar??

Page 16:

Correlation - let’s predict how often they brushed their teeth

[Scatterplot: number of times per day teeth are brushed (Y axis, 0 to 5) versus number of cavities (X axis, 0 to 5)]

Find prediction line: Y’ = b1 X + b0

Y’ = (-0.91) X + 5.5

Plot line - predict Y’ from X
- Pick an X. Let’s try X of 1: Y’ = (-0.91) 1 + 5.5 = 4.59 (plot 1, 4.59)
- Pick another X. Let’s try X of 5: Y’ = (-0.91) 5 + 5.5 = 0.95 (plot 5, 0.95)

Page 17:

r = -0.85
b1 = -0.91
b0 = 5.5

Y’ = b1 X + b0

Y’ = (-0.91) X + 5.5

Y’ = (-0.91) 1 + 5.5 = 4.59

Y’ = (-0.91) 2 + 5.5 = 3.68

Y’ = (-0.91) 3 + 5.5 = 2.77

Y’ = (-0.91) 4 + 5.5 = 1.86

Y’ = (-0.91) 5 + 5.5 = .95

These are our “predicted values” for each X score

X  Y  Y’    Y-Y’
1  5  4.59   0.41
3  4  2.77   1.23
2  3  3.68  -0.68
3  2  2.77  -0.77
5  1  0.95   0.05

[Scatterplot with prediction line and residuals labeled .41, 1.23, -.68, -.77, and 0.05: number of times per day teeth are brushed (Y axis, 0 to 5) versus number of cavities (X axis, 0 to 5)]

A note on adding up deviations

Page 18:

r = -0.85
b1 = -0.91
b0 = 5.5

Y’ = b1 X + b0

Y’ = (-0.91) X + 5.5

Y’ = (-0.91) 1 + 5.5 = 4.59

Y’ = (-0.91) 2 + 5.5 = 3.68

Y’ = (-0.91) 3 + 5.5 = 2.77

Y’ = (-0.91) 4 + 5.5 = 1.86

Y’ = (-0.91) 5 + 5.5 = .95

X  Y  Y’    Y-Y’   (Y-Y’)²
1  5  4.59   0.41   0.168
3  4  2.77   1.23   1.513
2  3  3.68  -0.68   0.462
3  2  2.77  -0.77   0.593
5  1  0.95   0.05   .0025

Sum of (Y-Y’)² = 2.739

√(2.739 / 3) ≈ 0.95

[Scatterplot with prediction line and residuals labeled .41, 1.23, -.68, -.77, and 0.05: number of times per day teeth are brushed (Y axis, 0 to 5) versus number of cavities (X axis, 0 to 5)]

This is like our average (or standard) size of our residual: “Standard Error of the Estimate”
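The residual table and the standard error of the estimate can be sketched in Python as follows. The function name `standard_error_of_estimate` is illustrative. The division by n - 2 is an assumption taken from the copier example later in the deck (784.211 divided by 10 - 2). The slope and intercept are the slide's rounded -0.91 and 5.5, so the last digits may differ slightly from a spreadsheet that keeps more decimals.

```python
from math import sqrt

# (X, Y) pairs from the table: number of cavities, brushings per day
cavities = [1, 3, 2, 3, 5]
brushings = [5, 4, 3, 2, 1]


def standard_error_of_estimate(xs, ys, slope, intercept):
    """Return (residuals, sum of squared residuals, standard error of the estimate)."""
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sum_sq = sum(e ** 2 for e in residuals)
    return residuals, sum_sq, sqrt(sum_sq / (len(xs) - 2))


residuals, sum_sq, se = standard_error_of_estimate(cavities, brushings, -0.91, 5.5)
print([round(e, 2) for e in residuals])  # the Y - Y' column
print(round(sum_sq, 3))                  # the sum of the (Y - Y')² column
print(se)                                # the "standard" size of a residual
```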

Page 19:

Is the regression line better than just guessing the mean of the Y variable?

How much does the information about the relationship actually help?

[Two scatterplots: number of times per day teeth are brushed (Y axis, 0 to 5) versus number of cavities (X axis, 0 to 5)]

Which minimizes error better?

How much better does the regression line predict the observed results?

r² Wow!
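One way to make this comparison concrete is to total the squared errors for both strategies on the teeth-brushing data. This is a sketch, not the slide's own calculation: the helper `sum_squared_error` is hypothetical, and the line uses the rounded coefficients from the earlier slides.

```python
# (X, Y) pairs from the earlier table: number of cavities, brushings per day
cavities = [1, 3, 2, 3, 5]
brushings = [5, 4, 3, 2, 1]


def sum_squared_error(ys, predictions):
    """Total of the squared differences between actual and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions))


mean_y = sum(brushings) / len(brushings)

# Strategy 1: always guess the mean of Y
error_mean = sum_squared_error(brushings, [mean_y] * len(brushings))
# Strategy 2: use the prediction line Y' = (-0.91) X + 5.5
error_line = sum_squared_error(brushings, [5.5 - 0.91 * x for x in cavities])

# Share of the squared error removed by using the line instead of the mean
improvement = 1 - error_line / error_mean
print(error_mean, round(error_line, 3), round(improvement, 2))
```

The improvement comes out close to r² for these data (about 0.72 when r is taken as -0.85), which is why r² is read as the proportion of variance accounted for.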

Page 20:

What is r²?

r² = The proportion of the total variance in one variable that is predictable by its relationship with the other variable

Examples

If mother’s and daughter’s heights are correlated with an r = .8, then what amount (proportion or percentage) of variance of mother’s height is accounted for by daughter’s height?

.64 because (.8)² = .64

Page 21:

What is r²?

r² = The proportion of the total variance in one variable that is predictable by its relationship with the other variable

Examples

If mother’s and daughter’s heights are correlated with an r = .8, then what proportion of variance of mother’s height is not accounted for by daughter’s height?

.36 because (1.0 - .64) = .36
or
36% because 100% - 64% = 36%

Page 22:

What is r²?

r² = The proportion of the total variance in one variable that is predictable by its relationship with the other variable

Examples

If ice cream sales and temperature are correlated with an r = .5, then what amount (proportion or percentage) of variance of ice cream sales is accounted for by temperature?

.25 because (.5)² = .25

Page 23:

What is r²?

r² = The proportion of the total variance in one variable that is predictable by its relationship with the other variable

Examples

If ice cream sales and temperature are correlated with an r = .5, then what amount (proportion or percentage) of variance of ice cream sales is not accounted for by temperature?

.75 because (1.0 - .25) = .75
or
75% because 100% - 25% = 75%
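The four r² examples follow the same two steps, which can be sketched as a small Python helper. The name `variance_split` is illustrative only.

```python
def variance_split(r):
    """Return (explained, unexplained) proportions of variance for a correlation r."""
    explained = r ** 2
    return explained, 1 - explained


for label, r in [("heights", 0.8), ("ice cream sales", 0.5)]:
    explained, unexplained = variance_split(r)
    print(f"{label}: {explained:.2f} accounted for, {unexplained:.2f} not accounted for")
```

Because r is squared, a negative correlation of the same size gives the same proportions, which is why r² carries no direction information.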

Page 24:

regression equations

Questions on homework?

Page 26:

+0.92

positive, strong

up

down

The relationship between the hours worked and weekly pay is a strong positive correlation. This correlation is significant, r(3) = 0.92; p < 0.05

6.0857 (slope)

55.286 (intercept)

y' = 6.0857x + 55.286

207.43

85.71

.846231 or about 85%

About 85% of the total variance of “weekly pay” is accounted for by “hours worked”

For each additional hour worked, weekly pay will increase by $6.09

Page 27:

[Scatterplot: Wait Time (Y axis, 280 to 400) versus Number of Operators (X axis, 4 to 8)]

Page 28:

-.73

negative, strong

As number of operators increases, wait time decreases

The relationship between wait time and number of operators working is negative and strong. This correlation is not significant, r(3) = -0.73; n.s.

Critical r = 0.878
No, we do not reject the null

458 (intercept)

-18.5 (slope)

y' = -18.5x + 458

365 seconds

328 seconds

.53695 or 54%

The proportion of total variance of wait time accounted for by number of operators is 54%.

For each additional operator added, wait time will decrease by 18.5 seconds

Page 29:

[Scatterplot: Percent of BAs (Y axis, 21 to 39) versus Median Income (X axis, 45 to 66)]

Page 30:

0.8875

positive, strong

As median income goes up, so does percent of residents who have a BA degree

The relationship between median income and percent of residents with BA degree is strong and positive. This correlation is significant, r(8) = 0.89; p < 0.05.

n = 10, df = 8

Critical r = 0.632
Yes, we reject the null

Percent of residents with a BA degree

3.1819 (intercept)

0.0005 (slope)

y' = 0.0005x + 3.1819

25% of residents

35% of residents

.78766 or about 79%

The proportion of total variance of % of BAs accounted for by median income is about 79%.

For each additional $1 in income, percent of BAs increases by .0005

Page 31:

[Scatterplot: Crime Rate (Y axis, 12 to 30) versus Median Income (X axis, 45 to 66)]

Page 32:

-0.6293

negative, moderate

As median income goes up, crime rate tends to go down

The relationship between crime rate and median income is negative and moderate. This correlation is not significant, r(8) = -0.63; n.s. [0.6293 is not bigger than critical of 0.632].

n = 10, df = 8

Critical r = 0.632
No, we do not reject the null

Crime Rate

4662.5 (intercept)

-0.0499 (slope)

y' = -0.0499x + 4662.5

2,417 thefts

1,418.5 thefts

.396 or 40%

The proportion of total variance of thefts accounted for by median income is 40%.

For each additional $1 in income, thefts go down by .0499

Page 33:

Example of Simple Regression

The manager of a copier company wants to determine whether there is a relationship between the number of sales calls made in a month and the number of copiers sold that month. The manager selects a random sample of 10 representatives and determines the number of sales calls each representative made last month and the number of copiers sold.

Page 34:

Correlation: Independent and dependent variables

• When used for prediction we refer to the predicted variable as the dependent variable and the predictor variable as the independent variable

What are we predicting?

Dependent Variable

Independent Variable

Soni, Mark, Tom, Susan, Jeff, Carlos

Who sold the most copiers?

Who sold the fewest copiers?

Page 35:

Correlation Coefficient – Excel Example

Page 36:

Correlation Coefficient – Excel Example

0.759014

Interpret r = 0.759

• Positive relationship between the number of sales calls and the number of copiers sold.

• Strong relationship

• Remember, we have not demonstrated cause and effect here, only that the two variables—sales calls and copiers sold—are related.

Page 37:

Correlation Coefficient – Excel Example

0.759014

Interpret r = 0.759

• Does this correlation reach significance?

• n = 10, df = 8

• alpha = .05

• Observed r is larger than critical r (0.759 > 0.632) therefore we reject the null hypothesis.

• r (8) = 0.759; p < 0.05
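The decision rule on this slide (compare the observed r with the critical r for df = n - 2) can be sketched in Python. The table below holds only the two critical values quoted in this deck for alpha = .05, and the names `CRITICAL_R_05` and `is_significant` are hypothetical. A full critical-value table would be needed for other sample sizes.

```python
# Critical r values at alpha = .05 as quoted on the slides, keyed by df = n - 2
CRITICAL_R_05 = {3: 0.878, 8: 0.632}


def is_significant(r, n, table=CRITICAL_R_05):
    """Reject the null when |observed r| exceeds the critical r for df = n - 2."""
    df = n - 2
    if df not in table:
        raise KeyError(f"no critical value stored for df = {df}")
    return abs(r) > table[df]


print(is_significant(0.759, 10))    # copier example
print(is_significant(-0.6293, 10))  # crime rate and median income example
```

The absolute value is used because the sign of r only gives the direction of the relationship, not its strength.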

Page 38:

Coefficient of Determination – Excel Example

0.759014

Interpret r² = 0.576 (.759² = .576)

• We can say that 57.6 percent of the variation in the number of copiers sold is explained, or accounted for, by the variation in the number of sales calls.

• Remember, we lose the directionality of the relationship with the r²

Page 39:

Find Regression Equation – Excel Example

Page 40:

Find Regression Equation – Excel Example

Page 41:

Regression Equation - Example

State the regression equation
Y’ = a + bx
Y’ = 18.9476 + 1.1842x

Interpret the slope
Y’ = 18.9476 + 1.1842x
“For each additional sales call made we sell 1.1842 more copiers”

What is the expected number of copiers sold by a representative who made 20 calls?

Solve for some value of Y’
Y’ = 18.9476 + 1.1842 (20)
Y’ = 42.63

If you make this many calls

You probably sell this much

Page 42:

Regression Equation - Example

What is the expected number of copiers sold by a representative who made 40 calls?

Solve for some value of Y’
Y’ = 18.9476 + 1.1842 (40)
Y’ = 66.3156

If you make this many calls

You probably sell this much
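The two copier predictions, and the meaning of the slope, can be checked with a short Python sketch. The constant and function names are illustrative. The intercept and slope are the values from the regression equation on the slides.

```python
INTERCEPT = 18.9476  # a, from the slide's regression equation
SLOPE = 1.1842       # b, additional copiers per additional sales call


def expected_copiers(calls):
    """Return Y' = a + b * x for a given number of sales calls."""
    return INTERCEPT + SLOPE * calls


for calls in (20, 40):
    print(calls, round(expected_copiers(calls), 4))

# The slope is the change in the prediction for one more call
print(round(expected_copiers(21) - expected_copiers(20), 4))
```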

Page 43:

An example for The Standard Error of Estimate

The standard error of estimate measures the scatter, or dispersion, of the observed values around the line of regression

A formula that can be used to compute the standard error of the estimate (line):

Standard error of the estimate = √( Σ(Y – Y’)² / (n – 2) )

Page 44:

Regression Analysis – Least Squares Principle

When we calculate the regression line we try to:
• minimize distance between predicted Ys and actual (data) Y points (length of green lines)
• remember because of the negative and positive values cancelling each other out we have to square those distances (deviations)
• so we are trying to minimize the “sum of squares of the vertical distances between the actual Y values and the predicted Y values”
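The least squares idea can be illustrated by nudging the fitted line and watching the sum of squares grow. This is only a sketch: it reuses the teeth-brushing data from the earlier slides (the copier data table is not in this transcript), the helper `sum_of_squares` is hypothetical, and the nudge sizes are arbitrary.

```python
# (X, Y) pairs from the earlier teeth-brushing table
cavities = [1, 3, 2, 3, 5]
brushings = [5, 4, 3, 2, 1]


def sum_of_squares(slope, intercept):
    """Sum of squared vertical distances between actual Y and predicted Y'."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(cavities, brushings))


# Least-squares slope and intercept for these five points
n = len(cavities)
mean_x, mean_y = sum(cavities) / n, sum(brushings) / n
best_slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(cavities, brushings))
              / sum((x - mean_x) ** 2 for x in cavities))
best_intercept = mean_y - best_slope * mean_x

best = sum_of_squares(best_slope, best_intercept)
# Nudging the slope or the intercept away from the fitted values makes the sum larger
nudged = [sum_of_squares(best_slope + ds, best_intercept + di)
          for ds in (-0.2, 0, 0.2) for di in (-0.5, 0, 0.5) if (ds, di) != (0, 0)]
print(round(best, 3), all(s > best for s in nudged))
```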

Page 45:

The Standard Error of Estimate

Step 1: List all the Y data points

Page 46:

The Standard Error of Estimate

Step 1: List all the Y data points

Step 2: Find all the predicted Y’ data points

Page 47:

The Standard Error of Estimate

Step 3: Find deviations

Step 4: Square and add up deviations

Page 48:

Then simply plug in the numbers and solve for the standard error of the estimate

√( 784.211 / (10 – 2) ) = 9.901

Remember conceptually, this is like the average of the length of those green lines
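The plug-in step can be reproduced directly in Python. This is a sketch that assumes 784.211 is the sum of the squared deviations from Step 4 and that n = 10 representatives. The function name `standard_error` is illustrative.

```python
from math import sqrt


def standard_error(sum_squared_deviations, n):
    """Square root of the summed squared deviations divided by (n - 2)."""
    return sqrt(sum_squared_deviations / (n - 2))


print(round(standard_error(784.211, 10), 3))
```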

Page 49:

Writing Assignment - 5 Questions

1. What is regression used for?
• Include an example

2. What is a residual? How would you find it?

3. What is Standard Error of the Estimate (How is it related to residuals?)

4. Give one fact about r²

5. How is regression line like a mean?

Page 50:

Writing Assignment - 5 Questions

1. What is regression used for?
• Include an example

Regressions are used to take advantage of relationships between variables described in correlations. We choose a value on the independent variable (on x axis) to predict values for the dependent variable (on y axis).

Page 51:

Writing Assignment - 5 Questions

2. What is a residual? How would you find it?

Residuals are the difference between our predicted y (y’) and the actual y data points. Once we choose a value on our independent variable and predict a value for our dependent variable, we look to see how close our prediction was. We are measuring how “wrong” we were, or the amount of “error” for that guess.

Y – Y’

Page 52:

Writing Assignment - 5 Questions

3. What is Standard Error of the Estimate (How is it related to residuals?)

The average length of the residuals
The average error of our guess
The average length of the green lines
The standard deviation of the regression line

Page 53:

Writing Assignment - 5 Questions

4. Give one fact about r²

5. How is regression line like a mean?

Page 54:

Correlation - the prediction line

Prediction line - what is it good for?

• makes the relationship easier to see (even if specific observations - dots - are removed)

• identifies the center of the cluster of (paired) observations

• identifies the central tendency of the relationship (kind of like a mean)

• can be used for prediction

• should be drawn to provide a “best fit” for the data

• should be drawn to provide maximum predictive (explanatory) power for the data

• should be drawn to provide minimum predictive error

r²

Page 55:

Some useful terms

• Regression uses the predictor variable (independent) to make predictions about the predicted variable (dependent)

• Coefficient of correlation is the name for “r”
• Coefficient of determination is the name for “r²”

(remember it is always positive – no direction info)

• Standard error of the estimate is our measure of the variability of the dots around the regression line (average deviation of each data point from the regression line – like standard deviation)

Page 56:

Correlation: Independent and dependent variables

• When used for prediction we refer to the predicted variable as the dependent variable and the predictor variable as the independent variable

What are we predicting?

Dependent Variable

Independent Variable

Page 57:

Multiple regression equations

How many dependent variables? How many independent variables?

Prediction line Y’ = b1X1 + b0
(1 dependent variable, 1 independent variable)
We can predict amount of crime in a city from
• the number of bathrooms in city

Prediction line Y’ = b1X1 + b2X2 + b0
(1 dependent variable, 2 independent variables)
We can predict amount of crime in a city from
• the number of bathrooms in city
• the amount spent on education in city

Prediction line Y’ = b1X1 + b2X2 + b3X3 + b0
(1 dependent variable, 3 independent variables)
We can predict amount of crime in a city from
• the number of bathrooms in city
• the amount spent on education in city
• the amount spent on after-school programs

Page 58:

Multiple regression

• Used to describe the relationship between several independent variables and a dependent variable.

Prediction line Y’ = b1X1 + b2X2 + b3X3 + b0

Can we predict amount of crime in a city from the number of bathrooms and the amount spent on education and on after-school programs?

• X1, X2 and X3 are the independent variables.
• Y is the dependent variable (amount of crime)
• b0 is the Y-intercept
• b1 is the net change in Y for each unit change in X1 holding X2 and X3 constant. It is called a regression coefficient.

Page 59:

Multiple regression will use multiple independent variables to predict the single dependent variable

The predicted variable goes on the “Y” axis and is called the dependent variable.

The predictor variable goes on the “X” axis and is called the independent variable

[Graph: Yearly Income (Y axis) versus Expenses per year (X axis)]

If you spend this much

You probably make this much

[Graph: Dependent Variable (Predicted) with Independent Variable 1 (Predictor) and Independent Variable 2 (Predictor)]

If you spend this much

If you save this much

You probably make this much

Page 60:

Regression Plane for a 2-Independent Variable Linear Regression Equation

Page 61:

Multiple regression equations

Can use variables to predict
• behavior of stock market
• probability of accident
• amount of pollution in a particular well
• quality of a wine for a particular year
• which candidates will make best workers

Page 62:

Multiple Linear Regression - Example

Can we predict heating cost?

Three variables are thought to relate to the heating costs: (1) the mean daily outside temperature, (2) the number of inches of insulation in the attic, and (3) the age in years of the furnace.

To investigate, Salisbury's research department selected a random sample of 20 recently sold homes. It determined the cost to heat each home last January.

Page 63:

Multiple Linear Regression - Example

Page 64:

The Multiple Regression Equation – Interpreting the Regression Coefficients

b1 = The regression coefficient for mean outside temperature (X1) is -4.583.

The coefficient is negative and shows a negative correlation between heating cost and temperature.

As the outside temperature increases, the cost to heat the home decreases. The numeric value of the regression coefficient provides more information. If we increase temperature by 1 degree and hold the other two independent variables constant, we can estimate a decrease of $4.583 in monthly heating cost.

Page 65:

The Multiple Regression Equation – Interpreting the Regression Coefficients

b2 = The regression coefficient for attic insulation (X2) is -14.831.

The coefficient is negative and shows a negative correlation between heating cost and insulation.

The more insulation in the attic, the less the cost to heat the home. So the negative sign for this coefficient is logical. For each additional inch of insulation, we expect the cost to heat the home to decline $14.83 per month, regardless of the outside temperature or the age of the furnace.

Page 66:

The Multiple Regression Equation – Interpreting the Regression Coefficients

b3 = The regression coefficient for age of the furnace (X3) is 6.101.

The coefficient is positive and shows a positive correlation between heating cost and the age of the furnace.

As the age of the furnace goes up, the cost to heat the home increases.

Specifically, for each additional year older the furnace is, we expect the cost to increase $6.10 per month.

Page 67:

Applying the Model for Estimation

What is the estimated heating cost for a home if:
• the mean outside temperature is 30 degrees,
• there are 5 inches of insulation in the attic, and
• the furnace is 10 years old?
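The estimate can be set up in Python as below. This is a sketch under an important assumption: the intercept b0 does not appear anywhere in this transcript, so it is left as a required argument rather than guessed. The function and constant names are hypothetical, and the three coefficients are the ones quoted on the preceding slides.

```python
# Regression coefficients quoted on the slides
B_TEMPERATURE = -4.583   # b1, per degree of mean outside temperature
B_INSULATION = -14.831   # b2, per inch of attic insulation
B_FURNACE_AGE = 6.101    # b3, per year of furnace age


def estimated_heating_cost(b0, temperature, insulation, furnace_age):
    """Y' = b1*X1 + b2*X2 + b3*X3 + b0; b0 must be supplied by the caller."""
    return (B_TEMPERATURE * temperature
            + B_INSULATION * insulation
            + B_FURNACE_AGE * furnace_age
            + b0)


# Passing b0 = 0.0 is NOT a real intercept; it only isolates the combined
# contribution of the three predictors for the home described above.
print(round(estimated_heating_cost(0.0, 30, 5, 10), 3))
```

Adding the intercept from the regression output to that contribution gives the estimated heating cost for this home.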
