
Multiple Regression

What Techniques Can Tell Us

• Chi Square: Do groups differ? (nominal data)

• T Test: Do groups/variables differ?

• Gamma/Lambda/Kendall's Tau, etc.: Are variables related to each other? (nominal data)

• Correlation: Are variables related to each other? (ratio/interval data)

Interpreting Correlations

• Three questions we can answer:

1. Is there a relationship between two variables?

2. What is the direction of the relationship?

3. What is the strength of the relationship?

Correlations

                               IDEO        PID
IDEO    Pearson Correlation    1           .506**
        Sig. (2-tailed)        .           .000
        N                      1623        1608
PID     Pearson Correlation    .506**      1
        Sig. (2-tailed)        .000        .
        N                      1608        1776

**. Correlation is significant at the 0.01 level (2-tailed).
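For readers replicating this outside SPSS, here is a minimal sketch of the same statistic in Python with pandas and SciPy; the file name and the column names ideo and pid are assumptions, not taken from the original data set.

import pandas as pd
from scipy.stats import pearsonr

# Hypothetical survey file with one row per respondent.
df = pd.read_csv("survey.csv")
clean = df[["ideo", "pid"]].dropna()        # N differs by pair when responses are missing

r, p_two_tailed = pearsonr(clean["ideo"], clean["pid"])
print(f"Pearson r = {r:.3f}, Sig. (2-tailed) = {p_two_tailed:.3f}, N = {len(clean)}")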

Interpreting Correlations

• Are there limitations here? If so, what are they?

• We don't know the size of one variable's effect on the other

• We don't know the impact of other variables


[Scatter plots: VAR8 vs. VAR00002; RND1 vs. RND2]

Strength

[Scatter plots: VAR4 vs. VAR00002; VAR6 vs. VAR00002]

Strong Relationships

Perfect Relationship

[Scatter plot: VAR00001 vs. VAR00002]

Basic Equations

• Let your DV (Y) = total cost of bananas

• Suppose you buy X lbs of bananas at $.49 a lb

• How would you express this as an equation to figure out how much your bananas are worth?

• Y = .49X

• Can use for prediction: 10 lbs = $4.90; 2 lbs = $.98
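As a minimal sketch (the function name is mine, not from the slides), the prediction step in Python:

# Bivariate equation: total cost of bananas at $.49 per pound.
def banana_cost(pounds):
    """Y = .49 * X: predicted total cost for X pounds of bananas."""
    return 0.49 * pounds

print(banana_cost(10))   # $4.90
print(banana_cost(2))    # $.98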

Multivariate Equations

• Suppose you have a phone plan that charges:
  – $5.95 a month
  – $.10 a minute for in-state long distance
  – $.08 a minute for interstate long distance
  – $.01 a minute for local calls

• How would you represent this?

• Total = .10x1 + .08x2 + .01x3 + 5.95
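The same idea with several IVs, again as a sketch (the variable names are illustrative):

# Multivariate equation: total monthly phone bill.
def phone_bill(instate_minutes, interstate_minutes, local_minutes):
    """Total = .10*x1 + .08*x2 + .01*x3 + 5.95 (the constant is the base monthly charge)."""
    return 0.10 * instate_minutes + 0.08 * interstate_minutes + 0.01 * local_minutes + 5.95

print(phone_bill(30, 10, 100))   # 3.00 + 0.80 + 1.00 + 5.95 = $10.75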

Regression Analysis

• Lets you work the problem backwards

• How much do different IVs contribute to the DV?

• How do different IVs relate to the DV?

• Lets you build a model of more complicated relationships

• In addition to existence, direction, and strength, it gives you the amount of change

Expressing a Regression Equation

• Y = b1x1 + b2x2 + … + bixi + constant + error

• Error is part of the probabilistic nature of social science

• Constant: what Y would equal if all Xs = 0

• Estimation process: fit a line to the data that minimizes the distance to all observed data points
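Written out in standard notation (not copied verbatim from the slides), the equation and the least-squares criterion used to estimate it are:

Y_i = b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki} + \text{constant} + e_i

\hat{b} = \arg\min_b \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2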

Scatter Plots and Regression Lines

• PID and Ideology

• The correlation here is .37; not bad, but as you can see, there are deviations in some cases

[Linear Regression scatter plot: pid vs. ideo, with fitted line pid = -1.05 + 0.81 * ideo, R-Square = 0.37]

Fitting the Regression Line

• Goal: Minimize the squared distances (error) between predicted values of Y and observed values.

• Goal: explain the variance in Y in terms of X

• Error in prediction is unexplained variance
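A minimal sketch of that estimation step with NumPy's least-squares routine, using made-up numbers rather than the course data:

import numpy as np

# Illustrative data: x = predictor (IV), y = outcome (DV).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([0.4, 1.1, 1.8, 2.2, 3.1, 3.4, 4.2])

# Fit y = b*x + constant by minimizing the sum of squared errors.
X = np.column_stack([x, np.ones_like(x)])    # the column of 1s estimates the constant
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b, constant = coef

predicted = b * x + constant
errors = y - predicted                        # the unexplained variance lives here
print(f"b = {b:.3f}, constant = {constant:.3f}, SSE = {errors @ errors:.3f}")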

Party and Ideology

• Set up PID as the DV and Ideology as the IV, then run the analysis

• Can also do Ideology as the DV (both runs are shown below; a Python sketch follows the tables)

Coefficients (Dependent Variable: PID)

              Unstandardized             Standardized
              B            Std. Error    Beta           t          Sig.
(Constant)    -8.34E-03    .127                         -.066      .948
IDEO          .645         .027          .506           23.511     .000

Coefficients (Dependent Variable: IDEO)

              Unstandardized             Standardized
              B            Std. Error    Beta           t          Sig.
(Constant)    3.236        .059                         54.924     .000
PID           .397         .017          .506           23.511     .000
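For readers working outside SPSS, a minimal equivalent in Python with statsmodels, assuming the survey data sit in a DataFrame with columns pid and ideo (hypothetical file and column names):

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("survey.csv")                        # assumed file name

pid_on_ideo = smf.ols("pid ~ ideo", data=df).fit()    # PID as DV, IDEO as IV
ideo_on_pid = smf.ols("ideo ~ pid", data=df).fit()    # roles reversed

print(pid_on_ideo.summary())   # B, Std. Error, t, and Sig. as in the tables above
print(ideo_on_pid.summary())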

Goodness of Fit

• Measure of how much variance is explained by the model you build

• R2 = correlation coefficient squared

• R2 = proportion of variance explained

• R2 is symmetrical

• In the previous example, R2 = .256

• R2 ranges from 0 to 1

• Adjusted R2 takes into account the degrees of freedom, so it is a more appropriate measure
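As a sketch (standard formulas; the function names are mine), R2 and adjusted R2 can be computed directly from the residuals:

import numpy as np

def r_squared(y, predicted):
    """Proportion of the variance in y explained by the predictions."""
    y, predicted = np.asarray(y), np.asarray(predicted)
    ss_residual = np.sum((y - predicted) ** 2)    # unexplained variance
    ss_total = np.sum((y - y.mean()) ** 2)        # total variance in y
    return 1 - ss_residual / ss_total

def adjusted_r_squared(y, predicted, k):
    """Penalize R2 for the number of IVs (k), given the sample size n."""
    n = len(y)
    r2 = r_squared(y, predicted)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)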

Run for the Border Using Multiple Regression

• Suppose that you and some friends ate at Taco Bell every week for a year.

• For each meal, you know the total amount spent and the number of each item, but not what each item cost.

• You could use multiple regression to get parameter estimates of the true values.

• The data set was constructed by choosing a random number (between 0 and 4) of Bean Burritos, Tacos, Chalupas, Chicken Tacos, Beef Burritos, 7 Layer Burritos, and Soft Drinks

• The data matrix includes a variable for the number of each item
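A minimal sketch of how such a data set could be simulated in Python; the prices come from the "Model 4 Revisited" slide later in the deck, while the variable names and sample size here are illustrative:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Menu prices (from the "Model 4 Revisited" slide).
prices = {"BEANBUR": 0.69, "TACO": 0.79, "CHALUPA": 1.19, "CHICKTAC": 1.39,
          "BEEFBUR": 1.59, "SEVLAYR": 1.89, "DRINK": 1.29}

# 52 weekly meals; a random count between 0 and 4 of each item.
df = pd.DataFrame({item: rng.integers(0, 5, size=52) for item in prices})

# The analyst only ever sees the counts and this total, never the prices.
df["TOTAL2"] = sum(df[item] * price for item, price in prices.items())
print(df.head())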

Border Model 1

• We’ll look at impact of bean burritos on total

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .039    .002       -.018               3.74743
Predictors: (Constant), BEANBUR; Dependent Variable: TOTAL2

Coefficients (Dependent Variable: TOTAL2)
              Unstandardized             Standardized
              B            Std. Error    Beta           t          Sig.    Tolerance   VIF
(Constant)    21.561       1.165                        18.507     .000
BEANBUR       -.131        .476          -.039          -.276      .784    1.000       1.000

Border Model 2

• Bean Burritos and Tacos

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .257    .066       .028                3.66072
Predictors: (Constant), TACO, BEANBUR; Dependent Variable: TOTAL2

Coefficients (Dependent Variable: TOTAL2)
              Unstandardized             Standardized
              B            Std. Error    Beta           t          Sig.    Tolerance   VIF
(Constant)    19.655       1.538                        12.781     .000
BEANBUR       -.185        .466          -.055          -.397      .693    .996        1.004
TACO          .842         .457          .255           1.843      .071    .996        1.004

Border Model 3

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .298    .089       .032                3.65375
Predictors: (Constant), CHICKTAC, BEANBUR, TACO; Dependent Variable: TOTAL2

Coefficients (Dependent Variable: TOTAL2)
              Unstandardized             Standardized
              B            Std. Error    Beta           t          Sig.    Tolerance   VIF
(Constant)    18.032       2.139                        8.432      .000
BEANBUR       -.160        .465          -.047          -.343      .733    .994        1.006
TACO          .891         .458          .270           1.945      .058    .986        1.014
CHICKTAC      .554         .508          .151           1.090      .281    .987        1.013

Model 4

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .744    .553       .505                2.61316
Predictors: (Constant), CHALUPA, CHICKTAC, BEANBUR, TACO, BEEFBUR; Dependent Variable: TOTAL2

Coefficients (Dependent Variable: TOTAL2)
              Unstandardized             Standardized
              B            Std. Error    Beta           t          Sig.    Tolerance   VIF
(Constant)    9.080        2.027                        4.479      .000
BEANBUR       5.312E-02    .334          .016           .159       .874    .984        1.016
TACO          .739         .332          .224           2.224      .031    .959        1.043
CHICKTAC      .955         .374          .260           2.550      .014    .931        1.074
BEEFBUR       1.617        .322          .514           5.029      .000    .929        1.076
CHALUPA       1.707        .331          .516           5.153      .000    .967        1.034

[Linear Regression scatter plot: Unstandardized Predicted Value vs. total2; Unstandardized Predicted Value = 9.50 + 0.55 * total2, R-Square = 0.55]

Model 5

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .923    .852       .832                1.52228
Predictors: (Constant), SEVLAYR, BEEFBUR, TACO, CHALUPA, BEANBUR, CHICKTAC; Dependent Variable: TOTAL2

Coefficients (Dependent Variable: TOTAL2)
              Unstandardized             Standardized
              B            Std. Error    Beta           t          Sig.    Tolerance   VIF
(Constant)    3.426        1.322                        2.592      .013
BEANBUR       .568         .202          .169           2.810      .007    .914        1.095
TACO          .610         .194          .185           3.140      .003    .954        1.048
CHICKTAC      1.285        .221          .350           5.816      .000    .908        1.101
BEEFBUR       1.634        .187          .519           8.720      .000    .929        1.076
CHALUPA       1.546        .194          .468           7.982      .000    .960        1.042
SEVLAYR       1.797        .189          .577           9.516      .000    .896        1.116

[Linear Regression scatter plot: Unstandardized Predicted Value vs. total2; Unstandardized Predicted Value = 3.15 + 0.85 * total2, R-Square = 0.85]

Full Model

Model Summary
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       1.000    1.000      1.000               .00000
Predictors: (Constant), DRINK, SEVLAYR, BEEFBUR, TACO, BEANBUR, CHICKTAC, CHALUPA; Dependent Variable: TOTAL2

Coefficients (Dependent Variable: TOTAL2)
              Unstandardized             Standardized
              B            Std. Error    Beta           t     Sig.    Tolerance   VIF
(Constant)    2.269E-15    .000                         .     .
BEANBUR       .690         .000          .205           .     .       .906        1.104
TACO          .790         .000          .239           .     .       .936        1.069
CHICKTAC      1.390        .000          .379           .     .       .904        1.107
BEEFBUR       1.590        .000          .505           .     .       .928        1.078
CHALUPA       1.190        .000          .360           .     .       .893        1.120
SEVLAYR       1.890        .000          .607           .     .       .891        1.122
DRINK         1.290        .000          .404           .     .       .909        1.100

[Linear Regression scatter plot: Unstandardized Predicted Value vs. total2; Unstandardized Predicted Value = 0.00 + 1.00 * total2, R-Square = 1.00]
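Continuing the simulation sketch from the "Run for the Border" slide (same hypothetical df), fitting the full model recovers the item prices exactly, with R2 = 1.00:

import statsmodels.formula.api as smf

full = smf.ols("TOTAL2 ~ BEANBUR + TACO + CHICKTAC + BEEFBUR + CHALUPA + SEVLAYR + DRINK",
               data=df).fit()

print(full.params)      # each coefficient equals its item's price; the constant is ~0
print(full.rsquared)    # 1.0: with every item included, nothing is left unexplained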

Model 4 Revisited

• True item prices: Bean Burrito $.69, Taco $.79, Chalupa $1.19, Chicken Taco $1.39, Beef Burrito $1.59, 7 Layer Burrito $1.89, Drink $1.29

Coefficients (Dependent Variable: TOTAL2; Model 4 repeated for comparison)
              Unstandardized             Standardized
              B            Std. Error    Beta           t          Sig.    Tolerance   VIF
(Constant)    9.080        2.027                        4.479      .000
BEANBUR       5.312E-02    .334          .016           .159       .874    .984        1.016
TACO          .739         .332          .224           2.224      .031    .959        1.043
CHICKTAC      .955         .374          .260           2.550      .014    .931        1.074
BEEFBUR       1.617        .322          .514           5.029      .000    .929        1.076
CHALUPA       1.707        .331          .516           5.153      .000    .967        1.034

Some Data Requirements for Regression

• DV must be interval or ratio, and continuous

• IVs should not be correlated with each other

• Error should be constant at high and low predicted values (homoscedasticity)

• Relationship must be linear

• Errors of subsequent observations should not be correlated (no serial correlation)
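A minimal sketch of how two of these requirements might be checked in Python with statsmodels, assuming the simulated df and the fitted result named full from the earlier sketches:

import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Correlated IVs: VIFs near 1 are fine; large values signal multicollinearity.
items = ["BEANBUR", "TACO", "CHICKTAC", "BEEFBUR", "CHALUPA", "SEVLAYR", "DRINK"]
X = sm.add_constant(df[items])
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))

# Homoscedasticity: residuals plotted against predicted values should show no fan shape.
plt.scatter(full.fittedvalues, full.resid)
plt.xlabel("Predicted value")
plt.ylabel("Residual")
plt.show()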

For Next time

• Multicollinearity

• Heteroskedasticity

• Interaction terms

• Pass out Stat Assignment II