My regression lecture mk3 (uploaded to web ct)
Uploaded by chrisstiff
LEARNING OBJECTIVES
In this lecture you will learn:
- What simple and multiple regression mean
- The rationale behind these forms of analysis
- How to conduct simple (bivariate) and multiple regression analyses using SPSS
- How to interpret the results of a regression analysis
REGRESSION
What is regression?
Regression is similar to correlation in the sense that both assess the relationship between two variables
Regression is used to predict values of an outcome variable (y) from one or more predictor variables (x)
Predictors must either be continuous or categorical with ONLY two categories
SIMPLE REGRESSION
Simple regression involves a single predictor variable and an outcome variable
Examines how an outcome variable changes as a predictor variable changes
Other names:
- Outcome = dependent, endogenous or criterion variable
- Predictor = independent, exogenous or explanatory variable
SIMPLE REGRESSION
The relationship between two variables can be expressed mathematically by the equation of the line of best fit.
Usually expressed as
Y = a + b X
Outcome = Intercept + (Coefficient x Predictor)
SIMPLE REGRESSION
Where:
Y = Outcome (e.g., amount of stupid behaviour)
a = Intercept/constant (average amount of stupid behaviour if nothing is drunk)
b = Change in the outcome associated with a one-unit increase in the predictor (the gradient of the line)
X = Predictor (e.g., amount of alcohol drunk)
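As a rough illustration of how a and b in Y = a + b X can be obtained, the sketch below fits a least-squares line by hand. The data values and the function name fit_line are made up for illustration; they are not the lecture's data set, and SPSS is what the lecture actually uses.

```python
# Hypothetical data (NOT the lecture's data): units of alcohol drunk and a
# "stupid behaviour" score for six people.
alcohol = [0, 5, 10, 15, 20, 25]
behaviour = [12, 30, 35, 55, 58, 80]


def fit_line(x, y):
    """Return (a, b) for the least-squares line Y = a + b X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares of X around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # gradient of the line
    a = mean_y - b * mean_x    # intercept: predicted Y when X = 0
    return a, b


a, b = fit_line(alcohol, behaviour)
print(f"Y = {a:.2f} + {b:.2f} X")
```

The fitted line always passes through the point (mean of X, mean of Y), which is why the intercept can be computed from the gradient and the two means.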
LINE OF BEST FIT
[Figure: scatterplot with line of best fit. x-axis: Amount of alcohol (0 to 30); y-axis: Stupid behaviour (0 to 100)]
LINE OF BEST FIT – POOR EXAMPLE
[Figure: scatterplot marked with a question mark. x-axis: Number of pairs of socks (0 to 30); y-axis: Stupid behaviour (0 to 100)]
SIMPLE REGRESSION USING SPSS
Analyze > Regression > Linear
SPSS OUTPUT
Variables Entered/Removed (b)
Model   Variables Entered   Variables Removed   Method
1       amount (a)          .                   Enter
a. All requested variables entered.
b. Dependent Variable: behaviour
SPSS OUTPUT
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .746 (a)   .556       .531                20.44929
a. Predictors: (Constant), amount
R = correlation between amount drunk and stupid behaviour
R square = proportion of variance in the outcome (behaviour) accounted for by the predictor (amount drunk)
Adjusted R square = takes into account the sample size and the number of predictor variables
THE R2
R2 increases (or at least never decreases) as more predictor variables are included in a regression model
It is the value most commonly reported
The adjusted R2, however, only increases when the new predictor(s) improve the model more than would be expected by chance
The adjusted R2 will always be equal to, or less than, R2
Adjusted R2 is particularly useful during the variable selection stage of model building
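The adjustment can be checked with the usual textbook formula, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1). The sketch below applies it to the Model Summary values for the alcohol example. The formula itself is not stated on the slides, and n = 20 is an inference from the total df of 19 in the ANOVA table for that example.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R squared for n observations and p predictors.

    Standard textbook formula; assumed (not stated in the lecture) to be
    what SPSS reports as "Adjusted R Square".
    """
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Simple regression example: R2 = .556, one predictor, n assumed to be 20.
print(round(adjusted_r2(0.556, n=20, p=1), 3))
```

Because (n - 1)/(n - p - 1) grows with every added predictor, a new predictor has to raise R2 by enough to offset that penalty before the adjusted value goes up.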
SPSS OUTPUT
ANOVA (b)
Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression   9421.425         1    9421.425      22.530   .000 (a)
   Residual     7527.125         18   418.174
   Total        16948.550        19
a. Predictors: (Constant), amount
b. Dependent Variable: behaviour
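The columns of the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its df, F is the ratio of the two mean squares, and R square is the regression sum of squares as a share of the total. A minimal sketch using the figures from the table above (the variable names are my own):

```python
# Values copied from the ANOVA table for the simple regression example.
ss_regression, df_regression = 9421.425, 1
ss_residual, df_residual = 7527.125, 18

ms_regression = ss_regression / df_regression   # Mean Square (Regression)
ms_residual = ss_residual / df_residual         # Mean Square (Residual)
f_ratio = ms_regression / ms_residual           # F
r_squared = ss_regression / (ss_regression + ss_residual)  # SS_reg / SS_total

print(round(ms_residual, 3), round(f_ratio, 3), round(r_squared, 3))
```

The R square obtained this way should agree with the Model Summary, which is a useful check that the two tables belong to the same analysis.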
SPSS OUTPUT
Coefficients (a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B        Std. Error           Beta                        t       Sig.
1  (Constant)    16.250   8.042                                            2.021   .058
   amount        2.227    .469                 .746                        4.747   .000
a. Dependent Variable: behaviour
Beta = standardised regression coefficient: the number of standard deviations by which the outcome variable changes for a one standard deviation increase in the predictor variable, with all other things constant
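Two things can be read straight off the Coefficients table: each t value is B divided by its standard error, and the B column gives the prediction equation Y = a + b X. The sketch below uses the tabled values; the function name predict_behaviour is hypothetical, and the recomputed t values only match SPSS to about two decimal places because the tabled inputs are already rounded.

```python
# Values copied from the Coefficients table (simple regression example).
b_constant, se_constant = 16.250, 8.042
b_amount, se_amount = 2.227, 0.469

# t = B / Std. Error for each row of the table.
t_constant = b_constant / se_constant
t_amount = b_amount / se_amount


def predict_behaviour(amount):
    """Predicted stupid-behaviour score for a given amount of alcohol."""
    return b_constant + b_amount * amount


print(round(t_constant, 2), round(t_amount, 2))
print(predict_behaviour(10))  # prediction for someone who drank 10 units
```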
REPORTING THE RESULTS OF SIMPLE REGRESSION
β = .75, t(18) = 4.75, p < .001, R2 = .56
Reported in order: beta value; t value with its associated df and p; R square
GENERATING DF AND T
df = n - p - 1
Where n is the number of observations and p is the number of predictors (the final 1 is for the constant, which is also estimated).
t = B / Std. Error of B (e.g., 2.227 / .469 ≈ 4.75)
NB This is for regression; df can be calculated differently for other tests!
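A quick sketch of the df rule, taking p as the number of predictors. The sample sizes are inferred from the Total df rows of the ANOVA tables in this lecture (19 and 44) rather than stated on the slides.

```python
def residual_df(n, p):
    """Residual degrees of freedom for a regression with p predictors.

    One df is used up by each predictor and one more by the constant.
    """
    return n - p - 1


print(residual_df(20, 1))  # alcohol example: 20 cases, 1 predictor
print(residual_df(45, 2))  # exam example: 45 cases, 2 predictors
```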
ASSUMPTIONS OF SIMPLE REGRESSION
Outcome variable should be measured at interval level
When plotted the data should have a linear trend
SUMMARY OF SIMPLE REGRESSION
Used to predict the outcome variable from a predictor variable
Used when there is one predictor variable and one outcome variable
The relationship must be linear
MULTIPLE REGRESSION
Multiple regression is used when there is more than one predictor variable
Two major uses of multiple regression:
- Prediction
- Causal analysis
USES OF MULTIPLE REGRESSION
Multiple regression can be used to examine the following:
- How well a set of variables predicts an outcome
- Which variable in a set of variables is the best predictor of the outcome
- Whether a predictor variable still predicts the outcome when another variable is controlled for
MULTIPLE REGRESSION - EXAMPLE
What might predict exam performance?
[Diagram: Attendance at lectures, Books read and Motivation shown as possible predictors of Exam Performance (Grade)]
MULTIPLE REGRESSION USING SPSS
Analyze > Regression > Linear
MULTIPLE REGRESSION: SPSS OUTPUT
Variables Entered/Removed (b)
Model   Variables Entered                              Variables Removed   Method
1       Lectures attended, Number of books read (a)    .                   Enter
a. All requested variables entered.
b. Dependent Variable: Grade achieved
MULTIPLE REGRESSION: SPSS OUTPUT
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .605 (a)   .367       .336                13.711
a. Predictors: (Constant), Lectures attended, Number of books read
MULTIPLE REGRESSION: SPSS OUTPUT
ANOVA (b)
Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression   4569.053         2    2284.526      12.153   .000 (a)
   Residual     7895.258         42   187.982
   Total        12464.311        44
a. Predictors: (Constant), Lectures attended, Number of books read
b. Dependent Variable: Grade achieved
For overall model: F(2, 42) = 12.153, p < .001
MULTIPLE REGRESSION: SPSS OUTPUT
Coefficients (a)
                           Unstandardized Coefficients   Standardized Coefficients
Model                      B        Std. Error           Beta                        t       Sig.
1  (Constant)              39.173   6.625                                            5.913   .000
   Number of books read    3.832    1.712                .331                        2.238   .031
   Lectures attended       1.290    .536                 .356                        2.407   .021
a. Dependent Variable: Grade achieved
Number of books read is a significant predictor: β = .33, t(42) = 2.24, p < .05
Lectures attended is a significant predictor: β = .36, t(42) = 2.41, p < .05
MAJOR TYPES OF MULTIPLE REGRESSION
There are different types of multiple regression:
Theory-based model building
- Standard multiple regression: Enter
- Hierarchical multiple regression: Block entry
Statistical model building
- Sequential multiple regression: Forward, Backward, Stepwise
STANDARD MULTIPLE REGRESSION
Most common method. All the predictor variables are entered into the analysis simultaneously (i.e., enter)
Used to examine how much:
- An outcome variable is explained by a set of predictor variables as a group
- Variance in the outcome variable is explained by a single predictor (unique contribution)
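To make "entered simultaneously" concrete, here is a minimal sketch that estimates the constant and all the B coefficients in one step by solving the least-squares normal equations. Everything in it is an assumption for illustration: the helper names (solve, fit_ols, r_squared) and the data are hypothetical, and this is not how the lecture runs the analysis (that is done through the SPSS menus).

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_ols(predictors, outcome):
    """Return [constant, b1, b2, ...] that minimise the sum of squared residuals."""
    rows = [[1.0] + list(row) for row in predictors]  # leading 1 = constant term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(predictors, outcome, coefs):
    """Proportion of outcome variance explained by the predictors as a group."""
    fitted = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], row)) for row in predictors]
    mean_y = sum(outcome) / len(outcome)
    ss_res = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    return 1 - ss_res / ss_tot


# Hypothetical data: (lectures attended, books read) and grade for 8 students.
students = [(4, 1), (8, 0), (10, 3), (12, 2), (15, 5), (18, 4), (20, 6), (6, 2)]
grades = [45, 50, 60, 58, 72, 70, 80, 52]

coefs = fit_ols(students, grades)
print("constant, B(lectures), B(books):", [round(c, 3) for c in coefs])
print("R squared:", round(r_squared(students, grades, coefs), 3))
```

Each B here is the effect of that predictor with the other predictor held constant, which is what the "unique contribution" point above refers to.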
EXAMPLE
The different methods of regression and their associated outputs will be illustrated using:
Outcome variable
- Essay mark
Predictor variables
- Number of lectures attended (out of 20)
- Motivation of student (on a scale from 0 to 100)
- Number of course books read (from 0 to 10)
ENTER OUTPUT
Variables Entered/Removed (b)
Model   Variables Entered                  Variables Removed   Method
1       books, lectures, motivation (a)    .                   Enter
a. All requested variables entered.
b. Dependent Variable: essay
ENTER OUTPUT
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .918 (a)   .842       .812                6.84522
a. Predictors: (Constant), books, lectures, motivation
R square = proportion of variance in the outcome accounted for by the predictor variables
Adjusted R square = takes into account the sample size and the number of predictor variables
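ENTER OUTPUT
ANOVA (b)
Model           Sum of Squares   df    Mean Square   F        Sig.
1  Regression   95293.006        3     31764.335     17.030   .000 (a)
   Residual     382376.0         205   1865.249
   Total        477669.0         208
a. Predictors: (Constant), Gender identification, Negative impressions males hold about females, Positive impressions males hold about females
b. Dependent Variable: Negative impression about males
NB The predictors and dependent variable listed under this table do not match the essay example, so it appears to come from a different analysis. The ANOVA for the essay model (lectures, books, motivation) is Model 2 in the block entry ANOVA output.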
ENTER OUTPUT
Coefficients (a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B        Std. Error           Beta                        t       Sig.
1  (Constant)    19.738   5.399                                            3.656   .002
   lectures      1.217    .469                 .490                        2.595   .020
   motivation    .352     .144                 .466                        2.450   .026
   books         .509     .504                 .103                        1.010   .327
a. Dependent Variable: essay
Beta = standardised regression coefficient and shows the degree to which the predictor variable predicts the outcome variable with all other things constant
HIERARCHICAL MULTIPLE REGRESSION
aka sequential regression
Predictor variables are entered in a prearranged order of steps (i.e., block entry)
Can examine how much variance is accounted for by a predictor when others are already in the model
Don’t forget to choose the R squared change option from the Statistics menu
BLOCK ENTRY OUTPUT
Variables Entered/Removed (b)
Model   Variables Entered        Variables Removed   Method
1       lectures (a)             .                   Enter
2       books, motivation (a)    .                   Enter
a. All requested variables entered.
b. Dependent Variable: essay
BLOCK ENTRY OUTPUT
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .884 (a)   .781       .768                7.60374
2       .918 (b)   .842       .812                6.84522
a. Predictors: (Constant), lectures
b. Predictors: (Constant), lectures, books, motivation
Model Summary: Change Statistics
Model   R Square Change   F Change   df1   df2   Sig. F Change
1       .781              64.069     1     18    .000
2       .061              3.105      2     16    .073
NB: this will be in one long line in the output!
BLOCK ENTRY OUTPUT
ANOVA (c)
Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression   3704.295         1    3704.295      64.069   .000 (a)
   Residual     1040.705         18   57.817
   Total        4745.000         19
2  Regression   3995.288         3    1331.763      28.422   .000 (b)
   Residual     749.712          16   46.857
   Total        4745.000         19
a. Predictors: (Constant), lectures
b. Predictors: (Constant), lectures, books, motivation
c. Dependent Variable: essay
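The change statistics for Model 2 can be recovered from this ANOVA table. The sketch below assumes the usual F-change calculation (the extra regression sum of squares per added predictor, divided by the residual mean square of the larger model), which the slides do not spell out; the variable names are my own.

```python
# Values copied from the block entry ANOVA table.
ss_reg_model1 = 3704.295   # lectures only
ss_reg_model2 = 3995.288   # lectures, books, motivation
ss_total = 4745.000
ss_res_model2, df_res_model2 = 749.712, 16
predictors_added = 2       # books and motivation enter in block 2 (df1)

extra_ss = ss_reg_model2 - ss_reg_model1
r2_change = extra_ss / ss_total
f_change = (extra_ss / predictors_added) / (ss_res_model2 / df_res_model2)

print(round(r2_change, 3), round(f_change, 3))
```

With Sig. F Change = .073, adding books and motivation does not significantly improve on lectures alone at the .05 level, even though the overall Model 2 is significant.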
BLOCK ENTRY OUTPUT
Coefficients (a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B        Std. Error           Beta                        t       Sig.
1  (Constant)    30.311   3.042                                            9.965   .000
   lectures      2.194    .274                 .884                        8.004   .000
2  (Constant)    19.738   5.399                                            3.656   .002
   lectures      1.217    .469                 .490                        2.595   .020
   motivation    .352     .144                 .466                        2.450   .026
   books         .509     .504                 .103                        1.010   .327
a. Dependent Variable: essay
STATISTICAL MULTIPLE REGRESSION
aka sequential techniques
Relies on SPSS selecting which predictor variables to include in a model
Three types:
- Forward selection
- Backward selection
- Stepwise selection
Forward: Starts with no variables in the model, tries them all, includes the best predictor, repeats
Backward: Starts with ALL variables, removes the lowest contributor, repeats
Stepwise: Combination. Starts as Forward, then checks that all variables are still making a contribution after each iteration (like Backward)
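The forward procedure can be sketched as a loop: try every remaining predictor, keep the one that helps most, and stop when nothing helps enough. The version below is a simplification under stated assumptions: it uses a minimum gain in R squared (the made-up parameter min_gain) as the entry rule, whereas SPSS bases entry on statistical significance; the helper names and data are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def r_squared(columns, outcome):
    """R squared of a least-squares fit of outcome on the given columns plus a constant."""
    n = len(outcome)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean_y = sum(outcome) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    return 1 - ss_res / ss_tot


def forward_selection(candidates, outcome, min_gain=0.02):
    """Add predictors one at a time while the best one raises R squared by min_gain."""
    selected, current_r2 = [], 0.0
    remaining = dict(candidates)
    while remaining:
        # Try each remaining predictor alongside those already selected.
        scores = {
            name: r_squared([candidates[s] for s in selected] + [col], outcome)
            for name, col in remaining.items()
        }
        best = max(scores, key=scores.get)
        if scores[best] - current_r2 < min_gain:
            break  # no remaining predictor improves the model enough
        selected.append(best)
        current_r2 = scores[best]
        del remaining[best]
    return selected, current_r2


# Hypothetical data for 10 students (NOT the lecture's data set).
candidates = {
    "lectures": [4, 8, 10, 12, 15, 18, 20, 6, 14, 9],
    "motivation": [30, 55, 60, 50, 80, 75, 90, 40, 65, 45],
    "books": [1, 0, 3, 2, 5, 4, 6, 2, 1, 3],
}
essay = [42, 50, 58, 55, 70, 72, 82, 47, 62, 51]

print(forward_selection(candidates, essay))
```

Backward selection would run the same loop in reverse (start with every predictor and drop the weakest), and stepwise would add a removal check after each forward step.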
SUMMARY OF MODEL SELECTION TECHNIQUES
Theory based
- Enter: all predictors entered together (standard)
- Block entry: predictors entered in groups (hierarchical)
Statistical based
- Forward: variables are entered into the model based on their statistical significance
- Backward: variables are removed from the model based on their statistical significance
- Stepwise: variables are moved in and out of the model based on their statistical significance
ASSUMPTIONS OF REGRESSION
Linearity
- The relationship between the dependent variable and the predictors must be linear
- Check: violations can be assessed using a scatter-plot
Independence
- Values on the outcome variable must be independent, i.e., each value comes from a different participant
Homoscedasticity
- At each level of the predictor variable the variance of the residual terms should be equal (i.e., all data points should be about as close to the line of best fit)
- Can indicate if all data is drawn from the same sample
Normality
- Residuals/errors should be normally distributed
- Check: violations can be assessed using histograms (e.g., outliers)
Multicollinearity
- Predictor variables should not be highly correlated
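One simple way to screen for multicollinearity between two predictors is to look at their correlation. The sketch below computes Pearson's r by hand and, as an extra not covered in the lecture, the variance inflation factor, which for the two-predictor case is 1 / (1 - r squared). The data and names are hypothetical, and the lecture gives no cut-off for "highly correlated", so none is assumed here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical predictor values for 8 students (NOT the lecture's data).
lectures = [4, 8, 10, 12, 15, 18, 20, 6]
books = [1, 0, 3, 2, 5, 4, 6, 2]

r = pearson_r(lectures, books)
vif = 1 / (1 - r ** 2)  # variance inflation factor, two-predictor case only

print(f"r = {r:.3f}, VIF = {vif:.2f}")
```

The closer r is to +1 or -1, the larger the VIF and the harder it becomes to separate the unique contribution of each predictor.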
OTHER IMPORTANT ISSUES
Regression in this case is for continuous/interval predictors or categorical predictors with ONLY two categories
- More than two categories are possible (dummy coding)
Outcome must be continuous/interval
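Dummy coding turns a predictor with more than two categories into a set of two-category (0/1) predictors, one for each category except a chosen reference category. A minimal sketch, with a hypothetical "degree subject" variable and a hypothetical helper name dummy_code:

```python
def dummy_code(values, reference):
    """Return one 0/1 column per category, leaving out the reference category."""
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical categorical predictor with three categories.
degree = ["psychology", "biology", "history", "biology", "psychology"]
dummies = dummy_code(degree, reference="psychology")

for name, column in dummies.items():
    print(name, column)
```

A case in the reference category scores 0 on every dummy variable, so each dummy's coefficient is read as the difference between that category and the reference category.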
Sample Size
Multiple regression needs a relatively large sample size
- Some authors suggest using between 10 and 20 participants per predictor variable
- Others argue the sample should be 50 cases more than the number of predictors, to be sure that one is not capitalising on chance effects
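The rules of thumb above are easy to turn into numbers for a planned study. The sketch below simply applies them as stated; the function name and dictionary keys are my own, and the three-predictor call is only an example.

```python
def sample_size_guidelines(n_predictors):
    """Minimum sample sizes implied by the rules of thumb quoted above."""
    return {
        "10 per predictor": 10 * n_predictors,
        "20 per predictor": 20 * n_predictors,
        "predictors + 50": n_predictors + 50,
    }


# Example: a model with three predictors (lectures, motivation, books).
for rule, minimum in sample_size_guidelines(3).items():
    print(f"{rule}: at least {minimum} participants")
```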
OUTCOMES
So, what is regression?
This lecture has:
- introduced the different types of regression
- detailed how to conduct and interpret regression using SPSS
- described the underlying assumptions of regression
- outlined the data types and sample sizes needed for regression
- outlined the major limitation of a regression analysis