Introduction to simple linear regression ASW, 12.1-12.2 Economics 224 – Notes for November 5,...
-
Upload
victoria-fletcher -
Category
Documents
-
view
242 -
download
3
Transcript of Introduction to simple linear regression ASW, 12.1-12.2 Economics 224 – Notes for November 5,...
Introduction to simple linear regression
ASW, 12.1-12.2
Economics 224 – Notes for November 5, 2008
Regression model• Relation between variables where changes in some
variables may “explain” or possibly “cause” changes in other variables.
• Explanatory variables are termed the independent variables and the variables to be explained are termed the dependent variables.
• Regression model estimates the nature of the relationship between the independent and dependent variables. – Change in dependent variables that results from changes
in independent variables, ie. size of the relationship.– Strength of the relationship.– Statistical significance of the relationship.
Examples• Dependent variable is retail price of gasoline in Regina –
independent variable is the price of crude oil.• Dependent variable is employment income – independent
variables might be hours of work, education, occupation, sex, age, region, years of experience, unionization status, etc.
• Price of a product and quantity produced or sold:– Quantity sold affected by price. Dependent variable is
quantity of product sold – independent variable is price. – Price affected by quantity offered for sale. Dependent
variable is price – independent variable is quantity sold.
0
100
200
300
400
500
60019
81M
01
1982
M01
1983
M01
1984
M01
1985
M01
1986
M01
1987
M01
1988
M01
1989
M01
1990
M01
1991
M01
1992
M01
1993
M01
1994
M01
1995
M01
1996
M01
1997
M01
1998
M01
1999
M01
2000
M01
2001
M01
2002
M01
2003
M01
2004
M01
2005
M01
2006
M01
2007
M01
2008
M01
0
20
40
60
80
100
120
140
160
Crude Oil price index, 1997=100, left axis Regular gasoline prices, regina, cents per litre, right axis
Source: CANSIM II Database (Vector v1576530 and v735048 respectively)
Bivariate and multivariate models
(Education) x y (Income)
(Education) x1
(Sex) x2
(Experience) x3
(Age) x4
y (Income)
Bivariate or simple regression model
Multivariate or multiple regression model
Price of wheat Quantity of wheat produced
Model with simultaneous relationship
Bivariate or simple linear regression (ASW, 466)• x is the independent variable• y is the dependent variable• The regression model is
• The model has two variables, the independent or explanatory variable, x, and the dependent variable y, the variable whose variation is to be explained.
• The relationship between x and y is a linear or straight line relationship.
• Two parameters to estimate – the slope of the line β1 and the y-intercept β0 (where the line crosses the vertical axis).
• ε is the unexplained, random, or error component. Much more on this later.
xy 10
Regression line• The regression model is• Data about x and y are obtained from a sample.• From the sample of values of x and y, estimates b0 of
β0 and b1 of β1 are obtained using the least squares or another method.
• The resulting estimate of the model is
• The symbol is termed “y hat” and refers to the predicted values of the dependent variable y that are associated with values of x, given the linear model.
xy 10
xbby 10ˆ y
Relationships
• Economic theory specifies the type and structure of relationships that are to be expected.
• Historical studies.• Studies conducted by other researchers – different
samples and related issues.• Speculation about possible relationships.• Correlation and causation.• Theoretical reasons for estimation of regression
relationships; empirical relationships need to have theoretical explanation.
Uses of regression
• Amount of change in a dependent variable that results from changes in the independent variable(s) – can be used to estimate elasticities, returns on investment in human capital, etc.
• Attempt to determine causes of phenomena.• Prediction and forecasting of sales, economic
growth, etc.• Support or negate theoretical model. • Modify and improve theoretical models and
explanations of phenomena.
Income hrs/week Income hrs/week
8000 38 8000 35
6400 50 18000 37.5
2500 15 5400 37
3000 30 15000 35
6000 50 3500 30
5000 38 24000 45
8000 50 1000 4
4000 20 8000 37.5
11000 45 2100 25
25000 50 8000 46
4000 20 4000 30
8800 35 1000 200
5000 30 2000 200
7000 43 4800 30
Summer Income as a Function of Hours Worked
0
5000
10000
15000
20000
25000
30000
0 10 20 30 40 50 60
Hours per Week
Inc
om
e
xy 2972461ˆ
R2 = 0.311Significance = 0.0031
15
Outliers
• Rare, extreme values may distort the outcome.– Could be an error.– Could be a very important observation.
• Outlier: more than 3 standard deviations from the mean.
GPA vs. Time Online
0
2
4
6
8
10
12
50 55 60 65 70 75 80 85 90 95 100
GPA
Tim
e O
nlin
e
GPA vs. Time Online
0
1
2
3
4
5
6
7
8
9
50 55 60 65 70 75 80 85 90 95 100
GPA
Tim
e O
nlin
e
0
20
40
60
80
100
120
140
160
0 100 200 300 400 500 600
Crude Oil Price Index (1997=100)
Re
gu
lar
ga
so
line
pri
ce
s, r
eg
ina
, ce
nts
pe
r lit
re
Source: CANSIM II Database (Vector v1576530 and v735048 respectively)
Correlation = 0.8703
19
U-Shaped Relationship
0
2
4
6
8
10
12
0 2 4 6 8 10 12
X
Y
Correlation = +0.12.
Next Wednesday, November 12
• Least squares method (ASW, 12.2)• Goodness of fit (ASW, 12.3)• Assumptions of model (ASW, 12.4)
• Assignment 5 will be available on UR Courses by late afternoon Friday, November 7.
• No class on Monday, November 10 but I will be in the office, CL247, from 2:30 – 4:00 p.m.