Linear Regression Analysis and Least Square Methods
-
8/10/2019 Linear Regression Analysis and Least Square Methods
1/65
LINEAR REGRESSION ANALYSIS AND LEAST SQUARE METHODS
CDR SUMEET SINGH
CDR SUNIL TYAGI
CDR LOVEKESH THAKUR
CDR ASHIM MAHAJAN
-
THE SCHEME
ORIGIN
SCATTER DIAGRAM AND REGRESSION
LEAST SQUARE METHODS
STANDARD ERROR ESTIMATES
CORRELATION ANALYSIS
EXAMPLES
LIMITATIONS, ERRORS AND CAVEATS
-
ORIGIN OF WORD REGRESSION
First used as a statistical term in 1877 by Sir Francis Galton.
A study by him showed that children born to tall parents tend to move back, or regress, towards the mean height of the population.
He designated the word regression as the name of a general process of predicting one variable from another.
-
WHY REGRESSION ANALYSIS
The new CEO of a pharmaceutical firm wants evidence that the profit of the firm is related to the amount spent on R&D.
Past data on R&D spending and the annual profit earned is available.
Using regression techniques and the known past data, future outcomes can be estimated.
-
PRACTICAL APPLICATIONS OF REGRESSION ANALYSIS
Epidemiology - Early evidence relating tobacco smoking to mortality and morbidity came from observational studies employing regression analysis.
Finance - The capital asset pricing model uses linear regression for analyzing and quantifying the systematic risk of an investment.
Economics - Linear regression is the predominant empirical tool in economics. E.g., it is used to predict consumption spending, fixed investment spending, inventory investment, purchases of a country's exports, spending on imports, the demand to hold liquid assets, labor demand and labor supply.
Environmental Science - Linear regression finds application in a wide range of environmental science problems.
-
INTRODUCTION TO REGRESSION ANALYSIS
How to determine the relationship between variables.
Regression analysis is used to:
Predict the value of a dependent variable based on the value of at least one independent variable
Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to explain
Independent variable: the variable used to explain the dependent variable
-
SIMPLE LINEAR REGRESSION MODEL
Only one independent variable, x
Relationship between x and y is described by a linear function
Changes in y are assumed to be caused by changes in x
-
TYPES OF RELATIONSHIPS
Direct Relationship: As the independent variable increases, the dependent variable also increases.
[Figure: Positive Linear Relationship]
-
TYPES OF RELATIONSHIPS
Inverse Relationship: The dependent variable decreases as the independent variable increases.
[Figure: Negative Linear Relationship]
-
SCATTER PLOTS AND CORRELATION
A scatter plot (or scatter diagram) is used to show the relationship between two variables
Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
Only concerned with the strength of the relationship
No causal effect is implied
-
SCATTER PLOT EXAMPLES
[Figure: scatter plots of y against x showing linear relationships and curvilinear relationships]
-
SCATTER PLOT EXAMPLES
[Figure: scatter plots of y against x showing strong relationships and weak relationships]
-
SCATTER PLOT EXAMPLES
[Figure: scatter plots of y against x showing no relationship]
-
ESTIMATION USING THE REGRESSION LINE
[Figure: regression line with a labelled point (X2, Y2), or (4, 11)]
-
EQUATION FOR A STRAIGHT LINE
$$Y = a + bX$$
Y: dependent variable
a: Y intercept
b: slope of the line
X: independent variable
-
LEAST SQUARES METHOD
-
LINEAR REGRESSION ASSUMPTIONS
Error values (e) are statistically independent
Error values are normally distributed for any given value of x
The probability distribution of the errors is normal
The probability distribution of the errors has constant variance
The underlying relationship between the x variable and the y variable is linear
-
LINEAR REGRESSION
[Figure: regression line Y = a + bx with intercept a and slope b. At a given xi, the random error ei is the gap between the observed value of y and the predicted value of y.]
-
METHOD OF LEAST SQUARES
Least squares means minimising the sum of the squares of the errors
Error - the difference between the actual data point and the corresponding point on the estimated line
Why least squares and not the algebraic sum or the absolute sum?
Let's go step by step
-
GOOD FIT
ALGEBRAIC SUM / ABSOLUTE SUM
[Figure: Graph 1 and Graph 2, two candidate lines through the same data points, with the errors marked on the Y axis]
Graph 1: 1 + 2 - 3 = 0
Graph 2: 4 + 2 + 2 = 8
-
GOOD FIT - LEAST SQUARES
[Figure: Graph 1 and Graph 2, two candidate lines through the same data points, with the errors marked on the Y axis]
Graph 1: (1)^2 + (2)^2 + (-3)^2 = 14
Graph 2: (4)^2 + (2)^2 + (2)^2 = 24
-
GOOD FIT - ALGEBRAIC SUM
Method 1: ALGEBRAIC SUM
Let us take a data sample of three points for ease: (4,8), (8,1), (12,6)
Graph 1 and 2 show the types of line that could describe the association between the points
Basic understanding of good fit
A line is a good fit if it minimises the error between the estimated points on the line and the actual points
-
GOOD FIT - ALGEBRAIC SUM
Two different lines - mean error = 0.
The problem with adding the individual errors is the cancelling effect of the positive and negative values
[Figure: Graph 1 and Graph 2 with the errors marked on the Y axis]
Graph 1: 1 + 2 - 3 = 0
Graph 2: 4 - 2 - 2 = 0
The individual random error terms ei have a mean of zero
-
GOOD FIT - ABSOLUTE SUM
Method 2: ABSOLUTE SUM
Let us take a data sample of three points for ease: (4,8), (8,1), (12,6)
Graph 1 and 2 show the types of line that could describe the association between the points
Basic understanding of good fit
Let us now take the absolute values of the errors, without their signs, |e|, for the two lines
-
GOOD FIT - ABSOLUTE SUM
Two different lines: the absolute sum seems to represent the relation between the variables better.
[Figure: Graph 1 and Graph 2 with the errors marked on the Y axis]
Graph 1: 1 + 2 + 3 = 6
Graph 2: 4 + 2 + 2 = 8
-
GOOD FIT - ABSOLUTE SUM
But before we reach any conclusion let us look at a peculiar situation
Data set {(2,4), (7,6), (10,2)}
[Figure: Graph 1 and Graph 2 with the errors marked on the Y axis]
Graph 1: 0 + 0 + 3 = 3
Graph 2: 1 + 2 + 1.5 = 4.5
Graph 1 ignores the middle points but still has the lower absolute error. Intuitively, Graph 2 should have given a better fit for the complete data. So what is the problem?
-
GOOD FIT - ABSOLUTE SUM
The problem with the absolute sum is that the line passing through the middle of the data, which is the better representative, may have a larger absolute error and hence get rejected.
The sum of absolute errors method does not stress the magnitude of the error with respect to the sample data.
A representative line should have several small errors rather than a few large errors
-
GOOD FIT - LEAST SQUARES
For the same data set, now let us use the least square method: we square the individual errors before we add them
Data set {(2,4), (7,6), (10,2)}
[Figure: Graph 1 and Graph 2 with the errors marked on the Y axis]
Graph 1: 0 + 0 + (3)^2 = 9
Graph 2: (1)^2 + (2)^2 + (1.5)^2 = 7.25
Graph 2, which intuitively was giving a better fit of the data sample, now shows a smaller error than Graph 1
-
GOOD FIT - LEAST SQUARES
Squaring the errors has the following advantages:
It magnifies, or penalises, the larger errors.
It cancels the effect of negative errors, since the square of a negative value is a positive number.
The estimating line that minimises the sum of the squares of the errors is called the least squares line.
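The three criteria compared on the GOOD FIT slides can be checked with a few lines of Python. This is only an illustrative sketch: the function name `fit_criteria` is made up here, and the error values are the ones read off the slides (for the second data set the slides show magnitudes only, so the signs are an assumption that does not affect the absolute or squared sums).

```python
def fit_criteria(errors):
    """Return (algebraic sum, absolute sum, sum of squares) of the errors."""
    algebraic = sum(errors)
    absolute = sum(abs(e) for e in errors)
    squared = sum(e * e for e in errors)
    return algebraic, absolute, squared

# Errors as shown on the slides (signs of the second data set are assumed).
cases = {
    "data set 1, Graph 1": [1, 2, -3],
    "data set 1, Graph 2": [4, -2, -2],
    "data set 2, Graph 1": [0, 0, 3],
    "data set 2, Graph 2": [1, 2, 1.5],
}

for name, errs in cases.items():
    algebraic, absolute, squared = fit_criteria(errs)
    print(f"{name}: algebraic={algebraic}, absolute={absolute}, squared={squared}")
```

For the second data set the absolute sum prefers Graph 1 (3 against 4.5), while the sum of squares prefers Graph 2 (7.25 against 9), which is the reversal the slides describe.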
-
LEAST SQUARES CRITERION
a and b are obtained by finding the values of a and b that minimize the sum of the squared residuals
$$\sum e^2 = \sum (y - \hat{y})^2 = \sum (y - (a + bx))^2$$
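One standard way to carry out this minimisation, sketched here as a supplement rather than taken from the slides, is to set the partial derivatives of the sum of squared residuals to zero. The symbol S for that sum is introduced only for this sketch.

```latex
\begin{align*}
S(a, b) &= \sum (y - a - bx)^2 \\
\frac{\partial S}{\partial a} &= -2 \sum (y - a - bx) = 0
  \;\Rightarrow\; \sum y = na + b \sum x \\
\frac{\partial S}{\partial b} &= -2 \sum x\,(y - a - bx) = 0
  \;\Rightarrow\; \sum xy = a \sum x + b \sum x^2
\end{align*}
```

Dividing the first equation by n gives a in terms of the two means and b; substituting that into the second equation leaves a single equation for b in terms of the sums of x, y, xy and x squared.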
-
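THE LEAST SQUARES EQUATION
The formulas for b and a are:
$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$
algebraic equivalent:
$$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$
and
$$a = \bar{y} - b\bar{x}$$
The estimated line is then
$$\hat{y}_i = a + bx_i$$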
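As a quick check that the deviation form and the algebraic form of the slope agree, here is a minimal Python sketch. The function names and the four data points are hypothetical, chosen so that the points lie exactly on y = 1 + 2x.

```python
def slope_deviation_form(xs, ys):
    """Slope b from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator

def slope_sums_form(xs, ys):
    """Slope b from the raw sums (the algebraic equivalent)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)

def intercept(xs, ys, b):
    """Intercept a = mean(y) - b * mean(x)."""
    n = len(xs)
    return sum(ys) / n - b * sum(xs) / n

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

b_dev = slope_deviation_form(xs, ys)
b_sum = slope_sums_form(xs, ys)
a = intercept(xs, ys, b_dev)
print(b_dev, b_sum, a)
```

The sums form is convenient for hand calculation from a table of columns; the deviation form is usually the less error-prone of the two in floating-point arithmetic.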
-
INTERPRETATION OF THE SLOPE AND THE INTERCEPT
a is the estimated average value of y when the value of x is zero
b is the estimated change in the average value of y as a result of a one-unit change in x
-
ERRORS AND CORRELATION
-
ERRORS
How to Check Accuracy of Estimated Line
How to Check Reliability of Estimated Line
-
DRUNKEN DRIVING AND HOSPITAL EMERGENCIES EXPD
Checks    Expenditure (Lakhs)
1         123
3         130
7         110
10        60
15        21
-
Accuracy Check
Follow of Path
Individual Errors should cancel each other
-
CHECKING ACCURACY
[Figure: data points plotted on x and y axes with the regression line Y = a + bx]
-
Reliability
[Figure: two scatter plots, one labelled More Reliable and one labelled Less Reliable]
Measured as deviation around the regression line
-
Standard Error of Estimate
The standard deviation of the variation of observations around the regression line is estimated by
$$s_e = \sqrt{\frac{SSE}{n - 2}}$$
Where
SSE = Sum of squares error = $\sum (Y - \hat{Y})^2$
n = Sample size
-
INTERPRETING STANDARD ERROR
The smaller the Se, the better the reliability
If Se = 0: all points lie on the regression line (100% reliability)
-
INTERPRETING STANDARD ERROR
Assuming that the observed points are normally distributed around the regression line and that the variance of the distribution around each possible value of Y is the same:
68.2% of the points lie within ±1 Se of the regression line
95.5% of the points lie within ±2 Se of the regression line
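The two percentages come from the normal distribution: they are approximately the probabilities that a normal variable falls within one and two standard deviations of its mean. A small sketch using only Python's standard library (the helper name `prob_within` is made up for this example):

```python
import math

def prob_within(k):
    """P(|Z| < k) for a standard normal variable Z, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2):
    print(f"within {k} Se: {prob_within(k):.4f}")
```

The printed values differ from the slide's figures only in the rounding of the last digit.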
-
Interpreting SE
[Figure: regression line Y = a + bx on x and y axes, with parallel bands Y = a + bx ± 1Se and Y = a + bx ± 2Se]
-
Drunk Driving Checks
X: Checks    Y: Expenditure (Lakhs)
1            123
3            130
7            110
10           60
15           21
Se = 1.88 LAKHS
68.2% ACCURACY WITHIN 1.88 LAKHS
95.5% ACCURACY WITHIN 3.76 LAKHS
EXCEL FUNCTION: STEYX
-
CORRELATION ANALYSIS
Describes the degree to which one variable is linearly related to another
Used in conjunction with regression analysis (RA) to explain how well the regression line explains the variation of the dependent variable
Coefficient of Determination
Coefficient of Correlation
-
COEFFICIENT OF DETERMINATION
Measures Strength of Association
Developed from the variations of the Y values around:
Fitted Regression Line: $\sum (Y - \hat{Y})^2$
Their own mean: $\sum (Y - \bar{Y})^2$
$$R^2 = 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}$$
Varies Between 0 and 1
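The definition can be turned into a short Python sketch. The function name `r_squared` is hypothetical, the line is fitted by ordinary least squares, and the data are the drunk driving checks and expenditure figures from the earlier slide.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a least squares line fitted to (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    # Variation of Y around the fitted regression line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Variation of Y around its own mean.
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

checks = [1, 3, 7, 10, 15]
expenditure = [123, 130, 110, 60, 21]
print(round(r_squared(checks, expenditure), 3))
```

A value close to 1 means that most of the variation in Y is accounted for by the fitted line; a value close to 0 means the line explains little more than the mean of Y does.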
-
Interpreting R2
-
COEFFICIENT OF CORRELATION
Another Measure of Association
$$R = \pm\sqrt{R^2}$$
The sign of R is the sign of the slope b
Varies Between -1 and 1
R = -0.9 indicates a negative relation between x and y
R2 = 0.81 means 81% of the variation in Y is explained by the regression line
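A minimal sketch of the step from R2 back to R, assuming the sign is taken from the slope of the fitted line. The function name and the slope value -2.0 are hypothetical; only R2 = 0.81 comes from the slide.

```python
import math

def correlation_from_r2(r2, slope):
    """Return R with magnitude sqrt(R^2) and the sign of the regression slope."""
    return math.copysign(math.sqrt(r2), slope)

# R^2 = 0.81 with a negative slope gives a correlation of about -0.9.
print(correlation_from_r2(0.81, slope=-2.0))
```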
-
EXAMPLES AND USAGE OF REGRESSION
-
Simple Linear Regression Example
Cost accountants often estimate overhead based on the level of production. At the Standard Knitting Co., they have collected information on overhead expenses and units produced at different plants, and want to estimate a regression equation to predict future overhead.
-
Data provided
OVERHEADS    UNITS PRODUCED
191          40
170          42
272          53
155          35
280          56
173          39
234          48
116          30
153          37
178          40
-
Simple Linear Regression Example
Develop the Regression Equation
Predict overhead when 50 units are produced
Calculate the standard error of estimate
Firstly, determine what is the
Dependent variable (y) = overhead
Independent variable (x) = units produced
-
Remember - Least Squares Equation
The formulas for b and a are:
$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$
algebraic equivalent:
$$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$
and
$$a = \bar{y} - b\bar{x}$$
$$\hat{y} = a + bx$$
-
Working out the problem
       OVERHEAD (y)  UNITS (x)  y^2      x^2     xy
1      191           40         36481    1600    7640
2      170           42         28900    1764    7140
3      272           53         73984    2809    14416
4      155           35         24025    1225    5425
5      280           56         78400    3136    15680
6      173           39         29929    1521    6747
7      234           48         54756    2304    11232
8      116           30         13456    900     3480
9      153           37         23409    1369    5661
10     178           40         31684    1600    7120
Sums   1922          420        395024   18228   84541
Means  192.2         42
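The working table can be rebuilt from the raw data in a few lines of Python, which is a convenient check on the hand-computed columns. The variable names are chosen for this sketch only.

```python
overhead = [191, 170, 272, 155, 280, 173, 234, 116, 153, 178]  # y
units = [40, 42, 53, 35, 56, 39, 48, 30, 37, 40]               # x

# One row per plant: y, x, y^2, x^2, xy
rows = [(y, x, y * y, x * x, x * y) for y, x in zip(overhead, units)]

# Column totals and the two means used in the formulas.
sums = [sum(column) for column in zip(*rows)]
n = len(rows)
mean_y, mean_x = sums[0] / n, sums[1] / n

print("Sums :", sums)
print("Means:", mean_y, mean_x)
```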
-
Substituting in formulae
       OVERHEAD (y)  UNITS (x)  y^2      x^2     xy
Sums   1922          420        395024   18228   84541
Means  192.2         42
$$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{84541 - \frac{(420)(1922)}{10}}{18228 - \frac{(420)^2}{10}}$$
$$b = \frac{3817}{588} = 6.4915$$
$$a = \bar{y} - b\bar{x} = 192.2 - (6.4915)(42)$$
$$a = -80.4430$$
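The substitution can be repeated in Python from the column sums alone. This is a sketch with a hypothetical helper name; note that the slide rounds b to 6.4915 before computing a, so the unrounded intercept printed here can differ from -80.4430 in the fourth decimal place.

```python
def coefficients_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Slope b and intercept a from the column sums of the working table."""
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return a, b

a, b = coefficients_from_sums(n=10, sum_x=420, sum_y=1922,
                              sum_x2=18228, sum_xy=84541)
print("b =", round(b, 4))
print("a =", round(a, 4))
```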
-
Regression Equation developed
$$\hat{y} = a + bx$$
$$\hat{y} = -80.4430 + 6.4915x$$
Predict overhead when 50 units are produced
$$\hat{y} = -80.4430 + 6.4915(50)$$
$$\hat{y} = 244.1320$$
The predicted overhead for 50 units is 244.1320
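Once a and b are known, prediction is a single evaluation of the line. A minimal sketch, using the rounded coefficients from the slides and a hypothetical function name:

```python
A = -80.4430  # intercept, as rounded on the slides
B = 6.4915    # slope, as rounded on the slides

def predict_overhead(units):
    """Predicted overhead for a given number of units produced."""
    return A + B * units

print(round(predict_overhead(50), 4))
```

The sample covers 30 to 56 units produced, so 50 units lies inside the range over which the equation was fitted.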
-
Remember - Standard Error of Estimate
$$s_e = \sqrt{\frac{SSE}{n - 2}}$$
SSE = Sum of squares error = $\sum (Y - \hat{Y})^2$
n = Sample size
However, easier for calculations is this:
$$s_e = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}$$
-
Substituting in Formulae
       OVERHEAD (y)  UNITS (x)  y^2      x^2     xy
Sums   1922          420        395024   18228   84541
a = -80.4430, b = 6.4915
$$s_e = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}$$
$$s_e = \sqrt{\frac{395024 - (-80.4430)(1922) - 6.4915(84541)}{10 - 2}}$$
$$s_e = 10.2320$$
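The standard error of estimate can be computed both from its definition (squared residuals) and from the shortcut formula, and the two should agree. The sketch below does both for the overhead data; variable names are chosen for this example, and a and b are left unrounded, so the result matches the slide's value only to about two decimal places.

```python
import math

overhead = [191, 170, 272, 155, 280, 173, 234, 116, 153, 178]  # y
units = [40, 42, 53, 35, 56, 39, 48, 30, 37, 40]               # x
n = len(units)

sum_x, sum_y = sum(units), sum(overhead)
sum_xy = sum(x * y for x, y in zip(units, overhead))
sum_x2 = sum(x * x for x in units)
sum_y2 = sum(y * y for y in overhead)

# Least squares coefficients.
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n

# Definition: SSE is the sum of squared residuals about the fitted line.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(units, overhead))
se_definition = math.sqrt(sse / (n - 2))

# Shortcut: the same quantity from the column sums.
se_shortcut = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

print(round(se_definition, 2), round(se_shortcut, 2))
```

The shortcut subtracts large, nearly equal numbers, so it is sensitive to how far a and b are rounded; the definition form is the safer choice when precision matters.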
-
GRAPHICAL PRESENTATION
[Figure: Overhead - Units Produced: scatter plot and regression line]
$$\hat{y} = -80.4430 + 6.4915x$$
-
Calculating the Correlation Coefficient
Sample correlation coefficient:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{[\sum (x - \bar{x})^2][\sum (y - \bar{y})^2]}}$$
or the algebraic equivalent:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$
where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
-
Calculation Example
Tree Height (y)  Trunk Diameter (x)  xy     y^2     x^2
35               8                   280    1225    64
49               9                   441    2401    81
27               7                   189    729     49
33               6                   198    1089    36
60               13                  780    3600    169
21               7                   147    441     49
45               11                  495    2025    121
51               12                  612    2601    144
Sum = 321        Sum = 73            3142   14111   713
-
Calculation Example
[Figure: scatter plot of Tree Height, y (axis 0 to 70) against Trunk Diameter, x (axis 0 to 14)]
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$
$$r = \frac{8(3142) - (73)(321)}{\sqrt{[8(713) - (73)^2][8(14111) - (321)^2]}} = 0.886$$
r = 0.886: relatively strong positive linear association between x and y
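The tree example can be reproduced with the algebraic form of the correlation coefficient. A short sketch with a hypothetical function name, using the eight observations from the table:

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient r from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

trunk_diameter = [8, 9, 7, 6, 13, 7, 11, 12]   # x
tree_height = [35, 49, 27, 33, 60, 21, 45, 51]  # y

print(round(correlation(trunk_diameter, tree_height), 3))  # 0.886
```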
-
LIMITATIONS, ERRORS & CAVEATS
The regression equation holds only over the specific limited range from which the sample was initially taken
Regression & correlation analyses do not determine cause and effect
Conditions change and invalidate the regression equation: we use past trends to estimate future trends, but the values of variables change over time
-
LIMITATIONS, ERRORS & CAVEATS
Misrepresenting the Coefficients of Correlation and Determination
Coeff of correlation is misinterpreted as a percentage
Total variation in regression line is explained by coeff of determination
Use of Common Sense
Use knowledge of the inherent limitations of the tool
Do not find statistical relationships between random samples with no common bond