Linear Regression Analysis and Least Square Methods


  • 8/10/2019 Linear Regression Analysis and Least Square Methods

    LINEAR REGRESSION ANALYSIS AND LEAST SQUARE METHODS

    CDR SUMEET SINGH

    CDR SUNIL TYAGI

    CDR LOVEKESH THAKUR

    CDR ASHIM MAHAJAN

    THE SCHEME

    ORIGIN
    SCATTER DIAGRAM AND REGRESSION
    LEAST SQUARE METHODS
    STANDARD ERROR ESTIMATES
    CORRELATION ANALYSIS
    EXAMPLES
    LIMITATIONS, ERRORS AND CAVEATS

    ORIGIN OF WORD REGRESSION

    First used as a statistical term in 1877 by Sir Francis Galton.

    A study by him showed that children born to tall parents tend to move back, or regress, towards the mean height of the population.

    He designated the word regression as the name of a general process of predicting one variable from another.

    WHY REGRESSION ANALYSIS

    The new CEO of a pharmaceutical firm wants evidence that the profit of the firm is related to the amount spent on R&D.

    Past data on R&D spending and the annual profit earned are available.

    Using regression techniques and the known past data, future outcomes can be estimated.

    PRACTICAL APPLICATIONS OF REGRESSION ANALYSIS

    Epidemiology - Early evidence relating tobacco smoking to mortality and morbidity came from observational studies employing regression analysis.

    Finance - The capital asset pricing model uses linear regression for analyzing and quantifying the systematic risk of an investment.

    Economics - Linear regression is the predominant empirical tool in economics. E.g., it is used to predict consumption spending, fixed investment spending, inventory investment, purchases of a country's exports, spending on imports, the demand to hold liquid assets, labor demand and labor supply.

    Environmental Science - Linear regression is used in a wide range of environmental science applications.

    INTRODUCTION TO REGRESSION ANALYSIS

    How to determine the relationship between variables.

    Regression analysis is used to:
      Predict the value of a dependent variable based on the value of at least one independent variable
      Explain the impact of changes in an independent variable on the dependent variable

    Dependent variable: the variable we wish to explain

    Independent variable: the variable used to explain the dependent variable

    SIMPLE LINEAR REGRESSION MODEL

    Only one independent variable, x

    Relationship between x and y is described by a linear function

    Changes in y are assumed to be caused by changes in x

    TYPES OF RELATIONSHIPS

    Direct Relationship: As the independent variable increases, the dependent variable also increases.

    [Figure: Positive Linear Relationship]

    TYPES OF RELATIONSHIPS

    Inverse Relationship: In this relationship the dependent variable decreases with an increase in the independent variable.

    [Figure: Negative Linear Relationship]

    SCATTER PLOTS AND CORRELATION

    A scatter plot (or scatter diagram) is used to show the relationship between two variables

    Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
      Only concerned with the strength of the relationship
      No causal effect is implied

    SCATTER PLOT EXAMPLES

    [Figure: scatter plots of y against x illustrating linear relationships, curvilinear relationships, strong relationships, weak relationships, and no relationship]

    ESTIMATION USING THE REGRESSION LINE

    [Figure: regression line with a marked point (X2, Y2) or (4, 11)]

    EQUATION FOR A STRAIGHT LINE

    Y = a + bX

    Y: dependent variable
    a: Y intercept
    b: slope of the line
    X: independent variable


    LEAST SQUARES METHOD

    LINEAR REGRESSION ASSUMPTIONS

    Error values (e) are statistically independent
    Error values are normally distributed for any given value of x
    The probability distribution of the errors is normal
    The probability distribution of the errors has constant variance
    The underlying relationship between the x variable and the y variable is linear

    LINEAR REGRESSION

    [Figure: regression line Y = a + bx with intercept a and slope b, showing for a given xi the observed value of y, the predicted value of y, and the random error ei between them]

    METHOD OF LEAST SQUARES

    Squares: the squares of the errors

    Error: the difference between the actual data point and the corresponding point on the estimated line

    Why least squares and not the algebraic sum or the absolute sum?

    Let's go step by step

    GOOD FIT

    [Figure: Graph 1 and Graph 2, two candidate lines with the errors marked on the Y-values axis]

    Graph 1, ALGEBRAIC SUM: 1 + 2 - 3 = 0
    Graph 2, ABSOLUTE SUM: 4 + 2 + 2 = 8

    GOOD FIT - LEAST SQUARES

    [Figure: Graph 1 and Graph 2, two candidate lines with the errors marked on the Y-values axis]

    Graph 1: (1)^2 + (2)^2 + (-3)^2 = 14
    Graph 2: (4)^2 + (2)^2 + (2)^2 = 24

    GOOD FIT - ALGEBRAIC SUM

    Method 1: ALGEBRAIC SUM

    Let us take a data sample of three points for ease: (4,8), (8,1), (12,6)

    Graph 1 and 2 show the types of line that could describe the association between the points

    Basic understanding of good fit: a line is a good fit if it minimises the error between the estimated points on the line and the actual points

    GOOD FIT - ALGEBRAIC SUM

    Two different lines, both with mean error = 0.

    The problem with adding the individual errors is the cancelling effect of the positive and negative values.

    [Figure: Graph 1 and Graph 2 with the errors marked on the Y-values axis]

    Graph 1: 1 + 2 - 3 = 0
    Graph 2: 4 - 2 - 2 = 0

    The individual random error terms ei have a mean of zero
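
    The cancelling effect can be checked with a few lines of Python. This is only an illustrative sketch: the error values are the ones read off Graph 1 and Graph 2 on the slides, and the function names are made up for this example.

```python
# Signed errors read off the two graphs in the algebraic-sum example.
graph1_errors = [1, 2, -3]
graph2_errors = [4, -2, -2]


def algebraic_sum(errors):
    """Add the errors with their signs, so positives and negatives cancel."""
    return sum(errors)


def absolute_sum(errors):
    """Add the magnitudes of the errors, ignoring their signs."""
    return sum(abs(e) for e in errors)


for name, errors in [("Graph 1", graph1_errors), ("Graph 2", graph2_errors)]:
    print(name,
          "algebraic sum:", algebraic_sum(errors),
          "absolute sum:", absolute_sum(errors))
```

    Both lines have an algebraic sum of zero, so this criterion cannot tell them apart, while the absolute sums differ.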

    GOOD FIT - ABSOLUTE SUM

    Method 2: ABSOLUTE SUM

    Let us take a data sample of three points for ease: (4,8), (8,1), (12,6)

    Graph 1 and 2 show the types of line that could describe the association between the points

    Let us now take the absolute values of the errors, without their signs, |e|, for the two lines

    GOOD FIT - ABSOLUTE SUM

    For two different lines, the absolute sum seems to represent the relation between the variables better.

    [Figure: Graph 1 and Graph 2 with the absolute errors marked on the Y-values axis]

    Graph 1: 1 + 2 + 3 = 6
    Graph 2: 4 + 2 + 2 = 8

    GOOD FIT - ABSOLUTE SUM

    But before we reach any conclusion, let us look at a peculiar situation.

    Data set: {(2,4), (7,6), (10,2)}

    [Figure: Graph 1 and Graph 2 with the absolute errors marked on the Y-values axis]

    Graph 1: 0 + 0 + 3 = 3
    Graph 2: 1 + 2 + 1.5 = 4.5

    Graph 1 ignores the middle point but still has the lower absolute error. Intuitively, Graph 2 should have given a better fit for the complete data. So what is the problem?

    GOOD FIT - ABSOLUTE SUM

    The problem with the absolute sum is that the line passing through the middle of the data, which is a better representative, may have a larger absolute error and hence get rejected.

    The sum of absolute errors method does not stress the magnitude of the error with respect to the sample data.

    A representative line should have several small errors rather than a few large errors.

    GOOD FIT - LEAST SQUARES

    For the same data set, let us now use the least squares method: we square the individual errors before we add them.

    Data set: {(2,4), (7,6), (10,2)}

    [Figure: Graph 1 and Graph 2 with the errors marked on the Y-values axis]

    Graph 1: 0 + 0 + (3)^2 = 9
    Graph 2: (1)^2 + (2)^2 + (1.5)^2 = 7.25

    Graph 2, which intuitively gave a better fit of the data sample, now has the smaller sum of squared errors and is shown to be a better fit than Graph 1.
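
    The comparison between the two criteria can be sketched in Python as below. The error magnitudes are the ones quoted on the slides for Graph 1 and Graph 2; the helper names are hypothetical and chosen only for this illustration.

```python
# Error magnitudes quoted for the data set {(2,4), (7,6), (10,2)}.
graph1_errors = [0, 0, 3]      # line through two points, far from the third
graph2_errors = [1, 2, 1.5]    # line through the middle of the data


def sum_abs(errors):
    """Sum of absolute errors."""
    return sum(abs(e) for e in errors)


def sum_sq(errors):
    """Sum of squared errors, which penalises large errors more heavily."""
    return sum(e ** 2 for e in errors)


def better_graph(criterion):
    """Name the graph with the smaller value of the given criterion."""
    if criterion(graph1_errors) < criterion(graph2_errors):
        return "Graph 1"
    return "Graph 2"


print("absolute sum prefers:", better_graph(sum_abs))
print("least squares prefers:", better_graph(sum_sq))
```

    The two criteria disagree on this data set: the single large error of Graph 1 counts for little in the absolute sum but dominates once it is squared.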

    GOOD FIT - LEAST SQUARES

    Squaring the errors has the following advantages:
      It magnifies or penalises the larger errors.
      It cancels the effect of negative errors, since the square of a negative value is a positive number.

    The estimating line that minimises the sum of the squares of the errors is called the line of the least squares method.

    LEAST SQUARES CRITERION

    a and b are obtained by finding the values of a and b that minimize the sum of the squared residuals

    $$\sum e^2 = \sum (y - \hat{y})^2 = \sum \bigl(y - (a + bx)\bigr)^2$$
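
    As a sketch of how this criterion leads to formulas for a and b (a standard calculus argument, not shown on the slides), set the partial derivatives of the sum of squared residuals to zero:

```latex
S(a, b) = \sum \bigl(y - (a + bx)\bigr)^2

\frac{\partial S}{\partial a} = -2 \sum \bigl(y - a - bx\bigr) = 0
\quad\Longrightarrow\quad \sum y = na + b \sum x

\frac{\partial S}{\partial b} = -2 \sum x \bigl(y - a - bx\bigr) = 0
\quad\Longrightarrow\quad \sum xy = a \sum x + b \sum x^2

\text{Solving the two equations together:}\qquad
b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}},
\qquad a = \bar{y} - b\bar{x}
```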

    THE LEAST SQUARES EQUATION

    The formulas for b and a are:

    $$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$

    algebraic equivalent:

    $$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$

    and

    $$a = \bar{y} - b\bar{x}$$

    $$\hat{y}_i = a + bx_i$$
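
    A minimal Python sketch of the two equivalent formulas for b, together with a = ȳ - b·x̄, is given below. The function names are invented for this illustration, and the data are the three points {(2,4), (7,6), (10,2)} used earlier in the slides.

```python
def fit_deviation_form(xs, ys):
    """Slope and intercept from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b


def fit_sums_form(xs, ys):
    """Slope and intercept from the algebraically equivalent sums formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return a, b


xs = [2, 7, 10]
ys = [4, 6, 2]
print(fit_deviation_form(xs, ys))
print(fit_sums_form(xs, ys))
```

    Both functions should return the same line up to floating-point rounding; the sums form is the one that is convenient for hand calculation from a table of totals.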

    INTERPRETATION OF THE SLOPE AND THE INTERCEPT

    a is the estimated average value of y when the value of x is zero

    b is the estimated change in the average value of y as a result of a one-unit change in x


    ERRORS AND CORRELATION

    ERRORS

    How to Check Accuracy of Estimated Line
    How to Check Reliability of Estimated Line

    DRUNKEN DRIVING AND HOSPITAL EMERGENCIES EXPD

    Checks    Expenditure (Lakhs)
    1         123
    3         130
    7         110
    10        60
    15        21

    Accuracy Check

    The estimated line should follow the path of the data points
    Individual errors should cancel each other

    CHECKING ACCURACY

    [Figure: scatter plot of y against x with the estimated line Y = a + bx]

    Reliability

    [Figure: two scatter plots around a regression line, labelled More Reliable and Less Reliable]

    Measured as the deviation of the points around the regression line

    Standard Error of Estimate

    The standard deviation of the variation of observations around the regression line is estimated by

    $$s_e = \sqrt{\frac{SSE}{n - 2}}$$

    Where
    SSE = Sum of squares error = $\sum (Y - \hat{Y})^2$
    n = Sample size

    INTERPRETING STANDARD ERROR

    The smaller the Se, the better the reliability

    If Se = 0: all points lie on the regression line (100% reliability)
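
    The Se = 0 case can be illustrated with a short Python sketch. Both data sets below are hypothetical (they do not come from the slides), and the helper names are made up for this example; the standard error is computed as the square root of SSE / (n - 2).

```python
import math


def fit_line(xs, ys):
    """Least squares intercept a and slope b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


def standard_error(xs, ys):
    """Standard error of estimate: sqrt(SSE / (n - 2))."""
    a, b = fit_line(xs, ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))


xs = [1, 2, 3, 4, 5]
on_line = [3 + 2 * x for x in xs]          # hypothetical: exactly on y = 3 + 2x
scattered = [5.5, 6.0, 9.5, 10.0, 13.5]    # hypothetical: same trend with noise

print("points on the line:", standard_error(xs, on_line))
print("scattered points:  ", standard_error(xs, scattered))
```

    When every point lies on the fitted line all residuals vanish and the standard error is zero; any scatter around the line makes it positive.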

    INTERPRETING STANDARD ERROR

    Assuming that the observed points are normally distributed around the regression line and that the variance of the distribution around each possible value of Y is the same:

    [Figure: normal curve centred on the regression line]

    68.2% of the points lie within ± 1 Se
    95.5% of the points lie within ± 2 Se
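
    The two percentages are the usual normal-distribution coverage figures for one and two standard deviations. A small sketch, assuming normally distributed errors and using only Python's standard `math.erf`, shows where they come from (the function name is chosen for this example):

```python
import math


def coverage_within(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2):
    print(f"within {k} Se: {coverage_within(k):.4f}")
```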

    Interpreting SE

    [Figure: scatter plot of y against x with the regression line Y = a + bx and the parallel bands Y = a + bx ± 1Se and Y = a + bx ± 2Se]

    Drunk Driving Checks

    X: Checks    Y: Expenditure (Lakhs)
    1            123
    3            130
    7            110
    10           60
    15           21

    Se = 1.88 LAKHS
    68.2% ACCURACY WITHIN 1.88 LAKHS
    95.5% ACCURACY WITHIN 3.76 LAKHS

    EXCEL FUNCTION: STEYX

    CORRELATION ANALYSIS

    Describes the degree to which one variable is linearly related to another

    Used in conjunction with regression analysis to explain how well the regression line explains the variation of the dependent variable

    Two measures:
      Coefficient of Determination
      Coefficient of Correlation

    COEFFICIENT OF DETERMINATION

    Measures Strength of Association

    Developed from the variations of the Y values around:
      Fitted Regression Line: $\sum (Y - \hat{Y})^2$
      Their own mean: $\sum (Y - \bar{Y})^2$

    $$R^2 = 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}$$

    Varies Between 0 and 1

    Interpreting R2

    COEFFICIENT OF CORRELATION

    Another Measure of Association

    $$R = \pm\sqrt{R^2}$$ (with the same sign as the slope b)

    Varies Between -1 and 1

    R = -0.9 indicates a negative relation between x and y

    R2 = 0.81 means 81% of the variation in Y is explained by the regression line
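
    As an illustrative sketch, both coefficients can be computed in Python for the drunk driving checks data shown earlier in the slides. The function names are hypothetical, and giving R the sign of the slope is the usual convention rather than something spelled out on the slide.

```python
import math

# Drunk driving checks data from the slides.
checks = [1, 3, 7, 10, 15]
expenditure = [123, 130, 110, 60, 21]


def fit_line(xs, ys):
    """Least squares intercept a and slope b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


def r_squared(xs, ys):
    """Coefficient of determination: 1 - SSE / (variation about the mean)."""
    a, b = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


def correlation(xs, ys):
    """Coefficient of correlation: square root of R^2 with the sign of the slope."""
    _, b = fit_line(xs, ys)
    return math.copysign(math.sqrt(max(r_squared(xs, ys), 0.0)), b)


print("R^2 =", round(r_squared(checks, expenditure), 4))
print("R   =", round(correlation(checks, expenditure), 4))
```

    Expenditure falls as the number of checks rises, so the slope and therefore R come out negative, while R² stays between 0 and 1.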

    EXAMPLES AND USAGE OF REGRESSION

    Simple Linear Regression Example

    Cost accountants often estimate overhead based on the level of production. At the Standard Knitting Co., they have collected information on overhead expenses and units produced at different plants, and want to estimate a regression equation to predict future overhead.

    Data provided

    OVERHEADS    UNITS PRODUCED
    191          40
    170          42
    272          53
    155          35
    280          56
    173          39
    234          48
    116          30
    153          37
    178          40

    Simple Linear Regression Example

    Develop the Regression Equation
    Predict overhead when 50 units are produced
    Calculate the standard error of estimate

    Firstly, determine the variables:
    Dependent variable (y) = overhead
    Independent variable (x) = units produced

    Remember - Least Squares Equation

    The formulas for b and a are:

    $$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$

    algebraic equivalent:

    $$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$

    and

    $$a = \bar{y} - b\bar{x}$$

    $$\hat{y} = a + bx$$

    Working out the problem

             OVERHEAD (y)   UNITS (x)   y^2      x^2     xy
    1        191            40          36481    1600    7640
    2        170            42          28900    1764    7140
    3        272            53          73984    2809    14416
    4        155            35          24025    1225    5425
    5        280            56          78400    3136    15680
    6        173            39          29929    1521    6747
    7        234            48          54756    2304    11232
    8        116            30          13456    900     3480
    9        153            37          23409    1369    5661
    10       178            40          31684    1600    7120
    Sums     1922           420         395024   18228   84541
    Means    192.2          42
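
    The Sums and Means rows of the working table can be reproduced with a short Python sketch; the variable names below are chosen for this illustration only.

```python
# Overhead (y) and units produced (x) for the ten plants in the example.
overhead = [191, 170, 272, 155, 280, 173, 234, 116, 153, 178]
units = [40, 42, 53, 35, 56, 39, 48, 30, 37, 40]

# Column totals as laid out in the working table.
sums = {
    "y": sum(overhead),
    "x": sum(units),
    "y2": sum(y * y for y in overhead),
    "x2": sum(x * x for x in units),
    "xy": sum(x * y for x, y in zip(units, overhead)),
}
means = {"y": sums["y"] / len(overhead), "x": sums["x"] / len(units)}

print(sums)
print(means)
```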

    Substituting in formulae

             OVERHEAD (y)   UNITS (x)   y^2      x^2     xy
    Sums     1922           420         395024   18228   84541
    Means    192.2          42

    $$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{84541 - \frac{(420)(1922)}{10}}{18228 - \frac{(420)^2}{10}} = \frac{3817}{588} = 6.4915$$

    $$a = \bar{y} - b\bar{x} = 192.2 - (6.4915)(42) = -80.4430$$

    Regression Equation developed

    $$\hat{y} = a + bx = -80.4430 + 6.4915x$$

    Predict overhead when 50 units are produced

    $$\hat{y} = -80.4430 + 6.4915(50) = 244.1320$$

    The predicted overhead for 50 units is 244.1320
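
    The substitution and the prediction can be checked from the column totals alone. The sketch below assumes the sums from the working table; `predict_overhead` is a hypothetical helper name.

```python
# Column totals from the working table (n = 10 plants).
n = 10
sum_y, sum_x = 1922, 420
sum_x2, sum_xy = 18228, 84541

# Slope and intercept from the sums form of the least squares formulas.
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n


def predict_overhead(units_produced):
    """Predicted overhead from the fitted line y = a + b * x."""
    return a + b * units_produced


print("b =", round(b, 4))
print("a =", round(a, 4))
print("overhead at 50 units =", round(predict_overhead(50), 3))
```

    Carrying full precision for b gives an intercept that differs from the slide's -80.4430 only in the fourth decimal place, because the slide rounds b to 6.4915 before computing a.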

    Remember Standard Error of Estimate

    $$s_e = \sqrt{\frac{SSE}{n - 2}}$$

    SSE = Sum of squares error = $\sum (Y - \hat{Y})^2$
    n = Sample size

    However, easier for calculations is this:

    $$s_e = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}$$

    Substituting in Formulae

             OVERHEAD (y)   UNITS (x)   y^2      x^2     xy
    Sums     1922           420         395024   18228   84541

    a = -80.4430, b = 6.4915

    $$s_e = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}} = \sqrt{\frac{395024 - (-80.4430)(1922) - 6.4915(84541)}{10 - 2}} = 10.2320$$
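
    As a cross-check, the standard error of estimate can be computed both from the residuals and from the shortcut formula. This Python sketch uses the overhead data from the example; the variable names are chosen for illustration.

```python
import math

overhead = [191, 170, 272, 155, 280, 173, 234, 116, 153, 178]   # y
units = [40, 42, 53, 35, 56, 39, 48, 30, 37, 40]                # x
n = len(units)

sum_x, sum_y = sum(units), sum(overhead)
sum_xy = sum(x * y for x, y in zip(units, overhead))
sum_x2 = sum(x * x for x in units)
sum_y2 = sum(y * y for y in overhead)

# Least squares slope and intercept.
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n

# Definition: residuals about the fitted line.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(units, overhead))
se_residuals = math.sqrt(sse / (n - 2))

# Shortcut: only the column totals are needed.
se_shortcut = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

print(round(se_residuals, 4), round(se_shortcut, 4))
```

    The two routes agree up to rounding. The shortcut subtracts large, nearly equal numbers, so a and b should be kept at full precision when it is used.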

    GRAPHICAL PRESENTATION

    [Figure: scatter plot of overhead against units produced, with the regression line]

    $$\hat{y} = -80.4430 + 6.4915x$$

    Calculating the Correlation Coefficient

    Sample correlation coefficient:

    $$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\bigl[\sum (x - \bar{x})^2\bigr]\bigl[\sum (y - \bar{y})^2\bigr]}}$$

    or the algebraic equivalent:

    $$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\bigl[n\sum x^2 - (\sum x)^2\bigr]\bigl[n\sum y^2 - (\sum y)^2\bigr]}}$$

    where:
    r = Sample correlation coefficient
    n = Sample size
    x = Value of the independent variable
    y = Value of the dependent variable

    Calculation Example

    Tree Height (y)   Trunk Diameter (x)   xy      y^2      x^2
    35                8                    280     1225     64
    49                9                    441     2401     81
    27                7                    189     729      49
    33                6                    198     1089     36
    60                13                   780     3600     169
    21                7                    147     441      49
    45                11                   495     2025     121
    51                12                   612     2601     144
    Sum = 321         Sum = 73             3142    14111    713

    Calculation Example

    [Figure: scatter plot of Tree Height, y (0 to 70) against Trunk Diameter, x (0 to 14)]

    $$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\bigl[n\sum x^2 - (\sum x)^2\bigr]\bigl[n\sum y^2 - (\sum y)^2\bigr]}} = \frac{8(3142) - (73)(321)}{\sqrt{\bigl[8(713) - (73)^2\bigr]\bigl[8(14111) - (321)^2\bigr]}} = 0.886$$

    r = 0.886: relatively strong positive linear association between x and y
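
    The tree example can be reproduced with a few lines of Python using the algebraic form of the correlation formula. This is a sketch: the data are the tree heights and trunk diameters from the table, and the function name is hypothetical.

```python
import math

tree_height = [35, 49, 27, 33, 60, 21, 45, 51]     # y
trunk_diameter = [8, 9, 7, 6, 13, 7, 11, 12]       # x


def sample_correlation(xs, ys):
    """Sample correlation coefficient from the sums form of the formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = sample_correlation(trunk_diameter, tree_height)
print(round(r, 3))
```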

    LIMITATIONS, ERRORS & CAVEATS

    The regression equation holds only over the specific, limited range from which the sample was initially taken

    Regression and correlation analyses do not determine cause and effect

    Conditions change and invalidate the regression equation: we use past trends to estimate future trends, but the values of variables change over time

    LIMITATIONS, ERRORS & CAVEATS

    Misrepresenting the Coefficients of Correlation and Determination
      Coefficient of correlation is misinterpreted as a percentage
      Total variation in regression line is explained by coefficient of determination

    Use of Common Sense
      Use knowledge of the inherent limitations of the tool
      Do not look for a statistical relationship between random samples with no common bond
