Chapter 13 Simple Linear Regression - casrilanka.com eight - linear... · 2 Statistics for Managers...

21
1 Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-1 Statistics for Managers Using Microsoft® Excel 5th Edition Chapter 13 Simple Linear Regression Statistics for Managers Using Microsoft Excel, 5e © 2008 Pearson Prentice-Hall, Inc. Chap 2-2 Scatter Plots Scatter plots are used for numerical data consisting of paired observations taken from two numerical variables One variable is measured on the vertical axis and the other variable is measured on the horizontal axis

Transcript of Chapter 13 Simple Linear Regression - casrilanka.com eight - linear... · 2 Statistics for Managers...

  • 1

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-1

    Statistics for ManagersUsing Microsoft® Excel

    5th Edition

    Chapter 13

    Simple Linear Regression

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Pearson Prentice-Hall, Inc. Chap 2-2

    Scatter Plots

    Scatter plots are used for numerical data consistingof paired observations taken from two numericalvariables

    One variable is measured on the vertical axis and theother variable is measured on the horizontal axis

  • 2

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Pearson Prentice-Hall, Inc. Chap 2-3

    Scatter Plot Example

    Volumeper day

    Cost perday

    23 125

    26 140

    29 146

    33 160

    38 167

    42 170

    50 188

    55 195

    60 200

    Cost per Day vs. Production Volume

    0

    50

    100

    150

    200

    250

    20 30 40 50 60 70

    Volume per Day

    Co

    st

    per

    Day

    Class Exercise

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-4

  • 3

    Class Exercise

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-5

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-6

    Correlation vs. Regression

    A scatter plot (or scatter diagram) can beused to show the relationship between twonumerical variables

    Correlation analysis is used to measurestrength of the association (linearrelationship) between two variables Correlation is only concerned with strength of

    the relationship No causal effect is implied with correlation

  • 4

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-7

    Regression Analysis

    Regression analysis is used to:

    Predict the value of a dependent variable basedon the value of at least one independent variable

    Explain the impact of changes in an independentvariable on the dependent variable

    Dependent variable: the variable you wish to explain

    Independent variable: the variable used to explainthe dependent variable

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-8

    Simple Linear RegressionModel

    Only one independent variable, X Relationship between X and Y is described

    by a linear function

    Changes in Y are related to changes in X

  • 5

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-9

    Types of Relationships

    Y

    X

    Y

    X

    Y

    Y

    X

    X

    Linear relationships Curvilinear relationships

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-10

    Types of Relationships

    Y

    X

    Y

    X

    Y

    Y

    X

    X

    Strong relationships Weak relationships

  • 6

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-11

    Types of Relationships

    Y

    X

    Y

    XNo relationship

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-12

    The Linear Regression Model

    ii10i εXββY Linear component

    The population regression model:

    PopulationY intercept

    PopulationSlopeCoefficient

    RandomErrorterm

    DependentVariable

    IndependentVariable

    Random Errorcomponent

  • 7

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-13

    The Linear Regression Model

    Random Errorfor this Xi value

    Y

    X

    Observed Valueof Y for Xi

    Predicted Valueof Y for Xi

    ii10i εXββY

    Xi

    Slope = β1

    Intercept = β0

    εi

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-14

    Linear Regression Equation

    i10i XbaŶ

    The simple linear regression equation provides anestimate of the population regression line

    Estimate of theregressionintercept

    Estimate of theregression slope

    Estimated (orpredicted) Yvalue forobservation i

    Value of X forobservation i

  • 8

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-15

    The Least Squares Method

    b0 and b1 are obtained by finding the values of b0and b1 that minimize the sum of the squared

    differences between Y and :

    2i10i

    2ii ))Xb(b(Ymin)Ŷ(Ymin

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-16

    Finding the Least SquaresEquation

    The coefficients a0 and b1 , and other regressionresults in this chapter, will be found using thefollowing formulas

    = Σ − . Σ= Σ − Σ . ΣΣ − (Σ )

    i10i XbaŶ

  • 9

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-17

    Interpretation of theIntercept and the Slope

    a0 is the estimated mean value of Y when thevalue of X is zero

    b1 is the estimated change in the mean value ofY for every one-unit change in X

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-18

    Linear Regression Example

    A real estate agent wishes to examine therelationship between the selling price of ahome and its size (measured in square feet)

    A random sample of 10 houses is selected Dependent variable (Y) = house price in

    $1000s

    Independent variable (X) = square feet

  • 10

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-19

    Linear Regression ExampleData

    House Price in $1000s(Y)

    Square Feet(X)

    245 1400

    312 1600

    279 1700

    308 1875

    199 1100

    219 1550

    405 2350

    324 2450

    319 1425

    255 1700

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-20

    Linear Regression ExampleData

    Square Feet(X)

    House Price in$1000s

    (Y)XY X2

    1400 245

    1600 312

    1700 279

    1875 308

    1100 199

    1550 219

    2350 405

    2450 324

    1425 319

    1700 255

  • 11

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-21

    Linear Regression ExampleScatterplot

    050

    100150200250300350400450

    0 500 1000 1500 2000 2500 3000

    Square Feet

    Hou

    se P

    rice

    ($10

    00s)

    House price model: scatter plot

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-22

    Linear Regression ExampleExcel Output

    Regression Statistics

    Multiple R 0.76211

    R Square 0.58082

    Adjusted R Square 0.52842

    Standard Error 41.33032

    Observations 10

    ANOVAdf SS MS F Significance F

    Regression 1 18934.9348 18934.9348 11.0848 0.01039

    Residual 8 13665.5652 1708.1957

    Total 9 32600.5000

    Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

    Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386

    Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580

    The regression equation is:feet)(square0.1097798.24833pricehouse

  • 12

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-23

    Linear Regression ExampleGraphical Representation

    050

    100150200250300350400450

    0 500 1000 1500 2000 2500 3000

    Square Feet

    Hou

    se P

    rice

    ($10

    00s)

    House price model: scatter plot and regression line

    feet)(square0.1097798.24833pricehouse

    Slope= 0.10977

    Intercept= 98.248

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-24

    Linear Regression ExampleInterpretation of b0

    b0 is the estimated mean value of Y when the valueof X is zero (if X = 0 is in the range of observed Xvalues)

    Because the square footage of the house cannot be 0,the Y intercept has no practical application.

    feet)(square0.1097798.24833pricehouse

  • 13

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-25

    Linear Regression ExampleInterpretation of b1

    b1 measures the mean change in the average value ofY as a result of a one-unit change in X

    Here, b1 = .10977 tells us that the mean value of ahouse increases by .10977($1000) = $109.77, onaverage, for each additional one square foot of size

    feet)(square0.1097798.24833pricehouse

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-26

    Linear Regression ExampleMaking Predictions

    317.85

    0)0.1098(20098.25

    (sq.ft.)0.109898.25pricehouse

    Predict the price for a house with 2000 square feet:

    The predicted price for a house with 2000 square feetis 317.85($1,000s) = $317,850

  • 14

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-27

    Linear Regression ExampleMaking Predictions

    050

    100150200250300350400450

    0 500 1000 1500 2000 2500 3000

    Square Feet

    Hou

    se P

    rice

    ($10

    00s)

    When using a regression model for prediction, onlypredict within the relevant range of data

    Relevant range forinterpolation

    Do not try toextrapolate beyond

    the range ofobserved X’s

    Class Exercises

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-28

  • 15

    Class Exercises

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-29

    a. Construct Scatter Plotb. Interpret the slope, b1 in this

    problemc. Predict the mean audited

    newsstand sales for a magazinethat reports newsstand sales of400 000.

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-30

    Measures of VariationTotal variation is made up of two parts:

    SSESSRSST Total Sum ofSquares

    Regression Sum ofSquares

    Error Sum ofSquares

    2i )YY(SST 2ii )ŶY(SSE 2i )YŶ(SSRwhere:

    = Mean value of the dependent variable

    Yi = Observed values of the dependent variable

    i = Predicted value of Y for the given Xi valueŶ

    Y

  • 16

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-31

    Measures of Variation

    SST = total sum of squares Measures the variation of the Yi values around

    their mean Y

    SSR = regression sum of squares Explained variation attributable to the

    relationship between X and Y

    SSE = error sum of squares Variation attributable to factors other than the

    relationship between X and Y

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-32

    Measures of Variation

    Xi

    Y

    X

    Yi

    SST = (Yi - Y)2

    SSE = (Yi - Yi )2

    SSR = (Yi - Y)2

    __

    _Y

    Y

    Y_Y

  • 17

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-33

    Linear Regression ExampleDataSquare Feet

    (X)

    HousePrice

    in$1000s

    (Y)

    Yi

    SSR SSE

    SST

    (Yi - Y)2 (Yi - Y)2

    1400 245

    1600 312

    1700 279

    1875 308

    1100 199

    1550 219

    2350 405

    2450 324

    1425 319

    1700 255

    -

    Class Exercises

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-34

    Calculate the total variation in both of the exercises done above.

  • 18

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-35

    Coefficient of Determination, r2

    The coefficient of determination is the portion ofthe total variation in the dependent variable thatis explained by variation in the independentvariable

    The coefficient of determination is also called r-squared and is denoted as r2

    1r0 2

    squaresofsumtotalsquaresofsumregression

    SSTSSRr2

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-36

    Coefficient of Determination, r2

    r2 = 1

    Y

    X

    Y

    X

    r2 = -1

    r2 = 1

    Perfect linear relationship betweenX and Y:

    100% of the variation in Y isexplained by variation in X

  • 19

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-37

    Coefficient of Determination, r2

    Y

    X

    Y

    X

    0 < r2 < 1

    Weaker linear relationshipsbetween X and Y:

    Some but not all of the variationin Y is explained by variation inX

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-38

    Coefficient of Determination, r2

    r2 = 0

    No linear relationship between Xand Y:

    The value of Y is not related to X.(None of the variation in Y isexplained by variation in X)

    Y

    Xr2 = 0

  • 20

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-39

    Linear Regression ExampleCoefficient of Determination, r2

    Regression Statistics

    Multiple R 0.76211

    R Square 0.58082

    Adjusted R Square 0.52842

    Standard Error 41.33032

    Observations 10

    ANOVAdf SS MS F Significance F

    Regression 1 18934.9348 18934.9348 11.0848 0.01039

    Residual 8 13665.5652 1708.1957

    Total 9 32600.5000

    Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

    Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386

    Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580

    58.08% of the variation in houseprices is explained by variation in

    square feet

    0.5808232600.5000

    18934.9348

    SST

    SSRr2

    Class Exercises

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-40

    Calculate the coefficient of determination in both of the exercises done above.

  • 21

    Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-41

    Assumptions of RegressionL.I.N.E

    Linearity

    The relationship between X and Y is linear

    Independence of Errors

    Error values are statistically independent

    Normality of Error

    Error values are normally distributed for any givenvalue of X

    Equal Variance (also called homoscedasticity)

    The probability distribution of the errors has constantvariance