Simple Regression Excellent 1883

download Simple Regression Excellent 1883

of 68

Transcript of Simple Regression Excellent 1883

  • 8/7/2019 Simple Regression Excellent 1883

    1/68

    Introduction to Linear Regression

    and Correlation Analysis

  • 8/7/2019 Simple Regression Excellent 1883

    2/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-2

    Scatter Plots and Correlation

    A scatter plot (or scatter diagram) is used to show therelationship between two variables

    Correlation analysis is used to measure strength of the association (linear relationship) between twovariables

    Only concerned with strength of the relationship

    No causal effect is implied

  • 8/7/2019 Simple Regression Excellent 1883

    3/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-3

    Scatter Plot Examples

    y

    x

    y

    x

    y

    y

    x

    x

    Linear relationships Curvilinear relationships

  • 8/7/2019 Simple Regression Excellent 1883

    4/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-4

    Scatter Plot Examples

    y

    x

    y

    x

    y

    y

    x

    x

    Strong relationships Weak relationships

    (continued)

  • 8/7/2019 Simple Regression Excellent 1883

    5/68

  • 8/7/2019 Simple Regression Excellent 1883

    6/68

  • 8/7/2019 Simple Regression Excellent 1883

    7/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-7

    Features of r

    Unit freeRange between -1 and 1The closer to -1, the stronger the negativelinear relationshipThe closer to 1, the stronger the positivelinear relationship

    The closer to 0, the weaker the linear relationship

  • 8/7/2019 Simple Regression Excellent 1883

    8/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-8r = +.3 r = +1

    Examples of Approximater Values

    y

    x

    y

    x

    y

    x

    y

    x

    y

    x

    r = -1 r = -.6 r = 0

  • 8/7/2019 Simple Regression Excellent 1883

    9/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-9

    Calculating theCorrelation Coefficient

    =

    ])yy(][)xx([

    )yy)(xx(r

    22

    where:r = Sample correlation coefficientn = Sample sizex = Value of the independent variable

    y = Value of the dependent variable

    =

    ])y()y(n][)x()x(n[

    yxxynr

    2222

    Sample correlation coefficient:

    or the algebraic equivalent:

  • 8/7/2019 Simple Regression Excellent 1883

    10/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-10

    Calculation ExampleTree

    HeightTrunk

    Diameter

    y x xy y 2 x2

    35 8 280 1225 64

    49 9 441 2401 8127 7 189 729 4933 6 198 1089 3660 13 780 3600 169

    21 7 147 441 4945 11 495 2025 12151 12 612 2601 144

    =321 =73 =3142 =14111 =713

  • 8/7/2019 Simple Regression Excellent 1883

    11/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-11

    0

    10

    20

    30

    40

    50

    60

    70

    0 2 4 6 8 10 12 14

    0.886

    ](321)][8(14111)(73)[8(713)(73)(321)8(3142)

    ]y)()y][n(x)()x[n(

    yxxynr

    22

    2222

    =

    =

    =

    Trunk Diameter, x

    TreeHeight,y

    Calculation Example(continued)

    r = 0.886 relatively strong positivelinear association between x and y

  • 8/7/2019 Simple Regression Excellent 1883

    12/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-12

    Excel Output

    Tree Height Trunk Diameter Tree Height 1Trunk Diameter 0.886231 1

    Excel Correlation Output

    Tools / data analysis / correlation

    Correlation betweenTree Height and Trunk Diameter

  • 8/7/2019 Simple Regression Excellent 1883

    13/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-13

    Significance Test for Correlation

    HypothesesH0: = 0 (no correlation)

    HA: 0 (correlation exists)

    Test statistic

    (with n 2 degrees of freedom)

    2nr 1

    r t 2

    =

    The Greek letter (rho) representsthe population correlation coefficient

  • 8/7/2019 Simple Regression Excellent 1883

    14/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-14

    Example: Produce Stores

    Is there evidence of a linear relationshipbetween tree height and trunk diameter atthe .05 level of significance?

    H 0: = 0 (No correlation)H 1: 0 (correlation exists)

    =.05 , df = 8 - 2 = 6

    4.68

    28.8861

    .886

    2nr 1

    r t22

    =

    =

    =

  • 8/7/2019 Simple Regression Excellent 1883

    15/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-15

    4.68

    28

    .8861

    .886

    2n

    r 1

    r t

    22=

    =

    =

    Example: Test Solution

    Conclusion:There is sufficientevidence of alinear relationship

    at the 5% level of significance

    Decision:Reject H 0

    Reject H 0Reject H 0

    /2=.025

    -t/2Do not reject H 0

    0 t/2

    /2=.025

    -2.4469 2.4469 4.68

    d.f. = 8-2 = 6

  • 8/7/2019 Simple Regression Excellent 1883

    16/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-16

    Introduction toRegression Analysis

    Regression analysis is used to:Predict the value of a dependent variable based onthe value of at least one independent variable

    Explain the impact of changes in an independentvariable on the dependent variable

    Dependent variable: the variable we wish toexplain

    Independent variable: the variable used toexplain the dependent variable

  • 8/7/2019 Simple Regression Excellent 1883

    17/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-17

    Simple Linear Regression Model

    Only one independent variable , x

    Relationship between x and y isdescribed by a linear function

    Changes in y are assumed to becaused by changes in x

  • 8/7/2019 Simple Regression Excellent 1883

    18/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-18

    Types of Regression Models

    Positive Linear Relationship

    Negative Linear Relationship

    Relationship NOT Linear

    No Relationship

  • 8/7/2019 Simple Regression Excellent 1883

    19/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-19

    xy 10 ++=Linear component

    Population Linear Regression

    The population regression model:

    Populationy intercept

    PopulationSlope

    Coefficient

    RandomError

    term, or residualDependentVariable

    IndependentVariable

    Random Error component

  • 8/7/2019 Simple Regression Excellent 1883

    20/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-20

    Linear Regression Assumptions

    Error values () are statistically independentError values are normally distributed for anygiven value of xThe probability distribution of the errors isnormalThe distributions of possible values have

    equal variances for all values of xThe underlying relationship between the xvariable and the y variable is linear

  • 8/7/2019 Simple Regression Excellent 1883

    21/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-21

    Population Linear Regression(continued)

    Random Error for this x value

    y

    x

    Observed Valueof y for x i

    Predicted Valueof y for x i

    xy 10 ++=

    xi

    Slope = 1

    Intercept = 0

    i

  • 8/7/2019 Simple Regression Excellent 1883

    22/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-22

    xbby 10i +=

    The sample regression line provides an estimate of the population regression line

    Estimated Regression Model

    Estimate of the regression

    intercept

    Estimate of theregression slope

    Estimated(or predicted)y value

    Independentvariable

    The individual random error terms e i have a mean of zero

  • 8/7/2019 Simple Regression Excellent 1883

    23/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-23

    Least Squares Criterion

    b0 and b 1 are obtained by finding the valuesof b 0 and b 1 that minimize the sum of thesquared residuals

    210

    22

    x))b(b(y

    )y(ye

    +=

    =

  • 8/7/2019 Simple Regression Excellent 1883

    24/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-24

    The Least Squares Equation

    The formulas for b 1 and b 0 are:

    algebraic equivalent for b 1:

    =

    n

    )x(x

    nyxxy

    b 22

    1

    =

    21 )x(x

    )y)(yx(x

    b

    xbyb 10 =

    and

  • 8/7/2019 Simple Regression Excellent 1883

    25/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-25

    b0 is the estimated average value of ywhen the value of x is zero

    b1 is the estimated change in theaverage value of y as a result of aone-unit change in x

    Interpretation of theSlope and the Intercept

  • 8/7/2019 Simple Regression Excellent 1883

    26/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-26

    Simple Linear RegressionExample

    A real estate agent wishes to examine therelationship between the selling price of a homeand its size (measured in square feet)

    A random sample of 10 houses is selectedDependent variable (y) = house price in $1000sIndependent variable (x) = square feet

  • 8/7/2019 Simple Regression Excellent 1883

    27/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-27

    Sample Data for House Price Model

    House Price in $1000s(y)

    Square Feet(x)

    245 1400312 1600

    279 1700308 1875

    199 1100219 1550405 2350324 2450

    319 1425255 1700

  • 8/7/2019 Simple Regression Excellent 1883

    28/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-28

    Regression Using Excel

    Data / Data Analysis / Regression

  • 8/7/2019 Simple Regression Excellent 1883

    29/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-29

    Excel OutputRegression Statistics

    Multiple R 0.76211

    R Square 0.58082

    Adjusted R Square 0.52842

    Standard Error 41.33032

    Observations 10

    ANOVA

    df SS MS F Significance F

    Regression 1 18934.9348 18934.9348 11.0848 0.01039

    Residual 8 13665.5652 1708.1957

    Total 9 32600.5000

    Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

    Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386

    Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580

    The regression equation is:

    feet)(square0.1097798.24833pricehouse +=

  • 8/7/2019 Simple Regression Excellent 1883

    30/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-30

    050

    100

    150200250300350400450

    0 500 1000 1500 2000 2500 3000

    Square Fee

    H o u s e

    P r i c e

    ( $ 1 0 0 0 s )

    Graphical Presentation

    House price model: scatter plot andregression line

    feet)(square0.1097798.24833pricehouse +=

    Slope= 0.10977

    Intercept= 98.248

    I i f h

  • 8/7/2019 Simple Regression Excellent 1883

    31/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-31

    Interpretation of theIntercept, b 0

    b0 is the estimated average value of Y when thevalue of X is zero (if x = 0 is in the range of observed x values)

    Here, no houses had 0 square feet, so b 0 = 98.24833

    just indicates that, for houses within the range of sizesobserved, $98,248.33 is the portion of the house pricenot explained by square feet

    feet)(square0.1097798.24833pricehouse +=

    I i f h

  • 8/7/2019 Simple Regression Excellent 1883

    32/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-32

    Interpretation of theSlope Coefficient, b 1

    b1 measures the estimated change in theaverage value of Y as a result of a one-unitchange in X

    Here, b 1 = .10977 tells us that the average value of ahouse increases by .10977($1000) = $109.77, onaverage, for each additional one square foot of size

    feet)(square0.1097798.24833pricehouse +=

  • 8/7/2019 Simple Regression Excellent 1883

    33/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-33

    Least Squares RegressionProperties

    The sum of the residuals from the least squaresregression line is 0 ( )

    The sum of the squared residuals is a minimum

    (minimized )The simple regression line always passes through themean of the y variable and the mean of the x variable

    The least squares coefficients are unbiased estimatesof 0 and 1

    0)y(y =

    2)y(y

  • 8/7/2019 Simple Regression Excellent 1883

    34/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-34

    Explained and UnexplainedVariation

    Total variation is made up of two parts:

    SSR SSE SST +=Total sum of

    SquaresSum of Squares

    RegressionSum of Squares

    Error

    = 2)yy(SST

    = 2)yy(SSE

    = 2)yy(SSR

    where: = Average value of the dependent variabley = Observed values of the dependent variable

    = Estimated value of y for the given x value y

    y

  • 8/7/2019 Simple Regression Excellent 1883

    35/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-35

    SST = total sum of squares

    Measures the variation of the y i values around their mean y

    SSE = error sum of squares

    Variation attributable to factors other than therelationship between x and y

    SSR = regression sum of squaresExplained variation attributable to the relationshipbetween x and y

    (continued)

    Explained and UnexplainedVariation

    l d d l d

  • 8/7/2019 Simple Regression Excellent 1883

    36/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-36

    (continued)

    Xi

    y

    x

    y i

    SST = (y i - y)2

    SSE = (y i - y i )2

    SSR = (y i - y)2

    _ _

    _ y

    y

    y

    _ y

    Explained and UnexplainedVariation

  • 8/7/2019 Simple Regression Excellent 1883

    37/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-37

    The coefficient of determination is the portionof the total variation in the dependent variablethat is explained by variation in the

    independent variableThe coefficient of determination is also calledR-squared and is denoted as R2

    Coefficient of Determination, R 2

    SSTSSR

    R =2 1R0 2 where

  • 8/7/2019 Simple Regression Excellent 1883

    38/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-38

    Coefficient of determination

    Coefficient of Determination, R 2

    squaresof sumtotalregressionbyexplainedsquaresof sum

    SSTSSR

    R ==2

    (continued)

    Note: In the single independent variable case, the coefficientof determination is

    where:R2 = Coefficient of determinationr = Simple correlation coefficient

    22 r R =

    E l f A i

  • 8/7/2019 Simple Regression Excellent 1883

    39/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-39

    R2 = +1

    Examples of ApproximateR2 Values

    y

    x

    y

    x

    R2 = 1

    R2 = 1

    Perfect linear relationshipbetween x and y:

    100% of the variation in y is

    explained by variation in x

    E l f A i

  • 8/7/2019 Simple Regression Excellent 1883

    40/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-40

    Examples of ApproximateR2 Values

    y

    x

    y

    x

    0 < R 2 < 1

    Weaker linear relationshipbetween x and y:

    Some but not all of the

    variation in y is explainedby variation in x

    (continued)

    E l f A i

  • 8/7/2019 Simple Regression Excellent 1883

    41/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-41

    Examples of ApproximateR2 Values

    R2 = 0

    No linear relationshipbetween x and y:

    The value of Y does not

    depend on x. (None of thevariation in y is explainedby variation in x)

    y

    xR2 = 0

    (continued)

  • 8/7/2019 Simple Regression Excellent 1883

    42/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-42

    Excel OutputRegression Statistics

    Multiple R 0.76211

    R Square 0.58082

    Adjusted R Square 0.52842

    Standard Error 41.33032

    Observations 10

    ANOVA

    df SS MS F Significance F

    Regression 1 18934.9348 18934.9348 11.0848 0.01039

    Residual 8 13665.5652 1708.1957

    Total 9 32600.5000

    Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

    Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386

    Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580

    58.08%58.08% of the variation inhouse prices is explained by

    variation in square feet

    0.5808232600.500018934.9348

    SSTSSRR 2 ===

    T f Si ifi f

  • 8/7/2019 Simple Regression Excellent 1883

    43/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-43

    Test for Significance of Coefficient of Determination

    Hypotheses

    H0: 2 = 0

    HA: 2 0

    Test statistic

    (with D1 = 1 and D 2 = n - 2degrees of freedom)

    2)SSE/(nSSR/1F

    =

    H0: The independent variable does not explain a significantportion of the variation in the dependent variable

    HA: The independent variable does explain a significantportion of the variation in the dependent variable

  • 8/7/2019 Simple Regression Excellent 1883

    44/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-44

    Excel OutputRegression Statistics

    Multiple R 0.76211

    R Square 0.58082

    Adjusted R Square 0.52842

    Standard Error 41.33032

    Observations 10

    ANOVA

    df SS MS F Significance F

    Regression 1 18934.9348 18934.9348 11.0848 0.01039

    Residual 8 13665.5652 1708.1957

    Total 9 32600.5000

    Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

    Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386

    Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580

    The critical F value from Appendix H for The critical F value from Appendix H for = .05 and D= .05 and D 11 = 1 and D= 1 and D 22 = 8 d.f. is 5.318.= 8 d.f. is 5.318.

    Since 11.085 > 5.318 we reject HSince 11.085 > 5.318 we reject H 00:: 2

    = 0

    11.0852)-1013665.57/(

    18934.93/12)-SSE/(n

    SSR/1F ===

  • 8/7/2019 Simple Regression Excellent 1883

    45/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-45

    Standard Error of Estimate

    The standard deviation of the variation of observations around the simple regression lineis estimated by

    2nSSE

    s =

    WhereSSE = Sum of squares error

    n = Sample size

    Th St d d D i ti f th

  • 8/7/2019 Simple Regression Excellent 1883

    46/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-46

    The Standard Deviation of theRegression Slope

    The standard error of the regression slopecoefficient (b 1) is estimated by

    ==

    nx)(

    x

    s

    )x(x

    ss 2

    2

    2

    b1

    where:= Estimate of the standard error of the least squares slope

    = Sample standard error of the estimate

    1bs

    2nSSE

    s =

  • 8/7/2019 Simple Regression Excellent 1883

    47/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-47

    Excel OutputRegression Statistics

    Multiple R 0.76211

    R Square 0.58082

    Adjusted R Square 0.52842

    Standard Error 41.33032

    Observations 10

    ANOVA

    df SS MS F Significance F

    Regression 1 18934.9348 18934.9348 11.0848 0.01039

    Residual 8 13665.5652 1708.1957

    Total 9 32600.5000

    Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

    Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386

    Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580

    41.33032s =

    0.03297s1b

    =

  • 8/7/2019 Simple Regression Excellent 1883

    48/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-48

    Comparing Standard Errors

    y

    y y

    x

    x

    x

    y

    x

    1bssmall

    1bslarge

    ssmall

    slarge

    Variation of observed y valuesfrom the regression line Variation in the slope of regressionlines from different possible samples

  • 8/7/2019 Simple Regression Excellent 1883

    49/68

    I f b t th Sl

  • 8/7/2019 Simple Regression Excellent 1883

    50/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-50

    House Price in$1000s

    (y)

    Square Feet(x)

    245 1400

    312 1600

    279 1700

    308 1875

    199 1100

    219 1550

    405 2350

    324 2450

    319 1425

    255 1700

    (sq.ft.)0.109898.25pricehouse +=

    Estimated Regression Equation:

    The slope of this model is 0.1098

    Does square footage of the houseaffect its sales price?

    Inference about the Slope:t Test

    (continued)

  • 8/7/2019 Simple Regression Excellent 1883

    51/68

    R g i A l i f

  • 8/7/2019 Simple Regression Excellent 1883

    52/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-52

    Regression Analysis for Description

    Confidence Interval Estimate of the Slope:

    Excel Printout for House Prices:

    At 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858)

    1b/21stb

    Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

    Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386

    Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580

    d.f. = n - 2

    Regression Anal sis for

  • 8/7/2019 Simple Regression Excellent 1883

    53/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-53

    Regression Analysis for Description

    Since the units of the house price variable is$1000s, we are 95% confident that the averageimpact on sales price is between $33.70 and$185.80 per square foot of house size

    Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

    Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386

    Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580

    This 95% confidence interval does not include 0 .

    Conclusion: There is a significant relationship betweenhouse price and square feet at the .05 level of significance

    Confidence Interval for

  • 8/7/2019 Simple Regression Excellent 1883

    54/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-54

    Confidence Interval for the Average y, Given x

    Confidence interval estimate for themean of y given a particular x p

    Size of interval varies accordingto distance away from mean, x

    + 2

    2p

    /2

    )x(x

    )x(x

    n

    1sty

    Confidence Interval for

  • 8/7/2019 Simple Regression Excellent 1883

    55/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-55

    Confidence Interval for an Individual y, Given x

    Confidence interval estimate for anIndividual value of y given a particular x p

    ++ 2

    2p

    /2 )x(x

    )x(x

    n1

    1sty

    This extra term adds to the interval width to reflectthe added uncertainty for an individual case

    Interval Estimates

  • 8/7/2019 Simple Regression Excellent 1883

    56/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-56

    Interval Estimatesfor Different Values of x

    y

    x

    Prediction Intervalfor an individual y,given x p

    x p

    y = b 0 + b 1 x

    x

    ConfidenceInterval for the mean of y, given x p

  • 8/7/2019 Simple Regression Excellent 1883

    57/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-57

    House Price in$1000s

    (y)

    Square Feet(x)

    245 1400

    312 1600

    279 1700

    308 1875

    199 1100

    219 1550

    405 2350

    324 2450

    319 1425

    255 1700

    (sq.ft.)0.109898.25pricehouse +=

    Estimated Regression Equation:

    Example: House Prices

    Predict the price for a housewith 2000 square feet

  • 8/7/2019 Simple Regression Excellent 1883

    58/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-58

    317.85

    0)0.1098(20098.25(sq.ft.)0.109898.25pricehouse

    =

    +=+=

    Example: House Prices

    Predict the price for a housewith 2000 square feet:

    The predicted price for a house with 2000square feet is 317.85($1,000s) = $317,850

    (continued)

    Estimation of Mean Values:

  • 8/7/2019 Simple Regression Excellent 1883

    59/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-59

    Estimation of Mean Values:Example

    Find the 95% confidence interval for the averageprice of 2,000 square-foot houses

    Predicted Price Y i = 317.85 ($1,000s)

    Confidence Interval Estimate for E(y)|x p

    37.12317.85)x(x

    )x(x

    n

    1sty

    2

    2p

    /2 =

    +

    The confidence interval endpoints are 280.66 -- 354.90,or from $280,660 -- $354,900

    Estimation of Individual Values:

  • 8/7/2019 Simple Regression Excellent 1883

    60/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-60

    Estimation of Individual Values:Example

    Find the 95% confidence interval for an individualhouse with 2,000 square feet

    Predicted Price Y i = 317.85 ($1,000s)

    Prediction Interval Estimate for y|x p

    102.28317.85)x(x

    )x(x

    n

    11sty

    2

    2p

    /2 =

    ++

    The prediction interval endpoints are 215.50 -- 420.07,or from $215,500 -- $420,070

    Finding Confidence and Prediction

  • 8/7/2019 Simple Regression Excellent 1883

    61/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-61

    Finding Confidence and PredictionIntervals PHStat

    In Excel, use

    PHStat | regression | simple linear regression

    Check theconfidence and prediction interval for X= box and enter the x-value and confidence level desired

    Finding Confidence and Prediction

  • 8/7/2019 Simple Regression Excellent 1883

    62/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-62

    Input values

    Finding Confidence and PredictionIntervals PHStat

    (continued)

    Confidence Interval Estimate for E(y)|x p

    Prediction Interval Estimate for y|x p

  • 8/7/2019 Simple Regression Excellent 1883

    63/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-63

    Residual Analysis

    PurposesExamine for linearity assumptionExamine for constant variance for all levels

    of xEvaluate normal distribution assumption

    Graphical Analysis of Residuals

    Can plot residuals vs. xCan create histogram of residuals to checkfor normality

  • 8/7/2019 Simple Regression Excellent 1883

    64/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-64

    Residual Analysis for Linearity

    Not Linear Linear

    x r e s

    i d u a

    l s

    x

    y

    x

    y

    x

    r e s

    i d u a

    l s

    Residual Analysis for

  • 8/7/2019 Simple Regression Excellent 1883

    65/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-65

    Residual Analysis for Constant Variance

    Non-constant variance Constant variance

    x x

    y

    x x

    y

    r e s

    i d u a

    l s

    r e s

    i d u a

    l s

  • 8/7/2019 Simple Regression Excellent 1883

    66/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-66

    House Price Model Residual Pl

    -60

    -40

    -20

    0

    20

    40

    60

    80

    0 1000 2000 300

    Square Fee

    R e s

    i d u a

    l s

    Excel Output

    RESIDUAL OUTPUT

    Predicted House Price

    Residuals

    1 251.92316 -6.923162

    2 273.87671 38.123293 284.85348 -5.853484

    4 304.06284 3.937162

    5 218.99284 -19.99284

    6 268.38832 -49.38832

    7 356.20251 48.797498 367.17929 -43.17929

    9 254.6674 64.33264

    10 284.85348 -29.85348

  • 8/7/2019 Simple Regression Excellent 1883

    67/68

    Business Statistics: A Decision-Making Approach, 7e 2008 Prentice-Hall, Inc. Chap 14-67

    Chapter Summary

    Introduced correlation analysisDiscussed correlation to measure the strengthof a linear association

    Introduced simple linear regression analysisCalculated the coefficients for the simple linear regression equationDescribed measures of variation (R 2 and s )Addressed assumptions of regression andcorrelation

  • 8/7/2019 Simple Regression Excellent 1883

    68/68

    Chapter Summary

    Described inference about the slopeAddressed estimation of mean values andprediction of individual valuesDiscussed residual analysis

    (continued)