Multiple linear regression model

  • 8/12/2019 Multiple linear regression model

    ETC1000/ETX9000 Business and Economic Statistics

    Demonstration Lecture Week 5: The Multiple Linear Regression Model

    This lecture provides examples of the material taught in this week's lectures, to help

    you see its potential for real-world application, and to reinforce the ideas being communicated.

    Case Study: The Effect of TV on Children

    Background

    Children in high-income countries like Australia are said to spend the second largest

    chunk of their waking time watching television. That is, the most time-consuming

    activity after attending school is watching TV. Despite the widespread use of

    computers and the Internet, TV remains the dominant form of media in children's

    lives.

    Not surprisingly, the public and parents are concerned about potentially detrimental

    effects of TV on child cognitive development. Some argue, however, that TV can be

    beneficial to children, in that it provides exposure to language.

    Whichever way the direction runs, the effect of TV watching on cognitive

    development in early childhood is likely to have long-term lingering effects, which

    may be crucial to human capital formation and inevitably labour market outcomes

    later in life.

    So just how detrimental is television to a young child's cognitive development? We

    can answer this question using the linear regression model if we have suitable data.

    Data

    The National Longitudinal Survey of Youth offers rich information about the

    demographic, cognitive, socio-emotional and physiological characteristics of children

    and their parents. More specifically, mothers of school-aged children were

    interviewed about how many hours of television their child watched in a typical week.

    Both mother and child were also tested on their reading ability.

    We will use this data for 8-year-old children.

    A Simple Linear Regression Model for Child Reading Score

    We can use this data to estimate a simple linear regression model for Child's Reading

    Score as a function of hours of TV watched:

    Y = child's reading score (for age) out of 100

    X = hours of TV watched in an average week

    Here's the Excel output we obtain:

    What does this output tell us?

    First, note:

    $R^2$ is very poor

    A standard error of 22.9 points out of 100 is quite high

    Intercept: b0 = 39.963

    The average reading score amongst children who watch no TV during a typical week

    is estimated to be 39.963 out of 100.

    Hours TV: b1 = 0.260

    b1 tells us the estimated effect on y when x is one unit higher. That is, take 2 children,

    the first of whom watches 1 more hour of TV per week than the second. The model

    predicts the first child to have a reading score 0.26 (out of 100) higher than the second

    child who watches less TV. This is an interesting result, as it suggests that watching

    TV may help children in their reading. Note, however, that practically speaking this

    seems quite small.
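
    The two interpretations above can be checked with a few lines of arithmetic. The sketch below only plugs the reported coefficients (39.963 and 0.260) into the fitted line; the function name and the two example children (10 and 11 hours of TV per week) are hypothetical and chosen purely for illustration.

```python
# Coefficients reported in the lecture's simple regression output.
B0 = 39.963  # intercept: predicted score at 0 hours of TV
B1 = 0.260   # slope: change in predicted score per extra hour of TV

def predict_reading_score(hours_tv: float) -> float:
    """Predicted reading score (out of 100) from the fitted simple regression."""
    return B0 + B1 * hours_tv

# Hypothetical children, used only to illustrate the slope.
score_no_tv = predict_reading_score(0)
score_11_hours = predict_reading_score(11)
score_10_hours = predict_reading_score(10)

print("Predicted score with no TV:", score_no_tv)
print("Difference for one extra hour:", round(score_11_hours - score_10_hours, 3))
```

    The difference between the two predictions does not depend on which pair of hours is chosen, as long as they are one hour apart: that is what the slope b1 measures.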

    But is the effect statistically significant? Let's perform a hypothesis test.

    Hypothesis test

    1. Formulate Null and Alternative Hypotheses

    $H_0: \beta_1 = 0$ (Watching TV has no impact on Child's Reading Score)

    $H_1: \beta_1 \neq 0$ (Watching TV does have an impact on Child's Reading Score)

    2. Decide a Significance Level

    Test at the 5% level of significance, i.e. $\alpha = 0.05$

    3. Calculate the p-value

    p-value = $2.32 \times 10^{-9}$

    4. Make a Decision

    The decision rule is to reject $H_0: \beta_1 = 0$ if the p-value < $\alpha$.

    Since $2.32 \times 10^{-9} < 0.05$, we reject $H_0$ and conclude that watching TV does have an

    impact on Child's Reading Score.
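
    The decision step of the test can be written out as a short sketch. It takes the p-value reported above as given (it does not recompute it from the data), and the function name is a hypothetical helper for illustration only.

```python
ALPHA = 0.05       # significance level chosen in step 2
P_VALUE = 2.32e-9  # p-value reported in the regression output (step 3)

def reject_null(p_value: float, alpha: float = ALPHA) -> bool:
    """Decision rule: reject H0 when the p-value is below the significance level."""
    return p_value < alpha

if reject_null(P_VALUE):
    print("Reject H0: the coefficient on Hours TV is statistically significant.")
else:
    print("Do not reject H0: no significant effect of Hours TV detected.")
```

    With a p-value this small, the same decision would follow at the 1% level as well.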

    So, even though the coefficient on Hours TV is small, it is still statistically significant.

    How can it be that the effect of TV can be practically not important, but statistically

    very important?

    A Multiple Linear Regression Model for Child Reading Score

    Now suppose we add another explanatory variable into the model:

    Y = child's reading score (for age) out of 100

    X1 = hours of TV watched in an average week

    X2 = mother's reading score out of 100

    And we want to estimate the following model:

    $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + e_i$

    Here's the output:

    Some things to note from this output:

    Both explanatory variables are highly significant (p-values very small), meaning they each help to explain child's reading score

    $R^2$ is better

    A standard error of 21.4 points out of 100 is still quite high

    How about our interpretations?

    Intercept: b0 = 17.932

    The average reading score amongst children who watch no TV during a typical week,

    and whose mother scored 0 in the reading test is estimated to be 17.932 out of 100.

    Hours TV: b1 = -0.364

    b1 tells us the estimated effect on y when x is one unit higher, holding all other x

    variables constant. That is, take 2 children whose mothers have the same reading

    score, the first of whom watches 1 more hour of TV per week than the second. The

    model predicts the first child to have a reading score 0.364 less than the second child.

    Practically speaking this is still quite small, but of the opposite sign!

    Mother's Reading Score: b2 = 0.795

    b2 tells us the estimated effect on y when x is one unit higher, holding all other x

    variables constant. That is, take 2 children who watch the same amount of TV per

    week, but the mother of the first scored 1 point higher on the reading test than the

    mother of the second. The model predicts the first child to have a reading score 0.795

    higher than the second child, on average. The effect in this case is quite large: for

    every extra point the mother scores on the reading test, the child scores almost as

    much. Genetics appears to be a highly influential factor!
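
    The "holding all other x variables constant" interpretations can be illustrated by plugging values into the fitted equation. This is a sketch only: it uses the coefficients as stated in the headings above (17.932, -0.364 and 0.795), and the example children (10 or 11 hours of TV, mother's score of 50 or 51) are hypothetical.

```python
# Coefficients as stated in the lecture's multiple regression headings.
B0 = 17.932   # intercept
B1 = -0.364   # hours of TV per week
B2 = 0.795    # mother's reading score out of 100

def predict_reading_score(hours_tv: float, mother_score: float) -> float:
    """Predicted child reading score from the fitted multiple regression."""
    return B0 + B1 * hours_tv + B2 * mother_score

# Same mother's score, one extra hour of TV: isolates b1.
tv_effect = predict_reading_score(11, 50) - predict_reading_score(10, 50)

# Same hours of TV, mother scores one point higher: isolates b2.
mother_effect = predict_reading_score(10, 51) - predict_reading_score(10, 50)

print("Effect of one extra hour of TV:", round(tv_effect, 3))
print("Effect of one extra point for the mother:", round(mother_effect, 3))
```

    Changing one input while fixing the other is exactly what each coefficient describes; changing both at once would mix the two effects.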

    Note that the simple regression results suggested a positive effect of TV on reading

    scores, but the multiple regression suggests a negative result. Why does the data tell a

    different story when we include an additional explanatory variable?

    Multiple regression allows us to look at the effect of TV on reading scores, once

    genetics are taken into account. The coefficient on Hours TV changes direction if

    some of what Hours TV is capturing in the simple regression case is really due to

    mother's reading score.
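
    The sign change can be reproduced on simulated data. The sketch below does not use the survey data: every number in it (sample size, means, noise levels, and the assumed positive link between mother's score and TV hours) is made up so that the pattern is visible. It fits both regressions with closed-form least squares formulas and then checks the standard relationship between the two slopes: simple slope = b1 + b2 x (slope from regressing mother's score on TV hours).

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)

def simple_slope(x, y):
    """Least squares slope from regressing y on x alone."""
    return cov(x, y) / cov(x, x)

def multiple_slopes(x1, x2, y):
    """Least squares slopes (b1, b2) from regressing y on x1 and x2 together."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Synthetic data with made-up parameters (NOT the survey data).
rng = random.Random(0)
n = 5000
mother = [rng.gauss(50, 15) for _ in range(n)]
# Assumption for illustration: TV hours are positively related to mother's score.
tv = [0.5 * m + rng.gauss(0, 5) for m in mother]
# True TV effect is negative; mother's score has a strong positive effect.
child = [18 - 0.36 * t + 0.8 * m + rng.gauss(0, 20) for t, m in zip(tv, mother)]

b1_simple = simple_slope(tv, child)
b1_multi, b2_multi = multiple_slopes(tv, mother, child)
delta = simple_slope(tv, mother)  # slope from regressing mother's score on TV hours

print("Simple regression slope on TV:  ", round(b1_simple, 3))
print("Multiple regression slope on TV:", round(b1_multi, 3))
print("b1 + b2 * delta:                ", round(b1_multi + b2_multi * delta, 3))
```

    In this simulation the simple regression slope on TV is positive even though the true effect is negative, because TV hours partly stand in for the omitted mother's score. Once mother's score is included, the TV coefficient recovers its negative sign.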