Econometrics Chap 2


  • 8/12/2019 Econometrics Chap 2


    Chapter 2: TWO-VARIABLE REGRESSION ANALYSIS: SOME BASIC IDEAS

    By Damodar N. Gujarati


    Regression analysis is concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables.

    Regression analysis with a single explanatory variable is known as bivariate, or two-variable, regression analysis.

    Dependent variable (the regressand)

    Explanatory variable (the regressor)


    The aim is to estimate and/or predict the (population) mean value of the dependent variable on the basis of the known or fixed values of the explanatory variable(s).


    A total population of 60 families, with their weekly income (X) and weekly consumption expenditure (Y). The 60 families are divided into 10 income groups.

    Consumption expenditure increases as income increases, but not by as much as the increase in income.

    The marginal propensity to consume (MPC) for a unit change in income is greater than zero but less than unity:

    $Y = \beta_1 + \beta_2 X$, with $0 < \beta_2 < 1$

    $\beta_1$ and $\beta_2$ are parameters: $\beta_1$ is the intercept and $\beta_2$ is the slope.


    The dark circled points show the conditional mean values of Y against the various X values.

    By joining them we obtain what is known as the population regression line (PRL), or more generally, the population regression curve (the regression of Y on X).


    At a weekly income level of $80, the mean consumption expenditure is $65 (the total is 325), while at the income level of $200 the mean expenditure is $137 (the total is 685).

    In all there are 10 mean values for the 10 subpopulations of Y. These are called conditional expected values, as they depend on the values of the (conditioning) variable X.

    Symbolically,

    $E(Y \mid X)$

    which reads as the expected value of Y given the value of X.
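As a minimal sketch of how a conditional mean is computed, the snippet below uses the five consumption values that this chapter lists for the X = $80 subpopulation. The dictionary layout and the function name `conditional_mean` are illustrative assumptions, and only this one income group is filled in.

```python
from statistics import mean

# Weekly consumption expenditures (Y) of the families with weekly income X = 80,
# as listed in this chapter. Other income groups would be added the same way.
population = {
    80: [55, 60, 65, 70, 75],
}


def conditional_mean(population, x):
    """Return E(Y | X = x): the mean of Y over the subpopulation with income x."""
    return mean(population[x])


total_80 = sum(population[80])              # total expenditure of the group
mean_80 = conditional_mean(population, 80)  # conditional mean of the group
print(total_80, mean_80)
```

The total and the mean agree with the 325 and $65 quoted above for the $80 income group.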


    This figure shows that for each X (i.e., income level) there is a population of Y values (weekly consumption expenditures) that are spread around the (conditional) mean of those Y values.

    The Y values are distributed symmetrically around their respective (conditional) mean values, and the regression line (or curve) passes through these (conditional) mean values.


    Each conditional mean $E(Y \mid X_i)$ is a function of $X_i$. Symbolically,

    $E(Y \mid X_i) = f(X_i)$

    which is known as the conditional expectation function (CEF) or population regression function (PRF), or population regression (PR) for short.

    The functional form of the PRF is an empirical question. As a working hypothesis, we may assume that the PRF $E(Y \mid X_i)$ is a linear function of $X_i$:

    $E(Y \mid X_i) = \beta_1 + \beta_2 X_i$ (2.2.2)


    Linearity in the Variables

    Here linearity means that the conditional expectation of Y is a linear function of $X_i$; the regression curve in this case is a straight line. Under this interpretation, the following is not a linear function, because X appears with a power of 2:

    $E(Y \mid X_i) = \beta_1 + \beta_2 X_i^2$

    Linearity in the Parameters

    Here linearity means that the conditional expectation of Y, $E(Y \mid X_i)$, is a linear function of the parameters, the $\beta$s; it may or may not be linear in the variable X. Under this interpretation,

    $E(Y \mid X_i) = \beta_1 + \beta_2 X_i^2$

    is a linear (in the parameter) regression model.
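One practical consequence of linearity in the parameters can be sketched as follows: a model that is nonlinear in X can still be fitted as a straight line once $X^2$ is treated as the regressor. The sketch assumes a least-squares fitting rule, which this chapter has not introduced, and the data are hypothetical values generated exactly from $\beta_1 = 2$, $\beta_2 = 0.5$ with no disturbance.

```python
def fit_line(z, y):
    """Least-squares intercept and slope for y = b1 + b2 * z (assumed fitting rule)."""
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    s_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    b2 = s_zy / s_zz
    b1 = y_bar - b2 * z_bar
    return b1, b2


# Hypothetical data generated exactly from E(Y | X) = 2 + 0.5 * X**2.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 + 0.5 * xi ** 2 for xi in x]

# Nonlinear in X, linear in the parameters: use Z = X**2 as the regressor.
z = [xi ** 2 for xi in x]
b1_hat, b2_hat = fit_line(z, y)
print(b1_hat, b2_hat)
```

Because the data contain no disturbance, the fitted values recover the generating parameters up to floating-point rounding.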


    Now consider the model:

    $E(Y \mid X_i) = \beta_1 + \beta_2^2 X_i$

    This model is an example of a nonlinear (in the parameter) regression model, because $\beta_2$ enters with a power of 2.

    From here on, linear regression will always mean a regression that is linear in the parameters, the $\beta$s (that is, the parameters are raised to the first power only).
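The distinction can be checked numerically with a rough sketch. A function that is linear in the parameters satisfies additivity: evaluating it at the sum of two parameter vectors gives the sum of the two evaluations. The function names and parameter values below are illustrative assumptions, and a check at a single point can disprove linearity but cannot prove it.

```python
def model_a(beta, x):
    """E(Y | X) = beta1 + beta2 * X**2: nonlinear in X, linear in the betas."""
    b1, b2 = beta
    return b1 + b2 * x ** 2


def model_b(beta, x):
    """E(Y | X) = beta1 + beta2**2 * X: linear in X, nonlinear in beta2."""
    b1, b2 = beta
    return b1 + b2 ** 2 * x


def is_additive_in_parameters(model, beta_p, beta_q, x):
    """Check model(p + q, x) == model(p, x) + model(q, x) at one point x."""
    beta_sum = tuple(p + q for p, q in zip(beta_p, beta_q))
    gap = model(beta_sum, x) - (model(beta_p, x) + model(beta_q, x))
    return abs(gap) < 1e-12


# Arbitrary parameter vectors and evaluation point, chosen for illustration.
beta_p, beta_q, x = (1.0, 2.0), (3.0, 4.0), 3.0
a_additive = is_additive_in_parameters(model_a, beta_p, beta_q, x)
b_additive = is_additive_in_parameters(model_b, beta_p, beta_q, x)
print(a_additive, b_additive)
```

The first model passes the check and the second fails it, which matches the classification given in the text.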


    The deviation of an individual $Y_i$ around its expected value can be expressed as follows:

    $u_i = Y_i - E(Y \mid X_i)$

    or

    $Y_i = E(Y \mid X_i) + u_i$ (2.4.1)

    Technically, $u_i$ is known as the stochastic disturbance or stochastic error term.


    How do we interpret $Y_i = E(Y \mid X_i) + u_i$?

    The expenditure of an individual family, given its income level, can be expressed as the sum of two components:

    1. $E(Y \mid X_i)$, the mean consumption of all families with the same level of income. This component is known as the systematic, or deterministic, component.

    2. $u_i$, which is the random, or nonsystematic, component.


    The stochastic disturbance term is a proxy for all the omitted or neglected variables that may affect Y but are not included in the regression model.

    If $E(Y \mid X_i)$ is assumed to be linear in $X_i$, the equation may be written as:

    $Y_i = E(Y \mid X_i) + u_i = \beta_1 + \beta_2 X_i + u_i$ (2.4.2)

    This equation posits that the consumption expenditure of a family is linearly related to its income plus the disturbance term. Thus, the individual consumption expenditures, given X = $80, can be expressed as:

    $Y_1 = 55 = \beta_1 + \beta_2(80) + u_1$

    $Y_2 = 60 = \beta_1 + \beta_2(80) + u_2$

    $Y_3 = 65 = \beta_1 + \beta_2(80) + u_3$

    $Y_4 = 70 = \beta_1 + \beta_2(80) + u_4$

    $Y_5 = 75 = \beta_1 + \beta_2(80) + u_5$
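The five equations above can be worked through numerically. Since the regression line passes through the conditional mean, the systematic part $\beta_1 + \beta_2(80)$ equals the mean of the five Y values, and each disturbance is the deviation from that mean. The variable names in this sketch are illustrative.

```python
# Consumption expenditures of the five families with weekly income X = 80.
y_given_80 = [55, 60, 65, 70, 75]

# Systematic component: E(Y | X = 80), i.e. beta1 + beta2 * 80 under a linear PRF.
cond_mean = sum(y_given_80) / len(y_given_80)

# Disturbances: u_i = Y_i - E(Y | X = 80) for each family.
u = [yi - cond_mean for yi in y_given_80]

# Conditional mean of the disturbances at this income level.
mean_u = sum(u) / len(u)
print(cond_mean, u, mean_u)
```

The disturbances come out as -10, -5, 0, 5 and 10, and their mean at this income level is zero.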


    Now if we take the expected value of (2.4.1) on both sides, conditional on $X_i$, we obtain

    $E(Y_i \mid X_i) = E[E(Y \mid X_i)] + E(u_i \mid X_i) = E(Y \mid X_i) + E(u_i \mid X_i)$

    where use is made of the fact that the expected value of a constant is that constant itself: once the value of $X_i$ is fixed, $E(Y \mid X_i)$ is a constant.

    Since $E(Y_i \mid X_i)$ is the same thing as $E(Y \mid X_i)$, this implies that

    $E(u_i \mid X_i) = 0$

    Thus, the assumption that the regression line passes through the conditional means of Y implies that the conditional mean values of $u_i$ (conditional upon the given Xs) are zero.

    It is clear that

    $E(Y \mid X_i) = \beta_1 + \beta_2 X_i$

    and

    $Y_i = \beta_1 + \beta_2 X_i + u_i$

    are equivalent forms if $E(u_i \mid X_i) = 0$.


    But the stochastic specification (2.4.2) has the advantage that it clearly shows that there are other variables besides income that affect consumption expenditure and that an individual family's consumption expenditure cannot be fully explained only by the variable(s) included in the regression model.


    Why don't we introduce these other variables into the model explicitly? The reasons are many:

    1. Vagueness of theory: The theory, if any, determining the behavior of Y may be, and often is, incomplete. We might be ignorant or unsure about the other variables affecting Y.

    2. Unavailability of data: We may lack quantitative information about these variables; e.g., information on family wealth generally is not available.

    3. Core variables versus peripheral variables: Assume that besides income X1, the number of children per family X2, sex X3, religion X4, education X5, and geographical region X6 also affect consumption expenditure. But the joint influence of all or some of these variables may be so small that it does not pay to introduce them into the model explicitly. One hopes that their combined effect can be treated as a random variable $u_i$.


    4. Intrinsic randomness in human behavior: Even if we succeed in introducing all the relevant variables into the model, there is bound to be some intrinsic randomness in individual Y's that cannot be explained no matter how hard we try. The disturbances, the u's, may very well reflect this intrinsic randomness.

    5. Poor proxy variables: For example, Friedman regards permanent consumption (Yp) as a function of permanent income (Xp). But since these variables are not directly observable, in practice we use proxy variables, such as current consumption (Y) and current income (X). This raises the problem of errors of measurement, and u may in this case also represent the errors of measurement.

    6. Principle of parsimony: We would like to keep our regression model as simple as possible. If we can explain the behavior of Y substantially with two or three explanatory variables and if our theory is not strong enough to suggest what other variables might be included, why introduce more variables? Let $u_i$ represent all other variables.


    7. Wrong functional form: Often we do not know the form of the functional relationship between the regressand (dependent variable) and the regressors. Is consumption expenditure a linear (in variable) function of income or a nonlinear (in variable) function? If it is the former,

    $Y_i = \beta_1 + \beta_2 X_i + u_i$

    is the proper functional relationship between Y and X, but if it is the latter,

    $Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + u_i$

    may be the correct functional form.

    In two-variable models the functional form of the relationship can often be judged from the scattergram. But in a multiple regression model it is not easy to determine the appropriate functional form, because we cannot visualize scattergrams in multiple dimensions.


    The data of Table 2.1 represent the population, not a sample. In most practical situations what we have is a sample of Y values corresponding to some fixed X's.

    Pretend that the population of Table 2.1 was not known to us and the only information we had was a randomly selected sample of Y values for the fixed X's as given in Table 2.4. Each Y (given $X_i$) in Table 2.4 is chosen randomly from similar Y's corresponding to the same $X_i$ from the population of Table 2.1.

    Can we estimate the PRF from the sample data? We may not be able to estimate the PRF accurately because of sampling fluctuations. To see this, suppose we draw another random sample from the population of Table 2.1, as presented in Table 2.5. Plotting the data of Tables 2.4 and 2.5, we obtain the scattergram given in Figure 2.4. In the scattergram two sample regression lines are drawn so as to fit the scatters reasonably well, one line for each sample.


    Which of the two regression lines represents the true population regression line? There is no way we can be absolutely sure that either of the regression lines shown in Figure 2.4 represents the true population regression line (or curve). Supposedly they represent the population regression line, but because of sampling fluctuations they are at best an approximation of the true PR. In general, we would get N different SRFs for N different samples, and these SRFs are not likely to be the same.
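Sampling fluctuation can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: the population is hypothetical (it is not Table 2.1), the true parameters 17 and 0.6 are invented, and the lines are fitted by least squares, a rule this chapter has not yet introduced.

```python
import random


def fit_line(x, y):
    """Least-squares intercept and slope for y = b1 + b2 * x (assumed fitting rule)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2


# Hypothetical population: at each income level, five families spend
# 17 + 0.6 * X plus one of five fixed deviations around that mean.
TRUE_B1, TRUE_B2 = 17.0, 0.6
income_levels = list(range(80, 280, 20))  # 80, 100, ..., 260
deviations = [-10, -5, 0, 5, 10]
population = {
    x: [TRUE_B1 + TRUE_B2 * x + d for d in deviations] for x in income_levels
}


def draw_sample(rng):
    """Pick one family's Y at random for each fixed X."""
    return [rng.choice(population[x]) for x in income_levels]


# Two independent random samples give two sample regression lines.
srf_1 = fit_line(income_levels, draw_sample(random.Random(1)))
srf_2 = fit_line(income_levels, draw_sample(random.Random(2)))
print("SRF 1 (intercept, slope):", srf_1)
print("SRF 2 (intercept, slope):", srf_2)
```

The two fitted lines will typically differ from each other and from the population line, even though both samples come from the same population with the same fixed X values.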


    We can develop the concept of the sample regression function (SRF) to represent the sample regression line. The sample counterpart of the PRF may be written as

    $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$ (2.6.1)

    where $\hat{Y}$ is read as "Y-hat" or "Y-cap" and

    $\hat{Y}_i$ = estimator of $E(Y \mid X_i)$

    $\hat{\beta}_1$ = estimator of $\beta_1$

    $\hat{\beta}_2$ = estimator of $\beta_2$

    An estimator, also known as a (sample) statistic, is simply a rule or formula or method that tells how to estimate the population parameter from the information provided by the sample at hand.


    Now just as we expressed the PRF in two equivalent forms, (2.2.2) and (2.4.2), we can express the SRF (2.6.1) in its stochastic form as follows:

    $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$ (2.6.2)

    where $\hat{u}_i$ denotes the (sample) residual term. Conceptually $\hat{u}_i$ is analogous to $u_i$ and can be regarded as an estimate of $u_i$. It is introduced in the SRF for the same reasons as $u_i$ was introduced in the PRF.

    To sum up, then, we find our primary objective in regression analysis is to estimate the PRF

    $Y_i = \beta_1 + \beta_2 X_i + u_i$

    on the basis of the SRF

    $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$

    because more often than not our analysis is based upon a single sample from some population. But because of sampling fluctuations our estimate of the PRF based on the SRF is at best an approximate one. This approximation is shown diagrammatically in Figure 2.5. For $X = X_i$, we have one (sample) observation $Y = Y_i$. In terms of the SRF, the observed $Y_i$ can be expressed as

    $Y_i = \hat{Y}_i + \hat{u}_i$ (2.6.3)

    and in terms of the PRF, it can be expressed as

    $Y_i = E(Y \mid X_i) + u_i$ (2.6.4)

    Now obviously in Figure 2.5 $\hat{Y}_i$ overestimates the true $E(Y \mid X_i)$ for the $X_i$ shown therein. By the same token, for any $X_i$ to the left of the point A, the SRF will underestimate the true PRF.
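The sample decomposition in (2.6.3) can be sketched in code. The sample below is invented for illustration (it is not Table 2.4), and least squares is again assumed as the rule that produces the estimates, since the chapter has not yet chosen one.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b1 + b2 * x (assumed fitting rule)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2


# Hypothetical sample of weekly income (x) and consumption expenditure (y).
x = [80, 100, 120, 140, 160]
y = [62, 75, 85, 104, 109]

b1_hat, b2_hat = fit_line(x, y)

# Fitted values: the sample estimates of E(Y | X_i).
y_hat = [b1_hat + b2_hat * xi for xi in x]

# Residuals: the sample counterparts of the unobservable disturbances u_i.
u_hat = [yi - yh for yi, yh in zip(y, y_hat)]

# Each observation splits into a fitted value plus a residual, as in (2.6.3).
for yi, yh, uh in zip(y, y_hat, u_hat):
    print(f"{yi} = {yh:.2f} + ({uh:.2f})")
```

The residuals are computable from the sample, whereas the disturbances in (2.6.4) are not, because $E(Y \mid X_i)$ is unknown outside a thought experiment where the whole population is observed.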


    The Critical Question

    Granted that the SRF is but an approximation of the PRF, can we devise a rule or a method that will make this approximation as close as possible? In other words, how should the SRF be constructed so that $\hat{\beta}_1$ is as close as possible to the true $\beta_1$ and $\hat{\beta}_2$ is as close as possible to the true $\beta_2$, even though we will never know the true $\beta_1$ and $\beta_2$?
