EC114 Introduction to Quantitative Economics
10. Further Inference Topics
Department of Economics, University of Essex
13/15 December 2011
Outline
1 Correlations and Independence
2 Tests About Two Populations
Reference: R. L. Thomas, Using Statistics in Economics, McGraw-Hill, 2005, sections 6.3-6.4.
Correlations and Independence
Although the covariance can tell us whether there is a
positive or a negative linear association between X and Y,
it tells us nothing about the strength of this association.
For example, what constitutes a large linear association,
whether positive or negative?
The correlation coefficient, $\rho$, provides such information,
and is defined as
$$\rho = \frac{\mathrm{Cov}(X,Y)}{\sqrt{V(X)}\,\sqrt{V(Y)}}.$$
The usefulness of the correlation coefficient lies in the fact
that, unlike the covariance, it can only take values within a
definite finite range.
While a covariance can take any value between $-\infty$ and $+\infty$, the correlation is restricted to values within the range
$-1$ to $+1$.
When there is an exact (perfect) positive linear association
between X and Y, the correlation takes the value $\rho = +1$.
Similarly, when there is an exact (perfect) negative linear
association between X and Y, the correlation is $\rho = -1$.
Furthermore, when there is no linear association between X and Y at all, then $\rho = 0$.
The correlation coefficient gives us a standard by which we
can judge the strength of any linear association between
two variables.
Clearly, if $\rho$ were to take a value close to zero, we would judge the association to be a very weak one.
However, values close to $+1$ or $-1$ would imply strong positive and negative linear associations, respectively.
For the die-rolling example, we have already computed the
covariance as $-1.0625$, whereas we found V(X) = 17.19
and V(Y) = 0.94.
Hence, we obtain the correlation as:
$$\rho = \frac{-1.0625}{\sqrt{17.19}\,\sqrt{0.94}} = -0.26.$$
The correlation is negative, as expected, but the value of $\rho$ is rather closer to 0 than to $-1$.
So we can say that there is a fairly weak negative linear
association between X and Y in this case.
This is not unexpected because, intuitively, we would not
expect a close relationship between X, the product of the
two numbers on the two dice, and Y, their difference.
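The population values quoted above can be checked by enumerating every outcome. The sketch below assumes two fair four-sided dice, with X the product and Y the absolute difference of the faces. This set-up is an assumption (the dice are not described in this lecture), chosen because it reproduces the quoted covariance and variances.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Assumption: two fair four-sided dice, X = product, Y = absolute difference.
faces = range(1, 5)
outcomes = list(product(faces, repeat=2))
prob = Fraction(1, len(outcomes))  # each of the 16 outcomes is equally likely


def expect(g):
    """Expected value of g(a, b) over all equally likely outcomes."""
    return sum(prob * g(a, b) for a, b in outcomes)


def x(a, b):
    return a * b


def y(a, b):
    return abs(a - b)


e_x = expect(x)
e_y = expect(y)
cov_xy = expect(lambda a, b: x(a, b) * y(a, b)) - e_x * e_y
var_x = expect(lambda a, b: x(a, b) ** 2) - e_x ** 2
var_y = expect(lambda a, b: y(a, b) ** 2) - e_y ** 2
rho = float(cov_xy) / sqrt(float(var_x) * float(var_y))

print(float(cov_xy), round(float(var_x), 2), round(float(var_y), 2), round(rho, 2))
```

Exact fractions are used for the moments so that rounding only happens when the results are printed.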
Recalling that $\mathrm{Cov}(X,Y) = E(XY) - E(X)E(Y)$, we can write $\rho$ as
$$\rho = \frac{E(XY) - E(X)E(Y)}{\sqrt{V(X)}\,\sqrt{V(Y)}}.$$
It follows that, if E(XY) = E(X)E(Y), then $\rho = 0$, i.e. the correlation between X and Y will be zero.
But this refers to linear relationships, and in Economics,
relationships are not always linear.
Hence we often need to discover whether any non-linear
relationships between variables are present.
In the lecture on probability (Lecture 2), we saw that
independence between two events A and B implied that
Pr(A and B) = Pr(A) Pr(B).
That is, the joint probability of two independent events
occurring is given by the product of the marginal
probabilities of the individual events.
Two random variables, X and Y, are said to be independent
if the joint probabilities are the product of the relevant
marginal probabilities for all possible combinations of X
and Y.
That is, X and Y are independent if and only if
p(X, Y) = f(X) g(Y) for all X and Y.
Consider the following two independent random variables,
X and Y, whose joint and marginal distributions are:
Y \ X    1     2     3     4     5     g(Y)
5       0.06  0.04  0.04  0.04  0.02  0.20
10      0.09  0.06  0.06  0.06  0.03  0.30
15      0.15  0.10  0.10  0.10  0.05  0.50
f(X)    0.30  0.20  0.20  0.20  0.10
Notice that the relationship p(X, Y) = f(X)g(Y) holds for all combinations of X and Y.
For example, p(4, 5) = f(4)g(5) and p(1, 10) = f(1)g(10).
If just one combination of X and Y were to fail to obey the condition p(X,Y) = f(X)g(Y), then the variables could no longer be called independent.
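The cell-by-cell condition can be checked mechanically. The following is a minimal sketch using the joint distribution above; the function name `is_independent` and the numerical tolerance are illustrative choices, not part of the lecture.

```python
x_values = [1, 2, 3, 4, 5]
y_values = [5, 10, 15]

# Joint probabilities p(X, Y): one row per value of Y, columns ordered as x_values.
joint = {
    5: [0.06, 0.04, 0.04, 0.04, 0.02],
    10: [0.09, 0.06, 0.06, 0.06, 0.03],
    15: [0.15, 0.10, 0.10, 0.10, 0.05],
}


def is_independent(table, tol=1e-9):
    """Return True if p(X, Y) = f(X) g(Y) holds in every cell of the table."""
    g = {yv: sum(row) for yv, row in table.items()}  # marginal of Y
    f = [sum(table[yv][i] for yv in table) for i in range(len(x_values))]  # marginal of X
    return all(
        abs(table[yv][i] - f[i] * g[yv]) < tol
        for yv in table
        for i in range(len(x_values))
    )


print(is_independent(joint))

# Moving probability between two cells keeps the total at 1 but breaks the condition.
altered = {yv: list(row) for yv, row in joint.items()}
altered[5][0] += 0.01
altered[5][1] -= 0.01
print(is_independent(altered))
```

The second call illustrates the point made above: a change in a single pair of cells is enough for the variables to stop being independent.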
While zero correlation ($\rho = 0$) implies the absence of any linear association between X and Y, independence is a
stronger condition.
Independence implies the absence of any association
between X and Y, linear or nonlinear.
Hence independence implies zero correlation, but zero
correlation does not necessarily imply independence.
Thus the condition E(XY) = E(X)E(Y) implies Cov(X,Y) = 0, and hence zero correlation, but does not necessarily imply independence.
For independence we also require p(X,Y) = f(X)g(Y).
Thus, if two variables are uncorrelated, they are not linearly associated, but they could still be not independent
(i.e. dependent) if there were some nonlinear (possibly
weak) association present.
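A small example makes the distinction concrete. The sketch below uses an illustrative distribution that is not taken from the lecture: X takes the values -1, 0 and 1 with equal probability and Y is the square of X. The covariance is exactly zero, yet Y is completely determined by X.

```python
from fractions import Fraction

# Illustrative (hypothetical) distribution: X in {-1, 0, 1} equally likely, Y = X**2.
third = Fraction(1, 3)
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}


def marginal(index):
    """Marginal distribution of X (index 0) or Y (index 1)."""
    out = {}
    for pair, pr in joint.items():
        out[pair[index]] = out.get(pair[index], 0) + pr
    return out


f = marginal(0)
g = marginal(1)

e_x = sum(xv * pr for xv, pr in f.items())
e_y = sum(yv * pr for yv, pr in g.items())
e_xy = sum(xv * yv * pr for (xv, yv), pr in joint.items())
cov = e_xy - e_x * e_y

# Independence requires p(X, Y) = f(X) g(Y) for every combination of X and Y.
independent = all(joint.get((xv, yv), 0) == f[xv] * g[yv] for xv in f for yv in g)

print(cov, independent)
```

Here p(0, 0) = 1/3 while f(0)g(0) = 1/9, so the product condition fails even though the correlation is zero.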
Although the discussion here has been restricted to
discrete variables, the concepts of independence and
correlation apply equally well to continuous variables.
However, it can be shown that the distinction between the two concepts disappears when continuous variables are
jointly normally distributed.
If two jointly normally distributed variables, X and Y, are
uncorrelated, then they must be independent.
Another useful property of normally distributed variables is
given by the following theorem.
Theorem
Any linear function of a series of independently and normally
distributed variables is itself normally distributed.
Example: if X, Y and Z are all independent normally
distributed random variables, then it follows that
W = 2X + 4Y - 3Z will also be normally distributed.
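As a worked illustration of the example, the mean and variance of W follow from the usual rules for expectations and for variances of independent variables. The symbols $\mu_X, \mu_Y, \mu_Z$ and $\sigma_X^2, \sigma_Y^2, \sigma_Z^2$ for the means and variances of X, Y and Z are introduced here for illustration only.

```latex
\begin{align*}
E(W) &= 2E(X) + 4E(Y) - 3E(Z) = 2\mu_X + 4\mu_Y - 3\mu_Z, \\
V(W) &= 2^2 V(X) + 4^2 V(Y) + (-3)^2 V(Z) = 4\sigma_X^2 + 16\sigma_Y^2 + 9\sigma_Z^2, \\
W &\sim N\left(2\mu_X + 4\mu_Y - 3\mu_Z,\; 4\sigma_X^2 + 16\sigma_Y^2 + 9\sigma_Z^2\right).
\end{align*}
```

Note that the coefficient on Z enters the variance squared, so the negative sign does not reduce the variance of W.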
Correlations can also be computed for samples.
The sample correlation coefficient, R, is defined as
$$R = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}}.$$
As with the population correlation we find that
$$-1 \le R \le 1$$
with the same interpretation of values, e.g. $R = -1$ implies
a perfect negative correlation between X and Y etc.
To compute R we don't need to worry about normalising
anything by $n$ or $n - 1$.
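To see this note that
$$\sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - n\bar{X}\bar{Y}, \quad \sum (X - \bar{X})^2 = \sum X^2 - n\bar{X}^2, \quad \sum (Y - \bar{Y})^2 = \sum Y^2 - n\bar{Y}^2.$$
As an example, a sample of 10 trials of the two-dice
experiment yields the following values for X and Y:
X   3   2  12   3   4   4   6  12   4   8
Y   2   1   1   2   0   3   1   1   0   2
From these values we obtain:
$$\sum X = 58, \quad \sum X^2 = 458, \quad \sum XY = 72, \quad \sum Y = 13, \quad \sum Y^2 = 25.$$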
Hence $\bar{X} = 58/10 = 5.8$, $\bar{Y} = 13/10 = 1.3$ and so
$$\sum (X - \bar{X})(Y - \bar{Y}) = 72 - 10(5.8)(1.3) = -3.40,$$
$$\sum (X - \bar{X})^2 = 458 - 10(5.8)^2 = 121.60,$$
$$\sum (Y - \bar{Y})^2 = 25 - 10(1.3)^2 = 8.10.$$
It follows that
$$R = \frac{-3.4}{\sqrt{121.6}\,\sqrt{8.1}} = -0.108,$$
suggesting that there is a weak negative linear relationship between X and Y.
(Note that the population correlation is $\rho = -0.260$.)
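The same calculation can be sketched in a few lines of code, using the ten sample values above and the sums-of-products shortcuts. The function name `sample_correlation` is an illustrative choice.

```python
from math import sqrt

# Sample of 10 trials of the two-dice experiment, as listed above.
x = [3, 2, 12, 3, 4, 4, 6, 12, 4, 8]
y = [2, 1, 1, 2, 0, 3, 1, 1, 0, 2]


def sample_correlation(xs, ys):
    """Sample correlation R computed from sums, sums of squares and cross-products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum(a * b for a, b in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(a * a for a in xs) - n * x_bar ** 2
    s_yy = sum(b * b for b in ys) - n * y_bar ** 2
    # No division by n or n - 1 is needed: such factors would cancel in the ratio.
    return s_xy / sqrt(s_xx * s_yy)


r = sample_correlation(x, y)
print(round(r, 3))
```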
Tests About Two Populations
Often in statistics we need to compute parameter values
relating to two or more different populations.
Consider two cities, A and B.
Suppose that a researcher suspects that mean annual
income in city B is greater than in city A, and wishes to test
whether this is actually the case.
Let $\mu_1$ and $\mu_2$ denote the population mean incomes in cities A and B respectively.
As always, we formulate a null hypothesis:
$H_0: \mu_1 = \mu_2$ (no difference between mean incomes)
and an alternative hypothesis:
$H_A: \mu_1 < \mu_2$ (mean income is greater in city B).
How can we derive a suitable test statistic?
Let $X_1$ be the annual income of a resident from city A, and
$X_2$ the annual income of a resident in city B.
We therefore have a population of very many values for $X_1$ from city A, with a mean $\mu_1$ and a variance $\sigma_1^2$.
Similarly, we have a population of very many values for $X_2$ from city B, with mean $\mu_2$ and variance $\sigma_2^2$.
Notice that the absolute sizes of the populations in the two
cities are unimportant, provided both cities are large.
We can now apply the Central Limit Theorem (CLT) to both populations in turn.
Suppose we take a sufficiently large sample of size $n_1$ from
city A and compute the sample mean income $\bar{X}_1$.
Then the CLT implies that
$$\bar{X}_1 \sim N\left(\mu_1, \frac{\sigma_1^2}{n_1}\right).$$
Similarly, taking a sufficiently large sample of size $n_2$ from
city B yields the following result for the sample mean $\bar{X}_2$:
$$\bar{X}_2 \sim N\left(\mu_2, \frac{\sigma_2^2}{n_2}\right).$$
As we are interested in the difference between the two
unknown population means, $\mu_1 - \mu_2$, it makes sense to base any test statistic on the quantity $\bar{X}_1 - \bar{X}_2$, the difference between the two sample means.
If it were possible to take very many samples, each
yielding a value for $\bar{X}_1 - \bar{X}_2$, we would obtain a sampling distribution for $\bar{X}_1 - \bar{X}_2$, from which we could derive a suitable test statistic.
We therefore need to find the sampling distribution for
$\bar{X}_1 - \bar{X}_2$.
As both $\bar{X}_1$ and $\bar{X}_2$ are normally distributed it follows (from the Theorem on linear functions of independent normal variables) that $\bar{X}_1 - \bar{X}_2$ is also normally distributed.
We therefore need to find the mean and variance of $\bar{X}_1 - \bar{X}_2$.
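It is straightforward to show that
$$E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2.$$
If we assume that the two samples are independent (not
unreasonable) then $\mathrm{Cov}(\bar{X}_1, \bar{X}_2) = 0$ and so
$$V(\bar{X}_1 - \bar{X}_2) = V(\bar{X}_1) + V(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
Hence
$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right).$$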
This is summarised in the diagram on the next slide.
We can use this normal sampling distribution to derive an
appropriate test statistic.
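We can now standardise $\bar{X}_1 - \bar{X}_2$ in the usual way by subtracting the mean and dividing by the standard
deviation to obtain a N(0, 1) distribution:
$$\frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1).$$
However, we require this distribution under $H_0: \mu_1 = \mu_2$, resulting in the test statistic
$$TS = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1)$$
under $H_0$.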
Recall that $H_A: \mu_1 < \mu_2$, implying that $\mu_1 - \mu_2 < 0$.
We therefore have a lower-sided one-tail test.
Adopting a 5% level of significance the critical value from
the N(0, 1) distribution is $-1.64$ and our test criterion
becomes:
reject $H_0$ if $TS < -1.64$ but reserve judgement if $TS > -1.64$.
The test criterion is illustrated on the next slide.
Suppose we take samples of size $n_1 = n_2 = 200$ and obtain:
$$\bar{X}_1 = 14{,}860, \quad s_1 = 1655, \quad \bar{X}_2 = 17{,}230, \quad s_2 = 2108.$$
As $\sigma_1^2$ and $\sigma_2^2$ are unknown we replace them with the
unbiased estimators $s_1^2$ and $s_2^2$.
Then
$$TS = \frac{14{,}860 - 17{,}230}{\sqrt{\frac{1655^2}{200} + \frac{2108^2}{200}}} = -12.51.$$
Using the test criterion we find that $TS < -1.64$ and hence we reject $H_0: \mu_1 = \mu_2$ in favour of $H_A: \mu_1 < \mu_2$, i.e. there is evidence that the mean income in city A is below that in
city B.
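The large-sample test can be sketched as a short function, using the sample figures from the two cities and the 5% lower-tail critical value of -1.64. The function name `two_sample_z` is an illustrative choice, and the sample standard deviations stand in for the unknown population values as described above.

```python
from math import sqrt


def two_sample_z(mean1, s1, n1, mean2, s2, n2):
    """Test statistic for H0: mu1 = mu2 based on two large independent samples."""
    standard_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# City A is sample 1 and city B is sample 2.
ts = two_sample_z(14860, 1655, 200, 17230, 2108, 200)

critical_value = -1.64  # lower-tail 5% point of N(0, 1)
reject_h0 = ts < critical_value

print(round(ts, 2), reject_h0)
```

Because the alternative is one-sided, only a large negative value of the statistic counts as evidence against the null.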
The preceding results were based on large samples and
the CLT.
However, when samples are small we have to make use of the Student's t-distribution.
Provided that both populations:
1 are normally distributed, and
2 have the same variance $\sigma^2$ (i.e. $\sigma_1^2 = \sigma_2^2 = \sigma^2$),
then
$$\frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim N(0, 1),$$
even when samples are small.
But this will not be the case when $\sigma^2$ is unknown and has to be replaced by an estimator of it.
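When $\sigma^2$ is unknown we can estimate it using
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$
It then follows that
$$\frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}.$$
For example, if $H_0: \mu_1 = \mu_2$ then we can use the test statistic
$$TS = \frac{\bar{X}_1 - \bar{X}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$$
under $H_0$ and apply the usual testing procedure.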
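A minimal sketch of the small-sample calculation is given below. The function name `pooled_t` and all of the input numbers are hypothetical, chosen only to show the arithmetic; the resulting statistic would then be compared with a critical value from the t-distribution with the stated degrees of freedom.

```python
from math import sqrt


def pooled_t(mean1, s1_sq, n1, mean2, s2_sq, n2):
    """Pooled-variance test statistic for H0: mu1 = mu2 (equal variances assumed)."""
    df = n1 + n2 - 2
    # Pooled estimator of the common variance sigma^2.
    s_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    ts = (mean1 - mean2) / (sqrt(s_sq) * sqrt(1 / n1 + 1 / n2))
    return ts, df, s_sq


# Hypothetical inputs: sample means 20 and 23, sample variances 9 and 11,
# sample sizes 10 and 12.
ts, df, s_sq = pooled_t(20.0, 9.0, 10, 23.0, 11.0, 12)

print(round(ts, 3), df, round(s_sq, 2))
```

The pooled estimator weights each sample variance by its own degrees of freedom, so the larger sample has more influence on the estimate.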
An example of an F-distribution with 20 d.f. for the
numerator and 20 d.f. for the denominator is as follows:
The distribution is strictly positive and not symmetric so we have to find two critical values for a two-tail test from the
following table:
Upper 2.5% critical values of the F-distribution with v1 degrees of freedom for the numerator and v2 degrees of freedom for the denominator
v2\v1       1       2       3       4       5       6       7       8       9
1      647.79  799.50  864.16  899.58  921.85  937.11  948.22  956.66  963.28
2       38.51   39.00   39.17   39.25   39.30   39.33   39.36   39.37   39.39
3       17.44   16.04   15.44   15.10   14.88   14.73   14.62   14.54   14.47
4       12.22   10.65    9.98    9.60    9.36    9.20    9.07    8.98    8.90
5       10.01    8.43    7.76    7.39    7.15    6.98    6.85    6.76    6.68
6        8.81    7.26    6.60    6.23    5.99    5.82    5.70    5.60    5.52
7        8.07    6.54    5.89    5.52    5.29    5.12    4.99    4.90    4.82
8        7.57    6.06    5.42    5.05    4.82    4.65    4.53    4.43    4.36
9        7.21    5.71    5.08    4.72    4.48    4.32    4.20    4.10    4.03
10       6.94    5.46    4.83    4.47    4.24    4.07    3.95    3.85    3.78
11       6.72    5.26    4.63    4.28    4.04    3.88    3.76    3.66    3.59
12       6.55    5.10    4.47    4.12    3.89    3.73    3.61    3.51    3.44
13       6.41    4.97    4.35    4.00    3.77    3.60    3.48    3.39    3.31
14       6.30    4.86    4.24    3.89    3.66    3.50    3.38    3.29    3.21
15       6.20    4.77    4.15    3.80    3.58    3.41    3.29    3.20    3.12
16       6.12    4.69    4.08    3.73    3.50    3.34    3.22    3.12    3.05
18       5.98    4.56    3.95    3.61    3.38    3.22    3.10    3.01    2.93
20       5.87    4.46    3.86    3.51    3.29    3.13    3.01    2.91    2.84
22       5.79    4.38    3.78    3.44    3.22    3.05    2.93    2.84    2.76
24       5.72    4.32    3.72    3.38    3.15    2.99    2.87    2.78    2.70
26       5.66    4.27    3.67    3.33    3.10    2.94    2.82    2.73    2.65
28       5.61    4.22    3.63    3.29    3.06    2.90    2.78    2.69    2.61
30       5.57    4.18    3.59    3.25    3.03    2.87    2.75    2.65    2.57
40       5.42    4.05    3.46    3.13    2.90    2.74    2.62    2.53    2.45
60       5.29    3.93    3.34    3.01    2.79    2.63    2.51    2.41    2.33
120      5.15    3.80    3.23    2.89    2.67    2.52    2.39    2.30    2.22
∞        5.02    3.69    3.12    2.79    2.57    2.41    2.29    2.19    2.11
NB: Entries are $F_u$ such that $\Pr(F_{v_1,v_2} > F_u) = 0.025$.
Table A.4 in Thomas provides additional d.f. for the
numerator as well as additional significance levels.
The table gives the upper-tail critical value, $F_u$; the lower-tail critical value is simply the inverse of this, i.e.
$$F_l = \frac{1}{F_u}.$$
For example, with 8 d.f. for the numerator and 30 d.f. for the denominator, the table gives
$$F_u = 2.65 \quad \Rightarrow \quad F_l = \frac{1}{2.65} = 0.38.$$
The test criterion for the test is:
reject $H_0$ if $TS < F_l$ or $TS > F_u$
but reserve judgement if $F_l \le TS \le F_u$.
For example, suppose we have two samples yielding:
$$n_1 = 10, \quad s_1^2 = 14.5, \quad n_2 = 20, \quad s_2^2 = 4.8.$$
The resulting test statistic is
$$TS = \frac{14.5}{4.8} = 3.02$$
and has an $F_{9,19}$ distribution under the null.
The upper two-tail 5% critical value (which puts 2.5% into
each tail) is 2.88 and the lower-tail value is 1/2.88 = 0.35.
As TS > 2.88 we reject the null in favour of the alternative that $\sigma_1^2 \neq \sigma_2^2$.
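The variance-ratio example can be sketched as follows. The function name `variance_ratio_test` is an illustrative choice; the upper critical value 2.88 is the one quoted above, and the lower value is taken as its reciprocal, following the rule used in these slides.

```python
def variance_ratio_test(s1_sq, s2_sq, f_upper):
    """Two-tail test of equal variances using the ratio of sample variances."""
    ts = s1_sq / s2_sq
    f_lower = 1 / f_upper  # lower-tail value as the reciprocal of the upper one
    reject_h0 = ts < f_lower or ts > f_upper
    return ts, f_lower, reject_h0


ts, f_lower, reject_h0 = variance_ratio_test(14.5, 4.8, 2.88)

print(round(ts, 2), round(f_lower, 2), reject_h0)
```

The critical value is passed in as an argument because it has to be read from the table for the relevant degrees of freedom.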
Summary
Correlations and independence.
Tests about two populations.
Next term:
Econometrics (but first enjoy the vacation...)