review of bas stats for econometric success
Transcript of review of bas stats for econometric success
-
7/29/2019 review of bas stats for econometric success
Econometrics
Review of Basic Statistics
-
Topics
1. Descriptive Statistics:
- 1 variable: Mean and Variance
- 2 variables: Covariance, Correlation
2. Hypothesis Testing
-
Descriptive Statistics
-
Inferential Statistics
Involves:
- Estimation
- Hypothesis Testing
Purpose:
- Make decisions about population characteristics
-
Descriptive Statistics
-
Mean
Measure of central tendency
Affected by extreme values
Formula: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
-
Median
Measure of central tendency
Middle value in ordered series
- If n is even, mean of the 2 middle values
Value that splits the distribution into two halves
Not affected by extreme values
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
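A minimal Python sketch of the mean and median for the raw data on this slide, using only the standard library. The variable names are illustrative; the even-n rule is written out by hand next to the library call so the two can be compared.

```python
from statistics import mean, median

raw = [17, 16, 21, 18, 13, 16, 12, 11]  # raw data from the slide
ordered = sorted(raw)
n = len(ordered)

# n = 8 is even, so the median is the mean of the two middle values
# (positions 4 and 5 in the ordered series).
middle_pair = (ordered[n // 2 - 1], ordered[n // 2])
manual_median = sum(middle_pair) / 2

print(ordered)
print(mean(raw), median(raw), manual_median)
```

Both middle values are 16 here, so the median is 16, while the mean is pulled slightly lower by the small values.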
-
Mode
Measure of central tendency
Value that occurs most often
Not affected by extreme values
There may be more than one mode
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
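The two properties of the mode can be checked with a short Python sketch. The outlier value 210 and the extra observation 11 are hypothetical changes to the slide's data, introduced only to illustrate the point.

```python
from statistics import mean, mode, multimode

raw = [17, 16, 21, 18, 13, 16, 12, 11]  # raw data from the slide

# 16 is the only value that occurs twice.
print(mode(raw))

# Hypothetical change: turn the largest value into an outlier.
# The mode stays the same, while the mean moves a lot.
with_outlier = [210 if x == 21 else x for x in raw]
print(mode(with_outlier), mean(raw), mean(with_outlier))

# Hypothetical extra observation: a second 11 creates a tie,
# so the data now have two modes.
bimodal = raw + [11]
print(sorted(multimode(bimodal)))
```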
-
Sample Variance
Measure of Dispersion around the Mean
Formula: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
-
Sample Standard Deviation
Measure of Dispersion around the Mean
Has the same unit of measurement as the variable itself
Formula: $s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}$
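As a sketch, the sample variance (sum of squared deviations from the mean, divided by n - 1) and the sample standard deviation (its square root) can be computed by hand and compared with Python's `statistics` module. The data are the raw values used earlier in the deck for the median and mode.

```python
from math import sqrt
from statistics import stdev, variance

raw = [17, 16, 21, 18, 13, 16, 12, 11]
n = len(raw)
x_bar = sum(raw) / n

# Sample variance: squared deviations from the mean, divided by n - 1.
s2 = sum((x - x_bar) ** 2 for x in raw) / (n - 1)
# Sample standard deviation: same unit as the data.
s = sqrt(s2)

print(s2, variance(raw))
print(s, stdev(raw))
```

The squared deviations sum to 78, so the sample variance is 78/7.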
-
Random Variables
Random variable: numerical summary of a random outcome
1. Discrete: only a discrete set of possible values
=> summarized by the probability distribution: list of all possible values of the variable and the probability that each value will occur.
2. Continuous: continuum of possible values
=> summarized by the probability density function (pdf)
-
Probability Distribution
1. List of pairs [ Xi, P(Xi) ]
Xi = Value of Random Variable (Outcome)
P(Xi) = Probability Associated with Value
2. $0 \le P(X_i) \le 1$ - Mutually exclusive (no overlap)
3. $\sum_i P(X_i) = 1$ - Collectively exhaustive (nothing left out)
-
Mean and Variance: Discrete Case
Mean, or Expected Value
Weighted average of all possible values
$E(X) = \mu_X = \sum_i X_i P(X_i)$
Variance
Weighted average squared deviation about the mean
$E[(X - \mu_X)^2] = \sigma_X^2 = \sum_i (X_i - \mu_X)^2 P(X_i)$
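A short Python sketch of these two weighted averages. The fair six-sided die is a hypothetical example, not taken from the slides; exact fractions are used so the results are not blurred by rounding.

```python
from fractions import Fraction

# Hypothetical discrete RV: one roll of a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

# The probabilities must be collectively exhaustive.
assert sum(probs) == 1

# Mean: probability-weighted average of the possible values.
mu = sum(x * p for x, p in zip(values, probs))
# Variance: probability-weighted average squared deviation about the mean.
var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))

print(mu, var)  # 7/2 and 35/12
```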
-
Covariance
- measures joint variability of X and Y
$\mathrm{cov}(X,Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)]$
For discrete RVs,
$\mathrm{cov}(X,Y) = \sum_{i=1}^{n} (X_i - \mu_X)(Y_i - \mu_Y) P(X_i, Y_i)$
Can take any value in the real numbers
Depends on units of measurement (e.g., dollars, cents)
cov(X,Y) > 0 means that X and Y tend to move together:
when Y is above its mean, X tends to be above its mean
cov(X,Y) < 0 means that X and Y tend to move in opposite directions
-
Correlation
A more convenient measure of the relationship between X and Y is correlation, since it is normalized to lie inside the [-1, 1] interval.
$\mathrm{corr}(X,Y) = \frac{\mathrm{cov}(X,Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$
$-1 \le \mathrm{corr}(X,Y) \le 1$
If corr(X,Y) = 0, then X and Y are uncorrelated.
If corr(X,Y) > 0, then X and Y are positively correlated.
If corr(X,Y) < 0, then X and Y are negatively correlated.
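The difference between the two measures shows up when units change. The sketch below computes covariance (the probability-weighted average of products of deviations from the means) and correlation (covariance divided by the product of the standard deviations) for a small joint distribution. The distribution and the helper name `cov_corr` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction as F
from math import sqrt

def cov_corr(joint):
    """Covariance and correlation for a list of (x, y, probability) triples."""
    mu_x = sum(x * p for x, _, p in joint)
    mu_y = sum(y * p for _, y, p in joint)
    var_x = sum((x - mu_x) ** 2 * p for x, _, p in joint)
    var_y = sum((y - mu_y) ** 2 * p for _, y, p in joint)
    cov = sum((x - mu_x) * (y - mu_y) * p for x, y, p in joint)
    return cov, cov / sqrt(var_x * var_y)

# Hypothetical joint distribution, X measured in dollars.
joint_dollars = [(F(0), F(0), F(1, 4)), (F(0), F(1), F(1, 4)), (F(1), F(1), F(1, 2))]
# Same distribution with X converted to cents.
joint_cents = [(100 * x, y, p) for x, y, p in joint_dollars]

cov_d, corr_d = cov_corr(joint_dollars)
cov_c, corr_c = cov_corr(joint_cents)
print(cov_d, cov_c)    # covariance scales with the unit of X
print(corr_d, corr_c)  # correlation does not
```

Rescaling X by 100 multiplies the covariance by 100 and leaves the correlation unchanged.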
-
Note
Covariance and correlation measure only linear
dependence!
Example: cov(X,Y) = 0 does not necessarily imply that X and Y are independent.
They may be non-linearly related.
But if X and Y are jointly normally distributed and cov(X,Y) = 0, then they are independent.
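A standard counterexample, sketched in Python: take X uniform on {-1, 0, 1} and Y = X^2 (a hypothetical choice, not from the slides). Y is completely determined by X, yet the covariance is exactly zero.

```python
from fractions import Fraction as F

# Hypothetical example: X is -1, 0 or 1 with equal probability, and Y = X**2.
outcomes = [(x, x ** 2, F(1, 3)) for x in (-1, 0, 1)]

mu_x = sum(x * p for x, _, p in outcomes)
mu_y = sum(y * p for _, y, p in outcomes)
cov = sum((x - mu_x) * (y - mu_y) * p for x, y, p in outcomes)

# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0).
p_joint = sum(p for x, y, p in outcomes if x == 0 and y == 0)
p_x0 = sum(p for x, _, p in outcomes if x == 0)
p_y0 = sum(p for _, y, p in outcomes if y == 0)

print(cov)                  # 0: no linear dependence
print(p_joint, p_x0 * p_y0)  # 1/3 versus 1/9: not independent
```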
-
The correlation coefficient measures linear association
-
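The Mean and Variance of Sums of Random Variables
$E(X + Y) = E(X) + E(Y)$
$\mathrm{var}(X + Y) = \mathrm{var}(X) + \mathrm{var}(Y) + 2\,\mathrm{cov}(X,Y)$
$\mathrm{var}(X - Y) = \mathrm{var}(X) + \mathrm{var}(Y) - 2\,\mathrm{cov}(X,Y)$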
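The rules for sums and differences (the mean of a sum is the sum of the means; the variance of X + Y or X - Y is var(X) + var(Y) plus or minus 2 cov(X,Y)) can be checked exactly on a small example. The joint distribution and the helper names `E`, `var` and `cov` below are hypothetical, a sketch rather than a general-purpose implementation.

```python
from fractions import Fraction as F

# Hypothetical joint distribution of (X, Y) as (x, y, probability) triples.
joint = [(0, 0, F(1, 4)), (0, 1, F(1, 4)), (1, 1, F(1, 2))]

def E(g):
    """Expected value of g(X, Y) under the joint distribution."""
    return sum(g(x, y) * p for x, y, p in joint)

def var(g):
    m = E(g)
    return E(lambda x, y: (g(x, y) - m) ** 2)

def cov(g, h):
    mg, mh = E(g), E(h)
    return E(lambda x, y: (g(x, y) - mg) * (h(x, y) - mh))

def X(x, y): return x
def Y(x, y): return y
def S(x, y): return x + y  # the sum X + Y
def D(x, y): return x - y  # the difference X - Y

print(E(S), E(X) + E(Y))
print(var(S), var(X) + var(Y) + 2 * cov(X, Y))
print(var(D), var(X) + var(Y) - 2 * cov(X, Y))
```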
-
Continuous Probability Distributions: Normal Distribution
$X \sim N(\mu, \sigma^2)$
The notation reads "X is normally distributed with mean $\mu$ and variance $\sigma^2$"
The PDF for a normal RV is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right)$
The normal distribution has a familiar bell shape.
The normal density is symmetric around its mean, and 95% of the probability lies in the region $(\mu - 1.96\sigma,\ \mu + 1.96\sigma)$.
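A brief Python sketch of the normal density and the 95% region, using `statistics.NormalDist` from the standard library. The parameter values (mean 2, standard deviation 4) and the function name `normal_pdf` are illustrative assumptions.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma**2) evaluated at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

mu, sigma = 2.0, 4.0  # illustrative parameters, not from the slides
dist = NormalDist(mu, sigma)

# The hand-written density agrees with the library version.
print(normal_pdf(3.0, mu, sigma), dist.pdf(3.0))

# Symmetry around the mean.
print(normal_pdf(mu - 1.0, mu, sigma), normal_pdf(mu + 1.0, mu, sigma))

# Probability mass between mu - 1.96*sigma and mu + 1.96*sigma.
prob = dist.cdf(mu + 1.96 * sigma) - dist.cdf(mu - 1.96 * sigma)
print(round(prob, 4))
```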
-
Effects of Varying Parameters
-
Infinite Number of Normal Distribution Tables
Normal distributions differ by mean and standard deviation
Each distribution would require its own table
That's an infinite number of tables!
-
Standard Normal Distribution
If X is a normal RV with mean $\mu$ and variance $\sigma^2$, then
$Z = \frac{X - \mu}{\sigma}$
has a normal distribution with mean 0 and variance 1, or standard normal distribution.
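Standardizing means a single standard normal table (or CDF) is enough: for any cutoff c, P(X <= c) equals the standard normal CDF evaluated at (c - mean) / standard deviation. A minimal Python sketch, with hypothetical numbers chosen only for illustration:

```python
from statistics import NormalDist

mu, sigma = 10.0, 2.0  # hypothetical mean and standard deviation
c = 13.0               # hypothetical cutoff

# Standardize: subtract the mean, divide by the standard deviation.
z = (c - mu) / sigma

# Probability from the standard normal CDF ...
p_via_z = NormalDist().cdf(z)
# ... matches the probability computed directly from N(mu, sigma**2).
p_direct = NormalDist(mu, sigma).cdf(c)

print(z, round(p_via_z, 4), round(p_direct, 4))
```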
-
Example
-
Values of the std normal CDF, $\Phi(z)$, are tabulated in Appendix Table 1
To compute probabilities for a normal RV, it must be standardized by subtracting its mean and dividing by its standard deviation
Example: Suppose Y ~ N(2,16), and we need
P(Y
-
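Moments: Skewness, Kurtosis
skewness $= \frac{E[(Y - \mu_Y)^3]}{\sigma_Y^3}$ = measures asymmetry in a distribution
The larger the skewness (by absolute value), the more asymmetric is the distribution
skewness = 0: distribution is symmetric
skewness > (<) 0: distribution has a long right (left) tail
kurtosis $= \frac{E[(Y - \mu_Y)^4]}{\sigma_Y^4}$ = measures the mass in the tails of a distribution
kurtosis = 3: normal distribution
kurtosis > 3: heavy tails (leptokurtic), i.e. extreme events are more likely to occur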
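Sample analogues of these moments can be sketched in Python as the third and fourth central moments divided by the matching power of the second central moment. The function name `skew_kurt`, the 1/n divisor and the two data sets are assumptions made for illustration.

```python
def skew_kurt(data):
    """Sample skewness and kurtosis based on central moments with a 1/n divisor."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((y - m) ** 2 for y in data) / n
    m3 = sum((y - m) ** 3 for y in data) / n
    m4 = sum((y - m) ** 4 for y in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

symmetric = [1, 2, 3, 4, 5]      # hypothetical symmetric sample
right_tail = [1, 1, 1, 1, 10]    # hypothetical sample with one large value

print(skew_kurt(symmetric))   # skewness is 0 for a symmetric sample
print(skew_kurt(right_tail))  # positive skewness: long right tail
```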
-
Central Limit Theorem
-
Important Continuous Distributions
All derived from the normal distribution
$\chi^2$ distribution: arises from the sum of squared normal random variables
t distribution: arises from ratios of normal and $\chi^2$ variables
F distribution: arises from ratios of $\chi^2$ variables.
-
Hypothesis Testing
-
Identifying Hypotheses
1. Formulate the question, e.g. test that the population mean is equal to 3
2. State the question statistically ($H_0: \mu = 3$)
3. State its alternative statistically ($H_1: \mu \ne 3$)
4. Choose level of significance $\alpha$
Typical values are 0.01, 0.05, 0.10
Rejection region of sampling distribution: the unlikely values of sample statistic if null hypothesis is true
-
Identifying Hypotheses: Examples
1. Is the population average amount of TV
viewing 12 hours?
$H_0: \mu = 12$
$H_1: \mu \ne 12$
-
Choosing the Level of Significance:
Type I and Type II Errors
Type I Error: Reject a true null hypothesis.
Type II Error: Do not reject a false null.
We would like probabilities of both errors to be
small. BUT, we cannot make both very small at
the same time.
In statistics, we fix the probability of Type I error at a significance level (e.g. 5%) and minimize the probability of Type II error.
-
Hypothesis Testing: Basic Idea
-
Method 1: Compare Test Statistic to Critical Value from the Table
1. Convert sample statistic (e.g., $\bar{Y}$) to standardized Z variable
2. Compare to critical value from the table
If Z-test statistic falls in the rejection region, reject $H_0$;
Otherwise do not reject $H_0$
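The two steps above can be sketched in Python for the TV-viewing hypotheses from the earlier slide (null: population mean of 12 hours, two-sided alternative). The sample size, sample mean and sample standard deviation are hypothetical, and the sketch assumes the sample is large enough for the normal approximation; the critical value comes from `NormalDist().inv_cdf` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 12.0                   # hypothesized population mean (hours)
n, y_bar, s = 100, 12.6, 3.0  # hypothetical sample size, mean, std deviation
alpha = 0.05                  # significance level

# Step 1: convert the sample mean to a standardized Z statistic.
se = s / sqrt(n)
z = (y_bar - mu_0) / se

# Step 2: compare with the two-sided critical value (about 1.96 at 5%).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject = abs(z) > z_crit

print(round(z, 3), round(z_crit, 3), reject)
```

With these made-up numbers the statistic is about 2.0, which lies just inside the rejection region.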
-
Two-Sided Test: Rejection Regions
-
One-Sided Test: Rejection Region
-
Method 2: Compute the P-value
P-value: probability of obtaining a test statistic more extreme ($\le$ or $\ge$) than the actual sample value, given $H_0$ is true.
The lowest significance level at which we reject $H_0$
Compute p-value for the test
Use this p-value to make rejection decision:
If p-value $\ge \alpha$, do not reject $H_0$
If p-value $< \alpha$, reject $H_0$
-
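P-values for 2-sided and 1-sided tests
Two-sided test: $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$
$p\text{-value} = P\left(|Z| > \left|\frac{\bar{Y} - \mu_0}{\sigma_{\bar{Y}}}\right|\right) = 2\,\Phi\left(-\left|\frac{\bar{Y} - \mu_0}{\sigma_{\bar{Y}}}\right|\right)$
One-sided test:
a. $H_0: \mu \ge \mu_0$, $H_1: \mu < \mu_0$
$p\text{-value} = \Phi\left(\frac{\bar{Y} - \mu_0}{\sigma_{\bar{Y}}}\right)$
b. $H_0: \mu \le \mu_0$, $H_1: \mu > \mu_0$
$p\text{-value} = 1 - \Phi\left(\frac{\bar{Y} - \mu_0}{\sigma_{\bar{Y}}}\right)$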
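A compact Python sketch of the three p-value cases, written in terms of the standardized statistic z (sample mean minus hypothesized mean, divided by the standard error). The function name `p_value` and the labels "two-sided", "less" and "greater" are hypothetical naming choices; the value z = 2.0 is illustrative.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def p_value(z, alternative="two-sided"):
    """P-value for a standardized Z statistic under the given alternative."""
    if alternative == "two-sided":  # H1: mean differs from mu_0
        return 2 * PHI(-abs(z))
    if alternative == "less":       # H1: mean is below mu_0
        return PHI(z)
    if alternative == "greater":    # H1: mean is above mu_0
        return 1 - PHI(z)
    raise ValueError(f"unknown alternative: {alternative!r}")

z = 2.0  # illustrative standardized statistic
for alt in ("two-sided", "less", "greater"):
    print(alt, round(p_value(z, alt), 4))
```

For a positive z, the "greater" p-value is half the two-sided one, and the "less" p-value is close to 1.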
-
P-value for a 2-Sided Test
-
Method 3: Confidence Intervals
Confidence interval: set of values that contains the true population mean with a pre-specified probability, say 95%.
This pre-specified probability is called the confidence level.
A 95% confidence interval for $\mu_Y$ contains the true value of $\mu_Y$ in 95% of repeated samples.
The 90% CI is: $\bar{Y} \pm 1.645\,SE(\bar{Y})$
The 95% CI is: $\bar{Y} \pm 1.96\,SE(\bar{Y})$
The 99% CI is: $\bar{Y} \pm 2.58\,SE(\bar{Y})$
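The multipliers 1.645, 1.96 and 2.58 are standard normal quantiles, so the intervals can be sketched in Python without a table. The function names `critical_value` and `confidence_interval`, and the sample mean and standard error, are hypothetical.

```python
from statistics import NormalDist

def critical_value(level):
    """Two-sided standard normal critical value for a confidence level in (0, 1)."""
    return NormalDist().inv_cdf(0.5 + level / 2)

def confidence_interval(y_bar, se, level=0.95):
    """Interval y_bar +/- critical value * standard error."""
    z = critical_value(level)
    return y_bar - z * se, y_bar + z * se

y_bar, se = 12.6, 0.3  # hypothetical sample mean and standard error

for level in (0.90, 0.95, 0.99):
    lo, hi = confidence_interval(y_bar, se, level)
    print(level, round(critical_value(level), 3), round(lo, 3), round(hi, 3))
```

A hypothesized mean is rejected by the two-sided test at the 5% level exactly when it falls outside the 95% interval; with these made-up numbers, 12 lies just below the lower bound.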
-
Jarque-Bera Test for Normality
Assesses whether a given sample of data is normally distributed
Aggregates information in the data about both skewness and kurtosis
Test of the hypothesis that S = 0 and K = 3
The 5% critical value is 5.99; if JB > 5.99, reject the null of normality.
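The slide does not spell out the statistic; the usual textbook form is JB = (n/6) * (S^2 + (K - 3)^2 / 4), with S the sample skewness and K the sample kurtosis. A minimal Python sketch under that assumption, with a hypothetical function name and hypothetical data:

```python
def jarque_bera(data):
    """Jarque-Bera statistic, assuming JB = n/6 * (S**2 + (K - 3)**2 / 4)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((y - m) ** 2 for y in data) / n
    skew = sum((y - m) ** 3 for y in data) / n / m2 ** 1.5
    kurt = sum((y - m) ** 4 for y in data) / n / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

CRITICAL_5PCT = 5.99  # 5% critical value quoted on the slide

# Hypothetical samples: one symmetric, one with a single extreme value.
symmetric = [1, 2, 3, 4, 5]
skewed = [1] * 20 + [10]

for name, sample in (("symmetric", symmetric), ("skewed", skewed)):
    jb = jarque_bera(sample)
    print(name, round(jb, 2), "reject" if jb > CRITICAL_5PCT else "do not reject")
```

The critical value relies on a large-sample approximation, so results for samples this small are only illustrative.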