review of bas stats for econometric success

  • 7/29/2019 review of bas stats for econometric success

    Econometrics

    Review of Basic Statistics

    Topics

    1. Descriptive Statistics:

    - 1 variable: Mean and Variance

    - 2 variables: Covariance, Correlation

    2. Hypothesis Testing

    Descriptive Statistics

    Inferential Statistics

    Involves:

    - Estimation

    - Hypothesis Testing

    Purpose:

    - Make decisions about population characteristics

    Descriptive Statistics

    Mean

    Measure of central tendency

    Affected by extreme values

    Formula: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$

    Median

    Measure of central tendency

    Middle value in ordered series

    - If even n, mean of the 2 middle values

    Value that splits the distribution into two halves

    Not affected by extreme values

    Raw Data: 17 16 21 18 13 16 12 11

    Ordered: 11 12 13 16 16 17 18 21

    Position: 1 2 3 4 5 6 7 8

    Mode

    Measure of central tendency

    Value that occurs most often

    Not affected by extreme values

    There may be more than one mode

    Raw Data: 17 16 21 18 13 16 12 11

    Ordered: 11 12 13 16 16 17 18 21
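
    The three measures of central tendency can be checked on the eight observations from the slides. The sketch below uses Python's standard `statistics` module; the outlier value 210 is a hypothetical number added only to illustrate which measures react to extreme values.

```python
from statistics import mean, median, multimode

# Observations from the slides.
raw = [17, 16, 21, 18, 13, 16, 12, 11]

summary = {
    "mean": mean(raw),
    "median": median(raw),    # even n: average of the two middle values
    "modes": multimode(raw),  # a list, because there may be several modes
}

# Hypothetical outlier: replace the largest value (21) with 210.
with_outlier = [210 if x == 21 else x for x in raw]
summary_outlier = {
    "mean": mean(with_outlier),
    "median": median(with_outlier),
    "modes": multimode(with_outlier),
}

print(summary)
print(summary_outlier)
```

    The mean moves when the outlier is introduced, while the median and the mode stay where they were.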

    Sample Variance

    Measure of Dispersion around the Mean

    Formula: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$

    Sample Standard Deviation

    Measure of Dispersion around the Mean

    Has the same unit of measurement as the variable itself

    Formula: $s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}$
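
    As a small worked example, the sample variance and standard deviation can be computed for the eight observations used on the Median slide. This sketch assumes the usual n - 1 divisor, which is also what `statistics.variance` and `statistics.stdev` use.

```python
from statistics import stdev, variance

# Observations reused from the Median slide.
raw = [17, 16, 21, 18, 13, 16, 12, 11]

n = len(raw)
x_bar = sum(raw) / n

# Sample variance written out by hand: squared deviations over n - 1.
s2_manual = sum((x - x_bar) ** 2 for x in raw) / (n - 1)

# The same quantities from the standard library.
s2 = variance(raw)
s = stdev(raw)

print(x_bar, s2_manual, s2, s)
```

    The standard deviation is in the same units as the data, while the variance is in squared units.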

    Random Variables

    Random variable: numerical summary of a random outcome

    1. Discrete: only a discrete set of possible values

    => summarized by the probability distribution: list of all possible values of the variable and the probability that each value will occur

    2. Continuous: continuum of possible values

    => summarized by the probability density function (pdf)

    Probability Distribution

    1. List of pairs [ Xi, P(Xi) ]

    Xi = Value of Random Variable (Outcome)

    P(Xi) = Probability Associated with Value

    2. $0 \le P(X_i) \le 1$ - Mutually exclusive (no overlap)

    3. $\sum_i P(X_i) = 1$ - Collectively exhaustive (nothing left out)

    Mean and Variance: Discrete Case

    Mean, or Expected Value

    Weighted average of all possible values

    $E(X) = \mu_X = \sum_i X_i P(X_i)$

    Variance

    Weighted average squared deviation about the mean

    $E[(X - \mu_X)^2] = \sigma_X^2 = \sum_i (X_i - \mu_X)^2 P(X_i)$
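
    A minimal sketch of the discrete-case formulas, using a fair six-sided die as a hypothetical probability distribution (the die is not in the slides). Exact fractions are used so the results are not blurred by floating-point rounding.

```python
from fractions import Fraction

# Hypothetical distribution: a fair die, P(X = x) = 1/6 for x = 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expected_value(pmf):
    """Probability-weighted average of all possible values."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Probability-weighted average squared deviation about the mean."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# The probabilities must be in [0, 1] and sum to 1.
is_valid = all(0 <= p <= 1 for p in pmf.values()) and sum(pmf.values()) == 1

print(is_valid, expected_value(pmf), variance(pmf))
```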

    Covariance

    - Measures joint variability of X and Y

    $\mathrm{cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)]$

    For discrete RVs, $\mathrm{cov}(X,Y) = \sum_{i=1}^{n} (X_i - \mu_X)(Y_i - \mu_Y) P(X_i, Y_i)$

    Can take any value in the real numbers

    Depends on units of measurement (e.g., dollars, cents)

    cov(X,Y) > 0 means that X and Y tend to move together: when Y is above its mean, X tends to be above its mean

    cov(X,Y) < 0 means that X and Y tend to move in opposite directions

    Correlation

    A more convenient measure of the relationship between X and Y is the correlation, since it is normalized to lie inside the [-1, 1] interval.

    $\mathrm{corr}(X,Y) = \frac{\mathrm{cov}(X,Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$

    $-1 \le \mathrm{corr}(X,Y) \le 1$

    If corr(X,Y) = 0, then X and Y are uncorrelated.

    If corr(X,Y) > 0, then X and Y are positively correlated.

    If corr(X,Y) < 0, then X and Y are negatively correlated.
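
    The unit dependence of covariance, and the unit-free nature of correlation, can be seen on a tiny data set. The numbers below are hypothetical, and each pair is given equal probability 1/n so that the discrete formula for covariance applies directly.

```python
import math

def mean(values):
    return sum(values) / len(values)

def cov(xs, ys):
    """Covariance with each pair (x_i, y_i) weighted by probability 1/n."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def corr(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

# Hypothetical data: X measured in dollars, then the same X in cents.
x_dollars = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 5.0]
x_cents = [100 * x for x in x_dollars]

print(cov(x_dollars, y), cov(x_cents, y))    # covariance scales with units
print(corr(x_dollars, y), corr(x_cents, y))  # correlation does not
```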

    Note

    Covariance and correlation measure only linear dependence!

    Example: Cov(X,Y) = 0 does not necessarily imply that X and Y are independent.

    They may be non-linearly related.

    But if X and Y are jointly normally distributed and Cov(X,Y) = 0, then they are independent.
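
    A standard counterexample makes the point concrete. In the sketch below, X is assumed to take the values -1, 0, 1 with equal probability and Y = X^2 (a hypothetical setup, not from the slides). Y is fully determined by X, yet the covariance is exactly zero.

```python
from fractions import Fraction

third = Fraction(1, 3)

# Joint distribution of (X, Y) with Y = X**2 and X uniform on {-1, 0, 1}.
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}

mu_x = sum(x * p for (x, _), p in joint.items())
mu_y = sum(y * p for (_, y), p in joint.items())
covariance = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())

# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0).
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
p_joint = joint[(0, 0)]

print(covariance, p_joint, p_x0 * p_y0)
```

    The covariance is 0, but the joint probability (1/3) differs from the product of the marginals (1/9), so X and Y are not independent.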

    The correlation coefficient measures linear association

    The Mean and Variance of Sums of Random Variables

    $E(X + Y) = E(X) + E(Y)$

    $\mathrm{var}(X + Y) = \mathrm{var}(X) + \mathrm{var}(Y) + 2\,\mathrm{cov}(X,Y)$

    $\mathrm{var}(X - Y) = \mathrm{var}(X) + \mathrm{var}(Y) - 2\,\mathrm{cov}(X,Y)$
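
    These rules for sums and differences can be verified exactly on a small example. The data are hypothetical, each outcome pair is given probability 1/n, and fractions are used so that both sides of each identity can be compared with `==`.

```python
from fractions import Fraction

def mean(values):
    return Fraction(sum(values), len(values))

def cov(xs, ys):
    """Covariance with each pair (x_i, y_i) weighted by probability 1/n."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def var(xs):
    return cov(xs, xs)

# Hypothetical paired outcomes.
x = [1, 2, 3, 4]
y = [2, 1, 4, 5]
total = [a + b for a, b in zip(x, y)]
diff = [a - b for a, b in zip(x, y)]

print(mean(total), mean(x) + mean(y))
print(var(total), var(x) + var(y) + 2 * cov(x, y))
print(var(diff), var(x) + var(y) - 2 * cov(x, y))
```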

    Continuous Probability Distributions: Normal Distribution

    $X \sim N(\mu, \sigma^2)$

    The notation reads "X is normally distributed with mean $\mu$ and variance $\sigma^2$".

    The PDF for a normal RV is

    $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$

    The normal distribution has a familiar bell shape.

    The normal density is symmetric around its mean, and 95% of the probability lies in the region $(\mu - 1.96\sigma,\ \mu + 1.96\sigma)$.
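
    The density formula and the 95% statement can be checked numerically. This is a sketch using `statistics.NormalDist` from the Python standard library; the parameter values `mu = 2.0` and `sigma = 4.0` are illustrative assumptions.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density written out from the formula."""
    coef = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Illustrative parameters (assumed, not taken from the slides).
mu, sigma = 2.0, 4.0
dist = NormalDist(mu, sigma)

# Probability of falling within 1.96 standard deviations of the mean.
prob_95 = dist.cdf(mu + 1.96 * sigma) - dist.cdf(mu - 1.96 * sigma)

# Symmetry: equal density at equal distances on either side of the mean.
left, right = normal_pdf(mu - 3.0, mu, sigma), normal_pdf(mu + 3.0, mu, sigma)

print(prob_95, left, right)
```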

    Effects of Varying Parameters

    Infinite Number of Normal Distribution Tables

    Normal distributions differ by mean and standard deviation

    Each distribution would require its own table

    That's an infinite number of tables!

    Standard Normal Distribution

    If X is a normal RV with mean $\mu$ and variance $\sigma^2$, then

    $Z = \frac{X - \mu}{\sigma}$

    has a normal distribution with mean 0 and variance 1, i.e., the standard normal distribution.
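
    Standardizing is what lets a single table serve every normal distribution: $P(X \le c) = \Phi\left(\frac{c - \mu}{\sigma}\right)$. The sketch below assumes hypothetical values `mu = 10`, `sigma = 2` and threshold `c = 13`, and compares the standardized calculation with a direct one.

```python
from statistics import NormalDist

# Hypothetical normal random variable and threshold.
mu, sigma, c = 10.0, 2.0, 13.0

# Step 1: standardize the threshold.
z = (c - mu) / sigma

# Step 2: look up the standard normal CDF (the "table" value).
p_standardized = NormalDist().cdf(z)

# Cross-check: the same probability computed directly on N(mu, sigma^2).
p_direct = NormalDist(mu, sigma).cdf(c)

print(z, p_standardized, p_direct)
```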

    Example

    Values of the standard normal CDF, $\Phi$, are tabulated in Appendix Table 1

    To compute probabilities for a normal RV, it must be standardized by subtracting its mean and dividing by its standard deviation

    Example: Suppose Y ~ N(2,16), and we need

    P(Y

    Moments: Skewness, Kurtosis

    skewness $= \frac{E[(Y - \mu_Y)^3]}{\sigma_Y^3}$ = measures asymmetry in a distribution

    The larger the skewness (by absolute value), the more asymmetric the distribution

    skewness = 0: distribution is symmetric

    skewness > (<) 0: distribution has a long right (left) tail

    kurtosis $= \frac{E[(Y - \mu_Y)^4]}{\sigma_Y^4}$ = measures the mass in the tails of a distribution

    kurtosis = 3: normal distribution

    kurtosis > 3: heavy tails (leptokurtic), i.e. extreme events are more likely to occur
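
    A minimal sketch of sample skewness and kurtosis, assuming the usual moment-based definitions (third and fourth central moments divided by the matching power of the standard deviation, with divisor n). Both small data sets are hypothetical.

```python
def central_moment(data, k):
    """k-th central moment with divisor n."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)

def skewness(data):
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    return central_moment(data, 4) / central_moment(data, 2) ** 2

# Hypothetical samples: one symmetric, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 1, 2, 10]

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_tail), kurtosis(right_tail))
```

    The symmetric sample has zero skewness, while the sample with one large value has positive skewness and higher kurtosis.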

    Central Limit Theorem

    Important Continuous Distributions

    All derived from the normal distribution

    $\chi^2$ distribution: arises from the sum of squared normal random variables

    t distribution: arises from ratios of normal and $\chi^2$ variables

    F distribution: arises from ratios of $\chi^2$ variables

    Hypothesis Testing

    Identifying Hypotheses

    1. Formulate the question, e.g. test that the population mean is equal to 3

    2. State the question statistically (H0: $\mu = 3$)

    3. State its alternative statistically (H1: $\mu \ne 3$)

    4. Choose the level of significance $\alpha$

    Typical values are 0.01, 0.05, 0.10

    Rejection region of the sampling distribution: the unlikely values of the sample statistic if the null hypothesis is true

    Identifying Hypotheses: Examples

    1. Is the population average amount of TV viewing 12 hours?

    H0: $\mu = 12$

    H1: $\mu \ne 12$

    Choosing the Level of Significance: Type I and Type II Errors

    Type I Error: Reject a true null hypothesis.

    Type II Error: Do not reject a false null hypothesis.

    We would like the probabilities of both errors to be small, BUT we cannot make both very small at the same time.

    In statistics, we fix the probability of a Type I error at a significance level (e.g. 5%) and minimize the probability of a Type II error.
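
    A small simulation can make the two error types tangible. The sketch below assumes a two-sided Z-test with known standard deviation; the sample size, number of replications, seed and the alternative mean 0.5 are all hypothetical choices for illustration.

```python
import random
from statistics import NormalDist, fmean

def rejection_rate(mu_true, mu0=0.0, sigma=1.0, n=30,
                   alpha=0.05, reps=2000, seed=0):
    """Share of simulated samples in which H0: mu = mu0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / n ** 0.5
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu_true, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

# H0 is true: every rejection is a Type I error.
type1_rate = rejection_rate(mu_true=0.0)

# H0 is false (true mean 0.5): every non-rejection is a Type II error.
type2_rate = 1 - rejection_rate(mu_true=0.5)

print(type1_rate, type2_rate)
```

    The Type I error rate stays near the chosen 5% level by construction, while the Type II error rate depends on how far the true mean is from the hypothesized one and on the sample size.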

    Hypothesis Testing: Basic Idea

    Method 1: Compare Test Statistic to Critical Value from the Table

    1. Convert the sample statistic (e.g., $\bar{Y}$) to a standardized Z variable

    2. Compare to the critical value from the table

    If the Z-test statistic falls in the rejection region, reject H0;

    otherwise do not reject H0
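
    The two steps can be sketched as a short function. The critical value comes from `NormalDist().inv_cdf` instead of a printed table, and the sample mean 13.1, hypothesized mean 12 and standard error 0.5 are hypothetical inputs.

```python
from statistics import NormalDist

def z_test_two_sided(y_bar, mu0, se, alpha=0.05):
    """Return the Z statistic, the critical value and the decision."""
    z = (y_bar - mu0) / se                        # step 1: standardize
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # step 2: critical value
    reject = abs(z) > z_crit                      # rejection region: |Z| > z_crit
    return z, z_crit, reject

# Hypothetical inputs.
z, z_crit, reject = z_test_two_sided(y_bar=13.1, mu0=12.0, se=0.5)

print(z, z_crit, reject)
```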

    Two-Sided Test: Rejection Regions

    One-Sided Test: Rejection Region

    Method 2: Compute the P-value

    Probability of obtaining a test statistic more extreme ($\le$ or $\ge$) than the actual sample value, given H0 is true.

    The lowest significance level at which we reject H0

    Compute the p-value for the test

    Use this p-value to make the rejection decision:

    If p-value $\ge \alpha$, do not reject H0

    If p-value $< \alpha$, reject H0

    P-values for 2-sided and 1-sided tests

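    Two-sided test: H0: $\mu = \mu_0$

    H1: $\mu \ne \mu_0$

    p-value $= P\left(|Z| > \left|\frac{\bar{Y} - \mu_0}{\sigma_{\bar{Y}}}\right|\right) = 2\Phi\left(-\left|\frac{\bar{Y} - \mu_0}{\sigma_{\bar{Y}}}\right|\right)$

    One-sided test:

    a. H0: $\mu \ge \mu_0$, H1: $\mu < \mu_0$

    p-value $= \Phi\left(\frac{\bar{Y} - \mu_0}{\sigma_{\bar{Y}}}\right)$

    b. H0: $\mu \le \mu_0$, H1: $\mu > \mu_0$

    p-value $= 1 - \Phi\left(\frac{\bar{Y} - \mu_0}{\sigma_{\bar{Y}}}\right)$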
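
    A sketch of the three p-value calculations for a Z statistic, using the standard normal CDF from `statistics.NormalDist`. The inputs (sample mean 13.1, hypothesized mean 12, standard error 0.5) are hypothetical.

```python
from statistics import NormalDist

def p_values(y_bar, mu0, se):
    """p-values for the two-sided test and both one-sided tests."""
    phi = NormalDist().cdf
    z = (y_bar - mu0) / se
    return {
        "two_sided": 2 * phi(-abs(z)),  # H1: mu != mu0
        "left": phi(z),                 # H1: mu < mu0
        "right": 1 - phi(z),            # H1: mu > mu0
    }

# Hypothetical inputs.
p = p_values(y_bar=13.1, mu0=12.0, se=0.5)
alpha = 0.05
decisions = {name: value < alpha for name, value in p.items()}

print(p)
print(decisions)
```

    With a positive Z statistic, the two-sided p-value is twice the right-tail p-value, and the left-tail test gives no evidence against its null.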

    P-value for a 2-Sided Test

    Method 3: Confidence Intervals

    Confidence interval: set of values that contains the true population mean with a pre-specified probability, say 95%.

    This pre-specified probability is called the confidence level.

    A 95% confidence interval for $\mu_Y$ contains the true value of $\mu_Y$ in 95% of repeated samples.

    The 90% CI is: $\bar{Y} \pm 1.645\,SE(\bar{Y})$

    The 95% CI is: $\bar{Y} \pm 1.96\,SE(\bar{Y})$

    The 99% CI is: $\bar{Y} \pm 2.58\,SE(\bar{Y})$
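
    The multipliers 1.645, 1.96 and 2.58 are standard normal quantiles, which the sketch below recovers with `NormalDist().inv_cdf`. The sample mean 13.1, standard error 0.5 and hypothesized value 12 are hypothetical; the last lines show how an interval doubles as a two-sided test.

```python
from statistics import NormalDist

def critical_value(level):
    """Standard normal multiplier for a two-sided interval."""
    return NormalDist().inv_cdf(0.5 + level / 2)

def confidence_interval(y_bar, se, level=0.95):
    z = critical_value(level)
    return y_bar - z * se, y_bar + z * se

# Hypothetical sample mean and standard error.
y_bar, se, mu0 = 13.1, 0.5, 12.0

ci95 = confidence_interval(y_bar, se, 0.95)
ci99 = confidence_interval(y_bar, se, 0.99)

# Reject H0: mu = mu0 at level 1 - confidence when mu0 lies outside the CI.
reject_5pct = not (ci95[0] <= mu0 <= ci95[1])
reject_1pct = not (ci99[0] <= mu0 <= ci99[1])

print(critical_value(0.90), critical_value(0.95), critical_value(0.99))
print(ci95, reject_5pct)
print(ci99, reject_1pct)
```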

    Jarque-Bera Test for Normality

    Assesses whether a given sample of data is normally distributed

    Aggregates information in the data about both skewness and kurtosis

    Test of the hypothesis that S = 0 and K = 3

    The 5% critical value is 5.99; if JB > 5.99, reject the null of normality.
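
    The slide does not give the statistic itself. A minimal sketch, assuming the commonly used form $JB = \frac{n}{6}\left(S^2 + \frac{(K - 3)^2}{4}\right)$, which is compared with a chi-squared distribution with 2 degrees of freedom (hence the 5.99 critical value). Both data sets are hypothetical, and the chi-squared approximation is only reliable in large samples.

```python
def jarque_bera(data):
    """Return (JB, S, K) using moment-based skewness S and kurtosis K."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    s = m3 / m2 ** 1.5
    k = m4 / m2 ** 2
    jb = n / 6 * (s ** 2 + (k - 3) ** 2 / 4)
    return jb, s, k

CRITICAL_5PCT = 5.99  # 5% critical value quoted on the slide

# Hypothetical samples: a small symmetric one and one with a large outlier.
jb_sym, s_sym, k_sym = jarque_bera([1, 2, 3, 4, 5])
jb_out, s_out, k_out = jarque_bera([0] * 19 + [20])

print(jb_sym, jb_sym > CRITICAL_5PCT)
print(jb_out, jb_out > CRITICAL_5PCT)
```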