2. Statistical Inference - Single Population


  • 7/27/2019 2. Statistical Inference - Single Population

    2. Statistical Inference: Single Population Mean and Proportion (Review)

    ECON 251

    Research Methods

  • Descriptive Statistics vs. Inferential Statistics

    Descriptive statistics: calculating summary characteristics of data.

    Inferential statistics: using sample summary measures to estimate population characteristics.

    In descriptive statistics we summarize the data from a population or a sample of it.

    In inferential statistics, data on the population is NOT available. We take a sample and use its summarizing measures to estimate the unknown population characteristics.

    [Diagram: descriptive statistics summarizes the data of a population or of a sample; inferential statistics takes a sample, finds its summarizing measures, and makes an inference about the unknown population characteristics.]

  • Statistical Inference Review

    There are two procedures for making inferences: Hypothesis Testing (HT) and Estimation.

    In estimation, we attempt to estimate the value of the parameter in either of two ways:

    Point Estimator: A point estimator draws inference about a population by estimating the value of an unknown parameter using a single value, or point.

    Interval Estimator: An interval estimator draws inference about a population by estimating the value of an unknown parameter using an interval.

    We use intervals so we can be precise about our degree of certainty regarding the sample statistic's proximity to the population parameter.

    HT involves testing a specific belief about the value of the parameter.

    HT concepts are the foundation for estimation as well, so we begin there.

  • 4 Steps For Hypothesis Testing

    Step 1: Set up alternative & null hypotheses

    Step 2: Calculate the test statistic

    Step 3: Find critical values (rejection region method) or find the p-value (p-value method)

    Step 4: Make a decision

  • Step One: Set up alternative & null hypotheses

    The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief about a population parameter.

    There are two hypotheses (about a population parameter or parameters):

    H0 - the null hypothesis [for example, H0: μ = 5]

    H1 - the alternative hypothesis [for example, H1: μ > 5]

  • Step One: Set up alternative & null hypotheses

    The alternative hypothesis is the most important: it is what you are trying to prove. Always start by stating the alternative first.

    The alternative can involve >, < or ≠.

    The alternative establishes whether the test is one-tailed or two-tailed.

    The alternative establishes the location of the rejection region(s).

    Once you have correctly defined the alternative, the null is easy to establish.

    We always assume the null is true; therefore H0 MUST contain =, and may contain ≤ or ≥.

  • Step Two: Calculating test statistics

    Population Mean w/ Sigma known: $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

    Population Mean w/ Sigma unknown: $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$

    Population Proportion: $z = \dfrac{\hat{p} - p}{\sqrt{p(1 - p) / n}}$
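
The three formulas above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides; the function names and the sample inputs are made up for the example.

```python
import math

def z_stat_mean(x_bar, mu0, sigma, n):
    """z statistic for a mean when the population sigma is known."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))

def t_stat_mean(x_bar, mu0, s, n):
    """t statistic for a mean when sigma is unknown (sample s, n - 1 df)."""
    return (x_bar - mu0) / (s / math.sqrt(n))

def z_stat_proportion(p_hat, p0, n):
    """z statistic for a proportion; the standard error uses the hypothesized p0."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Illustrative inputs only (hypothetical, not from the slides).
print(z_stat_mean(x_bar=52.0, mu0=50.0, sigma=8.0, n=64))   # (52 - 50) / (8 / 8)
print(t_stat_mean(x_bar=52.0, mu0=50.0, s=8.0, n=64))       # same arithmetic, but read on the t scale
print(z_stat_proportion(p_hat=0.55, p0=0.50, n=100))        # 0.05 / 0.05, up to rounding
```

The z and t statistics for the mean share the same arithmetic; what differs is which standard deviation is plugged in and which standardized distribution the result is compared against.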

  • Step Two: Calculating test statistics

    The standardization formulas provide the test statistic.

    They convert our sample statistic from the sampling distribution to the standardized distribution (t or z in this case).

    There are millions of sampling distributions. Rather than knowing everything about every one of those distributions, we standardize our statistic, thereby moving it from the sampling distribution and placing it on the standardized distribution.

    We know everything there is to know about the standardized distribution. Because the test statistic is on the standardized distribution, we can compare the test statistic to a critical value, or compare the area associated with the test statistic (the p-value) to alpha.

  • Step Three: Find critical value or p-value

    You need to decide which method you are going to use to make your decision.

    If you are doing the calculations by hand, you will frequently use the rejection region (critical value) method.

    The critical value will either be given to you (exams, in-class examples) or you would find it in Excel (NORMSINV, TINV).

    The p-value method will frequently be used when you are using software to do your calculations, as most programs provide these values. You can also find them in Excel (NORMSDIST, TDIST).

    In the latter case, be sure you can identify the p-value graphically as well.

  • Decision Rule: rejection region (critical value) method

    Reject H0 if the test statistic is more extreme than the critical value.

    Given the significance level (probability of Type I error) = α:

    Two-sided alternative: rejection regions in both tails; critical values $-z_{\alpha/2}$ and $z_{\alpha/2}$.

    One-sided (upper tail) alternative: rejection region in the upper tail; critical value $z_{\alpha}$.

    One-sided (lower tail) alternative: rejection region in the lower tail; critical value $-z_{\alpha}$.

    In the case of the t distribution we will have $t_{\alpha/2}$ and $t_{\alpha}$, respectively.
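
As a rough sketch of the rejection region rule for the z case, the snippet below uses `statistics.NormalDist` from the Python standard library to look up critical values. The function names `z_critical` and `reject_h0` and the tail labels are assumptions made for this illustration; the t case would need a t table or a statistics package instead.

```python
from statistics import NormalDist

def z_critical(alpha, tail):
    """Critical value(s) on the standard normal for tail = 'two', 'upper' or 'lower'."""
    std = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        c = std.inv_cdf(1 - alpha / 2)
        return (-c, c)
    if tail == "upper":
        return std.inv_cdf(1 - alpha)
    if tail == "lower":
        return std.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'upper' or 'lower'")

def reject_h0(test_stat, alpha, tail):
    """True when the test statistic falls in the rejection region."""
    crit = z_critical(alpha, tail)
    if tail == "two":
        return test_stat < crit[0] or test_stat > crit[1]
    if tail == "upper":
        return test_stat > crit
    return test_stat < crit

# A test statistic of 1.7 at alpha = .05: rejected for an upper-tail test,
# but not for a two-tailed test, because the two-tailed cutoff is larger.
print(reject_h0(1.7, 0.05, "upper"), reject_h0(1.7, 0.05, "two"))
```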

  • Decision Rule: p-value method

    The p-value measures the amount of evidence in favor of the alternative hypothesis: it is the probability, assuming H0 is true, of observing a test statistic at least as extreme as the one calculated. The smaller the p-value, the more evidence in favor of the alternative (and the more likely you will reject H0). The p-value is most commonly compared to an α of 5% for the Reject/DNR decision:

    Reject H0 if the p-value is smaller than the significance level.

    Two-sided alternative: p-value = the area to the left of -|tm| plus the area to the right of |tm| (each tail holds p-value/2).

    One-sided (upper tail) alternative: p-value = the area to the right of tm.

    One-sided (lower tail) alternative: p-value = the area to the left of tm.

    (tm = test statistic; the same holds true for |Zm| and Zm.)
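
The three p-value areas can be computed for a z statistic with the standard library, as sketched below. The function name `p_value_z` is hypothetical, and the snippet covers only the standard normal; for a t statistic the same areas would be taken under the t distribution.

```python
from statistics import NormalDist

def p_value_z(test_stat, tail):
    """Area under the standard normal that matches the alternative hypothesis."""
    std = NormalDist()
    if tail == "upper":          # H1 uses >
        return 1 - std.cdf(test_stat)
    if tail == "lower":          # H1 uses <
        return std.cdf(test_stat)
    if tail == "two":            # H1 uses "not equal": both tails
        return 2 * (1 - std.cdf(abs(test_stat)))
    raise ValueError("tail must be 'two', 'upper' or 'lower'")

alpha = 0.05
p = p_value_z(2.10, "two")
print(round(p, 4), "Reject H0" if p < alpha else "DNR H0")
```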

  • Three steps to finding the p-value from a graph

    1. Find the test statistic.

    2. Draw an arrow from the test statistic to the extreme end of the nearest rejection region.

    3. If a two-tailed test, do this on the opposite side of the distribution as well.

    The area of the graph which has an arrow through it is the p-value.

    Try showing the p-value graphically in these 4 examples. In each case, assume that the critical value is 3.2:

    Example 1: H0: μ = 5; H1: μ > 5; Test stat = 7

    Example 2: H0: μ = 5; H1: μ > 5; Test stat = 3

    Example 3: H0: μ = 5; H1: μ < 5; Test stat = 3

    Example 4: H0: μ = 5; H1: μ ≠ 5; Test stat = 7

    Using the p-value is the most common method of making your decision, as most computer software provides this value. However, you must graph your distribution before making a final determination.

  • Step Four: Make your decision

    Make one of the following two conclusions based on the test:

    Reject the null hypothesis in favor of the alternative hypothesis.

    There ___ enough evidence to infer that the alternative is true.

    Do not reject the null hypothesis in favor of the alternative hypothesis.

    There _______ enough evidence to infer that the alternative is true.

  • Errors

    Two types of errors are possible when making a decision:

    Type I error - reject H0 when H0 is true.

    Type II error - do not reject H0 when H0 is false.

    States of Nature:

                  H0 is true                               H0 is false
    DNR H0        1 - α                                    Type II error, β
    Reject H0     Type I error, α (significance level)     1 - β, power of test

  • Analogy: Hypothesis testing is similar to a jury trial

    Assume innocent until proven guilty: assume H0 is true until proven otherwise.

    The court either finds the defendant guilty (Reject H0) or not guilty (DNR H0).

    Courts do not prove a person innocent (Accept H0); rather, a not-guilty verdict means there was just not enough evidence to prove guilt. Similarly, if we DNR H0, we are not saying H0 is true, only that there is not enough evidence for us to believe otherwise.

    What level of proof is required to establish a guilty verdict? What if you convict an innocent person?

    This is identical to establishing the significance level of a test. A Type I error (α) is equivalent to convicting an innocent person. We focus on α, rather than worry about a Type II error (β), releasing a guilty person.

    "Beyond a reasonable doubt" is the court of law norm.

  • Errors

    It would be desirable to reduce both types of errors at the same time. But for a given sample size this is NOT possible.

    There is a trade-off between α and β. As we try to decrease α, β will increase, and vice versa.

    Because the consequences of a Type I error are in most circumstances considered to be of greater concern than a Type II error (sending an innocent person to jail is worse than letting a guilty person go), we focus on controlling the size of the Type I error.

  • Errors

    The standard in statistics varies depending upon the issue at stake:

    ______________ evidence = 1% significance level

    __________ evidence = 1.001-5% significance level

    __________ evidence = 5.001-10% significance level

    __________ evidence = 10.001% or higher significance level

    Unless stated specifically to the contrary, assume we are using α = .05 in all problems.

  • Examples: Hypothesis Testing

    #1 A Nielsen survey estimated in the year 2000 that the mean number of hours of television viewing per household was 7.25 hours per day. The survey involved 250 households. The sample data had a standard deviation of 2.5 hours per day. In 1990, it was determined that the population mean of viewing per household was 6.70 hours per day. Has TV viewing increased since 1990?

    (t249,0.005=2.596, t249,0.01=2.34, t249,0.025=1.9695, t249,0.05=1.651);

    (z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645)

  • Hypothesis Testing 4 Step Solution

    Identify the alternative and null hypotheses.

    H0: μ ___

    H1: μ ___

    Calculate the test statistic: $Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

    Find the critical value or p-value.

    Z0.05 = ___

    Make the decision.

    _______ H0 in favor of the alternative. There is ___________ proof that TV viewing has increased since 1990.
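
One way to check the arithmetic for example #1 is sketched below. It assumes an upper-tail alternative (H1: μ > 6.70), the default α = .05, and the sample standard deviation of 2.5 standing in for σ in the Z formula, which is a common large-sample shortcut with n = 250. These choices are assumptions for illustration, not necessarily the solution worked in class.

```python
import math

# Figures from example #1.
x_bar = 7.25   # sample mean, hours per day (2000 survey)
mu0 = 6.70     # hypothesized mean, hours per day (1990)
s = 2.5        # sample standard deviation
n = 250        # households surveyed

# Standardize the sample mean.
test_stat = (x_bar - mu0) / (s / math.sqrt(n))

# Critical values quoted on the example slide for alpha = .05, upper tail.
z_crit = 1.645   # z0.05
t_crit = 1.651   # t249,0.05

print(f"test statistic = {test_stat:.4f}")
print("Reject H0 (z)" if test_stat > z_crit else "DNR H0 (z)")
print("Reject H0 (t)" if test_stat > t_crit else "DNR H0 (t)")
```

With a sample this large the z and t critical values are nearly identical, so the decision does not depend on which one is used.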

  • #2 The owners of Subway claim that their stores average $875,000 in annual sales. You used this information in deciding to open a store in Delaware. Your store, however, has not come even close to these annual sales figures. You want to prove that you were misled, and that the average figure for all stores is actually less than $875,000. You collect annual sales figures from 70 randomly selected stores. The average in your sample turns out to be $856,000, with a standard deviation of $24,000. You also know from a friend who is in management of a similar franchise that you can count on the standard deviation of sales being $28,000. Can you prove your claim?

    (t69,0.005=2.649, t69,0.01=2.382, t69,0.025=1.995, t69,0.05=1.667);

    (t70,0.005=2.648, t70,0.01=2.381, t70,0.025=1.994, t70,0.05=1.667);

    (z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645)

  • #3 Your company is considering opening a retail store in Fairbanks, Alaska, but will only do so if average daily spending per capita is higher there than in the rest of the country. According to recent data, the average US household spends $90 per day. A sample was taken in Fairbanks. From a sample of 49, the average daily expenditure was $84.50, and the standard deviation was $14.50. Should you open a store in Fairbanks? You have a lot riding on this decision; you need to be sure of your conclusion.

    (t48,0.005=2.68, t48,0.01=2.41, t48,0.025=2.01, t48,0.05=1.68);

    (z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645)

  • #4 Microsoft Outlook is believed to be the most widely used e-mail manager. A Microsoft executive claims that Microsoft Outlook is used by more than 75% of Internet users. A Merrill Lynch study involving 300 respondents reported that 72% use Microsoft Outlook. Is there enough evidence here to disprove the executive's claim?

    (t299,0.005=2.592, t299,0.01=2.339, t299,0.025=1.968, t299,0.05=1.65);

    (z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645)

  • #5 A fast-food restaurant plans a special offer that will enable customers to purchase specially designed drink glasses featuring well-known cartoon characters. If more than 15% of the customers will purchase the glasses, the special offer will be implemented. A preliminary test has been set up at several locations, and 88 of 500 customers purchased the glasses. Should the special offer be introduced?

    (t87,0.005=2.634, t87,0.01=2.37, t87,0.025=1.988, t87,0.05=1.663);

    (z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645)

  • #6 For a new newspaper to be financially viable, it has to capture more than 12% of the Toronto market. In a survey conducted among 400 randomly selected prospective readers, 58 participants indicated they would subscribe to the newspaper. Can the publisher conclude that the proposed newspaper will be financially viable at the 10% significance level?

    (t57,0.005=2.665, t57,0.01=2.39, t57,0.025=2.00, t57,0.05=1.67);

    (z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645, z0.1=1.282)
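
As a hedged sketch of how example #6 could be checked, the snippet below applies the proportion test statistic with H1: p > 0.12 and the upper-tail critical value z0.1 = 1.282 quoted with the example. The variable names are illustrative, and the setup is one reasonable reading of the problem rather than the official solution.

```python
import math

# Figures from example #6.
n = 400            # prospective readers surveyed
successes = 58     # would subscribe
p0 = 0.12          # market share needed for viability (H0: p = 0.12, H1: p > 0.12)

p_hat = successes / n
# The standard error uses the hypothesized proportion p0.
test_stat = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

z_crit = 1.282     # z0.1 from the example, upper-tail test at alpha = .10

print(f"p_hat = {p_hat:.3f}, z = {test_stat:.4f}")
print("Reject H0" if test_stat > z_crit else "DNR H0")
```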

  • Confidence Interval Estimation 4 Steps

    Confidence interval estimation relies on the same concepts and relationships as does hypothesis testing. A simple four-step approach to these problems can also be helpful.

    1. We begin by calculating the point estimate from our sample data.

    2. To establish the appropriate interval width, find the upper and lower limits on the standardized distribution associated with your confidence level.

    3. Use the confidence interval formulas to place them on the sampling distribution.

    4. Place the sample statistic at the center of the interval and the confidence interval is complete.

  • Confidence Interval Formulas

    Population Mean w/ Sigma known: $\bar{x} \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$

    Population Mean w/ Sigma unknown: $\bar{x} \pm t_{\alpha/2} \dfrac{s}{\sqrt{n}}$

    Population Proportion: $\hat{p} \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
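
The interval formulas can be written as two small helper functions, sketched below. The names `ci_mean` and `ci_proportion` and the sample inputs are hypothetical; the critical value is passed in by the caller, so the same mean function serves both the z case (with σ) and the t case (with s).

```python
import math

def ci_mean(x_bar, sd, n, crit):
    """Interval for a mean: pass sigma with a z value, or s with a t value."""
    margin = crit * sd / math.sqrt(n)
    return (x_bar - margin, x_bar + margin)

def ci_proportion(p_hat, n, z_crit):
    """Interval for a proportion; the standard error uses the sample p_hat."""
    margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - margin, p_hat + margin)

# Illustrative inputs only (hypothetical, not from the slides).
print(ci_mean(x_bar=100.0, sd=15.0, n=36, crit=1.96))   # margin = 1.96 * 15 / 6
print(ci_proportion(p_hat=0.5, n=100, z_crit=1.96))     # margin = 1.96 * 0.05
```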

  • 1 - α is the confidence level associated with the interval.

    The sample statistic is used as the center of the interval.

    W: width of the interval; 2 x W: total length of the interval.

    UCL (Upper Confidence Limit) and LCL (Lower Confidence Limit) are found using the critical value associated with α/2.

    Confidence interval width for the mean is a function of:

    use of the t distribution or z distribution

    level of confidence chosen (positively related)

    σ of the sampling distribution (positively related)

    sample size (negatively related)

    The population parameter can lie outside of the interval; in fact, we know it will α of the time (for example, 5% of the time when α = .05).

    If interested in establishing a confidence interval of a specific width and level of confidence, calculate ahead of time the least number that is required to be in your sample to achieve your objective.

  • Example: Confidence Interval Estimation

    #7 As a new Subway franchisee, you are estimating your expected annual sales. You have annual sales figures from 70 randomly selected stores. The average in your sample turns out to be $856,000, with a standard deviation of $24,000. The population standard deviation is $28,000. You want a 90% and a 95% confidence interval around your estimate.

    (t69,0.005=2.649, t69,0.01=2.382, t69,0.025=1.995, t69,0.05=1.667);

    (t70,0.005=2.648, t70,0.01=2.381, t70,0.025=1.994, t70,0.05=1.667);

    (z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645)

  • Confidence Interval Estimation 4 Steps

    1. Calculate the point estimate: $\bar{x}$ = ___

    2. Find the upper and lower limits on the standardized distribution associated with your confidence level.

    For 1 - α = 90%; Z0.05 = ___

    3. Use the confidence interval formulas to place the upper and lower limits on the sampling distribution: $\bar{x} \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$

    4. Place the point estimate at the center of the interval.
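
The four steps can be traced numerically for example #7. The sketch below assumes the stated population standard deviation of $28,000 is used (so the z formula applies) and takes z0.05 = 1.645 and z0.025 = 1.96 from the critical values listed with the example; the variable names are illustrative.

```python
import math

# Step 1: point estimate from example #7.
x_bar = 856_000
sigma = 28_000   # population standard deviation given in the example
n = 70

# Step 2: limits on the standardized distribution (values from the slide).
z_by_level = {0.90: 1.645, 0.95: 1.96}

# Steps 3 and 4: scale by the standard error and center on the point estimate.
std_error = sigma / math.sqrt(n)
intervals = {}
for level, z in z_by_level.items():
    margin = z * std_error
    intervals[level] = (x_bar - margin, x_bar + margin)
    lcl, ucl = intervals[level]
    print(f"{level:.0%} CI: [{lcl:,.2f}, {ucl:,.2f}]")
```

The 95% interval comes out wider than the 90% interval, which matches the positive relationship between confidence level and width noted earlier.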

  • Using CI to decide hypothesis tests

    If you have calculated a confidence interval, and then decide you also want to test a hypothesis with this information, you can do so directly provided:

    The hypothesis being tested is two-tailed.

    The α from the hypothesis test and the 1 - α from the confidence interval total 1.0.

    If these two conditions hold, then determine whether the hypothesized value in the null hypothesis for the parameter falls in the interval created. If it does, DNR H0. If it does not, Reject H0.

  • Example: Using CI to decide a Hypothesis Test

    #8 The owners of Subway claim that their stores average $875,000 in annual sales. You used this information in deciding to open a store in Delaware. Your store, however, has not come even close to these annual sales figures. You want to prove that you were misled, and that the average figure for all stores is actually NOT $875,000. You collect annual sales figures from 70 randomly selected stores. The average in your sample turns out to be $856,000, with a standard deviation of $24,000. You also know from a friend who is in management of a similar franchise that you can count on the standard deviation of sales being $28,000. Can you prove your claim?

    (t69,0.005=2.649, t69,0.01=2.382, t69,0.025=1.995, t69,0.05=1.667);

    (t70,0.005=2.648, t70,0.01=2.381, t70,0.025=1.994, t70,0.05=1.667);

    (z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645)
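
A possible check of example #8 using the confidence interval shortcut is sketched below. It assumes a two-tailed test at the default α = .05, paired with a 95% interval built from the $28,000 standard deviation and z0.025 = 1.96. The helper names are hypothetical.

```python
import math

def ci_mean_known_sigma(x_bar, sigma, n, z_crit):
    """Interval for a mean when sigma is treated as known."""
    margin = z_crit * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

def decide_with_ci(null_value, lcl, ucl):
    """DNR H0 if the hypothesized value lies inside the interval, else reject."""
    return "DNR H0" if lcl <= null_value <= ucl else "Reject H0"

# 95% interval pairs with a two-tailed test at alpha = .05 (they total 1.0).
lcl, ucl = ci_mean_known_sigma(x_bar=856_000, sigma=28_000, n=70, z_crit=1.96)
decision = decide_with_ci(875_000, lcl, ucl)
print(f"95% CI: [{lcl:,.2f}, {ucl:,.2f}] -> {decision}")
```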

  • Estimating n for Confidence Intervals

    Sample sizes required to construct intervals of a certain degree of confidence and width can be determined by using one of the following formulas:

    Sample Size for Means: $n = \dfrac{z_{\alpha/2}^{2} \, \sigma^{2}}{W^{2}}$

    Sample Size for Proportions, a priori idea of $\hat{p}$: $n = \dfrac{z_{\alpha/2}^{2} \, \hat{p}(1 - \hat{p})}{W^{2}}$

    Sample Size for Proportions, no a priori idea of $\hat{p}$: $n = \dfrac{z_{\alpha/2}^{2} \, (0.5)^{2}}{W^{2}}$
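
The sample size formulas translate directly into code, as in the sketch below. The function names are made up for illustration, and rounding up with `math.ceil` is an added convention (a fractional observation cannot be sampled), not something stated on the slide.

```python
import math

def n_for_mean(z_crit, sigma, w):
    """Smallest n so that z * sigma / sqrt(n) is at most w."""
    return math.ceil((z_crit * sigma / w) ** 2)

def n_for_proportion(z_crit, w, p_hat=0.5):
    """Smallest n for a proportion; p_hat = 0.5 is the no-prior-idea case."""
    return math.ceil(z_crit ** 2 * p_hat * (1 - p_hat) / w ** 2)

# Illustrative inputs only (hypothetical, not from the slides).
print(n_for_mean(z_crit=1.96, sigma=10.0, w=2.0))       # 96.04 rounds up to 97
print(n_for_proportion(z_crit=1.96, w=0.05))            # 384.16 rounds up to 385
print(n_for_proportion(z_crit=1.96, w=0.05, p_hat=0.2)) # smaller than the 0.5 case
```

Using 0.5 gives the largest possible value of p(1 - p), so it yields a conservative (largest) sample size when nothing is known about the proportion.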

  • Estimating n for Confidence Intervals

    When involving a mean:

    Use the sample standard deviation from a previous study as σ.

    Use a pilot study to obtain a standard deviation (σ).

    Use judgment, or best guess.

    When involving a proportion:

    Use your best estimate if confident of a reasonable value for $\hat{p}$ (you have an a priori value for the sample proportion).

    Use 0.5 as $\hat{p}$ (you have no a priori value for the sample proportion).

  • Example: Estimating n for Confidence Intervals

    #9 The interval you have created for your Subway is a good start, but you would be more comfortable with a tighter range for your estimate of sales. You decide that the maximum you can tolerate is +/- 2,500. What sample size would you need to collect to obtain a 90% confidence interval for annual sales with a width of 2,500?

    (t69,0.005=2.649, t69,0.01=2.382, t69,0.025=1.995, t69,0.05=1.667);

    (t70,0.005=2.648, t70,0.01=2.381, t70,0.025=1.994, t70,0.05=1.667);

    (z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645)

    $n = \dfrac{z_{\alpha/2}^{2} \, \sigma^{2}}{W^{2}}$
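
For example #9, the formula can be evaluated as sketched below. The calculation assumes the +/- 2,500 tolerance is W, that σ = $28,000 carries over from example #7, and that the result is rounded up to the next whole store; all three are assumptions for illustration.

```python
import math

z_crit = 1.645    # z0.05, for a 90% confidence interval
sigma = 28_000    # population standard deviation from example #7 (assumed to carry over)
w = 2_500         # tolerated +/- width

n_exact = (z_crit * sigma / w) ** 2
n_required = math.ceil(n_exact)   # round up: a partial store cannot be sampled

print(f"n = {n_exact:.2f}, so sample at least {n_required} stores")
```

The required sample is far larger than the 70 stores behind the original interval, because the tolerated width has shrunk to less than half of the earlier margin.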

  • Examples: Confidence Intervals

    #10 The Environmental Protection Agency (EPA) has agreed to give tax rebates to manufacturers of vehicles that get a combined city and highway gas mileage of at least 32 mpg. A 49-car sample of a new Ford vehicle reveals a mean of 32.6 mpg. It is believed that the highway gas mileage for Ford vehicles has a standard deviation of 0.78 mpg.

    (t48,0.005=2.68, t48,0.01=2.41, t48,0.025=2.01, t48,0.05=1.68);

    (z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645)

    Construct a 95% confidence interval. Then a 99% confidence interval.

  • #11 Redo example #10, but this time, the standard deviation of the mpg for the 49 cars is 0.83, and there is no credible information regarding the population standard deviation of mpg for these vehicles.

    Construct a 95% confidence interval. Then a 99% confidence interval.

  • #12 Suppose we have made an interval estimation for the mean of the population such as: [126.56, 192.41]. If we realize that the true population mean is 195.7, what should we conclude?

    The procedure for interval estimation must have been done incorrectly.

    We should first standardize the LCL and UCL and then see if they capture the mean.

    The procedure can still be valid, since we allow for a certain amount of error.

    We must use a t distribution instead of a z distribution.

    We could never get this result.

  • #13 A major news source conducted a poll asking 814 adults to respond to a series of questions about their feelings toward the state of affairs within the United States. A total of 562 adults responded yes to the question: Do you feel things are going well in the United States these days?

    A) What is the point estimate of the proportion of the adult population that feels things are going well in the United States?

    B) What is the 90% confidence interval for the proportion of the adult population that feels things are going well in the United States?

    C) If one wanted to be 95% certain, and have an interval no wider than 3%, what sample size would be required?
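
A rough numerical pass over example #13 is sketched below. Parts A and B follow the proportion formulas directly. Part C depends on how "no wider than 3%" is read: the sketch computes n both for W = 0.03 (3% on each side of the estimate) and for W = 0.015 (3% total length), and it assumes the poll's own proportion is used as the a priori value. These readings are assumptions, not the official answer.

```python
import math

# A) Point estimate.
n_poll = 814
yes = 562
p_hat = yes / n_poll

# B) 90% confidence interval, z0.05 = 1.645.
margin_90 = 1.645 * math.sqrt(p_hat * (1 - p_hat) / n_poll)
ci_90 = (p_hat - margin_90, p_hat + margin_90)

# C) Sample size at 95% confidence, z0.025 = 1.96, poll p_hat as the prior value.
def n_for_proportion(z_crit, w, p):
    return math.ceil(z_crit ** 2 * p * (1 - p) / w ** 2)

n_if_w_is_3pct = n_for_proportion(1.96, 0.03, p_hat)     # +/- 3% reading
n_if_total_is_3pct = n_for_proportion(1.96, 0.015, p_hat)  # 3% total length reading

print(f"A) p_hat = {p_hat:.4f}")
print(f"B) 90% CI = [{ci_90[0]:.4f}, {ci_90[1]:.4f}]")
print(f"C) n = {n_if_w_is_3pct} (W = 0.03) or {n_if_total_is_3pct} (W = 0.015)")
```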