
Introduction to Inferential Statistics

INTRODUCTION

Earlier chapters introduced the idea that sample information, in the form of sample statistics and graphs, could be used to examine the reasonableness of a possible value for a population parameter. This chapter begins by presenting the logic of hypothesis testing. Then, we formally (well, sort of formally) introduce three systematic approaches to inferential statistics: the p-value approach, the critical value approach, and the confidence interval approach. In this chapter, these approaches are used to estimate and test hypotheses about the population mean when the population standard deviation is known. I've also included some directions (with screenshots) for using your statistical calculator. Subsequent chapters will build on the logic presented here.

A brief conceptual recap

Sampling distributions are distributions of a sample statistic and have well-defined characteristics. For the sampling distribution of the sample mean, those characteristics are:

1. The mean of the sampling distribution is equal to the mean of the population ($\mu_{\bar{x}} = \mu$).

2. The standard error of the sampling distribution is equal to the population standard deviation divided by the square root of the sample size ($\sigma_{\bar{x}} = \sigma/\sqrt{n}$). Hence, for any sample size greater than one, the sampling distribution is less spread out than the population distribution.

3. As the sample size increases, the sampling distribution takes on the shape of a bell curve (normal distribution), even if the population distribution is decidedly not normal.
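The three properties above can be checked with a quick simulation. This is a minimal Python sketch, not part of the original text: it assumes a hypothetical, deliberately skewed population (exponential, with a mean of about 10), and `simulate_sample_means` is a hypothetical helper name. It draws many random samples of size n = 25 and compares the center and spread of the sample means with $\mu$ and $\sigma/\sqrt{n}$.

```python
import random
import statistics

def simulate_sample_means(population, n, reps, seed=1):
    """Draw `reps` random samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

# Hypothetical, strongly right-skewed population (exponential with mean about 10)
pop_rng = random.Random(0)
population = [pop_rng.expovariate(1 / 10) for _ in range(50_000)]

mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 25
means = simulate_sample_means(population, n, reps=5_000)

# Property 1: the mean of the sample means is close to mu
print(f"mu = {mu:.2f}, mean of sample means = {statistics.fmean(means):.2f}")
# Property 2: the spread of the sample means is close to sigma / sqrt(n)
print(f"sigma / sqrt(n) = {sigma / n ** 0.5:.2f}, sd of sample means = {statistics.pstdev(means):.2f}")
```

A histogram of `means` would illustrate property 3: a roughly bell-shaped distribution, even though the population itself is skewed.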

Introduction to hypothesis testing

Now, suppose that we do not know the mean of the original population. (Later we will extend this to the case where we do not know the population standard deviation either.) However, we have an educated guess, or hypothesis, about the value of the population mean. Using the hypothesized value of the population mean, construct a sampling distribution for the sample mean based on samples of a particular size. This is the hypothesized sampling distribution (so-called because it is based on a hypothesized population mean rather than a known population mean). Now, select a random sample and compute the sample mean. If the obtained sample mean is one standard error away from the hypothesized mean, then the hypothesized sampling distribution is plausible. By contrast, if the obtained sample mean is three standard errors away from the hypothesized population mean, then the hypothesized sampling distribution is not plausible. In fact, we might (and will) reject the hypothesized sampling distribution and conclude that the hypothesized population mean is unlikely to be true. Details follow, but that is the main idea:

1. Propose and construct a sampling distribution.
2. Collect data using a random sample.
3. Compare the statistic calculated from the sample to the hypothesized parameter and its sampling distribution.
4. Decide whether or not the hypothesized parameter and sampling distribution are plausible.

Roach Approach Page 176

The backwards-sounding logic of hypothesis testing

Let's say that I am an instructor who has developed an approach to teaching statistics that I believe will produce better learning of course content. In the past, scores on a standardized (multiple-choice) final exam have averaged about 72%. In quantitative terms, I would like to demonstrate that the population mean (now that I have implemented the new approach) is greater than 72%. Moreover, if I am "confident" that the population mean is greater than 72%, I will publicize the new approach.

Formally stated, here is my "research" hypothesis: $H_R: \mu > 72$

Now, I am going to formulate a "null" hypothesis consistent with the idea that my new approach has not led to increased scores. Here it is, formally stated: $H_O: \mu = 72$ or $H_O: \mu \le 72$

Now, construct a hypothesized sampling distribution based on the null hypothesis. Collect data using a random sample. Compute the sample mean and the probability of obtaining a sample mean at least that large if the sampling distribution based on the null hypothesis is true. If the probability is high, conclude that the sampling distribution and population mean specified in the null hypothesis are plausible [and, if that were the case, it would be unwise for me to make a lot of noise about how great the new approach is]. If the probability is low, conclude that the sampling distribution and population mean specified in the null hypothesis are implausible and, therefore, I can be confident that the population mean is greater than 72. Let me restate that in a slightly different way: I am confident that my research hypothesis is true because the null hypothesis is implausible in view of the sample results. The logic of hypothesis testing is summarized in Table One.



Table One - Statistical Reasoning: The Method of Indirect "Proof"

$H_O$ or $H_R$       Either Ho or Hr is true, but both cannot be true.
not $H_O$            Ho is shown to be unlikely and is therefore rejected.
Therefore, $H_R$     Hence, conclude and be confident that Hr is true.

The p-value Approach to Hypothesis Testing

The p-value approach centers on calculating the probability of observing a statistic at least as extreme as the one obtained when (or given that) the null hypothesis is true. If this probability (the p-value) is low, then the null hypothesis is rejected. For now, we will consider probabilities less than 0.05 to be "low." Recall the logic presented above:

If the average (population mean) is 75, then it is unlikely that a sample mean of 80 (or higher) will be observed.

The sample mean is 80.

So: It is unlikely that the population mean is 75.

Consider a normally distributed population of exam scores with a population mean of 75 and a population standard deviation of 10. Now suppose that things have changed (maybe the instructor has started to post videos to accompany classroom lectures) and we'd like to know if the population mean is now greater than 75. We collect some data. Specifically, we select a random sample of 25 exam scores. The sample mean is 80. Using ideas presented in the previous chapter (I told you the central limit theorem would come up again), the probability of obtaining a sample mean of 80 or higher is 0.0062 [You should verify that calculation.]. That's less than 1%. If I were from England (or if I were a soccer coach who had been hanging out with some coaches from England for a couple of days), I'd call that "bloody unlikely."
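The "[You should verify that calculation.]" step can be sketched in Python. This is a minimal sketch that uses the standard library's `statistics.NormalDist` in place of a z table; the variable names are illustrative and not from the text.

```python
from statistics import NormalDist

mu_0 = 75      # hypothesized population mean
sigma = 10     # population standard deviation (assumed known)
n = 25         # sample size
x_bar = 80     # observed sample mean

std_error = sigma / n ** 0.5          # 10 / 5 = 2
z = (x_bar - mu_0) / std_error        # (80 - 75) / 2 = 2.5
p_value = 1 - NormalDist().cdf(z)     # area above z under the standard normal curve

print(f"standard error = {std_error}, z = {z}, p-value = {p_value:.4f}")
# prints: standard error = 2.0, z = 2.5, p-value = 0.0062
```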



If the average (population mean) is 75, then it is unlikely that a sample mean of 80 (or higher) will be observed. It would be bloody unlikely. The p-value is 0.0062. By the way, I used to coach soccer (close to 20 years, everything from U6 girls to U19 boys) and had a lot of fun teasing some of my fellow coaches who frequently used phrases like "bloody unlikely" and my personal favorite (to make fun of), "Spot on." If you've played any soccer, you know what I'm talking about.

The sample mean is 80.

So: It is unlikely that the population mean is 75. Well, to keep up the language (which even I am getting bored with), it is bloody unlikely that the population mean is 75.

The following table summarizes the steps of the p-value approach to hypothesis testing and rules of thumb relevant to each step.

P-value Approach

State Hypotheses
Put what you want to conclude in the research hypothesis. For the null hypothesis, write the complement of the research hypothesis. The null hypothesis must have an "=" sign.

Decision Rule
Reject Ho if p-value < alpha.

Compute
Use a z table (or statistical software/calculator) to find the probability in the tail(s) of the distribution.

Decide
Reject or fail to reject (FTR) Ho.

Conclude
Formulate a statement about the population parameter.

Other Considerations
1. Draw a picture of the sampling distribution that underlies the hypothesis testing procedure.
2. Is the sample size sufficiently large?
3. Can you think of any possible flaws in the way the data were collected?
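The Compute, Decision Rule, and Decide steps in the table can be sketched in Python. This is a minimal sketch that assumes the population standard deviation is known (as throughout this chapter); the function names `z_test_p_value` and `decide` are hypothetical, chosen for illustration, and `statistics.NormalDist` from Python's standard library stands in for the z table. The usage line applies it to the computer-age numbers used in Example 1a.

```python
from statistics import NormalDist

def z_test_p_value(x_bar, mu_0, sigma, n, alternative):
    """p-value for a one-sample z test of the population mean (sigma known).

    alternative: "less", "greater", or "two-sided", matching the sign in H_R.
    """
    std_error = sigma / n ** 0.5
    z = (x_bar - mu_0) / std_error
    lower_tail = NormalDist().cdf(z)          # P(Z <= z)
    if alternative == "less":
        p = lower_tail
    elif alternative == "greater":
        p = 1 - lower_tail
    elif alternative == "two-sided":
        p = 2 * min(lower_tail, 1 - lower_tail)
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p

def decide(p, alpha=0.05):
    """Decision rule: reject Ho if p-value < alpha; otherwise fail to reject (FTR)."""
    return "Reject Ho" if p < alpha else "FTR Ho"

# H_R: mu < 30, sample mean 27, sigma 10, n 100
z, p = z_test_p_value(x_bar=27, mu_0=30, sigma=10, n=100, alternative="less")
print(round(z, 2), round(p, 4), decide(p))   # -3.0 0.0013 Reject Ho
```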

Example 1a – The p-value approach

Suppose we want to know if it is reasonable to believe that the average age of computers used in The Company is less than 30 months. We might take a random sample of 100 computers and record the age (number of months since the computer was received) of each.

The research hypothesis is: $H_R: \mu < 30$



In addition to specifying a research hypothesis, we will now specify a competing hypothesis. The competing hypothesis is referred to as the null hypothesis and is designated as $H_O$. While the research hypothesis specifies a difference, the null hypothesis specifies equality. In this case, the null hypothesis is $H_O: \mu = 30$, sometimes written as $H_O: \mu \ge 30$ (to be discussed in greater detail later in the chapter). The logic of hypothesis testing starts with the assumption (for the purpose of conducting the test) that the null hypothesis is true. Along with the sample size and an estimate of the population standard deviation, the hypothesized value of the population parameter is used to construct a sampling distribution for the corresponding statistic. For now, assume that the population standard deviation is 10. The mean and standard error for the sampling distribution are

$$\mu_{\bar{x}} = \mu_O = 30 \qquad \text{and} \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{100}} = 1.$$

With a sample size of 100 (100 > 40), the sampling distribution is approximately normal (even if the population is decidedly non-normal).

Now, suppose we do take the sample of 100 and the sample mean is 27. Given the parameter specified in the null hypothesis ($\mu_O = 30$), how often will we obtain this result?

The following formula should look familiar. The only difference between this formula and the formula used for sampling distribution calculations is that it uses the hypothesized population mean, $\mu_O$, instead of $\mu_{\bar{x}}$.

$$z = \frac{\bar{x} - \mu_O}{\sigma_{\bar{x}}} = \frac{27 - 30}{1} = -3, \qquad p\text{-value} = P(z < -3) = 0.0013$$

So, if the population mean is 30, then the chance of obtaining a sample mean of 27 (or lower) is very low (0.0013). This probability is the p-value: the probability that a statistic this extreme (or more extreme) is observed, given that the null hypothesis is true. Now, given that we actually did get a sample mean of 27, the proposition that the population mean is 30 is not plausible. In view of that implausibility, the null hypothesis is rejected and we conclude that our research hypothesis ($\mu < 30$) is true. (We can't be certain, but we are confident. Our logic is based on inductive, not deductive, reasoning.)

Note: Much (nearly all) of the remainder of this text follows the logic just presented.

Example 2: A market researcher asks a random sample of 64 consumers to record the number of antacid tablets they take over a three-month period. Based on past experience, the researcher assumes that the population standard deviation is 12. Based on reports submitted by the sample of 64 consumers, the sample mean number of antacid tablets the consumers took during the three-month period is 11.5. Is this enough evidence to conclude that the population mean is over ten?

Solution for example 2:

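$H_R: \mu > 10$ [Put what you are trying to conclude or support in the research hypothesis.]

$H_O: \mu = 10$

Calculate the standard error, the z test statistic, and the corresponding p-value.

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{64}} = 1.5$$

$$z = \frac{\bar{x} - \mu_O}{\sigma_{\bar{x}}} = \frac{11.5 - 10}{1.5} = 1.00, \qquad \text{Area}(0 \text{ to } z) = .3413$$

$$p\text{-value} = .5000 - .3413 = .1587$$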

Construct the sampling distribution and locate the area associated with the research hypothesis.



Based on the above output and calculations, a more complete sketch of the sampling distribution follows.

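[Figure: The sampling distribution for $\bar{x}$ when n = 64, given $\mu_O = 10$ and $\sigma = 12$: centered at $\mu_O = 10$, with $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 12/\sqrt{64} = 1.5$; horizontal axis in z units from -4 to 4.]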

Interpretation of z: The sample mean, 11.5, is 1 standard error above the hypothesized population mean.

Interpretation of p (p-value): If the population mean were truly 10, then there would be a .1587 (or 15.87%) chance of obtaining a sample mean "this" far above 10 (n = 64).

Overall Interpretation: The null hypothesis is plausible; i.e., the population mean may be 10. Because the p-value (.1587) is not < 0.05, we fail to reject Ho and cannot conclude that the population mean is over ten.
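The Example 2 arithmetic can be reproduced the same way the z-table calculation does it (area from the mean to z, then .5000 minus that area). A minimal Python sketch; `statistics.NormalDist` stands in for the z table and the variable names are illustrative.

```python
from statistics import NormalDist

# Example 2: H_R: mu > 10 (upper-tailed test)
mu_0, sigma, n, x_bar = 10, 12, 64, 11.5

std_error = sigma / n ** 0.5             # 12 / 8 = 1.5
z = (x_bar - mu_0) / std_error           # 1.5 / 1.5 = 1.0
area_0_to_z = NormalDist().cdf(z) - 0.5  # z-table area between the mean and z
p_value = 0.5 - area_0_to_z              # upper-tail area beyond z

print(f"z = {z:.2f}, area = {area_0_to_z:.4f}, p-value = {p_value:.4f}")
# prints: z = 1.00, area = 0.3413, p-value = 0.1587
```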



Example 3

The historical average time customers spend in line at the Local Taco Palace drive-thru is 5 minutes. A new manager was hired three months ago. In a random sample of 64 drive-thru customers last week, the average time in line was 5.3 minutes. Is this enough evidence to conclude that the population mean waiting time differs from the historical average? Assume a population standard deviation of 1.6 minutes. Use a 0.05 level of significance.

State Hypotheses

$H_O: \mu = 5$
$H_R: \mu \ne 5$

Decision Rule

Reject Ho if p-value < 0.05.

Compute

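$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.6}{\sqrt{64}} = 0.2$$

$$z = \frac{\bar{x} - \mu_O}{\sigma_{\bar{x}}} = \frac{5.3 - 5}{0.2} = 1.5, \qquad \text{Area}(0 \text{ to } z) = .4332$$

$$\text{one tail} = .5000 - .4332 = .0668, \qquad p\text{-value} = 2 \times .0668 = .1336 \quad \text{*****}$$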

Decide

FTR Ho; the p-value (.1336) is not < 0.05.

Conclude

The population mean time customers spend in line at the Local Taco Palace drive-thru could be equal to 5 minutes.

***** When the hypothesis test is two-sided, the p-value is equal to the probability of obtaining a sample mean "this far away" from the hypothesized mean in either direction. After all, you did not know or specify a direction: before you collected the data, you did not know whether to expect a shortening or lengthening of the wait time.



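The sampling distribution:

[Figure: The hypothesized sampling distribution of $\bar{x}$, centered at $\mu_O = 5$ with $\sigma_{\bar{x}} = 0.2$; total shaded area = 0.05, with $\alpha/2 = .025$ in each tail; horizontal axis in z units from -4 to 4.]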
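The two-sided p-value for Example 3 (double the one-tail area) can be sketched in Python as follows. This is a minimal illustration; `statistics.NormalDist` replaces the z table, and the names are illustrative rather than taken from the text.

```python
from statistics import NormalDist

# Example 3: H_R: mu != 5 (two-sided test), sigma = 1.6, n = 64
mu_0, sigma, n, x_bar, alpha = 5, 1.6, 64, 5.3, 0.05

std_error = sigma / n ** 0.5                 # 1.6 / 8 = 0.2
z = (x_bar - mu_0) / std_error               # about 1.5
one_tail = 1 - NormalDist().cdf(abs(z))      # area beyond |z| in one tail
p_value = 2 * one_tail                       # both tails count for a two-sided test

print(f"z = {z:.2f}, one tail = {one_tail:.4f}, p-value = {p_value:.4f}")
print("Reject Ho" if p_value < alpha else "FTR Ho")
# prints: z = 1.50, one tail = 0.0668, p-value = 0.1336
#         FTR Ho
```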



The Critical Value Approach to Hypothesis Testing

Inferential decisions made using the critical value approach involve deciding whether or not the distance between an observed sample statistic and a specified (hypothesized) value of a population parameter is large enough to rule out sampling error as an explanation of the difference. That distance is measured in terms of the number of standard errors between the observed sample statistic and the hypothesized value of the population parameter (i.e., the z-value or z test). The following table summarizes the steps of the critical value approach to hypothesis testing and rules of thumb relevant to each step.

State Hypotheses
Put what you want to conclude in the research hypothesis. For the null hypothesis, write the complement of the research hypothesis. The null hypothesis must have an "=" sign.

Decision Rule
The critical value is a value that specifies how "far" a sample statistic must be from the hypothesized parameter before the null hypothesis is rejected. The critical value is based on a pre-specified level of significance ($\alpha$). Common critical values for z tests are presented in the table below.
One-tail upper: Reject Ho if the test statistic is > critical value.
One-tail lower: Reject Ho if the test statistic is < -critical value.
Two-tail: Reject Ho if the test statistic is < -critical value or > critical value.

Compute
$$\text{test statistic} = \frac{\text{sample statistic} - \text{hypothesized parameter}}{\text{std. error}}$$

Decide
Reject or FTR Ho.

Conclude
Formulate a statement about the population parameter.

Common Critical Values for the Z test

Critical Value   Level of Significance   Number of Tails
1.645            0.05                    1
1.96             0.05                    2
2.326            0.01                    1
2.576            0.01                    2
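The table values can be regenerated from the standard normal distribution. A minimal Python sketch, assuming Python 3.8+ for `statistics.NormalDist`; the helper name `z_critical_value` is hypothetical. For a two-tailed test, alpha is split equally between the two tails before looking up the z value.

```python
from statistics import NormalDist

def z_critical_value(alpha, tails):
    """Positive z critical value for a given level of significance and number of tails."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # For two tails, alpha is split equally between the upper and lower tails.
    upper_tail_area = alpha / tails
    return NormalDist().inv_cdf(1 - upper_tail_area)

for alpha, tails in [(0.05, 1), (0.05, 2), (0.01, 1), (0.01, 2)]:
    print(f"alpha = {alpha}, tails = {tails}: {z_critical_value(alpha, tails):.3f}")
# prints 1.645, 1.960, 2.326, 2.576 (one value per line)
```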

For two-tailed tests, the null hypothesis is rejected if the computed z test is either greater than the critical value or less than the negative critical value. For example: Reject Ho if z test > 1.96 or < -1.96.



One-tailed tests, by contrast, are directional. If the research hypothesis has a ">" sign, the null hypothesis is rejected when the z test is greater than the critical value. For example: Reject Ho if z test > 1.645. If the research hypothesis has a "<" sign, the null hypothesis is rejected when the z test is less than the negative of the critical value. For example: Reject Ho if z test < -1.645.

Example 3

Suppose we want to know if it is reasonable to believe that the average age of computers used in The Company is less than 30 months. In a random sample of 100 computers, the sample mean is 27. Assume that the population standard deviation is 10. Use a 0.05 level of significance.

1. State the research and null hypotheses:

$H_O: \mu = 30$
$H_R: \mu < 30$

2. State a decision rule.

Reject Ho if Z-test is < -1.645. (Note: Use a z table or the table of common critical values to find the 1.645. In subsequent weeks, we will also use the bottom row of the t table.)

3. Compute the test statistic.

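$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{100}} = 1, \qquad z = \frac{\bar{x} - \mu_O}{\sigma_{\bar{x}}} = \frac{27 - 30}{1} = -3$$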

4. State a statistical decision based on the decision rule and the computed test statistic.

Reject the null hypothesis (-3 is < -1.645).

5. State a conclusion:

We can be confident that the average age of computers used in The Company is less than 30 months.



6. Draw pictures of the sampling distribution to illustrate critical and computed values of the test statistic. Construct the sampling distribution. Show the critical and computed z-values on separate graphs.

[Figure: Critical Value for the Test Statistic; Computed Value for the Test Statistic. Each panel shows the sampling distribution of $\bar{x}$ centered at $\mu_O = 30$ with $\sigma_{\bar{x}} = 1$, horizontal axis in z units from -4 to 4, with the lower-tail rejection region shaded.]

1. Let $\alpha = 0.05$ (shaded area).
2. Area from mean to CV = 0.45.
3. Critical Value for Decision Rule: z = -1.645
4. Computed value of z = -3.
Hence, Reject Ho.



7. Is the sample size large enough?

A sample size of 100 is almost always large enough. Unless you have a reason to expect an observation where a computer is 25 years old, you are in good shape. Draw a box plot to check. Later in the course, other considerations regarding sample size will arise.

8. Can you think of any possible flaws in the way the data were collected?

One thing that pops into my head has to do with measurement. If you are calling people up and asking how old their computers are, I expect that they will probably exaggerate (Poor me, my computer is 3 years old.). Note: For this course, I am more interested in encouraging you to ask questions than I am in the specific question(s) you ask. Inquiring minds …
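Steps 2 through 4 of the computer-age example can be sketched in Python. This is a minimal sketch rather than the calculator's CV program: it assumes `statistics.NormalDist.inv_cdf` as the source of the lower-tail critical value, and the variable names are illustrative.

```python
from statistics import NormalDist

# Computer-age example: H_R: mu < 30 (lower-tailed test), alpha = 0.05
mu_0, sigma, n, x_bar, alpha = 30, 10, 100, 27, 0.05

critical_value = NormalDist().inv_cdf(alpha)   # about -1.645 (all of alpha in the lower tail)
z = (x_bar - mu_0) / (sigma / n ** 0.5)        # (27 - 30) / 1 = -3.0

decision = "Reject Ho" if z < critical_value else "FTR Ho"
print(f"critical value = {critical_value:.3f}, z = {z:.1f}: {decision}")
# prints: critical value = -1.645, z = -3.0: Reject Ho
```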

Example 4

Sports Performance, Incorporated, is marketing a training method that the company claims can improve the average driving distance of recreational golfers to over 300 yards. (Most recreational golfers do well to achieve a driving distance of 200 yards.) An independent research firm randomly selects 25 recreational golfers and invites them to attend the training sessions for free. (All 25 who were invited attend and complete the sessions.) After completing the training sessions, each of the 25 golfers goes to the driving range. After a few warm-ups, each golfer drives one ball. The sample mean driving distance for the 25 golfers is 314 yards. Assuming a population standard deviation of 50 yards, is this enough evidence to conclude that golfers who complete the training program have an average driving distance that exceeds 300 yards? Use a 0.05 level of significance.

1. State the research and null hypotheses:

$H_O: \mu = 300$ or $H_O: \mu \le 300$
$H_R: \mu > 300$

2. State a decision rule.

Reject Ho if Z-test is > 1.645. Use the CV program. The example in the appendix matches this problem if you need to look there.



3. Compute the test statistic.

4.110

300314

1025

50

x

O

x

xZ

n

4. State a statistical decision based on the decision rule and the computed test statistic.

Fail to Reject (FTR) the null hypothesis (1.4 is not > 1.645).

5. State a conclusion:

We cannot be confident that golfers who complete the training program have an average driving distance that exceeds 300 yards. [That means the company should not advertise that golfers who complete the training program have an average driving distance that exceeds 300 yards.]

6. Draw pictures of the sampling distribution to illustrate critical and computed values of the test statistic. Construct the sampling distribution. Show the critical and computed z-values on separate graphs.

[Figure: Critical Value for the Test Statistic; Computed Value for the Test Statistic. Each panel shows the sampling distribution of $\bar{x}$ centered at $\mu_O = 300$ with $\sigma_{\bar{x}} = 10$, horizontal axis in z units from -4 to 4, with the upper-tail rejection region shaded.]

1. Let $\alpha = 0.05$ (shaded area).
2. Area from mean to CV = 0.45.
3. Critical Value for Decision Rule: z = 1.645
4. Computed value of z = 1.40.
Hence, FTR Ho.

7. Is the sample size large enough?

A sample size of 25 is enough if there are no outliers. If you have one golfer who drives the ball 110 yards after the training (or one that drives the ball 490 yards), then a larger sample size is needed. Draw a box plot to check.

8. Can you think of any possible flaws in the way the data were collected?

How far could these golfers hit the ball before they attended the training program? What other factors may have influenced driving distance (wind, type of golf ball, type of golf club, etc.)? In subsequent chapters, we will examine ways to address these issues through research design.
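The golf example can be checked both ways: against the critical value and via the p-value. This is a minimal Python sketch with illustrative names; `statistics.NormalDist` supplies both the critical value and the tail area. The two approaches necessarily agree, because z exceeds the critical value exactly when the upper-tail p-value falls below alpha.

```python
from statistics import NormalDist

# Golf example: H_R: mu > 300 (upper-tailed test), alpha = 0.05
mu_0, sigma, n, x_bar, alpha = 300, 50, 25, 314, 0.05

critical_value = NormalDist().inv_cdf(1 - alpha)   # about 1.645
z = (x_bar - mu_0) / (sigma / n ** 0.5)            # (314 - 300) / 10 = 1.4
p_value = 1 - NormalDist().cdf(z)                  # upper-tail area, about 0.0808

decision = "Reject Ho" if z > critical_value else "FTR Ho"
print(f"critical value = {critical_value:.3f}, z = {z:.2f}, p-value = {p_value:.4f}: {decision}")
# prints: critical value = 1.645, z = 1.40, p-value = 0.0808: FTR Ho
```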



Example 5 - The historical average time customers spend in line at the Local Taco Palace drive-thru is 5 minutes. A new manager was hired three months ago. In a random sample of 64 drive-thru customers last week, the average time in line was 5.3 minutes. Is this enough evidence to conclude that the population mean waiting time differs from the historical average? Assume a population standard deviation of 1.6 minutes. Use a 0.05 level of significance.

State Hypotheses

$H_O: \mu = 5$
$H_R: \mu \ne 5$

Decision Rule

Reject Ho if Z-test > 1.96 or < -1.96.***** Use the CV program on your calculator.

Sampling Distribution

$\mu_{\bar{x}} = \mu_O = 5$, $\sigma_{\bar{x}} = 0.2$

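Compute

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.6}{\sqrt{64}} = 0.2, \qquad z = \frac{\bar{x} - \mu_O}{\sigma_{\bar{x}}} = \frac{5.3 - 5}{0.2} = 1.5$$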

Decide

FTR Ho: the z test is neither > 1.96 nor < -1.96.

Equivalently, FTR Ho: the p-value of 0.1336 is not < 0.05. If the population mean is 5, then $P(\bar{x} \le 4.7 \text{ or } \bar{x} \ge 5.3) = .1336$.

Conclude

The population mean time customers spend in line at the Local Taco Palace drive-thru could be equal to 5 minutes.

***** Hypotheses should be set before the data are collected. Remember that before you collected the data, you did not know whether to expect a shortening or lengthening of the wait time. Therefore, when the hypothesis test is two-sided, alpha ($\alpha$) is split into two portions. One-half of alpha is assigned to the upper tail and the other half is assigned to the lower tail. Hence, a 0.05 "total" level of significance (0.025 in each tail) translates to critical z-values of -1.96 and 1.96.



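A rendition of the hypothesized sampling distribution:

[Figure: The hypothesized sampling distribution of $\bar{x}$, centered at $\mu_O = 5$ with $\sigma_{\bar{x}} = 0.2$; total shaded area = 0.05 ($\alpha/2 = .025$ in each tail); critical values $Z_{CRITICAL} = -1.96$ and $1.96$; computed values $\bar{x} = 5.3$ and $Z_{COMPUTED} = 1.5$; horizontal axis in z units from -4 to 4.]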
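The two-tailed decision rule for Example 5 can be sketched in Python, with alpha split across both tails when finding the critical value. A minimal illustration; `statistics.NormalDist` stands in for the CV program, and the names are illustrative.

```python
from statistics import NormalDist

# Taco Palace example: H_R: mu != 5 (two-tailed test), alpha = 0.05
mu_0, sigma, n, x_bar, alpha = 5, 1.6, 64, 5.3, 0.05

z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # alpha split across two tails: about 1.96
z = (x_bar - mu_0) / (sigma / n ** 0.5)        # about 1.5

reject = z > z_crit or z < -z_crit
print(f"critical values = +/-{z_crit:.2f}, z = {z:.2f}: {'Reject Ho' if reject else 'FTR Ho'}")
# prints: critical values = +/-1.96, z = 1.50: FTR Ho
```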



Confidence Intervals

The idea of a confidence interval is one that you encountered before you enrolled in this course. For example, in election years, pollsters predict likely winners. A pollster might report that 53% of a random sample of likely voters plan to vote for a particular candidate, with a margin of error of 4%. In an example like this, the pollster (or newscaster reporting the results) will note that the election is a statistical tie because 50% falls within the margin of error (53% - 4% = 49%, which is below 50%). Moreover, a good pollster will also note a level of confidence for the prediction (usually 95%).

Developing the formulas

To estimate a population parameter, begin by calculating the corresponding sample statistic.

For example, if you want to estimate the population mean $\mu$, start by calculating the sample mean $\bar{x}$. Though the sample mean is the best single estimate of the population mean, the mean for a particular sample is probably not exactly equal to the population mean. In fact, if you select another sample, you will probably obtain a different sample mean. This is where the central limit theorem kicks in. We may not know the value of the population mean, but we do know that the sampling distribution is bell-shaped (if the sample size is large enough). We also know that (a) the mean of the sampling distribution is equal to the mean of the population, $\mu_{\bar{x}} = \mu$, and (b) the standard error of the sampling distribution is mathematically related to the standard deviation of the population, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. Based on this knowledge, here is a graph of the sampling distribution.

[Figure: The sampling distribution of $\bar{x}$, centered at $\mu$.]

Now, what we would like to have is a range of estimates that is very likely to contain the true population mean. Instead of just saying that $\bar{x}$ is our estimate of $\mu$, we use the following formula to create an interval estimate for $\mu$.

$$\bar{x} \pm \text{Margin of Error}$$

The next step is to compute the margin of error.

Margin of Error = (Confidence Coefficient) (Standard Error), where, for now, the confidence coefficient is a z-value.

For the population mean, this translates to

$$\bar{x} \pm Z\frac{\sigma}{\sqrt{n}}.$$

To finish out the calculations, all we have to do is decide what value to use for Z. For a 95% confidence interval, use Z = 1.96. If you do not have a calculator, use Z = 2 (empirical rule) to get a rough estimate. For a 90% confidence interval, use Z = 1.645. These and other values can be obtained using the z table or table of common z values. Here are the steps for a 95% confidence interval.
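The margin-of-error formula and the choice of Z can be sketched in Python. This is a minimal sketch assuming a known population standard deviation; `z_confidence_interval` is a hypothetical helper name, and `statistics.NormalDist.inv_cdf` replaces the z table when looking up the confidence coefficient.

```python
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Interval estimate x_bar +/- Z * sigma / sqrt(n) for the population mean (sigma known)."""
    # Confidence coefficient: the z value that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z * sigma / n ** 0.5
    return x_bar - margin_of_error, x_bar + margin_of_error

# Confidence coefficients mentioned in the text
for level in (0.90, 0.95):
    print(level, round(NormalDist().inv_cdf(1 - (1 - level) / 2), 3))
# prints: 0.9 1.645
#         0.95 1.96
```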

Example 6

Returning to the real world, we are really just going to obtain one sample and compute the mean of that sample. Then, we will calculate a margin of error for our estimate and use it to construct an interval that is highly likely to include the population mean. For example, suppose you are the VP for Sales for a company that manufactures and sells vacuum cleaners. Your company has a policy of targeting markets where you can be confident that the average (population mean) age of vacuum cleaners in use is over 10 years. In a random sample of 49 Little Rock residents, the average age of the vacuum cleaners they own is 10.4 years. Given the rather large investment (advertising, etc.) required, should you target Little Rock at this time? On the face of it, you may be tempted to say that the sample mean of 10.4 is high enough to warrant the decision to target Little Rock. After all, 10.4 is greater than 10. However, remember that this is only one of many possible random samples of size 49 that might have been selected and that other random samples will have different sample means. Another sample might yield a sample mean of 8, another 11, another 7.4, etc. In other words, a decision that relies solely on the mean of a particular sample is tantamount to making a decision based on sampling error. You might as well flip a coin or use a magic eight ball (a toy sold to children). To overcome this dilemma, let's assume that the population standard deviation is 4.2. (For now, we will just make an assumption regarding the standard deviation. In subsequent chapters, we will use methods that do not require such assumptions.) For this problem, that means

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{4.2}{\sqrt{49}} = 0.6.$$



Further, let's assume that there are no extreme values. That means the sample size is large enough for the sampling distribution to be approximately bell-shaped.

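[Figure: The sampling distribution of $\bar{x}$, centered at the unknown population mean $\mu_{\bar{x}} = \mu = ??$, with $\sigma_{\bar{x}} = 0.6$.]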

Now, what we would like to have is a range of estimates that is very likely to contain the true population mean. Instead of just saying that $\bar{x}$ is our estimate of $\mu$, we use the following formula to create an interval estimate for $\mu$.

$$\bar{x} \pm Z\frac{\sigma}{\sqrt{n}} = 10.4 \pm (1.96)(0.6) = 10.4 \pm 1.18, \text{ or } 9.22 \text{ to } 11.58$$
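The Example 6 interval and the company's "entire interval above 10 years" rule can be sketched in Python. A minimal illustration with illustrative variable names; the 1.96 is computed from `statistics.NormalDist` rather than typed in.

```python
from statistics import NormalDist

# Example 6 (vacuum cleaners): x_bar = 10.4 years, sigma = 4.2, n = 49, 95% confidence
x_bar, sigma, n = 10.4, 4.2, 49

z = NormalDist().inv_cdf(0.975)         # about 1.96
std_error = sigma / n ** 0.5            # 4.2 / 7 = 0.6
margin_of_error = z * std_error         # about 1.18
lower, upper = x_bar - margin_of_error, x_bar + margin_of_error

print(f"{lower:.2f} to {upper:.2f}")    # 9.22 to 11.58
# The company's rule: target the market only if the entire interval is above 10 years
print("Target Little Rock" if lower > 10 else "Forgo Little Rock for now")
```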

Recall that the company only wants to target markets where it can be confident that the average (population mean) age of vacuum cleaners in use is over 10 years. Given our data and calculations, the population mean could be as low as 9.22 years. The company cannot be confident that the population mean age is greater than 10. Therefore, it should forgo the Little Rock market, at least for now.

Summary Table for Confidence Intervals for the Population Mean

Parameter to be estimated: $\mu$
Standard Error: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Margin of Error (m): $m = Z\frac{\sigma}{\sqrt{n}}$
Confidence Interval: $\bar{x} \pm Z\frac{\sigma}{\sqrt{n}}$



Review/Overview of Confidence Interval Process

The following table summarizes the steps of the confidence interval approach to inferential statistics and rules of thumb relevant to each step.

Confidence Interval Approach

State Hypotheses
Not necessarily an explicit hypothesis, but can be used to evaluate/test hypotheses.

Decision Rule
If the entire interval is consistent with a statement about the population parameter, then one can be confident that that statement is true.

Compute
$\text{statistic} \pm (\text{confidence coefficient})(\text{std. error})$

Decide
Compare the entire interval to the statement to be "tested" and make the appropriate decision.

Conclude
Formulate a statement about the population parameter.

As with hypothesis testing, issues such as sample size and research design should be addressed.

Example 7: In a random sample of 100 students, the sample mean score on the GMAT (test for graduate school in business) is 515. Based on experience, the population standard deviation is known to be 60.

a. Construct a 95% confidence interval for the population mean.

b. Based on that interval, can we conclude that the population mean is greater than 500?

Solution 7:

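a.

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{60}{\sqrt{100}} = 6$$

$$\bar{x} \pm Z\frac{\sigma}{\sqrt{n}} = 515 \pm (1.96)(6) = 515 \pm 11.76, \text{ or } 503.24 \text{ to } 526.76$$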



b. Yes. The entire interval is greater than 500.
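Example 7 can be checked with a small interval function. This is a minimal Python sketch; `z_interval` is a hypothetical helper name, and the last line applies the confidence-interval decision rule (is the entire interval greater than 500?).

```python
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """x_bar +/- Z * sigma / sqrt(n), with Z taken from the standard normal distribution."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5
    return x_bar - margin, x_bar + margin

# Example 7 (GMAT): x_bar = 515, sigma = 60, n = 100
lower, upper = z_interval(515, 60, 100)
print(f"{lower:.2f} to {upper:.2f}")           # 503.24 to 526.76
print("Entire interval > 500:", lower > 500)   # Entire interval > 500: True
```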

Example 8: In a random sample of 25 CEOs from large firms (over 20,000 employees), the average compensation last year was $18 million. Assume that the population standard deviation is $2 million.

a. Construct a 95% confidence interval for the population mean compensation for CEOs of large companies.

b. Based on that interval, can we conclude that the population mean is greater than $15 million?

c. Do you think the sample size is large enough? Explain.

Solution 8:

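a. (in millions of dollars)

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2}{\sqrt{25}} = 0.4$$

$$\bar{x} \pm Z\frac{\sigma}{\sqrt{n}} = 18 \pm (1.96)(0.4) = 18 \pm 0.78, \text{ or } 17.22 \text{ to } 18.78$$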

b. Yes. The entire interval is greater than $15 million.

c. The answer to this question depends on whether or not outliers and/or extreme values are likely. If, for example, the sample of CEOs includes a particularly well-compensated CEO (say $80 million), then the results and conclusion are suspect. Recommendation: Construct a box plot.
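Part (a) and part (b) of Example 8 can be sketched in Python, working in millions of dollars. This is a minimal illustration with illustrative names; it does not address part (c), which depends on the raw data (for example, a box plot), and those data are not given.

```python
from statistics import NormalDist

# Example 8 (CEO pay, in millions of dollars): x_bar = 18, sigma = 2, n = 25
x_bar, sigma, n = 18, 2, 25

z = NormalDist().inv_cdf(0.975)          # 95% confidence coefficient, about 1.96
margin = z * sigma / n ** 0.5            # 1.96 * 0.4, about 0.78
lower, upper = x_bar - margin, x_bar + margin

print(f"{lower:.2f} to {upper:.2f}")     # 17.22 to 18.78
print("Entire interval > 15:", lower > 15)
```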

A more precise understanding of a confidence interval

For a 95% confidence interval, if we were to sample repeatedly (maybe thousands of times), about 95% of the intervals would include the population parameter. That also means that about 5% of the intervals would not include the population parameter. When we calculate a confidence interval based on a particular sample, we do not know if it is one of the 95% or one of the 5%. You may want to think about this the same way you think about a weather forecast. Suppose there is a 95% chance of rain today. That means that, of 100 days like today (same conditions), rain will occur on about 95 of those days. At the same time, it means that it will not rain on about 5 of the 100 days. On days like today, sometimes you will get wet (95/100) and sometimes you will not (5/100). When you construct confidence intervals, sometimes your interval will include the true population parameter (95/100 if you construct a 95% confidence interval) and sometimes your interval will not include the true population parameter (5/100). In class, I'll demonstrate this with a simulation.

Problems

1. An automobile insurance company wants to examine the average amount of accident claims. In a random sample of 36 claims, the sample mean is $508 and the sample standard deviation is $48. Can the company be confident that the population mean claim is not equal to $500?

a. Write the null and research hypotheses.

b. Clearly label the graph of the sampling distribution. Calculate the p-value and shade in the area that corresponds to the p-value.

c. What does the p-value mean in the context of this problem? Use $\alpha = 0.05$.

d. Use the critical value approach to test the hypothesis.

e. Construct an approximate 95% confidence interval (use empirical rule).

2. A light bulb company claims that the 60-watt light bulb it sells has an average life of 1000 hours with a standard deviation of 80 hours. Sixty-four new bulbs were allowed to burn out to test this claim. The average lifetime of these bulbs was found to be 970 hours. Does this indicate that the average life of a bulb is not 1000 hours?


b. Clearly label the graph of the sampling distribution and shade in the area that corresponds to the p-value.



e. Construct a 95% confidence interval. Is this enough evidence to conclude that the population mean is over 940?

3. A random sample of 9 individuals is selected and a sample mean score of 164 is computed. Is this enough evidence to conclude that the population mean score is greater than 150? Assume that scores are approximately normally distributed and that the population standard deviation is 15.







e. Construct a 95% confidence interval. Is this enough evidence to conclude that the population mean is less than 145?

4. A random sample of 25 new homes is selected and a sample mean size of 2520 square feet is computed. Is this enough evidence to conclude that the population mean home size is over 2400 square feet? Assume that the size of homes is approximately normally distributed and that the population standard deviation is 600 square feet.



a. What does the p-value mean in the context of this problem? Use α = 0.05.

b. Use the critical value approach to test the hypothesis.

c. Construct an approximate (empirical rule) 95% confidence interval for the population mean home size. Based on the confidence interval, can we be confident that the population mean is less than 2700 square feet? Explain.
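The repeated-sampling interpretation of a confidence interval, described just before these problems, can be checked with a short simulation. The Python sketch below is not the in-class demonstration; it is a minimal version that assumes a normal population with known σ, and the values μ = 72, σ = 10, and n = 25 are illustrative choices. The helper names `ci_covers` and `coverage` are hypothetical.

```python
import random
from statistics import NormalDist

def ci_covers(mu, sigma, n, conf, rng):
    """Draw one sample of size n from a normal population N(mu, sigma) and
    report whether the z interval x̄ ± z*·sigma/√n contains the true mean mu."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    margin = z_star * sigma / n ** 0.5
    return xbar - margin <= mu <= xbar + margin

def coverage(mu=72.0, sigma=10.0, n=25, conf=0.95, reps=10_000, seed=1):
    """Fraction of `reps` independent intervals that capture mu."""
    rng = random.Random(seed)
    hits = sum(ci_covers(mu, sigma, n, conf, rng) for _ in range(reps))
    return hits / reps

print(coverage())  # close to 0.95; the exact value depends on the seed
```

With a fixed seed the result is reproducible. Each individual interval either contains μ or it does not; only the long-run fraction is close to 95%. Rerunning with `conf=0.90` should give roughly 0.90.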



APPENDIX Using invNorm on the TI Statistical Calculator to obtain critical values for the z distribution

I prefer using the CV program, but here are the steps for the invNorm function that is built into your calculator. Select 2nd – VARS (to get to DISTR) and then invNorm (option 3 on a TI-83). Then enter the probability to the left of the critical value.
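If you are working on a computer instead of a TI calculator, the inv_cdf method of Python's standard-library `statistics.NormalDist` plays the same role as invNorm: you give it the area to the left and it returns the z value. A minimal sketch, assuming α = 0.05 (the cutoff used in this class):

```python
from statistics import NormalDist

# inv_cdf(p) returns the z value with probability p to its LEFT under the
# standard normal curve, which is the same input invNorm expects.
std_normal = NormalDist(mu=0, sigma=1)

alpha = 0.05
upper_cv = std_normal.inv_cdf(1 - alpha)         # upper-tail test: 0.95 to the left
lower_cv = std_normal.inv_cdf(alpha)             # lower-tail test: 0.05 to the left
two_tail_cv = std_normal.inv_cdf(1 - alpha / 2)  # two-tailed test: 0.975 to the left

print(round(upper_cv, 3), round(lower_cv, 3), round(two_tail_cv, 3))
# prints: 1.645 -1.645 1.96
```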

Examples:

If the research hypothesis is 75: RH :



Appendix. Obtaining and interpreting hypothesis tests using the STAT function on a TI-83/84

1. To obtain the hypothesis testing screen, select STAT and then use the right-arrow key to highlight TESTS.

2. Then select option 1 (ZTest) to obtain the data entry screen.



3. Make sure Stats is highlighted and enter the data. Keep in mind that μ0 is the value in the null hypothesis and that the next-to-last row should have the same inequality sign as the research (or alternative) hypothesis.

4. Select “Draw” to obtain a graph of the sampling distribution.

Interpretation of z: The sample mean, 27, is 3 standard errors below the hypothesized population mean.

Interpretation of p (p-value): If the population mean were truly 30, there would be only about a .0013 (0.13%) chance of obtaining a sample mean of 27 or lower (n = 100). With a two-tailed test, sample means of 33 or higher are equally far from 30, so both tails count and the p-value doubles to about .0027. Overall Interpretation: The null hypothesis is not plausible; i.e., the population mean is not equal to 30.



5. Repeat the first three steps and then select “Calculate” to obtain a numerical summary.
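The ZTest screen's Stats inputs (μ0, σ, x̄, n, and the direction of the research hypothesis) map onto a small function. This is a hypothetical helper (`z_test`), not a calculator or library API, and it assumes σ is known, as throughout this chapter. The calculator example above does not state σ, so σ = 10 below is an assumption inferred from the statement that 27 is 3 standard errors below 30 with n = 100.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """One-sample z test with known population sigma.
    alternative: 'greater', 'less', or 'two-sided' (the research hypothesis).
    Returns (z, p_value)."""
    se = sigma / sqrt(n)              # standard error of the mean
    z = (xbar - mu0) / se             # standard errors between x̄ and μ0
    phi = NormalDist().cdf            # area to the left of a z value
    if alternative == "greater":
        p = 1 - phi(z)                # area in the upper tail
    elif alternative == "less":
        p = phi(z)                    # area in the lower tail
    elif alternative == "two-sided":
        p = 2 * (1 - phi(abs(z)))     # area in both tails
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p

# Assumed inputs: x̄ = 27, μ0 = 30, σ = 10 (inferred), n = 100.
print(z_test(27, 30, 10, 100, "less"))       # z = -3.0, p ≈ 0.00135
print(z_test(27, 30, 10, 100, "two-sided"))  # z = -3.0, p ≈ 0.0027
```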



REVIEW OF TI-83/84 commands for the Z-test using the DISTR function and the CV program

Calculating the p-value

After calculating the z test statistic, use normalcdf to find the area in the tail (or tails).

Example 1 (one-tailed upper). Suppose Hr is µ > 25 and you compute a z test statistic of 2.01. The p-value (the area to the right of 2.01) is .0222.

Example 2 (one-tailed lower). Suppose Hr is µ < 25 and you compute a z test statistic of -1.56. The p-value (the area to the left of -1.56) is .0594.

Example 3 (two-tailed, z in the lower tail). Suppose Hr is µ ≠ 25 and you compute a z test statistic of -1.78. Recall that, for a two-tailed test, you have to double the tail area. The p-value is .0751.
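The same three p-values can be reproduced with Python's `statistics.NormalDist().cdf`, which returns the area to the left of z (the job normalcdf does on the calculator when its lower bound is a very large negative number). A minimal sketch:

```python
from statistics import NormalDist

phi = NormalDist().cdf  # area to the LEFT of z under the standard normal curve

# Example 1: upper-tailed (Hr: µ > 25), z = 2.01 -> area to the right
p1 = 1 - phi(2.01)
# Example 2: lower-tailed (Hr: µ < 25), z = -1.56 -> area to the left
p2 = phi(-1.56)
# Example 3: two-tailed (Hr: µ ≠ 25), z = -1.78 -> double the tail area
p3 = 2 * phi(-1.78)

print(round(p1, 4), round(p2, 4), round(p3, 4))  # 0.0222 0.0594 0.0751
```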



FAQ

Before working "complete" problems, here are a few frequently asked questions and answers to those questions. Because these are the issues that students often find troublesome, this section should be studied thoroughly. You should read this section a couple of times (at least) before proceeding to the rest of the chapter. Then, after you have read the rest of the chapter, you should reread it (again and again, until you are confident in your understanding).

1. How do I translate research questions into testable hypotheses?

1. Put what you are trying to conclude in the research hypothesis.

2. The research hypothesis is always an inequality (>, <, or ≠).

3. Put the complement of the research hypothesis in the null hypothesis.

4. The null hypothesis always contains equality (=, ≤, or ≥).

Try to translate the following research questions into testable hypotheses. Note that you would have to use numbers in your hypotheses. However, to more closely mirror the actual research process, the questions will not include numbers; you can make up reasonable ones. You may also have to develop your own operational measure.

Examples:

Is class size too large at the university?

Hr: The population mean number of students per class is over 35; $\mu > 35$

Ho: $\mu \le 35$ or $\mu = 35$

Is the traffic on Main Street too congested?

Hr: By the time the light changes at the intersection of Main Street and Arkansas Avenue, more than 12 cars are in line to go through the light; $\mu > 12$

Ho: $\mu \le 12$ or $\mu = 12$

Has the reengineering program reduced the time it takes to respond to a customer complaint?

Hr: Following the reengineering program, the time it takes to respond to a customer complaint is less than eight hours; $\mu < 8$

Ho: $\mu \ge 8$ or $\mu = 8$

Have national test scores on the SAT exam changed?

Hr: This year's national test scores on the SAT exam differ from the historical average of 1000; $\mu \ne 1000$

Ho: $\mu = 1000$

2. How do you know when to use a one-tailed test and when to use a two-tailed test?

Look at the research hypothesis. If it implies a direction (more than, greater than, less than, lower than, etc.), it is a one-tailed hypothesis. The goal is to determine whether or not you can be confident that the parameter (e.g., the population mean) differs from the value specified in the null hypothesis in the specified direction. By contrast, if the research hypothesis is non-directional (not equal to, differs, etc.), it is a two-tailed hypothesis. The goal is to determine whether or not you can be confident that the parameter (e.g., the population mean) differs from the value specified in the null hypothesis in either direction.

Examples (carried through from previous section)

Hr: The population mean number of students per class is over 35; $\mu > 35$

This is a one-tailed (upper tail) test. We want to know if the population mean is over 35. [Someone else, perhaps a group representing taxpayers, may be interested in the lower direction, but that is not the goal pursued in this study.]

Hr: By the time the light changes at the intersection of Main Street and Arkansas Avenue, more than 12 cars are in line to go through the light; $\mu > 12$

This is a one-tailed (upper tail) test. We want to know if the population mean is over 12.

Hr: Following the reengineering program, the time it takes to respond to a customer complaint is less than eight hours; $\mu < 8$

This is a one-tailed (lower tail) test. We want to know if the population mean is under eight.

Hr: This year's national test scores on the SAT exam differ from the historical average of 1000; $\mu \ne 1000$



This is a two-tailed test. We want to know if the population mean differs from 1000. In other words, we are interested in recognizing changes in either direction.

3. What does the p-value mean?

You should spend some time making sure you understand the p-value. It is not that hard if you put forth the effort, but it is easy to miss if you don't. Besides that, it is on the top 10 list of ideas you should retain from this class. IF THE NULL HYPOTHESIS IS TRUE, the p-value is the probability that a computed statistic this far (or farther) from the hypothesized parameter will be observed. The p-value is NOT the probability that the computed statistic occurs. It IS the probability that the computed statistic (or one more extreme) occurs IF the null hypothesis is true. Remember, the hypothesis testing procedure, along with all associated probabilities, assumes the null hypothesis is true. Then, if we compute a low probability (p-value) for a statistic based on the sample we obtained, the null hypothesis (and its sampling distribution) is called into question.

Examples:

a. Suppose we want to know if scores on an exam have increased. Historically, the average score has been 72. We know the population has no outliers and has a standard deviation of about ten. We randomly select 25 students and give them the test. The average for the 25 students is 74. Compute, graph, and interpret the p-value.

Calculate the standard error:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{25}} = 2$$

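Draw the sampling distribution and locate the statistic ($\bar{x}$ in this case) on the sampling distribution. It is 74.

[Figure: hypothesized sampling distribution of $\bar{x}$, centered at $\mu_{\bar{x}} = 72$ with $\sigma_{\bar{x}} = 2$; the area above 74 is shaded.]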



Compute the z-value and probability (shaded in blue above) that the computed sample statistic (or one further away from the parameter specified in the null hypothesis) is obtained IF Ho is true.

$$z_{test} = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{74 - 72}{2} = 1$$

Using the empirical rule, the p-value is about 0.16. Using other methods (calculator, z-table, etc.), the p-value can be computed more accurately as 0.158655 (or .1587). Interpret: If the population mean test score is 72, then there is about a 16% chance that a sample mean of 74 or higher will be observed in a random sample of 25 students. Hence, Ho is plausible: even if Ho is true, there is a 16% chance that the sample statistic (or one more extreme) occurs.

b. Suppose we want to know if scores on an exam have increased. Historically, the average score has been 72. We know the population has no outliers and has a standard deviation of about ten. We randomly select 25 students and give them the test. The average for the 25 students is 78. Compute, graph, and interpret the p-value.


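Calculate the standard error:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{25}} = 2$$

[Figure: hypothesized sampling distribution of $\bar{x}$, centered at $\mu_{\bar{x}} = 72$ with $\sigma_{\bar{x}} = 2$; the area above 78 is shaded.]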

Compute the z-value and probability (shaded in blue above) that the computed sample statistic (or one further away from the parameter specified in the null hypothesis) is obtained IF Ho is true.

$$z_{test} = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{78 - 72}{2} = 3$$

Using the empirical rule, the p-value is about 0.0015. Using other methods (calculator, z-table, etc.), the p-value can be computed more accurately as 0.001350. Interpret: If the population mean test score is 72, then there is about a 0.15% chance that a sample mean of 78 or higher will be observed in a random sample of 25 students. Hence, Ho is not plausible. If Ho is true, there is a mere 0.15% chance that the sample statistic (or one more extreme) occurs. Because the observed statistic is extremely unlikely to occur if the null hypothesis is true, we reject the null hypothesis and conclude that the research hypothesis is true (or, more accurately, we can be confident that the research hypothesis is true).
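The standard error, z, and upper-tail p-value in examples a and b can be computed in a few lines. This is a minimal sketch assuming σ = 10 is known, as the examples state; `upper_tail_p` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_p(xbar, mu0, sigma, n):
    """Standard error, z, and upper-tail p-value for Hr: µ > mu0 (sigma known)."""
    se = sigma / sqrt(n)            # 10 / sqrt(25) = 2
    z = (xbar - mu0) / se           # standard errors above the hypothesized mean
    p = 1 - NormalDist().cdf(z)     # area at or beyond z in the upper tail
    return se, z, p

for xbar in (74, 78):               # example a, then example b
    se, z, p = upper_tail_p(xbar, 72, 10, 25)
    print(xbar, se, z, round(p, 6))
# prints:
# 74 2.0 1.0 0.158655
# 78 2.0 3.0 0.00135
```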

c. Suppose we want to know if scores on an exam have changed. Historically, the average score has been 72. We know the population has no outliers and has a standard deviation of about ten. We randomly select 25 students and give them the test. The average for the 25 students is 74. Compute, graph, and interpret the p-value.


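Calculate the standard error:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{25}} = 2$$

[Figure: hypothesized sampling distribution of $\bar{x}$, centered at $\mu_{\bar{x}} = 72$ with $\sigma_{\bar{x}} = 2$; the areas below 70 and above 74 are shaded.]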

You probably noticed that the graph shades areas in both tails of the distribution. Here is why. The p-value is the probability that a statistic this far (or farther) away from the hypothesized parameter will be observed. In this problem, the research hypothesis was two-tailed, so our interest includes a change in either direction. Values above 74 and values below 70 are both as far (2 points) or farther away from the hypothesized parameter, so the p-value is the total area in both tails. Compute the z-value and probability (shaded in blue above) that the computed sample statistic (or one further away from the parameter specified in the null hypothesis) is obtained IF Ho is true.

Upper tail:

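$$z_{test} = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{74 - 72}{2} = 1$$

Lower tail:

$$z_{test} = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{70 - 72}{2} = -1$$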

Using the empirical rule, the p-value is about 0.32. Using other methods (calculator, z-table, etc.), the p-value can be computed more accurately as 0.317311. Interpret: If the population mean test score is 72, then there is about a 32% chance that a sample mean at least two points above or below the hypothesized mean will be observed in a random sample of 25 students. Hence, Ho is plausible. If Ho is true, there is a 32% chance that the sample statistic (or one more extreme) occurs. Because the observed statistic is quite likely to occur if the null hypothesis is true, we fail to reject the null hypothesis and conclude that the null hypothesis may be true. We cannot be confident that the research hypothesis is true. Though the population mean may actually differ from 72, the difference we observed in our sample of 25 randomly selected students may simply reflect random error.
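For the two-tailed case (example c), the p-value is the area beyond |z| in both tails, which is twice the area in one tail. A minimal sketch, with `two_tailed_p` as a hypothetical helper name and σ = 10 taken from the example:

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_p(xbar, mu0, sigma, n):
    """Two-tailed p-value for Hr: µ ≠ mu0: the total area at least |z|
    standard errors from mu0 in EITHER direction (both shaded tails)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    one_tail = 1 - NormalDist().cdf(abs(z))   # area beyond |z| in one tail
    return z, 2 * one_tail                    # double it for both tails

z, p = two_tailed_p(74, 72, 10, 25)
print(z, round(p, 4))  # 1.0 0.3173
```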

4. When should Ho be rejected? When is a p-value small enough to warrant the rejection of Ho?

Answer #1: The answer to this question depends on the consequences associated with rejecting the null hypothesis (Ho). If the consequences associated with rejecting the null hypothesis are high, do not reject Ho unless the p-value is extremely low (say, less than 0.01 or 0.001, or even less if the consequences are catastrophic). For example, consider the following hypotheses:

Hr: A sovereign nation has weapons that can be used to harm our country.

Ho: A sovereign nation does not have weapons that can be used to harm our country.



The consequences associated with incorrectly rejecting Ho are catastrophic (loss of lives, economic cost, etc.). In a case such as this, one should be extremely confident that Ho is not true before Ho is rejected, so the p-value should be quite low. You can decide a precise value yourself; I would probably want it lower than 0.001, at least. By contrast, if the consequences associated with rejecting the null hypothesis are low, you may comfortably reject Ho at a higher p-value [say, a cutoff (alpha, or α) of 0.05 or 0.10]. For example, consider the following hypotheses:

Hr: The average distance that a paper airplane (of a particular design) flies is over 30 feet.

Ho: The average distance that a paper airplane (of a particular design) flies is less than or equal to 30 feet.

Even though I would like to always be right, the consequences of incorrectly rejecting Ho are not particularly harmful. I might be embarrassed, but I would get over it. Hence, a cutoff (α) of 0.05 (or even 0.10) would be appropriate. If I reject Ho, I will most likely be correct, but even if Ho is true, there is a 5% (or 10%) chance that I would get results that would lead to a rejection of Ho. In this case, I can live with the 5 (or 10) percent chance of error.

Answer #2: Here are some guidelines.

Describing the p-value

If the p-value is less than 0.01, there is overwhelming evidence that supports the alternative hypothesis.

If the p-value is between 0.01 and 0.05, there is strong evidence that supports the alternative hypothesis.

If the p-value is between 0.05 and 0.10, there is weak evidence that supports the alternative hypothesis.

If the p-value exceeds 0.10, there is no evidence that supports the alternative hypothesis.



Answer #3: Unless otherwise stated, we will use a cutoff of 0.05 in this class. There is no magic reason for using 0.05, but the main idea is there and it simplifies things, which lets us keep our focus on the process of hypothesis testing as much as possible.

5. What is the critical value approach? Basically, this just means a null hypothesis will be rejected if the observed value of the statistic is more than a specified number of standard errors away from the hypothesized mean. For the z test and α = 0.05, here are the cutoffs:

a. For an “upper” one tailed test, reject Ho if the z test statistic is greater than 1.645.

b. For a “lower” one tailed test, reject Ho if the z test statistic is less than -1.645.

c. For a two-tailed test, reject Ho if the z test statistic is greater than 1.96 or less than -1.96 (α = 0.05 is split evenly, 0.025 in each tail).
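The three decision rules can be written as one function. This is a sketch with a hypothetical name (`critical_value_decision`); it computes the cutoffs from α rather than hard-coding them, which makes the 1.645 versus 1.96 distinction explicit.

```python
from statistics import NormalDist

def critical_value_decision(z_test, alternative, alpha=0.05):
    """Return True to reject Ho, False to fail to reject (critical value approach)."""
    inv = NormalDist().inv_cdf
    if alternative == "greater":
        return z_test > inv(1 - alpha)            # 1.645 when alpha = 0.05
    if alternative == "less":
        return z_test < inv(alpha)                # -1.645 when alpha = 0.05
    if alternative == "two-sided":
        return abs(z_test) > inv(1 - alpha / 2)   # ±1.96 when alpha = 0.05
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

print(critical_value_decision(1.80, "greater"))    # True:  1.80 > 1.645
print(critical_value_decision(1.80, "two-sided"))  # False: |1.80| < 1.96
```

The same z of 1.80 leads to different decisions because the two-tailed test splits α between two tails, pushing each cutoff farther from zero.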

6. How should hypothesis testing errors be acknowledged? Can those errors be quantified?

In the end, based on a comparison between the p-value and some pre-specified cutoff value, a decision is made to either reject or fail to reject the null hypothesis. In either case (reject or fail to reject), the decision is either correct or incorrect.

For example, you might correctly or incorrectly reject the null hypothesis. Restating the obvious, if the null hypothesis is correctly rejected and the research hypothesis is accepted, an error was not made. By contrast, if the null hypothesis is true but is incorrectly rejected, then an error has been made. Incorrectly rejecting the null hypothesis is referred to as a Type I error. The probability that a true null hypothesis is incorrectly rejected is equal to the pre-specified cutoff value, which is often referred to as α (alpha) or the level of significance. The cutoff is frequently, but arbitrarily, set at either 0.01 or 0.05. If it is set to 0.05, there is, a priori (before the test is conducted), a 5% chance that a true null hypothesis will be rejected anyway. Remember, statistics can produce confidence but not certainty. Moreover, if Type I errors are costly, then the cutoff can be set at a lower level, perhaps 0.01 or even 0.001 (or lower). In that case, the null hypothesis will only be rejected when very low p-values are observed.
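The claim that α is the probability of rejecting a true null hypothesis can be checked by simulation: generate many samples from a population in which Ho is true and count how often the test rejects. A minimal sketch assuming a normal population and an upper-tailed z test; the values μ0 = 30, σ = 6, and n = 9 are borrowed from the class-size example for illustration, and `type_i_error_rate` is a hypothetical name.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(mu0=30.0, sigma=6.0, n=9, alpha=0.05, reps=20_000, seed=2):
    """Simulate samples from a population where Ho is TRUE (mean = mu0) and
    return the fraction of samples for which an upper-tailed z test rejects Ho."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # 1.645 when alpha = 0.05
    se = sigma / sqrt(n)
    rejections = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        if (xbar - mu0) / se > z_crit:         # a Type I error: Ho is true here
            rejections += 1
    return rejections / reps

print(type_i_error_rate())  # close to 0.05
```

Lowering `alpha` to 0.01 should lower the simulated rate to roughly 0.01, which is the trade-off described above.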

Alternatively, you might correctly or incorrectly fail to reject Ho. Restating the obvious, if the null hypothesis is correctly "accepted" (failed to reject), an error was not made. By contrast, if the null hypothesis is not true but is not rejected, then an error has been made. Incorrectly failing to reject the null hypothesis is referred to as a Type II error. The probability that a Type II error occurs (often denoted β) depends on the distance between the value in the false null hypothesis and the true value of the parameter. Because the true value remains unknown, the probability of a Type II error must be computed over a range of possible true parameter values; as a result, it takes on many values rather than a single probability. The curve of Type II error probabilities (β) plotted against the possible true values is known as the operating characteristic (OC) curve.
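For an upper-tailed z test, the Type II error probability at a particular true mean μ can be computed directly: Ho is not rejected when $\bar{x}$ falls below $\mu_0 + z_\alpha \sigma_{\bar{x}}$, so $\beta(\mu) = \Phi\left(z_\alpha - \frac{\mu - \mu_0}{\sigma_{\bar{x}}}\right)$. A minimal sketch under that assumption (upper-tailed test, σ known); the values μ0 = 72, σ = 10, n = 25 are illustrative and borrowed from the exam-score examples, and `beta` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def beta(mu_true, mu0, sigma, n, alpha=0.05):
    """P(fail to reject Ho) for an upper-tailed z test of Ho: µ <= mu0 when the
    true mean is mu_true. For mu_true > mu0 this is the Type II error probability;
    at mu_true = mu0 it equals 1 - alpha (Ho is true there, so no Type II error)."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(z_crit - (mu_true - mu0) / se)

# Tabulate points on the curve for a few possible true means.
for mu_true in (72, 74, 76, 78, 80):
    print(mu_true, round(beta(mu_true, 72, 10, 25), 3))
```

The farther the true mean is from 72, the smaller β becomes, which is why no single Type II error probability can be stated.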

7. How large should a sample be?



a. Sampling Distribution Assumptions

i. If the population is near normal, any sample size will do.

ii. If the population is skewed (or is likely to be), use a sample size of at least 15.

iii. If the population has (or is likely to have) outliers, use a sample size of at least 40.

b. Confidence Intervals and Margin of Error: If you know the desired margin of error and the confidence level, and you have an estimate of the population standard deviation, then set the margin of error equal to the confidence coefficient times the standard error (the population standard deviation divided by the square root of n). Then, solve for n: from $E = z\,\sigma/\sqrt{n}$ we get $n = (z\sigma/E)^2$, rounded up to the next whole number. It will be a little different when estimating other parameters, but the logic is the same.

c. Hypothesis Testing and Statistical Power. Statistical power is the ability of a test to find an effect if it is there. I may add this to the text later, but for now, all I want you to know is that statistical power is an issue.
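The margin-of-error calculation in item b can be sketched as follows, using $n = (z\sigma/E)^2$ rounded up. The function name and the example inputs (E = 2, σ = 10) are hypothetical, chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(margin, sigma, conf=0.95):
    """Smallest n with z*·sigma/sqrt(n) <= margin, i.e. (z*·sigma/margin)^2
    rounded UP (rounding down would give a margin slightly larger than wanted)."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    return ceil((z_star * sigma / margin) ** 2)

print(sample_size_for_mean(margin=2, sigma=10))             # 97
print(sample_size_for_mean(margin=2, sigma=10, conf=0.90))  # 68
```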



Example Problems

5. Is class size too large at the university? In nine classrooms, the average class size is 34. Is this enough evidence to conclude that the population mean number of students per class is over 30? Assume that the population is normally distributed with a population standard deviation of six. Use a 0.05 significance level and a 90% confidence interval.

Solution

Research Question: Is the class size too large?

Research Hypothesis: The population mean number of students per class is over 30, or

$H_R: \mu > 30$

Null Hypothesis: The population mean could be less than or equal to 30. (It is plausible.)

$H_O: \mu \le 30$

Assumptions: normal population; random sample.

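Sampling Distribution: centered at the hypothesized mean, $\mu_{\bar{x}} = \mu_0 = 30$, with standard error

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{6}{\sqrt{9}} = 2$$

[Figure: hypothesized sampling distribution of $\bar{x}$ under Ho, centered at 30, with the observed sample mean, 34, marked.]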

Decision Rule:

Critical Value Approach: Reject Ho if $z_{test} > 1.645$.

P-value Approach: Reject Ho if p-value < 0.05.

Confidence Interval Approach: Reject Ho if the entire 90% confidence interval is greater than 30.



Calculations and Decision:

Critical Value:

$$z_{test} = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{34 - 30}{2} = 2$$

Reject Ho. The sample statistic is more than 1.645 standard errors above the hypothesized parameter.

p-value: The probability that a sample mean of 34 or higher is observed (assuming Ho is true) is about 0.025 (Empirical Rule). Reject Ho. The difference is unlikely to be due to sampling error alone. The null hypothesis is not plausible.

Confidence Interval: For a 90% confidence interval, z = 1.645.

$$\bar{x} \pm z\,\sigma_{\bar{x}} = 34 \pm (1.645)(2) = 34 \pm 3.29, \text{ or } 30.71 \text{ to } 37.29$$

The entire interval lies above 30. Reject Ho.

Conclusion:

The population mean number of students per class is over 30.
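The calculations in this example can be reproduced in Python as a check. A minimal sketch using the example's values (x̄ = 34, μ0 = 30, σ = 6, n = 9); note that the exact p-value, about 0.0228, is a little smaller than the empirical-rule approximation of 0.025.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n = 34, 30, 6, 9
se = sigma / sqrt(n)                        # 6 / 3 = 2
z = (xbar - mu0) / se                       # (34 - 30) / 2 = 2
p = 1 - NormalDist().cdf(z)                 # upper-tail p-value
z_90 = NormalDist().inv_cdf(0.95)           # about 1.645 for a 90% interval
lower, upper = xbar - z_90 * se, xbar + z_90 * se

print(z, round(p, 4), round(lower, 2), round(upper, 2))  # 2.0 0.0228 30.71 37.29
```

All three approaches agree: z exceeds 1.645, p is below 0.05, and the whole interval lies above 30.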

Action Implications:

Possible Actions include: i. Hire additional faculty. ii. Lower tuition (After all, the student to faculty ratio is “high.”). iii. Build larger classrooms. iv. Raise entrance requirements.

Critical Evaluation:



a. Who conducted this study? Was it someone (say faculty or students) with a vested interest in lower class sizes?

b. Is the assumption that the population is normal reasonable? Aren’t there some (maybe just a few) classes with extremely large class sizes? If so, a larger sample size (over 40) is needed.

c. When was class size "measured?" Was it on the first day or two of class or was it in the last couple of weeks (after some students have dropped)?

d. Were the selected classes at the freshman, sophomore, junior, senior, or graduate level? It's one thing to have 40 students in a freshman/sophomore survey course and quite another to have 40 students in a senior level course (where more writing, discussion, and application would be expected).

e. Is a class size greater than 30 undesirable?

f. Even if we can show that the class size is greater than 30, is it so far greater than 30 that real action is needed? [The lower bound of the 90% confidence interval was 30.71.]
