Class Handout #3 (Sections 1.8, 1.9)

Definitions

$z_{\text{AREA}}$: the z-score above which lies an area under the normal curve equal to the subscript AREA

1. Find each of the z-scores listed by using Table 2 of the Statistical Tables.

$z_{0.05} = 1.645$
$z_{0.025} = 1.960$
$z_{0.005} = 2.576$
$z_{0.01} = 2.326$

2. When a measurement is randomly selected from a population having a normal distribution, what is the probability that the z-score for this measurement will be less than $z_{0.10}$?

$1 - 0.10 = 0.90$

3. For each of the normal probability distribution curves, find the indicated areas under the curve.

Cut points $\pm z_{0.05} = \pm 1.645$: area below = 0.05, area between = 0.90, area above = 0.05

Cut points $\pm z_{0.025} = \pm 1.960$: area below = 0.025, area between = 0.95, area above = 0.025

Cut points $\pm z_{0.005} = \pm 2.576$: area below = 0.005, area between = 0.99, area above = 0.005

Cut points $\pm z_{0.01} = \pm 2.326$: area below = 0.01, area between = 0.98, area above = 0.01

Cut points $\pm z_{\alpha/2}$: area below = $\alpha/2$, area between = $1 - \alpha$, area above = $\alpha/2$
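The table lookups in Exercises 1 to 3 can be cross-checked numerically. The following is a minimal sketch that uses `NormalDist` from Python's standard library in place of Table 2; the helper name `z_upper` is ours and does not appear in the handout.

```python
from statistics import NormalDist

def z_upper(area):
    """Return the z-score that has the given area to its right
    under the standard normal curve."""
    return NormalDist().inv_cdf(1 - area)

for area in (0.05, 0.025, 0.005, 0.01):
    z = z_upper(area)
    middle = 1 - 2 * area  # area between -z and +z
    print(f"z_{area} = {z:.3f}; area between -z and +z = {middle:.3f}")

# Exercise 2: probability that a z-score is less than z_0.10
print(f"P(Z < z_0.10) = {NormalDist().cdf(z_upper(0.10)):.2f}")
```

Rounded to three decimal places, the computed z-scores should agree with the Table 2 values used above.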

Class Handout #3 Definitions

point estimation: the use of the value of a statistic to estimate a parameter

Examples are
(1) using $\bar{x}$ to estimate $\mu$,
(2) using $s$ to estimate $\sigma$.

interval estimation: the use of an interval of values (often based on a statistic and a standard error) to estimate a parameter

confidence interval: an interval estimate together with a corresponding probability; the probability represents the chance that the interval actually contains the parameter being estimated and is called the confidence level or confidence coefficient.

The most commonly chosen confidence levels are 90%, 95%, and 99%.

If the mean of the sampling distribution for a statistic is equal to the parameter being estimated, then the statistic is called unbiased; otherwise the statistic is called biased.

With random sampling, $\bar{x}$ is an unbiased estimator of $\mu$.


role of the Central Limit Theorem in finding a confidence interval for $\mu$: The Central Limit Theorem tells us that when a random sample of size $n$ is taken from a population that has a normal distribution with mean $\mu$ and standard deviation $\sigma$, then the sampling distribution of $\bar{x}$ has a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, called the standard error of estimate or standard error of the mean. This is approximately true even when the population does not have a normal distribution, as long as the sample size $n$ is sufficiently large.

There is a 95% chance that $\bar{x}$ will be within 2 (more precisely, 1.960) standard errors of $\mu$.

There is a $(1 - \alpha)100\%$ chance that $\bar{x}$ will be within $z_{\alpha/2}$ standard errors of $\mu$.

That is, we can be $(1 - \alpha)100\%$ confident that the population mean $\mu$ will lie between

$\bar{x} - z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$ and $\bar{x} + z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$.

Not knowing the value for $\sigma$, we estimate the standard error with $s/\sqrt{n}$.

With this estimated standard error of the mean, we must use a t distribution in place of a normal or z distribution.
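A small simulation can illustrate why the switch from z to t is needed. The sketch below draws many small samples from a normal population and compares two intervals, both using $z_{0.025} = 1.960$: one built with the true $\sigma$ and one built with the sample standard deviation $s$. The population values, sample size, number of trials, and seed are illustrative assumptions and are not taken from the handout.

```python
import random
import statistics

random.seed(214)  # arbitrary seed so the run is repeatable

# Illustrative values, not from the handout
MU, SIGMA, N, TRIALS = 50.0, 10.0, 5, 5000
Z = 1.960  # z_{0.025} from Table 2

means = []
covered_sigma = 0  # intervals built with the true sigma
covered_s = 0      # intervals built with s, but still using z
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)
    means.append(xbar)
    if abs(xbar - MU) <= Z * SIGMA / N ** 0.5:
        covered_sigma += 1
    if abs(xbar - MU) <= Z * s / N ** 0.5:
        covered_s += 1

sd_of_means = statistics.stdev(means)
coverage_sigma = covered_sigma / TRIALS
coverage_s = covered_s / TRIALS

print(f"SD of sample means: {sd_of_means:.3f} (sigma/sqrt(n) = {SIGMA / N ** 0.5:.3f})")
print(f"Coverage using sigma and z: {coverage_sigma:.3f}")
print(f"Coverage using s and z:     {coverage_s:.3f}")
```

With the true $\sigma$ the coverage should be close to 0.95, while substituting $s$ and keeping the z-score should give noticeably lower coverage for such a small sample; the t distribution corrects for this.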

Student’s t distribution: a distribution based on sample standard deviation $s$ similar to the way the standard normal distribution is based on population standard deviation $\sigma$

The t distributions
(1) depend on degrees of freedom df (= n – 1 for the one sample t test statistic);
(2) are symmetric and bell-shaped but flatter than the standard normal distribution;
(3) become more like the standard normal distribution as df increases.

Table 3 of the Statistical Tables displays values from various t distributions.

The concept of “degrees of freedom” is not easy to explain completely, but one intuitive explanation is to think of “degrees of freedom” as representing the “number of pieces of data observed” minus “the number of parameters being estimated”. When using a confidence interval to estimate a population mean $\mu$, we observe $n$ pieces of data (i.e., the $n$ measurements in the selected random sample), and we are estimating 1 parameter (i.e., the population mean $\mu$).

$t_{\text{AREA}}$: the t-score above which lies an area under a t curve equal to the subscript AREA; if the corresponding degrees of freedom (df) is not clear, the t-score can be represented as $t_{df;\,\text{AREA}}$

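4. Use Table 3 of the Statistical Tables to obtain each of the following:

t distribution with df = 1: $t_{0.10} = 3.078$
t distribution with df = 8: $t_{0.05} = 1.860$
t distribution with df = 1: $t_{0.025} = 12.706$
t distribution with df = 2: $t_{0.025} = 4.303$
t distribution with df = 3: $t_{0.025} = 3.182$
t distribution with df = 15: $t_{0.025} = 2.131$
t distribution with df = 30: $t_{0.025} = 2.042$
t distribution with df = $\infty$: $t_{0.025} = 1.960 = z_{0.025}$

The t-scores in the df = $\infty$ row are exactly the same as z-scores.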
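Table 3 lookups of this kind can be reproduced numerically. The following is a rough sketch that uses only the Python standard library: it integrates the t density with Simpson's rule and then finds the cut point by bisection. The function names (`t_pdf`, `t_cdf`, `t_upper`) and the step and iteration counts are our own choices; in practice a statistics library would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3  # area between 0 and |x|
    return 0.5 + half if x >= 0 else 0.5 - half

def t_upper(area, df):
    """t-score with the given area to its right (area < 0.5), by bisection."""
    lo, hi = 0.0, 1.0
    while 1 - t_cdf(hi, df) > area:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df, area in [(1, 0.10), (1, 0.025), (2, 0.025), (15, 0.025), (30, 0.025)]:
    print(f"df = {df:2d}, area = {area}: t = {t_upper(area, df):.3f}")
```

As df grows, the values returned for area 0.025 should move toward $z_{0.025} = 1.960$, matching the last row of the table.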

confidence interval for $\mu$: We can be $(1 - \alpha)100\%$ confident that the population mean $\mu$ is between

$\bar{x} - t_{\alpha/2} \dfrac{s}{\sqrt{n}}$ and $\bar{x} + t_{\alpha/2} \dfrac{s}{\sqrt{n}}$.

For a given sample size, increasing the confidence level increases the confidence interval length, and decreasing the confidence level decreases the confidence interval length.

For a given confidence level, increasing the sample size tends to decrease the confidence interval length, and decreasing the sample size tends to increase the confidence interval length.
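These relationships can be seen directly from the interval formula. The sketch below is a simplified illustration that assumes a known population standard deviation, so the z-based length $2 z_{\alpha/2} \sigma/\sqrt{n}$ applies exactly; the value of `SIGMA` and the function name `interval_length` are hypothetical, and the z-scores are those found in Exercise 1.

```python
import math

SIGMA = 10.0  # hypothetical population standard deviation, assumed known
Z_SCORES = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}  # z_{alpha/2} from Exercise 1

def interval_length(confidence, n, sigma=SIGMA):
    """Length of the z-based confidence interval: 2 * z_{alpha/2} * sigma / sqrt(n)."""
    return 2 * Z_SCORES[confidence] * sigma / math.sqrt(n)

for n in (20, 40):
    for confidence in (0.90, 0.95, 0.99):
        length = interval_length(confidence, n)
        print(f"n = {n}, confidence = {confidence:.0%}: length = {length:.2f}")
```

When $s$ replaces $\sigma$, the length also depends on the particular sample drawn, which is why the handout says a larger sample only "tends to" shorten the interval.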

5. (a) Forbes magazine published data on the best small firms in 1993 (Forbes, November 8, 1993, "America's Best Small Companies"); these were firms with annual sales of more than $5 million and less than $350 million. The ages (in years) of the chief executive officer (CEO) for the first 20 firms listed are as follows:

53 43 33 45 46 55 41 55 36 45
55 50 49 47 69 51 48 62 45 37

(This data is stored in the worksheet CEO_Data of the Excel file M214_Data.)

Treating these 20 ages as a random sample of ages from the population of ages of chief executive officers for small companies, find a 95% confidence interval for the mean age of chief executive officers for small companies.

$n = 20$, $\bar{x} = 48.25$, $s = 8.6382$

These statistics can all be verified by using the Excel spreadsheet named Summary_Statistics.

df = 19, $1 - \alpha = 0.95$, $\alpha/2 = 0.025$, $t_{0.025} = 2.093$

$48.25 - (2.093)(8.6382/\sqrt{20})$ , $48.25 + (2.093)(8.6382/\sqrt{20})$

44.207 , 52.293

We can be 95% confident that the mean age of chief executive officers for small companies is between 44.207 and 52.293 years.
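The calculation in part (a) can be checked with a few lines of Python instead of the Excel worksheet. This is a minimal sketch using only the standard library; the critical value 2.093 is entered by hand from Table 3 (df = 19), and the variable names are our own.

```python
import math
import statistics

# CEO ages from part (a)
ages = [53, 43, 33, 45, 46, 55, 41, 55, 36, 45,
        55, 50, 49, 47, 69, 51, 48, 62, 45, 37]

n = len(ages)
xbar = statistics.mean(ages)
s = statistics.stdev(ages)   # sample standard deviation (divides by n - 1)
t_crit = 2.093               # t_{0.025} with df = n - 1 = 19, read from Table 3

margin = t_crit * s / math.sqrt(n)   # t times the estimated standard error
lower, upper = xbar - margin, xbar + margin

print(f"n = {n}, mean = {xbar:.2f}, s = {s:.4f}")
print(f"95% confidence interval: ({lower:.3f}, {upper:.3f})")
```

Note that the standard error divides $s$ by $\sqrt{n}$, not by $n$.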

(b) What must we assume in order for the confidence interval in part (a) to be appropriate?

We assume that either the ages are normally distributed, at least approximately, or the sample size 20 is sufficiently large so that the sampling distribution of $\bar{x}$ is approximately normal.

(c) How would the confidence interval in part (a) have been different if a 90% confidence level were chosen?

The 90% confidence interval would have shorter length than the 95% confidence interval in part (a).

(d) How would the confidence interval in part (a) have been different if a 99% confidence level were chosen?

The 99% confidence interval would have longer length than the 95% confidence interval in part (a).

(e) How would the confidence interval in part (a) have been different if the sample size were 40 instead of 20?

A 95% confidence interval based on a sample size of 40 would tend to have shorter length than a 95% confidence interval based on a sample size of 20.

(f) If we are willing to assume that the ages are normally distributed (at least approximately), how could we estimate an interval between which lie 95% of the ages of chief executive officers for small companies?

We know that about 95% of the ages are within 2 (or more precisely 1.96) standard deviations of the mean. If we estimate the population mean and standard deviation with the sample mean and standard deviation, then we estimate that 95% of the ages of chief executive officers for small companies lie between

$48.25 - (2)(8.6382)$ and $48.25 + (2)(8.6382)$, that is, 30.974 and 65.526 years.

(This type of interval can be called a prediction interval; notice how much wider this interval is than the confidence interval in part (a).)

After considering how to estimate a mean with a confidence interval, we now consider how to perform a hypothesis test about a mean.

A hypothesis test is used when we have some hypothesized value for the mean prior to any data collection.

Return to the definitions in Class Handout #3:

hypothesis testing: an inferential statistical analysis used to decide which of two competing hypotheses should be believed (analogous to a court trial)

null hypothesis (H0): a statement assumed to be true at the outset of a hypothesis test; often, a statement that a parameter is equal to a specific hypothesized value (comparable to “innocence” in a court trial)

alternative (research) hypothesis (H1): a statement for which sufficient evidence is required before it will be believed; often, a statement that the parameter is not equal to the hypothesized value (comparable to “guilt” in a court trial)

Confidence intervals are a method of inferential statistics used when no hypothesized value about a parameter to be estimated exists prior to any data analysis; however, when such a hypothesized value exists, hypothesis testing is a popular method of inferential statistics to decide if a statistically significant difference exists. (A hypothesis test can also tell us whether or not a relationship is statistically significant.)

one-sided hypothesis test

two-sided hypothesis test

Now let us go to Class Exercise 6(a).

6. It is believed that the mean right hand grip strength of men between 20 and 40 years of age in the USA is 86.3 lbs. It is now of interest to perform a hypothesis test concerning the mean grip strength of men between 20 and 40 years of age in the country of Techavia.

(a) If we are looking for evidence that the mean grip strength in Techavia is different from 86.3 lbs., state the null and alternative hypotheses for the hypothesis test.

H0: $\mu = 86.3$ (The mean grip strength is 86.3 lbs.)

H1: $\mu \neq 86.3$ (The mean grip strength is different from 86.3 lbs.)

(b) Is the hypothesis test one-sided or two-sided?

(c) Describe what it would mean to make a Type I error in this hypothesis test and what it would mean to make a Type II error in this hypothesis test.

Now look at the definitions for one-sided and two-sided tests.


one-sided hypothesis test: a test designed to identify a difference from a hypothesized value in only one direction

two-sided hypothesis test: a test designed to identify a difference from a hypothesized value in either direction

Even though hypothesis tests may be one-sided or two-sided, confidence intervals are generally two-sided (except for rare occasions).

Answer to 6(b): Since we are looking for evidence that the population mean is different from the hypothesized value 86.3 in either direction, the test is two-sided.

Now look at the definitions for Type I and Type II error.

Type I error: believing H1 (the alternative hypothesis) when in reality H0 (the null hypothesis) is true (in a court trial, saying that the defendant is guilty when the defendant is really innocent)

Type II error: believing H0 (the null hypothesis) when in reality H1 (the alternative hypothesis) is true (in a court trial, saying that the defendant is innocent when the defendant is really guilty)

significance level ($\alpha$)

rejection (critical) region

p-value (probability value)

test statistic

Now let us go to Class Exercise 6(c).


Making a Type I error means the mean grip strength is actually 86.3 lbs., but we mistakenly conclude that it is different from 86.3 lbs.

Making a Type II error means the mean grip strength is actually different from 86.3 lbs., but we mistakenly conclude that it is equal to 86.3 lbs.



test statistic: a statistic which is used to decide whether to believe H0 or to believe H1

It is the test statistic which provides us with evidence to make our decision in a hypothesis test.

Now let us go to Class Exercise 6(d).

(d) Suppose we plan to measure each right hand grip strength in a random sample of 16 men from Techavia. If we assume that either the grip strengths are normally distributed or the sample size 16 is sufficiently large so that the sampling distribution of $\bar{x}$ is approximately normal, what test statistic would be appropriate for us to use to decide whether to believe H0 or to believe H1?

$t = \dfrac{\bar{x} - 86.3}{s/\sqrt{16}}$

If H0 were true, then $\dfrac{\bar{x} - 86.3}{s/\sqrt{16}}$ would be the t-score for $\bar{x}$, where df = 15, and we expect this t-score to be within the bounds of random variation.

If H0 were not true, then we would expect the t-score to be outside the bounds of random variation.

Consequently, we can use this t-score as a test statistic to decide whether to believe H0 or to believe H1, but we need to choose specific bounds for what should be considered random variation.



significance level ($\alpha$): the highest probability of making a Type I error that we are willing to tolerate, commonly chosen to be 0.10, 0.05, or 0.01

With a given sample size $n$, the probability of making a Type II error increases as we decrease $\alpha$ (the probability of making a Type I error).

rejection (critical) region: a set of test statistic values which lead to rejecting H0 in favor of H1

(e) Find the rejection region for the hypothesis test if

(i) a 0.05 significance level were chosen.

$\alpha = 0.05$, $1 - \alpha = 0.95$, $\alpha/2 = 0.025$

t distribution with df = 15: $t_{0.025} = 2.131$, $-t_{0.025} = -2.131$

The rejection region is defined to be all test statistic values $t > 2.131$ or $t < -2.131$.

(ii) a 0.01 significance level were chosen.

$\alpha = 0.01$, $1 - \alpha = 0.99$, $\alpha/2 = 0.005$

t distribution with df = 15: $t_{0.005} = 2.947$, $-t_{0.005} = -2.947$

The rejection region is defined to be all test statistic values $t > 2.947$ or $t < -2.947$.

(f) Suppose we actually measure each right hand grip strength in a random sample of 16 men from Techavia, and we find that $\bar{x} = 91.0$ lbs. and $s = 7.8$ lbs. Find the test statistic value, and find the p-value for the hypothesis test.

The observed test statistic value is

$t$ (or $t_{15}$) $= \dfrac{\bar{x} - 86.3}{s/\sqrt{16}} = \dfrac{91.0 - 86.3}{7.8/\sqrt{16}} = 2.410$

Note that this observed test statistic

provides us with sufficient evidence against H0 (that is, t = 2.410 is in the rejection region) with $\alpha = 0.05$;

does not provide us with sufficient evidence against H0 (that is, t = 2.410 is not in the rejection region) with $\alpha = 0.01$.
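The test statistic in part (f) and its comparison with the rejection regions from part (e) can be sketched in Python as follows. The function name `one_sample_t` is our own, and the critical values are typed in from Table 3 (df = 15) rather than computed.

```python
import math

def one_sample_t(xbar, mu0, s, n):
    """One-sample t statistic: (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))

t = one_sample_t(xbar=91.0, mu0=86.3, s=7.8, n=16)

# Two-sided critical values t_{alpha/2} for df = 15, from part (e)
critical = {0.05: 2.131, 0.01: 2.947}

# For a two-sided test, reject H0 when |t| exceeds the critical value
decisions = {alpha: abs(t) > crit for alpha, crit in critical.items()}

print(f"t = {t:.3f}")
for alpha, reject in decisions.items():
    verdict = "in" if reject else "not in"
    print(f"alpha = {alpha}: t is {verdict} the rejection region")
```

The same sample therefore leads to different decisions at the two significance levels, which is why $\alpha$ should be chosen before looking at the data.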

Next class, we shall define and calculate the p-value.

Before submitting Homework #3, check some of the answers (if you haven’t done so already) from the link on the course schedule:

http://srv2.lycoming.edu/~sprgene/M214/Schedule214.htm
