SAMPLING AND SAMPLING DISTRIBUTIONS


Page 1: SAMPLING    AND  SAMPLING DISTRIBUTIONS

SAMPLING AND SAMPLING DISTRIBUTIONS

Page 2: SAMPLING    AND  SAMPLING DISTRIBUTIONS

CONTENTS

STATISTICS IN PRACTICE: MEAD CORPORATION
7.1 THE ELECTRONICS ASSOCIATES SAMPLING PROBLEM
7.2 SIMPLE RANDOM SAMPLING
    Sampling from a Finite Population
    Sampling from an Infinite Population
7.3 POINT ESTIMATION
7.4 INTRODUCTION TO SAMPLING DISTRIBUTIONS
7.5 SAMPLING DISTRIBUTION OF \(\bar{x}\)
    Expected Value of \(\bar{x}\)
    Standard Deviation of \(\bar{x}\)
    Central Limit Theorem
    Sampling Distribution of \(\bar{x}\) for the EAI Sampling Problem
    Practical Value of the Sampling Distribution of \(\bar{x}\)
    Relationship Between the Sample Size and the Sampling Distribution of \(\bar{x}\)

Page 3: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.6 SAMPLING DISTRIBUTION OF \(\bar{p}\)
    Expected Value of \(\bar{p}\)
    Standard Deviation of \(\bar{p}\)
    Form of the Sampling Distribution of \(\bar{p}\)
    Practical Value of the Sampling Distribution of \(\bar{p}\)
7.7 PROPERTIES OF POINT ESTIMATORS
    Unbiasedness
    Efficiency
    Consistency
7.8 OTHER SAMPLING METHODS
    Stratified Random Sampling
    Cluster Sampling
    Systematic Sampling
    Convenience Sampling
    Judgment Sampling

Page 4: SAMPLING    AND  SAMPLING DISTRIBUTIONS

WHY WE SHOULD USE SAMPLES

Reasons for using samples:

1. The population is too large: with so many elements, it is impractical to observe every one of them to collect the necessary data.
2. Collecting data on every element costs too much time and money, and the results would not be timely.
3. The examination is destructive: testing an element destroys it, as with artillery shells, light bulbs, and bricks.

Page 5: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.1 THE ELECTRONICS ASSOCIATES SAMPLING PROBLEM

The director of personnel for Electronics Associates, Inc. (EAI), has been assigned the task of developing a profile of the company’s 2500 managers. The characteristics to be identified include the mean annual salary for the managers and the proportion of managers having completed the company’s management training program.

Using the 2500 managers as the population for this study, we can find the annual salary and the training program status for each individual by referring to the firm’s personnel records. The data file containing this information for all 2500 managers in the population is on the disk at the back of the book.

Page 6: SAMPLING    AND  SAMPLING DISTRIBUTIONS

Using the formulas presented in Chapter 3, we can compute the population mean and the population standard deviation for the annual salary data.

Population mean: \(\mu\) = $51,800
Population standard deviation: \(\sigma\) = $4000

Furthermore, the data for the training program status show that 1500 of the 2500 managers have completed the training program. Letting p denote the proportion of the population having completed the training program, we see that p = 1500/2500 = .60.

Now suppose that the necessary information on all the EAI managers was not readily available in the company’s database, and that a sample of 30 managers will be used instead. Clearly, the time and the cost of developing a profile would be substantially less for 30 managers than for the entire population.

Page 7: SAMPLING    AND  SAMPLING DISTRIBUTIONS

If the personnel director could be assured that a sample of 30 managers would provide adequate information about the population of 2500 managers, working with a sample would be preferable to working with the entire population. Let us explore the possibility of using a sample for the EAI study by first considering how we can identify a sample of 30 managers.

Page 8: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.2 SIMPLE RANDOM SAMPLING

Several methods can be used to select a sample from a population; one of the most common is simple random sampling.

7.2.1 Sampling from a Finite Population

Simple Random Sample (Finite Population)

A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.

In implementing the simple random sample selection process, it is possible that a random number used previously may appear again in the table before the sample of 30 EAI managers has been selected. Because we do not want to select a manager more than one time, any previously used random numbers are ignored because the corresponding manager is already included in the sample. Selecting a sample in this manner is referred to as sampling without replacement.

Page 9: SAMPLING    AND  SAMPLING DISTRIBUTIONS

If we had selected the sample such that previously used random numbers were acceptable and specific managers could be included in the sample two or more times, we would be sampling with replacement.

(When we refer to simple random sampling, we will assume that the sampling is without replacement.)

The number of different simple random samples of size n that can be selected from a finite population of size N is

\[ \frac{N!}{n!\,(N-n)!} \]
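The counting rule and the idea of sampling without replacement can be sketched with Python's standard library. This is only an illustration: the manager IDs 1 to 2500 are an assumed labeling (not the EAI data file), and the seed is arbitrary.

```python
import math
import random

N = 2500  # population size (EAI managers)
n = 30    # sample size

# Number of different simple random samples of size n: N! / (n! (N - n)!)
num_samples = math.comb(N, n)

# Hypothetical labeling: managers are numbered 1..N.
manager_ids = range(1, N + 1)

# random.sample draws without replacement, so no manager appears twice.
rng = random.Random(7)  # arbitrary seed, only for reproducibility
sample_ids = rng.sample(manager_ids, n)

print(f"{num_samples:.3e} possible samples")
print(sorted(sample_ids))
```

Sampling with replacement would instead allow the same ID to be drawn more than once, for example by drawing each ID independently with `rng.choice`.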

Page 10: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.2.2 Sampling from an Infinite Population

Simple Random Sample (Infinite Population)

A simple random sample from an infinite population is a sample selected such that the following conditions are satisfied.

1. Each element selected comes from the same population.
2. Each element is selected independently.

For example, populations consisting of all possible parts to be manufactured, all possible customer visits, all possible bank transactions, and so on can be classified as infinite populations.

Page 11: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.3 POINT ESTIMATION

Now, let us return to the EAI problem. Assume that a simple random sample of 30 managers has been selected and that the corresponding data on annual salary and management training program participation are as shown in Table 7.2.

To estimate the value of a population parameter, we compute a corresponding characteristic of the sample, referred to as a sample statistic. For example, to estimate the population mean \(\mu\) and the population standard deviation \(\sigma\) for the annual salary of EAI managers, we simply use the data in Table 7.2 to calculate the corresponding sample statistics: the sample mean \(\bar{x}\) and the sample standard deviation \(s\). The sample mean is

\[ \bar{x} = \frac{\sum x_i}{n} = \frac{1{,}554{,}420}{30} = \$51{,}814.00 \]

Page 12: SAMPLING    AND  SAMPLING DISTRIBUTIONS

And the sample standard deviation is

\[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{325{,}009{,}260}{29}} = \$3347.72 \]

In addition, by computing the proportion of managers in the sample who responded Yes, we can estimate the proportion of managers in the population who have completed the management training program. Table 7.2 shows that 19 of the 30 managers in the sample have completed the training program. Thus, the sample proportion, denoted by \(\bar{p}\), is given by

\[ \bar{p} = \frac{19}{30} = .63 \]

This value is used as an estimate of the population proportion \(p\).

Page 13: SAMPLING    AND  SAMPLING DISTRIBUTIONS

By making the preceding computations, we have performed the statistical procedure called point estimation. We refer to \(\bar{x}\) as the point estimator of the population mean \(\mu\), \(s\) as the point estimator of the population standard deviation \(\sigma\), and \(\bar{p}\) as the point estimator of the population proportion \(p\). The actual numerical value obtained for \(\bar{x}\), \(s\), or \(\bar{p}\) in a particular sample is called the point estimate of the parameter.
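The three point estimates can be reproduced from the totals quoted in the text. The sketch below uses only those totals, because the individual salaries of Table 7.2 are not reproduced here; the variable names are illustrative.

```python
import math

n = 30                    # sample size
sum_x = 1_554_420         # sum of the 30 sampled salaries
sum_sq_dev = 325_009_260  # sum of squared deviations from the sample mean
completed = 19            # sampled managers who completed the program

x_bar = sum_x / n                    # point estimate of the population mean
s = math.sqrt(sum_sq_dev / (n - 1))  # point estimate of the population std dev
p_bar = completed / n                # point estimate of the population proportion

print(f"x_bar = {x_bar:.2f}, s = {s:.2f}, p_bar = {p_bar:.2f}")
```

Note that the divisor for the sample standard deviation is n - 1 = 29, not n.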

Page 14: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.4 INTRODUCTION TO SAMPLING DISTRIBUTIONS

The probability distribution of any particular sample statistic is called the sampling distribution of the statistic.

Because the various possible values of \(\bar{x}\) and \(\bar{p}\) are the result of different simple random samples, the probability distributions of \(\bar{x}\) and \(\bar{p}\) are called the sampling distributions of \(\bar{x}\) and \(\bar{p}\).

Page 15: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.5 SAMPLING DISTRIBUTION OF \(\bar{x}\)

The sampling distribution of \(\bar{x}\) is the probability distribution of all possible values of the sample mean, \(\bar{x}\).

THE STATISTICAL PROCESS OF USING A SAMPLE MEAN TO MAKE INFERENCES ABOUT A POPULATION MEAN

1. Population with mean \(\mu\) = ?
2. A simple random sample of \(n\) elements is selected from the population.
3. The sample data provide a value for the sample mean \(\bar{x}\).
4. The value of \(\bar{x}\) is used to make inferences about the value of \(\mu\).

Page 16: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.5.1 Expected Value of \(\bar{x}\)

\[ E(\bar{x}) = \mu \]

where
\(E(\bar{x})\) = the expected value of \(\bar{x}\)
\(\mu\) = the population mean

This result shows that with simple random sampling, the expected value or mean for \(\bar{x}\) is equal to the mean of the population.
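For a very small population the result can be checked by listing every possible sample. The sketch below assumes a made-up population of five salaries (in thousands of dollars) and samples of size 2; it is an illustration of the result, not part of the EAI study.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of N = 5 salaries (in $1000s); values are made up.
population = [48, 50, 52, 53, 56]
n = 2

mu = Fraction(sum(population), len(population))

# Under simple random sampling every possible sample of size n is equally likely.
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]

# Expected value of x-bar = average of the sample means over all possible samples.
expected_x_bar = sum(sample_means) / len(sample_means)

print(mu, expected_x_bar)
```

Exact fractions are used so that the two values can be compared without rounding error.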

Page 17: SAMPLING    AND  SAMPLING DISTRIBUTIONS

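7.5.2 Standard Deviation of \(\bar{x}\)

Let us define the standard deviation of the sampling distribution of \(\bar{x}\). We will use the following notation.

\(\sigma_{\bar{x}}\) = the standard deviation of the sampling distribution of \(\bar{x}\)
\(\sigma\) = the standard deviation of the population
\(n\) = the sample size
\(N\) = the population size

Standard Deviation of \(\bar{x}\)

Finite Population:
\[ \sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}} \left( \frac{\sigma}{\sqrt{n}} \right) \]

Infinite Population:
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]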

Page 18: SAMPLING    AND  SAMPLING DISTRIBUTIONS

We can see that the factor \(\sqrt{(N-n)/(N-1)}\) is required for the finite population case but not for the infinite population case. This factor is commonly referred to as the finite population correction factor.

Use the Following Expression to Calculate the Standard Deviation of \(\bar{x}\)

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

Whenever

1. The population is infinite; or
2. The population is finite and the sample size is less than or equal to 5% of the population size; that is, \(n/N \le .05\).
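To see how little the finite population correction factor matters when n/N is small, the following sketch evaluates both expressions for the EAI values (σ = 4000, n = 30, N = 2500). The function name `standard_error_mean` is illustrative.

```python
import math

def standard_error_mean(sigma, n, N=None):
    """Standard deviation of x-bar; N=None means an infinite population."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))  # finite population correction factor
    return se

sigma, n, N = 4000, 30, 2500

with_fpc = standard_error_mean(sigma, n, N)
without_fpc = standard_error_mean(sigma, n)

print(f"n/N = {n / N:.3f}")
print(f"with correction:    {with_fpc:.2f}")
print(f"without correction: {without_fpc:.2f}")
```

Here n/N = .012, well under .05, and the two results differ by only a few dollars, which is why the simpler expression is used.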

Page 19: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.5.3 Central Limit Theorem

The final step in identifying the characteristics of the sampling distribution of \(\bar{x}\) is to determine the form of the probability distribution of \(\bar{x}\). We consider two cases: one in which the population distribution is unknown and one in which the population distribution is known to be normally distributed.

When the population distribution is unknown, we rely on one of the most important theorems in statistics: the central limit theorem. A statement of the central limit theorem as it applies to the sampling distribution of \(\bar{x}\) follows.

Central Limit Theorem

In selecting simple random samples of size \(n\) from a population, the sampling distribution of the sample mean \(\bar{x}\) can be approximated by a normal probability distribution as the sample size becomes large.

Page 20: SAMPLING    AND  SAMPLING DISTRIBUTIONS

ILLUSTRATION OF THE CENTRAL LIMIT THEOREM FOR THREE POPULATIONS

In summary, if we use a large (\(n \ge 30\)) simple random sample, the central limit theorem enables us to conclude that the sampling distribution of \(\bar{x}\) can be approximated by a normal probability distribution.
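A small simulation can make the theorem concrete. The sketch below assumes a strongly skewed population (exponential with mean 1, chosen only for illustration) and measures the skewness of the simulated sample means for a few sample sizes. A normal distribution has skewness 0, so values shrinking toward 0 suggest the sampling distribution is becoming more nearly normal.

```python
import random
import statistics

rng = random.Random(42)  # arbitrary seed for reproducibility

def sample_means(n, reps=4000):
    """Means of `reps` samples of size n from an exponential population (mean 1)."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

def skewness(values):
    """Average cubed standardized deviation; 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)

skew_by_n = {n: skewness(sample_means(n)) for n in (2, 5, 30)}

for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {skew:.2f}")
```

The sample sizes 2, 5, and 30 are arbitrary choices; any increasing sequence shows the same pattern.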

Page 21: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.5.4 Relationship Between the Sample Size and the Sampling Distribution of \(\bar{x}\)

A COMPARISON OF THE SAMPLING DISTRIBUTIONS OF \(\bar{x}\) FOR SIMPLE RANDOM SAMPLES OF \(n = 30\) AND \(n = 100\) EAI MANAGERS

With \(n = 100\), \(\sigma_{\bar{x}} = 400\)
With \(n = 30\), \(\sigma_{\bar{x}} = 730.30\)
Both sampling distributions are centered at the population mean, 51,800.

As the sample size is increased, the standard error of the mean is decreased. As a result, the larger sample size will provide a higher probability that the sample mean is within a specified distance of the population mean.
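The effect of the sample size on this probability can be computed directly under the normal approximation. In the sketch below, the $500 tolerance is a hypothetical "specified distance", and the infinite-population expression for the standard error is used because n/N is at most .05 for both sample sizes.

```python
from statistics import NormalDist

mu, sigma = 51_800, 4000   # EAI population mean and standard deviation
tolerance = 500            # hypothetical "specified distance" in dollars

def prob_within(n):
    """P(|x_bar - mu| <= tolerance) under the normal approximation."""
    se = sigma / n ** 0.5
    dist = NormalDist(mu, se)
    return dist.cdf(mu + tolerance) - dist.cdf(mu - tolerance)

p30, p100 = prob_within(30), prob_within(100)
print(f"n = 30:  {p30:.4f}")
print(f"n = 100: {p100:.4f}")
```

With these inputs the probability rises from roughly one half for n = 30 to nearly four fifths for n = 100.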

Page 22: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.6 SAMPLING DISTRIBUTION OF \(\bar{p}\)

The sampling distribution of \(\bar{p}\) is the probability distribution of all possible values of the sample proportion \(\bar{p}\).

THE STATISTICAL PROCESS OF USING A SAMPLE PROPORTION TO MAKE INFERENCES ABOUT A POPULATION PROPORTION

1. Population with proportion \(p\) = ?
2. A simple random sample of \(n\) elements is selected from the population.
3. The sample data provide a value for the sample proportion \(\bar{p}\).
4. The value of \(\bar{p}\) is used to make inferences about the value of \(p\).

Page 23: SAMPLING    AND  SAMPLING DISTRIBUTIONS

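7.6.1 Expected Value of \(\bar{p}\)

\[ E(\bar{p}) = p \]

where
\(E(\bar{p})\) = the expected value of \(\bar{p}\)
\(p\) = the population proportion

7.6.2 Standard Deviation of \(\bar{p}\)

Finite Population:
\[ \sigma_{\bar{p}} = \sqrt{\frac{N-n}{N-1}} \sqrt{\frac{p(1-p)}{n}} \]

Infinite Population:
\[ \sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} \]

We see that the only difference is the use of the finite population correction factor \(\sqrt{(N-n)/(N-1)}\).

Use the Following Expression to Calculate the Standard Deviation of \(\bar{p}\)

\[ \sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} \]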

Page 24: SAMPLING    AND  SAMPLING DISTRIBUTIONS

Whenever

1. The population is infinite; or
2. The population is finite and the sample size is less than or equal to 5% of the population size; that is, \(n/N \le .05\).

7.6.3 Form of the Sampling Distribution of \(\bar{p}\)

The sampling distribution of \(\bar{p}\) can be approximated by a normal probability distribution whenever the sample size is large.

With \(\bar{p}\), the sample size can be considered large whenever the following two conditions are satisfied.

\[ np \ge 5 \]
\[ n(1-p) \ge 5 \]
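For the EAI problem (p = .60, n = 30, N = 2500) these checks and the standard deviation of the sample proportion can be worked out as follows. The .05 tolerance in the last step is a hypothetical choice used only to show how the normal approximation would be applied.

```python
import math
from statistics import NormalDist

p, n, N = 0.60, 30, 2500   # EAI population proportion, sample size, population size

# Large-sample conditions for the normal approximation.
large_enough = n * p >= 5 and n * (1 - p) >= 5

# n/N <= .05, so the finite population correction factor is omitted.
se_p = math.sqrt(p * (1 - p) / n)

# Probability that p_bar falls within a hypothetical .05 of p.
dist = NormalDist(p, se_p)
prob = dist.cdf(p + 0.05) - dist.cdf(p - 0.05)

print(large_enough, round(se_p, 4), round(prob, 4))
```

Here np = 18 and n(1 - p) = 12, so both conditions hold and the normal approximation is reasonable.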

Page 25: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.7 PROPERTIES OF POINT ESTIMATORS

The properties of good point estimators are unbiasedness, efficiency, and consistency.

Because several different sample statistics can be used as point estimators of different population parameters, we will use the following general notation in this section.

\(\theta\) = the population parameter of interest
\(\hat{\theta}\) = the sample statistic or point estimator of \(\theta\)

In general, \(\theta\) represents any population parameter; \(\hat{\theta}\) represents the corresponding sample statistic.

Page 26: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.7.1 Unbiasedness

If the expected value of the sample statistic is equal to the population parameter being estimated, the sample statistic is said to be an unbiased estimator of the population parameter.

Unbiasedness

The sample statistic \(\hat{\theta}\) is an unbiased estimator of the population parameter \(\theta\) if

\[ E(\hat{\theta}) = \theta \]

where
\(E(\hat{\theta})\) = the expected value of the sample statistic \(\hat{\theta}\)

Hence, the expected value, or mean, of all possible values of an unbiased sample statistic is equal to the population parameter being estimated.

Page 27: SAMPLING    AND  SAMPLING DISTRIBUTIONS

EXAMPLES OF UNBIASED AND BIASED POINT ESTIMATORS

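(a) Unbiased Estimator: the parameter \(\theta\) is located at the mean of the sampling distribution of \(\hat{\theta}\); \(E(\hat{\theta}) = \theta\).

(b) Biased Estimator: the parameter \(\theta\) is not located at the mean of the sampling distribution of \(\hat{\theta}\); \(E(\hat{\theta}) \ne \theta\), and the distance between \(E(\hat{\theta})\) and \(\theta\) is the bias.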

Page 28: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.7.2 Efficiency

The point estimator with the smaller standard deviation is said to have greater relative efficiency than the other.

SAMPLING DISTRIBUTIONS OF TWO UNBIASED POINT ESTIMATORS

Sampling distribution of \(\hat{\theta}_1\)
Sampling distribution of \(\hat{\theta}_2\)
Parameter \(\theta\)

Note that the standard deviation of \(\hat{\theta}_1\) is less than the standard deviation of \(\hat{\theta}_2\); thus, values of \(\hat{\theta}_1\) have a greater chance of being close to the parameter \(\theta\) than do values of \(\hat{\theta}_2\). Because the standard deviation of point estimator \(\hat{\theta}_1\) is less than the standard deviation of point estimator \(\hat{\theta}_2\), \(\hat{\theta}_1\) is relatively more efficient than \(\hat{\theta}_2\) and is the preferred point estimator.
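One common illustration of relative efficiency, offered here as an assumed example rather than one taken from the slides, compares the sample mean and the sample median as estimators of the center of a normal population. Both are unbiased in that setting, so the one with the smaller standard deviation is the more efficient.

```python
import random
import statistics

rng = random.Random(2024)  # arbitrary seed for reproducibility
n, reps = 30, 3000

means, medians = [], []
for _ in range(reps):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # assumed normal population
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

sd_mean = statistics.pstdev(means)
sd_median = statistics.pstdev(medians)

print(f"sd of sample mean:   {sd_mean:.4f}")
print(f"sd of sample median: {sd_median:.4f}")
```

In this simulation the sample mean shows the smaller spread, so it plays the role of the more efficient estimator.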

Page 29: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.7.3 Consistency

Loosely speaking, a point estimator is consistent if the values of the point estimator tend to become closer to the population parameter as the sample size becomes larger. In other words, a large sample size tends to provide a better point estimate than a small sample size.

Note that for the sample mean \(\bar{x}\), we showed that the standard deviation of \(\bar{x}\) is given by \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\). Because \(\sigma_{\bar{x}}\) is related to the sample size such that larger sample sizes provide smaller values for \(\sigma_{\bar{x}}\), we conclude that a larger sample size tends to provide point estimates closer to the population mean \(\mu\). In this sense, we can say that the sample mean \(\bar{x}\) is a consistent estimator of the population mean \(\mu\). Using a similar rationale, we can also conclude that the sample proportion \(\bar{p}\) is a consistent estimator of the population proportion \(p\).

Page 30: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.8 OTHER SAMPLING METHODS

7.8.1 Stratified Random Sampling

In stratified random sampling, the elements in the population are first divided into groups called strata, such that each element in the population belongs to one and only one stratum. The basis for forming the strata, such as department, location, age, industry type, and so on, is at the discretion of the designer of the sample.

DIAGRAM FOR STRATIFIED RANDOM SAMPLING

Population

Stratum 1, Stratum 2, ..., Stratum H
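A minimal sketch of the idea follows. It assumes the usual second step, in which a simple random sample is taken from each stratum once the strata are formed. The departments, stratum sizes, and 10% sampling rate are all hypothetical.

```python
import random

rng = random.Random(3)  # arbitrary seed for reproducibility

# Hypothetical population: 100 element IDs, each tagged with one department.
departments = ["sales"] * 50 + ["engineering"] * 30 + ["support"] * 20
population = list(enumerate(departments))

# Step 1: divide the population into strata; each element lands in exactly one.
strata = {}
for element_id, dept in population:
    strata.setdefault(dept, []).append(element_id)

# Step 2 (assumed design): simple random sample of 10% of each stratum.
stratified_sample = {
    dept: rng.sample(ids, len(ids) // 10) for dept, ids in strata.items()
}

for dept, ids in stratified_sample.items():
    print(dept, sorted(ids))
```

Sampling the same fraction from every stratum is only one possible allocation; the designer of the sample may choose another.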

Page 31: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.8.2 Cluster Sampling

In cluster sampling, the elements in the population are first divided into separate groups called clusters. Each element of the population belongs to one and only one cluster.

DIAGRAM FOR CLUSTER SAMPLING

Population

Cluster 1 Cluster 2 Cluster K
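A minimal sketch of the idea follows. It assumes the usual second step, in which a simple random sample of clusters is taken and every element in each sampled cluster is included. The city blocks and households are hypothetical.

```python
import random

rng = random.Random(5)  # arbitrary seed for reproducibility

# Hypothetical population: 12 city blocks (clusters) of 8 households each.
clusters = {
    block: [f"block{block}-house{h}" for h in range(1, 9)]
    for block in range(1, 13)
}

# Assumed design: randomly pick 3 clusters, then keep every element in them.
chosen_blocks = rng.sample(sorted(clusters), 3)
cluster_sample = [house for block in chosen_blocks for house in clusters[block]]

print(chosen_blocks, len(cluster_sample))
```

Unlike stratified sampling, only some of the groups contribute elements to the sample.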

Page 32: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.8.3 Systematic Sampling

An alternative to simple random sampling is systematic sampling.

For example, if a sample size of 50 is desired from a population containing 5000 elements, we will sample one element for every 5000/50=100 elements in the population. A systematic sample for this case involves selecting randomly one of the first 100 elements from the population list. Other sample elements are identified by starting with the first sampled element and then selecting every 100th element that follows in the population list. In effect, the sample of 50 is identified by moving systematically through the population and identifying every 100th element after the first randomly selected element.
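The procedure in this example can be sketched as follows. The population list of positions 1 to 5000 and the seed are assumptions made for illustration.

```python
import random

N, n = 5000, 50
k = N // n                          # sampling interval: 5000 / 50 = 100

population = list(range(1, N + 1))  # hypothetical list positions 1..N

rng = random.Random(11)             # arbitrary seed for reproducibility
start = rng.randrange(k)            # random index among the first k elements

# Take the starting element and every k-th element after it.
systematic_sample = population[start::k]

print(len(systematic_sample), systematic_sample[:3])
```

Only the starting element is chosen at random; the other 49 follow from it at a fixed interval of 100.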

Page 33: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.8.4 Convenience Sampling

Convenience sampling is a nonprobability sampling technique. As the name implies, the sample is identified primarily by convenience. Elements are included in the sample without prespecified or known probabilities of being selected.

For example, a professor conducting research at a university may use student volunteers to constitute a sample simply because they are readily available and will participate as subjects for little or no cost.

Convenience samples have the advantage of relatively easy sample selection and data collection; however, it is impossible to evaluate the “goodness” of the sample in terms of its representativeness of the population.

Page 34: SAMPLING    AND  SAMPLING DISTRIBUTIONS

7.8.5 Judgment Sampling

One additional nonprobability sampling technique is judgment sampling. In this approach, the person most knowledgeable on the subject of the study selects elements of the population that he or she feels are most representative of the population. Often this method is a relatively easy way of selecting a sample.

For example, a reporter may sample two or three senators, judging that those senators reflect the general opinion of all senators. However, the quality of the sample results depends on the judgment of the person selecting the sample. Again, great caution is warranted in drawing conclusions based on judgment samples used to make inferences about populations.

Page 35: SAMPLING    AND  SAMPLING DISTRIBUTIONS

SUMMARY

GLOSSARY

Parameter, Simple random sampling, Sampling without replacement, Sampling with replacement, Sample statistic, Point estimate, Point estimator, Sampling error, Sampling distribution, Finite population correction factor, Standard error, Central limit theorem, Unbiasedness, Relative efficiency, Consistency, Stratified random sampling, Cluster sampling, Systematic sampling, Convenience sampling.

Page 36: SAMPLING    AND  SAMPLING DISTRIBUTIONS

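KEY FORMULAS

Expected Value of \(\bar{x}\)
\[ E(\bar{x}) = \mu \]

Standard Deviation of \(\bar{x}\)

Finite Population:
\[ \sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}} \left( \frac{\sigma}{\sqrt{n}} \right) \]

Infinite Population:
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

Expected Value of \(\bar{p}\)
\[ E(\bar{p}) = p \]

Standard Deviation of \(\bar{p}\)

Finite Population:
\[ \sigma_{\bar{p}} = \sqrt{\frac{N-n}{N-1}} \sqrt{\frac{p(1-p)}{n}} \]

Infinite Population:
\[ \sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} \]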