3_Sampling, Sampling Bias the Central Limit Theorem

38
SAMPLING, SAMPLING BIAS AND THE CENTRAL LIMIT THEOREM

description

SAMPLING

Transcript of 3_Sampling, Sampling Bias the Central Limit Theorem

Page 1: 3_Sampling, Sampling Bias the Central Limit Theorem

SAMPLING, SAMPLING BIAS AND THE CENTRAL LIMIT THEOREM

Page 2: 3_Sampling, Sampling Bias the Central Limit Theorem

Sampling Bias - A Cautionary TaleDuring World War II, the Royal Air Force asked Abraham Wald, a statistician, to help decide where armour should be added to the UK's bombers. The RAF gave Wald information about which parts of its planes were typically hit.

Armour is heavy and can only be added to certain parts of the planes - but to which parts?

What should the statistician’s response to this problem be?

Section of Plane Bullet Holes per Square foot

Engine 1.11

Fuselage 1.73

Fuel System 1.55

Rest of the Plane 1.8

Page 3: 3_Sampling, Sampling Bias the Central Limit Theorem

Sampling Bias - A Cautionary Tale

• Wald's response was simple, brilliant, and surprising: Armour the spots that hadn't been hit by German fire.

Why?

Page 4: 3_Sampling, Sampling Bias the Central Limit Theorem

Sampling Bias - A Cautionary Tale• This seems backward at first, but Wald realized his data came from

bombers that survived. That is, the British were only able to analyze the bombers that returned to England; those that were shot down over enemy territory were not part of their sample. These bombers’ wounds showed where they could afford to be hit. Said another way, the undamaged areas on the survivors showed where the lost planes must have been hit because the planes hit in those areas did not return from their missions.

• Wald assumed that the bullets were fired randomly, that no one could accurately aim for a particular part of the bomber. Instead they aimed in the general direction of the plane and sometimes got lucky. So, for example, if Wald saw that more bombers in his sample had bullet holes in the middle of the wings, he did not conclude that Nazis liked to aim for the middle of wings. He assumed that there must have been about as many bombers with bullet holes in every other part of the plane but that those with holes elsewhere were not part of his sample because they had been shot down.

Page 5: 3_Sampling, Sampling Bias the Central Limit Theorem

Sampling Bias• A sampling method is called biased if it systematically

favours some outcomes over others. • Bias can be intentional, but often it is not. • Representativeness is more important than randomness

Page 6: 3_Sampling, Sampling Bias the Central Limit Theorem

Examples of Sampling Biaseven though there is some randomness in the selection

• Selection from a specific real area. For example, a survey of high school students to measure teenage use of illegal drugs will be a biased sample because it does not include home-schooled students or dropouts. A sample is also biased if certain members are underrepresented or overrepresented relative to others in the population. For example, a "man on the street" interview which selects people who walk by a certain location is going to have an overrepresentation of healthy individuals who are more likely to be out of the home than individuals with a chronic illness. This may be an extreme form of biased sampling, because certain members of the population are totally excluded from the sample (that is, they have zero probability of being selected).

• Self-selection bias, which is possible whenever the group of people being studied has any form of control over whether to participate. Participants' decision to participate may be correlated with traits that affect the study, making the participants a non-representative sample. For example, people who have strong opinions or substantial knowledge may be more willing to spend time answering a survey than those who do not.

Page 7: 3_Sampling, Sampling Bias the Central Limit Theorem

Examples of Sampling Biaseven though there is some randomness in the selection

• Pre-screening of trial participants, or advertising for volunteers within particular groups. For example a study to "prove" that smoking does not affect fitness might recruit at the local fitness center, but advertise for smokers during the advanced aerobics class, and for non-smokers during the weight loss sessions.

• Exclusion bias results from exclusion of particular groups from the sample, e.g. exclusion of subjects who have recently migrated into the study area (this may occur when newcomers are not available in a register used to identify the source population). Excluding subjects who move out of the study area during follow-up is rather equivalent of dropout or nonresponse, a selection bias in that it rather affects the internal validity of the study.

Page 8: 3_Sampling, Sampling Bias the Central Limit Theorem

Why Sample the Population?1. To contact the whole population would be time-

consuming.

2. The cost of studying all the items in a population may be prohibitive.

3. The physical impossibility of checking all items in the population.

4. The destructive nature of some tests.

5. The sample results are adequate.

8-8

Page 9: 3_Sampling, Sampling Bias the Central Limit Theorem

Most Commonly Used Probability Sampling Methods

• Simple Random Sample • Systematic Random Sampling

• Stratified Random Sampling

• Cluster Sampling

8-9

Page 10: 3_Sampling, Sampling Bias the Central Limit Theorem

Simple Random Sample

Simple Random Sample: A sample selected so that each item or person in the population has the same chance of being included.

8-10

Page 11: 3_Sampling, Sampling Bias the Central Limit Theorem

Simple Random Sample: Using Tables of Random Numbers

A population consists of 845 employees of Nitra Industries. A sample of 52 employees is to be selected from that population.

A more convenient method of selecting a random sample is to use the identification number of each employee and a table of random numbers

8-11

Page 12: 3_Sampling, Sampling Bias the Central Limit Theorem

Simple Random Sample: Using Excel

Jane and Joe Miley operate the Foxtrot Inn, a bed and breakfast in Tryon, North Carolina. There are eight rooms available for rent at this B&B. Listed below is the number of these eight rooms rented each day during June 2008. Use Excel to select a sample of five nights during the month of June.

8-12

Page 13: 3_Sampling, Sampling Bias the Central Limit Theorem

Systematic Random Sampling

EXAMPLEA population consists of 845

employees of Nitra Industries. A sample of 52 employees is to be selected from that population.

Use the systematic random sampling to select the samples.

Systematic Random Sampling: The items or individuals of the population are arranged in some order. A random starting point is selected and then every kth member of the population is selected for the sample.

-13

Page 14: 3_Sampling, Sampling Bias the Central Limit Theorem

Systematic Random Sampling

Given: N=845, n=52Step 1: Calculate k

k = N/n = 845/52 (round down)k = 16

Step 2: Use simple random sampling to select the first sample.

Step 3: Select every 16th element on the list after Step 2

8-14

Page 15: 3_Sampling, Sampling Bias the Central Limit Theorem

Stratified Random Sampling

Stratified Random Sampling: A population is first divided into subgroups, called strata, and a sample is selected from each stratum. Useful when a population can be clearly divided in groups based on some characteristics.

8-15

Page 16: 3_Sampling, Sampling Bias the Central Limit Theorem

Cluster Sampling

Cluster Sampling: A population is divided into clusters using naturally occurring geographic or other boundaries. Then, clusters are randomly selected and a sample is collected by randomly selecting from each cluster.

8-16

Page 17: 3_Sampling, Sampling Bias the Central Limit Theorem

Cluster Sampling

Suppose you want to determine the views of residents in Oregon about state and federal environmental protection policies.

Step 1: Subdivide the state into small units—either counties or regions, Step 2: Select at random, say 4 regions, then Step 3: Take samples of the residents in each of these regions and interview them.

8-17

Page 18: 3_Sampling, Sampling Bias the Central Limit Theorem

Sampling Error

The sampling error is the difference between a sample statistic and its corresponding population parameter.

:

8-18

Page 19: 3_Sampling, Sampling Bias the Central Limit Theorem

Sampling Distribution of the Sample Mean

The sampling distribution of the sample mean is a probability distribution consisting of all possible sample means of a given sample size selected from a population.

8-19

Page 20: 3_Sampling, Sampling Bias the Central Limit Theorem

Sampling Distribution of the Sample Means – Example

Tartus Industries has seven production employees (considered the population). The hourly earnings of each employee are given in the table below.

1. What is the population mean?

2. What is the sampling distribution of the sample mean for samples of size 2?

3. What is the mean of the sampling distribution?

4. What observations can be made about the population and the sampling distribution?

8-20

Page 21: 3_Sampling, Sampling Bias the Central Limit Theorem

Sampling Distribution of the Sample Means – Example

8-21

Page 22: 3_Sampling, Sampling Bias the Central Limit Theorem

Sampling Distribution of the Sample Means – Example

8-22

Page 23: 3_Sampling, Sampling Bias the Central Limit Theorem

Sampling Distribution of the Sample Means – Example

8-23

Page 24: 3_Sampling, Sampling Bias the Central Limit Theorem

Sampling Distribution of the Sample Means – Example

8-24

Page 25: 3_Sampling, Sampling Bias the Central Limit Theorem

Sampling Distribution of the Sample Means – Example

These observations can be made:A. The mean of the distribution of the sample mean ($7.71) is equal to the mean of the population.B. The spread in the distribution of the sample mean is less than the spread in the population values. C. As the size of the sample is increased, the spread of the distribution of the sample mean becomes smaller.D. The shape of the sampling distribution of the sample mean and the shape of the frequency distribution of the population values are different. The distribution of the sample mean tends to be more bell-shaped and to approximate the normal probability distribution.

8-25

Page 26: 3_Sampling, Sampling Bias the Central Limit Theorem

Central Limit Theorem

• For any sample size, the sampling distribution of the sample mean will also be normal if the population follows a normal probability distribution.

• If the population distribution is symmetrical (but not normal), the normal shape of the distribution of the sample mean emerges with samples as small as 10.

• If a distribution is skewed or has thick tails, it may require samples of 30 or more to observe the normality feature.

• The mean of the sampling distribution is equal to μ and the variance is equal to σ2/n.

CENTRAL LIMIT THEOREM If all samples of a particular size are selected from any population, the sampling distribution of the sample mean is approximately a normal distribution. This approximation improves with larger samples.

8-26

Page 27: 3_Sampling, Sampling Bias the Central Limit Theorem

Sampling Methods and the Central Limit Theorem

8-27

Page 28: 3_Sampling, Sampling Bias the Central Limit Theorem

Using the Sampling Distribution of the Sample Mean (Sigma Known) – Example

Step 2: Find the probability of observing a z equal to or greater than 1.80.

8-28

Page 29: 3_Sampling, Sampling Bias the Central Limit Theorem

Central Limit Theorem – Example

Spence Sprockets, Inc. employs 40 people and faces some major decisions regarding health care for these employees. Before making a final decision on what health care plan to purchase, Ed decides to form a committee of five representative employees. The committee will be asked to study the health care issue carefully and make a recommendation as to what plan best fits the employees’ needs. Ed feels the views of newer employees toward health care may differ from those of more experienced employees. If Ed randomly selects this committee, what can he expect in terms of the mean years with Spence Sprockets for those on the committee? How does the shape of the distribution of years of experience of all employees (the population) compare with the shape of the sampling distribution of the mean? The lengths of service (rounded to the nearest year) of the 40 employees currently on the Spence Sprockets, Inc., payroll are as follows.

8-29

Page 30: 3_Sampling, Sampling Bias the Central Limit Theorem

25 Samples of 20 Employees25 Samples of Five Employees

Central Limit Theorem – Example

8-30

Page 31: 3_Sampling, Sampling Bias the Central Limit Theorem

Standard Error of the Mean

1. The mean of the distribution of sample means will be exactly equal to the population mean if we are able to select all possible samples of the same size from a given population.

2. There will be less dispersion in the sampling distribution of the sample mean than in the population. As the sample size increases, the standard error of the mean decreases

8-31

Page 32: 3_Sampling, Sampling Bias the Central Limit Theorem

Using the SamplingDistribution of the Sample Mean (Sigma Known)

• If a population follows the normal distribution, the sampling distribution of the sample mean will also follow the normal distribution.

• If the shape is known to be non-normal, but the sample contains at least 30 observations, the central limit theorem guarantees the sampling distribution of the mean follows a normal distribution.

• To determine the probability a sample mean falls within a particular region, use:

n

Xz

8-32

Page 33: 3_Sampling, Sampling Bias the Central Limit Theorem

Using the Sampling Distribution of the Sample Mean (Sigma Unknown)

• If the population does not follow the normal distribution, but the sample is of at least 30 observations, the distribution of the sample means will follow the normal distribution.

• To determine the probability a sample mean falls within a particular region, use:

ns

Xt

8-33

Page 34: 3_Sampling, Sampling Bias the Central Limit Theorem

Using the Sampling Distribution of the Sample Mean (Sigma Known) – Example

The Quality Assurance Department for Cola, Inc., maintains records regarding the amount of cola in its Jumbo bottle. The actual amount of cola in each bottle is critical, but varies a small amount from one bottle to the next. Cola, Inc., does not wish to underfill the bottles. On the other hand, it cannot overfill each bottle. Its records indicate that the amount of cola follows the normal probability distribution. The mean amount per bottle is 31.2 ounces and the population standard deviation is 0.4 ounces.

At 8 A.M. today the quality technician randomly selected 16 bottles from the filling line. The mean amount of cola contained in the bottles is 31.38 ounces.

Is this an unlikely result? Is it likely the process is putting too much soda in the bottles? To put it another way, is the sampling error of 0.18 ounces unusual?

8-34

Page 35: 3_Sampling, Sampling Bias the Central Limit Theorem

Using the Sampling Distribution of the Sample Mean(Sigma Known) – Example

Step 1: Find the z value corresponding to the sample mean of 31.38.

80.1164.0$

20.3138.31

n

Xz

8-35

Page 36: 3_Sampling, Sampling Bias the Central Limit Theorem

Using the Standard Normal Distribution Table in Finding Probability – Example

? 1.80)P( Find z

8-36

Page 37: 3_Sampling, Sampling Bias the Central Limit Theorem

Using the Sampling Distribution of the Sample Mean (Sigma Known) – Example

What do we conclude? It is unlikely, less than a 4 percent

chance, we could select a sample of 16 observations from a normal population with a mean of 31.2 ounces and a population standard deviation of 0.4 ounces and find the sample mean equal to or greater than 31.38 ounces.

We conclude the process is putting too much cola in the bottles.

8-37

Page 38: 3_Sampling, Sampling Bias the Central Limit Theorem

SAMPLING, SAMPLING BIAS AND THE CENTRAL LIMIT THEOREM