7

of 27 /27
Point Estimates: The sample mean is the best estimator of the population mean . It is unbiased, consistent, the most efficient estimator, and, as long as the sample is sufficiently large, its sampling distribution can be approximated by the normal distribution. If we know the sampling distribution of , we can make statements about any estimate we may make from sampling information. Let's look at a medical-supplies company that produces disposable hypodermic syringes. Each syringe is wrapped in a sterile package and then jumble-packed in a large corrugated carton. Jumble packing causes the cartons to contain differing numbers of syringes. Because the syringes are sold on a per unit basis, the company needs an estimate of the number of syringes per carton for billing purposes. We have taken a sample 35 cartons at random and recorded the number of syringes in each carton. The following table illustrates our results. We know that syringes. Thus, using the sample mean as our estimator, the point estimate of the population mean is 102 syringes per carton. The manufactured price of a disposable hypodermic syringe is quite small (about 25 ), so both the buyer and seller would accept the use of this point estimate as the basis for billing and the manufacturer can save the time and expense of counting each syringe that goes into a cartoon. Table 1: Results of a sample of 35 cartoons of hypodermic syringes (syringes per cartoon) 101 103 112 102 98 97 93 105 100 97 107 93 94 97 97 100 110 106 110 103 99 93 98 106 100 112 105 100 114 97 110 102 98 112 99 Hence, . So, sample variance and sample standard deviation are respectively 36.12 and 6.01. Point Estimate of the Population Variance and Standard Deviation: Suppose the management of the medical-supplies company wants to estimate the variance and/or standard deviation of the distribution of the number of packaged syringes per carton. The most frequently used estimator of the population standard deviation is the sample standard deviation s. We can calculate the sample standard deviation as in Table 1 and discover that it is 6.01 syringes. If, instead of considering

Embed Size (px)

Transcript of 7

Point Estimates: The sample mean X is the best estimator of the population mean . It is unbiased, consistent, the most efficient estimator, and, as long as the sample is sufficiently large, its sampling distribution can be approximated by the normal distribution. If we know the sampling distribution of X , we can make statements about any estimate we may make from sampling information. Let's look at a medical-supplies company that produces disposable hypodermic syringes. Each syringe is wrapped in a sterile package and then jumble-packed in a large corrugated carton. Jumble packing causes the cartons to contain differing numbers of syringes. Because the syringes are sold on a per unit basis, the company needs an estimate of the number of syringes per carton for billing purposes. We have taken a sample 35 cartons at random and recorded the number of syringes in each carton. The following table illustrates our results. We know that X =

X = 3570 = 102 syringes.n35

Thus, using the sample mean X as our estimator, the point estimate of the population mean is 102 syringes per carton. The manufactured price of a disposable hypodermic syringe is quite small (about 25 C ), so both the buyer and seller would accept the use of this point estimate as the basis for billing and the manufacturer can save the time and expense of counting each syringe that goes into a cartoon. Table 1: Results of a sample of 35 cartoons of hypodermic syringes (syringes per cartoon) 101 105 97 93 114 103 100 100 98 97 112 97 110 106 110 102 107 106 100 102 98 93 110 112 98 97 94 103 105 112 93 97 99 100 99

2 2 nx 2 365368 35 ( 102 ) 2 x 2= 1 S ( x x ) = n 1 n 1 = 34 34 = 36.12. n 1 Hence, S = 36.12 = 6.01 .So, sample variance and sample standard deviation are respectively 36.12 and 6.01. Point Estimate of the Population Variance and Standard Deviation: Suppose the management of the medical-supplies company wants to estimate the variance and/or standard deviation of the distribution of the number of packaged syringes per carton. The most frequently used estimator of the population standard deviation is the sample standard deviation s. We can calculate the sample standard deviation as in Table 1 and discover that it is 6.01 syringes. If, instead of considering

S2 =

S2 =

specifically, it would tend to be too low. Using a divisor of n 1 gives us an unbiased estimator of cr2. Thus, we will use s2 and s to estimate cr2and . P o in t E s tim a te o f t h e P o p u la t io n P ro p o rt io n : The proportion of units that have a particular characteristic in a given population is symbolized p. If we know the proportion of units in a sample that have that same characteristic (symbolized p ), we can use this p as an estimator of p. It can be shown that

1 2 ( x x ) , the result would have some bias as an estimator of the population variance; n

1 2 ( x x ) n 1

as

our

sample

variance,

we

had

considered

has all the desirable properties we discussed earlier. Continuing our example of the manufacturer of medical supplies, we shall try to estimate the population proportion from the sample proportion. Suppose management wishes to estimate the number of cartons that will arrive damaged, owing to poor handling in shipment after the cartons leave the factory. We can check a sample of 50 cartons from their shipping point to the arrival at their destination and then record the presence or absence of damage. If, in this case, we find that the proportion of damaged cartons in the sample is 0.08, we would say that p = 0.08 = sample proportion damaged. Because the sample proportion p is a convenient estimator of the population proportion p, we can estimate that the proportion of damaged cartons in the population will also be 0.08. Problems to be solved: 1) A stadium authority is considering expanding its seating capacity and needs to know both the average number of people who attend events there and the variability in this number. The following are the attendances (in thousands) at nine randomly selected sporting events. Find point estimates of the mean and the variances of the population from which the sample was drawn. 8.8 14.0 21.3 7.9 12.5 20.6 16.3 14.1 13.0

p

2) The Pizza distribution authority (PDA) has developed quite a business in some area by delivering pizza orders promptly. PDA guarantees that its pizzas will be delivered in 30 minutes or less from the time the order was placed and if the delivery is late, the pizza is free. The time that it takes to deliver each pizza order that is on time is recorded in the Official Pizza Time Block (OPTB) and the delivery time for those pizzas that are delivered late is recorded as 30 minutes in the OPTB. 12 random entries from the OPTB are listed as 15.3 10.8 29.5 12.2 30.0 14.8 10.1 30.0 30.0 22.1 19.6 18.3

(a) Find the mean for the sample (b) From what population was this sample drawn? (c) Can this sample be used to estimate the average time that it takes for PDA to deliver a pizza? Explain. 3) Dr Ahmed, a meteorologist for local television station would like to report the average rainfall for today on this evenings newscast. The following are the rainfall measurements (in inches) for todays date for 16 randomly chosen past years. Determine the sample mean rainfall. 0.47 0.00 0.27 1.05 0.13 0.34 0.54 0.26 0.00 0.17 0.08 0.42 0.75 0.50 0.06 0.86

4) The Standard Chartered Bank is trying to determine the number of tellers available during the lunch rush on Sundays. The bank has collected data on the number of people who entered the bank during the last 3 months on Sunday from 11 A.M. to 1 P.M. Using the data below, find point estimates of the mean and standard deviation of the population from which the sample was drawn. 242 279 275 245 289 269 306 305 342 294 385 328

5) Electric Pizza was considering national distribution of its regionally successful product

and was compiling pro forma sales data. The average monthly sales figures (in thousands of dollars) from its 30 current distributors are listed. Treating them as (a) a sample and (b) a population, compute the standard deviation. 7.3 9.8 6.6 5.8 6.5 7.5 4.5 6.7 8.7 8.5 7.7 6.9 5.2 5.8 2.1 4.1 6.8 5.0 2.8 8.0 7.5 3.8 3.9 5.8 6.5 6.9 6.4 3.4 3.7 5.2

6) In a sample of 400 textile workers, 184 expressed extreme dissatisfaction regarding a prospective plan to modify working conditions. Because this dissatisfaction was strong enough to allow management to interpret plan reaction as being highly negative, they were curious about the proportion of total workers harboring this sentiment. Give a point estimate of this proportion. 7) The Friends of the Psychics network charges $3 per minute to learn the secrets that can turn your life around. The network charges for whole minutes only and rounds up to benefit the company. Thus, a 2 minute 10 second call costs $9. Below is a list of 15 randomly selected charges. 3 9 15 21 42 30 6 9 6 15 21 24 32 9 12 (a) Find the mean of the sample. (b) Find a point estimate of the variance of the population. (c) Can this sample be used to estimate the average length of a call? If so, what is your estimate? If not, what can we estimate using this sample? Interval Estimates : Basic Concepts The purpose of gathering samples is to learn more about a population. We can compute this information from the sample data as either point estimates, which we have just discussed, or as interval estimates, the subject of the rest of this chapter. An interval estimate describes a range of values within which a population parameter is likely to lie. Suppose the marketing research director needs an estimate of the average life in months of car batteries his company manufactures. We select a random sample of 200 batteries, record the car owners' names and addresses as listed in store records, and interview these owners about the battery life they have experienced. Our sample of 200 users has a mean battery life of 36 months. If we use the point estimate of the sample mean X as the best estimator of the population mean , we would report that the mean life of the company's batteries is 36 months. But the director also asks for a statement about the uncertainty that will be likely to accompany this estimate, that is, a statement about the range within which the unknown population mean is likely to lie. To provide such a statement, we need to find the standard error of the mean. We learned from earlier lectures that if we select and plot a large number of sample means from a population, the distribution of these means will approximate a normal curve. Furthermore, the mean of the sample means will be the same as the population mean. Our sample size of 200 is large enough that we can apply the central limit theorem. we can use the following formula and calculate the standard error of the mean: S ta n d a rd e rro r o f th e m x = ean =

. n

Suppose we have already estimated the standard deviation of the population of the batteries and reported that it is 10 months. So, we can calculate the standard error of the mean:

x =

n

=

10 = 0.707 m o n th . 200

We could now report to the director that our estimate of the life of the company's batteries is 36

months, and the standard error that accompanies this estimate is 0.707. In other words, the actual mean life for all the batteries may lie somewhere in the interval estimate of 35.293 to 36.707 months. This is helpful but insufficient information for the director. Next, we need to calculate the chance that the actual life will lie in this interval or in other intervals of different widths that we might choose, 2 (2 X 0.707), 3 (3 X 0.707), and so on. P r o b a b ilit y o f t h e T r u e P o p u la t io n P a ra m e t e r F a llin g w it h in t h e In t e r v a l E s t im a t e : To begin to solve this problem, we should review relevant parts of Probability Distribution. There we worked with the normal probability distribution and learned that specific portions of the area under the normal curve are located between plus and minus any given number of standard deviations from the mean. Fortunately, we can apply these properties to the standard error of the mean and make the following statement about the range of values used to make an interval estimate for our battery problem. The probability is 0.955 that the mean of a sample size of 200 will be within 2 standard errors of the population mean. Stated differently, 95.5 percent of all the sample means are within 2 standard errors from , and hence is within 2 standard errors of 95.5 percent of all the sample means. Theoretically, if we select 1,000 samples at random from a given population and then construct an interval of 2 standard errors around the mean of each of these samples, about 955 of these intervals will include the population mean. Similarly, the probability is 0.683 that the mean of the sample will be within 1 standard error of the population mean, and so forth. This theoretical concept is basic to our study of interval construction and statistical inference. In Figure 7-2, we have illustrated the concept graphically, showing five such intervals. Only the interval constructed around the sample mean X 4 does not contain the population mean. In words, statisticians would describe the interval estimates represented in Figure 7-2 by saying, "The population mean will be located within 2 standard errors from the sample mean 95.5 percent of the time." As far as any particular interval in Figure 7-2 is concerned, it either contains the population mean or it does not, because the population mean is a fixed parameter. Because we know that in 95.5 percent of all samples, the interval will contain the population mean, we say that we are 95.5 percent confident that the interval contains the population mean. Applying this to the battery example, we can now report to the director. Our best estimate of the life of the company's batteries is 36 months, and we are 68.3 percent confident that the life lies in the interval from 35.293 to 36.707 months (36 2 x ). Similarly, we are 95.5 percent confident that the life falls within the interval of 34.586 to 37.414 months (36 2 x ), and we are 99.7 percent confident that battery life falls within the interval of 33.879 to 38.121 months (36 3 x ). Figure 7.2: A number of intervals constructed around sample means; all except one include the population mean 95% of the means

Note: Every time you make an estimate there is an implied error in it. For people to understand this error, it's common practice to describe it with a statement like "Our best estimate of the life of this set of tires is 40,000 miles and we are 90 percent sure that the life will be between 35,000 and 45,000 miles." But if your boss demanded to know the precise average life of a set of tires, and if she were not into sampling, you'd have to watch hundreds of thousands of sets of tires being worn out and then calculate how long they lasted on average. Warning: Even then you'd be sampling because it's impossible to watch and measure every set of tires that's being used. It's a lot less expensive and a lot faster to use sampling to find the answer. And if you understand estimates, you can tell your boss what risks she is taking in using a sample to estimate real tire life. Problems to be solved: 1) For a population with a known variance of 185, a sample of 64 individuals leads to 217 as an estimate of the mean. (a) Find the standard error of the mean. (b) Establish an interval estimate that should include the population mean 68.3 percent of the time. 2) Urvashi is a frugal undergraduate at University who is interested in purchasing a used car. She randomly selected 125 want ads and found that the average price of a car in this sample was $3,250. Urvashi knows that the standard deviation of used-car prices in this city is $615. (a) Establish an interval estimate for the average price of a car so that Eunice can be 68.3 percent certain that the population mean lies within this interval. (b) Establish an interval estimate for the average price of a car so that Miss Gunterwal can be 95.5 percent certain that the population mean lies within this interval. 3) From a population known to have a standard deviation of 1.4, a sample of 60 individuals is taken. The mean for this sample is found to be 6.2. (a) Find the standard error of the mean. (b) Establish an interval estimate around the sample mean, using one standard error of the mean. 4) From a population with known standard deviation of 1.65, a sample of 32 items resulted in 34.8 as an estimate of the mean. (a) Find the standard error of the mean. (b) Compute an interval estimate that should include the population mean 99.7 percent of the time. 5) The University of North Carolina is conducting a study on the average weight of the many bricks that make up the University's walkways. Workers are sent to dig up and weigh a sample of 421 bricks and the average brick weight of this sample was 14.2 Ib. It is a well-known fact that the standard deviation of brick weight is 0.8 Ib. (a) Find the standard error of the mean.

(b) What is the interval around the sample mean that will include the population mean 95.5 percent of the time? 6) Because the owner of the Bard's Nook, a recently opened restaurant, has had difficulty estimating the quantity of food to be prepared each evening, he decided to determine the mean number of customers served each night. He selected a sample of 30 nights, which resulted in a mean of 71. The population standard deviation has been established as 3.76. (a) Give an interval estimate that has a 68.3 percent probability of including the population mean. (b) Give an interval estimate that has a 99.7 percent chance of including the population mean. 7) The manager of the Neuse River Bridge is concerned about the number of cars "running" the toll gates and is considering altering the toll-collection procedure if such alteration would be cost-effective. She randomly sampled 75 hours to determine the rate of violation. The resulting average violation per hour was 7. If the population standard deviation is known to be 0.9, estimate an interval that has a 95.5 percent chance of containing the true mean. 8) Gwen Taylor, apartment manager for WillowWood Apartments, wants to inform potential renters about how much electricity they can expect to use during August. She randomly selects 61 residents and discovers their average electricity usage in August to be 894b kilowatt hours (kwh). Gwen believes the variance in usage is about 1.31 (kwh) 2 . (a) Establish an interval estimate for the average August electricity usage so Gwen can be 68.3 percent certain the true population mean lies within this interval. (b) Repeat part (a) with a 99.7 percent certainty. (c) If the price per kwh is $0.12, within what interval can Gwen be 68.3 percent certain that the average August cost for electricity will lie? Interval Estimates and Confidence Intervals: In using interval estimates, we are not confined to 1, 2, 3 standard errors. According to Normal Table, for example 1.64 standard errors includes about 90 per cent of the area under the curve; it includes 0.4495 of the area on either side of the mean in a normal distribution. Similarly, 2.58 standard errors include about 99 per cent of the area, or 49.31 per cent on each side of the mean. In statistics, the probability that we associate with an interval estimate is called confidence level. This probability indicates how confident we are that the interval estimate will include the population parameter. A higher probability means more confidence. In estimation, the most commonly used confidence levels are 90 per cent and 99 per cent, but we are free to apply any confidence level. In Figure 7.2 , for example, we used a 95.5 per cent confidence level. The confidence interval is the range of the estimate we are making. If we report that we are 90 percent confident that the mean of the population of incomes of people in a certain community will lie between $8,000 and $24,000, then the range $8,000 - $24,000 is our confidence interval. Often, however, we will express the confidence interval in standard errors rather than in numerical values. Thus, we will often express confidence intervals like this: x 1.64 x , where

x + 1.64 x = upper limit of the confidence interval and x 1.64 x

= lower limit of the

confidence interval. Thus, confidence limits are the upper and lower limits of the confidence interval. In this case, x + 1.64 x is called the upper confidence limit (UCL) and x 1.64 x is the lower confidence limit (LCL). R e la t io n s h ip b e t w e e n C o n f id e n c e L e v e l a n d C o n fl: e n c e In t e r v a id

You may think that we should use a high confidence level, such as 99 percent, in all estimation problems. After all, a high confidence level seems to signify a high degree of accuracy in the estimate. In practice, however, high confidence levels will produce large confidence intervals, and such large intervals are not precise; they give very fuzzy estimates. Consider an appliance store customer who inquires about the delivery of a new washing machine. In Table 2 are several of the questions the customer might ask and the likely responses. This table indicates the direct relationship that exists between the confidence level and the confidence interval for any estimate. As the customer sets a tighter and tighter confidence interval, the store manager agrees to a lower and lower confidence level. Notice, too, that when the confidence interval is too wide, as is the case with a one-year delivery, the estimate may have very little real value, even though the store manager attaches a 99 percent confidence level to that estimate. Similarly, if the confidence interval is too narrow ("Will my washing machine get home before I do?"), the estimate is associated with such a low confidence level (1 percent) that we question its value. Table 2: Illustration of the relationship between confidence level and confidence interval C u sto m e r's Q u e stio n S to re M a n a g e r's s p o n s e Re Im plied Im p lie d C on fid e n c e C o n fid e n c e L e v e l Interval 1 year

W ill 1 g e t m y s h in g m a c h in e wa 1 a m a b s o lu te ly in o f B e tte r th a n 9 9 % ce rta w ith in 1 y e a r ? th a t. 'W ill y o u d e liv e r ats h in g w he m a c h in e 1 a m a lm o s t p o sit w e b e itiv ill w it h in 1 m o n t h ? d e liv e r eth is m o n th d W ill y o u d e liv e r t h e w a s h in g m a cw inh in 1 w e e k ? 1 a m p r e tty c e r ta in it h it e w ill g o o u t w ithth is w e e k . in W ill I g e t m y 1 a m n o t c e rta incan g e t we w a s h in g m a cth in eo r ro w ? om it to y oth e n . u T h e re is lit tle c h a nw e it c ill W ill m y w a s h in g h in e g e t h o m e m ac b e a t y oh o m e . u b e fo r e 1 d o ? A t le a s t 9 5 %

1 m o n th

About 80%

1 w eek

About 40%

1 day

N e a r1 %

1 hour

U s in g S a m p lin g a n d C o n fIn t e rc e l E s tim a tio n : id e n v a In our discussion of the basic concepts of interval estimation, particularly in Figure 7-2, we described samples being drawn repeatedly from a given population in order to estimate a population parameter. We also mentioned selecting a large number of sample means from a population. In practice, however, it is often difficult or expensive to take more than one sample from a population. Based on just one sample, we estimate the population parameter. We must be careful, then, about interpreting the results of such a process. Suppose we calculate from one sample in our battery example the following confidence interval and confidence level: "We are 95 percent confident that the mean battery life of the population lies within 30 and 42 months." This statement does not mean that the chance is 0.95 that the mean life of all our batteries falls within the interval established from this one sample. Instead, it means that if we select many random samples of the same size and calculate a confidence interval for each of these samples, then in about 95

percent of these cases, the population mean will lie within that interval. Warning: There is no free lunch in dealing with confidence levels and confidence intervals. When you want more of one, you have to take less of the other. Hint: To understand this important relationship, go back to Table 2. If you want the estimate of the time of delivery to be perfectly accurate (100 percent), you have to sacrifice tightness in the confidence interval and accept a very wide delivery promise ("sometime this year"). On the other hand, if you aren't concerned with the accuracy of the estimate, you could get a delivery person to say "I'm 1 percent sure I can get it to you within an hour." You can't have both at the same time. Problems to be solved: 1) Given the following confidence levels, express the lower and upper limits of the confidence interval for these levels in terms of X and x . (a) 54 percent (b) 75 percent. (c) 94 percent (d) 98 percent. 2) Define the confidence level for an interval estimate. 3) Define the confidence interval. 4) Suppose you wish to use a confidence level of 80 percent. Give the upper limit of the confidence interval in terms of the sample mean, X and the standard error, x . 5) In what way may an estimate be less meaningful because of (a) A high confidence level? (b) A narrow confidence level. 6) Suppose a sample of 50 is taken from a population with standard deviation 27 and that the sample mean is 86. (a) Establish an interval estimate for the population mean that is 95.5 percent certain to include the true population mean. (b) Suppose, instead, that the sample size was 5,000. Establish an interval for the population mean that is 95.5 percent certain to include the true population mean. (c) Why might estimate (a) be preferred to estimate (b)? Why might (b) be preferred to (a)? 8) Is the confidence level for an estimate based on the interval constructed from a single sample? 9) Given the following confidence levels, express the lower and upper limits of the confidence interval for these levels in terms of X and x . (a) 60 percent (b) 70 percent (c) 92 percent (d) 96 percent. 10) Steve Klippers, owner of Steve's Barbershop, has built quite a reputation among the residents of Cullowhee. As each customer enters his barbershop, Steve yells out the number of minutes that the customer can expect to wait before getting his haircut. The only statistician in town, after being frustrated by Steve's inaccurate point estimates, has determined that the actual waiting time for any customer is normally distributed with mean equal to Steve's estimate in minutes and standard deviation equal to 5 minutes divided by the customer's position in the waiting line. Help Steve's customers develop 95 percent probability intervals for the following situations: (a) The customer is second in line and Steve's estimate is 25 minutes. (b) The customer is third in line and Steve's estimate is 15 minutes. (c) The customer is fifth in line and Steve's estimate is 38 minutes. (d) The customer is first in line and Steve's estimate is 20 minutes. (e) How are these intervals different from confidence intervals? Calculating Interval Estimates of the Mean from Large Samples: A large automotive-parts wholesaler needs an estimate of the mean life it can expect from windshield wiper blades under typical driving conditions. Already, management has determined that the standard deviation of the population life is 6 months. Suppose we select a simple random sample of 100 wiper blades, collect data on their useful lives, and obtair

these results: n =1 0 0 - S a m p le s iz e X = 21 m onths - Sam ple m ean < = (T 6 m o n th< P o p u la tio n s ta n d a rd d e via tio n s Because the wholesaler uses tens of thousands of these wiper blades annually, it requests that we find an interval estimate with a confidence level of 95 percent. The sample size is greater than 30, so the central limit theorem allows us to use the normal distribution as our sampling distribution even if the population isn't normal. We calculate the standard error of the mean by using Equation 6-1 : 6 months ~W = 0 .6 m o n th