Statistical Estimation

Estimation In Statistics STEI ITB Bayu Rima Aditya


Introduction

In statistics, estimation refers to the process by which one makes inferences about a population, based on information obtained from a sample.

Statisticians use sample statistics to estimate population parameters.

For example:
a. Sample means are used to estimate population means.
b. Sample proportions are used to estimate population proportions.

Point Estimate vs Interval Estimate

Point estimate. A point estimate of a population parameter is a single value of a statistic. For example:

a. The sample mean x̄ is a point estimate of the population mean μ.
b. The sample proportion p is a point estimate of the population proportion P.

Interval estimate. An interval estimate is defined by two numbers, between which a population parameter is said to lie. For example, a < μ < b is an interval estimate of the population mean μ. It indicates that the population mean is greater than a but less than b.
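As a minimal sketch of point estimation (the data below are hypothetical, not from the lecture), the sample mean x̄ and the sample proportion p are each a single number computed from a sample:

```python
from statistics import mean

# Hypothetical sample of 8 GPAs (illustrative data only)
gpas = [2.9, 3.1, 2.4, 3.6, 2.8, 3.0, 2.2, 3.4]
x_bar = mean(gpas)  # point estimate of the population mean μ

# Hypothetical yes/no survey answers: 1 = "yes", 0 = "no"
answers = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
p_hat = sum(answers) / len(answers)  # point estimate of the population proportion P

print(f"x̄ = {x_bar:.3f}")  # x̄ = 2.925
print(f"p = {p_hat:.2f}")  # p = 0.50
```

An interval estimate would instead report a range around each of these values.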

Confidence Intervals

Statisticians use a confidence interval to express the precision and uncertainty associated with a particular sampling method. A confidence interval consists of three parts:
1. A confidence level.
2. A statistic.
3. A margin of error.

For example: Suppose we compute an interval estimate of a population parameter and describe it as a 95% confidence interval. This means that if we used the same sampling method to select many different samples and computed an interval estimate from each one, about 95% of those intervals (each defined by its own sample statistic ± margin of error) would contain the true population parameter.

Confidence Level

The probability part of a confidence interval is called a confidence level. The confidence level describes the likelihood that a particular sampling method will produce a confidence interval that includes the true population parameter.

For example: Suppose we collected all possible samples from a given population and computed a confidence interval for each sample. Some confidence intervals would include the true population parameter; others would not. A 95% confidence level means that 95% of the intervals contain the true population parameter; a 90% confidence level means that 90% of the intervals contain it.
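The repeated-sampling meaning of a confidence level can be checked with a small simulation. This is a sketch under assumed settings (a normal population with hypothetical μ = 50 and σ = 10, σ treated as known, samples of size 40): it builds many intervals x̄ ± z*·σ/sqrt(n) and counts how often they contain μ.

```python
import random
from statistics import NormalDist

def coverage(level=0.95, mu=50.0, sigma=10.0, n=40, trials=4000, seed=1):
    """Fraction of simulated confidence intervals that contain the true mean mu.

    Assumes a normal population with known sigma, so each interval is
    x_bar ± z* * sigma / sqrt(n). All default values are hypothetical.
    """
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # critical z value
    margin = z_star * sigma / n ** 0.5                  # same margin for every sample
    hits = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

print(coverage(0.95))  # close to 0.95
print(coverage(0.90))  # close to 0.90
```

Each individual interval either contains μ or it does not; the confidence level describes the long-run share of intervals that the method produces.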

Margin of Error

In a confidence interval, the range of values above and below the sample statistic is called the margin of error.

For example: Suppose the local newspaper conducts an election survey and reports that the independent candidate will receive 30% of the vote. The newspaper states that the survey had a margin of error of 5 percentage points and a confidence level of 95%. These findings result in the following confidence interval: we are 95% confident that the independent candidate will receive between 25% and 35% of the vote.

Example

Which of the following statements is true?

I. When the margin of error is small, the confidence level is high.
II. When the margin of error is small, the confidence level is low.
III. A confidence interval is a type of point estimate.
IV. A population mean is an example of a point estimate.

(A) I only
(B) II only
(C) III only
(D) IV only
(E) None of the above.

Solution

The correct answer is (E). The confidence level is not affected by the margin of error: when the margin of error is small, the confidence level can be low, high, or anything in between. A confidence interval is a type of interval estimate, not a type of point estimate. A population mean is not an example of a point estimate; a sample mean is an example of a point estimate.

Standard Error

The standard error is an estimate of the standard deviation of a statistic. This lesson shows how to compute the standard error, based on sample data.

The standard error is important because it is used to compute other measures, like confidence intervals and margins of error.

Notation

Population parameter | Sample statistic
N: Number of observations in the population | n: Number of observations in the sample
Ni: Number of observations in population i | ni: Number of observations in sample i
P: Proportion of successes in population | p: Proportion of successes in sample
Pi: Proportion of successes in population i | pi: Proportion of successes in sample i
μ: Population mean | x̄: Sample estimate of population mean
μi: Mean of population i | x̄i: Sample estimate of μi
σ: Population standard deviation | s: Sample estimate of σ
σp: Standard deviation of p | SEp: Standard error of p
σx̄: Standard deviation of x̄ | SEx̄: Standard error of x̄

Standard Deviation of Sample Estimates

Statisticians use sample statistics to estimate population parameters. Naturally, the value of a statistic may vary from one sample to the next.

The variability of a statistic is measured by its standard deviation.

Statistic | Standard deviation
Sample mean, x̄ | σx̄ = σ / sqrt( n )
Sample proportion, p | σp = sqrt[ P(1 - P) / n ]
Difference between means, x̄1 - x̄2 | σx̄1-x̄2 = sqrt[ σ1² / n1 + σ2² / n2 ]
Difference between proportions, p1 - p2 | σp1-p2 = sqrt[ P1(1 - P1) / n1 + P2(1 - P2) / n2 ]
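The four standard-deviation formulas translate directly into code. A minimal sketch (the function names and inputs are illustrative, not from the lecture), assuming the population values σ and P are known:

```python
from math import sqrt

def sd_mean(sigma, n):
    """σx̄ = σ / sqrt(n)"""
    return sigma / sqrt(n)

def sd_proportion(P, n):
    """σp = sqrt[ P(1 - P) / n ]"""
    return sqrt(P * (1 - P) / n)

def sd_diff_means(sigma1, n1, sigma2, n2):
    """σx̄1-x̄2 = sqrt[ σ1² / n1 + σ2² / n2 ]"""
    return sqrt(sigma1**2 / n1 + sigma2**2 / n2)

def sd_diff_proportions(P1, n1, P2, n2):
    """σp1-p2 = sqrt[ P1(1 - P1) / n1 + P2(1 - P2) / n2 ]"""
    return sqrt(P1 * (1 - P1) / n1 + P2 * (1 - P2) / n2)

# Hypothetical inputs
print(sd_mean(10, 25))             # 2.0
print(sd_proportion(0.5, 100))     # ≈ 0.05
print(sd_diff_means(3, 9, 4, 16))  # sqrt(2) ≈ 1.414
```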

Standard Error of Sample Estimates

Sadly, the values of population parameters are often unknown, making it impossible to compute the standard deviation of a statistic. When this occurs, use the standard error.

Statistic | Standard error
Sample mean, x̄ | SEx̄ = s / sqrt( n )
Sample proportion, p | SEp = sqrt[ p(1 - p) / n ]
Difference between means, x̄1 - x̄2 | SEx̄1-x̄2 = sqrt[ s1² / n1 + s2² / n2 ]
Difference between proportions, p1 - p2 | SEp1-p2 = sqrt[ p1(1 - p1) / n1 + p2(1 - p2) / n2 ]
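When only sample data are available, the same formulas are applied with s and p in place of σ and P. This sketch computes the standard errors directly from raw observations (the helper names and data are hypothetical); `statistics.variance` returns the sample variance s², with n - 1 in the denominator.

```python
from math import sqrt
from statistics import variance

def se_mean(sample):
    """SEx̄ = s / sqrt(n)"""
    return sqrt(variance(sample) / len(sample))

def se_proportion(successes, n):
    """SEp = sqrt[ p(1 - p) / n ], with p = successes / n"""
    p = successes / n
    return sqrt(p * (1 - p) / n)

def se_diff_means(sample1, sample2):
    """SEx̄1-x̄2 = sqrt[ s1² / n1 + s2² / n2 ]"""
    return sqrt(variance(sample1) / len(sample1) + variance(sample2) / len(sample2))

def se_diff_proportions(successes1, n1, successes2, n2):
    """SEp1-p2 = sqrt[ p1(1 - p1) / n1 + p2(1 - p2) / n2 ]"""
    p1, p2 = successes1 / n1, successes2 / n2
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical sample, s² = 32/7
print(round(se_mean(data), 4))           # 0.7559
print(round(se_proportion(30, 100), 4))  # 0.0458
```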

Example

Which of the following statements is true?

I. The standard error is computed solely from sample attributes.
II. The standard deviation is computed solely from sample attributes.
III. The standard error is a measure of central tendency.

(A) I only
(B) II only
(C) III only
(D) All of the above.
(E) None of the above.

Solution

The correct answer is (A). The standard error can be computed from knowledge of sample attributes: sample size and sample statistics. The standard deviation cannot be computed solely from sample attributes; it requires knowledge of one or more population parameters. The standard error is a measure of variability, not a measure of central tendency.

Margin of Error

In a confidence interval, the range of values above and below the sample statistic is called the margin of error.

For example: Suppose we wanted to know the percentage of adults who exercise daily. We could devise a sample design to ensure that our sample estimate will not differ from the true population value by more than, say, 5 percent (the margin of error) 90 percent of the time (the confidence level).

How to Compute the Margin of Error

The margin of error can be defined by either of the following equations:

1. Margin of error = Critical value × Standard deviation of the statistic
2. Margin of error = Critical value × Standard error of the statistic

If you know the standard deviation of the statistic, use the first equation to compute the margin of error. Otherwise, use the second equation.

How to Find the Critical Value #1

The critical value is a factor used to compute the margin of error. The central limit theorem states that the sampling distribution of a statistic will be normal or nearly normal if any of the following conditions apply:

1. The population distribution is normal.
2. The sampling distribution is symmetric, unimodal, and without outliers.
3. The sampling distribution is moderately skewed, unimodal, and without outliers.
4. The sample size is 30 or more, without outliers.
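A rough way to see condition 4 at work is to draw many samples of size 30 from a clearly skewed population and check how the sample means behave. This sketch assumes an exponential population with mean 1 and standard deviation 1 (a choice made here purely for illustration): if the sampling distribution of x̄ is close to normal, about 95% of the sample means should land within μ ± 1.96·σ/sqrt(n).

```python
import random

def fraction_within(n=30, trials=4000, seed=0):
    """Share of sample means that fall within mu ± 1.96 * sigma / sqrt(n).

    Population: exponential with rate 1 (mean 1, standard deviation 1),
    a hypothetical skewed population. A result near 0.95 suggests the
    sampling distribution of the mean is close to normal.
    """
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0
    half_width = 1.96 * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        x_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(x_bar - mu) <= half_width:
            hits += 1
    return hits / trials

print(fraction_within())  # roughly 0.95
```

This checks only one property of the normal shape, so it is an illustration rather than a proof of normality.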

How to Find the Critical Value #2

When one of these conditions is satisfied, the critical value can be expressed as a t score or as a z score. To find the critical value, follow these steps:

1. Compute alpha (α): α = 1 - (confidence level / 100)
2. Find the critical probability (p*): p* = 1 - α/2
3. To express the critical value as a z score, find the z score having a cumulative probability equal to the critical probability.
4. To express the critical value as a t score, follow these steps:

a) Find the degrees of freedom (DF). When estimating a mean score or a proportion from a single sample, DF is equal to the sample size minus one. For other applications, the degrees of freedom may be calculated differently. We will describe those computations as they come up.

b) The critical t score is the t score having degrees of freedom equal to DF and a cumulative probability equal to the critical probability (p*).
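The z-score steps can be sketched with Python's standard library, where `statistics.NormalDist().inv_cdf(p)` returns the z score with cumulative probability p. The standard library has no t distribution; one common option, assuming SciPy is installed, is `scipy.stats.t.ppf(p_star, df)`, which is not used below.

```python
from statistics import NormalDist

def critical_z(confidence_level):
    """Critical z value for a two-sided interval; confidence_level in percent."""
    alpha = 1 - confidence_level / 100   # step 1: alpha
    p_star = 1 - alpha / 2               # step 2: critical probability p*
    return NormalDist().inv_cdf(p_star)  # step 3: z with cumulative probability p*

for level in (90, 95, 99):
    print(level, round(critical_z(level), 3))
# 90 1.645
# 95 1.96
# 99 2.576
```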

Example

Nine hundred (900) high school freshmen were randomly selected for a national survey. Among survey participants, the mean grade-point average (GPA) was 2.7, and the standard deviation was 0.4. What is the margin of error, assuming a 95% confidence level?

(A) 0.013
(B) 0.025
(C) 0.500
(D) 1.960
(E) None of the above.

Solution

The correct answer is (B). To compute the margin of error, we need to find the critical value and the standard error of the mean. To find the critical value, we take the following steps:

1. Compute alpha (α): α = 1 - (confidence level / 100) = 1 - 0.95 = 0.05
2. Find the critical probability (p*): p* = 1 - α/2 = 1 - 0.05/2 = 0.975
3. Find the critical z score. Since the sample size is large, the sampling distribution will be roughly normal in shape. Therefore, we can express the critical value as a z score. For this problem, it will be the z score having a cumulative probability equal to 0.975. Using the normal distribution table, we find that the critical value is 1.96.

Next, we find the standard error of the mean, using the following equation:

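SEx̄ = s / sqrt( n ) = 0.4 / sqrt( 900 ) = 0.4 / 30 ≈ 0.013

And finally, we compute the margin of error (ME): ME = Critical value × Standard error = 1.96 × 0.013 ≈ 0.025. (Carrying the unrounded standard error, 0.4 / 30 ≈ 0.0133, gives ME ≈ 0.026, which is still closest to option (B).)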

This means we can be 95% confident that the mean grade point average in the population is 2.7 plus or minus 0.025, since the margin of error is 0.025.
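The GPA solution can be reproduced step by step. This sketch uses the standard library's normal quantile instead of a table, and also shows how rounding the standard error to 0.013 before multiplying moves the result from about 0.026 to about 0.025:

```python
from math import sqrt
from statistics import NormalDist

n, s, level = 900, 0.4, 95             # values from the example

alpha = 1 - level / 100                # 0.05
p_star = 1 - alpha / 2                 # 0.975
z_star = NormalDist().inv_cdf(p_star)  # ≈ 1.96
se = s / sqrt(n)                       # 0.4 / 30 ≈ 0.0133

me_exact = z_star * se                 # no intermediate rounding
me_slide = 1.96 * round(se, 3)         # the slide's arithmetic: 1.96 × 0.013

print(f"{me_exact:.4f}")  # 0.0261
print(f"{me_slide:.4f}")  # 0.0255
```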

Confidence Interval

Statisticians use a confidence interval to describe the amount of uncertainty associated with a sample estimate of a population parameter.

How to Interpret Confidence Intervals #1

Example: Suppose that a 90% confidence interval states that the population mean is greater than 100 and less than 200. How would you interpret this statement?

Some people think this means there is a 90% chance that the population mean falls between 100 and 200. This is incorrect. Like any population parameter, the population mean is a constant, not a random variable. It does not change. The probability that a constant falls within any given range is always 0.00 or 1.00.

How to Interpret Confidence Intervals #2

The confidence level describes the uncertainty associated with a sampling method. Suppose we used the same sampling method to select different samples and to compute a different interval estimate for each sample. Some interval estimates would include the true population parameter and some would not. A 90% confidence level means that we would expect 90% of the interval estimates to include the population parameter; a 95% confidence level means that 95% of the intervals would include the parameter.

Confidence Interval Data Requirements

To express a confidence interval, you need three pieces of information:
1. Confidence level
2. Statistic
3. Margin of error

Given these inputs, the range of the confidence interval is defined by the sample statistic ± margin of error, and the uncertainty associated with the confidence interval is specified by the confidence level.

Note: Often, the margin of error is not given; we must calculate it.

How to Construct a Confidence Interval #1

There are four steps to constructing a confidence interval:

1. Identify a sample statistic. Choose the statistic (sample mean, sample proportion) that you will use to estimate a population parameter.
2. Select a confidence level. As we noted in the previous section, the confidence level describes the uncertainty of a sampling method. Often, researchers choose 90%, 95%, or 99% confidence levels, but any percentage can be used.

How to Construct a Confidence Interval #2

3. Find the margin of error. If you are working on a homework problem or a test question, the margin of error may be given. Often, however, you will need to compute the margin of error based on one of the following equations:
   Margin of error = Critical value × Standard deviation of statistic
   Margin of error = Critical value × Standard error of statistic
4. Specify the confidence interval. The uncertainty is denoted by the confidence level, and the range of the confidence interval is defined by the following equation:
   Confidence interval = Sample statistic ± Margin of error
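The four steps can be collected into one function for a sample mean. This is a minimal sketch, not the lecture's own method: it assumes a large sample, so the critical value is taken as a z score, and the inputs in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval_mean(x_bar, s, n, level=95):
    """Return (lower, upper) for a confidence interval around a sample mean.

    Step 1: the sample statistic is x_bar.
    Step 2: the confidence level is given in percent.
    Step 3: margin of error = critical z value * standard error.
    Step 4: interval = statistic ± margin of error.
    """
    alpha = 1 - level / 100
    p_star = 1 - alpha / 2
    z_star = NormalDist().inv_cdf(p_star)  # z critical value (large-sample assumption)
    margin = z_star * s / sqrt(n)          # standard error of the mean is s / sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = confidence_interval_mean(50, 10, 100, level=90)  # hypothetical inputs
print(f"90% CI: {low:.3f} to {high:.3f}")  # 90% CI: 48.355 to 51.645
```

For small samples, the z critical value in step 3 would be replaced by a t critical value with n - 1 degrees of freedom.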

Example

Suppose we want to estimate the average weight of an adult male in DeKalb County, Georgia. We draw a random sample of 1,000 men from a population of 1,000,000 men and weigh them. We find that the average man in our sample weighs 180 pounds, and the standard deviation of the sample is 30 pounds. What is the 95% confidence interval?

(A) 180 ± 1.86
(B) 180 ± 3.0
(C) 180 ± 5.88
(D) 180 ± 30
(E) None of the above

Solution

The correct answer is (A). To specify the confidence interval, we work through the four steps below.

1. Identify a sample statistic. Since we are trying to estimate the mean weight in the population, we choose the mean weight in our sample (180) as the sample statistic.

2. Select a confidence level. In this case, the confidence level is defined for us in the problem. We are working with a 95% confidence level.

3. Find the margin of error. The key steps are shown below.

a. Find standard error. The standard error (SE) of the mean is:SE = s / sqrt( n ) = 30 / sqrt(1000) = 30/31.62 = 0.95

b. Find critical value. The critical value is a factor used to compute the margin of error. To express the critical value as a t score, follow these steps.

– Compute alpha (α): α = 1 - (confidence level / 100) = 0.05
– Find the critical probability (p*): p* = 1 - α/2 = 1 - 0.05/2 = 0.975
– Find the degrees of freedom (df): df = n - 1 = 1000 - 1 = 999
– The critical value is the t score having 999 degrees of freedom and a cumulative probability equal to 0.975. From the t Distribution Calculator, we find that the critical value is approximately 1.96.

• Note: We might also have expressed the critical value as a z score. Because the sample size is large, a z score analysis produces essentially the same result: a critical value of 1.96.

• Compute margin of error (ME): ME = critical value * standard error = 1.96 * 0.95 = 1.86

4. Specify the confidence interval. The range of the confidence interval is defined by the sample statistic ± margin of error, and the uncertainty is denoted by the confidence level. Therefore, we are 95% confident that the population mean falls within the interval 180 ± 1.86.
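The arithmetic of this example can be checked in a few lines. This sketch uses a z critical value from the standard library; assuming SciPy is available, `scipy.stats.t.ppf(0.975, 999)` gives a t critical value only slightly larger than 1.96, so the margin still rounds to 1.86.

```python
from math import sqrt
from statistics import NormalDist

x_bar, s, n = 180, 30, 1000           # values from the example

se = s / sqrt(n)                      # ≈ 0.949
z_star = NormalDist().inv_cdf(0.975)  # ≈ 1.960
me = z_star * se                      # ≈ 1.859

print(f"{x_bar} ± {me:.2f}")                    # 180 ± 1.86
print(f"({x_bar - me:.2f}, {x_bar + me:.2f})")  # (178.14, 181.86)
```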