Simple Random Sampling Point Estimation Introduction to...


  • Chapter 7 Sampling and Sampling Distributions

    • Simple Random Sampling
    • Point Estimation
    • Introduction to Sampling Distributions
    • Sampling Distribution of x̄

  • Example: St. Andrew’s

    St. Andrew’s College receives 900 applications annually from prospective students. The application form contains a variety of information, including the individual’s scholastic aptitude test (SAT) score and whether or not the individual desires on-campus housing.

  • Example: St. Andrew’s

    The director of admissions would like to know the following information:

    • the average SAT score for the 900 applicants, and
    • the proportion of applicants that want to live on campus.

  • Example: St. Andrew’s

    We will now look at three alternatives for obtaining the desired information.

    • Conducting a census of the entire 900 applicants
    • Selecting a sample of 30 applicants, using a random number table
    • Selecting a sample of 30 applicants, using Excel

  • Conducting a Census

    • If the relevant data for the entire 900 applicants were in the college’s database, the population parameters of interest could be calculated using the formulas presented in Chapter 3.

    • We will assume for the moment that conducting a census is practical in this example.

  • Conducting a Census

    • Population Mean SAT Score

      $\mu = \dfrac{\sum x_i}{900} = 990$

    • Population Standard Deviation for SAT Score

      $\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{900}} = 80$

    • Population Proportion Wanting On-Campus Housing

      $p = \dfrac{648}{900} = .72$
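    The census formulas can be sketched in Python as below. This is a minimal illustration, not the college's procedure: the 900 applicant records are not given, so the hypothetical helper `census_parameters` is run on a made-up four-applicant population. The standard deviation divides by N, as in the population formula above.

```python
import math

def census_parameters(sat_scores, wants_housing):
    """Return (mu, sigma, p) computed over the whole population."""
    N = len(sat_scores)
    mu = sum(sat_scores) / N
    # Population standard deviation: divisor N, not N - 1
    sigma = math.sqrt(sum((x - mu) ** 2 for x in sat_scores) / N)
    # True counts as 1 and False as 0, so the sum is the number wanting housing
    p = sum(wants_housing) / N
    return mu, sigma, p

# Hypothetical four-applicant population (the real 900 records are not available)
scores = [900, 950, 1050, 1060]
housing = [True, False, True, True]
mu, sigma, p = census_parameters(scores, housing)
print(mu, round(sigma, 2), p)
```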

  • Point Estimation

    • x̄ as Point Estimator of μ

      $\bar{x} = \dfrac{\sum x_i}{30} = \dfrac{29{,}910}{30} = 997$

    • s as Point Estimator of σ

      $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{29}} = \sqrt{\dfrac{163{,}996}{29}} = 75.2$

    • p̄ as Point Estimator of p

      $\bar{p} = \dfrac{20}{30} = .67$

    Note: Different random numbers would have identified a different sample, which would have resulted in different point estimates.

  • Summary of Point Estimates Obtained from a Simple Random Sample

    Population Parameter                              | Parameter Value | Point Estimator                               | Point Estimate
    μ = Population mean SAT score                     | 990             | x̄ = Sample mean SAT score                     | 997
    σ = Population std. deviation for SAT score       | 80              | s = Sample std. deviation for SAT score       | 75.2
    p = Population proportion wanting campus housing  | .72             | p̄ = Sample proportion wanting campus housing  | .67
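    The same estimators in code, as a sketch under stated assumptions: `point_estimates` is a hypothetical helper, the three `*_slide` values re-derive the point estimates from the summary totals on the slide, and the final sample comes from a simulated population (normal SAT scores with mean 990 and standard deviation 80, with a 72% chance of wanting housing), not the real applicant list. `random.sample` draws without replacement, as a simple random sample requires.

```python
import math
import random

def point_estimates(scores, wants_housing):
    """Return (x_bar, s, p_bar) for one simple random sample."""
    n = len(scores)
    x_bar = sum(scores) / n
    # Sample standard deviation uses divisor n - 1, unlike the census formula
    s = math.sqrt(sum((x - x_bar) ** 2 for x in scores) / (n - 1))
    p_bar = sum(wants_housing) / n
    return x_bar, s, p_bar

# Re-derive the slide's estimates from its summary totals (n = 30)
n = 30
x_bar_slide = 29_910 / n                 # sample mean
s_slide = math.sqrt(163_996 / (n - 1))   # sample standard deviation
p_bar_slide = 20 / n                     # sample proportion

# Hypothetical simulated population of 900 (score, wants_housing) pairs
rng = random.Random(7)
population = [(rng.gauss(990, 80), rng.random() < 0.72) for _ in range(900)]
sample = rng.sample(population, n)       # simple random sample, no replacement
print(point_estimates([x for x, _ in sample], [h for _, h in sample]))
```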

  • Sampling Distribution of x̄

    Process of Statistical Inference

    1. Population with mean μ = ?
    2. A simple random sample of n elements is selected from the population.
    3. The sample data provide a value for the sample mean x̄.
    4. The value of x̄ is used to make inferences about the value of μ.

  • Sampling Distribution of x̄

    The sampling distribution of x̄ is the probability distribution of all possible values of the sample mean x̄.

    Expected Value of x̄

    $E(\bar{x}) = \mu$

    where: μ = the population mean
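    Because the sampling distribution covers every possible value of x̄, a tiny population lets us list it completely. This sketch assumes a hypothetical population {2, 4, 6, 8} and samples of size n = 2; exact `Fraction` arithmetic shows that the average of all possible sample means equals μ.

```python
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8]   # hypothetical tiny population, N = 4
n = 2
mu = Fraction(sum(population), len(population))

# Every possible simple random sample of size n (without replacement)
sample_means = [Fraction(sum(c), n) for c in combinations(population, n)]

# All samples are equally likely, so E(x_bar) is the plain average of the x_bar values
expected_x_bar = sum(sample_means) / len(sample_means)
print(sorted(sample_means), expected_x_bar, mu)
```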

  • Sampling Distribution of x̄: Standard Deviation of x̄

    Finite Population: $\sigma_{\bar{x}} = \sqrt{\dfrac{N - n}{N - 1}}\left(\dfrac{\sigma}{\sqrt{n}}\right)$

    Infinite Population: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

    • $\sigma_{\bar{x}}$ is referred to as the standard error of the mean.

    • A finite population is treated as being infinite if n/N < .05.

    • $\sqrt{(N - n)/(N - 1)}$ is the finite population correction factor.
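    Enumeration can also check the finite-population formula. With a hypothetical four-element population (N = 4, n = 2, so n/N = .5 is far from "infinite"), the exact variance of x̄ over all six possible samples matches the square of the finite-population σ_x̄, while σ/√n alone overstates it.

```python
import math
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8]   # hypothetical population
N, n = len(population), 2
mu = Fraction(sum(population), N)
sigma_sq = sum((x - mu) ** 2 for x in population) / N   # population variance, divisor N

means = [Fraction(sum(c), n) for c in combinations(population, n)]
var_x_bar = sum((m - mu) ** 2 for m in means) / len(means)   # exact variance of x_bar

formula_var = Fraction(N - n, N - 1) * sigma_sq / n   # square of the finite-population formula
infinite_var = sigma_sq / n                           # square of sigma / sqrt(n)

print(math.sqrt(var_x_bar), math.sqrt(formula_var), math.sqrt(infinite_var))
```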

  • Form of the Sampling Distribution of x̄

    If we use a large (n ≥ 30) simple random sample, the central limit theorem enables us to conclude that the sampling distribution of x̄ can be approximated by a normal distribution.

    When the simple random sample is small (n < 30), the sampling distribution of x̄ can be considered normal only if we assume the population has a normal distribution.
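    A quick simulation illustrates the central limit theorem. As an assumption for illustration, the population is exponential with mean 1 and standard deviation 1, which is strongly right-skewed (skewness 2). With n = 30, the simulated x̄ values should center near 1, have a standard deviation near 1/√30 ≈ .183, and be far less skewed than single observations. The seed and number of repetitions are arbitrary.

```python
import random
import statistics

rng = random.Random(42)
n, reps = 30, 20_000

# Individual draws from the skewed population, for comparison
raw = [rng.expovariate(1.0) for _ in range(reps)]
# One sample mean per repetition, each from n fresh draws
x_bars = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

def skewness(values):
    """Moment coefficient of skewness (population form)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)

print("mean of x_bars:", statistics.fmean(x_bars))
print("sd of x_bars:", statistics.pstdev(x_bars))
print("skewness, raw vs x_bar:", skewness(raw), skewness(x_bars))
```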

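  • Sampling Distribution of x̄ for SAT Scores

    Because n/N = 30/900 ≈ .033 < .05, the population is treated as infinite:

    $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{80}{\sqrt{30}} = 14.6$

    $E(\bar{x}) = 990$

    [Figure: sampling distribution of x̄, centered at E(x̄) = 990]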
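    The standard error can be computed with the n/N < .05 rule built in. `standard_error_mean` is a hypothetical helper; here n/N ≈ .033, so it returns the infinite-population value, and the last lines show that the correction factor would only move it to about 14.37.

```python
import math

def standard_error_mean(sigma, n, N=None):
    """Standard error of the mean.

    Uses sigma / sqrt(n); multiplies by the finite population correction
    factor sqrt((N - n) / (N - 1)) only when N is given and n/N >= .05.
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N >= 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

sigma, N, n = 80, 900, 30
print(round(n / N, 3))                              # sampling fraction, below .05
print(round(standard_error_mean(sigma, n, N), 1))   # infinite-population value
corrected = sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))
print(round(corrected, 2))                          # value if the correction were applied
```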

  • Sampling Distribution of x̄ for SAT Scores

    With a mean SAT score of 990 and a standard deviation of 80, what is the probability that a simple random sample of 30 applicants will provide an estimate of the population mean SAT score that is within ±10 of the actual population mean μ?

    In other words, what is the probability that x̄ will be between 980 and 1000?

  • Sampling Distribution of x̄ for SAT Scores

    Step 1: Calculate the z-value at the upper endpoint of the interval.

    z = (1000 - 990)/14.6 = .68

    Step 2: Find the area under the curve between the mean and the upper endpoint.

    Area = .2517

  • Sampling Distribution of x̄ for SAT Scores

    Probabilities for the Standard Normal Distribution (area between the mean and z)

    z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
    ...
    .5    .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
    .6    .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2517  .2549
    .7    .2580  .2611  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852
    .8    .2881  .2910  .2939  .2967  .2995  .3023  .3051  .3078  .3106  .3133
    .9    .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
    ...

    For z = .68, read row .6 and column .08: the area is .2517.
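    Entries like these can be generated from the error function, because the area between the mean and z equals ½·erf(z/√2). This sketch uses Python's `math.erf`; its four-decimal output should agree with the printed table, though an occasional last-digit difference from a particular printed table is possible. `area_mean_to_z` is a hypothetical name, and it takes |z| so negative z-values give the same area.

```python
import math

def area_mean_to_z(z):
    """Area under the standard normal curve between 0 and |z|."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

# Rebuild rows .5 through .9 of the table
print("z    " + " ".join(f".{col:02d}  " for col in range(10)))
for row in (0.5, 0.6, 0.7, 0.8, 0.9):
    cells = [f"{area_mean_to_z(row + col / 100):.4f}" for col in range(10)]
    print(f"{row:.1f}", " ".join(cells))
```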

  • Sampling Distribution of x̄ for SAT Scores

    [Figure: sampling distribution of x̄ (mean 990, σ_x̄ = 14.6) with the area between 990 and 1000 shaded: Area = .2517]

  • Sampling Distribution of x̄ for SAT Scores

    Step 3: Calculate the z-value at the lower endpoint of the interval.

    z = (980 - 990)/14.6 = -.68

    Step 4: Find the area under the curve between the mean and the lower endpoint.

    Area = .2517 (by symmetry of the normal curve, the same as the area for z = .68)

  • Sampling Distribution of x̄ for SAT Scores

    [Figure: sampling distribution of x̄ (mean 990, σ_x̄ = 14.6) with the area between 980 and 990 shaded: Area = .2517]

  • Sampling Distribution of x̄ for SAT Scores

    [Figure: sampling distribution of x̄ (mean 990, σ_x̄ = 14.6) with the areas between 980 and 990 and between 990 and 1000 shaded: each Area = .2517]

  • Sampling Distribution of x̄ for SAT Scores

    Step 5: Calculate the area under the curve between the lower and upper endpoints of the interval.

    P(-.68 < z < .68) = .2517 + .2517 = .5034

    The probability that the sample mean SAT score will be between 980 and 1000 is:

    P(980 < x̄ < 1000) = .5034
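    Software sidesteps the rounding in the table method. This sketch uses `statistics.NormalDist` (Python 3.8 or later) as the normal approximation to the sampling distribution of x̄. With the unrounded z ≈ .685 the probability comes out near .506; rounding z to .68, as the table lookup does, gives about .5035, and the table's own four-decimal rounding accounts for the last step to .5034.

```python
from statistics import NormalDist

mu, sigma, n = 990, 80, 30
se = sigma / n ** 0.5
x_bar_dist = NormalDist(mu, se)   # normal approximation to the sampling distribution

# Exact z-values (about +/-.685) instead of table-rounded +/-.68
p_exact = x_bar_dist.cdf(1000) - x_bar_dist.cdf(980)

# Same calculation with z rounded to .68, as in the table lookup
std_normal = NormalDist()
p_rounded = std_normal.cdf(0.68) - std_normal.cdf(-0.68)
print(round(p_exact, 4), round(p_rounded, 4))
```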

  • Sampling Distribution of x̄ for SAT Scores

    [Figure: sampling distribution of x̄ (mean 990, σ_x̄ = 14.6) with the area between 980 and 1000 shaded: Area = .5034]

  • Relationship Between the Sample Size and the Sampling Distribution of x̄

    Suppose we select a simple random sample of 100 applicants instead of the 30 originally considered.

    E(x̄) = μ regardless of the sample size. In our example, E(x̄) remains at 990.

    Whenever the sample size is increased, the standard error of the mean σ_x̄ is decreased. With the increase in the sample size to n = 100, the standard error of the mean is decreased to:

    $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{80}{\sqrt{100}} = 8.0$

  • Relationship Between the Sample Size and the Sampling Distribution of x̄

    [Figure: two sampling distributions of x̄, both centered at E(x̄) = 990: with n = 30, σ_x̄ = 14.6; with n = 100, σ_x̄ = 8]

  • Relationship Between the Sample Size and the Sampling Distribution of x̄

    Recall that when n = 30, P(980 < x̄ < 1000) = .5034.

    We follow the same steps to solve for P(980 < x̄ < 1000) when n = 100 as we showed earlier when n = 30: z = (1000 - 990)/8 = 1.25, the area between the mean and z = 1.25 is .3944, and by symmetry the probability is .3944 + .3944 = .7888.

    Now, with n = 100, P(980 < x̄ < 1000) = .7888. Because the sampling distribution with n = 100 has a smaller standard error, the values of x̄ have less variability and tend to be closer to the population mean than the values of x̄ with n = 30.
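    The n = 100 result can be checked the same way. This sketch reuses `statistics.NormalDist` and, like the slides, uses the infinite-population formula σ/√n. Note that n/N = 100/900 ≈ .11 exceeds the .05 guideline; with the finite population correction factor σ_x̄ would be about 7.55 rather than 8, so the probability printed here follows the slides' simplification.

```python
from statistics import NormalDist

mu, sigma, N = 990, 80, 900

results = {}
for n in (30, 100):
    se = sigma / n ** 0.5                          # infinite-population formula, as on the slides
    se_fpc = se * ((N - n) / (N - 1)) ** 0.5       # with the finite population correction factor
    dist = NormalDist(mu, se)
    prob = dist.cdf(1000) - dist.cdf(980)          # P(980 < x_bar < 1000)
    results[n] = (se, se_fpc, prob)
    print(f"n={n}: n/N={n / N:.3f}  se={se:.2f}  se_fpc={se_fpc:.2f}  P={prob:.4f}")
```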

  • Relationship Between the Sample Size and the Sampling Distribution of x̄

    [Figure: sampling distribution of x̄ with n = 100 (mean 990, σ_x̄ = 8) with the area between 980 and 1000 shaded: Area = .7888]

  • Chapter 7 Sampling and Sampling Distributions

    • Sampling Distribution of p̄
    • Other Sampling Methods

  • Sampling Distribution of p̄

    Making Inferences about a Population Proportion

    1. Population with proportion p = ?
    2. A simple random sample of n elements is selected from the population.
    3. The sample data provide a value for the sample proportion p̄.
    4. The value of p̄ is used to make inferences about the value of p.

  • Sampling Distribution of p̄

    The sampling distribution of p̄ is the probability distribution of all possible values of the sample proportion p̄.

    Expected Value of p̄

    $E(\bar{p}) = p$

    where: p = the population proportion

  • Sampling Distribution of p̄: Standard Deviation of p̄

    Finite Population: $\sigma_{\bar{p}} = \sqrt{\dfrac{N - n}{N - 1}}\sqrt{\dfrac{p(1 - p)}{n}}$

    Infinite Population: $\sigma_{\bar{p}} = \sqrt{\dfrac{p(1 - p)}{n}}$

    • $\sigma_{\bar{p}}$ is referred to as the standard error of the proportion.

    • A finite population is treated as being infinite if n/N < .05.
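    Enumeration works for proportions too. Coding each element as 1 (wants housing) or 0 in a hypothetical five-element population with p = 3/5, this sketch lists every sample of size 2 and confirms that E(p̄) = p and that the variance of p̄ equals the square of the finite-population σ_p̄.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 1, 1, 0, 0]   # hypothetical: 1 = wants housing, so p = 3/5
N, n = len(population), 2
p = Fraction(sum(population), N)

# Sample proportion for every possible sample of size n
p_bars = [Fraction(sum(c), n) for c in combinations(population, n)]
expected = sum(p_bars) / len(p_bars)
variance = sum((pb - p) ** 2 for pb in p_bars) / len(p_bars)

# Square of the finite-population standard error of the proportion
formula = Fraction(N - n, N - 1) * p * (1 - p) / n
print(expected, variance, formula)
```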

  • Sampling Distribution of p̄

    Example: St. Andrew’s College

    Recall that 72% of the prospective students applying to St. Andrew’s College desire on-campus housing.

    What is the probability that a simple random sample of 30 applicants will provide an estimate of the population proportion of applicants desiring on-campus housing that is within plus or minus .05 of the actual population proportion?

  • Sampling Distribution of p̄

    $\sigma_{\bar{p}} = \sqrt{\dfrac{.72(1 - .72)}{30}} = .082$

    $E(\bar{p}) = .72$

    [Figure: sampling distribution of p̄, centered at E(p̄) = .72]
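    The standard error for the St. Andrew’s proportion is a one-line calculation. As with x̄, n/N = 30/900 < .05, so this sketch uses the infinite-population formula and prints the corrected value only for comparison.

```python
import math

p, n, N = 0.72, 30, 900
se_p = math.sqrt(p * (1 - p) / n)                # infinite-population formula
se_p_fpc = se_p * math.sqrt((N - n) / (N - 1))   # finite population correction, for comparison
print(round(se_p, 3), round(se_p_fpc, 3))
```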

  • Sampling Distribution of p̄

    Step 1: Calculate the z-value at the upper endpoint of the interval.

    z = (.77 - .72)/.082 = .61

    Step 2: Find the area under the curve between the mean and the upper endpoint.

    Area = .2291

  • Sampling Distribution of p̄

    Probabilities for the Standard Normal Distribution: for z = .61, read row .6 and column .01 of the standard normal table; the area is .2291.

  • Sampling Distribution of p̄

    [Figure: sampling distribution of p̄ (mean .72, σ_p̄ = .082) with the area between .72 and .77 shaded: Area = .2291]

  • Sampling Distribution of p̄

    Step 3: Calculate the z-value at the lower endpoint of the interval.

    z = (.67 - .72)/.082 = -.61

    Step 4: Find the area under the curve between the mean and the lower endpoint.

    Area = .2291

  • Sampling Distribution of p̄

    [Figure: sampling distribution of p̄ (mean .72, σ_p̄ = .082) with the area between .67 and .72 shaded: Area = .2291]

  • Sampling Distribution of p̄

    Step 5: Calculate the area under the curve between the lower and upper endpoints of the interval.

    P(-.61 < z < .61) = .2291 + .2291 = .4582

    The probability that the sample proportion of applicants wanting on-campus housing will be within ±.05 of the actual population proportion is:

    P(.67 < p̄ < .77) = .4582
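    The normal-approximation probability can be checked with `statistics.NormalDist`, centered at p = .72 with σ_p̄ from the infinite-population formula. Without rounding z to .61, the result comes out close to .458, consistent with the table-based .4582.

```python
from statistics import NormalDist

p, n = 0.72, 30
se_p = (p * (1 - p) / n) ** 0.5
p_bar_dist = NormalDist(p, se_p)   # normal approximation to the sampling distribution of p_bar

# P(.67 < p_bar < .77), i.e. within +/-.05 of p
prob = p_bar_dist.cdf(p + 0.05) - p_bar_dist.cdf(p - 0.05)
print(round(prob, 4))
```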

  • Sampling Distribution of p̄

    [Figure: sampling distribution of p̄ (mean .72, σ_p̄ = .082) with the area between .67 and .77 shaded: Area = .4582]

  • Other Sampling Methods

    • Stratified Random Sampling
    • Cluster Sampling
    • Systematic Sampling
    • Convenience Sampling
    • Judgment Sampling

  • Stratified Random Sampling

    The population is first divided into groups of elements called strata.

    Each element in the population belongs to one and only one stratum.

    Best results are obtained when the elements within each stratum are as much alike as possible (i.e., a homogeneous group).

  • Stratified Random Sampling

    A simple random sample is taken from each stratum.

    Formulas are available for combining the stratum sample results into one population parameter estimate.

    Advantage: If strata are homogeneous, this method is as “precise” as simple random sampling but with a smaller total sample size.

    Example: The basis for forming the strata might be department, location, age, industry type, and so on.
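    A minimal sketch of stratified random sampling, assuming each stratum is a Python list of numeric values and the stratum sample sizes are fixed in advance. The helper `stratified_mean_estimate` is hypothetical; it combines the stratum sample means with one standard formula, weighting each by its share of the population, N_h/N. With perfectly homogeneous strata, as in this made-up example, even tiny samples recover the population mean exactly.

```python
import random

def stratified_mean_estimate(strata, sample_sizes, rng):
    """Estimate the population mean from a stratified random sample.

    strata: dict mapping stratum name -> list of values (each element in exactly one stratum)
    sample_sizes: dict mapping stratum name -> n_h, the sample size within that stratum
    """
    N = sum(len(values) for values in strata.values())
    estimate = 0.0
    for name, values in strata.items():
        sample = rng.sample(values, sample_sizes[name])   # simple random sample within the stratum
        stratum_mean = sum(sample) / len(sample)
        estimate += len(values) / N * stratum_mean         # weight by N_h / N
    return estimate

# Hypothetical, perfectly homogeneous strata (population mean = 14)
strata = {"dept A": [10] * 60, "dept B": [20] * 40}
est = stratified_mean_estimate(strata, {"dept A": 3, "dept B": 2}, random.Random(1))
print(est)
```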

  • Cluster Sampling

    The population is first divided into separate groups of elements called clusters.

    Ideally, each cluster is a representative small-scale version of the population (i.e., a heterogeneous group).

    A simple random sample of the clusters is then taken.

    All elements within each sampled (chosen) cluster form the sample.

  • Cluster Sampling

    Advantage: The close proximity of elements can be cost effective (i.e., many sample observations can be obtained in a short time).

    Disadvantage: This method generally requires a larger total sample size than simple or stratified random sampling.

    Example: A primary application is area sampling, where clusters are city blocks or other well-defined areas.
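    Cluster sampling can be sketched as a simple random sample of cluster labels followed by taking every element of each chosen cluster. The helper `cluster_sample` and the city-block data are hypothetical, and clusters are assumed to be stored as a dict of lists.

```python
import random

def cluster_sample(clusters, num_clusters, rng):
    """Simple random sample of clusters; every element of each chosen cluster is kept.

    clusters: dict mapping cluster id -> list of elements
    """
    chosen = rng.sample(sorted(clusters), num_clusters)   # SRS of cluster ids
    return [element for cid in chosen for element in clusters[cid]]

# Hypothetical area sample: city blocks and the households on each
blocks = {
    "block 1": ["h1", "h2", "h3"],
    "block 2": ["h4", "h5"],
    "block 3": ["h6", "h7", "h8", "h9"],
    "block 4": ["h10"],
}
sample = cluster_sample(blocks, 2, random.Random(3))
print(sample)
```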

  • Systematic Sampling

    If a sample size of n is desired from a population containing N elements, we might sample one element for every N/n elements in the population.

    We randomly select one of the first N/n elements from the population list.

    We then select every (N/n)th element that follows in the population list.

  • Systematic Sampling

    This method has the properties of a simple random sample, especially if the list of the population elements is a random ordering.

    Advantage: The sample usually will be easier to identify than it would be if simple random sampling were used.

    Example: Selecting every 100th listing in a telephone book after the first randomly selected listing.
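    A systematic sample can be sketched as follows, assuming the population is an ordered Python list with N ≥ n. `systematic_sample` is a hypothetical helper; when N/n is not a whole number it rounds the interval k down and trims the result to n elements, which is one of several reasonable conventions.

```python
import random

def systematic_sample(population, n, rng):
    """Systematic sample of size n from an ordered list, using interval k = N // n."""
    k = len(population) // n          # sampling interval, rounded down if N/n is not whole
    start = rng.randrange(k)          # random start among the first k elements
    return population[start::k][:n]   # every k-th element from the start, trimmed to n

# Hypothetical list of 1,000 telephone listings; n = 10 gives k = 100
listings = list(range(1, 1001))
sample = systematic_sample(listings, 10, random.Random(5))
print(sample)
```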

  • Convenience Sampling

    It is a nonprobability sampling technique. Items are included in the sample without known probabilities of being selected.

    Example: A professor conducting research might use student volunteers to constitute a sample.

    The sample is identified primarily by convenience.

  • Convenience Sampling

    Advantage: Sample selection and data collection are relatively easy.

    Disadvantage: It is impossible to determine how representative of the population the sample is.

  • Judgment Sampling

    The person most knowledgeable on the subject of the study selects elements of the population that he or she feels are most representative of the population.

    It is a nonprobability sampling technique.

    Example: A reporter might sample three or four senators, judging them as reflecting the general opinion of the senate.

  • Judgment Sampling

    Advantage: It is a relatively easy way of selecting a sample.

    Disadvantage: The quality of the sample results depends on the judgment of the person selecting the sample.