S245 12 Sampling Theory

  • 8/12/2019 S245 12 Sampling Theory

    1/104

    Sampling Theory

    Determining the distribution of sample statistics

  • 8/12/2019 S245 12 Sampling Theory

    2/104

    Sampling Theory: sampling distributions

    A sample is a subset of the population. In many instances it is too costly to collect data from the entire population.

    Note: It is important to recognize the dissimilarity (variability) we should expect to see in various samples from the same population. It is important that we model this variability and use it to assess the accuracy of decisions made from samples.

  • 8/12/2019 S245 12 Sampling Theory

    3/104

    Statistics and Parameters

    A statistic is a numerical value computed from a sample. Its value may differ for different samples, e.g. the sample mean $\bar{x}$, the sample standard deviation $s$, and the sample proportion $\hat{p}$.

    A parameter is a numerical value associated with a population. It is considered fixed and unchanging, e.g. the population mean $\mu$, the population standard deviation $\sigma$, and the population proportion $p$.

  • 8/12/2019 S245 12 Sampling Theory

    4/104

    Observations on a measurement X

    $x_1, x_2, x_3, \ldots, x_n$

    taken on individuals (cases) selected at random from a population are random variables prior to their observation.

    The observations are numerical quantities whose values are determined by the outcome of a random experiment (the choosing of a random sample from the population).

  • 8/12/2019 S245 12 Sampling Theory

    5/104

    [Figure: smooth histogram (density curve) of X; horizontal axis 0 to 60, vertical axis 0 to 0.07]

    The probability distribution of the observations $x_1, x_2, x_3, \ldots, x_n$ is sometimes called the population.

    This distribution is the smooth histogram of the variable X for the entire population.

  • 8/12/2019 S245 12 Sampling Theory

    6/104

    [Figure: the same population density curve; horizontal axis 0 to 60, vertical axis 0 to 0.07]

    The population is unobserved (unless all observations in the population have been observed).

  • 8/12/2019 S245 12 Sampling Theory

    7/104

    [Figure: histogram of the sample; horizontal axis 0 to 60, vertical axis 0 to 0.07]

    A histogram computed from the observations $x_1, x_2, x_3, \ldots, x_n$ gives an estimate of the population.

  • 8/12/2019 S245 12 Sampling Theory

    8/104

    A statistic computed from the observations $x_1, x_2, x_3, \ldots, x_n$ is also a random variable prior to observation of the sample.

    A statistic is also a numerical quantity whose value is determined by the outcome of a random experiment (the choosing of a random sample from the population).

  • 8/12/2019 S245 12 Sampling Theory

    9/104

    [Figure: density curve of a sampling distribution; horizontal axis 0 to 60, vertical axis 0 to 0.09]

    The probability distribution of a statistic computed from the observations $x_1, x_2, x_3, \ldots, x_n$ is sometimes called its sampling distribution.

    This distribution describes the random behaviour of the statistic.

  • 8/12/2019 S245 12 Sampling Theory

    10/104

    It is important to determine the sampling distribution of a statistic.

    It will describe its sampling behaviour.

    The sampling distribution will be used to assess the accuracy of the statistic when used for the purpose of estimation.

    Sampling theory is the area of Mathematical Statistics that is interested in determining the sampling distribution of various statistics.

  • 8/12/2019 S245 12 Sampling Theory

    11/104

    Many statistics have a normal distribution.

    This quite often is true if the population is Normal.

    It is also sometimes true if the sample size is reasonably large (reason: the Central Limit Theorem, to be mentioned later).

  • 8/12/2019 S245 12 Sampling Theory

    12/104

    Combining Random Variables

  • 8/12/2019 S245 12 Sampling Theory

    13/104

    Combining Random Variables

    Quite often we have two or more random variables X, Y, Z, etc.

    We combine these random variables using a mathematical expression.

    Important question: What is the distribution of the new random variable?

  • 8/12/2019 S245 12 Sampling Theory

    14/104

    Example 1: Suppose that one performs two independent tasks (A and B):

    X = time to perform task A (normal with mean 25 minutes and standard deviation 3 minutes).

    Y = time to perform task B (normal with mean 15 minutes and standard deviation 2 minutes).

    Let T = X + Y = total time to perform the two tasks.

    What is the distribution of T?

    What is the probability that the two tasks take more than 45 minutes to perform?

  • 8/12/2019 S245 12 Sampling Theory

    15/104

    Example 2:

    Suppose that a student will take three tests in the next three days:

    1. Mathematics (X is the score he will receive on this test.)
    2. English Literature (Y is the score he will receive on this test.)
    3. Social Studies (Z is the score he will receive on this test.)

  • 8/12/2019 S245 12 Sampling Theory

    16/104

    Assume that

    1. X (Mathematics) has a Normal distribution with mean $\mu = 90$ and standard deviation $\sigma = 3$.
    2. Y (English Literature) has a Normal distribution with mean $\mu = 60$ and standard deviation $\sigma = 10$.
    3. Z (Social Studies) has a Normal distribution with mean $\mu = 70$ and standard deviation $\sigma = 7$.

  • 8/12/2019 S245 12 Sampling Theory

    17/104

    Graphs

    [Figure: normal density curves for X (Mathematics), $\mu = 90$, $\sigma = 3$; Y (English Literature), $\mu = 60$, $\sigma = 10$; and Z (Social Studies), $\mu = 70$, $\sigma = 7$; horizontal axis 0 to 100]

  • 8/12/2019 S245 12 Sampling Theory

    18/104

    Suppose that after the tests have been written an overall score, S, will be computed as follows:

    S (Overall score) = 0.50 X (Mathematics) + 0.30 Y (English Literature) + 0.20 Z (Social Studies) + 10 (Bonus marks)

    What is the distribution of the overall score, S?

  • 8/12/2019 S245 12 Sampling Theory

    19/104

    Sums, Differences, Linear Combinations of R.V.s

    A linear combination of random variables X, Y, ... is a combination of the form:

    $L = aX + bY + \cdots + c$ (c a constant)

    where a, b, etc. are numbers, positive or negative.

    Most common:

    Sum = X + Y, Difference = X - Y

    Others:

    Averages = (1/3)X + (1/3)Y + (1/3)Z

    Weighted averages = 0.40X + 0.25Y + 0.35Z

  • 8/12/2019 S245 12 Sampling Theory

    21/104

    Means of Linear Combinations

    If $L = aX + bY + \cdots + c$, the mean of L is:

    Mean(L) = a Mean(X) + b Mean(Y) + ... + c

    $\mu_L = a\mu_X + b\mu_Y + \cdots + c$

    Most common:

    Mean(X + Y) = Mean(X) + Mean(Y)

    Mean(X - Y) = Mean(X) - Mean(Y)

  • 8/12/2019 S245 12 Sampling Theory

    22/104

    Variances of Linear Combinations

    If X, Y, ... are independent random variables and $L = aX + bY + \cdots + c$, then

    Variance(L) = a^2 Variance(X) + b^2 Variance(Y) + ...

    $\sigma_L^2 = a^2\sigma_X^2 + b^2\sigma_Y^2 + \cdots$

    Most common:

    Variance(X + Y) = Variance(X) + Variance(Y)

    Variance(X - Y) = Variance(X) + Variance(Y)

    The constant c has no effect on the variance.

  • 8/12/2019 S245 12 Sampling Theory

    23/104

    Example: Suppose that one performs two independent tasks (A and B):

    X = time to perform task A (normal with mean 25 minutes and standard deviation 3 minutes).

    Y = time to perform task B (normal with mean 15 minutes and standard deviation 2 minutes).

    X and Y are independent, so T = X + Y = total time is normal with

    mean $\mu = 25 + 15 = 40$

    standard deviation $\sigma = \sqrt{3^2 + 2^2} = 3.6$

    What is the probability that the two tasks take more than 45 minutes to perform?

    $P[T > 45] = P\left[Z > \frac{45 - 40}{3.6}\right] = P[Z > 1.39] = 0.0823$
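The calculation for T = X + Y can be checked numerically. The following is a minimal sketch, not part of the original slides; the helper name `normal_cdf` is an illustrative choice, and the normal CDF is obtained from the standard library's error function. Because it does not round the standard deviation to 3.6 or z to 1.39, it gives roughly 0.083, slightly different from the table-based 0.0823.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# T = X + Y with X ~ N(25, 3) and Y ~ N(15, 2), X and Y independent
mu_t = 25 + 15                        # means add
sigma_t = math.sqrt(3**2 + 2**2)      # variances add, then take the square root

z = (45 - mu_t) / sigma_t             # standardize the cut-off of 45 minutes
p_over_45 = 1.0 - normal_cdf(z)       # upper-tail probability P[T > 45]

print(f"mean = {mu_t}, sd = {sigma_t:.4f}, z = {z:.3f}, P[T > 45] = {p_over_45:.4f}")
```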

  • 8/12/2019 S245 12 Sampling Theory

    24/104

    Example 2:

    A student will take three tests in the next three days:

    1. X (Mathematics) has a Normal distribution with mean $\mu = 90$ and standard deviation $\sigma = 3$.
    2. Y (English Literature) has a Normal distribution with mean $\mu = 60$ and standard deviation $\sigma = 10$.
    3. Z (Social Studies) has a Normal distribution with mean $\mu = 70$ and standard deviation $\sigma = 7$.

    Overall score, S = 0.50 X (Mathematics) + 0.30 Y (English Literature) + 0.20 Z (Social Studies) + 10 (Bonus marks)

  • 8/12/2019 S245 12 Sampling Theory

    26/104

    Determine the distribution of

    S = 0.50X + 0.30Y + 0.20Z + 10

    S has a normal distribution with

    Mean $\mu_S = 0.50\mu_X + 0.30\mu_Y + 0.20\mu_Z + 10$
    $= 0.50(90) + 0.30(60) + 0.20(70) + 10$
    $= 45 + 18 + 14 + 10 = 87$

    $\sigma_S = \sqrt{(0.5)^2\sigma_X^2 + (0.3)^2\sigma_Y^2 + (0.2)^2\sigma_Z^2}$
    $= \sqrt{(0.5)^2(3)^2 + (0.3)^2(10)^2 + (0.2)^2(7)^2}$
    $= \sqrt{2.25 + 9 + 1.96} = \sqrt{13.21} = 3.635$
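The mean and standard deviation of the overall score can be reproduced with a small helper. This is a sketch under the independence assumption stated on the slides; the function name `linear_combination_moments` and its arguments are hypothetical, chosen only for illustration.

```python
import math

def linear_combination_moments(coeffs, means, sds, constant=0.0):
    """Mean and sd of L = a1*X1 + a2*X2 + ... + c for independent X1, X2, ..."""
    mean = sum(a * m for a, m in zip(coeffs, means)) + constant
    # The constant shifts the mean only; it has no effect on the variance.
    variance = sum((a * s) ** 2 for a, s in zip(coeffs, sds))
    return mean, math.sqrt(variance)

# S = 0.50 X + 0.30 Y + 0.20 Z + 10 with the test-score distributions above
mean_s, sd_s = linear_combination_moments(
    coeffs=[0.50, 0.30, 0.20],
    means=[90, 60, 70],
    sds=[3, 10, 7],
    constant=10,
)
print(f"mean of S = {mean_s:.2f}, sd of S = {sd_s:.3f}")
```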

  • 8/12/2019 S245 12 Sampling Theory

    27/104

    Graph

    [Figure: distribution of S = 0.50X + 0.30Y + 0.20Z + 10; horizontal axis 0 to 100]

  • 8/12/2019 S245 12 Sampling Theory

    33/104

    Normality of Linear Combinations

    If X, Y, ... are independent Normal random variables and

    $L = aX + bY + \cdots + c$

    then L is Normal with mean

    $\mu_L = a\mu_X + b\mu_Y + \cdots + c$

    and standard deviation

    $\sigma_L = \sqrt{a^2\sigma_X^2 + b^2\sigma_Y^2 + \cdots}$

  • 8/12/2019 S245 12 Sampling Theory

    34/104

    In particular:

    X + Y is normal with

    mean $\mu_X + \mu_Y$ and standard deviation $\sqrt{\sigma_X^2 + \sigma_Y^2}$

    X - Y is normal with

    mean $\mu_X - \mu_Y$ and standard deviation $\sqrt{\sigma_X^2 + \sigma_Y^2}$

  • 8/12/2019 S245 12 Sampling Theory

    35/104

    The distribution of the sample mean

  • 8/12/2019 S245 12 Sampling Theory

    36/104

    The distribution of averages (the mean)

    Let $x_1, x_2, \ldots, x_n$ denote n independent random variables each coming from the same Normal distribution with mean $\mu$ and standard deviation $\sigma$.

    Let

    $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}x_1 + \frac{1}{n}x_2 + \cdots + \frac{1}{n}x_n$

    What is the distribution of $\bar{x}$?

  • 8/12/2019 S245 12 Sampling Theory

    37/104

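    The distribution of averages (the mean)

    Because the mean is a linear combination,

    $\mu_{\bar{x}} = \frac{1}{n}\mu_{x_1} + \frac{1}{n}\mu_{x_2} + \cdots + \frac{1}{n}\mu_{x_n} = \frac{1}{n}\mu + \frac{1}{n}\mu + \cdots + \frac{1}{n}\mu = n\left(\frac{1}{n}\right)\mu = \mu$

    and

    $\sigma_{\bar{x}}^2 = \left(\frac{1}{n}\right)^2\sigma_{x_1}^2 + \left(\frac{1}{n}\right)^2\sigma_{x_2}^2 + \cdots + \left(\frac{1}{n}\right)^2\sigma_{x_n}^2 = \left(\frac{1}{n}\right)^2\sigma^2 + \cdots + \left(\frac{1}{n}\right)^2\sigma^2 = n\left(\frac{1}{n}\right)^2\sigma^2 = \frac{\sigma^2}{n}$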

  • 8/12/2019 S245 12 Sampling Theory

    38/104

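    Thus if $x_1, x_2, \ldots, x_n$ denote n independent random variables each coming from the same Normal distribution with mean $\mu$ and standard deviation $\sigma$, then

    $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}x_1 + \frac{1}{n}x_2 + \cdots + \frac{1}{n}x_n$

    has a Normal distribution with

    mean $\mu_{\bar{x}} = \mu$ and variance $\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$

    standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$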
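The result that the sample mean has standard deviation $\sigma/\sqrt{n}$ can be illustrated by simulation. The sketch below is not from the slides; the values of `mu`, `sigma`, `n`, the seed, and the number of replicates are all illustrative assumptions. It draws many samples, averages each one, and compares the spread of the averages with the theoretical value.

```python
import math
import random
import statistics

random.seed(245)                      # fixed seed so the run is repeatable

mu, sigma, n = 220.0, 17.0, 10        # illustrative population parameters and sample size
replicates = 20000                    # number of simulated samples

# Each replicate: draw n Normal observations and record their sample mean
sample_means = [
    statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(replicates)
]

empirical_mean = statistics.fmean(sample_means)
empirical_sd = statistics.stdev(sample_means)
theoretical_sd = sigma / math.sqrt(n)

print(f"mean of sample means: {empirical_mean:.2f} (theory {mu})")
print(f"sd of sample means:   {empirical_sd:.3f} (theory {theoretical_sd:.3f})")
```

With this many replicates the empirical values should land close to the theoretical ones, though the exact figures depend on the random draws.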

  • 8/12/2019 S245 12 Sampling Theory

    39/104

    Graphs

    [Figure: the probability distribution of individual observations (standard deviation $\sigma$) and the narrower probability distribution of the mean (standard deviation $\sigma/\sqrt{n}$), both centred at $\mu$; horizontal axis 150 to 310]

  • 8/12/2019 S245 12 Sampling Theory

    40/104

    Summary

    The distribution of the sample mean is Normal.

    The distribution of the sample mean has exactly the same mean as the population ($\mu$).

    The distribution of the sample mean has a smaller standard deviation than the population: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ compared to $\sigma$.

    Averaging tends to decrease variability.

    An Excel file illustrating the distribution of the sample mean:
    http://localhost/var/www/apps/conversion/tmp/scratch_5/mean.XLS

  • 8/12/2019 S245 12 Sampling Theory

    41/104

    Example

    Suppose we are measuring the cholesterol level of men age 60-65.

    This measurement has a Normal distribution with mean $\mu = 220$ and standard deviation $\sigma = 17$.

    A sample of n = 10 males age 60-65 is selected and the cholesterol level is measured for those 10 males.

    $x_1, x_2, \ldots, x_{10}$ are those 10 measurements.

    Find the probability distribution of $\bar{x}$.

    Compute the probability that $\bar{x}$ is between 215 and 225.

  • 8/12/2019 S245 12 Sampling Theory

    42/104

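    Solution

    Find the probability distribution of $\bar{x}$: Normal with

    $\mu_{\bar{x}} = \mu = 220$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{17}{\sqrt{10}} = 5.376$

    $P[215 \le \bar{x} \le 225] = P\left[\frac{215 - 220}{5.376} \le \frac{\bar{x} - 220}{5.376} \le \frac{225 - 220}{5.376}\right] = P[-0.930 \le z \le 0.930] = 0.648$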
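The cholesterol calculation above can be checked in a few lines. This is a minimal sketch rather than the course's method (the slides use normal tables); `normal_cdf` is an illustrative helper built on the standard library's `math.erf`.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n = 220, 17, 10
se = sigma / math.sqrt(n)             # standard deviation of the sample mean

z_low = (215 - mu) / se
z_high = (225 - mu) / se
p_between = normal_cdf(z_high) - normal_cdf(z_low)

print(f"se = {se:.3f}, z from {z_low:.3f} to {z_high:.3f}, P = {p_between:.3f}")
```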

  • 8/12/2019 S245 12 Sampling Theory

    43/104

    The Central Limit Theorem

    The Central Limit Theorem (C.L.T.) states that if n is sufficiently large, the sample means of random samples from any population with mean $\mu$ and finite standard deviation $\sigma$ are approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.

    Technical Note:

    The mean and standard deviation given in the CLT hold for any sample size; it is only the approximately normal shape that requires n to be sufficiently large.

  • 8/12/2019 S245 12 Sampling Theory

    44/104

    Graphical Illustration of the Central Limit Theorem

    [Figure: four panels showing the original population and the distribution of $\bar{x}$ for n = 2, n = 10 and n = 30]

  • 8/12/2019 S245 12 Sampling Theory

    45/104

    Implications of the Central Limit Theorem

    The conclusion that the sampling distribution of the sample mean is Normal will be true if the sample size is large (> 30), even though the population may be non-normal.

    When the population can be assumed to be normal, the conclusion that the sampling distribution of the sample mean is Normal will be true for any sample size.

    Knowing the sampling distribution of the sample mean allows us to answer probability questions related to the sample mean.

  • 8/12/2019 S245 12 Sampling Theory

    46/104

    Example

    Example: Consider a normal population with $\mu = 50$ and $\sigma = 15$. Suppose a sample of size 9 is selected at random. Find:

    1) $P(45 \le \bar{x} \le 60)$
    2) $P(\bar{x} \le 47.5)$

    Solutions: Since the original population is normal, the distribution of the sample mean is also (exactly) normal, with

    $\mu_{\bar{x}} = \mu = 50$

    $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 15/\sqrt{9} = 15/3 = 5$

  • 8/12/2019 S245 12 Sampling Theory

    47/104

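    Example

    1) $P(45 \le \bar{x} \le 60) = P\left(\frac{45 - 50}{5} \le z \le \frac{60 - 50}{5}\right) = P(-1.00 \le z \le 2.00) = 0.8413 - 0.0228 = 0.8185$

    where $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

    [Figure: normal curve with the $\bar{x}$ axis marked at 45, 50, 60 and the z axis marked at -1.00, 0, 2.00]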

  • 8/12/2019 S245 12 Sampling Theory

    48/104

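    Example

    2) $P(\bar{x} \le 47.5) = P\left(\frac{\bar{x} - 50}{5} \le \frac{47.5 - 50}{5}\right) = P(z \le -0.50) = 0.5000 - 0.1915 = 0.3085$

    where $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

    [Figure: normal curve with the $\bar{x}$ axis marked at 47.5, 50 and the z axis marked at -0.50, 0; the tail area is 0.3085]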
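Both parts of this example follow the same standardization step, which a short script can reproduce. This is a sketch for checking the table lookups, assuming the inequalities as reconstructed above; `normal_cdf` and the variable names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n = 50, 15, 9
se = sigma / math.sqrt(n)             # sigma / sqrt(n) = 5

def z_score(x_bar):
    """Standardize a value of the sample mean: z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / se

# 1) P(45 <= x_bar <= 60)
p_part1 = normal_cdf(z_score(60)) - normal_cdf(z_score(45))

# 2) P(x_bar <= 47.5)
p_part2 = normal_cdf(z_score(47.5))

print(f"1) {p_part1:.4f}   2) {p_part2:.4f}")
```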

  • 8/12/2019 S245 12 Sampling Theory

    49/104

    Example

  • 8/12/2019 S245 12 Sampling Theory

    50/104

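    Example

    1) $P(\bar{x} \le 105) = P\left(z \le \frac{105 - 109}{2.83}\right) = P(z \le -1.41) = 0.0793$

    where $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

    [Figure: normal curve with the $\bar{x}$ axis marked at 105, 109 and the z axis marked at -1.41, 0; the tail area is 0.0793]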

  • 8/12/2019 S245 12 Sampling Theory

    51/104

    2) To investigate the claim, we need to examine how likely an observation the sample mean of $120 is.

    Consider how far out in the tail of the distribution of the sample mean $120 is:

    P(x̄ ≥ 120) = P(z ≥ (120 - 109)/2.83) = P(z ≥ 3.89) = 1.0000 - 0.9999 = 0.0001, where z = (x̄ - μ)/(σ/√n)

    Since the probability is so small, this suggests the observation of $120 is very rare (if the mean cost is really $109).

    There is evidence (the sample) to suggest the claim of μ = $109 is likely wrong.

  • 8/12/2019 S245 12 Sampling Theory

    52/104

    Summary

    The distribution of $\bar{x}$ is (exactly) normal when the original population is normal.

    The CLT says: the distribution of $\bar{x}$ is approximately normal regardless of the shape of the original distribution, when the sample size is large enough!

    The mean of the sampling distribution of $\bar{x}$ is equal to the mean of the original population:

    $\mu_{\bar{x}} = \mu$

    The standard deviation of the sampling distribution of $\bar{x}$ (also called the standard error of the mean) is equal to the standard deviation of the original population divided by the square root of the sample size:

    $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

  • 8/12/2019 S245 12 Sampling Theory

    53/104

    Sampling Distribution of a Sample Proportion

  • 8/12/2019 S245 12 Sampling Theory

    54/104

    Sampling Distribution for Sample Proportions

    Let p = population proportion of interest, or binomial probability of success.

    Let

    $\hat{p} = \frac{X}{n} = \frac{\text{no. of successes}}{\text{no. of binomial trials}}$ = sample proportion, or proportion of successes.

    Then the sampling distribution of $\hat{p}$ is approximately a normal distribution with

    mean $\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$

  • 8/12/2019 S245 12 Sampling Theory

    55/104

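    Logic

    Recall X = the number of successes in n trials has a Binomial distribution with parameters n and p (the probability of success).

    Also X has approximately a Normal distribution with

    mean $\mu_X = np$ and standard deviation $\sigma_X = \sqrt{npq} = \sqrt{np(1-p)}$

    Then the sampling distribution of

    $\hat{p} = \frac{X}{n} = \frac{1}{n}X$

    is approximately a normal distribution with

    mean $\mu_{\hat{p}} = \frac{1}{n}\mu_X = \frac{1}{n}\,np = p$

    and $\sigma_{\hat{p}} = \frac{1}{n}\sigma_X = \frac{1}{n}\sqrt{np(1-p)} = \sqrt{\frac{p(1-p)}{n}}$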

  • 8/12/2019 S245 12 Sampling Theory

    56/104

    [Figure: sampling distribution of $\hat{p}$, a normal curve centred at $\mu_{\hat{p}} = p$ with standard deviation $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$; horizontal axis 0 to 1, vertical axis 0 to 30]

  • 8/12/2019 S245 12 Sampling Theory

    57/104

    Example: Sample Proportion Favoring a Candidate

    Suppose 20% of all voters favor Candidate A. Pollsters take a sample of n = 600 voters. Then the sample proportion who favor A will have approximately a normal distribution with

    mean $\mu_{\hat{p}} = p = 0.20$

    $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.20(0.80)}{600}} = 0.01633$

  • 8/12/2019 S245 12 Sampling Theory

    58/104

    [Figure: sampling distribution of $\hat{p}$; horizontal axis 0 to 1, vertical axis 0 to 30]

  • 8/12/2019 S245 12 Sampling Theory

    59/104

    Suppose 20% of all voters favor Candidate A. Pollsters take a sample of n = 600 voters.

    Using the sampling distribution, determine the probability that the sample proportion will be between 0.18 and 0.22, i.e. the probability $P[0.18 \le \hat{p} \le 0.22]$.

  • 8/12/2019 S245 12 Sampling Theory

    60/104

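    Solution:

    Recall $\mu_{\hat{p}} = p = 0.20$ and $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.20(0.80)}{600}} = 0.01633$

    $P[0.18 \le \hat{p} \le 0.22] = P\left[\frac{0.18 - 0.20}{0.01633} \le \frac{\hat{p} - 0.20}{0.01633} \le \frac{0.22 - 0.20}{0.01633}\right]$

    $= P[-1.225 \le z \le 1.225] = 0.8897 - 0.1103 = 0.7794$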
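The polling calculation uses the normal approximation to the sampling distribution of the sample proportion. The sketch below reproduces it numerically; it is an illustration only, with `normal_cdf` and the variable names chosen for this example.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p, n = 0.20, 600
se_p_hat = math.sqrt(p * (1 - p) / n)     # standard deviation of the sample proportion

z_low = (0.18 - p) / se_p_hat
z_high = (0.22 - p) / se_p_hat
prob = normal_cdf(z_high) - normal_cdf(z_low)

print(f"se = {se_p_hat:.5f}, z = +/-{z_high:.3f}, P[0.18 <= p_hat <= 0.22] = {prob:.4f}")
```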

  • 8/12/2019 S245 12 Sampling Theory

    61/104

  • 8/12/2019 S245 12 Sampling Theory

    65/104

    Sampling distribution of differences

  • 8/12/2019 S245 12 Sampling Theory

    66/104

    Sampling distribution of a difference in two sample means

  • 8/12/2019 S245 12 Sampling Theory

    67/104

    Recall

    If X, Y are independent normal random variables, then X - Y is normal with

    mean $\mu_X - \mu_Y$ and standard deviation $\sqrt{\sigma_X^2 + \sigma_Y^2}$

  • 8/12/2019 S245 12 Sampling Theory

    68/104

    Comparing Means

    Situation

    We have two normal populations (1 and 2).

    Let $\mu_1$ and $\sigma_1$ denote the mean and standard deviation of population 1.

    Let $\mu_2$ and $\sigma_2$ denote the mean and standard deviation of population 2.

    Let $x_1, x_2, x_3, \ldots, x_n$ denote a sample from normal population 1.

    Let $y_1, y_2, y_3, \ldots, y_m$ denote a sample from normal population 2.

    The objective is to compare the two population means.

  • 8/12/2019 S245 12 Sampling Theory

    69/104

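    We know that:

    $\bar{x}$ is Normal with mean $\mu_{\bar{x}} = \mu_1$ and $\sigma_{\bar{x}} = \frac{\sigma_1}{\sqrt{n}}$

    and

    $\bar{y}$ is Normal with mean $\mu_{\bar{y}} = \mu_2$ and $\sigma_{\bar{y}} = \frac{\sigma_2}{\sqrt{m}}$

    Thus $D = \bar{x} - \bar{y}$ is Normal with mean

    $\mu_{\bar{x} - \bar{y}} = \mu_{\bar{x}} - \mu_{\bar{y}} = \mu_1 - \mu_2$

    and standard deviation

    $\sigma_{\bar{x} - \bar{y}} = \sqrt{\sigma_{\bar{x}}^2 + \sigma_{\bar{y}}^2} = \sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}$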

  • 8/12/2019 S245 12 Sampling Theory

    70/104

    Example

    Consider measuring heart rate two minutes after a twenty minute exercise program.

    There are two groups of individuals:

    1. Those who performed exercise program A (considered to be heavy).
    2. Those who performed exercise program B (considered to be light).

    The average heart rate for those who performed exercise program A was $\mu_1 = 110$ with standard deviation $\sigma_1 = 7.3$, while the average heart rate for those who performed exercise program B was $\mu_2 = 95$ with standard deviation $\sigma_2 = 4.5$.

  • 8/12/2019 S245 12 Sampling Theory

    71/104

    [Figure: normal density curves of heart rate for program B and for program A; horizontal axis 80 to 130]

  • 8/12/2019 S245 12 Sampling Theory

    72/104

    Situation

    Suppose we observe the heart rate of n = 15 subjects on program A.

    Let $x_1, x_2, x_3, \ldots, x_{15}$ denote these observations.

    We also observe the heart rate of m = 20 subjects on program B.

    Let $y_1, y_2, y_3, \ldots, y_{20}$ denote these observations.

    What is the probability that the sample mean heart rate for Program A is at least 8 units higher than the sample mean heart rate for Program B?

  • 8/12/2019 S245 12 Sampling Theory

    73/104

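    We know that:

    $\bar{x}$ is Normal with mean $\mu_{\bar{x}} = 110$ and $\sigma_{\bar{x}} = \frac{7.3}{\sqrt{15}}$

    and

    $\bar{y}$ is Normal with mean $\mu_{\bar{y}} = 95$ and $\sigma_{\bar{y}} = \frac{4.5}{\sqrt{20}}$

    Thus $D = \bar{x} - \bar{y}$ is Normal with mean

    $\mu_{\bar{x} - \bar{y}} = \mu_{\bar{x}} - \mu_{\bar{y}} = 110 - 95 = 15$

    and standard deviation

    $\sigma_{\bar{x} - \bar{y}} = \sqrt{\sigma_{\bar{x}}^2 + \sigma_{\bar{y}}^2} = \sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}} = \sqrt{\frac{7.3^2}{15} + \frac{4.5^2}{20}} = 2.1366$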

  • 8/12/2019 S245 12 Sampling Theory

    74/104

    [Figure: distribution of the sample mean for program B and distribution of the sample mean for program A; horizontal axis 80 to 130]

  • 8/12/2019 S245 12 Sampling Theory

    75/104

    [Figure: distribution of the difference in sample means, D; horizontal axis 0 to 30]

  • 8/12/2019 S245 12 Sampling Theory

    76/104

    What is the probability that the sample mean heart rate for Program A is at least 8 units higher than the sample mean heart rate for Program B?

    Solution

    We want $P[\bar{x} \ge \bar{y} + 8] = P[\bar{x} - \bar{y} \ge 8] = P[D \ge 8]$

    $= P\left[\frac{D - 15}{2.1366} \ge \frac{8 - 15}{2.1366}\right] = P[z \ge -3.28] = 1 - 0.0005 = 0.9995$
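The heart-rate comparison can be verified with the same standardization idea applied to the difference D. This is a minimal sketch under the independence and normality assumptions stated on the slides; the variable names and `normal_cdf` helper are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu1, sigma1, n = 110, 7.3, 15         # program A
mu2, sigma2, m = 95, 4.5, 20          # program B

mean_d = mu1 - mu2                                    # mean of D = x_bar - y_bar
sd_d = math.sqrt(sigma1**2 / n + sigma2**2 / m)       # variances of the two means add

z = (8 - mean_d) / sd_d
p_at_least_8 = 1.0 - normal_cdf(z)                    # P[D >= 8]

print(f"mean = {mean_d}, sd = {sd_d:.4f}, z = {z:.2f}, P[D >= 8] = {p_at_least_8:.4f}")
```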

  • 8/12/2019 S245 12 Sampling Theory

    77/104

    Sampling distribution of a difference in two sample proportions

  • 8/12/2019 S245 12 Sampling Theory

    78/104

    Comparing Proportions

    Situation

    Suppose we have two Success-Failure experiments.

    Let $p_1$ = the probability of success for experiment 1.

    Let $p_2$ = the probability of success for experiment 2.

    Suppose that experiment 1 is repeated $n_1$ times and experiment 2 is repeated $n_2$ times.

    Let $x_1$ = the no. of successes in the $n_1$ repetitions of experiment 1, and $x_2$ = the no. of successes in the $n_2$ repetitions of experiment 2.

    $\hat{p}_1 = \frac{x_1}{n_1}$ and $\hat{p}_2 = \frac{x_2}{n_2}$

    What is the distribution of $D = \hat{p}_1 - \hat{p}_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2}$?

    We know that:

  • 8/12/2019 S245 12 Sampling Theory

    79/104

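    $\hat{p}_1 = \frac{x_1}{n_1}$ is Normal with mean $\mu_{\hat{p}_1} = p_1$ and $\sigma_{\hat{p}_1} = \sqrt{\frac{p_1(1 - p_1)}{n_1}}$

    Also $\hat{p}_2 = \frac{x_2}{n_2}$ is Normal with mean $\mu_{\hat{p}_2} = p_2$ and $\sigma_{\hat{p}_2} = \sqrt{\frac{p_2(1 - p_2)}{n_2}}$

    Thus $D = \hat{p}_1 - \hat{p}_2$ is Normal with mean

    $\mu_{\hat{p}_1 - \hat{p}_2} = \mu_{\hat{p}_1} - \mu_{\hat{p}_2} = p_1 - p_2$

    and standard deviation

    $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\sigma_{\hat{p}_1}^2 + \sigma_{\hat{p}_2}^2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$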

    Example

  • 8/12/2019 S245 12 Sampling Theory

    80/104

    The Globe and Mail carried out a survey to investigate the State of the Baby Boomers (June 2006).

    Two populations in the study:

    1. Baby Boomers (age 40-59) ($n_1 = 664$)
    2. Generation X (age 30-39) ($n_2 = 342$)

  • 8/12/2019 S245 12 Sampling Theory

    81/104

    One of the questions:

    Are you close to your parents? Yes or No

    Suppose that the proportions in the two populations were:

    Baby Boomers: 40% yes ($p_1 = 0.40$)

    Generation X: 20% yes ($p_2 = 0.20$)

    What is the probability that this would be observed in the samples to a certain degree?

    What is $P[\hat{p}_1 - \hat{p}_2 \ge 0.15]$?

    Solution:

  • 8/12/2019 S245 12 Sampling Theory

    82/104

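    $\hat{p}_1 = \frac{x_1}{n_1}$ is Normal with mean $\mu_{\hat{p}_1} = 0.40$ and

    $\sigma_{\hat{p}_1} = \sqrt{\frac{p_1(1 - p_1)}{n_1}} = \sqrt{\frac{0.40(1 - 0.40)}{664}} = 0.019012$

    Also $\hat{p}_2 = \frac{x_2}{n_2}$ is Normal with mean $\mu_{\hat{p}_2} = 0.20$ and

    $\sigma_{\hat{p}_2} = \sqrt{\frac{p_2(1 - p_2)}{n_2}} = \sqrt{\frac{0.20(1 - 0.20)}{342}} = 0.02163$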

  • 8/12/2019 S245 12 Sampling Theory

    83/104

    [Figure: distribution of the sample proportion for Gen X and distribution of the sample proportion for Baby Boomers; horizontal axis 0 to 1, vertical axis 0 to 25]

  • 8/12/2019 S245 12 Sampling Theory

    84/104

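    Now $D = \hat{p}_1 - \hat{p}_2$ is Normal with mean

    $\mu_D = \mu_{\hat{p}_1 - \hat{p}_2} = \mu_{\hat{p}_1} - \mu_{\hat{p}_2} = p_1 - p_2 = 0.4 - 0.2 = 0.2$

    and standard deviation

    $\sigma_D = \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\sigma_{\hat{p}_1}^2 + \sigma_{\hat{p}_2}^2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$

    $= \sqrt{\frac{0.4(1 - 0.4)}{664} + \frac{0.2(1 - 0.2)}{342}} = 0.028797$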
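The mean and standard deviation of D can be reproduced directly, and a tail probability then follows by standardizing. The sketch below is an illustration only: the final probability is not given on the slides, and the direction of the inequality (D at least 0.15) is an assumption, since the symbol is lost in this transcript. `normal_cdf` and the variable names are likewise illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p1, n1 = 0.40, 664                    # Baby Boomers
p2, n2 = 0.20, 342                    # Generation X

mean_d = p1 - p2
sd_d = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# ASSUMPTION: the question is read as P[D >= 0.15]
z = (0.15 - mean_d) / sd_d
p_at_least = 1.0 - normal_cdf(z)

print(f"mean = {mean_d:.2f}, sd = {sd_d:.6f}, z = {z:.3f}, P[D >= 0.15] = {p_at_least:.4f}")
```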

  • 8/12/2019 S245 12 Sampling Theory

    85/104

    [Figure: distribution of $D = \hat{p}_1 - \hat{p}_2$; horizontal axis 0 to 1, vertical axis 0 to 16]

  • 8/12/2019 S245 12 Sampling Theory

    86/104

  • 8/12/2019 S245 12 Sampling Theory

    87/104

    Sampling distributions

    Summary

  • 8/12/2019 S245 12 Sampling Theory

    88/104

    Distribution for Sample Mean

    If data is collected from a Normal distribution with mean $\mu$ and standard deviation $\sigma$, then the sampling distribution of $\bar{x}$ is a normal distribution with

    mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

  • 8/12/2019 S245 12 Sampling Theory

    89/104

    The Central Limit Theorem

    If data is collected from a distribution (possibly non-Normal) with mean $\mu$ and standard deviation $\sigma$, then the sampling distribution of $\bar{x}$ is approximately normal (for n > 30) with

    mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

  • 8/12/2019 S245 12 Sampling Theory

    90/104

  • 8/12/2019 S245 12 Sampling Theory

    91/104

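    Distribution of a difference in two sample means

    $D = \bar{x} - \bar{y}$ is Normal with mean

    $\mu_{\bar{x} - \bar{y}} = \mu_{\bar{x}} - \mu_{\bar{y}} = \mu_1 - \mu_2$

    and standard deviation

    $\sigma_{\bar{x} - \bar{y}} = \sqrt{\sigma_{\bar{x}}^2 + \sigma_{\bar{y}}^2} = \sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}$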

    Distribution of a difference in two sample proportions

  • 8/12/2019 S245 12 Sampling Theory

    92/104

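    $D = \hat{p}_1 - \hat{p}_2$ is Normal with mean

    $\mu_{\hat{p}_1 - \hat{p}_2} = \mu_{\hat{p}_1} - \mu_{\hat{p}_2} = p_1 - p_2$

    and standard deviation

    $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\sigma_{\hat{p}_1}^2 + \sigma_{\hat{p}_2}^2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$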

  • 8/12/2019 S245 12 Sampling Theory

    93/104

    The Chi-square ($\chi^2$) distribution

  • 8/12/2019 S245 12 Sampling Theory

    94/104

    The Chi-squared distribution with n degrees of freedom

    Comment: If $z_1, z_2, \ldots, z_n$ are independent random variables each having a standard normal distribution, then

    $U = z_1^2 + z_2^2 + \cdots + z_n^2$

    has a chi-squared distribution with n degrees of freedom.

  • 8/12/2019 S245 12 Sampling Theory

    95/104

    The Chi-squared distribution with n degrees of freedom

    [Figure: density curve of the chi-squared distribution with n degrees of freedom; horizontal axis 0 to 20, vertical axis 0 to 0.18]

  • 8/12/2019 S245 12 Sampling Theory

    96/104

    [Figure: chi-squared density curves for 2 d.f., 3 d.f. and 4 d.f.; horizontal axis 2 to 14, vertical axis 0.1 to 0.5]

  • 8/12/2019 S245 12 Sampling Theory

    97/104

    Statistics that have the Chi-squared distribution:

    1. $\chi^2 = \sum_{j=1}^{c}\sum_{i=1}^{r} \frac{(x_{ij} - E_{ij})^2}{E_{ij}} = \sum_{j=1}^{c}\sum_{i=1}^{r} r_{ij}^2$

    The statistic used to detect independence between two categorical variables.

    d.f. = (r - 1)(c - 1)

  • 8/12/2019 S245 12 Sampling Theory

    98/104

    2. Let $x_1, x_2, \ldots, x_n$ denote a sample from the normal distribution with mean $\mu$ and standard deviation $\sigma$, then

    $U = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sigma^2} = \frac{(n-1)s^2}{\sigma^2}$

    has a chi-square distribution with d.f. = n - 1.

    Example

  • 8/12/2019 S245 12 Sampling Theory

    99/104

    Suppose that $x_1, x_2, \ldots, x_{10}$ is a sample of size n = 10 from the normal distribution with mean $\mu = 100$ and standard deviation $\sigma = 15$.

    Suppose that

    $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$

    is the sample standard deviation.

    Find $P[10 \le s \le 20]$.

    Note

  • 8/12/2019 S245 12 Sampling Theory

    100/104

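    $U = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sigma^2} = \frac{(n-1)s^2}{\sigma^2} = \frac{9 s^2}{(15)^2}$

    has a chi-square distribution with d.f. = n - 1 = 9.

    $P[10 \le s \le 20] = P[100 \le s^2 \le 400] = P\left[\frac{9(100)}{15^2} \le \frac{9 s^2}{15^2} \le \frac{9(400)}{15^2}\right] = P[4 \le U \le 16]$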

  • 8/12/2019 S245 12 Sampling Theory

    101/104

    The Excel function

  • 8/12/2019 S245 12 Sampling Theory

    102/104

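    $P[4 \le U \le 16]$

    CHIDIST(x, df) computes $P[x \le U]$.

    [Figure: chi-squared density with the point x marked and the upper-tail area labelled $P[x \le U]$]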

  • 8/12/2019 S245 12 Sampling Theory

    103/104

    $P[4 \le U \le 16]$ = CHIDIST(4,9) - CHIDIST(16,9)

    = 0.91141 - 0.06688 = 0.84453
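For readers without Excel, the same upper-tail probabilities can be computed in plain Python. The sketch below is not the slides' method: it evaluates the chi-squared upper tail through a standard series for the regularized lower incomplete gamma function, and the function name `chi2_upper_tail` is a hypothetical stand-in for what CHIDIST returns.

```python
import math

def chi2_upper_tail(x, df):
    """P[U >= x] for U ~ chi-squared(df), the quantity Excel's CHIDIST(x, df) returns.

    Uses the series P(a, t) = exp(-t) * t**a / Gamma(a + 1) * sum_k t**k / ((a+1)...(a+k))
    for the regularized lower incomplete gamma function with a = df/2, t = x/2.
    """
    if x <= 0:
        return 1.0
    a, t = df / 2.0, x / 2.0
    prefactor = math.exp(-t + a * math.log(t) - math.lgamma(a + 1.0))
    term, total, k = 1.0, 1.0, 0
    while term > 1e-15 * total:       # stop once further terms are negligible
        k += 1
        term *= t / (a + k)
        total += term
    return 1.0 - prefactor * total

p_ge_4 = chi2_upper_tail(4, 9)
p_ge_16 = chi2_upper_tail(16, 9)
p_between = p_ge_4 - p_ge_16          # P[4 <= U <= 16] = P[10 <= s <= 20]

print(f"P[U >= 4] = {p_ge_4:.5f}, P[U >= 16] = {p_ge_16:.5f}, difference = {p_between:.5f}")
```

The series converges for any positive x, although for very large x relative to df a continued-fraction form would be the more usual choice.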

  • 8/12/2019 S245 12 Sampling Theory

    104/104

    Statistical Inference

    http://localhost/var/www/apps/conversion/tmp/scratch_5/S245%2013%20Statistical%20Inference.ppt