UL(Inferences From Two Samples)-Salva

  • 7/28/2019 UL(Inferences From Two Samples)-Salva

    1/75

    INFERENCES FROM TWO SAMPLES

    SALVACION M. VINLUAN
    UNIVERSITY OF LUZON
    Doctor of Philosophy in Development Education

    DR. IMELDA E. CUARTEL
    Professor

    Definitions

    Testing Two Means, Dependent Case: The Mean of the Differences

    Testing Two Variances

    Testing Two Means, Independent Case: The Difference of the Means

    Testing Two Proportions

    What is Statistical Inference?

    Statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation.

    More substantially, the terms statistical inference, statistical induction, and inferential statistics describe systems of procedures that can be used to draw conclusions from datasets arising from systems affected by random variation, such as observational errors, random sampling, or random experimentation.

    The initial requirements of such a system of procedures for inference and induction are that it should produce reasonable answers when applied to well-defined situations and that it should be general enough to be applied across a range of situations.

    The outcome of statistical inference may be an answer to the question "What should be done next?", where there might be a decision about making further experiments or surveys, or about drawing conclusions before implementing some organizational or governmental policy.

    Statistical inference also refers to the statistical method concerned with making estimates of population values. This method helps us determine how accurate and acceptable our generalizations are.

    Testing Two Means, Dependent Case: The Mean of the Difference

    There are two possible cases when testing population means: the dependent case and the independent case.

    The t-Test for Dependent Samples

    The t-test for dependent samples is applied to matched pairs or correlated samples. The samples are assumed to be taken from one population. For example, in a research study on the degree of seriousness of problems encountered by college freshmen, data were taken before and after their individual counseling sessions.

    From the 15-item problem checklist, the corresponding degrees of seriousness of problems encountered by the students before and after their individual sessions comprise the data to be compared. This design is also referred to as repeated measures.

    To compute the t-value for dependent samples, the formula is as follows:

    $$t = \frac{\sum D}{\sqrt{\dfrac{n\sum D^2 - \left(\sum D\right)^2}{n - 1}}}$$

    where:
    t = t-value
    D = difference between the paired scores
    n = number of cases (pairs)

    Example:

    Student   Before   After
    1         3.00     4.00
    2         3.25     3.50
    3         3.00     3.50
    4         2.50     3.60
    5         2.75     3.45
    6         2.50     2.75
    7         2.75     4.75
    8         3.75     4.00
    9         3.50     3.51
    10        3.60     4.50

    Solution:

    Step 1: State the hypotheses, where μ1 and μ2 are the mean degrees of seriousness before and after counseling.
    Ho: μ1 = μ2
    H1: μ1 ≠ μ2

    Step 2: α = 0.05 (two-tailed test)

    Step 3: Reject the null hypothesis if the absolute value of the computed t-value is greater than the critical t-value of 2.262 at α = 0.05 (two-tailed) with nine degrees of freedom.

    Step 4: Compute the t-test for dependent samples, taking D = After - Before.

    $$t = \frac{\sum D}{\sqrt{\dfrac{n\sum D^2 - \left(\sum D\right)^2}{n - 1}}}$$

    Student   Before (X1)   After (X2)   D      D²
    1         3.00          4.00         1.00   1.0000
    2         3.25          3.50         0.25   0.0625
    3         3.00          3.50         0.50   0.2500
    4         2.50          3.60         1.10   1.2100
    5         2.75          3.45         0.70   0.4900
    6         2.50          2.75         0.25   0.0625
    7         2.75          4.75         2.00   4.0000
    8         3.75          4.00         0.25   0.0625
    9         3.50          3.51         0.01   0.0001
    10        3.60          4.50         0.90   0.8100
    Total                                6.96   7.9476

    $$t = \frac{6.96}{\sqrt{\dfrac{10(7.9476) - (6.96)^2}{10 - 1}}} = \frac{6.96}{\sqrt{3.4483}} \approx 3.75$$

    http://www.ehow.com/how_7213406_calculate-t_statistic.html

    The degrees of freedom:
    df = n - 1 = 10 - 1 = 9

    Step 5: Reject the null hypothesis, since the computed value of 3.75 is greater than the critical value of 2.262 at α = 0.05 with 9 degrees of freedom.

    Conclusion:

    There is a significant difference between the degrees of seriousness of problems encountered by the students before and after their exposure to individual counseling sessions.
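The arithmetic of this example can be checked with a short script. What follows is a minimal sketch rather than part of the original slides: the function name `paired_t` is illustrative, differences are taken as After - Before, and student 8's after-score is assumed to be 4.00 so that the data agree with the totals ΣD = 6.96 and ΣD² = 7.9476.

```python
import math

# Scores from the worked example. Assumption: student 8's "after" score is 4.00,
# which is the value consistent with the column totals in the solution table.
before = [3.00, 3.25, 3.00, 2.50, 2.75, 2.50, 2.75, 3.75, 3.50, 3.60]
after = [4.00, 3.50, 3.50, 3.60, 3.45, 2.75, 4.75, 4.00, 3.51, 4.50]


def paired_t(x_before, x_after):
    """Return (t, df) with t = sum(D) / sqrt((n*sum(D^2) - sum(D)^2) / (n - 1))."""
    if len(x_before) != len(x_after):
        raise ValueError("paired samples must have the same length")
    d = [a - b for b, a in zip(x_before, x_after)]  # D = After - Before
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    t = sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return t, n - 1


t_value, df = paired_t(before, after)
print(f"t = {t_value:.2f}, df = {df}")
```

Run as written, this should report a t-value of about 3.75 with 9 degrees of freedom, which lies well beyond the tabled critical value, so the decision to reject the null hypothesis is unchanged.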

    Testing Two Variances

    A technique commonly used under the F-test is referred to as the Analysis of Variance (ANOVA), since it deals with the ratio of the variability occurring among the different groups or treatments to the variability occurring within the members of each of the groups or treatments.

    Moreover, it is assumed that the different groups have equal variances. The following is an example of the data layout for this test.

                        Groups or Treatments
    Replicates      1      2      3      ...    n
    1               X11    X12    X13    ...    X1n
    2               X21    X22    X23    ...    X2n
    3               X31    X32    X33    ...    X3n
    .
    .
    m               Xm1    Xm2    Xm3    ...    Xmn
    Totals          T1     T2     T3     ...    Tn
    Means           X̄1     X̄2     X̄3     ...    X̄n
    Sum Squares     SS1    SS2    SS3    ...    SSn

    where:

    m = the number of replicates
    n = the number of groups or treatments
    Xjk = the observation belonging to the jth replicate of the kth treatment or group, with 1 ≤ j ≤ m and 1 ≤ k ≤ n

    Tk = the total of the kth treatment over the m replicates, i.e.

    $$T_k = \sum_{j=1}^{m} X_{jk}$$

    X̄k = the mean of the kth treatment over the m replicates, i.e.

    $$\bar{X}_k = \frac{1}{m}\sum_{j=1}^{m} X_{jk}$$

    SSk = the sum of the squares of the observations of the kth treatment over the m replicates, i.e.

    $$SS_k = \sum_{j=1}^{m} X_{jk}^2$$

    To perform the test of hypothesis comparing more than two sample means, the following steps and needed information must be noted.

    Ho: All the means of the different groups or treatments are the same.
    Ha: At least one mean is different from the other means.

    The rejection criterion for this test is stated as follows:
    Reject Ho if Fc > Fα(df1, df2). Do not reject Ho otherwise.

    where:
    Fα(df1, df2) is the critical value, α is the level of significance, df1 = n - 1, df2 = p - n, p = mn, m = number of replicates, and n = number of groups or treatments. Fc is the computed test statistic.

    In computing the test statistic, we construct the ANOVA table as follows:

    Analysis of Variance

    Sources of Variation   df            Sum of Squares   Mean Squares   F Ratio
    Between Groups         df1 = n - 1   SSB              MSB            Fc
    Within Groups          df2 = p - n   SSW              MSW
    Total                  p - 1         SST

    where:
    SSB - sum of squares between groups
    SSW - sum of squares within groups
    MSB - mean squares between groups
    MSW - mean squares within groups
    SST - total sum of squares
    Fc - test statistic
    n - number of groups or treatments
    m - number of replicates
    p - mn

    The entries of the table are computed as follows:

    a) the correction factor (CF), given by

    $$CF = \frac{\left(\sum_{k=1}^{n} T_k\right)^2}{p}$$

    b) the total sum of squares (SST)

    $$SST = \sum_{k=1}^{n} SS_k - CF$$

    c) the sum of squares between groups (SSB)

    $$SSB = \sum_{k=1}^{n} \frac{T_k^2}{m} - CF$$

    d) the sum of squares within groups (SSW)

    $$SSW = SST - SSB$$

    e) the mean squares between groups (MSB)

    $$MSB = \frac{SSB}{df_1} = \frac{SSB}{n - 1}$$

    f) the mean squares within groups (MSW)

    $$MSW = \frac{SSW}{df_2} = \frac{SSW}{p - n}$$

    g) the Fc, referred to as the test statistic, defined by

    $$F_c = \frac{MSB}{MSW}$$

    Example:

    Five bus companies were selected in order to determine whether there is a difference in the number of hours spent travelling a 200-kilometer route from place A to place B. Test at the 5% level of significance. The data are given as follows:

    Travel time of buses (hours), by bus company

    Trip             u       v       w       x       y
    1                2.7     3.0     3.4     4.9     4.6
    2                2.8     4.2     3.6     2.6     3.8
    3                4.0     4.3     4.6     5.1     4.6
    4                3.5     4.1     2.8     3.5     3.2
    5                3.8     3.2     5.0     3.1     2.9
    6                4.9     4.0     2.3     3.2     2.5
    Totals           21.70   22.80   21.70   22.40   21.60
    Sum of squares   81.83   88.18   83.81   88.88   81.66
    Means            3.62    3.80    3.62    3.73    3.60

    Solution:

    Statement of Hypothesis:
    Ho: The five bus companies have the same mean traveling time in hours from place A to place B.
    Ha: At least one of the five bus companies has a different mean traveling time in hours.

    Critical Region and Criteria for Rejection
    Level of significance: α = 5%
    Test: F-test
    df: df1 = n - 1 = 5 - 1 = 4
        df2 = p - n = 30 - 5 = 25; p = mn = 6 × 5 = 30

    Criteria for rejection:
    Reject Ho if Fc > F0.05(4, 25). Do not reject Ho otherwise.

    Computation:
    Test statistic: Fc

    $$CF = \frac{\left(\sum T_k\right)^2}{p} = \frac{(21.70 + 22.80 + 21.70 + 22.40 + 21.60)^2}{30} = \frac{(110.2)^2}{30} = 404.80$$

    $$SST = \sum SS_k - CF = (81.83 + 88.18 + 83.81 + 88.88 + 81.66) - 404.80 = 19.56$$

    $$SSB = \sum_{k=1}^{5} \frac{T_k^2}{m} - CF = \frac{21.70^2 + 22.80^2 + 21.70^2 + 22.40^2 + 21.60^2}{6} - 404.80 = 404.99 - 404.80 = 0.19$$

    $$MSB = \frac{0.19}{5 - 1} = 0.0475$$

    $$SSW = SST - SSB = 19.56 - 0.19 = 19.37$$

    $$MSW = \frac{19.37}{30 - 5} = 0.7748$$

    $$\therefore F_c = \frac{0.0475}{0.7748} = 0.0613$$

    Analysis of Variance

    Sources of Variation     df   Sum of Squares   Mean Squares   F Ratio
    Between Bus Companies    4    0.19             0.0475         0.0613
    Within Bus Companies     25   19.37            0.7748
    Total                    29   19.56

    F0.05(4, 25) = 2.76

    Conclusion:

    Since the test statistic does not exceed the critical value, Ho is not rejected. Thus, at the 5% level of significance, there is no significant difference in the traveling time in hours of the five bus companies from place A to place B.
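The correction-factor computation for the bus example can be reproduced in a few lines of Python. This is a sketch under stated assumptions, not the presenter's own procedure: the function name `one_way_anova` and the dictionary keys are illustrative, equal group sizes are assumed, and the critical value 2.76 is simply the table value quoted in the text.

```python
# Travel times in hours; one list per bus company (columns u to y of the table).
groups = {
    "u": [2.7, 2.8, 4.0, 3.5, 3.8, 4.9],
    "v": [3.0, 4.2, 4.3, 4.1, 3.2, 4.0],
    "w": [3.4, 3.6, 4.6, 2.8, 5.0, 2.3],
    "x": [4.9, 2.6, 5.1, 3.5, 3.1, 3.2],
    "y": [4.6, 3.8, 4.6, 3.2, 2.9, 2.5],
}


def one_way_anova(samples):
    """One-way ANOVA for equal-sized groups using the correction-factor shortcut."""
    n = len(samples)        # number of groups or treatments
    m = len(samples[0])     # number of replicates per group
    if any(len(s) != m for s in samples):
        raise ValueError("this sketch assumes equal group sizes")
    p = m * n
    totals = [sum(s) for s in samples]
    cf = sum(totals) ** 2 / p                            # correction factor
    sst = sum(x * x for s in samples for x in s) - cf    # total sum of squares
    ssb = sum(t * t for t in totals) / m - cf            # between groups
    ssw = sst - ssb                                      # within groups
    df1, df2 = n - 1, p - n
    msb, msw = ssb / df1, ssw / df2
    return {"df1": df1, "df2": df2, "SST": sst, "SSB": ssb, "SSW": ssw,
            "MSB": msb, "MSW": msw, "F": msb / msw}


result = one_way_anova(list(groups.values()))
F_CRITICAL = 2.76  # F(0.05; 4, 25), taken from the text rather than computed
print({key: round(value, 4) for key, value in result.items()})
print("reject Ho" if result["F"] > F_CRITICAL else "do not reject Ho")
```

Because the script carries full precision, its F ratio (about 0.061) can differ in the last digit from the hand computation, which rounds intermediate results.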

    Example:

    Three teachers taught statistics to three sections. The final grades of six students per section and some descriptive statistics are shown below. Can we say that there is a difference between the mean grades of the sections? Test at the 5% level of significance.

    Section   Students' final grades     Total   SS       Mean
    A         87  81  86  91  87  90     522     45,476   87.00
    B         75  84  83  86  79  83     490     40,096   81.67
    C         90  82  89  86  80  82     509     43,265   84.83

    Solution:

    Statement of Hypothesis:
    Ho: The three sections have the same mean final grade in Statistics.
    Ha: At least one of the three sections has a different mean final grade in Statistics.

    Critical Region and Criteria for Rejection
    Level of significance: α = 5%
    Test: F-test
    df: df1 = n - 1 = 3 - 1 = 2
        df2 = p - n = 18 - 3 = 15

    Criteria for rejection:
    Reject Ho if Fc > F0.05(2, 15). Do not reject Ho otherwise.

    Computation:
    Test statistic: Fc

    $$CF = \frac{\left(\sum T_k\right)^2}{p} = \frac{(522 + 490 + 509)^2}{18} = \frac{(1521)^2}{18} = 128{,}524.50$$

    $$SST = \sum_{k=1}^{3} SS_k - CF = (45{,}476 + 40{,}096 + 43{,}265) - 128{,}524.50 = 128{,}837 - 128{,}524.50 = 312.5$$

    $$SSB = \sum_{k=1}^{3} \frac{T_k^2}{m} - CF = \frac{522^2}{6} + \frac{490^2}{6} + \frac{509^2}{6} - 128{,}524.5 = (45{,}414 + 40{,}016.67 + 43{,}180.17) - 128{,}524.5 = 128{,}610.84 - 128{,}524.5 = 86.34$$

    $$MSB = \frac{86.34}{3 - 1} = 43.1700$$

    $$SSW = SST - SSB = 312.5 - 86.34 = 226.16$$

    $$MSW = \frac{226.16}{18 - 3} = 15.0773$$

    $$\therefore F_c = \frac{43.1700}{15.0773} = 2.8632$$

    Analysis of Variance

    Sources of Variation   df   Sum of Squares   Mean Squares   F Ratio
    Between Sections       2    86.34            43.1700        2.8632
    Within Sections        15   226.16           15.0773
    Total                  17   312.50

    F0.05(2, 15) = 3.68

    Conclusion:

    Since the test statistic does not exceed the critical value, we do not reject Ho. Thus, at the 5% level of significance, there is no significant difference among the mean final grades of the students in the three sections.
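As a cross-check on the shortcut formulas, the same sums of squares can be obtained directly from deviations about the means. The sketch below is an illustration only: it assumes the grades are grouped by section as A = 87, 81, 86, 91, 87, 90; B = 75, 84, 83, 86, 79, 83; C = 90, 82, 89, 86, 80, 82 (the grouping that reproduces the stated totals 522, 490 and 509), and the variable names are not from the slides.

```python
from statistics import mean

# Final grades by section (assumed grouping; see the note above).
sections = {
    "A": [87, 81, 86, 91, 87, 90],
    "B": [75, 84, 83, 86, 79, 83],
    "C": [90, 82, 89, 86, 80, 82],
}

all_grades = [x for grades in sections.values() for x in grades]
grand_mean = mean(all_grades)

# Between groups: group size times the squared distance of each section mean
# from the grand mean.
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in sections.values())
# Within groups: squared distance of each grade from its own section mean.
ssw = sum((x - mean(g)) ** 2 for g in sections.values() for x in g)

df1 = len(sections) - 1
df2 = len(all_grades) - len(sections)
f_ratio = (ssb / df1) / (ssw / df2)
print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, F = {f_ratio:.4f}")
```

The deviation form and the correction-factor form are algebraically equivalent, so any difference from the hand-computed table is due to rounding of intermediate values.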

    Testing Two Means, Independent Case: The Difference of the Means

    The independent t-test is used when the two sample means are taken from separate groups of respondents or populations. For example, a researcher is interested in comparing the research skills of assistant professors and associate professors in a certain university. The assistant professors comprise one group and the associate professors another group.

    $$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

    where:
    X̄1 = mean of the first group
    X̄2 = mean of the second group
    S1² = variance of the first group
    S2² = variance of the second group
    n1 = number of cases in the first group
    n2 = number of cases in the second group

    http://www.ehow.com/how_7213406_calculate-t_statistic.html

    Example:

    In a study conducted to determine the research skills of assistant and associate professors in state universities and colleges, the following data represent the mean scores of the professors:

    Assistant Prof. (X1)   3.00   4.20   2.75   3.50   4.50   2.50   2.60
    Associate Prof. (X2)   3.75   4.50   3.50   4.00   4.00   3.50   3.00

    Solution:

    Step 1: State the hypotheses.
    Ho: There is no significant difference between the research skills of assistant and associate professors.
    H1: There is a significant difference between the research skills of assistant and associate professors.

    Step 2: Set the level of significance.
    α = 0.05; two-tailed test

    Step 3: Reject Ho if the absolute value of the computed t-value is greater than the critical value of 2.179 (α = 0.05, two-tailed, 12 degrees of freedom).

    Step 4: Compute the value of the test statistic from the given data.

    Asst. Professor (X1)   Assoc. Professor (X2)
    3.00                   3.75
    4.20                   4.50
    2.75                   3.50
    3.50                   4.00
    4.50                   4.00
    2.50                   3.50
    2.60                   3.00
    __________             ___________
    ΣX1 = 23.05            ΣX2 = 26.25
    X̄1 = 3.29              X̄2 = 3.75
    S1² = 0.64             S2² = 0.23

    $$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

    $$t = \frac{3.29 - 3.75}{\sqrt{\dfrac{0.64}{7} + \dfrac{0.23}{7}}} = \frac{-0.46}{\sqrt{0.12}} = \frac{-0.46}{0.35} = -1.3143$$

    The degrees of freedom:
    df = n1 + n2 - 2
    df = 7 + 7 - 2
    df = 14 - 2
    df = 12

    Step 5: Inasmuch as the absolute value of the computed t-value, 1.3143, is lower than the critical value, there is not sufficient evidence to reject the null hypothesis.

    Step 6: State the conclusion.
    Since the computed t-value is lower than the critical t-value in absolute value, there is no significant difference between the research skills of assistant and associate professors. Thus, they manifest comparable research skills.
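The test statistic for this example can also be computed from the raw scores. The following is a minimal sketch using only the Python standard library; the function name `independent_t` is illustrative, the degrees of freedom follow the n1 + n2 - 2 rule used in the text, and the critical value 2.179 is the standard two-tailed table value for α = 0.05 with 12 degrees of freedom.

```python
import math
from statistics import mean, variance

assistant = [3.0, 4.2, 2.75, 3.50, 4.50, 2.50, 2.60]
associate = [3.75, 4.50, 3.50, 4.00, 4.00, 3.50, 3.00]


def independent_t(x1, x2):
    """Return (t, df) with t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)."""
    n1, n2 = len(x1), len(x2)
    # statistics.variance is the sample variance (divides by n - 1).
    se = math.sqrt(variance(x1) / n1 + variance(x2) / n2)
    return (mean(x1) - mean(x2)) / se, n1 + n2 - 2


t_value, df = independent_t(assistant, associate)
T_CRITICAL = 2.179  # two-tailed, alpha = 0.05, df = 12 (table value)
print(f"t = {t_value:.4f}, df = {df}")
print("reject Ho" if abs(t_value) > T_CRITICAL else "do not reject Ho")
```

Since no intermediate values are rounded, the script gives a t-value of about -1.30 rather than the hand-rounded -1.3143; the decision not to reject Ho is the same.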

    Example:

    A researcher wishes to compare the post-test performance of two groups of students classified according to their learning styles: assimilators and convergers. Ten students were assigned per group. After their exposure to simplified instructional materials on Basic Statistics, the assimilators' mean post-test performance was 20, with a variance of 8.92, while the convergers' mean post-test performance was 26.26, with a variance of 8.67. Test the significance of the difference between the two group means.

    http://www.ehow.com/how_5145417_calculate-ttest.html

    Solution:

    Step 1:
    Ho: There is no significant difference in the mean post-test performance of the two groups of students classified according to their learning styles.
    H1: There exists a significant difference in the mean post-test performance of the two groups of students classified according to their learning styles.

    Step 2: α = 0.05 (two-tailed test)

    Step 3: Reject the null hypothesis (Ho) if the absolute value of the computed t-value is greater than the critical t-value of 2.101 at α = 0.05 (two-tailed) with 18 degrees of freedom.

    Step 4: Compute the t-test for independent samples.

    $$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

    Convergers     Assimilators
    X̄1 = 26.26     X̄2 = 20
    S1² = 8.67     S2² = 8.92
    n1 = 10        n2 = 10

    $$t = \frac{26.26 - 20}{\sqrt{\dfrac{8.67}{10} + \dfrac{8.92}{10}}} = \frac{6.26}{\sqrt{1.759}} = \frac{6.26}{1.33} = 4.71$$

    The degrees of freedom:
    df = n1 + n2 - 2
    df = 10 + 10 - 2
    df = 20 - 2
    df = 18

    Step 5: Reject the null hypothesis, since the computed t-value of 4.71 is greater than 2.101.

    Conclusion:

    There exists a significant difference between the mean post-test performance of the two groups of students classified according to their learning styles after their exposure to simplified instructional materials in Basic Statistics.
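When only summary statistics are available, the same formula can be applied directly. This is a sketch under an explicit assumption: the spread figures 8.67 and 8.92 are passed as variances, which is how the worked computation uses them. If a study reports standard deviations instead, they would have to be squared before being passed in. The function name `t_from_summary` is illustrative.

```python
import math


def t_from_summary(mean1, var1, n1, mean2, var2, n2):
    """Return (t, df) from group means, variances and sizes.

    t = (mean1 - mean2) / sqrt(var1/n1 + var2/n2), df = n1 + n2 - 2.
    """
    se = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / se, n1 + n2 - 2


# Convergers first, assimilators second. Assumption: 8.67 and 8.92 are variances.
t_value, df = t_from_summary(26.26, 8.67, 10, 20.0, 8.92, 10)
print(f"t = {t_value:.2f}, df = {df}")
```

Keeping the inputs explicit in the function signature makes it harder to confuse a variance with a standard deviation, which is the most common slip with this formula.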

    Testing Two Proportions

    Remember that the normal distribution can be used to approximate the binomial distribution in certain cases. Specifically, the approximation was considered good when np and nq were both at least 5. Now we are talking about two proportions, so np and nq must be at least 5 for both samples.

    We don't have a way to test two proportions for specific values; what we have is the ability to test the difference between the proportions. So, much like the test for two means from independent populations, we will be looking at the difference of the proportions.

    We will also be computing an average proportion and calling it p-bar. It is the total number of successes divided by the total number of trials. The necessary definitions, with x1 and x2 the numbers of successes in the two samples, are:

    $$\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}, \qquad \bar{p} = \frac{x_1 + x_2}{n_1 + n_2}, \qquad \bar{q} = 1 - \bar{p}$$

    The test statistic used here is similar to that for a single population proportion, except that the difference of proportions is used instead of a single proportion, and the value of p-bar is used instead of p in the standard error portion. Since we're using the normal approximation to the binomial, the difference of proportions has a normal distribution. The test statistic is:

    $$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\bar{p}\,\bar{q}}{n_1} + \dfrac{\bar{p}\,\bar{q}}{n_2}}}$$

    Some people will be tempted to simplify the denominator of this test statistic incorrectly. It can be simplified, but the correct simplification is not to place the product of p-bar and q-bar over the sum of the n's. Remember that to add fractions you must have a common denominator; that is why this simplification is incorrect.

    The correct simplification is to factor p-bar and q-bar out of the two expressions. This is usually the formula given, because it is easier to calculate:

    $$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
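A quick numeric check makes the difference between the correct and the incorrect simplification concrete. The values below are hypothetical and chosen only for illustration; they do not come from the slides.

```python
import math

# Hypothetical pooled proportion and sample sizes (illustration only).
p_bar, n1, n2 = 0.3, 100, 50
q_bar = 1 - p_bar

unsimplified = p_bar * q_bar / n1 + p_bar * q_bar / n2   # quantity under the root
factored = p_bar * q_bar * (1 / n1 + 1 / n2)             # correct simplification
wrong = p_bar * q_bar / (n1 + n2)                        # tempting but incorrect

print(f"unsimplified = {unsimplified:.4f}")
print(f"factored     = {factored:.4f}")
print(f"wrong        = {wrong:.4f}")
print("factored matches:", math.isclose(unsimplified, factored))
```

With these numbers the incorrect form is several times smaller than the true variance term, which would inflate the z-score and overstate significance.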

    Comparing Two Proportions

    A commonly posed question is "Are two proportions different?" For example, is the end-of-course passing rate this year significantly different from the end-of-course passing rate last year? This type of question involves comparing two proportions, both of which are estimates, and thus the necessary formula will be different.

    In this case we will develop the theory that allows us to apply the hypothesis testing procedure to this problem. The procedure uses a test statistic that involves two proportions. This statistic is still a z-statistic, and we will compare it to the normal scores, i.e. z-scores.

    Theoretical Background

    Given two independent variables that are both normal, their sum or difference will be normally distributed. This idea can readily be proven. The important point is that the difference of two sample proportions is an example of the difference of two normally distributed variables. Thus we can standardize it and obtain a z-score.

    For example, if you have two estimated proportions, p̂1 and p̂2, and you happen to know that one has a mean of 0.6 and the other a mean of 0.4, then you know their difference will have a mean of μ1 - μ2 = 0.6 - 0.4 = 0.2. What we now need is the standard deviation of this difference.

    Assuming that each estimate is based on 300 observations, you can also determine the variance of the difference. The variance of each estimated proportion is 0.6 × 0.4 / 300 = 0.0008, so the variance of the difference is the sum of the two variances, or 0.0016. This means the standard deviation of the difference is √0.0016 = 0.04.

    All this may seem complicated, but the bottom line is that we know the distribution of (p̂1 - p̂2). Thus we can calculate confidence intervals and do hypothesis tests on this variable.
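The arithmetic in this example is short enough to verify directly. The sketch below uses the figures from the text (proportions 0.6 and 0.4, 300 observations each); the variable names are illustrative.

```python
import math

p1, p2 = 0.6, 0.4   # means of the two estimated proportions
n = 300             # observations behind each estimate

var1 = p1 * (1 - p1) / n      # variance of the first estimate
var2 = p2 * (1 - p2) / n      # variance of the second estimate
var_diff = var1 + var2        # variances add for independent estimates
sd_diff = math.sqrt(var_diff)

print(f"variance of each estimate: {var1:.4f}, {var2:.4f}")
print(f"variance of the difference: {var_diff:.4f}")
print(f"standard deviation of the difference: {sd_diff:.2f}")
```

The two variances happen to be equal here only because 0.6 × 0.4 and 0.4 × 0.6 are the same product; in general each estimate has its own variance.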

    Distribution of p̂1 - p̂2

    Let p̂1 be a proportion based on a sample of size n1 and p̂2 be a proportion based on a sample of size n2. If both sample sizes are large enough (>100), we can assume that the estimates of p are normally distributed.

    If we assume that the underlying proportions of the two samples are the same, i.e. p1 = p2, then (p̂1 - p̂2) will have:

    a normal distribution (i.e. the variable can be standardized to a z-score)

    a mean of 0

    an approximate variance of p̂(1 - p̂)(1/n1 + 1/n2), where p̂ is the pooled estimate of the common proportion.

    This fact is useful for deriving a test statistic for comparing proportions. In this case, 95% of the time the corresponding z-score should not deviate more than 1.96 units from 0. This z-score, also called the test statistic for comparing two proportions, is:

    $$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

    If we do not assume that the two proportions are the same, then (p̂1 - p̂2) can estimate their difference. In this case this expression will have:

    a normal distribution (i.e. we can use z-scores to find the error margin)

    a mean of p1 - p2

    an approximate variance of p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2.

    This means that if we want to estimate the difference (or gap) between two proportions, p1 and p2, then the 95% confidence interval will be:

    $$(\hat{p}_1 - \hat{p}_2) - 1.96\,SE \;\le\; p_1 - p_2 \;\le\; (\hat{p}_1 - \hat{p}_2) + 1.96\,SE, \qquad SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
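The 95% confidence interval for a difference of proportions can be sketched as a small function. This is an illustration, not a prescribed implementation: the name `diff_ci` is hypothetical, the unpooled standard error is used because the proportions are not assumed equal, and the inputs (0.6 and 0.4 with 300 observations each) are borrowed from the earlier numerical illustration.

```python
import math


def diff_ci(p1_hat, n1, p2_hat, n2, z=1.96):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


lower, upper = diff_ci(0.6, 300, 0.4, 300)
print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
```

With a standard error of 0.04, the interval is 0.2 ± 1.96 × 0.04, roughly 0.12 to 0.28. Because it excludes zero, these two hypothetical proportions would be judged different at the 5% level.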

    Example: The five-step hypothesis test for two proportions.

    Test the assertion that the passing rate on the EOG Mathematics test of female fifth grade students is no different than that of the male students. (Data for Asheville schools, 2008. NC DPI)

    STEP I: Set up two opposing hypotheses.

    Let p1 be the proportion of female students passing and p2 be the proportion of male students passing.

    H0: p1 = p2 vs. Ha: p1 ≠ p2

    STEP II: Get data.

    Even though all students in each grade are tested, you still only have a sample, since your population is all students that could potentially be taught in the Asheville school system. According to the NC Department of Public Instruction, 89 of 126 (or 70.6%) female students passed the test and 119 of 142 (or 83.8%) male students passed the test.

                      # Passed   Sample Size   Estimated Proportion
    Female Students   89         n1 = 126      p̂1 = 0.706
    Male Students     119        n2 = 142      p̂2 = 0.838
    All Students      208        n = 268       p̂ = 0.776

    STEP III: Decide on a statistical test.

    We will use the two-proportion z-test given in this lesson.

    STEP IV: Calculate your test statistic.

    Every test has a formula that standardizes the estimator; in this case the estimator is p̂1 - p̂2. The statistic is:

    $$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.706 - 0.838}{\sqrt{0.776 \times 0.224\left(\dfrac{1}{126} + \dfrac{1}{142}\right)}} \approx \frac{-0.132}{\sqrt{0.0026}} \approx -2.587$$

    STEP V: Arrive at a conclusion and state it in clear English.

    Remember that the two critical values for the different levels of certainty are z0.025 = 1.96 and z0.005 = 2.576. Since |z-stat| > z* (the critical value) at both levels, we reject the null hypothesis.

    Conclusion:

    With 99% certainty we can state that a statistically significantly larger proportion of boys than girls passed the fifth grade EOG in mathematics in the Asheville school district.
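The five steps can be condensed into a short script that starts from the raw counts. This is a minimal sketch using only the standard library; the function name `two_proportion_z` is illustrative, and the two-sided p-value is obtained from the normal distribution via `math.erfc`.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Female students: 89 of 126 passed; male students: 119 of 142 passed.
z, p_value = two_proportion_z(89, 126, 119, 142)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
print("reject H0 at the 1% level" if abs(z) > 2.576 else "do not reject H0 at the 1% level")
```

Working from the counts instead of the rounded proportions gives z of about -2.58 rather than -2.587. The statistic still exceeds 2.576 in absolute value, though only narrowly, so the conclusion at the 1% level is sensitive to rounding.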

    Thank you