Probability and Statistical Inference 6

  • 8/12/2019 Probability and Statistical Inference 6

    PROBABILITY & STATISTICAL INFERENCE LECTURE 6

    MSc in Computing (Data Analytics)

    General Steps in Hypothesis Testing

    1. From the problem context, identify the parameter of interest.

    2. State the null hypothesis, H0.

    3. Specify an appropriate alternative hypothesis, H1.

    4. Choose a significance level, α.

    5. Determine an appropriate test statistic.

    6. State the rejection region for the statistic.

    7. Compute any necessary sample quantities, substitute these into the equation for the test statistic, and compute that value.

    8. Decide whether or not H0 should be rejected and report that in the problem context.
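    The eight steps above can be sketched in code. The example below is a minimal illustration, assuming a two-sided test on the difference of two means with known standard deviations; the function name and the summary numbers are made up for the illustration and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean1, mean2, sigma1, sigma2, n1, n2,
                      alpha=0.05, delta0=0.0):
    """Two-sided z-test of H0: mu1 - mu2 = delta0 (known sigmas).

    Hypothetical helper written for this illustration only.
    """
    # Steps 5 and 7: test statistic computed from the sample quantities.
    z0 = (mean1 - mean2 - delta0) / sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    # Step 6: rejection region |z0| > z_{alpha/2}.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 8: decision.
    return z0, z_crit, abs(z0) > z_crit


# Illustrative summary statistics (assumed, not taken from the lecture).
z0, z_crit, reject = two_sample_z_test(121.0, 112.0, 8.0, 8.0, 10, 10)
print(f"z0 = {z0:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

    Steps 1 to 4 (parameter, H0, H1, α) are choices made before any computation, which is why they appear only as function arguments here.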

    Types of Questions That Can Be Answered with Two-Sample Hypothesis Tests

    A manufacturing plant wants to compare the defective rate of items coming off two different process lines.

    Are the test results of patients who received a drug better than the test results of those who received a placebo?

    Is there a significant (or only random) difference in the average cycle time to deliver a pizza from Pizza Company A vs. Pizza Company B?

    Difference in Means of Two Normal Distributions, Variances Known

    Test Assumptions

    Example

    The P-value is the observed significance level of a statistical test; that is, the probability, when the null hypothesis is true, of obtaining a value of the test statistic that is at least as extreme as the one observed.
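    As a small illustration of this definition, the sketch below converts an observed z statistic into a P-value. It assumes the test statistic is standard normal under H0; the function name and the `alternative` labels are hypothetical choices for this example.

```python
from statistics import NormalDist


def z_p_value(z0, alternative="two-sided"):
    """P-value of an observed z statistic under a standard normal null."""
    phi = NormalDist().cdf
    if alternative == "two-sided":
        # Both tails count as "at least as extreme".
        return 2 * (1 - phi(abs(z0)))
    if alternative == "greater":
        return 1 - phi(z0)
    if alternative == "less":
        return phi(z0)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


print(round(z_p_value(1.96), 4))
```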

    Confidence Interval on a Difference in Means, Variances Known
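    A minimal sketch of such an interval, assuming the usual form (x̄1 − x̄2) ± z_{α/2} · sqrt(σ1²/n1 + σ2²/n2). The function name and the numbers are illustrative assumptions, not values from the slides.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci(mean1, mean2, sigma1, sigma2, n1, n2, confidence=0.95):
    """Confidence interval on mu1 - mu2 when both sigmas are known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = mean1 - mean2
    return diff - half_width, diff + half_width


# Illustrative summary statistics (assumed).
lower, upper = diff_means_ci(121.0, 112.0, 8.0, 8.0, 10, 10)
print(f"95% CI on mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```

    If the interval excludes zero, the corresponding two-sided test at the same level rejects H0: μ1 = μ2.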

    Example

    Difference in Means of Two Normal Distributions, Variances Unknown

    We wish to test:

    The pooled estimator of σ²:
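    The sketch below assumes the standard pooled estimator Sp² = ((n1 − 1)S1² + (n2 − 1)S2²) / (n1 + n2 − 2) and the statistic T0 = (x̄1 − x̄2 − Δ0) / (Sp · sqrt(1/n1 + 1/n2)), which has n1 + n2 − 2 degrees of freedom under H0. The function name and the two small samples are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample1, sample2, delta0=0.0):
    """Pooled variance, t statistic and degrees of freedom (equal variances)."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)  # sample variances
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    t0 = (mean(sample1) - mean(sample2) - delta0) / sqrt(sp_sq * (1 / n1 + 1 / n2))
    return sp_sq, t0, n1 + n2 - 2


sp_sq, t0, df = pooled_t_statistic([6, 4, 2], [14, 12, 10])
print(f"pooled variance = {sp_sq}, t0 = {t0:.3f}, df = {df}")
```

    The P-value would come from the t distribution with `df` degrees of freedom, which the Python standard library does not provide; a statistics package is needed for that step.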

    Example

    Confidence Interval on the Difference in Means, Variances Unknown

    Example

    Practical Hypothesis Testing

    1. From the problem context, identify the parameter of interest.

    2. State the null hypothesis, H0.

    3. Specify an appropriate alternative hypothesis, H1.

    4. Choose a significance level, α.

    5. Calculate the P-value using a software package of choice.

    6. Decide whether or not H0 should be rejected and report that in the problem context. Reject H0 when the P-value is less than α.

    (Golden rule: Reject H0 for small P-values.)

    Some Research

    Look up the correct formula for the hypothesis test on the difference between two proportions.

    What are the assumptions for the test?

    Find an example of the research.

    Answer to Research

    Large-sample test on the difference in population proportions
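    A minimal sketch of this test, assuming H0: p1 = p2, a pooled proportion estimate under H0, and samples large enough for the normal approximation. The function name and the counts are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided large-sample z-test of H0: p1 = p2.

    x1, x2 are success counts; n1, n2 are sample sizes.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z0 = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z0)))
    return z0, p_value


# Illustrative counts (assumed): 27 of 100 versus 19 of 100.
z0, p_value = two_proportion_z_test(27, 100, 19, 100)
print(f"z0 = {z0:.3f}, P-value = {p_value:.3f}")
```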

    Example

    Example of a large-sample test on the difference in population proportions

    Analysis of Variance

    Introduction

    In the previous section we were concerned with the analysis of data where we compared the means of two samples.

    Frequently data contain more than two samples; for example, they may compare several treatments.

    In this lecture we introduce a statistical analysis that allows us to compare the means of more than two samples. The method is called Analysis of Variance, or ANOVA for short.

    Total Sum of Squares

    Data set:

    14, 12, 10, 6, 4, 2

    Group A:

    6, 4, 2

    Group B:

    14, 12, 10

    Overall mean: 8

    Total Sum of Squares:

    $SST = (14-8)^2 + (12-8)^2 + (10-8)^2 + (6-8)^2 + (4-8)^2 + (2-8)^2 = 112$

    Between Group Variation

    Sum of Squares of the Model:

    $SSM = n_a(\bar{x} - \bar{x}_a)^2 + n_b(\bar{x} - \bar{x}_b)^2$

    $= 3(8-4)^2 + 3(8-12)^2$

    $= 96$

    Within Group Variation

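    Sum of Squares of the Error:

    $SSE = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$

    $= (14-12)^2 + (12-12)^2 + (10-12)^2 + (6-4)^2 + (4-4)^2 + (2-4)^2$

    $= 16$

    Where: a is the number of groups, $n_i$ is the number of observations in group i, and $\bar{x}_i$ is the mean of group i.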
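    The three sums of squares for the small data set used above (Group A: 6, 4, 2; Group B: 14, 12, 10) can be checked with a short script. This is a sketch only; the function name is a hypothetical choice.

```python
from statistics import mean

groups = {"A": [6, 4, 2], "B": [14, 12, 10]}


def sums_of_squares(groups):
    """Return (SST, SSM, SSE) for a dict mapping group name to observations."""
    all_obs = [x for obs in groups.values() for x in obs]
    grand_mean = mean(all_obs)
    # Total variation: every observation against the overall mean.
    sst = sum((x - grand_mean) ** 2 for x in all_obs)
    # Between-group variation: group means against the overall mean.
    ssm = sum(len(obs) * (mean(obs) - grand_mean) ** 2 for obs in groups.values())
    # Within-group variation: observations against their own group mean.
    sse = sum((x - mean(obs)) ** 2 for obs in groups.values() for x in obs)
    return sst, ssm, sse


sst, ssm, sse = sums_of_squares(groups)
print(sst, ssm, sse)
```

    The total variation splits exactly into the between-group and within-group parts: 112 = 96 + 16.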

    Structure of the Data

    Group   Observations             Total   Mean

    1       x11   x12   ...   x1n    x1.     x̄1.

    2       x21   x22   ...   x2n    x2.     x̄2.

    .       .     .           .      .       .

    a       xa1   xa2   ...   xan    xa.     x̄a.

    Total                            x..

    (A dot in a subscript denotes summation over that index.)

    ANOVA Table

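    Source   Degrees of Freedom   Sum of Squares   Mean Square     F-Stat

    Model    a - 1                SSM              SSM / (a - 1)   MSM / MSE

    Error    n - a                SSE              SSE / (n - a)

    Total    n - 1                SST              SST / (n - 1)

    $SSM = \sum_{i=1}^{a} n_i (\bar{x}_i - \bar{x})^2$

    $SSE = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$

    $SST = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$

    Where: n is the total sample size, a is the number of groups, $n_i$ is the number of observations in group i, $\bar{x}_i$ is the mean of group i, and $\bar{x}$ is the overall mean.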

    ANOVA Table: Original Example

    Source   Degrees of Freedom   Sum of Squares   Mean Square   F-Stat

    Model    2 - 1 = 1            96               96            24

    Error    6 - 2 = 4            16               4

    Total    6 - 1 = 5            112

    Where: n is the sample size and a is the number of groups
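    The table entries for the original example can be reproduced with the sketch below, which assumes the usual one-way ANOVA definitions (mean square = sum of squares / degrees of freedom, F = MSM / MSE). The function and key names are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the one-way ANOVA table quantities for a list of samples."""
    a = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total sample size
    grand_mean = mean(x for g in groups for x in g)
    ss_model = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_model = ss_model / (a - 1)
    ms_error = ss_error / (n - a)
    return {
        "df_model": a - 1,
        "df_error": n - a,
        "df_total": n - 1,
        "ss_model": ss_model,
        "ss_error": ss_error,
        "ss_total": ss_model + ss_error,
        "ms_model": ms_model,
        "ms_error": ms_error,
        "f_stat": ms_model / ms_error,
    }


table = one_way_anova([[6, 4, 2], [14, 12, 10]])
for name, value in table.items():
    print(f"{name}: {value}")
```

    The P-value for the F statistic requires the F distribution with (a − 1, n − a) degrees of freedom, which is not in the Python standard library, so it is left to a statistics package.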

    Model Assumptions

    Independence of observations within and between samples

    Normality of the sampling distribution

    Equal variance (also called the homoscedasticity assumption)

    The ANOVA Equation

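    We can describe the observations in the above table using the following equation:

    $Y_{ij} = \mu + \tau_i + \epsilon_{ij}, \quad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n$

    Where: $\mu$ is the overall mean, $\tau_i$ is the effect of group i, $\epsilon_{ij}$ is a random error term, a is the number of groups, and n is the number of observations in each group.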

    ANOVA Hypotheses

    We wish to test the hypotheses:

    The analysis of variance partitions the total variability

    into two parts.
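    The formulas for this slide did not survive extraction. As a hedged reconstruction, assuming the standard one-way ANOVA setup with a groups whose means are $\mu_1, \ldots, \mu_a$, the hypotheses and the partition would read:

```latex
H_0:\ \mu_1 = \mu_2 = \cdots = \mu_a
\qquad
H_1:\ \mu_i \neq \mu_j \ \text{for at least one pair } (i, j)

SST = SSM + SSE
```

    Here SSM is the between-group (model) part and SSE is the within-group (error) part. For the small two-group data set used earlier, the partition is 112 = 96 + 16.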

    Example

    Graphical Display of Data

    Figure 13-1 (a) Box plots of hardwood concentration data. (b) Display of the model in Equation 13-1 for the completely randomized single-factor experiment.

    Example

    We can use ANOVA to test the hypothesis that different hardwood concentrations do not affect the mean tensile strength of the paper. The hypotheses are:

    The ANOVA table is below:

    The p-value is less than 0.05; therefore H0 can be rejected, and we can conclude that at least one of the hardwood concentrations affects the mean tensile strength of the paper.

    Test Model Assumptions

    Use Bartlett's test to check the homoscedasticity assumption.

    Bartlett's test (Snedecor and Cochran, 1983) is used to test whether k samples have equal variances.

    Bartlett's test is sensitive to departures from normality. That is, if your samples come from non-normal distributions, then Bartlett's test may simply be testing for non-normality. The Levene test is an alternative to the Bartlett test that is less sensitive to departures from normality.

    Bartlett Test for Equal Variance

    The hypotheses for the Bartlett test are as follows:

    $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$

    $H_1: \sigma_i^2 \neq \sigma_j^2$ for at least one pair (i, j)

    The Bartlett test statistic approximately follows a chi-squared distribution with k - 1 degrees of freedom.

    Interpret the p-value as in any other hypothesis test.
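    For reference, the Bartlett statistic itself can be computed by hand. The sketch below assumes the standard textbook form of the statistic (pooled log-variance minus weighted individual log-variances, divided by a correction factor); the function name is hypothetical. The chi-squared P-value is not computed because the Python standard library has no chi-squared distribution; in practice a package routine such as `scipy.stats.bartlett` would be used.

```python
from math import log
from statistics import variance


def bartlett_statistic(samples):
    """Return (Bartlett test statistic, degrees of freedom) for k samples."""
    k = len(samples)
    sizes = [len(s) for s in samples]
    variances = [variance(s) for s in samples]  # sample variances
    n_total = sum(sizes)
    # Pooled variance, weighting each sample by its degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * log(pooled) - sum(
        (n - 1) * log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction, k - 1


# Equal sample variances give a statistic of zero.
stat_equal, df = bartlett_statistic([[6, 4, 2], [14, 12, 10]])
# Very different spreads give a large statistic.
stat_unequal, _ = bartlett_statistic([[1, 2, 3], [10, 20, 30]])
print(stat_equal, stat_unequal, df)
```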

    If the Assumption of Equal Variance Is Not Met

    If the assumption of equal variance is not met, use Welch's ANOVA.

    Assignment for next week:

    Investigate the difference between the standard ANOVA and Welch's ANOVA.

    Demo

    Confidence Interval about the mean

    For 20% hardwood, the resulting confidence interval on the mean is

    Confidence Interval on the Difference of Two Treatments

    For the hardwood concentration example,

    Multiple Comparisons Following the ANOVA

    The least significant difference (LSD) is

    If the sample sizes are different in each treatment:
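    The formulas on this slide were lost in extraction. Assuming the slide used the standard form of Fisher's LSD method, with a treatments, n observations per treatment, N observations in total and error mean square MSE, they would be:

```latex
% Equal sample sizes (n observations per treatment)
\mathrm{LSD} = t_{\alpha/2,\; a(n-1)} \sqrt{\frac{2\, MSE}{n}}

% Unequal sample sizes (n_i and n_j observations in treatments i and j)
\mathrm{LSD} = t_{\alpha/2,\; N-a} \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}
```

    Under this method, two treatment means are declared different when the absolute difference of their sample means exceeds the LSD.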

    Example: Multi-comparison Test

    Demo

    Exercises