
Transcript of: Applied Statistics and Experimental Design, Fritz Scholz — Fall 2006 (faculty.washington.edu/fscholz/DATAFILES498B2008/Review.pdf)

  • Experiments and Observational Studies


    In observational studies we obtain measurements on several variables.

    Sampling could be random or not. We observe what is in the sample,

    no manipulation of factors by any experimenter.

    Factor levels may be chosen by hidden agendas.

    It is not clear which variables have an effect on which other variables

    if we observe any correlations. Cause and Effect unclear.

    There may be unmeasured factors that affect seemingly correlated variables.

    In a “controlled” experiment we control certain input variables and

    determine their effect on response variables.

    We have to guard against subconscious effects when “controlling” inputs.

⇒ randomization!

  • Steps in Designing of Experiments (DOE)


1. Be clear on the goal of the experiment. Which questions to address? Set up hypotheses about treatment/factor effects, a priori. Don't go fishing afterwards! It can only point to future experiments.

If you torture the data long enough, they will confess to anything.

2. Understand the experimental units over which treatments will be randomized. Where do they come from? How do they vary? Are they well defined?

3. Define the appropriate response variable to be measured.

4. Define potential sources of response variation: a) factors of interest (to be manipulated); b) nuisance factors (to be randomized).

5. Decide on treatment and blocking variables.

6. Define clearly the experimental process and what is randomized.

  • Three Basic Principles in Experimental Design


Replication: repeat experimental runs under the same values for the control variables. ⇒ understanding of the inherent variability. ⇒ better response estimate via averaging.

Repeat all variation aspects of an experimental run, not just a repeat measure of the response after all aspects of an experimental run are done.

Randomization: makes systematic confounding between treatment and other factors (hidden or not) unlikely. Removes sources of bias arising from factor/unit interaction. Disperses biases randomly among all units ⇒ error or background noise. Provides a logical/probability basis for inference about treatment effects.

Blocking: effective when (natural within-block variation)/(between-block variation) is small. Randomized treatment assignment within ≈ homogeneous blocks. The treatment effect is more clearly visible against the lower within-block variation. Separates variation between blocks from the treatment effect.

• Flux Experiment

18 boards are available for the experiment, not necessarily a random sample from all boards (present, past and future).

Test flux brands X and Y: randomly assign 9 boards each to X & Y. (FLUX)

The boards are soldered and cleaned. Order randomized. (SC.ORDER)

Then the boards are coated and cured to avoid handling contamination. Order randomized. (CT.ORDER)

Then the boards are placed in a humidity chamber and measured for SIR. Position in chamber randomized. (SLOT)

The randomization at the various process steps avoids unknown biases. When in doubt, randomize!

Randomization of flux assignment gives us a mathematical basis for judging flux differences with respect to the response SIR.

  • DOE Steps Recapitulated


1. Goal of the experiment. Answer the question: Is flux X different from flux Y? If not, we can use them interchangeably. One may be cheaper than the other. Test the null hypothesis H0: no difference in fluxes.

2. Understand the experimental units: boards with all processing steps up to measuring the response.

3. Define the appropriate response variable to be measured: SIR.

4. Define potential sources of response variation: a) factors of interest: flux type; b) nuisance factors: boards, processing steps, testing.

5. Decide on treatment and blocking variables. Treatment = flux type, no blocking. With 2 humidity chambers we might have wanted to block on those.

6. Define clearly the experimental process and what is randomized. Treatments and all nuisance factors are randomized.

• Flux Data

BOARD  FLUX  SC.ORDER  CT.ORDER  SLOT   SIR
    1     Y        13        14     5   8.6
    2     Y        16         8     6   7.5
    3     X        18         9    15  11.5
    4     Y        11        11    11  10.6
    5     X        15        18     9  11.6
    6     X         9        15    18  10.3
    7     X         6         1    16  10.1
    8     Y        17        12    17   8.2
    9     Y         5        10    13  10.0
   10     Y        10        13    14   9.3
   11     Y        14         5    10  11.5
   12     X        12        17    12   9.0
   13     X         4         7     3  10.7
   14     X         8         6     1   9.9
   15     Y         3         2     4   7.7
   16     X         7         3     2   9.7
   17     Y         1        16     8   8.8
   18     X         2         4     7  12.6

see Flux.csv or flux

• Flux Experiment: First Boxplot Look at SIR Data

[Side-by-side boxplots of SIR (log10(Ohm)) for Flux X and Flux Y, annotated with FLUXY − FLUXX = −1.467.]

• Flux Experiment: QQ-Plot of SIR Data

[QQ-plot of SIR with Flux Y (log10(Ohm)) against SIR with Flux X (log10(Ohm)), both axes running from 8 to 12.]

• QQ-Plot of SIR Data (Higher Perspective?)

[The same QQ-plot of SIR with Flux Y against SIR with Flux X, now on axes running from 0 to 20 log10(Ohm).]

• Some QQ-Plots from N(0,1) Samples (m=9, n=9)

[Twenty QQ-plots of pairs of simulated N(0,1) samples of sizes m = n = 9, each annotated with its difference of means: y − x = −0.792, 0.926, −0.57, −0.394, −0.62, 0.115, −0.625, −1.12, 0.584, −0.647, 0.41, −1.33, −0.667, −0.31, −0.757, 0.845, 0.165, 1.07, −0.194, −0.253.]

• Is the Difference Ȳ − X̄ = −1.467 Significant?

In comparing SIR for the two fluxes let us focus on the difference of means FLUXY − FLUXX = Ȳ − X̄.

If the use of flux X or flux Y made no difference, then we should have seen the same results for these 18 boards, no matter which got flux X or Y. X or Y is just an artificial "distinguishing" label with no consequence.

For other random assignments of fluxes, or random splittings of the 18 boards into two groups of 9 & 9, we would have seen other differences of means.

There are (18 choose 9) = 48620 such possible splits. For each split we could obtain Ȳ − X̄.

We need the reference distribution of Ȳ − X̄ for all 48620 splits to judge how unusual a random split we had when we got Ȳ − X̄ = −1.467. It was based on a random split by our randomization, i.e., it is one of the 48620 equally likely ones.

• Some Randomization Examples of Ȳ − X̄

[Table: the 18 SIR values 8.6, 7.5, 11.5, 10.6, 11.6, 10.3, 10.1, 8.2, 10.0, 9.3, 11.5, 9.0, 10.7, 9.9, 7.7, 9.7, 8.8, 12.6 are repeated in seven columns, each column marking a different random split into two groups of 9. The resulting differences of means are:

Ȳ−X̄:  1.1778   0.4222   −0.0889   −0.4000   0.5778   0.7778   0.2000]

• Reference Distribution of Ȳ − X̄

Compute Ȳ − X̄ for each of the 48620 possible splits and determine how unusual the observed difference of −1.467 is.

This seems like a lot of computing work, but it takes just a few seconds in R using the function combn of the package combinat.

Download and install that package first from the contributed packages on CRAN or from R packages under the STAT 421 site, and invoke library(combinat) prior to using combn.

randomization.ref.dist = combn(1:18, 9, fun=mean.fun, y=SIR)

gives the vector of all 48620 such average differences Ȳ − X̄, where mean.fun computes Ȳ − X̄ for each split.
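The definition of mean.fun does not survive in this transcript; a minimal sketch consistent with the call above and with the description on the next slide (assuming SIR is the vector of the 18 SIR values in board order):

  # ind: a combination of 9 indices from 1:18, supplied by combn;
  # y: the full vector of SIR values. Returns Ybar - Xbar for that split.
  mean.fun = function(ind, y) { mean(y[ind]) - mean(y[-ind]) }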

• p-Value of Ȳ − X̄ = −1.467

    The function combn goes through all choices of index combinations of 9 values

    taken from 1:18 (referred to as ind in mean.fun).

For each such index combination it evaluates the mean Ȳ of the SIR values for those chosen indices and the mean X̄ of the remaining SIR values. It then takes the difference Ȳ − X̄ and outputs all these differences as a vector.

We find a (two-sided) p-value of .02344 for our observed Ȳ − X̄ = −1.467, i.e.

mean(abs(randomization.ref.dist) >= 1.467) = .02344

That is the probability of seeing a |Ȳ − X̄| value as or more extreme than the observed |ȳ − x̄| = 1.467, when in fact the hypothesis H0 holds true, i.e., under the randomization reference distribution.

Randomization of fluxes is the logical basis for any such probability statements, i.e., calculation of p-values!

• Randomization Reference Distribution of Ȳ − X̄

[Density histogram of the randomization reference distribution of SIRY − SIRX over −3 to 3, with shaded tails showing P(Ȳ − X̄ ≤ −1.467) = 0.01172 and P(Ȳ − X̄ ≥ 1.467) = 0.01172.]

• The p-Value: What it is not!

    The p-value based on some sample or experimental data

    is not the probability that the hypothesis is true.

    The hypothesis is not the outcome of some chance experiment ⇒ no probability!

    The calculation of the p-value assumes that the hypothesis is true!

    It is doubly hypothetical!

    The calculated chance is that of seeing stronger “contradictory evidence” against

    the assumed hypothesis than what was obtained in the observed sample/experiment.

    “Contradictory evidence”⇔ a test statistic that measures strong discrepancy to H0.

    p-values vary from sample to sample, tend to be uniformly distributed under H0.

A small p-value makes H0 implausible and some alternative more attractive.

  • Approximation to Randomization Reference Distribution


For moderate to large m and n the number of combinations (m+n choose m) becomes so large that it taxes the computing power or storage capacity of the average computer.

A simple way out is to generate a sufficiently large sample, say M = 10,000 or M = 100,000, of combinations from this set of all (m+n choose m) combinations.

Compute the statistic of interest, s(Xi, Yi) = Ȳi − X̄i, i = 1, . . . , M, for each sampled combination and approximate the randomization reference distribution

F(z) = P(s(X,Y) ≤ z) by F̂M(z),

where F̂M(z) is the proportion of s(Xi, Yi) = Ȳi − X̄i values that are ≤ z.

By the law of large numbers (LLN) we have for any z

F̂M(z) → F(z) as M → ∞, i.e., F̂M(z) ≈ F(z) for large M.


• Sample Simulation Program

    This can be done in a loop using the sample function in R.

simulated.reference.distribution = function(M=10000){
  D.star = NULL
  for(i in 1:M){
    # randomly permute the 18 SIR values
    SIR.star = sample(SIR)
    # treat the first 9 as the Y group and the last 9 as the X group
    D.star = c(D.star, mean(SIR.star[1:9]) - mean(SIR.star[10:18]))
  }
  D.star
}

    The following slide shows the QQ-plot comparison with the full randomization

    reference distribution, together with the respective p-values.

    This approach should suffice for practical purposes.
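As a usage sketch (assuming the vector SIR of the flux data is in the workspace), the simulated two-sided p-value is then obtained as on the earlier slide:

  D.star = simulated.reference.distribution(M=10000)
  mean(abs(D.star) >= 1.467)   # approximate two-sided p-value, about .02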


  • QQ-Plot of Ȳ − X̄ for Simulated & Full Randomization Reference Distribution

[QQ-plot of ȳ − x̄ for all 10000 sampled combinations against ȳ − x̄ for all combinations, both axes from −2 to 2. Annotated p-values: p̂1 = 0.0099 and p̂2 = 0.0117 for the simulated distribution versus p1 = 0.01172 and p2 = 0.01172 for the full distribution.]

• Randomization Distribution of 2-Sample t-Test

t(X,Y) = (Ȳ − X̄) / ( √(1/n + 1/m) × √( [∑_{i=1}^n (Yi − Ȳ)² + ∑_{j=1}^m (Xj − X̄)²] / (m + n − 2) ) )

It expresses the difference in averages relative to a measure of sample variability.

    The randomization reference distribution of the t(X ,Y ) values is in one-to-one

    correspondence to the randomization reference distribution of the Ȳ − X̄ values.

Theory ⇒ the randomization reference distribution of t(X,Y) is very well approximated by a t-distribution with 16 = 18 − 1 − 1 degrees of freedom.

    The test based on t(X ,Y ) and its t-distribution under H0 also shows up

    in a normal population based approach to this problem.
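A minimal sketch of this computation for the flux data (using the values from the Flux Data table; t.test(y, x, var.equal=TRUE) would give the same pooled statistic):

  x = c(11.5, 11.6, 10.3, 10.1, 9.0, 10.7, 9.9, 9.7, 12.6)   # flux X SIR values
  y = c(8.6, 7.5, 10.6, 8.2, 10.0, 9.3, 11.5, 7.7, 8.8)      # flux Y SIR values
  m = length(x); n = length(y)
  sp2 = (sum((x-mean(x))^2) + sum((y-mean(y))^2)) / (m+n-2)  # pooled variance
  t.stat = (mean(y) - mean(x)) / sqrt(sp2 * (1/n + 1/m))     # about -2.51 (slides quote -2.513)
  2 * pt(-abs(t.stat), m + n - 2)                            # two-sided p-value, about .023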


• QQ-Plot of t(X,Y) Randomization Reference Distribution

[QQ-plot of the ordered randomization t-statistics against t16 quantiles, both axes from −6 to 6.]

• t-Approximation for t(X,Y) Randomization Reference Distribution

[Histogram of the 2-sample t-statistic randomization reference distribution over −6 to 6, with the t16 density superimposed.]

  • The Randomization Test


    We have obtained the full or simulated randomization reference distribution.

Thus any extreme value of |Ȳ − X̄| could either come about due to a rare chance event during our randomization step or due to H0 actually being wrong.

    We have to make a decision: Reject H0 or not?

    We may decide to reject H0 when |Ȳ − X̄ | ≥C, where C is some critical value.

To determine C one usually sets a significance level α which limits the probability of rejecting H0 when in fact H0 is true (type I error). The requirement

α = P(reject H0 | H0) = P(|Ȳ − X̄| ≥ C | H0) then determines C = Cα.


  • Significance Levels and p-Values


When we reject H0 we would say that the results were significant at the (previously chosen) level α.

    Commonly used values of α are α = .05 or α = .01.

    Rejecting at smaller α than these would be even stronger evidence against H0.

    Our chance of making a wrong decision (rejecting H0 when true) would be smaller.

    For how small an α would we still have rejected?

This leads us to the observed significance level or p-value of the test for the given data, i.e., for the observed discrepancy value |ȳ − x̄|:

    p-value = P(|Ȳ − X̄ | ≥ |ȳ− x̄| | H0)


  • How to Determine the p-Value


We have stated p-values obtained from the full and the simulated (M = 10000) reference distributions. How are they obtained?

    Note the following:

> x = 1:10
> x > 3
 [1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> sum(x > 3)
[1] 7
> mean(x > 3)
[1] 0.7

Note that x > 3 produced a logical vector with the same length as x.

The logical values FALSE and TRUE are also interpreted numerically as 0 and 1, respectively, in arithmetic expressions.


  • How to Determine the p-Value (continued)


We view the reference distribution as a vector x of numbers for all the differences of means, Ȳ − X̄, obtained either for all 48620 possible splits or for the M = 10000 simulated splits.

mean(x <= -1.467) and mean(x >= 1.467) would give us the respective p-values p1 = .01172 and p2 = .01172 for the full reference distribution, and p̂1 = .0099 and p̂2 = .0117 for the simulated reference distribution.

    The simulated distribution is obviously not quite symmetric.

Rather than adding these 2 p-values to get a 2-sided p-value we can also do this directly via mean(abs(x) >= 1.467) = .02344 for all 48620 splits or

mean(abs(x) >= 1.467) = .0216 for the M = 10000 simulated splits.

    Here abs(x) gives the vector of absolute values of all components in x.


• How to Determine the Critical Value C.crit for the Level α Test

    For α = .05 we want to find C.crit such that mean(abs(x)>=C.crit)=.05.

    Equivalently, find the .95-quantile of abs(x) via C.crit=quantile(abs(x),.95).

From the full reference distribution we get C.crit(α = .05) = 1.288889 and C.crit(α = .01) = 1.644444.

From the simulated reference distribution we get C.crit(α = .05) = 1.311111 and C.crit(α = .01) = 1.666667.


  • What Does the t-Distribution Give Us?


What does the observed t-statistic t(x,y) = −2.513 give as a 2-sided p-value?

We find P(|t(X,Y)| ≥ 2.513) = 2*(1 − pt(2.513, 16)) = .02306, pretty close to the .02344 from the full randomization reference distribution.

    What are the critical values tcrit(α) for |t(X ,Y )| for level α = .05, .01 tests?

    We find tcrit(α = .05) = qt(.975,16) = 2.1199 and

    tcrit(α = .01) = qt(.995,16) = 2.9208, respectively.

With |t(x,y)| = 2.513 we would reject H0 at α = .05 since |t(x,y)| ≥ 2.1199, but not at α = .01 since |t(x,y)| < 2.9208.


• Hypothesis Testing

    We have addressed the question: Does the type of flux affect SIR?

    Formally we have tested the

null hypothesis H0: The type of flux does not affect SIR, against the

    alternative hypothesis H1: The type of flux does affect SIR.

    While H0 seems fairly specific, H1 is open ended. H1 can be anything but H0.

    There may be many ways for SIR to be affected by flux differences,

    e.g., change in mean, median, or scatter.

    Such differences may show up in data Z through an appropriate test statistic s(Z).

    Here Z = (X1, . . . ,X9,Y1, . . . ,Y9).


• Test Criteria or Test Statistics

    In the flux analysis we chose to use the absolute difference of sample means,

    s(Z) = |Ȳ − X̄ |, as our test criterion or test statistic for testing the null hypothesis.

    A test statistic is a value calculated from data and other known entities,

    e.g., assumed parameter values.

    We could have worked with the absolute difference in sample medians or with the

    ratio of sample standard deviations and compared that ratio with 1, etc.

    Different test statistics are sensitive to different deviations from the null hypothesis.

    A test statistic, when viewed as a function of random input data, is itself a random

    variable, and has a distribution, its sampling distribution.


• Sampling Distributions

For a test statistic s(Z) to be effective in deciding between H0 and H1 it is desirable that the sampling distributions of s(Z) under H0 and H1 are somewhat different.

[Two relative-frequency histograms over the range 90 to 120: the sampling distribution of s(Z) under H0 and, shifted toward higher values, the sampling distribution under H1.]

• When to Reject H0

    The previous illustration shows a specific sampling distribution for s(Z) under H1.

Typically H1 consists of many different possible distributional models, leading to many possible sampling distributions under H1.

Under H0 we often have just a single sampling distribution, the null distribution.

If under H1 the test statistic s(Z) tends to have mostly higher values than under H0, we would want to reject H0 when s(Z) is large.

How large is too large? We need a critical value Ccrit and reject H0 when s(Z) ≥ Ccrit.

Choose Ccrit such that P(s(Z) ≥ Ccrit | H0) = α, a pre-chosen significance level. Typically α = .05 or .01. It is the probability of the type I error.

The previous illustration also shows that there may be values s(Z) in the overlap of both distributions. Decisions are not clear cut ⇒ type I or type II error.


• Decision Table

                          Truth
  Decision       H0 is true          H0 is false
  accept H0      correct decision    type II error
  reject H0      type I error        correct decision

Testing hypotheses (like estimation) is a branch of a more general concept, namely decision theory. Decisions are optimized with respect to penalties for wrong decisions, i.e., P(type I error) and P(type II error), or the mean squared error of an estimate θ̂ of θ, namely E((θ̂ − θ)²).

• The Null Distribution and Critical Values

[Upper panel: the sampling distribution under H0 over 90 to 120, split at the critical value 104.9 into "accept H0" and "reject H0" regions; the rejection tail area is the type I error at significance level α = 0.05. Lower panel: the sampling distribution under H1 with the same split; the area on the "accept H0" side is the type II error.]

• Critical Values and p-Values

    Note that p-value(s(z)) ≤ α is equivalent to rejecting H0 at level α.

[Upper panel: the sampling distribution under H0 with critical value 104.9 (significance level α = 0.05) and the observed value 107.1, whose tail area gives p-value = 0.0097. Lower panel: the sampling distribution under H1 with the accept/reject split and the type II error region.]

• p-Values and Significance Levels

    We just saw that knowing the p-value allows us to accept or reject H0 at level α.

    However, the p-value is more informative than saying that we reject at level α.

    It is the smallest level α at which we would still have rejected H0.

    It is also called the observed significance level.

Working with a predefined α made it possible to choose the best level α test. Best: having the highest probability of rejecting H0 when H1 is true.

    This makes for nice mathematical theory, but p-values should be the preferred way

    of judging and reporting test results.


• Randomization Reference Distribution of Ȳ − X̄

[Density of the randomization reference distribution of SIRY − SIRX over −3 to 3, with critical values Dcrit = −1.289 and Dcrit = 1.289 for α = 0.05, and the observed D = Ȳ − X̄ = −1.467 marked beyond the lower critical value.]

• The Power Function

The probability of rejecting H0 is denoted by β. It is a function of the distributional model F governing Z, i.e., β = β(F). It is called the power function of the test.

When the hypothesis H0 is composite and s(Z) has more than one possible distribution under H0, one defines the highest probability of type I error as the significance level of the test. Hence α = sup{β(F) : F ∈ H0}.

For various F ∈ H1 the power function gives us the corresponding probabilities of type II error as 1 − β(F).

Montgomery unfortunately uses β = β(F) as the symbol for the probability of type II error. This is not standard.


• Samples and Populations

    So far we have covered inference based on a randomization test. This relied heavily

    on our randomized assignment of flux X and flux Y to the 18 circuit boards.

    Such inference can logically only say something about flux differences

    in the context of those 18 boards.

    To generalize any conclusions to other boards would require some assumptions,

    judgement, and ultimately a step of faith.

    Namely, assume that these 18 boards and their processing represent a

    representative sample from a conceptual population of such processed boards.

    For samples to be representative they should be random samples.


• Conceptual Populations

    Clearly the 18 boards happened to be available at the time of the experiment.

    They could have been a random sample of all boards available at the time.

    However, they also may have been taken sequentially in the order of production.

    They certainly could not be a sample from future boards, yet to be produced.

The processing aspects were to some extent made to look like a random sample by the various randomization steps.

Thus we could regard the 9+9 SIR values as two random samples from two very large or infinite conceptual populations of SIR values.

2 populations: all potential boards/processes with flux X, or all the same boards/processes with flux Y. Can't have it both ways ⇒ further conceptualization.


• Population Distributions and Densities

Such infinite populations of Z-values are conveniently described by densities f(z), with the properties f(z) ≥ 0 and ∫_{−∞}^{∞} f(z) dz = 1.

The probability of observing a randomly chosen element Z that is ≤ some specified value x is then given by

F(x) = P(Z ≤ x) = ∫_{−∞}^{x} f(z) dz = ∫_{−∞}^{x} f(t) dt    (z and t are just dummy variables)

F(x) as a function of x is also called the cumulative distribution function (CDF) of the random variable Z.

    F(x)↗ from 0 to 1 as x goes from −∞ to ∞.
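As a small numerical illustration (mine, not from the slides), R's integrate can recover such a CDF value from the density; here for the standard normal density dnorm, whose CDF is pnorm:

  integrate(dnorm, lower = -Inf, upper = 1.5)$value  # 0.9331928
  pnorm(1.5)                                         # same value from the built-in CDF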


• Means, Expectations and Variances

The mean or expectation of Z or its population is defined by

µ = µZ = E(Z) = ∫_{−∞}^{∞} z f(z) dz ≈ ∑ z f(z) ∆(z) = ∑ z p(z),

a probability-weighted average of z values. It is the center of probability mass balance.

By extension the mean or expectation of g(Z) is defined by

E(g(Z)) = ∫_{−∞}^{∞} g(z) f(z) dz

The variance of Z is defined by

σ² = var(Z) = E((Z − µ)²) = ∫_{−∞}^{∞} (z − µ)² f(z) dz

σ = σZ = √var(Z) is called the standard deviation of Z or its population. It is a measure of distribution spread.


• Multivariate Densities or Populations

f(z1, . . . , zn) is a multivariate density if it has the following properties:

f(z1, . . . , zn) ≥ 0 for all z1, . . . , zn and ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} f(z1, . . . , zn) dz1 . . . dzn = 1.

It describes the behavior of the infinite population of such n-tuples (z1, . . . , zn).

A random element (Z1, . . . , Zn) drawn from such a population is a random vector.

We say that Z1, . . . , Zn in such a random vector are (statistically) independent when the following property holds:

f(z1, . . . , zn) = f1(z1) × · · · × fn(zn)

Here fi(zi) is the marginal density of Zi. It is obtainable from the multivariate density by integrating out all other variables, e.g.,

f2(z2) = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} f(z1, z2, z3, . . . , zn) dz1 dz3 . . . dzn.


• Random Sample

When drawing repeatedly values Z1, . . . , Zn from a common infinite population with density f(z) we get a multivariate random vector (Z1, . . . , Zn).

If the drawings are physically unrelated or "independent," we may consider Z1, . . . , Zn as statistically independent, i.e., the random vector has density

h(z1, . . . , zn) = f(z1) × · · · × f(zn).

Z1, . . . , Zn is then also referred to as a random sample.

We also express this as Z1, . . . , Zn i.i.d.∼ f.

    Here i.i.d. = independent and identically distributed.


• Rules of Expectations & Variances (Review)

For any set of random variables X1, . . . , Xn and constants a0, a1, . . . , an we have

E(a0 + a1×X1 + . . . + an×Xn) = a0 + a1×E(X1) + . . . + an×E(Xn)

provided the expectations E(X1), . . . , E(Xn) exist and are finite.

This holds whether X1, . . . , Xn are independent or not.

For any set of independent random variables X1, . . . , Xn and constants a0, a1, . . . , an we have

var(a0 + a1×X1 + . . . + an×Xn) = a1²×var(X1) + . . . + an²×var(Xn)

provided the variances var(X1), . . . , var(Xn) exist and are finite. var(a0) = 0.

This is also true under the weaker (than independence) condition cov(Xi, Xj) = E(Xi Xj) − E(Xi)E(Xj) = 0 for i ≠ j. In that case X1, . . . , Xn are uncorrelated.
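A quick simulation sketch (my illustration, not from the slides) checking the variance rule for independent inputs:

  set.seed(1)
  X1 = rnorm(100000, sd = 2)   # var(X1) = 4
  X2 = runif(100000)           # var(X2) = 1/12, drawn independently of X1
  var(3 + 5*X1 - 2*X2)         # approx 5^2*4 + (-2)^2*(1/12) = 100.33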


• Rules for Averages

E(X̄) = E( (1/n) ∑_{i=1}^n Xi ) = (1/n) E( ∑_{i=1}^n Xi ) = (1/n) ∑_{i=1}^n E(Xi) = (1/n) ∑_{i=1}^n µi = µ̄,

whether X1, . . . , Xn are independent or not.

If µ1 = . . . = µn = µ then E(X̄) = µ.

If X1, . . . , Xn are independent we also have

var(X̄) = var( (1/n) ∑_{i=1}^n Xi ) = (1/n²) ∑_{i=1}^n var(Xi) = (1/n²) ∑_{i=1}^n σi² = (1/n) σ̄² ↘ 0 as n → ∞,

where σ̄² = (1/n) ∑_{i=1}^n σi². σ̄² = σ² when σ1² = . . . = σn² = σ².


• A Normal Random Sample

X1, . . . , Xn is called a normal random sample when the common density of the Xi is a normal density of the following form:

f(x) = (1/(√(2π) σ)) exp( −(x − µ)²/(2σ²) )

This density or population has mean µ and standard deviation σ.

When µ = 0 and σ = 1 one calls it the standard normal density

ϕ(x) = (1/√(2π)) exp( −x²/2 )   with CDF   Φ(x) = ∫_{−∞}^{x} ϕ(z) dz.

If X ∼ N(µ, σ²) then (X − µ)/σ ∼ N(0, 1).

⇒ P(X ≤ x) = P((X − µ)/σ ≤ (x − µ)/σ) = Φ((x − µ)/σ).
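A one-line check of this standardization in R (the numbers are my own example):

  mu = 10; sigma = 2; x = 12.5
  pnorm(x, mean = mu, sd = sigma)  # P(X <= x) directly: 0.8943502
  pnorm((x - mu)/sigma)            # Phi((x - mu)/sigma): same value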


• The CLT & the Normal Population Model

    The normal population model is motivated by the Central Limit Theorem (CLT).

This comes about because many physical or natural measured phenomena can be viewed as the addition of several independent source inputs or factors,

Y = X1 + . . . + Xk   or   Y = a0 + a1X1 + . . . + akXk

for constants a0, a1, . . . , ak.

More generally, but also approximately, extend this via a 1-term Taylor expansion

Y = g(X1, . . . , Xk) ≈ g(µ1, . . . , µk) + ∑_{i=1}^k (Xi − µi) ∂g(µ1, . . . , µk)/∂µi = a0 + a1X1 + . . . + akXk,

provided the linearization provides a good approximation to g.

• Central Limit Theorem (CLT) I

• Suppose we randomly and independently draw random variables X1, . . . , Xn from n possibly different populations with respective means µ1, . . . , µn and standard deviations σ1, . . . , σn.

• Suppose further that

max_i ( σi² / (σ1² + . . . + σn²) ) → 0 as n → ∞,

i.e., none of the variances dominates among all variances.

• Then Yn = X1 + . . . + Xn has an approximate normal distribution with mean and variance given by

µY = µ1 + . . . + µn and σY² = σ1² + . . . + σn².


• Central Limit Theorem (CLT) II

[Four population histograms: a standard normal population (x1), a uniform population on (0,1) (x2), a log-normal population (x3), and a Weibull population (x4).]

• Central Limit Theorem (CLT) III

["Central Limit Theorem at Work": histogram of the sums x1 + x2 + x3 + x4, ranging over −2 to 6.]

• Central Limit Theorem (CLT) IV

[Five population histograms: a standard normal population (x1), a uniform population on (0,1) (x2), a log-normal population (x3), and two Weibull populations (x4, x5).]

• Central Limit Theorem (CLT) V

["Central Limit Theorem at Work": histograms of the sums x1 + x2 + x3 + x4 and x2 + x3 + x4 + x5.]

• Central Limit Theorem (CLT) VI

[Four population histograms: a standard normal population (x1), a uniform population on (0,1) (x2), a log-normal population (x3, now stretching out to 15), and a Weibull population (x4).]

• Central Limit Theorem (CLT) VII

["Central Limit Theorem at Work (not so good)": histogram of the sums x1 + x2 + x3 + x4, ranging over 0 to 40.]

• Central Limit Theorem (CLT) VIII

[Four population histograms: a standard normal population (x1), a uniform population on (0,1) (x2), a log-normal population (x3), and a Weibull population (x4).]

• Central Limit Theorem (CLT) IX

["Central Limit Theorem at Work (not so good)": histogram of the sums x1 + x2 + x3 + x4, ranging over −20 to 40.]

• Derived Distributions from Normal Model

    Since the normal model will be our assumed model throughout

    it is worthwhile to characterize some distributions that are derived from it.

    They will play a significant role later on.

    The chi-square distribution, the Student t-distribution, and the F-distribution.

    These distributions come about as sampling distributions of certain test statistics

    based on normal random samples.


• Properties of Normal Random Variables

Assume that X1, . . . , Xn are independent normal random variables with respective means and variances given by µ1, . . . , µn and σ1², . . . , σn². Then

Y = X1 + . . . + Xn ∼ N(µ1 + . . . + µn, σ1² + . . . + σn²)

If X ∼ N(µ, σ²) then (X − µ)/σ ∼ N(0, 1),

or more generally for constants a and b

a + bX ∼ N(a + bµ, b²σ²)

    Caution: Some people write X ∼N (µ,σ) when I would write X ∼N (µ,σ2).


• The Chi-Square Distribution

When Z1, . . . , Zf i.i.d.∼ N(0,1) we say that

C_f = ∑_{i=1}^f Zi²    (memorize this definition!)

has a chi-square distribution with f degrees of freedom; we also write C_f ∼ χ²_f.

It has mean f and variance 2f, worth memorizing.

Density, CDF, quantiles, and random samples of or from the chi-square distribution can be obtained in R via dchisq(x,f), pchisq(x,f), qchisq(p,f), rchisq(N,f), respectively.

If C_f1 ∼ χ²_f1 and C_f2 ∼ χ²_f2 are independent then C_f1 + C_f2 ∼ χ²_{f1+f2}.

Why? Think definition!
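A quick simulation check of the mean, the variance, and the definition (my illustration, not from the slides):

  set.seed(7)
  f = 5
  C = rchisq(100000, f)  # random sample from the chi-square distribution with f = 5
  mean(C); var(C)        # approx f = 5 and 2*f = 10
  # a sum of f squared N(0,1) variables has the same distribution:
  mean(rowSums(matrix(rnorm(100000 * f), ncol = f)^2))  # also approx 5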


• χ² Densities

[Overlaid χ² densities for df = 1, 2, 5, 10, 20, plotted over 0 to 30.]

• The Student t-Distribution

When Z ∼ N(0,1) is independent of C_f ∼ χ²_f we say that

t = Z / √(C_f / f)    (memorize this definition!)

has a Student t-distribution with f degrees of freedom. We also write t ∼ t_f.

It has mean 0 (for f > 1) and variance f/(f − 2) if f > 2.

For large f (say f ≥ 30) the t-distribution is approximately standard normal.

Density, CDF, quantiles, and random samples of or from the Student t-distribution can be obtained in R via dt(x,f), pt(x,f), qt(p,f), and rt(N,f), respectively.
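A sketch (mine) constructing t from this definition and comparing simulated quantiles with qt:

  set.seed(42)
  f = 5
  t.def = rnorm(100000) / sqrt(rchisq(100000, f) / f)  # Z / sqrt(C_f / f)
  quantile(t.def, c(.5, .9, .99))                      # close to the exact quantiles:
  qt(c(.5, .9, .99), f)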


• Densities of the Student t-Distribution

[Overlaid t densities for df = 1, 2, 5, 10, 20, 30, ∞, plotted over −4 to 4.]

• The Noncentral Student t-Distribution

When X ∼ N(δ, 1) is independent of C_f ∼ χ²_f we say that

t = X / √(C_f / f)    (memorize this definition!)

has a noncentral Student t-distribution with f degrees of freedom and noncentrality parameter ncp = δ. We also write t ∼ t_{f,δ}.

Density and CDF of the noncentral Student t-distribution can be obtained in R via dt(x,f,ncp) and pt(x,f,ncp), respectively.

The corresponding quantile function qnct(p,f,ncp) can be downloaded from my web site for use in R.

Random samples from t_{f,δ}: (rnorm(N)+ncp)/sqrt(rchisq(N,f)/f)


• Densities of the Noncentral Student t-Distribution

[Overlaid noncentral t densities for df = 6 and ncp = 0, 1, 2, 4, plotted over −5 to 15.]

    These densities march to the left for negative ncp.


• The F-Distribution

When C_f1 ∼ χ²_f1 and C_f2 ∼ χ²_f2 are independent χ² random variables with f1 and f2 degrees of freedom, respectively, we say that

F = (C_f1/f1) / (C_f2/f2)    (memorize this definition!)

has an F distribution with f1 and f2 degrees of freedom. We also write F ∼ F_{f1,f2}.

Density, CDF, quantiles, and random samples of or from the F_{f1,f2}-distribution can be obtained in R via df(x,f1,f2), pf(x,f1,f2), qf(p,f1,f2), rf(N,f1,f2), respectively.

If t ∼ t_f then t² ∼ F_{1,f}. Why? Because t² = Z²/(C_f/f) = (C_1/1)/(C_f/f), with the required independence of C_1 and C_f.

Also 1/F ∼ F_{f2,f1}. Just look at the above definition!
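Both facts can be checked numerically in R (my illustration):

  qt(.975, 16)^2   # 4.493998: square of the t_16 .975-quantile
  qf(.95, 1, 16)   # the same value, since t^2 ~ F_{1,16}
  qf(.05, 3, 7)    # a lower F quantile equals ...
  1/qf(.95, 7, 3)  # ... the reciprocal upper quantile with swapped df, since 1/F ~ F_{f2,f1}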

• F Densities

[Overlaid F densities for (df1, df2) = (1,3), (2,5), (5,5), (10,20), (20,20), (50,100), plotted over 0 to 5.]

• Decomposition of Sum of Squares (SS)

    We illustrate here an early example of the SS decomposition.

∑_{i=1}^n Xi² = ∑_{i=1}^n (Xi − X̄ + X̄)²   with X̄ = ∑_{i=1}^n Xi / n

             = ∑_{i=1}^n (Xi − X̄)² + 2 ∑_{i=1}^n (Xi − X̄) X̄ + n X̄²

             = ∑_{i=1}^n (Xi − X̄)² + n X̄².

    We used the fact that ∑(Xi− X̄) = ∑Xi−nX̄ = ∑Xi−∑Xi = 0,

    i.e., the residuals sum to zero.

    Such decompositions are a recurring theme in the Analysis of Variance (ANOVA).
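A numeric check of the decomposition on a toy vector (my illustration):

  x = c(2, 4, 7, 1)
  sum(x^2)                                    # 70
  sum((x - mean(x))^2) + length(x)*mean(x)^2  # also 70: SS about the mean plus n*xbar^2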


• Distribution of X̄ and ∑(Xi − X̄)²

Assume that (X1, . . . , Xn) i.i.d.∼ N(µ, σ²). Then X̄ ∼ N(µ, σ²/n) and

∑_{i=1}^n (Xi − X̄)² has the same distribution as σ² C_{n−1}, where C_{n−1} ∼ χ²_{n−1}.

We also express this with the symbol ∼ as

∑_{i=1}^n (Xi − X̄)² ∼ σ² C_{n−1}   or   ∑_{i=1}^n (Xi − X̄)² / σ² ∼ C_{n−1}.

Further, ∑_{i=1}^n (Xi − X̄)² and X̄ are statistically independent, in spite of the fact that X̄ appears in both expressions.
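A simulation sketch (mine) consistent with the independence claim: across many normal samples, X̄ and ∑(Xi − X̄)² show essentially no correlation.

  set.seed(11)
  n = 10
  Z = matrix(rnorm(10000 * n), ncol = n)  # 10000 normal samples of size n, one per row
  xbar = rowMeans(Z)
  ss = rowSums((Z - xbar)^2)              # sum of squared deviations within each sample
  cor(xbar, ss)                           # approx 0, consistent with independence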

• One-Sample t-Test

Assume that X = (X1, . . . , Xn) i.i.d.∼ N(µ, σ²).

We want to test the hypothesis H0: µ = µ0 against the alternatives H1: µ ≠ µ0. σ is left unspecified and is unknown.

X̄ is a good indicator for µ since its mean is µ and its variance is σ²(X̄) = σ²/n.

Thus a reasonable test statistic may be X̄ − µ0 ∼ N(µ − µ0, σ²/n) = N(0, σ²/n) when H0 is true. Unfortunately we do not know σ.

√n(X̄ − µ0)/σ = (X̄ − µ0)/(σ/√n) ∼ N(0,1) suggests replacing the unknown σ by a suitable estimate to get a single reference distribution under H0.

From the previous slide: ⇒ s² = ∑_{i=1}^n (Xi − X̄)²/(n − 1) ∼ σ² C_{n−1}/(n − 1) is independent of X̄. Note E(s²) = σ², i.e., s² is an unbiased estimate of σ².

• One-Sample t-Statistic

Replacing σ by s in the standardization √n(X̄ − µ0)/σ ⇒ the one-sample t-statistic

t(X) = √n(X̄ − µ0)/s = [√n(X̄ − µ0)/σ] / √(s²/σ²) = [√n(X̄ − µ0)/σ] / √(C_{n−1}/(n − 1)) = Z / √(C_{n−1}/(n − 1)) ∼ t_{n−1},

since under H0 we have that Z = √n(X̄ − µ0)/σ ∼ N(0,1) and C_{n−1} ∼ χ²_{n−1} are independent of each other. We thus satisfy the definition of the t-distribution.

Hence we can use t(X) in conjunction with the known reference distribution t_{n−1} under H0 and reject H0 for large values of |t(X)|.

The 2-sided level α test has critical value tcrit = t_{n−1,1−α/2} = qt(1−α/2, n−1), and we reject H0 when |t(X)| ≥ tcrit.

The 2-sided p-value for the observed t-statistic tobs(x) is P(|t_{n−1}| ≥ |tobs(x)|) = 2 P(t_{n−1} ≤ −|tobs(x)|) = 2*pt(-|tobs(x)|, n-1).


• The t.test in R

    R has a function, t.test, that performs 1- and 2-sample t-tests.

    See ?t.test for documentation. We focus here on the 1-sample test.

    > t.test(rnorm(20)+.4)

    One Sample t-test

    data: rnorm(20) + 0.4

    t = 2.2076, df = 19, p-value = 0.03976

    alternative hypothesis: true mean is not equal to 0

    95 percent confidence interval:

    0.02248992 0.84390488

    sample estimates:

    mean of x

0.4331974

  • Calculation of the Power Function of Two-Sided t-Test


    The power function of this two-sided t-test is given by

β(µ,σ) = P(|t| ≥ tcrit) = P(t ≤ −tcrit) + P(t ≥ tcrit) = P(t ≤ −tcrit) + 1 − P(t < tcrit)

t = t(X) = √n(X̄ − µ0)/s = [√n(X̄ − µ + (µ − µ0))/σ] / (s/σ) = [√n(X̄ − µ)/σ + √n(µ − µ0)/σ] / (s/σ) = (Z + δ) / √(C_{n−1}/(n − 1)) ∼ t_{n−1,δ},

a noncentral t-distribution with noncentrality parameter δ = √n(µ − µ0)/σ.

Thus the power function depends on µ and σ only through δ and we write

β(δ) = P(t_{n−1,δ} ≤ −tcrit) + 1 − P(t_{n−1,δ} < tcrit) = pt(-tcrit, n-1, δ) + 1 - pt(tcrit, n-1, δ), which ↗ as |δ| ↗.
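R's pt accepts the noncentrality parameter directly, so β(δ) can be computed exactly as written (a sketch assuming n = 10 and α = .05 as in the plot on the next slide):

  n = 10; alpha = .05
  t.crit = qt(1 - alpha/2, n - 1)
  power.fn = function(delta)   # power function of the two-sided t-test
    pt(-t.crit, n - 1, ncp = delta) + 1 - pt(t.crit, n - 1, ncp = delta)
  power.fn(2.5)   # approx .6, as read off the plot on the next slide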


• Power Function of Two-Sided t-Test

[Plot of β(δ) against δ = √n(µ − µ0)/σ over −4 to 4, for sample size n = 10 and levels α = 0.05 and α = 0.01.]

• How to Use the Power Function

    From the previous plot we can read off for the level α = .05 test

β(δ) ≈ .6 for δ = ±√n(µ0 − µ)/σ ≈ ±2.5, or |µ0 − µ| ≈ 2.5σ/√n.

The smaller the natural variability σ, the smaller the difference |µ0 − µ| we can detect with probability .6.

Similarly, the larger the sample size n, the smaller the difference |µ0 − µ| we can detect with probability .6; note however the effect of √n.

Both of these conclusions are intuitive because σ(X̄) = σ/√n.

Given a required detection difference |µ − µ0| and with some upper bound knowledge σu ≥ σ, we can plan the appropriate minimum sample size n to achieve the desired power .6: 2.5 × σ/|µ − µ0| ≤ 2.5 × σu/|µ − µ0| = √n.

For power ≠ .6, replace 2.5 by the appropriate value from the previous plot.
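A minimal sketch of that sample-size calculation (σu and the required detection difference are assumed example values; 2.5 is the power-.6 factor read from the plot):

  sigma.u = 2   # assumed upper bound on sigma
  delta0 = 1    # required detectable difference |mu - mu0| (assumed)
  ceiling((2.5 * sigma.u / delta0)^2)   # sqrt(n) = 2.5*sigma.u/delta0  =>  n = 25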

• Where is the Flaw in Previous Argument?

We tacitly assumed that the power curve plot would not change with n.

    Both tcrit = qt(1−α/2,n−1) and P(tn−1,δ ≤±tcrit) depend on n.

    See the next 3 plots.

    Thus it does not suffice to consider the n in δ alone.

    However, typically the sample size requirements will ask for large values of n.

    In that case the power functions stay more or less stable.

    Compare n = 100 and n = 1000.

We will provide a function that gets us out of this dilemma.
