-
Experiments and Observational Studies
Applied Statistics and Experimental Design Fritz Scholz — Fall 2006
In observational studies we obtain measurements on several variables.
Sampling could be random or not. We observe what is in the sample;
there is no manipulation of factors by any experimenter.
Factor levels may be chosen by hidden agendas.
If we observe any correlations, it is not clear which variables have an effect
on which other variables: cause and effect remain unclear.
There may be unmeasured factors that affect seemingly correlated variables.
In a “controlled” experiment we control certain input variables and
determine their effect on response variables.
We have to guard against subconscious effects when “controlling” inputs.
=⇒ randomization!
-
Steps in the Design of Experiments (DOE)
1. Be clear on the goal of the experiment. Which questions to address? Set up hypotheses about treatment/factor effects, a priori. Don't go fishing afterwards! It can only point to future experiments.
If you torture the data long enough, they will confess to anything.
2. Understand the experimental units over which treatments will be randomized. Where do they come from? How do they vary? Are they well defined?
3. Define the appropriate response variable to be measured.
4. Define potential sources of response variation
   a) factors of interest (to be manipulated)
   b) nuisance factors (to be randomized)
5. Decide on treatment and blocking variables.
6. Define clearly the experimental process and what is randomized.
-
Three Basic Principles in Experimental Design
Replication: repeat experimental runs under the same values for control variables.
⇒ understanding inherent variability
⇒ better response estimate via averaging.
Repeat all variation aspects of an experimental run, not just a repeat measure of the response after all aspects of an experimental run are done.
Randomization: makes systematic confounding between treatment and other factors (hidden or not) unlikely. Removes sources of bias arising from factor/unit interaction. Disperses biases randomly among all units ⇒ error or background noise. Provides a logical/probability basis for inference about treatment effects.
Blocking: effective when (natural within block variation)/(between block variation) is small. Randomized treatment assignment within ≈ homogeneous blocks. The treatment effect is more clearly visible against the lower within block variation. Separates variation between blocks from the treatment effect.
-
Flux Experiment
18 boards are available for the experiment, not necessarily a random sample from all boards (present, past and future).
Test flux brands X and Y: randomly assign 9 boards each to X & Y (FLUX)
The boards are soldered and cleaned. Order randomized. (SC.ORDER)
Then the boards are coated and cured to avoid handling contamination. Order randomized. (CT.ORDER)
Then the boards are placed in a humidity chamber and measured for SIR (surface insulation resistance). Position in chamber randomized. (SLOT)
The randomization at the various process steps avoids unknown biases. When in doubt, randomize!
Randomization of flux assignment gives us a mathematical basis for judging flux differences with respect to the response SIR.
-
DOE Steps Recapitulated
1. Goal of the experiment. Answer the question: Is Flux X different from Flux Y? If not, we can use them interchangeably. One may be cheaper than the other. Test the null hypothesis H0: No difference in fluxes.
2. Understand the experimental units: boards with all processing steps up to measuring the response.
3. Define the appropriate response variable to be measured: SIR
4. Define potential sources of response variation
   a) factors of interest: flux type
   b) nuisance factors: boards, processing steps, testing.
5. Decide on treatment and blocking variables. Treatment = flux type, no blocking. With 2 humidity chambers we might have wanted to block on those.
6. Define clearly the experimental process and what is randomized. Treatments and all nuisance factors are randomized.
-
Flux Data
BOARD  FLUX  SC.ORDER  CT.ORDER  SLOT   SIR
    1     Y        13        14     5   8.6
    2     Y        16         8     6   7.5
    3     X        18         9    15  11.5
    4     Y        11        11    11  10.6
    5     X        15        18     9  11.6
    6     X         9        15    18  10.3
    7     X         6         1    16  10.1
    8     Y        17        12    17   8.2
    9     Y         5        10    13  10.0
   10     Y        10        13    14   9.3
   11     Y        14         5    10  11.5
   12     X        12        17    12   9.0
   13     X         4         7     3  10.7
   14     X         8         6     1   9.9
   15     Y         3         2     4   7.7
   16     X         7         3     2   9.7
   17     Y         1        16     8   8.8
   18     X         2         4     7  12.6
see Flux.csv or flux
-
Flux Experiment: First Boxplot Look at SIR Data
[Figure: side-by-side boxplots of SIR ( log10(Ohm) ) for Flux X and Flux Y, annotated with FLUXY − FLUXX = −1.467.]
-
Flux Experiment: QQ-Plot of SIR Data
[Figure: QQ-plot of SIR with Flux Y ( log10(Ohm) ) against SIR with Flux X ( log10(Ohm) ), both axes from 8 to 12.]
-
QQ-Plot of SIR Data (Higher Perspective?)
[Figure: the same QQ-plot of SIR with Flux Y against SIR with Flux X ( log10(Ohm) ), drawn on axes from 0 to 20.]
-
Some QQ-Plots from N(0,1) Samples (m=9, n=9)
[Figure: 20 QQ-plots of simulated N(0,1) samples with m = n = 9; the panel annotations ȳ − x̄ range from −1.33 to 1.07, illustrating the chance variability of the difference of means.]
-
Is the Difference Ȳ − X̄ = −1.467 Significant?
In comparing SIR for the two fluxes let us focus on the difference of means FLUXY − FLUXX = Ȳ − X̄.

If the use of flux X or flux Y made no difference then we should have seen the same results for these 18 boards, no matter which got flux X or Y. X or Y is just an artificial “distinguishing” label with no consequence.

For other random assignments of fluxes, or random splittings of the 18 boards into two groups of 9 & 9, we would have seen other differences of means.

There are (18 choose 9) = 48620 such possible splits. For each split we could obtain Ȳ − X̄.

We need the reference distribution of Ȳ − X̄ for all 48620 splits to judge how unusual a random split we had when we got Ȳ − X̄ = −1.467. It was based on a random split by our randomization, i.e., it is one of the 48620 equally likely ones.
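The enumeration just described can be sketched in Python (the course itself uses R; the SIR data, the flux-Y assignment, and the 48620 count are from these slides, while names such as diff_of_means are illustrative). The resulting two-sided p-value comes out close to the .0234 reported later in the slides, though tie handling at the observed value can differ slightly:

```python
# Exhaustive randomization reference distribution of Ybar - Xbar,
# enumerating all C(18, 9) = 48620 splits of the 18 boards.
from itertools import combinations

SIR = [8.6, 7.5, 11.5, 10.6, 11.6, 10.3, 10.1, 8.2, 10.0,
       9.3, 11.5, 9.0, 10.7, 9.9, 7.7, 9.7, 8.8, 12.6]   # from the data slide
Y_BOARDS = [0, 1, 3, 7, 8, 9, 10, 14, 16]                # boards that got flux Y
TOTAL = sum(SIR)

def diff_of_means(idx):
    """Ybar - Xbar when the boards in idx receive flux Y."""
    s_y = sum(SIR[i] for i in idx)
    return s_y / 9 - (TOTAL - s_y) / 9

observed = diff_of_means(Y_BOARDS)                       # about -1.467
ref = [diff_of_means(c) for c in combinations(range(18), 9)]
# two-sided p-value: fraction of splits as or more extreme than observed
p_two_sided = sum(abs(d) >= abs(observed) - 1e-12 for d in ref) / len(ref)
```

Only the randomized flux assignment justifies reading p_two_sided as a probability.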
-
Some Randomization Examples of Ȳ − X̄
Seven random reassignments of the flux labels to the same 18 SIR values (the SIR column is repeated for each split) gave the following differences of means:

Ȳ − X̄ :  1.1778   0.4222  −0.0889  −0.4000   0.5778   0.7778   0.2000
-
Reference Distribution of Ȳ − X̄
Compute Ȳ − X̄ for each of the 48620 possible splits and determine how unusual the observed difference of −1.467 is.

This seems like a lot of computing work but it takes just a few seconds in R using the function combn of the package combinat.

Download and install that package first from the contributed packages on CRAN or from R packages under the STAT 421 site, and invoke library(combinat) prior to using combn.

randomization.ref.dist = combn(1:18, 9, fun=mean.fun, y=SIR)

gives the vector of all 48620 such average differences Ȳ − X̄, where mean.fun is a user-supplied function (its action is described on the next slide).
-
p-Value of Ȳ − X̄ = −1.467
The function combn goes through all choices of index combinations of 9 values taken from 1:18 (referred to as ind in mean.fun).

For each such index combination it evaluates the mean Ȳ of the SIR values for those chosen indices and the mean X̄ of the remaining SIR values. It then takes the difference Ȳ − X̄ and outputs all these differences as a vector.
We find a (two-sided) p-value of .02344 for our observed Ȳ − X̄ = −1.467, i.e.,

mean(abs(randomization.ref.dist) >= 1.467) = .02344

That is the probability of seeing a |Ȳ − X̄| value as or more extreme than the observed |ȳ − x̄| = 1.467, when in fact the hypothesis H0 holds true, i.e., under the randomization reference distribution.
Randomization of fluxes is the logical basis for any such probability statements,
i.e., calculation of p-values!
-
Randomization Reference Distribution of Ȳ − X̄
[Figure: histogram of the randomization reference distribution of SIRY − SIRX, with the tail probabilities P(Ȳ − X̄ ≤ −1.467) = 0.01172 and P(Ȳ − X̄ ≥ 1.467) = 0.01172 marked.]
-
The p-Value: What it is not!
The p-value based on some sample or experimental data
is not the probability that the hypothesis is true.
The hypothesis is not the outcome of some chance experiment ⇒ no probability!
The calculation of the p-value assumes that the hypothesis is true!
It is doubly hypothetical!
The calculated chance is that of seeing “contradictory evidence” against the assumed hypothesis at least as strong as what was obtained in the observed sample/experiment.
“Contradictory evidence” ⇔ a test statistic that measures strong discrepancy from H0.
p-values vary from sample to sample, tend to be uniformly distributed under H0.
A small p-value makes H0 implausible and some alternative more attractive.
-
Approximation to Randomization Reference Distribution
For moderate to large m and n the number of combinations (m+n choose m) becomes so large that it taxes the computing power or storage capacity of the average computer.

A simple way out is to generate a sufficiently large sample, say M = 10,000 or M = 100,000, of combinations from this set of all (m+n choose m) combinations.

Compute the statistic of interest, s(Xi, Yi) = Ȳi − X̄i, i = 1, . . . , M, for each sampled combination and approximate the randomization reference distribution

F(z) = P(s(X, Y) ≤ z) by F̂M(z),

where F̂M(z) is the proportion of s(Xi, Yi) = Ȳi − X̄i values that are ≤ z.

By the law of large numbers (LLN) we have for any z

F̂M(z) −→ F(z) as M → ∞, i.e., F̂M(z) ≈ F(z) for large M.
-
Sample Simulation Program
This can be done in a loop using the sample function in R.
simulated.reference.distribution = function(M = 10000) {
  D.star = numeric(M)               # preallocate instead of growing with c()
  for (i in 1:M) {
    SIR.star = sample(SIR)          # random permutation of the 18 SIR values
    D.star[i] = mean(SIR.star[1:9]) - mean(SIR.star[10:18])
  }
  D.star
}
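A hedged Python analogue of this simulation (function and variable names are my choices; the SIR values are from the data slide; a fixed seed makes the sketch reproducible):

```python
# Simulated randomization reference distribution: shuffle the 18 SIR values
# M times and record mean(first 9) - mean(last 9) each time.
import random

SIR = [8.6, 7.5, 11.5, 10.6, 11.6, 10.3, 10.1, 8.2, 10.0,
       9.3, 11.5, 9.0, 10.7, 9.9, 7.7, 9.7, 8.8, 12.6]

def simulated_reference_distribution(M=10000, seed=421):
    rng = random.Random(seed)        # fixed seed: reproducible sketch
    values = SIR[:]
    d_star = []
    for _ in range(M):
        rng.shuffle(values)          # plays the role of R's sample(SIR)
        d_star.append(sum(values[:9]) / 9 - sum(values[9:]) / 9)
    return d_star

ref = simulated_reference_distribution()
```

Under H0 the shuffled differences center near 0, approximating the full reference distribution.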
The following slide shows the QQ-plot comparison with the full randomization
reference distribution, together with the respective p-values.
This approach should suffice for practical purposes.
-
QQ-Plot of Ȳ − X̄ for Simulated & Full Randomization Reference Distribution
[Figure: QQ-plot of ȳ − x̄ for all 10000 sampled combinations against ȳ − x̄ for all combinations, with p̂1 = 0.0099 and p̂2 = 0.0117 (simulated) versus p1 = 0.01172 and p2 = 0.01172 (full).]
-
Randomization Distribution of the 2-Sample t-Test
t(X, Y) = (Ȳ − X̄) / ( √(1/n + 1/m) · √( [ Σ_{i=1}^{n} (Yi − Ȳ)² + Σ_{j=1}^{m} (Xj − X̄)² ] / (m + n − 2) ) );

it expresses the difference in averages relative to a measure of sample variability.
The randomization reference distribution of the t(X ,Y ) values is in one-to-one
correspondence to the randomization reference distribution of the Ȳ − X̄ values.
Theory =⇒ The randomization reference distribution of t(X ,Y ) is very well
approximated by a t-distribution with 16 = 18−1−1 degrees of freedom.
The test based on t(X ,Y ) and its t-distribution under H0 also shows up
in a normal population based approach to this problem.
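As a quick check of the formula, a small stdlib-Python sketch computes t(x, y) for the flux data (variable names are mine; the slides obtain the same statistic with R):

```python
# Two-sample t-statistic for the flux data, straight from the formula above.
import math

x = [11.5, 11.6, 10.3, 10.1, 9.0, 10.7, 9.9, 9.7, 12.6]  # flux X SIR values
y = [8.6, 7.5, 10.6, 8.2, 10.0, 9.3, 11.5, 7.7, 8.8]     # flux Y SIR values
m, n = len(x), len(y)

xbar, ybar = sum(x) / m, sum(y) / n
ss = sum((v - xbar) ** 2 for v in x) + sum((v - ybar) ** 2 for v in y)
s_pooled = math.sqrt(ss / (m + n - 2))                   # pooled std deviation
t_stat = (ybar - xbar) / (s_pooled * math.sqrt(1 / n + 1 / m))
# about -2.51 on m + n - 2 = 16 degrees of freedom
```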
-
QQ-Plot of t(X, Y) Randomization Reference Distribution
[Figure: QQ-plot of the ordered randomization t-statistics against t16 quantiles.]
-
t-Approximation for t(X, Y) Randomization Reference Distribution
[Figure: histogram of the 2-sample t-statistic randomization reference distribution with the t16 density curve overlaid.]
-
The Randomization Test
We have obtained the full or simulated randomization reference distribution.
Thus any extreme value of |Ȳ − X̄| could either come about due to a rare chance event during our randomization step or due to H0 actually being wrong.
We have to make a decision: Reject H0 or not?
We may decide to reject H0 when |Ȳ − X̄| ≥ C, where C is some critical value.

To determine C one usually sets a significance level α which limits the probability of rejecting H0 when in fact H0 is true (Type I error). The requirement

α = P(reject H0 | H0) = P(|Ȳ − X̄| ≥ C | H0) then determines C = Cα.
-
Significance Levels and p-Values
When we reject H0 we would say that the results were significant at the (previously chosen) level α.
Commonly used values of α are α = .05 or α = .01.
Rejecting at smaller α than these would be even stronger evidence against H0.
Our chance of making a wrong decision (rejecting H0 when true) would be smaller.
For how small an α would we still have rejected?
This leads us to the observed significance level or p-value of the test for the given data, i.e., for the observed discrepancy value |ȳ − x̄|:

p-value = P(|Ȳ − X̄| ≥ |ȳ − x̄| | H0)
-
How to Determine the p-Value
We have stated p-values obtained from the full and the simulated (M = 10000) reference distributions. How are they obtained?

Note the following:

> x = 1:10
> x > 3
 [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> sum(x > 3)
[1] 7
> mean(x > 3)
[1] 0.7

Note that x > 3 produced a logical vector with the same length as x.

The logical values FALSE and TRUE are also interpreted numerically as 0 and 1, respectively, in arithmetic expressions.
-
How to Determine the p-Value (continued)
We view the reference distribution as a vector x of numbers for all the differences of means, Ȳ − X̄, obtained either for all 48620 possible splits or for the M = 10000 simulated splits.

mean(x <= -1.467) and mean(x >= 1.467) would give us the respective p-values p1 = .01172 and p2 = .01172 for the full reference distribution, and p̂1 = .0099 and p̂2 = .0117 for the simulated reference distribution.

The simulated distribution is obviously not quite symmetric.

Rather than adding these 2 p-values to get a 2-sided p-value we can also do this directly via mean(abs(x) >= 1.467) = .02344 for all 48620 splits or mean(abs(x) >= 1.467) = .0216 for the M = 10000 simulated splits.

Here abs(x) gives the vector of absolute values of all components in x.
-
How to Determine the Critical Value C.crit for the Level α Test
For α = .05 we want to find C.crit such that mean(abs(x) >= C.crit) = .05.

Equivalently, find the .95-quantile of abs(x) via C.crit = quantile(abs(x), .95).

From the full reference distribution we get C.crit(α = .05) = 1.288889 and C.crit(α = .01) = 1.644444.

From the simulated reference distribution we get C.crit(α = .05) = 1.311111 and C.crit(α = .01) = 1.666667.
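A Python sketch of the same computation (an order-statistic version of the quantile rather than R's interpolating quantile, so the value can differ by a grid step of the reference distribution; names are mine):

```python
# Critical value for the level-alpha randomization test: a high quantile of
# |Ybar - Xbar| over all C(18, 9) = 48620 splits.
from itertools import combinations

SIR = [8.6, 7.5, 11.5, 10.6, 11.6, 10.3, 10.1, 8.2, 10.0,
       9.3, 11.5, 9.0, 10.7, 9.9, 7.7, 9.7, 8.8, 12.6]
TOTAL = sum(SIR)

abs_diffs = sorted(
    abs(sum(SIR[i] for i in c) / 9 - (TOTAL - sum(SIR[i] for i in c)) / 9)
    for c in combinations(range(18), 9))

def c_crit(alpha):
    # order-statistic version of the (1 - alpha) quantile of |Ybar - Xbar|
    return abs_diffs[int((1 - alpha) * (len(abs_diffs) - 1))]
```

c_crit(0.05) and c_crit(0.01) land close to the 1.289 and 1.644 quoted above.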
-
What Does the t-Distribution Give Us?
What does the observed t-statistic t(x, y) = −2.513 give as a 2-sided p-value?

We find P(|t(X, Y)| ≥ 2.513) = 2 ∗ (1 − pt(2.513, 16)) = .02306, pretty close to the .02344 from the full randomization reference distribution.
What are the critical values tcrit(α) for |t(X ,Y )| for level α = .05, .01 tests?
We find tcrit(α = .05) = qt(.975,16) = 2.1199 and
tcrit(α = .01) = qt(.995,16) = 2.9208, respectively.
With |t(x,y)|= 2.513 we would reject H0 at α = .05 since |t(x,y)| ≥ 2.1199
but not at α = .01 since |t(x,y)|< 2.9208.
-
Hypothesis Testing
We have addressed the question: Does the type of flux affect SIR?
Formally we have tested the
null hypothesis H0: The type of flux does not affect SIR
against the
alternative hypothesis H1: The type of flux does affect SIR.
While H0 seems fairly specific, H1 is open ended. H1 can be anything but H0.
There may be many ways for SIR to be affected by flux differences,
e.g., change in mean, median, or scatter.
Such differences may show up in data Z through an appropriate test statistic s(Z).
Here Z = (X1, . . . ,X9,Y1, . . . ,Y9).
-
Test Criteria or Test Statistics
In the flux analysis we chose to use the absolute difference of sample means,
s(Z) = |Ȳ − X̄ |, as our test criterion or test statistic for testing the null hypothesis.
A test statistic is a value calculated from data and other known entities,
e.g., assumed parameter values.
We could have worked with the absolute difference in sample medians or with the
ratio of sample standard deviations and compared that ratio with 1, etc.
Different test statistics are sensitive to different deviations from the null hypothesis.
A test statistic, when viewed as a function of random input data, is itself a random
variable, and has a distribution, its sampling distribution.
-
Sampling Distributions
For a test statistic s(Z) to be effective in deciding between H0 and H1 it is desirable that the sampling distributions of s(Z) under H0 and H1 are somewhat different.
[Figure: relative-frequency histograms of the sampling distribution of s(Z) under H0 and under H1, both plotted over the range 90 to 120.]
-
When to Reject H0
The previous illustration shows a specific sampling distribution for s(Z) under H1.
Typically H1 consists of many different possible distributional models, leading to many possible sampling distributions under H1.

Under H0 we often have just a single sampling distribution, the null distribution.

If under H1 the test statistic s(Z) tends to have mostly higher values than under H0, we would want to reject H0 when s(Z) is large.

How large is too large? We need a critical value Ccrit and reject H0 when s(Z) ≥ Ccrit.

Choose Ccrit such that P(s(Z) ≥ Ccrit | H0) = α, a pre-chosen significance level. Typically α = .05 or .01. It is the probability of the type I error.

The previous illustration also shows that there may be values s(Z) in the overlap of both distributions. Decisions are not clear cut =⇒ type I or type II error.
-
Decision Table
                        Truth
Decision       H0 is true          H0 is false
accept H0      correct decision    type II error
reject H0      type I error        correct decision
Testing hypotheses (like estimation) is a branch of a more general concept, namely decision theory. Decisions are optimized with respect to penalties for wrong decisions, i.e., P(Type I Error) and P(Type II Error), or the mean squared error of an estimate θ̂ of θ, namely E((θ̂ − θ)²).
-
The Null Distribution and Critical Values
[Figure: the sampling distribution under H0 with critical value = 104.9 at significance level α = 0.05, separating "accept H0" from "reject H0" and shading the type I error; below it, the sampling distribution under H1 with the corresponding type II error shaded.]
-
Critical Values and p-Values
Note that p-value(s(z)) ≤ α is equivalent to rejecting H0 at level α.
[Figure: the sampling distribution under H0 with observed value 107.1, p-value = 0.0097, critical value = 104.9, and significance level α = 0.05; below it, the sampling distribution under H1 with the type II error region.]
-
p-Values and Significance Levels
We just saw that knowing the p-value allows us to accept or reject H0 at level α.
However, the p-value is more informative than saying that we reject at level α.
It is the smallest level α at which we would still have rejected H0.
It is also called the observed significance level.
Working with a predefined α made it possible to choose the best level α test. Best: having the highest probability of rejecting H0 when H1 is true.
This makes for nice mathematical theory, but p-values should be the preferred way
of judging and reporting test results.
-
Randomization Reference Distribution of Ȳ − X̄
[Figure: histogram of the randomization reference distribution of SIRY − SIRX with the critical values Dcrit = −1.289 and Dcrit = 1.289 for α = 0.05, and the observed D = Ȳ − X̄ = −1.467 falling in the rejection region.]
-
The Power Function
The probability of rejecting H0 is denoted by β. It is a function of the distributional model F governing Z, i.e., β = β(F). It is called the power function of the test.

When the hypothesis H0 is composite and s(Z) has more than one possible distribution under H0, one defines the highest probability of type I error as the significance level of the test. Hence α = sup{β(F) : F ∈ H0}.

For various F ∈ H1 the power function gives us the corresponding probabilities of type II error as 1 − β(F).

Montgomery unfortunately uses β = β(F) as the symbol for the probability of type II error. This is not standard.
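A Monte Carlo sketch of a power function for the two-sample t-test with m = n = 9 and α = .05 (the critical value 2.120 is qt(.975, 16) from the earlier t-distribution slide; the normal models with mean shift delta, the simulation sizes, and all names are my choices):

```python
# Estimate beta(F) = P(reject H0 | F) for normal models F with mean shift delta.
import math
import random

def t_stat(x, y):
    m, n = len(x), len(y)
    xbar, ybar = sum(x) / m, sum(y) / n
    ss = sum((v - xbar) ** 2 for v in x) + sum((v - ybar) ** 2 for v in y)
    se = math.sqrt(ss / (m + n - 2)) * math.sqrt(1 / m + 1 / n)
    return (ybar - xbar) / se

def power(delta, nsim=4000, seed=1):
    # rejection rate of the two-sided level-.05 t-test (critical value 2.120)
    # when x ~ N(0, 1) and y ~ N(delta, 1), m = n = 9
    rng = random.Random(seed)
    reject = 0
    for _ in range(nsim):
        x = [rng.gauss(0, 1) for _ in range(9)]
        y = [rng.gauss(delta, 1) for _ in range(9)]
        if abs(t_stat(x, y)) >= 2.120:
            reject += 1
    return reject / nsim

beta_null = power(0.0)   # beta(F) at an F in H0: should be near alpha = .05
beta_alt = power(2.0)    # beta(F) at a shifted alternative: much larger
```

At F in H0 the power equals the significance level; as the shift grows, 1 − β(F), the type II error probability, shrinks.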
-
Samples and Populations
So far we have covered inference based on a randomization test. This relied heavily
on our randomized assignment of flux X and flux Y to the 18 circuit boards.
Such inference can logically only say something about flux differences
in the context of those 18 boards.
To generalize any conclusions to other boards would require some assumptions,
judgement, and ultimately a step of faith.
Namely, assume that these 18 boards and their processing constitute a representative sample from a conceptual population of such processed boards.
For samples to be representative they should be random samples.
-
Conceptual Populations
Clearly the 18 boards happened to be available at the time of the experiment.
They could have been a random sample of all boards available at the time.
However, they also may have been taken sequentially in the order of production.
They certainly could not be a sample from future boards, yet to be produced.
The processing aspects were to some extent made to look like a random sample by the various randomization steps.

Thus we could regard the 9+9 SIR values as two random samples from two very large or infinite conceptual populations of SIR values.

2 populations: all potential boards/processes with flux X, or all the same boards/processes with flux Y. Can't have it both ways ⇒ further conceptualization.
-
Population Distributions and Densities
Such infinite populations of Z-values are conveniently described by densities f(z), with the properties f(z) ≥ 0 and ∫_{−∞}^{∞} f(z) dz = 1.

The probability of observing a randomly chosen element Z that is ≤ some specified value x is then given by

F(x) = P(Z ≤ x) = ∫_{−∞}^{x} f(z) dz = ∫_{−∞}^{x} f(t) dt    (z and t are just dummy variables)
F(x) as a function of x is also called the cumulative distribution function (CDF)of the random variable Z.
F(x)↗ from 0 to 1 as x goes from −∞ to ∞.
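A numeric sketch of F(x) as the integral of a density, using the standard normal density as the example f(z) (the trapezoid rule, the cutoff standing in for −∞, and all names are illustrative choices):

```python
# F(x) = integral of f(z) dz from -infinity to x, approximated numerically.
import math

def f(z):
    # standard normal density, playing the role of a generic density f(z) >= 0
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def F(x, lo=-10.0, steps=20000):
    # trapezoid approximation of the integral of f from lo (standing in
    # for -infinity) up to x
    h = (x - lo) / steps
    area = 0.5 * (f(lo) + f(x))
    for i in range(1, steps):
        area += f(lo + i * h)
    return area * h
```

F(x) indeed increases from about 0 to about 1 as x grows, with F(0) = 0.5 by symmetry of this f.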
-
Means, Expectations and Variances
The mean or expectation of Z or its population is defined by

µ = µZ = E(Z) = ∫_{−∞}^{∞} z f(z) dz ≈ Σ_z z f(z) ∆(z) = Σ_z z p(z),

a probability weighted average of z values. It is the center of probability mass balance.

By extension the mean or expectation of g(Z) is defined by

E(g(Z)) = ∫_{−∞}^{∞} g(z) f(z) dz

The variance of Z is defined by

σ² = var(Z) = E((Z − µ)²) = ∫_{−∞}^{∞} (z − µ)² f(z) dz

σ = σZ = √var(Z) is called the standard deviation of Z or its population. It is a measure of distribution spread.
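These definitions can be checked numerically for a density whose mean and variance are known; a sketch for the uniform density f(z) = 1 on (0, 1), where µ = 1/2 and σ² = 1/12 (grid size and names are mine):

```python
# Mean and variance of the uniform(0, 1) density via probability-weighted sums,
# i.e., midpoint-rule approximations of the defining integrals.
steps = 100000
h = 1.0 / steps
grid = [(i + 0.5) * h for i in range(steps)]      # midpoints in (0, 1)
fz = 1.0                                          # uniform density on (0, 1)

mu = sum(z * fz * h for z in grid)                # ~ integral of z f(z) dz
var = sum((z - mu) ** 2 * fz * h for z in grid)   # ~ integral of (z-mu)^2 f(z) dz
sigma = var ** 0.5
# theory: mu = 1/2, var = 1/12, sigma ~ 0.2887
```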
-
Multivariate Densities or Populations
f(z1, . . . , zn) is a multivariate density if it has the following properties:

f(z1, . . . , zn) ≥ 0 for all z1, . . . , zn and ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} f(z1, . . . , zn) dz1 · · · dzn = 1.

It describes the behavior of the infinite population of such n-tuples (z1, . . . , zn).

A random element (Z1, . . . , Zn) drawn from such a population is a random vector.

We say that Z1, . . . , Zn in such a random vector are (statistically) independent when the following property holds:

f(z1, . . . , zn) = f1(z1) × · · · × fn(zn)

Here fi(zi) is the marginal density of Zi. It is obtainable from the multivariate density by integrating out all other variables, e.g.,

f2(z2) = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} f(z1, z2, z3, . . . , zn) dz1 dz3 · · · dzn.
-
Random Sample
When drawing repeatedly values Z1, . . . , Zn from a common infinite population with density f(z) we get a multivariate random vector (Z1, . . . , Zn).

If the drawings are physically unrelated or “independent,” we may consider Z1, . . . , Zn as statistically independent, i.e., the random vector has density

h(z1, . . . , zn) = f(z1) × · · · × f(zn).

Z1, . . . , Zn is then also referred to as a random sample.

We also express this as Z1, . . . , Zn ∼ f, i.i.d.

Here i.i.d. = independent and identically distributed.
-
Rules of Expectations & Variances (Review)
For any set of random variables X1, . . . , Xn and constants a0, a1, . . . , an we have

E(a0 + a1 X1 + · · · + an Xn) = a0 + a1 E(X1) + · · · + an E(Xn)

provided the expectations E(X1), . . . , E(Xn) exist and are finite.

This holds whether X1, . . . , Xn are independent or not.

For any set of independent random variables X1, . . . , Xn and constants a0, a1, . . . , an we have

var(a0 + a1 X1 + · · · + an Xn) = a1² var(X1) + · · · + an² var(Xn)

provided the variances var(X1), . . . , var(Xn) exist and are finite. Note var(a0) = 0.

This is also true under the weaker (than independence) condition cov(Xi, Xj) = E(Xi Xj) − E(Xi)E(Xj) = 0 for i ≠ j. In that case X1, . . . , Xn are uncorrelated.
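A small simulation sketch of the variance rule for independent inputs (the uniform populations, constants, sample size, seed, and all names are my choices):

```python
# Check: var(a0 + a1*X1 + a2*X2) = a1^2 var(X1) + a2^2 var(X2)
# for independent X1, X2 ~ uniform(0, 1), each with variance 1/12.
import random

rng = random.Random(7)
N = 200000
x1 = [rng.uniform(0, 1) for _ in range(N)]    # var(X1) = 1/12
x2 = [rng.uniform(0, 1) for _ in range(N)]    # var(X2) = 1/12, independent

a0, a1, a2 = 3.0, 2.0, -4.0
w = [a0 + a1 * u + a2 * v for u, v in zip(x1, x2)]

def sample_var(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / (len(vals) - 1)

# theory: var(w) = 4/12 + 16/12 = 5/3; the constant a0 contributes nothing,
# while E(w) = a0 + a1/2 + a2/2 = 2 by the expectation rule
```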
-
Rules for Averages
E(X̄) = E( (1/n) Σ_{i=1}^{n} Xi ) = (1/n) Σ_{i=1}^{n} E(Xi) = (1/n) Σ_{i=1}^{n} µi = µ̄,

whether X1, . . . , Xn are independent or not.

If µ1 = . . . = µn = µ then E(X̄) = µ.

If X1, . . . , Xn are independent we also have

var(X̄) = var( (1/n) Σ_{i=1}^{n} Xi ) = (1/n²) Σ_{i=1}^{n} var(Xi) = (1/n²) Σ_{i=1}^{n} σi² = σ̄²/n ↘ 0 as n → ∞,

where σ̄² = (1/n) Σ_{i=1}^{n} σi². Note σ̄² = σ² when σ1² = . . . = σn² = σ².
-
A Normal Random Sample
X1, . . . , Xn is called a normal random sample when the common density of the Xi is a normal density of the following form:

f(x) = (1 / (√(2π) σ)) exp( −(x − µ)² / (2σ²) )

This density or population has mean µ and standard deviation σ.

When µ = 0 and σ = 1 one calls it the standard normal density

ϕ(x) = (1 / √(2π)) exp( −x² / 2 )   with CDF   Φ(x) = ∫_{−∞}^{x} ϕ(z) dz.

If X ∼ N(µ, σ²) then (X − µ)/σ ∼ N(0, 1).

⇒ P(X ≤ x) = P( (X − µ)/σ ≤ (x − µ)/σ ) = Φ( (x − µ)/σ ).
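A stdlib-Python sketch of Φ and the standardization step (Python's math.erf gives Φ(z) = (1 + erf(z/√2))/2; the example numbers µ = 10, σ = 2 are my choices):

```python
# Standard normal CDF and the standardization P(X <= x) = Phi((x - mu)/sigma).
import math

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    # P(X <= x) for X ~ N(mu, sigma^2), by standardizing
    return Phi((x - mu) / sigma)

# e.g. for X ~ N(10, 4): P(X <= 10) = Phi(0) = 0.5,
# and P(X <= 13.92) = Phi(1.96), which is close to .975
```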
-
The CLT & the Normal Population Model
The normal population model is motivated by the Central Limit Theorem (CLT).
This comes about because many physical or natural measured phenomena can be viewed as the addition of several independent source inputs or factors:

Y = X1 + . . . + Xk   or   Y = a0 + a1 X1 + . . . + ak Xk

for constants a0, a1, . . . , ak.

More generally, but also only approximately, extend this via a 1-term Taylor expansion

Y = g(X1, . . . , Xk) ≈ g(µ1, . . . , µk) + Σ_{i=1}^{k} (Xi − µi) ∂g(µ1, . . . , µk)/∂µi = a0 + a1 X1 + . . . + ak Xk,

provided the linearization provides a good approximation to g.
-
Central Limit Theorem (CLT) I
• Suppose we randomly and independently draw random variables X1, . . . , Xn from n possibly different populations with respective means µ1, . . . , µn and standard deviations σ1, . . . , σn.

• Suppose further that

max_i ( σi² / (σ1² + . . . + σn²) ) → 0 as n → ∞,

i.e., none of the variances dominates among all the variances.

• Then Yn = X1 + . . . + Xn has an approximate normal distribution with mean and variance given by

µY = µ1 + . . . + µn   and   σY² = σ1² + . . . + σn².
-
Central Limit Theorem (CLT) II
[Figure: density histograms of four input populations: a standard normal (x1), a uniform on (0,1) (x2), a log-normal (x3), and a Weibull population (x4).]
-
Central Limit Theorem (CLT) III
[Figure: "Central Limit Theorem at Work" — the density of the sum x1 + x2 + x3 + x4 of one draw from each of the four populations.]
-
Central Limit Theorem (CLT) IV
[Figure: density histograms of five input populations: standard normal (x1), uniform on (0,1) (x2), log-normal (x3), and two Weibull populations (x4, x5).]
-
Central Limit Theorem (CLT) V
[Figure: "Central Limit Theorem at Work" — densities of x1 + x2 + x3 + x4 and of x2 + x3 + x4 + x5.]
-
Central Limit Theorem (CLT) VI
[Figure: density histograms of four input populations: standard normal (x1), uniform on (0,1) (x2), a log-normal with a long right tail reaching out to about 15 (x3), and a Weibull population (x4).]
-
Central Limit Theorem (CLT) VII
[Figure: "Central Limit Theorem at Work (not so good)" — the density of x1 + x2 + x3 + x4, spread out to about 40, is noticeably skewed.]
-
Central Limit Theorem (CLT) VIII
[Figure: density histograms of four input populations: standard normal (x1), a uniform population plotted on a scale from 0 to 20 (x2), a log-normal (x3), and a Weibull population (x4).]
-
Central Limit Theorem (CLT) IX

Central Limit Theorem at Work (not so good)

[Figure: density histogram of the sums x1 + x2 + x3 + x4, with values from −20 out to 40.]
-
Derived Distributions from the Normal Model

Since the normal model will be our assumed model throughout, it is worthwhile to characterize some distributions that are derived from it. They will play a significant role later on:
the chi-square distribution, the Student t-distribution, and the F-distribution.

These distributions come about as sampling distributions of certain test statistics based on normal random samples.
-
Properties of Normal Random Variables

Assume that X1, ..., Xn are independent normal random variables with respective means µ1, ..., µn and variances σ1², ..., σn². Then

Y = X1 + ... + Xn ∼ N(µ1 + ... + µn, σ1² + ... + σn²)

If X ∼ N(µ, σ²) then

(X − µ)/σ ∼ N(0, 1)

or more generally, for constants a and b,

a + bX ∼ N(a + bµ, b²σ²)

Caution: Some people write X ∼ N(µ, σ) when I would write X ∼ N(µ, σ²).
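These closure properties can be checked numerically. Below is a minimal simulation sketch; the means, variances, and the constants a = 5, b = 2 are arbitrary choices, not from the slides:

```r
set.seed(1)
N <- 100000
x <- rnorm(N, mean = 1, sd = 2)   # X1 ~ N(1, 4)
y <- rnorm(N, mean = 2, sd = 3)   # X2 ~ N(2, 9), independent of X1
s <- x + y                        # should be ~ N(3, 13)
z <- (x - 1) / 2                  # standardization: should be ~ N(0, 1)
w <- 5 + 2 * x                    # a + bX with a = 5, b = 2: ~ N(7, 16)
c(mean(s), var(s))                # close to 3 and 13
c(mean(w), var(w))                # close to 7 and 16
```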
-
The Chi-Square Distribution

When Z1, ..., Z_f i.i.d. ∼ N(0,1) we say that

C_f = ∑_{i=1}^{f} Zi²   (memorize this definition!)

has a chi-square distribution with f degrees of freedom; we also write C_f ∼ χ²_f.

It has mean f and variance 2f, worth memorizing.

Density, CDF, quantiles, and random samples of or from the chi-square distribution can be obtained in R via dchisq(x,f), pchisq(x,f), qchisq(p,f), and rchisq(N,f), respectively.

If C_f1 ∼ χ²_f1 and C_f2 ∼ χ²_f2 are independent, then C_f1 + C_f2 ∼ χ²_{f1+f2}. Why? Think definition!
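A quick simulation check of these facts; the degrees of freedom f1 = 3 and f2 = 5 are arbitrary illustrative choices:

```r
set.seed(1)
N  <- 100000
f1 <- 3; f2 <- 5
c1 <- rchisq(N, f1)      # C_f1
c2 <- rchisq(N, f2)      # C_f2, independent of C_f1
c(mean(c1), var(c1))     # close to f1 = 3 and 2*f1 = 6
## C_f1 + C_f2 should be chi-square with f1 + f2 = 8 df:
## compare a simulated tail probability with pchisq.
c(mean(c1 + c2 > 10), 1 - pchisq(10, f1 + f2))
```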
-
χ² Densities

[Figure: χ² densities for df = 1, 2, 5, 10, 20.]
-
The Student t-Distribution

When Z ∼ N(0,1) is independent of C_f ∼ χ²_f we say that

t = Z / √(C_f / f)   (memorize this definition!)

has a Student t-distribution with f degrees of freedom. We also write t ∼ t_f.

It has mean 0 (for f > 1) and variance f/(f − 2) if f > 2.

For large f (say f ≥ 30) the t-distribution is approximately standard normal.

Density, CDF, quantiles, and random samples of or from the Student t-distribution can be obtained in R via dt(x,f), pt(x,f), qt(p,f), and rt(N,f), respectively.
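The definition and the moment claims can be checked directly by simulation; this sketch uses the arbitrary choice f = 5, so the variance should be 5/3:

```r
set.seed(1)
N  <- 100000
f  <- 5
z  <- rnorm(N)                # Z ~ N(0,1)
cf <- rchisq(N, f)            # C_f ~ chi-square_f, independent of Z
t  <- z / sqrt(cf / f)        # t from the definition above
c(mean(t), var(t))            # close to 0 and f/(f-2) = 5/3
## for large f the quantiles approach the standard normal quantiles:
c(qt(0.975, 30), qnorm(0.975))
```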
-
Densities of the Student t-Distribution

[Figure: Student t densities for df = 1, 2, 5, 10, 20, 30, ∞.]
-
The Noncentral Student t-Distribution

When X ∼ N(δ, 1) is independent of C_f ∼ χ²_f we say that

t = X / √(C_f / f)   (memorize this definition!)

has a noncentral Student t-distribution with f degrees of freedom and noncentrality parameter ncp = δ. We also write t ∼ t_{f,δ}.

Density and CDF of the noncentral Student t-distribution can be obtained in R via dt(x,f,ncp) and pt(x,f,ncp), respectively.

The corresponding quantile function qnct(p,f,ncp) can be downloaded from my web site for use in R.

Random samples from t_{f,δ}: (rnorm(N)+ncp)/sqrt(rchisq(N,f)/f)
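A sketch checking that the sampling recipe above agrees with pt and its ncp argument; f = 6 and ncp = 2 are arbitrary illustrative values:

```r
set.seed(1)
N   <- 100000
f   <- 6
ncp <- 2
## random samples from t_{f,delta} via the recipe on this slide:
tnc <- (rnorm(N) + ncp) / sqrt(rchisq(N, f) / f)
## simulated CDF versus the exact noncentral CDF at two points:
c(mean(tnc <= 2), pt(2, f, ncp))
c(mean(tnc <= 4), pt(4, f, ncp))
```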
-
Densities of the Noncentral Student t-Distribution

[Figure: noncentral t densities for df = 6 and ncp = 0, 1, 2, 4.]

These densities march to the left for negative ncp.
-
The F-Distribution

When C_f1 ∼ χ²_f1 and C_f2 ∼ χ²_f2 are independent χ² random variables with f1 and f2 degrees of freedom, respectively, we say that

F = (C_f1 / f1) / (C_f2 / f2)   (memorize this definition!)

has an F-distribution with f1 and f2 degrees of freedom. We also write F ∼ F_{f1,f2}.

Density, CDF, quantiles, and random samples of or from the F_{f1,f2}-distribution can be obtained in R via df(x,f1,f2), pf(x,f1,f2), qf(p,f1,f2), and rf(N,f1,f2), respectively.

If t ∼ t_f then t² ∼ F_{1,f}. Why? Because t² = Z²/(C_f / f) = (C_1/1)/(C_f / f), with the required independence of C_1 and C_f.

Also 1/F ∼ F_{f2,f1}. Just look at the above definition!
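Both relations can be verified with R's quantile functions; the degrees of freedom below are arbitrary illustrative choices:

```r
## t^2 ~ F_{1,f}: the squared two-sided t critical value is the F critical value.
c(qt(0.975, 10)^2, qf(0.95, 1, 10))    # identical
## 1/F ~ F_{f2,f1}: a lower quantile of F_{5,8} is the reciprocal
## of the corresponding upper quantile of F_{8,5}.
c(1 / qf(0.05, 5, 8), qf(0.95, 8, 5))  # identical
```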
-
F Densities

[Figure: F densities for (df1, df2) = (1,3), (2,5), (5,5), (10,20), (20,20), (50,100).]
-
Decomposition of Sum of Squares (SS)

We illustrate here an early example of the SS decomposition. With X̄ = ∑_{i=1}^{n} Xi / n,

∑_{i=1}^{n} Xi² = ∑_{i=1}^{n} (Xi − X̄ + X̄)²
               = ∑_{i=1}^{n} (Xi − X̄)² + 2 ∑_{i=1}^{n} (Xi − X̄) X̄ + n X̄²
               = ∑_{i=1}^{n} (Xi − X̄)² + n X̄² .

We used the fact that ∑(Xi − X̄) = ∑Xi − nX̄ = ∑Xi − ∑Xi = 0, i.e., the residuals sum to zero.

Such decompositions are a recurring theme in the Analysis of Variance (ANOVA).
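The identity is purely algebraic and holds for any data vector; a quick numerical check (the exponential sample is an arbitrary choice, the data need not be normal):

```r
set.seed(1)
x   <- rexp(25)                          # arbitrary data vector
n   <- length(x)
lhs <- sum(x^2)
rhs <- sum((x - mean(x))^2) + n * mean(x)^2
all.equal(lhs, rhs)                      # TRUE, up to floating point
sum(x - mean(x))                         # residuals sum to (numerically) zero
```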
-
Distribution of X̄ and ∑(Xi − X̄)²

Assume that (X1, ..., Xn) i.i.d. ∼ N(µ, σ²). Then X̄ ∼ N(µ, σ²/n) and

∑_{i=1}^{n} (Xi − X̄)² has the same distribution as σ²C_{n−1}, where C_{n−1} ∼ χ²_{n−1} .

We also express this with the symbol ∼ as

∑_{i=1}^{n} (Xi − X̄)² ∼ σ²C_{n−1}   or   ∑_{i=1}^{n} (Xi − X̄)² / σ² ∼ C_{n−1} .

Further, ∑_{i=1}^{n} (Xi − X̄)² and X̄ are statistically independent, in spite of the fact that X̄ appears in both expressions.
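A simulation sketch illustrating both claims for n = 20 (the values µ = 5 and σ = 2 are arbitrary): the correlation between X̄ and the SS is near zero, and SS/σ² has the chi-square mean n − 1 = 19.

```r
set.seed(1)
n   <- 20
sim <- replicate(20000, { x <- rnorm(n, mean = 5, sd = 2)
                          c(mean(x), sum((x - mean(x))^2) / 2^2) })
cor(sim[1, ], sim[2, ])   # near 0, consistent with independence
mean(sim[2, ])            # near n - 1 = 19, the chi-square mean
```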
-
One-Sample t-Test

Assume that X = (X1, ..., Xn) i.i.d. ∼ N(µ, σ²).

We want to test the hypothesis H0: µ = µ0 against the alternative H1: µ ≠ µ0. σ is left unspecified and is unknown.

X̄ is a good indicator for µ since its mean is µ and its variance is σ²(X̄) = σ²/n.

Thus a reasonable test statistic may be X̄ − µ0 ∼ N(µ − µ0, σ²/n) = N(0, σ²/n) when H0 is true. Unfortunately we do not know σ.

√n (X̄ − µ0)/σ = (X̄ − µ0)/(σ/√n) ∼ N(0, 1) suggests replacing the unknown σ by a suitable estimate to get a single reference distribution under H0.

From the previous slide: s² = ∑_{i=1}^{n} (Xi − X̄)² / (n − 1) ∼ σ²C_{n−1}/(n − 1) is independent of X̄. Note E(s²) = σ², i.e., s² is an unbiased estimate of σ².
-
One-Sample t-Statistic

Replacing σ by s in the standardization √n (X̄ − µ0)/σ gives the one-sample t-statistic

t(X) = √n (X̄ − µ0)/s = [√n (X̄ − µ0)/σ] / √(s²/σ²) = [√n (X̄ − µ0)/σ] / √(C_{n−1}/(n − 1)) = Z / √(C_{n−1}/(n − 1)) ∼ t_{n−1}

since under H0 we have that Z = √n (X̄ − µ0)/σ ∼ N(0, 1) and C_{n−1} ∼ χ²_{n−1} are independent of each other. We thus satisfy the definition of the t-distribution.

Hence we can use t(X) in conjunction with the known reference distribution t_{n−1} under H0 and reject H0 for large values of |t(X)|.

The two-sided level α test has critical value tcrit = t_{n−1,1−α/2} = qt(1−α/2,n−1), and we reject H0 when |t(X)| ≥ tcrit.

The two-sided p-value for the observed t-statistic tobs(x) is P(|t_{n−1}| ≥ |tobs(x)|) = 2 P(t_{n−1} ≤ −|tobs(x)|) = 2*pt(-abs(tobs),n-1).
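The pieces above can be assembled by hand and checked against R's t.test; the simulated data and the choice µ0 = 0 are arbitrary:

```r
set.seed(1)
x    <- rnorm(20) + 0.4
mu0  <- 0
n    <- length(x)
tobs <- sqrt(n) * (mean(x) - mu0) / sd(x)   # one-sample t-statistic t(X)
pval <- 2 * pt(-abs(tobs), n - 1)           # two-sided p-value
out  <- t.test(x, mu = mu0)
c(tobs, unname(out$statistic))              # agree
c(pval, out$p.value)                        # agree
```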
-
The t.test in R
R has a function, t.test, that performs 1- and 2-sample t-tests.
See ?t.test for documentation. We focus here on the 1-sample test.
> t.test(rnorm(20)+.4)
One Sample t-test
data: rnorm(20) + 0.4
t = 2.2076, df = 19, p-value = 0.03976
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
0.02248992 0.84390488
sample estimates:
mean of x
0.4331974
-
Calculation of the Power Function of the Two-Sided t-Test

The power function of this two-sided t-test is given by

β(µ, σ) = P(|t| ≥ tcrit) = P(t ≤ −tcrit) + P(t ≥ tcrit) = P(t ≤ −tcrit) + 1 − P(t < tcrit)

with

t = t(X) = √n (X̄ − µ0)/s = [√n (X̄ − µ + (µ − µ0))/σ] / (s/σ) = [√n (X̄ − µ)/σ + √n (µ − µ0)/σ] / (s/σ) = (Z + δ) / √(C_{n−1}/(n − 1)) ∼ t_{n−1,δ} ,

the noncentral t-distribution with noncentrality parameter δ = √n (µ − µ0)/σ.

Thus the power function depends on µ and σ only through δ, and we write

β(δ) = P(t_{n−1,δ} ≤ −tcrit) + 1 − P(t_{n−1,δ} < tcrit)
     = pt(-tcrit,n-1,δ) + 1 - pt(tcrit,n-1,δ) ,  increasing as |δ| increases.
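The last display translates directly into an R function. A sketch (the function name power.t2 is ours, not from the course):

```r
## beta(delta) for the two-sided one-sample t-test, as in the display above
power.t2 <- function(delta, n, alpha = 0.05) {
  tcrit <- qt(1 - alpha / 2, n - 1)
  pt(-tcrit, n - 1, ncp = delta) + 1 - pt(tcrit, n - 1, ncp = delta)
}
power.t2(0, 10)          # equals alpha = 0.05 when delta = 0
power.t2(2.5, 10)        # power at delta = 2.5 for n = 10
power.t2(c(-3, 3), 10)   # symmetric in delta
```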
-
Power Function of Two-Sided t-Test

[Figure: power function β(δ) plotted against δ = √n (µ − µ0)/σ over −4 ≤ δ ≤ 4, for sample size n = 10 and levels α = 0.05 and α = 0.01.]
-
How to Use the Power Function

From the previous plot we can read off, for the level α = .05 test,

β(δ) ≈ .6 for δ = ±√n (µ0 − µ)/σ ≈ ±2.5, or |µ0 − µ| ≈ 2.5 σ/√n.

The smaller the natural variability σ, the smaller the difference |µ0 − µ| we can detect with probability .6.

Similarly, the larger the sample size n, the smaller the difference |µ0 − µ| we can detect with probability .6; note however the effect of √n.

Both of these conclusions are intuitive because σ(X̄) = σ/√n.

Given a required detection difference |µ − µ0| and with some upper bound knowledge σu ≥ σ, we can plan the appropriate minimum sample size n to achieve the desired power .6: 2.5 × σ/|µ − µ0| ≤ 2.5 × σu/|µ − µ0| = √n.

For power ≠ .6 replace 2.5 by the appropriate value from the previous plot.
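The sample-size recipe as a short computation; the detection difference 0.5 and the upper bound σu = 1 are hypothetical values chosen for illustration:

```r
diff.req <- 0.5                       # required detection difference |mu - mu0|
sigma.u  <- 1                         # assumed upper bound on sigma
sqrt.n   <- 2.5 * sigma.u / diff.req  # from the inequality above (power .6)
n.min    <- ceiling(sqrt.n^2)         # round up to a whole sample size
n.min                                 # 25
```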
-
Where is the Flaw in the Previous Argument?

We tacitly assumed that the power curve plot would not change with n.

Both tcrit = qt(1−α/2,n−1) and P(t_{n−1,δ} ≤ ±tcrit) depend on n. See the next 3 plots.

Thus it does not suffice to consider the n in δ alone.

However, typically the sample size requirements will ask for large values of n. In that case the power functions stay more or less stable. Compare n = 100 and n = 1000.

We will provide a function that gets us out of this dilemma.
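One way such a function could look: search for the smallest n whose exact power, recomputing tcrit and the noncentral t-probabilities at each n, reaches the target. This is our own sketch, not the function from the course website:

```r
n.for.power <- function(diff, sigma, alpha = 0.05, power = 0.6, nmax = 10000) {
  for (n in 2:nmax) {
    tcrit <- qt(1 - alpha / 2, n - 1)      # critical value depends on n
    delta <- sqrt(n) * diff / sigma        # noncentrality also depends on n
    beta  <- pt(-tcrit, n - 1, ncp = delta) + 1 - pt(tcrit, n - 1, ncp = delta)
    if (beta >= power) return(n)           # smallest n meeting the target
  }
  NA                                       # target not reachable within nmax
}
n.for.power(diff = 0.5, sigma = 1)         # exact minimum n for these inputs
```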