Post on 04-Mar-2020
Basic Statistics: MBI-540
Sally W. Thurston, PhD
Associate ProfessorDepartment of Biostatistics and Computational Biology
University of Rochester
January 2011
Sally W. Thurston, PhD Basic Statistics: MBI-540
Why statistics?
Allows us to say something (“make inference”) about apopulation based on data from a sample.Example: measure effect of drug on weight loss in 20BALB/c mice
Sample: the 20 micePopulation: all BALB/c mice (that could be subject to thesame condition)Interested in weight loss in BALB/c mice, not just in the 20mice.Statistics allows us to estimate weight loss in the populationusing data from a sample.
Many journal articles ask for statistics
Sally W. Thurston, PhD Basic Statistics: MBI-540
Some comments
Design of experiment is keyNo statistical method can fix bad data!Essential to think about design before data is collected
Be clear about research question(s) before collecting data.Includes: what data to collect to answer question(s).
Know your dataHow the measurements relate to research questionPlots to visualize your data: helpful!
Most statistical models have assumptionsIf violated, statistical “answer” not correct.
Sally W. Thurston, PhD Basic Statistics: MBI-540
How can a statistician help?
Design of experimentTo help, statistician must understand aims, background
AnalysisStatistician can figure out appropriate model, checkassumptions, fit model to data
Research?Talk with statistician early: help with design to prevent biasIf small project, consultation possibleIf detailed project, statistician should be collaborator, notconsultant
Sally W. Thurston, PhD Basic Statistics: MBI-540
Outline
1 Design of experiments: examples
2 Mean, standard deviation, standard error
3 M&M data
4 t-test
5 2-sample t-test
6 Importance of plotting the data
7 1-way ANOVA
8 Summary remarks
Sally W. Thurston, PhD Basic Statistics: MBI-540
Dr. Tuba’s question
Dr. Tuba is interested in the effect of drugs on weight loss inmice after infection.
Drugs are given right after infection6 mice were given the standard drug6 mice were given the new drugDr. Tuba reports p < 0.05 for a t-test of the difference inmean weight loss across the two groups after day 5.
Should Dr. Tuba publish?
Your thoughts?What additional information might you need beforedeciding?
Sally W. Thurston, PhD Basic Statistics: MBI-540
Dr. Tuba’s question
Dr. Tuba is interested in the effect of drugs on weight loss inmice after infection.
Drugs are given right after infection6 mice were given the standard drug6 mice were given the new drugDr. Tuba reports p < 0.05 for a t-test of the difference inmean weight loss across the two groups after day 5.
Should Dr. Tuba publish?
Your thoughts?What additional information might you need beforedeciding?
Sally W. Thurston, PhD Basic Statistics: MBI-540
Dr. Tuba’s question
Dr. Tuba is interested in the effect of drugs on weight loss inmice after infection.
Drugs are given right after infection6 mice were given the standard drug6 mice were given the new drugDr. Tuba reports p < 0.05 for a t-test of the difference inmean weight loss across the two groups after day 5.
Should Dr. Tuba publish?
Your thoughts?What additional information might you need beforedeciding?
Sally W. Thurston, PhD Basic Statistics: MBI-540
Some things to consider
How did Dr. Tuba choose which mice received each drug?Are the mice independent?Plot the data - why?Did he choose to present results only from the time periodthat showed the most significant effect?
Sally W. Thurston, PhD Basic Statistics: MBI-540
How did Dr. Tuba choose the mice?
Scenario AThe 12 mice were running around in a pen.Dr. Tuba captured the mice one at a time.The first 6 mice captured were given the standard drug.
Problems?Ease of capture might be associated with propensity tolose weight.
Fastest mice might be the most healthyFastest mice might be the lightest
So - mouse selection could be the cause of an apparentdrug effect.
Important to randomize treatment assignment.
Sally W. Thurston, PhD Basic Statistics: MBI-540
How did Dr. Tuba choose the mice?
Scenario AThe 12 mice were running around in a pen.Dr. Tuba captured the mice one at a time.The first 6 mice captured were given the standard drug.
Problems?
Ease of capture might be associated with propensity tolose weight.
Fastest mice might be the most healthyFastest mice might be the lightest
So - mouse selection could be the cause of an apparentdrug effect.
Important to randomize treatment assignment.
Sally W. Thurston, PhD Basic Statistics: MBI-540
How did Dr. Tuba choose the mice?
Scenario AThe 12 mice were running around in a pen.Dr. Tuba captured the mice one at a time.The first 6 mice captured were given the standard drug.
Problems?Ease of capture might be associated with propensity tolose weight.
Fastest mice might be the most healthyFastest mice might be the lightest
So - mouse selection could be the cause of an apparentdrug effect.
Important to randomize treatment assignment.
Sally W. Thurston, PhD Basic Statistics: MBI-540
How did Dr. Tuba choose the mice?
Scenario AThe 12 mice were running around in a pen.Dr. Tuba captured the mice one at a time.The first 6 mice captured were given the standard drug.
Problems?Ease of capture might be associated with propensity tolose weight.
Fastest mice might be the most healthyFastest mice might be the lightest
So - mouse selection could be the cause of an apparentdrug effect.
Important to randomize treatment assignment.
Sally W. Thurston, PhD Basic Statistics: MBI-540
How did Dr. Tuba choose the mice?
Scenario BThe mice came from 2 dams, treatment was randomized.Pups from the first dam were given the standard drug.Pups from the second dam were given the new drug.
Problems?The effect of drug is totally confounded with litterTranslation: If we see a difference in outcome between thetwo groups, we can’t tell if it was
caused by the drug - ORdue to a litter effect
Mice in one litter might have a different propensity to loseweight than mice in another litter.
Sally W. Thurston, PhD Basic Statistics: MBI-540
How did Dr. Tuba choose the mice?
Scenario BThe mice came from 2 dams, treatment was randomized.Pups from the first dam were given the standard drug.Pups from the second dam were given the new drug.
Problems?
The effect of drug is totally confounded with litterTranslation: If we see a difference in outcome between thetwo groups, we can’t tell if it was
caused by the drug - ORdue to a litter effect
Mice in one litter might have a different propensity to loseweight than mice in another litter.
Sally W. Thurston, PhD Basic Statistics: MBI-540
How did Dr. Tuba choose the mice?
Scenario BThe mice came from 2 dams, treatment was randomized.Pups from the first dam were given the standard drug.Pups from the second dam were given the new drug.
Problems?The effect of drug is totally confounded with litterTranslation: If we see a difference in outcome between thetwo groups, we can’t tell if it was
caused by the drug - ORdue to a litter effect
Mice in one litter might have a different propensity to loseweight than mice in another litter.
Sally W. Thurston, PhD Basic Statistics: MBI-540
How did Dr. Tuba choose the mice?
Scenario CThe mice came from 6 dams.The first pup in each dam was given the standard drug.The second pup in each dam was given the new drug.
Problems?Birth order may be associated with the propensity to loseweight.Birth order is confounded with drug.
Better idea: within a litter, randomize first/second born pups todrug.
Sally W. Thurston, PhD Basic Statistics: MBI-540
How did Dr. Tuba choose the mice?
Scenario CThe mice came from 6 dams.The first pup in each dam was given the standard drug.The second pup in each dam was given the new drug.
Problems?
Birth order may be associated with the propensity to loseweight.Birth order is confounded with drug.
Better idea: within a litter, randomize first/second born pups todrug.
Sally W. Thurston, PhD Basic Statistics: MBI-540
How did Dr. Tuba choose the mice?
Scenario CThe mice came from 6 dams.The first pup in each dam was given the standard drug.The second pup in each dam was given the new drug.
Problems?Birth order may be associated with the propensity to loseweight.Birth order is confounded with drug.
Better idea: within a litter, randomize first/second born pups todrug.
Sally W. Thurston, PhD Basic Statistics: MBI-540
How did Dr. Tuba choose the mice?
Scenario CThe mice came from 6 dams.The first pup in each dam was given the standard drug.The second pup in each dam was given the new drug.
Problems?Birth order may be associated with the propensity to loseweight.Birth order is confounded with drug.
Better idea: within a litter, randomize first/second born pups todrug.
Sally W. Thurston, PhD Basic Statistics: MBI-540
How did Dr. Tuba choose the mice?
Scenario DMice were all first-born pups from different dams.Mice were randomized to treatmentAll mice which received the standard drug were housed inone cage.All mice which received the new drug were housed in one(different) cage.
Problems?There might be a cage effect - viral infection, hotterlocation, less water, · · · .
Better idea: divide animals from a drug group into > 1 cage - orif necessary, house all in same cage.
Sally W. Thurston, PhD Basic Statistics: MBI-540
How did Dr. Tuba choose the mice?
Scenario DMice were all first-born pups from different dams.Mice were randomized to treatmentAll mice which received the standard drug were housed inone cage.All mice which received the new drug were housed in one(different) cage.
Problems?
There might be a cage effect - viral infection, hotterlocation, less water, · · · .
Better idea: divide animals from a drug group into > 1 cage - orif necessary, house all in same cage.
Sally W. Thurston, PhD Basic Statistics: MBI-540
How did Dr. Tuba choose the mice?
Scenario DMice were all first-born pups from different dams.Mice were randomized to treatmentAll mice which received the standard drug were housed inone cage.All mice which received the new drug were housed in one(different) cage.
Problems?There might be a cage effect - viral infection, hotterlocation, less water, · · · .
Better idea: divide animals from a drug group into > 1 cage - orif necessary, house all in same cage.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Was day 5 one of many days he considered?
1 2 3 4 5
−4
−3
−2
−1
0
Days
Wei
ght l
oss
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
● group 1group 2
Comments?
Simulated data: the mean isthe same for both groups, atall timesLooking for the biggestdifference = multiplecomparisons
Sally W. Thurston, PhD Basic Statistics: MBI-540
Was day 5 one of many days he considered?
1 2 3 4 5
−4
−3
−2
−1
0
Days
Wei
ght l
oss
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
● group 1group 2
Comments?Simulated data: the mean isthe same for both groups, atall timesLooking for the biggestdifference = multiplecomparisons
Sally W. Thurston, PhD Basic Statistics: MBI-540
Was day 5 one of many days he considered?
1 2 3 4 5
−8
−6
−4
−2
0
Days
Wei
ght l
oss
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
● group 1group 2
Comments?
Simulated data: the meandeclines with time for group2, not group 1If expect this, could chooseday 5 a prioriCould use other statisticalmethods to compare slopeover time in the 2 groups.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Was day 5 one of many days he considered?
1 2 3 4 5
−8
−6
−4
−2
0
Days
Wei
ght l
oss
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
● group 1group 2
Comments?Simulated data: the meandeclines with time for group2, not group 1If expect this, could chooseday 5 a prioriCould use other statisticalmethods to compare slopeover time in the 2 groups.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Are the mice independent?
Mice from the same litter are more like each other.Mice housed in the same cage might respond moresimilarlyWe are not interested in the effect of litter (or cage)
If appropriate experimental design, can control for correlatedobservations statistically.Need to record and save data on the litter (or cage)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Are the mice independent?
Mice from the same litter are more like each other.Mice housed in the same cage might respond moresimilarlyWe are not interested in the effect of litter (or cage)
If appropriate experimental design, can control for correlatedobservations statistically.Need to record and save data on the litter (or cage)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Other scenarios (1)
An investigator is comparing response to different doses of 3drugs on 96-well plates.
Each drug is tested at 5 doses.Plate 1 has multiple wells of drug A at each dose level.Plate 2 has multiple wells of drug B at each dose level.Plate 3 has multiple wells of drug C at each dose level.
Is this a good experimental design? Why / why not?
If one drug appears to be better, it might be due to a differencein the plates.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Other scenarios (1)
An investigator is comparing response to different doses of 3drugs on 96-well plates.
Each drug is tested at 5 doses.Plate 1 has multiple wells of drug A at each dose level.Plate 2 has multiple wells of drug B at each dose level.Plate 3 has multiple wells of drug C at each dose level.
Is this a good experimental design? Why / why not?
If one drug appears to be better, it might be due to a differencein the plates.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Other scenarios (1)
An investigator is comparing response to different doses of 3drugs on 96-well plates.
Each drug is tested at 5 doses.Plate 1 has multiple wells of drug A at each dose level.Plate 2 has multiple wells of drug B at each dose level.Plate 3 has multiple wells of drug C at each dose level.
Is this a good experimental design? Why / why not?
If one drug appears to be better, it might be due to a differencein the plates.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Other scenarios (2)
Dr. Busy scores mouse activity in a 1-minute period on a scaleof 1-10.
All mice are measured 6 hours after treatment10 mice receive a standard treatment, 10 a new treatment.Treatment assignment is randomizedThe new treatment is expected to be better.The investigator knows the treatment assignment of eachmouse.
Is this a good experimental design? Why / why not?
Expectation that new treatment is better could unconsciouslybias the investigator’s measurements
Sally W. Thurston, PhD Basic Statistics: MBI-540
Other scenarios (2)
Dr. Busy scores mouse activity in a 1-minute period on a scaleof 1-10.
All mice are measured 6 hours after treatment10 mice receive a standard treatment, 10 a new treatment.Treatment assignment is randomizedThe new treatment is expected to be better.The investigator knows the treatment assignment of eachmouse.
Is this a good experimental design? Why / why not?
Expectation that new treatment is better could unconsciouslybias the investigator’s measurements
Sally W. Thurston, PhD Basic Statistics: MBI-540
Other scenarios (2)
Dr. Busy scores mouse activity in a 1-minute period on a scaleof 1-10.
All mice are measured 6 hours after treatment10 mice receive a standard treatment, 10 a new treatment.Treatment assignment is randomizedThe new treatment is expected to be better.The investigator knows the treatment assignment of eachmouse.
Is this a good experimental design? Why / why not?
Expectation that new treatment is better could unconsciouslybias the investigator’s measurements
Sally W. Thurston, PhD Basic Statistics: MBI-540
Comments on experimental designs
In the scenarios just described, there is a possibility thatthe intended effect cannot be distinguished fromsomething else that is not of interest.Estimate of intended effect may be biased.
Unless the unintended effect can be estimated(impossible?)
No statistical analysis will fix this problem.Often, the problem can be avoided by good experimentaldesign.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Comments on experimental designs
In the scenarios just described, there is a possibility thatthe intended effect cannot be distinguished fromsomething else that is not of interest.Estimate of intended effect may be biased.Unless the unintended effect can be estimated(impossible?)
No statistical analysis will fix this problem.
Often, the problem can be avoided by good experimentaldesign.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Comments on experimental designs
In the scenarios just described, there is a possibility thatthe intended effect cannot be distinguished fromsomething else that is not of interest.Estimate of intended effect may be biased.Unless the unintended effect can be estimated(impossible?)
No statistical analysis will fix this problem.Often, the problem can be avoided by good experimentaldesign.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Summary: experimental design
Data should be a random sample from populationIf treatment is assigned, assignment should be randomlydetermined.
Can use table of random numbersBlinding
Person measuring endpoints should not know treatmentassignmentPerson receiving treatment should not know his/hertreatmentParticularly an issue if measurements can be subjective
Sally W. Thurston, PhD Basic Statistics: MBI-540
Outline
1 Design of experiments: examples
2 Mean, standard deviation, standard error
3 M&M data
4 t-test
5 2-sample t-test
6 Importance of plotting the data
7 1-way ANOVA
8 Summary remarks
Sally W. Thurston, PhD Basic Statistics: MBI-540
Dr. Forte’s question
A well-known journal article reports that after infection,BALB/c mice given ’superdrug’ lost 4 ounces on averageafter one week.Dr. Forte wants to compare this to ’superdrug’ effects oninfected C57BL/6 mice.Dr. Forte infects 4 C57BL/6 mice, treats with ’superdrug’,and measures weight loss after one week.
His measurements of weight loss: 5, 8, 6, 3 ounces.
Sally W. Thurston, PhD Basic Statistics: MBI-540
The mean
Data (weight loss) represented by x1, x2, · · · , xnn is sample size (number observations)Dr. Forte’s data: x1 = 5, x2 = 8, x3 = 6, x4 = 3.
Population mean (µ): the average xi over all observationsin population
Cannot measure data from all observations in population.µ is unknown (a population parameter)
Estimate µ by µ̂. Here µ̂ = x̄ = sample mean.
For Dr. Forte’s experimentWhat is n?What is the sample?What is the population?
Sally W. Thurston, PhD Basic Statistics: MBI-540
The mean
Data (weight loss) represented by x1, x2, · · · , xnn is sample size (number observations)Dr. Forte’s data: x1 = 5, x2 = 8, x3 = 6, x4 = 3.
Population mean (µ): the average xi over all observationsin population
Cannot measure data from all observations in population.µ is unknown (a population parameter)
Estimate µ by µ̂. Here µ̂ = x̄ = sample mean.
For Dr. Forte’s experimentWhat is n?What is the sample?What is the population?
Sally W. Thurston, PhD Basic Statistics: MBI-540
The mean (cont.)
Dr. Forte’s data: 3, 5, 6, 8.Sample mean, x̄ = 1
n∑n
i=1 xi
What is Dr. Forte’s sample mean?
x̄ = 14(3 + 5 + 6 + 8) = 5.5
Sally W. Thurston, PhD Basic Statistics: MBI-540
The mean (cont.)
Dr. Forte’s data: 3, 5, 6, 8.Sample mean, x̄ = 1
n∑n
i=1 xi
What is Dr. Forte’s sample mean?
x̄ = 14(3 + 5 + 6 + 8) = 5.5
Sally W. Thurston, PhD Basic Statistics: MBI-540
Variability
Consider measurements from 2 researchers:Dr. Forte’s measurements: 3, 5, 6, 8.Dr. Pianissimo’s measurements: 5.1, 5.3, 5.7, 5.9Both have x̄ = 5.5.
Whose measurements give you more confidence about yourknowledge of the population mean?
Dr. Pianissimo: data closer together, plausible values for µ insmaller interval.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Variability
Consider measurements from 2 researchers:Dr. Forte’s measurements: 3, 5, 6, 8.Dr. Pianissimo’s measurements: 5.1, 5.3, 5.7, 5.9Both have x̄ = 5.5.
Whose measurements give you more confidence about yourknowledge of the population mean?
Dr. Pianissimo: data closer together, plausible values for µ insmaller interval.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Variance (σ2 = population variance)
Population variance: σ2 = E[(xi − µ)2]
E means “expectation”: average over all observations inpopulation.Translation: σ2 is expected squared difference betweenindividual values and population meanσ = population SD (of xi )
Sample variance, σ̂2
Estimate σ2 by σ̂2.If µ known: σ̂2 = 1
n
∑ni=1(xi − µ)2
If µ unknown: σ̂2 = s2 = 1n−1
∑ni=1(xi − x̄)2
Dr. Forte’s data: 3, 5, 6, 8, x̄ = 5.5. What is s2?s2 = (3−5.5)2+(5−5.5)2+(6−5.5)2+(8−5.5)2
3 = 4.333
Dr. Pianissimo’s data: 5.1, 5.3, 5.7, 5.9, s2 = 0.133.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Variance (σ2 = population variance)
Population variance: σ2 = E[(xi − µ)2]
E means “expectation”: average over all observations inpopulation.Translation: σ2 is expected squared difference betweenindividual values and population meanσ = population SD (of xi )
Sample variance, σ̂2
Estimate σ2 by σ̂2.If µ known: σ̂2 = 1
n
∑ni=1(xi − µ)2
If µ unknown: σ̂2 = s2 = 1n−1
∑ni=1(xi − x̄)2
Dr. Forte’s data: 3, 5, 6, 8, x̄ = 5.5. What is s2?
s2 = (3−5.5)2+(5−5.5)2+(6−5.5)2+(8−5.5)2
3 = 4.333
Dr. Pianissimo’s data: 5.1, 5.3, 5.7, 5.9, s2 = 0.133.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Variance (σ2 = population variance)
Population variance: σ2 = E[(xi − µ)2]
E means “expectation”: average over all observations inpopulation.Translation: σ2 is expected squared difference betweenindividual values and population meanσ = population SD (of xi )
Sample variance, σ̂2
Estimate σ2 by σ̂2.If µ known: σ̂2 = 1
n
∑ni=1(xi − µ)2
If µ unknown: σ̂2 = s2 = 1n−1
∑ni=1(xi − x̄)2
Dr. Forte’s data: 3, 5, 6, 8, x̄ = 5.5. What is s2?s2 = (3−5.5)2+(5−5.5)2+(6−5.5)2+(8−5.5)2
3 = 4.333
Dr. Pianissimo’s data: 5.1, 5.3, 5.7, 5.9, s2 = 0.133.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Variance (σ2 = population variance)
Population variance: σ2 = E[(xi − µ)2]
E means “expectation”: average over all observations inpopulation.Translation: σ2 is expected squared difference betweenindividual values and population meanσ = population SD (of xi )
Sample variance, σ̂2
Estimate σ2 by σ̂2.If µ known: σ̂2 = 1
n
∑ni=1(xi − µ)2
If µ unknown: σ̂2 = s2 = 1n−1
∑ni=1(xi − x̄)2
Dr. Forte’s data: 3, 5, 6, 8, x̄ = 5.5. What is s2?s2 = (3−5.5)2+(5−5.5)2+(6−5.5)2+(8−5.5)2
3 = 4.333
Dr. Pianissimo’s data: 5.1, 5.3, 5.7, 5.9, s2 = 0.133.
Sally W. Thurston, PhD Basic Statistics: MBI-540
SD, SE
Standard deviation (of x), SD(x) is s =√
s2
Can also write s as sx
Standard error (of x̄), SE(x̄) is√
s2
n = s√n
In other contexts, can speak of standard error of otherquantities, e.g. SE(β̂)
Dr. Forte’s data: 3, 5, 6, 8,x̄ = 5.5, s2 = 4.333.
What are SD(x) and SE(x̄)?SD(x) =
√4.333 = 2.08
SE(x̄) = 2.08√4
= 1.04
Dr. Pianissimo’s data: 5.1, 5.3, 5.7, 5.9x̄ = 5.5, s2 = 0.133.
What are SD(x) and SE(x̄)?SD(x) =
√0.133 = 0.365
SE(x̄) = 0.365√4
= 0.183
Sally W. Thurston, PhD Basic Statistics: MBI-540
SD, SE
Standard deviation (of x), SD(x) is s =√
s2
Can also write s as sx
Standard error (of x̄), SE(x̄) is√
s2
n = s√n
In other contexts, can speak of standard error of otherquantities, e.g. SE(β̂)
Dr. Forte’s data: 3, 5, 6, 8,x̄ = 5.5, s2 = 4.333.
What are SD(x) and SE(x̄)?
SD(x) =√
4.333 = 2.08SE(x̄) = 2.08√
4= 1.04
Dr. Pianissimo’s data: 5.1, 5.3, 5.7, 5.9x̄ = 5.5, s2 = 0.133.
What are SD(x) and SE(x̄)?SD(x) =
√0.133 = 0.365
SE(x̄) = 0.365√4
= 0.183
Sally W. Thurston, PhD Basic Statistics: MBI-540
SD, SE
Standard deviation (of x), SD(x) is s =√
s2
Can also write s as sx
Standard error (of x̄), SE(x̄) is√
s2
n = s√n
In other contexts, can speak of standard error of otherquantities, e.g. SE(β̂)
Dr. Forte’s data: 3, 5, 6, 8,x̄ = 5.5, s2 = 4.333.
What are SD(x) and SE(x̄)?SD(x) =
√4.333 = 2.08
SE(x̄) = 2.08√4
= 1.04
Dr. Pianissimo’s data: 5.1, 5.3, 5.7, 5.9x̄ = 5.5, s2 = 0.133.
What are SD(x) and SE(x̄)?
SD(x) =√
0.133 = 0.365SE(x̄) = 0.365√
4= 0.183
Sally W. Thurston, PhD Basic Statistics: MBI-540
SD, SE
Standard deviation (of x), SD(x) is s =√
s2
Can also write s as sx
Standard error (of x̄), SE(x̄) is√
s2
n = s√n
In other contexts, can speak of standard error of otherquantities, e.g. SE(β̂)
Dr. Forte’s data: 3, 5, 6, 8,x̄ = 5.5, s2 = 4.333.
What are SD(x) and SE(x̄)?SD(x) =
√4.333 = 2.08
SE(x̄) = 2.08√4
= 1.04
Dr. Pianissimo’s data: 5.1, 5.3, 5.7, 5.9x̄ = 5.5, s2 = 0.133.
What are SD(x) and SE(x̄)?SD(x) =
√0.133 = 0.365
SE(x̄) = 0.365√4
= 0.183
Sally W. Thurston, PhD Basic Statistics: MBI-540
SD, SE (cont.)
Recall SD(x) is sx =√
1n−1
∑ni=1(xi − x̄)2, SE(x̄) = sx√
n
If Dr. Forte had collected data from 20 mice instead of 4, howwould SD(x) and SE(x̄) be affected?
SD(x) would be about the same (estimated more precisely)SE(x̄) would decrease (we have more confidence about x̄).
What do SD(x) and SE(x̄) mean?SD(x) is our estimate of σ, the SD of x in the population.SE(x̄) is a measure of our uncertainty about µ.
Can form confidence interval (CI) for µ: x̄ ± tcritSE(x̄)tcrit discussed later (usually ≈ 2)Approximately 95% of such CIs include µ.
Sally W. Thurston, PhD Basic Statistics: MBI-540
SD, SE (cont.)
Recall SD(x) is sx =√
1n−1
∑ni=1(xi − x̄)2, SE(x̄) = sx√
n
If Dr. Forte had collected data from 20 mice instead of 4, howwould SD(x) and SE(x̄) be affected?
SD(x) would be about the same (estimated more precisely)SE(x̄) would decrease (we have more confidence about x̄).
What do SD(x) and SE(x̄) mean?SD(x) is our estimate of σ, the SD of x in the population.SE(x̄) is a measure of our uncertainty about µ.
Can form confidence interval (CI) for µ: x̄ ± tcritSE(x̄)tcrit discussed later (usually ≈ 2)Approximately 95% of such CIs include µ.
Sally W. Thurston, PhD Basic Statistics: MBI-540
SD, SE (cont.)
Recall SD(x) is sx =√
1n−1
∑ni=1(xi − x̄)2, SE(x̄) = sx√
n
If Dr. Forte had collected data from 20 mice instead of 4, howwould SD(x) and SE(x̄) be affected?
SD(x) would be about the same (estimated more precisely)SE(x̄) would decrease (we have more confidence about x̄).
What do SD(x) and SE(x̄) mean?SD(x) is our estimate of σ, the SD of x in the population.SE(x̄) is a measure of our uncertainty about µ.
Can form confidence interval (CI) for µ: x̄ ± tcritSE(x̄)tcrit discussed later (usually ≈ 2)Approximately 95% of such CIs include µ.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Why divide by n − 1 in s2?
If we knew µ, σ̂2 =Pn
i=1(xi−µ)2
n
We don’t know µ, so σ̂2 = s2 =Pn
i=1(xi−x̄)2
n−1
For a particular sample, x̄ minimizes∑
(xi − µ̂)2 for any µ̂
Using x̄ to estimate µ:Numerate of s2 is (on average) smaller than
∑(xi − µ)2
Need to divide by smaller number to compensate; n − 1 isthe right number
Sally W. Thurston, PhD Basic Statistics: MBI-540
Why divide by n − 1 in s2?
Examine by simulationTake n = 5 observations fromnormal, µ = 0, σ2 = 1.
Calculate s2 =P
(xi−x̄)2
4 andP(xi−x̄)2
5
Repeat multiple timesNote that true value of σ is 1.
Dividing by n − 1 does wellDividing by n underestimates σ
SD(x): divide by (n−1)
Fre
quen
cy
0.5 1.0 1.5 2.0
020
4060
8010
0
Not SD(x): divide by n
Fre
quen
cy
0.0 0.5 1.0 1.5
020
4060
8012
0
Sally W. Thurston, PhD Basic Statistics: MBI-540
Why divide by n − 1 in s2?
Examine by simulationTake n = 5 observations fromnormal, µ = 0, σ2 = 1.
Calculate s2 =P
(xi−x̄)2
4 andP(xi−x̄)2
5
Repeat multiple timesNote that true value of σ is 1.Dividing by n − 1 does wellDividing by n underestimates σ
SD(x): divide by (n−1)
Fre
quen
cy
0.5 1.0 1.5 2.0
020
4060
8010
0
Not SD(x): divide by n
Fre
quen
cy
0.0 0.5 1.0 1.5
020
4060
8012
0
Sally W. Thurston, PhD Basic Statistics: MBI-540
Outline
1 Design of experiments: examples
2 Mean, standard deviation, standard error
3 M&M data
4 t-test
5 2-sample t-test
6 Importance of plotting the data
7 1-way ANOVA
8 Summary remarks
Sally W. Thurston, PhD Basic Statistics: MBI-540
M&M data
We collected data on the number of M&M of each color inpackages of mini M&M’s
# brown
Fre
quen
cy
5 10 15 20
02
46
8
# orange
Fre
quen
cy
4 6 8 10 12 14 16
02
46
8
# yellow
Fre
quen
cy
2 4 6 8 10 12 14
02
46
810
12
# green
Fre
quen
cy
6 8 10 12
01
23
45
67
# blue
Fre
quen
cy
4 6 8 10 12 14 16 18
05
1015
# red
Fre
quen
cy
4 6 8 10 12 14 16
05
1015
32 people totalHistograms show number ofeach color
Sally W. Thurston, PhD Basic Statistics: MBI-540
M&M data
Aside: did everyone get same # M&M’s? NO
Number MM: 50 51 55 56 57 58 59 60 61 62 66 73People: 1 1 3 2 9 2 4 5 1 2 1 1
%(blue + red): All
Fre
quen
cy
0.30 0.35 0.40 0.45
01
23
45
67 Focus on % (red + blue)
x1 is % (red + blue) for firstperson, x2 same for secondperson, etc.What is the sample?What is the population?
We could approximate xi with anormal distribution, mean µ,variance σ2
Sally W. Thurston, PhD Basic Statistics: MBI-540
M&M data
Aside: did everyone get same # M&M’s? NO
Number MM: 50 51 55 56 57 58 59 60 61 62 66 73People: 1 1 3 2 9 2 4 5 1 2 1 1
%(blue + red): All
Fre
quen
cy
0.30 0.35 0.40 0.45
01
23
45
67 Focus on % (red + blue)
x1 is % (red + blue) for firstperson, x2 same for secondperson, etc.What is the sample?What is the population?
We could approximate xi with anormal distribution, mean µ,variance σ2
Sally W. Thurston, PhD Basic Statistics: MBI-540
M&M data
Aside: did everyone get same # M&M’s? NO
Number MM: 50 51 55 56 57 58 59 60 61 62 66 73People: 1 1 3 2 9 2 4 5 1 2 1 1
%(blue + red): All
Fre
quen
cy
0.30 0.35 0.40 0.45
01
23
45
67 Focus on % (red + blue)
x1 is % (red + blue) for firstperson, x2 same for secondperson, etc.What is the sample?What is the population?
We could approximate xi with anormal distribution, mean µ,variance σ2
Sally W. Thurston, PhD Basic Statistics: MBI-540
M&M data
sample 1
Fre
quen
cy
0.30 0.35 0.40 0.45
0.0
0.5
1.0
1.5
2.0
2.5
3.0
sample 2
Fre
quen
cy
0.30 0.35 0.40 0.45
0.0
0.5
1.0
1.5
2.0
sample 3
Fre
quen
cy
0.30 0.32 0.34 0.36 0.38 0.40
0.0
0.5
1.0
1.5
2.0
sample 4
Fre
quen
cy
0.30 0.32 0.34 0.36 0.38 0.40
0.0
0.5
1.0
1.5
2.0
%(blue + red): samples
Interest: % (red + blue)Divide data into 4 groups of 8observationsCalculate mean, SD, SE(x̄) ineach group
Sally W. Thurston, PhD Basic Statistics: MBI-540
M&M data
Each group has n = 8 observations (32 observations total)Group 1: % (red + blue): Data (x1, · · · , x8) is
0.45, 0.38, 0.39, 0.38, 0.35, 0.33, 0.41, 0.29
Calculate x̄
x̄ = (0.45+0.38+0.39+0.38+0.35+0.33+0.41+0.29)8 = 0.374
Calculate sx (standard deviation)
sx =
√(0.45−0.374)2+(0.38−0.374)2+···+(0.29−0.374)2
7 = 0.048
Calculate SE(x̄) = sx√n = 0.048√
8= 0.017
Sally W. Thurston, PhD Basic Statistics: MBI-540
M&M data
Each group has n = 8 observations (32 observations total)Group 1: % (red + blue): Data (x1, · · · , x8) is
0.45, 0.38, 0.39, 0.38, 0.35, 0.33, 0.41, 0.29
Calculate x̄x̄ = (0.45+0.38+0.39+0.38+0.35+0.33+0.41+0.29)
8 = 0.374Calculate sx (standard deviation)
sx =
√(0.45−0.374)2+(0.38−0.374)2+···+(0.29−0.374)2
7 = 0.048
Calculate SE(x̄) = sx√n = 0.048√
8= 0.017
Sally W. Thurston, PhD Basic Statistics: MBI-540
M&M data
Each group has n = 8 observations (32 observations total)Group 1: % (red + blue): Data (x1, · · · , x8) is
0.45, 0.38, 0.39, 0.38, 0.35, 0.33, 0.41, 0.29
Calculate x̄x̄ = (0.45+0.38+0.39+0.38+0.35+0.33+0.41+0.29)
8 = 0.374Calculate sx (standard deviation)
sx =
√(0.45−0.374)2+(0.38−0.374)2+···+(0.29−0.374)2
7 = 0.048
Calculate SE(x̄) = sx√n = 0.048√
8= 0.017
Sally W. Thurston, PhD Basic Statistics: MBI-540
M&M data
Each group has n = 8 observations (32 observations total)Group 1: x̄= 0.374, sx= 0.0479 SE(x̄)= 0.0169Group 2: x̄= 0.3564, sx= 0.0574 SE(x̄)= 0.0203Group 3: x̄= 0.3549, sx= 0.0348 SE(x̄)= 0.0123Group 4: x̄= 0.3465, sx= 0.0342 SE(x̄)= 0.0121
Everyone: x̄= 0.358 sx= 0.0436 SE(x̄)= 0.0077
We might want to know if % red + blue in the population = 13
(later: t-test)
Sally W. Thurston, PhD Basic Statistics: MBI-540
M&M data
Each group has n = 8 observations (32 observations total)Group 1: x̄= 0.374, sx= 0.0479 SE(x̄)= 0.0169Group 2: x̄= 0.3564, sx= 0.0574 SE(x̄)= 0.0203Group 3: x̄= 0.3549, sx= 0.0348 SE(x̄)= 0.0123Group 4: x̄= 0.3465, sx= 0.0342 SE(x̄)= 0.0121
Everyone: x̄= 0.358 sx= 0.0436 SE(x̄)= 0.0077
We might want to know if % red + blue in the population = 13
(later: t-test)
Sally W. Thurston, PhD Basic Statistics: MBI-540
M&M data
Each group has n = 8 observations (32 observations total)Group 1: x̄= 0.374, sx= 0.0479 SE(x̄)= 0.0169Group 2: x̄= 0.3564, sx= 0.0574 SE(x̄)= 0.0203Group 3: x̄= 0.3549, sx= 0.0348 SE(x̄)= 0.0123Group 4: x̄= 0.3465, sx= 0.0342 SE(x̄)= 0.0121
Everyone: x̄= 0.358 sx= 0.0436 SE(x̄)= 0.0077
We might want to know if % red + blue in the population = 13
(later: t-test)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Outline
1 Design of experiments: examples
2 Mean, standard deviation, standard error
3 M&M data
4 t-test
5 2-sample t-test
6 Importance of plotting the data
7 1-way ANOVA
8 Summary remarks
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-test
Recall Dr. Forte’s question: is weight loss in C57BL/6 mice(his data) different than in BALB/c mice (published data,weight loss = 4 ounces)?
H0 is the null hypothesis (“no effect”, “same as before”)HA is the alternative hypothesis (“some effect”, “not H0”)
Example: H0 : µ = µ0 versus HA : µ 6= µ0.µ is mean weight loss for C57Bl/6 mice in populationµ is unknown, but we have an estimate of it (x̄)µ0 is some specific number (here: 4).
How to evaluate H0, HA?Compare x̄ to µ0If x̄ is far from µ0, suggests H0 not true.Need SE(x̄) to evaluate how different x̄ and µ0 are.
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-test
Recall Dr. Forte’s question: is weight loss in C57BL/6 mice(his data) different than in BALB/c mice (published data,weight loss = 4 ounces)?
H0 is the null hypothesis (“no effect”, “same as before”)HA is the alternative hypothesis (“some effect”, “not H0”)Example: H0 : µ = µ0 versus HA : µ 6= µ0.
µ is mean weight loss for C57Bl/6 mice in populationµ is unknown, but we have an estimate of it (x̄)µ0 is some specific number (here: 4).
How to evaluate H0, HA?Compare x̄ to µ0If x̄ is far from µ0, suggests H0 not true.Need SE(x̄) to evaluate how different x̄ and µ0 are.
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-test
Recall Dr. Forte’s question: is weight loss in C57BL/6 mice(his data) different than in BALB/c mice (published data,weight loss = 4 ounces)?
H0 is the null hypothesis (“no effect”, “same as before”)HA is the alternative hypothesis (“some effect”, “not H0”)Example: H0 : µ = µ0 versus HA : µ 6= µ0.
µ is mean weight loss for C57Bl/6 mice in populationµ is unknown, but we have an estimate of it (x̄)µ0 is some specific number (here: 4).
How to evaluate H0, HA?Compare x̄ to µ0If x̄ is far from µ0, suggests H0 not true.Need SE(x̄) to evaluate how different x̄ and µ0 are.
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-test assumptions
T-test is based on tcalc = x̄−µ0SE(x̄)
Assumptions: xi ∼ independent N(µ, σ2)
Interpretation: ∼ means “distributed as”Says data are independent and have a normal distributionµ = population mean (of the xi ’s)σ2 = population variance (of the xi ’s) (σ = SD)
“How can I tell if my data have a normal distribution?”Assumption is about distribution of data from populationt-test robust to moderate departure from normalityImportant to plot data to check for gross violation fromnormality
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-test assumptions
T-test is based on tcalc = x̄−µ0SE(x̄)
Assumptions: xi ∼ independent N(µ, σ2)
Interpretation: ∼ means “distributed as”Says data are independent and have a normal distributionµ = population mean (of the xi ’s)σ2 = population variance (of the xi ’s) (σ = SD)
“How can I tell if my data have a normal distribution?”Assumption is about distribution of data from populationt-test robust to moderate departure from normalityImportant to plot data to check for gross violation fromnormality
Sally W. Thurston, PhD Basic Statistics: MBI-540
Samples of n=20 from a normal distribution
normsamp[ctr]
−3 −1 0 1 2 3
01
23
4
normsamp[ctr]
Fre
quen
cy
−3 −1 0 1 2 3
01
23
45
normsamp[ctr]
Fre
quen
cy
−3 −1 0 1 2 3
0.0
1.0
2.0
3.0
normsamp[ctr]
−3 −1 0 1 2 3
01
23
45
normsamp[ctr]
Fre
quen
cy
−3 −1 0 1 2 3
01
23
45
67
normsamp[ctr]
Fre
quen
cy
−3 −1 0 1 2 3
01
23
4
−3 −1 0 1 2 3
01
23
45
67
Fre
quen
cy
−3 −1 0 1 2 3
01
23
45
67
Fre
quen
cy
−3 −1 0 1 2 3
01
23
45
6
Normal
normsamp[ctr]
Fre
quen
cy
−3 −2 −1 0 1 2 3
01
23
4
One mistake
c(normsamp[1:(nper.plot − 1)], 10.25)
Fre
quen
cy
−4 −2 0 2 4 6 8 10
01
23
4
Sally W. Thurston, PhD Basic Statistics: MBI-540
Samples of n=20 from a normal distribution
normsamp[ctr]
−3 −1 0 1 2 3
01
23
4
normsamp[ctr]
Fre
quen
cy
−3 −1 0 1 2 3
01
23
45
normsamp[ctr]
Fre
quen
cy
−3 −1 0 1 2 3
0.0
1.0
2.0
3.0
normsamp[ctr]
−3 −1 0 1 2 3
01
23
45
normsamp[ctr]
Fre
quen
cy
−3 −1 0 1 2 3
01
23
45
67
normsamp[ctr]
Fre
quen
cy
−3 −1 0 1 2 3
01
23
4
−3 −1 0 1 2 3
01
23
45
67
Fre
quen
cy
−3 −1 0 1 2 3
01
23
45
67
Fre
quen
cy
−3 −1 0 1 2 3
01
23
45
6
Normal
normsamp[ctr]
Fre
quen
cy
−3 −2 −1 0 1 2 3
01
23
4
One mistake
c(normsamp[1:(nper.plot − 1)], 10.25)
Fre
quen
cy
−4 −2 0 2 4 6 8 10
01
23
4
Sally W. Thurston, PhD Basic Statistics: MBI-540
Outliers
“My data has one very unusual value. Should I delete it?”
Step 1: check to see if the value is a mistake! If so, fix.Step 2: is there something unusual about thatobservation?
animal was sicka different person took the measurementexperimental conditions different from others
If clear reason why unusual value differs from otherobservations
Value represents different condition: OK to delete (explainin paper)
If cannot determine any reason why value is unusualDeleting observation underestimates population variability:don’t deleteAlternative: report results both with and without the value(explain in paper)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Outliers
“My data has one very unusual value. Should I delete it?”Step 1: check to see if the value is a mistake! If so, fix.Step 2: is there something unusual about thatobservation?
animal was sicka different person took the measurementexperimental conditions different from others
If clear reason why unusual value differs from otherobservations
Value represents different condition: OK to delete (explainin paper)
If cannot determine any reason why value is unusualDeleting observation underestimates population variability:don’t deleteAlternative: report results both with and without the value(explain in paper)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Outliers
“My data has one very unusual value. Should I delete it?”Step 1: check to see if the value is a mistake! If so, fix.Step 2: is there something unusual about thatobservation?
animal was sicka different person took the measurementexperimental conditions different from others
If clear reason why unusual value differs from otherobservations
Value represents different condition: OK to delete (explainin paper)
If cannot determine any reason why value is unusualDeleting observation underestimates population variability:don’t deleteAlternative: report results both with and without the value(explain in paper)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Parametric or non-parametric?
“When should I use a non-parametric version of a t-test?”Some statisticians (nearly) always use non-parametrictests (such as Mann-Whitney)Some statisticians (nearly) always use parametric tests(such as t-test)Knowledge of type of data can help determine if normalityis reasonable
Sometimes data approximately normal after transformation
If no severe departure from normality, either parametric ornonparametric tests probably fine.
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-test example
T-test is based on tcalc = x̄−µ0SE(x̄)
Dr. Forte: H0 : µ = 4, HA : µ 6= 4.We calculated x̄ = 5.5, SE(x̄) = 1.04.What is tcalc?
Here tcalc = (5.5− 4)/1.04 = 1.44
Under H0, tcalc ∼ tn−1
Here: tcalc ∼ t3If H0 true, tcalc centered around 0
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-test example
T-test is based on tcalc = x̄−µ0SE(x̄)
Dr. Forte: H0 : µ = 4, HA : µ 6= 4.We calculated x̄ = 5.5, SE(x̄) = 1.04.What is tcalc?
Here tcalc = (5.5− 4)/1.04 = 1.44
Under H0, tcalc ∼ tn−1
Here: tcalc ∼ t3If H0 true, tcalc centered around 0
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-distribution
Recall tcalc = x̄−µ0SE(x̄)
“How can tcalc have a distribution? It’s just a number! ”Your thoughts?
Any single experiment gives a single tcalc .Could take another sample of 4 C57BL/6 mice, measureweight loss, get tcalc for that sample.This could theoretically be repeated multiple times withdifferent miceEach experiment would give a tcalc
A histogram of the tcalc values would look like the t3distribution (assuming enough experiments)
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-distribution
Recall tcalc = x̄−µ0SE(x̄)
“How can tcalc have a distribution? It’s just a number! ”Your thoughts?
Any single experiment gives a single tcalc .Could take another sample of 4 C57BL/6 mice, measureweight loss, get tcalc for that sample.This could theoretically be repeated multiple times withdifferent miceEach experiment would give a tcalc
A histogram of the tcalc values would look like the t3distribution (assuming enough experiments)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Samples from t-distributions with 3 df
n= 10
−1.0 −0.5 0.0 0.5 1.0 1.5 2.0
0.0
0.5
1.0
1.5
2.0
2.5
3.0
n= 100
−6 −4 −2 0 2 4
05
1015
2025
30
n= 1000
−4 −2 0 2 4 6
050
100
150
n= 10000
−6 −4 −2 0 2 4 6
020
040
060
0
n=10: First fewobservations: -0.69,1.73, 0.55, 0.96,-0.79, · · ·
n=100: Starting tolook roughly “normal”n=1000: (12 obsoutside plot. Rangewas: -12.8 to 8.3)n=10,000: (85 obsoutside plot. Rangewas: -24.3 to 35.7)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Samples from t-distributions with 3 df
n= 10
−1.0 −0.5 0.0 0.5 1.0 1.5 2.0
0.0
0.5
1.0
1.5
2.0
2.5
3.0
n= 100
−6 −4 −2 0 2 4
05
1015
2025
30
n= 1000
−4 −2 0 2 4 6
050
100
150
n= 10000
−6 −4 −2 0 2 4 6
020
040
060
0
n=10: First fewobservations: -0.69,1.73, 0.55, 0.96,-0.79, · · ·n=100: Starting tolook roughly “normal”
n=1000: (12 obsoutside plot. Rangewas: -12.8 to 8.3)n=10,000: (85 obsoutside plot. Rangewas: -24.3 to 35.7)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Samples from t-distributions with 3 df
n= 10
−1.0 −0.5 0.0 0.5 1.0 1.5 2.0
0.0
0.5
1.0
1.5
2.0
2.5
3.0
n= 100
−6 −4 −2 0 2 4
05
1015
2025
30
n= 1000
−4 −2 0 2 4 6
050
100
150
n= 10000
−6 −4 −2 0 2 4 6
020
040
060
0
n=10: First fewobservations: -0.69,1.73, 0.55, 0.96,-0.79, · · ·n=100: Starting tolook roughly “normal”n=1000: (12 obsoutside plot. Rangewas: -12.8 to 8.3)
n=10,000: (85 obsoutside plot. Rangewas: -24.3 to 35.7)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Samples from t-distributions with 3 df
n= 10
−1.0 −0.5 0.0 0.5 1.0 1.5 2.0
0.0
0.5
1.0
1.5
2.0
2.5
3.0
n= 100
−6 −4 −2 0 2 4
05
1015
2025
30
n= 1000
−4 −2 0 2 4 6
050
100
150
n= 10000
−6 −4 −2 0 2 4 6
020
040
060
0
n=10: First fewobservations: -0.69,1.73, 0.55, 0.96,-0.79, · · ·n=100: Starting tolook roughly “normal”n=1000: (12 obsoutside plot. Rangewas: -12.8 to 8.3)n=10,000: (85 obsoutside plot. Rangewas: -24.3 to 35.7)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Normal, t distributions: differences?
Normal distributionn= 10000
−6 −4 −2 0 2 4 6
020
040
060
080
0
Normal: range was: -3.6 to 3.9
t3 distributionn= 10000
−6 −4 −2 0 2 4 6
020
040
060
0
t3: range was: -24.3 to 35.7
Sally W. Thurston, PhD Basic Statistics: MBI-540
Different t-distributions
−6 −4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
df=2df=8normal
With more data, larger ndegrees of freedom (df) =n − 1 increasessmaller area in tails of thedistributionapproaches normaldistribution (which is t with∞ df)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Normal, t distributions
“Why doesn’t tcalc = x̄−µ0SE(x̄) have a normal distribution?”
Recall we assumed xi ∼ N(µ, σ2)
x̄ has a normal distributionIf we knew σ2
SE(x̄) would be σ2/n, a fixed number“tcalc” would have a normal distribution
When σ2 not known, use s2 to estimate itOur estimate of σ2 isn’t perfectResult: tcalc more variable
Sally W. Thurston, PhD Basic Statistics: MBI-540
Normal, t distributions
“Why doesn’t tcalc = x̄−µ0SE(x̄) have a normal distribution?”
Recall we assumed xi ∼ N(µ, σ2)
x̄ has a normal distributionIf we knew σ2
SE(x̄) would be σ2/n, a fixed number“tcalc” would have a normal distribution
When σ2 not known, use s2 to estimate itOur estimate of σ2 isn’t perfectResult: tcalc more variable
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-distribution (3 df)
−6 −4 −2 0 2 4 6
0.0
0.1
0.2
0.3
t dist, 3 df2.5% cutpt = 3.18
If H0 trueExpect values nearcenterRed: unusual
If H0 true, control Pr(reject H0) to α(α often chosen as 0.05)Reject H0 if tcalc in shaded area
Each shaded area has 2.5% of thedistributionShaded area defined by tcrit
tcrit is number for which 2.5% ofdistribution has larger values(−tcrit : 2.5% are smaller)tcrit depends on df and α
With 3 df, α = 0.05, tcrit = 3.18Reject H0 if |tcalc | > tcrit
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-distribution (3 df)
−6 −4 −2 0 2 4 6
0.0
0.1
0.2
0.3
t dist, 3 df2.5% cutpt = 3.18
If H0 trueExpect values nearcenterRed: unusual
If H0 true, control Pr(reject H0) to α(α often chosen as 0.05)Reject H0 if tcalc in shaded area
Each shaded area has 2.5% of thedistributionShaded area defined by tcrit
tcrit is number for which 2.5% ofdistribution has larger values(−tcrit : 2.5% are smaller)tcrit depends on df and α
With 3 df, α = 0.05, tcrit = 3.18Reject H0 if |tcalc | > tcrit
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-distribution (3 df)
−6 −4 −2 0 2 4 6
0.0
0.1
0.2
0.3
t dist, 3 df2.5% cutpt = 3.18
If H0 trueExpect values nearcenterRed: unusual
If H0 true, control Pr(reject H0) to α(α often chosen as 0.05)Reject H0 if tcalc in shaded area
Each shaded area has 2.5% of thedistributionShaded area defined by tcrit
tcrit is number for which 2.5% ofdistribution has larger values(−tcrit : 2.5% are smaller)tcrit depends on df and α
With 3 df, α = 0.05, tcrit = 3.18Reject H0 if |tcalc | > tcrit
Sally W. Thurston, PhD Basic Statistics: MBI-540
Sample from t-distribution: how good is tcrit?
n= 10000
−6 −4 −2 0 2 4 6
020
040
060
0
How good is tcrit?Do we really get 2.5% in each “tail”?For t3, α = 0.05, tcrit = 3.18.
From sample of 10,000 observation from t3247 observations were < −3.18 (2.47%)258 observations were > 3.18 (2.58%)
If H0 true and α = 0.05, approximately 5% of the time wewould reject H0 (“type I error”)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Sample from t-distribution: how good is tcrit?
n= 10000
−6 −4 −2 0 2 4 6
020
040
060
0
How good is tcrit?Do we really get 2.5% in each “tail”?For t3, α = 0.05, tcrit = 3.18.
From sample of 10,000 observation from t3247 observations were < −3.18 (2.47%)258 observations were > 3.18 (2.58%)
If H0 true and α = 0.05, approximately 5% of the time wewould reject H0 (“type I error”)
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-test for Dr. Forte
Recall Dr. Forte’s question: is weight loss in C57BL/6mice (his data) different than in BALB/c mice(published data, weight loss = 4 ounces)?His hypotheses: H0 : µ = 4, HA : µ 6= 4.µ: weight loss in population of C57BL/6 miceDr. Forte: tcalc = x̄−µ0
SE(x̄) = (5.5− 4)/1.04 = 1.44
t-test rule: Reject H0 if |tcalc | > tcrit
With 3 df, α = 0.05, tcrit = 3.18What should Dr. Forte conclude?
Fail to reject H0Dr. Forte’s data are consistent with H0 (p = 0.25)No evidence that weight loss in C57BL/6 mice isdifferent from 4 ounces.
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-test for Dr. Forte
Recall Dr. Forte’s question: is weight loss in C57BL/6mice (his data) different than in BALB/c mice(published data, weight loss = 4 ounces)?His hypotheses: H0 : µ = 4, HA : µ 6= 4.µ: weight loss in population of C57BL/6 miceDr. Forte: tcalc = x̄−µ0
SE(x̄) = (5.5− 4)/1.04 = 1.44
t-test rule: Reject H0 if |tcalc | > tcrit
With 3 df, α = 0.05, tcrit = 3.18What should Dr. Forte conclude?Fail to reject H0
Dr. Forte’s data are consistent with H0 (p = 0.25)No evidence that weight loss in C57BL/6 mice isdifferent from 4 ounces.
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-distribution (3df)
−6 −4 −2 0 2 4 6
0.0
0.1
0.2
0.3
t dist, 3 df2.5% cutpt = 3.18
Dr. Forte’s value: blue line(1.44)Compare Dr. Forte’s value toa t distribution with 3 dfReject H0 if tcalc in shadedareaFail to reject H0
Sally W. Thurston, PhD Basic Statistics: MBI-540
Confidence interval
95% confidence interval (CI): x̄ ± tcrit × SE(x̄)
(Assumes tcrit calculated using α = 0.05)
This example: tcrit = 3.18We calculated x̄ = 5.5, SE(x̄) = 1.04What is Dr. Forte’s 95% CI?
95% CI is 5.5± 3.18× 1.04 = (2.19, 8.81)
This is a 95% CI for µFail to reject H0 because interval includes µ0 = 4.Confidence interval more informative than “fail to reject H0”
Sally W. Thurston, PhD Basic Statistics: MBI-540
Confidence interval
95% confidence interval (CI): x̄ ± tcrit × SE(x̄)
(Assumes tcrit calculated using α = 0.05)
This example: tcrit = 3.18We calculated x̄ = 5.5, SE(x̄) = 1.04What is Dr. Forte’s 95% CI?95% CI is 5.5± 3.18× 1.04 = (2.19, 8.81)
This is a 95% CI for µFail to reject H0 because interval includes µ0 = 4.Confidence interval more informative than “fail to reject H0”
Sally W. Thurston, PhD Basic Statistics: MBI-540
Classical statistics
µ considered fixed.Observed data (xi ) are considered random
Could collect new data: different xiImplies calculated confidence interval is random.
If H0 is true, on average 95% of the CIs will include µ.Can never answer “Is H0 true?”
Can only say something about how unusual our teststatistic is if H0 is true
Sally W. Thurston, PhD Basic Statistics: MBI-540
p-value
−6 −4 −2 0 2 4 6
0.0
0.1
0.2
0.3
t dist, 3 df12.5% cutpt = 1.44 p-value = probability of observing
a statistic as extreme or moreextreme as we observed, if H0 istrue
Curve: total area=1Dr. Forte’s tcalc = 1.44.p-value = shaded area = 0.25
Sally W. Thurston, PhD Basic Statistics: MBI-540
p-value
We calculated p = 0.25 for Dr. Forte’s data.
Is this the same as the probability that H0 is true?
Nop-value is calculated assuming H0 is true.
Recall: p-value is probability of observing a statistic as or moreextreme as we observed, if H0 is true
Sally W. Thurston, PhD Basic Statistics: MBI-540
p-value
We calculated p = 0.25 for Dr. Forte’s data.
Is this the same as the probability that H0 is true?
Nop-value is calculated assuming H0 is true.
Recall: p-value is probability of observing a statistic as or moreextreme as we observed, if H0 is true
Sally W. Thurston, PhD Basic Statistics: MBI-540
p-value questions
What do you think of these statements?“The p-value tells you whether or not the observed effect isreal.”
Not true“If the result is not statistically significant, that proves thereis no difference.” Not true
Better:Use knowledge of the magnitude of the estimated effect(and the CI) to help interpret the results.Suppose comparing treatment to control and p = 0.07
If estimated effect large, “Our study suggests an importantbenefit of treatment, but this did not reach statisticalsignificance.”
A p-value is not an end in itself.
Sally W. Thurston, PhD Basic Statistics: MBI-540
p-value questions
What do you think of these statements?“The p-value tells you whether or not the observed effect isreal.” Not true
“If the result is not statistically significant, that proves thereis no difference.” Not true
Better:Use knowledge of the magnitude of the estimated effect(and the CI) to help interpret the results.Suppose comparing treatment to control and p = 0.07
If estimated effect large, “Our study suggests an importantbenefit of treatment, but this did not reach statisticalsignificance.”
A p-value is not an end in itself.
Sally W. Thurston, PhD Basic Statistics: MBI-540
p-value questions
What do you think of these statements?“The p-value tells you whether or not the observed effect isreal.” Not true“If the result is not statistically significant, that proves thereis no difference.”
Not trueBetter:
Use knowledge of the magnitude of the estimated effect(and the CI) to help interpret the results.Suppose comparing treatment to control and p = 0.07
If estimated effect large, “Our study suggests an importantbenefit of treatment, but this did not reach statisticalsignificance.”
A p-value is not an end in itself.
Sally W. Thurston, PhD Basic Statistics: MBI-540
p-value questions
What do you think of these statements?“The p-value tells you whether or not the observed effect isreal.” Not true“If the result is not statistically significant, that proves thereis no difference.” Not true
Better:Use knowledge of the magnitude of the estimated effect(and the CI) to help interpret the results.Suppose comparing treatment to control and p = 0.07
If estimated effect large, “Our study suggests an importantbenefit of treatment, but this did not reach statisticalsignificance.”
A p-value is not an end in itself.
Sally W. Thurston, PhD Basic Statistics: MBI-540
p-value questions
What do you think of these statements?“The p-value tells you whether or not the observed effect isreal.” Not true“If the result is not statistically significant, that proves thereis no difference.” Not true
Better:Use knowledge of the magnitude of the estimated effect(and the CI) to help interpret the results.Suppose comparing treatment to control and p = 0.07
If estimated effect large, “Our study suggests an importantbenefit of treatment, but this did not reach statisticalsignificance.”
A p-value is not an end in itself.
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-test for M&M’s
Consider M&M data from 1st sample of 8 peopleGroup 1: x̄= 0.374, sx= 0.0479 SE(x̄)= 0.0169
Recall xi is % (red + blue) M&M’sTest H0 : µ = 1/3, versus HA : µ 6= 1/3.What is tcalc?
0.374− 0.3330.0169
= 2.41
Compare to tcrit = 2.36 (based on (n-1) df, α = 0.05)Conclusion? Reject H0 (also p = 0.047).
For first 8 people, mean % (red + blue) M&M’s significantlydifferent than 1/3.
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-test for M&M’s
Consider M&M data from 1st sample of 8 peopleGroup 1: x̄= 0.374, sx= 0.0479 SE(x̄)= 0.0169
Recall xi is % (red + blue) M&M’sTest H0 : µ = 1/3, versus HA : µ 6= 1/3.What is tcalc?
0.374− 0.3330.0169
= 2.41
Compare to tcrit = 2.36 (based on (n-1) df, α = 0.05)Conclusion?
Reject H0 (also p = 0.047).For first 8 people, mean % (red + blue) M&M’s significantlydifferent than 1/3.
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-test for M&M’s
Consider M&M data from 1st sample of 8 peopleGroup 1: x̄= 0.374, sx= 0.0479 SE(x̄)= 0.0169
Recall xi is % (red + blue) M&M’sTest H0 : µ = 1/3, versus HA : µ 6= 1/3.What is tcalc?
0.374− 0.3330.0169
= 2.41
Compare to tcrit = 2.36 (based on (n-1) df, α = 0.05)Conclusion? Reject H0 (also p = 0.047).
For first 8 people, mean % (red + blue) M&M’s significantlydifferent than 1/3.
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-test for M&M’s
For groups of 8 people, tcrit = 2.36Group 1: x̄= 0.374, sx= 0.0479 SE(x̄)= 0.0169
tcalc = 2.41, p = 0.047Group 2: x̄= 0.3564, sx= 0.0574 SE(x̄)= 0.0203
tcalc = 1.138, p = 0.293Group 3: x̄= 0.3549, sx= 0.0348 SE(x̄)= 0.0123
tcalc = 1.175, p = 0.123Group 4: x̄= 0.3465, sx= 0.0342 SE(x̄)= 0.0121
tcalc = 1.084, p = 0.314
For entire sample (n=32), tcrit = 2.04Everyone: x̄= 0.358 sx= 0.0436 SE(x̄)= 0.0077
tcalc = 3.19, p = 0.003
Larger n: smaller SE(x̄), smaller tcrit , easier to reject H0 if H0not true.
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-test for M&M’s
For groups of 8 people, tcrit = 2.36Group 1: x̄= 0.374, sx= 0.0479 SE(x̄)= 0.0169
tcalc = 2.41, p = 0.047Group 2: x̄= 0.3564, sx= 0.0574 SE(x̄)= 0.0203
tcalc = 1.138, p = 0.293Group 3: x̄= 0.3549, sx= 0.0348 SE(x̄)= 0.0123
tcalc = 1.175, p = 0.123Group 4: x̄= 0.3465, sx= 0.0342 SE(x̄)= 0.0121
tcalc = 1.084, p = 0.314
For entire sample (n=32), tcrit = 2.04Everyone: x̄= 0.358 sx= 0.0436 SE(x̄)= 0.0077
tcalc = 3.19, p = 0.003
Larger n: smaller SE(x̄), smaller tcrit , easier to reject H0 if H0not true.
Sally W. Thurston, PhD Basic Statistics: MBI-540
t-test for M&M’s
For groups of 8 people, tcrit = 2.36Group 1: x̄= 0.374, sx= 0.0479 SE(x̄)= 0.0169
tcalc = 2.41, p = 0.047Group 2: x̄= 0.3564, sx= 0.0574 SE(x̄)= 0.0203
tcalc = 1.138, p = 0.293Group 3: x̄= 0.3549, sx= 0.0348 SE(x̄)= 0.0123
tcalc = 1.175, p = 0.123Group 4: x̄= 0.3465, sx= 0.0342 SE(x̄)= 0.0121
tcalc = 1.084, p = 0.314
For entire sample (n=32), tcrit = 2.04Everyone: x̄= 0.358 sx= 0.0436 SE(x̄)= 0.0077
tcalc = 3.19, p = 0.003
Larger n: smaller SE(x̄), smaller tcrit , easier to reject H0 if H0not true.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Outline
1 Design of experiments: examples
2 Mean, standard deviation, standard error
3 M&M data
4 t-test
5 2-sample t-test
6 Importance of plotting the data
7 1-way ANOVA
8 Summary remarks
Sally W. Thurston, PhD Basic Statistics: MBI-540
2-sample t-test
Dr. Forte now wants to compare the effect of ’superdrug’on weight loss in knockout mice (new) versus C57BL/6mice (just measured)
NOTE: Better to collect data on both species in the sametime period.
Let x̄1 and s1 be mean and SD of weight loss in 1st sample(C57BL/6), n1 = 4 observations.Let x̄2 and s2 be mean and SD of weight loss in 2ndsample (knockout), n2 observationsµ1 is population mean weight loss in C57BL/6 mice, µ2 inknockout mice.Null hypothesis of “no effect” or “no difference” isH0 : µ1 = µ2, so HA : µ1 6= µ2.
Sally W. Thurston, PhD Basic Statistics: MBI-540
2-sample t-test (cont.)
H0 : µ1 = µ2, HA : µ1 6= µ2
Same as H0 : µ1 − µ2 = 0 and HA : µ1 − µ2 6= 0.Estimate µ1 − µ2 by x̄1 − x̄2.tcalc = x̄1−x̄2
SE(x̄1−x̄2)
Assumptions of the 2-sample test: data from 2 populationsAre independentAre normally distributed (within population)Have the same variance (or: use modified estimate ofstandard error)
Reject H0 if |tcalc | > tcrit (tcrit depends on n1 + n2 and α)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Scenario 1
Pairs of mice are housed in the same cage with same foodsourceInterest is in effect of restricted food intakeMice fed 2x/day, only 1 mouse can eat at once
Is this a reasonable design?
Mice are not independent: if one mouse in a pair eatsaggressively, the other mouse cannot eat as much
Sally W. Thurston, PhD Basic Statistics: MBI-540
Scenario 1
Pairs of mice are housed in the same cage with same foodsourceInterest is in effect of restricted food intakeMice fed 2x/day, only 1 mouse can eat at once
Is this a reasonable design?
Mice are not independent: if one mouse in a pair eatsaggressively, the other mouse cannot eat as much
Sally W. Thurston, PhD Basic Statistics: MBI-540
Scenario 1
Pairs of mice are housed in the same cage with same foodsourceInterest is in effect of restricted food intakeMice fed 2x/day, only 1 mouse can eat at once
Is this a reasonable design?
Mice are not independent: if one mouse in a pair eatsaggressively, the other mouse cannot eat as much
Sally W. Thurston, PhD Basic Statistics: MBI-540
Scenario 2
12 mice receive standard food, 12 a new food.Interest: gain in head circumference, measured at end of 2weeksBob measures mice receiving standard food, Harrymeasures the othersBob and Harry do not know which mice received whichfood source
Is this a reasonable design? No
Food source is totally confounded with person who takesmeasurementsBob might tend to get smaller measurements on sameanimal (bias)Harry might tend to measure less accurately (largervariance)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Scenario 2
12 mice receive standard food, 12 a new food.Interest: gain in head circumference, measured at end of 2weeksBob measures mice receiving standard food, Harrymeasures the othersBob and Harry do not know which mice received whichfood source
Is this a reasonable design?
No
Food source is totally confounded with person who takesmeasurementsBob might tend to get smaller measurements on sameanimal (bias)Harry might tend to measure less accurately (largervariance)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Scenario 2
12 mice receive standard food, 12 a new food.Interest: gain in head circumference, measured at end of 2weeksBob measures mice receiving standard food, Harrymeasures the othersBob and Harry do not know which mice received whichfood source
Is this a reasonable design? No
Food source is totally confounded with person who takesmeasurementsBob might tend to get smaller measurements on sameanimal (bias)Harry might tend to measure less accurately (largervariance)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Dr. Forte’s 2-sample data
1 2
12
34
56
78
group
y
●
●
●
●
Do you think assumptions arereasonable?
(OK)H0 : µ1 − µ2 = 0, HA : µ1 − µ2 6= 0.tcalc = x̄1−x̄2
SE(x̄1−x̄2)
x̄1 = 5.5, x̄2 = 2.1SE(x̄1 − x̄2) = 1.19 (assumes samevariance in 2 groups)What is tcalc? 5.5−2.1
1.19 = 2.86tcrit = 2.45, from 6 df, α = 0.05What is your conclusion? (Reject H0)
95% CI for µ1 − µ2 is x̄1 − x̄2 ± tcrit × SE(x̄1 − x̄2) = (0.50, 6.30)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Dr. Forte’s 2-sample data
1 2
12
34
56
78
group
y
●
●
●
●
Do you think assumptions arereasonable? (OK)H0 : µ1 − µ2 = 0, HA : µ1 − µ2 6= 0.tcalc = x̄1−x̄2
SE(x̄1−x̄2)
x̄1 = 5.5, x̄2 = 2.1SE(x̄1 − x̄2) = 1.19 (assumes samevariance in 2 groups)What is tcalc?
5.5−2.11.19 = 2.86
tcrit = 2.45, from 6 df, α = 0.05What is your conclusion? (Reject H0)
95% CI for µ1 − µ2 is x̄1 − x̄2 ± tcrit × SE(x̄1 − x̄2) = (0.50, 6.30)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Dr. Forte’s 2-sample data
1 2
12
34
56
78
group
y
●
●
●
●
Do you think assumptions arereasonable? (OK)H0 : µ1 − µ2 = 0, HA : µ1 − µ2 6= 0.tcalc = x̄1−x̄2
SE(x̄1−x̄2)
x̄1 = 5.5, x̄2 = 2.1SE(x̄1 − x̄2) = 1.19 (assumes samevariance in 2 groups)What is tcalc? 5.5−2.1
1.19 = 2.86tcrit = 2.45, from 6 df, α = 0.05What is your conclusion?
(Reject H0)
95% CI for µ1 − µ2 is x̄1 − x̄2 ± tcrit × SE(x̄1 − x̄2) = (0.50, 6.30)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Dr. Forte’s 2-sample data
1 2
12
34
56
78
group
y
●
●
●
●
Do you think assumptions arereasonable? (OK)H0 : µ1 − µ2 = 0, HA : µ1 − µ2 6= 0.tcalc = x̄1−x̄2
SE(x̄1−x̄2)
x̄1 = 5.5, x̄2 = 2.1SE(x̄1 − x̄2) = 1.19 (assumes samevariance in 2 groups)What is tcalc? 5.5−2.1
1.19 = 2.86tcrit = 2.45, from 6 df, α = 0.05What is your conclusion? (Reject H0)
95% CI for µ1 − µ2 is x̄1 − x̄2 ± tcrit × SE(x̄1 − x̄2) = (0.50, 6.30)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Dr. Forte’s conclusions
Recall: Dr. Forte wants to compare the effect of’superdrug’ on weight loss in knockout vs C57BL/6mice.C57BL/6 mice: (x̄1 = 5.5), knockout mice (x̄2 = 2.1)We rejected H0, p = 0.03
What does “reject H0” mean here?
Data suggests that theaverage weight loss in C57BL/6 mice is significantlygreater than in knockout mice.
What does the p-value mean?If the 2 species had the same mean weight loss, we wouldhave observed a difference as great or greater 3% of thetime.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Dr. Forte’s conclusions
Recall: Dr. Forte wants to compare the effect of’superdrug’ on weight loss in knockout vs C57BL/6mice.C57BL/6 mice: (x̄1 = 5.5), knockout mice (x̄2 = 2.1)We rejected H0, p = 0.03
What does “reject H0” mean here? Data suggests that theaverage weight loss in C57BL/6 mice is significantlygreater than in knockout mice.
What does the p-value mean?
If the 2 species had the same mean weight loss, we wouldhave observed a difference as great or greater 3% of thetime.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Dr. Forte’s conclusions
Recall: Dr. Forte wants to compare the effect of’superdrug’ on weight loss in knockout vs C57BL/6mice.C57BL/6 mice: (x̄1 = 5.5), knockout mice (x̄2 = 2.1)We rejected H0, p = 0.03
What does “reject H0” mean here? Data suggests that theaverage weight loss in C57BL/6 mice is significantlygreater than in knockout mice.
What does the p-value mean?If the 2 species had the same mean weight loss, we wouldhave observed a difference as great or greater 3% of thetime.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Outline
1 Design of experiments: examples
2 Mean, standard deviation, standard error
3 M&M data
4 t-test
5 2-sample t-test
6 Importance of plotting the data
7 1-way ANOVA
8 Summary remarks
Sally W. Thurston, PhD Basic Statistics: MBI-540
Plot the data: why?
1 2
5.6
5.8
6.0
6.2
6.4
6.6
6.8
group
y
●
●
●
●
●
●
Comments?
The data distribution looksreasonable
Sally W. Thurston, PhD Basic Statistics: MBI-540
Plot the data: why?
1 2
5.6
5.8
6.0
6.2
6.4
6.6
6.8
group
y
●
●
●
●
●
●
Comments?The data distribution looksreasonable
Sally W. Thurston, PhD Basic Statistics: MBI-540
Plot the data: why?
1 2
−4
−2
02
46
group
y
●●●●●●
Comments?
One large outlying observationA mistake?Non-normal distribution?
Sally W. Thurston, PhD Basic Statistics: MBI-540
Plot the data: why?
1 2
−4
−2
02
46
group
y
●●●●●●
Comments?One large outlying observation
A mistake?Non-normal distribution?
Sally W. Thurston, PhD Basic Statistics: MBI-540
Plot the data: why?
1 2
300
400
500
600
700
800
900
group
y
●
●
●
●
●
●
Comments?
Variability within groups notsimilarSame data as first plot -EXCEPT data isexponentiatedSuggest: use logtransformationKnowledge of what theoutcomes are can guidedecision about whether totransform
Sally W. Thurston, PhD Basic Statistics: MBI-540
Plot the data: why?
1 2
300
400
500
600
700
800
900
group
y
●
●
●
●
●
●
Comments?Variability within groups notsimilarSame data as first plot -EXCEPT data isexponentiatedSuggest: use logtransformationKnowledge of what theoutcomes are can guidedecision about whether totransform
Sally W. Thurston, PhD Basic Statistics: MBI-540
Outline
1 Design of experiments: examples
2 Mean, standard deviation, standard error
3 M&M data
4 t-test
5 2-sample t-test
6 Importance of plotting the data
7 1-way ANOVA
8 Summary remarks
Sally W. Thurston, PhD Basic Statistics: MBI-540
Dr. Crescendo
Dr. Crescendo wants to test the effects of drugs A, B, andC on cell counts in 96-well platesSome things to consider
Is each drug tested on multiple plates? (plate effect)Are the wells at the edge of the plate different from others?(edge effect)Are the wells for drug A always read first? (order effect)
Sally W. Thurston, PhD Basic Statistics: MBI-540
Set up for 1-way ANOVA
Dr. Crescendo took 5 measurements from each of the 3treatments (A,B,C)µ1, µ2, µ3 are population means of treatment effects A, B,and C.Hypotheses:
H0 : µ1 = µ2 = µ3HA : at least 2 means are not equal.
How to test H0, HA?
Sally W. Thurston, PhD Basic Statistics: MBI-540
Pictures for 1-way ANOVA
1 2 3
45
67
8
group
y
●
●
●
●
●
1 2 3
45
67
8group
y
●
●
●
●
●
How do the 2 pictures differ?
Right: 3 means are more different from each otherRight: Variability between means is also more differentANOVA compares variability between treatments tovariability within treatments
Sally W. Thurston, PhD Basic Statistics: MBI-540
Pictures for 1-way ANOVA
1 2 3
45
67
8
group
y
●
●
●
●
●
1 2 3
45
67
8group
y
●
●
●
●
●
How do the 2 pictures differ?Right: 3 means are more different from each other
Right: Variability between means is also more differentANOVA compares variability between treatments tovariability within treatments
Sally W. Thurston, PhD Basic Statistics: MBI-540
Pictures for 1-way ANOVA
1 2 3
45
67
8
group
y
●
●
●
●
●
1 2 3
45
67
8group
y
●
●
●
●
●
How do the 2 pictures differ?Right: 3 means are more different from each otherRight: Variability between means is also more different
ANOVA compares variability between treatments tovariability within treatments
Sally W. Thurston, PhD Basic Statistics: MBI-540
Pictures for 1-way ANOVA
1 2 3
45
67
8
group
y
●
●
●
●
●
1 2 3
45
67
8group
y
●
●
●
●
●
How do the 2 pictures differ?Right: 3 means are more different from each otherRight: Variability between means is also more differentANOVA compares variability between treatments tovariability within treatments
Sally W. Thurston, PhD Basic Statistics: MBI-540
Dr Crescendo’s data
1 2 3
5.5
6.0
6.5
7.0
group
y
●
●
●
●●
Dr. Crescendo's data
Sample sizes:n1 = n2 = n3 = 5.Sample means: x̄1 =5.49, x̄2 = 6.07, x̄3 = 6.38.Sample variances: s2
1 =0.29, s2
2 = 0.48, s23 = 0.66
Sally W. Thurston, PhD Basic Statistics: MBI-540
1-way ANOVA: basics
ANOVA assumes Var(x1) = Var(x2) = Var(x3)
Call this σ2
Variability within treatments is weighted average ofs2
1, s22, s2
3.degrees of freedom (df) = n1 + n2 + n3 − 3 (here: df=12)Dr. Crescendo: within treatment variance is3.016/12 = 0.25
Variability between treatments is weighted average of howfar apart group means are from overall mean
df = number groups - 1 (here: df=2)Dr. Crescendo: between treatment variance is2.063/2 = 1.03
Sally W. Thurston, PhD Basic Statistics: MBI-540
1-way ANOVA: basics
ANOVA assumes Var(x1) = Var(x2) = Var(x3)
Call this σ2
Variability within treatments is weighted average ofs2
1, s22, s2
3.degrees of freedom (df) = n1 + n2 + n3 − 3 (here: df=12)Dr. Crescendo: within treatment variance is3.016/12 = 0.25
Variability between treatments is weighted average of howfar apart group means are from overall mean
df = number groups - 1 (here: df=2)Dr. Crescendo: between treatment variance is2.063/2 = 1.03
Sally W. Thurston, PhD Basic Statistics: MBI-540
1-way ANOVA: basics
ANOVA assumes Var(x1) = Var(x2) = Var(x3)
Call this σ2
Variability within treatments is weighted average ofs2
1, s22, s2
3.degrees of freedom (df) = n1 + n2 + n3 − 3 (here: df=12)Dr. Crescendo: within treatment variance is3.016/12 = 0.25
Variability between treatments is weighted average of howfar apart group means are from overall mean
df = number groups - 1 (here: df=2)Dr. Crescendo: between treatment variance is2.063/2 = 1.03
Sally W. Thurston, PhD Basic Statistics: MBI-540
Testing in 1-way ANOVA (3 groups)
Recall: 1-way ANOVA comparesVariability within treatment: MS(Within) = MS(Error)Variability between treatment: MS(Between) = MS(Group)(Note: MS = mean square)
Dr. Crescendo:MS(Error) = 3.016
12 = 0.25MS(Between) = 2.063
2 = 1.03
ANOVA tableDf Sum Sq Mean Sq
Between 2 2.06337 1.03169Error 12 3.01612 0.25134
Sally W. Thurston, PhD Basic Statistics: MBI-540
Testing in 1-way ANOVA (3 groups)
Recall: 1-way ANOVA comparesVariability within treatment: MS(Within) = MS(Error)Variability between treatment: MS(Between) = MS(Group)(Note: MS = mean square)
Dr. Crescendo:MS(Error) = 3.016
12 = 0.25MS(Between) = 2.063
2 = 1.03
ANOVA tableDf Sum Sq Mean Sq
Between 2 2.06337 1.03169Error 12 3.01612 0.25134
Sally W. Thurston, PhD Basic Statistics: MBI-540
1-way ANOVA: basics
If H0 trueHow do MS(Within) and MS(Between) compare?How could we estimate σ2?
MS(Between) ≈ MS(Within), and both estimate σ2
MS(Between)MS(Within) close to 1.
If H0 not true, same questionsOnly MS(Within) estimates σ2.
MS(Between) > MS(Within) so MS(Between)MS(Within) > 1
Sally W. Thurston, PhD Basic Statistics: MBI-540
1-way ANOVA: basics
If H0 trueHow do MS(Within) and MS(Between) compare?How could we estimate σ2?MS(Between) ≈ MS(Within), and both estimate σ2
MS(Between)MS(Within) close to 1.
If H0 not true, same questionsOnly MS(Within) estimates σ2.
MS(Between) > MS(Within) so MS(Between)MS(Within) > 1
Sally W. Thurston, PhD Basic Statistics: MBI-540
1-way ANOVA: basics
If H0 trueHow do MS(Within) and MS(Between) compare?How could we estimate σ2?MS(Between) ≈ MS(Within), and both estimate σ2
MS(Between)MS(Within) close to 1.
If H0 not true, same questions
Only MS(Within) estimates σ2.
MS(Between) > MS(Within) so MS(Between)MS(Within) > 1
Sally W. Thurston, PhD Basic Statistics: MBI-540
1-way ANOVA: basics
If H0 trueHow do MS(Within) and MS(Between) compare?How could we estimate σ2?MS(Between) ≈ MS(Within), and both estimate σ2
MS(Between)MS(Within) close to 1.
If H0 not true, same questionsOnly MS(Within) estimates σ2.
MS(Between) > MS(Within) so MS(Between)MS(Within) > 1
Sally W. Thurston, PhD Basic Statistics: MBI-540
Testing in 1-way ANOVA (3 groups)
H0 : µ1 = µ2 = µ3, HA : at least 2 means are not equal.
ANOVA tableDf Sum Sq Mean Sq F value Pr(>F)
Between 2 2.06337 1.03169 4.1047 0.04383Error 12 3.01612 0.25134
Test of H0 is Fcalc =MS(Between)
MS(Error) = 4.10.
If H0 true (and assumptions met!), Fcalc ∼ F2,12
Sally W. Thurston, PhD Basic Statistics: MBI-540
Testing in 1-way ANOVA (3 groups)
H0 : µ1 = µ2 = µ3, HA : at least 2 means are not equal.
ANOVA tableDf Sum Sq Mean Sq F value Pr(>F)
Between 2 2.06337 1.03169 4.1047 0.04383Error 12 3.01612 0.25134
Test of H0 is Fcalc =MS(Between)
MS(Error) = 4.10.
If H0 true (and assumptions met!), Fcalc ∼ F2,12
Sally W. Thurston, PhD Basic Statistics: MBI-540
Testing in 1-way ANOVA (3 groups)
H0 : µ1 = µ2 = µ3, HA : at least 2 means are not equal.
ANOVA tableDf Sum Sq Mean Sq F value Pr(>F)
Between 2 2.06337 1.03169 4.1047 0.04383Error 12 3.01612 0.25134
Test of H0 is Fcalc =MS(Between)
MS(Error) = 4.10.
If H0 true (and assumptions met!), Fcalc ∼ F2,12
Sally W. Thurston, PhD Basic Statistics: MBI-540
Some F distributions
0 1 2 3 4 5
0.0
0.2
0.4
0.6
0.8
1.0
df=2,12df=3,12df=8,12df=8,100
Sally W. Thurston, PhD Basic Statistics: MBI-540
Dr. Crescendo’s Fcalc
0 1 2 3 4 5 6
0.0
0.2
0.4
0.6
0.8
1.0
F dist, 2 and 12 df5% cutpt = 3.89
Generally: only large valuesof Fcalc considered unusual(1-tail)Reject H0 ifFcalc > F2,12,.95 = 3.89.Dr. Crescendo:Fcalc = 1.03/0.25 = 4.10,p = 0.044. Reject H0.
Sally W. Thurston, PhD Basic Statistics: MBI-540
Outline
1 Design of experiments: examples
2 Mean, standard deviation, standard error
3 M&M data
4 t-test
5 2-sample t-test
6 Importance of plotting the data
7 1-way ANOVA
8 Summary remarks
Sally W. Thurston, PhD Basic Statistics: MBI-540
Interacting with statistician
“What questions should I ask the statistician?”
Who asks the first questions?I ask: “Tell me about your experiment and your question.”We need some background
If designing an experiment:I ask: “What is your research question (explain in terms ofdata you plan to collect)?”We can help you
Decide what data to collect to answer questionHow to design experimentFigure out reasonable sample size (but need furtherinformation from you: estimated effect)
For data already collectedI ask: “What is your research question (explained in termsof data)?”Plots of data helpful
Sally W. Thurston, PhD Basic Statistics: MBI-540
Interacting with statistician
“What questions should I ask the statistician?”Who asks the first questions?
I ask: “Tell me about your experiment and your question.”We need some background
If designing an experiment:I ask: “What is your research question (explain in terms ofdata you plan to collect)?”We can help you
Decide what data to collect to answer questionHow to design experimentFigure out reasonable sample size (but need furtherinformation from you: estimated effect)
For data already collectedI ask: “What is your research question (explained in termsof data)?”Plots of data helpful
Sally W. Thurston, PhD Basic Statistics: MBI-540
How a statistician can help
Design of experimentStatistician must understand study aims and backgroundCan help investigator decide
what data to collecthow to design data collection processhow to use data to answer research questions
AnalysisDetermine appropriate modelUnderstand and check assumptions (use different model ifviolated)Carry out analysis
Sally W. Thurston, PhD Basic Statistics: MBI-540
What does Biostatistics Dept offer?
CoursesBST463 = Introduction to Biostatistics (fall semester)
Consulting serviceRotation system: faculty and PhD student or postdocFree assistance with grant preparation if appropriate %effort for BiostatisticsFree 1-hour consultation on anythingIf related to CTSI, voucher possible for up to 10 free hoursStudent projects: faculty mentor must attend initial meetingand most other meetings with BiostatisticiansFaculty advisor can contact Susan Messing(Susan_Messing@urmc.rochester.edu) for appointment,see http://www.urmc.rochester.edu/biostat/consulting/
Sally W. Thurston, PhD Basic Statistics: MBI-540
Parting remarks
Statistics is not just “crunching numbers”Requires understanding of model and assumptions.
Good experimental design and data collection is keyPoor design: may be impossible to make valid inferenceabout population.Biased data collection: may never be able to adjust for bias.Good design: enables statistical inference about populationof inference based on data collected.
Sally W. Thurston, PhD Basic Statistics: MBI-540