day6 testing proportions - sjspielman.github.io · Hypothesis testing frameworks t-testscompare...

Testing proportions BIO5312 FALL2017 STEPHANIE J. SPIELMAN, PHD


Testing proportions
BIO5312 FALL2017

STEPHANIE J. SPIELMAN, PHD


Estimation
An estimator is a statistic (~formula) for estimating a parameter

A good estimator is unbiased
◦ The expected value (expectation) of the estimator should equal the parameter being estimated
◦ Mean of the sampling distribution of the statistic should equal the parameter being estimated

A good estimator is consistent
◦ Increasing the sample size produces an estimate with smaller SE

A good estimator is efficient
◦ Has the smallest SE among any estimator you could have chosen


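We are usually interested in point estimate, SE, and CI
Normally-distributed variable
◦ $\hat{\mu} = \bar{x}$
◦ $\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
◦ Known σ
  ◦ $SE = \sigma / \sqrt{n}$
  ◦ 95% CI = $\bar{x} \pm Z_{0.025} \, SE$
◦ Unknown σ
  ◦ $SE = s / \sqrt{n}$
  ◦ 95% CI = $\bar{x} \pm t_{0.025} \, SE$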


Hypothesis testing frameworks
t-tests compare means for continuous quantitative data

Today we will learn to analyze discrete count data ("proportions"):
◦ Binomial test
◦ χ² goodness-of-fit
◦ Contingency table analysis
  ◦ χ² association/homogeneity and Fisher exact test


Binomial test
$P(k \text{ successes}) = \binom{n}{k} p^k (1 - p)^{n - k} = \binom{n}{k} p^k q^{n - k}$
◦ Binomial coefficient: $\binom{n}{k} = \frac{n!}{k! \, (n - k)!}$

Hypothesis test:
◦ H0: The relative frequency of success in the underlying population is p0
◦ HA: The relative frequency of success in the underlying population is not p0
◦ HA: The relative frequency of success in the underlying population is >/< p0

p0: null proportion of successes to test against
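As a quick check of the formula above, the binomial probability can be computed by hand in R and compared with the built-in `dbinom()`. This is only a minimal sketch: the values n = 12, k = 5 and p = 0.3 are illustrative choices, and the variable names are hypothetical.

```r
# Binomial probability of exactly k successes in n trials, computed by hand
n <- 12   # number of trials (illustrative value)
k <- 5    # number of successes (illustrative value)
p <- 0.3  # probability of success on each trial (illustrative value)

by_hand  <- choose(n, k) * p^k * (1 - p)^(n - k)
built_in <- dbinom(k, size = n, prob = p)

# The two values should agree up to floating-point error
print(c(by_hand = by_hand, built_in = built_in))
```

`choose(n, k)` is the binomial coefficient, so the first expression is a direct transcription of the formula.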


Binomial test assumption: BInS conditions are satisfied

Binary outcomes

Independent trials (outcomes do not influence each other)

n is fixed before the trials begin

Same probability of success, p, for all trials


Binomial test: Example
In a certain species of wasp, each wasp has a 30% chance of being male. I collect 12 wasps, of which 5 are male. Does my sample show evidence that 30% of wasps are male? Use α = 0.05.

In other words, is the observed success proportion 5/12 (41.67%) consistent with a population whose probability of success is 0.3?


Verifying assumptions
Binary outcomes: Male or female

Independent trials: Wasp sex does not influence sex of other wasps

n is fixed before the trials begin: I collect 12 wasps

Same probability of success, p, for all trials: P(male) = 0.3 for every wasp


Performing the binomial test
My sample:
◦ $\hat{p}$ = 5/12 = 0.417
◦ n = 12
◦ X = 5 (we generally say X instead of k when performing hypothesis tests, by convention)

H0: The probability of being a male wasp is p0 = 0.3

HA: The probability of being a male wasp differs from p0 = 0.3


The PMF for wasp sex

[Figure: bar chart of the binomial PMF. x-axis: Number of males (successes), 0 to 12; y-axis: Probability mass. Bar heights for 0 through 12 males: 0.014, 0.071, 0.17, 0.24, 0.23, 0.16, 0.079, 0.029, 0.0078, 0.0015, 0.00019, 1.5e-05, 5.3e-07.]

The sampling distribution for the binomial test statistic is binomial: this is effectively our null.


Performing the test
Recall, the P-value is the probability of obtaining a result as extreme or more extreme than the one observed
◦ Therefore, P-value is P(number of successes >= 5)

$P(X \ge 5) = \binom{12}{5} 0.3^{5} \, 0.7^{(12-5)} + \binom{12}{6} 0.3^{6} \, 0.7^{(12-6)} + \cdots + \binom{12}{12} 0.3^{12} \, 0.7^{(12-12)}$

p0 = 0.3, n = 12, X = 5

[Figure: the binomial PMF for wasp sex. x-axis: Number of males (successes); y-axis: Probability mass.]

> 1 - pbinom(4, 12, 0.3)
[1] 0.2763445


Conclusions, round 1
Our P-value of 0.276 is much greater than α. Therefore we fail to reject the null hypothesis and we have no evidence that the population proportion of males corresponding to our sample differs from 0.3.
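The one-sided P-value quoted above can be reproduced in more than one way. The sketch below, with hypothetical variable names, sums the null PMF over the observed count and everything larger, and compares that with the cumulative-distribution route.

```r
# Upper-tail P-value for the wasp example: P(X >= 5) under H0
n  <- 12   # number of wasps collected
p0 <- 0.3  # null probability of being male
x  <- 5    # observed number of males

# Sum the null PMF over the observed count and every larger count
p_upper_sum <- sum(dbinom(x:n, size = n, prob = p0))

# Same quantity through the cumulative distribution function
p_upper_cdf <- 1 - pbinom(x - 1, size = n, prob = p0)

# pbinom() can also return the upper tail directly
p_upper_tail <- pbinom(x - 1, size = n, prob = p0, lower.tail = FALSE)

print(c(p_upper_sum, p_upper_cdf, p_upper_tail))
```

Note the `x - 1`: `pbinom(q, ...)` returns P(X <= q), so P(X >= 5) is the complement of P(X <= 4).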


Notes on binomial tests
Computing two-sided P-values is non-trivial
◦ Binomial distribution symmetric only when p = 0.5

> binom.test(5, 12, 0.3)

        Exact binomial test

data:  5 and 12
number of successes = 5, number of trials = 12, p-value = 0.3614
alternative hypothesis: true probability of success is not equal to 0.3
95 percent confidence interval:
 0.1516522 0.7233303
sample estimates:
probability of success
             0.4166667

This is not 0.276*2!
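One common way to define a two-sided P-value for an asymmetric binomial is to add up the null probabilities of every outcome that is no more likely than the observed one. The sketch below applies that rule to the wasp data. It illustrates the idea and is not meant as a description of R's internals; the small tolerance factor is an assumption added here to guard against floating-point ties.

```r
n  <- 12   # number of wasps collected
p0 <- 0.3  # null probability of being male
x  <- 5    # observed number of males

probs <- dbinom(0:n, size = n, prob = p0)  # null PMF for 0..12 males
p_obs <- dbinom(x, size = n, prob = p0)    # probability of the observed count

# Sum every outcome that is at most as likely as the observed one
tol <- 1 + 1e-7                            # assumed tolerance for near-ties
p_two_sided <- sum(probs[probs <= p_obs * tol])

print(p_two_sided)
print(binom.test(x, n, p0)$p.value)        # printed for comparison
```

With these data the rule picks up 0 and 1 males from the lower tail in addition to 5 through 12 males, which is why the result is not simply twice the one-sided value.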


Computing the binomial standard error
$SE_{\hat{p}} = \sqrt{\hat{p}(1 - \hat{p}) / n} = \sqrt{\frac{0.417 (1 - 0.417)}{12}} = 0.142$

What is this value?
1. The standard deviation of the sampling distribution of the probability of success
2. Quantifies the precision of $\hat{p}$, our estimate of the population prob. of success


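Computing the binomial confidence interval
Classically, we use the Wald method
◦ Note: Only "precise" when p is not very large (>0.8) or small (<0.2)
◦ $\hat{p}$ is the estimated proportion of success, X/n = 0.417
◦ $Z_{0.025}$ is 1.96
◦ $SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.417 (1 - 0.417)}{12}} = 0.142$

$\hat{p} - Z_{0.025} \cdot SE_{\hat{p}} < p < \hat{p} + Z_{0.025} \cdot SE_{\hat{p}}$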


Calculating the binomial CI
$\hat{p} - Z_{0.025} \cdot SE_{\hat{p}} < p < \hat{p} + Z_{0.025} \cdot SE_{\hat{p}}$

0.417 - 0.278 < p < 0.417 + 0.278 → 0.417 ± 0.278

> binom.test(5, 12, 0.3)

        Exact binomial test

data:  5 and 12
number of successes = 5, number of trials = 12, p-value = 0.3614
alternative hypothesis: true probability of success is not equal to 0.3
95 percent confidence interval:
 0.1516522 0.7233303
sample estimates:
probability of success
             0.4166667

R uses a more exact method, the Clopper-Pearson interval
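To see the two intervals side by side, the sketch below computes the Wald interval from the formula above and a Clopper-Pearson interval from beta quantiles. The beta-quantile form is the standard textbook expression for Clopper-Pearson and is an addition here, not something shown on the slides; the variable names are hypothetical.

```r
x <- 5         # observed number of males
n <- 12        # number of wasps collected
alpha <- 0.05  # 1 - confidence level

p_hat <- x / n
se    <- sqrt(p_hat * (1 - p_hat) / n)
z     <- qnorm(1 - alpha / 2)   # about 1.96

# Wald interval: estimate +/- z * SE
wald <- c(p_hat - z * se, p_hat + z * se)

# Clopper-Pearson interval from beta quantiles (valid for 0 < x < n)
cp <- c(qbeta(alpha / 2, x, n - x + 1),
        qbeta(1 - alpha / 2, x + 1, n - x))

print(round(wald, 3))
print(round(cp, 3))
```

The Wald limits differ slightly from 0.417 ± 0.278 because the sketch does not round the intermediate values. The Clopper-Pearson interval is not symmetric around the estimate.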


Final conclusions
Our P-value of 0.276 is much greater than α. Therefore we fail to reject the null hypothesis and we have no evidence that the population proportion of males corresponding to our sample differs from 0.3.

Our estimated proportion of success is 0.417 with SE = 0.142 and a 95% CI of 0.417 ± 0.278.


Pause: Binomial exercise


Use χ² goodness-of-fit test if we do not have binary outcomes
Goodness-of-fit test asks if observed proportions are equal to a null proportion

df = (number of categories) - 1 - (number of parameters estimated from data)
◦ The last term is 0 for goodness-of-fit test


Example: Are babies born with the same frequency every day of the week?

[Figure: bar chart of births by day of the week. x-axis: Sun. through Sat.; y-axis: Frequency, 0 to 70.]

Day in 1999   # births
Sunday        33
Monday        41
Tuesday       63
Wednesday     63
Thursday      47
Friday        56
Saturday      47

H0: The probability of birth was the same every day of the week in 1999.

HA: The probability of birth was not the same every day of the week in 1999.


Test statistic
$\chi^2 = \sum_i \frac{(\#\text{observed}_i - \#\text{expected}_i)^2}{\#\text{expected}_i}$

Day         # Observed births   # days in 1999   Expected prop     # Expected births
Sunday      33                  52               52/365 = 0.142    0.142*350 = 49.863
Monday      41                  52               0.142             49.863
Tuesday     63                  52               0.142             49.863
Wednesday   63                  52               0.142             49.863
Thursday    47                  52               0.142             49.863
Friday      56                  53               0.145             50.822
Saturday    47                  52               0.142             49.863
Total       350                 365              1                 350


Calculating the test statistic and df

Day         # Observed births   # Expected births
Sunday      33                  0.142*350 = 49.863
Monday      41                  49.863
Tuesday     63                  49.863
Wednesday   63                  49.863
Thursday    47                  49.863
Friday      56                  50.822
Saturday    47                  49.863
Total       350                 350

$\chi^2 = \sum_i \frac{(\#\text{observed}_i - \#\text{expected}_i)^2}{\#\text{expected}_i}$

$= \frac{(33 - 49.863)^2}{49.863} + \frac{(41 - 49.863)^2}{49.863} + \frac{(63 - 49.863)^2}{49.863} + \frac{(63 - 49.863)^2}{49.863} + \frac{(47 - 49.863)^2}{49.863} + \frac{(56 - 50.822)^2}{50.822} + \frac{(47 - 49.863)^2}{49.863}$

$= 15.05$

df = # categories - 1 = 7 - 1 = 6

Our categorical variable is Days of week. It has seven categories.
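The hand calculation above can be reproduced in a few lines of R. This is a minimal sketch using the counts from the table; the variable names are hypothetical.

```r
births <- c(33, 41, 63, 63, 47, 56, 47)  # observed births, Sunday..Saturday
days   <- c(52, 52, 52, 52, 52, 53, 52)  # occurrences of each weekday in 1999

expected_prop  <- days / sum(days)              # null proportions
expected_count <- expected_prop * sum(births)   # expected births per weekday

# Pearson goodness-of-fit statistic and its degrees of freedom
chi2 <- sum((births - expected_count)^2 / expected_count)
df   <- length(births) - 1

# Upper-tail P-value from the chi-squared distribution
p_value <- pchisq(chi2, df, lower.tail = FALSE)

print(c(chi2 = chi2, df = df, p_value = p_value))
```

The expected counts come from multiplying each null proportion by the total number of births (350), not by the number of days.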


Reports and conclusions

At P = 0.0199, we reject the null hypothesis that births are equally distributed across days in 1999. We have evidence that frequency of births differs across days.

[Figure: χ² probability density with 6 degrees of freedom, with the observed statistic χ² = 15.05 marked. x-axis: χ²; y-axis: Probability density.]

> 1 - pchisq(15.05, 6)
[1] 0.01987137


Notes on χ² goodness-of-fit test
Assumptions for all χ² tests
◦ Randomly sampled data from population
◦ Two or more categories of a categorical variable (data is counts)
◦ Expected frequencies must be >= 1
◦ No more than 20% of expected frequencies are < 5

The P-value uses only the upper tail (values >= the test statistic)
◦ General to all χ² tests


χ² goodness-of-fit in R
#### Prepare data: Observed counts and expected proportions ####
> births <- c(33,41,63,63,47,56,47)
> expected <- c(52,52,52,52,52,53,52)
> expected <- expected/sum(expected)
> expected
[1] 0.1424658 0.1424658 0.1424658 0.1424658 0.1424658 0.1452055 0.1424658

> chisq.test(births, p = expected)

        Chi-squared test for given probabilities

data:  births
X-squared = 15.057, df = 6, p-value = 0.01982


Binomial is preferred for two groups
Temple University students are 52% female, 48% male. Does this class reflect the Temple student population?

We have 19 students: 7 females and 12 males.


Binomial P-values are more precise
> binom.test(7, 19, 0.52)

        Exact binomial test

data:  7 and 19
number of successes = 7, number of trials = 19, p-value = 0.251
alternative hypothesis: true probability of success is not equal to 0.52
95 percent confidence interval:
 0.1628859 0.6164221
sample estimates:
probability of success
             0.3684211

> chisq.test(c(7,12), p = c(0.52, 0.48))

        Chi-squared test for given probabilities

data:  c(7, 12)
X-squared = 1.749, df = 1, p-value = 0.186


Pause: Goodness of fit exercise


Contingency table analysis
Test for an association between two (or more) categorical variables
◦ Are heart attacks more likely for people who take aspirin daily?
◦ Are smokers more likely to drink than non-smokers?

Two flavors:
◦ χ² test for independence (or homogeneity)
◦ Fisher's Exact test


Contingency tables show associated counts for two+ categorical variables

                  Takes daily aspirin   No daily aspirin
Heart attack      75                    62
No heart attack   108                   71


Example: χ² test for independence/association

[Figure: Life cycle of R. ondatrae]

                    Uninfected frog   Infected frog
Eaten by bird       1                 47
Not eaten by bird   49                44

2 variables:
◦ Eaten (2 categories yes/no)
◦ Infected (2 categories yes/no)


Example: χ² test for independence

                Uninfected frog   Infected frog   TOTAL
Eaten by bird   1                 47              48
Not eaten       49                44              93
TOTAL           50                91              141

H0: Infection and being eaten are independent

HA: Infection and being eaten are not independent


Computing the test statistic

$\chi^2 = \sum_r \sum_c \frac{(\#\text{observed}_{r,c} - \#\text{expected}_{r,c})^2}{\#\text{expected}_{r,c}}$

            Uninfected   Infected   TOTAL
Eaten       1            47         48
Not eaten   49           44         93
TOTAL       50           91         141

Under the null hypothesis, the variables are independent. Expected calculations employ P[A and B] = P[A] x P[B]

P[eaten and uninfected] = P[eaten] x P[uninfected] = 48/141 x 50/141 = 0.1207

Expected count = P[eaten and uninfected] x total = 17.02… = (row/total) x (column/total) x (total)
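The same rule gives the expected count for every cell at once. The sketch below builds the observed table as laid out on this slide and uses `outer()` on the margins; the object names are hypothetical.

```r
# Observed counts as on the slide: rows = eaten / not eaten,
# columns = uninfected / infected
observed <- matrix(c(1, 47,
                     49, 44),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Eaten", "Not eaten"),
                                   c("Uninfected", "Infected")))

n <- sum(observed)  # grand total, 141

# Expected count per cell = (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / n

print(round(expected, 2))
```

Dividing the product of the margins by the grand total is algebraically the same as (row/total) x (column/total) x (total).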


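Performing the test

$\chi^2 = \sum_r \sum_c \frac{(\#\text{observed}_{r,c} - \#\text{expected}_{r,c})^2}{\#\text{expected}_{r,c}}$

$= \frac{(1 - 17.02)^2}{17.02} + \frac{(44 - 30.9)^2}{30.9} + \frac{(49 - 33.3)^2}{33.3} + \frac{(47 - 60.2)^2}{60.2} = 31.9$

df = (#r - 1)(#c - 1) = (2 - 1)(2 - 1) = 1

Observed counts, with expected counts in parentheses:

            Uninfected   Infected
Eaten       1 (17.02)    44 (30.9)
Not eaten   49 (33.3)    47 (60.2)

> 1 - pchisq(31.9, 1)
[1] 1.623172e-08

We reject the null hypothesis (P << α) that infection and being eaten are independent. We have evidence that being infected with this trematode is associated with being eaten by a bird.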


Performing the test in R
> data.table <- rbind(c(1,49), c(44,47))
> data.table
     [,1] [,2]
[1,]    1   49
[2,]   44   47

> chisq.test(data.table)

        Pearson's Chi-squared test with Yates' continuity correction

data:  data.table
X-squared = 29.809, df = 1, p-value = 4.768e-08

> chisq.test(data.table, correct=FALSE)

        Pearson's Chi-squared test

data:  data.table
X-squared = 31.906, df = 1, p-value = 1.618e-08

The uncorrected result is what we calculated on the last slide ("R as calculator"). Differences are from using rounded expected counts.


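Yates continuity correction

Without correction:
$\chi^2 = \sum_r \sum_c \frac{(\#\text{observed}_{r,c} - \#\text{expected}_{r,c})^2}{\#\text{expected}_{r,c}}$

Yates continuity correction:
$\chi^2 = \sum_r \sum_c \frac{(|\#\text{observed}_{r,c} - \#\text{expected}_{r,c}| - 0.5)^2}{\#\text{expected}_{r,c}}$

Decreases the test statistic and increases the P-value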
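Both versions of the statistic are short enough to compute by hand in R. The sketch below uses the same matrix that was passed to `chisq.test()` on the "Performing the test in R" slide, so the results can be compared against that output; the object names are hypothetical.

```r
# Same matrix as passed to chisq.test() on the earlier slide
tab <- rbind(c(1, 49), c(44, 47))

# Expected counts from the margins
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic without correction
chi2_plain <- sum((tab - expected)^2 / expected)

# Yates: shrink each |observed - expected| by 0.5 before squaring
chi2_yates <- sum((abs(tab - expected) - 0.5)^2 / expected)

# Upper-tail P-values with df = (2 - 1)(2 - 1) = 1
p_plain <- pchisq(chi2_plain, df = 1, lower.tail = FALSE)
p_yates <- pchisq(chi2_yates, df = 1, lower.tail = FALSE)

print(c(chi2_plain = chi2_plain, chi2_yates = chi2_yates))
print(c(p_plain = p_plain, p_yates = p_yates))
```

Because the correction only ever reduces the numerator, the corrected statistic is smaller and its P-value larger.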


Odds
The odds of success are the probability of success divided by the probability of failure

$O = \frac{p}{1 - p}$

The odds of being eaten while infected

$O = \frac{P[\text{eaten and infected}]}{1 - P[\text{eaten and infected}]} = \frac{47/91}{1 - 47/91} = 1.07$

$O = \frac{P[\text{eaten and infected}]}{P[\text{not eaten and infected}]} = \frac{47}{44} = 1.07$

            Uninfected   Infected   TOTAL
Eaten       1            47         48
Not eaten   49           44         93
TOTAL       50           91         141


Odds ratio, for 2x2 tables
The odds ratio is the odds of success in one group divided by odds of success in a second group

$OR = \frac{p_1 / (1 - p_1)}{p_2 / (1 - p_2)}$

Interpretation
◦ OR = 1: Odds of success is the same for either group
◦ OR < 1: Odds of success in group 2 are higher than group 1
◦ OR > 1: Odds of success in group 1 are higher than group 2

ORs quantify the deviation from null in 2x2 contingency table tests.


Odds ratio calculations: Are the odds higher that you are eaten while infected?

$O_1 = \frac{P[\text{eaten and infected}]}{1 - P[\text{eaten and infected}]} = \frac{P[\text{eaten and infected}]}{P[\text{not eaten and infected}]} = \frac{47}{44} = 1.07$

$O_2 = \frac{P[\text{eaten and uninfected}]}{1 - P[\text{eaten and uninfected}]} = \frac{P[\text{eaten and uninfected}]}{P[\text{not eaten and uninfected}]} = \frac{1}{49} = 0.02$

$OR = \frac{1.07}{0.02} = 52.3$

Infected frogs have 52.3 times the odds of being eaten compared to uninfected frogs.

            Uninfected   Infected   TOTAL
Eaten       1            47         48
Not eaten   49           44         93
TOTAL       50           91         141



Odds ratio calculations: Are the odds higher that you are eaten while infected?

$O_1 = \frac{P[\text{eaten and infected}]}{P[\text{not eaten and infected}]} = 1.07$

$O_2 = \frac{P[\text{eaten and uninfected}]}{P[\text{not eaten and uninfected}]} = 0.02$

Infected frogs have 52.3 times the odds of being eaten compared to uninfected frogs.

            Uninfected   Infected   TOTAL
Eaten       1            47         48
Not eaten   49           44         93
TOTAL       50           91         141

$O_1 = \frac{P[\text{infected and eaten}]}{P[\text{uninfected and eaten}]} = \frac{47}{1} = 47$

$O_2 = \frac{P[\text{infected and uneaten}]}{P[\text{uninfected and uneaten}]} = \frac{44}{49} = 0.898$

$OR = \frac{47}{0.898} = 52.3$

Eaten frogs have 52.3 times the odds of being infected compared to uneaten frogs.


There are two ways to calculate OR
One will be >1 (52.3) and one will be <1 (1/52.3 = 0.019)
◦ We generally use the >1 option
◦ Convince yourself that this is true.

Fun fact: $OR = \frac{a \cdot d}{b \cdot c} = \frac{1 \cdot 44}{49 \cdot 47} = 0.019$

Often we report log odds = ln(OR)
> log(52.3)
[1] 3.956996

            Uninfected   Infected   TOTAL
Eaten       a = 1        c = 47     48
Not eaten   b = 49       d = 44     93
TOTAL       50           91         141
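The cross-product shortcut makes both versions of the odds ratio easy to check. In this sketch the cell labels follow the table above; the names `n_a` through `n_d` are hypothetical and chosen to avoid masking R's `c()` function.

```r
# Cell counts, labelled as in the table: a, b = uninfected; c, d = infected
n_a <- 1    # eaten, uninfected
n_b <- 49   # not eaten, uninfected
n_c <- 47   # eaten, infected
n_d <- 44   # not eaten, infected

# Cross-product form of the odds ratio and its reciprocal
or_below_one <- (n_a * n_d) / (n_b * n_c)
or_above_one <- 1 / or_below_one

# Natural log of the odds ratio; log() in R is the natural logarithm
log_or <- log(or_above_one)

print(c(or_below_one = or_below_one,
        or_above_one = or_above_one,
        log_or = log_or))
```

Taking the log of the reciprocal just flips the sign, so the two ways of writing the OR give log odds ratios of equal size and opposite direction.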


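Calculating the OR standard error

$SE_{\ln(\widehat{OR})} = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$

$SE_{\ln(\widehat{OR})} = \sqrt{\frac{1}{1} + \frac{1}{49} + \frac{1}{47} + \frac{1}{44}} = 1.03$

        blah   blah2
blob    a      c
blob1   b      d

            Uninfected   Infected
Eaten       1            47
Not eaten   49           44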


Calculating the log odds CI

$\ln(\widehat{OR}) - Z_{0.025} \cdot SE_{\ln(\widehat{OR})} < \ln(OR) < \ln(\widehat{OR}) + Z_{0.025} \cdot SE_{\ln(\widehat{OR})}$

3.96 - (1.96*1.03) < ln(OR) < 3.96 + (1.96*1.03) → 3.96 ± 2.02
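The standard error and interval can be reproduced as follows. This is a sketch with hypothetical variable names; the final back-transformation to the odds-ratio scale with `exp()` is an addition that is not on the slides.

```r
# Frog counts: a, b = uninfected (eaten, not eaten); c, d = infected
n_a <- 1
n_b <- 49
n_c <- 47
n_d <- 44

# Log odds ratio, using the > 1 orientation
log_or <- log((n_b * n_c) / (n_a * n_d))

# Standard error of the log odds ratio
se_log_or <- sqrt(1 / n_a + 1 / n_b + 1 / n_c + 1 / n_d)

# 95% CI on the log scale
z <- qnorm(0.975)   # about 1.96
ci_log <- c(log_or - z * se_log_or, log_or + z * se_log_or)

# Back-transform to the odds-ratio scale (not shown on the slides)
ci_or <- exp(ci_log)

print(round(c(log_or = log_or, se = se_log_or), 2))
print(round(ci_log, 2))
print(round(ci_or, 1))
```

The single uninfected frog that was eaten contributes 1/1 to the sum under the square root, so it dominates the standard error.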


Conclusions, with log odds
We reject the null hypothesis (P << α) that infection and being eaten are independent. We have evidence that being infected with this trematode is associated with being eaten by a bird.

Furthermore, frogs that are eaten are more likely to be infected compared to uneaten frogs, with a log odds ratio of 3.96 and log odds CI of 1.94 – 5.98.


χ² test for homogeneity
Independence: measure two properties from one set of subjects
◦ We measured eaten and infection for frogs

Homogeneity: measure one property on two sets of subjects from different populations
◦ Measure effect of medicine in sample of cancer individuals and sample of healthy individuals


Example: test of homogeneity

          Drug   Placebo
Cancer    75     62
Healthy   108    71

H0: The probability that symptoms improve is the same for both cancer and healthy groups.
HA: The probability that symptoms improve differs between cancer and healthy groups.

In practical terms, this uses the exact same procedure as a test for independence.


Fisher's Exact test
More exact than χ² and used for low-count tables
Compute the exact probability of observing table with counts:

        blah   blah2
blob    a      c
blob1   b      d

$P(a, b, c, d) = \frac{(a + b)! \, (c + d)! \, (a + c)! \, (b + d)!}{n! \, a! \, b! \, c! \, d!}$

Fisher's test computes this value for all possible tables with the same row/column totals (margins)
Computes P-value by summing probabilities for tables with as extreme or more extreme count distributions
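The table probability is straightforward to evaluate for the frog data, provided the factorials are handled on the log scale. The sketch below defines a hypothetical helper, `table_prob()`, and checks it against R's hypergeometric PMF, which is the same probability written with binomial coefficients.

```r
# Probability of one specific 2x2 table with fixed margins.
# Log-factorials avoid overflow for counts like 141!.
table_prob <- function(a, b, c, d) {
  n <- a + b + c + d
  log_p <- lfactorial(a + b) + lfactorial(c + d) +
    lfactorial(a + c) + lfactorial(b + d) -
    (lfactorial(n) + lfactorial(a) + lfactorial(b) +
       lfactorial(c) + lfactorial(d))
  exp(log_p)
}

# Frog table: a = 1, b = 49 (uninfected); c = 47, d = 44 (infected)
p_formula <- table_prob(1, 49, 47, 44)

# Same probability from the hypergeometric PMF:
# draw a + c = 48 eaten frogs from 50 uninfected and 91 infected,
# and ask for exactly 1 uninfected frog among them
p_hyper <- dhyper(1, m = 1 + 49, n = 47 + 44, k = 1 + 47)

print(c(p_formula = p_formula, p_hyper = p_hyper))
```

This gives the probability of the observed table only; the P-value still requires summing over the other tables with the same margins, as described above.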


Fisher's exact test

            Uninfected   Infected   TOTAL
Eaten       1            47         48
Not eaten   49           44         93
TOTAL       50           91         141

Approximate P-value:
> chisq.test(data.table, correct=FALSE)

        Pearson's Chi-squared test

data:  data.table
X-squared = 31.906, df = 1, p-value = 1.618e-08

Exact P-value:
> fisher.test(data.table)

        Fisher's Exact Test for Count Data

data:  data.table
p-value = 8.37e-10
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.0005344122 0.1417331275
sample estimates:
odds ratio
0.02222648

Our OR = 52.3, or 0.019. Slight differences are expected because fisher.test() uses ML


Relative risk: It's not the OR
Commonly measured in epidemiological studies

Relative risk is the probability of an event (i.e., disease) in an exposed group, relative to unexposed group
◦ RR = P(event when exposed) / P(event when not exposed)


Relative risk example

             Lung cancer   No lung cancer
Smoker       525           450
Non-smoker   32            621

RR = P(event when exposed) / P(event when not exposed)

RR of cancer due to smoking exposure:
= P(cancer | smoker) / P(cancer | not smoker)
= [525/(525+450)] / [32/(32+621)]
= 10.99

→ Smokers have 10.99 times the risk of developing lung cancer compared to non-smokers.

Live exercise: Calculate the odds ratio for a smoker developing cancer relative to a non-smoker.


The Odds Ratio

             Lung cancer   No lung cancer
Smoker       525           450
Non-smoker   32            621

$O_1 = \frac{P[\text{smoker and cancer}]}{P[\text{non-smoker and cancer}]} = \frac{525}{32}$

$O_2 = \frac{P[\text{smoker and no cancer}]}{P[\text{non-smoker and no cancer}]} = \frac{450}{621}$

$OR = \frac{525/32}{450/621} = 22.64$

→ Smokers have 22.64 times the odds of getting lung cancer compared to non-smokers.
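Computing both quantities from the same table makes the gap between them concrete. This sketch uses the smoking counts above with hypothetical variable names.

```r
# Counts from the smoking table
smoker     <- c(cancer = 525, no_cancer = 450)
non_smoker <- c(cancer = 32,  no_cancer = 621)

# Relative risk: ratio of two probabilities
risk_smoker     <- smoker["cancer"] / sum(smoker)
risk_non_smoker <- non_smoker["cancer"] / sum(non_smoker)
rr <- unname(risk_smoker / risk_non_smoker)

# Odds ratio: ratio of two odds
odds_smoker     <- smoker["cancer"] / smoker["no_cancer"]
odds_non_smoker <- non_smoker["cancer"] / non_smoker["no_cancer"]
or <- unname(odds_smoker / odds_non_smoker)

print(round(c(relative_risk = rr, odds_ratio = or), 2))
```

The odds here are taken within each exposure group (cancer : no cancer), which gives the same ratio as the column-wise odds used on the slide.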


What's the practical difference?
Odds ratios measure the extent of association between variables.
◦ It is the ratio of two odds (ratio of prob event : prob non-event)

Relative risk is the more intuitive quantity that we "understand"
◦ It is the ratio of two probabilities (prob event)


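Recap on estimation
Normally-distributed variable
◦ $\hat{\mu} = \bar{x}$
◦ $\hat{\sigma}^2 = s^2$
◦ Known σ
  ◦ $SE_{\bar{x}} = \sigma / \sqrt{n}$
  ◦ 95% CI = $\bar{x} \pm Z_{0.025} \, SE$
◦ Unknown σ
  ◦ $SE_{\bar{x}} = s / \sqrt{n}$
  ◦ 95% CI = $\bar{x} \pm t_{0.025} \, SE$

Binomially-distributed variable
◦ $\hat{p} = k / n$
◦ $SE_{\hat{p}} = \sqrt{\hat{p}(1 - \hat{p}) / n}$
◦ 95% CI = $\hat{p} \pm Z_{0.025} \, SE_{\hat{p}}$

Log-odds ratio
◦ $\widehat{\log OR} = \ln\left( \frac{\hat{p}_1 / (1 - \hat{p}_1)}{\hat{p}_2 / (1 - \hat{p}_2)} \right)$
◦ $SE_{\widehat{\log OR}} = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$
◦ 95% CI = $\widehat{\log OR} \pm Z_{0.025} \, SE_{\widehat{\log OR}}$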


Choose your own adventure, so far

(Or Fisher's exact test)