HST 190: Introduction to Biostatistics

Lecture 6: Methods for binary data

Binary data

• So far, we have focused on settings where the outcome is continuous

• Now, we consider the setting where our outcome of interest is binary, meaning it takes values 1 or 0.
§ In particular, we consider the 2x2 contingency table tabulating pairs of binary observations $(X_1, Y_1), \dots, (X_n, Y_n)$


• Consider two populations
§ IV drug users who report sharing needles

§ IV drug users who do not report sharing needles

• Is the rate of positive tuberculin skin test equal in both populations?
§ To address this question, we sample 40 patients who report sharing and 60 patients who do not, to compare rates of positive tuberculin test

§ Data are cross-classified according to these two binary variables

2x2 table              Positive   Negative   Total
Report sharing            12         28        40
Don't report sharing      11         49        60
Total                     23         77       100

Chi-square test for contingency tables

• The chi-square test is a test of association between two categorical variables.

• In general, its null and alternative hypotheses are
§ $H_0$: the relative proportions of individuals in each category of variable #1 are the same across all categories of variable #2; that is, the variables are not associated (i.e., statistically independent).

§ $H_1$: the variables are associated
o Notice the alternative is always two-sided

• In our example, this means
§ $H_0$: reported needle sharing is not associated with PPD (tuberculin skin test) result


• The chi-square test compares observed counts in the table to the counts expected if there is no association (i.e., under $H_0$)
§ Expected counts are obtained using the marginal totals of the table.

• Recall the independence rule $P(A \cap B) = P(A)P(B)$, so from 100 people, assuming independence, we expect

$$P(\text{share} \cap \text{positive}) = P(\text{share})\,P(\text{positive}) = \frac{40}{100} \cdot \frac{23}{100} = 0.092$$

§ Then, we'd expect $0.092 \times 100 = 9.2$ positive sharers, instead of the 12 observed



• Similarly, there will likely be some discrepancy between observed and expected counts for the other three cells in the table.
§ Chi-square test assesses: are these differences too large to be the result of sampling variability?

• Steps of the chi-square test
1) Complete the observed-data table

2) Compute the table of expected counts

3) Calculate the $X^2$ statistic

4) Get the p-value from the chi-square table

• This method is valid only if all expected counts are ≥ 5
§ the test relies on an approximation that does not hold in small samples


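1) Complete the observed-data table

O11   O12   O1.
O21   O22   O2.
O.1   O.2   n

2) Complete the table of expected counts

$$E_{ij} = \frac{O_{i\cdot} \times O_{\cdot j}}{n} = \frac{(O_{i1} + O_{i2})(O_{1j} + O_{2j})}{n}$$

E11   E12   E1.
E21   E22   E2.
E.1   E.2   n

3) Calculate the chi-square test statistic

$$X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(O_{11} - E_{11})^2}{E_{11}} + \frac{(O_{12} - E_{12})^2}{E_{12}} + \frac{(O_{21} - E_{21})^2}{E_{21}} + \frac{(O_{22} - E_{22})^2}{E_{22}}$$

§ swap $(O_{ij} - E_{ij})$ with $(|O_{ij} - E_{ij}| - 0.5)$ for the Yates continuity correction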


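4) Get the p-value from the chi-square distribution
§ Under the null hypothesis $H_0$ (no association between the two factors), the $X^2$ statistic follows a chi-square distribution with 1 degree of freedom. This is often written as $X^2 \sim \chi^2_1$
o continuous and positive-valued, defined by one parameter, the degrees of freedom (df)

§ The p-value comes from the right tail, but is inherently 'two-sided'
o MATLAB: 1-chi2cdf(x,1)

[Figure: density of the $\chi^2_1$ distribution; the area to the right of $\chi^2_{1,0.95} = 3.84$ is 0.05]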


• Thus, at the $\alpha$ level, $H_0$ is rejected if $X^2 > \chi^2_{1,1-\alpha}$

• Using the 2x2 contingency table, an alternate formula for the Yates-corrected test statistic is

$$X^2 = \frac{n\left(|ad - bc| - \frac{n}{2}\right)^2}{(a+b)(c+d)(a+c)(b+d)}$$

2x2 table              Positive      Negative      Total
Report sharing         a = 12        b = 28        a + b = 40
Don't report sharing   c = 11        d = 49        c + d = 60
Total                  a + c = 23    b + d = 77    n = 100

$$X^2 = \frac{100\left(|12(49) - 28(11)| - 50\right)^2}{(40)(60)(23)(77)} = 1.24 < 3.84 = \chi^2_{1,0.95}$$

• ⇒ Fail to reject $H_0$
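The needle-sharing calculation can be reproduced with a short sketch. This is an illustration in Python using only the standard library (the slides themselves use MATLAB); the function name `yates_chi_square_2x2` is hypothetical, and the p-value uses the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math

def yates_chi_square_2x2(a, b, c, d):
    """Expected counts, Yates-corrected X^2 and p-value for a 2x2 table.

    Illustrative helper (not from the slides); the table layout is
    [[a, b], [c, d]] with rows = exposure groups, columns = outcome.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    # Expected count under independence: (row total * column total) / n
    expected = [[r * col / n for col in col_totals] for r in row_totals]
    # Shortcut formula for the Yates-corrected statistic
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = row_totals[0] * row_totals[1] * col_totals[0] * col_totals[1]
    x2 = numerator / denominator
    # Right-tail probability of a chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(x2 / 2))
    return expected, x2, p_value

expected, x2, p_value = yates_chi_square_2x2(12, 28, 11, 49)
print("expected counts:", expected)
print("X^2 = %.2f, p-value = %.3f" % (x2, p_value))
```

All four expected counts are at least 5 here, so the chi-square approximation is considered valid for this table.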

Fisher’s exact test

What happens if some expected counts are < 5? Instead of the chi-square test, use Fisher’s exact test (see Rosner 10.3)

• Like the chi-square test, Fisher's exact test examines the significance of the association (contingency) between the two kinds of classification, rows and columns.

• Both row and column totals (a+c, b+d, a+b, c+d) are assumed to be fixed, not random.

• We then consider all possible tables that could give the observed row and column totals, and the corresponding probability of each configuration (it helps to realize that the first count, a, has a hypergeometric distribution under the null)

• Finally, the p-value is computed by adding up the probabilities of the tables as extreme or more extreme than the observed one.
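The enumeration described above can be sketched as follows. This is a minimal Python illustration, not the course's implementation: the function name `fisher_exact_2x2` is hypothetical, and "as extreme or more extreme" is taken to mean "no more probable than the observed table", which is one common two-sided convention and an assumption here. It is applied to the needle-sharing table only to show the mechanics.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    With all margins fixed, the top-left count follows a hypergeometric
    distribution under the null of no association.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # P(top-left cell = k) given the fixed margins
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    probs = {k: prob(k) for k in range(lowest, highest + 1)}
    p_observed = probs[a]
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties)
    p_value = sum(p for p in probs.values() if p <= p_observed * (1 + 1e-9))
    return probs, min(p_value, 1.0)

probs, p_fisher = fisher_exact_2x2(12, 28, 11, 49)
print("Fisher exact two-sided p-value: %.3f" % p_fisher)
```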

What if we are interested in a variable that has more than two categories?

Example: Test for association between eye color and presence or absence of a mutant allele at some genetic locus.

Eye color categories: blue, green, brown, hazel, gray

Genetic categories: 0 copies of the mutant allele, ≥ 1 copy of the mutant allele

Chi-square test for contingency tables, RxC

The chi-square test can be used for variables with more than two categories. Data are presented in an RxC table, a generalization of the 2x2 table:

R = # rows, C = # columns (doesn’t matter which variable is which)

                        blue   green   brown   hazel   gray   Total
Mutant allele absent      3      7      21      15      15      61
Mutant allele present     6     10      18      14      17      65
Total                     9     17      39      29      32     126

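The chi-square test for an RxC table is the same as for a 2x2 table except:

• This method can only be used if no more than 1/5 of cells have expected count < 5 AND if no cell has expected count < 1.

• Under H0, the $X^2$ test statistic follows a chi-square distribution with (R-1)(C-1) degrees of freedom

$$X^2 = \frac{(O_{11} - E_{11})^2}{E_{11}} + \frac{(O_{12} - E_{12})^2}{E_{12}} + \dots + \frac{(O_{RC} - E_{RC})^2}{E_{RC}}$$

$$X^2 \sim \chi^2_{(R-1)(C-1)}$$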

Again, we have to obtain marginal totals to determine the expected count for each cell. For example…

$$E_{11} = \frac{61 \times 9}{126} = 4.36, \ \dots, \ E_{RC} = \frac{65 \times 32}{126} = 16.51$$

The expected counts would be calculated as follows

                        blue    green   brown   hazel   gray    Total
Mutant allele absent    4.36    8.23    18.88   14.04   15.49     61
Mutant allele present   4.64    8.77    20.12   14.96   16.51     65
Total                   9       17      39      29      32       126

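• Under H0, $X^2 \sim \chi^2_4$, since (R-1)(C-1) = (2-1)(5-1) = 4

$$X^2 = \frac{(3 - 4.36)^2}{4.36} + \frac{(7 - 8.23)^2}{8.23} + \dots + \frac{(17 - 16.51)^2}{16.51} = 1.80$$

MATLAB: 1-chi2cdf(1.8,4) gives p-value = 0.77

Conclusion: No evidence for association between eye color and mutant alleles.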
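The eye-color calculation can be checked with a small sketch in Python (standard library only; the slides use MATLAB's chi2cdf). The helper names are hypothetical. The tail probability uses a closed form that holds only for an even number of degrees of freedom, which happens to cover this example (df = 4); it is not a general chi-square routine.

```python
import math

def chi_square_rxc(table):
    """Expected counts, X^2 and degrees of freedom for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for cell (i, j): row total i * column total j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    x2 = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(table, expected)
             for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, x2, df

def chi2_right_tail_even_df(x, df):
    """P(chi-square_df > x); closed form valid only when df is even."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

eye_color = [[3, 7, 21, 15, 15],   # mutant allele absent
             [6, 10, 18, 14, 17]]  # mutant allele present
expected, x2, df = chi_square_rxc(eye_color)
small_cells = sum(e < 5 for row in expected for e in row)
p_value = chi2_right_tail_even_df(x2, df)
print("X^2 = %.2f on %d df, p-value = %.2f" % (x2, df, p_value))
print("cells with expected count < 5:", small_cells, "of", 2 * 5)
```

Two of the ten cells have expected count below 5, which is exactly 1/5 of the cells, and none is below 1, so the validity rule stated above is just satisfied.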


Two-sample comparison of proportions

What if we are interested in estimating and quantifying uncertainty about the difference in proportions between two groups?

• e.g., we want an estimate and CI of the difference in proportions of positive tuberculin skin tests between needle sharers and non-sharers

The approach is similar to two-sample estimation for continuous data, with subtle differences!


• Whereas we have previously considered the difference in means of continuous two-sample data, we now compare two populations’ unknown proportions $p_1$ and $p_2$.

• Suppose we want to know whether two communities have the same obesity rate.
§ You draw random samples from both; in the first community, 20 out of 100 are obese, while in the second 24 out of 150 are obese.

• Goals:
§ estimate and compute the 95% C.I. for the difference in proportions

§ conduct a significance test at level $\alpha = 0.05$ for a difference


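• Before, we saw that if a random experiment has two possible outcomes, “success” and “failure”, and we do $n$ independent repetitions with identical success probability $p$, then $X \sim \text{Bin}(n, p)$ is the number of successes.
§ Now, we observe $X_1 \sim \text{Bin}(n_1, p_1)$ and $X_2 \sim \text{Bin}(n_2, p_2)$ and then make inference about $p_1 - p_2$.

• Estimation parallels the two-sample continuous case: the estimate is the difference of sample proportions, $\hat{p}_1 - \hat{p}_2$

• If $n_1\hat{p}_1(1 - \hat{p}_1) \ge 5$ and $n_2\hat{p}_2(1 - \hat{p}_2) \ge 5$, the associated $100(1 - \alpha)\%$ CI is given by

$$\hat{p}_1 - \hat{p}_2 \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$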


• For example, consider two samples

§ $n_1 = 100$, $X_1 = 20$, $\hat{p}_1 = \frac{20}{100} = 0.20$, $n_1\hat{p}_1(1 - \hat{p}_1) = 16 \ge 5$

§ $n_2 = 150$, $X_2 = 24$, $\hat{p}_2 = \frac{24}{150} = 0.16$, $n_2\hat{p}_2(1 - \hat{p}_2) = 20.16 \ge 5$

• Then the 95% CI for the difference is

$$(0.20 - 0.16) \pm 1.96\sqrt{\frac{0.2(0.8)}{100} + \frac{0.16(0.84)}{150}} = 0.04 \pm 1.96(0.050) = 0.04 \pm 0.10 = (-0.06, 0.14)$$
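As a check on the arithmetic, the interval can be computed with a short Python sketch (standard library only). The function name `diff_proportions_ci` is hypothetical, and the normal quantile comes from `statistics.NormalDist` rather than a table, so the result differs from the slide only in rounding.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportions_ci(x1, n1, x2, n2, alpha=0.05):
    """Large-sample CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    # Rule of thumb from the slides: n * p_hat * (1 - p_hat) >= 5 in each group
    if n1 * p1 * (1 - p1) < 5 or n2 * p2 * (1 - p2) < 5:
        raise ValueError("normal approximation not recommended for these counts")
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    diff = p1 - p2
    return diff, (diff - z * se, diff + z * se)

diff, (lower, upper) = diff_proportions_ci(20, 100, 24, 150)
print("difference = %.2f, 95%% CI = (%.3f, %.3f)" % (diff, lower, upper))
```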

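Hypothesis testing for difference of proportions

• Now, consider $H_0: p_1 = p_2$ versus $H_1: p_1 \ne p_2$
§ Under $H_0$, we can pool the two samples to calculate the standard error, letting $\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$

• Then, if $n_1\hat{p}_1(1 - \hat{p}_1) \ge 5$ and $n_2\hat{p}_2(1 - \hat{p}_2) \ge 5$, under $H_0$ we form the Z-test statistic

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

• It has an approximate N(0,1) distribution when the null is true.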


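• Continuing the same example,

§ $n_1 = 100$, $X_1 = 20$, $\hat{p}_1 = \frac{20}{100} = 0.20$, $n_1\hat{p}_1(1 - \hat{p}_1) = 16 \ge 5$

§ $n_2 = 150$, $X_2 = 24$, $\hat{p}_2 = \frac{24}{150} = 0.16$, $n_2\hat{p}_2(1 - \hat{p}_2) = 20.16 \ge 5$

§ $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{20 + 24}{100 + 150} = 0.176$

• Test statistic is then

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.20 - 0.16}{\sqrt{0.176(0.824)\left(\frac{1}{100} + \frac{1}{150}\right)}} = 0.81$$

• From table or MATLAB, $P(Z > 0.81) = 0.21$, so the p-value is $2(0.21) = 0.42 > 0.05$ ⇒ do not reject $H_0$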
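The pooled Z-test for the obesity example can be sketched the same way. This is a Python illustration with a hypothetical function name, using `statistics.NormalDist` for the normal tail in place of a table or MATLAB.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided Z-test of H0: p1 = p2 with a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: twice the upper tail beyond |z|
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return pooled, z, p_value

pooled, z, p_value = two_proportion_z_test(20, 100, 24, 150)
print("pooled p = %.3f, z = %.2f, p-value = %.2f" % (pooled, z, p_value))
```

Note the design difference from the confidence interval: the test pools the samples because its standard error is computed under $H_0: p_1 = p_2$, while the interval uses separate estimates for each group.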

Odds ratio and relative risk

Chi-square tests for contingency tables allow us to test for association between two categorical variables.

“Is there statistical evidence of an association between daily aspirin and peptic ulcer disease?”

How do we estimate the magnitude of the association between two categorical variables?

“How much higher is the rate of peptic ulcer disease among daily aspirin users?”


Measures of Effect for Categorical Data

• Consider two categorical variables:
§ “disease” vs “no disease”

§ “exposure” vs “no exposure”

• “Exposure” could be a treatment, risk factor, or other factor
§ no assumption is made about whether it increases or decreases disease risk

• Prospective study: Suppose for now that we enroll patients based on exposure status (vs. based on disease status)
§ e.g., 100 smokers and 100 nonsmokers


After we sample a specified number of exposed and unexposed individuals, we classify them by disease status as shown below

               Disease +   Disease -   Total
Exposure +        a           b         a+b
Exposure -        c           d         c+d
Total            a+c         b+d         n

Three ways to quantify the magnitude of association:

1. Risk difference (RD) = same as difference of proportions

2. Relative risk (RR) or ‘risk ratio’

3. Odds ratio

Risk Difference

Risk Difference = p1 – p2, where

p1 = P(disease | exposed)

p2 = P(disease | unexposed)

$$\text{estimated Risk Difference} = \frac{a}{a + b} - \frac{c}{c + d}$$


Risk Ratio

$$\text{Relative Risk (Risk Ratio)} = \frac{p_1}{p_2}$$

$$\text{estimated Relative Risk} = \frac{a/(a + b)}{c/(c + d)}$$

Risk Difference vs. Ratio

Suppose that you enroll 100 smokers and 100 nonsmokers in your study:

            Disease +   Disease -   Total
Smoke +        30          70        100
Smoke -        15          85        100
Total          45         155        200

$$\text{Risk difference} = \frac{30}{100} - \frac{15}{100} = 0.15$$

$$\text{Relative risk} = \frac{30/100}{15/100} = 2$$
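For the smoking example, the two estimates follow directly from the cell counts. A minimal Python sketch (the function name `risk_measures` is hypothetical), valid when subjects were enrolled by exposure status as in this prospective design:

```python
def risk_measures(a, b, c, d):
    """Estimated risk difference and relative risk from a 2x2 table.

    Layout: rows = exposed / unexposed, columns = disease + / disease -.
    Only meaningful when sampling was by exposure status.
    """
    p1 = a / (a + b)  # estimated P(disease | exposed)
    p2 = c / (c + d)  # estimated P(disease | unexposed)
    return p1 - p2, p1 / p2

rd, rr = risk_measures(30, 70, 15, 85)
print("risk difference = %.2f, relative risk = %.1f" % (rd, rr))
```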

Complicating factors

Measuring “effect size”: Why does it get more complicated?

• Time
§ We often measure a rate ratio instead of a risk ratio

§ More on this aspect when we discuss survival analysis

• Effect Modification and Confounding
§ Our estimates typically need to be adjusted for other factors

• Sampling
§ Depending on how you enroll patients in your study, it may not be possible to estimate a risk difference or risk ratio even in principle

Risk Difference vs. Ratio

Suppose you conduct a case-control study by enrolling 100 patients with disease and 100 without, and then determine which have smoked:

            Disease +   Disease -   Total
Smoke +        25          10         35
Smoke -        75          90        165
Total         100         100        200

• Can’t estimate p1 & p2 if you pre-specify the number of subjects with disease → can’t estimate RD or RR.

• Need to know how the data in your table were sampled!

Retrospective sampling

• A case-control study (or retrospective study) samples patients based on disease status, then classifies them according to exposure

• Case-control studies are often performed for cost and efficiency, particularly when the disease or outcome is rare: there is no need to follow subjects through their entire lifetime and collect huge samples.

• There is a measure of effect size that can be computed regardless of whether patients are enrolled based on exposure status or disease status…

Odds

• If $p = P(\text{event})$, then define the odds of the event as $\frac{p}{1 - p}$

§ Probability = 0.2 ⇒ Odds = 0.2/0.8 = 0.25

§ Probability = 0.5 ⇒ Odds = 1

§ Probability = 0.75 ⇒ Odds = 0.75/0.25 = 3

§ Probability = 0.99 ⇒ Odds = 0.99/0.01 = 99

• Odds can range from 0 to infinity
§ When we randomly sample patients based on exposure status, we can estimate $P(\text{disease} \mid \text{exposed})$ and $P(\text{disease} \mid \text{unexposed})$

§ If we instead perform a case-control study, we can’t. We can only estimate $P(\text{exposed} \mid \text{disease})$ and $P(\text{exposed} \mid \text{no disease})$

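Odds ratio

Imagine a table showing all individuals in the population (the table you “wish” you could see)

               Disease +   Disease -
Exposure +        a           b
Exposure -        c           d

Let $p_1 = P(\text{disease} \mid \text{exposed})$ and $p_2 = P(\text{disease} \mid \text{unexposed})$, then the ratio of both exposure groups’ odds of disease is:

$$OR = \frac{\text{Odds of disease for exposed}}{\text{Odds of disease for unexposed}} = \frac{p_1/(1 - p_1)}{p_2/(1 - p_2)} = \frac{\dfrac{a/(a + b)}{b/(a + b)}}{\dfrac{c/(c + d)}{d/(c + d)}} = \frac{ad}{bc}$$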


Odds ratio

Imagine a table showing all individuals in the population (the table you “wish” you could see)

If we instead consider $P(\text{exposed} \mid \text{disease})$ and $P(\text{exposed} \mid \text{no disease})$, then the ratio of both disease groups’ odds of exposure is:

$$OR = \frac{\text{Odds of exposure for diseased}}{\text{Odds of exposure for nondiseased}} = \frac{\dfrac{a/(a + c)}{c/(a + c)}}{\dfrac{b/(b + d)}{d/(b + d)}} = \frac{ad}{bc}$$

Therefore, the OR is a measure of association that is numerically identical in either study design.
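The invariance can be checked numerically. The sketch below, in Python with hypothetical function names, uses the counts from the prospective smoking table earlier in the lecture (a = 30, b = 70, c = 15, d = 85) and computes the odds ratio three ways: from the disease odds within exposure groups, from the exposure odds within disease groups, and from the cross-product ad/(bc).

```python
from math import isclose

def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)

def or_from_disease_odds(a, b, c, d):
    # Condition on exposure: P(disease | exposed), P(disease | unexposed)
    return odds(a / (a + b)) / odds(c / (c + d))

def or_from_exposure_odds(a, b, c, d):
    # Condition on disease: P(exposed | disease), P(exposed | no disease)
    return odds(a / (a + c)) / odds(b / (b + d))

def or_cross_product(a, b, c, d):
    return (a * d) / (b * c)

cells = (30, 70, 15, 85)
or_1 = or_from_disease_odds(*cells)
or_2 = or_from_exposure_odds(*cells)
or_3 = or_cross_product(*cells)
print("%.4f %.4f %.4f" % (or_1, or_2, or_3))
```

All three agree, which is why the same estimator can be used whether sampling was by exposure or by disease status.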

[Figure: the odds $\frac{p}{1 - p}$ plotted against $p$ for $p$ from 0 to about 0.8; the odds axis runs from 0 to 8]


OR approximates RR for rare outcome

• Therefore, sampling by exposure, estimating $p_1$ and $p_2$, and computing the odds ratio estimates the same quantity as the odds ratio (of “exposure probabilities”) in a case-control study.

• So what if RR is of interest?
§ If disease is rare, $p_1, p_2$ are small, so $\frac{p}{1 - p} \approx p$ for small $p$ and $\frac{1 - p_1}{1 - p_2} \approx 1$, which gives

$$OR = \frac{p_1/(1 - p_1)}{p_2/(1 - p_2)} \approx \frac{p_1}{p_2} = RR$$

Takeaways

• Cannot estimate RR and RD in a case-control study (unless you have additional data).

• Can estimate the odds ratio from either a “prospective” or a case-control study, and we estimate it the same way in either one.

• Odds ratio approximates RR for a rare disease.

Interpreting odds ratio

• Difficult to give an “everyday” interpretation of what the odds ratio’s precise value means

• $OR > 1$ → exposure associated with higher disease risk

• $OR < 1$ → exposure associated with lower disease risk

• $OR = 1$ → no association of exposure and disease status

Inference on odds ratio

• To perform a hypothesis test or generate a CI for the OR, we

1) Compute the logarithm of the estimated OR [ln(OR)]

2) Make inference on ln(OR)

3) Translate conclusions into statements about the OR

• Why the log of the OR?

§ The sampling distribution of ln(OR) approximates a normal distribution more closely than that of the OR itself

o Hence, methods based on the normal approximation work better for ln(OR)

§ To see this, compare sampling distributions of OR vs. ln(OR): on the next slide we simulate a population with fixed rates of exposure and disease. For three different sample sizes, we randomly draw 10,000 samples and compute OR and ln(OR) for each

Code to recreate in MATLAB

Sample_Size = [50,200,1000]; % Define the sample sizes
Prob1 = 0.75; Prob2 = 0.5;   % Set the binomial probabilities for X and U
figure;

for i=1:length(Sample_Size)
    X = binornd(1,Prob1,Sample_Size(i),10000); % Generate 10,000 trials of X
    U = binornd(1,Prob2,Sample_Size(i),10000); % Generate 10,000 trials of U

    OR = (sum(X,1).*(sum(1-U,1)))./(sum(U,1).*(sum(1-X,1))); % Calculate the Odds Ratio
    LOR = log(OR); % Calculate the log of the Odds Ratio

    % Plot the Odds Ratio (left column)
    subplot(length(Sample_Size),2,2*i-1); hist(OR,20); xlim([min(OR) max(OR)]);
    xlabel('Odds Ratio'); ylabel(['Sample Size ' num2str(Sample_Size(i))])

    % Plot the Log Odds Ratio (right column)
    subplot(length(Sample_Size),2,2*i); hist(LOR,20); xlim([min(LOR) max(LOR)]);
    xlabel('Log Odds Ratio');
end

suptitle('Odds Ratio Demonstration'); % Set the title for the figure (sgtitle in R2018b and later)

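Confidence interval for OR

               Disease +   Disease -   Total
Exposure +        a           b         a+b
Exposure -        c           d         c+d
Total            a+c         b+d         n

• If the expected count in each cell of the 2x2 table is ≥ 5, then the sample estimate $\ln(\widehat{OR})$ of the true population $\ln(OR)$ approximately follows the distribution

$$\ln(\widehat{OR}) \sim N\left(\ln(OR),\ \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}\right)$$

• Another way of writing this result, with $\hat{p}_1 = a/(a+b)$, $\hat{p}_2 = c/(c+d)$, $n_1 = a + b$ and $n_2 = c + d$, is

$$\mathrm{Var}\left(\ln \widehat{OR}\right) \approx \frac{1}{n_1\hat{p}_1(1 - \hat{p}_1)} + \frac{1}{n_2\hat{p}_2(1 - \hat{p}_2)}$$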


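• Therefore, to get a $100(1 - \alpha)\%$ CI for the population OR we use a two-step process:

1) CI for $\ln(OR)$: $\ln(\widehat{OR}) \pm z_{1-\alpha/2}\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} = (c_1, c_2)$

2) CI for OR: $(e^{c_1}, e^{c_2})$

• Importantly, the CI is not symmetric around the estimated OR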


• Consider an outbreak of gastroenteritis in a school following lunch. 263 students ate lunch in the cafeteria that day. Sandwiches are suspected

                      Ill: Yes   Ill: No   Total
Ate sandwich: Yes       109        116      225
Ate sandwich: No          4         34       38
Total                   113        150      263

§ How strong is the association, if any, between consumption of the sandwich and illness? Provide a 95% CI for the odds ratio

§ $\widehat{OR} = \frac{ad}{bc} = \frac{109(34)}{4(116)} = 7.99 \Rightarrow \ln(\widehat{OR}) = \ln(7.99) = 2.078$

§ Step 1: $2.078 \pm z_{1-\alpha/2}\sqrt{\frac{1}{109} + \frac{1}{116} + \frac{1}{4} + \frac{1}{34}} = (1.01, 3.146)$

§ Step 2: 95% CI for OR: $(e^{1.01}, e^{3.146}) = (2.75, 23.2)$

§ Because the CI does not contain 1, reject the null of no association at the 0.05 level
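The two-step interval for the sandwich example can be sketched in Python (standard library only; the function name `odds_ratio_ci` is hypothetical). Small differences from the slide's figures come from rounding the intermediate values.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def odds_ratio_ci(a, b, c, d, alpha=0.05):
    """Estimated OR with a CI built on the log scale, then exponentiated."""
    or_hat = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 1: CI for ln(OR)
    log_lower = log(or_hat) - z * se_log_or
    log_upper = log(or_hat) + z * se_log_or
    # Step 2: transform the endpoints back to the OR scale
    return or_hat, (exp(log_lower), exp(log_upper))

or_hat, (lower, upper) = odds_ratio_ci(109, 116, 4, 34)
print("OR = %.2f, 95%% CI = (%.2f, %.2f)" % (or_hat, lower, upper))
```

Because the interval is symmetric on the log scale, it is asymmetric around the estimated OR after exponentiating: the upper endpoint lies much further from 7.99 than the lower one.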

Multiple 2x2 tables

• What if we have a confounding variable associated with exposure and outcome, such that there are several 2x2 tables, each corresponding to one level of the confounding variable?

• Can we pool the counts in the tables into one table?
§ Not so fast. This can seriously bias our results…


• For example, Percutaneous Nephrolithotomy (PN) was compared with several other procedures, classified as “open” procedures (OP), for treatment of renal calculi

         Successful   Unsuccessful   Total
PN          289            61         350
OP          273            77         350
Total       562           138         700

289/350 = 0.826 chance of success for PN
273/350 = 0.780 chance of success for OP

• Percutaneous treatment clearly looks superior; the estimated odds ratio for success based on having (vs. not having) percutaneous treatment is

$$\widehat{OR} = \frac{289(77)}{61(273)} = 1.34 > 1$$


• However, if results are stratified based on stone size, percutaneous treatment looks worse!

§ Large stones: $\widehat{OR} = \frac{55(71)}{25(192)} = 0.81 < 1$

§ Small stones: $\widehat{OR} = \frac{234(6)}{36(81)} = 0.48 < 1$

All stones     Suc.   Unsuc.   Total
PN             289      61      350
OP             273      77      350
Total          562     138      700

Large stones   Suc.   Unsuc.   Total
PN              55      25       80
OP             192      71      263
Total          247      96      343

Small stones   Suc.   Unsuc.   Total
PN             234      36      270
OP              81       6       87
Total          315      42      357
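The reversal can be reproduced directly from the three tables. A minimal Python sketch (variable and function names are illustrative); each tuple is (a, b, c, d) = (PN successes, PN failures, OP successes, OP failures):

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio ad / (bc) for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

tables = {
    "all stones":   (289, 61, 273, 77),
    "large stones": (55, 25, 192, 71),
    "small stones": (234, 36, 81, 6),
}

# The two strata add up, cell by cell, to the pooled table
strata_sum = tuple(x + y for x, y in zip(tables["large stones"],
                                          tables["small stones"]))

odds_ratios = {name: odds_ratio(*cells) for name, cells in tables.items()}
for name, value in odds_ratios.items():
    a, b, c, d = tables[name]
    print("%-12s OR = %.2f  (PN success %.3f, OP success %.3f)"
          % (name, value, a / (a + b), c / (c + d)))
```

The pooled odds ratio is above 1 while both stratum-specific odds ratios are below 1, even though the pooled table is just the sum of the two strata.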


• Percutaneous treatment is associated with a higher success rate (OR > 1) overall, yet with a lower success rate (OR < 1) for each type of stone separately
§ How is that possible?

• This is the result of confounding by a factor associated with both the treatment and the outcome (what is it?)
§ PN was used mostly for small stones, which had a higher success rate in general (88%). OPs were used mostly for large stones, which had lower success rates (72%)

§ Pooling the data allowed the stone-size effect to mask the difference in treatment effectiveness

• Confounding may occur whenever there is a factor that is associated with both treatment assignment and outcome
§ Confounding leading to the opposite conclusion in aggregated data is called Simpson’s Paradox (or Ecological Fallacy).


• No statistical procedure “automatically” protects you from confounding. Adjustment for confounding requires understanding of the science

• After a study is conducted, certain statistical techniques can be used to adjust for it (discussed over the next two lectures)
§ Stratification

§ Matching

§ (Logistic) Regression adjustment

Stratification

• If you stratify data into multiple 2x2 tables (strata) based on a confounder, and believe they share a common OR, you can estimate this OR using the Mantel-Haenszel Method (MH)

• This method is valid if the relationship between exposure and disease is the same in each stratum (even though baseline risk may differ)
§ If the relationship is not the same in each stratum, then it does not make sense to combine the data for doing inference

• Follow two steps:
1) Test whether the ORs are the same in each stratum

2) If so, proceed with inference for the common OR, using all the tables

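Chi-square test for homogeneity

• To see if the ORs are the same in each stratum, we use the chi-square test for homogeneity

• Given $k$ strata (tables), we test the hypotheses
§ $H_0$: $OR_1 = OR_2 = \dots = OR_k$ (homogeneity)

§ $H_1$: at least one of the ORs is different

• Test statistic is

$$X^2_{HOM} = \sum_{i=1}^{k} w_i \left(\ln \widehat{OR}_i - \overline{\ln OR}\right)^2$$

§ where $w_i = \left(\frac{1}{a_i} + \frac{1}{b_i} + \frac{1}{c_i} + \frac{1}{d_i}\right)^{-1}$ and $\overline{\ln OR} = \frac{\sum_{i=1}^{k} w_i \ln \widehat{OR}_i}{\sum_{i=1}^{k} w_i}$

§ Under the null, $X^2_{HOM} \sim \chi^2_{k-1}$

• If we reject $H_0$, stop here. Otherwise, estimate the common OR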


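• In the renal calculi example, test of homogeneity by stone size

§ Large stones: $\ln \widehat{OR}_1 = \ln\left(\frac{55(71)}{25(192)}\right) = -0.206$

o $w_1 = \left(\frac{1}{55} + \frac{1}{25} + \frac{1}{192} + \frac{1}{71}\right)^{-1} = 12.91$

§ Small stones: $\ln \widehat{OR}_2 = \ln\left(\frac{234(6)}{36(81)}\right) = -0.731$

o $w_2 = \left(\frac{1}{234} + \frac{1}{36} + \frac{1}{81} + \frac{1}{6}\right)^{-1} = 4.74$

§ $\overline{\ln OR} = \frac{12.91(-0.206) + 4.74(-0.731)}{12.91 + 4.74} = -0.347$

$$X^2_{HOM} = 12.91(-0.206 + 0.347)^2 + 4.74(-0.731 + 0.347)^2 = 0.956 < 3.84 = \chi^2_{1,0.95}$$

§ We fail to reject the null hypothesis of homogeneity (equal odds ratios across strata), and continue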
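The homogeneity calculation for the two stone-size strata can be sketched as follows. This is a Python illustration with hypothetical function names; the p-value line uses the 1-degree-of-freedom identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$, so it applies only when there are exactly two strata, as here.

```python
from math import erfc, log, sqrt

def log_or_and_weight(a, b, c, d):
    """ln(OR) for one stratum and its inverse-variance weight."""
    log_or = log((a * d) / (b * c))
    weight = 1 / (1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, weight

def homogeneity_test(strata):
    """Weighted mean of ln(OR), X^2_HOM and its degrees of freedom."""
    stats = [log_or_and_weight(*cells) for cells in strata]
    total_weight = sum(w for _, w in stats)
    mean_log_or = sum(w * l for l, w in stats) / total_weight
    x2_hom = sum(w * (l - mean_log_or) ** 2 for l, w in stats)
    return mean_log_or, x2_hom, len(strata) - 1

strata = [(55, 25, 192, 71),   # large stones: PN suc., PN unsuc., OP suc., OP unsuc.
          (234, 36, 81, 6)]    # small stones
mean_log_or, x2_hom, df = homogeneity_test(strata)
p_value = erfc(sqrt(x2_hom / 2))  # valid because df == 1 in this example
print("mean ln(OR) = %.3f, X^2_HOM = %.3f, df = %d, p = %.2f"
      % (mean_log_or, x2_hom, df, p_value))
```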

Mantel-Haenszel odds ratio estimator

• If we conclude homogeneity across strata, then the Mantel-Haenszel Estimator of the common Odds Ratio is

$$\widehat{OR}_{MH} = \frac{\sum_{i=1}^{k} a_i d_i / n_i}{\sum_{i=1}^{k} b_i c_i / n_i}$$

• We can now use hypothesis tests and confidence intervals for the common OR (via the ln(OR)). First, check that

§ $\sum_{i=1}^{k} (a_i + c_i)(a_i + b_i)/n_i \ge 5$

§ $\sum_{i=1}^{k} (a_i + c_i)(c_i + d_i)/n_i \ge 5$

§ $\sum_{i=1}^{k} (b_i + d_i)(a_i + b_i)/n_i \ge 5$

§ $\sum_{i=1}^{k} (b_i + d_i)(c_i + d_i)/n_i \ge 5$


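• Under these conditions, the $100(1 - \alpha)\%$ CI for ln(OR) is

$$\ln \widehat{OR}_{MH} \pm z_{1-\alpha/2}\left(\sum_{i=1}^{k} w_i\right)^{-1/2} = (L, U)$$

§ where $w_i = \left(\frac{1}{a_i} + \frac{1}{b_i} + \frac{1}{c_i} + \frac{1}{d_i}\right)^{-1}$

• The CI for the OR is then $(e^{L}, e^{U})$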

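Hypothesis testing for MH

• Finally, we may wish to test the null hypothesis of no association between two variables, controlling for a confounder: $H_0: OR = 1$ versus $H_1: OR \ne 1$

• To do the test, we need to calculate 3 quantities:
§ $O = \sum_{i=1}^{k} O_i = \sum_{i=1}^{k} a_i$

§ $E = \sum_{i=1}^{k} E_i = \sum_{i=1}^{k} \frac{(a_i + b_i)(a_i + c_i)}{n_i}$

§ $V = \sum_{i=1}^{k} V_i = \sum_{i=1}^{k} \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$ (must be ≥ 5)

• $X^2_{MH} = \frac{(|O - E| - 0.5)^2}{V}$, which follows a $\chi^2_1$ distribution if $H_0$ is true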
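The three quantities and the test statistic can be sketched in Python as below. The function name is hypothetical, and applying it to the renal calculi strata is an illustration computed here from the tables given earlier in the lecture, not a result reported in the slides.

```python
from math import erfc, sqrt

def mantel_haenszel_test(strata):
    """O, E, V and the continuity-corrected X^2_MH over several 2x2 tables."""
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        observed += a
        # Expected top-left count and its variance with the margins fixed
        expected += (a + b) * (a + c) / n
        variance += ((a + b) * (c + d) * (a + c) * (b + d)) / (n ** 2 * (n - 1))
    x2_mh = (abs(observed - expected) - 0.5) ** 2 / variance
    p_value = erfc(sqrt(x2_mh / 2))  # right tail of chi-square with 1 df
    return observed, expected, variance, x2_mh, p_value

strata = [(55, 25, 192, 71),   # large stones
          (234, 36, 81, 6)]    # small stones
O, E, V, x2_mh, p_value = mantel_haenszel_test(strata)
print("O = %.0f, E = %.2f, V = %.2f, X^2_MH = %.2f, p = %.2f"
      % (O, E, V, x2_mh, p_value))
```

With these counts V is well above 5, and the statistic falls below the 3.84 cutoff, so this sketch would not reject $H_0: OR = 1$ at the 0.05 level.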


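• Returning to the renal calculi example,

$$\widehat{OR}_{MH} = \frac{\frac{55(71)}{343} + \frac{234(6)}{357}}{\frac{25(192)}{343} + \frac{36(81)}{357}} = 0.69$$

§ a compromise between the two stratum-specific ORs (0.81 and 0.48)

• To compute the 95% CI, first verify the conditions given previously (they are messy to show, but in this case they are met)

$$\ln \widehat{OR}_{MH} \pm z_{1-\alpha/2} \cdot \frac{1}{\sqrt{12.91 + 4.74}} = (-0.84, 0.10)$$

• Thus, the 95% CI for the OR is $(e^{-0.84}, e^{0.10}) = (0.43, 1.10)$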
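The Mantel-Haenszel estimate and its interval for the renal calculi strata can be reproduced with a short Python sketch (standard library only; the function name is hypothetical). The standard error uses the summed weights $w_i$ exactly as in the CI formula given in the lecture.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def mantel_haenszel_or_ci(strata, alpha=0.05):
    """Common OR estimate and CI, assuming homogeneity across strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    or_mh = numerator / denominator
    # Inverse-variance weights, one per stratum
    total_weight = sum(1 / (1 / a + 1 / b + 1 / c + 1 / d)
                       for a, b, c, d in strata)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z / sqrt(total_weight)
    log_or = log(or_mh)
    return or_mh, (exp(log_or - half_width), exp(log_or + half_width))

strata = [(55, 25, 192, 71),   # large stones
          (234, 36, 81, 6)]    # small stones
or_mh, (lower, upper) = mantel_haenszel_or_ci(strata)
print("OR_MH = %.2f, 95%% CI = (%.2f, %.2f)" % (or_mh, lower, upper))
```

The interval contains 1, so after adjusting for stone size the data do not show a significant difference between PN and OP at the 0.05 level.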
