Two Populations

8/18/2019 Two Populations


Copyright ©2009 Pearson Education. Inc.

7. Comparison of Two Groups

Goal: Use CI and/or significance test to compare

means (quantitative variable)

proportions (categorical variable)

                        Group 1    Group 2    Estimate
Population mean         µ1         µ2         $\bar{y}_2 - \bar{y}_1$
Population proportion   π1         π2         $\hat{\pi}_2 - \hat{\pi}_1$

We conduct inference about the difference between the means or the difference between the proportions (order irrelevant).




Outcome measure: mean response time for a subject over a large number of trials

• Purpose of study: Analyze whether (conceptual) population mean response time differs significantly for the two groups, and if so, by how much.

• Data

Cell-phone group: $\bar{y}_1$ = 585.2 milliseconds, s1 = 89.6

Control group: $\bar{y}_2$ = 533.7, s2 = 65.3

Shape? Outliers?




Types of variables and samples

• The outcome variable on which comparisons are made is the response variable.

• The variable that defines the groups to be compared is the explanatory variable.

Example: Reaction time is the response variable.

Experimental group is the explanatory variable: a categorical var. with categories (cell-phone, control)

Or, could express experimental group as “cell-phone use” with categories (yes, no)




• Different methods apply for

independent samples: different samples, no matching, as in this example and in “cross-sectional studies”

dependent samples: natural matching between each subject in one sample and a subject in the other sample, such as in “longitudinal studies,” which observe subjects repeatedly over time

Example: We later consider a separate experiment in which the same subjects formed the control group at one time and the cell-phone group at another time.




se for difference between two estimates (independent samples)

• The sampling distribution of the difference between two estimates is approximately normal (large n1 and n2) and has estimated

$$se = \sqrt{(se_1)^2 + (se_2)^2}$$

Example: Data on “Response times” has

32 using cell phone with sample mean 585.2, s = 89.6
32 in control group with sample mean 533.7, s = 65.3

What is se for difference between sample means of 585.2 − 533.7 = 51.4?
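
The question above can be worked through with a short Python sketch (standard library only). The function name `se_difference` is illustrative and does not come from the slides. With the rounded means shown, the difference computes to 51.5; the 51.4 on the slide presumably reflects unrounded means.

```python
import math

def se_difference(s1, n1, s2, n2):
    """Standard error of the difference between two independent sample means."""
    se1 = s1 / math.sqrt(n1)   # se of the first sample mean
    se2 = s2 / math.sqrt(n2)   # se of the second sample mean
    return math.sqrt(se1**2 + se2**2)

# Response-time summary statistics (milliseconds):
# cell-phone group s = 89.6, control group s = 65.3, 32 subjects each
se_response = se_difference(89.6, 32, 65.3, 32)
print(round(se_response, 1))
```

Under these inputs the standard error comes out near 19.6 milliseconds, so the observed difference is roughly 2.6 standard errors from 0.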




CI comparing two proportions

• Recall se for a sample proportion used in a CI is

$$se = \sqrt{\hat{\pi}(1-\hat{\pi})/n}$$

• So, the se for the difference between two sample proportions for independent samples is

$$se = \sqrt{(se_1)^2 + (se_2)^2} = \sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2}}$$

• A CI for the difference between population proportions is

$$(\hat{\pi}_2 - \hat{\pi}_1) \pm z\sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2}}$$

As usual, z depends on confidence level, 1.96 for 95% confidence



Example: College Alcohol Study conducted by Harvard School of Public Health (http://www.hsph.harvard.edu/cas/)

Trends over time in percentage of binge drinking (consumption of 5 or more drinks in a row for men and 4 or more for women, at least once in past two weeks) and of activities perhaps influenced by it?

“Have you engaged in unplanned sexual activities because of drinking alcohol?”



• Estimated change in proportion saying “yes” is 0.213 − 0.192 = 0.021.

$$se = \sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2}} = \sqrt{\frac{(.192)(.808)}{12{,}708} + \frac{(.213)(.787)}{8783}} = 0.0056$$

95% CI for change in population proportion is 0.021 ± 1.96(0.0056) = 0.021 ± 0.011, or roughly (0.01, 0.03)

We can be 95% confident that the population proportion saying “yes” was between about 0.01 larger and 0.03 larger in 2001 than in 1993.
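
As a check on the arithmetic, here is a minimal Python sketch of the confidence interval for a difference of proportions. The helper name `diff_proportion_ci` is an assumption made for illustration; the inputs are the sample proportions and sample sizes from the College Alcohol Study example.

```python
import math

def diff_proportion_ci(p1, n1, p2, n2, z=1.96):
    """CI for (pi2 - pi1) from two independent sample proportions."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p2 - p1
    return diff, se, (diff - z * se, diff + z * se)

# 0.192 of 12,708 students said "yes" in the earlier survey,
# 0.213 of 8,783 in the later one
diff_p, se_p, (lo_p, hi_p) = diff_proportion_ci(0.192, 12708, 0.213, 8783)
print(round(diff_p, 3), round(se_p, 4), round(lo_p, 2), round(hi_p, 2))
```

Passing a different `z` (for example 2.58) would give the interval at another confidence level.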



Comments about CIs for difference between two population proportions

• If 95% CI for π2 − π1 is (0.01, 0.03), then 95% CI for π1 − π2 is (−0.03, −0.01).

It is arbitrary what we call Group 1 and Group 2 and what the order is for comparing the proportions.

• When 0 is not in the CI, we can conclude that one population proportion is higher than the other.

(e.g., if all positive values for Group 2 − Group 1, then conclude population proportion higher for Group 2 than Group 1)



• When 0 is in the CI, it is plausible that the population proportions are identical.

Example: Suppose 95% CI for change in population proportion (2001 − 1993) is (−0.01, 0.03). “95% confident that population proportion saying yes was between 0.01 smaller and 0.03 larger in 2001 than in 1993.”

• There is a significance test of H0: π1 = π2 that the population proportions are identical (i.e., difference π1 − π2 = 0), using test statistic

z = (difference between sample proportions)/se

For unplanned sex in 1993 and 2001, z = diff./se = 0.021/0.0056 = 3.75

Two-sided P-value = 0.0002

This seems to be statistical significance without practical significance!
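
The z statistic and two-sided P-value can be reproduced with Python's `statistics.NormalDist`. This sketch uses the same unpooled se as the CI, as the slide does; variable names are illustrative.

```python
from statistics import NormalDist

diff_prop = 0.021      # difference between sample proportions
se_prop = 0.0056       # standard error used in the CI

z_stat = diff_prop / se_prop
# Two-sided P-value: probability in both tails beyond |z| under the standard normal
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(round(z_stat, 2), round(p_two_sided, 4))
```

The very small P-value reflects the large sample sizes; the estimated change itself is only about 2 percentage points.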



Details about test on pp. 189-190 of text; use se0, which pools data to get better estimate of se under H0

(We study this test as a special case of “chi-squared test” in next chapter, which deals with possibly many groups, many outcome categories)

• The theory behind the CI uses the fact that sample proportions (and their differences) have approximate normal sampling distributions for large n's (by the Central Limit Theorem, assuming randomization)

• In practice, formula works ok if at least 10 outcomes of each type for each sample (Note: We don't use t dist. for inference about proportions; however, there are specialized small-sample methods, e.g., using binomial distribution)



Quantitative Responses: Comparing Means

• Parameter: µ2 − µ1

• Estimator: $\bar{y}_2 - \bar{y}_1$

• Estimated standard error:

$$se = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

- Sampling dist.: Approximately normal (large n's, by CLT)

- CI for independent random samples from two normal population distributions has form

$$(\bar{y}_2 - \bar{y}_1) \pm t(se), \quad \text{which is} \quad (\bar{y}_2 - \bar{y}_1) \pm t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

- Formula for df for t-score is complex (later). If both sample sizes are at least 30, can just use z-score



Example: GSS data on “number of close friends”

Use gender as the explanatory variable:

486 females with mean 8.3, s = 15.6

354 males with mean 8.9, s = 15.5

$$se_1 = s_1/\sqrt{n_1} = 15.6/\sqrt{486} = 0.708$$

$$se_2 = s_2/\sqrt{n_2} = 15.5/\sqrt{354} = 0.824$$

$$se = \sqrt{(se_1)^2 + (se_2)^2} = \sqrt{(0.708)^2 + (0.824)^2} = 1.09$$

Estimated difference of 8.9 − 8.3 = 0.6 has a margin of error of 1.96(1.09) = 2.1, and 95% CI is 0.6 ± 2.1, or (−1.5, 2.7).
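
The same steps can be sketched in Python from the summary statistics. The function name `mean_difference_ci` is hypothetical, and the z-score 1.96 stands in for the t-score because both samples are large.

```python
import math

def mean_difference_ci(mean1, s1, n1, mean2, s2, n2, score=1.96):
    """CI for (mu2 - mu1) from independent samples, unequal-variance se."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = mean2 - mean1
    margin = score * se
    return se, (diff - margin, diff + margin)

# Group 1: 486 females (mean 8.3, s 15.6); Group 2: 354 males (mean 8.9, s 15.5)
se_friends, (lo_friends, hi_friends) = mean_difference_ci(8.3, 15.6, 486, 8.9, 15.5, 354)
print(round(se_friends, 2), round(lo_friends, 1), round(hi_friends, 1))
```

Swapping the two groups in the call simply flips the signs of the endpoints.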



• We can be 95% confident that the population mean number of close friends for males is between 1.5 less and 2.7 more than the population mean number of close friends for females.

• Order is arbitrary. 95% CI comparing means for females − males is (−2.7, 1.5)

• When CI contains 0, it is plausible that the difference is 0 in the population (i.e., population means equal)

• Here, normal population assumption clearly violated. For large n's, no problem because of CLT, and for small n's the method is robust. (But, means may not be relevant for very highly skewed data.)

• Alternatively could do significance test to find strength of evidence about whether population means differ.



Significance Tests for µ2 − µ1

• Typically we wish to test whether the two population means differ (null hypothesis being no difference, “no effect”).

• H0: µ2 − µ1 = 0 (µ1 = µ2)

• Ha: µ2 − µ1 ≠ 0 (µ1 ≠ µ2)

• Test Statistic:

$$t = \frac{(\bar{y}_2 - \bar{y}_1) - 0}{se} = \frac{\bar{y}_2 - \bar{y}_1}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$



Test statistic has usual form of

(estimate of parameter − H0 value)/standard error.

• P-value: 2-tail probability from t distribution

• For 1-sided test (such as Ha: µ2 − µ1 > 0), P-value = one-tail probability from t distribution (but, not robust)

• Interpretation of P-value and conclusion using α-level same as in one-sample methods

ex. Suppose P-value = 0.58. Then, under supposition that null hypothesis true, probability = 0.58 of getting data like observed or even “more extreme,” where “more extreme” determined by Ha



Example: Comparing female and male mean number of close friends, H0: µ1 = µ2, Ha: µ1 ≠ µ2

Difference between sample means = 8.9 − 8.3 = 0.6

se = 1.09 (same as in CI calculation)

Test statistic t = 0.6/1.09 = 0.55

P-value = 2(0.29) = 0.58 (using standard normal table)

If H0 true of equal population means, would not be unusual to get samples such as observed.

For α = 0.05 = P(Type I error), not enough evidence to reject H0. (Plausible that population means are identical.)

For Ha: µ1 < µ2 (i.e., µ2 − µ1 > 0), P-value = 0.29

For Ha: µ1 > µ2 (i.e., µ2 − µ1 < 0), P-value = 1 − 0.29 = 0.71
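
The three P-values in this example differ only in which tail is used. A small Python sketch, assuming (as the slide does) that the standard normal is an adequate stand-in for the t distribution with samples this large:

```python
from statistics import NormalDist

diff_means = 8.9 - 8.3   # male mean minus female mean
se_means = 1.09          # standard error from the CI calculation
t_stat = diff_means / se_means

upper_tail = 1 - NormalDist().cdf(t_stat)     # Ha: mu2 - mu1 > 0
lower_tail = NormalDist().cdf(t_stat)         # Ha: mu2 - mu1 < 0
two_sided = 2 * min(upper_tail, lower_tail)   # Ha: mu2 - mu1 != 0

print(round(t_stat, 2), round(two_sided, 2), round(upper_tail, 2), round(lower_tail, 2))
```

The one-sided values add to 1, and the two-sided value is double the smaller one, which is why the direction of Ha has to be fixed before looking at the data.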



Equivalence of CI and Significance Test

“H0: µ1 = µ2 rejected (not rejected) at α = 0.05 level in favor of Ha: µ1 ≠ µ2”

is equivalent to

“95% CI for µ1 − µ2 does not contain 0 (contains 0)”

Example: P-value = 0.58, so “We do not reject H0 of equal population means at 0.05 level”

95% CI of (−1.5, 2.7) contains 0.

(For α other than 0.05, corresponds to 100(1 − α)% confidence)



Alternative inference comparing means assumes equal population standard deviations

• We will not consider formulas for this approach here (in Sec. 7.5 of text), as it's a special case of “analysis of variance” methods studied later in Chapter 12.

This CI and test uses t distribution with

df = n1 + n2 − 2

• We will see how software displays this approach and the one we've used that does not assume equal population standard deviations.



Example: Exercise 7.?0, p. 21?. Improvement scores for

therapy A: 10, 20, 30

therapy B: 30, 45, 45

A: mean = 20, s1 = 10

B: mean = 40, s2 = 8.66

Data file, which we input into SPSS and analyze

Subject   Therapy   Improvement
1         A         10
2         A         20
3         A         30
4         B         30
5         B         45
6         B         45



Test of H0: µ1 = µ2, Ha: µ1 ≠ µ2

Test statistic t = (40 − 20)/7.64 = 2.62

When df = 4, P-value = 2(0.0294) = 0.059.

For one-sided Ha: µ1 < µ2 (i.e., predict before study that therapy B is better), P-value = 0.029

With α = 0.05, insufficient evidence to reject null for two-sided Ha, but can reject null for one-sided Ha and conclude therapy B better.

(but remember, must choose Ha ahead of time!)



How does software get df for “unequal variance” method?

• When allow σ1² ≠ σ2², recall that

$$se = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

• The “adjusted” degrees of freedom for the t distribution approximation is (Welch-Satterthwaite approximation):

$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1-1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2-1}\left(\frac{s_2^2}{n_2}\right)^2}$$
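
To make the formula concrete, here is a minimal Python sketch that applies it to the therapy improvement scores from the earlier example. The function name `welch_t_and_df` is illustrative; this is a hand computation, not a reproduction of any particular software's output.

```python
import math
from statistics import mean, variance

def welch_t_and_df(x1, x2):
    """t statistic, se, and Welch-Satterthwaite df for mean(x2) - mean(x1)."""
    n1, n2 = len(x1), len(x2)
    v1 = variance(x1) / n1          # s1^2 / n1
    v2 = variance(x2) / n2          # s2^2 / n2
    se = math.sqrt(v1 + v2)
    t = (mean(x2) - mean(x1)) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, se, df

therapy_a = [10, 20, 30]
therapy_b = [30, 45, 45]
t_welch, se_welch, df_welch = welch_t_and_df(therapy_a, therapy_b)
print(round(t_welch, 2), round(se_welch, 2), round(df_welch, 2))
```

For these data the adjusted df is a little under 4, slightly below the n1 + n2 − 2 = 4 of the equal-variance method. Because the two samples have the same size, the t statistic is the same under both methods.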



Some comments about comparing means

• If data show potentially large differences in variability (say, the larger s being at least double the smaller s), safer to use the “unequal variances” method

• One-sided t tests not robust against severe violations of normal population assumption, when n is small. Better then to use “nonparametric” methods (which do not assume a particular form of population distribution) for one-sided inference; see text Sec. 7.7.

• CI more informative than test, showing whether plausible values are near or far from H0.



Effect Size

• When groups have similar variability, a summary measure of effect size is

$$\text{effect size} = \frac{\text{mean}_2 - \text{mean}_1}{\text{standard deviation in each group}}$$

• Example: The therapies had sample means of 20 for A and 40 for B and standard deviations of 10 and 8.66. If common standard deviation in each group is estimated to be s = 9.35 (say), then

effect size = (40 − 20)/9.35 = 2.1.

Mean for therapy B estimated to be about two standard deviations larger than the mean for therapy A.

This is a large effect.




This effect size measure is sometimes called “Cohen's d.” He considered

d = 0.2 = weak, d = 0.5 = medium, d > 0.8 large.

Example: Which study showed the largest effect?

1. $\bar{y}_1 = 20$, $\bar{y}_2 = 30$, $s = 10$

2. $\bar{y}_1 = 200$, $\bar{y}_2 = 300$, $s = 100$

3. $\bar{y}_1 = 20$, $\bar{y}_2 = 25$, $s = 2$
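
One way to answer the question is to compute the effect size for each study. A short Python sketch, with the hypothetical helper `effect_size`:

```python
def effect_size(mean1, mean2, s):
    """Difference between means in units of the common standard deviation."""
    return (mean2 - mean1) / s

studies = {
    1: (20, 30, 10),
    2: (200, 300, 100),
    3: (20, 25, 2),
}
d_values = {k: effect_size(*v) for k, v in studies.items()}
largest = max(d_values, key=d_values.get)
print(d_values, largest)
```

Studies 1 and 2 have the same effect size even though the raw difference is ten times larger in study 2; study 3 has the smallest raw difference but the largest effect size.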

Comparing Means with Dependent Samples

• Setting: Each sample has the same subjects (as in longitudinal studies or crossover studies) or matched pairs of subjects

• Then, it is not true that for comparing two statistics,

$$se = \sqrt{(se_1)^2 + (se_2)^2}$$

• Must allow for “correlation” between estimates (Why?)




Example: Cell-phone study also had experiment with same subjects in each group

(data on p. 194 of text)

For this “matched-pairs” design, data file has the form

Subject   Cell_no   Cell_yes
1         604       636
2         556       623
3         540       615
…
(for 32 subjects)

Sample means are 534.6 milliseconds without cell phone, 585.2 milliseconds using cell phone




We reduce the 32 observations to 32 difference scores,

636 − 604 = 32

623 − 556 = 67

615 − 540 = 75

…

and analyze them with standard methods for a single sample

$\bar{y}_d$ = 50.6 = 585.2 − 534.6, sd = 52.5 = std dev of 32, 67, 75, …

$$se = s_d/\sqrt{n} = 52.5/\sqrt{32} = 9.28$$

For a 95% CI, df = n − 1 = 31, t-score = 2.04

We get 50.6 ± 2.04(9.28), or (31.7, 69.5)
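
The single-sample calculation on the difference scores can be sketched in Python from the summary statistics. The t-score 2.04 is taken from the slide rather than computed, since the standard library has no t quantile function; variable names are illustrative.

```python
import math

n_pairs = 32
mean_diff = 50.6     # mean of the difference scores (milliseconds)
sd_diff = 52.5       # standard deviation of the difference scores
t_score = 2.04       # t-score for 95% confidence with df = 31, from the slide

se_diff = sd_diff / math.sqrt(n_pairs)
margin = t_score * se_diff
ci_paired = (mean_diff - margin, mean_diff + margin)
t_paired = mean_diff / se_diff       # test statistic for H0: mu_d = 0

print(round(se_diff, 2), tuple(round(x, 1) for x in ci_paired), round(t_paired, 2))
```

Only the 32 differences enter the calculation, so the correlation between the two measurements on each subject is accounted for automatically.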




• We can be 95% confident that the population mean using a cell phone is between 31.7 and 69.5 milliseconds higher than without cell phone.

• For testing H0: µd = 0 against Ha: µd ≠ 0, the test statistic is

t = ($\bar{y}_d$ − 0)/se = 50.6/9.28 = 5.46, df = 31,

Two-sided P-value < 0.00001, so there is extremely strong evidence against the null hypothesis of no difference between the population means.




In class, we will use SPSS to

• Run the dependent-samples t analyses

• Plot cell_yes against cell_no and observe a strong positive correlation (0.814), which illustrates how an analysis that ignores the dependence between the observations would be inappropriate.

• Note that one subject is an outlier (unusually high) on both variables

• With outlier deleted, SPSS tells us that t = 5.26, df = 30 for comparing means (P = 0.00001), 95% CI of (29.1, 66.0). The previous results were not influenced greatly by the outlier.




SPSS output for original dependent-samples t analysis (including the outlier)




Some comments

• Dependent samples have advantages of (1) controlling sources of potential bias (e.g., balancing samples on variables that could affect the response), (2) having a smaller se for the difference of means, when the pairwise responses are highly positively correlated (in which case, the difference scores show less variability than the separate samples)

• With dependent samples, why can't we use the se formula for independent samples?

$$se = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$




Ex. (artificial, but makes the point)

Weights before and after anorexia therapy

Subject   Before   After   Difference
1         115      122     7
2         91       98      7
3         100      107     7
4         1?2      1?9     7

Lots of variability within each group of observations, but no variability for the difference scores (so, actual se is much smaller than independent samples formula suggests)

If you plot x = before against y = after, what do you see?
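
The contrast between the two standard errors can be seen numerically. This Python sketch uses only the first three subjects of the artificial example (before weights 115, 91, 100; each gains 7), and the variable names are illustrative.

```python
import math
from statistics import stdev

before = [115, 91, 100]
after = [122, 98, 107]
diffs = [a - b for a, b in zip(after, before)]   # every difference is 7
n_subjects = len(diffs)

# Correct se for dependent samples: based on the difference scores
se_dependent = stdev(diffs) / math.sqrt(n_subjects)

# se that the independent-samples formula would (wrongly) report
se_independent = math.sqrt(stdev(before) ** 2 / n_subjects
                           + stdev(after) ** 2 / n_subjects)

print(se_dependent, round(se_independent, 1))
```

Because the before and after weights are perfectly correlated, the difference scores have no variability at all, while the independent-samples formula reports a standard error of several pounds.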



• The McNemar test (pp. 201-203) compares proportions with dependent samples

• Fisher's exact test (pp. 203-204) compares proportions for small independent samples

• Sometimes it's more useful to compare groups using ratios rather than differences of parameters




Example: U.S. Dept. of Justice reports that proportion of adults in prison is about

900/100,000 for males, 60/100,000 for females

Difference: 900/100,000 − 60/100,000 = 840/100,000 = 0.0084

Ratio: [900/100,000]/[60/100,000] = 900/60 = 15.0

In applications in which the proportion refers to an undesirable outcome (e.g., most medical studies), the ratio is called the relative risk. Inference methods (CI, test) are available for it also.
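
A brief Python sketch of the two summaries for the prison example, with illustrative variable names:

```python
p_males = 900 / 100_000
p_females = 60 / 100_000

difference = p_males - p_females    # absolute difference in proportions
ratio = p_males / p_females         # how many times higher for males

print(round(difference, 4), round(ratio, 1))
```

The difference looks negligible because both proportions are tiny, while the ratio shows that the male rate is many times the female rate.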



A few summary questions

1. Give an example of (a) independent samples, (b) dependent samples

2. Give an example of (a) response var., (b) categorical explanatory var., and identify whether response is quantitative or categorical and state the appropriate analyses.

3. Suppose that a 95% CI for difference between Massachusetts and Texas in the population proportion supporting legal same-sex marriage is (0.15, 0.22).

a. Population proportion of support is higher in Texas

b. Since 0.15 and 0.22 < 0.50, less than half the population supports legal same-sex marriage.

c. The 99% CI could be (0.17, 0.20)

d. It is plausible that population proportions are equal.

e. P-value for testing equal population proportions against two-sided alternative could be 0.40.

f. We can be 95% confident that the population proportion of support in MA is between 0.15 higher and 0.22 higher than in TX.


Example: Anorexia study, studying weight change for 3 groups (behavioral therapy, family therapy, control). Patients randomly assigned to one of the three therapies. Is this an example of independent samples or dependent samples?
