Count Data. HT Cleopatra VII & Marcus Antony C c Aa.


Transcript of Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

Page 1: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

Count Data

Page 2: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

H T

$$X_H \sim \mathrm{Bin}(n, p_H)$$

$$(X_H, X_T) \sim \mathrm{MNom}(n;\ p_H, p_T)$$

$$(X_1, X_2, \ldots, X_6) \sim \mathrm{MNom}(n;\ p_1, p_2, \ldots, p_6)$$

Page 3: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

Cleopatra VII & Marcus Antony

$$(X_{CA}, X_{Ca}, X_{cA}, X_{ca}) \sim \mathrm{MNom}(n;\ p_{CA}, p_{Ca}, p_{cA}, p_{ca})$$

Levels: C / c and A / a

Page 4: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

EVEN / ODD

1st 12, 2nd 12, 3rd 12

Page 5: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

Gregor Mendel, 1822-1884

Page 6: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.
Page 7: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

             RY    Ry    rY    ry   Total
Obs.        950   250   350    50    1600
Expect(H0)  900   300   300   100    1600

$$H_0: (p_{RY} : p_{Ry} : p_{rY} : p_{ry}) = (9 : 3 : 3 : 1)$$

$$H_1: (p_{RY} : p_{Ry} : p_{rY} : p_{ry}) \neq (9 : 3 : 3 : 1)$$

Which statement is right, $H_0$ or $H_1$?

Page 8: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

             RY     Ry     rY     ry   Total
Obs.        950    250    350     50    1600
Expect      900    300    300    100    1600
O-E          50    -50     50    -50       0
(O-E)^2    2500   2500   2500   2500   10000

Page 9: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

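$$(X_1, X_2, X_3, X_4) \sim \mathrm{MNom}(n;\ p_1, p_2, p_3, p_4)$$

$$X_i \sim \mathrm{Poisson}(\lambda_i), \quad \lambda_i = n p_i, \quad i = 1, 2, 3, 4$$

$$\mathrm{Poisson}(\lambda_i) \approx N(\lambda_i, \lambda_i) \quad \text{for large } \lambda_i$$

$$\frac{X_i - \lambda_i}{\sqrt{\lambda_i}} \sim N(0, 1), \qquad \frac{(X_i - \lambda_i)^2}{\lambda_i} \sim \chi^2(1)$$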

Page 10: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

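$$(X_1, X_2, X_3, X_4) \sim \mathrm{MNom}(n;\ p_1, p_2, p_3, p_4)$$

$$\frac{(X_i - \lambda_i)^2}{\lambda_i} \sim \chi^2(1)$$

$$X_1 + X_2 + X_3 + X_4 = n$$

$$\sum_{i=1}^{4} \frac{(X_i - \lambda_i)^2}{\lambda_i} \sim \chi^2(4 - 1)$$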

Page 11: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

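               1      2      3      4     Total
Obs.         950    250    350     50      1600
Expect       900    300    300    100      1600
O-E           50    -50     50    -50         0
(O-E)^2/E   25/9   25/3   25/3     25   25*16/9

With $O_i = X_i$ and $E_i = \lambda_i$:

$$X^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = \sum_{i=1}^{4} \frac{(X_i - \lambda_i)^2}{\lambda_i} \sim \chi^2(3)$$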
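The arithmetic in the table above can be checked by hand in R. This is a minimal sketch using the slide's counts and the 9:3:3:1 ratio; the variable names (`obs`, `p0`, `x2`) are illustrative choices, not from the slides.

```r
# Observed counts (RY, Ry, rY, ry) and the 9:3:3:1 null ratio from the slide
obs <- c(RY = 950, Ry = 250, rY = 350, ry = 50)
p0  <- c(9, 3, 3, 1) / 16

expected <- sum(obs) * p0                        # 900 300 300 100
x2       <- sum((obs - expected)^2 / expected)   # Pearson statistic, 25*16/9
df       <- length(obs) - 1                      # 4 categories, 1 constraint
p_value  <- pchisq(x2, df, lower.tail = FALSE)   # upper-tail probability

print(c(X2 = x2, df = df, p = p_value))
```

The same statistic is what `chisq.test(obs, p = p0)` reports.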

Page 12: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

Upper-tail critical values $\chi^2_{\alpha,n}$ (rows: $\alpha$; columns: degrees of freedom $n$)

alpha \ n      1        3        5        8       15       24
0.975       0.001    0.216    0.831    2.180    6.262   12.401
0.95        0.004    0.352    1.145    2.733    7.261   13.848
0.05        3.841    7.815   11.071   15.507   24.996   36.415
0.025       5.024    9.348   12.833   17.535   27.488   39.364

[Figure: two plots of the $\chi^2(n)$ distribution with the tail area $\alpha$ and the critical value $\chi^2_{\alpha,n}$ marked]
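The tabulated critical values can be reproduced with `qchisq`. This is a small sketch, assuming the table lists upper-tail values (so the quantile level is `1 - alpha`); the object name `crit` is illustrative.

```r
# Rows: upper-tail area alpha; columns: degrees of freedom n
alpha <- c(0.975, 0.95, 0.05, 0.025)
n     <- c(1, 3, 5, 8, 15, 24)

# qchisq() takes a lower-tail probability, hence 1 - alpha
crit <- outer(alpha, n, function(a, d) qchisq(1 - a, df = d))
dimnames(crit) <- list(alpha = alpha, n = n)

print(round(crit, 3))
```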

Page 13: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

             RY    Ry    rY    ry   Total
Obs.        950   250   350    50    1600
Expect(H0)  900   300   300   100    1600

$$H_0: (p_{RY}, p_{Ry}, p_{rY}, p_{ry}) = (9 : 3 : 3 : 1)$$

$$H_1: (p_{RY}, p_{Ry}, p_{rY}, p_{ry}) \neq (9 : 3 : 3 : 1)$$

$$\sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = 25 \times 16 / 9 = 44.44 > 7.815 = \chi^2_{0.05,3}$$

> x <- c(950,250,350,50)
> p <- c(9,3,3,1)/16
> chisq.test(x, p=p)

        Chi-squared test for given probabilities

data:  x
X-squared = 44.4444, df = 3, p-value = 1.214e-09

Page 14: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

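$$p = (p_{RY}, p_{Ry}, p_{rY}, p_{ry})^\top \in M_1 = H_0 \cup H_1$$

$$p^* = \tfrac{1}{16}(9, 3, 3, 1)^\top \in M_0 = H_0$$

$$H_0: p = p^* \quad \text{vs.} \quad H_1: p \neq p^*$$

$$df = \dim(H_0 \cup H_1) - \dim(H_0) = 3 - 0 = 3$$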

Page 15: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

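           Y      y    Total
R        950    250     1200
r        350     50      400
Total   1300    300     1600

$$H_0: p_{(s,c)} = p_s\, p_c \quad \text{vs.} \quad H_1: p_{(s,c)} \neq p_s\, p_c$$

s \ c        Y                  y                 Total
R        $p_{(R,Y)}$        $p_{(R,y)}$       $p_R = 12/16$
r        $p_{(r,Y)}$        $p_{(r,y)}$       $p_r = 4/16$
Total    $p_Y = 13/16$      $p_y = 3/16$      1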

Page 16: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

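$$H_0: p_{(s,c)} = p_s\, p_c \quad \text{vs.} \quad H_1: p_{(s,c)} \neq p_s\, p_c$$

s \ c        Y                  y                 Total
R        $p_{(R,Y)}$        $p_{(R,y)}$       $p_R = 12/16$
r        $p_{(r,Y)}$        $p_{(r,y)}$       $p_r = 4/16$
Total    $p_Y = 13/16$      $p_y = 3/16$      1

Under $H_0$, each cell probability is the product of its margins:

s \ c        Y                    y                   Total
R        (12/16)(13/16)       (12/16)(3/16)       12/16
r        (4/16)(13/16)        (4/16)(3/16)        4/16
Total    13/16                3/16                1

Chi-square test for independence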

Page 17: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

             RY    Ry    rY    ry   Total
Obs.        950   250   350    50    1600
Expect(H0)  975   225   325    75    1600

Expected counts under $H_0$ (total times the product of the margins):

s \ c        Y                              y                            Total
R        1600(12/16)(13/16) = 975       1600(12/16)(3/16) = 225       1200
r        1600(4/16)(13/16) = 325        1600(4/16)(3/16) = 75         400
Total    1300                           300                           1600
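The expected counts under independence can be computed directly from the margins of the 2 x 2 table. This is a minimal sketch; the dimension labels `s` and `c` follow the slide's subscripts, and the object names are illustrative.

```r
# Observed 2 x 2 table from the slide (rows R, r; columns Y, y)
tab <- matrix(c(950, 350, 250, 50), nrow = 2,
              dimnames = list(s = c("R", "r"), c = c("Y", "y")))

# Under H0 each expected count is (row total * column total) / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic for independence, compared with chi-square(1)
x2 <- sum((tab - expected)^2 / expected)

print(expected)
print(x2)
```

The value of `x2` should agree with `chisq.test(tab, correct = FALSE)`.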

Page 18: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

              RY     Ry     rY     ry    Total
Obs.         950    250    350     50     1600
Expect(H0)   975    225    325     75     1600
(O-E)^2/E   0.64   2.77   1.92   8.33    13.67

$$H_0: p_{(s,c)} = p_s\, p_c \quad \text{vs.} \quad H_1: p_{(s,c)} \neq p_s\, p_c$$

$$\sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = 13.67 > 3.84 = \chi^2_{0.05,1}$$

> mx<- matrix(c(950,250,350,50),2,)
> mx
     [,1] [,2]
[1,]  950  350
[2,]  250   50
> chisq.test(mx,correct=F)

        Pearson's Chi-squared test

data:  mx
X-squared = 13.6752, df = 1, p-value = 0.0002173

Page 19: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

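s \ c        Y               y               Total
R        $p_{(R,Y)}$     $p_{(R,y)}$     $p_R$
r        $p_{(r,Y)}$     $p_{(r,y)}$     $p_r$
Total    $p_Y$           $p_y$           1

$$p_R + p_r = 1, \qquad p_Y + p_y = 1$$

$$p = (p_{RY}, p_{Ry}, p_{rY}, p_{ry})^\top, \qquad \dim(H_0 \cup H_1) = 3$$

$$p^* = (p_R p_Y,\ p_R p_y,\ p_r p_Y,\ p_r p_y)^\top, \qquad \dim(H_0) = 2$$

$$df = \dim(H_0 \cup H_1) - \dim(H_0) = 3 - 2 = 1$$

$$df = (2 - 1) \times (2 - 1) = 1$$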

Page 20: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

s \ c    y1   …   ym   Tot
r1
⋮
rk
Tot                    1

$$\dim(H_0 \cup H_1) = mk - 1$$

$$\dim(H_0) = (m - 1) + (k - 1)$$

$$df = \dim(H_0 \cup H_1) - \dim(H_0) = (m - 1)(k - 1)$$
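The degrees-of-freedom count above is simple bookkeeping, which the following sketch spells out for a table with k rows and m columns. The function name `df_independence` is hypothetical, introduced only for this illustration.

```r
# Degrees of freedom for the independence test in a k x m table,
# computed as dim(H0 union H1) - dim(H0)
df_independence <- function(k, m) {
  dim_full <- k * m - 1            # free cell probabilities (they sum to 1)
  dim_null <- (k - 1) + (m - 1)    # free row margins plus free column margins
  dim_full - dim_null
}

print(df_independence(2, 2))   # the 2 x 2 seed table
print(df_independence(4, 3))   # a 4 x 3 table such as the rare plants data
```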

Page 21: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

               1     2     3     4     5     6   Total
Obs.           8    12     7    14     9    10      60
Expect(H0)    10    10    10    10    10    10      60
(O-E)^2/E    0.4   0.4   0.9   1.6   0.1     0     3.4

$$H_0: (p_1 : p_2 : p_3 : p_4 : p_5 : p_6) = (1 : 1 : 1 : 1 : 1 : 1)$$

$$H_1: (p_1 : p_2 : p_3 : p_4 : p_5 : p_6) \neq (1 : 1 : 1 : 1 : 1 : 1)$$

$$\sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i} = 3.4 < 11.07 = \chi^2_{0.05,5}$$

> x <- c(8,12,7,14,9,10)
> p <- rep(1,6)/6
> chisq.test(x,p=p)

        Chi-squared test for given probabilities

data:  x
X-squared = 3.4, df = 5, p-value = 0.6386

Page 22: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

               H     T   Total
Obs.          60    40     100
Expect(H0)    50    50     100
(O-E)^2/E      2     2       4

$$H_0: (p_H : p_T) = (1 : 1) \quad \text{vs.} \quad H_1: (p_H : p_T) \neq (1 : 1)$$

Equivalently:

$$H_0: p_H = 1/2 \quad \text{vs.} \quad H_1: p_H \neq 1/2$$

$$\sum_{i=1}^{2} \frac{(O_i - E_i)^2}{E_i} = 4 > 3.84 = \chi^2_{0.05,1}$$

> chisq.test(c(60,40),p=c(1,1)/2)

        Chi-squared test for given probabilities

data:  c(60, 40)
X-squared = 4, df = 1, p-value = 0.0455

Page 23: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

560 : 440  ||  640 : 360 ?

        Caesar   Ptolemy
Head      560       640
Tail      440       360

> head2 <- c( 560, 640)
> toss2 <- c( 1000, 1000)
> prop.test(head2, toss2)

        2-sample test for equality of proportions ….

data:  head2 out of toss2
X-squared = 13.0021, df = 1, p-value = 0.0003111
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.12379728 -0.03620272
sample estimates:
prop 1 prop 2
  0.56   0.64

> mx <- matrix(c(560,440,640,360),2,)
> mx
     [,1] [,2]
[1,]  560  640
[2,]  440  360
> chisq.test(mx,cor=F)

        Pearson's Chi-squared test

data:  mx
X-squared = 13.3333, df = 1, p-value = 0.0002607

> chisq.test(mx)

        Pearson's Chi-squared test with Yates' continuity correction

data:  mx
X-squared = 13.0021, df = 1, p-value = 0.0003111

Chi-square test for homogeneity of distributions

Page 24: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

          Coin 1       Coin 2       Coin 3       Coin 4
Head        83           90          129           70      (Alive)
Tail         3            3            7           12      (Dead)
Total       86           93          136           82      (Total)
          Hospital 1   Hospital 2   Hospital 3   Hospital 4

> # H0 : all four coins have the same proportion showing head side
> # H1 : at least one coin has a different proportion from the others
>
> head4 <- c( 83, 90, 129, 70 )
> toss4 <- c( 86, 93, 136, 82 )
> prop.test(head4, toss4)

        4-sample test for equality of proportions without continuity correction

data:  head4 out of toss4
X-squared = 12.6004, df = 3, p-value = 0.005585
alternative hypothesis: two.sided
sample estimates:
   prop 1    prop 2    prop 3    prop 4
0.9651163 0.9677419 0.9485294 0.8536585

> mx <- matrix(c(83,3,90,3,129,7,70,12),2,)
> chisq.test(mx)

        Pearson's Chi-squared test

data:  mx
X-squared = 12.6004, df = 3, p-value = 0.005585

Page 25: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

        D      W     WD
CC     37    190     94
CR     23     59     23
RC     10    141     28
RR     15     58     16

Australia rare plants data

Common (C) & Rare (R) in (South Australia, Victoria) and (Tasmania)

The number of plants in Dry (D), Wet (W), and Wet or Dry (WD) regions.

Question (null hypothesis):

Is the distribution of plants over (D, W, WD) the same for all of CC, CR, RC and RR?

Page 26: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

Australia rare plants data

> rareplants<-matrix(c(37,23,10,15,190,59,141,58,94,23,28,16),4,)
> dimnames(rareplants)<-list(c("CC","CR","RC","RR"),c("D","W","WD"))
> rareplants
> (sout<- chisq.test(rareplants) )

        Pearson's Chi-squared test

data:  rareplants
X-squared = 34.9863, df = 6, p-value = 4.336e-06

> round( sout$expected ,1 )
      D     W   WD
CC 39.3 207.2 74.5
CR 12.9  67.8 24.4
RC 21.9 115.6 41.5
RR 10.9  57.5 20.6
> round( sout$resid ,3 )
        D      W     WD
CC -0.369 -1.196  2.263
CR  2.828 -1.067 -0.275
RC -2.547  2.368 -2.099
RR  1.242  0.072 -1.023

Page 27: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

The lady tasting tea

http://www.youtube.com/watch?v=lgs7d5saFFc

http://en.wikipedia.org/wiki/Fisher's_exact_test

Page 28: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

Fisher's exact test for 2 x 2 tables with small n (n < 25)

Guess\Making   Milk 1st   Tea 1st   Sum
Milk 1st           7          1       8
Tea 1st            2          5       7
sum                9          6      15

> chisq.test(matrix(c(7,2,1,5),2,))

        Pearson's Chi-squared test with Yates' continuity correction

X-squared = 3.2254, df = 1, p-value = 0.0725
Warning message: Chi-squared approximation may be incorrect

> fisher.test(matrix(c(7,2,1,5),2,))

        Fisher's Exact Test for Count Data

p-value = 0.04056
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
   0.8646648 934.0087368
sample estimates:
odds ratio
  13.59412

> fisher.test(matrix(c(7,2,1,5),2,),alter="greater")

        Fisher's Exact Test for Count Data

p-value = 0.03497
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 1.179718      Inf
sample estimates:
odds ratio
  13.59412

Page 29: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

There are 7 possible tables for the given marginal counts.

Every table has row sums (8, 7), column sums (9, 6) and n = 15. Rows are the guess (G), columns are the actual making (M).

Table   Guess M 1st: (made M 1st, made T 1st)   Guess T 1st: (made M 1st, made T 1st)
1       8, 0                                    1, 6
2       7, 1                                    2, 5
3       6, 2                                    3, 4
4       5, 3                                    4, 3
5       4, 4                                    5, 2
6       3, 5                                    6, 1
7       2, 6                                    7, 0

What is the probability of observing each table in the experiment?

Page 30: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

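G\M      M 1st   T 1st   Sum
M 1st      a       b     a+b
T 1st      c       d     c+d
sum       a+c     b+d     n

$$n = a + b + c + d$$

G\M      M 1st   T 1st   Sum
M 1st      r       q      v
T 1st     1-r     1-q    1-v
sum        1       1      1

Odds ratio:

$$\theta = \frac{r/(1-r)}{q/(1-q)} = \frac{r(1-q)}{q(1-r)}$$

$\theta = 1$ (that is, $r = q = v$) means no discernible ability.

$$H_0: \theta = 1 \quad \text{vs.} \quad H_1: \theta \neq 1$$

Probability of a table with the given margins when $\theta = 1$:

$$p = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}} = \frac{\binom{a+c}{a}\binom{b+d}{b}}{\binom{n}{a+b}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}$$

Estimate of the odds ratio:

$$\hat\theta = \frac{ad}{bc} \quad \text{with some correction}$$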

Page 31: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

When $\theta = 1$, the probability of each table with the given margins (row sums 8, 7; column sums 9, 6; n = 15):

Table   Guess M 1st: (made M 1st, made T 1st)   Guess T 1st: (made M 1st, made T 1st)   Probability
1       8, 0                                    1, 6                                    0.00140
2       7, 1                                    2, 5                                    0.03356
3       6, 2                                    3, 4                                    0.19580
4       5, 3                                    4, 3                                    0.39161
5       4, 4                                    5, 2                                    0.29370
6       3, 5                                    6, 1                                    0.07832
7       2, 6                                    7, 0                                    0.00560

0.00140 + 0.03356 + 0.00560 = 0.04056 (the p-value of Fisher's exact test; two-sided test)

0.00140 + 0.03356 = 0.03497 (one-sided test; the unrounded sum is 0.034965)
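The table probabilities and both p-values can be reproduced with the hypergeometric density. This is a sketch under the slide's margins (9 cups made milk-first, 6 tea-first, 8 guessed milk-first); the names `a`, `prob`, `p_two` and `p_one` are illustrative, and the small tolerance factor mirrors the kind of tie handling used for two-sided exact tests.

```r
# a = number of cups made milk-first that were also guessed milk-first
a    <- 2:8
prob <- dhyper(a, m = 9, n = 6, k = 8)   # P(table) when the odds ratio is 1
names(prob) <- a
print(round(prob, 5))

obs <- 7                                  # observed top-left cell
p_obs <- prob[as.character(obs)]

# Two-sided: add up every table that is no more likely than the observed one
p_two <- sum(prob[prob <= p_obs * (1 + 1e-7)])
# One-sided ("greater"): tables at least as extreme in the observed direction
p_one <- sum(prob[a >= obs])

print(c(two_sided = p_two, one_sided = p_one))
```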

Page 32: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

100% correct answers:

G\M      M 1st   T 1st   Sum
M 1st      9       0      9
T 1st      0       6      6
sum        9       6     15

Some are misclassified:

G\M      M 1st   T 1st   Sum
M 1st      4       4      8
T 1st      5       2      7
sum        9       6     15

Fisher's exact test considers only the tables with the same fixed margins.

The probabilities of tables with different margins are completely ignored.

This is sometimes referred to as data-respecting (?) inference.

Page 33: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

Use Fisher's exact test only for small n (less than 25).

Guess\Making   Milk 1st   Tea 1st   Sum
Milk 1st          14          2      16
Tea 1st            4         10      14
sum               18         12      30

> chisq.test(matrix(c(14,4,2,10),2,),correct=F)

        Pearson's Chi-squared test

X-squared = 10.8036, df = 1, p-value = 0.001013

> chisq.test(matrix(c(14,4,2,10),2,))

        Pearson's Chi-squared test with Yates' continuity correction

X-squared = 8.4877, df = 1, p-value = 0.003576

> fisher.test(matrix(c(14,4,2,10),2,))

        Fisher's Exact Test for Count Data

p-value = 0.002185
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
   2.123319 202.143800
sample estimates:
odds ratio
  15.40804

No big difference when n is large!

Page 34: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

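Yates' continuity correction

G\M      M 1st   T 1st   Sum
M 1st      a       b     a+b
T 1st      c       d     c+d
sum       a+c     b+d     n

$$n = a + b + c + d$$

$$X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = \frac{n\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)} = 10.8036$$

$$X^2_{\text{corrected}} = \sum_i \frac{(|O_i - E_i| - 1/2)^2}{E_i} = \frac{n\,(|ad - bc| - n/2)^2}{(a+b)(c+d)(a+c)(b+d)} = 8.4877$$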
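Both closed forms for the 2 x 2 statistic are easy to evaluate directly. The following is a minimal sketch with a hypothetical helper `pearson_2x2`; clamping the corrected difference at zero is an assumption made here for safety and does not matter for the tea data with n = 30.

```r
# Pearson statistic for a 2 x 2 table with cells a, b (first row), c, d (second row)
pearson_2x2 <- function(a, b, c, d, yates = FALSE) {
  n    <- a + b + c + d
  diff <- abs(a * d - b * c)
  if (yates) diff <- max(diff - n / 2, 0)   # continuity correction
  n * diff^2 / ((a + b) * (c + d) * (a + c) * (b + d))
}

x2_plain <- pearson_2x2(14, 2, 4, 10)                 # uncorrected
x2_yates <- pearson_2x2(14, 2, 4, 10, yates = TRUE)   # with Yates' correction

print(round(c(plain = x2_plain, yates = x2_yates), 4))
```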

Page 35: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

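Odds ratio:

$$\theta = \frac{r/(1-r)}{q/(1-q)}$$

$$\log(\theta) = \log\frac{r}{1-r} - \log\frac{q}{1-q}$$

[Figure: logit(y) plotted against y on (0, 1), vertical axis from -4 to 4]

[Figure: logistic(x) plotted against x from -6 to 6, vertical axis from 0 to 1]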
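The logit and logistic functions in the two plots are inverses of each other, and the log odds ratio is a difference of two logits. The sketch below illustrates this with the sample proportions from the tea-tasting table (7 of 9 and 1 of 6 guessed "milk first"); the function and variable names are illustrative.

```r
logit    <- function(y) log(y / (1 - y))     # maps (0, 1) to the real line
logistic <- function(x) 1 / (1 + exp(-x))    # inverse of logit

# Sample proportions of "milk first" guesses among cups made M 1st and T 1st
r <- 7 / 9
q <- 1 / 6

log_or <- logit(r) - logit(q)                # log odds ratio
print(c(log_odds_ratio = log_or, odds_ratio = exp(log_or)))
print(logistic(logit(0.3)))                  # round trip returns 0.3
```

Base R provides the same functions as `qlogis` (logit) and `plogis` (logistic).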

Page 36: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

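Generalized Linear Model (GLM)

Linear Model (LM)

Regression:

$$Y_i \sim N(\mu_i, \sigma^2) \ \text{independent}, \qquad \mu_i = \beta_0 + \beta_1 X_i$$

ANOVA:

$$Y_{ij} \sim N(\mu_{ij}, \sigma^2) \ \text{independent}, \qquad \mu_{ij} = \mu + \alpha_i$$

$$Y_{ij} \sim N(\mu_{ij}, \sigma^2) \ \text{independent}, \qquad \mu_{ij} = \alpha_i + \beta X_{ij}$$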

Page 37: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

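Linear Model (LM): Regression, ANOVA

$$Y_{ij} \sim N(\mu_{ij}, \sigma^2) \ \text{independent}, \qquad \mu_{ij} = \alpha_i + \beta X_{ij}$$

Generalized Linear Model (GLM)

Poisson Regression:

$$Y_{ij} \sim \mathrm{Poisson}(\lambda_{ij}) \ \text{independent}, \qquad \log(\lambda_{ij}) = \alpha_i + \beta X_{ij}$$

Binomial Regression (Logistic Regression):

$$Y_{ij} \sim \mathrm{Bin}(n, p_{ij}) \ \text{independent}, \qquad \log\frac{p_{ij}}{1 - p_{ij}} = \alpha_i + \beta X_{ij}$$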

Page 38: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

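Guess\Making   Milk 1st   Tea 1st   Sum
Milk 1st           7          1       8
Tea 1st            2          5       7
sum                9          6      15

$$Y_i \sim \mathrm{Bin}(n_i, p_i), \quad i = 1, 2 \qquad (i: \text{Making};\ j: \text{Guess})$$

$$Y_1 \sim \mathrm{Bin}(9, p_1), \qquad Y_2 \sim \mathrm{Bin}(6, p_2)$$

$$V_1 = 9 - Y_1, \qquad V_2 = 6 - Y_2$$

$Y_1 = 7,\ Y_2 = 1$ are observed!

Logistic regression

$$\log\frac{p_i}{1 - p_i} = \eta_i = \alpha + \beta X_i$$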

Page 39: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

Logistic regression with the lady tasting tea data

> tm<-data.frame(gm=c(7,1),gt=c(2,5), making=c("M","T"))
> summary( glm(cbind(gm,gt)~making,family=binomial, data=tm) )

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   1.2528     0.8018   1.562    0.118
makingT      -2.8622     1.3575  -2.108    0.035 *

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5.7863e+00  on 1  degrees of freedom
Residual deviance: 8.8818e-16  on 0  degrees of freedom
AIC: 8.1909

Number of Fisher Scoring iterations: 4

Page 40: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

A B C D E F

10 11 0 3 3 11

7 17 1 5 5 9

20 21 7 12 3 15

14 11 2 6 5 22

14 16 3 4 3 15

12 14 1 3 6 16

10 17 2 5 1 13

23 17 1 5 1 10

17 19 3 5 3 26

20 21 0 5 2 26

14 7 1 2 6 24

13 13 4 4 4 13

InsectSprays data

[Figure: insect count (0 to 25) by type of spray (A to F)]

Page 41: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

> sx<-rep(LETTERS[1:6],e=12)
> dx<-c(10,7,20,14,14,12,10,23,17,20,14,13,11,17,21,11,16,14,17,17,19,21,7,13,
+ 0,1,7,2,3,1,2,1,3,0,1,4,3,5,12,6,4,3,5,5,5,5,2,4,3,5,3,5,3,6,1,1,3,2,6,
+ 4,11,9,15,22,15,16,13,10,26,26,24,13)
> ax<- 30-dx
> insect<-data.frame(dead=dx,alive=ax,spray=sx)
> gout<-glm(cbind(dead,alive)~spray,family=binomial, data=insect)
> summary( gout )

Call:
glm(formula = cbind(dead, alive) ~ spray, family = binomial, data = insect)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.06669    0.10547  -0.632   0.5272
sprayB       0.11114    0.14913   0.745   0.4561
sprayC      -2.52856    0.23259 -10.871   <2e-16 ***
sprayD      -1.56288    0.17719  -8.821   <2e-16 ***
sprayE      -1.95769    0.19513 -10.033   <2e-16 ***
sprayF       0.28983    0.14958   1.938   0.0527 .

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 614.07  on 71  degrees of freedom
Residual deviance: 171.24  on 66  degrees of freedom
AIC: 416.16

Number of Fisher Scoring iterations: 4

$$\eta_i = \log\frac{p_i}{1 - p_i}$$

Page 42: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

> gres<-rbind(unique(fitted(gout)),unique(predict(gout)))
> dimnames(gres)[[2]]<-LETTERS[1:6]
> gres
               A          B           C          D          E         F
[1,]  0.48333333 0.51111111  0.06944445  0.1638889  0.1166667 0.5555556
[2,] -0.06669137 0.04445176 -2.59525468 -1.6295728 -2.0243818 0.2231436

Row [1,] is $p_i$; row [2,] is $\eta_i = \log\dfrac{p_i}{1 - p_i}$.

> anova(gout)
Analysis of Deviance Table

Model: binomial, link: logit

Response: cbind(dead, alive)

Terms added sequentially (first to last)

      Df Deviance Resid. Df Resid. Dev
NULL                     71     614.07
spray  5   442.83        66     171.24

Page 43: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

Correlation and causality

Do more Starbucks (STBK) stores cause apartment (APT) prices to rise?

The more Starbucks stores, the higher the APT price!

APT prices in Seoul

Page 44: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

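District               STBK   APT price
Gangnam-gu (강남구)      45       1030
Gangdong-gu (강동구)      2        530
Jung-gu (중구)           24        520
Jungnang-gu (중랑구)      0        330

STBK: the number of Starbucks stores

APT price: average apartment price per 1 m2

$$Y_i \sim \mathrm{Poisson}(\lambda_i), \qquad \log(\lambda_i) = \alpha + \beta X_i$$

$Y_i$: STBK, $X_i$: APT price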

Page 45: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

y<-c(45, 2,1,4,4,6,4,2,1,0,2,3,10,8,21,3,5,5,3,12,7,1,20,24,0)
x<-c(3373,1907,1115,1413,1286,1861,1218,1018,1250,1135,1240,1528,
     1675,1220,2854,1644,1247,2427,2034,1723,2594,1138,1634,1729,1101)

xm<- x/(3.3) # price per pyeong (about 3.3 m2) converted to price per m2

( res<- glm(y~xm, family=poisson) )

anova(res)
summary(res)

plot(xm,y,ylab="Starbucks",xlab="APT price/m2")

points(xm,fitted(res),col="red",pch=16) # exp(predict(res))=fitted(res)

[Figure: number of Starbucks stores (0 to 40) against APT price/m2 (300 to 1000), with the fitted values in red]

Page 46: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

> summary(res)

Call:
glm(formula = y ~ xm, family = poisson)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.6923  -1.7239  -0.6041   0.5783   5.3036

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.0072064  0.2128074  -0.034    0.973
xm           0.0035630  0.0003009  11.841   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 235.19  on 24  degrees of freedom
Residual deviance: 111.52  on 23  degrees of freedom
AIC: 195.4

Number of Fisher Scoring iterations: 5

Page 47: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

> anova(res)
Analysis of Deviance Table

Model: poisson, link: log

Response: y

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev
NULL                    24     235.19
xm    1   123.67        23     111.52

Page 48: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

A 0.75 0.05 0.05 0.05 0.05 0.05

B 0.1 0.5 0.1 0.1 0.1 0.1

C 0.05 0.05 0.05 0.05 0.05 0.75

distribution & likelihood

Page 49: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

$$X \sim \mathrm{Bin}(n, p), \qquad f(x) = \binom{n}{x} p^x (1 - p)^{n - x}$$

$X = x_0$ is observed.

What is $p$?

$$p \in (0, 1)$$

Page 50: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

distribution & likelihood

[Figure, left: probability function of x = 0, 1, ..., 6 for $p = 0.1$ and $p = 0.7$]

[Figure, right: likelihood of $p$ on (0, 1) when $x = 2$ is observed; the likelihood equals 0.15 at $p = 0.133$ and $p = 0.587$]

$$f(2) = \binom{6}{2} p^2 (1 - p)^4$$
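The likelihood curve for x = 2 successes in 6 trials can be explored numerically. This is a sketch assuming the cut-off 0.15 read from the plot; `lik`, `p_hat`, `lower` and `upper` are illustrative names.

```r
# Likelihood of p after observing x = 2 out of n = 6: choose(6, 2) p^2 (1 - p)^4
lik <- function(p) dbinom(2, size = 6, prob = p)

# Maximum likelihood estimate (analytically 2/6)
p_hat <- optimize(lik, interval = c(0, 1), maximum = TRUE)$maximum

# Values of p where the likelihood drops to 0.15, one on each side of p_hat
lower <- uniroot(function(p) lik(p) - 0.15, c(0.01, p_hat))$root
upper <- uniroot(function(p) lik(p) - 0.15, c(p_hat, 0.99))$root

print(round(c(p_hat = p_hat, lower = lower, upper = upper), 3))
```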

Page 51: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

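$$Y_j \sim N(\mu_j, \sigma^2) \ \text{independent}, \qquad j = 1, 2, \ldots, n$$

$$f(y) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y - \mu)^2 / (2\sigma^2)}$$

$$\text{likelihood} = \prod_{j=1}^{n} f(y_j \mid \mu_j) = (2\pi\sigma^2)^{-n/2}\, e^{-\sum_{j=1}^{n} (y_j - \mu_j)^2 / (2\sigma^2)}$$

$$-2\log(\text{likelihood}) = -2\sum_{j=1}^{n} \log f(y_j \mid \mu_j) = \sum_{j=1}^{n} \frac{(y_j - \mu_j)^2}{\sigma^2} + n\log(2\pi\sigma^2)$$

$$SSE = \sum_{j} (Y_j - \mu_j)^2 \ ?$$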
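The identity between -2 log(likelihood) and the scaled sum of squares can be checked numerically. The data below are simulated purely for illustration (they are not from the slides), and the variable names are arbitrary.

```r
set.seed(1)
n      <- 10
mu     <- rep(2, n)          # illustrative means
sigma2 <- 1.5^2              # illustrative variance
y      <- rnorm(n, mean = mu, sd = sqrt(sigma2))

# Direct evaluation from the normal density
m2ll_direct <- -2 * sum(dnorm(y, mean = mu, sd = sqrt(sigma2), log = TRUE))

# Closed form: SSE / sigma^2 + n log(2 pi sigma^2)
sse          <- sum((y - mu)^2)
m2ll_formula <- sse / sigma2 + n * log(2 * pi * sigma2)

print(c(direct = m2ll_direct, formula = m2ll_formula))
```

For fixed sigma^2, minimising -2 log(likelihood) over the means is therefore the same as minimising SSE.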

Page 52: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

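$$\text{Deviance} = -2\log(\text{likelihood})$$

$$Y_j \sim \mathrm{Poisson}(\lambda_j) \ \text{independent}, \qquad j = 1, 2, \ldots, n$$

$$f(y) = \frac{e^{-\lambda} \lambda^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$$

$$\text{likelihood} = \prod_{j=1}^{n} f(y_j \mid \lambda_j) = \prod_{j=1}^{n} \frac{e^{-\lambda_j} \lambda_j^{y_j}}{y_j!}$$

$$-2\log(\text{likelihood}) = -2\sum_{j=1}^{n} \log f(y_j \mid \lambda_j) = 2\sum_{j=1}^{n} \left(\lambda_j - y_j \log\lambda_j + \log(y_j!)\right)$$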

Page 53: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

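$$Y_j \sim \mathrm{Poisson}(\lambda_j) \ \text{independent}$$

Link function (for Poisson family), with linear modeling for the link function:

$$\log(\lambda_j) = \eta_j = \alpha + \beta X_j$$

$$-2\log(\text{likelihood}) = -2\sum_{j=1}^{n} \log f(y_j \mid \lambda_j) = 2\sum_{j=1}^{n} \left(\lambda_j - y_j \log\lambda_j + \log(y_j!)\right)$$

$$= 2\sum_{j=1}^{n} \left(e^{\alpha + \beta x_j} - y_j(\alpha + \beta x_j) + \log(y_j!)\right)$$

$$AIC = -2\log(\text{likelihood}) + 2k, \qquad k: \text{the number of parameters}$$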
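The Poisson -2 log(likelihood) and the AIC formula can be verified against what `glm` reports. The small data set below is hypothetical and serves only to exercise the formulas; `m2ll` and `aic` are illustrative names.

```r
# Hypothetical covariate and counts, for illustration only
x <- c(1, 2, 3, 4, 5, 6)
y <- c(0, 2, 1, 4, 6, 9)

fit    <- glm(y ~ x, family = poisson)   # log link by default
lambda <- fitted(fit)                    # exp(alpha + beta * x) at the estimates

# -2 log(likelihood) = 2 * sum(lambda - y log(lambda) + log(y!))
m2ll <- 2 * sum(lambda - y * log(lambda) + lgamma(y + 1))

# AIC = -2 log(likelihood) + 2k with k = number of estimated parameters
aic <- m2ll + 2 * length(coef(fit))

print(c(m2ll = m2ll, aic = aic, glm_aic = AIC(fit)))
```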

Page 54: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

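$$\text{Deviance} = -2\log(\text{likelihood})$$

$$Y_j \sim \mathrm{Bin}(n, p_j) \ \text{independent}, \qquad j = 1, 2, \ldots, n$$

$$f(y) = f(y \mid p) = \binom{n}{y} p^{y} (1 - p)^{n - y}, \qquad y = 0, 1, \ldots, n$$

$$\text{likelihood} = \prod_{j} f(y_j \mid p_j) = \prod_{j} \binom{n}{y_j} p_j^{y_j} (1 - p_j)^{n - y_j}$$

$$-2\log(\text{likelihood}) = -2\sum_{j} \left( n\log(1 - p_j) + y_j \log\frac{p_j}{1 - p_j} + \log\binom{n}{y_j} \right)$$

Link function (for binomial family), with linear modeling for the link function:

$$\log\frac{p_j}{1 - p_j} = \alpha + \beta X_j$$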

Page 55: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

Independence test in GLM for Australia rare plants data

        D      W     WD
CC     37    190     94
CR     23     59     23
RC     10    141     28
RR     15     58     16

> rareplants<-matrix(c(37,23,10,15,190,59,141,58,94,23,28,16),4,)
> dimnames(rareplants)<-list(c("CC","CR","RC","RR"),c("D","W","WD"))
> (sout<- chisq.test(rareplants) )

        Pearson's Chi-squared test

data:  rareplants
X-squared = 34.9863, df = 6, p-value = 4.336e-06

> wdx<-rep(c("D","W","WD"),e=4)
> crx<-rep(c("CC","CR","RC","RR"),3)
> rplants<-data.frame(wd=wdx,cr=crx,r=c(rareplants))
> anova( glm(r~wd*cr,family=poisson,data=rplants) )

Analysis of Deviance Table
Model: poisson, link: log, Response: r

Terms added sequentially (first to last)
      Df Deviance Resid. Df Resid. Dev
NULL                     11     522.11
wd     2   305.28         9     216.83
cr     3   181.88         6      34.95
wd:cr  6    34.95         0  -9.77e-15

$$-2\log(\text{likelihood}) \sim \chi^2(df)$$

> 1-pchisq(34.95,6)
[1] 4.406699e-06
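The deviance attributed to the interaction term is the likelihood-ratio statistic for independence, which can also be computed straight from observed and expected counts. This is a sketch using the matrix from the R session above; the names `E`, `G2` and `X2` are illustrative.

```r
rareplants <- matrix(c(37, 23, 10, 15, 190, 59, 141, 58, 94, 23, 28, 16), 4,
                     dimnames = list(c("CC", "CR", "RC", "RR"), c("D", "W", "WD")))

# Expected counts under independence
E <- outer(rowSums(rareplants), colSums(rareplants)) / sum(rareplants)

G2 <- 2 * sum(rareplants * log(rareplants / E))   # likelihood-ratio (deviance) statistic
X2 <- sum((rareplants - E)^2 / E)                 # Pearson statistic, for comparison
df <- (nrow(rareplants) - 1) * (ncol(rareplants) - 1)

print(round(c(G2 = G2, X2 = X2, df = df,
              p_G2 = pchisq(G2, df, lower.tail = FALSE)), 6))
```

The two statistics are close but not identical, which is why the GLM p-value differs slightly from the one reported by `chisq.test`.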

Page 56: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

Homogeneity test in GLM for coin tossing example

          Coin 1     Coin 2     Coin 3     Coin 4
Head        83         90        129         70      (Alive)
Tail         3          3          7         12      (Dead)
Total       86         93        136         82      (Total)
          Hosp'l 1   Hosp'l 2   Hosp'l 3   Hosp'l 4

> # H0 : all four coins have the same proportion showing head side
> # H1 : at least one coin has a different proportion from the others
>
> head4 <- c( 83, 90, 129, 70 )
> toss4 <- c( 86, 93, 136, 82 )
> prop.test(head4, toss4)

        4-sample test for equality of proportions without continuity correction

X-squared = 12.6004, df = 3, p-value = 0.005585
alternative hypothesis: two.sided

> coins<-factor(LETTERS[1:4])
> anova(glm(cbind(head4,toss4-head4)~coins,family=binomial))
Analysis of Deviance Table
Terms added sequentially (first to last)
      Df Deviance Resid. Df Resid. Dev
NULL                      3     10.667
coins  3   10.667         0  1.132e-14

$$-2\log(\text{likelihood}) \sim \chi^2(df)$$

> 1-pchisq(10.667,3)
[1] 0.01366980

Page 57: Count Data. HT Cleopatra VII & Marcus Antony C c Aa.

Thank you !!