Clustering in Generalized Linear Mixed Model Using Dirichlet Process Mixtures

Ya Xue, Xuejun Liao

April 1, 2005

Introduction

Concept drift fits within the framework of the generalized linear mixed model, but it raises a new question: how to exploit the structure of the auxiliary data.

Mixtures with a countably infinite number of components can be handled in a Bayesian framework by employing Dirichlet process priors.

Outline

Part I: generalized linear mixed model
• Generalized linear model (GLM)
• Generalized linear mixed model (GLMM)
• Advanced applications
• Bayesian feature selection in GLMM

Part II: nonparametric method
• Chinese restaurant process
• Dirichlet process (DP)
• Dirichlet process mixture models
• Variational inference for Dirichlet process mixtures

Part I Generalized Linear Mixed Model

Generalized Linear Model (GLM)

A linear model specifies the relationship between a dependent (or response) variable Y and a set of predictor variables X, so that

$$y_i = x_i'\beta + \varepsilon_i, \quad i: \text{subject}.$$

GLM is a generalization of normal linear regression models to the exponential family (normal, Poisson, Gamma, binomial, etc.).

Generalized Linear Model (GLM)

GLM differs from the linear model in two major respects:

• The distribution of Y can be non-normal, and does not have to be continuous.

• Y can still be predicted from a linear combination of the Xs, but they are "connected" via a link function.

Generalized Linear Model (GLM)

DDE example: binomial distribution

Scientific interest: does DDE exposure increase the risk of cancer? Test on rats. Let i index rat.

Dependent variable:

$$y_i = \begin{cases} 1, & \text{rat } i \text{ is diagnosed with cancer} \\ 0, & \text{no cancer} \end{cases}$$

$$y_i \sim \mathrm{Bin}(1, p_i), \quad p_i: \text{risk of cancer for rat } i.$$

Independent variable: dose of DDE exposure, denoted by $x_i$.

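Generalized Linear Model (GLM)

Likelihood function of $y_i$:

$$f(y_i \mid p_i) = p_i^{y_i}(1-p_i)^{1-y_i} = \exp\Big\{ y_i \ln\frac{p_i}{1-p_i} + \ln(1-p_i) \Big\} = \frac{\exp\{\eta_i y_i\}}{1+\exp\{\eta_i\}}, \quad \text{where } \eta_i = \ln\frac{p_i}{1-p_i}.$$

Choosing the canonical link $\ln\frac{p_i}{1-p_i} = x_i'\beta$, the likelihood function becomes

$$f(y_i \mid x_i, \beta) = \frac{\exp\{y_i\, x_i'\beta\}}{1+\exp\{x_i'\beta\}}.$$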
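As a quick numerical check of the likelihood on this slide, the following sketch compares the Bernoulli form p^y (1-p)^(1-y) with the canonical-link form exp(y x'β) / (1 + exp(x'β)). The function names, the covariate vector (an intercept plus a dose of 0.5) and the coefficient values are made up for illustration and do not come from the DDE bioassay.

```python
import math

def bernoulli_pmf(y, p):
    """Bernoulli likelihood p^y (1 - p)^(1 - y) for y in {0, 1}."""
    return p ** y * (1.0 - p) ** (1 - y)

def logistic_likelihood(y, x, beta):
    """The same likelihood under the canonical (logit) link:
    exp(y * x'beta) / (1 + exp(x'beta))."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))  # linear predictor x'beta
    return math.exp(y * eta) / (1.0 + math.exp(eta))

# Hypothetical rat: intercept plus a DDE dose of 0.5; beta is invented.
x = [1.0, 0.5]
beta = [-1.0, 2.0]
eta = sum(xj * bj for xj, bj in zip(x, beta))
p = 1.0 / (1.0 + math.exp(-eta))  # inverse logit gives the cancer risk p_i

for y in (0, 1):
    # Both parameterisations should agree up to rounding error.
    assert abs(bernoulli_pmf(y, p) - logistic_likelihood(y, x, beta)) < 1e-12
```

The two forms agree because the logit link maps the risk p_i one-to-one onto the linear predictor.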

GLMM – Basic Model

Returning to the DDE example, 19 labs all over the world participated in this bioassay.

There are unmeasured factors that vary between the different labs, for example rodent diet.

GLMM is an extension of the generalized linear model that adds random effects to the linear predictor (Schall 1991).

GLMM – Basic Model

The previous linear predictor is modified as

$$\eta_{ij} = x_{ij}'\beta + z_{ij}' b_i,$$

where $i = 1, \dots, n$ indexes lab and $j = 1, \dots, n_i$ indexes rat within lab $i$.

$\beta$ are "fixed" effects: parameters common to all rats.

$b_i$ are "random" effects: deviations for lab $i$.

GLMM – Basic Model

$$\eta_{ij} = x_{ij}'\beta + z_{ij}' b_i$$

If we choose $x_{ij} = z_{ij}$, then all the regression coefficients are assumed to vary across labs.

If we choose $z_{ij} = 1$, then only the intercept varies across labs (random intercept model).
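To make the random intercept model concrete, here is a minimal simulation sketch for binary outcomes with a logit link and z_ij = 1. The slides do not state a distribution for the lab effects b_i, so the normal distribution used here is an assumption, as are the function name, the coefficient values and the uniformly drawn doses.

```python
import math
import random

def simulate_random_intercept(n_labs, rats_per_lab, beta, sigma_b, seed=0):
    """Simulate binary outcomes from a random-intercept logistic GLMM."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_labs):
        b_i = rng.gauss(0.0, sigma_b)  # assumed N(0, sigma_b^2) lab effect
        for _ in range(rats_per_lab):
            dose = rng.random()        # made-up dose in [0, 1)
            x_ij = (1.0, dose)         # fixed-effect covariates: intercept, dose
            z_ij = 1.0                 # random intercept model: z_ij = 1
            eta = sum(x * b for x, b in zip(x_ij, beta)) + z_ij * b_i
            p = 1.0 / (1.0 + math.exp(-eta))  # inverse logit
            y = 1 if rng.random() < p else 0
            rows.append((i, dose, y))
    return rows

rows = simulate_random_intercept(n_labs=19, rats_per_lab=10,
                                 beta=(-1.0, 2.0), sigma_b=0.5)
print(len(rows), "simulated rats; first row (lab, dose, y):", rows[0])
```

All rats in the same lab share one draw of b_i, which is what induces the within-lab correlation that a plain GLM ignores.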

GLMM - Implementation

Gibbs sampling
• Disadvantage: slow convergence.
• Solution: hierarchical centering reparametrisation (Gelfand 1994; Gelfand 1995).

Deterministic methods are only available for logit and probit models.
• EM algorithm (Anderson 1985)
• Simplex method (Im 1988)

GLMM – Advanced Applications

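Nested GLMM: within each lab, rats were group housed with three rats per cage. Let i index lab, j index cage and k index rat.

$$\eta_{ijk} = x_{ijk}'\beta + z_{ijk}' b_i + u_{ijk}' v_{ij}$$

Crossed GLMM: for all labs, four dose protocols were applied on different rats. Let i index lab, j index rat and k indicate the protocol applied to rat (i, j).

$$\eta_{ij} = x_{ij}'\beta + z_{ij}' b_i + u_{ij}' v_k$$

Here $v_{ij}$ and $v_k$ are the cage-level and protocol-level random effects, and $u$ denotes the corresponding design vector.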

GLMM – Advanced Applications

Nested GLMM: within each lab, rats were group housed with three rats per cage.
• Two-level GLMM: level I – lab, level II – cage.

Crossed GLMM: for all labs, four dose protocols were applied on different rats.
• Rats are sorted into 19 groups by lab.
• Rats are sorted into 4 groups by protocol.

GLMM – Advanced Applications

Temporal/spatial statistics: account for correlation between the random effects at different times/locations.

• Dynamic latent variable model (Dunson 2003). Let i index patient and t index follow-up time.

[Equation: linear predictor for patient i at follow-up time t; not recoverable from the source.]

GLMM – Advanced Applications

• Spatially varying coefficient processes (Gelfand 2003): random effects are modeled as a spatially correlated process.

[Figure: example two-dimensional field; both axes run from -5 to 25.]

Possible application: a landmine field where landmines tend to be close together.

Bayesian Feature Selection in GLMM

Simultaneous selection of fixed and random effects in GLMM (Cai and Dunson 2005)

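Mixture prior:

$$p(x) = \pi\,\delta(x = 0) + (1 - \pi)\,g(x),$$

where $\pi$ is the mixing weight.

[Figure: the two components of the mixture prior, the point mass $\delta(x = 0)$ and the continuous density $g(x)$, plotted for $x$ from -5 to 5.]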
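The mixture prior can be illustrated by drawing coefficients from it. The sketch below assumes that a weight pi sits on the point mass at zero and that the continuous component g is a zero-mean normal density; the slides specify neither choice, and the function name and default values are hypothetical.

```python
import random

def sample_mixture_prior(pi, n, slab_sd=1.0, seed=0):
    """Draw n coefficients: exactly 0 with probability pi (the point mass),
    otherwise a draw from g, here assumed to be N(0, slab_sd^2)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        if rng.random() < pi:
            draws.append(0.0)                      # effect excluded
        else:
            draws.append(rng.gauss(0.0, slab_sd))  # effect included
    return draws

draws = sample_mixture_prior(pi=0.5, n=1000)
share_zero = sum(1 for d in draws if d == 0.0) / len(draws)
print("share of coefficients set exactly to zero:", share_zero)
```

A coefficient drawn as exactly zero corresponds to dropping that effect from the model, which is how the prior performs selection.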

Bayesian Feature Selection in GLMM

Fixed effects: choose mixture priors for the fixed effects coefficients.

Random effects: reparameterization
• LDU decomposition of the random effect covariance
• Choose mixture prior for the elements in the diagonal matrix.

Missing Identification in GLMM

Data table of DDE bioassay:

…
Berlin   1   0.01   0.00   34.10   40.90   37.50
Berlin   1   0.01   0.00   35.70   35.60   32.10
Tokyo    0   0.01   0.00   56.50   28.90   27.10
Tokyo    1   0.01   0.00   51.50   29.90   25.90
…

What if the first column is missing? This is an unusual case in statistics, so few people work on it.

But this is the problem we have to solve for concept drift.

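Concept Drift

Primary data:

$$p(y_i \mid x_i, w) = \sigma(y_i\, w'x_i)$$

Auxiliary data:

$$p(y_i \mid x_i, w, \mu_i) = \sigma\big(y_i\,(w'x_i + \mu_i)\big)$$

Here $\sigma$ is the sigmoid function and $\mu_i$ is the drift variable of auxiliary data point $i$.

If we treat the drift variable as a random variable, concept drift is a random intercept model, a special case of GLMM.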
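The two likelihoods on this slide can be sketched as follows, assuming labels coded as -1/+1, a sigmoid link, and one additive drift variable per auxiliary data point. The symbol names w and mu and all numeric values are illustrative assumptions rather than the authors' implementation.

```python
import math

def sigmoid(t):
    """Logistic sigmoid 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def primary_likelihood(y, x, w):
    """Likelihood of a primary data point with label y in {-1, +1}."""
    return sigmoid(y * dot(w, x))

def auxiliary_likelihood(y, x, w, mu):
    """Likelihood of an auxiliary data point; mu shifts the intercept."""
    return sigmoid(y * (dot(w, x) + mu))

w = [0.5, -1.0]   # hypothetical classifier weights
x = [1.0, 0.2]    # hypothetical feature vector
print(primary_likelihood(+1, x, w))
print(auxiliary_likelihood(+1, x, w, mu=0.8))
```

With mu = 0 the auxiliary likelihood reduces to the primary one, so the drift variable acts exactly like a per-point random intercept.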

Clustering in Concept Drift

[Figure, panel 1: "Histogram of the estimated non-zero auxiliary variable, C=10". x-axis: value of the auxiliary variable; y-axis: number of occurrences. Bin resolution = 1.]

[Figure, panel 2: "Estimated auxiliary variables, C=10". x-axis: index of auxiliary data.]

K = 51 clusters (including 0) out of 300 auxiliary data points.

Clustering in Concept Drift

There are intrinsic clusters in the auxiliary data with respect to drift value.

“The simplest explanation is best.” (Occam's razor)

Why don’t we instead give each cluster a random effect variable?

Clustering in Concept Drift

In usual statistics applications, we know which individuals share the same random effect.

However, in concept drift, we do not know which individuals (data points or features) share the same random intercept.

Can we train the classifier and cluster the auxiliary data simultaneously? This is a new problem we aim to solve.

Clustering in Concept Drift

How many clusters (K) should we include in our model?

Does choosing K actually make sense?

Is there a better way?

Part II Nonparametric Method

Nonparametric method

Parametric method: the forms of the underlying density functions are assumed known.

Nonparametric method is a wide category, e.g. NN, minmax, bootstrapping...

Nonparametric Bayesian method: make use of the Bayesian calculus without prior parameterized knowledge.

Cornerstones of NBM

Dirichlet process (DP): allows flexible structures to be learned and allows sharing of statistical strength among sets of related structures.

Gaussian process (GP): allows sharing in the context of multiple nonparametric regressions (I suggest having a separate seminar on GP).

Chinese Restaurant Process

The Chinese restaurant process (CRP) is a distribution on partitions of integers.

CRP is used to represent uncertainty over the number of components in a mixture model.

Chinese Restaurant Process

Unlimited number of tables

Each table has an unlimited capacity to seat customers.

Chinese Restaurant Process

The (m+1)th customer sits at a table drawn from the following distribution:

$$p(\text{occupied table } i \mid \text{previous customers}) = \frac{m_i}{\alpha + m}$$

$$p(\text{an unoccupied table} \mid \text{previous customers}) = \frac{\alpha}{\alpha + m}$$

where $m_i$ is the number of previous customers at table $i$ and $\alpha$ is a parameter.

Chinese Restaurant Process

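Example: $\alpha = 1$, $m = 9$.

The probability that the next customer sits at each table (table diagram not reproduced):

$$\frac{2}{1+9}, \quad \frac{1}{1+9}, \quad \frac{4}{1+9}, \quad \frac{2}{1+9}, \quad \frac{1}{1+9}$$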
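The seating rule can be sketched in a few lines: an occupied table i is chosen with probability m_i / (α + m) and a new table with probability α / (α + m). The function names are hypothetical, and the table counts (2, 4, 2, 1) are one assignment assumed here to match an example with α = 1 and m = 9 previous customers.

```python
import random

def crp_probabilities(table_counts, alpha):
    """Seating probabilities for the next customer: one entry per occupied
    table, followed by the probability of opening a new table."""
    m = sum(table_counts)  # number of previous customers
    probs = [m_i / (alpha + m) for m_i in table_counts]
    probs.append(alpha / (alpha + m))
    return probs

def sample_crp(n_customers, alpha, seed=0):
    """Seat customers one at a time; return the final table occupancies."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_customers):
        probs = crp_probabilities(counts, alpha)
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        if choice == len(counts):
            counts.append(1)      # customer opens a new table
        else:
            counts[choice] += 1   # customer joins an occupied table
    return counts

print(crp_probabilities([2, 4, 2, 1], alpha=1.0))
print(sample_crp(100, alpha=1.0))
```

Because larger tables attract new customers with higher probability, the simulated partitions typically contain a few large clusters and several small ones, with the number of tables growing slowly in the number of customers.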

Chinese Restaurant Process

CRP yields an exchangeable distribution on partitions of integers, i.e., the specific ordering of the customers is irrelevant.

An infinite set of random variables is said to be infinitely exchangeable if for every finite subset $\{x_1, x_2, \dots, x_n\}$ we have

$$p(x_1, x_2, \dots, x_n) = p(x_{\sigma(1)}, x_{\sigma(2)}, \dots, x_{\sigma(n)})$$

for any permutation $\sigma$.

Dirichlet Process

$G_0$: any probability measure on the reals; $(B_1, \dots, B_k)$: a partition. A process $G$ is a Dirichlet process if the following holds for all partitions:

$$(G(B_1), \dots, G(B_k)) \sim \mathrm{Dir}(\alpha G_0(B_1), \dots, \alpha G_0(B_k)),$$

where $\alpha$ is a concentration parameter.

Note: Dir – Dirichlet distribution, DP - Dirichlet process.

Dirichlet Process

Denote a sample from the Dirichlet process as

$$G \sim DP(\alpha, G_0).$$

$G$ is a distribution. Denote a sample from the distribution $G$ as

$$\theta \mid G \sim G.$$

[Figure: graphical model for a DP generating the parameters $\theta$.]

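Dirichlet Process

Properties of DP:

$$E[G] = G_0$$

$$p(G \mid \theta_1, \dots, \theta_n) = DP\Big(\alpha + n,\ \frac{\alpha}{\alpha+n}\,G_0 + \frac{1}{\alpha+n}\sum_{i=1}^{n}\delta_{\theta_i}\Big)$$

$$E[G \mid \theta_1, \dots, \theta_n] = \frac{\alpha}{\alpha+n}\,G_0 + \frac{1}{\alpha+n}\sum_{i=1}^{n}\delta_{\theta_i}$$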

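Dirichlet Process

The marginal probabilities for a new $\theta_{n+1}$:

$$p(\theta_{n+1} = \theta_i \text{ for } 1 \le i \le n \mid \theta_1, \dots, \theta_n, G_0, \alpha) = \frac{1}{\alpha+n}\sum_{j=1}^{n}\delta_{\theta_j}(\theta_i)$$

$$p(\theta_{n+1} \ne \theta_i \text{ for all } 1 \le i \le n \mid \theta_1, \dots, \theta_n, G_0, \alpha) = \frac{\alpha}{\alpha+n}$$

This is the Chinese restaurant process.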

DP Mixtures

$$G \sim DP(\alpha, G_0)$$

$$\theta_i \mid G \sim G$$

$$x_i \mid \theta_i \sim F(\theta_i)$$

If F is a normal distribution, this is a Gaussian mixture model.
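The generative process of a DP mixture can be sketched without ever representing G explicitly, by drawing each parameter from the Chinese restaurant (Pólya urn) marginals. This is a minimal sketch that assumes G0 is a zero-mean normal and F is a normal with fixed standard deviation; the function name, both standard deviations and the seed are hypothetical choices.

```python
import random

def sample_dp_mixture(n, alpha, base_sd=3.0, obs_sd=1.0, seed=0):
    """Draw (theta_i, x_i) pairs from a DP mixture with G marginalised out.
    Assumes G0 = N(0, base_sd^2) and F(theta) = N(theta, obs_sd^2)."""
    rng = random.Random(seed)
    thetas, xs = [], []
    for i in range(n):
        # With probability alpha / (alpha + i) draw a fresh theta from G0;
        # otherwise copy one of the i previous thetas uniformly at random.
        if rng.random() < alpha / (alpha + i):
            theta = rng.gauss(0.0, base_sd)
        else:
            theta = rng.choice(thetas)
        thetas.append(theta)
        xs.append(rng.gauss(theta, obs_sd))
    return thetas, xs

thetas, xs = sample_dp_mixture(n=50, alpha=1.0)
print("distinct components used by 50 points:", len(set(thetas)))
```

Copying a previous parameter uniformly at random is equivalent to joining a cluster with probability proportional to its size, so the number of mixture components is determined by the data size and α rather than fixed in advance.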

Applications of DP

Infinite Gaussian Mixture Model (Rasmussen 2000)

Infinite Hidden Markov Model (Beal 2002)

Hierarchical Topic Models and the Nested Chinese Restaurant Process (Blei 2004)

Implementation of DP

Gibbs sampling

If G0 is a conjugate prior for the likelihood given by F: (Escobar 1995)

Non-conjugate prior: (Neal 1998)

Variational Inference for DPM

The goal is to compute the predictive density under the DP mixture:

$$p(x \mid x_1, \dots, x_n) = \int p(x \mid \theta)\, p(\theta \mid x_1, \dots, x_n)\, d\theta$$

To approximate it, we minimize the KL divergence between p and a variational distribution q.

This algorithm is based on the stick-breaking representation of DP.

(I would suggest having a separate seminar on the stick-breaking view of DP and variational DP.)

Open Questions

Can we apply ideas of infinite models beyond identifying the number of states or components in a mixture?

Under what conditions can we expect these models to give consistent estimates of densities?

...

Specific to our problem: the model is non-conjugate because of the sigmoid function.