Unsupervised Group Discovery in Relational Datasets:
A nonparametric Bayesian Approach
P.S. Koutsourelakis
School of Civil and Environmental Engineering
Cornell University
Artificial Intelligence Seminar, 10/12/07
Joint work with T. Eliassi-Rad, LLNL
P.S. Koutsourelakis, [email protected]
Problem Setting
[Figure: four people A, B, C, D, each with attributes (age, income, location, …), connected by labeled links: friend, co-worker, phone call]
Traditional Clustering
Can we improve clustering by using relational data?
What if only relational data were available?
Can we make predictions about missing links or attributes?
Problem Setting
• A collection of objects belonging to various types/domains
(e.g., people, papers, locations, devices, movies)
• Each object might have (observable) attributes
• Links/relations between objects:
– Connect two or more objects
– Objects can be of the same or different types
– Binary (absence/presence), integer- or real-valued
• Each link might have (observable) attributes
Goal:
• Find groups of objects of each type, or
• Find common identities between objects of each type, or
• Organize objects into clusters that relate to each other in predictable ways
Problem Setting
[Figure: directed graph on four nodes A, B, C, D]

Given an adjacency matrix where R_i,j = 0 or 1 (observable), find the cluster assignment I_i (hidden/latent).

    A  B  C  D
A   •  0  0  0
B   0  •  0  0
C   1  0  •  0
D   0  1  1  •
Probabilistic - Bayesian Formulation
p(I | R_data) ∝ p(R_data | I) · p(I)
posterior ∝ likelihood × prior
Problem Setting
Likelihood:

p({R_i,j} | {I_i}) = ∏_{i,j} p(R_i,j | I_i, I_j)

The relational behavior of the objects is completely determined by their cluster assignments I_i.

For example:

R_i,j | I_i, I_j, η ~ Bernoulli(η(I_i, I_j))
η(I_i, I_j) ~ Beta(β1, β2)

where η is a matrix specifying the link probability between any two groups.
Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T. & Ueda, N. Learning systems of concepts with an infinite relational model. AAAI 2006.
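As a concrete illustration, the Bernoulli likelihood above can be evaluated for a given configuration. A minimal sketch, assuming a binary adjacency matrix `R` (lists of 0/1), group assignments `I`, and a group-pair link-probability matrix `eta` (all names are my own, not the authors'):

```python
import math

def log_likelihood(R, I, eta):
    """Log-likelihood of a binary adjacency matrix under
    R[i][j] ~ Bernoulli(eta[I[i]][I[j]]); diagonal entries
    (self-links) are skipped, as in the adjacency matrix above."""
    ll = 0.0
    n = len(R)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            p = eta[I[i]][I[j]]
            ll += math.log(p if R[i][j] == 1 else 1.0 - p)
    return ll
```

In a Gibbs sampler this quantity is recomputed for each candidate assignment of a single object, which is where most of the computational effort goes.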
Augmented Problem Setting
If objects have attributes (e.g., x_i, which are also observed), we can augment the likelihood:

p({R_i,j}, {x_i} | {I_i}) = ∏_{i,j} p(R_i,j | I_i, I_j) · ∏_i p(x_i | I_i)

If links R_i,j are real-valued (e.g., duration of a phone call, number of bytes), then:

R_i,j | I_i, I_j, η ~ Exponential(η(I_i, I_j))
or R_i,j | I_i, I_j, η ~ Gamma(a(I_i, I_j), b(I_i, I_j))

instead of:

R_i,j | I_i, I_j, η ~ Bernoulli(η(I_i, I_j))
η(I_i, I_j) ~ Beta(β1, β2)

In all cases the parameters are functions of the group assignments.
Problem Setting
We need a prior on group assignments, p(I):
• What is an appropriate prior p(K) on the number of clusters K?
• Groups are unlikely to be related as above.
• The distribution on I_i should be exchangeable.
  That is, the order in which nodes are assigned can be permuted without changing the probability of the resulting partition.

Likelihood function (as before):

p({R_i,j} | {I_i}) = ∏_{i,j} p(R_i,j | I_i, I_j)
R_i,j | I_i, I_j, η ~ Bernoulli(η(I_i, I_j))
η(I_i, I_j) ~ Beta(β1, β2)
Nonparametric Bayesian Methods*
• Bayesian methods are most powerful when your prior adequately
captures your beliefs.
• Inflexible models (e.g. with a fixed number of groups) might yield
unreasonable inferences.
• Non-parametrics provide a way of getting very flexible models.
• Non-parametric models can automatically infer an adequate model
size/complexity from the data, without needing to explicitly do
Bayesian model comparison
• Many can be derived by starting with a finite parametric model and
taking the limit as the number of parameters goes to infinity
* Nonparametric doesn’t mean there are no parameters, but that “the number of parameters grows with the data” (e.g. as in Parzen window density estimation)
Chinese Restaurant Process (CRP)
MENU (potentially infinite dishes)

[Figure: customers seated sequentially; the second customer joins the first table with probability 1/(1+γ) or starts a new one with probability γ/(1+γ), then 2/(2+γ) vs. γ/(2+γ), then 1/(3+γ), 2/(3+γ) vs. γ/(3+γ), and so on]
Chinese Restaurant Process (CRP)
p(I_i = j | I_−i) =
  n_j / (n − 1 + γ)   if n_j > 0
  γ / (n − 1 + γ)     if j is a new group

where n_j is the number of people already eating dish j.

Properties:
• The CRP is exchangeable (i.e., the order in which customers entered doesn't matter)
• The number of groups grows as O(log n), where n is the number of nodes
• Inference with Gibbs sampling can be based on the conditionals above
• Larger γ favors more clusters
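The seating rule above translates directly into a sequential sampler. A minimal sketch (function and variable names are my own):

```python
import random

def sample_crp(n, gamma, seed=0):
    """Seat n customers sequentially: customer i joins existing group j
    with probability n_j / (i + gamma), or opens a new group with
    probability gamma / (i + gamma), where i customers are already seated."""
    rng = random.Random(seed)
    assignments, counts = [], []
    for i in range(n):
        r = rng.uniform(0, i + gamma)
        acc, chosen = 0.0, None
        for j, nj in enumerate(counts):
            acc += nj
            if r < acc:
                chosen = j
                counts[j] += 1
                break
        if chosen is None:          # r fell in the gamma-weighted slice
            chosen = len(counts)
            counts.append(1)
        assignments.append(chosen)
    return assignments, counts
```

Larger γ enlarges the new-group slice at every step, which is why it favors more clusters.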
Infinite Relational Model (IRM)
“Forward” Interpretation (single domain)
1) Sample group assignments Ii from CRP(γ) resulting in K clusters
2) Sample iid η(a,b) for all a,b=1,2,..,K from Beta(β1,β2 )
3) Sample iid each Ri,j from Bernoulli(η(Ii, Ij))
From Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T. & Ueda, N. Learning systems of concepts with an infinite relational model. AAAI 2006.

I ~ CRP(γ)
R_i,j | I_i, I_j, η ~ Bernoulli(η(I_i, I_j))
η(I_i, I_j) | β1, β2 ~ Beta(β1, β2)
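The three forward steps can be sketched end-to-end. A toy sampler under my own naming (self-links are drawn too, for simplicity):

```python
import random

def sample_irm(n, gamma=1.0, beta1=1.0, beta2=1.0, seed=0):
    """Forward-sample the IRM for a single domain:
    1) I ~ CRP(gamma), 2) eta[a][b] ~ Beta(beta1, beta2) iid,
    3) R[i][j] ~ Bernoulli(eta[I[i]][I[j]])."""
    rng = random.Random(seed)
    # Step 1: CRP group assignments
    I, counts = [], []
    for i in range(n):
        r = rng.uniform(0, i + gamma)
        acc, g = 0.0, None
        for j, nj in enumerate(counts):
            acc += nj
            if r < acc:
                g = j
                counts[j] += 1
                break
        if g is None:
            g = len(counts)
            counts.append(1)
        I.append(g)
    K = len(counts)
    # Step 2: iid Beta link probabilities for every ordered group pair
    eta = [[rng.betavariate(beta1, beta2) for _ in range(K)] for _ in range(K)]
    # Step 3: Bernoulli links
    R = [[int(rng.random() < eta[I[i]][I[j]]) for j in range(n)]
         for i in range(n)]
    return I, eta, R
```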
Application: Object-Feature Dataset
• 2 domains (animals + features)
• Animals form two groups: birds + 4-legged mammals
Application: Object-Feature Dataset
Maximum-Likelihood Configuration
Animal Domain
Group 1: dove, hen, owl, falcon, eagle
Group 2: duck, goose
Group 3: fox, cat
Group 4: horse, zebra
Group 5: dog, wolf, tiger, lion, cow
Feature Domain
Group 1: small, 2-legs, feathers, fly
Group 2: medium, hunt
Group 3: big, hooves, mane, run
Group 4: 4-legs, hair
Group 5: swim
Application: Object-Feature Dataset
Predicting Missing Links
Can we make predictions about missing links?

% of Missing Links | AUC  | Accuracy
10%                | 0.96 | 0.95
25%                | 0.96 | 0.91
50%                | 0.91 | 0.87
65%                | 0.82 | 0.80
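One way to obtain such predictions is a Monte Carlo average over posterior samples: for a held-out pair (i, j), p(R_i,j = 1 | data) ≈ (1/S) Σ_s η_s(I_i, I_j). A sketch, assuming a sampler has already produced `samples` as (assignment, eta-matrix) pairs (a format I am positing for illustration, not the authors'):

```python
def predict_link(samples, i, j):
    """Posterior predictive probability of a link between objects i
    and j, averaged over posterior samples (I, eta)."""
    probs = [eta[I[i]][I[j]] for I, eta in samples]
    return sum(probs) / len(probs)
```

Thresholding this probability (e.g. at 0.5) gives the accuracy numbers; sweeping the threshold gives the AUC.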
Infinite Relational Model (IRM)
Advantages:
• It is an unsupervised learner with only two tunable parameters β and γ.
• It can be applied to multiple node types and relations.
• It has all the advantages of a Bayesian formulation (missing data, confidence intervals) and of nonparametric methods (adaptation to the data, outlier accommodation).
• It has been successfully used for co-clustering objects and features, learning ontologies, and analyzing social networks.
Disadvantages:
• Significant computational effort
• It does not capture “multiple personalities.”
“Multiple Personalities”
• In real data, objects (e.g., people) do not belong exclusively to one group; their identity is a mixture of basic components.
• These components can be the same for each object type, but the mixing proportions might vary from one object to another.
• IRM assumes that each object participates in all the relations it is involved in with a single identity.
• A proper model should account for a different mixture for each object over all the possible identity components (which are common for the whole domain).
• This way we learn not only all the groups of the population but also all the existing mixtures of them.
• This can be achieved by introducing a Bayesian hierarchy.

groups ≡ identities
Mixed-Membership Model (MMM)
• Each object i can assume as many identities I_i,m as the number of links m it participates in.
• The likelihood of a link R^m_i,j depends only on the identities of the participating objects for that link: I_i,m and I_j,m.
• The personality of each object can be made up of several components.

Q: Can we use an independent CRP for each object?
A: No, because the groups for each CRP will not be shared across objects.
Chinese Restaurant Franchise
N restaurants with a common menu
Object 1 = restaurant 1
Object 2 = restaurant 2
Object N = restaurant N
…
Phase 1: Table Assignment
Phase 2: Dish Assignment

Y.W. Teh, M.I. Jordan, M.J. Beal and D.M. Blei. Hierarchical Dirichlet Processes. JASA, 2006.
Chinese Restaurant Franchise
Table assignment (t_i,m = table for customer m at restaurant i; n_i,t = number of customers already sitting at table t):

p(t_i,m = t | t_i,1, …, t_i,m−1) =
  n_i,t / (m − 1 + γ_i)   if n_i,t > 0
  γ_i / (m − 1 + γ_i)     if t is a new table

Dish assignment (d_i,t = dish for table t in restaurant i; M = total number of tables; M_d = number of tables already eating dish d):

p(d_i,t = d | d_−it) =
  M_d / (M + γ_0)   if M_d > 0
  γ_0 / (M + γ_0)   if d is a new dish
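The two-phase process can be simulated directly. A sketch under my own naming, with one local concentration γ for all restaurants and a franchise-level γ_0 (the deck uses per-restaurant γ_i; a shared value keeps the example short):

```python
import random

def sample_crf(links_per_object, gamma_local, gamma0, seed=0):
    """Simulate the Chinese Restaurant Franchise for one domain:
    each object is a restaurant; its link participations are customers.
    Phase 1: customer m sits at table t with prob n_t/(m+gamma_local),
    or at a new table with prob gamma_local/(m+gamma_local).
    Phase 2: each new table orders dish k (a shared identity) with prob
    M_k/(M+gamma0), or a new dish with prob gamma0/(M+gamma0),
    where M_k counts tables franchise-wide already serving dish k."""
    rng = random.Random(seed)
    dish_counts = []                 # M_k: tables serving dish k
    identities = []                  # identities[i][m] = dish of customer m
    for n_links in links_per_object:
        table_counts, table_dish, ids = [], [], []
        for m in range(n_links):
            # Phase 1: table assignment within this restaurant
            r = rng.uniform(0, m + gamma_local)
            acc, t = 0.0, None
            for tt, nt in enumerate(table_counts):
                acc += nt
                if r < acc:
                    t = tt
                    table_counts[tt] += 1
                    break
            if t is None:
                # Phase 2: a new table draws a dish from the shared menu
                M = sum(dish_counts)
                r2 = rng.uniform(0, M + gamma0)
                acc2, d = 0.0, None
                for k, Mk in enumerate(dish_counts):
                    acc2 += Mk
                    if r2 < acc2:
                        d = k
                        dish_counts[k] += 1
                        break
                if d is None:
                    d = len(dish_counts)
                    dish_counts.append(1)
                t = len(table_counts)
                table_counts.append(1)
                table_dish.append(d)
            ids.append(table_dish[t])
        identities.append(ids)
    return identities, dish_counts
```

Because dishes are drawn from one franchise-wide menu, identities are shared across objects, which is exactly what independent per-object CRPs fail to achieve.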
Mixed-Membership Model
p({R^m_i,j}) = ∏_{i,j} ∏_m p(R^m_i,j | I_i,m, I_j,m)

I_i,m ~ CRF(γ_i)   (the dish assignment of node i for link m)

Properties:
• Has a few more parameters, the γ_i, but also higher expressivity
• Inference with Gibbs sampling can be based on the conditionals above
Non-Identifiability
two objects: A, B
two links: R^1_1,2, R^2_2,1
two groups: 1, 2
four latent variables: I_1,1, I_1,2, I_2,1, I_2,2
A: 100% group 1; B: 50% group 1, 50% group 2

η matrix (probability of a link between any pair of groups):
  1 0
  0 1
Non-Identifiability
[Figure: objects A (100% group 1) and B (50% group 1, 50% group 2)]

Different configurations (with 2, 3 or 4 groups) have the same likelihood,
so the prior determines the inference results.
Application: Mixed-Membership
• 1 domain – 16 objects
• 4 distinct identities
• fully observed adjacency matrix
Application: Mixed-Membership Model
Application: Mixed-Membership
[Figure: error w.r.t. the actual probability that any pair of objects belongs to the same group, IRM vs. MMM]
Application: Mixed-Membership Model
• 2 domains (animals + features)
• Animals form two groups: birds + 4-legged mammals
Application: Mixed-Membership Model
Application: Mixed-Membership Model
COW: average posterior pairwise probabilities of belonging to the same group
[Figure: bar chart (0 to 1), one bar per animal — dove, hen, duck, goose, owl, falcon, eagle, fox, dog, wolf, cat, tiger, lion, horse, zebra — comparing IRM and MMM]
Zachary's Karate Club
• 34 people
• A disagreement between the administrator (34) and the instructor (1) led to the split of the club in two (circles and squares)
• Used a binary matrix that records the "like" relation
from M. Girvan and M.E.J. Newman, Proc. Natl. Acad. Sci. USA, 2002
Zachary’s Karate Club
Learning Hierarchies
Can we meaningfully infer a hierarchy of groups/identities?
Identity 1
Identity 2
Identity 3 Identity 4
most general
most specific
Learning Hierarchies
Nonparametric prior on trees (each box is a different group/identity):
Level 0
⋮
Level L−1: CRP_{L−1}(α_{L−1})
Level L: CRP_L(α_L)
Learning Hierarchies
Hierarchical Mixed-Membership Model (HMMM)

“Forward” interpretation (for a single domain):
1) Generate an (L+1)-level tree using a nested hierarchical CRP.
2) Each object i is associated with an (L+1)-level branch.
3) Let z_i^(l) be the probability that object i belongs to level l of its branch (drawn from a Dirichlet prior).
4) For each link R^m between objects i and j:
   a) Sample l_i ~ Discrete(z_i) and assign identity I_i^(l_i)
   b) Sample l_j ~ Discrete(z_j) and assign identity I_j^(l_j)
   c) Sample R^m_i,j from Bernoulli(η(I_i^(l_i), I_j^(l_j)))
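Steps 3–4 can be sketched in isolation, assuming the tree (step 1) and the per-object branches (step 2) are already given. All names below are my own illustrative choices:

```python
import random

def sample_hmmm_links(branches, z, eta, n_links=1, seed=0):
    """Sample links for steps 3-4 of the HMMM forward process:
    branches[i] = identities along object i's branch (levels 0..L),
    z[i] = per-level probabilities for object i (e.g. Dirichlet draw),
    eta[a][b] = link probability between identities a and b."""
    rng = random.Random(seed)
    links = []
    n = len(branches)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for _ in range(n_links):
                # Pick a level (hence an identity) for each endpoint
                li = rng.choices(range(len(z[i])), weights=z[i])[0]
                lj = rng.choices(range(len(z[j])), weights=z[j])[0]
                a, b = branches[i][li], branches[j][lj]
                links.append((i, j, int(rng.random() < eta[a][b])))
    return links
```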
Application: Artificial Dataset
• 1 domain – 40 objects
• 4 distinct identities
• fully observed adjacency matrix
Application: Artificial Dataset
Application: Political Books
43 liberal, 49 conservative, 13 neutral
Links imply frequent co-purchasing by the same buyers (Amazon.com)
Application: Political Books
[Figure: inferred hierarchy of book groups with sizes 22, 26, 9, 6, 18, 7, 1, 6, 3; each group is annotated with the percentages of liberal, conservative, and neutral books it contains]
Reality Mining MIT Data
1 node type (people)
97 people + all outsiders in one node
22 different positions (professor, staff, 1st-year grad, …)
[Figure: composition pie chart — Sloan 29%, faculty & staff 5%, students 52%, other 14%]
Reality Mining MIT Data
[Figure: inferred groups of people with sizes 26, 17, 6, 23, 4, 1, 14, 1, 6; each group is annotated with the percentages of Sloan, faculty & staff, students, and other members it contains]
Conclusions and Outlook
• Relational data contain significant information about group structure
• Bayesian models allow the analyst to make inferences about communities of interest while quantifying the level of confidence, even when a significant proportion of the data is missing
• Nonparametric models provide a way of getting very flexible priors that allow the model to adapt to the data.
• IRM is a very lightweight framework with a very wide range of applicability, but cannot capture multiple identities.
• MMM and HMMM allow for increased flexibility and provide additional information about objects that simultaneously belong to several groups.

Challenges:
• Accelerated inference, especially when dealing with large datasets:
  – Variational methods
  – Sequential Monte Carlo
• Appropriate priors for time-dependent datasets are needed
Application: Senate Vote 2002
Senator             State  Party  Vote  Contribution ($)
Murkowski, Frank    AK     R      YES   19,700
Stevens, Ted        AK     R      YES   13,000
Sessions, Jeff      AL     R      YES   9,500
Shelby, Richard     AL     R      YES   25,000
Hutchinson, Tim     AR     R      YES   4,900
Lincoln, Blanche    AR     D      YES   5,500
McCain, John        AZ     R      NO    29,350
Kyl, Jon            AZ     R      YES   14,500
Boxer, Barbara      CA     D      NO    1,500
Feinstein, Dianne   CA     D      NO    9,750
Allard, Wayne       CO     R      YES   7,500
Campbell, Ben       CO     R      YES   4,000
Dodd, Christopher   CT     D      NO    500
Lieberman, Joseph   CT     D      NO    3,000
Carper, Thomas      DE     D      YES   17,640
50 Democrats, 49 Republicans, 1 Independent
Link R_i,j = 1 if the two senators:
– voted the same, and
– have both taken more or less than the average contribution (average: $13,800)
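The link rule above can be made concrete. A sketch, assuming both conditions must hold and that "both more or less than the average" means the two senators fall on the same side of $13,800 (the row format is my own simplification of the table):

```python
def senate_links(rows, avg=13800):
    """Binary link matrix: R[i][j] = 1 iff senators i and j cast the
    same vote AND sit on the same side of the average contribution.
    rows = list of (vote, contribution) tuples."""
    n = len(rows)
    R = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            vi, ci = rows[i]
            vj, cj = rows[j]
            if vi == vj and (ci > avg) == (cj > avg):
                R[i][j] = 1
    return R
```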
Application: Senate Vote 2002
[Figure: inferred groups of senators, each annotated with its party composition percentages (Democrat / Republican / Independent)]