Unsupervised Group Discovery in Relational Datasets:
A nonparametric Bayesian Approach
P.S. Koutsourelakis
School of Civil and Environmental Engineering
Cornell University
Artificial Intelligence Seminar, 10/12/07
Joint work with T. Eliassi-Rad, LLNL
P.S. Koutsourelakis, [email protected]
Problem Setting
[Figure: four people A, B, C, D, each with attributes (age, income, location, …), connected by labeled links: friend, co-worker, phone call]
Traditional Clustering
Can we improve clustering by using relational data?
What if only relational data were available?
Can we make predictions about missing links or attributes?
Problem Setting
• A collection of objects belonging to various types/domains
(e.g., people, papers, locations, devices, movies)
• Each object might have (observable) attributes
• Links/relations between objects:
– Connect two or more objects
– Objects can be of the same or different types
– Binary (absence/presence), integer- or real-valued
• Each link might have (observable) attributes
Goal:
• Find groups of objects of each type, or
• Find common identities between objects of each type, or
• Organize objects into clusters that relate to each other in predictable ways
Problem Setting
[Figure: directed graph on four nodes A, B, C, D]

Given an adjacency matrix where R_i,j = 0 or 1 (observable), find the cluster assignment I_i (hidden/latent).

    A  B  C  D
A   •  0  0  0
B   0  •  0  0
C   1  0  •  0
D   0  1  1  •
Probabilistic - Bayesian Formulation
p(I | R_data) ∝ p(R_data | I) · p(I)
posterior ∝ likelihood × prior
Problem Setting
Likelihood:

p({R_i,j} | {I_i}) = ∏_{i,j} p(R_i,j | I_i, I_j)

The relational behavior of the objects is completely determined by their cluster assignments I_i.

For example:

R_i,j | I_i, I_j, η ~ Bernoulli(η(I_i, I_j))
η(I_i, I_j) ~ Beta(β1, β2)

where η is a matrix specifying the link probability between any two groups.
Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T. & Ueda, N. Learning systems of concepts with an infinite relational model. AAAI 2006.
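As a concrete illustration, the Bernoulli likelihood above can be evaluated for a given configuration. A minimal sketch, assuming a binary adjacency matrix `R` (lists of 0/1), group assignments `I`, and a group-pair link-probability matrix `eta` (all names are my own, not the authors'):

```python
import math

def log_likelihood(R, I, eta):
    """Log-likelihood of a binary adjacency matrix under
    R[i][j] ~ Bernoulli(eta[I[i]][I[j]]); diagonal entries
    (self-links) are skipped, as in the adjacency matrix above."""
    ll = 0.0
    n = len(R)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            p = eta[I[i]][I[j]]
            ll += math.log(p if R[i][j] == 1 else 1.0 - p)
    return ll
```

In a Gibbs sampler this quantity is recomputed for each candidate assignment of a single object, which is where most of the computational effort goes.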
Augmented Problem Setting
If objects have attributes (e.g., x_i, which are also observed), we can augment the likelihood:

p({R_i,j}, {x_i} | {I_i}) = ∏_{i,j} p(R_i,j | I_i, I_j) · ∏_i p(x_i | I_i)

If links R_i,j are real-valued (e.g., duration of a phone call, number of bytes), then:

R_i,j | I_i, I_j, η ~ Exponential(η(I_i, I_j))
or R_i,j | I_i, I_j, η ~ Gamma(a(I_i, I_j), b(I_i, I_j))

instead of:

R_i,j | I_i, I_j, η ~ Bernoulli(η(I_i, I_j))
η(I_i, I_j) ~ Beta(β1, β2)

In all cases the parameters are functions of the group assignments.
Problem Setting
We need a prior on group assignments, p(I):
• What is an appropriate prior p(K) on the number of clusters K?
• Groups are unlikely to be related as above.
• The distribution on I_i should be exchangeable.
  That is, the order in which nodes are assigned can be permuted without changing the probability of the resulting partition.

Likelihood function (as before):

p({R_i,j} | {I_i}) = ∏_{i,j} p(R_i,j | I_i, I_j)
R_i,j | I_i, I_j, η ~ Bernoulli(η(I_i, I_j))
η(I_i, I_j) ~ Beta(β1, β2)
Nonparametric Bayesian Methods*
• Bayesian methods are most powerful when your prior adequately
captures your beliefs.
• Inflexible models (e.g. with a fixed number of groups) might yield
unreasonable inferences.
• Non-parametrics provide a way of getting very flexible models.
• Non-parametric models can automatically infer an adequate model
size/complexity from the data, without needing to explicitly do
Bayesian model comparison
• Many can be derived by starting with a finite parametric model and
taking the limit as the number of parameters goes to infinity
* Nonparametric doesn’t mean there are no parameters, but that “the number of parameters grows with the data” (e.g. as in Parzen window density estimation)
Chinese Restaurant Process (CRP)
MENU (potentially infinite dishes)

[Figure: customers seated sequentially; the second customer joins the first table with probability 1/(1+γ) or starts a new one with probability γ/(1+γ), then 2/(2+γ) vs. γ/(2+γ), then 1/(3+γ), 2/(3+γ) vs. γ/(3+γ), and so on]
Chinese Restaurant Process (CRP)
p(I_i = j | I_−i) =
  n_j / (n − 1 + γ)   if n_j > 0
  γ / (n − 1 + γ)     if j is a new group

where n_j is the number of people already eating dish j.

Properties:
• The CRP is exchangeable (i.e., the order in which customers entered doesn't matter)
• The number of groups grows as O(log n), where n is the number of nodes
• Inference with Gibbs sampling can be based on the conditionals above
• Larger γ favors more clusters
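The seating rule above translates directly into a sequential sampler. A minimal sketch (function and variable names are my own):

```python
import random

def sample_crp(n, gamma, seed=0):
    """Seat n customers sequentially: customer i joins existing group j
    with probability n_j / (i + gamma), or opens a new group with
    probability gamma / (i + gamma), where i customers are already seated."""
    rng = random.Random(seed)
    assignments, counts = [], []
    for i in range(n):
        r = rng.uniform(0, i + gamma)
        acc, chosen = 0.0, None
        for j, nj in enumerate(counts):
            acc += nj
            if r < acc:
                chosen = j
                counts[j] += 1
                break
        if chosen is None:          # r fell in the gamma-weighted slice
            chosen = len(counts)
            counts.append(1)
        assignments.append(chosen)
    return assignments, counts
```

Larger γ enlarges the new-group slice at every step, which is why it favors more clusters.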
Infinite Relational Model (IRM)
“Forward” Interpretation (single domain)
1) Sample group assignments Ii from CRP(γ) resulting in K clusters
2) Sample iid η(a,b) for all a,b=1,2,..,K from Beta(β1,β2 )
3) Sample iid each Ri,j from Bernoulli(η(Ii, Ij))
From Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T. & Ueda, N. Learning systems of concepts with an infinite relational model. AAAI 2006.

I ~ CRP(γ)
R_i,j | I_i, I_j, η ~ Bernoulli(η(I_i, I_j))
η(I_i, I_j) | β1, β2 ~ Beta(β1, β2)
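The three forward steps can be sketched end-to-end. A toy sampler under my own naming (self-links are drawn too, for simplicity):

```python
import random

def sample_irm(n, gamma=1.0, beta1=1.0, beta2=1.0, seed=0):
    """Forward-sample the IRM for a single domain:
    1) I ~ CRP(gamma), 2) eta[a][b] ~ Beta(beta1, beta2) iid,
    3) R[i][j] ~ Bernoulli(eta[I[i]][I[j]])."""
    rng = random.Random(seed)
    # Step 1: CRP group assignments
    I, counts = [], []
    for i in range(n):
        r = rng.uniform(0, i + gamma)
        acc, g = 0.0, None
        for j, nj in enumerate(counts):
            acc += nj
            if r < acc:
                g = j
                counts[j] += 1
                break
        if g is None:
            g = len(counts)
            counts.append(1)
        I.append(g)
    K = len(counts)
    # Step 2: iid Beta link probabilities for every ordered group pair
    eta = [[rng.betavariate(beta1, beta2) for _ in range(K)] for _ in range(K)]
    # Step 3: Bernoulli links
    R = [[int(rng.random() < eta[I[i]][I[j]]) for j in range(n)]
         for i in range(n)]
    return I, eta, R
```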
Application: Object-Feature Dataset
• 2 domains (animals + features)
• Animals form two groups: birds + 4-legged mammals
Application: Object-Feature Dataset
Maximum-Likelihood Configuration
Animal Domain
Group 1: dove, hen, owl, falcon, eagle
Group 2: duck, goose
Group 3: fox, cat
Group 4: horse, zebra
Group 5: dog, wolf, tiger, lion, cow
Feature Domain
Group 1: small, 2-legs, feathers, fly
Group 2: medium, hunt
Group 3: big, hooves, mane, run
Group 4: 4-legs, hair
Group 5: swim
Application: Object-Feature Dataset
Predicting Missing Links
Can we make predictions about missing links?

% of Missing Links | AUC  | Accuracy
10%                | 0.96 | 0.95
25%                | 0.96 | 0.91
50%                | 0.91 | 0.87
65%                | 0.82 | 0.80
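One way to obtain such predictions is a Monte Carlo average over posterior samples: for a held-out pair (i, j), p(R_i,j = 1 | data) ≈ (1/S) Σ_s η_s(I_i, I_j). A sketch, assuming a sampler has already produced `samples` as (assignment, eta-matrix) pairs (a format I am positing for illustration, not the authors'):

```python
def predict_link(samples, i, j):
    """Posterior predictive probability of a link between objects i
    and j, averaged over posterior samples (I, eta)."""
    probs = [eta[I[i]][I[j]] for I, eta in samples]
    return sum(probs) / len(probs)
```

Thresholding this probability (e.g. at 0.5) gives the accuracy numbers; sweeping the threshold gives the AUC.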
Infinite Relational Model (IRM)
Advantages:
• It is an unsupervised learner with only two tunable parameters β and γ.
• It can be applied to multiple node types and relations.
• It has all the advantages of a Bayesian formulation (missing data, confidence intervals) and of nonparametric methods (adaptation to the data, outlier accommodation).
• It has been successfully used for co-clustering objects and features, learning ontologies, and analyzing social networks.
Disadvantages:
• Significant computational effort
• It does not capture “multiple personalities.”
“Multiple Personalities”
• In real data, objects (e.g., people) do not belong exclusively to one group; their identity is a mixture of basic components.
• These components can be the same for each object type, but the mixing proportions might vary from one object to another.
• IRM assumes that each object participates in all the relations it is involved in with a single identity.
• A proper model should account for a different mixture for each object over all the possible identity components (which are common for the whole domain).
• This way we learn not only all the groups of the population but also all the existing mixtures of them.
• This can be achieved by introducing a Bayesian hierarchy.

groups ≡ identities
Mixed-Membership Model (MMM)
• Each object i can assume as many identities I_i,m as the number of links m it participates in.
• The likelihood of a link R^m_i,j depends only on the identities of the participating objects for that link: I_i,m and I_j,m.
• The personality of each object can be made up of several components.

Q: Can we use an independent CRP for each object?
A: No, because the groups for each CRP will not be shared across objects.
Chinese Restaurant Franchise
N restaurants with a common menu
Object 1 = restaurant 1
Object 2 = restaurant 2
Object N = restaurant N
…
Phase 1: Table Assignment
Phase 2: Dish Assignment

Y.W. Teh, M.I. Jordan, M.J. Beal and D.M. Blei. Hierarchical Dirichlet Processes. JASA, 2006.
Chinese Restaurant Franchise
Table assignment (t_i,m = table for customer m at restaurant i; n_i,t = number of customers already sitting at table t):

p(t_i,m = t | t_i,1, …, t_i,m−1) =
  n_i,t / (m − 1 + γ_i)   if n_i,t > 0
  γ_i / (m − 1 + γ_i)     if t is a new table

Dish assignment (d_i,t = dish for table t in restaurant i; M = total number of tables; M_d = number of tables already eating dish d):

p(d_i,t = d | d_−it) =
  M_d / (M + γ_0)   if M_d > 0
  γ_0 / (M + γ_0)   if d is a new dish
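The two-phase process can be simulated directly. A sketch under my own naming, with one local concentration γ for all restaurants and a franchise-level γ_0 (the deck uses per-restaurant γ_i; a shared value keeps the example short):

```python
import random

def sample_crf(links_per_object, gamma_local, gamma0, seed=0):
    """Simulate the Chinese Restaurant Franchise for one domain:
    each object is a restaurant; its link participations are customers.
    Phase 1: customer m sits at table t with prob n_t/(m+gamma_local),
    or at a new table with prob gamma_local/(m+gamma_local).
    Phase 2: each new table orders dish k (a shared identity) with prob
    M_k/(M+gamma0), or a new dish with prob gamma0/(M+gamma0),
    where M_k counts tables franchise-wide already serving dish k."""
    rng = random.Random(seed)
    dish_counts = []                 # M_k: tables serving dish k
    identities = []                  # identities[i][m] = dish of customer m
    for n_links in links_per_object:
        table_counts, table_dish, ids = [], [], []
        for m in range(n_links):
            # Phase 1: table assignment within this restaurant
            r = rng.uniform(0, m + gamma_local)
            acc, t = 0.0, None
            for tt, nt in enumerate(table_counts):
                acc += nt
                if r < acc:
                    t = tt
                    table_counts[tt] += 1
                    break
            if t is None:
                # Phase 2: a new table draws a dish from the shared menu
                M = sum(dish_counts)
                r2 = rng.uniform(0, M + gamma0)
                acc2, d = 0.0, None
                for k, Mk in enumerate(dish_counts):
                    acc2 += Mk
                    if r2 < acc2:
                        d = k
                        dish_counts[k] += 1
                        break
                if d is None:
                    d = len(dish_counts)
                    dish_counts.append(1)
                t = len(table_counts)
                table_counts.append(1)
                table_dish.append(d)
            ids.append(table_dish[t])
        identities.append(ids)
    return identities, dish_counts
```

Because dishes are drawn from one franchise-wide menu, identities are shared across objects, which is exactly what independent per-object CRPs fail to achieve.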
Mixed-Membership Model
p({R^m_i,j}) = ∏_{i,j} ∏_m p(R^m_i,j | I_i,m, I_j,m)

I_i,m ~ CRF(γ_i)   (the dish assignment of node i for link m)

Properties:
• Has a few more parameters, the γ_i, but also higher expressivity
• Inference with Gibbs sampling can be based on the conditionals above
Non-Identifiability
two objects: A, B
two links: R^1_1,2, R^2_2,1
two groups: 1, 2
four latent variables: I_1,1, I_1,2, I_2,1, I_2,2
A: 100% group 1; B: 50% group 1, 50% group 2

η matrix (probability of a link between any pair of groups):
  1 0
  0 1
Non-Identifiability
[Figure: objects A (100% group 1) and B (50% group 1, 50% group 2)]

Different configurations (with 2, 3 or 4 groups) have the same likelihood,
so the prior determines the inference results.
Application: Mixed-Membership
• 1 domain – 16 objects
• 4 distinct identities
• fully observed adjacency matrix
Application: Mixed-Membership Model
Application: Mixed-Membership
[Figure: error w.r.t. the actual probability that any pair of objects belongs to the same group, IRM vs. MMM]
Application: Mixed-Membership Model
• 2 domains (animals + features)
• Animals form two groups: birds + 4-legged mammals
Application: Mixed-Membership Model
Application: Mixed-Membership Model
COW: average posterior pairwise probabilities of belonging to the same group
[Figure: bar chart (0 to 1), one bar per animal — dove, hen, duck, goose, owl, falcon, eagle, fox, dog, wolf, cat, tiger, lion, horse, zebra — comparing IRM and MMM]
Zachary's Karate Club
• 34 people
• A disagreement between the administrator (34) and the instructor (1) led to the split of the club in two (circles and squares)
• Used a binary matrix that records the "like" relation
from M. Girvan and M.E.J. Newman, Proc. Natl. Acad. Sci. USA, 2002
Zachary’s Karate Club
Learning Hierarchies
Can we meaningfully infer a hierarchy of groups/identities?
Identity 1
Identity 2
Identity 3 Identity 4
most general
most specific
Learning Hierarchies
Nonparametric prior on trees (each box is a different group/identity):
Level 0
⋮
Level L−1: CRP_{L−1}(α_{L−1})
Level L: CRP_L(α_L)
Learning Hierarchies
Hierarchical Mixed-Membership Model (HMMM)

“Forward” interpretation (for a single domain):
1) Generate an (L+1)-level tree using a nested hierarchical CRP.
2) Each object i is associated with an (L+1)-level branch.
3) Let z_i^(l) be the probability that object i belongs to level l of its branch (drawn from a Dirichlet prior).
4) For each link R^m between objects i and j:
   a) Sample l_i ~ Discrete(z_i) and assign identity I_i^(l_i)
   b) Sample l_j ~ Discrete(z_j) and assign identity I_j^(l_j)
   c) Sample R^m_i,j from Bernoulli(η(I_i^(l_i), I_j^(l_j)))
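Steps 3–4 can be sketched in isolation, assuming the tree (step 1) and the per-object branches (step 2) are already given. All names below are my own illustrative choices:

```python
import random

def sample_hmmm_links(branches, z, eta, n_links=1, seed=0):
    """Sample links for steps 3-4 of the HMMM forward process:
    branches[i] = identities along object i's branch (levels 0..L),
    z[i] = per-level probabilities for object i (e.g. Dirichlet draw),
    eta[a][b] = link probability between identities a and b."""
    rng = random.Random(seed)
    links = []
    n = len(branches)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for _ in range(n_links):
                # Pick a level (hence an identity) for each endpoint
                li = rng.choices(range(len(z[i])), weights=z[i])[0]
                lj = rng.choices(range(len(z[j])), weights=z[j])[0]
                a, b = branches[i][li], branches[j][lj]
                links.append((i, j, int(rng.random() < eta[a][b])))
    return links
```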
Application: Artificial Dataset
• 1 domain – 40 objects
• 4 distinct identities
• fully observed adjacency matrix
Application: Artificial Dataset
Application: Political Books
43 liberal, 49 conservative, 13 neutral
Links imply frequent co-purchasing by the same buyers (Amazon.com)
Application: Political Books
[Figure: inferred hierarchy of book groups with sizes 22, 26, 9, 6, 18, 7, 1, 6, 3; each group is annotated with the percentages of liberal, conservative, and neutral books it contains]
Reality Mining MIT Data
1 node type (people)
97 people + all outsiders in one node
22 different positions (professor, staff, 1st-year grad, …)
[Figure: composition pie chart — Sloan 29%, faculty & staff 5%, students 52%, other 14%]
Reality Mining MIT Data
[Figure: inferred groups of people with sizes 26, 17, 6, 23, 4, 1, 14, 1, 6; each group is annotated with the percentages of Sloan, faculty & staff, students, and other members it contains]
Conclusions and Outlook
• Relational data contain significant information about group structure
• Bayesian models allow the analyst to make inferences about communities of interest while quantifying the level of confidence, even when a significant proportion of the data is missing
• Nonparametric models provide a way of getting very flexible priors that allow the model to adapt to the data.
• IRM is a very lightweight framework with a very wide range of applicability, but cannot capture multiple identities.
• MMM and HMMM allow for increased flexibility and provide additional information about objects that simultaneously belong to several groups.

Challenges:
• Accelerated inference, especially when dealing with large datasets:
  – Variational methods
  – Sequential Monte Carlo
• Appropriate priors for time-dependent datasets are needed
Application: Senate Vote 2002
Senator             State  Party  Vote  Contribution ($)
Murkowski, Frank    AK     R      YES   19,700
Stevens, Ted        AK     R      YES   13,000
Sessions, Jeff      AL     R      YES   9,500
Shelby, Richard     AL     R      YES   25,000
Hutchinson, Tim     AR     R      YES   4,900
Lincoln, Blanche    AR     D      YES   5,500
McCain, John        AZ     R      NO    29,350
Kyl, Jon            AZ     R      YES   14,500
Boxer, Barbara      CA     D      NO    1,500
Feinstein, Dianne   CA     D      NO    9,750
Allard, Wayne       CO     R      YES   7,500
Campbell, Ben       CO     R      YES   4,000
Dodd, Christopher   CT     D      NO    500
Lieberman, Joseph   CT     D      NO    3,000
Carper, Thomas      DE     D      YES   17,640
50 Democrats, 49 Republicans, 1 Independent
Link R_i,j = 1 if the two senators:
– voted the same, and
– have both taken more or less than the average contribution (average: $13,800)
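The link rule above can be made concrete. A sketch, assuming both conditions must hold and that "both more or less than the average" means the two senators fall on the same side of $13,800 (the row format is my own simplification of the table):

```python
def senate_links(rows, avg=13800):
    """Binary link matrix: R[i][j] = 1 iff senators i and j cast the
    same vote AND sit on the same side of the average contribution.
    rows = list of (vote, contribution) tuples."""
    n = len(rows)
    R = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            vi, ci = rows[i]
            vj, cj = rows[j]
            if vi == vj and (ci > avg) == (cj > avg):
                R[i][j] = 1
    return R
```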
Application: Senate Vote 2002
[Figure: inferred groups of senators, each annotated with its party composition percentages (Democrat / Republican / Independent)]