Introduction to Graphical Models for Data Mining

Arindam Banerjee

banerjee@cs.umn.edu

Dept of Computer Science & Engineering
University of Minnesota, Twin Cities

16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

July 25, 2010

Introduction

• Graphical Models
  – Brief Overview

• Part I: Tree Structured Graphical Models
  – Exact Inference

• Part II: Mixed Membership Models
  – Latent Dirichlet Allocation
  – Generalizations, Applications

• Part III: Graphical Models for Matrix Analysis
  – Probabilistic Matrix Factorization
  – Probabilistic Co-clustering
  – Stochastic Block Models

Graphical Models: What and Why

• Statistical Data Analysis
  – Build diagnostic/predictive models from data
  – Uncertainty quantification based on (minimal) assumptions

• The i.i.d. assumption
  – Data are independently and identically distributed
  – Example: words in a document are drawn i.i.d. from the dictionary

• Graphical models
  – Assume (graphical) dependencies between (random) variables
  – Closer to reality; domain knowledge can be captured
  – Learning and inference are much more difficult

Flavors of Graphical Models

• Basic nomenclature
  – Node = random variable, which may be observed or hidden
  – Edge = statistical dependency

• Two popular flavors: ‘Directed’ and ‘Undirected’

• Directed Graphs
  – A directed graph between random variables, causal dependencies
  – Example: Bayesian networks, Hidden Markov Models
  – Joint distribution is a product of P(child|parents)

• Undirected Graphs
  – An undirected graph between random variables
  – Example: Markov/Conditional random fields
  – Joint distribution in terms of potential functions

[Figure: example graph over X1, …, X5]
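To make “joint distribution in terms of potential functions” concrete, here is a minimal Python sketch for the undirected case. Everything in it is an assumption for illustration: a chain over three binary variables with edges X1-X2 and X2-X3, and a made-up potential psi_agree that favors equal neighbors. The joint is the product of the edge potentials divided by the partition function Z, which the sketch computes by brute force.

```python
from itertools import product


def psi_agree(a, b):
    """Hypothetical pairwise potential: non-negative, favors equal neighbors."""
    return 2.0 if a == b else 1.0


EDGES = [(0, 1), (1, 2)]  # X1-X2 and X2-X3, with 0-based indices


def unnormalized(x):
    """Product of the edge potentials for one configuration x = (x1, x2, x3)."""
    score = 1.0
    for i, j in EDGES:
        score *= psi_agree(x[i], x[j])
    return score


# Partition function: sum of the unnormalized score over all 2**3 configurations.
configs = list(product([0, 1], repeat=3))
Z = sum(unnormalized(x) for x in configs)


def joint(x):
    """p(x) = (1/Z) * product of potentials."""
    return unnormalized(x) / Z


print(Z)                 # 18.0
print(joint((0, 0, 0)))  # 4/18
```

Brute-force Z is exponential in the number of variables; avoiding that cost on tree-structured graphs is the point of the message-passing algorithms in Part I.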

Bayesian Networks

• The joint distribution factorizes into local conditionals: P(X1, …, Xn) = ∏i P(Xi | Parents(Xi))

[Figure: example Bayesian network over X1, …, X5]

Example I: Burglary Network

This and several other examples are from Russell and Norvig's AI textbook, Artificial Intelligence: A Modern Approach.

Computing Probabilities of Events

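• The probability of any joint event can be computed with the chain rule:
  P(B,E,A,J,M) = P(B) P(E|B) P(A|B,E) P(J|B,E,A) P(M|B,E,A,J)
  = P(B) P(E) P(A|B,E) P(J|A) P(M|A)
  The second line uses the independences encoded by the network: B and E are independent, and given A, J and M are independent of B, E and each other.

• Example: P(b,¬e,a,¬j,m) = P(b) P(¬e) P(a|b,¬e) P(¬j|a) P(m|a)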
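The factorization can be checked numerically. This is a minimal sketch that assumes the conditional probability tables commonly quoted for this network in Russell and Norvig; the tables themselves are not in this transcript, so treat the numbers as assumptions.

```python
# Assumed CPTs for the burglary network: the values commonly quoted from
# Russell & Norvig. They are not in this transcript; check the book.
P_B = 0.001                       # P(b)
P_E = 0.002                       # P(e)
P_A = {(True, True): 0.95,        # P(a | B, E)
       (True, False): 0.94,
       (False, True): 0.29,
       (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(j | A)
P_M = {True: 0.70, False: 0.01}   # P(m | A)


def bern(p_true, value):
    """P(V = value) for a binary variable V with P(V = True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """P(B,E,A,J,M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A)."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))


# P(b, ¬e, a, ¬j, m) = P(b) P(¬e) P(a|b,¬e) P(¬j|a) P(m|a)
p = joint(b=True, e=False, a=True, j=False, m=True)
print(f"{p:.3e}")  # 6.567e-05
```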

Example II: Rain Network

Example III: “Car Won’t Start” Diagnosis

Inference

• Some variables in the Bayes net are observed
  – The evidence/data, e.g., John has not called, Mary has called

• Inference
  – How to compute the values/probabilities of the other variables
  – Example: what is the probability of a burglary given the evidence, i.e., P(b|¬j,m)?
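A minimal sketch of exact inference by enumeration for P(b|¬j,m): sum the joint over the hidden variables E and A, then normalize over B. It assumes the CPT values commonly quoted for this network in Russell and Norvig (not reproduced in this transcript), and the function name posterior_burglary is hypothetical.

```python
from itertools import product

# Assumed CPTs (values commonly quoted from Russell & Norvig; not slide data).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}


def bern(p_true, value):
    """P(V = value) for a binary variable with P(V = True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))


def posterior_burglary(j, m):
    """P(b | J=j, M=m): sum out hidden E and A, then normalize over B."""
    unnorm = {b: sum(joint(b, e, a, j, m)
                     for e, a in product((True, False), repeat=2))
              for b in (True, False)}
    evidence = unnorm[True] + unnorm[False]   # = P(J=j, M=m)
    return unnorm[True] / evidence


print(round(posterior_burglary(j=False, m=True), 4))  # 0.0069 with these CPTs
```

With these numbers the posterior stays below 1%: burglaries are rare a priori and John did not call. Enumeration costs time exponential in the number of hidden variables, which is what motivates the algorithms that follow.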

Inference Algorithms

• Graphs without loops: tree-structured graphs
  – Efficient exact inference algorithms are possible
  – Sum-product algorithm and its special cases
    • Belief propagation in Bayes nets
    • Forward-Backward algorithm in Hidden Markov Models (HMMs)

• Graphs with loops
  – Junction tree algorithms
    • Convert into a graph without loops
    • May lead to an exponentially large graph
  – Sum-product/message passing algorithm, ‘disregarding loops’
    • Active research topic; correct convergence is ‘not guaranteed’
    • Works well in practice
  – Approximate inference

Approximate Inference

• Variational Inference
  – Deterministic approximation
  – Approximate complex true distribution over latent variables
  – Replace with family of simple/tractable distributions
    • Use the best approximation in the family
  – Examples: Mean-field, Bethe, Kikuchi, Expectation Propagation

• Stochastic Inference
  – Simple sampling approaches
  – Markov Chain Monte Carlo methods (MCMC)
    • Powerful family of methods
  – Gibbs sampling
    • Useful special case of MCMC methods

Part I: Tree Structured Graphical Models

• The Inference Problem

• Factor Graphs and the Sum-Product Algorithm

• Example: Hidden Markov Models

• Generalizations

The Inference Problem

Complexity of Naïve Inference

Bayes Nets to Factor Graphs

Factor Graphs: Product of Local Functions

Marginalize Product of Functions (MPF)

• Marginalize product of functions

• Computing marginal functions

• The “not-sum” notation

MPF using Distributive Law

• We focus on two examples: g1(x1) and g3(x3)

• Main Idea: Distributive law

ab + ac = a(b+c)

• For g1(x1), we have

• For g3(x3), we have
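The slide's equations for g1(x1) and g3(x3) are images and are not reproduced here. As a hedged sketch, assume the factorization g(x1, …, x5) = fA(x1) fB(x2) fC(x1, x2, x3) fD(x3, x4) fE(x3, x5) over binary variables (the running example of Kschischang, Frey and Loeliger's factor-graph paper, used here as an assumption), with random factor tables. The naive not-sum enumerates all settings of x2, …, x5; the distributive law pushes the sums over x4 and x5 inside, since only fD and fE depend on them.

```python
import itertools
import random

random.seed(0)
VALS = (0, 1)  # every variable is binary in this sketch


def table(n_args):
    """Hypothetical random non-negative factor over n_args binary variables."""
    return {args: random.random() for args in itertools.product(VALS, repeat=n_args)}


fA, fB = table(1), table(1)
fC, fD, fE = table(3), table(2), table(2)


def g(x1, x2, x3, x4, x5):
    """Global function: the product of the local factors."""
    return (fA[(x1,)] * fB[(x2,)] * fC[(x1, x2, x3)]
            * fD[(x3, x4)] * fE[(x3, x5)])


def g1_naive(x1):
    """Not-sum over x1: add up g over all 2**4 settings of x2..x5."""
    return sum(g(x1, *rest) for rest in itertools.product(VALS, repeat=4))


def g1_distributive(x1):
    """Same marginal, with each sum pushed as far inside the product as possible."""
    mu_D = {x3: sum(fD[(x3, x4)] for x4 in VALS) for x3 in VALS}  # sum over x4
    mu_E = {x3: sum(fE[(x3, x5)] for x5 in VALS) for x3 in VALS}  # sum over x5
    total = sum(fB[(x2,)] * sum(fC[(x1, x2, x3)] * mu_D[x3] * mu_E[x3] for x3 in VALS)
                for x2 in VALS)
    return fA[(x1,)] * total


for x1 in VALS:
    print(x1, g1_naive(x1), g1_distributive(x1))
```

The partial sums mu_D and mu_E are functions of x3 alone; they play the role of the messages that the following slides pass from the factor nodes fD and fE toward x3.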

Computing Single Marginals

• Main idea
  – Target node becomes the root
  – Pass messages from the leaves up to the root

Message Passing

• Variable node: compute the product of descendants
• Factor node f: compute the product of descendants with f, then do a not-sum over the parent

Example: Computing g1(x1)

Example: Computing g3(x3)

The efficient algorithm is encoded in the structure of the factor graph.

Hidden Markov Models (HMMs)

Latent variables: z0, z1, …, zt-1, zt, zt+1, …, zT

Observed variables: x1, …, xt-1, xt, xt+1, …, xT

Inference problems:
1. Compute p(x1:T)
2. Compute p(zt|x1:T)
3. Find the most probable hidden sequence, argmax over z1:T of p(z1:T|x1:T)

Similar problems arise for chain-structured Conditional Random Fields (CRFs).
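A minimal sketch of inference problem 1 for a discrete HMM. The parameters pi0 (initial distribution), A (transition matrix) and B (emission matrix) and the sequence x are hypothetical; the slide's unobserved z0 is folded into pi0 = p(z1). The forward recursion alpha[t, k] = p(x1:t, zt = k) costs O(TK²), whereas summing over every hidden path costs O(K^T); the sketch does both so they can be compared.

```python
import itertools

import numpy as np

# Hypothetical 2-state HMM with 3 observation symbols.
pi0 = np.array([0.6, 0.4])             # p(z_1)
A = np.array([[0.7, 0.3],              # A[i, j] = p(z_{t+1} = j | z_t = i)
              [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1],         # B[k, v] = p(x_t = v | z_t = k)
              [0.1, 0.3, 0.6]])
x = [0, 1, 2, 2, 1]                    # observed symbols x_1..x_T


def forward(pi0, A, B, x):
    """alpha[t, k] = p(x_1..x_t, z_t = k); also returns p(x_1:T)."""
    T, K = len(x), len(pi0)
    alpha = np.zeros((T, K))
    alpha[0] = pi0 * B[:, x[0]]
    for t in range(1, T):
        # Sum over the previous state, then multiply in the new emission.
        alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]
    return alpha, alpha[-1].sum()


def brute_force_likelihood(pi0, A, B, x):
    """Sum p(x, z) over all K**T hidden paths (feasible only for tiny T)."""
    total = 0.0
    for z in itertools.product(range(len(pi0)), repeat=len(x)):
        p = pi0[z[0]] * B[z[0], x[0]]
        for t in range(1, len(x)):
            p *= A[z[t - 1], z[t]] * B[z[t], x[t]]
        total += p
    return total


alpha, likelihood = forward(pi0, A, B, x)
print(likelihood, brute_force_likelihood(pi0, A, B, x))
```

For long sequences the alphas underflow; practical implementations normalize each alpha[t] and accumulate the logarithms of the normalizers.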

The Sum-Product Algorithm

• To compute gi(xi), form a tree rooted at xi

• Starting from the leaves, apply the following two rules
  – Product rule: at a variable node, take the product of the messages from its descendants
  – Sum-product rule: at a factor node, take the product of f with the messages from its descendants, then perform a not-sum over the parent node

• To compute all marginals
  – They can be computed one at a time, but this repeats computations and is not efficient
  – Simultaneous message passing following the sum-product algorithm
  – Examples: Belief Propagation, Forward-Backward algorithm, etc.

Sum-Product Updates

Sum-Product Updates

Example: Step 1

Example: Step 2

Example: Step 3

Example: Step 4

Example: Step 5

Example: Termination

HMMs Revisited

Latent variables: z0, z1, …, zt-1, zt, zt+1, …, zT

Observed variables: x1, …, xt-1, xt, xt+1, …, xT

Inference problems:
1. Compute p(x1:T)
2. Compute p(zt|x1:T)

The sum-product algorithm is known as the ‘forward-backward’ algorithm; for linear-Gaussian state-space models the same computation is smoothing in Kalman filtering.
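A minimal sketch of problem 2, p(zt|x1:T), via forward-backward. The HMM parameters are hypothetical. The backward message beta[t, k] = p(x_{t+1:T} | zt = k) combines with the forward message alpha[t, k] = p(x1:t, zt = k) to give gamma[t, k] = alpha[t, k] · beta[t, k] / p(x1:T); a brute-force enumeration is included as a check.

```python
import itertools

import numpy as np

# Hypothetical 2-state HMM with 3 observation symbols (not from the slides).
pi0 = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
x = [0, 1, 2, 2, 1]


def forward_backward(pi0, A, B, x):
    """Return gamma[t, k] = p(z_t = k | x_1:T) and the likelihood p(x_1:T)."""
    T, K = len(x), len(pi0)
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))                   # beta[T-1] = 1 by convention
    alpha[0] = pi0 * B[:, x[0]]
    for t in range(1, T):                    # forward messages
        alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]
    for t in range(T - 2, -1, -1):           # backward messages
        beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()
    return alpha * beta / likelihood, likelihood


def brute_force_marginal(pi0, A, B, x, t):
    """p(z_t | x_1:T) by enumerating every hidden path (tiny T only)."""
    K = len(pi0)
    post = np.zeros(K)
    for z in itertools.product(range(K), repeat=len(x)):
        p = pi0[z[0]] * B[z[0], x[0]]
        for s in range(1, len(x)):
            p *= A[z[s - 1], z[s]] * B[z[s], x[s]]
        post[z[t]] += p
    return post / post.sum()


gamma, likelihood = forward_backward(pi0, A, B, x)
print(gamma.round(3))
```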

Distributive Law on Semi-Rings

• The idea can be applied to any commutative semi-ring

• Semi-ring 101
  – Two operations (+, ×), each associative and commutative, with identity elements
  – Distributive law: a×b + a×c = a×(b+c)

Special cases and applications:
• Belief Propagation in Bayes nets
• MAP inference in HMMs
• Max-product algorithm
• Alternative to Viterbi Decoding
• Kalman Filtering
• Error Correcting Codes
• Turbo Codes
• …
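To illustrate the semi-ring idea, here is a minimal sketch in which one chain recursion takes the semi-ring “plus” as a parameter: np.sum gives the sum-product (forward) recursion for p(x1:T), and np.max gives max-product, whose result is the probability of the single most probable hidden path, the quantity Viterbi decoding maximizes. The HMM values and the name chain_semiring are hypothetical.

```python
import itertools

import numpy as np

# Hypothetical 2-state HMM (same shapes as a standard discrete HMM).
pi0 = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
x = [0, 1, 2, 2, 1]


def chain_semiring(pi0, A, B, x, plus):
    """Forward recursion in a (plus, *) semi-ring.

    `plus` reduces over the previous state: np.sum gives sum-product,
    np.max gives max-product. Multiplication is ordinary *.
    """
    msg = pi0 * B[:, x[0]]
    for t in range(1, len(x)):
        # msg[:, None] * A has entry (i, j) = msg_i * A[i, j]; reduce over i.
        msg = plus(msg[:, None] * A, axis=0) * B[:, x[t]]
    return plus(msg)


def path_prob(z):
    """Joint probability p(x, z) of one hidden path z."""
    p = pi0[z[0]] * B[z[0], x[0]]
    for t in range(1, len(x)):
        p *= A[z[t - 1], z[t]] * B[z[t], x[t]]
    return p


paths = list(itertools.product(range(2), repeat=len(x)))
print(chain_semiring(pi0, A, B, x, np.sum), sum(path_prob(z) for z in paths))
print(chain_semiring(pi0, A, B, x, np.max), max(path_prob(z) for z in paths))
```

Recovering the maximizing path itself additionally requires storing the arg max at each step (back-pointers), as Viterbi decoding does.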

Message Passing in General Graphs

• Tree-structured graphs
  – Message passing is guaranteed to give correct solutions
  – Examples: HMMs, Kalman Filters

• General graphs
  – Active research topic
    • Progress has been made in the past 10 years
  – Message passing
    • May not converge
    • May converge to a ‘local minimum’ of the ‘Bethe variational free energy’
  – New approaches to convergent and correct message passing

• Applications
  – TrueSkill: ranking system for Xbox Live
  – Turbo Codes: 3G and 4G phones, satellite communication, WiMAX, Mars orbiter

Part II: Mixed Membership Models

• Mixture Models vs Mixed Membership Models

• Latent Dirichlet Allocation

• Inference
  – Mean-Field and Collapsed Variational Inference
  – MCMC/Gibbs Sampling

• Applications

• Generalizations

Background: Plate Diagrams

[Figure: a node a with a plate around b labeled 3 is equivalent to a with three children b1, b2, b3]

Compact representation of large Bayesian networks

Model 1: Independent Features

[Figure: plate diagram with θ inside plate d and x inside plates d and n; an example point x = (0.3, 1, −2) shown against one density per feature (x-axis from −10 to 10)]

d=3, n=1

Model 2: Naïve Bayes (Mixture Models)

[Figure: plate diagrams for Model 1 (θ → x, plates d and n) and Model 2 (π → z → x ← θ, with θ in plates k and d)]

Naïve Bayes Model

[Figure: plate diagram π → z → x ← θ (plates n, k, d); an example point x = (−1, 3, −0.5) shown against the component densities for each feature]

Naïve Bayes Model

[Figure: plate diagram π → z → x ← θ (plates n, k, d); an example point x = (0.1, 2.1, −1.5) shown against the component densities for each feature]

Model 3: Mixed Membership Model

[Figure: plate diagrams for Model 1 (independent features), Model 2 (naïve Bayes, π → z → x ← θ) and Model 3 (α → π → z → x ← θ, with θ in plates k and d)]

Mixed Membership Models

[Figure: plate diagram α → π → z → x ← θ (plates n, d, k); an example point x = (0.7, 3.1, −1) shown against the component densities for each feature]

Mixed Membership Models

[Figure: plate diagram α → π → z → x ← θ (plates n, d, k); an example point x = (0.9, 2.1, −2) shown against the component densities for each feature]

Mixture Model vs Mixed Membership Model

[Figure: example points x = (0.1, 2.1, −1.5) and x = (−1, 3, −0.5) under the mixture model, and x = (0.7, 3.1, −1) and x = (0.9, 2.1, −2) under the mixed membership model]

Single-component membership (mixture model) vs multi-component mixed membership (mixed membership model)

Latent Dirichlet Allocation (LDA)

[Figure: plate diagram with plates over words (Nd), documents (D) and topics (K)]

• π(d) ~ Dirichlet(α): distribution over topics for each document
• zi ~ Discrete(π(d)): topic assignment for each word
• θ(j), j = 1, …, K: distribution over words for each topic
• xi ~ Discrete(θ(zi)): word generated from its assigned topic
• α: Dirichlet prior
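The generative process can be simulated directly. This minimal sketch uses pi_d for a document's topic proportions and theta for the per-topic word distributions (our naming); the vocabulary size, the two topics and alpha are hypothetical values.

```python
import numpy as np


def sample_lda_corpus(theta, alpha, doc_lengths, rng):
    """Sample documents from LDA's generative process.

    theta: (K, V) array, row j = distribution over words for topic j
    alpha: (K,) Dirichlet parameter for the per-document topic proportions
    Returns a list of (pi_d, topic assignments, word ids), one per document.
    """
    K, V = theta.shape
    corpus = []
    for n_d in doc_lengths:
        pi_d = rng.dirichlet(alpha)                    # topic proportions for document d
        z = rng.choice(K, size=n_d, p=pi_d)            # a topic for each word position
        words = np.array([rng.choice(V, p=theta[k]) for k in z])  # word from its topic
        corpus.append((pi_d, z, words))
    return corpus


rng = np.random.default_rng(0)
# Hypothetical: 2 topics over a 4-word vocabulary.
theta = np.array([[0.45, 0.45, 0.05, 0.05],
                  [0.05, 0.05, 0.45, 0.45]])
alpha = np.array([0.5, 0.5])
corpus = sample_lda_corpus(theta, alpha, doc_lengths=[8, 12], rng=rng)
for pi_d, z, words in corpus:
    print(pi_d.round(2), z, words)
```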

[Figure: generative view with π ~ Dirichlet(α) and z ~ Discrete(π). Panels contrast a single z shared by x1, x2, x3 with a separate topic z1, z2, z3 for each word, and show two documents, each with its own topic proportions, with word-level assignments z11, z12, z13 → x11, x12, x13 and z21, z22 → x21, x22]

LDA Generative Model

[Figure: the generative process illustrated for document1 and document2]

Learning: Inference and Estimation

Variational Inference

Variational EM for LDA

E-step: Variational Distribution and Updates

M-step: Parameter Estimation
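The variational E-step and M-step equations are images in this transcript. As a hedged sketch, here are the mean-field updates from Blei, Ng and Jordan's LDA paper, with the topic-word matrix written beta as in that paper: per document, phi[n, k] ∝ beta[k, w_n] exp(ψ(gamma_k)) and gamma_k = alpha_k + Σn phi[n, k], where ψ is the digamma function; the M-step sets beta[k, v] proportional to the expected number of times word v is assigned to topic k. The corpus, the fixed alpha, the iteration counts and the hand-written digamma (used to avoid a SciPy dependency) are all our assumptions.

```python
import math

import numpy as np


def digamma(x):
    """Digamma psi(x) for x > 0: shift x up to >= 6, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + (math.log(x) - 0.5 / x
                     - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))


def e_step(word_ids, beta, alpha, n_iter=50):
    """Mean-field updates for one document; returns gamma (K,) and phi (N, K)."""
    K, N = beta.shape[0], len(word_ids)
    gamma = alpha + N / K                       # common initialization
    for _ in range(n_iter):
        weights = np.exp([digamma(g) for g in gamma])
        phi = beta[:, word_ids].T * weights     # phi[n, k] ~ beta[k, w_n] exp(psi(gamma_k))
        phi /= phi.sum(axis=1, keepdims=True)
        gamma = alpha + phi.sum(axis=0)
    return gamma, phi


def m_step(docs, phis, K, V):
    """beta[k, v] proportional to the expected count of word v under topic k."""
    counts = np.zeros((K, V))
    for word_ids, phi in zip(docs, phis):
        np.add.at(counts.T, word_ids, phi)      # accumulates repeated word ids
    return counts / counts.sum(axis=1, keepdims=True)


rng = np.random.default_rng(1)
K, V = 2, 4
alpha = np.full(K, 0.5)                         # held fixed in this sketch
beta = rng.dirichlet(np.ones(V), size=K)        # hypothetical initial topics
docs = [np.array([0, 1, 0, 1, 1]), np.array([2, 3, 3, 2])]
for _ in range(20):                             # variational EM iterations
    results = [e_step(d, beta, alpha) for d in docs]
    beta = m_step(docs, [phi for _, phi in results], K, V)
print(beta.round(2))
```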

Results: Topics Inferred

Results: Perplexity Comparison

Aviation Safety Reports (NASA)

Results: NASA Reports I

Arrival/Departure: runway, approach, departure, altitude, turn, tower, air traffic control, heading, taxi way, flight

Passenger: passenger, attendant, flight, seat, medical, captain, attendants, lavatory, told, police

Maintenance: maintenance, engine, mel, zzz, aircraft, installed, check, inspection, fuel, work

Results: NASA Reports II

Medical Emergency: medical, passenger, doctor, attendant, oxygen, emergency, paramedics, flight, nurse, aed

Wheel Maintenance: tire, wheel, assembly, nut, spacer, main, axle, bolt, missing, tires

Weather Condition: knots, turbulence, aircraft, degrees, ice, winds, wind, speed, air speed, conditions

Departure: departure, sid, dme, altitude, climbing, mean sea level, heading, procedure, turn, degree

Two-Dimensional Visualization for Reports

Red: Flight Crew, Blue: Passenger, Green: Maintenance

The pilot flies an owner's airplane with the owner as a passenger and loses contact with the center during the flight.

During a skydiving flight, a jet approaches at the same altitude, but an accident is avoided.

Two-Dimensional Visualization for Reports

Red: Flight Crew, Blue: Passenger, Green: Maintenance

The altimeter has a problem, but the pilot overcomes the difficulty during the flight.

During acceleration, a flap retraction issue occurs. The pilot then returns to base and lands, and the mechanic finds the problem.

Two-Dimensional Visualization for Reports

Red: Flight Crew, Blue: Passenger, Green: Maintenance

The pilot has a landing gear problem. The maintenance crew joins the radio conversation to help.

The captain has a medical emergency.

Mixed Membership of Reports

Red: Flight Crew, Blue: Passenger, Green: Maintenance

Flight Crew: 0.7039, Passenger: 0.0009, Maintenance: 0.2953

Flight Crew: 0.1405, Passenger: 0.0663, Maintenance: 0.7932

Flight Crew: 0.2563, Passenger: 0.6599, Maintenance: 0.0837

Flight Crew: 0.0013, Passenger: 0.0013, Maintenance: 0.9973

Smoothed Latent Dirichlet Allocation

[Figure: plate diagram with plates over words (Nd), documents (D) and topics (T)]

• π(d) ~ Dirichlet(α): distribution over topics for each document
• zi ~ Discrete(π(d)): topic assignment for each word
• θ(j) ~ Dirichlet(β): distribution over words for each topic
• xi ~ Discrete(θ(zi)): word generated from its assigned topic
• α, β: Dirichlet priors

Stochastic Inference using Markov Chains

• Powerful family of approximate inference methods
  – Markov Chain Monte Carlo, Gibbs Sampling

• The basic idea
  – Need to marginalize over a complex latent variable distribution:

$$p(x\mid\theta) = \int_z p(x,z\mid\theta)\,dz = \int_z p(x\mid\theta)\,p(z\mid x,\theta)\,dz = \mathbb{E}_{z\sim p(z\mid x,\theta)}\big[p(x\mid\theta)\big]$$

  – Draw ‘independent’ samples from p(z|x,θ)
  – Compute a sample-based average instead of the full integral

• Main issue: how to draw samples?
  – Difficult to draw samples directly from p(z|x,θ)
  – Construct a Markov chain whose stationary distribution is p(z|x,θ)
  – Run the chain until ‘convergence’
  – Obtain samples from p(z|x,θ)

The Metropolis-Hastings Algorithm

The Metropolis-Hastings Algorithm (Contd)
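The Metropolis-Hastings slides are images in this transcript. A minimal sketch of random-walk Metropolis-Hastings, assuming a hypothetical one-dimensional target known only up to a constant (an equal mixture of N(−2, 1) and N(2, 1)) and a symmetric Gaussian proposal, for which the Hastings correction q(z|z′)/q(z′|z) equals 1 and the acceptance probability is min(1, p̃(z′)/p̃(z)).

```python
import math
import random


def log_target(z):
    """Unnormalized log density of an equal mixture of N(-2, 1) and N(2, 1)."""
    a, b = -0.5 * (z + 2) ** 2, -0.5 * (z - 2) ** 2
    m = max(a, b)                                  # log-sum-exp for stability
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def metropolis_hastings(log_p, z0, n_samples, step=1.0, burn_in=1000, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    z, samples, accepted = z0, [], 0
    for i in range(burn_in + n_samples):
        z_new = z + rng.gauss(0.0, step)                   # propose z' ~ N(z, step^2)
        log_ratio = log_p(z_new) - log_p(z)                # proposal terms cancel
        if rng.random() < math.exp(min(0.0, log_ratio)):   # accept w.p. min(1, ratio)
            z = z_new
            accepted += 1
        if i >= burn_in:
            samples.append(z)                              # a rejection repeats z
    return samples, accepted / (burn_in + n_samples)


samples, acc_rate = metropolis_hastings(log_target, z0=0.0, n_samples=20000, step=2.0)
mean = sum(samples) / len(samples)
second_moment = sum(s * s for s in samples) / len(samples)
print(round(mean, 2), round(second_moment, 2), round(acc_rate, 2))
```

This target has mean 0 and second moment 1 + 2² = 5, so the printed estimates should land near those values; successive samples are correlated, so the effective sample size is smaller than 20000.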

The Gibbs Sampler
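Gibbs sampling is the special case of Metropolis-Hastings in which each proposal is a draw from one variable's full conditional, so every proposal is accepted. A minimal sketch, assuming a hypothetical target: a standard bivariate normal with correlation rho, whose conditionals are z1 | z2 ~ N(rho·z2, 1 − rho²), and symmetrically for z2 | z1.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampling for a standard bivariate normal with correlation rho.

    Each sweep draws each coordinate from its exact conditional:
    z1 | z2 ~ N(rho * z2, 1 - rho^2) and z2 | z1 ~ N(rho * z1, 1 - rho^2).
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho ** 2)
    z1, z2 = 0.0, 0.0
    samples = []
    for i in range(burn_in + n_samples):
        z1 = rng.gauss(rho * z2, sd)
        z2 = rng.gauss(rho * z1, sd)
        if i >= burn_in:
            samples.append((z1, z2))
    return samples


samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
n = len(samples)
m1 = sum(a for a, _ in samples) / n
m2 = sum(b for _, b in samples) / n
cov = sum((a - m1) * (b - m2) for a, b in samples) / n
v1 = sum((a - m1) ** 2 for a, _ in samples) / n
v2 = sum((b - m2) ** 2 for _, b in samples) / n
print(round(cov / math.sqrt(v1 * v2), 2))   # sample correlation, near rho = 0.8
```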

Collapsed Gibbs Sampling for LDA
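The collapsed sampler integrates out the document-topic and topic-word distributions and resamples only the topic assignments. A minimal sketch of the update from Griffiths and Steyvers, assuming symmetric priors alpha and beta (beta here is the topic-word Dirichlet parameter): p(zi = k | z−i, x) ∝ (n_dk + alpha)(n_kw + beta) / (n_k + V·beta), with all counts excluding token i. The toy corpus and settings are hypothetical.

```python
import numpy as np


def collapsed_gibbs_lda(docs, K, V, alpha, beta, n_iter, seed=0):
    """Collapsed Gibbs sampling for LDA with symmetric priors alpha and beta."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))            # tokens in doc d assigned to topic k
    n_kw = np.zeros((K, V))                    # times word w is assigned to topic k
    n_k = np.zeros(K)                          # tokens assigned to topic k
    z = [rng.integers(K, size=len(doc)) for doc in docs]   # random initial topics
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                n_dk[d, k] -= 1                # remove token i from the counts
                n_kw[k, w] -= 1
                n_k[k] -= 1
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k                    # add it back with its new topic
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    # Point estimates of the integrated-out distributions from the final sample.
    topic_word = (n_kw + beta) / (n_k[:, None] + V * beta)
    doc_topic = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + K * alpha)
    return z, topic_word, doc_topic


docs = [[0, 1, 0, 1, 0, 1], [2, 3, 2, 3, 3, 2], [0, 1, 2, 3]]  # hypothetical word ids
z, topic_word, doc_topic = collapsed_gibbs_lda(docs, K=2, V=4, alpha=0.5,
                                               beta=0.1, n_iter=200)
print(topic_word.round(2))
print(doc_topic.round(2))
```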

Collapsed Variational Inference for LDA

Collapsed Variational Inference for LDA

Results: Comparison of Inference Methods

Results: Comparison of Inference Methods

Generalizations

• Generalized Topic Models
  – Correlated Topic Models
  – Dynamic Topic Models, Topics over Time
  – Dynamic Topics with birth/death

• Mixed membership models over non-text data, applications
  – Mixed membership naïve-Bayes
  – Discriminative models for classification
  – Cluster Ensembles

• Nonparametric Priors
  – Dirichlet Process priors: infer the number of topics
  – Hierarchical Dirichlet processes: infer hierarchical structures
  – Several other priors: Pachinko allocation, Gaussian Processes, IBP, etc.

CTM Results

DTM Results

DTM Results II

Mixed Membership Naïve Bayes

• For each data point
  – Choose π ~ Dirichlet(α)
  – For each observed feature fn:
    • Choose a component zn ~ Discrete(π)
    • Choose a feature value xn from p(xn|zn, fn, Θ), which could be Gaussian, Poisson, Bernoulli, …
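A minimal sketch of this generative process with Gaussian feature models; the two components, their means and standard deviations, and alpha are hypothetical. Unlike naïve Bayes, which draws one z per data point, each feature here draws its own component from the point's membership vector π.

```python
import numpy as np


def sample_mmnb(n_points, alpha, means, stds, rng):
    """Sample from mixed membership naive Bayes with Gaussian features.

    alpha: (K,) Dirichlet parameter
    means, stds: (K, F) arrays; component k's Gaussian for feature f
    Returns feature values X (n_points, F) and component labels Z (n_points, F).
    """
    K, F = means.shape
    X = np.empty((n_points, F))
    Z = np.empty((n_points, F), dtype=int)
    for i in range(n_points):
        pi = rng.dirichlet(alpha)                # membership vector for this point
        for f in range(F):
            k = rng.choice(K, p=pi)              # component for feature f
            X[i, f] = rng.normal(means[k, f], stds[k, f])
            Z[i, f] = k
    return X, Z


rng = np.random.default_rng(0)
means = np.array([[-2.0, 0.0, 3.0],     # hypothetical component 1
                  [2.0, 4.0, -1.0]])    # hypothetical component 2
stds = np.ones_like(means)
X, Z = sample_mmnb(5, alpha=np.array([0.5, 0.5]), means=means, stds=stds, rng=rng)
print(X.round(2))
print(Z)   # features of one point may come from different components
```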

MMNB vs NB: Perplexity Surfaces

• MMNB typically achieves a lower perplexity than NB
• On the test set, NB shows overfitting, while MMNB is stable and robust

[Figure: perplexity surfaces for NB (three panels) and MMNB (three panels)]

Discriminative Mixed Membership Models

Results: DLDA for text classification

Fast DLDA achieves higher accuracy on most of the datasets.

Topics from DLDA

Topic 1: cabin, descent, pressurization, emergency, flight, aircraft, pressure, oxygen, atc, masks

Topic 2: flight, hours, time, crew, day, duty, rest, trip, zzz, minutes

Topic 3: ice, aircraft, flight, wing, captain, icing, engine, anti, time, maintenance

Topic 4: aircraft, gate, ramp, wing, taxi, stop, ground, parking, area, line

Topic 5: flight, smoke, cabin, passenger, aircraft, captain, cockpit, attendant, smell, emergency

Cluster Ensembles

• Combining multiple base clusterings of a dataset

• Robust and stable
• Distributed and scalable
• Knowledge reuse, privacy preserving

[Figure: base clustering 1, base clustering 2, base clustering 3]

Problem Formulation

• Input & Output

[Figure: data points with their base clusterings (input) and the resulting consensus clustering (output)]

Results: State-of-the-art vs Bayesian Ensembles

Part III: Graphical Models for Matrix Analysis

• Probabilistic Matrix Factorizations

• Probabilistic Co-clustering

• Stochastic Block Structures

Matrix Factorization

• Singular value decomposition

• Problems
  – Large matrices, with millions of rows/columns
    • SVD can be rather slow
  – Sparse matrices: most entries are missing
    • Traditional approaches cannot handle missing entries


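Matrix Factorization: “Funk SVD”

• Model $X \in \mathbb{R}^{n \times m}$ as $UV^T$, where $U \in \mathbb{R}^{n \times k}$ and $V \in \mathbb{R}^{m \times k}$
  – Alternately optimize U and V

$\hat{X}_{ij} = u_i^T v_j$, where $u_i^T$ is the i-th row of U and $v_j^T$ is the j-th row of V

$\text{error} = (X_{ij} - \hat{X}_{ij})^2 = (X_{ij} - u_i^T v_j)^2$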
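The error is summed only over entries that are actually observed, which is how this formulation sidesteps the missing-entry problem of plain SVD. A minimal NumPy sketch, assuming a boolean mask marks the observed cells (the function and variable names are illustrative):

```python
import numpy as np

def observed_squared_error(X, mask, U, V):
    """Sum of (X_ij - u_i^T v_j)^2 over observed entries (mask[i, j] is True)."""
    X_hat = U @ V.T                              # X_hat[i, j] = u_i^T v_j
    residual = np.where(mask, X - X_hat, 0.0)    # missing entries contribute nothing
    return float(np.sum(residual ** 2))

X = np.array([[5.0, 3.0], [4.0, 0.0]])           # 0.0 is only a placeholder for a missing rating
mask = np.array([[True, True], [True, False]])
U = np.array([[1.0], [1.0]])                     # n x k with k = 1
V = np.array([[4.0], [3.0]])                     # m x k
err = observed_squared_error(X, mask, U, V)      # 1.0: only entry (0, 0) is off, by 1
```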


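Matrix Factorization (Contd)

• Gradient descent updates for an observed entry (i, j), with step size η (the factor 2 from differentiating the squared error is absorbed into η)

$u_{ik}^{(t+1)} = u_{ik}^{(t)} + \eta\,(X_{ij} - \hat{X}_{ij})\,v_{jk}^{(t)}$

$v_{jk}^{(t+1)} = v_{jk}^{(t)} + \eta\,(X_{ij} - \hat{X}_{ij})\,u_{ik}^{(t)}$

where $\hat{X}_{ij} = u_i^T v_j$ and $\text{error} = (X_{ij} - \hat{X}_{ij})^2$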
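The per-entry updates above can be sketched as a stochastic gradient loop in Python. This is an illustrative sketch, not Funk's original code: the initialization scale, step size, epoch count, and function name are assumptions. Note that the v update uses the value of u_i from before it was changed, matching the superscript (t) in the update rules.

```python
import numpy as np

def funk_sgd(entries, n_rows, n_cols, k=2, eta=0.02, epochs=3000, seed=0):
    """Fit U (n_rows x k) and V (n_cols x k) to observed (i, j, x_ij) triples."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_rows, k))
    V = 0.1 * rng.standard_normal((n_cols, k))
    for _ in range(epochs):
        for i, j, x in entries:
            err = x - U[i] @ V[j]        # X_ij - X_hat_ij
            u_old = U[i].copy()          # the V update uses the time-t value of u_i
            U[i] += eta * err * V[j]
            V[j] += eta * err * u_old
    return U, V

# Rank-1 toy matrix [[1, 2], [2, 4], [3, 6]] with entry (2, 1) left unobserved.
entries = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 4.0), (2, 0, 3.0)]
U, V = funk_sgd(entries, n_rows=3, n_cols=2, k=1)
rmse = np.sqrt(np.mean([(x - U[i] @ V[j]) ** 2 for i, j, x in entries]))
prediction = U[2] @ V[1]   # should approach 6 for this rank-1 example
```

Because the observed entries are consistent with a rank-1 matrix, the only rank-1 fit predicts 6 for the held-out cell, which is what makes the example easy to check.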


Probabilistic Matrix Factorization (PMF)

$u_i \sim \mathcal{N}(0, \sigma_u^2 I)$

$v_j \sim \mathcal{N}(0, \sigma_v^2 I)$

$X_{ij} \sim \mathcal{N}(u_i^T v_j, \sigma^2)$

Inference using gradient descent
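Because every distribution in PMF is Gaussian, the negative log posterior over U and V reduces to the squared error from the Funk SVD slides plus L2 penalties, which is why gradient descent applies directly. A short derivation sketch, with Ω (a notation introduced here) denoting the set of observed entries and the noise variances treated as fixed:

```latex
\begin{aligned}
-\log p(U, V \mid X)
  &= \frac{1}{2\sigma^2} \sum_{(i,j) \in \Omega} \left(X_{ij} - u_i^T v_j\right)^2
   + \frac{1}{2\sigma_u^2} \sum_i \lVert u_i \rVert^2
   + \frac{1}{2\sigma_v^2} \sum_j \lVert v_j \rVert^2 + \text{const} \\[4pt]
\sigma^2 \cdot \big(-\log p(U, V \mid X)\big)
  &= \frac{1}{2} \sum_{(i,j) \in \Omega} \left(X_{ij} - u_i^T v_j\right)^2
   + \frac{\lambda_u}{2} \sum_i \lVert u_i \rVert^2
   + \frac{\lambda_v}{2} \sum_j \lVert v_j \rVert^2 + \text{const},
  \qquad \lambda_u = \frac{\sigma^2}{\sigma_u^2}, \quad \lambda_v = \frac{\sigma^2}{\sigma_v^2}
\end{aligned}
```

Larger prior variances σ_u², σ_v² give weaker regularization (smaller λ).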


Bayesian Probabilistic Matrix Factorization

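Gaussian-Wishart hyperpriors (Λ denotes a precision matrix):

$\Lambda_u \sim \mathcal{W}(\nu_0, W_0), \quad \mu_u \sim \mathcal{N}(\mu_0, (\beta_0 \Lambda_u)^{-1})$

$\Lambda_v \sim \mathcal{W}(\nu_0, W_0), \quad \mu_v \sim \mathcal{N}(\mu_0, (\beta_0 \Lambda_v)^{-1})$

Gaussian priors on the latent factors, and the likelihood:

$u_i \sim \mathcal{N}(\mu_u, \Lambda_u^{-1}), \quad v_j \sim \mathcal{N}(\mu_v, \Lambda_v^{-1})$

$X_{ij} \sim \mathcal{N}(u_i^T v_j, \sigma^2)$

Inference using MCMC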
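Inside the MCMC sampler, each u_i is redrawn from its Gaussian conditional given V, the hyperparameters (μ_u, Λ_u), and user i's observed ratings. The Python sketch below assumes Λ_u is a precision matrix and uses the standard conjugate-Gaussian update; the function names are illustrative, and the steps that resample V and the Gaussian-Wishart hyperparameters are omitted.

```python
import numpy as np

def user_factor_posterior(x_i, V_i, mu_u, Lambda_u, sigma2):
    """Gaussian conditional of u_i given ratings x_i (shape (n_i,)) on the items
    whose factor vectors are the rows of V_i (shape (n_i, k))."""
    precision = Lambda_u + (V_i.T @ V_i) / sigma2        # prior precision + data precision
    cov = np.linalg.inv(precision)
    mean = cov @ (Lambda_u @ mu_u + (V_i.T @ x_i) / sigma2)
    return mean, cov

def sample_user_factor(x_i, V_i, mu_u, Lambda_u, sigma2, rng):
    """One Gibbs draw of u_i."""
    mean, cov = user_factor_posterior(x_i, V_i, mu_u, Lambda_u, sigma2)
    return rng.multivariate_normal(mean, cov)

rng = np.random.default_rng(0)
x_i = np.array([2.0, 4.0])          # two ratings by user i
V_i = np.eye(2)                     # factor vectors of the two rated items
mean, cov = user_factor_posterior(x_i, V_i, np.zeros(2), np.eye(2), 1.0)
u_draw = sample_user_factor(x_i, V_i, np.zeros(2), np.eye(2), 1.0, rng)
```

The update for v_j is symmetric, with the roles of users and items swapped.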


Results: PMF on the Netflix Dataset


Results: Bayesian PMF on Netflix


Co-clustering: Gene Expression Analysis

(Figure panels: original matrix; co-clustered matrix)


Co-clustering and Matrix Approximation


Probabilistic Co-clustering

(Figure: row clusters and column clusters)


Generative Process


• Assume a mixed membership for each row and column

• Assume a Gaussian for each co-cluster

1. Pick row/column clusters

2. Generate each entry of the matrix
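The two-step generative process above can be simulated in a few lines of Python. This is a sketch under assumptions the slides do not fix: symmetric Dirichlet priors with scalar concentrations alpha1 and alpha2 on the row and column memberships, and one Gaussian per co-cluster with a shared standard deviation sigma. The function and its parameters are hypothetical.

```python
import numpy as np

def sample_bcc_like(n_rows, n_cols, alpha1, alpha2, means, sigma, rng):
    """Simulate a matrix from a BCC-style generative process (hypothetical helper).
    means: (K1, K2) array holding the Gaussian mean of each co-cluster."""
    K1, K2 = means.shape
    # Mixed memberships: one Dirichlet draw per row and one per column.
    pi1 = rng.dirichlet(alpha1 * np.ones(K1), size=n_rows)
    pi2 = rng.dirichlet(alpha2 * np.ones(K2), size=n_cols)
    X = np.empty((n_rows, n_cols))
    for u in range(n_rows):
        for v in range(n_cols):
            z1 = rng.choice(K1, p=pi1[u])                # 1. row cluster for this entry
            z2 = rng.choice(K2, p=pi2[v])                #    column cluster for this entry
            X[u, v] = rng.normal(means[z1, z2], sigma)   # 2. entry from that co-cluster's Gaussian
    return X

rng = np.random.default_rng(0)
means = np.array([[0.0, 10.0], [20.0, 30.0]])
X = sample_bcc_like(4, 5, alpha1=1.0, alpha2=1.0, means=means, sigma=1e-6, rng=rng)
```

Because cluster indicators are drawn per entry, a single row can contribute entries to several row clusters, which is what "mixed membership" means here.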


Reduction to Mixture Models


Bayesian Co-clustering (BCC)


• A Dirichlet distribution over all possible mixed memberships


Learning: Inference and Estimation

• Learning
  – Estimate model parameters
  – Infer ‘mixed memberships’ of individual rows and columns

• Expectation Maximization

• Issues
  – Posterior probability cannot be obtained in closed form
  – Parameter estimation cannot be done directly

• Approach: Approximate inference
  – Variational inference
  – Collapsed Gibbs sampling, collapsed variational inference


Variational EM

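• Introduce a variational distribution q to approximate the intractable posterior over the latent variables

• Use Jensen’s inequality to get a tractable lower bound on the log-likelihood

• Maximize the lower bound w.r.t. the variational parameters (E-step)
  – Equivalently, minimize the KL divergence between q and the true posterior

• Maximize the lower bound w.r.t. the model parameters (M-step)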
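A small numeric check of the two facts variational EM relies on: the Jensen bound never exceeds log p(x), and the gap is exactly KL(q || p(z | x)), so maximizing the bound over q is the same as minimizing that KL divergence. The joint probabilities below are made up for illustration, and the latent variable is a single discrete z rather than the full co-clustering latent structure.

```python
import numpy as np

def elbo(q, log_joint):
    """Jensen lower bound on log p(x): E_q[log p(x, z)] - E_q[log q(z)]."""
    return float(np.sum(q * (log_joint - np.log(q))))

def kl(q, p):
    """KL(q || p) for discrete distributions with full support."""
    return float(np.sum(q * (np.log(q) - np.log(p))))

# Toy joint p(x, z = k) for one fixed observation x and three latent states.
log_joint = np.log(np.array([0.1, 0.3, 0.2]))
log_evidence = np.log(np.exp(log_joint).sum())    # log p(x) = log 0.6
posterior = np.exp(log_joint - log_evidence)      # p(z | x)
q = np.array([0.5, 0.25, 0.25])                   # an arbitrary variational distribution

gap = log_evidence - elbo(q, log_joint)           # equals KL(q || p(z | x))
```

Setting q to the exact posterior closes the gap; variational inference restricts q to a tractable family, so some gap usually remains.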


Variational Distribution

• Separate variational parameters for each row and for each column


Collapsed Inference

• The latent distributions (π1, π2) can be marginalized out exactly
  – Obtain $p(X, z_1, z_2 \mid \alpha_1, \alpha_2, \beta)$ in closed form
  – Analysis assumes discrete/categorical entries
  – Can be generalized to exponential family distributions

• Collapsed Gibbs sampling
  – Conditional distribution of $(z_{1uv}, z_{2uv})$ in closed form: $P(z_{1uv} = i, z_{2uv} = j \mid X, z_1^{-uv}, z_2^{-uv}, \alpha_1, \alpha_2, \beta)$
  – Sample states, run the sampler until convergence

• Collapsed variational Bayes
  – Variational distribution $q(z_1, z_2 \mid \gamma) = \prod_{u,v} q(z_{1uv}, z_{2uv} \mid \gamma_{uv})$
  – Gaussian and Taylor approximations to obtain updates for $\gamma_{uv}$


Residual Bayesian Co-clustering (RBC)

• (z1, z2) determines the co-cluster distribution of each entry

• Users/movies may have biases

• (m1, m2): row/column means

• (bm1, bm2): row/column biases


Results: Datasets

• Movielens: movie recommendation data
  – 100,000 ratings (1-5) for 1682 movies from 943 users (6.3% of entries observed)
  – Binarize: 0 (ratings 1-3), 1 (ratings 4-5)
  – Discrete (original), Bernoulli (binary), Real (z-scored)

• Foodmart: transaction data
  – 164,558 sales records for 7803 customers and 1559 products (1.35% of entries observed)
  – Binarize: 0 (lower than median), 1 (higher than median)
  – Poisson (original), Bernoulli (binary), Real (z-scored)

• Jester: joke rating data
  – 100,000 ratings (-10.00 to +10.00) for 100 jokes from 1000 users (100% of entries observed)
  – Binarize: 0 (lower than 0), 1 (higher than 0)
  – Gaussian (original), Bernoulli (binary), Real (z-scored)
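The binarization rules and density percentages in the list can be reproduced with a few helpers. This Python sketch is an interpretation, not the authors' preprocessing code: it assumes the Foodmart median is taken over all sales records, that a value exactly at the median (or a Jester rating of exactly 0) maps to 0, and that z-scoring uses the mean and standard deviation of all observed entries.

```python
import numpy as np

def binarize_movielens(ratings):
    """Ratings 1-3 -> 0, ratings 4-5 -> 1."""
    return (np.asarray(ratings) >= 4).astype(int)

def binarize_foodmart(sales):
    """1 if above the median of all records, else 0 (ties -> 0, an assumption)."""
    sales = np.asarray(sales, dtype=float)
    return (sales > np.median(sales)).astype(int)

def binarize_jester(ratings):
    """1 if the rating is above 0, else 0 (an exact 0 -> 0, an assumption)."""
    return (np.asarray(ratings, dtype=float) > 0).astype(int)

def zscore(values):
    """Standardize observed entries to zero mean and unit variance."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

def density(n_observed, n_rows, n_cols):
    """Fraction of matrix cells that are observed."""
    return n_observed / (n_rows * n_cols)

print(round(density(100_000, 943, 1682), 3))   # Movielens: 0.063
print(round(density(164_558, 7803, 1559), 4))  # Foodmart: 0.0135
print(round(density(100_000, 1000, 100), 2))   # Jester: 1.0
```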


Perplexity Comparison with 10 Clusters

On Binary Data

Training Set
             MMNB       BCC        LDA
Jester       1.7883     1.8186     98.3742
Movielens    1.6994     1.9831     439.6361
Foodmart     1.8691     1.9545     1461.7463

Test Set
             MMNB       BCC        LDA
Jester       4.0237     2.5498     98.9964
Movielens    3.9320     2.8620     1557.0032
Foodmart     6.4751     2.1143     6542.9920

On Original Data

Training Set
             MMNB       BCC
Jester       15.4620    18.2495
Movielens    3.1495     0.8068
Foodmart     4.5901     4.5938

Test Set
             MMNB       BCC
Jester       39.9395    24.8239
Movielens    38.2377    1.0265
Foodmart     4.6681     4.5964
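The slides do not define perplexity; a common definition, assumed here, is the exponential of the negative average per-entry log-likelihood (lower is better). Under this definition, a model that assigns probability 0.5 to every binary entry has perplexity exactly 2, and a uniform distribution over 100 outcomes has perplexity 100, which are useful reference points when reading the tables.

```python
import numpy as np

def perplexity(log_probs):
    """exp(-mean log-likelihood); log_probs are natural-log probabilities of the entries."""
    return float(np.exp(-np.mean(log_probs)))

coin_flip = perplexity(np.log(np.full(10, 0.5)))     # 2.0 up to floating-point rounding
uniform_100 = perplexity(np.log(np.full(7, 0.01)))   # 100.0 up to floating-point rounding
```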


Co-embedding: Users


Co-embedding: Movies


RBC vs. other co-clustering algorithms

(Figure: Jester)

• RBC and RBC-FF perform better than BCC

• RBC and RBC-FF also perform best among all the co-clustering algorithms compared


RBC vs. other co-clustering algorithms

(Figures: Foodmart, Movielens)


RBC vs. SVD, NNMF, and CORR

(Figure: Jester)

• RBC and RBC-FF are competitive with SVD, NNMF, and CORR


RBC vs. SVD, NNMF, and CORR

(Figures: Movielens, Foodmart)


SVD vs. Parallel RBC

Parallel RBC scales well to large matrices


Inference Methods: VB, CVB, Gibbs


Mixed Membership Stochastic Block Models

• Network data analysis
  – Relational view: rows and columns are the same entity
  – Example: social networks, biological networks
  – Graph view: (binary) adjacency matrix

• Model
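The MMSB generative model for a directed binary network (as in the Airoldi et al. reference) draws a mixed membership π_p ~ Dirichlet(α) for each node. For each ordered pair (p, q), it draws the role p takes toward q from π_p and the role q takes toward p from π_q, then draws the edge Y(p, q) from a Bernoulli whose probability is the entry of the block matrix B indexed by those two roles. A minimal Python simulation, assuming self-loops are skipped; the function name is illustrative.

```python
import numpy as np

def sample_mmsb(n_nodes, alpha, B, rng):
    """Simulate a directed binary network from an MMSB-style model.
    alpha: Dirichlet parameter (length K); B: (K, K) block interaction probabilities."""
    K = B.shape[0]
    pi = rng.dirichlet(alpha, size=n_nodes)            # mixed membership of each node
    Y = np.zeros((n_nodes, n_nodes), dtype=int)
    for p in range(n_nodes):
        for q in range(n_nodes):
            if p == q:
                continue                               # no self-loops (an assumption)
            z_send = rng.choice(K, p=pi[p])            # role p takes toward q
            z_recv = rng.choice(K, p=pi[q])            # role q takes toward p
            Y[p, q] = rng.binomial(1, B[z_send, z_recv])
    return Y, pi

rng = np.random.default_rng(0)
B = np.array([[0.9, 0.05], [0.05, 0.9]])              # dense within blocks, sparse across
Y, pi = sample_mmsb(10, np.full(2, 0.5), B, rng)
```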


MMB Graphical Model


Variational Inference

• Variational lower bound

• Fully factorized variational distribution

• Variational EM
  – E-step: Update variational parameters (γ, φ)
  – M-step: Update model parameters (α, B)


Results: Inferring Communities


Original friendship matrix; friendships inferred from the posterior, based on thresholding $\pi_p^T B \pi_q$ and $\phi_{p \to q}^T B \phi_{p \leftarrow q}$, respectively
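Thresholding expected link probabilities of the form π_p^T B π_q can be sketched in Python as follows: given estimated memberships (one row of pi per node) and a block matrix B, the expected link probability for every pair is pi @ B @ pi.T, which is then cut at a threshold. The 0.5 threshold and the function name are assumptions for illustration.

```python
import numpy as np

def inferred_links(pi, B, threshold=0.5):
    """0/1 matrix with 1 where pi_p^T B pi_q exceeds threshold (self-pairs set to 0)."""
    prob = pi @ B @ pi.T               # prob[p, q] = pi_p^T B pi_q
    links = (prob > threshold).astype(int)
    np.fill_diagonal(links, 0)
    return links

pi = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])   # estimated memberships of 3 nodes
B = np.array([[0.9, 0.1], [0.1, 0.9]])
links = inferred_links(pi, B)   # only nodes 0 and 1 (mostly block 0) are linked
```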


Results: Protein Interaction Analysis


“Ground truth”: MIPS collection of protein interactions (yellow diamond)

Comparison with other models based on protein interactions and microarray expression analysis


Non-parametric Bayes


Dirichlet Process Mixtures

Chinese Restaurant Processes

Indian Buffet Processes

Pitman-Yor Processes

Gaussian Processes

Hierarchical Dirichlet Processes

Mondrian Processes


References: Graphical Models

• S. Russell & P. Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, 2009.

• D. Koller & N. Friedman, Probabilistic Graphical Models: Principles and Techniques, MIT Press, 2009.

• C. Bishop, Pattern Recognition and Machine Learning, Springer, 2007.

• D. Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press, 2010.

• M. I. Jordan (Ed), Learning in Graphical Models, MIT Press, 1998.

• S. L. Lauritzen, Graphical Models, Oxford University Press, 1996.

• J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988.


References: Inference

• F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 498–519, 2001.

• S. M. Aji and R. J. McEliece, “The generalized distributive law,” IEEE Transactions on Information Theory, vol. 46, no. 2, pp. 325–343, 2000.

• M. J. Wainwright and M. I. Jordan, “Graphical models, exponential families, and variational inference,” Foundations and Trends in Machine Learning, vol. 1, no. 1-2, 1-305, December 2008.

• C. Andrieu, N. De Freitas, A. Doucet, M. I. Jordan, “An Introduction to MCMC for Machine Learning,” Machine Learning, 50, 5-43, 2003.

• J. S. Yedidia, W. T. Freeman, and Y. Weiss, “Constructing free energy approximations and generalized belief propagation algorithms,” IEEE Transactions on Information Theory, vol. 51, no. 7, pp. 2282–2312, 2005.


References: Mixed-Membership Models

• S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” Journal of the American Society for Information Science, 41(6):391–407, 1990.

• T. Hofmann, “Unsupervised learning by probabilistic latent semantic analysis,” Machine Learning, 42(1):177–196, 2001.

• D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research (JMLR), 3:993–1022, 2003.

• T. L. Griffiths and M. Steyvers, “Finding scientific topics,” Proceedings of the National Academy of Sciences, 101(Suppl 1): 5228–5235, 2004.

• Y. W. Teh, D. Newman, and M. Welling, “A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation,” Neural Information Processing Systems (NIPS), 2007.

• A. Asuncion, P. Smyth, M. Welling, Y.W. Teh, “On Smoothing and Inference for Topic Models,” Uncertainty in Artificial Intelligence (UAI), 2009.

• H. Shan, A. Banerjee, and N. Oza, “Discriminative Mixed-membership Models,” IEEE International Conference on Data Mining (ICDM), 2009.


References: Matrix Factorization

• S. Funk, “Netflix update: Try this at home,” http://sifter.org/~simon/journal/20061211.html

• R. Salakhutdinov and A. Mnih. “Probabilistic matrix factorization,” Neural Information Processing Systems (NIPS), 2008.

• R. Salakhutdinov and A. Mnih. “Bayesian probabilistic matrix factorization using Markov chain Monte Carlo,” International Conference on Machine Learning (ICML), 2008.

• I. Porteous, A. Asuncion, and M. Welling, “Bayesian matrix factorization with side information and Dirichlet process mixtures,” Conference on Artificial Intelligence (AAAI), 2010.

• I. Sutskever, R. Salakhutdinov, and J. Tenenbaum, “Modelling relational data using Bayesian clustered tensor factorization,” Neural Information Processing Systems (NIPS), 2009.

• A. Singh and G. Gordon, “A Bayesian matrix factorization model for relational data,” Uncertainty in Artificial Intelligence (UAI), 2010.


References: Co-clustering, Block Structures

• A. Banerjee, I. Dhillon, J. Ghosh, S. Merugu, and D. Modha, “A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation,” Journal of Machine Learning Research (JMLR), 2007.

• M. M. Shafiei and E. E. Milios, “Latent Dirichlet Co-Clustering,” IEEE Conference on Data Mining (ICDM), 2006.

• H. Shan and A. Banerjee, “Bayesian co-clustering,” IEEE International Conference on Data Mining (ICDM), 2008.

• P. Wang, C. Domeniconi, and K. B. Laskey, “Latent Dirichlet Bayesian Co-Clustering,” European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 2009.

• H. Shan and A. Banerjee, “Residual Bayesian Co-clustering for Matrix Approximation,” SIAM International Conference on Data Mining (SDM), 2010.

• T. A. B. Snijders and K. Nowicki, “Estimation and prediction for stochastic blockmodels for graphs with latent block structure,” Journal of Classification, 14:75–100, 1997.

• E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, “Mixed-membership stochastic blockmodels,” Journal of Machine Learning Research (JMLR), 9:1981–2014, 2008.


Acknowledgements

Hanhuai Shan, Amrudin Agovic
