Transcript of “A generic approach to topic models and its application to virtual communities”

A generic approach to topic models and its application to virtual communities

Gregor Heinrich

PhD presentation (English translation incl. backup slides, 45 min)
Faculty of Mathematics and Computer Science
University of Leipzig

28 November 2012, Version 2.9 EN BU

Gregor Heinrich A generic approach to topic models 1 / 35



Overview

Introduction

Generic topic models

Inference methods

Application to virtual communities

Conclusions and outlook

Gregor Heinrich A generic approach to topic models 2 / 35

Motivation: Virtual communities

[Figure: example documents in a virtual community, connected by relations such as cooperation, authorship, citation, recommendation, annotation and similarity]

Virtual communities = groups of persons who exchange information and knowledge electronically

Examples: organisations, digital libraries, “Web 2.0” applications incl. social networks

Data are multimodal: text content; authorship, citation, annotations and recommendations; cooperation and other social relations

Typical case: discrete data with high dynamics and large volumes

Gregor Heinrich A generic approach to topic models 3 / 35

Motivation: Unsupervised mining of discrete data

Identification of relationships in large data volumes
Only data (and possibly a model) required (information retrieval, network analysis, clustering, NLP methods)
Density problem: features too sparse for analysis in a high-dimensional feature space
Vocabulary problem: semantic similarity ≠ lexical similarity (polysemy, synonymy, etc.)

[Figure: word-sense graph for the ambiguous term “bar”, with senses such as judicial assembly (court), location (bank, restaurant), long object (stick), pressure unit, furniture (counter, table), and related terms like atm, teller, network, computer, staff, personnel, employees, glue, adhere, prevent, yard]

Gregor Heinrich A generic approach to topic models 4 / 35


Topic models as approach

[Figure: words grouped into documents via latent topics; e.g. a topic (bar, restaurant, wine) and a topic (rhythm, drum) shared by the documents “Leipzig’s Bars and Restaurants”, “Rhythm & Spice Jamaican Grill” and “Playing Drums for Beginners”]

Probabilistic representations of grouped discrete data
Illustrative for text: words grouped in documents

Latent topics = probability distributions over the vocabulary. Dominant terms of a topic are semantically similar.
Language = mixture of topics (latent semantic structure)

→ Reduce the vocabulary problem: find semantic relations
→ Reduce the density problem: dimensionality reduction

Gregor Heinrich A generic approach to topic models 5 / 35

Language models: Unigram model

[Figure: graphical model with a single topic variable z generating every word wm,n (document m, word n); example with two documents and one shared word distribution p(w | z)]

One distribution for all data

Gregor Heinrich A generic approach to topic models 6 / 35

Language models: Unigram mixture model

[Figure: graphical model with one topic variable zm per document generating its words wm,n; example with two documents and word distributions p(w | z1), p(w | z2)]

One distribution per document

Gregor Heinrich A generic approach to topic models 6 / 35

Language models: Unigram admixture model

[Figure: graphical model with one topic variable zm,n per word generating wm,n; example with two documents and word distributions p(w | z1,1) . . . p(w | z2,3)]

One distribution per word → basic topic model
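The difference between the three models is where the topic variable sits. Written out (a standard summary added here for clarity, not quoted from the slides):

    \text{unigram: }\; p(w_{m,n}) = p(w_{m,n} \mid z)
    \text{mixture: }\; p(\vec{w}_m) = \sum_{k} p(z_m = k) \prod_{n} p(w_{m,n} \mid z_m = k)
    \text{admixture: }\; p(w_{m,n} \mid m) = \sum_{k} p(z_{m,n} = k \mid m)\, p(w_{m,n} \mid z_{m,n} = k)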

Gregor Heinrich A generic approach to topic models 6 / 35


Bayesian topic models: The Dirichlet distribution

[Figure: Dirichlet density Dir(~θ | ~α) with ~α = (4, 4, 2) over the probability simplex (p1, p2, p3), and three discrete distributions p(w | z) ∼ Dir(~α) drawn from it]

Bayesian methodology:

Distributions are generated from prior distributions
For language and other discrete data the Dirichlet distribution is an important prior:

Defined on the simplex: the surface containing all discrete distributions
The parameter ~α controls its behaviour

→ Bayesian topic model: Latent Dirichlet Allocation (LDA) (Blei et al. 2003)
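For reference, the density being illustrated (the standard Dirichlet form, added here rather than quoted from the slide):

    \mathrm{Dir}(\vec{\theta} \mid \vec{\alpha}) = \frac{\Gamma\!\big(\sum_{k=1}^{K} \alpha_k\big)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}, \qquad \theta_k \ge 0, \;\; \sum_{k=1}^{K} \theta_k = 1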

Gregor Heinrich A generic approach to topic models 7 / 35

Latent Dirichlet Allocation

[Figure: graphical model of LDA with Dirichlet priors Dir(α) and Dir(β), topic distribution ~ϑm per document m, term distribution ~ϕk per topic k, topic indicator zm,n and word wm,n]

Latent Dirichlet Allocation (Blei et al. 2003)

Generative example for the document “Concert tonight at Rhythm and Spice Restaurant . . .”:

Generating word distributions for all topics, e.g. ~ϕ1 for topic 1 (restaurant, bar, food, grill, . . . ) and ~ϕ2 for topic 2 (concert, music, rhythm, bar, . . . )

Generating the topic distribution ~ϑ1 for document 1

Sampling the topic index for the first word, z = 2

Sampling a word from the term distribution for topic 2: “concert”
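Written compactly, the process just walked through is (standard LDA notation, added as a summary):

    \vec{\varphi}_k \sim \mathrm{Dir}(\beta),\; k = 1 \dots K; \qquad \vec{\vartheta}_m \sim \mathrm{Dir}(\alpha),\; m = 1 \dots M;
    z_{m,n} \sim \mathrm{Mult}(\vec{\vartheta}_m); \qquad w_{m,n} \sim \mathrm{Mult}(\vec{\varphi}_{z_{m,n}})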

Gregor Heinrich A generic approach to topic models 8 / 35

State of the art

Large number of published models that extend LDA:
Authors (Rosen-Zvi et al. 2004),
Citations (Dietz et al. 2007),
Hierarchy (Li and McCallum 2006; Li et al. 2007),
Image features and captions (Barnard et al. 2003), etc.
Results for “topic model” (title + abstract), 2012 alone: ACM > 400, Google Scholar > 1300.

→ Expanding research area with practical relevance
But: no existing analysis as a generic model class
Partly tedious derivation, especially for inference algorithms

Conjecture:
Important properties are generic across models
Simplifications for the derivation of concrete model properties, inference algorithms and design methods

[Figure: graphical model of the expert–tag–topic model (Heinrich 2011) with authors ~am, author assignments xm,n and xm,j, topic indicators zm,n and ym,j, words wm,n, tags cm,j, and Dirichlet-distributed parameters ~ϑx | α, ~ϕk | β, ~ψk | γ]

Gregor Heinrich A generic approach to topic models 9 / 35

Research questions

“How can topic models be described in a generic way in order to use their properties across particular applications?”

“Can generic topic models be implemented generically and, if so, can repeated structures be exploited for optimisations?”

“How can generic models be applied to data in virtual communities?”

Gregor Heinrich A generic approach to topic models 10 / 35


Overview

Introduction

Generic topic models

Inference methods

Application to virtual communities

Conclusions and outlook

Gregor Heinrich A generic approach to topic models 11 / 35

“How can topic models be described in a generic way in order to use their properties across particular applications?”

Gregor Heinrich A generic approach to topic models 12 / 35

Topic models: Example structures

[Figure: graphical models (plate notation) of four example topic models]
(a) Latent Dirichlet allocation (LDA)
(b) Author–topic model (ATM)
(c) Pachinko allocation model (PAM4)
(d) Hierarchical PAM (hPAM)
(Blei et al. 2003; Rosen-Zvi et al. 2004; Li and McCallum 2006; Li et al. 2007)

Gregor Heinrich A generic approach to topic models 13 / 35

Generic topic models – “NoMMs”

[Figure: generic NoMM level: a component ~θk ∼ Dir(~θk | α) is selected via k = f(xin) from the incoming value xin and emits xout; variants with several incoming edges (k = f(xin1, xin2)) or several outgoing edges; example: LDA as a chain of two such levels, m → ~ϑm | α → zm,n → ~ϕk | β → wm,n, illustrated with the concrete values m = 1, z1,1 = 3, w1,1 = 2]

Generic characteristics of topic models:
Levels with discrete components ~θk, generated from Dirichlet distributions
Coupling via the values of discrete variables x

→ “Network of mixed membership” (NoMM):
Compact representation for topic models
Directed acyclic graph
Node: sample from a mixture component, selected via incoming edges; terminal node: observation
Edge: propagation of discrete values to child nodes
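For LDA viewed as this two-level chain, the joint distribution factorises level by level into Dirichlet–multinomial pairs (standard LDA, written out here to illustrate the level structure):

    p(W, Z, \Theta \mid \alpha, \beta) = \underbrace{\prod_{m=1}^{M} \mathrm{Dir}(\vec{\vartheta}_m \mid \alpha) \prod_{n=1}^{N_m} \vartheta_{m,\, z_{m,n}}}_{\text{level 1: } m \to z_{m,n}} \cdot \underbrace{\prod_{k=1}^{K} \mathrm{Dir}(\vec{\varphi}_k \mid \beta) \prod_{m,n} \varphi_{z_{m,n},\, w_{m,n}}}_{\text{level 2: } z_{m,n} \to w_{m,n}}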

Gregor Heinrich A generic approach to topic models 14 / 35

Topic models as NoMMs

[Figure: the four example models of the previous slide rewritten as chains of NoMM levels, e.g. LDA as m → ~ϑm | α → zm,n = k → ~ϕk | β → wm,n = t]
(a) Latent Dirichlet allocation, LDA
(b) Author–topic model, ATM
(c) Pachinko allocation model, PAM
(d) Hierarchical pachinko allocation model, hPAM
(Blei et al. 2003; Rosen-Zvi et al. 2004; Li and McCallum 2006; Li et al. 2007)

Gregor Heinrich A generic approach to topic models 15 / 35


Overview

Introduction

Generic topic models

Inference methods

Application to virtual communities

Conclusions and outlook

Gregor Heinrich A generic approach to topic models 16 / 35


“Can generic topic models be implemented generically. . . ?”

Gregor Heinrich A generic approach to topic models 17 / 35

Bayesian inference problem and Gibbs sampler

Bayesian inference: “inversion of the generative process”:
Find distributions over the parameters Θ and the latent variables/topics H, given observations V and Dirichlet parameters A

= Determine the posterior distribution p(H, Θ | V, A)
Intractability → approximative approaches
Gibbs sampling: a variant of Markov chain Monte Carlo (MCMC)

In topic models: marginalise the parameters Θ (“collapsed” Gibbs sampling)
Sample topics Hi for each data point i in turn: Hi ∼ p(Hi | H¬i, V, A)

[Figure: example NoMM chain m → ~ϑrm | αr → x → ~ϑm,x | α → y → ~ϕy | β → w with parameter levels Θ1, Θ2, Θ3, hidden variables H1, H2 and observations V; the posterior over all unknowns is p(Θ1, H1, Θ2, H2, Θ3 | V, A); after collapsing the parameters, the pair (H1i, H2i) is drawn from p(H1i, H2i | H¬i, V, A)]
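For the simplest NoMM, LDA, the per-token step Hi ∼ p(Hi | H¬i, V, A) is the familiar collapsed Gibbs update over the topic assignment z. A minimal illustrative sketch in Java follows (symmetric α and β assumed; this is not the thesis’ generated meta-sampler code, and the class, method and variable names are chosen here only for illustration):

import java.util.Random;

class LdaGibbsSketch {
    // w[m][n] = term index of token n in document m; K topics, V vocabulary size
    static int[][] sample(int[][] w, int K, int V, double alpha, double beta,
                          int iterations, Random rand) {
        int M = w.length;
        int[][] z = new int[M][];                  // topic assignment per token
        int[][] nmk = new int[M][K];               // document-topic counts
        int[][] nkt = new int[K][V];               // topic-term counts
        int[] nk = new int[K];                     // tokens per topic
        for (int m = 0; m < M; m++) {              // random initialisation
            z[m] = new int[w[m].length];
            for (int n = 0; n < w[m].length; n++) {
                int k = rand.nextInt(K);
                z[m][n] = k; nmk[m][k]++; nkt[k][w[m][n]]++; nk[k]++;
            }
        }
        double[] p = new double[K];
        for (int it = 0; it < iterations; it++) {
            for (int m = 0; m < M; m++) {
                for (int n = 0; n < w[m].length; n++) {
                    int t = w[m][n], kOld = z[m][n];
                    nmk[m][kOld]--; nkt[kOld][t]--; nk[kOld]--;   // exclude token i
                    double psum = 0.0;
                    for (int k = 0; k < K; k++) {                  // q(m,k) * q(k,t)
                        p[k] = (nmk[m][k] + alpha)
                             * (nkt[k][t] + beta) / (nk[k] + V * beta);
                        psum += p[k];
                    }
                    double u = rand.nextDouble() * psum;           // draw new topic
                    int kNew = 0; double acc = p[0];
                    while (acc < u && kNew < K - 1) { kNew++; acc += p[kNew]; }
                    z[m][n] = kNew; nmk[m][kNew]++; nkt[kNew][t]++; nk[kNew]++;
                }
            }
        }
        return z;
    }
}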

Gregor Heinrich A generic approach to topic models 18 / 35

Sampling distribution for NoMMs

[Figure: example NoMM chain m → ~ϑrm | αr → x → ~ϑm,x | α → y → ~ϕy | β → w, with one factor per level: q(m, x), q((m, x), y), q(y, w)]

The Gibbs sampler can be derived generically (Heinrich 2009)

Typical case: quotients of factors over the levels ℓ:

    p(H_i \mid H_{\neg i}, V, A) \;\propto\; \prod_{\ell} \left[ \frac{n^{\neg i}_{k,t} + \alpha}{\sum_{t} n^{\neg i}_{k,t} + \alpha} \right]^{[\ell]} \;=\; \prod_{\ell} \big[\, q(k, t) \,\big]^{[\ell]}

n_{k,t} = count of co-occurrences between the input and output values of a level (components and samples)

More complex variants are covered by

    q(k, t) \;\triangleq\; \frac{\mathrm{beta}\big(\{n_{k,t}\}_{t=1}^{T} + \alpha\big)}{\mathrm{beta}\big(\{n^{\neg i}_{k,t}\}_{t=1}^{T} + \alpha\big)}
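As a worked special case (the standard LDA result, consistent with the generic product above; added here for illustration), the two LDA levels contribute one q-factor each:

    p(z_i = k \mid \vec{z}_{\neg i}, \vec{w}, \alpha, \beta) \;\propto\; \underbrace{\frac{n^{\neg i}_{m,k} + \alpha}{\sum_{k'} n^{\neg i}_{m,k'} + K\alpha}}_{q(m,\,k)} \cdot \underbrace{\frac{n^{\neg i}_{k,t} + \beta}{\sum_{t'} n^{\neg i}_{k,t'} + V\beta}}_{q(k,\,t)}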

Gregor Heinrich A generic approach to topic models 19 / 35

Typology of NoMM substructures

[Figure: library of NoMM substructures with their q-function factors:
N1. Dirichlet–multinomial: q(a, z) q(z, b)
N2. Observed parameters: ϑ^c_{a,z} q(z, b)
E2. Autonomous edges: q(a, x ⊕ y) q(x, b) q(y, c)
E3. Coupled edges: q(a, z) q(z, b) q(z, c)
C2. Combined indices: q(a, x) q(b, y) q(k, c), k = f(x, y)
C3. Interleaved indices: ≈ q(a, z1) q(b, z2) q(z1, c ⊕ c) q(z2, c ⊕ c)]

NoMM substructures: nodes, edges/branches, component indices/merging of edges:

Representation via q-functions and likelihood
Multiple samples per data point: q(a, x ⊕ y) for the respective level

“Library” incl. additional structures: alternative distributions, regression, aggregation etc. ↔ q-functions + other factors

Gregor Heinrich A generic approach to topic models 20 / 35

Implementation: Gibbs “meta-sampler”

[Figure: tool chain of the meta-sampler: a topic model specification (NoMM) is fed to the code generator, which uses code templates to generate a C/Java code prototype; the prototype is validated on data as a Java model instance on the Java VM, then optimised into a C/Java code module, compiled and deployed on the native platform]

Code generator for topic models in Java and C

Separation of knowledge domains: topic model applications vs. machine learning vs. computing architecture

Gregor Heinrich A generic approach to topic models 21 / 35

Example NoMM script and generated kernel: hPAM2

[Figure: NoMM diagram of hPAM2: m → ~ϑm | α → xmn = x; (m, x) → ~ϑxm,x | αx → ymn = y; (x, y) → ~ϕk | β → wmn, with component index
  x = 0:        k = 0
  x ≠ 0, y = 0: k = 1 + x
  x, y ≠ 0:     k = 1 + X + y]

model = HPAM2
description:
Hierarchical PAM model 2 (HPAM2)
sequences:
# variables sampled for each (m,n)
w, x, y : m, n
network:
# each line one NoMM node
m >> theta | alpha >> x
m,x >> thetax | alphax[x] >> y
x,y >> phi[k] >> w
# java code to assign k
k : {
if (x==0) { k = 0; }
else if (y==0) k = 1 + x;
else k = 1 + X + y;
}.

−→ generated sampling kernel:

// hidden edge
for (hx = 0; hx < X; hx++) {
    // hidden edge
    for (hy = 0; hy < Y; hy++) {
        mxsel = X * m + hx;
        mxjsel = hx;
        if (hx == 0)
            ksel = 0;
        else if (hy == 0)
            ksel = 1 + hx;
        else
            ksel = 1 + X + hy;
        pp[hx][hy] = (nmx[m][hx] + alpha[hx])
            * (nmxy[mxsel][hy] + alphax[mxjsel][hy])
            / (nmxysum[mxsel] + alphaxsum[mxjsel])
            * (nkw[ksel][w[m][n]] + beta)
            / (nkwsum[ksel] + betasum);
        psum += pp[hx][hy];
    } // for hy
} // for hx

[Figure: resulting topic hierarchy: a root node, super-topics (“sup”) and sub-topics (“sub”), each associated with its own word distribution]
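The generated kernel above only fills the unnormalised table pp[hx][hy] and its sum psum; the surrounding sampler then draws the new pair (x, y) from it. A hedged sketch of that draw step (illustrative only, not the generated code; it reuses the kernel’s variable names and assumes a java.util.Random instance rand):

// draw the pair (x, y) proportional to pp[hx][hy], total mass psum
double u = rand.nextDouble() * psum;
double acc = 0.0;
int xNew = 0, yNew = 0;
draw:
for (int hx = 0; hx < X; hx++) {
    for (int hy = 0; hy < Y; hy++) {
        acc += pp[hx][hy];
        if (acc >= u) { xNew = hx; yNew = hy; break draw; }
    }
}
// afterwards: update the count arrays nmx, nmxy, nkw (and their sums) for the new pair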

Gregor Heinrich A generic approach to topic models 22 / 35


Document–Topic distribution in the Gibbs sampler

[Figure: document–topic matrix ϑ (200 documents, 50 topics) over the course of Gibbs sampling, shown at iterations 1, 5, 10, 15, 20, 30, 40, 50, 60, 80, 100, 120, 150, 200, 300 and 500; at iteration 500 the chain has converged ↔ stationary state]

Gregor Heinrich A generic approach to topic models 23 / 35

Fast sampling: Hybrid scaling methods

[Figure: perplexity over iterations (1 to 1000, log scale) for PAM4 with dependent vs. independent samplers]

[Table: speedup of serial (ser.), parallel (par.) and independence-based (indep.) sampling methods and their combinations]
model   dim.       speedup
LDA     500        30.2
PAM4    40 × 40    7.4
PAM4    40 × 40    24.1
PAM4    40 × 40    49.8

Serial and parallel scaling methods:
Generalised results for LDA to generic NoMMs, specifically (Porteous et al. 2008; Newman et al. 2009) + a novel approach

Problem: sampling space for statistically dependent variables: K × L × . . .
→ Independence assumption: separate samplers with dimensions K + L + . . . ≪ K × L × . . . (for PAM4 with 40 × 40 topics: 80 instead of 1600 values per draw)
Empirical result: iterations ↑, but topic quality comparable

→ Hybrid approaches with independent samplers are highly effective
Implementation: complexity covered by the meta-sampler

Gregor Heinrich A generic approach to topic models 24 / 35


Overview

Introduction

Generic topic models

Inference methods

Application to virtual communities

Conclusions and outlook

Gregor Heinrich A generic approach to topic models 25 / 35


“How can generic models be applied to data in virtual communities?”

Gregor Heinrich A generic approach to topic models 26 / 35

NoMM design process

[Figure: library of NoMM substructures (see the typology slide) and a 10-step design process:
1. Define modelling task and metrics
2. Define evidence
3. Create model terminals
4. Formulate model assumptions
5. Compose model and predict properties
6. Write NoMM script
7. Generate and adapt Gibbs sampler
8. Implement target metric
9. Evaluate based on test corpus
10. Optimise and integrate for target platform]

Typology → “library” of NoMM substructures
→ Idea: construct models from simple substructures that connect terminal nodes:
Terminal nodes ↔ multimodal data (virtual communities . . . )
Substructures ↔ relationships in the data; latent semantics

→ Process:
Assumptions on dependencies in the data
Iterative association to structures in the model (using the typology)

Gibbs distribution known! ↔ model behaviour: q(x, y) = “rich get richer”
Implementation and test with the Gibbs meta-sampler; possibly iteration

Gregor Heinrich A generic approach to topic models 27 / 35

Application: Expert finding with tag annotations

[Figure: a document m with its text ~wm, its authors ~am and its tags ~cm, connected via authorship and annotation relations]

Scenario: expert finding via documents with tag annotations
Authors of relevant documents → experts
Documents frequently carry additional annotations, here: tags

→ Goal: enable tag queries, improve the quality of text queries
Problem: tags are often incomplete, partly wrong

→ Connection of tags and experts via topics

(1) Data: for each document m: text ~wm, authors ~am, tags ~cm
(2) Goal: tag query ~c′: p(~c′ | a) = max; word query ~w′: p(~w′ | a) = max
(3) Terminal nodes: authors in the input, words and tags in the output

Gregor Heinrich A generic approach to topic models 28 / 35

Page 85: A generic approach to topic models and its application to ... · A generic approach to topic models and its application to virtual communities Gregor Heinrich PhD presentation (English

Model assumptions

Figure: Example document m with words ~wm, authors ~am and tags ~cm.

(4) Model assumptions:
    (a) Expertise of an author is weighted with the portion of authorship
    (b) Semantics of expertise expressed by topics z; each author has a single field of expertise (topic distribution)
    (c) Semantics of tags expressed by topics y

Gregor Heinrich A generic approach to topic models 29 / 35


Model construction

Figure: Terminal nodes wm,n (words), ~am (authors) and ~cm (tags) on plate [M, Nm].

p(. . . | ~a, ~w, ~c) ∝ . . .

(5) Model construction: (a) Start with terminal nodes (from step 3)

Gregor Heinrich A generic approach to topic models 30 / 35


Model construction

Figure: Observed node ~am selects the author xm,n of word wm,n in document m; plate [M, Nm].

p(x, . . . | ·) ∝ am,x q(x, . . . ) . . .

(b) Authorship ~am given as observed distribution → node samples author x of a word

Gregor Heinrich A generic approach to topic models 30 / 35


Model construction

Figure: q(x, z) couples the sampled author xm,n with the word topic zm,n; plate [M, Nm].

p(x, z, . . . | ·) ∝ am,x q(x, z) . . .

(c) Each author has only a single field of expertise (topic distribution) → q(x, z) associates (word-)topics with sampled authors x (cf. ATM)

Gregor Heinrich A generic approach to topic models 30 / 35


Model construction

Figure: q(z, w) adds the topic–term distributions between zm,n and wm,n; plate [M, Nm].

p(x, z, . . . | ·) ∝ am,x q(x, z) q(z,w) . . .

(d) Topic distributions over terms → connect z and w via q(z,w)

Gregor Heinrich A generic approach to topic models 30 / 35


Model construction

Figure: Tag topics ym,j and tags cm,j added via q(y, c) on plate [M, Jm]; q(x, z ⊕ y) couples word topics z and tag topics y to the sampled author xm,n∪j.

p(x, z, y | ·) ∝ am,x q(x, z ⊕ y) q(z,w) q(y, c)

(e) Introduce tag topics ym,j for cm,j as distributions over tags;
    q(x, z ⊕ y) overlays values for z and y

Gregor Heinrich A generic approach to topic models 30 / 35
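As a purely illustrative sketch of how the lined-up q-functions translate into sampling weights for one word token (array names, index conventions and symmetric hyperparameters α, β are assumptions, not the meta-sampler's generated code):

// Unnormalised Gibbs weights p(x, z | ·) ∝ am,x · q(x, z ⊕ y) · q(z, w) for one
// word token with term index `term` in document m. The count arrays are assumed
// to be maintained by a collapsed Gibbs sampler and to already exclude the token
// being resampled (the ¬i convention).
static double[][] wordTokenWeights(int m, int term, int A, int K,
        double[][] a,      // a[m][x]    : observed authorship shares ~am
        int[][] nxz,       // nxz[x][z]  : author–topic counts (z and y overlaid)
        int[] nxzsum,      // nxzsum[x]  : row sums of nxz
        int[][] nzw,       // nzw[z][t]  : topic–term counts
        int[] nzwsum,      // nzwsum[z]  : row sums of nzw
        double alpha, double beta, int V) {
    double[][] p = new double[A][K];
    for (int x = 0; x < A; x++)
        for (int z = 0; z < K; z++)
            p[x][z] = a[m][x]
                    * (nxz[x][z] + alpha) / (nxzsum[x] + K * alpha)    // q(x, z ⊕ y)
                    * (nzw[z][term] + beta) / (nzwsum[z] + V * beta);  // q(z, w)
    return p;
}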


Model construction – ordinary approach

Figure: Bayesian network of the expert–tag–topic model: hidden variables xm,n, zm,n and xm,j, ym,j for observed wm,n and cm,j, authorship ~am, parameters ~ϑx, ~ϕk, ~ψk with hyperparameters α, β, γ; plates m ∈ [1,M], n ∈ [1,Nm], j ∈ [1,Jm], k ∈ [1,K], x ∈ [1,A].

Expert–tag–topic model

(Heinrich 2011)

Gregor Heinrich A generic approach to topic models 30 / 35


Expert–tag–topic model: Evaluation

Figure: AP@10 for word and tag queries (ATM vs. ETT, values ca. 0.2–1.0) and topic coherence scores (ca. −500 to −150).

NIPS corpus: 2.3 million words, 2037 authors, 165 tags
Retrieval: Average Precision @10:
    Term queries: ETT > ATM
    Tag queries: similarly good AP values
Topic coherence (Mimno et al. 2011): ETT > ATM
Semi-supervised learning: Tag queries retrieve items without tags

Gregor Heinrich A generic approach to topic models 31 / 35
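For reference, a minimal sketch of the Average Precision @10 measure reported above (a generic retrieval metric in one common convention, not code from the thesis):

// AP@10: mean of the precision values at the ranks of relevant items among the
// first 10 results, normalised by min(|relevant|, 10).
static double averagePrecisionAt10(int[] ranking, java.util.Set<Integer> relevant) {
    if (relevant.isEmpty()) return 0.0;
    double sum = 0.0;
    int hits = 0;
    int cutoff = Math.min(10, ranking.length);
    for (int r = 0; r < cutoff; r++) {
        if (relevant.contains(ranking[r])) {
            hits++;
            sum += (double) hits / (r + 1);   // precision at rank r + 1
        }
    }
    return sum / Math.min(relevant.size(), 10);
}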


Overview

Introduction

Generic topic models

Inference methods

Application to virtual communities

Conclusions and outlook

Gregor Heinrich A generic approach to topic models 32 / 35


Conclusions: Research contributions

Networks of Mixed Membership: Generic model and domain-specific compact representation of topic models
Inference algorithms: Generic Gibbs sampler
    Fast sampling methods (serial, parallel, independent)
    Implementation in Gibbs meta-sampler
    + Variational inference for NoMMs (Heinrich and Goesele 2009)
Design process based on typology of NoMM substructures
    + AMQ model: Meta-model for virtual communities as formal basis for scenario modelling (Heinrich 2010)
Application to virtual communities: Expert–tag–topic model for expert finding with annotated documents
    + Models ETT2 and ETT3 incl. novel NoMM structure; retrieval approaches (Heinrich 2011b)
    + Graph-based expert search using ETT models: Integration of explorative search/browsing and distributions from topic models
∑ Contribution to facilitated “model-based” construction of topic models, specifically for virtual communities and other multimodal scenarios

Gregor Heinrich A generic approach to topic models 33 / 35

Figure: ETT1: Expert search in community browser (screenshot: author and citation graph of NIPS papers on [blind source separation] and [independent component analysis])

Gregor Heinrich A generic approach to topic models 33 / 35


Outlook

New applications and NoMM structures, e.g., time as a variable
Alternative inference methods:
    Generic collapsed variational Bayes (Teh et al. 2007): structure similar to the collapsed Gibbs sampler
    Non-parametric methods: learning model dimensions using Dirichlet or Pitman–Yor process priors (Teh et al. 2004; Buntine and Hutter 2010), NoMM polymorphism (Heinrich 2011a)
Improved support in the design process:
    Data-driven design: search over model structures to obtain the best model for a data set
    Architecture-specific Gibbs meta-sampler, e.g., massively parallel or FPGA, cf. (Heinrich et al. 2011)
Integration with interactive user interfaces: models can be created on the fly, e.g., for visual analytics

Gregor Heinrich A generic approach to topic models 34 / 35


Thank you!

Q + A

Gregor Heinrich A generic approach to topic models 35 / 35


References I

References

Barnard, K., P. Duygulu, D. Forsyth, N. de Freitas, D. Blei, and M. Jordan (2003, August). Matching words and pictures. JMLR – Special Issue on Machine Learning Methods for Text and Images 3(6), 1107–1136.

Bellegarda, J. (2000, August). Exploiting latent semantic information in statistical language modeling. Proc. IEEE 88(8), 1279–1296.

Blei, D., A. Ng, and M. Jordan (2003, January). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022.

Buntine, W. and M. Hutter (2010). A Bayesian review of the Poisson-Dirichlet process. arXiv:1007.0296v1 [math.ST].

Chang, J., J. Boyd-Graber, S. Gerrish, C. Wang, and D. Blei (2009). Reading tea leaves: How humans interpret topic models. In Proc. Neural Information Processing Systems (NIPS).

Gregor Heinrich A generic approach to topic models 36 / 35


References II

Dietz, L., S. Bickel, and T. Scheffer (2007, June). Unsupervised prediction of citation influences. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, Oregon, USA.

Heinrich, G. (2009). A generic approach to topic models. In Proc. European Conf. on Mach. Learn. / Principles and Pract. of Know. Discov. in Databases (ECML/PKDD), Part 1, pp. 517–532.

Heinrich, G. (2010). Actors—media—qualities: a generic model for information retrieval in virtual communities. In Proc. 7th International Workshop on Innovative Internet Community Systems (I2CS 2007), part of I2CS Jubilee proceedings, Lecture Notes in Informatics, GI.

Heinrich, G. (2011a, March). “Infinite LDA” – Implementing the HDP with minimum code complexity. Technical note TN2011/1, arbylon.net.

Heinrich, G. (2011b). Typology of mixed-membership models: Towards a design method. In Proc. European Conf. on Mach. Learn. / Principles and Pract. of Know. Discov. in Databases (ECML/PKDD).

Gregor Heinrich A generic approach to topic models 37 / 35


References III

Heinrich, G. and M. Goesele (2009). Variational Bayes for generic topic models. In Proc. 32nd Annual German Conference on Artificial Intelligence (KI2009).

Heinrich, G., J. Kindermann, C. Lauth, G. Paaß, and J. Sanchez-Monzon (2005). Investigating word correlation at different scopes – a latent concept approach. In Workshop Lexical Ontology Learning at Int. Conf. Mach. Learning.

Heinrich, G., F. Logemann, V. Hahn, C. Jung, G. Figueiredo, and W. Luk (2011). HW/SW co-design for heterogeneous multi-core platforms: The hArtes toolchain, Chapter Audio array processing for telepresence, pp. 173–207. Springer.

Li, W., D. Blei, and A. McCallum (2007). Mixtures of hierarchical topics with pachinko allocation. In International Conference on Machine Learning.

Li, W. and A. McCallum (2006). Pachinko allocation: DAG-structured mixture models of topic correlations. In ICML ’06: Proceedings of the 23rd international conference on Machine learning, New York, NY, USA, pp. 577–584. ACM.

Gregor Heinrich A generic approach to topic models 38 / 35


References IV

Mimno, D., H. M. Wallach, E. Talley, M. Leenders, and A. McCallum (2011, July). Optimizing semantic coherence in topic models. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK, pp. 262–272.

Newman, D., A. Asuncion, P. Smyth, and M. Welling (2009, August). Distributed algorithms for topic models. JMLR 10, 1801–1828.

Porteous, I., D. Newman, A. Ihler, A. Asuncion, P. Smyth, and M. Welling (2008). Fast collapsed Gibbs sampling for latent Dirichlet allocation. In KDD ’08: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, New York, NY, USA, pp. 569–577. ACM.

Rosen-Zvi, M., T. Griffiths, M. Steyvers, and P. Smyth (2004). The author-topic model for authors and documents. In Proc. 20th Conference on Uncertainty in Artificial Intelligence (UAI).

Teh, Y., M. Jordan, M. Beal, and D. Blei (2004). Hierarchical Dirichlet processes. Technical Report 653, Department of Statistics, University of California at Berkeley.

Teh, Y. W., D. Newman, and M. Welling (2007). A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. In Advances in Neural Information Processing Systems, Volume 19.

Gregor Heinrich A generic approach to topic models 39 / 35


Appendix

Gregor Heinrich A generic approach to topic models 40 / 35


Example: Text mining for semantic clusters

Topic label and dominant terms according to ϕk,t = p(term | topic):

Bundesliga: FC SC Munchen Borussia SV VfL Kickers SpVgg Uhr Koln Bochum Freiburg VfB Eintracht Bayern Hamburger Bayern+Munchen
Polizei / Unfall (police / accident): Polizei verletzt schwer Auto Unfall Fahrer Angaben schwer+verletzt Menschen Wagen Verletzungen Lawine Mann vier Meter Straße
Tschetschenien (Chechnya): Rebellen russischen Grosny russische Tschetschenien Truppen Kaukasus Moskau Angaben Interfax tschetschenischen Agentur
Politik / Hessen (politics / Hesse): FDP Koch Hessen CDU Koalition Gerhardt Wagner Liberalen hessischen Westerwelle Wolfgang Roland+Koch Wolfgang+Gerhardt
Wetter (weather): Grad Temperaturen Regen Schnee Suden Norden Sonne Wetter Wolken Deutschland zwischen Nacht Wetterdienst Wind
Politik / Kroatien (politics / Croatia): Parlament Partei Stimmen Mehrheit Wahlen Wahl Opposition Kroatien Prasident Parlamentswahlen Mesic Abstimmung HDZ
Die Grunen (the Greens): Grunen Parteitag Atomausstieg Trittin Grune Partei Trennung Mandat Ausstieg Amt Roestel Jahren Muller Radcke Koalition
Russische Politik (Russian politics): Russland Putin Moskau russischen russische Jelzin Wladimir Tschetschenien Russlands Wladimir+Putin Kreml Boris Prasidenten
Polizei / Schulen (police / schools): Polizei Schulen Schuler Tater Polizisten Schule Tat Lehrer erschossen Beamten Mann Polizist Beamte verletzt Waffe

Bigram-LDA: Topics from 18400 dpa news messages, Jan. 2000 (Heinrich et al. 2005)

Gregor Heinrich A generic approach to topic models 41 / 35


Notation: Bayesian network vs. NoMM levels

Figure: Bayesian network notation (left) vs. NoMM level notation (right) for a node ~ϑk | α with edges ki, xi and dimensions [K], [T].

parameters ϑ + hyperparameters α ⇔ nodes (ϑ | α)
variables ki, xi ⇔ edges ki, xi
plates (i.i.d. repetitions) i, k ⇔ indexes i + dimensions k

Gregor Heinrich A generic approach to topic models 42 / 35


NoMM representation: Variable dependencies

Figure: Catalogue of variable dependency patterns ℓ = 1 … 8 for NoMM nodes ~ϑk | α: hidden edges hi, visible edges vi and component indices ki, shown in Bayesian network form and in NoMM level form.

Gregor Heinrich A generic approach to topic models 43 / 35


Collapsed Gibbs sampler

Figure: Gibbs sampling trajectory ~x(0) → ~x* in the (x1, x2) plane of p(~x | V) (illustration with continuous ~x, i ∈ [1, 2]; in NoMMs H is discrete, i ∈ [1, W]).

Collapsed Gibbs sampler: Stochastic EM / MCMC:
    NoMMs: parameters Θ correlate with H → marginalise
    For each data point i: draw latent variables Hi = (yi, zi, . . . ), given all other data, latent H¬i and observed V:

        Hi ∼ p(Hi | H¬i, V, A) .    (1)

    Stationary state: the full conditional distribution (1) simulates the posterior
    Faster absolute convergence for NoMMs than, e.g., variational inference (Heinrich and Goesele 2009)

Gregor Heinrich A generic approach to topic models 44 / 35
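A minimal sketch of one such sweep for the simplest NoMM instance, plain LDA (illustrative only; the array names and the restriction to a single hidden variable per token are assumptions, not the generic sampler):

// One collapsed Gibbs sweep: for every token, remove it from the counts,
// sample a new topic from the full conditional (1), and add it back.
void sweep(int[][] w, int[][] z, int[][] nmk, int[][] nkt, int[] nk,
           double alpha, double beta, int K, int V, java.util.Random rand) {
    double[] p = new double[K];
    for (int m = 0; m < w.length; m++) {
        for (int n = 0; n < w[m].length; n++) {
            int t = w[m][n];
            int k = z[m][n];
            nmk[m][k]--; nkt[k][t]--; nk[k]--;                // decrement counts
            double psum = 0.0;
            for (k = 0; k < K; k++) {
                // document-side normalisation is constant in k and omitted
                p[k] = (nmk[m][k] + alpha)
                     * (nkt[k][t] + beta) / (nk[k] + V * beta);
                psum += p[k];
            }
            double u = rand.nextDouble() * psum;              // inverse-CDF draw
            for (k = 0; k < K - 1; k++) {
                u -= p[k];
                if (u <= 0) break;
            }
            z[m][n] = k;
            nmk[m][k]++; nkt[k][t]++; nk[k]++;                // increment counts
        }
    }
}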


Generic topic models: Generative process

Figure: Generic NoMM level ℓ: hyperparameters ~αℓj (Jℓ), parameters ~ϑℓk (Kℓ components, matrix Θℓ), edges x∗i → xℓi (Iℓ tokens, edge groups Xℓ, hyperparameter set Aℓ).

Generative process on level ℓ:

xi ∼ Mult(xi | ~ϑk) ~ϑk ∼ Dir(~ϑk | ~αj) (2)

k = fk(parents(xi), i) j = fj(known parents(xi), i) . (3)

Gregor Heinrich A generic approach to topic models 45 / 35
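Purely as an illustration of (2)–(3) (not generated code; the Gamma-based Dirichlet draw and the helper names are assumptions), sampling one level's component and one data item could look like this:

// ~ϑk ∼ Dir(~αj) via normalised Gamma variates (Marsaglia–Tsang), then xi ∼ Mult(~ϑk).
static double sampleGamma(double shape, java.util.Random rnd) {
    if (shape < 1.0) {        // boost: Gamma(a) = Gamma(a + 1) · U^(1/a)
        return sampleGamma(shape + 1.0, rnd) * Math.pow(rnd.nextDouble(), 1.0 / shape);
    }
    double d = shape - 1.0 / 3.0, c = 1.0 / Math.sqrt(9.0 * d);
    while (true) {
        double x = rnd.nextGaussian(), v = 1.0 + c * x;
        if (v <= 0) continue;
        v = v * v * v;
        double u = rnd.nextDouble();
        if (u < 1.0 - 0.0331 * x * x * x * x) return d * v;
        if (Math.log(u) < 0.5 * x * x + d * (1.0 - v + Math.log(v))) return d * v;
    }
}

static double[] sampleDirichlet(double[] alpha, java.util.Random rnd) {
    double[] theta = new double[alpha.length];
    double sum = 0.0;
    for (int t = 0; t < alpha.length; t++) { theta[t] = sampleGamma(alpha[t], rnd); sum += theta[t]; }
    for (int t = 0; t < alpha.length; t++) theta[t] /= sum;
    return theta;
}

static int sampleMult(double[] theta, java.util.Random rnd) {
    double u = rnd.nextDouble(), cum = 0.0;
    for (int t = 0; t < theta.length; t++) { cum += theta[t]; if (u <= cum) return t; }
    return theta.length - 1;   // guard against rounding
}

The component index k and the hyperparameter index j would be chosen beforehand by the parent mappings fk and fj of (3).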


Generic topic models: Complete-data likelihood

Likelihood of all hidden and visible data X = {H,V} and parameters Θ:

p(X, Θ | A) = ∏ℓ∈L [ ∏i p(xi,out | ~ϑxi,in) · ∏k p(~ϑk | ~α) ][ℓ]
              (data items ∼ Discrete, components ∼ Dirichlet; outer product over levels ℓ)
            = ∏ℓ∈L [ ∏k f(~ϑk, ~nk, ~α) ][ℓ],    ~nk = (nk,1, nk,2, · · · )    (4)

Product depends on the co-occurrences nk,t between input and output values, xi,in = k and xi,out = t, on each level ℓ
There are variants to component selection xi,in = k
There are mixture node variants, e.g., observed components

Gregor Heinrich A generic approach to topic models 46 / 35


Generic topic models: Complete-data likelihood

The conjugacy between the multinomial and Dirichlet distributions of model levels leads to a simple complete-data likelihood:

p(X, Θ | A) = ∏ℓ [ ∏i Mult(xℓi | Θℓ, kℓi) ∏k Dir(~ϑℓk | ~αℓj) ]    (5)
            = ∏ℓ [ ∏i ϑki,xi ∏k 1/B(~αj) ∏t ϑk,t^(αj−1) ]ℓ    (6)
            = ∏ℓ [ ∏k 1/B(~αj) ∏t ϑk,t^(αj+nk,t−1) ]ℓ    (7)
            = ∏ℓ [ ∏k B(~nk + ~αj)/B(~αj) · Dir(~ϑk | ~nk + ~αj) ]ℓ    (8)

where brackets [·]ℓ enclose a particular level ℓ and nk,t is how often k and t co-occur.

Gregor Heinrich A generic approach to topic models 47 / 35


Inference: Generic full conditionals

Gibbs full conditionals are derived for groups of dependent hidden edges, Hd_i ∈ Hd ⊂ X, with “surrounding” edges Sd_i ∈ Sd considered observed. All tokens co-located with a particular observation: Xd_i = {Hd_i, Sd_i}.

Full conditional via chain rule applied to (8) with Θ integrated out:

p(Hd_i | X \ Hd_i, A) = p(Hd_i, Sd_i | X \ {Hd_i, Sd_i}, A) / p(Sd_i | X \ {Hd_i, Sd_i}, A)    (9)
                      ∝ p(Xd_i | X \ Xd_i, A) = p(X | A) / p(X \ Xd_i | A)    (10)
                      = ∏ℓ [ ∏k B(~nk + ~αj) / B(~nk \ Xd_i + ~αj) ]ℓ    (11)
                      ∝ ∏ℓ∈{Hd,Sd} [ B(~nk + ~αj) / B(~nk \ Xd_i + ~αj) ]ℓ    (12)

Figure: Grouping of co-located hidden edges H1_i … Hd_i across levels into dependent groups H1 … Hd.

Gregor Heinrich A generic approach to topic models 48 / 35


Inference: q-functions

q(k, t) = B(~nk + ~αj) / B(~nk \ xd_i + ~αj)

  |xd_i| = 1:   q(k, t) = (n^¬i_{k,t} + α) / (∑_t n^¬i_{k,t} + α)

  |xd_i| = 2:   q(k, t) = (n_{k,t} \ xd_{i,1} + α) / (∑_t n_{k,t} \ xd_{i,1} + α)
                        · (n_{k,t} \ xd_{i,2} + α + δ(xd_{i,1} − xd_{i,2})) / (∑_t n_{k,t} \ xd_{i,2} + α + 1)

  . . .

Gregor Heinrich A generic approach to topic models 49 / 35
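A minimal sketch of these q-functions as smoothed count ratios (illustrative only; a symmetric hyperparameter α and count arrays that already exclude the resampled token are assumed):

// q(k, t) for a single edge value: smoothed ratio of co-occurrence counts.
static double q(int[][] n, int[] nsum, int k, int t, double alpha, int T) {
    return (n[k][t] + alpha) / (nsum[k] + T * alpha);
}

// q(k, u ⊕ v): two dependent draws from the same component; the second draw sees
// the first one through the delta term, cf. the Polya urn view on the next slide.
static double qPair(int[][] n, int[] nsum, int k, int u, int v, double alpha, int T) {
    double first  = (n[k][u] + alpha) / (nsum[k] + T * alpha);
    double second = (n[k][v] + alpha + (u == v ? 1 : 0)) / (nsum[k] + T * alpha + 1);
    return first * second;
}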


q-functions: Polya urn and sampling weights

Figure: Polya urn: sampling with over-replacement and with discrete parameters; E{~ϑk} is the expected urn composition.

q(k, t) ≜ B(~nk + α) / B(~n^¬i_k + α)

  |t| = 1:      = (n^¬ti_{k,t} + α) / (n^¬ti_k + Tα)   = smoothed ratio of occurrences

  t = {u, v}:   = (n^¬ui_{k,u} + α) / (n^¬ui_k + Tα) · (n^¬vi_{k,v} + α + δ(u − v)) / (n^¬vi_k + Tα + 1)   ≜ q(k, u ⊕ v)

  . . .

Gregor Heinrich A generic approach to topic models 50 / 35


NoMM substructure library: Gibbs weights and likelihood

ID / name (variants); Gibbs sampler weight w and likelihood p for a single token i; modelled aspect and example models:

N1,E1,C1. Dir–Mult nodes, unbranched (C1A: k = i; C1B: k = zi; N1A: j ≡ 1; N1B: j = f(ai, i); E1S(zi): n ∈ i)
    w(z|·) = q(a, z) q(z, b); E1S(zi): q(a, z) q(z, b1 ⊕ b2 ⊕ . . . ⊕ bNi)
    p(b|a) = Σz ϑa,z ϑz,b
    Mixture/admixture: LDA [Blei et al. 2003b], PAM [Li & McCallum 2006]; LDCC [Shafiei & Milios 2006] (E1S)

N2. Observed parameters
    w(z|~ϑca, ·) = ϑca,z q(z, b)
    p(b|a) = Σz ϑca,z ϑz,b
    Label distribution: ATM [Rosen-Zvi et al. 2004]

N3. Non-Dirichlet prior (~ϑa ∼ p(~ϑa | λϑ))
    w(z|~ϑa, ·) = p(zi | ai, ~ϑa) q(z, b); M-step: estimate ~ϑa [Blei & Lafferty 2007]
    p(b|a) = Σz ϑa,z ϑz,b
    Alternative distributions on the simplex: CTM [Blei & Lafferty 2007]: ~ϑa ∝ exp ~η, ~η ∼ N(~µ, Σ); TLM [Wallach 2008]: hierarchy of Dirichlet priors

N4. Non-discrete output (θz ∼ p(θz | λθ))
    w(z|θ, ·) = q(a, z) p(vi | θz); M-step: estimate θz
    p(v|a) = Σz ϑa,z p(v | θz)
    Non-multinomial observations: Corr-LDA [Barnard et al. 2003], GMM [McLachlan & Peel 2000]: p(v|θ) = N(~x | ~µ, Σ)

N5+E4. Aggregation (~ζm ∝ Σj∈m δ(z − zj), regression)
    w(z|~zm, vm, ·) = q(a, z) q(z, w) N(vm | ~ηv⊤~ζm, σ²); M-step: estimate ~ηv, σ² | ~z, ~v (for linear regression, N5B); prediction: vm = ~ηv⊤~ζm
    Regression/supervised learning: Supervised LDA [Blei & McAuliffe 2007], Relational topic model [Chang & Blei 2009]

E2. Autonomous edges (E2A: i ≠ j; E2B: i = j)
    w(x, y|·) = q(a, x⊕y) q(x, b) q(y, c); E2A: q(ai∪j, xi ⊕ yj) = q(ai, xi ⊕ yj) q(aj, xi ⊕ yj)
    p(b, c|a) = Σx ϑa,x ϑx,b · Σy ϑa,y ϑy,c
    Common mixture of causes: Multimodal LDA [Ramage et al. 2009]

E3. Coupled edges
    w(z|·) = q(a, z) q(z, b) q(z, c)
    p(b, c|a) = Σz ϑa,z ϑz,b ϑz,c
    Common cause for observations: Hidden relational model (HRM) [Xu et al. 2006], Link-LDA [Erosheva et al. 2004]

C2. Combined indices (C2A: k = (i, xi); C2B: k = (xi, yj); C2C: k = g(i, j, xi, yj))
    w(x, y|·) = q(a, x) q(b, y) q(k, c)
    p(c|a, b) = Σx [ϑa,x Σy ϑb,y ϑk,c], k = g(x, y, i, j)
    Different dependent causes, relation: hPAM [Li et al. 2007a], HRM [Xu et al. 2006], Multi-LDA [Porteous et al. 2008a]

C3. Interleaved indices (C3A: i ≠ j; C3B: i = j)
    C3A: w(xi, yj|·) = q(ai, xi) q(bj, yj) q(xi, ci ⊕ cj) q(yj, ci ⊕ cj)
    C3B: w(x, y|·) = q(a, x) q(b, y) [q(x, c ⊕ c)]^δ(x−y) [q(x, c ⊕ c) q(y, c ⊕ c)]^(1−δ(x−y))
    C3A: p(c|a) = Σx ϑa,x ϑx,c, p(c|b) sim.; C3B: p(c|a, b) = Σx ϑa,x ϑx,c · Σy ϑb,y ϑy,c
    Different causes, same effect: proposed here

C4. Switch (s ?= 0 / s ?= 1)
    w(z, s|·) = q(a, z) [q(b, 1) q(z, c)]^δ(s−1) · [q(b, 2) q(z, d)]^δ(s−2)
    p(c, d|a, b) = Σz ϑa,z [ϑb,0 ϑz,c + ϑb,1 ϑz,d]
    Select complex submodels: Multi-grain LDA [Titov & McDonald 2008], Entity-topic models [Newman et al. 2006a]

C5. Node coupling (C5A: i ≠ j; C5B: i = j)
    C5A: w(xi, yj|·) = q(ai, xi) q(bj, yj) q(xi, ci ⊕ dj) q(yj, ci ⊕ dj)
    C5B: w(x, y|·) = q(a, x) q(b, y) [q(x, c ⊕ d)]^δ(x−y) · [q(x, c ⊕ d) q(y, c ⊕ d)]^(1−δ(x−y))
    p(c, d|a, b) = Σx ϑa,x ϑx,c · Σy ϑb,y ϑy,d
    Correlation of submodels, relations: Simple relational component model [Sinkkonen et al. 2008], Relational topic model [Chang & Blei 2009]

Figure 9.2: NoMM sub-structure properties. Notation (also see (9.3)): a⊕b adds counts ~n(a) + ~n(b); a ⊕ b prevents ¬i for a in (9.1); ci∪j combines sequences {ci, cj, ci,j}, as applicable.

Gregor Heinrich A generic approach to topic models 52 / 35


Gibbs meta-sampler: Java data structure

MixNet (collects MixSequence, MixNode, MixEdge)   // represents a NoMM
    ∼ sequences : List<MixSequence>               // sequences of the NoMM
    ∼ nodes : List<MixNode>                       // nodes of the NoMM
    ∼ edges : List<MixEdge>                       // edges of the NoMM
    ∼ constants : Map<String, String>             // constants for the NoMM

MixItem (interface: node or edge)
    ∼ parents : List<MixItem>                     // ≥2 edges: multiple inputs C2; ≥2 nodes: merged inputs C3
    ∼ children : List<MixItem>                    // ≥2 nodes: coupled branches E3; ≥2 edges: indep. branches E2
    ∼ datatype : enum                             // type of item: seq., topic, qfixed...
    ∼ linktype : enum                             // link type: C and E classifications
    ∼ name : String                               // global id (= unique variable name)

MixNode (implements MixItem)                      // NoMM node
    ∼ theta : Variable                            // parameters ϑkt
    ∼ ntheta, nthetasum : Variable                // counts nkt, ∑t nkt
    ∼ alpha : Variable                            // hyperparameter α

MixEdge (implements MixItem)                      // NoMM edge
    ∼ x : Variable                                // variable xmn
    ∼ T : Expression                              // range of x, T
    ∼ siblingsE2 : List<MixEdge>                  // E2 edge siblings, for ∆(·) expansion
    ∼ sparse : boolean                            // flag: parent node emits subset of range

MixSequence (collects sub-MixSequences)           // NoMM sequence
    ∼ superseq : MixSequence                      // supersequence, null for root
    ∼ m, n, s : Variable                          // sequence index variables m, n, s
    ∼ M, Mq, Nm, Nmq, W, Wq : Expression          // sequence index ranges: M, Nm, W
    ∼ qfixed : boolean                            // flag: fixed topics for query
    ∼ subseqs : List<MixSequence>                 // subsequences, null for leaf

Gregor Heinrich A generic approach to topic models 53 / 35


/** run the main Gibbs sampling kernel */
public void run(int niter) {

    // iteration loop
    for (int iter = 0; iter < niter; iter++) {

        // major loop, sequence [m][n]
        for (int m = 0; m < M; m++) {
            // component selectors
            int mxsel = -1;
            int mxjsel = -1;
            int ksel = -1;

            // minor loop, sequence [m][n]
            for (int n = 0; n < w[m].length; n++) {
                double psum;
                double u;
                // decrement counts
                nmx[m][x[m][n]]--;
                mxsel = X * m + x[m][n];
                nmxy[mxsel][y[m][n]]--;
                nmxysum[mxsel]--;
                if (x[m][n] == 0)
                    ksel = 0;
                else if (y[m][n] == 0)
                    ksel = 1 + x[m][n];
                else
                    ksel = 1 + X + y[m][n];
                nkw[ksel][w[m][n]]--;
                nkwsum[ksel]--;

                // compute weights
                /* p(x_{m,n} \eq x, y_{m,n} \eq y ... (LaTeX omitted) */
                psum = 0;
                int hx = -1;
                int hy = -1;
                // hidden edge
                for (hx = 0; hx < X; hx++) {
                    // hidden edge
                    for (hy = 0; hy < Y; hy++) {
                        mxsel = X * m + hx;
                        mxjsel = hx;
                        if (hx == 0)
                            ksel = 0;
                        else if (hy == 0)
                            ksel = 1 + hx;
                        else
                            ksel = 1 + X + hy;
                        pp[hx][hy] = (nmx[m][hx] + alpha[hx])
                            * (nmxy[mxsel][hy] + alphax[mxjsel][hy])
                            / (nmxysum[mxsel] + alphaxsum[mxjsel])
                            * (nkw[ksel][w[m][n]] + beta)
                            / (nkwsum[ksel] + betasum);
                        psum += pp[hx][hy];
                    } // for h
                } // for h

                // sample topics
                u = rand.nextDouble() * psum;
                psum = 0;
                SAMPLED:
                // each edge value
                for (hx = 0; hx < X; hx++) {
                    // each edge value
                    for (hy = 0; hy < Y; hy++) {
                        psum += pp[hx][hy];
                        if (u <= psum)
                            break SAMPLED;
                    } // h
                } // h

                // assign topics
                x[m][n] = hx;
                y[m][n] = hy;

                // increment counts
                nmx[m][x[m][n]]++;
                mxsel = X * m + x[m][n];
                nmxy[mxsel][y[m][n]]++;
                nmxysum[mxsel]++;
                if (x[m][n] == 0)
                    ksel = 0;
                else if (y[m][n] == 0)
                    ksel = 1 + x[m][n];
                else
                    ksel = 1 + X + y[m][n];
                nkw[ksel][w[m][n]]++;
                nkwsum[ksel]++;
            } // for n
        } // for m

        // estimate hyperparameters
        estAlpha();

    } // for iter
} // run()

Figure: Generated Gibbs kernel for hPAM2 model in Fig. ??.

Gregor Heinrich A generic approach to topic models 54 / 35


Fast serial sampling: Using a normalisation bound

[Diagram: sorted sampling weights s_{k,l}. After computing the first elements, the normalisation constant is split into a known part Z_known and a bound on the unknown remainder Z_unknown, giving successively tighter bounds Z_{i,0} ≥ … ≥ Z_{i,4}; a single draw u ~ U[0,1] is compared against u·Z_{i,k}. Most probability sits in the main mass s_{11}; the adjustment mass s_{01} accounts for the bound.]

Idea: Exploit the saliency of a few elements → compute only the largest (= most likely) weights

Approximate normalisation via vector norms (Porteous et al. 2008)

Generalisation to multiple dependent variables: more expensive higher-order vector norms ↔ higher sparsity of the sampling space
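A minimal sketch of the bound-based stopping rule behind this idea — a conservative simplification of the Porteous et al. (2008) check, not their exact FastLDA criterion; the weights array, the boundSuffix interface and all names are assumptions:

    import java.util.Random;

    // Sampling with a normalisation bound (sketch). Assumes the weights would be computed
    // lazily in decreasing-saliency order, and boundSuffix[j] is a valid upper bound on
    // weights[j] + ... + weights[K-1], with boundSuffix[K] == 0 (e.g. from vector norms).
    final class BoundedSampler {
        private final Random rand = new Random();

        int sample(double[] weights, double[] boundSuffix) {
            int K = weights.length;
            double u = rand.nextDouble();
            double[] cum = new double[K + 1];                    // cum[j] = S_j, sum of first j weights
            for (int i = 0; i < K; i++) {
                cum[i + 1] = cum[i] + weights[i];                // expensive weight computed here
                double lo = u * cum[i + 1];                      // u * S_{i+1}  <=  u * Z
                double hi = u * (cum[i + 1] + boundSuffix[i + 1]); // u * Z_i     >=  u * Z
                for (int j = 1; j <= i + 1; j++) {
                    if (hi <= cum[j]) {                          // upper end falls in segment j
                        if (lo > cum[j - 1] || i == K - 1)       // lower end in same segment,
                            return j - 1;                        // or all mass known: decided
                        break;                                   // otherwise compute more weights
                    }
                }
            }
            return K - 1;                                        // numerical fallback
        }
    }

In a generated kernel, weights[k] would be the per-topic factor of the full conditional, computed lazily in decreasing-saliency order, and boundSuffix would be derived from precomputed vector norms of the count arrays.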

Gregor Heinrich A generic approach to topic models 55 / 35


Fast parallel sampling: Synchronisation methods

[Diagram: P processors P1, …, PP sample in parallel; each holds document-specific parameters Θ_p,1, …, Θ_p,M, while the global parameters Θ* — pmfs over the vocabulary (ϕ_y | β) and over (sub-)topics (ϑ_mx | α, ϱ_m | α) — are synchronised across processors in a sync step.]

Multi-processor parallelisation using shared memory (OpenMP)

Main challenge: synchronisation and communication of global data

Synchronisation methods (LDA + generic NoMMs):
a. Naive synchronisation locks
b. Query read-only ϕ + MAP update step for ϕ ("split-state")
c. Local copies of ϕ + reduction step (= AD-LDA (Newman et al. 2009)); see the sketch below
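A rough sketch of variant (c), using plain Java threads instead of OpenMP; sampleDocs() stands in for the generated Gibbs kernel on one document block, and none of the names below are the generator's actual API:

    import java.util.Arrays;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Sketch of local copies + reduction (AD-LDA style synchronisation).
    final class LocalCopyReduction {
        int[][] nkw;    // global topic-word counts (K x V)

        void iteration(int P, int[][][] docBlocks) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(P);
            int[][] snapshot = copy(nkw);                  // global counts at start of iteration
            int[][][] local = new int[P][][];
            Future<?>[] done = new Future<?>[P];
            for (int p = 0; p < P; p++) {
                final int id = p;
                local[id] = copy(snapshot);                // processor-local copy of phi counts
                done[id] = pool.submit(() -> sampleDocs(docBlocks[id], local[id]));
            }
            for (Future<?> f : done) f.get();              // barrier: wait for all processors
            for (int p = 0; p < P; p++)                    // reduction: nkw = snapshot + sum_p (local_p - snapshot)
                for (int k = 0; k < nkw.length; k++)
                    for (int t = 0; t < nkw[k].length; t++)
                        nkw[k][t] += local[p][k][t] - snapshot[k][t];
            pool.shutdown();
        }

        private void sampleDocs(int[][] docs, int[][] localNkw) {
            // the generated Gibbs kernel for one block of documents would run here,
            // reading and writing only localNkw and the documents' own count arrays
        }

        private static int[][] copy(int[][] a) {
            int[][] c = new int[a.length][];
            for (int i = 0; i < a.length; i++) c[i] = Arrays.copyOf(a[i], a[i].length);
            return c;
        }
    }

Variant (b) would instead let all processors read ϕ from the shared counts and refresh it with a MAP-style update only at synchronisation points.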

Gregor Heinrich A generic approach to topic models 56 / 35


Fast sampling: Serial × parallel

[Plot: speed-up (up to roughly 30×) over the number of topics K ∈ {10, 20, 50, 100, 200, 500} for serial and parallel fast-sampling variants and varying processor count P; LDA on the NIPS corpus.]

Figure: Speed-up for fast sampling methods: LDA.

Gregor Heinrich A generic approach to topic models 57 / 35


Fast sampling: Serial × parallel × independent

[Plots: speed-up over topic configurations (K, L) ∈ {(10,10), (20,20), (20,40), (40,40), (20,100)}. (a) Parallel, independent: speed-up measured against the baseline sampler (a) and against the independence sampler (i). (b) Parallel, serial, independent: combined variants (ipa, ipas, ipcs) against the independence sampler, with an inset showing speed-up vs. the baseline for (20,100) on a scale up to 150.]

Figure: Speed-up for combined fast samplers: PAM4 (2 dependent variables).

Gregor Heinrich A generic approach to topic models 58 / 35


Fast sampling: The impact of assumed independence

[Plot: perplexity (≈1500–3000) over iterations 10^0–10^3 for (K, L) ∈ {(10,10), (20,20), (20,40), (20,100)}; curves compare the convergence of the dependent sampler (pa) with the independence-assuming sampler (ipa).]

Figure: Perplexity over iterations. Example model: PAM4.

Gregor Heinrich A generic approach to topic models 59 / 35


ETT1 model: Derivation using NoMM structure

[NoMM diagram: author vector a_m selects x_{m,n∪j} ∈ [A_m]; the shared author–topic level q(x, z⊕y), i.e. ϑ_x | α over [K], feeds a word branch q(z, w), i.e. ϕ_z | β over [V] for sequences {M, N_m}, and a tag branch q(y, c), i.e. ψ_y | γ over [C] for sequences {M, J_m}.]

Lining up q-functions:

p(x, z, y \mid \cdot) \;\propto\; a_{m,x}\; q(x, z \oplus y)\; q(z, w)\; q(y, c) \qquad (13)

Transforming to standard Gibbs full conditionals:

p(x_{m,n}{=}x, z_{m,n}{=}z \mid \cdot) \;\propto\; a_{m,x} \cdot \frac{n^{\neg\{x,z\}_{m,n}}_{x,z} + \alpha}{n^{\neg\{x,z\}_{m,n}}_{x} + K\alpha} \cdot \frac{n^{\neg z_{m,n}}_{z,w} + \beta}{n^{\neg z_{m,n}}_{z} + V\beta} \qquad (14)

p(x_{m,j}{=}x, y_{m,j}{=}y \mid \cdot) \;\propto\; a_{m,x} \cdot \frac{n^{\neg\{x,y\}_{m,j}}_{x,y} + \alpha}{n^{\neg\{x,y\}_{m,j}}_{x} + K\alpha} \cdot \frac{n^{\neg y_{m,j}}_{y,c} + \gamma}{n^{\neg y_{m,j}}_{y} + C\gamma} \qquad (15)

Retrieval via the query-likelihood model:

p(\vec{w} \mid a) = \prod_{w \in \vec{w}} \sum_{z} \vartheta_{a,z}\, \varphi_{z,w}\,, \qquad p(\vec{c} \mid a) = \prod_{c \in \vec{c}} \sum_{y} \vartheta_{a,y}\, \psi_{y,c}\,. \qquad (16)
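As an illustration of how (16) is used for ranking, a small sketch with hypothetical names; the log-domain accumulation is added here only for numerical stability:

    // Ranking score from Eq. (16): query likelihood of author a under the estimated
    // author-topic (theta) and topic-word (phi) pmfs. Names are illustrative.
    final class QueryLikelihood {
        static double logScore(int[] queryTerms, int a, double[][] theta, double[][] phi) {
            double logLik = 0.0;
            for (int w : queryTerms) {
                double pw = 0.0;
                for (int z = 0; z < theta[a].length; z++)
                    pw += theta[a][z] * phi[z][w];      // sum_z  theta_{a,z} * phi_{z,w}
                logLik += Math.log(pw);                 // product over query terms -> sum of logs
            }
            return logLik;                              // rank authors by this score; tag branch analogous with psi
        }
    }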

Gregor Heinrich A generic approach to topic models 60 / 35



ETT1 model: Derivation using ordinary method (excerpt)

[Figure (a): Expert–tag–topic model (ETT) (Heinrich 2011b), Bayesian network: author labels \vec{a}_m select x_{m,n} and x_{m,j}; \vec{\vartheta}_x | \alpha generates the word topics z_{m,n} and tag topics y_{m,j}; \vec{\varphi}_k | \beta emits words w_{m,n} and \vec{\psi}_k | \gamma emits tags c_{m,j}; plates m ∈ [1,M], n ∈ [1,N_m], j ∈ [1,J_m], k ∈ [1,K], x ∈ [1,A].]

Appendix E

Details on application models

This appendix sketches the traditional derivation of model inference and likelihood equations in Chapter 10. Comparing this to the NoMM-based derivations developed in the thesis illustrates the usefulness of that method to avoid tedious calculations.

E.1 Bayesian networks of application models

For comparison with the NoMM representations in Chapter 10, the Bayesian networks of the three variants of the expert–tag–topic models are presented in Figs. E.1 and E.2. The dashed plate in Fig. E.2(b) refers to the duplicate draw of the C3B structure explained in Sec. 9.3.

E.2 Example derivation: Expert–tag–topic model 1

The Bayesian network of the ETT1 model is shown in Fig. E.1. The details of the derivation strategy have been explained for instance in [Heinrich 2009b]; it is similar to the strategies used in the literature.¹ We start with the complete-data likelihood of the corpus:

p(\vec{w},\vec{c},\vec{a},\vec{x},\vec{z},\Theta,\Phi,\Psi \mid \alpha,\beta,\gamma)
  = p(\vec{w} \mid \vec{z},\Phi)\, p(\Phi \mid \beta) \cdot p(\vec{c} \mid \vec{y},\Psi)\, p(\Psi \mid \gamma)
    \cdot p(\vec{y} \mid \vec{x},\Theta)\, p(\vec{z} \mid \vec{x},\Theta)\, p(\Theta \mid \alpha) \cdot p(\vec{x} \mid \vec{a}) \qquad (E.1)

  = \prod_{m=1}^{M} \Bigg[ \prod_{n=1}^{N_m} p(w_{m,n} \mid \vec{\varphi}_{z_{m,n}})\, p(z_{m,n} \mid \vec{\vartheta}_{x_{m,n}})\, a_{m,x_{m,n}}
    \cdot \prod_{j=1}^{J_m} p(c_{m,j} \mid \vec{\psi}_{y_{m,j}})\, p(y_{m,j} \mid \vec{\vartheta}_{x_{m,j}})\, a_{m,x_{m,j}} \Bigg]
    \cdot p(\Theta \mid \alpha) \cdot p(\Phi \mid \beta) \cdot p(\Psi \mid \gamma) \,. \qquad (E.2)

¹ Alternative derivation strategies for topic model Gibbs samplers have been published in [Griffiths 2002], working via p(z_i \mid \vec{z}_{\neg i}, \vec{w}) \propto p(w_i \mid \vec{w}_{\neg i}, \vec{z})\, p(z_i \mid \vec{z}_{\neg i}), and in [McCallum et al. 2007], who use the chain rule via the joint token likelihood, p(z_i \mid \vec{z}_{\neg i}, \vec{w}_{\neg i}) = p(z_i, w_i \mid \vec{z}_{\neg i}, \vec{w}_{\neg i}) / p(w_i \mid \vec{z}_{\neg i}, \vec{w}_{\neg i}) \propto p(\vec{z}, \vec{w}) / p(\vec{z}_{\neg i}, \vec{w}_{\neg i}), which is similar to the approach taken here.


[Figure E.1: ETT1 model: Bayesian network (same structure as the network sketched above).]

Next, we integrate out the model parameters, introducing our knowledge on the types of distributions and their conjugacy:

p(\vec{w},\vec{c},\vec{a},\vec{x},\vec{z} \mid \alpha,\beta,\gamma)
  = \int \prod_{m=1}^{M} \Bigg[ \prod_{n=1}^{N_m} p(w_{m,n} \mid \vec{\varphi}_{z_{m,n}})\, p(z_{m,n} \mid \vec{\vartheta}_{x_{m,n}})\, a_{m,x_{m,n}}
      \cdot \prod_{j=1}^{J_m} p(c_{m,j} \mid \vec{\psi}_{y_{m,j}})\, p(y_{m,j} \mid \vec{\vartheta}_{x_{m,j}})\, a_{m,x_{m,j}} \Bigg]
      \, dp(\Theta \mid \alpha)\, dp(\Phi \mid \beta)\, dp(\Psi \mid \gamma) \qquad (E.3)

  = \int \prod_{m=1}^{M} \prod_{n=1}^{N_m} p(w_{m,n} \mid \vec{\varphi}_{z_{m,n}}) \prod_{k=1}^{K} p(\vec{\varphi}_k \mid \beta)\, d\vec{\varphi}_k
    \cdot \int \prod_{m=1}^{M} \prod_{j=1}^{J_m} p(c_{m,j} \mid \vec{\psi}_{y_{m,j}}) \prod_{k=1}^{K} p(\vec{\psi}_k \mid \gamma)\, d\vec{\psi}_k
    \cdot \int p(\Theta \mid \alpha) \prod_{m=1}^{M} \Bigg[ \prod_{n=1}^{N_m} p(z_{m,n} \mid \vec{\vartheta}_{x_{m,n}})\, a_{m,x_{m,n}}
      \prod_{j=1}^{J_m} p(y_{m,j} \mid \vec{\vartheta}_{x_{m,j}})\, a_{m,x_{m,j}} \Bigg]\, d\Theta \qquad (E.4)

  = \int \prod_{k=1}^{K} \frac{1}{\Delta_V(\beta)} \prod_{t=1}^{V} \varphi_{k,t}^{\,n_{k,t}+\beta-1}\, d\vec{\varphi}_k
    \cdot \int \prod_{k=1}^{K} \frac{1}{\Delta_C(\gamma)} \prod_{c=1}^{C} \psi_{k,c}^{\,n_{k,c}+\gamma-1}\, d\vec{\psi}_k
    \cdot \int \prod_{a=1}^{A} \frac{1}{\Delta_K(\alpha)} \prod_{k=1}^{K} \vartheta_{a,k}^{\,n^{(z)}_{a,k}+n^{(y)}_{a,k}+\alpha-1}\, d\vec{\vartheta}_a
    \cdot \prod_{m=1}^{M} \prod_{a=1}^{A} a_{m,a}^{\,n^{(z)}_{m,a}+n^{(y)}_{m,a}} \qquad (E.5)

  = \prod_{k=1}^{K} \frac{\Delta(\vec{n}^{(z)}_k+\beta)}{\Delta_V(\beta)} \cdot \frac{\Delta(\vec{n}^{(y)}_k+\gamma)}{\Delta_C(\gamma)}
    \cdot \prod_{a=1}^{A} \frac{\Delta(\vec{n}^{(z)}_a+\vec{n}^{(y)}_a+\alpha)}{\Delta_K(\alpha)}
    \cdot \prod_{m=1}^{M} \prod_{a=1}^{A} a_{m,a}^{\,n^{(z)}_{m,a}+n^{(y)}_{m,a}} \,. \qquad (E.6)


[Figure E.2: Iterated ETT models: Bayesian networks. (a) ETT2: variables w_{m,n}, z_{m,n}, c_{m,n}, x_{m,n} with the document tag set \vec{c}_m; parameters \vec{\vartheta}_x | \alpha, \vec{\varphi}_k | \beta, \vec{\psi}_k | \gamma; plates m ∈ [1,M], n ∈ [1,N_m], k ∈ [1,K], x ∈ [1,A]. (b) ETT3: variables w_{m,n}, c_{m,n}, z^{1,2}_{m,n}, x_{m,n} with \vec{c}_m; parameters \vec{\vartheta}_x | \alpha, \vec{\varphi}_k | \beta, \vec{\zeta}_c | \gamma with c ∈ [1,C].]

Note the change in indexing from tokens, w_{m,n}, x_{m,n}, etc., to count statistics, n_{x,k}, n_{k,c}, etc. The superscripts in counts distinguish the branches of the model: w = words, c = tags. To solve the integrals in (E.5), either the Dirichlet integral of the first type can be used [Abramowitz & Stegun 1964], or one can observe that the Dirichlet distributions just re-parametrise (due to conjugacy with the multinomial, cf. Appendix B). Because the actual distributions integrate to one and vanish, solely their new normalisation must be determined.
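For completeness, the standard Dirichlet integral behind this step, written in the Δ notation used above (integrals taken over the probability simplex):

\Delta(\vec{\alpha}) = \frac{\prod_k \Gamma(\alpha_k)}{\Gamma\!\left(\sum_k \alpha_k\right)}\,, \qquad
\int \prod_k \vartheta_k^{\,n_k + \alpha_k - 1}\, d\vec{\vartheta} \;=\; \Delta(\vec{n} + \vec{\alpha})\,,

so each integral in (E.5) contributes a ratio of the form \Delta(\vec{n} + \vec{\alpha}) / \Delta(\vec{\alpha}), which is exactly the shape of the factors appearing in (E.6).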

The Gibbs full conditional can be determined from (E.6) by applying the chain rule. Because the Gibbs sampler will scan through words and tags in an alternating fashion, the two variables z_{m,n} and y_{m,j} are sampled independently. However, the author association at the root of the model must be sampled jointly for both. Using i = (m,n), n_{x,k} = n^{(z)}_{x,k} + n^{(y)}_{x,k} and the sum notation n^{(z)}_k = \sum_{t=1}^{V} n_{k,t}, etc., the full conditional for word tokens becomes:

p(z_i{=}k, x_i{=}x \mid w_i{=}t, \vec{z}_{\neg i}, \vec{y}, \vec{x}_{\neg i}, \vec{w}_{\neg i}, \vec{a}, \vec{c})
  = \frac{p(\vec{w},\vec{z},\vec{y},\vec{x})}{p(\vec{w},\vec{z}_{\neg i},\vec{y},\vec{x}_{\neg i})}
  = \frac{p(\vec{w} \mid \vec{z},\vec{y})}{p(\vec{w}_{\neg i} \mid \vec{z}_{\neg i},\vec{y})\, p(w_i)}
    \cdot \frac{p(\vec{z} \mid \vec{x})}{p(\vec{z}_{\neg i} \mid \vec{x}_{\neg i})}
    \cdot \frac{p(\vec{x})}{p(\vec{x}_{\neg i})} \qquad (E.7)

  \propto \frac{\Delta(\vec{n}^{(z)}_k+\beta)}{\Delta(\vec{n}^{(z)}_{k,\neg i}+\beta)}
    \cdot \frac{\Delta(\vec{n}_x+\alpha)}{\Delta(\vec{n}_{x,\neg i}+\alpha)} \cdot a_{m,x} \qquad (E.8)

  = \frac{\Gamma(n_{k,t}+\beta)\, \Gamma(n_{k,\neg i}+V\beta)}{\Gamma(n_{k,t,\neg i}+\beta)\, \Gamma(n_k+V\beta)}
    \cdot \frac{\Gamma(n^{(z)}_{x,k}+\alpha)\, \Gamma(n^{(z)}_{x,\neg i}+K\alpha)}{\Gamma(n^{(z)}_{x,k,\neg i}+\alpha)\, \Gamma(n^{(z)}_x+K\alpha)}
    \cdot a_{m,x} \qquad (E.9)

  = \frac{n_{k,t,\neg i}+\beta}{n_{k,\neg i}+V\beta}
    \cdot \frac{n^{(z)}_{x,k,\neg i}+\alpha}{n^{(z)}_{x,\neg i}+K\alpha}
    \cdot a_{m,x} \qquad (E.10)

  = q(k,t)\; q(x,k)\; a_{m,x} \,. \qquad (E.11)


For the tag branch, the derivation is analogous, now re-defining i = (m, j):

p(y_i{=}k, x_i{=}x \mid c_i{=}c, \vec{z}, \vec{y}_{\neg i}, \vec{x}_{\neg i}, \vec{w}, \vec{a}, \vec{c}_{\neg i})
  \propto \frac{n_{k,c,\neg i}+\gamma}{n_{k,\neg i}+C\gamma}
    \cdot \frac{n^{(y)}_{x,k,\neg i}+\alpha}{n^{(y)}_{x,\neg i}+K\alpha}
    \cdot a_{m,x} \qquad (E.12)

  = q(k,c)\; q(x,k)\; a_{m,x} \,. \qquad (E.13)

The difference of (E.11) and (E.13) to (10.3) is a result of the definition of n_{x,k} as a summed count and the fact that both branches are disjointly sampled.

Gregor Heinrich A generic approach to topic models 61 / 35


ETT1 evaluation: Truncated Average Precision

[Figure: two example rankings of five retrieved documents (ranks 1.–5.), with relevant documents at ranks 2, 4, 5 in the first ranking and at ranks 1, 2, 5 in the second.]

AP@5 = (1/2 + 2/4 + 3/5) / 3 = 0.533

AP@5 = (1/1 + 2/2 + 3/5) / 3 = 0.867

Figure: Average Precision at 5 (assuming 3 relevant documents in corpus)
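A small sketch of the measure used in the figure, truncated average precision AP@k; the argument names are illustrative and not taken from the thesis code:

    // Average Precision at k, normalised by the number of relevant documents in the corpus.
    final class TruncatedAP {
        static double apAtK(boolean[] relevantAtRank, int k, int relevantInCorpus) {
            double sum = 0.0;
            int hits = 0;
            for (int r = 1; r <= k && r <= relevantAtRank.length; r++) {
                if (relevantAtRank[r - 1]) {
                    hits++;
                    sum += (double) hits / r;           // precision at rank r
                }
            }
            return sum / relevantInCorpus;              // normalise by all relevant documents
        }
    }

For the two rankings above, apAtK(new boolean[]{false, true, false, true, true}, 5, 3) ≈ 0.533 and apAtK(new boolean[]{true, true, false, false, true}, 5, 3) ≈ 0.867.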

Gregor Heinrich A generic approach to topic models 62 / 35


ETT1 results: Term Retrieval

query: svm support vector machine ∣ kernel classifier hyperplane regression

1. Scholkopf B, lik = −76.272, tokens = 2830, docs = 10: judged relevant
From Regularization Operators to Support Vector Kernels (9); Improving the Accuracy and Speed of Support Vector Machines (9); Shrinking the Tube: A New Support Vector Regression Algorithm (11) . . .

2. Smola A, lik = −77.509, tokens = 2760, docs = 11: judged relevant
Support Vector Regression Machines (9); Prior Knowledge in Support Vector Kernels (10); Support Vector Method for Novelty Detection (12); The Entropy Regularization Information Criterion (12, support vector machines, regularization) . . .

3. Vapnik V, lik = −77.525, tokens = 2332, docs = 10: judged relevant
Support Vector Regression Machines (9); Prior Knowledge in Support Vector Kernels (10); Prior Knowledge in Support Vector Kernels (10); Support Vector Method for Multivariate Density Estimation (12); . . .

4. Crisp D, lik = −81.401, tokens = 699, docs = 2: judged relevant
A Geometric Interpretation of ν-SVM Classifiers (12); Uniqueness of the SVM Solution (12)

5. Burges C, lik = −81.630, tokens = 1309, docs = 5: judged relevant
Improving the Accuracy and Speed of Support Vector Machines (9); A Geometric Interpretation of ν-SVM Classifiers (12); Uniqueness of the SVM Solution (12) . . .

6. Laskov P, lik = −84.275, tokens = 738, docs = 1: judged relevant
An Improved Decomposition Algorithm for Regression Support Vector Machines (12)

7. Steinhage V, lik = −84.600, tokens = 438, docs = 1: judged irrelevant
Nonlinear Discriminant Analysis Using Kernel Functions (12)

8. Bennett K, lik = −86.754, tokens = 384, docs = 1: judged relevant
Semi-Supervised Support Vector Machines (11)

9. Herbrich R, lik = −86.754, tokens = 462, docs = 2: judged irrelevant
Classification on Pairwise Proximity Data (11); Bayesian Transduction (12, classification)

10. Chapelle O, lik = −87.431, tokens = 494, docs = 2: judged relevant
Model Selection for Support Vector Machines (12); Transductive Inference for Estimating Values of Functions (12, regression, classification)


Figure: Example term retrieval results: ETT1/J20. Authors shown with selected articles. Italicised terms are deemed relevant. In parentheses: NIPS volume numbers and relevant terms.

Gregor Heinrich A generic approach to topic models 63 / 35


ETT1 results: Tag retrieval

query: face recognition

1. Movellan J, lik = −4.680, tokens = 3153, docs = 8: judged relevant
Dyn. Features for Visual Speechreading: A System Comparison (9, no tags); Image Representation for Facial Expression Coding (12, tags: face recognition, image, ICA); Visual Speech Recognition with Stochastic Networks (7, tags: HMM, speech recognition) . . .

2. Bartlett M, lik = −4.951, tokens = 812, docs = 3: judged relevant
Viewpoint Invariant Face Recognition using ICA and Attractor Networks (9, tags: face recognition, invariances, pattern recognition); Image Representation for Facial Expression Coding (12, tags: face recognition, image, ICA) . . .

3. Dailey M, lik = −4.952, tokens = 903, docs = 2: judged relevant
Task and Spatial Frequency Effects on Face Specialization (10, tags: face recognition); Facial Memory Is Kernel Density Estimation (Almost) (11, no tags)

4. Padgett C, lik = −4.974, tokens = 499, docs = 1: judged relevant
Representing Face Images for Emotion Classification (9, tags: classification, face recognition, image)

5. Hager J, lik = −5.023, tokens = 377, docs = 2: judged relevant
Classifying Facial Action (8, tags: classification); Image Representation for Facial Expression Coding (12, tags: face recognition, image, ICA)

6. Ekman P, lik = −5.027, tokens = 374, docs = 2: judged relevant
Image Representation for Facial Expression Coding (12, tags: face recognition, image, ICA); Classifying Facial Action (8, tags: classification)

7. Phillips P, lik = −5.127, tokens = 795, docs = 1: judged relevant
Support Vector Machines Applied to Face Recognition (11, tags: face recognition, SVM)

8. Gray M, lik = −5.159, tokens = 470, docs = 2: judged irrelevant
Dynamic Features for Visual Speechreading: A Systematic Comparison (9, text: dynamic visual features; no tags)

9. Lawrence D, lik = −5.217, tokens = 265, docs = 1: judged relevant
SEXNET: A Neural Network Identifies Sex From Human Faces (3, tags: neural networks, object recognition, pattern recognition)

10. Ahuja N, lik = −5.221, tokens = 366, docs = 2: judged relevant
A SNoW-Based Face Detector (12, tags: face recognition, image, vision)

Figure: ETT1 tag retrieval results. Italicised terms correspond to tags.

Gregor Heinrich A generic approach to topic models 64 / 35


ETT1 results: Tag query and expert topics

tag: face recognition (ETT1/J20)
0.82702: face images faces image facial visual human video database detection
0.09392: image images texture pixel resolution pyramid regions pixels region search
0.02696: wavelet video view images tracking user camera image motion shape
0.00117: eeg brain ica artifacts subjects activity subject erp signals scalp
0.00100: image images visual vision optical pixel surface edge disparity receptive
0.00094: orientation cortical dominance ocular cortex development lateral eye cells visual
0.00089: chip neuron synapse digital pulse analog synaptic chips synapses murray
0.00084: hinton object image energy cost images code visible zemel codes

author: Movellan J (ETT1/J20)
0.53816: face images faces image facial visual human video database detection
0.16216: image images texture pixel resolution pyramid regions pixels region search
0.08954: speech speaker acoustic vowel phonetic phoneme utterances spoken formant
0.06216: bayesian prior density posterior entropy evidence likelihood distributions
0.03939: filter frequency signals phase channel amplitude frequencies temporal spectrum
0.03508: activation boltzmann annealing temperature neuron stochastic schedule machine
0.02770: cell firing cells neuron activity excitatory inhibitory synaptic potential membrane
0.02154: convergence stochastic descent optimization batch density global update

author: Cottrell G (ETT1/J20)
0.41865: recurrent nets correlation cascade activation connection epochs representations
0.27523: face images faces image facial visual human video database detection
0.17531: subjects human stimulus cue subject trials experiment perceptual psychophysical
0.11287: tangent transformation image simard images invariant invariance euclidean
0.07130: modules attractors cortex phase olfactory frequency bulb activity oscillatory eeg
0.06143: word connectionist representations words activation production cognitive musical
0.03695: node activation graph cycle nets message recurrence links connection child
0.02049: visual attention contour search selective orientation iiii region saliency segment

Gregor Heinrich A generic approach to topic models 65 / 35


ETT1 results: Topic coherence

Topic coherence (Mimno et al. 2011): ≈ how often do top-ranked topic terms co-occur in documents?
Re-enacts human judgement in topic intrusion experiments (Chang et al. 2009; Heinrich 2011b).
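A sketch of the coherence score for a single topic, following the log co-occurrence formulation of Mimno et al. (2011); docFreq and coDocFreq are assumed to be precomputed document and co-document frequencies over the corpus, and all names are illustrative:

    // Coherence of one topic: pairs of top-ranked terms scored by
    // log((co-document frequency + 1) / document frequency).
    final class Coherence {
        static double score(int[] topTerms, int[] docFreq, int[][] coDocFreq) {
            double c = 0.0;
            for (int i = 1; i < topTerms.length; i++)
                for (int j = 0; j < i; j++)
                    // assumes every top term occurs in at least one document
                    c += Math.log((coDocFreq[topTerms[i]][topTerms[j]] + 1.0) / docFreq[topTerms[j]]);
            return c;   // closer to zero = more coherent topic
        }
    }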

Words in topic (choose worst match (A–F) in every group):
1. A. orientation   B. cortex    C. visual     D. ocular      E. acoustic   F. eye
2. A. likelihood    B. mixture   C. theorem    D. density     E. em         F. prior
3. A. risk          B. return    C. stock      D. trading     E. processor  F. prediction
4. A. language      B. word      C. stress     D. grammar     E. neural     F. syllable
5. A. circuit       B. bayesian  C. analog     D. voltage     E. vlsi       F. chip
6. A. validation    B. set       C. variance   D. regression  E. selection  F. bias

(a) Topic intrusion experiment

[Plot (b): coherence scores, roughly −500 to −150, for LDA, ATM, ETT1/J20 and ETT1/J100.]

(b) Coherence scores

Gregor Heinrich A generic approach to topic models 66 / 35
