Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities


Page 1

Structured Topic Models: Jointly Modeling Words and Their Accompanying Modalities

Xuerui Wang
Computer Science Department

University of Massachusetts Amherst

Joint work with Andrew McCallum, Andres Corrada-Emmanuel, Chris Pal, Xing Wei and Natasha Mohanty.

Page 2

Probabilistic topic models

• Main assumptions:
  – Documents are mixtures of topics
  – Topics are distributions over words, capturing word co-occurrence

• Objectives:
  – Understand text using the learned topics
  – Represent documents in topic space

Page 3

Clustering words into topics with Latent Dirichlet Allocation [Blei, Ng, Jordan 2003]

Generative Process:
For each document:
  Sample a distribution over topics
  For each word in doc:
    Sample a topic, z
    Sample a word from the topic, w

Example: 70% finance, 30% environment; sampled topic: finance; sampled word: “bank”
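The generative process above can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' code: the toy topic-word distributions and the `alpha` hyperparameters are made-up values, and the Dirichlet draw uses the standard construction of normalizing independent Gamma draws.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(num_docs, doc_len, topics, alpha, seed=0):
    """Generate documents with the LDA generative process.

    topics: list of topic-word distributions, each a dict {word: probability}.
    alpha:  Dirichlet hyperparameters, one per topic.
    Returns one list of (topic_index, word) pairs per document.
    """
    rng = random.Random(seed)
    corpus = []
    for _ in range(num_docs):
        theta = sample_dirichlet(alpha, rng)  # this document's mixture over topics
        doc = []
        for _ in range(doc_len):
            # Sample a topic z for this word position from theta ...
            z = rng.choices(range(len(topics)), weights=theta)[0]
            # ... then sample the word from topic z's word distribution.
            vocab = list(topics[z])
            w = rng.choices(vocab, weights=[topics[z][v] for v in vocab])[0]
            doc.append((z, w))
        corpus.append(doc)
    return corpus

# Toy topics echoing the slide's finance/environment example (made-up numbers).
finance = {"bank": 0.4, "loan": 0.3, "money": 0.3}
environment = {"river": 0.4, "bank": 0.2, "water": 0.4}
docs = generate_corpus(2, 5, [finance, environment], alpha=[0.7, 0.3])
for doc in docs:
    print(doc)
```

Note that “bank” can be emitted by either toy topic; inference has to use the rest of the document to decide which topic explains a given occurrence.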

Page 4

Example topics induced from a large collection of text [Tenenbaum et al.]

STORY STORIES TELL CHARACTER CHARACTERS AUTHOR READ TOLD SETTING TALES PLOT TELLING SHORT FICTION ACTION TRUE EVENTS TELLS TALE NOVEL

MIND WORLD DREAM DREAMS THOUGHT IMAGINATION MOMENT THOUGHTS OWN REAL LIFE IMAGINE SENSE CONSCIOUSNESS STRANGE FEELING WHOLE BEING MIGHT HOPE

WATER FISH SEA SWIM SWIMMING POOL LIKE SHELL SHARK TANK SHELLS SHARKS DIVING DOLPHINS SWAM LONG SEAL DIVE DOLPHIN UNDERWATER

DISEASE BACTERIA DISEASES GERMS FEVER CAUSE CAUSED SPREAD VIRUSES INFECTION VIRUS MICROORGANISMS PERSON INFECTIOUS COMMON CAUSING SMALLPOX BODY INFECTIONS CERTAIN

FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM POLE INDUCED

SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY FIELD PHYSICS LABORATORY STUDIES WORLD SCIENTIST STUDYING SCIENCES

BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS BAT TERRY

JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION FIELD OCCUPATIONS REQUIRE OPPORTUNITY EARN ABLE


Page 6

Documents are not just text!

• Multiple modalities:
  – Research papers (author, venue, words, etc.)
  – Email messages (sender, recipients, time, words, etc.)
  – Legislative resolutions (voting record, words, etc.)
  – And many more

• Most previous work: one modality at a time
  – Learn topics from words
  – Discover groups from relations
  – Etc.

Page 7

Outline

• Introduction

• Role and Topic Discovery in Social Networks

• Group and Topic Discovery from Voting Records

• Topics over Time

• Topical Phrase with Markov Assumption

• Conclusions

Page 8

All possible “topic models” with one latent topic, two observed modalities and two conditional dependencies

Page 9

Outline

• Introduction

• Role and Topic Discovery in Social Networks

• Group and Topic Discovery from Voting Records

• Topics over Time

• Topical Phrase with Markov Assumption

• Conclusions

Page 10

From LDA to Author-Recipient-Topic

Page 11

All possible “topic models” with two observed modalities

Page 12

Inference and Estimation

Gibbs Sampling:
- Easy to implement
- Reasonably fast
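For ART, a collapsed Gibbs sampler resamples, for every word token, the recipient x it is attributed to and its topic z jointly. The sketch below assumes the ART generative story as we understand it: each word's recipient is picked uniformly from the message's recipient list, its topic from a topic distribution specific to the (author, recipient) pair, and the word from that topic's word distribution, with symmetric Dirichlet priors `alpha` and `beta`. The function name, data layout and toy emails are hypothetical, and this is not the authors' implementation.

```python
import random
from collections import defaultdict

def art_gibbs(emails, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling sketch for an Author-Recipient-Topic-style model.

    emails: list of (author, recipients, word_ids); recipients must be non-empty.
    Returns per-token (recipient, topic) assignments and two count tables.
    """
    rng = random.Random(seed)
    n_art = defaultdict(int)  # (author, recipient, topic) -> tokens
    n_ar = defaultdict(int)   # (author, recipient) -> tokens
    n_tw = defaultdict(int)   # (topic, word) -> tokens
    n_t = [0] * num_topics    # topic -> tokens

    def update(author, x, z, w, delta):
        n_art[author, x, z] += delta
        n_ar[author, x] += delta
        n_tw[z, w] += delta
        n_t[z] += delta

    # Random initialization of every token's (recipient, topic) pair.
    assign = []
    for author, recips, words in emails:
        doc = [(rng.choice(recips), rng.randrange(num_topics)) for _ in words]
        for (x, z), w in zip(doc, words):
            update(author, x, z, w, +1)
        assign.append(doc)

    for _ in range(iters):
        for d, (author, recips, words) in enumerate(emails):
            for i, w in enumerate(words):
                x, z = assign[d][i]
                update(author, x, z, w, -1)  # remove this token from the counts
                # The uniform recipient prior is the same for every pair, so it is omitted.
                pairs, weights = [], []
                for r in recips:
                    for t in range(num_topics):
                        # P(topic t | author, recipient r), smoothed by alpha.
                        p_topic = (n_art[author, r, t] + alpha) / (n_ar[author, r] + num_topics * alpha)
                        # P(word w | topic t), smoothed by beta.
                        p_word = (n_tw[t, w] + beta) / (n_t[t] + vocab_size * beta)
                        pairs.append((r, t))
                        weights.append(p_topic * p_word)
                x, z = rng.choices(pairs, weights=weights)[0]
                assign[d][i] = (x, z)
                update(author, x, z, w, +1)  # add it back with the new assignment
    return assign, n_art, n_tw

# Hypothetical toy data: word ids 0-2 are "legal" words, 3-5 are "social" words.
emails = [
    ("debra", ["legal_team"], [0, 1, 2, 0, 1]),
    ("debra", ["legal_team", "friend"], [0, 2, 3, 4]),
    ("kate", ["friend"], [3, 4, 5, 5, 3]),
]
assign, n_art, n_tw = art_gibbs(emails, num_topics=2, vocab_size=6)
print(assign)
```

Sampling the pair (x, z) jointly, rather than x and z in two separate steps, lets a token move to a different recipient and topic in a single move.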

Page 13

Enron email corpus

• 250k email messages
• 147 people

Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT)
From: [email protected]
To: [email protected]
Subject: Enron/TransAlta Contract dated Jan 1, 2001

Please see below. Katalin Kiss of TransAlta has requested an electronic copy of our final draft? Are you OK with this? If so, the only version I have is the original draft without revisions.

DP

Debra Perlingiere
Enron North America Corp.
Legal Department
1400 Smith Street, EB 3885
Houston, Texas
[email protected]

Page 14

Topics, and prominent senders / receivers discovered by ART

Topic names, by hand

Page 15

Topics, and prominent senders / receivers discovered by ART

Beck = “Chief Operations Officer”
Dasovich = “Government Relations Executive”
Shapiro = “Vice President of Regulatory Affairs”
Steffes = “Vice President of Government Affairs”

Page 16

Comparing role discovery

connection strength (A, B), computed from each person's:
• Traditional SNA: distribution over recipients
• Author-Topic: distribution over authored topics
• ART: distribution over authored topics
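The exact connection-strength formula on this slide did not survive extraction. As a hypothetical stand-in, the sketch below compares two people's distributions (over recipients for SNA, over authored topics for Author-Topic and ART) with the Jensen-Shannon divergence, turned into a similarity in [0, 1]. The choice of measure and all names here are our assumptions, not necessarily what the talk used.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded in [0, 1] when measured in bits."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def role_similarity(p, q):
    """Map divergence to a similarity score in [0, 1] (1 = identical distributions)."""
    return 1.0 - js_divergence(p, q)

# Hypothetical topic distributions for three people.
alice = [0.7, 0.2, 0.1]
bob = [0.6, 0.3, 0.1]
carol = [0.05, 0.15, 0.8]
print(role_similarity(alice, bob), role_similarity(alice, carol))
```

Jensen-Shannon is used here instead of plain KL divergence because it is symmetric and stays finite when one person has zero mass on a topic the other uses.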

Page 17

Comparing role discovery: Tracy Geaconne vs. Dan McCarty

Traditional SNA: similar roles
Author-Topic: different roles
ART: different roles

Geaconne = “Secretary”
McCarty = “Vice President”

Page 18

Comparing role discovery: Lynn Blair vs. Kimberly Watson

Traditional SNA: different roles
Author-Topic: very different
ART: very similar

Blair = “Gas pipeline logistics”
Watson = “Pipeline facilities planning”

Page 19

McCallum Email Corpus 2004

• January - October 2004
• 23k email messages
• 825 people

From: [email protected]
Subject: NIPS and ....
Date: June 14, 2004 2:27:41 PM EDT
To: [email protected]

There is pertinent stuff on the first yellow folder that is completed either travel or other things, so please sign that first folder anyway. Then, here is the reminder of the things I'm still waiting for:

NIPS registration receipt.
CALO registration receipt.

Thanks,
Kate

Page 20

Two most prominent topics in discussions with ____?

Topic 1             Topic 2
Word      Prob      Word      Prob
love      0.030514  today     0.051152
house     0.015402  tomorrow  0.045393
donna     0.013659  time      0.041289
time      0.012351  ll        0.039145
great     0.011334  meeting   0.033877
hope      0.011043  week      0.025484
dinner    0.00959   talk      0.024626
saturday  0.009154  meet      0.023279
left      0.009154  morning   0.022789
ll        0.009009  monday    0.020767
roweis    0.008282  back      0.019358
visit     0.008137  call      0.016418
evening   0.008137  free      0.015621
stay      0.007847  home      0.013967
bring     0.007701  won       0.013783
weekend   0.007411  day       0.01311
road      0.00712   hope      0.012987
sunday    0.006829  leave     0.012987
kids      0.006539  office    0.012742
flight    0.006539  tuesday   0.012558

Page 21

Outline

• Introduction

• Role and Topic Discovery in Social Networks

• Group and Topic Discovery from Voting Records

• Topics over Time

• Topical Phrase with Markov Assumption

• Conclusions

Page 22

Discovering groups from observed set of relations

Admiration relations among six high school students.

Student Roster
Adams, Bennett, Carter, Davis, Edwards, Frederking

Academic Admiration
Acad(A, B)  Acad(C, B)
Acad(A, D)  Acad(C, D)
Acad(B, E)  Acad(D, E)
Acad(B, F)  Acad(D, F)
Acad(E, A)  Acad(F, A)
Acad(E, C)  Acad(F, C)

Page 23

Adjacency matrix representing relations

[Figure: adjacency matrix of the Academic Admiration relation, first with rows and columns in roster order A B C D E F (groups G1 G2 G1 G2 G3 G3), then reordered by group as A C B D E F (groups G1 G1 G2 G2 G3 G3).]
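The reordering can be reproduced from the Academic Admiration relations listed for the six students. This sketch builds the 0/1 adjacency matrix (row = admirer, column = admired, which is our assumed convention), permutes it into the group order A C B D E F, and reports the density of every group-to-group block. The group labels G1 = {A, C}, G2 = {B, D}, G3 = {E, F} are read off the slide; the helper names are hypothetical.

```python
ACAD = [("A", "B"), ("C", "B"), ("A", "D"), ("C", "D"),
        ("B", "E"), ("D", "E"), ("B", "F"), ("D", "F"),
        ("E", "A"), ("F", "A"), ("E", "C"), ("F", "C")]
GROUPS = {"A": "G1", "C": "G1", "B": "G2", "D": "G2", "E": "G3", "F": "G3"}

def adjacency(relations, order):
    """0/1 matrix with rows as source entities and columns as targets, in the given order."""
    index = {name: i for i, name in enumerate(order)}
    m = [[0] * len(order) for _ in order]
    for src, dst in relations:
        m[index[src]][index[dst]] = 1
    return m

def block_densities(relations, groups):
    """Fraction of present relations in every (source group, target group) block."""
    names = sorted(groups)
    labels = sorted(set(groups.values()))
    dens = {}
    for g in labels:
        for h in labels:
            pairs = [(a, b) for a in names for b in names if groups[a] == g and groups[b] == h]
            present = sum((a, b) in relations for a, b in pairs)
            dens[g, h] = present / len(pairs)
    return dens

order = ["A", "C", "B", "D", "E", "F"]
for name, row in zip(order, adjacency(ACAD, order)):
    print(name, " ".join(map(str, row)))
print(block_densities(ACAD, GROUPS))
```

With this ordering the matrix shows three solid 2×2 blocks of ones (G1 to G2, G2 to G3, G3 to G1) and every other block has density 0, which is exactly the structure a blockmodel is built to find.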


Page 24

Group Model: partitioning entities into groups

Stochastic Blockstructures for Relations [Nowicki, Snijders 2001]

[Figure: graphical model. Dirichlet(α) prior on group proportions γ; a Multinomial group assignment g for each of the S entities; Beta(β) priors on the block parameters of each of the G² group pairs; a Binomial relation v for each of the S² entity pairs.]

S: number of entities
G: number of groups

Extended to an arbitrary number of groups in [Kemp, Griffiths, Tenenbaum 2004]
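Read as a generative process, the diagram suggests the following sketch, which follows the usual stochastic-blockmodel assumptions (our reading of the figure, not the authors' code): draw group proportions γ ~ Dirichlet(α), give each of the S entities a group g ~ Multinomial(γ), draw a link probability for each of the G² ordered group pairs from a Beta prior, and draw each of the S² binary relations from a Binomial with a single trial (a Bernoulli). `eta` is a hypothetical name for the per-block link probability, which the extracted slide does not name.

```python
import random

def sample_blockmodel(num_entities, num_groups, alpha=1.0, beta=(1.0, 1.0), seed=0):
    """Sample group assignments and a binary relation matrix from a stochastic blockmodel.

    alpha: symmetric Dirichlet hyperparameter for the group proportions gamma.
    beta:  (a, b) parameters of the Beta prior on each block's link probability.
    """
    rng = random.Random(seed)
    # gamma ~ Dirichlet(alpha), via normalized Gamma draws.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_groups)]
    total = sum(draws)
    gamma = [x / total for x in draws]
    # g_s ~ Multinomial(gamma) for every entity s.
    groups = rng.choices(range(num_groups), weights=gamma, k=num_entities)
    # eta[g][h] ~ Beta(beta): probability of a relation from group g to group h.
    eta = [[rng.betavariate(*beta) for _ in range(num_groups)] for _ in range(num_groups)]
    # v_ij ~ Bernoulli(eta[g_i][g_j]) for every ordered entity pair (i, j).
    relations = [[int(rng.random() < eta[groups[i]][groups[j]]) for j in range(num_entities)]
                 for i in range(num_entities)]
    return groups, eta, relations

groups, eta, relations = sample_blockmodel(num_entities=6, num_groups=3, beta=(0.2, 0.2))
print(groups)
for row in relations:
    print(row)
```

Small Beta parameters such as (0.2, 0.2) push block probabilities toward 0 or 1, which yields clean blocks like the admiration example; inference (for instance by Gibbs sampling) runs this process in reverse to recover groups from observed relations.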

Page 25

Two relations with different attributes

[Figure: adjacency matrices of the two relations. Academic Admiration reordered as A C B D E F (groups G1 G1 G2 G2 G3 G3); Social Admiration reordered as A C E B D F (groups G1 G1 G1 G2 G2 G2).]

Social Admiration
Soci(A, B)  Soci(A, D)  Soci(A, F)
Soci(B, A)  Soci(B, C)  Soci(B, E)
Soci(C, B)  Soci(C, D)  Soci(C, F)
Soci(D, A)  Soci(D, C)  Soci(D, E)
Soci(E, B)  Soci(E, D)  Soci(E, F)
Soci(F, A)  Soci(F, C)  Soci(F, E)

Page 26

Goal: Model relations and their (textual) attributes simultaneously to obtain better groups and more meaningful topics.

budget, funding, annual, cash

document, corrections, review, annual

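Page 27

The Group-Topic model: discovering groups and topics simultaneously

[Figure: graphical model. The group model (Dirichlet(α) → γ, Multinomial group assignments g over S entities, Beta(β) block parameters over G² group pairs, Binomial relations v over S² entity pairs) with plates over T topics, plus a topic t for each of the B relations (Uniform prior) and N_b words w per relation drawn from a Multinomial φ with a Dirichlet(η) prior over T topics.]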

Page 28

All possible “topic models” with two observed modalities

Page 29

U.S. Senate data set

• 16 years of voting records in the U.S. Senate (1989 – 2005)
• A Senator may respond Yea or Nay to a resolution
• 3,423 resolutions with text attributes (index terms)
• 191 Senators in total across 16 years

S.543 Title: An Act to reform Federal deposit insurance, protect the deposit insurance funds, recapitalize the Bank Insurance Fund, improve supervision and regulation of insured depository institutions, and for other purposes. Sponsor: Sen Riegle, Donald W., Jr. [MI] (introduced 3/5/1991) Cosponsors (2) Latest Major Action: 12/19/1991 Became Public Law No: 102-242. Index terms: Banks and banking Accounting Administrative fees Cost control Credit Deposit insurance Depressed areas and other 110 terms

Adams (D-WA), Nay Akaka (D-HI), Yea Bentsen (D-TX), Yea Biden (D-DE), Yea Bond (R-MO), Yea Bradley (D-NJ), Nay Conrad (D-ND), Nay ……

Page 30

Topics discovered (U.S. Senate)

Mixture of Unigrams
Education: education, school, aid, children, drug, students, elementary, prevention
Energy: energy, power, water, nuclear, gas, petrol, research, pollution
Military Misc.: government, military, foreign, tax, congress, aid, law, policy
Economic: federal, labor, insurance, aid, tax, business, employee, care

Group-Topic Model
Education + Domestic: education, school, federal, aid, government, tax, energy, research
Foreign: foreign, trade, chemicals, tariff, congress, drugs, communicable, diseases
Economic: labor, insurance, tax, congress, income, minimum, wage, business
Social Security + Medicare: social, security, insurance, medical, care, medicare, disability, assistance

Page 31

Groups discovered (US Senate)

Groups from topic Education + Domestic

Page 32

Senators who change coalition the most, depending on topic

e.g., Senator Shelby (D-AL) votes with the Republicans on Economic, with the Democrats on Education + Domestic, and with a small group of maverick Republicans on Social Security + Medicare.

Page 33

Do we get better groups with the GT model?

Baseline Model:
1. Cluster bills into topics using a mixture of unigrams;
2. Apply the group model on topic-specific subsets of bills.

GT Model:
1. Jointly cluster topics and groups at the same time using the GT model.

Agreement Index (AI) measures group cohesion; higher is better.

Dataset   Avg. AI for Baseline   Avg. AI for GT   p-value
Senate    0.8198                 0.8294           <.01
UN        0.8548                 0.8664           <.01
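The slide does not define the Agreement Index. A common definition from the roll-call literature (Hix, Noury and Roland) is AI = (max(Y, N, A) - ½[(Y + N + A) - max(Y, N, A)]) / (Y + N + A), computed from a group's Yea, Nay and Abstain counts on one vote, so a unanimous group scores 1. The sketch below assumes that definition, averaged over (group, vote) pairs; whether the talk used exactly this form is an assumption, and the tallies are made up.

```python
def agreement_index(yea, nay, abstain=0):
    """Agreement Index of one group on one vote (1.0 = unanimous)."""
    total = yea + nay + abstain
    if total == 0:
        raise ValueError("group cast no votes")
    top = max(yea, nay, abstain)
    return (top - 0.5 * (total - top)) / total

def average_agreement(votes_by_group):
    """Mean AI over all (group, vote) pairs; maps group -> list of vote-count tuples."""
    scores = [agreement_index(*v) for votes in votes_by_group.values() for v in votes]
    return sum(scores) / len(scores)

# Hypothetical tallies: two groups, three votes each, as (yea, nay) counts.
votes = {"G1": [(10, 0), (9, 1), (8, 2)], "G2": [(5, 5), (7, 3), (0, 10)]}
print(round(average_agreement(votes), 4))  # 0.725
```

With only Yea and Nay, an evenly split group scores 0.25 rather than 0, so differences such as 0.8198 vs. 0.8294 sit on a compressed scale.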

Page 34

Outline

• Introduction

• Role and Topic Discovery in Social Networks

• Group and Topic Discovery from Voting Records

• Topics over Time

• Topical Phrase with Markov Assumption

• Conclusions

Page 35

Want to model trends over time

• Is the prevalence of a topic growing or waning?
• Pattern appears only briefly
  – Capture its statistics in a focused way
  – Don’t confuse it with patterns elsewhere in time
• How do roles, groups, influence shift over time?

Page 36

Topics Over Time (TOT)

[Figure: two graphical-model views of TOT. Each document has a multinomial over topics with a Dirichlet prior; each token has a topic index, a word drawn from that topic's multinomial over words (Dirichlet prior), and a timestamp drawn from that topic's Beta over time.]
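In TOT each token's timestamp, rescaled into (0, 1), is drawn from its topic's Beta distribution. A simple way to re-estimate a topic's Beta parameters from the timestamps currently assigned to it, and as we understand the TOT paper the one used between Gibbs sweeps, is the method of moments: with sample mean m and variance s², set c = m(1 - m)/s² - 1, then a = m·c and b = (1 - m)·c. The function name and data are hypothetical.

```python
from statistics import fmean, pvariance

def beta_method_of_moments(timestamps):
    """Estimate Beta(a, b) parameters from timestamps in (0, 1) by matching mean and variance."""
    m = fmean(timestamps)
    v = pvariance(timestamps, mu=m)
    if v <= 0 or v >= m * (1 - m):
        raise ValueError("variance must be positive and below m(1 - m)")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

# Hypothetical normalized timestamps of tokens assigned to one topic.
a, b = beta_method_of_moments([0.2, 0.4, 0.6, 0.8])
print(a, b)
```

The estimate preserves the mean, a / (a + b) = m, so a topic whose tokens cluster late in the corpus gets a > b and a density skewed toward recent times.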

Page 37

All possible “topic models” with two observed modalities

Page 38

State of the Union addresses

208 addresses delivered between January 8, 1790 and January 29, 2002.

To increase the number of documents, we split the addresses into paragraphs and treated them as ‘documents’. One-line paragraphs were excluded. Stopwords were removed.

• 17,156 ‘documents’
• 21,534 unique words
• 669,425 tokens

Our scheme of taxation, by means of which this needless surplus is taken from the people and put into the public Treasury, consists of a tariff or duty levied upon importations from abroad and internal-revenue taxes levied upon the consumption of tobacco and spirituous and malt liquors. It must be conceded that none of the things subjected to internal-revenue taxation are, strictly speaking, necessaries. There appears to be no just complaint of this taxation by the consumers of these articles, and there seems to be nothing so well able to bear the burden without hardship to any portion of the people.

1910
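The preprocessing described for this corpus (paragraphs as documents, one-line paragraphs dropped, stopwords removed) might look like the sketch below. The blank-line paragraph convention, the regex tokenizer and the tiny stopword list are placeholder assumptions, since the slide does not say which were used. The sample text is shortened fragments of the 1910 excerpt plus a made-up one-line paragraph.

```python
import re

# Placeholder stopword list; a real run would use a full list.
STOPWORDS = {"the", "of", "and", "a", "to", "is", "by", "which", "this", "from", "into", "upon"}

def address_to_documents(text, year):
    """Split one address into paragraph 'documents', each tagged with the address year."""
    docs = []
    for para in re.split(r"\n\s*\n", text.strip()):
        lines = [ln for ln in para.splitlines() if ln.strip()]
        if len(lines) <= 1:  # one-line paragraphs are excluded
            continue
        tokens = [t for t in re.findall(r"[a-z]+", para.lower()) if t not in STOPWORDS]
        docs.append((year, tokens))
    return docs

sample = """Our scheme of taxation consists of a tariff
or duty levied upon importations from abroad.

Thank you.

It must be conceded that none of the things
subjected to internal-revenue taxation are necessaries."""
docs = address_to_documents(sample, 1910)
for year, tokens in docs:
    print(year, tokens)
```

Each paragraph keeps its address's year, which is the timestamp a model like TOT would attach to every token in it.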

Page 39

Comparing TOT against LDA

Page 40

Topic Distributions Conditioned on Time

[Figure: topic mass (in vertical height) over time in NIPS conference papers]
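A plot of this kind can be computed from a trained model's token-level assignments by normalizing topic counts within each time bin. The sketch assumes a hypothetical list of (year, topic) pairs, one per token; the topic names and counts are made up.

```python
from collections import Counter, defaultdict

def topic_mass_by_year(assignments):
    """Return {year: {topic: fraction of that year's tokens assigned to the topic}}."""
    counts = defaultdict(Counter)
    for year, topic in assignments:
        counts[year][topic] += 1
    return {year: {t: n / sum(c.values()) for t, n in c.items()}
            for year, c in sorted(counts.items())}

# Hypothetical token assignments: (publication year, topic id).
tokens = [(1987, "neural"), (1987, "neural"), (1987, "svm"),
          (1999, "neural"), (1999, "svm"), (1999, "svm"), (1999, "svm")]
print(topic_mass_by_year(tokens))
```

Stacking each year's fractions vertically gives the "topic mass in vertical height" view, since every year's column sums to 1.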

Page 41

TOT improves ability to predict time

Predicting the year of a State-of-the-Union address.

L1 = absolute distance between predicted year and actual year.
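One way to predict a document's year with TOT is to pick the candidate year whose normalized timestamp maximizes the product of the Beta densities of the document's token topics, then score predictions by mean L1 error. The mapping from years into (0, 1), the per-topic parameters `psi`, and all names below are illustrative assumptions, not the evaluation code behind this slide.

```python
import math

def beta_logpdf(t, a, b):
    """Log density of Beta(a, b) at t in (0, 1)."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_norm

def predict_year(topic_assignments, psi, first_year, last_year):
    """Return the year whose normalized time maximizes the summed Beta log-densities."""
    num_years = last_year - first_year + 1
    best_year, best_score = None, -math.inf
    for year in range(first_year, last_year + 1):
        # Map the year to the middle of its slot in (0, 1); this avoids log(0).
        t = (year - first_year + 0.5) / num_years
        score = sum(beta_logpdf(t, *psi[z]) for z in topic_assignments)
        if score > best_score:
            best_year, best_score = year, score
    return best_year

def mean_l1_error(predicted, actual):
    """Average absolute difference, in years, between predictions and true years."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical Beta parameters: topic 0 peaks early, topic 1 peaks late.
psi = {0: (2.0, 8.0), 1: (8.0, 2.0)}
early = predict_year([0, 0, 0], psi, 1790, 2002)
late = predict_year([1, 1], psi, 1790, 2002)
print(early, late, mean_l1_error([early, late], [1820, 1975]))
```

Summing log-densities rather than multiplying densities avoids underflow for long documents.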

Page 42

Outline

• Introduction

• Role and Topic Discovery in Social Networks

• Group and Topic Discovery from Voting Records

• Topics over Time

• Topical Phrase with Markov Assumption

• Conclusions

Page 43

Topic Interpretability

LDA: algorithms, algorithm, genetic, problems, efficient

Topical N-grams: genetic algorithms, genetic algorithm, evolutionary computation, evolutionary algorithms, fitness function

Page 44

Topics modeling phrases

• Topics based only on unigrams are often difficult to interpret
• Topic discovery itself is confused, because important meanings and distinctions are carried by phrases
• Significant opportunity to provide improved language models to ASR, MT, IR, etc.

Page 45

Topical N-Gram model

[Figure: TNG graphical model over a token sequence, with topics z1 z2 z3 z4 …, words w1 w2 w3 w4 … and bigram indicators y1 y2 y3 y4 …; plates labelled D, T, W and TW; hyperparameters α, β, γ1, γ2.]

Page 46

All possible “topic models” with two observed modalities

Page 47

Features of Topical N-Grams model

• Easily trained by Gibbs sampling
  – Can run efficiently on millions of words
• Topic-specific phrase discovery
  – “white house” has special meaning as a phrase in the politics topic,
  – ... but not in the real estate topic.
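In TNG, as we read the model, each token carries a binary indicator y_i: y_i = 1 means w_i continues a phrase with the previous word (drawn from a bigram distribution tied to the topic and w_{i-1}), and y_i = 0 means it is drawn fresh from the topic's unigram distribution. Once indicators are sampled, phrases are the maximal runs joined by y = 1. The sketch shows only that final step; the tokens and indicator values are made up, and labelling a phrase with its last token's topic is our assumption.

```python
def extract_phrases(words, topics, indicators):
    """Group tokens into phrases: a token with indicator 1 extends the previous phrase.

    Returns (phrase, topic) pairs; a multi-word phrase takes the topic of its
    last token (an assumption made for this sketch).
    """
    phrases = []
    for i, (w, z, y) in enumerate(zip(words, topics, indicators)):
        if y == 1 and i > 0:
            prev_words, _ = phrases[-1]
            phrases[-1] = (prev_words + [w], z)
        else:
            phrases.append(([w], z))
    return [(" ".join(ws), z) for ws, z in phrases]

words = ["white", "house", "announced", "genetic", "algorithms", "yesterday"]
topics = ["politics", "politics", "politics", "cs", "cs", "politics"]
indicators = [0, 1, 0, 0, 1, 0]
print(extract_phrases(words, topics, indicators))
```

Because the indicator depends on the previous word and its topic, the same pair “white house” can be merged in a politics context and left as two unigrams in a real-estate context, which is the topic-specific behavior described on this slide.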

Page 48

NIPS research papers

• Full text of NIPS papers between 1987-1999.
• 1,740 research papers in total.
• 13,649 unique words and 2,301,375 word tokens.
• Stop words removed and no stemming.

Page 49

“Reinforcement Learning”

LDA: state, learning, policy, action, reinforcement, states, time, optimal, actions, function, algorithm, reward, step, dynamic, control, sutton, rl, decision, algorithms, agent

Topical N-grams (2+): reinforcement learning, optimal policy, dynamic programming, optimal control, function approximator, prioritized sweeping, finite-state controller, learning system, reinforcement learning RL, function approximators, markov decision problems, markov decision processes, local search, state-action pair, markov decision process, belief states, stochastic policy, action selection, upright position, reinforcement learning methods

Topical N-grams (1): policy, action, states, actions, function, reward, control, agent, q-learning, optimal, goal, learning, space, step, environment, system, problem, steps, sutton, policies

Page 50

“Support Vector Machines”

LDA: kernel, linear, vector, support, set, nonlinear, data, algorithm, space, pca, function, problem, margin, vectors, solution, training, svm, kernels, matrix, machines

Topical N-grams (2+): support vectors, test error, support vector machines, training error, feature space, training examples, decision function, cost functions, test inputs, kkt conditions, leave-one-out procedure, soft margin, bayesian transduction, training patterns, training points, maximum margin, strictly convex, regularization operators, base classifiers, convex optimization

Topical N-grams (1): kernel, training, support, margin, svm, solution, kernels, regularization, adaboost, test, data, generalization, examples, cost, convex, algorithm, working, feature, sv, functions

Page 51

Word dependencies in information retrieval

• Long-distance dependency: topical (semantic) dependency helps [Hofmann, 1999; Wei and Croft, 2006].
• Short-distance dependency: phrases (usually discovered by separate modules) can boost IR performance [Fagan, 1989; Evans et al., 1991; Strzalkowski, 1995; Mitra et al., 1997].
• TNG simultaneously captures both.
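One established way to use a topic model in ad-hoc retrieval, in the style of Wei and Croft (2006) cited above, is to interpolate a Dirichlet-smoothed document language model with the topic model's word distribution for the document, P(w | d) = λ[(N_d/(N_d + μ)) P_ML(w | d) + (μ/(N_d + μ)) P(w | C)] + (1 - λ) P_topic(w | d), and rank documents by query log-likelihood. The sketch assumes that form; the parameter values and toy data are illustrative, and how TNG's phrases enter the score is not shown.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc_tokens, collection_prob, topic_prob, mu=1000.0, lam=0.7):
    """Score a document for a query under an interpolated document model.

    collection_prob: word -> P(w | collection).
    topic_prob:      word -> P(w | d) from a topic model for this document.
    """
    counts = Counter(doc_tokens)
    n_d = len(doc_tokens)
    score = 0.0
    for w in query:
        # Dirichlet-smoothed document language model.
        p_doc = (counts[w] + mu * collection_prob.get(w, 0.0)) / (n_d + mu)
        # Interpolate with the topic model's estimate for this document.
        p = lam * p_doc + (1 - lam) * topic_prob.get(w, 0.0)
        score += math.log(p) if p > 0 else float("-inf")
    return score

collection = {"genetic": 0.001, "algorithms": 0.002, "basketball": 0.001}
doc1 = ["genetic", "algorithms", "fitness", "function"]
doc2 = ["basketball", "game", "team"]
query = ["genetic", "algorithms"]
s1 = query_log_likelihood(query, doc1, collection, {"genetic": 0.05, "algorithms": 0.05})
s2 = query_log_likelihood(query, doc2, collection, {"basketball": 0.1})
print(s1, s2)
```

The topic term lets a document score well on a query word it never contains, provided its topics make that word likely; this is the long-distance dependency the first bullet refers to.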

Page 52

San Jose Mercury News (TREC)

• Covers material from the San Jose Mercury News in 1991
• With TREC queries 51-150
• 90,257 documents in total, 255,686 unique words and 17,574,989 word tokens.
• Stop words removed and no stemming.

<DOC>
<DOCNO> SJMN91-06364022 </DOCNO>
<ACCESS> 06364022 </ACCESS>
<CAPTION> Photo; PHOTO: Associated Press; MONSTER MASH -- Kentucky's Jamal MashBurn shows his stuff in the Wildcats' 103-89 victory over state rival Louisville on Saturday. Mashburn had 25 points. </CAPTION>
<DESCRIPT> COLLEGE; BASKETBALL; GAME; RESULT; RANKING; SCHOOL </DESCRIPT>
<LEADPARA> Arizona had a 24-point night from Sean Rooks, a height advantage and strong defense, but still struggled to an 83-76 victory over Evansville in the Fiesta Bowl Classic in Tucson, Ariz., on Saturday.; The victory moved the No. 6 Wildcats into the championship of their tournament for the seventh straight time. </LEADPARA>
<SECTION> Sports </SECTION>
<HEADLINE> ARIZONA EDGES EVANSVILLE……

Page 53

Ad-hoc retrieval on SJMN

Clearly contain phrases

No phrases, due to stopping and punctuation removal

Mixed results on many other queries.

Page 54

Ad-hoc retrieval on SJMN

* indicates statistically significant differences in performance with 95% confidence according to the Wilcoxon test

Page 55

Outline

• Introduction

• Role and Topic Discovery in Social Networks

• Group and Topic Discovery from Voting Records

• Topics over Time

• Topical Phrase with Markov Assumption

• Conclusions

Page 56

All possible “topic models” with two observed modalities (revisit)

ART, GT, TOT, TNG

Page 57

Conclusions

• With carefully designed model structures, we can utilize multi-modality information.

• Choices of configuration are task dependent.

• Better results are obtained from joint inference on various tasks.