The IBP Compound Dirichlet Process and its Application to Focused Topic Modeling

Sinead Williamson, Chong Wang, Katherine A. Heller, David M. Blei

Presented by Eric Wang, 9/16/2011

Introduction

• Latent Dirichlet Allocation (LDA) is a powerful and ubiquitous topic modeling framework.

• Incorporating the hierarchical Dirichlet process (HDP) into LDA allows for more flexible topic modeling by estimating the global topic proportions from the data.

• A drawback of HDP-LDA is that a topic that is rare globally will also have a low expected proportion within each document.

• The authors propose a model that allows a rare topic to still have large mass within individual documents.

Hierarchical Dirichlet Process

• The hierarchical Dirichlet process (HDP) is a prior for Bayesian nonparametric mixed membership modeling of data groups.

• Hierarchically, it can be defined as

where m indexes the data group.

• In the HDP, the expected mixing weights of each group-level distribution equal the mixing weights of the shared global distribution. In practice, the global mixing weights are the average of the mixture memberships across groups.
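
For reference, the HDP is conventionally written as below. The symbols (concentration parameters \gamma and \alpha, base measure H, global measure G_0, group-level measures G_m) follow common usage and are assumed here; the slide's own notation may differ. The last identity is the expectation property described in the bullet above.

```latex
G_0 \mid \gamma, H \sim \mathrm{DP}(\gamma, H), \qquad
G_m \mid \alpha, G_0 \sim \mathrm{DP}(\alpha, G_0), \quad m = 1, \dots, M, \qquad
\mathbb{E}\left[ G_m \mid G_0 \right] = G_0 .
```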

Indian Buffet Process

• The Indian Buffet Process (IBP) defines a distribution over binary matrices with an infinite number of columns and a finite number of non-zero entries.

• Hierarchically, it is defined as

where m and k index the rows and columns of the binary matrix b. It can also be represented via a stick-breaking construction.
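
A truncated version of the stick-breaking construction can be sketched as follows. This is a minimal illustration, assuming stick proportions drawn from Beta(alpha, 1) and a finite truncation level; the function name `ibp_stick_breaking` and its arguments are hypothetical and do not come from the slides.

```python
import random


def ibp_stick_breaking(num_rows, alpha, truncation, rng):
    """Draw a truncated IBP binary matrix via stick-breaking (illustrative sketch)."""
    pis = []
    stick = 1.0
    for _ in range(truncation):
        # Assumed construction: nu_k ~ Beta(alpha, 1), pi_k = nu_1 * ... * nu_k.
        stick *= rng.betavariate(alpha, 1.0)
        pis.append(stick)
    # Each entry b[m][k] is an independent Bernoulli(pi_k) draw.
    b = [[1 if rng.random() < p else 0 for p in pis] for _ in range(num_rows)]
    return pis, b


if __name__ == "__main__":
    rng = random.Random(0)
    pis, b = ibp_stick_breaking(num_rows=4, alpha=2.0, truncation=10, rng=rng)
    for row in b:
        print(row)
```

Because each pi_k is a running product of numbers in [0, 1], the column probabilities shrink as k grows, which is why each row contains only a few ones even when the truncation level is large.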

IBP Compound Dirichlet Process

• Combining the HDP and the IBP into a single prior yields an infinite “spike-and-slab” prior, the IBP compound Dirichlet process (ICD).

• A spike distribution (IBP) determines which variables are drawn from the slab (DP).

• The model assumes the following generative process

IBP Compound Dirichlet Process

• The atom masses of data group m are Dirichlet distributed as follows

• In this construction, the resulting Dirichlet draw gives the topic proportions for document m, and B is a binary vector indicating which dictionary elements the document uses.
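
The spike-and-slab structure described above can be sketched in code. This is a minimal, truncated illustration under stated assumptions: Beta(alpha, 1) stick-breaking for the selection probabilities, Gamma(gamma, 1) relative atom masses, and a Dirichlet over the selected atoms only. The function name `sample_icd` and all variable names are hypothetical, not taken from the slides.

```python
import random


def sample_icd(num_groups, alpha, gamma, truncation, rng):
    """Draw truncated ICD weights; returns (pis, phis, bs, thetas). Illustrative sketch."""
    # Spike: selection probabilities pi_k (assumed Beta(alpha, 1) stick-breaking).
    pis, stick = [], 1.0
    for _ in range(truncation):
        stick *= rng.betavariate(alpha, 1.0)
        pis.append(stick)
    # Slab: relative atom masses phi_k (assumed Gamma(gamma, 1)), drawn independently of pis.
    phis = [rng.gammavariate(gamma, 1.0) for _ in range(truncation)]

    bs, thetas = [], []
    for _ in range(num_groups):
        # Binary vector selecting which atoms this group may use.
        b = [1 if rng.random() < p else 0 for p in pis]
        # Dirichlet over the selected atoms, via normalized Gamma draws.
        draws = [rng.gammavariate(phi, 1.0) if on else 0.0 for phi, on in zip(phis, b)]
        total = sum(draws)
        # A group that selects no atoms keeps an all-zero weight vector.
        theta = [d / total for d in draws] if total > 0 else draws
        bs.append(b)
        thetas.append(theta)
    return pis, phis, bs, thetas
```

Since `pis` and `phis` are drawn independently in this sketch, an atom that is rarely selected (small pi_k) can still receive a large share of the weight in the groups that do select it.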

Focused Topic Models

• The authors use the ICD to develop the Focused Topic Model (FTM).

• In this framework, a global distribution over topics is drawn and shared over all documents as in HDP-LDA.

• Each document infers a subset of topics from the global menu. The subset is determined by a document-specific binary vector. Since the binary vector is independent of the global topic proportions, topics that are rare globally can still make up a large proportion of individual documents.

Focused Topic Models

• The generative process for the FTM is as follows
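
A generative process of this kind can be sketched as below. This is a hedged illustration, not the authors' specification: it assumes Beta(alpha, 1) stick-breaking for topic selection, Gamma(gamma, 1) topic masses, symmetric Dirichlet(eta) topics over a vocabulary, and document lengths supplied by the caller rather than sampled. All names (`generate_ftm_corpus`, `dirichlet`, `categorical`) are hypothetical.

```python
import random


def dirichlet(params, rng):
    """Draw from a Dirichlet by normalizing independent Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]


def categorical(probs, rng):
    """Draw an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_ftm_corpus(doc_lengths, vocab_size, alpha, gamma, eta, truncation, rng):
    """Generate documents from a truncated focused-topic-style model (sketch)."""
    # Global quantities shared by all documents.
    pis, stick = [], 1.0
    for _ in range(truncation):
        stick *= rng.betavariate(alpha, 1.0)
        pis.append(stick)
    phis = [rng.gammavariate(gamma, 1.0) for _ in range(truncation)]
    topics = [dirichlet([eta] * vocab_size, rng) for _ in range(truncation)]

    corpus = []
    for length in doc_lengths:
        # Document-specific binary vector choosing a subset of topics.
        b = [1 if rng.random() < p else 0 for p in pis]
        if not any(b):
            b[0] = 1  # sketch-only guard so a fixed-length document has a topic to use
        active = [k for k in range(truncation) if b[k]]
        # Topic proportions restricted to the selected topics.
        weights = dirichlet([phis[k] for k in active], rng)
        z, w = [], []
        for _ in range(length):
            k = active[categorical(weights, rng)]
            z.append(k)
            w.append(categorical(topics[k], rng))
        corpus.append({"b": b, "z": z, "w": w})
    return corpus
```

Each returned document records its binary vector `b`, its per-word topic assignments `z`, and its word indices `w`, so the restriction of every assignment to the selected topics can be checked directly.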

Posterior Inference

• To sample the topic indicator for word i in document m, the conditional distribution involves an integral that has an analytical form.

• This is an important point because it suggests a general framework that can be adapted to other applications.

Posterior Inference

• The joint probability of the model parameters and the total number of words assigned to topic k is log differentiable with respect to those parameters.

• A hybrid Monte Carlo algorithm is used to sample from their posteriors.

Posterior Inference

• The topic weights are sampled as

• And the binary topic indicators are sampled as

• Notice here that if a topic is used, it is automatically considered “active”, and additional (unused) topics can be activated.

Empirical Results

• The authors considered three different text datasets:

• All models were run for 1000 iterations, with the first 500 iterations discarded as burn-in.

Empirical Results

• Model Perplexity

• Topic Correlation
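
Perplexity is conventionally the exponentiated negative mean log-likelihood per held-out token, so lower values are better. The sketch below assumes natural-log per-document log-likelihoods as input; the function name and arguments are illustrative, and the slides do not state the exact evaluation protocol the authors used.

```python
import math


def perplexity(doc_log_likelihoods, num_tokens):
    """Return exp(-(total held-out log-likelihood) / (number of held-out tokens))."""
    return math.exp(-sum(doc_log_likelihoods) / num_tokens)


if __name__ == "__main__":
    # A model that assigns probability 1/10 to each of 5 tokens has perplexity 10.
    print(perplexity([5 * math.log(0.1)], 5))
```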

Empirical Results

• In (a), the authors compare the number of topics each word appears in. The FTM has more concentrated topics.

• In (b), the authors show the number of documents the topics appear in. The plot illustrates that HDP has many topics that appear in only a few documents, while a significant portion of the FTM topics appear in many documents.

Discussion

• The authors have proposed a novel model, the IBP compound Dirichlet process (ICD), that decouples across-data topic prevalence from within-data topic proportions.

• The Focused Topic Model (FTM), developed from the ICD, addresses several key shortcomings of HDP-LDA.

• In HDP-LDA, the global topic prevalence constrains the proportion of a document that a topic can occupy, but in the FTM, globally rare topics can still account for a large proportion of an individual document.

• FTM shows improved perplexity relative to HDP.
