
On-line Evolutionary Sentiment Topic Analysis Modeling

YongHeng Chen 1,2, ChunYan Yin 1, YaoJin Lin 2, Wanli Zuo 3

1 School of Information Engineering, Lingnan Normal University, Zhanjiang, Guangdong, China,

2 College of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000, China,

Key Laboratory of Data Science and Intelligence Application, Fujian Province University

E-mail: {YH chen,yjlin}@mnnu.edu.cn

3 College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China

E-mail: [email protected]

Abstract

With the rapid growth of reviews, a valid sentiment analysis model will significantly boost a review recommendation system's capability and present more constructive information to consumers. Probabilistic topic models have already shown many advantages for detecting the latent structure of topics and sentiments in review corpora. However, most reviews arrive as time-dependent data streams, and some aspects of the latent structure, such as the number of topics and the word probability distributions, are unfixed and time-varying. In this paper, a novel probabilistic topic modelling framework is proposed, called on-line evolutionary sentiment/topic modeling (OESTM), which has the capacity to optimize the aforementioned aspects. Firstly, OESTM depends on an improved non-parametric Bayesian model to estimate the number of topics that best explains the current time slice, and analyzes these latent topics and sentiment polarities simultaneously. Secondly, OESTM implements the birth, death and inheritance of detected topics through the transfer of parameters from previous time slices to the current time slice. Experiments show that the proposed model achieves significant improvements over other state-of-the-art models.

Keywords: topic mining, sentiment analysis, nonparametric Bayesian statistics, Markov chain Monte Carlo

1. Introduction

Reviews imply a great deal of valuable sentiment information from consumers on a variety of topics concerning a product or service. For example, restaurant reviews contain praise and complaints over topics such as the taste of the food or the hygiene conditions. The information implied within these reviews directly affects consumers' purchase decision-making. It is even more important for a company to understand the experience of using its product and to optimize its product or service quality. However, it is quite challenging for human beings to detect topics and sentiments over large textual data sets. This has led in recent years to the development of topic-based sentiment mining and retrieval techniques as a new research field, which aims to automatically detect attitudes and emotions with respect to certain

International Journal of Computational Intelligence Systems, Vol. 11 (2018) 634–651


Received 3 June 2017

Accepted 6 January 2018

Copyright © 2018, the Authors. Published by Atlantis Press. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


latent topics implied within reviews.

As a core task of topic-based sentiment mining, topic models that can detect latent topic clusters have been explored recently. Topic models consist mainly of parametric Bayesian and non-parametric Bayesian methods. Latent Dirichlet Allocation (LDA) 1 is one of the most basic and most general parametric Bayesian models. LDA extracts and clusters semantically related topics using the co-occurrence information of terms within a document collection, but it can only detect a predefined number of topics. However, reviews often come as time-dependent data streams, so the number of topics should be learned flexibly and automatically. We therefore assume that the number of mixture components (topics) is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes (DPs), one for each group, where the well-known clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. In this paper, instead of modeling each document as a single data point, we model each document as a Dirichlet process. Each word is then a data point and is associated with a topic sampled from the random measure. The random measure thus represents the document-specific mixing vector over a potentially infinite number of topics. To share the set of topics across documents, Teh et al. introduced the Hierarchical Dirichlet Process (HDP), a typical non-parametric Bayesian model with the ability to estimate the best number of mixture components (topics) 2. In HDP, the document-specific random measures are tied together by modeling the base measure itself as a random measure sampled from a DP. The discreteness of the base measure ensures sharing of topics between all the groups.
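The clustering property mentioned above can be illustrated with a small simulation: under a Dirichlet process prior, the number of distinct clusters (topics) induced by the Chinese restaurant process grows only logarithmically with the number of data points. The sketch below is illustrative only and is not part of the authors' model.

```python
import random

def crp_partition(n_points, alpha, seed=0):
    """Simulate the Chinese restaurant process: each new customer joins an
    existing table with probability proportional to its size, or opens a
    new table with probability proportional to alpha."""
    rng = random.Random(seed)
    table_sizes = []
    for _ in range(n_points):
        # Unnormalized probabilities: existing tables by size, new table by alpha.
        weights = table_sizes + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(table_sizes):
            table_sizes.append(1)   # open a new table (a new topic is born)
        else:
            table_sizes[choice] += 1
    return table_sizes

sizes = crp_partition(1000, alpha=1.0)
print(len(sizes))  # number of clusters; typically close to alpha * log(1000) ≈ 7
```

The number of occupied tables is not fixed in advance, which is exactly the property OESTM relies on to let the topic number be inferred from the data.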

But HDP itself is a static model and does not consider the time-stamp information embedded in reviews. If topics are extracted independently by a static HDP for each dataset grouped by time slice, the evolutionary information is lost. Sentiment analysis also plays an important role in topic-based sentiment mining tasks. Earlier research mainly used categorization approaches and made some achievements, but their basic premise is the need for a training data collection, which requires a substantial amount of time and energy. More importantly, most reviews are not labeled, which poses a problem for traditional approaches. The appearance of the LDA topic model provides a new opportunity for solving these problems. Sentiment analysis approaches based on LDA inherit the benefits of the LDA model, namely that they need no training data collection and identify topics and sentiments precisely. Meanwhile, these approaches also inherit the disadvantage of the LDA model, i.e., the number of topics cannot be learned flexibly and automatically.

Inspired by the above discussion, we propose an on-line evolutionary sentiment/topic model (OESTM) to jointly exploit the sentiment and topic properties of continuous reviews. The motivation of OESTM is to detect and track dynamic sentiments and topics simultaneously over time. OESTM estimates the best number of topics by adding a sentiment level to the Hierarchical Dirichlet Process (HDP) topic model, and then controls the topics' birth, death and inheritance through the proposed time-dependent Chinese restaurant franchise process (CRFP), which adds time-decaying dependencies of historical epochs to the current epoch. Compared with existing sentiment-topic models, the biggest difference of OESTM is that it can determine the topic number automatically. Furthermore, we implement OESTM with a collapsed Gibbs sampling algorithm. The experimental results show that OESTM can effectively detect and track dynamic sentiments and topics.

Compared with the existing methods, the contributions of this work are fourfold:

• The proposed OESTM can effectively and jointly exploit the topic and sentiment information of social media reviews.

• In order to provide a more flexible method to select the number of topics, OESTM is the first to fuse nonparametric HDP into a topic/sentiment analysis model, modeling each review document as a Dirichlet process for topic discovery.

• In order to track the evolution of topics and sentiment, OESTM employs the presented Time-dependent CRFP to accomplish the model's regeneration.

• The main purpose of the sentiment and topic mixture model is to extract sentiments and topics from social media reviews. We apply OESTM to discover dynamic sentiments and topics in real social media data, and compare its performance with ASUM, JST, HDP and LDA. The experimental results show that OESTM outperforms these models in terms of generalization performance, model complexity, and sentiment classification accuracy, which indicates the effectiveness of our dynamic non-parametric model.

2. Related Work

There are two research fields specifically related to this paper: topic-based sentiment analysis and the nonparametric Dirichlet process. In recent years, a great deal of interest has been attracted to sentiment analysis as the number of product/service reviews grows rapidly. Typical early studies concentrated mostly on sentiment classification, which is composed of detecting opinions and sentiment polarities. By introducing a method that combined Conditional Random Fields (CRF) and a variation of AutoSlog, Choi et al. implemented opinion and emotion detection 3. Liu et al. presented an analysis framework to compare consumer sentiment polarity scores of multiple products using a supervised pattern discovery method 4.

To determine the sentiment polarities of a document, Pang et al. proposed a machine learning method that adopts text categorization techniques and minimum cuts in graphs 5. In a different study, Pang et al. achieved sentiment classification at the document level, where a review can be either positive or negative, using machine-learning techniques via the overall sentiment 6. However, these studies merely focused on sentiment classification and did not take the latent topics embedded in the document into account, thus providing insufficient information for consumers and likely leading to inapplicable results. For example, consumers may just want to know the merits and faults (sentiment) of the battery life (topic) of a cell phone, without having to read an overall product evaluation.

Motivated by this observation, researchers consider combining topic extraction with sentiment analysis, called topic-based sentiment analysis. As one of the state-of-the-art topic models, the Latent Dirichlet Allocation (LDA) model, a hierarchical Bayesian network, has gained popularity. LDA builds robust topic summaries as multinomial probability distributions over words for each topic, and can further deduce discrete distributions over topics for each document. In order to use time information to improve topic discovery, TOT, proposed by Wang et al., models time jointly with word co-occurrence patterns based on LDA in an off-line fashion 7. Meo et al. presented a matching algorithm that allows dynamically and autonomously managing the evolution 8,9. Nonetheless, the above-mentioned topic models cannot work in an on-line fashion. This stimulated researchers to search for an optimized model, and ultimately several online topic models have been proposed 10,11,12. Using temporal stream information, the dataset is divided by predefined time slices. At each time slice, documents are supposed to be exchangeable, but this is not true between documents across time slices. This core idea is inherited by this paper. Based on LDA and its extended models, many topic-based sentiment analysis models have been proposed. The joint sentiment/topic model (JST) 13 and the aspect and sentiment unification model (ASUM) 14 are representative. JST implements the detection of sentiment and topic simultaneously based on the LDA model. ASUM constrains the words in a single sentence to come from the same polarity, and is thus called a sentence-level JST model. Since social media data are produced continuously by many uncontrolled users, the dynamic nature of such data requires the sentiment and topic analysis model to be updated dynamically. Time-aware Topic-Sentiment (TTS) 15 and the dynamic joint sentiment-topic model 16 are among the rare works that detect and track dynamic topics and sentiments based on probabilistic topic models. However, TTS jointly modeled time, word co-occurrence and sentiment with no Markov dependencies, treating time as an observed continuous variable. This approach, moreover, works offline, as the whole batch of documents is used once to construct the model. This feature does not suit the online setting where text streams continuously arrive over time.

So far, however, most sentiment analysis models are based on LDA, and thus inherit the defect of the LDA model that the number of topics must be pre-determined. This is insufficient for dynamic and massive social media data. The problem can be resolved by replacing Dirichlet allocation with a nonparametric Bayesian process 17,18,19,20. The Dirichlet process is a typical nonparametric Bayesian method, represented by DP(G0, α), where G0 is a base measure parameter and α is a concentration parameter. A document can be modeled as a DP, and each word in document d is a target object created by the word distribution of a topic sampled from the document's mixing vector over an infinite number of topics. To allow sharing data among the collection of topics across documents, another non-parametric model, the Hierarchical Dirichlet Process, was proposed, which uses Dirichlet processes as the Bayesian prior to solve the topic-number determination problem.

Many models integrating time information based on nonparametric Bayesian methods have recently been proposed to improve topic discovery 21,22. In order to implement dynamic clustering analysis of topics, some nonparametric Bayesian-extended models have been proposed as well 23,24. However, there are several important differences between OESTM and the aforementioned models: (1) OESTM is the first dynamic sentiment-topic mixture model based on the non-parametric HDP topic model; (2) in order to track the trend of the detected topics, OESTM first uses Time-dependent CRFP to realize the development and change of topics over time; (3) OESTM implements a Gibbs sampling process to obtain parameters at each time slice rather than executing global deduction, which is effective for timely evolution updates.

3. Methodology

3.1. Hierarchical Dirichlet Process

Before presenting the on-line evolutionary sentiment/topic modeling (OESTM), let us review the basic Hierarchical Dirichlet Process (HDP). The graphical representations of the HDP and OESTM models are shown in Figure 1.

HDP (see Fig. 1(a)) denotes a document as x_d with d ∈ {1, ..., D}, and N_d as the size of document d; a local random probability measure θ_d denotes the topic distribution of document d. The random probability measure G_0 is a global topic distribution shared by the whole document collection, and is distributed as a Dirichlet process with concentration parameter γ and base probability measure H. Each document d is generated based on the local random measure θ_d, which is also distributed as a Dirichlet process, conditionally independent given G_0, with concentration parameter α and base probability measure G_0. Let k_d1, k_d2, ... be independent random variables distributed as the local measure θ_d; each k_di is the topic assignment of a single observation, the ith word within the dth document. Then the word x_di is generated from the conditional distribution F(k_di) given k_di. In order to simplify inference in the sampling process, F is often chosen to be a multinomial distribution, which then forms a conjugate pair with the base measure H. The likelihood is given by:

    G_0 | γ, H ∼ DP(γ, H)
    θ_d | α, G_0 ∼ DP(α, G_0)
    k_di | θ_d ∼ θ_d,   d ∈ {1, ..., D}
    x_di | k_di ∼ F(k_di),   i ∈ {1, ..., N_d}      (1)
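For readers who prefer code to measure-theoretic notation, the two-level construction in Eq. (1) can be approximated with a truncated stick-breaking representation: the global measure G_0 places weights β over a finite pool of K candidate topics, and each document draws its own topic weights from a Dirichlet centered on β. This is an illustrative sketch with an arbitrary truncation level K and vocabulary size, not the authors' implementation.

```python
import numpy as np

def truncated_hdp(n_docs, words_per_doc, vocab_size=50, K=20,
                  gamma=1.0, alpha=1.0, eta=0.1, seed=0):
    """Generate documents from a truncated stick-breaking approximation of HDP."""
    rng = np.random.default_rng(seed)
    # Global topic weights beta via stick-breaking with concentration gamma.
    sticks = rng.beta(1.0, gamma, size=K)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - sticks[:-1])))
    beta = sticks * remaining
    beta /= beta.sum()                      # renormalize due to truncation
    # Topic-word distributions play the role of F with Dirichlet base measure H.
    phi = rng.dirichlet(np.full(vocab_size, eta), size=K)
    docs = []
    for _ in range(n_docs):
        theta = rng.dirichlet(alpha * beta)                    # document-level DP draw (approx.)
        z = rng.choice(K, size=words_per_doc, p=theta)         # topic assignments k_di
        words = [rng.choice(vocab_size, p=phi[k]) for k in z]  # x_di ~ F(k_di)
        docs.append(words)
    return docs

docs = truncated_hdp(n_docs=5, words_per_doc=10)
print(len(docs), len(docs[0]))  # prints: 5 10
```

Because every document's weights are drawn around the shared β, the same topics are reused across documents, which is the sharing property the discrete base measure provides in the exact model.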

The Hierarchical Dirichlet Process can readily be extended to more than two levels. That is, the base measure H can itself be a draw from a DP, and the hierarchy can be extended for as many levels as are deemed useful. In general, we obtain a tree in which a DP is associated with each node, the children of a given node are conditionally independent given their parent, and the draw from the DP at a given node serves as a base measure for its children.


[Figure: graphical models. Panel (a) shows HDP with plates D and N over variables x_di, z_di, θ_d, G_0 and hyperparameters α, γ, H. Panel (b) shows OESTM across time slices t−1, t, t+1, each with G^t_0, θ^t_d, word W_d,i, topic Z_d,i and sentiment S_d,i variables, topic-sentiment distributions φ_z,s with prior ρ, sentiment distributions π_z with prior λ, and plates D, N, KS.]

Fig. 1. OESTM is shown with one inspiration model: (a) HDP model; (b) OESTM model.

3.2. On-line Evolutionary Sentiment/topic Modeling

While HDP has the capability of determining the appropriate number of latent topics, it is not adequate for tracking the trend of topics, nor does it have the capability to combine sentiment labels into the training procedure. This stimulates us to propose OESTM (see Fig. 1(b)). The review dataset x is divided according to time slices, x = {x^1, x^2, ..., x^T}, where T denotes the number of time slices and x^t represents the set of reviews whose publishing times fall in time slice t. Moreover, x^t = {x^t_1, x^t_2, ..., x^t_Dt}, where D_t denotes the number of reviews within time slice t. Each review is presented as a series of words, x^t_d = (x^t_di) for i = 1, ..., N^t_d, where N^t_d is the number of words within review x^t_d. Let us suppose a review is presented as a probability distribution over infinitely many topics, and each topic is a probability distribution over words. The target of on-line evolutionary sentiment/topic modeling is to estimate the number of topics, and to track the development of each topic by analyzing the change of the topic's word distribution and sentiment polarity at different time slices.

In order to further integrate the time information into HDP, the base measure G_0 should be dynamically calculated for each time slice; that is, the number of mixture components at each time point is unbounded, the components themselves can be retained, die out or emerge over time, and the actual parameterization of each component can also evolve over time in a Markovian fashion. By considering previous time slices, the base measure G^t_0 at the current time slice can be obtained as follows:

    G^t_0 | G_0, γ, φ_1:k ∼ DP( γ + Σ_k Σ_{δ=1..Δ} E(ν,δ)·d^{t−δ}_k ,
        ( G_0 + Σ_{δ=0..Δ} E(ν,δ)·G^{t−δ}_0 ) / ( 1 + Σ_{δ=0..Δ} E(ν,δ) ) )      (2)

where the decay function E(ν,δ) = exp(−δ/ν) is an exponential kernel that manages the weight of topic k at time slice t−δ. ν and Δ define the decay factor of the time-decaying kernel and the time window that influences the current time slice. Each epoch is independent when Δ = 0, and time is ignored when Δ = T and ν = ∞. In between, the values of these two parameters affect the expected life span of a given component: the larger the values of Δ and ν, the longer the expected life span of the topic, and vice versa. If we let d^t_k denote the number of parameters in epoch t associated with component φ_k, then Σ_{δ=1..Δ} E(ν,δ)·d^{t−δ}_k is the prior weight of component k at epoch t. Furthermore, in order to incorporate sentiment polarity labels to realize sentiment classification, our model constructs the relation between topics and sentiment labels following ideas from JST 13. OESTM implements sentiment analysis by adding an additional sentiment layer between the document and the topic. Hence, OESTM is a four-layer model, where sentiment labels are associated with documents, topics are associated with sentiment labels, and words are associated with both sentiment labels and topics.

The formal definition of the generative process in the OESTM model, corresponding to the graphical model, is as follows:

(1) Generate the global topic distribution at time slice t: G^t_0 | G_0, γ, φ_1:k

(2) Generate the neutral-word distribution for each topic: φ_k ∼ H

(3) Generate the sentiment-word distribution given topic and sentiment: φ_k,s ∼ Dir(ρ)

(4) For each document d:

    (4.1) Generate the local topic distribution of the document: θ^t_d ∼ DP(α, G^t_0)

    (4.2) For the ith word in document d:

        (4.2.1) Draw a topic assignment: z_d,i ∼ Mult(θ^t_d)

        (4.2.2) Generate the sentiment distribution of the topic: π_z ∼ Dir(λ)

        (4.2.3) Draw a sentiment assignment: s_d,i ∼ Mult(π_z)

        (4.2.4) Draw the word: w_d,i ∼ φ_z, φ_z,s
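Under stated assumptions (a fixed truncation K standing in for the nonparametric pool drawn from G^t_0, a uniform stand-in for the global topic weights, and a single per-topic/sentiment word distribution replacing the φ_z / φ_z,s pair), steps (1)–(4.2.4) above can be sketched as:

```python
import numpy as np

def generate_doc(n_words, K=10, S=3, V=100, alpha=1.0,
                 lam=0.5, rho=0.1, seed=0):
    """Sketch of OESTM's per-document generative process with a fixed
    topic truncation K, S sentiment labels and vocabulary size V."""
    rng = np.random.default_rng(seed)
    beta = np.full(K, 1.0 / K)                        # uniform stand-in for G_0^t
    phi = rng.dirichlet(np.full(V, rho), size=(K, S)) # phi_{k,s}: word dist per topic/sentiment
    pi = rng.dirichlet(np.full(S, lam), size=K)       # pi_z: sentiment dist per topic
    theta = rng.dirichlet(alpha * beta)               # theta_d ~ DP(alpha, G_0^t), truncated
    doc = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)        # (4.2.1) topic assignment
        s = rng.choice(S, p=pi[z])        # (4.2.3) sentiment assignment
        w = rng.choice(V, p=phi[z, s])    # (4.2.4) word drawn given topic and sentiment
        doc.append((w, z, s))
    return doc

doc = generate_doc(20)
print(len(doc))  # prints: 20
```

The four layers are visible in the sampling order: document → sentiment-bearing topic assignment → sentiment label → word.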

3.3. Time-dependent CRFP

[Figure: an illustration of the time-dependent CRFP metaphor, with a restaurant level (customers seated at dining tables in restaurants) and a global menu level (the set of dishes shared across restaurants).]

Fig. 2. Time-dependent CRFP.

The Chinese restaurant process (CRP) is a metaphor closely connected to Dirichlet processes, and is therefore useful in applications of nonparametric Bayesian methods. CRP is a discrete-time stochastic process, analogous to seating customers at tables in a Chinese restaurant. The Chinese restaurant franchise process (CRFP) extends the CRP to allow multiple restaurants that share a set of dishes; it is a random process that produces an exchangeable partition of data points and allows multiple data points to share a set of topics 32. CRFP is usually employed to simulate the HDP process. In this metaphor each restaurant maintains its own set of tables but shares the same set of mixtures. A customer at a restaurant can choose to sit at an existing table with a probability proportional to the number of customers sitting at that table, or start a new table with some probability and choose its dish from a global distribution. Owing to its expansibility and hierarchy, CRFP is widely applied in non-parametric models. Although CRFP has the capacity to construct data points using a set of topics and allows the number of topics to be infinite, it is a static process and cannot track the development of latent topics and the probability distributions of words. This paper uses CRFP to construct a mixture model of grouped data, but replaces the first level of CRFP with a novel time-dependent random process. The modified CRFP is called time-dependent CRFP, which can estimate the optimal number of topics for the current time slice by considering influences from previous time slices.

A wide array of metaphors and conceptions are employed in CRFP. An example of time-dependent CRFP is shown in Figure 2. A document is represented as a restaurant, and words are represented as customers in the restaurants. Words that convey a homogeneous semantic theme are grouped together, as customers of similar taste sit at the same table. A dish is chosen for each table from the global menu (dish tree), which corresponds to a topic assignment for each group of words. At the restaurant level, each restaurant is denoted by a rectangle, and customers (small circles) sit around different dining tables (big circles), each associated with a dish in that restaurant. At the global menu level, the set of dishes (topics) is served in common to all restaurants. x^t_di represents the ith customer in restaurant d at time slice t. θ^t_di represents the dish enjoyed by this customer, ψ^t_dj represents the dish at the jth table, and φ^t_k represents dish k on the menu. In order to record the relations among customers, tables and dishes, this paper introduces two index variables: b^t_di represents the index of the table chosen by customer x^t_di, and k^t_dj represents the index of the dish at table j in restaurant d at time slice t. So we have ψ^t_{d,b^t_di} = θ^t_di and φ^t_{k^t_dj} = ψ^t_dj. This paper uses time-dependent CRFP to implement the assignment of customers to dining tables and the relation between tables and dishes.

Table assignment

At time slice t, the ith customer enters restaurant d. This customer picks the jth dining table with probability:

    n^t_dj / (n^t_d − 1 + α)      (3)

where n^t_dj and n^t_d denote, respectively, the number of customers around dining table j and in restaurant d, enjoying the dish ψ^t_{d,b^t_dj} ordered from the global menu by the first customer who sat at that table. α is a parameter governing the likelihood of choosing a new table. Alternatively, this customer can select a new table with probability:

    α / (n^t_d − 1 + α)      (4)

Dish assignment

We first give some notation for dish assignment. Customers in a restaurant sit around different tables, and each table is associated with a dish (topic) ψ according to the dish menu (global topics). Let Nt^t_i represent the number of dining tables within restaurant i at time slice t. The number of dining tables that have ordered dish k across all restaurants at time slice t is represented by d^t_k, which is calculated as follows:

    d^t_k = Σ_{i=1..D_t} Σ_{j=1..Nt^t_i} 1[ψ^t_{i,b^t_ij} = k]      (5)

where 1[ψ^t_{i,b^t_ij} = k] is a conditional expression: if the condition is met, the count of dining tables that have ordered dish k is incremented by one.

In order to integrate the historical influences from previous time slices, another parameter d′^t_k is defined as follows:

    d′^t_k = Σ_{δ=1..Δ} E(ν,δ)·d^{t−δ}_k      (6)
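Eq. (6) is a direct exponentially-decayed sum over the Δ preceding epochs. A minimal sketch, where the list layout for per-epoch counts is a hypothetical convenience and the numbers are illustrative:

```python
import math

def decay(nu, delta):
    """Exponential kernel E(nu, delta) = exp(-delta / nu)."""
    return math.exp(-delta / nu)

def historical_weight(topic_counts, nu, window):
    """Prior weight d'_k^t of a topic at the current epoch (Eq. 6):
    sum over the last `window` epochs of E(nu, delta) * d_k^{t-delta}.
    topic_counts[delta - 1] holds d_k^{t-delta} (illustrative layout)."""
    return sum(decay(nu, delta) * topic_counts[delta - 1]
               for delta in range(1, window + 1))

# A topic used by 8, 5 and 1 tables in the three preceding epochs:
w = historical_weight([8, 5, 1], nu=2.0, window=3)
print(round(w, 3))  # prints: 6.915
```

Note how recent epochs dominate: with ν = 2, the epoch immediately before t contributes exp(−1/2) ≈ 0.61 of its count, while the third-oldest contributes only exp(−3/2) ≈ 0.22.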

So the popularity of a topic at epoch t depends both on its usage at this epoch, d^t_k, and on its historic usage at the preceding epochs, d′^t_k. If the customer picks a new table and chooses a dish k already ordered by previous customers from the menu at time slice t, the probability is as follows:

    (d′^t_k + d^t_k) / ( Σ_{s=1..K_t} (d^t_s + d′^t_s) + γ )      (7)

where K_t is the number of dishes at time slice t. This means that a topic is considered dead only when it is unused for Δ consecutive epochs. For simplicity, we let d′^t_k = 0 for newly-born topics at epoch t, and d^t_k = 0 for topics available to be used (i.e. having d′^t_k > 0) but not yet used in any document at epoch t. If a dish was ordered by customers in the Δ previous time slices but has not yet been ordered by customers at time slice t, then the distribution of this dish changes in a Markovian fashion: φ^t_k | φ^{t−1}_k ∼ p(·|φ^{t−1}_k) (i.e. the word probability distribution of the topic is updated). The probability is as follows:

    d′^t_k / ( Σ_{s=1..K_t} (d^t_s + d′^t_s) + γ ) · P(·|φ^{t−1}_k)      (8)

If this dish has not been ordered at any time slice, i.e., it is a new dish, then the number of dishes K_t is incremented by one, and the customer selects a new dish φ_k ∼ H. The probability is as follows:

    γ / ( Σ_{s=1..K_t} (d^t_s + d′^t_s) + γ ) · H      (9)

More formally, putting the aforementioned formulas together, we have

\theta^t_{di} \mid \theta^t_{d,-i}, \alpha, \psi^{t-\Delta:t} \sim \sum_{j=1}^{B^t_d} \frac{n^t_{dj}}{n^t_d - 1 + \alpha} \, \xi_{\psi^t_{dj}} + \frac{\alpha}{n^t_d - 1 + \alpha} \, \xi_{\psi^t_{dj_{new}}}

\psi^t_{dj_{new}} \mid \psi, \gamma \sim \sum_{k:\, d^t_k > 0} \frac{d'^t_k + d^t_k}{\sum_{s=1}^{K^t} (d^t_s + d'^t_s) + \gamma} \, \xi_{\phi^t_k} + \sum_{k:\, d^t_k = 0} \frac{d'^t_k}{\sum_{s=1}^{K^t} (d^t_s + d'^t_s) + \gamma} \, P(\cdot \mid \phi^{t-1}_k) + \frac{\gamma}{\sum_{s=1}^{K^t} (d^t_s + d'^t_s) + \gamma} \, H    (10)

where B^t_d represents the number of dining-tables in restaurant d at time slice t and \xi denotes probability measures concentrated at \psi and \phi.
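The dish-choice probabilities of equations (6)-(9) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation; in particular, it assumes the exponential time-decaying kernel has the form E(ν, δ) = exp(−δ/ν), which the paper only names as an exponential kernel.

```python
import numpy as np

def historic_counts(d_history, nu):
    """d'_k of Eq. (6): exponentially decayed dish usage over the last
    Delta epochs. d_history[delta-1][k] holds d_k^{t-delta}.
    Assumes the kernel E(nu, delta) = exp(-delta / nu)."""
    d_prime = np.zeros(len(d_history[0]))
    for delta, past in enumerate(d_history, start=1):
        d_prime += np.exp(-delta / nu) * np.asarray(past, dtype=float)
    return d_prime

def dish_probabilities(d_t, d_prime, gamma):
    """Weights of Eqs. (7)-(9): reuse an active dish, revive a decayed
    dish (d_k^t = 0 but d'_k > 0), or order a brand-new dish (mass gamma)."""
    d_t = np.asarray(d_t, dtype=float)
    denom = (d_t + d_prime).sum() + gamma
    reuse = (d_t + d_prime) / denom   # Eq. (7), for dishes with d_k^t > 0
    revive = d_prime / denom          # Eq. (8) weight, for d_k^t = 0
    new = gamma / denom               # Eq. (9) weight for a new dish
    return reuse, revive, new

# Toy example: 3 dishes, one previous epoch of usage.
reuse, revive, new = dish_probabilities(
    [3, 0, 1], historic_counts([[1, 2, 0]], nu=0.5), gamma=1.0)
```

Note that summing the reuse weights over all dishes plus the new-dish mass yields exactly 1, which is a quick sanity check on the normalizer.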

4. Approximate Posterior Inference

In this section, the collapsed Gibbs sampling algorithm 25,26 is utilized for posterior sampling of the table assignments and of the dishes that serve each table in each restaurant. To perform Gibbs sampling, a Markov Chain Monte Carlo (MCMC) procedure based on the time-dependent CRFP is constructed: it integrates the parameters b^t_{di}, k^t_{dj}, \phi^t_k and s^t_{di} into a joint probability distribution, and its states converge to a sample from this joint distribution. Here b^t_{di} is the index of the table assigned to customer x^t_{di}, k^t_{dj} is the index of the dish enjoyed by table j, and s^t_{di} is the sentiment assignment of customer x^t_{di}, all in restaurant d at time slice t. To use the Gibbs sampling algorithm for posterior inference, we add a superscript to a variable to indicate the same quantity computed without the contribution of the indicated object. For example, n^{-t,di}_{db} is the number of customers sitting at table b in document d at epoch t without the contribution of word x^t_{di}. To infer the posterior probability distribution of

these variables, the conditional probabilities of x^t_{di} (i.e., each customer in restaurant d at time slice t) and x^t_{db} (i.e., all consumers who pick the bth dining-table in restaurant d at time slice t) must be derived first. Assume the Dirichlet prior distribution H samples a topic (or dish) \phi^t_k with density h(\phi^t_k \mid \eta), and the multinomial distribution samples a word (or customer) x^t_{di} from topic \phi^t_k with probability f(x^t_{di} \mid \phi^t_k). Given all the previous words except the considered ith word in document d at time slice t, the conditional posterior for x^t_{di} is:

f^{-x^t_{di}}_k(x^t_{di}) = p(x^t_{di} \mid \mathbf{x}^{-x^t_{di}}, \mathbf{b}, \mathbf{k}) = \frac{\int f(x^t_{di} \mid \phi^t_k) \prod_{d'i' \neq di,\, z_{d'i'}=k} f(x^t_{d'i'} \mid \phi^t_k) \, h(\phi^t_k \mid \eta) \, d\phi^t_k}{\int \prod_{d'i' \neq di,\, z_{d'i'}=k} f(x^t_{d'i'} \mid \phi^t_k) \, h(\phi^t_k \mid \eta) \, d\phi^t_k}    (11)

As the base measure H and the multinomial distribution (i.e., the word distribution of a topic) are conjugate, the above formula can be simplified to:

f^{-x^t_{di}}_k(x^t_{di} = v) = \frac{n^{-x^t_{di},v}_k + \eta}{n^{-x^t_{di}}_k + V\eta}    (12)

where n^{-x^t_{di},v}_k is the count of word v in topic k excluding x^t_{di}, n^{-x^t_{di}}_k is the number of words in topic k excluding x^t_{di}, and V is the size of the word vocabulary.
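Thanks to the conjugacy behind equation (12), the predictive word distribution of a topic is a direct function of its counts. A minimal sketch with hypothetical variable names:

```python
from collections import Counter

def topic_word_posterior(assigned_words, eta, vocab):
    """Collapsed predictive distribution of Eq. (12) for one topic:
    p(v | topic k) = (n_k^v + eta) / (n_k + V * eta), where the counts
    are taken over the words currently assigned to the topic."""
    counts = Counter(assigned_words)
    n_k, V = len(assigned_words), len(vocab)
    return {v: (counts[v] + eta) / (n_k + V * eta) for v in vocab}

# Toy topic with three assigned word tokens over a 3-word vocabulary.
probs = topic_word_posterior(["food", "food", "taste"],
                             eta=0.1, vocab=["food", "taste", "salty"])
```

The smoothing term η guarantees every vocabulary word keeps non-zero mass, and the denominator makes the distribution normalize to 1 over the vocabulary.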

Given all words except the words x^t_{db}, the conditional posterior for x^t_{db} is:

f^{-x^t_{db}}_k(x^t_{db}) = \frac{\Gamma(n^{-x^t_{db}}_k + V\eta)}{\Gamma(n^{-x^t_{db}}_k + n_{x^t_{db}} + V\eta)} \prod_v \frac{\Gamma(n^{-x^t_{db},v}_k + n^{x^t_{db},v} + \eta)}{\Gamma(n^{-x^t_{db},v}_k + \eta)}    (13)

According to the time-dependent CRFP and the sentiment analysis requirement, we employ a four-stage inference process.

Sampling table b  For each customer at time slice t, the distribution of table b^t_{di} for the specific x^t_{di} is determined by the number of consumers around each table, which is given by:


p(b^t_{di} = b \mid \mathbf{b}^{-t}_{di}, \mathbf{k}^{t-\Delta:t}, x^t_{di}) \propto \begin{cases} n^{-t,di}_{db} \cdot f^{-x^t_{di}}_{k^t_{db}}(x^t_{di}) & \text{existing table } b \\ \alpha \, P(k^t_{db_{new}} = k \mid \mathbf{k}^{-tdi}_{t-\Delta:t}, \mathbf{b}^{-t}_{di}, x^t_{di}) & \text{new table} \end{cases}    (14)

Several points explain equation (14). A word has two choices: either sit at an existing table, or sit at a new table and choose a new topic for it. In the second case, we need to sample a topic for the new table, which leads to equation (15), where the probability of sitting at a new table is constructed by marginalizing over all available dishes.

P(k^t_{db_{new}} = k \mid \mathbf{k}^{-tdi}_{t-\Delta:t}, \mathbf{b}^{-t}_{di}, x^t_{di}) \propto \begin{cases} (d^{-t,db_{new}}_k + d'^t_k) \, f^{-x^t_{di}}_k(x^t_{di}) & k \text{ is being used: } d^{-t,db_{new}}_k > 0 \\ d'^t_k \, f^{-x^t_{di}}_k(x^t_{di}) & k \text{ is available but not used: } d'^t_k > 0 \\ \gamma \, f^{-x^t_{di}}_{k_{new}}(x^t_{di}) & k \text{ is a new topic} \end{cases}    (15)
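Combining equations (14) and (15), table assignment reduces to drawing from a set of unnormalized weights. The following schematic sketch assumes precomputed likelihood terms: `f_existing[b]` stands in for f applied at the dish of table b, and `p_new_topic_total` for the marginal of equation (15) over all dish choices; neither name comes from the paper.

```python
import random

def sample_table(n_db, f_existing, p_new_topic_total, alpha):
    """Eq. (14) sketch: the customer joins existing table b with weight
    n_db[b] * f_existing[b], or opens a new table with weight
    alpha * p_new_topic_total. Returns the chosen table index,
    or -1 to signal a new table."""
    weights = [n * f for n, f in zip(n_db, f_existing)]
    weights.append(alpha * p_new_topic_total)
    r, acc = random.uniform(0, sum(weights)), 0.0
    for b, w in enumerate(weights):
        acc += w
        if r <= acc:
            return b if b < len(n_db) else -1
    return -1

random.seed(1)
samples = [sample_table([100.0], [1.0], 0.001, 0.1) for _ in range(50)]
```

With a heavily occupied table the new-table branch is rarely chosen, mirroring the rich-get-richer behavior of the CRFP.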

Sampling topic k  Once the assignment of tables is complete, posterior sampling of dish k can be carried out. Sampling dish k^t_{dj} proceeds as in the equation above, but we must consider the probability of a small group of words (the consumers at a given table). The conditional probability is estimated as follows:

P(k^t_{db} = k \mid \mathbf{k}^{-tdi}_{t-\Delta:t}, x^t_{db}) \propto \begin{cases} (d^{-t,db_{new}}_k + d'^t_k) \, f^{-x^t_{db}}_k(x^t_{db}) & k \text{ is being used: } d^{-t,db_{new}}_k > 0 \\ d'^t_k \, f^{-x^t_{db}}_k(x^t_{db}) & k \text{ is available but not used: } d'^t_k > 0 \\ \gamma \, f^{-x^t_{db}}_k(x^t_{db}) & k \text{ is a new topic} \end{cases}    (16)

Sampling the topic is important, as it can change the membership of all data sitting at a table and leads to a well-mixed MCMC.

Sampling φk  Given b, k and the observed x, the posterior conditional distribution of each \phi_k depends only on the customers enjoying dish k, and is estimated as follows:

P(\phi^t_k \mid \mathbf{b}, \mathbf{k}, \mathbf{x}, \phi^{-t}_k) \propto h(\phi^t_k \mid \eta) \prod_{di:\, k_{d b_{di}} = k} f(x^t_{di} \mid \phi^t_k)    (17)

Sampling s  Let s^t_{di} denote the sentiment polarity of the ith word in document d at time slice t. After topic inference, a sentiment must be chosen for each word under its topic; that is, we sample the sentiment polarity s^t_{di} for every customer enjoying a dish after dish inference. We apply a Dirichlet prior for the sentiment distribution. Under aspect k at epoch t, the sentiment inference is made as in equation (18):

p(s^t_{di} = s \mid k) \propto \frac{n^{-t,x_{di}}_{ksx} + \rho_s}{n^{-t,x_{di}}_{ks} + V \cdot S} \cdot \frac{n^{-t,x_{di}}_{ks} + \lambda}{n^{-t,x_{di}}_{k} + S \cdot \lambda}    (18)

where n^{-t,x_{di}}_{ksx} is the number of times word x has been assigned to sentiment s under topic k excluding x_{di}, n^{-t,x_{di}}_{ks} is the number of words assigned to sentiment s under topic k excluding x_{di}, n^{-t,x_{di}}_{k} is the word count of topic k excluding x_{di}, and S is the number of sentiments.
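Equation (18) assigns each sentiment label an unnormalized weight; sampling then draws from the normalized weights. A sketch under the count conventions above, following the denominator forms of equation (18) as printed:

```python
import random

def sample_sentiment(n_ksx, n_ks, n_k, rho, lam, V):
    """Draw a sentiment label for one word under topic k per Eq. (18).
    n_ksx[s]: count of this word with sentiment s under topic k;
    n_ks[s]: words with sentiment s under topic k; n_k: words in topic k
    (all counts exclude the current word). rho: per-sentiment Dirichlet
    prior; lam: sentiment-distribution prior; V: vocabulary size."""
    S = len(n_ks)
    weights = [
        (n_ksx[s] + rho[s]) / (n_ks[s] + V * S)
        * (n_ks[s] + lam) / (n_k + S * lam)
        for s in range(S)
    ]
    r, acc = random.random() * sum(weights), 0.0
    for s, w in enumerate(weights):
        acc += w
        if r <= acc:
            return s
    return S - 1

random.seed(0)
samples = [sample_sentiment([50, 0, 0], [60, 1, 1], 62,
                            [1, 1, 1], 1.0, 50) for _ in range(200)]
```

In the toy counts above, the word co-occurs overwhelmingly with sentiment 0 under the topic, so the sampler almost always returns label 0.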

The OESTM model adopts an indirect MCMC sampling method to infer the distribution parameters \theta, \phi and \pi. Using these distribution parameters, the latent topics, the sentiment polarities and the representative words of each topic can be mined.

5. Experiments

5.1. Datasets Presetting

To carry out the experiments we use two review datasets. The first dataset is composed of restaurant reviews from the website Yelp.com. The second dataset is a collection of hotel reviews that has been used previously 27. These datasets were preprocessed by (1) deleting stop-words and non-English alphabets; (2) deleting low-frequency words with fewer than six appearances, as well as reviews shorter than seven words; (3) applying the Snowball algorithm to stem words.
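The three preprocessing steps can be sketched as below. The `stem` callback is a pluggable stand-in for the Snowball stemmer (identity by default), so this is an illustration of the pipeline shape, not the authors' exact code.

```python
import re
from collections import Counter

def preprocess(reviews, stopwords, min_word_freq=6, min_review_len=7,
               stem=lambda w: w):
    """Preprocessing sketch following Sec. 5.1: (1) drop stop-words and
    non-English characters, (2) drop words appearing fewer than six
    times and reviews shorter than seven words, (3) stem via `stem`
    (e.g. NLTK's SnowballStemmer("english").stem could be plugged in)."""
    token_re = re.compile(r"[a-z]+")           # keeps English-alphabet tokens
    tokenized = [
        [stem(w) for w in token_re.findall(r.lower()) if w not in stopwords]
        for r in reviews
    ]
    freq = Counter(w for doc in tokenized for w in doc)
    docs = [[w for w in doc if freq[w] >= min_word_freq] for doc in tokenized]
    return [doc for doc in docs if len(doc) >= min_review_len]
```

Frequency filtering is done over the whole corpus before the review-length filter, so a review can be discarded once its rare words are removed.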

Sentiment analysis is much more challenging than topic detection, because consumers express


their attitudes in subtle ways, whereas topic detection is implemented simply on the basis of word co-occurrence. One technique for increasing the accuracy of sentiment analysis is to integrate prior data, i.e., a sentiment lexicon. In this section, three sentiment lexicons, Paradigms 28, Mutual Information (MI) 29 and MPQA 30, are used to improve sentiment classification accuracy. Table 1 shows the properties of the datasets and the sentiment lexicon information adopted in our experiments.

Table 1. The properties of the data sets and sentiment lexicon.


Unless otherwise stated, in these experiments the parameters were set to the following values: the hyperparameter \eta of the base probability measure H was set to 10; the concentration parameters \gamma and \alpha were obtained from vague gamma priors, \gamma \sim \Gamma(1, 0.1) and \alpha \sim \Gamma(1, 1); the number of continuous time slices \Delta = 4; and the number of time slices T = 20. To enable comparison with other models, the parameters of the LDA-based models were set to the Dirichlet hyperparameters \alpha = 0.5, \beta = 0.02. The hyperparameters \lambda and \rho were set to 1.0 and {10^{-7}, 0.01, 2.5}. The parameter \nu of the exponential kernel was set to 0.5.

5.2. Perplexity

Document modeling aims to estimate the density of the latent configuration of the data, and a general way to evaluate this is to measure the model's performance on previously unobserved documents. Perplexity is a canonical measure of goodness used in language modeling to measure the likelihood that held-out test data is generated from the potential distributions of the model. In this subsection, we use the perplexity of the statistical model on the test review datasets to measure generalization performance once the model reaches its steady state; lower perplexity indicates better generalization performance. We split the data into 80% for training and 20% for testing, with the split proportion consistent across time slices. Formally, given the testing dataset, the perplexity value is calculated as follows.

perplexity(\mathbf{x}) = \exp\!\left( - \sum_{j=1}^{J} \sum_{i=1}^{N_j} \log p(x_{ji}) \Big/ \sum_{j=1}^{J} N_j \right)    (19)

Four models (LDA, ASUM, HDP and OESTM)

are compared on the two datasets in terms of perplexity. First, the number of topics for LDA and ASUM was set to 20; the first two rows of Figure 3 illustrate perplexity as a function of the number of iterations of the Gibbs sampler. Second, the number of iterations for all four models was set to 100; the last two rows of Figure 3 show perplexity as a function of the number of topics. Because the nonparametric Bayesian models, HDP and OESTM, do not depend on a preset number of topics, their perplexity values are fixed in the latter plots.
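Equation (19) is straightforward to compute once per-word log-likelihoods under the trained model are available; in the sketch below, `log_prob` is an assumed callback standing in for the trained model, not part of the paper.

```python
import math

def perplexity(test_docs, log_prob):
    """Eq. (19): exponential of the negative average per-word
    log-likelihood over held-out documents. log_prob(j, i) returns
    log p(x_ji) under the trained model (assumed callback)."""
    total_ll, total_words = 0.0, 0
    for j, doc in enumerate(test_docs):
        for i in range(len(doc)):
            total_ll += log_prob(j, i)
        total_words += len(doc)
    return math.exp(-total_ll / total_words)

# Sanity check: a uniform model over 100 words has perplexity 100.
val = perplexity([[0] * 5, [0] * 3], lambda j, i: math.log(0.01))
```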

As shown in Figure 3, the HDP and OESTM models cluster documents more effectively than the LDA and ASUM models, i.e., they have better generalization performance and lower perplexity values. The results also show that, for LDA and ASUM, picking the right number of topics is key to good performance: with too few topics the result suffers from under-fitting, while blindly increasing the number of topics causes over-fitting. The number of topics obtained automatically by the non-parametric OESTM and HDP models is consistent with the best-fitting range of the LDA and ASUM models. Moreover, OESTM is slightly better than HDP, which implies that OESTM can better find topics through


Fig. 3. Perplexity scores on two datasets for different models. The first two rows show perplexity for different numbers of iterations; the last two rows show perplexity for different numbers of topics.

incorporating time and sentiment information to provide a better prior for the content of emerging documents. Overall, however, the perplexity values of OESTM are similar to those of the HDP model.

5.3. Complexity

Nonparametric Bayesian methods are often used to sidestep model selection by integrating over all instances (and all complexities) of the model at hand (e.g., the number of clusters). The model, though hidden and random, still lurks in the background. Here we study its posterior distribution with the desideratum that, between two equally good predictive distributions, a simpler model, or a posterior peaked at a simpler model, is preferred.

In this subsection, we measure model complexity to evaluate the non-parametric Bayesian models 31. To compute model complexity, the complexity of a topic must first be defined. The complexity of a topic is proportional to the number of words allocated to it: it is zero if no unique words are allocated to the topic, and otherwise equals the number of unique words allocated to it. We express the complexity of topic k as follows:

Complexity_k = \sum_d 1\!\left[ \left( \sum_i 1[K_{di} = k] \right) > 0 \right]    (20)

where K_{di} is the topic assignment of the ith word in document d. For the posterior topic allocation of a Gibbs sample, the complexity of the model is the sum of all topics' complexities plus the number of topics, computed as follows:

Complexity = \#topics + \sum_k Complexity_k    (21)

The complexity analysis considers the number of topics employed to describe the dataset according to Equation (21). A higher model complexity indicates that the model needs more topics to express the dataset, i.e., the dataset is partitioned into more dimensions. Hence a lower complexity indicates a better model, provided the perplexity results are comparable. Figure 4 illustrates complexity as a function


of the number of iterations of the Gibbs sampler for the two datasets. As shown in Figure 4, the OESTM model has lower average complexity than HDP in all cases. Although the advantage of OESTM over HDP in terms of perplexity may be small, taking average complexity into account, the overall effect of the OESTM model is superior.
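Given per-word topic assignments from a Gibbs sample, the complexity of equations (20)-(21) reduces to simple counting. This sketch follows the unique-word reading of a topic's complexity given in the text:

```python
def model_complexity(assignments):
    """Eq. (21): number of topics plus per-topic complexities, where a
    topic's complexity is taken here as the number of unique words
    allocated to it (a topic with no words never appears).
    assignments[d][i] = (word, topic) for word i of document d."""
    words_per_topic = {}
    for doc in assignments:
        for word, topic in doc:
            words_per_topic.setdefault(topic, set()).add(word)
    return (len(words_per_topic)
            + sum(len(ws) for ws in words_per_topic.values()))
```

For example, two documents whose words fall on two topics with three unique words in total give a complexity of 2 + 3 = 5.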


Fig. 4. Model complexity comparison of HDP and OESTM models.

5.4. Sentiment Classification

Three sentiment polarity labels, positive, negative and neutral, are associated with words. l(x) denotes the polarity label of word x: l(x) = 1 if the label is positive, -1 if negative and 0 if neutral. First, each word term is matched against the sentiment lexicons; a word that matches an entry in a sentiment lexicon is assigned that entry's polarity label. Otherwise, one of the three polarity labels is selected for the word at random.

After posterior sampling of the assignments, i.e., once the MCMC reaches its steady state, each word within a document is given the sentiment polarity label of the local topic to which the word has been assigned. The sentiment polarity of a local topic can in turn be obtained from the sentiment distribution \pi within the local topic. The document sentiment value is calculated as follows:

s^t_d = \sum_{x \in x^t_d} l(x) \, \phi_{z_{d,x},\, x,\, s^t_{dx}}, \qquad s^t_d \in [-1, 1]    (22)

where s^t_d denotes the sentiment value of document d. If this value is less than 0, the document is categorized as negative; if it is greater than 0, the document is positive; otherwise, the document is neutral. The Restaurant and Hotel corpora use a 5-star rating system. We assume reviews with 1 or 2 stars are negative, and reviews with 4 or 5 stars are positive. Reviews with 3 stars are not considered, i.e., reviews are categorized as either positive or negative, without the neutral alternative. In this subsection, we measure sentiment classification accuracy with different sentiment lexicons for three sentiment models: JST, ASUM and OESTM.
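The binary evaluation protocol above (1-2 stars negative, 4-5 stars positive, 3-star reviews discarded) can be sketched as:

```python
def classify_documents(doc_scores, star_ratings):
    """Evaluation protocol of Sec. 5.4: a document with sentiment value
    s_d > 0 (Eq. (22)) is predicted positive, otherwise negative; gold
    labels come from star ratings, with 3-star reviews skipped.
    Returns classification accuracy over the retained reviews."""
    correct = total = 0
    for score, stars in zip(doc_scores, star_ratings):
        if stars == 3:
            continue                      # neutral reviews are discarded
        gold = 1 if stars >= 4 else -1
        pred = 1 if score > 0 else -1
        total += 1
        correct += (pred == gold)
    return correct / total if total else float("nan")
```

For instance, scores [0.5, -0.2, 0.1, 0.9] against star ratings [5, 1, 2, 3] retain three reviews, two of which are classified correctly.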

Table 2 presents the predictive results for sentiment classification accuracy. It can be observed from Table 2 that, incorporating only 21 positive and 21 negative paradigm words, the OESTM model acquired a relatively poor 74.3% overall accuracy on the Restaurant dataset, while JST and ASUM acquired 66.8% and 73.5% respectively; similar results are observed on the Hotel dataset. Combining the top 20 words by MI score with the paradigm words improves classification accuracy considerably, by 2%, 2% and 14% for JST, ASUM and OESTM respectively. But classification accuracy is not proportional to the number of sentiment polarity words: Table 2 shows that incorporating the selected words of the MPQA sentiment lexicon deteriorates classification accuracy, decreasing it by around 1%, 4% and 8% for JST, ASUM and OESTM respectively on the Restaurant dataset, with similar results on the Hotel dataset. In all settings, the sentiment classification accuracy of the OESTM model is higher than that of the JST and ASUM sentiment models.

Table 2. Sentiment classification accuracy comparison.

5.5. Hyperparameter Sensitivity

There are two principal parameters defined in the OESTM model, namely \nu and \eta, where \nu defines the time-decaying kernel and \eta is the hyperparameter of the base probability measure H. To assess the sensitivity of OESTM to hyperparameter settings, we conducted a sensitivity analysis in which all hyperparameters are held fixed at their default values while one of them is varied. Held-out likelihood (LL) is widely used in the topic modeling community to compare how well a trained model explains held-out data. In this subsection, we use these parameter variations to compute the test LL of OESTM on the Restaurant dataset. We note that the order of the process can safely be set to T; however, to reduce computation, we can set \Delta to cover the support of the time-decaying kernel, i.e., choose \Delta such that E(\nu, \Delta) is smaller than a threshold, say 0.001. The results are shown in Figure 5.
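The truncation rule for \Delta can be sketched as below. The kernel form E(ν, δ) = exp(−δ/ν) is an assumption here, since the paper only names the kernel as exponential; under that assumption, ν = 0.5 with threshold 0.001 yields Δ = 4, which matches the setting of Sec. 5.1.

```python
import math

def choose_delta(nu, threshold=1e-3, max_delta=100):
    """Pick the smallest Delta with E(nu, Delta) below `threshold`,
    as suggested in Sec. 5.5. Assumes the exponential time-decaying
    kernel E(nu, delta) = exp(-delta / nu); this is an illustrative
    stand-in for the paper's exact kernel."""
    for delta in range(1, max_delta + 1):
        if math.exp(-delta / nu) < threshold:
            return delta
    return max_delta
```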


Fig. 5. Held-out Likelihood for different parameters.

First, while varying \nu, we fixed \Delta = T to avoid biasing the result. We noticed that when \nu = 6 some topics were not born and were modeled as a continuation of other related topics. The best \nu depends on the application and the nature of the data; in the future, we plan to place a discrete prior over \nu and sample it as well. Second, the best setting for the variance of the base measure lies in [5, 10], which results in topics with reasonably sparse word distributions.

5.6. Evolutionary Senti-Topic Discovery

OESTM produces topics coupled with a sentiment for each time slice, making it easier for customers to see how topics and topic sentiment scores develop and change over time. In this experiment, the probability distribution over words given topic k, i.e., the neutral words assigned to topic k, was estimated using \phi_k, and the distribution over words given topic k and sentiment label s was estimated using \phi_{k,s}. To track the sentiment trend of each topic, the sentiment score of topic k over its sentiment polarities at time slice t is defined as follows.

s^t_k = \sum_{x \in x^t_k} \phi_{k,pos.}(x) \; - \; \sum_{x \in x^t_k} \phi_{k,neg.}(x)    (23)

where the first sum is the positive score, the second sum is the negative score, and x^t_k is the set of words assigned to topic k. The average sentiment score of a topic is the sum of its positive and negative scores. The scores of each topic across epochs then form the sentiment time series {..., s^{t-1}_k, s^t_k, s^{t+1}_k, ...}.

In this subsection, the topic word probability distributions and the sentiment word probability distributions of topics for each time slice are first obtained by OESTM; then each topic's sentiment score for each time slice is calculated through equation (23). Four example topics were used: the first three were selected by probability value, from high to low, while the last was selected randomly. The topic and senti-topic word probability distributions for three time slices are shown on the left of Figures 6 and 7; the top five topic words and ten senti-topic words are listed with their probabilities. The emotional changes of the four topics, reflected by sentiment score across time slices, are shown on the right of Figures 6 and 7.
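Equation (23) turns the senti-topic word distributions into a per-epoch score series. A minimal sketch with hypothetical inputs:

```python
def sentiment_series(phi_pos, phi_neg):
    """Eq. (23): per-epoch sentiment score of one topic as the summed
    positive word probabilities minus the summed negative ones.
    phi_pos[t] / phi_neg[t] map each word assigned to the topic at
    epoch t to its probability under the positive / negative
    senti-topic distribution (hypothetical input format)."""
    return [
        sum(pos.values()) - sum(neg.values())
        for pos, neg in zip(phi_pos, phi_neg)
    ]

# One epoch: positive mass 0.5, negative mass 0.1 -> score 0.4.
series = sentiment_series([{"good": 0.3, "tasty": 0.2}], [{"dry": 0.1}])
```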

For example, the first topic word of extracted topic 1 on the Restaurant dataset is "meat" with probability 0.181373, and the senti-topic words coupled with the positive and negative labels at time slice 12 are "good" and "dry" with probabilities 0.161267 and 0.152256 respectively. Topic 4 on the Restaurant dataset emerges at time slice 10 and is inherited at time slice 13; after time slice 13 the topic is no longer identified and dies at time slice 13+\Delta. This demonstrates the efficacy of OESTM in achieving the birth, death and inheritance of topics. Topic 4 on the Hotel dataset is born at time slice 3, as observed in the sentiment score graph, but is not identified at time slices 4, 6, 9, 14 and 17. This topic has not died, because at time slices 5, 7, 8, 10, 11, 12 and 13 it is inherited by considering the influence of previous time slices on the updated time slice, having been used within the preceding \Delta consecutive time slices. This further proves that the proposed model is able to inherit topics identified in the previous \Delta consecutive time slices.

The sentiment score reflects a topic's emotional status. For example, the average sentiment score of topic 2 on the Restaurant dataset at time slice 7 is -0.49, which indicates that much negative feedback on the health topic was received during this period. This can attract the attention of the company and urge it to improve the related service.

6. Conclusions

In this paper we addressed the problem of modeling time-dependent review corpora. On-line evolutionary sentiment/topic modeling (OESTM) is proposed, which can adapt the number of topics, the topic word (without sentiment information) distributions of topics, and the senti-topic word (with sentiment polarity labels) distributions of topics, and track topics' emotional development over time slices. As far as we know, we are the first to handle time-dependent reviews with a non-parametric Bayesian topic model in order to implement topic-based sentiment analysis. At each time slice, a topic clustering with the estimated best cluster number based on the time-dependent CRFP, smoothing with



Fig. 6. Evolutionary senti-topic discovery based on the Restaurant dataset. Left: discovered topics, topic words and senti-topic words. Right: the sentiment emotional changes of the topics on the left.


[Figure 7 spans this region: four panels of discovered topics with their topic-word and senti-topic-word lists per time slice (split into positive and negative senti-topic words), each paired with a plot of sentiment score (y-axis, −1.0 to 1.0) against time slices (x-axis, 0–20) showing Pos.Evolution, Average, and Neg.Evolution curves.]

Fig. 7. Evolutionary senti-topic discovery based on the Hotel dataset. Left: discovered topics, topic words, and senti-topic words. Right: the sentiment evolution of each topic on the left.



the topics’ inheritance, by considering historical influences from previous time slices, and sentiment analysis for the latent topics are implemented automatically. A collapsed Gibbs sampling algorithm is used to infer the parameters. To evaluate the effectiveness of the proposed algorithm, we collected a real-world dataset and conducted various experiments. The preliminary results show the superiority of the proposed model over several state-of-the-art methods in generalization performance, lower complexity, and sentiment classification accuracy.
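A collapsed Gibbs sampler of the kind used here marginalizes out the topic and sentiment multinomials and resamples each token's (sentiment, topic) pair from count tables. The following is a minimal sketch of such a sampler for a fixed-size joint sentiment/topic model; it is illustrative only and omits what makes OESTM distinctive (adapting the topic number nonparametrically and inheriting topics across time slices):

```python
import numpy as np

def collapsed_gibbs_jst(docs, V, K=3, S=2, alpha=0.1, beta=0.01, gamma=0.1,
                        iters=50, seed=0):
    """Illustrative collapsed Gibbs sampler: each token in each document
    (a list of word ids < V) gets a (sentiment, topic) pair resampled
    from its full conditional, computed purely from count tables."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_dsk = np.zeros((D, S, K))   # tokens in doc d with sentiment s, topic k
    n_skw = np.zeros((S, K, V))   # word w assigned to sentiment s, topic k
    n_sk = np.zeros((S, K))       # total tokens under (s, k)
    n_ds = np.zeros((D, S))       # tokens in doc d with sentiment s
    assign = []
    for d, doc in enumerate(docs):          # random initialization
        pairs = []
        for w in doc:
            s, k = rng.integers(S), rng.integers(K)
            n_dsk[d, s, k] += 1; n_skw[s, k, w] += 1
            n_sk[s, k] += 1; n_ds[d, s] += 1
            pairs.append((s, k))
        assign.append(pairs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                s, k = assign[d][i]
                # remove the current assignment from the counts
                n_dsk[d, s, k] -= 1; n_skw[s, k, w] -= 1
                n_sk[s, k] -= 1; n_ds[d, s] -= 1
                # full conditional over all S*K (sentiment, topic) pairs
                p = ((n_skw[:, :, w] + beta) / (n_sk + V * beta)
                     * (n_dsk[d] + alpha)
                     * ((n_ds[d] + gamma)
                        / (len(doc) - 1 + S * gamma))[:, None])
                p = p.ravel() / p.sum()
                idx = rng.choice(S * K, p=p)
                s, k = divmod(idx, K)       # row-major ravel: idx = s*K + k
                assign[d][i] = (s, k)
                n_dsk[d, s, k] += 1; n_skw[s, k, w] += 1
                n_sk[s, k] += 1; n_ds[d, s] += 1
    # posterior-mean word distribution for each (sentiment, topic)
    phi = (n_skw + beta) / (n_sk[:, :, None] + V * beta)
    return phi, assign
```

Ranking each row of `phi` gives per-sentiment topic word lists of the kind shown in Figs. 6 and 7; the hyperparameter names follow common joint sentiment/topic model conventions, not necessarily the paper's notation.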

One limitation of our model is that it requires setting the time span of each epoch. In the future, we will consider other time-dependency modes to optimize dynamic parameter inference.

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grants No. 61303131, No. 61672272, and No. 60973040.

References

1. D. M. Blei, A. Ng, and M. Jordan, “Latent dirichlet allocation,” Journal of Machine Learning Research, 3, 993-1022 (2003).

2. Y. Teh, M. Jordan, M. Beal, and D. Blei, “Hierarchical Dirichlet Processes,” Journal of the American Statistical Association, 101, 1566-1581 (2006).

3. Y. Choi, C. Cardie, E. Riloff, and S. Patwardhan, “Identifying Sources of Opinions with Conditional Random Fields and Extraction Patterns,” in Proc. of HLT/EMNLP, 355-362 (2005).

4. B. Liu, M. Hu, and J. Cheng, “Opinion Observer: Analyzing and Comparing Opinions on the Web,” in Proc. of the World Wide Web Conference, 342-351 (2005).

5. B. Pang and L. Lee, “A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts,” in Proc. of ACL, 271-278 (2004).

6. B. Pang, L. Lee and S. Vaithyanathan, “Thumbs up? Sentiment classification using machine learning techniques,” in Proc. of EMNLP, 79-86 (2002).

7. X. Wang and A. McCallum, “Topics over time: a non-Markov continuous-time model of topical trends,” in Proc. of ACM SIGKDD, 424-433 (2006).

8. P. D. Meo, F. Messina, D. Rosaci, and G. M. L. Sarné, “Forming Time-Stable Homogeneous Groups into Online Social Networks,” Information Sciences (2017).

9. P. D. Meo, E. Ferrara, D. Rosaci, and G. M. L. Sarné, “Trust and Compactness in Social Network Groups,” IEEE Transactions on Cybernetics, 45(2), 205 (2015).

10. L. AlSumait and C. Domeniconi, “On-Line LDA: Adaptive Topic Models for Mining Text Streams with Applications to Topic Detection and Tracking,” in Proc. of IEEE ICDM, 3-12 (2008).

11. M. Hoffman, D. M. Blei and F. Bach, “Online learning for latent dirichlet allocation,” in Advances in Neural Information Processing Systems, 23, 856-864 (2010).

12. J. H. Lau, N. Collier and T. Baldwin, “On-line Trend Analysis with Topic Models: #twitter Trends Detection Topic Model Online,” in Proc. of COLING, 1519-1534 (2012).

13. C. Lin and Y. He, “Joint sentiment/topic model for sentiment analysis,” in Proc. of ACM CIKM, 375-384 (2009).

14. Y. Jo and A. H. Oh, “Aspect and sentiment unification model for online review analysis,” in Proc. of ACM WSDM, 815-824 (2011).

15. M. Dermouche, J. Velcin, L. Khouas, and S. Loudcher, “A Joint Model for Topic-Sentiment Evolution over Time,” in Proc. of IEEE ICDM, 773-778 (2014).

16. Y. He, C. Lin, W. Gao, and K. F. Wong, “Dynamic joint sentiment-topic model,” ACM Transactions on Intelligent Systems and Technology, 5(1), 1-21 (2014).

17. C. Lin, Y. He, R. Everson, and S. Rüger, “Weakly supervised joint sentiment-topic detection from text,” IEEE Transactions on Knowledge and Data Engineering, 24, 1134-1145 (2012).

18. T. S. Ferguson, “A bayesian analysis of some nonparametric problems,” The Annals of Statistics, 1, 209-230 (1973).

19. R. M. Neal, “Markov chain sampling methods for dirichlet process mixture models,” Journal of Computational and Graphical Statistics, 9, 249-265 (2000).

20. A. Ahmed and E. P. Xing, “Dynamic Non-Parametric Mixture Models and The Recurrent Chinese Restaurant Process: with Applications to Evolutionary Clustering,” in Proc. of the SIAM International Conference on Data Mining, 219-230 (2008).

21. X. Zhu, Z. Ghahramani and J. Lafferty, “Time-sensitive dirichlet process mixture models,” Technical report (2005).

22. L. Ren, D. B. Dunson and L. Carin, “The dynamic hierarchical dirichlet process,” in Proc. of the International Conference on Machine Learning, 824-831 (2008).

23. T. Xu, Z. M. Zhang, P. S. Yu and B. Long, “Dirichlet process based evolutionary clustering,” in Proc. of IEEE ICDM, 648-657 (2008).

24. T. Xu, Z. M. Zhang, P. S. Yu and B. Long, “Evolutionary clustering by hierarchical dirichlet process with hidden markov state,” in Proc. of IEEE ICDM, 658-667 (2008).

25. C. Andrieu and N. D. Freitas, “An Introduction to MCMC for Machine Learning,” Machine Learning, 50, 5-43 (2003).

26. T. L. Griffiths and M. Steyvers, “Finding scientific topics,” Proceedings of the National Academy of Sciences, 5228-5235 (2004).

27. K. Ganesan and C. Zhai, “Opinion-based entity ranking,” Information Retrieval, 15, 116-150 (2012).

28. B. Pang, L. Lee and S. Vaithyanathan, “Thumbs up? Sentiment classification using machine learning techniques,” in Proc. of EMNLP, 79-86 (2002).

29. T. Wilson, J. Wiebe and P. Hoffmann, “Recognizing contextual polarity in phrase-level sentiment analysis,” in Proc. of HLT/EMNLP, 347-354 (2005).

30. F. Maes, A. Collignon and D. Vandermeulen, “Multimodality Image Registration by Maximization of Mutual Information,” IEEE Transactions on Medical Imaging, 17, 187-198 (1997).

31. C. Wang and D. M. Blei, “Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process,” in Advances in Neural Information Processing Systems, 982-1989 (2009).

32. E. B. Sudderth, W. T. Freeman and A. S. Willsky, “Graphical models for visual object recognition and tracking,” PhD thesis, Massachusetts Institute of Technology (2006).
