Aim› Find trends in document collections (academic papers, patents, blog entries…)
Idea› Construct timestamp arrays as new observed data
Method› Modify latent Dirichlet allocation (LDA)
Timestamp array for each document
[Figure: example document with timestamp t; word tokens “test”, “group”, “group”, “group”, “effect”, “space”, “space”; timestamp array filled with t, t−1, and t+1]
Modify LDA
› Draw a topic multinomial Multi(θd) from a Dirichlet prior
› For each word token
    Draw a topic z from Multi(θd)
    Draw a word from the multinomial Multi(φz)
› For each timestamp token
    Draw a topic z from Multi(θd)
    Draw a timestamp from the multinomial Multi(ψz)
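The generative steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the symmetric Dirichlet prior, and the toy sizes (3 topics, 8 word types, 5 distinct timestamps) are all hypothetical, and the timestamp array length is simply passed in as a parameter.

```python
import random

def sample_dirichlet(rng, concentration, dim):
    """Draw from a symmetric Dirichlet by normalizing Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [x / total for x in draws]

def generate_document(rng, alpha, phi, psi, n_words, n_stamps):
    """Sample one document: word and timestamp tokens share the same theta_d."""
    n_topics = len(phi)
    theta_d = sample_dirichlet(rng, alpha, n_topics)
    topics = range(n_topics)
    words, stamps = [], []
    for _ in range(n_words):
        k = rng.choices(topics, weights=theta_d)[0]   # topic for this word token
        words.append(rng.choices(range(len(phi[k])), weights=phi[k])[0])
    for _ in range(n_stamps):
        k = rng.choices(topics, weights=theta_d)[0]   # topic for this timestamp token
        stamps.append(rng.choices(range(len(psi[k])), weights=psi[k])[0])
    return theta_d, words, stamps

# Toy sizes (assumed for illustration only).
rng = random.Random(0)
K, V, S = 3, 8, 5
phi = [sample_dirichlet(rng, 0.5, V) for _ in range(K)]  # topic-word multinomials
psi = [sample_dirichlet(rng, 0.5, S) for _ in range(K)]  # topic-timestamp multinomials
theta_d, words, stamps = generate_document(rng, 0.5, phi, psi, n_words=7, n_stamps=6)
```

The only difference from plain LDA in this sketch is the second loop: timestamp tokens are drawn through the same per-document topic proportions but from their own per-topic multinomials.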
[Figure: graphical model of the modified LDA with nodes α, θ, z, w, t and priors β → φ (words), γ → ψ (timestamps)]
Different Dirichlet priors for word and timestamp multinomials
› Takes a Bayesian approach to timestamps as well
› Not just an extension of the vocabulary
Topics over Time
› Modification of LDA (Beta distribution for continuous timestamps)
› O(NK) time, O(N) space (N: number of word tokens)
› Non-Bayesian term in updating formula for Gibbs sampling
Bag of Timestamps
› Modification of LDA (Dirichlet-multinomial for discrete timestamps)
› O((N+L)K) time, O(N+L) space (L: sum of timestamp array lengths)
› Additional input parameter for timestamp array lengths
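To make the O((N+L)K) cost concrete, here is a minimal sketch of one collapsed Gibbs sweep over word and timestamp tokens. It uses the standard conditional for Dirichlet-multinomial models, which is an assumption and may differ in detail from the authors' updating formula. K is assumed to be the number of topics, and all function and variable names are hypothetical.

```python
import random

def init_counts(rng, docs, K, X):
    """Assign random topics; return assignments, topic-item counts, topic totals."""
    assign = [[rng.randrange(K) for _ in doc] for doc in docs]
    n_kx = [[0] * X for _ in range(K)]
    n_k = [0] * K
    for doc, zs in zip(docs, assign):
        for x, k in zip(doc, zs):
            n_kx[k][x] += 1
            n_k[k] += 1
    return assign, n_kx, n_k

def resample(rng, tokens, assign, n_dk, n_kx, n_k, alpha, prior):
    """Resample the topic of every token in one list; O(K) work per token."""
    K, X = len(n_k), len(n_kx[0])
    for i, x in enumerate(tokens):
        k_old = assign[i]
        n_dk[k_old] -= 1; n_kx[k_old][x] -= 1; n_k[k_old] -= 1
        weights = [(n_dk[k] + alpha) * (n_kx[k][x] + prior) / (n_k[k] + X * prior)
                   for k in range(K)]
        k_new = rng.choices(range(K), weights=weights)[0]
        assign[i] = k_new
        n_dk[k_new] += 1; n_kx[k_new][x] += 1; n_k[k_new] += 1

def gibbs_sweep(rng, words, stamps, st, alpha, beta, gamma):
    """One sweep: N word tokens plus L timestamp tokens, each costing O(K)."""
    for d in range(len(words)):
        resample(rng, words[d], st["z_w"][d], st["n_dk"][d],
                 st["n_kw"], st["n_k"], alpha, beta)
        resample(rng, stamps[d], st["z_t"][d], st["n_dk"][d],
                 st["m_ks"], st["m_k"], alpha, gamma)

# Toy corpus (assumed): 2 documents, 5 word types, 3 distinct timestamps.
rng = random.Random(0)
words = [[0, 1, 1, 2], [3, 3, 4]]
stamps = [[0, 0, 1], [2, 2, 1]]
K, V, S = 2, 5, 3
z_w, n_kw, n_k = init_counts(rng, words, K, V)
z_t, m_ks, m_k = init_counts(rng, stamps, K, S)
n_dk = [[0] * K for _ in words]          # document-topic counts over both token types
for d in range(len(words)):
    for k in z_w[d] + z_t[d]:
        n_dk[d][k] += 1
state = {"z_w": z_w, "z_t": z_t, "n_dk": n_dk,
         "n_kw": n_kw, "n_k": n_k, "m_ks": m_ks, "m_k": m_k}
for _ in range(20):
    gibbs_sweep(rng, words, stamps, state, alpha=0.5, beta=0.1, gamma=0.1)
```

In this sketch the document-topic counts are shared between the two token types, which is how timestamp tokens influence the topics of words. Word and timestamp counts are kept in separate tables with separate priors (beta and gamma).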
[Figure: graphical model of Topics over Time with nodes α, θ, z, w, t, prior β → φ, and Beta parameters ψ1, ψ2]
Pros
› Bayesian also for timestamps
› Simple update computations
Cons
› No principled way to determine timestamp array lengths
› Weak for fine-grained timestamps
Determining timestamp array lengths
› Controlling strength of timestamp data
Parallelization
› OpenMP
› CUDA
› MPICH2