Transcript of "Modeling Documents with a Deep Boltzmann Machine"
Modeling Documents with a Deep Boltzmann Machine
Authors : Nitish Srivastava, Ruslan Salakhutdinov, Geoffrey Hinton
Review By : Nitish Gupta
2nd December, 2016
Authors : Nitish Srivastava, Ruslan Salakhutdinov, Geoffrey Hinton (CS, UIUC) · Modeling Documents with a Deep Boltzmann Machine · 2nd December, 2016 · 1 / 22
Topic Modeling
A model for discovering abstract 'topics' in a collection of documents
Used to uncover the hidden semantic structure of documents
Hypothesizes that a document is composed of multiple topics
Topic Modeling builds a generative probabilistic model of the bag of words in a document
At inference time, it gives a distribution over topics
Most commonly used Topic Modeling technique: LDA
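As a concrete illustration of what a topic model produces (this is LDA, not the paper's model), here is a minimal sketch using scikit-learn's `LatentDirichletAllocation` on a toy corpus; the corpus, topic count, and all values are illustrative assumptions.

```python
# Minimal LDA sketch: fit a 2-topic model to a toy bag-of-words corpus
# and recover a per-document distribution over topics.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors sold shares in the market",
]

# Bag-of-words counts: one row per document, one column per word.
counts = CountVectorizer().fit_transform(docs)

# Fit LDA and infer each document's topic distribution.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)  # rows are distributions over 2 topics

print(theta.shape)  # (4, 2)
```

Each row of `theta` sums to 1, matching the "distribution over topics" produced at inference time.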
Topic Modeling
RBM vs. DBM
RBM Pros : Can be trained efficiently, and inferring the posterior distribution is exact
RBM Cons : Defines a rigid, implicit prior over hidden states
DBM Pros : Defines a more flexible prior over hidden representations
DBM Cons : Training and performing inference are hard
Contributions of the paper
Extends the Replicated Softmax model, a topic model of the RBM family
Introduces a Deep Boltzmann Machine (DBM) to model documents
Argues that more hidden layers in a DBM give more flexibility to the topic priors, which
Helps better model short documents
Gives better document representations for document retrieval and classification tasks
Contributions of the paper
Introduces a 2-layer DBM : the Over-Replicated Softmax Model
Gives an easy training and fast approximate inference methodology
Retains some flexibility in manipulating the prior
Shows the efficacy of the model both as a better generative model and as a feature extractor for retrieval and classification tasks
Replicated Softmax Model (Background)
K = size of the word dictionary
N = number of words in the document
h ∈ {0, 1}^F : binary stochastic hidden topic features
V : N × K observed binary matrix

E(V, h; θ) = −∑_{i=1}^{N} ∑_{j=1}^{F} ∑_{k=1}^{K} W_{ijk} h_j v_{ik} − ∑_{i=1}^{N} ∑_{k=1}^{K} v_{ik} b_{ik} − N ∑_{j=1}^{F} h_j a_j    (1)
Replicated Softmax Model (Background)
With the weights tied across word positions, the energy simplifies to:

E(V, h) = −∑_{j=1}^{F} ∑_{k=1}^{K} W_{jk} h_j v̂_k − ∑_{k=1}^{K} v̂_k b_k − N ∑_{j=1}^{F} h_j a_j

where v̂_k = ∑_{i=1}^{N} v_{ik} is the count of word k in the document
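The tied-weight energy is straightforward to compute directly from the word-count vector. A minimal NumPy sketch, with all shapes and values illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
K, F, N = 5, 3, 10                # vocab size, hidden units, document length

W = rng.normal(scale=0.1, size=(F, K))   # tied weights W_jk
b = np.zeros(K)                          # visible biases b_k
a = np.zeros(F)                          # hidden biases a_j

v_hat = rng.multinomial(N, np.ones(K) / K)   # word counts v̂, sums to N
h = rng.integers(0, 2, size=F)               # binary hidden state

def energy(v_hat, h, W, b, a, N):
    """Tied-weight Replicated Softmax energy:
    E = -sum_jk W_jk h_j v̂_k - sum_k v̂_k b_k - N sum_j h_j a_j."""
    return -(h @ W @ v_hat) - v_hat @ b - N * (h @ a)

print(energy(v_hat, h, W, b, a, N))
```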
Conditional Distributions
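The equations on this slide were lost in transcription. For the Replicated Softmax model of the previous slides, the conditional distributions take the standard form given in the original Replicated Softmax paper:

```latex
% Hidden given visible: each binary topic unit is a logistic function
% of the (weight-shared) word counts.
P(h_j = 1 \mid V) = \sigma\!\left(\sum_{k=1}^{K} \hat{v}_k W_{jk} + N a_j\right)

% Visible given hidden: each of the N word positions is an identical
% softmax over the K-word dictionary.
P(v_{ik} = 1 \mid h) =
  \frac{\exp\!\left(b_k + \sum_{j=1}^{F} h_j W_{jk}\right)}
       {\sum_{k'=1}^{K} \exp\!\left(b_{k'} + \sum_{j=1}^{F} h_j W_{jk'}\right)}
```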
Over-Replicated Softmax Model
V : N softmax units (the observed words)
h^{(1)} : binary hidden layer, with weights shared between the two layers
H^{(2)} : M softmax units, an M × K binary matrix
Over-Replicated Softmax Model
Over-Replicated Softmax Model
Shown in 'A Better Way to Pretrain Deep Boltzmann Machines':
The second layer of a DBM performs half of the modeling work compared to the first
Therefore, if N ≪ M, the prior over h^{(1)} is dominated by the second layer
Therefore, if M ≪ N, the effect of the second layer diminishes
Learning
Maximize the log-likelihood of the observed data
The derivative of the log-likelihood w.r.t. W is given by :
As exact maximum-likelihood learning is intractable, a variational approach is employed
Learning - Variational Approach
Variational Evidence Lower-Bound :
Using a Mean-Field Approximation :
μ = {μ^{(1)}, μ^{(2)}} : the mean-field parameters
q(h^{(1)}_j = 1) = μ^{(1)}_j
q(h^{(2)}_{ik} = 1) = μ^{(2)}_k, with ∑_{k=1}^{K} μ^{(2)}_k = 1
Learning - Mean Field Parameters
Variational Evidence Lower-Bound in this case :
Update Rules for Mean Field Parameters :
Learning - Model Parameters
The variational bound is maximized using MCMC-based stochastic approximation
Let θ_t and x_t = {V_t, h^{(1)}_t, h^{(2)}_t} be the current parameters and state
1 Sample a new state x_{t+1} using Gibbs sampling
2 Make a gradient step using a point estimate at the sample x_{t+1} to obtain new parameters θ_{t+1}
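The two alternating steps above can be sketched as follows. This is a schematic persistent-chain stochastic-approximation loop for a single tied-weight, Replicated-Softmax-style RBM (not the full two-layer model, and not the paper's code); the learning rate, shapes, and sampling details are illustrative assumptions, and biases are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
K, F, N = 5, 3, 10
W = rng.normal(scale=0.01, size=(F, K))          # tied weights W_jk
v_hat_data = rng.multinomial(N, np.ones(K) / K).astype(float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

v_chain = v_hat_data.copy()   # persistent chain state
lr = 0.01

for t in range(100):
    # (1) Gibbs step on the persistent chain: h ~ P(h|v), then v ~ P(v|h).
    p_h = sigmoid(W @ v_chain)
    h = (rng.random(F) < p_h).astype(float)
    logits = h @ W
    p_v = np.exp(logits - logits.max())
    p_v /= p_v.sum()
    v_chain = rng.multinomial(N, p_v).astype(float)

    # (2) Gradient step with a point estimate at the new sample:
    # data-dependent term minus model (chain) term.
    h_data = sigmoid(W @ v_hat_data)
    h_model = sigmoid(W @ v_chain)
    W += lr * (np.outer(h_data, v_hat_data) - np.outer(h_model, v_chain))

print(W.shape)  # (3, 5)
```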
Pretraining
Exploiting the fact that the weights are shared between the two layers,
1 Train an RBM with bottom-up weights scaled by a factor of (1 + M/N):

P(h^{(1)}_j = 1 | V) = σ( (1 + M/N) ∑_{k=1}^{K} v̂_k W_{kj} )

2 This is similar to training N + M observed units, with the M extra units set to the empirical word distribution
3 Experiments show that this further approximation works well in practice
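The scaled bottom-up inference in step 1 amounts to one matrix-vector product and a sigmoid. A minimal NumPy sketch, with shapes and values illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
K, F = 5, 3                   # vocab size, hidden units
N, M = 10, 10                 # words in document, second-layer softmax units

W = rng.normal(scale=0.1, size=(K, F))        # weights W_kj
v_hat = rng.multinomial(N, np.ones(K) / K)    # word counts v̂

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# P(h_j = 1 | V) = sigma((1 + M/N) * sum_k v̂_k W_kj)
p_h = sigmoid((1.0 + M / N) * (v_hat @ W))

print(p_h)   # F probabilities in (0, 1)
```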
Inference
Find P(h^{(1)} | V) : the latent topic structure of the observed document
Correct way : use the mean-field approximation, as done during training
Fast alternative : multiply the visible-to-hidden weights by (1 + M/N) and approximate using the equation on the previous slide
Experiments - Perplexities
Experiments - Document Retrieval
Experiments - Effect of Document Size
Thank you!
Questions?