Transcript of "Modeling Documents with a Deep Boltzmann Machine"
Modeling Documents with a Deep Boltzmann Machine
Authors : Nitish Srivastava, Ruslan Salakhutdinov, Geoffrey Hinton
Review By : Nitish Gupta
2nd December, 2016
Authors : Nitish Srivastava, Ruslan Salakhutdinov, Geoffrey Hinton (CS, UIUC) · Modeling Documents with a Deep Boltzmann Machine · 2nd December, 2016 · 1 / 22
Topic Modeling
A model for discovering abstract 'topics' in a collection of documents
Used to uncover the hidden semantic structure of documents
Hypothesizes that a document is composed of multiple topics
Topic Modeling builds a generative probabilistic model of the bag of words in a document
At inference time, it gives a distribution over topics
Most commonly used Topic Modeling technique: LDA
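As a concrete illustration of what a topic model produces (this is LDA, not the paper's model), here is a minimal sketch using scikit-learn's `LatentDirichletAllocation` on a toy corpus; the corpus, topic count, and all values are illustrative assumptions.

```python
# Minimal LDA sketch: fit a 2-topic model to a toy bag-of-words corpus
# and recover a per-document distribution over topics.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors sold shares in the market",
]

# Bag-of-words counts: one row per document, one column per word.
counts = CountVectorizer().fit_transform(docs)

# Fit LDA and infer each document's topic distribution.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)  # rows are distributions over 2 topics

print(theta.shape)  # (4, 2)
```

Each row of `theta` sums to 1, matching the "distribution over topics" produced at inference time.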
Topic Modeling
RBM vs. DBM
RBM Pros : Can be trained efficiently, and inferring the posterior distribution is exact
RBM Cons : Defines a rigid, implicit prior over hidden states
DBM Pros : Defines a more flexible prior over hidden representations
DBM Cons : Training and performing inference are hard
Contributions of the paper
Extends the Replicated Softmax model, a topic model of the RBM family
Introduces a Deep Boltzmann Machine (DBM) to model documents
Argues that more hidden layers in a DBM give more flexibility to the topic priors, which
Helps better model short documents
Gives better document representations for document retrieval and classification tasks
Contributions of the paper
Introduces a 2-layer DBM : the Over-Replicated Softmax Model
Gives an easy training and fast approximate inference methodology
Retains some flexibility in manipulating the prior
Shows the efficacy of the model both as a better generative model and as a feature extractor for retrieval and classification tasks
Replicated Softmax Model (Background)
K = size of the word dictionary
N = number of words in the document
h ∈ {0, 1}^F : binary stochastic hidden topic features
V : N × K observed binary matrix

E(V, h; θ) = −∑_{i=1}^{N} ∑_{j=1}^{F} ∑_{k=1}^{K} W_{ijk} h_j v_{ik} − ∑_{i=1}^{N} ∑_{k=1}^{K} v_{ik} b_{ik} − N ∑_{j=1}^{F} h_j a_j    (1)
Replicated Softmax Model (Background)
With the weights tied across word positions, the energy simplifies to:

E(V, h) = −∑_{j=1}^{F} ∑_{k=1}^{K} W_{jk} h_j v̂_k − ∑_{k=1}^{K} v̂_k b_k − N ∑_{j=1}^{F} h_j a_j

where v̂_k = ∑_{i=1}^{N} v_{ik} is the count of word k in the document
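The tied-weight energy is straightforward to compute directly from the word-count vector. A minimal NumPy sketch, with all shapes and values illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
K, F, N = 5, 3, 10                # vocab size, hidden units, document length

W = rng.normal(scale=0.1, size=(F, K))   # tied weights W_jk
b = np.zeros(K)                          # visible biases b_k
a = np.zeros(F)                          # hidden biases a_j

v_hat = rng.multinomial(N, np.ones(K) / K)   # word counts v̂, sums to N
h = rng.integers(0, 2, size=F)               # binary hidden state

def energy(v_hat, h, W, b, a, N):
    """Tied-weight Replicated Softmax energy:
    E = -sum_jk W_jk h_j v̂_k - sum_k v̂_k b_k - N sum_j h_j a_j."""
    return -(h @ W @ v_hat) - v_hat @ b - N * (h @ a)

print(energy(v_hat, h, W, b, a, N))
```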
Conditional Distributions
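The equations on this slide were lost in transcription. For the Replicated Softmax model of the previous slides, the conditional distributions take the standard form given in the original Replicated Softmax paper:

```latex
% Hidden given visible: each binary topic unit is a logistic function
% of the (weight-shared) word counts.
P(h_j = 1 \mid V) = \sigma\!\left(\sum_{k=1}^{K} \hat{v}_k W_{jk} + N a_j\right)

% Visible given hidden: each of the N word positions is an identical
% softmax over the K-word dictionary.
P(v_{ik} = 1 \mid h) =
  \frac{\exp\!\left(b_k + \sum_{j=1}^{F} h_j W_{jk}\right)}
       {\sum_{k'=1}^{K} \exp\!\left(b_{k'} + \sum_{j=1}^{F} h_j W_{jk'}\right)}
```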
Over-Replicated Softmax Model
V : N softmax units (the observed words)
h^{(1)} : binary hidden layer, with weights shared between the two layers
H^{(2)} : M softmax units, an M × K binary matrix
Over-Replicated Softmax Model
Over-Replicated Softmax Model
Shown in 'A Better Way to Pretrain Deep Boltzmann Machines':
The second layer of a DBM performs half of the modeling work compared to the first
Therefore, if N ≪ M, the prior over h^{(1)} is dominated by the second layer
Therefore, if M ≪ N, the effect of the second layer diminishes
Learning
Maximize the log-likelihood of the observed data
The derivative of the log-likelihood w.r.t. W is given by :
As exact maximum-likelihood learning is intractable, a variational approach is employed
Learning - Variational Approach
Variational Evidence Lower-Bound :
Using a Mean-Field Approximation :
μ = {μ^{(1)}, μ^{(2)}} : the mean-field parameters
q(h^{(1)}_j = 1) = μ^{(1)}_j
q(h^{(2)}_{ik} = 1) = μ^{(2)}_k, with ∑_{k=1}^{K} μ^{(2)}_k = 1
Learning - Mean Field Parameters
Variational Evidence Lower-Bound in this case :
Update Rules for Mean Field Parameters :
Learning - Model Parameters
The variational bound is maximized using MCMC-based stochastic approximation
Let θ_t and x_t = {V_t, h^{(1)}_t, h^{(2)}_t} be the current parameters and state
1 Sample a new state x_{t+1} using Gibbs sampling
2 Make a gradient step using a point estimate at the sample x_{t+1} to obtain new parameters θ_{t+1}
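The two alternating steps above can be sketched as follows. This is a schematic persistent-chain stochastic-approximation loop for a single tied-weight, Replicated-Softmax-style RBM (not the full two-layer model, and not the paper's code); the learning rate, shapes, and sampling details are illustrative assumptions, and biases are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
K, F, N = 5, 3, 10
W = rng.normal(scale=0.01, size=(F, K))          # tied weights W_jk
v_hat_data = rng.multinomial(N, np.ones(K) / K).astype(float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

v_chain = v_hat_data.copy()   # persistent chain state
lr = 0.01

for t in range(100):
    # (1) Gibbs step on the persistent chain: h ~ P(h|v), then v ~ P(v|h).
    p_h = sigmoid(W @ v_chain)
    h = (rng.random(F) < p_h).astype(float)
    logits = h @ W
    p_v = np.exp(logits - logits.max())
    p_v /= p_v.sum()
    v_chain = rng.multinomial(N, p_v).astype(float)

    # (2) Gradient step with a point estimate at the new sample:
    # data-dependent term minus model (chain) term.
    h_data = sigmoid(W @ v_hat_data)
    h_model = sigmoid(W @ v_chain)
    W += lr * (np.outer(h_data, v_hat_data) - np.outer(h_model, v_chain))

print(W.shape)  # (3, 5)
```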
Pretraining
Exploiting the fact that the weights are shared between the two layers,
1 Train an RBM with bottom-up weights scaled by a factor of (1 + M/N):

P(h^{(1)}_j = 1 | V) = σ( (1 + M/N) ∑_{k=1}^{K} v̂_k W_{kj} )

2 This is similar to training N + M observed units, with the M extra units set to the empirical word distribution
3 Experiments show that this further approximation works well in practice
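The scaled bottom-up inference in step 1 amounts to one matrix-vector product and a sigmoid. A minimal NumPy sketch, with shapes and values illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
K, F = 5, 3                   # vocab size, hidden units
N, M = 10, 10                 # words in document, second-layer softmax units

W = rng.normal(scale=0.1, size=(K, F))        # weights W_kj
v_hat = rng.multinomial(N, np.ones(K) / K)    # word counts v̂

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# P(h_j = 1 | V) = sigma((1 + M/N) * sum_k v̂_k W_kj)
p_h = sigmoid((1.0 + M / N) * (v_hat @ W))

print(p_h)   # F probabilities in (0, 1)
```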
Inference
Find P(h^{(1)} | V) : the latent topic structure of the observed document
Correct way : use the mean-field approximation, as done during training
Fast alternative : multiply the visible-to-hidden weights by (1 + M/N) and approximate using the equation on the previous slide
Experiments - Perplexities
Experiments - Document Retrieval
Experiments - Effect of Document Size
Thank you!
Questions?