swoh.web.engr.illinois.edu/courses/IE598/handout/fall2016_slide21.pdf

Modeling Documents with a Deep Boltzmann Machine

Authors : Nitish Srivastava, Ruslan Salakhutdinov, Geoffrey Hinton

Review By : Nitish Gupta

2nd December, 2016

Authors : Nitish Srivastava, Ruslan Salakhutdinov, Geoffrey Hinton (CS, UIUC) | Modeling Documents with a Deep Boltzmann Machine | 2nd December, 2016 | 1 / 22


Topic Modeling

Model to find abstract 'topics' in a collection of documents

Used to discover the hidden semantic structure of documents

Hypothesizes that a document is composed of multiple topics

Topic Modeling builds a generative probabilistic model of the bag of words in a document

At inference time, gives a distribution over topics for each document

Most commonly used Topic Modeling technique: Latent Dirichlet Allocation (LDA)


Topic Modeling


RBM vs. DBM

RBM (Restricted Boltzmann Machine) Pros : Can be trained efficiently, and inferring the posterior distribution over hidden units is exact

RBM Cons : Defines a rigid, implicit prior on the hidden states

DBM (Deep Boltzmann Machine) Pros : Defines a more flexible prior over hidden representations

DBM Cons : Training and inference are hard


Contributions of the paper

Extends the Replicated Softmax model, a topic model of the RBM family

Introduces a Deep Boltzmann Machine (DBM) to model documents

Argues that more hidden layers in a DBM give more flexibility to the topic priors, which

Helps better model short documents

Gives better document representations for document retrieval and classification tasks


Contributions of the paper

Introduces a 2-layer DBM : the Over-Replicated Softmax Model

Gives an easy training procedure and a fast approximate inference method

Retains some flexibility in manipulating the prior

Shows the efficacy of the model both as a better generative model and as a feature extractor for retrieval and classification tasks


Replicated Softmax Model (Background)

$K$ = size of the word dictionary

$N$ = number of words in the document

$h \in \{0, 1\}^F$ : binary stochastic hidden topic features

$V$ : $N \times K$ observed binary matrix

$$E(V, h; \theta) = -\sum_{i=1}^{N}\sum_{j=1}^{F}\sum_{k=1}^{K} W_{ijk} h_j v_{ik} - \sum_{i=1}^{N}\sum_{k=1}^{K} v_{ik} b_{ik} - N\sum_{j=1}^{F} h_j a_j \qquad (1)$$


Replicated Softmax Model (Background)

$$E(V, h) = -\sum_{j=1}^{F}\sum_{k=1}^{K} W_{jk} h_j \hat{v}_k - \sum_{k=1}^{K} \hat{v}_k b_k - N\sum_{j=1}^{F} h_j a_j$$

$$\hat{v}_k = \sum_{i=1}^{N} v_{ik}$$
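
The step from Equation (1) to the count-based energy relies on sharing weights across word positions, so that only the word counts matter. The sketch below illustrates this with a toy document. The function names `word_counts` and `energy` and all numeric values are assumptions made for illustration; they do not come from the slides.

```python
def word_counts(doc, K):
    """Return v_hat, where v_hat[k] counts occurrences of dictionary word k."""
    v_hat = [0] * K
    for k in doc:  # doc is a list of word indices, one entry per word position
        v_hat[k] += 1
    return v_hat


def energy(v_hat, h, W, a, b):
    """Count-based energy: -sum_jk W[j][k] h[j] v_hat[k] - sum_k v_hat[k] b[k] - N sum_j h[j] a[j]."""
    N = sum(v_hat)  # document length
    F, K = len(h), len(v_hat)
    interaction = sum(W[j][k] * h[j] * v_hat[k] for j in range(F) for k in range(K))
    visible_bias = sum(v_hat[k] * b[k] for k in range(K))
    hidden_bias = N * sum(h[j] * a[j] for j in range(F))
    return -interaction - visible_bias - hidden_bias


# Toy example (hypothetical values): N = 4 words, K = 3 dictionary entries, F = 2 hidden units.
doc = [0, 2, 2, 1]
W = [[0.5, -1.0, 0.25],
     [0.0, 1.0, -0.5]]
a = [0.1, -0.2]
b = [0.0, 0.5, -0.5]
v_hat = word_counts(doc, K=3)
E = energy(v_hat, [1, 0], W, a, b)
```

Because the energy depends on the document only through `v_hat`, any reordering of the words in `doc` leaves it unchanged, which is the bag-of-words assumption.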


Conditional Distributions
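
The equations for this slide are not present in the transcript. As a hedged reconstruction, the conditionals below are what follows from the count-based Replicated Softmax energy $E(V, h)$, using the same symbols $W_{jk}$, $a_j$, $b_k$ and $\hat{v}_k$. They are derived here from that energy rather than copied from the slide, and $\sigma(x) = 1 / (1 + e^{-x})$ is the logistic function.

```latex
P(h_j = 1 \mid V) = \sigma\!\left(\sum_{k=1}^{K} W_{jk}\,\hat{v}_k + N a_j\right)

P(v_{ik} = 1 \mid h) =
  \frac{\exp\!\left(\sum_{j=1}^{F} W_{jk} h_j + b_k\right)}
       {\sum_{k'=1}^{K} \exp\!\left(\sum_{j=1}^{F} W_{jk'} h_j + b_{k'}\right)}
```

Both conditionals factorize over units, which is why posterior inference in this RBM-family model is exact.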


Over-Replicated Softmax Model

$V$ : $N$ softmax units

$h^{(1)}$ : binary hidden layer with shared weights

$H^{(2)}$ : $M$ softmax units, an $M \times K$ binary matrix


Over-Replicated Softmax Model
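
The content of this slide is not present in the transcript. As a sketch of what the model looks like, the energy below extends the Replicated Softmax energy by letting $h^{(1)}$ interact with both $V$ and $H^{(2)}$ through the same shared weights. It is written from the layer definitions above and from memory of the paper, so it should be checked against the original before being relied on. The symbol $\hat{h}^{(2)}_k$ (the count of second-layer softmax units in state $k$) is introduced here for illustration.

```latex
E(V, h^{(1)}, H^{(2)}; \theta) =
  -\sum_{j=1}^{F}\sum_{k=1}^{K} W_{jk}\, h^{(1)}_j \left(\hat{v}_k + \hat{h}^{(2)}_k\right)
  -\sum_{k=1}^{K} \left(\hat{v}_k + \hat{h}^{(2)}_k\right) b_k
  -(M + N)\sum_{j=1}^{F} h^{(1)}_j a_j

\hat{v}_k = \sum_{i=1}^{N} v_{ik}, \qquad
\hat{h}^{(2)}_k = \sum_{i=1}^{M} h^{(2)}_{ik}
```

Setting $M = 0$ removes the second layer and recovers the Replicated Softmax energy.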


Over-Replicated Softmax Model

Shown in 'A Better Way to Pretrain Deep Boltzmann Machines':

The second layer of a DBM performs 1/2 of the modeling work as compared to the first

Therefore, if $N \ll M$, the prior over $h^{(1)}$ is dominated by the second layer

Conversely, if $M \ll N$, the effect of the second layer diminishes


Learning

Maximize Log-Likelihood of observed data

Requires the derivative of the log-likelihood w.r.t. the weights $W$

As exact maximum likelihood learning is intractable, a variational approach is employed
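
The derivative itself is not present in the transcript. For an energy-based model it takes the usual form of a data-dependent expectation minus a model expectation. The version below is a sketch that assumes the shared-weight energy in which $h^{(1)}_j$ interacts with $\hat{v}_k + \hat{h}^{(2)}_k$, where $\hat{h}^{(2)}_k = \sum_{i=1}^{M} h^{(2)}_{ik}$ is notation introduced here.

```latex
\frac{\partial \log P(V; \theta)}{\partial W_{jk}} =
  \mathbb{E}_{P_{\text{data}}}\!\left[\left(\hat{v}_k + \hat{h}^{(2)}_k\right) h^{(1)}_j\right]
  - \mathbb{E}_{P_{\text{model}}}\!\left[\left(\hat{v}_k + \hat{h}^{(2)}_k\right) h^{(1)}_j\right]
```

Neither expectation is tractable exactly: the first needs the posterior over both hidden layers, and the second needs samples from the full model.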


Learning - Variational Approach

Variational Evidence Lower-Bound :

Using Mean-Field Approximation :

$\mu = \{\mu^{(1)}, \mu^{(2)}\}$ : mean-field parameters

$$q(h^{(1)}_j = 1) = \mu^{(1)}_j, \qquad q(h^{(2)}_{ik} = 1) = \mu^{(2)}_k, \qquad \sum_{k=1}^{K} \mu^{(2)}_k = 1$$
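
The two equations named on this slide are not present in the transcript. In standard form, and as a sketch rather than a copy of the slide, the evidence lower bound and the fully factorized mean-field posterior can be written as follows, where $\mathcal{H}(Q)$ denotes the entropy of $Q$.

```latex
\log P(V; \theta) \;\ge\;
  \sum_{h^{(1)}, H^{(2)}} Q\!\left(h^{(1)}, H^{(2)} \mid V; \mu\right)
  \log P\!\left(V, h^{(1)}, H^{(2)}; \theta\right) + \mathcal{H}(Q)

Q^{MF}\!\left(h^{(1)}, H^{(2)} \mid V; \mu\right) =
  \prod_{j=1}^{F} q\!\left(h^{(1)}_j \mid V\right)
  \prod_{i=1}^{M} q\!\left(h^{(2)}_i \mid V\right)
```

All $M$ second-layer softmax units share the same parameters $\mu^{(2)}$, so the posterior is described by $F + K$ numbers.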


Learning - Mean Field Parameters

Variational Evidence Lower-Bound in this case :

Update Rules for Mean Field Parameters :
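
The update rules are not present in the transcript. A minimal sketch of the fixed-point iteration is given below, assuming the shared-weight energy in which $h^{(1)}_j$ sees $\hat{v}_k + \hat{h}^{(2)}_k$ and the expected second-layer count is $M \mu^{(2)}_k$. The function name `mean_field`, the uniform initialization and the fixed iteration count are illustrative assumptions, not taken from the slides.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(scores):
    """Softmax with the maximum subtracted for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_field(v_hat, W, a, b, M, n_iters=20):
    """Alternate the updates of mu1 (first layer) and mu2 (second layer).

    mu1[j] <- sigmoid(sum_k W[j][k] * (v_hat[k] + M * mu2[k]) + (M + N) * a[j])
    mu2    <- softmax_k(sum_j W[j][k] * mu1[j] + b[k])
    """
    N = sum(v_hat)
    F, K = len(W), len(v_hat)
    mu1 = [0.5] * F
    mu2 = [1.0 / K] * K  # assumed starting point: uniform over the dictionary
    for _ in range(n_iters):
        mu1 = [
            sigmoid(sum(W[j][k] * (v_hat[k] + M * mu2[k]) for k in range(K)) + (M + N) * a[j])
            for j in range(F)
        ]
        mu2 = softmax([sum(W[j][k] * mu1[j] for j in range(F)) + b[k] for k in range(K)])
    return mu1, mu2


# Toy example with hypothetical parameters: K = 3 words, F = 2 hidden units, M = 4.
W = [[0.5, -1.0, 0.25],
     [0.0, 1.0, -0.5]]
a = [0.1, -0.2]
b = [0.0, 0.5, -0.5]
mu1, mu2 = mean_field([1, 1, 2], W, a, b, M=4)
```

In practice one would stop when the parameters change by less than a tolerance; a fixed number of sweeps keeps the sketch short.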


Learning - Model Parameters

Variational bound is maximized using MCMC-based stochastic approximation

Let $\theta_t$ and $x_t = \{V_t, h^{(1)}_t, h^{(2)}_t\}$ be the current parameters and state

1 Sample a new state $x_{t+1}$ using Gibbs sampling

2 Make a gradient step using a point estimate at the sample $x_{t+1}$ to find the new parameters $\theta_{t+1}$
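
The two steps above can be sketched for the weights $W$ as follows. This is an illustrative toy, not the authors' implementation: it assumes the shared-weight energy in which both the $N$ visible and the $M$ second-layer softmax units are drawn from the same softmax given $h^{(1)}$, and the names `gibbs_step` and `sap_update`, the learning rate, and all numeric values are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gibbs_step(h1, W, a, b, N, M, rng):
    """Step 1: sample counts for V and H2 given h1, then resample h1 given both."""
    F, K = len(W), len(b)
    probs = softmax([b[k] + sum(W[j][k] * h1[j] for j in range(F)) for k in range(K)])
    v_hat = [0] * K
    for k in rng.choices(range(K), weights=probs, k=N):  # N visible softmax units
        v_hat[k] += 1
    h2_hat = [0] * K
    for k in rng.choices(range(K), weights=probs, k=M):  # M second-layer softmax units
        h2_hat[k] += 1
    new_h1 = []
    for j in range(F):
        act = sum(W[j][k] * (v_hat[k] + h2_hat[k]) for k in range(K)) + (M + N) * a[j]
        new_h1.append(1 if rng.random() < sigmoid(act) else 0)
    return v_hat, new_h1, h2_hat


def sap_update(W, data_stats, state, lr):
    """Step 2: gradient step using the sampled state as a point estimate of the model term."""
    v_hat, h1, h2_hat = state
    return [
        [W[j][k] + lr * (data_stats[j][k] - h1[j] * (v_hat[k] + h2_hat[k]))
         for k in range(len(W[j]))]
        for j in range(len(W))
    ]


# Toy run with made-up numbers: K = 3, F = 2, N = 4, M = 2.
rng = random.Random(0)
N, M = 4, 2
W = [[0.5, -1.0, 0.25], [0.0, 1.0, -0.5]]
a, b = [0.1, -0.2], [0.0, 0.5, -0.5]
# Data-dependent statistics mu1[j] * (v_hat_data[k] + M * mu2[k]) from hypothetical mean-field values.
mu1, mu2, v_data = [0.7, 0.2], [0.25, 0.25, 0.5], [1, 1, 2]
data_stats = [[mu1[j] * (v_data[k] + M * mu2[k]) for k in range(3)] for j in range(2)]
state = gibbs_step([1, 0], W, a, b, N, M, rng)
W_new = sap_update(W, data_stats, state, lr=0.01)
```

The chain state is kept between parameter updates, so each update needs only one Gibbs sweep rather than a chain run to convergence.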


Pretraining

Exploiting the fact that weights are shared in the two layers,

1 Train an RBM with bottom-up weights scaled by a factor of $1 + \frac{M}{N}$

$$P(h^{(1)}_j = 1 \mid V) = \sigma\left(\left(1 + \frac{M}{N}\right)\sum_{k=1}^{K} \hat{v}_k W_{jk}\right)$$

2 Similar to training with N + M observed units, with the M extra units set to the empirical word distribution

3 Experiments show that this further approximation works well in practice
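
One way to see where the factor $1 + M/N$ comes from is the short derivation below. It is a sketch that assumes the first-layer mean-field update with biases omitted, and fixes the second-layer parameters to the empirical word distribution $\mu^{(2)}_k = \hat{v}_k / N$ instead of optimizing them.

```latex
\mu^{(1)}_j
  = \sigma\!\left(\sum_{k=1}^{K} W_{jk}\left(\hat{v}_k + M \mu^{(2)}_k\right)\right)
  = \sigma\!\left(\sum_{k=1}^{K} W_{jk}\left(\hat{v}_k + M\,\frac{\hat{v}_k}{N}\right)\right)
  = \sigma\!\left(\left(1 + \frac{M}{N}\right)\sum_{k=1}^{K} W_{jk}\,\hat{v}_k\right)
```

For a short document ($N \ll M$) the factor is large, so the second layer contributes most of the input to $h^{(1)}$; for a long document ($M \ll N$) the factor approaches 1.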


Inference

Find $P(h^{(1)} \mid V)$ : latent topic structure of the observed document

Correct way : Use the mean-field approximation, as done during training

Fast alternative : Multiply the visible-to-hidden weights by $(1 + \frac{M}{N})$ and approximate using the equation from the Pretraining slide
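
The fast alternative amounts to a single feed-forward pass. The sketch below follows the scaled-sigmoid equation with biases omitted, as on the Pretraining slide; the function name `fast_posterior` and the toy weights are assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fast_posterior(v_hat, W, M):
    """Approximate P(h1[j] = 1 | V) as sigmoid((1 + M / N) * sum_k v_hat[k] * W[j][k])."""
    N = sum(v_hat)
    if N == 0:
        raise ValueError("document must contain at least one word")
    scale = 1.0 + M / N
    K = len(v_hat)
    return [sigmoid(scale * sum(W[j][k] * v_hat[k] for k in range(K))) for j in range(len(W))]


# Toy weights (hypothetical): F = 2 hidden units, K = 3 dictionary words.
W = [[0.5, -1.0, 0.25],
     [0.0, 1.0, -0.5]]
short_doc = fast_posterior([1, 0, 1], W, M=10)       # N = 2, scale = 6
long_doc = fast_posterior([100, 0, 100], W, M=10)    # N = 200, scale = 1.05
```

The scale factor depends on the document length, so the same weights give a strongly amplified input for short documents and an almost unscaled one for long documents.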


Experiments - Perplexities


Experiments - Document Retrieval


Experiments - Effect of Document Size


Thank you!

Questions?
