Predictive Text Embedding using LINE


Transcript of Predictive Text Embedding using LINE

Page 1: Predictive Text Embedding using LINE

Predictive Text Embedding using LINE

Shashank Gupta (201507574), Nishant Prateek (201225113), Karan Chandnani (201505507)

IRE Major Project, Spring 2016

Page 2: Predictive Text Embedding using LINE

Outline

1 Introduction: Predictive Text Embedding, Text Networks

2 Embeddings: Bipartite Network Embedding, Heterogeneous Text Network Embedding

3 Training Algorithm: Pre-training + Fine-tuning

4 Methods and Experiments: Dataset, Experiments

5 Results: Accuracy, Discussion

6 Bibliography

Page 4: Predictive Text Embedding using LINE

Predictive Text Embedding: An Introduction

Adapts the advantages of unsupervised text embeddings while naturally utilizing labeled information in representation learning.

An effective low-dimensional representation is learned jointly from limited labeled examples and a large amount of unlabeled examples.

The representation is optimized for a particular task, e.g. text classification or sentiment analysis.

Page 6: Predictive Text Embedding using LINE

Text Networks

Word-Word Network: the word-word co-occurrence network, denoted Gww = (V, Eww), captures the word co-occurrence information in local contexts of the unlabeled data. V is a vocabulary of words and Eww is the set of edges between words; the weight wij of the edge between words vi and vj is the number of times the two words co-occur within a context window.
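As an illustration, a minimal sketch of how such a co-occurrence network could be built from tokenized documents; the function name, window size, and count-based weighting below are assumptions for illustration, not details taken from the slides.

```python
from collections import defaultdict

def build_word_word_network(tokenized_docs, window=5):
    """Build a weighted word-word co-occurrence network Gww.

    Edge weight w_ij = number of times words i and j co-occur within a
    sliding window of `window` tokens (the window size is an assumption;
    the slides do not report the value used).
    """
    edges = defaultdict(float)
    for tokens in tokenized_docs:
        for pos, word in enumerate(tokens):
            for ctx in tokens[pos + 1:pos + 1 + window]:
                if word != ctx:
                    # undirected edge, stored under a canonical key
                    edges[tuple(sorted((word, ctx)))] += 1.0
    return edges

# Toy usage
docs = [["the", "movie", "was", "great"], ["great", "movie"]]
print(build_word_word_network(docs, window=2))
```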

Page 7: Predictive Text Embedding using LINE

Text Networks

Word-Document Network: the word-document network, denoted Gwd = (V ∪ D, Ewd), is a bipartite network where D is a set of documents and V is a set of words. Ewd is the set of edges between words and documents.

Page 8: Predictive Text Embedding using LINE

Text Networks

Word-Label Network: the word-label network, denoted Gwl = (V ∪ L, Ewl), is a bipartite network that captures category-level word co-occurrences. L is a set of class labels and V a set of words. Ewl is the set of edges between words and classes.
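A companion sketch for the two bipartite networks above (word-document and word-label); the term-frequency weighting follows the PTE paper, while the function and variable names are ours.

```python
from collections import defaultdict

def build_bipartite_networks(tokenized_docs, labels=None):
    """Sketch of the word-document (Gwd) and word-label (Gwl) edge lists.

    Assumed weighting, consistent with the PTE paper: a word-document edge
    is weighted by the term frequency of the word in the document, and a
    word-label edge aggregates those counts over all documents carrying
    the label. labels[d] may be None for unlabeled documents.
    """
    e_wd = defaultdict(float)   # (word, doc_id) -> weight
    e_wl = defaultdict(float)   # (word, label)  -> weight
    for doc_id, tokens in enumerate(tokenized_docs):
        for word in tokens:
            e_wd[(word, doc_id)] += 1.0
            if labels is not None and labels[doc_id] is not None:
                e_wl[(word, labels[doc_id])] += 1.0
    return e_wd, e_wl

# Toy usage: two labeled reviews and one unlabeled review
docs = [["great", "movie"], ["boring", "movie"], ["a", "movie"]]
e_wd, e_wl = build_bipartite_networks(docs, labels=["pos", "neg", None])
```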

Page 9: Predictive Text Embedding using LINE

Text Networks

Heterogeneous Text Network: the heterogeneous text network is the combination of the word-word, word-document, and word-label networks constructed from both unlabeled and labeled text data. It captures different levels of word co-occurrences and contains both labeled and unlabeled information.

Page 11: Predictive Text Embedding using LINE

Bipartite Network Embedding

PTE is composed of three individual bipartite graphs. We need a method to embed each of these bipartite graphs into a low-dimensional space.

Given a bipartite network G = (VA ∪ VB, E), where VA and VB are two disjoint sets of vertices of different types and E is the set of edges between them, we first define the conditional probability of vertex vi in set VA being generated by vertex vj in set VB as:

p(v_i \mid v_j) = \frac{\exp(\vec{u}_i^{\,T} \cdot \vec{u}_j)}{\sum_{i' \in A} \exp(\vec{u}_{i'}^{\,T} \cdot \vec{u}_j)}
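A minimal numpy sketch of this conditional probability; the matrix names U_A and U_B (one embedding matrix per vertex set) are illustrative assumptions.

```python
import numpy as np

def conditional_prob(U_A, U_B, i, j):
    """p(v_i | v_j) for a bipartite network G = (V_A ∪ V_B, E).

    U_A: (|V_A|, d) embeddings of the vertices in V_A
    U_B: (|V_B|, d) embeddings of the vertices in V_B
    Softmax over all vertices in V_A, conditioned on v_j in V_B.
    """
    scores = U_A @ U_B[j]        # u_{i'}^T · u_j for every i' in V_A
    scores -= scores.max()       # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[i]
```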

Page 14: Predictive Text Embedding using LINE

Heterogeneous Text Network Embedding

O_{pte} = O_{ww} + O_{wd} + O_{wl}

O_{ww} = -\sum_{(i,j) \in E_{ww}} w_{ij} \log p(v_i \mid v_j)

O_{wd} = -\sum_{(i,j) \in E_{wd}} w_{ij} \log p(v_i \mid d_j)

O_{wl} = -\sum_{(i,j) \in E_{wl}} w_{ij} \log p(v_i \mid l_j)
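For illustration, the combined objective could be evaluated as below, assuming each edge set is a dict mapping integer index pairs (i, j) to weights and reusing the conditional_prob helper sketched earlier; in practice the objective is optimized by edge sampling rather than computed exactly.

```python
import numpy as np

def network_objective(edges, U_src, U_ctx):
    """O = -sum over (i, j) in E of w_ij * log p(v_i | context_j)."""
    total = 0.0
    for (i, j), w in edges.items():
        total -= w * np.log(conditional_prob(U_src, U_ctx, i, j))
    return total

# O_pte = O_ww + O_wd + O_wl, with W, D, L as the (assumed) word,
# document and label embedding matrices:
# O_pte = (network_objective(E_ww, W, W)
#          + network_objective(E_wd, W, D)
#          + network_objective(E_wl, W, L))
```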

Page 19: Predictive Text Embedding using LINE

Pre-training + Fine-Tuning

We used the pre-training and fine-tuning approach to optimize the objective function O_pte. We learn the embeddings with the unlabeled data first, and then fine-tune the embeddings with the word-label network.

Page 20: Predictive Text Embedding using LINE

Algorithm

Algorithm: Pre-training + Fine-tuning
Data: Gww, Gwd, Gwl, number of samples T, number of negative samples K
Result: word embeddings w

while iter ≤ T do
    sample an edge from Eww and draw K negative edges, and update the word embeddings;
    sample an edge from Ewd and draw K negative edges, and update the word and document embeddings;
end while

while iter ≤ T do
    sample an edge from Ewl and draw K negative edges, and update the word embeddings;
end while
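The update inside each loop can be sketched as a standard negative-sampling SGD step, as in LINE; the sigmoid surrogate objective, learning rate, and array layout below are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_update(U_src, U_ctx, i, j, neg_ids, lr=0.025):
    """One SGD step for the positive edge (v_i, context v_j).

    Approximately maximizes
        log sigmoid(u_i · u_j) + sum_n log sigmoid(-u_n · u_j),
    the usual negative-sampling surrogate for the softmax objective;
    neg_ids are the K sampled negative vertices.
    """
    grad_ctx = np.zeros_like(U_ctx[j])
    for n, label in [(i, 1.0)] + [(neg, 0.0) for neg in neg_ids]:
        g = lr * (label - sigmoid(U_src[n] @ U_ctx[j]))
        grad_ctx += g * U_src[n]   # accumulate gradient for the context vertex
        U_src[n] += g * U_ctx[j]   # update the positive/negative vertex
    U_ctx[j] += grad_ctx
```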

Page 22: Predictive Text Embedding using LINE

Dataset

For this project, we use the Large Movie Review Dataset. This dataset consists of 25,000 movie reviews from IMDB.

The test set additionally contains another 25,000 movie reviews.

Apart from this, there are another 50,000 unlabeled movie reviews that we used for fine-tuning.
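For completeness, a minimal sketch of reading the reviews, assuming the standard aclImdb directory layout (train/pos, train/neg, train/unsup, test/pos, test/neg); paths and helper names are ours.

```python
import os

def load_reviews(split_dir):
    """Load raw review texts from one split of the Large Movie Review Dataset."""
    texts, labels = [], []
    for label in ("pos", "neg", "unsup"):
        path = os.path.join(split_dir, label)
        if not os.path.isdir(path):   # e.g. the test split has no "unsup"
            continue
        for fname in sorted(os.listdir(path)):
            with open(os.path.join(path, fname), encoding="utf-8") as f:
                texts.append(f.read())
            labels.append(label if label != "unsup" else None)
    return texts, labels

# train_texts, train_labels = load_reviews("aclImdb/train")
# test_texts, test_labels = load_reviews("aclImdb/test")
```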

Page 24: Predictive Text Embedding using LINE

Experiments

For the first phase of the project, we did unsupervised training on the Movie Review Dataset using the word2vec (skip-gram) model. This served as the baseline for further experiments.

For the next part, we tried unsupervised training on the word-word network described in section 2. We first tried training with random initialization. We then decided to leverage the word2vec embeddings and use them as the initialization for Gww. This gave us slightly better results than random initialization.

For the third part, we took the unsupervised embeddings obtained in the previous step (with word2vec initialization) and fine-tuned them with the word-label network from section 2, using random edge sampling. The probability of each edge being sampled is proportional to its weight in the heterogeneous text network. This gave us a further increase in performance.
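A minimal sketch of how such a skip-gram baseline could be trained with gensim; the hyperparameters are illustrative, since the slides do not report the values used.

```python
from gensim.models import Word2Vec

# In the real experiment `sentences` would be the tokenized IMDB reviews;
# a toy corpus is used here so the sketch runs on its own.
sentences = [["great", "movie"], ["boring", "movie"], ["great", "acting"]]

model = Word2Vec(
    sentences=sentences,
    vector_size=100,   # embedding dimension (assumed)
    window=5,          # context window size (assumed)
    sg=1,              # 1 = skip-gram
    negative=5,        # negative samples per positive pair
    min_count=1,       # keep every word in the toy corpus
    workers=4,
)
print(model.wv["movie"])   # learned 100-dimensional word vector
```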

Page 28: Predictive Text Embedding using LINE

Accuracy

Algorithm                                  Accuracy
Skip-gram (word2vec)                       84.86%
Unsupervised (Gww)                         75.37%
Unsupervised + Fine-tuning (Gww + Gwl)     77.83%

Page 30: Predictive Text Embedding using LINE

Discussion

Though the unsupervised pre-training + fine-tuning approach gave us the best results among our network-embedding experiments, it still lags behind the skip-gram model. Our results fail to align with those reported in the paper. This could be a result of replacing the alias table method for edge sampling in step 3 with uniform random sampling.
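For reference, a minimal sketch of the alias method (Walker's method) that the original LINE/PTE implementations use for O(1) weighted edge sampling; the function names are ours.

```python
import random

def build_alias_table(weights):
    """O(n) setup for O(1) sampling of indices in proportion to `weights`."""
    n = len(weights)
    total = float(sum(weights))
    scaled = [w * n / total for w in weights]
    prob, alias = [0.0] * n, [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:   # leftovers have probability (numerically) 1
        prob[i] = 1.0
    return prob, alias

def sample_index(prob, alias):
    """Draw one index with probability proportional to its original weight."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]

# Example: sample edges of Gwl in proportion to their weights
# prob, alias = build_alias_table(list(edge_weights))
# edge_idx = sample_index(prob, alias)
```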

Page 31: Predictive Text Embedding using LINE

References

J. Weston, S. Chopra, and K. Adams. #TagSpace: Semantic embeddings from hashtags. In EMNLP, pages 1822-1827, 2014.

J. Tang, M. Qu, and Q. Mei. PTE: Predictive text embedding through large-scale heterogeneous text networks. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1165-1174. ACM, 2015.

J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. LINE: Large-scale information network embedding. In WWW, pages 1067-1077, 2015.

Large Movie Review Dataset: http://ai.stanford.edu/~amaas/data/sentiment/