Thai Word Embedding with TensorFlow
kobkrit-viriyayudhakorn
![Page 1: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/1.jpg)
TensorFlow + NLP: Language Vector Space Model (Word2Vec) Tutorial
![Page 2: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/2.jpg)
Goal of this tutorial
• Learn how to do NLP in TensorFlow
• Learn word embeddings that can extract relationships between discrete atomic symbols (words) from a text corpus.
![Page 3: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/3.jpg)
Words in Text Corpus
![Page 4: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/4.jpg)
NLP in Deep Learning
• Word embeddings are needed for deep learning on text. Why?
• Images and audio already provide useful information about the relationships between instances (pixels, frames).
• A pixel value of #FF0000 is very similar to #FE0000, since both are red. We can compute the difference directly from the values.
• Text does not provide useful information about the relationships between individual symbols.
• 'cat' is represented as Id537 and 'dog' as Id143; the computer does not know the relationship between Id537 and Id143.
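To make the contrast concrete, here is a minimal Python sketch. The IDs 537 and 143 come from the slide; the three-dimensional vectors are hand-picked values for illustration only, not learned embeddings, and "car" is an extra word added here for comparison.

```python
import math

# Integer IDs from the slide: the numeric gap depends only on how IDs
# were assigned and says nothing about meaning.
word_id = {"cat": 537, "dog": 143}
id_gap = abs(word_id["cat"] - word_id["dog"])

# Hypothetical dense vectors (hand-picked for illustration, not trained).
embedding = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sim_cat_dog = cosine(embedding["cat"], embedding["dog"])
sim_cat_car = cosine(embedding["cat"], embedding["car"])
print(f"ID gap: {id_gap}")
print(f"cat~dog: {sim_cat_dog:.3f}, cat~car: {sim_cat_car:.3f}")
```

With vectors like these, similarity can be computed from the representation itself, which is what an ID alone cannot offer.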
![Page 5: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/5.jpg)
![Page 6: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/6.jpg)
Vector Space Model
• Find the relationships between discrete symbols (in this case, words).
• Two families of methods have been proposed.
• Count-based methods.
• Count how often each word co-occurs with its neighboring words in a large text corpus (e.g., Latent Semantic Analysis).
• Predictive methods.
• Try to predict a word from its neighbors (e.g., neural probabilistic language models).
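As a rough illustration of the count-based idea, the sketch below counts neighbor co-occurrences in one short sentence. The sentence and the window of one word on each side are assumptions chosen for the example; a real count-based method would use a large corpus and then compress the counts into dense vectors.

```python
from collections import Counter

corpus = "the cat sits on the mat".split()
window = 1  # one neighbor on each side (assumed for illustration)

# cooccurrence[(word, neighbor)] = number of times neighbor appears
# within the window around word.
cooccurrence = Counter()
for i, word in enumerate(corpus):
    start = max(0, i - window)
    end = min(len(corpus), i + window + 1)
    for j in range(start, end):
        if j != i:
            cooccurrence[(word, corpus[j])] += 1

for (word, neighbor), count in sorted(cooccurrence.items()):
    print(word, neighbor, count)
```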
![Page 7: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/7.jpg)
Word2Vec
• A computationally efficient predictive model for learning word embeddings from raw text.
• Created by Tomas Mikolov and colleagues at Google.
• Two flavors:
• Continuous Bag-of-Words (CBOW)
• Skip-Gram model
![Page 8: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/8.jpg)
CBOW
• Continuous Bag-of-Words (CBOW)
• Predicts the target word from the source context words.
• Input: "The cat sits on the ______"
• Output: mat
• Example, 3-gram CBOW: (the, cat) => sits, (cat, sits) => on, (sits, on) => the, (on, the) => mat
• Better for small datasets.
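The 3-gram example above can be reproduced with a few lines of Python. This is a sketch that follows the slide's convention of using the two preceding words as context; the function name `cbow_pairs` is hypothetical, and word2vec implementations more commonly take context words from both sides of the target.

```python
def cbow_pairs(tokens, context_size=2):
    """Return (context, target) pairs where the context is the
    context_size words immediately before the target."""
    pairs = []
    for i in range(context_size, len(tokens)):
        context = tuple(tokens[i - context_size:i])
        pairs.append((context, tokens[i]))
    return pairs

tokens = "the cat sits on the mat".split()
pairs = cbow_pairs(tokens)
for context, target in pairs:
    print(context, "=>", target)
```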
![Page 9: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/9.jpg)
Skip-Gram model
• Predicts the source context words from the target word.
• Input: sits
• Output: "The cat ____ on the mats"
• Example, 1-skip 3-gram Skip-Gram, written as (context) => target: (the, sits) => cat, (cat, on) => sits, (sits, the) => on, (on, mats) => the. The model is given the target and predicts each context word.
• Better for large datasets. This is the model used in these slides.
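One way to see how Skip-Gram differs from CBOW is to invert the (context) => target entries from the example into (target, context word) training pairs. The sketch below does only that; the variable names are illustrative.

```python
# (context) => target entries copied from the slide's example.
context_to_target = [
    (("the", "sits"), "cat"),
    (("cat", "on"), "sits"),
    (("sits", "the"), "on"),
    (("on", "mats"), "the"),
]

# Skip-gram inverts each entry: the target becomes the input and
# every context word becomes a separate prediction label.
skipgram_pairs = [
    (target, context_word)
    for context, target in context_to_target
    for context_word in context
]

for target, context_word in skipgram_pairs:
    print(target, "->", context_word)
```

Each context of two words yields two training pairs, so the four entries become eight (input, label) examples.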
![Page 10: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/10.jpg)
Noise-Contrastive Training for Vector Space Model
• We use gradient descent on a binary logistic regression objective to train the word-relationship model (a neural network).
• The model learns to discriminate the real target words (those that occur in the skip-gram pairs) from imaginary noise words (those that do not). To do so, we maximize the following objective function:
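The objective itself appears only on the slide image and is not part of this transcript. As a hedged illustration, the sketch below evaluates a commonly used negative-sampling form for one target word: the log-sigmoid of the score for the real word plus the log-sigmoid of the negated score for each noise word. The vectors are made-up numbers and the function names are hypothetical; this is not necessarily the exact formula on the slide.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def negative_sampling_objective(target_vec, real_vec, noise_vecs):
    """log sigmoid(real . target) + sum of log sigmoid(-noise . target).
    Larger is better: real words score high, noise words score low."""
    score = math.log(sigmoid(dot(real_vec, target_vec)))
    for noise_vec in noise_vecs:
        score += math.log(sigmoid(-dot(noise_vec, target_vec)))
    return score

# Made-up vectors for illustration only.
target = [0.5, -0.2, 0.1]
real_context = [0.6, -0.1, 0.3]
noise = [[-0.4, 0.3, 0.0], [-0.2, 0.5, -0.1]]

# Correct labeling: the real word is treated as real.
good = negative_sampling_objective(target, real_context, noise)
# Wrong labeling: a noise word is treated as real and vice versa.
bad = negative_sampling_objective(target, noise[0], [real_context, noise[1]])
print(f"correct labeling: {good:.3f}, swapped labeling: {bad:.3f}")
```

Training adjusts the vectors so that the value for the correct labeling rises, which is what "maximize" refers to.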
![Page 11: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/11.jpg)
Negative Sampling
![Page 12: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/12.jpg)
Input
• Batch training, e.g., window size = 9
• "the quick brown fox jumped over the lazy dog"
• 1-skip 3-gram Skip-Gram: (the, brown) => quick, (quick, fox) => brown, (brown, jumped) => fox, ...
• Dataset: (quick, the), (quick, brown), (brown, quick), (brown, fox), ...
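The dataset above can be generated and turned into integer inputs and labels, which is the form a model actually consumes. This is a minimal sketch: it assumes one context word on each side of the target (which reproduces the pairs listed on the slide), assigns IDs by word frequency, and uses the hypothetical names `skipgram_pairs`, `batch`, and `labels`.

```python
from collections import Counter

tokens = "the quick brown fox jumped over the lazy dog".split()

# Assign IDs by descending frequency, so the most common word gets ID 0.
# Ties keep their order of first appearance.
counts = Counter(tokens)
word_to_id = {word: idx for idx, (word, _) in enumerate(counts.most_common())}

def skipgram_pairs(tokens, window=1):
    """(target, context) pairs for every position that has a full
    window of neighbors on both sides."""
    pairs = []
    for i in range(window, len(tokens) - window):
        for j in range(i - window, i + window + 1):
            if j != i:
                pairs.append((tokens[i], tokens[j]))
    return pairs

pairs = skipgram_pairs(tokens)
batch = [word_to_id[target] for target, _ in pairs]     # model inputs
labels = [word_to_id[context] for _, context in pairs]  # words to predict

print(pairs[:4])
print(batch[:4], labels[:4])
```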
![Page 13: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/13.jpg)
Loop
• (quick, the), (quick, brown), (brown, quick), (brown, fox), ...
• In each iteration, randomly pick a word that is not in the window as the negative sample. Then stochastic gradient descent adjusts the weights to maximize the objective function above.
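The loop can be sketched in plain Python without TensorFlow. Everything below is an assumption made for illustration: the embedding size, learning rate, number of epochs, one negative sample per pair, and the log-sigmoid objective. It is not the code shown on the slides.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

tokens = "the quick brown fox jumped over the lazy dog".split()
vocab = sorted(set(tokens))
dim, learning_rate, epochs = 8, 0.1, 50  # illustrative values

def random_vector():
    return [random.uniform(-0.5, 0.5) for _ in range(dim)]

# Separate vectors for a word as a target (input) and as a context (output).
in_vec = {w: random_vector() for w in vocab}
out_vec = {w: random_vector() for w in vocab}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# (target, context, window words) for every interior position,
# with one context word on each side of the target.
examples = []
for i in range(1, len(tokens) - 1):
    window_words = set(tokens[i - 1:i + 2])
    for j in (i - 1, i + 1):
        examples.append((tokens[i], tokens[j], window_words))

def mean_log_prob():
    """Average log sigmoid(context . target) over the real pairs."""
    total = sum(math.log(sigmoid(dot(out_vec[c], in_vec[t])))
                for t, c, _ in examples)
    return total / len(examples)

before = mean_log_prob()
for _ in range(epochs):
    for target, context, window_words in examples:
        # Negative sample: a random word that is not in the current window.
        noise = random.choice([w for w in vocab if w not in window_words])
        target_update = [0.0] * dim
        for word, label in ((context, 1.0), (noise, 0.0)):
            # Step size times the gradient of the log-likelihood w.r.t. the score.
            g = learning_rate * (label - sigmoid(dot(out_vec[word], in_vec[target])))
            for d in range(dim):
                target_update[d] += g * out_vec[word][d]
                out_vec[word][d] += g * in_vec[target][d]
        for d in range(dim):
            in_vec[target][d] += target_update[d]
after = mean_log_prob()
print(f"mean log-probability of real pairs: {before:.3f} -> {after:.3f}")
```

The update adds the gradient rather than subtracting it, since the objective is maximized.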
![Page 14: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/14.jpg)
TensorFlow code
![Page 15: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/15.jpg)
![Page 16: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/16.jpg)
10,000 news articles
![Page 17: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/17.jpg)
Clean Data
![Page 18: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/18.jpg)
Step 0
![Page 19: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/19.jpg)
Step 30,000
![Page 20: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/20.jpg)
Step 0
Step 30,000
![Page 21: Thai Word Embedding with Tensorflow](https://reader034.fdocuments.in/reader034/viewer/2022051521/5a6ea4227f8b9a70728b5b1d/html5/thumbnails/21.jpg)