NLP Programming Tutorial 1 - Unigram Language Models
Graham Neubig
Nara Institute of Science and Technology (NAIST)
Language Model Basics
Why Language Models?
● We have an English speech recognition system. Which answer is better?

Speech input, candidate transcriptions:
W1 = speech recognition system
W2 = speech cognition system
W3 = speck podcast histamine
W4 = スピーチ が 救出 ストン
● Language models tell us the answer!
Probabilistic Language Models
● Language models assign a probability to each sentence:

W1 = speech recognition system    P(W1) = 4.021 × 10^-3
W2 = speech cognition system      P(W2) = 8.932 × 10^-4
W3 = speck podcast histamine      P(W3) = 2.432 × 10^-7
W4 = スピーチ が 救出 ストン       P(W4) = 9.124 × 10^-23

● We want P(W1) > P(W2) > P(W3) > P(W4)
● (or P(W4) > P(W1), P(W2), P(W3) for Japanese?)
Calculating Sentence Probabilities
● We want the probability of

W = speech recognition system

● Represent this mathematically as:

P(|W| = 3, w_1 = "speech", w_2 = "recognition", w_3 = "system")
● Expanding with the chain rule, this becomes:

P(|W| = 3, w_1 = "speech", w_2 = "recognition", w_3 = "system") =
    P(w_1 = "speech" | w_0 = "<s>")
  × P(w_2 = "recognition" | w_0 = "<s>", w_1 = "speech")
  × P(w_3 = "system" | w_0 = "<s>", w_1 = "speech", w_2 = "recognition")
  × P(w_4 = "</s>" | w_0 = "<s>", w_1 = "speech", w_2 = "recognition", w_3 = "system")
NOTE: we use sentence start <s> and end </s> symbols
NOTE: P(w_0 = "<s>") = 1
Incremental Computation
● The previous equation can be written:

P(W) = ∏_{i=1}^{|W|+1} P(w_i | w_0 … w_{i-1})

● How do we decide the probability P(w_i | w_0 … w_{i-1})?
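To make the incremental computation concrete, here is a minimal Python sketch; the function cond_prob is a hypothetical stand-in for whatever conditional model we end up choosing, not something defined in the tutorial:

```python
# Sketch: compute P(W) as a running product of conditional probabilities.
# cond_prob(word, history) is a hypothetical model interface.

def sentence_probability(words, cond_prob):
    words = words + ["</s>"]           # append the sentence-end symbol
    history = ["<s>"]                  # P(w_0 = <s>) = 1, so start here
    prob = 1.0
    for w in words:
        prob *= cond_prob(w, history)  # multiply in P(w_i | w_0 ... w_{i-1})
        history = history + [w]
    return prob
```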
Maximum Likelihood Estimation
● Count word strings in the corpus and take the fraction:

P(w_i | w_1 … w_{i-1}) = c(w_1 … w_i) / c(w_1 … w_{i-1})

i live in osaka . </s>
i am a graduate student . </s>
my school is in nara . </s>
P(am | <s> i) = c(<s> i am)/c(<s> i) = 1 / 2 = 0.5
P(live | <s> i) = c(<s> i live)/c(<s> i) = 1 / 2 = 0.5
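A quick Python sketch of this count-and-divide estimate on the toy corpus above; counting sentence prefixes is one straightforward way to get the c(·) counts the formula needs:

```python
# Sketch: maximum likelihood estimation by counting sentence prefixes.
from collections import defaultdict

corpus = ["i live in osaka . </s>",
          "i am a graduate student . </s>",
          "my school is in nara . </s>"]

counts = defaultdict(int)
for line in corpus:
    words = ["<s>"] + line.split()
    for i in range(1, len(words) + 1):
        counts[tuple(words[:i])] += 1          # c(w_1 ... w_i)

# P(am | <s> i) = c(<s> i am) / c(<s> i) = 1/2
print(counts[("<s>", "i", "am")] / counts[("<s>", "i")])   # 0.5
```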
Problem With Full Estimation
● Weak when counts are low:

Training:
i live in osaka . </s>
i am a graduate student . </s>
my school is in nara . </s>

Test:
<s> i live in nara . </s>

P(nara | <s> i live in) = 0/1 = 0
P(W = <s> i live in nara . </s>) = 0
Unigram Model
● Do not use history:
P(w_i | w_1 … w_{i-1}) ≈ P(w_i) = c(w_i) / ∑_w̃ c(w̃)

i live in osaka . </s>
i am a graduate student . </s>
my school is in nara . </s>

P(nara) = 1/20 = 0.05
P(i) = 2/20 = 0.1
P(</s>) = 3/20 = 0.15

P(W = i live in nara . </s>) = 0.1 × 0.05 × 0.1 × 0.05 × 0.15 × 0.15 = 5.625 × 10^-7
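A short Python sketch of the unigram estimate, reproducing the numbers above from the same toy corpus:

```python
# Sketch: unigram probabilities as c(w) / total token count.
from collections import Counter

corpus = ["i live in osaka . </s>",
          "i am a graduate student . </s>",
          "my school is in nara . </s>"]

counts = Counter(w for line in corpus for w in line.split())
total = sum(counts.values())           # 20 tokens in all

def p(word):
    return counts[word] / total        # c(w) / sum of all counts

print(p("nara"), p("i"), p("</s>"))    # 0.05 0.1 0.15

prob = 1.0
for w in "i live in nara . </s>".split():
    prob *= p(w)
print(prob)                            # ~5.625e-07
```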
Be Careful of Integers!
● Divide two integers and you get an integer, rounded down (this is Python 2 behavior):

$ ./my-program.py
0

● Convert one integer to a float, and you will be OK:

$ ./my-program.py
0.5
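The program itself is not preserved in the transcript; a minimal reconstruction consistent with the two outputs above might look like this (note the pitfall applies to Python 2; in Python 3, / always does float division):

```python
# Hypothetical my-program.py, reconstructed from its output alone.
print(1 / 2)      # Python 2 prints 0: integer division rounds down
print(1.0 / 2)    # prints 0.5: one float operand forces float division
# In Python 3, 1 / 2 is already 0.5; use 1 // 2 for truncating division.
```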
What about Unknown Words?!

● Simple ML estimation doesn't work
● Often, unknown words are ignored (ASR)
● A better way to solve this:
● Save some probability for unknown words (λunk = 1 − λ1)
● Guess the total vocabulary size (N), including unknowns
P(w_i) = λ1 P_ML(w_i) + (1 − λ1) × 1/N

i live in osaka . </s>
i am a graduate student . </s>
my school is in nara . </s>

P(nara) = 1/20 = 0.05
P(i) = 2/20 = 0.1
P(kyoto) = 0/20 = 0
Unknown Word Example
● Total vocabulary size: N = 10^6
● Unknown word probability: λunk = 0.05 (λ1 = 0.95)

P(w_i) = λ1 P_ML(w_i) + (1 − λ1) × 1/N

P(nara) = 0.95 × 0.05 + 0.05 × (1/10^6) = 0.04750005
P(i) = 0.95 × 0.10 + 0.05 × (1/10^6) = 0.09500005
P(kyoto) = 0.95 × 0.00 + 0.05 × (1/10^6) = 0.00000005
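A small Python sketch of the interpolated probability, using the slide's settings λ1 = 0.95 and N = 10^6:

```python
# Sketch: interpolate the ML estimate with a uniform unknown-word model.
lambda_1 = 0.95
N = 10 ** 6

def p_smoothed(p_ml):
    # lambda_1 * P_ML(w) + (1 - lambda_1) * 1/N
    return lambda_1 * p_ml + (1 - lambda_1) / N

print(p_smoothed(0.05))   # P(nara)  = 0.04750005
print(p_smoothed(0.10))   # P(i)     = 0.09500005
print(p_smoothed(0.00))   # P(kyoto) = 5e-08, i.e. 0.00000005
```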
Evaluating Language Models
Experimental Setup
● Use training and test sets

Training Data:
i live in osaka
i am a graduate student
my school is in nara
...

Testing Data:
i live in nara
i am a student
i have lots of homework
...

● Train the model on the training data, then test it on the testing data
● Measure model accuracy with likelihood, log likelihood, entropy, and perplexity
Likelihood
● Likelihood is the probability of some observed data (the test set W_test), given the model M:

P(W_test | M) = ∏_{w ∈ W_test} P(w | M)

i live in nara        P(w = "i live in nara" | M)       = 2.52 × 10^-21
i am a student        P(w = "i am a student" | M)       = 3.48 × 10^-19
my classes are hard   P(w = "my classes are hard" | M)  = 2.15 × 10^-34

P(W_test | M) = 2.52 × 10^-21 × 3.48 × 10^-19 × 2.15 × 10^-34 = 1.89 × 10^-73
Log Likelihood
● Likelihood uses very small numbers, which causes underflow
● Taking the log resolves this problem:

log P(W_test | M) = ∑_{w ∈ W_test} log P(w | M)

i live in nara        log P(w = "i live in nara" | M)       = -20.58
i am a student        log P(w = "i am a student" | M)       = -18.45
my classes are hard   log P(w = "my classes are hard" | M)  = -33.67

log P(W_test | M) = -20.58 + -18.45 + -33.67 = -72.70
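A small Python sketch of this sum-of-logs computation, using the three sentence probabilities from the likelihood slide (the per-sentence logs on the slide are rounded, so the total lands near -72.7):

```python
# Sketch: sum log probabilities instead of multiplying raw ones,
# so the tiny product 1.89e-73 never has to be represented directly.
import math

sentence_probs = [2.52e-21, 3.48e-19, 2.15e-34]
log_likelihood = sum(math.log10(p) for p in sentence_probs)
print(log_likelihood)    # about -72.7
```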
Calculating Logs
● Python's math package has a function for logs
$ ./my-program.py
4.60517018599
2.0
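The program is not preserved in the transcript; a plausible reconstruction producing those two numbers (the natural log of 100 and the base-2 log of 4) is:

```python
#!/usr/bin/env python
# Hypothetical my-program.py, reconstructed from its output alone.
import math
print(math.log(100))    # natural log: 4.60517018599...
print(math.log(4, 2))   # log base 2:  2.0
```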
Entropy
● Entropy H is the average negative log2 likelihood per word:

H(W_test | M) = (1/|W_test|) ∑_{w ∈ W_test} −log2 P(w | M)

i live in nara        −log2 P(w = "i live in nara" | M)       = 68.43
i am a student        −log2 P(w = "i am a student" | M)       = 61.32
my classes are hard   −log2 P(w = "my classes are hard" | M)  = 111.84

H = (68.43 + 61.32 + 111.84) / 12 = 20.13    (# of words = 12)

* note: we can also count </s> in the # of words (in which case it is 15)
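The per-word average is a one-liner in Python; this sketch reproduces the 20.13 above from the per-sentence values:

```python
# Sketch: per-word entropy from the per-sentence -log2 probabilities.
neg_log2_probs = [68.43, 61.32, 111.84]
num_words = 12                        # 15 if we also count </s>
H = sum(neg_log2_probs) / num_words
print(H)                              # 20.1325
```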
Perplexity
● Equal to two to the power of per-word entropy:

PPL = 2^H

● (Mainly because it makes more impressive numbers)
● For uniform distributions, equal to the size of the vocabulary:

V = 5    H = −log2(1/5)    PPL = 2^H = 2^{−log2(1/5)} = 2^{log2 5} = 5
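A one-off Python check of the uniform-distribution claim:

```python
# Sketch: a uniform model over V = 5 words has perplexity V.
import math

V = 5
H = -math.log2(1 / V)    # per-word entropy of the uniform model
print(2 ** H)            # 5.0 (up to floating-point rounding)
```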
Coverage
● The percentage of known words in the corpus

a bird a cat a dog a </s>

If "dog" is an unknown word:
Coverage: 7/8 *

* we often omit the sentence-final symbol → 6/7
Exercise
● Write two programs:
  ● train-unigram: creates a unigram model
  ● test-unigram: reads a unigram model and calculates entropy and coverage for the test set
● Test them on test/01-train-input.txt and test/01-test-input.txt
● Train the model on data/wiki-en-train.word
● Calculate entropy and coverage on data/wiki-en-test.word
● Report your scores next week
train-unigram Pseudo-Code
create a map counts
create a variable total_count = 0

for each line in the training_file
    split line into an array of words
    append "</s>" to the end of words
    for each word in words
        add 1 to counts[word]
        add 1 to total_count

open the model_file for writing
for each word, count in counts
    probability = counts[word] / total_count
    print word, probability to model_file
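For reference, here is one way this pseudo-code could be rendered as runnable Python; this is a sketch, not the tutorial's reference solution, and the argument order (training file, then model file) is an assumption:

```python
#!/usr/bin/env python
# Sketch implementing the train-unigram pseudo-code above.
import sys
from collections import defaultdict

counts = defaultdict(int)
total_count = 0

with open(sys.argv[1]) as training_file:   # e.g. data/wiki-en-train.word
    for line in training_file:
        words = line.split()
        words.append("</s>")               # sentence-end symbol
        for word in words:
            counts[word] += 1
            total_count += 1

with open(sys.argv[2], "w") as model_file:
    for word, count in sorted(counts.items()):
        probability = count / total_count  # float division in Python 3
        print(word, probability, file=model_file)
```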
test-unigram Pseudo-Code

λ1 = 0.95, λunk = 1 − λ1, V = 1000000, W = 0, H = 0

Load Model:
create a map probabilities
for each line in model_file
    split line into w and P
    set probabilities[w] = P

Test and Print:
for each line in test_file
    split line into an array of words
    append "</s>" to the end of words
    for each w in words
        add 1 to W
        set P = λunk / V
        if probabilities[w] exists
            set P += λ1 * probabilities[w]
        else
            add 1 to unk
        add −log2 P to H

print "entropy = " + H/W
print "coverage = " + (W − unk)/W
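Likewise, a runnable Python sketch of the test-unigram pseudo-code; again the argument order (model file, then test file) is an assumption:

```python
#!/usr/bin/env python
# Sketch implementing the test-unigram pseudo-code above.
import math
import sys

lambda_1 = 0.95
lambda_unk = 1 - lambda_1
V = 1000000                            # guessed total vocabulary size
W = 0                                  # total word count
H = 0.0                                # accumulated -log2 probability
unk = 0                                # unknown word count

probabilities = {}
with open(sys.argv[1]) as model_file:  # load the model
    for line in model_file:
        w, P = line.split()
        probabilities[w] = float(P)

with open(sys.argv[2]) as test_file:   # test and print
    for line in test_file:
        words = line.split()
        words.append("</s>")
        for w in words:
            W += 1
            P = lambda_unk / V         # unknown-word share
            if w in probabilities:
                P += lambda_1 * probabilities[w]
            else:
                unk += 1
            H += -math.log2(P)

print("entropy =", H / W)
print("coverage =", (W - unk) / W)
```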
Thank You!