SI 760 / EECS 597 / Ling 702 Language and Information
SI 760 / EECS 597 / Ling 702
Language and Information
Winter 2004
Handout #1
Course Information
• Instructor: Dragomir R. Radev ([email protected])
• Office: 3080, West Hall Connector
• Phone: (734) 615-5225
• Office hours: TBA
• Course page: http://www.si.umich.edu/~radev/LNI-winter2004/
• Class meets on Mondays, 1-4 PM in 412 WH
Readings
• Two introductions to statistical NLP
• Collocations paper
• Joshua Goodman’s language modeling tutorial
• Documentation for the CMU LM toolkit
N-gram Models
Word Prediction
• Example: “I’d like to make a collect …”
• “I have a gub”
• “He is trying to fine out”
• “Hopefully, all with continue smoothly in my absence”
• “They are leaving in about fifteen minuets to go to her house”
• “I need to notified the bank of [this problem]”
• Language model: a statistical model of word sequences
Counting Words
• Brown corpus (1 million words from 500 texts)
• Example: “He stepped out into the hall, was delighted to encounter a water brother” - how many words?
• Word forms and lemmas. “cat” and “cats” share the same lemma (also tokens and types)
• Shakespeare’s complete works: 884,647 word tokens and 29,066 word types
• Brown corpus: 61,805 types and 37,851 lemmas
• American Heritage 3rd edition has 200,000 “boldface forms” (including some multiword phrases)
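The token/type distinction is easy to reproduce; here is a minimal sketch in Python on the slide's example sentence (lowercasing and dropping punctuation are tokenization assumptions, and they change the answer):

```python
# Counting word tokens vs. word types; lowercasing and dropping
# punctuation are tokenization choices, not the only possible ones.
from collections import Counter
import re

sentence = "He stepped out into the hall, was delighted to encounter a water brother"
tokens = re.findall(r"[a-z']+", sentence.lower())

print(len(tokens), "tokens")        # 13 tokens
print(len(set(tokens)), "types")    # 13 types: no word form repeats here
print(Counter(tokens).most_common(3))
```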
Unsmoothed N-grams
• First approximation: each word has an equal probability of following any other. E.g., with a vocabulary of 100,000 words, the probability of each of them at any given point is .00001
• “the” appears 69,971 times in the Brown corpus, while “rabbit” appears 11 times
• “Just then, the white …”
P(w1,w2,…,wn) = P(w1) P(w2|w1) P(w3|w1,w2) … P(wn|w1,w2,…,wn-1)
A language model
• The sum of probabilities of all strings has to be 1.
• Bigram and trigram models
• How do you estimate the probabilities?
Bigram model: replace P(wn|w1,w2,…,wn-1) with P(wn|wn-1)
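A minimal sketch of the maximum-likelihood bigram estimate on a toy corpus (the corpus and the sentence-boundary markers `<s>`, `</s>` are illustrative assumptions):

```python
# Unsmoothed bigram model: P(w | prev) = count(prev w) / count(prev).
from collections import Counter

corpus = [
    "<s> just then the white rabbit appeared </s>".split(),
    "<s> the white house </s>".split(),
]
unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(pair for sent in corpus for pair in zip(sent, sent[1:]))

def p_bigram(w, prev):
    """MLE estimate; unseen bigrams get probability 0 (hence smoothing)."""
    return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

def p_sentence(words):
    """Chain rule with the bigram (Markov) approximation."""
    p = 1.0
    for prev, w in zip(words, words[1:]):
        p *= p_bigram(w, prev)
    return p

print(p_bigram("white", "the"))                       # 2/2 = 1.0
print(p_sentence("<s> the white house </s>".split())) # 0.5 * 1.0 * 0.5 * 1.0 = 0.25
```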
Perplexity of a language model
• What is the perplexity of guessing a digit if all digits are equally likely?
• How about a letter?
• How about guessing A with a probability of 1/4, B with a probability of 1/2, and 10,000 other cases with a total probability of 1/4 (example modified from Joshua Goodman)?
Perp = 2^( -(1/N) Σi log2 P(Si) )
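A sketch of those three cases, reading the formula as perplexity = 2 raised to the per-symbol entropy (the skewed case uses the distribution above: A = 1/4, B = 1/2, the rest sharing 1/4):

```python
# Perplexity of a known distribution: Perp = 2 ** H, with
# H = -sum_i p_i * log2(p_i) the entropy per guess.
import math

def perplexity(probs):
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** h

print(perplexity([1 / 10] * 10))    # digits:  10.0
print(perplexity([1 / 26] * 26))    # letters: 26.0

# A = 1/4, B = 1/2, and 10,000 other cases sharing 1/4:
skewed = [1 / 4, 1 / 2] + [1 / 4 / 10_000] * 10_000
print(perplexity(skewed))           # ~28.3, far below 10,002
```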
Perplexity across distributions
• What if the actual distribution is very different from the expected one?
• Example: all of the 10,000 other cases are equally likely but P(A) = P(B) = 0.
• Cross-entropy = log2(perplexity), measured in bits
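A sketch of that mismatch: cross-entropy weights the model's log probabilities by the actual distribution, so the perplexity explodes when the model concentrates mass in the wrong places (same toy distribution as in the previous sketch):

```python
# Cross-entropy of model q under the actual distribution p:
# H(p, q) = -sum_x p(x) * log2 q(x); perplexity = 2 ** H(p, q).
import math

def cross_entropy(p, q):
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

n = 10_000
q = [1 / 4, 1 / 2] + [1 / 4 / n] * n   # the model from the previous slide
p = [0.0, 0.0] + [1 / n] * n           # actual: A and B never occur

h = cross_entropy(p, q)
print(h, 2 ** h)   # ~15.3 bits, perplexity 40,000: every event the model
                   # considered unlikely is now all that ever happens
```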
Smoothing techniques
• Imagine the following situation: you are looking at Web pages and you have to guess how many different languages they are written in.
• First one is in English, then French, then again English, then Korean, then Chinese, etc. Total: 5F, 3E, 1K, 1C
• Can you predict what the next language will be?
• What is a problem with the simplest approach to this problem?
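The problem is that the maximum-likelihood estimate gives probability zero to any language not yet seen. A sketch of the contrast, with an assumed inventory of seven candidate languages (the inventory is an illustration, not from the slide):

```python
# MLE vs. add-one (Laplace) estimates for the next page's language.
counts = {"French": 5, "English": 3, "Korean": 1, "Chinese": 1}
languages = ["French", "English", "Korean", "Chinese",
             "Japanese", "German", "Spanish"]          # assumed inventory
total = sum(counts.values())                           # 10 pages seen

for lang in languages:
    c = counts.get(lang, 0)
    mle = c / total                                # unseen languages get 0
    add_one = (c + 1) / (total + len(languages))   # never zero
    print(f"{lang:8s}  MLE={mle:.2f}  add-one={add_one:.3f}")
```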
Smoothing
• Why smoothing?
• How many parameters are there in the model (given 100,000 possible words)?
• What are the unsmoothed (ML) estimates for unigrams, bigrams, trigrams?
• Linear interpolation (mixture with λi).
• How to estimate λi?
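A sketch of linear interpolation with fixed weights (the λ values are placeholders; in practice they are estimated on held-out data, e.g. by EM):

```python
# Interpolated trigram: P(w|u,v) = l3*P_ML(w|u,v) + l2*P_ML(w|v) + l1*P_ML(w),
# with l1 + l2 + l3 = 1 so the mixture is still a probability distribution.
from collections import Counter

def make_interpolated_lm(sentences, lambdas=(0.1, 0.3, 0.6)):
    l1, l2, l3 = lambdas
    uni, bi, tri = Counter(), Counter(), Counter()
    n_tokens = 0
    for s in sentences:
        n_tokens += len(s)
        uni.update(s)
        bi.update(zip(s, s[1:]))
        tri.update(zip(s, s[1:], s[2:]))

    def prob(w, u, v):
        p1 = uni[w] / n_tokens
        p2 = bi[(v, w)] / uni[v] if uni[v] else 0.0
        p3 = tri[(u, v, w)] / bi[(u, v)] if bi[(u, v)] else 0.0
        return l1 * p1 + l2 * p2 + l3 * p3

    return prob

lm = make_interpolated_lm(["just then the white rabbit appeared".split()])
print(lm("white", "then", "the"))   # nonzero even on this tiny corpus
```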
Example
• Consider the problem of estimating bigram probabilities from a training corpus.
• Probability mass must be 1.
• How to account for unseen events?
Common methods
• Add-1 smoothing (add one to each numerator count and the vocabulary size N to the denominator)
• Good-Turing smoothing
• Best: Kneser-Ney
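A sketch of the Good-Turing idea: re-estimate a count r as r* = (r+1)·N(r+1)/N(r), where N(r) is the number of types seen exactly r times. This bare version omits the smoothing of the N(r) values themselves that practical implementations need, and Kneser-Ney is not attempted here.

```python
# Good-Turing re-estimated counts; the discount on low counts frees
# probability mass for unseen events.
from collections import Counter

def good_turing(counts):
    n_r = Counter(counts.values())            # N_r: types occurring r times
    out = {}
    for item, r in counts.items():
        if n_r[r + 1] > 0:
            out[item] = (r + 1) * n_r[r + 1] / n_r[r]
        else:
            out[item] = float(r)              # no N_{r+1}: keep the raw count
    return out

# Toy counts: 10 types seen once, 3 seen twice, 1 seen three times.
counts = {f"once{i}": 1 for i in range(10)}
counts.update({f"twice{i}": 2 for i in range(3)})
counts["thrice"] = 3

adjusted = good_turing(counts)
print(adjusted["once0"], adjusted["twice0"])  # 0.6 and 1.0: both discounted
```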
Markov Models
• Assumption: we can predict the probability of some future item on the basis of a short history
• Bigrams: first-order Markov models
• Bigram grammars: an N-by-N matrix of probabilities, where N is the size of the vocabulary that we are modeling
Relative Frequencies

|          | a | aardvark | aardwolf | aback | … | zoophyte | zucchini |
|----------|---|----------|----------|-------|---|----------|----------|
| a        | X | 0        | 0        | 0     | … | X        | X        |
| aardvark | 0 | 0        | 0        | 0     | … | 0        | 0        |
| aardwolf | 0 | 0        | 0        | 0     | … | 0        | 0        |
| aback    | X | X        | X        | 0     | … | X        | X        |
| …        | … | …        | …        | …     | … | …        | …        |
| zoophyte | 0 | 0        | 0        | X     | … | 0        | 0        |
| zucchini | 0 | 0        | 0        | X     | … | 0        | 0        |
Language Modeling and Statistical Machine Translation
The Noisy Channel Model
• Source-channel model of communication
• Parametric probabilistic models of language and translation
• Training such models
Statistics
• Given f, guess e
e (source, language E) → noisy channel (encoder) → f (observed, language F) → decoder → e’

e’ = argmax_e P(e|f) = argmax_e P(f|e) P(e)

where P(f|e) is the translation model and P(e) is the language model
Parametric probabilistic models
• Language model (LM)
• Deleted interpolation
• Translation model (TM)
P(e) = P(e1, e2, …, eL) = P(e1) P(e2|e1) … P(eL|e1 … eL-1)

P(eL|e1 … eL-1) ≈ P(eL|eL-2, eL-1)   (trigram approximation)
Alignment: P(f,a|e)
IBM’s EM-trained models
1. Word translation
2. Local alignment
3. Fertilities
4. Class-based alignment
5. Non-deficient algorithm (avoid overlaps, overflow)
Bayesian formulas
• argmax_e P(e|f) = ?
• P(e|f) = P(e) * P(f|e) / P(f)
• argmax_e P(e|f) = argmax_e P(e) * P(f|e)
The rest of the slides in this section are based on “A Statistical MT Tutorial Workbook” by Kevin Knight
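A toy sketch of the decision rule above: score each candidate e by P(e) * P(f|e) and take the argmax. The candidates and probabilities are invented for illustration; P(f) is the same for every candidate and can be dropped.

```python
# Noisy-channel decoding: e' = argmax_e P(e) * P(f|e).
candidates = {
    #  e                 ( P(e) language model, P(f|e) translation model )
    "the blue house":    (1e-6, 0.10),
    "house blue the":    (1e-9, 0.10),  # faithful but disfluent
    "the blue mouse":    (1e-7, 0.01),  # fluent but unfaithful
}

best = max(candidates, key=lambda e: candidates[e][0] * candidates[e][1])
print(best)   # "the blue house": the product rewards fluency AND fidelity
```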
N-gram model
• P(e) = ?
• P(how's it going?) = 76,413/1,000,000,000 = 0.000076413
• Bigrams: b(y|x) = count (xy)/count (x)
• P(“I like snakes that are not poisonous”) = P(“I”|start) * P(“like”|”I”) * …
• Trigrams: b(z|xy) = ??
Smoothing
• b(z | x y) = 0.95 * count(“xyz”) / count(“xy”) + 0.04 * count(“yz”) / count(“y”) + 0.008 * count(“z”) / total-words-seen + 0.002
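The same estimate as a function (a direct transcription of the formula, with guards for zero denominators; the example counts are invented, and the weights are the tutorial's illustrative values rather than tuned ones):

```python
# Knight-style smoothed trigram estimate b(z | x y); `count` maps
# n-gram strings to corpus counts, total_words is the corpus size.
def b(z, x, y, count, total_words):
    p = 0.002                                    # floor for anything at all
    p += 0.008 * count.get(z, 0) / total_words   # unigram evidence
    if count.get(y, 0) > 0:                      # bigram evidence
        p += 0.04 * count.get(y + " " + z, 0) / count[y]
    xy = x + " " + y
    if count.get(xy, 0) > 0:                     # trigram evidence
        p += 0.95 * count.get(xy + " " + z, 0) / count[xy]
    return p

counts = {"how's it going": 1, "how's it": 2, "it going": 1,
          "it": 50, "going": 20}
print(b("going", "how's", "it", counts, total_words=100_000))  # ~0.478
```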
Ordering words
(1) a a centre earthquake evacuation forced has historic Italian of of second southern strong the the village
(2) met Paul Wendy
Translation models
• Mary did not slap the green witch.
• Mary not slap slap slap the the green witch
• Mary no daba una bofetada a la verde bruja
• Mary no daba una bofetada a la bruja verde
Translation models
• Fertility
• Permutation
IBM model 3
• Fertility
• Spurious words (e0)
• Pick words
• Pick positions
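A sketch of that generative story with made-up probability tables (a real Model 3 learns the fertility, translation, and distortion tables and the spurious-word probability with EM, and models positions with a distortion table rather than the crude shuffle used here):

```python
# IBM Model 3 generative story, English -> Spanish (toy tables).
import random

n = {"slap": [0.0, 0.1, 0.2, 0.7]}          # fertility P(phi|e), phi = 0..3
t = {"slap": {"daba": 0.3, "una": 0.3, "bofetada": 0.4},
     "not": {"no": 1.0}}
p1 = 0.1                                     # spurious word from NULL (e0)

def generate(e_words):
    f_words = []
    for e in e_words:
        phis = n.get(e, [0.0, 1.0])          # default: fertility exactly 1
        phi = random.choices(range(len(phis)), weights=phis)[0]  # 1. fertility
        trans = t.get(e, {e: 1.0})           # default: copy the word through
        for _ in range(phi):                 # 3. pick a word for each copy
            f_words.append(random.choices(list(trans),
                                          weights=list(trans.values()))[0])
    while random.random() < p1:              # 2. spurious words
        f_words.append("a")
    random.shuffle(f_words)                  # 4. pick positions (crudely)
    return f_words

print(generate("Mary not slap".split()))
```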
Translation models
• Mary did not slap the green witch.
• Mary not slap slap slap the green witch
• Mary not slap slap slap NULL the green witch
• Mary no daba una bofetada a la verde bruja
• Mary no daba una bofetada a la bruja verde
Parameters
• n - fertility probabilities n(φ|e)
• t - word translation probabilities t(f|e)
• d - position (distortion) probabilities
• p1 - probability of inserting a spurious word
Example
NULL And the program has been implemented
Le programme a été mis en application
Alignments
[Figure: two possible word alignments between “The blue house” and “La maison bleue”]
• Needed: P(a|f,e)
Markov models
Motivation
• Sequence of random variables that aren’t independent
• Example: weather reports
Properties
• Limited horizon: P(Xt+1 = sk | X1,…,Xt) = P(Xt+1 = sk | Xt)
• Time invariant (stationary): P(Xt+1 = sk | Xt) = P(X2 = sk | X1)
• Definition: in terms of a transition matrix A and initial state probabilities π
Example
[Figure: state diagram of a Markov chain over the states h, a, p, e, t, i, with a start state and transition probabilities of 1.0, 0.6, 0.4, and 0.3 on the arcs (e.g., start → t with probability 1.0, t → i with probability 0.3, i → p with probability 0.6, as used on the next slide)]
Visible MM
P(X1,…XT) = P(X1) P(X2|X1) P(X3|X1,X2) … P(XT|X1,…,XT-1)
= P(X1) P(X2|X1) P(X3|X2) … P(XT|XT-1)
P(t, i, p) = P(X1=t) P(X2=i|X1=t) P(X3=p|X2=i)
= 1.0 x 0.3 x 0.6
= 0.18
P(X1,…,XT) = π(X1) · Π(t=1..T-1) a(Xt, Xt+1)
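A sketch reproducing the 0.18 above: an initial-state vector π and a transition matrix a are all a visible Markov model needs (only the arcs used in the example are recoverable from the figure; the remaining entries are assumptions made to complete the rows):

```python
# P(X1..XT) = pi(X1) * product over t of a(Xt, Xt+1).
pi = {"t": 1.0}
a = {"t": {"i": 0.3, "a": 0.3, "h": 0.4},   # assumed completion of the row
     "i": {"p": 0.6, "t": 0.4}}

def sequence_prob(states):
    p = pi.get(states[0], 0.0)
    for s, s_next in zip(states, states[1:]):
        p *= a.get(s, {}).get(s_next, 0.0)
    return p

print(sequence_prob(["t", "i", "p"]))   # 1.0 * 0.3 * 0.6 = 0.18
```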
Hidden MM
[Figure: a two-state hidden Markov model with states COLA and ICE TEA, starting in COLA, and transitions COLA → COLA 0.7, COLA → ICE TEA 0.3, ICE TEA → ICE TEA 0.5, ICE TEA → COLA 0.5]
Hidden MM
• P(Ot=k|Xt=si,Xt+1=sj) = bijk
|        | cola | icetea | lemonade |
|--------|------|--------|----------|
| COLA   | 0.6  | 0.1    | 0.3      |
| ICETEA | 0.1  | 0.7    | 0.2      |
Example
• P(lemonade,icetea|COLA) = ?
• P = 0.7 x 0.3 x 0.7 x 0.1 + 0.7 x 0.3 x 0.3 x 0.1 + 0.3 x 0.3 x 0.5 x 0.7 + 0.3 x 0.3 x 0.5 x 0.7 = 0.084
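A sketch that reproduces the sum by brute force, enumerating every hidden state sequence (transitions from the machine's diagram, emissions from the table above; the emission probability here depends on the state being left, which is what makes the four terms come out as written):

```python
# P(observations | start state) = sum over all hidden state paths.
from itertools import product

trans = {"COLA": {"COLA": 0.7, "ICETEA": 0.3},
         "ICETEA": {"COLA": 0.5, "ICETEA": 0.5}}
emit = {"COLA": {"cola": 0.6, "icetea": 0.1, "lemonade": 0.3},
        "ICETEA": {"cola": 0.1, "icetea": 0.7, "lemonade": 0.2}}

def observation_prob(obs, start):
    total = 0.0
    for path in product(trans, repeat=len(obs)):
        p, prev = 1.0, start
        for state, o in zip(path, obs):
            p *= trans[prev][state] * emit[prev][o]  # hop while emitting o
            prev = state
        total += p
    return total

print(observation_prob(["lemonade", "icetea"], "COLA"))   # 0.084
```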
Hidden MM
• Part of speech tagging, speech recognition, gene sequencing
• Three tasks, given A = state transition probabilities, B = symbol emission probabilities, π = initial state probabilities:
– Given μ = (A, B, π), find P(O|μ)
– Given O and μ, what is the best state sequence (X1,…,XT+1)?
– Given O and a space of all possible μ, find the model that best describes the observations
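For the first task, a sketch of the forward algorithm on the same machine: it gets the same 0.084 in O(T·N²) time instead of enumerating all N^T paths.

```python
# Forward algorithm: alpha[j] = P(o_1..o_t, state after t steps = j).
trans = {"COLA": {"COLA": 0.7, "ICETEA": 0.3},
         "ICETEA": {"COLA": 0.5, "ICETEA": 0.5}}
emit = {"COLA": {"cola": 0.6, "icetea": 0.1, "lemonade": 0.3},
        "ICETEA": {"cola": 0.1, "icetea": 0.7, "lemonade": 0.2}}

def forward(obs, start):
    alpha = {s: 1.0 if s == start else 0.0 for s in trans}
    for o in obs:
        alpha = {j: sum(alpha[i] * trans[i][j] * emit[i][o] for i in trans)
                 for j in trans}
    return sum(alpha.values())

print(forward(["lemonade", "icetea"], "COLA"))   # 0.084, without enumeration
```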
Collocations
Collocations
• Idioms
• Free word combinations
• Know a word by the company that it keeps (Firth)
• Common use
• No general syntactic or semantic rules
• Important for non-native speakers
Examples
• Idioms: to kick the bucket, dead end, to catch up
• Collocations: to trade actively, table of contents, orthogonal projection
• Free word combinations: to take the bus, the end of the road, to buy a house
Uses
• Disambiguation (e.g., “bank” with “loan” vs. “river”)
• Translation
• Generation
Properties
• Arbitrariness
• Language- and dialect-specific
• Common in technical language
• Recurrent in context
• (see Smadja 83)
Arbitrariness
• Make an effort vs. *make an exertion
• Running commentary vs. *running discussion
• Commit treason vs. *commit treachery
Cross-lingual properties
• Régler la circulation = direct traffic
• Russian, German, Serbo-Croatian: direct translation is used
• AE: set the table, make a decision
• BE: lay the table, take a decision
• “semer le désarroi” - “to sow disarray” - “to wreak havoc”
Types of collocations
• Grammatical: come to, put on; afraid that, fond of, by accident, witness to
• Semantic (only certain synonyms)
• Flexible: find/discover/notice by chance
Base/collocator pairs
| Base      | Collocator  | Example              |
|-----------|-------------|----------------------|
| Noun      | verb        | set the table        |
| Noun      | adjective   | warm greetings       |
| Verb      | adverb      | struggle desperately |
| Adjective | adverb      | sound asleep         |
| Verb      | preposition | put on               |
Extracting collocations

• Mutual information:

I(x;y) = log2 [ P(x,y) / (P(x) P(y)) ]

• What if I(x;y) = 0?
• What if I(x;y) < 0?
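A sketch of the formula on invented corpus counts:

```python
# Pointwise mutual information for a candidate collocation (x, y).
import math

N = 1_000_000                                   # corpus size (invented)
c_x, c_y, c_xy = 1_500, 1_200, 400              # "strong", "tea", "strong tea"

pmi = math.log2((c_xy / N) / ((c_x / N) * (c_y / N)))
print(f"{pmi:.2f} bits")   # ~7.8: far more co-occurrence than chance

# I(x;y) = 0 would mean x and y are independent; I(x;y) < 0 would mean
# they co-occur less often than independence predicts.
```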
Yule’s coefficient
A - frequency of lemma pairs involving both Li and Lj
B - frequency of pairs involving Li only
C - frequency of pairs involving Lj only
D - frequency of pairs involving neither

YUL = (AD - BC) / (AD + BC), with -1 ≤ YUL ≤ 1
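The coefficient as a function; A, B, C, D form a 2×2 contingency table for the lemma pair (the counts below are invented):

```python
# Yule's coefficient of association from 2x2 contingency counts.
def yule(a, b, c, d):
    """a: both lemmas; b: Li only; c: Lj only; d: neither."""
    return (a * d - b * c) / (a * d + b * c)

print(yule(a=30, b=10, c=5, d=955))    # ~ +1: the lemmas attract each other
print(yule(a=1, b=500, c=400, d=99))   # ~ -1: the lemmas avoid each other
```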
Specific mutual information
• Used in extracting bilingual collocations
I(e,f) = p(e,f) / ( p(e) p(f) )
• p(e,f) - probability of finding both e and f in aligned sentences
• p(e), p(f) - probabilities of finding the word in one of the languages
Example from the Hansard corpus (Brown, Lai, and Mercer)
| French word | Mutual information |
|-------------|--------------------|
| sein        | 5.63 |
| bureau      | 5.63 |
| trudeau     | 5.34 |
| premier     | 5.25 |
| résidence   | 5.12 |
| intention   | 4.57 |
| no          | 4.53 |
| session     | 4.34 |
Flexible and rigid collocations

• Example (from Smadja): “free” and “trade”

| Total | p-5 | p-4 | p-3 | p-2 | p-1  | p+1 | p+2 | p+3 | p+4 | p+5 |
|-------|-----|-----|-----|-----|------|-----|-----|-----|-----|-----|
| 8031  | 7   | 6   | 13  | 5   | 7918 | 0   | 12  | 20  | 26  | 24  |
Xtract (Smadja)
• The Dow Jones Industrial Average
• The NYSE’s composite index of all its listed common stocks fell *NUMBER* to *NUMBER*
Translating Collocations
• Brush up a lesson / repasser une leçon
• Bring about / осуществлять (Russian: “to carry out”)
• Hansards: late spring = fin du printemps; Atlantic Canada Opportunities Agency = Agence de promotion économique du Canada atlantique