Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav...
-
Upload
myles-tolman -
Category
Documents
-
view
213 -
download
0
Transcript of Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav...
![Page 1: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/1.jpg)
Using the Web as an Implicit Training Set:
Application to Structural Ambiguity Resolution
Using the Web as an Implicit Training Set:
Application to Structural Ambiguity Resolution
Preslav Nakov and Marti HearstComputer Science Division and SIMS
University of California, Berkeley
Supported by NSF DBI-0317510 and a gift from Genentech
![Page 2: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/2.jpg)
Motivation
Huge datasets trump sophisticated algorithms. “Scaling to Very Very Large Corpora for Natural
Language Disambiguation”, ACL 2001 (Banko & Brill, 2001)
Task: spelling correction Raw text as “training data” Log-linear improvement even to billion words
Getting more data is better than fine-tuning algorithms.
How to generalize to other problems?
![Page 3: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/3.jpg)
Web as a Baseline
“Web as a baseline” (Lapata & Keller 04;05): applied simple n-gram models to: machine translation candidate selection article generation noun compound interpretation noun compound bracketing adjective ordering spelling correction countability detection prepositional phrase attachment
All unsupervised Findings:
Sometimes rival best supervised approaches. => Web n-grams should be used as a baseline.
Significantly better than the best supervised algorithm.
Not significantly different from the best supervised algorithm.
![Page 4: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/4.jpg)
Our Contribution
Potential of these ideas is not yet fully realized We introduce new features
paraphrases surface features
Applied to structural ambiguity problems Data sparseness: need statistics for every possible
word and for word combinations Problems (unsupervised):
Noun compound bracketing PP attachment NP coordination
state-of-the-art results(Nakov&Hearst, 2005)
this work
![Page 5: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/5.jpg)
Task 1:
Prepositional Phrase
Attachment
![Page 6: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/6.jpg)
PP attachment
(a) Peter spent millions of dollars. (noun)
(b) Peter spent time with his family. (verb)
quadruple: (v, n1, p, n2)
(a) (spent, millions, of, dollars)
(b) (spent, time, with, family)
Human performance: quadruple: 88% whole sentence: 93%
PP combines with the NPto form another NP
PP is an indirect object of the verb
![Page 7: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/7.jpg)
Related Work
Supervised (Brill & Resnik, 94):
transformation-based learning, WordNet classes, P=82%
(Ratnaparkhi & al., 94): ME, word classes (MI),
P=81.6% (Collins & Brooks, 95): back-off, P=84.5% (Stetina & Makoto, 97):
decision trees, WordNet, P=88.1%
(Toutanova & al., 04): morphology, syntax, WordNet, P=87.5%
(Olteanu & Moldovan, 05): in context, parser, FrameNet,
Web, SVM, P=92.85%
Unsupervised (Hindle & Rooth, 93): partially
parsed corpus, lexical associations over subsets of (v,n1,p), P=80%,R=80%
(Ratnaparkhi, 98): POS tagged corpus, unambiguous cases for (v,n1,p), (n1,p,n2), classifier: P=81.9%
(Pantel & Lin,00): collocation database, dependency parser, large corpus (125M words), P=84.3%
Unsup. state-of-the-art
Ra
tna
par
khi
dat
ase
t
![Page 8: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/8.jpg)
Related Work: Web Unsup.
(Volk, 00): Altavista, NEAR operator, German, compare Pr(p|n1) vs. Pr(p|v), P=75%, R=58%
(Volk, 01): Altavista, NEAR operator, German, inflected queries, Pr(p,n2|n1) vs. Pr(p,n2|v), P=75%, R=85%
(Calvo & Gelbukh, 03): exact phrases, Spanish, P=91.97%, R=89.5%
(Lapata & Keller,05): Web n-grams, English, Ratnaparkhi dataset, P in low 70’s
(Olteanu & Moldovan, 05): supervised, English, in context, parser, FrameNet, Web counts, SVM, P=92.85%
![Page 9: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/9.jpg)
PP-attachment: Our Approach
Unsupervised (v,n1,p,n2) quadruples, Ratnaparkhi test set Google and MSN Search Exact phrase queries Inflections: WordNet 2.0 Adding determiners where appropriate Models:
n-gram association models Web-derived surface features paraphrases
![Page 10: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/10.jpg)
Probabilities: Estimation
Using page hits as a proxy for n-gram counts
Pr(w1|w2) = #(w1,w2) / #(w2) #(w2) word frequency; query for “w2”
#(w1,w2) bigram frequency; query for “w1 w2”
Pr(w1,w2|w3) = #(w1,w2,w3) / #(w3)
![Page 11: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/11.jpg)
N-gram models
(i) Pr(p|n1) vs. Pr(p|v) (ii) Pr(p,n2|n1) vs. Pr(p,n2|v)
I eat/v spaghetti/n1 with/p a fork/n2. I eat/v spaghetti/n1 with/p sauce/n2.
Pr or # (frequency) smoothing as in (Hindle & Rooth, 93) back-off from (ii) to (i)
N-grams unreliable, if n1 or n2 is a pronoun. MSN Search: no rounding of n-gram estimates
![Page 12: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/12.jpg)
Web-derived Surface Features
Example features open the door / with a key verb (100.00%, 0.13%) open the door (with a key) verb (73.58%, 2.44%) open the door – with a key verb (68.18%, 2.03%) open the door , with a key verb (58.44%, 7.09%)
eat Spaghetti with sauce noun (100.00%, 0.14%) eat ? spaghetti with sauce noun (83.33%, 0.55%) eat , spaghetti with sauce noun (65.77%, 5.11%) eat : spaghetti with sauce noun (64.71%, 1.57%)
Summing achieves high precision, low recall.
P R
su
ms
um
compare
![Page 13: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/13.jpg)
Paraphrases
v n1 p n2
v n2 n1 (noun) v p n2 n1 (verb) p n2 * v n1 (verb) n1 p n2 v (noun) v PRONOUN p n2 (verb) BE n1 p n2 (noun)
![Page 14: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/14.jpg)
Paraphrases: pattern (1)
(1) v n1 p n2 v n2 n1 (noun)
Can we turn “n1 p n2” into a noun compound “n2 n1”? meet/v demands/n1 from/p customers/n2 meet/v the customer/n2 demands/n1
Problem: ditransitive verbs like give gave/v an apple/n1 to/p him/n2 gave/v him/n2 an apple/n1
Solution: no determiner before n1 determiner before n2 is required the preposition cannot be to
![Page 15: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/15.jpg)
Paraphrases: pattern (2)
(2) v n1 p n2 v p n2 n1(verb)
If “p n2” is an indirect object of v, then it could be switched with the direct object n1. had/v a program/n1 in/p place/n2 had/v in/p place/n2 a program/n1
Determiner before n1 is required to prevent
“n2 n1” from forming a noun compound.
![Page 16: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/16.jpg)
Paraphrases: pattern (3)
(3) v n1 p n2 p n2 * v n1 (verb)
“*” indicates a wildcard position (up to three intervening words are allowed)
Looks for appositions, where the PP has moved in front of the verb, e.g. I gave/v an apple/n1 to/p him/n2 to/p him/n2 I gave/v an apple/n1
![Page 17: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/17.jpg)
Paraphrases: pattern (4)
(4) v n1 p n2 n1 p n2 v(noun)
Looks for appositions, where “n1 p n2” has moved in front of v shaken/v confidence/n1 in/p markets/n2 confidence/n1 in/p markets/n2 shaken/v
![Page 18: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/18.jpg)
Paraphrases: pattern (5)
(5) v n1 p n2 v PRONOUN p n2(verb)
n1 is a pronoun verb (Hindle&Rooth, 93)
Pattern (5) substitutes n1 with a dative pronoun (him or her), e.g. put/v a client/n1 at/p odds/n2 put/v him at/p odds/n2
pronoun
![Page 19: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/19.jpg)
Paraphrases: pattern (6)
(6) v n1 p n2 BE n1 p n2 (noun)
BE is typically used with a noun attachment
Pattern (6) substitutes v with a form of to be (is or are), e.g. eat/v spaghetti/n1 with/p sauce/n2 is spaghetti/n1 with/p sauce/n2
to be
![Page 20: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/20.jpg)
Evaluation
Ratnaparkhi dataset 3097 test examples, e.g.
prepare dinner for family Vshipped crabs from province V
n1 or n2 is a bare determiner: 149 examples problem for unsupervised methodsleft chairmanship of the Nis the of kind Nacquire securities for an N
special symbols: %, /, & etc.: 230 examples problem for Web queriesbuy % for 10 Vbeat S&P-down from % Vis 43%-owned by firm N
![Page 21: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/21.jpg)
Results
Simpler but not significantlydifferent from 84.3%(Pantel&Lin,00).
For prepositions other then OF.(of noun attachment)Smoothing is not
needed on the Web
Models in bold are combined in a majority vote.
Checking directly for...
![Page 22: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/22.jpg)
Task 2:
Coordination
![Page 23: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/23.jpg)
Coordination & Problems (Modified) real sentence:
The Department of Chronic Diseases and Health Promotion leads and strengthens global efforts to prevent and control chronic diseases or disabilities and to promote health and quality of life.
Problems: boundaries: words, constituents, clauses etc. interactions with PPs: [health and [quality of life]]
vs. [[health and quality] of life]
or meaning and: chronic diseases or disabilities ellipsis
![Page 24: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/24.jpg)
NC coordination: ellipsis Ellipsis
car and truck production means car production and truck production
No ellipsis president and chief executive
All-way coordination Securities and Exchange Commission
![Page 25: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/25.jpg)
NC Coordination: ellipsis
Quadruple (n1,c,n2,h) Penn Treebank annotations
ellipsis:
(NP car/NN and/CC truck/NN production/NN). no ellipsis:
(NP (NP president/NN) and/CC (NP chief/NN executive/NN))
all-way: can be annotated either way This is a problem a parser must deal with.
Collins’ parser always predicts ellipsis,but other parsers (e.g. Charniak’s) try to solve it.
![Page 26: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/26.jpg)
Related Work
(Resnik, 99): similarity of form and meaning, conceptual association, decision tree, P=80%, R=100%
(Rus & al., 02): deterministic, rule-based bracketing in context, P=87.42%, R=71.05%
(Chantree & al., 05): distributional similarities from BNC, Sketch Engine (freqs., object/modifier etc.), P=80.3%, R=53.8%
(Goldberg, 99): different problem (n1,p,n2,c,n3), adapts Ratnaparkhi (99) algorithm, P=72%, R=100%
![Page 27: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/27.jpg)
N-gram models
(n1,c,n2,h)
(i) #(n1,h) vs. #(n2,h) (ii) #(n1,h) vs. #(n1,c,n2)
![Page 28: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/28.jpg)
Surface Featuress
um
su
m
compare
![Page 29: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/29.jpg)
Paraphrases
n1 c n2 h
n2 c n1 h (ellipsis) n2 h c n1 (NO ellipsis) n1 h c n2 h (ellipsis) n2 h c n1 h (ellipsis)
![Page 30: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/30.jpg)
Paraphrases: Pattern (1)
(1) n1 c n2 h n2 c n1 h (ellipsis)
Switch places of n1 and n2 bar/n1 and/c pie/n2 graph/h pie/n2 and/c bar/n1 graph/h
![Page 31: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/31.jpg)
Paraphrases: Pattern (2)
(2) n1 c n2 h n2 h c n1 (NO ellipsis)
Switch places of n1 and n2 h president/n1 and/c chief/n2 executive/h chief/n2 executive/h and/c president/n1
![Page 32: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/32.jpg)
Paraphrases: Pattern (3)
(3) n1 c n2 h n1 h c n2 h (ellipsis)
Insert the elided head h bar/n1 and/c pie/n2 graph/h bar/n1 graph/h and/c pie/n2 graph/h
h
![Page 33: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/33.jpg)
Paraphrases: Pattern (4)
(4) n1 c n2 h n2 h c n1 h (ellipsis)
Insert the elided head h, but also switch n1 and n2 bar/n1 and/c pie/n2 graph/h pie/n2 graph/h and/c bar/n1 graph/h
h
![Page 34: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/34.jpg)
(Rus & al.,02) Heuristics
Heuristic 1: no ellipsis n1=n2 milk/n1 and/c milk/n2 products/h
Heuristic 4: no ellipsis n1 and n2 are modified by an adjective
Heuristic 5: ellipsis only n1 is modified by an adjective
Heuristic 6: no ellipsis only n2 is modified by an adjective
We use a determiner.
![Page 35: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/35.jpg)
Number Agreement
Introduced by Resnik (93)(a) n1&n2 agree, but n1&h do not ellipsis;
(b) n1&n2 don’t agree, but n1&h do no ellipsis;
(c) otherwise leave undecided.
![Page 36: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/36.jpg)
Results428 examples from Penn TB
Models in bold are combined in a majority vote.
Comparable to other researchers (but no standard dataset).
Bad, compares bigram to trigram.
![Page 37: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/37.jpg)
Conclusions & Future Work
Tapping the potential of very large corpora for unsupervised algorithms Go beyond n-grams
Surface features Paraphrases
Results competitive with best unsupervised Results can rival supervised algorithms’
Future Work other NLP tasks better evidence combination
There should be even more exciting features on the Web!
![Page 38: Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution Preslav Nakov and Marti Hearst Computer Science Division and.](https://reader038.fdocuments.in/reader038/viewer/2022110205/56649c785503460f9492e325/html5/thumbnails/38.jpg)
The End
Thank you!