Discriminative Lexical Semantic Segmentation with Gaps:
Running the MWE Gamut
Nathan Schneider • August 27, 2013
Opiliones
daddy longlegs
harvestman
Kevin Knight
Weberknechte
Schuster
Kanker
Opa Langbein
Zimmermann
Schneider
The aliens will destroy Earth unless we
accept / agree to / accede to / yield to / give in to / comply with / cooperate with / go along with
their demands.
Jonathan Huang
Kevin_Knight
daddy_longlegs
give_in_to
Kevin_Knight refused to give_in_to the vicious daddy_longlegs .
Alan_Black refused to give_in_to the vicious daddy_longlegs .
Lexical segmentation
Kevin_Knight
refused
to
give_in_to
the
vicious
daddy_longlegs
.
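A minimal sketch of reading this notation, assuming (as on the slide) that words joined by an underscore form one lexical segment and every other token is a segment by itself. The function name `lexical_segments` is hypothetical.

```python
def lexical_segments(annotated: str) -> list[list[str]]:
    """Split an underscore-joined sentence into lexical segments (lists of words)."""
    return [token.split("_") for token in annotated.split()]

sentence = "Kevin_Knight refused to give_in_to the vicious daddy_longlegs ."
segments = lexical_segments(sentence)
for segment in segments:
    print(" ".join(segment))
```

This only covers contiguous expressions; the gappy notation introduced later needs a richer encoding.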
Roadmap
• MWEs in NLP
‣ What are they?
‣ Why are they important?
‣ Why are they challenging?
‣ How are they handled?
• Corpus annotation
• Sequence tagging formulation & experiments
Definition
• Multiword expression (MWE): 2 or more orthographic words/lexemes that function together as an idiomatic whole
• idiomatic = not fully predictable in form, function, and/or frequency
‣ unusual morphosyntax: Me/*Him neither; by and large; plural of daddy longlegs?
‣ non- or semi-compositional: ice cream, daddy longlegs, pay attention
‣ statistically collocated: p(highly unlikely) > p(strongly unlikely)
SPECIALLY LEARNED
Applications
• semantic analysis: minimal meaning-bearing units (e.g., predicates)
‣ named entity recognition, supersense tagging already target some kinds of MWEs
‣ sentiment analysis: MW opinion expressions & opinion targets
• IR: keyphrase extraction, query segmentation
• MT: translating an MWE word by word is often incorrect or more ambiguous
• language acquisition: many MWEs are difficult for learners
Challenges
• Not superficially apparent in text
• Number/frequency
‣ Too many expressions to list all of them
‣ Individually rare, but frequent in aggregate
• Diversity
‣ Many different construction types
‣ Semantically unrestricted
‣ Can be gappy
Kevin Knight
daddy longlegs, hot dog
dry out
depend on
pay attention (to)
put up with, give in (to)
under the weather
cut and dry
in spite of
pick up where __ left off
easy as pie
You’re welcome.
To each his own.
dry the clothes out
pay close attention (to)
pick up where they left off
no attention was paid (to)
Current state of affairs
Resource-building
‣ lexicons (e.g., WordNet, WikiMwe), grammars
‣ corpora: treebanks (French Treebank, Prague Czech-English Dependency Treebank)
Explicit
‣ Corpus → List: collocation extraction by word association measures
‣ List → Corpus: matching, classification
‣ Corpus (+ List): sequence modeling, parsing
Implicit
‣ language modeling, phrase-based MT
Contributions
• Our goal: general-purpose, shallow, automatic identification of MWEs in context
• Existing resources are not satisfactory.
‣ New corpus—first freely annotated for MWEs, without a preexisting lexicon.
• Existing discriminative sequence modeling techniques do not handle gaps.
‣ New gappy tagging scheme + model trained and evaluated on our annotated corpus.
Roadmap
✓ MWEs in NLP
‣ What are they?
‣ Why are they important?
‣ Why are they challenging?
‣ How are they handled?
• Corpus annotation
• Sequence tagging formulation & experiments
Examples
My wife had taken_ her '07_Ford_Fusion _in for a routine oil_change .
The corpus
• The entire Reviews subsection of the English Web Treebank (Bies et al. 2012), fully annotated for MWEs
‣ 723 reviews
‣ 3,800 sentences
‣ 55,000 words
• Every sentence: negotiated consensus between at least 2 annotators
‣ IAA between pairs: ~77%
Examples
Among the animals that were available to touch were pony's , camels and EVEN AN OSTRICH !!!
No MWEs here. (This sentence is in the minority: 57% of all sentences, and 72% of sentences longer than 10 words, contain an MWE.)
Examples
They gave me the run around and missing paperwork only to call back to tell me someone
else wanted her and I would need to come in and put down a deposit .
Examples
It put_hair_on_ my _chest and thanks_to the owner s advice I invested vanguard , got myself a
woman like Jerry , and became a republican .
Examples
They gave_ me _the_run_around and missing paperwork only to call_back to tell me someone
else wanted her and I would need to come_in and put_down a deposit .
Simplified a bit for presentational purposes (we also made a strong/weak distinction)
Examples
I highly~recommend Debi , she does~ an amazing ~job , I " love " the way she cuts_ my _hair ,
extremely thorough and cross_checks her work to make_sure my hair is perfect .
Weak expressions: highly~recommend, does~job
Examples
I recently threw~ a surprise ~birthday_party for my wife at Fraiser_'s .
Weak expressions can contain strong MWEs.
Overlap: Ideally we’d have threw~party, birthday_party, surprise_party
Annotation guidelines
https://github.com/nschneid/nanni/wiki/MWE-Annotation-Guidelines
Roadmap
✓ MWEs in NLP
‣ What are they?
‣ Why are they important?
‣ Why are they challenging?
‣ How are they handled?
✓ Corpus annotation
• Sequence tagging formulation & experiments
Gappy sequence tagging
• Simplest tagger (our baseline):
1. obtain MWE candidates from lexicons
2. predict the segmentation with fewest total expressions
• We extract lexicons from 10 existing sources of MWEs
‣ WordNet, SemCor, Prague Czech-English Treebank, SAID, WikiMwe, Wiktionary, and other lists
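The baseline above can be sketched as a shortest-path dynamic program. This is an illustration under stated assumptions, not the talk's implementation: it matches contiguous lexicon entries only (no gaps), the toy lexicon and the names `fewest_segments` and `max_len` are hypothetical, and ties are broken in favor of the longest final match.

```python
def fewest_segments(words, lexicon, max_len=5):
    """Segment `words` into the fewest lexical expressions.

    A single word is always a valid segment; a multiword span is valid
    only if its lowercased word tuple is in `lexicon`.
    """
    n = len(words)
    best = [0] + [None] * n  # best[i]: minimum number of segments covering words[:i]
    back = [0] * (n + 1)     # back[i]: start index of the last segment ending at i
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            span = tuple(w.lower() for w in words[start:end])
            if end - start == 1 or span in lexicon:
                candidate = best[start] + 1
                if best[end] is None or candidate < best[end]:
                    best[end], back[end] = candidate, start
    segments, end = [], n
    while end > 0:
        segments.append(words[back[end]:end])
        end = back[end]
    return segments[::-1]

# Toy lexicon (hypothetical entries, standing in for the 10 real sources).
toy_lexicon = {("give", "in"), ("give", "in", "to"),
               ("daddy", "longlegs"), ("kevin", "knight")}
words = "Kevin Knight refused to give in to the vicious daddy longlegs .".split()
segmentation = fewest_segments(words, toy_lexicon)
print(["_".join(segment) for segment in segmentation])
```

Preferring fewer segments means the longer match `give_in_to` wins over `give_in` followed by `to`.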
Gappy sequence tagging
• Contiguous MWE identification resembles chunking, so we can use the familiar BIO scheme (Ramshaw & Marcus 1995):
a/O routine/O oil/B change/I ./O
• We add 3 new tags for gaps (o, b, i):
My/O wife/O had/O taken/B her/o '07/b Ford/i Fusion/i in/I
‣ Assumption: no more than 1 level of nesting
• Evaluation: MWE precision/recall
‣ MUC criterion: partial credit for partial overlap
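A sketch of how the six-tag encoding could be produced from annotated groups, assuming each MWE is given as a list of token indices and that gaps nest at most one level deep (the slide's assumption). The function name `gappy_tags` and the index-list input format are hypothetical; the sentence is the earlier example, including the word "for".

```python
def gappy_tags(n_tokens, groups):
    """Encode MWE groups (lists of token indices) as B/I/O plus gap tags b/i/o."""
    tags = ["O"] * n_tokens
    for group in groups:
        indices = sorted(group)
        tags[indices[0]] = "B"
        for j in indices[1:]:
            tags[j] = "I"
    # Lowercase every token that sits strictly inside another expression's gap.
    for group in groups:
        indices = sorted(group)
        members = set(indices)
        for j in range(indices[0] + 1, indices[-1]):
            if j not in members:
                tags[j] = tags[j].lower()
    return tags

tokens = "My wife had taken her '07 Ford Fusion in for a routine oil change .".split()
groups = [[3, 8], [5, 6, 7], [12, 13]]  # taken_in, '07_Ford_Fusion, oil_change
tags = gappy_tags(len(tokens), groups)
print(" ".join(f"{w}/{t}" for w, t in zip(tokens, tags)))
```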
Pathological examples
On August 3 , two massive headlands reared out_of the mists -- great gateways never~before~ , so_far_as~ Hudson ~knew , ~seen by Europeans .
Pathological examples
All you have to do to make it authentic Jamaican food , is add a_~whole~_lot of pepper .
Gappy sequence tagging
• Standard supervised learning with the enriched tagging scheme
• We use the structured perceptron (Collins 2002)
‣ Discriminative
‣ 1st-order Markov assumption
‣ Averaging
‣ Fast to train
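To make the learner concrete, here is a compact sketch of an averaged structured perceptron with first-order Viterbi decoding. It is not the talk's implementation: the two feature templates in `local_features`, the toy training data, and the unit learning rate are assumptions, and the decoder does not enforce tag-sequence validity (for example, it may place I after O).

```python
from collections import defaultdict

TAGS = ["O", "B", "I", "o", "b", "i"]
START = "<s>"

def local_features(words, i, prev_tag, tag):
    # Hypothetical minimal features: word identity and tag bigram.
    return [f"w={words[i].lower()}|t={tag}", f"p={prev_tag}|t={tag}"]

def viterbi(words, weights):
    """Highest-scoring tag sequence under a first-order Markov model."""
    n = len(words)
    score = [dict() for _ in range(n)]
    back = [dict() for _ in range(n)]
    for i in range(n):
        prev_tags = [START] if i == 0 else TAGS
        for tag in TAGS:
            best_prev, best = None, float("-inf")
            for prev in prev_tags:
                s = 0.0 if i == 0 else score[i - 1][prev]
                s += sum(weights.get(f, 0.0)
                         for f in local_features(words, i, prev, tag))
                if s > best:
                    best_prev, best = prev, s
            score[i][tag], back[i][tag] = best, best_prev
    tag = max(TAGS, key=lambda t: score[n - 1][t])
    path = [tag]
    for i in range(n - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return path[::-1]

def train(data, epochs=5):
    """Averaged structured perceptron: returns the mean of all weight vectors."""
    weights, totals, steps = defaultdict(float), defaultdict(float), 0
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, weights)
            if pred != gold:
                for i in range(len(words)):
                    gold_prev = gold[i - 1] if i else START
                    pred_prev = pred[i - 1] if i else START
                    for f in local_features(words, i, gold_prev, gold[i]):
                        weights[f] += 1.0  # reward features of the gold sequence
                    for f in local_features(words, i, pred_prev, pred[i]):
                        weights[f] -= 1.0  # penalize features of the wrong guess
            steps += 1
            for f, v in weights.items():
                totals[f] += v
    return {f: v / steps for f, v in totals.items()}

toy_data = [
    ("they pay attention".split(), ["O", "B", "I"]),
    ("we left now".split(), ["O", "O", "O"]),
]
model = train(toy_data, epochs=3)
print(viterbi("they pay attention".split(), model))
```

The number of epochs plays the role of the early-stopping regularizer; the averaging step is what distinguishes this from the plain perceptron.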
Gappy sequence tagging
• Basic features adapted from Constant et al. (2012):
‣ word: current & context, unigrams & bigrams
‣ gold POS: current & context, unigrams & bigrams
‣ capitalization; word shape
‣ prefixes, suffixes up to 4 characters
‣ has digit; non-alphanumeric characters
‣ lemma + context lemma if one is a V and the other is ∈ {N, V, Adj., Adv., Prep., Part.}
• Lexicon features: WordNet & other lexicons
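A rough sketch of several of the basic feature templates listed above, for one token position. The feature-name strings, the exact shape definition, and the context window are assumptions; the lemma-pair and lexicon features are left out.

```python
import re

def word_shape(word):
    """Collapse character classes, e.g. 'Fusion' -> 'Aa'."""
    shape = re.sub(r"[A-Z]", "A", word)
    shape = re.sub(r"[a-z]", "a", shape)
    shape = re.sub(r"[0-9]", "0", shape)
    return re.sub(r"(.)\1+", r"\1", shape)  # squeeze runs of the same class

def basic_features(words, pos, i):
    """Feature strings for token i, given words and their (gold) POS tags."""
    w = words[i]
    feats = {f"word={w.lower()}", f"pos={pos[i]}", f"shape={word_shape(w)}"}
    if i > 0:
        feats.add(f"word-1={words[i - 1].lower()}")
        feats.add(f"bigram-1={words[i - 1].lower()}_{w.lower()}")
        feats.add(f"pos-1,pos={pos[i - 1]}_{pos[i]}")
    if i + 1 < len(words):
        feats.add(f"word+1={words[i + 1].lower()}")
        feats.add(f"pos,pos+1={pos[i]}_{pos[i + 1]}")
    if w[:1].isupper():
        feats.add("capitalized")
    if any(c.isdigit() for c in w):
        feats.add("has_digit")
    if any(not c.isalnum() for c in w):
        feats.add("has_nonalnum")
    for k in range(1, min(4, len(w)) + 1):  # prefixes, suffixes up to 4 characters
        feats.add(f"prefix{k}={w[:k].lower()}")
        feats.add(f"suffix{k}={w[-k:].lower()}")
    return feats

words = ["her", "'07", "Ford", "Fusion"]
pos = ["PRP$", "CD", "NNP", "NNP"]
print(sorted(basic_features(words, pos, 1)))
```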
Gappy sequence tagging
• Experimental setup
‣ Regularization by early stopping
‣ 8-fold cross-validation; results are 8-way averages
‣ Jon Clark’s ducttape
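[Figure: Statistical vs. Matching, and # of lexicons used. Precision (y-axis, 0.2 to 0.8) plotted against recall (x-axis, 0.25 to 0.55), with a P = R reference line. Two series, "lexicons" (matching only) and "model + lexicons", with points labeled by number of lexicons (0, 2, 10). Matching with 10 lexicons: F = 34%; model + 10 lexicons: F = 62%.]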
(all use 10 lexicons)             P      R      F
Baseline: lexicon matching        0.279  0.446  0.342
Sequence model                    0.790  0.511  0.618
Word clusters
• Brown clusters (Brown et al. 1992)
‣ latent word categories explaining observed sequences
‣ hard assignment: each word goes in 1 cluster
‣ agglomerative, greedy, scalable algorithm
• 1000 clusters induced from reviews in the Yelp Academic Dataset (20.7M words)
‣ words occurring ≥25 times
‣ Percy Liang’s implementation
spelling variation, synonymy
syntactic & pragmatic similarity
semantic category
idiosyncratic lexical context
(all use 10 lexicons)             P      R      F
Baseline: lexicon matching        0.279  0.446  0.342
Sequence model                    0.790  0.511  0.618
+ Brown clusters                  0.790  0.515  0.624
Word association scores
• large literature on statistical measures of collocation
‣ information theoretic: mutual information, …
‣ frequentist: t-statistic, χ², …
• scores → rankings → rank threshold features
1. POS tag the Yelp Academic Dataset with the Twitter tagger (Owoputi et al. 2013)
2. Define 2-word patterns of interest: Adj. N, N N, Prep. N, V N, V Prep., V Particle
3. Use mwetoolkit (Ramisch et al. 2010) to identify, score (t), and rank each group of candidates
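The scores → rankings → rank threshold features pipeline can be sketched as follows. This does not call mwetoolkit; it assumes the standard collocation t-score, (observed − expected) / √observed, with the expected count taken under independence. The counts, the thresholds, and the feature names are made up for illustration, and in the setup above each POS pattern would get its own ranking.

```python
import math

def t_scores(bigram_counts, unigram_counts, n_tokens):
    """t-score per bigram: (observed - expected) / sqrt(observed)."""
    scores = {}
    for (w1, w2), observed in bigram_counts.items():
        expected = unigram_counts[w1] * unigram_counts[w2] / n_tokens
        scores[(w1, w2)] = (observed - expected) / math.sqrt(observed)
    return scores

def rank_candidates(scores):
    """Map each candidate to its 0-based rank, best score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {pair: rank for rank, pair in enumerate(ordered)}

def rank_threshold_features(pair, ranking, thresholds=(1, 10)):
    """One binary feature per threshold k that the candidate's rank falls under."""
    if pair not in ranking:
        return []
    return [f"assoc_top{k}" for k in thresholds if ranking[pair] < k]

# Made-up counts for two Adv.-Adj. candidates in a 1000-token corpus.
unigrams = {"highly": 10, "strongly": 10, "unlikely": 20}
bigrams = {("highly", "unlikely"): 8, ("strongly", "unlikely"): 1}
scores = t_scores(bigrams, unigrams, n_tokens=1000)
ranking = rank_candidates(scores)
print(rank_threshold_features(("highly", "unlikely"), ranking))
print(rank_threshold_features(("strongly", "unlikely"), ranking))
```

Thresholding ranks instead of using raw scores gives the tagger a handful of binary features rather than one real-valued feature on an arbitrary scale.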
(all use 10 lexicons)             P      R      F
Baseline: lexicon matching        0.279  0.446  0.342
Sequence model                    0.790  0.511  0.618
+ Brown clusters                  0.790  0.515  0.624
+ mwetoolkit word associations    0.793  0.511  0.621
Recall-oriented learning
• Our supervised learner is actually optimizing for tag accuracy, not expression precision/recall
‣ This tends to hurt recall, because (short of strong evidence) the safest tag is O
• A recall-oriented cost function can compensate by biasing in favor of recall (Mohit et al. 2012), improving the F score
‣ Tunable hyperparameter controls the strength of this preference
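One way such a cost could look, as a sketch only: charge a penalty `rho` for every token that is part of a gold MWE but tagged as outside, and add that cost to candidate scores while decoding during training (cost-augmented decoding). The exact cost used in the talk or in Mohit et al. (2012) may differ; treating both O and o as "outside" tags, and the names `rho` and `cost_augmented_score`, are assumptions.

```python
OUTSIDE = {"O", "o"}

def recall_oriented_cost(gold, pred, rho=1.0):
    """Charge `rho` for each gold MWE token that the prediction tags as outside."""
    return sum(rho for g, p in zip(gold, pred)
               if g not in OUTSIDE and p in OUTSIDE)

def cost_augmented_score(model_score, gold_tag, candidate_tag, rho=1.0):
    """Training-time local score: inflate candidates that would miss a gold MWE token.

    Because the cost decomposes over tokens, it can be added inside Viterbi;
    the perceptron then has to beat such recall errors by a margin of `rho`.
    """
    missed = gold_tag not in OUTSIDE and candidate_tag in OUTSIDE
    return model_score + (rho if missed else 0.0)

gold = ["O", "B", "I", "O"]
print(recall_oriented_cost(gold, ["O", "O", "O", "O"], rho=0.5))  # two missed tokens
print(recall_oriented_cost(gold, ["B", "B", "I", "O"], rho=0.5))  # a false positive costs nothing
```

Setting `rho` to 0 recovers the ordinary perceptron; larger values push the learner away from the "safe" O tag.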
(all use 10 lexicons)             P      R      F
Baseline: lexicon matching        0.279  0.446  0.342
Sequence model                    0.790  0.511  0.618
+ Brown clusters                  0.790  0.515  0.624
+ mwetoolkit word associations    0.793  0.511  0.621
+ recall-oriented learning        0.700  0.596  0.645
Error analysis
• Cross-gap recall: 155/466 = 33%
✓ unseen TPs: above all, allen tire, amusement parks, antipasto misto, aortic stenosis, associate with, at peace, behind the scene, brand new, carnegie mellon, check - in, cleaning lady, come up, cowboy boot, cup of joe
✗ unseen FPs: a little girl, bad for, cigarette smoke, funeral director, get coupon, kitchen sink
✗ unseen FNs: an arm and a leg, bad for business, child predator, dfw metro area
Conclusions
• Multiword expressions are important and challenging
• We can shallowly mark them in free text
‣ new corpus resource!
• MWE identification can be modeled as sequence tagging
‣ even with gaps!
‣ statistical learning ≫ lexicon-based segmentation
‣ but lexical resources are still useful (features!)
Many_thanks (*Several thanks)
Thanks_a_million (*Thanks a thousand)
Thanks_a_lot (?Lots of thanks)
Emily Mike
Spencer
Chris Noah
Henrietta
Nora