Fine-Grained Soft Semantic Constraints
Yuval Marton, University of Maryland
http://umiacs.umd.edu/~ymarton/pub/umanch/Hybrid Knowledge-CorpusBasedSem-Manchester_090614.ppt
Yuval Marton, U Manchester talk 2
Why Care?
Tell’em apart:
These, too:
FOX
• FOX =
• FOX = FOrkhead/winged-heliX replicator gene
Road map
• Brief overview of doctoral work
• Hybrid knowledge / corpus-based semantic similarity methods
– Pure and hybrid methods
– Hard and soft constraints
– Fine-grained
– Named-entities
Dissertation Theme
• Hybrid Knowledge/Corpus-Based Statistical NLP Models Using Fine-Grained Soft Syntactic and Semantic Constraints
– Soft Constraints
– Fine-Grained
– Syntactic (parsing)
– Semantic (“concepts”, paraphrases)
• Evaluated in
– Word-pair similarity ranking and
– Statistical Machine Translation (SMT)
Soft Constraints
• Hard constraints
– {0, 1}; in/out
– Decrease search space
– “structural zeroes”
– Theory-driven
– Faster, slimmer
• Soft constraints
– [0, 1]; fuzzy
– Only bias the model
– Data-driven: Let patterns emerge
[Figure: two diagrams of the universe (“Univ.”), one labeled “Hard” and one labeled “Soft”]
Fine-grained
• Granularity is a big deal
– Soft syntactic constraints in SMT
• Chiang 2005 vs. Marton and Resnik 2008
• Neg results → pos results
– Soft semantic constraints in word-pair similarity ranking
• Mohammad and Hirst 2006 vs. Marton, Mohammad and Resnik 2009
• Pos results → better results
Soft Syntactic Constraints
• X → X1 speech ||| X1 espiche
– What should be the span of X1?
• Chiang’s 2005 constituency feature
– Reward rule’s score if rule’s source-side matches a constituent span
– Constituency-incompatible emergent patterns can still ‘win’ (in spite of no reward)
– Good idea, but neg result
• But what if…
Rule granularity
• Chiang: Single weight for all constituents (parse tags)
• … But what if we can assign a separate feature and weight for each constituent?
• E.g., NP-only: (NP= )
• Or VP-only: (VP= )
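The contrast between one shared constituency feature and a separate feature per parse label can be sketched as below. This is only an illustration, not the implementation behind the cited results: the function names, the span-to-label dictionary, the `CONSTITUENT` feature name, and the weights are all assumptions; only the `NP=` / `VP=` naming comes from the slide.

```python
from collections import Counter

def coarse_features(rule_span, constituents):
    """Single feature: fires when the rule's source span matches ANY constituent."""
    feats = Counter()
    if rule_span in constituents:
        feats["CONSTITUENT"] += 1
    return feats

def fine_grained_features(rule_span, constituents):
    """One feature per parse label (e.g. 'NP=', 'VP='), so each can get its own weight."""
    feats = Counter()
    label = constituents.get(rule_span)
    if label is not None:
        feats[label + "="] += 1
    return feats

def score(feats, weights):
    """Weighted sum of the features that fired; unknown features contribute 0."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())

# Hypothetical parse of a source sentence: (start, end) span -> label.
constituents = {(0, 2): "NP", (2, 5): "VP"}
# Made-up weights; in practice they would be tuned on held-out data.
weights = {"CONSTITUENT": 0.1, "NP=": 0.5, "VP=": -0.2}

np_score = score(fine_grained_features((0, 2), constituents), weights)
vp_score = score(fine_grained_features((2, 5), constituents), weights)
```

With the coarse feature, NP and VP matches are forced to share one weight; with the fine-grained features, the tuner is free to reward one label and penalize another.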
Fine-grained
• Granularity is a big deal
– Soft syntactic constraints in SMT
• Chiang 2005 vs. Marton and Resnik 2008
• Neg results → pos results
– Soft semantic constraints in word-pair similarity ranking
• Mohammad and Hirst 2006 vs. Marton, Mohammad and Resnik 2009
• Pos results → better results
Word-pair similarity ranking
• Give each word pair a similarity score
– Rooster – voyage
– Coast – shore
• Noun-noun (Rubenstein & Goodenough, 1965)
• Verb-verb (Resnik & Diab, 2000)
• Result: list of pairs ordered by similarity
• Spearman rank correlation
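As a rough sketch of the evaluation step, Spearman rank correlation can be computed as the Pearson correlation of the two rank lists. The scores below are made up for illustration and are not taken from any of the cited datasets; the helper names are assumptions.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the rank-transformed scores."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical scores for four word pairs (same pair order in both lists).
human = [0.04, 3.60, 2.10, 1.20]
system = [0.10, 0.90, 0.40, 0.60]
rho = spearman(human, system)
```

Only the ordering of the pairs matters, so a measure can score well even if its raw similarity values are on a different scale from the human judgments.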
Similarity measures
• Distributional profiles (DP)
– Which words did I occur next to?
• Context vectors
• Similar vectors → similar meaning
Bank (pure word-based)
Bank
Bank (pure concept-based)
[Figure: two concept profiles. “Financial Institution”: Bank, Teller, Money, … “Water”: River, Bank, Water, …]
– Compare closest senses
– Bank_river = water ??
Bank (Hybrid Model)
[Figure: sense-specific profiles Bank_river and Bank_fin.inst]
Fine-grained
• Granularity is a big deal
– Soft syntactic constraints in SMT
• Chiang 2005 vs. Marton and Resnik 2008
• Neg results → pos results
– Soft semantic constraints in word-pair similarity ranking
• Mohammad and Hirst 2006 vs. Marton, Mohammad and Resnik 2009
• Pos results → better results
Unified Model
• Soft constraints in a log-linear model
– Syntactic
– Semantic
– …
• $\sum_i \lambda_i h_i(x)$
• Add more terms to the sum
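A minimal sketch of how soft constraints enter a log-linear model as extra weighted terms. The feature names, weights, and candidate values are hypothetical, chosen only to show that a syntactic or semantic constraint is simply one more feature in the sum.

```python
import math

def log_linear_score(features, weights):
    """Unnormalized log-score: the sum over i of weight_i * h_i(x)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def normalize(scores):
    """Softmax over competing candidates' log-scores."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical weights: a base model feature plus one syntactic and one
# semantic soft constraint.
weights = {"translation_logprob": 1.0, "NP=": 0.5, "semantic_sim": 0.8}

candidate_a = {"translation_logprob": -2.0, "NP=": 1.0}
candidate_b = {"translation_logprob": -1.8, "semantic_sim": 0.2}

scores = [log_linear_score(candidate_a, weights),
          log_linear_score(candidate_b, weights)]
probs = normalize(scores)
```

Because the constraints only add to the score, a candidate that violates one can still win if its other features are strong enough, which is what makes the constraint soft.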
Road map
Brief overview of doctoral work
• Hybrid knowledge / corpus-based semantic similarity methods
– Pure and hybrid methods
– Hard and soft constraints
– Fine-grained
– Named-entities
Distributional profiles (DPs)
• DPW: word-based distributional profile
– First order
– Distributional Hypothesis (Harris 1940; Firth 1957)
– Second order (vector representation)
– Strength of association
• Counts, PMI, TF/IDF-based, log-likelihood ratios …
– Vector similarity (cosine, L1, L2, ...)
word × word    Bush   Obama
President      .93    .96
Democrat       .13    .89
Republican     .88    .15
White House    .76    .91
…              .45    .74
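The vector-similarity step can be sketched using the named rows of the Bush/Obama table as two word-based profiles. The dictionary representation and function names are assumptions; the elided “…” row is left out because its collocate is unknown.

```python
import math

# Association strengths copied from the table's named rows.
bush = {"president": 0.93, "democrat": 0.13, "republican": 0.88, "white house": 0.76}
obama = {"president": 0.96, "democrat": 0.89, "republican": 0.15, "white house": 0.91}

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    shared = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def l1_distance(u, v):
    """L1 (Manhattan) distance; missing collocates count as 0."""
    keys = set(u) | set(v)
    return sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in keys)

similarity = cosine(bush, obama)
distance = l1_distance(bush, obama)
```

The two profiles agree on “president” and “white house” and disagree on the party terms, so the cosine comes out high but clearly below 1.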
Taxonomies and Groupings
• WordNet
– Synsets
– Relations (“is-a”)
– Arc distance
• UMLS
• Thesaurus
– Flat
– Coarse
– Bank_river = water ??
[Figure: “is-a” hierarchy. Postdoc is-a Academic job is-a job; CEO is-a Industry job is-a job]
Hybrid measures
• WordNet
– Resnik’s method (info content)
– Lin and others
• Thesaurus Concept-based
– Mohammad and Hirst (coarse-grained)
– Word may be listed under several concepts
– Distance b/w most similar senses
– Pro: Resource-poor languages and domains
– Con: Small thesaurus → low applicability
– WCCM: Financial instit. ~ academic instit.
– Bank_river = water ??
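The “distance between most similar senses” idea, and the problem it causes with coarse concepts, can be sketched as follows. The thesaurus entries and function names are hypothetical, and the concept profile values are illustrative numbers borrowed from the talk's WCCM slide.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def closest_sense_similarity(word1, word2, thesaurus, concept_profiles):
    """Best similarity over all pairs of concepts that list the two words."""
    best = 0.0
    for c1 in thesaurus.get(word1, []):
        for c2 in thesaurus.get(word2, []):
            best = max(best, cosine(concept_profiles[c1], concept_profiles[c2]))
    return best

# Hypothetical thesaurus: word -> concepts it is listed under.
thesaurus = {"bank": ["Fin.Inst", "Water"], "water": ["Water"], "teller": ["Fin.Inst"]}

# Concept-based profiles (illustrative values).
concept_profiles = {
    "Fin.Inst": {"bank": 0.97, "teller": 0.88, "money": 0.94, "water": 0.32},
    "Water": {"bank": 0.85, "teller": 0.07, "money": 0.15, "water": 0.91},
}

sim = closest_sense_similarity("bank", "water", thesaurus, concept_profiles)
```

Since “bank” and “water” are both listed under the same coarse concept, their closest senses share one profile and the similarity is maximal. That is the “Bank_river = water ??” problem: every word under a concept looks identical.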
WCCM: Concept-Word matrix
• WCCM: word-concept collocation matrix
• DPC: concept-based distributional profile
• Potentially iterative process
• Clean-up
concept × word   Fin.Inst   Water
bank             .97        .85
teller           .88        .07
money            .94        .15
water            .32        .91
…                .45        .74
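One way a word-concept collocation matrix could be collected is sketched below, assuming a cell counts how often a word occurs near any word listed under a concept. The window size, the toy corpus, the thesaurus, and the function name are assumptions; the iterative refinement and clean-up steps mentioned on the slide are not shown.

```python
from collections import defaultdict

def build_wccm(sentences, thesaurus, window=2):
    """Count co-occurrences of each word with the concepts of its neighbors."""
    wccm = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                # An ambiguous neighbor credits every concept it is listed under.
                for concept in thesaurus.get(tokens[j], []):
                    wccm[word][concept] += 1
    return wccm

# Hypothetical thesaurus and two-sentence corpus.
thesaurus = {"bank": ["Fin.Inst", "Water"], "money": ["Fin.Inst"], "river": ["Water"]}
sentences = [
    ["the", "teller", "counted", "money"],
    ["the", "river", "bank", "flooded"],
]
wccm = build_wccm(sentences, thesaurus)
```

A concept's distributional profile (DPC) is then the column of this matrix for that concept. Ambiguous neighbors such as “bank” add noise to several columns at once, which is presumably what the clean-up step is meant to reduce.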
Use concept-based DPCs to bias word-based DPWs
[Figure: word-based profile “Bank” + concept-based profiles “Financial Institution” (Bank, Teller, Money, …) and “Water” (River, Bank, Water, …) = sense-specific profiles Bank_river and Bank_fin.inst]
– Compare closest senses
– Bank_river = water ??
Fine-grained soft constraints
• DPWS: distributional profile of word senses
• Use concept-based DPCs to bias word-based DPWs
– Hybrid-filtered
– Hybrid-proportional
Hybrid-filtered
         Fin.Inst DPC   Water DPC   bank DPW   bank_river DPWS
bank     .97            .85         .76        .76
teller   .88            .07         .54        .54
money    .94            .15         .68        .68
water    .00            .91         .62        .00
…        .45            .74         .25        .25
Filter out collocates in DPW, if not appearing in DPC
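The filtering rule can be sketched as a hard 0/1 mask applied to the word-based profile. The function name and dictionary layout are assumptions; the numbers are the named rows of the table. With these numbers, masking the bank DPW with the Fin.Inst DPC column gives the values in the table's last column.

```python
def hybrid_filtered(dpw, dpc):
    """Keep a collocate's DPW value only if the collocate appears in the DPC."""
    return {w: (v if dpc.get(w, 0.0) > 0.0 else 0.0) for w, v in dpw.items()}

# Values from the table (the elided "…" row is omitted).
bank_dpw = {"bank": 0.76, "teller": 0.54, "money": 0.68, "water": 0.62}
fin_inst_dpc = {"bank": 0.97, "teller": 0.88, "money": 0.94, "water": 0.00}

bank_dpws = hybrid_filtered(bank_dpw, fin_inst_dpc)
```

Surviving collocates keep their full word-based strength, so the concept profile acts purely as a filter and never rescales anything.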
Hybrid-proportional
         Fin.Inst DPC   Water DPC   bank DPW   bank_river DPWS
bank     .97            .85         .76        .33
teller   .88            .07         .54        .05
money    .94            .15         .68        .08
water    .00            .91         .62        .00
…        .45            .74         .25        .15
Only discount collocate’s value in DPW, in proportion to the ratio of its count in current DPC relative to all DPCs of the target word
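A minimal sketch of the proportional discount as worded above: each DPW value is scaled by the collocate's share in the current DPC relative to all DPCs of the target word. The exact formula in the underlying work may differ, and this sketch does not attempt to reproduce the table's values. The counts, names, and profile values below are hypothetical.

```python
def hybrid_proportional(dpw, current_dpc, all_dpcs):
    """Scale each DPW value by the collocate's share in the current DPC."""
    dpws = {}
    for w, v in dpw.items():
        total = sum(dpc.get(w, 0.0) for dpc in all_dpcs)
        share = current_dpc.get(w, 0.0) / total if total > 0 else 0.0
        dpws[w] = v * share
    return dpws

# Hypothetical word-based profile for "bank" and collocate counts per concept.
bank_dpw = {"water": 0.6, "money": 0.8}
water_dpc = {"water": 30, "money": 5}
fin_inst_dpc = {"water": 10, "money": 45}
all_dpcs = [water_dpc, fin_inst_dpc]

bank_river = hybrid_proportional(bank_dpw, water_dpc, all_dpcs)
bank_fin_inst = hybrid_proportional(bank_dpw, fin_inst_dpc, all_dpcs)
```

Unlike the filtered variant, no collocate is dropped outright: “money” stays in the river sense with a small weight, and the sense-specific values for each collocate add back up to its original DPW value.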
WSD with DPWS
• Each sense of each word has a unique profile
– Bank_fin.inst ≠ Bank_river ≠ water !
• Pro:
– Not aggregated, whereas DPC profiles are
– Non/less smearing: DPW profiles smear all senses in a single profile
Results
evaluation
• Word-pair similarity ranking
– Spearman rank correlation
• Paraphrasing in SMT
– BLEU, TER, METEOR, ...
comparison
• WordNet results
• LSA results
Challenges
• Antonyms (black – white)
• “Hyperonyms” (vehicle – car)
• Co-hypernyms / co-taxonyms
Named Entities
• Challenges:
– Bush – Obama
• Potentially helpful:
– H2O – Water
– FOX – “forkhead/winged-helix replicator”
– FOXP2 – SPCH1
• “SPCH1” turned out to be a member of the FOX (forkhead/winged-helix replicator genes) family, of which several other genes are known all across the animal world. It was then labeled FOXP2, that being its current, and more conventional, name.
Biomedical/Chemical WSD
• Explore hybrid methods to create DPWS
– FOX_gene, FOX_animal
• Requires a lexical resource
– UMLS or other resources
• Useful for smaller training sets!
conclusion
• Hybrid Knowledge/Corpus-Based Statistical NLP Models Using Fine-Grained Soft Constraints
– Soft Constraints
– Fine-Grained
– Semantic (“concepts”)
– Resource-poor settings, special domains
[Figure: “Univ.” diagram labeled “Soft”; sense-specific profiles Bank_river and Bank_fin.inst]
Thank you!
Questions?
Advisors: Philip Resnik & Amy Weinberg
Department of Linguistics and CLIP Lab
Fine-grained semantic
• Word-based:
– Bank: river, money, water, teller, …
• “Concept”-based
– River: water, bank, boat, …
– Financial institution: bank, money, teller, …
– Humans compare closest senses
– Bank_river = water ??
• Hybrid:
– Bank_river: more strongly associated with water
– Bank_fin.inst: more strongly associated with money
SMT
• Statistical Machine Translation
– What translational units to use?
– Syntactic constituents, re-ordering
– “es gibt”
• Paraphrases
– Pivoting vs. bitext-free paraphrasing
– Typically monolingual
– Translation = bilingual / cross-domain paraphrasing
– Can be evaluated in SMT