Chapter 20 Part 2


Page 1: Chapter 20 Part 2


Chapter 20 Part 2

Computational Lexical Semantics

Acknowledgements: these slides include material from Rada Mihalcea, Ray Mooney, Katrin Erk, and Ani Nenkova

Page 2: Chapter 20 Part 2

Knowledge-based WSD

• Task definition

• Knowledge-based WSD = class of WSD methods relying (mainly) on knowledge drawn from dictionaries and/or raw text

• Resources
  – Yes:
    • Machine Readable Dictionaries
    • Raw corpora
  – No:
    • Manually annotated corpora


Page 3: Chapter 20 Part 2

Machine Readable Dictionaries

• In recent years, most dictionaries have been made available in Machine Readable format (MRD)
  – Oxford English Dictionary
  – Collins
  – Longman Dictionary of Contemporary English (LDOCE)

• Thesauri – add synonymy information
  – Roget’s Thesaurus

• Semantic networks – add more semantic relations
  – WordNet
  – EuroWordNet


Page 4: Chapter 20 Part 2

MRD – A Resource for Knowledge-based WSD

• For each word in the language vocabulary, an MRD provides:
  – A list of meanings
  – Definitions (for all word meanings)
  – Typical usage examples (for most word meanings)

WordNet definitions/examples for the noun “plant”:

1. buildings for carrying on industrial labor; “they built a large plant to manufacture automobiles”

2. a living organism lacking the power of locomotion

3. something planted secretly for discovery by another; “the police used a plant to trick the thieves”; “he claimed that the evidence against him was a plant”

4. an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience


Page 5: Chapter 20 Part 2

MRD – A Resource for Knowledge-based WSD

• A thesaurus adds:

– An explicit synonymy relation between word meanings

• A semantic network adds:

– Hypernymy/hyponymy (IS-A), meronymy/holonymy (PART-OF), antonymy, etc.

WordNet synsets for the noun “plant”:
  1. plant, works, industrial plant
  2. plant, flora, plant life

WordNet related concepts for the meaning “plant life”:
  {plant, flora, plant life}
  hypernym: {organism, being}
  hyponym: {house plant}, {fungus}, …
  meronym: {plant tissue}, {plant part}
  member holonym: {Plantae, kingdom Plantae, plant kingdom}


Page 6: Chapter 20 Part 2

Lesk Algorithm

• (Michael Lesk 1986): Identify the senses of words in context using definition overlap; that is, the algorithm disambiguates more than one word at a time.

• Algorithm:

– Retrieve from MRD all sense definitions of the words to be disambiguated

– Determine the definition overlap for all possible sense combinations

– Choose senses that lead to highest overlap

Example: disambiguate PINE CONE

• PINE
  1. kinds of evergreen tree with needle-shaped leaves
  2. waste away through sorrow or illness

• CONE
  1. solid body which narrows to a point
  2. something of this shape whether solid or hollow
  3. fruit of certain evergreen trees

Overlap scores:
  Pine#1 Cone#1 = 0    Pine#2 Cone#1 = 0
  Pine#1 Cone#2 = 1    Pine#2 Cone#2 = 0
  Pine#1 Cone#3 = 2    Pine#2 Cone#3 = 0
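The overlap computation itself is tiny. Below is a minimal Python sketch of two-word Lesk over the PINE/CONE glosses above; the stoplist and the crude strip-final-s normalization are assumptions of this sketch (the slide's exact per-pair counts depend on its own tokenization), but the winning combination is the same: Pine#1 with Cone#3.

STOP = {"a", "of", "to", "or", "which", "this", "with", "through", "whether"}

def tokens(gloss):
    # lowercase, drop stopwords, crudely singularize so "trees" matches "tree"
    return {w.rstrip("s") for w in gloss.lower().split() if w not in STOP}

def overlap(gloss1, gloss2):
    return len(tokens(gloss1) & tokens(gloss2))

PINE = {1: "kinds of evergreen tree with needle-shaped leaves",
        2: "waste away through sorrow or illness"}
CONE = {1: "solid body which narrows to a point",
        2: "something of this shape whether solid or hollow",
        3: "fruit of certain evergreen trees"}

# score every sense combination and keep the best one
best = max(((p, c) for p in PINE for c in CONE),
           key=lambda pc: overlap(PINE[pc[0]], CONE[pc[1]]))
print(best, overlap(PINE[best[0]], CONE[best[1]]))  # (1, 3) 2: "evergreen", "tree"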


Page 7: Chapter 20 Part 2

Lesk Algorithm for More than Two Words?

• I saw a man who is 98 years old and can still walk and tell jokes
  – nine open-class words: see(26), man(11), year(4), old(8), can(5), still(4), walk(10), tell(8), joke(3)

• 26 × 11 × 4 × 8 × 5 × 4 × 10 × 8 × 3 = 43,929,600 sense combinations! How do we find the optimal sense combination?

• Simulated annealing (Cowie, Guthrie, Guthrie 1992)
  – Let’s review (from CS1571)


Page 8: Chapter 20 Part 2


Search Types

– Backtracking state-space search
– Local Search and Optimization
– Constraint satisfaction search
– Adversarial search

Page 9: Chapter 20 Part 2


Local Search

• Use a single current state and move only to neighbors.

• Use little space

• Can find reasonable solutions in large or infinite (continuous) state spaces for which the other algorithms are not suitable

Page 10: Chapter 20 Part 2


Optimization

• Local search is often suitable for optimization problems: search for the best state by optimizing an objective function.

Page 11: Chapter 20 Part 2


Visualization

• States are laid out in a landscape

• Height corresponds to the objective function value

• Move around the landscape to find the highest (or lowest) peak

• Only keep track of the current state and its immediate neighbors

Page 12: Chapter 20 Part 2


Simulated Annealing

• Based on a metallurgical metaphor
  – Start with a temperature set very high and slowly reduce it.

Page 13: Chapter 20 Part 2


Simulated Annealing

• Annealing: harden metals and glass by heating them to a high temperature and then gradually cooling them

• At the start, make lots of moves and then gradually slow down

Page 14: Chapter 20 Part 2


Simulated Annealing

• More formally…
  – Generate a random new neighbor from the current state.
  – If it’s better, take it.
  – If it’s worse, then take it with a probability that grows with the temperature and shrinks with the amount by which the new state is worse.

Page 15: Chapter 20 Part 2


Simulated annealing

• Probability of a move decreases with the amount ΔE by which the evaluation is worsened

• A second parameter T is also used to determine the probability: high T allows more worse moves, T close to zero results in few or no bad moves

• The schedule input determines the value of T as a function of the number of completed cycles
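
• For example (a worked instance of the acceptance rule, not from the slides): a move that worsens the evaluation by ΔE = −5 is accepted with probability e^(−5/10) ≈ 0.61 when T = 10, but only with probability e^(−5/1) ≈ 0.007 when T = 1; the high early temperature permits many bad moves, the low late temperature almost none.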

Page 16: Chapter 20 Part 2


function Simulated-Annealing(problem, schedule) returns a solution state
  inputs: problem, a problem
          schedule, a mapping from time to “temperature”

  current ← Make-Node(Initial-State[problem])
  for t ← 1 to ∞ do
    T ← schedule[t]
    if T = 0 then return current
    next ← a randomly selected successor of current
    ΔE ← Value[next] – Value[current]
    if ΔE > 0 then current ← next
    else current ← next only with probability e^(ΔE/T)
Page 17: Chapter 20 Part 2

Intuitions

• The algorithm wanders around during the early parts of the search, hopefully toward a good general region of the state space

• Toward the end, the algorithm does a more focused search, making few bad moves


Page 18: Chapter 20 Part 2

Lesk Algorithm for More than Two Words?

• I saw a man who is 98 years old and can still walk and tell jokes
  – nine open-class words: see(26), man(11), year(4), old(8), can(5), still(4), walk(10), tell(8), joke(3)

• 43,929,600 sense combinations! How to find the optimal sense combination?

• Simulated annealing (Cowie, Guthrie, Guthrie 1992)
  • Given: W, the set of words we are disambiguating
  • State: one sense for each word in W
  • Neighbors of a state: the result of changing one word sense
  • Objective function: value(state)
    – Let DWs(state) be the words that appear in the union of the definitions of the senses in state
    – value(state) = the sum, over the words in DWs(state), of the number of times each appears in the union of the definitions
    – The value will be higher the more words appear in multiple definitions (a code sketch follows below)
  • Start state: the most frequent sense of each word
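
As noted above, here is a sketch of the objective and the neighbor move in Python. The gloss table is an assumed input mapping (word, sense id) to a definition string, and counting only words that occur more than once is one natural reading of the slide's value(state); the slide's wording also admits counting every word.

from collections import Counter

def value(state, gloss):
    # state: dict mapping each word in W to its currently chosen sense id
    # gloss: dict mapping (word, sense_id) to that sense's definition string
    bag = Counter()
    for word, sense in state.items():
        bag.update(gloss[(word, sense)].lower().split())
    # words occurring once contribute nothing, so the score grows as the
    # chosen definitions share vocabulary
    return sum(n for n in bag.values() if n > 1)

def neighbors(state, n_senses):
    # all states reachable by changing exactly one word's sense
    # (n_senses: dict mapping each word to its number of senses)
    return [dict(state, **{w: s}) for w in state
            for s in range(1, n_senses[w] + 1) if s != state[w]]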


Page 19: Chapter 20 Part 2

Lesk Algorithm: A Simplified Version

• Original Lesk definition: measure overlap between sense definitions for all words in the text
  – Identify simultaneously the correct senses for all words in the text

• Simplified Lesk (Kilgarriff & Rosensweig 2000): measure overlap between the sense definitions of a word and its context in the text
  – Identify the correct sense for one word at a time

• Search space significantly reduced (the context in the text is fixed for each word instance)


Page 20: Chapter 20 Part 2

Lesk Algorithm: A Simplified Version

Example: disambiguate PINE in “Pine cones hanging in a tree”

• PINE
  1. kinds of evergreen tree with needle-shaped leaves
  2. waste away through sorrow or illness

Overlap with the sentence:
  Pine#1 Sentence = 1
  Pine#2 Sentence = 0

• Algorithm for simplified Lesk:
  1. Retrieve from the MRD all sense definitions of the word to be disambiguated
  2. Determine the overlap between each sense definition and the context of the word in the text
  3. Choose the sense that leads to the highest overlap
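
Simplified Lesk is only a few lines over a real MRD. The sketch below uses NLTK's WordNet interface and assumes the nltk package and its WordNet data are installed; the whitespace tokenization and missing stoplist are simplifications, and ties are broken in favor of WordNet's first-listed sense.

from nltk.corpus import wordnet as wn

def simplified_lesk(word, sentence):
    # pick the synset whose gloss overlaps the context words the most
    context = set(sentence.lower().split())
    best, best_overlap = None, -1
    for synset in wn.synsets(word):
        gloss = set(synset.definition().lower().split())
        score = len(gloss & context)
        if score > best_overlap:        # strict '>' keeps the first sense on ties
            best, best_overlap = synset, score
    return best

print(simplified_lesk("pine", "pine cones hanging in a tree"))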


Page 21: Chapter 20 Part 2

Selectional Preferences

• A way to constrain the possible meanings of words in a given context

• E.g. “wash a dish” vs. “cook a dish” – WASH-OBJECT vs. COOK-FOOD (the predicate’s preferred argument class selects the intended sense of “dish”)

• Alternative terminology
  – Selectional Restrictions
  – Selectional Preferences
  – Selectional Constraints


Page 22: Chapter 20 Part 2

Acquiring Selectional Preferences

• From raw corpora
  – Frequency counts
  – Information theory measures


Page 23: Chapter 20 Part 2

Preliminaries: Learning Word-to-Word Relations

• An indication of the semantic fit between two words

• 1. Frequency counts (in a parsed corpus)

– Pairs of words connected by a syntactic relation

• 2. Conditional probabilities

– Condition on one of the words

Count(W1, W2, R)

P(W1 | W2, R) = Count(W1, W2, R) / Count(W2, R)
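
Both estimates are straightforward to compute from a table of parsed triples. A small Python sketch, with invented counts standing in for a parsed corpus:

from collections import Counter

# toy counts of (W1, W2, R) triples harvested from a parsed corpus (invented data)
counts = Counter({("eat", "pasta", "dobj"): 12,
                  ("eat", "apple", "dobj"): 8,
                  ("cook", "pasta", "dobj"): 4})

def p_w1_given_w2_r(w1, w2, r):
    # P(W1 | W2, R) = Count(W1, W2, R) / Count(W2, R)
    count_w2_r = sum(n for (a, b, rel), n in counts.items() if b == w2 and rel == r)
    return counts[(w1, w2, r)] / count_w2_r if count_w2_r else 0.0

print(p_w1_given_w2_r("eat", "pasta", "dobj"))  # 12 / (12 + 4) = 0.75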


Page 24: Chapter 20 Part 2

Learning Selectional Preferences

• Word-to-class relations (Resnik 1993)

– Quantify the contribution of a semantic class using all the senses subsumed by that class (e.g., the class is an ancestor in WordNet)

A(W1, C2, R) = [ P(C2 | W1, R) · log( P(C2 | W1, R) / P(C2) ) ]  /  [ Σ_C  P(C | W1, R) · log( P(C | W1, R) / P(C) ) ]

where

P(C2 | W1, R) = Count(W1, C2, R) / Count(W1, R)

Count(W1, C2, R) = Σ_{W2 ∈ C2}  Count(W1, W2, R)
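
Given those probabilities, computing the association score is mechanical. A sketch, assuming the two probability tables have already been estimated (e.g., with the counting scheme from the previous slide):

import math

def association(p_class_given_word, p_class):
    # p_class_given_word: dict C -> P(C | W1, R)
    # p_class:            dict C -> prior P(C)
    # returns: dict C -> A(W1, C, R), normalized as in the formula above
    contrib = {c: p * math.log(p / p_class[c])
               for c, p in p_class_given_word.items() if p > 0}
    strength = sum(contrib.values())   # the denominator: selectional preference strength
    return {c: v / strength for c, v in contrib.items()} if strength else contrib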

Page 25: Chapter 20 Part 2

Using Selectional Preferences for WSD

• Algorithm:
  – Let N be a noun that stands in relationship R to predicate P. Let s1…sk be its possible senses.
  – For i from 1 to k, compute:
    • Ci = {c | c is an ancestor of si}
    • Ai = max over c in Ci of A(P, c, R)
  – Ai is the score for sense i. Select the sense with the highest score.

• For example: “letter” has 3 senses in WordNet (written message; varsity letter; alphabetic character) and belongs to 19 classes in all.

• Suppose the predicate is “write”. For each sense of “letter”, calculate a score by measuring the association of “write” (with the direct-object relation) with each ancestor of that sense. A code sketch follows below.
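
A sketch of that scoring loop; ancestors() and assoc() are assumed helpers (in practice the ancestors would come from WordNet hypernym chains, and assoc would be the association function estimated above):

def best_sense(senses, predicate, relation, ancestors, assoc):
    # senses: candidate senses of noun N
    # ancestors(s): classes subsuming sense s (its WordNet ancestors)
    # assoc(P, c, R): selectional association A(P, c, R)
    def score(sense):
        # Ai = max over c in Ci of A(P, c, R)
        return max(assoc(predicate, c, relation) for c in ancestors(sense))
    return max(senses, key=score)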
