
Page 1

I256 Applied Natural Language Processing
Fall 2009
Lecture 8

• Words
  – Lexical acquisition
  – Collocations
  – Similarity
  – Selectional preferences

Barbara Rosario

Page 2: Lexical acquisition

• Develop algorithms and statistical techniques for filling the holes in existing dictionaries and lexical resources by looking at the occurrences of patterns of words in large text corpora
  – Collocations
  – Semantic similarity
  – Logical metonymy
  – Selectional preferences

Page 3: The limits of hand-encoded lexical resources

• Manual construction of lexical resources is very costly

• Because language keeps changing, these resources have to be continuously updated

• Quantitative information (e.g., frequencies, counts) has to be computed automatically anyway

Page 4: The coverage problem

From CS 224N / Ling 280, Stanford, Manning

Page 5: Lexical acquisition

• Examples:
  – “insulin” and “progesterone” are in WordNet 2.1, but “leptin” and “pregnenolone” are not.
  – “HTML” and “SGML”, but not “XML” or “XHTML”.
  – “Google” and “Yahoo”, but not “Microsoft” or “IBM”.

• We need some notion of word similarity to know where to locate a new word in a lexical resource

Page 6: Lexical acquisition

• Lexical acquisition problems
  – Collocations
  – Semantic similarity
  – Logical metonymy
  – Selectional preferences

Page 7: Collocations

• A collocation is an expression consisting of two or more words that correspond to some conventional way of saying things
  – Noun phrases: weapons of mass destruction, stiff breeze (but why not *stiff wind?)
  – Verbal phrases: to make up
  – Not necessarily contiguous: knock … door

• Limited compositionality
  – An expression is compositional if its meaning can be predicted from the meaning of its parts
  – Idioms are the most extreme examples of non-compositionality
    • Kick the bucket
  – In collocations there is an element of meaning added to the combination (i.e., the exact meaning cannot be derived directly from its components)
    • White hair, white wine, white woman

Page 8: Collocations

• Non-substitutability
  – Cannot substitute words in a collocation
    • *yellow wine

• Non-modifiability
  – To get a frog in one’s throat
    • *To get an ugly frog in one’s throat

• Useful for
  – Language generation
    • *Powerful tea, *take a decision
  – Machine translation
    • An easy way to test whether a combination is a collocation is to translate it into another language
      – Make a decision: *faire une décision (prendre), *fare una decisione (prendere)

Page 9: Subclasses of collocations

• Light verbs
  – Make a decision, do a favor

• Phrasal verbs
  – To tell off, make up

• Proper names
  – San Francisco, New York

• Terminological expressions
  – Hydraulic oil filter
    • This is compositional, but we need to make sure, for example, that it is always translated the same way

Page 10: Finding collocations

• Frequency
  – If two words occur together a lot, that may be evidence that they have a special function
  – But if we sort by frequency of pairs C(w1, w2), then “of the” is the most frequent pair
  – Filter by POS patterns: A N (linear function), N N (regression coefficients), etc. (see the sketch after this slide)

• Mean and variance of the distance between the words
  – For non-contiguous collocations
    • She knocked at his door (d = 2)
    • A man knocked on the metal front door (d = 4)

• Hypothesis testing (see page 162, Stat NLP)
  – How do we know it’s really a collocation?
  – A low mean distance can be accidental (new company)
  – We need to know whether two words occur together by chance or not (because they are a collocation)
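A minimal sketch of the frequency-plus-POS-filter idea using NLTK; the Brown corpus and the JJ/NN tag prefixes are illustrative choices, not part of the original slide:

```python
import nltk
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

# Count bigrams over POS-tagged tokens so we can filter by tag pattern
# (assumes the 'brown' corpus has been downloaded).
tagged = nltk.corpus.brown.tagged_words()            # (word, tag) pairs
finder = BigramCollocationFinder.from_words(tagged)
finder.apply_freq_filter(5)                          # drop rare pairs

# Keep only A N and N N patterns; everything else (e.g. "of the") is removed.
def bad_pattern(w1, w2):
    return not (w1[1].startswith(('JJ', 'NN')) and w2[1].startswith('NN'))

finder.apply_ngram_filter(bad_pattern)
measures = BigramAssocMeasures()
print(finder.nbest(measures.raw_freq, 10))
```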

Page 11: Finding collocations

• Mutual information measure
  – A measure of how much one word tells us about the other, i.e., the reduction in uncertainty about one word due to knowing the other
  – 0 when the two words are independent
  – (see Stat NLP, pages 66 and 178)

I(x, y) = log [ p(x, y) / ( p(x) p(y) ) ]
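A short hedged sketch of ranking bigrams by this measure (pointwise mutual information) with NLTK; the frequency filter is there because PMI overweights rare pairs:

```python
import nltk
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

words = [w.lower() for w in nltk.corpus.brown.words()]  # illustrative corpus
finder = BigramCollocationFinder.from_words(words)
finder.apply_freq_filter(10)    # PMI overweights rare pairs, so filter first
measures = BigramAssocMeasures()
print(finder.nbest(measures.pmi, 10))
```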

Page 12: Lexical acquisition

• Lexical acquisition problems
  – Collocations
  – Semantic similarity
  – Logical metonymy
  – Selectional preferences

Page 13: Lexical and semantic similarity

• Lexical and distributional notions of meaning similarity
• How can we work out how similar in meaning words are?
• What is it useful for?
  – IR
  – Generalization
    • Semantically similar words behave similarly
  – QA, inference, …

• We could use anything in the thesaurus
  – Meronymy
  – Example sentences/definitions
  – In practice, by “thesaurus-based” we usually just mean using the is-a/subsumption/hypernym hierarchy

• Word similarity versus word relatedness
  – Similar words are near-synonyms
  – Related words could be related in any way
    • Car, gasoline: related, not similar
    • Doctor, nurse, fever: related (topic)
    • Car, bicycle: similar

Page 14: Semantic similarity

• Similar if contextually interchangeable
  – The degree to which one word can be substituted for another in a given context
    • Suit is similar to litigation (but only in the legal context)

• Measures of similarity
  – WordNet-based
  – Vector-based
  – Detecting hyponymy and other relations

Page 15: WordNet: Semantic Similarity

• Whale is very specific (and baleen whale even more so), while vertebrate is more general and entity is completely general. We can quantify this concept of generality by looking up the depth of each synset:
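A small sketch of that lookup with NLTK’s WordNet interface (the synset names below are the usual ones, but verify with wn.synsets if your WordNet version differs):

```python
from nltk.corpus import wordnet as wn

# Deeper synsets are more specific; 'entity' is the (near-)root.
for name in ['entity.n.01', 'vertebrate.n.01', 'whale.n.02', 'baleen_whale.n.01']:
    print(name, wn.synset(name).min_depth())
```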

Page 16: WordNet: Semantic Similarity

• path_similarity: two words are similar if they are nearby in the thesaurus hierarchy (i.e., there is a short path between them)
  – path_similarity assigns a score in the range 0–1 based on the shortest path that connects the concepts in the hypernym hierarchy

• The numbers don’t mean much in themselves, but they decrease as we move away from the semantic space of sea creatures to inanimate objects.
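A hedged sketch, using the sea-creature synsets the NLTK book uses for this same point (exact scores depend on the WordNet version):

```python
from nltk.corpus import wordnet as wn

right_whale = wn.synset('right_whale.n.01')
for name in ['orca.n.01', 'tortoise.n.01', 'novel.n.01']:
    other = wn.synset(name)
    # Scores shrink as we move from sea creatures to inanimate objects.
    print(name, right_whale.path_similarity(other))
```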

Page 17: WordNet: Path Similarity

From CS 224N / Ling 280, Stanford, Manning

Page 18: WordNet: Path Similarity

• Problems with path similarity
  – It assumes each link represents a uniform distance
  – Instead, we want a metric that lets us represent the cost of each edge independently
  – There has been a whole slew of methods that augment the thesaurus with notions from a corpus (Resnik, Lin, …)

From CS 224N / Ling 280, Stanford, Manning
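For example, NLTK exposes Resnik- and Lin-style corpus-augmented measures; a sketch, assuming the 'wordnet_ic' data package (which provides 'ic-brown.dat') is installed:

```python
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')  # corpus-derived information content
dog, cat = wn.synset('dog.n.01'), wn.synset('cat.n.01')
print(dog.res_similarity(cat, brown_ic))  # Resnik: IC of the lowest common subsumer
print(dog.lin_similarity(cat, brown_ic))  # Lin: normalized variant in [0, 1]
```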

Page 19: Vector-based lexical semantics

• Very old idea: the meaning of a word can be specified in terms of the values of certain ‘features’ (COMPONENTIAL SEMANTICS)
  – dog: ANIMATE = +, EAT = MEAT, SOCIAL = +
  – horse: ANIMATE = +, EAT = GRASS, SOCIAL = +
  – cat: ANIMATE = +, EAT = MEAT, SOCIAL = −

• Similarity / relatedness: proximity in feature space

From CS 224N / Ling 280, Stanford, Manning

Page 20: Vector-based lexical semantics

From CS 224N / Ling 280, Stanford, Manning

Page 21: General characterization of vector-based semantics

• Vectors as models of concepts
• The CLUSTERING approach to lexical semantics:
  1. Define the properties one cares about, and give values to each property (generally numerical)
  2. Create a vector of length n for each item to be classified
  3. Viewing the n-dimensional vector as a point in n-space, cluster points that are near one another

• What changes between models:
  1. The properties used in the vector
  2. The distance metric used to decide if two points are ‘close’
  3. The algorithm used to cluster

From CS 224N / Ling 280, Stanford, Manning

Page 22: Distributional similarity: using words as features in a vector-based semantics

• The old decompositional semantic approach requires
  – i. Specifying the features
  – ii. Characterizing the value of these features for each lexeme

• Simpler approach: use as features the WORDS that occur in the proximity of that word / lexical entry
  – Intuition: “You shall know a word by the company it keeps.” (J. R. Firth)

• More specifically, you can use as ‘values’ of these features
  – The FREQUENCIES with which these words occur near the words whose meaning we are defining
  – Or perhaps the PROBABILITIES that these words occur next to each other

• Some psychological results support this view.

From CS 224N / Ling 280, Stanford, Manning

Page 23: Using neighboring words to specify the meaning of words

• Take, e.g., the following corpus:
  – John ate a banana.
  – John ate an apple.
  – John drove a lorry.

• We can extract the following co-occurrence matrix (sketched below):
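The matrix itself appears to have been an image lost from this transcript; a minimal sketch that reconstructs sentence-level co-occurrence counts for this toy corpus:

```python
from collections import Counter
from itertools import permutations

corpus = ["John ate a banana", "John ate an apple", "John drove a lorry"]
cooc = Counter()
for sent in corpus:
    words = sent.lower().split()
    for w, v in permutations(words, 2):   # every ordered pair in a sentence
        cooc[(w, v)] += 1

# 'banana' and 'apple' share contexts (john, ate), so their rows look alike;
# 'lorry' differs on the verb dimension (drove vs. ate).
print(cooc[('banana', 'ate')], cooc[('apple', 'ate')], cooc[('lorry', 'drove')])
```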

Page 24: Acquiring lexical vectors from a corpus

• To construct a vector C(w) for each word w:
  1. Scan a text
  2. Whenever a word w is encountered, increment all cells of C(w) corresponding to the words v that occur in the vicinity of w, typically within a window of fixed size

• Differences among methods (see the sketch after this slide):
  – Size of the window
  – Weighted or not
  – Whether every word in the vocabulary counts as a dimension (including function words such as the or and) or whether instead only some specially chosen words are used (typically the m most common content words in the corpus, or perhaps modifiers only)
  – The words chosen as dimensions are often called CONTEXT WORDS
  – Whether dimensionality reduction methods are applied

From CS 224N / Ling 280, Stanford, Manning
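A sketch of that counting loop under one set of choices (symmetric window of 2, every word a dimension, no weighting); all the knobs listed above are assumptions here:

```python
from collections import defaultdict, Counter

def lexical_vectors(tokens, window=2):
    vectors = defaultdict(Counter)        # C(w): context counts per word
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for v in tokens[lo:i] + tokens[i + 1:hi]:
            vectors[w][v] += 1            # increment cell of C(w) for neighbor v
    return vectors

vecs = lexical_vectors("john ate a banana john ate an apple".split())
print(vecs["ate"].most_common())
```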

Page 25: Variant: using only modifiers to specify the meaning of words

From CS 224N / Ling 280, Stanford, Manning

Page 26: The CLUSTERING approach to lexical semantics

– Create a vector of length n for each item to be classified
  • Viewing the n-dimensional vector as a point in n-space, cluster points that are near one another
– Define a similarity measure (the distance metric used to decide if two points are ‘close’)
  • For example, cosine similarity (the slide’s own example is not in this transcript; see the sketch after this slide)
– (Eventually) a clustering algorithm

From CS 224N / Ling 280, Stanford, Manning
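A minimal cosine-similarity sketch; cosine is the measure the HAL slide below uses, though whether it was this slide’s lost example is an assumption:

```python
import math

def cosine(u, v):
    # Cosine of the angle between two count vectors; 1.0 = same direction.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

print(cosine([2, 1, 1, 0], [2, 1, 0, 1]))  # similar but not identical contexts
```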

Page 27: The HAL model

• Burgess and Lund (1995, 1998)
  – A 160-million-word corpus of articles extracted from all newsgroups containing English dialogue
  – Context words: the 70,000 most frequently occurring symbols within the corpus
  – Window size: 10 words to the left and the right of the word
  – Measure of similarity: cosine
  – Nearest neighbors found:
    • Frightened: scared, upset, shy, embarrassed, anxious, worried, afraid
    • Harmed: abused, forced, treated, discriminated, allowed, attracted, taught
    • Beatles: original, band, song, movie, album, songs, lyrics, British

From CS 224N / Ling 280, Stanford, Manning

Page 28: Latent Semantic Analysis

• Landauer et al. (1997, 1998)
  – Goal: extract expected contextual usage from passages
  – Steps (sketched after this slide):
    • Build a word/document co-occurrence matrix
    • ‘Weight’ each cell (e.g., tf.idf)
    • Perform a DIMENSIONALITY REDUCTION
  – Argued to correlate well with humans on a number of tests

From CS 224N / Ling 280, Stanford, Manning
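A hedged sketch of those three steps with scikit-learn (not Landauer’s implementation; the toy passages are made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["john ate a banana", "john ate an apple", "john drove a lorry"]
X = TfidfVectorizer().fit_transform(docs)  # weighted word/document matrix
lsa = TruncatedSVD(n_components=2).fit(X)  # the dimensionality-reduction step
print(lsa.transform(X))                    # passages in the latent space
```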

Page 29: Detecting hyponymy and other relations with patterns

• Goal: discover new hyponyms, and add them to a taxonomy under the appropriate hypernym
  – “Agar is a substance prepared from a mixture of red algae, such as Gelidium, for laboratory or industrial use.”
  – What does Gelidium mean? How do you know?

Page 30: Hearst approach

• Hearst hand-built lexical patterns (shown in a figure not included in this transcript; the best known is “NP_hypernym such as NP_hyponym”):

From CS 224N / Ling 280, Stanford, Manning
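A rough sketch of applying that one pattern to the Gelidium sentence from the previous slide; the regex is illustrative, not Hearst’s actual implementation:

```python
import re

# "X, such as Y" suggests Y is a hyponym of X; keep up to two words of X.
pattern = re.compile(r'(\w+(?: \w+)?), such as (\w+)')
text = ("Agar is a substance prepared from a mixture of red algae, "
        "such as Gelidium, for laboratory or industrial use.")
for hypernym, hyponym in pattern.findall(text):
    print(f"{hyponym} is a kind of {hypernym}")  # Gelidium is a kind of red algae
```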

Page 31: Trained algorithm to discover patterns

• Snow, Jurafsky, Ng (2005)
• Collect noun pairs from corpora
  – (752,311 pairs from 6 million words of newswire)
• Identify each pair as a positive or negative example of the hypernym/hyponym relationship
  – (14,387 yes, 737,924 no)
• Parse the sentences, extract patterns (lexical patterns and parse paths)
• Train a hypernym classifier on these patterns (sketched after this slide)

From CS 224N / Ling 280, Stanford, Manning
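A hedged sketch of that last step; the feature names and the choice of logistic regression are assumptions for illustration, not Snow et al.’s exact setup:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Each noun pair is a bag of the patterns it was seen in, labeled hypernym-or-not.
pairs = [({"such_as": 3, "and_other": 1}, 1),  # hypothetical positive pair
         ({"including": 2}, 1),
         ({"verb_object": 5}, 0)]              # hypothetical non-hypernym pair
vec = DictVectorizer()
X = vec.fit_transform(features for features, _ in pairs)
y = [label for _, label in pairs]
clf = LogisticRegression().fit(X, y)
print(clf.predict(vec.transform([{"such_as": 1}])))  # classify an unseen pair
```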

Page 32

From CS 224N / Ling 280, Stanford, Manning

Page 33: Evaluation: precision and recall

• Precision can be seen as a measure of exactness or fidelity, whereas recall is a measure of completeness.

• Used in information retrieval
  – A perfect precision score of 1.0 means that every result retrieved by a search was relevant (but says nothing about whether all relevant documents were retrieved), whereas a perfect recall score of 1.0 means that all relevant documents were retrieved by the search (but says nothing about how many irrelevant documents were also retrieved).
  – Precision is defined as the number of relevant documents retrieved by a search divided by the total number of documents retrieved.
  – Recall is defined as the number of relevant documents retrieved by a search divided by the total number of existing relevant documents (which should have been retrieved).
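A worked sketch of those two definitions on made-up retrieval results:

```python
retrieved = {"d1", "d2", "d3", "d4"}  # what the search returned
relevant  = {"d1", "d2", "d7"}        # everything that should have been returned

true_positives = len(retrieved & relevant)
precision = true_positives / len(retrieved)  # 2 / 4 = 0.50
recall    = true_positives / len(relevant)   # 2 / 3 ≈ 0.67
print(precision, recall)
```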

Page 34: Evaluation: precision and recall

• Classification context
  – A perfect precision score of 1.0 for a class C means that every item labeled as belonging to class C does indeed belong to class C (but says nothing about the number of items from class C that were not labeled correctly).
  – A perfect recall of 1.0 means that every item from class C was labeled as belonging to class C (but says nothing about how many other items were incorrectly also labeled as belonging to class C).

Page 35: Precision and recall: trade-off

• Often there is an inverse relationship between precision and recall: it is possible to increase one at the cost of reducing the other.

• For example, a search engine can increase its recall by retrieving more documents, at the cost of an increasing number of irrelevant documents retrieved (decreasing precision).

• Similarly, a classification system for deciding whether or not, say, a fruit is an orange can achieve high precision by only classifying fruits with exactly the right shape and color as oranges, but at the cost of low recall, due to the number of false negatives from oranges that did not quite match the specification.

Page 36

From CS 224N / Ling 280, Stanford, Manning  

Page 37: Lexical acquisition

• Lexical acquisition problems
  – Collocations
  – Semantic similarity
  – Logical metonymy
  – Selectional preferences

Page 38: Other lexical semantics tasks

• Metonymy is a figure of speech in which a thing or concept is not called by its own name, but by the name of something intimately associated with that thing or concept.
  – Example: “The White House” for the administration.

• Logical metonymy
  – enjoy the book means enjoy reading the book, and easy problem means a problem that is easy to solve.

Page 39: Other lexical semantics tasks

From CS 224N / Ling 280, Stanford, Manning

Page 40

From CS 224N / Ling 280, Stanford, Manning

Page 41

From CS 224N / Ling 280, Stanford, Manning

Page 42: Lexical acquisition

• Lexical acquisition problems
  – Collocations
  – Semantic similarity
  – Logical metonymy
  – Selectional preferences

Page 43: Selectional preferences

• Most verbs prefer arguments of a particular type: selectional preferences or restrictions
  – Objects of eat tend to be food, subjects of think tend to be people, etc.
  – “Preferences”, to allow for metaphors
    • Fear eats the soul

• Why is it important for NLP?

Page 44: Selectional preferences

• Why important?
  – To infer meaning from selectional restrictions
    • Suppose we don’t know the word durian (it is not in the vocabulary)
    • Susan ate a very fresh durian
    • Infer that durian is a type of food
  – To rank the possible parses of a sentence
    • Give higher scores to parses where the verb has “natural” arguments

Page 45: Model of selectional preferences

• Resnik, 1993 (see page 288, Stat NLP)
• Two main concepts (sketched below):
  1. Selectional preference strength
     – How strongly the verb constrains its direct object
       • Eat, find, see
  2. Selectional association between the verb and the object semantic class
     • Eat and food

• The higher 1 and 2, the less important it is to state the object explicitly (i.e., the more likely the implicit-object construction is)
  • Bo ate, but *Bo saw
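A hedged sketch of the two quantities as defined in Resnik’s model (preference strength as the KL divergence between P(c | v) and P(c); association as a class’s share of that strength). The class probabilities below are invented for illustration:

```python
import math

p_class = {"food": 0.2, "person": 0.3, "artifact": 0.5}              # prior P(c)
p_class_given_eat = {"food": 0.9, "person": 0.05, "artifact": 0.05}  # P(c | eat)

# 1. Selectional preference strength S(eat) = D( P(c|eat) || P(c) )
strength = sum(p * math.log2(p / p_class[c])
               for c, p in p_class_given_eat.items())

# 2. Selectional association A(eat, food): food's share of that strength
assoc_food = (p_class_given_eat["food"]
              * math.log2(p_class_given_eat["food"] / p_class["food"])) / strength
print(strength, assoc_food)
```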

Page 46: Next class

• Next time: review

• Classification

• Project ideas (likely on October 6)

• Two more assignments (most likely)

• Project proposals (1–2 page description)

• Projects