Noun Paraphrasing Based on a Variety of Contexts

Tomoyuki Kajiwara and Kazuhide Yamamoto, Nagaoka University of Technology, Japan


Page 1: Noun Paraphrasing Based on a Variety of Contexts

Noun Paraphrasing Based on a Variety of Contexts

Tomoyuki Kajiwara and Kazuhide Yamamoto, Nagaoka University of Technology, Japan

Page 2: Noun Paraphrasing Based on a Variety of Contexts

Abstract
We propose a method to paraphrase nouns in consideration of their contexts.

The Characteristics of Our Proposed Method
–  It paraphrases robustly without relying on word frequency.
   •  Our “Number of Differences” (NoD) based method is better than the “Co-occurrence Frequency” (CoF) based method.
–  It paraphrases depending on the context.
   •  e.g. Reduce the “burdens” on the back.
   •  NoD: load, stress, damage, exhaustion, tension, etc.
   •  CoF: cost, expense, actual cost, etc. (money-related)

Page 3: Noun Paraphrasing Based on a Variety of Contexts

Lexical Paraphrasing (Lexical Substitution)

Different linguistic expressions that convey the same meaning.

e.g. “teacher” ↔ “instructor”

Page 4: Noun Paraphrasing Based on a Variety of Contexts

Applications of Lexical Paraphrasing

•  For Reading Assistance (Lexical Simplification)
–  Never judge people by “external” appearance.
–  Never judge people by “outside” appearance.

•  For Machine Translation (pre-editing)
–  その本なら書類の下にある → It is under the papers if it is the book.
–  その本は書類の下にある → The book is under the papers. ✔

Page 5: Noun Paraphrasing Based on a Variety of Contexts

Difficulty of Lexical Paraphrasing

•  Force someone to shoulder a huge increase in his financial “burdens”.
–  Force someone to shoulder a huge increase in his financial “costs”.
–  Force someone to shoulder a huge increase in his financial “loads”.

•  Reduce the “burdens” on the back.
–  Reduce the “costs” on the back.
–  Reduce the “loads” on the back.

Whether a paraphrase is possible depends on the context.

Page 6: Noun Paraphrasing Based on a Variety of Contexts

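Input: Look for the “access” to the airport.

Output: Look for the “way” to the airport.

Approach

•  Words found in the context “look for the ***”: restaurant, market, purpose, transfer, fee, way
•  Words found in the context “*** to the airport”: transfer, fee, way, bus, transportation, delivery
•  Words found in both contexts: transfer, fee, way
•  Sorted by context similarity: 1. way 2. transfer 3. fee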

Page 7: Noun Paraphrasing Based on a Variety of Contexts

Approach (continued)

•  Taking the words found in both “look for the ***” and “*** to the airport” (transfer, fee, way) serves to generate a proper sentence.
•  Sorting these words by context similarity (1. way 2. transfer 3. fee) serves to select a suitable paraphrase.

Page 8: Noun Paraphrasing Based on a Variety of Contexts

Proposed Method
We propose a method to paraphrase nouns in consideration of their contexts.

1.  Extract candidate words used in the same context as the input sentence.
2.  Calculate the similarity between the “original” and each candidate word, using:
   •  the number of differences of the context of the candidate word (i.e. the number of distinct context types it appears in);
   •  the number of differences of the common context between the “original” and the candidate word (i.e. the number of distinct context types they share).
3.  Select the candidate word with the maximum similarity as the “paraphrase”.

“original” → “paraphrase”


Page 10: Noun Paraphrasing Based on a Variety of Contexts

To extract candidate words

•  We want candidate words used in the same context as the target word.
•  However, words used in exactly the same context are rarely found.
↓
•  The input sentence is therefore divided, around the target word “access”, into a pre-context and a post-context.

Input: Look for the “access” to the airport.

–  Pre-context: look for the ***
   Words found in this context: restaurant, transfer, market, fee, purpose, way
–  Post-context: *** to the airport
   Words found in this context: transfer, bus, fee, transportation, way, delivery

Page 11: Noun Paraphrasing Based on a Variety of Contexts

To extract candidate words (continued)

Input: Look for the “access” to the airport.

•  Words that appear in both the pre-context “look for the ***” and the post-context “*** to the airport” (here: transfer, fee, way) may be usable in the input sentence.
•  Using these words as candidates lets us generate a proper sentence.
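The extraction step can be sketched as a set intersection. This is a minimal illustration, not the authors' implementation: the function name `extract_candidates`, the triple representation, and every context in `TOY_NGRAMS` other than “look for the” and “to the airport” are assumptions made up for the example.

```python
# Toy corpus of (pre-context, word, post-context) triples.
# Only the words and the two target contexts come from the slides;
# the remaining contexts are invented for illustration.
TOY_NGRAMS = [
    ("look for the", "restaurant", "in the city"),
    ("look for the", "transfer", "at the station"),
    ("look for the", "market", "in the city"),
    ("look for the", "fee", "on the website"),
    ("look for the", "purpose", "of the trip"),
    ("look for the", "way", "to the station"),
    ("check the", "transfer", "to the airport"),
    ("take the", "bus", "to the airport"),
    ("pay the", "fee", "to the airport"),
    ("arrange the", "transportation", "to the airport"),
    ("ask the", "way", "to the airport"),
    ("schedule the", "delivery", "to the airport"),
]


def extract_candidates(ngrams, pre_context, post_context, target):
    """Return words seen after pre_context AND before post_context."""
    # Words that can follow the pre-context.
    pre_fillers = {word for pre, word, _ in ngrams if pre == pre_context}
    # Words that can precede the post-context.
    post_fillers = {word for _, word, post in ngrams if post == post_context}
    # Keep only words compatible with both sides; drop the target itself.
    return (pre_fillers & post_fillers) - {target}


candidates = extract_candidates(
    TOY_NGRAMS, "look for the", "to the airport", "access"
)
print(sorted(candidates))  # ['fee', 'transfer', 'way']
```

Intersecting the two sets relaxes the requirement of an exact full-sentence match while still keeping only words that fit both sides of the gap.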



Page 13: Noun Paraphrasing Based on a Variety of Contexts

To calculate similarity between words

(1) The larger the number of differences (distinct types) of the common context between the “original” and the candidate word, the larger the paraphrasability.
(2) The larger the number of differences (distinct types) of the context of the candidate word, the smaller the paraphrasability.

\[
\mathrm{similarity}(\mathit{original}, \mathit{candidate}) =
\underbrace{\mathrm{common}(\mathit{original}, \mathit{candidate})}_{(1)}
\times
\underbrace{\log \frac{\mathrm{TNC}}{\mathrm{difference}(\mathit{candidate})}}_{(2)}
\]

common(A, B): the number of differences of the common context between A and B
difference(A): the number of differences of the context of A
TNC: the total number of differences of the context
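The similarity formula above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: contexts are represented as sets of predicates per noun, TNC is taken to be the number of distinct contexts over all nouns, and the toy data in `CONTEXTS_OF` is invented.

```python
import math

# Hypothetical noun -> set of distinct contexts (predicates); invented data.
CONTEXTS_OF = {
    "access": {"look for", "check", "improve", "provide"},
    "way": {"look for", "check", "ask", "find"},
    "fee": {"look for", "check", "pay", "raise", "reduce", "charge"},
    # A very general noun that occurs with every context.
    "thing": {"look for", "check", "improve", "provide", "ask",
              "find", "pay", "raise", "reduce", "charge"},
}

# Assumption: TNC is the number of distinct contexts over all nouns.
TNC = len(set().union(*CONTEXTS_OF.values()))


def similarity(original, candidate, contexts_of=CONTEXTS_OF, tnc=TNC):
    """common(original, candidate) * log(TNC / difference(candidate))."""
    common = len(contexts_of[original] & contexts_of[candidate])
    difference = len(contexts_of[candidate])
    return common * math.log(tnc / difference)


def rank_candidates(original, candidates):
    """Sort candidates by decreasing similarity to the original word."""
    return sorted(candidates, key=lambda c: similarity(original, c), reverse=True)


ranking = rank_candidates("access", ["fee", "thing", "way"])
print(ranking[0])  # way
```

In this toy data “thing” shares the most contexts with “access”, but because it occurs with every context its log factor is zero, so it is ranked last. This mirrors how the IDF term suppresses ubiquitous words in TF-IDF.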

Page 14: Noun Paraphrasing Based on a Variety of Contexts

TF-IDF

\[
\mathrm{tf}(w) \times \log \frac{\mathrm{TND}}{\mathrm{df}(w)}
\]

tf(w): the number of occurrences of the word w
df(w): the number of documents in which the word w occurs
TND: the total number of documents

Proposed similarity

\[
\mathrm{common}(\mathit{original}, \mathit{candidate}) \times \log \frac{\mathrm{TNC}}{\mathrm{difference}(\mathit{candidate})}
\]

common(A, B): the number of differences of the common context between A and B
difference(A): the number of differences of the context of A
TNC: the total number of differences of the context

New Statistics: Number of Occurrences → Number of Differences
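To make the shift from the number of occurrences to the number of differences concrete, the sketch below counts both statistics over the same noun and predicate pairs. It assumes that “number of differences” means the count of distinct context types; the pair list and function names are invented for illustration.

```python
from collections import Counter, defaultdict

# Invented (noun, predicate) pairs: "fee" is frequent but occurs with
# few distinct predicates, "way" is rare but occurs with more of them.
PAIRS = (
    [("fee", "pay")] * 50
    + [("fee", "check")]
    + [("way", "look for"), ("way", "check"), ("way", "ask")]
)


def number_of_occurrences(pairs):
    """Token count: how many times each noun appears with any context."""
    return Counter(noun for noun, _ in pairs)


def number_of_differences(pairs):
    """Type count: how many distinct contexts each noun appears with."""
    contexts = defaultdict(set)
    for noun, predicate in pairs:
        contexts[noun].add(predicate)
    return {noun: len(preds) for noun, preds in contexts.items()}


occurrences = number_of_occurrences(PAIRS)
differences = number_of_differences(PAIRS)
print(occurrences["fee"], differences["fee"])  # 51 2
print(occurrences["way"], differences["way"])  # 3 3
```

A single very frequent pair inflates the occurrence count but adds only one to the number of differences, which is one way to read the claim that the statistic does not depend on word frequency.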


Page 16: Noun Paraphrasing Based on a Variety of Contexts

Characteristics of our proposed method

•  Extraction
–  We can generate a proper sentence based on the common contexts.

•  Selection
–  We can select a suitable paraphrase based on the number of differences of the context.

We compare our method experimentally with methods based on co-occurrence frequency and on pointwise mutual information.

Page 17: Noun Paraphrasing Based on a Variety of Contexts

Comparative Methods

•  Marton et al. (2009), “Improved Statistical Machine Translation Using Monolingually-Derived Paraphrases”.
•  Bhagat and Ravichandran (2008), “Large Scale Acquisition of Paraphrases for Learning Surface Patterns”.

1.  Both methods generate a feature vector from the contexts of the target word (“original”).
2.  They calculate the cosine similarity between the feature vectors.
3.  They select the word with the maximum similarity as the “paraphrase”.
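The three steps shared by the comparative methods can be sketched generically as follows. This is not the implementation of either cited paper: the feature weighting is reduced to raw co-occurrence counts and a plain pointwise mutual information (PMI) re-weighting, and the toy pairs and function names are assumptions made for the example.

```python
import math
from collections import Counter

# Invented (word, context) co-occurrence pairs.
PAIRS = (
    [("burden", "shoulder")] * 3 + [("burden", "reduce")]
    + [("cost", "shoulder")] * 3 + [("cost", "reduce"), ("cost", "pay")]
    + [("load", "reduce")] * 2 + [("load", "carry")] * 2
)


def cooccurrence_vectors(pairs):
    """Step 1: one sparse vector of co-occurrence counts per word."""
    vectors = {}
    for word, context in pairs:
        vectors.setdefault(word, Counter())[context] += 1
    return vectors


def pmi_vectors(vectors):
    """Alternative weighting: PMI(w, c) = log(p(w, c) / (p(w) * p(c)))."""
    total = sum(sum(v.values()) for v in vectors.values())
    word_totals = {w: sum(v.values()) for w, v in vectors.items()}
    context_totals = Counter()
    for v in vectors.values():
        context_totals.update(v)
    return {
        w: {c: math.log(n * total / (word_totals[w] * context_totals[c]))
            for c, n in v.items()}
        for w, v in vectors.items()
    }


def cosine(u, v):
    """Step 2: cosine similarity between two sparse vectors (dicts)."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(target, vectors):
    """Step 3: the word whose vector is closest to the target's."""
    others = (w for w in vectors if w != target)
    return max(others, key=lambda w: cosine(vectors[target], vectors[w]))


freq = cooccurrence_vectors(PAIRS)
pmi = pmi_vectors(freq)
print(most_similar("burden", freq))  # cost
```

With raw counts, the frequent context “shoulder” dominates the vectors, so “cost” comes out closest to “burden” in this toy data regardless of the sentence being paraphrased.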


Page 18: Noun Paraphrasing Based on a Variety of Contexts

Comparative Methods

•  [Marton 09]: Co-occurrence frequency based method
•  [Bhagat 08]: Pointwise mutual information based method


Page 19: Noun Paraphrasing Based on a Variety of Contexts

Experimental setup

•  Japanese
–  In this experiment, we paraphrase Japanese nouns.
–  The approach itself is language-independent.

•  Definition of a context
–  We define as context the content words in the phrases that have a dependency relation with the noun.
–  e.g. Look for the “access” to the airport.

Page 20: Noun Paraphrasing Based on a Variety of Contexts

Experimental setup

•  Web Japanese N-gram: used to extract candidate words
–  Japanese word N-grams (N = 1 to 7). (We treat each 7-gram as a sentence.)
–  Each N-gram appears more than 20 times on the Web.
–  We use 200 sentences out of the 1.3M sentences of the following form:
   •  Noun … Noun (paraphrase target) … Verb (base form). * Japanese is an SOV language.

•  Kyoto University case frame: used to calculate similarity
–  Pairs of Japanese predicates and Japanese nouns collected from the Web.
–  It contains 34k predicates and 824k nouns. (We use all of them.)
–  We define these predicates as contexts and calculate the similarity between these nouns.

Page 21: Noun Paraphrasing Based on a Variety of Contexts

Number of paraphrasable nouns at the 1st rank of similarity

Page 22: Noun Paraphrasing Based on a Variety of Contexts

Number of paraphrasable nouns at the 1st rank of similarity

•  High-frequency words (e.g. こと (thing)) have a bad influence.
•  Suffix words (e.g. words that express the number of items) have a bad influence.

The proposed method is robust because it does not depend on word frequency.

Page 23: Noun Paraphrasing Based on a Variety of Contexts

Relationship between similarity rank and number of paraphrasable nouns

Page 24: Noun Paraphrasing Based on a Variety of Contexts

Relationship between similarity rank and number of paraphrasable nouns

•  There are few differences.
•  Many paraphrases appear at rank 1.

Page 25: Noun Paraphrasing Based on a Variety of Contexts

Examples of paraphrasing in consideration of context

•  Assign a maximum “penalty” of N$.
–  Comparative method: imprisonment, pecuniary penalty, etc.
–  Our method: paying penalty, administrative penalty, etc.
   •  “imprisonment” does not appear as a candidate.

•  Reduce the “burdens” on the back.
–  Comparative method: cost, expenses, actual cost, etc.
   •  All of which are money-related.
   •  None of the words listed in the top 10 is appropriate.
–  Our method: load, stress, damage, exhaustion, tension, etc.
   •  All of which are appropriate paraphrases in this context.

Page 26: Noun Paraphrasing Based on a Variety of Contexts

Conclusion
We propose a method to paraphrase nouns in consideration of their contexts.

The Characteristics of Our Proposed Method
–  It paraphrases robustly without relying on word frequency.
   •  Our “Number of Differences” (NoD) based method is better than the “Co-occurrence Frequency” (CoF) based method.
–  It paraphrases depending on the context.
   •  e.g. Reduce the “burdens” on the back.
   •  NoD: load, stress, damage, exhaustion, tension, etc.
   •  CoF: cost, expense, actual cost, etc. (money-related)