Paraphrase Extraction

Paraphrase Extraction based on „Paraphrase Substitution for Recognizing Textual Entailment“ by Wauter Bosma and Chris Callison-Burch prepared by Teresa Herrmann Seminar „Current Trends in Information Extraction“ PD Günter Neumann

Page 1: Paraphrase Extraction

Paraphrase Extraction

based on „Paraphrase Substitution for Recognizing Textual Entailment“
by Wauter Bosma and Chris Callison-Burch

prepared by Teresa Herrmann

Seminar „Current Trends in Information Extraction“
PD Günter Neumann

Page 2: Paraphrase Extraction

Teresa Herrmann Paraphrase Extraction 2

Paraphrase

• „alternative way of conveying the same information“

Page 3: Paraphrase Extraction

Application for Paraphrase Extraction

• Paraphrase Substitution for Recognizing Textual Entailment
  • by Wauter Bosma and Chris Callison-Burch
  • participation in CLEF 2006 Answer Validation Exercise

Page 4: Paraphrase Extraction

Textual Entailment

• Given 2 text passages
  • Text and Hypothesis
• determine whether the text entails the hypothesis (can be inferred from it)

Text: Clonaid said, Sunday, that the cloned baby, allegedly born to an American woman, and her family were going to return to the United States Monday, but where they live and further details were not released.

Hypothesis: Clonaid announced that mother and daughter would be returning to the US on Monday.

Page 5: Paraphrase Extraction

Recognizing Textual Entailment

• Deep and surface analysis of text and hypothesis
  • different words express same meaning
  • simple word overlapping techniques don't recognize semantic relations
  • detect relations between words

Text: Clonaid said, Sunday, that the cloned baby, allegedly born to an American woman, and her family were going to return to the United States Monday, but where they live and further details were not released.

Hypothesis: Clonaid announced that mother and daughter would be returning to the US on Monday.

Page 6: Paraphrase Extraction

Recognizing Textual Entailment Task

• determine textual entailment computationally
• recent research area
• linguistic-syntactic techniques
  • aligning syntactic trees
  • word overlap
• other
  • logical inference
  • background knowledge
  • paraphrasing

Page 7: Paraphrase Extraction

The approach

• paraphrasing
  • more reliable match between text and hypothesis for which entailment relation is checked
• length of Longest Common Subsequence
  • criterion for deciding about entailment between passages
• identification of LCS
  • word matching
  • automatic paraphrasing method
    • synonymous, but non-identical phrases
• relatively language independent
  • in contrast to using dependency parsing or other linguistic resources

Page 8: Paraphrase Extraction

Paraphrase Extraction

• automatic generation of paraphrases
• extraction from bilingual corpora
  • Find source phrase
  • Source phrase aligned with foreign translations
  • Find candidate paraphrases
  • Ranking of candidates

Page 9: Paraphrase Extraction

Paraphrase Extraction (2)

• language independent method
  • method multilingual
  • can be applied to any language for which parallel corpus exists
• presented system
  • CLEF 2006 AVE Task
  • Europarl corpus
    • proceedings of the European Parliament
    • 11 languages
  • languages: German, English, French, Spanish, Italian, Dutch, Portuguese

Page 10: Paraphrase Extraction

Finding Paraphrases

• Candidate paraphrases
  • identify occurrences of English source phrases
  • find corresponding translation
  • what other English phrases is it translated back to?
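
The pivoting idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `candidate_paraphrases` and the tiny English/German phrase alignments are invented for the example, and a real system would read its phrase pairs from a word-aligned parallel corpus such as Europarl.

```python
from collections import defaultdict


def candidate_paraphrases(source, aligned_pairs):
    """Return English phrases that share a foreign translation with `source`.

    `aligned_pairs` is an iterable of (english_phrase, foreign_phrase) tuples.
    """
    foreign_of = defaultdict(set)   # English phrase -> its foreign translations
    english_of = defaultdict(set)   # foreign phrase -> English phrases aligned to it
    for english, foreign in aligned_pairs:
        foreign_of[english].add(foreign)
        english_of[foreign].add(english)

    candidates = set()
    for foreign in foreign_of[source]:      # step 1: translate the source phrase
        candidates |= english_of[foreign]   # step 2: translate back into English
    candidates.discard(source)              # the phrase itself is not a paraphrase
    return candidates


# Invented toy alignments (assumption, not taken from the slides).
pairs = [
    ("under control", "unter kontrolle"),
    ("in check", "unter kontrolle"),
    ("under control", "im griff"),
    ("in hand", "im griff"),
    ("in check", "in schach"),
]
print(sorted(candidate_paraphrases("under control", pairs)))
# ['in check', 'in hand']
```

Every phrase reachable through a shared translation becomes a candidate, which is why a ranking step is needed afterwards.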

Page 11: Paraphrase Extraction

Finding Paraphrases Example

Page 12: Paraphrase Extraction

Paraphrase Probability

• often many possible paraphrases
• Ranking of candidates
  • calculate paraphrase probability
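
The formula shown on this slide is not preserved in the transcript. A minimal sketch of the ranking step, assuming the usual pivot formulation p(e2|e1) = Σ_f p(f|e1) · p(e2|f) with both translation probabilities estimated by relative frequency, could look as follows. The function name and the toy alignment counts are hypothetical.

```python
from collections import Counter, defaultdict


def paraphrase_probabilities(source, aligned_pairs):
    """Rank candidate paraphrases e2 of `source` by sum_f p(f|source) * p(e2|f).

    `aligned_pairs` is a list of (english_phrase, foreign_phrase) tuples;
    repeated tuples count as repeated alignments in the corpus.
    """
    pair_counts = Counter(aligned_pairs)
    english_totals = Counter()
    foreign_totals = Counter()
    for (english, foreign), n in pair_counts.items():
        english_totals[english] += n
        foreign_totals[foreign] += n

    scores = defaultdict(float)
    for (english, foreign), n in pair_counts.items():
        if english != source:
            continue
        p_f_given_e1 = n / english_totals[source]
        for (e2, f2), m in pair_counts.items():
            if f2 == foreign and e2 != source:
                p_e2_given_f = m / foreign_totals[foreign]
                scores[e2] += p_f_given_e1 * p_e2_given_f
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy counts (assumption, not taken from the slides).
pairs = (
    [("under control", "unter kontrolle")] * 3
    + [("under control", "im griff")]
    + [("in check", "unter kontrolle")]
    + [("in hand", "im griff")]
)
for phrase, prob in paraphrase_probabilities("under control", pairs):
    print(f"{phrase}: {prob:.4f}")
# in check: 0.1875
# in hand: 0.1250
```

The candidate reached through the more frequent translation ends up ranked first, even though each candidate was aligned only once.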

Page 13: Paraphrase Extraction

Paraphrase Probability (2)

• include multiple corpora (different languages)
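
The formula from this slide is also missing from the transcript. One plausible reading, stated here as an assumption, is that the pivot probability is averaged over a set C of parallel corpora, each pairing English with a different foreign language:

```latex
p(e_2 \mid e_1) \approx \frac{1}{|C|} \sum_{c \in C} \sum_{f \in c} p(f \mid e_1)\, p(e_2 \mid f)
```

Here e_1 is the source phrase, e_2 a candidate paraphrase, and f ranges over the foreign phrases of corpus c. The symbols C, c, e_1, e_2 and f are notation introduced for this sketch.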

Page 14: Paraphrase Extraction

Longest Common Subsequence

• measure of similarity between passages
  • approximate ratio of information shared in text and hypothesis
• Variant of Longest Common Substring
  • requires adjacency of words
• Longest Common Subsequence
  • doesn't require adjacency of words
  • relative order
  • LCS(Text, Hypothesis) is the longest possible sequence Q with words in Q also being words in T and H in the same order
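
The word-level LCS length can be computed with the standard dynamic-programming recurrence. The sketch below is a generic textbook version, not code from the presented system. Tokenization by whitespace and the example sentences are assumptions made for illustration.

```python
def lcs_length(text_words, hyp_words):
    """Length of the longest common subsequence of two word lists.

    Words must appear in the same relative order in both lists,
    but they do not have to be adjacent.
    """
    prev = [0] * (len(hyp_words) + 1)       # row for the previous text word
    for t in text_words:
        curr = [0]                          # column 0: empty hypothesis prefix
        for j, h in enumerate(hyp_words, start=1):
            if t == h:
                curr.append(prev[j - 1] + 1)            # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a word on one side
        prev = curr
    return prev[-1]


text = "the cat sat on the mat".split()
hyp = "the cat is on a mat".split()
print(lcs_length(text, hyp))  # 4 ("the cat on mat")
```

A longest common substring of the same pair would only be "the cat" (2 words), because it requires the shared words to be adjacent.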

Page 15: Paraphrase Extraction

Recognizing Entailment

• Entailment Score: LCS(T,H)/|H|
• LCS measured after paraphrasing hypothesis
  • paraphrase of H entailed by T → H entailed by T
• maximize Entailment Score
  • iteratively transform H
  • substitute in paraphrases → closer to T
  • each iteration: substitution that increases score
  • stop: no increase
• Decision if entailment
  • entailment score > threshold (0.75)
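
The scoring and the greedy substitution loop can be sketched as below. This is a simplified illustration under several assumptions: lowercase whitespace tokens without punctuation, only the first clause of the Clonaid text, and a hand-written paraphrase list that stands in for automatically extracted paraphrases. All function names are hypothetical, and the numbers are not the ones reported on the slides.

```python
def lcs_length(a, b):
    """Word-level longest common subsequence length (standard DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def entailment_score(text, hyp):
    """LCS(T, H) / |H|, as defined on the slide."""
    return lcs_length(text, hyp) / len(hyp) if hyp else 0.0


def substitute(words, phrase, replacement):
    """Yield each variant of `words` with one occurrence of `phrase` replaced."""
    n = len(phrase)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == phrase:
            yield words[:i] + replacement + words[i + n:]


def paraphrase_hypothesis(text, hyp, paraphrases):
    """Greedily apply the best score-increasing substitution until none is left."""
    current, current_score = hyp, entailment_score(text, hyp)
    while True:
        variants = [v for phrase, repl in paraphrases
                    for v in substitute(current, phrase.split(), repl.split())]
        if not variants:
            break
        best = max(variants, key=lambda v: entailment_score(text, v))
        best_score = entailment_score(text, best)
        if best_score <= current_score:     # stop: no increase
            break
        current, current_score = best, best_score
    return current, current_score


THRESHOLD = 0.75  # decision threshold from the slide

text = ("clonaid said sunday that the cloned baby allegedly born to an american "
        "woman and her family were going to return to the united states monday").split()
hyp = ("clonaid announced that mother and daughter would be returning "
       "to the us on monday").split()
# Hand-written stand-ins for extracted paraphrases (assumption).
paraphrases = [
    ("announced", "said"),
    ("would be returning", "were going to return"),
    ("the us", "the united states"),
]

print(round(entailment_score(text, hyp), 4))      # 0.4286 (6 of 14 words)
final_hyp, final_score = paraphrase_hypothesis(text, hyp, paraphrases)
print(" ".join(final_hyp))
print(final_score, final_score > THRESHOLD)       # 0.8125 True
```

With these assumed paraphrases the hypothesis moves from below the threshold to above it, so the pair would be labelled as entailment. The loop always terminates because the score strictly increases and cannot exceed 1.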

Page 16: Paraphrase Extraction

Example

• words in italics = words shared with text sentences
• Entailment Score increased from 43% up to 77%

Page 17: Paraphrase Extraction

Results

• system tested
• baselines
  • 100% YES
  • pure LCS (word matching)
• Dependency Tree Alignment System

Page 18: Paraphrase Extraction

Results (2)

• average of languages
  • 7 languages for baseline, LCS and LCS/paraphrasing
  • 3 languages for all 4 systems

Page 19: Paraphrase Extraction

System Variant - Trees

• system for Dutch, English, Spanish
• tree analysis for texts and hypotheses
• substitution of paraphrases
• computation of Longest Common Subtree
• compared to dependency tree alignment system
• performs comparably well

Page 20: Paraphrase Extraction

Conclusion

• LCS after paraphrasing outperforms pure LCS and 100% YES baseline
• applicable to wide range of languages
  • no language specific natural language analysis or background knowledge needed
  • paraphrases automatically extracted from bilingual parallel corpora
• presented system <-> syntactic-based system
  • perform comparably well
  • but little overlap
  • information conveyed rather complementary

Page 21: Paraphrase Extraction

Future work

• enhance system
  • combination of syntactic-based and paraphrase-substitution approaches
  • better threshold determination methods

Page 22: Paraphrase Extraction

References

• Wauter Bosma and Chris Callison-Burch (2006): Paraphrase Substitution for Recognizing Textual Entailment. [http://wwwhome.cs.utwente.nl/%7Ebosmaw/files/clef06.pdf]

• Roy Bar-Haim, Ido Dagan, Bill Dolan, Lisa Ferro, Danilo Giampiccolo, Bernardo Magnini and Idan Szpektor (2006): The Second PASCAL Recognising Textual Entailment Challenge. In Proceedings of the Second PASCAL Challenges Workshop on Recognising Textual Entailment. [http://ir-srv.cs.biu.ac.il:64080/RTE2/proceedings/01.pdf]

Page 23: Paraphrase Extraction

Thank you for your attention

Questions, Comments, Discussion, ...?