Parallel Bilingual Paraphrase...Parallel Paraphrase Rules zCoverage of rules for lexical pairs:...


Parallel Bilingual Paraphrase Rules for Noun Compounds:

Concepts and Rules for Exploring Web Language Resources

Mar 25 2004

Kageura, K.*, Yoshikane, F.$ and Nozawa, T.$

*National Institute of Informatics, Tokyo.

$National Institution for Academic Degrees and University Evaluation, Tokyo.


Outline of Presentation

Purpose
What is the paraphrase of terms?
What is the parallel paraphrase?
Parallel paraphrase rules for terms (or noun compounds)
Web-based paraphrase explorations
Conclusions and Outlook


Purpose

With the advent of neo-cons, the world has become more dangerous than ever.
→ 「ネオコンの出現により」 (nominal: “due to the advent of neo-cons”) or 「ネオコンが出現したことで」 (clausal: “because neo-cons have appeared”); the latter sounds “softer”.

The advent of neo-cons made the world…
→ 「ネオコンの出現は」 (“the advent of neo-cons”, as topic) is good (given that the overall sentence construction keeps the correspondence between E and J).


Purpose

The best forms among possible variations at various levels of construction in translation depend on many factors.
Expert translators know them, though neither explicitly nor in articulated form.

Is it possible to describe this knowledge?
Partly yes, if we start by describing limited local variation rules and patterns, in loose form, for terms/noun compounds.


Purpose

Parallel variation rules will necessarily be loose and over-generative; they cannot be used for making the best choice automatically.
Rather, they will be used for showing the range of possible variations through examples.
There are no suitable corpora for this aim.

Web? Yes, though it is usually the first and most convenient resort for a cheap, unarticulated and mediocre escape from difficulties….


Purpose

On the other hand, multilingual utilisation of the Web has not been fully explored, due to the non-parallel, non-comparable nature of the Web’s mixed-language world.
Using the Web to extract multilingual paraphrase samples by means of loose parallel paraphrase rules for terms/noun compounds is a good combination, in which each complements the other’s limitations.


Paraphrase/Variations of Terms?

Variations: different forms with the same meaning

Morphological and morpho-syntactic variations of terms in context have been studied intensively since the 1990s.
Systems for detecting term variations in monolingual texts have been developed.

These systems use paraphrase/variation rules at the level of morpho-syntactic patterns, together with source terms whose paraphrased forms are to be detected.


Parallel Paraphrases/Variations

Essentially, the variations of focal expressions within the multiple (all potential) sets of texts produced by translation and back-translation constitute parallel variations.
Nature of parallel variations:

Constructions with a finite number of variations are expected to be limited to lexical items/phrases.
All potential sets of translated and back-translated texts are in no way available.


Parallel Paraphrases/Variations

Given these limitations, what should we do?
There are units which are regarded as (roughly) corresponding across languages, i.e. words and compounds (and some phrases).
There are linguistic concepts which are regarded as (roughly) corresponding across languages, i.e. part-of-speech, head-modifier, argument….


Parallel Paraphrases/Variations

Using these, “local” quasi-parallelism among variation patterns can be defined:

Start from corresponding POS-patterns of noun compounds as anchoring points;
Classify monolingual paraphrase/variation rules for Japanese and English noun compounds;
Align Japanese and English paraphrase rules by anchoring points and rule patterns, on the basis of correspondences between POS, head-modifier, argument, etc.


Parallel Paraphrases/Variations

An example: term verbalisation rule

Japanese rule: X NS → XをNSする
  e.g. 概念学習 → 概念を学習する
English rule: N1 N2 → V2 {ART?} N1
  e.g. concept learning → learn concepts

NS ⇔ N2/V2, X ⇔ N1, verbalise ⇔ verbalise, etc.
⇒ so the above J & E rules are parallel.
“learn concepts” corresponds to 概念を学習する as variations of “concept learning” = 概念学習.
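The pair of verbalisation rules can be sketched in a few lines of Python. This is only an illustration, not the Fastr implementation: terms are assumed to be pre-segmented into two tokens, the small VERB_ROOT table is a hypothetical stand-in for a morphological lexicon, and the output stays at lemma level, so inflection such as the plural in “learn concepts” is ignored.

```python
# Hypothetical toy lexicon: deverbal noun -> verb sharing the same root.
VERB_ROOT = {"learning": "learn", "classification": "classify"}


def verbalise_en(term, article=None):
    """English rule N1 N2 -> V2 {ART?} N1, at lemma level."""
    n1, n2 = term
    verb = VERB_ROOT[n2]  # KeyError if N2 has no known verbal counterpart
    parts = [verb] + ([article] if article else []) + [n1]
    return " ".join(parts)


def verbalise_ja(term):
    """Japanese rule X NS -> X を NS する."""
    x, ns = term
    return f"{x}を{ns}する"


def parallel_verbalise(en_term, ja_term):
    """Apply both rules to an aligned term pair and return the paired variants."""
    return verbalise_en(en_term), verbalise_ja(ja_term)


print(parallel_verbalise(("concept", "learning"), ("概念", "学習")))
```

The two outputs are treated as parallel only because the input pair is already known to correspond; the rules themselves say nothing about whether an arbitrary lexical pair can be instantiated on both sides.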


Parallel Paraphrases/Variations

This only gives “local” and “internal” rule correspondences.
The rules can be used only when the anchoring points are instantiated by actual lexical items, but parallel instantiation is not guaranteed for individual lexical items/compounds.
They can still be used for:

Exploring the full notion of “parallel paraphrases”;
Looking up relevant corresponding variations.


Parallel Paraphrase Rules

We used the English and Japanese term variation rules developed by Jacquemin (1999) and Yoshikane (2003).
The parallel rules are assumed to be used for variation detection with Fastr (Jacquemin 1999).
POS correspondences were made on the basis of TreeTagger (E) and ChaSen (J).
Some lexical correspondences were defined, e.g. “of” and の.


Parallel Paraphrase Rules

Three major paraphrase types:

Major category shift: paraphrases that change the grammatical category of the original compound, e.g. “concept classification” → “classify concepts”;
Head swap: paraphrases that change/swap the head element, e.g. “memory sharing” → “shared memory”;
Internal variants: paraphrases that retain the head and overall category, e.g. “concept classification” → “classification of concepts”.


Parallel Paraphrase Rules

Major category shift (2 subtypes):

Argument-Verb: e.g. “system implementation” to “to implement (a) system”.
  J: X1 NS1 → X1 を NS1 VS
  E: N1 N2 → V2 ART? N1 (root(N2)=root(V2))
Modification-Verb: e.g. “ambiguous classification” to “to classify ambiguously”.
  J: NA1 NS2 → NA1 S4 NS2 VS
  E: A1 N2 → ADV1 V2 (root(A1)=root(ADV1))


Parallel Paraphrase Rules

Head swap: e.g. “added material” to “material addition”.
  J: NS1 NX2 → NX2 の NS1
  E: V1 N2 → N2 V1


Parallel Paraphrase Rules

Internal variations (3 major subtypes):

Functional operations;
Content-word operations;
Morphological operations.

Functional operations: e.g. “job amount” to “amount of jobs”.
  J: NX1 NX2 → NX1 の NX2.
  E: N1 N2 → N2 “of” N1.
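A functional operation only reorders the constituents and inserts a function word, so it can be written as a template over numbered slots. The sketch below is illustrative: apply_template and the template encoding are hypothetical, the Japanese input 仕事量 (segmented as 仕事 + 量) is an assumed counterpart of “job amount” chosen for the example, and output is at lemma level.

```python
def apply_template(tokens, template, sep=" "):
    """Build a variant from a template of 1-based slot numbers and literal strings."""
    out = [tokens[item - 1] if isinstance(item, int) else item for item in template]
    return sep.join(out)


# E: N1 N2 -> N2 "of" N1
EN_FUNCTIONAL = [2, "of", 1]
# J: NX1 NX2 -> NX1 の NX2
JA_FUNCTIONAL = [1, "の", 2]

print(apply_template(("job", "amount"), EN_FUNCTIONAL))
print(apply_template(("仕事", "量"), JA_FUNCTIONAL, sep=""))
```

Note that the English rule swaps the two nouns while the Japanese rule keeps their order; the lexical correspondence between “of” and の is what links the two templates as parallel.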


Parallel Paraphrase Rules

Content-word operations:

Modifications: e.g. “big cat” to “big noisy cat”.
  J: NX1 NX2 → NX1 {NX TPX?}+ NX2.
  E: N1 N2 → N1 {A|N|V}+ N2.
Coordinations: e.g. “word class” to “word and concept class”.
  J: NX1 NX2 → NX1 C NX S NX2.
  E: N1 N2 → N1 C N N2.
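When such rules are used for detection rather than generation, a candidate string is matched against the rule pattern. The following sketch checks the two English patterns against a POS-tagged candidate. It assumes a simplified one-letter tagset (A, N, V, C) rather than real TreeTagger output, and classify_variant is a hypothetical helper, not part of Fastr.

```python
import re


def classify_variant(term, candidate):
    """Return 'modification', 'coordination' or None for a tagged candidate.

    term:      (N1, N2) lemmas of the source noun compound
    candidate: list of (lemma, tag) pairs, tags drawn from A, N, V, C
    """
    n1, n2 = term
    if len(candidate) < 3:
        return None  # nothing was inserted between N1 and N2
    words = [word for word, _ in candidate]
    if words[0] != n1 or words[-1] != n2:
        return None  # the anchoring nouns must be kept in place
    inner = "".join(tag for _, tag in candidate[1:-1])
    if re.fullmatch(r"[ANV]+", inner):  # E: N1 N2 -> N1 {A|N|V}+ N2
        return "modification"
    if inner == "CN":                   # E: N1 N2 -> N1 C N N2
        return "coordination"
    return None


candidate = [("word", "N"), ("and", "C"), ("concept", "N"), ("class", "N")]
print(classify_variant(("word", "class"), candidate))
```

The Japanese patterns could be checked in the same way over ChaSen tags; because the inner pattern is open-ended, rules of this kind accept many strings that are not true paraphrases, which is the over-generation noted earlier.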


Parallel Paraphrase Rules

Morphological operations:

N to N: “word class” to “word classification”;
N to V: “index grammar” to “indexed grammar”;
V to N: “indexed grammar” to “index grammar”;
N to A: “category grammar” to “categorial grammar”;
A to N: “categorial grammar” to “category grammar”;
A to A: “syntactic information” to “syntactical information”.


Parallel Paraphrase Rules

Coverage of rules for lexical pairs:

Actual corresponding Japanese and English lexical items do not always take the POS-patterns provided by the parallel paraphrase rules.
So it is useful to observe, given a set of actual bilingual vocabulary, to what extent the parallel paraphrase rules can be invoked.
We checked this on a different-word basis, using the 19,532 entries of a bilingual terminological list.


Parallel Paraphrase Rules

Rule type                     Entries   Coverage
Major Category Shift             6312      32.3%
  Argument-Verb                  1316       6.7%
  Modification-Verb              6279      32.1%
Head Swap                        6141      31.4%
Internal Variants               12405      63.5%
  Functional operations         10792      55.3%
  Content-word operations       10433      53.4%
    Modifications               10361      53.0%
    Coordinations               10433      53.4%
  Morphological Operations       8870      45.4%


Parallel Paraphrase Rules

Coverage of rules for lexical pairs:

7,090 (36.3%) of the 19,532 complex terms listed in the bilingual terminology do not have any parallel paraphrase rules that can be applied to them.
The number of term pairs to which neither Japanese nor English rules can be applied is 558 (2.9%).
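The coverage percentages follow directly from the entry counts and the list size of 19,532. The short check below recomputes them; the counts are copied from the slides, while the dictionary layout and the coverage helper are only illustrative.

```python
TOTAL = 19_532  # entries in the bilingual terminological list

COUNTS = {
    "Major Category Shift": 6312,
    "Argument-Verb": 1316,
    "Modification-Verb": 6279,
    "Head Swap": 6141,
    "Internal Variants": 12405,
    "Functional operations": 10792,
    "Content-word operations": 10433,
    "Modifications": 10361,
    "Coordinations": 10433,
    "Morphological Operations": 8870,
    "No parallel rule applicable": 7090,
    "Neither J nor E rule applicable": 558,
}


def coverage(count, total=TOTAL):
    """Share of the term list, as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


for name, count in COUNTS.items():
    print(f"{name:<34}{count:>6}{coverage(count):>7.1f}%")
```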


Web-based Paraphrase Explorations

All in all, roughly 100 Japanese rules and 70 English rules have been established and linked as parallel rules.
They are currently under review and re-examination, on the basis of analytical and empirical observations.
Some of these rules have been implemented on the Web, to explore Web spaces.


Web-based Paraphrase Explorations

[Figure: system flow. Japanese or English complex terms → dictionary lookup → Japanese & English complex terms → Web search by constituent elements → Japanese pages / English pages → Fastr (parallel rules for variants) → output: corresponding forms of variants.]
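The flow of the system can be sketched as a single function. This is a hedged outline only: the three callables (lookup, search, detect) are hypothetical placeholders for the dictionary, the Web search and the Fastr-based detection step, and pairing variants by a shared rule name is an assumption about how parallel rules are linked.

```python
def explore_variants(term, lookup, search, detect):
    """Dictionary lookup -> Web search per language -> variant detection.

    lookup(term)              -> (ja_term, en_term)
    search(term, lang)        -> iterable of page texts
    detect(term, pages, lang) -> dict mapping rule name to variants found
    All three callables are hypothetical placeholders.
    """
    ja_term, en_term = lookup(term)
    ja_found = detect(ja_term, search(ja_term, "ja"), "ja")
    en_found = detect(en_term, search(en_term, "en"), "en")
    # Keep only rules that fired on both sides, i.e. parallel variants.
    shared = sorted(set(ja_found) & set(en_found))
    return {rule: (ja_found[rule], en_found[rule]) for rule in shared}
```

Passing the components in as arguments keeps the sketch independent of any particular search engine or tagger, and makes it easy to try with stub functions.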


Web-based Paraphrase Explorations

The experimental system for Web-based paraphrase detection is currently run at:
http://svrrd2.niad.ac.jp/faculty/nozawa/VSearch/index.html


Summary

Introduced the concept of parallel paraphrases within the translation context.
Defined local monolingual paraphrase rules for terms and parallelised them in terms of correspondences.
Implemented the rules using Fastr and ran them experimentally on the Web.


Outlook

Multilingual expansion, especially to French, Spanish and Asian languages.
Focusing on some paraphrase patterns and observing contextual factors to understand parallel paraphrases, e.g.:

Due to the irresponsibility of people → 「国民の無責任さにより」「国民が無責任なために」

Activities of election observation → 「選挙監視活動」「選挙の監視活動」「選挙監視の活動」