Machine Translation – Classical and Statistical Approaches
Session 4: Interlingua-based MT
Jonas Kuhn
Universität des Saarlandes, Saarbrücken
The University of Texas at Austin
jonask@coli.uni-sb.de
DGfS/CL Fall School 2005, Ruhr-Universität Bochum, September 19-30, 2005
Jonas Kuhn: MT 2
Session 4: Interlingua-based MT
Dorr (1992, 1994): UNITRAN system
Classification of divergences
Lexical Conceptual Structure
Translation mappings between syntactic structure and LCS representations
Language-specific exceptions to translation mappings
UNITRAN
Translation between Spanish, English and German (bidirectionally)
Translation divergences
(1) Thematic divergence:
    E: I like Mary    S: Maria me gusta a mi 'Mary pleases me'
(2) Promotional divergence:
    E: John usually goes home    S: Juan suele ir a casa 'John tends to go home'
(3) Demotional divergence:
    E: I like eating    G: Ich esse gern 'I eat likingly'
(4) Structural divergence:
    E: John entered the house    S: Juan entró en la casa 'John entered in the house'
Translation divergences
(5) Conflational divergence:
    E: I stabbed John    S: Yo le di puñaladas a Juan 'I gave knife-wounds to John'
(6) Categorial divergence:
    E: I am hungry    G: Ich habe Hunger 'I have hunger'
(7) Lexical divergence:
    E: John broke into the room    S: Juan forzó la entrada al cuarto 'John forced (the) entry to the room'
Lexical Conceptual Structure
Following Jackendoff (1983, 1990)
Example:
English: Bill went into the house
LCS: GO(BILL, TO(IN(HOUSE)))
Spanish: Bill entró a la casa.
LCS – Definitions
Definition 1 (Dorr 1994)
A lexical conceptual structure (LCS) is a modified version of the representation proposed by Jackendoff (1983, 1990) that conforms to the following structural form:
This corresponds to the tree-like representation shown in Figure 2, in which (1) X' is the logical head; (2) W' is the logical subject; (3) Z'1 ... Z'n are the logical arguments; and (4) Q'1 ... Q'n are the logical modifiers.
[Figure 2: tree-like representation of the LCS structural form]
In addition, T(φ) is the logical type (Event, State, Path, Position, etc.) corresponding to the primitive φ (CAUSE, LET, GO, STAY, BE, etc.).
Primitives are further categorized into fields (e.g., Possessional, Identificational, Temporal, Locational, etc.).
LCS – Definitions
Example 1: John went happily to school
[Event GO_Loc ([Thing JOHN], [Path TO_Loc ([Position AT_Loc ([Thing JOHN], [Location SCHOOL])])], [Manner HAPPILY])]
Logical Head: GO_Loc
Logical Subject: [Thing JOHN]
Logical Argument: [Path TO_Loc ...]
Logical Modifier: [Manner HAPPILY]
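The four logical positions of Definition 1 can be sketched as a small data structure. This is a minimal illustration, not UNITRAN's implementation: the class name `LCS`, its field names, and the `render` method are hypothetical, and subscripts are written with an underscore (GO_Loc).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LCS:
    """Hypothetical container for one LCS node (names are illustrative)."""
    ltype: str                                            # logical type T(φ)
    head: str                                             # logical head X'
    subject: Optional["LCS"] = None                       # logical subject W'
    arguments: List["LCS"] = field(default_factory=list)  # logical arguments Z'
    modifiers: List["LCS"] = field(default_factory=list)  # logical modifiers Q'

    def render(self) -> str:
        """Return the bracketed notation: subject, then arguments, then modifiers."""
        children = [self.subject] if self.subject is not None else []
        children += self.arguments + self.modifiers
        if not children:
            return f"[{self.ltype} {self.head}]"
        inner = ", ".join(child.render() for child in children)
        return f"[{self.ltype} {self.head} ({inner})]"


# Example 1: John went happily to school
john = LCS("Thing", "JOHN")
at = LCS("Position", "AT_Loc", subject=john, arguments=[LCS("Location", "SCHOOL")])
to = LCS("Path", "TO_Loc", arguments=[at])
went = LCS("Event", "GO_Loc", subject=john, arguments=[to],
           modifiers=[LCS("Manner", "HAPPILY")])
print(went.render())
```

Keeping subject, arguments, and modifiers as separate fields (rather than one child list) makes the later linking step to syntactic positions a field-by-field correspondence.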
LCS – Definitions
Types and primitives:
LCS – Definitions
Primitives must adhere to constraints on argument structure
Spatial dimension
Causal dimension
LCS – Definitions
Field dimension (specialization of a primitive stating under which domain it is interpreted, e.g., GO_Loc vs. GO_Temp)
Footnote 14: Technically the second argument for each of these fields is a Path or a Position. For the purposes of the current description the column under “Argument 2” refers to the lowest leaf node embedded inside of the second argument.
LCS – Definitions
LCS representation in the lexicon and as the interlingua representation
Definition 2 (Dorr 1994)
A RLCS (i.e., a root LCS) is an uninstantiated LCS that is associated with a word definition in the lexicon (i.e., a LCS with unfilled variable positions).
Definition 3 (Dorr 1994)
A CLCS (i.e., a composed LCS) is an instantiated LCS that is the result of combining two or more RLCSs by means of unification (roughly). This is the interlingua, or language-independent, form that serves as the pivot between the source and target languages.
LCS – Definitions
Examples of RLCSs and CLCSs:
RLCS associated with the word go:
[Event GO_Loc ([Thing X], [Path TO_Loc ([Position AT_Loc ([Thing X], [Location Z])])])]
CLCS: composition of RLCSs for go, John, school, and happily leads to the LCS seen previously (using a concept of “unification”)
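A rough sketch of how an RLCS with open variables could be instantiated into a CLCS. It replaces the system's "unification" with plain type-checked variable substitution, which is a deliberate simplification; the tuple encoding and the function name `compose` are assumptions made for illustration.

```python
# An LCS node is encoded as a tuple: (logical_type, head, *children).
# A variable slot is a childless node whose head is a variable name.
GO_RLCS = ("Event", "GO_Loc",
           ("Thing", "X"),
           ("Path", "TO_Loc",
            ("Position", "AT_Loc", ("Thing", "X"), ("Location", "Z"))))


def compose(rlcs, bindings, modifiers=()):
    """Fill variable slots from `bindings` and attach modifiers at the top level."""
    def fill(node):
        ltype, head, *children = node
        if not children and head in bindings:
            filler = bindings[head]
            if filler[0] != ltype:  # the filler must have the slot's logical type
                raise ValueError(f"{head}: expected {ltype}, got {filler[0]}")
            return filler
        return (ltype, head, *[fill(child) for child in children])

    return fill(rlcs) + tuple(modifiers)


# go + John + school + happily
clcs = compose(
    GO_RLCS,
    {"X": ("Thing", "JOHN"), "Z": ("Location", "SCHOOL")},
    modifiers=[("Manner", "HAPPILY")],
)
print(clcs)
```

Both occurrences of X receive the same filler, which is the part of the behaviour that resembles unification; the parameter-sensitive relaxations mentioned for the real system are not modelled here.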
Composition of LCSs
Notion of “Unification” differs from standard unification
Not directly invertible
More “relaxed” notion (for words associated with special parameters like :INT, :EXT, :PROMOTE etc.)
Composition of LCSs
Composition based on syntactic parse (following the GB framework, i.e., Government-and-Binding theory)
Definition 4 (Dorr 1994)
A syntactic phrase is a maximal projection that conforms to the following structural form:
[Figure: structural form of a syntactic phrase, with positions labeled Syntactic Head, External Argument, Internal Arguments, and Syntactic Adjuncts]
Composition of LCSs
Example: John went happily to school
Syntactic Head: went
External Argument: John
Internal Argument: to school
Syntactic Adjunct: happily
The translation mappings
Generalized linking routine (GLR)
Canonical syntactic realization (CSR)
The translation mappings
Generalized linking routine (GLR)
Simplified schema:
X: Syntactic Head
W: External Argument
Z: Internal Argument
Q: Syntactic Adjunct
X’: Logical Head
W’: Logical Subject
Z’: Logical Argument
Q’: Logical Modifier
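The simplified schema pairs each syntactic position with one logical position. A minimal sketch of that default correspondence, usable in both directions (analysis and generation); the dictionary keys and the function names `analyze` and `generate` are hypothetical labels, not names from UNITRAN.

```python
# Default GLR correspondence: syntactic position -> logical position.
GLR = {
    "syntactic_head": "logical_head",          # X <-> X'
    "external_argument": "logical_subject",    # W <-> W'
    "internal_argument": "logical_argument",   # Z <-> Z'
    "syntactic_adjunct": "logical_modifier",   # Q <-> Q'
}
GLR_INVERSE = {logical: syntactic for syntactic, logical in GLR.items()}


def analyze(phrase):
    """Source side: relabel the syntactic positions as logical positions."""
    return {GLR[position]: filler for position, filler in phrase.items()}


def generate(roles):
    """Target side: relabel the logical positions as syntactic positions."""
    return {GLR_INVERSE[position]: filler for position, filler in roles.items()}


phrase = {
    "syntactic_head": "went",
    "external_argument": "John",
    "internal_argument": "to school",
    "syntactic_adjunct": "happily",
}
roles = analyze(phrase)
print(roles)
```

With no language-specific overrides, `generate(analyze(phrase))` returns the original positions unchanged.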
The translation mappings
Generalized linking routine (GLR)
Example
The translation mappings
Canonical syntactic realization (CSR)
The Divergence Problem
There can be (language-specific) exceptions to the GLR and/or the CSR
Translation divergences occur when such exceptions occur in one language, but not in the other
Formal classification of lexical-semantic divergences
Addressing the Divergence Problem
Parameters for encoding language-specific information
GLR, CSR: language independent
Parameters: language-specific information about lexical items
Seven parameters: :INT, :EXT, :PROMOTE, :DEMOTE, *, :CAT, :CONFLATED
Thematic Divergence
E: I like Mary    S: Maria me gusta a mi 'Mary pleases me'
Arises only where there is a logical subject
Thematic Divergence
Encoded with the :INT and :EXT parameters
Thematic Divergence
Translation mapping for English relies on GLR defaults
Parameter markings
Parameter markers such as :INT and :EXT show up only in the RLCS (for lexicon entries)
The CLCS does not include such markers; it is a language-independent representation
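A minimal sketch of how :INT and :EXT could override the default linking of logical subject and logical argument. The CLCS is reduced to two role slots, and the marker dictionaries for like and gustar are simplified stand-ins for real lexicon entries; the function name `realize_arguments` is hypothetical.

```python
# Default GLR: logical subject -> external argument, logical argument -> internal.
DEFAULT_SLOT = {
    "logical_subject": "external_argument",
    "logical_argument": "internal_argument",
}
# :INT forces internal realization, :EXT forces external realization.
OVERRIDE_SLOT = {":INT": "internal_argument", ":EXT": "external_argument"}


def realize_arguments(clcs, markers):
    """Assign each logical position to a syntactic slot, honouring lexical markers."""
    slots = {}
    for position, default in DEFAULT_SLOT.items():
        slot = OVERRIDE_SLOT.get(markers.get(position), default)
        slots[slot] = clcs[position]
    return slots


# One language-independent CLCS (simplified), shared by both languages.
clcs = {"logical_subject": "I", "logical_argument": "MARY"}

LIKE_MARKERS = {}                                      # English: GLR defaults
GUSTAR_MARKERS = {"logical_subject": ":INT",           # Spanish: both positions
                  "logical_argument": ":EXT"}          # are overridden

english = realize_arguments(clcs, LIKE_MARKERS)
spanish = realize_arguments(clcs, GUSTAR_MARKERS)
print(english)
print(spanish)
```

The markers live only in the per-language lexicon dictionaries; the `clcs` value itself carries none, mirroring the RLCS/CLCS distinction.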
Promotional Divergence
E: John usually goes home    S: Juan suele ir a casa 'John tends to go home'
[Figure labels: Logical Modifier, Logical Head; Logical Argument, Logical Head]
Demotional Divergence
E: I like eating    G: Ich esse gern 'I eat likingly'
Demotional Divergence
:DEMOTE parameter: logical head and logical argument swap places
Divergence Types
The difference between promotional and demotional divergences
In promotional divergences (e.g., soler-usually), the verb (soler) triggers the head switching, no matter what event is substituted as its argument
In demotional divergences (e.g., like-gern), the adverbial satellite (gern) is the trigger
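The two kinds of head switching can be sketched as overrides of the default linking. The position assignments below follow the description in Dorr (1994) and are assumptions relative to this slide: with :PROMOTE the logical modifier is realized as syntactic head and the logical head as internal argument; with :DEMOTE the logical head is realized as syntactic adjunct and the logical argument as syntactic head. Function and key names are hypothetical, and words stand in for LCS constituents.

```python
DEFAULT_LINKING = {
    "logical_head": "syntactic_head",
    "logical_argument": "internal_argument",
    "logical_modifier": "syntactic_adjunct",
}


def linking(parameter=None):
    """Return the logical -> syntactic position map for a given lexical parameter."""
    mapping = dict(DEFAULT_LINKING)
    if parameter == ":PROMOTE":        # triggered by the verb, e.g. soler
        mapping["logical_modifier"] = "syntactic_head"
        mapping["logical_head"] = "internal_argument"
    elif parameter == ":DEMOTE":       # triggered by the adverbial, e.g. gern
        mapping["logical_head"] = "syntactic_adjunct"
        mapping["logical_argument"] = "syntactic_head"
    return mapping


def realize(roles, parameter=None):
    """Place each filled logical position into its syntactic position."""
    mapping = linking(parameter)
    return {mapping[position]: word for position, word in roles.items()}


english = realize({"logical_head": "go", "logical_modifier": "usually"})
spanish = realize({"logical_head": "ir", "logical_modifier": "soler"}, ":PROMOTE")
german = realize({"logical_head": "gern", "logical_argument": "essen"}, ":DEMOTE")
print(english, spanish, german)
```

In both cases the parameter sits on a single lexical entry (soler or gern), so the switch applies regardless of which event fills the other position.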
Structural Divergence
E: John entered the house    S: Juan entró en la casa 'John entered in the house'
In structural divergence it is not the positions in the GLR mapping that are altered, but the nature of the relation between the different positions
Conflational Divergence
E: I stabbed John    S: Yo le di puñaladas a Juan 'I gave knife-wounds to John'
Logical Argument; suppressed in English
Conflational Divergence
Not realized syntactically
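A small sketch of the effect of conflation: a logical argument marked :CONFLATED in a word's lexicon entry is left out of the syntactic realization because the word already incorporates it. The constant name KNIFE-WOUND, the marker dictionaries, and the function name are illustrative assumptions.

```python
def realized_arguments(arguments, markers):
    """Keep only the logical arguments that must surface syntactically."""
    return [arg for arg in arguments if markers.get(arg) != ":CONFLATED"]


# Logical arguments of the shared CLCS (simplified to two constants).
ARGUMENTS = ["KNIFE-WOUND", "JOHN"]

STAB_MARKERS = {"KNIFE-WOUND": ":CONFLATED"}   # English: "stab" absorbs the wound
DAR_MARKERS = {}                               # Spanish: "dar" realizes it (puñaladas)

english = realized_arguments(ARGUMENTS, STAB_MARKERS)
spanish = realized_arguments(ARGUMENTS, DAR_MARKERS)
print(english)
print(spanish)
```

English surfaces only John as an argument of stab, while Spanish surfaces both puñaladas and Juan as arguments of dar.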
Divergence Types
(1) Thematic divergence
(2) Promotional divergence
(3) Demotional divergence
(4) Structural divergence
(5) Conflational divergence
(6) Categorial divergence
(7) Lexical divergence
Default operation of GLR is changed
Default operation of CSR is changed
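The classification can be summarized as a lookup table. The pairings of :INT/:EXT with thematic and :DEMOTE with demotional divergence come from the slides; the remaining pairings, and the grouping of divergences (1) to (5) under the GLR and (6) under the CSR, are assumptions based on Dorr (1994) rather than on the slide text. The table layout and function names are hypothetical.

```python
# (divergence type, override parameters, mapping whose default is changed)
DIVERGENCES = [
    ("thematic",     (":INT", ":EXT"), "GLR"),
    ("promotional",  (":PROMOTE",),    "GLR"),
    ("demotional",   (":DEMOTE",),     "GLR"),
    ("structural",   ("*",),           "GLR"),   # assumed pairing
    ("conflational", (":CONFLATED",),  "GLR"),   # assumed pairing
    ("categorial",   (":CAT",),        "CSR"),   # assumed pairing
    ("lexical",      (),               None),    # side effect, no marker of its own
]


def overriding(mapping):
    """List the divergence types that change the default of the given mapping."""
    return [name for name, _, changed in DIVERGENCES if changed == mapping]


def all_parameters():
    """Collect every override parameter in table order."""
    return [param for _, params, _ in DIVERGENCES for param in params]


print(overriding("GLR"))
print(overriding("CSR"))
print(all_parameters())
```

Seven divergence types and seven parameters do not line up one to one: thematic divergence uses two parameters and lexical divergence uses none.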
Categorial Divergence
E: I am hungry    G: Ich habe Hunger 'I have hunger'
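A sketch of how a :CAT marker could override the canonical syntactic realization. The partial CSR table below (type to default category) is an assumption recalled from Dorr (1994), since the table itself is not reproduced in the text here; the function name and the way the override is passed in are hypothetical.

```python
from typing import Optional

# Partial CSR table, ASSUMED for illustration: LCS type -> default category.
CSR = {
    "Event": "V",
    "State": "V",
    "Thing": "N",
    "Property": "Adj",
    "Path": "P",
    "Position": "P",
}


def syntactic_category(lcs_type: str, cat_override: Optional[str] = None) -> str:
    """Return the :CAT override if the lexicon supplies one, else the CSR default."""
    return cat_override if cat_override is not None else CSR[lcs_type]


# The Property HUNGRY in "I am hungry" / "Ich habe Hunger"
english = syntactic_category("Property")        # default: adjective "hungry"
german = syntactic_category("Property", "N")    # :CAT(N): noun "Hunger"
print(english, german)
```

Only the German side carries a marker, so the same Property constituent surfaces as an adjective in English and as a noun in German.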
Lexical Divergence
Arises only in the context of other divergence types
Choice of lexical items in any language relies on the realization and composition properties of those items
Since the various other divergences alter these properties, lexical divergence is viewed as a side effect of other divergences
No specific override markers used
Lexical Divergence
E: John broke into the room    S: Juan forzó la entrada al cuarto 'John forced (the) entry to the room'
Conflational divergence forces the occurrence of a lexical divergence
Lexical Divergence
“break into” subsumes two concepts