An Intuitive Representation of Human Languages for Translation
Gábor Prószéky
MorphoLogic & Faculty of Information Technology, Pázmány University
Kalmár Workshop
Szeged, October 1-2, 2003
Contents
Some words on Prof. Kalmár’s activity in computational linguistics
Problems of human language description with formal tools
A new representation with patterns
Introduction to machine translation methods
Application of patterns to translation
Kalmár Workshop 2003
Gábor Prószéky: An Intuitive Representation of Human Languages for Translation
Kalmár & languages
Kalmár’s paper in formal language theory: „An Intuitive Representation of Context-Free Languages”
Kalmár’s activity in machine translation (conference in 1962): „Representation of Languages with the Help of Mathematical Structures”
Linguistic representation problems of the 60’s
Dependency structure
Constituent structure
X-bar theory: X’ → (P) X (Q)
Related structures
Using transformations
Structured symbols
Linguistic categories: atomic symbols
Not enough: subcategorization
Semantic features: ± alive, ...
Syntactic features: ± countable, ...
Rule sets instead of rules (ID/LP)
Feature structures
DAGs
Unification problems
Feature geometry, typed features
LFG, GPSG, HPSG
Parsing: CF-skeleton + features or feature structures only?
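As an illustration of what unification over feature structures involves, here is a minimal sketch. It assumes feature structures are plain nested dictionaries (trees rather than full DAGs, so no shared substructures), and the name `unify` and the example features are hypothetical, not taken from the talk.

```python
from typing import Any, Dict, Optional

FeatureStructure = Dict[str, Any]


def unify(a: FeatureStructure, b: FeatureStructure) -> Optional[FeatureStructure]:
    """Return the unification of a and b, or None if their features clash."""
    result = dict(a)  # shallow copy; the inputs are not modified
    for key, b_val in b.items():
        if key not in result:
            # Feature only present in b: simply add it.
            result[key] = b_val
            continue
        a_val = result[key]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            # Both values are complex: unify them recursively.
            sub = unify(a_val, b_val)
            if sub is None:
                return None
            result[key] = sub
        elif a_val != b_val:
            # Atomic values (or atomic vs. complex) that differ: failure.
            return None
    return result


noun = {"POS": "n", "AGR": {"NUM": "sg"}}
agreement = {"AGR": {"NUM": "sg", "PER": 3}}
plural = {"AGR": {"NUM": "pl"}}

print(unify(noun, agreement))
print(unify(noun, plural))
```

The first call merges the two descriptions; the second fails because NUM is sg in one structure and pl in the other.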
Complexity of NL grammars
RG/FSA: not enough
CF/RTN: not enough
CS ?
0/ATN: Turing Machine
Transformations and metarules
Arguments for and against
NL grammar formalisms
Competence and performance?
Kornai number (left-recursion, center-embedding, “respectively” construction)
Gradually from unrestricted to regular
(i) a^n b^n -> a*b* (n is lost!)
(ii) a^n b^n -> {ε, ab, aabb, aaabbb}
“Finitization” by length
No structure in FSA; finite systems, however, can produce structural output
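The two ways of making a^n b^n regular can be compared in a short sketch. The function names and the default bound of 3 are assumptions chosen to match the set listed in (ii); nothing here is from the talk itself.

```python
import re
from typing import List

# (i) Regular approximation a*b*: the link between the two counts is lost.
REGULAR_APPROX = re.compile(r"a*b*")


def in_regular_approx(s: str) -> bool:
    return REGULAR_APPROX.fullmatch(s) is not None


# (ii) "Finitization" by length: list every string a^n b^n up to a bound.
def anbn_up_to(max_n: int) -> List[str]:
    return ["a" * n + "b" * n for n in range(max_n + 1)]


def in_finitized(s: str, max_n: int = 3) -> bool:
    return s in set(anbn_up_to(max_n))


print(anbn_up_to(3))
for candidate in ["aabb", "aab", "aaaabbbb"]:
    print(candidate, in_regular_approx(candidate), in_finitized(candidate))
```

The approximation in (i) accepts the unbalanced string aab, while the finite set in (ii) rejects it but also rejects balanced strings longer than the chosen bound.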
Syntax and semantics
Logical representations (e.g. λx.dog(x), λx.run(x))
World-knowledge representations (e.g. IS-A, PART-OF, INSTANCE-OF)
Categorial grammar: early logical representations of syntax (Kalmár)
DCG: interpretation & representation
Rule-to-rule hypothesis
Conflict handling
Lexicon meets syntax: who is right?
Lexicon: off-line info coming from past experiences
Which is more important in a specific situation?
Open classes
Open vs. closed classes: that is, features can or cannot be overridden
Proper names, jabbers, folk etymology, loanwords, ...
Grammar of closed classes: minimal grammar
Finite morphology
Finite patterns
Finite number of entries
Descriptions assigned to entries
Finite & open vs. infinite & closed
Underspecified entries for guessing
Finite syntax
“Item and arrangement” (as in morphology)
“Arrangement” describes a rather free constituent-order
Metawords in a meta-dictionary, e.g. ‘(Det (Adj (N)))’ ‘DAN’
Cascades without loop
The „plastic box”
John is a boy.
“John” is a noun.
Go is a verb.
“Go” is a verb.
□ is a sign.
“□” is a sign.
□ is a □.
(where □ is a “plastic box”)
Real examples
(a) Unusual use: Go is a verb. POS [np] POS [v]
(b) Metaphor: My car drinks a lot. ANIMATE [+] ANIMATE [-]
(c) Unknown entry: Kalmár is a family name. POS [np]
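One possible reading of these three cases is sketched below: the lexicon supplies features, the context demands others, and for an open-class word the contextual value may override the lexical one. The function `resolve`, the direction of the override, and the `det` value are assumptions made for illustration, not the mechanism described in the talk.

```python
from typing import Dict


def resolve(lexical: Dict[str, str], contextual: Dict[str, str],
            open_class: bool) -> Dict[str, str]:
    """Merge lexical and contextual features.

    Assumption: context wins for open-class items, while a clash on a
    closed-class item is an error.
    """
    merged = dict(lexical)
    for feature, value in contextual.items():
        clash = feature in merged and merged[feature] != value
        if clash and not open_class:
            raise ValueError(f"feature {feature} cannot be overridden")
        merged[feature] = value
    return merged


# (a) Unusual use: "Go" is listed as a verb but occurs in subject position.
print(resolve({"POS": "v"}, {"POS": "np"}, open_class=True))
# (b) Metaphor: "car" is inanimate, but "drinks" asks for an animate subject.
print(resolve({"ANIMATE": "-"}, {"ANIMATE": "+"}, open_class=True))
# (c) Unknown entry: nothing in the lexicon, so the context fills in POS.
print(resolve({}, {"POS": "np"}, open_class=True))
```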
Linguistic frames
Psychology: “Gestalt”
Morphologically complex structures treated as frames by humans
Frames in AI: ‘shopping’, ‘walking’, ...
As ‘high-level parsing’ relates to ‘detailed on-line analysis’
Translation of human languages
old problems (50’s)
direct (60’s)
interlingual (70’s)
transfer (80’s)
examples (90’s)
Patterns: general linguistic information in lexicalized form
Short, fully specified patterns are: lexical entries
Longer, fully specified entries are: multi-word expressions
Partially underspecified patterns are: collocations, phrasal verbs, idioms
Totally underspecified patterns are: linguistic rules
Pattern/interpretation pairs: Translation Description Language
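The four kinds of pattern can be told apart by how many of their slots are lexically filled. The sketch below assumes a pattern is a sequence of (category, word) slots, where a missing word marks an underspecified slot, and it reads "short" as a single slot; the representation, the function name and the example patterns are all hypothetical.

```python
from typing import Optional, Sequence, Tuple

# A slot is (category, word); word=None means the slot is underspecified.
Slot = Tuple[str, Optional[str]]


def classify(pattern: Sequence[Slot]) -> str:
    if not pattern:
        raise ValueError("a pattern needs at least one slot")
    filled = sum(1 for _, word in pattern if word is not None)
    if filled == len(pattern):
        # Fully specified: one slot is a lexical entry, more is a multi-word unit.
        return "lexical entry" if len(pattern) == 1 else "multi-word expression"
    if filled == 0:
        return "linguistic rule"
    return "collocation, phrasal verb or idiom"


print(classify([("N", "dog")]))
print(classify([("Adv", "by"), ("Det", "the"), ("N", "way")]))
print(classify([("V", "give"), ("NP", None), ("Adv", "up")]))
print(classify([("Det", None), ("Adj", None), ("N", None)]))
```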
The MetaMorpho principles
No single words but contextual expressions (in form of patterns) only
Pattern pairs: input/interpretation structure pairs
Single pass: no separate transfer steps
Target structure generation: by-product of parsing
Jabberwocky
‘Twas brillig, and the slighty toves
Did gyre and gimble in the wabe:
All mimsy were the borogroves,
And the mone raths outgrabe.
‘Twas □, and the □ □s
Did □ and □ in the □:
All □ were the □s,
And the □ □s □.
Translation rules for Jabberwocky
‘twas □ → □ volt
□, and □ → □, és □
the □s did □ → a □ok □tak
□ and □ → □ és □
in the □ → a □ban
all □ → teljesen □
□ were the □s → □k voltak az □ok
the □s □ → a □ok □tek
‘Twas □, and the □ □s
Did □ and □ in the □:
All □ were the □s,
And the □ □s □.

□ volt, és a □ □ok
□tak és □tek a □ben:
teljesen □ voltak a □ok
és a □ □ok □tek.
Translation of Jabberwocky
Dzsebervoki
Brillig volt, és a szlájti tóvok
gájertak és gimbeltek a vébben:
teljesen mimszik voltak a borogróvok
és a món rátok autgrébtek.
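The idea of producing the target structure as a by-product of matching the source pattern can be sketched with ordinary regular expressions. This is only an illustration under strong assumptions: each pattern covers a whole line, unknown words are copied unchanged (no transliteration such as slighty → szlájti, and no Hungarian vowel harmony), and the table `PATTERN_PAIRS` and the function `translate_line` are hypothetical names, not part of MetaMorpho.

```python
import re
from typing import Optional

# Hypothetical source/target pattern pairs. Each (\w+) group plays the role
# of one "plastic box"; \1, \2, ... place the same box in the target.
PATTERN_PAIRS = [
    (r"'Twas (\w+), and the (\w+) (\w+)s", r"\1 volt, és a \2 \3ok"),
    (r"All (\w+) were the (\w+)s,", r"teljesen \1 voltak a \2ok"),
]


def translate_line(line: str) -> Optional[str]:
    """Return the target line for the first matching pattern, else None."""
    for source, target in PATTERN_PAIRS:
        match = re.fullmatch(source, line)
        if match:
            # Single pass: the match that recognizes the source also
            # supplies everything needed to build the target.
            return match.expand(target)
    return None


print(translate_line("'Twas brillig, and the slighty toves"))
print(translate_line("All mimsy were the borogroves,"))
```

Because the boxes are copied verbatim, the output keeps the English spelling of the nonsense words and always uses the same suffix, unlike the hand-made translation above.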
An intuitive representation...
1. X-bar based structures
2. Feature-based descriptions
3. Metarules (used off-line)
4. Rule-to-rule principle
5. Lexicon should be finite but open
6. Closed classes belong to the minimal grammar
7. Minimal grammar describes “basically” linguistic elements
An intuitive representation... (cont’d)
8. Linguistic constructions can be described by finite patterns
9. A huge & finite description set is used rather than a limited & infinite grammar
10. In case of conflict, lexical information is either redundant or contradicts the actual description
11. Known constructions need no real-time analysis (Gestalt, frame)
An intuitive representation... (cont’d)
12. “Broken” frames are analyzed in real time
13. Structural (source/target) pattern pair is assigned to every frame to be translated
14. Target structure is computed while parsing source structure