Aarne Ranta
CNL 2014, Galway 20-22 August 2014
CLT
Embedded Controlled Languages
Joint work with Krasimir Angelov, Björn Bringert, Grégoire Détrez, Ramona Enache, Erik de Graaf, Normunds Gruzitis, Qiao Haiyan, Thomas Hallgren, Prasanth Kolachina, Inari Listenmaa, Peter Ljunglöf, K.V.S. Prasad, Scharolta Siencnik, Shafqat Virk
50+ GF Resource Grammar Library contributors
Embedded programming languages
DSL = Domain Specific Language
Embedded DSL = fragment (library) of a host language
+ low implementation effort
+ no additional learning if you know the host language
+ you can fall back to host language if DSL is not enough
- reasoning about DSL properties more difficult
Timeline
1998: GF = Grammatical Framework
2001: RGL = Resource Grammar Library
2008: CNL, explicitly
2010: MOLTO: CNL-based translation
2012: wide-coverage translation
2014: embedded CNL translation
Outline
● “CNL is a part of NL”
● CNL embedded in NL
● Example: translation
● Demo: web and mobile app
CNL as a part of NL
It is a part:
● it is understandable without extra learning
It is a proper part:
● it excludes parts that are not so good
● it can be controlled, maybe even defined
How to define and delimit a CNL
How to guarantee that it is a part
● the CNL may be formal, the NL certainly isn’t
How to help keep within the limits
● so that the user stays within the CNL
Bottom-up vs. top-down CNL
Bottom-up: define CNL rule by rule
● nothing is in the CNL unless given by rules
● e.g. Attempto Controlled English
Top-down: delimit CNL by constraining NL
● everything is in the CNL unless blocked by rules
● e.g. Simplified English
Defining and delimiting CNL
Bottom-up:
● How do we know that the rules are valid NL?
Top-down:
● How do we decide what is in the CNL?
Defining bottom-up
Message ::= “you have” Number “points”
you have five points
you have one points (generated by the rule, but ungrammatical)
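The problem with the plain-string rule can be reproduced in a few lines. This is a minimal sketch in Python, not GF; the function name `message` and the numeral list are assumptions made for illustration.

```python
# Naive bottom-up rule from the slide:
#   Message ::= "you have" Number "points"
# The numeral list is an illustrative assumption.
NUMERALS = ["one", "two", "five"]


def message(numeral: str) -> str:
    """Build a message by plain string concatenation, as the BNF rule does."""
    return f"you have {numeral} points"


if __name__ == "__main__":
    for numeral in NUMERALS:
        print(message(numeral))
```

Every output is licensed by the rule, including the one for "one", which is not valid English. The rule has no place to state number agreement.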
Delimiting top-down
Passives must be avoided.
How to recognize them in all contexts? Tenses, questions, infinitives, separate from adjectives...
An answer to both problems
Define CNL formally as a part of NL
● use a grammar of the whole NL
● bottom-up: rules defined as applications of NL rules
● top-down: constraints written as conditions on NL trees
The whole NL?
An approximation: GF Resource Grammar Library (RGL)
● morphology
● syntactic structures
● lexicon
● common syntax API
● 29 languages
Bottom-up CNL
Use RGL as library
● use its API function calls rather than plain strings
HavePoints p n = mkCl p have_V2 (mkNP n point_N)
This generates you have five points, she has one point, etc.
Also in other languages
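To see why library calls help, here is a rough Python analogue of the idea. It is a sketch only: `mk_np`, `mk_cl` and the small data classes are hypothetical stand-ins that imitate the shape of the RGL functions for English, not the RGL API itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    sg: str
    pl: str


@dataclass(frozen=True)
class Verb:
    third_sg: str  # form used with he/she/it
    other: str     # form used with all other subjects


@dataclass(frozen=True)
class Pron:
    text: str
    third_sg: bool


# Illustrative numerals only.
NUMERALS = {1: "one", 2: "two", 5: "five"}


def mk_np(n: int, noun: Noun) -> str:
    """Numeral plus noun; the noun form is chosen by the number."""
    form = noun.sg if n == 1 else noun.pl
    return f"{NUMERALS[n]} {form}"


def mk_cl(subj: Pron, verb: Verb, obj: str) -> str:
    """Subject, verb, object; the verb form is chosen by the subject."""
    v = verb.third_sg if subj.third_sg else verb.other
    return f"{subj.text} {v} {obj}"


point_N = Noun("point", "points")
have_V2 = Verb("has", "have")
you = Pron("you", False)
she = Pron("she", True)


def have_points(p: Pron, n: int) -> str:
    """Python counterpart of the HavePoints rule on the slide."""
    return mk_cl(p, have_V2, mk_np(n, point_N))
```

The CNL rule `have_points` contains no strings of its own. Agreement lives in the library functions, so `have_points(she, 1)` gives "she has one point" without any extra work in the rule.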
Top-down CNL
Use RGL as run-time grammar
● use its parser to produce trees
● filter trees by pattern matching
hasPassive t = case t of
  PassVPSlash _ -> return True
  _ -> composOp hasPassive t
(Bringert & Ranta, A Pattern for Almost Compositional Operations, JFP 2008)
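The filtering idea can be sketched without GF. The Python below is an illustration under assumptions: trees are encoded as nested tuples `(function_name, child, ...)` with strings as leaves, and the example trees are hand-written rather than produced by the RGL parser. A generic recursion over the children plays the role of `composOp`.

```python
def has_passive(tree) -> bool:
    """Return True if any node of the tree is built with PassVPSlash."""
    if isinstance(tree, str):        # a leaf, e.g. a lexical item
        return False
    fun, *children = tree
    if fun == "PassVPSlash":         # the one case we care about
        return True
    # every other constructor: just look inside the children
    return any(has_passive(child) for child in children)


# Hand-written example trees (assumed shapes, for illustration only).
ACTIVE = ("PredVP", "john_NP",
          ("ComplSlash", ("SlashV2a", "love_V2"), "mary_NP"))
PASSIVE = ("PredVP", "mary_NP",
           ("PassVPSlash", ("SlashV2a", "love_V2")))
```

Only the interesting constructor is named. All other constructors are handled by the fallback case, so the check does not have to list every tense, question form or infinitive separately.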
Top-down CNL
Use RGL as run-time grammar
● change unwanted input
unPassive t = case t of
  PredVP np (PassVPSlash vps) -> liftM2 PredVP (unPassive np) (unPassive vps)
  _ -> composOp unPassive t
Non-CNL input is recognized but corrected.
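The same traversal can rewrite trees instead of only inspecting them. This Python sketch mirrors the simplified rewrite on the slide, which drops the passive wrapper under `PredVP`; it is not a linguistically complete passive-to-active conversion. Trees are assumed to be nested tuples `(function_name, child, ...)` with strings as leaves.

```python
def un_passive(tree):
    """Rebuild the tree, removing PassVPSlash directly under PredVP."""
    if isinstance(tree, str):        # leaves are returned unchanged
        return tree
    fun, *children = tree
    if fun == "PredVP" and len(children) == 2:
        np, vp = children
        if not isinstance(vp, str) and vp[0] == "PassVPSlash":
            # special case: keep the subject, unwrap the passive
            return ("PredVP", un_passive(np), un_passive(vp[1]))
    # every other constructor: rebuild it from rewritten children
    return (fun, *[un_passive(child) for child in children])


# Hand-written example input (assumed shape, for illustration only).
PASSIVE_UTT = ("UttS", ("PredVP", "mary_NP",
                        ("PassVPSlash", ("SlashV2a", "love_V2"))))
```

Trees without the pattern come back unchanged, so the function can be applied to every parse result.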
Embedded bottom-up CNL
1. Define CNL as usual, maybe with RGL as library
2. Build a module that inherits both CNL and RGL
abstract Embedded = CNL, RGL ** {
  cat Start ;
  fun UseCNL : CNL_Start -> Start ;
  fun UseRGL : RGL_Start -> Start ;
}
Using embedded CNL
Parsing will try both CNL and RGL.
You can give priority to CNL trees.
The parser is robust (if RGL has enough coverage)
Non-CNL input is not a failure, but can be processed further.
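The control flow of such a parser might look as follows. This is a toy Python sketch: both parsers are stand-ins invented for illustration (a lookup table for the CNL, an accept-anything function for the wide-coverage grammar), and only the labels `UseCNL` and `UseRGL` come from the module on the previous slide.

```python
from typing import Optional, Tuple

# Toy CNL: a fixed set of sentences with their trees (illustrative only).
CNL_TREES = {
    "you have five points": "HavePoints You 5",
    "she has one point": "HavePoints She 1",
}


def cnl_parser(sentence: str) -> Optional[str]:
    """Return a CNL tree, or None if the sentence is outside the CNL."""
    return CNL_TREES.get(sentence)


def rgl_parser(sentence: str) -> Optional[str]:
    """Stand-in for a wide-coverage parser: accepts any non-empty input."""
    return f"RGLTree({sentence!r})" if sentence.strip() else None


def parse_embedded(sentence: str) -> Optional[Tuple[str, str]]:
    """Try the CNL first, then fall back to the wide-coverage grammar."""
    for label, parser in (("UseCNL", cnl_parser), ("UseRGL", rgl_parser)):
        tree = parser(sentence)
        if tree is not None:
            return label, tree
    return None
```

The label in the result tells later processing steps whether the input was inside the CNL, so non-CNL input can be handled differently instead of being rejected.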
Example: translation
We want to have machine translation that
● delivers publication quality in areas where reasonable effort is invested
● degrades gracefully to browsing quality in other areas
● shows a clear distinction between these
We do this by using grammars and type-theoretical interlinguas implemented in GF, Grammatical Framework
GF translation app in greyscale
GF translation app in full colour
translation by meaning
- correct
- idiomatic
translation by syntax
- grammatical
- often strange
- often wrong
translation by chunks
- probably ungrammatical
- probably wrong
word to word transfer
syntactic transfer
semantic interlingua
The Vauquois triangle
What is it good for?
get an idea
get the grammar right
publish the content
Who is doing it?
Google, Bing, Apertium
GF the last 15 months
GF in MOLTO
What should we work on?
chunks for robustness and speed
syntax for grammaticality
semantics for full quality and speed
All!
We want a system that
● can reach perfect quality
● has robustness as back-up
● tells the user which is which
We “combine GF, Apertium, and Google”
But we do it all in GF!
How to do it?
a brief summary
translator
chunk grammar
resource grammar
CNL grammar
How much work is needed?
translator
chunk grammar
resource grammar
CNL grammars
resource grammar
● morphology
● syntax
● generic lexicon
precise linguistic knowledge
manual work can’t be escaped
CNL grammars
domain semantics, domain idioms
● need domain expertise
use resource grammar as library
● minimize hand-hacking
the work never ends
● we can only cover some domains
chunk grammar
words
suitable word sequences
● local agreement
● local reordering
easily derived from resource grammar
easily varied
minimize hand-hacking
translator
PGF run-time system
● parsing
● linearization
● disambiguation
generic for all grammars
portable to different user interfaces
● web
● mobile
Disambiguation?
Grammatical: give priority to green over yellow, yellow over red
Statistical: use a distribution model for grammatical constructs (incl. word senses)
Interactive: for the last mile in the green zone
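The grammatical and statistical criteria can be combined into one sort key. The following Python sketch is an assumption about how such a ranking could be written, not the PGF run-time's actual algorithm; the `Candidate` type, the zone names as strings and the probabilities are all invented for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    text: str           # a candidate translation
    zone: str           # "green" (CNL), "yellow" (syntax) or "red" (chunks)
    probability: float  # score from some statistical model (assumed given)


ZONE_RANK = {"green": 0, "yellow": 1, "red": 2}


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Order by zone first, then by descending probability within a zone."""
    return sorted(candidates,
                  key=lambda c: (ZONE_RANK[c.zone], -c.probability))
```

A green candidate with a low probability still outranks a red one with a high probability. The statistical model only decides between candidates of the same colour.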
Advantages of GF
Expressivity: easy to express complex rules
● agreement
● word order
● discontinuity
Abstractions: easy to manage complex code
Interlinguality: easy to add new languages
Resources: basic and bigger
Norwegian Danish Afrikaans
Maltese
Romanian Catalan
Polish Estonian
Russian
Latvian Thai Japanese Urdu Punjabi Sindhi
Greek Nepali Persian
English Swedish German Dutch
French Italian Spanish
Bulgarian Finnish
Chinese Hindi
How to do it?
some more details
Translation model: multi-source multi-target compiler
Translation model: multi-source multi-target compiler-decompiler
Abstract Syntax
Hindi
Chinese
Finnish
Swedish
English
Spanish
German
French
Bulgarian Italian
Word alignment: compiler
1 + 2 * 3
00000011 00000100 00000101 01101000 01100000
Abstract syntax
Add : Exp -> Exp -> Exp
Mul : Exp -> Exp -> Exp
E1, E2, E3 : Exp
Add E1 (Mul E2 E3)
Concrete syntax
| abstract | Java | JVM |
|---|---|---|
| Add x y | x “+” y | x y “01100000” |
| Mul x y | x “*” y | x y “01101000” |
| E1 | “1” | “00000011” |
| E2 | “2” | “00000100” |
| E3 | “3” | “00000101” |
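The table can be read as two linearization functions over one tree type. Below is a minimal Python sketch of that reading. The class and function names are assumptions, the bit strings are copied from the slide as given, and operator precedence is ignored just as it is in the table.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    name: str  # "E1", "E2" or "E3"


@dataclass(frozen=True)
class Add:
    x: "Exp"
    y: "Exp"


@dataclass(frozen=True)
class Mul:
    x: "Exp"
    y: "Exp"


Exp = Union[Const, Add, Mul]

# Lexical entries, taken from the slide's table.
JAVA = {"E1": "1", "E2": "2", "E3": "3"}
JVM = {"E1": "00000011", "E2": "00000100", "E3": "00000101"}


def to_java(e: Exp) -> str:
    """Infix linearization: x op y."""
    if isinstance(e, Const):
        return JAVA[e.name]
    op = "+" if isinstance(e, Add) else "*"
    return f"{to_java(e.x)} {op} {to_java(e.y)}"


def to_jvm(e: Exp) -> str:
    """Postfix linearization: x y op."""
    if isinstance(e, Const):
        return JVM[e.name]
    op = "01100000" if isinstance(e, Add) else "01101000"
    return f"{to_jvm(e.x)} {to_jvm(e.y)} {op}"


TREE = Add(Const("E1"), Mul(Const("E2"), Const("E3")))
```

Both outputs come from the same abstract tree `Add E1 (Mul E2 E3)`. Only the order and spelling of the pieces differ, which is what makes the tree usable as an interlingua.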
Compiling natural language
Abstract syntax
Pred : NP -> V2 -> NP -> S
Mod : AP -> CN -> CN
Love : V2
Concrete syntax:
| | English | Latin |
|---|---|---|
| Pred s v o | s v o | s o v |
| Mod a n | a n | n a |
| Love | “love” | “amare” |
Word alignment
the clever woman loves the handsome man
femina sapiens virum formosum amat
Pred (Def (Mod Clever Woman)) Love (Def (Mod Handsome Man))
Linearization types
| | English | Latin |
|---|---|---|
| CN | {s : Number => Str} | {s : Number => Case => Str ; g : Gender} |
| AP | {s : Str} | {s : Gender => Number => Case => Str} |
| Mod ap cn | {s = \\n => ap.s ++ cn.s ! n} | {s = \\n,c => cn.s ! n ! c ++ ap.s ! cn.g ! n ! c ; g = cn.g} |
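The records and tables above can be imitated with nested dictionaries. This Python sketch is an illustration under assumptions: the lexical entries are hand-written, and the Latin tables are cut down to two cases (nominative and accusative) to keep the example short.

```python
# English: CN = {s : Number => Str}, AP = {s : Str}
woman_en = {"s": {"sg": "woman", "pl": "women"}}
clever_en = {"s": "clever"}


def mod_en(ap, cn):
    """Adjective before noun; the adjective does not inflect."""
    return {"s": {n: f"{ap['s']} {form}" for n, form in cn["s"].items()}}


# Latin: CN = {s : Number => Case => Str ; g : Gender}
#        AP = {s : Gender => Number => Case => Str}
vir_la = {"s": {"sg": {"nom": "vir", "acc": "virum"},
                "pl": {"nom": "viri", "acc": "viros"}},
          "g": "masc"}
femina_la = {"s": {"sg": {"nom": "femina", "acc": "feminam"},
                   "pl": {"nom": "feminae", "acc": "feminas"}},
             "g": "fem"}
formosus_la = {"s": {
    "masc": {"sg": {"nom": "formosus", "acc": "formosum"},
             "pl": {"nom": "formosi", "acc": "formosos"}},
    "fem": {"sg": {"nom": "formosa", "acc": "formosam"},
            "pl": {"nom": "formosae", "acc": "formosas"}},
}}


def mod_la(ap, cn):
    """Noun before adjective; the adjective agrees in gender, number, case."""
    g = cn["g"]
    return {
        "s": {
            n: {c: f"{noun} {ap['s'][g][n][c]}" for c, noun in cases.items()}
            for n, cases in cn["s"].items()
        },
        "g": g,  # the modified noun keeps the gender of its head
    }
```

The accusative singular of the Latin result is "virum formosum", the form that appears in the word alignment example. The English function needs no gender or case at all, which is why the two languages have different linearization types for the same abstract category.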
Abstract syntax trees
my name is John
HasName I (Name “John”)
Pred (Det (Poss i_NP) name_N) (NameNP “John”)
[DetChunk (Poss i_NP), NChunk name_N, copulaChunk, NPChunk (NameNP “John”)]
Building the yellow part
Building a basic resource grammar
Programming skills
Theoretical knowledge of language
3-6 months work
3000-5000 lines of GF code
- not easy to automate
+ only done once per language
Building a large lexicon
Monolingual (morphology + valencies)
● extraction from open sources (SALDO etc)
● extraction from text (extract)
● smart paradigms
Multilingual (mapping from abstract syntax)
● extraction from open sources (Wordnet, Wiktionary)
● extraction from parallel corpora (Giza++)
Manual quality control at some point needed
Improving the resources
Multiwords: non-compositional translation
● kick the bucket - ta ner skylten
Constructions: multiwords with arguments
● i sötaste laget - excessively sweet
Extraction from free resources (Konstruktikon)
Extraction from phrase tables
● example-based grammar writing
Building the green part
Define semantically based abstract syntax
fun HasName : Person -> Name -> Fact
Define concrete syntax by mapping to resource grammar structures
lin HasName p n = mkCl (possNP p name_N) n
my name is John
lin HasName p n = mkCl p heta_V2 n
jag heter John
lin HasName p n = mkCl p (reflV chiamare_V) n
(io) mi chiamo John
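The point of the semantic function is that one tree gets a different syntactic shape in each language. A small Python sketch of this, with hand-written forms for two persons only; the table `PERSON` and the function `lin_has_name` are assumptions for illustration and do not use the resource grammar.

```python
# Per-language forms for the Person argument (illustrative, two persons only).
PERSON = {
    "I":   {"Eng": "my",   "Swe": "jag", "Ita": ("mi", "chiamo")},
    "You": {"Eng": "your", "Swe": "du",  "Ita": ("ti", "chiami")},
}


def lin_has_name(lang: str, person: str, name: str) -> str:
    """Linearize the abstract tree HasName person name in one language."""
    forms = PERSON[person][lang]
    if lang == "Eng":   # possessive + "name is": my name is John
        return f"{forms} name is {name}"
    if lang == "Swe":   # subject + transitive verb: jag heter John
        return f"{forms} heter {name}"
    if lang == "Ita":   # reflexive verb: mi chiamo John
        clitic, verb = forms
        return f"{clitic} {verb} {name}"
    raise ValueError(f"no concrete syntax for {lang}")
```

A word-by-word or purely syntactic translation of "my name is John" would not produce "jag heter John". Going through the semantic tree does, because each language states its own idiom once.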
Resource grammars give crucial help
● CNL grammarians need not know linguistics
● a substantial grammar can be built in a few days
● adding new languages is a matter of a few hours
MOLTO’s goal was to make this possible.
Automatic extraction of CNLs?
● abstract syntax from ontologies
● concrete syntax from examples
  ○ including phrase tables
As always, full green quality needs expert verification
● formal methods help (REMU project)
These grammars are a source of
● “non-compositional” translations
● compile-time transfer
● idiomatic language
● translating meaning, not syntax
Constructions are the generalized form of this idea, originally domain-specific.
Building the red part
1. Write a grammar that builds sentences from sequences of chunks
cat Chunk
fun SChunks : [Chunk] -> S
2. Introduce chunks to cover phrases
fun NP_nom_Chunk : NP -> Chunk
fun NP_acc_Chunk : NP -> Chunk
fun AP_sg_masc_Chunk : AP -> Chunk
fun AP_pl_fem_Chunk : AP -> Chunk
Do this for all categories and feature combinations you want to cover.
Include both long and short phrases
● long phrases have better quality
● short phrases add to robustness
Give long phrases priority by probability settings.
Long chunks are better:
[this yellow house] - [det här gula huset]
[this] [yellow house] - [den här] [gult hus]
[this] [yellow] [house] - [den här] [gul] [hus]
Limiting case: whole sentences as chunks.
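The three rows above can be reproduced with a small phrase table. In this Python sketch, greedy longest-match segmentation stands in for the probability settings mentioned earlier; that substitution, the function name and the table layout are assumptions made for illustration. The phrase pairs are the ones from the slide.

```python
# English-to-Swedish phrase pairs, copied from the slide.
FULL_TABLE = {
    "this yellow house": "det här gula huset",
    "yellow house": "gult hus",
    "this": "den här",
    "yellow": "gul",
    "house": "hus",
}


def translate_chunks(sentence: str, table: dict) -> str:
    """Translate left to right, always taking the longest known chunk."""
    words = sentence.split()
    out = []
    i = 0
    while i < len(words):
        for j in range(len(words), i, -1):   # longest candidate first
            chunk = " ".join(words[i:j])
            if chunk in table:
                out.append(table[chunk])
                i = j
                break
        else:
            out.append(words[i])             # unknown word: pass through
            i += 1
    return " ".join(out)


# Smaller tables simulate grammars that only know shorter chunks.
NO_FULL_PHRASE = {k: v for k, v in FULL_TABLE.items()
                  if k != "this yellow house"}
SINGLE_WORDS = {k: v for k, v in FULL_TABLE.items() if " " not in k}
```

With the full table the whole phrase is one chunk and agreement is correct. With only shorter chunks available the output degrades step by step but never fails, which is the robustness the short chunks are there for.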
Accurate feature distinctions are good, especially between closely related language pairs.
| English | Swedish | French | Italian |
|---|---|---|---|
| good | god | bon | buono |
| | gott | bonne | buona |
| | goda | bons | buoni |
| | | bonnes | buone |
Apertium does this for every language pair.
Resource grammar chunks of course come with reordering and internal agreement
Prep Det+Fem+Sg N+Fem+Sg A+Fem+Sg
dans la maison bleue
im blauen Haus
Prep-Det+Neutr+Sg+Dat A+Weak+Dat N+Neutr+Sg
Recall: chunks are just a by-product of the real grammar.
Their size span is
single words <---> entire sentences
A wide-coverage chunking grammar can be built in a couple of hours by using the RGL.
![Page 72: 20-22 August 2014 CNL 2014, Galway Embedded - UZHattempto.ifi.uzh.ch/site/cnl2014/slides/ranta.pdf · Aarne Ranta CNL 2014, Galway 20-22 August 2014 CLT Embedded Controlled Languages](https://reader034.fdocuments.in/reader034/viewer/2022042711/5f83510142517c6a3e1c3c35/html5/thumbnails/72.jpg)
Building the translation system
![Page 73: 20-22 August 2014 CNL 2014, Galway Embedded - UZHattempto.ifi.uzh.ch/site/cnl2014/slides/ranta.pdf · Aarne Ranta CNL 2014, Galway 20-22 August 2014 CLT Embedded Controlled Languages](https://reader034.fdocuments.in/reader034/viewer/2022042711/5f83510142517c6a3e1c3c35/html5/thumbnails/73.jpg)
GF source
![Page 74: 20-22 August 2014 CNL 2014, Galway Embedded - UZHattempto.ifi.uzh.ch/site/cnl2014/slides/ranta.pdf · Aarne Ranta CNL 2014, Galway 20-22 August 2014 CLT Embedded Controlled Languages](https://reader034.fdocuments.in/reader034/viewer/2022042711/5f83510142517c6a3e1c3c35/html5/thumbnails/74.jpg)
GF source
probability model
![Page 75: 20-22 August 2014 CNL 2014, Galway Embedded - UZHattempto.ifi.uzh.ch/site/cnl2014/slides/ranta.pdf · Aarne Ranta CNL 2014, Galway 20-22 August 2014 CLT Embedded Controlled Languages](https://reader034.fdocuments.in/reader034/viewer/2022042711/5f83510142517c6a3e1c3c35/html5/thumbnails/75.jpg)
GF source
probability model
PGF binary
GF compiler
![Page 76: 20-22 August 2014 CNL 2014, Galway Embedded - UZHattempto.ifi.uzh.ch/site/cnl2014/slides/ranta.pdf · Aarne Ranta CNL 2014, Galway 20-22 August 2014 CLT Embedded Controlled Languages](https://reader034.fdocuments.in/reader034/viewer/2022042711/5f83510142517c6a3e1c3c35/html5/thumbnails/76.jpg)
PGF binary
PGF runtime system
![Page 77: 20-22 August 2014 CNL 2014, Galway Embedded - UZHattempto.ifi.uzh.ch/site/cnl2014/slides/ranta.pdf · Aarne Ranta CNL 2014, Galway 20-22 August 2014 CLT Embedded Controlled Languages](https://reader034.fdocuments.in/reader034/viewer/2022042711/5f83510142517c6a3e1c3c35/html5/thumbnails/77.jpg)
PGF binary
PGF runtime system
user interface
![Page 78: 20-22 August 2014 CNL 2014, Galway Embedded - UZHattempto.ifi.uzh.ch/site/cnl2014/slides/ranta.pdf · Aarne Ranta CNL 2014, Galway 20-22 August 2014 CLT Embedded Controlled Languages](https://reader034.fdocuments.in/reader034/viewer/2022042711/5f83510142517c6a3e1c3c35/html5/thumbnails/78.jpg)
PGF binary
PGF runtime system
user interface
another PGF binary
![Page 79: 20-22 August 2014 CNL 2014, Galway Embedded - UZHattempto.ifi.uzh.ch/site/cnl2014/slides/ranta.pdf · Aarne Ranta CNL 2014, Galway 20-22 August 2014 CLT Embedded Controlled Languages](https://reader034.fdocuments.in/reader034/viewer/2022042711/5f83510142517c6a3e1c3c35/html5/thumbnails/79.jpg)
PGF binary
PGF runtime system
user interface
another PGF binary
CNL
![Page 80: 20-22 August 2014 CNL 2014, Galway Embedded - UZHattempto.ifi.uzh.ch/site/cnl2014/slides/ranta.pdf · Aarne Ranta CNL 2014, Galway 20-22 August 2014 CLT Embedded Controlled Languages](https://reader034.fdocuments.in/reader034/viewer/2022042711/5f83510142517c6a3e1c3c35/html5/thumbnails/80.jpg)
PGF binary
PGF runtime system
user interface
another PGF binary
another CNL
![Page 81: 20-22 August 2014 CNL 2014, Galway Embedded - UZHattempto.ifi.uzh.ch/site/cnl2014/slides/ranta.pdf · Aarne Ranta CNL 2014, Galway 20-22 August 2014 CLT Embedded Controlled Languages](https://reader034.fdocuments.in/reader034/viewer/2022042711/5f83510142517c6a3e1c3c35/html5/thumbnails/81.jpg)
PGF binary
PGF runtime system
custom user interface
generic user interface
PGF runtime system
generic grammar
CNL
White: free, open-source. Green: a business idea (Digital Grammars)
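One possible reading of this last diagram, sketched in Python under stated assumptions: the runtime tries the precise CNL grammar first and falls back to a more robust layer, reporting which layer produced the result. Nothing here is the PGF runtime API. `translate_with_confidence`, the layer functions, and the lookup tables are hypothetical stand-ins for real grammars.

```python
from typing import Callable, List, Optional, Tuple

# A layer maps a sentence to a translation, or None if it cannot handle it.
Layer = Tuple[str, Callable[[str], Optional[str]]]

# Toy stand-ins for a CNL grammar and a word-level chunk table (assumed data).
CNL_TABLE = {"this house is yellow": "det här huset är gult"}
WORD_TABLE = {"this": "den här", "house": "hus", "yellow": "gul"}


def cnl_layer(sentence: str) -> Optional[str]:
    # Precise but narrow: only sentences the CNL covers.
    return CNL_TABLE.get(sentence.lower())


def chunk_layer(sentence: str) -> Optional[str]:
    # Robust but rough: word-by-word, fails if any word is unknown.
    out = [WORD_TABLE.get(w) for w in sentence.lower().split()]
    return " ".join(out) if all(out) else None


LAYERS: List[Layer] = [("cnl", cnl_layer), ("chunks", chunk_layer)]


def translate_with_confidence(sentence: str, layers: List[Layer]) -> Tuple[str, str]:
    """Return (layer label, translation) from the first layer that succeeds."""
    for label, layer in layers:
        result = layer(sentence)
        if result is not None:
            return label, result
    return "untranslated", sentence


if __name__ == "__main__":
    print(translate_with_confidence("This house is yellow", LAYERS))
    print(translate_with_confidence("yellow house", LAYERS))
```

The label returned with each translation is what a user interface could use to signal how much the output should be trusted. A generic-grammar layer could be inserted between the two layers shown here.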
![Page 82: 20-22 August 2014 CNL 2014, Galway Embedded - UZHattempto.ifi.uzh.ch/site/cnl2014/slides/ranta.pdf · Aarne Ranta CNL 2014, Galway 20-22 August 2014 CLT Embedded Controlled Languages](https://reader034.fdocuments.in/reader034/viewer/2022042711/5f83510142517c6a3e1c3c35/html5/thumbnails/82.jpg)
User interfaces
● command-line
● shell
● web server
● web applications
● mobile applications
![Page 83: 20-22 August 2014 CNL 2014, Galway Embedded - UZHattempto.ifi.uzh.ch/site/cnl2014/slides/ranta.pdf · Aarne Ranta CNL 2014, Galway 20-22 August 2014 CLT Embedded Controlled Languages](https://reader034.fdocuments.in/reader034/viewer/2022042711/5f83510142517c6a3e1c3c35/html5/thumbnails/83.jpg)
Demos
![Page 84: 20-22 August 2014 CNL 2014, Galway Embedded - UZHattempto.ifi.uzh.ch/site/cnl2014/slides/ranta.pdf · Aarne Ranta CNL 2014, Galway 20-22 August 2014 CLT Embedded Controlled Languages](https://reader034.fdocuments.in/reader034/viewer/2022042711/5f83510142517c6a3e1c3c35/html5/thumbnails/84.jpg)
To test it yourself
Android app
http://www.grammaticalframework.org/demos/app.html
Web app
http://www.grammaticalframework.org/demos/translation.html
![Page 85: 20-22 August 2014 CNL 2014, Galway Embedded - UZHattempto.ifi.uzh.ch/site/cnl2014/slides/ranta.pdf · Aarne Ranta CNL 2014, Galway 20-22 August 2014 CLT Embedded Controlled Languages](https://reader034.fdocuments.in/reader034/viewer/2022042711/5f83510142517c6a3e1c3c35/html5/thumbnails/85.jpg)
Take home
![Page 86: 20-22 August 2014 CNL 2014, Galway Embedded - UZHattempto.ifi.uzh.ch/site/cnl2014/slides/ranta.pdf · Aarne Ranta CNL 2014, Galway 20-22 August 2014 CLT Embedded Controlled Languages](https://reader034.fdocuments.in/reader034/viewer/2022042711/5f83510142517c6a3e1c3c35/html5/thumbnails/86.jpg)
Implementing CNL in GF using RGL
● less work and linguistic expertise
● multilinguality (29 languages)
Embedding CNL in RGL
● robustness
● confidence control
On-going effort: translation
● CNL as semantic model
● contributions wanted to lexicon etc!
Other CNL applications: to do!
![Page 87: 20-22 August 2014 CNL 2014, Galway Embedded - UZHattempto.ifi.uzh.ch/site/cnl2014/slides/ranta.pdf · Aarne Ranta CNL 2014, Galway 20-22 August 2014 CLT Embedded Controlled Languages](https://reader034.fdocuments.in/reader034/viewer/2022042711/5f83510142517c6a3e1c3c35/html5/thumbnails/87.jpg)