CARMA
Constructional Analyzer using Recursively Multiple AVMs
Ely Edison
September 6, 2018
FrameNet Brasil Project - UFJF
Table of contents
1. Introduction
2. Premises
3. Computational Processing
4. Limitations and Outlook
Introduction
Context
FNBr is working on NLU (Natural Language Understanding) projects.
The FNBr approach to NLU comprises three main elements:
1. Linguistic Knowledge: Lexicon, Constructions, GF, POS, Roles, Syntax, etc.
2. World Knowledge: Ontologies and external datasets
3. Situational Context: Frames and Frame Elements
NLU processes must use linguistic knowledge cognitively to get an approximate shape of world knowledge in a given situational context.
CARMA
CARMA is a constructional analyzer (a new version of the one described in [4]): given a raw sentence, it tries to identify the constructions in the sentence.
If these constructions evoke a frame, this helps to identify the Situational Context.
Why constructional analysis?
o    celular    quebrou    a    tela
DET  NOUN       VERB       DET  NOUN
The  cellphone  break.PST  the  screen
UD relations: det (o), nsubj (celular), det (a), obj (tela)
'The screen of the cellphone broke'
Cxn: Split_object

o    menino  quebrou    a    cadeira
DET  NOUN    VERB       DET  NOUN
The  boy     break.PST  the  chair
UD relations: det (o), nsubj (menino), det (a), obj (cadeira)
'The boy broke the chair'
Cxn: Transitive_action
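Both sentences get the same POS tags and the same UD relations, so syntax alone cannot separate Split_object from Transitive_action. The sketch below illustrates this point. The tuple layout and the `root` label on the verb are assumptions made for illustration, not CARMA's internal representation.

```python
# Hypothetical token representation: (form, UPOS, UD relation to head).
# The "root" label on the verb is assumed; the slides only show det/nsubj/obj.
split_object = [
    ("o", "DET", "det"),
    ("celular", "NOUN", "nsubj"),
    ("quebrou", "VERB", "root"),
    ("a", "DET", "det"),
    ("tela", "NOUN", "obj"),
]
transitive_action = [
    ("o", "DET", "det"),
    ("menino", "NOUN", "nsubj"),
    ("quebrou", "VERB", "root"),
    ("a", "DET", "det"),
    ("cadeira", "NOUN", "obj"),
]


def syntactic_signature(tokens):
    """Drop the word forms, keeping only POS tags and UD relations."""
    return [(pos, rel) for _form, pos, rel in tokens]


# True when the two sentences are indistinguishable at the syntactic level.
same_syntax = syntactic_signature(split_object) == syntactic_signature(transitive_action)
print(same_syntax)
```

The difference between the two constructions therefore has to come from lexical and ontological knowledge, such as a part-of relation between the screen and the cellphone.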
Resources
CARMA uses four different resources:
1. FNBr framenet
• the network of frames and LUs, including all lexical material (words, lexemes, lemmas, ...)
2. FNBr constructicon
• the network of constructions
3. FNBr ontology
• a Generative Lexicon based ontology defining extended qualia relations between LUs (based on the SIMPLE ontology [1])
4. UD parser
• to get the syntactic structure of the sentence using UD POS tags and relations
Premises
AVM
Figure 1: AVM structure
AVM
Figure 2: Everything as AVM
Recursive AVMs: the value of an attribute can be another AVM
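A recursive AVM can be pictured as a nested mapping from attributes to values, where a value is either atomic or another AVM. The sketch below assumes a plain Python dict encoding. The attribute names (`form`, `lu`, `frame`) and the frame name `frm_example` are hypothetical and do not come from the FNBr resources.

```python
# Hypothetical AVM for the word "tela"; attribute names are illustrative only.
avm = {
    "form": "tela",
    "pos": "NOUN",
    "lu": {                                  # the value is itself an AVM
        "lemma": "tela",
        "frame": {"name": "frm_example"},    # and so is this one
    },
}


def depth(value):
    """Return the nesting depth of an AVM; atomic values have depth 0."""
    if not isinstance(value, dict):
        return 0
    return 1 + max((depth(v) for v in value.values()), default=0)


def paths(value, prefix=()):
    """Yield (attribute path, atomic value) pairs by recursing into nested AVMs."""
    if isinstance(value, dict):
        for attr, sub in value.items():
            yield from paths(sub, prefix + (attr,))
    else:
        yield prefix, value


for path, atom in paths(avm):
    print(".".join(path), "=", atom)
```

Because every level has the same attribute-value shape, one recursive function can walk words, lexical units, frames and constructions alike.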
Constraints
CARMA is a constraint-based system: each AVM attribute must be restricted to a set of possible/acceptable values.
Constructions are defined by constraining construction elements to dependency relations, as proposed by Property Grammar [2].
Construction definition
cxn_split_object:
  type: cxn
  class: cxn_split_object
  region: cxn_split_object
  attributes:
    nsubj:
      features: {optional: false, head: false}
      value: [ud_nsubj]
    verb:
      features: {optional: false, head: true}
      value: [pos_verb]
    obj:
      features: {optional: false, head: false}
      value: [ud_obj]
    x_part:
      type: xe
      value: [rel_is_part_of]
    x_frame:
      type: xe
      value: [frm_undergoing]
  constraints:
    - {arg1: verb, constraint: dominance, arg2: nsubj}
    - {arg1: verb, constraint: dominance, arg2: obj}
    - {arg1: nsubj, arg2: x_part, constraint: hasword}
    - {arg1: obj, arg2: x_part, constraint: hasword}
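The sketch below shows how the two `dominance` constraints in the definition above might be checked against a UD parse. Several things are assumptions: the parse tuples, the rule that `ud_*` values match dependency relations and `pos_*` values match POS tags, and the reading of `dominance` as "is an ancestor in the dependency tree". The `hasword` constraints are left out because they depend on the ontology.

```python
# Hypothetical UD parse of "o celular quebrou a tela":
# (id, form, upos, head id, deprel); head id 0 marks the root.
parse = [
    (1, "o", "DET", 2, "det"),
    (2, "celular", "NOUN", 3, "nsubj"),
    (3, "quebrou", "VERB", 0, "root"),
    (4, "a", "DET", 5, "det"),
    (5, "tela", "NOUN", 3, "obj"),
]

# Values copied from the definition above; the matching rule is assumed.
attributes = {"nsubj": "ud_nsubj", "verb": "pos_verb", "obj": "ud_obj"}
constraints = [("verb", "dominance", "nsubj"), ("verb", "dominance", "obj")]


def fill(parse, attributes):
    """Bind each construction element to the first token id matching its value."""
    binding = {}
    for name, value in attributes.items():
        kind, label = value.split("_", 1)
        for tid, _form, upos, _head, deprel in parse:
            observed = deprel if kind == "ud" else upos.lower()
            if observed == label:
                binding[name] = tid
                break
    return binding


def dominates(parse, head_id, dep_id):
    """True if head_id is an ancestor of dep_id in the dependency tree."""
    heads = {tid: head for tid, _f, _p, head, _d in parse}
    current = heads[dep_id]
    while current != 0:
        if current == head_id:
            return True
        current = heads[current]
    return False


def satisfied(parse, attributes, constraints):
    """Check that every element is bound and every dominance constraint holds."""
    binding = fill(parse, attributes)
    if len(binding) != len(attributes):
        return False  # a non-optional element is missing
    return all(dominates(parse, binding[a], binding[b]) for a, _c, b in constraints)


print(fill(parse, attributes))
print(satisfied(parse, attributes, constraints))
```

On this parse both dominance constraints hold, so only the `hasword` checks against `rel_is_part_of` would decide whether the sentence is a Split_object.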
Computational Processing
Topology
CARMA is a recursive hierarchical network and an elaborate pattern-matching system.
It is therefore amenable to some Machine Learning techniques.
RCN
RCN: Recursive Cortical Network [3]
Figure 3: Overview of RCN (source: [3])
RCN
RCN resembles an AVM
Figure 4: Detail of RCN (source: [3])
RCN Processing
RCN can be used for generation and inference (parsing)
Inference
• Belief propagation
• Forward-pass
• Backward-pass
CARMA processing
In CARMA we are interested in the parsing process.
Resources are kept in persistent storage:
• Lexicon in the FNBr database (MySQL)
• Frames, Constructions and Ontology exported to a Neo4j graph database
CARMA processing
1. The user inputs a sentence.
2. The sentence is parsed for UD (currently using the UDPipe parser).
3. The FNBr database is queried for wordforms, lexemes and lemmas.
4. A type network is built from the lexical items.
5. The graph database is queried to complete the type network.
6. The type network is traversed to create a token network.
7. Word nodes are activated, constraints are calculated and the activation spreads through the token network until it reaches a root node.
8. Activated construction nodes correspond to constructions detected in the sentence.
9. Conflicts (more than one construction activated) are resolved based on MAP (maximum a posteriori).
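Steps 7 to 9 can be sketched as follows. This is a toy illustration, not CARMA's algorithm: the network, the rule that a node fires once all its inputs are active, the identifier `cxn_transitive_action`, and the prior and likelihood numbers are all assumptions. The sketch also runs to a fixed point instead of stopping at a root node.

```python
# Hypothetical token network: each node lists the nodes it needs as input.
network = {
    "ud_nsubj": ["celular"],
    "pos_verb": ["quebrou"],
    "ud_obj": ["tela"],
    "rel_is_part_of": ["celular", "tela"],
    "cxn_split_object": ["ud_nsubj", "pos_verb", "ud_obj", "rel_is_part_of"],
    "cxn_transitive_action": ["ud_nsubj", "pos_verb", "ud_obj"],
}
words = {"celular", "quebrou", "tela"}


def spread(network, words):
    """Activate word nodes, then keep activating nodes whose inputs are all active."""
    active = set(words)
    changed = True
    while changed:
        changed = False
        for node, inputs in network.items():
            if node not in active and all(i in active for i in inputs):
                active.add(node)
                changed = True
    return active


# Made-up priors and likelihoods, only to illustrate the MAP decision.
prior = {"cxn_split_object": 0.2, "cxn_transitive_action": 0.8}
likelihood = {"cxn_split_object": 0.9, "cxn_transitive_action": 0.1}


def map_choice(candidates, prior, likelihood):
    """Pick the candidate maximizing prior * likelihood (posterior up to a constant)."""
    return max(candidates, key=lambda c: prior[c] * likelihood[c])


active = spread(network, words)
candidates = sorted(n for n in active if n.startswith("cxn_"))
winner = map_choice(candidates, prior, likelihood)
print(candidates, winner)
```

Here both constructions are activated, which is the conflict case of step 9, and the MAP rule selects the one with the larger product of prior and likelihood.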
CARMA processing
Figure 5: Partial view of activated network
Limitations and Outlook
Limitations
• The current version is at a very early stage
• UD parsing for Brazilian Portuguese is very limited and error-prone
• Some basic linguistic phenomena are not handled yet (e.g. Null Instantiation)
• and many others...
Outlook
• How to implement a learning process?
• How to use the analysis in the context of construction alignment?
• How many constraint types are needed?
• and many others...
Thank you!
References
[1] N. Bel, F. Busa, N. Calzolari, E. Gola, A. Lenci, M. Monachini, A. Ogonowski, I. Peters, W. Peters, N. Ruimy, M. Villegas, and A. Zampolli. SIMPLE: A General Framework for the Development of Multilingual Lexicons. Proceedings of the 2nd International Conference on Language Resources and Evaluation, 2000.
[2] P. Blache. Property Grammars: A Fully Constraint-Based Theory. In Christiansen H. et al. (eds.), Constraint Solving and Language Processing, Springer-Verlag, Berlin Heidelberg, pages 1–16, 2005.
[3] D. George, W. Lehrach, K. Kansky, M. Lazaro-Gredilla, C. Laan, B. Marthi, X. Lou, Z. Meng, and Y. Liu. A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs. Science, 358(6368), 2017.
[4] E. Matos, T. Torrent, V. Almeida, A. Laviola, L. Lage, N. Marção, and T. Tavares. Constructional Analysis Using Constrained Spreading Activation in a FrameNet-Based Structured Connectionist Model. The AAAI 2017 Spring Symposium on Computational Construction Grammar and Natural Language Understanding, Technical Report SS-17-02, pages 222–229, 2017.