A Neural Net Model for natural language learning: examples from cognitive
metaphor and constructional polysemy
Eleni Koutsomitopoulou
PhD candidate, Computational Linguistics, Georgetown University, Washington DC
and Senior Indexing Analyst, LexisNexis Butterworths Tolley, London UK

GURT 2003: Cognitive and Discourse Perspectives on Language and Language Learning
Summary of the presentation
• Older approaches to NL learning
• Initial motivations for the ART0 neural network model
• The Adaptive Resonance Theory approach
• Learning through differentiation
• Some critical questions in NL learning vis-à-vis cognition
• Illustrative examples from cognitive metaphor and constructional polysemy
• Conclusions
A high-level overview of related models
• The ‘classical’ hierarchical propositional approach (e.g. Quillian 1969)
• A distributed connectionist model that learns from exposure to information about the relations between concepts and their properties (Rumelhart & McClelland, PDP, 1986 et seq.)
• A natural language based propositional distributed connectionist model that learns about concepts and their properties in discourse.
Quillian’s Hierarchical Propositional Model
Initial Motivations for the Model
• Provide a connectionist alternative to traditional hierarchical propositional models of conceptual knowledge representation.
• Account for development of conceptual knowledge as a gradual process involving progressive differentiation.
The ART Approach
• Processing occurs via propagation of activation among simple processing units (represented as nodes in the network).
• Knowledge is stored in the weights on connections between the nodes (LTM), as well as in individual nodes (STM).
• Propositions are stored directly after being parsed and mapped as nodes in the network.
– The ability to produce resonant propositions from partial probes, based on previously learned propositional input, arises through the activation process, through the interaction between STM and LTM knowledge stored in the nodes and their interconnections respectively.
• Learning occurs via adjustment in time of the strength of the nodes and of their connections (ART differential equations).
• Semantic knowledge is gradually acquired through repeated exposure to new propositional input, mirroring the gradual nature of cognitive and NL development.
ART0 basic equations (Grossberg 1980, Loritz 2000)

STM (node activation):
  dx_j/dt = -C·x_j + D·x_j·(n_ij) - E·x_k·(n_kj)

LTM (connection weight):
  dz_ij/dt = -A·z_ij + B·x_i·x_j

where A = LTM decay, B = learning rate, C = inhibition, D = node excitation, E = node decay.
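As a rough illustration of how these two equations drive a simulation, here is a minimal Euler-step sketch. The values of B, C and the initial weight Zij are the ones quoted on the result slides; A, D, E, the step size, the three-node topology, the probe input, and the reading of n_ij as the signal x_i·z_ij are all illustrative assumptions, not the presentation's actual implementation.

```python
import numpy as np

# ART0 equations integrated with explicit Euler steps.
# From the slides: B (learning rate) = .45, C (inhibition) = .6,
# initial z_ij = .05. Everything else is an assumed value.
B, C, Zij0 = 0.45, 0.6, 0.05
A, D, E = 0.1, 0.5, 1.0            # LTM decay, node excitation, node decay (assumed)
dt = 0.01

x = np.zeros(3)                    # STM: node activations x_j
z = np.full((3, 3), Zij0)          # LTM: connection weights z_ij
np.fill_diagonal(z, 0.0)
probe = np.array([1.0, 0.8, 0.0])  # partial probe: node 2 receives no input

# Fast STM relaxation with LTM held fixed (STM and LTM evolve on
# different time scales). Reading n_ij as x_i * z_ij gives
# sum_i n_ij = (x @ z)_j and sum_k x_k * n_kj = (x**2 @ z)_j.
for _ in range(500):
    dx = -C * x + D * x * (x @ z) - E * ((x ** 2) @ z) + probe
    x += dt * dx

# One slow LTM step: Hebbian growth minus passive decay.
z += dt * (-A * z + B * np.outer(x, x))

print(np.round(x, 3))              # the probed nodes end up most active
```

The point of the sketch is only the shape of the dynamics: activations settle under decay, inhibition and excitation, and co-active nodes strengthen their mutual weights.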
Differentiation in Learning
Some critical questions in NL learning
– Which properties are central to particular natural language categories? (prototype effects; Rosch 1975 et seq.)
– How should properties be generalized from one category to another? (inference through experience)
– Must some “constraints” on acquiring natural language be available ‘initially’? (signal decay, habituation, rebounds, expectancies)
– Is reorganization of such NL knowledge possible through experience, and how?
ART0 Basics
– In the network, the salient properties of a given NL concept are represented in an antagonistic dipole anatomy. By probing the activation patterns of the mapped NL concepts we represent learning.
– The traditional notions of “category aptness” and “feature salience” are a matter of gradual structural and functional modification via specialization of the NL input.
– Attributes/concepts activated as part of the same pattern create conceptual clusters contiguous in semantic space, facilitating learning.
– Granularity: primary concepts (“feature-centric”) are the building blocks of more complex super-ordinate concepts (“cognitive categories”), but whether we classify (learn) “concepts” or “features”, we do it via the vehicle of NL propositions.
– Learning via self-similarity is easier, faster and more economical. Individual concepts (i.e. concepts in no relation with any others) are learned at a slower pace, and only after certain pertinent subnetworks have been acquired.

The principle of differentiation via inhibition and its effects on NL learning
Certain ART0 assumptions about Conceptual Reorganization
• General assumption: higher-level concepts are acquired only after certain crucial lower-level (primary) concepts have been acquired (Carey 1985). However, the acquisition comes via quantification (assimilation of information and acquisition via differentiation and classification), not qualification (granularity is irrelevant – there is no a priori hierarchy of concepts/features).
• Primary metaphors (Grady 1997) are basic dipole anatomies. Resemblance metaphors are complex conceptual clusters learned around each dipole.
• For the emergence of a new concept, or for feature assimilation, different kinds of information are needed. If a new concept cannot be readily accommodated in the cognitive system (because some prerequisite factoids have not yet been acquired), a new cognitive category is built to retain it in memory for as long as some supportive factoids reinforce this learning. If no related factoids are presented, the new category is “forgotten”.
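The retention rule in the last bullet can be caricatured in a few lines of code: a newly built category survives only while related factoids keep reinforcing it. The decay factor, boost and threshold below are invented for illustration and are not parameters of the model.

```python
# Toy retention rule: categories decay each step and are "forgotten"
# once their strength falls below a threshold; presenting a related
# factoid reinforces the matching category. All values are invented.
DECAY, BOOST, THRESHOLD = 0.8, 1.0, 0.2
categories = {}

def step(factoid=None):
    for c in categories:
        categories[c] *= DECAY               # passive decay
    if factoid is not None:
        categories[factoid] = categories.get(factoid, 0.0) + BOOST
    for c in [c for c, s in categories.items() if s < THRESHOLD]:
        del categories[c]                    # forgotten

for f in ["journey", "journey", "crossroads"]:
    step(f)                                  # retained while reinforced
for _ in range(10):
    step()                                   # no related factoids presented

print(sorted(categories))                    # both categories are forgotten
```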
Testing conceptual contiguity: methods (1)
• Representations are generated by using the ART differential equations, testing the effects of the nodes and weights across the links in the network.
• Instead of comparing separately trained representations, as a typical Rumelhart-McClelland model would, we examine patterns of activation, comparing the activation values to see whether the anticipated relationships between concepts were successfully modeled.
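A sketch of what that comparison amounts to in practice: one network's activation values are read off after each probe and checked against the anticipated relation. The node names and numbers below are invented for illustration, not the reported results.

```python
import numpy as np

# Activation vectors over the same nodes after two probes (invented values).
nodes = ["John", "Wilbur", "Hominidae", "Suidae", "pig"]
act_t  = np.array([0.1, 1.6, 0.1, 1.2, 1.4])   # after "Wilbur is a pig" (t)
act_t1 = np.array([1.5, 0.3, 0.9, 0.2, 1.3])   # after "John is a pig" (t+1)

def activation(node, act):
    return act[nodes.index(node)]

# Anticipated relation: the metaphoric probe at t+1 makes "pig"
# resonate with John rather than Wilbur.
print(activation("John", act_t1) > activation("Wilbur", act_t1))  # → True
```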
Domain specific vs. domain generic: methods (2)
• The simulation suggests that generic inter-domain/discourse learning mechanisms such as inhibition can teach the network the aptness of different features for different concepts, and of different concepts for different discourses.
• The network is able to map and acquire stable domain-specific conceptual knowledge.
• Knowledge acquisition in the network is possible via introduction/mapping of factoids based on NL input and native speaker intuitions about it, without the need for initial or a priori domain knowledge.
Running the ART0 simulations
• First we construct a few “minimal anatomies” which display requisite properties such as stability in the face of (plastic) inputs and LTM stability in the absence of inputs. These minimal anatomies are generated by metaphoric and non-metaphoric sentential inputs to an artificial neural network constructed on ART principles.
• The ART network takes as input parse trees for sentences drawn from some major classes of metaphor identified by CMT. A basic parser generates the parse trees, and the parse tree of each input sentence is converted to a resonant network according to the ART equations. Each input sentence is connected (mapped) to the network at the terminal nodes, i.e. the lexical items, which may be common to multiple input sentences.
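The mapping step in the second bullet can be sketched as follows: each sentence gets a sentence node S1, S2, …, linked to terminal nodes for its lexical items, so sentences sharing a word share a terminal. Token-level splitting stands in here for the basic parser the slides mention, and the function name is made up for the example.

```python
from collections import defaultdict

def map_sentences(sentences):
    """Return an undirected graph {node: set(neighbours)} in which
    sentence nodes connect to shared lexical terminal nodes."""
    net = defaultdict(set)
    for k, sent in enumerate(sentences, start=1):
        s_node = f"S{k}"
        for word in sent.lower().rstrip(".").split():
            net[s_node].add(word)     # sentence -> terminal
            net[word].add(s_node)     # terminal -> sentence (resonant link)
    return dict(net)

net = map_sentences([
    "John is a Hominidae.",
    "Wilbur is a Suidae.",
    "Wilbur is a pig.",
])
print(sorted(net["wilbur"]))          # → ['S2', 'S3']: a shared terminal
```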
Conceptual Reorganization in the Model
• The ART0 simulation model provides a vehicle for exploring how conceptual reorganization can occur.
– By changing the links (relations) between the nodes (concepts), as well as the nodes involved in each simulation, the ART0 model is capable of forming initial representations based on “superficial” appearances (for instance, internal sentence structure).
– Later, after the phasic input has been introduced to the network, the model reorganizes its previous representations as it learns new discourse-dependent concept relations.
– The network can categorize patterns across different discourses, and the emergent structure may be used as a basis for deeper NL understanding.
Examples
Metaphoric feature probe (resemblance metaphor)
• John is a Hominidae.
• Wilbur is a Suidae.
• Wilbur is a pig.
t --------------------------------------------------- t+1
• John is a pig.
What the network looks like: resemblance metaphor

[Network diagram: at time t, sentence nodes S1–S3 link John–Hominidae, Wilbur–Suidae and Wilbur–pig through shared terminal nodes; at t+1 the probe node S4 links John–pig.]
Experimental results (activation patterns)

  argument                           value
  Was_kicked_out_of_the_house (S1)   7.930
  Was_asked_out_the_house (S3)       3.648

  argument      value
  Suidae (S2)   5.227
  Suidae (S1)  -2.717

  parameters: B (learning rate) = .45, C (inhibition) = .6, Zij (connection weight) = .05
Orientational primary metaphor
• The boy ran down the stairs.
• Mary feels down.
• John feels bad.
t ---------------------------------------------- t+1
• Down is bad.
What the network looks like: orientational metaphor

[Network diagram: at time t, sentence nodes S1–S3 share the terminals be/feel, down and bad; at t+1 the probe node S4 (Down is bad) links down–bad.]
Experimental results (activation patterns)

  argument     value
  down (S2)    5.227
  down (S1)   -2.717

  parameters: B (learning rate) = .45, C (inhibition) = .6, Zij (connection weight) = .05
A glimpse at event-structure metaphor
• John is at a crossroads in his business.
• John is at a crossroads in his life.
• Life is a journey.
• A journey may lead to an intersection.
t ---------------------------------------------- t+1
• John is at an intersection.
What the network looks like: event-structure metaphor

[Network diagram: at time t, sentence nodes S1–S4 share the terminals business, life, crossroads, journey and intersection; at t+1 the probe node S5 links John–intersection.]
Experimental results (activation patterns)

  argument          value
  crossroads (S2)   13.616
  crossroads (S1)    1.164

  parameters: B (learning rate) = .45, C (inhibition) = .6, Zij (connection weight) = .05
Constructional polysemy
• The dog was kicked out of the house.
• John was asked out of the house.
• John is out of the house.
t ------------------------------------------------- t+1
• Bill is out of the house.
What the network looks like: constructional polysemy

[Network diagram: at time t, sentence nodes S1 (dog, was kicked), S2 (John, was asked) and S3 (John, is) share the terminal out of the house; at t+1 the probe node S4 links Bill–is–out of the house.]
Experimental results (activation patterns)

  argument                           value
  Was_kicked_out_of_the_house (S2)   7.930
  Was_asked_out_the_house (S1)       3.648

  parameters: B (learning rate) = .45, C (inhibition) = .6, Zij (connection weight) = .05
Conclusions
• The model exhibits certain characteristics of human cognition, and of NL learning in particular.
• It does this simply by mapping NL propositional input as nodes in the network, by adjusting over time both the weights on the connections and the connectivity and activation patterns of individual nodes, and by propagating signals forward (in time and structure) through these connections.
Review of ART0 system features
– It provides explicit mechanisms indicating how intra-domain and inter-domain knowledge influences semantic cognition and NL learning.
– It offers a learning process that provides a means for the acquisition of such knowledge.
– It demonstrates that some of the sorts of constraints people have suggested might be innate can in fact be acquired from experience.
– Unlike other connectionist models (e.g. PDP), the ART0 learning algorithm emphasizes the role of memory in NL learning.