Modelling Language Evolution
Lecture 2: Learning Syntax
Simon Kirby
University of Edinburgh
Language Evolution & Computation Research Unit
Multi-layer networks
For many modelling problems, multi-layer networks are used
Three layers are common: an input layer, a hidden layer and an output layer
What do the hidden-node activations correspond to?
Internal representation: for some problems, networks need to compute an “intermediate” representation of the data
XOR network - step 1
XOR is the same as “OR but not AND”: calculate OR, calculate NOT AND, then AND the two results
[Diagram: an OR unit and a NOT AND unit feed an AND unit]
XOR network - step 2
[Diagram: INPUT 1 and INPUT 2 feed HIDDEN 1 and HIDDEN 2, which feed the OUTPUT; a BIAS NODE connects to the hidden and output units. HIDDEN 1 computes OR (input weights 10, 10; bias weight -7.5), HIDDEN 2 computes NOT AND (input weights -5, -5; bias weight 7.5), and the OUTPUT ANDs the two hidden units (weights 5, 5; bias weight -7.5).]
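As a concrete check of the arithmetic, here is a minimal sketch (not from the original slides) of the hand-wired network above, using a hard threshold activation for clarity; with a sigmoid and these weights the outputs come out close to 0 and 1.

```python
def step(x):
    # Hard threshold: the unit fires (1) if its net input is positive, else 0.
    return int(x > 0)

def xor_net(i1, i2):
    # Hidden unit 1 computes OR: input weights 10, 10; bias weight -7.5.
    h1 = step(10 * i1 + 10 * i2 - 7.5)
    # Hidden unit 2 computes NOT AND: input weights -5, -5; bias weight 7.5.
    h2 = step(-5 * i1 - 5 * i2 + 7.5)
    # The output unit ANDs the two hidden units: weights 5, 5; bias weight -7.5.
    return step(5 * h1 + 5 * h2 - 7.5)

for i1 in (0, 1):
    for i2 in (0, 1):
        print(i1, i2, "->", xor_net(i1, i2))   # prints 0, 1, 1, 0 in turn
```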
Simple example (Smith 2003)
Smith wanted to model a simple language-using population, and needed a model that learned vocabulary
3 “meanings”: (1 0 0), (0 1 0), (0 0 1); 6 possible signals: (0 0 0), (1 0 0), (1 1 0) …
Used networks for reception and production:
Production network: MEANING → SIGNAL
Reception network: SIGNAL → MEANING
After training, knowledge of the language is stored in the weights
During reception/production, the internal representation is in the activations of the hidden nodes
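A minimal sketch of such a pair of networks. The hidden-layer size, weight initialisation and sigmoid activation here are illustrative assumptions rather than Smith's exact setup; the point is just that production maps a meaning vector to a signal vector, reception maps a signal back to a meaning, and the hidden activations carry the internal representation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class FeedForwardNet:
    """Tiny three-layer network; after training, the agent's knowledge
    of its vocabulary would be stored in W1 and W2."""
    def __init__(self, n_in, n_hidden, n_out):
        self.W1 = rng.normal(0.0, 0.5, (n_hidden, n_in))   # input -> hidden
        self.W2 = rng.normal(0.0, 0.5, (n_out, n_hidden))  # hidden -> output

    def forward(self, x):
        self.hidden = sigmoid(self.W1 @ x)    # internal representation
        return sigmoid(self.W2 @ self.hidden)

production = FeedForwardNet(n_in=3, n_hidden=4, n_out=3)  # meaning -> signal
reception = FeedForwardNet(n_in=3, n_hidden=4, n_out=3)   # signal -> meaning

meaning = np.array([1.0, 0.0, 0.0])      # one of the 3 meanings
signal = production.forward(meaning)      # produce a (graded) signal
recovered = reception.forward(signal)     # interpret it again
```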
Can a network learn syntax? (Elman 1993)
Important question for the evolution of language: how much knowledge of grammar are we born with?
Modelling can tell us what we can do without
Can we model the acquisition of syntax using a neural network?
One problem… sentences can be arbitrarily long
Representing time
Imagine we presented words one at a time to a network
Would it matter what order the words were given? No: each word is a brand new experience
The net has no way of relating each experience with what has gone before
It needs some kind of working memory
Intuitively: each word needs to be presented along with what the network was thinking about when it heard the previous word
The Simple Recurrent Net (SRN)
At each time step, the input is: a new experience plus a copy of the hidden unit activations at the last time step
[Diagram: the Input and Context layers feed the Hidden layer, which feeds the Output layer; copy-back connections copy the Hidden activations into the Context layer.]
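A minimal sketch of an SRN forward pass with the copy-back connections; the layer sizes, the weight scale, and the initial context value of 0.5 are illustrative assumptions rather than Elman's exact parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SRN:
    """Simple Recurrent Net: at each step the hidden activations are
    copied into the context units and fed back in alongside the next input."""
    def __init__(self, n_in, n_hidden, n_out, rng):
        self.W_ih = rng.normal(0.0, 0.1, (n_hidden, n_in))      # input -> hidden
        self.W_ch = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # context -> hidden
        self.W_ho = rng.normal(0.0, 0.1, (n_out, n_hidden))     # hidden -> output
        self.context = np.full(n_hidden, 0.5)                   # "blank" memory

    def step(self, x):
        # New experience plus a copy of last step's hidden activations.
        hidden = sigmoid(self.W_ih @ x + self.W_ch @ self.context)
        output = sigmoid(self.W_ho @ hidden)
        self.context = hidden.copy()    # copy-back connections
        return output

srn = SRN(n_in=10, n_hidden=20, n_out=10, rng=np.random.default_rng(0))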
What inputs and outputs?
How do we force the network to learn syntactic relations?
Can we do it without an external “teacher”?
Answer: the next-word prediction task
Inputs: current word (and context)
Outputs: predicted next word
The error signal is implicit in the data
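A small illustration of why no external teacher is needed: the target at each step is simply whatever word actually comes next, so the error signal can be read straight off the corpus.

```python
# Build (current word, next word) training pairs from a sentence.
sentence = ["boys", "who", "chase", "dogs", "see", "girls"]
pairs = list(zip(sentence[:-1], sentence[1:]))
# [('boys', 'who'), ('who', 'chase'), ('chase', 'dogs'),
#  ('dogs', 'see'), ('see', 'girls')]
```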
Long distance dependencies and hierarchy
Elman’s question: how much is innate? Many argue:
Long-distance dependencies and hierarchical embedding are “unlearnable” without an innate language faculty
How well can an SRN learn them? Examples:
1. boys who chase dogs see girls
2. cats chase dogs
3. dogs see boys who cats who mary feeds chase
4. mary walks
First experiments
Each word is encoded as a single unit “on” in the input.
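For illustration, a localist encoding over a toy vocabulary drawn from the example sentences above; the actual training vocabulary and unit ordering are assumptions here.

```python
import numpy as np

vocab = ["boys", "girls", "girl", "dogs", "cats", "mary",
         "who", "chase", "see", "feeds", "walks"]

def encode(word):
    # One input unit per word: exactly one unit is "on" for each word.
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

encode("dogs")   # a vector of zeros with a single 1.0 at the "dogs" unit
```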
Initial results
How can we tell if the net has learned syntax? Check whether it predicts the correct number agreement
The net gets some things right, but makes many mistakes
It seems not to have learned long-distance dependency, e.g.:
boys who girl chase see dog
Incremental input
Elman tried teaching the network in stages
Five stages (sketched in code below):
1. 10,000 simple sentences (x 5)
2. 7,500 simple + 2,500 complex (x 5)
3. 5,000 simple + 5,000 complex (x 5)
4. 2,500 simple + 7,500 complex (x 5)
5. 10,000 complex sentences (x 5)
Surprisingly, this training regime led to success!
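A sketch of that staged regime. The helpers make_corpus and train_one_pass are hypothetical stand-ins (stubbed out below) for sentence generation and one training pass of the SRN; the stage sizes and the five repetitions per stage come from the slide above.

```python
def make_corpus(n_simple, n_complex):
    """Hypothetical helper: would generate the requested mix of simple
    and complex sentences; stubbed out in this sketch."""
    return []

def train_one_pass(net, corpus):
    """Hypothetical helper: one next-word-prediction training pass."""
    pass

def train_incrementally(srn):
    # Five stages, from simple-only to complex-only input;
    # each stage's corpus is presented to the network 5 times.
    stages = [
        (10000, 0),      # stage 1: 10,000 simple sentences
        (7500, 2500),    # stage 2
        (5000, 5000),    # stage 3
        (2500, 7500),    # stage 4
        (0, 10000),      # stage 5: 10,000 complex sentences
    ]
    for n_simple, n_complex in stages:
        corpus = make_corpus(n_simple, n_complex)
        for _ in range(5):
            train_one_pass(srn, corpus)
```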
Is this realistic?
Elman reasons that this is in some ways like children’s behaviour
Children seem to learn to produce simple sentences first
Is this a reasonable suggestion? Where is the incremental input coming from?
The developmental schedule appears to be a product of changing the input.
Another route to incremental learning
Rather than the experimenter selecting simple, then complex sentences, could the change come from the network itself?
Children’s data isn’t changing… children are changing
Elman gets the network to change throughout its “life”
What is a reasonable way for the network to change?
One possibility: memory
Reducing the attention span of a network
Destroy memory by setting the context nodes to 0.5
Five stages of learning (with both simple and complex sentences; sketched in code below):
1. Memory blanked every 3-4 words (x 12)
2. Memory blanked every 4-5 words (x 5)
3. Memory blanked every 5-6 words (x 5)
4. Memory blanked every 6-7 words (x 5)
5. No memory limitations (x 5)
The network learned the task.
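And a corresponding sketch of the “starting small” manipulation: the input mix stays the same throughout, but the SRN's context units are periodically reset to 0.5, with the memory window widening stage by stage. The fixed-length window (rather than Elman's 3-4, 4-5, … word ranges) and the train_step helper are simplifying assumptions; srn is an SRN instance as in the earlier sketch.

```python
def train_step(net, current_word, next_word):
    """Hypothetical helper: one next-word-prediction update of the SRN."""
    pass

def train_starting_small(srn, corpus):
    # (approximate memory window in words, passes through the corpus)
    stages = [(3, 12), (4, 5), (5, 5), (6, 5), (None, 5)]
    for window, passes in stages:
        for _ in range(passes):
            for sentence in corpus:
                for i, (word, nxt) in enumerate(zip(sentence[:-1], sentence[1:]), 1):
                    train_step(srn, word, nxt)
                    if window is not None and i % window == 0:
                        srn.context[:] = 0.5   # blank the working memory
```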
Counter-intuitive conclusion: starting small
A fully-functioning network cannot learn syntax
A network that is initially limited (but matures) learns well
This seems a strange result, suggesting that networks aren’t good models of language learning after all
On the other hand…
Children mature during learning
Infancy in humans is prolonged relative to other species
Ultimate language ability seems to be related to how early learning starts, i.e., there is a critical period for language acquisition
Next lecture
We’ve seen how we can model aspects of language learning in simulations
What about evolution?
Cultural evolution
Individual learning
Biological evolution