Modelling Language Evolution
Lecture 2: Learning Syntax
Simon Kirby
University of Edinburgh
Language Evolution & Computation Research Unit
Multi-layer networks
For many modelling problems, multi-layer networks are used
Three layers are common: an input layer, a hidden layer and an output layer
What do the hidden-node activations correspond to?
Internal representation: for some problems, networks need to compute an “intermediate” representation of the data
XOR network - step 1
XOR is the same as “OR but not AND”: calculate OR, calculate NOT AND, then AND the two results
[Diagram: an OR unit and a NOT AND unit feed an AND unit]
XOR network - step 2
[Diagram: INPUT 1 and INPUT 2 feed HIDDEN 1 and HIDDEN 2, which feed the OUTPUT; a BIAS NODE connects to the hidden and output units. HIDDEN 1 computes OR (input weights 10, 10; bias weight -7.5), HIDDEN 2 computes NOT AND (input weights -5, -5; bias weight 7.5), and the OUTPUT ANDs the two hidden units (weights 5, 5; bias weight -7.5).]
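As a concrete check of the arithmetic, here is a minimal sketch (not from the original slides) of the hand-wired network above, using a hard threshold activation for clarity; with a sigmoid and these weights the outputs come out close to 0 and 1.

```python
def step(x):
    # Hard threshold: the unit fires (1) if its net input is positive, else 0.
    return int(x > 0)

def xor_net(i1, i2):
    # Hidden unit 1 computes OR: input weights 10, 10; bias weight -7.5.
    h1 = step(10 * i1 + 10 * i2 - 7.5)
    # Hidden unit 2 computes NOT AND: input weights -5, -5; bias weight 7.5.
    h2 = step(-5 * i1 - 5 * i2 + 7.5)
    # The output unit ANDs the two hidden units: weights 5, 5; bias weight -7.5.
    return step(5 * h1 + 5 * h2 - 7.5)

for i1 in (0, 1):
    for i2 in (0, 1):
        print(i1, i2, "->", xor_net(i1, i2))   # prints 0, 1, 1, 0 in turn
```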
Simple example (Smith 2003)
Smith wanted to model a simple language-using population, and needed a model that learned vocabulary
3 “meanings”: (1 0 0), (0 1 0), (0 0 1); 6 possible signals: (0 0 0), (1 0 0), (1 1 0) …
Used networks for reception and production:
Production network: MEANING → SIGNAL
Reception network: SIGNAL → MEANING
After training, knowledge of the language is stored in the weights
During reception/production, the internal representation is in the activations of the hidden nodes
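A minimal sketch of such a pair of networks. The hidden-layer size, weight initialisation and sigmoid activation here are illustrative assumptions rather than Smith's exact setup; the point is just that production maps a meaning vector to a signal vector, reception maps a signal back to a meaning, and the hidden activations carry the internal representation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class FeedForwardNet:
    """Tiny three-layer network; after training, the agent's knowledge
    of its vocabulary would be stored in W1 and W2."""
    def __init__(self, n_in, n_hidden, n_out):
        self.W1 = rng.normal(0.0, 0.5, (n_hidden, n_in))   # input -> hidden
        self.W2 = rng.normal(0.0, 0.5, (n_out, n_hidden))  # hidden -> output

    def forward(self, x):
        self.hidden = sigmoid(self.W1 @ x)    # internal representation
        return sigmoid(self.W2 @ self.hidden)

production = FeedForwardNet(n_in=3, n_hidden=4, n_out=3)  # meaning -> signal
reception = FeedForwardNet(n_in=3, n_hidden=4, n_out=3)   # signal -> meaning

meaning = np.array([1.0, 0.0, 0.0])      # one of the 3 meanings
signal = production.forward(meaning)      # produce a (graded) signal
recovered = reception.forward(signal)     # interpret it again
```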
Can a network learn syntax? (Elman 1993)
Important question for the evolution of language: how much knowledge of grammar are we born with?
Modelling can tell us what we can do without
Can we model the acquisition of syntax using a neural network?
One problem… sentences can be arbitrarily long
Representing time
Imagine we presented words one at a time to a network
Would it matter what order the words were given? No: each word is a brand new experience
The net has no way of relating each experience with what has gone before
It needs some kind of working memory
Intuitively: each word needs to be presented along with what the network was thinking about when it heard the previous word
The Simple Recurrent Net (SRN)
At each time step, the input is: a new experience plus a copy of the hidden unit activations at the last time step
[Diagram: the Input and Context layers feed the Hidden layer, which feeds the Output layer; copy-back connections copy the Hidden activations into the Context layer.]
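A minimal sketch of an SRN forward pass with the copy-back connections; the layer sizes, the weight scale, and the initial context value of 0.5 are illustrative assumptions rather than Elman's exact parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SRN:
    """Simple Recurrent Net: at each step the hidden activations are
    copied into the context units and fed back in alongside the next input."""
    def __init__(self, n_in, n_hidden, n_out, rng):
        self.W_ih = rng.normal(0.0, 0.1, (n_hidden, n_in))      # input -> hidden
        self.W_ch = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # context -> hidden
        self.W_ho = rng.normal(0.0, 0.1, (n_out, n_hidden))     # hidden -> output
        self.context = np.full(n_hidden, 0.5)                   # "blank" memory

    def step(self, x):
        # New experience plus a copy of last step's hidden activations.
        hidden = sigmoid(self.W_ih @ x + self.W_ch @ self.context)
        output = sigmoid(self.W_ho @ hidden)
        self.context = hidden.copy()    # copy-back connections
        return output

srn = SRN(n_in=10, n_hidden=20, n_out=10, rng=np.random.default_rng(0))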
What inputs and outputs?
How do we force the network to learn syntactic relations?
Can we do it without an external “teacher”?
Answer: the next-word prediction task
Inputs: current word (and context)
Outputs: predicted next word
The error signal is implicit in the data
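A small illustration of why no external teacher is needed: the target at each step is simply whatever word actually comes next, so the error signal can be read straight off the corpus.

```python
# Build (current word, next word) training pairs from a sentence.
sentence = ["boys", "who", "chase", "dogs", "see", "girls"]
pairs = list(zip(sentence[:-1], sentence[1:]))
# [('boys', 'who'), ('who', 'chase'), ('chase', 'dogs'),
#  ('dogs', 'see'), ('see', 'girls')]
```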
Long distance dependencies and hierarchy
Elman’s question: how much is innate? Many argue:
Long-distance dependencies and hierarchical embedding are “unlearnable” without an innate language faculty
How well can an SRN learn them? Examples:
1. boys who chase dogs see girls
2. cats chase dogs
3. dogs see boys who cats who mary feeds chase
4. mary walks
First experiments
Each word is encoded as a single unit “on” in the input.
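For illustration, a localist encoding over a toy vocabulary drawn from the example sentences above; the actual training vocabulary and unit ordering are assumptions here.

```python
import numpy as np

vocab = ["boys", "girls", "girl", "dogs", "cats", "mary",
         "who", "chase", "see", "feeds", "walks"]

def encode(word):
    # One input unit per word: exactly one unit is "on" for each word.
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

encode("dogs")   # a vector of zeros with a single 1.0 at the "dogs" unit
```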
Initial results
How can we tell if the net has learned syntax? Check whether it predicts the correct number agreement
The net gets some things right, but makes many mistakes
It seems not to have learned long-distance dependency, e.g.:
boys who girl chase see dog
Incremental input
Elman tried teaching the network in stages
Five stages (sketched in code below):
1. 10,000 simple sentences (x 5)
2. 7,500 simple + 2,500 complex (x 5)
3. 5,000 simple + 5,000 complex (x 5)
4. 2,500 simple + 7,500 complex (x 5)
5. 10,000 complex sentences (x 5)
Surprisingly, this training regime led to success!
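A sketch of that staged regime. The helpers make_corpus and train_one_pass are hypothetical stand-ins (stubbed out below) for sentence generation and one training pass of the SRN; the stage sizes and the five repetitions per stage come from the slide above.

```python
def make_corpus(n_simple, n_complex):
    """Hypothetical helper: would generate the requested mix of simple
    and complex sentences; stubbed out in this sketch."""
    return []

def train_one_pass(net, corpus):
    """Hypothetical helper: one next-word-prediction training pass."""
    pass

def train_incrementally(srn):
    # Five stages, from simple-only to complex-only input;
    # each stage's corpus is presented to the network 5 times.
    stages = [
        (10000, 0),      # stage 1: 10,000 simple sentences
        (7500, 2500),    # stage 2
        (5000, 5000),    # stage 3
        (2500, 7500),    # stage 4
        (0, 10000),      # stage 5: 10,000 complex sentences
    ]
    for n_simple, n_complex in stages:
        corpus = make_corpus(n_simple, n_complex)
        for _ in range(5):
            train_one_pass(srn, corpus)
```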
Is this realistic?
Elman reasons that this is in some ways like children’s behaviour
Children seem to learn to produce simple sentences first
Is this a reasonable suggestion? Where is the incremental input coming from?
The developmental schedule appears to be a product of changing the input.
Another route to incremental learning
Rather than the experimenter selecting simple, then complex sentences, could the change come from the network itself?
Children’s data isn’t changing… children are changing
Elman gets the network to change throughout its “life”
What is a reasonable way for the network to change?
One possibility: memory
Reducing the attention span of a network
Destroy memory by setting the context nodes to 0.5
Five stages of learning (with both simple and complex sentences; sketched in code below):
1. Memory blanked every 3-4 words (x 12)
2. Memory blanked every 4-5 words (x 5)
3. Memory blanked every 5-6 words (x 5)
4. Memory blanked every 6-7 words (x 5)
5. No memory limitations (x 5)
The network learned the task.
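And a corresponding sketch of the “starting small” manipulation: the input mix stays the same throughout, but the SRN's context units are periodically reset to 0.5, with the memory window widening stage by stage. The fixed-length window (rather than Elman's 3-4, 4-5, … word ranges) and the train_step helper are simplifying assumptions; srn is an SRN instance as in the earlier sketch.

```python
def train_step(net, current_word, next_word):
    """Hypothetical helper: one next-word-prediction update of the SRN."""
    pass

def train_starting_small(srn, corpus):
    # (approximate memory window in words, passes through the corpus)
    stages = [(3, 12), (4, 5), (5, 5), (6, 5), (None, 5)]
    for window, passes in stages:
        for _ in range(passes):
            for sentence in corpus:
                for i, (word, nxt) in enumerate(zip(sentence[:-1], sentence[1:]), 1):
                    train_step(srn, word, nxt)
                    if window is not None and i % window == 0:
                        srn.context[:] = 0.5   # blank the working memory
```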
Counter-intuitive conclusion: starting small
A fully-functioning network cannot learn syntax
A network that is initially limited (but matures) learns well
This seems a strange result, suggesting that networks aren’t good models of language learning after all
On the other hand…
Children mature during learning
Infancy in humans is prolonged relative to other species
Ultimate language ability seems to be related to how early learning starts, i.e., there is a critical period for language acquisition
Next lecture
We’ve seen how we can model aspects of language learning in simulations
What about evolution?
Cultural evolution
Individual learning
Biological evolution