Page 1:

Modelling Language Evolution
Lecture 2: Learning Syntax

Simon Kirby

University of Edinburgh

Language Evolution & Computation Research Unit

Page 2:

Multi-layer networks

For many modelling problems, multi-layer networks are used.

Three layers are common:
- Input layer
- Hidden layer
- Output layer

What do the hidden-node activations correspond to?

Internal representation: for some problems, networks need to compute an "intermediate" representation of the data.

Page 3:

XOR network - step 1

XOR is the same as OR but not AND:
- Calculate OR
- Calculate NOT AND
- AND the results

[Diagram: the inputs feed an OR unit and a NOT AND unit, whose outputs feed an AND unit.]

Page 4:

XOR network - step 2

[Diagram: the XOR network, with INPUT 1 and INPUT 2 feeding HIDDEN 1 (OR) and HIDDEN 2 (NOT AND), which feed the OUTPUT unit (AND), plus a BIAS NODE. The weights shown are 10, 10, 7.5, -7.5, -7.5, 5, 5, -5, -5.]
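A minimal sketch, not taken from the slide, of a hand-wired network like this one. The units are simple threshold units, and the weights are one possible choice that realises the OR / NOT AND / AND decomposition; they are not necessarily the exact values shown in the diagram:

```python
def step(x):
    """Threshold activation: the unit fires (1) if its net input is positive."""
    return 1 if x > 0 else 0

def xor_net(i1, i2):
    h_or = step(5 * i1 + 5 * i2 - 2.5)         # hidden unit 1: OR of the inputs
    h_nand = step(-5 * i1 - 5 * i2 + 7.5)      # hidden unit 2: NOT AND of the inputs
    return step(10 * h_or + 10 * h_nand - 15)  # output unit: AND of the hidden units

for i1, i2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(i1, i2, "->", xor_net(i1, i2))       # prints 0, 1, 1, 0
```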

Page 5:

Simple example (Smith 2003)

Smith wanted to model a simple language-using population, and needed a model that learned vocabulary:
- 3 "meanings": (1 0 0), (0 1 0), (0 0 1)
- 6 possible signals: (0 0 0), (1 0 0), (1 1 0) …

Used networks for reception and production:

[Diagram: a production network mapping MEANING to SIGNAL, and a reception network mapping SIGNAL to MEANING, each used both to perform and to train.]

After training, knowledge of the language is stored in the weights. During reception/production, the internal representation is in the activations of the hidden nodes.
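A minimal sketch of the kind of production network described here, assuming sigmoid units, one hidden layer, and plain backpropagation; the layer sizes, learning rate, and the particular meaning-signal pairing are illustrative rather than Smith's actual settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Production network: 3 meaning units -> 4 hidden units -> 3 signal units.
W1 = rng.normal(0, 0.5, (4, 3))   # meaning -> hidden weights
W2 = rng.normal(0, 0.5, (3, 4))   # hidden -> signal weights

meanings = np.eye(3)                                    # (1 0 0), (0 1 0), (0 0 1)
signals = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0]])   # an arbitrary target vocabulary

# Train with plain backpropagation (learning rate 0.5).
for _ in range(5000):
    for m, s in zip(meanings, signals):
        h = sigmoid(W1 @ m)              # hidden activations: the internal representation
        o = sigmoid(W2 @ h)              # produced signal
        delta_o = (o - s) * o * (1 - o)
        delta_h = (W2.T @ delta_o) * h * (1 - h)
        W2 -= 0.5 * np.outer(delta_o, h)
        W1 -= 0.5 * np.outer(delta_h, m)

# After training, the knowledge of the vocabulary sits in W1 and W2.
print(np.round(sigmoid(W2 @ sigmoid(W1 @ meanings[1]))))  # ~ (1 0 0)
```

A reception network would be the mirror image: signal in, meaning out.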

Page 6:

Can a network learn syntax? (Elman 1993)

Important question for the evolution of language:

Modelling can tell us what we can do without.

Can we model the acquisition of syntax using a neural network?

One problem… sentences can be arbitrarily long

How much knowledge of grammar are we born with?

Page 7:

Representing time

Imagine we presented words one at a time to a network

Would it matter what order the words were given? No: each word is a brand new experience.

The net has no way of relating each experience with what has gone before

Needs some kind of working memory. Intuitively: each word needs to be presented along with what the network was thinking about when it heard the previous word.

Page 8:

The Simple Recurrent Net (SRN)

At each time step, the input is: a new experience plus a copy of the hidden unit activations at the last time step

[Diagram: the SRN architecture. The Input and Context layers feed the Hidden layer, which feeds the Output layer; "copy back" connections copy the hidden activations into the Context layer for the next time step.]
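A minimal sketch of one SRN time step, assuming sigmoid units; the layer sizes and weight initialisation are illustrative. The hidden activations from the previous step are copied back and fed in alongside the new word:

```python
import numpy as np

rng = np.random.default_rng(0)

N_WORDS, N_HIDDEN = 10, 8                          # illustrative sizes

W_in = rng.normal(0, 0.1, (N_HIDDEN, N_WORDS))     # input -> hidden weights
W_ctx = rng.normal(0, 0.1, (N_HIDDEN, N_HIDDEN))   # context -> hidden weights
W_out = rng.normal(0, 0.1, (N_WORDS, N_HIDDEN))    # hidden -> output weights

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def srn_step(word_vec, context):
    """One time step: a new word plus the copied-back hidden activations."""
    hidden = sigmoid(W_in @ word_vec + W_ctx @ context)
    output = sigmoid(W_out @ hidden)   # prediction of the next word
    return output, hidden              # hidden is copied back as the next context

# Feed a short word sequence through the net, one word at a time.
context = np.full(N_HIDDEN, 0.5)       # neutral starting context
for word in [0, 3, 7]:                 # illustrative word indices
    x = np.zeros(N_WORDS)
    x[word] = 1.0                      # one word = one unit "on"
    prediction, context = srn_step(x, context)
```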

Page 9:

What inputs and outputs?

How do we force the network to learn syntactic relations?

Can we do it without an external “teacher”?

Answer: the next-word prediction task.
- Inputs: current word (and context)
- Outputs: predicted next word

The error signal is implicit in the data
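Continuing the SRN sketch above, a minimal illustration of the prediction task: the target at each step is simply the next word in the sentence, so the error signal comes from the data itself (the weight update is omitted):

```python
sentence = [2, 5, 1, 9]                  # illustrative word indices for one sentence
context = np.full(N_HIDDEN, 0.5)
for current, nxt in zip(sentence, sentence[1:]):
    x = np.zeros(N_WORDS)
    x[current] = 1.0                     # input: the current word (plus the context)
    prediction, context = srn_step(x, context)
    target = np.zeros(N_WORDS)
    target[nxt] = 1.0                    # output target: the next word
    error = target - prediction          # no external teacher needed
    # ...backpropagate error to update the weights...
```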

Page 10:

Long distance dependencies and hierarchy

Elman’s question: how much is innate? Many argue:

Long-distance dependencies and hierarchical embedding are "unlearnable" without an innate language faculty

How well can an SRN learn them? Examples:

1. boys who chase dogs see girls

2. cats chase dogs

3. dogs see boys who cats who mary feeds chase

4. mary walks

Page 11:

First experiments

Each word encoded as a single unit “on” in the input.
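A minimal sketch of this localist ("one unit per word") encoding, using the words from the example sentences above; the vocabulary list is illustrative:

```python
VOCAB = ["boys", "who", "chase", "dogs", "see", "girls", "cats", "mary", "feeds", "walks"]

def encode(word):
    """Encode a word as a vector with a single unit 'on'."""
    vec = [0] * len(VOCAB)
    vec[VOCAB.index(word)] = 1
    return vec

print(encode("dogs"))  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```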

Page 12:

Initial results

How can we tell if the net has learned syntax? Check whether it predicts the correct number agreement.

Gets some things right, but makes many mistakes.

Seems not to have learned long-distance dependencies, e.g.:

boys who girl chase see dog

Page 13:

Incremental input

Elman tried teaching the network in stages. Five stages:

1. 10,000 simple sentences (x 5)

2. 7,500 simple + 2,500 complex (x 5)

3. 5,000 simple + 5,000 complex (x 5)

4. 2,500 simple + 7,500 complex (x 5)

5. 10,000 complex sentences (x 5)

Surprisingly, this training regime led to success!
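A minimal sketch of this staged training regime, assuming hypothetical helpers make_corpus and train_epoch (placeholders, not Elman's actual code):

```python
# (number of simple sentences, number of complex sentences, passes) per stage
STAGES = [
    (10_000,      0, 5),
    ( 7_500,  2_500, 5),
    ( 5_000,  5_000, 5),
    ( 2_500,  7_500, 5),
    (     0, 10_000, 5),
]

def incremental_training(net):
    for n_simple, n_complex, passes in STAGES:
        corpus = make_corpus(n_simple, n_complex)   # hypothetical corpus generator
        for _ in range(passes):
            train_epoch(net, corpus)                # hypothetical training pass
```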

Page 14:

Is this realistic?

Elman reasons that this is in some ways like children’s behaviour

Children seem to learn to produce simple sentences first

Is this a reasonable suggestion? Where is the incremental input coming from? The developmental schedule appears to be a product of changing the input.

Page 15:

Another route to incremental learning

Rather than the experimenter selecting simple, then complex sentences, could the network do the selecting itself?

Children’s data isn’t changing… children are changing

Elman gets the network to change throughout its “life”

What is a reasonable way for the network to change?

One possibility: memory

Page 16:

Reducing the attention span of a network

Destroy memory by setting context nodes to 0.5.

Five stages of learning (with both simple and complex sentences):

1. Memory blanked every 3-4 words (x 12)

2. Memory blanked every 4-5 words (x 5)

3. Memory blanked every 5-6 words (x 5)

4. Memory blanked every 6-7 words (x 5)

5. No memory limitations (x 5)

The network learned the task.
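A minimal sketch of the memory-blanking idea, reusing srn_step from the SRN sketch above. Elman varied the blanking interval randomly within each range (3-4 words, then 4-5, and so on); a single window parameter stands in for that here:

```python
import numpy as np

def limited_memory_pass(sentence_vectors, window, n_hidden=8):
    """Process one sentence, destroying memory every `window` words."""
    context = np.full(n_hidden, 0.5)
    for i, word_vec in enumerate(sentence_vectors, start=1):
        prediction, context = srn_step(word_vec, context)  # from the earlier sketch
        if i % window == 0:
            context = np.full(n_hidden, 0.5)   # blank the context nodes to 0.5
```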

Page 17:

Counter-intuitive conclusion: starting small

A fully-functioning network cannot learn syntax. A network that is initially limited (but matures) learns well. This seems a strange result, suggesting that networks aren't good models of language learning after all.

On the other hand…

- Children mature during learning
- Infancy in humans is prolonged relative to other species
- Ultimate language ability seems to be related to how early learning starts, i.e., there is a critical period for language acquisition.

Page 18:

Next lecture

We’ve seen how we can model aspects of language learning in simulations

What about evolution?

[Diagram: cultural evolution, individual learning, and biological evolution.]