The dynamics of iterated learning
Tom Griffiths, UC Berkeley
with Mike Kalish, Steve Lewandowsky, Simon Kirby, and Mike Dowman
Learning
Iterated learning (Kirby, 2001)
What are the consequences of learners learning from other learners?
Objects of iterated learning
• Knowledge communicated across generations through provision of data by learners
• Examples:
– religious concepts
– social norms
– myths and legends
– causal theories
– language
Language
• The languages spoken by humans are typically viewed as the result of two factors:
– individual learning
– innate constraints (biological evolution)
• This limits the possible explanations for different kinds of linguistic phenomena
Linguistic universals
• Human languages possess universal properties
– e.g., compositionality (Comrie, 1981; Greenberg, 1963; Hawkins, 1988)
• Traditional explanation:
– linguistic universals reflect innate constraints specific to a system for acquiring language (e.g., Chomsky, 1965)
Cultural evolution
• Languages are also subject to change via cultural evolution (through iterated learning)
• Alternative explanation:
– linguistic universals emerge as the result of the fact that language is learned anew by each generation, using general-purpose learning mechanisms (e.g., Briscoe, 1998; Kirby, 2001)
Analyzing iterated learning
[Diagram: a chain of learners, alternating between hypothesis inference, P_L(h|d), and data generation, P_P(d|h)]
P_L(h|d): probability of inferring hypothesis h from data d
P_P(d|h): probability of generating data d from hypothesis h
Markov chains
x → x → x → x → x → x → x → x
Transition matrix: T = P(x(t+1) | x(t))
• Variables x(t+1) independent of history given x(t)
• Converges to a stationary distribution under easily checked conditions (i.e., if it is ergodic)
Stationary distributions
• Stationary distribution:
π_i = Σ_j P(x(t+1) = i | x(t) = j) π_j = Σ_j T_ij π_j
• In matrix form: π = Tπ
– π is the first eigenvector of the matrix T
– easy to compute for finite Markov chains
– can sometimes be found analytically
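For a finite chain this is easy to verify numerically. The sketch below (Python, with a made-up 2-state transition matrix rather than one from the talk) recovers the stationary distribution as the eigenvector of T with eigenvalue 1:

```python
import numpy as np

# Column-stochastic transition matrix: T[i, j] = P(x(t+1) = i | x(t) = j).
# The numbers are hypothetical, used only to illustrate the computation.
T = np.array([[0.9, 0.2],
              [0.1, 0.8]])

# The stationary distribution satisfies pi = T pi, i.e. it is the
# eigenvector of T whose eigenvalue is 1.
eigvals, eigvecs = np.linalg.eig(T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = pi / pi.sum()   # normalize so the entries sum to 1

print(pi)            # [0.667, 0.333] for this particular T
```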
Analyzing iterated learning
d0 → h1 → d1 → h2 → d2 → h3 → …  (alternating P_L(h|d) and P_P(d|h))
A Markov chain on hypotheses:
h1 → h2 → h3 → …  (each step generates data via P_P(d|h) and then infers via P_L(h|d))
A Markov chain on data:
d0 → d1 → d2 → …  (each step infers via P_L(h|d) and then generates via P_P(d|h))
Bayesian inference
Reverend Thomas Bayes
• Rational procedure for updating beliefs
• Foundation of many learning algorithms
• Lets us make the inductive biases of learners precise
Bayes’ theorem
P(h | d) = P(d | h) P(h) / Σ_{h′ ∈ H} P(d | h′) P(h′)
P(h | d): posterior probability
P(d | h): likelihood
P(h): prior probability
The sum in the denominator is over the space of hypotheses H
h: hypothesis, d: data
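As a concrete (and entirely hypothetical) instance of the formula, the sketch below computes a posterior over two hypotheses from made-up prior and likelihood values:

```python
import numpy as np

# Two hypotheses with prior P(h), and likelihoods P(d | h) for one observed data point d.
# All numbers are made up, purely to illustrate the update.
prior      = np.array([0.7, 0.3])   # P(h)
likelihood = np.array([0.2, 0.9])   # P(d | h) for the observed d

# Bayes' theorem: posterior ∝ likelihood × prior, normalized over hypotheses.
posterior = likelihood * prior
posterior /= posterior.sum()

print(posterior)   # ≈ [0.34, 0.66]
```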
Iterated Bayesian learning
Assume learners sample from their posterior distribution:
P_L(h | d) = P_P(d | h) P(h) / Σ_{h′ ∈ H} P_P(d | h′) P(h′)
Stationary distributions
• Markov chain on h converges to the prior, P(h)
• Markov chain on d converges to the “prior predictive distribution”
P(d) = Σ_h P(d | h) P(h)
(Griffiths & Kalish, 2005)
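A minimal simulation of this result, with made-up toy probabilities (two hypotheses, two possible data points; the helper names are illustrative): when each learner samples from its posterior, the long-run distribution over hypotheses matches the prior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy iterated-learning chain; all probabilities are hypothetical illustration values.
prior = np.array([0.8, 0.2])           # P(h)
lik   = np.array([[0.9, 0.1],          # P(d | h): rows index h, columns index d
                  [0.3, 0.7]])

def sample_h(d):
    """Learner samples h from the posterior P_L(h | d) ∝ P_P(d | h) P(h)."""
    post = lik[:, d] * prior
    return rng.choice(2, p=post / post.sum())

def sample_d(h):
    """Generate data from P_P(d | h)."""
    return rng.choice(2, p=lik[h])

h, counts = 0, np.zeros(2)
for _ in range(100_000):
    d = sample_d(h)      # generation t produces data
    h = sample_h(d)      # generation t+1 infers a hypothesis from that data
    counts[h] += 1

print(counts / counts.sum())   # approaches the prior, [0.8, 0.2]
```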
Explaining convergence to the prior
• Intuitively: data acts once, prior many times
• Formally: iterated learning with Bayesian agents is a Gibbs sampler for P(d,h)
(Griffiths & Kalish, submitted)
Gibbs sampling
For variables x = x1, x2, …, xn:
Draw xi(t+1) from P(xi | x-i), where
x-i = x1(t+1), x2(t+1), …, xi-1(t+1), xi+1(t), …, xn(t)
Converges to P(x1, x2, …, xn)
(a.k.a. the heat bath algorithm in statistical physics)
(Geman & Geman, 1984)
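A small self-contained sketch of the algorithm on a standard textbook target chosen only for illustration (a bivariate normal with correlation ρ, whose full conditionals are themselves normal):

```python
import numpy as np

rng = np.random.default_rng(0)

# Gibbs sampling for a bivariate normal with correlation rho (a standard example,
# not from the talk). Its full conditionals are normal:
#   x1 | x2 ~ N(rho * x2, 1 - rho^2),   x2 | x1 ~ N(rho * x1, 1 - rho^2)
rho = 0.8
sd = np.sqrt(1 - rho**2)
x1, x2 = 0.0, 0.0
samples = []
for _ in range(50_000):
    x1 = rng.normal(rho * x2, sd)   # draw x1 from P(x1 | x2)
    x2 = rng.normal(rho * x1, sd)   # draw x2 from P(x2 | x1)
    samples.append((x1, x2))

samples = np.array(samples)
print(np.corrcoef(samples.T)[0, 1])   # ≈ 0.8, matching the target distribution
```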
Gibbs sampling (illustration from MacKay, 2003)
Explaining convergence to the prior
When the target distribution is P(d, h), the conditional distributions are P_L(h | d) and P_P(d | h).
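Spelling out why this implies convergence to the prior: a Gibbs sampler on (d, h) has stationary distribution equal to the joint, P(d, h) = P_P(d | h) P(h), so the marginal of the stationary distribution on hypotheses is Σ_d P_P(d | h) P(h) = P(h), i.e., exactly the prior.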
Implications for linguistic universals
• When learners sample from P(h | d), the distribution over languages converges to the prior
– identifies a one-to-one correspondence between inductive biases and linguistic universals
Iterated Bayesian learning
Assume learners sample from their posterior distribution:
P_L(h | d) = P_P(d | h) P(h) / Σ_{h′ ∈ H} P_P(d | h′) P(h′)
More generally, assume learners choose hypotheses with probability proportional to the posterior raised to a power r:
P_L(h | d) ∝ [ P_P(d | h) P(h) / Σ_{h′ ∈ H} P_P(d | h′) P(h′) ]^r
r = 1 (sampling), r = 2, …, r = ∞ (maximizing)
From sampling to maximizing
• General analytic results are hard to obtain
– (r = ∞ is a stochastic version of the EM algorithm)
• For certain classes of languages, it is possible to show that the stationary distribution gives each hypothesis h probability proportional to P(h)^r
– the ordering identified by the prior is preserved, but not the corresponding probabilities
(Kirby, Dowman, & Griffiths, submitted)
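A toy simulation of this effect (made-up probabilities again; it illustrates only the qualitative point that larger r exaggerates the prior's influence, not the exact P(h)^r result):

```python
import numpy as np

rng = np.random.default_rng(0)

def stationary_over_h(r, n_iter=100_000):
    """Iterated learning in which learners choose h with probability proportional
    to the posterior raised to the power r (r = 1 is sampling from the posterior;
    large r approximates maximizing). All probabilities are hypothetical."""
    prior = np.array([0.6, 0.4])
    lik   = np.array([[0.9, 0.1],
                      [0.3, 0.7]])
    h, counts = 0, np.zeros(2)
    for _ in range(n_iter):
        d = rng.choice(2, p=lik[h])
        post = (lik[:, d] * prior) ** r
        h = rng.choice(2, p=post / post.sum())
        counts[h] += 1
    return counts / counts.sum()

print(stationary_over_h(1))    # close to the prior, [0.6, 0.4]
print(stationary_over_h(10))   # the higher-prior hypothesis is favored even more strongly
```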
Implications for linguistic universals
• When learners sample from P(h | d), the distribution over languages converges to the prior
– identifies a one-to-one correspondence between inductive biases and linguistic universals
• As learners move towards maximizing, the influence of the prior is exaggerated
– weak biases can produce strong universals
– cultural evolution is a viable alternative to traditional explanations for linguistic universals
Analyzing iterated learning
• The outcome of iterated learning is determined by the inductive biases of the learners
– hypotheses with high prior probability ultimately appear with high probability in the population
• Clarifies the connection between constraints on language learning and linguistic universals…
• …and provides formal justification for the idea that culture reflects the structure of the mind
Conclusions
• Iterated learning provides a lens for magnifying the inductive biases of learners
– small effects for individuals are big effects for groups
• When cognition affects culture, studying groups can give us better insight into individuals