Page 1

Bayesian Decision Theory, Iterated Learning and Portuguese Clitics

Psychocomputational Models of Human Language Acquisition

Catherine Lai

Department of Linguistics
University of Pennsylvania

July 23, 2008

Page 2

Computational Models of Language Change

Some questions:

- How can we deal with individual and population variation in models of language change?

- Where does instability come from in these models?

- How do we use all these frequency counts to choose a grammar?

Some Frameworks:

- Iterated learning (Kirby, 2001; Kirby et al., 2007)

- Dynamical systems (Mitchener and Nowak, 2003; Nowak et al., 1999)

- Social learning (Niyogi and Berwick, 1998; Yang, 2001)

Page 3

How do you make a decision?

The decision rule through which a grammar is selected is crucial!

- Are learners just trying to fit the probability distribution of the input data to a predefined model? This is basically what Maximum Likelihood Estimation (MLE) allows you to do.

- MLE requires us to conflate several factors: innate biases (priors), social and communicative factors, and random noise.

- If we view language learning as a problem involving beliefs and factors outside pure point estimation, the Bayesian view becomes very attractive.

- However, even within general Bayesian frameworks, MLE is still often implicitly employed (cf. MAP estimation: Griffiths and Kalish (2005); Dowman et al. (2006); Briscoe (2000)).

What does it mean to be a Bayesian?

Page 4

Outline

- Portuguese clitic data

- Models and requirements

- Bayesian decision theory

Page 5

Portuguese Direct Object Clitics

(1) a1. Paolo a ama (affirmative, proclisis)

    a2. Paolo ama-a (affirmative, enclisis)

    a3. Quem a ama (obligatory proclisis)

- Affirmative sentences with topics, adjuncts or referential subjects:

  - Classical Portuguese (ClP, 16th to mid-19th century): allowed both direct object enclitics and proclitics, with proclisis preferred (Galves et al., 2005a). (a1, a2)

  - Modern European Portuguese (EP): obligatory enclisis. (a2)

- Proclitic forms are obligatory in other syntactic contexts. (a3)

Note: we’re treating EP as a subset of ClP.

Page 6

Change!

- According to corpus studies (Galves et al., 2005a), there was a sharp rise in enclisis in the early to mid 18th century.

- Galves and Galves (1995): this syntactic change was driven by change in stress patterns (although see Galves (2003); Costa and Duarte (2002)).

- Acquisition question: How does the learner do parameter setting?

- Production question: What sort of data will the learner produce for the next generation?

Page 7

The Galves Batch Model

- Galves and Galves (1995): each construction type is given a probability proportional to the stress contour associated with it.

- Clauses of type a1 and a3 (proclisis) have weight p and clauses of type a2 (enclisis) have weight q:

  P(a1 | GClP) = p/(2p + q)    (2)

  P(a1 | GEP) = 0    (3)

- Grammar selection is via Maximum Likelihood Estimation (MLE).

- The probability of the learner acquiring GClP is the probability that clause type a1 occurs at least once in n samples (the critical period).
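Read literally, this acquisition probability is one minus the chance of never drawing a1 in n trials. A minimal sketch in Python (the function names and the sample values of p, q, n are mine, for illustration only):

```python
# Sketch of the GBM acquisition probability (names and numbers illustrative).
def p_a1_given_clp(p: float, q: float) -> float:
    """P(a1 | G_ClP): a1 and a3 carry weight p each, a2 carries weight q."""
    return p / (2 * p + q)

def prob_acquire_clp(p: float, q: float, n: int) -> float:
    """Probability that type a1 occurs at least once in n samples,
    i.e. that the MLE learner ends up with G_ClP."""
    return 1.0 - (1.0 - p_a1_given_clp(p, q)) ** n

print(prob_acquire_clp(p=0.3, q=0.4, n=100))  # effectively 1 for a critical period of 100
```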

Page 8

Batch Learning as Markov Process

- Niyogi and Berwick (1998); Niyogi (2006): re-implement the GBM but take more of a population-level view.

- αt = proportion of the population with GEP at time t.

- αt+1 depends on αt and the learning mechanism (MLE). (Markov process with two states)

  P(a1 | GClP) = P(a3 | GClP) = p for some p ∈ [0, 1],
  P(a2 | GClP) = 1 − 2p.

  P(a1 | GEP) = 0,
  P(a2 | GEP) = q for some q ∈ [0, 1],
  P(a3 | GEP) = 1 − q.

- p and q are production probabilities encoded in the grammar. These hold across the board for all speakers of a particular grammar.
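As a sketch of how such a two-state process iterates, here is a deliberately simplified update in which a learner acquires GEP iff no a1 token appears among its n inputs (my simplification, not the talk's exact criterion; the next slide explains why the real MLE criterion is less clean):

```python
# Simplified population update for the two-grammar Markov process.
def step(alpha: float, p: float, n: int) -> float:
    p_a1 = (1.0 - alpha) * p   # chance that a random input token is of type a1
    return (1.0 - p_a1) ** n   # proportion acquiring G_EP in the next generation

alpha = 0.0
for _ in range(50):
    alpha = step(alpha, p=0.05, n=100)
print(alpha)                   # where the GEP proportion settles
```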

Page 9

And so...

- Learners may still acquire GClP even though they do not see any instances of variational proclisis (a1)! That is, if there are too many instances of the type a3 (obligatory proclisis).

- Is this because there is proclisis in a3? No! This would still happen if the syntax of the a3 type was totally devoid of clitics.

- Also, a learner who acquires GClP will continue to use a (possibly) very high rate of variational proclisis (p) in spite of being surrounded by GEP speakers.

- Shouldn't we expect that the desire to communicate would pressure speakers of GClP to lower the rate of variational proclisis in the face of multitudes of GEP speakers?

- How do we deal with noise? (cf. Briscoe (2002)) What about biases? How about being Bayesian?

Page 10

Bayesian Iterated Learning

Signal/meaning pairs (Griffiths and Kalish, 2005)

- (Yk, Xk) = {(y1, x1), ..., (yn, xn)}: (utterance, meaning) pairs received by the agent in generation k (the mapping y → x is many-to-one).

- This allows us to focus only on types that show variation.

- Grammar selection is based on the posterior (g is the hypothesized grammar):

  P(g | Xk, Yk) = P(Yk | Xk, g) P(g) / P(Yk | Xk)

  Priors over grammars are assumed to be innate and invariable across generations.

- Also, add an error term to account for random noise.

- Griffiths and Kalish (2005); Kirby et al. (2007) show analytically that convergence to the prior depends on the selection mechanism (MAP, sampling from the posterior, etc.).

Page 11

BIL and Portuguese

- BIL as in Griffiths and Kalish (2005) does not take into account variation in the community (!). However, in general IL allows more than one agent in a generation (Kirby and Hurford, 2002). ⇒ BIL is like the previous models except for the priors.

- For Portuguese, we don't have to consider cases of obligatory proclisis (a3) since they do not differentiate the two grammars.

- However, with MAP estimation, P(a1|x1) = p still seems to be an innate part of the grammar.

Page 12

What would we like in the model?

- Frameworks are frameworks: they still need articulation.

- We would like to incorporate some formal notion of why frequency estimation is important to the learner.

- At least part of this should come from the fact that the learner wishes to communicate effectively with a variety of speakers.

- For example, we want to incorporate the intuitive idea that using rare forms when frequent forms exist may be disfavoured.

- Also, forms that are harder to produce (and process) should be disfavoured (cf. the prosody argument).

- The decision problem that learners face is subjective: learners choose a grammar that they believe will be most useful for them. That is, they make decisions based on expected utility.

Page 13

The Components of the Bayesian Decision Rule

Bayesian decision rule: maximize the expected utility of taking action a from the decision set Θ, with respect to the possible values of θ and the observed values of y. That is,

  a* = argmax_a ∫_Θ U(a, θ) P(θ | y) dθ

- The likelihood function

- The prior

- The utility function

- The decision rule

- The production distribution
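To see the mechanics of the rule, here is a toy instance with a discrete parameter space, so the integral becomes a weighted sum (all numbers are made up for illustration; none of this is from the talk):

```python
# Toy Bayes decision rule: two actions, three discrete parameter values.
# The posterior P(theta | y) and utilities U(a, theta) are invented numbers.
posterior = {"theta1": 0.2, "theta2": 0.5, "theta3": 0.3}
utility = {
    ("a0", "theta1"):  1.0, ("a0", "theta2"): 0.0, ("a0", "theta3"): -1.0,
    ("a1", "theta1"): -1.0, ("a1", "theta2"): 0.5, ("a1", "theta3"):  1.0,
}

def expected_utility(a: str) -> float:
    """E[U(a, theta) | y] as a posterior-weighted sum over theta."""
    return sum(utility[(a, th)] * w for th, w in posterior.items())

best = max(("a0", "a1"), key=expected_utility)
print(best, expected_utility(best))   # a1 wins: 0.35 vs -0.1 for a0
```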

Page 14

The Parameter setting problem

- Things the learner doesn't know but would like to (parameters, θ):

  - α = proportion of GEP speakers [syntactic parameter 'ON']
  - p = rate of enclisis of GClP speakers

- The only evidence the learner has for any given parameter is the count of inputs that support the parameter setting and a count of those that oppose it (observations, y).

- The task of the learner is to use these frequency counts to evaluate what is the best grammar for them (decision set, Θ).

Page 15

The Likelihood Function

- Treat the data as N (independent) Bernoulli trials: SN = {(yi, xi)}, i = 1, ..., N.

- Let k be the number of cases that were parseable with the parameter setting on, e.g. enclitics.

- The likelihood function is:

  P(SN | α, p) = (N choose k) [(1 − α)p + α]^k [(1 − α)(1 − p)]^(N−k)

- α, p are dummies here; they aren't part of the grammar.

- Note: GEP is a subset of GClP, so GEP data does not present any counter-evidence for GClP in this model.
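A minimal sketch of this likelihood (the function name is mine):

```python
from math import comb

def likelihood(k: int, N: int, alpha: float, p: float) -> float:
    """P(S_N | alpha, p): k of N tokens are enclitic. A token is enclitic if it
    comes from an EP speaker (prob. alpha) or from a ClP speaker
    (prob. 1 - alpha) who encliticizes (prob. p); otherwise it is proclitic."""
    return comb(N, k) * ((1 - alpha) * p + alpha) ** k \
                      * ((1 - alpha) * (1 - p)) ** (N - k)
```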

Page 16

The Prior

Prior beliefs of the learner about possible combinations of α and p:

- If α = 1 then the population is entirely made up of GEP speakers; the value of p is irrelevant as it only applies to GClP speakers.

- The simplest hypothesis is that α = 1, p = 1 is a maximum, i.e. before being wiped out, GClP speakers would have increasingly used enclitic constructions to fit in with the rest of the population.

- Similarly, if α = 0 then the population would most likely be using proclitic constructions a large proportion of the time. So, maxima around α = 0, p = 0.05 (Galves et al., 2005b).

Page 17

The Prior

As a function:

  f(α, p) = (1/c) exp(−(p − (0.95α + 0.05))²)

where c is a normalizing constant. f(α, p) is then just a Gaussian in p with mean 0.95α + 0.05, the rate of enclisis found in the Tycho Brahe corpus.

[Plot: surface of the prior density over p (enclisis) and alpha (Prop. EP).]

Figure: The prior density f(α, p)
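A sketch of this prior, normalized numerically on a grid (the grid size and names are mine):

```python
import numpy as np

g = np.linspace(0.0, 1.0, 201)              # grid over [0, 1]
A, P = np.meshgrid(g, g, indexing="ij")     # alpha and p coordinates
f = np.exp(-(P - (0.95 * A + 0.05)) ** 2)   # unnormalized f(alpha, p)
f /= f.sum() * (g[1] - g[0]) ** 2           # approximate the constant 1/c numerically
```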

Page 18

The Utility Function

- The learner wants to acquire the same grammar as the rest of its community.

- The learner wants to be able to play both roles of speaker and hearer successfully.

- Since EP is a subset of ClP: a ClP hearer can understand both EP and ClP speakers without penalty, and an EP speaker can be understood by everyone. However, an EP hearer will have difficulty understanding ClP speakers, and a ClP speaker will have difficulty being understood by EP hearers.

- Assume the individual plays speaker/hearer half the time.

  U(a, α, p) = −(1/2)α        if a = 0 (GClP),
               −(1/2)(1 − α)  if a = 1 (GEP).

This is also where we should be encoding pronunciation difficulty!
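As code (a direct transcription; the encoding a = 0 for GClP and a = 1 for GEP follows the slide):

```python
def utility(a: int, alpha: float) -> float:
    """U(a, alpha, p): note p does not enter. The cost is the expected rate of
    failed exchanges, halved because the agent is in the penalized role
    (speaker or hearer) only half the time."""
    return -0.5 * alpha if a == 0 else -0.5 * (1.0 - alpha)
```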

Page 19

Utility Maximization

The learner does not actually know what α and p are; they must be inferred from the frequencies k and N. Instead of trying to pin these down (or stipulate them), expected utility maximization hedges its bets. So,

  E[U(a, α, p) | SN] = ∫_{[0,1]²} U(a, α, p) dP(α, p | SN).

To find out whether the parameter should be set 'off' (and simplified), we calculate whether

  E[U(0, α, p) | SN] > E[U(1, α, p) | SN],

which (dropping positive normalizing constants) reduces to

  ∫_{[0,1]²} (2α − 1) P(SN | α, p) f(α, p) d(α, p) < 0.

If this last statement is true, the learner should choose GClP.
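A sketch of the whole test by brute-force integration on a grid (the grid is my implementation choice; the binomial coefficient and normalizing constants are positive and cannot change the sign, so they are dropped):

```python
import numpy as np

g = np.linspace(0.0, 1.0, 201)
A, P = np.meshgrid(g, g, indexing="ij")
F = np.exp(-(P - (0.95 * A + 0.05)) ** 2)    # the prior f(alpha, p), unnormalized

def chooses_clp(k: int, N: int) -> bool:
    """True iff the expected-utility test favours G_ClP after k of N enclitics."""
    like = ((1 - A) * P + A) ** k * ((1 - A) * (1 - P)) ** (N - k)
    return float(((2 * A - 1) * like * F).sum()) < 0.0

print(chooses_clp(k=5, N=100))    # True: mostly proclisis, keep G_ClP
print(chooses_clp(k=100, N=100))  # False: all enclisis, switch to G_EP
```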

Page 20

Estimating Production Rates

- Assume that production probabilities are derivable from the frequencies observed in the acquisition process.

- For a ClP speaker:

  P(a1 | x1) = (N − k)/N,
  P(a2 | x1) = k/N.

- For an EP speaker:

  P(a2 | x1) = 1.

- Let α0 be the proportion of GEP speakers observed in generation 0. Then the probability of getting the enclitic version in the first round is

  q0 = Ppop(a2 | x1, T = 0) = (1 − α0) p0 + α0.
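In code (a one-line transcription; the argument values are illustrative):

```python
def pop_enclisis_rate(alpha0: float, p0: float) -> float:
    """q0 = P_pop(a2 | x1, T=0): EP speakers always encliticize,
    ClP speakers encliticize at their initial rate p0."""
    return (1 - alpha0) * p0 + alpha0

q0 = pop_enclisis_rate(alpha0=0.0, p0=0.65)   # e.g. a 65% initial ClP rate
```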

Page 21

Over and Over...

- The proportion of speakers who will see k enclitic constructions in N Bernoulli trials is:

  (N choose k) qt^k (1 − qt)^(N−k)

  where qt is the probability of seeing an enclitic in generation t.

- This proportion of speakers will then contribute enclitics at a rate of k/N to the next generation, t + 1.
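Putting the pieces together, here is a sketch of the full generational update behind the figures that follow (my reconstruction: it reuses chooses_clp from the earlier sketch and assumes GEP learners produce enclisis at rate 1):

```python
from math import comb

N = 100
# The grammar choice and production rate depend only on (k, N), so compute once:
# a G_ClP learner reproduces the observed rate k/N, a G_EP learner produces 1.
RATES = [k / N if chooses_clp(k, N) else 1.0 for k in range(N + 1)]

def next_q(q: float) -> float:
    """q_{t+1}: production rates averaged over k ~ Binomial(N, q_t)."""
    return sum(comb(N, k) * q**k * (1 - q) ** (N - k) * RATES[k]
               for k in range(N + 1))

q = 0.65                  # an initial ClP enclisis rate in the 60-70% band
for _ in range(300):      # 300 iterations, as in the simulations below
    q = next_q(q)
print(q)                  # the trajectory of q is what the following plots track
```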

Page 22

Initial GClP rate of enclisis between 60-70%

[Plot: rate of enclisis (y-axis, 0 to 1) vs. iteration no. (x-axis, 0 to 300).]

Figure: Rate of enclisis. N = 100, α0 = 0, and the initial rate of GClP enclisis, p, ranges from 0.6 to 0.7.

Change doesn’t take off from p = 0.05!

Page 23

More interactions?

- The stability of GClP is really assumed by the model via the prior.

- Crucially, the simulation above still does not incorporate the effects of simultaneous change in other modules of language (e.g. phonology).

- Production changes? We could define a new decision problem that estimates production probabilities.

Page 24

Conclusion

Take home points:

- This model articulates the general social learning model of Niyogi (2006): learners learn from an (infinite) population.

- The decision procedure was presented as a utility-maximizing decision rule where the learner estimates population frequencies in order to maximize communicability.

- Ideally we would look at a change in progress, where we could do better estimation of the prior and utility functions.

Thanks!

Especially to: Charles Yang, Andrew Clausen, and Ling575/506-ers!

Page 25

Simulation: 300 iterations

Figure: Transition diagram: proportion of GEP speakers (output vs. input proportion of EP, both on [0, 1]). Different curves represent different GClP enclisis rates (p): 0.05, 0.1, 0.2, 0.5, 0.8, and 1. n = 100 and α = 0.

Page 26

Different input sizes

Figure: Transition diagram: overall rates of enclisis (output vs. input rate, both on [0, 1]). Different curves represent different input sizes n: 10, 20, 50, 80, 100. α = 0.

Page 27

Initial GClP rate of enclisis between 60-70%

[Plot: proportion of EP speakers (y-axis, 0 to 1) vs. iteration no. (x-axis, 0 to 300).]

Figure: Proportions of GEP speakers. n = 100, α0 = 0, and the initial rate of GClP enclisis, p, ranges from 0.6 to 0.7.

Page 28

Portuguese BIL Example

- If there is a 1-1 mapping between a meaning x and a type y, then

  P(y | x, G) = 1 − ε,

  if G admits x (ε is an error term).

- Let the frequencies for the input be: a1 = a, a2 = b, a3 = c.

- Let x1, x3 be the meanings associated with a1 and a3 respectively.

- We do not need to consider the contribution of obligatory proclisis to calculate the MLE (or MAP) grammar.

Page 29

P(G | Yk, Xk) ∝ P(Yk | Xk, G) P(G)

  = ∏(i=1..k) P(yi | xi, G) P(xi) · P(G)

  = P(a1)^a P(a2)^b P(a3)^c
    · P(a1 | x1, G)^a′ P(a1 | x3, G)^a′′
    · P(a2 | x1, G)^b′ P(a2 | x3, G)^b′′
    · P(a3 | x1, G)^c′ P(a3 | x3, G)^c′′
    · P(G)

where a = a′ + a′′, and similarly for the other frequency counts. Proclisis in affirmative sentences is simply given the error probability, ε, in GEP.

Page 30

If we only care about finding the MLE (or MAP) grammar, and take probabilities from Niyogi's implementation of the GBM, then we have the following:

  P(GClP | Yk, Xk) ∝ P(GClP) · ((p − ε/2) / (1 − p − ε))^a′ · ((1 − 2p − ε/2) / (1 − p − ε))^b′

  P(GEP | Yk, Xk) ∝ P(GEP) · (ε/2)^a′ · (1 − ε/2)^b′
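A sketch of the resulting MAP comparison in log space (my reconstruction from the expressions above; a1 and a2 here stand for the counts a′ and b′, and all numbers are illustrative):

```python
from math import log

def log_post_clp(a1: int, a2: int, p: float, eps: float, prior: float) -> float:
    """Unnormalized log posterior of G_ClP given a1 proclitic and a2 enclitic
    tokens in the variational contexts."""
    z = 1 - p - eps   # total mass of the two variational types under G_ClP
    return log(prior) + a1 * log((p - eps / 2) / z) \
                      + a2 * log((1 - 2 * p - eps / 2) / z)

def log_post_ep(a1: int, a2: int, eps: float, prior: float) -> float:
    """Unnormalized log posterior of G_EP: proclisis occurs only as noise."""
    return log(prior) + a1 * log(eps / 2) + a2 * log(1 - eps / 2)

clp = log_post_clp(a1=3, a2=97, p=0.3, eps=0.01, prior=0.5)
ep = log_post_ep(a1=3, a2=97, eps=0.01, prior=0.5)
print("MAP grammar:", "G_ClP" if clp > ep else "G_EP")   # G_EP here
```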

Page 31

- The explicit connection between meanings and types allows us to reduce the parameter space needed to evaluate the two grammars in question.

- We only need to parameterize the error term to do the likelihood computation for GEP. In general, it will allow us to focus only on types that show variation.

- The prior notwithstanding, this reduction in the parameter space is welcome in comparison with Niyogi's implementation.

- However, this still suffers from the over-parameterization problems associated with MLE: P(a1|x1) = p is still assumed to be an innate part of the grammar.

Page 32

Briscoe, E. J. (2002). Grammatical acquisition and linguistic selection. In Briscoe, E. J., editor, Linguistic Evolution through Language Acquisition: Formal and Computational Models, chapter 9. Cambridge University Press.

Briscoe, T. (2000). Grammatical Acquisition: Inductive Bias and Coevolution of Language and the Language Acquisition Device. Language, 76(2):245–296.

Costa, J. and Duarte, I. (2002). Preverbal subjects in null subject languages are not necessarily dislocated. Journal of Portuguese Linguistics, 2:159–176.

Dowman, M., Kirby, S., and Griffiths, T. L. (2006). Innateness and culture in the evolution of language. In Proceedings of the 6th International Conference on the Evolution of Language, pages 83–90.

Galves, A. and Galves, C. (1995). A case study of prosody driven language change: from classical to modern European Portuguese. Unpublished manuscript, University of São Paulo, São Paulo, Brazil.

Page 33

Galves, C. (2003). Clitic-placement in the history of Portuguese and the syntax-phonology interface. Talk given at the 27th Penn Linguistics Colloquium, University of Pennsylvania, USA.

Galves, C., Britto, H., and Paixão de Sousa, M. (2005a). The Change in Clitic Placement from Classical to Modern European Portuguese: Results from the Tycho Brahe Corpus. Journal of Portuguese Linguistics, pages 39–67.

Galves, C. and Paixão de Sousa, M. C. (2005b). Clitic Placement and the Position of Subjects in the History of European Portuguese. In Geerts, T., van Ginneken, I., and Jacobs, H., editors, Romance Languages and Linguistic Theory 2003: Selected Papers from 'Going Romance' 2003, pages 97–113. John Benjamins, Amsterdam.

Griffiths, T. and Kalish, M. (2005). A Bayesian view of language evolution by iterated learning. In Proceedings of the XXVII Annual Conference of the Cognitive Science Society.

Kirby, S. (2001). Spontaneous evolution of linguistic structure: an iterated learning model of the emergence of regularity and irregularity. IEEE Transactions on Evolutionary Computation, 5(2):102–110.

Page 34

Kirby, S., Dowman, M., and Griffiths, T. (2007). Innateness and culture in the evolution of language. Proceedings of the National Academy of Sciences, 104(12):5241.

Kirby, S. and Hurford, J. (2002). The emergence of linguistic structure: An overview of the iterated learning model. In Cangelosi, A. and Parisi, D., editors, Simulating the Evolution of Language, chapter 6, pages 121–148. Springer-Verlag.

Matsen, F. A. and Nowak, M. A. (2004). Win-stay, lose-shift in language learning from peers. PNAS, 101(52):18053–18057.

Mitchener, W. and Nowak, M. (2003). Competitive exclusion and coexistence of universal grammars. Bulletin of Mathematical Biology, 65(1):67–93.

Niyogi, P. (2006). The computational nature of language learning and evolution, chapter 7. MIT Press.

Niyogi, P. and Berwick, R. (1998). The Logical Problem of Language Change: A Case Study of European Portuguese. Syntax, 1(2):192–205.

Page 35

Nowak, M., Plotkin, J., and Krakauer, D. (1999). The evolutionary language game. Journal of Theoretical Biology, 200(2):147–162.

Yang, C. (2001). Internal and external forces in language change. Language Variation and Change, 12(3):231–250.