Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics...

28
Models of Language Evolution: Part IV Linguistic Coherence as Emergent Property Matilde Marcolli CS101: Mathematical and Computational Linguistics Winter 2015 CS101 Win2015: Linguistics Language Evolution 4

Transcript of Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics...

Page 1: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

Models of Language Evolution: Part IVLinguistic Coherence as Emergent Property

Matilde MarcolliCS101: Mathematical and Computational Linguistics

Winter 2015

CS101 Win2015: Linguistics Language Evolution 4

Page 2: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

Main Reference

Partha Niyogi, The computational nature of language learningand evolution, MIT Press, 2006.

CS101 Win2015: Linguistics Language Evolution 4

Page 3: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

Population Dynamics Model

• following previous model: languages µ1, . . . , µn and linguisticevolution in population modelled by ODE

xj =∑i

xi fiQij − φ xj

xj = αj proportion of individuals speaking language µj

• matrix Q measure fidelity of language map (how much deviationfrom teacher to learner)

• fi = fitnessfi =

∑j

xj F (µi , µj)

CS101 Win2015: Linguistics Language Evolution 4

Page 4: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

Assumptions

• assuming as before that

Qii = q and Qij = 1−qn−1 for i 6= j

F (µi , µi ) = 1 and F (µi , µj) = a for all i 6= j

fi = (1− a)xi + a + f0

Threshold behavior depending on parameter q

for q small only stable critical point is uniform distribution: allxj = 1/n

bifurcation at some q = q1: two new critical points r±

• one-grammar solutions emerge where the majority of populationspeaks one of the languages

CS101 Win2015: Linguistics Language Evolution 4

Page 5: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

Without fitness

• Note: same equation with fi = f0 (without fitness function)

• would have φ = f0∑

j xj = f0

• equation would be

xj = f0∑i

xiQij − f0xj

becomes a linear system of ODE

• only equilibrium solution at xj = 1/n, uniform distribution

• no bifurcation and no emergent behavior creating languagecoherence: those are effects of the presence of the fitness function

CS101 Win2015: Linguistics Language Evolution 4

Page 6: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

Social Learning

• this model was based on assumption that learner takes input onlyfrom one teacher (with the possibility of errors in reproductionencoded in Qij)

• consider again other scenario where learner’s input is comingfrom the entire population

• given n languages L1, . . . ,Ln assume a set of expressions isespecially useful for language acquisition (triggers, cues, ...)

• this gives subsets Ci ⊆ Li ; assume Ci ∩ Cj = ∅ (these areunambiguous cues)

• speakers of Li produce sentences randomly with distribution Pi

and likelihood of producing a cue is

ai = Pi (Ci )

• simplifying assumption: all ai = a same

CS101 Win2015: Linguistics Language Evolution 4

Page 7: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

Case of two languages

• proportions α, 1− α of speakers: function of time x1(t) = α(t),x2(t) = 1− α(t)

• cue-frequency based batch learner: m = k1 + k2 + k3

k1 sentences in input that are in C1

k2 in C2

k3 are not cues

• probability of k1 > k2

f1,a,m(x1, x2) =∑(

m

k1k2k3

)(ax1(t))k1(ax2(t))k2(1− a)k3

sum over (k1, k2, k3) with m = k1 + k2 + k3 and k1 > k2

• probability f2,a,m of k1 < k2, same with sum over (k1, k2, k3) withm = k1 + k2 + k3 and k1 > k2

CS101 Win2015: Linguistics Language Evolution 4

Page 8: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

• symmetric assumption ai = a gives f2,a,m(x1, x2) = f1,a,m(x2, x1)

• probability after m inputs of learner acquiring L1

f1 +1

2(1− f2 − f1)

(if no cues received at all: 1/2 chance of one language or other)

• population dynamics equation

x1(t + 1) =1

2(1 + f1,a,m(x1(t), x2(t))− f2,a,m(x1(t), x2(t))

• a fixed point at x1 = x2 = 1/2: uniform distribution ofpopulation among the two languages

CS101 Win2015: Linguistics Language Evolution 4

Page 9: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

• if number of inputs m small: only fixed point (stable)

• for larger m other fixed points appear (one language becomesdominant)

• for larger m uniform solution x1 = x2 = 1/2 becomes unstable

• the value of m where bifurcation occurs is a function ofparameter a

• can also keep m fixed and vary a:

a close to zero: only x1 = x2 = 1/2 (stable fixed point)

bifurcation when a grows: new stable fixed points andx1 = x2 = 1/2 becomes unstable

bifurcation occurs at a value of a dependent on m

CS101 Win2015: Linguistics Language Evolution 4

Page 10: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

Stability of x1 = x2 = 1/2: more details

• derivative at the fixed point

f ′1,a,m(1/2, 1/2) =∑k1>k2

(m

k1k2k3

)am−k3(1−a)k3(k1−k2)(

1

2)k1+k2−1

similar for f ′2,a,m

• f ′1,a,m(1/2, 1/2)|a=0 = 0 so by continuity for small a have

|f ′1,a,m(1/2, 1/2)| < 1

stability while in this range

• also see that when a = 1, for sufficiently large m havef ′1,a,m(1/2, 1/2)|a=1 > 1 so in between will cross value 1: wherebifurcation occurs

• emergence of linguistic coherence in the population

CS101 Win2015: Linguistics Language Evolution 4

Page 11: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

Case of n languages

• learner is exposed to a mixture of languages form theenvironment

• learner scans incoming data for cues and chooses the languagefrom which largest number of cues is received

• if multiple languages with same number of cues: pick one amongthem randomly

• same simplifying assumption as before Pi (Ci ) = a same for alllanguages

CS101 Win2015: Linguistics Language Evolution 4

Page 12: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

Algorithm

1 Count cues

ki = number of cues in Ci out of m inputskn+1 = number of non-cues (in any of the languages)m = k1 + · · ·+ kn + kn+1

2 Find maximal languages: languages Li with ki = maxj kj :I = set of indices of Li maximal

3 Choose language: if |I| = 1 choose that language; if |I| > 1choose one language randomly in the set I with probability1/|I|

4 of naive version: just choose a language randomly among all nwith probability 1/n

CS101 Win2015: Linguistics Language Evolution 4

Page 13: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

Population Dynamics in this model

• P =∑

i xi (t)Pi probability with which input is generated

• pi = pi (t) = axi (t) probability of receiving a cue from languageLi ; pn+1 = 1− a

• probability of receiving (strictly) more cues from language L1

than from any other

F1,m,a(x1, . . . , xn) =∑(

m

k1 · · · kn+1

)pk1

1 · · · pknn p

kn+1

n+1

sum over all (k1, . . . , kn+1) with m = k1 + · · ·+ kn+1 and k1 > kjfor all j 6= 1

• similar for other languages with symmetry

Fi ,m,a(· · · , xi , · · · , xj · · · ) = Fj ,m,a(· · · , xj , · · · , xi · · · )

CS101 Win2015: Linguistics Language Evolution 4

Page 14: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

• in this model, probability that learner will choose Li after minput data

fi ,m,a(x1, . . . , xn) = Fi ,m,a(x1, . . . , xn)+(1−n∑

j=1

Fj ,m,a(x1, . . . , xn))1

n

(with naive version of choice in the cue-less case)

• Recursion relation for population distribution in next generation

xi (t + 1) = fi ,m,a(x1(t), . . . , xn(t))

CS101 Win2015: Linguistics Language Evolution 4

Page 15: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

Fixed Points

• f = (fi ,m,a)ni=1 continuous map f : ∆n−1 → ∆n−1

• Results

1 f has finite number of fixed points: at most m2n

2 for small m only fixed point is ( 1n , . . . ,

1n ), stable

3 for fixed (sufficiently large) m number of fixed points varieswith a: small a only one fixed point (uniform distribution); asa increases bifurcation: other fixed points arise

4 large values of a ∼ 1: uniform distribution no longer stable,only the fixed points with one dominant language are

CS101 Win2015: Linguistics Language Evolution 4

Page 16: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

Language Learning and Statistical Physics

• these bifurcations and emergence of linguistic coherencereminiscent of behavior of Ising model and spin glass systems inStatistical Physics

• an ensemble of interacting components

• degree of interaction governed by a thermodynamic parameterβ ∼ 1/T inverse temperature

• these systems often exhibit phase transitions between differentregimes, at some critical temperature T = Tc (different states ofmatter, loss of magnetization, etc.)

CS101 Win2015: Linguistics Language Evolution 4

Page 17: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

Language Evolution in Locally Connected Societies

• two possible languages: {L0,L1} = {0, 1}

• Graph G representing linguistic agents and their interaction

each vertex v ∈ V (G ) has an associated random variableXv (t)

Xv (t) ∈ {0, 1}: language of agent occupying position v

Xv (t + 1) language occupying same position at next step(generation)

P(Xv (t + 1) = 1) = ga,m(µv (t))

µv =1

val(v)

∑e∈E(G):∂(e)={v ,v ′}

Xv ′(t)

• nearest neighbor interaction considered only

CS101 Win2015: Linguistics Language Evolution 4

Page 18: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

• as before assuming a = Pi (Ci ) same for both languages

• a possible choice for the function ga,m : [0, 1]→ [0, 1]:

g(x) =1

2+

1

2(f1,a,m(x , 1− x)− f1,a,m(1− x , x))

with f1,a,m as before counting probability of set of cues k1 > k2

f1,a,m(x , 1− x) =∑(

m

k1k2k3

)(ax)k1(a(1− x))k2(1− a)k3

sum over (k1, k2, k3) with m = k1 + k2 + k3 and k1 > k2

CS101 Win2015: Linguistics Language Evolution 4

Page 19: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

• study evolution of

αG (t) =1

#V (G )

∑v∈V (G)

Xv (t)

average number of L1-speakers at time/generation t

• for a complete graph have all language users connected to allothers: recover model in which learning from whole community

• can consider asymptotic behaviors when size of graph becomeslarge #V (G ) = N →∞

• can simplify the geometry making special assumptions on thegraph: e.g. a square lattice

CS101 Win2015: Linguistics Language Evolution 4

Page 20: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

The Ising Model of spin systems on a graph G

• configurations of spins s : V (G )→ {±1}

• magnetic field B and correlation strength J: Hamiltonian

H(s) = −J∑

e∈E(G):∂(e)={v ,v ′}

sv sv ′ − B∑

v∈V (G)

sv

• first term measures degree of alignment of nearby spins

• second term measures alignment of spins with direction ofmagnetic field

CS101 Win2015: Linguistics Language Evolution 4

Page 21: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

Equilibrium Probability Distribution

• Partition Function ZG (β)

ZG (β) =∑

s:V (G)→{±1}

exp(−βH(s))

• Probability distribution on the configuration space: Gibbsmeasure

PG ,β(s) =e−βH(s)

ZG (β)

• low energy states weight most

• at low temperature (large β): ground state dominates; at highertemperature (β small) higher energy states also contribute

CS101 Win2015: Linguistics Language Evolution 4

Page 22: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

Average Spin Magnetization

MG (β) =1

#V (G )

∑s:V (G)→{±1}

∑v∈V (G)

sv P(s)

• Free energy FG (β,B) = logZG (β,B)

MG (β) =1

#V (G )

1

β

(∂FG (β,B)

∂B

)|B=0

• thermodynamic limit: #V (G ) = N →∞

m(β) = lim#V (G)→∞

MG (β)

• in these thermodynamic limits need to fix a way in which thegeometry of the graph grows: if it is a lattice, just grow size N oflattice; for other kinds of graphs, fix how smaller graphs embeddedin larger graphs

CS101 Win2015: Linguistics Language Evolution 4

Page 23: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

Ising Model on a 2-dimensional lattice

• ∃ critical temperature T = Tc where phase transition occurs

• for T > Tc equilibrium state has m(T ) = 0 (computed withrespect to the equilibrium Gibbs measure PG ,β

• demagnetization: on average as many up as down spins

• for T < Tc have m(T ) > 0: spontaneous magnetization

• Warning: beware of thermodynamic limits!

• a lot of technical problems in these spin glass models go intohow one takes these limits where the size N of the graph N →∞(even for simple geometries like lattice case)

CS101 Win2015: Linguistics Language Evolution 4

Page 24: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

A Spin Glass model of Language Learning

• a multilingual society = a graph G

• linguistic agents = vertices of the graph V (G )

• which agents interact with which others = edges E (G )

• possible languages = spin states- Ising models {±1}: two languages model- Potts models {1, . . . , q}: many languages model

• distribution of population across different languages = averagemagnetization

• previous analysis for “input from whole society” = mean fieldtheory for case of Ising model on complete graph

CS101 Win2015: Linguistics Language Evolution 4

Page 25: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

Syntactic Parameters and Ising/Potts Models

• a different view on how to use spin glass models for languageevolution

• characterize set of n = 2N languages Li by binary strings of Nsyntactic parameters (Ising model)

• or by ternary strings (Potts model) if take values ±1 forparameters that are set and 0 for parameters that are not definedin a certain language

• a system of n interacting languages = graph G with n = #V (G )

• languages Li = vertices of the graph (though of as, for instance,the language that occupies a certain geographic area)

• languages that have interaction with each other = edges E (G )(geographical proximity, or high volume of exchange for otherreasons)

CS101 Win2015: Linguistics Language Evolution 4

Page 26: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

graph of language interaction (detail) from Global LanguageNetwork of MIT Medialab, with interaction strengths Je on edgesbased on number of book translations

CS101 Win2015: Linguistics Language Evolution 4

Page 27: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

• if only one syntactic parameter, would have an Ising model onthe graph G : configurations s : V (G )→ {±1} set the parameterat all the locations on the graph

• variable interaction energies along edges (some pairs oflanguages interact more than others) • magnetic field B andcorrelation strength J: Hamiltonian

H(s) = −∑

e∈E(G):∂(e)={v ,v ′}

N∑i=1

Je sv ,i sv ′,i

• if N parameters, configurations

s = (s1, . . . , sN) : V (G )→ {±1}N

• if all N parameters are independent, then it would be like havingN non-interacting copies of a Ising model on the same graph G (orN independent choices of an initial state in an Ising model on G )

CS101 Win2015: Linguistics Language Evolution 4

Page 28: Models of Language Evolution: Part IV Linguistic …matilde/LangEvolution4.pdfPopulation Dynamics Model following previous model: languages 1;:::; n and linguistic evolution in population

• an interesting problem in this model is the entailment ofparameters: it is known that flipping certain syntactic parameterscauses others to flip as well

• so in addition to the edge interaction, instead of alignment withexternal magnetic field term in H

−B∑

v∈V (G)

sv

should have a term that favors alignment of entailed parameters

• set of parameters P with N = #P; subset E ⊂ P × P ofentailments: pairs of parameters (Π,Π′) such that flipplng Πcauses Π′ to flip as well

• term in Hamiltonian favoring alignments of entailed parameters

−B∑

v∈V (G)

∑(Π,Π′)∈E

sv ,Π sv ,Π′

CS101 Win2015: Linguistics Language Evolution 4