Word Order Universals and Information Density
Michael Cysouw & Jelena Prokić


Page 1

Word Order Universals and Information Density

Michael Cysouw & Jelena Prokić

Page 2

Information ...

• We will be using notions from information theory as a statistical method

• Not to measure the information density of linguistic utterances itself

• But as a method to test typological correlations

Page 3

LETTER doi:10.1038/nature09923

Evolved structure of language shows lineage-specific trends in word-order universals
Michael Dunn1,2, Simon J. Greenhill3,4, Stephen C. Levinson1,2 & Russell D. Gray3

Languages vary widely but not without limit. The central goal of linguistics is to describe the diversity of human languages and explain the constraints on that diversity. Generative linguists following Chomsky have claimed that linguistic diversity must be constrained by innate parameters that are set as a child learns a language1,2. In contrast, other linguists following Greenberg have claimed that there are statistical tendencies for co-occurrence of traits reflecting universal systems biases3–5, rather than absolute constraints or parametric variation. Here we use computational phylogenetic methods to address the nature of constraints on linguistic diversity in an evolutionary framework6. First, contrary to the generative account of parameter setting, we show that the evolution of only a few word-order features of languages are strongly correlated. Second, contrary to the Greenbergian generalizations, we show that most observed functional dependencies between traits are lineage-specific rather than universal tendencies. These findings support the view that—at least with respect to word order—cultural evolution is the primary factor that determines linguistic structure, with the current state of a linguistic system shaping and constraining future states.

Human language is unique amongst animal communication systems not only for its structural complexity but also for its diversity at every level of structure and meaning. There are about 7,000 extant languages, some with just a dozen contrastive sounds, others with more than 100, some with complex patterns of word formation, others with simple words only, some with the verb at the beginning of the sentence, some in the middle, and some at the end. Understanding this diversity and the systematic constraints on it is the central goal of linguistics. The generative approach to linguistic variation has held that linguistic diversity can be explained by changes in parameter settings. Each of these parameters controls a number of specific linguistic traits. For example, the setting ‘heads first’ will cause a language both to place verbs before objects (‘kick the ball’), and prepositions before nouns (‘into the goal’)1,7. According to this account, language change occurs when child learners simplify or regularize by choosing parameter settings other than those of the parental generation. Across a few generations such changes might work through a population, effecting language change across all the associated traits. Language change should therefore be relatively fast, and the traits set by one parameter must co-vary8.

In contrast, the statistical approach adopted by Greenbergian linguists samples languages to find empirically co-occurring traits. These co-occurring traits are expected to be statistical tendencies attributable to universal cognitive or systems biases. Among the most robust of these tendencies are the so-called ‘word-order universals’3 linking the order of elements in a clause. Dryer has tested these generalizations on a worldwide sample of 625 languages and finds evidence for some of these expected linkages between word orders9. According to Dryer’s reformulation of the word-order universals, dominant verb–object ordering correlates with prepositions, as well as relative clauses and genitives after the noun, whereas dominant object–verb ordering predicts postpositions, relative clauses and genitives before the noun4. One general explanation for these observations is that languages tend to be consistent (‘harmonic’) in their order of the most important element or ‘head’ of a phrase relative to its ‘complement’ or ‘modifier’3, and so if the verb is first before its object, the adposition (here preposition) precedes the noun, while if the verb is last after its object, the adposition follows the noun (a ‘postposition’). Other functionally motivated explanations emphasize consistent direction of branching within the syntactic structure of a sentence9 or information structure and processing efficiency5.

To demonstrate that these correlations reflect underlying cognitive or systems biases, the languages must be sampled in a way that controls for features linked only by direct inheritance from a common ancestor10. However, efforts to obtain a statistically independent sample of languages confront several practical problems. First, our knowledge of language relationships is incomplete: specialists disagree about high-level groupings of languages and many languages are only tentatively assigned to language families. Second, a few large language families contain the bulk of global linguistic variation, making sampling purely from unrelated languages impractical. Some balance of related, unrelated and areally distributed languages has usually been aimed for in practice11,12.

The approach we adopt here controls for shared inheritance by examining correlation in the evolution of traits within well-established family trees13. Drawing on the powerful methods developed in evolutionary biology, we can then track correlated changes during the historical processes of language evolution as languages split and diversify. Large language families, a problem for the sampling method described above, now become an essential resource, because they permit the identification of coupling between character state changes over long time periods. We selected four large language families for which quantitative phylogenies are available: Austronesian (with about 1,268 languages14 and a time depth of about 5,200 years15), Indo-European (about 449 languages14, time depth of about 8,700 years16), Bantu (about 668 or 522 for Narrow Bantu17, time depth about 4,000 years18) and Uto-Aztecan (about 61 languages19, time-depth about 5,000 years20). Between them these language families encompass well over a third of the world’s approximately 7,000 languages. We focused our analyses on the ‘word-order universals’ because these are the most frequently cited exemplary candidates for strongly correlated linguistic features, with plausible motivations for interdependencies rooted in prominent formal and functional theories of grammar.

To test the extent of functional dependencies between word-order variables, we used a Bayesian phylogenetic method implemented in the software BayesTraits21. For eight word-order features we compared correlated and uncorrelated evolutionary models. Thus, for each pair of features, we calculated the likelihood that the observed states of the characters were the result of the two features evolving independently, and compared this to the likelihood that the observed states were the result of coupled evolutionary change. This likelihood calculation was

1Max Planck Institute for Psycholinguistics, Post Office Box 310, 6500 AH Nijmegen, The Netherlands. 2Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Kapittelweg 29, 6525 EN Nijmegen, The Netherlands. 3Department of Psychology, University of Auckland, Auckland 1142, New Zealand. 4Computational Evolution Group, University of Auckland, Auckland 1142, New Zealand.


Page 4

Page 5

WALS Feature

82 Order of Subject and Verb
83 Order of Object and Verb
84 Order of Object, Oblique, and Verb
85 Order of Adposition and Noun Phrase
86 Order of Genitive and Noun
87 Order of Adjective and Noun
88 Order of Demonstrative and Noun
89 Order of Numeral and Noun
90 Order of Relative Clause and Noun
91 Order of Degree Word and Adjective
92 Position of Polar Question Particles
93 Position of Interrogative Phrases in Content Questions
94 Order of Adverbial Subordinator and Clause

All data from Matthew Dryer

Page 6

Page 7

Autocorrelation (Galton’s Problem)

“The difficulty raised by Mr. Galton that some of the concurrences might result from transmission from a common source, so that a single character might be counted several times from its mere duplication, is a difficulty ever present in such investigations [...]. The only way of meeting this objection is to make separate classifications depend on well marked differences, and to do this all over the world” (Tylor 1889: 272).

Page 8

Dunn et al. (2011)

• Different solution to Galton’s problem
‣ based on work by Mark Pagel (see also Elena Maslova)

• Use the detailed structure of the genealogical tree to investigate changes in types

• Correlated characteristics should co-evolve, i.e. change together

Page 9

[Figure: maximum clade credibility trees for the Austronesian, Indo-European, Uto-Aztecan and Bantu families, with language names at the tips and colour-coded word-order states plotted at each tip.]

Figure 1 | Two word-order features plotted onto maximum clade credibility trees of the four language families. Squares represent order of adposition and noun; circles represent order of verb and object. The tree sample underlying this tree is generated from lexical data16,22. Blue-blue indicates postposition, object–verb. Red-red indicates preposition, verb–object. Red-blue indicates preposition, object–verb. Blue-red indicates postposition, verb–object. Black indicates polymorphic states.


Page 10

Right in principle, but:
(see reactions in a special issue of Linguistic Typology)

• Their interpretation of results is too radical
‣ “most observed functional dependencies between traits are lineage-specific rather than universal tendencies” (p. 79)

• It is difficult to obtain the necessary data for many families
‣ 4 families is not enough to find weaker typological patterns

• Computationally, their method is very demanding

Page 11

Any alternative?

• Conditional Mutual Information (a minimal computational sketch follows below)
‣ Information (or entropy) of a typological feature measures ‘fractionality’
‣ Mutual Information is a measure of shared distribution
‣ Conditional Mutual Information accounts for conditioning factors
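The slides do not spell out how these quantities are computed, so here is a minimal sketch of the standard plug-in estimates for categorical data: entropy H(X), mutual information I(X;Y) = H(X) + H(Y) − H(X,Y), and conditional mutual information I(X;Y|Z) as the size-weighted average of the mutual information within each class of Z. The toy data and labels below are invented for illustration, not taken from WALS.

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy H(X), in bits, of a list of categorical values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned lists of values."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def conditional_mutual_information(xs, ys, zs):
    """I(X;Y|Z): mutual information of X and Y within each class of Z,
    weighted by the relative size of that class."""
    n = len(zs)
    total = 0.0
    for z, count in Counter(zs).items():
        idx = [i for i in range(n) if zs[i] == z]
        total += (count / n) * mutual_information([xs[i] for i in idx],
                                                  [ys[i] for i in idx])
    return total

# Toy illustration: order of object and verb vs. order of adposition and
# noun phrase, with a hypothetical family label per language.
ov     = ["OV", "OV", "VO", "VO", "OV", "VO"]
adpos  = ["Po", "Po", "Pr", "Pr", "Po", "Pr"]
family = ["A",  "A",  "A",  "B",  "B",  "B"]

print(mutual_information(ov, adpos))                       # I(X;Y)
print(conditional_mutual_information(ov, adpos, family))   # I(X;Y | family)
```

Conditioning by family in this way discounts whatever part of the association is already explained by languages of the same family sharing both values through common inheritance.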

Page 12

[Scatter plot: Mutual Information (log, y-axis) against Cramer's V squared (log, x-axis); one point per pair of WALS features (81–94, 143).]
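The x-axis of this plot is Cramér's V squared, a chi-squared based association measure computed from the same pairwise contingency tables. A minimal sketch of that quantity, on the same invented toy data as in the sketch above (not WALS values):

```python
import numpy as np

def cramers_v_squared(xs, ys):
    """Cramér's V squared from the contingency table of two categorical variables."""
    cats_x, cats_y = sorted(set(xs)), sorted(set(ys))
    table = np.zeros((len(cats_x), len(cats_y)))
    for x, y in zip(xs, ys):
        table[cats_x.index(x), cats_y.index(y)] += 1
    n = table.sum()
    # Chi-squared statistic against the independence expectation, normalized.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return chi2 / (n * (min(table.shape) - 1))

ov    = ["OV", "OV", "VO", "VO", "OV", "VO"]
adpos = ["Po", "Po", "Pr", "Pr", "Po", "Pr"]
print(cramers_v_squared(ov, adpos))   # 1.0 for this perfectly associated toy pair
```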

Page 13

Mutual Information between WALS features

[Scatter plot: Mutual Information conditioned by genera (log, y-axis) against Mutual Information (log, x-axis); one point per pair of WALS features (81–94, 143).]

Page 14

Comparison to Dunn et al.

• First, using only the data from Dunn et al.
‣ compare with Mutual Information: approximate match
‣ compare with Conditional Mutual Information (CMI) conditioned by families: good match

• Second, using all WALS data, CMI by family (a toy comparison sketch follows below)
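The following slides plot each feature pair's (conditional) mutual information against the average Bayes factor reported by Dunn et al. A toy sketch of that kind of comparison; the pair labels and all numbers below are placeholders, not the actual values:

```python
import numpy as np
from scipy.stats import spearmanr

# Placeholder values, one entry per WALS feature pair (hypothetical numbers).
pairs        = ["83-85", "83-86", "85-86", "87-88"]
bayes_factor = np.array([9.2, 6.1, 7.5, 0.8])     # average Bayes factor (Dunn et al.)
cmi          = np.array([0.09, 0.05, 0.07, 0.01])  # CMI conditioned by family

# A rank correlation asks whether pairs with high Bayes factors also have high CMI.
rho, p = spearmanr(bayes_factor, cmi)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```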

Page 15

[Scatter plot: Mutual Information (data from Dunn et al., log scale, y-axis) against average Bayes factor from Dunn et al. (x-axis); one point per pair of the eight word-order features.]

Page 16

[Scatter plot: Conditional Mutual Information (data from Dunn et al., log scale, y-axis) against average Bayes factor from Dunn et al. (x-axis); one point per pair of the eight word-order features.]

Page 17

[Scatter plot: Conditional Mutual Information (all WALS families, log scale, y-axis) against average Bayes factor from Dunn et al. (x-axis); one point per pair of the eight word-order features.]

Page 18

Conclusion

• Mutual Information conditioned by linguistic families (CMI) is highly similar to the Dunn et al. measure

• CMI is easier to apply for many families

• The Dunn et al. data show the influence of the limited selection of families

Page 19

MDS of Conditional MI (using Family as condition)

[MDS plot: WALS features 82–94 plotted on Dimensions 1 and 2, labelled Subject–Verb, Object–Verb, Adpositions, Genitive–Noun, Adjective–Noun, Demonstrative–Noun, Numeral–Noun, Relative Clause–Noun, Degree Word and Adjective, Polar Question Particles, Content Interrogatives, Adverbial Subordinator.]
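One way to produce a map like this is to treat strongly associated features as close together: turn the pairwise CMI values into dissimilarities and run metric MDS. A minimal sketch with invented CMI values; the dissimilarity transform used for the slide is not stated, so the max-minus-CMI choice below is just one plausible option.

```python
import numpy as np
from sklearn.manifold import MDS

# Invented symmetric matrix of pairwise CMI values between four features.
features = ["83 Object-Verb", "85 Adpositions", "86 Genitive-Noun", "87 Adjective-Noun"]
cmi = np.array([
    [0.00, 0.12, 0.08, 0.01],
    [0.12, 0.00, 0.07, 0.02],
    [0.08, 0.07, 0.00, 0.01],
    [0.01, 0.02, 0.01, 0.00],
])

# Strong association -> small distance.
dist = cmi.max() - cmi
np.fill_diagonal(dist, 0.0)

coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)
for name, (x, y) in zip(features, coords):
    print(f"{name}: Dim1 = {x:.2f}, Dim2 = {y:.2f}")
```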

Page 20

Next steps

• Conditional Mutual Information uses a classification as condition (e.g. genera, families, areas, ...)

• Many classifications can be combined as multiple conditioning factors (see the sketch below)

• But: hierarchically ordered classifications are identical to the most detailed classification (e.g. in WALS: genera ⊂ families ⊂ areas ≡ genera)

• New work by Dress & Albu: Conditional Mutual Information, conditioned by a tree
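Combining several classifications into one conditioning factor can be done by conditioning on the joint label, e.g. the pair (genus, area). A minimal self-contained sketch with invented labels; note that when one classification is nested inside the other, the joint label collapses to the most detailed one, which is exactly the limitation noted above.

```python
from collections import Counter
from math import log2

def H(xs):
    """Shannon entropy, in bits, of a list of (hashable) categorical values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def cmi(xs, ys, zs):
    """I(X;Y|Z) = sum over classes z of p(z) * I(X;Y | Z=z)."""
    n = len(zs)
    total = 0.0
    for z, count in Counter(zs).items():
        idx = [i for i in range(n) if zs[i] == z]
        x, y = [xs[i] for i in idx], [ys[i] for i in idx]
        total += (count / n) * (H(x) + H(y) - H(list(zip(x, y))))
    return total

# Invented labels: two classifications combined into one conditioning factor.
genus = ["Ge1", "Ge1", "Ge2", "Ge3", "Ge3", "Ge3"]
area  = ["Af",  "Af",  "Eu",  "Eu",  "Eu",  "Af"]
ov    = ["OV",  "OV",  "VO",  "VO",  "OV",  "VO"]
adpos = ["Po",  "Po",  "Pr",  "Pr",  "Po",  "Pr"]

joint = list(zip(genus, area))                       # condition on (genus, area) jointly
print(cmi(ov, adpos, joint))
print(cmi(ov, adpos, genus), cmi(ov, adpos, area))   # single-factor versions for comparison
```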