Investigating the potential of ancestral state reconstruction algorithms in historical linguistics
Gerhard Jäger & Johann-Mattis List
Tübingen University & CRLAO / Team AIRE, Paris
Capturing Phylogenetic Algorithms for Linguistics, Leiden
October 28, 2015
Jäger & List (Tübingen/Paris) Ancestral state reconstruction Leiden 1 / 42
Introduction
What is Ancestral State Reconstruction?
While tree-building methods seek branching diagrams which explain how a language family has evolved, ASR methods use these branching diagrams to explain what concretely has evolved.
Ancestral state reconstruction is very common in evolutionary biology but only rarely practiced in computational historical linguistics (Bouchard-Côté et al. 2013).
In classical historical linguistics, on the other hand, linguistic reconstruction of proto-forms and proto-meanings is very common and one of the main goals of the classical comparative method (Fox 1995).
ASR of Lexical Replacement Patterns
If we look for words corresponding to one meaning in a wordlist and know which of the words are cognate, we may ask which of the word forms was the most likely candidate to be used in the proto-language of all descendant languages.
This question resembles the task of “semantic reconstruction”, but in contrast to classical semantic reconstruction, we are only operating within one concept slot here, disregarding all words with a different meaning which may also be cognate with the words in our sample.
As a result of this restriction, it is quite likely that we cannot recover the original form from our data.
It is, however, very interesting to see to which degree we can propose a good candidate word form (cognate set) for the proto-language.
[Figure, built up in steps: a tree over the word forms Kopf, kop, head, tête, testa, and cap, all meaning "head". The forms at the ancestral nodes are first marked as unknown ("?") and then filled in with *kop, *haubud-, testa, caput, and *kaput-.]
This talk
reconstruction of the cognate class at the root
[Figure: a tree with leaf states A, A, B, B, C, C. The root state is first marked as unknown ("?") and then reconstructed as B.]
Materials and Methods Materials
Data
IELex
  153 Indo-European doculects
  207 concepts
  entries for Proto-Indo-European for 135 concepts → used as gold standard
  arbitrarily split into training set and test set:
    training set: 67 concepts, 1127 cognate classes (83 occur in PIE)
    test set: 68 concepts, 957 cognate classes (79 occur in PIE)
ABVD
  743 Austronesian doculects → 100 were selected at random
  210 concepts; for 154 of them entries for Proto-Austronesian
  split into training set and test set:
    training set: 81 concepts, 1695 cognate classes (88 occur in PAn)
    test set: 74 concepts, 1584 cognate classes (79 occur in PAn)
Materials and Methods Methods
Prerequisites: Trees
Trees
  trees were inferred from the full data set (training + test data) via Bayesian inference
    IELex outgroup: Anatolian
    ABVD outgroup: Malayo-Polynesian
  random samples of 1000 trees from the posterior distributions
  maximum clade credibility trees
[Figure: two trees, one over the IELex doculects (scale bar 600.0) and one over the sampled ABVD doculects (scale bar 0.06); the leaf labels are omitted here.]
Phylogenetic uncertainty
proper way to deal with it: work with a posterior sample rather than with a single tree
poor man’s method:
  remove all short branches (shorter than some threshold)
  do ASR with the resulting multifurcating tree
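The “poor man’s method” can be sketched as follows. This is a minimal illustration, not the code used for the talk: the `Node` class and the function name are hypothetical, and adding the removed branch length to the promoted children (so that root-to-tip distances stay the same) is an assumption that the slide does not specify.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """A rooted tree node; `length` is the branch leading to this node."""
    name: str = ""
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def collapse_short_branches(node: Node, threshold: float) -> Node:
    """Return a copy of the tree in which every internal branch shorter
    than `threshold` is removed and its children are attached directly
    to the parent, which may create multifurcations."""
    new_children = []
    for child in node.children:
        collapsed = collapse_short_branches(child, threshold)
        if collapsed.children and collapsed.length < threshold:
            # Short internal branch: promote the grandchildren.
            # Assumption: the removed length is added to their branches.
            for grandchild in collapsed.children:
                grandchild.length += collapsed.length
                new_children.append(grandchild)
        else:
            new_children.append(collapsed)
    return Node(node.name, node.length, new_children)


# Toy tree ((A:1,B:1):0.01,(C:1,D:1):2); the 0.01 branch is collapsed.
tree = Node("root", 0.0, [
    Node("", 0.01, [Node("A", 1.0), Node("B", 1.0)]),
    Node("", 2.0, [Node("C", 1.0), Node("D", 1.0)]),
])
multi = collapse_short_branches(tree, threshold=0.1)
print([child.name for child in multi.children])
```

The root of the toy tree ends up with three children (A, B, and the unchanged C/D clade), i.e. a multifurcation.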
[Figure: IELex tree in which short branches have been collapsed into multifurcations (scale bar 100.0); the leaf labels are omitted here.]
Coding
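Multi-state
[Figure: a tree with leaf states A, A, B, B, C, C and root state B.]
Binarized
[Figure: the same tree shown once per cognate class, with the leaves coded as A/non-A, B/non-B, and C/non-C; the root is non-A, B, and non-C, respectively.]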
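The step from multi-state to binarized coding can be illustrated with a short sketch. The function name and the doculect labels L1 to L6 are hypothetical; the six leaf states are those of the example tree.

```python
def binarize(multistate: dict[str, str]) -> dict[str, dict[str, int]]:
    """Turn one multi-state character (doculect -> cognate class) into
    one presence/absence character per cognate class."""
    classes = sorted(set(multistate.values()))
    return {
        cls: {lang: int(state == cls) for lang, state in multistate.items()}
        for cls in classes
    }


# Leaf states of the example tree; the doculect names are placeholders.
leaves = {"L1": "A", "L2": "A", "L3": "B", "L4": "B", "L5": "C", "L6": "C"}
binary = binarize(leaves)
for cls, character in binary.items():
    print(cls, character)
```

Each of the three resulting characters is then reconstructed on its own, so that a root may end up with zero, one, or several cognate classes.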
Polymorphisms (a.k.a. synonyms)
[Figure: the tree over the "head" forms Kopf, kop, head, tête, testa, and cap, with Haupt and hoofd as additional forms (synonyms) at two of the leaves.]
problem for multistate coding
possible representations:
  epistemic: both observations have 50% (subjective) probability
  lifted model: states in the technical sense are sets of cognate classes
Parsimony reconstruction
[Figure, three builds: a tree with two A, two B, and two C leaves, shown with three different assignments of states to the internal nodes. The resulting parsimony scores are 2, 3, and 3.]
Weighted parsimony reconstruction
[Figure, three builds: the same tree and internal-node assignments as for unweighted parsimony, now scored with the weight matrix below. The weighted parsimony scores are 3, 4, and 5.]
Weight matrix
     A  B  C
  A  0  1  2
  B  1  0  2
  C  2  2  0
Dynamic Programming (Sankoff Algorithm)
\[
\mathrm{wp}(\text{mother}, s) = \sum_{d \in \text{daughters}} \min_{s' \in \text{states}} \bigl( w(s, s') + \mathrm{wp}(d, s') \bigr)
\]
[Figure, four builds: the example tree with two A, two B, and two C leaves.]
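The recursion can be sketched in a few lines. This is an illustrative implementation, not the code used for the talk. The tree topology ((A,(B,B)),(A,(C,C))) is an assumption, since the shape of the example tree cannot be recovered from the slide text; the weight matrix is the one shown for weighted parsimony.

```python
import math


def sankoff(tree, states, w):
    """Return {state: weighted parsimony score} for the root of `tree`.

    `tree` is either a leaf (a state label) or a tuple of subtrees;
    `w[s][t]` is the cost of a change from s (mother) to t (daughter).
    """
    if isinstance(tree, str):
        # Leaf: the observed state costs nothing, all others are excluded.
        return {s: 0.0 if s == tree else math.inf for s in states}
    daughter_scores = [sankoff(d, states, w) for d in tree]
    return {
        s: sum(min(w[s][t] + wp_d[t] for t in states) for wp_d in daughter_scores)
        for s in states
    }


# Assumed topology (not given in the slide text).
tree = (("A", ("B", "B")), ("A", ("C", "C")))
states = ["A", "B", "C"]
unit = {s: {t: 0 if s == t else 1 for t in states} for s in states}
weighted = {
    "A": {"A": 0, "B": 1, "C": 2},
    "B": {"A": 1, "B": 0, "C": 2},
    "C": {"A": 2, "B": 2, "C": 0},
}
print(sankoff(tree, states, unit))
print(sankoff(tree, states, weighted))
```

Under this assumed topology the root scores for A, B, and C are 2, 3, 3 with unit weights and 3, 4, 5 with the weight matrix, the same values that appear on the parsimony slides.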
Weighted Parsimony reconstruction
the state with the lowest parsimony score wins
in case of ties, frequency at the leaves is the tie-breaker
binary characters:
  w(0 → 1) = 1; w(1 → 0) = 2
multi-state characters: all weights = 1
polymorphism only admitted at tips:
  w(a → {a, b}) = 0
  w(a → {b, c}) = 1
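The tip costs for polymorphic leaves and the tie-breaking rule can be sketched as follows. Both function names are hypothetical, and the way remaining ties (same score and same leaf frequency) are resolved is an assumption: the sketch simply keeps the first such state.

```python
from collections import Counter


def tip_cost(state, observed):
    """Cost w(state -> observed), where `observed` is the set of cognate
    classes attested at a tip: 0 if `state` is among them, otherwise 1."""
    return 0 if state in observed else 1


def pick_root_state(scores, leaf_sets):
    """Lowest parsimony score wins; ties are broken by leaf frequency."""
    freq = Counter(s for observed in leaf_sets for s in observed)
    return min(scores, key=lambda s: (scores[s], -freq[s]))


print(tip_cost("a", {"a", "b"}))   # state is attested at the tip
print(tip_cost("a", {"b", "c"}))   # state is not attested at the tip
print(pick_root_state({"A": 2, "B": 2, "C": 3}, [{"A"}, {"B"}, {"B", "C"}]))
```

In the last call A and B tie on the score, and B wins because it occurs at two leaves while A occurs at one.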
The MLN Method for ASR
The MLN method (List et al. 2014a) uses parsimony for ancestral state reconstruction.
In contrast to classical parsimony, MLN tests different weighting schemes for gains and losses and selects the optimal scheme with the help of the vocabulary size criterion.
The vocabulary size criterion states that the number of synonyms per word should be similar in the ancestral and the descendant languages.
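The idea behind the selection step can be illustrated with a deliberately simplified sketch. It is not the MLN implementation: comparing mean vocabulary sizes is an assumption made here for brevity, the function name is hypothetical, and all numbers are made up.

```python
from statistics import mean


def choose_weighting(leaf_sizes, ancestral_sizes_by_scheme):
    """Pick the weighting scheme whose reconstructed ancestral vocabulary
    sizes are, on average, closest to the sizes observed at the leaves."""
    target = mean(leaf_sizes)
    return min(
        ancestral_sizes_by_scheme,
        key=lambda scheme: abs(mean(ancestral_sizes_by_scheme[scheme]) - target),
    )


# Made-up vocabulary sizes (number of cognate classes per language).
leaf_sizes = [205, 210, 198, 207]
# Keys are hypothetical (gain weight, loss weight) pairs.
candidates = {
    (1, 1): [420, 390, 450],   # too many synonyms in ancestral nodes
    (1, 3): [90, 110, 120],    # too few synonyms in ancestral nodes
    (1, 2): [200, 215, 204],   # close to the descendant languages
}
print(choose_weighting(leaf_sizes, candidates))
```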
[Figure, three builds: the same tree under three weighting schemes, labeled “Too many synonyms in ancestral nodes!”, “Too few synonyms in ancestral nodes!”, and “Optimal amount of synonyms in ancestral nodes!”.]
The vocabulary size criterion states that the number of synonyms per word (here reflected by the size of the nodes in the tree) should be similar across ancestral and descendant languages. With the help of this criterion, an optimal weighting scheme for gain-loss rates is chosen for individual datasets.
Reconstruction on a posterior sample
if a sample of trees is used: a state is reconstructed if it is reconstructed in more than a fraction θ of the trees in the sample. θ is estimated using the training set.
values:
  database  method              θ
  IELex     Sankoff/binary      0.690
  IELex     Sankoff/multistate  0.056
  IELex     MLN                 0.464
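The thresholding over a tree sample can be sketched as follows. The function name and the toy sample of ten trees are hypothetical; the two θ values are the ones listed for Sankoff on IELex.

```python
from collections import Counter


def reconstruct_from_sample(per_tree_states, theta):
    """`per_tree_states` holds, for each sampled tree, the set of states
    reconstructed at the root. A state is kept if it is reconstructed
    in more than a fraction `theta` of the trees."""
    counts = Counter(s for states in per_tree_states for s in set(states))
    n_trees = len(per_tree_states)
    return {s for s, c in counts.items() if c / n_trees > theta}


# Toy sample: A is reconstructed in 8 of 10 trees, B in 3 of 10.
sample = [{"A"}] * 7 + [{"B"}] * 2 + [{"A", "B"}]
print(reconstruct_from_sample(sample, theta=0.690))
print(reconstruct_from_sample(sample, theta=0.056))
```

With the high threshold only A survives; with the low threshold both A and B are reconstructed.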
Likelihood-based reconstruction
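\[
\log L(\text{tips below} \mid \text{mother} = s) = \sum_{d \in \text{daughters}} \log \sum_{s' \in \text{states}} P(s \to s' \mid \text{branch length}) \cdot L(\text{tips below } d \mid d = s')
\]
[Figure: the example tree with two A, two B, and two C leaves.]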
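The likelihood recursion (Felsenstein's pruning algorithm) can be sketched as follows. This is an illustration only, not BayesTraits: it assumes an equal-rates model with a single hypothetical `rate` parameter, works with plain probabilities instead of logarithms for readability, and uses a made-up toy tree with made-up branch lengths.

```python
import math


def p_transition(s, t, length, n_states, rate=1.0):
    """P(s -> t | branch length) when all rates between states are equal."""
    decay = math.exp(-n_states * rate * length)
    if s == t:
        return 1.0 / n_states + (n_states - 1) / n_states * decay
    return 1.0 / n_states - decay / n_states


def partial_likelihoods(tree, states, rate=1.0):
    """Return {s: L(tips below | node = s)}.

    `tree` is a leaf (a state label) or a list of
    (subtree, branch_length) pairs.
    """
    if isinstance(tree, str):
        return {s: 1.0 if s == tree else 0.0 for s in states}
    result = {s: 1.0 for s in states}
    for subtree, length in tree:
        below = partial_likelihoods(subtree, states, rate)
        for s in states:
            # Sum over daughter states, product over daughters.
            result[s] *= sum(
                p_transition(s, t, length, len(states), rate) * below[t]
                for t in states
            )
    return result


states = ["A", "B", "C"]
# Toy tree (A:0.1,(B:0.1,B:0.1):0.2) with made-up branch lengths.
toy_tree = [("A", 0.1), ([("B", 0.1), ("B", 0.1)], 0.2)]
lik = partial_likelihoods(toy_tree, states)
print({s: round(v, 4) for s, v in lik.items()})
```

For trees of realistic size the products underflow, so a practical implementation would accumulate logarithms as in the formula.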
note: likelihoods (unlike parsimony scores) depend on branch lengths!
likelihoods at the root give the likelihood of a reconstruction, given all observed data (for that character)
total likelihood is obtained by multiplying root state likelihoods with equilibrium probabilities given a rate matrix and summing over the root states
rate matrix is optimized to maximize likelihood
  rates across characters are independently optimized
  for multistate characters, all rates are constrained to be equal (otherwise BayesTraits crashes…)
using equilibrium probabilities, you can derive expected state probabilities for root states
a state is likelihood-reconstructed if its expected probability > θ2
  again, threshold θ2 must be estimated from training set
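The last steps can be sketched as follows. The function names are hypothetical, and the root likelihoods, the uniform equilibrium probabilities, and the threshold of 0.5 are made-up illustration values, since the slides do not report the estimated θ2.

```python
def root_state_probabilities(root_likelihoods, equilibrium):
    """Weight each root state likelihood by its equilibrium probability
    and normalize; also return the total likelihood of the character."""
    weighted = {s: root_likelihoods[s] * equilibrium[s] for s in root_likelihoods}
    total = sum(weighted.values())
    return {s: v / total for s, v in weighted.items()}, total


def likelihood_reconstructed(probabilities, theta2):
    """States whose expected probability at the root exceeds theta2."""
    return {s for s, p in probabilities.items() if p > theta2}


root_lik = {"A": 0.09, "B": 0.04, "C": 0.01}     # made-up values
equilibrium = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
probs, total = root_state_probabilities(root_lik, equilibrium)
print({s: round(p, 3) for s, p in probs.items()})
print(likelihood_reconstructed(probs, theta2=0.5))
```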
Results General Results
Evaluation
[Figure: five panels showing precision, recall, and F-score (vertical axis from 0.0 to 0.8), broken down by database (ABVD, IELex), algorithm (ML, MLN, Sankoff), character type (binary valued, multi-valued), tree type (bifurcating, multifurcating), and tree sample (posterior sample, summary tree).]
Evaluation: IELex
algorithm  characters  tree type       tree sample       precision  recall  F-score
ML         binary      bifurcating     summary tree      0.817      0.734   0.773
ML         binary      bifurcating     posterior sample  0.795      0.734   0.763
ML         binary      multifurcating  summary tree      0.792      0.722   0.755
ML         binary      multifurcating  posterior sample  0.756      0.747   0.752
Sankoff    binary      multifurcating  summary tree      0.716      0.734   0.725
Sankoff    binary      bifurcating     summary tree      0.704      0.722   0.712
Sankoff    binary      multifurcating  posterior sample  0.720      0.684   0.701
Sankoff    binary      bifurcating     posterior sample  0.720      0.684   0.701
ML         multi       bifurcating     posterior sample  0.642      0.772   0.701
MLN        multi       bifurcating     posterior sample  0.743      0.658   0.698
MLN        binary      multifurcating  posterior sample  0.743      0.658   0.698
MLN        binary      bifurcating     posterior sample  0.743      0.658   0.698
Sankoff    multi       bifurcating     summary tree      0.671      0.722   0.695
Sankoff    multi       multifurcating  posterior sample  0.671      0.722   0.695
Sankoff    multi       bifurcating     posterior sample  0.671      0.722   0.695
ML         multi       multifurcating  posterior sample  0.629      0.772   0.693
MLN        multi       multifurcating  posterior sample  0.758      0.633   0.690
Sankoff    multi       multifurcating  summary tree      0.735      0.633   0.680
ML         multi       multifurcating  summary tree      0.735      0.633   0.680
ML         multi       bifurcating     summary tree      0.721      0.620   0.667
MLN        multi       multifurcating  summary tree      0.584      0.658   0.619
MLN        binary      multifurcating  summary tree      0.584      0.658   0.619
MLN        multi       bifurcating     summary tree      0.742      0.291   0.418
MLN        binary      bifurcating     summary tree      0.742      0.291   0.418
Evaluation: ABVD
algorithm  characters  tree type       tree sample       precision  recall  F-score
ML         multi       bifurcating     posterior sample  0.738      0.747   0.742
ML         binary      bifurcating     posterior sample  0.682      0.759   0.719
ML         multi       bifurcating     summary tree      0.740      0.684   0.711
ML         binary      bifurcating     summary tree      0.757      0.681   0.711
Sankoff    multi       bifurcating     summary tree      0.691      0.709   0.700
Sankoff    binary      multifurcating  posterior sample  0.781      0.633   0.699
ML         binary      multifurcating  posterior sample  0.761      0.646   0.699
ML         multi       multifurcating  summary tree      0.726      0.671   0.697
Sankoff    binary      bifurcating     posterior sample  0.726      0.671   0.697
ML         binary      multifurcating  summary tree      0.732      0.658   0.693
Sankoff    multi       multifurcating  summary tree      0.679      0.696   0.688
MLN        multi       bifurcating     summary tree      0.655      0.722   0.687
MLN        binary      bifurcating     summary tree      0.655      0.722   0.687
Sankoff    binary      bifurcating     summary tree      0.629      0.557   0.591
Sankoff    multi       multifurcating  posterior sample  0.542      0.570   0.556
Sankoff    multi       bifurcating     posterior sample  0.542      0.570   0.556
MLN        multi       multifurcating  posterior sample  0.414      0.848   0.556
MLN        multi       bifurcating     posterior sample  0.414      0.848   0.556
MLN        binary      multifurcating  posterior sample  0.414      0.848   0.556
MLN        binary      bifurcating     posterior sample  0.414      0.848   0.556
ML         multi       multifurcating  posterior sample  0.421      0.709   0.528
Sankoff    binary      multifurcating  summary tree      0.469      0.570   0.514
MLN        multi       multifurcating  summary tree      0.667      0.405   0.504
MLN        binary      multifurcating  summary tree      0.667      0.405   0.504
Results Specific Results
Summary on Indo-European ASR
Error type               GS    ASR   Number
Missing forms            A     Ø     7
Different forms          A     B     9
Additional forms in ASR  A     A, B  5
Missing root in ASR      A, B  A     4
Total                                25
Evaluating the Differences
We evaluate the differences qualitatively by checking
  the reflection of the proposed root in the branches, especially with semantically shifted word forms which may not occur in the wordlist data, using standard sources like Meier-Brügger (2002), Wodtko et al. (2008), Rix et al. (2002), and Pokorny (1959) for Indo-European in general, and specific sources like Vaan (2008) for Latin, Derksen (2008) and Vasmer (1986/1987) for Slavic, and Kroonen (2013) for Germanic,
  the likelihood of semantic shift of the given root with the help of the Database of Cross-Linguistic Colexifications (CLICS, List et al. 2013 and 2014b, http://clics.lingpy.org),
  whether the cognate sets in the data are really reflexes of the proposed PIE root.
Based on this check, we distinguish four grades of root quality: erroneous, problematic, possible, good
Indo-European ASR: Missing forms
Concept  Form      Meaning in reflexes       Comment
SEE      *derḱ-    to see                    Only reflected in Indo-Iranian, cognates also problematic.
SEE      *weid-    to see or to know         Safe root for Indo-European.
SING     *kan-     to sing or the rooster    Root is proposed for PIE on the basis of Germanic reflexes meaning “rooster”, which is a highly unlikely semantic change.
SMELL    *h₃ed-    to smell                  Potential root for PIE, but only reflected in Greek and Romance.
SMALL    *mei-     small                     Wrong cognate judgments in the database, since neither Russian malenkij nor English small go back to this root.
THINK    *teng-    to think or to feel       Root only reflected in Germanic languages with spurious reflexes in semantically shifted form in other branches. A better candidate for PIE would be *men- “the mind or to think”.
WASH     *leh₂w-   to wash or to pour        Wrong cognate assignment in the source since Romance and Albanian reflexes are not annotated.
WASH     *neigʷ-   to wash or water monster  Very unlikely PIE root, due to the extreme shift from “to wash” to “water monster” (cf. English nix) in the Germanic languages.
WET      *wed-     water or wet              Semantic change from “water” to “wet” is likely according to CLICS, but it is not clear why this should have already happened in PIE times.
Legend: erroneous, problematic, possible, good
Indo-European ASR: Missing Forms in ASR
Concept  Form in GS  Comment
NOT      *meh₁       This form is reflected in Ancient Greek as a prohibitive negation and also reconstructed as such. Whether it was the normal negation in PIE is less clear.
SLEEP    *drem       This form is mainly reflected in Latin and sporadically in Indic and Greek. It is much more likely that it meant something else in PIE and then shifted into this meaning.
VOMIT    *h₁rewg-    No need to reconstruct this form back to PIE, since it is only reflected in two languages of Romance.
YEAR     *ieHr-      This form has only reflexes in Germanic languages. Generally, the meaning “year” is difficult to reconstruct, due to the high potential for shift from “summer”, “winter”, “time”, etc. as shown in CLICS.
Legend: erroneous, problematic, possible, good
Indo-European ASR: Different Forms
Concept  GS         ASR         Comment
RIVER    *h₂ekʷeh₂  *h₂ep-      Form in GS meant “water” in PIE. Although a shift from “water” to “river” is likely according to CLICS, this meaning is an innovation in Germanic. The ASR form is reflected across multiple branches and a much better candidate.
RUB      *melh₁-    *terh₁-     Form in GS is not reflected in the standard literature (LIV and LIN); form in ASR is reflected in the meaning “to rub, to bore”.
SCRATCH  *gerbʰ-    *kes-       Form in GS is only reflected in a few Germanic languages, probably with a wrong cognate assignment. Following Derksen (2008), the ASR form is a much better candidate for the PIE word for “scratch”.
SKIN     *pel       *(s)kewH-   Form in GS is a good PIE root, but not necessarily with the meaning “skin”, as the meaning of the reflexes differs greatly. The ASR form derives from a PIE verb meaning “to cover”, but the cognate set should not contain Slavic words (Derksen 2008).
WALK     *ǵʰeh₁     *h₁ei-      The GS form is only reflected in Germanic. The ASR form is a clear PIE root, but the meaning may also have been “to go”.
WATER    *h₂ekʷeh₂  *wódr̥       The ASR form is a much better candidate for “water” in PIE, due to its high number of reflexes in all branches.
WHITE    *h₂elbʰós  *h₂erǵó-    The GS form is only reflected in Romance in this meaning and as meaning “cloud” in Hittite. The ASR form is a much better candidate, with a much more plausible connection between reflexes meaning “shine” and “white”, as also confirmed by CLICS.
WORM     *wrm̥i-     *kʷrm̥is     The ASR form is reflected in more branches of Indo-European, while the GS form is only reflected in Germanic and Romance.
Legend: erroneous, problematic, possible, good
Indo-European ASR: Additional Forms
Concept  Form in ASR  Comment
MOON     *lewk-s-nh₂  This form would go back to a PIE root meaning “to shine” and is often said to have independently turned to mean “moon” in Romance and Slavic and other branches. The shift from “shine” to “moon” is however not very likely (no evidence in CLICS), so it is also possible that the word already meant “moon” in PIE as an epithet (Vaan 2008).
SNOW     *ǵʰéi-mn̥-    The form has probably independently shifted from the original meaning “frost, cold”, which is a very likely shift according to CLICS.
SUCK     *suḱ-        The root is present in this meaning in many subbranches and a good candidate for PIE in this meaning.
THIS     *so / *to    The root is a clear PIE demonstrative (Meier-Brügger 2010), but the reflexes in the daughter languages vary greatly, due to analogical levelling.
WITH     *sm̥          A very good candidate for the meaning with reflexes in Greek, Indo-Iranian and Slavic.
Legend: erroneous, problematic, possible, good
Evaluation against our manually created gold standard
precision: 0.986 (1 false positive)
recall: 0.895 (8 false negatives)
F-score: 0.938 (the IELex PIE entries have an F-score of 0.854)
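The three measures can be recomputed with a short sketch. The function name is hypothetical, and the number of true positives (68) is not stated on the slide: it is an assumption, chosen because it is the count that reproduces the reported precision and recall together with 1 false positive and 8 false negatives.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall, and F-score from counts of true positives,
    false positives, and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# tp=68 is inferred, not taken from the slide (see the note above).
p, r, f = precision_recall_f(tp=68, fp=1, fn=8)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.986 0.895 0.938
```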
False positive
[Figure: tree of the Indo-European languages in the sample, annotated for cognate class snow:D]
Results Specific Results
False negatives
[Figure: tree of the Indo-European languages in the sample, annotated for cognate class river:O]
Results Specific Results
False negatives
[Figure: tree of the Indo-European languages in the sample, annotated for cognate class smell:W]
Results Specific Results
False negatives
[Figure: tree of the Indo-European languages in the sample, annotated for cognate class wet:I]
Results Specific Results
False negatives
[Figure: tree of the Indo-European languages in the sample, annotated for cognate class skin:B]
Results Specific Results
False negatives
[Figure: tree of the Indo-European languages in the sample, annotated for cognate class sleep:E]
Results Specific Results
False negatives
[Figure: tree of the Indo-European languages in the sample, annotated for cognate class white:E]
Results Specific Results
False negatives
[Figure: tree of the Indo-European languages in the sample, annotated for cognate class worm:B]
Results Specific Results
Summary on Indo-European
As the qualitative evaluation shows, the proto-forms that our best ASR method reconstructs for PIE are mostly as good as, if not better than, the candidates found in the gold standard. Given the general and well-known uncertainties of semantic reconstruction in classical historical linguistics, ASR methods could provide actual help in semantic reconstruction by supplying objective scenarios for word evolution along a given tree that follow a specific evolutionary model.
Discussion
Benefits of ASR (?)
If the language family is well-known:
ASR is of limited use in semantic reconstruction, since independent reconstructions by the comparative method are available, but it is quite useful for checking data quality and reference tree topology in lexicostatistical datasets.
If the language family is less well-known:
ASR is definitely useful as a preliminary analysis for semantic reconstruction, since it gives a more objective assessment of the consequences of a given theory of lexical replacement and external language change (a tree topology).
Discussion
Benefits of ASR (!)
ASR may help
1. to identify loci of homoplasy, which gives a first hint of parallel semantic change patterns and borrowing,
2. to quantify differential rates of lexical replacement for the concepts in a given wordlist,
3. to automatically identify sound change patterns and proto-form reconstructions.
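As a rough illustration of the first two points, the sketch below counts the minimal number of state changes that a binary cognate character needs on a tree. It uses Fitch parsimony only because it is compact; the slides do not tie these points to a particular ASR method. The tree, the language labels A to F and the presence values are all hypothetical.

```python
# Toy tree as nested tuples; leaves are hypothetical language labels.
TREE = ((("A", "B"), "C"), (("D", "E"), "F"))

# 1 = the language has a reflex of the cognate class in this meaning slot.
# Hypothetical data: the class shows up in two distant parts of the tree.
PRESENCE = {"A": 1, "B": 0, "C": 0, "D": 1, "E": 0, "F": 0}


def fitch(node, states):
    """Return (candidate state set, minimal number of changes) for a subtree."""
    if isinstance(node, str):  # leaf: its state is observed
        return {states[node]}, 0
    left, right = node
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:  # children agree on a state: no extra change needed
        return common, left_cost + right_cost
    # children disagree: keep both options and count one change
    return left_set | right_set, left_cost + right_cost + 1


root_states, changes = fitch(TREE, PRESENCE)
print(root_states, changes)
```

A binary character that fits the tree without conflict needs at most one change, so a count above one flags the character as a candidate for homoplasy. Summing the counts over all cognate classes of a concept gives a crude per-concept measure of how often that concept was replaced.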
Discussion
Caveats
Our current models are still very simplistic, insofar as they
- operate independently for each meaning slot,
- handle only binary (yes-no) cognate relations between words.
Future research will show whether it is possible to model lexical change across meanings and to allow for more fine-grained relations between cognate classes.
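The two restrictions named in the caveats can be made concrete with a small data-structure sketch. It turns cognate judgments into one binary presence/absence character per cognate class, labeled in the concept:class style used on the result slides (for example snow:D). The input triples, the language labels L1 to L3 and the function name are hypothetical and do not reproduce the actual dataset or pipeline.

```python
from collections import defaultdict

# Hypothetical input: (language, concept, cognate class) triples.
ENTRIES = [
    ("L1", "head", "A"),
    ("L2", "head", "A"),
    ("L3", "head", "B"),
    ("L1", "snow", "C"),
    ("L2", "snow", "D"),
    ("L3", "snow", "D"),
]


def binary_characters(entries):
    """Map each 'concept:class' label to a 0/1 value per language."""
    languages = sorted({lang for lang, _, _ in entries})
    attested = defaultdict(set)
    for lang, concept, cls in entries:
        attested[(concept, cls)].add(lang)
    # Simplification: a language without any entry for a concept is coded 0
    # here rather than as missing data.
    return {
        f"{concept}:{cls}": {lang: int(lang in langs) for lang in languages}
        for (concept, cls), langs in sorted(attested.items())
    }


CHARS = binary_characters(ENTRIES)
for label, row in CHARS.items():
    print(label, row)
```

Each character only records whether a language has a word of that class in that one meaning slot. A cognate that has shifted to a different meaning is invisible to the character, and partial cognacy cannot be expressed in this encoding.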
References
A. Bouchard-Côté, D. Hall, T. L. Griffiths, and D. Klein. Automated reconstruction of ancient languages using probabilistic models of sound change. Proceedings of the National Academy of Sciences of the United States of America, 110(11):4224–4229, 2013.
R. Derksen. Etymological dictionary of the Slavic inherited lexicon. Brill, Leiden and Boston, 2008.
G. Kroonen. Etymological dictionary of Proto-Germanic. Number 11 in Leiden Indo-European Etymological Dictionary Series. Brill, Leiden and Boston, 2013.
J.-M. List, A. Terhalle, and M. Urban. Using network approaches to enhance the analysis of cross-linguistic polysemies. In Proceedings of the 10th International Conference on Computational Semantics – Short Papers, pages 347–353, Stroudsburg, 2013. Association for Computational Linguistics.
J.-M. List, T. Mayer, A. Terhalle, and M. Urban. CLICS: Database of Cross-Linguistic Colexifications. Online Resource, 2014a. URL http://clics.lingpy.org.
J.-M. List, S. Nelson-Sathi, H. Geisler, and W. Martin. Networks of lexical borrowing and lateral gene transfer in language and genome evolution. Bioessays, 36(2):141–150, 2014b.
M. Meier-Brügger. Indogermanische Sprachwissenschaft. de Gruyter, Berlin and New York, 8th edition, 2002.
J. Pokorny. Indogermanisches etymologisches Wörterbuch, volume 1. Francke, Bern, 1959.
M. de Vaan. Etymological dictionary of Latin and the other Italic languages. Number 7 in Leiden Indo-European Etymological Dictionary Series. Brill, Leiden and Boston, 2008.
M. Vasmer. Ėtimologičeskij slovar’ russkogo jazyka. Progress, Moscow, 1986/1987.
D. Wodtko, B. Irslinger, and C. Schneider. Nomina im Indogermanischen Lexikon. Winter, Heidelberg, 2008.