Introduction and background information for the Catalogue of


About the Catalogue of the Endangered Languages of the World

Many languages of the world are at risk of becoming extinct soon. The crisis of endangered languages is one of the most serious issues facing humanity today, posing moral, practical, and scientific problems of enormous proportions. This catalogue informs users about the plight of endangered languages and encourages efforts to slow the loss. It provides information on the endangered languages of the world as a resource for the public, for scholars, for those whose languages are in peril, for community groups facing language loss, and for funding agencies that need to deploy limited resources.

Until this Catalogue of Endangered Languages, there has been no single reliable source of information on the endangered languages of the world, describing how endangered each language is and to what extent it has been documented. For many of the languages in this catalogue, little or no accessible information exists. For others, the existing sources are often inaccurate, unreliable, or inaccessible. For those seeking to understand where documentation efforts and resources might most effectively be directed, and where language conservation or revitalization efforts are most needed, it is important to know not only how critically endangered a language is, but also how well it has already been described, how different or unique it might be, and how further description might contribute to understanding of human language in general. This catalogue presents these kinds of information on the endangered languages of the world.

The Catalogue of Endangered Languages Project personnel

The Catalogue of Endangered Languages is under the direction of Lyle Campbell (University of Hawai‘i Mānoa) and Anthony Aristar and Helen Aristar-Dry (LINGUIST List/Eastern Michigan University).

The team at Eastern Michigan University (EMU) is responsible for the programming, the technical aspects of the Catalogue, bibliography management, and the languages of Africa and Australia. The University of Hawai‘i Mānoa (UHM) team is responsible for the languages of Europe, the Caucasus, North Asia, East Asia, South Asia (the Indian subcontinent), Southeast Asia, North America, Central America, South America, and the Pacific, and for the endangerment scale and the documentation need scale. The following individuals contributed to Phase I of the data collection:

EMU team:
Dr. Anthony Aristar
Dr. Helen Aristar-Dry
Anna Belew (Team Leader)
Lwin Moe
Kristen Dunkinson
Amy Brunett
Brent Woo

UHM team:
Dr. Lyle Campbell
John Van Way (Project Coordinator)
Huiying Nala Lee
Eve Okura
Dr. Kaori Ueki

The initial catalogue content was prepared by the members of these two teams, with some input from the Regional Directors. These are experts on the languages of specific regions who provide expertise to correct and expand the catalogue; their primary role begins in Phase II. The Regional Directors are:

Willem F. H. Adelaar (South America)
Greg Anderson (South Asia)
I. Wayan Arka (Indonesia)
Claire Bowern (Australia)
Matthias Brenzinger (Africa)
Lyle Campbell (the Americas)
Alice C. Harris (Caucasus)
Brian Joseph (Europe)
Juha Janhunen (Northern and Central Eurasia)
Bill Palmer (Pacific)
Keren Rice (North America)
David Solnit (East and Southeast Asia)

Acknowledgements

The research for the Endangered Language Catalogue project is funded by a grant from the National Science Foundation: Collaborative Research: Endangered Languages Catalog (ELCat), BCS-1058096 to the University of Hawai'i at Mānoa (Principal Investigator, Lyle Campbell) and BCS-1057725 to Eastern Michigan University (Principal Investigators, Helen Aristar-Dry and Anthony Aristar). The goals and basic organization of the Catalogue were established in an international workshop with some 50 specialists from around the world, supported by National Science Foundation grant Collaborative Research: Endangered Languages Information and Infrastructure Project (NSF 0924140).

This is just the beginning

It is extremely important to understand that the Catalogue is a work in progress. At the launch of this website, the Catalogue is still in Phase I, which is based only on the information available in existing publications and web resources about the individual endangered languages. Bringing in more recent and local information is critical to this project and is the focus of Phase II. The second phase will continue over the next two years. It involves an international team of regional specialists (see above) reaching out to knowledgeable individuals and organizations to fill in the missing information for languages in their areas, to check the accuracy of information, and to make needed corrections. For this phase and long into the future, the goal is to modify, update, and improve the catalogue contents constantly, as new information becomes available or as the situation for particular languages changes. If users of this website have particular knowledge or information about specific languages, we encourage them to submit comments and suggestions for improving the language entries. We are grateful for your help in improving the collective knowledge of the endangered languages.
The Language Endangerment Index and the Need for Documentation Index presented for each language are not meant to be the final word about degree of endangerment or extent of documentation. The scores for individual languages will change as more information becomes available. They are provided for practical purposes, to give a quick but rough visual indication of a language’s endangerment status and documentation needs. The level of certainty accompanying each language shows the degree of confidence in the score: a label of “uncertain” may indicate that the level is not yet known, or that the score has been computed and further evaluation is needed.

How the Catalogue handles tough questions

Some may wonder how differences of opinion have been handled. For example, some language varieties are believed to be independent languages by some scholars but are considered only dialects of a single language by others. In cases where it is not clear whether separate languages are involved or just dialects of one language, the entity in question is given its own entry as a potentially distinct language, with the different opinions noted. In cases where the evidence is clear that only dialects of a single language are involved, the entities are joined in a single entry, with differences of opinion registered. Similarly, in cases where the evidence is clear that separate independent languages are involved, though some believe they are dialects of a single language, these are given separate entries in the catalogue, with a description of the different interpretations. The thorny issue of distinguishing dialects from closely related languages is avoided simply by giving doubtful entities their own entries, with comments representing the range of opinion. As more comes to be known, it will be possible to resolve the status of many of these entities; for others, the status may just remain unclear.
This benefit-of-the-doubt approach to inclusion in the Catalogue, however, means that it is not possible just to count the total number of entries in the catalogue to get an absolute number of how many endangered languages there are in the world. Almost certainly some entities given their own entry will turn out to be only dialects that need to be joined in a single entry as representatives of a single language, reducing the total number of entries in the Catalogue. This approach results in the total number of entries being greater than the absolute number of true languages that are endangered, though hopefully not by a very large margin.

Opinions differ also over the word “extinct.” In cases where there have been no known speakers for hundreds or thousands of years, extinction is clear. However, there are cases where one source says “extinct,” “probably extinct,” “possibly extinct,” or “no known speakers,” and another credible source reports some speakers. In unclear instances, we include the language in the Catalogue, but report the conflicting designations. This means that almost certainly some languages in the Catalogue are in fact extinct, not just endangered, though definitive information is not yet available. As work on the Catalogue progresses, more accurate information on these cases will be obtained and their situations clarified. However, this means that it is not possible to take the total number of entries in the Catalogue as the absolute number of endangered languages in the world today, since some of these languages will prove not to be just endangered, but in fact extinct. There are 133 entries in the Catalogue that fall into this category.

The word “extinct” raises other questions. Some scholars consider a language extinct when there are no longer any completely fluent native speakers who learned the language as children from the previous generation. Often, however, even after there are no fully fluent native speakers, there remain speakers with some aptitude in the language, others with passive knowledge, and others who have learned or are learning their heritage tongue as a second language. Many oppose calling these languages “extinct.” Some do not consider languages with any of these sorts of speakers (even if not fluent) as extinct, and they recommend avoiding premature declaration of extinction: for those attempting to learn or revitalize their languages, it can be demoralizing to read that their language is deemed dead. In order not to discourage learning and revitalization efforts in these situations, they recommend reporting these languages as having “no known speakers,” or something equivalent. This catalogue follows the practice of avoiding the word “extinct” in such situations; when the number of native speakers is given as Ø, that indicates that the language in question falls into this category, that of a language with no known speakers.

Endangered languages: Why so important?

Language extinction is not new; languages have been dying since ancient times. However, languages are becoming extinct today at an alarming rate. Of the nearly 7,000 languages in the world today, some 3,000 (over 40%) are endangered; many others will make their way into this catalogue in the near future. Experts predict that in the worst-case scenario 90% of all languages will be extinct within 100 years; in the best-case scenario, only 50% will survive, and just 10% are considered safe during the next century (see Krauss 1992). Languages not being learned by children are not just endangered but doomed. Of the Native American languages of the US, 90% are not being passed on to a new generation; 90% of Australian Aboriginal languages and over 50% of the minority languages of Russia are in a similar situation.

There were 312 American Indian languages in use when Europeans first arrived in North America; of these, 123 (40%) are extinct, and others were lost without record. In the US, of the 280 languages known from the time of first European contact, only 151 still have speakers (54%), but all are endangered. Only 20 of these (13%) are being learned by children, and by ever fewer children each year. Most of these languages will be extinct in your lifetime if language revitalization programs are not successful. California illustrates the crisis: at the time of the Gold Rush (c. 1850), California had about 100 Native American languages; only 50 of these survive with speakers, but none is being learned by children in the normal way, and the youngest remaining speakers are well into senior-citizenhood.

The disappearance of an individual language constitutes a monumental loss of scientific information and cultural knowledge, comparable in gravity to the loss of a species, for example the Bengal tiger or the white whale. However, the extinction of whole families of languages is a tragedy comparable in magnitude to the loss of whole branches of the animal kingdom (classes, orders, families), for example to the loss of all felines or all cetaceans. Just as it would be difficult to understand the animal kingdom with major branches missing, it is impossible to understand the history and classification of human languages with the loss of entire language families. Yet this is what confronts us: already all the languages belonging to 108 of the 420 independent language families (including isolates) of the world are extinct. A staggering 26% of the linguistic diversity of the world is gone forever.

Why should you care?

We should all be concerned over the crisis of language loss for compelling reasons.

(1) Human concerns. Languages are treasure houses of information on literature, history, philosophy, and art. Their stories, ideas, and words help us make sense of our lives and the world around us. For example, the life-enriching value of literature is well understood, and it holds also for the oral literatures of the indigenous peoples of the world. They, too, have grappled with the complexities of their world and the problems of life, and the insights and discoveries represented in their literatures are of value to us all. When a language becomes extinct without documentation, taking all its oral literature, oral tradition, and oral history with it into oblivion, we are all diminished. There are also great reservoirs of historical information to be recovered from the study of languages. The classification of related languages teaches us about the history of human groups and how they are related to one another, and from the comparison of related languages and the study of language change we gain understanding of contacts and migrations, the original homelands where languages were spoken, and past cultures. All of this is irretrievably lost when a language becomes extinct without adequate documentation.
(2) Lost knowledge. Specific knowledge is often held by the smaller speech communities of the world: knowledge of medicinal plants and cures, identification of plants and animals yet unknown scientifically, new crops, etc. When the language is not learned by the next generation, the knowledge of the natural and cultural world encoded in the language typically fails to be transmitted. Loss of such knowledge could have devastating consequences for humanity. For example, the Seri (of Mexico, only 700 speakers) use xnois ‘eelgrass seed’ (Zostera marina L.) as a food. This is “the only known grain from the sea used as a human food source” and “it has considerable potential as a general food source … Its cultivation would not require fresh water, pesticides, or artificial fertilizer” (Felger and Moser 1973). It is easy to imagine a future in which natural or human-caused catastrophes compromise land-based crops, leaving human survival in jeopardy if we lose knowledge such as this.

Medicines provide similar examples. Seventy-five percent of plant-derived pharmaceuticals were discovered by examining traditional medicines, and the languages of curers often played a key role. If these languages had become extinct and knowledge of the medicinal plants and associated cures had been lost in the process, all of humanity would have been impoverished and our survival as a species left more precarious. Paul Cox worked with Epenesa Mauigoa, a taulasea (traditional healer) on Upolu, Samoa, and they described 121 herbal remedies. Their work led to knowledge of the mamala plant (Homalanthus nutans) and the anti-viral drug prostratin, used to treat yellow fever. In trials at the National Cancer Institute, it also proved effective against HIV Type 1 (Cox 1993, 2001). Loss of this endangered traditional Samoan knowledge would have been a loss for all of humanity.

(3) Scientific understanding of human language. Linguists have the goal of understanding what is possible and impossible in human languages, and through the study of human language capacity, of advancing knowledge of how the human mind works. For these goals, language extinction is a disaster. The discovery of previously unknown features and traits in undescribed languages contributes to these goals. For example, the discovery of languages with OVS [Object-Verb-Subject] and OSV [Object-Subject-Verb] basic word orders forced abandonment of previously postulated universals of language.
Since languages with these basic word orders were not previously known, it was claimed that “the dominant order is almost always one in which the subject precedes the object” (Greenberg 1966:177), like English with SVO or Japanese with SOV. However, languages such as Hixkaryana (Brazil, 350 speakers) were discovered with OVS basic word order, as in:

toto  yonoye  kamura
man   ate     jaguar
‘The jaguar ate the man.’

Discovery of languages with these previously unattested basic word orders forced this claim to be abandoned. It is all too plausible, however, given the recent loss of many languages in Brazil, where most of the OVS and OSV languages were found, that the few languages with these word orders could have become extinct before they were described, leaving us forever in error about what is possible in human language and how that reflects human cognition. The discovery of a new speech sound is to linguists like the discovery of a new species to biologists. Recent discoveries of new speech sounds in threatened languages have led to testing scientific claims about sound systems and to refining our knowledge. Linguists document endangered languages to discover information of this sort, to determine the full range of what is possible in human languages.

(4) Human rights. Language loss is often not voluntary; it frequently involves violations of human rights, with oppression or repression of speakers of minority languages. It is a matter of injustice when people are forced to give up their languages by repressive regimes or prejudiced dominant societies. Related to this is the personal loss associated with the death of one’s heritage language. Language loss is often experienced as a crisis of social identity. Our psychological, social, and physical well-being is connected with our native language; it shapes our values, self-image, identity, relationships, and ultimately success in life. For many communities, work towards language revitalization is not about language alone, but is part of a “larger effort to restore personal and societal wellness” (Pfeiffer and Holm 1994, of the Navajo Nation’s Education Division). Many indigenous voices affirm the importance of language in cultural identity:

Linguistic diversity ... constitutes one of the great treasures of humanity, an enormous storehouse of expressive power and profound understanding of the universe. The loss of hundreds of languages that have already passed into history is an intellectual catastrophe in every way comparable in magnitude to the ecological catastrophe we face today as the earth’s tropical forests are swept by fire. Each language still spoken is fundamental to the personal, social and – a key term in the discourse of indigenous peoples – spiritual identity of its speakers. (Zepeda [Tohono O’odham nation] and Hill 1991.)

But why save our languages ... we should save our languages because it is the spiritual relevance that is deeply embedded in our own languages that is important. (Richard Littlebear [Northern Cheyenne, President of Chief Dull Knife College, Lame Deer]. 1999:1.)

I canʼt stress enough the importance of retaining our tribal languages, when it comes to the core relevance or existence of our people … You could argue that when a tribe loses its language, it loses a piece of its inner-most being, a part of its soul or spirit … When it comes to native languages, the situation is simple: Use it or lose it. (Sonny Skyhawk [Sicangu Lakota, Hollywood actor] 2012.)

Language loss does not promote peace

It is often claimed that there would be more harmony if there were just one or only a few languages in the world. Some see language loss as promoting greater understanding and fostering world peace. This is wrong. Having only one language is no guarantee of “understanding.” We need look no further than the conflicts in monolingual Northern Ireland, the former Yugoslavia (where Serbians and Croatians have a common language), or the 1994 Rwanda genocide (involving Hutu and Tutsi, both speakers of Kinyarwanda), not to mention the US Civil War. National unity is not fostered by monolingualism; rather, recognition of minority languages’ rights may be a better way of bringing about peace, understanding, and ultimately national unity, as in relatively peaceful multilingual Belgium, Finland, or Switzerland.

References

Cox, Paul Alan. 1993. Saving the ethnopharmacological heritage of Samoa. Journal of Ethnopharmacology 38.181-8.
Cox, Paul Alan. 2001. Will Tribal Knowledge Survive the Millennium? Science 287.5450.44-5.
Felger, Richard and Mary Beck Moser. 1973. Eelgrass (Zostera marina L.) in the Gulf of California. Science 181.4097.355-6.
Greenberg, Joseph H. 1966. Some universals of grammar with particular reference to the order of meaningful elements. Universals of Language, ed. by Joseph H. Greenberg, 73-113. Cambridge, MA: MIT Press.
Littlebear, Richard. 1999. Some Rare and Radical Ideas for Keeping Indigenous Languages Alive. Revitalizing Indigenous Languages, ed. by Jon Reyhner, Gina Cantoni, Robert N. St. Clair, and Evangeline Parsons Yazzie, 1-5. Flagstaff, AZ: Northern Arizona University.
Skyhawk, Sonny. 2012. Why should we keep tribal languages alive? Indian Country, April 6, 2012 (http://indiancountrytodaymedianetwork.com/2012/04/06/why-should-we-keep-tribal-languages-alive-99182).
Zepeda, Ofelia and Jane H. Hill. 1991. The condition of Native American languages in the United States. Endangered Languages, ed. by R. H. Robins and E. M. Uhlenbeck, 135-55. Oxford: Berg.

Scale of Endangerment

Each factor below is rated on the same scale, from 5 (most endangered) to 0 (safe).

Level of Endangerment
5  Critically Endangered
4  Severely Endangered
3  Endangered
2  Threatened
1  Vulnerable
0  Safe (see Note 1)

Intergenerational Transmission
5  Few speakers, all elderly.
4  Many of the grandparent generation speak the language.
3  Some of child-bearing age know the language, but do not speak it to children.
2  Most adults of child-bearing age speak the language.
1  Most adults and some children are speakers.
0  All community members/members of the ethnic group speak the language.

Absolute Number of Speakers
5  1-9 speakers
4  10-99 speakers
3  100-999 speakers
2  1000-9999 speakers
1  10,000-99,999 speakers
0  >100,000 speakers

Speaker Number Trends
5  A small percentage of community members or members of the ethnic group speaks the language; the rate of language shift is very high.
4  Fewer than half of community members or members of the ethnic group speak the language; the rate of language shift is accelerated.
3  About half of community members or members of the ethnic group speak the language; language shift is frequent but not rapidly accelerating.
2  A majority of community members or members of the ethnic group speak the language; the number of speakers is gradually diminishing.
1  Most community members or members of the ethnic group are speakers; speaker numbers are diminishing, but at a slow rate.
0  Almost all community members or members of the ethnic group speak the language; speaker numbers are stable or increasing.

Domains of use of the language
5  Used only in very few domains (for example, restricted to ceremonies or to a few specific domestic activities); a majority of speakers supports language shift; no institutional support.
4  The language is being replaced even in the home; some speakers may value their language while the majority support language shift; very limited institutional support, if any.
3  Used mainly just in the home; some speakers may value their language but many are indifferent or support language shift; no literacy or education programs exist for the language; government encourages shift to the majority language; there is little outside institutional support.
2  Used in non-official domains; shares usage in social domains with other languages; most value their language but some are indifferent; education and literacy programs are rarely embraced by the community; government has no explicit policy regarding minority languages, though some outside institutions support the languages.
1  Used in all domains except official ones (i.e., government and workplace); nearly all speakers value their language and are positive about using it (prestigious); education and literacy in the language are available, but only valued by some; government and other institutional support for use in non-official domains.
0  Used in government, mass media, education and the workplace; most speakers value their language and are enthusiastic about promoting it; education and literacy in the language are valued by most community members; government and other institutions support the language for use in all domains.

Note 1: In order for a language to be considered ‘Safe,’ it must receive a 0 rating in all four categories. If a language’s composite score is 0% but the level of certainty is anything less than ‘Certain,’ it will be considered ‘At risk.’

Computing Level of Endangerment

Intergenerational Transmission is worth twice as much as each of the other factors. Because many languages will not have reliable data for some of these factors, the total score is the percentage of points received out of the total points possible for the factors actually considered:

100-81% = Critically Endangered
80-61% = Severely Endangered
60-41% = Endangered
40-21% = Threatened
20-1% = Vulnerable
0% = Safe

Level of Certainty is computed simply from the share of factors that are known and entered, expressed as the number of points possible:

25 points possible = Certain
20 points possible = Mostly Certain
15 points possible = Fairly Certain
10 points possible = Mostly Uncertain
5 points possible = Uncertain
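The computation just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the Catalogue's own software: the function name, the factor keys, and the handling of fractional percentages that fall between the printed bands (for example 80.5%, which is assigned here to the higher band) are assumptions. The ‘At risk’ rule follows Note 1 of the scale.

```python
# Illustrative sketch only; names and band-edge handling are assumptions.
WEIGHTS = {"transmission": 2, "speakers": 1, "trends": 1, "domains": 1}
MAX_RATING = 5  # each factor is rated 0..5

# (lower bound, label): a percentage strictly above the bound gets the label.
LEVELS = [(80, "Critically Endangered"), (60, "Severely Endangered"),
          (40, "Endangered"), (20, "Threatened"), (0, "Vulnerable")]

# Points possible -> Level of Certainty, as listed in the text.
CERTAINTY = {25: "Certain", 20: "Mostly Certain", 15: "Fairly Certain",
             10: "Mostly Uncertain", 5: "Uncertain"}


def endangerment(ratings):
    """ratings maps a factor name to its 0..5 rating; unknown factors are omitted.

    Returns (percent, level, certainty).
    """
    if not ratings:
        raise ValueError("at least one factor must be known")
    points = sum(WEIGHTS[f] * r for f, r in ratings.items())
    possible = sum(WEIGHTS[f] * MAX_RATING for f in ratings)
    percent = 100 * points / possible
    certainty = CERTAINTY[possible]
    if percent == 0:
        # Note 1: 'Safe' requires a 0 on all four factors, i.e. full certainty.
        level = "Safe" if certainty == "Certain" else "At risk"
    else:
        level = next(name for floor, name in LEVELS if percent > floor)
    return percent, level, certainty


# Language A from the worked examples: transmission rated 3 (6 points after doubling).
print(endangerment({"transmission": 3, "speakers": 4, "trends": 3, "domains": 3}))
```

Only the factors with known ratings are passed in, so the points possible, and with them the Level of Certainty, shrink automatically when data are missing.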

Examples of computing the Level of Endangerment and Level of Certainty:

                 Intergen. Trans. (x2)  Abs. #  Speaker Trends  Domains  Total  Status
Language A       6                      4       3               3        16     Severely Endangered
  Pts. possible  10                     5       5               5        25     Certain
Language B       8                      5       0               0        13     Critically Endangered
  Pts. possible  10                     5       0               0        15     Fairly Certain
Language C       0                      3       0               0        3      Endangered
  Pts. possible  0                      5       0               0        5      Uncertain

Need for Documentation Scale

The need for documentation is based on the adequacy of available documentation of three types: grammar, dictionary, and corpus. Each of these factors has a total number of points possible; the points received are converted to a percentage of that total, and the percentage is then weighted. Grammar has a weight of 4; Dictionary has a weight of 2; Corpus has a weight of 1.

Grammar (Factor 1 out of 3)

Size:

Description              Score
large, comprehensive     4
basic reference grammar  3
grammatical sketch       2
treats some aspects      1
nothing                  0

Criteria    Yes    No (remains unchanged)
Scientific  x 1.5  x 1
Accessible  x 1.5  x 1

Highest Score Possible: 9
Lowest Score: 0

Example: Basic reference grammar, pre-scientific, accessible: 3 x 1 x 1.5 = 4.5

Note: A grammar is considered either scientific or pre-scientific. The effect on its score depends on its size: a scientific grammar is worth 1.5 times the value of a pre-scientific grammar of the same size. The same holds for accessibility: it is a binary matter (accessible or not) rather than a range.

Dictionary (Factor 2 out of 3)

Size:

# of words     Score
> 5,000        3
2,000 - 5,000  2
< 2,000        1
Nothing        0

Bonus points:

Criteria               Present in dict.  Absent from dict.
Example sentences      + 1               0
Usage                  + 1               0
Cultural explanations  + 1               0

Accessibility (a factor applied to the total score for the dictionary):
Accessible    x 1.5
Inaccessible  x 1

Highest Score Possible: 9
Lowest Score: 0

Example: 2,750 words, no example sentences, usage present, no cultural explanations, accessible: (2 + 0 + 1 + 0) x 1.5 = 4.5

Corpus (Factor 3 out of 3)

Size of annotated audio/video texts:

Length       Score
> 120 min.   4
119-60 min.  3
59-15 min.   2
< 15 min.    1
Nothing      0

Written texts (with no corresponding audio/video): + 0.5
Unannotated audio/video: + 0.5

Highest possible score: 5
Lowest score: 0

Example: 30 min. of annotated transcription, some written texts, and some unannotated audio: 2 + 0.5 + 0.5 = 3
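The three factor scores can be reproduced with a short Python sketch. It is a minimal illustration under stated assumptions rather than the Catalogue's implementation: the function and parameter names are hypothetical, a corpus of exactly 120 minutes is placed in the top band (the scale lists only “> 120” and “119-60”), and an empty dictionary is assumed to earn no bonus points.

```python
# Illustrative sketch only; names and boundary handling are assumptions.
GRAMMAR_SIZE = {"comprehensive": 4, "reference": 3, "sketch": 2,
                "some aspects": 1, "nothing": 0}


def grammar_score(size, scientific, accessible):
    """Size score (0..4), multiplied by 1.5 for each criterion that is met."""
    score = float(GRAMMAR_SIZE[size])
    if scientific:
        score *= 1.5
    if accessible:
        score *= 1.5
    return score


def dictionary_score(words, examples, usage, cultural, accessible):
    """Size score (0..3) plus one bonus point per feature, times 1.5 if accessible."""
    if words <= 0:
        return 0.0  # assumption: no dictionary, so no bonus points either
    if words > 5000:
        base = 3
    elif words >= 2000:
        base = 2
    else:
        base = 1
    bonus = int(examples) + int(usage) + int(cultural)
    return (base + bonus) * (1.5 if accessible else 1.0)


def corpus_score(annotated_minutes, written_texts, unannotated_av):
    """Length score (0..4) plus 0.5 each for written texts and unannotated A/V."""
    if annotated_minutes >= 120:  # assumption: exactly 120 min. counts as top band
        base = 4
    elif annotated_minutes >= 60:
        base = 3
    elif annotated_minutes >= 15:
        base = 2
    elif annotated_minutes > 0:
        base = 1
    else:
        base = 0
    return base + 0.5 * int(written_texts) + 0.5 * int(unannotated_av)


# The three worked examples from the text:
print(grammar_score("reference", scientific=False, accessible=True))
print(dictionary_score(2750, examples=False, usage=True, cultural=False, accessible=True))
print(corpus_score(30, written_texts=True, unannotated_av=True))
```

The three calls at the end correspond to the grammar, dictionary, and corpus examples given in the scale.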

Example language score: Total Score Based on All Three Factors (weighted mean)

Section       Grammar    Dictionary    Annotated Corpus (text)
Score         4.5/9      4.5/9         3/5
Percentage    50%        50%           60%

Grammar weighted: 2x dictionary
Dictionary weighted: 2x annotated corpus

(4(50) + 2(50) + 1(60)) / (4 + 2 + 1) = 51% (documented): High Need for Documentation

Need for Documentation:
Urgent       0-19%
Very High    20-39%
High         40-59%
Moderate     60-79%
Low          80-99%
Very Low     100%
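The weighted mean and the level lookup can be sketched in Python as below. The names are hypothetical, and two details are assumptions rather than statements from the Catalogue: the level is chosen from the unrounded percentage, and Very Low is reserved for a score of exactly 100%.

```python
# Weights and maximum points per factor, as given in the example above.
WEIGHTS = {"grammar": 4, "dictionary": 2, "corpus": 1}
MAX_POINTS = {"grammar": 9, "dictionary": 9, "corpus": 5}


def percent_documented(scores):
    """Weighted mean of the three factor percentages (0-100)."""
    weighted = sum(
        WEIGHTS[name] * 100 * scores[name] / MAX_POINTS[name] for name in WEIGHTS
    )
    return weighted / sum(WEIGHTS.values())


def need_level(percent):
    """Map a percentage documented to a Need for Documentation level."""
    if percent >= 100:
        return "Very Low"
    # Upper bounds are exclusive (assumption: compared before rounding).
    for upper, label in [(20, "Urgent"), (40, "Very High"),
                         (60, "High"), (80, "Moderate")]:
        if percent < upper:
            return label
    return "Low"


example = {"grammar": 4.5, "dictionary": 4.5, "corpus": 3}
pct = percent_documented(example)   # (4*50 + 2*50 + 1*60) / 7
print(round(pct), need_level(pct))  # 51 High
```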

Behind the Need for Documentation ratings

The Need for Documentation Index is designed to show, at a glance, how well documented a language is, and thus how great the need for documentation is for that language. It is based on an evaluation of the published documentation in three areas: grammar, dictionaries/lexicon, and texts/corpora. All material relating to one of these areas is evaluated together to provide an overall picture. The initial evaluation is carried out by ELCat researchers, with further review sought from users as new documentation is discovered, written, or published and becomes available.

Grammars – Grammatical documentation may consist of book-length published grammars, shorter grammatical sketches, or articles on particular aspects of a language’s grammar. A large, comprehensive grammar (score of 4) covers all major aspects of the language (phonology, morphology, syntax, etc.) and leaves little or nothing to be desired by a person wishing to know more about the language. An example would be Dixon’s (1997) grammar of Yidiny. A basic reference grammar (score of 3) covers most, but not all, major aspects of the language (e.g., little phonological information, but lots on syntax). A grammatical sketch (score of 2) is much shorter and provides only preliminary information about some aspects of the language. Documentation that treats some aspects (score of 1) provides information about very limited topics in the language’s grammar, even if it explores those topics thoroughly. Finally, if the language has no available documentation dealing specifically with the grammar, it receives a score of 0. For example, at the time of writing, no documentation is available for the grammars of languages such as Kujarge and Guriaso.

If the documentation is informed by modern linguistic training and is written in a way that is useful to today’s linguists, it is rated as ‘scientific’ and the score is adjusted (multiplied by 1.5). Hence, the score for grammars such as Dixon’s grammar of Yidiny would be adjusted. If the documentation is not written in such a way that it takes advantage of common generalizations observed in linguistics, the score remains the same.

Documentation that is easy to find through university libraries or on the internet, is written in a language of wider communication, and is not written in a specific theoretical framework is rated as accessible and the score is adjusted (multiplied by 1.5). This applies to grammars such as Elkins’ (1970) grammar of Western Bukidnon Manobo, which can be found in its entirety online. If the documentation fails to meet all of these criteria, the score remains the same.

Dictionaries/Lexicon – Lexical documentation, including all available wordlists and/or dictionaries, is evaluated first by the number of entries. More than 5,000 entries receive a score of 3; 2,000 – 4,999 entries receive a score of 2; fewer than 2,000 entries receive a score of 1; and no available wordlist receives a score of 0. The quality of those entries is then evaluated by three criteria: if the entries include example sentences, the dictionary receives one extra point; if the entries include information about how the words are used in phrases, sentences, or discourse, it receives one extra point; if the entries include information that places words in their cultural contexts, it receives one extra point. These considerations are important because they can help to make a dictionary or wordlist more useful for its users. Finally, if the dictionary/wordlist is not available through university libraries or on the internet, or if it uses special symbols or terms that are not explained, or if it does not include definitions in a language of wider communication, then it is considered inaccessible and the score remains the same. Otherwise, it is considered accessible and the score is adjusted (multiplied by 1.5). Hence, a work like Blust’s (2003) dictionary of Thao would receive a full score of three for having more than 5,000 entries, as well as extra points for including example sentences, information on how an entry is used in a phrase, sentence, or discourse, and cultural information. Finally, its score is adjusted for being accessible: the information appears on Google Books.

Texts/Corpora – Textual documentation consists of recordings of connected speech in a variety of contexts, such as conversations, personal narratives, rituals, instructions, myths/folklore, etc. Our primary consideration is texts that are accessible online or through archives and that are most useful because they include recordings (audio or video) and are annotated with word-by-word or morpheme-by-morpheme glossing and a free translation. Texts meeting these criteria that total more than 120 minutes receive a score of 4; 119-60 minutes receive a score of 3; 59-15 minutes receive a score of 2; fewer than 15 minutes receive a score of 1; and no texts of this kind merit a score of 0. If the language has written texts (annotated or not) with no corresponding audio or video, it receives 0.5 points, and if the language has audio/video recordings which include no annotation, it also receives 0.5 points. For example, since Chamorro has more than 120 minutes’ worth of annotated audio corpus, it receives a score of 4. It also receives 0.5 for unannotated audio material and 0.5 for having written texts available, hence scoring a total of 5 for corpus. On the other hand, a language such as Chrau, which has no annotated corpus, no unannotated audio, and no written texts, would score a total of zero on the corpus scale.

Overall score – The total need for documentation is computed by weighting the scores in each of the categories. Grammars (x 4) are worth twice as much as dictionaries; dictionaries (x 2) are worth twice as much as texts; and texts (x 1) are weighted one. The total grammar score is divided by the points possible for a grammar, yielding a percentage which is then weighted by four. This score is then added to the dictionary points percentage (calculated the same way as the grammar score), which is weighted by two, and added to the percentage of text points. This total is divided by the sum of the weights (4 + 2 + 1 = 7) to give a percentage of total documentation that corresponds to the following levels of need:

Urgent       0-19%
Very High    20-39%
High         40-59%
Moderate     60-79%
Low          80-99%
Very Low     100%

Reasons these scores are only rough guides:

It is impossible to know whether the grammatical documentation for a language covers all, or even close to all, of the topics in the language. First, it would take someone very familiar with the language to decide that all topics were adequately covered; second, there may be interesting topics in the language that have not yet been considered. Therefore, our evaluation of a grammar as ‘comprehensive’ is based on an educated guess.

Basing the quality of lexical documentation (i.e., dictionaries) on the number of entries is a necessary first step, though this can be misleading because not all entries are equal. Some dictionaries may inflate the number of entries by including inflected forms which are predictable. A language may have a perfectly adequate dictionary based just on roots, and a dictionary like this would have a much lower number of entries.

Textual documentation is evaluated only on what is available to the researchers. In some cases, this may mean that there is significant textual documentation that we have not evaluated and that the score might be higher. However, until the texts are made available to the wider public, we cannot consider the amount of textual documentation to be satisfactory.

When considering the quality of textual documentation, it is important to consider whether a wide variety of genres exists in the available documentation. Because of practical considerations, however, we have reluctantly decided not to consider this factor. First, it is very hard to determine exactly which genres are covered in a corpus and, second, there are challenges in determining whether some texts should be considered a single genre or multiple genres. (E.g., are marriage ceremonies and funeral ceremonies considered one genre – ritual – or two?)

The overall rating of the need for documentation is inherently arbitrary because it is determined by numerical cut-offs. The difference between a rating of Low and a rating of Moderate need can be a single percentage point (80% versus 79%), which is of course unrealistic. Unfortunately, there is no easy solution to this.


References:

Blust, R. 2003. Thao dictionary. Language and Linguistics Monograph Series, No. A5. Taipei: Institute of Linguistics (Preparatory Office), Academia Sinica.
Dixon, R.M.W. 1997. A Grammar of Yidiny. Cambridge: Cambridge University Press.
Elkins, R.E. 1970. Major grammatical patterns of Western Bukidnon Manobo. SIL Publications in Linguistics and Related Fields.


Silent Languages

To understand the plight of endangered languages today, it is valuable to see just how many languages have become extinct, and to compare that number with the number of living languages and with the number of currently endangered languages. However, “extinction” is not straightforward. Two lists are presented here. One is of languages which are well and truly extinct. The second is of languages that are sometimes declared to have no remaining native speakers but whose status may not be certain. The languages of this second list underscore the need for careful and urgent attention to these cases.

The telling questions are: when is a language “extinct”, and indeed what does it mean for a language to be “extinct”? Where there have been no known speakers for hundreds or even thousands of years, extinction is clear and uncontroversial. However, there are uncertain cases in which one source says the language in question is “extinct,” “probably extinct,” “possibly extinct,” or has “no known speakers”, while another equally credible source reports it as still having some speakers or possibly some speakers. The second list includes these languages and also languages whose last fluent speaker is reported to have died in recent times, even when sources do not disagree. In some cases, after a language was declared extinct, other speakers were later found. Most of the languages reported to have recently lost their last speakers probably are truly no longer spoken; nevertheless, it is possible that in some cases unknown speakers may yet turn up. For that reason we give these languages considerable benefit of the doubt. These languages are all included in the Catalogue of Endangered Languages, together with whatever is reported in sources about their status. There are 141 entries in the Catalogue of Endangered Languages that fall into this possibly speakerless but unclear category, where sources disagree or where the languages have only recently been said to no longer have speakers – 141 is no small set.

But there is much more to the extinction story. When a language qualifies as extinct is not a precise matter. For some scholars, a language is considered extinct when there are no longer any completely fluent native speakers who learned the language as children. For others, a language that may lack fluent native speakers but still has semi-speakers or is being learned as a second language is not considered extinct. Moreover, many prefer to avoid calling any language extinct where the people whose heritage language it is may be interested in attempting to learn or revitalize it, so as not to discourage such efforts. To encourage efforts toward recovery of a language that lacks fully fluent native speakers, or for that matter lacks speakers of any sort, some prefer to speak of such languages as “silent”, “sleeping”, “dormant”, or just “unspoken”.

The second list, of languages for which there is suspicion or even good reason to believe, but no certainty, that there are no longer native speakers, serves to call attention to those languages that are perhaps no longer spoken but where it is nevertheless possible that speakers remain. Such extremely precarious languages merit high priority. Many will be and should be the objects of caring concern by those whose heritage languages these cases represent.

These lists demonstrate starkly the problem of language endangerment by showing just how many of the world’s languages have already become extinct or are “silent” (“sleeping”), in contrast to the great many languages that are currently endangered, listed in this catalogue. Up to now, 635 known languages appear on these two lists, just under 10% of the languages known ever to have existed. Already all the languages of more than 100 language families (including language isolates) are extinct from among the 420 independent language families (including isolates) in the world – 25% of the linguistic diversity of the world has already disappeared. Worse, this number will change radically and rapidly: the Catalogue of Endangered Languages has just over 3,000 entries from among the approximately 7,000 living languages in the world – by this count, 43% of living languages are endangered! The number of extinct languages will soon swell dramatically. Clearly, as these numbers show, far more languages are on a course towards extinction now than in the past.
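The figures quoted above can be checked with a few lines of arithmetic, sketched here in Python. The variable names are illustrative only, and the rounded inputs (3,000 entries, 7,000 living languages) are taken at face value even though the text gives them as approximations. Note that 25% of 420 families is 105, which is consistent with "more than 100".

```python
uncertain_status = 141   # list 1: uncertain but precarious status
extinct = 494            # list 2: extinct languages
on_both_lists = uncertain_status + extinct
print(on_both_lists)     # 635

catalogue_entries = 3000   # "just over 3,000" (rounded down)
living_languages = 7000    # "approximately 7,000"
share_endangered = 100 * catalogue_entries / living_languages
print(round(share_endangered))   # 43

families_total = 420
families_at_25_percent = 0.25 * families_total
print(families_at_25_percent)    # 105.0, i.e. "more than 100" families
```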


1. Languages of Uncertain but Precarious Status (sometimes reported as having no speakers) (141 languages):

Agwamin wmi; Jumaytepeque Xinka xin; Tora trz
Akkala Saami sia; Kansa ksk; Toromona tno
Alngith aid; Kushyana (Kaxuiana) kbb; Tukumanfed tkf
Amanaye ama; Kerek krk; Tuxa tud
Amonap mzo; Klallam clm; Umbuygamu umg
Arabana-Wangkangurru ard; Korana kqz; Teteté teb
Arapaso arp; Kukatj ggd; Teushen 0qk
Arara (Arara do Beiradao) axg; Kuku-Mangk xmg; Tolowa tol
Ariba aea; Kuku-Mu'inh xmp; Umpithamu umd
Atampaya amz; Kuku-Ugbanh ugb; Pauserna psm
Atsugewi atw; Kuku-Thaypan typ; Phalok lwl-pha
Ayapathu ayd; Laimon coj; Pitta-Pitta pit
Bare bae; Lake Miwok lmw; Plains Miwok pmw
Baygo byg; Lamalama lby; Quapaw qua
Bung bgd; Lapachu qa6; Quileute qui
Canichana caz; Leco lec; Santiam kyl
Catawba chc; Lipan apl; Saraveca sar
Cayuvava cyb; Lower Chinook chh; Sawknah swn
Chiapanec cip; Lower Chehalis cea; Serrano ser
Chilanga (Salvadoran Lenca) len?; Macaguaje mcl; Southeastern Pomo pom
Chimariko cid; Maidu nmu; Southern Sierra Miwok skd
Chitimacha ctm; Makolkol zmh; Tapeba tbb
Chiwere iow; Malyangapa 0h1; Tequiraca ash
Coast Miwok csi; Mandahuaca mht; Umutina umo
Copper Island Aleut mud; Martha’s Vineyard Sign Language mre; Unami unm
Cupeño cup; Mbariman-Gudhinma zmv; Uradhi urf
Deti shg-det; Miami-Illinois mia; Uru ure
Dirari dit; Miriti mvv; Vilela vil
Djangun djf; Miwa vmi; Wanggamala wnm
Duungidjawu wkw-duu; Muluridyi vmu; Wangganguru wgg
Eastern Pomo peb; Mayi-Kutuna xmy; Wappo wao
Eel River Athabaskan qt8; Mbabaram vmb; Wik-Epa wie
Eyak eya; Ngamini nmv; Wik-Keyangan wif
Gamberre gma; Ngarinyin wil; Wirafed wir
Ganggalida gcd; Ngawun nxn; Wirangu wiw
Garlali nbx; Ngumbarl 08s; Wiyot wiy
Guazacapan Xinka xin; Nimbari nmr; Wotapuri-Katargalai wsv
Gununa-Kune pue; Nisenan nsz; Xakriaba xkr
Gurr-goni gge; Njerep njr; Xiriâna xir
Hanis csz; Nungali nug; Yahuna ynu
Honduran Lenca len?; Nyaki Nyaki nys; Yameo yme
Gros Ventre ats; Nyang'i nyp; Yangman jng
Hpon hpo; N|u ngh; Yaquina aes
Ilgar ilg; Ona ona; Yavitero yvt
Itene ite; Opata-Eudeve opt; Yir-Yoront yiy
Jawi djw; Otoe iow-oto; Zire sih
Jiwarli djl; Paraujano pbg; Ziriya zir


2. Extinct Languages (494 languages)

//Xegwi xeg; Aruá aru; Chipiajes cbe
/Xam xam; Assan xss; Chiquimulilla Xinka xin-chi
Abipon axb; Atakapa aqp; Cholón cht
Abishira ash; Atsahuaca atc; Chorasmian xco
Abnaki, Eastern aaq; Aushiri avs; Chorotega cjr
Acroá acs; Auyokawa auo; Chumash chs
Adai xad; Avar, Old oav; Chuvantsy xcv
Aequian xae; Avestan ave; Coahuilteco xcw
Aghu Tharnggalu ggr; Awabakal awk; Cochimi coj
Aghwan xag; Ayta, Tayabas ayy; Comecrudo xcm
Agta, Dicamay duy; Bactrian xbc; Coptic cop
Aguano aga; Baga Kaloum bqf; Coquille coq
Ahom aho; Baga Sobané bsv; Cornish cnx
Ajawa ajw; Banggarla bjb; Cotoname xcn
Aka-Bea abj; Baniva bvv; Coxima kox
Aka-Bo akm; Barbacoas bpb; Coyaima coy
Aka-Cari aci; Barbareño boi; Creole Dutch, Skepi skw
Aka-Jeru akj; Baré bae; Cruzeño crz
Aka-Kede akx; Basa-Gumna bsl; Cumanagoto cuo
Aka-Kol aky; Basay byq; Cumbric xcb
Aka-Kora ack; Bayali bjy; Cumeral cum
Akar-Bale acl; Baygo byg; Curonian xcu
Akkadian akk; Beothuk bue; Dacian xdc
Alanic xln; Berti byt; Dagoman dgn
Algonquian, Carolina crr; Biloxi bll; Dalmatian dlm
Alsea aes; Bina bmn; Deir Alla xdr
Ammonite qgg; Biri bzr; Delaware, Pidgin dep
Andaqui ana; Birked brk; Dhurga dhu
Andoa anb; Bolgarian xbo; Dieri dif
Anglo-Norman xno; Cacaopera ccr; Dororo drr
Anserma ans; Cagua cbh; Duli duz
Apalachee xap; Camunic xcc; Dura drq
Aquitanian xaq; Caramanta crf; Eblan xeb
Arabic, Andalusian xaa; Carian xcr; Edomite xdm
Aramaic, Jewish Babylonian tmr; Carib, Island crb; Egyptian egy
Aramaic, Jewish Palestinian jpa; Catawba chc; Elamite elx
Aramaic, Official arc; Cauca cca; Elymian xly
Aramaic, Samaritan sam; Cayubaba cyb; Emok emo
Aranama-Tamique xrt; Cayuse xcy; Epi-Olmec xep
Arára, Mato Grosso axg; Celtiberian xce; Esselen esq
Aribwatsa laz; Chagatai chg; Esuma esm
Arikem ait; Chané caj; Etchemin etc
Arin xrn; Chibcha chb; Eteocretan ecr
Arma aoh; Chicomuceltec cob; Eteocypriot ecy
Armazic xrm; Chimakum cmk; Etruscan ett


Faliscan xfa; Kalapuya, Southern sxk; Langobardic lng
Frankish frk; Kalarko kba; Laurentian lre
Gabrielino-Fernandeño xgf; Kalkutung ktg; Lemnian xle
Gafat gft; Kamakan vkm; Leningitij lnj
Galatian xga; Kamas xas; Lepontic xlp
Galice gce; Kamba xba; Liburnian xli
Galindan xgl; Kambiwá xbw; Ligurian xlg
Gamo-Ningi bte; Kaniet ktk; Linear A lab
Gangulu gnl; Kanoé kxo; Lingua Franca pml
Garza xgr; Kapinawá xpn; Loup A xlo
Gaulish, Cisalpine xcg; Kara; Loup B xlb
Gaulish, Transalpine xtg; Kara zra; Lumbee lmz
Geez gez; Karakhanid xqa; Lusitanian xls
Gey guv; Karami xar; Luwian, Cuneiform xlu
Ghomara gho; Karankawa zkk; Luwian, Hieroglyphic hlu
Gothic got; Karipúna kgm; Lycian xlc
Greek, Cappadocian cpg; Karirí-Xocó kzw; Lydian xld
Guana gqn; Kariyarra vka; Macedonian, Ancient xmk
Guanche gnc; Karkin krb; Maek hmk
Gugu Warra wrw; Kaskean zsk; Mahican mjy
Gule gly; Katabaga ktq; Maidu, Valley vmv
Guliguli gli; Kaurna zku; Malgana vml
Gureng Gureng gnr; Kawi kaw; Mamulique emm
Guyani gvy; Kazukuru kzk; Manangkari znk
Hadrami xhd; Kepkiriwát kpn; Mandaic, Classical myz
Harami xha; Ketangalan kae; Mangue mom
Hattic xht; Khazar zkz; Manipuri, Old omp
Hermit llf; Khorezmian zkh; Manx glv
Hernican xhr; Kitan zkt; Maritsauá msp
Hibito hib; Kitsai kii; Marrucinian umc
Hittite hit; Knaanic czk; Marsian ims
Homa hom; Koguryo zkg; Matagalpa mtn
Horo hor; Koibal zkb; Mator mtm
Hunnic xhc; Kott zko; Mator-Taygi-Karagas ymt
Hurrian xhu; Kpati koc; Mattole mvb
Iberian xib; Krevinian zkv; Mawa wma
Ifo iff; Kubi kof; Maykulan mnt
Illyrian xil; Kulon-Pazeh uun; Mbara mvl
Ineseño inz; Kuman qwm; Median xme
Iowa-Oto iow; Kungarakany ggk; Meroitic xmr
Jorá jor; Kunza kuz; Mesmes mys
Jurchen juc; Kusunda kgg; Messapic cms
Kaimbé xai; Kw'adza wka; Michigamea cmm
Kakauhua kbf; Kwadi kwz; Miluk iml
Kalapuya, Northern nrt; Kwalhioqua-Tlatskanai qwt; Milyan imy


Minaean inm; Oirat, Written xwo; Puquina puq
Minoan omn; Oko-Juwoi okj; Puri prr
Mittu mwu; Omejes ome; Purisimeño puy
Miwok, Bay mkq; Omok omk; Puyo xpy
Miwok, Coast csi; Omurano omu; Puyo-Paekche xpp
Mlahsö lhs; Oscan osc; Pyu pyx
Moabite obm; Oti oti; Qatabanian xqt
Mobilian mod; Otuke otu; Raetic xrr
Mochica omc; Ouma oum; Rema bow
Mohegan-Montauk-Narragansett mof; Paekche pkc; Remo rem
Moksela vms; Paelignian pgn; Runa rna
Molale mbe; Pahlavi pal; Sabaean xsa
Mozarabic mxi; Paisaci Prakrit qpp; Sabine sbv
Mulaha mfw; Palaic plq; Sakan xsk
Muskum mje; Pali pli; Salinan sln
Mysian yms; Palumata pmc; Sam'alian qey
Nadruvian ndf; Pame, Southern pmz; Samaritan smp
Nagumi ngv; Pamlico pmk; Sami, Kemi sjk
Nanticoke nnt; Pankararé pax; Saraveca sar
Narrinyeri nay; Pankararú paz; Scythian xsc
Natagaimas nts; Panobo pno; Selian sxl
Natchez ncz; Papora ppu; Sened sds
Nawathinehena nwa; Paranawát paf; Senhaja De Srair sjs
Negerhollands dcr; Parthian xpr; Sensi sni
Neo-Aramaic, Barzani Jewish bjf; Pataxó Hã-Ha-Hãe pth; Seroa kqu
Newar, Middle nwx; Pecheneg xpc; Seru szd
Ngandi nid; Pentlatch ptw; Shinabo snh
Nganyaywana nyx; Phoenician phn; Shuadit sdt
Ngbee jgb; Phrygian xpg; Sicanian sxc
Niuatoputapu nkp; Picene, North nrp; Sicel scx
Nocamán nom; Picene, South spx; Sidetic xsd
Nooksack nok; Pictish xpi; Singa sgm
Noric nrc; Pidgin, Timor tvy; Siraya fos
Norn nrn; Pijao pij; Siuslaw sis
North Arabian, Ancient xna; Pirlatapa bxi; Skalvian svx
Nottoway-Meherrin nwy; Pomo, Northern pej; Sogdian sog
Nubian, Old onw; Ponares pod; Solano xso
Nukuini nuc; Potiguára pog; Sorothaptic sxo
Numidian nxm; Powhatan pim; Subtiaba sut
Nyang'i nyp; Prākrit, Ardhamāgadhī pka; Sudovian xsv
Obispeño obi; Prākrit, Māhārāṣṭri pmh; Sumerian sux
Ofayé opy; Prākrit, Sauraseni psu; Susquehannock sqn
Ofo ofo; Prussian prg; Syriac, Classical syc
Ohlone, Northern cst; Pumpokol xpm; Taino tnq
Ohlone, Southern css; Punic xpu; Takelma tkm


Tama ten; Woccon xwc; Waamwang wmn
Tamanaku tmz; Worimi kda; Wailaki wlk
Tanema tnx; Wulguru qgu; Wakoná waf
Tangut txg; Wuliwuli wlu; Yupiltepeque Xinka xin-yul
Tapeba tbb; Wurrugu wur; Zarphatic zrp
Tartessian txr; Wyandot wya; Zemgalian xzm
Tasmanian xtz; Xukurú xoo; Zhang-Zhung xzh
Tay Boi tas; Yabaâna ybn
Tepecano tep; Yalarnnga ylr
Torona tqr; Yana ynn
Totoro ttk; Yassic ysc
Tripuri, Early xtr; Yeni yei
Truká tka; Yoba yob
Tsetsaut txc; Yug yug
Tubar tbu; Yugambal yub
Tumshuqese xtq; Yupik, Sirenik ysr
Tunica tun; Ternateño tmg
Tupí tpw; Teshenawa twc
Tupinambá tpn; Thracian txh
Tupinikin tpk; Thurawal tbh
Turiwára twt; Tillamook til
Turung try; Timucua tjm
Tutelo tta; Tingui-Boto tgv
Tuxá tud; Tjurruru tju
Tuxináwa tux; Togoyo tgy
Twana twa; Tokharian A xto
Uamué uam; Tokharian B txb
Ubykh uby; Tomedes toe
Ugaritic uga; Tonjon tjn
Umbrian xum; Tonkawa tqw
Piro pie; Umotína umo
Piscataway psy; Umpqua, Upper xup
Pisidian xps; Urartian xur
Pochutec xpo; Uruava urv
Polabian pox; Urumi uru
Pomo, Eastern peb; Vandalic xvn
Wampanoag wam; Vano vnk
Wandarang wnd; Venetic xve
Wariyangga wri; Ventureño veo
Wasu wsu; Vestinian xvs
Weyto woy; Volscian xvo