Language Documentation in the 21st Century
1
Language Documentation in the 21st Century
Prof Peter K. Austin
Endangered Languages Academic Programme
Department of Linguistics, SOAS
Department of Linguistics, University of Hong Kong
13th September 2013
2
© 2013 Peter K. Austin
Creative Commons licence:
Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)
3
Outline
• Language documentation in 1995 and today
• Establishing principles for the field
• Developments since 2005
• Some current challenges
• Conclusions
4
Language documentation
• “concerned with the methods, tools, and theoretical underpinnings for compiling a representative and lasting multipurpose record of a natural language or one of its varieties” (Himmelmann 1998)
• has developed over the past 20 years, in large part in response to the urgent need to make an enduring record of the world’s many endangered languages and to support speakers of these languages in their desire to maintain them; it has also been fuelled by developments in information and communication technologies
• essentially concerned with the roles of language speakers and their rights and needs
5
Publications: books and journals
• Gippert et al. 2006 Essentials of Language Documentation. Mouton
• Tsunoda 2006 Language endangerment and language revitalization: an introduction
• Language Documentation and Description – 11 issues (2,000+ copies sold), 1 in prep
• Language Documentation and Conservation – 6 issues (online only)
• Cambridge Handbook of Endangered Languages 2011
• Routledge Essential Readings 2011
• Oxford Bibliography Online 2012
6
DoBeS projects
7
ELAR deposits
8
Main features (Himmelmann 2006:15)
• Primary data – collection and analysis of an array of primary language data to be made available for a wide range of users;
• Accountability – access to primary data and representations of it makes evaluation of linguistic analyses possible and expected;
• Long-term storage and preservation of primary data – includes a focus on archiving in order to ensure that documentary materials are made available to potential users now and into the distant future;
9
Main features (cont.)
• Interdisciplinary teams – documentation requires input and expertise from a range of disciplines and is not restricted to mainstream (“core”) linguistics alone
• Cooperation with and direct involvement of the speech community – active and collaborative work with community members both as producers of language materials and as co-researchers
• The outcome is an annotated and translated corpus of archived, representative materials on a language
10
Stuart McGill Cicipu corpus
11
Cicipu Toolbox
12
Critique: Dobrin, Austin & Nathan 2007
• “subtle and pervasive kinds of commoditisation (reduction of languages to common exchange values) abound, particularly in competitive and programmatic contexts such as grant-seeking and standard-setting where languages are necessarily compared and ranked”
• archivism: quantifiable properties such as recording hours, data volume, and file parameters, and technical desiderata like ‘archival quality’ and ‘portability’ have become reference points in assessing the aims and outcomes of language documentation – these are not measures of quality
[Figure: “documentary dog” and “archiving tail”]
13
Skills issues
• video madness: video recordings are made without reference to hypotheses, goals, or methodology, simply because the technology is available, portable and relatively inexpensive
• audio skills are lacking: documentary linguists show little or no knowledge of recording arts and of microphone types, properties and placement (microphone choice and handling is the single greatest determiner of recording quality)
• corpus taming: documentary linguists show little ability in corpus and metadata management, from file naming to bundle organisation
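As a small illustration of what “corpus taming” can mean in practice, here is a minimal sketch of checking file names against a naming convention and grouping files into session bundles. The convention itself (`<lang>_<speaker>_<YYYYMMDD>_<session>.<ext>`) and the function names are hypothetical, chosen only for the example; they are not the convention of any particular archive.

```python
import re
from collections import defaultdict

# Hypothetical convention, assumed for illustration only:
#   <lang>_<speaker>_<YYYYMMDD>_<session>.<ext>, e.g. "abc_spk1_20130913_01.wav"
# (the date is only checked for being 8 digits, not for being a real date)
NAME_PATTERN = re.compile(
    r"^(?P<lang>[a-z]{3})_(?P<speaker>[a-z0-9]+)_(?P<date>\d{8})_(?P<session>\d{2})"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def check_name(filename):
    """Return True if the filename follows the hypothetical convention."""
    return NAME_PATTERN.match(filename) is not None


def group_bundles(filenames):
    """Group conforming files into bundles that share a stem (name minus extension).

    Non-conforming files are returned separately so they can be renamed
    before deposit rather than silently dropped.
    """
    bundles = defaultdict(list)
    rejected = []
    for name in filenames:
        if not check_name(name):
            rejected.append(name)
            continue
        stem = name.rsplit(".", 1)[0]
        bundles[stem].append(name)
    return dict(bundles), rejected


files = [
    "abc_spk1_20130913_01.wav",
    "abc_spk1_20130913_01.eaf",
    "abc_spk1_20130913_01.mp4",
    "Recording 3 FINAL.wav",
]
bundles, rejected = group_bundles(files)
print(bundles)
print(rejected)  # ['Recording 3 FINAL.wav']
```

The point is less the regular expression than the habit: deciding on a convention at the start of a project makes bundle organisation mechanical rather than an afterthought.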
14
Myopia (Austin 2012)
• ILG blindness: many documenters believe that interlinear glossing is the “gold standard” of annotation, but it is very time-consuming and illegible to non-linguists. Overview annotation may be preferable as a primary goal: a “roadmap” or index of a recording, giving approximately time-aligned information about what is in the recording, who is participating, and other interesting phenomena
• Toolbox and ELAN as “Nietzsche’s typewriter”
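An overview annotation of the kind described above can be as simple as a list of approximately timed entries. The sketch below assumes a minimal structure of our own devising (start, end, summary, participants, phenomena); it is not the format of Toolbox, ELAN or any archive, and the speakers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OverviewEntry:
    start_s: float          # approximate start of the stretch, in seconds
    end_s: float            # approximate end of the stretch, in seconds
    summary: str            # what is going on in this stretch
    participants: List[str] = field(default_factory=list)
    phenomena: List[str] = field(default_factory=list)  # "other interesting phenomena"


def entries_at(roadmap, t):
    """Return the entries whose approximate time span covers time t (seconds)."""
    return [e for e in roadmap if e.start_s <= t <= e.end_s]


def all_participants(roadmap):
    """Everyone who takes part anywhere in the recording, sorted by name."""
    return sorted({p for e in roadmap for p in e.participants})


# A two-entry roadmap for one hypothetical recording.
roadmap = [
    OverviewEntry(0, 75, "greetings; explanation of the session", ["Speaker A", "Researcher"]),
    OverviewEntry(75, 420, "traditional narrative", ["Speaker A", "Speaker B"],
                  ["song embedded in the story"]),
]
for e in entries_at(roadmap, 90):
    print(e.summary)              # traditional narrative
print(all_participants(roadmap))  # ['Researcher', 'Speaker A', 'Speaker B']
```

Such a roadmap is quick to produce and readable by non-linguists, and finer-grained annotation can later be attached to the stretches that turn out to matter.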
15
• with no guiding framework for assessing quality, progress, and value in their work, documentary linguists fall back on established patterns, referring to quantifiable indices of language vitality or technical standards for the density of acoustic information even when these are not rationalised by the particular language or research situation
• diversity (goals, contexts, people) – move away from “Noah’s Ark” projects to more specialised documentation, eg. ELDP 2012 grant list
• we need more and better attention to goals, methods, skills, outcomes and values of language documentation
16
A 21st century model
Woodbury 2011 enlarges the concept of language documentation:
“creation, annotation, preservation and dissemination of transparent records of a language.”
and identifies several gaps in a Himmelmann-type approach:
“While simple in concept, it is complex and multifaceted in practice because:
• its object, language, encompasses conscious and unconscious knowledge, ideation and cognitive ability, as well as overt social behaviour;
• records of these things must draw on concepts and techniques from linguistics, ethnography, psychology, computer science, recording arts and more;
17
A 21st century model (cont.)
• the creation, annotation, preservation and dissemination of such records pose new challenges in all these fields, as well as information and archival sciences; and
• above all, humans experience their own and other people’s languages viscerally and have differing stakes, purposes, goals and aspirations for language records and language documentation”
Woodbury emphasises:
• Diversity of goals, purposes and outcomes
• Need for a theory of the documentary corpus
• Need for accounts of individual project designs
18
Need for meta-documentation (Austin 2013)
• meta-documentation concerns the theory and practices of meta-data, data about the data being collected and analysed
• metadata:
  • is needed for identification, management and retrieval of the data
  • provides the context and understanding of that data
  • carries those understandings into the future, and to others (and hence is important for archiving and preservation)
  • reflects the knowledge and practices of data providers
19
Metadata
• defines and constrains audiences and usages for the data
• all value-adding to recordings of events involves the creation of metadata – all annotations (transcriptions, translations, glosses, POS tagging, etc.) are metadata (Nathan and Austin 2004)
20
Metadata gaps
• recommendations for creating metadata for language documentation have been primarily influenced by library concepts (eg. Dublin Core), and key metadata notions have been interoperability, standardisation, discovery, and access (OLAC, EMELD, Farrar & Langendoen 2003).
• the goals of language documentation mean that this approach is not powerful enough: we need a theory of metadata, which has been largely lacking until now
• Nathan (2010): “meta-documentation is the documentation of your data itself, and the conditions (linguistic, social, physical, technical, historical, biographical) under which it was produced. Such meta-documentation should be as rich and appropriate as the documentary materials themselves”
21
Missing meta-documentation categories
• identity of stakeholders involved and their roles in the project
• attitudes and ideologies of language consultants, both towards their languages and towards the documenter and documentation project
• relationships with consultants and community
• goals and methodology of researcher, including research methods and tools (see Lüpke 2010), corpus theorisation (Woodbury 2011), theoretical assumptions embedded in annotation (abbreviations, glosses), potential for revitalisation
22
• biography of the project, including background knowledge and experience of the researcher and main consultants (eg. how much fieldwork the researcher had done at the beginning of the project and under what conditions, what training the researcher and consultants had received)
• for funded projects, includes original grant application and any amendments, reports to the funder, email communications with the funder and/or any discussions with an archive (eg. reviews of sample data)
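One way to keep the missing categories listed above from staying missing is to give them an explicit slot alongside the usual catalogue metadata. The sketch below is a hypothetical record whose fields mirror the categories on these slides; the field names and the `missing_categories` helper are illustrative assumptions, not an archive schema.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class MetaDocumentation:
    stakeholders: Dict[str, str] = field(default_factory=dict)  # person -> role in the project
    consultant_attitudes: List[str] = field(default_factory=list)  # towards language, documenter, project
    relationships: List[str] = field(default_factory=list)      # with consultants and community
    goals_and_methods: List[str] = field(default_factory=list)  # methods, tools, corpus theory, glossing assumptions
    project_biography: List[str] = field(default_factory=list)  # prior fieldwork, training, conditions
    funding_records: List[str] = field(default_factory=list)    # application, amendments, reports, correspondence


def missing_categories(md):
    """Names of meta-documentation categories that are still empty."""
    return [f.name for f in fields(md) if not getattr(md, f.name)]


md = MetaDocumentation(
    stakeholders={"Speaker A": "main consultant"},  # hypothetical person
    goals_and_methods=["overview annotation first, interlinear glossing for selected texts"],
)
print(missing_categories(md))
# ['consultant_attitudes', 'relationships', 'project_biography', 'funding_records']
```

A check like this cannot judge whether meta-documentation is “as rich and appropriate as the documentary materials themselves”, but it does make gaps visible while the project is still running.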
23
Archiving in the 21st century
• Two major approaches have emerged
• ‘big data’ archiving
• archiving inspired by social media models
24
Big data archiving
• e.g. MPI-Nijmegen; CLARIN, DARIAH, VLO
• “integrated digital research environments that allow researchers to combine resources and tools from various sources in a seamless way” (Trilsbeek & Koenig 2013)
• Component MetaData Infrastructure (CMDI): it is mandatory to link each field to a concept definition in a central data category registry called ISOcat
• goal of data mining and cross-corpus extraction, use of large-scale computational linguistics tools
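The requirement that each metadata field point to a registered concept can be pictured as a simple validation step. This is a hedged sketch only: the `MetadataField` class, the `unlinked_fields` helper and the `registry:...` identifiers are placeholders we made up, not real CMDI components or ISOcat entries.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MetadataField:
    value: str
    concept_link: Optional[str] = None  # identifier of a concept in a data category registry


def unlinked_fields(component: Dict[str, MetadataField]) -> List[str]:
    """Names of fields with no concept link (the slide describes this link as mandatory)."""
    return [name for name, f in component.items() if not f.concept_link]


# Hypothetical component; the concept identifiers are placeholders, not real registry entries.
session = {
    "languageName": MetadataField("Cicipu", "registry:concept-language-name"),
    "recordingDate": MetadataField("2013-09-13", "registry:concept-date"),
    "genre": MetadataField("narrative"),
}
print(unlinked_fields(session))  # ['genre']
```

Linking every field to a shared concept is what lets tools compare metadata across corpora; the cost, from a documentation point of view, is that anything without a registered concept tends to drop out.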
25
Archive 2.0: social media models
• traditionally archiving focussed heavily on preservation
• however documentation often deals with highly sensitive topics (sacred stories, gossip)
• needs powerful but flexible access management
• transparency – ease of understanding
• use positively – social networking model
• access through relationships
• relationships and sharing produce new opportunities
• ELAR URCS system
26
ELAR URCS system
• e.g. Trevor Johnston’s Auslan deposit
• Logged-in user displays
27
OAIS model
OAIS archives define three types of ‘package’: ingestion, archive and dissemination.
[Diagram: Producers → Ingestion → Archive → Dissemination → Designated communities]
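In the OAIS reference model (ISO 14721) the three packages are called the Submission, Archival and Dissemination Information Packages (SIP, AIP, DIP). The sketch below is a deliberately tiny illustration of that flow, assuming that ingestion records a SHA-256 checksum per file as fixity information and that dissemination re-checks it; the class and function names are our own, not part of the standard or of any archive’s software.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class SubmissionPackage:      # handed over by producers at ingestion
    files: Dict[str, bytes]
    metadata: Dict[str, str]


@dataclass
class ArchivalPackage:        # what the archive stores and preserves
    files: Dict[str, bytes]
    metadata: Dict[str, str]
    fixity: Dict[str, str]    # file name -> SHA-256 checksum recorded at ingestion


@dataclass
class DisseminationPackage:   # what a member of a designated community receives
    files: Dict[str, bytes]
    metadata: Dict[str, str]


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def ingest(sip: SubmissionPackage) -> ArchivalPackage:
    """Copy the submission into archival form and record a checksum per file."""
    fixity = {name: sha256(data) for name, data in sip.files.items()}
    return ArchivalPackage(dict(sip.files), dict(sip.metadata), fixity)


def verify(aip: ArchivalPackage) -> bool:
    """True if every stored file still matches its recorded checksum."""
    return all(sha256(aip.files[name]) == digest for name, digest in aip.fixity.items())


def disseminate(aip: ArchivalPackage, wanted: Iterable[str]) -> DisseminationPackage:
    """Build a package containing only the requested files, after a fixity check."""
    if not verify(aip):
        raise ValueError("fixity check failed; package may be corrupted")
    return DisseminationPackage({name: aip.files[name] for name in wanted}, dict(aip.metadata))


sip = SubmissionPackage(
    files={"session01.wav": b"RIFF...", "session01.eaf": b"<ANNOTATION_DOCUMENT/>"},
    metadata={"title": "hypothetical session", "depositor": "hypothetical depositor"},
)
aip = ingest(sip)
dip = disseminate(aip, ["session01.eaf"])
print(sorted(dip.files))  # ['session01.eaf']
```

Note how the model assumes a finished submission arriving at one moment; this is the assumption that the progressive archiving approach below questions.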
28
ELAR archive 2.0 model
29
Rethinking the archive model
• progressive archiving – a challenge to the whole approach of documentary linguistics
• establish a user account at the beginning of a project – users add and manage/update resources over time
• user accounts show access and usage/downloads analytics – cf. Academia.edu
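Progressive archiving, as described above, implies a deposit that grows and changes over the life of a project and that reports back how it is being used. Here is a minimal sketch under our own assumptions (versions kept as a list per resource, downloads counted per resource); `ProgressiveDeposit` and its methods are hypothetical names, not ELAR’s system.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProgressiveDeposit:
    depositor: str
    # resource name -> successive versions, oldest first
    versions: Dict[str, List[str]] = field(default_factory=dict)
    # resource name -> number of downloads, for usage analytics
    downloads: Counter = field(default_factory=Counter)

    def add_or_update(self, resource: str, content: str) -> int:
        """Add a new resource or a new version of one; return its version number (1-based)."""
        self.versions.setdefault(resource, []).append(content)
        return len(self.versions[resource])

    def download(self, resource: str) -> str:
        """Return the latest version of a resource and count the download."""
        self.downloads[resource] += 1
        return self.versions[resource][-1]


deposit = ProgressiveDeposit("hypothetical depositor")
deposit.add_or_update("texts.eaf", "first pass: overview annotation")  # early in the project
deposit.add_or_update("texts.eaf", "revised: full transcription")      # months later
print(deposit.download("texts.eaf"))    # revised: full transcription
print(deposit.downloads.most_common())  # [('texts.eaf', 1)]
```

Keeping earlier versions rather than overwriting them means the history of the documentation is itself preserved, which also feeds the project biography part of meta-documentation.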
30
[Diagram: “classical” archiving (collect resources/data, then archive them; or collect, process and publish, then archive, “and hope that death does not intervene”) contrasted with progressive archiving]
31
Rethinking archive participation
• users: e.g. add bookmarks, negotiate access
• depositors: e.g. update and edit content, negotiate access, monitor usage
• collaborations: exchange and share information, establish groups, community curation
32
User xx has just applied for access to restricted material in the deposit johnston2012auslan. The following message was attached to the application:
"Hello [depositor], xx here. I'm interested in having a look at some of your video deposit, including annotation files. I am working on a project documenting Central Australian Indigenous sign with yy (see http://iltyemiltyem.tumblr.com/). If ok, I'd like to see how you do the annotation - we have worked out a template and annotation protocol, but this needs a lot of refinement. Regards, MC"
Application: from depositor’s friend, re methods
33
This email is to inform you that user xx's application for access to restricted material in the deposit kunbarlang-389 has just been approved. The depositor included the following note to the user:
"Hi xx, I've approved your access to this collection, but you should know that there is an update in the material I've just deposited, with much more information on both music and texts. I'd be happy to give you access to that when it is processed.
Next time I come to London (October or November this year) I'd be happy to meet up if you would like to discuss."
Response: further info and offer to meet
34
User xx has just applied for access to restricted material in the deposit cappadocian-375. The following message was attached to the application:
"Dear [depositor], I work as a research assistant in Nevsehir University in Cappadocia, Turkey. As you know, Cappadocian language has some relics in this region despite speakers of Cappadocian do not live anymore. In my university, there are few research on this subject with collaboration of Greek friends and local societies … I would like to access to your material … By the way, i would like to interview with you about Cappadocian language for our international journal of art and language. I hope you will have time for our journal . Thank you in advance."
Application: establish credentials and make request
35
This email is to inform you that user xx's application for access to restricted material in the deposit johnston2012auslan has just been approved. The depositor included the following note to the user:
"I am giving you user access which means you should be able to see the ELAN eaf annotation files for the topics "The boy who cried wolf" and for "The hare and the tortoise. You should also be able to see most other movies except those tagged "1a" "4a" and "5". If you cannot see the ELAN eaf annotations I hope the problem will be fixed soon. I told the ELAR team about this."
Response: approval with details and guide
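The application-and-response exchanges above follow a simple pattern: a user applies with a message, the depositor approves or refuses with a note, and approval may still leave some items withheld. The sketch below models that pattern under our own assumptions; `AccessRequest`, `RestrictedDeposit` and their methods are hypothetical and are not ELAR’s URCS implementation.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class AccessRequest:
    user: str
    deposit_id: str
    message: str              # the applicant's explanation of who they are and why
    status: str = "pending"   # "pending", "approved" or "refused"
    note: str = ""            # the depositor's reply to the applicant


@dataclass
class RestrictedDeposit:
    deposit_id: str
    withheld: Set[str] = field(default_factory=set)        # items kept back even from approved users
    approved_users: Set[str] = field(default_factory=set)

    def apply(self, user: str, message: str) -> AccessRequest:
        """A user asks for access, attaching a message for the depositor."""
        return AccessRequest(user, self.deposit_id, message)

    def decide(self, request: AccessRequest, approve: bool, note: str = "") -> None:
        """The depositor approves or refuses, optionally replying with a note."""
        request.status = "approved" if approve else "refused"
        request.note = note
        if approve:
            self.approved_users.add(request.user)

    def can_view(self, user: str, item: str) -> bool:
        """Approved users may see everything except withheld items."""
        return user in self.approved_users and item not in self.withheld


deposit = RestrictedDeposit("example-deposit", withheld={"1a", "4a", "5"})
req = deposit.apply("xx", "I'd like to see how you do the annotation.")
deposit.decide(req, approve=True, note="You should see most movies except 1a, 4a and 5.")
print(deposit.can_view("xx", "story-2"), deposit.can_view("xx", "4a"))  # True False
```

What the sketch cannot capture is the part the emails show most clearly: the messages themselves, which turn an access control step into the start of a relationship.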
36
Applied documentation
• Should documentation contribute to sustaining language and cultural diversity and the communities who want to maintain and develop them?
• What would documentary linguistics look like if it took revitalisation (and pedagogy) as its primary goal – e.g. types of data, learner-directed language, sequencing? See Nathan & Fang 2013
• Are there mismatches between linguists’ ideologies of endangered languages and documentation and community ideologies? See Austin & Sallabank 2014
37
Examples
• emergence of examples of applied language documentation and language and cultural revitalisation, eg. papers in LDD 11, Wuqu’ Kawoq (from Guatemala), Maori (from New Zealand)
• this year I have been involved in a project with the Dieri Aboriginal Corporation in Australia aimed at cultural and linguistic repatriation and revival which has taught me a lot about links between primary documentation and its applications
38
“… it seems that in general many documenters are struggling with formal aspects of their documentary work because of a late recognition by leaders in documentary linguistics that a good language documentation might be very much more than a set of dozens, hundreds, or thousands of files in archiveable formats.” (Nathan 2012)
39
Conclusions
• we need to move beyond 20th century models of language documentation and archiving and become more reflexive and analytical about our goals, practices, methods and values
• we need to bring more of the social aspect of human life into language documentation and linguistic research (where it has been largely missing for the past 20 years of renewed interest in endangered languages) replacing objectification and commodification with concern for what is special and unique about the contexts, and the people, cultures and languages we are attempting to document and support
40
唔該
Thank you!