LingDy
February 13, 2012
TUFS, Tokyo
David Nathan
Endangered Languages Archive
Hans Rausing Endangered Languages Project
SOAS, University of London
Data management
Two most valuable strategies
- design and use a filename system
- work out your basic units of documentation and the relationships between them
  - if you get these right, they will do the “heavy lifting” of your data management strategy
  - data and metadata are intertwined: points on a spectrum rather than different things
Three most important qualities
- consistency
- machine readability
  - computer programs can act on your data in terms of its proper structures and categories
  - bad example [figure not preserved]
- documentation of conventions, structures, methods
Data management
- understand and model the data (units, relationships)
- use appropriate data structure methods – in both file contents and organisation
- use appropriate and conventional data encoding methods (e.g. Unicode)
- be explicit and consistent
- plan for flow of data, working with others, across different systems
- document steps, decisions, conventions, structures
- think ahead to archiving
Managing data in your computer
- design a well-organised system of folders so that you can always find your stuff according to what it is, not:
  - where the software decided to put it
  - what the software decided to call it
  - when/where you last used it
  - what someone else called it
File structures and names
- design folder structure as a logical hierarchy that suits your goals, content and work style
- have materials gathered within one overall directory (e.g. for backup)
- make directories for relevant categories, e.g. sessions, media types, dates
- design it so that you will always be able to find things
- you may need to restructure at different points in your project, e.g. move from date-based to session-based structures
Designing a file/folder structure
- it should relate to reality
- locations should make sense, so you (and others) will know where to look for things (where do you keep your passport; favourite cup?)
- the best location is "the place that one would naturally look to find it"
3 models for file system structures
- tree of descriptive folder- and file-names
- one folder with descriptive filenames
- one folder with systematic filenames
… what else is needed?
On identifiers
- real world objects are inherently identified because of their physical uniqueness - an unlabelled cassette is only poorly identified
- digital objects have no such physical independence - they depend on the identifiers that we give them
- three types of identifiers:
  - semantic
  - keys
  - relative
On identifiers
- semantic, e.g.
  - Nelson Mandela
  - The Sound of Music
  - SA_JA_Bongo_Palace_Land Dispute Trial_015_29-04-2010.wav
On identifiers
- keys (disambiguators), e.g.
  - 1137204 (a student number)
  - 0803 211 6148 (a telephone number)
  - p12893fh23.pdf (some system's reference number)
On identifiers
- relative, e.g.
  - 67 High Street
  - the secretary
  - index.html
  - metadata.xls
On identifiers
- your collection will have a mix of these but it is important to be aware of the differences and limitations, for example:
  - semantic identifiers: invite name clashes
  - keys: a program or process might depend on the identifier to work properly
  - relative identifiers: if you move them you typically change or destroy their meaning
Objects and identities
- a digital object’s identity includes its location
- a file’s full identity = path + filename
- the path is a representation of the volume and the directory (folder) hierarchy
- if the full identity is unambiguous then everything can be fine, compare:
  c:\dogs\spaniels\rover.jpg
  c:\cars\british\rover.jpg
  or
  lectures\syntax\20091103\lect-notes.doc
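The "full identity = path + filename" point can be tried out in a few lines. This is only a sketch using Python's standard `pathlib`; the helper name `same_name_different_identity` is made up for illustration and is not part of the original slides.

```python
from pathlib import PureWindowsPath

# The two example files from the slide: same filename, different location.
spaniel = PureWindowsPath(r"c:\dogs\spaniels\rover.jpg")
car = PureWindowsPath(r"c:\cars\british\rover.jpg")


def same_name_different_identity(p, q):
    """True when two paths share a filename but are not the same full path."""
    return p.name == q.name and p != q
```

Here `spaniel.name` and `car.name` are both `rover.jpg`, yet the two objects stay distinct because their paths differ. Moving either file into the other's folder would produce a clash.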
Objects and identities
- but semantic identifiers are potentially dangerous, because just adding more chunks to disambiguate them will not work:
  my\rover.jpg
  my\white_rover.jpg
- so objects that are not semantically unique need identifiers which are either keys, or relative
Segue to file names
(having said all that)
- filenames are filenames, and do not necessarily identify other entities
- common mistaken assumptions:
  - that a filename “dp_verbs_39.wav” means there is an entity “dp_verbs_39”
  - that files are logically linked just by sharing some part of their filenames
File naming
- we tend to be unsystematic in naming files. This might be OK if you have a large number of files and a method that already does everything you need to do (and will need to do in the future)
- but filenames that are unsystematic or non-standard will cause problems, eventually
Manage file names from the start
a new file:
- don’t accept the default filename or location suggested by an application when you first save the file
- put it where it belongs, immediately. If necessary, create the place (directory/path) where it belongs
- name it according to your naming system!
Filename rules
- all filenames should have correct extensions
- each filename should have only one ".", before the extension
- do not use characters other than letters, numbers, hyphen - and underscore _
- avoid non-ASCII characters
- keep filenames short, just long enough to contain the necessary identifier - don't fill them up with lots of information about the content (that is metadata!)
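Most of these rules are mechanical enough for a program to check. The sketch below is one possible reading of them, not a tool from the talk: the function name `follows_rules` is hypothetical, and the check cannot tell whether an extension is the *correct* one for the content or whether a name is short enough.

```python
import re

# One or more ASCII letters, digits, hyphens or underscores,
# then exactly one "." followed by an alphanumeric extension.
FILENAME_RULE = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+")


def follows_rules(filename: str) -> bool:
    """Check the character and single-dot rules; says nothing about content."""
    return FILENAME_RULE.fullmatch(filename) is not None
```

For example, `follows_rules("gr_reefs.wav")` is true, while `ready.audio.wav` (two dots), `éclair.jpg` (non-ASCII), `ice cream.doc` (space) and `lexicon-master` (no extension) are all rejected. A name such as `french-cake.jaypeg` still passes, which is why the extension rule needs a human check.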
How about these file names?
1. ready.audio.wav
2. ReAlLyOdDtOReAd.txt
3. éclair.jpg
4. éclair_fr.jpg
5. e'clair.jpg
6. french-cake.jpeg
7. french-cake.jaypeg
8. lexicon-master
9. ɘɫIɲʰ.eaf
10. ice cream.doc
11. OBAMA.TXT
12. Obama.txt
Make filenames sortable
make filenames usefully sortable:
20100119lecture.doc
20100203lecture.doc

gr_transcription_1.txt
gr_transcription_2.txt
gr_transcription_9.txt
gr_transcription_53.txt

gr_transcription_001.txt
gr_transcription_002.txt
gr_transcription_009.txt
gr_transcription_053.txt
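The difference between the two `gr_transcription` lists shows up as soon as a computer sorts them as text. A small sketch, where `zero_pad` is a hypothetical helper and the width of 3 is simply taken from the example above:

```python
import re

unpadded = [f"gr_transcription_{n}.txt" for n in (1, 2, 9, 53)]
padded = [f"gr_transcription_{n:03d}.txt" for n in (1, 2, 9, 53)]


def zero_pad(filename, width=3):
    """Pad the run of digits just before the extension to a fixed width."""
    return re.sub(
        r"(\d+)(\.[^.]+)$",
        lambda m: m.group(1).zfill(width) + m.group(2),
        filename,
    )


print(sorted(unpadded))  # 53 sorts before 9, because "5" < "9" as text
print(sorted(padded))    # 001, 002, 009, 053: numeric order is preserved
```

The same reasoning is why `20100119lecture.doc` sorts correctly: with year first, then month, then day, text order and date order coincide.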
Associating files
you can make resources sortable together by giving them the same filename root (the part before the extension), or part of the root:
gr_reefs.wav
gr_reefs.eaf
gr_reefs.txt

paaka_photo001.jpg
paaka_photo002.jpg
paaka_txt_conv203.wav
paaka_txt_conv203.eaf
paaka_txt_lex.doc
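A shared filename root also lets a script gather related files. The following is a rough sketch with a hypothetical function name, `group_by_root`. Note the earlier caution: a shared root is a convenience for sorting and grouping, not proof that the files are logically linked.

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_root(filenames):
    """Group filenames by their root (the part before the extension)."""
    groups = defaultdict(list)
    for name in filenames:
        groups[PurePath(name).stem].append(name)
    return dict(groups)
```

Given the three `gr_reefs` files above, the result has a single key, `gr_reefs`, listing the `.wav`, `.eaf` and `.txt` files together.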
Avoid metadata in filenames
avoid stuffing metadata into filenames. A filename is an identifier, not a data container
better to use a simple (semantic) filename or a key (i.e. meaningless) filename, and then create a metadata table to contain all the relevant information
a table can properly express all the information, contain links etc, and is extensible for further metadata
Avoid metadata in filenames
e.g. Paaka_Reefs_Dan_BH_3Oct97.wav
better: paaka_063.wav plus paaka_063.txt

paaka_063.txt:

| language | topic | speaker | location | date |
|---|---|---|---|---|
| Paakantyi | Reefs at Mutawintyi | Dan Herbert | Broken Hill | 1997-10-03 |
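One way to keep such a table machine readable is a CSV file keyed on the short filename. The sketch below uses the values from the slide. The `file` column and the function names `to_csv` and `lookup` are assumptions added for illustration, since the slide keeps the table in a separate `paaka_063.txt`.

```python
import csv
import io

# "file" is an assumed key column; the other fields come from the slide.
FIELDS = ["file", "language", "topic", "speaker", "location", "date"]
ROWS = [
    {
        "file": "paaka_063.wav",
        "language": "Paakantyi",
        "topic": "Reefs at Mutawintyi",
        "speaker": "Dan Herbert",
        "location": "Broken Hill",
        "date": "1997-10-03",
    }
]


def to_csv(rows):
    """Serialise metadata rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def lookup(csv_text, filename):
    """Return the metadata row for a filename, or None if it is not listed."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["file"] == filename:
            return row
    return None
```

The filename stays a short identifier, and the table can grow new columns later without any file being renamed.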
A filenaming system
carefully design a filename system for your data and document the system so that somebody else can understand it
one documenter’s new system:
aaa_bb_cc_yyyy-mm-dd_nnn.wav
A filenaming system
aaa_bb_cc_yyyy-mm-dd_nnn.wav
- aaa = village code
- bb = (main) speaker code
- cc = genre/event code
- yyyy-mm-dd = date (why this order?)
- nnn = optional number (e.g. 001)
- .wav = correct extension for file content type
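A systematic pattern like this can be parsed and validated by a program. The sketch below is an assumption-laden reading of the pattern, not the documenter's own tool. It assumes the codes are lowercase ASCII letters of exactly the lengths shown, and the name `parse_filename` and the sample code `abc_de_fg` are invented.

```python
import re
from datetime import date

# Assumed: codes are lowercase letters; the _nnn part is optional.
NAME_PATTERN = re.compile(
    r"(?P<village>[a-z]{3})_(?P<speaker>[a-z]{2})_(?P<genre>[a-z]{2})_"
    r"(?P<date>\d{4}-\d{2}-\d{2})(?:_(?P<number>\d{3}))?\.wav"
)


def parse_filename(filename):
    """Split a filename into its coded parts, or return None if it does not fit."""
    match = NAME_PATTERN.fullmatch(filename)
    if match is None:
        return None
    parts = match.groupdict()
    try:
        parts["date"] = date.fromisoformat(parts["date"])
    except ValueError:  # right shape but not a real calendar date
        return None
    return parts
```

With this reading, `parse_filename("abc_de_fg_2010-04-29_001.wav")` yields the three codes, a real date and the number `001`, while a day-first name such as `abc_de_fg_29-04-2010_001.wav` is rejected. One answer to "why this order?" is that year-month-day names sort chronologically when sorted as plain text.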
Documenting the filename system
- describe the system
  - how would you describe it?
  - where would you put the description?
- document the codes – this is probably part of your metadata
On changing file names
- decide whether it is possible, and weigh the benefits and side effects (e.g. loss of links in ELAN files)
- design a system first
- don’t change names in situ – copy the data set and gradually migrate it to your new system
- document file name changes
- if possible, automate or copy and paste filenames
- if possible, use machine processes, e.g. system filename listings, XLS formulas
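Several of these points (copy rather than rename in place, document the changes, automate) can be combined in one small script. This is a minimal sketch under stated assumptions: the function `migrate`, its parameters and the log filename `rename_log.csv` are all hypothetical, and it does not repair links inside files such as ELAN annotations.

```python
import csv
import shutil
from pathlib import Path


def migrate(source_dir, target_dir, new_name, log_name="rename_log.csv"):
    """Copy files under new names, leave the originals alone, log the mapping."""
    source, target = Path(source_dir), Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    mapping = []
    for old in sorted(p for p in source.iterdir() if p.is_file()):
        new = target / new_name(old.name)
        if new.exists():
            # Two old names must never collapse into one new name.
            raise FileExistsError(f"name clash: {new.name}")
        shutil.copy2(old, new)
        mapping.append((old.name, new.name))
    # The log is the documentation of the file name changes.
    with open(target / log_name, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["old_name", "new_name"])
        writer.writerows(mapping)
    return mapping
```

A call such as `migrate("old", "new", lambda n: n.lower().replace(" ", "_"))` copies every file into `new` under a cleaned-up name and leaves `old` untouched, so the migration can be checked before anything is deleted.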
Metadata
- metadata is data about data
  - for identification, management, retrieval of data
  - provides the context and understanding of that data
- carries those understandings into the future, and to others
Metadata
- it reflects the knowledge and practices of data providers
- … and therefore defines and constrains audiences and usages for the data
- all value added to recordings of events (annotations, transcriptions, translations, glosses, comments, interpretations, part of speech tagging etc) is actually metadata
Common metadata standards
- OLAC: Open Language Archives Community; categories include: Title, Identifier, Creator, Contributor, Language, Subject.language, Date, Description, Format, Type, Rights, Coverage, Relation
- IMDI: ISLE Metadata Initiative; more categories, software dependent
Types of metadata
- people metadata – creator’s / delegate’s details
- descriptive metadata – content of data
- administrative metadata – e.g. date of last edit, relation to other data
- preservation metadata – character encoding, file format
- access and usage protocols
Meta-documentation
Nathan (2010): ‘[a]nother way to think of metadata is as meta-documentation, the documentation of your data itself, and the conditions (linguistic, social, physical, technical, historical, biographical) under which it was produced. Such meta-documentation should be as rich and appropriate as the documentary materials themselves.’
meta-documentation = documentation of language documentation models, processes and outcomes
Additional meta-documentation
- identity of stakeholders involved, and their roles
- attitudes of language consultants, towards their languages and towards the documenter and documentation project
- relationships with consultants and community (Good 2010 mentions what he called ‘the 4 Cs’: ‘contact, consent, compensation, culture’)
- goals and methodology of researcher, including research methods and tools, corpus theorisation (Woodbury 2011), theoretical assumptions behind annotation (abbreviations, glosses), potential for revitalisation
Additional meta-documentation
- project and researcher biography: knowledge and experience of the researcher and consultants (e.g. how much fieldwork the researcher had done at beginning of project, what training the researcher and consultants had received)
- for funded projects: grant application, reports, email communications
- agreements entered into – formal or informal (e.g. Memorandum of Understanding, future compensation arrangements), and any promises made to stakeholders
- relationships between this and other projects
ELAR model
[Diagram: ELAR contains Projects; each Project contains Collections; each Collection contains Bundles; each Bundle contains Items]
Metadata can apply to:
- Project
- Collection (deposit)
- Bundle
- Item (file)
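The nesting of levels in the ELAR model can be pictured as a small tree in which every level carries its own metadata. The sketch below is purely illustrative and is not ELAR's actual data model. The class `Node`, the function `effective_metadata` and the idea that inner levels inherit and override outer metadata are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of the hierarchy: project, collection, bundle or item."""
    name: str
    level: str
    metadata: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def effective_metadata(path):
    """Merge metadata from the outermost level inward (assumed behaviour)."""
    merged = {}
    for node in path:
        merged.update(node.metadata)  # inner levels override outer ones
    return merged
```

For instance, a language name recorded once on a collection would, under this assumed inheritance, also describe every bundle and item beneath it, while an item can still record its own file format.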
How do we store metadata?
- in our heads – problem: degrades rapidly and not preservable or portable
- on paper – problem: not easily searchable or extensible
- within files (headers) – problem: not easily searchable or extensible
- in file/folder names (e.g. SasJBpka09-12_int03.wav) – problem: difficult to maintain, breaks easily, not all semantics can be expressed
- better: in a metadata system
Metadata systems
- plain text
- structured text (e.g. Word tables, XML, Toolbox)
- spreadsheet (e.g. Excel)
- database (e.g. FileMaker Pro, Access, MySQL)
- metadata manager (IMDI, SayMore)
- or some combination of these
(note: some of these may need to be exported to preservable formats, e.g. CSV, XML)
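Exporting to a preservable format is usually a short step once the metadata is held as structured records. The sketch below writes records as simple XML with Python's standard library. The element names `metadata` and `item` and the function `records_to_xml` are invented for illustration and do not follow the OLAC or IMDI schemas. The record keys are assumed to be valid XML element names.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records):
    """Serialise a list of flat metadata records as an XML string."""
    root = ET.Element("metadata")
    for record in records:
        item = ET.SubElement(root, "item")
        for key, value in record.items():
            # Each field becomes a child element named after its key.
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

The output is plain text that can be reopened without the spreadsheet or database program that produced the records.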