TEXT ENCODING INITIATIVE (TEI)
Inf 384C
Block II, Module C
TEI History
• The developing organizations first met in 1987
– Association for Computers and the Humanities (ACH)
– Association for Computational Linguistics (ACL)
– Association for Literary and Linguistic Computing (ALLC)
• 1990: first version, TEI P1
• 1992: TEI P2
• 1993: TEI P3
TEI History Continued
• Principles for the development of TEI
– Standard format for data interchange in humanities research
– Guidelines for encoding texts in the same format
– Define a recommended syntax
– Define a meta language for description of text-encoding schemes
• Future Developments
– Linguistic description and grammatical annotation
– Historical analysis and interpretation
– Base tag sets for further document types
– Manuscript analysis and physical description of text
General Introduction to SGML and XML
The Evolution of SGML and XML
• 1960s: Generalized Markup Language (GML) developed by IBM
• 1970s and 1980s: ANSI initiates a project to develop a standard text-description language based on GML
• 1983: SGML became an industry standard
• 1986: ISO ratified a standard for SGML
• 1990s: Tim Berners-Lee developed HTML, a simple formatting markup language for the World Wide Web
• Mid-1990s: XML was developed by the W3C to combine the flexibility of SGML and the simplicity of HTML
Benefits of SGML and XML
• SGML is a toolkit for developing specialized markup languages
– Specifies the structure of information
– Enables interoperability between multiple platforms
– Acts like a database
– All-encompassing
• The DTD acts as a blueprint for document structure
• XML provides a manageable framework in which you can define your own elements
XML Syntax
• Information content must have start and end tags
– Case is significant
– Elements may not overlap
– Elements can nest one inside another
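One way to see these rules in action is to hand small fragments to a parser and note which ones it rejects. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample fragments are illustrative assumptions, not part of the original slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the fragment, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Elements nest one inside another: accepted.
nested = "<p>A <q>nested</q> quotation.</p>"
# Elements overlap (<p> closes before <q> does): rejected.
overlapping = "<p>An <q>overlapping</p> quotation.</q>"
# Case is significant, so </P> does not close <p>: rejected.
case_mismatch = "<p>Mismatched case.</P>"
# No end tag at all: rejected.
unclosed = "<p>No end tag."

for label, fragment in [
    ("nested", nested),
    ("overlapping", overlapping),
    ("case mismatch", case_mismatch),
    ("unclosed", unclosed),
]:
    print(label, is_well_formed(fragment))
```

Only the first fragment passes; the other three each break one of the rules listed above.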
The XML Environment
• XML Editor
• XML Parser/Validator
• Display program
• DTD or schema to define elements
• Style sheet for display of elements
The XML Document
• Document prologue
– XML declaration
– Document type declaration
  • Points to root element
  • Points to external standards (DTDs, namespaces)
• Document itself
– Bracketed by root element
– Contains elements, attributes, entities
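The prologue and the document itself can be seen together in a small made-up example. The `memo` vocabulary, its `priority` attribute, and the `tei` entity are invented for illustration. Python's standard parser reads the prologue and expands the entity, but it does not validate the document against the declarations.

```python
import xml.etree.ElementTree as ET

# Prologue: XML declaration, then a document type declaration naming the
# root element (memo) and carrying a small internal DTD subset.
# Document itself: everything bracketed by <memo> ... </memo>.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (title, body)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST memo priority CDATA #IMPLIED>
  <!ENTITY tei "Text Encoding Initiative">
]>
<memo priority="high">
  <title>About the &tei;</title>
  <body>Everything sits inside the root element.</body>
</memo>
"""

root = ET.fromstring(DOCUMENT)

print(root.tag)                       # the root element
print(root.get("priority"))           # an attribute on the root
print([child.tag for child in root])  # the nested elements
print(root.find("title").text)        # the entity reference is expanded
```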
The Document Type Definition
The DTD: Document Type Definition
• DTD defines a document’s structure, i.e. it is a set of rules and declarations that specify what tags can be used and what these tags can contain
• DTD validates documents
- determines which documents conform to the language
- reduces possibility of errors
• DTD provides blueprint for documents
- specifies how to handle elements
- specifies which elements are allowed
The DTD: Document Type Definition
• The DTD has four main functions:
1. Declares a set of allowed elements (“vocabulary”)
2. Defines a content model for each element (“grammar”)
3. Declares a set of allowed attributes for each element
4. Provides various mechanisms to make management of the model easier
(Ray, Chapter 5, p. 148)
Basic Structure of a DTD: Element Declaration
<!ELEMENT name (content-model)>
Serves two functions:
1. Adds a new element
2. States what can go inside the element
• Every element that appears in the document must be declared in the DTD
• Order of declarations can be important (for example, a parameter entity must be declared before it is used)
<!ELEMENT name (content-model)>
“vocabulary”
• Denotes NAME of element that appears in mark-up tag (case-sensitive, lowercase), e.g. title, graphic, article, thingie
“grammar”
• Formula that delineates what kind of content, how many and in what order
1. Empty elements: EMPTY
2. No content restrictions (little value): ANY
3. Only character data, no elements: #PCDATA
4. Only elements: formula
5. Mixed content: content model
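An element-only content model states which children may appear, how many, and in what order, so it behaves much like a regular expression over child element names. The sketch below makes that analogy concrete for simple element-only models. `model_to_regex` and `children_match` are hypothetical helpers written for this illustration; they ignore EMPTY, ANY, and #PCDATA, and they are no substitute for a validating parser.

```python
import re

# An element name: a letter or underscore, then letters, digits, ".", "-" or "_".
NAME = re.compile(r"[A-Za-z_][\w.-]*")


def model_to_regex(model):
    """Translate an element-only content model, e.g. "(title, para+)", to a regex."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group matching that name plus a ";" terminator,
    # so the DTD operators ?, * and + apply to the whole name.
    grouped = NAME.sub(lambda m: "(?:" + re.escape(m.group()) + ";)", compact)
    # A comma means "followed by", which is plain concatenation in a regex.
    # The choice bar "|" and the parentheses carry over unchanged.
    return re.compile(grouped.replace(",", ""))


def children_match(model, children):
    """Check a list of child element names against a content model."""
    sequence = "".join(name + ";" for name in children)
    return model_to_regex(model).fullmatch(sequence) is not None


article = "(title, author?, para+)"
print(children_match(article, ["title", "para", "para"]))    # order and counts fit
print(children_match(article, ["title", "author", "para"]))  # optional author present
print(children_match(article, ["para", "title"]))            # wrong order
print(children_match(article, ["title"]))                    # para+ needs at least one
```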
Basic Structure of a DTD: Attribute Declaration
<!ATTLIST name attname1 atttype1 attdesc1
attname2 atttype2 attdesc2>
For each element that appears in the document, the attributes of the element must be declared
All attributes are declared in one place, the attribute list
<!ATTLIST name attname1 atttype1 attdesc1>
“vocabulary”
• Name of element to which the attributes belong
• Same as the name of the element declared earlier, e.g. title, article, thingie
“Attribute declarations”
attname1: gives the attribute name
atttype1: specifies the datatype of the attribute or a list of values, e.g. CDATA, NMTOKEN, ID
attdesc1: describes behavior
1. Default value, e.g. “high”
2. Author-specified value: #REQUIRED, #FIXED, #IMPLIED
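The three parts of an attribute declaration (name, type, behavior) can be read back out of a DTD with the `AttlistDeclHandler` callback of Python's standard `xml.parsers.expat` module. The `article` element and its four attributes below are invented for illustration, and `collect_attribute_declarations` is a hypothetical helper; the parser reports the declarations but does not validate the document.

```python
from xml.parsers import expat

# One attribute list for the (made-up) article element, showing each
# kind of behavior: #REQUIRED, #IMPLIED, #FIXED, and a plain default.
DOCUMENT = """<!DOCTYPE article [
  <!ELEMENT article (#PCDATA)>
  <!ATTLIST article
      id       ID            #REQUIRED
      source   CDATA         #IMPLIED
      status   NMTOKEN       #FIXED "draft"
      priority (high | low)  "high">
]>
<article id="a1">Text</article>
"""


def collect_attribute_declarations(xml_text):
    """Map (element, attribute) to (type, default value, required flag)."""
    declared = {}

    def on_attlist(element_name, attribute_name, attribute_type, default, required):
        # expat calls this once per declared attribute.
        declared[(element_name, attribute_name)] = (
            attribute_type,
            default,
            bool(required),
        )

    parser = expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return declared


declarations = collect_attribute_declarations(DOCUMENT)
for (element, attribute), details in declarations.items():
    print(element, attribute, details)
```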
The DTD: Document Type Definition
“It is important to remember that every document type definition is an interpretation of a text. There is no single DTD which encompasses any kind of absolute truth about a text, although it may be convenient to privilege some DTDs above others for particular types of analysis.”
TEI Guidelines for Electronic Text Encoding and Interchange http://etext.virginia.edu/TEI.html
The TEI DTD
• Uses basic structural elements of general DTD
• Designed to simplify the task of choosing an appropriate set of tags for the text in hand.
• Selects appropriate combination of smaller tag sets, each containing some set of tags likely to be used together
1. core tag sets – standard components that are always included, no encoder action
2. basic tag sets – basic building blocks for text types, encoder must select at least one
3. additional tag sets – extra tags compatible with all other tag sets, encoder may add with basic tags in any combination
http://www.tei-c.org/P4X/DTD/
The TEI Header
Basic Elements of TEI
• Paragraphs <p>
• Punctuation <stop.abbr>, <stop.sent>
• Quotations <q> or <quote>
• Lists <list>, <item> etc.
• Bibliographic Citations <bibl>
• THE HEADER! <teiHeader>
The TEI Header
• Required of every TEI text, composed of four parts
• May be large and complex or very simple
• The header may differ for documents not based on written text, such as computer files or spoken text
• The header is not a library cataloging record, although the intent is similar
Four Parts
• File Description <fileDesc>
• Encoding Description <encodingDesc>
• Text Profile <profileDesc>
• Revision Description <revisionDesc>
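As a rough sketch, a header with these four parts might look like the string below. All of the text content is placeholder material invented for illustration, the children inside each part are kept to a minimum, and nothing here is validated against the TEI DTD. `header_parts` and `has_four_parts` are hypothetical helpers that only check which parts are present and in what order.

```python
import xml.etree.ElementTree as ET

# A skeletal header: one child element per part, in the order listed above.
HEADER = """<teiHeader>
  <fileDesc>
    <titleStmt><title>Placeholder title</title></titleStmt>
    <publicationStmt><p>Placeholder publication details</p></publicationStmt>
    <sourceDesc><p>Placeholder description of the source</p></sourceDesc>
  </fileDesc>
  <encodingDesc><p>Placeholder note on how the text was encoded</p></encodingDesc>
  <profileDesc><creation>Placeholder note on the creation of the text</creation></profileDesc>
  <revisionDesc><change>Placeholder record of a revision</change></revisionDesc>
</teiHeader>"""

EXPECTED_PARTS = ["fileDesc", "encodingDesc", "profileDesc", "revisionDesc"]


def header_parts(header_xml):
    """List the top-level parts of a header in document order."""
    header = ET.fromstring(header_xml)
    return [child.tag for child in header]


def has_four_parts(header_xml):
    """Check that exactly the four parts appear, in the expected order."""
    return header_parts(header_xml) == EXPECTED_PARTS


print(header_parts(HEADER))
print(has_four_parts(HEADER))
```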
File Description <fileDesc>
• <titleStmt>
• <editionStmt>
• <extent>
• <publicationStmt>
• <seriesStmt>
• <notesStmt>
• <sourceDesc>
Encoding Description <encodingDesc>
• <projectDesc>
• <samplingDecl>
• <editorialDecl>
• <tagsDecl>
• <refsDecl>
• <classDecl>
• <fsdDecl>
• <metDecl>
• <variantEncoding>
Profile Description <profileDesc>
• <creation>
• <langUsage>
• <textClass>
Revision Description <revisionDesc>
• <revisionDesc>
• <change>
Examples and Application
• Dumble Geological Survey
– A geological survey of Texas from the late 19th century, comprising twelve volumes
• Digitally imaged monographs processed with OCR software to produce text
• Text marked up in XML using the TEI Lite specifications
• http://www.lib.utexas.edu/books/dumble/
Dumble DTD
• Element and Attribute definitions
• Entity references
Dumble Header
• Four basic sections
– File description
– Encoding description
– Profile description
– Revision description
• Contains bibliographic information
• Contains information on the creation of the digital file
Why XML?
• Ability to record information about a document within the document.
• Ability to separate structure from format
• Ability to “wrap” or embed information in layers of XML
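The separation of structure from format can be illustrated with a very small sketch: one structural source, two presentations. The sample list content and the two renderer names (`render_plain`, `render_html`) are assumptions made for this example; a real project would more likely use a style sheet than hand-written functions.

```python
import html
import xml.etree.ElementTree as ET

# The source records structure only: a list containing items.
SOURCE = (
    "<list>"
    "<item>File description</item>"
    "<item>Encoding description</item>"
    "</list>"
)


def render_plain(list_element):
    """Present the list as plain-text bullet lines."""
    return "\n".join("- " + item.text for item in list_element.findall("item"))


def render_html(list_element):
    """Present the same list as an HTML unordered list."""
    items = "".join(
        "<li>" + html.escape(item.text) + "</li>"
        for item in list_element.findall("item")
    )
    return "<ul>" + items + "</ul>"


structure = ET.fromstring(SOURCE)
print(render_plain(structure))
print(render_html(structure))
```

The markup itself never changes; only the rendering step decides how the list looks.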
XML Beyond TEI
• Open Archives Initiative (OAI)
• Semantic Web
• Open Archival Information System
• Digital Preservation
• Information Discovery
References
• A Sample TEI Markup: www.tei-c.org/Lite/U5-eg.html
• Appendix A.2 Elements in TEI Lite: www.tei-c.org/Lite/U5-taglist.html
• OAI: www.openarchives.org/
• OAIS: http://www.rlg.org/longterm/oais.html
• Learning XML: Erik T. Ray