Archives hub ead 2010_extended
Transcript of Archives hub ead 2010_extended
Introduction to EAD (extended version)
Lisa Jeskins and Bethan Ruddock, Archives Hub, Mimas
By the end of today’s session we will have given you an introduction to:
• what interoperability means
• what XML is, what it does and why it is important
• EAD structure and syntax
• EAD and hierarchies
• UK Archives Discovery Network (UKAD)
Objectives
Interoperability
the ability of two or more systems or components to exchange information and to use the information that has been exchanged
(IEEE Standard Computer Dictionary)
What is Interoperability?
the ability to exchange/share data
integration of information resources presented in different formats
within a domain or across domains
advantages of cross-searching
XML facilitates interoperability
About Interoperability
Data exchange standards such as:
◦Z39.50
◦SRU
Types of interoperability
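SRU carries a search as an ordinary HTTP request, which makes it easy to see what a data exchange standard actually agrees on. The sketch below builds an SRU 1.2 searchRetrieve URL. The parameter names (operation, version, query, maximumRecords) come from the SRU specification; the endpoint http://example.org/sru and the CQL index dc.title are assumptions, so substitute the values a real SRU server advertises.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def sru_search_url(base_url: str, cql_query: str, max_records: int = 10) -> str:
    """Build an SRU 1.2 searchRetrieve request as an HTTP GET URL."""
    params = {
        "operation": "searchRetrieve",       # the SRU search operation
        "version": "1.2",
        "query": cql_query,                  # the search itself, written in CQL
        "maximumRecords": str(max_records),  # how many records to return
    }
    return f"{base_url}?{urlencode(params)}"

# Hypothetical endpoint and index name: substitute a real SRU server's values
url = sru_search_url("http://example.org/sru", 'dc.title = "Peter Rowe"')
print(url)

# Any system that follows the standard decodes the same parameters again
decoded = parse_qs(urlparse(url).query)
```

Because every SRU server understands the same parameters, one client can send the same request to many catalogues, which is what makes cross-searching possible.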
user can easily search across and retrieve resources from a wealth of systems
moving beyond individual websites for individual resources (silo approach)
End result…
http://www.ukoln.ac.uk/interop-focus/
◦to explore, publicise and mobilise the benefits and practice of effective interoperability across diverse information sectors
Interoperability Focus
An Introduction to XML
Extensible Markup Language
XML is a grammatical system for creating languages: ◦ a meta-language
Use XML to design your own markup language, consisting of meaningful tags that describe the data they contain
Create a language for describing…anything
What is XML?
XML does not do anything itself. It is pure information wrapped in XML tags
You must use other means to send, receive or display the data
Something to remember about XML
XML is used by XML technologies to create:
◦ a detailed description to view in a browser
◦ a summary entry to view in a browser
◦ a PDF for print
XML is not about content, though there might be certain restrictions on content
XML is essentially about structure
Creating a consistent structure via XML tagging enables content to be easily identified (by machines) and used flexibly
XML provides structure
XML: elements
<title> Alice in Wonderland </title>
*XML allows you to define your tags*
<book>Alice in Wonderland</book>
<filmtitle>Alice in Wonderland</filmtitle>
<tag> content </tag>
Attributes are simple name/value pairs associated with an element
<tag attribute_name="attribute_value">content</tag>
<language>English</language>
<language langcode="eng">English</language>
<date normal="2004">20 Sept 2004</date>
XML attributes
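A parser keeps the element name, its content and its attributes apart, so a program can use the human-readable content for display and the attribute for processing. A minimal sketch using Python's standard xml.etree.ElementTree module and the examples from the slide. Note that attribute values must be delimited by straight quotes (" or '); curly quotes pasted from a word processor make the document unparseable.

```python
import xml.etree.ElementTree as ET

# The content is the human-readable date; the attribute carries a normalised form
date = ET.fromstring('<date normal="2004">20 Sept 2004</date>')
print(date.tag)            # date           (the element name)
print(date.text)           # 20 Sept 2004   (the content between the tags)
print(date.get("normal"))  # 2004           (the value of the "normal" attribute)

language = ET.fromstring('<language langcode="eng">English</language>')
print(language.get("langcode"), language.text)  # eng English
```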
XML Syntax
<tag attribute_name="attribute_value">content</tag>
<tree>hornbeam</tree>
<tree type="deciduous">hornbeam</tree>
<date normal="2004">20 May 2004</date>
<date>20 May 2004</date>
This is an XML element
<trees><tree type="deciduous">
<species>oak</species><fruit>acorn</fruit>
</tree><tree type="coniferous">
<species>pine</species><fruit>pine cone</fruit>
</tree></trees>
Nested elements
<catalog><cd>
<title>OK Computer</title><artist type="band">Radiohead</artist><genre>pop</genre><year>1997</year>
</cd>
<cd><title>Stanley Road</title><artist type="solo">Paul Weller</artist><genre>pop</genre><year>1995</year>
</cd></catalog>
XML example
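Because each tag names the content it holds, a program can pull individual fields out of the catalogue without knowing anything about how it will be displayed. A minimal sketch with Python's standard ElementTree module, using the catalogue above; the "solo artists before 2000" query is just an illustration.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd><title>OK Computer</title><artist type="band">Radiohead</artist><genre>pop</genre><year>1997</year></cd>
  <cd><title>Stanley Road</title><artist type="solo">Paul Weller</artist><genre>pop</genre><year>1995</year></cd>
</catalog>"""

root = ET.fromstring(CATALOG)

# Every field is addressable by its tag name
for cd in root.findall("cd"):
    artist = cd.find("artist")
    print(f'{cd.findtext("title")} by {artist.text} ({artist.get("type")}), {cd.findtext("year")}')

# A query combining an attribute and an element: solo artists released before 2000
solo = [cd.findtext("title") for cd in root.findall("cd")
        if cd.find("artist").get("type") == "solo" and int(cd.findtext("year")) < 2000]
print(solo)  # ['Stanley Road']
```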
<title>Stanley Road</title><artist>Paul Weller</artist><type>solo</type><genre>pop</genre><year>1995</year>
Alice in Wonderland Lewis Carroll 1 volume hardback
Content
Title Alice in Wonderland
Author Lewis Carroll
Extent 1 volume
Format hardback
Content in a database
<books><title>Alice in Wonderland</title><author>Lewis Carroll</author><extent>1 volume</extent><format>hardback</format></books>
XML: Structure
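Moving content from database fields into XML is mechanical: each field becomes an element named after it. A minimal sketch with Python's standard ElementTree module, using the field names from the database slide; a serializer like this always writes the matching closing tag for every element it opens.

```python
import xml.etree.ElementTree as ET

# One database row, using the field names from the slide
record = {"title": "Alice in Wonderland", "author": "Lewis Carroll",
          "extent": "1 volume", "format": "hardback"}

books = ET.Element("books")  # the root element
for field, value in record.items():
    ET.SubElement(books, field).text = value  # one element per field

xml_text = ET.tostring(books, encoding="unicode")
print(xml_text)
# <books><title>Alice in Wonderland</title><author>Lewis Carroll</author><extent>1 volume</extent><format>hardback</format></books>
```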
a root element is required, e.g. <catalog> … all your tags and content … </catalog>
closing tags are required
case matters: <Title> and <title> are different tags
XML must be well-formed
elements must be properly nested
<physdesc><extent>10 boxes</extent></physdesc> (correct)
<physdesc><extent>10 boxes</physdesc></extent> (incorrect: the tags overlap)
XML must be well-formed (2)
attribute values must be enclosed in quotation marks, e.g. langcode="fre"
element names must obey some basic rules, e.g. they cannot start with a number or most punctuation characters (an underscore is allowed) and cannot contain spaces
◦ e.g. <cd name> or <?name> would be incorrect
XML must be well-formed (3)
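Any XML parser enforces these well-formedness rules, so the quickest check is simply to try parsing. A minimal sketch with Python's standard ElementTree module, which raises ParseError for text that is not well-formed; the sample strings restate the rules on the slides above. This checks well-formedness only, not validity against a DTD or Schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML; the parser rejects anything not well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        print("not well-formed:", err)
        return False
    return True

samples = {
    "properly nested": "<physdesc><extent>10 boxes</extent></physdesc>",
    "overlapping tags": "<physdesc><extent>10 boxes</physdesc></extent>",
    "case mismatch": "<Title>Papers</title>",
    "unquoted attribute": "<language langcode=eng>English</language>",
    "name starts with a digit": "<1cd>Stanley Road</1cd>",
    "two root elements": "<title>A</title><title>B</title>",
}
results = {name: is_well_formed(xml) for name, xml in samples.items()}
```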
Marking up a recipe
Look at the following recipe for Chocolate Brownies – How would you use XML to mark this up?
(I’m reliably informed the recipe works!)
375g butter
375g dark chocolate
1 tablespoon vanilla extract
6 eggs
500g sugar
225g plain flour
Preheat the oven to 180°C, 350°F or gas mark 4. Grease a Swiss roll tin or oblong baking dish. Melt the chocolate and butter in a bowl over a saucepan of hot water. Add the vanilla and set the mixture aside until it is lukewarm.
Whisk the eggs and sugar into the mixture. Sift in the flour and baking powder and fold gently until the mixture is just combined. Pour into the greased tin and bake for 20 to 30 minutes until the brownie is cooked around the edges, but still soft in the middle.
Cool and cut into squares. Makes 48 brownies
Chocolate Brownies
<recipe><title>Chocolate Brownies</title>
<ingredients><item>375g butter</item><item>375g dark chocolate</item><item>1 tablespoon vanilla extract</item><item>6 eggs</item><item>500g sugar</item><item>225g plain flour</item></ingredients>
<method><p>Preheat the oven to <temp>180°C, 350°F or gas mark 4</temp>. Grease a Swiss roll tin or oblong baking dish. Melt the chocolate and butter in a bowl over a saucepan of hot water. Add the vanilla and set the mixture aside until it is lukewarm. Whisk the eggs and sugar into the mixture.</p>
<p>Sift in the flour and baking powder and fold gently until the mixture is just combined. Pour into the greased tin and bake for <bakingtime>20 to 30 minutes</bakingtime> until the brownie is cooked around the edges, but still soft in the middle.</p>
<p>Cool and cut into squares.</p></method><serving>Makes 48 brownies</serving></recipe>
Possible XML markup for recipe
<ingredient>375 g butter</ingredient>
Or
<ingredient><item>375 g butter</item>
</ingredient>
Or
<ingredient><type>butter</type><quantity>375 g</quantity>
</ingredient>
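The choice between these options decides what a program can do with the data later. A minimal sketch with Python's standard ElementTree module: with the first option a program has to guess where the quantity ends, and the hypothetical naive_split helper below shows how that guess breaks; with the third option each part is read directly.

```python
import xml.etree.ElementTree as ET

def naive_split(text: str) -> tuple[str, str, str]:
    """Fragile guess at amount, unit and name: assumes '<number> <unit> <name>'."""
    amount, unit, *name = text.split()
    return amount, unit, " ".join(name)

# Option 1: one string per ingredient
coarse = ET.fromstring("<ingredient>375 g butter</ingredient>")
print(naive_split(coarse.text))  # ('375', 'g', 'butter')
print(naive_split("6 eggs"))     # ('6', 'eggs', ''): "eggs" is wrongly taken as the unit

# Option 3: type and quantity in separate elements
fine = ET.fromstring(
    "<ingredient><type>butter</type><quantity>375 g</quantity></ingredient>")
print(fine.findtext("type"), "|", fine.findtext("quantity"))  # butter | 375 g
```

Finer-grained markup costs more effort to create, but it saves every later user from writing guesses like naive_split.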
Exchanging recipes..?
http://www.archiveshub.ac.uk/temp/recipe.xml
Displaying the recipe online
Valid XML: rules specify which elements and attributes are used and how they are used
Valid XML provides consistency and facilitates the exchange of data
Valid XML is important for displaying, processing and exchanging XML in a wider environment
Valid XML
A Document Type Definition or Schema defines the building blocks of an XML document
It specifies elements and attributes and defines how they can be used
People can agree to use a common DTD/Schema for interchanging data
Document Type Definitions
<?xml version="1.0" encoding="UTF-16"?><!ELEMENT recipe (title, intro?, ingredients+, method, serving*)><!ELEMENT title (#PCDATA)><!ELEMENT intro (#PCDATA)><!ELEMENT ingredients (item+)><!ELEMENT item (#PCDATA)><!ELEMENT method (p+)><!ELEMENT p (#PCDATA | temp | bakingtime)*><!ELEMENT temp (#PCDATA)><!ELEMENT bakingtime (#PCDATA)><!ELEMENT serving (#PCDATA)>
Recipe DTD
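A DTD content model reads much like a regular expression: ? means optional, + means one or more and * means zero or more. Python's xml.etree.ElementTree does not validate against a DTD, so this sketch hand-translates just the first rule, recipe (title, intro?, ingredients+, method, serving*), into a regular expression over the recipe's child element names. It illustrates what a validating parser does for every rule; it is not a substitute for one.

```python
import re
import xml.etree.ElementTree as ET

# (title, intro?, ingredients+, method, serving*) rewritten by hand as a regex;
# each element name is followed by a space so names cannot run together
RECIPE_MODEL = re.compile(r"title (intro )?(ingredients )+method (serving )*")

def matches_recipe_model(xml_text: str) -> bool:
    """Check only the order and number of <recipe>'s direct children."""
    root = ET.fromstring(xml_text)
    children = "".join(child.tag + " " for child in root)
    return root.tag == "recipe" and RECIPE_MODEL.fullmatch(children) is not None

print(matches_recipe_model("<recipe><title/><ingredients/><method/><serving/></recipe>"))  # True
print(matches_recipe_model("<recipe><title/><ingredients/><serving/></recipe>"))           # False: no <method>
```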
Schemas perform the same task as DTDs
Schemas use XML syntax
Schemas support complex data types
Easier to describe allowable content
One XML document can point to more than one schema
Schemas
<?xml version="1.0"?><notexmlns="http://www.w3schools.com"xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://www.w3schools.com note.xsd">
<note> <to>Rachel</to> <from>John</from>
<heading>Reminder</heading> <body>Don't forget the concert!</body>
</note>
A simple XML document
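The xmlns attribute places every element of the note in a namespace, and xsi:schemaLocation pairs that namespace with the schema file note.xsd. A minimal sketch with Python's standard ElementTree module shows what a parser sees: namespaced names are reported as {namespace}localname. ElementTree does not itself validate against the Schema.

```python
import xml.etree.ElementTree as ET

NOTE = """<?xml version="1.0"?>
<note xmlns="http://www.w3schools.com"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <to>Rachel</to>
  <from>John</from>
  <heading>Reminder</heading>
  <body>Don't forget the concert!</body>
</note>"""

root = ET.fromstring(NOTE)
print(root.tag)  # {http://www.w3schools.com}note

# A prefix of our own choosing ("w") maps to the document's default namespace
NS = {"w": "http://www.w3schools.com"}
print(root.findtext("w:to", namespaces=NS))  # Rachel

# Where the document says its Schema lives: "namespace location" pairs
schema_loc = root.get("{http://www.w3.org/2001/XMLSchema-instance}schemaLocation")
print(schema_loc)  # http://www.w3schools.com note.xsd
```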
<?xml version="1.0"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.w3schools.com" xmlns="http://www.w3schools.com" elementFormDefault="qualified">
<xs:element name="note"> <xs:complexType>
<xs:sequence> <xs:element name="to" type="xs:string"/> <xs:element name="from" type="xs:string"/> <xs:element name="heading" type="xs:string"/> <xs:element name="body" type="xs:string"/>
</xs:sequence> </xs:complexType> </xs:element> </xs:schema>
Example of a simple Schema
What about display?
[Diagram: an XML file plus a DTD or Schema gives valid XML, which can be displayed both as a full description of the Blue Elephant Papers and as an entry in a browse list]
Use XML technologies – for displaying, retrieving, transforming, manipulating
XSLT – Extensible Stylesheet Language for Transformations
Many technologies available to manipulate XML documents
Displaying XML
Transformation involves reading an XML file and an XSLT file into a processor, which then generates output, typically HTML
Transformation of XML
[Diagram: an XML file and an XSLT file go into a processor, which produces HTML output]
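Python's standard library has no XSLT processor (the third-party lxml package provides one), so the sketch below is a stand-in that does by hand what a simple XSLT stylesheet would: it reads descriptive XML and applies separately kept display rules to produce HTML. The <record> wrapper and the DISPLAY_RULES mapping are hypothetical; the HTML choices follow the HTML and XML comparison in this talk.

```python
import xml.etree.ElementTree as ET
from html import escape

# Descriptive XML: the tags say what the content is
record = ET.fromstring(
    "<record><title>Papers of Peter Rowe</title><date>21 May 2004</date></record>")

# Presentation rules kept apart from the data, as a stylesheet would be:
# each XML element name maps to the HTML element used to display it
DISPLAY_RULES = {"title": "h1", "date": "b"}

def to_html(root: ET.Element) -> str:
    parts = []
    for child in root:
        html_tag = DISPLAY_RULES.get(child.tag, "p")  # unknown elements become paragraphs
        parts.append(f"<{html_tag}>{escape(child.text or '')}</{html_tag}>")
    return "\n".join(parts)

print(to_html(record))
# <h1>Papers of Peter Rowe</h1>
# <b>21 May 2004</b>
```

Changing the display means changing DISPLAY_RULES only; the XML record, and everything else that uses it, stays the same.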
HTML is ONLY for display, typically in a Web browser
HTML tags do not describe the content
Content marked up in HTML cannot easily be extracted by machines for different purposes
XML tags can be specified by anyone; HTML tags are prescribed
HTML and XML (1)
HTML and XML (2)
HTML: <h1> Papers of Peter Rowe </h1>
XML: <title> Papers of Peter Rowe </title>
HTML: <b> 21 May 2004 </b>
XML: <date> 21 May 2004 </date>
International standard, supported by the W3C
It is open, licence-free and platform neutral
It is human and machine readable
XML documents are text documents
Why use XML?
XML does not determine the presentation of the data
◦ use stylesheets to present XML data
◦ with proprietary systems, content is inextricably bound up with format
Hierarchical structure – good for archive descriptions!
More reasons to use XML...
XML is the main basis for defining data exchange languages
Meaningful tags facilitate extraction – data can be manipulated as required
...and for data exchange
All publicly funded bodies should use XML for data exchange (e-GIF)
XML has been widely adopted commercially as well as in the public sector
The Government mandates XML
XML is:
◦ simple
◦ flexible
◦ great for data exchange
XML must be:
◦ well-formed
◦ valid
DTDs and Schemas:
◦ are used to create valid XML
◦ provide tags, attributes and rules
XML requires other XML technologies
◦ e.g. stylesheets can transform XML for display
Summary
EAD: An introduction
EAD = Encoded Archival Description
EAD is XML for finding aids
A data structure standard – not a content standard
A structure that allows finding aids to be indexed, searched, retrieved and navigated
Compatible with ISAD(G)
What is EAD?
EAD is:
Flexible enough to deal with all types of finding aids: single or multi-level, long or short, lists or calendars etc.
Used to create new finding aids as well as to convert old ones to a standardised form
Used to share data between systems
What is EAD?
EAD is maintained and developed by an international working group
Develops and publishes documentation and tools: tag library, guidelines, EAD Cookbook, websites
EAD Working Group - EADWG
EAD structure
<ead>
<eadheader></eadheader>
<archdesc><did></did>
</archdesc>
</ead>
Basic EAD file structure
<ead> EAD root element
<eadheader> EAD file information wrapper
</eadheader>
<archdesc> Finding aid wrapper
<did></did> Core collection information wrapper
</archdesc>
</ead>
Basic EAD file structure
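The basic structure can also be generated by a program, which is roughly what an export routine from a cataloguing system does. A minimal sketch with Python's standard ElementTree module: it builds the <ead>, <eadheader>, <archdesc> and <did> skeleton, using the Dr Foster example values from this talk. It omits the DOCTYPE or Schema namespace declaration and many required details, so the output will not validate as-is; reusing the reference code as the <eadid> is a hypothetical choice.

```python
import xml.etree.ElementTree as ET

def ead_skeleton(unitid: str, title: str, level: str = "fonds") -> ET.Element:
    """Build the minimal <ead>/<eadheader>/<archdesc>/<did> shape (no DOCTYPE or namespace)."""
    ead = ET.Element("ead")                                 # root element
    header = ET.SubElement(ead, "eadheader")                # information about the EAD file
    ET.SubElement(header, "eadid").text = unitid            # hypothetical: reference reused as file id
    titlestmt = ET.SubElement(ET.SubElement(header, "filedesc"), "titlestmt")
    ET.SubElement(titlestmt, "titleproper").text = title
    archdesc = ET.SubElement(ead, "archdesc", level=level)  # the finding aid itself
    did = ET.SubElement(archdesc, "did")                    # core collection information
    ET.SubElement(did, "unitid").text = unitid
    ET.SubElement(did, "unittitle").text = title
    return ead

doc = ead_skeleton("GB 0001 Foster", "Papers of Dr Foster")
ET.indent(doc)  # Python 3.9+: add line breaks and indentation for reading
print(ET.tostring(doc, encoding="unicode"))
```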
EAD beetle
[Diagram: the "EAD beetle", labelled <eadheader>, <archdesc>, <did> and sub-fonds descriptions]
<eadheader>: EAD file information
<eadid>: Identifier
<filedesc><titlestmt><titleproper>: Title
<profiledesc>: Creation
<revisiondesc>: Revision
<eadheader>
Within <archdesc> there are elements for: description, presentation and hierarchy
Finding aid elements
<archdesc>: Archival description
<did>: Descriptive information
<scopecontent>: Scope and content
<bioghist>: Biographical/administrative history
<arrangement>: Arrangement
<controlaccess>: Access points
Descriptive elements
<did>: Descriptive information
<unitid>: Reference
<unittitle>: Title
<unitdate>: Covering dates
<origination>: Creator(s)
<repository>: Repository
<physdesc>: Physical description
◦ <extent>: Extent
◦ <genreform>: Form
◦ <physfacet>: Physical facet
<physloc>: Location
<container>: Container type
<abstract>: Brief description
<did> elements
<archdesc level="fonds"> <did> <unitid>GB 0001 Foster</unitid> <unittitle>Papers of Dr Foster</unittitle> <unitdate normal = "1820-1833">1820-1833</unitdate> <repository>University of Gloucestershire</repository> <physdesc> <extent>1 box</extent> <physfacet>Four folders of letters, 230 folios</physfacet> </physdesc> <langmaterial><language langcode=“eng”>English<language> </langmaterial> <origination>Dr Foster</origination> </did>
Hub <did> EAD2002
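Once a <did> is in EAD, a gateway can read each field by name. A minimal sketch with Python's standard ElementTree module, using the Dr Foster example: the displayed date stays as written, while the normal attribute (ISO 8601, where EAD writes a range as start/end) gives a machine-usable range for date searching. The summary dictionary is a hypothetical shape, not an Archives Hub format.

```python
import xml.etree.ElementTree as ET

DID = """<archdesc level="fonds"><did>
  <unitid>GB 0001 Foster</unitid>
  <unittitle>Papers of Dr Foster</unittitle>
  <unitdate normal="1820/1833">1820-1833</unitdate>
  <repository>University of Gloucestershire</repository>
  <physdesc><extent>1 box</extent><physfacet>Four folders of letters, 230 folios</physfacet></physdesc>
  <langmaterial><language langcode="eng">English</language></langmaterial>
  <origination>Dr Foster</origination>
</did></archdesc>"""

archdesc = ET.fromstring(DID)
did = archdesc.find("did")

def date_range(unitdate: ET.Element) -> tuple[int, int]:
    """Read years from the normal attribute: 'start/end', or a single year."""
    start, _, end = unitdate.get("normal").partition("/")
    return int(start), int(end or start)

summary = {
    "reference": did.findtext("unitid"),
    "title": did.findtext("unittitle"),
    "level": archdesc.get("level"),
    "display date": did.findtext("unitdate"),
    "years": date_range(did.find("unitdate")),
    "languages": [lang.get("langcode") for lang in did.iter("language")],
}
print(summary)
```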
<acqinfo>: Acquisition information
<custodhist>: Custodial history
<appraisal>: Appraisal and selection
<processinfo>: Processing information
<accruals>: Accruals information
<altformavail>: Copies
<accessrestrict>: Access restrictions
<userestrict>: Use restrictions
<prefercite>: Citation information
Administrative information elements
<bibliography>: Publication note
<fileplan>: Classification scheme
<otherfindaid>: Other finding aids
<relatedmaterial>: Related material
<separatedmaterial>: Separated material
<index>: Keywords
Additional information elements
<controlaccess>: Controlled access headings
<name>: Names (general)
<corpname>: Corporate body name
<persname>: Personal name
<famname>: Family name
<geogname>: Place name
<occupation>: Occupations
<function>: Functions (administrative)
<genreform>: Genre and form
<subject>: Subject
<controlaccess> elements
<head>: Headings
<p>; <lb>: Layout
<emph>; <blockquote>: Italics and quotes
<list><item>; <chronlist><chronitem>: Lists
<ref>; <ptr>; <dao>: References, pointers and links to digital objects
Presentation elements
NB: EAD is NOT about the presentation of your finding aids, but about their structure. Separate software takes care of displaying the information.
ISAD(G) (v.2): EAD 2002
3.1.1 Reference code(s): <unitid>, with countrycode and repositorycode attributes
3.1.2 Title: <unittitle>
3.1.3 Dates of creation: <unitdate>
3.1.4 Level of description: level attribute on <archdesc> and <c>
3.1.5 Extent of the unit: <physdesc>, <extent>
3.2.1 Name of creator: <origination>
3.2.2 Administrative/Biographical history: <bioghist>
3.2.3 Custodial history: <custodhist>
3.2.4 Immediate source of acquisition: <acqinfo>
3.3.1 Scope and content: <scopecontent>
3.3.2 Appraisal, destruction and scheduling: <appraisal>
ISAD(G) to EAD
3.3.3 Accruals: <accruals>
3.3.4 System of arrangement: <arrangement>
3.4.1 Access conditions: <accessrestrict>
3.4.2 Copyright/Reproduction: <userestrict>
3.4.3 Language of material: <langmaterial>
3.4.4 Physical characteristics: <phystech>
3.4.5 Finding aids: <otherfindaid>
3.5.1 Location of originals: <originalsloc>
3.5.2 Existence of copies: <altformavail>
3.5.3 Related units of description: <relatedmaterial> and <separatedmaterial>
3.5.4 Publication note: <bibliography>
3.6.1 Note: <odd>
ISAD(G) to EAD
EAD version 1 DTD
EAD 2002 DTD
EAD 2002 Schema
Available from http://www.loc.gov/ead/
Human-readable version: EAD Tag Library (Society of American Archivists)
EAD DTD
Library of Congress Official EAD site: http://www.loc.gov/ead/
Tag Library: http://www.loc.gov/ead/tglib/index.html
EAD Roundtable Help Pages: http://www.archivists.org/saagroups/ead/
EAD Documentation
EAD and hierarchy
ISAD(G) states that to be a conformant archival description a finding aid must:
Be hierarchical
◦ Description from the general to the specific
◦ Information relevant to the level of description
◦ Linking of descriptions (logical sequence)
◦ Non-repetition of information
Contain a minimum set of data elements
EAD and ISAD(G)
Recommended elements for lower-level descriptions:
◦ reference code
◦ title
◦ date(s)
◦ extent of the unit of description
◦ level of description
Lower level elements
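These recommendations can be checked automatically across a whole catalogue. A minimal sketch with Python's standard ElementTree module: it maps each recommended ISAD(G) element to an EAD location (following the ISAD(G) to EAD mapping in this talk) and reports, for every component <c> or <c01> to <c12>, which are missing. The sample references and titles are hypothetical, and since lower levels may legitimately inherit some information, the report is a prompt for review rather than a verdict.

```python
import xml.etree.ElementTree as ET

# Recommended lower-level elements, as paths relative to a component
RECOMMENDED = {
    "reference code": "did/unitid",
    "title": "did/unittitle",
    "date(s)": "did/unitdate",
    "extent": "did/physdesc/extent",
}
# Component tags: unnumbered <c> plus numbered <c01> to <c12>
COMPONENT_TAGS = {"c"} | {f"c{n:02d}" for n in range(1, 13)}

def missing_elements(component: ET.Element) -> list[str]:
    gaps = [name for name, path in RECOMMENDED.items() if component.find(path) is None]
    if component.get("level") is None:  # level of description is an attribute
        gaps.append("level of description")
    return gaps

def check_components(dsc: ET.Element) -> dict[str, list[str]]:
    """Map each component (labelled by its unitid, else its tag) to its gaps."""
    return {(comp.findtext("did/unitid") or comp.tag): missing_elements(comp)
            for comp in dsc.iter() if comp.tag in COMPONENT_TAGS}

# Hypothetical references and titles, for illustration only
DSC = """<dsc>
  <c01 level="series"><did><unitid>MS 54/1</unitid><unittitle>Correspondence files</unittitle>
    <unitdate>1920-1945</unitdate><physdesc><extent>4 files</extent></physdesc></did>
    <c02 level="file"><did><unitid>MS 54/1/1</unitid><unittitle>Letters</unittitle></did></c02>
    <c02><did><unittitle>Postcards</unittitle></did></c02>
  </c01>
</dsc>"""
report = check_components(ET.fromstring(DSC))
print(report)
```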
ISAD(G) levels: Fonds, Sub-fonds, Series, Sub-series, File, Item
EAD levels: <archdesc>, <dsc>, <c01>, <c02>, <c03>, <c04>, <c05>
EAD and Hierarchy
<ead> …
  <archdesc>
    [collection level description here]
    <dsc>
      <c01>[series] description 1
        <c02>[file] description 1</c02>
        <c02>[file] description 2
          <c03>[item] 1</c03>
          <c03>[item] 2</c03>
        </c02>
      </c01>
      <c01>[series] description 2 …
    </dsc>
  </archdesc>
</ead>
Representing hierarchies
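Because the levels are expressed by nesting, a program can recover the whole hierarchy by walking the tree. A minimal sketch with Python's standard ElementTree module that prints an indented outline of the example structure above; the titles are placeholders. The indentation comes from nesting depth, not from the number in the tag, so the same function also works for unnumbered <c> components.

```python
import re
import xml.etree.ElementTree as ET

def is_component(tag: str) -> bool:
    """True for <c> or a numbered component <c01> to <c12>."""
    return tag == "c" or re.fullmatch(r"c(0[1-9]|1[0-2])", tag) is not None

def outline(element: ET.Element, depth: int = 0) -> list[str]:
    """List component titles, indented two spaces per level of nesting."""
    lines = []
    for child in element:
        if is_component(child.tag):
            lines.append("  " * depth + f"{child.tag} {child.findtext('did/unittitle')}")
            lines.extend(outline(child, depth + 1))
    return lines

DSC = """<dsc>
  <c01><did><unittitle>Series 1</unittitle></did>
    <c02><did><unittitle>File 1</unittitle></did></c02>
    <c02><did><unittitle>File 2</unittitle></did>
      <c03><did><unittitle>Item 1</unittitle></did></c03>
      <c03><did><unittitle>Item 2</unittitle></did></c03>
    </c02>
  </c01>
  <c01><did><unittitle>Series 2</unittitle></did></c01>
</dsc>"""
dsc = ET.fromstring(DSC)
print("\n".join(outline(dsc)))
```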
[Diagram: a <c01> containing two <c02> components, the second of which contains two <c03> components]
<c01 level = "subfonds"><did>
<unitid>GB 0324 MS 54</unitid><unittitle>Correspondence files</unittitle><unitdate>1920-1945</unitdate><physdesc><extent>4 files</extent></physdesc>
</did><scopecontent>…</scopecontent>
<c02 level = "series"><did>…</did><scopecontent>…</scopecontent>
</c02>
</c01>
Nesting items
EAD supports two ways of representing levels
<c> is used in A2A, <c0*> on the Hub
<c0*> is slightly easier to use, as the numbers show which level you are working at
<c> or <c0*>?
<dsc type="combined">
<c level="series"> <did> <unitid>Series 1</unitid><unittitle>Correspondence</unittitle> </did><scopecontent>[...]</scopecontent>
<c level="subseries"> <did> <unitid>Subseries 1.1</unitid> <unittitle>Outgoing Correspondence</unittitle> </did>
<c level="file"> <did> <unittitle>AbbingerAldrich</unittitle> </did> </c> </c> </c> </dsc>
Hierarchy <c> tag
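Since nesting already records each level, converting between the two styles loses nothing, which matters when the same records move between systems that prefer different forms (for example <c0*> on the Hub and <c> in A2A). A minimal sketch with Python's standard ElementTree module; the function names are hypothetical. Numbered components in EAD stop at <c12>, so deeper nesting cannot be numbered.

```python
import re
import xml.etree.ElementTree as ET

NUMBERED = re.compile(r"c(0[1-9]|1[0-2])")  # <c01> to <c12>

def unnumber_components(root: ET.Element) -> ET.Element:
    """Rename <c01> to <c12> as <c>; the nesting still records each level."""
    for element in root.iter():
        if NUMBERED.fullmatch(element.tag):
            element.tag = "c"
    return root

def number_components(element: ET.Element, depth: int = 1) -> ET.Element:
    """Rename nested <c> as <c01>, <c02>, ... according to nesting depth."""
    for child in element:
        if child.tag == "c":
            if depth > 12:
                raise ValueError("EAD numbered components stop at <c12>")
            child.tag = f"c{depth:02d}"
            number_components(child, depth + 1)
        else:
            number_components(child, depth)  # e.g. <did>: does not add a level
    return element

SRC = ('<dsc><c01 level="series"><did><unittitle>Correspondence</unittitle></did>'
       '<c02 level="subseries"><did><unittitle>Outgoing</unittitle></did></c02></c01></dsc>')
dsc = unnumber_components(ET.fromstring(SRC))
print(ET.tostring(dsc, encoding="unicode"))   # uses <c> throughout
number_components(dsc)
print(ET.tostring(dsc, encoding="unicode") == SRC)  # True: round trip restored the numbers
```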
XML is a meta-language for creating mark-up languages
XML files require other technologies for display, processing, etc.
For archive finding aids EAD is the DTD/Schema to use
Summing-up
It is XML, which is an international standard
It is a simple and effective way of structuring content and providing meaning
Machines can manipulate the content in all sorts of ways
It is a great format for storing finding aids
EAD is a good thing because…
Cross-searching initiatives
Effective cross-searching requires:
◦Interoperability
which requires
◦Common standards
Cross-searching
UK Archives
UKAD: http://www.ukad.org/
To promote the opening up of data and to build capacity for cross-searching across the UK archive networks and online repository catalogues
To lead and support resource discovery through the promotion of relevant national and international standards
To support the development and use of name authorities
UK Archives Discovery Network
To advocate for the reduction of cataloguing backlogs and the retro-conversion of hard-copy catalogues
To promote access to digitized and digital archives via cross-searching resource discovery systems.
To work with other domains and potential funders to promote archive discovery
UKAD
Fairly loose structure
Meetings about twice a year
Forum for discussion, sharing, connecting and collaborating
Creating a framework for activities (matrix)
◦ International/national/regional
◦ Meeting UKAD objectives, e.g. open up data; standards-based resource discovery; retro-conversion
UKAD activities
Not many UK archives are currently using EAD as a storage format
EAD will increasingly be used as an export format from proprietary database systems like CALM, for use in XML-based gateways such as Aim25 and the Archives Hub
New software is becoming available all the time, making it easier to create, search and display XML; much of it is open source and often free
EAD in the real world
Differences in how EAD is used
Encourages interoperability but still requires work to ensure seamless cross-searching
EAD is flexible and includes a large number of tags, which has both advantages and disadvantages
EAD in the Hub and Aim25
XML is an international standard for sharing information
EAD is the XML language for archival finding aids
EAD is not a content standard
Use ISAD(G) for content guidelines and thesauri or authority files for index terms
Summing-up
You have used the Archives Hub’s EAD editor to create EAD records
XML editors such as XMetaL or XMLSpy can help with validation and with selecting tags and attributes
EAD will become increasingly important
Summing-up
Any Questions?