Bioinformatics Forum: March 14-15, 2005 National Institute for Environmental Studies Bioinformatics...
Bioinformatics Forum: March 14-15, 2005
National Institute for Environmental Studies
Names for Life
An Introduction to Digital Object Identifiers as Background to Names for Life
Catherine Lyons, Names for Life, Edinburgh, UK
DOIs and Names for Life
Systematic taxonomy is a complex network of documents, data, and concepts.
The DOI system is built from components that model complexity in other domains.
This is an unusual introduction to DOIs, in that it emphasizes those aspects of the DOI system that will be a particular strength in the management of taxonomy and nomenclature.
Identification and location of content
NIES home page. Page location: http://www.nies.go.jp/
Page content includes: [screenshot of the NIES home page, with items marked ‘new’ two weeks ago]
Content changes but location stays the same.
Identification and location of content: link-rot monitoring
Broken Links: Just How Rapidly Do Science Education Hyperlinks Go Extinct? John Markwell and David W. Brooks, University of Nebraska, USA. http://www-class.unl.edu/biochem/url/broken_links.html
“From the data so far, we estimate a half-life for these science education hyperlinks of approximately 55 months.”
(cited March 2005)
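A half-life figure implies exponential decay. Assuming that model (the study quoted above reports only the half-life estimate), the expected fraction of links that still work after t months is 0.5^(t/55). A minimal Python sketch of the arithmetic:

```python
import math

# Half-life estimate quoted above (Markwell and Brooks).
# Exponential decay is an assumption implied by the term "half-life".
HALF_LIFE_MONTHS = 55.0


def surviving_fraction(months: float, half_life: float = HALF_LIFE_MONTHS) -> float:
    """Expected fraction of links still working after `months`."""
    return 0.5 ** (months / half_life)


def months_until(fraction: float, half_life: float = HALF_LIFE_MONTHS) -> float:
    """Months until only `fraction` of the links are expected to still work."""
    return half_life * math.log(fraction) / math.log(0.5)


if __name__ == "__main__":
    for t in (12, 55, 110):
        print(f"after {t:>3} months: {surviving_fraction(t):.0%} of links still work")
```

Under this model about 86% of links survive one year, half survive 55 months, and only a quarter survive 110 months (just over nine years).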
Identifiers and Locators: URI, URN, URL
Uniform Resource Identifier (URI), Uniform Resource Name (URN), Uniform Resource Locator (URL)
Codified by the Internet Engineering Task Force (IETF)
Focuses on syntax: ‘scheme’ : ‘scheme-specific string’
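The IETF rule quoted above (a scheme name, a colon, then a scheme-specific string) can be illustrated with a short Python sketch. It checks only the scheme rule from RFC 3986 and does not validate the rest of the string; `split_uri` is a hypothetical helper name.

```python
from __future__ import annotations

import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), then ":".
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):(.*)$", re.DOTALL)


def split_uri(uri: str) -> tuple[str, str]:
    """Split a URI into (scheme, scheme-specific part).

    Schemes are case-insensitive, so the scheme is returned in lower case.
    Raises ValueError if the string does not start with a valid scheme.
    """
    match = _SCHEME.match(uri)
    if match is None:
        raise ValueError(f"not a URI (no scheme): {uri!r}")
    return match.group(1).lower(), match.group(2)


print(split_uri("http://www.nies.go.jp/"))   # ('http', '//www.nies.go.jp/')
print(split_uri("urn:isbn:0-387-98771-1"))   # ('urn', 'isbn:0-387-98771-1')
```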
Identifiers and Locators: URI, URN, URL
URI: an identifier and/or a locator.
URL: a URI that encodes a location (using a transfer-protocol scheme); example: http://www.nies.go.jp/
URN: a URI whose scheme is ‘urn’. URNs provide a syntax ‘umbrella’ for other schemes; they name resources without necessarily locating them (location requires additional software).
Example: urn:isbn:0-387-98771-1
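A URN adds one more level of structure inside the ‘urn’ scheme: a namespace identifier (here `isbn`) and a namespace-specific string. The sketch below follows the RFC 2141 layout `urn:<NID>:<NSS>` but is deliberately simplified (it accepts any non-blank NSS); `parse_urn` is a hypothetical name. It only names the resource: turning `isbn` plus a number into a location would still need resolver software.

```python
from __future__ import annotations

import re

# RFC 2141 layout: "urn:" <NID> ":" <NSS>
# NID: 1-32 letters, digits or hyphens, not starting with a hyphen.
# Simplification: the NSS is any run of non-whitespace characters.
_URN = re.compile(r"^urn:([A-Za-z0-9][A-Za-z0-9-]{0,31}):(\S+)$", re.IGNORECASE)


def parse_urn(urn: str) -> tuple[str, str]:
    """Return (namespace identifier, namespace-specific string) for a URN."""
    match = _URN.match(urn)
    if match is None:
        raise ValueError(f"not a URN: {urn!r}")
    namespace, specific = match.groups()
    # Namespace identifiers are case-insensitive; normalise to lower case.
    return namespace.lower(), specific


print(parse_urn("urn:isbn:0-387-98771-1"))  # ('isbn', '0-387-98771-1')
```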
Identifiers and Locators: Handles and DOIs
“The Handle system is a comprehensive system for assigning, managing, and resolving persistent identifiers, known as ‘handles,’ for digital objects and other resources on the Internet. Handles can be used as Uniform Resource Names (URNs).” (http://www.handle.net/introduction.html)
Invented in the mid-1990s by Robert Kahn (co-inventor of TCP/IP, ‘the internet’) and Robert Wilensky.
Corporation for National Research Initiatives, a USA not-for-profit organization.
Key terms: system, resolution, digital objects, persistent identifiers.
Identifiers and Locators: Handles and DOIs
It is implicit in the Handle design that a digital object has associated metadata (data about data; here, data about the digital object). The core piece of metadata is the Handle itself.
Some Handle applications:
Library of Congress, USA
DSpace (a repository system used by university libraries)
Digital Object Identifier (most Handles are DOIs)
A Handle (or DOI) is stored together with the location of what it resolves to.
Thanks to Norman Paskin of the IDF for the next slide.
[Diagram: an Assigner registers DOIs in the DOI directory; the directory maps each DOI to the URL of the corresponding Content.]
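The directory idea (an identifier stored together with the location it currently resolves to) can be sketched as an in-memory table. This is a toy model, not the Handle System's protocol or API: the `DOIDirectory` class and its methods are hypothetical names. It shows why a published DOI keeps working when content moves: only the directory entry changes.

```python
from __future__ import annotations


class DOIDirectory:
    """Toy DOI directory (hypothetical; not the Handle System protocol or API).

    Maps each identifier to the location(s) it currently resolves to.
    """

    def __init__(self) -> None:
        self._locations: dict[str, list[str]] = {}

    def register(self, doi: str, *urls: str) -> None:
        # An assigner registers a new identifier together with its location(s).
        if doi in self._locations:
            raise KeyError(f"{doi} is already registered")
        if not urls:
            raise ValueError("at least one location is required")
        self._locations[doi] = list(urls)

    def update(self, doi: str, *urls: str) -> None:
        # Content moved: repoint the identifier. The DOI itself never
        # changes, so every link that cites it keeps working.
        if doi not in self._locations:
            raise KeyError(f"{doi} is not registered")
        if not urls:
            raise ValueError("at least one location is required")
        self._locations[doi] = list(urls)

    def resolve(self, doi: str) -> list[str]:
        # Return a copy so callers cannot edit the directory directly.
        return list(self._locations[doi])


if __name__ == "__main__":
    directory = DOIDirectory()
    doi = "10.1234/myownnumbers-123"
    directory.register(doi, "http://old-host.example/article.html")
    directory.update(doi, "http://new-host.example/articles/123")
    print(directory.resolve(doi))  # ['http://new-host.example/articles/123']
```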
DOIs: background
DOI began as a publishing application, in the 1990s, when publishers anticipated a revolution in commercial electronic publishing.
The prospect of multiple formats and multiple rights demanded support for persistent identification of multiple types of digital objects.
DOIs: what do Handles and DOIs look like?
10.1234/myownnumbers-123
(prefix: 10.1234; suffix: myownnumbers-123)
The prefix is assigned to the content provider by a DOI Registration Agency, or by the Handle System directly.
The suffix is an opaque string supplied by the content provider.
Handle software stores a mapping of the Handle to one or more locations (or services). In almost all cases, right now, the Handle is mapped to a location (URL).
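The prefix/suffix structure can be made concrete with a small parser. This is a minimal sketch, assuming only what the slide states plus the fact that DOI prefixes begin with `10.` (other Handles use other prefixes); `DOIName` and `parse_doi` are hypothetical names. Because the suffix is opaque and may itself contain `/`, the string is split only at the first slash.

```python
from typing import NamedTuple


class DOIName(NamedTuple):
    prefix: str  # assigned via a Registration Agency, e.g. "10.1234"
    suffix: str  # opaque string chosen by the content provider


def parse_doi(doi: str) -> DOIName:
    """Split a DOI at the first '/' into prefix and suffix.

    The suffix is opaque and may itself contain '/', so split only once.
    """
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"expected prefix/suffix: {doi!r}")
    if not prefix.startswith("10."):
        # True of DOIs; Handles in general use other prefixes.
        raise ValueError(f"DOI prefixes start with '10.': {doi!r}")
    return DOIName(prefix, suffix)


print(parse_doi("10.1234/myownnumbers-123"))
# DOIName(prefix='10.1234', suffix='myownnumbers-123')
```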
DOIs and the IDF
DOI development is managed by the International DOI Foundation (IDF). The DOIs themselves are managed by Registration Agencies (RAs).
CrossRef (a consortium, mainly of publishers): the ‘Microsoft’ of RAs (approaching 15 million DOIs); registers DOIs for scholarly and professional publishing.
TIB (German National Library of Science and Technology): the latest RA ‘startup’; registers primary data in the earth sciences (pilot project used meteorological datasets).
DOIs, the IDF, and RAs
Other Registration Agencies cover:
mass-market print, audio, video, and images; business publishing; Australian copyright management; UK government publications; European Union publications; Korean-language content; training and education material; the book industry.
This list is not exhaustive!
RAs and metadata
Each RA works with the IDF to define structured metadata for the digital objects it registers.
An RA may make arbitrarily fine distinctions between the types of objects it covers, according to the needs of its community. Each object type may be associated with a distinct metadata structure.
The IDF provides an integrated metadata structure to support any type of digital object from any RA, by means of an ontology.
DOIs, metadata, and ontology
The IDF uses an abstract and extensible ontology, Indecs, to model any entity (e.g., a digital object) in relation to any other.
Indecs contextualizes digital objects in relation to their origination through the agency of other entities at a particular time and place.
Influences on the Indecs ontology include the Functional Requirements for Bibliographic Records (FRBR) produced by the International Federation of Library Associations and Institutions (IFLA).
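The event-centred idea in the second point (an object understood through the event that originated it, involving agents at a time and place) can be sketched with a few record types. This is a hypothetical illustration only: `Agent`, `OriginationEvent`, and `DigitalObject` are assumed names, not Indecs terminology, and the real ontology is far richer.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Agent:
    name: str


@dataclass(frozen=True)
class OriginationEvent:
    """Hypothetical: an event in which agents bring an object into being."""
    agents: tuple[Agent, ...]
    when: date
    place: str


@dataclass
class DigitalObject:
    doi: str
    object_type: str
    origin: OriginationEvent
    metadata: dict[str, str] = field(default_factory=dict)

    def describe(self) -> str:
        # The object's context comes from its originating event,
        # not from the object alone.
        who = ", ".join(agent.name for agent in self.origin.agents)
        return (f"{self.object_type} {self.doi}: originated by {who}, "
                f"{self.origin.when.isoformat()}, {self.origin.place}")


if __name__ == "__main__":
    event = OriginationEvent((Agent("Author A"), Agent("Author B")),
                             date(2005, 3, 14), "Place X")
    print(DigitalObject("10.1234/example-name", "Name", event).describe())
```

All values in the usage lines are placeholders.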
DOIs, application profiles, and services
Consider a digital object in the DOI sense:
It has a recognized type; i.e., it has properties in common with others of its type.
It has metadata associated with it that is particular to its type.
Consider these digital objects: a scientific journal article (Article) and a nomenclatural assertion (Name).
A name may have a synonym, so a digital object of type ‘Name’ might have the metadata element ‘hasSynonym’.
A digital object of type ‘Article’ would not have a ‘hasSynonym’ element.
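Type-specific metadata can be pictured as a table from object type to the elements that type defines. In this minimal sketch only `hasSynonym` comes from the slide; the other element names, the `METADATA_ELEMENTS` table, and `undefined_elements` are hypothetical.

```python
from __future__ import annotations

# Hypothetical element sets per object type; only 'hasSynonym' is from the slide.
METADATA_ELEMENTS: dict[str, frozenset[str]] = {
    "Article": frozenset({"title", "creator", "journal", "date"}),
    "Name": frozenset({"nameString", "publishedIn", "hasSynonym"}),
}


def undefined_elements(object_type: str, metadata: dict[str, object]) -> list[str]:
    """Return the metadata elements that the object's type does not define."""
    allowed = METADATA_ELEMENTS[object_type]
    return sorted(set(metadata) - allowed)


name_record = {"nameString": "Genus species", "hasSynonym": ["Othergenus species"]}
article_record = {"title": "An example article", "hasSynonym": ["Othergenus species"]}
print(undefined_elements("Name", name_record))        # []
print(undefined_elements("Article", article_record))  # ['hasSynonym']
```

The name strings are placeholders, not real taxa.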
DOIs, application profiles, and services
The association of objects with types, and of types with type-specific metadata, enables a DOI ‘Application Profile (AP)’.
An application profile gathers together digital objects that have common metadata properties. For a DOI in a given AP, a service can be implemented that exploits the metadata defined by its AP and returns, for example, some text, a link, or a menu.
DOIs, application profiles, and services
Imagine …
Suppose there were a Biological Name AP, and suppose there were a ‘Check for synonyms’ service:
‘Check for synonyms’ could be associated with digital objects in the Name AP (i.e., nomenclatural assertions).
‘Check for synonyms’ could not be associated with digital objects in the Article AP (and note that articles are the means by which names are published!).
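The thought experiment can be sketched as a lookup from AP to the services associated with it, so that ‘Check for synonyms’ is offered for a Name but refused for an Article. This is a minimal sketch under stated assumptions: the AP names follow the slide, while `AP_SERVICES`, `menu`, `invoke`, and the ‘Show citation’ service are hypothetical.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical: a service takes a DOI's metadata and returns text for the reader.
Service = Callable[[dict], str]


def check_for_synonyms(metadata: dict) -> str:
    synonyms = metadata.get("hasSynonym", [])
    if not synonyms:
        return "No synonyms recorded"
    return "Synonyms: " + "; ".join(synonyms)


def show_citation(metadata: dict) -> str:
    return f"{metadata.get('creator', '?')}: {metadata.get('title', '?')}"


# Each application profile (AP) lists the services associated with its members.
AP_SERVICES: dict[str, dict[str, Service]] = {
    "Name": {"Check for synonyms": check_for_synonyms},
    "Article": {"Show citation": show_citation},
}


def menu(ap: str) -> list[str]:
    """Services offered for a DOI in this AP (e.g. rendered as a menu)."""
    return sorted(AP_SERVICES[ap])


def invoke(ap: str, service: str, metadata: dict) -> str:
    services = AP_SERVICES[ap]
    if service not in services:
        raise LookupError(f"{service!r} is not associated with the {ap} AP")
    return services[service](metadata)
```

Calling `invoke("Article", "Check for synonyms", ...)` raises `LookupError`, mirroring the point that articles, although they publish names, do not carry synonym metadata themselves.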
We are getting close to Names for Life at last!
Thanks to Norman Paskin of the IDF for the next slide.
DOI Data Model: AP Framework
[Diagram: entities, each identified by a DOI, grouped into application profiles; each application profile is linked to service instances, and each service instance to a service definition.]
Entities are identified by DOIs.
The properties of groups of DOIs are defined as APs.
APs have one or more services.
Services have definitions.
New APs and services may be created or made available.
DOIs, services, and multiple resolution
What do you expect to find at the end of a DOI? It is probably not the object identified, but instead metadata and links, and perhaps a rights challenge (password or payment form).
There is likely to be more than one object.
A typical DOI identifies a journal article (a work, not a digital object). A work is often manifested in multiple objects: an abstract, a full-text web page (or pages), a PDF file, a printed journal issue. It may be hosted at multiple sites around the world.
What the DOI resolves to is not necessarily what the DOI identifies.
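Multiple resolution can be sketched as one DOI (identifying the work) mapping to a list of typed manifestations, possibly on different hosts. The `Manifestation` record, the `RECORDS` table, `resolve_all`, and the example URLs are all hypothetical; this is a minimal sketch, not how any Registration Agency stores its data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Manifestation:
    kind: str  # e.g. "abstract", "full text (HTML)", "PDF"
    url: str


# Hypothetical multiple-resolution record: the DOI identifies the work;
# each entry is one object that manifests it, possibly on a different site.
RECORDS: dict[str, list[Manifestation]] = {
    "10.1234/myownnumbers-123": [
        Manifestation("abstract", "http://publisher.example/abs/123"),
        Manifestation("full text (HTML)", "http://publisher.example/full/123"),
        Manifestation("PDF", "http://mirror.example/pdf/123.pdf"),
    ],
}


def resolve_all(doi: str, kind: str | None = None) -> list[Manifestation]:
    """Return every manifestation of the work, optionally filtered by kind."""
    found = RECORDS.get(doi, [])
    return [m for m in found if kind is None or m.kind == kind]


if __name__ == "__main__":
    for m in resolve_all("10.1234/myownnumbers-123"):
        print(f"{m.kind:<18} {m.url}")
```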
DOIs, services, and multiple resolution
Putting it all together
A DOI is much more than a persistent link.
What can happen if you replace an ordinary link with a DOI reference?
Mostly, right now, you get a webpage of further options.
For some DOIs, you get an inline menu of options.
For some DOIs (in article bibliographies, through CrossRef) you get a forward link to an article citing the one you are reading.
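In practice, replacing an ordinary link with a DOI reference means pointing the link at a DOI proxy server, which resolves the DOI and redirects the browser to the URL registered for it. The public proxy is https://doi.org/ (http://dx.doi.org/ at the time of this talk). In this minimal sketch, `doi_link` and `cite` are hypothetical helpers, and escaping every character outside the unreserved set (keeping `/`) is a conservative assumption.

```python
import html
from urllib.parse import quote

# Public DOI proxy (http://dx.doi.org/ at the time of this talk).
DOI_PROXY = "https://doi.org/"


def doi_link(doi: str, proxy: str = DOI_PROXY) -> str:
    """URL that asks the proxy to resolve `doi` and redirect to its location."""
    # Percent-encode characters that are unsafe in a URL; keep '/'.
    return proxy + quote(doi, safe="/")


def cite(doi: str, text: str) -> str:
    """An HTML link that cites by DOI rather than by a fixed location."""
    return f'<a href="{doi_link(doi)}">{html.escape(text)}</a>'


print(cite("10.1234/myownnumbers-123", "My own article"))
# <a href="https://doi.org/10.1234/myownnumbers-123">My own article</a>
```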
Names for Life: what will it really look like?
Names for Life: what will it look like?
Names for Life: what will it look like?
[Screenshot: NamesforLife menu for a strain]
Strain currently bears: Marinomonas communis
Get original Name | Get current Name | View Nfl Taxon | View Nfl Exemplar | View Nfl Nomos | View crosstaxa
Using InfoObjects in the web
[Diagram: Names for Life InfoObjects (name, taxon, combined name, exemplar, nomos), each identified by a DOI, receive links from the web (journal articles, strain records, gene annotations, any online information) and provide links to the web (journal articles, strain records, gene annotations).]