ABS Governance Dialogues The Role of Documentation in ABS and TK Governance Lima, Peru 21 January 2007 An Overview of Persistent Identifiers George M. Garrity Microbiology and Molecular Genetics Michigan State University

Page 1

ABS Governance Dialogues

The Role of Documentation in ABS and TK Governance

Lima, Peru 21 January 2007

An Overview of Persistent Identifiers

George M. Garrity

Microbiology and Molecular Genetics

Michigan State University

Page 2

The phone call from Peru…

My assignment

To provide the TEG with an overview of persistent identifiers and digital objects

Explore both the technical and social/policy issues

Provide some perspective on how persistent identifiers have been applied in two settings

Mature application - CrossRef

Evolving application - NamesforLife

Offer some thoughts on how PIDs might be applied to Certificates of Origin and Traditional Knowledge

Disclaimers

An end-user of persistent identifiers

Dual interests and IP in this space

Page 3

So, what’s the problem?

“…link heterogeneous electronic libraries. The difficulties inherent in this third objective ultimately led to this paper.”

Kahn and Wilensky 1993

“But for the bioinformatician concerned with integrating and computing upon distributed information… In second place is perhaps naming (identifying), with all the gloriously idiosyncratic embedded semantics of local identifiers in disparate forms.”

Clark 2003

Page 4

So, what’s the problem?

“Even well-formed and properly applied names can serve as a source of confusion and considerable frustration. This is hardly a new problem.”

Garrity and Lyons 2003

“Although used every day, identifiers are a mystery to many people, including people responsible for building complex information systems.”

Report of the NISO Identifiers Round-Table 2006

“And now, a much more succinct way to say this: our systems are autistic. They don’t make inferences. When we learn something in one system or one area, it doesn’t carry over to other areas.”

McComb 2006

Page 5

Let’s start with some working definitions

Digital objects

An instance of an abstract data type that has two components: data and key metadata

Key metadata includes a handle

A handle is a globally unique identifier that is bound to the digital object

Other key properties

Digital objects differ from database records and files, are stored in network accessible repositories, and are accessed using a repository access protocol.

From: Kahn and Wilensky 2006 Int. J. Digit. Libr. 6: 115-123
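The working definitions on this slide can be sketched in code. What follows is only an illustration under stated assumptions, not the Kahn and Wilensky specification: the class names `KeyMetadata`, `DigitalObject` and `Repository`, the `deposit` and `access` methods, and the sample handle are all hypothetical, and the network-accessible repository is reduced to an in-memory dictionary.

```python
from dataclasses import dataclass


@dataclass
class KeyMetadata:
    """Key metadata; it includes the handle bound to the object."""
    handle: str  # globally unique identifier (hypothetical sample value below)


@dataclass
class DigitalObject:
    """An instance with two components: data and key metadata."""
    data: bytes
    key_metadata: KeyMetadata


class Repository:
    """Stand-in for a network-accessible repository (assumption: in-memory)."""

    def __init__(self) -> None:
        self._objects = {}  # handle -> DigitalObject

    def deposit(self, obj: DigitalObject) -> str:
        """Store an object under its handle; a handle can be bound only once."""
        handle = obj.key_metadata.handle
        if handle in self._objects:
            raise ValueError(f"handle already bound: {handle}")
        self._objects[handle] = obj
        return handle

    def access(self, handle: str) -> DigitalObject:
        """Toy 'repository access protocol': look the object up by handle."""
        return self._objects[handle]


repo = Repository()
sample = DigitalObject(
    data=b"example payload",
    key_metadata=KeyMetadata(handle="10.1234/example-object"),
)
repo.deposit(sample)
print(repo.access("10.1234/example-object").data)
```

The point of the sketch is that the object is always reached through its handle, not through a file path or a database row number.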

Page 6

Identifiers

Essential elements in

Human - machine communications

Machine - machine communications

Ideally…

Exist as an unambiguous string

Context and application dependent

Actionable

Resolvable

Other points to consider

Semantically opaque

Global or local

Unique or non-unique

Unanticipated uses

Page 7

Definitions (continued)

Persistent identifiers

A name or an identifier for a resource that uniquely identifies that resource and will be forever associated with that resource. It will never be reassigned to any other resource and will not change regardless of where the resource is located or whatever protocol is used to access it.

Use of a well-managed persistent identifier rather than a location will ensure that when a document is moved, or its ownership changes, the links to it will remain actionable.

From: Diana Dack, Persistence is a Virtue, Information Online Conference, Sydney, January 2001

Page 8

Definitions (continued)

Name resolution

The process of mapping a persistent identifier to a URL that retrieves a resource. The URL locates the named resource identified by the persistent identifier (the name).

[Diagram labels: Name resolution; PID to URL mapping (PID1, PID2, PID3; URL1, URL2, URL3); Identifies; Locates; Resource]
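Name resolution as defined on this slide amounts to a lookup from the name to the current location. The sketch below is a simplified illustration, not how any production resolver is implemented: `resolution_table` and `resolve` are hypothetical names, and the identifier and the two URLs are the Bergey's outline example that appears on the DOI slide later in this deck.

```python
# Hypothetical resolution table: persistent identifier -> current URL.
resolution_table = {
    "10.1007/bergeysoutline": "http://www.springer-ny.com/bergeysoutline",
}


def resolve(pid: str, table: dict) -> str:
    """Map a persistent identifier (the name) to the URL that locates the resource."""
    try:
        return table[pid]
    except KeyError:
        raise LookupError(f"unregistered identifier: {pid}") from None


pid = "10.1007/bergeysoutline"
before = resolve(pid, resolution_table)

# The resource moves: only the table entry changes, never the identifier.
resolution_table[pid] = "http://141.150.157.80/bergeysoutline/main.htm"
after = resolve(pid, resolution_table)

print(before)
print(after)
```

Any link that cites the identifier rather than the URL keeps working after the move, which is the property the definition is after.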

Page 9

Inherent in the design of such systems….

[Diagram labels: Authority; Name registration & name resolution; Global registry; Key metadata; PID to URL mapping (PID1, PID2, PID3; URL1, URL2, URL3); Identifies; Locates; Resource; Metadata; User]
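The roles named in this diagram (an authority, name registration, a global registry holding key metadata) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Authority` and `Record` classes, their methods, the prefix `10.1234`, and the example.org URLs do not come from the slide or from any real registration agency.

```python
from dataclasses import dataclass


@dataclass
class Record:
    url: str            # where the resource currently lives
    key_metadata: dict  # minimal descriptive metadata kept with the name


class Authority:
    """Hypothetical naming authority backed by an in-memory 'global registry'."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._registry = {}  # PID -> Record

    def register(self, suffix: str, url: str, key_metadata: dict) -> str:
        """Name registration: issue a PID once and bind it to a resource."""
        pid = f"{self.prefix}/{suffix}"
        if pid in self._registry:
            # A persistent identifier is never reassigned to another resource.
            raise ValueError(f"{pid} is already registered")
        self._registry[pid] = Record(url, dict(key_metadata))
        return pid

    def relocate(self, pid: str, new_url: str) -> None:
        """Update the location only; the identifier itself does not change."""
        self._registry[pid].url = new_url

    def resolve(self, pid: str) -> str:
        """Name resolution: return the URL that currently locates the resource."""
        return self._registry[pid].url


authority = Authority("10.1234")
pid = authority.register(
    "sample-0001",
    "http://repository.example.org/items/1",
    {"title": "Sample record"},
)
authority.relocate(pid, "http://archive.example.org/items/1")
print(pid, "->", authority.resolve(pid))
```

Keeping registration and relocation as separate operations is one way to express the rule that a name, once issued, stays bound to the same resource.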

Page 10

[Diagram labels: Assigner; DOI directory; DOI; URL; Content; doi>]

Source: Norman Paskin, International DOI Foundation

Page 11

Comparing identifiers

Enumeration

A single unambiguous string

A label that identifies an entity

ISBN 0-387-98771-1

ATCC 27126*

L-681,572-001

A numbering scheme

A method of providing consistent syntax to denote class membership of an entity.

A formal standard or industry convention

An arbitrary internal system

Key point is establishing a 1:1 correspondence between labels and members

The number or label is simply a string

Page 12

Comparing identifiers (cont.)

An infrastructure specification

A syntax by which an identifier can be expressed in a form suitable for use within a specific infrastructure.

Actionable identifiers

URI (URN and URL)

ISBN numbers as UPC/EAN identifiers

Does not mandate a method of creating labels

Does not create a managed environment

Page 13

Comparing identifiers (cont.)

A fully implemented identifier system

Includes

Unique identifiers

A formalized infrastructure

Management policies for registration, structured interoperable metadata, policy, and governance mechanisms.

Examples

UPC/EAN barcodes and RFID tags

Digital object identifiers (digital identifiers of objects)

Page 14

Desired properties of a candidate PID

Semantically opaque - avoid the pitfalls of embedded meaning

Governance - is there a technical and social framework overseeing the development, implementation and “marketing” of the PID?

Persistence - is there a mechanism in place to guarantee persistence of issued PIDs, when so desired?

Registration - is there a mechanism for global registration of the PIDs or can anyone issue PIDs?

Metadata - is there a minimal requirement for metadata associated with each identified object?

Accepted standard - is there evidence that the PID is an accepted standard?

Globally unique - are the PIDs globally unique?

Widespread usage - how many PIDs have been issued and what is the rate of issuance of new PIDs?

Page 15

Desired properties of a candidate PID (cont)

Object/location - what does the PID identify?

Actionable - are network services attached/embedded?

Unique - does the resolution service check for uniqueness at the local level?

Interoperability - can the identifiers be readily incorporated into other applications without modification or permission?

Granularity - can the identifiers be assigned to subcomponents (nesting of entities within entities)?

Business model - is there a compelling business need for the PIDs to ensure that the infrastructure can be maintained in a self-supporting manner?

Page 16

Comparison of identifier properties

Identifier         Opaque  Governance  Persistent  Registration  Metadata  Accepted standard  Global  Widespread use  Object  Actionable  Unique  Interoperable
Accession numbers  -       -           V           -             V         -                  -       +               +       -           -       -
LSID               -       -           ?           -             V         ?                  V       ?               -       +           +       ?
Gene names         V       -           -           -             -         +                  -       +               +       -           -       -
PURL               -       -           -           -             +         ?                  -       -               +       +           +       +
Taxid              +       -           -           -             +         -                  -       ?               +       V           +       ?
DNS                -       +           -           +             -         +                  +       +               -       +           +       +
Taxonomic names    -       +           +           V             -         +                  +       +               +       -           -       -
Handle             +       -           +           +             +         -                  +       ?               +       +           +       +
DOI                +       +           +           +             +         +                  +       +               +       +           +       +
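One way to read the comparison is to count, for each identifier, how many of the twelve properties are marked "+". The sketch below does that with the ratings transcribed from the slide. The names `PROPERTIES`, `RATINGS`, `count_met` and `unmet` are hypothetical, and because the slide does not define "V" or "?", the code makes the assumption of treating anything other than "+" as not clearly met.

```python
PROPERTIES = [
    "Opaque", "Governance", "Persistent", "Registration", "Metadata",
    "Accepted standard", "Global", "Widespread use", "Object",
    "Actionable", "Unique", "Interoperable",
]

# One character per property, in the order above, as transcribed from the slide.
RATINGS = {
    "Accession numbers": "--V-V--++---",
    "LSID":              "--?-V?V?-++?",
    "Gene names":        "V----+-++---",
    "PURL":              "----+?--++++",
    "Taxid":             "+---+--?+V+?",
    "DNS":               "-+-+-+++-+++",
    "Taxonomic names":   "-++V-++++---",
    "Handle":            "+-+++-+?++++",
    "DOI":               "++++++++++++",
}


def count_met(ratings: dict) -> dict:
    """Number of properties marked '+' for each identifier."""
    return {name: row.count("+") for name, row in ratings.items()}


def unmet(name: str) -> list:
    """Properties not marked '+' (assumption: 'V' and '?' count as unmet)."""
    return [prop for prop, mark in zip(PROPERTIES, RATINGS[name]) if mark != "+"]


for name, score in sorted(count_met(RATINGS).items(), key=lambda kv: -kv[1]):
    print(f"{name:18s}{score:2d} of {len(PROPERTIES)}")
print("Handle lacks:", unmet("Handle"))
```

A simple count weights every property equally, which is a simplification; the slide itself does not rank the properties.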

Page 17

What does a Digital Object Identifier look like?

The prefix is assigned to the content provider by a DOI Registration Agency, or the Handle System directly.

The suffix is an opaque string supplied by the content provider.

Handle software stores a mapping of the Handle to one or more locations (or services). In virtually all cases today, the Handle is mapped to a location (URL).

http://dx.doi.org/10.1007/bergeysoutline

resolves to

http://141.150.157.80/bergeysoutline/main.htm

Which used to be:

http://www.springer-ny.com/bergeysoutline

10.1234/myownnumbers-123.00001

prefix suffix subsuffix
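The prefix/suffix structure described on this slide can be shown with a few lines of string handling. This is a sketch only: `split_doi` and `proxy_url` are hypothetical helper names, the code does no network access, and it does not attempt to separate the "subsuffix" because the slide does not say where that boundary falls. The only structural fact relied on is that the first "/" separates prefix from suffix.

```python
def split_doi(doi: str) -> tuple:
    """Split a DOI into (prefix, suffix) at the first '/'.

    The suffix is opaque and may itself contain '/', so only the first
    separator is significant.
    """
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix identifier: {doi!r}")
    return prefix, suffix


def proxy_url(doi: str, resolver: str = "http://dx.doi.org/") -> str:
    """Build the actionable form of a DOI using the resolver shown on the slide."""
    split_doi(doi)  # validate before building the URL
    return resolver + doi


print(split_doi("10.1234/myownnumbers-123.00001"))
print(proxy_url("10.1007/bergeysoutline"))
```

Because the suffix is treated as an opaque string, nothing in the code tries to interpret it; only the resolver knows what it currently maps to.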

Page 18

Syntax of some other PIDs in “common” use

<Handle> ::= <Handle Prefix> "/" <Handle Suffix>

http://hdl.handle.net/10.1099/ijs.0.64483-0

PURL: Persistent URLs

<purl> ::= <protocol>/<resolver>/<name>

http://purl.oclc.org/OCLC/OCLC/PURL/FAQ

LSID: Life Science Identifiers

urn:<LSID>:<AuthorityID>:<Namespace>:<Object>:<Rev>

http://lsid.biopathways.org/resolver/data/urn:LSID:ncbi.nlm.nih.gov:GenBank/accession:NT_001063:2
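The PURL and LSID patterns on this slide can be taken apart mechanically, which also shows how much meaning an LSID carries in its fields compared with an opaque suffix. The functions `parse_lsid` and `parse_purl` below are hypothetical helpers written for this illustration; they follow the patterns as printed on the slide and make no claim to implement the full PURL or LSID specifications.

```python
from urllib.parse import urlsplit


def parse_lsid(text: str) -> dict:
    """Extract the LSID fields, ignoring any resolver URL in front of the URN."""
    start = text.lower().find("urn:lsid:")
    if start < 0:
        raise ValueError(f"no LSID found in: {text}")
    fields = text[start:].split(":")
    # urn : LSID : authority : namespace : object [: revision]
    if len(fields) not in (5, 6):
        raise ValueError(f"unexpected number of LSID fields: {text}")
    return {
        "authority": fields[2],
        "namespace": fields[3],
        "object": fields[4],
        "revision": fields[5] if len(fields) == 6 else None,
    }


def parse_purl(purl: str) -> tuple:
    """Split a PURL into (protocol, resolver, name)."""
    parts = urlsplit(purl)
    return parts.scheme, parts.netloc, parts.path.lstrip("/")


print(parse_lsid(
    "http://lsid.biopathways.org/resolver/data/"
    "urn:LSID:ncbi.nlm.nih.gov:GenBank/accession:NT_001063:2"
))
print(parse_purl("http://purl.oclc.org/OCLC/OCLC/PURL/FAQ"))
```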

Page 19

Two implementations using DOIs

CrossRef

Independent membership association, founded and directed by STM publishers. Its mission is to connect users to primary research literature through a DOI Registration Agency (RA) that performs reference cross-linking, subject to publisher access controls.

The largest and most successful implementation of DOI services.

NamesforLife is a proprietary semantic resolution service developed at MSU. It provides a method for persistently linking the occurrence of a biological name or other technical term in third party content to managed information about its origins, formal definition, current usage, and related goods and services.

Page 20

Rumsfeld’s axiom and knowledge bleed

“…because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know.”

Page 21

The knowledge gradient

[Diagram quadrants: Unknown unknowns; Known knowns; Unknown knowns; Known unknowns]

Basic and applied research advances knowledge

Knowledge bleed results in a loss of knowledge that has already been gained

Semantic resolution provides a mechanism to combat knowledge bleed


Page 24

Ramifications of misunderstanding a name

Highly significant

Wrong assumptions, assertions, or hypotheses

Misdiagnosis of infectious diseases

Misapplication of public policies

Significant

Lost opportunities

Failure to reach potential customers interested in marketed content, goods, and services at point of need.

The long-tail phenomenon*

Names trigger specific responses

But, the concepts to which names apply are not static

May not always map 1:1

May require expertise for accurate interpretation

Page 25

Some thoughts on selecting a PID for CO and TK

The intended use of the identifier

Syntactic rules governing the form of the identifier

What the identifier resolves to

The technical infrastructure that is available to support the identifier and the parties operating it

Policies governing creation, maintenance, support, and persistence of the identifier

Information about any metadata related to the identifier that is or must be made available

A history of the identifier, including any changes in any of the above points over time.

Source: Report of the NISO Identifiers Roundtable 2006

Questions?