Persistent Identifiers (PIDs) & Digital Objects (DOs) Christine Staiger & Robert Verkerk SURFsara.

Page 1:

Persistent Identifiers (PIDs) & Digital Objects (DOs)

Christine Staiger & Robert Verkerk, SURFsara

Page 2:

Persistent Identifiers (PIDs)

• Pointers to data resources
  • Digital resources: data, metadata, documents
  • Real world objects: species, patient, cell line

• Globally unique

• Exist indefinitely

• Used to identify and retrieve resources

• Examples: ISBNs, BSNs, DOIs, EPIC PIDs, URIs

Page 3:

Digital Object (DO)

[Figure: a Digital Object (DO) consists of Data, a PID and Metadata]

• Synchronise PID, Data and Metadata during creation, maintenance and deletion of a digital object!
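A minimal sketch of what keeping the three parts in sync could look like. The three dictionaries are hypothetical in-memory stand-ins for a PID service, a storage system and a metadata catalogue, and all function names are made up for illustration; this is not the API of any real PID service.

```python
import uuid

# Hypothetical in-memory stand-ins (assumptions, not a real service).
pid_registry = {}    # PID -> location of the data
storage = {}         # location -> data bytes
metadata_store = {}  # PID -> metadata dict


def create_digital_object(prefix, location, data, metadata):
    """Create the data, its metadata and its PID in one operation."""
    pid = f"{prefix}/{uuid.uuid4()}"
    storage[location] = data
    metadata_store[pid] = dict(metadata)
    pid_registry[pid] = location
    return pid


def move_digital_object(pid, new_location):
    """Maintenance: move the data and update the PID in the same step."""
    old_location = pid_registry[pid]
    storage[new_location] = storage.pop(old_location)
    pid_registry[pid] = new_location


def delete_digital_object(pid):
    """Deletion: remove the PID entry, the data and the metadata together."""
    location = pid_registry.pop(pid)
    del storage[location]
    del metadata_store[pid]
```

The point of the sketch is that no function touches only one of the three parts: each operation on a digital object updates data, metadata and PID together.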

Page 4:

PIDs are static

[Figure: PID 1, PID 2, PID 3 and PID 4 point to Data 1, Data 2, Data 3 and Data 4 in the world of data infrastructure (hardware)]

Page 5:

Workflow 1: Change storage environment

[Figure: PID1 and PID2, with Storage site A and Storage site B]

Page 6:

Use Case 1: Digital repositories

• PIDs point to the landing page of the digital repository, which shows the metadata

• “Real” data can be downloaded from this page with another link

• E.g. B2SHARE, 3.TU Datacentrum & DANS repositories

• PID http://hdl.handle.net/11304/3265434c-4b34-11e4-81ac-dcbd1b51435e resolves to https://b2share.eudat.eu/record/139
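Resolution can also be done programmatically. The sketch below assumes the JSON interface of the public handle proxy at hdl.handle.net/api/handles/ and reads the URL entry from the returned record. The `sample_record` is hand-written for illustration in that layout, using the target named on this slide; it is not a captured server response.

```python
import json
import urllib.request

HANDLE_PROXY_API = "https://hdl.handle.net/api/handles/"


def extract_url(record):
    """Return the value of the first URL entry in a handle record, or None."""
    for value in record.get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    return None


def resolve(handle):
    """Fetch the handle record from the proxy (needs network access)."""
    with urllib.request.urlopen(HANDLE_PROXY_API + handle) as response:
        return extract_url(json.load(response))


# Offline illustration with a hand-written record (an assumption, see above).
sample_record = {
    "responseCode": 1,
    "handle": "11304/3265434c-4b34-11e4-81ac-dcbd1b51435e",
    "values": [
        {"index": 1, "type": "URL",
         "data": {"format": "string",
                  "value": "https://b2share.eudat.eu/record/139"}},
    ],
}
print(extract_url(sample_record))
```

Calling `resolve("11304/3265434c-4b34-11e4-81ac-dcbd1b51435e")` would query the live proxy; what it returns depends on the current state of that handle record.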

Page 7:

Use Case 2: Enabling data flows

• PIDs point to the data directly

• If needed, add another field to the PID record that specifies the data type, so that the right application can be chosen

• Use data in workflow via PID, NOT via actual location!
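A small sketch of this idea, under stated assumptions: the PID records, the field names `URL` and `DATA_TYPE`, the example.org locations and the two reader functions are all hypothetical. The workflow step only ever receives a PID and looks up the current location and data type at run time.

```python
# Hypothetical PID records: the current location plus a data type field.
pid_records = {
    "prefix1/suffix1": {"URL": "https://site-a.example.org/data/1.csv",
                        "DATA_TYPE": "text/csv"},
    "prefix1/suffix2": {"URL": "https://site-a.example.org/data/2.json",
                        "DATA_TYPE": "application/json"},
}


def read_csv(url):
    return f"reading CSV from {url}"


def read_json(url):
    return f"reading JSON from {url}"


# The data type field selects the application that handles the data.
APPLICATIONS = {"text/csv": read_csv, "application/json": read_json}


def run_step(pid):
    """Look up the PID first, then hand the current location to the application."""
    record = pid_records[pid]
    return APPLICATIONS[record["DATA_TYPE"]](record["URL"])


print(run_step("prefix1/suffix1"))

# The data moves to another site: only the PID record changes, not the workflow.
pid_records["prefix1/suffix1"]["URL"] = "https://site-b.example.org/data/1.csv"
print(run_step("prefix1/suffix1"))
```

Because the workflow never stores a location, moving the data requires one update to the PID record and no change to the workflow itself.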

Page 8:

Resolving PIDs

[Figure: resolving hdl:123/456 through the Global Registry (e.g. Handle system) and a Local Handle Service with a Primary Site, Secondary Site A (e.g. SURFsara) and Secondary Site B]

The client gets a request to resolve hdl:123/456.

1. The client sends a request to the Global Registry to resolve 0.NA/123 (the prefix handle for 123/456).

2. The Global Registry responds with the service information for 123.

3. The client sends the request for 123/456 to the Local Handle Service named in the service information.

4. The server responds with the handle data.
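The resolution path can be imitated with two lookup tables. This is a toy model, not the Handle wire protocol: the host name, the table contents and the `URL` value are invented for illustration.

```python
# Global registry: prefix handle -> service information for that prefix.
GLOBAL_REGISTRY = {
    "0.NA/123": {"primary": "local-service.example.org"},
}

# Local handle services: host -> the handles that host is responsible for.
LOCAL_SERVICES = {
    "local-service.example.org": {
        "123/456": {"URL": "https://data.example.org/objects/456"},
    },
}


def resolve(handle):
    """Resolve a handle by asking the global registry, then the local service."""
    if handle.startswith("hdl:"):
        handle = handle[len("hdl:"):]
    prefix = handle.split("/", 1)[0]

    # Ask the global registry which service is responsible for the prefix.
    service_info = GLOBAL_REGISTRY[f"0.NA/{prefix}"]

    # Ask that local service for the handle itself.
    local_service = LOCAL_SERVICES[service_info["primary"]]
    return local_service[handle]


print(resolve("hdl:123/456"))
```

The global registry only knows prefixes; the handle data itself lives at the local service, which is why a prefix owner can create and change handles without involving the global registry.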

Page 9:

Example: Relationships between DOs

PID: prefix1/suffix1
Metadata:
  key1: …
  key2: prefix2/suffix2
  key3: prefix3/suffix3

PID: prefix2/suffix2
Metadata:
  key1: …
  key2: prefix1/suffix1

PID: prefix3/suffix3
Metadata:
  key1: …
  key2: prefix1/suffix1

• Part of/has part relationships

• Model cohort-patient relationship

• Model patient-samples relationship
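The example above could be represented as follows. The key names `has_part` and `part_of` are hypothetical replacements for the generic key2 and key3 on the slide; the PIDs are the placeholder PIDs from the slide.

```python
# Metadata of three digital objects, linked to each other by PID.
digital_objects = {
    "prefix1/suffix1": {"has_part": ["prefix2/suffix2", "prefix3/suffix3"]},
    "prefix2/suffix2": {"part_of": "prefix1/suffix1"},
    "prefix3/suffix3": {"part_of": "prefix1/suffix1"},
}


def parts_of(pid):
    """Return the PIDs listed as parts of this object (e.g. patients of a cohort)."""
    return digital_objects[pid].get("has_part", [])


def parent_of(pid):
    """Return the PID this object is part of, or None."""
    return digital_objects[pid].get("part_of")


def links_are_consistent():
    """Check that every has_part link has a matching part_of link back."""
    return all(
        parent_of(part) == pid
        for pid in digital_objects
        for part in parts_of(pid)
    )


print(parts_of("prefix1/suffix1"))
print(links_are_consistent())
```

A cohort-patient or patient-samples model would use the same pattern: the parent object lists the PIDs of its parts, and each part points back to its parent.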

Page 10:

Guidelines: Characteristics of PIDs

• What should be identifiable by a PID?
  • Define what is data and what is metadata

• Granularity of PIDs

• How much information should a PID contain?
  • Location
  • Checksums
  • Other system-specific information
  • Do not put information about the contents of the data here!

• Don’t mix PIDs with other IDs, e.g. database IDs

• Opacity: no assumptions about the data context in the PID
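One way to follow these guidelines is sketched below, with assumptions stated: the suffix is a random UUID, so the PID string itself carries no meaning, and the PID record holds only system-level information. The field names `URL` and `CHECKSUM`, the choice of SHA-256 and the example.org location are illustrative, not prescribed by any PID service.

```python
import hashlib
import uuid


def mint_opaque_pid(prefix):
    """Build a PID whose suffix is a random UUID and says nothing about the data."""
    return f"{prefix}/{uuid.uuid4()}"


def build_pid_record(location, data):
    """System-level information only: where the data is and how to verify it."""
    return {
        "URL": location,
        "CHECKSUM": hashlib.sha256(data).hexdigest(),
    }


pid = mint_opaque_pid("prefix1")
record = build_pid_record("https://site-a.example.org/data/1.bin", b"example bytes")
print(pid, record)
```

Descriptive information such as patient or species names stays in the metadata, not in the PID string or the PID record.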

Page 11:

Guidelines: Referable data

• How persistent is the data? What and how much in a DO may change?
  • When should a new DO be instantiated?
  • Versioning via PIDs?

• Define PID management processes:
  1. Connecting data, metadata and PID
  2. Handling changes in data and metadata
  3. Handling changes in the storage environment
  4. Deleting data, metadata, or PIDs

• Which problem should be addressed with PIDs?

Page 12:

The handle system

• Offers a resolution service for PIDs

• Gives a lot of freedom for implementation, e.g. PID information types

• Software architecture designed for high availability and scalability

• Basis for several PID providers

• Costs: $50 for registering a prefix with the handle system, plus $50/year for maintenance

• EPIC PIDs and DOIs build their services upon the handle system. Thus, such a PID is a handle

Page 13:

PID systems

DOIs

• Data registry service

• Library-specific metadata standard incorporated in the PID entry (author info, Dublin Core, …), ensuring interoperability between registered data objects

• Costs: $0.06 to $1 per PID, depending on the service (CrossRef), plus an annual fee

EPIC PIDs

• Data registry service

• Create your own metadata for PIDs for data interoperability

• Only costs for the handle service

• With one prefix, one can create as many PIDs as wanted

Page 14:

Example: Python epicclient

Page 15:

B2SAFE: iRODS and PIDs @ KNMI

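[Figure: components shown are the seismic system, the NFS share and NFS mount at KNMI, two iRODS instances, the PID service, and the storage systems dCache, HPSS and DMF]

OS: /data/orfeus/data/continuous/...
iRODS: /ORFEUS/eudat/data/continuous/…
iRODS: /vzSARA1/eudat/knmi/…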

Page 16:

Dataflow KNMI to SURFsara

B2SAFE is implemented as a two-step process:

1. Register a file in iRODS
  • ireg a file in iRODS @ KNMI
  • Create a handle/PID @ KNMI

2. Replicate a file in iRODS to another node
  • Replicate the registered file to SURFsara
  • Create a handle/PID @ SURFsara
  • Update the handle/PID @ KNMI
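The bookkeeping behind these two steps can be sketched as follows. This is a toy model and not B2SAFE or iRODS code: the `handles` dictionary stands in for the handle servers, the field names `URL`, `REPLICAS` and `PARENT`, the prefixes and the paths are all hypothetical, and the actual file registration and transfer are left out.

```python
import uuid

# In-memory stand-in for the handle servers at both sites (an assumption).
handles = {}


def create_handle(prefix, location):
    """Create a handle/PID that points to the given location."""
    pid = f"{prefix}/{uuid.uuid4()}"
    handles[pid] = {"URL": location, "REPLICAS": []}
    return pid


def register(path, prefix):
    """Register the file at the first site and create its handle there."""
    return create_handle(prefix, path)


def replicate(source_pid, target_path, target_prefix):
    """Replicate to a second site, create a handle there, update the first handle."""
    replica_pid = create_handle(target_prefix, target_path)
    handles[replica_pid]["PARENT"] = source_pid
    handles[source_pid]["REPLICAS"].append(replica_pid)
    return replica_pid


source_pid = register("/siteA/collection/file1", "prefixA")
replica_pid = replicate(source_pid, "/siteB/collection/file1", "prefixB")
print(handles[source_pid])
print(handles[replica_pid])
```

After both steps, each copy has its own handle, and the handle of the original knows where its replica is.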

Page 18:

Installation

• EPIC client, e.g. the Python or Perl client

• Handle server and an EPIC API server

• iRODS and B2SAFE for ingesting data (optional)

SURFsara provides

• Handle server

• EPIC API

Page 19:

How to obtain a handle prefix

• The production prefix has to be purchased from CNRI.

• Costs: $50/year, plus a one-time $50 for the request

• More information on how to obtain a handle prefix: http://handle.net/service_agreement.html

• More information on how to make use of SURFsara’s PID service: http://eudat.eu/User+Documentation+-+PIDs+in+EUDAT.html