Persistent Identifiers (PIDs) & Digital Objects (DOs) Christine Staiger & Robert Verkerk SURFsara.
Persistent Identifiers (PIDs)
• Pointers to data resources
  • Digital resources: data, metadata, documents
  • Real-world objects: species, patients, cell lines
• Globally unique
• Persist indefinitely
• Used to identify and retrieve resources
• Examples: ISBNs, BSNs, DOIs, EPIC PIDs, URIs
Digital Object (DO)
• A digital object combines Data, a PID and Metadata
• Synchronise PID, Data and Metadata during creation, maintenance and deletion of a digital object!
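The synchronisation rule above can be sketched as a toy in-memory store. This is only an illustration under assumptions: the class `DigitalObjectStore`, its method names, the `checksum` and `deleted` fields, and the policy of keeping a tombstone PID record after deletion are all hypothetical and not taken from any PID service.

```python
import hashlib
import uuid


class DigitalObjectStore:
    """Hypothetical in-memory store that changes PID record, data and metadata together."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.data = {}         # pid -> bytes
        self.metadata = {}     # pid -> dict
        self.pid_records = {}  # pid -> dict describing the data

    def create(self, payload, metadata):
        # Creation: mint the PID and store data, metadata and PID record in one step.
        pid = f"{self.prefix}/{uuid.uuid4()}"
        self.data[pid] = payload
        self.metadata[pid] = dict(metadata)
        self.pid_records[pid] = {
            "checksum": hashlib.sha256(payload).hexdigest(),
            "deleted": False,
        }
        return pid

    def update(self, pid, payload, metadata):
        # Maintenance: a data change also refreshes metadata and the PID record.
        self.data[pid] = payload
        self.metadata[pid] = dict(metadata)
        self.pid_records[pid]["checksum"] = hashlib.sha256(payload).hexdigest()

    def delete(self, pid):
        # Deletion (assumed policy): drop data and metadata, keep a tombstone PID record.
        del self.data[pid]
        del self.metadata[pid]
        self.pid_records[pid]["deleted"] = True
```

Whether a new DO should be created instead of updating an existing one is a policy decision that this sketch does not address.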
PIDs are static
World of data infrastructure (hardware): Data1, Data2, Data3, Data4
PID 1, PID 2, PID 3, PID 4
Workflow1: Change storage environment
PID1 PID2
Storage site A Storage site B
Use Case 1: Digital repositories
• PIDs point to the landing page of the digital repository, which shows the metadata
• The “real” data can be downloaded from this page via another link
• E.g. B2SHARE, 3TU.Datacentrum & DANS repositories
• PID http://hdl.handle.net/11304/3265434c-4b34-11e4-81ac-dcbd1b51435e resolves to https://b2share.eudat.eu/record/139
Use Case 2: Enabling data flows
• PIDs point to the data directly
• If needed, create another field specifying the data type so that the right application can be chosen
• Use data in a workflow via its PID, NOT via its actual location!
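A minimal sketch of why a workflow should hold the PID rather than the location. The registry here is a plain dictionary standing in for a PID service; the field names `URL` and `TYPE`, the example.org locations and all function names are assumptions made for illustration.

```python
# Hypothetical stand-in for a PID service: PID -> record with location and data type.
pid_registry = {
    "prefix1/suffix1": {
        "URL": "https://site-a.example.org/data/file1",
        "TYPE": "text/csv",
    },
}

# Hypothetical mapping from data type to the application that handles it.
applications = {"text/csv": "csv-reader", "application/x-netcdf": "netcdf-reader"}


def resolve(pid):
    """Return the location currently recorded for a PID."""
    return pid_registry[pid]["URL"]


def choose_application(pid):
    """Pick an application from the extra data-type field of the PID record."""
    return applications[pid_registry[pid]["TYPE"]]


def move_data(pid, new_url):
    """Storage environment changed: only the PID record is updated, the PID stays."""
    pid_registry[pid]["URL"] = new_url


def workflow_step(pid):
    """A workflow step that knows only the PID and resolves it at run time."""
    return f"{choose_application(pid)} reads {resolve(pid)}"
```

After `move_data("prefix1/suffix1", ...)` the same call `workflow_step("prefix1/suffix1")` picks up the new location without any change to the workflow itself.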
Resolving PIDs
Global Registry, e.g. Handle system
Client gets request to resolve hdl:123/456
1. Client sends request to Global to resolve 0.NA/123 (prefix handle for 123/456)
2. Global responds with Service Information for 123
4. Server responds with handle data
Service Information, Local Handle Service: Primary Site, Secondary Site A (e.g. SURFsara), Secondary Site B
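Step 1 of the resolution asks the global registry about the prefix handle, not the full handle. The mapping from hdl:123/456 to 0.NA/123 can be sketched as follows; the function names are hypothetical, and the code only manipulates strings and does not contact any handle server.

```python
def split_handle(handle):
    """Split 'hdl:123/456' or '123/456' into (prefix, suffix)."""
    if handle.startswith("hdl:"):
        handle = handle[len("hdl:"):]
    # The prefix ends at the first slash; the suffix may itself contain slashes.
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle of the form prefix/suffix: {handle!r}")
    return prefix, suffix


def prefix_handle(handle):
    """Return the prefix handle that the global registry is asked to resolve."""
    prefix, _ = split_handle(handle)
    return f"0.NA/{prefix}"
```

For the example on the slide, `prefix_handle("hdl:123/456")` gives `"0.NA/123"`.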
Example: Relationships between DOs
PID: prefix1/suffix1
Metadata:
  key1: …
  key2: prefix2/suffix2
  key3: prefix3/suffix3
PID: prefix2/suffix2
Metadata:
  key1: …
  key2: prefix1/suffix1
PID: prefix3/suffix3
Metadata:
  key1: …
  key2: prefix1/suffix1
• Part of/has part relationships
• Model cohort-patient relationship
• Model patient-samples relationship
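The relationship example can be sketched with PIDs stored as metadata values. The key names `has_part` and `part_of` are assumptions chosen for readability (the slide only names key1, key2 and key3), and the dictionary stands in for metadata held by a PID service.

```python
# Hypothetical metadata records: a parent DO lists its parts, each part points back.
digital_objects = {
    "prefix1/suffix1": {"has_part": ["prefix2/suffix2", "prefix3/suffix3"]},
    "prefix2/suffix2": {"part_of": "prefix1/suffix1"},
    "prefix3/suffix3": {"part_of": "prefix1/suffix1"},
}


def parts_of(pid):
    """Return the PIDs listed as parts of the given DO (empty if none)."""
    return list(digital_objects[pid].get("has_part", []))


def is_consistent():
    """Check that every 'has_part' link is mirrored by a 'part_of' link."""
    for parent, record in digital_objects.items():
        for child in record.get("has_part", []):
            if digital_objects.get(child, {}).get("part_of") != parent:
                return False
    return True
```

Read as a cohort-patient model, prefix1/suffix1 would be the cohort and the other two PIDs its patients; the same pattern applies one level down for patient-samples.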
Guidelines: Characteristics of PIDs
• What should be identifiable by a PID?
• Define what is data and what is metadata
• Granularity of PIDs:
  • How much information should a PID contain?
  • Location
  • Checksums
  • Other system-specific information
  • Do not put information about the contents of the data here!
• Don’t mix PIDs with other IDs, e.g. database IDs
• Opacity: no assumptions about data context in the PID
Guidelines: Referable data
• How persistent is the data? What and how much in a DO may change?
  • When should a new DO be instantiated?
  • Versioning via PIDs?
• Define PID management processes:
  1. Connecting data, metadata and PID
  2. Handling changes in data and metadata
  3. Handling changes in storage environment
  4. Deleting data, metadata, or PIDs
• Which problem should be addressed with PIDs?
The handle system
• Offers a resolution service for PIDs
• Gives a lot of freedom for implementation, e.g. PID information types
• Software architecture designed for high availability and scalability
• Basis for several PID providers
• Costs: 50$ for registering a prefix with the Handle System + 50$/year maintenance
• EPIC PIDs and DOIs build their services upon the Handle System. Thus, such a PID is a handle
PID systems
DOIs
• Data registry service
• Library-specific metadata standard incorporated in the PID entry (author info, Dublin Core, …), ensuring interoperability between registered data objects
• Costs: 0.06$-1$ per PID, depending on service (CrossRef) + annual fee
EPIC PIDs
• Data registry service
• Create your own metadata for PIDs for data interoperability
• The only costs are those for the handle service
• With one prefix, one can create as many PIDs as wanted
Example: Python epicclient
…
B2SAFE: iRODS and PIDs @ KNMI
Dataflow KNMI, SURFsara
Components shown: seismic system, NFS share, NFS mount, iRODS (shown twice), PID, dCache, HPSS, DMF
Paths:
OS: /data/orfeus/data/continuous/...
iRODS: /ORFEUS/eudat/data/continuous/…
iRODS: /vzSARA1/eudat/knmi/…
B2SAFE is implemented as a two-step process:
1. Register a file in iRODS
• ireg a file in iRODS @ KNMI
• Create a handle/PID @ KNMI
2. Replicate a file in iRODS to another node
• Replicate the registered file to SURFsara
• Create a handle/PID @ SURFsara
• Update the handle/PID @ KNMI
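The bookkeeping behind the two steps can be sketched in memory. This is not B2SAFE or iRODS code: the dictionary `handles`, the record fields `URL` and `REPLICAS`, the function names and the example paths are all hypothetical. Only the prefixes 11230 (KNMI) and 11112 (SURFsara) are taken from the example handles in this deck.

```python
import uuid

# Hypothetical stand-in for the handle services at both sites: handle -> record.
handles = {}


def create_handle(prefix, location):
    """Mint a handle with a UUID suffix and record where the file lives."""
    handle = f"{prefix}/{uuid.uuid1()}"
    handles[handle] = {"URL": location, "REPLICAS": []}
    return handle


def register(prefix, path):
    """Step 1: the file is registered and a handle/PID is created at the source site."""
    return create_handle(prefix, path)


def replicate(source_handle, target_prefix, target_path):
    """Step 2: create a handle/PID for the replica, then update the source handle."""
    replica_handle = create_handle(target_prefix, target_path)
    handles[source_handle]["REPLICAS"].append(replica_handle)
    return replica_handle
```

A run would look like `knmi = register("11230", "/zoneA/example/file")` followed by `replicate(knmi, "11112", "/zoneB/example/file")`, after which the KNMI record lists the SURFsara handle as a replica.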
Example handle
Domain / prefix / unique identifier
Handle/PID @ KNMI:
http://hdl.handle.net/11230/7bc49fd6-2836-11e4-955a-d89d6771dd88?noredirect
Handle/PID @ SURFsara:
http://hdl.handle.net/11112/387ed2e4-5371-11e4-92a8-a0369f0b5f26?noredirect
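The "domain / prefix / unique identifier" structure of these URLs can be taken apart with the Python standard library. A small sketch; the function name `parse_handle_url` and the keys of the returned dictionary are chosen here for illustration.

```python
from urllib.parse import urlsplit


def parse_handle_url(url):
    """Split a handle URL into domain, prefix and unique identifier (suffix)."""
    parts = urlsplit(url)
    # The path is '/<prefix>/<suffix>'; the prefix ends at the first slash.
    prefix, sep, suffix = parts.path.lstrip("/").partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"no prefix/suffix found in {url!r}")
    return {
        "domain": parts.netloc,
        "prefix": prefix,
        "suffix": suffix,
        # '?noredirect' asks the resolver to show the handle record instead of redirecting.
        "noredirect": "noredirect" in parts.query.split("&"),
    }
```

Applied to the KNMI handle above, this yields the domain `hdl.handle.net`, the prefix `11230` and the suffix `7bc49fd6-2836-11e4-955a-d89d6771dd88`.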
Installation
• EPIC client, e.g. Python or Perl client
• Handle server and an EPIC API server
• iRODS and B2SAFE for ingesting data (optional)
SURFsara provides
• Handle server
• EPIC API
How to obtain a handle prefix
• The production prefix has to be purchased from CNRI.
• Costs: 50$/year plus a one-time 50$ for the request
• More information on how to obtain a handle prefix: http://handle.net/service_agreement.html
• More information on how to make use of SURFsara’s PID service: http://eudat.eu/User+Documentation+-+PIDs+in+EUDAT.html