Persistent identifiers for museum specimens, NeIC workshop, August 2015

  1. Persistent Identifiers, NeIC workshop, August 2015 in Oslo. Dag Endresen, GBIF Norway, UiO Natural History Museum
  2. The purpose of identifiers is to name things, making it possible to refer to them.
  3. Name ambiguity: Many things (in GBIF) are named 123. Catalog number: 123, GBIF ID: 543392241, urn:catalog:CAS:BOT:123, Bigelowia juncea; Catalog number: 123, GBIF ID: 1030591721, UAMb:Herb:123, Sphagnum girgensohnii; Catalog number: 123, GBIF ID: 893477175, Parides erithalion; Catalog number: 123, GBIF ID: 1050327334, Cinchona ledgeriana; Catalog number: 123, GBIF ID: 231564351, Umbrina canariensis; Catalog number: 123, GBIF ID: 931031820, Bromus kalmii; Catalog number: 123, GBIF ID: 283363, urn:occurrence:Arctos:MVZ:Egg:123:164, Mercurialis ovata; Catalog number: 123, GBIF ID: 896547722, urn:occurrence:Arctos:MVZ:Egg:123:164, Contopus sordidulus veliei
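The ambiguity on this slide can be checked mechanically. The following is a minimal sketch that takes the catalog numbers, GBIF IDs and names listed above and counts how many records share each key; the helper name `ambiguous_keys` is introduced here for illustration only.

```python
from collections import Counter

# (catalog number, GBIF ID, scientific name) as listed on the slide.
records = [
    ("123", 543392241, "Bigelowia juncea"),
    ("123", 1030591721, "Sphagnum girgensohnii"),
    ("123", 893477175, "Parides erithalion"),
    ("123", 1050327334, "Cinchona ledgeriana"),
    ("123", 231564351, "Umbrina canariensis"),
    ("123", 931031820, "Bromus kalmii"),
    ("123", 283363, "Mercurialis ovata"),
    ("123", 896547722, "Contopus sordidulus veliei"),
]


def ambiguous_keys(rows, key_index):
    """Return the key values that are shared by more than one row."""
    counts = Counter(row[key_index] for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


by_catalog_number = ambiguous_keys(records, 0)  # all records collide on "123"
by_gbif_id = ambiguous_keys(records, 1)         # no GBIF ID is shared
print(by_catalog_number)
print(by_gbif_id)
```

The catalog number alone names eight different things, while the GBIF IDs stay unique because they were issued in a wider context.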
  4. When is the identifier good enough? Unique and persistent, within a given context. The common experience is that an identifier is created within a system or within a context, and that at a later date it needs to be used in another or larger context (Karen Coyle 2006). Expanding context: 1. Within one museum collection (catalog number). 2. Within a network between museum collections (collection code + catalogue number). 3. Within a biodiversity information network (institution code + collection/dataset code + catalogue number). 4. On the Internet (e.g. http URI, DOI, LSID, etc.). 5. Larger contexts are possible to imagine in the future!
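The expanding contexts above can be illustrated by qualifying one local catalog number step by step. This is only a sketch: the function `qualify`, the colon separator, the codes `"L"` and `"O"`, and the base URI `http://example.org/id` are assumptions made for the example, not conventions taken from the slides.

```python
def qualify(catalog_number, collection_code=None, institution_code=None, base_uri=None):
    """Prefix a local catalog number with each wider context that is supplied."""
    # Widest context first: institution, then collection, then the local number.
    parts = [p for p in (institution_code, collection_code, catalog_number) if p]
    local = ":".join(parts)
    if base_uri:
        # Context 4: make the identifier unique on the web by adding a namespace.
        return base_uri.rstrip("/") + "/" + local
    return local


# Hypothetical codes and base URI, for illustration only.
in_collection = qualify("O-L-000014")
in_museum_network = qualify("O-L-000014", collection_code="L")
in_biodiversity_network = qualify("O-L-000014", collection_code="L", institution_code="O")
on_the_internet = qualify("O-L-000014", "L", "O", base_uri="http://example.org/id")
print(in_collection, in_museum_network, in_biodiversity_network, on_the_internet)
```

Each step only adds a prefix, which is why an identifier built this way breaks as soon as one of the prefixes is renamed.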
  5. Expanding context (diagram labels: Identifier, Museum, Internet)
  6. Identifiers for museum collections. The longevity of museums leads to: the need to use identifiers from our past in the current highly-networked digital systems (Karen Coyle 2006 [talking about libraries]). Specify a namespace for the identifiers? URI: uniform resource identifier (unique in the context of the web). URN: uniform resource name (name not tied to location). URL: uniform resource locator (network location as identifier). PURL: persistent URL (commitment to service longevity). Something else? DOI: digital object identifier. ARK: archival resource key. UUID: universally unique identifier.
  7. Persistent Identifier (PID); Globally Unique Identifier (GUID); Uniform Resource Identifier (URI); Persistent Uniform Resource Locator (PURL); Life Science Identifier (LSID); Digital Object Identifier (DOI); Handle system (Handle); Archival Resource Key (ARK, EZID); Universally Unique Identifier (UUID)
  8. Photo: Smithsonian National Museum of Natural History, USNM-445024-Eutoxeres-aquila. PURL. Reuse existing identifiers.
  9. Globally unique. Scalability, number of IDs. Community acceptance. Long-term life-cycle. Resolvable, resolution service(s). Cost per identifier. People-friendly or machine-friendly. Solution for the generation of new IDs: central generation (PID issuer) or distributed generation at source.
  10. A UUID is a 16-octet (128-bit) number, usually written as 36 characters. Example: 41d9cbb4-4590-4265-8079-ca44d46d27c3. The probability of one duplicate would be about 50% if every person on earth created 600 million UUIDs. Allows for easy generation at source in a distributed network.
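Generation at source needs nothing more than a standard library call. The sketch below assumes random (version 4) UUIDs, which the slide does not state explicitly; such a UUID carries 122 random bits because 6 of the 128 bits are fixed. The second function uses the usual birthday-bound approximation to estimate how many UUIDs are needed before a duplicate becomes likely; the function names are chosen here for illustration.

```python
import math
import uuid


def new_specimen_id():
    """Generate a random (version 4) UUID in its 36-character text form."""
    return str(uuid.uuid4())


def ids_for_collision_probability(p, random_bits=122):
    """Birthday-bound approximation: n ~ sqrt(2 * N * ln(1 / (1 - p))), N = 2**random_bits."""
    space = 2 ** random_bits
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


example = new_specimen_id()
n_half = ids_for_collision_probability(0.5)
print(example, len(example))
print(f"{n_half:.2e} UUIDs for a 50% chance of at least one duplicate")
```

Dividing `n_half` by a population estimate gives a per-person figure of the same order of magnitude as the one quoted on the slide; the exact value depends on the population assumed.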
  11. http PURL UUID: http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3
  12. Identifier, Resolver, Location, Specimen. The resolver is a system to resolve locations from identifiers, enabling retrieval even when the location changes. http://purl.org/nhmuio/id/[UUID], http://gbif.no/resolver/[UUID]. Non-information object (http redirect): http 303 redirect.
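The resolver idea on this slide can be sketched with the Python standard library. This is an illustration under stated assumptions, not the resolver actually deployed at gbif.no: the lookup table `LOCATIONS`, the target URL on `example.org`, and the names `resolve` and `ResolverHandler` are all hypothetical. The identifier stays fixed; only the table entry changes when the record moves.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical lookup table: identifier -> current location of the record.
LOCATIONS = {
    "41d9cbb4-4590-4265-8079-ca44d46d27c3": "http://example.org/specimens/O-L-000014",
}


def resolve(path, locations=LOCATIONS, prefix="/resolver/"):
    """Return (status, headers) for a request path of the form /resolver/<UUID>."""
    if not path.startswith(prefix):
        return 404, {}
    identifier = path[len(prefix):].strip("/").lower()
    target = locations.get(identifier)
    if target is None:
        return 404, {}
    # 303 See Other: the identifier names a physical specimen, so the
    # client is pointed to a document about it rather than given a body.
    return 303, {"Location": target}


class ResolverHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around resolve(); usable with http.server.HTTPServer."""

    def do_GET(self):
        status, headers = resolve(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()


print(resolve("/resolver/41d9cbb4-4590-4265-8079-ca44d46d27c3"))
```

Keeping the lookup in a plain function separates the redirect logic from the web server, so the same table could sit behind a PURL service or any other front end.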
  13. http://purl.org/nhmuio/id/UUID → http://gbif.no/resolver/UUID; http://purl.org/gbifnorway/id/UUID → http://gbif.no/resolver/UUID
  14. Including machine-readable formats
  15. Catalog number: O-L-000014; http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3
  16. UUID QR codes for museum objects at NHM-UiO provide: Machine-readable identifiers (using a simple smartphone or a barcode reader). New and efficient workflows for collection management. Deployment of stable identifiers appropriate for data-basing.
  17. http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3 (machine friendly). Catalog number: O-L-000014 (human friendly). Efficient workflow routines.
  18. http://gbif.no/transcribe/
  19. Some key challenges for the group work
     - Many of the original source datasets indexed by GBIF are regularly updated and re-indexed by the GBIF portal. Without stable and persistent identifiers, information on the same herbarium specimen (or species observation) is sometimes included more than once, leading to duplicated information: duplicated in the sense of more than one (unlinked) data record for the same Real World entity.
     - Without stable and persistent identifiers for herbarium specimens (and species observations) it is difficult to link the same data record indexed at different re-indexing cycles of the GBIF portal. When a data record previously indexed is not re-identified in a new version of a given dataset, then the record is deleted from the portal, and the link to previous versions of this data record is lost.
     - A composite key identifier (such as the Darwin Core triplet) based on a combination of the metadata attributes for institution code (dwc:institutionCode), collection code (dwc:collectionCode), and the local specimen identifier (dwc:catalogNumber) is generally used as the specimen identifier in GBIF. However, all three metadata attributes can (and do) sometimes change.
     - What could be a best practice guideline for identifier resolution? Is it useful to define and agree on a (set of) common and well-defined response format(s)? Is it useful to provide recommendations for a set of metadata profiles with a clear set of defined metadata attributes? Or would more general principles and more open recommendations be more likely to stand the test of time and remain relevant with the emergence of new information infrastructure technologies?
     - Challenges, pros and cons of reusing object identifiers and metadata attribute terms declared by others without full control of how these objects and terms are maintained. Objects and concepts declared for a particular purpose will often not match exactly the needs of another purpose. How to optimally reuse each other's OWL ontologies, metadata vocabularies and data object models?
     - Should identifiers identify the Real World physical objects, the entities that the collection curators and users of the information care about? Or should the identifier be assigned to database records? Real World entities will not have a signature byte-sequence and will rely on interpretation of when an object is considered to be the same thing.
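The re-indexing problem described above can be made concrete with a small sketch. It compares two indexing cycles of the same record, first keyed on the Darwin Core triplet and then on a persistent identifier stored in `occurrenceID`. The record values (including the renamed collection code `"Lichens"`) and the function names are assumptions made for this example; they do not describe how the GBIF portal is implemented.

```python
def link_versions(old_records, new_records, key):
    """Match records between two indexing cycles using the given key function."""
    old_keys = {key(r) for r in old_records}
    new_keys = {key(r) for r in new_records}
    kept = sorted(old_keys & new_keys)      # re-identified in the new version
    deleted = sorted(old_keys - new_keys)   # no longer found, link to history lost
    added = sorted(new_keys - old_keys)     # looks like a brand-new record
    return kept, deleted, added


def triplet(record):
    """Composite key built from the Darwin Core triplet."""
    return (record["institutionCode"], record["collectionCode"], record["catalogNumber"])


def persistent_id(record):
    """Key built from a persistent identifier alone."""
    return record["occurrenceID"]


PURL = "http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3"
# Hypothetical record before and after the collection code is renamed.
cycle_1 = [{"institutionCode": "O", "collectionCode": "L",
            "catalogNumber": "O-L-000014", "occurrenceID": PURL}]
cycle_2 = [{"institutionCode": "O", "collectionCode": "Lichens",
            "catalogNumber": "O-L-000014", "occurrenceID": PURL}]

by_triplet = link_versions(cycle_1, cycle_2, triplet)
by_pid = link_versions(cycle_1, cycle_2, persistent_id)
print(by_triplet)
print(by_pid)
```

With the triplet as key, the renamed collection code makes the specimen appear as one deleted and one new record; with the persistent identifier the two versions stay linked.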
  20. [email protected]; Dag Endresen, [email protected]; Christian Svindseth, [email protected]. Cartoon: Gary Larson, 1987. Workshop in Oslo, 26th Aug.