Triplifier talk

John Deck, University of California, Berkeley
Brian Stucky, University of Colorado, Boulder
Lukasz Ziemba, University of Florida, Gainesville
Nico Cellinese, University of Florida, Gainesville
Rob Guralnick, University of Colorado, Boulder

BiSciCol Team: Reed Beaman, Nico Cellinese, Jonathan Coddington, Neil Davies, John Deck, Rob Guralnick, Bryan P. Heidorn, Chris Meyer, Tom Orrell, Rich Pyle, Kate Rachwal, Brian Stucky, Rob Whitton, Lukasz Ziemba

Data Curation and Biodiversity Research -- Lessons from BiSciCol and a look at the “Triplifier Simplifier”

description

Connecting content with a tool that converts database and spreadsheet data into a form usable on the semantic web.

Transcript of Triplifier talk

Page 1: Triplifier talk

Data Curation and Biodiversity Research -- Lessons from BiSciCol and a look at the “Triplifier Simplifier”

John Deck, University of California, Berkeley

Brian Stucky, University of Colorado, Boulder

Lukasz Ziemba, University of Florida, Gainesville

Nico Cellinese, University of Florida, Gainesville

Rob Guralnick, University of Colorado, Boulder

BiSciCol Team: Reed Beaman, Nico Cellinese, Jonathan Coddington, Neil Davies, John Deck, Rob Guralnick, Bryan P. Heidorn, Chris Meyer, Tom Orrell, Rich Pyle, Kate Rachwal, Brian Stucky, Rob Whitton, Lukasz Ziemba

Page 2: Triplifier talk

• BiSciCol is funded by the National Science Foundation, 2010 – 2014

• Infrastructure to tag & track specimens & derivatives in cyberspace

• Relies on globally unique identifiers (GUIDs) to track objects

• Implements a Linked Data approach

• Provides support for the Global Names Architecture

Page 3: Triplifier talk

A Biological Relationship Graph …

[Figure: graph linking Specimens, Tissues, and Sequences, with a Taxonomic Type Filter and a Class Filter]

Page 4: Triplifier talk

Why Linked Data? Why BiSciCol?

(Prefers to collect stuff)

Generates Lots of Data…

Here is Gustav’s Problem

Page 5: Triplifier talk

Biodiversity Data Challenges

• Data is Distributed

• Rapidly Changing Technologies

• Covers Multiple Domains

Page 6: Triplifier talk

Solving Biodiversity Data Challenges with BiSciCol and Linked Data

• Assign identifiers.

• Group data into classes. (Example label on the slide: Is a dwc:Event)

• Link identifiers.

• Publish.

Data sources shown on the slide:

[ ] Ocean Sampling Day

[X] Moorea Biocode

[X] SI MSNGR System

[+] Add My Data

Page 7: Triplifier talk

The Triplifier (Advanced Interface)

• Loading Data

• Naming and Identifying Objects

• Linking Objects

• Publishing

Powered by:

Page 8: Triplifier talk

Advanced Interface: Loading Data

• MySQL

• Darwin Core Archive

• KEMU

• Spreadsheets

Page 9: Triplifier talk

Advanced Interface: Entities

Ceusters W, Smith B. Strategies for Referent Tracking in Electronic Health Records. J Biomed Inform. 2006 Jun;39(3):362-78.


From Gary Larson, adapted by Barry Smith in his Referent Tracking presentation at the Semantics of Biodiversity Workshop, 2012.

Result is identifiers assigned to Entities:

78 a door .

427 a cat .

<http://biocode.berkeley.edu/collectorspecimens/BMOO_2665> a <dwc:Occurrence> .

<http://biocode.berkeley.edu/collectorevents/MIB_25> a <dwc:Event> .

Tissue
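The entity step above can be sketched in a few lines of Python. This is a minimal illustration, not the Triplifier's actual implementation: the spreadsheet content and its column names (`specimen_id`, `event_id`) are hypothetical, while the URI prefixes and the `<dwc:Occurrence>` / `<dwc:Event>` notation are copied from the example statements on the slide.

```python
import csv
import io

# Hypothetical spreadsheet export; the column names are assumptions.
SPECIMEN_CSV = """specimen_id,event_id
BMOO_2665,MIB_25
BMOO_2667,MIB_37
"""

# URI prefixes copied from the example statements on the slide.
SPECIMEN_BASE = "http://biocode.berkeley.edu/collectorspecimens/"
EVENT_BASE = "http://biocode.berkeley.edu/collectorevents/"


def type_triples(csv_text):
    """Return one 'a <class>' statement per distinct entity in the rows."""
    statements = []
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for base, column, cls in (
            (SPECIMEN_BASE, "specimen_id", "dwc:Occurrence"),
            (EVENT_BASE, "event_id", "dwc:Event"),
        ):
            uri = base + row[column]
            if uri not in seen:  # emit each entity only once
                seen.add(uri)
                statements.append(f"<{uri}> a <{cls}> .")
    return statements


if __name__ == "__main__":
    for line in type_triples(SPECIMEN_CSV):
        print(line)
```

Each local identifier is turned into a URI by prepending a prefix, and each resulting entity is assigned to a class, mirroring the two statements shown on the slide.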

Page 10: Triplifier talk

Advanced Interface: Entity Relations

Relations as Triples:

<http://biocode.berkeley.edu/collectorevents/MIB_25> <ma:isSourceOf> <http://biocode.berkeley.edu/collectorspecimens/BMOO_2665> .

<http://biocode.berkeley.edu/collectorevents/MIB_37> <ma:isSourceOf> <http://biocode.berkeley.edu/collectorspecimens/BMOO_2667> .

<http://biocode.berkeley.edu/collectorspecimens/BMOO_2665> <ma:isSourceOf> <http://biocode.berkeley.edu/plate_well/Plate_M037F10> .

<http://biocode.berkeley.edu/collectorspecimens/BMOO_2667> <ma:isSourceOf> <http://biocode.berkeley.edu/plate_well/Plate_M028G5> .
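As a rough sketch of how such relations can be used once they exist, the Python below stores the four `ma:isSourceOf` links from the slide as subject/object pairs and follows them forward from a collecting event to everything derived from it. The function names `to_triple` and `descendants` are hypothetical helpers for illustration and are not part of the Triplifier or BiSciCol.

```python
from collections import defaultdict

BASE = "http://biocode.berkeley.edu/"

# (subject, object) pairs for ma:isSourceOf, copied from the slide.
IS_SOURCE_OF = [
    (BASE + "collectorevents/MIB_25", BASE + "collectorspecimens/BMOO_2665"),
    (BASE + "collectorevents/MIB_37", BASE + "collectorspecimens/BMOO_2667"),
    (BASE + "collectorspecimens/BMOO_2665", BASE + "plate_well/Plate_M037F10"),
    (BASE + "collectorspecimens/BMOO_2667", BASE + "plate_well/Plate_M028G5"),
]


def to_triple(subject, obj):
    """Format one relation in the notation used on the slide."""
    return f"<{subject}> <ma:isSourceOf> <{obj}> ."


def descendants(start, edges):
    """Follow isSourceOf links forward from start, in link direction only."""
    children = defaultdict(list)
    for subject, obj in edges:
        children[subject].append(obj)
    found, stack = [], [start]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child not in found:
                found.append(child)
                stack.append(child)
    return found


if __name__ == "__main__":
    for subject, obj in IS_SOURCE_OF:
        print(to_triple(subject, obj))
    # Everything derived from collecting event MIB_25:
    for uri in descendants(BASE + "collectorevents/MIB_25", IS_SOURCE_OF):
        print(uri)
```

Starting from event MIB_25, the traversal reaches specimen BMOO_2665 and then plate well Plate_M037F10, and it never crosses into the MIB_37 chain because links are followed only in their stated direction.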

Page 11: Triplifier talk

Triplify!: View graph-based data

Query

Response

Page 12: Triplifier talk

The Triplifier (Simple Interface)

Publish

Page 13: Triplifier talk

What challenges are we facing now?

(for BiSciCol, Linked Data, and data integration in general)

Page 14: Triplifier talk

Identifier Issues

Persistence

Solutions:

• DOIs (http://doi.org/)

• EZIDs (http://ezid.net/)

Assignment at the source is difficult

The digestible RFID tag

Solutions:

• Calculated namespaces (e.g. geo:lat,lng) via PDAs

• UUIDs (randomly unique)

Semantic web requires URIs but many standards (including Darwin Core) do not require URIs for identifiers

URI = scheme : string

Solution:

• Promote use of URIs for identifiers in all Standards.
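Two of the ideas on this slide, minting UUIDs at the source and checking whether an identifier is a URI at all, can be illustrated with Python's standard library. This is a sketch under stated assumptions: the helper names are hypothetical, the coordinates are illustrative values, and `has_uri_scheme` is only a rough "scheme : string" test, not a full URI validator.

```python
import uuid
from urllib.parse import urlparse


def new_uuid_urn():
    """Mint a random (version 4) UUID and express it as a URN, which is a URI."""
    return uuid.uuid4().urn


def geo_identifier(lat, lng):
    """Build a calculated identifier in the geo:lat,lng style from the slide."""
    return f"geo:{lat},{lng}"


def has_uri_scheme(identifier):
    """Rough check: does the identifier begin with a 'scheme:' part?"""
    return bool(urlparse(identifier).scheme)


if __name__ == "__main__":
    candidates = [
        "BMOO_2665",  # bare local identifier, no scheme
        "http://biocode.berkeley.edu/collectorspecimens/BMOO_2665",
        new_uuid_urn(),
        geo_identifier(-17.5, -149.8),  # illustrative coordinates
    ]
    for candidate in candidates:
        print(candidate, has_uri_scheme(candidate))
```

A bare local identifier such as BMOO_2665 fails the check, which is the situation the slide describes for standards that accept plain strings as identifiers.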

Page 15: Triplifier talk

Classification Issues

Confusion between representational units

“Sample, Specimen, Individual, Aggregation”

Inadequate representational units

“Occurrence”

Solutions:

• Continue working on clarity in term definitions

• Work from upper level ontologies (e.g. Basic Formal Ontology) to derive definitions.

Page 16: Triplifier talk

Relation Issues

Nonsensical conclusions are possible!

Solution:

• Apply directional links only where appropriate.

Page 17: Triplifier talk

Adoption Issues

Critical mass required for effective utilization

Reality is complicated

Solutions:

• Work collaboratively (e.g. BioPortal, hackathons, interdisciplinary workshops)

Solutions:

• Work with aggregators (GBIF, VertNet, NCBI).

• View Triples as a publishable unit

Page 18: Triplifier talk

The BiSciCol Mission

• BiSciCol tackles biodiversity data challenges:

  • Tracking and integration of objects across disciplines

  • Linking derivatives back to their source

• BiSciCol is about community, collaborative practice

• Commitment to standards, ontologies

• Agreement on permanent, resolvable identifiers

• Triplification of data sources to enhance linked data

http://biscicol.blogspot.com/

http://biscicol.org