Prizms for Data Publication and Management Katie Chastain May 9, 2014.
Prizms for Data Publication and Management
Katie Chastain
May 9, 2014
The Goal
• Allow a scientist…
– to convert data from comma-separated value (CSV) format to a linked data format
– to record provenance information about the original dataset and the conversion transformation
– to receive RDF immediately, for use in visualizations or other semantic applications
– to have a repository with appropriate access for collaborators
The Value of Linked Data
• Data is converted and annotated using community-standard vocabularies (ontologies).
• Allows queries in SPARQL across any or all data sets
• Provides a way to make metadata machine-readable
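To make the SPARQL point concrete, here is a minimal sketch of a query that spans every dataset in a store, packaged as a SPARQL Protocol GET request. The endpoint URL is hypothetical, and the query assumes each dataset is loaded into its own named graph, which the slides do not state.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the SPARQL endpoint of your own installation.
ENDPOINT = "http://example.org/sparql"

# Count the triples in each named graph, so a single query covers
# every dataset in the store (assuming one named graph per dataset).
QUERY = """
SELECT ?g (COUNT(*) AS ?triples)
WHERE { GRAPH ?g { ?s ?p ?o } }
GROUP BY ?g
ORDER BY DESC(?triples)
"""

def build_request_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})

url = build_request_url(ENDPOINT, QUERY)
print(url)
```

The URL can then be fetched with any HTTP client; the sketch stops short of the network call so that it runs without a live endpoint.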
Revealing Implicit Information
[Diagram: a Measurement node with three properties: ofCharacteristic (to a characteristic), hasValue (to a double), and hasUnit (to a unit)]
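The Measurement pattern in the diagram can be sketched in a few lines. The property names ofCharacteristic, hasValue and hasUnit come from the slide; the namespaces, the column name "temp_C" and the helper function are hypothetical, chosen only for illustration.

```python
# Hypothetical namespaces, for illustration only.
EX = "http://example.org/vocab/"
DATA = "http://example.org/data/"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def measurement_triples(row_id, characteristic, value, unit):
    """Return N-Triples lines describing one measurement."""
    m = f"<{DATA}measurement/{row_id}>"
    return [
        f"{m} <{RDF_TYPE}> <{EX}Measurement> .",
        f"{m} <{EX}ofCharacteristic> <{EX}{characteristic}> .",
        f'{m} <{EX}hasValue> "{float(value)}"^^<{XSD_DOUBLE}> .',
        f"{m} <{EX}hasUnit> <{EX}{unit}> .",
    ]

# A CSV column header such as "temp_C" only implies "temperature, in
# degrees Celsius"; the triples below state both facts explicitly.
for line in measurement_triples(1, "Temperature", "21.5", "DegreeCelsius"):
    print(line)
```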
Tabular Data to Linked Data
• CSV2RDF4LOD is a powerful tool for converting tabular data into linked open data, able to represent many complex relationships between data fields
– Command-line based
– Requires prior knowledge of RDF
• Semantic Annotator provides a graphical user interface for this converter, building up conversion parameters in the background.
– Browser-based
– Still in development, currently not as powerful
Tabular Data to Linked Data
[Diagram: Data and Enhancement Parameters are combined to produce Linked Data (the converted data)]
• CSV2RDF4LOD and Semantic Annotator enable creation of enhancement parameters for conversion
• These parameters can also be applied to other datasets with similar structure
• Provide provenance about how data conversion was performed
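The reuse of enhancement parameters can be illustrated with a small sketch. It stands in a plain Python dict for the parameters; this is not the parameter format CSV2RDF4LOD actually uses, and the column names, URIs and sample data are all hypothetical.

```python
import csv
import io

# Hypothetical parameters: column name -> (predicate URI, value converter).
PARAMS = {
    "site": ("http://example.org/vocab/site", str),
    "depth_m": ("http://example.org/vocab/depthInMetres", float),
}

def convert(csv_text, dataset_id, params):
    """Apply the same parameters to any CSV text with matching columns."""
    triples = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        subject = f"http://example.org/data/{dataset_id}/row/{i}"
        for column, (predicate, cast) in params.items():
            triples.append((subject, predicate, cast(row[column])))
    return triples

# Two datasets with the same structure share one set of parameters.
survey_2013 = "site,depth_m\nA,1.5\nB,2.0\n"
survey_2014 = "site,depth_m\nC,3.25\n"

for name, text in [("survey-2013", survey_2013), ("survey-2014", survey_2014)]:
    for triple in convert(text, name, PARAMS):
        print(triple)
```

Because the parameters are kept separate from the data, they also document how each value was converted, which is the provenance role the slide describes.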
Prizms Overview
• By Tim Lebo from the TWC
• A pipeline of tools for taking data from CSV to linked open data ready for visualizations.
– CSV2RDF4LOD and Semantic Annotator for conversion
– DataFAQs for quality management
– LODSPeaKr by Alvaro Graves for data publication
• Organizes data with a Source-Dataset-Version schema
• Captures provenance information at each step of data processing
Prizms on GitHub: https://github.com/timrdf/prizms/
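One way to picture the Source-Dataset-Version schema and step-by-step provenance is the sketch below. The URI layout, identifiers and file URL are assumptions made for illustration, not necessarily what Prizms generates; prov:wasDerivedFrom is a term from the W3C PROV-O vocabulary, and whether Prizms records provenance with exactly this property is likewise an assumption.

```python
PROV_WAS_DERIVED_FROM = "http://www.w3.org/ns/prov#wasDerivedFrom"

def version_uri(base, source, dataset, version):
    """Build a URI that nests a version under its dataset and source."""
    return f"{base}/source/{source}/dataset/{dataset}/version/{version}"

def derivation_triple(derived_uri, original_uri):
    """Return an N-Triples line recording that one resource was derived from another."""
    return f"<{derived_uri}> <{PROV_WAS_DERIVED_FROM}> <{original_uri}> ."

# Hypothetical identifiers for one version of one dataset from one source.
uri = version_uri("http://example.org", "my-lab", "water-quality", "2014-May-09")
print(uri)
print(derivation_triple(uri, "http://example.org/files/water-quality.csv"))
```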
Prizms Overview
[Pipeline diagram. Labels: Data, E Params, Conversion + Annotation, Linked Data, Triple Store, Data Repository, SPARQL Endpoint, Presentation + Publication (LODSPeaKr, SADI web services, etc.)]
Hosting Data on LODSPeaKr
• Quick statistics for each dataset
• SPARQL endpoint
• Ability to plug in to other semantic web services, such as SADI
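The slides do not say which statistics are shown for each dataset. As an illustrative guess, the sketch below computes a few common ones (triple count, distinct subjects, predicate usage) over an in-memory list of triples; the function name and sample data are hypothetical.

```python
from collections import Counter

def quick_stats(triples):
    """Summarise a list of (subject, predicate, object) tuples."""
    triples = list(triples)
    return {
        "triples": len(triples),
        "subjects": len({s for s, _, _ in triples}),
        "predicates": Counter(p for _, p, _ in triples),
    }

# Hypothetical sample data using abbreviated names.
sample = [
    ("ex:m1", "ex:hasValue", 21.5),
    ("ex:m1", "ex:hasUnit", "ex:DegreeCelsius"),
    ("ex:m2", "ex:hasValue", 19.0),
]
stats = quick_stats(sample)
print(stats["triples"], stats["subjects"], dict(stats["predicates"]))
```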