Post on 07-May-2015
Linked Data: Opportunities for Entrepreneurs
Dr. David Wood
david@3roundstones.com
@prototypo
12 March 2013
David Wood
B.S. Mechanical Engineering
B.S. Electrical Engineering (equivalency)
M.S. Astronautical Engineering
Aeronautical & Astronautical Engineer
Ph.D. Software Engineering
David Wood
ongoing
ongoing
company founded products disposition
2002
2005
@𝛑Plugged In Software
David Wood
RDF Database
RDF Database Management
RDF Usage ongoing
Linked Data Management
ongoing
“more anterior sectors of the prefrontal cortex are distinctively recruited when altruistic choices prevail over selfish material interests”
- Jorge Moll et al.
“For it is in giving that we receive.”
- Saint Francis of Assisi
Consistently late to rapidly changing markets (music, electronics, cafés, e-books)
Pop Quiz
Innovator's Dilemma
May 2001
08 Oct 2007 07 Nov 2007 10 Nov 2007 28 Feb 2008 31 Mar 2008
18 Sep 2008 05 Mar 2009 27 Mar 2009 14 Jul 2009 22 Sep 2010
Sep 2011
We’ve Seen This Before
YouTube            HDTV
Watch videos       Watch better videos
Publish videos
Share videos
Rate videos
Discuss videos
Linked Data        RDBMS
Use data           Use data
Publish data
Share data
Rate data
Discuss data
CONTENT MANAGEMENT SYSTEM
LINKED DATA MANAGEMENT SYSTEM
Callimachus
UNSTRUCTURED TEXT
STRUCTURED DATA
Publishing
Credit: Bradley P. Allen, Elsevier Labs
XHTML 5 ✔
DocBook 5 ✔
ePub 3 ✔
LaTeX ✔
Open Government
US EPA
• Cloud-based Linked Data provision of 3 core programs:
• 2.9M facilities
• 100K substances
• 25 years of toxic pollution reports
• FISMA compliant
• 16 Callimachus templates
• Official launch Feb 2013
From Wikipedia
From EPA
Open Street Map
Life Sciences
HTTP-accessible endpoints capable of returning XML or textual content
Convert XML or textual results to RDF
Render RDF to HTML via template
User resolves a single URI to an Active PURL
Multiple targets queried independently
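The idea of querying multiple targets independently can be illustrated with a small Python sketch. This is not the Callimachus or Active PURL implementation; `query_targets`, `fake_fetch` and the example.org URLs are hypothetical names chosen for illustration, and `fetch` is assumed to enforce its own timeout.

```python
from concurrent.futures import ThreadPoolExecutor


def query_targets(targets, fetch):
    """Query every target on its own worker thread.

    A failure in one target is recorded in `errors` and does not
    prevent the other targets from returning their results.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {name: pool.submit(fetch, url) for name, url in targets.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:  # e.g. timeout, auth error, network failure
                errors[name] = exc
    return results, errors


def fake_fetch(url):
    """Hypothetical stand-in for an HTTP request to one target."""
    if "down" in url:
        raise ConnectionError("endpoint unreachable")
    return f"<study source='{url}'/>"


targets = {
    "internal": "http://example.org/internal/study/123",
    "registry": "http://example.org/down/study/123",
}
results, errors = query_targets(targets, fake_fetch)
```

Here the unreachable target ends up in `errors` while the reachable one is still returned in `results`, so a caller could render a partial page rather than fail outright.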
David Wood¹ and Tom Plasterer²
¹ david@3roundstones.com, ² Tom.Plasterer@astrazeneca.com
Active PURLs for Clinical Study Aggregation
The problem: No coordinated view of clinical study information. Information is distributed across departments, subsidiaries and government data sources.
The solution: Gather, convert, aggregate and format for display
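The gather, convert, aggregate and format steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the system's actual code: triples are plain tuples with bare XML tag names as predicates (real RDF would use IRIs and an RDF library), the two source documents are invented, and `STUDY`, `xml_to_triples`, `aggregate` and `render_html` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from html import escape

STUDY = "http://example.org/study/123"  # hypothetical study identifier


def xml_to_triples(subject, xml_text):
    """Convert: turn each child element of the XML root into one (s, p, o) triple."""
    root = ET.fromstring(xml_text)
    return {(subject, child.tag, (child.text or "").strip()) for child in root}


def aggregate(triple_sets):
    """Aggregate: merge per-source triples into a single temporary graph (a set)."""
    graph = set()
    for triples in triple_sets:
        graph |= triples
    return graph


def render_html(subject, graph):
    """Format: render the merged graph as a simple HTML description list."""
    rows = "".join(
        f"<dt>{escape(p)}</dt><dd>{escape(o)}</dd>"
        for s, p, o in sorted(graph)
        if s == subject
    )
    return f"<h1>{escape(subject)}</h1><dl>{rows}</dl>"


# Gather: two invented responses standing in for separate information sources.
sources = [
    "<study><title>Example trial</title><phase>II</phase></study>",
    "<record><sponsor>Example sponsor</sponsor></record>",
]
graph = aggregate(xml_to_triples(STUDY, text) for text in sources)
page = render_html(STUDY, graph)
```

Because every source is reduced to triples about the same subject, merging is a plain set union and the display step never needs to know which source a fact came from.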
3 Round Stones and AstraZeneca created a system to allow coordinated views of distributed clinical trial information. The system extended the Callimachus Project, an Open Source management system for Linked Data. Persistent URLs, or PURLs, were used to provide globally unique and resolvable identifiers for each clinical study. The PURL concept was extended to enable PURLs to have multiple targets and for the results of each target to undergo arbitrary transformation. PURLs which have such capabilities are called Active PURLs. Information sources relevant to clinical studies were identified, regardless of whether their location was internal or external to the pharmaceutical company's network. Active PURLs were used to resolve data sources having HTTP endpoints capable of returning XML or textual results. Each information source is dynamically transformed into Resource Description Framework (RDF) formats and all sources' results are then merged into a single, temporary graph of RDF data. Information is rendered to end users as coordinated HTML descriptions of each clinical trial using the Callimachus template engine. Machine-readable versions of the data are also available.
How semantic technologies help
Linked Data techniques can help both to address the availability of clinical trial information and to provide a means to build effective information systems using it. Linked Data techniques allow for "cooperation without coordination". Publishers of data provide context for use by third parties in other portions of a distributed enterprise. Users of Linked Data can combine information from multiple sources. Subsequent publication can create a virtuous circle of positive feedback, allowing researchers, informaticists and support staff to collaboratively and distributively build a reusable knowledge base.
Challenges
Distributed queries have many known limitations, such as the introduction of multiple single points of failure in any given PURL resolution. HTTP timeouts, authentication or authorization errors, or other network failures can slow or stop a pipeline from returning correctly. Similarly, distributed queries can result in variable query-time performance due to complex network and endpoint performance variances.
Proactive caching and cache management strategies can improve runtime performance and protect end users from the limitations inherent in a distributed query architecture. Caching of intermediate results from endpoints has not yet been implemented.
Next steps
We intend to continue to address
References
1. Callimachus Project,
User experience
1. Users resolve a URL that provides a unique identifier for a clinical study, drug, chemical or other concept managed by this system. The user may be presented with the URL on HTML pages, search for it via full-text techniques or discover it via semantic search.
2. Users are presented with a dynamically generated Web page representing aggregated clinical study information. Users are isolated from the complex and distributed information environment.
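The poster notes that caching of intermediate endpoint results had not yet been implemented. As one possible shape for it, here is a minimal sketch of a lazy (on-demand, not proactive) time-to-live cache that falls back to a stale entry when an endpoint fails. The class `TTLCache`, the function `demo_fetch` and the example.org URL are hypothetical; nothing here comes from Callimachus.

```python
import time


class TTLCache:
    """Cache endpoint results for `ttl_seconds`; serve stale data if a refresh fails."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the demo can control time
        self._entries = {}  # url -> (expiry_time, value)

    def get_or_fetch(self, url, fetch):
        now = self.clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no network call
        try:
            value = fetch(url)
        except Exception:
            if entry is not None:
                return entry[1]  # endpoint failed: fall back to the stale value
            raise
        self._entries[url] = (now + self.ttl, value)
        return value


calls = []


def demo_fetch(url):
    """Hypothetical stand-in for an HTTP request; counts how often it is called."""
    calls.append(url)
    return f"response {len(calls)}"


fake_now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
url = "http://example.org/endpoint/study/123"
first = cache.get_or_fetch(url, demo_fetch)   # fetched from the endpoint
second = cache.get_or_fetch(url, demo_fetch)  # served from the cache
fake_now[0] = 61.0
third = cache.get_or_fetch(url, demo_fetch)   # expired, so fetched again
```

Serving stale data on failure trades freshness for availability, which fits the goal of isolating users from endpoint outages; whether that trade is acceptable for clinical study data is a design decision the source does not address.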
• Linked Data warehouses: 10B USD annually
• Linked Data supply chains: 205M USD annually (Web analytics), 6B USD annually (enterprise)
• Linked Data analytics: 16B USD annually
Your Opportunity?
Credits
Batman Treaty Signing
(public domain)
http://upload.wikimedia.org/wikipedia/commons/d/dc/Batman_signs_treaty_artist_impression.jpg
Centro Universitario de Ciencias Exactas e Ingenierías, Universidad de Guadalajara
(public domain)
http://proton.ucting.udg.mx/galeria/3D/WEB.jpg
Spreadsheet Photo
Casey Serin
(CC-BY licensed)
http://www.flickr.com/photos/sercasey/351617208/sizes/l/in/photostream/
LOD Cloud Diagrams
Richard Cyganiak, Anja Jentzsch
(CC-BY-SA)
http://lod-cloud.net/
Earth weather analysis image
NASA Goddard SFC
(CC-BY)
http://www.flickr.com/photos/gsfc/4662884851/
Publisher emerging content architecture
Copyright (c) 2011 Elsevier, used with permission.
Corporate logos, Darkon Movie Poster, BBC screenshots, CAMC credit card image and book covers © their respective owners and used under Fair Use for educational purposes
Credits
Mundaneum images Copyright © Collection Mundaneum - Mons, Courtesy of the Mundaneum Archives Centre.
Chasm Photo
Travis S.
(CC-BY-NC licensed)
http://www.flickr.com/photos/baggis/3860802929/
Supply Chain Image
Kevin Krejci
(CC-BY licensed)
http://www.flickr.com/photos/kevinkrejci/6141829763/
Sharing Squirrels Image
leezie5
(CC-BY-NC-ND licensed)
http://www.flickr.com/photos/leeziet/5912219625/
Envirofacts screenshot A US Government Work of the US EPA. Used with permission.
Linked Data book cover Copyright (c) 2012-13 Manning Publications Inc. Used with permission.
All other photos and drawings © 2010-13 3 Round Stones Inc or David Wood, released under a CC-BY-SA license
This work is Copyright © 2011 3 Round Stones Inc.
It is licensed under the Creative Commons Attribution 3.0 Unported License
Full details at: http://creativecommons.org/licenses/by/3.0/
You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work
Under the following conditions:
Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.
Linked Data: Opportunities for Entrepreneurs
Dr. David Wood
david@3roundstones.com
@prototypo
12 March 2013
http://purl.org/net/prototypo/lod-entrepreneur