The Power of Linked Data for Government & Healthcare Information Integration

Posted on 07-May-2015

Government open data strategies aimed at wider access and re-use by entrepreneurs, publishers and the wider US healthcare delivery industry. Presentation to the OMG Standards Community technical workshop on semantics, held in Reston, VA on 20 March 2013. Presentation by Bernadette Hyland, CEO of 3 Round Stones, Inc. and co-chair of the W3C Government Linked Data Working Group.

Transcript of The Power of Linked Data for Government & Healthcare Information Integration

The Power of Linked Data for Government and Healthcare Information Integration

By Bernadette Hyland, CEO, 3 Round Stones; co-chair, W3C Gov’t Linked Data WG

This presentation is available at http://slideshare.net/3roundstones

OMG Technical Meeting Special Event, Reston VA, 20-Mar-2013

Wednesday, March 20, 2013

Agenda

• Government data publication on the Web
• Update on EPA Linked Data Service
• Healthcare Delivery Industry’s Appetite
• Update on W3C Government Linked Data Working Group

3 Round Stones produces the leading platform for the publication of reusable data on the Web. Our commercially supported Open Source platform is used by the Fortune 2000 and US Government agencies to collect, publish and reuse data, both on the public Internet and behind institutional firewalls.

US EPA Linked Data

• Cloud-based Linked Data provision of 3 core programs:
  • 2.9M facilities
  • 100K substances
  • 25 years of toxic pollution reports
• FISMA compliant
• 16 Callimachus templates
• Official launch April 2013

US GPO

• Cloud-based Linked Data provision of persistent URLs for US Government documents:
  • 100k+ documents
  • Used by 1,240 Federal Depository Libraries and the public
• In 3rd year of operation
• Deemed an “Essential service” supporting US Congress

• Big Data
• Simple data
• Complex data
• Legacy data

Open Government Data

“We’re moving from managing documents to managing discrete pieces of open data and content which can be tagged, shared, secured, mashed up and presented in the way that is most useful for the consumer of that information.”

-- Report on Digital Government: Building a 21st Century Platform to Better Serve the American People

Growing chorus ...

Governments

Goals: Governmental transparency and/or improved internal efficiencies (data warehouses)

Open data + open standards + open platforms

Highly scalable computing on the Cloud

Open Web Standards

5 Star Data (Linked Data), whenever possible

Leverage Open Source tools where practical

Use a non-proprietary format

• Open Web data exchange formats
  • RDF instead of CSV
• Benefits
  • Accessibility, interoperability & re-use
  • Reduces the risks of:
    • “Super model” data warehouse approach
    • Budget & schedule overruns
    • Confidential info leakage
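To make the “RDF instead of CSV” point concrete, here is a minimal Python sketch, not taken from the presentation, that turns CSV rows into RDF statements in N-Triples form. The example.org namespaces, the column names and the sample row are hypothetical. The point it illustrates is that the ID column becomes a URI that others can link to, while each remaining column becomes a property of that URI.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespaces for illustration only; a real publisher would
# mint its own persistent URIs for things and properties.
BASE = "http://example.org/facility/"
VOCAB = "http://example.org/vocab#"


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str, id_column: str) -> list[str]:
    """Convert each CSV row into N-Triples about one URI-named resource.

    The ID column becomes part of the subject URI; every other non-empty
    column becomes a property of that subject.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the ID is already in the URI; skip empty cells
            predicate = f"<{VOCAB}{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Hypothetical sample row, not real EPA data.
sample = "id,name,city\n110000123,Example Plant,Springfield\n"
for line in csv_to_ntriples(sample, "id"):
    print(line)
```

A production converter would normally use an RDF library and a curated vocabulary rather than deriving property URIs directly from column headers.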

Universal Identifiers

• It’s the foundation of the Web
• Others can reference things
• Two references with the same URI are the same thing
• Quick, easy and scalable
• People keep coming back for more!
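The claim that two references with the same URI are the same thing can be illustrated with a small sketch: merging RDF graphs amounts to a set union of triples, so statements made independently about one URI end up describing one node. The example.org URIs and the chemicalSymbol property are hypothetical; rdfs:label is the standard RDF Schema property.

```python
from collections import defaultdict

Triple = tuple[str, str, str]

# Hypothetical URIs for illustration; only rdfs:label is a standard term.
MERCURY = "http://example.org/substance/mercury"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SYMBOL = "http://example.org/vocab#chemicalSymbol"


def merge(*graphs: set[Triple]) -> set[Triple]:
    """Union of graphs: identical triples collapse, and statements that use
    the same subject URI end up describing the same node."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph: set[Triple], subject: str) -> dict[str, set[str]]:
    """Collect every property value stated about one URI, from any source."""
    props: dict[str, set[str]] = defaultdict(set)
    for s, p, o in graph:
        if s == subject:
            props[p].add(o)
    return dict(props)


# Two independent publishers refer to the same URI without coordinating.
publisher_a = {(MERCURY, LABEL, "Mercury")}
publisher_b = {(MERCURY, SYMBOL, "Hg"), (MERCURY, LABEL, "Mercury")}

print(describe(merge(publisher_a, publisher_b), MERCURY))
```

No schema negotiation was needed: agreeing on the identifier was enough for the two sources to combine.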

HELPING DEFINE THE PROCESS

Identify → Model → Name → Describe → Convert → Publish → Maintain

A Path to Success

• Start with the basics
• Well-curated datasets with relevant data
• Integrate related datasets (e.g., EPA chemical substances, toxic releases & facilities)
• Reach out to developers early
• Emphasize the internal agency benefit
• Address data quality ...
• Multiple approaches, including crowdsourcing

Social responsibility of government publishers

• Must specify a license for use

• Publish frequency of data updates

• Ensure data is as accurate as possible

• Recognize responsibility to maintain data

• Document & follow a persistence strategy

• Respond to reports of problematic data
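One way a publisher can meet the first two obligations (license and update frequency) is to publish them as machine-readable metadata alongside the data. The sketch below is an assumption rather than anything prescribed in the presentation: it uses the DCMI Metadata Terms properties dcterms:license, dcterms:accrualPeriodicity and dcterms:modified together with the W3C DCAT class dcat:Dataset. The dataset and frequency URIs are hypothetical placeholders, and dataset_metadata is an illustrative helper name.

```python
# Standard vocabulary namespaces (DCMI Metadata Terms, W3C DCAT, RDF, XSD).
DCTERMS = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def dataset_metadata(dataset: str, license_uri: str,
                     frequency: str, modified: str) -> list[str]:
    """Return N-Triples stating a dataset's license, update frequency and
    last-modified date (an ISO 8601 date string such as "2013-03-20")."""
    return [
        f"<{dataset}> <{RDF_TYPE}> <{DCAT}Dataset> .",
        f"<{dataset}> <{DCTERMS}license> <{license_uri}> .",
        f"<{dataset}> <{DCTERMS}accrualPeriodicity> <{frequency}> .",
        f'<{dataset}> <{DCTERMS}modified> "{modified}"^^<{XSD_DATE}> .',
    ]


# Hypothetical dataset and frequency URIs; the license URI is the CC BY 3.0
# license named at the end of this presentation.
for line in dataset_metadata(
        "http://example.org/dataset/facilities",
        "http://creativecommons.org/licenses/by/3.0/",
        "http://example.org/frequency/monthly",
        "2013-03-20"):
    print(line)
```

In practice the frequency value would come from a published controlled vocabulary rather than an example.org URI.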

Callimachus

http://callimachusproject.org
http://3roundstones.com

Diagram: Content Management System (unstructured text); Linked Data Management System (structured data); Callimachus

Guidance for developers

Screenshot labels: From Wikipedia; From EPA; OpenStreetMap

We’ve Seen This Before

Diagram: User; NOAA; US EPA AirNow; DBpedia; National Library of Medicine; US EPA SunWise

How much mercury did Elisa’s local cement plant release in 2004?

Linked Data Approach

Finding Hanson Permanente

Finding Mercury Released in 2004

TRI Report
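The slides answer the mercury question in two hops: find the facility, then find its 2004 mercury report. The Python sketch below mimics those hops over an in-memory set of triples. The ex: vocabulary and the function names are hypothetical stand-ins, not the EPA's actual data model; a real deployment would more typically express the same pattern as a SPARQL query against the published Linked Data.

```python
Triple = tuple[str, str, str]

# Hypothetical vocabulary: these prefixed names are NOT the EPA's actual
# predicates, just stand-ins that make the link-following steps visible.
NAME = "ex:name"
FACILITY = "ex:facility"
SUBSTANCE = "ex:substance"
YEAR = "ex:reportingYear"
AMOUNT = "ex:releaseAmount"


def subjects(graph: set[Triple], predicate: str, obj: str) -> list[str]:
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return [s for s, p, o in graph if p == predicate and o == obj]


def objects(graph: set[Triple], subject: str, predicate: str) -> list[str]:
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def release_amounts(graph: set[Triple], facility_name: str,
                    substance: str, year: str) -> list[str]:
    """Answer "how much <substance> did <facility> release in <year>?"
    by following links rather than joining tables."""
    amounts: list[str] = []
    # Step 1: find the facility by its name.
    for facility in subjects(graph, NAME, facility_name):
        # Step 2: follow links from reports back to that facility.
        for report in subjects(graph, FACILITY, facility):
            # Step 3: keep only reports for the right substance and year.
            if (substance in objects(graph, report, SUBSTANCE)
                    and year in objects(graph, report, YEAR)):
                amounts.extend(objects(graph, report, AMOUNT))
    return amounts
```

Given a graph loaded from the relevant Linked Data, a call such as release_amounts(graph, "Hanson Permanente", "ex:mercury", "2004") would perform both lookups; the exact facility-name string and substance URI depend on how the real data is published.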

Data Reuse

Potential Audience

• Middle school student doing a science project

• Concerned citizen worried about local pollution

• Environmental Science PhD from EPA

• Doctor from NIH writing a research paper

Active PURLs for Clinical Study Aggregation

David Wood (1) and Tom Plasterer (2)
(1) david@3roundstones.com, (2) Tom.Plasterer@astrazeneca.com

The problem: There is no coordinated view of clinical study information. Information is distributed across departments, subsidiaries and government data sources.

The solution: Gather, convert, aggregate and format for display.

Diagram: User resolves a single URI to an Active PURL → Multiple targets queried independently (HTTP-accessible endpoints capable of returning XML or textual content) → Convert XML or textual results to RDF → Render RDF to HTML via template
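The Active PURL flow (one identifier, several independently queried endpoints, conversion to RDF, a merged temporary graph, HTML rendering) can be sketched in a few lines of Python. This is not the Callimachus implementation: the ActivePURL class, the xml_children_to_triples converter and render_html are hypothetical names, and the simple child-element-to-triple mapping is an assumption standing in for the arbitrary transformations the poster describes.

```python
import html
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterable

Triple = tuple[str, str, str]
Fetcher = Callable[[str], str]                 # URL -> raw XML or text
Converter = Callable[[str], Iterable[Triple]]  # raw result -> RDF triples


def http_fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch an HTTP endpoint that returns XML or text (UTF-8 assumed)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def xml_children_to_triples(subject: str) -> Converter:
    """Hypothetical converter: each child element <tag>text</tag> of the
    XML root becomes the triple (subject, tag, text)."""
    def convert(raw: str) -> Iterable[Triple]:
        root = ET.fromstring(raw)
        for child in root:
            yield (subject, child.tag, (child.text or "").strip())
    return convert


class ActivePURL:
    """A persistent identifier that resolves to several targets at once."""

    def __init__(self, purl: str, targets: list[tuple[str, Converter]]):
        self.purl = purl
        self.targets = targets  # (endpoint URL, converter) pairs

    def resolve(self, fetch: Fetcher = http_fetch) -> set[Triple]:
        """Query each target independently and merge the converted results
        into one temporary graph."""
        graph: set[Triple] = set()
        for url, convert in self.targets:
            graph.update(convert(fetch(url)))
        return graph


def render_html(subject: str, graph: set[Triple]) -> str:
    """Stand-in for a template engine: one table row per property."""
    rows = "".join(
        f"<tr><th>{html.escape(p)}</th><td>{html.escape(o)}</td></tr>"
        for s, p, o in sorted(graph)
        if s == subject
    )
    return f"<h1>{html.escape(subject)}</h1><table>{rows}</table>"
```

Because resolve takes the fetch function as a parameter, the same logic can be exercised against canned responses in tests and against live HTTP endpoints (via http_fetch) in production.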

3 Round Stones and AstraZeneca created a system to allow coordinated views of distributed clinical trial information. The system extended the Callimachus Project, an Open Source management system for Linked Data. Persistent URLs, or PURLs, were used to provide globally unique and resolvable identifiers for each clinical study. The PURL concept was extended to enable PURLs to have multiple targets and for the results of each target to undergo arbitrary transformation. PURLs with such capabilities are called Active PURLs. Information sources relevant to clinical studies were identified, regardless of whether they were located inside or outside the pharmaceutical company's network. Active PURLs were used to resolve data sources having HTTP endpoints capable of returning XML or textual results. Each information source is dynamically transformed into Resource Description Framework (RDF) formats, and all sources' results are then merged into a single, temporary graph of RDF data. Information is rendered to end users as coordinated HTML descriptions of each clinical trial using the Callimachus template engine. Machine-readable versions of the data are also available.

How semantic technologies help

Linked Data techniques can help both to address the availability of clinical trial information and to provide a means to build effective information systems using it. Linked Data techniques allow for "cooperation without coordination". Publishers of data provide context for use by third parties in other portions of a distributed enterprise. Users of Linked Data can combine information from multiple sources. Subsequent publication can create a virtuous circle of positive feedback, allowing researchers, informaticists and support staff to collaboratively and distributively build a reusable knowledge base.

Challenges

Distributed queries have many known limitations, such as the introduction of multiple single points of failure in any given PURL resolution. HTTP timeouts, authentication/authorization errors or other network failures can slow or stop a pipeline from returning correctly. Similarly, distributed queries can result in variable query-time performance due to variations in network and endpoint performance. Proactive caching and cache management strategies can improve runtime performance and protect end users from the limitations inherent in a distributed query architecture. Caching of intermediate results from endpoints has not yet been implemented.

Next steps

We intend to continue to address

References

1. Callimachus Project,

User experience

1. Users resolve a URL that provides a unique identifier for a clinical study, drug, chemical or other concept managed by this system. The user may be presented with the URL on HTML pages, search for it via full-text techniques or discover it via semantic search.

2. Users are presented with a dynamically generated Web page representing aggregated clinical study information. Users are isolated from the complex and distributed information environment.
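The Challenges section of the poster notes that caching of intermediate endpoint results had not yet been implemented. As one possible approach, not the authors' design, the sketch below wraps any fetch function (URL in, raw response out) with a time-to-live cache and falls back to a stale entry when an endpoint times out or fails. CachingFetcher, ttl_seconds and the stale-on-error policy are all assumptions.

```python
import time
from typing import Callable


class CachingFetcher:
    """Hypothetical time-to-live cache in front of a fetch function.

    Fresh entries are served without a network round trip. If a refetch
    fails with an OSError (urllib's URLError/HTTPError and TimeoutError
    are all OSError subclasses), a stale cached entry is returned instead
    of failing the whole pipeline; with no cached entry, the error is raised.
    """

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict[str, tuple[float, str]] = {}  # url -> (time, result)

    def __call__(self, url: str) -> str:
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh hit
        try:
            result = self._fetch(url)
        except OSError:
            if cached is not None:
                return cached[1]  # serve stale data rather than fail
            raise
        self._cache[url] = (now, result)
        return result
```

Serving stale data trades freshness for availability; whether that is acceptable for clinical study information is a policy decision, and a real deployment would also bound the cache size.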

http://slideshare.com/3roundstones

Twitter: @BernHyland
Email: bhyland@3roundstones.com

Thank you for participating!

Credits

David Newman, Gartner: “Innovation Insight: Linked Data Drives Innovation Through Information-Sharing Network Effects”, published 15 December 2011

David Wood, ed. Linking Government Data, Springer (2011) http://3roundstones.com/linking-government-data/

US Executive Branch, Digital Government Strategy: Building a 21st Century Platform to Better Serve the American People, http://www.whitehouse.gov/sites/default/files/omb/egov/digital-government/digital-government.html

W3C Linked Data Cookbook http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

All other photos and images © 2010-2012 3 Round Stones, Inc. and released under a CC-by-sa license

This work is Copyright © 2011-2012 3 Round Stones Inc. It is licensed under the Creative Commons Attribution 3.0 Unported License. Full details at: http://creativecommons.org/licenses/by/3.0/

You are free:

to Share — to copy, distribute and transmit the work

to Remix — to adapt the work

Under the following conditions:

Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.