The Power of Linked Data for Government & Healthcare Information Integration

The Power of Linked Data for Government and Healthcare Information Integration By Bernadette Hyland CEO 3 Round Stones, co-chair W3C Gov’t Linked Data WG This presentation on http://slideshare.net/3roundstones OMG Technical Meeting Special Event, Reston VA 20-Mar-2013 1 Wednesday, March 20, 13

description

Government open data strategies are aimed at wider access and re-use by entrepreneurs, publishers and the wider US healthcare delivery industry. Presented to the OMG Standards Community technical workshop on semantics, held in Reston VA on 20 March 2013, by Bernadette Hyland, CEO of 3 Round Stones, Inc. and co-chair of the W3C Government Linked Data Working Group.

Transcript of The Power of Linked Data for Government & Healthcare Information Integration

Page 1: The Power of Linked Data for Government & Healthcare Information Integration

The Power of Linked Data for Government and Healthcare Information Integration

By Bernadette Hyland, CEO 3 Round Stones, co-chair W3C Gov’t Linked Data WG

This presentation on http://slideshare.net/3roundstones

OMG Technical Meeting Special Event, Reston VA, 20-Mar-2013

Wednesday, March 20, 2013

Page 2: The Power of Linked Data for Government & Healthcare Information Integration

Agenda

• Government data publication on the Web

• Update on EPA Linked Data Service

• Healthcare Delivery Industry’s Appetite

• Update on W3C Government Linked Data Working Group


Page 3: The Power of Linked Data for Government & Healthcare Information Integration

3 Round Stones produces the leading platform for the publication of reusable data on the Web. Our commercially supported Open Source platform is used by the Fortune 2000 and US Government agencies to collect, publish and reuse data, both on the public Internet and behind institutional firewalls.


Page 5: The Power of Linked Data for Government & Healthcare Information Integration

US EPA Linked Data

• Cloud-based Linked Data provision of 3 core programs:

• 2.9M Facilities

• 100K substances

• 25 years of toxic pollution reports

• FISMA compliant

• 16 Callimachus templates

• Official launch April 2013


Page 6: The Power of Linked Data for Government & Healthcare Information Integration

US GPO

• Cloud-based Linked Data provision of persistent URLs for US Government documents:

• 100k+ documents

• Used by 1,240 Federal Depository Libraries and the public

• In 3rd year of operation

• Deemed an “Essential service” supporting US Congress


Page 7: The Power of Linked Data for Government & Healthcare Information Integration


Page 8: The Power of Linked Data for Government & Healthcare Information Integration

Big Data

Simple data

Complex data

Legacy data


Page 9: The Power of Linked Data for Government & Healthcare Information Integration


Page 10: The Power of Linked Data for Government & Healthcare Information Integration

Open Government Data


Page 11: The Power of Linked Data for Government & Healthcare Information Integration

“We’re moving from managing documents to managing discrete pieces of open data and content which can be tagged, shared, secured, mashed up and presented in the way that is most useful for the consumer of that information.”

-- Report on Digital Government: Building a 21st Century Platform to Better Serve the American People

Growing chorus ...


Page 12: The Power of Linked Data for Government & Healthcare Information Integration


Page 13: The Power of Linked Data for Government & Healthcare Information Integration

Governments

Goals: Governmental transparency and/or improved internal efficiencies (data warehouses)


Page 14: The Power of Linked Data for Government & Healthcare Information Integration


Page 15: The Power of Linked Data for Government & Healthcare Information Integration


Page 16: The Power of Linked Data for Government & Healthcare Information Integration

Open data + open standards + open platforms

Highly scalable computing on the Cloud

Open Web Standards

5 Star Data (Linked Data), whenever possible

Leverage Open Source tools where practical


Page 17: The Power of Linked Data for Government & Healthcare Information Integration

Use a non-proprietary format

• Open Web data exchange formats

• RDF instead of CSV

• Benefits

• Accessibility, Interoperability & Re-use

• Reduces the risks of

• “Super model” data warehouse approach

• Budget & schedule overruns

• Confidential info leakage
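
As a rough illustration of the "RDF instead of CSV" point, the sketch below turns rows of a small CSV table into N-Triples statements. The table, its column names and the `http://example.org/` namespace are all assumptions made up for this example; they are not an agency's actual URI scheme or vocabulary.

```python
import csv
import io

# Hypothetical CSV export; column names and values are illustrative only.
CSV_TEXT = """id,name,state
f1,Example Cement Plant,CA
f2,Example Refinery,TX
"""

BASE = "http://example.org/"  # assumed namespace, not a real agency URI scheme


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(text: str) -> list[str]:
    """Emit one triple per non-key column of every CSV row."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        # The row key becomes a URI, so other datasets can refer to the row.
        subject = f"<{BASE}facility/{row['id']}>"
        for column, value in row.items():
            if column == "id":
                continue
            predicate = f"<{BASE}vocab/{column}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


ntriples = csv_to_ntriples(CSV_TEXT)
print("\n".join(ntriples))
```

Each row identifier becomes a URI rather than a bare cell value, which is what lets a third party link to the record without knowing the layout of the original file.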


Page 18: The Power of Linked Data for Government & Healthcare Information Integration


Page 19: The Power of Linked Data for Government & Healthcare Information Integration

Universal Identifiers

• It’s the foundation of the Web

• Others can reference things

• Two references with the same URI are the same thing

• Quick, easy and scalable

• People keep coming back for more!!
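
The point that two references with the same URI denote the same thing can be sketched as a merge keyed on the URI. The two sources, the property names and the `example.org` URIs below are hypothetical placeholders, chosen only to show the mechanics.

```python
from collections import defaultdict

# Two hypothetical sources that describe the same facility. The URIs and
# property names are placeholders invented for this sketch.
FACILITY = "http://example.org/facility/f1"

source_a = [(FACILITY, "name", "Example Cement Plant")]
source_b = [
    (FACILITY, "state", "CA"),
    ("http://example.org/facility/f2", "state", "TX"),
]


def merge_by_uri(*sources):
    """Group (subject, property, value) statements by subject URI."""
    merged = defaultdict(dict)
    for source in sources:
        for subject, prop, value in source:
            merged[subject][prop] = value
    return dict(merged)


merged = merge_by_uri(source_a, source_b)
print(merged[FACILITY])
```

No join key has to be negotiated between the two publishers: statements from both sources land on the same record simply because they use the same identifier.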


Page 20: The Power of Linked Data for Government & Healthcare Information Integration


Page 21: The Power of Linked Data for Government & Healthcare Information Integration

HELPING DEFINE THE PROCESS

Publish, Convert, Describe, Name, Model, Identify


Page 22: The Power of Linked Data for Government & Healthcare Information Integration

HELPING DEFINE THE PROCESS

Publish, Convert, Describe, Name, Model, Identify

Maintain


Page 23: The Power of Linked Data for Government & Healthcare Information Integration


Page 24: The Power of Linked Data for Government & Healthcare Information Integration

A Path to Success

• Start with the basics

• Well curated datasets with relevant data

• Integrate related datasets (e.g., EPA chemical substances, toxic releases & facilities)

• Reach out to developers early

• Emphasize the internal agency benefit

• Address data quality ...

• Multiple approaches including crowdsourcing


Page 25: The Power of Linked Data for Government & Healthcare Information Integration

Social responsibility of government publishers

• Must specify a license for use

• Publish frequency of data updates

• Ensure data is as accurate as possible

• Recognize responsibility to maintain data

• Document & follow a persistence strategy

• Respond to reports of problematic data
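
Two of the responsibilities above, stating a license and publishing the update frequency, can be carried as machine-readable statements alongside the data. The sketch below is one possible way to do so using Dublin Core Terms properties (`license`, `accrualPeriodicity`, `publisher`); the dataset and publisher URIs, the plain-text frequency value and the helper name `describe_dataset` are assumptions for illustration only.

```python
DCTERMS = "http://purl.org/dc/terms/"


def describe_dataset(dataset_uri, license_uri, update_frequency, publisher_uri):
    """Return N-Triples lines stating license, update frequency and publisher."""
    return [
        f"<{dataset_uri}> <{DCTERMS}license> <{license_uri}> .",
        # A controlled vocabulary term could replace this plain literal.
        f'<{dataset_uri}> <{DCTERMS}accrualPeriodicity> "{update_frequency}" .',
        f"<{dataset_uri}> <{DCTERMS}publisher> <{publisher_uri}> .",
    ]


# Hypothetical dataset and publisher URIs; the license URI is the Creative
# Commons Attribution 3.0 address.
metadata = describe_dataset(
    "http://example.org/dataset/facilities",
    "http://creativecommons.org/licenses/by/3.0/",
    "annual",
    "http://example.org/agency",
)
print("\n".join(metadata))
```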


Page 26: The Power of Linked Data for Government & Healthcare Information Integration

Callimachus

http://callimachusproject.org

http://3roundstones.com


Page 27: The Power of Linked Data for Government & Healthcare Information Integration

CONTENT MANAGEMENT SYSTEM

UNSTRUCTURED TEXT

LINKED DATA MANAGEMENT SYSTEM

Callimachus

STRUCTURED DATA


Page 28: The Power of Linked Data for Government & Healthcare Information Integration


Page 29: The Power of Linked Data for Government & Healthcare Information Integration

Guidance for developers


Page 30: The Power of Linked Data for Government & Healthcare Information Integration


Page 31: The Power of Linked Data for Government & Healthcare Information Integration

From Wikipedia

From EPA

Open Street Map


Page 32: The Power of Linked Data for Government & Healthcare Information Integration


Page 33: The Power of Linked Data for Government & Healthcare Information Integration

We’ve Seen This Before


Page 34: The Power of Linked Data for Government & Healthcare Information Integration


Page 35: The Power of Linked Data for Government & Healthcare Information Integration

User

NOAA

US EPA AirNow

DBpedia

National Library of Medicine

US EPA SunWise


Page 36: The Power of Linked Data for Government & Healthcare Information Integration

How much mercury did Elisa’s local cement plant release in 2004?
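
A question of this shape becomes a pattern match once facilities, substances and release reports are linked by shared identifiers. The sketch below runs over a tiny in-memory set of statements. Every identifier, property name and quantity in it is an invented placeholder, not an EPA identifier or a real Toxics Release Inventory figure.

```python
# All identifiers, property names and quantities are invented placeholders;
# they are not EPA identifiers or real Toxics Release Inventory figures.
TRIPLES = [
    ("ex:report1", "ex:facility", "ex:plantA"),
    ("ex:report1", "ex:substance", "ex:mercury"),
    ("ex:report1", "ex:year", 2004),
    ("ex:report1", "ex:pounds", 10.0),
    ("ex:report2", "ex:facility", "ex:plantA"),
    ("ex:report2", "ex:substance", "ex:mercury"),
    ("ex:report2", "ex:year", 2005),
    ("ex:report2", "ex:pounds", 20.0),
    ("ex:report3", "ex:facility", "ex:plantB"),
    ("ex:report3", "ex:substance", "ex:mercury"),
    ("ex:report3", "ex:year", 2004),
    ("ex:report3", "ex:pounds", 5.0),
]


def subjects(triples, prop, value):
    """Return every subject that has the given property/value pair."""
    return {s for s, p, o in triples if p == prop and o == value}


def released(triples, facility, substance, year):
    """Sum the reported quantity over reports matching all three criteria."""
    reports = (subjects(triples, "ex:facility", facility)
               & subjects(triples, "ex:substance", substance)
               & subjects(triples, "ex:year", year))
    return sum(o for s, p, o in triples if s in reports and p == "ex:pounds")


total = released(TRIPLES, "ex:plantA", "ex:mercury", 2004)
print(total)
```

A production service would more likely express the same pattern as a SPARQL query against a published endpoint; the set intersection here only stands in for that join.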


Page 37: The Power of Linked Data for Government & Healthcare Information Integration

Linked Data Approach


Page 38: The Power of Linked Data for Government & Healthcare Information Integration


Page 39: The Power of Linked Data for Government & Healthcare Information Integration

Finding Hanson Permanente


Page 40: The Power of Linked Data for Government & Healthcare Information Integration

Finding Mercury Released in 2004


Page 41: The Power of Linked Data for Government & Healthcare Information Integration

TRI Report


Page 42: The Power of Linked Data for Government & Healthcare Information Integration

Data Reuse


Page 43: The Power of Linked Data for Government & Healthcare Information Integration

Potential Audience

• Middle school student doing a science project

• Concerned citizen worried about local pollution

• Environmental Science PhD from EPA

• Doctor from NIH writing a research paper


Page 44: The Power of Linked Data for Government & Healthcare Information Integration

HTTP-accessible endpoints capable of returning XML or textual content

Convert XML or textual results to RDF

Render RDF to HTML via template

User resolves a single URI to an Active PURL

Multiple targets queried independently

1

David Wood1 and Tom [email protected], [email protected]

Active PURLs for Clinical Study Aggregation

The problem: No coordinated view of clinical study information. Information is distributed across departments, subsidiaries and government data sources.

The solution: Gather, convert, aggregate and format for display

Challenges

Next steps

How semantic technologies help

3 Round Stones and AstraZeneca created a system to allow coordinated views of distributed clinical trial information. The system extended the Callimachus Project, an Open Source management system for Linked Data. Persistent URLs, or PURLs, were used to provide globally unique and resolvable identifiers for each clinical study. The PURL concept was extended to enable PURLs to have multiple targets and for the results of each target to undergo arbitrary transformation. PURLs which have such capabilities are called Active PURLs. Information sources relevant to clinical studies were identified, regardless of whether their location was internal or external to the pharmaceutical company's network. Active PURLs were used to resolve data sources having HTTP endpoints capable of returning XML or textual results. Each information source is dynamically transformed into Resource Description Framework (RDF) formats and all sources' results are then merged into a single, temporary graph of RDF data. Information is rendered to end users as coordinated HTML descriptions regarding each clinical trial using the Callimachus template engine. Machine-readable versions of the data are also available.
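
The resolution flow described above (several targets queried independently, each result converted to RDF-like statements, everything merged into one temporary graph and rendered as HTML) can be sketched as follows. This is an illustrative outline under stated assumptions, not the Callimachus implementation: the two target functions are stubs standing in for HTTP endpoints, and every name, payload and URI is hypothetical.

```python
from html import escape
from xml.etree import ElementTree


# Stub targets standing in for HTTP endpoints; a real resolver would issue
# HTTP requests here. Names and payloads are hypothetical.
def internal_registry(study_id):
    return f"<study id='{study_id}'><title>Example study</title></study>"


def public_registry(study_id):
    return "status=completed\nphase=2"


def xml_to_triples(subject, payload):
    """Convert each child element of an XML result into one statement."""
    root = ElementTree.fromstring(payload)
    return {(subject, child.tag, child.text) for child in root}


def text_to_triples(subject, payload):
    """Convert key=value lines of a textual result into statements."""
    triples = set()
    for line in payload.splitlines():
        key, _, value = line.partition("=")
        triples.add((subject, key.strip(), value.strip()))
    return triples


# Each target pairs a fetch step with its own transformation.
TARGETS = [(internal_registry, xml_to_triples),
           (public_registry, text_to_triples)]


def resolve_active_purl(study_id):
    """Query every target and merge the results into one temporary graph."""
    subject = f"http://example.org/study/{study_id}"
    graph = set()  # rebuilt on every resolution, never stored
    for fetch, convert in TARGETS:
        graph |= convert(subject, fetch(study_id))
    return subject, graph


def render_html(subject, graph):
    """Render the merged graph as a simple HTML description."""
    rows = "".join(f"<li>{escape(p)}: {escape(o)}</li>"
                   for _, p, o in sorted(graph))
    return f"<h1>{escape(subject)}</h1><ul>{rows}</ul>"


subject, graph = resolve_active_purl("s1")
page = render_html(subject, graph)
print(page)
```

Keeping the fetch and convert steps as a pair per target means a new source can be added without touching the merge or the rendering.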

Linked Data techniques can help to address both the availability of clinical trial information and provide a means to build effective information systems using it. Linked Data techniques allow for "cooperation without coordination". Publishers of data provide context for use by third parties in other portions of a distributed enterprise. Users of Linked Data can combine information from multiple sources. Subsequent publication can create a virtuous circle of positive feedback, allowing researchers, informaticists and support staff to collaboratively and distributively build a reusable knowledge base.

Distributed queries have many known limitations, such as the introduction of multiple single points of failure in any given PURL resolution. HTTP timeouts, authentication/authorization errors or other network failures can slow or stop a pipeline from returning correctly. Similarly, distributed queries can result in variant query-time performance due to complex network and endpoint performance variances. Proactive caching and cache management strategies can improve runtime performance and protect end users from the limitations inherent in a distributed query architecture. Caching of intermediate results from endpoints has not yet been implemented.
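
Since caching of intermediate results is described as not yet implemented, the following is only a sketch of one strategy that could address the limitations listed: a time-limited cache per target that falls back to stale data when the endpoint fails. The class name `CachedTarget`, its parameters and the flaky stub endpoint are all hypothetical.

```python
import time


class CachedTarget:
    """Wrap a fetch function with a time-limited cache and stale fallback."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # key -> (timestamp, value)

    def get(self, key):
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh hit: skip the network entirely
        try:
            value = self._fetch(key)
        except (TimeoutError, ConnectionError):
            if cached is not None:
                return cached[1]  # endpoint failed: serve the stale copy
            return None  # nothing cached: caller can omit this source
        self._cache[key] = (now, value)
        return value


# Stub endpoint that answers once and then times out on every later call.
calls = []


def flaky_endpoint(key):
    calls.append(key)
    if len(calls) > 1:
        raise TimeoutError("endpoint did not answer")
    return f"result for {key}"


fake_now = [0.0]  # controllable clock so the example needs no real waiting
target = CachedTarget(flaky_endpoint, ttl_seconds=60.0,
                      clock=lambda: fake_now[0])

first = target.get("s1")   # fetched from the endpoint
second = target.get("s1")  # served from the fresh cache
fake_now[0] = 120.0        # the cached entry is now older than the TTL
third = target.get("s1")   # refresh fails, stale copy is returned
print(first, second, third, len(calls))
```

Serving stale data trades freshness for availability, so a single slow or failing target no longer stops the whole page from rendering; whether that trade is acceptable depends on the data source.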

We intend to continue to address

References

1. Callimachus Project,

User experience

1 Users resolve a URL that provides a unique identifier for a clinical study, drug, chemical or other concept managed by this system. The user may be presented with the URL on HTML pages, search for it via full-text techniques or discover it via semantic search.

2 Users are presented with a dynamically generated Web page representing aggregated clinical study information. Users are isolated from the complex and distributed information environment.


Page 45: The Power of Linked Data for Government & Healthcare Information Integration


Page 46: The Power of Linked Data for Government & Healthcare Information Integration


Page 47: The Power of Linked Data for Government & Healthcare Information Integration


Page 48: The Power of Linked Data for Government & Healthcare Information Integration

http://slideshare.com/3roundstones

Twitter: @BernHyland Email: [email protected]

Thank you for participating!!


Page 49: The Power of Linked Data for Government & Healthcare Information Integration

Credits

David Newman, Gartner: “Innovation Insight: Linked Data Drives Innovation Through Information-Sharing Network Effects” Published: 15 December 2011

David Wood, ed. Linking Government Data, Springer (2011) http://3roundstones.com/linking-government-data/

US Executive Branch

Digital Government Strategy: Building a 21st Century Platform to Better Serve the American People, http://www.whitehouse.gov/sites/default/files/omb/egov/digital-government/digital-government.html

W3C Linked Data Cookbook http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

All other photos and images © 2010-2012 3 Round Stones, Inc. and released under a CC-by-sa license


Page 50: The Power of Linked Data for Government & Healthcare Information Integration

This work is Copyright © 2011-2012 3 Round Stones Inc.

It is licensed under the Creative Commons Attribution 3.0 Unported License

Full details at: http://creativecommons.org/licenses/by/3.0/

You are free:

to Share — to copy, distribute and transmit the work

to Remix — to adapt the work

Under the following conditions:

Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.
