CQLD on health.data.gov @ SemTech 2011
Clinical Quality Linked Data on
health.data.gov
George Thomas, HHS
SemTech2011, 2011-06-08
Franciscan B, 2:20-2:45pm
2
This Presentation
• Data.gov
– 2010: EOP/OMB and GSA, RPI
– 2011: add HHS/CMS
• Clinical Quality Linked Data
– health.data.gov
– Hospital Compare: tools, metadata, data
• Community of Practice
– W3C Government Linked Data Working Group
• Community of Interest
– Data.gov PMO Semantic Web / Linked Data Team
3
data.gov 2010
• EOP and OMB
– Open Gov Directive
• ‘TPC’
– Fed CIO Vivek Kundra
• ‘Democratizing Data’
• OMB and GSA
– OCSIT
• Data.gov PMO
– Semantic Web
• RPI, Virtuoso
4
HHS and CMS
• CTO @todd_park
– /open innovation
• ‘unleash the mojo!’
• OCIO
– Dep CIO, Chief Arch
• OCSQ
– Hospital Compare
– Data.Medicare.gov
– Clinical Quality Linked Data!
5
health.data.gov 2011
• Health Community
– Mashups
• public and private
– Drupal, Socrata
• showcase, challenges
• blogs, feeds, syndication
• Linked Data
– Virtuoso serves:
• /def/{vocab}/{concept}
• /id/{concept}/{instance}
• /doc/{concept}/{instance}.ext
• /dataset/{filename}/{date}
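The four path patterns above can be read as simple URI templates. The following is a minimal Python sketch, not the site's actual implementation: the `http://health.data.gov` base, the helper names (`def_uri`, `id_uri`, `doc_uri`, `dataset_uri`) and any example values passed to them are assumptions made for illustration.

```python
from urllib.parse import quote

# Assumed base; the slides give only the path patterns, not the scheme.
BASE = "http://health.data.gov"


def _seg(value: str) -> str:
    """Percent-encode one path segment so it cannot add extra slashes."""
    return quote(value, safe="")


def def_uri(vocab: str, concept: str) -> str:
    """/def/{vocab}/{concept}: a vocabulary term (class or predicate)."""
    return f"{BASE}/def/{_seg(vocab)}/{_seg(concept)}"


def id_uri(concept: str, instance: str) -> str:
    """/id/{concept}/{instance}: the identifier for a thing itself."""
    return f"{BASE}/id/{_seg(concept)}/{_seg(instance)}"


def doc_uri(concept: str, instance: str, ext: str) -> str:
    """/doc/{concept}/{instance}.ext: a document describing that thing."""
    return f"{BASE}/doc/{_seg(concept)}/{_seg(instance)}.{_seg(ext)}"


def dataset_uri(filename: str, date: str) -> str:
    """/dataset/{filename}/{date}: one dated release of a source file."""
    return f"{BASE}/dataset/{_seg(filename)}/{_seg(date)}"
```

Keeping `/id` and `/doc` as separate builders mirrors the split between naming a thing and naming a document about it.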
6
Tools
• Google Refine + DERI RDF extension
– Graph prototyping, source.tsv lifting
• TopBraid Composer
– Vocabulary modeling (RDFS)
– Initial instance data testing (inferences and queries)
• Jena
– schemagen .rdfs to .java
– ETL source.tsv to source.rdf/ttl
• Virtuoso
– Quad store, HTTP conneg / url_rewrite rules
– Faceted search and browse, REST APIs
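The "source.tsv to source.rdf/ttl" lifting step can be illustrated in a few lines. This sketch uses the Python standard library as a stand-in for the Jena-based ETL named on the slide, and emits N-Triples rather than RDF/XML or Turtle. The column names `provider_id` and `name`, the use of `rdfs:label`, and the base URI are all assumptions; only the `/def/hospital/Hospital` class path comes from the talk.

```python
import csv
import io

BASE = "http://health.data.gov"  # assumed base URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
HOSPITAL_CLASS = f"{BASE}/def/hospital/Hospital"


def escape_literal(text: str) -> str:
    """Escape the characters N-Triples requires escaping in string literals."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def tsv_to_ntriples(tsv_text: str) -> list[str]:
    """Lift each TSV row into two triples: a type statement and a label."""
    # QUOTE_NONE keeps any double quotes in the source as literal characters.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    triples = []
    for row in reader:
        # 'provider_id' and 'name' are hypothetical column names.
        subject = f"<{BASE}/id/hospital/{row['provider_id']}>"
        label = escape_literal(row["name"])
        triples.append(f"{subject} <{RDF_TYPE}> <{HOSPITAL_CLASS}> .")
        triples.append(f'{subject} <{RDFS_LABEL}> "{label}" .')
    return triples
```

Each returned string is one N-Triples statement, so writing the list to a file line by line gives something a quad store could load.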
7
Hospital Compare Metadata
• Created a handful of (generic and domain-specific) small component vocabularies
– health.data.gov/def/{vocab-name}/{Class-or-predicate}
• /def/hospital/Hospital
• /def/compare/Condition, /Measure, /Metric
– reference.data.gov/def/govdata
• /Record, /RecordSet, /State, /County, /Country
• Reused another handful (the usual)
– VoID, FOAF, DC
– W3C Org, vCard
• Evolve toward SKOS, SDMX-RDF and QB?
8
Hospital Compare Data
• They don’t call it Virtuoso for nothing…
– 303 redirect from /id NIR (non-information resource) to /doc IR (information resource)
– Serves a variety of representation formats
• RDF/XML, RDF+JSON, Turtle, N-Triples, CSV
• Atom/OData feeds have wide usage scenarios
– All data about a particular Hospital over time
– All instances of Measure(s) as they evolve over time
– All data for a particular Report/Survey dataset over time …
– Follow your nose with faceted search and browse services
• Discover the data model while building SPARQL queries
– Whether you’re a carbon or silicon based agent
• ‘Sponge’ external sites, expose RDBs as RDF, … more…
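The 303 redirect from an /id URI to a /doc representation, chosen by content negotiation, can be sketched as a pure function. This is an illustration of the general pattern only, not the Virtuoso url_rewrite rules used on the site: the media type to extension table, the default of `rdf`, and the names `preferred_extension` and `redirect_for` are assumptions.

```python
# Assumed mapping from media types to /doc file extensions.
FORMATS = {
    "application/rdf+xml": "rdf",
    "text/turtle": "ttl",
    "application/n-triples": "nt",
    "text/csv": "csv",
    "application/atom+xml": "atom",
}
DEFAULT_EXT = "rdf"  # assumed fallback when nothing in Accept is supported


def preferred_extension(accept: str) -> str:
    """Pick the supported media type with the highest q-value in Accept."""
    best_ext, best_q = DEFAULT_EXT, 0.0
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # q defaults to 1 when the header gives none
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        # Strict comparison: on a tie, the first listed type wins.
        if media_type in FORMATS and q > best_q:
            best_ext, best_q = FORMATS[media_type], q
    return best_ext


def redirect_for(path: str, accept: str = ""):
    """Return (303, location) for an /id path, or None for any other path."""
    prefix = "/id/"
    if not path.startswith(prefix):
        return None
    rest = path[len(prefix):]
    return 303, f"/doc/{rest}.{preferred_extension(accept)}"
```

A client asking for `/id/{concept}/{instance}` with `Accept: text/turtle` would, under these assumptions, be sent to the matching `/doc/...ttl` document, which is what lets both human and machine agents follow their nose from one identifier.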
9
Community of Practice
• W3C Government Linked Data Working Group
– Member oriented, focused on SemWeb implementations of GLD
– I’m a co-chair along with Bernadette Hyland
• Of Talis, Inc. – she’s also here at SemTech2011
– And the expert advice and support of W3C’s Sandro Hawke!
– Expected GLD charter
• Community Dir, Publishing Best Practices, Standard Vocabs
• Complements eGov IG charter
– First Face to Face GLD Meeting
• 6/29-30 in Washington, DC area at NITRD
10
Community of Interest
• Data.gov PMO Semantic Web / Linked Data Team
– Open to the public!
• Ask me for telecon and webshare info if you’re interested in participating
• Ongoing work
– EPA Linked Data
• Facilities and Chemical Registries
– HHS and CMS Linked Data
• Additional Clinical Quality domains (other ‘compare’ data)
• Cross domain correlation
– Additional mashup (visualization) challenges
11
Thank You! Questions?
george at thomas dot name