Linked Open Government Data: What’s Next?
![Page 1: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/1.jpg)
Linked Open Government Data: What’s Next?
Li Ding, James A. Hendler, and Deborah L. McGuinness
With thanks to the entire RPI Tetherless World LOGD team: logd.tw.rpi.edu
particularly John Erickson, Tim Lebo, Dominic DiFranzo, Alvaro Graves, Gregory Williams, Xian Li, James Michaelis, Jin Zheng, Zhenning Shangguan, Johanna Flores, and Evan Patton
Tetherless World Constellation, Rensselaer Polytechnic Institute
SemTech 2011 San Francisco June 7, 2011
![Page 2: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/2.jpg)
Outline
• Open Government Data
• Linked Open Government Data
• Challenges and Opportunities
• Future Directions
![Page 3: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/3.jpg)
Open Government Data:
Government data is already available and open on the Web and is growing.
Let’s create mash ups to expose more value.
![Page 4: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/4.jpg)
Opening Government Data
“Openness will strengthen our democracy and promote efficiency and effectiveness in Government.”
--- President Obama (Jan 2009)
“if people put data onto the web -- government data, scientific data, community data, whatever it is -- it will be used by other people to do wonderful things, in ways that they never could have imagined.”
-- Tim Berners-Lee (Feb 2010)
Source: http://www.whitehouse.gov/open, http://www.ted.com/talks/lang/eng/tim_berners_lee_the_year_open_data_went_worldwide.html
Linked Data and Semantic Tech are key enablers!
![Page 5: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/5.jpg)
International Open Government Data: A Great Opportunity
Deployment Status (source: Data.gov)
• 13 other nations establishing open data
• 24 states now offering data sites
• 11 cities in America with open data
• 236 new applications from Data.gov datasets
• 258 data contacts in Federal Agencies
• 308,650 datasets available on Data.gov
• Open Government Data (OGD)
– A public asset (collected by government) with a large amount of high-value data and wide domain coverage
– An international mandate for government transparency, business applications, citizen participation, etc.
Source: http://www.data.gov/
![Page 6: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/6.jpg)
Challenges from Raw Open Government Data
• Data in proprietary formats
• Independent curators
• Distributed and unlinked data
– Smoking rate (Impacteen.org)
– Policy coverage (NCI)
• Limited participation
![Page 7: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/7.jpg)
Linked Open Government Data
TWC: Tetherless World Constellation at Rensselaer Polytechnic Institute
LOGD: Linked Open Government Data
logd.tw.rpi.edu
![Page 8: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/8.jpg)
Linked Data is Large and is Growing
![Page 9: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/9.jpg)
The Tetherless World Constellation Linked Open Govt Data Portal
[Diagram: TWC LOGD portal steps: Convert, Enhance, Create, Query/Access]
LOGD SPARQL Endpoint
Formats: RDF, RSS, JSON, XML, HTML, CSV, …
Community Portal
Data.gov deployment
![Page 10: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/10.jpg)
Linked Open Government Data
A Linked Open Government Data (LOGD) ecosystem is a Linked Data-based system where stakeholders of different sizes and roles find, manage, archive, publish, reuse, integrate, mash-up, and consume open government data in connection with online tools, services and societies.
![Page 11: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/11.jpg)
Moving data.gov to linked data (US)
• Third parties (like RPI) translate the government datasets into linked data formats
• US Data.gov hosts 6.4B RDF triples (as of 5/21/2010)
• Data.gov acknowledges the Semantic Web as a key technology for open government data
![Page 12: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/12.jpg)
Government Data within the LD Cloud
http://linkeddata.org/
Government data currently makes up over half of the cloud by size (~17B triples), with tens of thousands of links to other data (both within and outside the government datasets)
![Page 13: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/13.jpg)
TWC LOGD: 50+ Demos in Many Domains using Various Technologies
Technology
• Semantic Web
• Semantic CMS
• Semantic Search
• Social network
• NLP
• Mobile
• Visualization
• Provenance
• …
Domain
• Health
• Finance
• Politics
• Society
• Economy
• …
![Page 14: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/14.jpg)
Selected TWC Mashups
![Page 15: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/15.jpg)
Trends in Smoking Prevalence, Tobacco Policy Coverage and Tobacco Prices (1991-2007)
PopSciGrid with NIH/NCI & Northwestern
• Aimed at conveying complex health-related information to consumers and health decision makers
• Diverse datasets from NIH
• Uses lightweight semantic technologies to produce mashups that make accessible data that would otherwise be difficult to view in perspective
• Maintains provenance about data and manipulations
• Two-way communication: feed users’ comments back to gov contacts (e.g. %)
![Page 16: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/16.jpg)
PopSciGrid Workflow
[Workflow diagram: steps Convert, Enhance, Integrate, Publish, Visualize; edges labeled “derive” and “create”; data label “Ban coverage”]
![Page 17: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/17.jpg)
The Abstract LOGD Workflow
[Workflow diagram: actors Gov Agency, Developer, End User; data Raw OGD, LOGD, Mashed Data; steps Convert, Enhance, Integrate, Publish, Visualize]
Usability of LOGD
• Interoperability
• Scalability
• Provenance
Mashup Workflow (Conventional OGD)
1. Publish
2. Mashup
3. Visualize
![Page 18: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/18.jpg)
Challenge: Interoperability
★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of image scan of a table)
★★★ use non-proprietary formats (e.g., CSV instead of Excel)
★★★★ use URIs to identify things, so that people can point at your stuff
★★★★★ link your data to other data to provide context
Syntactic
• Extract entities from HTML tables
• Parse Excel tables
Semantic
• Does “Georgia” refer to a US state or a country?
• Is “2000” a calendar year, a fiscal year, or a dollar amount?
TBL’s 5-star Deployment Scheme for Linked Data
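The move from three stars (CSV) to four stars (URIs) can be sketched in a few lines of Python. This is only an illustrative sketch, not the TWC LOGD converter: the base URI `http://example.org/logd/`, the column names, the vocabulary terms, and the sample values are all invented placeholders. It shows how minting a URI for “Georgia” as a US state and typing “2000” as a calendar year removes the two semantic ambiguities listed on this slide.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base URI, invented for this illustration.
BASE = "http://example.org/logd/"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Placeholder sample table; the numeric values are made up, not real statistics.
SAMPLE_CSV = """state,year,smoking_rate
Georgia,2000,10.0
New York,2000,20.0
"""


def row_to_ntriples(row, base=BASE):
    """Turn one CSV row into N-Triples lines with minted URIs."""
    subject = f"<{base}observation/{quote(row['state'])}-{row['year']}>"
    state_uri = f"<{base}us-state/{quote(row['state'])}>"
    return [
        # A URI under us-state/ says this "Georgia" is the US state.
        f"{subject} <{base}vocab/state> {state_uri} .",
        # A typed literal says this "2000" is a calendar year.
        f'{subject} <{base}vocab/year> "{row["year"]}"^^<{XSD}gYear> .',
        f'{subject} <{base}vocab/smokingRate> "{row["smoking_rate"]}"^^<{XSD}decimal> .',
    ]


def convert(csv_text):
    """Convert a whole CSV document into a flat list of N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        triples.extend(row_to_ntriples(row))
    return triples


if __name__ == "__main__":
    for line in convert(SAMPLE_CSV):
        print(line)
```

Reaching the fifth star would additionally require linking the minted state URIs to identifiers in other datasets, which this sketch does not attempt.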
![Page 19: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/19.jpg)
Mashing up data from different countries
http://data-gov.tw.rpi.edu/demo/USForeignAid/demo-1554.html
![Page 20: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/20.jpg)
Even if not “rationalized” together
Build ontology mapping based on shared terms, e.g., “Economic”
![Page 21: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/21.jpg)
Enhance interoperability using Linked Data: drill down contextual knowledge
• Identity: URI
• Context
– Description: metadata, esp. type & datatype
– Mapping (linking identities)
• Syntactic
– Common string name
– Common URI
• Semantic
– Complex Object: attributes + context (siblings)
– Ontological Mapping: e.g., owl:sameAs
– Rule-based Mapping: e.g., mapping “Liter” to “Gallon”
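The two semantic mapping styles named above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the portal's implementation: the URIs are hypothetical, the `owl:sameAs` links are invented, and the rule assumes the US liquid gallon. Ontological mapping is modeled as grouping identifiers connected by `owl:sameAs` into one equivalence class; rule-based mapping is modeled as a unit-conversion function.

```python
# Hypothetical identifiers; these owl:sameAs links are illustrative only.
DATAGOV = "http://example.org/datagov/Georgia"
DBPEDIA = "http://example.org/dbpedia/Georgia_(U.S._state)"
GEONAMES = "http://example.org/geonames/us-ga"

SAME_AS = [
    (DATAGOV, DBPEDIA),
    (DBPEDIA, GEONAMES),
]

# Assumes the US liquid gallon, which is defined as exactly 3.785411784 liters.
LITERS_PER_US_GALLON = 3.785411784


def build_canonical(same_as_pairs):
    """Union-find over owl:sameAs pairs; returns a function mapping any
    identifier to the canonical representative of its equivalence class."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smallest URI as representative
            # so the result does not depend on input order.
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return find


def liters_to_gallons(liters):
    """Rule-based mapping: convert a value in liters to US gallons."""
    return liters / LITERS_PER_US_GALLON


if __name__ == "__main__":
    canonical = build_canonical(SAME_AS)
    print(canonical(GEONAMES) == canonical(DATAGOV))
    print(liters_to_gallons(10.0))
```

Treating `owl:sameAs` as a plain equivalence relation is a simplification; in practice such links are often asserted loosely, so a real pipeline would likely want to track who asserted each link.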
![Page 22: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/22.jpg)
Scalability factors in LOGD deployment
• Large number of OGD datasets
– 6k+ Data.gov.uk
– 200k+ Data.gov
– 323k+ International OGD datasets
• Non-trivial human workload: clean-up syntax, enhance semantics, integrate datasets, visualize resulting data …
• Substantial computing workload: running time of complex tasks, memory and disk space, maintenance costs …
![Page 23: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/23.jpg)
International catalog
![Page 24: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/24.jpg)
Scalability issues in the International Open Government Dataset Catalog
Social Aspect: crawled 40+ different dataset catalogs from 19 countries (“non-trivial customized programming workload”)
Computing Aspect: searching 323,304 datasets (“Complex SPARQL query got timeout”)
![Page 25: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/25.jpg)
Social Aspect: Distribute human workload to the right developers
Joint work with Alvaro Graves, PhD student at RPI
[Diagram: developer roles placed along two axes, Domain Expertise and Application Development Expertise]
Roles: Layman End Users; Software Engineers; Scientists, Experts; Genus; Students; Knowledge Engineers
Tasks: Convert, Enhance, Combine, Visualize, Publish
1. Decompose workload to fine-granular jobs
2. Leverage a wider range of developers
![Page 26: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/26.jpg)
Computing Aspect: fit computing power to LOGD deployment
• Scale up for more government data
– Support collective incremental data processing
– Support large scale data analysis: graph connectivity, complex pattern/hypotheses discovery
– Map repetitive developers’ workload to automated tools
– Reduce service maintenance costs
• Scale down for wider range of end user apps
– Limited computing power, e.g., mobile devices
– End users’ cognitive constraints, e.g., screen-size, executive summary
![Page 27: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/27.jpg)
Provenance
• Provenance-aware frameworks are needed to support transparency, appropriate attribution, and ultimately trust of any kind of open data.
• Versioning and persistence are important factors to sustainable applications
• Workflow provenance can help increase understanding and trust since it can be used to explain behavior and dependencies of intelligent systems
![Page 28: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/28.jpg)
Attribution in PopSciGrid
[Diagram: attribution graph with nodes demo, person, technology, dataset, agency, version, conversion; edges logd:uses_technology, logd:uses_dataset, dcterms:contributor, dcterms:publisher, void:subset (twice)]
Example scenarios
• List direct/indirect contributors
• End users send feedback to curators
• Curators learn usage of datasets
• List demos by technology
State-wise Tobacco Policy coverage stats
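The “list direct/indirect contributors” scenario can be sketched as a graph walk in Python. This is an assumption-laden toy: only the property names (`logd:uses_dataset`, `logd:uses_technology`, `dcterms:contributor`, `dcterms:publisher`, `void:subset`) come from the slide, while the `ex:` resources and the exact way the edges connect are invented for illustration.

```python
from collections import deque

# Toy triples. The ex: resources and the edge layout are assumptions;
# only the property names appear on the slide.
TRIPLES = [
    ("ex:demo", "logd:uses_dataset", "ex:conversion"),
    ("ex:demo", "logd:uses_technology", "ex:sparql"),
    ("ex:demo", "dcterms:contributor", "ex:developer"),
    ("ex:dataset", "void:subset", "ex:version"),
    ("ex:version", "void:subset", "ex:conversion"),
    ("ex:conversion", "dcterms:contributor", "ex:converter"),
    ("ex:dataset", "dcterms:publisher", "ex:agency"),
]

CREDIT_PROPERTIES = {"dcterms:contributor", "dcterms:publisher"}


def contributors(demo, triples):
    """Return (direct, indirect) contributor sets for a demo.

    Direct: dcterms:contributor of the demo itself.
    Indirect: contributors and publishers of every dataset the demo uses,
    plus those of its enclosing datasets (void:subset followed upward).
    """
    direct = {o for s, p, o in triples
              if s == demo and p == "dcterms:contributor"}
    queue = deque(o for s, p, o in triples
                  if s == demo and p == "logd:uses_dataset")
    seen = set(queue)
    indirect = set()
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p in CREDIT_PROPERTIES:
                indirect.add(o)
            # "superset void:subset node": climb to the enclosing dataset.
            if o == node and p == "void:subset" and s not in seen:
                seen.add(s)
                queue.append(s)
    return direct, indirect - direct


def demos_using(technology, triples):
    """List demos that declare logd:uses_technology for a technology."""
    return sorted(s for s, p, o in triples
                  if p == "logd:uses_technology" and o == technology)


if __name__ == "__main__":
    print(contributors("ex:demo", TRIPLES))
    print(demos_using("ex:sparql", TRIPLES))
```

Against a real triple store the same walk would more likely be written as a query over the endpoint; the in-memory list here only keeps the sketch self-contained.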
![Page 29: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/29.jpg)
TWC Semantic Water Quality Portal
• Aimed at helping people investigate local water quality
• Diverse datasets, regulations, datatypes
• Uses lightweight semantic technologies to produce mashups that make accessible data that would otherwise be difficult to view in perspective
• Maintains provenance about data and manipulations
• Exposes unexpected uses of data (and thus unexpected usage patterns)
![Page 30: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/30.jpg)
Detailed View of Pollution
![Page 31: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/31.jpg)
Provenance of regulations
![Page 32: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/32.jpg)
Challenges Revisited
• Interoperability
– Syntactic: Linked Data, RDF
– Semantic: ontology, evolving
• Scalability (9.9 billion triples on the TWC LOGD)
– Effective social platform for task dispatching
– More automation, e.g., data cleaning and link detection
– Scalable tools, esp. SPARQL endpoint
• Provenance
– Accountability: privacy, licensing, trust
– Credit / Blame
– Replicate applications and transfer system building knowledge
• More issues
– Persistent data access for changing data
– …
![Page 33: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/33.jpg)
Summary
• Open Government Data is a key resource
– Many governments are releasing data, a growing number in structured form
• Government (and general data) transparency comes through in the “mashing up” of data from many sites while maintaining (and exposing) provenance
– Key to linked data
• While there has been tremendous progress, many challenges remain
– Trust, Provenance, Scaling, Interoperability, Archiving, Curation, …
• The research agenda for linked government data is an important driving area for semantic technologies
![Page 34: Linked Open Government Data: What’s Next?](https://reader033.fdocuments.in/reader033/viewer/2022051621/568147ac550346895db4e92d/html5/thumbnails/34.jpg)
Questions?
The work presented in this talk was primarily conducted at the Tetherless World Constellation at Rensselaer Polytechnic Institute.
Comments / Questions: [ dingl | dlm ] @ cs.rpi.edu.
Events:
Open Linked Govt. Data Symposium: submission deadline June 15 http://tw.rpi.edu/web/event/AAAI/2011/Fall_Symposium_OGK
TWC / Elsevier Hackathon: June 27-28
http://tw.rpi.edu/web/event/TWCElsevierHackathonJune2011
Reference: Li Ding, Timothy Lebo, John S. Erickson, Dominic DiFranzo, Gregory Todd Williams, Xian Li, James Michaelis, Alvaro Graves, Jin Guang Zheng, Zhenning Shangguan, Johanna Flores, Deborah L. McGuinness and Jim Hendler, TWC LOGD: A Portal for Linked Open Government Data Ecosystems, submitted to JWS, special issue on semantic web challenge’10