Data-gov Wiki: Towards Linking Government Data Li Ding, Dominic DiFranzo, Alvaro Graves, James R. Michaelis, Xian Li, Deborah L. McGuinness and Jim Hendler Tetherless World Constellation March 22, 2010

Page 1: Data-gov Wiki: Towards Linking Government Data Li Ding, Dominic DiFranzo, Alvaro Graves, James R. Michaelis, Xian Li, Deborah L. McGuinness and Jim Hendler.

Data-gov Wiki: Towards Linking Government Data

Li Ding, Dominic DiFranzo, Alvaro Graves, James R. Michaelis, Xian Li,

Deborah L. McGuinness and Jim Hendler

Tetherless World Constellation

March 22, 2010

Page 2

Outline

• Background
  – Open Government Data Initiative
  – data.gov

• The Data-gov Wiki
  – Making Government Linkable
  – Linking and Using Government Data
  – Provenance Issues

Page 4

Open Government Data Initiative

• Transparency
• Participation
• Collaboration

Open Government Directive (Dec 8, 2009)
• Publish Government Information Online
• Improve the Quality of Government Information
• Create and Institutionalize a Culture of Open Government
• Create an Enabling Policy Framework for Open Government

Page 5

data.gov, data.gov.uk and beyond

What’s next?
• More datasets
• More links
• More provenance

£30 million to fund "Institute of Web Science"

Page 6

Statistics about data.gov

50 participating agencies: USDA, DOC, DOD, ED, DOE, HHS, DHS, HUD, DOI, DOJ, DOL, STATE, DOT, TREAS, VA, EPA, GSA, NASA, NSF, NRC, OPM, SBA, SSA, USAID, BBG, CFTC, CNS, EXIM, EOP, FCC, FDIC, FEC, FRB, IMLS, MSPB, NARA, NEA, NEH, NLRB, NTSB, OSHRC, ONHIR, OPIC, PBGC, RRB, SEC, SSS, TVA, CPSC, EEOC

Source: http://www.data.gov/metric accessed March 21, 2010

Page 7

The Data-gov Wiki

http://data-gov.tw.rpi.edu/

Page 8

About the data-gov Wiki

Mission

The data-gov project investigates the role of semantic web technologies, especially linked data, in producing, processing and utilizing government data found in data.gov.

Objectives
• Support linked government data publishing, applications and provenance using semantic technologies
• Educate potential developers and users
• Enable social collaborations on linked government data

This project is run by the Tetherless World Constellation at RPI, headed by Professor Jim Hendler and Deborah McGuinness and led by Li Ding. Other team members include: Dominic DiFranzo, Sarah Magidson, James Michaelis, Alvaro Graves, Adam Bell, Jin Guang Zheng, Xian Li, Tim Lebo, Gregory Todd Williams, and Peter Coons.

Page 10

Data-gov Cloud (Oct 2009)

[Figure: Data-gov Cloud diagram. Dataset nodes: US-COMMUNITY (2005-2007), CASTNET (1990-Present), RECS (2005), GOV-BUDGET (1962-2014), TOXIC-RELEASE (2005-2008), EARTHQUAKE (Present), STATE-LIB (2006-2007), PUBLIC-LIB (1992-2006), MED-COST (1994-2009), LABOR-STAT (19xx-Present), DATA-GOV-CATALOG (present), USAspending (2008-2010). Other labels: Government, Community, Services, Environment, CASTNET sites, RECS code, US agency, US location, Linked Data, GeoNames.]

Page 11

http://data-gov.tw.rpi.edu/wiki/demos

data.gov + uk gov data + NY times + DBpedia

http://data-gov.tw.rpi.edu/wiki/Demo:_Comparing_US-USAID_and_UK-DFID_Global_Foreign_Aid

Page 12

From Open Government Data (OGD) to Linked Government Data (LGD)

Page 13

Make government data linkable

Account name: Donations, Donations for the Official Residence of the Vice President
Agency name: Executive Office of the President

RDF Conversion
• Minimal and extensible
• Web accessible

<rdf:Description rdf:about="#entry262">
  <dgp401:account_name>Donations, Donations for the Official Residence of the Vice President</dgp401:account_name>
  …
  <dgp401:agency_name>Executive Office of the President</dgp401:agency_name>
</rdf:Description>

Raw RDF: http://data-gov.tw.rpi.edu/raw/403/data-403.rdf

Raw Data: http://www.whitehouse.gov/omb/budget/fy2010/assets/receipts.csv
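A minimal sketch of what a "minimal and extensible" conversion could look like: every CSV row becomes one rdf:Description and every column becomes a literal-valued property, with no attempt to interpret the values. The namespace URI bound to dgp401, the header-to-property naming rule, and the function names are assumptions made for illustration; the slides do not show the project's actual converter.

```python
import csv
import io
from xml.sax.saxutils import escape

# Assumed namespace for dataset 401 properties, modeled on the
# vocab/p/92/title URI pattern in the deck; not confirmed by the source.
DGP401 = "http://data-gov.tw.rpi.edu/vocab/p/401/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def column_to_property(header):
    """Turn a column header such as 'Account name' into 'account_name'."""
    return "_".join(header.strip().lower().split())


def csv_to_rdfxml(csv_text):
    """Convert each CSV row into one rdf:Description with literal properties."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = [f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dgp401="{DGP401}">']
    for number, row in enumerate(reader, start=1):
        lines.append(f'  <rdf:Description rdf:about="#entry{number}">')
        for header, value in row.items():
            prop = column_to_property(header)
            # Values stay plain literals; links can be added later.
            lines.append(f"    <dgp401:{prop}>{escape(value)}</dgp401:{prop}>")
        lines.append("  </rdf:Description>")
    lines.append("</rdf:RDF>")
    return "\n".join(lines)


sample = (
    "Account name,Agency name\n"
    '"Donations, Donations for the Official Residence of the Vice President",'
    "Executive Office of the President\n"
)
rdf_xml = csv_to_rdfxml(sample)
print(rdf_xml)
```

Keeping every value as a literal is what makes the conversion cheap; the cost is that "Executive Office of the President" is only a string until a later step links it to a URI.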

Page 14

Linking at Conversion Time: Reuse Property

<rdf:Description rdf:about="#entry840">
  <dgp401:account_name>Defense Vessel Transfer Receipt Account</dgp401:account_name>
  …
  <dgp401:agency_name>Department of Defense--Military</dgp401:agency_name>
</rdf:Description>

Raw RDF: http://data-gov.tw.rpi.edu/raw/402/data-402.rdf

<rdf:Description rdf:about="#entry262">
  <dgp401:account_name>Donations, Donations for the Official Residence of the Vice President</dgp401:account_name>
  …
  <dgp401:agency_name>Executive Office of the President</dgp401:agency_name>
</rdf:Description>

Raw RDF: http://data-gov.tw.rpi.edu/raw/403/data-403.rdf

Page 15

Linking using Semantic Wiki: enrich ontology definition

Property Definition: http://data-gov.tw.rpi.edu/vocab.php?property=92/title

[[rdfs:subPropertyOf::Property:rdfs:label]]

<owl:DatatypeProperty rdf:about="http://data-gov.tw.rpi.edu/vocab/p/92/title">
  <rdfs:label>92/title</rdfs:label>
  <rdfs:subPropertyOf rdf:resource="http://www.w3.org/2000/01/rdf-schema#label"/>
  <rdfs:subPropertyOf rdf:resource="http://xmlns.com/foaf/0.1/name"/>
  <rdfs:subClassOf rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"/>
  …
</owl:DatatypeProperty>

Property Definition: http://data-gov.tw.rpi.edu/wiki/Property:92/title

Page 16

Linking using Semantic Wiki: connect entities using owl:sameAs

[Screenshot labels: wrong Wikipedia name (marked X), correct Wikipedia name]

Page 17

Incremental Data Enhancement

<rdf:Description rdf:about="http://data-gov.tw.rpi.edu/raw/403/data-403.rdf#entry262">
  …
  <agency_name_link rdf:resource="http://data-gov.tw.rpi.edu/vocab/Executive_Office_of_the_President"/>
</rdf:Description>

Enhance raw RDF with links: http://data-gov.tw.rpi.edu/linked/403/agency_403.rdf

Link to DBpedia: http://data-gov.tw.rpi.edu/vocab/Executive_Office_of_the_President

<swivt:Subject rdf:about="http://data-gov.tw.rpi.edu/vocab/Executive_Office_of_the_President">
  <rdfs:label>Executive Office of the President</rdfs:label>
  <rdf:type rdf:resource="http://data-gov.tw.rpi.edu/vocab/c/Agencies_of_the_United_States_government"/>
  <owl:sameAs rdf:resource="http://dbpedia.org/resource/Executive_Office_of_the_President"/>
  …
</swivt:Subject>
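The enhancement step above can be sketched as follows: a literal agency name is turned into a vocab URI, and a curated owl:sameAs table connects that URI to DBpedia. The space-to-underscore rule is inferred from the single example on the slide and is an assumption, as are the function names and the in-memory SAME_AS table (which stands in for links that wiki users would enter).

```python
VOCAB_BASE = "http://data-gov.tw.rpi.edu/vocab/"

# Hand-curated owl:sameAs links, standing in for what wiki users would enter.
SAME_AS = {
    VOCAB_BASE + "Executive_Office_of_the_President":
        "http://dbpedia.org/resource/Executive_Office_of_the_President",
}


def agency_uri(agency_name):
    """Mint a vocab URI from a literal name (assumed rule: spaces -> underscores)."""
    return VOCAB_BASE + "_".join(agency_name.split())


def enhance(entry_uri, agency_name):
    """Return link triples that can be stored next to the raw RDF without rewriting it."""
    uri = agency_uri(agency_name)
    triples = [(entry_uri, "agency_name_link", uri)]
    if uri in SAME_AS:
        triples.append((uri, "owl:sameAs", SAME_AS[uri]))
    return triples


links = enhance(
    "http://data-gov.tw.rpi.edu/raw/403/data-403.rdf#entry262",
    "Executive Office of the President",
)
for triple in links:
    print(triple)
```

Because the links are emitted as separate triples, the raw conversion never has to be regenerated when a mapping is corrected; only the link file changes.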

Page 18

Runtime Linking in Applications

• Link datasets by common literal value

• Link datasets by overlapping time
  – Align multiple time series
  – Let users comment on time series data
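Both kinds of runtime linking can be sketched in a few lines. The first function pairs rows whose fields carry the same literal (after a simple normalization, which is an assumption); the second keeps only the years that two time series share. All field names and values below are made up for illustration and do not come from the datasets in the deck.

```python
def normalize(name):
    """Case- and whitespace-insensitive key for matching literal values."""
    return " ".join(name.lower().split())


def link_by_literal(rows_a, rows_b, key_a, key_b):
    """Pair rows from two datasets whose chosen fields hold the same literal."""
    index = {}
    for row in rows_b:
        index.setdefault(normalize(row[key_b]), []).append(row)
    return [(a, b) for a in rows_a for b in index.get(normalize(a[key_a]), [])]


def align_by_time(series_a, series_b):
    """Keep only years present in both series, as (year, value_a, value_b)."""
    shared = sorted(set(series_a) & set(series_b))
    return [(year, series_a[year], series_b[year]) for year in shared]


# Made-up values, for illustration only.
budget = [{"agency_name": "Executive Office of the President", "amount": 10}]
spending = [{"agency": "executive office of the  president", "awards": 3}]
pairs = link_by_literal(budget, spending, "agency_name", "agency")

series_a = {2005: 1.0, 2006: 2.0, 2007: 3.0}
series_b = {2006: 20.0, 2007: 30.0, 2008: 40.0}
aligned = align_by_time(series_a, series_b)
print(pairs)
print(aligned)
```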

Page 19

Provenance Issues

Page 20

Provenance Annotation

• Descriptions

• Relations

[Diagram nodes: Dataset, Demo, Agency]

Page 22

Results from Revision Provenance

• The number of datasets published at data.gov has tripled since July 2009.
• Dataset updates on data.gov are not limited to additions.
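One way to observe that updates are not limited to additions is to diff two snapshots of the catalog. The sketch below assumes each snapshot is a dictionary from dataset id to metadata; the ids, titles, and function name are hypothetical, and this is not the project's actual revision-tracking code.

```python
def diff_snapshots(old, new):
    """Compare two catalog snapshots given as {dataset_id: metadata dict}."""
    old_ids, new_ids = set(old), set(new)
    return {
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
        "modified": sorted(i for i in old_ids & new_ids if old[i] != new[i]),
    }


# Hypothetical snapshots; ids and titles are invented for illustration.
earlier = {"d1": {"title": "Dataset A"}, "d2": {"title": "Dataset B"}}
later = {"d2": {"title": "Dataset B (revised)"}, "d3": {"title": "Dataset C"}}
changes = diff_snapshots(earlier, later)
print(changes)
```

Here one dataset is added, one is removed, and one is modified in place, which is the pattern the second bullet describes.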

Page 23

Conclusion

Page 24

Conclusion - Observations

Minimal and extensible RDF conversion is useful for generating linked government data in a timely fashion

Literal names are still useful in linking data, especially if we know the context of the data

Social semantic web technologies can help distribute high-cost tasks, e.g. mapping entity names, to the crowd

Provenance is a growing requirement for the transparency of open data applications

Page 25

Conclusions – Ongoing Work: build hub datasets

[Figure: hub dataset diagram. Labels: GOV-BUDGET (1962-2014), PUBLIC-LIB (1992-2006), CASTNET sites, US agency, US location, USAspending (2008-2010), Employment statistics, Medicare cost, IRS annual tax report, DATA-GOV-CATALOG (present), US Census state population, "Blah, blah…". Link types: skos:altLabel, owl:sameAs.]

Page 26

Conclusions – Ongoing Work: Making Sense of LGD

AI + CI!

To appear in Web Sci 2010 conference – co-located with WWW 2010

Page 27

Conclusions – Ongoing Work: incremental knowledge on social semantic web

• A social semantic web website can substantially promote collaboration on knowledge accumulation (ontology as well as instance linkage)

• We need a tradeoff between costly high-quality conversion and ugly minimal conversion

#a dgp92:title “my title”

dgp92:title rdfs:subPropertyOf rdfs:label

#a rdfs:label “my title”

#a skos:prefLabel “my title”

#a foaf:name “my title”

?
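The question mark can be made concrete with a small sketch of the RDFS subPropertyOf rule. It assumes only the two subPropertyOf statements that appear in the 92/title property definition earlier in the deck (rdfs:label and foaf:name); triples are plain Python tuples and the function name is hypothetical. No RDF library is used.

```python
# (subproperty, superproperty) pairs. The first is from this slide, the second
# is taken from the 92/title property definition shown earlier in the deck.
SUB_PROPERTY_OF = {
    ("dgp92:title", "rdfs:label"),
    ("dgp92:title", "foaf:name"),
}


def infer(triples, sub_property_of):
    """Apply the RDFS rule: (s p o) and (p subPropertyOf q) entail (s q o)."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new triple appears (handles chains)
        changed = False
        for s, p, o in list(inferred):
            for sub, sup in sub_property_of:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


facts = {("#a", "dgp92:title", "my title")}
closure = infer(facts, SUB_PROPERTY_OF)
for triple in sorted(closure):
    print(triple)
```

Under these assumptions the rdfs:label and foaf:name triples are derived, while skos:prefLabel is not, because nothing declares dgp92:title a subproperty of it. Someone would have to add that statement on the wiki, which is the kind of incremental enrichment this slide is about.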

Page 28

Conclusions – Ongoing Work: provenance is everywhere

Evaluate issues in exposing provenance data and improve semantic-difference computation.

• provenance vocabulary
• provenance awareness
• provenance reasoning
• provenance mining
• …

Page 29

Ok, it is really the final conclusion

• The data-gov project does not use much AI for now (mostly on the representation side), but even a little semantics goes a long way

• The massive knowledge accumulated in this project is now raising a number of challenges for AI (especially on the computation side)

• Semantic technologies are not far from us: undergraduate students can build a demo quickly!

Page 30

BTW,….

Questions?

Shameless self-promotions
• Link: http://data-gov.tw.rpi.edu/
• “Browsing and Finding Linked Data” by Shangguan this afternoon
• See us at the demo/poster session; we have more exciting demos to show you
• IPAW 2010 (June 2010, Troy, NY) will be looking for late-breaking news from you!