Linking the Open Data? by Petko Valtchev
![Page 1: Linking the Open Data? by Petko Valtchev](https://reader033.fdocuments.in/reader033/viewer/2022061112/5459153caf795953128b4c45/html5/thumbnails/1.jpg)
Linking the Open Data?
Petko Valtchev (Assoc. Prof., Dept. of CS, UQAM)
ODX’13, Montreal, April 6th
![Page 2: Linking the Open Data? by Petko Valtchev](https://reader033.fdocuments.in/reader033/viewer/2022061112/5459153caf795953128b4c45/html5/thumbnails/2.jpg)
Why Link The Data
“I want you to put your data on the Web.”
Sir T. Berners-Lee (TED’07)
•Original Web (1990s):
• network of linked documents
•Web of Data (2000s):
• network of interlinked data items
•Linked Open Data: Publish data on the Web:
• max. reuse and inter-connections, min. redundancy, network effect
Data becomes really useful when it is shared and combined with other data.
![Page 3: Linking the Open Data? by Petko Valtchev](https://reader033.fdocuments.in/reader033/viewer/2022061112/5459153caf795953128b4c45/html5/thumbnails/3.jpg)
Linking Data?
• But how should one produce such data?
1. Global identification: every data item should be identified by a URL.
2. Reachability via HTTP: accessing the URL should retrieve the data item.
3. Linked structure: outgoing links (typed!) in the data should point to additional data with URLs.
• THE language: Resource Description Framework (RDF)
• Benefits: links provide context
http://www.w3.org/DesignIssues/LinkedData.html
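The three rules can be illustrated with a minimal sketch in Python, assuming a statement is written as a (subject, predicate, object) triple whose terms are prefixed names standing for full URLs. The `rdf`, `foaf` and `dbpedia` namespaces are the commonly published ones; the `pd` namespace and the `expand` helper are hypothetical and only serve the illustration.

```python
# Prefix table mapping short names to URL namespaces.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dbpedia": "http://dbpedia.org/resource/",
    "pd": "http://example.org/people/",  # assumption: placeholder namespace
}


def expand(prefixed_name: str) -> str:
    """Turn a prefixed name such as 'foaf:name' into a full URL."""
    prefix, _, local_name = prefixed_name.partition(":")
    return PREFIXES[prefix] + local_name


# One typed link: the predicate states HOW the two items are related.
triple = ("pd:tedstr", "foaf:based_near", "dbpedia:Montreal")
expanded = tuple(expand(term) for term in triple)

for term in expanded:
    print(term)
```

Each expanded term is a URL, so in a real deployment a client could issue an HTTP request for it and retrieve further statements about that item.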
![Page 4: Linking the Open Data? by Petko Valtchev](https://reader033.fdocuments.in/reader033/viewer/2022061112/5459153caf795953128b4c45/html5/thumbnails/4.jpg)
A Graph?
• pd:tedstr rdf:type foaf:Person
• pd:tedstr foaf:name "Ted Strauss"
• pd:tedstr foaf:based_near dbpedia:Montreal
• dbpedia:Montreal dpprop:population 3,407,963
![Page 5: Linking the Open Data? by Petko Valtchev](https://reader033.fdocuments.in/reader033/viewer/2022061112/5459153caf795953128b4c45/html5/thumbnails/5.jpg)
A Graph?
• pd:tedstr rdf:type foaf:Person
• pd:tedstr foaf:name "Ted Strauss"
• pd:tedstr foaf:based_near dbpedia:Montreal
• dbpedia:Montreal dpprop:population 3,407,963
• dbpedia:Montreal dbpedia-owl:country dbpedia:Canada
![Page 6: Linking the Open Data? by Petko Valtchev](https://reader033.fdocuments.in/reader033/viewer/2022061112/5459153caf795953128b4c45/html5/thumbnails/6.jpg)
A Graph? Global?
• pd:tedstr rdf:type foaf:Person
• pd:tedstr foaf:name "Ted Strauss"
• pd:tedstr foaf:based_near dbpedia:Montreal
• dbpedia:Montreal dpprop:population 3,407,963
• dbpedia:Montreal dbpedia-owl:country dbpedia:Canada
• pd:linguo rdf:type foaf:Person
• pd:linguo foaf:name "Linkun Guo"
• pd:linguo foaf:based_near dbpedia:Beijing
• dbpedia:Beijing dpprop:population 20,693,000
• foaf:knows links pd:tedstr and pd:linguo
![Page 7: Linking the Open Data? by Petko Valtchev](https://reader033.fdocuments.in/reader033/viewer/2022061112/5459153caf795953128b4c45/html5/thumbnails/7.jpg)
A Graph? Global? Giant?
• pd:tedstr rdf:type foaf:Person
• pd:tedstr foaf:name "Ted Strauss"
• pd:tedstr foaf:based_near dbpedia:Montreal
• dbpedia:Montreal dpprop:population 3,407,963
• dbpedia:Montreal dbpedia-owl:country dbpedia:Canada
• pd:linguo rdf:type foaf:Person
• pd:linguo foaf:name "Linkun Guo"
• pd:linguo foaf:based_near dbpedia:Beijing
• dbpedia:Beijing dpprop:population 20,693,000
• foaf:knows links pd:tedstr and pd:linguo
• dbpedia:Toronto dbpedia-owl:country dbpedia:Canada
• dbpedia:Quebec dbpedia-owl:country dbpedia:Canada
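Why the graph becomes "global" can be sketched in a few lines of Python, assuming each source publishes a set of (subject, predicate, object) triples. Because both sources use the same identifier for Montreal, a plain set union is enough to connect them. The helper names `objects` and `follow` are illustrative and not part of any library.

```python
# Statements published on a personal page (first source).
local_graph = {
    ("pd:tedstr", "rdf:type", "foaf:Person"),
    ("pd:tedstr", "foaf:name", "Ted Strauss"),
    ("pd:tedstr", "foaf:based_near", "dbpedia:Montreal"),
}

# Statements published by DBpedia (second source).
dbpedia_graph = {
    ("dbpedia:Montreal", "dbpedia-owl:country", "dbpedia:Canada"),
    ("dbpedia:Toronto", "dbpedia-owl:country", "dbpedia:Canada"),
}

# Merging is a set union: shared identifiers join the two graphs.
merged = local_graph | dbpedia_graph


def objects(graph, subject, predicate):
    """All objects reachable from `subject` over one `predicate` edge."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def follow(graph, start, path):
    """Follow a sequence of predicates starting from one node."""
    frontier = {start}
    for predicate in path:
        frontier = {o for node in frontier for o in objects(graph, node, predicate)}
    return frontier


countries = follow(merged, "pd:tedstr", ["foaf:based_near", "dbpedia-owl:country"])
print(countries)
```

The same path query on `local_graph` alone returns an empty set, since the country statement only exists in the second source.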
![Page 8: Linking the Open Data? by Petko Valtchev](https://reader033.fdocuments.in/reader033/viewer/2022061112/5459153caf795953128b4c45/html5/thumbnails/8.jpg)
How is it Open?
• ‘‘If you want to start interlinking data then you can only do that if the data is licensed in a way that allows such interlinking.’’ (Rufus Pollock)
• But why is Open data on the Web not ‘linked’?
• CSV, XML, RDBs
• no easy integration
• Web 2.0 Mashups?
• data sources fixed
• Linked Open Data (LOD) cloud - global data space
![Page 9: Linking the Open Data? by Petko Valtchev](https://reader033.fdocuments.in/reader033/viewer/2022061112/5459153caf795953128b4c45/html5/thumbnails/9.jpg)
The LOD cloud family picture
Sept. 2011
![Page 10: Linking the Open Data? by Petko Valtchev](https://reader033.fdocuments.in/reader033/viewer/2022061112/5459153caf795953128b4c45/html5/thumbnails/10.jpg)
What for?
• Linking Open Drug Data (LODD), since 2008
• Publish/interlink publicly available data about drugs
• Provide answers to non-trivial questions on the LODD
• For physicians
• Which are the equivalent drugs for a given condition?
• What drugs are currently under clinical trial?
• For patients
• What alternatives exist to a given drug?
• What are the contraindications for a drug?
![Page 11: Linking the Open Data? by Petko Valtchev](https://reader033.fdocuments.in/reader033/viewer/2022061112/5459153caf795953128b4c45/html5/thumbnails/11.jpg)
Supplemental Slides
Petko Valtchev
(Assoc. Prof., Dept. of CS, UQAM)
ODX’13
Montreal, April 6th
![Page 12: Linking the Open Data? by Petko Valtchev](https://reader033.fdocuments.in/reader033/viewer/2022061112/5459153caf795953128b4c45/html5/thumbnails/12.jpg)
Main Entry Points into the LOD cloud
• DBpedia - a large multi-domain dataset extracted from Wikipedia; it contains about 3.77M concepts and 400+M facts, with abstracts in 11 different languages.
• YAGO - precise knowledge base with 1.7M entities and 15M facts derived from Wikipedia and WordNet.
• FOAF (Friend Of A Friend) - describes people, the links between them and the things they create and do.
• GoodRelations - a vocabulary for eCommerce, enabling web sites to publish details of their products and services in a machine-readable way.
• GeoNames - provides RDF descriptions of more than 6.5M geographical features worldwide.
![Page 13: Linking the Open Data? by Petko Valtchev](https://reader033.fdocuments.in/reader033/viewer/2022061112/5459153caf795953128b4c45/html5/thumbnails/13.jpg)
Cross-Media Cultural Heritage Management with LOD
• Simon is a Maths student visiting Montreal. He is fond of reading, cinema, music and history. His friends recommended the flourishing Mile End district, where many cafés serve espresso and European pastries.
• Once settled in a bar, he opens his iPad to see what is exciting about the surroundings. Knowing his preferences, the mobile app suggests an excerpt from "The Apprenticeship of Duddy Kravitz", a novel by the local "enfant du quartier", Mordecai Richler. The excerpt describes the life of the Jewish community on two of the area's principal streets, St. Urbain St. and "The Main", in the 1930s.
• Once finished, Simon feels intrigued and accepts the suggestion to go for a short walk looking for remains from that period. While sipping his coffee, Simon checks the author's biography and finds he has written another book, "Barney's Version".
• After he skims a summary, the app suggests the eponymous film directed by Richard J. Lewis. While watching a trailer, he notices the youthful red-haired actress playing the first wife of the main character; after querying the app’s knowledge base, he learns that she is Rachelle Lefevre, who was born in Montreal.
• Before walking out, he checks the availability of a copy of "Barney's Version" and discovers that he can find one in the local municipal library.
• Once he is on the go, the system plays "I'm Your Man", a song by Leonard Cohen, another literary celebrity from Montreal.
![Page 14: Linking the Open Data? by Petko Valtchev](https://reader033.fdocuments.in/reader033/viewer/2022061112/5459153caf795953128b4c45/html5/thumbnails/14.jpg)
The Semantic Annotations: RDFa
• RDFa serializes RDF through HTML attributes
• similar to microformats
• @resource, @property, @href, @typeof, @rel, etc.
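How statements can be read back out of annotated HTML is sketched below with Python's standard `html.parser` module. This is a deliberately simplified reading of RDFa, not a conforming processor: it treats the most recent `about` attribute as the subject and ignores nesting, prefixes and most of the processing rules. The sample page and the class name `TinyRDFaParser` are hypothetical.

```python
from html.parser import HTMLParser


class TinyRDFaParser(HTMLParser):
    """Collects (subject, predicate, object) triples from a tiny RDFa subset."""

    def __init__(self):
        super().__init__()
        self.subject = None           # set by the latest @about (simplification)
        self.pending_property = None  # @property waiting for its text value
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "about" in attributes:
            self.subject = attributes["about"]
        if "typeof" in attributes:
            self.triples.append((self.subject, "rdf:type", attributes["typeof"]))
        target = attributes.get("resource") or attributes.get("href")
        if "rel" in attributes and target:
            # Typed link to another resource.
            self.triples.append((self.subject, attributes["rel"], target))
        elif "property" in attributes:
            # Literal value: taken from the element's text content.
            self.pending_property = attributes["property"]

    def handle_data(self, data):
        text = data.strip()
        if self.pending_property and text:
            self.triples.append((self.subject, self.pending_property, text))
            self.pending_property = None


page = """
<div about="http://example.org/people/tedstr" typeof="foaf:Person">
  <span property="foaf:name">Ted Strauss</span>
  <a rel="foaf:based_near" href="http://dbpedia.org/resource/Montreal">Montreal</a>
</div>
"""

parser = TinyRDFaParser()
parser.feed(page)
parser.close()

for triple in parser.triples:
    print(triple)
```

The page stays readable for humans, while the attributes carry the same statements for machines.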
![Page 15: Linking the Open Data? by Petko Valtchev](https://reader033.fdocuments.in/reader033/viewer/2022061112/5459153caf795953128b4c45/html5/thumbnails/15.jpg)
Cool applications of semantic annotations
• Semantic query answering:
• Where do my colleagues live?
• Possible answers from their own web pages (via Trudat HP)
• dbpedia:Montreal
• dbpedia:Laval
• dbpedia:Toronto
• What are their dietary restrictions?
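A question such as "Where do my colleagues live?" can be sketched as pattern matching over triples, in the spirit of a SPARQL basic graph pattern. The Python below is a toy matcher, not a SPARQL engine. The identifiers `pd:me` and `pd:colleague1` to `pd:colleague3` are hypothetical, and `foaf:knows` stands in for a "colleague" relation, which is an assumption.

```python
graph = {
    ("pd:me", "foaf:knows", "pd:colleague1"),
    ("pd:me", "foaf:knows", "pd:colleague2"),
    ("pd:me", "foaf:knows", "pd:colleague3"),
    ("pd:colleague1", "foaf:based_near", "dbpedia:Montreal"),
    ("pd:colleague2", "foaf:based_near", "dbpedia:Laval"),
    ("pd:colleague3", "foaf:based_near", "dbpedia:Toronto"),
}


def match_pattern(triple, pattern, bindings):
    """Extend `bindings` so that `pattern` matches `triple`, or return None."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):          # variable: bind it or check consistency
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:               # constant: must match exactly
            return None
    return extended


def query(graph, patterns):
    """Return all variable bindings that satisfy every pattern."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in graph
            if (extended := match_pattern(triple, pattern, bindings)) is not None
        ]
    return solutions


results = query(graph, [
    ("pd:me", "foaf:knows", "?who"),
    ("?who", "foaf:based_near", "?city"),
])
cities = sorted(solution["?city"] for solution in results)
print(cities)
```

The shared variable `?who` is what joins the two patterns, just as a shared identifier joins two data sources.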
![Page 16: Linking the Open Data? by Petko Valtchev](https://reader033.fdocuments.in/reader033/viewer/2022061112/5459153caf795953128b4c45/html5/thumbnails/16.jpg)
Practical take on OD vs LOD
• OD for social justice in US (say Atlanta)?
• Dataset 1: census data
• Focus on a particular area, with houses distinguished by whether they are inhabited by black people or white people
• Dataset 2: water supply data, houses connected to water lines or not
• By superposing datasets 1 and 2, the analysis uncovered discrimination
• ~83 % of the unconnected houses were inhabited by black people!!!
• How was it done (a guess)
• matching addresses by comparing them as strings :-(
• LOD format - simpler and more reliable processing:
• finding paths in the graph
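The difference between the two approaches can be sketched in Python with made-up rows. All addresses, house identifiers and values below are hypothetical; the point is only that exact string comparison breaks on spelling variants, while a shared identifier does not.

```python
# Dataset 1 (census): (address as published, house identifier).
census_rows = [
    ("12 Main St.", "ex:house/1"),
    ("34 Oak Avenue", "ex:house/2"),
]

# Dataset 2 (water supply): (address as published, house identifier, connected?).
water_rows = [
    ("12 Main Street", "ex:house/1", False),
    ("34 Oak Ave", "ex:house/2", True),
]

# Open Data approach: join on the address strings.
water_addresses = {address for address, _, _ in water_rows}
matched_by_string = [address for address, _ in census_rows if address in water_addresses]

# Linked Open Data approach: join on the shared house identifier.
connected_by_uri = {uri: connected for _, uri, connected in water_rows}
matched_by_uri = [uri for _, uri in census_rows if uri in connected_by_uri]
unconnected = [uri for uri in matched_by_uri if not connected_by_uri[uri]]

print(matched_by_string)
print(matched_by_uri)
print(unconnected)
```

With these rows the string join finds no matches at all, while the identifier join links both houses and reports the one without a water connection.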
![Page 17: Linking the Open Data? by Petko Valtchev](https://reader033.fdocuments.in/reader033/viewer/2022061112/5459153caf795953128b4c45/html5/thumbnails/17.jpg)
Data about the Data
• Reasoning about the dataset:
• Metadata:
• e.g. Dublin Core vocabulary
• Notion of provenance
• The problem of trust: anybody can publish anything
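One way to picture how provenance metadata helps with trust is the Python sketch below, where each statement carries a publisher annotated with the Dublin Core term `dcterms:publisher`. The publisher identifiers, the second population figure and the `TRUSTED_PUBLISHERS` set are all made up for illustration; real systems use richer provenance models.

```python
# Each statement is stored together with metadata about who published it.
statements = [
    {
        "triple": ("dbpedia:Montreal", "dpprop:population", "3407963"),
        "dcterms:publisher": "ex:dbpedia",       # hypothetical identifier
    },
    {
        "triple": ("dbpedia:Montreal", "dpprop:population", "9999999"),
        "dcterms:publisher": "ex:unknown-site",  # hypothetical, made-up value
    },
]

# Assumption: the consumer decides which publishers to trust.
TRUSTED_PUBLISHERS = {"ex:dbpedia"}

trusted_triples = [
    statement["triple"]
    for statement in statements
    if statement["dcterms:publisher"] in TRUSTED_PUBLISHERS
]

print(trusted_triples)
```

Without the publisher annotation the two conflicting population values would be indistinguishable.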