Extending DBpedia (LOD) using WikiTables
![Page 1: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/1.jpg)
Extending DBpedia (LOD) using WikiTables
Emir Muñoz
Unit for Reasoning and Querying
![Page 2: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/2.jpg)
Linked Open Data
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
October 12, 2012 -- E. Muñoz
![Page 3: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/3.jpg)
Linked Open Data
• DBpedia, an export of Wikipedia’s structured data
DBpedia provides an RDF version of all of Wikipedia’s structured data (infoboxes)
![Page 4: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/4.jpg)
Linked Open Data
But there is not yet an RDF version of ordinary Wikipedia tables (wikitables)
![Page 5: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/5.jpg)
Tables as a source of LOD
http://en.wikipedia.org/wiki/Dublin
Caption as another row
Column headers represent types of information
The values represent instances of those types
http://en.wikipedia.org/wiki/Galway
Infoboxes (attr-value)
Tables are inherently concise as well as information rich
![Page 6: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/6.jpg)
Reasoning over Wikipedia Tables
http://en.wikipedia.org/wiki/Dublin
Recovering Table Semantics …
Dublin is twinned with the following places:
![Page 7: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/7.jpg)
Reasoning over Wikipedia Tables
dbpedia.org/resource/San_Jose,_California
dbpedia.org/resource/Liverpool
dbpedia.org/resource/Matsue,_Shimane
dbpedia.org/resource/Barcelona
dbpedia.org/resource/Beijing
dbpedia.org/resource/United_States
dbpedia.org/resource/United_Kingdom
dbpedia.org/resource/Japan
dbpedia.org/resource/Spain
dbpedia.org/resource/People's_Republic_of_China
dbpedia.org/property/city dbpedia.org/property/nation dbpedia.org/property/since
http://en.wikipedia.org/wiki/Dublin
Entity annotation for cells, mappings to DBpedia resources
(xsd:integer)
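The cell-to-resource mapping can be illustrated with a minimal sketch. It assumes that a cell's link points to an English Wikipedia article and that the DBpedia resource URI reuses the article title; the function name `wikipedia_to_dbpedia` is hypothetical, and redirects and disambiguation pages are not handled here.

```python
from urllib.parse import unquote, urlparse

WIKI_PREFIX = "/wiki/"
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(url: str) -> str:
    """Map an English Wikipedia article URL to a candidate DBpedia resource URI.

    Assumption: the DBpedia resource name equals the Wikipedia article title.
    """
    # urlparse keeps any "#section" fragment separate from the path.
    path = urlparse(url).path
    if not path.startswith(WIKI_PREFIX):
        raise ValueError(f"not a Wikipedia article URL: {url}")
    # Decode percent-escapes such as %27 (apostrophe) in the title.
    title = unquote(path[len(WIKI_PREFIX):])
    return DBPEDIA_RESOURCE + title.replace(" ", "_")


print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Dublin"))
```

Numeric cells such as the "since" column would instead be kept as typed literals (xsd:integer) rather than mapped to resources.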
![Page 8: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/8.jpg)
Reasoning over Wikipedia Tables
dbpedia.org/resource/San_Jose,_California
dbpedia.org/resource/Liverpool
dbpedia.org/resource/Matsue,_Shimane
dbpedia.org/resource/Barcelona
dbpedia.org/resource/Beijing
dbpedia.org/resource/United_States
dbpedia.org/resource/United_Kingdom
dbpedia.org/resource/Japan
dbpedia.org/resource/Spain
dbpedia.org/resource/People's_Republic_of_China
(xsd:integer)
dbpedia.org/property/city dbpedia.org/property/nation dbpedia.org/property/since
dbpedia.org/ontology/country dbpedia.org/property/subdivisionName
is dbpedia.org/ontology/country of
http://en.wikipedia.org/wiki/Dublin
Extracting relations
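One way to look up the relations that already hold between two annotated cells is a SPARQL query over DBpedia. The slides do not say how the lookup is implemented, so the sketch below is only an assumption: the helper `relation_query` is hypothetical, and it builds the query text without sending it to any endpoint.

```python
def relation_query(subject: str, obj: str) -> str:
    """Build a SPARQL query listing predicates that link two resources.

    Both directions are checked, because a relation may be stated from the
    country to the city ("is dbpedia.org/ontology/country of").
    """
    return (
        "SELECT DISTINCT ?p ?direction WHERE {\n"
        f"  {{ <{subject}> ?p <{obj}> . BIND('forward' AS ?direction) }}\n"
        "  UNION\n"
        f"  {{ <{obj}> ?p <{subject}> . BIND('inverse' AS ?direction) }}\n"
        "}"
    )


query = relation_query(
    "http://dbpedia.org/resource/Liverpool",
    "http://dbpedia.org/resource/United_Kingdom",
)
print(query)
```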
![Page 9: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/9.jpg)
Reasoning over Wikipedia Tables
• <http://dbpedia.org/resource/San_Jose,_California> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/United_States> .
• <http://dbpedia.org/resource/San_Jose,_California> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_States> .
• <http://dbpedia.org/resource/Liverpool> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/United_Kingdom> .
• <http://dbpedia.org/resource/Liverpool> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_Kingdom> .
• <http://dbpedia.org/resource/Matsue,_Shimane> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/Japan> .
• <http://dbpedia.org/resource/Matsue,_Shimane> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Japan> .
• <http://dbpedia.org/resource/Barcelona> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/Spain> .
• <http://dbpedia.org/resource/Barcelona> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Spain> .
• <http://dbpedia.org/resource/Beijing> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/People's_Republic_of_China> .
• <http://dbpedia.org/resource/Beijing> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/People's_Republic_of_China> .
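The triples above follow a regular pattern: every (city, country) row is combined with every candidate predicate. A minimal sketch of that generation step, assuming the rows have already been annotated with DBpedia resource names (the function `to_ntriples` is hypothetical):

```python
RESOURCE = "http://dbpedia.org/resource/"

# (city, country) pairs taken from the annotated table rows.
ROWS = [
    ("San_Jose,_California", "United_States"),
    ("Liverpool", "United_Kingdom"),
    ("Matsue,_Shimane", "Japan"),
    ("Barcelona", "Spain"),
    ("Beijing", "People's_Republic_of_China"),
]

# Candidate predicates found between the two columns.
PREDICATES = [
    "http://dbpedia.org/property/subdivisionName",
    "http://dbpedia.org/ontology/country",
]


def to_ntriples(rows, predicates):
    """Emit one N-Triples line per (row, predicate) combination."""
    return [
        f"<{RESOURCE}{city}> <{predicate}> <{RESOURCE}{country}> ."
        for city, country in rows
        for predicate in predicates
    ]


triples = to_ntriples(ROWS, PREDICATES)
print("\n".join(triples))
```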
![Page 10: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/10.jpg)
![Page 11: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/11.jpg)
Reasoning over Wikipedia Tables
• Let’s analyze these cases …
• Liverpool
• Matsue
• Beijing
![Page 12: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/12.jpg)
Not that simple…
• Web tables usually don’t have explicit semantics by themselves.
• Main issues:
– Complex tables with spans
– Captions inside the table as another row
– Not well-formed tables (i.e., not a matrix)
– We need filters (e.g., at least 2 columns and 2 rows)
• We extract relations at row level, and also between the article’s main entity and the resources in the table
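The filter mentioned above can be sketched as follows. The table representation (a list of rows, each a list of cell strings, with spans left unexpanded) and the name `passes_filter` are assumptions made for illustration; the thresholds come from the example on the slide.

```python
from typing import List

Table = List[List[str]]


def passes_filter(table: Table, min_rows: int = 2, min_cols: int = 2) -> bool:
    """Keep only tables that are large enough and form a proper matrix."""
    if len(table) < min_rows:
        return False
    width = len(table[0])
    if width < min_cols:
        return False
    # A well-formed table has the same number of cells in every row.
    return all(len(row) == width for row in table)


good = [["City", "Nation"], ["Liverpool", "United Kingdom"]]
ragged = [["City", "Nation"], ["Liverpool"]]
print(passes_filter(good), passes_filter(ragged))
```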
![Page 13: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/13.jpg)
Parsing: Extracting Tables
http://en.wikipedia.org/wiki/People%27s_Republic_of_China
Caption as another row
Table split
Rowspans with pictures
First step: parsing Wiki format
![Page 14: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/14.jpg)
Parsing: Extracting Tables
• Problems with parsing the cell’s content
http://en.wikipedia.org/wiki/Danny_Kaye
![Page 15: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/15.jpg)
![Page 16: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/16.jpg)
Parsing: Extracting Tables
Same-page links
Many different formats
Anchor text vs. content text
http://en.wikipedia.org/wiki/List_of_animated_television_series_of_the_1990s
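The distinction between link target and anchor text can be made concrete with a small sketch. It assumes cells are available as wiki markup using the standard `[[Target#Section|label]]` link syntax; the regular expression, the `WikiLink` type and the sample cell are illustrative only and do not cover templates or nested markup.

```python
import re
from typing import List, NamedTuple


class WikiLink(NamedTuple):
    target: str   # linked page title; empty for a same-page link
    section: str  # text after '#', if any
    anchor: str   # text shown to the reader


LINK_RE = re.compile(r"\[\[([^\[\]|#]*)(?:#([^\[\]|]*))?(?:\|([^\[\]]*))?\]\]")


def extract_links(cell: str) -> List[WikiLink]:
    """Return all internal links found in one table cell."""
    links = []
    for match in LINK_RE.finditer(cell):
        target = match.group(1).strip()
        section = (match.group(2) or "").strip()
        # Without an explicit label, the target (or section) is displayed.
        anchor = (match.group(3) or "").strip() or target or section
        links.append(WikiLink(target, section, anchor))
    return links


cell = "[[Dublin]], [[San Jose, California|San Jose]] (see [[#Notes|note]])"
for link in extract_links(cell):
    print(link)
```

A link with an empty `target` points to a section of the same page, so it should not be mapped to a separate DBpedia resource.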
![Page 17: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/17.jpg)
Extracting Relations
A table containing tables
http://en.wikipedia.org/wiki/AFC_Ajax
![Page 18: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/18.jpg)
Extracting Relations
• Also relations between the main entity and the entities in the table
dbpedia.org/resource/AFC_Ajax
14 dbpedia.org/ontology/team
14 dbpedia.org/property/clubs
11 dbpedia.org/property/currentclub
3 dbpedia.org/property/youthclubs
His DBpedia page makes no mention of AFC Ajax
http://en.wikipedia.org/wiki/AFC_Ajax
16 players
![Page 19: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/19.jpg)
dbpedia.org/resource/Christian_Eriksen
Disambiguation page dbpedia.org/resource/Ajax
http://en.wikipedia.org/wiki/AFC_Ajax
![Page 20: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/20.jpg)
Our Dataset
• enwiki dump from 2012-09-03 02:17:37
• 8.6 GB of Wikipedia pages that comprise
– 10,531,986 documents (HTML pages)
– Only 413,256 HTML pages contain tables
– 2,989,098 tables
– 905,929 tables after the filter
• 27.7% of all tables
– 0.46 tables per page (or 2.15 discarding pages without tables)
![Page 21: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/21.jpg)
Methodology
![Page 22: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/22.jpg)
Ranking of Relationships
• The current ranking function is naïve
http://en.wikipedia.org/wiki/AFC_Ajax
16 players
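| freq | relationship | score |
|------|--------------|-------|
| 14 | dbpedia.org/ontology/team | 0.875 |
| 14 | dbpedia.org/property/clubs | 0.875 |
| 11 | dbpedia.org/property/currentclub | 0.6875 |
| 3 | dbpedia.org/property/youthclubs | 0.1875 |

$$\mathit{score} = \frac{f_{rel}}{n_{rows}}$$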
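The naïve score is the frequency of a relation divided by the number of rows in the table. A short sketch reproducing the AFC Ajax figures (16 players); the function name `rank_relations` is hypothetical:

```python
from typing import Dict, List, Tuple


def rank_relations(freq: Dict[str, int], n_rows: int) -> List[Tuple[str, float]]:
    """Score each relation as frequency / number of rows, highest first."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    scored = [(relation, count / n_rows) for relation, count in freq.items()]
    # sorted() is stable, so ties keep their original order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ajax_freq = {
    "dbpedia.org/ontology/team": 14,
    "dbpedia.org/property/clubs": 14,
    "dbpedia.org/property/currentclub": 11,
    "dbpedia.org/property/youthclubs": 3,
}

ranking = rank_relations(ajax_freq, n_rows=16)
for relation, score in ranking:
    print(f"{score:.4f}  {relation}")
```

Nothing in this function bounds the result: whenever a relation is counted more often than there are rows, the score exceeds 1.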
![Page 23: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/23.jpg)
Ranking of Relationships
• For cases like this one the function does not work well, and the score can fall outside [0, 1]
http://en.wikipedia.org/wiki/Danny_Kaye
![Page 24: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/24.jpg)
Ongoing Work and Challenges
• Improve the ranking function for relations.
• Store the 5.5M DBpedia (transitive) redirects locally to reduce lookup time.
• Statistical analysis of Wikipedia tables
– Number of columns, rows
– Headers, Captions
– External and internal links
• The next big challenge is the evaluation.
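Resolving transitive redirects against a local store could look like the sketch below. It assumes the redirects have been loaded into an in-memory dictionary from source URI to target URI; the function name and the sample entries are hypothetical placeholders, not real DBpedia data.

```python
from typing import Dict


def resolve_redirect(uri: str, redirects: Dict[str, str]) -> str:
    """Follow a chain of redirects until a non-redirecting URI is reached."""
    seen = {uri}
    while uri in redirects:
        uri = redirects[uri]
        if uri in seen:
            # Guard against redirect cycles in the data.
            break
        seen.add(uri)
    return uri


# Hypothetical redirect chain: Old_Name -> Newer_Name -> Final_Name.
sample_redirects = {
    "ex:Old_Name": "ex:Newer_Name",
    "ex:Newer_Name": "ex:Final_Name",
}

print(resolve_redirect("ex:Old_Name", sample_redirects))
```

Keeping the map local avoids one remote lookup per cell, which matters when millions of cells have to be annotated.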
![Page 25: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/25.jpg)
What’s next?
• Some ideas in mind:
– Use the extracted relations to classify WikiTables
– Define a similarity function for WikiTables
English Italian
![Page 26: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/26.jpg)
What’s next?
http://en.wikipedia.org/wiki/Electronegativity
What does this number mean?
There is no reference to those numbers here!
![Page 27: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/27.jpg)
What’s next?
http://en.wikipedia.org/wiki/Electronegativity
http://en.wikipedia.org/wiki/Chlorine
Chlorous acid is a chlorite
http://dbpedia.org/page/Chlorous_acid
![Page 28: Extending DBpedia (LOD) using WikiTables](https://reader034.fdocuments.in/reader034/viewer/2022052522/554975c9b4c905d8558b5844/html5/thumbnails/28.jpg)
Open problems
• Handle multiple entities in the same cell
• Improve the ranking function
• Handle redirects before querying DBpedia
• How to evaluate the outcome
Thanks! Q & A
Emir Muñoz
Unit for Reasoning and Querying