Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
Weaving the Pedantic Web
LDOW 2010
Aidan Hogan, Andreas Harth, Alexandre Passant, Stefan Decker, Axel Polleres
Linked Data…
Purpose of talk: Application developers… how to not sink…
Purpose of talk: RDF Publishers… how to avoid common mistakes…
Talking about errors in Linked Data…
We’ll try not to ruin the party
…statistics based on crawl:
April 2009
5k domain limit
150k URIs, 55k RDF docs
12.5m triples (quads)
Mentioning 1.6m URIs
5,850 classes/9,507 props
Accept: application/rdf+xml
…okay… so no RDFa
Statistics are *illustrative* not exhaustive!
Chapter 1: HTTP-level issues… …a good RDF description these days is hard to find
Waldo URIs: URIs with no dereferenceable RDF
Not a crawler’s idea of fun…
Hmm not *so* many…
5.3% of HTTP URIs return 40x/50x
Excluding redirects… 92.8% return 200 OK
In return, only 45.4% of 200 OK responses report application/rdf+xml
34.8% return HTML… probably just HTML docs… okay… maybe a *few* contain RDFa
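A minimal sketch of how a crawler might bucket dereferencing results along the lines of these statistics. The function names and bucket labels are illustrative assumptions, not the tooling used for the crawl; network failures other than HTTP errors are left unhandled.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

RDF_XML = "application/rdf+xml"


def classify_response(status: int, content_type: str) -> str:
    """Bucket a dereferencing result (labels are hypothetical)."""
    if status >= 400:
        return "error"  # 40x/50x: no description available
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == RDF_XML:
        return "rdf+xml"
    if media_type in ("text/html", "application/xhtml+xml"):
        return "html"  # might still embed RDFa
    return "other"


def dereference(uri: str, timeout: float = 10.0) -> str:
    """Fetch `uri` asking for RDF/XML and classify what came back.

    urlopen follows redirects, so the final response is classified.
    """
    request = Request(uri, headers={"Accept": RDF_XML})
    try:
        with urlopen(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            return classify_response(response.status, content_type)
    except HTTPError as err:
        return classify_response(err.code, "")
```

Keeping the classification separate from the fetch makes the bucketing testable without touching the network.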
Lies… Damned Lies… & Content-Type Reporting
“Trust me, it’s RDF/XML”
Okay… So he’s actually pretty honest
16.9% of valid RDF/XML documents returned with an invalid/more generic Content-type:
text/xml (9.5%)
application/xml (5.9%)
text/plain (1%)
text/html (0.4%)
Of those returning Content-type: application/rdf+xml, 98.8% were valid RDF/XML
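One way an application could notice RDF/XML served under a generic Content-type is to sniff the body. The sketch below is a rough heuristic under stated assumptions: it only checks for a well-formed XML document whose root element is rdf:RDF, so it misses RDF/XML documents that omit that wrapper and does not validate the RDF itself. All names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# The generic types listed on the slide.
GENERIC_TYPES = {"text/xml", "application/xml", "text/plain", "text/html"}


def looks_like_rdf_xml(body: str) -> bool:
    """Heuristic: well-formed XML whose root element is rdf:RDF."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    return root.tag == "{%s}RDF" % RDF_NS


def content_type_mismatch(declared: str, body: str) -> bool:
    """True when RDF/XML-looking content is served under a generic type."""
    media_type = declared.split(";")[0].strip().lower()
    return media_type in GENERIC_TYPES and looks_like_rdf_xml(body)
```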
I wish they’d used a redirect…
Same triples, different document
E.g., the Miracle at Calais: turning 1,778 triples into ~∞ quads
http://d.opencalais.com/1/type/em/r/SameTriplesDifferentDocument
(apologies to OpenCalais guys – it’s just a convenient example)
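An application-side workaround could be to fingerprint the triples of each fetched document and group documents whose content is identical. This is only a sketch: the function names are made up for illustration, and it assumes triples are plain string tuples with no blank nodes (blank-node labels would need canonicalising first).

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def triple_fingerprint(triples: Iterable[Triple]) -> str:
    """Order-independent digest of a set of triples."""
    digest = hashlib.sha1()
    for triple in sorted(set(triples)):
        digest.update("\x00".join(triple).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def group_duplicates(documents: Dict[str, List[Triple]]) -> List[List[str]]:
    """Return groups of document URIs whose triple sets are identical."""
    by_fingerprint: Dict[str, List[str]] = {}
    for uri, triples in documents.items():
        by_fingerprint.setdefault(triple_fingerprint(triples), []).append(uri)
    return [sorted(uris) for uris in by_fingerprint.values() if len(uris) > 1]
```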
Chapter 2: Reasoning issues… …or, how I learned to start worrying and stop loving OWL
It looks important, but I’m afraid I don’t fully follow
Undefined classes and properties…
Quite common…
14.3% of triples use undeclared property
8.1% of triples use undeclared class
Three cases:
Case 1: Namespace has no vocabulary/is not dereferenceable (e.g., rss:item)
Case 2: Term invented in related namespace (e.g., foaf:tagLine invented by LiveJournal)
Case 3: Term is misspelt version of term defined in namespace (e.g., foaf:image vs. foaf:img)
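Case 3 lends itself to a "did you mean" check. The sketch below uses Python's standard difflib to suggest the closest defined term for an undeclared local name; the term list is a small hand-picked subset of FOAF for illustration, and the 0.6 cutoff is an arbitrary assumption.

```python
import difflib
from typing import Optional, Sequence

# A few terms that FOAF does define (deliberately not the full vocabulary).
FOAF_TERMS = ["img", "name", "nick", "knows", "mbox",
              "mbox_sha1sum", "homepage", "depiction"]


def suggest_term(local_name: str,
                 defined_terms: Sequence[str] = FOAF_TERMS) -> Optional[str]:
    """Return local_name if defined, else the closest defined term, if any."""
    if local_name in defined_terms:
        return local_name
    matches = difflib.get_close_matches(local_name, defined_terms,
                                        n=1, cutoff=0.6)
    return matches[0] if matches else None
```

With this list, `suggest_term("image")` proposes `img`, mirroring the foaf:image vs. foaf:img example.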
Despite what you claim, not all of you can *actually be* Spartacus
Not-so-unique values for Inverse-Functional Properties
Spartacus relived…
08445a31a78661b5c746feff39a9db6e4e2cc5cf
sha1-sum of ‘mailto:’ common value for foaf:mbox_sha1sum
An inverse-functional (uniquely identifying) property!!!
Any person who shares the same value will be considered the same
*I’m Spartacus!*…and so’s my wife
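On the publishing side, the shared value typically arises when an exporter hashes the bare `mailto:` prefix because the user supplied no email address. A minimal sketch of a guard, with a hypothetical helper name, that refuses to emit a digest in that case:

```python
import hashlib
from typing import Optional


def mbox_sha1sum(email: Optional[str]) -> Optional[str]:
    """SHA-1 of the mailto: URI, or None when there is no address to hash."""
    address = (email or "").strip()
    if not address:
        return None  # refuse to emit the shared 'mailto:' digest
    return hashlib.sha1(("mailto:" + address).encode("utf-8")).hexdigest()


# The digest an exporter produces when it hashes a bare 'mailto:' prefix.
EMPTY_MAILTO_SHA1 = hashlib.sha1(b"mailto:").hexdigest()
```

A consumer can likewise compare incoming foaf:mbox_sha1sum values against `EMPTY_MAILTO_SHA1` and drop matches before reasoning.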
…unattended, can be pretty serious…
foaf:mbox_sha1sum a owl:InverseFunctionalProperty .
?x foaf:mbox_sha1sum 08445a31a78661b5c746feff39a9db6e4e2cc5cf .
OWL 2 RL rule prp-ifp:
?p a owl:InverseFunctionalProperty . ?x1 ?p ?z . ?x2 ?p ?z .
⇒ ?x1 owl:sameAs ?x2 .
10^6 ?x1/?x2 bindings in body
10^12 inferred pair-wise and reflexive owl:sameAs statements
…or in simpler terms: pow!
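To see why the blow-up is quadratic, here is a small sketch of the prp-ifp rule over in-memory triples. The representation (string tuples, prefixed names) and the function name are assumptions for illustration, not a real reasoner.

```python
from collections import defaultdict
from itertools import product
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "owl:sameAs"


def apply_prp_ifp(triples: Iterable[Triple], ifps: Set[str]) -> List[Triple]:
    """Infer owl:sameAs between subjects sharing a value for an
    inverse-functional property."""
    subjects_by_key = defaultdict(set)
    for s, p, o in triples:
        if p in ifps:
            subjects_by_key[(p, o)].add(s)
    inferred = set()
    for subjects in subjects_by_key.values():
        # Every ordered pair, including (x, x): n subjects yield n * n statements.
        for x1, x2 in product(subjects, repeat=2):
            inferred.add((x1, SAME_AS, x2))
    return sorted(inferred)
```

Three subjects sharing one value already produce nine statements; the count grows with the square of the number of subjects.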
As he would undoubtedly be able to tell you, “true” is not a valid xsd:int
Malformed/incompatible datatypes
Not *too* bad…
4.7% of typed literals were “ill-typed” (lexically invalid)… mostly xsd:dateTimes (26.4% of all date-time literals were invalid; e.g., omitted the seconds field)
Also, literals are sometimes incompatible with the datatype-range of a property:
E.g., 21.8% of ical:description triples used language tags incompatible with the defined range of xsd:string
E.g., 100% of sl:creationDate triples use plain literal values incompatible with defined range of xsd:date
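A lexical check of this kind can be sketched with regular expressions. The patterns below are simplified assumptions: the xsd:dateTime pattern does not range-check months, days, or hours, and only two datatypes are covered.

```python
import re

# Simplified lexical form of xsd:dateTime: seconds are mandatory,
# fractional seconds and a timezone are optional.
DATETIME_RE = re.compile(
    r"-?[0-9]{4,}-[0-9]{2}-[0-9]{2}"
    r"T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?"
    r"(Z|[+-][0-9]{2}:[0-9]{2})?"
)
INT_RE = re.compile(r"[+-]?[0-9]+")


def is_valid_datetime(lexical: str) -> bool:
    """True if the string matches the simplified xsd:dateTime pattern."""
    return DATETIME_RE.fullmatch(lexical) is not None


def is_valid_int(lexical: str) -> bool:
    """xsd:int: an optionally signed decimal integer in the 32-bit range."""
    if INT_RE.fullmatch(lexical) is None:
        return False
    return -2**31 <= int(lexical) <= 2**31 - 1
```

Under these patterns a date-time that omits the seconds field is rejected, as is the literal "true" typed as xsd:int.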
Despite what FOAF says, it seems that Persons can also be Documents
Mystical beings… Members of disjoint classes
Again, not *too* bad…
1,329 members of disjoint classes found
Generally caused by naïve URI naming: Use of information resource URIs to name entities (particularly foaf:Persons)
E.g., <me> foaf:knows <jim/foaf.rdf> .
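Detecting such members can be sketched as a lookup over class memberships. The sketch assumes the memberships passed in already include inferred ones (for instance, a type derived from the range of foaf:knows); the function name and data shapes are illustrative only.

```python
from collections import defaultdict
from typing import Iterable, List, Sequence, Tuple


def find_disjoint_members(
    memberships: Iterable[Tuple[str, str]],
    disjoint_pairs: Sequence[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (instance, class_a, class_b) for each instance that belongs
    to both classes of a declared-disjoint pair."""
    classes_of = defaultdict(set)
    for instance, cls in memberships:
        classes_of[instance].add(cls)
    violations = []
    for instance, classes in sorted(classes_of.items()):
        for class_a, class_b in disjoint_pairs:
            if class_a in classes and class_b in classes:
                violations.append((instance, class_a, class_b))
    return violations
```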
Anybody can say anything, anywhere, and unfortunately for everyone else, have a good chance of being taken seriously
Ontology hijacking…
From http://www.eiao.net/rdf/1.0
<owl:Property rdf:about="http://www.w3.org/1999/02/22-rdf-syntax-ns#type">
  <rdfs:label xml:lang="en">type</rdfs:label>
  <rdfs:comment xml:lang="en">Type of resource</rdfs:comment>
  <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#testRun"/>
  <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#pageSurvey"/>
  <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#siteSurvey"/>
  <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#scenario"/>
  <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#rangeLocation"/>
  <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#startPointer"/>
  <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#endPointer"/>
  <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#header"/>
  <rdfs:domain rdf:resource="http://www.eiao.net/rdf/1.0#runs"/>
</owl:Property>
Ontology hijacking!! (apologies to EIAO guys – it’s just a convenient example)
Redefining Everything… …and home in time for tea
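A crude application-side filter could flag schema-level statements about terms that live outside the publishing document's own namespace. This is a sketch under loud assumptions: the string-prefix test is a stand-in for a proper notion of which document is authoritative for a term, and the predicate list and function name are hypothetical.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Schema-level predicates considered here (an illustrative subset).
AXIOM_PREDICATES = {
    "rdfs:domain", "rdfs:range", "rdfs:subClassOf", "rdfs:subPropertyOf",
}


def hijacked_axioms(document_uri: str, triples: Iterable[Triple]) -> List[Triple]:
    """Return axioms whose subject is not under the document's own URI."""
    own = document_uri.split("#")[0]
    return [
        (s, p, o)
        for s, p, o in triples
        if p in AXIOM_PREDICATES and not s.startswith(own)
    ]
```

Applied to the example above, the rdfs:domain statements on rdf:type would be flagged, since rdf:type is not under http://www.eiao.net/rdf/1.0.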
Solutions?
All presented issues have a suitable antidote, once you know about them
See paper for discussion…
Application side: workarounds
Syntax errors quite rare, partly due to popularity of W3C RDF/XML syntax validator
Need an all-in-one validation service
Should not only validate strict errors, but give feedback on suspected issues
We offer a prototypical service at:
http://swse.deri.org/RDFAlerts/
Publishing side: Validators!
Get the community to contact publishers about errors/issues as they arise
Get involved: http://pedantic-web.org/
137 members!
Acknowledgements to: Aidan Hogan, Alex Passant, Me, Antoine Zimmermann, Axel Polleres, Michael Hausenblas, Richard Cyganiak, Stéphane Corlosquet
Publishing side: Pedantic Web Group