Sitemap4rdf(v2 boris)
sitemap4rdf: generate Sitemap files from a SPARQL endpoint
http://www.deri.ie/
Boris Villazón-Terrazas and Richard Cyganiak (DERI)
Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
Phone: 34.91.3366605, Fax: 34.91.3524819
ToC
• Publishing Linked Data from a triple store
• Search engines
• The Sitemap protocol
• sitemap4rdf
• Summary
• Future work
Linked Data frontends for triple stores
Source: Pubby website, http://www4.wiwiss.fu-berlin.de/pubby/
Sindice: the best RDF search engine
• 120M+ documents
• Continuously updating since 2006
• Search API
• RDF/XML, Turtle, RDFa, microformats
Sitemap Protocol
• Used by web crawlers
• Efficiently find all your content & discover what has been updated
http://sitemaps.org/
A sitemap file contains information regarding one or more URLs on your Web site. The information that is stored there helps search engines better spider your website.
Sitemap Protocol: Simple example
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://yoursite/</loc>
  </url>
  <url>
    <loc>http://yoursite/products/53546</loc>
  </url>
  <url>
    <loc>http://yoursite/products/98421</loc>
  </url>
  <url>
    <loc>http://yoursite/products/41003</loc>
  </url>
</urlset>
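A file like the one above can be produced from a plain list of URLs. The following is a minimal Python sketch, not the sitemap4rdf implementation (which is written in Java); the function name build_sitemap is hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECLARATION = b'<?xml version="1.0" encoding="UTF-8"?>\n'


def build_sitemap(urls):
    """Return a Sitemap document (as UTF-8 bytes) with one <url> per entry.

    Hypothetical helper for illustration only.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for address in urls:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes &, < and > in the text for us.
        ET.SubElement(url, "loc").text = address
    return XML_DECLARATION + ET.tostring(urlset, encoding="utf-8")


if __name__ == "__main__":
    document = build_sitemap([
        "http://yoursite/",
        "http://yoursite/products/53546",
    ])
    print(document.decode("utf-8"))
```

Using an XML library rather than string concatenation keeps URLs that contain characters such as & correctly escaped.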
Sitemap Protocol: Optional parts
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://yoursite/</loc>
    <lastmod>2010-06-24</lastmod>
    <changefreq>daily</changefreq>
  </url>
</urlset>
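The optional elements can be added per URL. A small Python sketch follows, assuming a hypothetical helper url_entry; the set of accepted changefreq values is the one defined by the Sitemap protocol.

```python
import datetime
import xml.etree.ElementTree as ET

# Values the Sitemap protocol allows for <changefreq>.
CHANGEFREQ_VALUES = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def url_entry(loc, lastmod=None, changefreq=None):
    """Build one <url> element; lastmod and changefreq are optional.

    Hypothetical helper for illustration only.
    """
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    if lastmod is not None:
        # A datetime.date serialises to YYYY-MM-DD, as in the slide.
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    if changefreq is not None:
        if changefreq not in CHANGEFREQ_VALUES:
            raise ValueError("unsupported changefreq: %r" % changefreq)
        ET.SubElement(url, "changefreq").text = changefreq
    return url


if __name__ == "__main__":
    entry = url_entry("http://yoursite/", datetime.date(2010, 6, 24), "daily")
    print(ET.tostring(entry, encoding="unicode"))
```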
Sitemap Protocol: Huge sitemaps
• Gzip-compress your sitemap
• Limit: 50k URLs or 10MB
  • split into multiple sitemap files
  • add a sitemap index file
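The splitting step can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: it enforces only the 50k URL limit (not the 10MB size limit), and the helper names chunk_urls, build_index and compress_sitemap are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECLARATION = b'<?xml version="1.0" encoding="UTF-8"?>\n'
MAX_URLS_PER_FILE = 50000  # the URL limit quoted on the slide


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_index(sitemap_locations):
    """Return a sitemap index document pointing at each sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for location in sitemap_locations:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = location
    return XML_DECLARATION + ET.tostring(index, encoding="utf-8")


def compress_sitemap(document):
    """Gzip-compress a serialised sitemap (bytes in, bytes out)."""
    return gzip.compress(document)


if __name__ == "__main__":
    urls = ["http://yoursite/resource/%d" % i for i in range(120001)]
    parts = list(chunk_urls(urls))
    # File names below are placeholders, not a convention of any tool.
    names = ["http://yoursite/sitemap%d.xml.gz" % n for n in range(len(parts))]
    print(build_index(names).decode("utf-8"))
```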
Sitemap Protocol: Discovery
• Publish the sitemap file
• Add a line to http://yoursite/robots.txt:
  Sitemap: http://yoursite/sitemap.xml
• Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol.
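From the crawler's side, discovery amounts to reading the Sitemap lines out of robots.txt. A minimal Python sketch, assuming the file content is already available as a string; the function name sitemap_urls is hypothetical.

```python
def sitemap_urls(robots_txt):
    """Return the URLs declared on 'Sitemap:' lines of a robots.txt text.

    Hypothetical helper for illustration only.
    """
    found = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own "://" survives.
        field, separator, value = line.partition(":")
        if separator and field.strip().lower() == "sitemap":
            found.append(value.strip())
    return found


if __name__ == "__main__":
    example = (
        "User-agent: *\n"
        "Disallow: /private/\n"
        "Sitemap: http://yoursite/sitemap.xml\n"
    )
    print(sitemap_urls(example))
```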
sitemap4rdf
• Simple command line tool
• Sends a SPARQL query to list all URIs
• Generates sitemap
• Run sitemap4rdf specifying the SPARQL endpoint and the prefix of the URLs to include in the Sitemap:
sitemap4rdf http://yoursite/sparql http://yoursite/resource/
Example:
sitemap4rdf http://geo.linkeddata.es/sparql http://geo.linkeddata.es/
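The slides do not show the query that sitemap4rdf sends, so the following Python sketch only illustrates the general idea of listing URIs under a prefix. The query text, the use of the SPARQL 1.1 function STRSTARTS, and the helper names build_query and extract_uris are all assumptions; the result parsing follows the standard SPARQL JSON results format.

```python
import json


def build_query(prefix):
    """Return a SPARQL query listing subject URIs that start with `prefix`.

    Assumed query shape for illustration; not taken from sitemap4rdf.
    """
    if any(ch in prefix for ch in '"\\\n\r'):
        raise ValueError("prefix contains characters unsafe in a string literal")
    return (
        "SELECT DISTINCT ?s WHERE {\n"
        "  ?s ?p ?o .\n"
        '  FILTER(isIRI(?s) && STRSTARTS(STR(?s), "%s"))\n'
        "}" % prefix
    )


def extract_uris(results_json, variable="s"):
    """Pull URI values for `variable` out of a SPARQL JSON results document."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return [
        row[variable]["value"]
        for row in bindings
        if variable in row and row[variable]["type"] == "uri"
    ]


if __name__ == "__main__":
    print(build_query("http://geo.linkeddata.es/"))
    # A hand-written response in the SPARQL JSON results format.
    sample = json.dumps({
        "head": {"vars": ["s"]},
        "results": {"bindings": [
            {"s": {"type": "uri", "value": "http://geo.linkeddata.es/resource/a"}},
        ]},
    })
    print(extract_uris(sample))
```

The URIs returned this way would then become the loc entries of the generated Sitemap.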
Submit the sitemap location - Sindice
• http://sindice.com/main/submit
Submit the sitemap location - Google
• https://www.google.com/webmasters/tools/
Summary
• Sitemap protocol informs search engines about available pages
  • Supported by Sindice!
• sitemap4rdf generates Sitemap files by listing URIs in a SPARQL endpoint
  • Open source, Java
  • http://lab.linkeddata.deri.ie/2010/sitemap4rdf/
  • http://mccarthy.dia.fi.upm.es/sitemap4rdf/
  • http://www.oeg-upm.net/index.php/en/downloads/122-sitemap4rdf
Future Work
• Integrate sitemap4rdf with Pubby
• Generate voiD file automatically from a SPARQL endpoint
• Generate an entry in CKAN (registry of open knowledge packages) automatically through CKAN-API
  • http://ckan.net/package/geolinkeddata
• Interact with prefix.cc (service for remembering and looking up URI prefixes) through its API
  • geoes: <http://geo.linkeddata.es/ontology>
Future Work
• Support the semantic sitemap extension (once it is compatible with Google)
  • http://sw.deri.org/2007/07/sitemapextension/