
Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute

How to get your data into Sindice and Google with sitemap4rdf

Boris Villazón-Terrazas (OEG), Richard Cyganiak (DERI)

Publishing Linked Data from a triple store

Linked Data frontends for triple stores

Source: Pubby website,

Search engines

Sindice: the best RDF search engine

120M+ documents
Continuously updating since 2006
Low-latency search API
RDF/XML, Turtle, RDFa, microformats

The Sitemap protocol

Sitemap Protocol

Used by web crawlers
Efficiently find all your content & discover what has been updated

Sitemap Protocol: Simple example

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="">
  <url>
    <loc>http://yoursite/</loc>
  </url>
  <url>
    <loc>http://yoursite/products/53546</loc>
  </url>
  <url>
    <loc>http://yoursite/products/98421</loc>
  </url>
  <url>
    <loc>http://yoursite/products/41003</loc>
  </url>
</urlset>
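
A file like the one above can be produced with a few lines of code. The following is a minimal sketch, not part of sitemap4rdf: the function name `build_sitemap` is made up for illustration, and the namespace URI is the one defined by the sitemaps.org 0.9 schema.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org, schema 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal Sitemap document (as UTF-8 bytes) listing the given URLs.

    Hypothetical helper for illustration only.
    """
    # Writing xmlns as a plain attribute is a simple way to get a default
    # namespace declaration out of ElementTree.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the text for us.
        ET.SubElement(url, "loc").text = u
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_sitemap([
        "http://yoursite/",
        "http://yoursite/products/53546",
    ]).decode("utf-8"))
```

Building the document with an XML library rather than string concatenation keeps URLs containing `&` or other special characters well-formed.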

Sitemap Protocol: Optional parts

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="">
  <url>
    <loc>http://yoursite/</loc>
    <lastmod>2010-06-24</lastmod>
    <changefreq>daily</changefreq>
  </url>
</urlset>
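
To illustrate why the optional parts matter, here is a rough sketch of how a crawler might use `<lastmod>` to find what has been updated. The function name `urls_modified_since` is hypothetical, and the sketch assumes `<lastmod>` starts with a full `YYYY-MM-DD` date; the protocol also allows other W3C datetime forms, which this code does not handle.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Prefix mapping for the sitemaps.org 0.9 namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_modified_since(sitemap_xml, since):
    """Return the <loc> values whose <lastmod> is on or after `since`.

    Hypothetical helper. Entries without <lastmod> are returned too,
    because the crawler cannot tell whether they changed.
    """
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if loc is None:
            continue  # skip malformed entries
        # Assumption: the first 10 characters are a YYYY-MM-DD date.
        if lastmod is None or date.fromisoformat(lastmod.strip()[:10]) >= since:
            changed.append(loc.strip())
    return changed
```

A crawler that remembers the date of its last visit can pass it as `since` and re-fetch only the returned URLs.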

Sitemap Protocol: Huge sitemaps

Gzip-compress your sitemap
Limit: 50k URLs or 10MB
    split into multiple sitemap files
    add a sitemap index file
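
The splitting step can be sketched as follows. This is an illustrative sketch only: the file names (`sitemap1.xml.gz`, `sitemap_index.xml`), the `base` URL and all function names are assumptions, and only the URL-count limit is enforced, not the size limit.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit quoted on the slide


def chunk(urls, size=MAX_URLS):
    """Yield consecutive slices of at most `size` URLs."""
    for i in range(0, len(urls), size):
        yield urls[i:i + size]


def urlset_gz(urls):
    """Build one gzip-compressed <urlset> file."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = u
    return gzip.compress(ET.tostring(root, encoding="UTF-8", xml_declaration=True))


def sitemap_index(sitemap_urls):
    """Build the <sitemapindex> file that points at the individual sitemaps."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for u in sitemap_urls:
        ET.SubElement(ET.SubElement(root, "sitemap"), "loc").text = u
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def build_all(urls, base="http://yoursite/", size=MAX_URLS):
    """Return {file name: bytes} for all sitemap parts plus the index.

    File names and `base` are made up for this sketch.
    """
    files = {}
    for n, part in enumerate(chunk(urls, size), start=1):
        files[f"sitemap{n}.xml.gz"] = urlset_gz(part)
    files["sitemap_index.xml"] = sitemap_index([base + name for name in files])
    return files
```

With this layout, `robots.txt` or a submission form would point at the index file rather than at the individual parts.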

Sitemap Protocol: Discovery

Publish the sitemap file
Add a line to http://yoursite/robots.txt

Sitemap: http://yoursite/sitemap.xml
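
From the crawler's side, discovery amounts to scanning `robots.txt` for `Sitemap:` lines. A minimal sketch follows; the function name `sitemaps_from_robots` is hypothetical, and the code assumes sitemap URLs contain no `#` character, since everything after `#` is treated as a comment.

```python
def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs announced in the text of a robots.txt file.

    Hypothetical helper for illustration.
    """
    found = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Split on the first colon only, so the URL's own colons survive.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


if __name__ == "__main__":
    example = "User-agent: *\nDisallow: /private/\nSitemap: http://yoursite/sitemap.xml\n"
    print(sitemaps_from_robots(example))
```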

sitemap4rdf

Generate Sitemap files from a SPARQL endpoint

Simple command line tool
Sends a SPARQL query to list all URIs
Generates sitemap

sitemap4rdf http://yoursite/sparql http://yoursite/resource/
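
The slides do not show the query the tool sends, so the following is only a guess at the general idea, not sitemap4rdf's actual implementation. It builds a paged SPARQL 1.1 query that lists subject URIs under a prefix and extracts them from a result in the standard SPARQL JSON results format. Both function names are hypothetical, and the sketch assumes the prefix contains no double quotes.

```python
def list_uris_query(prefix, limit=1000, offset=0):
    """Build a SPARQL query listing distinct subject URIs under `prefix`.

    Assumption: this is one plausible query, not the one sitemap4rdf uses.
    """
    return (
        "SELECT DISTINCT ?s WHERE { ?s ?p ?o . "
        f'FILTER(isIRI(?s) && STRSTARTS(STR(?s), "{prefix}")) }} '
        f"ORDER BY ?s LIMIT {limit} OFFSET {offset}"
    )


def uris_from_results(results_json):
    """Extract subject URIs from a parsed SPARQL JSON results document."""
    return [
        b["s"]["value"]
        for b in results_json["results"]["bindings"]
        if b["s"]["type"] == "uri"
    ]


if __name__ == "__main__":
    print(list_uris_query("http://yoursite/resource/"))
```

Paging with `LIMIT` and `OFFSET` keeps each response small, which matters for endpoints that cap result sizes. The extracted URIs would then be written out as `<loc>` entries.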

Submit the sitemap location - Sindice

Submit the sitemap location - Google

Sitemap protocol informs search engines about available pages
Supported by Sindice!
sitemap4rdf generates Sitemap files by listing URIs in a SPARQL endpoint
Open source, Java