The Web of Data as a Massively Scalable NoSQL Database

Posted on 08-May-2015

Linked Data is about using the Web to connect related data that wasn't previously linked, or using the Web to lower the barriers to linking data currently linked using other methods. It leverages fundamental characteristics of Web architecture (loose coupling, decentralization, simple and well-defined access patterns) and builds on RDF, a W3C standard data model. We'll give a brief overview of RDF and show how Linked Data principles decouple its use for interoperability and data modelling from the "heavyweight" Semantic Web baggage that has long been considered a barrier to entry.

The characteristics that allowed the Web to scale so quickly and widely include decentralization, a massively distributed architecture, an absence of integrity constraints, and weak guarantees about consistency. The Web of Data aims to achieve the same for data, promoting it to a first-class Web citizen and making linking data as easy and ubiquitous as linking HTML documents. Many of the characteristics that make the Web so successful and scalable also apply to the Web of Data.

The rise of NoSQL databases is a response to the changing requirements of Web-scale data. Typically these databases deliver performance at scale by relaxing consistency guarantees, eschewing transactions, using flexible data models and distributed architectures, and placing constraints on access patterns. Linked Data and RDF turn the Web itself into a decentralized and massively scalable sparse column store with globally identifiable column names: an enormous, globally distributed repository of linked, structured data.

In this talk we will highlight the characteristics that various flavors of NoSQL database share with the Web of Data. We will also discuss important differences, and outline the trade-offs involved in choosing a storage solution for your application data, such as the relative importance of query performance, availability, or ACID transactions.
We will be delving into concerns around:

● Scalability
● Data portability
● Common query languages
● Tool chain interoperability

Transcript of The Web of Data as a Massively Scalable NoSQL Database

The Web of Data as a NoSQL Database

Sam Tunnicliffe @beobal

Talis Systems Ltd

http://talis.com
http://github.com/talis

NoSQL Now! 2011

version 1.0

entity retrieval using xDBC & ORM or custom SQL

schema-last

entity retrieval using store-specific protocols and clients

sharded, polyglot storage

sharding strategy may be encapsulated by clients/servers, or may require the application to handle routing/addressing as well as managing store-specific protocols and clients

schema knowledge resides in application or access layer
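In the sharded, polyglot case the application may have to know where each entity lives. A minimal sketch of that application-side routing, assuming simple hash-mod-N placement; SHARDS and shard_for are hypothetical names for illustration, not part of the talk:

```python
import hashlib

# Hypothetical shard names: in practice each would be a separate store,
# reached through its own store-specific client and protocol.
SHARDS = ["store-a", "store-b", "store-c"]


def shard_for(entity_id: str, shards: list = SHARDS) -> str:
    """Route an entity id to a shard with hash-mod-N placement."""
    # sha1 gives a hash that is stable across processes (unlike hash()).
    digest = hashlib.sha1(entity_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


print("1969-059A ->", shard_for("1969-059A"))
```

Adding a fourth shard changes len(shards) and remaps most ids, which is one reason this routing knowledge tends to end up in the application or access layer.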

What if you could use the Web as a database?

loose coupling

http://www.flickr.com/photos/11950mike/4707805552

outsource data acquisition costs

http://www.flickr.com/photos/juniorvelo/2861770108

proven, extreme scalability

http://www.flickr.com/photos/krayker/2268587409

leverage existing infrastructure

http://www.flickr.com/photos/ranjithsiji/4897513366

more and more diverse data

http://www.flickr.com/photos/mandy_pantz/2512569926

serendipity

http://www.flickr.com/photos/sylvar/3291628571

high latency

http://www.flickr.com/photos/zivkovic/5850008238

giving away control

http://www.flickr.com/photos/kecko/4052526123

variable availability

http://www.flickr.com/photos/numberstumper/3057162582

global names

1969-059A

spacecraft/1969-059A

nasa.dataincubator.org/spacecraft/1969-059A

URIs for entity names

http://nasa.dataincubator.org/spacecraft/1969-059A
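Minting a global name amounts to qualifying a local identifier with a domain the publisher controls. A sketch, assuming the base/type/local-id pattern generalised from the single example above; entity_uri is a hypothetical helper:

```python
from urllib.parse import quote

# Base taken from the slide; the "<type>/<local id>" path layout is an
# assumption generalised from the one example shown.
BASE = "http://nasa.dataincubator.org/"


def entity_uri(entity_type: str, local_id: str, base: str = BASE) -> str:
    """Qualify a local id (here an international designator) into a URI."""
    # Percent-encode both path segments so characters such as '/' or ' '
    # cannot break the path; '-' and alphanumerics pass through unchanged.
    return f"{base}{quote(entity_type, safe='')}/{quote(local_id, safe='')}"


print(entity_uri("spacecraft", "1969-059A"))
# http://nasa.dataincubator.org/spacecraft/1969-059A
```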

things have attributes

mass 28801.1
name “Apollo 11 CSM”
launch launch/1969-059

URIs for attribute names

http://purl.org/net/schemas/space/mass 28801.1
http://xmlns.com/foaf/0.1/name “Apollo 11 CSM”
http://purl.org/net/schemas/space/launch launch/1969-059
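Keyed by subject, these statements behave like one row of a sparse column store whose column names are URIs, the framing used in the talk's abstract. A sketch of that pivot in plain Python, assuming the relative launch/1969-059 on the slide resolves against the same nasa.dataincubator.org base:

```python
from collections import defaultdict

NASA = "http://nasa.dataincubator.org/"
SPACE = "http://purl.org/net/schemas/space/"
FOAF = "http://xmlns.com/foaf/0.1/"

# (subject, predicate, object) statements from the slide.
# Resolving "launch/1969-059" against NASA is an assumption.
TRIPLES = [
    (NASA + "spacecraft/1969-059A", SPACE + "mass", 28801.1),
    (NASA + "spacecraft/1969-059A", FOAF + "name", "Apollo 11 CSM"),
    (NASA + "spacecraft/1969-059A", SPACE + "launch", NASA + "launch/1969-059"),
]


def to_rows(triples):
    """Pivot triples into {subject: {predicate URI: [values]}}.

    Only predicates a subject actually has become "columns" (sparse), and
    values are lists because RDF allows a predicate to repeat.
    """
    rows = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        rows[s][p].append(o)
    return {s: dict(cols) for s, cols in rows.items()}


for subject, columns in to_rows(TRIPLES).items():
    print(subject)
    for column, values in columns.items():
        print("   ", column, values)
```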

links

http://www.flickr.com/photos/juniorvelo/457197656

dereference to get data

DNS is your routing component

http://www.flickr.com/photos/cjschmit/4623783487
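Dereferencing is an ordinary HTTP GET on the entity URI, and the only routing information needed is the host name that DNS resolves. A standard-library sketch that builds the request without sending it, assuming the publisher offers an RDF serialization through content negotiation (the Accept value is an assumption):

```python
from urllib.parse import urlsplit
from urllib.request import Request


def dereference_request(uri: str) -> Request:
    """Build (but do not send) a GET for an entity URI, asking for RDF."""
    # Accept header is an assumption: prefer Turtle, fall back to RDF/XML.
    accept = "text/turtle, application/rdf+xml;q=0.8"
    return Request(uri, headers={"Accept": accept})


uri = "http://nasa.dataincubator.org/spacecraft/1969-059A"
req = dereference_request(uri)

# DNS resolves this host to the publisher: no application-level shard map.
print(urlsplit(uri).hostname)
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
# urllib.request.urlopen(req) would send it (requires network access).
```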

RDF and linked data

subject predicate object

1969-059A launch launch/1969-059

[diagram: 1969-059A (mass: 28801.1, name: Apollo 11 CSM) linked by "launch" to launch/1969-059 (launch date: 16 July 1969, launch vehicle: Saturn V, weather: clear, dry); map labels: Apollo 11, Cape Canaveral, United States, Canada, Mexico]
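Each arrow in such a graph is one subject, predicate, object statement. One plain-text way to write statements down is the W3C N-Triples format; a minimal serializer sketch, limited to URIs and plain string literals (datatypes, language tags and blank nodes are left out) and using URIs from the earlier slides:

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str          # absolute URI
    predicate: str        # absolute URI
    obj: str              # absolute URI, or literal text when literal=True
    literal: bool = False


def _escape(text: str) -> str:
    """Escape the characters N-Triples forbids raw inside a quoted literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as an N-Triples line (URIs assumed well-formed)."""
    obj = f'"{_escape(t.obj)}"' if t.literal else f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


CSM = "http://nasa.dataincubator.org/spacecraft/1969-059A"
print(to_ntriples(Triple(CSM, "http://purl.org/net/schemas/space/launch",
                         "http://nasa.dataincubator.org/launch/1969-059")))
print(to_ntriples(Triple(CSM, "http://xmlns.com/foaf/0.1/name",
                         "Apollo 11 CSM", literal=True)))
```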

RDF and linked data

[diagram: launch/1969-059 (launch date: 16 July 1969, launch vehicle: Saturn V, weather: clear, dry) from nasa.gov, with United States from geonames.org (alternate name: Stati Uniti, Estados Unidos, アメリカ合衆国; population: 311,874,000); map label: Washington D.C.]
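Because both publishers use global names, combining their data does not require reconciling identifiers: for graphs without blank nodes, merging is a set union of triples, after which a link can be followed from one source into the other. A sketch with placeholder URIs under the reserved .example domain and a hypothetical vocabulary standing in for the real nasa.gov and geonames.org resources:

```python
# Placeholder URIs (hypothetical), standing in for the real resources.
V = "http://vocab.example/"
LAUNCH = "http://nasa.example/launch/1969-059"
USA = "http://geonames.example/united-states"

nasa = {
    (LAUNCH, V + "launchDate", "16 July 1969"),
    (LAUNCH, V + "launchVehicle", "Saturn V"),
    (LAUNCH, V + "country", USA),  # hypothetical link into the other source
}
geonames = {
    (USA, V + "alternateName", "Stati Uniti"),
    (USA, V + "alternateName", "Estados Unidos"),
    (USA, V + "alternateName", "アメリカ合衆国"),
    (USA, V + "population", "311,874,000"),
}

# With no blank nodes, merging RDF graphs is just a union of triple sets.
merged = nasa | geonames


def objects(graph, subject, predicate):
    """All objects for a subject/predicate pair, sorted for stable output."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


# Follow the link out of the nasa data, then read geonames data about it.
for country in objects(merged, LAUNCH, V + "country"):
    print(country, objects(merged, country, V + "population"))
```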

web enabled data

entity lookups come from authoritative sources

routes between linked entities are explicit in the data

DNS does the hard work

web enabled data

realtime discovery of additional data sources
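Discovery at query time can be sketched as a breadth-first crawl that dereferences each URI found in object position ("follow your nose"). Here an in-memory dict with hypothetical .example URIs stands in for HTTP, and an object counts as a link when it is a string starting with http, a simplification that a real RDF parser, which distinguishes URIs from literals, would not need:

```python
from collections import deque
from urllib.parse import urlsplit

# Stub web (hypothetical URIs): each URI "dereferences" to the triples its
# publisher serves. A real client would issue HTTP GETs instead.
WEB = {
    "http://a.example/spacecraft/1": [
        ("http://a.example/spacecraft/1", "http://v.example/launch", "http://a.example/launch/1")],
    "http://a.example/launch/1": [
        ("http://a.example/launch/1", "http://v.example/site", "http://b.example/place/1")],
    "http://b.example/place/1": [
        ("http://b.example/place/1", "http://v.example/name", "Cape Canaveral")],
}


def fetch(uri):
    """Stand-in for dereferencing; unknown URIs yield no data."""
    return WEB.get(uri, [])


def discover(start, max_docs=100):
    """Breadth-first crawl from start; returns triples and hosts reached."""
    seen, queue, graph, fetched = {start}, deque([start]), [], 0
    while queue and fetched < max_docs:
        uri = queue.popleft()
        fetched += 1
        triples = fetch(uri)
        graph.extend(triples)
        for _, _, obj in triples:
            # Simplification: any http string in object position is a link.
            if isinstance(obj, str) and obj.startswith("http") and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return graph, {urlsplit(u).hostname for u in seen}


graph, hosts = discover("http://a.example/spacecraft/1")
print(len(graph), sorted(hosts))  # 3 ['a.example', 'b.example']
```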

web enabled data

expanded data universe

simplified access protocol

but some things are now outside of your control

local caches

http://www.flickr.com/photos/vhanes/3722327096
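A local cache trades freshness for lower latency and better availability. A minimal sketch, assuming a fixed time-to-live and a stale-if-error policy (serve the last good copy when the source fails); DocumentCache and its parameters are hypothetical, and fetch can be any callable, such as a wrapper around urllib.request.urlopen, whose URLError is a subclass of OSError:

```python
import time
from typing import Callable, Dict, Tuple


class DocumentCache:
    """Keep dereferenced documents locally for `ttl` seconds."""

    def __init__(self, fetch: Callable[[str], str], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # uri -> (stored_at, body)

    def get(self, uri: str) -> str:
        now = self._clock()
        entry = self._entries.get(uri)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]                 # fresh hit: no network round trip
        try:
            body = self._fetch(uri)
        except OSError:                     # e.g. urllib's URLError
            if entry is not None:
                return entry[1]             # source unavailable: serve stale copy
            raise
        self._entries[uri] = (now, body)
        return body


docs = DocumentCache(lambda uri: "fetched " + uri, ttl=60)
print(docs.get("http://nasa.dataincubator.org/spacecraft/1969-059A"))
```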

outcomes

http://www.flickr.com/photos/carbonnyc/293733099

shared effort

http://www.flickr.com/photos/toffehoff/244870160/

simpler data integration

http://www.flickr.com/photos/thedailyenglishshow/3947409618/

more linked data

http://www.flickr.com/photos/ninjanoodles/114033269

network effects

http://www.flickr.com/photos/asurroca/66225176

use the web as a database by...

● using global names
   ● for entities
   ● for attributes
● using standard formats
● making data dereferenceable
● linking to other data

http://www.flickr.com/photos/ryanwick/3461847552

thank you

http://talis.com