KVIV / NoSQL : the new generation of database servers

116
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org NoSQL de nieuwe generatie van database servers KVIV IT - 3/6/2010 http://www.flickr.com/photos/wolfgangstaudt/2215246206/

description

presentation for KVIV on 2010/06/03 as usual with my presentations: if you we're not there, you missed half of the fun as some of the more important ideas are not in the slides

Transcript of KVIV / NoSQL : the new generation of database servers

Page 1: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

NoSQLde nieuwe generatie van database serversKVIV IT - 3/6/2010

http://www.flickr.com/photos/wolfgangstaudt/2215246206/

Page 2: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Who am I

» Steven Noels - [email protected]

»Outerthought : scalable content applications

»makers of Daisy and Lily open source CMS

2

Page 3: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

An evolution driven by pain.

Page 4: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

History

4

hierarchical databases

IMS

OODBMS

XMLDB RDBMS

1. standardization

2. simplification

Page 5: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

History

5

RDBMS NOSQL

cachingdenormalisationshardingreplication ...

3. pain

4. rethinkingthe problem

Page 6: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Scalability

Page 7: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Numbers of scale

7

http://qos.doubleclick.net/counters/

Page 8: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Types of scaling

8

» scaling for usage» volume of users

» volume of data

availabilityreplication

» scaling types of ops» concurrent read

» concurrent write

partioningconsistency

distribution

Page 9: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Scaling

9

database app server users

Page 10: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Scaling

10

database app server users

Page 11: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Scaling: vertical partitioning

11

database app server users

Page 12: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Scaling: horizontal partitioning

12

databases app servers users

complexity

Page 13: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Scaling through architecture

13

data layer app layer users

Page 14: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Distributed systems are hard !

Page 15: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

8 fallacies of distributed computing» The network is reliable.

» Latency is zero.

» Bandwidth is infinite.

» The network is secure.

» Topology doesn't change.

» There is one administrator.

» Transport cost is zero.

» The network is homogeneous.

15

Pete

r D

euts

ch a

nd Ja

mes

Gos

ling

Page 16: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Scaling relational systems

16

»When scaling relational systems you loose their advantages but retain their overhead

»The pain is all about locking (i.e. writes)

»Caching alleviates the read pain to the cost of complexity

Page 17: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 17

The Perspective of Cost

Page 18: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Enter NoSQL

Page 19: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Cambrian Explosion

19

Page 20: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 20

?Buzz-oriented development

Page 21: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Cambrian Explosion

21

N-O-SQL

Page 22: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 22

Page 23: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Common themes

23

» SCALE SCALE SCALE

» new datamodels

» devops

»N-O-SQL

»The Cloud :technology is of no interest anymore

Page 24: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

New Data

» sparse structures

»weak schemas

» graphs

» semi-structured

» document-oriented

24

Page 25: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

NoSQL

»Not a movement.

»Not ANSI NoSQL-2010.

»Not one-size-fits-all.

»Not (necessarily) anti-RDBMS.

»No silver bullet.

25

Page 26: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

NoSQL = pro Choice

26

Page 27: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

NoSQL = toolbox

27

Page 28: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

The NOSQL footprint

28

AC

ID,

sim

ple

oper

atio

nal

const

rain

ts

free-structured or sparse data

SQL

NOSQL

referential integrity,typed data

high

ly scalable an

davailab

le (com

plex

ity)

HBase

Cassandra

CouchDB

MongoDB

neo4j

Page 29: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

NOSQL, if you need ...

» horizontal scaling (out rather than up)

» unusually common data (aka free-structured)

» speed (especially for writes)

» the bleeding edge

29

Page 30: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

SQL/RDBMS, if you need ...

» SQL

»ACID

» normalisation

» a defined liability

30

Page 31: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Theory

Page 32: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Academic background

»Amazon Dynamo

»Google BigTable

» Eric Brewer CAP theorem

32

Page 33: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Amazon Dynamo

33

» coined the term ‘eventual consistency’

» consistent hashing

Page 34: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Consistent hashing

34

- node C+ node D

http://www.lexemetech.com/2007/11/consistent-hashing.html

Page 35: KVIV / NoSQL : the new generation of database servers

»multi-dimensional column-oriented database

» on top of GoogleFileSystem

» object versioning

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Google BigTable

35

Page 36: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

CAP theorem

36

strong consistency

highavailability

partition-tolerance

Page 37: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

CAP

»Strong Consistency: all clients see the same view, even in the presence of updates

»High Availability: all clients can find some replica of the data, even in the presence of failures

»Partition-tolerance: the system properties hold even when the system is partitioned

37

Page 38: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Culture Clash

38

»ACID» highest priority: strong

consistency for transactions

» availability less important

» pessimistic

» rigorous analysis

» complex mechanisms

» BASE» availability and scaling

highest priorities

» weak consistency

» optimistic

» best effort

» simple and fast

spectrum

Page 39: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Availability ≠ total async !

Page 40: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

The Enterprise Service Bus

40

✘bus =

congestion

Page 41: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Bus systems

41

» objects don’t fit in a pipe

» object ➙ message

» serialization / de-serialization cost

»message size

» queuing = cost

Page 42: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Use a mixture of both

»async + sync

42

stuff which matters !

Page 43: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Processing large datasets :

Map/Reduce

Page 44: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Hadoop: HDFS + MapReduce» single filesystem + single execution-space

44

Page 45: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

MapReduce example: WordCount

45

Page 46: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Hadoop ecosystem» Hadoop Common

» Subprojects

» Chukwa: A data collection system for managing large distributed systems.» HBase: A scalable, distributed database that supports structured data storage for

large tables.» HDFS: A distributed file system that provides high throughput access to application

data.» Hive: A data warehouse infrastructure that provides data summarization and ad hoc

querying.

» MapReduce: A software framework for distributed processing of large data sets on compute clusters.

» Pig: A high-level data-flow language and execution framework for parallel computation.

» ZooKeeper: A high-performance coordination service for distributed applications.» Mahout: machine learning libaries

46

Page 47: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Processing large datasets with MR

47

»Benefit from parallellisation

» Less modelling upfront (ad-hoc processing)

»Compartmentalized approach reduces operational risks

»AsterData et al. have SQL/MR hybrids for huge-scale BI

Page 48: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Market overview

Page 49: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Four Trends

»Trend 1 : Data Size

»Trend 2 : Connectedness

»Trend 3 : Semi-structure

»Trend 4 : Architecture

49

Page 50: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 50

2006 2007 2008 2009 2010

0

250

500

750

1000

161

253

397

623

988

ExaBytes (10!") of data stored per year

3

Trend 1: Data size

Data source: IDC 2007

Each year more and more digital data is created. Over two years we create more digital data than all the data created in history before that.

Page 51: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 51

Trend 2: Connectedness

4

Text documents

1990

Info

rmat

ion

conne

ctiv

ity

Folksonomies

Tagging

User-generated

contentWikis

RSS

Blogs

Hypertext

2000 2010 2020

web 1.0 web 2.0 “web 3.0”

Ontologies

RDF

Giant

Global

Graph (GGG)

Over time data has evolved to be more and more interlinked and connected.Hypertext has links,Blogs have pingback,Tagging groups all related data

Page 52: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 52

Trend 3: Semi-structure

5

! Individualization of content

• In the salary lists of the 1970s, all elements had exactly one job

• In the salary lists of the 2000s, we need 5 job columns! Or 8?

Or 15?

!All encompassing “entire world views”

• Store more data about each entity

!Trend accelerated by the decentralization of content generation

that is the hallmark of the age of participation (“web 2.0”)

Page 53: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 53

Trend 4: Architecture

6

DB

Application

1980s: Mainframe applications

Page 54: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 54

Trend 4: Architecture

7

DB

Application

1990s: Database as integration hub

Application Application

Page 55: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 55

DBDB DB

Trend 4: Architecture

8

Application

2000s: (moving towards) Decoupled serviceswith their own backend

Application Application

Page 56: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Products overview

Page 57: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

We welcome the Polyglot Persistence overlords.

Page 58: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Categories

» key-value stores

» column stores

» document stores

» graph databases

58

Page 59: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Key-value stores

» Focus on scaling to huge amounts of data

»Designed to handle big loads

»Often: cfr. Amazon Dynamo

» ring partitioning and replication

»Data model: key/value pairs

59

Page 60: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Key-value stores

»Redis

»Voldemort

»Tokyo Cabinet

60

Page 61: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Redis

»REmote DIctionary Server

» http://code.google.com/p/redis/

» vmware

61

Page 62: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Redis Features» persisted memcache, ‘awesome’

» RAM-based + persistable

» key ➙ values: string, list, set

» higher-level ops

» i.e. push/pop and sort for lists

» fast (very)

» configurable durability

» client-managed sharding

62

Page 63: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Voldemort

» http://project-voldemort.com/

» LinkedIn

63

Page 64: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Voldemort

» persistent

» distributed

» fault-tolerant

» hash table

64

Page 65: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Voldemort

65

API: GET, PUT,DELETE

Page 66: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Voldemort

66

routing logic moving up the stack,smaller latency

Page 67: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Column stores

»BigTable clones

» Sparseness!

»Data model: columns ➙ column families ➙ ACL

»Datums keyed by: row, column, time, index

» Row-range ➙ tablet ➙ distribution

67

Page 68: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Column stores

»BigTable

»HBase

»Cassandra

68

Page 69: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

BigTable

» http://labs.google.com/papers/bigtable.html

»Google

» layered on top of GFS

69

Page 70: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

HBase

» http://hadoop.apache.org/hbase/

» StumbleUpon / Adobe / Cloudera

70

Page 71: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

HBase

» sorted» distributed» column-oriented»multi-dimensional» highly-available» high-performance

» persisted» storage system

» adds random access reads and writes atop HDFS

71

Page 72: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

HBase data model

72

»Distributed multi-dimensional sparse map

»Multi-dimensional keys:(table, row, family:column, timestamp) → value

»Keys are arbitrary strings

»Access to row data is atomic

Page 73: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Storage architecture

73

© lars george

Page 74: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Cassandra

» http://cassandra.apache.org/

»Rackspace / Facebook

74

Page 75: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Cassandra

»Key-value store (with added structure)

»Reliability (identical nodes)

» Eventual consistent

»Distributed

»Tunable

» Partitioning

» Replication

75

CA

P

Page 76: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Cassandra applicability

76

FIT

» Scalable reliability (through identical nodes)» Linear scaling»Write throughput» Large Data Sets

NO FIT

» Flexible indexing»Only PK-based

querying»Big Binary Data» 1 Row must fit in

RAM entirely

Page 77: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 77

Page 78: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 78

Page 79: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Document databases

»≈ K/V stores, but DB knows what the Value is

» Lotus Notes heritage

»Data model: collections of K/V collections

»Documents often versioned

79

Page 80: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Document stores

»CouchDB

»MongoDB

»Riak

»MarkLogic

80

Page 81: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

CouchDB

» http://couchdb.apache.org/

» couch.io

81

Page 82: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

CouchDB

» fault-tolerant

» schema-free

» document-oriented

» accessible via a RESTful HTTP/JSON API

82

Page 83: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

CouchDB documents

{ “_id”: ”BCCD12CBB”, “_rev”: ”AB764C”, “type”: ”person”, “name”: ”Darth Vader”, “age”: 63, “headware”: [“Helmet”, “Sombrero”], “dark_side”: true }

83

Page 84: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

CouchDB REST API

»HTTP

» PUT /db/docid

»GET /db/docid

» POST /db/docid

»DELETE /db/docid

84

Page 85: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

CouchDB Views»MapReduce-based

» Filter, Collate, Aggregate

» Javascript

85

function (Key, Values) { var sum = 0; for(var i in Values) sum += Values[i]; return sum; }

function (doc) { for(var i in doc.tags) emit(doc.tags[i], 1); }

map reduce

Page 86: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

CouchDB

» be careful on semantics

» replication ≠ partioning/sharding !

» distributed database = distributable database

» sharded / distributed deploymentrequires proxy layer

86

Page 87: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

MongoDB

» http://www.mongodb.org/

» 10gen

87

Page 88: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

MongoDB

» cfr. CouchDB, really

» except for:

»C++

» performance focus

» runtime queries (mapreduce still available)

» native drivers (no REST/HTTP layering)

» no MVCC: update-in-place

» auto sharding (alpha)

88

Page 89: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Graph databases

» Focus on modeling structure of data - interconnectivity

» Scale, but only to the complexity of data

»Data model: property graphs

89

Page 90: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Graph databases

»Neo4j

»AllegroGraph (RDF)

90

Page 91: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Neo4j

» http://neo4j.org/

»Neo Technology

91

Page 92: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Neo4j» data = nodes + relationships + key/value properties

92

Page 93: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Neo4j

»many language bindings, little remoting

» ‘whiteboard’ friendly

» scaling to complexity (rather than volume?)

» lots of focus on domain modelling

» SPARQL/SAIL impl for triple geeks

»mostly RAM centric (with disk swapping & persistence)

93

Page 94: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Market maturization

Page 95: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Rise of integrators

»Cloudera (H-stack)

»Riptano (Cassandra)

»Cloudant (hosted CouchDB)

» (Outerthought: HBase)

95

Page 96: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

VC capital

»Cloudera

» couch.io

»Neo

» 10gen

»many others

96

Page 97: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Experiences & (h)in(d)sights

Page 98: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 98

the fireside conversations

http://www.flickr.com/photos/52641994@N00/516394238/

Page 99: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

NOSQL applicability

»Horizontal scaling

»Multi-Master

»Data representation

» search of simplicity

» data that doesn’t fit the E-R model(graphs, trees, versions)

» Speed

99

Page 100: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Tool selection

» be careful on the marketeese:smoke and mirrors beware!

»monitor dev list, IRC, Twitter, blogs

»monitor project ‘sponsors’

»mix-and-match: polyglot persistency

»DON’T NOSQL WITHOUT INTERNAL SYS ARCHS & DEV(OP)S !

100

Page 101: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 101

Our Context: Lily

» cloud-scalable content store and search repository

» successor (in many ways) of Daisy

Page 102: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

aptness

complexity

inte

rnet

ente

rpri

seco

rpor

ate

com

mun

ity

NOSQL}S

QL}

102

Page 103: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Lily essentials

» (open source)

» scalable store (Apache HBase)

» and search (Apache SOLR)

» content repository

»α due mid 2010

»www.lilycms.org or @outerthought

103

Page 104: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Choosing a NoSQL store for Lily: step I

» automatic scaling to large data sets

» fault-tolerance

» flexible datamodel with sparse data

» commodity hardware

» efficient random access

» community-based open source

» Java if possible

104

Page 105: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

» need for consistency

» atomic single-row updates

»M/R for index regeneration

105

Choosing a NoSQL store for Lily: step II

Page 106: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

HBase» datamodel with column families and cell

versioning

» ordered tables with range scans

»HDFS for blob storage

»Apache

106

Choosing a NoSQL store for Lily: step III

Page 107: KVIV / NoSQL : the new generation of database servers

lily simplified architectureIIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 107

Lily client

client

client

Lily store node

store node

store node

distributed process coordination

and configuration (ZooKeeper)

query indexerupdate

WAL M/RMQ

documents2ary

indexes

WAL /

MQ

}

}

}

Lily Store Server

HBase Region Server

Hadoop DFS

index replica

replica replica

} SOLR

inverted index

REST

Page 108: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 108

lily architecture

Page 109: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 109

lily distributed architecture

Page 110: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Distribution,durability,and availabilityis everywhere

Page 111: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

When combining store and search, make sure your (search) index doesn’t become the store.

Page 112: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Key lessons learned

112

» unlearning normalization is very difficult

» integrity checking in code = not so bad

» doing joins in code can be very liberating

» importance of keyspace design

» secondary indexing

» data de-normalization = size! (x3)

» schema vs. code flexibility?

» distribution is everywhereand you shouldn’t forget about it

Page 113: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Mis-use cases

» SQL (or ORM) is a prerequisite

»Deeply hierarchical datasets (unless graph)

»Data integrity is listed on DBA job description

»High-security apps (enforced in DB)

»Transactional data (banking)

»Usage is highly unpredictable, combinatorial, or likely to change suddenly

113

Page 114: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Reading material

»Amazon Dynamo, Google BigTable, CAP

» http://nosql.mypopescu.com/

» http://nosql-database.org/

» http://twitter.com/nosqlupdate

» http://highscalability.com/

114

Page 115: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Questions?

115

http://www.flickr.com/photos/leehaywood/4237636853/

Page 116: KVIV / NoSQL : the new generation of database servers

IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 116

» [email protected]

» @stevenn

Thanks for your attention !