NoSQL and CouchDB

Post on 30-Oct-2014


Who am I?

-> My name: João Cerdeira
-> Team Leader
-> An Agile enthusiast: Scrum / Kanban / Lean
-> A true believer in open source

http://twitter.com/jacerdeira cerdeira@gmail.com

Disclaimer

-> I understand your questions, but sometimes I don't have answers

-> I'm not a NoSQL dogmatist, just an enthusiast about the new ways of storing information

-> I have worked with RDBMS for 12 years

Everyone has their preferences

I don't care whether you or I use SQL or NoSQL. I just want to deliver better services/applications to the clients/users.

Concepts & Theory

Scale Up vs Scale Out

Performance vs Scalability

Latency vs Throughput

Availability vs Consistency

Brewer's CAP Theorem

Choose only 2:

Consistency

Availability

Partition Tolerance

At a given time, in a given environment

[Diagram labels: Consistency, Availability, Partition Tolerance, RDBMS, NoSQL]

Centralized System

In a centralized system (RDBMS) we don't have network partitions (the P in CAP)

So we get:

Availability

Consistency

-> Atomicity

-> Consistency

-> Isolation

-> Durability

Distributed System

In a distributed system we (might) have network partitions (the P in CAP)

So you can only pick one:

Availability

Consistency

CAP in practice

We have only two types of systems:

-> CP == CA (very similar)
-> AP

So during a network partition we can pick only one:

Consistency

Availability

-> Basically Available

-> Soft state

-> Eventually consistent

Eventual Consistency

How to Scale Out an RDBMS?

http://capellaniaprimaria.blogspot.com/2011/02/concurso-deportivo-4-pregunta.html

Partition

Partition + Replication

ORM Problems

What do you want?

Find/read a record/object

What do you get?

A huge amount of hidden complexity

Let's Validate Our Thoughts

Do we need ACID for all solutions?

When is eventual consistency enough?

Different solutions have different needs

Why Did NoSQL Appear?

Because new drivers appeared

(business or technical demand)

New Drivers Behind NoSQL

-> Large amounts of data
-> Commodity hardware
-> Scale fast and cheap
-> Constantly changing requirements (data)

Why aren't RDBMSs good enough?

Scaling reads in an RDBMS is hard

Scaling writes is nearly impossible

Think again

Do we really need an RDBMS?

Sometimes!

But a lot of the time we don't!

NoSQL

How did NoSQL start?

Google: Bigtable
Amazon: Dynamo
Facebook: Cassandra
LinkedIn: Voldemort
Yahoo: HBase (Hadoop)

Origins

Google: “How can we build a DB on top of Google File System?”
-> Paper: Bigtable: A Distributed Storage System for Structured Data, 2006

Amazon: “How can we build a distributed hash table for the data center?”
-> Paper: Dynamo: Amazon's Highly Available Key-value Store

Different Types of NoSQL

Key-Value Stores

Document Databases

Column Databases

Graph Databases

Key-Value Stores

Origin: Amazon's Dynamo paper
Data model: Collections of KV pairs
Implementations: Dynamo, Voldemort, Membase, Riak, Redis
Good for:
- Large amounts of data
- Scaling writes and reads
- Fast
- Programmer friendly

Document Databases

Origin: Lotus Notes
Data model: Collections of documents
Implementations: CouchDB, MongoDB, Amazon SimpleDB
Good for:
- Human-friendly data structures
- Programmer friendly
- Rapid development
- Web friendly
- CRUD

Column Databases

Origin: Google's Bigtable paper
Data model: Column families; each row (at least in theory) can have a different configuration
Implementations: Bigtable, HBase, Cassandra
Good for:
- Large amounts of data
- Scaling writes like no other
- High availability

Graph Databases

Origin: Graph theory
Data model: Nodes and relations, both can have KV pairs
Implementations: Neo4j, FlockDB
Good for:
- Solving graph problems
- Fast

Why I'd Choose CouchDB

-> Easy-to-understand documents
-> Uses standard web technologies
-> Simple to install and configure
-> Small footprint (works on mobile platforms)
-> Scales well (though not for huge amounts of data)
-> Replication in the core

CouchDB Main Principles

Document Oriented Database

No rows or columns

Collection of JSON Documents

Schema-Free

In CouchDB, HTTP Rules

-> Everything is an HTTP request
-> We are used to GET and POST
-> But there are others:
   -> PUT
   -> DELETE
   -> COPY

RESTful HTTP API

Why JSON?

-> Light, text-based data format
-> Simple to parse
-> Not verbose (compared to XML)
-> Suitable for JavaScript frameworks (jQuery)
-> Parsers available in almost all programming languages

JSON Example

{
  "make": "Ford",
  "model": "Mustang",
  "year": 2009,
  "body": "Coupe",
  "color": "Red",
  "engine": {
    "gas_type": "Petrol",
    "cubic_capacity": 4600
  },
  "previous_owners": [
    { "name": "John Smith", "mileage": 1000 },
    { "name": "Jane Hunt", "mileage": 2500 }
  ]
}
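As a small illustration of the "simple to parse" point, here is a sketch that parses the car document with the standard JSON.parse and reads a nested field and an array. The variable names (carJson, car, ownerNames) are chosen just for this example.

```javascript
// The car document from the slide, as a JSON string (keys must be quoted).
const carJson = `{
  "make": "Ford",
  "model": "Mustang",
  "year": 2009,
  "body": "Coupe",
  "color": "Red",
  "engine": { "gas_type": "Petrol", "cubic_capacity": 4600 },
  "previous_owners": [
    { "name": "John Smith", "mileage": 1000 },
    { "name": "Jane Hunt", "mileage": 2500 }
  ]
}`;

// One call turns the text into a plain JavaScript object.
const car = JSON.parse(carJson);

// Nested objects and arrays are reachable with ordinary property access.
const gasType = car.engine.gas_type;
const ownerNames = car.previous_owners.map((owner) => owner.name);

console.log(gasType);
console.log(ownerNames.join(", "));
```

No schema or mapping layer is involved: the document structure is the data structure the program works with.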

Example

Create / Delete Database

$ curl http://127.0.0.1:5984

{"couchdb":"Welcome","version":"1.0.1"}

$ curl -X PUT http://127.0.0.1:5984/contacts

{"ok":true}

$ curl -X GET http://127.0.0.1:5984/_all_dbs

["contacts","_users"]

$ curl -X DELETE http://127.0.0.1:5984/contacts

{"ok":true}

Manage Documents

$ curl -X PUT http://127.0.0.1:5984/contacts/joaocerdeira -d '{}'

{"ok":true,"id":"joaocerdeira","rev":"1-967a00dff5e02add41819138abb3284d"}

$ curl -X GET http://127.0.0.1:5984/contacts/joaocerdeira

{"_id":"joaocerdeira","_rev":"1-967a00dff5e02add41819138abb3284d"}

$ curl -X DELETE 'http://127.0.0.1:5984/contacts/joaocerdeira?rev=1-967a00dff5e02add41819138abb3284d'

{"ok":true,"id":"joaocerdeira","rev":"2-eec205a9d413992850a6e32678485900"}

Manage Documents

$ curl -X PUT http://127.0.0.1:5984/contacts/joaocerdeira -d'{"firstName":"Joao","lastName":"Cerdeira","email":"cerdeira@gmail.com"}'

{"ok":true,"id":"joaocerdeira","rev":"1-186fe12b748c40559e8f234d8e566c18"}

$ curl -X GET http://127.0.0.1:5984/contacts/joaocerdeira

{"_id":"joaocerdeira","_rev":"1-186fe12b748c40559e8f234d8e566c18","firstName":"Joao","lastName":"Cerdeira","email":"cerdeira@gmail.com"}

Copy Documents

$ curl -X COPY http://127.0.0.1:5984/contacts/joaocerdeira -H "Destination: batatinha"

{"id":"batatinha","rev":"1-186fe12b748c40559e8f234d8e566c18"}

$ curl -X GET http://127.0.0.1:5984/contacts/batatinha

{"_id":"batatinha","_rev":"1-186fe12b748c40559e8f234d8e566c18","firstName":"Joao","lastName":"Cerdeira","email":"cerdeira@gmail.com"}

Changing Documents

$ curl -X PUT http://127.0.0.1:5984/contacts/batatinha -d '{"_rev":"1-186fe12b748c40559e8f234d8e566c18","firstName":"Clown","lastName":"Batatinha","email":["batatinha@bataton.pt","batatinha@first.to.exit@rtp.pt"], "phone":"93 1234567"}'

{"ok":true,"id":"batatinha","rev":"2-b7079a6d71179b1571652059355d84c3"}

$ curl -X GET http://127.0.0.1:5984/contacts/batatinha

{"_id":"batatinha","_rev":"2-b7079a6d71179b1571652059355d84c3","firstName":"Clown","lastName":"Batatinha","email":["batatinha@bataton.pt","batatinha@first.to.exit@rtp.pt"], "phone":"93 1234567"}

MVCC

Reads never wait for writes (no locks)

Append-only storage: an update writes a new revision instead of overwriting the old one
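The idea behind these two points can be sketched with a tiny in-memory store. This is only an illustration, not CouchDB's implementation: the class name TinyMvccStore, its methods, and the simplified "N-demo" revision strings are all made up for this example (real CouchDB revisions are "N-hash", as in the curl output above).

```javascript
// Hypothetical in-memory store: append-only log plus a revision check on write.
class TinyMvccStore {
  constructor() {
    this.log = [];           // every version ever written; nothing is overwritten
    this.latest = new Map(); // document id -> index of its current version in the log
  }

  // Reads just look up the current version; they never take a lock.
  get(id) {
    const index = this.latest.get(id);
    return index === undefined ? null : this.log[index];
  }

  // A write must name the revision it is based on, otherwise it is rejected.
  put(id, body, rev) {
    const current = this.get(id);
    const currentRev = current ? current._rev : undefined;
    if (currentRev !== rev) {
      return { error: "conflict" };
    }
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const doc = { ...body, _id: id, _rev: `${generation}-demo` };
    this.log.push(doc); // append the new version
    this.latest.set(id, this.log.length - 1);
    return { ok: true, id, rev: doc._rev };
  }
}

const store = new TinyMvccStore();
const first = store.put("batatinha", { firstName: "Joao" });
const snapshot = store.get("batatinha"); // a reader holding the first version
const second = store.put("batatinha", { firstName: "Clown" }, first.rev);
const stale = store.put("batatinha", { firstName: "Other" }, first.rev);

console.log(first, second, stale);
```

The reader holding snapshot still sees the first version after the update, and the write based on the outdated revision is rejected instead of silently overwriting newer data. That is the same reason the PUT in "Changing Documents" has to carry "_rev".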

Designing Documents

{
  "_id": "joaocerdeira",
  "_rev": "1-186fe12b748c40559e8f234d8e566c18",
  "doctype": "contact",
  "firstName": "Joao",
  "lastName": "Cerdeira",
  "company": "MULTICERT",
  "emails": [
    { "type": "personal", "email": "cerdeira@gmail.com" },
    { "type": "business", "email": "joao.cerdeira@multicert.com" }
  ],
  "phones": [
    { "type": "personal", "phone": "93 1234567" },
    { "type": "business", "phone": "93 7654321" }
  ]
}

Futon Web Interface

Views

Querying CouchDB

Queries in JavaScript

Use Map/Reduce for querying

For simple queries, Map/Reduce isn't needed

No joins (but you can get something similar)

Simple Views

List all documents:

function(doc) {
  emit(doc._id, doc);
}

List all documents of type 'vip':

function(doc) {
  if (doc.type == 'vip') {
    emit(doc._id, doc);
  }
}
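To get a feel for what these two map functions produce, they can be run outside CouchDB over a plain array. The sketch below is an approximation under stated assumptions: runMap is a hypothetical helper, emit is passed as a second argument (in CouchDB it is a global), the sample documents are invented, and rows are sorted with a plain string comparison instead of CouchDB's collation rules.

```javascript
// Hypothetical helper: apply a CouchDB-style map function to each document
// and collect the emitted rows, sorted by key like a view result.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc, emit);
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

// Invented sample data for the example.
const sampleDocs = [
  { _id: "joaocerdeira", type: "vip", firstName: "Joao" },
  { _id: "batatinha", type: "regular", firstName: "Clown" },
];

// List all documents.
const allDocs = runMap((doc, emit) => {
  emit(doc._id, doc);
}, sampleDocs);

// List all documents of type 'vip'.
const vipDocs = runMap((doc, emit) => {
  if (doc.type === "vip") {
    emit(doc._id, doc);
  }
}, sampleDocs);

console.log(allDocs.map((row) => row.key));
console.log(vipDocs.map((row) => row.key));
```

The map function is called once per document and decides what to emit, so filtering is just an if around emit.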

Temp Views

$ curl -X POST -H "Content-type: application/json" http://127.0.0.1:5984/contacts/_temp_view -d '{"map":"function(doc){emit(doc._id,doc);}"}'

{"total_rows":2,"offset":0,"rows":[

{"id":"batatinha","key":"batatinha","value":{"_id":"batatinha","_rev":"2-b7079a6d71179b1571652059355d84c3","firstName":"Palhaco","lastName":"Batatinha","email":["batatinha@bataton.pt","batatinha@first.to.exit@rtp.pt"],"phone":"93 1234567"}},{"id":"joaocerdeira","key":"joaocerdeira","value":{"_id":"joaocerdeira","_rev":"1-186fe12b748c40559e8f234d8e566c18","firstName":"Joao","lastName":"Cerdeira","email":"cerdeira@gmail.com","_deleted_conflicts":["2-eec205a9d413992850a6e32678485900"]}}]}

Normal Views

{
  "_id": "_design/example",
  "views": {
    "foo": {
      "map": "function(doc){emit(doc._id,doc);}"
    }
  }
}

$ curl -X PUT -H "Content-type: application/json" http://127.0.0.1:5984/contacts/_design/example -d @design_simple1.json

Normal Views

$ curl -X GET http://127.0.0.1:5984/contacts/_design/example/_view/foo

{"total_rows":2,"offset":0,"rows":[

{"id":"batatinha","key":"batatinha","value":{"_id":"batatinha","_rev":"2-b7079a6d71179b1571652059355d84c3","firstName":"Palhaco","lastName":"Batatinha","email":["batatinha@bataton.pt","batatinha@primeiro.a.sair@rtp.pt"],"phone":"93 1234567"}},{"id":"joaocerdeira","key":"joaocerdeira","value":{"_id":"joaocerdeira","_rev":"1-186fe12b748c40559e8f234d8e566c18","firstName":"Jo\u00e3o","lastName":"Cerdeira","email":"cerdeira@gmail.com","_deleted_conflicts":["2-eec205a9d413992850a6e32678485900"]}}]}

Map/Reduce

Google patent, from the paper: http://labs.google.com/papers/mapreduce.html

image source: http://map-reduce.wikispaces.asu.edu/

Map/Reduce Views

{
  "_id": "_design/example",
  "views": {
    …
    "bar": {
      "map": "function(doc){emit(doc,1);}",
      "reduce": "function(keys, values, rereduce) {return sum(values);}"
    }
  }
}

$ curl -X GET http://127.0.0.1:5984/contacts/_design/example/_view/bar

{"rows":[{"key":null,"value":7}]}

Map/Reduce Views

{
  "_id": "_design/example",
  "views": {
    …
    "aggreg": {
      "map": "function(doc){if(doc.country){emit(doc.country,1);}}",
      "reduce": "function(keys, values, rereduce) {return sum(values);}"
    }
  }
}

$ curl -X GET 'http://127.0.0.1:5984/contacts/_design/example/_view/aggreg?group=true'

{"rows":[{"key":"England","value":1},{"key":"Portugal","value":2},{"key":"US","value":2}]}
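The difference between the plain reduce and group=true can be tried out locally. The sketch below is a simplified model, not CouchDB's view engine: queryView, sum, and the sample contacts are assumptions made for this example, emit is passed as an argument, the reduce function receives plain keys (CouchDB passes key and document id pairs), and rereduce is always false.

```javascript
// Stand-in for the sum() helper that CouchDB provides to reduce functions.
const sum = (values) => values.reduce((a, b) => a + b, 0);

// Hypothetical helper: run map, then reduce everything (default) or per key (group).
function queryView(docs, mapFn, reduceFn, { group = false } = {}) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ key, value }));
  }
  if (rows.length === 0) return [];
  if (!group) {
    // One row for the whole view, with a null key.
    const keys = rows.map((row) => row.key);
    const values = rows.map((row) => row.value);
    return [{ key: null, value: reduceFn(keys, values, false) }];
  }
  // group=true: one reduced row per distinct key (string keys assumed).
  const valuesByKey = new Map();
  for (const { key, value } of rows) {
    if (!valuesByKey.has(key)) valuesByKey.set(key, []);
    valuesByKey.get(key).push(value);
  }
  return [...valuesByKey.keys()].sort().map((key) => {
    const values = valuesByKey.get(key);
    return { key, value: reduceFn(values.map(() => key), values, false) };
  });
}

// Invented contacts; one of them has no country and is skipped by the map.
const contacts = [
  { _id: "a", country: "Portugal" },
  { _id: "b", country: "US" },
  { _id: "c", country: "England" },
  { _id: "d", country: "Portugal" },
  { _id: "e", country: "US" },
  { _id: "f" },
];

const mapByCountry = (doc, emit) => {
  if (doc.country) emit(doc.country, 1);
};
const countReduce = (keys, values, rereduce) => sum(values);

const grouped = queryView(contacts, mapByCountry, countReduce, { group: true });
const total = queryView(contacts, mapByCountry, countReduce);

console.log(JSON.stringify(grouped));
console.log(JSON.stringify(total));
```

Summing works as a reduce here because adding partial sums gives the same answer as adding the original values, which is what lets the same function be reused when CouchDB calls it again with rereduce set to true.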

Replication

[Diagram labels: Write, Read; the number of Read nodes grows from one slide to the next]

One Time Replication

$ curl -H "Content-type: application/json" -X POST http://127.0.0.1:5984/_replicate -d '{"source":"contacts","target":"contacts-replica"}'

{"ok":true,"session_id":"00872a440fdda973d6a9a18f2f571bb8","source_last_seq":19,"history": [{"session_id":"00872a440fdda973d6a9a18f2f571bb8","start_time":"Tue, 05 Jul 2011 23:03:32 GMT","end_time":"Tue, 05 Jul 2011 23:03:32 GMT","start_last_seq":0,"end_last_seq":19,"recorded_seq":19,"missing_checked":0,"missing_found":8,"docs_read":12,"docs_written":12,"doc_write_failures":0}]}

[Diagram labels: Write, Write]

Continuous Replication

$ curl -vX POST http://127.0.0.1:5984/_replicate -d '{
  "source": "http://127.0.0.1:5984/contacts",
  "target": "http://127.0.0.1:5984/contacts-replica",
  "continuous": true
}'

[Diagram labels: Write, Write / Read, Write / Write, Write, Read]

Load Balancing / Caching

It's HTTP, so use the tools you know:

-> NGINX
-> Squid
-> Apache mod_proxy
-> …

Library

Conflict Resolution

http://thetowersofjacksonville.com/photogallery/photo12411/real.html

Conflict Resolution

function(doc) {
  if (doc._conflicts) {
    emit(doc._conflicts, null);
  }
}

{"total_rows":1,"offset":0,"rows":[{"id":"identifier","key":["2-7c971bb974251ae8541b8fe045964219"],"value":null}]}

$ curl -X DELETE "$HOST/db-replica/identifier?rev=2-de0ea16f8621cbac506d23a0fbbde08a"

{"ok":true,"id":"identifier","rev":"3-bfe83a296b0445c4d526ef35ef62ac14"}

$ curl -X PUT $HOST/db-replica/identifier -d '{"count":3,"_rev":"2-7c971bb974251ae8541b8fe045964219"}'

{"ok":true,"id":"identifier","rev":"3-5d0319b075a21b095719bc561def7122"}
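The two curl calls above follow a pattern: drop the revision you don't want, then save the merged document on top of the one you keep. The sketch below only plans those HTTP steps and sends nothing. planConflictResolution, the merge callback, the database URL, and the count values 1 and 2 are assumptions for illustration; the slides show only the merged result, count 3.

```javascript
// Hypothetical helper: list the HTTP requests that would resolve a conflict.
// "kept" is the revision to build on, "dropped" are the revisions to delete.
function planConflictResolution(dbUrl, kept, dropped, merge) {
  const merged = merge(kept, dropped);
  const steps = dropped.map((doc) => ({
    method: "DELETE",
    url: `${dbUrl}/${encodeURIComponent(doc._id)}?rev=${encodeURIComponent(doc._rev)}`,
  }));
  steps.push({
    method: "PUT",
    url: `${dbUrl}/${encodeURIComponent(kept._id)}`,
    // The PUT must carry the revision it updates, as in the curl call above.
    body: JSON.stringify({ ...merged, _id: kept._id, _rev: kept._rev }),
  });
  return steps;
}

// Invented contents for the two conflicting revisions of "identifier".
const keptRevision = { _id: "identifier", _rev: "2-7c971bb974251ae8541b8fe045964219", count: 1 };
const droppedRevision = { _id: "identifier", _rev: "2-de0ea16f8621cbac506d23a0fbbde08a", count: 2 };

// Application-specific merge rule for this example: add the counters.
const addCounts = (kept, dropped) => ({
  count: dropped.reduce((total, doc) => total + doc.count, kept.count),
});

const resolutionSteps = planConflictResolution(
  "http://127.0.0.1:5984/db-replica",
  keptRevision,
  [droppedRevision],
  addCounts
);

console.log(resolutionSteps);
```

How to merge is the application's decision. CouchDB only keeps the conflicting revisions around and reports them through _conflicts.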

Library

http://thetowersofjacksonville.com/photogallery/photo12411/real.htm

Clients

JavaScript: jQuery CouchDB library
.NET: Relax
Java: CouchDB4J
Perl: CouchDB::Client, Net::CouchDb
Ruby: CouchRest
Python: couchdb-python
Scala: scouchdb
And so much more ...

CouchDB in Mobile

http://www.digitaljournal.com/article/261153

Mobile Platforms Supported

Simply Works

PhoneGap, Lawnchair

Own Your Data

I like services like Google, but what about my privacy?!

I think CouchDB is the way to own my data

http://thetowersofjacksonville.com/photogallery/photo12411/real.htm

Partition with Cluster

Solutions

“CouchDB is built of the Web to the Web”

– Jacob Kaplan-Moss

We Need a Mindset Change

Stop seeing all the data in the world as relational data

Don't trust me ... or others. Try it!

And the Future…

It will probably be polyglot:

using an RDBMS and more than one NoSQL database per solution

Success Stories