20100310 Miller Sts

Post on 01-Sep-2014

3.089 views 2 download

Tags:

description

My 10 minute overview of nosql and 20 minutes on couchdb. Additional slides on mapreduce views for complex queries added.

Transcript of 20100310 Miller Sts

CouchDBApache

1

Official Beta!

Mike Miller UW/CloudantSTS 3/10/2010

INTRODUCTIONS

• Mike Miller

• not an Apache CouchDB committer

• Particle Physics Faculty at U. of Washington

• Cloudant cofounder, Chief Scientist

• Background: machine learning, analysis, big data, globally distributed systems

2

Mike Miller UW/CloudantSTS 3/10/2010

NEW PROBLEMS => NEW SOLUTIONS

3

Big Data Dynamic, Realtime

( http://www.economist.com/specialreports/PrinterFriendly.cfm?story_id=15557443)

Mike Miller UW/CloudantSTS 3/10/2010

NEW PROBLEMS => NEW SOLUTIONS

4

“old” “new”

Scale this horizontally/linearly

Mike Miller UW/CloudantSTS 3/10/2010

CAP THEOREM

5

consistency availability

partition tolerance

Brewer: Pick two (at a time)

write record

masterless, scalable(e.g., Cloudant’s Couch)

read record

Mike Miller UW/CloudantSTS 3/10/2010

• Name blame: Eric Evans from Rackspace

• Pragmatic response to new class of problems poorly fit to RDBMS:

• heterogeneous/dynamic data, geographic/network separation, scaling/concurrency

• Not an abandonment of 40+ years of DB research

• From home-brew => general solutions

• Best collection of online talks: https://nosqleast.com/2009/#agenda

6

NoSQL: Not Only SQL

Mike Miller UW/CloudantSTS 3/10/2010

TAXONOMY: 1

7

E. Eifrem, no:sql(east)

Mike Miller UW/CloudantSTS 3/10/2010

TAXONOMY: 2

8

E. Eifrem, no:sql(east)

Mike Miller UW/CloudantSTS 3/10/2010

EMIL’S CLASSIFICATION

9

but neglects: concurrency, performance, reliability...

CouchDBApache

10

Closing in on 1.0

Mike Miller UW/CloudantSTS 3/10/2010

CouchDB• easy, flexible, robust, concurrent, replicates

• Document: Schema-free database server

• RESTful JSON API

• views: MapReduce system for generating custom

• replication: Bi-directional incremental

• couchapp: lightweight HTML+JavaScript apps served directly from CouchDB using views to transform JSON

11

Mike Miller UW/CloudantSTS 3/10/2010

OF THE WEB

Django may be built for the Web, but CouchDB is built of the Web. I've never

seen software that so completely embraces the philosophies behind HTTP ... this is

what the software of the future looks like.

http://jacobian.org/writing/of-the-web/

Jacob Kaplan-MossOctober 17 2007

12

Mike Miller UW/CloudantSTS 3/10/2010

SHOULD YOU CARE?

The internet happened, and we ignored it. In retrospect that was a mistake.

Paraphrased from Bill Warner (Founder Avid, Wildfire)YCombinator Summer, 2008

13

CouchDB: An enabling technology

Mike Miller UW/CloudantSTS 3/10/2010

FROM INTEREST TO ADOPTION

14

• 100+ production apps

• 3 books being written

• Vibrant, open community

• Active commercial development

• Rapidly maturing

Mike Miller UW/CloudantSTS 3/10/2010

LETS TAKE A LOOK

15

Mike Miller UW/CloudantSTS 3/10/2010

DOCUMENTS

• Documents are JSON Objects

• Underscore-prefixed fields are reserved

• Documents can have binary attachments

• MVCC _rev deterministically generated from doc content16

Mike Miller UW/CloudantSTS 3/10/2010

REST API

• CreatePUT /mydb/mydocid

• RetrieveGET /mydb/mydocid

• UpdatePUT /mydb/mydocid

• DeleteDELETE /mydb/mydocid

17

etag headerplug-n-play cache

Mike Miller UW/CloudantSTS 3/10/2010 18

Mike Miller UW/CloudantSTS 3/10/2010

ROBUST

• Never overwrite previously committed data

• Documents stored in append, only b+trees, ‘copy-on-write’

• In the event of a server crash or power failure, just restart CouchDB -- there is no “repair”

• Take snapshots with “cp”

• Configurable levels of durability: can choose to fsync after every update, or less often to gain better throughput

19

Mike Miller UW/CloudantSTS 3/10/2010

CONCURRENT

• Erlang approach: lightweight processes to model the natural concurrency in a problem

• For CouchDB that means one process per TCP connection

• Lock-free architecture; each process works with an MVCC snapshot of a DB.

• Performance degrades gracefully under heavy concurrent load

20

Mike Miller UW/CloudantSTS 3/10/2010

CONCURRENCY IN ACTION: BBC

• WYSIWYG: graceful at scale: load, volume, concurrency21

Mike Miller UW/CloudantSTS 3/10/2010

VIEWS: B.Y.O. INDEX

• Custom, persistent representations of document data

• “Close to the metal” -- no dynamic queries in production, so you know exactly what you’re getting

• Generated using MapReduce functions written in JavaScript (and other languages)

• Each view must have a map function and may also have a reduce function

• Leverages view collation, rich view query API

22

Mike Miller UW/CloudantSTS 3/10/2010

VIEWS: INCREMENTAL• Computing a view can be expensive, so CouchDB saves the

result in a B-tree and keeps it up-to-date

• Leaf nodes store map results, inner nodes store reductions of children

http://horicky.blogspot.com/2008/10/couchdb-implementation.html23

Mike Miller UW/CloudantSTS 3/10/2010

VIEWS: EXAMPLE

24

key value

Mike Miller UW/CloudantSTS 3/10/2010

DOCUMENTS BY AUTHOR

25

key value

Mike Miller UW/CloudantSTS 3/10/2010

WORD COUNT

26

Mike Miller UW/CloudantSTS 3/10/2010

RELATIONAL

27

Mike Miller UW/CloudantSTS 3/10/2010

REPLICATION

• Peer-based, bi-directional replication using normal HTTP calls

• Mediated by a replicator process which can live on the source, target, or somewhere else entirely

• Replicate a subset of documents in a DB meeting criteria defined in a custom filter function (coming soon)

• Applications (_design documents) replicate along with the data

• Ideal for offline applications -- “ground computing”

28

Mike Miller UW/CloudantSTS 3/10/2010

REPLICATIONReplicatorSource Target

What updates do youhave since X?

GET /db/_changes?since=XWhich of these

are you missing?

POST /db/_missing_revs

Target needs theseid@rev documents

[GET /db/doc?open_revs=...]Save these documents

POST /db/_bulk_docs

Record the latest source update seen by the target

29

Mike Miller UW/CloudantSTS 3/10/2010

MASTER-SLAVE

30

Mike Miller UW/CloudantSTS 3/10/2010

MASTER-MASTER

31

Mike Miller UW/CloudantSTS 3/10/2010

ROBUST MULTI-MASTER

32

Mike Miller UW/CloudantSTS 3/10/2010

CONFLICT RESOLUTION

• Replication can introduce conflicts in a multi-master setup

• CouchDB deterministically chooses a winner; the loser is saved as a conflict revision

• Conflict revisions are replicated; if you replicate A→B and B→A, both will agree on the winning and losing revisions

33

Mike Miller UW/CloudantSTS 3/10/2010

CONFLICT RESOLUTION

replicate

PUT /a/foo PUT /b/foo

Case 1: save the same update to multiple replication partners

No Conflict

34

Mike Miller UW/CloudantSTS 3/10/2010

CONFLICT RESOLUTION

replicate

PUT /a/foo PUT /b/foo

Case 2: save different updates to the same document

Conflict

35

Mike Miller UW/CloudantSTS 3/10/2010

CONFLICT RESOLUTION

replicate

No Conflict36

Mike Miller UW/CloudantSTS 3/10/2010

CONFLICT RESOLUTION

37

save different updates to the same document to replication partners

Mike Miller UW/CloudantSTS 3/10/2010

REPLICATION IN ACTION: UBUNTU 9.10

38

replication is a potential game-changer

“When Ubuntu 9.10 is released on October 29th every single Ubuntu user will have an address book stored in CouchDB that replicates with one.ubuntu.com, and Tomboy notes that are replicated via a web API at the application but then stored in CouchDB and carried along in the CouchDB replication that we have set up. Optionally they can also store all their Firefox bookmarks in CouchDB and have those replicated as well. We'll be doing our best to help teach application developers to use CouchDB in order to ʻcloud-enableʼ their apps. A couch on every desktop!”

Elliot Murphy, 10/13/2009

Mike Miller UW/CloudantSTS 3/10/2010

STUFF I DIDN’T TALK ABOUT• Server-side processing: validate, upsert, show, list

• Authentication using HTTP Basic, Cookie, OAuth

• couchapp: serve lightweight HTML+JavaScript web apps directly out of CouchDB using _show and _list functions to transform JSON [1]

• _external indexers, including full-text search powered by CouchDB-Lucene [2]

• The many excellent client libraries, view servers, etc. contributed by the community [3]

• SQLite wrapper for CouchDB [4], and much more...

[1]: http://github.com/couchapp/couchapp[2]: http://github.com/rnewson/couchdb-lucene[3]: http://wiki.apache.org/couchdb/Related_Projects[4]: http://apsw.googlecode.com/svn/publish/couchdb.html

39

Mike Miller UW/CloudantSTS 3/10/2010

TAKE IT FOR A SPIN

40

easy offline:CouchDBX

hosted free:cloudant.com

Mike Miller UW/CloudantSTS 3/10/2010

relax

41