20100310 Miller Sts

41
CouchDB Apache 1 Official Beta!

description

My 10 minute overview of nosql and 20 minutes on couchdb. Additional slides on mapreduce views for complex queries added.

Transcript of 20100310 Miller Sts

Page 1: 20100310 Miller Sts

CouchDBApache

1

Official Beta!

Page 2: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

INTRODUCTIONS

• Mike Miller

• not an Apache CouchDB committer

• Particle Physics Faculty at U. of Washington

• Cloudant cofounder, Chief Scientist

• Background: machine learning, analysis, big data, globally distributed systems

2

Page 3: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

NEW PROBLEMS => NEW SOLUTIONS

3

Big Data Dynamic, Realtime

( http://www.economist.com/specialreports/PrinterFriendly.cfm?story_id=15557443)

Page 4: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

NEW PROBLEMS => NEW SOLUTIONS

4

“old” “new”

Scale this horizontally/linearly

Page 5: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

CAP THEOREM

5

consistency availability

partition tolerance

Brewer: Pick two (at a time)

write record

masterless, scalable(e.g., Cloudant’s Couch)

read record

Page 6: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

• Name blame: Eric Evans from Rackspace

• Pragmatic response to new class of problems poorly fit to RDBMS:

• heterogeneous/dynamic data, geographic/network separation, scaling/concurrency

• Not an abandonment of 40+ years of DB research

• From home-brew => general solutions

• Best collection of online talks: https://nosqleast.com/2009/#agenda

6

NoSQL: Not Only SQL

Page 7: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

TAXONOMY: 1

7

E. Eifrem, no:sql(east)

Page 8: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

TAXONOMY: 2

8

E. Eifrem, no:sql(east)

Page 9: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

EMIL’S CLASSIFICATION

9

but neglects: concurrency, performance, reliability...

Page 10: 20100310 Miller Sts

CouchDBApache

10

Closing in on 1.0

Page 11: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

CouchDB• easy, flexible, robust, concurrent, replicates

• Document: Schema-free database server

• RESTful JSON API

• views: MapReduce system for generating custom

• replication: Bi-directional incremental

• couchapp: lightweight HTML+JavaScript apps served directly from CouchDB using views to transform JSON

11

Page 12: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

OF THE WEB

Django may be built for the Web, but CouchDB is built of the Web. I've never

seen software that so completely embraces the philosophies behind HTTP ... this is

what the software of the future looks like.

http://jacobian.org/writing/of-the-web/

Jacob Kaplan-MossOctober 17 2007

12

Page 13: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

SHOULD YOU CARE?

The internet happened, and we ignored it. In retrospect that was a mistake.

Paraphrased from Bill Warner (Founder Avid, Wildfire)YCombinator Summer, 2008

13

CouchDB: An enabling technology

Page 14: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

FROM INTEREST TO ADOPTION

14

• 100+ production apps

• 3 books being written

• Vibrant, open community

• Active commercial development

• Rapidly maturing

Page 15: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

LETS TAKE A LOOK

15

Page 16: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

DOCUMENTS

• Documents are JSON Objects

• Underscore-prefixed fields are reserved

• Documents can have binary attachments

• MVCC _rev deterministically generated from doc content16

Page 17: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

REST API

• CreatePUT /mydb/mydocid

• RetrieveGET /mydb/mydocid

• UpdatePUT /mydb/mydocid

• DeleteDELETE /mydb/mydocid

17

etag headerplug-n-play cache

Page 18: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010 18

Page 19: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

ROBUST

• Never overwrite previously committed data

• Documents stored in append, only b+trees, ‘copy-on-write’

• In the event of a server crash or power failure, just restart CouchDB -- there is no “repair”

• Take snapshots with “cp”

• Configurable levels of durability: can choose to fsync after every update, or less often to gain better throughput

19

Page 20: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

CONCURRENT

• Erlang approach: lightweight processes to model the natural concurrency in a problem

• For CouchDB that means one process per TCP connection

• Lock-free architecture; each process works with an MVCC snapshot of a DB.

• Performance degrades gracefully under heavy concurrent load

20

Page 21: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

CONCURRENCY IN ACTION: BBC

• WYSIWYG: graceful at scale: load, volume, concurrency21

Page 22: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

VIEWS: B.Y.O. INDEX

• Custom, persistent representations of document data

• “Close to the metal” -- no dynamic queries in production, so you know exactly what you’re getting

• Generated using MapReduce functions written in JavaScript (and other languages)

• Each view must have a map function and may also have a reduce function

• Leverages view collation, rich view query API

22

Page 23: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

VIEWS: INCREMENTAL• Computing a view can be expensive, so CouchDB saves the

result in a B-tree and keeps it up-to-date

• Leaf nodes store map results, inner nodes store reductions of children

http://horicky.blogspot.com/2008/10/couchdb-implementation.html23

Page 24: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

VIEWS: EXAMPLE

24

key value

Page 25: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

DOCUMENTS BY AUTHOR

25

key value

Page 26: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

WORD COUNT

26

Page 27: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

RELATIONAL

27

Page 28: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

REPLICATION

• Peer-based, bi-directional replication using normal HTTP calls

• Mediated by a replicator process which can live on the source, target, or somewhere else entirely

• Replicate a subset of documents in a DB meeting criteria defined in a custom filter function (coming soon)

• Applications (_design documents) replicate along with the data

• Ideal for offline applications -- “ground computing”

28

Page 29: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

REPLICATIONReplicatorSource Target

What updates do youhave since X?

GET /db/_changes?since=XWhich of these

are you missing?

POST /db/_missing_revs

Target needs theseid@rev documents

[GET /db/doc?open_revs=...]Save these documents

POST /db/_bulk_docs

Record the latest source update seen by the target

29

Page 30: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

MASTER-SLAVE

30

Page 31: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

MASTER-MASTER

31

Page 32: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

ROBUST MULTI-MASTER

32

Page 33: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

CONFLICT RESOLUTION

• Replication can introduce conflicts in a multi-master setup

• CouchDB deterministically chooses a winner; the loser is saved as a conflict revision

• Conflict revisions are replicated; if you replicate A→B and B→A, both will agree on the winning and losing revisions

33

Page 34: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

CONFLICT RESOLUTION

replicate

PUT /a/foo PUT /b/foo

Case 1: save the same update to multiple replication partners

No Conflict

34

Page 35: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

CONFLICT RESOLUTION

replicate

PUT /a/foo PUT /b/foo

Case 2: save different updates to the same document

Conflict

35

Page 36: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

CONFLICT RESOLUTION

replicate

No Conflict36

Page 37: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

CONFLICT RESOLUTION

37

save different updates to the same document to replication partners

Page 38: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

REPLICATION IN ACTION: UBUNTU 9.10

38

replication is a potential game-changer

“When Ubuntu 9.10 is released on October 29th every single Ubuntu user will have an address book stored in CouchDB that replicates with one.ubuntu.com, and Tomboy notes that are replicated via a web API at the application but then stored in CouchDB and carried along in the CouchDB replication that we have set up. Optionally they can also store all their Firefox bookmarks in CouchDB and have those replicated as well. We'll be doing our best to help teach application developers to use CouchDB in order to ʻcloud-enableʼ their apps. A couch on every desktop!”

Elliot Murphy, 10/13/2009

Page 39: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

STUFF I DIDN’T TALK ABOUT• Server-side processing: validate, upsert, show, list

• Authentication using HTTP Basic, Cookie, OAuth

• couchapp: serve lightweight HTML+JavaScript web apps directly out of CouchDB using _show and _list functions to transform JSON [1]

• _external indexers, including full-text search powered by CouchDB-Lucene [2]

• The many excellent client libraries, view servers, etc. contributed by the community [3]

• SQLite wrapper for CouchDB [4], and much more...

[1]: http://github.com/couchapp/couchapp[2]: http://github.com/rnewson/couchdb-lucene[3]: http://wiki.apache.org/couchdb/Related_Projects[4]: http://apsw.googlecode.com/svn/publish/couchdb.html

39

Page 40: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

TAKE IT FOR A SPIN

40

easy offline:CouchDBX

hosted free:cloudant.com

Page 41: 20100310 Miller Sts

Mike Miller UW/CloudantSTS 3/10/2010

relax

41