20100310 Miller Sts
-
Upload
mlmilleratmit -
Category
Technology
-
view
3.089 -
download
2
description
Transcript of 20100310 Miller Sts
CouchDBApache
1
Official Beta!
Mike Miller UW/CloudantSTS 3/10/2010
INTRODUCTIONS
• Mike Miller
• not an Apache CouchDB committer
• Particle Physics Faculty at U. of Washington
• Cloudant cofounder, Chief Scientist
• Background: machine learning, analysis, big data, globally distributed systems
2
Mike Miller UW/CloudantSTS 3/10/2010
NEW PROBLEMS => NEW SOLUTIONS
3
Big Data Dynamic, Realtime
( http://www.economist.com/specialreports/PrinterFriendly.cfm?story_id=15557443)
Mike Miller UW/CloudantSTS 3/10/2010
NEW PROBLEMS => NEW SOLUTIONS
4
“old” “new”
Scale this horizontally/linearly
Mike Miller UW/CloudantSTS 3/10/2010
CAP THEOREM
5
consistency availability
partition tolerance
Brewer: Pick two (at a time)
write record
masterless, scalable(e.g., Cloudant’s Couch)
read record
Mike Miller UW/CloudantSTS 3/10/2010
• Name blame: Eric Evans from Rackspace
• Pragmatic response to new class of problems poorly fit to RDBMS:
• heterogeneous/dynamic data, geographic/network separation, scaling/concurrency
• Not an abandonment of 40+ years of DB research
• From home-brew => general solutions
• Best collection of online talks: https://nosqleast.com/2009/#agenda
6
NoSQL: Not Only SQL
Mike Miller UW/CloudantSTS 3/10/2010
TAXONOMY: 1
7
E. Eifrem, no:sql(east)
Mike Miller UW/CloudantSTS 3/10/2010
TAXONOMY: 2
8
E. Eifrem, no:sql(east)
Mike Miller UW/CloudantSTS 3/10/2010
EMIL’S CLASSIFICATION
9
but neglects: concurrency, performance, reliability...
CouchDBApache
10
Closing in on 1.0
Mike Miller UW/CloudantSTS 3/10/2010
CouchDB• easy, flexible, robust, concurrent, replicates
• Document: Schema-free database server
• RESTful JSON API
• views: MapReduce system for generating custom
• replication: Bi-directional incremental
• couchapp: lightweight HTML+JavaScript apps served directly from CouchDB using views to transform JSON
11
Mike Miller UW/CloudantSTS 3/10/2010
OF THE WEB
Django may be built for the Web, but CouchDB is built of the Web. I've never
seen software that so completely embraces the philosophies behind HTTP ... this is
what the software of the future looks like.
http://jacobian.org/writing/of-the-web/
Jacob Kaplan-MossOctober 17 2007
12
Mike Miller UW/CloudantSTS 3/10/2010
SHOULD YOU CARE?
The internet happened, and we ignored it. In retrospect that was a mistake.
Paraphrased from Bill Warner (Founder Avid, Wildfire)YCombinator Summer, 2008
13
CouchDB: An enabling technology
Mike Miller UW/CloudantSTS 3/10/2010
FROM INTEREST TO ADOPTION
14
• 100+ production apps
• 3 books being written
• Vibrant, open community
• Active commercial development
• Rapidly maturing
Mike Miller UW/CloudantSTS 3/10/2010
LETS TAKE A LOOK
15
Mike Miller UW/CloudantSTS 3/10/2010
DOCUMENTS
• Documents are JSON Objects
• Underscore-prefixed fields are reserved
• Documents can have binary attachments
• MVCC _rev deterministically generated from doc content16
Mike Miller UW/CloudantSTS 3/10/2010
REST API
• CreatePUT /mydb/mydocid
• RetrieveGET /mydb/mydocid
• UpdatePUT /mydb/mydocid
• DeleteDELETE /mydb/mydocid
17
etag headerplug-n-play cache
Mike Miller UW/CloudantSTS 3/10/2010 18
Mike Miller UW/CloudantSTS 3/10/2010
ROBUST
• Never overwrite previously committed data
• Documents stored in append, only b+trees, ‘copy-on-write’
• In the event of a server crash or power failure, just restart CouchDB -- there is no “repair”
• Take snapshots with “cp”
• Configurable levels of durability: can choose to fsync after every update, or less often to gain better throughput
19
Mike Miller UW/CloudantSTS 3/10/2010
CONCURRENT
• Erlang approach: lightweight processes to model the natural concurrency in a problem
• For CouchDB that means one process per TCP connection
• Lock-free architecture; each process works with an MVCC snapshot of a DB.
• Performance degrades gracefully under heavy concurrent load
20
Mike Miller UW/CloudantSTS 3/10/2010
CONCURRENCY IN ACTION: BBC
• WYSIWYG: graceful at scale: load, volume, concurrency21
Mike Miller UW/CloudantSTS 3/10/2010
VIEWS: B.Y.O. INDEX
• Custom, persistent representations of document data
• “Close to the metal” -- no dynamic queries in production, so you know exactly what you’re getting
• Generated using MapReduce functions written in JavaScript (and other languages)
• Each view must have a map function and may also have a reduce function
• Leverages view collation, rich view query API
22
Mike Miller UW/CloudantSTS 3/10/2010
VIEWS: INCREMENTAL• Computing a view can be expensive, so CouchDB saves the
result in a B-tree and keeps it up-to-date
• Leaf nodes store map results, inner nodes store reductions of children
http://horicky.blogspot.com/2008/10/couchdb-implementation.html23
Mike Miller UW/CloudantSTS 3/10/2010
VIEWS: EXAMPLE
24
key value
Mike Miller UW/CloudantSTS 3/10/2010
DOCUMENTS BY AUTHOR
25
key value
Mike Miller UW/CloudantSTS 3/10/2010
WORD COUNT
26
Mike Miller UW/CloudantSTS 3/10/2010
RELATIONAL
27
Mike Miller UW/CloudantSTS 3/10/2010
REPLICATION
• Peer-based, bi-directional replication using normal HTTP calls
• Mediated by a replicator process which can live on the source, target, or somewhere else entirely
• Replicate a subset of documents in a DB meeting criteria defined in a custom filter function (coming soon)
• Applications (_design documents) replicate along with the data
• Ideal for offline applications -- “ground computing”
28
Mike Miller UW/CloudantSTS 3/10/2010
REPLICATIONReplicatorSource Target
What updates do youhave since X?
GET /db/_changes?since=XWhich of these
are you missing?
POST /db/_missing_revs
Target needs theseid@rev documents
[GET /db/doc?open_revs=...]Save these documents
POST /db/_bulk_docs
Record the latest source update seen by the target
29
Mike Miller UW/CloudantSTS 3/10/2010
MASTER-SLAVE
30
Mike Miller UW/CloudantSTS 3/10/2010
MASTER-MASTER
31
Mike Miller UW/CloudantSTS 3/10/2010
ROBUST MULTI-MASTER
32
Mike Miller UW/CloudantSTS 3/10/2010
CONFLICT RESOLUTION
• Replication can introduce conflicts in a multi-master setup
• CouchDB deterministically chooses a winner; the loser is saved as a conflict revision
• Conflict revisions are replicated; if you replicate A→B and B→A, both will agree on the winning and losing revisions
33
Mike Miller UW/CloudantSTS 3/10/2010
CONFLICT RESOLUTION
replicate
PUT /a/foo PUT /b/foo
Case 1: save the same update to multiple replication partners
No Conflict
34
Mike Miller UW/CloudantSTS 3/10/2010
CONFLICT RESOLUTION
replicate
PUT /a/foo PUT /b/foo
Case 2: save different updates to the same document
Conflict
35
Mike Miller UW/CloudantSTS 3/10/2010
CONFLICT RESOLUTION
replicate
No Conflict36
Mike Miller UW/CloudantSTS 3/10/2010
CONFLICT RESOLUTION
37
save different updates to the same document to replication partners
Mike Miller UW/CloudantSTS 3/10/2010
REPLICATION IN ACTION: UBUNTU 9.10
38
replication is a potential game-changer
“When Ubuntu 9.10 is released on October 29th every single Ubuntu user will have an address book stored in CouchDB that replicates with one.ubuntu.com, and Tomboy notes that are replicated via a web API at the application but then stored in CouchDB and carried along in the CouchDB replication that we have set up. Optionally they can also store all their Firefox bookmarks in CouchDB and have those replicated as well. We'll be doing our best to help teach application developers to use CouchDB in order to ʻcloud-enableʼ their apps. A couch on every desktop!”
Elliot Murphy, 10/13/2009
Mike Miller UW/CloudantSTS 3/10/2010
STUFF I DIDN’T TALK ABOUT• Server-side processing: validate, upsert, show, list
• Authentication using HTTP Basic, Cookie, OAuth
• couchapp: serve lightweight HTML+JavaScript web apps directly out of CouchDB using _show and _list functions to transform JSON [1]
• _external indexers, including full-text search powered by CouchDB-Lucene [2]
• The many excellent client libraries, view servers, etc. contributed by the community [3]
• SQLite wrapper for CouchDB [4], and much more...
[1]: http://github.com/couchapp/couchapp[2]: http://github.com/rnewson/couchdb-lucene[3]: http://wiki.apache.org/couchdb/Related_Projects[4]: http://apsw.googlecode.com/svn/publish/couchdb.html
39
Mike Miller UW/CloudantSTS 3/10/2010
TAKE IT FOR A SPIN
40
easy offline:CouchDBX
hosted free:cloudant.com
Mike Miller UW/CloudantSTS 3/10/2010
relax
41