Post on 10-May-2015
1
Navigating the NoSQL Landscape
Tugdual “tug” Grall, Technology Evangelist, @tgrall
2
What we’ll talk about
• Why are RDBMSs not enough?
• What are the different NoSQL taxonomies?
• Which “NoSQL” is right for me?
3
Growth is the New Reality
• Instagram gained nearly 1 million users overnight when they expanded to Android
4
Does it work with RDBMS backend?
Application Scales Out: Just add more commodity web servers
Database Scales Up: Get a bigger, more complex server
Note – Relational database technology is great for what it is great for, but it is not great for this.
5
Some alternatives for scaling out your RDBMS
Scale out your RDBMS
• Run many SQL servers
• Data is sharded (most of the time using client code)
• Memcached for faster response time
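A minimal sketch of what "sharded using client code" with a cache in front can look like. The class and method names are hypothetical, plain dicts stand in for the SQL servers and for memcached, and the hash-modulo routing is one common choice rather than the scheme of any particular product.

```python
import hashlib


class ShardedStore:
    """Hypothetical client-side sharding over several SQL servers plus a cache."""

    def __init__(self, shard_count):
        # Each dict stands in for one SQL server; a real client would hold connections.
        self.shards = [dict() for _ in range(shard_count)]
        # Stands in for memcached.
        self.cache = {}

    def shard_for(self, key):
        # A stable hash, so every app server routes a given key to the same shard.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def write(self, key, row):
        self.shards[self.shard_for(key)][key] = row
        # Invalidate the cached copy so readers do not see stale data.
        self.cache.pop(key, None)

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        row = self.shards[self.shard_for(key)].get(key)
        if row is not None:
            self.cache[key] = row
        return row


store = ShardedStore(shard_count=4)
store.write("user:1", {"name": "Ann"})
profile = store.read("user:1")
```

With modulo routing, changing `shard_count` sends most keys to a different shard, so growing the cluster means migrating data by hand.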
6
Scale Out with RDBMS
Is this a good approach to scale?
• Lots of components to deploy
• Scale by hand
– Caching
– Sharding/Replication
Learn from others: this scenario costs time and money. Scaling SQL is potentially disastrous when going viral: it is a very risky time for major code changes and migrations, and you have no time when usage is skyrocketing.
7
Lacking market solutions, users forced to invent
Dynamo: October 2007
Cassandra: August 2008
Voldemort: February 2009
Bigtable: November 2006
Very few organizations want to (fewer can) build and maintain database software technology. But every organization building interactive web applications needs this technology.
• No schema required before inserting data
• No schema change required to change data format
• Auto-sharding without application participation
• Distributed queries
• Integrated main memory caching
• Data synchronization (mobile, multi-datacenter)
8
Survey: Schema inflexibility #1 adoption driver
What is the biggest data management problem driving your use of NoSQL in the coming year?
• Lack of flexibility/rigid schemas: 49%
• Inability to scale out data: 35%
• High latency/low performance: 29%
• Costs: 16%
• All of these: 12%
• Other: 11%
Source: Couchbase NoSQL Survey, December 2011, n=1351
9
NoSQL database matches application logic tier architecture. Data layer now scales with linear cost and constant performance.
Application Scales Out: Just add more commodity web servers
Database Scales Out: Just add more commodity data servers
Scaling out flattens the cost and performance curves.
NoSQL Database Servers
10
NOSQL TAXONOMY
11
The Key-Value Store – the foundation of NoSQL
[Figure: a key mapped to an opaque binary value]
12
Memcached – the NoSQL precursor
[Figure: a key mapped to an opaque binary value, stored in Memcached]
• In-memory only
• Limited set of operations
– Blob Storage: Set, Add, Replace, CAS
– Retrieval: Get
– Structured Data: Append, Increment
“Simple and fast.”
Challenges: cold cache, disruptive elasticity
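To make the operation list concrete, here is a toy in-memory model of those command semantics. It is a sketch, not a memcached client: the class `MiniCache` and its version-token scheme are assumptions made for illustration, and expiry, eviction and networking are left out.

```python
class MiniCache:
    """Toy model of memcached-style command semantics (not a real client)."""

    def __init__(self):
        self._data = {}     # key -> (value, version token)
        self._version = 0

    def _store(self, key, value):
        self._version += 1
        self._data[key] = (value, self._version)
        return True

    def set(self, key, value):
        # Unconditional write.
        return self._store(key, value)

    def add(self, key, value):
        # Write only if the key does not exist yet.
        return False if key in self._data else self._store(key, value)

    def replace(self, key, value):
        # Write only if the key already exists.
        return self._store(key, value) if key in self._data else False

    def get(self, key):
        entry = self._data.get(key)
        return None if entry is None else entry[0]

    def gets(self, key):
        # Value plus the token that a later cas() must present.
        return self._data.get(key, (None, None))

    def cas(self, key, value, token):
        # Check-and-set: write only if nobody changed the key since gets().
        entry = self._data.get(key)
        if entry is None or entry[1] != token:
            return False
        return self._store(key, value)

    def append(self, key, suffix):
        if key not in self._data:
            return False
        return self._store(key, self._data[key][0] + suffix)

    def incr(self, key, delta=1):
        # The counter is kept as a decimal string, incremented in place.
        if key not in self._data:
            return None
        new_value = int(self._data[key][0]) + delta
        self._store(key, str(new_value))
        return new_value
```

Because everything lives in `_data` in memory, a restart starts from an empty cache, which is the "cold cache" challenge named on the slide.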
13
NoSQL catalog
Rows: Cache (memory only), Database (memory/disk)
Key-Value: Memcached
14
Redis – More “Structured Data” commands
[Figure: a key mapped to a “data structure” value: Blob, List, Set, Hash, …]
• Disk persistence (eventual consistency on the disk)
• Vast set of operations
– Blob Storage: Set, Add, Replace, CAS
– Retrieval: Get, Pub-Sub
– Structured Data: Strings, Hashes, Lists, Sets, Sorted Sets
• Example operations for a Set: add, count, subtract sets, intersection, is member, atomic move from one set to another
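The example Set operations can be sketched with plain Python sets. This is a local analogy, not a Redis client; the comments name the Redis commands that correspond to each step, and the `move` helper and the sample data are hypothetical.

```python
online = {"ann", "bob"}
premium = {"bob", "cid"}

online.add("dee")                 # add a member (SADD)
count = len(online)               # count members (SCARD)
only_online = online - premium    # subtract sets (SDIFF)
both = online & premium           # intersection (SINTER)
is_member = "ann" in online       # membership test (SISMEMBER)


def move(source, destination, member):
    """Move a member between sets (SMOVE).

    Redis performs this atomically on the server; this local version
    offers no such guarantee under concurrent access.
    """
    if member not in source:
        return False
    source.remove(member)
    destination.add(member)
    return True


moved = move(online, premium, "ann")
```

The point of the "data structure" model is that these operations run on the server next to the data, so the client does not have to fetch, modify and rewrite a whole blob.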
15
NoSQL catalog
Rows: Cache (memory only), Database (memory/disk)
Key-Value: Memcached
Data Structure: Redis
16
Membase – From key-value cache to database
• Disk-based with built-in memcached cache
• Cache refill on restart
• Memcached compatible (drop-in replacement)
• Highly available (data replication)
• Add or remove capacity to live cluster
“Simple, fast, elastic.”
[Figure: a key mapped to an opaque binary value, stored in Membase]
17
NoSQL catalog
Rows: Cache (memory only), Database (memory/disk)
Key-Value: Memcached, Membase
Data Structure: Redis
18
Couchbase – document-oriented database
[Figure: a key mapped to a JSON object (“document”), stored in Couchbase]
{ “string” : “string”, “string” : value, “string” : { “string” : “string”, “string” : value }, “string” : [ array ] }
• Auto-sharding
• Disk-based with built-in memcached cache
• Cache refill on restart
• Memcached compatible (drop-in replacement)
• Highly available (data replication)
• Add or remove capacity to live cluster
• When values are JSON objects (“documents”): create indices and views, and query against the views
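The idea of building a view over JSON values can be sketched as follows. Couchbase views are defined with JavaScript map functions on the server; this is a Python analogy under that assumption, and `build_view`, `query`, `map_by_city` and the sample documents are hypothetical names for illustration.

```python
import bisect
import json

documents = {
    "user::1": '{"type": "user", "name": "Ann", "city": "Paris"}',
    "user::2": '{"type": "user", "name": "Bob", "city": "London"}',
    "blob::1": "not json",  # opaque value: stored, but not indexed
}


def map_by_city(doc_id, doc):
    # Map function: emit (key, value) pairs for the documents of interest.
    if doc.get("type") == "user":
        yield doc["city"], doc["name"]


def build_view(docs, map_fn):
    """Run the map function over every JSON document and sort the rows by key."""
    rows = []
    for doc_id, raw in docs.items():
        try:
            doc = json.loads(raw)
        except ValueError:
            continue  # values that are not JSON are skipped
        if not isinstance(doc, dict):
            continue
        for key, value in map_fn(doc_id, doc):
            rows.append((key, doc_id, value))
    rows.sort()
    return rows


def query(view, key):
    """Look up all rows with the given key using binary search on the sorted view."""
    keys = [row[0] for row in view]
    low = bisect.bisect_left(keys, key)
    high = bisect.bisect_right(keys, key)
    return view[low:high]


view = build_view(documents, map_by_city)
paris_users = query(view, "Paris")
```

Values that are not JSON remain reachable by key but are simply absent from the view.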
19
NoSQL catalog
Rows: Cache (memory only), Database (memory/disk)
Key-Value: Memcached, Membase
Data Structure: Redis
Document: Couchbase
20
MongoDB – Document-oriented database
[Figure: a key mapped to a BSON object (“document”), stored in MongoDB]
{ “string” : “string”, “string” : value, “string” : { “string” : “string”, “string” : value }, “string” : [ array ] }
• Disk-based with in-memory “caching”
• BSON (“binary JSON”) format and wire protocol
• Master-slave replication
• Auto-sharding
• Values are BSON objects
• Supports ad hoc queries, best when indexed
21
NoSQL catalog
Rows: Cache (memory only), Database (memory/disk)
Key-Value: Memcached, Membase
Data Structure: Redis
Document: MongoDB, Couchbase
22
Cassandra – Column overlays
• Disk-based system
• Clustered
• External caching required for low-latency reads
• “Columns” are overlaid on the data
• Not all rows must have all columns
• Supports efficient queries on columns
• Restart required when adding columns
[Figure: a key mapped to an opaque binary value in Cassandra, overlaid with Column 1, Column 2, and Column 3 (not present)]
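"Not all rows must have all columns" can be pictured with nested dicts. This is only a data-model sketch; the row keys, column names and the `column_slice` helper are hypothetical, and nothing here reflects how Cassandra stores or queries data internally.

```python
# Each row key maps to only the columns that are actually present for that row.
rows = {
    "user:1": {"name": "Ann", "email": "ann@example.com"},
    "user:2": {"name": "Bob"},                      # no "email" column
    "user:3": {"name": "Cid", "email": "cid@example.com", "city": "Oslo"},
}


def column_slice(table, column):
    """Return the value of one column for every row that has it."""
    return {key: cols[column] for key, cols in table.items() if column in cols}


emails = column_slice(rows, "email")
cities = column_slice(rows, "city")
```

A missing column costs nothing for the rows that lack it, unlike a NULL-filled column in a fixed relational schema.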
23
NoSQL catalog
Rows: Cache (memory only), Database (memory/disk)
Key-Value: Memcached, Membase
Data Structure: Redis
Document: MongoDB, Couchbase
Column: Cassandra
24
Neo4j – Graph database
• Disk-based system
• External caching required for low-latency reads
• Nodes, relationships and paths
• Properties on nodes
• Delete, Insert, Traverse, etc.
[Figure: five keys, each mapped to an opaque binary value, connected as nodes in Neo4j]
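Nodes with properties, typed relationships and a traversal can be sketched in a few lines. This is a generic breadth-first traversal over in-memory data, not the Neo4j API; the node ids, the `KNOWS` relationship type and the helper names are hypothetical.

```python
from collections import deque

# Nodes carry properties; relationships are (start, type, end) triples.
nodes = {
    1: {"name": "Ann"},
    2: {"name": "Bob"},
    3: {"name": "Cid"},
    4: {"name": "Dee"},
}
relationships = [(1, "KNOWS", 2), (2, "KNOWS", 3), (3, "KNOWS", 4)]


def neighbors(node_id, rel_type):
    return [end for start, kind, end in relationships
            if start == node_id and kind == rel_type]


def traverse(start, rel_type, max_depth):
    """Breadth-first traversal: names reachable within max_depth hops, with depth."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node_id, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for nxt in neighbors(node_id, rel_type):
            if nxt not in seen:
                seen.add(nxt)
                found.append((nodes[nxt]["name"], depth + 1))
                queue.append((nxt, depth + 1))
    return found


friends_of_friends = traverse(1, "KNOWS", max_depth=2)
```

This "friends of friends" shape of query is what the graph model is built for, and it is awkward to express with key lookups alone.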
25
NoSQL catalog
Rows: Cache (memory only), Database (memory/disk)
Key-Value: Memcached, Membase
Data Structure: Redis
Document: MongoDB, Couchbase
Column: Cassandra
Graph: Neo4j
26
NoSQL catalog
Rows: Cache (memory only), Database (memory/disk)
Key-Value: Memcached, Membase, Coherence
Data Structure: Redis
Document: MongoDB, Couchbase
Column: Cassandra, HBase
Graph: Neo4j, InfiniteGraph
27
28
What about Hadoop?
29
Hadoop : Big Data Swiss Army Knife
• Oozie: Workflow, coordination
• Sqoop: Data connector to import/export data
• Hive: SQL-like interface
• Pig: High-level programming language
• Mahout: Machine learning library
• Whirr: Hadoop management tools for cloud services
• Flume: Aggregator
• MapReduce: Framework to process large volumes of data
• HBase: Key-value data store
• ZooKeeper: Centralized configuration management
• HDFS: Distributed file system
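The MapReduce model from the list above can be sketched in a single process. This is only the programming model (map, shuffle, reduce) applied to a word count; it does not use Hadoop, and the function names and sample lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key.
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["click view click", "view buy"])
```

In a real cluster the map and reduce calls run in parallel on many machines, with the input read from a distributed file system such as HDFS.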
30
So what? Hadoop & Couchbase
[Diagram: data flow between Couchbase and Hadoop in three numbered steps. Labels: click stream events; profiles, campaigns; profiles, real-time campaign statistics]
40 milliseconds to respond with the decision.
31
WHICH ONE IS RIGHT FOR ME?
32
Survey: Schema inflexibility #1 adoption driver
What is the biggest data management problem driving your use of NoSQL in the coming year?
• Lack of flexibility/rigid schemas: 49%
• Inability to scale out data: 35%
• High latency/low performance: 29%
• Costs: 16%
• All of these: 12%
• Other: 11%
Source: Couchbase NoSQL Survey, December 2011, n=1351
33
Lack of Flexibility / Rigid Schema
• Aggregate Data Models (Martin Fowler)
– Flexible Data Structure
– Optimized Access
– Easy to distribute data
o::1001
{
  uid: ji22jd,
  customer: Ann,
  line_items: [
    { sku: 0321293533, quan: 3, unit_price: 48.0 },
    { sku: 0321601912, quan: 1, unit_price: 39.0 },
    { sku: 0131495054, quan: 1, unit_price: 51.0 }
  ],
  payment: { type: Amex, expiry: 04/2001, last5: 12345 }
}
http://martinfowler.com/bliki/AggregateOrientedDatabase.html
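A short sketch of why the aggregate gives "optimized access": the whole order is read with one key lookup and no joins. The dict `store` stands in for the database and `order_total` is a hypothetical helper; the document is the one from the slide, with string values quoted.

```python
# The whole order, line items and payment included, lives under a single key.
store = {
    "o::1001": {
        "uid": "ji22jd",
        "customer": "Ann",
        "line_items": [
            {"sku": "0321293533", "quan": 3, "unit_price": 48.0},
            {"sku": "0321601912", "quan": 1, "unit_price": 39.0},
            {"sku": "0131495054", "quan": 1, "unit_price": 51.0},
        ],
        "payment": {"type": "Amex", "expiry": "04/2001", "last5": "12345"},
    }
}


def order_total(db, key):
    """One lookup returns everything needed to price the order."""
    order = db[key]
    return sum(item["quan"] * item["unit_price"] for item in order["line_items"])


total = order_total(store, "o::1001")  # 3 * 48.0 + 39.0 + 51.0
```

Because the aggregate is the unit of storage, it is also the natural unit of distribution: an order can be placed on any server by hashing its key.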
34
Use Cases
Key Value
• Session Management
• User Profile/Preferences
• Shopping Cart
Document
• Event Logging
• Content Management
• Web Analytics
• E-Commerce Application
Columns
• Event Logging
• Content Management
• Counters
Graph
• Connected Data / Social Networks
• Routing, Dispatch
• Recommendations based on Social Graph
Thanks to Martin Fowler
35
Production Environment
US DATA CENTER
EMEA DC
APAC DC
36
Scale out your data
• Modifying the cluster topology should be simple
– Add, remove, configure nodes on a running system
• What is the impact of topology changes?
– Sharding, caching of the data
– Availability of the service during cluster changes
• More hardware = more failures
– Availability, reliability of the system: failover support
37
Add Nodes
• Two servers added to cluster
– One-click operation
• Docs automatically rebalanced across cluster
– Even distribution of docs
– Minimum doc movement
• Cluster map updated
• App database calls now distributed over larger # of servers
User Configured Replica Count = 1
[Diagram: App Server 1 and App Server 2, each with a Couchbase client library and a cluster map, send read/write/update requests to the Couchbase Server cluster. Servers 1 to 3 hold active docs and replica docs; Servers 4 and 5 are added and receive their share of active and replica docs.]
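The steps above (even distribution, minimum movement, updated cluster map) can be sketched with a fixed number of partitions. This is an illustration under stated assumptions, not Couchbase's implementation: 16 partitions, CRC32 hashing, the greedy `rebalance` loop and the server names are all hypothetical, replicas are ignored, and only adding servers is handled.

```python
import zlib

NUM_PARTITIONS = 16  # assumed small number, for readability


def partition_for(key):
    # A key always hashes to the same partition, whatever the cluster size.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def initial_map(servers):
    # Cluster map: partition -> owning server.
    return {p: servers[p % len(servers)] for p in range(NUM_PARTITIONS)}


def rebalance(cluster_map, servers):
    """Even out partition ownership after adding servers, moving as few as possible."""
    new_map = dict(cluster_map)
    owned = {server: [] for server in servers}
    for partition, owner in sorted(new_map.items()):
        owned[owner].append(partition)
    while True:
        busiest = max(servers, key=lambda s: len(owned[s]))
        idlest = min(servers, key=lambda s: len(owned[s]))
        if len(owned[busiest]) - len(owned[idlest]) <= 1:
            return new_map
        # Hand one partition from the busiest server to the idlest one.
        partition = owned[busiest].pop()
        owned[idlest].append(partition)
        new_map[partition] = idlest


old_map = initial_map(["server1", "server2", "server3"])
new_map = rebalance(old_map, ["server1", "server2", "server3", "server4", "server5"])
moved = [p for p in old_map if old_map[p] != new_map[p]]
```

Clients route a request with `new_map[partition_for(key)]`, so only the keys in `moved` partitions change servers; everything else stays where it was.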
38
Fail Over Node
• App servers happily accessing docs on Server 3
• Server fails
• App server requests to Server 3 fail
• Cluster detects server has failed
– Promotes replicas of docs to active
– Updates cluster map
• App server requests for docs now go to appropriate server
• Typically rebalance would follow
User Configured Replica Count = 1
[Diagram: App Server 1 and App Server 2, each with a Couchbase client library and a cluster map, access the Couchbase Server cluster of Servers 1 to 5. Server 3 fails; the replica docs for its active docs, held on the other servers, are promoted to active.]
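The failover sequence above can be sketched as an update of two maps. This is an illustration only: it assumes a replica count of 1, that a partition's replica never sits on the same server as its active copy, and the `fail_over` helper and server names are hypothetical.

```python
def fail_over(active, replica, failed):
    """Promote replicas for the failed server's partitions and update both maps.

    active and replica map partition -> server. A replica entry of None means
    the partition has no replica until a rebalance recreates one.
    """
    new_active = dict(active)
    new_replica = dict(replica)
    for partition, server in active.items():
        if server == failed:
            # The surviving replica copy becomes the active copy.
            new_active[partition] = replica[partition]
            new_replica[partition] = None
    for partition, server in replica.items():
        if server == failed:
            # Replicas that lived on the failed server are lost.
            new_replica[partition] = None
    return new_active, new_replica


active_map = {0: "server1", 1: "server2", 2: "server3"}
replica_map = {0: "server2", 1: "server3", 2: "server1"}
active_after, replica_after = fail_over(active_map, replica_map, failed="server3")
```

After failover some partitions have no replica left, which is why a rebalance typically follows.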
39
Performance
• What is my working set?
• How does the cache work?
– Put your data in RAM
• How should I design my data model?
– Aggregate model
– Easy to change
40
41
Management and Monitoring
• Do not forget about Operations!
– Service Reliability Engineering Team will thank you!
• Manage your cluster easily:
– Command line, Administration Console to change cluster topology
• Monitor “your NoSQL”
– Analyze the overall status of your cluster
– View and fix bottlenecks
42
Conclusion
• One Size Does Not Fit All
• Overview of the NoSQL types
• Choose the right solution
– Developer Productivity
– Large Scale Data
43
QUESTIONS?
44
Couchbase automatically distributes data across commodity servers. Built-in caching enables apps to read and write data with sub-millisecond latency. And with no schema to manage,
Couchbase effortlessly accommodates changing data management requirements.
Simple. Fast. Elastic. NoSQL.