The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
Five questions for your NoSQL solution!
Jonathan Ellis
CTO, DataStax
Project Chair, Apache Cassandra
©2012 DataStax
how do I model my application?
Popular options
• Key/value
• Tabular
• Document
• Graph?
Schema is your friend
{
  "id": "e451dd42-ece3-11e1-a0a3-34159e154f4c",
  "name": "jbellis",
  "state": "TX",
  "birthdate": "1/1/1976",
  "email_addresses": ["jbellis@gmail", "[email protected]"]
}
SQL can be your friend too
CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date date
);
CREATE INDEX ON users(state);
SELECT * FROM users
WHERE state = 'Texas' AND birth_date > '1950-01-01';
CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date date
);
CREATE TABLE users_addresses (
  user_id uuid REFERENCES users,
  email text
);
SELECT *
FROM users JOIN users_addresses ON users.id = users_addresses.user_id;
Collections
CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date date,
  email_addresses set<text>
);
UPDATE users
SET email_addresses = email_addresses + {'[email protected]', '[email protected]'}
WHERE id = e451dd42-ece3-11e1-a0a3-34159e154f4c;
Collections
Joins don’t scale
• No joins
• No subqueries
• No aggregation functions* or GROUP BY
• ORDER BY?
SELECT * FROM tweets
WHERE user_id IN (SELECT follower FROM followers WHERE user_id = 'driftx');
[Diagram: followers table, "?", tweets table]
CREATE TABLE timeline (
  user_id uuid,
  tweet_id timeuuid,
  tweet_author uuid,
  tweet_body text,
  PRIMARY KEY (user_id, tweet_id)
);
Clustering in Cassandra
user_id   tweet_id     tweet_author   tweet_body
jbellis   3290f9da..   rbranson       lorem
jbellis   3895411a..   tjake          ipsum
...       ...          ...            ...
driftx    3290f9da..   rbranson       lorem
driftx    71b46a84..   yzhang         dolor
...       ...          ...            ...
yukim     3290f9da..   rbranson       lorem
yukim     e451dd42..   tjake          amet
...       ...          ...            ...
SELECT * FROM timeline
WHERE user_id = 'driftx';
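The timeline table trades the subquery for denormalization: each tweet is copied into the partition of every user who should see it, clustered by tweet_id, so reading a timeline becomes a single-partition query. Below is a minimal in-memory sketch of that fan-out-on-write idea, assuming a hypothetical `followers` mapping (author to the users who see the author's tweets) and small integers standing in for timeuuids; it models the data layout, not how Cassandra stores it.

```python
from bisect import insort
from collections import defaultdict

# Hypothetical follow graph: author -> users who see the author's tweets.
followers = {
    "rbranson": ["jbellis", "driftx", "yukim"],
    "tjake": ["jbellis", "yukim"],
    "yzhang": ["driftx"],
}

# One list per partition (user_id), kept sorted by the clustering key
# (tweet_id), mirroring PRIMARY KEY (user_id, tweet_id).
timeline = defaultdict(list)


def post_tweet(tweet_id, author, body):
    """Fan out on write: copy the tweet into each follower's partition."""
    for user_id in followers.get(author, []):
        insort(timeline[user_id], (tweet_id, author, body))


def read_timeline(user_id):
    """Like SELECT * FROM timeline WHERE user_id = ?: one partition, already in order."""
    return list(timeline.get(user_id, []))


post_tweet(1, "rbranson", "lorem")
post_tweet(2, "tjake", "ipsum")
post_tweet(3, "yzhang", "dolor")
post_tweet(4, "tjake", "amet")
print(read_timeline("driftx"))  # [(1, 'rbranson', 'lorem'), (3, 'yzhang', 'dolor')]
```

The cost moves to write time (one insert per follower), which is the trade a partitioned store makes in exchange for never joining at read time.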
how does it perform?
VLDB benchmark
Locking
Efficiency
UPDATE users
SET email_addresses = email_addresses + {...}
WHERE user_id = 'jbellis';
Durability
Log-structured storage engine
[Diagram: write(k1, c1:v1) arrives; the Memtable is in memory and the Commit log is on the hard drive]
[Diagram: write(k1, c1:v1) is appended to the commit log on the hard drive and stored in the memtable in memory]
[Diagram: write(k1, c2:v2) is appended to the commit log; in the memtable, row k1 now holds c1:v1 c2:v2]
[Diagram: write(k2, c1:v1 c2:v2) is appended to the commit log and added to the memtable as row k2]
[Diagram: write(k1, c1:v4 c3:v3) is appended to the commit log; in the memtable, c1 of row k1 is overwritten with v4 and c3:v3 is added]
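[Diagram: flush: the memtable is written to an SSTable on the hard drive (k1 c1:v4 c2:v2 c3:v3; k2 c1:v1 c2:v2) together with its index and Bloom filter (BF); the commit log is then cleaned up]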
No random writes
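The write sequence in the storage-engine diagrams can be replayed in a few lines of Python. This is a toy model under stated assumptions, not Cassandra's implementation: the commit log is a plain text file appended and fsynced on every write for simplicity, the flush threshold is counted in writes (a real engine flushes based on memory use), and flushed SSTables are kept as sorted in-memory lists rather than files. The class name `LogStructuredStore` is hypothetical.

```python
import os
import tempfile


class LogStructuredStore:
    """Toy log-structured write path: commit log + memtable + flush to sorted SSTables."""

    def __init__(self, log_path, flush_threshold=4):
        self.log_path = log_path
        self.flush_threshold = flush_threshold  # assumption: count writes, not bytes
        self.memtable = {}  # row key -> {column: value}
        self.sstables = []  # each flush produces a list of (key, columns) sorted by key
        self.pending_writes = 0

    def write(self, key, columns):
        # 1. Append to the commit log (sequential I/O) and force it to disk.
        with open(self.log_path, "a") as log:
            log.write(f"{key} {columns!r}\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Merge into the memtable; a newer value for a column replaces the older one.
        self.memtable.setdefault(key, {}).update(columns)
        self.pending_writes += 1
        if self.pending_writes >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. Write the whole memtable out once, sorted by row key: no random writes.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.pending_writes = 0
        # 4. Everything in the log is now in an SSTable, so the log can be cleaned up.
        open(self.log_path, "w").close()


# Replay the four writes from the diagrams; the fourth triggers a flush.
log_path = os.path.join(tempfile.mkdtemp(), "commitlog.txt")
store = LogStructuredStore(log_path, flush_threshold=4)
store.write("k1", {"c1": "v1"})
store.write("k1", {"c2": "v2"})
store.write("k2", {"c1": "v1", "c2": "v2"})
store.write("k1", {"c1": "v4", "c3": "v3"})
print(store.sstables[0])
# [('k1', {'c1': 'v4', 'c2': 'v2', 'c3': 'v3'}), ('k2', {'c1': 'v1', 'c2': 'v2'})]
```

Every disk write in this model is either an append to the log or a one-shot sequential dump of sorted data, which is the point of the "No random writes" slide.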
The gory details
Larger than memory datasets
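Each flushed SSTable gets an index and a Bloom filter (BF). When the data set is larger than memory, a read can first check each SSTable's small in-memory Bloom filter and skip the SSTables that definitely do not contain the key, saving disk seeks. A minimal Python sketch of such a filter; the bit-array size, the number of hash functions, and the salted SHA-256 hashing are assumptions for illustration (Cassandra's own filters use MurmurHash), and `sstables_to_read` is a hypothetical helper.

```python
import hashlib


class BloomFilter:
    """Set membership with possible false positives but no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function with i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def sstables_to_read(key, filters):
    """Indexes of the SSTables whose Bloom filter says the key may be present."""
    return [i for i, bf in enumerate(filters) if bf.might_contain(key)]


# Two hypothetical SSTables with disjoint row keys.
filters = [BloomFilter(), BloomFilter()]
for key in ("k1", "k2"):
    filters[0].add(key)
for key in ("k3", "k4"):
    filters[1].add(key)
print(sstables_to_read("k1", filters))
```

A "no" from the filter is always correct, so the only cost of a false positive is one unnecessary SSTable lookup.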
how does it handle failure?
Classic partitioning with a SPOF (single point of failure)
[Diagram: client → router → partition 1, partition 2, partition 3, partition 4]
Availability
• “High availability implies that a single fault will not bring down your system. Not ‘we’ll recover quickly.’” -- Ben Coverston, DataStax
• “The biggest problem with failover is that you're almost never using it until it really hurts. It's like backups that you never test.” -- Rick Branson, Instagram
Fully distributed, no SPOF
[Diagram: the client talks directly to peer nodes; partition p1 is stored on three nodes, alongside p3 and p6]
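Without a router, any node can coordinate a request, because every node can compute where a key lives: the key is hashed to a token on a ring, and its replicas are the nodes met walking clockwise from that token. The Python sketch below shows that placement rule in the spirit of Cassandra's SimpleStrategy; the six evenly spaced node tokens, the replication factor of 3, the 32-bit token space, and the truncated MD5 hash are assumptions chosen for illustration.

```python
import bisect
import hashlib

RING_SIZE = 2**32  # assumption: a 32-bit token space for readability


def token(key):
    """Map a partition key to a position on the ring (truncated MD5)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


class Ring:
    def __init__(self, node_tokens):
        # node_tokens: {node_name: token}. A node owns the range (previous token, its token].
        self.entries = sorted((t, name) for name, t in node_tokens.items())
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key, replication_factor=3):
        """The node owning the key's token, then the next nodes clockwise."""
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.entries)
        count = min(replication_factor, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]


ring = Ring({f"node{i}": i * RING_SIZE // 6 for i in range(6)})
print(ring.replicas("jbellis"))
```

Losing any one node leaves two other replicas for each of its ranges, and no single component sits in front of all requests.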
Multiple datacenters
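Self-healing
[Diagram: (1) the client sends a request to the coordinator; (2) the coordinator sends an internal request to a replica; (3) the replica sends an internal response; (4) the coordinator sends the response to the client]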
[Diagram: (1) request; (2) internal request; the replica fails, so the coordinator returns a timeout response to the client (4)]
[Diagram: (1) request; (2) internal request; the replica fails; the coordinator stores a hint (3) for the failed replica and returns a timeout response to the client (4)]
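The failure sequence in the self-healing diagrams (the replica fails, the coordinator keeps a hint, the client sees a timeout) can be modeled in a few lines. A minimal Python sketch of hinted handoff under simplifying assumptions: replica failure is a boolean flag, a write needs a hypothetical `required_acks` count to succeed, hints live in an in-memory dict, and they are replayed when the caller reports the replica is back. Real hinted handoff adds timeouts, limits on how long hints are kept, and persistent hint storage, all omitted here.

```python
from collections import defaultdict


class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True  # assumption: failure modeled as a simple flag
        self.data = {}

    def apply(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} did not respond")
        self.data[key] = value


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = defaultdict(list)  # replica name -> writes it missed

    def write(self, key, value, required_acks):
        acks = 0
        for replica in self.replicas:
            try:
                replica.apply(key, value)
                acks += 1
            except ConnectionError:
                # Keep a hint so the write can be delivered when the replica returns.
                self.hints[replica.name].append((key, value))
        if acks < required_acks:
            raise TimeoutError(f"{acks} of {required_acks} required acks")
        return acks

    def replay_hints(self, replica):
        """Deliver missed writes once the replica is back (assumes it stays up meanwhile)."""
        for key, value in self.hints.pop(replica.name, []):
            replica.apply(key, value)


a, b, c = Replica("a"), Replica("b"), Replica("c")
coordinator = Coordinator([a, b, c])
c.up = False
coordinator.write("jbellis", "TX", required_acks=2)  # succeeds; c gets a hint
c.up = True
coordinator.replay_hints(c)
print(c.data)  # {'jbellis': 'TX'}
```

If too few replicas answer, the client still gets a timeout, but the hints already stored mean the live replicas and the recovered one converge anyway.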
Other healing modes
• AntiEntropyService
• Read repair
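Read repair fixes stale replicas as a side effect of reads: the coordinator compares the versions the replicas return, answers with the newest, and writes it back to any replica that returned something older or nothing. A minimal Python sketch, assuming each replica is a dict mapping key to a (timestamp, value) pair and that the highest timestamp wins; Cassandra's actual read path compares digests first and is more involved, and AntiEntropyService (Merkle-tree based repair) is not modeled. `read_with_repair` is a hypothetical name.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and write it back to replicas that are stale.

    Each replica is modeled as a dict: key -> (timestamp, value). The highest
    timestamp wins (last write wins).
    """
    responses = [replica.get(key) for replica in replicas]
    versions = [v for v in responses if v is not None]
    if not versions:
        return None
    newest = max(versions, key=lambda version: version[0])
    for replica, seen in zip(replicas, responses):
        if seen != newest:
            replica[key] = newest  # repair: an older or missing copy gets the winner
    return newest[1]


r1 = {"jbellis": (2, "TX")}
r2 = {"jbellis": (1, "CA")}
r3 = {}
print(read_with_repair([r1, r2, r3], "jbellis"))  # TX
```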
Dynamic snitch (dealing with partial failure)
[Diagram: the client sends a request to a coordinator, which chooses among replicas that are 40%, 90%, and 30% busy]
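The dynamic snitch steers reads away from a replica that is struggling, like the 90% busy node, without declaring it dead. A minimal Python sketch of the idea, assuming latency is the only signal and using an exponentially weighted moving average per replica; the smoothing factor, the replica names, and the `LatencyScorer` class with its methods are hypothetical, and Cassandra's real scoring differs in detail.

```python
class LatencyScorer:
    """Rank replicas by an exponentially weighted moving average of observed latency."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # weight of the newest sample
        self.scores = {}    # replica -> smoothed latency in ms

    def record(self, replica, latency_ms):
        old = self.scores.get(replica)
        if old is None:
            self.scores[replica] = latency_ms
        else:
            self.scores[replica] = self.alpha * latency_ms + (1 - self.alpha) * old

    def rank(self, replicas):
        """Fastest first; replicas without samples get score 0 and are tried early."""
        return sorted(replicas, key=lambda r: self.scores.get(r, 0.0))


scorer = LatencyScorer()
for replica, latency in [("r1", 4.0), ("r2", 25.0), ("r3", 3.0), ("r2", 30.0)]:
    scorer.record(replica, latency)
print(scorer.rank(["r1", "r2", "r3"]))  # ['r3', 'r1', 'r2']
```

Smoothing keeps one slow response from demoting a healthy replica, while a replica that stays slow drifts to the back of the list.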
how does it scale?
VLDB benchmark
Scaling antipatterns
• Metadata servers
• Router bottlenecks
• Overloading existing nodes when adding capacity
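The last antipattern is easy to see on a token ring. If each node owns one token, a new node takes its whole range from a single neighbor, so that one node does all the streaming. If each node owns many tokens (the approach behind Cassandra's virtual nodes, introduced in 1.2), the new node takes small slices from many nodes. The Python sketch below measures this; the random token assignment, the six-node cluster, the 64-bit token space, and the 256 tokens per node are assumptions chosen for illustration.

```python
import random

RING = 2**64  # assumption: 64-bit token space


def ownership(tokens_by_node):
    """Fraction of the ring each node owns; a token owns (previous token, token]."""
    ring = sorted((t, node) for node, tokens in tokens_by_node.items() for t in tokens)
    owned = dict.fromkeys(tokens_by_node, 0)
    for i, (t, node) in enumerate(ring):
        previous = ring[i - 1][0]  # for i == 0 this wraps around to the last token
        owned[node] += (t - previous) % RING
    return {node: amount / RING for node, amount in owned.items()}


def donors_when_adding(num_nodes, tokens_per_node, seed=0):
    """Add one node with random tokens; return how much each existing node gives up."""
    rng = random.Random(seed)
    cluster = {f"n{i}": [rng.randrange(RING) for _ in range(tokens_per_node)]
               for i in range(num_nodes)}
    before = ownership(cluster)
    cluster["new"] = [rng.randrange(RING) for _ in range(tokens_per_node)]
    after = ownership(cluster)
    return {node: before[node] - after[node]
            for node in before if after[node] < before[node]}


for tokens in (1, 256):
    donors = donors_when_adding(num_nodes=6, tokens_per_node=tokens)
    print(f"{tokens} token(s) per node: {len(donors)} existing node(s) stream data to the new node")
```

With one token per node exactly one existing node gives up data; with many tokens the load of bootstrapping is spread across the cluster.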
how flexible is it?
Data model: Realtime

LiveStocks
stock   last
GOOG    $95.52
AAPL    $186.10
AMZN    $112.98

StockHist
stock   date         price
GOOG    2011-01-01   $8.23
GOOG    2011-01-02   $6.14
GOOG    2011-01-03   $7.78

Portfolios
user      stock   shares
jbellis   GOOG    80
jbellis   LNKD    20
yukim     AMZN    100
Data model: Analytics

HistLoss
portfolio    worst_date   loss
Portfolio1   2011-07-23   -$34.81
Portfolio2   2011-03-11   -$11432.24
Portfolio3   2011-05-21   -$1476.93
Data model: Analytics

10dayreturns
stock   rdate        return
GOOG    2011-07-25   $8.23
GOOG    2011-07-24   $6.14
GOOG    2011-07-23   $7.78
AAPL    2011-07-25   $15.32
AAPL    2011-07-24   $12.68

INSERT OVERWRITE TABLE 10dayreturns
SELECT a.stock, b.date as rdate, b.price - a.price
FROM StockHist a JOIN StockHist b
  ON (a.stock = b.stock AND date_add(a.date, 10) = b.date);
Data model: Analytics

portfolio_returns
portfolio    rdate        preturn
Portfolio1   2011-07-25   $118.21
Portfolio1   2011-07-24   $60.78
Portfolio1   2011-07-23   -$34.81
Portfolio2   2011-07-25   $2143.92
Portfolio3   2011-07-24   -$10.19

INSERT OVERWRITE TABLE portfolio_returns
SELECT portfolio, rdate, SUM(b.return)
FROM portfolios a JOIN 10dayreturns b ON (a.stock = b.stock)
GROUP BY portfolio, rdate;
Data model: Analytics

INSERT OVERWRITE TABLE HistLoss
SELECT a.portfolio, rdate, minp
FROM (
  SELECT portfolio, min(preturn) as minp
  FROM portfolio_returns
  GROUP BY portfolio
) a JOIN portfolio_returns b ON (a.portfolio = b.portfolio and a.minp = b.preturn);
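The three Hive queries form a pipeline: 10-day returns per stock, returns summed per portfolio and date, then each portfolio's worst date. A plain-Python sketch of the same joins and aggregations, assuming small made-up price data (prices in integer cents to avoid floating-point noise) and a hypothetical portfolio table of (portfolio, stock) pairs; as in the slide's query, returns are summed without weighting by share count.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical inputs: StockHist rows (stock, date, price in cents) and
# portfolio membership rows (portfolio, stock).
stock_hist = [
    ("GOOG", date(2011, 7, 13), 50000), ("GOOG", date(2011, 7, 23), 50778),
    ("GOOG", date(2011, 7, 14), 51000), ("GOOG", date(2011, 7, 24), 50386),
    ("AAPL", date(2011, 7, 13), 35000), ("AAPL", date(2011, 7, 23), 34000),
    ("AAPL", date(2011, 7, 14), 35500), ("AAPL", date(2011, 7, 24), 35700),
]
portfolios = [("Portfolio1", "GOOG"), ("Portfolio1", "AAPL"), ("Portfolio2", "GOOG")]


def ten_day_returns(hist):
    """StockHist self-join where b.date = a.date + 10 days; return b.price - a.price."""
    price = {(stock, day): p for stock, day, p in hist}
    rows = []
    for stock, day, p in hist:
        later = (stock, day + timedelta(days=10))
        if later in price:
            rows.append((stock, later[1], price[later] - p))
    return rows


def portfolio_returns(portfolios, returns):
    """JOIN on stock, then SUM(return) GROUP BY portfolio, rdate (unweighted, as in the query)."""
    totals = defaultdict(int)
    for portfolio, held in portfolios:
        for stock, rdate, ret in returns:
            if stock == held:
                totals[(portfolio, rdate)] += ret
    return dict(totals)


def hist_loss(preturns):
    """Per portfolio, the date of the minimum return.

    Keeps the first date on ties, whereas the HistLoss join would return every tied row.
    """
    worst = {}
    for (portfolio, rdate), ret in preturns.items():
        if portfolio not in worst or ret < worst[portfolio][1]:
            worst[portfolio] = (rdate, ret)
    return worst


print(hist_loss(portfolio_returns(portfolios, ten_day_returns(stock_hist))))
```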
Some Cassandra users
Questions?
Image credits
• http://www.flickr.com/photos/26817893@N05/2573006312/
• http://www.flickr.com/photos/rowanbank/7686239548
• http://www.flickr.com/photos/mervtheswerve/6081933265
• http://www.flickr.com/photos/dg_pics/2526208830
• http://www.flickr.com/photos/wainwright/351684037
• http://www.flickr.com/photos/mikeneilson/1606662529
• http://www.flickr.com/photos/sbisson/3852905534
• http://www.flickr.com/photos/breadnbadger/2674928517