Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Brisk: Truly peer-to-peer Hadoop. High-order bits from Cassandra & Hadoop. srisatish ambati @srisatish

description

Compendium of my Brisk, Cassandra & Hadoop talks from the summer of 2011, delivered at JavaOne 2011. I personally like the content in this one: it covers a use-case-driven intro to Cassandra and NoSQL, followed by an intro to Hadoop (MapReduce, HDFS internals, NameNode and JobTracker), and then how Brisk decomposes the single points of failure in HDFS while providing a single form for real-time and batch storage and processing. (The audience in attendance seemed to enjoy it, too.)

Transcript of Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Page 1: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Brisk: Truly peer-to-peer Hadoop High-order bits from Cassandra & Hadoop

srisatish ambati @srisatish

Page 2: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

How many in audience…

Page 3: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

NoSQL - Know your queries.

Page 4: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

points

• Usecases
• Why cassandra?
• Usecase: Hadoop, Brisk
• FUD: Consistency
  – Why facebook is not using Cassandra?
• Anti-patterns
• Community, Code, Tools
• Q&A

Page 5: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Users. Netflix.
Key by Customer, read-heavy
Key by Customer:Movie, write-heavy

Page 6: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

TimeSeries: (several customers)
periodic readings: dev0, dev1…
deviceID:metric:timestamp -> value

Metrics are typically a much larger dataset than users.
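
The deviceID:metric:timestamp key layout can be sketched in plain Java. This is only an illustration, not Cassandra client code: the class name `TimeSeriesKey`, the zero-padded timestamp, and the in-memory `TreeMap` standing in for a store that keeps keys sorted are all assumptions made for the example.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class TimeSeriesKey {
    // Hypothetical helper: builds "deviceID:metric:timestamp".
    // The timestamp is zero-padded so that string order matches time order.
    public static String key(String deviceId, String metric, long timestampMillis) {
        return String.format("%s:%s:%013d", deviceId, metric, timestampMillis);
    }

    // Returns the readings of one device and metric with from <= timestamp < to.
    // The sorted map is a stand-in for a store that keeps keys in order.
    public static NavigableMap<String, Double> range(NavigableMap<String, Double> store,
                                                     String deviceId, String metric,
                                                     long from, long to) {
        return store.subMap(key(deviceId, metric, from), true,
                            key(deviceId, metric, to), false);
    }

    public static void main(String[] args) {
        NavigableMap<String, Double> store = new TreeMap<>();
        store.put(key("dev0", "temp", 1000L), 20.5);
        store.put(key("dev0", "temp", 2000L), 21.0);
        store.put(key("dev1", "temp", 1500L), 19.0);
        // Only the two dev0 readings fall inside the requested range.
        System.out.println(range(store, "dev0", "temp", 0L, 3000L));
    }
}
```

Putting the timestamp last keeps all readings of one device and metric adjacent, so a time window becomes a single contiguous scan.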

Page 7: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Why Cassandra?

Page 8: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Operational simplicity peer-to-peer

Page 9: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Operational simplicity peer-to-peer

write

read

Page 10: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Replication:
Multi-datacenter
Multi-region ec2
Multi-availability zones

Page 11: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Replication:
Multi-datacenter
Multi-region ec2, aws
Multi-availability zones

dc1 dc2

reads local

Page 12: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

“Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled

4.21.2011, Amazon Web Services outage:

Page 13: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Netflix was running on AWS.

4.21.2011, Amazon Web Services outage:

Page 14: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

fast durable writes. fast reads.

Page 15: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Writes
Sequential, append-only.
~1-5ms
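
A minimal sketch of why an append-only write path is cheap: each write is one sequential append to a log plus one in-memory update, with no read or seek. The class `AppendOnlyStore` and its methods are hypothetical names for this illustration; Cassandra's real commit log and memtable are far more involved.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Map;
import java.util.TreeMap;

public class AppendOnlyStore {
    private final PrintWriter log;                                 // append-only log (stand-in for a commit log)
    private final Map<String, String> memtable = new TreeMap<>();  // sorted in-memory table

    public AppendOnlyStore(PrintWriter log) {
        this.log = log;
    }

    // A write appends one record to the log and updates memory; nothing is read first.
    public void put(String key, String value) {
        log.print(key + "=" + value + "\n");
        log.flush();
        memtable.put(key, value);
    }

    public String get(String key) {
        return memtable.get(key);
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter();
        AppendOnlyStore store = new AppendOnlyStore(new PrintWriter(sink));
        store.put("customer42", "v1");
        store.put("customer42", "v2"); // an overwrite is just another append
        System.out.println(store.get("customer42"));
        System.out.print(sink);        // both records remain in the log, in write order
    }
}
```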

Page 16: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Writes
Sequential, append-only.
~1-5ms

On cloud: ephemeral disks rock!

Page 17: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Reads Local Key & row caches, (also, jna-based 0xffheap) indexes, materialized

Page 18: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Reads Local Key & row caches, (also, jna-based 0xffheap) indexes, materialized

ssds: improved read performance!

Page 19: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

amortize
Replication over writes
Repair over reads

Page 20: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Distribution between nodes
Gossip
Anti-entropy
Failure-detector

L i g h t w e i g h t

Page 21: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Clients: cql, thrift pycassa, phpcassa hector, pelops (scala, ruby, clojure)

Page 22: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Usecase #3: h a d o o p
Hdfs cassandra hive
Logs stats analytics

Page 23: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Brisk
Truly peer-to-peer hadoop.

Page 24: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

mv computation
not data

Page 25: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop
Page 26: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

word count in MapReduce

map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
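
The word-count pseudocode can be mirrored in plain Java to see the map, group-by-key, and reduce steps end to end. This is a single-process sketch, not Hadoop API code: the class `WordCountSketch` and the `run` driver, which stands in for the framework's shuffle, are assumptions made for the example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // map: emit (word, 1) for every word in the document contents.
    public static List<Map.Entry<String, Integer>> map(String docName, String contents) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : contents.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<String, Integer>(w, 1));
            }
        }
        return out;
    }

    // reduce: sum the counts emitted for one word.
    public static int reduce(String word, List<Integer> counts) {
        int result = 0;
        for (int c : counts) {
            result += c;
        }
        return result;
    }

    // Driver standing in for the framework: run map, group values by key, then reduce.
    public static Map<String, Integer> run(Map<String, String> documents) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            for (Map.Entry<String, Integer> kv : map(doc.getKey(), doc.getValue())) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("doc1", "to be or not to be");
        docs.put("doc2", "be brisk");
        System.out.println(run(docs));
    }
}
```

In a real cluster the map calls run on the nodes that hold the input blocks and the grouping happens over the network; only the two user functions carry over unchanged.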

Page 27: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Parallel Execution View

Page 28: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop
Page 29: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop
Page 30: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

immutable data
write-once-read-many!
Files once created, written & closed..

not changing!

Page 31: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

jobtracker, tasktracker
hdfs: namenode, datanode

Page 32: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

cloudera
amazon: elastic map reduce
hortonworks
mapR
brisk

Page 33: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Tools & Analytics
Hive, Pig, R
Karmasphere
Datameer
… dozens of stealth startups!

Page 34: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop
Page 35: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

“However, given that there is only a single master, its failure is unlikely;”
The MapReduce paper, 2004. Sanjay et al., Google.

Page 36: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Namenode decomposition, explained.

Page 37: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop
Page 38: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

NameNode:
Single Master node
Single Machine Address space
Single Point of failure

Page 39: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop
Page 40: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Use column families (tables)
inode
sblock

Page 41: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

One kind of node
no master node, no spof
peer-to-peer

Page 42: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop
Page 43: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

near-real time hadoop
Low latency: cassandra_dc nodes
Batch Analytics: brisk_dc nodes

Page 44: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

BriskSimpleSnitch.java

if (TrackerInitializer.isTrackerNode) {
    myDC = BRISK_DC;
    logger.info("Detected Hadoop trackers are enabled, setting my DC to " + myDC);
} else {
    myDC = CASSANDRA_DC;
    logger.info("Looks like Vanilla Cassandra nodes, setting my DC to " + myDC);
}

Page 45: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Hive: SQL-like access
cli, hwi, jdbc, metastore
Pushdown predicates (v beta2)

Page 46: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);

hive> LOAD DATA LOCAL INPATH '$BRISK_HOME/resources/hive/examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');

hive> SELECT count(*), ds FROM invites GROUP BY ds;

http://www.datastax.com/docs/0.8/brisk/about_hive

Page 47: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

ETL
Real-time

Cassandra CFs
DataCenters

Scale

@srisatish

Page 48: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

@srisatish

Page 49: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

No me in team!

Ben Coverston Ben Werther Brandon Williams Cathy Daw Jackson Chung Jake Luciani Joaquin Casares Jonathan Ellis

Michael Allen Mike Bulman Nate McCall Nick M Bailey Patricio Echague Tyler Hobbs SriSatish Ambati Yewei Zhang

Page 50: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

@srisatish
100-node Brisk Cluster on Opscenter

Page 51: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

FUD, acronym: fear, uncertainty, doubt.

Page 52: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Consistency: R + W > N
ORACLE, 2-node: R=1, W=2, N=2, (T=2)
DNS

* N is replication factor. Not to be confused with T=total #of nodes
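
The R + W > N rule can be checked with simple arithmetic. The sketch below is only an illustration; the class and method names are hypothetical. If the rule holds, every set of R replicas read must share at least one replica with every set of W replicas written, so a read sees the latest acknowledged write.

```java
public class ConsistencyCheck {
    // n = replication factor (not the total number of nodes),
    // r = replicas consulted per read, w = replicas acknowledged per write.
    // Returns true when any read set must overlap any write set.
    public static boolean overlaps(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        System.out.println(overlaps(1, 2, 2)); // the 2-node example: R=1, W=2, N=2
        System.out.println(overlaps(1, 1, 3)); // a read may miss the latest write
        System.out.println(overlaps(2, 2, 3)); // two of three replicas on both sides
    }
}
```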

Page 53: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Tune-able, flexibility.
For High Consistency:
  read:quorum, write:quorum
For High Availability:
  high W, low R.
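
To make the tuning trade-off concrete, the sketch below computes the quorum size for a replication factor and how many replica failures an operation at a given level still tolerates. The names are hypothetical; the quorum formula (a majority, N/2 + 1 with integer division) is the usual definition.

```java
public class QuorumSketch {
    // Quorum for replication factor n: a strict majority of the replicas.
    public static int quorum(int n) {
        return n / 2 + 1;
    }

    // Number of replicas that can be down while an operation needing
    // `level` replica responses can still succeed.
    public static int toleratedFailures(int n, int level) {
        return n - level;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 3, 5}) {
            int q = quorum(n);
            System.out.println("N=" + n
                + " quorum=" + q
                + " toleratedFailures=" + toleratedFailures(n, q));
        }
    }
}
```

With N=3, quorum reads and writes each need two replicas and survive one replica being down; raising W to 3 lets R drop to 1, but then writes fail as soon as any replica is unavailable.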

Page 54: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Consistency: R + W > N
ORACLE, 2-node: R=1, W=2, N=2, (T=2)
DNS
"brisk.consistencylevel.read", "QUORUM";
"brisk.consistencylevel.write", "QUORUM";

* N is replication factor. Not to be confused with T=total #of nodes

Page 55: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop
Page 56: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Inbox Search: 600+ cores, 120+ TB (2008)
Went from 100-500m users.

Average NoSQL deployment size: ~6-12 nodes.

Page 57: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Usecase #5: search
Apache Solr + Cassandra = Solandra

Other inbox/file searches: xobni, c3

github.com/tjake/solandra

Page 58: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

“Eventual consistency is harder to program.”
mostly immutable data.
complex systems at scale.

Page 59: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Miscellaneous, Myth: data-loss, partial rows.
writes are durable.

Page 60: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Anti-Patterns
Transactions
Joins
Read before write

Page 61: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Anti-Patterns for cloud
ebs
jvm, virtualized
single region

Page 62: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

A few more good reasons for Cassandra...

Page 63: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Tools
AMIs, OpsCenter, DataStax
AppDynamics

Getting Started with brisk ami

Netflix just builds AMIs for deployment!

Page 64: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

B e a u t i f u l C 0 d e

= new code(); //less is more
~90k.
java.concurrent.@annotate.
bloomfilters, merkletrees.
non-blocking, staged-event-driven.
bigtable, dynamo.
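
Of the structures named on this slide, the Bloom filter is the easiest to show in a few lines. The sketch below is a generic illustration, not Cassandra's implementation: the class name, the sizes, and the double-hashing scheme built on `String.hashCode()` are all assumptions. A Bloom filter answers "definitely absent" or "possibly present", which lets a read path skip files that cannot contain a key.

```java
import java.util.BitSet;

public class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    public BloomFilterSketch(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Derives the i-th bit position from two base hashes (double hashing).
    // Illustrative only; a production filter would use a stronger hash.
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so successive probes differ
        return Math.floorMod(h1 + i * h2, size);
    }

    public void add(String key) {
        for (int i = 0; i < hashes; i++) {
            bits.set(position(key, i));
        }
    }

    // false means the key was never added; true means it may have been.
    public boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("customer42");
        System.out.println(filter.mightContain("customer42")); // always true once added
        System.out.println(filter.mightContain("customer99")); // usually false; false positives are possible
    }
}
```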

Page 65: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Current & Future Focus:
Distributed Counters, CQL.
Simple client.
operational smoothening.

compaction.

Page 66: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Community
Robust. Rapid. Brisk #
Professional support from DataStax.
git clone [email protected]:riptano/brisk.git

engineers: independent, startups, large companies, Rackspace, Twitter, Netflix..

Come join the efforts!

Page 67: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Usecase #4: first NoSQL, then scale!
simpledb -> Cassandra
mongodb -> Cassandra

Page 68: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop
Page 69: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop
Page 70: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Copyright: xkcd

Page 71: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Copyright: plantoys

… more than one way to do it!

Page 72: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Summary - high scale peer-to-peer datastore

best friend for multi-region, multi-zone availability.

Hadoop – HDFS engulfing the DataWorld

Brisk – best of both worlds!

Page 73: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

Q&A
@srisatish

Page 74: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

[Timeline figure: Bigtable, 2006 + Dynamo, 2007; Cassandra: OSS, 2008; Incubator, 2009; TLP, 2010; Brisk]

Page 75: Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop

NoSQL - Know your queries.