Big data
Cassandra and Hadoop
Kevin Cawley, Engineer Linksmart
Cassandra: actively using it for 2+ years
Hadoop: 1 yr of experience, sort of
Problem For Today
Running a survey to discover NoSQL preferences
Options: mongodb, redis, cassandra, couch, hbase, riak, voldemort, dynamodb
We're gonna get billions of responses; an RDBMS is going to fall over
We need NoSQL... what the hell is that??
Problem For Today, cont.
Kevin, response=cassandra
[email protected], response=redis
[email protected], response=cassandra
BILLIONS AND BILLIONS OF THESE!!!!
Cassandra
Linearly scalable, highly available, performant database
Key-value store
Ring architecture w/ replication, 2^127 tokens
[Diagram: a ring of four nodes, Node 1 to Node 4]
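A minimal sketch of how a row key finds its replicas on the ring, assuming RandomPartitioner-style MD5 tokens in the 0 to 2^127 - 1 range and a SimpleStrategy-like placement (the first node at or after the key's token, then the next nodes clockwise). The node names, the evenly spaced tokens, and the modulo folding of the hash are illustrative assumptions, not Cassandra's exact code.

```python
import bisect
import hashlib

# Assumed token space, as with RandomPartitioner: tokens run from 0 to 2**127 - 1.
TOKEN_SPACE = 2 ** 127


def token_for(key: str) -> int:
    """Place a row key on the ring by hashing it with MD5 (simplified folding)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


def replicas_for(key: str, ring: list, replication_factor: int) -> list:
    """Return the nodes that store `key`.

    `ring` is a list of (token, node) pairs sorted by token. The first replica
    is the first node whose token is >= the key's token, wrapping past the end
    of the ring; the remaining replicas are the next nodes clockwise.
    """
    tokens = [token for token, _ in ring]
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]


# Hypothetical four-node ring with evenly spaced tokens (node1 .. node4).
ring = [(i * TOKEN_SPACE // 4, f"node{i + 1}") for i in range(4)]

print("kevin", replicas_for("kevin", ring, replication_factor=3))
```

Adding a node only splits one existing token range, which is the property behind the "linear scalability" claim: most keys keep their owners.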
Cassandra
Keyspace
Column families: standard, dynamic (mo' better)
key   name           preference
100   kevin cawley   cassandra
101   asher cawley   cassandra
102   emma cawley    redis
           202                 201
redis      ['joe', 'bob']      ['matthias']
cassandra  ['kevin', 'asher']  ['tom']
mongodb    ['holly']           ['dan']
assume User keys as utf8;
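Both column families above can be pictured as nested maps: row key, then column name, then value. This Python sketch models them that way. It is only a mental model, not how Cassandra stores data on disk; treating the slide's bracketed lists as column values and sorting column names as plain strings are assumptions, and `row_slice` and `count_voters` are hypothetical helpers.

```python
# Standard column family: every row uses the same column names.
users = {
    "100": {"name": "kevin cawley", "preference": "cassandra"},
    "101": {"name": "asher cawley", "preference": "cassandra"},
    "102": {"name": "emma cawley", "preference": "redis"},
}

# Dynamic column family: the column names themselves carry data ("202" and
# "201", as on the slide), so rows are free to have different columns.
preferences = {
    "redis": {"202": ["joe", "bob"], "201": ["matthias"]},
    "cassandra": {"202": ["kevin", "asher"], "201": ["tom"]},
    "mongodb": {"202": ["holly"], "201": ["dan"]},
}


def row_slice(cf: dict, row_key: str) -> list:
    """Return one row's (column name, value) pairs sorted by column name,
    mimicking how columns within a row are kept in comparator order."""
    return sorted(cf[row_key].items())


def count_voters(cf: dict, row_key: str) -> int:
    """Total the names stored across every column of one dynamic row."""
    return sum(len(names) for names in cf[row_key].values())


print(row_slice(preferences, "redis"))  # [('201', ['matthias']), ('202', ['joe', 'bob'])]
print(count_voters(preferences, "cassandra"))  # 3
```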
Super columns
Not so super: nice on paper, can be catastrophic in practice
Fanning (not the cool, refreshing kind)
Getting phased out
       202                                                     201
redis  {'joe' => '[email protected]', 'bob' => '[email protected]'}  {'matthias' => '[email protected]', 'tom' => '[email protected]'}
Secondary Indexes
Indexes on column values
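Conceptually, a secondary index on a column value is an inverted map from each value to the row keys that hold it, so "who prefers cassandra?" becomes a lookup instead of a full scan. A minimal in-memory sketch, assuming the users rows from the standard column family slide; `build_index` and `rows_where` are hypothetical helpers, and unlike the database this sketch does not keep the index in sync on writes.

```python
from collections import defaultdict

# Standard column family from the earlier slide, keyed by row key.
users = {
    "100": {"name": "kevin cawley", "preference": "cassandra"},
    "101": {"name": "asher cawley", "preference": "cassandra"},
    "102": {"name": "emma cawley", "preference": "redis"},
}


def build_index(cf: dict, column: str) -> dict:
    """Build an inverted map from each value of `column` to the row keys holding it."""
    index = defaultdict(set)
    for row_key, columns in cf.items():
        if column in columns:
            index[columns[column]].add(row_key)
    return index


def rows_where(cf: dict, index: dict, value: str) -> dict:
    """Fetch matching rows through the index rather than scanning every row."""
    return {key: cf[key] for key in sorted(index.get(value, ()))}


preference_index = build_index(users, "preference")
print(sorted(rows_where(users, preference_index, "cassandra")))  # ['100', '101']
```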
Composite columns: a replacement for the not-so-super super columns, e.g. US:colorado:cassandra => kevin
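Composite columns flatten what super columns nested: instead of row, then super column, then column, each column name is a tuple of components kept in sorted order, so a single range slice answers "everything under US:colorado". A minimal sketch, assuming string components compared lexicographically; `parse`, `prefix_slice`, and the two columns other than the slide's kevin example are hypothetical.

```python
import bisect


def parse(name: str) -> tuple:
    """Split a composite column name such as 'US:colorado:cassandra' into components."""
    return tuple(name.split(":"))


# One wide row whose columns are sorted component by component.
# Only the kevin column comes from the slide; the other two are hypothetical.
columns = sorted([
    (parse("US:colorado:cassandra"), "kevin"),
    (parse("US:colorado:redis"), "user_b"),
    (parse("US:texas:mongodb"), "user_c"),
])


def prefix_slice(columns: list, prefix: tuple) -> list:
    """Return every (name, value) whose composite name starts with `prefix`.

    Because names are sorted and a tuple prefix sorts before any longer tuple
    that extends it, the matches form one contiguous run found by binary search.
    """
    names = [name for name, _ in columns]
    start = bisect.bisect_left(names, prefix)
    matches = []
    for name, value in columns[start:]:
        if name[: len(prefix)] != prefix:
            break
        matches.append((name, value))
    return matches


print(prefix_slice(columns, parse("US:colorado")))
```

The design point is that the nesting lives in the sort order of the names, so a read can touch just the matching run instead of deserializing a whole nested structure.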
Demo 1
Counters
Yes! Yum. Counters good
We built our own; now they come for free
Cassandra is eventually consistent, which makes this hard
Be clever and you will win
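Why eventual consistency makes counters hard: if each replica stores a plain integer and increments are read-modify-write, concurrent updates on different replicas overwrite one another and votes are lost. One clever way around it, shown here as a swapped-in technique rather than Cassandra's own counter implementation, is a grow-only counter (G-Counter): each replica increments only its own slot, replicas merge by taking the per-slot maximum, and the value is the sum of the slots. The replica IDs are hypothetical.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by per-slot maximum."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        """Add to this replica's own slot; other replicas' slots are never touched."""
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        """Fold in another replica's state; repeating a merge changes nothing."""
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)

    def value(self) -> int:
        """The counter's value is the sum of every replica's slot."""
        return sum(self.counts.values())


# Two replicas count 'cassandra' votes independently, then exchange state.
node1, node2 = GCounter("node1"), GCounter("node2")
node1.increment()
node1.increment()
node2.increment()
node1.merge(node2)
node2.merge(node1)
print(node1.value(), node2.value())  # 3 3
```

Because merge is a per-slot maximum, receiving the same state twice never double-counts, which is exactly the failure mode a naive shared integer has.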
Demo 2
Counters
           Counter
cassandra  30333
redis      22098
mongodb    24567
couch      12340
...
Hadoop
Distributed processing of large data sets across clusters of computers: too good to be true?
Map Reduce
Acronym soup: Hadoop Common, HDFS, MapReduce
HBase, Pig, Hive, ZooKeeper
Map Reduce is @ the heart
Map - processes a key/value pair to generate a set of intermediate key/value pairs
Reduce - a function that merges all intermediate values associated with the same intermediate key
Map Reduce: Our Example
Input:
Kevin, response=cassandra
[email protected], response=redis
[email protected], response=cassandra
Map:
cassandra kevin
cassandra asher
redis emma
Reduce:
cassandra 2
redis 1
AND the winner is cassandra w/ 2 votes!!!
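The map and reduce steps of the survey example can be simulated in a single Python process to see the data flow. This is a sketch, not Hadoop code: on a cluster the framework distributes the map tasks, performs the shuffle (grouping by intermediate key), and runs reducers in parallel. The email addresses and function names are hypothetical.

```python
from collections import defaultdict

# Survey records as (email, response); the addresses are hypothetical stand-ins.
records = [
    ("kevin@example.com", "cassandra"),
    ("emma@example.com", "redis"),
    ("asher@example.com", "cassandra"),
]


def map_fn(email: str, response: str):
    """Map: emit (response, voter name) for one survey record."""
    yield response, email.split("@")[0]


def reduce_fn(response: str, voters: list):
    """Reduce: collapse all voters for one response into a vote count."""
    return response, len(voters)


def run_map_reduce(records, map_fn, reduce_fn) -> dict:
    """Run map, group intermediate pairs by key (the shuffle), then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for inter_key, inter_value in map_fn(key, value):
            groups[inter_key].append(inter_value)
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


votes = run_map_reduce(records, map_fn, reduce_fn)
winner = max(votes, key=votes.get)
print(votes, winner)  # {'cassandra': 2, 'redis': 1} cassandra
```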
Cassandra + Hive
Hadoop/Brisk on Cassandra: no luck
Hive: data warehouse built on top of CassandraFS, leveraging MapReduce
Query the data using a SQL-like language called HiveQL
Demo 3
Summary
Cassandra: awesome for storing massive amounts of data
Dangerous if you don't know what you are doing
Schemaless, yet ironically, modelling is extremely important
Ad-hoc questions are hard to answer fast
Hadoop/Brisk: great for answering ad hoc questions reasonably fast
What you really want is Cassandra + Hadoop + RDBMS