Big data

Cassandra and Hadoop

Kevin Cawley, Engineer, Linksmart

Cassandra: actively using it for 2+ years

Hadoop: 1 yr experience, sort of

Problem For Today

Running a survey to discover NoSQL preferences

Options: mongodb, redis, cassandra, couch, hbase, riak, voldemort, dynamodb

We're gonna get billions of responses; the RDBMS is going to fall over

We need nosql... what the hell is that??

Problem For Today, cont.

Kevin, response=cassandra, [email protected], response=redis, [email protected], response=cassandra, [email protected]

BILLIONS AND BILLIONS OF THESE!!!!

Cassandra

Linear scalability, high availability & performant database

Key Value store

Ring architecture w/ replication, 2^127 tokens

[Ring diagram: Node 1 -> Node 2 -> Node 3 -> Node 4 -> back to Node 1]
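A rough sketch of how a key lands on the ring, for anyone who wants to poke at it. Assumptions (not from the slides): a 2^127 token space with MD5 hashing, as Cassandra's RandomPartitioner uses; four evenly spaced nodes; replicas are simply the next nodes clockwise. The names Ring, token_for and replicas_for are made up for illustration.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space assumed for this sketch


def token_for(key: str) -> int:
    """Hash a row key onto the ring (MD5 reduced to the token space)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class Ring:
    """Hypothetical ring: each node owns the range ending at its token."""

    def __init__(self, node_names, replication_factor=2):
        self.replication_factor = replication_factor
        self.nodes = list(node_names)
        # Space the node tokens evenly around the ring, in sorted order.
        step = RING_SIZE // len(self.nodes)
        self.tokens = [i * step for i in range(len(self.nodes))]

    def replicas_for(self, key: str):
        """Find the first node at or after the key's token, then walk clockwise."""
        start = bisect.bisect_left(self.tokens, token_for(key)) % len(self.nodes)
        count = min(self.replication_factor, len(self.nodes))
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(count)]


ring = Ring(["node1", "node2", "node3", "node4"], replication_factor=2)
owners = ring.replicas_for("kevin")
print(owners)  # two neighbouring nodes on the ring
```

Adding a node only moves the keys in one token range, which is where the linear scalability claim comes from.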

Cassandra

Keyspace

Column Families: standard, dynamic (mo' better)

      name           preference
100   kevin cawley   cassandra
101   asher cawley   cassandra
102   emma cawley    redis

            202                 201
redis       ['joe','bob']       ['matthias']
cassandra   ['kevin','asher']   ['tom']
mongodb     ['holly']           ['dan']

assume User keys as utf8;
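One way to picture the two kinds of column family is as nested maps: row key -> column name -> value. The sketch below is plain Python dicts, not a Cassandra client. The insert helper and the users_by_preference family are hypothetical; the second one shows the dynamic style, where the column names themselves are data (here, user keys).

```python
def insert(column_family, row_key, column, value):
    """Set one column on one row, creating the row on first write."""
    column_family.setdefault(row_key, {})[column] = value


# Standard column family: every row has the same named columns.
users = {}
for key, name, preference in [
    ("100", "kevin cawley", "cassandra"),
    ("101", "asher cawley", "cassandra"),
    ("102", "emma cawley", "redis"),
]:
    insert(users, key, "name", name)
    insert(users, key, "preference", preference)

# Dynamic column family (hypothetical): one wide row per preference,
# one column per user who picked it.
users_by_preference = {}
for key, columns in users.items():
    insert(users_by_preference, columns["preference"], key, columns["name"])

print(users_by_preference["cassandra"])
```

The dynamic row answers "who picked cassandra?" with a single row read, which is the usual reason to model it that way.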

Super columns

Not so super

Nice on paper, can be catastrophic in practice

Fanning: not the cool, refreshing kind

Getting phased out

        202                                                        201
redis   {'joe' => '[email protected]', 'bob' => '[email protected]'}   {'matthias' => '[email protected]', 'tom' => '[email protected]'}

Secondary Indexes

Indexes on column values

Replacement for the not-so-super super columns

Composite columns: US:colorado:cassandra => kevin
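A minimal sketch of why composite columns do the job: column names are tuples kept in sorted order, so a prefix like US:colorado turns into one contiguous slice. This is an in-memory imitation, not the Cassandra API; CompositeRow and its methods are invented names, and every row except the slide's US:colorado:cassandra => kevin is made-up sample data.

```python
import bisect


class CompositeRow:
    """Hypothetical wide row whose column names are sorted tuples."""

    def __init__(self):
        self._names = []   # composite column names, kept sorted
        self._values = {}  # composite column name -> value

    def insert(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, prefix):
        """Return (name, value) pairs whose column name starts with prefix."""
        start = bisect.bisect_left(self._names, prefix)
        matches = []
        for name in self._names[start:]:
            if name[:len(prefix)] != prefix:
                break  # sorted order: nothing further can match
            matches.append((name, self._values[name]))
        return matches


row = CompositeRow()
row.insert(("US", "colorado", "cassandra"), "kevin")
row.insert(("US", "texas", "cassandra"), "asher")   # sample data
row.insert(("US", "colorado", "redis"), "emma")     # sample data

colorado = row.slice(("US", "colorado"))
print(colorado)
```

Because the names sort component by component, the slice stops at the first non-matching name instead of scanning the whole row.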

Demo 1

Counters

Yes! Yum. Counters good

We built our own; now they're free

Cassandra is eventually consistent, which makes this hard

Be clever and you will win
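One flavour of "clever", sketched under assumptions: each node only ever bumps its own shard of the counter, and replicas merge by taking the per-shard maximum, so merges can arrive late, twice, or out of order without double counting. This is a generic grow-only counter, not necessarily what the speaker built or how Cassandra implements counters; ShardedCounter is a made-up name.

```python
class ShardedCounter:
    """Grow-only counter that tolerates eventually consistent replication."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.shards = {}  # node id -> highest count seen from that node

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("this sketch only supports increments")
        self.shards[self.node_id] = self.shards.get(self.node_id, 0) + amount

    def merge(self, other):
        """Fold in another replica's state; safe to repeat or reorder."""
        for node_id, count in other.shards.items():
            self.shards[node_id] = max(self.shards.get(node_id, 0), count)

    def value(self):
        return sum(self.shards.values())


# Two replicas of the "cassandra" vote counter take writes independently.
replica_a = ShardedCounter("node1")
replica_b = ShardedCounter("node2")
replica_a.increment()
replica_a.increment()
replica_b.increment()

# Replication catches up later, in any order, possibly more than once.
replica_a.merge(replica_b)
replica_b.merge(replica_a)
replica_b.merge(replica_a)

print(replica_a.value(), replica_b.value())
```

The naive alternative, read-add-write on a single column, loses votes when two nodes update at once.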

Demo 2

Counters

            Counter
cassandra   30333
redis       22098
mongodb     24567
couch       12340
...

Hadoop

Distributed processing of large data sets across clusters of computers

Too good to be true?

Map Reduce

Acronym soup

hadoop common, hdfs, map reduce

hbase, pig, hive, zookeeper

Map Reduce is at the heart

Map - processes a key/value pair to generate a set of intermediate key/value pairs

Reduce - function that merges all intermediate values associated with the same intermediate key

Map Reduce Our Example

Kevin, response=cassandra, [email protected], response=redis, [email protected], response=cassandra, [email protected]

Map:

cassandra kevin

cassandra asher

redis emma

Reduce:

cassandra 2

redis 1

AND the winner is cassandra w/ 2 votes!!!
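The same example in a few lines of single-process Python, to make the map and reduce steps concrete. This is a toy stand-in, not Hadoop code: map_reduce, map_response and reduce_votes are illustrative names, and the grouping step plays the role Hadoop's shuffle would.

```python
from collections import defaultdict

# (name, response) pairs taken from the survey example
responses = [("kevin", "cassandra"), ("asher", "cassandra"), ("emma", "redis")]


def map_response(record):
    """Map: emit an intermediate (response, name) pair for each record."""
    name, choice = record
    yield choice, name


def reduce_votes(choice, names):
    """Reduce: merge all names collected under one response into a count."""
    return choice, len(names)


def map_reduce(records, mapper, reducer):
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)  # "shuffle": group values by key
    return [reducer(key, values) for key, values in sorted(grouped.items())]


tally = map_reduce(responses, map_response, reduce_votes)
winner = max(tally, key=lambda pair: pair[1])
print(tally)   # [('cassandra', 2), ('redis', 1)]
print(winner)  # ('cassandra', 2)
```

On a real cluster the mapper and reducer bodies stay about this small; the framework handles splitting the input and moving the grouped pairs between machines.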

Cassandra Hive

Hadoop/Brisk on Cassandra: no luck

Hive

Data warehouse built on top of CassandraFS, leveraging map reduce

Query the data using a SQL-like language called HiveQL
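To give a feel for the kind of query involved, here is the survey tally as a SQL aggregate. It runs against an in-memory SQLite database purely as a stand-in so the example is self-contained; it does not talk to Hive or Cassandra. The survey table and its columns are assumptions, and a HiveQL query of the same shape would be compiled into map reduce jobs rather than run directly.

```python
import sqlite3

# In-memory stand-in for the warehouse table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (name TEXT, response TEXT)")
conn.executemany(
    "INSERT INTO survey VALUES (?, ?)",
    [("kevin", "cassandra"), ("asher", "cassandra"), ("emma", "redis")],
)

# An ad hoc question: how many votes did each option get?
QUERY = """
SELECT response, COUNT(*) AS votes
FROM survey
GROUP BY response
ORDER BY votes DESC
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('cassandra', 2), ('redis', 1)]
conn.close()
```

The GROUP BY here does the same job as the map and reduce steps in the survey example, just written declaratively.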

Demo 3

Summary

Cassandra

Awesome for storing massive amounts of data

Dangerous if you don't know what you are doing

Schemaless; ironically, modelling is extremely important

Ad-hoc questions are hard to answer fast

Hadoop/Brisk

Great for answering ad hoc questions reasonably fast

What you really want is Cassandra + Hadoop + RDBMS
