A NOSQL STUDY: APACHE CASSANDRA
Shujaat Hussain
Date posted: 19-Dec-2015

Data Model

A single column

A single row
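
The figures for these two slides did not survive in the transcript. As a rough sketch of what they most likely depicted: in Cassandra's classic data model a column is a name, a value, and a timestamp, and a row is a key that maps to a set of columns kept sorted by name. The class and function names below (`Column`, `make_row`) and the sample data are illustrative assumptions, not part of any Cassandra client API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A single column: name, value, and the write timestamp."""
    name: str
    value: bytes
    timestamp: int


def make_row(columns):
    """A single row: columns indexed by name and kept in name order."""
    return {c.name: c for c in sorted(columns, key=lambda c: c.name)}


# Hypothetical row for some user key; the contents are made up.
row = make_row([
    Column("email", b"a@example.com", timestamp=1),
    Column("age", b"30", timestamp=1),
])
```

Because rows are sparse, two rows in the same column family need not share the same column names.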

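CAP Theorem

  • Consistency: the system is in a consistent state after an operation (all clients see the same data)
  • Availability: the system is “always on”, with no downtime
  • Partition tolerance: the system continues to function even when split into disconnected subsets (by a network disruption)

A distributed system can guarantee at most two of these three properties at the same time.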
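
Cassandra leans toward availability and partition tolerance and lets the client choose how much consistency to require per operation. A common rule of thumb is that a read sees the latest write when the number of replicas read (R) plus the number written (W) exceeds the replication factor (N). The helper names below are hypothetical; this is a minimal sketch of that arithmetic, not a driver API.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def read_sees_latest_write(n, r, w):
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


N = 3  # assumed replication factor for the example
strong = read_sees_latest_write(N, quorum(N), quorum(N))  # quorum reads and writes
weak = read_sees_latest_write(N, 1, 1)                    # one replica each way
```

With N = 3, quorum reads and writes overlap on at least one replica, while reading and writing a single replica each does not.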

Performance vs MySQL w/ 50GB

             Write      Read
MySQL        300 ms     350 ms
Cassandra    0.12 ms    15 ms
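
To put the slide's figures in proportion, the ratios can be worked out directly. The numbers are taken from the slide as given; the dictionary layout and the `speedup` helper are just an illustrative way to hold them.

```python
# Latencies in milliseconds, copied from the slide (50 GB data set).
latency_ms = {
    "MySQL": {"write": 300.0, "read": 350.0},
    "Cassandra": {"write": 0.12, "read": 15.0},
}


def speedup(op):
    """How many times faster Cassandra is than MySQL for the given operation."""
    return latency_ms["MySQL"][op] / latency_ms["Cassandra"][op]


write_ratio = speedup("write")
read_ratio = speedup("read")
```

On these figures writes are roughly 2,500 times faster and reads roughly 23 times faster, so the advantage is far larger on the write path.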

Querying: Overview

You need a key or keys:
  • Single: key='a'
  • Range: key='a' through 'f'

And columns to retrieve:
  • Slice: cols={bar through kite}
  • By name: key='b' cols={bar, cat, llama}

There is nothing like SQL's "WHERE col='faz'": you cannot filter rows by column value.
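
The query shapes listed above can be imitated with a toy in-memory model: a dictionary from row key to columns, with keys and column names treated as sorted. Everything here (`ROWS`, `get_by_name`, `get_slice`, `get_key_range`, and the data) is a hypothetical stand-in, not a Cassandra client API, and the key range assumes keys are stored in order.

```python
from bisect import bisect_left, bisect_right

# Toy column family: row key -> {column name: value}. Data is made up.
ROWS = {
    "a": {"ant": 1, "bar": 2, "cat": 3},
    "b": {"bar": 4, "cat": 5, "kite": 6, "llama": 7},
    "g": {"bar": 8, "zebra": 9},
}


def get_by_name(key, names):
    """By name: fetch only the listed columns of one row."""
    row = ROWS.get(key, {})
    return {n: row[n] for n in names if n in row}


def get_slice(key, start, finish):
    """Slice: fetch the columns whose names fall in [start, finish]."""
    row = ROWS.get(key, {})
    names = sorted(row)
    lo, hi = bisect_left(names, start), bisect_right(names, finish)
    return {n: row[n] for n in names[lo:hi]}


def get_key_range(start_key, end_key):
    """Range: row keys in [start_key, end_key], assuming ordered keys."""
    keys = sorted(ROWS)
    lo, hi = bisect_left(keys, start_key), bisect_right(keys, end_key)
    return keys[lo:hi]
```

Note that none of these functions can answer "which rows have col = 'faz'" without scanning every row, which is the limitation the slide points out.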

Digg is a social news site that allows people to discover and share content from anywhere on the Internet by submitting stories and links, and voting and commenting on submitted stories and links.

Problems
  • Terabytes of data; high transaction rate (read-dominated)
  • Multiple clusters
  • Management nightmare (high effort, error-prone)
  • Unsatisfied availability requirements (geographic isolation)

Solution
  • Cassandra as primary data store
  • Datacenter and rack-aware replication
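
Rack-aware replication means that copies of a row are spread across racks, so losing one rack does not lose every replica. The sketch below is a simplified illustration of that idea for a single datacenter: walk the ring from a starting node and prefer racks that have not been used yet. The function name, ring layout, and node names are assumptions made for the example; Cassandra's actual placement strategy handles more cases.

```python
def place_replicas(ring, start, rf):
    """Pick rf nodes walking the ring from index start, preferring unused racks."""
    n = len(ring)
    order = [ring[(start + i) % n] for i in range(n)]
    chosen, used_racks = [], set()
    # First pass: one node per rack, in ring order.
    for node, rack in order:
        if len(chosen) == rf:
            break
        if rack not in used_racks:
            chosen.append(node)
            used_racks.add(rack)
    # Second pass: if there are fewer racks than replicas, fill with remaining nodes.
    for node, _rack in order:
        if len(chosen) == rf:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen


# Hypothetical ring of (node, rack) pairs in token order.
RING = [("n1", "rack1"), ("n2", "rack1"), ("n3", "rack2"),
        ("n4", "rack2"), ("n5", "rack3")]
replicas = place_replicas(RING, start=0, rf=3)
```

With three racks and a replication factor of three, each replica lands on a different rack.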

Twitter is a social networking and microblogging service that enables its users to send and read tweets, text-based posts of up to 140 characters.

Terabytes of data, ~1,000,000 ops/s

Inbox Search (Facebook)
  • 100 TB
  • 160 nodes
  • 1/2 billion writes per day (possibly a two-year-old figure)

Pros

Advantages
  • Massive scalability
  • High availability
  • Lower cost (than competitive solutions at that scale)
  • (usually) predictable elasticity
  • Schema flexibility, sparse & semi-structured data

Cons

Disadvantages
  • Limited query capabilities (so far)
  • Eventual consistency is not intuitive to program for
  • Makes client applications more complicated
  • No standardization
  • Portability might be an issue
  • Insufficient access control
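
The point about eventual consistency can be made concrete with a toy simulation: a write that has reached only one replica is invisible to a client that happens to read from another, until the replicas catch up or the read contacts enough of them. The `replicas` list and the `write` and `read` helpers are hypothetical and only imitate timestamp-based "last write wins" reconciliation.

```python
# Three toy replicas, each mapping key -> (value, timestamp).
replicas = [{}, {}, {}]


def write(key, value, ts, targets):
    """Store the value on the listed replicas only."""
    for i in targets:
        replicas[i][key] = (value, ts)


def read(key, targets):
    """Return the newest value among the contacted replicas (last write wins)."""
    seen = [replicas[i][key] for i in targets if key in replicas[i]]
    return max(seen, key=lambda vt: vt[1])[0] if seen else None


write("user:1", "old@example.com", ts=1, targets=[0, 1, 2])
write("user:1", "new@example.com", ts=2, targets=[0])  # not yet propagated

stale = read("user:1", targets=[2])     # this replica has not caught up
fresh = read("user:1", targets=[0, 2])  # the newer timestamp wins
```

The client has to decide how many replicas to contact and tolerate the stale case, which is the added application complexity the slide refers to.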