Apache Cassandra: NoSQL in the enterprise


Transcript of Apache Cassandra: NoSQL in the enterprise

Page 1: Apache Cassandra: NoSQL in the enterprise

Apache Cassandra: NoSQL in the Enterprise, today

Jonathan Ellis CTO

@spyced

Page 2: Apache Cassandra: NoSQL in the enterprise

Cassandra Job Trends (indeed.com)

Page 3: Apache Cassandra: NoSQL in the enterprise

“Big Data” trend

Page 4: Apache Cassandra: NoSQL in the enterprise

Why Big Data Matters

Research done by McKinsey & Company shows the eye-opening, 10-year category growth rate differences between businesses that smartly use their big data and those that do not.

Page 5: Apache Cassandra: NoSQL in the enterprise

Big data

Analytics (Hadoop)

Realtime (“NoSQL”)

?

Page 6: Apache Cassandra: NoSQL in the enterprise

Some users

✤ Financial
✤ Social Media
✤ Advertising
✤ Entertainment
✤ Energy
✤ E-tail
✤ Health care
✤ Government

Page 7: Apache Cassandra: NoSQL in the enterprise

Common use cases

✤ Time series data
✤ Messaging
✤ Ad tracking
✤ Data mining
✤ User activity streams
✤ User sessions
✤ Anything requiring: scalable + performant + highly available

Page 8: Apache Cassandra: NoSQL in the enterprise

Why Cassandra?

✤ Fully distributed, no SPOF
✤ Multi-master, multi-DC
✤ Linearly scalable
✤ Larger-than-memory datasets
✤ Best-in-class performance (not just writes!)
✤ Fully durable
✤ Integrated caching
✤ Tunable consistency
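
The "tunable consistency" bullet can be made concrete with the usual quorum arithmetic: with N replicas, a read of R replicas is guaranteed to see the latest write acknowledged by W replicas whenever R + W > N. The sketch below is only an illustration of that rule. The function names and the choice of N = 3 are assumptions for this example, not part of the slides or of any Cassandra driver API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def read_sees_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


# Assumed replication factor for the illustration.
N = 3
levels = {"ONE": 1, "QUORUM": quorum(N), "ALL": N}

for write_name, w in levels.items():
    for read_name, r in levels.items():
        if read_sees_latest_write(N, w, r):
            verdict = "overlap guaranteed"
        else:
            verdict = "may read stale data"
        print(f"write={write_name:<6} read={read_name:<6} -> {verdict}")
```

Lower levels trade the overlap guarantee for lower latency and higher availability, which is the knob the bullet refers to.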

Page 9: Apache Cassandra: NoSQL in the enterprise

Classic partitioning with SPOF

master

slave

slave

partition 1 partition 2 partition 3 partition 4

request router

Page 10: Apache Cassandra: NoSQL in the enterprise

Fully distributed, no SPOF

[Diagram: a client and peer nodes labeled with partitions such as p1, p3, and p6]
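
One way to picture "no SPOF" is a token ring in which every node can work out, from the key alone, which peers hold a given partition, so no central request router is needed. The following is a minimal sketch of that idea, not Cassandra's actual implementation. The `Ring` class, the node names, deriving a node's token from its name, and the use of MD5 are all assumptions made for illustration.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string onto the ring using MD5 (an illustrative choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical token ring: each node owns the range ending at its token."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = min(replication_factor, len(nodes))
        # Sort nodes by token so a lookup is a binary search.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key: str):
        """First node at or after the key's token, then the next ones clockwise."""
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.entries)
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(self.replication_factor)
        ]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e", "node-f"]
ring = Ring(nodes)
owners = ring.replicas("user:42")
print(owners)
```

Because the lookup depends only on the key and the shared list of nodes, any peer that receives the request computes the same owners and can forward it directly.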

Page 11: Apache Cassandra: NoSQL in the enterprise
Page 12: Apache Cassandra: NoSQL in the enterprise

Performance summary

Page 13: Apache Cassandra: NoSQL in the enterprise
Page 14: Apache Cassandra: NoSQL in the enterprise

“With Cassandra, we get better business agility, and we don’t have to plan capacity in advance, we don’t need to ask permission of other people to build things for us, and we don’t worry about running out of space or power.”

Adrian Cockcroft, Cloud Architect

Page 15: Apache Cassandra: NoSQL in the enterprise

Netflix on Cassandra

✤ Could not build datacenters fast enough
✤ Made decision to go to cloud (AWS)
✤ Applications include Netflix’s subscriber system, A/B testing, and viewing history service
✤ Over a year in, Netflix finds Cassandra to be
    ✤ Fast
    ✤ Cost-effective
    ✤ Scalable
    ✤ Flexible
    ✤ Reliable: no SPOF

Page 16: Apache Cassandra: NoSQL in the enterprise

“Without Cassandra, our engineers would’ve had to create something that could scale to our needs, that would’ve prevented us from focusing on building product and solving problems for Backupify’s users, which are far more important tasks.”

Matt Conway, VP Engineering

Page 17: Apache Cassandra: NoSQL in the enterprise

Backupify on Cassandra

✤ Cloud-based utility that enables businesses and consumers to back up, search, and restore the content of popular online applications such as Google Apps, Gmail, Facebook, Twitter, and Blogger
✤ Cassandra findings:
    ✤ Solved scaling, allowing engineers to focus on their business
    ✤ DataStax OpsCenter made it easy to monitor the health and performance of their cluster
    ✤ Reliable, redundant, and scalable data storage helped eliminate downtime
    ✤ Ability to offer not just backup and storage, but also analysis

Page 18: Apache Cassandra: NoSQL in the enterprise

“You can seamlessly add new nodes and expand your total capacity without deteriorating the performance of the data store. Cassandra has allowed us to scale very effectively.”

Harry Robertson, Tech Lead

Page 19: Apache Cassandra: NoSQL in the enterprise

Ooyala on Cassandra

✤ Ooyala provides a suite of technologies and services that support content owners in managing, analyzing, and monetizing the digital video they publish online
✤ Cassandra findings:
    ✤ Classic “Big Data” problem did not require re-architecting
    ✤ Delivered ability to respond to increasingly sophisticated analytic needs of customers
    ✤ Developers spend time building application features, not figuring out how to scale

Page 20: Apache Cassandra: NoSQL in the enterprise

“Cassandra has allowed us to build bigger features faster and more reliably, while using less money and without needing to expand our staff.”

Kyle Ambroff, Sr. Engineer

Page 21: Apache Cassandra: NoSQL in the enterprise

Formspring on Cassandra

✤ Users of Formspring engage with and learn more about each other by asking and responding to questions. Close to 4B responses in the system and 30M unique users
✤ Cassandra experience
    ✤ No sharding needed – just add nodes to scale
    ✤ Performance – the popular users with many followers saw no speed reduction. No more memcached!
    ✤ Flexibility of a schema-optional architecture is very developer friendly

Page 22: Apache Cassandra: NoSQL in the enterprise

Big data

Analytics (Hadoop)

Realtime (“NoSQL”)

?

Page 23: Apache Cassandra: NoSQL in the enterprise

The evolution of Analytics

Analytics + Realtime

Page 24: Apache Cassandra: NoSQL in the enterprise

The evolution of Analytics

Analytics Realtime

replication

Page 25: Apache Cassandra: NoSQL in the enterprise

The evolution of Analytics

ETL

Page 26: Apache Cassandra: NoSQL in the enterprise

Big data

Analytics (Hadoop)

Realtime (“NoSQL”)

DataStax Enterprise

Page 27: Apache Cassandra: NoSQL in the enterprise

DataStax Enterprise re-unifies realtime and analytics

Page 28: Apache Cassandra: NoSQL in the enterprise
Page 29: Apache Cassandra: NoSQL in the enterprise

Portfolio Demo dataflow

Portfolios

Historical Prices

Intermediate Results

Largest loss

Portfolios

Live Prices for today

Largest loss
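
The dataflow boxes above (portfolios and historical prices feeding intermediate results, which yield a largest loss) could be read as the following computation. This is a guess at the shape of the demo, not its actual code. The data layout, the function names, the definition of "largest loss" as the worst day-over-day drop in portfolio value, and all the numbers are assumptions made up for illustration.

```python
# Made-up sample data: holdings per portfolio and closing prices per ticker.
portfolios = {"p1": {"AAA": 10, "BBB": 5}}
historical_prices = {
    "AAA": [100.0, 98.0, 101.0],
    "BBB": [50.0, 40.0, 45.0],
}


def daily_values(holdings, prices):
    """Intermediate result: total portfolio value for each historical day."""
    days = min(len(prices[ticker]) for ticker in holdings)
    return [
        sum(shares * prices[ticker][day] for ticker, shares in holdings.items())
        for day in range(days)
    ]


def largest_loss(values):
    """Most negative day-over-day change; 0.0 if there are fewer than two days."""
    changes = [later - earlier for earlier, later in zip(values, values[1:])]
    return min(changes, default=0.0)


for name, holdings in portfolios.items():
    values = daily_values(holdings, historical_prices)
    print(name, values, largest_loss(values))
```

The same two functions would apply unchanged to the live-price half of the diagram if today's prices were appended to each ticker's list.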

Page 30: Apache Cassandra: NoSQL in the enterprise

Operations

✤ “Vanilla” Hadoop
    ✤ 8+ services to set up, monitor, back up, and recover (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, ZooKeeper, Region Server, ...)
    ✤ Single points of failure
    ✤ Can't separate online and offline processing
✤ DataStax Enterprise
    ✤ Single, simplified component
    ✤ Self-organizes based on workload
    ✤ Peer to peer
    ✤ JobTracker failover

Page 31: Apache Cassandra: NoSQL in the enterprise

Managing & Monitoring Big Data

✤ DataStax OpsCenter manages and monitors all Cassandra and Hadoop operations

Page 32: Apache Cassandra: NoSQL in the enterprise

Questions?