How Rackspace is using Private Cloud for Big Data
Bryan Thompson

Big Data Use Case

May 8th, 2013

RACKSPACE® HOSTING | WWW.RACKSPACE.COM

Our Big Data Problem

• Consolidate all monitoring data for reporting and analytical purposes.

• Every device (server, switch, SAN, UPS, etc.) and product produces multiple events per second

• Monitoring tens of thousands of devices (both physical and virtual)

• This adds up to terabytes of data per day, and growing…
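As a rough sanity check on the "terabytes per day" figure, the arithmetic can be sketched as below. The device count, event rate, and event size are hypothetical values chosen only to match the orders of magnitude on the slide ("tens of thousands of devices", "multiple events per second"); the deck does not give exact numbers.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_bytes(devices: int, events_per_second: float, bytes_per_event: int) -> float:
    """Raw bytes of monitoring data produced per day."""
    return devices * events_per_second * SECONDS_PER_DAY * bytes_per_event


# Hypothetical inputs, not taken from the deck.
devices = 20_000
events_per_second = 3
bytes_per_event = 500

events_per_day = devices * events_per_second * SECONDS_PER_DAY
terabytes_per_day = daily_volume_bytes(devices, events_per_second, bytes_per_event) / 1e12

print(f"{events_per_day:,} events/day")
print(f"{terabytes_per_day:.2f} TB/day")
```

Under these assumptions the raw stream already lands in the low terabytes per day, before any indexing or replication overhead.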

Current Environment

• Dedicated Relational Database systems
• Loaded nightly
• Multiple BI Tools
• 2450+ Users
• To scale would be cost and time prohibitive:
  • Cost of DB licenses
  • Cost of Hardware
  • Time to procure and configure servers
  • Concerns with performance
  • Heavy DBA work

What our sponsors and end-users want…

• Plug in and start analyzing data
• Act at the speed of the business
• Maintain optimal query performance
• Costs to store and analyze data volumes
• Abstract technical nuances of multiple big data technologies
• Use your preferred BI tool
• High Availability

To The Drawing Board!!!

• What we need is the ability to:
  • Host ever-growing data volumes
  • Handle streaming data and hourly updates of metrics with sub-second performance
  • Scale rapidly with High Availability
  • Leverage Open Source technologies
  • Leverage multiple big data technologies

The Analytic Compute Grid (ACG)

Key components of the ACG

• OpenStack can provide elasticity capabilities
• Big Data Technologies (v1 Cassandra)
• Advanced Hashing to run parallel clusters
• Rule-based elasticity engine integrated w/ OpenStack
• ANSI-SQL API w/ Extensions – ability to “plug in”
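The deck does not describe the hashing scheme behind "Advanced Hashing to run parallel clusters". As a minimal sketch of the general idea only, a stable hash can assign each key to one of several clusters so that work spreads across them. The function name `cluster_for`, the device identifiers, and the cluster count are all hypothetical, and simple modulo placement is an assumption, not the ACG's actual method.

```python
import hashlib


def cluster_for(key: str, cluster_count: int) -> int:
    """Map a key to a cluster index using a stable hash.

    hashlib is used instead of the built-in hash() because hash() is
    randomized per process for strings, so placements would not be
    repeatable across runs.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % cluster_count


# Hypothetical device identifiers and cluster count.
devices = ["server-001", "switch-042", "san-007", "ups-003"]
placement = {device: cluster_for(device, 4) for device in devices}

for device, cluster in placement.items():
    print(f"{device} -> cluster {cluster}")
```

Modulo placement remaps most keys when the cluster count changes; a consistent-hashing ring is the usual refinement when clusters are added or removed often.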

ACG Architecture

Infrastructure Platform: Rackspace Private Cloud, Host OS & Hypervisors, Commodity Hardware

Big Data Technologies: Cassandra, PostgreSQL, Hadoop, MongoDB, Others in Future

ACG Engine:
• Monitor capacity and auto-provision/de-provision infrastructure resources
• Route request to right analytics technology

Data Presentation: APIs, BI Tools, Data Mining, Data Integration, SQL
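The "route request to right analytics technology" role of the ACG Engine can be illustrated with a small lookup-based router. This is a sketch under stated assumptions: the `Request` type, the workload labels, and the mapping of labels to backends are hypothetical; only the backend names come from the architecture slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    sql: str
    workload: str  # hypothetical label attached by the caller


# Hypothetical routing table: workload label -> backend named on the slide.
ROUTES = {
    "time_series": "Cassandra",
    "relational": "PostgreSQL",
    "batch": "Hadoop",
    "document": "MongoDB",
}


def route(request: Request, default: str = "Cassandra") -> str:
    """Pick a backend for the request, falling back to a default."""
    return ROUTES.get(request.workload, default)


print(route(Request("SELECT 1", "relational")))
print(route(Request("SELECT 1", "unlabelled")))
```

Keeping the table as data rather than code means a new backend can be registered without touching the routing function, which fits the "Others in Future" slot on the slide.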

Rule-Based Elasticity

Our OpenStack Environment at Launch

• Deployed on Rackspace Private Cloud
• Can run multiple node configurations
• New node is provisioned in seconds!!!
• Operating System – Ubuntu
• Big Data Technology – Cassandra
• 32 Node Cluster – with capacity to grow

Performance Comparison

• SQL Server Environment (Dedicated Environment)
  • 24 CPU, 256 GB RAM
  • Availability Calculation against 1.5 Billion row sample – 132 hours (5.5 days)

• RPC OpenStack Environment (virtual machines)
  • 8 CPU, 32 GB RAM
  • Availability Calculation against 1.5 Billion row sample – 3.2 hours!!!
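To put the two run times side by side, the arithmetic below derives the speedup and the implied row throughput from the figures on the slides (1.5 billion rows, 132 hours, 3.2 hours). The variable names are illustrative. The deck does not state how many virtual machines took part in the OpenStack run, so this compares end-to-end times, not per-node efficiency.

```python
ROWS = 1_500_000_000
SQL_SERVER_HOURS = 132.0
OPENSTACK_HOURS = 3.2
SECONDS_PER_HOUR = 3600

# Ratio of the two end-to-end run times.
speedup = SQL_SERVER_HOURS / OPENSTACK_HOURS

# Rows processed per second in each environment.
sql_rows_per_sec = ROWS / (SQL_SERVER_HOURS * SECONDS_PER_HOUR)
rpc_rows_per_sec = ROWS / (OPENSTACK_HOURS * SECONDS_PER_HOUR)

print(f"SQL Server run: {SQL_SERVER_HOURS / 24:.1f} days")
print(f"Speedup: {speedup:.2f}x")
print(f"SQL Server: {sql_rows_per_sec:,.0f} rows/s")
print(f"OpenStack:  {rpc_rows_per_sec:,.0f} rows/s")
```

The same calculation finishes roughly 41 times faster on the OpenStack environment, which is the difference between a multi-day job and one that completes within a working morning.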

ACG Features

• ACG is a Big Data Management System
• Parallel engine supports multiple clusters
• Highly configurable Rules Engine
  • Time based
  • System based
• ANSI SQL Compliant API with extensions
• High Compression - Cassandra
• Reusable Bulk-Loader
• Can integrate with current ETL tool
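A rules engine with time-based and system-based rules, as listed above, might be sketched as follows. Everything here is an assumption made for illustration: the rule names, thresholds, the nightly window, node deltas, and cluster limits are hypothetical, and the sketch only computes a target node count. The actual provisioning call into OpenStack is left out.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable, List


@dataclass
class ClusterState:
    nodes: int
    cpu_utilization: float  # 0.0 to 1.0, averaged across nodes
    now: time


@dataclass
class Rule:
    name: str
    applies: Callable[[ClusterState], bool]
    node_delta: int  # positive adds nodes, negative removes them


# Hypothetical rules: one time based, two system based.
RULES: List[Rule] = [
    Rule("nightly-load-window", lambda s: time(1, 0) <= s.now < time(4, 0), +8),
    Rule("high-cpu", lambda s: s.cpu_utilization > 0.80, +4),
    Rule("idle", lambda s: s.cpu_utilization < 0.20, -2),
]


def plan(state: ClusterState, rules: List[Rule],
         min_nodes: int = 3, max_nodes: int = 64) -> int:
    """Return the target node count: first matching rule wins, then clamp."""
    for rule in rules:
        if rule.applies(state):
            return max(min_nodes, min(max_nodes, state.nodes + rule.node_delta))
    return state.nodes


print(plan(ClusterState(32, 0.90, time(12, 0)), RULES))
```

Evaluating rules in list order keeps precedence explicit: here the time-based window overrides the system-based thresholds whenever both match.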

The Road Ahead

• PostgreSQL (launching this month)
• Hadoop
• Allow for seamless cross-platform analysis
• Migrate off legacy environment
• Dev/QA Environments
• Next big “big data” technology?

Questions?

RACKSPACE® HOSTING | 5000 WALZEM ROAD | SAN ANTONIO, TX 78218 US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM

RACKSPACE® HOSTING | © RACKSPACE US, INC. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES. | WWW.RACKSPACE.COM