Real-time data analytics with Cassandra at iland
Real-time data analytics with C* at iland
Cassandra Tech Day @ Houston, TX
October 14th, 2014
Julien Anguenot (@anguenot)
Agenda
• iland, iland platform and why C*?
• real-time data & domain constraints
• quick overview of iland C* deployment
iland platform
Julien Anguenot @ Cassandra Tech Day Houston 2014
[Infographic: iland Internet Solutions. Figures shown: 19, 8, 8, 3. Labels: years delivering IT services; years of delivering cloud infrastructure & disaster recovery expertise; ISO 27001 & SSAE16 global data centers; cloud-based specializations: Production; Test & Dev; DR]
iland platform essentially
• data warehouse running across multiple data centers
• monitoring (resource consumption / performance)
• billing
• alerting
• predictive analytics
• cloud management
• cloud services (backups, DR, etc.)
• desktop and mobile applications (iland portal app)
Why did we choose Cassandra?
• earlier attempts with MySQL and MongoDB were big failures
• write latency (constant-time writes)
• distributed nature (multiple data centers)
• scalability, reliability, performance, availability
• sharding makes things complicated
• no master/slave architecture means no single point of failure (SPOF)
• simplicity
real-time data & domain constraints
Constraints
• write latency
• precision (used for billing)
• availability
• multi-data center
• tens of thousands of VMs
• agent-less
• pull (vs push)
Pipeline
• collection of real-time data
• store
• aggregation
• rollups
• processing
• alerting
• reporting
• querying
Data sources
• VMware infrastructure stack (each location)
  • vCloud Director, vCenter, vShield Manager
• network statistics
• AMQP, Syslog-ng, Web Services
• Salesforce
• Veeam, Zerto, more …
Real-time metrics
• 20-second samples
• dozens of performance counters per entity (VM, VNIC, etc.)
• time series
  • (timestamp, value)
• metadata
  • entity (network, VM, etc.)
  • unit, etc.
• then 1-minute, 1-hour, 1-day, 1-week and 1-month historical rollups
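The sampling and rollup scheme above can be sketched as follows. This is a minimal illustration and not iland's implementation: the `rollup` helper, the sample values, and the choice of an arithmetic mean per bucket are all assumptions, since the slides do not say which aggregate is stored for each rollup.

```python
from collections import defaultdict
from statistics import mean


def rollup(samples, bucket_seconds):
    """Group (timestamp, value) samples into fixed-width buckets and average them.

    `samples` is an iterable of (unix_timestamp_seconds, value) pairs.
    Returns a list of (bucket_start_timestamp, mean_value) pairs sorted by time.
    Averaging is an assumption made for illustration only.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each sample to the start of the bucket it falls into.
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical counter values sampled every 20 seconds over two minutes.
raw = [(0, 10.0), (20, 20.0), (40, 30.0), (60, 40.0), (80, 50.0), (100, 60.0)]

one_minute = rollup(raw, 60)          # three 20-second samples per bucket
one_hour = rollup(one_minute, 3600)   # coarser rollup built from the finer one
```

Building each coarser rollup from the previous one keeps the amount of data read per step small, but it only gives the same result as rolling up the raw samples when every bucket holds the same number of samples.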
Overview of iland C* deployment
iland’s C* cluster (1/2)
• one (1) C* cluster
• 6 data centers - 25+ nodes - one (1) keyspace
iland’s C* cluster (2/2)
• each DC (1 or 2 racks of 3 nodes)
  • 3 nodes per location for writes
  • replication factor (RF) = 3
  • 3 nodes per location for API reads (Dallas, TX; London, UK; Singapore)
• each node
  • VM (vCenter powered) - Ubuntu 14.04 LTS w/ Cassandra base dpkg
  • 16GB of RAM / 16 vCPUs
  • currently: ~1TB of data per node
  • not using SSD (yet)
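To make the replication setting concrete, the sketch below builds a `CREATE KEYSPACE` statement with `NetworkTopologyStrategy` and computes the quorum size for RF = 3. The keyspace name `metrics`, the data center names, and the assumption that RF = 3 applies to each data center are hypothetical; the slides only state a single keyspace and RF = 3.

```python
def create_keyspace_cql(keyspace, replication_by_dc):
    """Build a CREATE KEYSPACE statement that sets one RF per data center."""
    options = ", ".join(
        f"'{dc}': {rf}" for dc, rf in sorted(replication_by_dc.items())
    )
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


def quorum(replication_factor):
    """Number of replicas that must answer a QUORUM or LOCAL_QUORUM request."""
    return replication_factor // 2 + 1


# Hypothetical data center names; only RF = 3 comes from the slide.
replication = {"dallas": 3, "london": 3, "singapore": 3}
statement = create_keyspace_cql("metrics", replication)
local_quorum = quorum(3)  # 2 replicas out of 3
```

With RF = 3 in a data center, a local quorum needs two replicas, so one node per data center can be down without blocking quorum reads or writes there.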
Data centers and replication (1/2)
• US: Reston, VA; LA, CA; Dallas, TX
• Asia: Singapore
• EU: London, UK; Manchester, UK
Data centers and replication (2/2)
[Diagram labels: iland core platform; iland ReST API; C* W (write nodes); C* R (read nodes)]
C* R only deployed in: Dallas, TX - London, UK - Singapore
Access to data (read / write)
[Diagram labels: https://my.ilandcloud.com/; Citrix Netscaler (US); US portal, EU portal, Asia portal; US API, EU API, Asia API]
At the application level?
• History
  • POC: C* 1.2 + Astyanax (2013/02)
  • V1: C* 2.0 + Astyanax w/ CQL3 over Thrift (2013/06)
  • Current version: C* 2.0 + DataStax CQL Java driver
• Java / JEE cluster (WildFly AS)
• AMQP / RabbitMQ
• Redis to cache “hot” data (read latency)
• Python & DataStax CQL driver for cluster / data upgrades
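The use of a cache for "hot" data can be illustrated with a read-through (cache-aside) lookup. This is a sketch under stated assumptions, not the iland code: `TTLCache` is an in-memory stand-in for Redis so the example runs without a server, `load_latest_sample` is a placeholder for a Cassandra query, and the 20-second expiry is a guess chosen to match the sample interval.

```python
import time


class TTLCache:
    """In-memory stand-in for Redis: get/set with a per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


def read_through(cache, key, load_from_store, ttl_seconds=20):
    """Return the cached value if present, otherwise load it and cache it."""
    value = cache.get(key)
    if value is None:
        value = load_from_store(key)
        cache.set(key, value, ttl_seconds)
    return value


store_calls = []


def load_latest_sample(key):
    # Placeholder for a Cassandra read; records each call so misses are visible.
    store_calls.append(key)
    return {"entity": key, "cpu_usage": 42.0}


cache = TTLCache()
first = read_through(cache, "vm-123", load_latest_sample)   # miss: hits the store
second = read_through(cache, "vm-123", load_latest_sample)  # hit: served from cache
```

Only the first lookup reaches the backing store; repeated reads of the same entity within the expiry window are served from the cache, which is where the read-latency gain comes from.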
Slides @
http://www.slideshare.net/anguenot/cassandra-tech-dayhou2014
http://www.iland.com/enterprise-cloud-services-portal/
@ilandcloud
@anguenot