Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017

Monitoring Big Data Systems - Done “The Simple Way” - Demi Ben-Ari, VP R&D @ Panorays - CODEMOTION MILAN - SPECIAL EDITION, 10 – 11 NOVEMBER 2017

Page 1: Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017

Monitoring Big Data Systems - Done “The Simple Way”
Demi Ben-Ari - VP R&D @ Panorays


Page 2:

About Me

Demi Ben-Ari, Co-Founder & VP R&D @ Panorays
● Google Developer Expert
● Co-Founder of Communities:
  ○ “Big Things” - Big Data, Data Science, DevOps
  ○ Google Developer Group Cloud
  ○ Ofek Alumni Association

In the Past:
● Sr. Data Engineer - Windward
● Team Leader & Sr. Java Software Engineer, Missile Defense System - “Ofek” – IAF

Page 3:

● A lot of (NOT) funny jokes
● Problem definition and environment
● Monitoring pipeline solutions
  ○ Metrics
  ○ Datastores
  ○ Dashboards
  ○ Alerting
● Summary
● (Not going to address service discovery and monitoring)

Page 4:

Say “Distributed”, Say “Big Data”, Say…

Page 5:

What is Big Data (IMHO)? And What to Monitor?

● Systems involving the “3 Vs”:
  ○ Volume - How much?
  ○ Velocity - How fast?
  ○ Variety - What kind? (Difference)

What are the right questions we want to ask?

Page 6:

Monitor What?





Page 7:

Monolith Structure

[Diagram labels: OS (CPU, Memory, Disk); Processes / Java Application Server; Web Server; Load Balancer; Users - Other Applications; Monitoring System]

Many times... all of this was on a single physical server!

Page 8:

Distributed Microservices Architecture

[Diagram labels: Service A; Service B; Service C; Cache; DB; Web Server; Analytics Cluster; Slave, Slave, Slave]

Monitoring System???

Page 9:

Some basic concepts

Page 10:

Basic Concepts

● Monitoring
● White-box
  ○ Monitoring based on the system’s internals
● Black-box
  ○ Monitoring externally visible behavior
● Dashboard
● Alert
● Root cause
● Node and machine
● Deploy
  ○ Any change to a service’s running software or its configuration.
● KPI - Key Performance Indicator

Page 11:

Data flow and Environment (Use Case)

Page 12:

Structure of the Data

● Maritime Analytics Platform

● Geo Locations + Metadata

● Arriving over time

● Different types of messages being reported by satellites

● Encoded (For compression reasons)

● Might arrive later than actually transmitted

Page 13:

Data Flow Diagram

[Diagram labels: External Data; Analytics Layers; Data Pipeline; Parsed Raw; Entity Resolution Process; Building insights on top of the entities; Data Output Layer; Anomaly Detection; UI for End Users]

Page 14:

Environment Description

Dev Testing Live Staging Production Env


RESTful Java Services

Page 15:

Monitoring Stack - Let’s fill in the blanks


Metrics Collection



Data Monitoring

Log Monitoring

Page 16:

Situations & Problems

Page 17:

MongoDB + Spark

[Diagram labels: Spark Cluster; Worker 1, Worker 2, … Worker N; Master; Sharded MongoDB; Replica Set]

Page 18:

Cassandra + Spark

[Diagram labels: Spark Cluster; Worker 1, Worker 2, … Worker N; Cassandra Cluster]



Page 19:

Cassandra + Serving

[Diagram labels: Cassandra Cluster; multiple Web Service instances; multiple UI Clients]


Page 20:


● Multiple physical servers

● Multiple logical services

● Want Scaling => More Servers

● Even if you had all of the metrics

○ You’ll have an overflow of the data

● Your monitoring becomes a “Big Data” problem itself

Page 21:

This is what “Distributed” really Means

The DevOps Guy

(It might be you)

Page 22:

So... Solutions
Let’s Start!

Page 23:

Monitoring the Operating System

Page 24:

Monitoring Operating System Metrics

● What to measure:
  ○ CPU
  ○ Memory
  ○ Disk Space
● How to measure:
  ○ CollectD or StatsD reporting to Graphite
  ○ New Relic
    ■ Nice and easy UI
    ■ Even the free account gives you great tools
    ■ Alerting on thresholds
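
As a rough sketch of the "what to measure / how to measure" split above, the snippet below samples disk space and load average with the Python standard library and formats them as Graphite plaintext-protocol lines (`<path> <value> <timestamp>`). The metric prefix, the function names, and the default port are assumptions for illustration; in practice CollectD or StatsD would do this collection for you. Memory is left out because the standard library has no portable call for it.

```python
import os
import shutil
import socket
import time


def collect_os_metrics(prefix, path="/"):
    """Sample a few OS-level numbers and return {metric_path: value}."""
    usage = shutil.disk_usage(path)
    used_pct = round(100.0 * usage.used / usage.total, 2) if usage.total else 0.0
    metrics = {
        f"{prefix}.disk.used_bytes": usage.used,
        f"{prefix}.disk.free_bytes": usage.free,
        f"{prefix}.disk.used_pct": used_pct,
    }
    try:
        # 1-minute load average; not available on every platform.
        metrics[f"{prefix}.cpu.load_1m"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        pass
    return metrics


def to_graphite_lines(metrics, timestamp=None):
    """Format metrics as Graphite plaintext lines: '<path> <value> <epoch>'."""
    ts = int(timestamp if timestamp is not None else time.time())
    return [f"{name} {value} {ts}" for name, value in sorted(metrics.items())]


def send_to_graphite(lines, host, port=2003, timeout=5.0):
    """Send the lines to a Graphite (carbon) plaintext listener over TCP."""
    payload = "\n".join(lines) + "\n"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload.encode("ascii"))
```

A scheduled job could call `send_to_graphite(to_graphite_lines(collect_os_metrics("servers.web1")), host)` once a minute, with `host` pointing at your own Graphite instance.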

Page 25:

Some help from “the Cloud”

Page 26:

Google Cloud Platform Stackdriver

Page 27:

Amazon Web Services CloudWatch

Page 28:

Report to Where?

● We chose: Graphite (InfluxDB) + Grafana
● Can correlate System and Application metrics in one place :)

Page 29:

Report to Where?

● Save DevOps effort if you’re willing to pay :)
● Hosted Graphite
● Throwing the “Big Data” volume monitoring problem at someone else

Page 30:

Connections Connections…

Page 31:

Drivers to Datastores

● Actions they usually do:
  ○ Open connection
  ○ Apply actions
    ■ Select, Insert, Update, Delete
  ○ Close connection
● Do you monitor each?
  ○ Hint: Yes!!!! Hell Yes!!!
● Create a wrapper in any programming language and report the metrics
  ○ Count, execution times, errors…
  ○ Infrastructure code that will give great visibility
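
A minimal sketch of such a wrapper in Python, assuming a hypothetical `MetricsRecorder` that buffers StatsD-style lines (`name:value|c` for counters, `name:value|ms` for timers). The class, the `monitored` decorator, and the metric names are illustrative and not from the talk; a real setup would hand these lines to a StatsD client instead of keeping them in memory.

```python
import functools
import time


class MetricsRecorder:
    """Hypothetical in-memory sink that buffers StatsD-formatted lines."""

    def __init__(self):
        self.lines = []

    def incr(self, name, value=1):
        self.lines.append(f"{name}:{value}|c")

    def timing(self, name, millis):
        self.lines.append(f"{name}:{millis:.3f}|ms")


def monitored(recorder, operation):
    """Wrap a datastore call so each invocation reports count, errors, time."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            recorder.incr(f"{operation}.count")
            try:
                return func(*args, **kwargs)
            except Exception:
                # Count the failure, then let the caller see the original error.
                recorder.incr(f"{operation}.errors")
                raise
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                recorder.timing(f"{operation}.time", elapsed_ms)

        return wrapper

    return decorator
```

Decorating the open, select, insert, update, delete, and close calls of a driver with distinct operation names (for example `db.select`) gives one counter, one error counter, and one timer per action, which is the visibility the slide asks for.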

Page 32:

Monitoring Cassandra

Page 33:

Monitoring Cassandra

● OpsCenter - by DataStax

Page 34:

Monitoring Cassandra

● Is that enough?...
  We can also connect it to Graphite (Blog: “Monitoring the hell out of Cassandra”)
● Plug & Play the metrics to Graphite - Internal Cassandra
● Back to the basics: dstat, iostat, iotop, jstack

Page 35:

Monitoring Cassandra

Page 36:

Monitoring Spark

Page 37:

What to monitor in an Apache Spark Cluster

● Application execution

● Resource consumption and allocation

● Task Failures

● Environment and number of servers

● Physical OS metrics

● Infrastructure services

Page 38:

Ways to Monitor Spark

● Grafana-spark-dashboards
  ○ Blog:
● Spark UI - Online, for each running application
● Spark History Server - Offline (after the application finishes)
● Spark REST API
  ○ Querying via inner tools to do ad-hoc monitoring
● Back to the basics: dstat, iostat, iotop, jstack
● Blog post by Tzach Zohar - “Tips from the Trenches”
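
For the REST API option, an ad-hoc check might look like the sketch below. It assumes the job listing endpoint `/api/v1/applications/<app-id>/jobs` and the `status` and `numFailedTasks` fields as described in Spark's monitoring documentation; the base URL and application id are whatever your own cluster exposes, and the helper names are made up for this example.

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_jobs(base_url, app_id):
    """List the jobs of one application from the Spark UI or History Server."""
    return fetch_json(f"{base_url}/api/v1/applications/{app_id}/jobs")


def summarize_jobs(jobs):
    """Reduce a job listing to the few numbers worth graphing or alerting on."""
    summary = {"total": len(jobs), "failed_jobs": 0, "failed_tasks": 0, "by_status": {}}
    for job in jobs:
        status = job.get("status", "UNKNOWN")
        summary["by_status"][status] = summary["by_status"].get(status, 0) + 1
        if status == "FAILED":
            summary["failed_jobs"] += 1
        summary["failed_tasks"] += job.get("numFailedTasks", 0)
    return summary
```

The summary numbers can then be written to Graphite like any other metric, so task failures sit on the same dashboard as the OS metrics.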

Page 39:

Monitoring Your Data

Page 40:

Data Questions? What should we measure?

● Did all of the computation occur?

○ Are there any data layers missing?

● How much data do we have? (Volume)

● Is all of the data in the Database?

● Data Quality Assurance

Page 41:

Data Answers!

● KISS (Keep it simple, stupid)
● Jenkins + Maven (JUnit) to the rescue
● Creating a Maven “monitoring” project
  ○ Running scheduled tasks, each for the relevant data source
    ■ Database data existence
    ■ S3 files existence
    ■ Data flow that keeps on coming from sensors
    ■ (Any other data source that you can imagine…)
  ○ Scheduled tasks that write amount metrics to Graphite -> Dashboards
  ○ Report task execution to Graphite
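
The talk implements these checks as JUnit tests scheduled by Jenkins. The sketch below swaps in plain Python only to show the shape of one scheduled run: each check returns an amount, and the runner emits both the amount and a 1/0 execution status as Graphite plaintext lines. The `monitoring.data` prefix, the check names, and the function name are hypothetical.

```python
import time


def run_data_checks(checks, prefix="monitoring.data", now=None):
    """Run each check once and return Graphite plaintext lines for the results.

    `checks` maps a check name to a zero-argument callable that returns an
    amount (rows found, files present, messages received, ...).
    """
    ts = int(now if now is not None else time.time())
    lines = []
    for name, check in sorted(checks.items()):
        try:
            amount = check()
            succeeded = 1
        except Exception:
            # A failing check is itself a data point, not a reason to stop.
            amount = None
            succeeded = 0
        if amount is not None:
            lines.append(f"{prefix}.{name}.amount {amount} {ts}")
        lines.append(f"{prefix}.{name}.success {succeeded} {ts}")
    return lines
```

Adding a new data source then means adding one entry to `checks`, which keeps the barrier to adding monitoring low.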

Page 42:

Data Answers!

● The method doesn’t really matter, as long as:
  ○ You can follow the results over time
  ○ You know what your data flow is, and know what might fail
  ○ It’s easy for anyone to add more monitoring
    (For the ones that add the new data each time…)
  ○ You don’t trust others to add monitoring
    (It will always end up the DevOps’s “fault” -> No monitoring will be

Page 46:

ELK - Elasticsearch + Logstash + Kibana

● Elastic
● Architecture:
  ○ Indexer (Logstash)
  ○ Storage (Elasticsearch)
  ○ Web UI (Kibana)

Page 47:

ELK - Elasticsearch + Logstash + Kibana

Page 48:

Did someone say “Dashboard”?

Page 49:

Redash

● Open Source:
● Came out as one of many open source tools by
● Created and maintained by Arik Fraimovich (You rock!)
● Written in Python
● Has an on-premise and a hosted solution
● At Panorays we also use it for alerting via its integration with Slack

Page 50:

Redash - Screenshots

Page 53:

Summary - Monitoring Stack


Metrics Collection



Data Monitoring

Log Monitoring

Page 54:

Not my problem...or is it?

Page 55:

So...Who does the monitoring in our company?

Page 56:

● Correlating Application and System metrics!!!!
● Ask the right monitoring questions -> answer with Dashboards
● KISS - simple is key; what’s hard, we tend not to do at all
● Alert about what you can actually react to
  ○ (And to the relevant person)
● Measure whatever you can
  ○ only way to know if you’re improving
● Monitor your business KPIs too
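
One way to read "alert about what you can actually react to (and to the relevant person)" is that every alert rule should carry an owner and an action alongside its threshold. The sketch below illustrates that idea only; the `AlertRule` fields, the metric names, and the `evaluate` function are assumptions and do not describe how Redash, Grafana, or any specific tool models alerts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    metric: str
    threshold: float
    owner: str   # the relevant person or channel
    action: str  # what the owner can actually do about it


def evaluate(rules, latest_values):
    """Return (owner, message) pairs for every rule whose threshold is exceeded."""
    alerts = []
    for rule in rules:
        value = latest_values.get(rule.metric)
        if value is None or value <= rule.threshold:
            continue  # no data or within limits: nothing to react to
        message = f"{rule.metric}={value} exceeds {rule.threshold}: {rule.action}"
        alerts.append((rule.owner, message))
    return alerts
```

A rule that cannot name an owner or an action is a hint that the metric belongs on a dashboard rather than in an alert.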

Page 57:


● If all of what I’ve said is not enough…

Graphs are fricking cool!
