The Hive Think Tank: "Stream Processing Systems" by Karthik Ramasamy of Twitter

Post on 09-Feb-2017

Stream Processing Systems

Karthik Ramasamy, Twitter

@karthikz

Value of Real Time Data: It's contextual

[1] Courtesy Michael Franklin, BIRTE, 2015.

Heron Design: Goals

Batching of tuples: Amortizing the cost of transferring tuples

Task isolation: Ease of debug-ability/isolation/profiling

Fully API compatible with Storm: Directed acyclic graph; Topologies, Spouts and Bolts

Support for back pressure: Topologies should be self-adjusting

Use of mainstream languages: C++, Java and Python

Efficiency: Reduce resource consumption
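The batching and back pressure goals above can be illustrated with a minimal sketch. This is not Heron's implementation or API. The class `BatchingChannel`, its methods, and the parameters `batch_size` and `max_pending_batches` are hypothetical names chosen for illustration. The sketch assumes a single producer and a single consumer in one process: tuples are grouped so that one transfer carries several of them, and the producer is told to stop emitting once the consumer has fallen too far behind.

```python
from collections import deque


class BatchingChannel:
    """Hypothetical bounded channel between a producer (spout-like) and a
    consumer (bolt-like). Illustrative only; not part of Heron or Storm."""

    def __init__(self, batch_size, max_pending_batches):
        self.batch_size = batch_size
        self.max_pending_batches = max_pending_batches
        self._current = []        # batch currently being filled
        self._pending = deque()   # full batches waiting for the consumer
        self.transfers = 0        # number of batch hand-offs so far

    def offer(self, tup):
        """Try to enqueue one tuple.

        Returns False when the consumer is too far behind. That return value
        is the back pressure signal: the producer should pause and retry.
        """
        if len(self._pending) >= self.max_pending_batches:
            return False
        self._current.append(tup)
        if len(self._current) >= self.batch_size:
            # A full batch is handed over as one unit, so the per-transfer
            # cost is shared by batch_size tuples.
            self._pending.append(self._current)
            self._current = []
        return True

    def poll_batch(self):
        """Hand the oldest full batch to the consumer, or None if none is ready."""
        if not self._pending:
            return None
        self.transfers += 1
        return self._pending.popleft()


def run_demo():
    channel = BatchingChannel(batch_size=3, max_pending_batches=2)
    # The producer emits seven tuples while the consumer is idle.
    accepted = [channel.offer(i) for i in range(7)]
    # The consumer drains one batch, which frees room in the channel.
    first_batch = channel.poll_batch()
    # The producer retries the tuple that was rejected earlier.
    retry_accepted = channel.offer(6)
    return accepted, first_batch, retry_accepted, channel.transfers
```

In `run_demo`, the first six tuples fill two batches and are accepted, the seventh is rejected because two batches are already pending, and it is accepted on retry once the consumer has taken a batch. A real system would propagate the pause signal upstream across processes rather than return a boolean.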

Better Storm

Twitter Heron

Container Based Architecture

Separate Monitoring and Scheduling

Simplified Execution Model

Much Better Performance

Heron: Sample Topologies

Heron @ Twitter: Storm is decommissioned

Largest cluster

100's of topologies

Billions of messages

100's of terabytes

Reduced incidents

Good night sleep

3X reduction in resource usage

Auto scaling the system in the presence of unpredictability

Technology Challenges

The Road Ahead

Auto tuning of real time analytics jobs/queries

Exploiting faster networks for efficiently moving data

@karthikz Get in Touch