Malhar data torrent (Big Data Guru meetup 2014-02-27)

Hadoop’s Most Powerful Platform for Real-Time Stream Computations. Prepared for Big Data Gurus, February 27th, 2014

description

Architects from DataTorrent talk about the Malhar framework for processing streaming events in real time at massive scale.

Transcript of Malhar data torrent (Big Data Guru meetup 2014-02-27)

Page 1: Malhar data torrent (Big Data Guru meetup 2014-02-27)

Hadoop’s Most Powerful Platform for Real-Time Stream Computations

Prepared for Big Data Gurus, February 27th, 2014

Page 2: Malhar data torrent (Big Data Guru meetup 2014-02-27)

[Figure: timeline up to "Now" contrasting data processed by Hadoop (batch) with DataTorrent (real-time). Labels: Databases (HBase, Oracle, …); Ad hoc queries; Standard Reports; latencies marked hrs, hrs, and millisec (seconds to millisec).]

Real-time ETL and Business Insights

Real-time Predictive Analytics

Real-time Business Actions

Real-time Business logic with HA

Real-time Monitoring and Alerting

Big Data – Done NOW

Business Decisions in Less than a Second

Page 3: Malhar data torrent (Big Data Guru meetup 2014-02-27)

DataTorrent Big Data Platform

• Vision: Ubiquitize Real-Time Big Data Computations
– Enterprise quality: Highly Available, Linearly Scalable, Operable and Easy to Use
– Big data dimensional computations in real time with linear scalability
• Real-Time ETL: De-dup, Staging, Cleanup, Transformations, Load …
• Real-Time Computation Apps and Feed Ingestion (Games, Mobile, Set-top Boxes, Devices, …)
– Multi-Feed Sources
– Run business logic in real-time with HA
• Real-Time Monitoring, and Security: Capacity, DDOS, …
• Real-Time Predictive Analytics: Web Analytics, Business Analytics,

Page 4: Malhar data torrent (Big Data Guru meetup 2014-02-27)

DataTorrent in Hadoop Ecosystem

Page 5: Malhar data torrent (Big Data Guru meetup 2014-02-27)

© DataTorrent, 2014 - Confidential

DataTorrent in the Modern Data Architecture

APPLICATIONS

DATA SYSTEMS

SOURCES

RDBMS EDW

Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

HANA

BusinessObjects BI

OPERATIONAL TOOLS

DEV & DATA TOOLS

Existing Sources (CRM, ERP, Clickstream, Logs)

INFRASTRUCTURE

Business Analytics Business Intelligence Tools OLAP Clients

Real-time Stream Analytics

DATA IN MOTION

Page 6: Malhar data torrent (Big Data Guru meetup 2014-02-27)

StrAM (Stream Application Master)

Security

SLA

Scalability

Alerts

Fault Tolerance

Tools

Partitioning

Web Services

Dynamic Modifications

State Snapshot

Malhar – Open Source Operators and Apps Library (Apache v2 License)

DataTorrent Technology Stack

Page 7: Malhar data torrent (Big Data Guru meetup 2014-02-27)


DataTorrent in Hadoop Reference Architecture

DATA IN MOTION

REAL TIME STREAMING APPLICATIONS

SOURCE DATA

MS Queues

Events

Files

Databases

Sensor data

Social

APPLICATIONS

BusinessObjects BI

Query/Visualization/ Reporting/Analytical Tools and Apps

Enterprise Repositories

RDBMS

EDW

NoSQL

Real Time Ingestion

DATA AT REST

BATCH APPLICATIONS

Hive

Pig

HBase

Custom

Message Queue

Data In Motion

YARN

HDFS

YARN

MapReduce

HDFS

OPERATIONAL INTELLIGENCE

BUSINESS ACTIONS

PREDICTIVE ANALYTICS

STREAM ETL

REALTIME ALERTS

Page 8: Malhar data torrent (Big Data Guru meetup 2014-02-27)


Stream Processing

• A Stream is a sequence of data events with a schema

• An Operator takes input streams and computes output streams

• An Application is a Directed Acyclic Graph (DAG) of operators connected by streams

• Computations are in-memory, asynchronous, and distributed

• A Streaming Window is an atomic batch of sequential data events
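The concepts on this slide can be illustrated with a small sketch. This is not the Malhar or DataTorrent API: the class names (`Operator`, `Filter`, `WindowSum`, `Collector`), the helper functions, and the `begin_window` / `process` / `end_window` method names are all hypothetical, chosen only to mirror the vocabulary above. The sketch also runs synchronously in one process, and it delimits windows by event count purely to keep the example deterministic; the slides do not say how windows are delimited.

```python
from typing import Callable, Iterable, Iterator, List


def windows(events: Iterable[int], size: int) -> Iterator[List[int]]:
    """Split a stream into consecutive batches of at most `size` events."""
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, window
        yield batch


class Operator:
    """Hypothetical base operator: consumes events and emits to downstream operators."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.downstream: List["Operator"] = []

    def begin_window(self, window_id: int) -> None:
        pass

    def process(self, event: int) -> None:
        raise NotImplementedError

    def end_window(self) -> None:
        pass

    def emit(self, event: int) -> None:
        for op in self.downstream:
            op.process(event)


class Filter(Operator):
    """Passes through only the events that satisfy the predicate."""

    def __init__(self, name: str, predicate: Callable[[int], bool]) -> None:
        super().__init__(name)
        self.predicate = predicate

    def process(self, event: int) -> None:
        if self.predicate(event):
            self.emit(event)


class WindowSum(Operator):
    """Keeps a running sum and emits it once at the end of each window."""

    def begin_window(self, window_id: int) -> None:
        self.total = 0

    def process(self, event: int) -> None:
        self.total += event

    def end_window(self) -> None:
        self.emit(self.total)


class Collector(Operator):
    """Terminal operator that stores whatever it receives."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.results: List[int] = []

    def process(self, event: int) -> None:
        self.results.append(event)


def run(dag: List[Operator], source: Operator,
        events: Iterable[int], window_size: int) -> None:
    """Drive the DAG one window at a time; `dag` must be in topological order."""
    for window_id, batch in enumerate(windows(events, window_size)):
        for op in dag:
            op.begin_window(window_id)
        for event in batch:
            source.process(event)
        for op in dag:  # upstream operators close their window first
            op.end_window()


def demo() -> List[int]:
    evens = Filter("evens", lambda x: x % 2 == 0)
    total = WindowSum("sum")
    sink = Collector("sink")
    evens.downstream.append(total)  # stream: evens -> sum
    total.downstream.append(sink)   # stream: sum -> sink
    run([evens, total, sink], evens, range(1, 11), window_size=5)
    return sink.results


print(demo())  # [6, 24]
```

With ten events and a window size of five, the first window contributes 2 + 4 and the second 6 + 8 + 10. Calling `end_window` in topological order is what lets an operator flush its per-window result to downstream operators before they close the same window.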

Page 9: Malhar data torrent (Big Data Guru meetup 2014-02-27)

[Figure: DataTorrent on a Hadoop grid. Labels: Resource Manager; NM (×4); StrAM; numbered elements 1 through 6; MapReduce (×6); DT Gateway; dtCLI; DT Console.]

Page 10: Malhar data torrent (Big Data Guru meetup 2014-02-27)

Malhar Open Source Project

• Apache 2.0
• Operators (over 400 operators)
– Algorithms
– Ingestion, ETL
– Input and Output Adapters
• UI Widgets (over 50 widgets)
– Console widgets for stats
– Application widgets for app data
• Application Templates
– LogStream
– MapReduce Debugger
– Shuffle-less MapReduce
• Demo Apps (15 demo apps)

Page 11: Malhar data torrent (Big Data Guru meetup 2014-02-27)

Malhar Open Source Project

• Continuous Integration: Unit tests
• Performance tests for operators
• Daily tests of Demos and Apps for memory usage
• More operators and UI widgets added as per new use cases/user requests
• Fully supported: Documentation, Certification
• Input and Output Adapters
– HBase, MongoDB, CouchDB, Redis, Memcache
– Flume, Kafka, RabbitMQ, ActiveMQ, ZeroMQ
– JDBC, MySQL, DerbyDB, TimesTen
– MQTT, Twitter, RSS, HTTP, WebSockets, Socket
– Logs: Apache, SMTP
– DFS, Local cache (Guava)
• Languages: Java, Python, JavaScript, Script, R

Page 12: Malhar data torrent (Big Data Guru meetup 2014-02-27)

DataTorrent’s Platform Differentiators.

Extreme Scalability

• Automatically scales to changing loads. Massive performance per node. Billions of events/sec.
• Sub-second latency with linear scalability.
• Complex monitoring applications with massive computations.

Mission Critical

• Built-in stateful fault tolerance. 24/7 uptime, highly available.
• Predictive analysis, root cause. Real-time ETL.
• Update your application while it's running! A/B testing (2H 2014).

Hadoop-Native

• Develop faster and implement any business logic with our open-source framework.
• Runs on your existing Apache Hadoop cluster.
• Integrates seamlessly with your existing data flow and monitoring stack.

Page 13: Malhar data torrent (Big Data Guru meetup 2014-02-27)

Live Demonstration

Page 14: Malhar data torrent (Big Data Guru meetup 2014-02-27)

Thank You!

• DataTorrent
• Try Sandbox (https://datatorrent.com)
• Free for
• Startup Program (Contact us for more details)
• Up to 25GB memory usage in production
• Non-production clusters

• Malhar Open Source (Apache 2.0) project
• https://github.com/DataTorrent/Malhar
• [email protected]
• Applications available Jan 2014
• LogStream Application
• Map-Reduce Monitor

DataTorrent Inc.
3200 Patrick Henry, 2nd Fl
Santa Clara, CA 95054

[email protected]

Twitter.com/DataTorrent
Facebook.com/DataTorrent