How to Pick the Right Big Data Framework

| Analytics and Mobility How to Pick the Right Big Data Framework Ajay Rajgure Senior Consultant, MicroStrategy Professional Services

Transcript of How to Pick the Right Big Data Framework

Page 1: How to Pick the Right Big Data Framework

| Analytics and Mobility

How to Pick the Right Big Data Framework
Ajay Rajgure
Senior Consultant, MicroStrategy Professional Services

Page 2: How to Pick the Right Big Data Framework

| Analytics and Mobility MicroStrategy Confidential. Copyright © 2019 MicroStrategy Incorporated. All Rights Reserved.

Introduction

Ajay Rajgure
Senior Consultant, MicroStrategy Professional Services
Greater New York City Area

Page 3: How to Pick the Right Big Data Framework

Learning Objectives

• Gain a basic understanding of Big Data technology

• Know the commonly used Big Data frameworks

• Compare and pick the right Big Data framework for your business

• Know the MicroStrategy Big Data Connectors

Page 4: How to Pick the Right Big Data Framework

Agenda

• What is Big Data?

• Apache Hadoop

• Apache Spark

• Apache Flink

• Big Data framework comparisons

• MicroStrategy Big Data Connectors

Page 5: How to Pick the Right Big Data Framework

What is Big Data?

Data which is big in terms of:

• Volume
• Velocity
• Variety

Such data is difficult to manage and process using an RDBMS or other traditional technologies.

Big Data solutions provide the tools, methodologies, and technologies that are used to capture, store, search and analyze the data in seconds to find relationships and insights for innovation and competitive gain that were previously unavailable.

Examples of Big Data:

• Number of credit card transactions per second
• Data generated through stock market trades every day
• Data generated by users on social media sites

Page 6: How to Pick the Right Big Data Framework

Big Data Technologies

Operational Big Data

• NoSQL databases like MongoDB

• Provide operational capabilities for real-time, interactive workloads where data is primarily captured and stored

Analytical Big Data

• Massively Parallel Processing (MPP) database systems and MapReduce

• Provide analytical capabilities for retrospective and complex analysis that may touch most or all of the data

Page 7: How to Pick the Right Big Data Framework

Apache Hadoop
Big Data framework

Apache Hadoop is an open-source, scalable, and fault-tolerant framework written in Java.

Apache Hadoop provides a software framework for distributed storage and processing of big data using the MapReduce programming model.

Hadoop consists of three key parts:

• Hadoop Distributed File System (HDFS): the storage layer of Hadoop
• MapReduce: the batch data processing layer of Hadoop
• YARN: the resource management layer of Hadoop

[Figure: MapReduce data flow. Input Data is split into chunk-1 through chunk-4, each handled by a Map task; the Map results feed the Reduce tasks (Compute), which produce the Output.]
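The Map and Reduce steps described above can be sketched in a few lines of plain Python. This is a single-process illustration of the programming model only, not Hadoop's Java API; the function names and the four sample chunks are hypothetical and chosen to mirror the four chunks in the slide.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group the emitted values by key so each reduce call sees one key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def word_count(chunks):
    mapped = [map_phase(chunk) for chunk in chunks]  # one Map task per chunk
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, already split into four chunks.
chunks = ["big data big", "data framework", "big framework", "data"]
counts = word_count(chunks)
print(counts)
```

In a real cluster the Map tasks run in parallel on the nodes that hold each chunk, and the grouping step moves data across the network; here everything runs sequentially in one process.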

Page 8: How to Pick the Right Big Data Framework

Apache Hadoop Ecosystem
Big Data framework

Hive: SQL engine whose queries are converted to MapReduce jobs that run against HDFS.

Pig: Higher-level scripting language that allows you to create MapReduce programs to run against HDFS. Slightly analogous to Oracle PL/SQL.

Impala: A low-latency SQL engine that bypasses MapReduce.

YARN: Yet Another Resource Negotiator.

Page 9: How to Pick the Right Big Data Framework

Apache Spark
Big Data framework

• Spark extends the MapReduce model to efficiently support more types of computation, including interactive queries and stream processing. It can run on a Hadoop cluster (using YARN and HDFS) but has its own execution engine rather than being built on Hadoop MapReduce

• The main feature of Spark is its in-memory cluster computing that increases the processing speed of an application

• High-level APIs in Java, Scala, Python, and R

• Medium Latency and High Throughput

Spark stack:

• Spark Core API (Java | Scala | Python | R)
• Spark SQL
• Spark Streaming
• MLlib (Machine Learning)
• GraphX
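Why in-memory computing helps can be shown with a toy model of an iterative job. The sketch below is plain Python, not Spark code: `Storage`, `refine`, and both loop functions are hypothetical names. It only counts how often the input has to be read back from storage when each pass reloads it, compared with loading it once and keeping it in memory.

```python
class Storage:
    """Stands in for a distributed file system; counts how often it is read."""

    def __init__(self, records):
        self._records = list(records)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._records)


def refine(estimate, data):
    """One iteration: move the estimate halfway toward the mean of the data."""
    mean = sum(data) / len(data)
    return estimate + 0.5 * (mean - estimate)


def iterate_from_storage(storage, iterations):
    """Every pass re-reads its input, as a chain of separate batch jobs would."""
    estimate = 0.0
    for _ in range(iterations):
        estimate = refine(estimate, storage.load())
    return estimate


def iterate_in_memory(storage, iterations):
    """The input is read once and reused from memory on every pass."""
    data = storage.load()
    estimate = 0.0
    for _ in range(iterations):
        estimate = refine(estimate, data)
    return estimate


disk_style = Storage([2, 4, 6])
memory_style = Storage([2, 4, 6])
result_a = iterate_from_storage(disk_style, iterations=5)
result_b = iterate_in_memory(memory_style, iterations=5)
print(result_a, disk_style.reads)    # same result, one read per iteration
print(result_b, memory_style.reads)  # same result, a single read
```

Both loops compute the same answer; the difference is the number of storage reads, which is where iterative workloads such as machine learning tend to spend their time on a disk-based batch engine.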

Page 10: How to Pick the Right Big Data Framework

Apache Flink
Big Data framework

• Flink is an alternative to MapReduce; it is claimed to process data more than 100 times faster than MapReduce

• It is independent of Hadoop, but it can use HDFS to read, write, store, and process data

• Flink does not provide its own data storage system. It reads data from distributed storage

• Flink is well suited to real-time analytics

• It has a true streaming model and does not process the input data stream as micro-batches

• Flink's programming model provides high throughput and low latency

• High-level APIs in Java and Scala
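The difference between a true streaming model and micro-batching can be illustrated with a simple latency calculation. This is a simplified sketch in plain Python, not Flink or Spark code: the function names, arrival times, batch interval, and processing time are all hypothetical, and queueing and scheduling overhead are ignored.

```python
def per_event_latencies(arrival_times_ms, processing_ms):
    """True streaming: each event is processed as soon as it arrives."""
    return [processing_ms for _ in arrival_times_ms]


def micro_batch_latencies(arrival_times_ms, batch_interval_ms, processing_ms):
    """Micro-batching: an event waits until its batch interval closes."""
    latencies = []
    for arrival in arrival_times_ms:
        # End of the interval that contains this arrival time.
        batch_end = (arrival // batch_interval_ms + 1) * batch_interval_ms
        latencies.append(batch_end - arrival + processing_ms)
    return latencies


# Hypothetical event arrival times in milliseconds.
arrivals = [0, 250, 500, 900, 1000]
streaming = per_event_latencies(arrivals, processing_ms=10)
batched = micro_batch_latencies(arrivals, batch_interval_ms=1000, processing_ms=10)
print(streaming)
print(batched)
```

Under these assumptions every micro-batched event also pays the wait until its batch closes, so its latency depends on where in the interval it arrived, while the per-event model pays only the processing time.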

Page 11: How to Pick the Right Big Data Framework

Flink Architecture
Big Data framework

APIs and Libraries

• Table: It enables users to perform ad-hoc analysis using a SQL-like expression language for relational stream and batch processing
• Gelly: It is the graph processing engine which allows users to run a set of operations to create, transform, and process the graph
• FlinkML: It is the machine learning library which provides intuitive APIs and efficient algorithms to handle machine learning applications
• DataSet API: Batch processing
• DataStream API: Stream processing

Kernel

• Runtime: Distributed streaming dataflow

Deploy

• Local: Single JVM
• Cluster: Standalone, YARN, Mesos, Tez
• Cloud: Google GCE, Amazon EC2

Storage / Streaming

• Files: Local-FS, HDFS, S3, …
• Databases: MongoDB, HBase, SQL, …
• Streams: RabbitMQ, Kafka, Flume, …

Page 12: How to Pick the Right Big Data Framework

Big Data Processing Tools Evolution

Hadoop

Pros:

• Scalability: Easily scale to thousands of inexpensive servers with just a single Hadoop cluster
• Highly effective and quick data processing: Hadoop can process petabytes of data in a matter of hours and terabytes of data in minutes
• Cost-effective
• Highly robust: If a failure occurs in the cluster, the data is automatically passed on to another location

Limitations:

• Difficult to program in native MapReduce
• Low performance on small data sets
• Many kinds of algorithms are not supported, such as iterative machine learning and event-based processing

Spark

Pros:

• Speed: Spark's in-memory engine can process huge amounts of data in less time, giving near real-time results
• Support for multiple languages: Spark provides built-in APIs in Java, Scala, or Python
• Advanced analytics: Event-based streaming, machine learning (ML), and graph algorithms

Limitations:

• Spark is not a true streaming model, as it processes data in micro-batches

Flink

Pros:

• Speed: Can process huge amounts of data in less time, even faster than Spark, because of its true stream processing model
• It is a large-scale data processing framework which can process data generated at very high velocity
• Plus everything that is available in Spark

Use Cases:

• Credit card fraud detection and prevention
• Real-time alerts in healthcare
• Network attack prevention
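The comparison above can be condensed into a small rule-of-thumb helper. This is only a sketch of the pros and limitations listed on this slide, not an official selection method; the `Workload` fields and the `suggest_framework` function are hypothetical names, and a real decision would also weigh team skills, existing infrastructure, and cost.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Per-event, low-latency processing (for example fraud detection).
    needs_true_streaming: bool = False
    # Results within a micro-batch interval are acceptable.
    needs_near_real_time: bool = False
    # Iterative algorithms such as machine learning.
    iterative_or_ml: bool = False


def suggest_framework(workload):
    """Map workload traits to the framework the comparison favours."""
    if workload.needs_true_streaming:
        return "Apache Flink"
    if workload.needs_near_real_time or workload.iterative_or_ml:
        return "Apache Spark"
    # Large batch jobs with no latency requirement.
    return "Apache Hadoop"


fraud_detection = Workload(needs_true_streaming=True)
model_training = Workload(iterative_or_ml=True)
nightly_report = Workload()

print(suggest_framework(fraud_detection))
print(suggest_framework(model_training))
print(suggest_framework(nightly_report))
```

The order of the checks reflects the slide: true streaming is the one requirement only Flink is listed as meeting, so it is tested first.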

Page 13: How to Pick the Right Big Data Framework

MicroStrategy Big Data Connectors

Page 14: How to Pick the Right Big Data Framework

MicroStrategy Big Data Connectors

Page 15: How to Pick the Right Big Data Framework

Demo: Connecting to Big Data From MicroStrategy

Page 16: How to Pick the Right Big Data Framework

Thank You
Q&A