Tuning your Hadoop Analytics with Isilon Scale-Out NAS · ISILON “SHARE-EVERYTHING” HADOOP 1 Start...

GLOBAL SPONSORS

Tuning your Hadoop Analytics with Isilon Scale-Out NAS
Alexander Graf, Advisory Systems Engineer, Unstructured Data and Analytics



Page 2

AGENDA

• Commonly seen first Hadoop use cases
• Isilon scale-out data lake concept
• Better Hadoop architecture with Isilon
• What about performance?
• How to find the right solution?

Page 3

Common first Hadoop use cases

• Predictive maintenance
• Churn prediction / prevention
• Fraud detection
• Data warehouse offloading

Page 4

© Copyright 2017 Dell Inc.

Isilon Scale-Out NAS: Simplicity and Ease of Use

• Automation: NO manual intervention, NO reconfiguration, NO server or client mount point or application changes, NO data migrations, NO RAID
• Single file system spans all nodes
• Scales linearly to 33 PB

Page 5

ISILON MOMENTUM

• Customers: >8,000
• >17% YoY customer growth
• >2,000 analytics customers
• #1 in Scale-Out NAS, now in All-Flash
• >3.2 exabytes shipped in calendar 2016
• Recognized leader

Page 6

ISILON - THE RECOGNIZED LEADER

Page 7

Isilon Workload Consolidation

Page 8

HADOOP ARCHITECTURE – DAS VS ISILON

[Diagram: on the DAS side, a NameNode and six combined Data Node + Compute Nodes on Ethernet; on the Isilon side, six Compute Nodes on Ethernet, with the name node and data node roles provided by Isilon.]

Page 9

TRADITIONAL “SHARE-NOTHING” HADOOP

[Diagram: an Existing Virtualized Data Center and Existing Primary Storage holding the unstructured data (copy 1), next to a separate SHARE-NOTHING Hadoop Infrastructure whose nodes hold copies 2, 3 and 4.]

• Hadoop on a Stick (R=3) means 5 data copies ($$$$)

• Data has to be copied to the Hadoop cluster before analysis can begin (time to results)

How will you maintain data consistency when a file changes on your primary storage?

Page 10

ISILON “SHARE-EVERYTHING” HADOOP

1 Start using Hadoop NOW with unused processing and RAM available in your VMware environment

• No replication required (use your existing data)
• Access to the same data via NAS and HDFS protocols
• Extremely fast time to results: analysis uses already existing data, with NO COPIES or wasted $$$$
• Analysis can begin with the 1st VM

[Diagram: new Hadoop Compute Nodes in the Existing Virtualized Data Center read the unstructured data on the Existing Primary Storage using the native HDFS protocol.]
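To make "access to the same data via NAS and HDFS protocols" concrete, the sketch below shows how one file carries two names. It assumes, as OneFS does per access zone, that HDFS paths resolve under a configured HDFS root directory; the root `/ifs/data/hadoop`, the host `isilon.example.com` and the helper `hdfs_to_ifs` are hypothetical examples, not defaults or a product API.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical HDFS root directory configured for the Isilon access zone.
HDFS_ROOT = "/ifs/data/hadoop"


def hdfs_to_ifs(uri: str, hdfs_root: str = HDFS_ROOT) -> str:
    """Hypothetical helper: map an hdfs:// URI or bare HDFS path to the OneFS path NAS clients see."""
    path = urlparse(uri).path or "/"
    # Normalize; for an absolute path, normpath drops leading ".." components,
    # so the result cannot climb above the HDFS root.
    norm = posixpath.normpath("/" + path.lstrip("/"))
    if norm == "/":
        return hdfs_root
    return posixpath.join(hdfs_root, norm.lstrip("/"))


if __name__ == "__main__":
    # The same file, named once for a Hadoop job and once for an NFS/SMB client.
    print(hdfs_to_ifs("hdfs://isilon.example.com:8020/user/alice/logs.txt"))
```

Under these assumptions a Hadoop job reading `hdfs://isilon.example.com:8020/user/alice/logs.txt` and an NFS client opening `/ifs/data/hadoop/user/alice/logs.txt` touch the same bytes, which is why no copy step is needed.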

Page 11

TIME-TO-RESULTS

Data Copy Analysis vs In-Place Analysis

Data copy analysis: [Diagram: data is copied from the Existing Primary Storage over the Data Center Network to Hadoop on a Stick.]

Have you ever copied 100 TB from primary storage to a Hadoop system? How long does it take to copy 100 TB from one place to another over a 10 Gb link? >24 hours

In-place analysis: [Diagram: Hadoop Compute Nodes read only the data relevant to the analysis directly from the Existing Primary Storage over the Data Center Network.]
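The ">24 hours" answer can be checked with simple arithmetic: bits to move divided by the usable link rate. A minimal sketch, assuming 100 TB means either decimal terabytes (10^12 bytes) or binary tebibytes (2^40 bytes), and using a hypothetical efficiency factor to stand in for protocol, disk and network overhead:

```python
TB = 10**12            # decimal terabyte, in bytes
TIB = 2**40            # binary tebibyte, in bytes
LINK_10G = 10 * 10**9  # 10 Gb/s, in bits per second


def transfer_hours(data_bytes: float, link_bits_per_s: float, efficiency: float = 1.0) -> float:
    """Hours to move data_bytes over a link running at `efficiency` of its line rate."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = data_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # The efficiency values are illustrative assumptions, not measurements.
    for label, size in [("100 TB", 100 * TB), ("100 TiB", 100 * TIB)]:
        for eff in (1.0, 0.9, 0.7):
            print(f"{label} at {eff:.0%} of 10 Gb/s: {transfer_hours(size, LINK_10G, eff):.1f} h")
```

At full line rate, 100 TB takes about 22.2 hours; at 90% efficiency it takes about 24.7 hours, and 100 TiB at full line rate takes about 24.4 hours. The slide's figure is therefore plausible once any real-world overhead is included, and analysis cannot start until the copy finishes.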

Page 12

Support for Multiple Hadoop Landscapes

[Diagram: several Hadoop clusters with name nodes and MapReduce compute nodes (for example Cloudera and IBM, or even different versions/distros), together with virtual servers, share one DATA LAKE accessed via HDFS, NFS, FTP and SMB.]

Page 13

Increase Utilization to Control Costs

[Diagram: Hadoop 1, Hadoop 2 and HBase workloads sharing one consolidated cluster.]

• Consolidated cluster has access to the entire pool of physical resources
• Take advantage of multi-tenancy to increase utilization during non-peak hours

Source:
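The utilization argument is easiest to see with numbers. The sketch below uses entirely hypothetical demand figures (nodes needed per time slot for three tenants named after the slide's workloads) and compares sizing one cluster per tenant against one consolidated cluster sized for the combined peak. It illustrates the multi-tenancy reasoning only; it does not model Hadoop scheduling.

```python
# Hypothetical node demand per time slot for three tenants (illustrative numbers only).
DEMAND = {
    "hadoop1": [2, 2, 8, 8, 3, 2],
    "hadoop2": [6, 7, 2, 2, 2, 6],
    "hbase": [3, 3, 3, 3, 8, 3],
}


def siloed_nodes(demand: dict[str, list[int]]) -> int:
    # Separate clusters: each tenant must be sized for its own peak.
    return sum(max(series) for series in demand.values())


def consolidated_nodes(demand: dict[str, list[int]]) -> int:
    # One shared cluster: sized for the peak of the combined demand per slot.
    return max(sum(slot) for slot in zip(*demand.values()))


def average_utilization(demand: dict[str, list[int]], nodes: int) -> float:
    # Fraction of provisioned node-slots that actually do work.
    slots = len(next(iter(demand.values())))
    used = sum(sum(series) for series in demand.values())
    return used / (nodes * slots)


if __name__ == "__main__":
    for label, nodes in [("siloed", siloed_nodes(DEMAND)), ("consolidated", consolidated_nodes(DEMAND))]:
        print(f"{label}: {nodes} nodes, {average_utilization(DEMAND, nodes):.0%} utilization")
```

With these made-up peaks falling at different times, the siloed layout needs 23 nodes at about 53% utilization, while the consolidated cluster needs 13 nodes at about 94%, because one tenant's off-peak hours absorb another tenant's peak.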

Page 14

HDFS PERFORMANCE BENCHMARKS

Page 15

DATANODE LOAD BALANCING: INTELLIGENTLY IMPROVE YOUR HADOOP PERFORMANCE

Key Features

• Intelligently directs new HDFS clients to the DataNode with the least load
• Totally transparent to the client; no configuration required

Benefits

• Improves overall performance of Hadoop clients for analytics workloads
• Avoids overloading any specific OneFS node and increases cluster resilience

[Diagram: an HDFS client and OneFS Nodes 1, 2 and 3, each showing its connection count.]

1. HDFS client to NameNode: Where to write?
2. NameNode: Write to Node 2.
3. HDFS client: Good, will write to Node 2.
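The three-step exchange above can be sketched as a least-connections choice. This is a minimal model, assuming that "least load" means the fewest open client connections, as the slide's Connection Count label suggests; the actual OneFS selection logic is not described in the slides and may weigh other factors. `NameNodeSketch` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class NameNodeSketch:
    """Hypothetical model of least-connection DataNode selection (not OneFS code)."""

    # Open HDFS client connections per OneFS node.
    connections: dict[str, int] = field(default_factory=dict)

    def where_to_write(self) -> str:
        # Steps 1-2: a client asks where to write; answer with the least-loaded
        # node. Ties go to the lexically smallest name so results are repeatable.
        node = min(self.connections, key=lambda n: (self.connections[n], n))
        # Step 3: the client connects to that node, raising its count by one.
        self.connections[node] += 1
        return node

    def release(self, node: str) -> None:
        # A client finished; its connection no longer counts as load.
        self.connections[node] -= 1


if __name__ == "__main__":
    nn = NameNodeSketch({"Node 1": 5, "Node 2": 1, "Node 3": 3})
    print([nn.where_to_write() for _ in range(4)])
```

Starting from counts of 5, 1 and 3, the first three clients go to Node 2 and the fourth to Node 3, so new load spreads toward the least busy nodes without any client-side configuration.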

Page 16

HIBENCH – WORDCOUNT TESTS

DAS Results:

Type       Input data size   Duration (s)              Throughput (bytes/s)   Throughput/node (bytes/s)
Tiny       36 KB             24.441                    1478                   295
Large      3 GB              90.349                    36358575               7271715
Huge       32 GB             136.893                   239963008              47992601
Gigantic   328 GB            1429.692                  229763783              45952756

Isilon Results:

Type       Input data size   Duration (s)              Throughput (bytes/s)   Throughput/node (bytes/s)
Tiny       36 KB             23.446 (4.07% faster)     1529                   305
Large      3 GB              62.457 (30.87% faster)    52595796               10519159
Huge       32 GB             101.105 (26.14% faster)   324901473              64980294
Gigantic   328 GB            574.295 (59.83% faster)   571990421              114398084
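The "% faster" figures in the Isilon table are the reduction in run time relative to DAS, (DAS - Isilon) / DAS, and Throughput/node is the total throughput divided by five and truncated to whole bytes, which suggests five compute nodes per run. The snippet below re-derives both from the published numbers; `NODES = 5` is an inference from the tables, not a stated test parameter.

```python
# Rows: (type, DAS duration s, Isilon duration s, DAS bytes/s, Isilon bytes/s), copied from the tables.
RESULTS = [
    ("Tiny", 24.441, 23.446, 1478, 1529),
    ("Large", 90.349, 62.457, 36358575, 52595796),
    ("Huge", 136.893, 101.105, 239963008, 324901473),
    ("Gigantic", 1429.692, 574.295, 229763783, 571990421),
]
NODES = 5  # inferred: Throughput/node == throughput // 5 in both tables


def percent_faster(das_s: float, isilon_s: float) -> float:
    # Run-time reduction relative to DAS, the quantity the slide labels "% Faster".
    return (das_s - isilon_s) / das_s * 100


if __name__ == "__main__":
    for name, das_s, isi_s, das_bps, isi_bps in RESULTS:
        print(f"{name:9} {percent_faster(das_s, isi_s):6.2f}% faster  "
              f"per node: DAS {das_bps // NODES}, Isilon {isi_bps // NODES} bytes/s")
```

Note that "59.83% faster" for the Gigantic run means the job finished in about 40% of the DAS time; expressed as a throughput ratio, the gain is larger than the percentage suggests.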

WordCount counts the occurrences of each word in the input data, which is generated with RandomTextWriter.
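For readers new to the benchmark, the WordCount job follows the classic map, shuffle and reduce pattern. This is a single-process sketch of that pattern, assuming whitespace tokenization; it is not HiBench's or Hadoop's code, and in a real cluster the shuffle is performed by the framework across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str) -> list[tuple[str, int]]:
    # Mapper: emit (word, 1) for every whitespace-separated token.
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by key (done by the framework in Hadoop).
    groups = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    return groups


def reduce_phase(groups) -> dict[str, int]:
    # Reducer: sum the ones emitted for each word.
    return {word: sum(ones) for word, ones in groups.items()}


def word_count(lines):
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog the"]))
```

The map phase is dominated by reading input, which is why storage throughput shows up so clearly in the larger WordCount runs.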

Page 17

Even faster with all-flash Isilon Generation 6

[Chart: capacity vs performance. Series shown: S-Series, X-Series, NL-Series, HD-Series. Generation 6 models: F800, H600, H500, H400, A200, A2000. Figures shown (model pairing not recoverable): 250k ops, 15 GB/s; 117k ops, 12 GB/s; 40k ops, 5 GB/s; 2 GB/s; 120TB-480TB/chassis; 480TB/chassis; 800TB/chassis.]

Page 18

FINDING THE RIGHT SOLUTION

Page 19

HADOOP DECISIONS

DAS

ECS

Page 20

3 TRADITIONAL DISCOVERY QUESTIONS

1. What do you hope to achieve with Hadoop?
2. Why is this impactful to your business?
3. Which Hadoop distribution will you choose?

Page 21

Experienced Partners

Data Thinking
• Consulting: data, algorithms, compute, mindset
• Guiding companies to data leadership and creatorship

Data Science
• Ideation & scoping of use cases
• Data analysis
• Development of machine learning algorithms
• Proofs of concept

Data Engineering
• Architecture design and concepts
• Engineering and deployment
• Testing and test management
• Application management

DataOps
• Managed, hybrid, cloud infrastructures
• DevOps application management
• Hadoop and beyond: solutions at scale
• Security concepts and system design

Page 22

*UM HADOOP-AS-A-SERVICE

1 Hadoop hardware on-prem in the customer data center or off-prem in the UM data center

2 *um provides fully managed platform services, including the Hadoop layer

3 Customer-specific analytics software (Tableau, SAS or others)

[Diagram: compute nodes]

Page 23

Proven solutions for unstructured analytics

Dell EMC Unstructured Analytics Portfolio

• PowerEdge Solution accelerators
• Splunk Ready System
• Hadoop Ready Bundle
• QuickStart for Hadoop
• EDW Optimization Solutions
• Hadoop Backup Solutions
• SAS-Grid Solution with Isilon
• Streaming Analytics Solutions

Page 24

Recap - Better Hadoop with Isilon

• No data loading, better performance => FASTER RESULTS
• Run pilots on existing infrastructure
• Run multiple Hadoop distributions
• Scale storage and compute independently
• Get enterprise storage features: snapshots, DR replicas, compliance
• Get the best possible capacity utilisation: 80%+ of raw capacity
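The 80%+ figure is easiest to appreciate next to the 1/R usable fraction of R=3 replication on DAS. A minimal arithmetic sketch follows. The 1,000 TB raw capacity and the 16+2 stripe are hypothetical examples, and it assumes Isilon's efficiency comes from erasure-coded (N+M) protection rather than full replicas, without modelling OneFS's actual protection policies.

```python
def replication_efficiency(replicas: int) -> float:
    # HDFS-style replication: each block is stored `replicas` times.
    return 1 / replicas


def stripe_efficiency(data_units: int, protection_units: int) -> float:
    # Generic N+M erasure-coding stripe (illustrative, not a specific OneFS policy).
    return data_units / (data_units + protection_units)


def usable_tb(raw_tb: float, efficiency: float) -> float:
    return raw_tb * efficiency


if __name__ == "__main__":
    raw = 1000.0  # hypothetical raw capacity in TB
    print(f"R=3 replication: {usable_tb(raw, replication_efficiency(3)):.0f} TB usable")
    print(f"80% (slide figure): {usable_tb(raw, 0.80):.0f} TB usable")
    print(f"16+2 stripe: {usable_tb(raw, stripe_efficiency(16, 2)):.0f} TB usable")
```

On the same 1,000 TB of raw disk, R=3 leaves about 333 TB usable, the slide's 80% leaves 800 TB, and a 16+2 stripe about 889 TB.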

Page 25

Thank you