
GLOBAL SPONSORS

Tuning your Hadoop Analytics with Isilon Scale-Out NAS
Alexander Graf
Advisory Systems Engineer
Unstructured Data and Analytics

AGENDA

• Commonly seen first Hadoop use cases
• Isilon Scale-Out Data Lake concept
• Better Hadoop architecture with Isilon
• What about performance?
• How to find the right solution?

Common first Hadoop use cases

• Predictive maintenance
• Churn prediction / prevention
• Fraud detection
• Data warehouse offloading

© Copyright 2017 Dell Inc.

Isilon Scale Out NAS: Simplicity and Ease of Use

• Automation:
  – NO manual intervention
  – NO reconfiguration
  – NO server or client mount point or application changes
  – NO data migrations
  – NO RAID

Single File System Spans All Nodes

Scales linearly to 33 PB

ISILON MOMENTUM

• >8000 customers
• >17% YoY customer growth
• >2000 analytics customers
• #1 in Scale-Out NAS, now in All-Flash
• >3.2 exabytes shipped in calendar 2016
• Recognized leader

ISILON - THE RECOGNIZED LEADER


Isilon Workload Consolidation

Ethernet

HADOOP ARCHITECTURE – DAS VS ISILON

[Diagram labels, DAS side: NameNode; Data Node + Compute Node; Ethernet]

[Diagram labels, Isilon side: Compute Node (repeated); Ethernet; name node (repeated); data node]

TRADITIONAL “SHARE-NOTHING” HADOOP

[Diagram labels: Existing Virtualized Data Center; SHARE-NOTHING Hadoop Infrastructure; Unstructured Data; Existing Primary Storage; data copies numbered 1 to 4]

• Hadoop on a Stick (R=3) means 5 data copies ($$$$)

• Data has to be copied to the Hadoop cluster before analysis can begin (time to results)

How will you maintain data consistency when a file changes on your primary storage?

Existing Virtualized Data Center


ISILON “SHARE-EVERYTHING” HADOOP

1 Start using Hadoop NOW with unused processing and RAM available in your VMware environment

No replication required (Use your existing data)

Access to same data via NAS and HDFS protocols

Time to results extremely fast using already existing data with NO COPIES or wasted $$$$

Analysis Can Begin with the 1st VM

New Hadoop Compute Nodes

Unstructured Data

Use Native HDFS Protocol

Data Center Network

TIME-TO-RESULTS

[Chart labels: Data Copy; Analysis; In-Place Analysis]


Hadoop on a Stick

Have you ever copied 100TB from Primary Storage to a Hadoop system?

How long does it take to copy 100TB from one place to another over a 10Gb link?

>24 Hours
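The ">24 Hours" figure can be sanity-checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 10Gb = 10^10 bits per second); the `efficiency` parameter and its 0.8 value are illustrative assumptions for protocol and storage overhead, not numbers from the slides.

```python
def transfer_hours(size_tb: float, link_gbit_s: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_tb terabytes over a link of link_gbit_s gigabits per second."""
    size_bits = size_tb * 1e12 * 8                      # decimal terabytes -> bits
    usable_bits_per_s = link_gbit_s * 1e9 * efficiency  # share of line rate actually achieved
    return size_bits / usable_bits_per_s / 3600         # seconds -> hours


line_rate_hours = transfer_hours(100, 10)                  # theoretical best case
realistic_hours = transfer_hours(100, 10, efficiency=0.8)  # 0.8 is an assumed figure

print(f"{line_rate_hours:.1f} h at line rate, {realistic_hours:.1f} h at 80% efficiency")
```

Even at full line rate the copy takes roughly 22 hours, so any realistic overhead pushes it past a full day.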

Data Center Network


Hadoop Compute Nodes

Reading relevant data for analysis

Virtual Servers, HDFS, NFS, FTP, SMB

Support for Multiple Hadoop Landscapes (or even different versions/distros)

[Diagram labels: name node (repeated); data node; MapReduce (repeated)]

DATA LAKE

Cloudera IBM

Increase Utilization to Control Costs

Hadoop 1

Hadoop 2

HBase

• Consolidated cluster has access to entire pool of physical resources
• Take advantage of multi-tenancy to increase utilization during non-peak hours

Source:

HDFS PERFORMANCE BENCHMARKS

DATANODE LOAD BALANCING: INTELLIGENTLY IMPROVE YOUR HADOOP PERFORMANCE

Key Features

• Intelligently provides the datanode with the least load to new HDFS clients
• Totally transparent to client, no configuration required

Benefits

• Improves overall performance of Hadoop clients for analytics workloads
• Avoids overloading any specific OneFS node and increases cluster resilience

Node 1 Node 2

HDFS Client

Node 3

1. Namenode: Where to write?

2. Write to Node 2.

3. Good, will write to Node 2.

Connection Count
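The three-step exchange above can be illustrated with a least-connections picker. This is a minimal sketch, not the OneFS implementation: it assumes the connection count is the only load metric (as the diagram label suggests), and the `Node` class, `pick_datanode` function, and the connection numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    connections: int = 0  # currently open HDFS client connections (illustrative)


def pick_datanode(nodes: list[Node]) -> Node:
    """Return the node with the fewest connections and count the new client on it."""
    target = min(nodes, key=lambda n: n.connections)
    target.connections += 1
    return target


# Made-up connection counts for the three nodes in the diagram.
cluster = [Node("Node 1", 12), Node("Node 2", 3), Node("Node 3", 9)]

chosen = pick_datanode(cluster)  # namenode answers "where to write?"
print(f"Write to {chosen.name}")
```

Because the choice is made on the namenode side, the client needs no configuration change: it simply writes to whichever node it is told to use.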

HIBENCH – WORDCOUNT TESTS

DAS Results:

Type      Input data size  Duration (s)  Throughput (bytes/s)  Throughput/node (bytes/s)
Tiny      36 KB            24.441        1478                  295
Large     3 GB             90.349        36358575              7271715
Huge      32 GB            136.893       239963008             47992601
Gigantic  328 GB           1429.692      229763783             45952756

Isilon Results:

Type      Input data size  Duration (s)             Throughput (bytes/s)  Throughput/node (bytes/s)
Tiny      36 KB            23.446 (4.07% faster)    1529                  305
Large     3 GB             62.457 (30.87% faster)   52595796              10519159
Huge      32 GB            101.105 (26.14% faster)  324901473             64980294
Gigantic  328 GB           574.295 (59.83% faster)  571990421             114398084

WordCount counts the occurrences of each word in the input data, which is generated using RandomTextWriter.
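The "% faster" figures in the Isilon results appear to be the relative reduction in run time against the DAS run of the same size. The sketch below recomputes them from the Duration(s) values quoted in the two result sets; the function and variable names are illustrative.

```python
# Durations in seconds, copied from the DAS and Isilon WordCount results.
das_seconds = {"Tiny": 24.441, "Large": 90.349, "Huge": 136.893, "Gigantic": 1429.692}
isilon_seconds = {"Tiny": 23.446, "Large": 62.457, "Huge": 101.105, "Gigantic": 574.295}


def percent_faster(baseline_s: float, new_s: float) -> float:
    """Reduction in run time relative to the baseline, in percent."""
    return round((baseline_s - new_s) / baseline_s * 100, 2)


speedups = {
    size: percent_faster(das_seconds[size], isilon_seconds[size])
    for size in das_seconds
}

for size, pct in speedups.items():
    print(f"{size}: {pct}% faster")
```

Note that this metric is a time reduction, not a throughput ratio: the Gigantic run finishes in about 40% of the DAS time, which corresponds to roughly 2.5 times the throughput.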

Even faster with All-Flash: Isilon Generation 6

[Chart: Performance vs. Capacity. Series shown: S-Series, X-Series, NL-Series, HD-Series. Generation 6 models:]

• F800: 250k ops, 15GB/s
• H600: 117k ops, 12GB/s
• H500: 40k ops, 5GB/s
• H400: 2GB/s, 480TB/chassis
• A200: 120TB-480TB/chassis
• A2000: 800TB/chassis

FINDING THE RIGHT SOLUTION

HADOOP DECISIONS

DAS

ECS

3 TRADITIONAL DISCOVERY QUESTIONS

1 What do you hope to achieve with Hadoop?

2 Why is this impactful to your business?

3 Which Hadoop distribution will you choose?

Experienced Partners

Data Thinking
• Consulting: Data, Algorithms, Compute, Mindset
• Guiding companies to data leadership and creatorship

Data Science
• Ideation & scoping of use cases
• Data analysis
• Development of machine learning algorithms
• Proofs of concept

Data Engineering
• Architecture design and concepts
• Engineering and deployment
• Testing and test management
• Application management

DataOps
• Managed, hybrid, cloud infrastructures
• DevOps application management
• Hadoop and beyond: solutions at scale
• Security concepts and system design

*UM HADOOP-AS-A-SERVICE

1 Hadoop hardware on-prem at the customer datacenter or off-prem at the UM datacenter

2 *um provides fully managed platform services including the Hadoop layer

3 Customer-specific analytics software (Tableau, SAS or others)

managed by

Compute nodes

Proven solutions for unstructured analytics

Dell EMC Unstructured Analytics Portfolio

• PowerEdge Solution accelerators
• Splunk Ready System
• Hadoop Ready Bundle
• QuickStart for Hadoop
• EDW Optimization Solutions
• Hadoop Backup Solutions
• SAS-Grid Solution with Isilon
• Streaming Analytics Solutions

Recap - Better Hadoop with Isilon

• No data-loading, better performance => FASTER RESULTS
• Run pilots on existing infrastructure
• Run multiple Hadoop distributions
• Scale storage and compute independently
• Get enterprise storage features
  – Snapshots, DR-Replicas, Compliance
• Get best possible capacity utilisation: 80%+ of raw capacity
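To put the capacity utilisation figure in context, the sketch below compares usable capacity under HDFS-style three-way replication (R=3, as in the share-nothing slide) with the 80% figure quoted in the recap. It is a simplification under stated assumptions: the 1000 TB raw size is made up, and filesystem overhead, scratch space, and headroom are ignored.

```python
def usable_tb(raw_tb: float, efficiency: float) -> float:
    """Usable capacity given raw capacity and the fraction available for unique data."""
    return raw_tb * efficiency


raw_tb = 1000.0  # illustrative raw capacity, not a figure from the slides

replicated_tb = usable_tb(raw_tb, 1 / 3)  # R=3: every block is stored three times
isilon_tb = usable_tb(raw_tb, 0.80)       # 80% of raw, as quoted in the recap

print(f"R=3 replication: {replicated_tb:.0f} TB usable")
print(f"80% utilisation: {isilon_tb:.0f} TB usable")
print(f"Ratio: {isilon_tb / replicated_tb:.1f}x")
```

Under these assumptions the same raw capacity holds about 2.4 times as much unique data.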

Thank you
