
Can Enterprise Storage Fix Hadoop?

John Webster, Senior Partner, Evaluator Group


2 2014 Data Storage Innovation Conference. © Insert Your Company Name. All Rights Reserved.

Agenda

- What is the Internet Data Center and how is it different from Enterprise Data Center?
- How is the Apache Software Foundation (ASF) addressing the issues?
- What needs fixing from the perspective of Enterprise Storage vendors and the Enterprise Storage world?
- What are the proposed fixes?
- Can Hadoop fix Enterprise Storage?
- Can the Internet Data Center/Enterprise Data Center Chasm be Crossed?

FYI: I will use vendor names and products as examples only—no explicit or implied endorsements

4/13/2014 2


The Data Center Chasm


Internet Data Center

Enterprise Data Center


Defining the Data Center Chasm

Internet Data Center:
- Embraces open source
- Automates IT
- Comfortable with systems that run in “failure mode”
- “Cheap and deep”: hardware inefficiency is not an obvious issue
- More willing to build its own systems and self-support
- Manages storage (often JBOD) from a systems perspective

Enterprise Data Center:
- Prefers proprietary but is learning open source
- Approaches IT automation conservatively
- Doesn’t get “failure mode”
- Hardware efficiency-conscious
- More willing to buy from proprietary vendors and deal with them for support
- Sees value in the storage environment as a place for data and storage management


What has the ASF Fixed in HDFS?

- NameNode SPOF: NameNode active/standby failover support
- Snapshot: read-only Copy on Write (COW), included in latest v2 Beta (2.1.0)
- NFS support: support for NFSv3 in latest v2 Beta (2.1.0)
- DR support: Distributed Copy (distcp)
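
The snapshot and distcp features above are driven from the Hadoop command line. As a minimal sketch, the helpers below only build argument lists for `hdfs dfsadmin -allowSnapshot`, `hdfs dfs -createSnapshot`, and `hadoop distcp -update`; nothing is executed. The function names are illustrative and not part of Hadoop, and the paths, snapshot name, and NameNode addresses are hypothetical.

```python
from typing import List


def snapshot_commands(path: str, name: str) -> List[List[str]]:
    """Return the two commands needed to snapshot an HDFS directory."""
    return [
        # An administrator must first mark the directory as snapshottable.
        ["hdfs", "dfsadmin", "-allowSnapshot", path],
        # Then a named, read-only snapshot can be created.
        ["hdfs", "dfs", "-createSnapshot", path, name],
    ]


def distcp_command(source: str, target: str) -> List[str]:
    """Return a distcp command that copies source to target.

    -update copies only files that are missing or differ at the target.
    """
    return ["hadoop", "distcp", "-update", source, target]


if __name__ == "__main__":
    # Hypothetical paths and NameNode addresses, for illustration only.
    for cmd in snapshot_commands("/data/events", "nightly"):
        print(" ".join(cmd))
    print(" ".join(distcp_command(
        "hdfs://primary-nn:8020/data/events",
        "hdfs://dr-nn:8020/data/events",
    )))
```

Building the argument lists separately from running them keeps the sketch safe to try on a machine without a Hadoop installation; passing a list to `subprocess.run` would be one way to execute them on a real cluster.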


What Needs to be Fixed — the Enterprise Storage Vendor Perspective

- The Hadoop NameNode is a single point of failure in v1; v2 (Beta) offers manual failover
- The JobTracker is also a single point of failure
- For data integrity and protection, HDFS creates three full clone copies of data
  - 3x the storage for each file: slow and inefficient
  - If all three copies are corrupted, you’re still hosed (reload and start over)
- 60% of enterprise Hadoop projects fail or are put on hold
- Steep learning curve: six months is not uncommon for those that actually go from pilot to production
- No storage tiering
- Limited (if any) ways to respond to corporate security and data governance policies
- Difficult to move between cloud and data center
- Fundamentally a batch process
- Data in/out processes can take longer than the actual query process
- Inability to disaggregate storage from compute so that the two can be scaled independently
- Dearth of applications built on top
- Dearth of people available in the job market to run this beast, and the ones that can go for big bucks
- ...and more, leading some analysts to believe that Big Data has entered the “trough of disillusionment”
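
The "3x the storage" point is simple arithmetic, sketched below under the assumption that every block is stored as a full copy and nothing else (no compression, no temporary space) consumes disk. The 100 TB data set and the function names are hypothetical; a copy count of 2 is included only for comparison.

```python
def raw_capacity_tb(logical_tb: float, replication: int = 3) -> float:
    """Raw disk capacity consumed when every block is stored `replication` times."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return logical_tb * replication


def usable_fraction(replication: int = 3) -> float:
    """Fraction of raw capacity that holds unique data."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return 1 / replication


if __name__ == "__main__":
    logical = 100.0  # hypothetical data set size in TB
    for copies in (3, 2):
        raw = raw_capacity_tb(logical, copies)
        print(f"{copies} copies: {raw:.0f} TB raw, "
              f"{usable_fraction(copies):.0%} of raw capacity is unique data")
```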


Storage in Shared Nothing

[Figure: CONTROL node and NODE 1 through NODE n, each with its own DAS, attached to a network switch]

- Network layer: 1-10 Gb Ethernet
- Compute layer: commodity servers
- Storage layer: 6-12 disks in each server, typically JBOD
- Scale to thousands of nodes
- Only the Ethernet network is shared
- In Hadoop, Control = NameNode; Node 1, 2, … = DataNode
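
To make the division of labor concrete, here is a simplified sketch of the block map a control node keeps in a shared-nothing cluster: a file is split into fixed-size blocks and each block is assigned to three distinct data nodes. The 128 MB block size and the round-robin assignment are assumptions for illustration; real HDFS placement is rack-aware and the block size is configurable. The function and node names are hypothetical.

```python
import math
from itertools import cycle, islice
from typing import Dict, List

BLOCK_SIZE_MB = 128  # assumed block size; configurable in HDFS
REPLICATION = 3      # three full copies of every block


def place_blocks(file_size_mb: int, data_nodes: List[str]) -> Dict[int, List[str]]:
    """Map each block index to the data nodes holding its replicas.

    Replicas are assigned round-robin, which is a simplification.
    """
    if len(data_nodes) < REPLICATION:
        raise ValueError("need at least as many data nodes as replicas")
    n_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    block_map: Dict[int, List[str]] = {}
    for block in range(n_blocks):
        start = block % len(data_nodes)
        # Take REPLICATION consecutive nodes, wrapping around the list.
        block_map[block] = list(
            islice(cycle(data_nodes), start, start + REPLICATION)
        )
    return block_map


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    for block, replicas in place_blocks(300, nodes).items():
        print(f"block {block}: {replicas}")
```

Only the map lives on the control node; the block contents sit on the DAS of the data nodes, which is why the network is the only shared component.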


Hadoop External Storage – EMC Isilon Example

- Shared storage replaces node-level DAS
- HDFS implemented as an “over the wire” protocol on OneFS
- Isilon cluster nodes emulate NameNodes and DataNodes
- NameNode SPOF eliminated
- Decoupled storage and compute layers
- Data services, data protection, and DR by OneFS
- Analytics on data in place, i.e. minimal if any data movement
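
The practical effect of decoupling storage from compute can be sketched with a small sizing calculation. When each node bundles a fixed amount of disk and CPU, the cluster must be sized for whichever resource runs out first; when the layers are separate, each is sized on its own. All figures and function names below are hypothetical and do not model any particular product.

```python
import math
from typing import Tuple


def coupled_nodes(storage_tb: float, cores: int,
                  tb_per_node: float, cores_per_node: int) -> int:
    """Nodes needed when each node bundles fixed storage and compute."""
    return max(math.ceil(storage_tb / tb_per_node),
               math.ceil(cores / cores_per_node))


def decoupled_nodes(storage_tb: float, cores: int,
                    tb_per_storage_node: float,
                    cores_per_compute_node: int) -> Tuple[int, int]:
    """(storage nodes, compute nodes) when the two layers scale independently."""
    return (math.ceil(storage_tb / tb_per_storage_node),
            math.ceil(cores / cores_per_compute_node))


if __name__ == "__main__":
    # Hypothetical workload: storage-heavy, modest compute.
    need_tb, need_cores = 480, 160
    n = coupled_nodes(need_tb, need_cores, tb_per_node=24, cores_per_node=16)
    print(f"coupled: {n} nodes, {n * 16} cores purchased for {need_cores} needed")
    s, c = decoupled_nodes(need_tb, need_cores, 24, 16)
    print(f"decoupled: {s} storage nodes, {c} compute nodes")
```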


Hadoop External Storage – NetApp Example

- Preserves shared nothing architecture and HDFS
- Decouples compute and storage
- Hardware RAID: reduction in copies from 3 to 2
- NameNode metadata in a separate array for faster NameNode recovery
- DataNode drive failures do not “blacklist” the DataNode
- Applies built-in enterprise data and storage management functionality to Hadoop data

[Figure: Hadoop MapReduce on NetApp storage. NameNode and Secondary NameNode: metadata store on FAS Series (Data ONTAP®) over NFS. DataNodes/TaskTrackers: data stores on E-Series, 4 separate shared-nothing partitions per chassis, 6Gb SAS direct connect. 10GbE network links.]

Source: NetApp


Shared Storage as Secondary Storage

[Figure: CONTROL node and NODE 1 through NODE n attached to a network switch, with a secondary storage layer below the primary storage layer]

- Network layer
- Compute layer
- Primary storage layer
- Secondary storage layer (SAN/NAS): data mirrored or migrated from primary to secondary storage; storage services also live here


Progression of Hadoop @ Yahoo!


Source: Yahoo!


Can Hadoop Fix Enterprise Storage?


Modern Enterprise Storage Issues
- Inflexible and “non-elastic”
- Siloed
- Proprietary
- Opposite of “Cheap and deep”
- Bound to three-year product life cycles
- Developed for the traditional enterprise data center environment
- Doesn’t offer performance at scale and low cost and all at the same time


Is Hadoop a new Storage Platform?

No
- It’s a distributed computing platform for analytics

Yes
- HDFS: embedded, distributed file system (like scale-out NAS)
- Data protection and management built-in (like Enterprise Storage)
- Storage performance at scale and low cost and with native intelligence and all at the same time
- Growing use case as data repository for existing enterprise BI and Data Warehousing apps: the “Data Lake”


What Does the Enterprise Want from Big Data?

“If we could harness all of our data, we would be a much stronger business.”*

* From a CompTIA survey in which two thirds of respondents either agreed or strongly agreed with the statement


Can the Chasm Be Crossed?


Internet Data Center

Enterprise Data Center


Is a new computing paradigm lurking behind the Hadoop hype?


Summary

- Hadoop is crossing the chasm
- A more pragmatic approach to integrating shared storage with Hadoop is emerging
- The Hadoop Holy Grail: operational (transactional) processing with real-time analytics