Post on 17-Oct-2015
1 Copyright 2013 EMC Corporation. All rights reserved.
Taking Command of Big Data: Analytics and Storage Solutions for High Impact Business Insight
Ryan Peterson, Director, Solutions Architecture, Isilon Storage Division
Roadmap Information Disclaimer
EMC makes no representation and undertakes no obligations with regard to product planning information, anticipated product characteristics, performance specifications, or anticipated release dates (collectively, Roadmap Information).
Roadmap Information is provided by EMC as an accommodation to the recipient solely for purposes of discussion and without intending to be bound thereby.
Roadmap information is EMC Restricted Confidential and is provided under the terms, conditions and restrictions defined in the EMC Non-Disclosure Agreement in place with your organization.
Agenda
Quick Review of Isilon Key Features
Quick Review of Hadoop
Lessons Learned: Common Misconceptions
Hadoop Technology Review
Hadoop Technology Challenges
Lessons Learned: Seeing Hadoop Differently
Case Study Example
Resources
The Unstructured Data Challenge
[Chart: exabytes (axis 0 to 90) by year, 2009 to 2014. File based: 61.8% CAGR. Block based: 23.7% CAGR.]
By 2013, 80% of all storage capacity sold will be for unstructured data.
Source: Scale Out Storage in the Content Driven Enterprise: Unleashing the Value of Information Assets, IDC White Paper
EMC Isilon Scale-Out NAS
Single file system, single volume, global namespace for simplicity and ease of use
Scales to over 20 PB
Stripes data across all nodes for high resiliency and up to N+4 data protection
Robust data backup and disaster recovery options
Unmatched efficiency with > 80% storage utilization and automated storage tiering
World's fastest NAS with over 100 GB/s throughput, 1.6M SPECsfs ops
Integrated support for industry-standard protocols including NFS, SMB, HTTP, FTP, and HDFS for operational flexibility
Native HDFS and HDFS 2.0 support
Hadoop: Finding Your Gold Nugget of Data
Hadoop
Created 6+ years ago
Software platform designed to analyze massive amounts of unstructured data
Two core components: Hadoop Distributed File System (HDFS) for storage and MapReduce for compute
Now a top-level Apache project backed by a large open source development community
Hadoop Lessons Learned: Common Misperceptions
Hadoop is a complete solution
Hadoop is a share-nothing architecture
Hadoop is a mainstream technology
Hadoop is only for Data Scientists
Hadoop is only good with DAS
HDFS is a robust file system
Hadoop is an Engineering Exercise
Isilon HDFS interface
Isilon supports the HDFS interfaces for the NameNode and DataNode to host metadata and data
The underlying file system is OneFS
As simple as pointing the HDFS clients to the DNS name of the Isilon cluster!
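As a rough illustration of "pointing the HDFS clients to the DNS name", the sketch below generates a minimal core-site.xml whose default file system is an HDFS URI. The host name isilon.example.com and port 8020 are assumptions made for this example and do not come from the slides; fs.defaultFS is the Hadoop 2 property name (Hadoop 1 releases use fs.default.name).

```python
import xml.etree.ElementTree as ET


def build_core_site(cluster_dns_name: str, port: int = 8020) -> str:
    """Return core-site.xml text whose default file system is the given host.

    The host name and port are supplied by the caller; the defaults here
    are illustrative assumptions, not values taken from the presentation.
    """
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = f"hdfs://{cluster_dns_name}:{port}"
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical DNS name for the storage cluster.
    print(build_core_site("isilon.example.com"))
```

Every compute node would carry the same file, so the clients resolve one name rather than tracking a dedicated NameNode host.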
Technology Review
Technology Review
[Diagram: NameNode, Secondary NameNode, Job Tracker, DataNode / Task Tracker]
NameNode
Manages the file system namespace
Stores all metadata in RAM: filenames, owners, groups, access info
Knows the blocks associated with each file
Manages block replication
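The NameNode responsibilities listed above can be pictured with a toy in-memory model. This is a teaching sketch only: the class and method names (ToyNameNode, report_block, and so on) are invented for illustration and are not Hadoop APIs.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    """Per-file metadata held in memory: ownership, access mode, block list."""
    owner: str
    group: str
    mode: int
    block_ids: list = field(default_factory=list)


class ToyNameNode:
    """Illustrative stand-in for a NameNode; not the real implementation."""

    def __init__(self):
        self.namespace = {}        # path -> FileEntry
        self.block_locations = {}  # block id -> set of DataNode names

    def add_file(self, path, owner, group, mode, block_ids):
        self.namespace[path] = FileEntry(owner, group, mode, list(block_ids))

    def report_block(self, block_id, datanode):
        """Record that a DataNode holds a replica of a block."""
        self.block_locations.setdefault(block_id, set()).add(datanode)

    def locate(self, path):
        """Return (block id, sorted DataNode names) pairs for a file."""
        entry = self.namespace[path]
        return [(b, sorted(self.block_locations.get(b, set())))
                for b in entry.block_ids]

    def under_replicated(self, replication=3):
        """Return block ids that have fewer replicas than requested."""
        all_blocks = [b for e in self.namespace.values() for b in e.block_ids]
        return [b for b in all_blocks
                if len(self.block_locations.get(b, set())) < replication]
```

Because both dictionaries live only in memory in this model, losing the process loses the mapping from files to blocks, which is the risk the later slides on NameNode architecture describe.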
Secondary NameNode
Manages edit log and check-pointing of NameNode metadata
Does NOT provide NameNode failover
Is not a backup or hot standby for the NameNode
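Checkpointing can be sketched as replaying an edit log onto the last saved namespace image. The function below is a simplified illustration: the tuple format of the log entries and the dictionary image are assumptions made for this example, not the on-disk formats Hadoop uses.

```python
def checkpoint(fsimage: dict, edit_log: list) -> dict:
    """Replay logged operations onto a copy of the image and return it.

    fsimage maps a path to its metadata; each edit_log entry is a
    hypothetical (operation, path, metadata) tuple.
    """
    new_image = dict(fsimage)
    for operation, path, metadata in edit_log:
        if operation == "create":
            new_image[path] = metadata
        elif operation == "delete":
            new_image.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return new_image


if __name__ == "__main__":
    image = {"/data/a": {"blocks": 2}}
    edits = [("create", "/data/b", {"blocks": 1}), ("delete", "/data/a", None)]
    print(checkpoint(image, edits))
```

Once the merged image is saved, the log can start again from empty, which keeps NameNode restarts short. The merged image is still just a file, so this step alone provides no failover.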
Job Tracker
Manages all jobs submitted to the cluster
Tracks and reports the status of jobs and tasks
Provides job queuing functionality
DataNode / Task Tracker
Stores blocks of files on top of the native host OS file system (e.g. EXT3, ZFS)
Serves read/write requests from the clients
Performs block creation, deletion, and replication
The same block can be stored on multiple DataNodes for redundancy
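To make block storage and redundancy concrete, the sketch below splits a file into fixed-size blocks and assigns each block to several DataNodes. The 64 MB block size and the round-robin placement are assumptions for illustration; real HDFS placement is rack-aware and more involved.

```python
import math


def count_blocks(file_size_bytes: int, block_size_bytes: int = 64 * 1024 * 1024) -> int:
    """Number of blocks needed to hold a file of the given size."""
    return math.ceil(file_size_bytes / block_size_bytes)


def place_replicas(num_blocks: int, datanodes: list, replication: int = 3) -> dict:
    """Assign each block to `replication` distinct DataNodes, round-robin."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [datanodes[(block + i) % len(datanodes)]
                            for i in range(replication)]
    return placement


if __name__ == "__main__":
    blocks = count_blocks(200 * 1024 * 1024)  # a 200 MB file
    for block, nodes in place_replicas(blocks, ["dn1", "dn2", "dn3", "dn4"]).items():
        print(block, nodes)
```

With a replication factor of 3, every logical byte occupies three times its size in raw DataNode capacity.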
Technology Challenges
Hadoop Technology Challenges
Traditional Hadoop NameNode Architecture and Data Resiliency
Data Protection and Version Control with Hadoop
Manual Import and Export of Data
Scalability of Traditional Hadoop Infrastructure
Protocol Support
Time to Results
Traditional NameNode Architecture
The NameNode provides location details of all stored information
When the NameNode map is lost or damaged, data location information no longer exists
No automatic recovery of NameNode = downtime
Even with NameNode failover due out soon in Hadoop, manual recovery is required
Distributed (Clustered) NameNode When Using Isilon
Metadata is stored across systems the same way as standard file metadata
Built-in clustered redundancy across many nodes
Clustering the NameNode on Isilon gives it the failure protection level Isilon already provides
[Diagram: Clustered NameNode]
Snapshot/Version Control
Before:
Traditional HDFS does not have replication
No snapshotting of data
Loss of version control
Not designed for mission critical use
After:
Full SnapshotIQ™ integration identifies changes
Multi-threaded, multi-node scale-out replication
Improved RPO/RTO for business continuity
Geo-replicated Hadoop!
Traditional Share-Nothing Hadoop
[Diagram: Existing Virtualized Data Center with Unstructured Data on Existing Primary Storage (copy 1); SHARE-NOTHING Hadoop Infrastructure holding copies 2, 3, 4]
Hadoop on a Stick (R=3) means 5 data copies ($$$$)
Data has to copy to the Hadoop cluster before analysis can begin (Time to Results)
How long would it take to copy all of your data to another storage platform?
How would you maintain data consistency when a file changes on your primary storage?
Isilon Share-Everything Hadoop
[Diagram: Existing Virtualized Data Center and New Hadoop Compute Nodes use the native HDFS protocol to reach Unstructured Data (copy 1) on Existing Primary Storage]
Start using Hadoop NOW with unused processing and RAM available in your VMware environment
No replication required (use your existing data)
Access to the same data via NAS and HDFS protocols
Time to results is extremely fast using already existing data with NO COPIES or wasted $$$$
Protocol Support
Before:
HDFS is not natively visible to Windows, Unix, Linux, Apple, or any other file system
Big Data is only used for Big Data
After:
Inherent multi-protocol support in Isilon allows ubiquitous access to all file systems, including Hadoop
Big Data is actual data!
Time-to-Results
Data Copy Analysis (Hadoop on a Stick): data moves from Existing Primary Storage across the Data Center Network before analysis can start
Have you ever copied 100TB from Primary Storage to a Hadoop system?
How long does it take to copy 100TB from one place to another over a 10 Gb/s link? >24 hours
In-Place Analysis: Hadoop Processing Nodes read the relevant data for analysis from Existing Primary Storage across the Data Center Network
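The copy-time figure can be checked with simple arithmetic. The sketch below assumes the link runs at 10 gigabits per second and uses decimal terabytes; the efficiency parameter is a hypothetical allowance for protocol overhead and is not a figure from the slides.

```python
def copy_hours(data_terabytes: float, link_gigabits_per_s: float,
               efficiency: float = 1.0) -> float:
    """Hours needed to move the data over the link at the given efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_terabytes * 1e12 * 8
    seconds = bits / (link_gigabits_per_s * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    print(round(copy_hours(100, 10), 1))       # 22.2 at full line rate
    print(round(copy_hours(100, 10, 0.9), 1))  # 24.7 at an assumed 90% efficiency
```

Even at full line rate the transfer takes roughly 22 hours, so any realistic overhead pushes it past the 24 hours quoted on the slide.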
Dependent Scaling
Traditional Hadoop HDFS:
Storage to compute ratio is fixed
Scaling compute means scaling capacity
Difficult to provide QoS
Compute upgrade is a forklift
Isilon HDFS:
Scale compute independent of storage
Achieve optimal performance balance even as workloads evolve
No data migrations, ever!
Add new performance as hardware evolves
[Diagram labels: Compute; Storage; Required performance/capacity; Required Hadoop Cluster Nodes]
Independent Scaling
Traditional Hadoop HDFS:
Storage to compute ratio is fixed
Scaling compute means scaling capacity
Difficult to provide QoS
Compute upgrade is a forklift
Isilon HDFS:
Scale compute independent of storage
Achieve optimal performance balance even as workloads evolve
No data migrations, ever!
Add new performance as hardware evolves
[Diagram labels: Compute; Storage; Required performance/capacity; Required Hadoop Cluster Nodes]
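The difference between dependent and independent scaling can be sketched as a sizing calculation. All node sizes and workload figures below are hypothetical numbers chosen for illustration; none come from the presentation.

```python
import math


def coupled_nodes(cores_needed: int, capacity_tb_needed: float,
                  cores_per_node: int, tb_per_node: float) -> int:
    """Nodes required when every node adds both compute and storage.

    The larger of the two requirements drives the node count.
    """
    return max(math.ceil(cores_needed / cores_per_node),
               math.ceil(capacity_tb_needed / tb_per_node))


def decoupled_nodes(cores_needed: int, capacity_tb_needed: float,
                    cores_per_compute_node: int, tb_per_storage_node: float) -> tuple:
    """(compute nodes, storage nodes) when the two are sized separately."""
    return (math.ceil(cores_needed / cores_per_compute_node),
            math.ceil(capacity_tb_needed / tb_per_storage_node))


if __name__ == "__main__":
    # Hypothetical workload: 400 cores and 200 TB; nodes with 16 cores and 24 TB.
    nodes = coupled_nodes(400, 200, 16, 24)
    print(nodes, nodes * 24 - 200)             # 25 nodes, 400 TB of unneeded capacity
    print(decoupled_nodes(400, 200, 16, 24))   # (25, 9)
```

In the coupled case the compute requirement forces 25 nodes and strands capacity nobody asked for; sized separately, storage stays at 9 nodes and can grow on its own schedule.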
Hadoop Lessons Learned: See Hadoop Differently
Hadoop can be inexpensive
Hadoop can be easy to deploy
Hadoop can use my existing data
Hadoop NameNode data can be protected
Hadoop data can have uptime guarantees
HDFS is better as a protocol than as a file system
Isilon addresses many Hadoop challenges
Return Path Captures Competitive Advantage with Hadoop Analytics and EMC Isilon
Challenge:
Data growing 25-50 terabytes per year
Limited performance and capacity to support intensive Hadoop analytics
Disparate systems lacked performance and capacity
Solution:
X-series
SmartPools, SmartConnect, SmartQuotas, InsightIQ
Applications:
Hadoop, internally developed email intelligence solutions
Results:
Enables unconstrained access to email data for analysis
Reduces shared storage data center footprint by 30 percent
Improves availability and reliability for Hadoop analytics
Savings of $350,000 from lower power, cooling, and maintenance
"Isilon serves NFS data across multiple product suites and makes it easily accessible to our Hadoop analytics team. That's a significant business enabler, allowing Return Path to develop customer solutions much faster."
DIZ CARTER, VP Infrastructure Operations
For More Information
EMC.com:
EMC Isilon Scale-Out NAS: http://www.emc.com/isilon
Scale-Out Storage Solutions for Hadoop: http://www.emc.com/big-data/scale-out-storage-hadoop.htm
Solution Brief: EMC Big Data Storage and Analytics Solution
White Paper: Hadoop on EMC Isilon Scale-Out NAS
Analyst Report: EMC's Enterprise Hadoop Solution, Enterprise Strategy Group, 2012
Email me: ryan.peterson@emc.com
Related Sessions
Session Name: Date, Time
Isilon Scale-Out NAS Overview and Future Directions: Monday 5/6, 1-2pm; Wednesday 5/8, 8:30-9:30am
Protecting & Backing Up the Isilon Cluster at Enterprise Scale: Tuesday 5/7, 10-11am; Thursday 5/9, 8:30-9:30am
Get Better Insight into Your Isilon Cluster with Tools that Help You Manage Your Performance & Capacity: Tuesday 5/7, 10-11am; Thursday 5/9, 11:30am-12:30pm
Birds of a Feather: Date, Time
Online File Sharing & Collaboration Opportunities and Challenges in Deploying with On-Premise Storage: Tuesday 5/7, 1-2pm
Hadoop Opportunities and Challenges in Deploying with an Enterprise Infrastructure: Wednesday 5/8, 1-2pm
Stop by the EMC ISILON Booth #124 and Wednesday Keynote for a chance to win
Weigh your current Big Data at the EMC ISILON booth and get a t-shirt.
Join one of our theater presentations and receive a FREE drink ticket at the Captains Lab.
Discover the future of enterprise storage at Isilon's Keynote, Wednesday, April 8, 11:30 AM, Venetian Ballroom
Drawing immediately following for a 3D Printer (Makerbot)