EMC Hadoop Starter Kit - ViPR Edition
Uploaded by walshe1 · Category: Data & Analytics
Transcript of EMC Hadoop Starter Kit - ViPR Edition
© Copyright 2014 EMC Corporation. All rights reserved.
EMC Hadoop Starter Kit – ViPR Edition
EMC Open Innovation Lab
The Digital Universe
Less than 1% of the world’s data is analyzed
By 2020, the Internet will connect 7.6B people and 200B things (sensors, machines, cars, appliances…)
Data Volumes
– 2000: 2 exabytes a year
– 2011: 2 exabytes a day
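The two data-volume figures above imply a steep rise. As a rough worked example, assuming 365-day years and smooth compound growth between 2000 and 2011 (the slide claims neither), the 2011 rate works out to about 730 exabytes a year, 365 times the 2000 volume, or roughly 71% growth per year:

```python
# Figures from the slide, in exabytes (EB).
EB_PER_YEAR_2000 = 2.0
EB_PER_DAY_2011 = 2.0
DAYS_PER_YEAR = 365  # assumption: leap years ignored


def growth_summary(start_year: int = 2000, end_year: int = 2011) -> tuple[float, float, float]:
    """Return (2011 yearly volume in EB, growth factor, implied compound annual growth rate)."""
    yearly_2011 = EB_PER_DAY_2011 * DAYS_PER_YEAR  # 2 EB/day over a year = 730 EB
    factor = yearly_2011 / EB_PER_YEAR_2000        # 730 / 2 = 365x
    years = end_year - start_year                  # 11 years
    cagr = factor ** (1 / years) - 1               # assumes smooth compound growth
    return yearly_2011, factor, cagr


if __name__ == "__main__":
    yearly, factor, cagr = growth_summary()
    print(f"2011: {yearly:.0f} EB/year, {factor:.0f}x the 2000 volume, ~{cagr:.0%} per year")
```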
Location & Types Of Big (& Fast!) Data
[Figure: structured and unstructured data from enterprise, partner, and public sources, including forecast, location, credit, shipping, social/video, and telemetry data]
Hadoop Challenges
• Depends on HDFS as the data repository
– Legacy data must be made accessible through HDFS
• Hadoop HDFS inefficiencies:
– 3 copies for protection
– No advanced data-efficiency features such as de-duplication or thin provisioning
– Security
• Integration with robust, traditional data center products: compute virtualization, enterprise storage
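The "3 copies for protection" point is easy to quantify. A minimal sketch, assuming HDFS's default replication factor of 3 (the `dfs.replication` setting in `hdfs-site.xml`) and ignoring other overheads such as temporary or reserved space: every usable byte consumes three raw bytes, so two-thirds of the raw disk holds extra copies. The 100 TB dataset size is a hypothetical example.

```python
def raw_capacity_tb(usable_tb: float, replication: int = 3) -> float:
    """Raw disk needed to store `usable_tb` of data with N-way replication."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return usable_tb * replication


def replica_overhead_fraction(replication: int = 3) -> float:
    """Fraction of raw capacity consumed by the extra (non-primary) copies."""
    return (replication - 1) / replication


if __name__ == "__main__":
    usable = 100.0  # hypothetical dataset size in TB
    print(f"{usable:.0f} TB usable -> {raw_capacity_tb(usable):.0f} TB raw "
          f"({replica_overhead_fraction():.0%} of raw space is extra copies)")
```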
Hadoop Storage Options
Hadoop HDFS
• Leverage the Hadoop distribution's HDFS data services
• Compute and data converged on a cluster of servers
Storage Array
• NameNode and DataNode services provided by the storage array (e.g., EMC Isilon)
Storage OS
• NameNode and DataNode services provided by the storage OS (e.g., EMC ViPR)
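In all three options, a Hadoop client locates its storage through the default file system URI, the `fs.defaultFS` property in `core-site.xml`. The sketch below generates that property for each option using only Python's standard library. The host names and the ViPR bucket, namespace and installation names are hypothetical placeholders; the `viprfs://` scheme and its `bucket.namespace.installation` layout are assumptions to verify against the ViPR HDFS documentation, which also covers the client library and any further properties ViPR requires.

```python
import xml.etree.ElementTree as ET

# Hypothetical default file system URIs, one per storage option on the slide.
DEFAULT_FS = {
    # Hadoop HDFS: the NameNode runs on a cluster server (hypothetical host name).
    "hadoop_hdfs": "hdfs://namenode.example.com:8020",
    # Storage array: the array serves HDFS requests (hypothetical Isilon zone name).
    "storage_array": "hdfs://isilon-zone.example.com:8020",
    # Storage OS: ViPR HDFS (assumed scheme and authority layout; verify in ViPR docs).
    "storage_os": "viprfs://mybucket.mynamespace.myinstallation/",
}


def core_site_xml(default_fs: str) -> str:
    """Build a minimal core-site.xml document that sets only fs.defaultFS."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = default_fs
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    for option, uri in DEFAULT_FS.items():
        print(f"{option}: {core_site_xml(uri)}")
```

The design point this illustrates is that Hadoop applications address storage through a file system URI, so the choice among these options is expressed in configuration rather than in application code.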
ViPR HDFS
HDFS is becoming the de facto file system for distributed applications
ViPR is a great platform for HDFS:
– Addresses limitations of off-the-shelf HDFS
– Brings HDFS to existing storage hardware
– Enables HDFS, object, and file scenarios
– Flexible software model allows colocation
Support Mixed Workloads
Object, file, and HDFS operations on the same data
[Figure: VIRTUAL ARRAY spanning VNX5500, Isilon, and 3rd-party storage]
ViPR Data Services offer three bucket options:
– Object
– HDFS
– Object and HDFS
Object and HDFS buckets give users access to the same data through either S3 or HDFS
– Full compatibility with existing object-based APIs
▪ Amazon S3, OpenStack Swift, Atmos
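The bucket options above amount to a small access matrix: which interfaces can reach a bucket of each type. A minimal model, assuming (as the slide suggests) that an Object bucket serves only the object APIs, an HDFS bucket serves only HDFS, and an Object and HDFS bucket serves both. The enum, constant and function names are illustrative only and are not part of any ViPR API.

```python
from enum import Enum


class BucketType(Enum):
    OBJECT = "Object"
    HDFS = "HDFS"
    OBJECT_AND_HDFS = "Object and HDFS"


# Object APIs listed on the slide (lower-case identifiers chosen for this sketch).
OBJECT_APIS = frozenset({"s3", "swift", "atmos"})


def allowed_interfaces(bucket_type: BucketType) -> frozenset[str]:
    """Interfaces that can access a bucket of the given type (illustrative model)."""
    if bucket_type is BucketType.OBJECT:
        return OBJECT_APIS
    if bucket_type is BucketType.HDFS:
        return frozenset({"hdfs"})
    # Object and HDFS: both object APIs and HDFS see the same data.
    return OBJECT_APIS | {"hdfs"}


def can_access(bucket_type: BucketType, interface: str) -> bool:
    """Case-insensitive check of whether an interface may use a bucket type."""
    return interface.lower() in allowed_interfaces(bucket_type)


if __name__ == "__main__":
    for bt in BucketType:
        print(f"{bt.value}: {sorted(allowed_interfaces(bt))}")
```

The dual-access case is the one the starter kit's validation relies on: data written through S3 is later read by a MapReduce job through HDFS.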
Simple, Easy, Cost-Effective EMC Starter Kit for Hadoop – ViPR Edition
Deployment guides for major Hadoop distributions:
– Pivotal, Cloudera, and Hortonworks
Four-step deployment:
– Deploy the preferred Hadoop distribution
– Deploy EMC ViPR with Object and HDFS data services
– Configure the Hadoop distribution to use the ViPR HDFS target
– Validation process
▪ Load a data file via the S3 interface
▪ Test a MapReduce job
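The validation step loads a file through S3 and then runs a MapReduce job over it through HDFS. The slide does not name the job; WordCount, the classic Hadoop example, is assumed here. The sketch below runs the same map, shuffle, and reduce phases in plain Python so the expected result for a small test file can be worked out before comparing it with the cluster's output. It uses neither Hadoop nor ViPR, and the sample lines are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group mapper output by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reducer: sum the counts for each word; keys are returned in sorted order."""
    return {word: sum(counts) for word, counts in sorted(groups.items())}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run the three phases end to end on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["hello vipr", "hello hadoop"]  # hypothetical contents of the uploaded file
    for word, count in word_count(sample).items():
        print(f"{word}\t{count}")
```

Sorting the result by word keeps the listing stable, which makes a side-by-side comparison with the job's output file simpler.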