EMC Big Data Solutions Overview

© Copyright 2014 EMC Corporation. All rights reserved.

Description: Overview of emerging EMC Big Data solutions using Hadoop and Splunk

Transcript of EMC Big Data Solutions Overview

Page 1: EMC Big Data Solutions Overview

EMC Big Data Solutions Overview

Page 2: EMC Big Data Solutions Overview

Big Data: Why Do I Care?

Digital universe is expanding rapidly
– 44x to 50x data expansion this decade
– By 2020: 40 ZB (40 trillion GB)

▪ 1.7 MB of new information will be created for each and every human being on the planet, every second of every day.

41% growth of IoT and M2M data
– Percentage of data generated about us is exploding
– Percentage of data tagged and analyzed is exploding

Emerging markets: +62% of data
– 22% from China alone

IT challenges:
– Servers will increase 10x
– Information directly managed by enterprises will grow 14%
– Data under security governance will grow 40%
– The number of IT professionals is expected to grow by only 1.5x by 2020
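The headline numbers on this slide can be sanity-checked with simple arithmetic. The Python sketch below assumes decimal SI prefixes (1 ZB = 10^21 bytes, 1 GB = 10^9 bytes) and assumes the server, data and IT-staff growth factors apply over the same period; the function names are illustrative only.

```python
ZB = 10**21  # bytes in a zettabyte (decimal SI prefix, assumed)
GB = 10**9   # bytes in a gigabyte (decimal SI prefix, assumed)


def zettabytes_to_gigabytes(zb):
    """Convert zettabytes to gigabytes using decimal prefixes."""
    return zb * ZB // GB


def per_capita_growth(resource_growth, staff_growth):
    """Growth in load per IT professional when resources grow
    resource_growth-fold while staff grow staff_growth-fold."""
    return resource_growth / staff_growth


print(f"{zettabytes_to_gigabytes(40):,} GB")       # 40,000,000,000,000 GB
print(round(per_capita_growth(10, 1.5), 1))        # 6.7 (servers per IT professional)
print(round(per_capita_growth(44, 1.5), 1),
      round(per_capita_growth(50, 1.5), 1))        # 29.3 33.3 (data per IT professional)
```

Under those assumptions, each IT professional would look after roughly 6.7 times as many servers and roughly 29x to 33x as much data by 2020.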

Page 3: EMC Big Data Solutions Overview

Big Data Challenges for IT

Complexity
– Multiple Hadoop distributions (Apache, Cloudera, Hortonworks, Pivotal)

Costs
– Acquisition & operations

Security & Governance
– Finance: SEC 17a-4, HIPAA
– ISO
– Audit

Big Data is more than Hadoop
– Use familiar analytics tools

Page 4: EMC Big Data Solutions Overview

EMC Hadoop Starter Kit

Page 5: EMC Big Data Solutions Overview

Simple, Easy, Cost-Effective: EMC Starter Kit for Hadoop

Create a simplified process to get started with Hadoop:
– 4-8 node cluster
– Automated, repeatable deployment
– Leverage existing infrastructure investment

Success criteria:
– Low or no new cost
– 2-hour customer deployment
– Make it easy to leverage familiar, robust enterprise infrastructure

Page 6: EMC Big Data Solutions Overview

EMC Hadoop Starter Kit: EMC-VMware Deployment Guide

– Enable HDFS on the Isilon cluster
– Deploy Cloudera compute cluster
– Deploy Hortonworks compute cluster
– Deploy Pivotal HD compute cluster
– Deploy Apache compute cluster
– Test data set: Ulysses, processed with a MapReduce job
– Collateral available through ECN, blogs, and Twitter

Running deployment in OIL for demos and pilots; EMC vLab created: Pivotal HD with VMware and EMC Isilon
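The kit validates each cluster by running a MapReduce job over the text of Ulysses. The slide does not say which job; word count is assumed here as the classic example. The sketch below is a local, pure-Python simulation of the map, shuffle and reduce phases for illustration only. It does not use any Hadoop API, and all function names are hypothetical.

```python
import re
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every lowercased word in one line."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts for one word."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Opening sentence of Ulysses (public domain), split across two input lines.
sample = [
    "Stately, plump Buck Mulligan came from the stairhead,",
    "bearing a bowl of lather on which a mirror and a razor lay crossed.",
]
counts = word_count(sample)
print(counts["a"])  # 3
```

On a real cluster, the framework distributes the map tasks across nodes and performs the shuffle over the network. The per-record logic, however, has the same shape.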

Page 7: EMC Big Data Solutions Overview

EMC Hadoop Starter Kit

How do I get free access to the Hadoop Starter Kit?

• Search Google for “EMC Hadoop Starter Kit”
• https://community.emc.com/community/connect/everything_big_data
• https://community.emc.com/docs/DOC-26892
• http://theruddyduck.typepad.com/
• https://www.youtube.com/watch?feature=player_embedded&v=MtBRbTeJbZM
• https://www.youtube.com/watch?feature=player_embedded&v=1Lch5e3wGtA

Key metrics:
• Close to 4,300 views!
• HSK downloads:
– Pivotal: 410
– Cloudera: 261
– Hortonworks: 275
– Apache: 310
• Over 150 Isilon HDFS licenses deployed worldwide!

Page 8: EMC Big Data Solutions Overview

EMC ViPR with HDFS

Page 9: EMC Big Data Solutions Overview

VCE Vblock™

Turnkey Solution for Big Data and Analytics

– Server: Cisco Unified Computing System (UCS) servers
– Network: Cisco Data Center and Cloud Networking (DCN) portfolio
– Storage: EMC Symmetrix VMAX, VNX and Isilon
– Virtualization: VMware vSphere, including Big Data Extensions (BDE)
– Protection: EMC Avamar, Data Domain, VPLEX, RecoverPoint

Page 10: EMC Big Data Solutions Overview

Converged Platform for Big Data and Analytics: VCE Vblock™

Page 11: EMC Big Data Solutions Overview

Big Data Challenges for IT

Page 12: EMC Big Data Solutions Overview

Industry’s Most Efficient & Secure Big Data Management Solution

Jyothi Swaroop, Director, Product Marketing & Alliances

Page 13: EMC Big Data Solutions Overview

RainStor & EMC Isilon Solution & Use Cases

Enterprise Data
– Analytical Archive: Enterprise Data Warehouse Offload
– Compliance Archive: Tape Avoidance/Replacement

First SQL-compatible, enterprise-grade database to run on Isilon scale-out NAS (with or without Hadoop).

Page 14: EMC Big Data Solutions Overview

RainStor Architecture

Page 15: EMC Big Data Solutions Overview

Hadoop Data Security

• Authentication – RBAC
• Authorization – ACLs by user
• Encryption – data at rest
• Audit trail – logs data access by user for audit
• Immutability – data can never be changed
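Three of these controls fit in a few lines of code: per-user ACLs, an audit trail that records every access attempt, and write-once (immutable) records. The Python class below is a hypothetical toy model, not RainStor's or Hadoop's implementation. It assumes users have already been authenticated (for example, through Kerberos) and does not model encryption at rest.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SecureArchive:
    """Hypothetical toy store: per-user ACLs, an audit trail, and write-once records.

    Users are assumed to be authenticated already; encryption at rest is not modeled.
    """
    acl: dict  # user -> set of permissions, e.g. {"alice": {"read", "write"}}
    records: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def _authorize(self, user, permission, key):
        allowed = permission in self.acl.get(user, set())
        # Every access attempt is logged, whether or not the ACL allows it.
        self.audit_log.append(
            (datetime.now(timezone.utc).isoformat(), user, permission, key, allowed)
        )
        if not allowed:
            raise PermissionError(f"{user} lacks {permission} on {key}")

    def write(self, user, key, value):
        self._authorize(user, "write", key)
        if key in self.records:
            # Immutability: once written, a record can never be changed.
            raise PermissionError(f"{key} is immutable")
        self.records[key] = value

    def read(self, user, key):
        self._authorize(user, "read", key)
        return self.records[key]


store = SecureArchive(acl={"alice": {"read", "write"}, "auditor": {"read"}})
store.write("alice", "trade-001", "BUY 100 EMC")
print(store.read("auditor", "trade-001"))  # BUY 100 EMC
```

Any attempt by the auditor to write, or by alice to overwrite `trade-001`, raises `PermissionError`, and every attempt still appears in `audit_log`.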

Page 16: EMC Big Data Solutions Overview

Big Data Challenges for IT

Page 17: EMC Big Data Solutions Overview

Big Data with Splunk

Page 18: EMC Big Data Solutions Overview

Splunk Company Highlights

Company (SPLK: >100% IPO)
• Founded 2004; first software release 2006
• HQ: San Francisco, CA
• AP HQ: Hong Kong
• EMEA HQ: London
• 850+ employees
• 8+ offices worldwide

Products/Business Model
• On-premises, SaaS or in the cloud: licensed by daily index volume
• Free 500 MB trial download: the same software scales from 500 MB to 100s of TB per day

Business Highlights
• 6,000+ customers
• 60+ of the Fortune 100
• 90+ countries
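Licensing by daily index volume means the metered quantity is the amount of raw data indexed per day. The Python sketch below shows that bookkeeping in simplified form and is not Splunk's license meter. It assumes each event is a (timestamp, raw bytes) pair and treats the trial's 500 MB as decimal megabytes. Both assumptions are for illustration only.

```python
from collections import defaultdict
from datetime import datetime

TRIAL_QUOTA_BYTES = 500 * 10**6  # 500 MB/day; decimal MB is an assumption


def daily_index_volume(events):
    """Sum raw event sizes per calendar day.

    events: iterable of (ISO-8601 timestamp string, raw event bytes).
    """
    totals = defaultdict(int)
    for timestamp, raw in events:
        totals[datetime.fromisoformat(timestamp).date()] += len(raw)
    return dict(totals)


def days_over_quota(totals, quota_bytes=TRIAL_QUOTA_BYTES):
    """Return the days whose indexed volume exceeded the quota, oldest first."""
    return sorted(day for day, size in totals.items() if size > quota_bytes)


events = [
    ("2014-03-01T10:00:00", b"x" * 300),
    ("2014-03-01T11:00:00", b"y" * 300),
    ("2014-03-02T09:00:00", b"z" * 100),
]
totals = daily_index_volume(events)
print(days_over_quota(totals, quota_bytes=500))  # [datetime.date(2014, 3, 1)]
```

Because the meter counts volume per day rather than hosts or users, capacity planning for a deployment starts from the expected daily ingest.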

Page 19: EMC Big Data Solutions Overview

Industry-Leading Platform for Machine Data

Any Machine Data
– Online services, web services, servers, security, GPS location, storage, desktops, networks, packaged applications, custom applications, messaging, telecoms, online shopping cart, web clickstreams, databases, energy meters, call detail records, smartphones and devices, RFID

Platform
– Commodity servers
– EMC storage

Operational Intelligence
– Search and investigation
– Proactive monitoring
– Operational visibility
– Real-time business insights

Page 20: EMC Big Data Solutions Overview

Industry-Leading Platform for Machine Data

Any Machine Data
– The same machine data sources shown on Page 19

Platform
– Commodity servers
– HA indexes and storage
– Any amount, any location, any source
– Schema-on-the-fly
– Universal forwarding
– No back-end RDBMS
– No need to filter data

Operational Intelligence
– Search and investigation
– Proactive monitoring
– Operational visibility
– Real-time business insights
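"Schema-on-the-fly" means raw events are stored as received and fields are extracted only when a search runs, so there is no upfront schema or back-end RDBMS. The Python sketch below illustrates the idea with a hypothetical `key=value` event format and regex extraction at query time. It is not Splunk's search language or field-extraction engine.

```python
import re

# Raw events are stored exactly as received; no schema is applied at ingest.
raw_events = [
    "2014-03-01 10:00:01 action=login user=alice status=success",
    "2014-03-01 10:00:05 action=login user=bob status=failure",
    "2014-03-01 10:01:00 action=logout user=alice",
]

KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def extract_fields(event):
    """Extract key=value fields from one raw event at search time."""
    return dict(KV_PATTERN.findall(event))


def search(events, **criteria):
    """Return the extracted fields of every event matching all criteria."""
    results = []
    for event in events:
        fields = extract_fields(event)
        if all(fields.get(key) == value for key, value in criteria.items()):
            results.append(fields)
    return results


failed = search(raw_events, action="login", status="failure")
print([f["user"] for f in failed])  # ['bob']
```

Because extraction happens at query time, a new field or event format can be searched without reloading or filtering the stored data.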

Page 21: EMC Big Data Solutions Overview

EMC Starter Kit for Splunk
• Splunk is easy to set up and deploy
• Infrastructure for Splunk should be easy and inexpensive
• Use familiar, robust IT infrastructure
• Leverage existing IT investment
• Provide a reliable, repeatable, tested solution

How do I get free access to the EMC-Splunk Starter Kit?
• Search Google for “EMC reference architecture for Splunk”
• https://community.emc.com/docs/DOC-27406
• Over 1,000 views!

Page 22: EMC Big Data Solutions Overview

Splunk Performance with Shared Storage & Compute

[Charts comparing Isilon, DAS (RAID 10, 6x 15k RPM) and EC2. Single search: Time to search (s) and Time to 1st event (s). Single index: Average EPS (1000s) and Average KBPS (1000s).]
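The indexing charts report average events per second (EPS) and average kilobytes per second (KBPS). The slide does not say how these were measured. The Python sketch below shows the usual definitions for a single indexing run, assuming 1 KB = 1,024 bytes. The function name is hypothetical.

```python
def indexing_throughput(event_sizes_bytes, elapsed_seconds):
    """Return (average events/s, average KB/s) for one indexing run.

    Assumes 1 KB = 1,024 bytes; the slide does not state its convention.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    events_per_second = len(event_sizes_bytes) / elapsed_seconds
    kb_per_second = sum(event_sizes_bytes) / 1024 / elapsed_seconds
    return events_per_second, kb_per_second


eps, kbps = indexing_throughput([512] * 1000, elapsed_seconds=10)
print(eps, kbps)  # 100.0 50.0
```

The two metrics move independently. A platform can show a high EPS on small events but a modest KBPS, so both are needed to compare storage back ends.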

Page 23: EMC Big Data Solutions Overview

EMC Solutions for Hadoop

Partners
– Many joint Pivotal-on-EMC customers
– Formal collaboration established
– Officially support Isilon; co-branded HSK for Cloudera
– Many joint customers
– Enabling service providers: HDaaS (Hadoop as a Service)
– Several key wins; co-branded HSK for Splunk
– Many joint customers; joint support

Big Data on Vblock
– Jointly architected Vblock for Hadoop with VMware, Cisco, EMC
– Several customer pilots

Hadoop Wins
– Many installed wins with all of the major distributions
– Two new case studies:

Page 24: EMC Big Data Solutions Overview
Page 25: EMC Big Data Solutions Overview

Why Use Shared Infrastructure for Hadoop?

Page 26: EMC Big Data Solutions Overview

Hadoop Deployment Models

Hadoop in VM (combined storage/compute: each slave node VM holds both)
– VM lifecycle determined by DataNode
– Limited elasticity
– Limited to Hadoop multi-tenancy

Separate Storage (compute VMs separate from storage)
– Separate compute from data
– Elastic compute
– Enable shared workloads
– Raise utilization

Separate Compute Tenants (tenant virtual clusters T1, T2 over shared storage)
– Separate virtual clusters per tenant
– Stronger VM-grade security and resource isolation
– Enable deployment of multiple Hadoop runtime versions

Page 27: EMC Big Data Solutions Overview

Why HDFS on EMC Isilon Shared Storage?

• No Ingest necessary

• Eliminate NameNode SPOF

• Eliminate 3x mirroring

• Enterprise feature set

• Multi-protocol access

• Simultaneous Multi-distribution support

• Better cost!

• SmartDedupe for Hadoop

• SEC 17a-4 Compliant WORM

• Kerberos Authentication

• Hadoop Multi-tenancy

• Simultaneous Distribution Version Support

• Great performance!
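"Eliminate 3x mirroring" is a capacity argument. By default, HDFS stores each block three times (replication factor 3), so only one-third of raw disk is usable. Isilon instead protects data with N+M forward error correction across nodes. The Python sketch below compares the usable fractions. The 16+2 stripe is an illustrative assumption, since the actual stripe width depends on cluster size and the chosen protection policy.

```python
from fractions import Fraction


def replication_efficiency(copies):
    """Usable share of raw capacity when every block is stored `copies` times."""
    return Fraction(1, copies)


def fec_efficiency(data_units, protection_units):
    """Usable share of raw capacity for an N+M stripe (N data, M protection units)."""
    return Fraction(data_units, data_units + protection_units)


raw_tb = 100
print(round(float(raw_tb * replication_efficiency(3)), 1))  # 33.3 TB usable with 3x replication
print(round(float(raw_tb * fec_efficiency(16, 2)), 1))      # 88.9 TB usable with a 16+2 stripe
```

Exact fractions avoid rounding noise when comparing protection schemes. Only the final figures are converted to floats for display.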

Module 4: Horizontal and Vertical Markets

Page 28: EMC Big Data Solutions Overview

Why Virtualize Hadoop?

– Operational Simplicity with Performance
– Maximize Resource Utilization on New or Existing Hardware
– Architect a Scalable and Flexible Big Data Platform

Benefits:
– Rapid deployment
– Self-service tools
– Automated resource rebalancing
– Performance
– True multi-tenancy
– Elastic scaling
– Avoid dedicated hardware
– VM-based isolation
– Increase resource utilization
– Choice of distributions and storage
– Maintain management flexibility at scale
– Leverage vSphere features

Page 29: EMC Big Data Solutions Overview

Performance: Native vs. Virtual, 32 hosts, 16 disks/host

Source: http://www.vmware.com/resources/techresources/10360

Page 30: EMC Big Data Solutions Overview

© Copyright 2013 Pivotal. All rights reserved.

Pivotal-Isilon Alliance

Federation Plan & Field Momentum

Q4 2013

Page 31: EMC Big Data Solutions Overview

Pivotal Overview

Data Science Team

▶ Developer-friendly.

▶ Industry leading application framework and runtimes.

▶ Complete & disruptive set of data products.

▶ Services that accelerate productivity.

▶ Multi-cloud deployment.

▶ Commitment to open source & open standards.

One

Page 32: EMC Big Data Solutions Overview

Revised Color Palette for 2014

White: R 255, G 255, B 255
Black: R 0, G 0, B 0
EMC Blue: R 44, G 149, B 221
Green: R 73, G 169, B 66
VMware Gray: R 113, G 112, B 116
EMC Gray: R 186, G 188, B 190
Red: R 206, G 49, B 49
Pivotal Green: R 0, G 125, B 104
Lt. Blue: R 147, G 197, B 255
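For web or document templates, each RGB triple above converts directly to a hex color code. The Python sketch below restates the slide's values in a dictionary (the name `PALETTE_2014` is illustrative) and formats each channel as two hex digits.

```python
PALETTE_2014 = {
    "White": (255, 255, 255),
    "Black": (0, 0, 0),
    "EMC Blue": (44, 149, 221),
    "Green": (73, 169, 66),
    "VMware Gray": (113, 112, 116),
    "EMC Gray": (186, 188, 190),
    "Red": (206, 49, 49),
    "Pivotal Green": (0, 125, 104),
    "Lt. Blue": (147, 197, 255),
}


def rgb_to_hex(r, g, b):
    """Format 8-bit RGB channels as a #RRGGBB string."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range: {channel}")
    return f"#{r:02X}{g:02X}{b:02X}"


for name, rgb in PALETTE_2014.items():
    print(f"{name}: {rgb_to_hex(*rgb)}")  # e.g. "EMC Blue: #2C95DD"
```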

Page 33: EMC Big Data Solutions Overview

Page 34: EMC Big Data Solutions Overview