10 Amazing Things To Do With a Hadoop-Based Data Lake

Greg Chase (Director, Product Marketing, Pivotal Software) presents the big data talk "10 Amazing Things To Do With a Hadoop-Based Data Lake" at Strata Conference + Hadoop World 2014 in NYC.

Transcript of 10 Amazing Things To Do With a Hadoop-Based Data Lake

Page 1: 10 Amazing Things To Do With a Hadoop-Based Data Lake
Page 2: 10 Amazing Things To Do With a Hadoop-Based Data Lake

© 2014 Pivotal Software, Inc. All rights reserved.

10 Amazing Things To Do With a Hadoop-Based Data Lake

Strata Conference New York 2014

Greg Chase, Director, Product Marketing, Pivotal Software

Page 3: 10 Amazing Things To Do With a Hadoop-Based Data Lake

Pivotal Business Data Lake Architecture

[Diagram. Tiers shown: Ingestion, Processing, Distillation, Insights, Action, and Unified Operations.
Sources: clickstream, sensor data, weblogs, network data, CRM data, ERP data.
Components shown: Pivotal HD (unstructured and structured data), HAWQ/Greenplum, GemFire XD, HBase, Spring XD, Sqoop, Flume, Oozie, MapReduce, Hive, Pig, GemFire, RabbitMQ, Redis, Pivotal CF, and Command Center. HAWQ, GemFire XD, and HBase are labeled as query interfaces.]

Page 4: 10 Amazing Things To Do With a Hadoop-Based Data Lake

Pivotal Business Data Lake Architecture

Page 5: 10 Amazing Things To Do With a Hadoop-Based Data Lake

1. Store Massive Data Sets

[Diagram: Rack 1, Rack 2, Rack 3, …, Rack n]

Scale-out: use commodity hardware and storage.
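To make the scale-out point concrete, here is a back-of-the-envelope Python sketch of how a file maps onto HDFS blocks and raw disk. It assumes the Hadoop 2.x defaults of a 128 MB block size and a replication factor of 3; real clusters often change both, and the per-node capacity in the usage note is a made-up figure.

```python
import math

# Assumed HDFS settings: Hadoop 2.x defaults, often changed per cluster.
BLOCK_SIZE_MB = 128  # dfs.blocksize
REPLICATION = 3      # dfs.replication


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (blocks, block replicas, raw MB on disk) for one file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    # HDFS does not pad the last block, so raw usage is size x replication,
    # not blocks x block size x replication.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


def nodes_needed(raw_mb, usable_mb_per_node):
    """Scale-out: capacity grows by adding commodity nodes."""
    return math.ceil(raw_mb / usable_mb_per_node)
```

For example, 100 TB of data at 3x replication needs 300 TB of raw storage, so with a hypothetical 24 TB of usable disk per node, `nodes_needed(100 * 1024 * 1024 * 3, 24 * 1024 * 1024)` returns 13. Growing the lake means adding nodes and racks rather than buying a bigger machine.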

Page 6: 10 Amazing Things To Do With a Hadoop-Based Data Lake

2. Mix Disparate Data Sources

[Diagram: sensor data, CRM data, and website clickstreams]

Schema flexibility: absorb different data types from multiple data sources.
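Schema flexibility is often described as schema-on-read: raw records are landed as-is, and structure is applied only when they are read. The Python sketch below illustrates that with three hypothetical payloads (JSON sensor readings, CSV CRM rows, and space-delimited click logs); the formats, field names, and helper functions are assumptions for illustration, not part of any Pivotal product.

```python
import csv
import io
import json

# Hypothetical raw payloads: each source keeps its own native format.
SENSOR_JSON = '{"device": "d-17", "customer_id": 42, "ts": 1700000000, "temp_c": 21.5}'
CRM_CSV = "customer_id,name,segment\n42,Acme Corp,enterprise\n7,Globex,smb\n"
CLICK_LOG = "1700000050 42 /pricing\n1700000090 7 /docs\n"


def land_raw(lake, source, payload):
    """Store the payload untouched; no schema is enforced on write."""
    lake.append({"source": source, "payload": payload})


def read_customers(lake):
    """Schema-on-read: parse each source only when querying it."""
    customers = {}
    # CRM rows define the customer dimension.
    for rec in lake:
        if rec["source"] == "crm":
            for row in csv.DictReader(io.StringIO(rec["payload"])):
                customers[int(row["customer_id"])] = {"name": row["name"], "events": 0}
    # Sensor readings and clicks are counted as events per customer.
    for rec in lake:
        if rec["source"] == "sensor":
            ids = [json.loads(rec["payload"])["customer_id"]]
        elif rec["source"] == "clicks":
            ids = [int(line.split()[1]) for line in rec["payload"].splitlines()]
        else:
            continue
        for cid in ids:
            customers.setdefault(cid, {"name": None, "events": 0})["events"] += 1
    return customers
```

With all three payloads landed, `read_customers` returns `{42: {'name': 'Acme Corp', 'events': 2}, 7: {'name': 'Globex', 'events': 1}}`: sensor and click events are joined to CRM customers even though no common schema was imposed on write.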

Page 7: 10 Amazing Things To Do With a Hadoop-Based Data Lake

Pivotal Business Data Lake Architecture

Page 8: 10 Amazing Things To Do With a Hadoop-Based Data Lake

3. Ingest Bulk Data

Scalable open source tools for batch loading data.

[Diagram: microbatch vs. batch loading]

Sqoop: bulk load; RDBMS sources
Spring XD: bulk load, with processing, with analytics; any source
Flume: event driven; any source
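The difference between batch and microbatch loading comes down to how many records go into each write. The Python sketch below shows the microbatch side with a generic chunking generator; `load` is a hypothetical stand-in for whatever writes a batch to the lake (Sqoop, Spring XD, and Flume each handle this in their own way).

```python
from itertools import islice
from typing import Iterable, Iterator, List


def microbatches(records: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` records as they arrive."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load(sink: List[List[str]], batch: List[str]) -> None:
    """Hypothetical loader: one call stands for one write to the lake."""
    sink.append(batch)
```

Ten records with `size=4` produce three loads of 4, 4, and 2 records, whereas a batch load would write all ten in one call. Smaller batches lower latency; larger ones amortize per-write overhead.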

Page 9: 10 Amazing Things To Do With a Hadoop-Based Data Lake

4. Ingest High-Velocity Data

Capture all volatile data. Apply structure.

[Diagram: streaming data]

Spring XD: bulk load, real-time ingest, with processing, with analytics; any source
Pivotal GemFire XD: advanced DB operations, consistency, reliable persistence; converts data to structured form
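Applying structure to high-velocity data usually means parsing each event as it arrives and aggregating over time windows. The sketch below shows a tumbling (fixed, non-overlapping) window average in plain Python, assuming a hypothetical `epoch_seconds,sensor_id,value` line format. It illustrates the technique, not the Spring XD or GemFire XD APIs, and a real stream processor would emit each window incrementally rather than at the end.

```python
from collections import defaultdict


def parse_event(line):
    """Apply structure to a raw 'epoch_seconds,sensor_id,value' line (assumed format)."""
    ts, sensor, value = line.strip().split(",")
    return int(ts), sensor, float(value)


def tumbling_window_avg(lines, window_s):
    """Average each sensor's values over fixed, non-overlapping windows."""
    acc = defaultdict(lambda: [0.0, 0])  # (window_start, sensor) -> [sum, count]
    for line in lines:
        ts, sensor, value = parse_event(line)
        window_start = ts - ts % window_s
        acc[(window_start, sensor)][0] += value
        acc[(window_start, sensor)][1] += 1
    return [
        {"window_start": w, "sensor": s, "avg": total / count, "count": count}
        for (w, s), (total, count) in sorted(acc.items())
    ]
```

For the lines `100,a,1.0`, `105,a,3.0`, `112,a,5.0`, and `101,b,10.0` with 10-second windows, sensor `a` averages 2.0 in the window starting at 100 and 5.0 in the window starting at 110.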

Page 10: 10 Amazing Things To Do With a Hadoop-Based Data Lake

Pivotal Business Data Lake Architecture

Page 11: 10 Amazing Things To Do With a Hadoop-Based Data Lake

5. Apply Structure to Unstructured / Semi-Structured Data

Flexible processing of different data types.
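A common example of semi-structured data is a web server log. The Python sketch below applies structure to lines in the Apache Common Log Format with a regular expression, turning each line into a typed record. The output field names and the choice to skip lines that do not match are assumptions for illustration.

```python
import re
from datetime import datetime

# Apache Common Log Format: host ident user [time] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def structure(line):
    """Turn one raw log line into a typed record, or None if it does not match."""
    m = CLF.match(line)
    if m is None:
        return None
    d = m.groupdict()
    return {
        "host": d["host"],
        "user": None if d["user"] == "-" else d["user"],
        "time": datetime.strptime(d["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": d["method"],
        "path": d["path"],
        "status": int(d["status"]),
        "bytes": 0 if d["size"] == "-" else int(d["size"]),  # "-" means no body
    }


def structure_all(lines):
    """Keep only the lines that parse; the raw data stays untouched elsewhere."""
    return [r for r in (structure(line) for line in lines) if r is not None]
```

Parsing `127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326` yields a record with `status` 200, `bytes` 2326, and a timezone-aware `time`; a line that does not match returns `None` and is skipped by `structure_all`.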

Page 12: 10 Amazing Things To Do With a Hadoop-Based Data Lake

6. Make Data Available for MPP SQL Analysis

Fast processing for advanced analytics in many supported HDFS formats.

[Diagram: a Hadoop cluster in which master services (Name Node, Resource Manager, HAWQ Master) coordinate several Data Nodes, each running a Node Manager and HAWQ Segment(s).]
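The diagram reflects the usual MPP pattern: the master plans a query, each segment works on its local slice of the data in parallel, and the master merges the partial results. The Python sketch below simulates that scatter-gather for an `AVG` grouped by region, run sequentially for clarity. The table, the round-robin distribution, and the function names are assumptions; a real MPP database typically hash-distributes rows on a key.

```python
from collections import defaultdict


def distribute(rows, n_segments):
    """Spread rows across segments round-robin (deterministic for this sketch)."""
    segments = [[] for _ in range(n_segments)]
    for i, row in enumerate(rows):
        segments[i % n_segments].append(row)
    return segments


def segment_partial(rows):
    """Runs on each segment: partial SUM and COUNT per region."""
    acc = defaultdict(lambda: [0.0, 0])
    for region, amount in rows:
        acc[region][0] += amount
        acc[region][1] += 1
    return {region: (s, c) for region, (s, c) in acc.items()}


def master_merge(partials):
    """Runs on the master: add partials, then divide once to get AVG."""
    total = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for region, (s, c) in part.items():
            total[region][0] += s
            total[region][1] += c
    return {region: s / c for region, (s, c) in sorted(total.items())}
```

The master merges sums and counts rather than averaging each segment's average: with the rows `[("east", 10.0), ("west", 4.0), ("east", 20.0), ("east", 60.0), ("west", 8.0)]` on three segments, the correct east average is 30.0, while averaging the per-segment averages would give 27.5.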

Page 13: 10 Amazing Things To Do With a Hadoop-Based Data Lake

7. Achieve Data Integration

Create multi-dimensional analytical models.
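A multi-dimensional model lets analysts slice one measure by any combination of dimensions. The Python sketch below computes every such combination for a small fact table, which is what SQL's `GROUP BY CUBE` produces. The dimension names and the revenue measure are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("region", "product", "quarter")  # hypothetical dimensions


def cube(facts, measure):
    """SUM(measure) for every subset of DIMS.

    Keys are tuples of (dimension, value) pairs; the empty tuple () holds
    the grand total.
    """
    out = defaultdict(float)
    for row in facts:
        for r in range(len(DIMS) + 1):
            for dims in combinations(DIMS, r):
                key = tuple((d, row[d]) for d in dims)
                out[key] += row[measure]
    return dict(out)
```

The grand total is stored under the empty key `()`, and per-region totals under keys such as `(("region", "east"),)`; adding a dimension multiplies the number of cells, so these models grow quickly with the data.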

Page 14: 10 Amazing Things To Do With a Hadoop-Based Data Lake

8. Improve Machine Learning & Predictive Analytics

Richer, deeper data sets for more accurate predictive analytics.

[Diagram: HAWQ Master with multiple HAWQ Segments]
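One reason predictive analytics parallelizes well on an MPP engine is that many models can be fit from statistics that add up across segments. The sketch below fits a simple linear regression `y = a + b*x` that way: each segment computes n, Σx, Σy, Σx², and Σxy over its own rows, and the master sums them and solves the least-squares equations. It is a minimal illustration in Python, not how HAWQ or any particular library implements regression.

```python
def partial_stats(points):
    """Runs on each segment: sufficient statistics for its slice of (x, y) rows."""
    n, sx, sy, sxx, sxy = 0, 0.0, 0.0, 0.0, 0.0
    for x, y in points:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return n, sx, sy, sxx, sxy


def combine(parts):
    """Runs on the master: the statistics are additive, so just sum them."""
    return tuple(sum(p[i] for p in parts) for i in range(5))


def fit_line(stats):
    """Ordinary least squares for y = a + b*x from combined statistics."""
    n, sx, sy, sxx, sxy = stats
    denom = n * sxx - sx * sx
    if n < 2 or denom == 0:
        raise ValueError("need at least two distinct x values")
    b = (n * sxy - sx * sy) / denom
    a = (sy - b * sx) / n
    return a, b
```

Splitting the six points on the line y = 1 + 2x for x = 0 through 5 across three segments recovers `a = 1.0` and `b = 2.0`, the same result as a single-node fit.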

Page 15: 10 Amazing Things To Do With a Hadoop-Based Data Lake

9. Deploy Real-Time Automation at Scale

Respond in real time, at scale. Archive history in Hadoop.

[Diagram: web apps served by in-memory Pivotal GemFire XD]
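The pattern on this slide is to answer applications from memory and push history to Hadoop in the background. The Python sketch below is a hypothetical, single-process stand-in: a dictionary holds the current state, and changes are queued and appended to an archive list (standing in for HDFS) in batches, a style often called write-behind. It is not the GemFire XD API.

```python
from typing import Dict, List, Optional, Tuple


class InMemoryStore:
    """Hypothetical sketch: latest state in memory, history archived in batches."""

    def __init__(self, archive_batch_size: int = 100):
        self.state: Dict[str, float] = {}
        self.archive: List[List[Tuple[str, float]]] = []  # stand-in for HDFS files
        self._pending: List[Tuple[str, float]] = []
        self.batch_size = archive_batch_size

    def put(self, key: str, value: float) -> None:
        self.state[key] = value  # fast path: apps read this immediately
        self._pending.append((key, value))
        if len(self._pending) >= self.batch_size:
            self.flush()

    def get(self, key: str) -> Optional[float]:
        return self.state.get(key)

    def flush(self) -> None:
        """Write queued changes to the archive as one batch."""
        if self._pending:
            self.archive.append(self._pending)
            self._pending = []
```

With `archive_batch_size=2`, two updates to `acct-1` are archived together while `get("acct-1")` immediately returns the latest value; a third update stays pending until the next full batch or an explicit `flush()`.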

Page 16: 10 Amazing Things To Do With a Hadoop-Based Data Lake

10. Achieve Continuous Innovation at Scale

Deploy automation at scale
Capture and store all data
Analyze to discover insights and algorithms

Page 17: 10 Amazing Things To Do With a Hadoop-Based Data Lake

Increase Value Derived from Data With a Data Lake

Store massive data sets
Mix disparate data
Ingest bulk data
Ingest high-velocity data
Apply structure
Enable MPP analysis
Achieve data integration
Improve predictive analytics
Deploy real-time automation at scale
Achieve continuous innovation

Business Value

Page 18: 10 Amazing Things To Do With a Hadoop-Based Data Lake

For more information on Pivotal Big Data Suite, visit Pivotal.io/big-data

Page 19: 10 Amazing Things To Do With a Hadoop-Based Data Lake