10 Amazing Things To Do With a Hadoop-Based Data Lake
© 2014 Pivotal Software, Inc. All rights reserved.
Strata Conference New York 2014
Greg Chase, Director, Product Marketing, Pivotal Software
Pivotal Business Data Lake Architecture
[Architecture diagram]
Tiers: Sources, Ingestion Tier, Distillation Tier, Processing Tier, Insights Tier, Action Tier, Unified Operations Tier
Sources: clickstream, sensor data, weblogs, network data, CRM data, ERP data
Storage: Pivotal HD (unstructured and structured data)
Components shown: Sqoop, Flume, Spring XD, Oozie, GemFire XD, HAWQ/Greenplum, HBase, MapReduce, Hive, Pig, query interfaces, Command Center, GemFire, RabbitMQ, Redis, Pivotal CF
1. Store Massive Data Sets
Scale-out: use commodity hardware and storage.
[Diagram: Rack 1, Rack 2, Rack 3, … Rack n]
2. Mix Disparate Data Sources
Schema flexibility: absorb different data types from data sources.
Example sources: sensor data, CRM data, website click streams.
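One way to picture the schema flexibility named on this slide is schema-on-read: records with different shapes are stored as-is, and a structure is chosen only when they are queried. The sketch below is a minimal illustration under that assumption and is not taken from the talk. The field names, the sample records, and the `project` helper are all hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List

# Hypothetical records from three kinds of sources, stored as raw JSON lines.
# Each record has its own shape; no shared schema is enforced at write time.
raw_lines = [
    json.dumps({"source": "sensor", "device_id": "s-1", "temp_c": 21.5}),
    json.dumps({"source": "crm", "customer_id": 42, "name": "Example Corp"}),
    json.dumps({"source": "clickstream", "customer_id": 42, "url": "/pricing"}),
]


def project(lines: Iterable[str], fields: List[str]) -> List[Dict[str, Any]]:
    """Apply a schema at read time: keep only `fields`, filling gaps with None."""
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append({field: record.get(field) for field in fields})
    return rows


# The same raw data can be read through whichever schema a query needs.
view = project(raw_lines, ["source", "customer_id"])
```

Records that lack a requested field simply yield `None` for it, so new source types can be added without rewriting data that is already stored.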
3. Ingest Bulk Data
Scalable open source tools for batch loading data.
Modes: batch, microbatch
Sqoop: bulk load; RDBMS
Spring XD: bulk load; with processing; with analytics; any source
Flume: event driven; any source
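The slide contrasts batch and microbatch loading. As a rough illustration of the microbatch idea only, the sketch below groups an incoming sequence of records into small fixed-size batches that could each be handed to a loader. It does not use Sqoop, Spring XD, or Flume. The `microbatches` function, the batch size, and the sample rows are assumptions made for this example.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def microbatches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `batch_size` records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # source exhausted
        yield batch


# Hypothetical input: seven rows split into batches of up to three.
rows = [{"id": i} for i in range(7)]
batch_sizes = [len(batch) for batch in microbatches(rows, 3)]
```

A full batch load corresponds to one very large batch, while a smaller `batch_size` trades throughput for lower latency between arrival and availability.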
4. Ingest High-Velocity Data
Capture all volatile data. Apply structure.
Input: streaming data
Spring XD: bulk load; real-time ingest; with processing; with analytics; any source
Pivotal GemFire XD: advanced DB operations; consistency; reliable persistence; convert to structured
5. Apply Structure to Unstructured / Semi-Structured Data
Flexible processing of different data types
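As a small illustration of applying structure to semi-structured data, the sketch below parses one web server log line into a typed record. It assumes the input follows the Common Log Format, which the talk does not specify. The pattern, the `parse_log_line` helper, and the sample line are all made up for this example.

```python
import re
from typing import Dict, Optional

# Assumed input: Common Log Format, e.g.
# host ident user [time] "METHOD path PROTOCOL" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line: str) -> Optional[Dict[str, object]]:
    """Return a structured record, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "host": fields["host"],
        "time": fields["time"],
        "method": fields["method"],
        "path": fields["path"],
        "protocol": fields["protocol"],
        "status": int(fields["status"]),
        # A size of "-" means no body was sent; treat it as zero bytes.
        "size": 0 if fields["size"] == "-" else int(fields["size"]),
    }


sample = '203.0.113.7 - - [15/Oct/2014:09:30:00 -0400] "GET /index.html HTTP/1.1" 200 5120'
parsed = parse_log_line(sample)
```

Lines that do not match return `None` instead of raising, so malformed input can be counted or set aside without stopping the load.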
6. Make Data Available for MPP SQL Analysis
Fast processing for advanced analytics in many supported HDFS formats.
[Diagram: Hadoop cluster. Master services: Name Node, Resource Manager, HAWQ Master. Four worker nodes, each running a Data Node, a Node Manager, and HAWQ Segment(s).]
7. Achieve Data Integration
Create multi-dimensional analytical models.
8. Improve Machine Learning & Predictive Analytics
Richer, deeper data sets for accurate predictive analytics.
[Diagram: HAWQ Master with three HAWQ Segment(s)]
9. Deploy Real-Time Automation at Scale
Respond in real-time, at scale.
Archive history in Hadoop.
[Diagram: three web apps connected to Pivotal GemFire XD (in-memory)]
10. Achieve Continuous Innovation at Scale
Deploy automation at scale
Capture and store all data
Analyze to discover insights & algorithms
Increase Value Derived from Data With a Data Lake
[Chart axis: Business Value]
Store massive data sets
Mix disparate data
Ingest bulk data
Ingest high-velocity data
Apply structure
Enable MPP analysis
Achieve data integration
Improve predictive analytics
Deploy real-time automation at scale
Achieve continuous innovation
For more information on Pivotal Big Data Suite, visit Pivotal.io/big-data