Create your Big Data vision and Hadoop-ify your data warehouse
Transcript of Create your Big Data vision and Hadoop-ify your data warehouse
1 © Copyright 2012 EMC Corporation. All rights reserved.
Create Your Big Data Vision And Hadoop-ify Your Data Warehouse
Jeff Kelly, Principal Research Contributor
The Wikibon Project
Bill Schmarzo, CTO EIMA Practice, EMC Professional Services
Agenda
• Current Market Observations
• The Big Data Business Maturity Index and How to Identify Your Best Use Case
• Get Started With Hadoop and Other New Technologies
• What Should You Look For in a Vendor?
• Q&A
Current Market Observations
Jeff Kelly
Big Data Market Size
2012 $11.4b
2013 $18.2b
2014 $28b
2015 $37.9b
2016 $43.7b
2017 $48b
✓ 59% Growth Y-o-Y 2011 to 2012
✓ Forecast 60%+ Growth in 2013
✓ 31% CAGR Forecast 2012 through 2017
Source: Wikibon Big Data Vendor Revenue and Market Forecast, 2012-2017
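The growth figures on this slide can be sanity-checked from the yearly market sizes. The sketch below is a reader's check, not part of the Wikibon forecast: the helper names `cagr` and `yoy_growth` are made up for illustration, and the inputs are the rounded dollar values shown on the slide. Using only the 2012 and 2017 endpoints, the compound rate comes out at roughly 33%, a little above the 31% quoted on the slide; the slide's figure is Wikibon's own.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


def yoy_growth(previous: float, current: float) -> float:
    """Simple year-over-year growth rate."""
    return current / previous - 1


# Market size by year in $ billions, copied from the slide (rounded values).
market_size = {2012: 11.4, 2013: 18.2, 2014: 28.0,
               2015: 37.9, 2016: 43.7, 2017: 48.0}

growth_2013 = yoy_growth(market_size[2012], market_size[2013])
cagr_2012_2017 = cagr(market_size[2012], market_size[2017], years=2017 - 2012)

print(f"2013 growth over 2012: {growth_2013:.1%}")
print(f"Endpoint CAGR 2012-2017: {cagr_2012_2017:.1%}")
```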
Big Data Market Segmentation, 2012: Services Leading the Way
Professional Services: $3,784m (34%)
Cloud and SaaS: $608m (5%)
Segments charted: Professional Services, Compute, Storage, Networking, Database, Applications, Data Management, Cloud
n = $11,400m
Big Data Growth Drivers
✓ Increased Awareness and Investments By Large Enterprises Beyond the Web
  ✓ Retailers like Sears are leveraging Big Data for price optimization.
  ✓ Financial services firms, including JPMC, Morgan Stanley and BoA, conduct fraud analysis, risk profiling and more.
  ✓ Pharmaceutical makers, including Bristol-Myers Squibb, use Big Data to support drug development.
✓ Continued Investment by Web Pioneers and Three Letter Agencies
  ✓ Google alone spent $1b+ on infrastructure in Q4 2012.
  ✓ “Everything we do is a Big Data problem.” – Jay Parikh, VP of Engineering, Facebook
  ✓ CIA CTO Ira Hunt: Our mission is to “collect everything and hang on to it forever.”
Big Data Growth Drivers, Cont.
✓ Increasingly Sophisticated Professional Services
  ✓ Professional services building on experience of assisting early adopters.
  ✓ Some (but not all) are vendor and product agnostic.
  ✓ Focusing on identifying use cases, improving communication, and leveraging existing assets.
✓ Technology Maturation
  ✓ Open source community and vendors making Hadoop enterprise-ready, easier to use.
  ✓ Better integration between Big Data and existing IT infrastructure.
  ✓ Extending Big Data accessibility to business users via BI and data visualization tools.
Professional services shown: Consulting, Training & Education, Integration
Big Data Growth Inhibitors
✓ Lack of Data Scientists and Big Data Practitioners
✓ Big Data Technology Still Complex, Difficult to Manage/Use
✓ Organizational Resistance to Data-Driven Decision Making
✓ Confusion Due to Vendor Marketing and “Big Data Washing”
Big Data [Your Product Name Here]
The Big Data Business Maturity Index and How to Identify Your Best Use Case
Bill Schmarzo
Big Data Business Model Maturation Index
Measures the degree to which the organization has integrated big data and advanced analytics into their business model
Stages, from least to most mature:
• Business Monitoring: Monitoring business performance to flag areas of interest
• Business Insights: Integrate insights & recommendations into existing business processes
• Business Optimization: Embed analytics to optimize select business processes
• Data Monetization: Leverage insights to identify new revenue opportunities
• Business Metamorphosis: Transform customer and product insights to move into new markets
How to Identify Your Best Use Case
The Big Data strategy document ensures a tight linkage between your organization’s business initiatives and your big data strategy
• Big data business cases, ROI and analytic requirements
• Key Performance Indicators and leading metrics
• Business questions with metrics, dimensions, hierarchies
• Business decisions, decision flow/process and UEX requirements
• Analytic algorithms and modeling requirements
• Required data sources
Business Strategy: Provide Unique Starbucks Customer Experience
Business Initiatives: • Increase number of “Gold Card” customers • Increase “Gold Card” customer revenue & engagement (store visits, spend per visit, advocacy)
Mobile App • •
Social Media • •
Store Sales • •
Customer Loyalty • •
Collect customer engagement information through multiple channels (store, web, mobile)
Profile and micro-segment customers to improve marketing and offers effectiveness
Analyze social media data to identify and monitor brand advocates
Monitor and adjust customer engagement effectiveness (visits, revenue, margin, advocacy)
Tasks
Develop intimate knowledge of “Gold Card” customers life stage, behaviors and interests
Act upon intimate knowledge of “Gold Card” customers to increase store revenues
• Expand customer data collection points • Leverage “gold card” member transactions, feedback (surveys) and social data • Integrate customer-specific insights back into operational, management and loyalty systems
Outcomes & CSF’s
Get Started With Hadoop and Other New Technologies
Bill Schmarzo
A Playbook For Modernizing Your Data Warehouse With New Big Data Technologies And Capabilities
#1) Enhance data warehouse with new unstructured data metrics
#2) Data virtualization to extend existing data warehouse environment
#3) MPP RDBMS to increase data platform scalability and agility
#4) In-database analytics to accelerate analytic development
#5) Hadoop to create the next generation Operational Data Store
#1) Enhance Data Warehouse With New Unstructured Data Metrics
Leverage HDFS to provide a single platform that supports your traditional SQL-based BI environment plus your growing unstructured data needs at scale
HDFS
HBase
Pig, Hive, Mahout
Map Reduce
Sqoop Flume
Resource Management & Workflow
Yarn
Zookeeper
Apache
Pivotal HD
Configure, Deploy, Monitor, Manage
Command Center
Hadoop Virtualization (HVE)
DataLoader
Xtension Framework
Catalog Services
Query Optimizer
Dynamic Pipelining
ANSI SQL + Analytics
HAWQ – Advanced Database Services
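To make "new unstructured data metrics" concrete, here is a minimal sketch in plain Python of the map-and-reduce pattern that turns free-text comments into per-customer counts a warehouse table could hold. Everything in it is hypothetical: the sample comments, the keyword lists, and the metric names are illustrations only, and on a real cluster the same logic would run through the Hadoop tooling named on the slide, not a single script.

```python
import re
from collections import Counter

# Hypothetical raw input: (customer_id, free-text comment).
comments = [
    ("C001", "Love the new mobile app, great rewards!"),
    ("C002", "Long wait and the order was wrong. Terrible."),
    ("C001", "Great service again today."),
    ("C002", "App keeps crashing, terrible experience."),
]

# Hypothetical keyword lists used to derive simple sentiment metrics.
POSITIVE = {"love", "great"}
NEGATIVE = {"terrible", "wrong", "crashing"}


def map_comment(customer_id, text):
    """Map step: emit one (customer_id, metric_name) pair per observation."""
    yield (customer_id, "comment_count")
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in POSITIVE:
            yield (customer_id, "positive_terms")
        elif word in NEGATIVE:
            yield (customer_id, "negative_terms")


def reduce_counts(pairs):
    """Reduce step: count how often each (customer_id, metric_name) occurs."""
    return Counter(pairs)


metrics = reduce_counts(
    pair for cid, text in comments for pair in map_comment(cid, text)
)

# Flatten to (customer_id, metric_name, value) rows, ready to load into
# a structured table alongside existing warehouse measures.
rows = sorted((cid, name, value) for (cid, name), value in metrics.items())
for row in rows:
    print(row)
```

The output rows are ordinary structured records, which is the point of the slide: once the text has been reduced to metrics, the existing SQL-based BI environment can query them like any other measure.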
ETL
Cached Streaming Data
Unified Data Platform
Data Source
Real-Time Visualization
Advanced Analytics and Modeling
Data Source
CEP/ Workflow
Data Federation Tool
Semantic Master
Data Discovery /
Data Mapping
Data Source
Data Source
#2) Extend Existing Data Warehouse Via Data Virtualization
Leverage data federation tools to speed data discovery and analysis via virtual, on-demand access to data sources outside your EDW
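The idea of virtual, on-demand access can be illustrated with a small sketch using Python's built-in `sqlite3` module. This is a stand-in, not a data federation product: one in-memory database plays the EDW, a second attached database plays an outside source, and a temporary view joins them at query time without copying rows. The table names, view name, and sample data are all hypothetical.

```python
import sqlite3

# The main in-memory database stands in for the EDW (assumption).
conn = sqlite3.connect(":memory:")
# A second, separate in-memory database stands in for an outside source.
conn.execute("ATTACH DATABASE ':memory:' AS ext")

conn.execute("CREATE TABLE main.sales (customer_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO main.sales VALUES (?, ?)",
    [("C001", 12.5), ("C002", 4.0), ("C001", 7.5)],
)

conn.execute("CREATE TABLE ext.social (customer_id TEXT, mentions INTEGER)")
conn.executemany(
    "INSERT INTO ext.social VALUES (?, ?)",
    [("C001", 3), ("C002", 9)],
)

# The view is only a definition: no data is moved into the "EDW".
# A TEMP view is used because SQLite allows temporary views to
# reference tables in more than one attached database.
conn.execute("""
    CREATE TEMP VIEW customer_360 AS
    SELECT s.customer_id,
           SUM(s.amount) AS total_spend,
           x.mentions
    FROM main.sales AS s
    JOIN ext.social AS x ON x.customer_id = s.customer_id
    GROUP BY s.customer_id, x.mentions
""")

result = conn.execute(
    "SELECT customer_id, total_spend, mentions "
    "FROM customer_360 ORDER BY customer_id"
).fetchall()
for row in result:
    print(row)
```

Each query against `customer_360` reads both sources at that moment, which mirrors the trade-off of federation: faster discovery with no ETL step, at the cost of depending on the outside source at query time.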
#3) Massively Parallel Processing (MPP) Relational Databases
• Massively Parallel Processing (MPP), scale-out architectures provide cost-effective options for managing and analyzing massive data volumes
• MPP data warehouses provide linear scalability on general-purpose, commodity systems (e.g., fault-tolerant scale-out environment; automatic parallelization; I/O optimized)
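The "automatic parallelization" behind MPP scale-out can be sketched as a scatter-gather aggregate: rows are hash-distributed across segments, each segment computes a partial result on its own slice, and the partial results are combined. The sketch below is a toy model in Python, not any vendor's implementation; the segment count, function names, and sample rows are assumptions, and threads are used only to show the pattern, not to demonstrate real speedup.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # hypothetical number of segment servers


def segment_for(key: str) -> int:
    """Stable hash distribution: the same key always lands on the same segment."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SEGMENTS


def distribute(rows):
    """Scatter step: assign each (key, value) row to a segment by hashing its key."""
    segments = [[] for _ in range(NUM_SEGMENTS)]
    for key, value in rows:
        segments[segment_for(key)].append((key, value))
    return segments


def local_sum(segment_rows):
    """Work done independently on one segment: SUM(value) GROUP BY key."""
    totals = defaultdict(float)
    for key, value in segment_rows:
        totals[key] += value
    return dict(totals)


def parallel_sum_by_key(rows):
    """Run the local aggregate on every segment, then gather the results."""
    segments = distribute(rows)
    with ThreadPoolExecutor(max_workers=NUM_SEGMENTS) as pool:
        partials = list(pool.map(local_sum, segments))
    merged = {}
    for partial in partials:
        # Keys are disjoint across segments because rows were distributed
        # on the grouping key, so a plain merge is enough.
        merged.update(partial)
    return merged


rows = [("east", 10.0), ("west", 5.0), ("east", 2.5),
        ("north", 1.0), ("west", 4.0)]
totals = parallel_sum_by_key(rows)
print(totals)
```

Because each segment only touches its own rows, adding segments spreads the same work over more machines, which is the intuition behind the linear scalability claim on the slide.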
#4) In-Database Processing And Analytics
Conventional: A Data Scientist needs to move 1 TB of data from a 5-processor database server to the analytical server at 1 gigabit per second (Gbps)
In-Database: A Data Scientist leaves the 1 TB of data in the 5-processor database server and runs the same algorithm directly in the database
[Chart: total time in minutes, Conventional vs. In-Database]
Conventional: Data Movement Time = (1 TB x 8 bits/byte) / 1 Gbps / 60 s/min = 133.3 minutes; Processing Time = 60 minutes; Total Time = 193.3 minutes
In-Database: Total Time = 12 minutes
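The arithmetic on this slide can be reproduced in a few lines. The sketch below follows the slide's formula for data movement, using decimal units (1 TB = 1,000 GB). The slide does not say how the 12-minute in-database figure was derived; dividing the 60-minute processing time evenly across the 5 processors is an assumption made here because it yields the same number.

```python
def transfer_minutes(data_terabytes: float, link_gigabits_per_sec: float) -> float:
    """Minutes needed to move the data over the link (decimal units)."""
    gigabits = data_terabytes * 1000 * 8  # TB -> GB -> gigabits
    seconds = gigabits / link_gigabits_per_sec
    return seconds / 60


PROCESSING_MINUTES = 60.0  # from the slide
PROCESSORS = 5             # from the slide

# Conventional: move the data first, then process it on the analytical server.
movement = transfer_minutes(data_terabytes=1.0, link_gigabits_per_sec=1.0)
conventional_total = movement + PROCESSING_MINUTES

# In-database: no data movement. ASSUMPTION: the work splits evenly
# across the 5 processors, which matches the slide's 12 minutes.
in_database_total = PROCESSING_MINUTES / PROCESSORS

print(f"Data movement:      {movement:.1f} minutes")
print(f"Conventional total: {conventional_total:.1f} minutes")
print(f"In-database total:  {in_database_total:.1f} minutes")
```

Note that data movement alone takes more than twice as long as the processing it feeds, so most of the saving comes from leaving the data where it is.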
Hadoop Data Store Analytics Environment
Data Preparation and Enrichment
ALL data fed into Hadoop Data Store
EDW ETL
BI Environment: • Production • Predictable load • SLA-driven • Standard tools
Analytic Sandbox: • Exploratory, Ad Hoc • Unpredictable load • Experimentation • Best tool for the job
#5) Next Gen Operational Data Store/Data Prep With Hadoop
Feeds production BI and Enterprise Data Warehouse environment and high-velocity Analytics Sandbox
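The flow on this slide (land all data, prepare and enrich it once, then feed two different consumers) can be sketched in plain Python. This is an illustration of the data flow only, not Hadoop code: the record layout, the `prepare` rules, and the two feed functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw events, landed as-is in the data store (strings, errors and all).
raw_store = [
    {"customer": "C001", "channel": "store", "amount": "12.50"},
    {"customer": "C002", "channel": "mobile", "amount": "4.00"},
    {"customer": "C001", "channel": "mobile", "amount": "bad-value"},
    {"customer": "C001", "channel": "web", "amount": "7.50"},
]


def prepare(record):
    """Data preparation and enrichment: clean types, add a derived flag.

    Returns None for records that cannot be repaired.
    """
    try:
        amount = float(record["amount"])
    except ValueError:
        return None
    is_digital = record["channel"] in {"mobile", "web"}
    return {**record, "amount": amount, "is_digital": is_digital}


prepared = [r for r in map(prepare, raw_store) if r is not None]


def edw_feed(records):
    """Production feed: a small, predictable aggregate per customer."""
    totals = defaultdict(float)
    for r in records:
        totals[r["customer"]] += r["amount"]
    return dict(totals)


def sandbox_feed(records):
    """Sandbox feed: full prepared detail for exploratory, ad hoc analysis."""
    return list(records)


edw_rows = edw_feed(prepared)
sandbox_rows = sandbox_feed(prepared)
print(edw_rows)
print(len(sandbox_rows), "detailed records available to the sandbox")
```

Preparing the data once and feeding both consumers keeps the production path predictable while giving the sandbox the full prepared detail it needs for experimentation.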
How To Get Started
EMC Big Data Analytics Strategy And Implementation Services
Analytics Operationalization
Identify current state, determine required state and conduct gap analysis to develop analytics implementation roadmap
Analytics Lab
Deploy analytics sandbox to quantify the business case
Vision Workshop
Identify big data analytics business use cases
Repeat the process for identified business cases
What Should You Look For in a Vendor?
Jeff Kelly
Advice for Selecting Big Data Vendors
✓ Balance short-term goals with long-term vision.
✓ Objectives are:
  ✓ Quick, demonstrable ROI.
  ✓ Sustainable Big Data practice.
✓ Don’t get hung up on “speeds and feeds” or feature-by-feature comparisons.
✓ Focus on substance, flexibility, commitment and experience.
Selecting Big Data Vendors, Cont.
✓ Evaluate product portfolios based on:
  ✓ Ability to monetize existing and future data assets.
  ✓ Ability to integrate with and complement existing data management technology.
  ✓ Accessibility to power users and business users alike (depending on use case).
  ✓ Ability to apply information governance and security best practices.
✓ Select service providers with track records of helping enterprises adopt a data-driven culture as well as technology.
Questions and Answers
To type a question via WebEx, click on the Q&A tab. Please select “Ask: All Panelists” to ensure your questions reach us. Thank you!
Learn More…
• See us at…
  – EMC World, May 5-9 www.emc.world.com
• Contact Jeff Kelly
  – Email: [email protected]
  – LinkedIn: http://www.linkedin.com/in/jeffreyfkelly/
  – Twitter: @jeffreyfkelly
  – Research: http://www.wikibon.org/bigdata
• Contact Bill Schmarzo
  – Email: [email protected]
  – LinkedIn: http://www.linkedin.com/in/schmarzo
  – Twitter: @schmarzo
  – Blog: http://infocus.emc.com/author/william_schmarzo/