The Value of the Modern Data Architecture with Apache Hadoop and Teradata
hortonworks -
© Hortonworks Inc. 2013
The Value of a Modern Data Architecture with Apache Hadoop and Teradata
Today’s Topics
• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop’s role in the MDA
• EDW’s role in the MDA
• Q&A
Existing Data Architecture
Applications: Custom Analytic App, Packaged Analytic App
Data Systems: RDBMS, EDW, Discovery Platform
Data Sources: Traditional Sources (RDBMS, OLTP, OLAP)
Big Data Explosion
Big Data Market Trends & Projections
• 20%: the margin by which organizations leveraging modern information management systems outperform peers by 2015
• 1 Zettabyte (ZB) = 1 billion TBs
• 15x: growth rate of machine-generated data by 2020
• The US has 1/3 of the world’s data
• Big Data is 1 of 5 US GDP game changers: $325 billion incremental annual GDP from big data analytics in retail and manufacturing by 2020
Traditional Data Architecture Pressured
Applications: Business Analytics, Custom Applications, Packaged Applications
Data Systems: RDBMS, EDW, Discovery Platform
Data Sources: OLTP, POS systems; Traditional Sources (RDBMS, OLTP, OLAP); New Sources (sentiment, click, geo, sensor, …)
Source: IDC
• 2.8 ZB in 2012
• 85% from new data types
• 15x machine data by 2020
• 40 ZB by 2020
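To put the 2.8 ZB (2012) and 40 ZB (2020) figures in perspective, the short Python sketch below works out the overall growth multiple and the compound annual growth rate they imply. It assumes growth compounds at a constant yearly rate, which the slides do not claim; the function and variable names are illustrative only.

```python
# Figures quoted on the slides (attributed there to IDC).
ZB_2012 = 2.8                # zettabytes of data in 2012
ZB_2020 = 40.0               # projected zettabytes by 2020
TB_PER_ZB = 1_000_000_000    # slide: 1 ZB = 1 billion TBs


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Return the constant yearly rate that grows `start` into `end`.

    Assumption: smooth compounding, used here only for illustration.
    """
    return (end / start) ** (1 / years) - 1


years = 2020 - 2012
overall_multiple = ZB_2020 / ZB_2012
cagr = compound_annual_growth(ZB_2012, ZB_2020, years)

print(f"2012 volume in TB: {ZB_2012 * TB_PER_ZB:,.0f}")
print(f"Overall growth multiple: {overall_multiple:.1f}x")
print(f"Implied compound annual growth: {cagr:.1%}")
```

Under that assumption the data volume grows roughly 14-fold over eight years, or a little under 40% per year.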
Modern Data Architecture Enabled
Applications: Business Analytics, Custom Applications, Packaged Applications
Dev & Data Tools: Build & Test
Operational Tools: Manage & Monitor
Data Systems: RDBMS, EDW, Discovery Platform
Data Sources: OLTP, POS systems; Traditional Sources (RDBMS, OLTP, OLAP); New Sources (sentiment, click, geo, sensor, …)
What Data is Being Stored in Hadoop?
1. Social: Understand how your customers feel about your brand and products – right now
2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines
4. Geolocation: Analyze location-based data to manage operations where they occur
5. Server Logs: Research logs to diagnose process failures and prevent security breaches
6. Unstructured (text, video, pictures, etc.): Understand patterns in text across millions of unstructured work products: web pages, emails, video, pictures and documents
Modern Data Architecture Applied
Infrastructure: Shared Data Lake
• Store all data and build/enable applications on a shared “data lake”
• As organizations mature, they move toward this as a goal for Hadoop
• Delivers broad value across the enterprise
Applications: Custom Analytic App, Packaged Analytic App
Data Systems: RDBMS, EDW, Discovery Platform, Shared Data Lake
Data Sources: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (sentiment, click, geo, sensor, …)
Drivers for Hadoop Adoption
Driving Efficiency, Driving Opportunity
• Modern Data Architecture: Hadoop has a central role in next-generation data architectures while integrating with existing data systems
• Business Applications: Use Hadoop to extract insights that enable new customer value and competitive edge
Big Data Sets
• Existing, traditional: Server log, Clickstream
• Emerging: Sentiment/Social, Machine/Sensor, Geo-locations
Requirements for Hadoop Adoption
3 Requirements for Hadoop’s Role in the Modern Data Architecture
• Integrated: Interoperable with existing data center investments
• Key Services: Platform, operational and data services essential for the enterprise
• Skills: Leverage your existing skills: development, operations, analytics
Interoperating With Your Tools
Applications: Microsoft Applications
Data Systems
Dev & Data Tools
Operational Tools: Viewpoint
Data Sources: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (sentiment, click, geo, sensor, …)
14 2/28/14 Teradata Confidential
Shift from a Single Platform to an Ecosystem
“Big Data requirements are solved by a range of platforms including analytical databases, discovery platforms, and NoSQL solutions beyond Hadoop.”
“We will abandon the old models based on the desire to implement for high-value analytic applications.”
"Logical" Data Warehouse
Source: “Big Data Comes of Age”. EMA and 9sight Consulting. Nov 2012.
UNIFIED DATA ARCHITECTURE
Access, Move, Manage
Users: Marketing, Executives, Operational Systems, Frontline Workers, Customers, Partners, Engineers, Data Scientists, Business Analysts
Analytic Tools: Math and Stats, Data Mining, Business Intelligence, Applications, Languages, Marketing
Platforms: Discovery Platform, Integrated Data Warehouse, Data Platform
Sources: ERP, SCM, CRM, Images, Audio and Video, Machine Logs, Text, Web and Social
TERADATA UNIFIED DATA ARCHITECTURE
Access, Move, Manage
Users: Marketing, Executives, Operational Systems, Frontline Workers, Customers, Partners, Engineers, Data Scientists, Business Analysts
Analytic Tools: Math and Stats, Data Mining, Business Intelligence, Applications, Languages, Marketing
Integrated Data Warehouse: Business Intelligence, Predictive Analytics, Operational Intelligence
Discovery Platform: Data Discovery; Path, graph, time-series analysis; Pattern Detection
Data Platform: Fast Loading, Filtering and Processing, Online Archival
Sources: ERP, SCM, CRM, Images, Audio and Video, Machine Logs, Text, Web and Social
Teradata Appliance for Hadoop
Value-Added Software Bringing Hadoop to the Enterprise
• Access: SQL-H™, Teradata Studio
• Management: Viewpoint, TVI
• Administration: Hadoop Builder, intelligent start/stop, DataNode swap, deferred drive replace
• High Availability: NameNode HA, Master Machine Failover
Refining, Metadata, Entity Resolution
Security & Data Access
HCatalog Kerberos Kerberos
Modern Data Architecture Details
Source Data: Sensor Log Data, Customer/Inventory Data, Clickstream Data, Flat Files, Sentiment Analysis Data
Source Interfaces: DB, File, JMS, REST, HTTP, Streaming
Extract / Load: Sqoop, Flume, WebHDFS, NFS
Hadoop Core: HDFS, YARN, MapReduce
Refine: Hive, Pig, Custom, ETL
Structuring: HCatalog (metadata services)
Interactive: Teradata SQL-H
Export: Sqoop / Hive
Load: TDCH
Analytical Platforms: Teradata IDW, Aster Discovery Platform
Query/Visualization/Reporting/Analytical Tools and Apps: JDBC/ODBC-compliant tools
Knox, Ambari
Viewpoint: Alerts, Services, System Health, Node Health, Space Usage, Capacity Heatmap, Metrics Analysis
TVI: Proactive system monitoring tied to Teradata customer support
Teradata Vital Infrastructure (TVI)
PROACTIVE RELIABILITY, AVAILABILITY, AND MANAGEABILITY
Server Management VMS: 1U server virtualizes system and cabinet management software
• Cabinet Management Interface Controller (CMIC)
• Service Work Station (SWS)
• Automatically installed on base/first cabinet
VMS allows full rack solutions without additional cabinet for traditional SWS
Eliminates need for expansion racks, reducing customers’ floor space and energy costs
Supports Teradata hardware and Hadoop software
TVI Support for Hadoop
62–70% of Incidents Discovered through TVI
Standard SQL Access to Hadoop Data
• Trusted: Use existing tools/skills and enable self-service BI with granular security
• Standard: 100% ANSI SQL access to Hadoop data
• Fast: Queries run on Teradata or Aster, data accessed from Hadoop
• Efficient: Intelligent data access leveraging the Hadoop HCatalog
[Diagram: Teradata SQL-H and Aster SQL-H reach the Hadoop layer (HDFS, Pig, Hive, Hadoop MR) through HCatalog; flow labels: Data, Data Filtering]
Give business users on-the-fly access to data in Hadoop
Teradata Unified Data Architecture™ Partners Support Many Layers
Teradata Aster Discovery Portfolio: Accelerate Time to Insights
Some of the 80+ out-of-the-box analytical apps
• Path Analysis: Discover patterns in rows of sequential data
• Text Analysis: Derive patterns and extract features in textual data
• Statistical Analysis: High-performance processing of common statistical calculations
• Segmentation: Discover natural groupings of data points
• Marketing Analytics: Analyze customer interactions to optimize marketing decisions
• Data Transformation: Transform data for more advanced analysis
• Graph Analysis: Graph analytics processing and visualization
• SQL-MapReduce Visualization: Graphing and visualization tools linked to key functions of the MapReduce analytics library
More Accurate Customer Churn Prevention
• Hadoop captures, stores and transforms social, images and call records
• Aster does path and pattern analysis
Diagram components:
• Data Sources: Multi-Structured Raw Data (Call Center Voice Records, Clickstream Data, Social Feeds) and Traditional Data Flow (Call Data, Check Data)
• Capture, Retain and Refine Layer: Hadoop, ETL Tools
• Teradata Integrated DW
• Aster Discovery Platform
• Analysis + Marketing Automation (Customer Retention Campaign)
• Data flows: Dimensional Data, Analytic Results, Sentiment Scores
MPP RDBMS + Hadoop Customer Successes
Key Considerations For EDW and Hadoop
MPP RDBMS | Hadoop
Stable Schema | Evolving Schema
Leverages Structured Data | Structure Agnostic
ANSI SQL | Flexible Programming
Iterative Analysis | Batch Analysis
Fine Grain Security | N/A
Cleansed Data | Raw Data
Seeks | Scans
Updates/Deletes | Ingest
Service Level Agreements | Flexibility
Core Data | All Data
Complex Joins | Complex Processing
Efficient Use of CPU/IO | Low Cost of Storage
Complete Consulting and Training
Services Areas of Focus
Teradata Analytic Architecture Services
Services to scope, design, build, operate and maintain an optimal UDA approach for Teradata, Aster, and Hadoop
Teradata DI Optimization
Assess structured/non-structured data, discuss data loading techniques, determine best platform, optimize load scripts/processes
Teradata Big Analytics
Assess data value/cost of capture, identify source of “exhaust” data, create conceptual architecture, refine and enrich the data, implement initial analytics in Aster or best-fit tool
Teradata Workshop for Hadoop
Introduction workshop (across all of UDA)
Teradata Data Staging for Hadoop
Load data into landing area; set up data exploration/refining area; scope architecture and analytics; set up Hadoop repository; load sample data
Teradata Platform for Hadoop
Installation guidance and mentoring for Hadoop platform, D-I-Y after installation
Teradata Managed Services for Hadoop
Operations, management, administration, backup, security, process control for Hadoop
Teradata Training Courses for Hadoop
Two comprehensive, multi-day training offerings: 1) Administration of Apache Hadoop and 2) Developing Solutions Using Apache Hadoop
Discovering Deep Insights in Retail: Transforming Web Walks into DNA Sequences
Situation
Large retailer with 700M visits/year, 2M customers / day look at 1M products online
Problem
Increase ability of web content owners to self-serve insights
Solution
Treat web walks like DNA sequences of simple patterns.
Impact
• Data: Loaded logs into Hortonworks
  • Loaded 2 months of raw data in 1 hour, vs. 1 day on old system
  • Can load a day’s log data in 60 sec
• Sessionize: Creates a sequence for each visit, e.g., boils 20 customer clicks down to 1 line:
  • <Home – Search – Look at Product – Add to Basket – Pay – Exit>
• Analyze: Business analysts can now do path analysis
• Act:
  • Segmentations by behavior can increase conversion rates by 5-10%
  • Web design changes can drive another 10-20% more visitors into the sales funnel
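The sessionize step described above can be illustrated with a small Python sketch. This is not the retailer's or Teradata's implementation: the click records, the 30-minute inactivity timeout, and the function name `sessionize` are all assumptions made for the example. It groups each visitor's clicks into visits and collapses every visit into a single path line, then counts paths as a crude stand-in for path analysis.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Hypothetical click records: (visitor_id, timestamp, page_label).
CLICKS = [
    ("v1", "2014-02-28 10:00:00", "Home"),
    ("v1", "2014-02-28 10:01:00", "Search"),
    ("v1", "2014-02-28 10:02:30", "Look at Product"),
    ("v1", "2014-02-28 10:04:00", "Add to Basket"),
    ("v1", "2014-02-28 10:05:00", "Pay"),
    ("v1", "2014-02-28 10:06:00", "Exit"),
    ("v2", "2014-02-28 10:00:10", "Home"),
    ("v2", "2014-02-28 10:00:40", "Search"),
    ("v2", "2014-02-28 10:01:20", "Exit"),
    ("v1", "2014-02-28 15:00:00", "Home"),
    ("v1", "2014-02-28 15:00:30", "Exit"),
]

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity timeout


def sessionize(clicks, gap=SESSION_GAP):
    """Return one (visitor, path) pair per visit, ordered by visitor and time."""
    by_visitor = defaultdict(list)
    for visitor, ts, page in clicks:
        parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        by_visitor[visitor].append((parsed, page))

    paths = []
    for visitor in sorted(by_visitor):
        events = sorted(by_visitor[visitor])  # chronological order
        current = [events[0][1]]
        for (prev_ts, _), (ts, page) in zip(events, events[1:]):
            if ts - prev_ts > gap:
                # Gap exceeded: close the current visit and start a new one.
                paths.append((visitor, " - ".join(current)))
                current = []
            current.append(page)
        paths.append((visitor, " - ".join(current)))
    return paths


sessions = sessionize(CLICKS)
for visitor, path in sessions:
    print(f"{visitor}: <{path}>")

# Simple path analysis: how often does each click path occur?
path_counts = Counter(path for _, path in sessions)
```

At the scale the slide describes, this grouping would run inside Hadoop or Aster rather than in a single Python process; the sketch only shows the shape of the transformation.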
Demo