The Comprehensive Approach: A Unified Information Architecture
Twitter Tag: #briefr
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
• May: Analytics
• June: Intelligence
• July: Disruption
• August: Analytics
• September: Integration
• October: Database
• Analytics is, and always has been, about discovering insights that lead to better business decisions. The range of technologies and use cases in this area is wide: statistical analysis, data and process mining, predictive analytics and modeling, and complex event processing.
• What is now referred to as Big Data has pushed analytics beyond the capabilities of traditional solutions. “Big Analytics” has organizations diving into large heaps of data that previously were not available or usable.
• The growing volume, variety, velocity and complexity of data has proven to be a major challenge for organizations that leverage analytics to maintain a competitive edge.
Dr. Richard Hackathorn is a well-known industry analyst, technology innovator and international educator. He has pioneered innovations in database management, decision support and data warehousing. Richard has published numerous articles, presented at leading industry conferences, and conducted professional seminars in eighteen countries. He has written three books, entitled Enterprise Database Connectivity, Using the Data Warehouse (with William H. Inmon), and Web Farming for the Data Warehouse. Richard taught at the Wharton School and at the University of Colorado.
• Teradata is known for its analytic data solutions, with a focus on integrated data warehousing, big data analytics and business applications.
• It offers a broad suite of technology platforms and solutions, and a wide range of data management applications and data mining capabilities.
• Teradata features Teradata Aster, its MapReduce platform for handling big data and big analytics on multi-structured data.
Chris Twogood is Vice President of Product and Services Marketing for Teradata Corporation. He is responsible for marketing products (database, utilities, and platform), and services (professional and customer services), plus technical field sales support. Chris has twenty-five years of experience in the computer industry specializing in Data Warehousing, Decision Support, Customer Management and Appliance platforms. Chris has held roles that span Strategy, Application Definition, Marketing, Product Requirements/Management, Platform Solutions and Product Marketing.
Unified Big Data Architecture
Confidential and proprietary. Copyright © 2012 Teradata Corporation. 10
Big Data: From Transactions to Interactions
Data variety and complexity increase from each layer to the next (ERP, CRM, WEB, BIG DATA):
• ERP: Purchase detail, Purchase record, Payment record
• CRM: Segmentation, Offer details, Customer Touches, Support Contacts
• WEB: Web logs, Offer history, A/B testing, Dynamic Pricing, Affiliate Networks, Search marketing, Behavioral Targeting, Dynamic Funnels
• BIG DATA: User Generated Content, Mobile Web, SMS/MMS, Sentiment, External Demographics, HD Video, Speech to Text, Product/Service Logs, Social Network, Business Data Feeds, User Click Stream
Unified Big Data Architecture: Bridging Classic & Big Data Worlds
• Classic BI: Structured & Repeatable Analysis
  - Business determines what questions to ask
  - IT structures the data to answer those questions
  - “Capture only what’s needed”
  - SQL performance and structure
• Big Data Analytics: Multi-structured & Iterative Analysis
  - IT delivers a platform for storing, refining, and analyzing all data sources
  - Business explores data for questions worth answering
  - “Capture in case it’s needed”
  - MapReduce processing flexibility
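The two capture philosophies on this slide are often called schema-on-write and schema-on-read; those labels are ours, not Teradata's. The minimal Python sketch below assumes newline-delimited JSON events and a hypothetical three-column warehouse table (`customer_id`, `order_id`, `amount`): the classic path validates and projects each event at load time and rejects what does not fit, while the big data path keeps the raw lines and applies structure only when a question is asked.

```python
import json

# Hypothetical fixed schema for the "Classic BI" path: only these columns are kept.
WAREHOUSE_COLUMNS = ("customer_id", "order_id", "amount")

def load_schema_on_write(raw_events):
    """Classic BI style: validate and project each event at load time.
    Events missing a required column are rejected; extra fields are dropped."""
    rows, rejected = [], 0
    for line in raw_events:
        event = json.loads(line)
        if all(col in event for col in WAREHOUSE_COLUMNS):
            rows.append(tuple(event[col] for col in WAREHOUSE_COLUMNS))
        else:
            rejected += 1
    return rows, rejected

def query_schema_on_read(raw_events, field):
    """Big Data style: raw lines are kept as-is; structure is applied only
    when a question is asked, so fields nobody planned for remain usable."""
    values = []
    for line in raw_events:
        event = json.loads(line)
        if field in event:
            values.append(event[field])
    return values

raw = [
    '{"customer_id": 1, "order_id": 10, "amount": 25.0, "referrer": "search"}',
    '{"customer_id": 2, "order_id": 11, "amount": 40.0}',
    '{"customer_id": 3, "page": "/help", "referrer": "social"}',
]

rows, rejected = load_schema_on_write(raw)
print(rows)      # [(1, 10, 25.0), (2, 11, 40.0)]
print(rejected)  # 1
print(query_schema_on_read(raw, "referrer"))  # ['search', 'social']
```

The trade-off shows in the output: the schema-on-write load discards the `referrer` attribute it was not designed for, whereas the raw store can still answer a question about it later, at the cost of parsing every line at query time.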
Need for a Unified Big Data Architecture for New Insights: Enabling All Users for Any Data Type, from Data Capture to Analysis
• Tools and languages: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
• Analysis: Discover and Explore; Reporting and Execution in the Enterprise
• Capture, Store and Refine
• Data sources: Audio/Video, Images, Docs, Text, Web & Social, Machine Logs, CRM, SCM, ERP
Unified Big Data Architecture for the Enterprise
• Users: Engineers, Business Analysts, Quants, Data Scientists
• Tools and languages: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
• Analytics: Discovery Platform; Active Data Warehouse
• Capture, Store, Refine
• Data sources: Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP
Analyst’s Goal: Get Insights from Data in Hadoop
• Business Analysts, Quants, Data Scientists: SQL (Teradata IDW, Teradata Analytics Portfolio) and SQL & MapReduce (Teradata Aster MapReduce Platform, Aster MapReduce Portfolio)
• Engineers: MR, Pig, Hive on HDFS; custom code and development (“IT is the optimizer”)
Analytics on Hadoop Data
• Business Analysts, Quants, Data Scientists: SQL (Teradata IDW, Teradata Analytics Portfolio) and SQL & MapReduce (Teradata Aster MapReduce Platform, Aster MapReduce Portfolio)
• Engineers
• HDFS
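The “SQL & MapReduce” path on these slides relies on the MapReduce programming model. The single-process Python sketch below illustrates only the model (map, shuffle, reduce), using a hypothetical web log of `user url` lines; it is not the Hadoop or Aster API, and a real framework would run each phase in parallel across many nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Emit (user, 1) for one web-log line of the assumed form 'user url'."""
    user, _url = line.split(maxsplit=1)
    yield user, 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key; here, sum the per-line counts."""
    return key, sum(values)

def run_mapreduce(lines):
    # Equivalent SQL for comparison: SELECT user, COUNT(*) FROM weblog GROUP BY user
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

logs = ["alice /home", "bob /cart", "alice /cart", "alice /checkout"]
print(run_mapreduce(logs))  # {'alice': 3, 'bob': 1}
```

For a simple aggregate like this the SQL in the comment is shorter; the appeal of MapReduce is that `map_phase` and `reduce_phase` can hold arbitrary procedural code, which is what the slides mean by processing flexibility.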
What’s Technically Different in Big Data Analytics
Variety of data types requires different schemas
• Data that uses a stable schema (structured)
  - Data from packaged business processes with well-defined & known attributes (e.g., ERP data, Inventory Records, Supply Chain records, …)
• Data that has an evolving schema (semi-structured)
  - Data generated by machine processes; known but changing set of attributes (e.g., Web logs, CDRs, Sensor logs, JSON, Social profiles, Twitter feeds, …)
• Data that has a format, but no schema (unstructured)
  - Data captured by machines with a well-defined format, but no semantics (e.g., images, videos, web pages, PDF documents, …)
  - Semantics can be extracted from the raw data by interpreting its format (e.g., shapes from video, face recognition in images, logo detection, …)
  - Sometimes formatted data is accompanied by metadata that has its own stable or evolving schema; that metadata needs to be classified and treated separately
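One way to see the difference between a stable and an evolving schema is to profile which attributes appear in each record of a batch. The Python sketch below assumes hypothetical sensor-log records in JSON: attributes present in every record form the stable core, and attributes present in only some records are the “known but changing set of attributes” of the evolving part. It is an illustration, not a schema-management tool.

```python
import json
from collections import Counter

def profile_schema(json_lines):
    """Count how many records contain each attribute, plus the record total."""
    counts = Counter()
    total = 0
    for line in json_lines:
        counts.update(json.loads(line).keys())
        total += 1
    return counts, total

def classify_attributes(counts, total):
    """Split attributes into those present in every record (stable core)
    and those present in only some records (evolving part of the schema)."""
    stable = sorted(k for k, n in counts.items() if n == total)
    evolving = sorted(k for k, n in counts.items() if n < total)
    return stable, evolving

# Hypothetical sensor log: new attributes appear as devices and firmware change.
sensor_log = [
    '{"device": "a1", "ts": 1, "temp": 21.5}',
    '{"device": "a2", "ts": 2, "temp": 22.0, "humidity": 40}',
    '{"device": "a1", "ts": 3, "temp": 21.7, "firmware": "2.1"}',
]
counts, total = profile_schema(sensor_log)
print(classify_attributes(counts, total))
# (['device', 'temp', 'ts'], ['firmware', 'humidity'])
```

Rerunning such a profile on each new batch is a cheap way to notice schema drift before it breaks a downstream load.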
When to Use Which? The best approach by workload and data type
Processing as a Function of Schema Requirements by Data Type
Workload | Stable Schema | Evolving Schema | Format, No Schema
Low Cost Storage & Retention | Teradata / Hadoop | Hadoop | Hadoop
Loading and Refining: Data Pre-Processing, Prep, Cleansing | Teradata | Aster / Hadoop | Hadoop
Loading and Refining: Transformations | Teradata | Aster (joining with structured data) | Hadoop
Reporting | Teradata | Aster | (none listed)
Analytics (User-driven, interactive) | Teradata (SQL analytics) | Aster (SQL + MapReduce Analytics) | Aster (MapReduce Analytics)
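The slide's recommendations amount to a lookup from data type and workload to platform. The Python sketch below transcribes them into a dictionary; the short keys (`stable`, `evolving`, `no_schema`, `storage`, `prep`, `transform`, `reporting`, `analytics`) are hypothetical names of our own, and the empty Reporting cell for format-only data is our reading of the flattened slide rather than something it states explicitly.

```python
from typing import Optional

# Recommendations transcribed from the "When to Use Which?" slide.
# Keys are hypothetical shorthand; None marks a cell the slide leaves empty
# (our reading of the flattened original).
PLATFORM_BY_WORKLOAD = {
    "stable": {
        "storage": "Teradata / Hadoop",
        "prep": "Teradata",
        "transform": "Teradata",
        "reporting": "Teradata",
        "analytics": "Teradata (SQL analytics)",
    },
    "evolving": {
        "storage": "Hadoop",
        "prep": "Aster / Hadoop",
        "transform": "Aster (joining with structured data)",
        "reporting": "Aster",
        "analytics": "Aster (SQL + MapReduce Analytics)",
    },
    "no_schema": {
        "storage": "Hadoop",
        "prep": "Hadoop",
        "transform": "Hadoop",
        "reporting": None,
        "analytics": "Aster (MapReduce Analytics)",
    },
}

def recommend(schema: str, workload: str) -> Optional[str]:
    """Return the slide's suggested platform, or None where it lists none.
    Raises KeyError for an unknown schema type or workload name."""
    return PLATFORM_BY_WORKLOAD[schema][workload]

print(recommend("evolving", "analytics"))   # Aster (SQL + MapReduce Analytics)
print(recommend("no_schema", "reporting"))  # None
```

Keeping the mapping as data rather than nested `if` statements makes it easy to revise when platform capabilities change.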
18 5/29/12 Teradata Copyright ©2012
Architecture Flexibility – Stable Schema
• Extreme Data Appliance (Low Cost Storage & Retention): high-volume data storage, light transformations; software compression; list price $4K/TB
• Data Warehouse Appliance (Load, Data Prep & Refining): CPU-intensive transformations, medium-volume data storage; automatic compression engines; list price $11K/TB
• Active Enterprise Data Warehouse (Transformation): low latency, minimal data movement and complexity, transformations aligned to reference data; compress on cold; list price $30K/TB*
Drive types shown: High Capacity Drives, Solid State Drives, 300-600 GB drives
* price/TB on cold storage only
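To make the price spread concrete, the Python sketch below multiplies the slide's 2012 list prices by a hypothetical 100 TB of data. It is naive list-price arithmetic only: it ignores compression, discounts and hot storage, and per the footnote the $30K figure applies to cold storage only.

```python
# List prices per TB from the "Architecture Flexibility" slide (2012).
# The Active EDW figure applies to cold storage only, per the slide's footnote.
LIST_PRICE_PER_TB = {
    "Extreme Data Appliance": 4_000,
    "Data Warehouse Appliance": 11_000,
    "Active Enterprise Data Warehouse (cold storage)": 30_000,
}

def list_cost(terabytes: float) -> dict:
    """Naive list-price cost of holding `terabytes` on each platform.
    Ignores compression ratios, discounts and any hot-storage premium."""
    return {name: terabytes * price for name, price in LIST_PRICE_PER_TB.items()}

for name, cost in list_cost(100).items():
    print(f"{name}: ${cost:,.0f}")
# Extreme Data Appliance: $400,000
# Data Warehouse Appliance: $1,100,000
# Active Enterprise Data Warehouse (cold storage): $3,000,000
```

The 7.5x gap between the first and last tier is the economic argument for keeping high-volume, rarely queried data on the cheapest platform.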
Unified Big Data Architecture and Data Flow: Enabling a Data-Driven Business
• Interaction Architecture: Multi-Structured Raw Data (Social Media; Sensors, Scientific and Geospatial Data), Store & Refine, Iterative Discovery & Analytics
• Transaction Architecture: Traditional Data Sources (Business Applications), ETL, SQL Analytics
• Dimensional Data and Analytic Results exchanged between the two
• Unified Analytic Access for Analytic Tools & Users
© Bolder Technology, Inc. 2012
Thinking Beyond the Enterprise Data Warehouse
Richard Hackathorn
A New Ballgame!
• Big Data is forcing us to rethink the goals and architecture for data warehousing
• Traditional EDW is no longer sufficient:
  - Exclusive collection of corporate information
  - Striving toward a single version of truth
  - Only structured data has business value
  - Predefined questions are the norm
• We are now facing a new set of issues!
Issue: Exclusive to Inclusive
• All data cannot be managed within the boundaries of the EDW:
  - Too much and too fast
  - Too complex and changing
  - Controlled by others
  - New data sources are critical
  - Short-lived data sources are also critical
• Need to be more agile, flexible, responsive
• Requiring ‘smart’ curating of new sources
  - What should be captured, stored, and retained?
• Requiring ‘smart’ data exploration
Issue: Ever-Changing Multiple Truths
• “More things in heaven and earth than are dreamt of in your philosophy”
  - In other words, we do not know what we do not know!
• Example: multiple personalities for the same customer
• Business semantic analysis is a critical and continuous activity
Issue: Discovering Structure
• Need for constant refining of all data
  - Constantly maturing data by enhancing, compressing, and structuring
• Business value comes from leveraging structured data into process variations
  - What do you do differently with what you know?
  - Analytics and data mining add structure
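“Constantly maturing data by enhancing, compressing, and structuring” can be illustrated with the most familiar raw source, web logs. The Python sketch below assumes lines in Apache Common Log Format (the slide names no format) and uses a simplified regular expression, not a complete parser: structuring turns a raw line into typed fields, and enhancing adds a derived attribute, here a hypothetical `is_error` flag for HTTP status codes of 400 and above.

```python
import re
from typing import Optional

# Simplified pattern for Apache Common Log Format (an assumption; real logs vary).
CLF = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def structure(line: str) -> Optional[dict]:
    """Structure one raw log line into a typed record; None if it does not match."""
    m = CLF.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    # CLF writes "-" when no response body was sent.
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    # Enhancing: derive a new attribute from the structured fields.
    rec["is_error"] = rec["status"] >= 400
    return rec

sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(structure(sample))
```

Lines that fail to match are returned as `None` rather than dropped silently, so they can be kept raw and revisited as the pattern matures.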
• An interesting (and seldom discussed) facet of Big Data is the emergence of applications that are NOT social networking analytics on web logs and website behaviors. What are the ‘killer’ apps in this area? Do they involve the “Internet of Things”?
• Big Data is big in volume and in variety. It is also big in velocity. There is a lot per second…per minute…per day. How should a unifying architecture handle the velocity of Big Data?
• Many are trying to “Capture in case it is needed” as their approach to Big Data. But can you capture all the data? At what point does the cost of data capture/storage exceed the business benefits? How do you decide what to capture, store, and retain?
• Data exploration is an increasingly popular term. How does it differ from data analysis? Can you really find useful information through data exploration when you do not know what you are looking for? Examples?
• When you unify the architecture for Big Data (as contrasted with isolated islands of Big Data applications), the data needs to move through several physical stores. Given the volume and velocity of data flows, can/should Big Data be duplicated in multiple stores?
• What is the difference between the Hadoop (Hive, etc.) system and the Teradata Aster system? Could you use both for analytics? Do you need both in your unifying architecture?
• Are the ‘traditional’ BI tools (like BusinessObjects, Cognos) relevant to Big Data analytics? Are they needed in companies that are heavily Big Data? Are they evolving and expanding to incorporate the new approaches and techniques required for Big Data?
• A key requirement in any unifying Big Data architecture is managing the complexity of schemas. It seems that we need a new generation of semantic analysis tools to assist with schema management. What tools are emerging to support this requirement?
• Gregory Piatetsky-Shapiro of KDnuggets ran a recent poll on the largest dataset that his audience of data miners has so far analyzed. The median size for 2012 was in the range 10-100 GB. If most of the data for half of the analytics projects can fit into main memory on a server platform, why is there such a need for expensive architectures supporting MPP, MapReduce, and the like?
• http://www.kdnuggets.com/polls/2012/largest-dataset-analyzed-data-mined.html
• June: Intelligence
• July: Disruption
• August: Analytics
• September: Integration
• October: Database
• November: Cloud