The Comprehensive Approach: A Unified Information Architecture

The Briefing Room with Richard Hackathorn and Teradata. Slides from the Live Webcast on May 29, 2012.

The worlds of Business Intelligence (BI) and Big Data Analytics can seem at odds, but only because we have yet to fully experience a comprehensive approach to managing big data: a Unified Big Data Architecture. The dynamics continue to change as vendors begin to emphasize the importance of leveraging SQL, engineering and operational skills, as well as incorporating novel uses of MapReduce to improve distributed analytic processing. Register for this episode of The Briefing Room to learn the value of taking a strategic approach for managing big data from veteran BI and data warehouse consultant Richard Hackathorn. He'll be briefed by Chris Twogood of Teradata, who will outline his company's recent advances in bridging the gap between Hadoop and SQL to unlock deeper insights and explain the role of Teradata Aster and SQL-MapReduce as a Discovery Platform for Hadoop environments.

For more information visit: http://www.insideanalysis.com
Watch us on YouTube: http://www.youtube.com/playlist?list=PL5EE76E2EEEC8CF9E

Transcript of The Comprehensive Approach: A Unified Information Architecture

Page 1: The Comprehensive Approach: A Unified Information Architecture
Page 2: The Comprehensive Approach: A Unified Information Architecture

Twitter Tag: #briefr

Page 3: The Comprehensive Approach: A Unified Information Architecture

•  Reveal the essential characteristics of enterprise software, good and bad

•  Provide a forum for detailed analysis of today’s innovative technologies

•  Give vendors a chance to explain their product to savvy analysts

•  Allow audience members to pose serious questions... and get answers!

Page 4: The Comprehensive Approach: A Unified Information Architecture

•  May: Analytics

•  June: Intelligence

•  July: Disruption

•  August: Analytics

•  September: Integration

•  October: Database

Page 5: The Comprehensive Approach: A Unified Information Architecture

•  Analytics is, and always has been, about discovering insights that lead to better business decisions. The range of technologies and use cases that inhabit this area is wide: statistical analysis, data and process mining, predictive analytics and modeling, and complex event processing.

•  What is now referred to as Big Data has pushed analytics beyond the capabilities of traditional solutions. “Big Analytics” has organizations diving into large heaps of data that previously were not available or usable.

•  The growing volume, variety, velocity and complexity of data have proven to be a major challenge to organizations that leverage analytics to maintain a competitive edge.

Page 6: The Comprehensive Approach: A Unified Information Architecture

Dr. Richard Hackathorn is a well-known industry analyst, technology innovator and international educator. He has pioneered innovations in database management, decision support and data warehousing. Richard has published numerous articles, presented at leading industry conferences, and conducted professional seminars in eighteen countries. He has written three books, entitled Enterprise Database Connectivity, Using the Data Warehouse (with William H. Inmon), and Web Farming for the Data Warehouse. Richard taught at the Wharton School and at the University of Colorado.

Page 7: The Comprehensive Approach: A Unified Information Architecture

•  Teradata is known for its analytic data solutions with a focus on integrated data warehousing, big data analytics and business applications.

•  It offers a broad suite of technology platforms and solutions, and a wide range of data management applications and data mining capabilities.

•  Teradata features Teradata Aster, its MapReduce platform, to handle big data and big analytics on multi-structured data.

Page 8: The Comprehensive Approach: A Unified Information Architecture

Chris Twogood is Vice President of Product and Services Marketing for Teradata Corporation. He is responsible for marketing products (database, utilities, and platform), and services (professional and customer services), plus technical field sales support. Chris has twenty-five years of experience in the computer industry specializing in Data Warehousing, Decision Support, Customer Management and Appliance platforms. Chris has held roles that span Strategy, Application Definition, Marketing, Product Requirements/Management, Platform Solutions and Product Marketing.

Page 9: The Comprehensive Approach: A Unified Information Architecture

Unified Big Data Architecture

Page 10: The Comprehensive Approach: A Unified Information Architecture

Confidential and proprietary. Copyright © 2012 Teradata Corporation.

Big Data: From Transactions to Interactions

Increasing data variety and complexity:

•  ERP: Purchase detail, Purchase record, Payment record
•  CRM: Segmentation, Offer details, Customer Touches, Support Contacts
•  WEB: Web logs, Offer history, A/B testing, Dynamic Pricing, Affiliate Networks, Search marketing, Behavioral Targeting, Dynamic Funnels
•  BIG DATA: User Generated Content, Mobile Web, SMS/MMS, Sentiment, External Demographics, HD Video, Speech to Text, Product/Service Logs, Social Network, Business Data Feeds, User Click Stream

Page 11: The Comprehensive Approach: A Unified Information Architecture

Unified Big Data Architecture
Bridging Classic & Big Data Worlds

Classic BI: Structured & Repeatable Analysis
•  “Capture only what’s needed”
•  Business determines what questions to ask
•  IT structures the data to answer those questions
•  SQL performance and structure

Big Data Analytics: Multi-structured & Iterative Analysis
•  “Capture in case it’s needed”
•  IT delivers a platform for storing, refining, and analyzing all data sources
•  Business explores data for questions worth answering
•  MapReduce processing flexibility

Page 12: The Comprehensive Approach: A Unified Information Architecture

Need for a Unified Big Data Architecture for New Insights
Enabling All Users for Any Data Type from Data Capture to Analysis

•  Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
•  Discover and Explore
•  Reporting and Execution in the Enterprise
•  Capture, Store and Refine
•  Data sources: Audio/Video, Images, Docs, Text, Web & Social, Machine Logs, CRM, SCM, ERP

Page 13: The Comprehensive Approach: A Unified Information Architecture

Unified Big Data Architecture for the Enterprise

•  Users: Engineers, Business Analysts, Quants, Data Scientists
•  Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
•  Analytics: Discovery Platform, Active Data Warehouse
•  Capture, Store, Refine
•  Data sources: Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP

Page 14: The Comprehensive Approach: A Unified Information Architecture

Analyst’s Goal: Get Insights from Data in Hadoop

•  Business Analysts, Quants, Data Scientists: SQL; SQL & MapReduce
•  Platforms: Teradata Aster MapReduce Platform (Aster MapReduce Portfolio), Teradata IDW (Teradata Analytics Portfolio)
•  Engineers: MR, Pig, Hive; Custom Code and Development; IT is the optimizer
•  Storage: HDFS

Page 15: The Comprehensive Approach: A Unified Information Architecture

Analytics on Hadoop Data

•  Users: Business Analysts, Quants, Data Scientists, Engineers
•  Access: SQL; SQL & MapReduce
•  Platforms: Teradata Aster MapReduce Platform (Aster MapReduce Portfolio), Teradata IDW (Teradata Analytics Portfolio)
•  Storage: HDFS

Page 16: The Comprehensive Approach: A Unified Information Architecture

What’s Technically Different in Big Data Analytics
Variety of data types requires different schemas

•  Data that uses a stable schema (structured)
   -  Data from packaged business processes with well-defined & known attributes (e.g., ERP data, Inventory Records, Supply Chain records, …)

•  Data that has an evolving schema (semi-structured)
   -  Data generated by machine processes; known but changing set of attributes (e.g., Web logs, CDRs, Sensor logs, JSON, Social profiles, Twitter feeds, …)

•  Data that has a format, but no schema (unstructured)
   -  Data captured by machines with well-defined format, but no semantics (e.g., images, videos, web pages, PDF documents, …)
   -  Semantics can be extracted from raw data by interpreting the format (e.g., shapes from video, face recognition in images, logo detection, …)
   -  Sometimes format data is accompanied by meta-data that can have a stable or an evolving schema; that meta-data needs to be classified and treated separately
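
The three categories above can be illustrated with a minimal Python sketch. It is only an approximation made for illustration: the function name `classify_schema` and the rule it applies (identical attribute sets mean "stable", differing attribute sets mean "evolving", raw bytes mean "format, no schema") are assumptions, not something the slides define.

```python
from typing import Any, Iterable


def classify_schema(records: Iterable[Any]) -> str:
    """Label a batch of records with one of the three slide categories.

    Hypothetical rule of thumb (an assumption, not from the slides):
    - raw bytes             -> "format, no schema"
    - dicts, same keys      -> "stable schema"
    - dicts, differing keys -> "evolving schema"
    """
    key_sets = set()
    seen_any = False
    for record in records:
        seen_any = True
        if isinstance(record, (bytes, bytearray)):
            # Images, video, PDFs: a well-defined format, but no attributes to inspect.
            return "format, no schema"
        if not isinstance(record, dict):
            raise TypeError(f"unsupported record type: {type(record).__name__}")
        key_sets.add(frozenset(record))
    if not seen_any:
        raise ValueError("cannot classify an empty batch")
    return "stable schema" if len(key_sets) == 1 else "evolving schema"


# Illustrative sample data (made up for this sketch).
erp_rows = [{"sku": "A1", "qty": 3}, {"sku": "B2", "qty": 1}]
web_logs = [{"url": "/", "ts": 1}, {"url": "/cart", "ts": 2, "referrer": "/"}]
image_blobs = [b"\x89PNG\r\n"]

print(classify_schema(erp_rows))     # stable schema
print(classify_schema(web_logs))     # evolving schema
print(classify_schema(image_blobs))  # format, no schema
```

A real classifier would also need to look at the meta-data that accompanies format data, since the slide notes that it can itself have a stable or an evolving schema.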

Page 17: The Comprehensive Approach: A Unified Information Architecture

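When to Use Which? The best approach by workload and data type
Processing as a Function of Schema Requirements by Data Type

Stable Schema
•  Low Cost Storage & Retention: Teradata / Hadoop
•  Loading and Refining (Data Pre-Processing, Prep, Cleansing): Teradata
•  Loading and Refining (Transformations): Teradata
•  Reporting: Teradata
•  Analytics (user-driven, interactive): Teradata (SQL analytics)

Evolving Schema
•  Low Cost Storage & Retention: Hadoop
•  Loading and Refining (Data Pre-Processing, Prep, Cleansing): Aster / Hadoop
•  Loading and Refining (Transformations): Aster (joining with structured data)
•  Reporting: Aster
•  Analytics (user-driven, interactive): Aster (SQL + MapReduce Analytics)

Format, No Schema
•  Low Cost Storage & Retention: Hadoop
•  Loading and Refining (Data Pre-Processing, Prep, Cleansing): Hadoop
•  Loading and Refining (Transformations): Hadoop
•  Analytics (user-driven, interactive): Aster (MapReduce Analytics)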
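
The grid on this slide is essentially a lookup from (schema type, workload) to a platform, which a small Python sketch can make concrete. Treat it as one reading of a flattened slide rather than an authoritative mapping: it assumes five workload columns (storage, pre-processing, transformations, reporting, analytics), and the short workload keys, the `PLATFORM_GRID` name and the `recommend` function are hypothetical. No value appears for reporting on format-only data, so that lookup returns `None`.

```python
from typing import Optional

# Assumed reading of the slide grid: rows are schema types, columns are workloads.
PLATFORM_GRID = {
    "stable schema": {
        "storage": "Teradata / Hadoop",
        "preprocessing": "Teradata",
        "transformations": "Teradata",
        "reporting": "Teradata",
        "analytics": "Teradata (SQL analytics)",
    },
    "evolving schema": {
        "storage": "Hadoop",
        "preprocessing": "Aster / Hadoop",
        "transformations": "Aster (joining with structured data)",
        "reporting": "Aster",
        "analytics": "Aster (SQL + MapReduce Analytics)",
    },
    "format, no schema": {
        "storage": "Hadoop",
        "preprocessing": "Hadoop",
        "transformations": "Hadoop",
        # The slide text gives no reporting entry for this row.
        "analytics": "Aster (MapReduce Analytics)",
    },
}


def recommend(schema_type: str, workload: str) -> Optional[str]:
    """Return the platform listed for a schema type and workload, or None if none is listed."""
    if schema_type not in PLATFORM_GRID:
        raise KeyError(f"unknown schema type: {schema_type!r}")
    return PLATFORM_GRID[schema_type].get(workload)


print(recommend("evolving schema", "analytics"))
print(recommend("format, no schema", "storage"))
print(recommend("format, no schema", "reporting"))
```

Keeping the grid as data rather than as branching logic makes it easy to correct a cell if the original slide turns out to read differently.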

Page 18: The Comprehensive Approach: A Unified Information Architecture

Architecture Flexibility: Stable Schema

Extreme Data Appliance
•  Role: Low Cost Storage & Retention
•  Benefits: High volume data storage, light transformations
•  Compression: Software compression
•  List Price/TB: $4K

Data Warehouse Appliance
•  Role: Load, Data Prep & Refining
•  Benefits: CPU intense transformations, medium volume data storage
•  Compression: Automatic compression engines
•  List Price/TB: $11K

Active Enterprise Data Warehouse
•  Role: Transformation
•  Benefits: Low latency, minimize data movement/complexity, transformation aligned to reference data
•  Compression: Compress on cold
•  List Price/TB: $30K*

Drive types shown: High Capacity Drives, Solid State Drives, 300-600 GB drives

* Price/TB on cold storage only
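
To give a feel for the price spread, here is a back-of-the-envelope Python sketch using the list prices per TB quoted on the slide ($4K, $11K, $30K). It is a rough illustration only: the `list_cost_usd` function is hypothetical, the 100 TB volume is an arbitrary example, and the calculation ignores compression, discounts and the slide's footnote that the $30K figure applies to cold storage only.

```python
# List prices per TB as quoted on the slide (USD).
LIST_PRICE_PER_TB_USD = {
    "Extreme Data Appliance": 4_000,
    "Data Warehouse Appliance": 11_000,
    "Active Enterprise Data Warehouse": 30_000,  # slide footnote: cold storage only
}


def list_cost_usd(platform: str, terabytes: int) -> int:
    """Naive list cost: price per TB times volume, with no compression or discount."""
    if terabytes < 0:
        raise ValueError("terabytes must be non-negative")
    return LIST_PRICE_PER_TB_USD[platform] * terabytes


# Arbitrary example volume, chosen only for illustration.
for platform in LIST_PRICE_PER_TB_USD:
    print(f"{platform}: ${list_cost_usd(platform, 100):,}")
# Extreme Data Appliance: $400,000
# Data Warehouse Appliance: $1,100,000
# Active Enterprise Data Warehouse: $3,000,000
```

At these list prices the same 100 TB costs 7.5 times more on the Active Enterprise Data Warehouse than on the Extreme Data Appliance, which is the economic argument for matching each workload to a tier.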

Page 19: The Comprehensive Approach: A Unified Information Architecture

Unified Big Data Architecture and Data Flow
Enabling a Data-Driven Business

•  Interaction Architecture
•  Multi-Structured Raw Data: Social Media; Sensors, Scientific and Geospatial Data
•  Transaction Architecture
•  Traditional Data Sources
•  Business Applications
•  ETL
•  Store & Refine
•  Iterative Discovery & Analytics
•  SQL Analytics
•  Dimensional Data
•  Analytic Results
•  Unified Analytic Access
•  Analytic Tools & Users

Page 20: The Comprehensive Approach: A Unified Information Architecture
Page 21: The Comprehensive Approach: A Unified Information Architecture

Page 22: The Comprehensive Approach: A Unified Information Architecture

© Bolder Technology, Inc. 2012

Thinking Beyond the Enterprise Data Warehouse

Richard Hackathorn

Page 23: The Comprehensive Approach: A Unified Information Architecture

A New Ballgame!

•  Big Data is forcing us to rethink the goals and architecture for data warehousing

•  Traditional EDW is no longer sufficient
   §  Exclusive collection of corporate information
   §  Striving toward a single version of truth
   §  Only structured data has business value
   §  Predefined questions are the norm

•  We are now facing a new set of issues!

Page 24: The Comprehensive Approach: A Unified Information Architecture

Issue: Exclusive to Inclusive

•  All data cannot be managed within the boundaries of the EDW
   §  Too much and too fast
   §  Too complex and changing
   §  Controlled by others
   §  New data sources are critical
   §  Short-lived data sources are also critical

•  Need to be more agile, flexible, responsive

•  Requiring ‘smart’ curating of new sources
   §  What should be captured, stored, and retained?

•  Requiring ‘smart’ data exploration

Page 25: The Comprehensive Approach: A Unified Information Architecture

Issue: Ever-Changing Multiple Truths

•  “More things in heaven and earth than are dreamt of in your philosophy”
   §  In other words, we do not know what we do not know!

•  Example: multiple personalities for the same customer

•  Business semantic analysis is a critical and continuous activity

Page 26: The Comprehensive Approach: A Unified Information Architecture

Issue: Discovering Structure

•  Need for a constant refining of all data
   §  Constantly maturing data by enhancing, compressing, and structuring

•  Business value comes from leveraging structured data into process variations
   §  What do you do differently with what you know?
   §  Analytics and data mining add structure

Page 27: The Comprehensive Approach: A Unified Information Architecture

•  An interesting (and seldom discussed) facet of Big Data is the emerging applications that are NOT social networking analytics on web logs and website behaviors. What are the ‘killer’ apps in this area? Do they involve the “Internet of Things”?

•  Big Data is big in volume and in variety. It is also big in velocity. There is a lot per second…per minute…per day. How should a unifying architecture handle the velocity of Big Data?

•  Many are trying to “Capture in case it is needed” as their approach to Big Data. But, can you capture all the data? At what point does cost of data capture/storage exceed the business benefits? How do you decide what to capture, store, and retain?

•  Data exploration is an increasingly popular term. How does it differ from data analysis? Can you really find useful information through data exploration when you do not know what you are looking for? Examples?

Page 28: The Comprehensive Approach: A Unified Information Architecture

•  When you unify the architecture for Big Data (as contrasted with isolated islands of Big Data applications), the data needs to move through several physical stores. Given the volume and velocity of data flows, can/should Big Data be duplicated in multiple stores?

•  What is the difference between the Hadoop (Hive, etc.) system and the Teradata Aster system? Could you use both for analytics? Do you need both in your unifying architecture?

•  Are the ‘traditional’ BI tools (like BusinessObjects, Cognos) relevant to Big Data analytics? Are they needed in companies that are heavily Big Data? Are they evolving and expanding to incorporate the new approaches and techniques required for Big Data?

•  A key requirement in any unifying Big Data architecture is managing the complexity of schemas. It seems that we need a new generation of semantic analysis tools to assist with schema management. What tools are emerging to support this requirement?

Page 29: The Comprehensive Approach: A Unified Information Architecture

•  Gregory Piatetsky-Shapiro of KDnuggets ran a recent poll on the largest dataset that his audience of data miners has so far analyzed. The median size for 2012 was in the range 10-100 GB. If most of the data for half of the analytics projects can fit into main memory on a server platform, why is there such a need for expensive architectures supporting MPP, MapReduce, and the like?

•  http://www.kdnuggets.com/polls/2012/largest-dataset-analyzed-data-mined.html

Page 30: The Comprehensive Approach: A Unified Information Architecture
Page 31: The Comprehensive Approach: A Unified Information Architecture

•  June: Intelligence

•  July: Disruption

•  August: Analytics

•  September: Integration

•  October: Database

•  November: Cloud

Page 32: The Comprehensive Approach: A Unified Information Architecture