The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Tuesday, August 21, 2012

description

The Briefing Room with John O’Brien and Teradata. Slides from the Live Webcast on Aug. 21, 2012.

Data and context: that's the ultimate combination. Uniting those two is the goal of today's information managers, as they seek to connect the world of traditional business intelligence on structured data to the ocean of new, multi-structured Big Data that can provide so much valuable context and additional insights. The question of how begs answers, but the big issue of what technology is best dominates the dialogue in the world's most cutting-edge companies.

Check out this episode of The Briefing Room to learn from veteran database analyst John O'Brien of Radiant Advisors as he explains how certain information architectures have advantages over others with respect to bridging structured and unstructured data. He'll be briefed by Steve Wooledge of Teradata, who will detail his company's innovations in SQL-MapReduce, which allows professionals to perform multi-structured analytics at scale. He'll describe how a new extension called SQL-H allows analysts to use Hadoop as if it were just another table in the database.

For more information visit: http://www.insideanalysis.com

Transcript of The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Page 1: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights


Page 2: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Twitter Tag: #briefr

Eric [email protected]


Page 3: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Reveal the essential characteristics of enterprise software, good and bad

Provide a forum for detailed analysis of today’s innovative technologies

Give vendors a chance to explain their product to savvy analysts

Allow audience members to pose serious questions... and get answers!


Page 4: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

August: Analytics

September: Integration

October: Database

November: Cloud

December: Innovators

Page 5: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Analytics is, and always has been, about discovering insights that lead to better business decisions. The range of technologies and use cases that inhabit this area is wide: statistical analysis, data and process mining, predictive analytics and modeling, and complex event processing.

What is now referred to as Big Data has pushed analytics beyond the capabilities of traditional solutions. “Big Analytics” has organizations diving into large heaps of data that previously was not available or usable.

The growing volume, variety, velocity and complexity of data have proven to be a major challenge for organizations that rely on analytics to maintain a competitive edge.

Page 6: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

 John is the Principal and Founder of Radiant Advisors. As a recognized thought leader in BI, John has been publishing articles and presenting at conferences for the past 10 years. He has been a Best Practices judge, presenter and panel participant at TDWI. John has also developed and presented his own courses: Radiant Advisors Learning Catalog.

John has a B.S. in Mechanical Engineering from California State University and an M.B.A. from the University of Colorado. He is a Certified Business Intelligence Professional with mastery levels in Leadership and Administration, Database Administration and Business Intelligence. 

Page 7: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Teradata is known for its analytic data solutions with a focus on integrated data warehousing, big data analytics and business applications.

It offers a broad suite of technology platforms and solutions, and a wide range of data management applications and data mining capabilities.

Teradata Aster is its MapReduce platform for handling big data and big analytics on multi-structured data.

Page 8: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Steve Wooledge is Senior Director of Marketing at Teradata’s Aster Center of Innovation, where he is an evangelist for the company’s analytic platform product and responsible for awareness, demand generation, and solution marketing for the data scientist. Steve has more than 10 years of experience in product marketing and business development for business intelligence, data management, Web analytics and e-commerce products.

Prior to his current role, Steve held product marketing positions at Interwoven and Business Objects as well as sales and engineering roles at Business Objects, Dow Chemical and Occidental Petroleum.

Steve has a B.S. in Chemical Engineering and an M.B.A. in Marketing and Finance.


Page 9: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

The Unified Big Data Architecture & Bridging the Analyst Gap for Hadoop

Steve Wooledge, Sr. Director of Marketing

August 21, 2012

Page 10: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Confidential and proprietary. Copyright © 2012 Teradata Corporation.

Topics

• Quick intro to Teradata Aster
• The need for a unified big data architecture
• Bridging the Analyst Gap for Hadoop: Aster SQL-H™


Page 15: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Teradata Aster

Leading Innovator in Data Discovery for the Enterprise

§ Aster Solution: Data discovery platform - Delivers MapReduce analytic framework within an MPP database
§ Brings data science to the business: Enables MapReduce processing through the analytic language of business, standard SQL
§ Delivers new analytics: Gives businesses new breakthrough analytic apps via pre-packaged pattern, path, and graph SQL-MapReduce modules
§ On multi-structured data: Leverages multi-structured data sources for increased analytic breadth & accuracy


Page 19: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Teradata Aster MapReduce Platform

Your Analytic & Advanced Reporting Applications

Users: Analysts, Data Scientists, Business Users, Customers

Develop: Rapid Analytics Development
• 50+ pre-built analytic modules
• Visual IDE; develop apps in hours
• Many programming languages

Process: Embedded Analytic Processing
• SQL-MapReduce framework
• Analyze both structured & multi-structured data
• Linear, incremental scalability

Store: Massively Parallel Data Storage
• Commodity-hardware based
• Software only, appliance, or cloud
• Relational-data architecture can be extended for non-relational types

Page 20: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Deeper Consumer Insights with Teradata Aster

• Payment processing analytics down from one day to one minute with SQL-MapReduce
• Web log data processing from seven hours to 20 minutes
• Interactive dashboards with all KPIs from point of order inception, down from five hours to five minutes

Business Impact / ROI
• Increased conversions from recommendations with 360-degree view of customer across in-store and .com behavior
• Build revenue attribution models to link every purchase to a site feature
• Reduce churn from one day previously to 20 minutes


Page 22: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Big Data: From Transactions to Interactions

[Figure: data sources grouped by volume, with increasing data variety and complexity]

• ERP (Megabytes): Purchase detail, Purchase record, Payment record
• CRM (Gigabytes): Segmentation, Offer details, Customer Touches, Support Contacts
• WEB (Terabytes): Web logs, Offer history, A/B testing, Dynamic Pricing, Affiliate Networks, Search marketing, Behavioral Targeting, Dynamic Funnels
• BIG DATA (Petabytes): User Generated Content, Mobile Web, SMS/MMS, Sentiment, External Demographics, HD Video, Speech to Text, Product/Service Logs, Social Network, Business Data Feeds, User Click Stream


Page 27: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Unified Big Data Architecture: Bridging Classic & Big Data Worlds

Classic Method: Structured & Repeatable Analysis
• “Capture only what’s needed”
• Business determines what questions to ask
• IT structures the data to answer those questions

Big Data Method: Multi-structured & Iterative Analysis
• “Capture in case it’s needed”
• IT delivers a platform for storing, refining, and analyzing all data sources
• Business explores data for questions worth answering

• SQL performance and structure
• MapReduce processing flexibility

Page 28: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

MapReduce Analytics

Example: Pattern Matching Analysis

Traditional SQL
• Self-joins for sequencing
• Limited operators for ordered data

SQL-MapReduce
• Single pass over the data
• Linked-list sequential analysis

Page 29: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

The Advantages of MapReduce

Raw click-stream data and pattern matching with nPath

Goal
• Increase understanding of customer behavior on a website to improve advertising rates or website navigation

Challenges
• Full website session-level data needed, typically from raw web logs
• Requires complex multi-pass SQL queries or non-SQL techniques
• Requires rewriting query to change number of clicks analyzed

MapReduce Value
• Performance: Single pass over data regardless of number of clicks analyzed
• Manageability: Much simpler code, from 350 lines of SQL to 18 lines of SQL-MapReduce
• Ease of Use: Pattern flexibility to handle varied numbers of clicks and click patterns without rewriting code

Example Analytic Logic
• People who search ‘diabetes’ also browse…
• People who download visit pages A, B, D …

Click Stream Analysis: Comparative Performance

[Chart: Time (axis 0 to 400) for SQL (3 pages) and SQL-MR (3, 4, 8 and 12 pages)]
• SQL for 3 pages: 6 minutes
• MapReduce for 3, 4, 8, 12 pages: 77-131 seconds
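The contrast between join-based sequencing and a single ordered pass can be illustrated outside the database. The Python sketch below is not Aster's nPath implementation; the click rows and both function names are made up for illustration. The first function mimics the self-join style, where the path length (three pages) is baked into the code. The second reads each user's ordered clicks once and takes the path length as a parameter.

```python
from collections import Counter, defaultdict, deque
from itertools import groupby
from operator import itemgetter

# Hypothetical click rows: (user_id, timestamp, page). Not data from the webcast.
CLICKS = [
    ("u1", 1, "home"), ("u1", 2, "search"), ("u1", 3, "product"), ("u1", 4, "checkout"),
    ("u2", 1, "home"), ("u2", 2, "search"), ("u2", 3, "product"),
    ("u3", 1, "search"), ("u3", 2, "product"), ("u3", 3, "checkout"),
]


def three_page_paths_self_join(clicks):
    """Mimic the join-based approach: number each user's clicks, then look up
    position + 1 and position + 2 for every row (two "self-joins").
    Analyzing four pages instead of three would mean adding another join."""
    numbered = {}
    position = defaultdict(int)
    for user, _, page in sorted(clicks):
        position[user] += 1
        numbered[(user, position[user])] = page
    counts = Counter()
    for (user, pos), first in numbered.items():
        second = numbered.get((user, pos + 1))  # first "join"
        third = numbered.get((user, pos + 2))   # second "join"
        if second is not None and third is not None:
            counts[(first, second, third)] += 1
    return counts


def n_page_paths_single_pass(clicks, n):
    """Read each user's time-ordered clicks once, sliding a window of n pages.
    Changing n changes a parameter, not the code."""
    counts = Counter()
    for _, rows in groupby(sorted(clicks), key=itemgetter(0)):
        window = deque(maxlen=n)
        for _, _, page in rows:
            window.append(page)
            if len(window) == n:
                counts[tuple(window)] += 1
    return counts


three_page = n_page_paths_single_pass(CLICKS, 3)
four_page = n_page_paths_single_pass(CLICKS, 4)
```

Both functions agree for three-page paths, but only the second handles four, eight or twelve pages without being rewritten, which is the point the slide makes about pattern flexibility.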

Page 30: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Need for a Unified Big Data Architecture for New Insights

Enabling All Users for Any Data Type from Data Capture to Analysis

Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.

Stages: Capture, Store and Refine; Discover and Explore; Reporting and Execution in the Enterprise

Data sources: Audio/Video, Images, Docs, Text, Web & Social, Machine Logs, CRM, SCM, ERP

Page 31: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Teradata Unified Big Data Architecture

Any User, Any Data, Any Analysis

Users: Engineers, Data Scientists, Quants, Business Analysts

Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.

Capture, Store, Refine

Discovery Platform: Aster MapReduce Portfolio

Integrated Data Warehouse: Teradata Analytics Portfolio

SQL-H

Data sources: Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP

Page 32: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Hadoop Points of Integration: Bulk Data Transfer

Teradata:Hadoop
• JDBC (available today)
  - Hadoop programs can call JDBC
• TDDBinputformat/Dboutputformat (available today)
  - Submits SQL to JDBC
• Cloudera Sqoop (available today)
  - Command line import/export database objects

Aster:Hadoop
• Aster-Hadoop Adaptor: node:node transfer using SQL-MapReduce

Opportunity for analysts to more easily access Hadoop data

Page 33: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Source: Enterprise Strategy Group; April 5, 2012


Page 35: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Bridging the Business Analyst Gap for Hadoop Data


Page 36: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Aster SQL-H™: A Business User’s Bridge to Analyze Hadoop Data

Aster SQL-H gives analysts and data scientists a better way to analyze data stored cheaply in Hadoop

• Allow standard ANSI SQL access to Hadoop data
• Leverage existing BI tool investments
• Enable 50+ prebuilt SQL-MapReduce Apps and IDE
• Lower costs by making data analysts self-sufficient

Announced June 12th, 2012

Page 37: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

The Big Data Architecture Today Has Gaps

Analyst’s Goal: Get Insights from Data in Hadoop

Users: Business Analysts, Quants, Data Scientists, Engineers

Interfaces: SQL; SQL & SQL-MapReduce; MR, Pig, Hive

Platforms: Teradata Aster Discovery Platform (Aster MapReduce Portfolio); Teradata IDW (Teradata Analytics Portfolio); HDFS

Callouts: IT is the optimizer; Custom Code and Development


Page 39: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Analytics on Hadoop Data with Aster SQL-H

Users: Business Analysts, Quants, Data Scientists, Engineers

Interfaces: SQL; SQL & MapReduce; SQL & SQL-MapReduce; SQL-H

Platforms: HDFS; Teradata Aster Discovery Platform (Aster MapReduce Portfolio); Teradata IDW (Teradata Analytics Portfolio)

Page 40: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Aster SQL-H Integration with Hadoop Catalog

A Business User’s Bridge to Analyzing Data in Hadoop

Diagram labels: HCatalog, Pig, Hadoop MapReduce, Hive, Aster SQL-H, HDFS

• Industry’s First Database Integration with Hadoop’s HCatalog
• Abstraction layer to easily and efficiently read structured & multi-structured data stored in HDFS
• Uses Hadoop Catalog (HCatalog) to perform data abstraction functions (e.g. automatically understands tables, data partitions)
• HDFS data presented to users as Aster tables
• Fully accessible within the Aster SQL and SQL-MapReduce processing engines, plus ODBC/JDBC & BI tools

Page 41: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Data & Processing Locality in SQL-H

Diagram labels: HCatalog, Pig, Hadoop MR, Hive, Data, Data Filtering

Aster Layer: SQL-H
• SQL & SQL-MapReduce processing
• Intermediate data persistence
• Optional: HDFS data subset persistence for maximum performance

Hadoop Layer: HDFS
• HCatalog: metadata store
• HDFS: data repository
• No MapReduce processing in Hadoop

• Directly & in parallel move data from HDFS to Teradata Aster
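The division of labor on this slide (metadata from a catalog, raw files from the file system, filtering and parallel reads on the database side) can be sketched in a few lines of Python. This is only an illustration of the idea, not SQL-H itself and not the HCatalog API: the `CATALOG` and `FILES` dictionaries, the paths, and all function names are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for catalog metadata (this is not the HCatalog API).
CATALOG = {
    "weblogs": {
        "columns": ["user_id", "page"],
        "partition_key": "day",
        "partitions": {
            "2012-08-19": ["/logs/day=2012-08-19/part-0"],
            "2012-08-20": ["/logs/day=2012-08-20/part-0", "/logs/day=2012-08-20/part-1"],
            "2012-08-21": ["/logs/day=2012-08-21/part-0"],
        },
    }
}

# Hypothetical stand-in for files in HDFS: path -> raw delimited lines.
FILES = {
    "/logs/day=2012-08-19/part-0": ["u1,home"],
    "/logs/day=2012-08-20/part-0": ["u1,search", "u2,home"],
    "/logs/day=2012-08-20/part-1": ["u3,checkout"],
    "/logs/day=2012-08-21/part-0": ["u2,checkout"],
}


def files_to_read(table, keep_partition):
    """Partition pruning: consult metadata only to pick the files a query needs."""
    meta = CATALOG[table]
    return [
        path
        for value, paths in sorted(meta["partitions"].items())
        if keep_partition(value)
        for path in paths
    ]


def read_file(table, path):
    """Parse one file into rows shaped by the catalog's column list."""
    columns = CATALOG[table]["columns"]
    return [dict(zip(columns, line.split(","))) for line in FILES[path]]


def read_table(table, keep_partition, workers=4):
    """Fetch the selected files in parallel and present them as table rows."""
    paths = files_to_read(table, keep_partition)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(lambda p: read_file(table, p), paths)
        return [row for chunk in chunks for row in chunk]


# Only partitions from 2012-08-20 onward are read; the earlier file is never touched.
rows = read_table("weblogs", lambda day: day >= "2012-08-20")
```

The design point the sketch tries to capture is that the filter is applied to partition metadata before any file is read, so no processing job has to run on the storage side.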

Page 42: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Benefits of Aster SQL-H™

Deep metadata layer integration between Aster and Hadoop

Business Analysts (Powerful analytics & Performance)
• 50+ advanced SQL-MapReduce functions (Aster MapReduce Portfolio)
• Simplified, SQL-based interface with Hadoop data structures (HCatalog)
• Interoperability with existing ecosystem & skillset

Architects and Administrators (Maintainability)
• Leverage existing DBA skill-sets without additional overhead
• Simplify administration and monitoring
  - Alternatives require manual creation and maintenance of metadata
  - Less work and fewer errors
  - Can do filtering with Aster; select data from HCatalog, leverage partitioning

Page 43: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Aster MapReduce Portfolio: the App Store of Big Data

Some of the 50+ out-of-the-box analytical apps

• Path Analysis: Discover patterns in rows of sequential data
• Text Analysis: Derive patterns and extract features in textual data
• Statistical Analysis: High-performance processing of common statistical calculations
• Segmentation: Discover natural groupings of data points
• Marketing Analytics: Analyze customer interactions to optimize marketing decisions
• Data Transformation: Transform data for more advanced analysis

Page 44: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Big Data Architecture: Optimizing Workloads with Specialized Approach


Page 45: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

When to Use Which? The best approach by workload and data type

Processing as a Function of Schema Requirements by Data Type

Workload columns, left to right:
1. Low Cost Storage & Retention
2. Loading and Refining (Data Pre-Processing, Prep, Cleansing)
3. Loading and Refining (Transformations)
4. Reporting
5. Analytics (User-driven, interactive)

Platform by schema type (values in column order as they appear on the slide):
• Stable Schema: Teradata / Hadoop | Teradata | Teradata | Teradata | Teradata (SQL analytics)
• Evolving Schema: Hadoop | Aster / Hadoop | Aster (joining with structured data) | Aster | Aster (SQL + MapReduce Analytics)
• Format, No Schema: Hadoop | Hadoop | Hadoop | Aster (MapReduce Analytics)

Example workloads:
• Social feeds, text, document, or image processing; Audio/video storage and refining; Storage and batch transformations
• Interactive data discovery; Web clickstream; Set-top box analysis; CDRs, Sensor logs, JSON
• Financial analysis, ad-hoc/OLAP; Enterprise-wide BI and Reporting; Spatial/Temporal; Active Execution


Page 49: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

ESG Benchmark Report Summary

3rd-party validation of Aster and Hadoop “fit”

Scope
• Identical hardware for Aster and Hadoop
• Clickstream, sentiment, & traditional retail data
• Compare “time to insight” and “time to develop”

Results
• Loading: Hadoop 1.8x faster
• Transforms: Hadoop 1.3x faster
• Analytics: Aster 35x faster (range: 4-416x)
• Development: Aster 3x faster


Page 51: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Hadoop vs. Aster Web Clickstream Analytics

[Chart callouts]
• Aster 33X faster
• Aster 1.5X faster
• Aster 6X faster
• On average Aster is 18x faster

Page 52: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Example: Golden Path Analysis of Top Site Paths

Identifying Top Pathing Occurrences (for any event of interest)

Business Question
• How do we find and rank the 10 most frequent paths taken to the checkout page?
  - Page visits exist in multiple rows in the database, for each user

Analytics Question
• What is the most common path for a user on the site to…
  1. Enter the site
  2. View any page (other than the Help page)
  3. Make a purchase on the Checkout page
  4. Rank the top 10 occurrences

    SELECT click_path, count(*) AS path_frequency
    FROM nPath(
        ON clicks
        PARTITION BY user_id
        ORDER BY timestamp
        MODE(overlapping)
        PATTERN('(RELEVANT|IGNORE)*.BUY')
        SYMBOLS(
            page_type IN ('help.asp') AS IGNORE,
            page_type NOT IN ('help.asp') AS RELEVANT,
            page_type = 'checkout' AS BUY
        )
        RESULT(accum(page_id OF RELEVANT) AS click_path)
    ) T
    GROUP BY click_path
    ORDER BY count(*) DESC
    LIMIT 10;
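For readers without access to an Aster system, the logic of the golden path query can be approximated in plain Python. This is a rough sketch under stated assumptions, not a reimplementation of nPath: the click rows and the function name are hypothetical, and the sketch records one path per checkout, starting from the user's first click, which may differ from what nPath returns in overlapping mode.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (user_id, timestamp, page_id, page_type).
CLICKS = [
    ("u1", 1, "home", "home.asp"),
    ("u1", 2, "help", "help.asp"),
    ("u1", 3, "shoes", "product.asp"),
    ("u1", 4, "pay", "checkout"),
    ("u2", 1, "home", "home.asp"),
    ("u2", 2, "shoes", "product.asp"),
    ("u2", 3, "pay", "checkout"),
    ("u3", 1, "home", "home.asp"),
    ("u3", 2, "shoes", "product.asp"),
]


def top_checkout_paths(clicks, limit=10):
    """Count the paths that end in a checkout and return the most frequent ones."""
    counts = Counter()
    # Sorting the tuples groups rows by user and orders them by timestamp,
    # standing in for PARTITION BY user_id ORDER BY timestamp.
    for _, rows in groupby(sorted(clicks), key=itemgetter(0)):
        path = []
        for _, _, page_id, page_type in rows:
            if page_type == "checkout":
                # BUY closes the pattern: record the pages accumulated so far.
                counts[tuple(path)] += 1
                # A checkout also counts as a non-help page for any later path.
                path.append(page_id)
            elif page_type != "help.asp":
                # RELEVANT rows are accumulated into the click path.
                path.append(page_id)
            # help.asp rows (IGNORE) are skipped and never accumulated.
    # GROUP BY click_path ORDER BY count(*) DESC LIMIT 10
    return counts.most_common(limit)


top_paths = top_checkout_paths(CLICKS)
```

With the sample rows, the help page visited by `u1` is dropped from the path, so `u1` and `u2` contribute to the same path, while `u3`, who never reaches checkout, contributes nothing.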


Page 55: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

34 Confidential and proprietary. Copyright © 2012 Teradata Corporation.

Example: Golden Path Analysis of Top Site PathsIdentifying Top Pathing Occurrences (for any event of interest)

• Business Question• How do we find and rank the 10

most frequent paths taken to the checkout page?- Page Visits exist in multiple rows in

the database, for each user

• Analytics Question• What is the most common path for

a user on the site to…1. Enter the site2. View any page (other than the Help

page)- Make a purchase on the Checkout

page- Rank the top 10 occurrences

SELECT click_path, count(*) as path_frequencyFROM nPath( ON clicks PARTITION BY user_id ORDER BY timestamp MODE( overlapping ) PATTERN(‘(RELEVANT|IGNORE)*.BUY’) SYMBOLS( page_type IN (‘help.asp’) AS IGNORE, page_type NOT IN (‘help.asp’) AS RELEVANT, page_type = ‘checkout’ as BUY)RESULT( accum( page_id of RELEVANT) as

click_path )) TGROUP BY click_pathORDER BY count(*) descLIMIT 10;

Tuesday, August 21, 2012

Page 56: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

34 Confidential and proprietary. Copyright © 2012 Teradata Corporation.

Example: Golden Path Analysis of Top Site PathsIdentifying Top Pathing Occurrences (for any event of interest)

• Business Question• How do we find and rank the 10

most frequent paths taken to the checkout page?- Page Visits exist in multiple rows in

the database, for each user

• Analytics Question• What is the most common path for

a user on the site to…1. Enter the site2. View any page (other than the Help

page)- Make a purchase on the Checkout

page- Rank the top 10 occurrences

SELECT click_path, count(*) as path_frequencyFROM nPath( ON clicks PARTITION BY user_id ORDER BY timestamp MODE( overlapping ) PATTERN(‘(RELEVANT|IGNORE)*.BUY’) SYMBOLS( page_type IN (‘help.asp’) AS IGNORE, page_type NOT IN (‘help.asp’) AS RELEVANT, page_type = ‘checkout’ as BUY)RESULT( accum( page_id of RELEVANT) as

click_path )) TGROUP BY click_pathORDER BY count(*) descLIMIT 10;

Tuesday, August 21, 2012

Page 57: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

34 Confidential and proprietary. Copyright © 2012 Teradata Corporation.

Example: Golden Path Analysis of Top Site PathsIdentifying Top Pathing Occurrences (for any event of interest)

• Business Question• How do we find and rank the 10

most frequent paths taken to the checkout page?- Page Visits exist in multiple rows in

the database, for each user

• Analytics Question• What is the most common path for

a user on the site to…1. Enter the site2. View any page (other than the Help

page)- Make a purchase on the Checkout

page- Rank the top 10 occurrences

SELECT click_path, count(*) as path_frequencyFROM nPath( ON clicks PARTITION BY user_id ORDER BY timestamp MODE( overlapping ) PATTERN(‘(RELEVANT|IGNORE)*.BUY’) SYMBOLS( page_type IN (‘help.asp’) AS IGNORE, page_type NOT IN (‘help.asp’) AS RELEVANT, page_type = ‘checkout’ as BUY)RESULT( accum( page_id of RELEVANT) as

click_path )) TGROUP BY click_pathORDER BY count(*) descLIMIT 10;

Tuesday, August 21, 2012

Page 58: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

34 Confidential and proprietary. Copyright © 2012 Teradata Corporation.

Example: Golden Path Analysis of Top Site PathsIdentifying Top Pathing Occurrences (for any event of interest)

• Business Question• How do we find and rank the 10

most frequent paths taken to the checkout page?- Page Visits exist in multiple rows in

the database, for each user

• Analytics Question• What is the most common path for

a user on the site to…1. Enter the site2. View any page (other than the Help

page)- Make a purchase on the Checkout

page- Rank the top 10 occurrences

SELECT click_path, count(*) as path_frequencyFROM nPath( ON clicks PARTITION BY user_id ORDER BY timestamp MODE( overlapping ) PATTERN(‘(RELEVANT|IGNORE)*.BUY’) SYMBOLS( page_type IN (‘help.asp’) AS IGNORE, page_type NOT IN (‘help.asp’) AS RELEVANT, page_type = ‘checkout’ as BUY)RESULT( accum( page_id of RELEVANT) as

click_path )) TGROUP BY click_pathORDER BY count(*) descLIMIT 10;

Tuesday, August 21, 2012

Page 59: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Single Channel Pathing Analysis

Page 60: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Analyzing Multi-channel Identifies Advertising Signal

Page 61: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Hadoop Provides 1.3x Faster ELT on Average

Page 62: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

When to Use Which Depends on Data Type
- Aster faster on parsing and sessionizing Weblogs

Page 67: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Evolving Schema Example
Aster Digital Marketing Client

[Diagram labels: Raw Web Logs; Ad Server Logs; Media Data (Aggregated); Custom Data by Client; Hadoop (on AWS) (Storage, aggregations, cleansing); Cookie-level data; Teradata Aster; Analytic Tools; Archival]

• Benefits:
  - Marketing analysts more productive with Aster
  - Lower cost - storage and batch refining done on Amazon Elastic MapReduce

• Segmentation: Custom SQL-MR algorithms to match and create centralized identifiers

• Sessionize by client

• nPath identifies segment path analysis (behavior after ads)
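The sessionization step named on this slide can be illustrated with a short Python sketch. It is not the Aster sessionize function: it assumes clicks are (client_id, timestamp_seconds, page_id) tuples, reads "by client" as the partitioning key, and uses a 30-minute inactivity timeout, all of which are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Assumed inactivity gap that starts a new session (not taken from the slides).
SESSION_TIMEOUT_SECONDS = 30 * 60


def sessionize(clicks, timeout=SESSION_TIMEOUT_SECONDS):
    """Attach a per-client session number to each click.

    clicks: iterable of (client_id, timestamp_seconds, page_id) tuples.
    Returns a list of (client_id, session_number, timestamp_seconds, page_id).
    """
    ordered = sorted(clicks, key=itemgetter(0, 1))
    result = []
    for client_id, rows in groupby(ordered, key=itemgetter(0)):
        session = 0
        last_ts = None
        for _, ts, page_id in rows:
            # A gap longer than the timeout begins a new session.
            if last_ts is not None and ts - last_ts > timeout:
                session += 1
            result.append((client_id, session, ts, page_id))
            last_ts = ts
    return result


# Hypothetical sample data for illustration only.
clicks = [
    ("c1", 0, "ad"),
    ("c1", 600, "landing"),
    ("c1", 4000, "home"),
    ("c2", 100, "ad"),
]

for row in sessionize(clicks):
    print(row)
```

Once each click carries a session number, a path analysis such as the one nPath performs can be run per session to look at behavior after an ad impression.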

Page 69: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

More Accurate Customer Churn Prevention

• Hadoop captures, stores and transforms social, images and call records

• Aster does path and sentiment analysis with multi-structured data

[Diagram labels: Data Sources; Multi-Structured Raw Data; Call Center Voice Records; Check Images; Traditional Data Flow; Capture, Retain & Refine Layer; ETL Tools; Hadoop; Call Data; Check Data; Social feeds; Teradata Integrated DW; Dimensional Data; Analytic Results; Aster Discovery Platform; Clickstream Data; Sentiment Scores; Analysis + Marketing Automation (Customer Retention Campaign)]

Page 70: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

Summary
Bringing the VALUE of Hadoop to the Enterprise

• Teradata is focused on extracting the most business value for customers from data in Hadoop

• Mainstream organizations need a unified big data architecture
  - Best-of-breed with Hadoop, Aster, Teradata
  - Brings “Data Science” to business analysts
  - 50+ business-ready MapReduce analytics and apps
  - Enabled by SQL-MapReduce framework and new SQL-H

• Learn more at www.asterdata.com/mapreduce

Page 73: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

© Copyright 2012 Radiant Advisors. All Rights Reserved v1.00.000

THE GREAT DIVIDE: BRIDGING UNSTRUCTURED AND STRUCTURED DATA FOR NEW CUSTOMER INSIGHTS

Briefing Room - August 21, 2012
John O’Brien, Radiant Advisors
[email protected]

Page 74: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

JOHN O’BRIEN
Principal and Founder, Radiant Advisors

With over 25 years of experience delivering value through data warehousing and BI programs, John O’Brien's unique perspective comes from the combination of his roles as a practitioner, consultant, and vendor in the BI industry. His knowledge in designing, building, and growing enterprise BI systems and teams brings real-world insights to each role and phase within a BI program.

Today, through Radiant Advisors, John provides research and advisory services that guide companies in meeting the demands of next-generation information management, architecture, and emerging technologies.

Instructor 10+ years
As a recognized thought leader in BI, John has been publishing articles and presenting at conferences in North America and Europe for the past 10 years, including The Data Warehousing Institute, where he has been invited as one of TDWI’s Best Practices judges, Executive Summit presenters and expert panel participants. John has also developed and presented many of his own courses that now comprise the initial Radiant Advisors Learning Catalog.

Education
John has a B.S. in Mechanical Engineering from California State University with an emphasis in control systems and instrumentation and an Executive M.B.A. from University of Colorado. He has been a Certified Business Intelligence Professional (CBIP) since 2005 with mastery levels in Leadership and Administration, Database Administration and Business Intelligence.

Experienced
In 2005, John co-founded and became CTO of a data warehouse appliance company that raised $43 million in several rounds of venture capital financing and has many global production customers. As CTO, John’s primary role was to focus product development and BI market strategy.

Page 75: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

WHERE DOES CONTEXT LIVE?

[Diagram labels: Structured; Unstructured; More Rigid; More Agile; Context in structures; Context leveraged; Context(s) leveraged; Context in abstraction; BI Tools; Direct access; HCatalog; Hive; PIG; MapReduce (M/R); Hadoop HDFS; Individual Context with Data Scientists; Context in Data Scientists; Centralized Context in abstraction]

Page 76: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

UNLOCKING UNSTRUCTURED VALUE

[Diagram labels: Yesterday; Tomorrow; Value; Users Involved; Very Few Data Scientists; More Analysts; Many Many Consumers; Power Users; Analysts & Casual Users; BI Tool; DB; HCatalog; Hive; PIG; MapReduce; Hadoop HDFS]

Page 77: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

DISCOVERY IN BI PROCESSES

1. Discover Context
2. Defined Context Available to Structured Database

[Diagram labels: Very Few Data Scientists; Few Analysts/Modelers; More Analysts/Modelers; Many More Analysts; Many Many Consumers; BI Tool; HCatalog; Hive; PIG; M/R; Hadoop HDFS]

Page 78: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

MODERN BI ARCHITECTURES

Hadoop:
- Massive Scalability
- Lowest Cost
- Handles Complexity

Data Warehouse:
- Optimized Work Loads
- Operational
- Benefit from Context

[Diagram labels: Internet, Sensor data; Operational Systems; Insulate Change or Direct to Staging; Staging; Data Marts; Migrate History; Acquire; HCatalog or ETL; ETL; PIG; Hive; MapReduce; Hadoop HDFS; Very Few Data Scientists; Few Analysts/Modelers; Many Many Consumers]

Page 79: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

SUMMARY

• Understand context in processes and architectures

• Realize that value is unlocked with more users

• Discovery is a powerful BI process to operationalize

• Modern BI Architectures are integrating Hadoop

Page 80: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

• Is the Aster solution intended as a Data Discovery Platform and/or an Analytic Engine Platform?

• Is there any difference in semantics between Teradata's vision of an Integrated Data Warehouse and the "Analytic Platform" that includes Aster and Hadoop?

• Does the HCatalog need to be defined before users can use SQL-H to query Hadoop?

• The Aster MapReduce Portfolio enables its users to query and pull data from the Hadoop HDFS directly via SQL-H. When data is pulled in from HDFS into Aster, are the Aster tables modeled as in HCatalog or as key-value pairs?

• Is the output of the SQL-MR in Aster inserted into another physical table for further usage?

Page 81: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

• Given that Hive and PIG are interface layers above the MapReduce processing layer, does Aster's SQL-H layer work as an interface layer to MapReduce? Does SQL-H work similarly to Hive when processing data inside HDFS?

• For the performance comparisons between Aster and Hadoop, what guidelines were given for sizing the Hadoop environment?

• Given the commodity nature of Hadoop, does it make sense to increase the size of the Hadoop environment to gain performance more cost-effectively?

• When should one use Hadoop versus Aster? Based on data type? Based on workload (e.g. Load, ETL, Analyze)? Or based on analysis type (e.g. Sentiment Classification or Sessionization)?

Page 82: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

• Does Aster store "multi-structured" data such as audio, video, image, and PDF files as a BLOB/CLOB field in database records, or does it store pointers to files?

• Does Aster Data support Predictive Model Markup Language (PMML), enabling Discovery through interoperable analytic models, so that models developed in SAS or other platforms can be migrated to Aster?

Page 84: The Great Divide: Bridging Structured and Unstructured Data for New Customer Insights

August: Analytics

September: Integration

October: Database

November: Cloud

December: Innovators
