Putting Business Intelligence to Work on Hadoop Data Stores

Ian Fyfe, Chief Technology Evangelist, Pentaho
© 2010, Pentaho. All Rights Reserved. www.pentaho.com. Worldwide: +1 (866) 660-7555

Transcript of Putting Business Intelligence to Work on Hadoop Data Stores

Page 1: Putting Business Intelligence to Work on Hadoop Data Stores

Putting Business Intelligence to Work on Hadoop Data Stores
Ian Fyfe, Chief Technology Evangelist, Pentaho

Page 2: Putting Business Intelligence to Work on Hadoop Data Stores

Session Abstract

This presentation will cover how to overcome Hadoop's constraints to get more out of your business data analysis.

An inexpensive way of storing large volumes of data, Hadoop is also scalable and redundant. But getting data out of Hadoop is tough due to the lack of a built-in query language. Also, because users experience high latency (up to several minutes per query), Hadoop is not appropriate for ad hoc query, reporting, and business analysis with traditional tools.

The first step in overcoming Hadoop's constraints is connecting to Hive, a data warehouse infrastructure built on top of Hadoop, which provides the relational structure necessary for scheduled reporting on large datasets stored in Hadoop files. Hive also provides a simple query language called Hive QL, which is based on SQL and enables users familiar with SQL to query this data.

But to really unlock the power of Hadoop, you must be able to efficiently extract data stored across multiple (often tens or hundreds of) nodes with a user-friendly ETL (extract, transform and load) tool that will then allow you to move your Hadoop data into a relational data mart or warehouse where you can use BI tools for analysis.

Attendees will learn how an IT person without Java programming skills can:
• Integrate with Hadoop and Hive to bring ETL, data warehousing and BI applications to the tasks of analyzing Big Data;
• Provide key data integration and transformation functionality to Hadoop data;
• Manage and control Hadoop jobs using a graphical interface;
• Integrate Hadoop data with data from other sources to drive compelling reporting and analytics for today's massive volumes of data.

Page 3: Putting Business Intelligence to Work on Hadoop Data Stores

THE CASE FOR BIG DATA

Page 4: Putting Business Intelligence to Work on Hadoop Data Stores

The Case for Big Data

Enterprises increasingly face needs to store, process and maintain larger and larger volumes of structured and unstructured data
• Compliance
• Competitive Advantage

Challenges associated with big data
• Cost – storage and processing power
• Timeliness of data processing

Why Hadoop?
• Low cost, reliable scale-out architecture for storing massive amounts of data
• Parallel, distributed computing framework for processing data
• Proven success in solving Big Data problems at Fortune 500 companies like Google, Yahoo!, IBM and GE
• Vibrant community, exploding interest, strong commercial investments

[Figure: Google Trends for 'Hadoop']

Page 5: Putting Business Intelligence to Work on Hadoop Data Stores

Hadoop for Data Integration and BI

Top Use Cases for Hadoop*
1. "mine data for improved business intelligence"
2. "reducing cost of data analysis"
3. "log analysis"

Top Challenges with Hadoop*
1. Steep technical learning curve
2. Hiring qualified people
3. Availability of appropriate products and tools

Unfortunately, Hadoop was not designed specifically for ETL and BI use cases:
• It's not a database
• High latency queries and jobs not ideal for all BI use cases
• Skill set mismatch for traditional ETL users and BI Solution architects

*Based on a survey of 100+ Hadoop users conducted by Karmasphere, Sept. 2010

Page 6: Putting Business Intelligence to Work on Hadoop Data Stores

ESTABLISHING AN ARCHITECTURE FOR BIG DATA

Page 7: Putting Business Intelligence to Work on Hadoop Data Stores

Example Use Cases Today

Transactional
• Fraud detection
• Financial services/stock markets

Sub-Transactional
• Weblogs
• Social/online media
• Telecoms events

Page 8: Putting Business Intelligence to Work on Hadoop Data Stores

Example Use Cases Today

Non-Transactional
• Web pages, blogs etc.
• Documents
• Physical events
• Application events
• Machine events

In most cases structured or semi-structured

Page 9: Putting Business Intelligence to Work on Hadoop Data Stores

Traditional Business Intelligence (BI)

[Diagram labels: Data Source; Data Mart(s); Tape/Trash; the remaining elements are shown as question marks]

Page 10: Putting Business Intelligence to Work on Hadoop Data Stores

Data Lake
• Single source
• Large volume
• Not distilled
• Typically no more than 0-2 lakes per company
• Known and unknown questions
• Multiple user communities
• Doesn't fit in a traditional RDBMS at a reasonable cost

Page 11: Putting Business Intelligence to Work on Hadoop Data Stores

Data Lake Requirements
• Store all the data
• Satisfy routine reporting and analysis
• Satisfy ad-hoc query / analysis / reporting
• Balance performance and cost

Page 12: Putting Business Intelligence to Work on Hadoop Data Stores

What if...

[Diagram labels: Data Source; Data Lake(s); Data Mart(s); Ad-Hoc; Data Warehouse]

Page 13: Putting Business Intelligence to Work on Hadoop Data Stores

Big Data Does Not Replace Data Marts
• It's not a database
• High latency
• Optimized for massive data-crunching
• Big Data databases are immature
• Databases are no-SQL

Page 14: Putting Business Intelligence to Work on Hadoop Data Stores

What Hadoop Really is…
Core Components

HDFS
• A distributed file system allowing massive storage across a cluster of commodity servers

MapReduce
• Framework for distributed computation; common use cases include aggregating, sorting, and filtering BIG data sets
• Problem is broken up into small fragments of work that can be computed or recomputed in isolation on any node of the cluster
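The MapReduce idea on this slide (small fragments of work computed in isolation, then combined) can be illustrated in plain Java without a cluster. The sketch below is a single-process illustration and does not use the Hadoop API. The class name `MiniMapReduce`, its helper methods, and the sample log lines are hypothetical, chosen only to show an aggregation use case.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Single-process illustration of the MapReduce idea; NOT the Hadoop API.
final class MiniMapReduce {

    // Map phase: count words in ONE fragment, touching no shared state,
    // so a fragment can be computed (or recomputed) on any node.
    static Map<String, Integer> mapFragment(String fragment) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : fragment.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reduce phase: merge the per-fragment partial counts by key.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> totals.merge(word, n, Integer::sum));
        }
        return totals;
    }

    static Map<String, Integer> run(List<String> fragments) {
        // parallelStream() stands in for distributing fragments across nodes.
        List<Map<String, Integer>> partials = fragments.parallelStream()
                .map(MiniMapReduce::mapFragment)
                .collect(Collectors.toList());
        return reduce(partials);
    }

    public static void main(String[] args) {
        // Hypothetical input: two fragments of a web log.
        List<String> fragments = Arrays.asList(
                "GET /home GET /cart",
                "GET /home POST /cart");
        System.out.println(run(fragments));
    }
}
```

Because `mapFragment` reads only its own input, a failed fragment can simply be rerun elsewhere, which is the property the slide describes as "computed or recomputed in isolation".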

Page 15: Putting Business Intelligence to Work on Hadoop Data Stores

What Hadoop Really is…
Related Projects

Hive – a data warehouse infrastructure on top of Hadoop
• Implements a SQL-like query language, including a JDBC driver
• Allows MapReduce developers to plug in custom mappers and reducers

HBase – the Hadoop database – AH HA!
• A variant of NoSQL databases, problematic for traditional BI
• Best at storing large amounts of unstructured data
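Since Hive ships a JDBC driver, a Java program can in principle query it through the standard `java.sql` interfaces. The following is a minimal sketch under stated assumptions: the table `weblogs`, the column `status`, and the connection URL are hypothetical. The URL scheme depends on the Hive version in use, and a Hive JDBC driver jar is assumed to be on the classpath. Only standard JDBC calls are used.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: the URL, table and column names are hypothetical, and a Hive
// JDBC driver jar is assumed to be on the classpath.
final class HiveQuerySketch {

    // Builds a SQL-like aggregate query: row counts per value of a column.
    static String countByColumn(String table, String column) {
        // Allow only simple identifiers so the names cannot inject SQL.
        if (!table.matches("[A-Za-z_][A-Za-z0-9_]*")
                || !column.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("not a simple identifier");
        }
        return "SELECT " + column + ", COUNT(*) FROM " + table
                + " GROUP BY " + column;
    }

    // Runs the query over any JDBC connection and collects the results.
    static Map<String, Long> run(Connection conn, String query) throws SQLException {
        Map<String, Long> result = new LinkedHashMap<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(query)) {
            while (rs.next()) {
                result.put(rs.getString(1), rs.getLong(2));
            }
        }
        return result;
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical endpoint; pass the real JDBC URL as the first argument.
        String url = args.length > 0 ? args[0] : "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url)) {
            run(conn, countByColumn("weblogs", "status"))
                    .forEach((status, n) -> System.out.println(status + "\t" + n));
        }
    }
}
```

Such a query is compiled by Hive into batch jobs on the cluster, so latency can be high; this access path suits scheduled reporting better than interactive analysis.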

Page 16: Putting Business Intelligence to Work on Hadoop Data Stores

Hadoop and BI?
• Distributed processing
• Distributed file system
• Commodity hardware
• Platform independent (in theory)
• Scales out beyond the technology and/or economy of an RDBMS

In many cases it's the only viable solution

Page 17: Putting Business Intelligence to Work on Hadoop Data Stores

Hadoop and BI?

90% of new Hadoop use cases are transformation of semi/structured data*

* of those companies we've talked to...

Page 18: Putting Business Intelligence to Work on Hadoop Data Stores

Hadoop and BI?

"The working conditions within Hadoop are shocking"
ETL Developer

Page 19: Putting Business Intelligence to Work on Hadoop Data Stores

Hadoop and BI?

Instead of this...

Page 20: Putting Business Intelligence to Work on Hadoop Data Stores

Hadoop and BI?

You have to do this in Java...

public void map(
    Text key,
    Text value,
    OutputCollector output,
    Reporter reporter)

public void reduce(
    Text key,
    Iterator values,
    OutputCollector output,
    Reporter reporter)

Page 21: Putting Business Intelligence to Work on Hadoop Data Stores

People don't use Hadoop for BI because they want to...

Page 22: Putting Business Intelligence to Work on Hadoop Data Stores

...they do it because they have to...

Page 23: Putting Business Intelligence to Work on Hadoop Data Stores

... and unfortunately it wasn't designed for most BI requirements

Page 24: Putting Business Intelligence to Work on Hadoop Data Stores

Why not add to Hadoop the things it's missing...

Page 25: Putting Business Intelligence to Work on Hadoop Data Stores

... until it can do what we need it to?

Page 26: Putting Business Intelligence to Work on Hadoop Data Stores

If only we had a Java, embeddable, data transformation engine...

Page 27: Putting Business Intelligence to Work on Hadoop Data Stores

A Data Integration Engine for Hadoop

[Diagram labels: Data Marts, Data Warehouse, Analytical Applications; Data Integration Engine; Hadoop; Design, Deploy, Orchestrate]

Page 28: Putting Business Intelligence to Work on Hadoop Data Stores

[Diagram, layers from top to bottom:
Reporting / Dashboards / Analysis (Web Tier)
DM & DW (RDBMS)
Hive (Hadoop)
Files / HDFS (Hadoop)
Applications & Systems
Stage labels: Visualize, Optimize, Load]

Page 29: Putting Business Intelligence to Work on Hadoop Data Stores

[Diagram, layers from top to bottom:
Reporting / Dashboards / Analysis (Web Tier)
DM & DW (RDBMS)
Hive (Hadoop)
Files / HDFS (Hadoop)
Applications & Systems
Side label: Metadata]

Page 30: Putting Business Intelligence to Work on Hadoop Data Stores

[Diagram labels: Data Source; Data Lake(s); Data Mart(s); Ad-Hoc; Data Warehouse]

Page 31: Putting Business Intelligence to Work on Hadoop Data Stores

[Diagram, layers from top to bottom:
Reporting / Dashboards / Analysis (Web Tier)
RDBMS
Data Lake (Hadoop)
Applications & Systems]

Page 32: Putting Business Intelligence to Work on Hadoop Data Stores

Product Requirements for BI Against Hadoop

• Lower technical barriers through graphical ETL environment for creating and managing Hadoop MapReduce jobs
• Extreme ETL scalability through deployment across the Hadoop cluster
• Easily spin-off high performance data marts for interactive analysis
• Easily integrate data from Hadoop with data from other sources
• Provide end-to-end BI addressing common BI use cases with Hadoop including reporting, ad hoc query and interactive analysis
• Reduce costs through subscription-based pricing, reduced dependency on scarce technical resources, and easier maintainability

[Diagram labels: Agile BI; Interactive Analysis; Batch Reporting and Ad Hoc Query; Data Marts; Hadoop; Hive; Data Integration Jobs; Log Files; DBs and other sources]

Page 33: Putting Business Intelligence to Work on Hadoop Data Stores

THE ROAD AHEAD

Page 34: Putting Business Intelligence to Work on Hadoop Data Stores

The Road Ahead

Other NoSQL Integration
• Facilitate BI use cases on top of HBase, possibly others like MongoDB, Cassandra

Streaming Data Source Support
• In support of near-realtime use cases
• Long/always running data processing jobs

Contiguous Meta-data
• Data Lineage and Impact Analysis covering the entire big data architecture

The End of MapReduce (… as a concept ETL users need to understand)
• Push down optimization of Transformations that generate native MapReduce tasks in Hadoop

Page 35: Putting Business Intelligence to Work on Hadoop Data Stores

Hadoop Distro Wars

The Apache Software Foundation


Page 36: Putting Business Intelligence to Work on Hadoop Data Stores

Tools That Make Hadoop Easier
e.g. Apache Pig

Pig is a platform for analyzing large data sets
• Produces sequences of MapReduce programs

Integrate Pig scripts into enterprise data integration workflows, e.g.
1. Submit and monitor a series of Pig and MapReduce jobs
2. Process a database bulk load step to ready data for ad-hoc analysis or report bursting
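The two workflow steps on this slide amount to running a series of jobs in order, checking each outcome, and starting the bulk load only when everything before it succeeded. The sketch below shows that control flow in plain Java. It is not the Pig, Hadoop, or Pentaho API: the class `WorkflowSketch`, the step names, and the lambdas standing in for real job submissions are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical orchestration sketch; real steps would submit Pig or
// MapReduce jobs and a database bulk load instead of running lambdas.
final class WorkflowSketch {
    private final Map<String, BooleanSupplier> steps = new LinkedHashMap<>();

    // Registers a named step; the supplier returns true on success.
    WorkflowSketch step(String name, BooleanSupplier action) {
        steps.put(name, action);
        return this;
    }

    // Runs steps in order, logs each outcome, and stops at the first failure
    // so that a later bulk load never runs on incomplete data.
    List<String> run() {
        List<String> log = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> e : steps.entrySet()) {
            boolean ok;
            try {
                ok = e.getValue().getAsBoolean();
            } catch (RuntimeException ex) {
                ok = false;
            }
            log.add(e.getKey() + (ok ? ": OK" : ": FAILED"));
            if (!ok) {
                break;
            }
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> log = new WorkflowSketch()
                .step("pig-clean-logs", () -> true)
                .step("mapreduce-aggregate", () -> true)
                .step("bulk-load-data-mart", () -> true)
                .run();
        log.forEach(System.out::println);
    }
}
```

The returned log plays the role of the "monitor" half of step 1: each entry records which job ran and whether it succeeded.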

Page 37: Putting Business Intelligence to Work on Hadoop Data Stores

Growth in Adoption of Other NoSQL Big Data Platforms

HBase – the Hadoop database
MongoDB – scalable, high-performance, document-oriented database
LexisNexis HPCC – a data intensive computing system platform
Many others

Page 38: Putting Business Intelligence to Work on Hadoop Data Stores

Summary

Hadoop and other Big Data NoSQL platforms
• Great at storing and processing large diverse data volumes
• Not designed for Business Intelligence

Choosing the right BI technology can unlock your Big Data to drive actionable insights
• Graphical user interfaces
• Scalable
• Spin-off data marts
• Integrate data into data warehouses
• Integrated dashboards, reporting, data analysis, data integration

Page 39: Putting Business Intelligence to Work on Hadoop Data Stores

Thank You!

ifyfe@pentaho.com