Data Aggregation in Today's Data Warehouse

Yossi Matias, CTO, HyperRoll. New England Business Objects User Group.

Transcript of Data Aggregation in Today's Data Warehouse

Page 1: Data Aggregation in Today's Data Warehouse

Data Aggregation in Today's Data Warehouse

New England Business Objects User Group

Yossi Matias, CTO

HyperRoll

Page 2: Data Aggregation in Today's Data Warehouse

2 1/24/2006

Recap: BI made easy

Page 3: Data Aggregation in Today's Data Warehouse

Reports made easy

Page 4: Data Aggregation in Today's Data Warehouse

... and widespread

Page 5: Data Aggregation in Today's Data Warehouse

Business Objects XI Platform Capabilities

• High performance
• Scalability
• Reliability
• Service-oriented architecture
• But what about the underlying data warehouse?

Page 6: Data Aggregation in Today's Data Warehouse

An example application

Page 7: Data Aggregation in Today's Data Warehouse

BO Universe

Page 8: Data Aggregation in Today's Data Warehouse

Aggregate queries

Page 9: Data Aggregation in Today's Data Warehouse

Typical Data Warehouse

• A schema
• Methodology
• Lots of summary tables
• Table management challenges
  – Number of tables
  – Complex configurations
  – Table refresh
  – Redundant storage

Page 10: Data Aggregation in Today's Data Warehouse

What's wrong with this picture?

Multiple views of summary tables and a complex universe

Page 11: Data Aggregation in Today's Data Warehouse

Performance Fundamentals: Aggregations

[Chart: Processing Time and Query Time plotted against Number of Aggregations; y-axis: Time]

• Pre-calculated summaries of data
• Intersections of levels from each dimension
• Tradeoff between processing and query times
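The three points on this slide can be illustrated with a minimal Python sketch. It is not HyperRoll's algorithm; it assumes a tiny hypothetical fact set with two dimensions (time and store) and three levels each, builds one summary per intersection of levels up front, and then answers a query with a lookup. All row values, level names, and function names are invented for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (day, month, store, region, amount).
FACTS = [
    ("2006-01-03", "2006-01", "S1", "East", 100.0),
    ("2006-01-17", "2006-01", "S2", "East", 50.0),
    ("2006-01-17", "2006-01", "S3", "West", 70.0),
    ("2006-02-02", "2006-02", "S1", "East", 30.0),
]

# Levels per dimension mapped to their column index; None stands for "all".
TIME_LEVELS = {"day": 0, "month": 1, "all": None}
STORE_LEVELS = {"store": 2, "region": 3, "all": None}


def build_aggregates(facts):
    """Processing time: one summary per intersection of a time level and a store level."""
    summaries = {}
    for (t_name, t_idx), (s_name, s_idx) in product(
        TIME_LEVELS.items(), STORE_LEVELS.items()
    ):
        totals = defaultdict(float)
        for row in facts:
            key = (
                row[t_idx] if t_idx is not None else "ALL",
                row[s_idx] if s_idx is not None else "ALL",
            )
            totals[key] += row[4]
        summaries[(t_name, s_name)] = dict(totals)
    return summaries


def query_precomputed(summaries, t_level, s_level, t_value, s_value):
    """Query time: a dictionary lookup instead of a scan over the fact rows."""
    return summaries[(t_level, s_level)].get((t_value, s_value), 0.0)


SUMMARIES = build_aggregates(FACTS)
print(len(SUMMARIES))  # 3 time levels x 3 store levels = 9 summaries
print(query_precomputed(SUMMARIES, "month", "region", "2006-01", "East"))  # 150.0
```

Every additional summary makes `build_aggregates` do more work (processing time) so that more queries become lookups (query time), which is the tradeoff the chart describes.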

Page 12: Data Aggregation in Today's Data Warehouse

The Summary Table Dilemma

[Chart: Query Performance plotted against # of Summary Tables, with Maintenance Burden rising from Simple to Unbearable]

• ROLAP engines require a steady diet of summary tables to perform
• A few queries have acceptable performance...
• ...but the majority of queries, especially ad-hoc requests, perform poorly and system adoption suffers
• At some point summary table maintenance becomes unbearable

Page 13: Data Aggregation in Today's Data Warehouse

Typical Data Warehouse Environments

[Diagram labels:]
Applications, Databases, Flat Files, Mainframe, EAI/EDI
ETL Layer
Data Warehouse, ODS, Data marts
Summary Tables, Multidimensional Data Stores, Bursted Reports, Data Alerts, Cached Reports, Extra Hardware (Memory, CPUs)
Need for real-time information, from low to high: DSS, Ad-hoc Query, Budgeting & Planning, Operational BI, CPM, BAM, Real-Time Dashboards
Poor Query Performance & Poor User Concurrency
Longer Batch Window

Page 14: Data Aggregation in Today's Data Warehouse

On the limitation of RDBMS

“In fact, relational DBMS were never intended to provide the very powerful functions for data synthesis, analysis, and consolidation that is being defined as multi-dimensional data analysis.

These types of functions were always intended to be provided by separate, end-user tools that were outside and complementary to the relational DBMS products.”

E.F. Codd, S.B. Codd and C.T. Salley, Providing OLAP to User-Analysts: An IT Mandate

Page 15: Data Aggregation in Today's Data Warehouse

The Catch-22 of data aggregation in DW

• We want a Data Warehouse that performs data aggregations effectively

• The Data Warehouse should ideally consist of relational databases

• Relational databases are not designed to support data aggregation effectively

Page 16: Data Aggregation in Today's Data Warehouse

The HyperRoll approach

• Build an effective non-relational data aggregation server

• Have the data aggregation server provide “aggregation services” to a relational database

• As a result, have a HyperRoll-enabled relational database that effectively supports aggregations

Page 17: Data Aggregation in Today's Data Warehouse

HyperRoll for Relational

[Diagram labels: DBMS; fact table; ETL; HyperRoll Engine with Loading, Storage and Access layers; DB2 CLI, ODBC, Oracle OCI, ASCII; DBMS view; Gateway]

Data is loaded into HyperRoll (HR) in order to build aggregates.

Benefits:
• Up to 90% reduction in batch window compared to existing aggregation strategies
• Summary table storage & maintenance reduced or eliminated
• Up to 100x faster queries, and end users continue to use familiar applications

Page 18: Data Aggregation in Today's Data Warehouse

HyperRoll-enabled Data Warehouse

[Diagram labels: HyperRoll; Star Schema; Aggregates View; ROLAP Queries (SQL); Data Warehouse or Mart]

• 10x to 100x performance improvement
• Replaces or complements summary tables (but does NOT build or store summary tables)

Page 19: Data Aggregation in Today's Data Warehouse

A DW implementation

[Diagram labels: Fact1; Fact2; HyperRoll; MV1 through MV12; Query Tools; 400 Millions; 36 Millions]

Page 20: Data Aggregation in Today's Data Warehouse

Typical Data Warehouse

• A schema
• Methodology
• Lots of summary tables
• Table management challenges
  – Number of tables
  – Complex configurations
  – Table refresh
  – Redundant storage

Page 21: Data Aggregation in Today's Data Warehouse

Data Warehouse with HyperRoll

• Same methodology
• Same schema
• Now only “one summary” table
  – Represents all aggregations
  – Simplifies management


Page 23: Data Aggregation in Today's Data Warehouse

What's wrong with this picture?

Multiple views of summary tables and a complex universe

Page 24: Data Aggregation in Today's Data Warehouse

Data Warehouse with HyperRoll

One View of All Possible Tables

Page 25: Data Aggregation in Today's Data Warehouse

Query to the HyperRoll View

HyperRoll View: Simple Query

Query response time: a few seconds!

Page 26: Data Aggregation in Today's Data Warehouse

Significant Performance Enhancement

[Chart: Number of Seconds to Complete Query (y-axis, 0 to 3500) against Millions of Records (x-axis: 1M, 5M, 10M, 15M, 20M). Series: Business Objects + Oracle; Business Objects + Oracle + HyperRoll. Callout: Less than 1 second]

Page 27: Data Aggregation in Today's Data Warehouse

Typical Data Warehouse Environments

[Diagram labels:]
Applications, Databases, Flat Files, Mainframe, EAI/EDI
ETL Layer
Data Warehouse, ODS, Data marts
Summary Tables, Multidimensional Data Stores, Bursted Reports, Data Alerts, Cached Reports, Extra Hardware (Memory, CPUs)
Need for real-time information, from low to high: DSS, Ad-hoc Query, Budgeting & Planning, Operational BI, CPM, BAM, Real-Time Dashboards
Poor Query Performance & Poor User Concurrency
Longer Batch Window

Page 28: Data Aggregation in Today's Data Warehouse

Typical Data Warehouse Environments

[Diagram labels, now including the HyperRoll Engine:]
Applications, Databases, Flat Files, Mainframe, EAI/EDI
ETL Layer
Data Warehouse, ODS, Data marts
HyperRoll Engine (shown twice)
Summary Tables, Multidimensional Data Stores, Bursted Reports, Data Alerts, Cached Reports, Extra Hardware (Memory, CPUs)
Need for real-time information, from low to high: DSS, Ad-hoc Query, Budgeting & Planning, Operational BI, CPM, BAM, Real-Time Dashboards
Poor Query Performance & Poor User Concurrency
Longer Batch Window

Page 29: Data Aggregation in Today's Data Warehouse

The best of both worlds

Relational (RDBMS)
• “Unlimited” scope of data
• Variety of client tools
• High maintenance
• Complex table joins and aggregations slow down queries
• Complex analysis difficult

OLAP
• Fast queries
• Complex analysis
• Limited scope
• Long cube builds
• Limited client tools

HyperRoll offers the best of both worlds
• Transparent integration with both relational and multidimensional databases
• Seamless to the existing client tools
• Fast build process (dramatically faster than OLAP)
• Fast queries without having to design, build and maintain multiple summary tables
• Broader scope of analysis (dimensions and data)
• Eliminates complex joins and GROUP BY

Page 30: Data Aggregation in Today's Data Warehouse

The HyperRoll aggregation server

• What's the magic with the HyperRoll aggregation server?

• Does it compute all possible aggregates?

• How come it can perform so much better than OLAP cubes?

Page 31: Data Aggregation in Today's Data Warehouse

Multidimensional Cube

[Diagram legend: Theoretical scope of data; Leaf level data; Aggregated Data]

Problems:
• Sparsity
• Irregularity

Page 32: Data Aggregation in Today's Data Warehouse

Typical OLAP Problems: Data Explosion

[Chart: Data Explosion Syndrome. Number of Aggregations (y-axis, 0 to 70000) against Number of Dimensions (x-axis, 2 to 8), with 4 levels in each dimension. Data labels for 2 through 8 dimensions: 16, 81, 256, 1024, 4096, 16384, 65536]
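The growth shown in the chart can be reproduced with a short Python sketch, assuming that every aggregation is one combination of a level from each dimension, so the count is the product of the level counts. The function name is hypothetical.

```python
def aggregation_count(levels_per_dimension):
    """Number of level intersections: the product of the level counts."""
    count = 1
    for levels in levels_per_dimension:
        count *= levels
    return count


# Four levels in each dimension, for 2 through 8 dimensions.
COUNTS = {dims: aggregation_count([4] * dims) for dims in range(2, 9)}
for dims, count in COUNTS.items():
    print(dims, count)
```

Under this assumption the count is 4^d for d dimensions, which matches the chart's labels for 2 and for 4 through 8 dimensions. For three dimensions the product is 4^3 = 64, whereas the chart label reads 81.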

Page 33: Data Aggregation in Today's Data Warehouse

What is HyperRoll?

• An intelligent Aggregation Server
• Software engine based on proprietary algorithms for data aggregation
  – Pre-computes a small-footprint data store
  – Enables quick computation of aggregate values
  – Highly efficient I/O
• The logical equivalent of OLAP for relational without the limitations
• Patented architecture for standalone data aggregation
• Integrated into existing relational databases and Business Intelligence systems

Page 34: Data Aggregation in Today's Data Warehouse

What about hardware-based solutions?

• Will better H/W make the aggregation problem go away?
• The good news:
  – Better H/W platforms improve performance
• The bad news:
  – The problem will just get worse over time

Page 35: Data Aggregation in Today's Data Warehouse

InfoGlut is Only Getting Worse

[Chart: Volume of Corporate Data and Linear Processing Capability (Moore's Law) plotted as a Multiple (y-axis: 1, 2, 3) against Time (x-axis: 9 Mo., 18 Mo.)]

Consequences:
• Longer Processing Times
• Soaring Costs
• Limited Analysis Flexibility
• Out of Date Information

Page 36: Data Aggregation in Today's Data Warehouse

Financial Institution

Financial Services Company
Application: Expense Management
Primary Business Issue: Analyst Productivity
Oracle, Business Objects

Before:
• Expense process required 4 reports to run sequentially
• Total time to complete the task: 4 hours
• Queries ran from 11 to 14 minutes

Now:
• Query performance increase: 37 to 90X
• Process now completed in minutes
• Queries run in 2 to 18 seconds
• Projected manpower saving: >$500K

Page 37: Data Aggregation in Today's Data Warehouse

Customer Test Results

Query Name        Oracle Timing (MV)   HR Timing   Improvement
Bill to           7 min 51 sec         14 sec      31 X
Territory         7 min 15 sec         1 sec       427 X
Region            11 min 4 sec         1 sec       422 X
Sales Force       12 min 12 sec        1 sec       438 X
All Sales Force   16 min 37 sec        1 sec       541 X

Page 39: Data Aggregation in Today's Data Warehouse

Setting up Queries to work with HyperRoll

Step 1: Analyze the Business Objects Universe, reports, queries and schema

Analyze the Business
• Analyze Reports
• Analyze Semantic Layer
• Select Measures
• Select Dimensions
• Select Hierarchies
• Obtain Design Validation
• Look for hidden requirements

Page 40: Data Aggregation in Today's Data Warehouse

Setting up Queries to work with HyperRoll

Step 2: Design the HyperRoll metadata structure using HyperRoll HDF Builder

Page 41: Data Aggregation in Today's Data Warehouse

Setting up Queries to work with HyperRoll

Step 3: HyperRoll is loaded with source data:
• RDBMS
• Flat files

• Source data is read
• HyperRoll aggregation engine is loaded and calculated
• Hierarchies are developed

Page 42: Data Aggregation in Today's Data Warehouse

Setting up Queries to work with HyperRoll

Step 4: Create the Database view and link it to HyperRoll

• Define an ODBC System DSN for HyperRoll

• Create a DBlink for the DSN

• Create View as Select * from HyperRoll@DBlink
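The idea behind the view in Step 4 can be tried locally with Python's built-in sqlite3 module. This is only a stand-in sketch: SQLite has no ODBC DSNs or database links, so a local table named hyperroll_stub (an invented name) plays the role of the remote object that the slide reaches as HyperRoll@DBlink, and the view name hr_sales_vw and the sample rows are likewise hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stand-in for the aggregate source. On the slide this is a remote
    -- object reached through a database link, not a local table.
    CREATE TABLE hyperroll_stub (region TEXT, month TEXT, amount_sold REAL);
    INSERT INTO hyperroll_stub VALUES
        ('East', '2006-01', 150.0),
        ('West', '2006-01', 70.0),
        ('East', '2006-02', 30.0);
    -- The view is the only object that client tools need to know about.
    CREATE VIEW hr_sales_vw AS SELECT * FROM hyperroll_stub;
    """
)


def sales_by_region(connection):
    """Query the view exactly as if it were an ordinary table."""
    return connection.execute(
        "SELECT region, SUM(amount_sold) FROM hr_sales_vw "
        "GROUP BY region ORDER BY region"
    ).fetchall()


RESULT = sales_by_region(conn)
print(RESULT)  # [('East', 180.0), ('West', 70.0)]
```

The point of the design is that queries name only the view, so whatever sits behind it can change without touching the reports.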

Page 43: Data Aggregation in Today's Data Warehouse

Setting up Business Objects to work with HyperRoll

Step 5: Modify the Business Objects Universe by adding the Database View that points to HyperRoll

• Add the View to the Universe

• Enable the Aggregate Aware Function

Page 44: Data Aggregation in Today's Data Warehouse

Setting up Queries to work with HyperRoll

Step 6: Users execute queries against the database as they normally would

[Diagram labels: RDBMS; HyperRoll]

• Transparent redirection between detail and aggregate data

• No user training

• Dramatically improved query response

• Dramatically improved manageability

Page 45: Data Aggregation in Today's Data Warehouse

Add the View to Business Objects

• Here the new Database view has been added to the current BO Universe

• The view comprises the aggregated data for the existing schema

Database View Accessing HyperRoll data

Page 46: Data Aggregation in Today's Data Warehouse

Enable Aggregate Aware Function

• In the Aggregate Aware function, place the matching column from the view as the first parameter, and the column from the fact table as the second parameter

@Aggregate_Aware(SH.HR_SALES_VW.AMOUNT_SOLD, SH.SALES.AMOUNT_SOLD)

Page 47: Data Aggregation in Today's Data Warehouse

Setting up Queries to work with HyperRoll

[Flowchart: a Query Request from the Business Objects End User Layer goes to Business Objects SQL Generation, which asks "Is it a summary request?" If no (N), the SQL Request goes to the detailed FACT TABLE in the RDBMS. If yes (Y), it goes to the summarized VIEW, which reaches the HyperRoll Instance through the Gateway.]
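The decision in this flowchart can be sketched in a few lines of Python. This is a simplified illustration, not Business Objects' actual implementation of @Aggregate_Aware: it assumes each source advertises the columns it can answer, lists the sources most-aggregated first, and picks the first one that covers the request. The names SH.HR_SALES_VW, SH.SALES and AMOUNT_SOLD come from the slides; the other column names and the function name are hypothetical.

```python
# Hypothetical catalog, most-aggregated source first, mirroring the
# parameter order of the Aggregate Aware function.
SOURCES = [
    ("SH.HR_SALES_VW", {"REGION", "MONTH", "AMOUNT_SOLD"}),
    ("SH.SALES", {"REGION", "MONTH", "DAY", "ORDER_ID", "AMOUNT_SOLD"}),
]


def choose_source(requested_columns):
    """Return the first source whose columns cover the whole request."""
    needed = set(requested_columns)
    for name, columns in SOURCES:
        if needed <= columns:
            return name
    raise ValueError(f"no source can answer columns: {sorted(needed)}")


# A summary request is served by the view; a detail request falls
# through to the fact table.
print(choose_source(["REGION", "AMOUNT_SOLD"]))    # SH.HR_SALES_VW
print(choose_source(["ORDER_ID", "AMOUNT_SOLD"]))  # SH.SALES
```

Because the choice is made while the SQL is generated, users issue the same request in both cases and never pick a table themselves.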

Page 48: Data Aggregation in Today's Data Warehouse

HyperRoll Value Propositions

• Improved query performance
• Reduced batch window to load data
• Lower maintenance and support costs
• Enables operational BI
• Complementary to existing BI, DB and DW infrastructures

Page 49: Data Aggregation in Today's Data Warehouse

How to learn more

• On algorithms for massive data sets
  – http://theory.stanford.edu/~matias/
• On HyperRoll
  – Talk to me over the break
    • [email protected]
  – Talk to Kathleen
    • [email protected]
    • (845)-928-6974
  – Take a webinar: www.hyperroll.com

Page 50: Data Aggregation in Today's Data Warehouse

Realizing the Potential of Business Intelligence

Yossi Matias, CTO, HyperRoll

NEBOUG

January 19, 2006