#Kscope Advanced BSO Design Tim German Senior Consultant, Qubix

Page 1: German Presentation2

#Kscope

Advanced BSO Design

Tim German

Senior Consultant, Qubix

Page 2: German Presentation2

Qubix Company Information

Founded 1987 in UK

Offices in UK, Australia, Dubai and US

Focused on delivery of EPM solutions using Oracle Hyperion suite:

● Essbase

● Planning

● HFM

Clients – Financial, Government, Transportation, Retail, Manufacturing…

Oracle Platinum Partner

Page 3: German Presentation2

Summary

● Deep-dive into the ‘usual suspects’ of BSO design principles

● Dense / Sparse configuration

● Block Size

● Dynamic vs Stored

● Dimension Order

● Dimensional analysis and modeling

● Useful techniques

● Approaches to handling SCDs

● Varying attributes

● Designing for maximum parallelism

Page 4: German Presentation2

Conventions and Notes

Examples in MaxL, not ESSCMD

Tested and demonstrated using 11.1.2.2

Assumes some familiarity with, and (to run demonstration code) access to:

● Sample.Basic

● ASOSamp.Sample

Documentation reference links are 11.1.2.2

Page 5: German Presentation2

Density and Block Size (1)

Block size

● Product of the stored member counts of the dense dimensions, multiplied by eight (bytes per cell)

● Size on disk likely to be considerably smaller

● Compression

● Bitmap vs Run-Length Encoding vs ZLIB (vs IVP)
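As a rough illustration of the block-size rule on this slide, the sketch below multiplies the stored member counts of the dense dimensions by eight bytes per cell. The dimension names and counts are assumptions chosen for the example (they are not read from any outline), and the function name is hypothetical.

```python
from math import prod

BYTES_PER_CELL = 8  # each stored cell in a block takes eight bytes


def block_size_bytes(stored_dense_members):
    """Uncompressed block size: product of stored dense member counts x 8."""
    return prod(stored_dense_members.values()) * BYTES_PER_CELL


# Assumed stored-member counts for three dense dimensions (illustrative only).
dense = {"Year": 12, "Measures": 8, "Scenario": 2}
print(block_size_bytes(dense))  # 12 * 8 * 2 * 8 = 1536
```

Only stored members count: dynamic calc and label-only members of a dense dimension would be left out of the counts passed in.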

Page 6: German Presentation2

Density and Block Size (2)

● Uncompressed block

            Jan   Feb   Mar   Apr   May
Sales       100   100   100   100   #Mi
COGS        #Mi   50    50    50    #Mi
Marketing   25    25    25    #Mi   #Mi
Payroll     10    10    10    10    #Mi
Misc        #Mi   5     #Mi   10    #Mi

● Bitmap 11110 01110 11100 11110 01010

● Compressed Block 100,100,100,100,50,50,50,25,25,25,10,10,10,10,5,10
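The bitmap scheme for this example block can be sketched in a few lines. This is a toy model rather than Essbase's on-disk format: the row-by-row cell order, the use of `None` for #Missing, and the function name are all assumptions made for illustration.

```python
MISSING = None  # stands in for #Missing

# Rows of the example block: Sales, COGS, Marketing, Payroll, Misc (Jan..May).
block = [
    [100, 100, 100, 100, MISSING],
    [MISSING, 50, 50, 50, MISSING],
    [25, 25, 25, MISSING, MISSING],
    [10, 10, 10, 10, MISSING],
    [MISSING, 5, MISSING, 10, MISSING],
]


def bitmap_compress(rows):
    """Return (bitmap, values): one bit per cell plus only the non-missing cells."""
    bitmap = " ".join(
        "".join("0" if cell is MISSING else "1" for cell in row) for row in rows
    )
    values = [cell for row in rows for cell in row if cell is not MISSING]
    return bitmap, values


bitmap, values = bitmap_compress(block)
print(bitmap)  # 11110 01110 11100 11110 01010
print(values)  # the 16 stored values, in row order
```

The saving comes from storing 16 values plus 25 bits instead of 25 full cells.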

Page 7: German Presentation2

Density and Block Size (3)

● Same uncompressed block

            Jan   Feb   Mar   Apr   May
Sales       100   100   100   100   #Mi
COGS        #Mi   50    50    50    #Mi
Marketing   25    25    25    #Mi   #Mi
Payroll     10    10    10    10    #Mi
Misc        #Mi   5     #Mi   10    #Mi

● Run-Length Encoding 100x4,#Mi,#Mi,50x3,#Mi,25x3,#Mi,#Mi,10x4,#Mi,#Mi,5,#Mi,10,#Mi
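A minimal run-length encoder over the same block, read row by row, might look like the sketch below. It is illustrative only: the cell order and the `(value, run_length)` pair representation are assumptions, and unlike the slide's notation it also collapses consecutive #Mi cells into a single run.

```python
from itertools import groupby

MISSING = "#Mi"

# The example block flattened row by row (Sales, COGS, Marketing, Payroll, Misc).
cells = [
    100, 100, 100, 100, MISSING,
    MISSING, 50, 50, 50, MISSING,
    25, 25, 25, MISSING, MISSING,
    10, 10, 10, 10, MISSING,
    MISSING, 5, MISSING, 10, MISSING,
]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original cell list."""
    return [value for value, count in pairs for _ in range(count)]


encoded = run_length_encode(cells)
print(encoded)
```

Decoding the pairs reproduces the 25 original cells, which is a quick way to check that an encoding has neither dropped nor added a #Mi.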

Page 8: German Presentation2

Density and Block Size (4)

● ZLIB

● High CPU demand, possibly most effective compression

● IVP (Index Value Pair)

● Not selectable – Essbase uses it for very sparse blocks

● Intelligent selection block-by-block (not clear this is true – at selection time?)

● Settings only applied when writing

Page 9: German Presentation2

Density and Block Size (5)

Size limits - DBAG vs Reality…

Measure density empirically

● Sample data set!

● Set all but one dimension to dense

● Restructure

● Record block density

● Iterate through dimensions

● Sample data set must be representative!

● If the sample is filtered on a dimension (for example), the test has little value for that dimension

Page 10: German Presentation2

Dimension Order (2)

Classic ‘old-school’ recommendation – Hourglass layout

● Largest Dense Dimension

● Smallest Dense Dimension

● Smallest Sparse Dimension

● Largest Sparse Dimension

Dense dimension order primarily impacts compression / storage

Sparse dimension order primarily impacts calculation / parallelism
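The hourglass ordering can be expressed as a simple sort: dense dimensions from largest to smallest, followed by sparse dimensions from smallest to largest. The sketch below assumes each dimension is described by a hypothetical `(name, is_dense, stored_member_count)` tuple; the outline shown is illustrative only.

```python
def hourglass_order(dimensions):
    """dimensions: list of (name, is_dense, stored_member_count) tuples."""
    dense = sorted((d for d in dimensions if d[1]), key=lambda d: -d[2])
    sparse = sorted((d for d in dimensions if not d[1]), key=lambda d: d[2])
    return [name for name, _, _ in dense + sparse]


# Hypothetical outline; names and member counts are illustrative only.
dims = [
    ("Product", False, 19),
    ("Scenario", True, 2),
    ("Market", False, 25),
    ("Year", True, 12),
    ("Measures", True, 8),
]
print(hourglass_order(dims))
# ['Year', 'Measures', 'Scenario', 'Product', 'Market']
```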

Page 11: German Presentation2

Dimension Order (1)

Calculator cache

● Contains a bitmap which tracks which blocks exist during a sparse calculation

● Can’t track ALL blocks in most databases (number of bits would equal number of potential blocks!)

● Hence ‘bitmap’ vs ‘anchor’ dimensions

● Essbase has a number of options depending on the size of the calculator cache

Page 12: German Presentation2

Dimension Order (2)

Essbase ‘prefers’ the option of creating multiple bitmaps with a single ‘anchor’ dimension

Modify the Sample.Basic outline to make Scenario sparse (for demonstration purposes)

● Scenario – 2 members / 0 max dependent parents

● Product – 19 members / 2 max dependent parents

● Market – 25 members / 1 max dependent parents

Note: Max dependent parents = largest number of dependent parents (i.e. parents into which children aggregate) of any single child

Page 13: German Presentation2

Dimension Order (3)

Bitmap for stored combinations

         Product   100   100-10   …
Actual   0         0     1        …
Budget   0         0     1        …
…        …         …     …        …

Bitmap size

Size in bytes = 2 x 19 / 8 = 4.75, rounded up to 5 (note that the minimum is 4 – but almost never a real problem)

Number of bitmaps

Anchor max. dependent parents + 2 = 1 + 2 = 3

Total cache size required = 3 x 5 = 15 bytes
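The arithmetic on this slide can be checked with a short sketch. The function names are hypothetical; the formulas are the ones stated on the slide (one bit per combination of bitmap-dimension members, a 4-byte minimum, and anchor max. dependent parents + 2 bitmaps).

```python
import math

MIN_BITMAP_BYTES = 4  # minimum bitmap size noted on the slide


def bitmap_size_bytes(bitmap_dim_member_counts):
    """One bit per combination of bitmap-dimension members, rounded up to bytes."""
    combinations = math.prod(bitmap_dim_member_counts)
    return max(math.ceil(combinations / 8), MIN_BITMAP_BYTES)


def multiple_bitmap_cache_bytes(bitmap_dim_member_counts, anchor_max_dependent_parents):
    """Cache needed for the multiple-bitmap option: (parents + 2) bitmaps."""
    bitmaps = anchor_max_dependent_parents + 2
    return bitmaps * bitmap_size_bytes(bitmap_dim_member_counts)


# Scenario (2 members) and Product (19 members) as bitmap dimensions,
# Market (1 max dependent parent) as the anchor, as in the slide's example.
print(bitmap_size_bytes([2, 19]))               # 5
print(multiple_bitmap_cache_bytes([2, 19], 1))  # 15
```

On a real outline the numbers are far larger, which is why the calculator cache setting decides which bitmap option Essbase can use.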

Page 14: German Presentation2

Dimension Order (4)

If not possible, next-best option is the use of a single bitmap with a single anchoring dimension

Total cache size required = Bitmap size = 5 bytes (note that this is smaller)

Page 15: German Presentation2

Dimension Order (5)

If not possible, next-best option is the use of a single bitmap with multiple anchoring dimensions

This happens when there are too many sparse combinations before the last dimension to even create one full bitmap

Can tell which option is selected from the log:

Calculator cache with multiple bitmaps for [Product]

Page 16: German Presentation2

Dimension Order (6)

Later recommendation – Hourglass ‘on a stick’

● Non-consolidating sparse dimensions

● Very often we are FIXed on one member at a time

Parallelism (more later…)

Page 17: German Presentation2

Storage Properties

Classic recommendations

● Upper-level dense dynamic

● Upper-level sparse stored

● Why the disconnect?

What happens when a block is ‘touched’ for read / write / calculation

General principle - choose more CPU activity ahead of more IO

Page 18: German Presentation2

Dimensional Analysis (Dimensions 1)

‘Real’ or ‘Base’ dimensions

● For a single member of another dimension, data may exist at any member of a ‘real’ dimension

● Dimensions are usually hierarchic – i.e. they contain upper-level nodes which represent aggregations of lower-level nodes

● Does not always mean additive

● Remember to use label-only when upper-level nodes are for ‘navigation’ only

● Implied sharing

Page 19: German Presentation2

Dimensional Analysis (Dimensions 2)

Attribute dimensions

● For a single member of a ‘base’ dimension, data always exists at the same member of an attribute dimension*

● Attribute to Base relationship, One to Many

● In other words, there is a ‘pre-existing’ relationship between the two dimensions

● Remember that Attributes can only be on Sparse dimensions

● Attribute queries are dynamic

● Performance – depends on the number of blocks per attribute member

Page 20: German Presentation2

Dimensional Analysis (Dimensions 3)

Attribute dimensions Cont.

● Distinguish attributes from hierarchies!

● ‘Region’, ‘State’, ‘City’ might all exist in Fact data

● Should ‘Region’ be an attribute on ‘City’?

● One to Many criterion met

● ‘Population’ might be an attribute on ‘City’

● Consider whether attribute queries at multiple levels of base dimension are required

● Revised definition of Attribute dimensions

Page 21: German Presentation2

Dimensional Analysis (Dimensions 4)

Example

                   Sales   Profit   COGS   Expenses
Small    East      123     456      123    456
Medium   East      234     678      234    678
Small    Central   123     456      123    456
Medium   Central   234     678      234    678

Page 22: German Presentation2

Dimensional Analysis (Alternates 1)

Different roll-up of same members

● Level-zero or above

● Alternates vs Attributes / Dimensions – Cross-Tab possible

● Technical advantages to alternates

Sales Jan NY Actual

         Colas   Root Beer   Cream Soda   Fruit Soda   Diet Drinks
Bottle   123     456         123          456          123
Can      234     678         234          678          234

Page 23: German Presentation2

Dimensional Analysis (Irrelevance 1)

Interdimensional irrelevance

● BSO is a bad place to combine multiple sets of data which require different dimensionality

● Example:

● Requirement for a Sales breakdown by Product

● Requirement for a Payroll Expense breakdown by Employee

● Wrong Answer : Add an Employee dimension to your existing ‘Sample.Basic’ style cube

Page 24: German Presentation2

Dimensional Analysis (Irrelevance 2)

● Sales data is loaded to ‘No Employee’ member

Sales Jan NY

         100-10   100-20   100-30   100-40   No_Prod
Emp1     #Mi      #Mi      #Mi      #Mi      #Mi
Emp2     #Mi      #Mi      #Mi      #Mi      #Mi
Emp3     #Mi      #Mi      #Mi      #Mi      #Mi
No_Emp   123      234      345      456      #Mi

Page 25: German Presentation2

Dimensional Analysis (Irrelevance 3)

● Payroll expense data is loaded to ‘No Product’ member

Payroll Jan NY

         100-10   100-20   100-30   100-40   No_Prod
Emp1     #Mi      #Mi      #Mi      #Mi      123
Emp2     #Mi      #Mi      #Mi      #Mi      234
Emp3     #Mi      #Mi      #Mi      #Mi      345
No_Emp   #Mi      #Mi      #Mi      #Mi      #Mi

● Net result

● Guaranteed greater block sparsity (if one or both dimensions are sparse)

● Guaranteed lower block density (if one or both dimensions are dense)
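To put a number on the effect, the sketch below counts populated cells across the Sales and Payroll grids from the last two slides. It is a back-of-envelope illustration only: it treats the Employee x Product grid for the two measures as the full space, and the comparison with two separate cubes is an assumption about how those cubes would be laid out.

```python
products = ["100-10", "100-20", "100-30", "100-40", "No_Prod"]
employees = ["Emp1", "Emp2", "Emp3", "No_Emp"]

# Values from the two grids, keyed by (measure, employee, product).
cells = {
    ("Sales", "No_Emp", "100-10"): 123,
    ("Sales", "No_Emp", "100-20"): 234,
    ("Sales", "No_Emp", "100-30"): 345,
    ("Sales", "No_Emp", "100-40"): 456,
    ("Payroll", "Emp1", "No_Prod"): 123,
    ("Payroll", "Emp2", "No_Prod"): 234,
    ("Payroll", "Emp3", "No_Prod"): 345,
}


def density(populated, potential):
    """Fraction of potential cells that hold data."""
    return populated / potential


combined_potential = 2 * len(employees) * len(products)  # 2 measures x 4 x 5
combined = density(len(cells), combined_potential)

# Separate cubes: Sales by the 4 real products, Payroll by the 3 real employees.
sales_only = density(4, 4)
payroll_only = density(3, 3)

print(f"combined: {combined:.1%}")  # 7 of 40 cells
print(f"separate: {sales_only:.0%} and {payroll_only:.0%}")
```

Every real employee added to the combined cube adds a full row of product cells that can never hold data, so the gap widens as the dimensions grow.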

Page 26: German Presentation2

Dimensional Analysis (Irrelevance 4)

● Better Answer – build two applications

● In some cases, may legitimately want to show data with different dimensionality in the same reports:

● Sales and R&D

● R&D and Sales available by Product

● R&D cost not available by Market

● Harder Answers -

● Load data to both cubes

● Transparent partitioning

● @XWRITE

● @XREF

● Last two generally in scenarios with smaller datasets – rates, drivers

Page 27: German Presentation2

Dimensional Analysis (Other Issues)

Consider reporting requirements

Classic Example: Period

● Years

● Periods

● Days

Some requirements (e.g. retail) may include reporting ‘Year on Year’ same Periods, Weeks or Days

Different dimensional structures make significant difference in how easily these are generated

Page 28: German Presentation2

Dimensional Analysis (Conclusion)

Common theme

● It’s difficult to design a database if you aren’t familiar with (or don’t investigate) the dataset

● It’s difficult to design a database if you aren’t familiar with (or don’t investigate) reporting requirements

● With unfamiliar dataset, can gain some insights quickly using relational tools

● Can use queries to identify one-to-many / many-to-one relationships in Fact data

SELECT Market, COUNT(DISTINCT Region) FROM Fact GROUP BY Market

● No equivalent for reporting users
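A relationship check of this kind can be tried out on a throwaway table. The sketch below uses Python's built-in sqlite3 module with a hypothetical `Fact` table and made-up rows. It counts distinct Regions per Market, so that repeated fact rows are not counted twice, and the `HAVING` filter is an addition that keeps only the Markets breaking the many-to-one relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Fact (Market TEXT, Region TEXT, Sales REAL)")
conn.executemany(
    "INSERT INTO Fact VALUES (?, ?, ?)",
    [
        ("New York", "East", 100.0),
        ("New York", "East", 150.0),
        ("Florida", "East", 80.0),
        ("Florida", "South", 60.0),  # same Market under a second Region
        ("Texas", "South", 90.0),
    ],
)

# Markets that appear under more than one Region break the many-to-one
# Market -> Region relationship that a hierarchy or attribute would need.
violations = conn.execute(
    """
    SELECT Market, COUNT(DISTINCT Region) AS regions
    FROM Fact
    GROUP BY Market
    HAVING COUNT(DISTINCT Region) > 1
    ORDER BY Market
    """
).fetchall()

print(violations)  # [('Florida', 2)]
```

An empty result suggests Region could be modelled as a parent or attribute of Market; any rows returned point to members that need investigation first.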

Page 29: German Presentation2

Varying Attributes (1)

Attribute associations that change over time*

Classic example:

● Associate Markets with a Sales Manager – but responsibilities change over time

● Use ‘vanilla’ attribute –

● Manager attribute on Market

● Provides view of Markets by Manager

● Gives view of historic data based on current Manager

● Use ‘vanilla’ dimensions –

● Manager and Market as two dimensions

● Provides view of Markets by Manager

● Gives view of historic data based on historic Manager

Page 30: German Presentation2

Varying Attributes (2)

Varying attributes do not have to represent variation over time

At most abstract:

● ‘Varying’ attributes represent

● The relationship between one dimension (the ‘Base’ dimension) and another dimension (the ‘Attribute’ dimension)…

● …varying across the members of another dimension (the ‘Independent’ dimension)
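The abstract definition can be made concrete with a toy model: the attribute is looked up by the pair (base member, independent member) rather than by base member alone. The sketch below is not how Essbase stores varying attributes. The manager names, the data, and the `perspective_period` parameter are all hypothetical, chosen to show the difference between a historic view and a view from a single point in time.

```python
from collections import defaultdict

# Hypothetical data: Manager (attribute) for each Market (base) varies by
# Period (independent dimension).
manager_by_market_period = {
    ("New York", "Jan"): "Alice",
    ("New York", "Feb"): "Bob",
    ("Florida", "Jan"): "Alice",
    ("Florida", "Feb"): "Alice",
}
sales = {
    ("New York", "Jan"): 100,
    ("New York", "Feb"): 120,
    ("Florida", "Jan"): 80,
    ("Florida", "Feb"): 90,
}


def sales_by_manager(perspective_period=None):
    """Total sales per manager.

    With perspective_period=None each cell uses the manager in place for its
    own period (historic view); otherwise every cell uses the association
    from perspective_period (for example the current view).
    """
    totals = defaultdict(int)
    for (market, period), value in sales.items():
        lookup_period = perspective_period or period
        totals[manager_by_market_period[(market, lookup_period)]] += value
    return dict(totals)


print(sales_by_manager())       # historic view: Alice 270, Bob 120
print(sales_by_manager("Feb"))  # view as of Feb: Bob 220, Alice 170
```

Replacing Period with any other dimension in the lookup key gives a varying attribute whose independent dimension is not time.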

Page 31: German Presentation2

Varying Attributes (3)

‘Discrete’ vs ‘Continuous’

Independent dimensions – not necessarily sparse (unlike ‘Base’ dimension)

Multiple independent dimensions?

Use cases for non-Time independent dimensions

PERSPECTIVE

Page 32: German Presentation2

Varying Attributes (4)

Association

● Via Studio

● Via EAS

● Not directly via Load Rule

Page 33: German Presentation2

Handling SCDs (1)

Classic OLAP problem

History follows dimensional changes

● E.g. re-organize and decide to create a new ‘South East’ region containing ‘Florida’ and ‘North Carolina’

Options to handle

● Allow it

● Multiple true dimensions

● Attribute dimensions

● Varying(!) attribute dimension

● Alternate hierarchies

Page 34: German Presentation2

Handling SCDs (2)

Allow it – provides the ‘Current’ view only

Multiple true dimensions

● E.g. Region vs Market

● Provides the ‘Reality’ view only

● Drill-down depends on SUPMISSING

● Not always good news, depending on situation

● Particularly, multiple large dimensions

● May involve some ETL not previously required!

Page 35: German Presentation2

Handling SCDs (3)

Attribute dimension combined with true dimension

● Provides ‘reality’ view…

● …and ‘current’ view

Complex, but automatable even without Studio

Challenging for ad hoc reporting

Current view is an attribute query

Page 36: German Presentation2

Handling SCDs (4)

Varying Attribute Dimensions

● Combines ‘allow it’ and ‘multi-dimension’ approach

● Can provide ‘Reality’ view

● Can provide ‘Current’ view

No increase in database size

Dynamically calculated retrievals

Difficult to maintain automatically except via Studio

Page 37: German Presentation2

Handling SCDs (5)

Alternate hierarchies

● Can produce as many ‘snapshots’ as required

● Technical advantage of precalculated aggregations

● Reporting advantage of drill that works normally

● Disadvantage of increasing database size

No requirement for ‘full’ snapshot!

Avoid creating alternates containing more members than necessary

Can use queries to identify minimal set

Page 38: German Presentation2

Designing for Parallelism (1)

● Break a single calculation down into multiple ‘tasks’, then execute several of these tasks simultaneously

● 32- and 64-bit platforms currently support up to 64 and 128 simultaneous threads respectively

● Maximum number of threads can be specified. Documented suggestion is to set it to “number of CPUs” minus 1, leaving a processor free for the OS, but using the maximum does not seem to hurt

● Enable parallel calculation either via essbase.cfg file entries…

● …or in the calculation script itself

Page 39: German Presentation2

Designing for Parallelism (2)

● Example on Sample.Basic. Running script…

CALC ALL;

● …produces log message:

Calculating in serial

● Now, running…

SET CALCPARALLEL 2;

CALC ALL;

● …produces log message:

Calculating in parallel with [2] threads

Calculation task schedule [20,4,1]

Parallelizing using [1] task dimensions.

Page 40: German Presentation2

Designing for Parallelism (3)

● ‘Calculation task schedule’ provides a clue for understanding how parallel calculation is working

● Calculation task selection cannot be arbitrary if correct results are to be produced

● Look at the “Market” dimension in Sample.Basic

● There are 20 level-zero members (US states), 4 level-one members (regions) and one level-two member

Calculation task schedule [20,4,1]

Page 41: German Presentation2

Designing for Parallelism (4)

● Deliberately introducing an interdependency demonstrates the underlying intelligence

“Colorado” = “East”;

● Check log:

Calculating in parallel with [2] threads

Calculation task schedule [19,3,1,1,1]

● 19 = All level-zero members of Market, except “Colorado”, which must wait on “East”

● 3 = “East”, “West”, “South” (“Central” waits on “Colorado”)

● 1 = “Colorado”

● 1 = “Central”

● 1 = “Market”
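Both task schedules can be reproduced with a toy dependency model: a member can only be calculated one ‘wave’ after the last member it depends on. This is a sketch for building intuition, not Essbase's actual scheduler. The state lists are assumed to match the Market dimension of Sample.Basic, and the function and parameter names are hypothetical.

```python
from collections import Counter
from functools import lru_cache

# Market hierarchy (assumed to match Sample.Basic): parent -> children.
children = {
    "Market": ["East", "West", "South", "Central"],
    "East": ["New York", "Massachusetts", "Florida", "Connecticut", "New Hampshire"],
    "West": ["California", "Oregon", "Washington", "Utah", "Nevada"],
    "South": ["Texas", "Oklahoma", "Louisiana", "New Mexico"],
    "Central": ["Illinois", "Ohio", "Wisconsin", "Missouri", "Iowa", "Colorado"],
}


def task_schedule(extra_dependencies=None):
    """Count members per wave; a member runs one wave after its last dependency."""
    depends_on = {member: list(kids) for member, kids in children.items()}
    for member, sources in (extra_dependencies or {}).items():
        depends_on.setdefault(member, []).extend(sources)

    @lru_cache(maxsize=None)
    def wave(member):
        sources = depends_on.get(member, [])
        return 1 + max(map(wave, sources)) if sources else 0

    members = set(children) | {kid for kids in children.values() for kid in kids}
    counts = Counter(wave(member) for member in members)
    return [counts[i] for i in range(max(counts) + 1)]


print(task_schedule())                        # [20, 4, 1]
print(task_schedule({"Colorado": ["East"]}))  # [19, 3, 1, 1, 1]
```

The single formula turns three waves into five, and three of those five contain only one task, so a second thread has nothing to do for most of the schedule.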

Page 42: German Presentation2

Designing for Parallelism (5)

● Certain formulae can prevent parallel calculation, so a ‘minor’ outline change can have a significant calculation performance impact – see the app log

● For those members consider Dynamic Calc (assuming no stored members are dependent)

● Occasionally, disabling parallel calculation results in faster execution – large proportion of empty tasks

● Parallel calculation requires ‘Uncommitted Access’ – no warning in log if this requirement not met

● Not always desirable to have maximum parallelism – for example, Planning-type applications

● Each thread grabs its own Calculator Cache (ref. Rick Sawa’s K’scope paper)

Page 43: German Presentation2

Designing for Parallelism (6)

Modify CALCTASKDIMS if using ‘hourglass on a stick’

Get largest, ‘densest’ sparse dimensions into the TASKDIM set

Monitor via the calculation log

● Empty Tasks

Page 44: German Presentation2

Questions?

Page 45: German Presentation2

References - Documentation

‘Understanding Implied Sharing’ -

http://docs.oracle.com/cd/E26232_01/doc.11122/esb_dbag/dotattrs.html#dotattrs1042993

‘Optimizing Essbase Caches’ (includes Calculator Cache) -

http://docs.oracle.com/cd/E26232_01/doc.11122/esb_dbag/dstcache.html