Transcript of German Presentation2
#Kscope
Advanced BSO Design
Tim German
Senior Consultant, Qubix
Qubix Company Information
Founded 1987 in UK
Offices in UK, Australia, Dubai and US
Focused on delivery of EPM solutions using
Oracle Hyperion suite:
● Essbase
● Planning
● HFM
Clients – Financial, Government,
Transportation, Retail, Manufacturing…
Oracle Platinum Partner
Summary
● Deep-dive into the ‘usual suspects’ of BSO design
principles
● Dense / Sparse configuration
● Block Size
● Dynamic vs Stored
● Dimension Order
● Dimensional analysis and modeling
● Useful techniques
● Approaches to handling SCDs (slowly changing dimensions)
● Varying attributes
● Designing for maximum parallelism
Conventions and Notes
Examples in MaxL, not ESSCMD
Tested and demonstrated using 11.1.2.2
Assumes some familiarity with, and (to run
demonstration code) access to:
● Sample.Basic
● ASOSamp.Sample
Documentation reference links are 11.1.2.2
Density and Block Size (1)
Block size
● Product of stored dense dimension members,
multiplied by eight
● Size on disk likely to be considerably smaller
● Compression
● Bitmap vs Run-Length Encoding vs ZLIB (vs IVP)
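The block-size arithmetic above can be sketched as follows (an illustrative Python sketch, not Essbase code; the member counts used are hypothetical):

```python
# Illustrative sketch: uncompressed block size is the product of STORED
# members across the dense dimensions, times 8 bytes per cell.
# The member counts below are hypothetical.

def block_size_bytes(stored_dense_counts):
    """Product of stored dense dimension members, multiplied by eight."""
    size = 8  # bytes per cell (double-precision value)
    for count in stored_dense_counts:
        size *= count
    return size

# e.g. two dense dimensions with 12 and 8 stored members:
print(block_size_bytes([12, 8]))  # 12 * 8 * 8 = 768 bytes
```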
Density and Block Size (2)
● Uncompressed block
● Bitmap 11110 01110 11100 11110 01010
● Compressed Block 100,100,100,100,50,50,50,25,25,25,10,10,10,10,5,10
Jan Feb Mar Apr May
Sales 100 100 100 100 #Mi
COGS #Mi 50 50 50 #Mi
Marketing 25 25 25 #Mi #Mi
Payroll 10 10 10 10 #Mi
Misc #Mi 5 #Mi 10 #Mi
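The bitmap example above can be simulated in a few lines (an illustrative Python sketch of the idea, not Essbase internals; #Mi is represented by None):

```python
# Illustrative simulation of bitmap compression: one bit per cell records
# whether it holds data; only the non-missing values are stored.
MISSING = None  # stands in for #Mi

block = [
    [100,     100, 100,     100,     MISSING],  # Sales
    [MISSING, 50,  50,      50,      MISSING],  # COGS
    [25,      25,  25,      MISSING, MISSING],  # Marketing
    [10,      10,  10,      10,      MISSING],  # Payroll
    [MISSING, 5,   MISSING, 10,      MISSING],  # Misc
]

def bitmap_compress(rows):
    bitmap, values = [], []
    for row in rows:
        bitmap.append("".join("0" if cell is MISSING else "1" for cell in row))
        values.extend(cell for cell in row if cell is not MISSING)
    return " ".join(bitmap), values

bm, vals = bitmap_compress(block)
print(bm)    # 11110 01110 11100 11110 01010
print(vals)  # [100, 100, 100, 100, 50, 50, 50, 25, 25, 25, 10, 10, 10, 10, 5, 10]
```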
Density and Block Size (3)
● Same uncompressed block
● Run-Length Encoding 100x4,#Mi,#Mi,50x3,#Mi,#Mi,25x3,#Mi,#Mi,10x4,#Mi,#Mi,5,#Mi,10,#Mi
Jan Feb Mar Apr May
Sales 100 100 100 100 #Mi
COGS #Mi 50 50 50 #Mi
Marketing 25 25 25 #Mi #Mi
Payroll 10 10 10 10 #Mi
Misc #Mi 5 #Mi 10 #Mi
Density and Block Size (4)
● ZLIB
● High CPU demand, possibly most effective
compression
● IVP
● Not selectable – Essbase uses it automatically for very sparse
blocks
● Intelligent selection block-by-block (not clear
this is true – selection time?)
● Settings only applied when writing
Density and Block Size (5)
Size limits - DBAG vs Reality…
Measure density empirically
● Sample data set!
● Set all but one dimension to dense
● Restructure
● Record block density
● Iterate through dimensions
● Sample data set must be representative!
● If the sample was filtered on a dimension, for example, the test has little value for that dimension
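The measurement loop above can be sketched as follows (an illustrative Python simulation, not Essbase code: it plays out "make one dimension sparse, compute average block density" over a made-up sample; in practice each iteration would be a restructure against the real database):

```python
# Illustrative sketch of measuring density empirically: for each candidate
# sparse dimension, treat the others as dense, group sample records into
# blocks, and average the block densities. Dimension names and sample data
# are made up.
records = {
    ("Cola", "NY", "Jan"): 100, ("Cola", "NY", "Feb"): 110,
    ("Cola", "CA", "Jan"): 90,
    ("Beer", "NY", "Jan"): 50,  ("Beer", "NY", "Feb"): 45,
}
dims = {
    "Product": ["Cola", "Beer"],
    "Market":  ["NY", "CA"],
    "Month":   ["Jan", "Feb", "Mar"],
}
names = list(dims)

def avg_density(sparse_dim):
    dense = [d for d in names if d != sparse_dim]
    cells_per_block = 1
    for d in dense:
        cells_per_block *= len(dims[d])
    filled = {}  # block key -> count of non-missing cells
    for key in records:
        block = key[names.index(sparse_dim)]
        filled[block] = filled.get(block, 0) + 1
    # only blocks that actually hold data count (empty blocks never exist)
    return sum(n / cells_per_block for n in filled.values()) / len(filled)

for d in names:
    print(d, "sparse ->", round(avg_density(d), 3))
```

With this (tiny, unrepresentative!) sample, making Month sparse gives the densest blocks, which is exactly the kind of comparison the iteration surfaces.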
Dimension Order (1)
Classic ‘old-school’ recommendation –
Hourglass layout
● Largest Dense Dimension
● Smallest Dense Dimension
● Smallest Sparse Dimension
● Largest Sparse Dimension
Dense dimension order primarily impacts
compression / storage
Sparse dimension order primarily impacts
calculation / parallelism
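The hourglass rule can be expressed as a simple sort (an illustrative Python sketch; the dimension names and sizes are hypothetical):

```python
# Illustrative sketch of the hourglass layout: dense dimensions sorted
# largest-to-smallest, then sparse dimensions smallest-to-largest.
# Names and member counts are hypothetical.
def hourglass(dims):
    """dims: list of (name, member_count, is_dense) tuples."""
    dense  = sorted((d for d in dims if d[2]), key=lambda d: -d[1])
    sparse = sorted((d for d in dims if not d[2]), key=lambda d: d[1])
    return [d[0] for d in dense + sparse]

dims = [("Measures", 40, True), ("Year", 17, True),
        ("Scenario", 4, True), ("Product", 22, False),
        ("Market", 25, False), ("Customer", 1000, False)]
print(hourglass(dims))
# ['Measures', 'Year', 'Scenario', 'Product', 'Market', 'Customer']
```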
Dimension Order (2)
Calculator cache
● Contains a bitmap which tracks which blocks exist
during a sparse calculation
● Can’t track ALL blocks in most databases (number
of bits would equal number of potential blocks!)
● Hence ‘bitmap’ vs ‘anchor’ dimensions
● Essbase has a number of options depending on the
size of the calculator cache
Dimension Order (3)
Essbase ‘prefers’ the option of creating multiple
bitmaps with a single ‘anchor’ dimension
Modify the Sample.Basic outline to make
Scenario sparse (for demonstration purposes)
● Scenario – 2 members / 0 max dependent parents
● Product – 19 members / 2 max dependent parents
● Market – 25 members / 1 max dependent parents
Note: Max dependent parents =
● Largest number of dependent (i.e. where children
aggregate into) parents of any single child
Dimension Order (4)
Bitmap for stored combinations
Bitmap size
Size in bytes = 2 x 19 / 8 = 4.75, rounded up to 5 (and note that the
minimum is 4 – but this is almost never a real problem)
Number of bitmaps
Anchor max. dependent parents + 2 = 1 + 2 = 3
Total cache size required = 3 x 5 = 15 bytes
Product 100 100-10 …
Actual 0 0 1 …
Budget 0 0 1 …
… … … … …
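The sizing arithmetic above can be sketched as follows (an illustrative Python sketch; the formula follows the slide, including the documented 4-byte minimum):

```python
# Illustrative sketch of multiple-bitmap calculator cache sizing: the
# bitmap covers the sparse dimensions other than the anchor (one bit per
# stored member combination), and there are
# (anchor max dependent parents + 2) bitmaps.
import math

def calc_cache_bytes(bitmap_dim_sizes, anchor_max_dependent_parents):
    bits = 1
    for size in bitmap_dim_sizes:
        bits *= size
    bitmap_bytes = max(4, math.ceil(bits / 8))  # documented 4-byte minimum
    n_bitmaps = anchor_max_dependent_parents + 2
    return bitmap_bytes, n_bitmaps, bitmap_bytes * n_bitmaps

# Scenario (2 members) and Product (19 members) in the bitmap, Market as
# the anchor dimension (1 max dependent parent):
print(calc_cache_bytes([2, 19], 1))  # (5, 3, 15)
```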
Dimension Order (5)
If not possible, next-best option is the use of a
single bitmap with a single anchoring dimension
Total cache size required
= Bitmap size = 5 bytes (note that this is smaller)
Dimension Order (6)
If not possible, the next-best option is the use of a
single bitmap with multiple anchoring
dimensions
This happens when there are too many sparse
combinations before the last dimension to even
create one full bitmap
Can tell which option is selected from the log:
Calculator cache with multiple bitmaps for [Product]
Dimension Order (7)
Later recommendation – Hourglass ‘on a stick’
● Non-consolidating sparse dimensions
● Very often we are FIXed on one member at a time
Parallelism (more later…)
Storage Properties
Classic recommendations
● Upper-level dense dynamic
● Upper-level sparse stored
● Why the disconnect?
What happens when a block is ‘touched’ for
read / write / calculation
General principle - choose more CPU activity
ahead of more IO
Dimensional Analysis (Dimensions 1)
‘Real’ or ‘Base’ dimensions
● For a single member of another dimension, data
may exist at any member of a ‘real’ dimension
● Dimensions are usually hierarchic – i.e. they contain
upper-level nodes which represent aggregations of
lower-level nodes
● Does not always mean additive
● Remember to use label-only when upper-level nodes are for
‘navigation’ only
● Implied sharing
Dimensional Analysis (Dimensions 2)
Attribute dimensions
● For a single member of a ‘base’ dimension, data
always exists at the same member of an attribute
dimension*
● Attribute to Base relationship, One to Many
● IOW, there is a ‘pre-existing’ relationship between
the two dimensions
● Remember that Attributes can only be on Sparse
dimensions
● Attribute queries are dynamic
● Performance – blocks to attribute member
Dimensional Analysis (Dimensions 3)
Attribute dimensions Cont.
● Distinguish attributes from hierarchies!
● ‘Region’, ‘State’, ‘City’ might all exist in Fact data
● Should ‘Region’ be an attribute on ‘City’?
● One to Many criterion met
● ‘Population’ might be an attribute on ‘City’
● Consider whether attribute queries at multiple levels of base
dimension are required
● Revised definition of Attribute dimensions
Dimensional Analysis (Dimensions 4)
Example
Sales Profit COGS Expenses
Small East 123 456 123 456
Medium East 234 678 234 678
Small Central 123 456 123 456
Medium Central 234 678 234 678
Dimensional Analysis (Alternates 1)
Different roll-up of same members
● Level-zero or above
● Alternates vs Attributes / Dimensions – Cross-Tab
possible
● Technical advantages to alternates
Sales, Jan, NY, Actual

         Colas   Root Beer   Cream Soda   Fruit Soda   Diet Drinks
Bottle   123     456         123          456          123
Can      234     678         234          678          234
Dimensional Analysis (Irrelevance 1)
Interdimensional irrelevance
● BSO is a bad place to combine multiple sets of data
which require different dimensionality
● Example:
● Requirement for a Sales breakdown by Product
● Requirement for a Payroll Expense breakdown by
Employee
● Wrong Answer : Add an Employee dimension to
your existing ‘Sample.Basic’ style cube
Dimensional Analysis (Irrelevance 2)
● Sales data is loaded to ‘No Employee’ member
Sales, Jan, NY

         100-10   100-20   100-30   100-40   No_Prod
Emp1     #Mi      #Mi      #Mi      #Mi      #Mi
Emp2     #Mi      #Mi      #Mi      #Mi      #Mi
Emp3     #Mi      #Mi      #Mi      #Mi      #Mi
No_Emp   123      234      345      456      #Mi
Dimensional Analysis (Irrelevance 3)
● Payroll expense data is loaded to ‘No Product’
member
● Net result
● Guaranteed greater block sparsity (if one or both
dimensions are sparse)
● Guaranteed lower block density (if one or both dimensions
are dense)
Payroll, Jan, NY

         100-10   100-20   100-30   100-40   No_Prod
Emp1     #Mi      #Mi      #Mi      #Mi      123
Emp2     #Mi      #Mi      #Mi      #Mi      234
Emp3     #Mi      #Mi      #Mi      #Mi      345
No_Emp   #Mi      #Mi      #Mi      #Mi      #Mi
Dimensional Analysis (Irrelevance 4)
● Better Answer – build two applications
● In some cases, may legitimately want to show data
with different dimensionality in the same reports:
● Sales and R&D
● R&D and Sales available by Product
● R&D cost not available by Market
● Harder Answers -
● Load data to both cubes
● Transparent partitioning
● @XWRITE
● @XREF
● Last two generally in scenarios with smaller datasets –
rates, drivers
Dimensional Analysis (Other Issues)
Consider reporting requirements
Classic Example: Period
● Years
● Periods
● Days
Some requirements (e.g. retail) may include
reporting ‘Year on Year’ same Periods, Weeks
or Days
Different dimensional structures make
significant difference in how easily these are
generated
Dimensional Analysis (Conclusion)
Common theme
● It’s difficult to design a database if you aren’t familiar
with (or don’t investigate) the dataset
● It’s difficult to design a database if you aren’t familiar
with (or don’t investigate) reporting requirements
● With unfamiliar dataset, can gain some insights
quickly using relational tools
● Can use queries to identify one-to-many / many-to-
one relationships in Fact data:

SELECT Market, COUNT(DISTINCT Region)
FROM Fact
GROUP BY Market
● No equivalent for reporting users
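A runnable version of that relationship check (an illustrative Python sketch using sqlite3 with a made-up Fact table; COUNT(DISTINCT Region) is used so repeated fact rows do not mask the relationship):

```python
# Illustrative check: a Market that maps to more than one distinct Region
# in the Fact data breaks the many-to-one (attribute-style) relationship.
# Table layout and data are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Fact (Market TEXT, Region TEXT, Sales REAL)")
conn.executemany("INSERT INTO Fact VALUES (?,?,?)", [
    ("NY", "East", 100), ("NY", "East", 110),
    ("CA", "West", 90),  ("CA", "Central", 50),  # CA maps to two regions!
])
rows = conn.execute("""
    SELECT Market, COUNT(DISTINCT Region)
    FROM Fact GROUP BY Market
    HAVING COUNT(DISTINCT Region) > 1
""").fetchall()
print(rows)  # [('CA', 2)] -> CA violates the one-to-many criterion
```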
Varying Attributes (1)
Attribute associations that change over time*
Classic example:
● Associate Markets with a Sales Manager – but
responsibilities change over time
● Use ‘vanilla’ attribute –
● Manager attribute on Market
● Provides view of Markets by Manager
● Gives view of historic data based on current Manager
● Use ‘vanilla’ dimensions –
● Manager and Market as two dimensions
● Provides view of Markets by Manager
● Gives view of historic data based on historic Manager
Varying Attributes (2)
Do not have to represent attributes varying over
time
At most abstract:
● ‘Varying’ attributes represent
● The relationship between one dimension (the ‘Base’
dimension) and another dimension (the ‘Attribute’
dimension)…
● …varying across the members of another dimension (the
‘Independent’ dimension)
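The abstract definition above can be modeled directly (an illustrative Python sketch; member names are hypothetical): the Base-to-Attribute association is a function of both the Base member and the Independent member.

```python
# Illustrative model of a varying attribute: the attribute associated with
# a Base member varies across members of an Independent dimension.
# All member names below are hypothetical.
association = {
    # (Base member, Independent member) -> Attribute member
    ("Florida", "FY12"): "Smith",
    ("Florida", "FY13"): "Jones",   # responsibility changed over time
    ("NewYork", "FY12"): "Smith",
    ("NewYork", "FY13"): "Smith",
}

def manager(market, year):
    return association[(market, year)]

print(manager("Florida", "FY12"))  # Smith (the 'reality' view for FY12)
print(manager("Florida", "FY13"))  # Jones (the 'current' view, if FY13 is current)
```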
Varying Attributes (3)
‘Discrete’ vs ‘Continuous’
Independent dimensions – not necessarily
sparse (unlike the ‘Base’ dimension)
Multiple independent dimensions?
Use cases for non-Time-independent
dimensions
PERSPECTIVE
Varying Attributes (4)
Association
● Via Studio
● Via EAS
● Not directly via Load Rule
Handling SCDs (1)
Classic OLAP problem
History follows dimensional changes
● E.g. re-organize and decide to create a new ‘South
East’ region containing ‘Florida’ and ‘North Carolina’
Options to handle
● Allow it
● Multiple true dimensions
● Attribute dimensions
● Varying(!) attribute dimension
● Alternate hierarchies
Handling SCDs (2)
Allow it – provides the ‘Current’ view only
Multiple true dimensions
● E.g. Region vs Market
● Provides the ‘Reality’ view only
● Drill-down depends on SUPMISSING
● Not always good news, depending on situation
● Particularly, multiple large dimensions
● May involve some ETL not previously required!
Handling SCDs (3)
Attribute dimension combined with true
dimension
● Provides ‘reality’ view…
● …and ‘current’ view
Complex, but automatable even without Studio
Challenging for ad hoc reporting
Current view is an attribute query
Handling SCDs (4)
Varying Attribute Dimensions
● Combines ‘allow it’ and ‘multi-dimension’ approach
● Can provide ‘Reality’ view
● Can provide ‘Current’ view
No increase in database size
Dynamically calculated retrievals
Difficult to maintain automatically except via
Studio
Handling SCDs (5)
Alternate hierarchies
● Can produce as many ‘snapshots’ as required
● Technical advantage of pre-calculated
aggregations
● Reporting advantage of drill that works normally
● Disadvantage of increasing database size
No requirement for ‘full’ snapshot!
Avoid creating alternates containing more
members than necessary
Can use queries to identify minimal set
Designing for Parallelism (1)
● Break a single calculation down into multiple ‘tasks’
then execute several of these tasks simultaneously
● 32- and 64-bit platforms currently support up to 64
and 128 simultaneous threads respectively
● Maximum number of threads can be specified.
Documented suggestion is to set (number of CPUs – 1),
leaving a processor free for the OS, but using the
maximum does not seem to hurt
● Enable parallel calculation either via essbase.cfg file
entries…
● …or in the calculation script itself
Designing for Parallelism (2)
● Example on Sample.Basic. Running script:

CALC ALL;

● …produces log message:

Calculating in serial

● Now, running:

SET CALCPARALLEL 2;
CALC ALL;

● …produces log messages:

Calculating in parallel with [2] threads
Calculation task schedule [20,4,1]
Parallelizing using [1] task dimensions.
Designing for Parallelism (3)
● The ‘Calculation task schedule’ message provides a clue for
understanding how parallel calculation is working:

Calculation task schedule [20,4,1]

● Calculation task selection cannot be arbitrary if
correct results are to be produced
● Look at the “Market” dimension in Sample.Basic
● There are 20 level-zero members (US states), 4
level-one members (regions) and one level-two
member (“Market” itself)
Designing for Parallelism (4)
● Deliberately introducing an interdependency
demonstrates the underlying intelligence
● Check log:
● 19 = All level-zero members of Market, except
“Colorado”, which must wait on “East”
● 3 = “East”, “West” “South” (“Central” waits on ‘CO’)
● 1 = “Colorado”
● 1 = “Central”
● 1 = “Market”
“Colorado” = “East”;
Calculating in parallel with [2] threads
Calculation task schedule [19,3,1,1,1]
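The task-wave grouping above can be reconstructed as follows (an illustrative Python sketch, not Essbase internals; region sizes match Sample.Basic's Market dimension, with placeholder state names except "Colorado"):

```python
# Illustrative reconstruction of the task schedule: members are grouped
# into waves, where each member waits until everything it depends on has
# been calculated. State names other than Colorado are placeholders.
children = {
    "East":    [f"E{i}" for i in range(5)],
    "West":    [f"W{i}" for i in range(5)],
    "South":   [f"S{i}" for i in range(4)],
    "Central": [f"C{i}" for i in range(5)] + ["Colorado"],
}
deps = {parent: list(kids) for parent, kids in children.items()}
deps["Market"] = list(children)          # Market aggregates the regions
deps["Colorado"] = ["East"]              # the introduced formula dependency

def wave(member):
    if member not in deps:
        return 0  # no dependencies: calculable in the first wave
    return 1 + max(wave(d) for d in deps[member])

members = [m for kids in children.values() for m in kids] + list(deps)
members = list(dict.fromkeys(members))   # dedupe (Colorado appears twice)
schedule = {}
for m in members:
    schedule.setdefault(wave(m), []).append(m)
print([len(schedule[w]) for w in sorted(schedule)])  # [19, 3, 1, 1, 1]
```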
Designing for Parallelism (5)
● Certain formulae can prevent parallel calculation, so
a ‘minor’ outline change can have a significant
calculation performance impact – see the app log
● For those members consider Dynamic Calc
(assuming no stored members are dependent)
● Occasionally, disabling parallel calculation results in
faster execution – large proportion of empty tasks
● Parallel calculation requires ‘Uncommitted Access’ –
no warning in log if this requirement not met
● Not always desirable to have maximum parallelism –
for example, Planning-type applications
● Each thread grabs its own Calculator Cache (ref.
Rick Sawa’s Kscope paper)
Designing for Parallelism (6)
Modify CALCTASKDIMS if using ‘hourglass on
a stick’
Get largest, ‘densest’ sparse dimensions into
the TASKDIM set
Monitor via the calculation log
● Empty Tasks
Questions?
References - Documentation
‘Understanding Implied Sharing’ -
http://docs.oracle.com/cd/E26232_01/doc.11122/esb_dbag/dotattrs.html#dotattrs1042993
‘Optimizing Essbase Caches’ (includes Calculator Cache) -
http://docs.oracle.com/cd/E26232_01/doc.11122/esb_dbag/dstcache.html