Modeling and Querying Multidimensional Data Sources in Siebel Analytics
Kazi A. Zaman, Donovan A.
© 2005 Siebel Systems, Inc. Confidential. 2
Structure of Talk
Challenges of federating relational and multidimensional data sources
Overview of Multidimensional data sources
Overview of Siebel Analytics Architecture
Our approach to solving the problem
Issues with multi vendor support
Conclusions and Future Work
Why federating multidimensional sources is important
Enterprises have a multitude of data sources
Not always consolidated in a single data warehouse
Cubes (OLAP systems) are best suited for certain applications: e.g. budgeting
Many important business questions require information from both relational and multidimensional systems
  Budgets vs. actuals
  Real-time reporting: HR system data integrated with sales pipeline data
Multidimensional Data Sources
Highly aggregated view of data, primarily used for analysis
Provides a dimensional view of data
Prominent examples: Microsoft Analysis Services, Hyperion, SAP/BW
Cubes: Storage mechanism not necessarily MOLAP
Query Language: Vendor specific interfaces, MDX
Access Mechanisms: Vendor specific Interfaces (e.g. BAPI), ODBO, XMLA
Key Differences from Relational Systems
Rich metadata exposed: Dimensions, hierarchies, levels, measures
Specialized language constructs for manipulating this metadata: Ancestors(), Descendants()
Query results are multidimensional datasets, not rowsets
Ability to specify complex multi-pass calculations
Special functionality for time series calculations
Siebel Analytics Server
Analytics Server is a federated system
  Supports rich data sources: Relational (DB2, Oracle, SQL Server, Teradata), OLAP (Analysis Services, SAP/BW), XML
  Supports rich schemas (OLTP, DW)
Executes queries specified against a logical business model containing data warehousing constructs
Analytics Server translates logical queries into queries against one or more back-end data sources
Design goal: push as much processing as possible to the back-end data sources
Carries out post processing on joined query results
Does not have its own storage layer
Query Processing Overview
[Figure: query processing pipeline and metadata repository]
Processing stages: Navigation; Optimizer/Compiler (Rewrite Rules); Code Generator
Repository (metadata) layers:
  Presentation Layer
  Business Model & Mapping Layer: Dimensions, Hierarchies, Measures, Alternative Sources, Partitioning, Aggregation Rules, Time Series
  Physical Layer: Security, Connections, DB Features, Schema
Stage annotations:
  Generated access plan and initial SQL
  Optimized SQL based on target databases and DB Features tables; also performs optimization to improve efficiency
  Generate physical SQL for external sources and an internal plan for operations that must be executed in the server, including parallelization, sorting, etc.
Requirements for federating multidimensional sources
Model multidimensional data sources in physical layer of metadata
Mark fragments of a federated query plan for execution at a multidimensional source based on source capabilities
Generate MDX from the relational query plan fragment (SQL to MDX translation)
Ability to convert a multidimensional result set into a two-dimensional rowset
Challenges
SQL has a relational model, MDX a multidimensional one
  We convert the multidimensional model to relational
  This loses the full power of the multidimensional model
SQL is open world: Country = "USA" simply returns no rows if nothing matches
MDX is closed world: Geography.[USA] must name an existing member
  If there is no such member, the query will fail
Metadata Modeling: Cubetables
Cube with 2 hierarchies and 2 measures
Time: Year -> Quarter -> Month
Geography: State -> City
Measures: profit, sales
Cube Table T
(Year, Quarter, Month, State, City, Profit, Sales)
Hierarchy, level, and aggregation rule information is preserved
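A minimal Python sketch of the cube table idea, under stated assumptions: the cube above is flattened into one column list while hierarchy, level, and aggregation rule information is kept per column. The names CubeColumn, CubeTable, and flatten_cube are hypothetical and are not Siebel Analytics metadata objects; the SUM rule for Profit and Sales is also assumed for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CubeColumn:
    name: str
    hierarchy: Optional[str] = None    # set for dimension (level) columns
    level_depth: Optional[int] = None  # 0 = top level of the hierarchy
    agg_rule: Optional[str] = None     # set for measure columns


@dataclass
class CubeTable:
    name: str
    columns: List[CubeColumn]

    def column_names(self) -> List[str]:
        return [c.name for c in self.columns]


def flatten_cube(name: str,
                 hierarchies: Dict[str, List[str]],
                 measures: Dict[str, str]) -> CubeTable:
    """Turn hierarchies and measures into a flat, relational-looking column list."""
    cols: List[CubeColumn] = []
    for hier, levels in hierarchies.items():
        for depth, level in enumerate(levels):
            cols.append(CubeColumn(level, hierarchy=hier, level_depth=depth))
    for measure, rule in measures.items():
        cols.append(CubeColumn(measure, agg_rule=rule))
    return CubeTable(name, cols)


# The example cube from the slide; aggregation rules are assumed to be SUM.
T = flatten_cube(
    "T",
    {"Time": ["Year", "Quarter", "Month"], "Geography": ["State", "City"]},
    {"Profit": "SUM", "Sales": "SUM"},
)
print(T.column_names())
```

Keeping the hierarchy name and depth on each column is what would let a later code generation step decide, for example, that City is a descendant level of State.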
Rowset Creation from Multidimensional Result Sets
MDX result sets consist of dimensional members on axes and measures in delimited cells.
SELECT
{[Measures].[Sales]} on COLUMNS,
{Crossjoin({[Year].Members},
{[Products].[Soda].Members})} on ROWS
FROM [Sales]
Generate only two-dimensional queries
Measures on COLUMNS, dimensions on ROWS
              Sales
1997   Coke   100
1998   Coke   200
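The flattening step above can be sketched in Python as follows. This is an illustration only: it assumes the two-axis MDX result has already been parsed into a list of row-axis member tuples plus a grid of cell values (one inner list per row tuple, one value per measure on COLUMNS). That in-memory shape and the function name to_rowset are hypothetical, not an XMLA or ODBO structure.

```python
from typing import Any, List, Sequence, Tuple


def to_rowset(row_tuples: Sequence[Sequence[str]],
              cells: Sequence[Sequence[Any]]) -> List[Tuple[Any, ...]]:
    """Append each row's measure cells to its row-axis members."""
    if len(row_tuples) != len(cells):
        raise ValueError("one row of cells is expected per row-axis tuple")
    return [tuple(members) + tuple(values)
            for members, values in zip(row_tuples, cells)]


# Result of the example query: (Year, Product) on ROWS, Sales on COLUMNS.
row_axis = [("1997", "Coke"), ("1998", "Coke")]
cells = [[100], [200]]
rowset = to_rowset(row_axis, cells)
print(rowset)
```

Restricting generation to measures on COLUMNS and dimensions on ROWS is what makes this conversion a simple concatenation per row.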
Transforming the Intermediate Rowset
Intermediate rowset may need further transformation
  Number of columns in rowset may differ from number of requested columns
  Ordering of columns in rowset may differ from requested order
Protocols for intermediate rowset transformation
  A simple example protocol maps intermediate column indexes to columns in the final rowset
    (1, 2, 3): select year, product, sum(sales) from T group by year, product
    (3, 2, 1): select sum(sales), product, year from T group by year, product
  Different protocols for different data sources / MDX generation algorithms
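The simple index-mapping protocol can be sketched in Python as below. The function name apply_protocol and the tuple-based rowset are assumptions for illustration; the 1-based indexes follow the (1, 2, 3) and (3, 2, 1) examples above.

```python
from typing import Any, List, Sequence, Tuple


def apply_protocol(protocol: Sequence[int],
                   intermediate_rows: Sequence[Sequence[Any]]) -> List[Tuple[Any, ...]]:
    """Build the final rowset: final column j takes intermediate column protocol[j].

    Indexes are 1-based. A protocol shorter than the intermediate row
    drops the columns it does not mention.
    """
    return [tuple(row[i - 1] for i in protocol) for row in intermediate_rows]


# Intermediate rowset in (year, product, sales) order.
intermediate = [("1997", "Coke", 100), ("1998", "Coke", 200)]

same_order = apply_protocol((1, 2, 3), intermediate)
reordered = apply_protocol((3, 2, 1), intermediate)
dropped = apply_protocol((1, 3), intermediate)
print(same_order, reordered, dropped)
```

The third call shows the other case named above, where the intermediate rowset carries more columns than the query requested.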
MDX Code Generation
Effectively SQL to MDX translation along with rowset creation protocol data
Makes use of cubetable specific metadata – hierarchies and levels
Different code generation strategies for different SQL templates
Support as wide a set of SQL templates as possible
Generate efficient MDX, because multidimensional data sources lack mature optimizers
MDX Generation Examples
SELECT c1, c2…, aggr(m1), aggr(m2)
FROM Table
WHERE <conditions>
GROUP BY c1, c2….
HAVING <conditions>
Goal: translate the entire SQL template to efficient MDX
Metadata information: T (Store Country, Store State, Year, Quarter, Unit Sales)
Aggregation rule: SUM
Multiple dimensions plus measure with matching aggregate rule
SQL
Select "Store Country", Year, SUM("Unit Sales")
From T
Group By "Store Country", Year
MDX
Select
{[Unit Sales]} on columns,
{ nonemptycrossjoin([Store Country].members, [Year].members)} on rows
From [Sales]
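A rough Python sketch of code generation for this simplest template (GROUP BY columns plus one measure whose SQL aggregate matches the cube's aggregation rule, no predicates). The function generate_mdx and its parameters are hypothetical and only reproduce the shape of the MDX above; they are not the Analytics Server code generator.

```python
from typing import List


def bracket(name: str) -> str:
    """Wrap an identifier in MDX square brackets."""
    return "[" + name + "]"


def generate_mdx(group_by_levels: List[str], measure: str, cube: str) -> str:
    """Emit MDX with the measure on columns and the level members on rows."""
    if not group_by_levels:
        raise ValueError("at least one GROUP BY level is required")
    members = [bracket(level) + ".members" for level in group_by_levels]
    if len(members) == 1:
        rows = members[0]
    else:
        # Several levels: combine them, skipping empty combinations.
        rows = "nonemptycrossjoin(" + ", ".join(members) + ")"
    return "\n".join([
        "select",
        "  {" + bracket(measure) + "} on columns,",
        "  {" + rows + "} on rows",
        "from " + bracket(cube),
    ])


mdx = generate_mdx(["Store Country", "Year"], "Unit Sales", "Sales")
print(mdx)
```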
Measure with non-matching aggregate rule
Select "Store Country", Year, AVG("Unit Sales")
From T
Group By "Store Country", Year
with
set [A] as '{[Store Country].members}'
set [B] as '{[Year].members}'
set [C] as 'nonemptycrossjoin({[A]},{[B]})'
member [measures].[MS1] as 'AVG(nonemptycrossjoin(Descendants(Store.currentmember,[Store State]), Descendants(Time.currentmember,[Quarter]) ),[Unit Sales])'
select
{[MS1]} on columns,
{[C]} on rows
from [Sales]
Matching aggregate rule, predicate refers to GROUP BY columns
Select "Store Country", Year, SUM("Unit Sales")
From T
Where "Store Country" In ('USA', 'India') AND Year = '1997'
Group By "Store Country", Year
with
set [A] as '{filter([Store Country].members, Store.currentmember.name = "USA" OR Store.currentmember.name = "India")}'
set [B] as '{filter([Year].members, time.currentmember.name = "1997") }'
set [C] as 'nonemptycrossjoin({[A]},{[B]})'
select
{[Unit Sales]} on columns,
{[C]} on rows
from [Sales]
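The predicate handling in this example can be sketched in Python: an equality or IN-list predicate on a GROUP BY column becomes a filter() over that level's members, comparing member names. The function filter_set is a hypothetical helper written to reproduce the set expressions above; quoting of values that contain double quotes is deliberately not handled.

```python
from typing import Sequence


def filter_set(level: str, dimension: str, values: Sequence[str]) -> str:
    """Build an MDX set that keeps only members of `level` named in `values`."""
    if not values:
        raise ValueError("at least one value is required")
    for v in values:
        if '"' in v:
            # Escaping is out of scope for this sketch.
            raise ValueError("values containing double quotes are not supported")
    tests = " OR ".join(
        f'{dimension}.currentmember.name = "{v}"' for v in values
    )
    return f"{{filter([{level}].members, {tests})}}"


set_a = filter_set("Store Country", "Store", ["USA", "India"])
set_b = filter_set("Year", "time", ["1997"])
print(set_a)
print(set_b)
```

Filtering by name rather than referencing a member directly, such as [Store Country].[USA], is consistent with the closed-world concern raised earlier: a name that matches nothing yields an empty set instead of a failed query.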
Multiple levels of a dimension plus measure with matching aggregate rule, predicates refer to both levels
Select "Store Country", "Store State", SUM("Unit Sales")
From T
Where "Store Country" = 'USA' AND "Store State" In ('CA', 'OR')
Group By "Store Country", "Store State"
with
member [measures].[CountryAnc] as 'ancestor(Store.Currentmember,[Store Country]).name'
set [A] as 'filter({[Store Country].members},Store.currentmember.name = "USA")'
set [B] as 'Filter( Generate({[A]},Descendants([Store].currentmember,[Store].[Store State])), [Store].currentmember.name= "CA" OR [Store].currentmember.name= "OR" )'
select
{[Measures].[CountryAnc], [Measures].[Unit Sales]} on columns,
{[B]} on rows
From
[Sales]
Multiple levels of a dimension plus measure with matching aggregate rule, predicate refers to columns not in project list
Select "Store Country", "Store State", SUM("Unit Sales")
From T
Where Year = '1997'
Group By "Store Country", "Store State"
Slicer used:
with
member [measures].[CountryAnc] as 'ancestor(Store.Currentmember,[Store Country]).name'
set [A] as '{[Store State].members}'
select
{[Measures].[CountryAnc],[Unit Sales]} on columns,
{ [A]} on rows
From
[Sales]
Where ([1997])
Alternative without a slicer (the Year predicate is applied inside a calculated member):
with
member [measures].[CountryAnc] as 'ancestor(Store.Currentmember,[Store Country]).name'
member [Measures].[YearAnc] as 'ancestor([Time].Currentmember,[Time].[Year]).name'
set [A] as '{[Store State].members}'
set [B] as '{[Time].[Year].members} '
member [measures].[MS1] as 'SUM(filter(nonemptycrossjoin(Descendants(Store.currentmember,[Store State]), {[B]} ), [Time].currentmember.name="1997"),[Unit Sales])'
select
{[Measures].[CountryAnc],
[Measures].[MS1]} on columns,
{[A]} on rows
From
[Sales]
Dimension plus measure with matching aggregate rule with HAVING clause
Select "Store Country", SUM("Unit Sales")
From T
Group By "Store Country"
Having SUM("Unit Sales") > 10000
select
{[Unit Sales]} on columns,
Filter({ [Store Country].members}, 10000 < [Unit Sales])
on rows
from
[Sales]
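As a hedged illustration of the HAVING translation, the Python sketch below turns a comparison on the aggregated measure into an MDX Filter over the row set. The helper having_to_filter is hypothetical; it writes the measure on the left of the comparison, which is equivalent to the 10000 < [Unit Sales] form used above.

```python
ALLOWED_OPS = {">", "<", ">=", "<=", "=", "<>"}


def having_to_filter(level: str, measure: str, op: str, threshold: float) -> str:
    """Translate `HAVING agg(measure) op threshold` into an MDX Filter expression."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported comparison operator: {op}")
    return f"Filter({{[{level}].members}}, [{measure}] {op} {threshold})"


rows_expr = having_to_filter("Store Country", "Unit Sales", ">", 10000)
print(rows_expr)
```

This only applies when the measure's aggregation rule matches the SQL aggregate, as in the slide title; otherwise the comparison would have to refer to a calculated member.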
Multiple Vendor Support
MDX and XMLA support varies widely from vendor to vendor
Caption names vs Unique Names
Classes of hierarchies supported
Treatment of Properties
Using ancestor within a calculated member
Metadata returned: structure, cardinality of levels
Captions vs Member Names
Caption: USA; Member Name: PG2003012
MDX queries use the member name, not the caption
Incoming SQL uses the caption, not the member name
Member name is 7-bit ASCII
Need to convert between captions and member names
Solution: cache mappings between member names and captions on demand
Affects the class of predicates that can be pushed down (no more >, <)
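The on-demand mapping cache can be sketched in Python as follows. Everything here is an assumption for illustration: the class CaptionCache, the fetch_members callback (standing in for a metadata request to the source), and the second member name PG2003013 are invented; only the USA / PG2003012 pair comes from the slide.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical callback type: level name -> (member_name, caption) pairs.
FetchMembers = Callable[[str], Iterable[Tuple[str, str]]]


class CaptionCache:
    """Caches caption <-> member name mappings per level, loaded on first use."""

    def __init__(self, fetch_members: FetchMembers) -> None:
        self._fetch = fetch_members
        self._by_level: Dict[str, Tuple[Dict[str, str], Dict[str, str]]] = {}
        self.fetch_count = 0  # how many times the source was queried

    def _load(self, level: str) -> Tuple[Dict[str, str], Dict[str, str]]:
        if level not in self._by_level:
            self.fetch_count += 1
            pairs = list(self._fetch(level))
            to_name = {caption: name for name, caption in pairs}
            to_caption = {name: caption for name, caption in pairs}
            self._by_level[level] = (to_name, to_caption)
        return self._by_level[level]

    def member_name(self, level: str, caption: str) -> Optional[str]:
        """Caption from incoming SQL -> member name for MDX (None if unknown)."""
        return self._load(level)[0].get(caption)

    def caption(self, level: str, member_name: str) -> Optional[str]:
        """Member name from an MDX result -> caption for the rowset."""
        return self._load(level)[1].get(member_name)


def fake_source(level: str) -> Iterable[Tuple[str, str]]:
    # Stand-in for a metadata call to the multidimensional source.
    return [("PG2003012", "USA"), ("PG2003013", "India")]


cache = CaptionCache(fake_source)
name = cache.member_name("Country", "USA")
back = cache.caption("Country", "PG2003012")
print(name, back, cache.fetch_count)
```

An equality predicate on a caption can be rewritten through such a lookup, but member names need not sort in the same order as captions, which is one reading of why range predicates are no longer pushed.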
Conclusions and Future Work
Ability to handle multidimensional and relational data in a single framework
Generate efficient MDX queries for best performance
Varying vendor support requires differing MDX code generation and intermediate rowset processing strategies
Support for larger number of vendors, wider class of SQL, parent-child hierarchies