Categories of OLAP - ir.nuk.edu.tw08]CategoriesofOLAP.pdf1 Categories of OLAP Categories of OLAP...

55
1 Categories of OLAP Categories of OLAP tools MOLAP, ROLAP, HOLAP, DOLAP OLAP extension to SQL ROLLUP, CUBE, RANK() OVER, Windowing Storage Comparison of MOLAP & ROLAP

Transcript of Categories of OLAP - ir.nuk.edu.tw08]CategoriesofOLAP.pdf1 Categories of OLAP Categories of OLAP...

1

Categories of OLAP

Categories of OLAP toolsMOLAP, ROLAP, HOLAP, DOLAP

OLAP extension to SQLROLLUP, CUBE, RANK() OVER, Windowing

Storage Comparison of MOLAP & ROLAP

2

Categories of OLAP ToolsOLAP tools are categorized according to the architecture used to store and process multi-dimensional data. There are four main categories:

Multi-dimensional OLAP (MOLAP)Relational OLAP (ROLAP)Hybrid OLAP (HOLAP)Desktop OLAP (DOLAP)

3

Multi-dimensional OLAP (MOLAP)

Use specialized data structures and multi-dimensional Database Management Systems (MDDBMSs) to organize, navigate, and analyze data. Data is typically aggregated and stored according to predicted usage to enhance query performance.

4

Multi-dimensional OLAP (MOLAP)

Use array technology and efficient storage techniques that minimize the disk space requirements through sparse data management. Provides excellent performance when data is used as designed, and the focus is on data for a specific decision-support application.

5

Multi-dimensional OLAP (MOLAP)

Traditionally, require a tight coupling with the application layer and presentation layer. Recent trends segregate the OLAP from the data structures through the use of published application programming interfaces (APIs).

6

Typical Architecture for MOLAP Tools

Query on multidimensional database, cubesMore storage, less time

7

MOLAP Tools - Development Issues

Underlying data structures are limited in their ability to support multiple subject areas and to provide access to detailed data. Navigation and analysis of data is limited because the data is designed according to previously determined requirements.

8

MOLAP Tools - Development Issues

MOLAP products require a different set of skills and tools to build and maintain the database, thus increasing the cost and complexity of support.

9

Relational OLAP (ROLAP)

Fastest-growing style of OLAP technology due to requirements to analyze ever-increasing amounts of data and the realization that users cannot store all the data they require in MOLAP databases.

10

Relational OLAP (ROLAP)

Supports RDBMS products using a metadata layer - avoids need to create a static multi-dimensional data structure - facilitates the creation of multiple multi-dimensional views of the two-dimensional relation.

11

Relational OLAP (ROLAP)

To improve performance, some products use SQL engines to support the complexity of multi-dimensional analysis, while others recommend, or require, the use of highly denormalized database designs such as the star schema.

12

Typical Architecture for ROLAP Tools

Query on fact and dimension tablesLess storage, more time

13

ROLAP Tools - Development Issues

Performance problems associated with the processing of complex queries that require multiple passes through the relational data.Middleware to facilitate the development of multi-dimensional applications. (Software that converts the two-dimensional relation into a multi-dimensional structure).

14

ROLAP Tools - Development Issues

Development of an option to create persistent, multi-dimensional structures with facilities to assist in the administration of these structures.

15

Hybrid OLAP (HOLAP)

Provide limited analysis capability, either directly against RDBMS products, or by using an intermediate MOLAP server. Deliver selected data directly from the DBMS or via a MOLAP server to the desktop (or local server) in the form of a data cube, where it is stored, analyzed, and maintained locally.

16

Hybrid OLAP (HOLAP)

Promoted as being relatively simple to install and administer with reduced cost and maintenance.

17

Typical Architecture for HOLAP Tools

18

HOLAP Tools - Development Issues

Architecture results in significant data redundancy and may cause problems for networks that support many users. Ability of each user to build a custom data cube may cause a lack of data consistency among users. Only a limited amount of data can be efficiently maintained.

19

Desktop OLAP (DOLAP)

Store the OLAP data in client-based files and support multi-dimensional processing using a client multi-dimensional engine. Requires that relatively small extracts of data are held on client machines. They may be distributed in advance, or created on demand (possibly through the Web).

20

Desktop OLAP (DOLAP)

As with multi-dimensional databases on the server, OLAP data may be held on disk or in RAM, however, some DOLAP products allow only read access. Most vendors of DOLAP exploit the power of desktop PC to perform some, if not most, multi-dimensional calculations.

21

Desktop OLAP (DOLAP)

The administration of a DOLAP database is typically performed by a central server or processing routine that prepares data cubes or sets of data for each user. Once the basic processing is done, each user can then access their portion of the data.

22

Typical Architecture for DOLAP Tools

Some cubes on clients

23

DOLAP Tools -Development Issues

Provision of appropriate security controls to support all parts of the DOLAP environment. Since the data is physically extracted from the system, security is generally implemented by limiting the information compiled into each cube. Once each cube is uploaded to the user's desktop, all additional meta data becomes the property of the local user.

24

DOLAP Tools -Development Issues

Reduction in the effort involved in deploying and maintaining the DOLAP tools. Some DOLAP vendors now provide a range of alternative ways of deploying OLAP data such as through e-mail, the Web or using traditional client/server architecture.Current trends are towards thin client machines.

25

OLAP Extensions to SQLAdvantages of SQL include that it is easy to learn, non-procedural, free-format, DBMS-independent, and that it is a recognized international standard.However, major limitation of SQL is the inability to answer routinely asked business queries such as computing the percentage change in values between this month and a year ago or to compute moving averages, cumulative sums, and other statistical functions.

26

OLAP Extensions to SQL

Answer is ANSI adopted a set of OLAP functions as an extension to SQL to enable these calculations as well as many others that used to be impossible or even impractical within SQL. IBM and Oracle jointly proposed these extensions early in 1999 and they now form part of the current SQL standard, namely SQL: 2003.

27

OLAP Extensions to SQL -RISQL

The extensions are collectively referred to as the ‘OLAP package’ and are described as follows:

Feature T431, ‘Extended Grouping capabilities’SELECT..GROUP BY ROLLUP(columnlist)SELECT..GROUP BY CUBE(columnlist)

Feature T611, ‘Extended OLAP operators’RANK() OVER (ORDER BY columnlist)DENSE_RANK() OVER (ORDER BY columnlist)Windowing calculations

28

Extended Grouping Capabilities

Aggregation is a fundamental part of OLAP. To improve aggregation capabilities the SQL standard provides extensions to the GROUP BY clause such as the ROLLUP and CUBE functions.

29

Extended Grouping Capabilities

ROLLUP supports calculations using aggregations such as SUM, COUNT, MAX, MIN, and AVG at increasing levels of aggregation, from the most detailed up to a grand total. CUBE is similar to ROLLUP, enabling a single statement to calculate all possible combinations of aggregations. CUBE can generate the information needed in cross-tabulation reports with a single query.

30

Extended Grouping Capabilities

ROLLUP and CUBE extensions specify exactly the groupings of interest in the GROUP BY clause and produces a single result set that is equivalent to a UNION ALL of differently grouped rows.

31

Extended Grouping Capabilities

ROLLUP Extension to GROUP BYenables a SELECT statement to calculate multiple levels of subtotals across a specified group of dimensions. ROLLUP appears in the GROUP BY clause in a SELECT statement using the following format:

SELECT ... GROUP BY ROLLUP(columnList)

32

Extended Grouping CapabilitiesROLLUP creates subtotals that roll up from the most detailed level to a grand total, following a column list specified in the ROLLUP clause. ROLLUP first calculates the standard aggregate values specified in the GROUP BY clause and then creates progressively higher level subtotals, moving from right to left through the column list until finally completing with a grand total.

33

Extended Grouping CapabilitiesROLLUP creates subtotals at n + 1 levels, where n is the number of grouping columns. For instance, if a query specifies ROLLUP on grouping columns of propertyType, yearMonth, and city (n = 3), the result set will include rows at 4 aggregation levels.

34

Example - Using the ROLLUP Group Function

Show the totals for sales of flats or houses by branch offices located in Aberdeen, Edinburgh, or Glasgow for the months of September and October of 2004.

35

Example - Using the ROLLUP Group Function

SELECT propertyType, yearMonth, city, SUM(saleAmount) AS sales

FROM Branch, PropertyFor Sale, PropertySaleWHERE Branch.branchNo = PropertySale.branchNoAND PropertyForSale.propertyNo =

PropertySale.propertyNoAND PropertySale.yearMonth IN ('2004-08', '2004-09')AND Branch.city IN (‘Aberdeen’, ‘Edinburgh’,

‘Glasgow’)GROUP BY ROLLUP(propertyType, yearMonth, city);

36

Example - Using the ROLLUP Group Function

37

Extended Grouping Capabilities

CUBE Extension to GROUP BYCUBE takes a specified set of grouping columns and creates subtotals for all of the possible combinations. CUBE appears in the GROUP BY clause in a SELECT statement using the following format:

SELECT ... GROUP BY CUBE(columnList)

38

Extended Grouping Capabilities

CUBE generates all the subtotals that could be calculated for a data cube with the specified dimensions.

CUBE can be used in any situation requiring cross-tabular reports. The data needed for cross-tabular reports can be generated with a single SELECT using CUBE. Like ROLLUP, CUBE can be helpful in generating summary tables.

39

Extended Grouping Capabilities

CUBE is typically most suitable in queries that use columns from multiple dimensions rather than columns representing different levels of a single dimension.

40

Example - Using the CUBE Group Function

Show all possible subtotals for sales of properties by branches offices in Aberdeen, Edinburgh, and Glasgow for the months of September and October of 2004.

41

Example - Using the CUBE Group Function

SELECT propertyType, yearMonth, city, SUM(saleAmount) AS sales

FROM Branch, PropertyFor Sale, PropertySaleWHERE Branch.branchNo =

PropertySale.branchNoAND PropertyForSale.propertyNo =

PropertySale.propertyNoAND PropertySale.yearMonth IN ('2004-08',

'2004-09')AND Branch.city IN (‘Aberdeen’, ‘Edinburgh’,

‘Glasgow’)GROUP BY CUBE(propertyType, yearMonth, city);

42

Example - Using the CUBE Group Function

43

Elementary OLAP Operators

Supports a variety of operations such as rankings and window calculations.

Ranking functions include cumulative distributions, percent rank, and N-tiles.

Windowing allows the calculation of cumulative and moving aggregations using functions such as SUM, AVG, MIN, and COUNT.

44

Elementary OLAP Operators

Ranking FunctionsComputes the rank of a record compared to other records in the dataset based on the values of a set of measures. There are various types of ranking functions, including RANK and DENSE_RANK. The syntax for each ranking function is:

RANK( ) OVER (ORDER BY columnList)DENSE_RANK( ) OVER (ORDER BY

columnList)

45

Elementary OLAP Operators

The difference between RANK and DENSE_RANK is that DENSE_RANK leaves no gaps in the sequential ranking sequence when there are ties for a ranking.

46

Example - Using the RANK and DENSE_RANK Functions

Rank the total sales of properties for branch offices in Edinburgh.

SELECT branchNo, SUM(saleAmount) AS sales,RANK() OVER (ORDER BY SUM(saleAmount))

DESC AS ranking, DENSE_RANK() OVER (ORDER BY

SUM(saleAmount)) DESC AS dense_rankingFROM Branch, PropertySaleWHERE Branch.branchNo =

PropertySale.branchNoAND Branch.city = ‘Edinburgh’

GROUP BY(branchNo);

47

Example - Using the RANK and DENSE_RANK Functions

48

Elementary OLAP Operators

Supports a variety of operations such as rankings and window calculations. Ranking functions include cumulative distributions, percent rank, and N-tiles. Windowing allows the calculation of cumulative and moving aggregations using functions such as SUM, AVG, MIN, and COUNT.

49

Elementary OLAP Operators

Windowing CalculationsCan be used to compute cumulative, moving, and centered aggregates. They return a value for each row in the table, which depends on other rows in the corresponding window.

50

Elementary OLAP Operators

Windowing CalculationsCan be used to compute cumulative, moving, and centered aggregates. They return a value for each row in the table, which depends on other rows in the corresponding window.These aggregate functions provide access to more than one row of a table without a self-join and can be used only in the SELECT and ORDER BY clauses of the query.

51

Example - Using Windowing Calculations

Show the monthly figures and three-month moving averages and sums for property sales at branch office B003 for the first six months of 2004.

52

Example - Using Windowing Calculations

SELECT yearMonth, SUM(saleAmount) AS monthlySales, AVG(SUM(saleAmount))

OVER (ORDER BY yearMonth, ROWS 2 PRECEDING) AS 3-month moving avg,

SUM(SUM(salesAmount)) OVER (ORDER BY yearMonth ROWS 2 PRECEDING)

AS 3-month moving sumFROM PropertySale

WHERE branchNo = ‘B003’AND yearMonth BETWEEN ('2004-01' AND

'2004-06’)GROUP BY yearMonthORDER BY yearMonth;

53

Example - Using Windowing Calculations

54

Storage Comparison of MOLAP & ROLAPCube

Items

Loca

tions

Times

Sales Tim_ID Item_ID L_ID Sales

1 1 1 1795

1 2 1 1006

1 3 1 993

1 1 2 1237

1 2 2 1059

1 3 2 998

1 1 3 987

1 2 3 689

1 3 3 589

2 1 1 963

2 2 1 789

Time_ID

Year

Item_ID

ItemName

Location_ID

Location

1 2005

2 2006

3 2007

1 Game

2 PC

3 Phone

1 DC

2 NY

3 SF

Dimension tables

ROLAP Fact table

55

Storage Comparison of MOLAP & ROLAP

Time_ID Year Quarter month

Dimension tablesROLAP MOLAP

50 rows Year 2005 2006

Quarter 1 1

Month 1 2

Class Elect Elect

Subclass Home Home

Name PC PC

Country USA USA

L_name NY NY

Sales 1003 987

Item_ID Class subclass Name 50

L_ID Country L_Name50

Fact tableTime_ID Item_ID L_ID Sales

50*4*8+ 50*4*8 + 50*3*8 =4,400 bytes

50*50*50*4*8=4,000,000

4,000,000+4,400=4,004,400 bytes50*50*50*9*8=9,000,000 bytes

> 4,004,400 bytes