Armazéns de Dados

OLAP operations

Based on slides by Alejandro Vaisman & Esteban Zimányi


Page 3: Armazéns de Dados

OLAP Operations


Page 4: Armazéns de Dados

OLAP Operations


Page 5: Armazéns de Dados

OLAP Operations

• Starting cube: quarterly sales (in thousands) by product category and customer city for 2012.

• We first compute the sales quantities by country: a roll-up to the Country level along the Customer dimension.

• Sales of category Seafood in France are significantly higher in the first quarter.

o To find out whether this occurred during a particular month, we take the cube back to the City aggregation level and drill down along Time to the Month level.

• To explore alternative visualizations, we sort products by name.

• To see the cube with the Time dimension on the x axis, we rotate the axes of the original cube without changing granularities: this is pivoting (see next slide).

• To visualize the data only for Paris, we apply a slice operation, which results in a 2-dimensional subcube, basically a collection of time series (see next slide).

• To obtain a 3-dimensional subcube containing only sales for the first two quarters and for the cities Lyon and Paris, we go back to the original cube and apply a dice operation.
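
The roll-up, slice, and dice steps above can be sketched on a toy cube. This is only an illustration under stated assumptions: the cube is a plain Python dictionary keyed by (category, city, quarter), all quantities are invented, and the names `country_of`, `roll_up_to_country`, `slice_city`, and `dice` are hypothetical helpers, not part of any OLAP tool.

```python
from collections import defaultdict

# Toy cube: (category, city, quarter) -> quantity. All figures are invented.
sales = {
    ("Seafood", "Paris", "Q1"): 30, ("Seafood", "Paris", "Q2"): 12,
    ("Seafood", "Paris", "Q3"): 11, ("Seafood", "Lyon", "Q1"): 25,
    ("Seafood", "Lyon", "Q2"): 10, ("Seafood", "Lyon", "Q3"): 9,
    ("Seafood", "Berlin", "Q1"): 8, ("Seafood", "Berlin", "Q2"): 9,
    ("Seafood", "Berlin", "Q3"): 7, ("Beverages", "Paris", "Q1"): 14,
    ("Beverages", "Paris", "Q2"): 15, ("Beverages", "Paris", "Q3"): 16,
    ("Beverages", "Lyon", "Q1"): 11, ("Beverages", "Lyon", "Q2"): 13,
    ("Beverages", "Lyon", "Q3"): 12, ("Beverages", "Berlin", "Q1"): 20,
    ("Beverages", "Berlin", "Q2"): 21, ("Beverages", "Berlin", "Q3"): 19,
}

# Hypothetical Customer hierarchy: City -> Country.
country_of = {"Paris": "France", "Lyon": "France", "Berlin": "Germany"}


def roll_up_to_country(cube):
    """Roll up along Customer from City to Country, summing quantities."""
    result = defaultdict(int)
    for (category, city, quarter), qty in cube.items():
        result[(category, country_of[city], quarter)] += qty
    return dict(result)


def slice_city(cube, city):
    """Slice: fix one city and drop the Customer dimension (3-D -> 2-D)."""
    return {(cat, q): qty for (cat, c, q), qty in cube.items() if c == city}


def dice(cube, cities, quarters):
    """Dice: keep only the cells that satisfy a condition; the cube stays 3-D."""
    return {k: v for k, v in cube.items() if k[1] in cities and k[2] in quarters}


by_country = roll_up_to_country(sales)
paris = slice_city(sales, "Paris")
first_half_lyon_paris = dice(sales, {"Lyon", "Paris"}, {"Q1", "Q2"})
```

Note that the slice result is keyed by (category, quarter) only, which is why it can be read as one time series per category, while the dice result keeps all three coordinates.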


Page 6: Armazéns de Dados

OLAP Operations


Page 7: Armazéns de Dados

Advanced OLAP Operations


Page 8: Armazéns de Dados

Advanced OLAP Operations

• To compare sales quantities in 2012 with those in 2011, we need a cube with the same structure as the one for 2012.

• Given two cubes, drill-across builds a new cube with the measures of both in each cell.

• To compute the percentage change of sales between the two years, we first apply the drill-across operator and then add measure, which computes a new value for each cell from the values in the original cube.

• We also want to aggregate data in various ways; starting from the original cube, we compute total sales by quarter and city using the sum aggregation operator.
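
As a rough sketch of drill-across followed by add measure, the two yearly cubes can be modeled as dictionaries with identical coordinates. The data is invented, and the function names `drill_across` and `add_measure` as well as the measure names `PrevQuantity` and `PercentChange` are assumptions made for this illustration.

```python
# Two toy cubes with the same structure: (category, city, quarter) -> quantity.
sales_2011 = {("Seafood", "Paris", "Q1"): 20, ("Seafood", "Lyon", "Q1"): 25}
sales_2012 = {("Seafood", "Paris", "Q1"): 30, ("Seafood", "Lyon", "Q1"): 20}


def drill_across(cube_a, cube_b, name_a, name_b):
    """Combine two cubes on shared coordinates; each cell holds both measures."""
    return {
        coords: {name_a: cube_a[coords], name_b: cube_b[coords]}
        for coords in cube_a.keys() & cube_b.keys()
    }


def add_measure(cube, name, fn):
    """Return a new cube with one extra measure computed cell by cell."""
    return {coords: {**cell, name: fn(cell)} for coords, cell in cube.items()}


both_years = drill_across(sales_2012, sales_2011, "Quantity", "PrevQuantity")
with_change = add_measure(
    both_years,
    "PercentChange",
    lambda c: 100 * (c["Quantity"] - c["PrevQuantity"]) / c["PrevQuantity"],
)
```

Here the Paris cell gets a percentage change of 50.0 (from 20 to 30) and the Lyon cell gets -20.0 (from 25 to 20).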

Page 9: Armazéns de Dados

Advanced OLAP Operations

• The last operation (bottom right of the slide) requires aggregation.

• Aggregation functions in OLAP can be:

o Cumulative: compute the measure value of a cell from several other cells; examples are SUM, COUNT, and AVG.

o Filtering: filter the members of a dimension that appear in the result; examples are MIN and MAX.

─ Filtering functions compute not only the aggregated value, but also the members of the dimension that belong to the result.

• Measures of a cube can also be aggregated at the current granularity, without performing a roll-up.
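
The difference between the two kinds of functions can be illustrated with a small sketch. It assumes one row of a cube stored as a dictionary from quarter to quantity, with invented values; `max_with_members` is a hypothetical helper name.

```python
# Toy cells for one city: quarter -> quantity (invented values).
quantity_by_quarter = {"Q1": 30, "Q2": 12, "Q3": 11, "Q4": 30}

# Cumulative function: a single value computed from several cells.
total = sum(quantity_by_quarter.values())


def max_with_members(cells):
    """Filtering function: the maximum plus the members that attain it."""
    top = max(cells.values())
    return top, [member for member, value in cells.items() if value == top]


maximum, members = max_with_members(quantity_by_quarter)
```

The cumulative result is just the number 83, whereas the filtering result is the value 30 together with the quarters Q1 and Q4 in which it occurs.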


Page 10: Armazéns de Dados

Advanced OLAP Operations


Page 11: Armazéns de Dados

Advanced OLAP Operations

• Aggregation

o “Total overall quantity” (not shown in the previous slide) yields a single cell whose coordinates in the three dimensions are all equal to all.

• Aggregation without changing granularity

o Maximum by quarter and city: yields a cube where only the cells containing the maximum by quarter and city have values; the others are null.

o Top two sales by product and city

• Three-month moving average of sales

o ADDMEASURE(Sales, MovAvg = AVG(Quantity) OVER Time 2 CELLS PRECEDING)

• “Year-to-date sum”:

o ADDMEASURE(Sales, YTDQuantity = SUM(Quantity) OVER Time ALL CELLS PRECEDING)

o The window contains the current cell and all previous ones (indicated by ALL CELLS PRECEDING).
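
The two window-based measures can be approximated in a few lines. This sketch makes several assumptions: the Time dimension is an ordered list of monthly cells with invented values that all belong to one year, a window simply shrinks when fewer preceding cells exist, and `window_measure` is a hypothetical helper rather than an implementation of ADDMEASURE.

```python
# Toy time series for one product and city, ordered along Time (invented values).
quantity = [("Jan", 10), ("Feb", 20), ("Mar", 30), ("Apr", 40)]


def window_measure(series, fn, preceding=None):
    """Apply fn to the current cell plus `preceding` earlier cells (None = all)."""
    values = [value for _, value in series]
    result = []
    for i, (member, _) in enumerate(series):
        start = 0 if preceding is None else max(0, i - preceding)
        result.append((member, fn(values[start:i + 1])))
    return result


def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


# Counterpart of "2 CELLS PRECEDING": current cell and the two before it.
mov_avg = window_measure(quantity, mean, preceding=2)
# Counterpart of "ALL CELLS PRECEDING": current cell and every earlier one.
ytd_quantity = window_measure(quantity, sum)
```

With this data the moving average for April is (20 + 30 + 40) / 3 = 30.0 and the year-to-date sum for April is 100.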


Page 12: Armazéns de Dados

Advanced OLAP Operations

• We go back to the original cube to compute the quarterly sales that amount to 70% of the total sales, applying the top percent aggregation operator.

• Finally, we rank the quarterly sales by category and city.
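
One plausible reading of top percent and rank is sketched below. It assumes that top percent keeps the largest cells until their running sum reaches the requested share of the total, and that ranking is by decreasing quantity starting at 1; the slides do not spell out these details, so `top_percent` and `rank` are hypothetical helpers operating on invented data.

```python
# Toy cells for one category and city: quarter -> quantity (invented values).
quarterly = {"Q1": 40, "Q2": 30, "Q3": 20, "Q4": 10}


def top_percent(cells, percent):
    """Keep the largest cells until their running sum reaches percent of the total."""
    threshold = sum(cells.values()) * percent / 100
    kept, running = {}, 0
    for member, value in sorted(cells.items(), key=lambda kv: kv[1], reverse=True):
        if running >= threshold:
            break
        kept[member] = value
        running += value
    return kept


def rank(cells):
    """Rank members by decreasing value, starting at 1; ties keep their original order."""
    ordered = sorted(cells, key=cells.get, reverse=True)
    return {member: position for position, member in enumerate(ordered, start=1)}


top70 = top_percent(quarterly, 70)
ranks = rank(quarterly)
```

Since Q1 and Q2 together account for 70 of the 100 units, only those two cells survive the top percent step.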


Page 13: Armazéns de Dados

Advanced OLAP Operations

• Union merges two cubes that have the same schema but disjoint instances.

• Difference removes the cells in a cube that belong to another one; the two cubes must have the same schema.

• The drill-through operation allows moving from data at the bottom level of a cube to the data in the operational systems from which the cube was derived.

o It could be used when trying to determine the reason for outlier values in a data cube.
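
Union and difference map naturally onto dictionary operations when a cube is modeled as a mapping from coordinates to a measure. In this sketch the data is invented, cells are matched by their coordinates only (an assumption), and the shared schema is taken for granted rather than checked.

```python
def union(cube_a, cube_b):
    """Merge two cubes with the same schema; their instances must be disjoint."""
    overlap = cube_a.keys() & cube_b.keys()
    if overlap:
        raise ValueError(f"cubes are not disjoint: {sorted(overlap)}")
    return {**cube_a, **cube_b}


def difference(cube_a, cube_b):
    """Remove from cube_a the cells whose coordinates also appear in cube_b."""
    return {k: v for k, v in cube_a.items() if k not in cube_b}


# Toy cubes: (category, city, quarter) -> quantity (invented values).
france = {("Seafood", "Paris", "Q1"): 30, ("Seafood", "Lyon", "Q1"): 25}
spain = {("Seafood", "Madrid", "Q1"): 5}

merged = union(france, spain)
without_paris = difference(merged, {("Seafood", "Paris", "Q1"): 30})
```

Raising an error on overlapping coordinates is a design choice of this sketch; it makes the "disjoint instances" precondition explicit instead of silently overwriting cells.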

Page 14: Armazéns de Dados

Advanced OLAP Operations


“Top two sales” cube

Page 15: Armazéns de Dados

Summarizing OLAP Operations


Page 16: Armazéns de Dados

Exercises


Page 17: Armazéns de Dados

Exercises

1. A data warehouse of a telephone provider consists of five dimensions, namely, caller customer, callee customer, time, call type, and call program, and three measures, namely, number of calls, duration, and amount.

Define the OLAP operations to be performed in order to answer the following queries. Propose dimension hierarchies when needed.

a. Total amount collected by each call program in 2012.

b. Total duration of calls made by customers from Brussels in 2012.

c. Total number of weekend calls made by customers from Brussels to customers in Antwerp in 2012.

d. Total duration of international calls started by customers in Belgium in 2012.

e. Total amount collected from customers in Brussels who are enrolled in the corporate program in 2012.

Page 18: Armazéns de Dados

Exercises

2. A data warehouse of a train company contains information about train segments. It consists of six dimensions, namely, departure station, arrival station, trip, train, arrival time, and departure time, and three measures, namely, number of passengers, duration, and number of kilometers.

Define the OLAP operations to be performed in order to answer the following queries. Propose dimension hierarchies when needed.

a. Total number of kilometers made by Alstom trains during 2012 departing from French or Belgian stations.

b. Total duration of international trips during 2012, that is, trips departing from a station located in a country and arriving at a station located in another country.

c. Total number of trips that departed from or arrived at Paris during July 2012.

d. Average duration of train segments in Belgium in 2012.

e. For each trip, average number of passengers per segment, that is, take all the segments of each trip, and average the number of passengers.

Page 19: Armazéns de Dados

Exercises

3. Consider the data warehouse of a university that contains information about teaching and research activities. On the one hand, the information about teaching activities is related to dimensions department, professor, course, and time, the latter at a granularity of academic semester. Measures for teaching activities are number of hours and number of credits. On the other hand, the information about research activities is related to dimensions professor, funding agency, project, and time, the latter twice for the start date and the end date, both at a granularity of day. In this case, professors are related to the department to which they are affiliated. Measures for research activities are the number of person months and amount.

Define the OLAP operations to be performed in order to answer the following queries. For this, propose the necessary dimension hierarchies.

a. By department, total number of teaching hours during the academic year 2012/2013.

b. By department, total amount of research projects during the calendar year 2012.

c. By department, total number of professors involved in research projects during the calendar year 2012.

d. By professor, total number of courses delivered during the academic year 2012/2013.

e. By department and funding agency, total number of projects started in 2012.