Aggregates I

Data Warehouse Aggregates

Ing. Julio Ernesto Carreño Vargas

Using the Star Schema

The queries against a star schema follow a consistent pattern. One or more facts are typically requested, along with the dimensional attributes that provide the desired context. The facts are summarized as appropriate, based on the dimensions.
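The pattern described above can be sketched with Python's built-in `sqlite3` module. The table names (`day`, `product`, `order_facts`), the columns, and the data are hypothetical and chosen only for illustration: one fact is requested, two dimensional attributes provide context, and the fact is summed by those attributes.

```python
import sqlite3

# Hypothetical miniature star schema: one fact table and two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE day (day_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE order_facts (day_key INTEGER, product_key INTEGER, order_dollars REAL);
INSERT INTO day VALUES (1, '2024-01-05', '2024-01'), (2, '2024-01-20', '2024-01'),
                       (3, '2024-02-03', '2024-02');
INSERT INTO product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'),
                           (3, 'Manual', 'Books');
INSERT INTO order_facts VALUES (1, 1, 100.0), (1, 3, 20.0), (2, 2, 50.0),
                               (3, 1, 70.0), (3, 3, 10.0);
""")

# The typical pattern: a fact (order_dollars), the dimensional attributes that
# give it context (month, category), and a summary grouped by those attributes.
star_query = """
SELECT d.month, p.category, SUM(f.order_dollars) AS order_dollars
FROM order_facts f
JOIN day d ON d.day_key = f.day_key
JOIN product p ON p.product_key = f.product_key
GROUP BY d.month, p.category
ORDER BY d.month, p.category
"""
rows = conn.execute(star_query).fetchall()
for row in rows:
    print(row)
```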


Aggregate tables

Aggregate tables improve data warehouse performance by reducing the number of rows the RDBMS must access when responding to a query.

[Figure: base schema and aggregate schema]
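To make the row reduction concrete, here is a minimal sketch using `sqlite3` with made-up data. The table `order_facts_agg_month` and its layout are assumptions for illustration; for brevity the month is stored directly in the aggregate rather than in a separate aggregate dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE day (day_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE order_facts (day_key INTEGER, product_key INTEGER, order_dollars REAL);
""")

# Made-up data: 60 days over two months, 3 products ordered every day.
conn.executemany(
    "INSERT INTO day VALUES (?, ?)",
    [(k, "2024-01" if k <= 30 else "2024-02") for k in range(1, 61)],
)
conn.executemany(
    "INSERT INTO order_facts VALUES (?, ?, ?)",
    [(d, p, 10.0 * p) for d in range(1, 61) for p in range(1, 4)],
)

# Hypothetical aggregate: the same fact summarized by month and product.
conn.executescript("""
CREATE TABLE order_facts_agg_month AS
SELECT d.month AS month, f.product_key AS product_key,
       SUM(f.order_dollars) AS order_dollars
FROM order_facts f
JOIN day d ON d.day_key = f.day_key
GROUP BY d.month, f.product_key;
""")

base_rows = conn.execute("SELECT COUNT(*) FROM order_facts").fetchone()[0]
agg_rows = conn.execute("SELECT COUNT(*) FROM order_facts_agg_month").fetchone()[0]
print(base_rows, agg_rows)  # 180 base rows versus 6 aggregate rows
```

A monthly query against the aggregate reads 6 rows instead of 180, which is where the performance gain comes from.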

aggregate dimension table


aggregate characteristic

The more highly summarized an aggregate table is, the fewer queries it will be able to accelerate.

This means that choosing aggregates involves making careful tradeoffs between the performance gain offered and the number of queries that will benefit.

The Aggregate Navigator

To receive the performance benefit offered by an aggregate schema, a query must be written to use the aggregate.

Aggregate navigator: a component of the data warehouse infrastructure that assumes the task of rewriting user queries to utilize aggregate tables.
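The core decision an aggregate navigator makes can be sketched as follows. This is a simplified illustration, not how any particular product works: the catalog, table names, and row counts are hypothetical, and a real navigator also rewrites the SQL itself rather than only picking a table.

```python
# Hypothetical catalog: the dimensional attributes each table can supply and
# its approximate row count.
CATALOG = [
    {"table": "order_facts",
     "attributes": {"day", "month", "product", "category", "salesperson"},
     "rows": 1_000_000},
    {"table": "order_facts_agg_month_product",
     "attributes": {"month", "product", "category"},
     "rows": 40_000},
    {"table": "order_facts_agg_month_category",
     "attributes": {"month", "category"},
     "rows": 600},
]


def choose_table(requested_attributes):
    """Return the smallest table that can supply every requested attribute."""
    requested = set(requested_attributes)
    candidates = [t for t in CATALOG if requested <= t["attributes"]]
    if not candidates:
        raise ValueError(f"no table supplies {sorted(requested)}")
    return min(candidates, key=lambda t: t["rows"])["table"]


print(choose_table({"month", "category"}))  # order_facts_agg_month_category
print(choose_table({"month", "product"}))   # order_facts_agg_month_product
print(choose_table({"day", "category"}))    # order_facts
```

The sketch also shows the tradeoff stated earlier: the most highly summarized table is the smallest, but it can serve only the queries whose attributes it still contains.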

Principles of Aggregation

An aggregate schema must always provide exactly the same results as the base schema.

The attributes of each aggregate table must be a subset of those from a base schema table.

The only exception to this rule is the surrogate key for an aggregate dimension table.
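These principles can be checked mechanically. The sketch below uses `sqlite3` with hypothetical tables: a `month` aggregate dimension whose attributes are a subset of the `day` attributes plus its own surrogate key (`month_key`), and an aggregate fact table at month grain. It then runs the same question against both schemas and compares the answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base schema (hypothetical names and data).
CREATE TABLE day (day_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER);
CREATE TABLE order_facts (day_key INTEGER, order_dollars REAL);
INSERT INTO day VALUES (1, '2024-01-05', '2024-01', 2024),
                       (2, '2024-01-20', '2024-01', 2024),
                       (3, '2024-02-03', '2024-02', 2024);
INSERT INTO order_facts VALUES (1, 100.0), (1, 25.0), (2, 50.0), (3, 70.0);

-- Aggregate dimension: month and year come from the day dimension;
-- month_key is the only new column, a surrogate key.
CREATE TABLE month (month_key INTEGER PRIMARY KEY AUTOINCREMENT, month TEXT, year INTEGER);
INSERT INTO month (month, year) SELECT DISTINCT month, year FROM day ORDER BY month;

-- Aggregate fact table at month grain.
CREATE TABLE order_facts_agg (month_key INTEGER, order_dollars REAL);
INSERT INTO order_facts_agg
SELECT m.month_key, SUM(f.order_dollars)
FROM order_facts f
JOIN day d ON d.day_key = f.day_key
JOIN month m ON m.month = d.month AND m.year = d.year
GROUP BY m.month_key;
""")

base_result = conn.execute("""
SELECT d.month, SUM(f.order_dollars)
FROM order_facts f JOIN day d ON d.day_key = f.day_key
GROUP BY d.month ORDER BY d.month
""").fetchall()

agg_result = conn.execute("""
SELECT m.month, SUM(a.order_dollars)
FROM order_facts_agg a JOIN month m ON m.month_key = a.month_key
GROUP BY m.month ORDER BY m.month
""").fetchall()

# Both schemas must return exactly the same answer.
print(base_result == agg_result)
```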

Summarization Techniques

Aggregate Tables

Pre-Joined Aggregates

Derived Tables

Pre-Joined Aggregates

A pre-joined aggregate summarizes a fact across a set of dimension values. Unlike an aggregate star schema, however, the pre-joined aggregate places the results in a single table.

By doing so, the pre-joined aggregate eliminates the need for the RDBMS to perform a join operation at query time.
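A minimal sketch of a pre-joined aggregate, again with `sqlite3` and hypothetical names (`orders_by_category` is invented for this example). The join happens once, when the table is built; the query that follows touches a single table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE order_facts (product_key INTEGER, order_dollars REAL);
INSERT INTO product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'),
                           (3, 'Manual', 'Books');
INSERT INTO order_facts VALUES (1, 100.0), (2, 50.0), (3, 20.0), (1, 70.0);

-- Pre-joined aggregate: the dimension attribute and the summarized fact
-- are stored together in one table.
CREATE TABLE orders_by_category AS
SELECT p.category AS category, SUM(f.order_dollars) AS order_dollars
FROM order_facts f
JOIN product p ON p.product_key = f.product_key
GROUP BY p.category;
""")

# No join is needed at query time.
prejoined = conn.execute(
    "SELECT category, order_dollars FROM orders_by_category ORDER BY category"
).fetchall()
print(prejoined)  # [('Books', 20.0), ('Hardware', 220.0)]
```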

Derived Tables

Derived tables alter the structure of the tables summarized or change the scope of their content.

Types:

The merged fact table: combines facts from more than one fact table at a common grain.

The pivoted fact table: transforms a set of metrics in a single row into multiple rows with a single metric, or vice versa.

The sliced fact table: contains a subset of the records of the base fact table, usually in coordination with a specific dimension attribute.

In all three cases, the derived fact tables are not expected to serve as invisible stand-ins for the base schema.
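Of the three types, the pivoted fact table is the easiest to misread, so here is a small sketch of the transformation in both directions. The rows, the metric names, and the helper functions are all hypothetical.

```python
# Hypothetical "wide" fact rows: several metrics in a single row.
wide_rows = [
    {"month": "2024-01", "order_dollars": 150.0, "shipment_dollars": 120.0},
    {"month": "2024-02", "order_dollars": 80.0, "shipment_dollars": 95.0},
]
METRICS = ("order_dollars", "shipment_dollars")


def pivot_to_rows(rows, metrics):
    """One row with many metrics becomes many rows with a single metric."""
    return [
        {"month": r["month"], "metric": m, "amount": r[m]}
        for r in rows
        for m in metrics
    ]


def pivot_to_columns(rows):
    """The reverse: single-metric rows are folded back into one row per month."""
    by_month = {}
    for r in rows:
        target = by_month.setdefault(r["month"], {"month": r["month"]})
        target[r["metric"]] = r["amount"]
    return list(by_month.values())


tall_rows = pivot_to_rows(wide_rows, METRICS)
print(len(tall_rows))                           # 4
print(pivot_to_columns(tall_rows) == wide_rows)  # True
```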

Tables with New Facts

Semi-additive facts may not be added together across a particular dimension; non-additive facts are never added together. In these situations, you may choose to aggregate by means other than summation.
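An account balance is a common example of a semi-additive fact, used here as an assumption since the slide names no specific fact. Balances can be summed across accounts for one day, but not across days; summarizing over time calls for something like an average or a maximum, which becomes a new fact in the aggregate.

```python
from statistics import mean

# Hypothetical daily balance snapshots: (day, account, balance).
balances = [
    ("2024-01-01", "A", 100.0), ("2024-01-02", "A", 120.0), ("2024-01-03", "A", 110.0),
    ("2024-01-01", "B", 50.0), ("2024-01-02", "B", 50.0), ("2024-01-03", "B", 80.0),
]

# Adding across accounts for a single day is meaningful.
total_on_jan_3 = sum(b for day, _, b in balances if day == "2024-01-03")

# Adding one account across days is not, so the period is summarized by
# other means: an average balance and a maximum balance.
avg_balance_a = mean(b for _, acct, b in balances if acct == "A")
max_balance_a = max(b for _, acct, b in balances if acct == "A")

print(total_on_jan_3, avg_balance_a, max_balance_a)  # 190.0 110.0 120.0
```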

Choosing Aggregates

One of the most vexing tasks in deploying dimensional aggregates is choosing which aggregates to design and deploy.

Your aim is to strike the correct balance between the performance gain provided by aggregate schemas and their cost in terms of resource requirements.

Choosing Aggregates

What Is a Potential Aggregate?

Identifying Potentially Useful Aggregates

Assessing the Value of Potential Aggregates



Aggregate Fact Tables: A Question of Grain

Aggregate Dimensions Must Conform

Pre-Joined Aggregates Have Grain Too

Enumerating Potential Aggregates



Express potential aggregates as fact table grain statements:

Orders by day, salesperson, and product

Orders by day, customer, and product

Orders by month, product, and salesperson

Enumerating Potential Aggregates

6 * 4 * 4 * 4 * 2 * 2 = 1536 possible aggregates
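The arithmetic can be verified in a couple of lines. The interpretation of the factors is an assumption not stated on the slide: each number is taken to be the count of summarization levels available for one dimension of the fact table, so the product counts every combination of levels.

```python
from math import prod

# Assumed meaning: number of possible summarization levels per dimension.
levels_per_dimension = [6, 4, 4, 4, 2, 2]

potential_combinations = prod(levels_per_dimension)
print(potential_combinations)  # 1536
```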


Drawing on Initial Design

Design Decisions

Listening to Users

Where Subject Areas Meet

The Conformance Bus

Aggregates for Drilling Across

Query Patterns of an Existing System

Analyzing Reports for Potential Aggregates

Choosing Which Reports to Analyze



Identify and document potential aggregates during schema design, even if initial implementation will not include aggregates. This information will be useful in the future.

Any decision to set the grain of a fact table at a finer level reveals a potential aggregate.

Decisions about where to place groups of dimensional attributes reveal potential levels of aggregation.

Discussions of hierarchies or drill paths point to potential aggregates.

User work products reveal potential aggregates. These may include reports from operational systems, manually compiled briefings, or spreadsheets. They will also be revealed by manual processes and requirements not currently met.


Aggregates for Drilling Across

The process of combining information from multiple fact tables is called drilling across.

Consult the conformance bus to identify aggregates that will be used in drill-across reports.

The lowest common dimensionality between two fact tables often suggests one or more aggregates.
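One way to picture the lowest common dimensionality is as a comparison of the grain of two fact tables, dimension by dimension. The hierarchies, fact tables, and function below are hypothetical; this is a sketch of the idea, not a substitute for reading the conformance bus.

```python
# Hypothetical hierarchies, listed from finest to coarsest level.
HIERARCHIES = {
    "time": ["day", "month", "year"],
    "product": ["product", "category"],
    "customer": ["customer", "region"],
}

# Grain of each fact table: the level it stores for each dimension it references.
order_facts = {"time": "day", "product": "product", "customer": "customer"}
sales_goal_facts = {"time": "month", "product": "category"}


def lowest_common_grain(a, b):
    """For each shared dimension, return the finest level both tables can supply."""
    common = {}
    for dim in a.keys() & b.keys():
        levels = HIERARCHIES[dim]
        # The coarser of the two levels is the finest one both tables share.
        common[dim] = levels[max(levels.index(a[dim]), levels.index(b[dim]))]
    return common


common_grain = lowest_common_grain(order_facts, sales_goal_facts)
print(sorted(common_grain.items()))  # [('product', 'category'), ('time', 'month')]
```

Here a drill-across report comparing orders to goals can go no finer than month and category, which suggests an aggregate of the order facts at that grain.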

Analyzing Reports for Potential Aggregates

The detail rows require order facts by product and month.

The summary rows require order facts by category and month.

The grand total requires order facts by month.
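The three levels of such a report can be sketched as successive roll-ups of the same rows. The data and the `roll_up` helper are hypothetical; each distinct grain that appears is a potential aggregate.

```python
from collections import defaultdict

# Hypothetical order facts at the grain of the detail rows:
# (month, category, product, order_dollars).
detail = [
    ("2024-01", "Hardware", "Widget", 100.0),
    ("2024-01", "Hardware", "Gadget", 50.0),
    ("2024-01", "Books", "Manual", 20.0),
    ("2024-02", "Hardware", "Widget", 70.0),
]


def roll_up(rows, key_len):
    """Sum the last column, grouping by the first key_len columns."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[:key_len]] += row[-1]
    return dict(totals)


summary = roll_up(detail, 2)      # order facts by month and category
grand_total = roll_up(detail, 1)  # order facts by month

print(summary)
print(grand_total)
```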

Drilling

Drill paths suggest aggregates.


After identifying a pool of potential aggregates, the next step is to sort through them and determine which ones to build.


Number of Aggregates

Presence of an Aggregate Navigator

Space Consumed by Aggregate Tables

How Many Rows Are Summarized

Examining the Number of Rows Summarized

The Cardinality Trap and Sparsity

Who Will Benefit from the Aggregate


Examining the Number of Rows Summarized

A good starting rule of thumb is to identify aggregate fact tables where each row summarizes an average of 20 rows.

The savings afforded by aggregates can be lopsided, favoring a particular attribute value.

Remember that, like a base fact table, a dimensional aggregate can be aggregated during a query. Aggregates may be competing with other aggregates to offer performance gains.
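A small sketch of why the average can mislead. The data is made up: 100 base rows that roll up to two categories, with most rows falling under one of them.

```python
from collections import Counter

# Made-up base fact rows, identified only by the category they roll up to.
base_categories = ["Hardware"] * 95 + ["Books"] * 5

# Number of base rows summarized by each aggregate row.
rows_per_aggregate_row = Counter(base_categories)

average_rows_summarized = len(base_categories) / len(rows_per_aggregate_row)

print(average_rows_summarized)           # 50.0
print(rows_per_aggregate_row["Books"])   # 5
```

The average of 50 rows per aggregate row clears the 20-row rule of thumb, yet a query about Books saves only 5 rows to 1. The savings favor Hardware.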

The Cardinality Trap and Sparsity

Cardinality: the number of distinct values taken on by a given attribute.

Sparse: not all combinations of keys are present.

Don’t assume aggregate fact tables will exhibit the same sparsity as the tables they summarize.

The higher the degree of summarization, the more dense the aggregate fact table will be.

The best way to get an idea of the relative size of the aggregate is to count the number of rows.

As before, count the distinct combinations of keys and/or summarized dimension attributes.
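The counting approach can be sketched as follows, with made-up rows and a hypothetical `density` helper. It compares the combinations that actually occur against the number the cardinalities alone would suggest.

```python
# Made-up base fact rows: (day, month, product, category).
base_rows = [
    ("d1", "2024-01", "Widget", "Hardware"),
    ("d1", "2024-01", "Manual", "Books"),
    ("d2", "2024-01", "Gadget", "Hardware"),
    ("d3", "2024-02", "Widget", "Hardware"),
    ("d4", "2024-02", "Novel", "Books"),
    ("d4", "2024-02", "Gadget", "Hardware"),
]
DAY, MONTH, PRODUCT, CATEGORY = range(4)


def density(rows, *columns):
    """Return (actual, possible, ratio) for the given key columns."""
    actual = len({tuple(r[c] for c in columns) for r in rows})
    possible = 1
    for c in columns:
        possible *= len({r[c] for r in rows})  # cardinality of each column
    return actual, possible, actual / possible


base_density = density(base_rows, DAY, PRODUCT)    # base grain: day by product
agg_density = density(base_rows, MONTH, CATEGORY)  # aggregate: month by category

print(base_density)  # (6, 16, 0.375)
print(agg_density)   # (4, 4, 1.0)
```

Cardinalities alone suggest a reduction from 16 combinations to 4. Counting the distinct combinations shows the sparse base table has only 6 rows and the dense aggregate has 4, so each aggregate row summarizes just 1.5 rows on average.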

Who Will Benefit from the Aggregate

The first aggregates you add to your implementation are those that offer benefits across the widest number of user requirements. Aggregates that fall in the 20:1 range of savings are compared with one another to identify those that support the most common user requirements.

Start by selecting aggregates that provide solid performance boosts for a wide number of common queries. To this, add more powerful (but more narrowly used) aggregates as space permits. Use the relative importance of one aggregate over another in a tiebreaker situation.
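One possible way to turn this guidance into a first-pass shortlist is sketched below. Everything here is an assumption made for illustration: the candidate list, the `queries_served` counts, the row budget standing in for available space, and the simple greedy order. It is a starting point for discussion, not a prescribed selection method.

```python
# Hypothetical candidates with estimated size and number of common queries served.
candidates = [
    {"name": "month_product", "rows": 40_000, "queries_served": 35},
    {"name": "month_category", "rows": 600, "queries_served": 12},
    {"name": "day_category", "rows": 700_000, "queries_served": 40},
    {"name": "quarter_region", "rows": 200, "queries_served": 5},
]
BASE_ROWS = 1_000_000  # assumed size of the base fact table
ROW_BUDGET = 40_700    # assumed space limit, expressed in rows

# Keep candidates with savings in the 20:1 range or better.
eligible = [c for c in candidates if BASE_ROWS / c["rows"] >= 20]

# Prefer the aggregates that serve the most common queries.
eligible.sort(key=lambda c: c["queries_served"], reverse=True)

# Add aggregates as space permits.
chosen, used = [], 0
for c in eligible:
    if used + c["rows"] <= ROW_BUDGET:
        chosen.append(c["name"])
        used += c["rows"]

print(chosen)  # ['month_product', 'month_category']
```

Note that `day_category` serves the most queries but is dropped first, because each of its rows summarizes fewer than two base rows.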

Bibliography

Christopher Adamson. Mastering Data Warehouse Aggregates: Solutions for Star Schema Performance.
