Aggregates II

Designing Aggregates Ing. Julio Ernesto Carreño Vargas


Page 1: Aggregates II

Designing Aggregates

Ing. Julio Ernesto Carreño Vargas

Page 2: Aggregates II

Designing Aggregates

Once you have chosen dimensional aggregates, they must be designed and documented. This is the point of greatest risk for aggregate implementation.

Page 3: Aggregates II

Defining the Base Schema

Page 4: Aggregates II

The Base Schema

Declaration of grain is an essential part of schema design. Proper definition of grain not only enables the future identification of aggregates but is also crucial to the success of the base schema itself.

Page 5: Aggregates II

Rollup dimensions

Conforming rollup dimensions and their natural keys

Page 6: Aggregates II

Rollup dimensions

Rollup dimensions should be sourced from the base dimensions, and their attributes must follow the same rules for slow change processing.

Page 7: Aggregates II

Hierarchies

Documenting dimensional hierarchies may be important for business intelligence software and database features such as materialized views and materialized query tables.

The hierarchies identify potential aggregation points and can aid in estimating the degree of summarization.
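As a rough illustration of how a documented hierarchy helps estimate the degree of summarization, the sketch below counts distinct values at each level of a calendar hierarchy. The hierarchy (day, month, quarter, year), the function names, and the sample year are assumptions made for this example; they do not come from the slides.

```python
from datetime import date, timedelta


def calendar_rows(start, days):
    """Build hypothetical day-grain dimension rows with their rollup levels."""
    rows = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        quarter = (d.month - 1) // 3 + 1
        rows.append({
            "day": d.isoformat(),
            "month": f"{d.year}-{d.month:02d}",
            "quarter": f"{d.year}-Q{quarter}",
            "year": str(d.year),
        })
    return rows


def summarization_ratios(rows, hierarchy):
    """For each level, return (distinct members, base rows per member)."""
    base = len(rows)
    result = {}
    for level in hierarchy:
        distinct = len({row[level] for row in rows})
        result[level] = (distinct, base / distinct)
    return result


rows = calendar_rows(date(2023, 1, 1), 365)
ratios = summarization_ratios(rows, ["day", "month", "quarter", "year"])
for level, (members, ratio) in ratios.items():
    print(f"{level}: {members} members, about {ratio:.1f} base rows each")
```

These ratios describe the dimension only. The actual reduction in fact table rows also depends on how sparse the facts are, so treat the numbers as an estimate rather than a guarantee.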

Page 8: Aggregates II

Housekeeping Columns

Housekeeping columns are present for a purely technical reason, typically to support the load process, and carry no business meaning.

Page 9: Aggregates II

Design Principles for the Aggregate Schema

Page 10: Aggregates II

A Separate Star for Each Aggregation

Dimensional aggregates should be stored in separate tables for each aggregation.

Page 11: Aggregates II

A Separate Star for Each Aggregation

Do not store different levels of aggregation in the same star. A star that mixes levels can return wrong results, because any query that does not filter on the aggregation level double-counts the facts.
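The risk can be seen in a minimal sketch. The table `sales_mixed`, its `level` column, and the sample values are hypothetical; they only illustrate what happens when day-level and month-level rows share one table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Anti-pattern: one table holding both day-level and month-level rows.
conn.execute(
    "CREATE TABLE sales_mixed (level TEXT, period TEXT, sales_dollars REAL)"
)
conn.executemany(
    "INSERT INTO sales_mixed VALUES (?, ?, ?)",
    [
        ("day", "2023-01-01", 100.0),
        ("day", "2023-01-02", 150.0),
        ("month", "2023-01", 250.0),  # summary of the two day rows
    ],
)

# A query that forgets the level filter adds the summary to its own detail.
naive_total = conn.execute(
    "SELECT SUM(sales_dollars) FROM sales_mixed"
).fetchone()[0]

# Only a query that knows about the level column gets the right answer.
correct_total = conn.execute(
    "SELECT SUM(sales_dollars) FROM sales_mixed WHERE level = 'day'"
).fetchone()[0]

print(naive_total, correct_total)
```

With separate tables per level, no such filter is needed, so a query against either table cannot mix levels by accident.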

Page 12: Aggregates II

Aggregate facts

Aggregate facts should be stored in separate tables for each level of aggregation. These may be separate aggregate fact tables or separate prejoined aggregate tables.

Page 13: Aggregates II

Naming Conventions

Facts and dimensional attributes should receive the same name in an aggregate schema as they do in the base schema.

The name of an aggregate dimension table should describe the contents of its rows.

The names of aggregate fact tables are always problematic. The best you can do is establish a convention and stick to it.

Page 14: Aggregates II

Aggregate Dimension Design

Attributes of the aggregate dimension must be identical to those in the base dimension in name and data type. Slow change processing rules must be identical. The natural key of an aggregate dimension will be different from that of the base dimension.

Source aggregate dimensions from the base dimension, rather than from the original source system. This eliminates redundant processing and ensures uniform presentation of data values.
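A minimal sketch of sourcing an aggregate dimension from the base dimension follows. The `product` and `brand` tables, their columns, and the sample rows are assumptions for illustration. Note that the aggregate dimension keeps the attribute names and types of the base dimension, gets its own surrogate key, and uses a different natural key (`brand_code` instead of `sku`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical base dimension; sku is its natural key.
CREATE TABLE product (
    product_key  INTEGER PRIMARY KEY,
    sku          TEXT,
    product_name TEXT,
    brand_code   TEXT,
    brand_name   TEXT
);
INSERT INTO product (sku, product_name, brand_code, brand_name) VALUES
    ('A-1', 'Widget small', 'B01', 'Acme'),
    ('A-2', 'Widget large', 'B01', 'Acme'),
    ('Z-9', 'Gadget',       'B02', 'Zenith');

-- Hypothetical aggregate dimension; brand_code is its natural key.
-- Column names and types match the base dimension.
CREATE TABLE brand (
    brand_key  INTEGER PRIMARY KEY,
    brand_code TEXT,
    brand_name TEXT
);

-- Source the rollup from the base dimension, not from the source system.
INSERT INTO brand (brand_code, brand_name)
SELECT DISTINCT brand_code, brand_name
FROM product
ORDER BY brand_code;
""")

brand_rows = conn.execute(
    "SELECT brand_code, brand_name FROM brand ORDER BY brand_code"
).fetchall()
print(brand_rows)
```

This sketch ignores slow change processing; a real load would have to apply the same change rules to `brand` that the base dimension applies to its brand attributes.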

Page 15: Aggregates II

Aggregate Dimension Design

Aggregate dimension tables are often shared by multiple aggregates, and sometimes used by base fact tables. These shared dimension tables do not need to be built redundantly; the various fact tables can use the same dimension table. If the shared table is to be instantiated more than once, build it a single time and then replicate it.

The documentation for a shared dimension must enumerate all dependent fact tables, whether part of the base schema or aggregates. In some cases, frequent updates to a dimension may require updates to fact tables outside their normal load windows.

Page 16: Aggregates II

Aggregate Fact Table Design

Aggregate Facts: Names and Data Types

The aggregate fact should have the same business definition and column name as the base fact.

Unlike dimensional attributes, the aggregate fact may have a different data type than its counterpart in the base schema, because a summarized value can be larger than any single base value.

No New Facts, Including Counts

Counts cannot be accurately performed against aggregate schemas, even if all attributes are the same. All counts must be performed against the base schema.

As a general rule of thumb, the only count to be added to an aggregate should show the number of base rows summarized. If this fact is added to the aggregate, it should also appear in the base fact table with a constant value of 1. Counts of any other attribute should be directed to the base schema only.
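The rule of thumb above can be sketched as follows. The tables `order_facts` and `order_facts_by_month`, the `row_count` column, and the sample data are hypothetical, and the day and month are stored directly in the fact table only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical base fact table; row_count is the constant-1 fact.
CREATE TABLE order_facts (
    order_day     TEXT,
    order_month   TEXT,
    customer      TEXT,
    order_dollars REAL,
    row_count     INTEGER DEFAULT 1
);
INSERT INTO order_facts (order_day, order_month, customer, order_dollars) VALUES
    ('2023-01-05', '2023-01', 'alice', 10.0),
    ('2023-01-05', '2023-01', 'bob',   20.0),
    ('2023-01-20', '2023-01', 'alice', 30.0);

-- Aggregate by month: row_count now holds the number of base rows summarized.
CREATE TABLE order_facts_by_month AS
SELECT order_month,
       SUM(order_dollars) AS order_dollars,
       SUM(row_count)     AS row_count
FROM order_facts
GROUP BY order_month;
""")


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


base_rows = scalar("SELECT COUNT(*) FROM order_facts")
agg_count_star = scalar("SELECT COUNT(*) FROM order_facts_by_month")
agg_row_count = scalar("SELECT SUM(row_count) FROM order_facts_by_month")
distinct_customers = scalar("SELECT COUNT(DISTINCT customer) FROM order_facts")

print(base_rows, agg_count_star, agg_row_count, distinct_customers)
```

Here `COUNT(*)` against the aggregate counts aggregate rows, not orders, while `SUM(row_count)` gives the same answer in both tables. The distinct customer count has no equivalent in the aggregate, since the customer was summarized away, so it must go to the base schema.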

Page 17: Aggregates II

Aggregate Fact Table Design

Audit Dimension:

The audit record associated with a row in the aggregate fact table does not summarize the audit data associated with the base fact table. It describes the process by which the aggregate row was inserted or updated.

Sourcing Aggregate Fact Tables

Facts will be sourced from the base fact table and aggregated by the load process as appropriate.
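One way such a load could look is sketched below, assuming a hypothetical base star (`sales_facts`, `product`) and a brand-level aggregate (`sales_facts_by_brand`, `brand`). The load reads the base fact table, finds the aggregate dimension row through the natural key `brand_code`, and sums the facts. All table names, keys, and values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_key INTEGER PRIMARY KEY, sku TEXT, brand_code TEXT);
CREATE TABLE brand   (brand_key   INTEGER PRIMARY KEY, brand_code TEXT);
CREATE TABLE sales_facts          (product_key INTEGER, sales_dollars REAL);
CREATE TABLE sales_facts_by_brand (brand_key   INTEGER, sales_dollars REAL);

INSERT INTO product VALUES (1, 'A-1', 'B01'), (2, 'A-2', 'B01'), (3, 'Z-9', 'B02');
INSERT INTO brand   VALUES (10, 'B01'), (20, 'B02');
INSERT INTO sales_facts VALUES (1, 100.0), (2, 50.0), (3, 75.0), (1, 25.0);

-- Load the aggregate from the base fact table, not from the source system.
INSERT INTO sales_facts_by_brand (brand_key, sales_dollars)
SELECT b.brand_key, SUM(f.sales_dollars)
FROM sales_facts f
JOIN product p ON p.product_key = f.product_key
JOIN brand   b ON b.brand_code  = p.brand_code
GROUP BY b.brand_key;
""")

agg_rows = conn.execute(
    "SELECT brand_key, sales_dollars FROM sales_facts_by_brand ORDER BY brand_key"
).fetchall()
base_total = conn.execute("SELECT SUM(sales_dollars) FROM sales_facts").fetchone()[0]
agg_total = conn.execute(
    "SELECT SUM(sales_dollars) FROM sales_facts_by_brand"
).fetchone()[0]

print(agg_rows, base_total, agg_total)
```

Comparing the grand totals of the base and aggregate tables, as the last two queries do, is a simple check that the load did not drop or duplicate facts.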

Page 18: Aggregates II

Documenting the Aggregate Schema

Page 19: Aggregates II

Documenting the Aggregate Schema

Identify Schema Families

Identify Dimensional Conformance

Page 20: Aggregates II

Documenting the Aggregate Schema

Documenting Aggregate Dimension Tables

Documenting Aggregate Fact Tables

Page 21: Aggregates II

Bibliography

Christopher Adamson. Mastering Data Warehouse Aggregates: Solutions for Star Schema Performance.
