Dimensions Facts

  • 7/27/2019 Dimensions Facts

    Dimensions:

    Dimension tables store the descriptive attributes of the business's dimensions. The best attributes are textual and discrete, and they are used to constrain queries against the fact table. Each textual description helps describe a member of its dimension.

    They are the entry points into the fact table, and together they determine its grain.

    They serve as the primary source of query constraints, groupings, and report labels (row headers).

    They are relatively shallow in terms of rows but wide, with many large columns.

    They are not usually time-dependent.

    They often contain hierarchical relationships (e.g., day, month, and year in a Time dimension).

    Robust dimension attributes deliver analytic slicing and dicing capabilities.

    Dimension tables are denormalized.

    Examples of dimensions: Employee, Time, Product, Customer, etc.
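    As a minimal illustration of how dimension attributes drive slicing and dicing, the Python sketch below assumes a hypothetical denormalized product dimension (product, brand, and category flattened into one table) and a few hypothetical fact rows. One attribute supplies the constraint, another supplies the row headers.

```python
# Hypothetical denormalized product dimension: the product -> brand -> category
# hierarchy is flattened into columns of one table instead of separate normalized tables.
product_dim = {
    10: {"product_name": "Widget", "brand": "Acme", "category": "Tools"},
    11: {"product_name": "Gadget", "brand": "Acme", "category": "Tools"},
    12: {"product_name": "Snack Bar", "brand": "Yum", "category": "Food"},
}

# Hypothetical fact rows that reference the dimension only by its key.
sales_fact = [
    {"product_key": 10, "amount": 40.0},
    {"product_key": 11, "amount": 15.0},
    {"product_key": 12, "amount": 5.0},
    {"product_key": 10, "amount": 20.0},
]

# Dimension attributes supply both the constraint (category == "Tools")
# and the grouping / row header (product_name) of the report.
report = {}
for row in sales_fact:
    attrs = product_dim[row["product_key"]]
    if attrs["category"] == "Tools":
        name = attrs["product_name"]
        report[name] = report.get(name, 0.0) + row["amount"]

print(report)  # {'Widget': 60.0, 'Gadget': 15.0}
```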

    Dimension Keys:

    Dimensional modeling recommends that dimension keys be surrogate keys. Surrogate keys are integers assigned sequentially as needed to populate a dimension.

    They are also known as meaningless keys, integer keys, artificial keys, synthetic keys, etc.

    Every join between dimension tables and fact tables in a data warehouse environment should be based on surrogate keys, not natural keys. The primary benefit of surrogate keys is that they buffer the data warehouse environment from operational changes. They also avoid the performance penalty of joining on composite natural keys.

    Avoid using smart keys, natural keys, or production keys as dimension keys.

    Keys from which you can tell something about the record just by looking at the key are called smart keys. With surrogate keys, the data warehouse team maintains control over the environment without being affected by the operational rules for generating, updating, deleting, recycling, and reusing production keys. Examples: multiple sources using the same keys, production systems reusing the same values after a data purge, and systems with differently formatted keys being added at a later stage.
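    A minimal sketch of surrogate key assignment, assuming a hypothetical `SurrogateKeyMap` helper that hands out sequential integers per (source system, natural key) pair. Qualifying by source is one simple way to handle the "multiple sources using the same keys" case; a real ETL process would also need rules for matching the same entity across sources.

```python
from itertools import count

class SurrogateKeyMap:
    """Hypothetical helper: maps (source_system, natural_key) to a sequential integer."""

    def __init__(self):
        self._next = count(1)  # sequential integers starting at 1
        self._keys = {}

    def key_for(self, source, natural_key):
        # Qualify by source so two systems reusing the same value get different surrogates.
        k = (source, natural_key)
        if k not in self._keys:
            self._keys[k] = next(self._next)
        return self._keys[k]

keys = SurrogateKeyMap()
a = keys.key_for("crm", "C-1001")
b = keys.key_for("billing", "C-1001")  # same natural value, different source
c = keys.key_for("crm", "C-1001")      # repeat lookup returns the existing key
print(a, b, c)  # 1 2 1
```

    Fact rows would store these integers, so a production system renumbering or recycling "C-1001" does not disturb existing joins.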

    Slowly Changing Dimensions (SCD):

    In the real world, dimensions and their descriptions, though relatively constant, evolve over time: employees come and go, they are promoted, salaries change, etc. The term slowly changing dimensions refers to this variation in dimensional attributes over time. The word "slowly" might seem inaccurate in this context, but in general, compared with a measure in a fact table, changes to dimensional data occur slowly.

    We need a strategy to deal with these changed attributes over time. When we encounter a slowly changing dimension, we face one of the following three fundamental choices. Each choice results in a different degree of tracking changes over time.

    Type One (Overwriting History): A Type One change overwrites an existing dimensional attribute with new information. For example, if a customer changes their name, the new name overwrites the old name, and the old value is lost. A Type One change updates only the attribute, doesn't insert new records, and affects no keys. It is easy to implement but does not maintain any history of prior attribute values.
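    A minimal Type One sketch in Python, assuming a hypothetical in-memory customer dimension keyed by surrogate key. The update is a plain overwrite: the row count and the key stay the same, and the old name is gone.

```python
# Hypothetical customer dimension, keyed by surrogate key.
customer_dim = {
    1: {"customer_id": "C-1001", "name": "Jane Smith", "city": "Austin"},
}

def scd_type1_update(dim, surrogate_key, attribute, new_value):
    """Type 1: overwrite the attribute in place; no new row, no new key, no history."""
    dim[surrogate_key][attribute] = new_value

scd_type1_update(customer_dim, 1, "name", "Jane Doe")
print(customer_dim[1]["name"], len(customer_dim))  # Jane Doe 1
```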

    Type Two (Preserving History): A Type Two change creates an additional dimension record at the time of the change with the new attribute values, thereby segmenting history accurately between the old description and the new description. Implementing Type Two changes within a data warehouse might require significant analysis and development. Type Two changes partition history across time more accurately than the other types. However, because Type Two changes add records, they can significantly increase the database's size.
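    A minimal Type Two sketch, assuming a hypothetical customer dimension that carries `effective_date`, `expiry_date`, and `is_current` columns (a common way to delimit versions, not something the text prescribes). The current row is expired and a new row with a new surrogate key is appended.

```python
from datetime import date

# Hypothetical Type 2 customer dimension: one row per version of each customer.
customer_dim = [
    {"customer_key": 1, "customer_id": "C-1001", "territory": "West",
     "effective_date": date(2018, 1, 1), "expiry_date": None, "is_current": True},
]

def scd_type2_update(dim, customer_id, attribute, new_value, change_date):
    """Type 2: expire the current row and append a new row with a new surrogate key."""
    current = next(r for r in dim if r["customer_id"] == customer_id and r["is_current"])
    if current[attribute] == new_value:
        return current["customer_key"]  # no change, so no new version
    current["expiry_date"] = change_date
    current["is_current"] = False
    new_row = dict(current)
    new_row.update({
        # Simplification: next key = max + 1; a real ETL would use a key sequence.
        "customer_key": max(r["customer_key"] for r in dim) + 1,
        attribute: new_value,
        "effective_date": change_date,
        "expiry_date": None,
        "is_current": True,
    })
    dim.append(new_row)
    return new_row["customer_key"]

new_key = scd_type2_update(customer_dim, "C-1001", "territory", "East", date(2019, 7, 1))
for row in customer_dim:
    print(row["customer_key"], row["territory"], row["effective_date"],
          row["expiry_date"], row["is_current"])
# 1 West 2018-01-01 2019-07-01 False
# 2 East 2019-07-01 None True
```

    Fact rows loaded before the change keep pointing at key 1 and new ones get key 2, which is how history stays partitioned; it is also why the dimension grows with every change.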

    Type Three (Preserving a Version of History): A Type Three change creates new "current" fields within the original dimension row to record the new attribute values while also keeping the original attribute values. This makes it possible to describe history both forward and backward from the change, in terms of either the original or the current attribute values. You usually implement Type Three changes only if you have a limited need to preserve and accurately describe history, such as when someone gets married and you need to retain the previous name.
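    A minimal Type Three sketch, assuming a hypothetical employee dimension with `original_last_name` and `current_last_name` columns. Only the "current" column is overwritten, so exactly one earlier version survives.

```python
# Hypothetical Type 3 layout: each row has an "original" and a "current" column
# for the tracked attribute, so only one prior version is retained.
employee_dim = {
    7: {"employee_id": "E-42", "original_last_name": "Smith", "current_last_name": "Smith"},
}

def scd_type3_update(dim, surrogate_key, new_last_name):
    """Type 3: overwrite only the current column; the original column is left untouched."""
    dim[surrogate_key]["current_last_name"] = new_last_name

scd_type3_update(employee_dim, 7, "Jones")
row = employee_dim[7]
print(row["original_last_name"], row["current_last_name"])  # Smith Jones
```

    A second change would overwrite "Jones", so any intermediate values are lost; that is the limited history Type Three is meant for.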

    Hybrid Type: As an alternative, you can mix Type One and Type Two changes at the attribute level by applying Type Two changes only to attributes whose historical values are important when you're slicing and dicing. For example, users might not need to know an individual's previous name after a name change; they might want the system to show only the person's current name, so a Type One change would suffice. However, if the company reassigns sales territories, users might need to track who sold what, at what time, and in what territory, necessitating a Type Two change.
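    A minimal hybrid sketch, assuming a hypothetical `TYPE2_ATTRIBUTES` policy set: territory changes create a new version (Type Two), while name changes are overwritten in every version of the employee (Type One), so reports show only the current name.

```python
from datetime import date

# Hypothetical policy: territory changes are tracked as Type 2; everything else is Type 1.
TYPE2_ATTRIBUTES = {"territory"}

employee_dim = [
    {"employee_key": 1, "employee_id": "E-42", "name": "Jane Smith", "territory": "West",
     "effective_date": date(2018, 1, 1), "is_current": True},
]

def apply_change(dim, employee_id, attribute, new_value, change_date):
    current = next(r for r in dim if r["employee_id"] == employee_id and r["is_current"])
    if attribute in TYPE2_ATTRIBUTES:
        # Type 2: close the current version and add a new one with a new surrogate key.
        current["is_current"] = False
        dim.append(dict(current, **{
            attribute: new_value,
            "employee_key": max(r["employee_key"] for r in dim) + 1,
            "effective_date": change_date,
            "is_current": True,
        }))
    else:
        # Type 1: overwrite in every version (change_date is not needed here).
        for r in dim:
            if r["employee_id"] == employee_id:
                r[attribute] = new_value

apply_change(employee_dim, "E-42", "name", "Jane Doe", date(2019, 3, 1))
apply_change(employee_dim, "E-42", "territory", "East", date(2019, 7, 1))
for r in employee_dim:
    print(r["employee_key"], r["name"], r["territory"], r["is_current"])
# 1 Jane Doe West False
# 2 Jane Doe East True
```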

    Rapid Changing Dimensions (RCD):

    In rapidly changing dimensions, attribute values change frequently over time. There is no fixed yardstick for telling whether a dimension is slowly or rapidly changing; this is left to the judgment of the data modeler. Also, an SCD may become an RCD over time, or vice versa. For RCDs, the design depends on the size of the dimension:

    Small dimensions: The same techniques as for slowly changing dimensions may be applied.

    Large dimensions: The best approach for efficiently browsing and tracking changes to key attributes in really huge dimensions is to break off one or more mini-dimensions from the dimension table, each consisting of small clumps of attributes that have been constrained (for example, by banding) to a limited number of values.
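    A minimal mini-dimension sketch, assuming two hypothetical volatile customer attributes (age and income) banded into a few ranges. The mini-dimension holds one row per band combination, and each fact row carries both the customer key and the demographics key in effect at the time.

```python
from itertools import product

# Hypothetical bands for two volatile customer attributes.
AGE_BANDS = ["<25", "25-44", "45-64", "65+"]
INCOME_BANDS = ["<50k", "50k-100k", ">100k"]

def age_band(age):
    if age < 25:
        return "<25"
    if age < 45:
        return "25-44"
    if age < 65:
        return "45-64"
    return "65+"

def income_band(income):
    if income < 50_000:
        return "<50k"
    if income <= 100_000:
        return "50k-100k"
    return ">100k"

# Mini-dimension: one row per combination of banded values (4 * 3 = 12 rows),
# no matter how many customers there are or how often their values change.
demographics_key = {
    combo: key for key, combo in enumerate(product(AGE_BANDS, INCOME_BANDS), start=1)
}

def lookup_demographics_key(age, income):
    return demographics_key[(age_band(age), income_band(income))]

# A fact row references the stable customer row and the demographics profile in effect.
fact_row = {"customer_key": 1, "demographics_key": lookup_demographics_key(30, 75_000), "amount": 120.0}
print(len(demographics_key), fact_row["demographics_key"])  # 12 5
```

    When a customer moves into a new income band, only the `demographics_key` on new fact rows changes; the large customer dimension does not gain a new Type Two row.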

    Degenerate Dimensions:

    A degenerate dimension is represented by a dimension key attribute with no corresponding dimension table. Degenerate dimensions usually occur in line-item-oriented fact table designs.

    Many dimensional designs revolve around some kind of control document, such as an order, an invoice, a bill of lading, or a ticket. These control documents are usually containers with one or more line items inside. A very natural grain for a fact table in these cases is the individual line item; in other words, each fact table record is a line item.

    The attributes of the order (who bought what, and when) naturally move into the chosen dimensions, e.g., Product, Customer, Time, etc.

    At the end of the design, the order number is left by itself, without any attributes. We call this a degenerate dimension. The degenerate dimension key should be the actual production order number and should sit in the fact table without a join to anything. There is no point in making a dimension table, because it would not contain anything.
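    A minimal sketch of a degenerate dimension, assuming hypothetical line-item fact rows. The `order_number` column lives directly in the fact table with no Order dimension behind it, yet it is still enough to regroup line items into their orders.

```python
from collections import defaultdict

# Hypothetical line-item fact rows: order_number is a degenerate dimension,
# stored in the fact table itself with no Order dimension table to join to.
sales_fact = [
    {"date_key": 20190727, "product_key": 10, "customer_key": 1, "order_number": "SO-5001", "quantity": 2, "amount": 40.0},
    {"date_key": 20190727, "product_key": 11, "customer_key": 1, "order_number": "SO-5001", "quantity": 1, "amount": 15.0},
    {"date_key": 20190727, "product_key": 10, "customer_key": 2, "order_number": "SO-5002", "quantity": 5, "amount": 100.0},
]

# Group line items back into orders using only the degenerate key.
order_totals = defaultdict(float)
for row in sales_fact:
    order_totals[row["order_number"]] += row["amount"]

print(dict(order_totals))  # {'SO-5001': 55.0, 'SO-5002': 100.0}
```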

    Junk Dimensions:

    A junk dimension is a convenient grouping of typically low-cardinality flags and indicators. By creating an abstract dimension, we remove the flags from the fact table while placing them into a useful dimensional framework.

    Sometimes, after all the dimensions have been carved out, some flags or text attributes are left over in the fact table that do not belong to any of the dimension tables. When a number of miscellaneous flags and text attributes exist, the following design alternatives should be avoided:

    Leaving the flags and attributes unchanged in the fact table record.

    Making each flag and attribute into its own separate dimension.

    Stripping out all of these flags and attributes from the design.

    A better alternative is to create a junk dimension.
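    A minimal junk dimension sketch, assuming three hypothetical leftover flags (payment type, gift wrap, sales channel). Every combination of values becomes one row, and the ETL process replaces the three raw flags on each fact row with a single `junk_key`.

```python
from itertools import product

# Hypothetical low-cardinality flags left over after the main dimensions were designed.
PAYMENT_TYPES = ["cash", "credit"]
GIFT_WRAP = ["Y", "N"]
CHANNELS = ["web", "store", "phone"]

# Junk dimension: one row per combination of flag values (2 * 2 * 3 = 12 rows).
junk_dim = {
    key: {"payment_type": p, "gift_wrap": g, "channel": c}
    for key, (p, g, c) in enumerate(product(PAYMENT_TYPES, GIFT_WRAP, CHANNELS), start=1)
}

# Reverse lookup used by the ETL to replace the raw flags with a single key.
junk_key = {tuple(row.values()): key for key, row in junk_dim.items()}

# A fact row now carries one junk_key instead of three separate flag columns.
fact_row = {"product_key": 10, "junk_key": junk_key[("credit", "N", "web")], "amount": 25.0}
print(len(junk_dim), fact_row["junk_key"])  # 12 10
```

    Pre-building the full cross product works while it stays small; with many flags, a common alternative is to insert only the combinations that actually occur in the source data.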

    Conformed Dimensions:

    Conformed dimensions can be used to analyze facts from two or more data marts. For example, shipping and sales data marts both require a customer dimension and a time dimension. If they're the same dimensions, then you have conformed dimensions, allowing you to extract and manipulate facts relating to a particular customer from both marts and answer questions such as whether late shipments have affected sales to that customer.

    If you add a marketing data mart to analyze product promotions, with conformed customer and time dimensions, you're able to analyze the effects of a particular product promotion on sales. (Analyzing facts from more than one fact table in this way is termed drilling across.)

    The same conformed dimensions (in this case, the time and customer dimensions) have meaning in the context of three independently developed data marts. These dimensions become enterprise property and can be reused in other marts as the enterprise data warehouse evolves.

    Conformed dimensions have consistent definitions regardless of where they are used. This allows a single query to be run across multiple tables, data marts, and data warehouses.
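    A minimal drill-across sketch, assuming hypothetical shipments and sales fact rows that share the same conformed `customer_key`. Each fact table is aggregated separately and the results are then aligned on the conformed key; joining the two fact tables row by row would instead multiply rows and inflate the totals.

```python
from collections import defaultdict

# Hypothetical fact rows from two marts that share the same (conformed) customer_key.
shipments_fact = [
    {"customer_key": 1, "days_late": 3},
    {"customer_key": 1, "days_late": 0},
    {"customer_key": 2, "days_late": 0},
]
sales_fact = [
    {"customer_key": 1, "amount": 200.0},
    {"customer_key": 2, "amount": 500.0},
    {"customer_key": 2, "amount": 150.0},
]

def summarize(rows, measure, group_key="customer_key"):
    """Sum one measure per value of the conformed key."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[group_key]] += r[measure]
    return totals

# Drill across: aggregate each fact table on its own, then merge on the conformed key.
late_days = summarize(shipments_fact, "days_late")
sales = summarize(sales_fact, "amount")
report = {c: {"late_days": late_days.get(c, 0.0), "sales": sales.get(c, 0.0)}
          for c in sorted(set(late_days) | set(sales))}
print(report)  # {1: {'late_days': 3.0, 'sales': 200.0}, 2: {'late_days': 0.0, 'sales': 650.0}}
```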

    Facts:

    The fact table is at the center of a star schema and holds the primary measurement data. It contains the actual numerical measurements that the business is interested in.

    Fact tables express the many-to-many relationships between dimensions.

    A fact table typically has two types of columns: those that contain measures and those that are foreign keys to dimension tables. Some key features of a fact table are:

    Multipart key, i.e., a composite key with one foreign key for each dimension. Time is almost always part of the key.

    Usually numeric: the keys are surrogate integers and the measures are numeric.

    Typically additive.
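    A minimal star schema sketch using Python's built-in `sqlite3` module. All table and column names are hypothetical, and the composite primary key assumes a grain of one row per product per store per day. The query joins the fact table to one dimension, groups by a dimension attribute, and sums the measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical dimension tables with integer surrogate keys
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT);

-- Fact table: one foreign key per dimension (together the composite key) plus numeric measures.
-- SQLite only enforces REFERENCES when PRAGMA foreign_keys = ON.
CREATE TABLE sales_fact (
    date_key      INTEGER NOT NULL REFERENCES date_dim (date_key),
    product_key   INTEGER NOT NULL REFERENCES product_dim (product_key),
    store_key     INTEGER NOT NULL REFERENCES store_dim (store_key),
    quantity_sold INTEGER NOT NULL,
    sales_amount  REAL    NOT NULL,
    PRIMARY KEY (date_key, product_key, store_key)
);

INSERT INTO date_dim    VALUES (20190727, '2019-07-27');
INSERT INTO product_dim VALUES (10, 'Widget'), (11, 'Gadget');
INSERT INTO store_dim   VALUES (1, 'Downtown'), (2, 'Airport');
INSERT INTO sales_fact  VALUES (20190727, 10, 1, 2, 40.0),
                               (20190727, 10, 2, 1, 20.0),
                               (20190727, 11, 1, 3, 45.0);
""")

# Star join: group through the dimension, aggregate the fact measures.
rows = conn.execute("""
    SELECT p.product_name, SUM(f.quantity_sold), SUM(f.sales_amount)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(rows)  # [('Gadget', 3, 45.0), ('Widget', 3, 60.0)]
```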

    Granularity refers to the level of detail of the data in the fact table. The lowest level of granularity is referred to as atomic data. The granularity is determined by the grain, which is the meaning of a single record in the fact table. The granularity also determines how far you can drill down without returning to the base transaction-system data. The lower the grain, the more records the fact table will contain. We must make sure that the grain is low enough to support our decision-support needs.
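    A minimal sketch of the grain trade-off, assuming hypothetical atomic rows (one per transaction line). Rolling them up to one row per store per day yields fewer rows, but a question such as "Widget sales at Downtown" can no longer be answered from the rolled-up table.

```python
from collections import defaultdict

# Hypothetical atomic fact rows: grain = one row per sales transaction line.
atomic = [
    {"date": "2019-07-27", "store": "Downtown", "product": "Widget", "amount": 40.0},
    {"date": "2019-07-27", "store": "Downtown", "product": "Gadget", "amount": 15.0},
    {"date": "2019-07-27", "store": "Airport",  "product": "Widget", "amount": 20.0},
    {"date": "2019-07-28", "store": "Downtown", "product": "Widget", "amount": 60.0},
]

# Coarser grain: one row per store per day. Fewer rows, but product detail is gone.
daily_store = defaultdict(float)
for r in atomic:
    daily_store[(r["date"], r["store"])] += r["amount"]

print(len(atomic), len(daily_store))  # 4 3
```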

    Fact Types

    Additive Facts

    Additive facts are measurements in a fact table that can be added across all dimensions, e.g., discrete numerical measures of activity such as quantity sold and sales dollars.
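    A minimal check of additivity, assuming hypothetical fact rows with a `sales_dollars` measure: rolling up along any one dimension and then summing always reproduces the same grand total.

```python
# Hypothetical fact rows at product x store x day grain.
facts = [
    {"day": "Mon", "store": "A", "product": "Widget", "sales_dollars": 40.0},
    {"day": "Mon", "store": "B", "product": "Widget", "sales_dollars": 20.0},
    {"day": "Tue", "store": "A", "product": "Gadget", "sales_dollars": 15.0},
]

def total_by(rows, dim):
    """Sum sales_dollars per value of one dimension."""
    out = {}
    for r in rows:
        out[r[dim]] = out.get(r[dim], 0.0) + r["sales_dollars"]
    return out

# Additive: rolling up along ANY dimension and then summing gives the same grand total.
grand_totals = {dim: sum(total_by(facts, dim).values()) for dim in ("day", "store", "product")}
print(grand_totals)  # {'day': 75.0, 'store': 75.0, 'product': 75.0}
```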

    Semi-Additive Facts

    Numeric facts that can be added across some dimensions in a fact table but not across others. For example, inventory levels and account balances cannot be added along the time dimension, but they can be usefully averaged over it.
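    A minimal semi-additive sketch, assuming hypothetical month-end balance snapshots. Balances add up across accounts within a month, but summing the same money across months would double-count it, so the time dimension is averaged instead.

```python
# Hypothetical month-end balance snapshots: one row per account per month.
balances = [
    {"month": "2019-05", "account": "A", "balance": 100.0},
    {"month": "2019-05", "account": "B", "balance": 300.0},
    {"month": "2019-06", "account": "A", "balance": 200.0},
    {"month": "2019-06", "account": "B", "balance": 300.0},
]

# Across accounts (same month), balances add up: total held at each month-end.
total_per_month = {}
for r in balances:
    total_per_month[r["month"]] = total_per_month.get(r["month"], 0.0) + r["balance"]

# Across time, a plain sum (900.0) would be meaningless; average over the months instead.
average_total_balance = sum(total_per_month.values()) / len(total_per_month)
print(total_per_month, average_total_balance)  # {'2019-05': 400.0, '2019-06': 500.0} 450.0
```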

    Non-Additive Facts

    Facts that cannot logically be added between rows, e.g., a measurement of room temperature. A numeric non-additive fact usually must be combined in a computation with other facts rather than being added directly across rows. A non-numeric one can only be used in constraints, counts, or groupings.
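    A minimal sketch of handling a numeric non-additive fact. It swaps in a margin percentage (a ratio) rather than the temperature example, because a ratio shows the "combine with other facts" step clearly: the additive components (hypothetical `revenue` and `cost`) are summed first, and the ratio is computed afterwards.

```python
# Hypothetical rows: margin % is a non-additive ratio, revenue and cost are additive.
rows = [
    {"store": "A", "revenue": 100.0, "cost": 50.0},   # margin 50%
    {"store": "B", "revenue": 900.0, "cost": 810.0},  # margin 10%
]

def margin_pct(revenue, cost):
    return 100.0 * (revenue - cost) / revenue

# Wrong: averaging the per-row ratios ignores how much revenue each row represents.
naive_avg = sum(margin_pct(r["revenue"], r["cost"]) for r in rows) / len(rows)
# Right: add the additive components first, then compute the ratio.
correct = margin_pct(sum(r["revenue"] for r in rows), sum(r["cost"] for r in rows))
print(naive_avg, correct)  # 30.0 14.0
```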

    Factless Fact Table

    A fact table that has no facts but captures certain many-to-many relationships between the dimension keys. It is most often used to represent events or to provide coverage information that does not appear in other fact tables.

    Examples:

    1. Tracking student attendance at a college.

    2. A promotion coverage fact table, which answers questions such as "Which products were on promotion but didn't sell?" that the sales fact table cannot answer on its own, because it only records products that did sell.
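    A minimal sketch of the promotion coverage example, assuming a hypothetical coverage table that holds only dimension keys (date, store, product, promotion) and a hypothetical sales fact table. Products that were on promotion but did not sell are the coverage rows with no matching sales row.

```python
# Hypothetical coverage table: which products were on promotion in which store on which day.
# It holds only dimension keys and no measures, i.e., a factless fact table.
promotion_coverage = {
    ("2019-07-27", "Downtown", "Widget", "SUMMER"),
    ("2019-07-27", "Downtown", "Gadget", "SUMMER"),
    ("2019-07-27", "Airport", "Widget", "SUMMER"),
}

# Sales fact rows exist only where something actually sold.
sales_fact = [
    {"date": "2019-07-27", "store": "Downtown", "product": "Widget", "quantity": 3},
    {"date": "2019-07-27", "store": "Airport", "product": "Widget", "quantity": 1},
]

sold = {(r["date"], r["store"], r["product"]) for r in sales_fact}
# On promotion but did not sell = coverage minus sales, matched on date, store, product.
promoted_not_sold = sorted(
    (d, s, p) for d, s, p, _promo in promotion_coverage if (d, s, p) not in sold
)
print(promoted_not_sold)  # [('2019-07-27', 'Downtown', 'Gadget')]
```

    The attendance example works the same way: each row records that a student attended a class on a date, and questions are answered by counting rows.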