ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data...
-
Upload
amber-mitchell -
Category
Documents
-
view
212 -
download
0
Transcript of ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data...
ISV Innovation
Presented by
ISV Innovation
Presented by
Business Intelligence Fundamentals: Data Loading
Ola EkdahlIT Mentors
9/12/08
ISV Innovation
Presented by
2Data Loading
Agenda
1.Dimension Table Load
2.Fact Table Load
3.Working with SCD’s
4.SSIS SCD Task
5.Load Data using Staging Tables
6.Top 10 Best Practices
ISV Innovation
Presented by
3
Data Flow Task
Encapsulates the data flow engine
Extract
Transform
Load
Data Loading
ISV Innovation
Presented by
4
Populating Dimension Tables
Y
Insert newrecord
Update changedcolumn(s)
Expire existingrecord
Transform Correlaterecords
N
N
Y
Type 2change?
Y
Type 1change?
New record?
Dimension source
Data Loading
ISV Innovation
Presented by
5
Populating Fact Tables
Y
Insert newrecord
Insert newdimension record
Lookupdimension key
N
Lookup failed?
Repeat for each dimension key
TransformFactsource
Data Loading
ISV Innovation
Presented by
6
Slowly Changing Dimension Transformation The wizard-based configuration promotes rapid ETL
development Supports
Type 0 (Fixed Attribute) Type 1 (Changing Attribute) Type 2 (Historical Attribute) Inferred member management
Automatically constructs the downstream data flow Handles the majority of slowly changing dimension
scenarios
Data Loading
ISV Innovation
Presented by
7
Populating Dimension Tables
Y
Insert newrecord
Update changedcolumn(s)
Expire existingrecord
Transform Correlaterecords
N
N
Y
Type 2change?
Y
Type 1change?
New record?
Dimension Source
Data Loading
ISV Innovation
Presented by
8Data Cleansing
ISV Innovation
Presented by
LastName update to Valdez-Smythe
SCD Type 19
Existing record is updated
History is not preserved
Data Loading
ISV Innovation
Presented by
SCD Type 210
Existing record is ‘expired’ and new record inserted
History is preserved
Most common form of Slowly Changing Dimension
SalesTerritoryKey update to 10
Data Loading
ISV Innovation
Presented by
11
Configuring the SCD Transformation Step 1 Select the target dimension table
Configure the relationship between the source data and the dimension table
The relationship is established with the business key stored in the dimension table
Data Loading
ISV Innovation
Presented by
12
Configuring the SCD Transformation Step 2 Select the columns and their change type:
Fixed (Type 0)
Changing (Type 1)
Historical (Type 2)
Data Loading
ISV Innovation
Presented by
13
Configuring the SCD Transformation Step 3 Configure the behavior if Fixed attributes change
Configure whether Changing attributes should update the current record or all matching records
Data Loading
ISV Innovation
Presented by
14
Configuring the SCD Transformation Step 4 Configure how Historical attributes identify current
and expired records: Single Boolean column, or
Start and End date columns
It is best practice to store Start and End dates
Data Loading
ISV Innovation
Presented by
15
Configuring the SCD Transformation Step 5 If inferred members are stored in the dimension
table, define how they are identified: When all columns with a change type are null, or
By a single Boolean column
Data Loading
ISV Innovation
Presented by
16
The Slowly Changing Dimension Transformation
Based on your configuration, the wizard completes the downstream data flow
Data Loading
ISV Innovation
Presented by
17Data Cleansing
ISV Innovation
Presented by
18Data Loading
Staging Tables
1. Create a staging table that matches the logical structure of the target table. Load the bulk data into this table. This loading can be much faster (compared to loading directly into the target table) because the staging table has no indexes or constraints on it. More importantly, while the new data is being loaded, the existing data is fully available for all transactions without any impact, because the data load is taking place on a separate staging table.
2. Create constraints and indexes on the staging table that are equivalent to those that exist on the target table. While you create constraints and indexes on the staging table, the existing data is fully available for all transactions without any impact, because this is taking place on the staging table.
3. Execute the ALTER TABLE … SWITCH statement to move the data from the staging table to an empty partition of the target table. This is a metadata-only operation, and is very fast (usually under a second) when there are no long transactions on the target table. This way, all the loading and indexing happens outside the main table, and then the data quickly moves into the main table. One important thing to note here is that the ALTER TABLE … SWITCH operation requires a schema modification (Sch-M) lock on the table. Long running transactions can block the switch operation from acquiring the Sch-M lock and make it wait
ISV Innovation
Presented by
19Data Loading
Partitions
In data warehouse applications, it is very common to partition data on the date dimension. This helps in periodically archiving or deleting the old data as new data comes in, with minimal effect on performance and availability. Depending on data volume, database designers pick yearly, quarterly, monthly, weekly, daily or even hourly partitions.
One way to partition a table is to create one partition for each incremental load. For example, if you load data on a daily basis, create one partition for each day. By using this partitioning approach, you can use the technique described earlier in this paper (load and index data in a staging table, and then use ALTER TABLE … SWITCH) very efficiently. One partition is added for every incremental load, and the number of partitions increases over time.
ISV Innovation
Presented by
Optimize PerformancePartitioning Query processing improvements
Partition-aware seeks
Parallel queryplan strategies
Partition-aligned indexed views Switched together
with the partition
Easy-to-switchpartitions
P1
P2
P3
New Partition
Agg(P1)
Agg(P2)
Agg(P3)
Aggregates for New Partition
Detail DataDay level Indexed View
Month, Year level
Switch new partition
Data Loading
ISV Innovation
Presented by
21Data Cleansing
ISV Innovation
Presented by
22
ISV Innovation
Presented by
Thank You
SQL Customer Advisory Team
www.sqlcat.com
Working With Partitions
http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/loading_bulk_data_partitioned_table.mspx
Data Loading
ISV Innovation
Presented by
23
ISV Innovation
Presented by
www.isvinnovation.com
More recordings available at: