Stories from the trenches - Partitioning as a design pattern

Post on 17-Jan-2018


SQL

Kennie Nybo Pontoppidan

Stories from the trenches - Partitioning as a design pattern

Head of R&D, Rehfeld

Agenda

• About the FLIS project
• Data warehouse heroes and architecture
• Partitioning in many disguises

The FLIS Project

You are here

The FLIS project - Team
• Netcompany 75%
• Rehfeld 25%
• TDC hosting

The FLIS project - Mission
• Jointly defined KPIs
• View your own KPIs
• Benchmark with "twins"
• Access to your own data
• Common data model
• Raw data

Access to your own data?
• ~72 million kr. to get data for 4 years
• (10 million euro)

The FLIS project – DW Challenges
• Data from 30 different IT systems
• 7 subject areas
  • Citizens
  • Employees
  • Salaries
  • Absence
  • ERP
  • Budgets
  • Postings
  • Schools
  • …
• Conformity
  • Within area
  • Between areas

Data warehouse heroes

Bill

Ralph

Dan

Bill Inmon
• Inventor of the term "data warehouse"
• Oracle reference architecture
• Top down approach
• Hub and Spoke

Ralph Kimball
• "Invented" dimensional modelling
• Microsoft "reference architecture"
• Bottom up

Dan Linstedt
• Hybrid model
• Hubs, satellites and links

FLIS Data Warehouse Architecture

Technologies

Squirrel server ?

SSIS package XML

Informatica PowerCenter XML

Look up the word

par·ti·tion (pär-tĭsh′ən) n.
1. a. The act or process of dividing something into parts.
   b. The state of being so divided.
2. a. Something that divides or separates, as a wall dividing one room or cubicle from another.
   b. A wall, septum, or other separating membrane in an organism.
3. A part or section into which something has been divided.
…

Source: http://www.thefreedictionary.com/partitioning

MCM videos

Sqlskills.com

Partitioning as a design pattern

Dividing something into parts
• Files
• Database
• Table

State of being so divided
• ETL developers vs. Customer
• Developers vs. (evil?) DBAs
• Backups (what is a full backup anyway?)

Dividing something into parts - files
• Xml, csv, xls, fixed format
• 1 file
  • Data from 1 or more tables
  • Data from 1 or more municipalities
• 30 different file naming schemes

Dividing something into parts - files
• Csv
• 1 file
  • 1 table
  • Data from just 1 municipality
• 1 file naming scheme

[Diagram: Files (complex) → File-DSA mappings (complex) → DSA tables]

Dividing something into parts - files

[Diagram: Files (complex) → Preprocessor (complex) → File-DSA mappings (simple) → DSA tables]
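The preprocessor step can be sketched roughly as follows. This is a minimal illustration, not the FLIS implementation: the input shape (dicts carrying `table` and `municipality_id` routing keys) and the naming scheme `<table>__<municipality_id>.csv` are assumptions made up for the example. The point is that all the complexity of the incoming formats is absorbed in one place, so every downstream file holds one table for one municipality under one naming scheme.

```python
import csv
import io
from collections import defaultdict


def target_name(table: str, municipality_id: str) -> str:
    """Hypothetical single naming scheme: one table, one municipality per file."""
    return f"{table}__{municipality_id}.csv"


def split_rows(rows):
    """Group mixed rows into {file name: csv text}.

    Each input row is a dict with 'table' and 'municipality_id' keys plus
    the payload columns (an assumed, simplified input shape).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[target_name(row["table"], row["municipality_id"])].append(row)

    files = {}
    for name, members in groups.items():
        # Payload columns are everything except the two routing keys.
        columns = [c for c in members[0] if c not in ("table", "municipality_id")]
        buf = io.StringIO()
        writer = csv.DictWriter(
            buf, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(members)
        files[name] = buf.getvalue()
    return files


mixed = [
    {"table": "absence", "municipality_id": "101", "employee": "a1", "days": "3"},
    {"table": "absence", "municipality_id": "147", "employee": "b7", "days": "1"},
    {"table": "salary", "municipality_id": "101", "employee": "a1", "amount": "100"},
]
files = split_rows(mixed)
for name in sorted(files):
    print(name)
```

With files normalized like this, the file-to-DSA mapping only has to know one layout per table, which is what makes the mappings "simple".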

Dividing something into parts - Database
1 data warehouse layer = 1 database
• Scaling
• IO-pattern for a data flow

Dividing something into parts - Database
1 database
• 4 LUNs, RAID 10
• 4 data files

Dividing something into parts - Database
Filegroups
• PRIMARY
  • 160 MB
• BIG ONE
  • A lot of data

Dividing something into parts - Tables
Table partitioning (2005 EE feature)
• Pruning
  • Divide the data warehouse into 100 parts, each 1/100 the size
• Switching
  • Separation of readers and writers
  • Fast
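To make pruning and switching concrete, here is a toy in-memory model. It is only an analogy, assuming a table partitioned on `municipality_id` with 100 municipalities; the class and method names are invented for the example and say nothing about how SQL Server implements the feature. Note that in SQL Server a switch target must be empty, so replacing a partition is normally a switch out followed by a switch in; the toy collapses that into one call.

```python
class PartitionedTable:
    """Toy model of a table partitioned on municipality_id."""

    def __init__(self):
        self.partitions = {}  # municipality_id -> list of rows

    def switch_in(self, municipality_id, staged_rows):
        """Swap a fully loaded staging set in as the partition's content.

        The swap is a single reference assignment, so readers see either
        the old rows or the new rows, never a half-loaded partition.
        Returns the rows that were switched out.
        """
        old = self.partitions.get(municipality_id, [])
        self.partitions[municipality_id] = staged_rows
        return old

    def scan(self, municipality_id=None):
        """Return (rows, partitions_touched); a filter on the key prunes."""
        if municipality_id is not None:
            return list(self.partitions.get(municipality_id, [])), 1
        rows = [row for part in self.partitions.values() for row in part]
        return rows, len(self.partitions)


table = PartitionedTable()
for municipality_id in range(1, 101):
    staged = [{"municipality_id": municipality_id, "value": n} for n in range(10)]
    table.switch_in(municipality_id, staged)

one_rows, one_touched = table.scan(municipality_id=42)
all_rows, all_touched = table.scan()
print(one_touched, "of", all_touched, "partitions read for a single municipality")
```

A query that filters on the partitioning key reads one partition out of a hundred, which is the "1/100 the size" effect; a load that builds its rows off to the side and then swaps them in is what keeps writers out of the readers' way.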

Dividing something into parts - Tables

• DSA
  • Partition by (municipality_id, month_year)
• EDW and data marts
  • Partition by municipality_id

Ask Kristian!

CREATE TABLE [dbo].[partition_test](
    [Kommune_id] [varchar](11) NULL,
    [Kommunenummer] [nvarchar](4000) NULL,
    [Distrikt_kode] [nvarchar](4000) NULL,
    [Distrikt_type] [nvarchar](4000) NULL,
    [Distrikt_tekst] [nvarchar](4000) NULL,
    [Cpr_Dist_Tekst_TS] [nvarchar](4000) NULL
) ON [partition_test_pt_sc]([Kommune_id])
GO
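The table above is placed on a partition scheme named `partition_test_pt_sc`, which has to exist before the table is created. A minimal sketch of generating the two supporting statements is shown below. Only the scheme name comes from the slide; the partition function name (`partition_test_pt_fn`), the boundary values, and the choice of `RANGE RIGHT` with everything on one filegroup are assumptions for illustration.

```python
def partition_ddl(name, boundaries, filegroup="PRIMARY"):
    """Return (CREATE PARTITION FUNCTION, CREATE PARTITION SCHEME) as T-SQL text.

    name       : table name used as a prefix (the *_pt_fn suffix is hypothetical)
    boundaries : municipality ids used as boundary values (illustrative only)
    """
    # Quote each boundary as a varchar literal, doubling embedded quotes.
    values = ", ".join("'" + b.replace("'", "''") + "'" for b in sorted(boundaries))
    function = (
        f"CREATE PARTITION FUNCTION [{name}_pt_fn] (varchar(11))\n"
        f"AS RANGE RIGHT FOR VALUES ({values});"
    )
    scheme = (
        f"CREATE PARTITION SCHEME [{name}_pt_sc]\n"
        f"AS PARTITION [{name}_pt_fn] ALL TO ([{filegroup}]);"
    )
    return function, scheme


function_sql, scheme_sql = partition_ddl("partition_test", ["147", "101", "151"])
print(function_sql)
print(scheme_sql)
```

N boundary values give N + 1 partitions. Because the key here is a `varchar`, the real boundary ordering follows the column's collation, so the plain Python sort is only an approximation.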

Dear Mark! Please, please, please, pl

Dividing something into parts – CPUs

• Affinity masking

• Not really possible in Oracle – even on Windows

State of being so divided

State of being so divided
• Informatica PowerCenter (ETL vs. Customer)
• Meta data
  • File definitions
  • Table definitions in all layers
  • Simple transformations
• Autogenerate mapping code from meta data
• Go on – please change requirements

Mapping metadata model

[Diagram: entity-relationship model. Entities and the columns that can be recovered from the slide:]
• Mapping: MappingID (int), SourceColumnFK (int), DestinationColumnFK (int)
• Column: ColumnID (int), ColumnName (varchar)
• ColumnType (source data): ColumnTypeID (int), ColumnTypeName (varchar)
• DataType: DataTypeID (int), DataTypeName (varchar)
• StandardColumn: StandardColumnID (int), StandardColumnName (varchar)
• Table: TableID (int), TableName (varchar), DWLayerFK (int), TableTypeFK (int)
• TableType: TableTypeID (int), TableTypeName (varchar)
• DWLayer: DWLayerID (int), DWLayerName (varchar)
• Flow: FlowID (int), FlowName (varchar), FlowTypeFK (int)
• FlowType: FlowTypeID (int), FlowTypeName (varchar)
• Transformation: TransformationID (int), FlowFK (int)
• TransformationSchema: TransformationSchemaID (int), TransformationTypeID (int), TransformationSchemaName (varchar)
• TransformationType: TransformationTypeID (int), TransformationTypeName (varchar)
• Columns whose owning entity is unclear in the transcript: ColumnTypeFK (int), TableFK (int), DataTypeFK (int), DWLayerFK (int), TransformationFK (int), TransformationSchemaFK (int), Logic (varchar, appears twice), Order???
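The idea of generating mapping code from metadata can be sketched as below. The project generated Informatica PowerCenter mappings; this example deliberately swaps in plain `INSERT ... SELECT` statements as the output because they are easy to read. The `Mapping` record is a flattened, hypothetical stand-in for the Mapping, Column, Table and Transformation entities of the model, and the table and column names are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    """One source-column-to-destination-column row of (assumed) metadata."""
    source_table: str
    source_column: str
    destination_table: str
    destination_column: str
    logic: Optional[str] = None  # optional expression; "{src}" marks the source column


def generate_insert_selects(mappings):
    """Emit one INSERT ... SELECT per (source table, destination table) pair."""
    grouped = defaultdict(list)
    for m in mappings:
        grouped[(m.source_table, m.destination_table)].append(m)

    statements = []
    for (src, dst), cols in grouped.items():
        targets = ", ".join(f"[{m.destination_column}]" for m in cols)
        exprs = ", ".join(
            (m.logic or "{src}").replace("{src}", f"[{m.source_column}]") for m in cols
        )
        statements.append(f"INSERT INTO [{dst}] ({targets})\nSELECT {exprs}\nFROM [{src}];")
    return statements


mappings = [
    Mapping("dsa_absence", "kommune_id", "edw_absence", "municipality_id"),
    Mapping("dsa_absence", "dage", "edw_absence", "days", "CAST({src} AS int)"),
]
statements = generate_insert_selects(mappings)
print(statements[0])
```

When a requirement changes, only the metadata rows change and the code is regenerated, which is the reason the slide can afford to say "please change requirements".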

State of being so divided
• Developers vs. (evil) DBAs
• Meta data on table definitions
• Script DDL from meta data
• Hide partitioning in DDL
  • Partition scheme
  • Change filegroup design
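One way to read "hide partitioning in DDL" is that developers describe only columns, while the storage placement is looked up separately when the DDL is scripted. The sketch below assumes that split; the function name and the `placement` dictionary are hypothetical, and only the table, column and scheme names are taken from the earlier `partition_test` example.

```python
def create_table_ddl(table, columns, placement=None):
    """Script a CREATE TABLE statement from metadata.

    columns   : list of (column name, SQL type) pairs maintained by developers
    placement : {table: (partition scheme, partition column)}, assumed to be
                maintained by the DBAs; tables not listed go to PRIMARY
    """
    column_lines = ",\n".join(
        f"    [{name}] {sql_type} NULL" for name, sql_type in columns
    )
    scheme_and_column = (placement or {}).get(table)
    if scheme_and_column:
        scheme, column = scheme_and_column
        on_clause = f"ON [{scheme}]([{column}])"
    else:
        on_clause = "ON [PRIMARY]"
    return f"CREATE TABLE [dbo].[{table}](\n{column_lines}\n) {on_clause};"


columns = [("Kommune_id", "varchar(11)"), ("Kommunenummer", "nvarchar(4000)")]
dba_placement = {"partition_test": ("partition_test_pt_sc", "Kommune_id")}

partitioned = create_table_ddl("partition_test", columns, dba_placement)
plain = create_table_ddl("partition_test", columns)
print(partitioned)
```

Changing the partition scheme or the filegroup design then means editing the placement data and re-scripting, without touching the developers' table definitions.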

State of being so divided
• Backups: DBAs vs. Backup administrator
• Partitioning helps us divide data into
  • Hot
  • (C)old (and therefore read only)
• Only backup hot partitions
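A rough sketch of "only backup hot partitions", assuming each partition lives on its own filegroup and cold filegroups have been marked read-only. The database name, filegroup names and backup path are invented; the emitted statement uses the `FILEGROUP = ...` form of T-SQL `BACKUP DATABASE`.

```python
def backup_hot_filegroups(database, filegroups, backup_path):
    """Build a BACKUP statement covering only read-write (hot) filegroups.

    filegroups : {filegroup name: is_read_only}
    Returns None when there is nothing hot to back up.
    """
    hot = sorted(name for name, read_only in filegroups.items() if not read_only)
    if not hot:
        return None
    targets = ", ".join(f"FILEGROUP = N'{name}'" for name in hot)
    return f"BACKUP DATABASE [{database}] {targets} TO DISK = N'{backup_path}';"


filegroups = {
    "PRIMARY": False,      # always read-write
    "fg_2012": True,       # cold, read-only
    "fg_2013": False,      # hot
}
backup_sql = backup_hot_filegroups("flis_edw", filegroups, r"X:\backup\flis_edw_hot.bak")
print(backup_sql)
```

A read-only filegroup still needs one backup taken after it was made read-only; after that it can be left out of the regular schedule, which is where the saving comes from.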

State of being so divided
• Restoring: DBA vs. SQL Server
• PRIMARY filegroup
  • 150 MB
• Default filegroup
  • Big
• Restore database
  • only primary filegroup => online

State of being so divided
• DW vs OLTP and OLAP
• DW server
  • ETL
  • processing OLAP databases/cubes
• Reporting server
  • Sharepoint databases (OLTP)
  • restoring OLAP databases/cubes

We also use these cool features…
• Page compression
• Backup compression

Page compression results

table            factor (size)  num rows  num reads  cpu time (ms)  elapsed time (ms)
dg1              1              89132     372        31             739
dg1              2              178264    743        31             1517
dg1              10             891320    3781       281            7551
dg1_page_compr   1              89132     143        15             748
dg1_page_compr   2              178264    285        62             1512
dg1_page_compr   10             891320    1427       421            7596
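The effect on reads is easier to see as a percentage. The short calculation below uses only the figures from the results above; the function name is just for the example.

```python
# (table, factor, num_rows, num_reads, cpu_ms, elapsed_ms), copied from the results
results = [
    ("dg1", 1, 89132, 372, 31, 739),
    ("dg1", 2, 178264, 743, 31, 1517),
    ("dg1", 10, 891320, 3781, 281, 7551),
    ("dg1_page_compr", 1, 89132, 143, 15, 748),
    ("dg1_page_compr", 2, 178264, 285, 62, 1512),
    ("dg1_page_compr", 10, 891320, 1427, 421, 7596),
]


def read_reduction_by_factor(rows, base="dg1", compressed="dg1_page_compr"):
    """Percentage fewer reads with page compression, per size factor."""
    reads = {(table, factor): num_reads for table, factor, _, num_reads, _, _ in rows}
    factors = sorted({factor for table, factor in reads if table == base})
    return {
        factor: round(100 * (1 - reads[(compressed, factor)] / reads[(base, factor)]), 1)
        for factor in factors
    }


reduction = read_reduction_by_factor(results)
print(reduction)  # {1: 61.6, 2: 61.6, 10: 62.3}
```

Reads drop by a little over 60% at every size, while elapsed time is practically unchanged and CPU time is higher for the compressed table at factors 2 and 10.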

code   vartype  varval
DA000  DGALT    00K00
DA000  DGCAT    06M38A
DA000  DGPROP   26X01
DA001  DGALT    00K00
DA001  DGCAT    06M38A
DA001  DGPROP   26X01
DA009  DGALT    00K00
DA009  DGCAT    06M38A
DA009  DGPROP   26X01

or

Evaluation
Create a text message on your phone and send it to 1919 with the content:

DB301 5 5 5 I liked it a lot

The message fields are:
• Session code
• Speaker performance (1 to 5)
• Match of technical level (1 to 5)
• Relevance (1 to 5)
• Comments (optional)

Evaluation scale: 1 = Very bad, 2 = Bad, 3 = Relevant, 4 = Good, 5 = Very good!

Questions:
• Speaker performance
• Relevance according to your work
• Match of technical level according to published level
• Comments

© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation.  Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.  MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Thank you