APPLICATION OF SQL SERVER COLUMNSTORE INDEXES IN BI SOLUTIONS
Theme day: Modern Analytical Database Technology
28 October 2014, Aalborg University
Christian Winther Kristensen
Managing [email protected]
Agenda
• SQL Server columnstore index
• Practical case
• New updateable clustered columnstore in SQL Server 2014
• Comparison: pros and cons
• Questions
03-11-2014
SQL Server columnstore index
• Introduced in SQL Server 2012
• Shares Microsoft xVelocity columnstore technology with the Analysis Services Tabular model and PowerPivot
• Highly compressed
• Memory optimized
• Not updateable: the underlying table is read-only!
Star schema
Diagram: FactSales, DimCustomer, DimDate, DimEmployee, DimStore
FactSales (CustomerKey int, ProductKey int, EmployeeKey int, StoreKey int, OrderDateKey int, SalesAmount money)
-- note: lots of ints in fact tables
DimCustomer (CustomerKey int, FirstName nvarchar(50), LastName nvarchar(50), Birthdate date, EmailAddress nvarchar(50))
DimProduct (…
Best practice: integer keys!
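To make the read-only restriction concrete, here is a minimal sketch of adding a non-clustered columnstore index to the FactSales table. It assumes SQL Server 2012 and that dbo.FactSales already exists with the columns listed on the star schema slide; the schema name dbo and the index name NCCI_FactSales are illustrative and not taken from the slides.

```sql
-- Sketch under assumptions: dbo.FactSales exists with the columns from the
-- star schema slide. NCCI_FactSales is an illustrative name.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
ON dbo.FactSales
    (CustomerKey, ProductKey, EmployeeKey, StoreKey, OrderDateKey, SalesAmount);

-- In SQL Server 2012 the table is read-only for as long as this index exists:
-- INSERT, UPDATE and DELETE statements against dbo.FactSales are rejected
-- until the index is dropped or disabled.
```

Including every column the reporting queries touch lets a query be answered from the columnstore alone.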
How do columnstore indexes optimize performance?
• Columnstore indexes store data column-wise
– Each page stores data from a single column
• Highly compressed
– About 2x better than PAGE compression
– More data fits in memory
• Each column accessed independently
– Fetch only needed columns
– Can dramatically decrease I/O
Figure: columns C1 C2 C3 C4 … stored separately
Heaps, B-trees store data row-wise
Columnstore index architecture
• Row group
– About 1 million (at most 1,048,576) logically contiguous rows
• Column segment
– Segment contains values from one column for a set of rows
– Segments for the same set of rows comprise a row group
– Segments are compressed
– Each segment is stored in a separate LOB
– Segment is the unit of transfer between disk and memory
Figure labels: C1 C2 C3 C4 C5 C6, Segment, Row Group
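The segment layout described above can be inspected through the catalog view sys.column_store_segments. The query below is a sketch that assumes a columnstore index already exists on a table named dbo.FactSales (an illustrative name); it lists one row per column segment.

```sql
-- Sketch: list the column segments of every columnstore index on dbo.FactSales.
-- The table name is an assumption; column_id is the position of the column
-- within the columnstore index.
SELECT s.column_id,
       s.segment_id,       -- segments with the same segment_id form a row group
       s.row_count,
       s.min_data_id,      -- per-segment min/max metadata, used to skip segments
       s.max_data_id,
       s.on_disk_size      -- compressed size in bytes
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p
    ON s.hobt_id = p.hobt_id
WHERE p.object_id = OBJECT_ID(N'dbo.FactSales')
ORDER BY s.column_id, s.segment_id;
```

Comparing on_disk_size across column_id values shows which columns compress well and which do not.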
Columnstore index example
OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101107      106         01        1          6         30.00
20101107      103         04        2          1         17.00
20101107      109         04        2          2         20.00
20101107      103         03        2          1         17.00
20101107      106         05        3          4         20.00
20101108      106         02        1          5         25.00
20101108      102         02        1          1         14.00
20101108      106         03        2          5         25.00
20101108      109         01        1          1         10.00
20101109      106         04        2          4         20.00
20101109      106         04        2          5         25.00
20101109      103         01        1          1         17.00
1. Horizontally partition (Row Groups)
First row group:
OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101107      106         01        1          6         30.00
20101107      103         04        2          1         17.00
20101107      109         04        2          2         20.00
20101107      103         03        2          1         17.00
20101107      106         05        3          4         20.00
20101108      106         02        1          5         25.00

Second row group:
OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101108      102         02        1          1         14.00
20101108      106         03        2          5         25.00
20101108      109         01        1          1         10.00
20101109      106         04        2          4         20.00
20101109      106         04        2          5         25.00
20101109      103         01        1          1         17.00
2. Vertically partition via columns (segments)
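First row group, one segment per column:
OrderDateKey: 20101107, 20101107, 20101107, 20101107, 20101107, 20101108
ProductKey: 106, 103, 109, 103, 106, 106
StoreKey: 01, 04, 04, 03, 05, 02
RegionKey: 1, 2, 2, 2, 3, 1
Quantity: 6, 1, 2, 1, 4, 5
SalesAmount: 30.00, 17.00, 20.00, 17.00, 20.00, 25.00

Second row group, one segment per column:
OrderDateKey: 20101108, 20101108, 20101108, 20101109, 20101109, 20101109
ProductKey: 102, 106, 109, 106, 106, 103
StoreKey: 02, 03, 01, 04, 04, 01
RegionKey: 1, 2, 1, 2, 2, 1
Quantity: 1, 5, 1, 4, 5, 1
SalesAmount: 14.00, 25.00, 10.00, 20.00, 25.00, 17.00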
3. Compress each segment*
First row group, compressed segments:
OrderDateKey: 20101107, 20101108
ProductKey: 106, 103, 109
StoreKey: 01, 04, 03, 05, 02
RegionKey: 1, 2
Quantity: 6, 1, 2, 4, 5
SalesAmount: 30.00, 17.00, 20.00, 25.00

Second row group, compressed segments:
OrderDateKey: 20101108, 20101109
ProductKey: 102, 106, 109, 103
StoreKey: 02, 03, 01, 04
RegionKey: 1, 2
Quantity: 1, 5, 4
SalesAmount: 14.00, 25.00, 10.00, 20.00, 25.00, 17.00

Some segments will compress more than others
*Encoding and reordering not shown
4. Fetch only needed columns and row groups
Figure: the compressed segments from step 3. The query below needs only the OrderDateKey, ProductKey and SalesAmount segments, and only those of the first row group, because every OrderDateKey in the second row group is 20101108 or later.
SELECT ProductKey, SUM(SalesAmount)
FROM SalesTable
WHERE OrderDateKey < 20101108
GROUP BY ProductKey
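The four steps can be tried end to end with the twelve example rows. The script below is a sketch only: the table name SalesTable comes from the slide, while the dbo schema, the column types and the index name NCCI_SalesTable are assumptions. StoreKey is stored as an integer, so the leading zeros shown on the slides are dropped. With so few rows everything lands in a single row group, so the script shows the syntax and the result rather than row-group elimination.

```sql
-- Sketch: column types, schema and index name are assumptions.
CREATE TABLE dbo.SalesTable (
    OrderDateKey int   NOT NULL,
    ProductKey   int   NOT NULL,
    StoreKey     int   NOT NULL,
    RegionKey    int   NOT NULL,
    Quantity     int   NOT NULL,
    SalesAmount  money NOT NULL
);

-- Load first: in SQL Server 2012 the table becomes read-only once the
-- non-clustered columnstore index exists.
INSERT INTO dbo.SalesTable
    (OrderDateKey, ProductKey, StoreKey, RegionKey, Quantity, SalesAmount)
VALUES
    (20101107, 106, 1, 1, 6, 30.00),
    (20101107, 103, 4, 2, 1, 17.00),
    (20101107, 109, 4, 2, 2, 20.00),
    (20101107, 103, 3, 2, 1, 17.00),
    (20101107, 106, 5, 3, 4, 20.00),
    (20101108, 106, 2, 1, 5, 25.00),
    (20101108, 102, 2, 1, 1, 14.00),
    (20101108, 106, 3, 2, 5, 25.00),
    (20101108, 109, 1, 1, 1, 10.00),
    (20101109, 106, 4, 2, 4, 20.00),
    (20101109, 106, 4, 2, 5, 25.00),
    (20101109, 103, 1, 1, 1, 17.00);

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_SalesTable
ON dbo.SalesTable
    (OrderDateKey, ProductKey, StoreKey, RegionKey, Quantity, SalesAmount);

-- Only three of the six columns are referenced, so only their segments
-- have to be read.
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM dbo.SalesTable
WHERE OrderDateKey < 20101108
GROUP BY ProductKey
ORDER BY ProductKey;
```

For the example data the query returns three rows: product 103 with 34.00, product 106 with 50.00 and product 109 with 20.00.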
Practical case
• Scenario:
– Energy trading company migrates its BI solution to SQL Server 2012
• Problems:
– ETL flow and intermediary calculations take too long
– Loading fact tables with many indexes is slow, and the indexes consume a lot of storage
– Processing of the Analysis Services OLAP cube is slow
– End-user reporting on the relational data mart has long response times in certain scenarios
Solution 1: Optimize complex ETL calculations
Before optimization: 1 hour for 6 million rows
• Stage basic trade data: 5 min
• Do derived calculations: 50 min
• Load fact table: 5 min
After optimization: 13 min for 6 million rows
• Drop columnstore index: 0 min
• Stage basic trade data: 5 min
• Create columnstore index: 2 min
• Do derived calculations: 1 min
• Load fact table: 5 min
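The optimized flow might be sketched in T-SQL as follows. The slides do not show the actual tables or calculations, so every object name here (dbo.SourceTrade, dbo.StageTrade, dbo.StageTradeDerived, NCCI_StageTrade) and every column is hypothetical, and the aggregation merely stands in for the real derived calculations.

```sql
-- Sketch with hypothetical names; the real staging tables and calculations
-- are not shown in the slides.

-- 1. Drop the columnstore index so the staging table is writable again.
IF EXISTS (SELECT 1 FROM sys.indexes
           WHERE object_id = OBJECT_ID(N'dbo.StageTrade')
             AND name = N'NCCI_StageTrade')
    DROP INDEX NCCI_StageTrade ON dbo.StageTrade;

-- 2. Stage basic trade data (placeholder for the real load).
TRUNCATE TABLE dbo.StageTrade;
INSERT INTO dbo.StageTrade (TradeKey, TradeDateKey, Volume, Price)
SELECT TradeKey, TradeDateKey, Volume, Price
FROM dbo.SourceTrade;

-- 3. Create the columnstore index on the freshly staged data.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_StageTrade
ON dbo.StageTrade (TradeKey, TradeDateKey, Volume, Price);

-- 4. Derived calculations now scan the columnstore instead of the row store.
IF OBJECT_ID(N'dbo.StageTradeDerived', N'U') IS NOT NULL
    DROP TABLE dbo.StageTradeDerived;

SELECT TradeDateKey,
       SUM(Volume * Price) AS TradedValue
INTO dbo.StageTradeDerived
FROM dbo.StageTrade
GROUP BY TradeDateKey;
```

The index costs time to build on every run, so the pattern pays off only when the calculations that follow scan the staged data heavily, as in the case described here.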
Solution 2: Reduce fact load time and save disk space
Before optimization: 41/45 min for 20 million rows, 8 GB index space
• Drop non-clustered indexes: 1 min
• Load fact table: 25 min (45 min when not dropping the indexes)
• Create non-clustered indexes: 15 min
After optimization: 32 min for 20 million rows, 1 GB index space
• Drop columnstore index: 0 min
• Load fact table: 25 min
• Create columnstore index: 7 min
Some queries got a bit slower!
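The slides drop and recreate the columnstore index around the load. A closely related variant, shown here as an alternative rather than as the method used in the case, is to disable the index before the load and rebuild it afterwards, which keeps the index definition in place. The names dbo.FactSales, dbo.StageSales and NCCI_FactSales are assumptions.

```sql
-- Sketch with assumed names. Disable and rebuild is a variant of the
-- drop and create steps shown on the slide.

-- Make the fact table writable without losing the index definition.
ALTER INDEX NCCI_FactSales ON dbo.FactSales DISABLE;

-- Load the fact table (placeholder for the real load).
INSERT INTO dbo.FactSales
    (CustomerKey, ProductKey, EmployeeKey, StoreKey, OrderDateKey, SalesAmount)
SELECT CustomerKey, ProductKey, EmployeeKey, StoreKey, OrderDateKey, SalesAmount
FROM dbo.StageSales;

-- Rebuild the columnstore index over the full table.
ALTER INDEX NCCI_FactSales ON dbo.FactSales REBUILD;

-- Report data and index space used by the table.
EXEC sp_spaceused N'dbo.FactSales';
```

The index_size column returned by sp_spaceused is one way to compare index space before and after the change.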
Solution 3: Slow processing of OLAP cube
SSAS MOLAP cube with partitions like the fact table. 300 million rows in total. Partition switching is used for the fact table load, with an average change of 30 million rows per day.
Before optimization: 1 hour for 30 million rows
• Load switch-in table: 30 min
• Switch partition to fact table: 0 min
• Process OLAP cube: 30 min
After optimization: 55 min for 30 million rows + better performance for other queries
• Drop columnstore index: 0 min
• Load switch-in table: 30 min
• Create columnstore index: 5 min
• Switch partition to fact table: 0 min
• Process OLAP cube: 20 min
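The switch-in step could look roughly like this. All names are hypothetical: dbo.FactTrade is assumed to be partitioned on TradeDateKey by a partition function called pfTradeDate and to carry a partition-aligned columnstore index over the same columns; dbo.FactTrade_SwitchIn is assumed to have an identical structure, to sit on the same filegroup as the target partition, and to have a CHECK constraint limiting TradeDateKey to that partition's range.

```sql
-- Sketch with hypothetical names; see the assumptions in the text above.
DECLARE @DateKey int = 20141028;
DECLARE @PartitionNumber int = $PARTITION.pfTradeDate(@DateKey);

-- The switch-in table has been loaded at this point (load not shown).
-- Build the columnstore index on the switch-in table so that it matches
-- the index on the fact table.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactTrade_SwitchIn
ON dbo.FactTrade_SwitchIn (TradeDateKey, CounterpartyKey, Volume, Price);

-- Metadata-only operation: the target partition must be empty, so existing
-- rows for that range have to be switched out or removed beforehand.
ALTER TABLE dbo.FactTrade_SwitchIn
    SWITCH TO dbo.FactTrade PARTITION @PartitionNumber;
```

Because only the switch-in table is rebuilt, the fact table itself stays online and keeps its columnstore index during the load.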
Solution 3: Slow processing of OLAP cube (continued)
• Only a small time saving on cube processing…
• But what if the storage mode were changed from MOLAP to ROLAP or HOLAP?
• Small experiment
– Some OLAP queries got slower
– Processing got a lot faster, especially ROLAP due to no aggregations
– Saved OLAP storage space
Solution 4: Reduce reporting query time
Before optimization: 210 seconds for a star schema join and aggregation
• Add columnstore index to fact table in ETL
After optimization: 10 seconds for the same query
21x faster!
Columnstore in SQL 2014
• New: clustered columnstore
– Dependency on conventional B-tree structures has been removed
– Potential for significant disk space savings if the workload is satisfied without conventional indexes
• Note: non-clustered columnstore is still supported and is still a read-only structure
– Required if:
  • Constraints are required
  • Workload requires B-tree non-clustered indexes
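A minimal sketch of the new clustered columnstore index follows. It assumes SQL Server 2014 and that dbo.FactSales is a heap with no other indexes and no primary key, unique or foreign key constraints; the index name CCI_FactSales and the inserted values are illustrative.

```sql
-- Sketch: assumes SQL Server 2014 and a heap without other indexes or
-- key constraints. CCI_FactSales is an illustrative name.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
ON dbo.FactSales;

-- No column list is given: a clustered columnstore index always contains
-- all columns, because it is the table's storage.

-- Unlike the non-clustered variant, the table stays writable.
INSERT INTO dbo.FactSales
    (CustomerKey, ProductKey, EmployeeKey, StoreKey, OrderDateKey, SalesAmount)
VALUES (1, 106, 1, 1, 20101110, 30.00);
```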
Columnstore in SQL 2014
• Fully read/write
– Less complicated ETL
– But partition switching and BULK INSERT remain best practices
• Data type support expanded:
– All data types except (n)varchar(max), varbinary(max), XML, spatial and CLR types (BLOB data types)
Columnstore in SQL 2014
• “Batch mode” query plans improved
– New support for:
  • All joins (including OUTER, HASH and SEMI (NOT IN, IN))
  • UNION ALL
  • Scalar aggregates
  • “Mixed mode” plans
Columnstore in SQL 2014: Inserting and updating data
• Bulk insert
– Creates row groups of 1 million rows; the last row group is probably not full
– But a batch of fewer than about 100K rows (102,400) is left in the row store (the delta store)
• Insert/Update
– Collects rows in the row store
• Tuple mover
– When the row store reaches 1 million rows, it is converted to a columnstore row group
– Runs every 5 minutes by default
– Can be started explicitly with ALTER INDEX <name> ON <table> REORGANIZE
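The state of the row groups, including rows still waiting in the row store, can be checked through the catalog view sys.column_store_row_groups, which is available from SQL Server 2014. The sketch below assumes a clustered columnstore index named CCI_FactSales on dbo.FactSales; both names are illustrative.

```sql
-- Sketch with assumed names. OPEN and CLOSED row groups are still in the
-- row store; COMPRESSED row groups are in columnstore format.
SELECT row_group_id,
       state_description,
       total_rows,
       deleted_rows,
       size_in_bytes
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID(N'dbo.FactSales')
ORDER BY row_group_id;

-- Ask the tuple mover to compress CLOSED row groups now instead of waiting
-- for its next background run.
ALTER INDEX CCI_FactSales ON dbo.FactSales REORGANIZE;
```

Running the SELECT before and after the REORGANIZE shows CLOSED row groups changing to COMPRESSED.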
Comparison: Pros and cons

Non-clustered columnstore
Pros:
• Fastest for queries
• Allows other row-based indexes
Cons:
• Not updateable
• Uses more storage
• More complex ETL design

Clustered columnstore
Pros:
• Allows updating the table
• Easier ETL design
• Faster load
• Minimal storage usage
Cons:
• No unique or key constraints!
• No non-clustered indexes
• Requires periodic index maintenance
Questions