Post on 28-Nov-2014
An Introduction to Column Store Indexes and Batch Mode
DBA Level 300
About me
An independent SQL consultant.
A user of SQL Server from version 2000 onwards, with 12+ years of experience. I have a passion for understanding how the database engine works at a deep level.
A Brief History Of Column Store Technology
The lineage of column store databases can be traced back to the MonetDB and VectorWise projects from Holland, developed around the turn of the millennium.
Storage is column oriented.
Column store technology aims to exploit modern CPU architectures.
Virtually all database vendors now have a column store database offering.
Many people predict a future where all OLAP workloads will be serviced by column oriented databases.
Colour column: Red, Red, Blue, Blue, Green, Green, Green

Dictionary
Lookup ID   Label
1           Red
2           Blue
3           Green

Segment
Lookup ID   Run Length
1           2
2           2
3           3
Compressing data going down the column using run length compression.
Global and local dictionaries are used to store compression metadata.
Column Store Compression Schemes
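The Colour example above can be reproduced with a short Python sketch. It is only an illustration of dictionary encoding followed by run length encoding, not SQL Server's actual on-disk format; the function names are hypothetical and lookup IDs are assumed to be handed out in order of first appearance.

```python
from itertools import groupby


def dictionary_encode(values):
    """Give each distinct value a lookup ID (starting at 1) in order of first appearance."""
    dictionary = {}
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary) + 1
    ids = [dictionary[value] for value in values]
    return dictionary, ids


def run_length_encode(ids):
    """Collapse consecutive repeats into (lookup_id, run_length) pairs."""
    return [(lookup_id, sum(1 for _ in run)) for lookup_id, run in groupby(ids)]


def decode(dictionary, segment):
    """Rebuild the original column from the dictionary and the run length pairs."""
    labels = {lookup_id: label for label, lookup_id in dictionary.items()}
    return [labels[lookup_id] for lookup_id, run_length in segment for _ in range(run_length)]


colour = ["Red", "Red", "Blue", "Blue", "Green", "Green", "Green"]
dictionary, ids = dictionary_encode(colour)
segment = run_length_encode(ids)

print(dictionary)  # lookup IDs per label
print(segment)     # (lookup ID, run length) pairs
```

Seven values shrink to three dictionary entries and three run length pairs, and `decode(dictionary, segment)` returns the original column. The longer the runs going down the column, the better this scheme compresses.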
Column store segments
Local Dictionary
Global dictionary
Deletion Bitmap
Column Store Index ‘Anatomy’
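[Bar chart: size of the Posts tables* stored as a heap, with row compression, with page compression, as a clustered column store index, and as a clustered column store index with archive compression. Axis ticks run from 0 to 350 (units not shown). Percentage labels on the chart: 59 %, 53 %, 64 %, 72 %.]
* Posts tables from the four largest Stack Exchange sites combined (Super User, Server Fault, Maths and Ubuntu).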
What Levels Of Compression Can Be Achieved ?
Demonstration 1: The Difference Batch Mode Makes: Test Data Creation
Demonstration 1: The Difference Batch Mode Makes: Test Queries
How Queries Are Executed: How Plans Run
[Diagram: rows pass from iterator to iterator, row by row.]
How Do Rows Travel Between Iterators?
Control flow
Data Flow
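As a rough illustration of control flow and data flow between iterators, the Python sketch below contrasts a plan that hands rows upwards one at a time with one that hands them up in batches. Everything here is hypothetical: the iterator names, the `Counted` helper and the batch size of 1,000 rows are assumptions made for the example and are not taken from SQL Server.

```python
class Counted:
    """Wraps an iterator and counts how many items it hands to its parent."""

    def __init__(self, inner):
        self.inner = inner
        self.handoffs = 0

    def __iter__(self):
        for item in self.inner:
            self.handoffs += 1
            yield item


def scan_rows(table):
    """Row mode scan: every request from the parent returns a single row."""
    for row in table:
        yield row


def filter_rows(child, predicate):
    """Row mode filter: pulls one row at a time from its child."""
    for row in child:
        if predicate(row):
            yield row


def scan_batches(table, batch_size):
    """Batch mode scan: every request from the parent returns a batch of rows."""
    for start in range(0, len(table), batch_size):
        yield table[start:start + batch_size]


def filter_batches(child, predicate):
    """Batch mode filter: processes a whole batch per request."""
    for batch in child:
        kept = [row for row in batch if predicate(row)]
        if kept:
            yield kept


def is_even(value):
    return value % 2 == 0


table = list(range(10_000))

row_scan = Counted(scan_rows(table))
row_result = list(filter_rows(row_scan, is_even))

batch_scan = Counted(scan_batches(table, batch_size=1_000))  # batch size is an assumption
batch_result = [row for batch in filter_batches(batch_scan, is_even) for row in batch]

print(row_scan.handoffs, batch_scan.handoffs)
```

Both plans return the same rows, but the row mode scan is asked for data 10,000 times and the batch mode scan only 10 times. Control flows down the plan as requests, data flows back up, and batching amortises the cost of each request over many rows.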
Modern CPU Architecture
[Diagram: a CPU die made up of cores, each with an L0 UOP cache, a 32KB L1 instruction cache, a 32KB L1 data cache and a 256K L2 unified cache. The cores share an L3 cache and are linked by a bi-directional ring bus. Other labelled components: power and clock, QPI, memory controller, IO, TLB and memory bus.]
System-on-chip (SoC) design with CPU cores as the basic building block.
Utility services are provisioned by the ‘Un-core’ part of the CPU die.
Four level cache hierarchy.
[Bar chart: access latency in clock cycles, axis 0 to 180]

Access type                        Latency (clock cycles)
L1 Cache sequential access         4
L1 Cache in-page random access     4
L1 Cache full random access        4
L2 Cache sequential access         11
L2 Cache in-page random access     11
L2 Cache full random access        11
L3 Cache sequential access         14
L3 Cache in-page random access     18
L3 Cache full random access        38
Main memory                        167
Memory Is The “New Disk”
Batch mode is about working in the 4 ~ 38 clock cycle range and NOT the 167 cycle “CPU stall” range.
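To put the cycle counts into perspective, the back-of-the-envelope Python sketch below multiplies them out for one million accesses. It assumes every access pays the full latency with no overlap or prefetching, which makes it a worst case and not a measurement, and it borrows the 2.0 GHz clock speed quoted in the test setup. The access count is an arbitrary choice.

```python
# Latencies in clock cycles, as quoted for each level of the memory hierarchy.
LATENCY_CYCLES = {
    "L1 cache": 4,
    "L2 cache": 11,
    "L3 cache, full random access": 38,
    "Main memory": 167,
}

CLOCK_HZ = 2.0e9      # 2.0 GHz, taken from the test setup
ACCESSES = 1_000_000  # arbitrary number of accesses, chosen for illustration


def stall_cost(accesses, cycles_per_access, clock_hz=CLOCK_HZ):
    """Total cycles and seconds if every access pays the full latency (worst case)."""
    total_cycles = accesses * cycles_per_access
    return total_cycles, total_cycles / clock_hz


costs = {level: stall_cost(ACCESSES, cycles) for level, cycles in LATENCY_CYCLES.items()}

for level, (total_cycles, seconds) in costs.items():
    print(f"{level:30s} {total_cycles:>12,d} cycles  {seconds * 1000:8.2f} ms")
```

Under these assumptions a million trips to main memory cost 167,000,000 cycles against 4,000,000 for the L1 cache, roughly a 42-fold difference. That gap is what batch mode tries to stay on the right side of.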
How Can A Column Store Index Fit Inside The CPU Cache ?
[Diagram labels: Column store object pool, Segment, Batches]
The Column Store Object Pool
Batch Mode Pre-Requisites
Feature                                   SQL Server 2012   SQL Server 2014
Presence of column store indexes          Yes               Yes
Parallel execution plan                   Yes               Yes
No outer joins, NOT INs or UNION ALLs     Yes               No
Hash joins do not spill from memory       Yes               No
Scalar aggregates cannot be used          Yes               No
SQL Server 2012 / 2014 Column Store Comparison
Feature | SQL Server 2012 | SQL Server 2014
Column store indexes | Yes | Yes
Clustered column store indexes | No | Yes
Updateable column store indexes | No | Yes
Column store archive compression | No | Yes
Columns in a column store index can be dropped | No | Yes
Support for GUID, binary, datetimeoffset precision > 2, numeric precision > 18 | No | Yes
Enhanced compression by storing short strings natively (instead of 32 bit IDs) | No | Yes
Bookmark support (row_group_id:tuple_id) | No | Yes
Mixed row / batch mode execution | No | Yes
Optimized hash build and join in a single iterator | No | Yes
Hash memory spills cause row mode execution | Yes | No
Iterators supported | Scan, filter, project, hash (inner) join and (local) hash aggregate | Yes
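How Column Store Index Updates Are Handled
[Diagram: rows are divided into row groups; the columns (A, B, C) of each row group are encoded and compressed into segments, which are stored as blobs. Delta stores hold fewer than 1,048,576 rows; the tuple mover encodes and compresses them into segments.]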
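The life cycle of a delta store can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class and method names are hypothetical, a delta store is assumed to close once it reaches the row limit, and "compression" is reduced to pivoting rows into one list per column. The real tuple mover is a background process and does considerably more.

```python
ROW_GROUP_MAX_ROWS = 1_048_576  # row limit quoted for delta stores


class ColumnStoreSketch:
    """Toy model of delta stores and the tuple mover (not SQL Server internals)."""

    def __init__(self, max_rows=ROW_GROUP_MAX_ROWS):
        self.max_rows = max_rows
        self.open_delta_store = []       # row oriented, still accepting inserts
        self.closed_delta_stores = []    # full, waiting for the tuple mover
        self.compressed_row_groups = []  # column oriented segments

    def insert(self, row):
        """New rows land in the open delta store; a full delta store is closed."""
        self.open_delta_store.append(row)
        if len(self.open_delta_store) == self.max_rows:
            self.closed_delta_stores.append(self.open_delta_store)
            self.open_delta_store = []

    def tuple_mover(self):
        """Turn each closed delta store into one segment per column."""
        while self.closed_delta_stores:
            rows = self.closed_delta_stores.pop(0)
            segments = [list(column) for column in zip(*rows)]
            self.compressed_row_groups.append(segments)


# A tiny row limit keeps the demonstration readable.
store = ColumnStoreSketch(max_rows=4)
for i in range(10):
    store.insert((i, i * 10, f"row {i}"))  # columns A, B, C

closed_before = len(store.closed_delta_stores)
store.tuple_mover()

print(closed_before, len(store.compressed_row_groups), len(store.open_delta_store))
```

After ten inserts with a limit of four rows, two delta stores are closed and two rows remain in the open one. Running the tuple mover converts the two closed delta stores into row groups of three column segments each, while the open delta store is left untouched.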
Demonstration 2: Delta Stores In Action
Demonstration 3: Pre-sorting and Segment Elimination: Test Data Creation
Demonstration 3: Pre-sorting and Segment Elimination: Test Queries
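The demonstration scripts are not reproduced here. As a stand-in, the Python sketch below shows the general idea behind segment elimination: if each segment records the minimum and maximum value it holds, a range query can skip any segment whose range cannot match. The function names, the segment size of 1,000 rows and the data are all made up for the illustration.

```python
import random


def build_segments(values, rows_per_segment):
    """Split a column into segments and record each segment's min and max."""
    segments = []
    for start in range(0, len(values), rows_per_segment):
        chunk = values[start:start + rows_per_segment]
        segments.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return segments


def scan_range(segments, low, high):
    """Return the matching values and the number of segments actually read."""
    matches = []
    segments_read = 0
    for segment in segments:
        if segment["max"] < low or segment["min"] > high:
            continue  # eliminated using the min / max metadata alone
        segments_read += 1
        matches.extend(v for v in segment["rows"] if low <= v <= high)
    return matches, segments_read


random.seed(42)
values = list(range(10_000))  # pre-sorted data
shuffled = values[:]
random.shuffle(shuffled)      # the same data, not sorted

sorted_segments = build_segments(values, rows_per_segment=1_000)
unsorted_segments = build_segments(shuffled, rows_per_segment=1_000)

sorted_matches, sorted_read = scan_range(sorted_segments, 2_000, 2_999)
unsorted_matches, unsorted_read = scan_range(unsorted_segments, 2_000, 2_999)

print(sorted_read, unsorted_read)
```

On the pre-sorted data only one segment overlaps the requested range, so the other nine are skipped. On the shuffled data nearly every segment spans almost the whole value range, so far fewer segments, if any, can be eliminated, even though both scans return the same rows.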
Demonstration 4: Pre-sorting and Hash Aggregate Performance
Test Setup
CPU: 6 core 2.0 GHz (Sandy Bridge)
48 GB quad channel 1333 MHz DDR3 memory
Hyper-threading enabled, unless specified otherwise.
Warm large object cache used in all tests to remove storage as a factor.
A Typical Data Warehouse Query On Extra Large Non-Sorted Data
1,095,500,000 rows
1,798 MB in size
A Typical Data Warehouse Query On Extra Large Pre-Sorted Data
1,095,500,000 rows
8,555 MB in size
Elapsed Time (ms) / Degree of Parallelism
[Line chart: elapsed time in ms (0 to 80,000) against degree of parallelism (2 to 24) for the non-sorted column store and the sorted column store.]
Lowering Clock Cycles Per Instruction By Leveraging SIMD
Scalar instruction: C = A + B, for example 1 + 2 = 3
SIMD instruction: Vector C = Vector A + Vector B, for example (1, 2, 3, 4) + (2, 3, 4, 5) = (3, 5, 7, 9)
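The difference between the two kinds of instruction can be modelled in Python. Note that this only counts notional instructions; Python does not issue SIMD instructions here, and the four-lane vector width is an assumption chosen to match the four-element vectors in the example.

```python
LANES = 4  # hypothetical vector width: four values per SIMD register


def scalar_add(a, b):
    """Model of scalar execution: one addition per instruction."""
    result, instructions = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        instructions += 1
    return result, instructions


def simd_add(a, b, lanes=LANES):
    """Model of SIMD execution: one instruction adds up to `lanes` pairs at once."""
    result, instructions = [], 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        instructions += 1
    return result, instructions


vector_a = [1, 2, 3, 4]
vector_b = [2, 3, 4, 5]

scalar_result, scalar_instructions = scalar_add(vector_a, vector_b)
simd_result, simd_instructions = simd_add(vector_a, vector_b)

print(scalar_result, scalar_instructions)
print(simd_result, simd_instructions)
```

Both versions produce (3, 5, 7, 9), but the scalar model needs four instructions where the SIMD model needs one. Fewer instructions for the same work is where the performance gain comes from.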
Takeaways
Column store indexes are only half the story; it is the combination of column store indexes and batch mode that makes the real difference to performance.
Pre-sort data where applicable and possible to encourage segment elimination.
Pre-sort data on fact table key column subject to the heaviest hash join / aggregate activity.
Column store indexes and batch mode are fast, but not scalable.
Many other vendors leverage SIMD. Microsoft is yet to do this, and doing so could result in another step change in performance.
Questions ?
chris1adkin@yahoo.co.uk
http://uk.linkedin.com/in/wollatondba
Contact Details
ChrisAdkin8