An introduction to column store indexes and batch mode

An Introduction to Column Store Indexes and Batch Mode

DBA Level 300

About me

An independent SQL consultant.

A user of SQL Server from version 2000 onwards, with 12+ years of experience.

I have a passion for understanding how the database engine works at a deep level.

A Brief History Of Column Store Technology

The lineage of column store databases can be traced back to the MonetDB and VectorWise projects from Holland, developed at around the turn of the millennium.

Data is stored column by column rather than row by row.

Column store technology aims to exploit modern CPU architectures.

Virtually all database vendors now have a column store database offering.

Many people predict a future where all OLAP workloads will be serviced by column oriented databases.

Colour column: Red, Red, Blue, Blue, Green, Green, Green

Dictionary
Lookup ID   Label
1           Red
2           Blue
3           Green

Segment
Lookup ID   Run Length
1           2
2           2
3           3

Compressing data going down the column using run length compression.

Global and local dictionaries are used to store compression metadata.
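The Colour example can be reproduced with a minimal sketch of dictionary encoding followed by run length encoding. The function names are illustrative assumptions, and this is a simplification of the idea, not SQL Server's actual storage format.

```python
from itertools import groupby


def dictionary_encode(values):
    """Give each distinct value a small integer ID, in order of first appearance."""
    dictionary = {}
    ids = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary) + 1
        ids.append(dictionary[value])
    return dictionary, ids


def run_length_encode(ids):
    """Collapse consecutive repeats into (lookup_id, run_length) pairs."""
    return [(lookup_id, sum(1 for _ in run)) for lookup_id, run in groupby(ids)]


def decode(dictionary, runs):
    """Rebuild the original column from the dictionary and the encoded runs."""
    labels = {lookup_id: label for label, lookup_id in dictionary.items()}
    return [labels[lookup_id] for lookup_id, length in runs for _ in range(length)]


colours = ["Red", "Red", "Blue", "Blue", "Green", "Green", "Green"]
dictionary, ids = dictionary_encode(colours)
segment = run_length_encode(ids)

print(dictionary)  # {'Red': 1, 'Blue': 2, 'Green': 3}
print(segment)     # [(1, 2), (2, 2), (3, 3)]
```

Seven values are stored as three dictionary entries plus three runs. The longer the runs of repeated values going down the column, the better this compresses.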

Column Store Compression Schemes

Column store segments

Local Dictionary

Global dictionary

Deletion Bitmap

Column Store Index ‘Anatomy’

Chart categories: Heap, Row compression, Page compression, Clustered column store index, Clustered column store index with archive compression

Compression figures shown on the chart: 59 %, 53 %, 64 %, 72 %

* Posts tables from the four largest Stack Exchange sites combined (superuser, serverfault, maths and Ubuntu)

What Levels Of Compression Can Be Achieved ?

Demonstration 1: The Difference Batch Mode Makes: Test Data Creation

Demonstration 1: The Difference Batch Mode Makes: Test Queries

How Queries Are Executed

Row by row

How do rows travel between iterators ?

Control flow

Data Flow
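A minimal sketch of the difference between handing rows between iterators one at a time and handing them over in batches. The iterator names and the batch size of 1,000 are illustrative assumptions, not the engine's real interfaces or batch size.

```python
def scan_rows(table):
    """Row mode scan: hands one row at a time to the parent iterator."""
    for row in table:
        yield row


def filter_rows(child, predicate, stats):
    """Row mode filter: one hand-off from the child per row."""
    for row in child:
        stats["hand_offs"] += 1
        if predicate(row):
            yield row


def scan_batches(table, batch_size):
    """Batch mode scan: hands a whole batch of rows to the parent iterator."""
    for start in range(0, len(table), batch_size):
        yield table[start:start + batch_size]


def filter_batches(child, predicate, stats):
    """Batch mode filter: one hand-off from the child per batch."""
    for batch in child:
        stats["hand_offs"] += 1
        kept = [row for row in batch if predicate(row)]
        if kept:
            yield kept


def is_even(value):
    return value % 2 == 0


table = list(range(10_000))

row_stats = {"hand_offs": 0}
row_result = list(filter_rows(scan_rows(table), is_even, row_stats))

batch_stats = {"hand_offs": 0}
batch_result = [
    row
    for batch in filter_batches(scan_batches(table, 1_000), is_even, batch_stats)
    for row in batch
]

print(row_stats["hand_offs"], batch_stats["hand_offs"])  # 10000 10
```

Both pipelines return the same rows. The batch version crosses the iterator boundary 10 times instead of 10,000, which is where the per-row control flow overhead is saved.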

Modern CPU Architecture

Diagram: CPU die layout

Per core: L0 UOP cache, L1 instruction cache 32KB, L1 data cache 32KB, L2 unified cache 256K

Shared across cores: L3 cache, bi-directional ring bus

Other components shown: power and clock, QPI, memory controller, IOTLB, memory bus

System-on-chip (SoC) design with CPU cores as the basic building block.

Utility services are provisioned by the ‘Un-core’ part of the CPU die.

Four level cache hierarchy.

Chart: access latency in CPU clock cycles (axis 0 to 180) for L1 cache sequential access, L1 cache in-page random access, L1 cache full random access, L2 cache full random access, L3 cache full random access and main memory

Memory Is The “New Disk”

Batch mode is about working in the 4 ~ 38 clock cycle range and NOT the 167 cycle “CPU stall” range.
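As a rough worked example, the cycle counts quoted above can be turned into a total cost for a fixed number of memory accesses. The figure of one million accesses is an assumption chosen purely for illustration, and real workloads mix all of these latencies.

```python
# Cycle counts taken from the slide; the access count is an illustrative assumption.
CYCLES_CACHE_BEST = 4
CYCLES_CACHE_WORST = 38
CYCLES_MAIN_MEMORY = 167


def total_cycles(accesses, cycles_per_access):
    """Total clock cycles spent waiting on memory for a number of accesses."""
    return accesses * cycles_per_access


accesses = 1_000_000

best_case = total_cycles(accesses, CYCLES_CACHE_BEST)
worst_cache_case = total_cycles(accesses, CYCLES_CACHE_WORST)
main_memory_case = total_cycles(accesses, CYCLES_MAIN_MEMORY)

# How many times slower main memory is than the fastest cache access.
slowdown_vs_best = CYCLES_MAIN_MEMORY / CYCLES_CACHE_BEST

print(best_case, worst_cache_case, main_memory_case)  # 4000000 38000000 167000000
print(slowdown_vs_best)  # 41.75
```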

How Can A Column Store Index Fit Inside The CPU Cache ?

Column store object pool

Segment

Batches

The Column Store Object Pool

Batch Mode Pre-Requisites

Feature                                  SQL Server 2012   SQL Server 2014
Presence of column store indexes         Yes               Yes
Parallel execution plan                  Yes               Yes
No outer joins, NOT INs or UNION ALLs    Yes               No
Hash joins do not spill from memory      Yes               No
Scalar aggregates cannot be used         Yes               No

SQL Server 2012 / 2014 Column Store Comparison

Feature                                                                            SQL Server 2012   SQL Server 2014
Column store indexes                                                               Yes               Yes
Clustered column store indexes                                                     No                Yes
Updateable column store indexes                                                    No                Yes
Column store archive compression                                                   No                Yes
Columns in a column store index can be dropped                                     No                Yes
Support for GUID, binary, datetimeoffset precision > 2, numeric precision > 18     No                Yes
Enhanced compression by storing short strings natively (instead of 32 bit IDs)     No                Yes
Bookmark support (row_group_id:tuple_id)                                           No                Yes
Mixed row / batch mode execution                                                   No                Yes
Optimized hash build and join in a single iterator                                 No                Yes
Hash memory spills cause row mode execution                                        No                Yes

Iterators supported Scan, filter, project, hash (inner) join and (local) hash aggregate

Diagram labels: RowGroups, Columns, Encode and Compress, Segments, Delta stores (< 1,048,576 rows), Tuple mover

How Column Store Index Updates Are Handled
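A minimal sketch of the update path, assuming the commonly described behaviour: new rows land in an open delta store, a delta store is closed once it reaches the row group limit, and the tuple mover later turns closed delta stores into column-wise row groups. The class and method names are hypothetical, and the "compression" here is only a pivot from rows to columns.

```python
ROW_GROUP_LIMIT = 1_048_576  # row group size from the slide


class ColumnStoreSketch:
    """Toy model of delta stores and the tuple mover (names are hypothetical)."""

    def __init__(self, limit=ROW_GROUP_LIMIT):
        self.limit = limit
        self.open_delta = []      # row-wise store receiving new inserts
        self.closed_deltas = []   # full delta stores waiting for the tuple mover
        self.compressed = []      # row groups held column by column

    def insert(self, row):
        self.open_delta.append(row)
        if len(self.open_delta) == self.limit:
            # The delta store is full: close it and start a new one.
            self.closed_deltas.append(self.open_delta)
            self.open_delta = []

    def run_tuple_mover(self):
        # Pivot each closed delta store from rows into columns.
        # A real engine would also encode and compress each column here.
        while self.closed_deltas:
            rows = self.closed_deltas.pop(0)
            self.compressed.append(list(zip(*rows)))


# A tiny limit keeps the example readable.
store = ColumnStoreSketch(limit=4)
for i in range(10):
    store.insert((i, i * 10))

closed_before = len(store.closed_deltas)   # two full delta stores
store.run_tuple_mover()

print(closed_before, len(store.compressed), len(store.open_delta))  # 2 2 2
print(store.compressed[0])  # [(0, 1, 2, 3), (0, 10, 20, 30)]
```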

Demonstration 2: Delta Stores In Action

Demonstration 3: Pre-sorting and Segment Elimination: Test Data Creation

Demonstration 3: Pre-sorting and Segment Elimination: Test Queries

Demonstration 4: Pre-sorting and Hash Aggregate Performance

Test Setup

CPU: 6 core 2.0 GHz (Sandy Bridge)

Warm large object cache used in all tests to remove storage as a factor.

48 GB quad channel 1333 MHz DDR3 memory

Hyper-threading enabled, unless specified otherwise.

A Typical Data Warehouse Query On Extra Large Non-Sorted Data

1,095,500,000 rows

1,798 MB in size

A Typical Data Warehouse Query On Extra Large Pre-Sorted Data

1,095,500,000 rows

8,555 MB in size

Chart: Elapsed Time (ms) / Degree of Parallelism, comparing a non-sorted column store with a sorted column store at degrees of parallelism from 2 to 24

Lowering Clock Cycles Per Instruction By Leveraging SIMD

Scalar instruction: C = A + B, for example 1 + 2 = 3

SIMD instruction: Vector C = Vector A + Vector B, for example (1 2 3 4) + (2 3 4 5) = (3 5 7 9)
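The sketch below models only the semantics of the two instruction types, using the numbers from the slide. Plain Python issues no SIMD instructions, and the assumption that all four values fit in one vector register is purely illustrative.

```python
def scalar_add(a, b):
    """Models one scalar instruction: a single pair of operands, a single result."""
    return a + b


def simd_add(vector_a, vector_b):
    """Models one SIMD instruction: every lane is added in the same step."""
    if len(vector_a) != len(vector_b):
        raise ValueError("vectors must have the same number of lanes")
    return [a + b for a, b in zip(vector_a, vector_b)]


vector_a = [1, 2, 3, 4]
vector_b = [2, 3, 4, 5]

# Scalar approach: one instruction per pair of values.
scalar_result = [scalar_add(a, b) for a, b in zip(vector_a, vector_b)]
scalar_instructions = len(vector_a)

# SIMD approach: one instruction for all lanes (assuming four lanes per register).
simd_result = simd_add(vector_a, vector_b)
simd_instructions = 1

print(simd_result)  # [3, 5, 7, 9]
print(scalar_instructions, simd_instructions)  # 4 1
```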

Takeaways

Column store indexes are only half the story; it is the combination of column store indexes and batch mode that makes the real difference to performance.

Pre-sort data where applicable and possible to encourage segment elimination.

Pre-sort data on the fact table key column subject to the heaviest hash join / aggregate activity.

Column store indexes and batch mode are fast, but not scalable.

Many other vendors leverage SIMD. Microsoft has yet to do this, and doing so could result in another step change in performance.
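The pre-sorting takeaway can be illustrated with a minimal sketch of segment elimination, assuming each segment records the minimum and maximum value it holds so that segments outside a query's range can be skipped. The function names, the segment size of 1,000 rows and the data are illustrative assumptions.

```python
import random


def build_segments(values, rows_per_segment):
    """Split a column into segments and record (min, max) metadata for each."""
    segments = []
    for start in range(0, len(values), rows_per_segment):
        chunk = values[start:start + rows_per_segment]
        segments.append((min(chunk), max(chunk)))
    return segments


def segments_to_scan(segments, low, high):
    """Return the segments whose (min, max) range overlaps the predicate range."""
    return [
        index
        for index, (seg_min, seg_max) in enumerate(segments)
        if seg_max >= low and seg_min <= high
    ]


sorted_values = list(range(10_000))
unsorted_values = sorted_values[:]
random.Random(42).shuffle(unsorted_values)

sorted_segments = build_segments(sorted_values, 1_000)
unsorted_segments = build_segments(unsorted_values, 1_000)

# Predicate: value BETWEEN 2500 AND 2600
sorted_hits = segments_to_scan(sorted_segments, 2_500, 2_600)
unsorted_hits = segments_to_scan(unsorted_segments, 2_500, 2_600)

print(sorted_hits)  # [2]
print(len(unsorted_hits))
```

With sorted data, only the one segment covering 2,000 to 2,999 needs to be read. With shuffled data, each segment's range is wide, so far fewer segments can be eliminated.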

Questions ?

chris1adkin@yahoo.co.uk

http://uk.linkedin.com/in/wollatondba

Contact Details

ChrisAdkin8
