7/29/2019 HANA DB Column Store
Sharone Zehavi, Jordan Jordanov
2011 SAP AG. All rights reserved.
Agenda
Concepts of Column Store
Structure Compared to Row Store
Performance issues Compared to Row Store
Go through examples to make the points
In-Depth view of Column Store
Architecture
Delta Store
Consistent View
Data Compression
Accessing Data
Join Operation
Performance bottleneck
Orders of Magnitude, presented by Jeff Dean (Google)

Activity                                  Time in ns
L1 cache reference                               0.5
Branch mis-prediction                              5
L2 cache reference                                 7
Mutex lock/unlock                                 25
Main memory reference                            100
Compress 1K bytes with Zippy                   3,000
Send 2K bytes over 1 Gbps network             20,000
Read 1 MB sequentially from memory           250,000
Round trip within same datacenter            500,000
Disk seek                                 10,000,000
Read 1 MB sequentially from disk          20,000,000
Send packet CA->Netherlands->CA          150,000,000
http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
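To make the gap between memory and disk concrete, the short sketch below divides a few of the figures from the table. The dictionary keys and the `ratio` helper are names chosen here for illustration only.

```python
# Latencies in nanoseconds, copied from the table above.
LATENCY_NS = {
    "main_memory_reference": 100,
    "read_1mb_memory": 250_000,
    "disk_seek": 10_000_000,
    "read_1mb_disk": 20_000_000,
}


def ratio(slow: str, fast: str) -> float:
    """Return how many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


seek_vs_memory = ratio("disk_seek", "main_memory_reference")
seq_disk_vs_memory = ratio("read_1mb_disk", "read_1mb_memory")

print(f"Disk seek vs. main memory reference: {seek_vs_memory:,.0f}x slower")
print(f"1 MB sequential read, disk vs. memory: {seq_disk_vs_memory:,.0f}x slower")
```

With these figures, a single disk seek costs as much as 100,000 main memory references, which is the bottleneck an in-memory database aims to avoid.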
HANA Table Types
Column
Row
History Column (Temporal)
Global Temporary
Local Temporary
In this presentation we will focus on Column tables only, and will mention Row tables briefly for the sake of comparison.
Logical Structure Of a Table
         Column 1        Column 2        Column 3        Column 4        Column 5
Row 1    DataPage1       DataPage2       DataPage3       DataPage4       DataPage5
Row 2    DataPage6       DataPage7       DataPage8       DataPage9       DataPage10
...      ...             ...             ...             ...             ...
Row N    DataPage(5n-4)  DataPage(5n-3)  DataPage(5n-2)  DataPage(5n-1)  DataPage(5n)
Row Store - Physical Structure
Row 1: the address of Row 1
Row 2: the address of Row 2 can be calculated
...
Row n: address = (the size of all columns) * n

         Column 1        Column 2        Column 3        Column 4        Column 5
Row 1    DataPage1       DataPage2       DataPage3       DataPage4       DataPage5
Row 2    DataPage6       DataPage7       DataPage8       DataPage9       DataPage10
...      ...             ...             ...             ...             ...
Row n    DataPage(5n-4)  DataPage(5n-3)  DataPage(5n-2)  DataPage(5n-1)  DataPage(5n)
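The address calculation can be sketched as follows. This is only an illustration under assumptions not stated on the slide: every column has a fixed width, rows are counted from 0, and the widths in `COLUMN_WIDTHS` are made-up values.

```python
# Assumed fixed column widths in bytes (hypothetical values for illustration).
COLUMN_WIDTHS = [4, 16, 8, 8, 4]   # Column 1 .. Column 5
ROW_SIZE = sum(COLUMN_WIDTHS)      # "the size of all columns"


def row_offset(n: int) -> int:
    """Byte offset of row n from the start of the table (rows counted from 0)."""
    return ROW_SIZE * n


def cell_offset(n: int, column: int) -> int:
    """Byte offset of one column (0-based) inside row n."""
    return row_offset(n) + sum(COLUMN_WIDTHS[:column])


print(row_offset(2))      # start of the third row
print(cell_offset(2, 2))  # start of Column 3 inside the third row
```

Because the offset is pure arithmetic, a row store can jump to any complete row without scanning the rows before it.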
Column Store - Physical Structure(Simplified)
[Figure: the table's data pages, DataPage1 ... DataPage(5n), for Row 1 ... Row N, arranged under Column 1, Column 2, Column 3, Column 4 and Column 5]
Example - Logical Structure

Table
Country  Product  Sales
US       Alpha    3000
US       Beta     1250
JP       Alpha    700
UK       Alpha    450

Row Store
Row 1: US, Alpha, 3000
Row 2: US, Beta, 1250
Row 3: JP, Alpha, 700
Row 4: UK, Alpha, 450

Column Store
Country: US, US, JP, UK
Product: Alpha, Beta, Alpha, Alpha
Sales:   3000, 1250, 700, 450
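Example (cont.) - For Column Store: How Is the Logical Structure Preserved?

Through the Row ID: every column keeps its values in the same Row ID order, so the value at position k of each column belongs to row k.

Column Store
Country: US (Row ID 1), US, JP, UK
Product: Alpha (Row ID 1), Beta, Alpha, Alpha
Sales:   3000 (Row ID 1), 1250, 700, 450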
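A minimal sketch of the two layouts for the example table, assuming plain Python lists stand in for memory. The names `row_store`, `column_store` and `get_row` are illustrative, not HANA internals.

```python
# The example table: Country, Product, Sales.
rows = [
    ("US", "Alpha", 3000),
    ("US", "Beta", 1250),
    ("JP", "Alpha", 700),
    ("UK", "Alpha", 450),
]
COLUMNS = ("Country", "Product", "Sales")

# Row store: the values of one row sit next to each other.
row_store = [value for row in rows for value in row]

# Column store: the values of one column sit next to each other.
column_store = {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def get_row(row_id: int) -> tuple:
    """Rebuild a logical row from the column store (Row IDs start at 1)."""
    return tuple(column_store[name][row_id - 1] for name in COLUMNS)


print(row_store)
print(column_store)
print(get_row(3))
```

The Row ID is never stored with the values here; it is simply the position within each column, which is what keeps the logical row intact.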
Data Dictionary
Column Store performing a select where CITY = 'New York'
Row vs. Column Store
3 Topics to Consider
Read (Select)
Write (Update)
Write (Insert)
Row vs. Column Store - Reading Data
We will work through the pros and cons of each method by following an example.
Let's look at the following school table:

Family  Father   Mother    1st Grade  2nd Grade  3rd Grade  4th Grade  5th Grade  6th Grade
Smith   Richard  Miranda   Jim        null       null       Donna      null       Kevin
Galway  Stephen  Giselle   Gary       Eric       Alex       null       Jeffrey    null
Bush    John     Barbara   Timothy    null       Roland     null       null       Alexis
Brown   Jack     Abby      Sandra     Jason      null       Donald     null       Susan
Taylor  John     Ginny     Jessica    null       David      null       Brian      Larry
Moore   Peter    Nancy     Ronald     Angela     null       Ruth       Karen      Laura
Harris  Clark    Ruth      Dennis     Heather    Jerry      Frank      Janet      null
Taylor  James    Michelle  Shirley    Cynthia    null       Melissa    null       Brenda
Row vs. Column Store - Reading Data (cont.)
So what if we just wanted to read the entire table?
select * from School
Recall the Physical Structure discussed earlier. Which storing method will give us a faster read? Why?
Hints:
We are going to fully scan the table in any case. We need to read entire rows one after the other, so which physical structure will give us a smooth read?
What actions are required when performing the query with Column Store?
What actions are required when performing the query with Row Store?
Row vs. Column Store - Reading Data (cont.)
Now, what if we want to get a list of all 1st grade pupils?
select "1st_grade" from School where "1st_grade" is not null
Again, recall the Physical Structure discussed earlier. Which storing method will give us a faster read? Why?
Hints:
Are we going to fully scan the table? We are going to scan all rows in any case, but only one column, so which physical structure will give us a smooth read?
What actions are required when performing the query with Column Store?
What actions are required when performing the query with Row Store?
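One rough way to compare the two layouts for this query is to count how many stored values each one has to pass over. The sketch below uses the school table and a deliberately crude cost model (one unit per value read); the function names are illustrative.

```python
from typing import List, Optional, Tuple

# School table: Family, Father, Mother, 1st .. 6th grade (None stands for null).
SCHOOL = [
    ("Smith", "Richard", "Miranda", "Jim", None, None, "Donna", None, "Kevin"),
    ("Galway", "Stephen", "Giselle", "Gary", "Eric", "Alex", None, "Jeffrey", None),
    ("Bush", "John", "Barbara", "Timothy", None, "Roland", None, None, "Alexis"),
    ("Brown", "Jack", "Abby", "Sandra", "Jason", None, "Donald", None, "Susan"),
    ("Taylor", "John", "Ginny", "Jessica", None, "David", None, "Brian", "Larry"),
    ("Moore", "Peter", "Nancy", "Ronald", "Angela", None, "Ruth", "Karen", "Laura"),
    ("Harris", "Clark", "Ruth", "Dennis", "Heather", "Jerry", "Frank", "Janet", None),
    ("Taylor", "James", "Michelle", "Shirley", "Cynthia", None, "Melissa", None, "Brenda"),
]
FIRST_GRADE = 3  # position of the 1st grade column inside a row


def scan_row_store() -> Tuple[List[Optional[str]], int]:
    """Read every row in full and keep the non-null 1st grade values."""
    result, touched = [], 0
    for row in SCHOOL:
        touched += len(row)  # the whole row passes through memory
        if row[FIRST_GRADE] is not None:
            result.append(row[FIRST_GRADE])
    return result, touched


def scan_column_store() -> Tuple[List[Optional[str]], int]:
    """Read only the 1st grade column, which a column store keeps contiguous."""
    column = [row[FIRST_GRADE] for row in SCHOOL]
    result = [value for value in column if value is not None]
    return result, len(column)


row_result, row_touched = scan_row_store()
col_result, col_touched = scan_column_store()
print(row_touched, col_touched)
```

Both scans return the same pupils, but under this model the row store passes over nine times as many values as the column store.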
Row vs. Column Store - Reading Data (cont.)
Now, what if we want to get a list of all families who have children in 1st grade, 3rd grade and 6th grade?
select Family
from School
where "1st_grade" is not null
and "3rd_grade" is not null
and "6th_grade" is not null
Again, recall the Physical Structure discussed earlier and try to answer: which storing method will give us a faster read?
Can we have a definite answer here?
What are the pros and cons?
Row vs. Column Store - Writing Data - Update
Now, a mistake was found in the table's data: David Taylor from 3rd grade is actually in 4th grade. So we need to update the table accordingly:
update School
set "3rd_grade" = null, "4th_grade" = 'David'
where Family = 'Taylor'
and Father = 'John'
and Mother = 'Ginny'
Again, recall the Physical Structure discussed earlier and try to answer: which storing method will give us a faster update?
Row vs. Column Store - Writing Data - Update (cont.)
For Column Store, we first need to search each column named in the conditions:

Family             Father             Mother
RowID  Value       RowID  Value       RowID  Value
4      Brown       7      Clark       4      Abby
3      Bush        4      Jack        3      Barbara
2      Galway      8      James       5      Ginny
7      Harris      3      John        2      Giselle
6      Moore       5      John        8      Michelle
1      Smith       6      Peter       1      Miranda
5      Taylor      1      Richard     6      Nancy
8      Taylor      2      Stephen     7      Ruth
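The search over the three sorted columns can be sketched as a binary search per column followed by an intersection of the matching Row IDs. The `(value, row_id)` pair layout and the `row_ids` helper are assumptions made for this illustration.

```python
from bisect import bisect_left, bisect_right

# Each column as (value, row_id) pairs, sorted by value as in the tables above.
FAMILY = [("Brown", 4), ("Bush", 3), ("Galway", 2), ("Harris", 7),
          ("Moore", 6), ("Smith", 1), ("Taylor", 5), ("Taylor", 8)]
FATHER = [("Clark", 7), ("Jack", 4), ("James", 8), ("John", 3),
          ("John", 5), ("Peter", 6), ("Richard", 1), ("Stephen", 2)]
MOTHER = [("Abby", 4), ("Barbara", 3), ("Ginny", 5), ("Giselle", 2),
          ("Michelle", 8), ("Miranda", 1), ("Nancy", 6), ("Ruth", 7)]


def row_ids(column, value):
    """Binary-search a sorted column and return the Row IDs holding `value`."""
    values = [v for v, _ in column]
    lo = bisect_left(values, value)
    hi = bisect_right(values, value)
    return {row_id for _, row_id in column[lo:hi]}


matches = row_ids(FAMILY, "Taylor") & row_ids(FATHER, "John") & row_ids(MOTHER, "Ginny")
print(matches)
```

Each condition narrows the candidate set on its own, and only the Row IDs that survive all three conditions need to be updated.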
Row vs. Column Store - Writing Data - Update (cont.)
We found that the Row ID to change is 5, so now it is time for the update:

3rd_grade          4th_grade
RowID  Value       RowID  Value
2      Alex        1      Donald
5      David       4      Donna
7      Jerry       7      Frank
3      Roland      8      Melissa
1      null        6      Ruth
4      null        2      null
6      null        3      null
8      null        5      null
Row vs. Column Store - Writing Data - Update (cont.)
So we have to update the values as requested, but we also have to re-sort the columns to reflect the new order, based on the new values:

3rd_grade          4th_grade
RowID  Value       RowID  Value
2      Alex        5      David
7      Jerry       1      Donald
3      Roland      4      Donna
5      null        7      Frank
1      null        8      Melissa
4      null        6      Ruth
6      null        2      null
8      null        3      null
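The update-then-re-sort step can be sketched as below. It assumes each column is a list of `(value, row_id)` pairs sorted alphabetically with nulls last, as in the tables above; the helper name `update_and_resort` is made up for this illustration.

```python
# Columns before the update, as (value, row_id) pairs; None stands for null.
THIRD_GRADE = [("Alex", 2), ("David", 5), ("Jerry", 7), ("Roland", 3),
               (None, 1), (None, 4), (None, 6), (None, 8)]
FOURTH_GRADE = [("Donald", 1), ("Donna", 4), ("Frank", 7), ("Melissa", 8),
                ("Ruth", 6), (None, 2), (None, 3), (None, 5)]


def update_and_resort(column, row_id, new_value):
    """Replace the value of one Row ID, then restore the sorted order."""
    updated = [(new_value if rid == row_id else value, rid) for value, rid in column]
    # Stable sort: non-null values alphabetically, nulls last.
    return sorted(updated, key=lambda pair: (pair[0] is None, pair[0] or ""))


third_after = update_and_resort(THIRD_GRADE, 5, None)
fourth_after = update_and_resort(FOURTH_GRADE, 5, "David")
print(third_after)
print(fourth_after)
```

The sort is the expensive part: a single changed value can force entries to move within every column the update touches.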
Row vs. Column Store - Writing Data - Update (cont.)
For Row Store, assuming no indexes are present, we simply scan the table row by row, stopping every time we find a match for the conditions and updating it.
But this is a full table scan, meaning the table is scanned until the end. On the other hand, it is scanned only once.
So where did we get better performance for the update?
Can we have a definite answer here?
Row vs. Column Store - Writing Data - Insert
A new family has moved into town and registered their kids at the school. We want to reflect this with an insert command:
insert into School
values ('Donovan', 'Harry', 'Pamela', null, 'Martha', null, 'Brenda', 'Albert', 'Justin')
How would we implement this action in both methods?
Row vs. Column Store - Writing Data - Insert (cont.)
For Column Store, after allocating a new Row ID, we will need to do the following for each column:
1. Add the new value
2. Re-sort the column, and maybe reorder it, assuming we want the values to be contiguous.
For Row Store, we simply allocate new data pages at the end of the table and pour the data in there. It should take O(1) time.
So we can see the straightforward advantage of Row Store when inserting new data is involved.
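The difference in insert cost can be sketched as follows, assuming the simplified sorted-column model used in this example. The two seed rows and the `(is_null, value, row_id)` entry layout are choices made here for illustration.

```python
from bisect import insort

NAMES = ["Family", "Father", "Mother", "1st", "2nd", "3rd", "4th", "5th", "6th"]
NEW_ROW = ("Donovan", "Harry", "Pamela", None, "Martha", None, "Brenda", "Albert", "Justin")


def insert_row_store(table, row):
    """Row store: append the whole row at the end of the table."""
    table.append(row)


def insert_column_store(columns, row, row_id):
    """Column store: one sorted insert per column (shifts entries each time)."""
    for name, value in zip(NAMES, row):
        # (is_null, value, row_id) sorts non-null values first, nulls last.
        insort(columns[name], (value is None, value or "", row_id))


table = []
columns = {name: [] for name in NAMES}
seed = [
    ("Smith", "Richard", "Miranda", "Jim", None, None, "Donna", None, "Kevin"),
    ("Galway", "Stephen", "Giselle", "Gary", "Eric", "Alex", None, "Jeffrey", None),
]
for row_id, row in enumerate(seed, start=1):
    insert_row_store(table, row)
    insert_column_store(columns, row, row_id)

# The new family gets Row ID 3.
insert_row_store(table, NEW_ROW)
insert_column_store(columns, NEW_ROW, 3)
print([entry[1] for entry in columns["Family"]])
```

The row store does one append, while the column store performs nine sorted inserts, one per column, each of which may move existing entries.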
Advantages of Column Store
So when does Column Store have a clear-cut advantage over Row Store?
Calculations are typically executed on a single column or only a few columns
The table is searched based on the values of a few columns
The table has a large number of columns
The table has a large number of rows, and columnar operations are required (aggregate, scan, etc.)
High compression rates can be achieved because the majority of the columns contain only a few distinct values (compared to the number of rows)
Elimination of indexes
Parallelization
Advantages of Row Store
Row Store tables are better when:
The application needs to process only a single record at a time (many selects and/or updates of single records).
The application typically needs to access the complete record
The columns contain mainly distinct values so compression rate would be low
Neither aggregations nor fast searching are required
The table has a small number of rows (for example configuration tables)
Column Store Conceptual Architecture
Column Store - Delta Storage
We saw that inserting a new row (and sometimes updating one) is a very expensive action for Column Store. So what do we do to ease the pain?
A write operation (Insert or Update) in Column Store does not directly modify compressed data, but rather goes into a separate area called the Delta Storage.
The changes are taken over from the delta storage asynchronously at some later point in time. This action is called Delta Merge. The Delta Merge operation integrates committed changes collected in delta storage into main storage.
The following steps are taken when a write operation occurs:
Write operations in a Columnar Store
Column Store - Delta Storage (cont.)
1. If the current transaction is not already a write transaction, the transaction manager is told to make it a write transaction and to provide an updated transaction token.
2. For updates and deletes, a write lock is requested from the transaction manager for the record (identified by its key). The operation is blocked until the lock is available. The lock is held until the transaction is committed or rolled back.
3. For inserts and updates, the operation inserts a new row into the delta storage with the updated data.
4. The write operation tells the consistent view manager about the change. The consistent view manager stores the transaction-related information that is needed to create the consistent view for a specific read operation. This includes which rows in delta storage were inserted by which transaction and which other rows were invalidated. In case of a deletion, the consistent view manager just stores the information that the previously valid row now becomes invalid.
5. Unless it is a temporary table, the write operation writes an entry into the delta log.
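The write steps above can be sketched as a toy model. Everything here is a simplification assumed for illustration: the class and field names are hypothetical, transactions are plain dictionaries, and a lock conflict raises an error instead of blocking.

```python
class ColumnTableSketch:
    """Toy model of the write path into delta storage (not HANA's real structures)."""

    def __init__(self, temporary=False):
        self.delta = []            # new row versions, appended in arrival order
        self.locks = {}            # record key -> id of the transaction holding the lock
        self.inserted_by = {}      # delta position -> transaction id (consistent view info)
        self.invalidated_by = {}   # record key -> transaction id (consistent view info)
        self.delta_log = []        # persisted log entries
        self.temporary = temporary

    def write(self, txn, op, key, row=None):
        # Step 1: make the transaction a write transaction.
        txn["is_write"] = True
        # Step 2: updates and deletes need a write lock on the record.
        if op in ("update", "delete"):
            holder = self.locks.setdefault(key, txn["id"])
            if holder != txn["id"]:
                raise RuntimeError("a real system would block here until the lock is free")
            # Step 4 (part): the previously valid row becomes invalid.
            self.invalidated_by[key] = txn["id"]
        # Step 3: inserts and updates add a new row to delta storage.
        if op in ("insert", "update"):
            self.delta.append((key, row))
            # Step 4 (part): remember which transaction inserted the row.
            self.inserted_by[len(self.delta) - 1] = txn["id"]
        # Step 5: log the change unless the table is temporary.
        if not self.temporary:
            self.delta_log.append((txn["id"], op, key, row))


table = ColumnTableSketch()
txn1 = {"id": 1, "is_write": False}
table.write(txn1, "insert", "Donovan", {"Father": "Harry"})
table.write(txn1, "update", "Donovan", {"Father": "Henry"})
print(len(table.delta), len(table.delta_log))
```

Note that the update never touches the first delta entry: it appends a second version and records the invalidation separately.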
Consistent View of Current Data
With the delta concept, updates in the Column Store do not physically change existing rows.
Updates are always done by inserting a new entry into the delta storage.
Therefore a mechanism is required to ensure each transaction reads the data it is supposed to read, be it from the Main Store or from the Delta Store.
The Consistent View Manager takes care of exactly this.
To understand Consistent View, we first need to understand Isolation Levels:
Consistent View of Current Data - Isolation Levels
Read Committed
Corresponds to Statement Level Read Consistency
With statement level snapshot isolation, different statements in a transaction may see different snapshots of the system.
Each statement in a transaction sees a consistent snapshot of the system.
Each statement sees the changes that were committed when the execution of the statement started.
Consistent View of Current Data - Isolation Levels
Repeatable Read / Serializable
Corresponds to Transaction Level Snapshot Isolation
All statements of a transaction see the same snapshot of the database.
This snapshot contains all changes that were committed at the time the transaction started.
This snapshot contains, in addition, the changes made by the transaction itself.
Now, back to Consistent View, let's follow an example:
Consistent View of Current Data
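As a rough sketch of how a consistent view might be computed (not the example from the original slide), the code below assumes that every row version records the commit ID that created it and, if applicable, the commit ID that invalidated it. The `RowVersion` type and the integer commit IDs are assumptions made for illustration, and a transaction's own uncommitted changes are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RowVersion:
    key: str
    value: str
    created_at: int                        # commit ID of the inserting transaction
    invalidated_at: Optional[int] = None   # commit ID of the update/delete, if any


def visible(version: RowVersion, snapshot: int) -> bool:
    """A version is visible if it was committed by the snapshot and not yet invalidated."""
    created = version.created_at <= snapshot
    still_valid = version.invalidated_at is None or version.invalidated_at > snapshot
    return created and still_valid


def read(versions: List[RowVersion], snapshot: int) -> Dict[str, str]:
    """Return the rows a reader with the given snapshot is supposed to see."""
    return {v.key: v.value for v in versions if visible(v, snapshot)}


# David Taylor's record: the old version in main, the new version in delta.
versions = [
    RowVersion("David Taylor", "3rd grade", created_at=1, invalidated_at=3),
    RowVersion("David Taylor", "4th grade", created_at=3),
]
print(read(versions, snapshot=2))
print(read(versions, snapshot=3))
```

Under Read Committed the snapshot would be taken when each statement starts; under Repeatable Read / Serializable it would be taken once, when the transaction starts, and reused for all of its statements.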
Delta Merge
Executed on Table Level when:
The number of lines in delta storage for this table exceeds a specified number
The memory consumption of delta storage exceeds a specified limit
The merge is triggered explicitly by a client using SQL
The delta log for a columnar table exceeds the defined limit. As the delta log is truncated only during a merge operation, a merge operation needs to be performed in this case.
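The four trigger conditions can be sketched as a single decision function. The threshold constants below are hypothetical values chosen for illustration, not HANA defaults, and `DeltaStats` is a made-up container.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeltaStats:
    rows: int           # number of lines in delta storage
    memory_bytes: int   # memory consumption of delta storage
    log_bytes: int      # size of the delta log


# Hypothetical limits for illustration only.
MAX_DELTA_ROWS = 100_000
MAX_DELTA_MEMORY = 64 * 1024 * 1024
MAX_DELTA_LOG = 256 * 1024 * 1024


def merge_reason(stats: DeltaStats, requested_by_client: bool = False) -> Optional[str]:
    """Return why a delta merge should run for this table, or None if it should not."""
    if requested_by_client:
        return "explicit request"
    if stats.rows > MAX_DELTA_ROWS:
        return "row count"
    if stats.memory_bytes > MAX_DELTA_MEMORY:
        return "memory consumption"
    if stats.log_bytes > MAX_DELTA_LOG:
        return "delta log size"
    return None


print(merge_reason(DeltaStats(rows=150_000, memory_bytes=1024, log_bytes=1024)))
print(merge_reason(DeltaStats(rows=10, memory_bytes=1024, log_bytes=1024)))
```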
Data Compression
Data Compression - Additional Compression
Prefix Coding
If the column starts with a long sequence of the same value V, the sequence is replaced by storing the value once, together with the number of occurrences.
This makes sense if there is one predominant value in the column and the remaining values are mostly unique or have low redundancy.
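A minimal sketch of prefix coding on a list of value IDs. The function names and the tuple layout of the encoded form are assumptions for illustration.

```python
def prefix_encode(values):
    """Store the leading run as (value, count) and keep the rest unchanged."""
    if not values:
        return None, 0, []
    first = values[0]
    run = 1
    while run < len(values) and values[run] == first:
        run += 1
    return first, run, values[run:]


def prefix_decode(value, count, rest):
    """Expand the leading run again and append the remaining values."""
    return [value] * count + list(rest)


column = [0, 0, 0, 0, 7, 3, 9]
encoded = prefix_encode(column)
print(encoded)
print(prefix_decode(*encoded) == column)
```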
Data Compression - Additional Compression
Run Length Encoding
Run length encoding replaces sequences of the same value with a single instance of the value and its start position.
This variant of run length encoding was chosen because it speeds up access compared to storing the number of occurrences with each value.
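The start-position variant can be sketched as below; the speed-up comes from being able to binary-search the start positions instead of summing run lengths. The function names are illustrative.

```python
from bisect import bisect_right


def rle_encode(values):
    """Keep one instance of each run together with the position where it starts."""
    run_values, starts = [], []
    for position, value in enumerate(values):
        if not run_values or run_values[-1] != value:
            run_values.append(value)
            starts.append(position)
    return run_values, starts


def rle_get(run_values, starts, position):
    """Look up the value at a position with a binary search over the start positions."""
    return run_values[bisect_right(starts, position) - 1]


column = [5, 5, 5, 2, 2, 9, 9, 9, 9]
run_values, starts = rle_encode(column)
print(run_values, starts)
print(rle_get(run_values, starts, 4))
```

With run lengths instead of start positions, finding the value at position 4 would require adding up lengths from the beginning of the column.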
Data Compression - Additional Compression
Cluster Encoding
Cluster encoding partitions the sequence into N blocks of fixed size (1024 elements). If a cluster contains only occurrences of a single value, the cluster is replaced by a single occurrence of that value. A bit vector of length N indicates which clusters were replaced by a single value.
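A small sketch of cluster encoding, using a block size of 4 in the example so the result stays readable (the text above uses 1024). Treating a trailing partial block as uncompressed is an assumption made here.

```python
def cluster_encode(values, block_size=1024):
    """Replace each full single-valued block by one value; flag it in the bit vector."""
    bits, data = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        if len(block) == block_size and len(set(block)) == 1:
            bits.append(1)
            data.append(block[0])
        else:
            bits.append(0)
            data.extend(block)
    return bits, data


def cluster_decode(bits, data, total, block_size=1024):
    """Rebuild the original sequence of `total` values."""
    values, i = [], 0
    for bit in bits:
        if bit:
            values.extend([data[i]] * block_size)
            i += 1
        else:
            n = min(block_size, total - len(values))
            values.extend(data[i:i + n])
            i += n
    return values


column = [1, 1, 1, 1, 2, 3, 2, 3, 4, 4, 4, 4, 5, 5]
bits, data = cluster_encode(column, block_size=4)
print(bits, data)
print(cluster_decode(bits, data, len(column), block_size=4) == column)
```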
Data Compression - Additional Compression
Sparse Encoding
Sparse encoding removes the value V that appears most often. A bit vector indicates at which positions V was removed from the original sequence.
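A minimal sketch of sparse encoding, assuming a non-empty column and using a list of booleans as the bit vector. The function names are illustrative.

```python
from collections import Counter


def sparse_encode(values):
    """Remove the most frequent value and remember where it stood (non-empty input)."""
    most_common = Counter(values).most_common(1)[0][0]
    removed = [value == most_common for value in values]
    rest = [value for value in values if value != most_common]
    return most_common, removed, rest


def sparse_decode(most_common, removed, rest):
    """Put the removed value back at every flagged position."""
    remaining = iter(rest)
    return [most_common if flag else next(remaining) for flag in removed]


column = [0, 0, 4, 0, 7, 0, 0]
encoded = sparse_encode(column)
print(encoded)
print(sparse_decode(*encoded) == column)
```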
Data Compression - Additional Compression
Indirect Encoding
Indirect encoding is also based on partitioning into blocks of 1024 elements. If a block contains only a few distinct values, an additional dictionary is used to encode the values in that block.
Here is the concept with a block size of 8 elements. The first and the third block consist of not more than 4 distinct values, so a dictionary with 4 entries and an encoding of values with 2 bits is possible.
For the second block this kind of compression makes no sense. With 8 distinct values the dictionary alone would need the same space as the uncompressed sequence.
The implementation also needs to store the information which blocks are encoded with an additional dictionary and the links to the additional dictionaries.
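The block-of-8 concept can be sketched as follows. The input values and the `max_distinct` threshold of 4 are assumptions chosen to mirror the description; a block without its own dictionary is marked with `None`.

```python
def indirect_encode(values, block_size=8, max_distinct=4):
    """Give each block with few distinct values its own small dictionary."""
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        distinct = sorted(set(block))
        if len(distinct) <= max_distinct:
            index = {value: code for code, value in enumerate(distinct)}
            blocks.append((distinct, [index[value] for value in block]))
        else:
            blocks.append((None, list(block)))  # stored without an extra dictionary
    return blocks


def indirect_decode(blocks):
    """Translate block-local codes back into the original values."""
    values = []
    for dictionary, codes in blocks:
        values.extend(codes if dictionary is None else [dictionary[c] for c in codes])
    return values


column = ([10, 20, 10, 30, 20, 10, 40, 10]    # 4 distinct values
          + [1, 2, 3, 4, 5, 6, 7, 8]          # 8 distinct values
          + [7, 7, 9, 9, 7, 9, 7, 7])         # 2 distinct values
blocks = indirect_encode(column)
for block in blocks:
    print(block)
```

In the first block every code fits into 2 bits, while the second block keeps its values as they are because a dictionary would save nothing.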
Data Compression - Additional Compression
String Delta Compression
The dictionary is stored as a sequence of blocks, each containing 16 string values that are compressed using delta compression.
For each string value the following information is stored:
1. The length of the prefix which this value has in common with its predecessor
2. The number of remaining characters after the common prefix
3. The remaining characters after the common prefix
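The three stored fields can be sketched for one block of a sorted string dictionary. The block here holds 4 names instead of 16 to keep the output short, and the function names are illustrative.

```python
def common_prefix_len(a, b):
    """Number of leading characters that two strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def delta_encode_block(sorted_strings):
    """Encode each string as (prefix length, remaining length, remaining characters)."""
    encoded, previous = [], ""
    for s in sorted_strings:
        prefix_len = common_prefix_len(previous, s)
        rest = s[prefix_len:]
        encoded.append((prefix_len, len(rest), rest))
        previous = s
    return encoded


def delta_decode_block(encoded):
    """Rebuild the strings by reusing the prefix of the predecessor."""
    decoded, previous = [], ""
    for prefix_len, _rest_len, rest in encoded:
        previous = previous[:prefix_len] + rest
        decoded.append(previous)
    return decoded


block = ["Alex", "Alexis", "Angela", "Barbara"]
encoded = delta_encode_block(block)
print(encoded)
print(delta_decode_block(encoded) == block)
```

Because a dictionary is sorted, neighbouring strings often share long prefixes, which is what makes this encoding pay off.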
Accessing Data in Column Store
Search by Attribute Value
To find all rows with a given attribute value (select * from Table where attribute = value), a reverse lookup is needed:
1. A binary search is performed on the Dictionary.
2. If the value exists in the Dictionary, the result of the reverse lookup is the value ID of the specified value.
3. The value ID sequence is searched for all occurrences of the found value ID.
Let's look at an example:
Accessing Data in Column Store
Search by Attribute Value (no index)
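A rough sketch of a search without an index (not the example from the original slide), using the Product column of the earlier sales table. It assumes a sorted dictionary whose positions serve as value IDs; `dictionary_encode` and `search` are illustrative names.

```python
from bisect import bisect_left


def dictionary_encode(values):
    """Build a sorted dictionary and the value ID sequence for one column."""
    dictionary = sorted(set(values))
    ids = {value: value_id for value_id, value in enumerate(dictionary)}
    return dictionary, [ids[value] for value in values]


def search(dictionary, value_ids, value):
    """Return the Row IDs (starting at 1) of all rows holding `value`."""
    # Reverse lookup: binary search for the value in the dictionary.
    value_id = bisect_left(dictionary, value)
    if value_id == len(dictionary) or dictionary[value_id] != value:
        return []  # the value does not occur in this column
    # Scan the value ID sequence for all occurrences of that value ID.
    return [row_id for row_id, vid in enumerate(value_ids, start=1) if vid == value_id]


dictionary, value_ids = dictionary_encode(["Alpha", "Beta", "Alpha", "Alpha"])
print(dictionary, value_ids)
print(search(dictionary, value_ids, "Alpha"))
```

The scan compares small integers rather than strings, which is one reason full column scans stay fast.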
Accessing Data in Column Store
Search by Attribute Value With Index
Normally, even full column scans can be executed with high performance. However, in cases where the performance of column scans is not sufficient, an index can be defined on the column. It contains references to the rows that contain the value.
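A minimal sketch of such an index, assuming it maps each value ID to the list of Row IDs that contain it. The Country column of the earlier sales table is used as data; the structure shown is an assumption, not HANA's actual index format.

```python
from bisect import bisect_left
from collections import defaultdict

# Country column of the sales example: US, US, JP, UK.
DICTIONARY = ["JP", "UK", "US"]   # sorted, position = value ID
VALUE_IDS = [2, 2, 0, 1]          # one value ID per row


def build_index(value_ids):
    """Map every value ID to the Row IDs (starting at 1) that contain it."""
    index = defaultdict(list)
    for row_id, value_id in enumerate(value_ids, start=1):
        index[value_id].append(row_id)
    return dict(index)


def search_with_index(index, value):
    """Binary-search the dictionary, then read the Row IDs straight from the index."""
    value_id = bisect_left(DICTIONARY, value)
    if value_id == len(DICTIONARY) or DICTIONARY[value_id] != value:
        return []
    return index.get(value_id, [])


index = build_index(VALUE_IDS)
print(index)
print(search_with_index(index, "US"))
```

The index trades extra memory for skipping the scan of the value ID sequence.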
Accessing Data in Column Store
Access by Row ID
After a Row ID has been determined:
1. The value ID is read from the value ID sequence at the position given by the Row ID.
2. The value ID is then used to look up the corresponding value in the Dictionary.
Let's look at an example:
Accessing Data in Column Store
Access by Row ID
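Access by Row ID amounts to two array lookups, sketched below for the Product column of the earlier sales table (not the example from the original slide). Row IDs starting at 1 is an assumption carried over from the earlier slides.

```python
# Product column of the sales example: Alpha, Beta, Alpha, Alpha.
DICTIONARY = ["Alpha", "Beta"]   # position = value ID
VALUE_IDS = [0, 1, 0, 0]         # one value ID per row


def value_at(row_id: int) -> str:
    """Read the value ID at the row's position, then resolve it in the dictionary."""
    value_id = VALUE_IDS[row_id - 1]
    return DICTIONARY[value_id]


print(value_at(2))
```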
Column Store - Join Operation
Can calculate Inner joins, Right Outer joins, Left Outer joins, and Full Outer joins.
Limited to Equi-Joins only.
Following is a Join example (using Value ID):
Column Store Join Operation
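One way an inner equi-join over value IDs could work is sketched below; it is not the example from the original slide. The idea assumed here is to translate the value IDs of one dictionary into the other dictionary's ID space once, and then match rows by comparing integers only. The second table (a list of countries) is hypothetical.

```python
def join_on_value_ids(dict_a, ids_a, dict_b, ids_b):
    """Inner equi-join of two dictionary-encoded columns; returns (row_a, row_b) pairs."""
    position_in_b = {value: value_id for value_id, value in enumerate(dict_b)}
    # Translate each value ID of A into B's value ID space (None if B lacks the value).
    a_to_b = [position_in_b.get(value) for value in dict_a]

    # Group B's Row IDs (starting at 1) by value ID.
    rows_by_id_b = {}
    for row_b, value_id in enumerate(ids_b, start=1):
        rows_by_id_b.setdefault(value_id, []).append(row_b)

    result = []
    for row_a, value_id_a in enumerate(ids_a, start=1):
        for row_b in rows_by_id_b.get(a_to_b[value_id_a], []):
            result.append((row_a, row_b))
    return result


# Country column of the sales example: US, US, JP, UK.
sales_dict, sales_ids = ["JP", "UK", "US"], [2, 2, 0, 1]
# Hypothetical second table with rows JP, US, DE.
country_dict, country_ids = ["DE", "JP", "US"], [1, 2, 0]

print(join_on_value_ids(sales_dict, sales_ids, country_dict, country_ids))
```

String comparisons happen only once per distinct value, while the per-row work is limited to integer lookups.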
Thank You!