Column-Stores vs. Row-Stores: How Different Are They Really?cs- · PDF fileColumn-Stores vs....

Daniel Abadi -- Yale University

Column-Stores vs. Row-Stores: How Different Are They Really?

Daniel Abadi (Yale), Samuel Madden (MIT),

Nabil Hachem (AvantGarde Consulting)June 12th, 2008

Row vs. Column-Stores

StreetAddressPhone #E-mail

FirstName

LastName

Last Name

FirstName E-mail Phone #

StreetAddress

Row-Store Column-Store

− Might read in unnecessary data

+ Only need to read in relevant data

+ Easy to add a new record

− Tuple writes might require multiple seeks

Column-Stores

• Really good for read-mostly data warehouses� Lot’s of column scans and aggregations� Writes tend to be in batch� [CK85], [SAB+05], [ZBN+05], [HLA+06],

[SBC+07] all verify this� Top 3 in TPC-H rankings (Exasol, ParAccel,

and Kickfire) are column-stores� Factor of 5 faster on performance� Factor of 2 superior on price/performance

Data Warehouse DBMS Software

• $4.5 billion industry (out of total $16 billion DBMS software industry)

• Growing 10% annually

Momentum

• Right solution for growing market � $$$$• Vertica, ParAccel, Kickfire, Calpont,

Infobright, and Exasol new entrants• Sybase IQ’s profits rapidly increasing• Yahoo’s world largest (multi-petabyte)

data warehouse is a column-store (from Mahat Technologies acquisition)

Paper Looks At Key Question

• How much of the buzz around column-stores just marketing hype?� Do you really need to buy Sybase IQ or

Vertica?� How far will your current row-store take you?

� Can you get column-store performance from a row-store?

� Can you simulate a column-store in a row-store?

Paper Methodology

• Comparing row-store vs. column-store is dangerous/borderline meaningless

• Instead, compare row-store vs. row-store and column-store vs. column-store� Simulate a column-store inside of a row-store� Remove column-oriented features from

column-store until it behaves like a row-store

Simulate Column-Store Inside Row-Store

StreetAddressPhone #E-mail

FirstName

LastName

Last Name

FirstName E-mail

Option A: Vertical Partitioning

Option B:Index Every Column

Last Name Index First Name Index

Experiments

• Star Schema Benchmark (SSBM)� Fact table contains 17 columns and 60,000,000 rows� 4 dimension tables, biggest one has 80,000 rows� Queries perform 2-4 joins between fact table and

dimension tables, aggregate 1-2 columns from fact table

� [OOC06]

• Implemented by professional DBA� Original row-store plus 2 column-store simulations on

same row-store product

SSBM Averages

Time (seconds)

Average 25.7 79.9 221.2

Normal Row-StoreVertically Partitioned

Row-Store

Row-Store With All

Indexes

What’s Going On?

• Vertically Partitioned Case� Tuple Sizes� Horizontal Partitioning

• All Indexes Case� Tuple Reconstruction

Tuple Size

ColumnData

TID ColumnData

TupleHeader

•Queries touch 3-4 foreign keys in fact table, 1-2 numeric columns

•Complete fact table takes up ~4 GB (compressed)

•Vertically partitioned tables take up 0.7-1.1 GB (compressed)

Horizontal Partitioning

• Fact table horizontally partitioned on year� Year is an element of the ‘Date’ dimension

table� Most queries in SSBM have a predicate on

year� Since vertically partitioned tables do not

contain the ‘Date’ foreign key, row-store could not similarly partition them

What’s Going On?

• Vertically Partitioned Case� Tuple Sizes� Horizontal Partitioning

• All Indexes Case� Tuple Construction

Tuple Construction

• Common type of query:� SELECT store_name, SUM(revenue)

FROM Facts, StoresWHERE fact.store_id = stores.store_id

AND stores.country = “Canada”GROUP BY store_name

Tuple Construction

• Result of lower part of query plan is a set of TIDs that passed all predicates

• Need to extract SELECT attributes at these TIDs� BUT: index maps value to TID� You really want to map TID to value (i.e., a

vertical partition)

�� Tuple construction is SLOW

So….

• All indexes approach is a poor way to simulate a column-store

• Problems with vertical partitioning are NOT fundamental� Store tuple header in a separate partition� Allow virtual TIDs� Allow HP using a foreign key on a different VP

• So can row-stores simulate column-stores?

Row-Store vs. Column-Store

Time (seconds)

Average 25.7 11.7 4.4

Row-Store Row-Store (M V) C-Store

Row-Store vs. Column-Store

Time (seconds)

Average 25.7 11.7 4.4

Row-Store Row-Store (M V) C-Store

Column-Store Experiments

• Start with column-store (C-Store)• Remove column-store-specific

performance optimizations• End with column-store with a row-oriented

query executer

Compression

• Higher data value locality in column-stores� Better ratio � reduced I/O

• Can use schemes like run-length encoding� Easy to operate on directly

for improved performance ([AMF06])

Q1Q1Q1Q1Q1Q1Q1

Q2Q2Q2Q2

Quarter

(Q1, 1, 300)

Quarter

(Q2, 301, 350)

(Q3, 651, 500)

(Q4, 1151, 600)

• Early Materialization: create rows first. But:� Poor memory bandwidth

utilization� Lose opportunity for

vectorized operation

7134280

Construct

Select + Aggregate

prodID storeIDcustID price

QUERY:

SELECT custID,SUM(price)FROM tableWHERE (prodID = 4) AND

(storeID = 1) ANDGROUP BY custID

Early vs. Late Materialization

Other Column-Store Optimizations

• Invisible join� Column-store specific join� Optimizations for star schemas

� Similar to a semi-join

• Block Processing

Simplified Version of Results

Time (seconds)

Average 4.4 14.9 40.7

Original C-St oreC-St ore, No

Compression

C-St ore, Early

Mat erializat ion

Conclusion

• Might be possible to simulate a row-store in a column-store, BUT:� Need better support for vertical partitioning at

the storage layer� Need support for column-specific

optimizations at the executer level

• Working with HP Labs to find out

Come Join the Yale DB Group!

Column-Stores vs. Row-Stores: How Different Are They Really?cs- · PDF fileColumn-Stores vs....

Documents

Transcript of Column-Stores vs. Row-Stores: How Different Are They Really?cs- · PDF fileColumn-Stores vs....

CI2416-L - Row, Row, Row Your Boat: Autodesk® AutoCAD ...aucache.autodesk.com/au2013/sessionsFiles/2416/1031/handout_2416_CI2416... · Row, Row, Row your Boat! Autodesk® AutoCAD®

Column-Stores vs. Row-Stores How Different are they Really? Daniel J. Abadi, Samuel Madden, and Nabil Hachem, SIGMOD 2008 Presented By, Paresh Modak(09305060)

Mark your calendar for Market Party!...area shops and they are terrific. We anticipate that 33 stores across the state will participate - watch the “Idaho Row By Row” FaceBook

Modern Languages 14131211109 87 6 54321 111098765 43 2 Row A Row B Row C Row D Row E Row F Row G Row H Row J Row K Row L Row M 212019181716 1514 13 12111098.

C oronavirusgo outside their homes if they really, really have to. Some people, like doctors or nurses or people who work at grocery stores or people who deliver things to our homes,

Coloumn Stores vs Row Stores

Health Stores and Vend. · Health and supplement stores can run Vend on an iPad, desktop, or laptop. It’s also really easy to set up your barcode scanners, receipt printers, and

Column-Stores vs. Row-Stores: How Different Are They Really? · either by vertically partitioning the schema, or by indexing every column so that columns can be accessed independently.

Column-Stores vs. Row-Stores: How Different Are They Really? · MonetDB [10] and the MonetDB/X100 [9] systems pioneered the design of modern column-oriented database systems and vector-ized

Indoor Air Handling Units · MSCF-FC/BI LFC VFC, VFCD 1 Row 2 Row 4 Row 6 Row 8 Row 1 Row 2 Row 4 Row 6 Row 8 Row 1 Row 2 Row 4 Row 6 Row 8 Row HW x x x x x x x x x Steam x x x x

Column-Stores vs. Row-Stores: How Different Are They Really? · Subhro Bhattacharyya Department ofComputer Science IndianInstituteofTechnology, Bombay SIGMOD (2008) Column-Storesvs.

Column-Stores vs. Row-Stores: How Different Are They Really?abadi/papers/abadi-sigmod08.pdf · column-store performance relative to traditional row-stores. Although the idea of vertically

Column Stores vs Row Stores : How Different Are They ......Is there a fundamental di erence in architecture of column oriented databases? Or can we use a more \column oriented" design

Modern Languages ML350 Renumbered 14131211109 87 6 54321 111098765 43 2 Row A Row B Row C Row D Row E Row F Row G Row H Row J Row K Row L Row M 212019181716.

C-Store: How Different are Column-Stores and Row-Stores? Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY May. 8, 2009.

C-Store: How Different are Column-Stores and Row-Stores?

Project on NoSQL under the guidance of Professor … In the computing ... one of the research paper named Column stores vs. Row Stores by Abadi, Maden and Hacchem ... Mongo Db Terrastore

4-In-A-Row Dolch Words Level 1 - Really Good Stuff 4-In-A-Row Dolch Words Level 1 1. Begin by modeling the activity, discussing every step with students. Select a student to model

DRAM Disturbance Errors - Carnegie Mellon Universityomutlu/pub/dram-row... · DRAM Disturbance Errors. DRAM Chip Row of Cells Row Row Row Row Wordline VV LOWHIGH Victim Row Victim

Bridging the Archipelago between Row-Stores and Column ... · Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads Joy Arulraj Andrew Pavlo Prashanth