MapReduce and DBMS Hybrids
Transcript of MapReduce and DBMS Hybrids
12: MapReduce and DBMS Hybrids
Zubair Nabi
May 26, 2013
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 1 / 37
Outline
1 Hive
2 HadoopDB
3 nCluster
4 Summary
Introduction
- Data warehousing solution built atop Hadoop by Facebook
- Now an Apache open source project
- Queries are expressed in SQL-like HiveQL, which is compiled into map-reduce jobs
- Also contains a type system for describing RDBMS-like tables
- A system catalog, the Hive Metastore, which contains schemas and statistics, is used for data exploration and query optimization
- Stores 2 PB of uncompressed data at Facebook and is heavily used for simple summarization, business intelligence, and machine learning, among many other applications [1]
- Also used by Digg, Grooveshark, hi5, Last.fm, Scribd, etc.

[1] https://www.facebook.com/note.php?note_id=89508453919
Data Model
Tables:I Similar to RDBMS tables
I Each table has a corresponding HDFS directoryI The contents of the table are serialized and stored in files within that
directoryI Serialization can be both system provided or user definedI Serialization information of each table is also stored in the
Hive-Metastore for query optimizationI Tables can also be defined for data stored in external sources such as
HDFS, NFS, and local FS
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 5 / 37
Data Model (2)
- Partitions:
  - Determine the distribution of data within sub-directories of the main table directory
  - For instance, for a table T stored in /wh/T and partitioned on columns ds and ctry:
    - Data with ds value 20090101 and ctry value US
    - Will be stored in files within /wh/T/ds=20090101/ctry=US
- Buckets:
  - Data within partitions is divided into buckets
  - Buckets are calculated based on the hash of a column within the partition
  - Each bucket is stored within a file in the partition directory
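The partition directory layout and bucket assignment described above can be sketched in Python. This is illustrative only: the column names and bucket count are hypothetical, and CRC32 merely stands in for Hive's actual (different) hash function.

```python
import zlib

def partition_path(warehouse, table, partition_cols):
    """Build the HDFS-style directory for a row's partition values."""
    parts = "/".join(f"{col}={val}" for col, val in partition_cols)
    return f"{warehouse}/{table}/{parts}"

def bucket_file(value, num_buckets):
    """Assign a row to a bucket file by hashing the bucketing column.
    CRC32 is a deterministic stand-in for Hive's hash."""
    h = zlib.crc32(str(value).encode())
    return f"bucket_{h % num_buckets:05d}"

# A row with ds=20090101 and ctry=US for table T in warehouse /wh:
path = partition_path("/wh", "T", [("ds", "20090101"), ("ctry", "US")])
print(path)  # -> /wh/T/ds=20090101/ctry=US
print(bucket_file("user123", 32))  # one of 32 bucket files in that directory
```

The same userid always hashes to the same bucket file, which is what lets Hive prune both partitions and buckets when answering a query.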
Column Data Types
- Primitive types: integers, floats, strings, dates, and booleans
- Nestable collection types: arrays and maps
- Custom types: user-defined
HiveQL
- Supports select, project, join, aggregate, union all, and sub-queries
- Tables are created using data definition statements with specific serialization formats, partitioning, and bucketing
- Data is loaded from external sources and inserted into tables
- Support for multi-table insert: multiple queries over the same input data using a single HiveQL statement
- User-defined column transformation and aggregation functions in Java
- Custom map-reduce scripts written in any language can be embedded
Example: Facebook Status
- Status updates are stored in flat files in an NFS directory /logs/status_updates
- This data is loaded on a daily basis into a Hive table: status_updates(userid int, status string, ds string)
- Using:

  LOAD DATA LOCAL INPATH '/logs/status_updates'
  INTO TABLE status_updates PARTITION (ds='2013-05-26')

- Detailed profile information, such as gender and academic institution, is present in the table: profiles(userid int, school string, gender int)
Example: Facebook Status (2)
- Query to work out the frequency of status updates by gender and academic institution:

  FROM (SELECT a.status, b.school, b.gender
        FROM status_updates a JOIN profiles b
        ON (a.userid = b.userid and
            a.ds='2013-05-26')
       ) subq1
  INSERT OVERWRITE TABLE gender_summary
    PARTITION(ds='2013-05-26')
    SELECT subq1.gender, COUNT(1) GROUP BY subq1.gender
  INSERT OVERWRITE TABLE school_summary
    PARTITION(ds='2013-05-26')
    SELECT subq1.school, COUNT(1) GROUP BY subq1.school
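The point of the multi-table insert above is that one scan of the joined data feeds two separate aggregations. Its semantics can be simulated in plain Python with toy in-memory rows (the data values are made up; the names follow the query):

```python
from collections import Counter

# Toy stand-ins for the status_updates and profiles tables.
status_updates = [
    {"userid": 1, "status": "hello", "ds": "2013-05-26"},
    {"userid": 2, "status": "hi",    "ds": "2013-05-26"},
    {"userid": 1, "status": "yo",    "ds": "2013-05-25"},  # filtered out by ds
]
profiles = {  # userid -> profile row
    1: {"school": "MIT", "gender": 0},
    2: {"school": "CMU", "gender": 1},
}

# The FROM (... JOIN ...) subq1: join on userid, filter on ds.
subq1 = [
    {"status": s["status"], **profiles[s["userid"]]}
    for s in status_updates
    if s["ds"] == "2013-05-26" and s["userid"] in profiles
]

# Two INSERTs over the same scanned data: gender_summary and school_summary.
gender_summary = Counter(row["gender"] for row in subq1)
school_summary = Counter(row["school"] for row in subq1)

print(dict(gender_summary))  # {0: 1, 1: 1}
print(dict(school_summary))  # {'MIT': 1, 'CMU': 1}
```

In Hive this compiles to a map-reduce plan where the join output is materialized once and both GROUP BYs consume it, rather than scanning the input twice.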
Metastore
- Similar to the metastore maintained by traditional warehousing solutions such as Oracle and IBM DB2 (this distinguishes Hive from Pig or Cascading, which have no such store)
- Stored either in a traditional DB such as MySQL or in an FS such as NFS
- Contains the following objects:
  - Database: namespace for tables
  - Table: metadata for a table, including columns and their types, owner, storage, and serialization information
  - Partition: metadata for a partition; similar to the information for a table
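The three object kinds can be pictured as simple records. This is a hypothetical, minimal rendering; the real Metastore schema has many more fields than shown here:

```python
from dataclasses import dataclass, field

@dataclass
class Table:
    name: str
    columns: dict          # column name -> type
    owner: str
    storage_location: str  # e.g. an HDFS directory
    serde: str             # serialization/deserialization info

@dataclass
class Partition:
    table: str
    values: dict           # partition column -> value
    storage_location: str
    serde: str

@dataclass
class Database:
    name: str                               # namespace for tables
    tables: dict = field(default_factory=dict)

# Register the example table from the previous slides.
db = Database("default")
db.tables["status_updates"] = Table(
    name="status_updates",
    columns={"userid": "int", "status": "string", "ds": "string"},
    owner="facebook",
    storage_location="/wh/status_updates",
    serde="LazySimpleSerDe")
print(db.tables["status_updates"].owner)  # facebook
```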
Outline
1 Hive
2 HadoopDB
3 nCluster
4 Summary
Hybrid
- Combine the scalability and non-existent monetary cost of MapReduce with the performance of parallel DBs
- HadoopDB is such a hybrid
  - Unlike Hive, Pig, Greenplum, Aster, etc., which are language- and interface-level hybrids, HadoopDB is a systems-level hybrid
- Uses MapReduce as the communication layer atop a cluster of nodes running single-node DBMS instances
- PostgreSQL as the database layer, Hadoop as the communication layer, and Hive as the translation layer
- Commercialized through the startup Hadapt [2]

[2] http://hadapt.com/
HadoopDB
Consists of four components:
1. Database Connector: interface between per-node database systems and Hadoop TaskTrackers
2. Catalog: meta-information about per-node databases
3. Data Loader: data partitioning across single-node databases
4. SQL to MapReduce to SQL (SMS) Planner: translation between SQL and MapReduce
HadoopDB Architecture
[architecture diagram not reproduced in the transcript]
Database Connector
- Uses the Java Database Connectivity (JDBC)-compliant Hadoop InputFormat
- The connector is served the SQL query and other information by the MapReduce job
- The connector connects to the DB, executes the SQL query, and returns results in the form of key/value pairs
- Hadoop in essence sees the DB as just another data source
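A toy version of the connector's contract — take a pushed-down SQL query, run it against a node-local database, and emit key/value pairs — can be sketched with Python's built-in sqlite3 standing in for PostgreSQL and for the real InputFormat machinery:

```python
import sqlite3

def db_records(conn, sql):
    """Run the pushed-down SQL and yield (key, value) pairs, the shape
    Hadoop expects from an InputFormat's record reader."""
    cur = conn.execute(sql)
    for i, row in enumerate(cur):
        yield i, row  # key = record offset, value = the result tuple

# Stand-in for a node-local database populated by the Data Loader.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (userid INTEGER, school TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "MIT"), (2, "CMU")])

for key, value in db_records(conn, "SELECT school FROM t ORDER BY userid"):
    print(key, value)  # 0 ('MIT',) then 1 ('CMU',)
```

From Hadoop's point of view, the map tasks consuming these pairs cannot tell whether they came from an HDFS file or from a database, which is exactly the "just another data source" property above.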
Catalog
Contains information, such as:
1 Connection parameters, such as DB location, format, and any credentials
2 Metadata about the datasets, replica locations, and partitioning scheme
Stored as an XML file on the HDFS
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 18 / 37
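The deck does not show the catalog's actual schema; a hypothetical XML file of roughly this shape, parsed with Python's standard library, illustrates the kind of information it holds (element and attribute names are invented for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog entry; HadoopDB's real XML schema may differ.
catalog_xml = """
<catalog>
  <node id="worker-1">
    <connection url="jdbc:postgresql://worker-1/sales"
                user="hadoopdb" driver="org.postgresql.Driver"/>
    <dataset name="sales" partition_key="region">
      <chunk id="0" replica="worker-2"/>
      <chunk id="1" replica="worker-3"/>
    </dataset>
  </node>
</catalog>
"""

root = ET.fromstring(catalog_xml)
node = root.find("node")
conn = node.find("connection")          # connection parameters
chunks = node.findall(".//chunk")       # replica locations

print(node.get("id"))                        # worker-1
print(conn.get("url"))                       # jdbc:postgresql://worker-1/sales
print([c.get("replica") for c in chunks])    # ['worker-2', 'worker-3']
```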
Data Loader
Consists of two key components:
1 Global Hasher: Executes a custom Hadoop job to repartition raw data files from the HDFS into n parts, where n is the number of nodes in the cluster
2 Local Hasher: Copies a partition from the HDFS to the node-local DB of each node and further partitions it into smaller-sized chunks
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 19 / 37
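The two-level partitioning can be sketched in a few lines of Python; the hash function and chunk count here are illustrative (HadoopDB uses its own hash functions and chunk-size policy):

```python
from collections import defaultdict

def global_hash(records, key, n_nodes):
    """Global Hasher: repartition raw records into one part per node."""
    parts = defaultdict(list)
    for rec in records:
        parts[hash(rec[key]) % n_nodes].append(rec)
    return parts

def local_hash(partition, key, n_chunks):
    """Local Hasher: split one node's partition into smaller chunks."""
    chunks = defaultdict(list)
    for rec in partition:
        chunks[hash(rec[key]) % n_chunks].append(rec)
    return chunks

records = [{"userid": u, "clicks": c} for u, c in
           [(1, 5), (2, 3), (3, 7), (4, 1), (5, 2), (6, 9)]]

node_parts = global_hash(records, "userid", n_nodes=3)
chunked = {node: local_hash(part, "userid", n_chunks=2)
           for node, part in node_parts.items()}

# Every record lands on exactly one node, then in exactly one chunk.
total = sum(len(c) for chunks in chunked.values() for c in chunks.values())
print(total)  # 6
```

Because both levels hash on the same key, all rows sharing a partitioning key end up in the same node-local database, which is what lets the SMS Planner push work down later.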
SQL to MapReduce to SQL (SMS) Planner
Extends HiveQL in two key ways:
1 Before query execution, the Hive Metastore is updated with references to HadoopDB tables, table schemas, formats, and serialization information
2 All operators with partitioning keys similar to the node-local database are converted into SQL queries and pushed to the database layer
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 20 / 37
Outline
1 Hive
2 HadoopDB
3 nCluster
4 Summary
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 21 / 37
Introduction
The declarative nature of SQL is too limiting for describing most big data computation
The underlying subsystems are also suboptimal as they do not consider domain-specific optimizations
nCluster makes use of SQL/MR, a framework that inserts user-defined functions in any programming language into SQL queries
By itself, nCluster is a shared-nothing parallel database geared towards analytic workloads
Originally designed by Aster Data Systems and later acquired by Teradata
Used by Barnes and Noble, LinkedIn, SAS, etc.
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 22 / 37
SQL/MR Functions
Dynamically polymorphic: input and output schemas are decided at runtime
Parallelizable across cores and machines
Composable because their input and output behaviour is identical to SQL subqueries
Amenable to static and dynamic optimizations just like SQL subqueries or a relation
Can be implemented in a number of languages including Java, C#, C++, Python, etc. and can thus make use of third-party libraries
Executed within separate processes to provide sandboxing and resource allocation
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 23 / 37
Syntax

SELECT ...
FROM functionname(
    ON table-or-query
    [PARTITION BY expr, ...]
    [ORDER BY expr, ...]
    [clausename(arg, ...) ...]
)
...

SQL/MR function appears in the FROM clause
ON is the only required clause; it specifies the input to the function
PARTITION BY partitions the input to the function on one or more attributes from the schema
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 24 / 37
Syntax (2)

SELECT ...
FROM functionname(
    ON table-or-query
    [PARTITION BY expr, ...]
    [ORDER BY expr, ...]
    [clausename(arg, ...) ...]
)
...

ORDER BY sorts the input to the function and can only be used after a PARTITION BY clause
Any number of custom clauses can also be defined whose names and arguments are passed as a key/value map to the function
Implemented as relations so easily nestable
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 25 / 37
Execution Model
Functions are equivalent to either map (row function) or reduce (partition function) functions
Identical to MapReduce, these functions are executed across many nodes and machines
Contracts identical to MapReduce functions:
I Only one row function operates over a row from the input table
I Only one partition function operates over a group of rows defined by the PARTITION BY clause, in the order specified by the ORDER BY clause
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 26 / 37
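The row/partition contract can be mimicked in plain Python, where a sort plus groupby plays the role of PARTITION BY ... ORDER BY; the function names here are illustrative, not SQL/MR's API:

```python
from itertools import groupby
from operator import itemgetter

def apply_row_function(rows, fn):
    """Row function: invoked once per input row (map-like)."""
    out = []
    for row in rows:
        out.extend(fn(row))
    return out

def apply_partition_function(rows, partition_key, order_key, fn):
    """Partition function: invoked once per PARTITION BY group, with
    rows delivered in ORDER BY order (reduce-like)."""
    rows = sorted(rows, key=itemgetter(partition_key, order_key))
    out = []
    for key, group in groupby(rows, key=itemgetter(partition_key)):
        out.extend(fn(key, list(group)))
    return out

clicks = [
    {"userid": 7656, "ts": 2}, {"userid": 238909, "ts": 1},
    {"userid": 238909, "ts": 3}, {"userid": 7656, "ts": 1},
]

# Row function: project a single column, one output row per input row.
users = apply_row_function(clicks, lambda r: [r["userid"]])

# Partition function: count clicks per user.
counts = apply_partition_function(
    clicks, "userid", "ts",
    lambda user, rows: [(user, len(rows))])

print(sorted(set(users)))  # [7656, 238909]
print(counts)              # [(7656, 2), (238909, 2)]
```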
Programming Interface
A Runtime Contract is passed by the query planner to the function, which contains the names and types of the input columns and the names and values of the argument clauses
The function then completes this contract by filling in the output schema and making a call to complete()
Row and partition functions are implemented through the operateOnSomeRows and operateOnPartition methods, respectively
I These methods are passed an iterator over their input rows and an emitter object for returning output rows to the database
operateOnPartition can also optionally implement the combiner interface
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 27 / 37
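A toy Python mock of the contract negotiation; the class and method names echo the deck's Java interface, but the mechanics here are invented purely for illustration:

```python
class RuntimeContract:
    """Illustrative stand-in for SQL/MR's runtime contract object."""

    def __init__(self, input_columns, clause_args):
        self.input_columns = input_columns  # e.g. [("ts", "int"), ...]
        self.clause_args = clause_args      # custom clause key/value map
        self.output_columns = None
        self.completed = False

    def set_output_columns(self, columns):
        self.output_columns = columns

    def complete(self):
        # After this call the planner knows the function's output schema.
        assert self.output_columns is not None
        self.completed = True

class Sessionize:
    def __init__(self, contract):
        # Read custom clause arguments, then fill in the output schema:
        # the input columns plus a computed session column.
        self.timeout = int(contract.clause_args["TIMEOUT"])
        contract.set_output_columns(
            contract.input_columns + [("session", "int")])
        contract.complete()

contract = RuntimeContract(
    input_columns=[("ts", "int"), ("userid", "int")],
    clause_args={"TIMECOLUMN": "ts", "TIMEOUT": "60"})
fn = Sessionize(contract)
print(contract.completed)           # True
print(contract.output_columns[-1])  # ('session', 'int')
```

This is what makes the functions dynamically polymorphic: the output schema is computed from whatever input schema and clause arguments arrive at plan time, not fixed at compile time.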
Installation
Functions need to be installed before they can be used
Can be supplied as a .zip along with third-party libraries
Install-time examination also enables static analysis of properties, such as row function vs. partition function, support for combining, etc.
Any arbitrary file, such as configuration files or binaries, can be installed; installed files are replicated to all workers
Each function is provided with a temporary directory which is garbage collected after execution
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 28 / 37
Architecture
One or more Queen nodes process queries and hash partition them across Worker nodes
The query planner honours the Runtime Contract with the function and invokes its initializer (constructor in the case of Java)
Functions are executed within the Worker databases as separate processes for isolation, security, resource allocation, forced termination, etc.
The worker database implements a "bridge" which manages its communication with the SQL/MR function
The SQL/MR function process contains a "runner" which manages its communication with the worker database
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 29 / 37
Architecture (2)
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 30 / 37
Example: Wordcount

SELECT token, COUNT(*)
FROM tokenizer(
    ON input-table
    DELIMITER(' ')
)
GROUP BY token;
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 31 / 37
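The same computation stripped of SQL/MR, in plain Python: the tokenizer plays the row function, and a counter plays GROUP BY token + COUNT(*):

```python
from collections import Counter

def tokenizer(rows, delimiter=" "):
    """Row function: split each input row into one output row per token."""
    for row in rows:
        for token in row.split(delimiter):
            if token:
                yield token

input_table = ["the quick brown fox", "the lazy dog", "the fox"]

# GROUP BY token + COUNT(*)
counts = Counter(tokenizer(input_table))
print(counts["the"])  # 3
print(counts["fox"])  # 2
```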
Example: Clickstream Sessionization
Divide a user's clicks on a website into sessions
A session includes the user's clicks within a specified time period

Input:
Timestamp   User ID
10:00:00    238909
00:58:24    7656
10:00:24    238909
02:30:33    7656
10:01:23    238909
10:02:40    238909

Output:
Timestamp   User ID   Session ID
10:00:00    238909    0
10:00:24    238909    0
10:01:23    238909    0
10:02:40    238909    1
00:58:24    7656      0
02:30:33    7656      1
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 32 / 37
Example: Clickstream Sessionization (2)

SELECT ts, userid, session
FROM sessionize (
    ON clicks
    PARTITION BY userid
    ORDER BY ts
    TIMECOLUMN ('ts')
    TIMEOUT (60)
);
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 33 / 37
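A minimal Python sketch of the sessionize logic that reproduces the example table, assuming timestamps are seconds-of-day and TIMEOUT is in seconds (the real SQL/MR function is a Java partition function):

```python
from itertools import groupby
from operator import itemgetter

def to_seconds(ts):
    h, m, s = map(int, ts.split(":"))
    return h * 3600 + m * 60 + s

def sessionize(clicks, timeout=60):
    """Partition by userid, order by ts, and start a new session
    whenever the gap between consecutive clicks exceeds `timeout`."""
    out = []
    clicks = sorted(clicks, key=lambda c: (c[1], to_seconds(c[0])))
    for userid, rows in groupby(clicks, key=itemgetter(1)):
        session, prev = 0, None
        for ts, _ in rows:
            t = to_seconds(ts)
            if prev is not None and t - prev > timeout:
                session += 1
            out.append((ts, userid, session))
            prev = t
    return out

clicks = [("10:00:00", 238909), ("00:58:24", 7656),
          ("10:00:24", 238909), ("02:30:33", 7656),
          ("10:01:23", 238909), ("10:02:40", 238909)]

result = sessionize(clicks)
for row in result:
    print(row)
```

With a 60-second timeout, 238909's first three clicks (gaps of 24 s and 59 s) share session 0, while the 77-second gap before 10:02:40 opens session 1, matching the table on the previous slide.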
Example: Clickstream Sessionization (3)

public class Sessionize implements PartitionFunction {

  private int timeColumnIndex;
  private int timeout;

  public Sessionize(RuntimeContract contract) {
    // Get time column and timeout from contract
    // Define output schema
    contract.complete();
  }

  public void operateOnPartition(
      PartitionDefinition partition,
      RowIterator inputIterator,
      RowEmitter outputEmitter) {
    // Implement the partition function logic
    // Emit output rows
  }

}
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 34 / 37
Outline
1 Hive
2 HadoopDB
3 nCluster
4 Summary
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 35 / 37
Summary
Hive, HadoopDB, and nCluster explore three different points in the design space
1 Hive uses MapReduce to give DBMS-like functionality
2 HadoopDB uses MapReduce and DBMS side-by-side
3 nCluster implements MapReduce within a DBMS
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 36 / 37
References
1 Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, and Raghotham Murthy. 2009. Hive: a warehousing solution over a map-reduce framework. Proc. VLDB Endow. 2, 2 (August 2009), 1626-1629.
2 Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz, and Alexander Rasin. 2009. HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. Proc. VLDB Endow. 2, 1 (August 2009), 922-933.
3 Eric Friedman, Peter Pawlowski, and John Cieslewicz. 2009. SQL/MapReduce: a practical approach to self-describing, polymorphic, and parallelizable user-defined functions. Proc. VLDB Endow. 2, 2 (August 2009), 1402-1413.
Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 37 / 37