Overview of the Hive Stinger Initiative

Eric N. Hanson

Principal Software Development Engineer

Microsoft HDInsight Team

30 June 2014

What is Stinger? Umbrella term for…

• Faster query in Hive

– ORC

– Vectorization

– Tez

• Better language features for analysis

– Window functions etc.

Why Stinger?

• Hive has good functionality

• But it started out sloooowww

• Need to speed it up

– Keep it competitive

– Make it fun to use

• A good columnstore format

• Run length encoding, value encoding, dictionary encoding

• Layers stream compression over the top

• Written by Owen O’Malley

• http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/ds_Hive/orcfile.html
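To make the encodings named above concrete, here is a minimal sketch of run-length and dictionary encoding in Java. It is illustrative only and is not how the ORC writer is implemented; the class and method names (ColumnEncodingSketch, runLengthEncode, dictionaryEncode) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: NOT the ORC writer. All names here are hypothetical.
final class ColumnEncodingSketch {

    // Run-length encoding: collapse each run of equal values into {value, runLength}.
    static List<long[]> runLengthEncode(long[] values) {
        List<long[]> runs = new ArrayList<>();
        int i = 0;
        while (i < values.length) {
            int j = i;
            while (j < values.length && values[j] == values[i]) {
                j++;
            }
            runs.add(new long[] {values[i], j - i});
            i = j;
        }
        return runs;
    }

    // Dictionary encoding: replace each string by a small integer id.
    // Ids are assigned in order of first appearance; the dictionary is filled as a side effect.
    static int[] dictionaryEncode(String[] values, Map<String, Integer> dictionary) {
        int[] ids = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            Integer id = dictionary.get(values[i]);
            if (id == null) {
                id = dictionary.size();
                dictionary.put(values[i], id);
            }
            ids[i] = id;
        }
        return ids;
    }

    public static void main(String[] args) {
        for (long[] run : runLengthEncode(new long[] {7, 7, 7, 9, 9, 4})) {
            System.out.println(run[0] + " x " + run[1]);
        }
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        int[] ids = dictionaryEncode(new String[] {"CA", "WA", "CA", "CA"}, dictionary);
        System.out.println(dictionary + " " + Arrays.toString(ids));
    }
}
```

Both encodings shrink columns with repeated values, which is why a general-purpose stream compressor layered on top still has less work to do.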

Using ORC

• create table Tbl (col int) stored as orc;

• orc.compress default ZLIB

• See http://www.slideshare.net/oom65/orc-andvectorizationhadoopsummit

TPC-DS File Sizes

*Courtesy of Hortonworks

Vectorization

How the code works (simplified)

class LongColumnAddLongScalarExpression {
  int inputColumn;
  int outputColumn;
  long scalar;

  void evaluate(VectorizedRowBatch batch) {
    long[] inVector = ((LongColumnVector) batch.columns[inputColumn]).vector;
    long[] outVector = ((LongColumnVector) batch.columns[outputColumn]).vector;

    if (batch.selectedInUse) {
      // Only the rows listed in batch.selected are still live
      for (int j = 0; j < batch.size; j++) {
        int i = batch.selected[j];
        outVector[i] = inVector[i] + scalar;
      }
    } else {
      // All rows in the batch are live
      for (int i = 0; i < batch.size; i++) {
        outVector[i] = inVector[i] + scalar;
      }
    }
  }
}

• No method calls

• Low instruction count

• Cache locality to 1024 values

• No pipeline stalls

• SIMD in Java 8
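The selectedInUse branch in the code above only makes sense once something has filled in the selection vector. The following self-contained sketch shows one way that can happen: a filter records the indexes of surviving rows instead of copying data, and the add then touches only those rows. MiniBatch and its methods are hypothetical stand-ins that mirror the fields used on the slide; they are not Hive classes.

```java
// Hypothetical stand-in for a row batch with one input and one output column.
// MiniBatch is not a Hive class; it only mirrors the fields used on the slide.
final class MiniBatch {
    static final int MAX_SIZE = 1024;

    final long[] in = new long[MAX_SIZE];
    final long[] out = new long[MAX_SIZE];
    final int[] selected = new int[MAX_SIZE];
    boolean selectedInUse = false;
    int size = 0;

    // Filter: keep rows whose input value is greater than bound.
    // Rows are not moved; the indexes of surviving rows are written to selected[].
    void filterGreaterThan(long bound) {
        int kept = 0;
        if (selectedInUse) {
            for (int j = 0; j < size; j++) {
                int i = selected[j];
                if (in[i] > bound) {
                    selected[kept++] = i; // safe in place because kept <= j
                }
            }
        } else {
            for (int i = 0; i < size; i++) {
                if (in[i] > bound) {
                    selected[kept++] = i;
                }
            }
            selectedInUse = true;
        }
        size = kept;
    }

    // Column-plus-scalar add with the same loop shape as the slide's example.
    void addScalar(long scalar) {
        if (selectedInUse) {
            for (int j = 0; j < size; j++) {
                int i = selected[j];
                out[i] = in[i] + scalar;
            }
        } else {
            for (int i = 0; i < size; i++) {
                out[i] = in[i] + scalar;
            }
        }
    }

    public static void main(String[] args) {
        MiniBatch batch = new MiniBatch();
        long[] values = {5, 20, 8, 30};
        System.arraycopy(values, 0, batch.in, 0, values.length);
        batch.size = values.length;

        batch.filterGreaterThan(10);
        batch.addScalar(1);
        for (int j = 0; j < batch.size; j++) {
            int i = batch.selected[j];
            System.out.println("row " + i + ": " + batch.out[i]);
        }
    }
}
```

Note that after the filter runs, size counts live rows rather than physical rows, which is why both loops use it as their bound.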

Vectorization and Compilation

• Vectorization “instructions” generated from templates

• Examples:

– Int add col-col

– Int add col-scalar

– Int add scalar-col

– Double add col-col

– Double add col-scalar

– Double add scalar-col

– And hundreds more!

• Pre-compilation of expressions

• Reduces # of function calls and instructions at runtime

• Expressions like (a + 2) / b are interpreted with these primitives
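As a rough illustration of the last bullet, the sketch below evaluates (a + 2) / b by chaining two primitives, a column-scalar add and a column-column divide, through a scratch column. The chain is built once and then run per batch. The VectorExpression interface, the column layout, and all method names are assumptions made for this example, not Hive's actual classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: (a + 2) / b evaluated by chaining two vector primitives.
final class ExpressionChainSketch {

    // One primitive operating on whole columns; n is the number of rows in the batch.
    interface VectorExpression {
        void evaluate(double[][] columns, int n);
    }

    static VectorExpression addScalar(int inCol, double scalar, int outCol) {
        return (columns, n) -> {
            double[] in = columns[inCol];
            double[] out = columns[outCol];
            for (int i = 0; i < n; i++) {
                out[i] = in[i] + scalar;
            }
        };
    }

    static VectorExpression divideColumns(int leftCol, int rightCol, int outCol) {
        return (columns, n) -> {
            double[] left = columns[leftCol];
            double[] right = columns[rightCol];
            double[] out = columns[outCol];
            for (int i = 0; i < n; i++) {
                out[i] = left[i] / right[i];
            }
        };
    }

    // Column layout assumed here: 0 = a, 1 = b, 2 = scratch for (a + 2), 3 = result.
    static double[] evalAPlusTwoOverB(double[] a, double[] b) {
        int n = a.length;
        double[][] columns = {a, b, new double[n], new double[n]};
        // "Pre-compilation": the chain is decided once, not per row.
        List<VectorExpression> plan = Arrays.asList(
                addScalar(0, 2.0, 2),
                divideColumns(2, 1, 3));
        for (VectorExpression step : plan) {
            step.evaluate(columns, n);
        }
        return columns[3];
    }

    public static void main(String[] args) {
        double[] result = evalAPlusTwoOverB(new double[] {4, 10, 1}, new double[] {2, 4, 3});
        System.out.println(Arrays.toString(result));
    }
}
```

The cost of dispatching through the plan is paid once per batch per primitive, not once per row, which is the point of the approach.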

Example of vectorized template code

} else {
  if (batch.selectedInUse) {
    for (int j = 0; j != n; j++) {
      int i = sel[j];
      outputVector[i] = vector1[i] <OperatorSymbol> vector2[i];
    }
  } else {
    for (int i = 0; i != n; i++) {
      outputVector[i] = vector1[i] <OperatorSymbol> vector2[i];
    }
  }
}
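The template excerpt above becomes real code once placeholders such as <OperatorSymbol> are substituted. Below is a minimal sketch of that substitution step using plain string replacement. The cut-down template text, the extra <ClassName> and <Type> placeholders, and the generated class names are assumptions for illustration; Hive's own generator and templates are more elaborate.

```java
// Hypothetical sketch of template expansion by string substitution.
final class TemplateExpansionSketch {

    // Cut-down template in the spirit of the slide. <ClassName> and <Type> are
    // placeholders invented for this example; <OperatorSymbol> appears on the slide.
    static final String TEMPLATE =
            "class <ClassName> {\n"
            + "  static void evaluate(<Type>[] vector1, <Type>[] vector2, <Type>[] outputVector, int n) {\n"
            + "    for (int i = 0; i != n; i++) {\n"
            + "      outputVector[i] = vector1[i] <OperatorSymbol> vector2[i];\n"
            + "    }\n"
            + "  }\n"
            + "}\n";

    static String expand(String className, String type, String operatorSymbol) {
        return TEMPLATE
                .replace("<ClassName>", className)
                .replace("<Type>", type)
                .replace("<OperatorSymbol>", operatorSymbol);
    }

    public static void main(String[] args) {
        String[][] operators = {{"Add", "+"}, {"Subtract", "-"}, {"Multiply", "*"}};
        String[][] types = {{"Long", "long"}, {"Double", "double"}};
        // One generated class per (type, operator) pair: 2 x 3 = 6 classes here.
        for (String[] type : types) {
            for (String[] op : operators) {
                String className = type[0] + "Col" + op[0] + type[0] + "Col";
                System.out.println(expand(className, type[1], op[1]));
            }
        }
    }
}
```

Generating one specialized class per type and operator pair is what lets each inner loop stay free of branches on type or operator.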

Using vectorization in Hive

• set hive.vectorized.execution.enabled = true;

• Run query over ORC

• Only works for scalar types

• https://cwiki.apache.org/confluence/display/Hive/Vectorized+Query+Execution

• ~5X CPU reduction

Apache Tez (“Speed”)

• Replaces MapReduce as primitive for Pig, Hive, Cascading etc.

– Smaller latency for interactive queries

– Higher throughput for batch queries

– 22 contributors: Hortonworks (13), Facebook, Twitter, Yahoo, Microsoft

Tez: Building blocks for scalable data processing

• YARN ApplicationMaster to run DAG of Tez Tasks

• Task with pluggable Input, Processor and Output: Tez Task - <Input, Processor, Output>

• Classical ‘Map’: HDFS Input, Map Processor, Sorted Output

• Classical ‘Reduce’: Shuffle Input, Reduce Processor, HDFS Output

• Intermediate ‘Reduce’ for Map-Reduce-Reduce: Shuffle Input, Reduce Processor, Sorted Output
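The <Input, Processor, Output> building block can be modeled in a few lines of plain Java. The sketch below wires three such tasks into a Map-Reduce-Reduce chain in which the intermediate results are handed from task to task in memory. It is a toy model only: Task, runMapReduceReduce, and the in-memory "edges" are hypothetical and do not use or resemble the real Tez API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Toy model of the <Input, Processor, Output> triple. This is NOT the Tez API.
final class TezTaskSketch {

    static final class Task<I, O> {
        final Supplier<I> input;
        final Function<I, O> processor;
        final Consumer<O> output;

        Task(Supplier<I> input, Function<I, O> processor, Consumer<O> output) {
            this.input = input;
            this.processor = processor;
            this.output = output;
        }

        void run() {
            output.accept(processor.apply(input.get()));
        }
    }

    // Map -> Reduce -> Reduce: count words, then count how many words share each count.
    static Map<Integer, Integer> runMapReduceReduce(List<String> lines) {
        List<String> mapOut = new ArrayList<>();            // edge: map -> reduce 1
        Map<String, Integer> reduce1Out = new TreeMap<>();  // edge: reduce 1 -> reduce 2
        Map<Integer, Integer> finalOut = new TreeMap<>();   // stands in for the final HDFS output

        Task<List<String>, List<String>> map = new Task<List<String>, List<String>>(
                () -> lines,
                in -> {
                    List<String> words = new ArrayList<>();
                    for (String line : in) {
                        for (String word : line.split("\\s+")) {
                            if (!word.isEmpty()) {
                                words.add(word);
                            }
                        }
                    }
                    Collections.sort(words); // "sorted output"
                    return words;
                },
                out -> mapOut.addAll(out));

        Task<List<String>, Map<String, Integer>> reduce1 = new Task<List<String>, Map<String, Integer>>(
                () -> mapOut,
                in -> {
                    Map<String, Integer> counts = new TreeMap<>();
                    for (String word : in) {
                        counts.merge(word, 1, Integer::sum);
                    }
                    return counts;
                },
                out -> reduce1Out.putAll(out));

        Task<Map<String, Integer>, Map<Integer, Integer>> reduce2 =
                new Task<Map<String, Integer>, Map<Integer, Integer>>(
                () -> reduce1Out,
                in -> {
                    Map<Integer, Integer> histogram = new TreeMap<>();
                    for (int count : in.values()) {
                        histogram.merge(count, 1, Integer::sum);
                    }
                    return histogram;
                },
                out -> finalOut.putAll(out));

        map.run();
        reduce1.run();
        reduce2.run();
        return finalOut;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("a b");
        lines.add("a c");
        System.out.println(runMapReduceReduce(lines));
    }
}
```

In this model only the last task writes to the stand-in for durable storage; the first reduce passes its result straight to the second, which is the shape the Map-Reduce-Reduce pattern allows.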

Hive-on-MR vs. Hive-on-Tez

SELECT a.x, AVERAGE(b.y) AS avg
FROM a JOIN b ON (a.id = b.id) GROUP BY a.x
UNION SELECT x, AVERAGE(y) AS AVG
FROM c GROUP BY x
ORDER BY AVG;

[Figure: two query plan diagrams, labeled “Hive – MR” and “Hive – Tez”. Operator labels in the diagrams: SELECT a.state; SELECT c.price; SELECT a.state, c.itemId; SELECT b.id; JOIN (a, c); JOIN (a, b); GROUP BY a.state; COUNT(*); AVERAGE(c.price).]

Tez avoids unneeded writes to HDFS

Tez Sessions

… because Map/Reduce query startup is expensive

• Tez Sessions

– Hot containers ready for immediate use

– Removes task and job launch overhead (~5s – 30s)

• Hive

– Session launch/shutdown in background (seamless, user not aware)

– Submits query plan directly to Tez Session

Native Hadoop service, not ad-hoc

Stinger Phase 3: Interactive Query In Hadoop

[Chart: query times for Hive 10, Hive 0.11 (Phase 1) and Trunk (Phase 3)]

• TPC-DS Query 27 (Pricing Analytics using Star Schema Join): 190x improvement

• TPC-DS Query 82 (Inventory Analytics Joining 2 Large Fact Tables): 200x improvement

• All Results at Scale Factor 200 (Approximately 200GB Data)

How you can use Stinger enhancements

• Use Hive 0.13

• Use ORC: create table … stored as ORC

• Enable vectorization: set hive.vectorized.execution.enabled=true

• Enable Tez: set hive.execution.engine=tez

• See http://hortonworks.com/hadoop-tutorial/supercharging-interactive-queries-hive-tez/

Reference(s)

• Stinger overview, Strata, fall 2013: http://www.slideshare.net/alanfgates/strata-stingertalk-oct2013?qid=09d16028-bd7e-47d8-8438-34f3242c6f0e&v=qf1&b=&from_search=1

Slides marked “Courtesy of Hortonworks” are from Hortonworks talks
