OSCON 2013: Apache Drill Workshop > Execution & ValueVectors

Discussion of Drill execution strategies

Apache Drill: Execution

Jacques Nadeau, OSCON July 23, 2013

[email protected] |@intjesus

Drill is…

– Optimistic & Pipelined
– Columnar & Late materialized
– Vectorized
– Language Agnostic
– MPP Query Engine

Optimistic Execution

Optimistic Recovery
Pipelined Scheduling
Pipelined Communication

Optimistic Recovery

Assume failures
Don’t overbuild for them
– The shorter the queries, the less work lost on failure

Graceful management of node failure at a system level
– Individual queries must be rerun

Avoid the overhead of persistence and barriers.

Pipelined Operators

Pipelining: push data along as soon as it is available
– Cross-operator and cross-node

Straightforward for simple operators like filter and project

Also possible with less common things like sort and radix hash join
– External sort: merge only what is needed to push the first part of the data down the pipeline

Destination buffering rather than source buffering

Full pipelining requires query at once scheduling

Query at Once: schedule entire query at once

Pros:
– Fastest data movement
– Less herd effect

Cons:
– Poorer workload distribution
– Failure checkpoints hard

Task by Task: schedule each task when all previous tasks are completed

Pros:
– Potential better workload distribution
– Failure checkpoints straightforward

Cons:
– Slower data movement
– Poorer routing decision

Comparison with MapReduce

Barriers
– Map completion required before shuffle/reduce commencement
– All maps must complete before reduce can start
– In chained jobs, one job must finish entirely before the next one can start

Persistence and Recoverability
– Data is persisted to disk between each barrier
– Serialization and deserialization are required between execution phases

Record versus Columnar Representation

[Figure: Record layout | Column layout]

Data Format Example

Donut                         Price  Icing
Bacon Maple Bar               2.19   [Maple Frosting, Bacon]
Portland Cream                1.79   [Chocolate]
The Loop                      2.29   [Vanilla, Fruitloops]
Triple Chocolate Penetration  2.79   [Chocolate, Cocoa Puffs]

Record Encoding
Bacon Maple Bar, 2.19, Maple Frosting, Bacon, Portland Cream, 1.79, Chocolate, The Loop, 2.29, Vanilla, Fruitloops, Triple Chocolate Penetration, 2.79, Chocolate, Cocoa Puffs

Columnar Encoding
Bacon Maple Bar, Portland Cream, The Loop, Triple Chocolate Penetration
2.19, 1.79, 2.29, 2.79
Maple Frosting, Bacon, Chocolate, Vanilla, Fruitloops, Chocolate, Cocoa Puffs
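The two encodings of the donut data can be sketched in a few lines of Java. This is only an illustration of the ordering: the class and method names are hypothetical and are not Drill APIs, and prices are kept as strings to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: names here are hypothetical, not Drill APIs.
class DonutEncodings {
    static final String[] NAMES = {
        "Bacon Maple Bar", "Portland Cream", "The Loop", "Triple Chocolate Penetration"};
    static final String[] PRICES = {"2.19", "1.79", "2.29", "2.79"};
    static final String[][] ICINGS = {
        {"Maple Frosting", "Bacon"}, {"Chocolate"},
        {"Vanilla", "Fruitloops"}, {"Chocolate", "Cocoa Puffs"}};

    // Record encoding: every field of row 0, then every field of row 1, and so on.
    static List<String> recordEncoding() {
        List<String> out = new ArrayList<>();
        for (int row = 0; row < NAMES.length; row++) {
            out.add(NAMES[row]);
            out.add(PRICES[row]);
            out.addAll(Arrays.asList(ICINGS[row]));
        }
        return out;
    }

    // Columnar encoding: all names, then all prices, then all icings.
    static List<List<String>> columnarEncoding() {
        List<String> icings = new ArrayList<>();
        for (String[] rowIcings : ICINGS) {
            icings.addAll(Arrays.asList(rowIcings));
        }
        List<List<String>> columns = new ArrayList<>();
        columns.add(Arrays.asList(NAMES));
        columns.add(Arrays.asList(PRICES));
        columns.add(icings);
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(recordEncoding());
        System.out.println(columnarEncoding());
    }
}
```

A query that only needs prices touches one contiguous list in the columnar form, while the record form forces it to step over names and icings.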

Places to Apply Columnar

Columnar Storage (on disk)
– Improved compression when similar data is co-located
– Alternative compression techniques: dictionary, RLE, delta
– Avoid column reads when not needed

Columnar Execution (in memory)
– Improved cache locality
– Improved CPU pipelining (especially with things like null checks)
– Can reduce memory copies
– Maintain unusual encoding schemas for direct relational operator use

Columnar Execution: When to materialize

Users want rows
Data is columnar
When do you transform?
– On read into memory
– On return to user
– Somewhere in between

Later is generally better
– Not always :)

Late Decompression

Don’t necessarily materialize each value
Reduce memory consumption
Reduce CPU cost
Examples: RLE, Bit Dictionary

Example: RLE and Sum

Dataset (value, run length)
– 2, 4
– 8, 10

Goal
– Sum all the records

Normal Work
– Decompress & store: 2, 2, 2, 2, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8
– Add: 2 + 2 + 2 + 2 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8

Optimized Work
– 2 * 4 + 8 * 10
– Less memory, fewer operations
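A minimal Java sketch of the two approaches, assuming each run is stored as a {value, run length} pair. The class and method names are hypothetical and not part of Drill.

```java
// Illustrative sketch; names are hypothetical, not Drill APIs.
class RleSum {
    // Normal work: expand every run into individual values, then add them one by one.
    static long sumDecompressed(int[][] runs) {
        int total = 0;
        for (int[] run : runs) {
            total += run[1];
        }
        int[] values = new int[total];
        int pos = 0;
        for (int[] run : runs) {
            for (int i = 0; i < run[1]; i++) {
                values[pos++] = run[0];
            }
        }
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Optimized work: one multiply-add per run, no decompressed buffer.
    static long sumOnRuns(int[][] runs) {
        long sum = 0;
        for (int[] run : runs) {
            sum += (long) run[0] * run[1];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] runs = {{2, 4}, {8, 10}};
        System.out.println(sumDecompressed(runs)); // 88
        System.out.println(sumOnRuns(runs));       // 88
    }
}
```

For the dataset above, the run-based version performs two multiplications and two additions instead of allocating 14 values and adding each one.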

Example: Bitpacked Dictionary VarChar Sort

Dataset:
– Dictionary: [Rupert, Bill, Larry]
– Values: [1,0,1,2,1,2,1,0]

Normal Work:
– Decompress & store: Bill, Rupert, Bill, Larry, Bill, Larry, Bill, Rupert
– Sort: ~24 comparisons of variable-width strings (requiring length lookup and check during comparisons)

Optimized Work:
– Sort Dictionary: {Bill: 1, Larry: 2, Rupert: 0}
– Sort bitpacked values
– Work: max 3 string comparisons, ~24 comparisons of fixed-width dictionary bits
– Data in 16 bits as opposed to 368/736 for UTF8/16
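One way to realize the optimized path is sketched below. It assumes the codes are held in a plain int array rather than truly bit-packed, and it remaps each code to its rank in the sorted dictionary before sorting. The names are hypothetical and not Drill APIs.

```java
import java.util.Arrays;

// Illustrative sketch; names are hypothetical, not Drill APIs.
class DictionarySort {
    static String[] sortViaDictionary(String[] dictionary, int[] codes) {
        // Sort the dictionary entries once: the only string comparisons happen here.
        Integer[] order = new Integer[dictionary.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> dictionary[a].compareTo(dictionary[b]));

        // rank[code] is the position of that dictionary entry in sorted order.
        int[] rank = new int[dictionary.length];
        for (int r = 0; r < order.length; r++) {
            rank[order[r]] = r;
        }

        // Replace each code with its rank, then sort fixed-width integers.
        int[] ranked = new int[codes.length];
        for (int i = 0; i < codes.length; i++) {
            ranked[i] = rank[codes[i]];
        }
        Arrays.sort(ranked);

        // Materialize strings only at the very end (late materialization).
        String[] out = new String[codes.length];
        for (int i = 0; i < ranked.length; i++) {
            out[i] = dictionary[order[ranked[i]]];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] dictionary = {"Rupert", "Bill", "Larry"};
        int[] codes = {1, 0, 1, 2, 1, 2, 1, 0};
        System.out.println(Arrays.toString(sortViaDictionary(dictionary, codes)));
    }
}
```

The final decode step is included only to show the result; an operator that consumes dictionary-encoded data could keep working on the sorted ranks directly.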

Storage versus Relational operators

How do you write operator implementations for many different data representations?
– If you’re trying to inline, you have to avoid abstractions too complex for the JVM to simplify

Push optimizations to the storage layer for things like RLE
– It is rare that data is exactly in the desired format beyond the simplest queries

Define a primary in-memory representation for columnar data
– Support alternative randomly accessible compression schemes in all operators (such as Dictionary/Bitpacked)

Vectorization

Operating on more than one record at the same time
– Old school: use word-sized manipulations when records are stored smaller than word size
– New school: SIMD (single instruction, multiple data) instructions
  • GCC, LLVM and the JVM all do various optimizations automatically
  • More can be gained by manually coding algorithms
– Logical vectorization
  • Using general record characteristics to reduce CPU cycles per collection of records

Alternative Meaning
– Avoiding branching to speed the CPU pipeline, working on large cache-local data in process
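The "old school" word-sized approach can be illustrated with one-bit records. The sketch below assumes a hypothetical bitmap (for example, a "value is present" flag per record) packed 64 records to a `long`, and counts the set flags either one record at a time or one word at a time with `Long.bitCount`. The class name and layout are assumptions for illustration.

```java
// Illustrative sketch; the bitmap layout and names are assumptions, not Drill APIs.
class WordSizedCount {
    // One record per iteration: shift, mask, and branch for every bit.
    static int countBitByBit(long[] words, int recordCount) {
        int count = 0;
        for (int i = 0; i < recordCount; i++) {
            if (((words[i >>> 6] >>> (i & 63)) & 1L) != 0) {
                count++;
            }
        }
        return count;
    }

    // 64 records per iteration: one bit-count per word, no per-record branch.
    static int countWordAtATime(long[] words, int recordCount) {
        int fullWords = recordCount >>> 6;
        int count = 0;
        for (int w = 0; w < fullWords; w++) {
            count += Long.bitCount(words[w]);
        }
        int remaining = recordCount & 63;
        if (remaining != 0) {
            // Ignore bits beyond the last record in the final, partial word.
            long mask = (1L << remaining) - 1;
            count += Long.bitCount(words[fullWords] & mask);
        }
        return count;
    }

    public static void main(String[] args) {
        long[] words = {-1L, 0b1011L};
        System.out.println(countBitByBit(words, 67));    // 66
        System.out.println(countWordAtATime(words, 67)); // 66
    }
}
```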

Drill Columnar Approach

A RecordBatch contains one or more ValueVectors corresponding to each Field within a BatchSchema

Operators can operate directly against a ValueVector or work with an alternative view of the data by leveraging a SelectionVector

Leverage simple vectorization and trust the JIT to optimize SIMD by generating simple buffer-based operations and loops
– Explore performance impact of advanced SIMD in C for specific operators

Record Batch

Unit of work for the query system
– Operators always work on a batch of records

All values associated with a particular collection of records

Each record batch must have a single defined schema
– Possibly includes fields that have embedded types if you have a heterogeneous field

Record batches are pipelined between operators and nodes

No more than 65k records

Target single L2 cache (~256k)

Operator reconfiguration is done at RecordBatch boundaries

[Figure: three RecordBatches, each containing four ValueVectors (VV)]

SelectionVector

Identifies the records under consideration by their index within the record batch

Avoids early copying of records after applying filtering
– Maintains random accessibility

All operators need to support SelectionVector access

Donut                         Price  Icing
Bacon Maple Bar               2.19   [Maple Frosting, Bacon]
Portland Cream                1.79   [Chocolate]
The Loop                      2.29   [Vanilla, Fruitloops]
Triple Chocolate Penetration  2.79   [Chocolate, Cocoa Puffs]

Selection Vector: 0, 3
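A selection vector can be sketched as an array of record indexes that sits beside the untouched column data. The example below is a simplified illustration: prices are stored in cents, and the class, method names, and filter predicate are hypothetical rather than Drill's actual interfaces.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

// Illustrative sketch; names and layout are assumptions, not Drill APIs.
class SelectionVectorSketch {
    // Filtering records the indexes of matching rows instead of copying the rows.
    static int[] select(int[] column, IntPredicate keep) {
        int[] indexes = new int[column.length];
        int n = 0;
        for (int i = 0; i < column.length; i++) {
            if (keep.test(column[i])) {
                indexes[n++] = i;
            }
        }
        return Arrays.copyOf(indexes, n);
    }

    // A downstream operator reads the original column through the selection.
    static long sumSelected(int[] column, int[] selection) {
        long sum = 0;
        for (int index : selection) {
            sum += column[index];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] priceCents = {219, 179, 229, 279};
        int[] fromSlide = {0, 3}; // the selection vector shown above
        System.out.println(sumSelected(priceCents, fromSlide)); // 498
        System.out.println(Arrays.toString(select(priceCents, p -> p > 225))); // [2, 3]
    }
}
```

Because the column buffers are never rewritten, any row remains reachable by index, which is what keeps the batch randomly accessible after a filter.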

ValueVector

One or more contiguous buffers of data containing values
– Stored in native order
– In-memory representation fully specified for cross-language portability

Associated with a single field
– Synonymous with column in traditional flat tables

Nested fields are separate ValueVectors
Randomly accessible
Defined for each system datatype
Each has Accessor and Mutator
– Primitives and simple primitive “structs” are access interfaces
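The shape of a fixed-width vector with separate read and write views might look like the sketch below. It is a simplified stand-in built on `java.nio.ByteBuffer`, not Drill's actual ValueVector classes; the class name `IntVectorSketch` and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch; not Drill's ValueVector implementation.
class IntVectorSketch {
    private static final int WIDTH = Integer.BYTES; // fixed four-byte values
    private final ByteBuffer data;                  // one contiguous off-heap buffer
    private int valueCount;

    IntVectorSketch(int capacity) {
        // Direct allocation keeps the data off the Java heap; values use native byte order.
        data = ByteBuffer.allocateDirect(capacity * WIDTH).order(ByteOrder.nativeOrder());
    }

    // Write-side view of the vector.
    final class Mutator {
        void set(int index, int value) {
            data.putInt(index * WIDTH, value);
        }

        void setValueCount(int count) {
            valueCount = count;
        }
    }

    // Read-side view of the vector; any index can be read directly.
    final class Accessor {
        int get(int index) {
            return data.getInt(index * WIDTH);
        }

        int getValueCount() {
            return valueCount;
        }
    }

    Mutator mutator() {
        return new Mutator();
    }

    Accessor accessor() {
        return new Accessor();
    }

    public static void main(String[] args) {
        IntVectorSketch vector = new IntVectorSketch(4);
        vector.mutator().set(2, 42);
        vector.mutator().setValueCount(3);
        System.out.println(vector.accessor().get(2)); // 42
    }
}
```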

Drill DataTypes

MajorType = MinorType + DataMode + (Width|Scale)?

MinorType
– Describes width and nature of data: smallint, bigint, uint32, varchar4 (utf8), var16char4 (utf16)

DataMode:
– Optional (nullable)
– Required (non-nullable)
– Repeated (non item list/array)

Traditional 3 value semantics & Drill 4 value

SQL’s 3-Valued Semantics
– True
– False
– Unknown

Drill adds fourth
– Repeated

Fixed Value Vectors

Nullable Values

Repeated Values

Variable Width

Repeated Map

Strengths of RecordBatch + ValueVectors

RecordBatch separates high performance/low performance space
– Record-by-record, avoid method invocation
– Batch-by-batch, trust JVM

Avoid serialization/deserialization
Off-heap means large memory footprint without GC woes
Full specification combined with off-heap and batch-level execution allows C/C++ operators as necessary
Random access: sort without restructuring

Code Play Time

Get Latest Drill
git clone git://git.apache.org/incubator-drill.git
cd incubator-drill/sandbox/prototype
git checkout 9f69ed0
mvn clean install

Download OSCON Drill examples:
git clone https://github.com/jacques-n/oscon-drill.git
cd oscon-drill
mvn install
cd vectors

http://bit.ly/19goc7R

Vectors Exercise

Goals
– RPC implementation to minimize data copies and support keeping all data off-heap
– Basic benchmark analysis comparing ValueVectors and straight ProtoBuf encoding

Logic
– C = A + B
– Assume two lists of fixed four-byte integers (list a and list b)
– Send them to remote node
– Remote node decodes them, adds the two numbers together for each record, then returns the list (list c)
– First node sums all returning numbers and verifies expected result
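The data flow of the exercise, without any real networking, can be sketched as follows. This is not the code from the oscon-drill repository: the "remote node" is simulated by a local method call, the lists are carried in direct `ByteBuffer`s as a stand-in for value vectors, and all names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch of C = A + B; not the workshop's RPC implementation.
class VectorAddSketch {
    // Pack a list of four-byte integers into one contiguous off-heap buffer.
    static ByteBuffer encode(int[] values) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(values.length * Integer.BYTES)
                .order(ByteOrder.nativeOrder());
        for (int i = 0; i < values.length; i++) {
            buffer.putInt(i * Integer.BYTES, values[i]);
        }
        return buffer;
    }

    // "Remote node": read both buffers in place and produce list c, record by record.
    static ByteBuffer addRemote(ByteBuffer a, ByteBuffer b) {
        int count = a.capacity() / Integer.BYTES;
        ByteBuffer c = ByteBuffer.allocateDirect(a.capacity()).order(ByteOrder.nativeOrder());
        for (int i = 0; i < count; i++) {
            int offset = i * Integer.BYTES;
            c.putInt(offset, a.getInt(offset) + b.getInt(offset));
        }
        return c;
    }

    // "First node": sum the returned list so it can be checked against the expected total.
    static long sum(ByteBuffer c) {
        long total = 0;
        for (int offset = 0; offset < c.capacity(); offset += Integer.BYTES) {
            total += c.getInt(offset);
        }
        return total;
    }

    public static void main(String[] args) {
        ByteBuffer a = encode(new int[] {1, 2, 3});
        ByteBuffer b = encode(new int[] {10, 20, 30});
        long result = sum(addRemote(a, b));
        System.out.println(result == 66 ? "verified" : "mismatch");
    }
}
```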

Vectors Exercise

├── pom.xml
└── src
    ├── main
    │   ├── java/org/apache/drill/oscon/rpc
    │   │   ├── ClientConnectFuture.java
    │   │   ├── ExampleClient.java
    │   │   ├── ExampleConfig.java
    │   │   └── ExampleServer.java
    │   └── protobuf
    │       └── Example.proto
    └── test/java/org/apache/drill/oscon/rpc
        └── TestRpc.java