OSCON 2013: Apache Drill Workshop > Execution & ValueVectors
Uploaded by: apachedrill
Category: Technology
2
Drill is…
– Optimistic & Pipelined
– Columnar & Late materialized
– Vectorized
– Language Agnostic
– MPP Query Engine
3
Optimistic Execution
Optimistic Recovery
Pipelined Scheduling
Pipelined Communication
4
Optimistic Recovery
Assume Failures Don’t overbuild for them
– The shorter the queries, the less work lost on failure
Graceful management of node failure at a system level
– Individual queries must be rerun
Avoid the overhead of persistence and barriers
5
Pipelined Operators
Pipelining: push data along as soon as it is available
– Cross-operator and cross-node
Straightforward for simple operators like filter, project
Also possible with less common things like sort, radix hash join
– External Sort: merge only what is needed to push first part of data down pipeline
Destination buffering rather than source buffering
6
Full pipelining requires query at once scheduling
Query at Once: schedule entire query at once
Pros:
– Fastest data movement
– Less herd effect
Cons:
– Poorer workload distribution
– Failure checkpoints hard
Task by Task: schedule each task when all previous tasks are completed
Pros:
– Potentially better workload distribution
– Failure checkpoints straightforward
Cons:
– Slower data movement
– Poorer routing decisions
7
Comparison with MapReduce
Barriers
– Map completion required before shuffle/reduce commencement
– All maps must complete before reduce can start
– In chained jobs, one job must finish entirely before the next one can start
Persistence and Recoverability
– Data is persisted to disk between each barrier
– Serialization and deserialization are required between execution phases
8
Record versus Columnar Representation
[Figure: record layout versus column layout]
9
Data Format Example

Donut                          Price   Icing
Bacon Maple Bar                2.19    [Maple Frosting, Bacon]
Portland Cream                 1.79    [Chocolate]
The Loop                       2.29    [Vanilla, Fruitloops]
Triple Chocolate Penetration   2.79    [Chocolate, Cocoa Puffs]

Record Encoding
Bacon Maple Bar, 2.19, Maple Frosting, Bacon, Portland Cream, 1.79, Chocolate, The Loop, 2.29, Vanilla, Fruitloops, Triple Chocolate Penetration, 2.79, Chocolate, Cocoa Puffs

Columnar Encoding
Bacon Maple Bar, Portland Cream, The Loop, Triple Chocolate Penetration
2.19, 1.79, 2.29, 2.79
Maple Frosting, Bacon, Chocolate, Vanilla, Fruitloops, Chocolate, Cocoa Puffs
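
The two encodings on this slide can be reproduced with a small Java sketch. It is an illustration only: the class and method names are hypothetical and not part of Drill, and values are kept as strings purely to show ordering.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the slide's two layouts; not Drill code.
class DonutEncoding {
    static final String[] NAMES = {
        "Bacon Maple Bar", "Portland Cream", "The Loop", "Triple Chocolate Penetration"};
    static final double[] PRICES = {2.19, 1.79, 2.29, 2.79};
    static final String[][] ICINGS = {
        {"Maple Frosting", "Bacon"}, {"Chocolate"},
        {"Vanilla", "Fruitloops"}, {"Chocolate", "Cocoa Puffs"}};

    // Record encoding: every field of row 0, then every field of row 1, and so on.
    static List<String> recordEncode() {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < NAMES.length; i++) {
            out.add(NAMES[i]);
            out.add(String.valueOf(PRICES[i]));
            for (String icing : ICINGS[i]) out.add(icing);
        }
        return out;
    }

    // Columnar encoding: all names, then all prices, then all icings.
    static List<String> columnarEncode() {
        List<String> out = new ArrayList<>();
        for (String name : NAMES) out.add(name);
        for (double price : PRICES) out.add(String.valueOf(price));
        for (String[] row : ICINGS) for (String icing : row) out.add(icing);
        return out;
    }

    public static void main(String[] args) {
        System.out.println("Record:   " + String.join(", ", recordEncode()));
        System.out.println("Columnar: " + String.join(", ", columnarEncode()));
    }
}
```

Both methods emit the same 15 values; only the order differs, which is what makes similar data adjacent in the columnar form.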
10
Places to Apply Columnar
Columnar Storage (on disk)
– Improved compression when similar data is co-located
– Alternative compression techniques: dictionary, RLE, delta
– Avoid column reads when not needed
Columnar Execution (in memory)
– Improved cache locality
– Improved CPU pipelining (especially with things like null checks)
– Can reduce memory copies
– Maintain unusual encoding schemes for direct relational operator use
11
Columnar Execution: When to materialize
Users want rows
Data is columnar
When do you transform?
– On read into memory
– On return to user
– Somewhere in between
Later is generally better
– Not always :)
12
Late Decompression
Don’t necessarily materialize each value
Reduce memory consumption
Reduce CPU cost
Examples: RLE, Bit Dictionary
13
Example: RLE and Sum
Dataset (RLE pairs: value, run length)
– 2, 4
– 8, 10
Goal
– Sum all the records
Normal Work
– Decompress & store: 2, 2, 2, 2, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8
– Add: 2 + 2 + 2 + 2 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8 + 8
Optimized Work
– 2 * 4 + 8 * 10
– Less memory, fewer operations
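
A minimal Java sketch of the two approaches, assuming the dataset is given as parallel arrays of values and run lengths. The class and method names are hypothetical, not Drill APIs.

```java
// Hypothetical sketch: summing run-length-encoded data with and without decompression.
class RleSum {
    // "Normal work": materialize every value, then add them one by one.
    static long sumDecompressed(int[] values, int[] runLengths) {
        int total = 0;
        for (int run : runLengths) total += run;
        int[] decompressed = new int[total];
        int pos = 0;
        for (int i = 0; i < values.length; i++) {
            for (int j = 0; j < runLengths[i]; j++) decompressed[pos++] = values[i];
        }
        long sum = 0;
        for (int v : decompressed) sum += v;
        return sum;
    }

    // "Optimized work": one multiply-add per run, no intermediate buffer.
    static long sumRle(int[] values, int[] runLengths) {
        long sum = 0;
        for (int i = 0; i < values.length; i++) sum += (long) values[i] * runLengths[i];
        return sum;
    }

    public static void main(String[] args) {
        int[] values = {2, 8};
        int[] runLengths = {4, 10};
        System.out.println(sumDecompressed(values, runLengths)); // 88
        System.out.println(sumRle(values, runLengths));          // 88
    }
}
```

The optimized path does two multiplications and two additions instead of allocating 14 integers and adding each of them.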
14
Example: Bitpacked Dictionary VarChar Sort
Dataset:
– Dictionary: [Rupert, Bill, Larry]
– Values: [1,0,1,2,1,2,1,0]
Normal Work:
– Decompress & store: Bill, Rupert, Bill, Larry, Bill, Larry, Bill, Rupert
– Sort: ~24 comparisons of variable-width strings (requiring length lookup and check during comparisons)
Optimized Work
– Sort Dictionary: {Bill: 1, Larry: 2, Rupert: 0}
– Sort bitpacked values
– Work: max 3 string comparisons, ~24 comparisons of fixed-width dictionary bits
– Data in 16 bits as opposed to 368/736 for UTF8/16
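
The optimized path can be sketched in Java as follows. This is an assumption-laden illustration: the codes are held in a plain int[] rather than actually bit-packed, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: sort dictionary-encoded strings by sorting the small
// dictionary once, then sorting fixed-width codes instead of strings.
class DictionarySort {
    static String[] sortEncoded(String[] dictionary, int[] codes) {
        int n = dictionary.length;

        // Sort dictionary positions by string value; the only string comparisons happen here.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> dictionary[a].compareTo(dictionary[b]));

        // rank[oldCode] is the position of that entry in the sorted dictionary.
        int[] rank = new int[n];
        String[] sortedDictionary = new String[n];
        for (int r = 0; r < n; r++) {
            rank[order[r]] = r;
            sortedDictionary[r] = dictionary[order[r]];
        }

        // Remap codes to ranks and sort them as plain integers.
        int[] remapped = new int[codes.length];
        for (int i = 0; i < codes.length; i++) remapped[i] = rank[codes[i]];
        Arrays.sort(remapped);

        // Decode only at the end (late materialization).
        String[] out = new String[codes.length];
        for (int i = 0; i < out.length; i++) out[i] = sortedDictionary[remapped[i]];
        return out;
    }

    public static void main(String[] args) {
        String[] dictionary = {"Rupert", "Bill", "Larry"};
        int[] codes = {1, 0, 1, 2, 1, 2, 1, 0};
        System.out.println(Arrays.toString(sortEncoded(dictionary, codes)));
    }
}
```

With three dictionary entries, each code fits in 2 bits, which is where the slide's 16 bits for 8 values comes from.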
15
Storage versus Relational operators
How do you write operator implementations for many different data representations?
– If you’re trying to inline, you have to avoid abstractions too complex for the JVM to simplify
Push optimizations to storage layer for things like RLE
– Rare that data is exactly in desired format beyond simplest queries
Define a primary in-memory representation for columnar data
– Support alternative randomly accessible compression schemes in all operators (such as Dictionary/Bitpacked)
16
Vectorization
Operating on more than one record at the same time
– Old school: use word-sized manipulations when records are stored smaller than word size
– New school: SIMD (single instruction, multiple data) instructions
  • GCC, LLVM and JVM all do various optimizations automatically
  • More can be had by manually coding algorithms
– Logical vectorization:
  • Using general record characteristics to reduce CPU cycles per collection of records
Alternative Meaning
– Avoiding branching to speed CPU pipeline, working on large cache-local data in process
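
As one illustration of the "old school" word-sized approach, the Java sketch below counts set flags (for example, one non-null bit per record) 64 records at a time. The bitmap layout and all names are assumptions made for this example, not Drill's format.

```java
// Hypothetical sketch: one bit per record, 64 records per long word.
class WordSizedCount {
    // Record-at-a-time: test each bit individually.
    static int countBitByBit(long[] words) {
        int count = 0;
        for (long word : words) {
            for (int bit = 0; bit < 64; bit++) {
                if (((word >>> bit) & 1L) != 0) count++;
            }
        }
        return count;
    }

    // Word-at-a-time: one population count per 64 records.
    static int countWordAtATime(long[] words) {
        int count = 0;
        for (long word : words) count += Long.bitCount(word);
        return count;
    }

    public static void main(String[] args) {
        long[] flags = {0b1011L, -1L}; // 3 bits set, then all 64 bits set
        System.out.println(countBitByBit(flags));    // 67
        System.out.println(countWordAtATime(flags)); // 67
    }
}
```

The second loop is also branch-free, which fits the "alternative meaning" on the slide.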
17
Drill Columnar Approach
A RecordBatch contains one or more ValueVectors corresponding to each Field within a BatchSchema
Operators can operate directly against ValueVectors or work with an alternative view of data by leveraging a SelectionVector
Leverage simple vectorization and trust JIT to optimize SIMD by generating simple buffer-based operations and loops
– Explore performance impact of advanced SIMD in C for specific operators
18
Record Batch
Unit of work for the query system
– Operators always work on a batch of records
All values associated with a particular collection of records
Each record batch must have a single defined schema
– Possibly includes fields that have embedded types if you have a heterogeneous field
Record batches are pipelined between operators and nodes
No more than 65k records
Target single L2 cache (~256k)
Operator reconfiguration is done at RecordBatch boundaries
[Figure: three RecordBatches, each containing four ValueVectors (VV)]
19
SelectionVector
Includes particular records for consideration, by record batch index
Avoids early copying of records after applying filtering
– Maintains random accessibility
All operators need to support SelectionVector access

Donut                          Price   Icing
Bacon Maple Bar                2.19    [Maple Frosting, Bacon]
Portland Cream                 1.79    [Chocolate]
The Loop                       2.29    [Vanilla, Fruitloops]
Triple Chocolate Penetration   2.79    [Chocolate, Cocoa Puffs]

Selection Vector: 0, 3
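
A minimal Java sketch of the idea: a filter emits the indexes of surviving records, and later operators read the untouched columns through those indexes. The names and the int[] representation are hypothetical and do not reflect Drill's SelectionVector classes.

```java
import java.util.Arrays;
import java.util.function.DoublePredicate;

// Hypothetical sketch of filtering by index instead of copying records.
class SelectionVectorSketch {
    // Build a selection vector: indexes of the records that pass the filter.
    static int[] select(double[] column, DoublePredicate keep) {
        int[] selected = new int[column.length];
        int count = 0;
        for (int i = 0; i < column.length; i++) {
            if (keep.test(column[i])) selected[count++] = i;
        }
        return Arrays.copyOf(selected, count);
    }

    // A downstream operator reads another column through the selection vector.
    static String[] gather(String[] column, int[] selectionVector) {
        String[] out = new String[selectionVector.length];
        for (int i = 0; i < selectionVector.length; i++) out[i] = column[selectionVector[i]];
        return out;
    }

    public static void main(String[] args) {
        String[] donuts = {
            "Bacon Maple Bar", "Portland Cream", "The Loop", "Triple Chocolate Penetration"};
        double[] prices = {2.19, 1.79, 2.29, 2.79};

        // The slide's selection vector (0, 3), applied to the Donut column.
        System.out.println(Arrays.toString(gather(donuts, new int[]{0, 3})));

        // A selection vector produced by an example filter on price.
        System.out.println(Arrays.toString(select(prices, p -> p > 2.0)));
    }
}
```

The original columns are never rewritten, so any record can still be reached by index after filtering.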
20
ValueVector
One or more contiguous buffers of data containing values
– Stored in native order
– In-memory representation fully specified for cross-language portability
Associated with a single field
– Synonymous with column in traditional flat tables
Nested fields are separate ValueVectors
Randomly accessible
Defined for each system datatype
Each has Accessor and Mutator
– Primitives and simple primitive “structs” are access interfaces
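
A rough Java sketch of what a fixed-width vector with these properties could look like: one contiguous off-heap buffer in native byte order, random access by index, and separate Accessor and Mutator views. Every name here is hypothetical; this is not Drill's ValueVector implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical fixed-width (4-byte integer) vector over a direct buffer.
class IntVectorSketch {
    private static final int WIDTH = 4; // bytes per value
    private final ByteBuffer buffer;
    private int valueCount;

    IntVectorSketch(int capacity) {
        // Direct allocation keeps the data off the Java heap.
        buffer = ByteBuffer.allocateDirect(capacity * WIDTH).order(ByteOrder.nativeOrder());
    }

    // Write-side view.
    final class Mutator {
        void set(int index, int value) { buffer.putInt(index * WIDTH, value); }
        void setValueCount(int count) { valueCount = count; }
    }

    // Read-side view; any index can be read directly (random access).
    final class Accessor {
        int get(int index) { return buffer.getInt(index * WIDTH); }
        int getValueCount() { return valueCount; }
    }

    Mutator mutator() { return new Mutator(); }
    Accessor accessor() { return new Accessor(); }

    public static void main(String[] args) {
        IntVectorSketch vector = new IntVectorSketch(4);
        IntVectorSketch.Mutator mutator = vector.mutator();
        for (int i = 0; i < 4; i++) mutator.set(i, i * 10);
        mutator.setValueCount(4);
        System.out.println(vector.accessor().get(2)); // 20
    }
}
```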
21
Drill DataTypes
MajorType = MinorType + DataMode + (Width|Scale)?
MinorType
– Describes width and nature of data: smallint, bigint, uint32, varchar4 (utf8), var16char4 (utf16)
DataMode:
– Optional (nullable)
– Required (non-nullable)
– Repeated (non item list/array)
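
One common way to lay out the Optional and Repeated modes in columnar memory is sketched below in Java. This is an assumption for illustration: a per-record "is set" flag for nullable values, and an offsets array for repeated values. The slides do not spell out Drill's exact layout, and all names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical layouts for the Optional and Repeated data modes.
class DataModeLayouts {
    // Optional: a flag per record says whether values[record] holds a real value.
    static Integer getOptional(boolean[] isSet, int[] values, int record) {
        return isSet[record] ? values[record] : null;
    }

    // Repeated: record i owns values[offsets[i]] up to, but not including, values[offsets[i + 1]].
    static String[] getRepeated(int[] offsets, String[] values, int record) {
        return Arrays.copyOfRange(values, offsets[record], offsets[record + 1]);
    }

    public static void main(String[] args) {
        // The Icing column from the donut example, stored as one repeated field.
        String[] icings = {"Maple Frosting", "Bacon", "Chocolate",
                           "Vanilla", "Fruitloops", "Chocolate", "Cocoa Puffs"};
        int[] offsets = {0, 2, 3, 5, 7};
        System.out.println(Arrays.toString(getRepeated(offsets, icings, 2)));

        boolean[] isSet = {true, false, true};
        int[] values = {5, 0, 9};
        System.out.println(getOptional(isSet, values, 1)); // null
    }
}
```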
22
Traditional 3 value semantics & Drill 4 value
SQL’s 3-Valued Semantics
– True
– False
– Unknown
Drill adds fourth
– Repeated
23
Fixed Value Vectors
24
Nullable Values
25
Repeated Values
26
Variable Width
27
Repeated Map
28
Strengths of RecordBatch + ValueVectors
RecordBatch separates high performance/low performance space
– Record-by-record, avoid method invocation
– Batch-by-batch, trust JVM
Avoid serialization/deserialization
Off-heap means large memory footprint without GC woes
Full specification combined with off-heap and batch-level execution allows C/C++ operators as necessary
Random access: sort without restructuring
29
Code Play Time
Get latest Drill:
  git clone git://git.apache.org/incubator-drill.git
  cd incubator-drill/sandbox/prototype
  git checkout 9f69ed0
  mvn clean install
Download OSCON Drill examples:
  git clone https://github.com/jacques-n/oscon-drill.git
  cd oscon-drill
  mvn install
  cd vectors
http://bit.ly/19goc7R
30
Vectors Exercise
Goals
– RPC implementation to minimize data copies and support keeping all data off-heap
– Basic benchmark analysis comparing ValueVectors and straight ProtoBuf encoding
Logic
– C = A + B
– Assume two lists of fixed four-byte integers (list a and list b)
– Send them to remote node
– Remote node decodes them, adds the two numbers together for each record, then returns the list (list c)
– First node sums all returning numbers and verifies expected result
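
The logic of the exercise can be sketched locally in Java, with no RPC and no ProtoBuf involved. The "remote node" is just a method call, the lists travel as direct buffers of four-byte integers, and all names are hypothetical rather than taken from the oscon-drill repository.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical local stand-in for the exercise: C = A + B over off-heap int buffers.
class VectorAddSketch {
    // Encode a list of ints as a contiguous off-heap buffer of four-byte values.
    static ByteBuffer encode(int[] values) {
        ByteBuffer buffer =
            ByteBuffer.allocateDirect(values.length * 4).order(ByteOrder.nativeOrder());
        for (int value : values) buffer.putInt(value);
        buffer.flip();
        return buffer;
    }

    // "Remote node": add the two lists record by record, straight from the buffers.
    static ByteBuffer add(ByteBuffer a, ByteBuffer b) {
        int count = a.remaining() / 4;
        ByteBuffer c = ByteBuffer.allocateDirect(count * 4).order(ByteOrder.nativeOrder());
        for (int i = 0; i < count; i++) {
            c.putInt(i * 4, a.getInt(i * 4) + b.getInt(i * 4));
        }
        return c;
    }

    // "First node": sum the returned list so the result can be verified.
    static long sum(ByteBuffer c) {
        long total = 0;
        int count = c.remaining() / 4;
        for (int i = 0; i < count; i++) total += c.getInt(i * 4);
        return total;
    }

    public static void main(String[] args) {
        ByteBuffer a = encode(new int[]{1, 2, 3});
        ByteBuffer b = encode(new int[]{10, 20, 30});
        System.out.println(sum(add(a, b))); // 66
    }
}
```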
31
Vectors Exercise
├── pom.xml
└── src
    ├── main
    │   ├── java/org/apache/drill/oscon/rpc
    │   │   ├── ClientConnectFuture.java
    │   │   ├── ExampleClient.java
    │   │   ├── ExampleConfig.java
    │   │   └── ExampleServer.java
    │   └── protobuf
    │       └── Example.proto
    └── test/java/org/apache/drill/oscon/rpc
        └── TestRpc.java