Parallel Query Optimization

Post on 14-Jan-2016



Fall 2008 Parallel Query Optimization 1

Parallel Query Optimization


Bucket Sizes and I/O Costs

Bucket B does not fit in memory in its entirety, so it must be loaded several times.

[Figure: Bucket A, Bucket B, and memory; tuples are processed one at a time.]


Fit in Memory

Bucket B fits in memory. It needs to be loaded only once.

[Figure: subbuckets A(1), A(2), A(3) and B(1), B(2), B(3), and memory; tuples are processed one at a time.]


Hash-Based Join


GRACE Algorithm


Data Skew

System performance is very sensitive to the skewness in tuple distribution.


Zipf-like Distribution

Total: 1,000,000 tuples
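To give a feel for what a Zipf-like distribution does to bucket sizes, here is a minimal sketch that spreads a fixed number of tuples over buckets with sizes proportional to 1/rank^theta. The function name, the number of buckets, and the `theta` parameter are illustrative assumptions; the slide only states the total of 1,000,000 tuples.

```python
def zipf_bucket_sizes(total_tuples, n_buckets, theta=1.0):
    """Return bucket sizes that follow a Zipf-like distribution.

    Bucket at rank i (1-based) gets a share proportional to 1 / i**theta.
    theta = 0 gives a uniform distribution; larger theta means more skew.
    """
    weights = [1.0 / (rank ** theta) for rank in range(1, n_buckets + 1)]
    norm = sum(weights)
    sizes = [int(total_tuples * w / norm) for w in weights]
    # Hand out the tuples lost to rounding, starting with the largest bucket.
    leftover = total_tuples - sum(sizes)
    for i in range(leftover):
        sizes[i % n_buckets] += 1
    return sizes


if __name__ == "__main__":
    for theta in (0.0, 0.5, 1.0):
        print(theta, zipf_bucket_sizes(1_000_000, 10, theta))
```

With `theta=0.0` every bucket has the same size; as `theta` grows, the first few buckets hold a much larger share of the tuples, which is the situation the load balancing strategies in the following slides address.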


Partition Tuning

Best Fit Decreasing Strategy:

In this partition tuning strategy, the hash buckets are first sorted into decreasing order according to size.

In each iteration, the currently largest bucket is assigned to the currently smallest partition (that is, to the processing node, or PN, that holds it).

This process is repeated until all the buckets have been allocated.

This is a dynamic load balancing technique.
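The strategy described above can be sketched in a few lines. This is a minimal illustration, not the original implementation: the function name, the dictionary of bucket sizes, and the use of a min-heap to find the currently smallest partition are assumptions made for the example.

```python
import heapq


def best_fit_decreasing(bucket_sizes, n_partitions):
    """Assign buckets to partitions, largest bucket to smallest partition.

    bucket_sizes maps a bucket ID to its size in tuples.
    Returns (assignment, loads): assignment[p] lists the bucket IDs given
    to partition p, and loads[p] is the total size of partition p.
    """
    assignment = [[] for _ in range(n_partitions)]
    loads = [0] * n_partitions
    # Min-heap of (current load, partition index): the currently smallest
    # partition is always at the top.
    heap = [(0, p) for p in range(n_partitions)]
    heapq.heapify(heap)
    # Visit the buckets in decreasing order of size.
    ordered = sorted(bucket_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for bucket_id, size in ordered:
        load, p = heapq.heappop(heap)
        assignment[p].append(bucket_id)
        loads[p] = load + size
        heapq.heappush(heap, (loads[p], p))
    return assignment, loads


if __name__ == "__main__":
    buckets = {"b0": 50, "b1": 30, "b2": 20, "b3": 20, "b4": 10, "b5": 10}
    print(best_fit_decreasing(buckets, 2))
```

Sorting first matters: placing the large buckets early leaves the small ones to even out the remaining differences between partitions.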


Best Fit Decreasing Strategy


Adaptive Load Balancing (ABJ+)


ABJ+ vs. GRACE


L_LBO in Multi-way Join Queries

L_LBO: Linear Tree with Load Balancing

A multi-way join query is treated as a sequence of two-way (single) joins, each executed using ABJ+.


B_NLB in Multi-way Join Queries

B_NLB: Bushy Tree without Load Balancing

It tries to join as many pairs of relations as possible.

Split Phase: Each PN partitions its portion of each relation into small subbuckets, and each subbucket is transferred to the PN corresponding to its bucket ID.

Join Phase: Each PN performs the local joins.
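The two phases above can be sketched as follows. This is a single-process simulation under stated assumptions: the bucket ID is taken to be `hash(join key) % n_nodes`, bucket b is assumed to belong to PN b, and the join key is assumed to be the first field of each tuple. The function names are hypothetical.

```python
from collections import defaultdict


def split_phase(local_portions, n_nodes, key=lambda t: t[0]):
    """Route every tuple to the PN that owns its bucket.

    local_portions[i] is the list of tuples initially stored at PN i.
    Returns received, where received[b] holds the tuples sent to PN b.
    """
    received = [[] for _ in range(n_nodes)]
    for portion in local_portions:
        # Each PN first splits its own portion into subbuckets...
        subbuckets = defaultdict(list)
        for t in portion:
            subbuckets[hash(key(t)) % n_nodes].append(t)
        # ...and then ships each subbucket to the PN with that bucket ID.
        for bucket_id, tuples in subbuckets.items():
            received[bucket_id].extend(tuples)
    return received


def local_join(r_tuples, s_tuples, key=lambda t: t[0]):
    """Join phase at one PN: build a hash table on R, probe with S."""
    table = defaultdict(list)
    for r in r_tuples:
        table[key(r)].append(r)
    return [(r, s) for s in s_tuples for r in table.get(key(s), [])]


if __name__ == "__main__":
    r = split_phase([[(1, "a"), (2, "b")], [(3, "c"), (4, "d")]], 2)
    s = split_phase([[(2, "x"), (3, "y")], [(4, "z"), (5, "w")]], 2)
    for node in range(2):
        print(node, local_join(r[node], s[node]))
```

Because no tuning step sits between the two phases, a PN that receives an oversized bucket simply does more work in the join phase.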


NLBO in Multi-way Join Queries

NLBO: No Load Balancing Optimization

Like B_NLB, it tries to join as many pairs of relations as possible.

Hash Phase: Each PN partitions its portion of each relation into small subbuckets and stores them back to its own disks.

Partition Tuning Phase: It allocates the buckets to the PNs using the Best Fit Decreasing Strategy.

Join Phase: Each PN performs the local joins.


LBO in Multi-way Join Queries

LBO: Load Balancing Optimization

Hash Phase: The relations are hashed into buckets and stored back on the local disks.

Optimization Phase: Uses the Best Fit Decreasing Strategy and a greedy algorithm to select the joins that will be executed concurrently.

Executing Phase:

Stage 1: Tune the partitions.

Stage 2: Perform the join operation.

Stage 3: Update the join graph, then go to Optimization Phase.
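The optimize/execute loop can be sketched roughly as below. The slides do not spell out the greedy criterion, so this example assumes one: joins are considered in order of an estimated cost, and a join is selected only if neither of its relations is already used in the current round. The cost values, the function names, and the way merged relations are named are all illustrative assumptions; Stages 1 and 2 are represented only by a comment.

```python
def select_concurrent_joins(edges):
    """Greedily pick joins that share no relation.

    edges maps frozenset({r, s}) to an estimated cost (assumed input).
    """
    chosen, busy = [], set()
    for pair, _cost in sorted(edges.items(), key=lambda kv: (kv[1], sorted(kv[0]))):
        if not (pair & busy):
            chosen.append(pair)
            busy |= pair
    return chosen


def update_join_graph(edges, joined):
    """Replace each joined pair by a single node named after both relations."""
    rename = {}
    for pair in joined:
        merged = "(" + "*".join(sorted(pair)) + ")"
        for rel in pair:
            rename[rel] = merged
    new_edges = {}
    for pair, cost in edges.items():
        new_pair = frozenset(rename.get(rel, rel) for rel in pair)
        if len(new_pair) == 2:  # drop the joins that were just executed
            # Simplification: keep the old cost instead of re-estimating it.
            new_edges[new_pair] = min(cost, new_edges.get(new_pair, cost))
    return new_edges


def run_lbo(edges):
    """Repeat the optimize/execute cycle; return the joins chosen per round."""
    rounds = []
    while edges:
        joins = select_concurrent_joins(edges)
        # Stage 1 (tune the partitions) and Stage 2 (join) would run here.
        rounds.append(joins)
        edges = update_join_graph(edges, joins)  # Stage 3
    return rounds


if __name__ == "__main__":
    graph = {
        frozenset({"A", "B"}): 1,
        frozenset({"B", "C"}): 2,
        frozenset({"C", "D"}): 3,
    }
    for joins in run_lbo(graph):
        print([sorted(j) for j in joins])
```

For the chain A-B-C-D in the usage example, the first round runs A*B and C*D concurrently, and the second round joins the two intermediate results.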


Optimization Phase of LBO


Effect of Bucket Skew


LBO-FR

LBO-FR: LBO with Fragment & Replicate Feature

LBO-FR is similar to LBO, except that it partitions bucket pairs into subbucket pairs if those buckets are too large.

Example: suppose bucket pair (S1, R1) is too large and |S1| > |R1|.

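[Figure: bucket pair (S1, R1) is split first into the subbucket pairs (S1,1, R1) and (S1,2, R1), and then into (S1,1, R1), (S1,2, R1), and (S1,3, R1).]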
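A minimal sketch of the fragment and replicate idea in the example: the larger bucket S1 is cut into fragments, and each fragment is paired with a full copy of R1. The function names, the round-robin split, and the explicit `n_fragments` parameter are assumptions for illustration; the slides do not say how the number of fragments is chosen.

```python
def fragment_and_replicate(s_bucket, r_bucket, n_fragments):
    """Split s_bucket into n_fragments pieces; pair each with a copy of r_bucket."""
    if n_fragments < 1:
        raise ValueError("n_fragments must be at least 1")
    # A round-robin split keeps fragment sizes within one tuple of each other.
    fragments = [s_bucket[i::n_fragments] for i in range(n_fragments)]
    return [(fragment, list(r_bucket)) for fragment in fragments]


def join(s_tuples, r_tuples):
    """Plain nested-loop equijoin on the first field (for checking only)."""
    return [(s, r) for s in s_tuples for r in r_tuples if s[0] == r[0]]


if __name__ == "__main__":
    s1 = [(k, "s") for k in range(7)]
    r1 = [(k, "r") for k in (1, 3, 5, 9)]
    for fragment, r_copy in fragment_and_replicate(s1, r1, 3):
        print(len(fragment), len(r_copy), join(fragment, r_copy))
```

Every S1 tuple still meets every R1 tuple exactly once, so the union of the subbucket joins equals the join of the original pair; the price is that R1 is stored and shipped once per fragment.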


LBO-SFR

LBO-SFR: LBO with Symmetric Fragment & Replicate Feature

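[Figure: bucket pair (S1, R1) is refined in three steps, each step labeled with the side that is partitioned:
Partition S1 (|S1| > |R1|): (S1,1,1, R1,1,1), (S1,2,1, R1,1,2).
Partition R1 (|S1,1,1| < |R1,1,1|): (S1,1,1, R1,1,1), (S1,2,1, R1,1,2), (S1,1,2, R1,2,1), (S1,2,2, R1,2,2).
Partition S1 (|S1,1,1| > |R1,1,1|): (S1,1,1, R1,1,1), (S1,2,1, R1,1,2), (S1,3,1, R1,1,3), (S1,1,2, R1,2,1), (S1,2,2, R1,2,2), (S1,3,2, R1,2,3).]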
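The symmetric variant can be sketched as a loop that keeps splitting whichever side currently has the larger fragments, then pairs every fragment of S1 with every fragment of R1. The stopping rule (stop once both fragment sizes are at or below `max_fragment_size`), the tie-break toward R1, and the function name are assumptions made for this example; the slides only show which side is partitioned at each step.

```python
from math import ceil


def symmetric_fragment_and_replicate(s_bucket, r_bucket, max_fragment_size):
    """Return {(i, j): (S fragment i, R fragment j)} for all i, j (1-based)."""
    if max_fragment_size < 1:
        raise ValueError("max_fragment_size must be at least 1")
    s_parts, r_parts = 1, 1
    while True:
        s_size = ceil(len(s_bucket) / s_parts)
        r_size = ceil(len(r_bucket) / r_parts)
        if max(s_size, r_size) <= max_fragment_size:
            break
        # Partition the side whose fragments are currently larger.
        if s_size > r_size:
            s_parts += 1
        else:
            r_parts += 1
    s_frags = [s_bucket[i::s_parts] for i in range(s_parts)]
    r_frags = [r_bucket[j::r_parts] for j in range(r_parts)]
    return {
        (i + 1, j + 1): (s_frags[i], r_frags[j])
        for i in range(s_parts)
        for j in range(r_parts)
    }


if __name__ == "__main__":
    s1 = [(k, "s") for k in range(6)]
    r1 = [(k, "r") for k in range(4)]
    grid = symmetric_fragment_and_replicate(s1, r1, 2)
    for cell, (s_frag, r_frag) in sorted(grid.items()):
        print(cell, len(s_frag), len(r_frag))
```

With six S1 tuples, four R1 tuples, and a limit of two tuples per fragment, the loop splits S1, then R1, then S1 again, ending with a 3 by 2 grid of subbucket pairs.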


Effect of Bucket Skew