Parallel Query Optimization

Fall 2008 Parallel Query Optimization 1



Bucket Sizes and I/O Costs

Bucket B does not fit in memory in its entirety, so it must be loaded several times.

[Figure: Bucket A is read one tuple at a time and joined against Bucket B in memory.]


Fit in Memory

Bucket B fits in memory. It needs to be loaded only once.

[Figure: Bucket B is divided into B(1), B(2), B(3) and Bucket A into A(1), A(2), A(3); each B(i) is loaded into memory once while the matching A(i) is read one tuple at a time.]
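The difference between the two cases can be made concrete with a rough page-count model. This is a sketch under assumptions the slides do not state: sizes are counted in pages, `mem_pages` is the memory available for Bucket B, and when B does not fit it is loaded in memory-sized chunks with Bucket A re-read once per chunk. The function names are hypothetical.

```python
import math


def bucket_join_io(a_pages: int, b_pages: int, mem_pages: int) -> int:
    """Estimate page I/Os for joining Bucket A with Bucket B (hypothetical model).

    Bucket B is loaded into memory in chunks of at most mem_pages pages, and
    Bucket A is scanned one tuple at a time once per loaded chunk.
    """
    if mem_pages <= 0:
        raise ValueError("mem_pages must be positive")
    loads = math.ceil(b_pages / mem_pages)  # 1 when B fits in memory
    return b_pages + loads * a_pages


def partitioned_join_io(a_parts: list[int], b_parts: list[int], mem_pages: int) -> int:
    """I/Os when B is split into B(1..k) that each fit, paired with A(1..k).

    Each pair is read exactly once; the cost of producing the sub-buckets
    is not counted here.
    """
    if any(b > mem_pages for b in b_parts):
        raise ValueError("every B(i) must fit in memory")
    return sum(a_parts) + sum(b_parts)
```

For example, `bucket_join_io(100, 150, 60)` needs three loads of B and returns 450, while `bucket_join_io(100, 50, 60)` loads B once and returns 150.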


Hash-Based Join


GRACE Algorithm
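The GRACE algorithm, as it is generally described, first hashes both relations on the join attribute into buckets and then joins each pair of corresponding buckets, since only tuples in the same bucket can match. The sketch below assumes in-memory lists stand in for disk-resident buckets; `grace_hash_join` and `num_buckets` are hypothetical names.

```python
from collections import defaultdict


def grace_hash_join(r, s, key_r, key_s, num_buckets=4):
    """GRACE-style hash join over in-memory lists (a sketch, not disk-based)."""
    # Phase 1 (partition): hash both relations on the join attribute into buckets.
    r_buckets = defaultdict(list)
    s_buckets = defaultdict(list)
    for t in r:
        r_buckets[hash(key_r(t)) % num_buckets].append(t)
    for t in s:
        s_buckets[hash(key_s(t)) % num_buckets].append(t)

    # Phase 2 (join): join each bucket pair independently.
    out = []
    for b in range(num_buckets):
        table = defaultdict(list)
        for t in r_buckets[b]:  # build a hash table on R's bucket
            table[key_r(t)].append(t)
        for u in s_buckets[b]:  # probe with S's bucket, one tuple at a time
            for t in table.get(key_s(u), []):
                out.append((t, u))
    return out
```

Because each bucket pair is independent, the bucket pairs are the natural unit of work to spread across PNs, which is where bucket sizes and skew start to matter.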


Data Skew

System performance is very sensitive to skewness in the tuple distribution.


Zipf-like Distribution

[Figure: total of 1,000,000 tuples.]
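A Zipf-like bucket-size distribution of the kind in the figure can be generated by making bucket i's size proportional to 1 / i^theta. The number of buckets and the skew parameter `theta` are assumptions, since the slide gives only the total; theta = 0 gives uniform buckets and larger theta gives more skew.

```python
def zipf_bucket_sizes(total: int, num_buckets: int, theta: float) -> list[int]:
    """Split `total` tuples over buckets with sizes proportional to 1 / i**theta."""
    weights = [1.0 / (i ** theta) for i in range(1, num_buckets + 1)]
    norm = sum(weights)
    sizes = [int(total * w / norm) for w in weights]
    sizes[0] += total - sum(sizes)  # give the rounding remainder to the largest bucket
    return sizes
```

For example, `zipf_bucket_sizes(1_000_000, 4, 0.0)` returns four buckets of 250,000 tuples, while a positive theta concentrates most tuples in the first few buckets.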


Partition Tuning

Best Fit Decreasing Strategy:

In this partition tuning strategy, the hash buckets are first sorted in decreasing order of size.

In each iteration, the currently largest bucket is assigned to the currently smallest partition (or PN).

This process is repeated until all the buckets have been allocated.

This is a dynamic load balancing technique.
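The Best Fit Decreasing steps above can be sketched with a min-heap of PN loads, so the currently smallest partition is always available in O(log P) time. The function name and the representation of buckets as a list of sizes are assumptions for illustration.

```python
import heapq


def best_fit_decreasing(bucket_sizes: list[int], num_pns: int) -> list[list[int]]:
    """Assign bucket ids to PNs: largest remaining bucket -> currently smallest PN."""
    assignment: list[list[int]] = [[] for _ in range(num_pns)]
    # Min-heap of (current load, PN id): the smallest partition is always on top.
    heap = [(0, pn) for pn in range(num_pns)]
    heapq.heapify(heap)
    # Visit bucket ids in decreasing order of size.
    order = sorted(range(len(bucket_sizes)), key=lambda b: bucket_sizes[b], reverse=True)
    for b in order:
        load, pn = heapq.heappop(heap)
        assignment[pn].append(b)
        heapq.heappush(heap, (load + bucket_sizes[b], pn))
    return assignment
```

With bucket sizes `[7, 5, 4, 3, 2, 1]` on two PNs, this yields `[[0, 3, 5], [1, 2, 4]]`, so both PNs carry a load of 11.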


Best Fit Decreasing Strategy


Adaptive Load Balancing (ABJ+)


ABJ+ vs. GRACE


L_LBO in Multi-way Join Queries

L_LBO: Linear Tree with Load Balancing

A multi-way join query is treated as a sequential order of two-way joins, executed one at a time using ABJ+.
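The linear-tree plan can be sketched as a left fold: each two-way join consumes the previous result. The slides do not describe ABJ+ internals, so a plain natural hash join over lists of dicts stands in for it here; in L_LBO the load balancing would happen inside each of these two-way joins. The function names are hypothetical.

```python
from functools import reduce


def two_way_join(left, right):
    """Stand-in for one ABJ+ two-way join: a natural hash join of lists of dicts."""
    if not left or not right:
        return []
    common = sorted(set(left[0]) & set(right[0]))  # shared join attributes
    table = {}
    for row in left:  # build phase
        table.setdefault(tuple(row[c] for c in common), []).append(row)
    out = []
    for row in right:  # probe phase
        for match in table.get(tuple(row[c] for c in common), []):
            out.append({**match, **row})
    return out


def linear_tree_join(relations):
    """Linear (left-deep) plan: ((R1 join R2) join R3) join ..., one join at a time."""
    return reduce(two_way_join, relations)
```

Each step waits for the previous one, so only one join runs at a time; the bushy-tree variants that follow try to run several joins concurrently instead.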


B_NLB in Multi-way Join Queries

B_NLB: Bushy Tree without Load Balancing

It tries to join as many pairs of relations as possible.

Split Phase: Each PN partitions its portion of each relation into small subbuckets, and each subbucket is transferred to the PN corresponding to its bucket ID.

Join Phase: Each PN performs the local joins.
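The two B_NLB phases can be simulated for a single relation pair as follows. The mapping from bucket ID to PN is an assumption (bucket ID = hash mod number of PNs, and bucket i goes to PN i); the slides do not specify it. Because there is no load balancing, a skewed bucket lands entirely on one PN.

```python
from collections import defaultdict


def b_nlb(fragments, key_r, key_s, num_pns):
    """Simulate B_NLB's split and join phases for one pair of relations (R, S).

    fragments[p] = (r_part, s_part): the portions of R and S stored on PN p.
    Returns the join output produced on each PN.
    """
    recv_r = [[] for _ in range(num_pns)]
    recv_s = [[] for _ in range(num_pns)]
    # Split phase: every PN hashes its tuples into subbuckets and sends each
    # subbucket to the PN that owns that bucket ID (assumed: bucket id == PN id).
    for r_part, s_part in fragments:
        for t in r_part:
            recv_r[hash(key_r(t)) % num_pns].append(t)
        for t in s_part:
            recv_s[hash(key_s(t)) % num_pns].append(t)
    # Join phase: each PN joins only the subbuckets it received.
    results = []
    for pn in range(num_pns):
        table = defaultdict(list)
        for t in recv_r[pn]:
            table[key_r(t)].append(t)
        results.append([(t, u) for u in recv_s[pn] for t in table.get(key_s(u), [])])
    return results
```

The length of each per-PN result list is a rough proxy for how unevenly the work is spread when the join values are skewed.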


NLBO in Multi-way Join Queries

NLBO: No Load Balancing Optimization

Like B_NLB, it tries to join as many pairs of relations as possible.

Hash Phase: Each PN partitions its portion of each relation into small subbuckets and stores them back on its own disks.

Partition Tuning Phase: It allocates the buckets to the PNs using the Best Fit Decreasing Strategy.

Join Phase: Each PN performs the local joins.
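Because NLBO's subbuckets stay on the PNs that produced them, the partition tuning phase has to reason about whole buckets, whose size is the sum of the matching subbuckets across all PNs. The sketch below assumes each PN reports its subbucket sizes as a list indexed by bucket ID, then applies Best Fit Decreasing to the totals; `nlbo_allocate` is a hypothetical name.

```python
import heapq


def nlbo_allocate(local_sizes: list[list[int]], num_pns: int) -> dict[int, int]:
    """Partition tuning phase of NLBO (sketch).

    local_sizes[p][b] = size of subbucket b stored on PN p after the hash phase.
    Returns a mapping bucket id -> PN chosen by Best Fit Decreasing.
    """
    num_buckets = len(local_sizes[0])
    # A bucket's total size is the sum of its subbuckets across all PNs.
    totals = [sum(sizes[b] for sizes in local_sizes) for b in range(num_buckets)]
    heap = [(0, pn) for pn in range(num_pns)]  # (current load, PN id)
    heapq.heapify(heap)
    owner = {}
    for b in sorted(range(num_buckets), key=lambda b: totals[b], reverse=True):
        load, pn = heapq.heappop(heap)  # currently smallest partition
        owner[b] = pn
        heapq.heappush(heap, (load + totals[b], pn))
    return owner
```

In the join phase, each PN would then fetch the subbuckets of its assigned buckets from the other PNs before joining them locally.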


LBO in Multi-way Join Queries

LBO: Load Balancing Optimization

Hash Phase: Each relation is hashed into buckets, which are stored back on the local disks.

Optimization Phase: Uses the Best Fit Decreasing strategy and a greedy algorithm to select the joins that will be executed concurrently.

Executing Phase:

Stage 1: Tune the partitions.

Stage 2: Perform the join operation.

Stage 3: Update the join graph, then go to Optimization Phase.
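The Optimization/Executing loop above can be sketched over a join graph whose nodes are relations and whose edges are join predicates. The slides do not give LBO's greedy rule or cost model, so both are hypothetical here: joins are considered in ascending order of total input size, only joins sharing no relation are chosen for the same round, and a join's result size is estimated as the larger input. Stages 1 and 2 are not modeled; only the graph update of Stage 3 is.

```python
def lbo_schedule(sizes, edges):
    """Sketch of LBO's Optimization/Executing loop over a join graph.

    sizes: relation name -> estimated size; edges: set of frozenset pairs of
    relation names that share a join predicate. Returns the joins run per round.
    """
    sizes = dict(sizes)
    edges = set(edges)
    rounds = []
    while edges:
        # Optimization phase (hypothetical greedy rule): consider joins in
        # ascending order of input size and keep only joins that share no
        # relation, so the chosen joins can run concurrently.
        used, chosen = set(), []
        for e in sorted(edges, key=lambda e: (sum(sizes[r] for r in e), sorted(e))):
            a, b = sorted(e)
            if a not in used and b not in used:
                chosen.append((a, b))
                used.update((a, b))
        rounds.append(chosen)
        # Executing phase: stages 1 and 2 (partition tuning, join) are not modeled.
        # Stage 3: update the join graph by merging each joined pair into one node.
        for a, b in chosen:
            merged = f"({a}*{b})"
            sizes[merged] = max(sizes.pop(a), sizes.pop(b))  # hypothetical size estimate
            renamed = (frozenset(merged if r in (a, b) else r for r in e) for e in edges)
            edges = {e for e in renamed if len(e) == 2}  # drop the executed edge
    return rounds
```

For a chain R-S-T-U with sizes 10, 20, 30, 40, the first round runs R*S and T*U concurrently and the second round joins their results.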


Optimization Phase of LBO


Effect of Bucket Skew


LBO-FR

LBO-FR: LBO with Fragment & Replicate Feature

LBO-FR is similar to LBO, except that it partitions bucket pairs into subbucket pairs if those buckets are too large.

Example: suppose bucket pair (S1, R1) is too large and |S1| > |R1|.

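[Figure: the larger bucket S1 is fragmented into S1,1 and S1,2 (or S1,1, S1,2, S1,3), and R1 is replicated so that each fragment S1,i is paired with a full copy of R1.]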
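Fragment & replicate works because every tuple of S1 ends up in exactly one fragment, and each fragment is joined with all of R1, so the union of the subbucket-pair joins equals the join of (S1, R1). The sketch assumes round-robin fragmentation, which the slides do not specify; the function names are hypothetical.

```python
def fragment_and_replicate(s1, r1, k):
    """Split bucket S1 into k fragments S1,1..S1,k; pair each with a replica of R1."""
    if len(s1) < len(r1):
        raise ValueError("expects |S1| >= |R1|, as in the slide example")
    return [(s1[i::k], list(r1)) for i in range(k)]  # round-robin fragments


def join_pairs(pairs, key):
    """Join each subbucket pair independently (e.g., on different PNs), then union."""
    out = []
    for s_frag, r_copy in pairs:
        index = {}
        for r in r_copy:
            index.setdefault(key(r), []).append(r)
        out.extend((s, r) for s in s_frag for r in index.get(key(s), []))
    return out
```

Each subbucket pair can go to a different PN, at the price of shipping k copies of the smaller bucket R1.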


LBO-SFR

LBO-SFR: LBO with Symmetric Fragment & Replicate Feature

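[Figure: symmetric fragment & replicate of the bucket pair (S1, R1). Since |S1| > |R1|, S1 is partitioned first (Parti. S1); then, since |S1,1,1| < |R1,1,1|, R1 is partitioned (Parti. R1); then, since |S1,1,1| > |R1,1,1|, S1 is partitioned again (Parti. S1). Each subbucket pair (S1,i,j, R1,j,i) joins fragment i of S1 with fragment j of R1, giving first a 2 x 2 and then a 3 x 2 grid of pairs.]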
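The partitioning steps in the figure can be sketched as a loop that always splits whichever bucket currently has the larger fragments, one more fragment at a time, followed by pairing every S1 fragment with every R1 fragment. The stopping threshold `max_pair` (how large one subbucket pair may be) and the round-robin fragmentation are assumptions; the slides do not state either.

```python
def symmetric_fr_grid(s_size: int, r_size: int, max_pair: int) -> tuple[int, int]:
    """Choose fragment counts (a, b) for S1 and R1, following the figure's steps.

    Each step partitions the side with the larger fragments until one S1
    fragment plus one R1 fragment fits within max_pair.
    """
    if max_pair <= 0:
        raise ValueError("max_pair must be positive")
    a = b = 1
    while s_size / a + r_size / b > max_pair:
        if s_size / a >= r_size / b:
            a += 1  # "Parti. S1"
        else:
            b += 1  # "Parti. R1"
    return a, b


def symmetric_fr_pairs(s1, r1, a, b):
    """All a*b subbucket pairs: fragment i of S1 (copy j) with fragment j of R1 (copy i)."""
    s_frags = [s1[i::a] for i in range(a)]
    r_frags = [r1[j::b] for j in range(b)]
    return [(s_frags[i], r_frags[j]) for i in range(a) for j in range(b)]
```

With |S1| = 100, |R1| = 60 and `max_pair = 64`, the loop goes (1, 1) to (2, 1) to (2, 2) to (3, 2), the same sequence as the figure. Unlike LBO-FR, both buckets are fragmented, so neither has to be replicated in full to every PN.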


Effect of Bucket Skew
