Parallel Query Optimization

of 20 /20
Fall 2008 Parallel Query Optimizati on 1 Parallel Query Optimization

Embed Size (px)

description

Parallel Query Optimization. Memory. One tuple at a time. Bucket B. Bucket A. Bucket Sizes and I/O Costs. Bucket B does not fit in the memory in its entirety, It must be loaded several times. Memory. One tuple at a time. Bucket B(1). Bucket A(1). Bucket A(2). Bucket B(2). Bucket A(3). - PowerPoint PPT Presentation

Transcript of Parallel Query Optimization

  • Parallel Query Optimization

    Parallel Query Optimization

  • Bucket Sizes and I/O CostsBucket B does not fit in the memory in its entirety, It must be loaded several times.

    Parallel Query Optimization

  • Fit in MemoryBucket B fits in memory. It needs to be loaded only once.

    Parallel Query Optimization

  • Hash-Based Join

    Parallel Query Optimization

  • GRACE Algorithm

    Parallel Query Optimization

  • Data SkewSystem performance is very sensitive to the skewness in tuple distribution.

    Parallel Query Optimization

  • Zipf-like DistributionTotal: 1,000,000tuples

    Parallel Query Optimization

  • Partition TuningBest Fit Decreasing Strategy: In this partition tuning strategy, the hash buckets are first sorted into decreasing order according to size.In each iteration, the currently largest bucket is assigned to the currently smallest partition (or PN).This process is repeated until all the buckets have been allocated.This is a dynamic load balancing technique.

    Parallel Query Optimization

  • Best Fit Decreasing Strategy

    Parallel Query Optimization

  • Adaptive Load Balancing (ABJ+)

    Parallel Query Optimization

  • ABJ+ vs. GRACE

    Parallel Query Optimization

  • L_LBO in Multi-way Join QueriesL_LBO: Linear Tree with Load BalancingA multi-way join query is treated as a sequential order of two-way (or single) joins by using ABJ+.

    Parallel Query Optimization

  • B_NLB in Multi-way Join QueriesB_NLB: Bushy Tree without Load Balancing It tries to join as many pairs of relations as possible.Split Phase: Each PN partitions its portion of each relation into small subbuckets and each subbuckets is transferred to PN corresponding to the bucket ID.Join Phase: Each PN performs the local joins.

    Parallel Query Optimization

  • NLBO in Multi-way Join QueriesNLBO: No Load Balancing OptimizationLike B_NLB, it tries to join as many pairs of relations as possible.Hash Phase: Each PN partitions its portion of each relation into small subbuckets and stores them back to its own disks.Partition Tuning Phase: It allocates the buckets to the PNs using the Best Fit Decreasing Strategy. Join Phase: Each PN performs the local joins.

    Parallel Query Optimization

  • LBO in Multi-way Join QueriesLBO: Load Balancing OptimizationHash Phase: hashed and stored back into local disks.Optimization Phase: using best fit decreasing strategy and a greedy algorithm to select joins which will be executed concurrently.Executing Phase: Stage 1: Tune the partitions. Stage 2: Perform the join operation. Stage 3: Update the join graph, then go to Optimization Phase.

    Parallel Query Optimization

  • Optimization Phase of LBO

    Parallel Query Optimization

  • Effect of Bucket Skew

    Parallel Query Optimization

  • LBO-FRLBO-SFR: LBO with Fragment & Replicate FeatureLBO-FR is similar to LBO, except it partitions bucket pairs into subbucket pairs if those buckets are too large.Example: suppose bucket pair (S1, R1) is too large and |S1| > |R2|.

    S1R1

    S1,1R1S1,2R1

    S1,1R1S1,2R1S1,3R1

    Parallel Query Optimization

  • LBO-SFRLBO-SFR: LBO with Symmetric Fragment & Replicate Feature|S1|>|R1||S1,1,1||R1,1,1|Parti. S1Parti. R1Parti. S1

    S1,1,1R1,1,1

    S1,1,1R1,1,1S1,2,1R1,1,2

    S1,1,1R1,1,1S1,2,1R1,1,2S1,1,2R1,2,1S1,2,2R1,2,2

    S1,1,1R1,1,1S1,2,1R1,1,2S1,3,1R1,1,3S1,1,2R1,2,1S1,2,2R1,2,2S1,3,2R1,2,3

    Parallel Query Optimization

  • Effect of Bucket Skew

    Parallel Query Optimization