
WAIM 2015

SALA: A Skew-avoiding and Locality-aware Algorithm for MapReduce-based Join

Authors: Ziyu Lin, Minxing Cai, Ziming Huang, Yongxuan Lai

Speaker: Minxing Cai
Date: 2015-06-10

dblab.xmu.edu.cn

Background

Join operation

R join S on R.uid = S.uid

Dataset R:
uid | name
1 | Jacky
2 | Lucy
3 | Tom
4 | Kevin
5 | Richard
… | …

Dataset S:
uid | page
1 | /book
1 | /music
2 | /music
4 | /movie
5 | /book
… | …

Results of join:
uid | name | page
1 | Jacky | /book
1 | Jacky | /music
2 | Lucy | /music
4 | Kevin | /movie
5 | Richard | /book
… | … | …

Background

MapReduce-based join (Repartition join):
• Redistribute the data to partitions based on the join key.
• Key-value pairs with the same key are distributed to the same partition.
• The join operation is performed in the reduce phase.

Partition 1 (Reducer 1):
Part of dataset R (uid, name): (1, Jacky), (3, Tom), (5, Richard)
Part of dataset S (uid, page): (1, /book), (1, /music), (5, /book)
Part of join results (uid, name, page): (1, Jacky, /book), (1, Jacky, /music), (5, Richard, /book)

Partition 2 (Reducer 2):
Part of dataset R (uid, name): (2, Lucy), (4, Kevin), …
Part of dataset S (uid, page): (2, /music), (4, /movie), …
Part of join results (uid, name, page): (2, Lucy, /music), (4, Kevin, /movie), …

Background

MapReduce-based join (Repartition join):

(Figure: the process of the repartition join on datasets R and S.)
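To make the data flow concrete, here is a minimal Python sketch that simulates a repartition join on the toy R and S datasets above (an illustration only, not the Hadoop implementation; names such as NUM_REDUCERS are ours):

```python
from collections import defaultdict

# Toy datasets from the example above.
R = [(1, "Jacky"), (2, "Lucy"), (3, "Tom"), (4, "Kevin"), (5, "Richard")]      # (uid, name)
S = [(1, "/book"), (1, "/music"), (2, "/music"), (4, "/movie"), (5, "/book")]  # (uid, page)

NUM_REDUCERS = 2

# Map phase: tag each record with its source relation and emit (join key, tagged record).
intermediate = [(uid, ("R", name)) for uid, name in R] + \
               [(uid, ("S", page)) for uid, page in S]

# Shuffle phase: hash partitioning sends all pairs with the same key to the same reducer.
partitions = defaultdict(list)
for key, value in intermediate:
    partitions[hash(key) % NUM_REDUCERS].append((key, value))

# Reduce phase: group by key inside each partition and join the R side with the S side.
for reducer, pairs in sorted(partitions.items()):
    groups = defaultdict(lambda: {"R": [], "S": []})
    for key, (tag, payload) in pairs:
        groups[key][tag].append(payload)
    for uid, group in sorted(groups.items()):
        for name in group["R"]:
            for page in group["S"]:
                print(f"reducer {reducer}: ({uid}, {name}, {page})")
```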

Problems

MapReduce-based joins suffer performance degradation from partitioning skew when handling skewed data.

• Partitioning skew describes an uneven distribution of key-value pairs across reducers.

(Figure: key-value pairs are distributed unevenly across Reducer 1–Reducer 4; one reducer receives a skewed partition.)

• The default partitioning scheme of MapReduce is hash partitioning: hash(key) mod R (R: the number of reducers).
• Hash partitioning cannot guarantee a uniform distribution of data.
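As a toy illustration of why hash partitioning alone can produce such skew (not an experiment from the paper), the following Python snippet draws keys from a heavy-tailed distribution and counts how many pairs each reducer would receive:

```python
from collections import Counter
import random

NUM_REDUCERS = 4
random.seed(42)

# Heavy-tailed (Zipf-like) join keys: a few keys account for most of the pairs.
keys = [int(random.paretovariate(1.2)) for _ in range(100_000)]

# Default MapReduce partitioning: hash(key) mod R.
loads = Counter(hash(k) % NUM_REDUCERS for k in keys)
for reducer in range(NUM_REDUCERS):
    print(f"Reducer {reducer}: {loads[reducer]} key-value pairs")
```

The reducer that receives the most frequent key ends up with far more than its fair share of the pairs.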

Problems

Execution time of a MapReduce job:

(Figure: timeline of the map, shuffle, and reduce phases for Reduce 1–Reduce 4; the skewed partition requires more time to fetch the intermediate results.)

• The reduce phase begins only after the shuffle phase finishes.
• Transferring the skewed partition delays the shuffle phase.

Problems

Execution time of a MapReduce job:

(Figure: timeline of the map, shuffle, and reduce phases for Reduce 1–Reduce 4; the skewed partition requires more computing time to perform the join operation.)

• The whole execution time is determined by the slowest reducer.
• The skewed partition requires more computing time and therefore delays the whole job.

Existing Approaches

Dynamic task splitting:
• Dynamically splits slow tasks and moves the work to idle nodes;
• but adds a lot of complexity.

Better partitioning scheme:
• Builds a better partitioning based on the keys' frequency distribution to achieve load balance;
• but requires extra time to obtain the keys' frequencies.

A Simple Approach: SALA

Key idea: distribute the intermediate results based on information about each key's frequency and location.

Exploiting data locality to reduce the amount of intermediate results transferred across the network improves performance: "Moving computation is cheaper than moving data."

(Figure: the intermediate results of the map phase are split into Partition 1–Partition 4 and sent to Reducer 1 (Node 1) through Reducer 4 (Node 4); data whose reducer runs on the same node is transferred locally, the rest is transferred across the network.)

A Simple Approach: SALA

Key scheme: volume/locality-aware partitioning, which achieves better load balance and higher data locality.

The scheme adopts a greedy selection strategy:
• (Volume) Keys with a larger amount of intermediate results are processed first.
• (Locality) Each key is preferentially distributed to the node on which most of its intermediate results are located.

Volume/Locality-aware Partitioning

A simple example

Node 1:
join key | number of KV pairs
3 | 12
4 | 2
5 | 9
6 | 12
8 | 35

Node 2:
join key | number of KV pairs
1 | 7
2 | 22
4 | 18
8 | 13
9 | 10

Node 3:
join key | number of KV pairs
1 | 29
2 | 16
5 | 11
8 | 14

Distribution of intermediate results (each node holds 70 KV pairs)

Volume/Locality-aware Partitioning

Partitioning skew happens when using hash partitioning

• Reducer 1: keys 9, 6, 3 (34 KV pairs)
• Reducer 2: keys 4, 1 (56 KV pairs)
• Reducer 3: keys 8, 5, 2 (120 KV pairs)

Too much data is distributed to reducer-3; locality is only 45%.

Volume/Locality-aware Partitioning

A demonstration of volume/locality-aware partitioning on this example follows.

Volume/Locality-aware Partitioning

Volume = (total amount of KV pairs) / (number of reducers); in this example, volume = 210 / 3 = 70.

Each of Reducer 1, Reducer 2, and Reducer 3 starts with an idle volume equal to the volume, i.e. 70.

Volume/Locality-aware Partitioning

• Extract all (key, node, sum) tuples, then sort them by sum in descending order: (8, N1, 35), (1, N3, 29), (2, N2, 22), (4, N2, 18), (2, N3, 16), (8, N3, 14), (8, N2, 13), (6, N1, 12), (5, N3, 11), (9, N2, 10), ...

• Key 8: T8 = 62 (total amount of key-8 pairs), V1 = 70 (idle volume of reducer-1). Since T8 < V1, key 8 is distributed to reducer-1, whose idle volume drops to 8 (reducer-2 and reducer-3 remain at 70).

Volume/Locality-aware Partitioning

• Key 1: T1 = 36, V3 = 70. Since T1 < V3, key 1 is distributed to reducer-3, whose idle volume drops to 34.

Volume/Locality-aware Partitioning

• Keys 2 and 4 are assigned to reducer-2 in the same way (idle volumes are now reducer-1: 8, reducer-2: 12, reducer-3: 34).

• Key 6: T6 = 12, V1 = 8. Since T6 > V1, the idle volume of the target reducer is not enough; mark the key and go on.

Volume/Locality-aware Partitioning

• After the traversal, some keys may remain unpartitioned, here (3, N1, 12). (Keys 5 and 9 have meanwhile been assigned to reducer-3 and reducer-2, and the marked key 6 to reducer-3; idle volumes are now reducer-1: 8, reducer-2: 2, reducer-3: 2.)

• In this case, find the reducer with the most idle volume (reducer-1 at this point) and distribute the key to it, so key 3 goes to reducer-1.

Volume/Locality-aware Partitioning

Partitioning results using the volume/locality-aware partitioning scheme:
• Reducer 1: keys 8, 3
• Reducer 2: keys 2, 4, 9
• Reducer 3: keys 1, 5, 6

Load balance is achieved (74 / 68 / 68 KV pairs), and locality rises to 62%.
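Putting the example together, here is a minimal Python sketch of the volume/locality-aware greedy partitioning (our reconstruction from the slides, not the authors' code; the tie-breaking and fallback details are assumptions and may differ from the paper):

```python
from collections import defaultdict

# Distribution of intermediate results from the example:
# node -> {join key: number of KV pairs produced on that node}.
distribution = {
    "N1": {3: 12, 4: 2, 5: 9, 6: 12, 8: 35},
    "N2": {1: 7, 2: 22, 4: 18, 8: 13, 9: 10},
    "N3": {1: 29, 2: 16, 5: 11, 8: 14},
}

def volume_locality_aware_partitioning(distribution):
    nodes = list(distribution)                      # one reducer per node
    total = sum(sum(d.values()) for d in distribution.values())
    volume = total / len(nodes)                     # 210 / 3 = 70 in the example

    # Total number of KV pairs per key, summed over all nodes.
    key_total = defaultdict(int)
    for per_node in distribution.values():
        for key, count in per_node.items():
            key_total[key] += count

    # All (key, node, sum) tuples, sorted by sum in descending order.
    tuples = sorted(
        ((key, node, count)
         for node, per_node in distribution.items()
         for key, count in per_node.items()),
        key=lambda t: t[2], reverse=True)

    idle = {node: volume for node in nodes}         # idle volume per reducer
    assignment = {}                                  # join key -> reducer (node)
    deferred = []                                    # keys whose preferred reducer was full

    # Greedy pass: place each key on the node holding most of its pairs, if it fits.
    for key, node, _count in tuples:
        if key in assignment or key in deferred:
            continue
        if key_total[key] <= idle[node]:
            assignment[key] = node
            idle[node] -= key_total[key]
        else:
            deferred.append(key)                     # mark it and go on

    # Remaining keys go to the reducer with the most idle volume.
    for key in deferred:
        target = max(idle, key=idle.get)
        assignment[key] = target
        idle[target] -= key_total[key]

    return assignment

print(volume_locality_aware_partitioning(distribution))
```

On this input the sketch produces the same reducer sizes as the slides (74 / 68 / 68 KV pairs), although keys 3 and 6 may land on different reducers than in the slides because they tie in size.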

The process of SALA algorithm

• Phase 1: sample the dataset and pre-compute the partitioning results.
• Phase 2: perform the repartition join, but partition the intermediate results directly according to the results of phase 1.
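A simplified Python sketch of the two phases is given below (our illustration, not the paper's implementation; in a real Hadoop job phase 2 would be a custom Partitioner class, and the sampling rate and fallback behaviour here are assumptions):

```python
import random
from collections import defaultdict

# Phase 1 (sketch): sample each node's input split to estimate the per-node key
# frequencies, then pre-compute the key -> reducer map with the
# volume/locality-aware partitioning shown above.
def sample_distribution(records_per_node, rate=0.1, seed=1):
    random.seed(seed)
    distribution = defaultdict(lambda: defaultdict(int))
    for node, records in records_per_node.items():
        for key, _value in records:
            if random.random() < rate:               # keep roughly `rate` of the records
                distribution[node][key] += 1
    return {node: dict(freqs) for node, freqs in distribution.items()}

# Phase 2 (sketch): during the actual join job, the partitioner looks up the
# pre-computed assignment instead of hashing the key.
class SalaPartitioner:
    def __init__(self, assignment, num_reducers):
        self.assignment = assignment                  # key -> reducer id, from phase 1
        self.num_reducers = num_reducers

    def get_partition(self, key):
        # Keys never seen during sampling fall back to ordinary hash partitioning.
        return self.assignment.get(key, hash(key) % self.num_reducers)

# Example: the pre-computed assignment from the worked example, then two lookups.
partitioner = SalaPartitioner({8: 0, 3: 0, 2: 1, 4: 1, 9: 1, 1: 2, 5: 2, 6: 2}, 3)
print(partitioner.get_partition(8), partitioner.get_partition(7))
```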

The process of SALA algorithm

(Figure: the overall process of the SALA algorithm.)

Experiments

We implemented the SALA algorithm and ran experiments on AliCloud to verify the efficiency of our approach.
• SALA achieves better load balance while reducing network overhead.

(Figure: data distribution of different join algorithms.)

Experiments

• SALA speeds up the join operation when handling skewed data.
• SALA performs much better under low bandwidth.

(Figure: response time of different join algorithms.)

Conclusion

• Based on a study of MapReduce-based joins, we propose the SALA join algorithm, which uses volume/locality-aware partitioning to distribute the intermediate results.

• SALA guarantees a uniform distribution of data and avoids the partitioning-skew problem.

• SALA takes full advantage of data locality to reduce network overhead.

• Experiments show that SALA deals with skewed data efficiently.

Thanks for your time.

dblab.xmu.edu.cn

ziyulin@xmu.edu.cn