
WAIM 2015

SALA: A Skew-avoiding and Locality-aware Algorithm for MapReduce-based Join

Authors: Ziyu Lin, Minxing Cai, Ziming Huang, Yongxuan Lai

Speaker: Minxing Cai
Date: 2015-06-10

dblab.xmu.edu.cn

Background

Join operation

R join S on R.uid = S.uid

Dataset R:
uid | name
1 | Jacky
2 | Lucy
3 | Tom
4 | Kevin
5 | Richard
… | …

Dataset S:
uid | page
1 | /book
1 | /music
2 | /music
4 | /movie
5 | /book
… | …

Results of join:
uid | name | page
1 | Jacky | /book
1 | Jacky | /music
2 | Lucy | /music
4 | Kevin | /movie
5 | Richard | /book
… | … | …

Background

MapReduce-based join (Repartition join):
• Redistribute the data to partitions based on the join key.
• Key-value pairs with the same key are distributed to the same partition.
• The join operation is performed in the reduce phase.

Partition 1 (Reducer 1):
Part of dataset R (uid, name): (1, Jacky), (3, Tom), (5, Richard)
Part of dataset S (uid, page): (1, /book), (1, /music), (5, /book)
Part of join results (uid, name, page): (1, Jacky, /book), (1, Jacky, /music), (5, Richard, /book)

Partition 2 (Reducer 2):
Part of dataset R (uid, name): (2, Lucy), (4, Kevin), …
Part of dataset S (uid, page): (2, /music), (4, /movie), …
Part of join results (uid, name, page): (2, Lucy, /music), (4, Kevin, /movie), …

Background

MapReduce-based join (Repartition join):

(Figure: the process of the repartition join on datasets R and S.)
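To make the data flow concrete, here is a minimal Python sketch that simulates a repartition join on the toy R and S datasets above (an illustration only, not the Hadoop implementation; names such as NUM_REDUCERS are ours):

```python
from collections import defaultdict

# Toy datasets from the example above.
R = [(1, "Jacky"), (2, "Lucy"), (3, "Tom"), (4, "Kevin"), (5, "Richard")]      # (uid, name)
S = [(1, "/book"), (1, "/music"), (2, "/music"), (4, "/movie"), (5, "/book")]  # (uid, page)

NUM_REDUCERS = 2

# Map phase: tag each record with its source relation and emit (join key, tagged record).
intermediate = [(uid, ("R", name)) for uid, name in R] + \
               [(uid, ("S", page)) for uid, page in S]

# Shuffle phase: hash partitioning sends all pairs with the same key to the same reducer.
partitions = defaultdict(list)
for key, value in intermediate:
    partitions[hash(key) % NUM_REDUCERS].append((key, value))

# Reduce phase: group by key inside each partition and join the R side with the S side.
for reducer, pairs in sorted(partitions.items()):
    groups = defaultdict(lambda: {"R": [], "S": []})
    for key, (tag, payload) in pairs:
        groups[key][tag].append(payload)
    for uid, group in sorted(groups.items()):
        for name in group["R"]:
            for page in group["S"]:
                print(f"reducer {reducer}: ({uid}, {name}, {page})")
```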

Problems

MapReduce-based joins suffer performance degradation from partitioning skew when handling skewed data.

• Partitioning skew describes an uneven distribution of key-value pairs across reducers.

(Figure: key-value pairs are distributed unevenly across Reducer 1–Reducer 4; one reducer receives a skewed partition.)

• The default partitioning scheme of MapReduce is hash partitioning: hash(key) mod R (R: the number of reducers).
• Hash partitioning cannot guarantee a uniform distribution of data.
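As a toy illustration of why hash partitioning alone can produce such skew (not an experiment from the paper), the following Python snippet draws keys from a heavy-tailed distribution and counts how many pairs each reducer would receive:

```python
from collections import Counter
import random

NUM_REDUCERS = 4
random.seed(42)

# Heavy-tailed (Zipf-like) join keys: a few keys account for most of the pairs.
keys = [int(random.paretovariate(1.2)) for _ in range(100_000)]

# Default MapReduce partitioning: hash(key) mod R.
loads = Counter(hash(k) % NUM_REDUCERS for k in keys)
for reducer in range(NUM_REDUCERS):
    print(f"Reducer {reducer}: {loads[reducer]} key-value pairs")
```

The reducer that receives the most frequent key ends up with far more than its fair share of the pairs.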

Problems

Execution time of a MapReduce job:

(Figure: timeline of the map, shuffle, and reduce phases for Reduce 1–Reduce 4; the skewed partition requires more time to fetch the intermediate results.)

• The reduce phase begins only after the shuffle phase finishes.
• Transferring the skewed partition delays the shuffle phase.

Problems

Execution time of a MapReduce job:

(Figure: timeline of the map, shuffle, and reduce phases for Reduce 1–Reduce 4; the skewed partition requires more computing time to perform the join operation.)

• The whole execution time is determined by the slowest reducer.
• The skewed partition requires more computing time and therefore delays the whole job.

Existing Approaches

Dynamic task splitting:
• Dynamically splits slow tasks and moves the work to idle nodes;
• but adds a lot of complexity.

Better partitioning scheme:
• Builds a better partitioning based on the keys' frequency distribution to achieve load balance;
• but requires extra time to obtain the keys' frequencies.

A Simple Approach: SALA

Key idea: distribute the intermediate results based on information about each key's frequency and location.

Exploiting data locality to reduce the amount of intermediate results transferred across the network improves performance: "Moving computation is cheaper than moving data."

(Figure: the intermediate results of the map phase are split into Partition 1–Partition 4 and sent to Reducer 1 (Node 1) through Reducer 4 (Node 4); data whose reducer runs on the same node is transferred locally, the rest is transferred across the network.)

A Simple Approach: SALA

Key scheme: volume/locality-aware partitioning, which achieves better load balance and higher data locality.

The scheme adopts a greedy selection strategy:
• (Volume) Keys with a larger amount of intermediate results are processed first.
• (Locality) Each key is preferentially distributed to the node on which most of its intermediate results are located.

Volume/Locality-aware Partitioning

A simple example

Node 1:
join key | number of KV pairs
3 | 12
4 | 2
5 | 9
6 | 12
8 | 35

Node 2:
join key | number of KV pairs
1 | 7
2 | 22
4 | 18
8 | 13
9 | 10

Node 3:
join key | number of KV pairs
1 | 29
2 | 16
5 | 11
8 | 14

Distribution of intermediate results (each node holds 70 KV pairs)

Volume/Locality-aware Partitioning

Partitioning skew happens when using hash partitioning

• Reducer 1: keys 9, 6, 3 (34 KV pairs)
• Reducer 2: keys 4, 1 (56 KV pairs)
• Reducer 3: keys 8, 5, 2 (120 KV pairs)

Too much data is distributed to reducer-3; locality is only 45%.

Volume/Locality-aware Partitioning

A demonstration of volume/locality-aware partitioning on this example follows.

Volume/Locality-aware Partitioning

Volume = (total amount of KV pairs) / (number of reducers); in this example, volume = 210 / 3 = 70.

Each of Reducer 1, Reducer 2, and Reducer 3 starts with an idle volume equal to the volume, i.e. 70.

Volume/Locality-aware Partitioning

• Extract all (key, node, sum) tuples, then sort them by sum in descending order: (8, N1, 35), (1, N3, 29), (2, N2, 22), (4, N2, 18), (2, N3, 16), (8, N3, 14), (8, N2, 13), (6, N1, 12), (5, N3, 11), (9, N2, 10), ...

• Key 8: T8 = 62 (total amount of key-8 pairs), V1 = 70 (idle volume of reducer-1). Since T8 < V1, key 8 is distributed to reducer-1, whose idle volume drops to 8 (reducer-2 and reducer-3 remain at 70).

Volume/Locality-aware Partitioning

• Key 1: T1 = 36, V3 = 70. Since T1 < V3, key 1 is distributed to reducer-3, whose idle volume drops to 34.

Volume/Locality-aware Partitioning

• Keys 2 and 4 are assigned to reducer-2 in the same way (idle volumes are now reducer-1: 8, reducer-2: 12, reducer-3: 34).

• Key 6: T6 = 12, V1 = 8. Since T6 > V1, the idle volume of the target reducer is not enough; mark the key and go on.

Volume/Locality-aware Partitioning

• After the traversal, some keys may remain unpartitioned, here (3, N1, 12). (Keys 5 and 9 have meanwhile been assigned to reducer-3 and reducer-2, and the marked key 6 to reducer-3; idle volumes are now reducer-1: 8, reducer-2: 2, reducer-3: 2.)

• In this case, find the reducer with the most idle volume (reducer-1 at this point) and distribute the key to it, so key 3 goes to reducer-1.

Volume/Locality-aware Partitioning

Partitioning results using the volume/locality-aware partitioning scheme:
• Reducer 1: keys 8, 3
• Reducer 2: keys 2, 4, 9
• Reducer 3: keys 1, 5, 6

Load balance is achieved (74 / 68 / 68 KV pairs), and locality rises to 62%.
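Putting the example together, here is a minimal Python sketch of the volume/locality-aware greedy partitioning (our reconstruction from the slides, not the authors' code; the tie-breaking and fallback details are assumptions and may differ from the paper):

```python
from collections import defaultdict

# Distribution of intermediate results from the example:
# node -> {join key: number of KV pairs produced on that node}.
distribution = {
    "N1": {3: 12, 4: 2, 5: 9, 6: 12, 8: 35},
    "N2": {1: 7, 2: 22, 4: 18, 8: 13, 9: 10},
    "N3": {1: 29, 2: 16, 5: 11, 8: 14},
}

def volume_locality_aware_partitioning(distribution):
    nodes = list(distribution)                      # one reducer per node
    total = sum(sum(d.values()) for d in distribution.values())
    volume = total / len(nodes)                     # 210 / 3 = 70 in the example

    # Total number of KV pairs per key, summed over all nodes.
    key_total = defaultdict(int)
    for per_node in distribution.values():
        for key, count in per_node.items():
            key_total[key] += count

    # All (key, node, sum) tuples, sorted by sum in descending order.
    tuples = sorted(
        ((key, node, count)
         for node, per_node in distribution.items()
         for key, count in per_node.items()),
        key=lambda t: t[2], reverse=True)

    idle = {node: volume for node in nodes}         # idle volume per reducer
    assignment = {}                                  # join key -> reducer (node)
    deferred = []                                    # keys whose preferred reducer was full

    # Greedy pass: place each key on the node holding most of its pairs, if it fits.
    for key, node, _count in tuples:
        if key in assignment or key in deferred:
            continue
        if key_total[key] <= idle[node]:
            assignment[key] = node
            idle[node] -= key_total[key]
        else:
            deferred.append(key)                     # mark it and go on

    # Remaining keys go to the reducer with the most idle volume.
    for key in deferred:
        target = max(idle, key=idle.get)
        assignment[key] = target
        idle[target] -= key_total[key]

    return assignment

print(volume_locality_aware_partitioning(distribution))
```

On this input the sketch produces the same reducer sizes as the slides (74 / 68 / 68 KV pairs), although keys 3 and 6 may land on different reducers than in the slides because they tie in size.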

The process of SALA algorithm

• Phase 1: sample the dataset and pre-compute the partitioning results.
• Phase 2: perform the repartition join, but partition the intermediate results directly according to the results of phase 1.
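A simplified Python sketch of the two phases is given below (our illustration, not the paper's implementation; in a real Hadoop job phase 2 would be a custom Partitioner class, and the sampling rate and fallback behaviour here are assumptions):

```python
import random
from collections import defaultdict

# Phase 1 (sketch): sample each node's input split to estimate the per-node key
# frequencies, then pre-compute the key -> reducer map with the
# volume/locality-aware partitioning shown above.
def sample_distribution(records_per_node, rate=0.1, seed=1):
    random.seed(seed)
    distribution = defaultdict(lambda: defaultdict(int))
    for node, records in records_per_node.items():
        for key, _value in records:
            if random.random() < rate:               # keep roughly `rate` of the records
                distribution[node][key] += 1
    return {node: dict(freqs) for node, freqs in distribution.items()}

# Phase 2 (sketch): during the actual join job, the partitioner looks up the
# pre-computed assignment instead of hashing the key.
class SalaPartitioner:
    def __init__(self, assignment, num_reducers):
        self.assignment = assignment                  # key -> reducer id, from phase 1
        self.num_reducers = num_reducers

    def get_partition(self, key):
        # Keys never seen during sampling fall back to ordinary hash partitioning.
        return self.assignment.get(key, hash(key) % self.num_reducers)

# Example: the pre-computed assignment from the worked example, then two lookups.
partitioner = SalaPartitioner({8: 0, 3: 0, 2: 1, 4: 1, 9: 1, 1: 2, 5: 2, 6: 2}, 3)
print(partitioner.get_partition(8), partitioner.get_partition(7))
```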

The process of SALA algorithm

(Figure: the overall process of the SALA algorithm.)

Experiments

We implemented the SALA algorithm and ran experiments on AliCloud to verify the efficiency of our approach.
• SALA achieves better load balance while reducing network overhead.

(Figure: data distribution of different join algorithms.)

Experiments

• SALA speeds up the join operation when handling skewed data.
• SALA performs much better under low bandwidth.

(Figure: response time of different join algorithms.)

Conclusion

• Based on a study of MapReduce-based joins, we propose the SALA join algorithm, which uses volume/locality-aware partitioning to distribute the intermediate results.

• SALA guarantees a uniform distribution of data and avoids the partitioning-skew problem.

• SALA takes full advantage of data locality to reduce network overhead.

• Experiments show that SALA deals with skewed data efficiently.

Thanks for your time.

dblab.xmu.edu.cn

ziyulin@xmu.edu.cn